docs: update README

hayden 2023-10-20 16:04:51 +08:00 committed by Hayden
parent 917cb988f2
commit 970bf4ee84
2 changed files with 270 additions and 78 deletions

@ -7,36 +7,67 @@
[English](./README.md) | 简体中文
dt-sql-parser is a **SQL Parser** project built with [ANTLR4](https://github.com/antlr/antlr4) for the big data domain. With the Parser, Visitor, and Listener objects that [ANTLR4](https://github.com/antlr/antlr4) generates by default, it is easy to perform **syntax validation**, **tokenization** (Tokenizer), and **AST traversal** on SQL statements. It also provides a few helper methods, such as splitting SQL and filtering `--` and `/**/` comments out of SQL statements.
dt-sql-parser is a **SQL Parser** project built with [ANTLR4](https://github.com/antlr/antlr4) for the big data domain. With the Parser, Visitor, and Listener objects that [ANTLR4](https://github.com/antlr/antlr4) generates by default, it is easy to perform **syntax validation**, **tokenization** (Tokenizer), and **AST traversal** on SQL statements. It also provides some helper methods, such as **SQL splitting** and **auto-completion**.
Supported SQL types:
**Supported SQL types:**
- MySQL
- Generic SQL (MySQL)
- Flink SQL
- Spark SQL
- Hive SQL
- PL/SQL
- PostgreSQL
- Trino SQL
**Supported SQL helper methods**
| SQL Type | SQL Split | Auto-Complete |
| ----------- | -------- | -------- |
| Generic SQL | WIP | WIP |
| Flink SQL | ✅ | ✅ |
| Spark SQL | ✅ | ✅ |
| Hive SQL | ✅ | ✅ |
| PL/SQL | WIP | WIP |
| PostgreSQL | WIP | WIP |
| Trino SQL | WIP | WIP |
> Tip: The current Parser targets `JavaScript`. If needed, you can try compiling the grammar files to other target languages.
<br/>
## Integrating with MonacoEditor
We provide a [monaco-sql-languages](https://github.com/DTStack/monaco-sql-languages) package that makes it easy to integrate `dt-sql-parser` with `monaco-editor`.
<br/>
## Installation
```bash
// use npm
# use npm
npm i dt-sql-parser --save
// use yarn
# use yarn
yarn add dt-sql-parser
```
<br/>
## Usage
Before getting started, you need to understand the basic usage. `dt-sql-parser` provides a dedicated SQL Parser class for each SQL type:
```javascript
import { GenericSQL, FlinkSQL, SparkSQL, HiveSQL, PLSQL, PostgresSQL, TrinoSQL } from 'dt-sql-parser';
```
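Before using syntax validation, auto-completion, and other features, first instantiate the Parser for the SQL type you need. Taking `GenericSQL` as an example: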
```javascript
const parser = new GenericSQL();
```
The usage examples below use `GenericSQL`; Parsers for other SQL types are used in the same way as `GenericSQL`.
<br/>
### Syntax Validation
First declare the corresponding Parser object. Different SQL types require different Parser objects; for example,
for **Flink SQL** you need to import the **FlinkSQL** Parser. Here we use **GenericSQL** as the example:
```javascript
import { GenericSQL } from 'dt-sql-parser';
@ -81,6 +112,8 @@ console.log(errors);
Instantiate the Parser object first, then use the `validate` method to check the SQL statement. If validation fails, it returns an array containing `error` information.
<br/>
### Tokenizer
When needed, you can run lexical analysis on a SQL statement on its own to get all of its Token objects:
@ -110,6 +143,8 @@ console.log(tokens)
*/
```
<br/>
### Visitor
Use the Visitor pattern to access specific nodes in the AST
@ -145,9 +180,11 @@ TableName user1
> Tip: When using the Visitor pattern, the node method names can be found in the Visitor file under the corresponding SQL directory
<br/>
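### Listener
In Listener mode, the ParseTreeWalker object provided by [ANTLR4](https://github.com/antlr/antlr4) traverses the AST and calls the corresponding method on entering each node.
In Listener mode, the `ParseTreeWalker` object provided by [ANTLR4](https://github.com/antlr/antlr4) traverses the AST and calls the corresponding method on entering each node.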
```javascript
import { GenericSQL, SqlParserListener } from 'dt-sql-parser';
@ -178,50 +215,112 @@ TableName user1
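> Tip: When using the Listener pattern, the node method names can be found in the Listener file under the corresponding SQL directory
### Clean comments
Remove comments and leading and trailing whitespace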
<br/>
### Split SQL by statement
Taking `FlinkSQL` as an example:
```javascript
import { cleanSql } from 'dt-sql-parser';
const sql = `-- comment comment
select id,name from user1; `
const cleanedSql = cleanSql(sql)
console.log(cleanedSql)
import { FlinkSQL } from 'dt-sql-parser';
const parser = new FlinkSQL();
const sql = 'SHOW TABLES;\nSELECT * FROM tb;';
const sqlSlices = parser.splitSQLByStatement(sql);
console.log(sqlSlices)
/*
select id,name from user1;
[
{
startIndex: 0,
endIndex: 11,
startLine: 1,
endLine: 1,
startColumn: 1,
endColumn: 12,
text: 'SHOW TABLES;'
},
{
startIndex: 13,
endIndex: 29,
startLine: 2,
endLine: 2,
startColumn: 1,
endColumn: 17,
text: 'SELECT * FROM tb;'
}
]
*/
```
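The offsets in each slice can be checked against the original string. The sketch below is self-contained: it hard-codes the two slices from the sample output above instead of calling the parser, and it assumes (from that sample only) that `endIndex` is inclusive. `sliceText` is a hypothetical helper, not part of `dt-sql-parser`.

```javascript
// Input and slices copied from the sample output above (no parser call here).
const sql = 'SHOW TABLES;\nSELECT * FROM tb;';
const sqlSlices = [
    { startIndex: 0, endIndex: 11, text: 'SHOW TABLES;' },
    { startIndex: 13, endIndex: 29, text: 'SELECT * FROM tb;' },
];

// Hypothetical helper: assumes endIndex is inclusive, as the sample suggests,
// so the exclusive end passed to String.prototype.slice is endIndex + 1.
function sliceText(source, slice) {
    return source.slice(slice.startIndex, slice.endIndex + 1);
}

for (const slice of sqlSlices) {
    // Logs whether the recovered substring matches the slice's own text.
    console.log(sliceText(sql, slice) === slice.text);
}
```

If the offsets ever turn out to be exclusive, only the `+ 1` in `sliceText` needs to change.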
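### Split SQL
<br/>
When the SQL text is very large, you can first split it on `;` and then process it statement by statement.
### Auto Complete
Get auto-completion information at a specified position in the SQL, taking `FlinkSQL` as an example: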
```javascript
import { splitSql } from 'dt-sql-parser';
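Call the `getSuggestionAtCaretPosition` method, passing in the SQL content and the line and column numbers of the position to autocomplete.
+ Get a list of keyword candidates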
const sql = `select id,name from user1;
select id,name from user2;`
const sqlList = splitSql(sql)
console.log(sqlList)
```javascript
import { FlinkSQL } from 'dt-sql-parser';
const parser = new FlinkSQL();
const sql = 'CREATE ';
const pos = { lineNumber: 1, column: 16 }; // the end position
const keywords = parser.getSuggestionAtCaretPosition(sql, pos)?.keywords;
console.log(keywords);
/*
["select id,name from user1;", "\nselect id,name from user2;"]
*/
```
/*
[ 'CATALOG', 'FUNCTION', 'TEMPORARY', 'VIEW', 'DATABASE', 'TABLE' ]
*/
```
+ Get syntax-related auto-completion information
```javascript
const parser = new FlinkSQL();
const sql = 'SELECT * FROM tb';
const pos = { lineNumber: 1, column: 16 }; // after 'tb'
const syntaxSuggestions = parser.getSuggestionAtCaretPosition(sql, pos)?.syntax;
console.log(syntaxSuggestions);
/*
[
{
syntaxContextType: 'table',
wordRanges: [
{
text: 'tb',
startIndex: 14,
stopIndex: 15,
line: 1,
startColumn: 15,
stopColumn: 16
}
]
},
{
syntaxContextType: 'view',
wordRanges: [
{
text: 'tb',
startIndex: 14,
stopIndex: 15,
line: 1,
startColumn: 15,
stopColumn: 16
}
]
}
]
*/
```
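The syntax-related auto-completion information is returned as an array. Each item describes a kind of syntax that can be filled in at that position; in the example above, the output means the position accepts a **table name** or a **view name**. `syntaxContextType` is the syntax type that can be completed, and `wordRanges` is the content that has already been typed.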
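An editor integration usually needs the already-typed word as a range it can replace. The sketch below converts one `wordRanges` entry into an object with the field names of monaco-editor's `IRange` (`startLineNumber`, `startColumn`, `endLineNumber`, `endColumn`). It is an illustration only: `toEditorRange` is a hypothetical helper, and it assumes `stopColumn` is inclusive, which is inferred from the sample output where `tb` occupies columns 15 and 16.

```javascript
// One word range copied from the sample output above.
const wordRange = {
    text: 'tb',
    startIndex: 14,
    stopIndex: 15,
    line: 1,
    startColumn: 15,
    stopColumn: 16,
};

// Hypothetical helper. Assumes the word sits on a single line and that
// stopColumn is inclusive, while the editor range end column is exclusive.
function toEditorRange(range) {
    return {
        startLineNumber: range.line,
        startColumn: range.startColumn,
        endLineNumber: range.line,
        endColumn: range.stopColumn + 1,
    };
}

console.log(toEditorRange(wordRange));
```

Replacing this range with a chosen candidate overwrites the partially typed `tb` rather than appending to it.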
<br/>
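### Other APIs
- parserTreeToString (input: string)
- `createLexer`: creates and returns an Antlr4 Lexer instance;
- `createParser`: creates and returns an Antlr4 Parser instance;
- `parse`: parses the input SQL and returns the parse tree;
Parses SQL into a `List-like` tree string, generally used for testing
## Roadmap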
- Auto-complete
- Format code
<br/>
## License

README.md

@ -15,9 +15,9 @@ English | [简体中文](./README-zh_CN.md)
dt-sql-parser is a **SQL Parser** project built with [ANTLR4](https://github.com/antlr/antlr4), aimed mainly at the **big data** domain. [ANTLR4](https://github.com/antlr/antlr4) generates the basic Parser, Visitor, and Listener, so features such as **syntax validation**, **tokenization**, and **AST traversal** are easy to implement.
It also provides some helper methods, such as **splitting** SQL and filtering `--` and `/**/` comments out of SQL.
It also provides some helper methods, such as SQL **splitting** and **auto-completion**.
Supported SQL:
**Supported SQL**:
- Generic SQL (MySQL)
- Flink SQL
@ -27,32 +27,57 @@ Supported SQL:
- PostgreSQL
- Trino SQL
**Supported helper methods**
| SQL Type | SQL Split | Auto-Complete |
| ----------- | -------- | -------- |
| Generic SQL | WIP | WIP |
| Flink SQL | ✅ | ✅ |
| Spark SQL | ✅ | ✅ |
| Hive SQL | ✅ | ✅ |
| PL/SQL | WIP | WIP |
| PostgreSQL | WIP | WIP |
| Trino SQL | WIP | WIP |
> Tip: This project targets JavaScript by default; you can also try compiling the grammar to other languages if needed.
<br/>
## Integrating SQL Parser with Monaco Editor
We provide a [monaco-sql-languages](https://github.com/DTStack/monaco-sql-languages) package that makes it easy to integrate `dt-sql-parser` with `monaco-editor`.
<br/>
## Installation
```bash
// use npm
# use npm
npm i dt-sql-parser --save
// use yarn
# use yarn
yarn add dt-sql-parser
```
<br/>
## Usage
Before you get started, you need to understand the basics of how to use it. `dt-sql-parser` provides SQL parser classes for different types of supported SQL:
```javascript
import { GenericSQL, FlinkSQL, SparkSQL, HiveSQL, PLSQL, PostgresSQL, TrinoSQL } from 'dt-sql-parser';
```
Before using syntax validation, autocompletion, and other methods, you need to instantiate the Parser for the corresponding SQL type. Taking `GenericSQL` as an example:
```javascript
const parser = new GenericSQL();
```
The usage examples below use `GenericSQL`; Parsers for other SQL types are used in the same way as `GenericSQL`.
<br/>
### Syntax Validation
First, import the **Parser** object from `dt-sql-parser`. Each SQL type needs its own
Parser, so to handle **Flink SQL**, import the **FlinkSQL Parser**.
Below is a **GenericSQL Parser** example:
```javascript
import { GenericSQL } from 'dt-sql-parser';
@ -98,6 +123,8 @@ Output:
We instantiate a Parser object and use the **validate** method to check the SQL syntax. If validation fails,
it returns an array containing the **error** information.
<br/>
### Tokenizer
Get all **tokens** by the Parser:
@ -127,6 +154,8 @@ console.log(tokens)
*/
```
<br/>
### Visitor
Traverse the tree nodes with the Visitor:
@ -163,6 +192,8 @@ TableName user1
> Tips: The node's method name can be found in the Visitor file under the corresponding SQL directory
<br/>
### Listener
Access the specified node in the AST with the Listener
@ -197,50 +228,112 @@ TableName user1
> Tips: The node's method name can be found in the Listener file under the corresponding SQL directory
### Clean
Remove **comments** and leading and trailing **spaces**
<br/>
### Split SQL by statement
Take `FlinkSQL` as an example:
```javascript
import { cleanSql } from 'dt-sql-parser';
const sql = `-- comment comment
select id,name from user1; `
const cleanedSql = cleanSql(sql)
console.log(cleanedSql)
import { FlinkSQL } from 'dt-sql-parser';
const parser = new FlinkSQL();
const sql = 'SHOW TABLES;\nSELECT * FROM tb;';
const sqlSlices = parser.splitSQLByStatement(sql);
console.log(sqlSlices)
/*
select id,name from user1;
[
{
startIndex: 0,
endIndex: 11,
startLine: 1,
endLine: 1,
startColumn: 1,
endColumn: 12,
text: 'SHOW TABLES;'
},
{
startIndex: 13,
endIndex: 29,
startLine: 2,
endLine: 2,
startColumn: 1,
endColumn: 17,
text: 'SELECT * FROM tb;'
}
]
*/
```
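The line and column fields make it possible to work out which statement a caret sits in, for example to run or validate only that statement. The sketch below is self-contained and uses the slices from the sample output above rather than a parser call. `findStatementAt` is a hypothetical helper, and it assumes `endColumn` is inclusive, which is only inferred from that sample.

```javascript
// Slices copied from the sample output above (lines and columns are 1-based).
const slices = [
    { startLine: 1, endLine: 1, startColumn: 1, endColumn: 12, text: 'SHOW TABLES;' },
    { startLine: 2, endLine: 2, startColumn: 1, endColumn: 17, text: 'SELECT * FROM tb;' },
];

// Hypothetical helper: returns the slice containing the caret, or undefined.
// Assumes endColumn is inclusive, as the sample output suggests.
function findStatementAt(sliceList, { lineNumber, column }) {
    return sliceList.find((s) => {
        const afterStart =
            lineNumber > s.startLine ||
            (lineNumber === s.startLine && column >= s.startColumn);
        const beforeEnd =
            lineNumber < s.endLine ||
            (lineNumber === s.endLine && column <= s.endColumn);
        return afterStart && beforeEnd;
    });
}

console.log(findStatementAt(slices, { lineNumber: 2, column: 5 })?.text);
// 'SELECT * FROM tb;'
```

The caret argument uses the same `{ lineNumber, column }` shape as the position passed to `getSuggestionAtCaretPosition`, so one object can serve both calls.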
### Split SQL
<br/>
When the SQL text is very large, consider splitting it on `;` and handling each statement separately.
### Auto Complete
Get the autocomplete information at a specified position in the SQL, using `FlinkSQL` as an example:
```javascript
import { splitSql } from 'dt-sql-parser';
Call the `getSuggestionAtCaretPosition` method, passing in the SQL content and the line and column numbers of the position to autocomplete.
+ Get a list of keyword candidates
const sql = `select id,name from user1;
select id,name from user2;`
const sqlList = splitSql(sql)
console.log(sqlList)
```javascript
import { FlinkSQL } from 'dt-sql-parser';
const parser = new FlinkSQL();
const sql = 'CREATE ';
const pos = { lineNumber: 1, column: 16 }; // the end position
const keywords = parser.getSuggestionAtCaretPosition(sql, pos)?.keywords;
console.log(keywords);
/*
["select id,name from user1;", "\nselect id,name from user2;"]
*/
```
/*
[ 'CATALOG', 'FUNCTION', 'TEMPORARY', 'VIEW', 'DATABASE', 'TABLE' ]
*/
```
+ Get syntax-related autocompletion information
```javascript
const parser = new FlinkSQL();
const sql = 'SELECT * FROM tb';
const pos = { lineNumber: 1, column: 16 }; // after 'tb'
const syntaxSuggestions = parser.getSuggestionAtCaretPosition(sql, pos)?.syntax;
console.log(syntaxSuggestions);
/*
[
{
syntaxContextType: 'table',
wordRanges: [
{
text: 'tb',
startIndex: 14,
stopIndex: 15,
line: 1,
startColumn: 15,
stopColumn: 16
}
]
},
{
syntaxContextType: 'view',
wordRanges: [
{
text: 'tb',
startIndex: 14,
stopIndex: 15,
line: 1,
startColumn: 15,
stopColumn: 16
}
]
}
]
*/
```
The syntax-related autocomplete information is returned as an array. Each item describes a kind of syntax that can be filled in at that position; in the example above, the output means the position accepts a **table name** or a **view name**. `syntaxContextType` is the syntax type that can be completed, and `wordRanges` is the content that has already been typed.
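To turn this structure into something a completion UI can use, each item can be reduced to its type plus the text typed so far. The sketch below is self-contained and works on a trimmed copy of the sample output above; `summarizeSuggestions` is a hypothetical helper, not part of `dt-sql-parser`. Joining the `wordRanges` texts without a separator is an assumption about how multi-part names are reported.

```javascript
// Sample value copied from the output above, trimmed to the fields used here.
const syntaxSuggestions = [
    { syntaxContextType: 'table', wordRanges: [{ text: 'tb' }] },
    { syntaxContextType: 'view', wordRanges: [{ text: 'tb' }] },
];

// Hypothetical helper: one entry per suggestion, with the text typed so far.
// Accepts undefined, since the examples above read the value with `?.syntax`.
function summarizeSuggestions(suggestions) {
    return (suggestions ?? []).map(({ syntaxContextType, wordRanges }) => ({
        kind: syntaxContextType,
        typed: wordRanges.map((range) => range.text).join(''),
    }));
}

console.log(summarizeSuggestions(syntaxSuggestions));
```

A caller could then look up table names for `kind: 'table'` and view names for `kind: 'view'`, filtering each list by the `typed` prefix.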
<br/>
### Other APIs
- parserTreeToString(input: string)
- `createLexer`: creates and returns an Antlr4 Lexer instance;
- `createParser`: creates and returns an Antlr4 Parser instance;
- `parse`: parses the input SQL and returns the parse tree;
Parse the input and convert the AST to a `List-like` tree string.
## Roadmap
- Auto-complete
- Code formatting
<br/>
## License