docs: update README

xiaowei 2020-12-17 16:46:42 +08:00
parent 3aa21f51c0
commit fa5fed2c2a
3 changed files with 172 additions and 180 deletions


@ -1,33 +0,0 @@
# dt-sql-parser
[![NPM version][npm-image]][npm-url]
[npm-image]: https://img.shields.io/npm/v/dt-sql-parser.svg?style=flat-square
[npm-url]: https://www.npmjs.com/package/dt-sql-parser
## Installation
## Usage
### Basic
### Syntax validation
### Visitor
### Listener
## Example
## Roadmap
- Unify parser generate to Antlr4
- Generic SQL
- Flink SQL
- Libra SQL
- TiDB
MySQL Compatible Syntax
## Contributing
## License


@ -7,15 +7,9 @@
[English](./README.md) | 简体中文 [English](./README.md) | 简体中文
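dt-sql-parser is a collection of SQL parsers built on [ANTLR4](https://github.com/antlr/antlr4). It is mainly used to parse various kinds of SQL in big data development. dt-sql-parser is a `SQL Parser` project for the big data domain, built on [ANTLR4](https://github.com/antlr/antlr4). Using the Parser, Visitor, and Listener objects that [ANTLR4](https://github.com/antlr/antlr4) generates by default, it is easy to perform `syntax validation` (Syntax Validation), `lexical analysis` (Tokenizer), `AST traversal`, and similar tasks on SQL statements. It also provides several helper methods, such as splitting SQL (Split) and filtering `--` and `/**/` comments out of SQL statements.
Each SQL type comes with a corresponding base class, Visitor class, and Listener class, covering token generation, AST generation, syntax validation, and traversal of specified AST nodes via the visitor and listener patterns. Supported SQL types:
In addition, to make parsing easier, several helper methods are provided to preprocess SQL before parsing. Their main purpose is to strip the '--' and '/**/' comment types from SQL statements and to split large chunks of SQL.
Tip: the grammar files in this project can also be compiled to other languages with [ANTLR4](https://github.com/antlr/antlr4)
Currently supported SQL: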
- MySQL - MySQL
- Flink SQL - Flink SQL
@ -23,9 +17,11 @@ dt-sql-parser 是一个基于 [ANTLR4](https://github.com/antlr/antlr4) 开发
- Hive SQL - Hive SQL
- PL/SQL - PL/SQL
> Tip: the current Parser targets `Javascript`; if needed, you can try compiling the Grammar files to other target languages
## Installation ## Installation
``` ```bash
// use npm // use npm
npm i dt-sql-parser --save npm i dt-sql-parser --save
@ -33,45 +29,60 @@ npm i dt-sql-parser --save
yarn add dt-sql-parser yarn add dt-sql-parser
``` ```
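## Examples ## Usage
### Clean ### Syntax Validation
Remove comments and leading/trailing whitespace First, declare the corresponding Parser object. Different SQL types require different Parser objects; for example, for
`Flink SQL`, import the `FlinkSQL` object separately. Here we use `GenericSQL` as the example: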
```javascript ```javascript
import { cleanSql } from 'dt-sql-parser'; import { GenericSQL } from 'dt-sql-parser';
const sql = `-- comment comment const parser = new GenericSQL();
select id,name from user1; `
const cleanedSql = cleanSql(sql)
console.log(cleanedSql)
const correctSql = 'select id,name from user1;';
const errors = parser.validate(correctSql);
console.log(errors);
```
Output:
```javascript
/* /*
select id,name from user1; []
*/ */
``` ```
### Split Example of failed validation:
Split SQL
```javascript ```javascript
import { splitSql } from 'dt-sql-parser'; const incorrectSql = 'selec id,name from user1;'
const errors = parser.validate(incorrectSql);
console.log(errors);
```
const sql = `select id,name from user1; Output:
select id,name from user2;`
const sqlList = splitSql(sql)
console.log(sqlList)
```javascript
/* /*
["select id,name from user1;", "\nselect id,name from user2;"] [
{
endCol: 5,
endLine: 1,
startCol: 0,
startLine: 1,
message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', 'ANALYZE', 'CALL', 'CHANGE', 'CHECK', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXPLAIN', 'GET', 'GRANT', 'INSERT', 'KILL', 'LOAD', 'LOCK', 'OPTIMIZE', 'PURGE', 'RELEASE', 'RENAME', 'REPLACE', 'RESIGNAL', 'REVOKE', 'SELECT', 'SET', 'SHOW', 'SIGNAL', 'UNLOCK', 'UPDATE', 'USE', 'BEGIN', 'BINLOG', 'CACHE', 'CHECKSUM', 'COMMIT', 'DEALLOCATE', 'DO', 'FLUSH', 'HANDLER', 'HELP', 'INSTALL', 'PREPARE', 'REPAIR', 'RESET', 'ROLLBACK', 'SAVEPOINT', 'START', 'STOP', 'TRUNCATE', 'UNINSTALL', 'XA', 'EXECUTE', 'SHUTDOWN', '--', '(', ';'}"
}
]
*/ */
``` ```
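### Tokens First instantiate the Parser object, then use the `validate` method to check the SQL statement; if validation fails, it returns an array containing `Error` information.
Run lexical analysis on the SQL statement to generate tokens ### Tokenizer
When needed, you can run lexical analysis on a SQL statement by itself to get all the Tokens objects: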
```javascript ```javascript
import { GenericSQL } from 'dt-sql-parser'; import { GenericSQL } from 'dt-sql-parser';
@ -99,47 +110,9 @@ console.log(tokens)
*/ */
``` ```
### Syntax validation ### Visitor
The validate method checks the syntactic correctness of a SQL statement and returns an array of errors Use the Visitor pattern to access specified nodes in the AST
```javascript
import { GenericSQL } from 'dt-sql-parser';
const validate = (sql) => {
const parser = new GenericSQL()
const errors = parser.validate(sql)
console.log(errors)
}
```
SQL with correct syntax:
```javascript
const correctSql = 'select id,name from user1;'
validate(correctSql)
/*
[]
*/
```
SQL with a syntax error:
```javascript
const incorrectSql = 'selec id,name from user1;'
validate(incorrectSql)
/*
[
{
endCol: 5,
endLine: 1,
startCol: 0,
startLine: 1,
message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', 'ANALYZE', 'CALL', 'CHANGE', 'CHECK', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXPLAIN', 'GET', 'GRANT', 'INSERT', 'KILL', 'LOAD', 'LOCK', 'OPTIMIZE', 'PURGE', 'RELEASE', 'RENAME', 'REPLACE', 'RESIGNAL', 'REVOKE', 'SELECT', 'SET', 'SHOW', 'SIGNAL', 'UNLOCK', 'UPDATE', 'USE', 'BEGIN', 'BINLOG', 'CACHE', 'CHECKSUM', 'COMMIT', 'DEALLOCATE', 'DO', 'FLUSH', 'HANDLER', 'HELP', 'INSTALL', 'PREPARE', 'REPAIR', 'RESET', 'ROLLBACK', 'SAVEPOINT', 'START', 'STOP', 'TRUNCATE', 'UNINSTALL', 'XA', 'EXECUTE', 'SHUTDOWN', '--', '(', ';'}"
}
]
*/
```
### Visitor
Use the visitor pattern to access specified nodes in the AST
```javascript ```javascript
import { GenericSQL, SqlParserVisitor } from 'dt-sql-parser'; import { GenericSQL, SqlParserVisitor } from 'dt-sql-parser';
@ -169,11 +142,12 @@ TableName user1
*/ */
``` ```
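Tip: when using the Visitor pattern, node method names can be found in the Visitor file under the corresponding SQL directory
### Listener > Tip: when using the Visitor pattern, node method names can be found in the Visitor file under the corresponding SQL directory
Listener pattern: uses the ParseTreeWalker object provided by [ANTLR4](https://github.com/antlr/antlr4) to traverse the AST, calling the corresponding method on entering each node. ### Listener
Listener pattern: uses the ParseTreeWalker object provided by [ANTLR4](https://github.com/antlr/antlr4) to traverse the AST, calling the corresponding method on entering each node.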
```javascript ```javascript
import { GenericSQL, SqlParserListener } from 'dt-sql-parser'; import { GenericSQL, SqlParserListener } from 'dt-sql-parser';
@ -202,11 +176,47 @@ TableName user1
``` ```
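Tip: when using the Listener pattern, node method names can be found in the Listener file under the corresponding SQL directory > Tip: when using the Listener pattern, node method names can be found in the Listener file under the corresponding SQL directory
### Other ### Clean comments
- parserTreeToString (parses SQL into an AST and converts it to a string) Remove comments and leading/trailing whitespace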
```javascript
import { cleanSql } from 'dt-sql-parser';
const sql = `-- comment comment
select id,name from user1; `
const cleanedSql = cleanSql(sql)
console.log(cleanedSql)
/*
select id,name from user1;
*/
```
### Split SQL
When the SQL is very large, we can first split it on `;` and then process it statement by statement.
```javascript
import { splitSql } from 'dt-sql-parser';
const sql = `select id,name from user1;
select id,name from user2;`
const sqlList = splitSql(sql)
console.log(sqlList)
/*
["select id,name from user1;", "\nselect id,name from user2;"]
*/
```
### Other API
- parserTreeToString (input: string)
Parses SQL into a `List-like` tree string; generally used for testing
## Roadmap ## Roadmap

155
README.md

@ -2,18 +2,16 @@
[![NPM version][npm-image]][npm-url] [![NPM version][npm-image]][npm-url]
English | [简体中文](./README-zh_CN.md)
[npm-image]: https://img.shields.io/npm/v/dt-sql-parser.svg?style=flat-square [npm-image]: https://img.shields.io/npm/v/dt-sql-parser.svg?style=flat-square
[npm-url]: https://www.npmjs.com/package/dt-sql-parser [npm-url]: https://www.npmjs.com/package/dt-sql-parser
English | [简体中文](./README-zh_CN.md) dt-sql-parser is a `SQL Parser` project built with [ANTLR4](https://github.com/antlr/antlr4), mainly for the `BigData` domain. [ANTLR4](https://github.com/antlr/antlr4) generates the basic Parser, Visitor, and Listener, so it's easy to `validate`, `tokenize`, and `traverse` the AST, among other features.
dt-sql-parser is a collection of SQL parsers developed based on [ANTLR4](https://github.com/antlr/antlr4). It's mainly used for parsing all kinds of SQL in big data development. Besides, it provides some helper methods, such as `split` for SQL and filtering the `--` and `/**/` types of comments in SQL.
It provides a basic class, a Visitor class, and a Listener class. These classes include the ability to generate tokens, generate the parse tree, validate syntax, and traverse the AST with the Visitor and Listener patterns. > Tips: This project targets Javascript by default; you can also try compiling it to other languages if needed.
In addition, several helper methods are provided to format the SQL before parsing. The main effect is to clear the '--' and '/**/' types of comments in SQL statements, and to split large chunks of SQL
tips: The Grammar file can also be compiled into other languages with [ANTLR4](https://github.com/antlr/antlr4) .
Supported SQL: Supported SQL:
@ -25,7 +23,7 @@ Supported SQL:
## Installation ## Installation
``` ```bash
// use npm // use npm
npm i dt-sql-parser --save npm i dt-sql-parser --save
@ -35,43 +33,61 @@ yarn add dt-sql-parser
## Usage ## Usage
### Clean ### Syntax Validation
Clear comments and leading/trailing spaces First, import the `Parser` object from `dt-sql-parser`. Different SQL dialects need
different Parsers, so if you need to handle `Flink SQL`, you can import the `FlinkSQL Parser`.
Below is a `GenericSQL Parser` example:
```javascript ```javascript
import { cleanSql } from 'dt-sql-parser'; import { GenericSQL } from 'dt-sql-parser';
const sql = `-- comment comment const parser = new GenericSQL();
select id,name from user1; `
const cleanedSql = cleanSql(sql)
console.log(cleanedSql)
const correctSql = 'select id,name from user1;';
const errors = parser.validate(correctSql);
console.log(errors);
```
output:
```javascript
/* /*
select id,name from user1; []
*/ */
``` ```
### Split Validation failure:
Split SQL
```javascript ```javascript
import { splitSql } from 'dt-sql-parser'; const incorrectSql = 'selec id,name from user1;'
const errors = parser.validate(incorrectSql);
console.log(errors);
```
const sql = `select id,name from user1; output:
select id,name from user2;`
const sqlList = splitSql(sql)
console.log(sqlList)
```javascript
/* /*
["select id,name from user1;", "\nselect id,name from user2;"] [
{
endCol: 5,
endLine: 1,
startCol: 0,
startLine: 1,
message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', 'ANALYZE', 'CALL', 'CHANGE', 'CHECK', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXPLAIN', 'GET', 'GRANT', 'INSERT', 'KILL', 'LOAD', 'LOCK', 'OPTIMIZE', 'PURGE', 'RELEASE', 'RENAME', 'REPLACE', 'RESIGNAL', 'REVOKE', 'SELECT', 'SET', 'SHOW', 'SIGNAL', 'UNLOCK', 'UPDATE', 'USE', 'BEGIN', 'BINLOG', 'CACHE', 'CHECKSUM', 'COMMIT', 'DEALLOCATE', 'DO', 'FLUSH', 'HANDLER', 'HELP', 'INSTALL', 'PREPARE', 'REPAIR', 'RESET', 'ROLLBACK', 'SAVEPOINT', 'START', 'STOP', 'TRUNCATE', 'UNINSTALL', 'XA', 'EXECUTE', 'SHUTDOWN', '--', '(', ';'}"
}
]
*/ */
``` ```
### Tokens We instantiate a Parser object and use the `validate` method to check the SQL syntax; if validation fails, it
returns an array containing the `error` messages.
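The error objects returned by `validate` are plain data, so they are easy to post-process. A minimal sketch, assuming only the field names visible in the sample output (`startLine`, `startCol`, `endLine`, `endCol`, `message`); `formatErrors` is a hypothetical helper and not part of dt-sql-parser:

```javascript
// Hypothetical helper: turns validate()-style error objects into readable lines.
// The field names come from the sample output in this README; the helper
// itself is an illustration, not a dt-sql-parser API.
function formatErrors(errors) {
  return errors.map(
    (e) => `${e.startLine}:${e.startCol}-${e.endLine}:${e.endCol} ${e.message}`
  );
}

// Sample input shaped like the failed-validation output (message shortened).
const sampleErrors = [
  { startLine: 1, startCol: 0, endLine: 1, endCol: 5, message: "mismatched input 'SELEC'" },
];

const lines = formatErrors(sampleErrors);
// An empty array means the SQL passed validation.
console.log(lines.length === 0 ? 'SQL is valid' : lines.join('\n'));
```

In real use, the array passed to `formatErrors` would be the return value of `parser.validate(sql)`.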
lexical analysis, generate token ### Tokenizer
You can also get all the `tokens` from the Parser:
```javascript ```javascript
import { GenericSQL } from 'dt-sql-parser'; import { GenericSQL } from 'dt-sql-parser';
@ -99,47 +115,9 @@ console.log(tokens)
*/ */
``` ```
### Syntax validation
verifies the syntax correctness of the SQL statement and returns an array of errors
```javascript
import { GenericSQL } from 'dt-sql-parser';
const validate = (sql) => {
const parser = new GenericSQL()
const errors = parser.validate(sql)
console.log(errors)
}
```
correct sql:
```javascript
const correctSql = 'select id,name from user1;'
validate(correctSql)
/*
[]
*/
```
incorrect sql:
```javascript
const incorrectSql = 'selec id,name from user1;'
validate(incorrectSql)
/*
[
{
endCol: 5,
endLine: 1,
startCol: 0,
startLine: 1,
message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', 'ANALYZE', 'CALL', 'CHANGE', 'CHECK', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXPLAIN', 'GET', 'GRANT', 'INSERT', 'KILL', 'LOAD', 'LOCK', 'OPTIMIZE', 'PURGE', 'RELEASE', 'RENAME', 'REPLACE', 'RESIGNAL', 'REVOKE', 'SELECT', 'SET', 'SHOW', 'SIGNAL', 'UNLOCK', 'UPDATE', 'USE', 'BEGIN', 'BINLOG', 'CACHE', 'CHECKSUM', 'COMMIT', 'DEALLOCATE', 'DO', 'FLUSH', 'HANDLER', 'HELP', 'INSTALL', 'PREPARE', 'REPAIR', 'RESET', 'ROLLBACK', 'SAVEPOINT', 'START', 'STOP', 'TRUNCATE', 'UNINSTALL', 'XA', 'EXECUTE', 'SHUTDOWN', '--', '(', ';'}"
}
]
*/
```
### Visitor ### Visitor
Access the specified node in the AST with the Visitor pattern Traverse the tree nodes with the Visitor:
```javascript ```javascript
import { GenericSQL, SqlParserVisitor } from 'dt-sql-parser'; import { GenericSQL, SqlParserVisitor } from 'dt-sql-parser';
@ -169,7 +147,8 @@ TableName user1
*/ */
``` ```
tips: The node's method name can be found in the Visitor file under the corresponding SQL directory
> Tips: The node's method name can be found in the Visitor file under the corresponding SQL directory
### Listener ### Listener
@ -202,11 +181,47 @@ TableName user1
``` ```
tips: The node's method name can be found in the Listener file under the corresponding SQL directory > Tips: The node's method name can be found in the Listener file under the corresponding SQL directory
### Other ### Clean
- parserTreeToString (parse the SQL into an AST and turn it into a String) Clear the `comments` and the leading and trailing `spaces`
```javascript
import { cleanSql } from 'dt-sql-parser';
const sql = `-- comment comment
select id,name from user1; `
const cleanedSql = cleanSql(sql)
console.log(cleanedSql)
/*
select id,name from user1;
*/
```
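To illustrate the idea behind `cleanSql`, here is a naive regex-based sketch. It is not the library's implementation; `naiveCleanSql` is a hypothetical name, and unlike a lexer-aware cleaner it would also strip `--` or `/* */` sequences that appear inside string literals:

```javascript
// Naive illustration only: strips comments with regular expressions.
// Assumption: the SQL contains no '--' or '/*' inside quoted strings.
function naiveCleanSql(sql) {
  return sql
    .replace(/\/\*[\s\S]*?\*\//g, '') // remove /* ... */ block comments
    .replace(/--.*$/gm, '')           // remove -- comments up to end of line
    .trim();                          // drop leading and trailing whitespace
}

const rawSql = `-- comment comment
select id,name from user1; `;
console.log(naiveCleanSql(rawSql));
```

For production input, prefer the library's own `cleanSql`, since a regex cannot tell a comment marker from the same characters inside a quoted value.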
### Split SQL
When the SQL text is very large, consider splitting it by `;` and handling each statement separately.
```javascript
import { splitSql } from 'dt-sql-parser';
const sql = `select id,name from user1;
select id,name from user2;`
const sqlList = splitSql(sql)
console.log(sqlList)
/*
["select id,name from user1;", "\nselect id,name from user2;"]
*/
```
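The splitting idea can be sketched without the library. The version below is a simplified stand-in under stated assumptions, not the actual `splitSql` implementation: `naiveSplitSql` is a hypothetical name, and it treats every `;` as a statement terminator, including ones inside string literals or comments:

```javascript
// Naive illustration only: cut the text after every ';'.
// Assumption: no ';' appears inside quoted strings or comments.
function naiveSplitSql(sql) {
  const statements = [];
  let current = '';
  for (const ch of sql) {
    current += ch;
    if (ch === ';') {
      statements.push(current); // keep the terminating ';' with its statement
      current = '';
    }
  }
  // Keep a trailing statement that has no ';', but skip pure whitespace.
  if (current.trim() !== '') {
    statements.push(current);
  }
  return statements;
}

const script = `select id,name from user1;
select id,name from user2;`;
console.log(naiveSplitSql(script));
```

Each resulting statement could then be passed to `parser.validate` one at a time, which keeps every error position relative to a single statement.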
### Other API
- parserTreeToString(input: string)
Parse the input and convert the AST to a `List-like` tree string.
## Roadmap ## Roadmap