# dt-sql-parser

[![NPM version][npm-image]][npm-url] [![NPM downloads][download-img]][download-url] [![Chat][online-chat-img]][online-chat-url]

English | [简体中文](./README-zh_CN.md)

[npm-image]: https://img.shields.io/npm/v/dt-sql-parser.svg?style=flat-square
[npm-url]: https://www.npmjs.com/package/dt-sql-parser

[download-img]: https://img.shields.io/npm/dm/dt-sql-parser.svg?style=flat
[download-url]: https://www.npmjs.com/package/dt-sql-parser

[online-chat-img]: https://img.shields.io/discord/920616811261743104?logo=Molecule
[online-chat-url]: https://discord.gg/uVvq6mfPfa

dt-sql-parser is a **SQL Parser** project built with [ANTLR4](https://github.com/antlr/antlr4), aimed mainly at the **BigData** domain. [ANTLR4](https://github.com/antlr/antlr4) generates the basic Parser, Visitor, and Listener, which makes features such as **syntax validation**, **tokenization**, and **traversing** the AST easy to implement.

It also provides some helper methods, such as **splitting** SQL into statements and **Auto-Complete**.

**Supported SQL**:

- Generic SQL (MySQL)
- Flink SQL
- Spark SQL
- Hive SQL
- PL/SQL
- PostgreSQL
- Trino SQL

**Supported helper methods**

| SQL Type    | SQL Split | Auto-Complete |
| ----------- | --------- | ------------- |
| Generic SQL | WIP       | WIP           |
| Flink SQL   | ✅        | ✅            |
| Spark SQL   | ✅        | ✅            |
| Hive SQL    | ✅        | ✅            |
| PL/SQL      | WIP       | WIP           |
| PostgreSQL  | WIP       | WIP           |
| Trino SQL   | WIP       | WIP           |

> Tips: This project targets JavaScript by default. You can also try compiling it to other languages if needed.

<br/>

## Integrating SQL Parser with Monaco Editor

We provide the [monaco-sql-languages](https://github.com/DTStack/monaco-sql-languages) package, which makes it easy to integrate with `monaco-editor`.

<br/>

## Installation

```bash
# use npm
npm i dt-sql-parser --save

# use yarn
yarn add dt-sql-parser
```

<br/>

## Usage

Before you get started, you need to understand the basics. `dt-sql-parser` provides a parser class for each supported SQL type:

```javascript
import { GenericSQL, FlinkSQL, SparkSQL, HiveSQL, PLSQL, PostgresSQL, TrinoSQL } from 'dt-sql-parser';
```

Before using syntax validation, autocompletion, or other methods, you need to instantiate the Parser for the corresponding SQL type. Taking `GenericSQL` as an example:

```javascript
const parser = new GenericSQL();
```

The usage examples below use `GenericSQL`; the Parsers for other SQL types are used in the same way.

<br/>

### Syntax Validation

```javascript
import GenericSQL from 'dt-sql-parser/dist/parser/generic';

const parser = new GenericSQL();
const correctSql = 'select id,name from user1;';
const errors = parser.validate(correctSql);
console.log(errors);
```

Output:

```javascript
/*
[]
*/
```

When validation fails:

```javascript
const incorrectSql = 'selec id,name from user1;';
const errors = parser.validate(incorrectSql);
console.log(errors);
```

Output:

```javascript
/*
[
    {
        endCol: 5,
        endLine: 1,
        startCol: 0,
        startLine: 1,
        message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', 'ANALYZE', 'CALL', 'CHANGE', 'CHECK', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXPLAIN', 'GET', 'GRANT', 'INSERT', 'KILL', 'LOAD', 'LOCK', 'OPTIMIZE', 'PURGE', 'RELEASE', 'RENAME', 'REPLACE', 'RESIGNAL', 'REVOKE', 'SELECT', 'SET', 'SHOW', 'SIGNAL', 'UNLOCK', 'UPDATE', 'USE', 'BEGIN', 'BINLOG', 'CACHE', 'CHECKSUM', 'COMMIT', 'DEALLOCATE', 'DO', 'FLUSH', 'HANDLER', 'HELP', 'INSTALL', 'PREPARE', 'REPAIR', 'RESET', 'ROLLBACK', 'SAVEPOINT', 'START', 'STOP', 'TRUNCATE', 'UNINSTALL', 'XA', 'EXECUTE', 'SHUTDOWN', '--', '(', ';'}"
    }
]
*/
```

We instantiate a Parser object and use the **validate** method to check the SQL syntax. If validation fails, it returns an array of objects, each containing an **error** message and its position.
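
Because each entry carries position fields alongside the message, the result can be turned into editor-style diagnostics. The following is only a sketch: `formatErrors` is a hypothetical helper, not part of `dt-sql-parser`, and the sample array is copied by hand from the output above with the message shortened.

```javascript
// Sample error objects, hand-copied from the README output (message shortened).
const sampleErrors = [
    {
        endCol: 5,
        endLine: 1,
        startCol: 0,
        startLine: 1,
        message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', ...}",
    },
];

// Hypothetical helper: render each error as "startLine:startCol-endLine:endCol message".
function formatErrors(errors) {
    return errors.map(
        (e) => `${e.startLine}:${e.startCol}-${e.endLine}:${e.endCol} ${e.message}`
    );
}

const formatted = formatErrors(sampleErrors);
formatted.forEach((line) => console.log(line));
```

In real use, the array returned by `parser.validate(sql)` would be passed in place of `sampleErrors`; an empty array means the SQL passed validation.
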

<br/>

### Tokenizer

Get all **tokens** by the Parser:

```javascript
import GenericSQL from 'dt-sql-parser/dist/parser/generic';

const parser = new GenericSQL();
const sql = 'select id,name,sex from user1;';
const tokens = parser.getAllTokens(sql);
console.log(tokens);
/*
[
    {
        channel: 0,
        column: 0,
        line: 1,
        source: [SqlLexer, InputStream],
        start: 0,
        stop: 5,
        tokenIndex: -1,
        type: 137,
        _text: null
    },
    ...
]
*/
```
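
The sample token above has `_text: null`, so its text has to be recovered from the original SQL string. A minimal sketch, assuming `start` and `stop` are inclusive character offsets (as `start: 0, stop: 5` suggests for a six-character word); `tokenText` is a hypothetical helper, not a library method:

```javascript
// Hypothetical helper: slice the token's text out of the original SQL.
// Assumes `stop` is inclusive, hence the `+ 1`.
function tokenText(sqlText, token) {
    return sqlText.slice(token.start, token.stop + 1);
}

const tokenSql = 'select id,name,sex from user1;';
// First token, with fields hand-copied from the output above.
const firstToken = { channel: 0, column: 0, line: 1, start: 0, stop: 5, type: 137 };

console.log(tokenText(tokenSql, firstToken)); // select
```
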

<br/>

### Visitor

Traverse the tree nodes with the Visitor:

```javascript
import GenericSQL from 'dt-sql-parser/dist/parser/generic';
import { SqlParserVisitor } from 'dt-sql-parser/dist/parser/generic/SqlParserVisitor';

const parser = new GenericSQL();
const sql = `select id,name from user1;`;
// parseTree
const tree = parser.parse(sql);

class MyVisitor extends SqlParserVisitor {
    // override visitTableName
    visitTableName(ctx) {
        let tableName = ctx.getText().toLowerCase();
        console.log('TableName', tableName);
    }
    // override visitSelectElements
    visitSelectElements(ctx) {
        let selectElements = ctx.getText().toLowerCase();
        console.log('SelectElements', selectElements);
    }
}
const visitor = new MyVisitor();
visitor.visit(tree);

/*
SelectElements id,name
TableName user1
*/
```

> Tips: The method name for each node can be found in the Visitor file under the corresponding SQL directory.

<br/>

### Listener

Access the specified node in the AST with the Listener:

```javascript
import GenericSQL from 'dt-sql-parser/dist/parser/generic';
import { SqlParserListener } from 'dt-sql-parser/dist/parser/generic/SqlParserListener';

const parser = new GenericSQL();
const sql = 'select id,name from user1;';
// parseTree
const tree = parser.parse(sql);

class MyListener extends SqlParserListener {
    enterTableName(ctx) {
        let tableName = ctx.getText().toLowerCase();
        console.log('TableName', tableName);
    }
    enterSelectElements(ctx) {
        let selectElements = ctx.getText().toLowerCase();
        console.log('SelectElements', selectElements);
    }
}
const listenTableName = new MyListener();
parser.listen(listenTableName, tree);

/*
SelectElements id,name
TableName user1
*/
```

> Tips: The method name for each node can be found in the Listener file under the corresponding SQL directory.

<br/>

### Split SQL by statement

Take `FlinkSQL` as an example:

```javascript
import { FlinkSQL } from 'dt-sql-parser';

const parser = new FlinkSQL();
const sql = 'SHOW TABLES;\nSELECT * FROM tb;';
const sqlSlices = parser.splitSQLByStatement(sql);
console.log(sqlSlices);

/*
[
    {
        startIndex: 0,
        endIndex: 11,
        startLine: 1,
        endLine: 1,
        startColumn: 1,
        endColumn: 12,
        text: 'SHOW TABLES;'
    },
    {
        startIndex: 13,
        endIndex: 29,
        startLine: 2,
        endLine: 2,
        startColumn: 1,
        endColumn: 17,
        text: 'SELECT * FROM tb;'
    }
]
*/
```
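
One way to use these slices is to find which statement a cursor offset falls in, for example to run only the statement under the caret. The sketch below is self-contained and uses the slice data hand-copied from the output above; `findStatementAtOffset` is a hypothetical helper, and it assumes `startIndex` and `endIndex` are inclusive offsets into the original string, as the sample output indicates.

```javascript
const splitSql = 'SHOW TABLES;\nSELECT * FROM tb;';

// Slices hand-copied from the README output (line/column fields omitted).
const sampleSlices = [
    { startIndex: 0, endIndex: 11, text: 'SHOW TABLES;' },
    { startIndex: 13, endIndex: 29, text: 'SELECT * FROM tb;' },
];

// Hypothetical helper: return the slice covering `offset`, or null if the
// offset falls between statements (for example on the newline).
function findStatementAtOffset(slices, offset) {
    return slices.find((s) => offset >= s.startIndex && offset <= s.endIndex) ?? null;
}

const hit = findStatementAtOffset(sampleSlices, 20);
console.log(hit ? hit.text : 'no statement at this offset');

// The inclusive indices can also be used to slice the original SQL directly.
console.log(splitSql.slice(hit.startIndex, hit.endIndex + 1) === hit.text);
```
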

<br/>

### Auto Complete

Get autocomplete information at a specified position in the SQL, using `FlinkSQL` as an example.

Call the `getSuggestionAtCaretPosition` method, passing in the SQL content and the line and column numbers of the position to be autocompleted.

+ Get a list of keyword candidates

```javascript
import { FlinkSQL } from 'dt-sql-parser';

const parser = new FlinkSQL();
const sql = 'CREATE ';
const pos = { lineNumber: 1, column: 16 }; // the end position
const keywords = parser.getSuggestionAtCaretPosition(sql, pos)?.keywords;
console.log(keywords);

/*
[ 'CATALOG', 'FUNCTION', 'TEMPORARY', 'VIEW', 'DATABASE', 'TABLE' ]
*/
```

+ Get syntax-related autocompletion information

```javascript
import { FlinkSQL } from 'dt-sql-parser';

const parser = new FlinkSQL();
const sql = 'SELECT * FROM tb';
const pos = { lineNumber: 1, column: 16 }; // after 'tb'
const syntaxSuggestions = parser.getSuggestionAtCaretPosition(sql, pos)?.syntax;
console.log(syntaxSuggestions);

/*
[
    {
        syntaxContextType: 'table',
        wordRanges: [
            {
                text: 'tb',
                startIndex: 14,
                stopIndex: 15,
                line: 1,
                startColumn: 15,
                stopColumn: 16
            }
        ]
    },
    {
        syntaxContextType: 'view',
        wordRanges: [
            {
                text: 'tb',
                startIndex: 14,
                stopIndex: 15,
                line: 1,
                startColumn: 15,
                stopColumn: 16
            }
        ]
    }
]
*/
```

Syntax-related autocomplete information is returned as an array, where each item describes one kind of syntax that can be filled in at that position. In the example above, the output means the position can be filled with a **table name** or a **view name**. `syntaxContextType` is the type of syntax that can be completed, and `wordRanges` is the text that has already been entered.
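
The parser reports only what kind of name fits at the caret; the actual candidate names have to come from the application. The sketch below shows one possible way to combine the two. Everything beyond the suggestion shape is an assumption for illustration: `sampleCatalog` and `suggestNames` are hypothetical, and the suggestion array is hand-copied from the output above with position fields omitted.

```javascript
// Suggestions hand-copied from the README output (position fields omitted).
const sampleSyntax = [
    { syntaxContextType: 'table', wordRanges: [{ text: 'tb' }] },
    { syntaxContextType: 'view', wordRanges: [{ text: 'tb' }] },
];

// Hypothetical catalog of known names, keyed by syntaxContextType.
const sampleCatalog = {
    table: ['tb_orders', 'tb_users', 'events'],
    view: ['tb_daily_view'],
};

// Hypothetical helper: for each suggestion, keep the catalog names of that
// type which start with the text already entered (case-insensitive).
function suggestNames(syntaxSuggestions, namesByType) {
    const result = [];
    for (const { syntaxContextType, wordRanges } of syntaxSuggestions) {
        const typed = wordRanges.map((w) => w.text).join('').toLowerCase();
        for (const name of namesByType[syntaxContextType] ?? []) {
            if (name.toLowerCase().startsWith(typed)) {
                result.push({ label: name, kind: syntaxContextType });
            }
        }
    }
    return result;
}

const items = suggestNames(sampleSyntax, sampleCatalog);
console.log(items.map((i) => `${i.kind}: ${i.label}`));
```

Prefix matching is just one simple filtering choice; an editor integration might prefer fuzzy matching or leave filtering to the editor itself.
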

<br/>

### Other API

- `createLexer`: Creates and returns an ANTLR4 Lexer instance;
- `createParser`: Creates and returns an ANTLR4 Parser instance;
- `parse`: Parses the input SQL and returns the parse tree.

<br/>

## License

[MIT](./LICENSE)