# dt-sql-parser

[![NPM version][npm-image]][npm-url] [![NPM downloads][download-img]][download-url] [![Chat][online-chat-img]][online-chat-url]

[npm-image]: https://img.shields.io/npm/v/dt-sql-parser.svg?style=flat-square
[npm-url]: https://www.npmjs.com/package/dt-sql-parser

[download-img]: https://img.shields.io/npm/dm/dt-sql-parser.svg?style=flat
[download-url]: https://www.npmjs.com/package/dt-sql-parser

[online-chat-img]: https://img.shields.io/discord/920616811261743104?logo=Molecule
[online-chat-url]: https://discord.gg/uVvq6mfPfa

English | [简体中文](./README-zh_CN.md)

dt-sql-parser is a **SQL Parser** project built with [ANTLR4](https://github.com/antlr/antlr4), aimed mainly at the **big data** field. [ANTLR4](https://github.com/antlr/antlr4) generates the basic Parser, Visitor, and Listener, which makes features such as **syntax validation**, **tokenization**, and **AST traversal** straightforward to implement.

Additionally, it provides auxiliary functions such as **SQL splitting** and **code completion**.

**Supported SQL**:

- MySQL
- Flink SQL
- Spark SQL
- Hive SQL
- PostgreSQL
- Trino SQL
- Impala SQL

**Supported auxiliary methods**

| SQL Type    | SQL Splitting | Code Completion |
| ----------- | ------------- | --------------- |
| MySQL       | ✅            | ✅              |
| Flink SQL   | ✅            | ✅              |
| Spark SQL   | ✅            | ✅              |
| Hive SQL    | ✅            | ✅              |
| PostgreSQL  | ✅            | ✅              |
| Trino SQL   | ✅            | ✅              |
| Impala SQL  | ✅            | ✅              |

> Tip: This project targets JavaScript by default; you can also try compiling it to other languages if needed.

<br/>

## Integrating SQL Parser with Monaco Editor

We provide [monaco-sql-languages](https://github.com/DTStack/monaco-sql-languages), which makes it easy to integrate with `monaco-editor`.

> Tip: If you want to run `dt-sql-parser` in the browser, remember to install the `assert` and `util` polyfills and to define the global variable `process.env`.
> None of this is needed in Node.js, which has them built in.

<br/>

## Installation

```bash
# use npm
npm i dt-sql-parser --save

# use yarn
yarn add dt-sql-parser
```

<br/>

## Usage

We recommend learning the fundamental usage before continuing. The dt-sql-parser library provides a SQL Parser class for each supported SQL type.

```javascript
import { MySQL, FlinkSQL, SparkSQL, HiveSQL, PostgresSQL, TrinoSQL, ImpalaSQL } from 'dt-sql-parser';
```

Before using syntax validation, code completion, or other features, instantiate the Parser for the relevant SQL type.
For example, with `MySQL`:

```javascript
const parser = new MySQL();
```

The following examples use `MySQL`; the Parsers for other SQL types are used in the same way.

### Syntax Validation

```javascript
import { MySQL } from 'dt-sql-parser';

const parser = new MySQL();
const correctSql = 'select id,name from user1;';
const errors = parser.validate(correctSql);
console.log(errors);
```

*output:*

```javascript
/*
[]
*/
```

**Validation failure:**

```javascript
const incorrectSql = 'selec id,name from user1;';
const errors = parser.validate(incorrectSql);
console.log(errors);
```

*output:*

```javascript
/*
[
    {
        endCol: 5,
        endLine: 1,
        startCol: 0,
        startLine: 1,
        message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', 'ANALYZE', 'CALL', 'CHANGE', 'CHECK', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXPLAIN', 'GET', 'GRANT', 'INSERT', 'KILL', 'LOAD', 'LOCK', 'OPTIMIZE', 'PURGE', 'RELEASE', 'RENAME', 'REPLACE', 'RESIGNAL', 'REVOKE', 'SELECT', 'SET', 'SHOW', 'SIGNAL', 'UNLOCK', 'UPDATE', 'USE', 'BEGIN', 'BINLOG', 'CACHE', 'CHECKSUM', 'COMMIT', 'DEALLOCATE', 'DO', 'FLUSH', 'HANDLER', 'HELP', 'INSTALL', 'PREPARE', 'REPAIR', 'RESET', 'ROLLBACK', 'SAVEPOINT', 'START', 'STOP', 'TRUNCATE', 'UNINSTALL', 'XA', 'EXECUTE', 'SHUTDOWN', '--', '(', ';'}"
    }
]
*/
```

We instantiate a Parser object and use the **validate** method to check the SQL syntax. The method returns an array of **error** objects, which is empty when the SQL is valid.
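
The error objects can be turned into editor-style messages. The following is a minimal sketch, assuming each error carries the `startLine`, `startCol`, and `message` fields shown in the sample output; `formatErrors` is a hypothetical helper, not part of dt-sql-parser, and the sample error is written by hand with a shortened message.

```javascript
// Hypothetical helper (not part of dt-sql-parser): formats validation errors
// as "line:col message" strings. Field names follow the sample output above.
function formatErrors(errors) {
    return errors.map((e) => `${e.startLine}:${e.startCol} ${e.message}`);
}

// Hand-written sample shaped like the output above (message shortened).
const sampleErrors = [
    { startLine: 1, endLine: 1, startCol: 0, endCol: 5, message: "mismatched input 'SELEC'" },
];

const lines = formatErrors(sampleErrors);
console.log(lines.length === 0 ? 'SQL is valid' : lines.join('\n'));
```

In real use, `sampleErrors` would be replaced by the array returned from `parser.validate(sql)`.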

### Tokenizer

Get all **tokens** from the Parser:

```javascript
import { MySQL } from 'dt-sql-parser';

const parser = new MySQL();
const sql = 'select id,name,sex from user1;';
const tokens = parser.getAllTokens(sql);
console.log(tokens);
```

*output:*

```javascript
/*
[
    {
        channel: 0
        column: 0
        line: 1
        source: [SqlLexer, InputStream]
        start: 0
        stop: 5
        tokenIndex: -1
        type: 137
        _text: null
    },
    ...
]
*/
```
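
Since `_text` is `null` in the sample output, a token's text may need to be recovered from the original SQL. The sketch below assumes that `start` and `stop` are inclusive character offsets into the input, which is what the first token (`select`, 0 to 5) suggests; `tokenText` is a hypothetical helper and the token object is written by hand.

```javascript
// Hypothetical helper: recovers a token's text from the original SQL,
// assuming `start` and `stop` are inclusive character offsets.
function tokenText(sql, token) {
    return sql.slice(token.start, token.stop + 1);
}

const sql = 'select id,name,sex from user1;';
// Hand-written token shaped like the first entry of the output above.
const firstToken = { start: 0, stop: 5, line: 1, column: 0 };

console.log(tokenText(sql, firstToken)); // select
```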

### Visitor

Traverse the tree nodes with a Visitor:

```typescript
import { MySQL, AbstractParseTreeVisitor } from 'dt-sql-parser';
import type { MySqlParserVisitor } from 'dt-sql-parser';

const parser = new MySQL();
const sql = `select id,name from user1;`;
const tree = parser.parse(sql);

type Result = string;

class MyVisitor extends AbstractParseTreeVisitor<Result> implements MySqlParserVisitor<Result> {
    protected defaultResult() {
        return '';
    }
    visitTableName(ctx) {
        let tableName = ctx.text.toLowerCase();
        console.log('TableName:', tableName);
        return '';
    }
    visitSelectElements(ctx) {
        let selectElements = ctx.text.toLowerCase();
        console.log('SelectElements:', selectElements);
        return '';
    }
    visitProgram(ctx) {
        // Visit the child nodes first so that the handlers above are invoked
        this.visitChildren(ctx);
        return 'Return by program node';
    }
}
const visitor = new MyVisitor();
const result = visitor.visit(tree);
console.log(result);
```

*output:*

```javascript
/*
SelectElements: id,name
TableName: user1
*/
/*
Return by program node
*/
```

> Tip: The method names for each node can be found in the Visitor file under the corresponding SQL directory.

### Listener

Access specific nodes in the AST with a Listener:

```typescript
import { MySQL } from 'dt-sql-parser';
import type { MySqlParserListener } from 'dt-sql-parser';

const parser = new MySQL();
const sql = 'select id,name from user1;';
const parseTree = parser.parse(sql);

class MyListener implements MySqlParserListener {
    enterTableName(ctx) {
        let tableName = ctx.text.toLowerCase();
        console.log('TableName:', tableName);
    }
    enterSelectElements(ctx) {
        let selectElements = ctx.text.toLowerCase();
        console.log('SelectElements:', selectElements);
    }
}
const listenTableName = new MyListener();
parser.listen(listenTableName as MySqlParserListener, parseTree);
```

*output:*

```javascript
/*
SelectElements: id,name
TableName: user1
*/
```

> Tip: The method names for each node can be found in the Listener file under the corresponding SQL directory.
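
A listener can also collect values instead of logging them. The sketch below is an assumption-laden illustration: `TableNameCollector` is a hypothetical class that reuses the `enterTableName` hook and the `ctx.text` property from the example above, and it is exercised here with a hand-made stand-in context rather than a real parse tree.

```javascript
// Hypothetical listener that accumulates table names instead of logging them.
class TableNameCollector {
    constructor() {
        this.tableNames = [];
    }
    // Same hook name as in the example above; `ctx.text` is the node's source text.
    enterTableName(ctx) {
        this.tableNames.push(ctx.text.toLowerCase());
    }
}

// Stand-in context object for illustration only; a real one comes from the parse tree.
const collector = new TableNameCollector();
collector.enterTableName({ text: 'USER1' });
console.log(collector.tableNames);
```

With the library, an instance would presumably be passed to `parser.listen` together with the parse tree, as in the example above, and `collector.tableNames` read afterwards.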

### Splitting SQL statements

Take `FlinkSQL` as an example:

```javascript
import { FlinkSQL } from 'dt-sql-parser';

const parser = new FlinkSQL();
const sql = 'SHOW TABLES;\nSELECT * FROM tb;';
const sqlSlices = parser.splitSQLByStatement(sql);
console.log(sqlSlices);
```

*output:*

```javascript
/*
[
    {
        startIndex: 0,
        endIndex: 11,
        startLine: 1,
        endLine: 1,
        startColumn: 1,
        endColumn: 12,
        text: 'SHOW TABLES;'
    },
    {
        startIndex: 13,
        endIndex: 29,
        startLine: 2,
        endLine: 2,
        startColumn: 1,
        endColumn: 17,
        text: 'SELECT * FROM tb;'
    }
]
*/
```
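
One possible use of the slices is to find the statement under the cursor, for example to run only that statement. The sketch below assumes `startIndex` and `endIndex` are inclusive character offsets, as the sample output suggests; `findStatementAt` is a hypothetical helper, and the slices are copied from the output above with the line and column fields trimmed.

```javascript
// Hypothetical helper: returns the slice whose inclusive
// [startIndex, endIndex] range contains `offset`, or null if none does.
function findStatementAt(slices, offset) {
    return slices.find((s) => offset >= s.startIndex && offset <= s.endIndex) ?? null;
}

const sql = 'SHOW TABLES;\nSELECT * FROM tb;';
// Slices copied from the sample output above (line/column fields trimmed).
const slices = [
    { startIndex: 0, endIndex: 11, text: 'SHOW TABLES;' },
    { startIndex: 13, endIndex: 29, text: 'SELECT * FROM tb;' },
];

const current = findStatementAt(slices, 20);
console.log(current.text); // SELECT * FROM tb;
// The offsets map back onto the original string:
console.log(sql.slice(current.startIndex, current.endIndex + 1) === current.text); // true
```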

### Code Completion

Obtain code completion information at a specified position in the SQL.
The examples below use `FlinkSQL`.

Call the `getSuggestionAtCaretPosition` method with the SQL content and the line and column numbers of the position where completion is desired.

+ **Keyword candidates list**

```javascript
import { FlinkSQL } from 'dt-sql-parser';

const parser = new FlinkSQL();
const sql = 'CREATE ';
const pos = { lineNumber: 1, column: 16 }; // the end position
const keywords = parser.getSuggestionAtCaretPosition(sql, pos)?.keywords;
console.log(keywords);
```

*output:*

```javascript
/*
[ 'CATALOG', 'FUNCTION', 'TEMPORARY', 'VIEW', 'DATABASE', 'TABLE' ]
*/
```

+ **Syntax-related completion information**

```javascript
import { FlinkSQL } from 'dt-sql-parser';

const parser = new FlinkSQL();
const sql = 'SELECT * FROM tb';
const pos = { lineNumber: 1, column: 16 }; // after 'tb'
const syntaxSuggestions = parser.getSuggestionAtCaretPosition(sql, pos)?.syntax;
console.log(syntaxSuggestions);
```

*output:*

```javascript
/*
[
    {
        syntaxContextType: 'table',
        wordRanges: [
            {
                text: 'tb',
                startIndex: 14,
                stopIndex: 15,
                line: 1,
                startColumn: 15,
                stopColumn: 16
            }
        ]
    },
    {
        syntaxContextType: 'view',
        wordRanges: [
            {
                text: 'tb',
                startIndex: 14,
                stopIndex: 15,
                line: 1,
                startColumn: 15,
                stopColumn: 16
            }
        ]
    }
]
*/
```

The syntax-related completion information is returned as an array in which each item describes a kind of syntax that can be filled in at that position. In the example above, the output indicates that the position can be filled with either a **table name** or a **view name**. Here, `syntaxContextType` is the type of syntax that can be completed, and `wordRanges` is the content that has already been typed.
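
To show how these fields might be consumed, the sketch below converts syntax suggestions into a simpler shape that an editor integration could use to look up candidates. `describeSuggestions` and the `LABELS` mapping are hypothetical and not part of dt-sql-parser; the sample data is copied from the output above with the position fields trimmed.

```javascript
// Hypothetical mapping from `syntaxContextType` to a human-readable label.
const LABELS = { table: 'table name', view: 'view name' };

// Hypothetical helper: summarizes each suggestion as what is expected
// at the caret and what the user has already typed.
function describeSuggestions(syntaxSuggestions) {
    return syntaxSuggestions.map((s) => ({
        expects: LABELS[s.syntaxContextType] ?? s.syntaxContextType,
        typed: s.wordRanges.map((w) => w.text).join(''),
    }));
}

// Sample copied from the output above (position fields trimmed).
const sampleSuggestions = [
    { syntaxContextType: 'table', wordRanges: [{ text: 'tb' }] },
    { syntaxContextType: 'view', wordRanges: [{ text: 'tb' }] },
];

for (const item of describeSuggestions(sampleSuggestions)) {
    console.log(`Expecting a ${item.expects}; typed so far: "${item.typed}"`);
}
```

An integration could then use `typed` as a prefix filter over the table or view names it knows about.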

### Other API

- `createLexer`: creates and returns an instance of the ANTLR4 Lexer;
- `createParser`: creates and returns an instance of the ANTLR4 Parser;
- `parse`: parses the input SQL and returns the parse tree.

<br/>

## License

[MIT](./LICENSE)