dt-sql-parser


English | 简体中文

dt-sql-parser is a SQL parser built with ANTLR4, aimed mainly at the big data domain. ANTLR4 generates the basic Parser, Visitor, and Listener, which makes it easy to implement features such as syntax validation, tokenization, and AST traversal.

In addition, it provides some helper methods, such as splitting SQL into statements and filtering out -- and /**/ style comments.

Supported SQL:

  • Generic SQL (MySQL)
  • Flink SQL
  • Spark SQL
  • Hive SQL
  • PL/SQL
  • PostgreSQL
  • Trino SQL

Tips: This project targets JavaScript by default, but you can also try compiling it to other languages if you need to.

Integrating SQL Parser with Monaco Editor

We provide a monaco-sql-languages package that makes it easy to integrate with monaco-editor.

Installation

# use npm
npm i dt-sql-parser --save

# use yarn
yarn add dt-sql-parser

Usage

Syntax Validation

First, import the Parser object from dt-sql-parser. Each SQL dialect has its own Parser, so to handle Flink SQL, for example, import the FlinkSQL Parser.

Below is a GenericSQL Parser example:

import GenericSQL from 'dt-sql-parser/dist/parser/generic';

const parser = new GenericSQL();

const correctSql = 'select id,name from user1;';
const errors = parser.validate(correctSql);
console.log(errors); 

Output:

/*
[]
*/

When validation fails:

const incorrectSql = 'selec id,name from user1;'
const errors = parser.validate(incorrectSql);
console.log(errors); 

Output:

/*
[
    {
        endCol: 5,
        endLine: 1,
        startCol: 0,
        startLine: 1,
        message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', 'ANALYZE', 'CALL', 'CHANGE', 'CHECK', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXPLAIN', 'GET', 'GRANT', 'INSERT', 'KILL', 'LOAD', 'LOCK', 'OPTIMIZE', 'PURGE', 'RELEASE', 'RENAME', 'REPLACE', 'RESIGNAL', 'REVOKE', 'SELECT', 'SET', 'SHOW', 'SIGNAL', 'UNLOCK', 'UPDATE', 'USE', 'BEGIN', 'BINLOG', 'CACHE', 'CHECKSUM', 'COMMIT', 'DEALLOCATE', 'DO', 'FLUSH', 'HANDLER', 'HELP', 'INSTALL', 'PREPARE', 'REPAIR', 'RESET', 'ROLLBACK', 'SAVEPOINT', 'START', 'STOP', 'TRUNCATE', 'UNINSTALL', 'XA', 'EXECUTE', 'SHUTDOWN', '--', '(', ';'}"
    }
]
*/

We instantiate a Parser object and use the validate method to check the SQL syntax. If validation fails, it returns an array of error objects, each containing the error position and message.
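
The error objects are plain data, so they are easy to post-process. Below is a minimal sketch that turns them into one readable line per error. The formatErrors helper is hypothetical (it is not part of dt-sql-parser), and it assumes only the five fields visible in the output above.

```javascript
// Hypothetical helper, not part of dt-sql-parser: turn validate() results
// into one readable line per error.
function formatErrors(errors) {
    return errors.map((err) => {
        // Shorten very long "expecting {...}" lists so each line stays readable.
        const message =
            err.message.length > 80 ? err.message.slice(0, 77) + '...' : err.message;
        return `${err.startLine}:${err.startCol}-${err.endLine}:${err.endCol} ${message}`;
    });
}

// Sample input that mirrors the shape of the failed validation above
// (the message is abbreviated here for illustration).
const sampleErrors = [
    {
        startLine: 1,
        startCol: 0,
        endLine: 1,
        endCol: 5,
        message: "mismatched input 'SELEC' expecting {<EOF>, 'ALTER', 'ANALYZE'}",
    },
];

console.log(formatErrors(sampleErrors));
```

In real use, the array returned by parser.validate(sql) would be passed in place of sampleErrors.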

Tokenizer

Get all tokens with the Parser:

import GenericSQL from 'dt-sql-parser/dist/parser/generic';

const parser = new GenericSQL()
const sql = 'select id,name,sex from user1;'
const tokens = parser.getAllTokens(sql)
console.log(tokens)
/*
[
    {
        channel: 0
        column: 0
        line: 1
        source: [SqlLexer, InputStream]
        start: 0
        stop: 5
        tokenIndex: -1
        type: 137
        _text: null
    },
    ...
]
*/
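
In the sample output the token's _text field is null, so the text may need to be recovered from the original SQL. The sketch below does that with the start and stop offsets. The tokenText helper is hypothetical, and it assumes stop is inclusive, which the sample token suggests (start 0, stop 5 for the six-character keyword select).

```javascript
// Hypothetical helper, not part of dt-sql-parser: recover a token's text
// from the original SQL using its start/stop character offsets.
// Assumption: `stop` is an inclusive index.
function tokenText(sql, token) {
    return sql.slice(token.start, token.stop + 1);
}

const sql = 'select id,name,sex from user1;';
// Only the fields used here; real tokens carry more (channel, type, ...).
const firstToken = { start: 0, stop: 5, line: 1, column: 0 };

console.log(tokenText(sql, firstToken));
```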

Visitor

Traverse the tree nodes with the Visitor:

import GenericSQL from 'dt-sql-parser/dist/parser/generic';
import { SqlParserVisitor } from 'dt-sql-parser/dist/parser/generic/SqlParserVisitor';

const parser = new GenericSQL()
const sql = `select id,name from user1;`
// parseTree
const tree = parser.parse(sql)
class MyVisitor extends SqlParserVisitor {
    // override visitTableName
    visitTableName(ctx) {
        let tableName = ctx.getText().toLowerCase()
        console.log('TableName', tableName)
    }
    // override visitSelectElements
    visitSelectElements(ctx) {
        let selectElements = ctx.getText().toLowerCase()
        console.log('SelectElements', selectElements)
    }
}
const visitor = new MyVisitor()
visitor.visit(tree)

/*
SelectElements id,name
TableName user1
*/

Tips: The method name for each node can be found in the Visitor file under the corresponding SQL directory.
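
To make the dispatch behind those method names concrete, here is a toy model of the visitor pattern. Node, BaseVisitor, and TableNameCollector are hypothetical stand-ins written for this sketch; they are not the ANTLR runtime or dt-sql-parser classes. The sketch shows two things: a node calls the visit method named after its rule when one exists, and an overridden method that does not visit its children stops the descent at that node.

```javascript
// Toy model of visitor dispatch; these classes are hypothetical stand-ins,
// not the ANTLR runtime or dt-sql-parser classes.
class Node {
    constructor(rule, text, children = []) {
        this.rule = rule; // e.g. 'TableName'
        this.text = text;
        this.children = children;
    }
    accept(visitor) {
        const method = visitor['visit' + this.rule];
        // Fall back to visiting children when no specific method exists.
        return method ? method.call(visitor, this) : visitor.visitChildren(this);
    }
}

class BaseVisitor {
    visit(tree) {
        return tree.accept(this);
    }
    visitChildren(node) {
        for (const child of node.children) child.accept(this);
    }
}

class TableNameCollector extends BaseVisitor {
    constructor() {
        super();
        this.tables = [];
    }
    visitTableName(node) {
        // Collect into an array instead of logging.
        this.tables.push(node.text.toLowerCase());
        // No visitChildren call here, so traversal stops at this node.
    }
}

const tree = new Node('Select', 'select id from User1', [
    new Node('SelectElements', 'id'),
    new Node('TableName', 'User1', [new Node('Id', 'User1')]),
]);
const collector = new TableNameCollector();
collector.visit(tree);
console.log(collector.tables);
```

Collecting results on the visitor instance, as above, is one way to get data out of a traversal without relying on console output.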

Listener

Access specific nodes in the AST with the Listener:

import GenericSQL from 'dt-sql-parser/dist/parser/generic';
import { SqlParserListener } from 'dt-sql-parser/dist/parser/generic/SqlParserListener';

const parser = new GenericSQL();
const sql = 'select id,name from user1;'
// parseTree
const tree = parser.parse(sql)
class MyListener extends SqlParserListener {
    enterTableName(ctx) {
        let tableName = ctx.getText().toLowerCase()
        console.log('TableName', tableName)
    }
    enterSelectElements(ctx) {
        let selectElements = ctx.getText().toLowerCase()
        console.log('SelectElements', selectElements)
    }
}
const listenTableName = new MyListener();
parser.listen(listenTableName, tree);

/*
SelectElements id,name
TableName user1
*/

Tips: The method name for each node can be found in the Listener file under the corresponding SQL directory.

Clean

Remove comments and any leading and trailing whitespace:

import { cleanSql } from 'dt-sql-parser';

const sql = `-- comment comment
select id,name from user1; `
const cleanedSql = cleanSql(sql)
console.log(cleanedSql)

/*
select id,name from user1;
*/
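
For intuition about what such a cleanup involves, here is a self-contained sketch. It is illustrative only: stripSqlComments is a hypothetical function, and the actual cleanSql implementation may work differently. It removes -- line comments and /* */ block comments, leaves single-quoted strings untouched, and trims the result.

```javascript
// Illustrative sketch only; dt-sql-parser's cleanSql may work differently.
function stripSqlComments(sql) {
    let out = '';
    let i = 0;
    while (i < sql.length) {
        const two = sql.slice(i, i + 2);
        if (sql[i] === "'") {
            // Copy the string literal verbatim (to the end if it is unterminated).
            let end = sql.indexOf("'", i + 1);
            if (end === -1) end = sql.length - 1;
            out += sql.slice(i, end + 1);
            i = end + 1;
        } else if (two === '--') {
            // Skip to the end of the line; the newline itself is kept.
            const end = sql.indexOf('\n', i);
            i = end === -1 ? sql.length : end;
        } else if (two === '/*') {
            // Skip just past the closing */ (or to the end if unterminated).
            const end = sql.indexOf('*/', i + 2);
            i = end === -1 ? sql.length : end + 2;
        } else {
            out += sql[i];
            i += 1;
        }
    }
    return out.trim();
}

const commented = '-- comment comment\nselect id,name from user1; ';
console.log(stripSqlComments(commented));
```

Tracking string literals matters because text such as '--x' inside quotes is data, not a comment.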

Split SQL

When the SQL text is very large, consider splitting it on ; and handling each statement separately.

import { splitSql } from 'dt-sql-parser';

const sql = `select id,name from user1;
select id,name from user2;`
const sqlList = splitSql(sql)
console.log(sqlList)

/*
["select id,name from user1;", "\nselect id,name from user2;"]
*/
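
Splitting on ; is less trivial than it looks, because a semicolon can appear inside a string literal. The sketch below shows one naive way to handle that. It is illustrative only: splitStatements is a hypothetical function, the actual splitSql implementation may work differently, and this version does not account for semicolons inside comments.

```javascript
// Illustrative sketch only; dt-sql-parser's splitSql may work differently.
// Splits on `;` outside single-quoted strings and keeps the delimiter at
// the end of each piece, without trimming.
function splitStatements(sql) {
    const statements = [];
    let current = '';
    let inString = false;
    for (const ch of sql) {
        current += ch;
        if (ch === "'") {
            // An escaped quote ('') toggles twice, so the state is preserved.
            inString = !inString;
        } else if (ch === ';' && !inString) {
            statements.push(current);
            current = '';
        }
    }
    // Keep trailing text without a final `;`, unless it is only whitespace.
    if (current.trim() !== '') statements.push(current);
    return statements;
}

const twoStatements = 'select id,name from user1;\nselect id,name from user2;';
console.log(splitStatements(twoStatements));
```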

Other API

  • parserTreeToString(input: string)

Parse the input and convert the AST to a Lisp-like tree string.

Roadmap

  • Auto-complete
  • Code formatting

License

MIT