Feat/range boundary (#241)

* feat: unify the index, line and column in all APIs
* docs: describe index, line and column
2023-12-22 19:12:29 +08:00
parent f6bc7594e1
commit 9c52542ec7
9 changed files with 159 additions and 11 deletions
--- a/README-zh_CN.md
+++ b/README-zh_CN.md
@@ -282,7 +282,7 @@ console.log(sqlSlices)
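 ### Code Completion
 Get code completion information at a specified position in the SQL text, taking `FlinkSQL` as an example:

-Call the `getSuggestionAtCaretPosition` method, passing in the SQL content and the line and column numbers of the specified position.
+Call the `getSuggestionAtCaretPosition` method, passing in the SQL content and the line and column numbers of the specified position. See [CaretPosition of Code Completion](#caretposition-of-code-completion) below for some additional notes on the completion position.
 + **Get the keyword candidates list**

    ```javascript
@@ -355,6 +355,75 @@ console.log(sqlSlices)

 <br/>

+## About Text Positions and Text Ranges
+The results of some `dt-sql-parser` APIs include text position information. Line numbers, column numbers and indexes start at different values and their ranges end differently, which may cause some confusion.
+
+### Index
+Indexes start at 0, which is more intuitive in programming.
+
+![index-image](./docs/images/index.png)
+
+An index range is inclusive at both ends: a range of n characters starting at index 0 ends at n-1. In the figure above, the index range covering the blue text should be represented as follows:
+
+```javascript
+{
+    startIndex: 0,
+    endIndex: 3
+}
+```
+
+### Line
+Line numbers start at 1.
+
+![line-image](./docs/images/line.png)
+
+A line range is also inclusive at both ends: a range covering n lines starting from line 1 ends at n. A range covering the first and second lines is represented as follows:
+```javascript
+{
+    startLine: 1,
+    endLine: 2
+}
+```
+
+### Column
+Column numbers also start at 1.
+
+![column-image](./docs/images/column.png)
+
+Columns are easier to understand as cursor positions in an editor. A column range ends at the cursor position right after its last character, so a range of n characters starting at column 1 ends at n+1. In the figure above, the column range covering the blue text is represented as follows:
+
+```javascript
+{
+    startColumn: 1,
+    endColumn: 5
+}
+```
+
+### CaretPosition of Code Completion
+The code completion of dt-sql-parser was designed from the start for use in editors. For this reason, the second parameter (position information) of the `getSuggestionAtCaretPosition` method is a line and column number rather than a character index. This makes code completion easier to integrate into an editor. The editor only needs to read its text content and cursor position at the right moment to call the code completion of `dt-sql-parser`, with no extra calculation.
+
+In some other scenarios, however, you may need to convert or calculate the position information that code completion requires. If so, there are a few points you may want to note first.
+
+The code completion of dt-sql-parser depends on [antlr4-c3](https://github.com/mike-lischke/antlr4-c3), a great library. dt-sql-parser only wraps antlr4-c3 and performs some conversions on top of it, such as converting line and column numbers into the token index that antlr4-c3 requires. Take the following figure as an example:
+
+![token-image](./docs/images/token.png)
+
+Treat each column in the figure as a cursor position. Placed in an editor, this text has 13 possible cursor positions, while dt-sql-parser parses it into 4 tokens. An important strategy of code completion is: **as long as the cursor (the completion position) has not completely left a token, dt-sql-parser considers that token unfinished, and code completion infers what can be entered at that token's position.**
+
+For example, to find out through code completion what should follow `SHOW`, the position information should be:
+```javascript
+{
+    lineNumber: 1,
+    column: 6
+}
+```
+
+In this case, dt-sql-parser considers `SHOW` a complete token and infers what can follow `SHOW`. If the column in the position information is 5, dt-sql-parser considers `SHOW` unfinished and infers what can be entered at the position of `SHOW`. In other words, in the figure above, `column: 5` belongs to `token: 0` and `column: 6` belongs to `token: 1`.
+
+For editors, this strategy is also more intuitive. After typing `SHOW` but before pressing the space key, the user has most likely not finished typing and may want to enter something like `SHOWS`. Once the user presses the space key, the editor assumes the user wants to enter the next token. At that point it is time to ask dt-sql-parser what can be entered at the next token position.
+
+<br/>
+
 ## License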

 [MIT](./LICENSE)
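The index, line and column conventions described in the README section above can be illustrated with a small TypeScript sketch. The `SliceRange` interface and the `lineColumnAt` / `rangeOf` helpers are hypothetical and not part of the `dt-sql-parser` API. They only compute the values the README describes for a chosen substring:

- 0-based indexes with an inclusive end.
- 1-based lines with an inclusive end.
- 1-based columns whose end is the cursor position after the last character.

```typescript
// Hypothetical range shape mirroring the documented conventions
// (not an interface exported by dt-sql-parser).
interface SliceRange {
    startIndex: number; // 0-based index of the first character
    endIndex: number; // 0-based index of the last character (inclusive, n - 1)
    startLine: number; // 1-based line of the first character
    endLine: number; // 1-based line of the last character (inclusive, n)
    startColumn: number; // 1-based column of the first character
    endColumn: number; // cursor position right after the last character (n + 1)
}

// Convert a 0-based character offset into a 1-based line and column.
function lineColumnAt(input: string, offset: number): { line: number; column: number } {
    const lines = input.slice(0, offset).split('\n');
    return { line: lines.length, column: lines[lines.length - 1].length + 1 };
}

// Describe `length` (>= 1) characters of `input` starting at the 0-based `offset`.
function rangeOf(input: string, offset: number, length: number): SliceRange {
    const first = lineColumnAt(input, offset);
    const last = lineColumnAt(input, offset + length - 1); // position of the last character
    return {
        startIndex: offset,
        endIndex: offset + length - 1,
        startLine: first.line,
        endLine: last.line,
        startColumn: first.column,
        endColumn: last.column + 1, // one past the last character
    };
}

const sql = 'SHOW TABLES;\nSELECT 1;';
const showRange = rangeOf(sql, 0, 4);
// startIndex 0, endIndex 3, startLine 1, endLine 1, startColumn 1, endColumn 5
console.log(showRange);
// Because endIndex is inclusive, add 1 when slicing: prints "SHOW".
console.log(sql.slice(showRange.startIndex, showRange.endIndex + 1));
```

Recovering the text with `slice(startIndex, endIndex + 1)` shows why the inclusive `endIndex` needs a `+ 1` whenever it is fed to JavaScript's end-exclusive string APIs.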
--- a/README.md
+++ b/README.md
@@ -290,7 +290,7 @@ console.log(sqlSlices)
 Get code completion information at a specified position in the SQL text.
 Take `FlinkSQL` as an example.

-Invoke the `getSuggestionAtCaretPosition` method, pass the SQL content and the row and column numbers indicating the position where code completion is desired.
+Invoke the `getSuggestionAtCaretPosition` method, passing the SQL content and the line and column numbers of the position where code completion is desired. See [CaretPosition of Code Completion](#caretposition-of-code-completion) below for some additional notes.
 + **keyword candidates list**

    ```javascript
@@ -358,6 +358,76 @@ The grammar-related code completion information returns an array, where each ite

 <br/>

+## Position and Range
+The results of some `dt-sql-parser` APIs include text position information. Line numbers, column numbers and indexes start at different values and their ranges end differently, which may cause some confusion.
+
+### Index
+Indexes start at 0, which is more intuitive in programming.
+
+![index-image](./docs/images/index.png)
+
+An index range is inclusive at both ends: a range of n characters starting at index 0 ends at n-1. In the figure above, the index range covering the blue text is represented as follows:
+
+```javascript
+{
+    startIndex: 0,
+    endIndex: 3
+}
+```
+
+### Line
+Line numbers start at 1.
+
+![line-image](./docs/images/line.png)
+
+A line range is also inclusive at both ends: a range covering n lines starting from line 1 ends at n. A range covering the first and second lines is represented as follows:
+
+```javascript
+{
+    startLine: 1,
+    endLine: 2
+}
+```
+
+### Column
+Column numbers also start at 1.
+
+![column-image](./docs/images/column.png)
+
+Columns are easier to understand as cursor positions in an editor. A column range ends at the cursor position right after its last character, so a range of n characters starting at column 1 ends at n+1. In the figure above, the column range covering the blue text is represented as follows:
+
+```javascript
+{
+    startColumn: 1,
+    endColumn: 5
+}
+```
+
+### CaretPosition of Code Completion
+The code completion of `dt-sql-parser` was designed for use in editors. For this reason, the second parameter (CaretPosition) of the `getSuggestionAtCaretPosition` method is a line and column number rather than a character index. This makes code completion easy to integrate into an editor. The editor only needs to read its text content and cursor position at the right moment to call the code completion of `dt-sql-parser`, with no extra calculation.
+
+In other scenarios, however, you may need to derive the caret position yourself through conversion or calculation. If so, keep the following points in mind first.
+
+The code completion of `dt-sql-parser` depends on [antlr4-c3](https://github.com/mike-lischke/antlr4-c3), a great library. `dt-sql-parser` wraps antlr4-c3 and performs some conversions on top of it, such as converting line and column numbers into the token index that antlr4-c3 requires, as shown in the figure below:
+
+![token-image](./docs/images/token.png)
+
+Treat each column in the figure as a cursor position. Placed in an editor, this text has 13 possible cursor positions, while `dt-sql-parser` parses it into 4 tokens. An important strategy of the code completion is: **as long as the cursor (CaretPosition) has not completely left a token, dt-sql-parser considers that token unfinished, and code completion infers what can be entered at that token's position.**
+
+For example, to find out through code completion what can follow `SHOW`, the caret position should be:
+
+```javascript
+{
+    lineNumber: 1,
+    column: 6
+}
+```
+
+In this case, dt-sql-parser considers `SHOW` a complete token and infers what can follow `SHOW`. If the column in the caret position is 5, dt-sql-parser considers `SHOW` unfinished and infers what can be entered at the position of `SHOW`. In other words, in the figure above, `column: 5` belongs to `token: 0` and `column: 6` belongs to `token: 1`.
+
+For editors, this strategy is also more intuitive. After typing `SHOW` but before pressing the space key, the user has most likely not finished typing and may want to enter something like `SHOWS`. Once the user presses the space key, the editor assumes the user wants to enter the next token. At that point it is time to ask dt-sql-parser what can be entered at the next token position.
+
+<br/>
 ## License

 [MIT](./LICENSE)
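The caret-position notes above can be sketched in TypeScript as a small model of the described rule, not `dt-sql-parser`'s actual implementation. The sketch does two things:

- It converts a 0-based character offset (for example from a plain textarea) into a `{ lineNumber, column }` caret position.
- It shows which token a caret column falls into under the strategy that a caret has not left a token until it moves past the token's end column.

The helpers `offsetToCaret`, `tokenizeLine` and `tokenIndexAtCaret` and the toy tokenizer are hypothetical. `dt-sql-parser` uses its ANTLR lexers and antlr4-c3 instead.

```typescript
// Caret position in the format shown above: 1-based lineNumber and column.
// Named CaretLocation here to avoid clashing with the DOM's CaretPosition type.
interface CaretLocation {
    lineNumber: number;
    column: number;
}

// Convert a 0-based character offset (e.g. a textarea's selectionStart) into a
// caret position. The caret sits just before the character at `offset`.
function offsetToCaret(input: string, offset: number): CaretLocation {
    const lines = input.slice(0, offset).split('\n');
    return { lineNumber: lines.length, column: lines[lines.length - 1].length + 1 };
}

// Hypothetical single-line token: 1-based startColumn, exclusive endColumn.
interface LineToken {
    text: string;
    startColumn: number;
    endColumn: number;
}

// Toy tokenizer for illustration only: runs of whitespace, words, and single
// punctuation characters. Every alternative matches at least one character.
function tokenizeLine(line: string): LineToken[] {
    const tokens: LineToken[] = [];
    const pattern = /\s+|\w+|[^\s\w]/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
        const startColumn = match.index + 1;
        tokens.push({ text: match[0], startColumn, endColumn: startColumn + match[0].length });
    }
    return tokens;
}

// The documented rule: the caret still belongs to a token while column <= endColumn,
// so the owner is the first token whose endColumn is >= column (-1 past the end).
function tokenIndexAtCaret(tokens: LineToken[], column: number): number {
    for (let i = 0; i < tokens.length; i++) {
        if (column <= tokens[i].endColumn) return i;
    }
    return -1;
}

const statement = 'SHOW TABLES;';
const showTokens = tokenizeLine(statement); // "SHOW", " ", "TABLES", ";"
console.log(tokenIndexAtCaret(showTokens, 5)); // 0: caret right after SHOW, still in SHOW
console.log(tokenIndexAtCaret(showTokens, 6)); // 1: caret after the space, next token
console.log(offsetToCaret(statement, 5)); // { lineNumber: 1, column: 6 }
```

With this toy tokenizer, `SHOW TABLES;` happens to yield 13 caret columns and 4 tokens, because whitespace is kept as a token. That the figure uses the same text is an assumption, not something the README states.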
--- a/docs/images/column.png
+++ b/docs/images/column.png
--- a/docs/images/index.png
+++ b/docs/images/index.png
--- a/docs/images/line.png
+++ b/docs/images/line.png
--- a/docs/images/token.png
+++ b/docs/images/token.png
--- a/src/parser/common/basic-parser-types.ts
+++ b/src/parser/common/basic-parser-types.ts
@@ -48,11 +48,13 @@ export interface WordRange {
    readonly text: string;
    /** start at 0 */
    readonly startIndex: number;
-    readonly stopIndex: number;
+    /** end at ..n-1 */
+    readonly endIndex: number;
    /** start at 1 */
    readonly line: number;
    /** start at 1 */
    readonly startColumn: number;
+    /** end at ..n + 1 */
    readonly stopColumn: number;
 }

@@ -81,12 +83,15 @@ export interface Suggestions<T = WordRange> {
 export interface TextSlice {
    /** start at 0 */
    readonly startIndex: number;
+    /** end at ..n-1 */
    readonly endIndex: number;
    /** start at 1 */
    readonly startLine: number;
+    /** end at ..n */
    readonly endLine: number;
    /** start at 1 */
    readonly startColumn: number;
+    /** end at ..n + 1 */
    readonly endColumn: number;
    readonly text: string;
 }
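The comments added to `TextSlice` above can be checked against a concrete value. The sketch below copies the interface locally so it runs on its own and adds two hypothetical helpers, `textOf` and `isConsistentSingleLine`, that are not part of `dt-sql-parser`. The slice is built by hand to follow the documented conventions; it is not captured output from the parser.

```typescript
// Local copy of the TextSlice shape from basic-parser-types.ts (self-contained example).
interface TextSlice {
    readonly startIndex: number; // start at 0
    readonly endIndex: number; // end at n - 1 (inclusive)
    readonly startLine: number; // start at 1
    readonly endLine: number; // end at n (inclusive)
    readonly startColumn: number; // start at 1
    readonly endColumn: number; // end at n + 1 (cursor position after the last character)
    readonly text: string;
}

// endIndex is inclusive, while String.prototype.slice excludes its end argument, hence + 1.
function textOf(input: string, slice: TextSlice): string {
    return input.slice(slice.startIndex, slice.endIndex + 1);
}

// Sanity check for a slice on a single line: both the index span and the
// column span equal the text length under the documented conventions.
function isConsistentSingleLine(input: string, slice: TextSlice): boolean {
    return (
        slice.startLine === slice.endLine &&
        textOf(input, slice) === slice.text &&
        slice.endIndex - slice.startIndex + 1 === slice.text.length &&
        slice.endColumn - slice.startColumn === slice.text.length
    );
}

const sqlText = 'SHOW TABLES;';
// Hand-built slice for the whole statement (12 characters).
const statementSlice: TextSlice = {
    startIndex: 0,
    endIndex: 11,
    startLine: 1,
    endLine: 1,
    startColumn: 1,
    endColumn: 13,
    text: 'SHOW TABLES;',
};
console.log(textOf(sqlText, statementSlice)); // SHOW TABLES;
console.log(isConsistentSingleLine(sqlText, statementSlice)); // true
```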
--- a/src/parser/common/basicParser.ts
+++ b/src/parser/common/basicParser.ts
@@ -244,7 +244,7 @@ export default abstract class BasicParser<
                startLine: start.line,
                endLine: stop.line,
                startColumn: start.charPositionInLine + 1,
-                endColumn: stop.charPositionInLine + stop.text.length,
+                endColumn: stop.charPositionInLine + 1 + stop.text.length,
                text: this._parsedInput.slice(start.startIndex, stop.stopIndex + 1),
            };
        });
@@ -364,10 +364,10 @@ export default abstract class BasicParser<
                    return {
                        text: this._parsedInput.slice(token.startIndex, token.stopIndex + 1),
                        startIndex: token.startIndex,
-                        stopIndex: token.stopIndex,
+                        endIndex: token.stopIndex,
                        line: token.line,
                        startColumn: token.charPositionInLine + 1,
-                        stopColumn: token.charPositionInLine + token.text.length,
+                        stopColumn: token.charPositionInLine + 1 + token.text.length,
                    };
                });
                return {
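The `+ 1` added to `endColumn` and `stopColumn` in `basicParser.ts` converts ANTLR's 0-based `charPositionInLine` into a 1-based column, so the end lands on the cursor position after the token. The sketch below reproduces only that arithmetic on a hand-built token. `TokenLike` is a hypothetical local interface holding the fields the diff reads, so the example runs without antlr4ts.

```typescript
// Hypothetical token shape with the fields basicParser.ts reads from ANTLR tokens.
interface TokenLike {
    text: string;
    startIndex: number; // 0-based, inclusive
    stopIndex: number; // 0-based, inclusive
    line: number; // 1-based
    charPositionInLine: number; // 0-based
}

// Arithmetic from the diff: +1 turns the 0-based charPositionInLine into a
// 1-based column; adding the text length gives the position after the token.
function columnsOf(token: TokenLike): { startColumn: number; stopColumn: number } {
    const startColumn = token.charPositionInLine + 1;
    return { startColumn, stopColumn: startColumn + token.text.length };
}

// The formula used before this change, for comparison: it stops one column
// short and points at the token's last character instead of past it.
function legacyStopColumn(token: TokenLike): number {
    return token.charPositionInLine + token.text.length;
}

// "TABLES" in "SHOW TABLES;" occupies indexes 5..10 on line 1.
const tablesToken: TokenLike = {
    text: 'TABLES',
    startIndex: 5,
    stopIndex: 10,
    line: 1,
    charPositionInLine: 5,
};
console.log(columnsOf(tablesToken)); // { startColumn: 6, stopColumn: 12 }
console.log(legacyStopColumn(tablesToken)); // 11
```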
--- a/src/parser/common/parseErrorListener.ts
+++ b/src/parser/common/parseErrorListener.ts
@@ -5,10 +5,14 @@ import { ATNSimulator } from 'antlr4ts/atn/ATNSimulator';
 * Converted from {@link SyntaxError}.
 */
 export interface ParseError {
+    /** start at 1 */
    readonly startLine: number;
+    /** end at ..n */
    readonly endLine: number;
-    readonly startCol: number;
-    readonly endCol: number;
+    /** start at 1 */
+    readonly startColumn: number;
+    /** end at ..n + 1 */
+    readonly endColumn: number;
    readonly message: string;
 }

@@ -31,7 +35,7 @@ export interface SyntaxError<T> {
 export type ErrorListener<T> = (parseError: ParseError, originalError: SyntaxError<T>) => void;

 export default class ParseErrorListener implements ANTLRErrorListener<Token> {
-    private _errorListener;
+    private _errorListener: ErrorListener<Token>;

    constructor(errorListener: ErrorListener<Token>) {
        this._errorListener = errorListener;
@@ -54,8 +58,8 @@ export default class ParseErrorListener implements ANTLRErrorListener<Token> {
                {
                    startLine: line,
                    endLine: line,
-                    startCol: charPositionInLine,
-                    endCol: endCol,
+                    startColumn: charPositionInLine + 1,
+                    endColumn: endCol + 1,
                    message: msg,
                },
                {
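With `startColumn` now 1-based and `endColumn` pointing just past the error, a consumer can place an error marker without any `+ 1` adjustments. A minimal sketch of that follows. It renders a single-line `ParseError` under its source line, as a CLI might. The local `ParseError` copy, the `renderError` helper and the error values are hypothetical illustrations, not actual `dt-sql-parser` output.

```typescript
// Local copy of the ParseError shape after this change (self-contained example).
interface ParseError {
    readonly startLine: number; // start at 1
    readonly endLine: number; // end at n
    readonly startColumn: number; // start at 1
    readonly endColumn: number; // end at n + 1
    readonly message: string;
}

// Underline a single-line error beneath its source line.
// 1-based startColumn -> (startColumn - 1) spaces of padding;
// exclusive endColumn -> (endColumn - startColumn) marker characters.
function renderError(input: string, error: ParseError): string {
    const sourceLine = input.split('\n')[error.startLine - 1] ?? '';
    const padding = ' '.repeat(error.startColumn - 1);
    const width = Math.max(1, error.endColumn - error.startColumn);
    return `${sourceLine}\n${padding}${'^'.repeat(width)} ${error.message}`;
}

// Hypothetical error covering "FROM" (columns 8..11, so endColumn is 12).
const exampleError: ParseError = {
    startLine: 1,
    endLine: 1,
    startColumn: 8,
    endColumn: 12,
    message: 'unexpected FROM',
};
console.log(renderError('SELECT FROM t;', exampleError));
// SELECT FROM t;
//        ^^^^ unexpected FROM
```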