Validating Lua Script Syntax in Java with ANTLR4: A Detailed Guide
What is ANTLR?
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
A first example
https://github.com/antlr/antlr4/blob/master/doc/getting-started.md#a-first-example
1. Create a file named Hello.g4:
```antlr
// Define a grammar called Hello
grammar Hello;
r  : 'hello' ID ;         // match keyword hello followed by an identifier
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
```
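2. Install the IDEA plugin
ANTLR v4: https://plugins.jetbrains.com/plugin/7358-antlr-v4
3. Open ANTLR Preview
Right-click the line r : 'hello' ID ; // match keyword hello followed by an identifier and choose Test Rule r.
Enter hello world: the ID is correctly recognized as world.
Enter hello World: world is no longer recognized as the ID, because ID : [a-z]+ matches lower-case letters only and the upper-case W cannot be tokenized.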
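The ANTLR4 workflow
Lexer: converts a sequence of characters into tokens. The lexer is normally invoked by the parser rather than used on its own.
Parser: usually found inside a compiler or interpreter. It checks the syntax and builds a data structure (the syntax tree) out of the input tokens. The parser typically relies on the lexer to split the input character stream into individual tokens and takes the resulting token stream as its input. In practice a parser can be written by hand or generated by a tool.
Syntax tree (Parse Tree): a tree-shaped representation of the syntactic structure of the source code. The tree ANTLR4 builds is a parse tree, which keeps every matched rule and token, rather than a condensed abstract syntax tree. A syntax tree can be used for syntax checking, style checking, code formatting, syntax highlighting, error reporting, auto-completion and so on.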
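The lexer and parser roles described above can be made concrete with the Hello grammar from the first example. The following is a minimal hand-written sketch, not the code ANTLR generates: the class name HelloSketch, the token type names HELLO and ID, and the error message text are all assumptions made for illustration, and the parser step is simplified to accept exactly two tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the lexer and parser steps for the Hello grammar.
// Hypothetical code for illustration; ANTLR's generated classes look different.
class HelloSketch {

    /** A token: its type ("HELLO" or "ID") and the matched text. */
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    /** Lexer step: characters -> tokens. Characters no rule matches are reported and dropped. */
    static List<Token> tokenize(String input, List<String> errors) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                                   // WS : [ \t\r\n]+ -> skip
            } else if (c >= 'a' && c <= 'z') {
                int start = i;                         // ID : [a-z]+ (longest match)
                while (i < input.length() && input.charAt(i) >= 'a' && input.charAt(i) <= 'z') {
                    i++;
                }
                String text = input.substring(start, i);
                // The keyword 'hello' wins over ID when the matched text is exactly "hello".
                tokens.add(new Token(text.equals("hello") ? "HELLO" : "ID", text));
            } else {
                errors.add("token recognition error at: '" + c + "'");
                i++;
            }
        }
        return tokens;
    }

    /** Parser step for r : 'hello' ID ; returns the ID text, or null if the tokens do not match. */
    static String parseR(List<Token> tokens) {
        if (tokens.size() == 2
                && tokens.get(0).type.equals("HELLO")
                && tokens.get(1).type.equals("ID")) {
            return tokens.get(1).text;
        }
        return null;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"hello world", "hello World"}) {
            List<String> errors = new ArrayList<>();
            List<Token> tokens = tokenize(input, errors);
            System.out.println(input + " -> tokens=" + tokens
                    + ", id=" + parseR(tokens) + ", errors=" + errors);
        }
    }
}
```

For hello World the sketch reports one error for the W and then tokenizes the remaining orld as an ID, which is why world as a whole is never recognized.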
The general workflow for using antlr4 is:
- Write the antlr4 lexer and grammar rules
- Run the antlr4 tool on those rules to generate Lexer and Parser code in the target language
- Call the generated Lexer and Parser classes from your own code to turn the raw input text into a syntax tree
- Use an antlr4 visitor to process the syntax tree and implement whatever functionality you need
Besides the visitor, antlr4 offers a second way of processing the syntax tree, the Listener. The Listener is antlr4's default mechanism, and like the visitor it can process a ParseTree. When visitor or listener generation is enabled, antlr4 generates the corresponding Visitor and Listener code in addition to the Lexer and Parser. The two differ as follows:
Aspect | Listener | Visitor (my personal preference) |
---|---|---|
Visits every node? | Visits all nodes | Visits only the nodes you explicitly choose |
How nodes are visited | Through enter and exit methods | Through visit methods |
Do methods return a value? | No return value | Has a return value |
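The three differences in the table can be sketched on a toy tree. This is a hand-written illustration, not ANTLR's generated API: the Node and Listener types, the walk, trace and countLeaves methods, and the "skipped" label are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tree with a listener-style walk and a visitor-style traversal.
// All names here are hypothetical; they only mirror the ideas in the table.
class WalkSketch {

    /** A tree node with a label and ordered children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** Listener style: callbacks without return values. */
    interface Listener {
        void enter(Node n);
        void exit(Node n);
    }

    /** The walker drives the traversal and reaches every node. */
    static void walk(Node n, Listener listener) {
        listener.enter(n);
        for (Node c : n.children) {
            walk(c, listener);
        }
        listener.exit(n);
    }

    /** Records the enter/exit events produced by the listener walk. */
    static List<String> trace(Node root) {
        final List<String> events = new ArrayList<>();
        walk(root, new Listener() {
            @Override
            public void enter(Node n) {
                events.add("enter " + n.label);
            }

            @Override
            public void exit(Node n) {
                events.add("exit " + n.label);
            }
        });
        return events;
    }

    /** Visitor style: the code decides which children to descend into and returns a value. */
    static int countLeaves(Node n) {
        if (n.children.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (Node c : n.children) {
            if (c.label.equals("skipped")) {
                continue;                 // this subtree is deliberately not visited
            }
            total += countLeaves(c);
        }
        return total;
    }

    public static void main(String[] args) {
        Node tree = new Node("chunk",
                new Node("stat", new Node("a"), new Node("b")),
                new Node("skipped", new Node("c")));
        System.out.println(trace(tree));        // every node appears, including "skipped" and "c"
        System.out.println(countLeaves(tree));  // only the leaves under "stat" are counted
    }
}
```

The listener walk touches all six nodes, while the visitor-style method returns a result and never enters the subtree it chose to skip.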
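With the difference between Listener and Visitor in mind, the overall antlr4 workflow can be summarized as follows:
In the diagram, the dotted flow on the left shows ANTLR4 turning the original .g4 rules into a Lexer, Parser, Listener and Visitor. The dashed flow on the right shows the raw input stream being converted into Tokens by the Lexer, the Tokens being converted into a syntax tree by the Parser, and finally the ParseTree being traversed by a Listener or Visitor to produce the end result.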
Lua script syntax validation
Prepare a Lua grammar file
https://github.com/antlr/grammars-v4/tree/master/lua
```antlr
/*
BSD License

Copyright (c) 2013, Kazunori Sakamoto
Copyright (c) 2016, Alexander Alexeev
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the NAME of Rainer Schuster nor the NAMEs of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This grammar file derived from:

    Lua 5.3 Reference Manual
    http://www.lua.org/manual/5.3/manual.html

    Lua 5.2 Reference Manual
    http://www.lua.org/manual/5.2/manual.html

    Lua 5.1 grammar written by Nicolai Mainiero
    http://www.antlr3.org/grammar/1178608849736/Lua.g

Tested by Kazunori Sakamoto with Test suite for Lua 5.2 (http://www.lua.org/tests/5.2/)

Tested by Alexander Alexeev with Test suite for Lua 5.3 http://www.lua.org/tests/lua-5.3.2-tests.tar.gz
*/

grammar Lua;

chunk
    : block EOF
    ;

block
    : stat* retstat?
    ;

stat
    : ';'
    | varlist '=' explist
    | functioncall
    | label
    | 'break'
    | 'goto' NAME
    | 'do' block 'end'
    | 'while' exp 'do' block 'end'
    | 'repeat' block 'until' exp
    | 'if' exp 'then' block ('elseif' exp 'then' block)* ('else' block)? 'end'
    | 'for' NAME '=' exp ',' exp (',' exp)? 'do' block 'end'
    | 'for' namelist 'in' explist 'do' block 'end'
    | 'function' funcname funcbody
    | 'local' 'function' NAME funcbody
    | 'local' attnamelist ('=' explist)?
    ;

attnamelist
    : NAME attrib (',' NAME attrib)*
    ;

attrib
    : ('<' NAME '>')?
    ;

retstat
    : 'return' explist? ';'?
    ;

label
    : '::' NAME '::'
    ;

funcname
    : NAME ('.' NAME)* (':' NAME)?
    ;

varlist
    : var_ (',' var_)*
    ;

namelist
    : NAME (',' NAME)*
    ;

explist
    : exp (',' exp)*
    ;

exp
    : 'nil' | 'false' | 'true'
    | number
    | string
    | '...'
    | functiondef
    | prefixexp
    | tableconstructor
    | <assoc=right> exp operatorPower exp
    | operatorUnary exp
    | exp operatorMulDivMod exp
    | exp operatorAddSub exp
    | <assoc=right> exp operatorStrcat exp
    | exp operatorComparison exp
    | exp operatorAnd exp
    | exp operatorOr exp
    | exp operatorBitwise exp
    ;

prefixexp
    : varOrExp nameAndArgs*
    ;

functioncall
    : varOrExp nameAndArgs+
    ;

varOrExp
    : var_ | '(' exp ')'
    ;

var_
    : (NAME | '(' exp ')' varSuffix) varSuffix*
    ;

varSuffix
    : nameAndArgs* ('[' exp ']' | '.' NAME)
    ;

nameAndArgs
    : (':' NAME)? args
    ;

/*
var_
    : NAME | prefixexp '[' exp ']' | prefixexp '.' NAME
    ;

prefixexp
    : var_ | functioncall | '(' exp ')'
    ;

functioncall
    : prefixexp args | prefixexp ':' NAME args
    ;
*/

args
    : '(' explist? ')' | tableconstructor | string
    ;

functiondef
    : 'function' funcbody
    ;

funcbody
    : '(' parlist? ')' block 'end'
    ;

parlist
    : namelist (',' '...')? | '...'
    ;

tableconstructor
    : '{' fieldlist? '}'
    ;

fieldlist
    : field (fieldsep field)* fieldsep?
    ;

field
    : '[' exp ']' '=' exp | NAME '=' exp | exp
    ;

fieldsep
    : ',' | ';'
    ;

operatorOr
    : 'or';

operatorAnd
    : 'and';

operatorComparison
    : '<' | '>' | '<=' | '>=' | '~=' | '==';

operatorStrcat
    : '..';

operatorAddSub
    : '+' | '-';

operatorMulDivMod
    : '*' | '/' | '%' | '//';

operatorBitwise
    : '&' | '|' | '~' | '<<' | '>>';

operatorUnary
    : 'not' | '#' | '-' | '~';

operatorPower
    : '^';

number
    : INT | HEX | FLOAT | HEX_FLOAT
    ;

string
    : NORMALSTRING | CHARSTRING | LONGSTRING
    ;

// LEXER

NAME
    : [a-zA-Z_][a-zA-Z_0-9]*
    ;

NORMALSTRING
    : '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

CHARSTRING
    : '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
    ;

LONGSTRING
    : '[' NESTED_STR ']'
    ;

fragment
NESTED_STR
    : '=' NESTED_STR '='
    | '[' .*? ']'
    ;

INT
    : Digit+
    ;

HEX
    : '0' [xX] HexDigit+
    ;

FLOAT
    : Digit+ '.' Digit* ExponentPart?
    | '.' Digit+ ExponentPart?
    | Digit+ ExponentPart
    ;

HEX_FLOAT
    : '0' [xX] HexDigit+ '.' HexDigit* HexExponentPart?
    | '0' [xX] '.' HexDigit+ HexExponentPart?
    | '0' [xX] HexDigit+ HexExponentPart
    ;

fragment
ExponentPart
    : [eE] [+-]? Digit+
    ;

fragment
HexExponentPart
    : [pP] [+-]? Digit+
    ;

fragment
EscapeSequence
    : '\\' [abfnrtvz"'\\]
    | '\\' '\r'? '\n'
    | DecimalEscape
    | HexEscape
    | UtfEscape
    ;

fragment
DecimalEscape
    : '\\' Digit
    | '\\' Digit Digit
    | '\\' [0-2] Digit Digit
    ;

fragment
HexEscape
    : '\\' 'x' HexDigit HexDigit
    ;

fragment
UtfEscape
    : '\\' 'u{' HexDigit+ '}'
    ;

fragment
Digit
    : [0-9]
    ;

fragment
HexDigit
    : [0-9a-fA-F]
    ;

COMMENT
    : '--[' NESTED_STR ']' -> channel(HIDDEN)
    ;

LINE_COMMENT
    : '--'
    (                                               // --
    | '[' '='*                                      // --[==
    | '[' '='* ~('='|'['|'\r'|'\n') ~('\r'|'\n')*   // --[==AA
    | ~('['|'\r'|'\n') ~('\r'|'\n')*                // --AAA
    ) ('\r\n'|'\r'|'\n'|EOF)
    -> channel(HIDDEN)
    ;

WS
    : [ \t\u000C\r\n]+ -> skip
    ;

SHEBANG
    : '#' '!' ~('\n'|'\r')* -> channel(HIDDEN)
    ;
```
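Maven configuration
Note for JDK 8 users: if you build with JDK 8, the highest antlr4 version you can use is 4.9.3, because from 4.10 onward the ANTLR tool itself (which the Maven plugin runs) requires Java 11, even though the runtime still targets Java 8:
Source: https://github.com/antlr/antlr4/releases/tag/4.10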
Increasing minimum java version
Going forward, we are using Java 11 for the source code and the compiled .class files for the ANTLR tool. The Java runtime target, however, and the associated runtime tests use Java 8 (bumping up from Java 7).
```xml
<dependencies>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-runtime</artifactId>
        <version>${antlr.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-maven-plugin</artifactId>
            <version>${antlr.version}</version>
            <configuration>
                <visitor>true</visitor>
                <listener>true</listener>
            </configuration>
            <executions>
                <execution>
                    <goals>
                        <goal>antlr4</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

<properties>
    <!-- https://mvnrepository.com/artifact/org.antlr/antlr4-runtime -->
    <!-- Antlr4 4.9.3 is the last version compatible with Java 8 -->
    <antlr.version>4.9.3</antlr.version>
</properties>
```
Generate the Lexer, Parser, Listener and Visitor code
mvn clean compile
Create the entity classes
Syntax error: which error occurred on which line.
```java
package com.baeldung.antlr.lua.model;

/**
 * A single syntax error.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorEntry {

    /** Line number on which the error was reported. */
    private Integer lineNum;

    /** Human-readable description of the error. */
    private String errorInfo;

    public Integer getLineNum() {
        return lineNum;
    }

    public void setLineNum(Integer lineNum) {
        this.lineNum = lineNum;
    }

    public String getErrorInfo() {
        return errorInfo;
    }

    public void setErrorInfo(String errorInfo) {
        this.errorInfo = errorInfo;
    }
}
```
Syntax error report: the collection of errors, one entry per reported error.
```java
package com.baeldung.antlr.lua.model;

import java.util.LinkedList;
import java.util.List;

/**
 * Syntax error report.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorReportEntry {

    private final List<SyntaxErrorEntry> syntaxErrorList = new LinkedList<>();

    /** Records one error together with its position and the offending symbol. */
    public void addError(int line, int charPositionInLine, Object offendingSymbol, String msg) {
        SyntaxErrorEntry syntaxErrorEntry = new SyntaxErrorEntry();
        syntaxErrorEntry.setLineNum(line);
        syntaxErrorEntry.setErrorInfo("line " + line + ", column " + charPositionInLine
                + ", at symbol " + offendingSymbol + ": syntax error: " + msg);
        syntaxErrorList.add(syntaxErrorEntry);
    }

    /** Returns the errors in the order they were reported. */
    public List<SyntaxErrorEntry> getSyntaxErrorReport() {
        return syntaxErrorList;
    }
}
```
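Because the report is a flat list, a caller that wants the errors organized per line has to group them itself. The sketch below shows one possible way to do that; it is an illustration rather than part of the article's code. The class ErrorReportSketch, its nested Entry type (a stand-in for SyntaxErrorEntry so the example compiles on its own) and the groupByLine method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups reported errors by line number.
class ErrorReportSketch {

    /** Stand-in for SyntaxErrorEntry: a line number plus a message. */
    static final class Entry {
        final int lineNum;
        final String errorInfo;

        Entry(int lineNum, String errorInfo) {
            this.lineNum = lineNum;
            this.errorInfo = errorInfo;
        }
    }

    /** Returns line number -> messages, sorted by line, keeping the reported order within a line. */
    static Map<Integer, List<String>> groupByLine(List<Entry> entries) {
        Map<Integer, List<String>> byLine = new TreeMap<>();
        for (Entry e : entries) {
            byLine.computeIfAbsent(e.lineNum, k -> new ArrayList<>()).add(e.errorInfo);
        }
        return byLine;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry(2, "first error on line 2"));
        entries.add(new Entry(1, "error on line 1"));
        entries.add(new Entry(2, "second error on line 2"));
        System.out.println(groupByLine(entries));
    }
}
```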
Lua syntax visitor
```java
package com.baeldung.antlr.lua;

import com.baeldung.antlr.LuaBaseVisitor;

/**
 * Lua syntax visitor.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class LuaSyntaxVisitor extends LuaBaseVisitor<Object> {
    // LuaBaseVisitor is generated alongside LuaVisitor and supplies default
    // implementations that simply visit the children, so this class compiles as shown.
    // To implement LuaVisitor<Object> directly instead, every visit method has to be
    // overridden (in IDEA: Ctrl+O).
}
```
Syntax error listener
```java
package com.baeldung.antlr.lua;

import com.baeldung.antlr.lua.model.SyntaxErrorReportEntry;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

/**
 * Syntax error listener.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorListener extends BaseErrorListener {

    private final SyntaxErrorReportEntry reporter;

    public SyntaxErrorListener(SyntaxErrorReportEntry reporter) {
        this.reporter = reporter;
    }

    /** Forwards every reported syntax error to the report. */
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        this.reporter.addError(line, charPositionInLine, offendingSymbol, msg);
    }
}
```
Unit test
```java
package com.baeldung.antlr;

import com.baeldung.antlr.lua.LuaSyntaxVisitor;
import com.baeldung.antlr.lua.SyntaxErrorListener;
import com.baeldung.antlr.lua.model.SyntaxErrorEntry;
import com.baeldung.antlr.lua.model.SyntaxErrorReportEntry;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class LuaSyntaxErrorUnitTest {

    public static List<SyntaxErrorEntry> judgeLuaSyntax(String luaScript) {
        // Create a CharStream that reads the script text
        CharStream charStreams = CharStreams.fromString(luaScript);
        // The lexer groups the input characters into tokens
        LuaLexer luaLexer = new LuaLexer(charStreams);
        // Token buffer that stores the tokens produced by the lexer
        CommonTokenStream tokenStream = new CommonTokenStream(luaLexer);
        // The parser analyzes the tokens held in the token buffer
        LuaParser luaParser = new LuaParser(tokenStream);

        // Collect the syntax errors the parser reports
        SyntaxErrorReportEntry syntaxErrorReporter = new SyntaxErrorReportEntry();
        SyntaxErrorListener errorListener = new SyntaxErrorListener(syntaxErrorReporter);
        luaParser.addErrorListener(errorListener);

        // Parse from the start rule and walk the resulting tree
        LuaSyntaxVisitor luaSyntaxVisitor = new LuaSyntaxVisitor();
        luaSyntaxVisitor.visit(luaParser.chunk());

        return syntaxErrorReporter.getSyntaxErrorReport();
    }

    @Test
    public void testGood() throws Exception {
        List<SyntaxErrorEntry> errorEntryList = judgeLuaSyntax("if a~=1 then print(1) end");
        assertThat(errorEntryList.size(), is(0));
    }

    @Test
    public void testBad() throws Exception {
        // "!=" is not a Lua operator (inequality is written "~="), so errors are expected
        List<SyntaxErrorEntry> errorEntryList = judgeLuaSyntax("if a!=1 then print(1) end");
        assertThat(errorEntryList.size(), is(2));
    }
}
```
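A side note: treat antlr4 as a language on the same level as Java in the project layout, so grammar files go under src/main/antlr4 next to src/main/java. The same convention applies when using Groovy.
The final directory layout and unit test results are as follows: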