Validating Lua Script Syntax in Java with ANTLR4: A Detailed Guide
What is ANTLR?
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
A first example
https://github.com/antlr/antlr4/blob/master/doc/getting-started.md#a-first-example
1. Create a Hello.g4 file:
// Define a grammar called Hello
grammar Hello;
r  : 'hello' ID ;         // match keyword hello followed by an identifier
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
2. Install the IDEA plugin
ANTLR v4: https://plugins.jetbrains.com/plugin/7358-antlr-v4
3. Open ANTLR Preview
Right-click the line r : 'hello' ID ; // match keyword hello followed by an identifier and choose Test Rule r.
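Enter hello world: the ID is correctly recognized as world.
Enter hello World: world is no longer recognized as the ID, because the rule ID : [a-z]+ matches lower-case letters only and cannot consume the upper-case W.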
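To make the upper-case failure concrete, here is a minimal sketch in plain Java that imitates the two lexer rules of Hello.g4 with a regular expression. It is not the code ANTLR generates; the class name HelloLexerSketch and the token labels are made up for illustration, and it only assumes the same character classes as the grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written stand-in for the lexer rules of Hello.g4 (not ANTLR-generated code).
class HelloLexerSketch {
    // WS : [ \t\r\n]+ is skipped; ID : [a-z]+ accepts lower-case letters only.
    private static final Pattern TOKEN =
            Pattern.compile("(?<ws>[ \\t\\r\\n]+)|(?<id>[a-z]+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                // No rule matches this character, e.g. an upper-case letter.
                tokens.add("ERROR(" + input.charAt(pos) + ")");
                pos++;
                continue;
            }
            String id = m.group("id");
            if (id != null) {
                // The literal 'hello' takes priority over ID for the same text.
                tokens.add(id.equals("hello") ? "'hello'" : "ID(" + id + ")");
            }
            pos = m.end(); // whitespace matches are consumed without emitting a token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello world")); // ['hello', ID(world)]
        System.out.println(tokenize("hello World")); // ['hello', ERROR(W), ID(orld)]
    }
}
```

In the second call the W matches neither rule, so only orld is left to be picked up as an ID, which is why the preview no longer shows world.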
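The ANTLR4 workflow
Lexer: converts a sequence of characters into tokens. A lexer is normally invoked by a parser.
Parser: usually appears as part of a compiler or interpreter. It checks the syntax and builds a data structure (the syntax tree) out of the input tokens. A parser normally uses a lexer to split the input character stream into individual tokens and takes the resulting token stream as its input. In practice a parser can be written by hand or generated by a tool.
Syntax tree (Parse Tree): a tree-shaped representation of the syntactic structure of the source code. Strictly speaking, ANTLR builds a parse tree that keeps every token rather than a pruned abstract syntax tree, although the two terms are often used loosely. A syntax tree can be used for syntax checking, code style checking, formatting, highlighting, error reporting, auto-completion and so on.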
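The three roles above can be sketched by hand for the tiny rule r : 'hello' ID ; from the first example. This is only an illustration in plain Java, not ANTLR's implementation; TinyParseTreeSketch, Node, lex and parseR are hypothetical names, and the "lexer" simply splits on whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of lexer -> parser -> parse tree for r : 'hello' ID ;
class TinyParseTreeSketch {
    /** A parse-tree node: a rule name or a token text, plus ordered children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        /** LISP-style rendering of the subtree. */
        String toStringTree() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) {
                sb.append(' ').append(child.toStringTree());
            }
            return sb.append(')').toString();
        }
    }

    /** "Lexer": split on whitespace, which the grammar's WS rule would skip. */
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    /** An ID is lower-case letters only and is not the keyword itself. */
    static boolean isId(String word) {
        return word.matches("[a-z]+") && !word.equals("hello");
    }

    /** "Parser": check the token sequence against rule r and build the tree. */
    static Node parseR(List<String> tokens) {
        if (tokens.size() != 2 || !tokens.get(0).equals("hello") || !isId(tokens.get(1))) {
            throw new IllegalArgumentException("syntax error in " + tokens);
        }
        Node r = new Node("r");
        r.children.add(new Node(tokens.get(0)));
        r.children.add(new Node(tokens.get(1)));
        return r;
    }

    public static void main(String[] args) {
        System.out.println(parseR(lex("hello world")).toStringTree()); // (r hello world)
    }
}
```

Syntax checking falls out of the parser step: input that does not fit the rule never produces a tree and is reported as an error instead.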
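The general procedure for using antlr4 is:
- Write the antlr4 lexer and parser rules
- Run the antlr4 tool on those rules to generate Lexer and Parser code in the target language
- Call the generated Lexer and Parser classes from your own code to turn the raw input text into a syntax tree
- Use an antlr4 visitor to process the syntax tree and implement whatever functionality you need
In fact, besides visitors, antlr4 offers another way to process the syntax tree, called Listener. Listener is antlr4's default mechanism, and like a visitor it can process a ParseTree. When visitor or listener generation is enabled, antlr4 generates the corresponding Visitor and Listener code in addition to the Lexer and Parser code. The differences between Listener and Visitor are as follows: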
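Aspect | Listener | Visitor (my personal preference) |
---|---|---|
Visits all nodes? | Visits every node | Visits only the nodes you explicitly ask for |
How nodes are visited | Through enter and exit methods | Through visit methods |
Do methods return a value? | No return value | Has a return value |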
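The contrast in the table can be sketched without ANTLR at all. The following plain-Java example is an illustration under stated assumptions, not the generated LuaListener or LuaVisitor API: Node, Listener, Visitor, walk, trace and countLeaves are hypothetical names, and the tree is built by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the two traversal styles (not ANTLR-generated code).
class ListenerVsVisitorSketch {
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Listener style: callbacks return nothing; a walker drives the traversal. */
    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    /** The walker reaches every node, whether the listener cares about it or not. */
    static void walk(Node node, Listener listener) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(child, listener);
        }
        listener.exit(node);
    }

    /** Visitor style: visit returns a value and decides itself where to descend. */
    interface Visitor<T> {
        T visit(Node node);
    }

    static List<String> trace(Node root) {
        final List<String> events = new ArrayList<>();
        walk(root, new Listener() {
            @Override
            public void enter(Node node) {
                events.add("enter " + node.label);
            }

            @Override
            public void exit(Node node) {
                events.add("exit " + node.label);
            }
        });
        return events;
    }

    /** Counts leaves, but never descends into subtrees labelled "ignored". */
    static int countLeaves(Node root) {
        Visitor<Integer> counter = new Visitor<Integer>() {
            @Override
            public Integer visit(Node node) {
                if (node.children.isEmpty()) {
                    return 1;
                }
                int sum = 0;
                for (Node child : node.children) {
                    if (!child.label.equals("ignored")) {
                        sum += visit(child);
                    }
                }
                return sum;
            }
        };
        return counter.visit(root);
    }

    static Node sampleTree() {
        return new Node("r", new Node("hello"), new Node("ignored", new Node("x")), new Node("world"));
    }

    public static void main(String[] args) {
        System.out.println(trace(sampleTree()).size()); // 10: enter and exit for each of the 5 nodes
        System.out.println(countLeaves(sampleTree()));  // 2: the "ignored" subtree is skipped
    }
}
```

The listener sees all five nodes because the walker owns the traversal, while the visitor returns a result and skips a subtree simply by not calling visit on it.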
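With the differences between Listener and Visitor in mind, the overall antlr4 workflow can be summarized as follows:
In the diagram, the dotted flow on the left shows ANTLR4 turning the original .g4 rules into a Lexer, Parser, Listener and Visitor. The dashed flow on the right shows the raw input stream being turned into Tokens by the Lexer, the Tokens being turned into a syntax tree by the Parser, and finally the ParseTree being traversed by a Listener or Visitor to produce the final result.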
Lua script syntax validation
Prepare a Lua grammar file
https://github.com/antlr/grammars-v4/tree/master/lua
/*
BSD License

Copyright (c) 2013, Kazunori Sakamoto
Copyright (c) 2016, Alexander Alexeev
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the NAME of Rainer Schuster nor the NAMEs of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This grammar file derived from:

    Lua 5.3 Reference Manual
    http://www.lua.org/manual/5.3/manual.html

    Lua 5.2 Reference Manual
    http://www.lua.org/manual/5.2/manual.html

    Lua 5.1 grammar written by Nicolai Mainiero
    http://www.antlr3.org/grammar/1178608849736/Lua.g

Tested by Kazunori Sakamoto with Test suite for Lua 5.2 (http://www.lua.org/tests/5.2/)

Tested by Alexander Alexeev with Test suite for Lua 5.3 http://www.lua.org/tests/lua-5.3.2-tests.tar.gz
*/

grammar Lua;

chunk
    : block EOF
    ;

block
    : stat* retstat?
    ;

stat
    : ';'
    | varlist '=' explist
    | functioncall
    | label
    | 'break'
    | 'goto' NAME
    | 'do' block 'end'
    | 'while' exp 'do' block 'end'
    | 'repeat' block 'until' exp
    | 'if' exp 'then' block ('elseif' exp 'then' block)* ('else' block)? 'end'
    | 'for' NAME '=' exp ',' exp (',' exp)? 'do' block 'end'
    | 'for' namelist 'in' explist 'do' block 'end'
    | 'function' funcname funcbody
    | 'local' 'function' NAME funcbody
    | 'local' attnamelist ('=' explist)?
    ;

attnamelist
    : NAME attrib (',' NAME attrib)*
    ;

attrib
    : ('<' NAME '>')?
    ;

retstat
    : 'return' explist? ';'?
    ;

label
    : '::' NAME '::'
    ;

funcname
    : NAME ('.' NAME)* (':' NAME)?
    ;

varlist
    : var_ (',' var_)*
    ;

namelist
    : NAME (',' NAME)*
    ;

explist
    : exp (',' exp)*
    ;

exp
    : 'nil' | 'false' | 'true'
    | number
    | string
    | '...'
    | functiondef
    | prefixexp
    | tableconstructor
    | <assoc=right> exp operatorPower exp
    | operatorUnary exp
    | exp operatorMulDivMod exp
    | exp operatorAddSub exp
    | <assoc=right> exp operatorStrcat exp
    | exp operatorComparison exp
    | exp operatorAnd exp
    | exp operatorOr exp
    | exp operatorBitwise exp
    ;

prefixexp
    : varOrExp nameAndArgs*
    ;

functioncall
    : varOrExp nameAndArgs+
    ;

varOrExp
    : var_ | '(' exp ')'
    ;

var_
    : (NAME | '(' exp ')' varSuffix) varSuffix*
    ;

varSuffix
    : nameAndArgs* ('[' exp ']' | '.' NAME)
    ;

nameAndArgs
    : (':' NAME)? args
    ;

/*
var_
    : NAME | prefixexp '[' exp ']' | prefixexp '.' NAME
    ;

prefixexp
    : var_ | functioncall | '(' exp ')'
    ;

functioncall
    : prefixexp args | prefixexp ':' NAME args
    ;
*/

args
    : '(' explist? ')' | tableconstructor | string
    ;

functiondef
    : 'function' funcbody
    ;

funcbody
    : '(' parlist? ')' block 'end'
    ;

parlist
    : namelist (',' '...')? | '...'
    ;

tableconstructor
    : '{' fieldlist? '}'
    ;

fieldlist
    : field (fieldsep field)* fieldsep?
    ;

field
    : '[' exp ']' '=' exp | NAME '=' exp | exp
    ;

fieldsep
    : ',' | ';'
    ;

operatorOr
    : 'or';

operatorAnd
    : 'and';

operatorComparison
    : '<' | '>' | '<=' | '>=' | '~=' | '==';

operatorStrcat
    : '..';

operatorAddSub
    : '+' | '-';

operatorMulDivMod
    : '*' | '/' | '%' | '//';

operatorBitwise
    : '&' | '|' | '~' | '<<' | '>>';

operatorUnary
    : 'not' | '#' | '-' | '~';

operatorPower
    : '^';

number
    : INT | HEX | FLOAT | HEX_FLOAT
    ;

string
    : NORMALSTRING | CHARSTRING | LONGSTRING
    ;

// LEXER

NAME
    : [a-zA-Z_][a-zA-Z_0-9]*
    ;

NORMALSTRING
    : '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

CHARSTRING
    : '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
    ;

LONGSTRING
    : '[' NESTED_STR ']'
    ;

fragment
NESTED_STR
    : '=' NESTED_STR '='
    | '[' .*? ']'
    ;

INT
    : Digit+
    ;

HEX
    : '0' [xX] HexDigit+
    ;

FLOAT
    : Digit+ '.' Digit* ExponentPart?
    | '.' Digit+ ExponentPart?
    | Digit+ ExponentPart
    ;

HEX_FLOAT
    : '0' [xX] HexDigit+ '.' HexDigit* HexExponentPart?
    | '0' [xX] '.' HexDigit+ HexExponentPart?
    | '0' [xX] HexDigit+ HexExponentPart
    ;

fragment
ExponentPart
    : [eE] [+-]? Digit+
    ;

fragment
HexExponentPart
    : [pP] [+-]? Digit+
    ;

fragment
EscapeSequence
    : '\\' [abfnrtvz"'\\]
    | '\\' '\r'? '\n'
    | DecimalEscape
    | HexEscape
    | UtfEscape
    ;

fragment
DecimalEscape
    : '\\' Digit
    | '\\' Digit Digit
    | '\\' [0-2] Digit Digit
    ;

fragment
HexEscape
    : '\\' 'x' HexDigit HexDigit
    ;

fragment
UtfEscape
    : '\\' 'u{' HexDigit+ '}'
    ;

fragment
Digit
    : [0-9]
    ;

fragment
HexDigit
    : [0-9a-fA-F]
    ;

COMMENT
    : '--[' NESTED_STR ']' -> channel(HIDDEN)
    ;

LINE_COMMENT
    : '--'
    (                                               // --
    | '[' '='*                                      // --[==
    | '[' '='* ~('='|'['|'\r'|'\n') ~('\r'|'\n')*   // --[==AA
    | ~('['|'\r'|'\n') ~('\r'|'\n')*                // --AAA
    ) ('\r\n'|'\r'|'\n'|EOF)
    -> channel(HIDDEN)
    ;

WS
    : [ \t\u000C\r\n]+ -> skip
    ;

SHEBANG
    : '#' '!' ~('\n'|'\r')* -> channel(HIDDEN)
    ;
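Maven configuration
Note for JDK 8 users: the highest ANTLR4 version you can build with is 4.9.3, because from 4.10 onward the ANTLR tool itself requires Java 11 (the runtime still targets Java 8), as the release notes explain:
Source: https://github.com/antlr/antlr4/releases/tag/4.10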
Increasing minimum java version
Going forward, we are using Java 11 for the source code and the compiled .class files for the ANTLR tool. The Java runtime target, however, and the associated runtime tests use Java 8 (bumping up from Java 7).
<dependencies>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-runtime</artifactId>
        <version>${antlr.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-maven-plugin</artifactId>
            <version>${antlr.version}</version>
            <configuration>
                <visitor>true</visitor>
                <listener>true</listener>
            </configuration>
            <executions>
                <execution>
                    <goals>
                        <goal>antlr4</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

<properties>
    <!-- https://mvnrepository.com/artifact/org.antlr/antlr4-runtime -->
    <!-- Antlr4 4.9.3 is the last version compatible with Java 8 -->
    <antlr.version>4.9.3</antlr.version>
</properties>
Generate the Lexer, Parser, Listener and Visitor code
mvn clean compile
Create the entity classes
Syntax error: one error together with the line it occurred on.
package com.baeldung.antlr.lua.model;

/**
 * A single syntax error.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorEntry {

    /** Line number on which the error was reported. */
    private Integer lineNum;

    /** Human-readable description of the error. */
    private String errorInfo;

    public Integer getLineNum() {
        return lineNum;
    }

    public void setLineNum(Integer lineNum) {
        this.lineNum = lineNum;
    }

    public String getErrorInfo() {
        return errorInfo;
    }

    public void setErrorInfo(String errorInfo) {
        this.errorInfo = errorInfo;
    }
}
Syntax error report: the collection of all per-line errors.
package com.baeldung.antlr.lua.model;

import java.util.LinkedList;
import java.util.List;

/**
 * Syntax error report: collects every error reported while parsing.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorReportEntry {

    private final List<SyntaxErrorEntry> syntaxErrorList = new LinkedList<>();

    /** Records one error, formatting its position and message into a single string. */
    public void addError(int line, int charPositionInLine, Object offendingSymbol, String msg) {
        SyntaxErrorEntry syntaxErrorEntry = new SyntaxErrorEntry();
        syntaxErrorEntry.setLineNum(line);
        syntaxErrorEntry.setErrorInfo("line " + line + ", column " + charPositionInLine
                + ", at symbol " + offendingSymbol + ": syntax error: " + msg);
        syntaxErrorList.add(syntaxErrorEntry);
    }

    public List<SyntaxErrorEntry> getSyntaxErrorReport() {
        return syntaxErrorList;
    }
}
Lua syntax visitor
package com.baeldung.antlr.lua;

import com.baeldung.antlr.LuaParser;
import com.baeldung.antlr.LuaVisitor;
import org.antlr.v4.runtime.tree.ErrorNode;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.RuleNode;
import org.antlr.v4.runtime.tree.TerminalNode;

/**
 * Lua syntax visitor.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class LuaSyntaxVisitor implements LuaVisitor<Object> {
    // Method bodies omitted here: in IDEA, press Ctrl+O and override every LuaVisitor method.
    // (Extending the generated LuaBaseVisitor<Object> instead would provide default implementations.)
}
Syntax error listener
package com.baeldung.antlr.lua;

import com.baeldung.antlr.lua.model.SyntaxErrorReportEntry;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

/**
 * Syntax error listener: forwards every error ANTLR reports to the report object.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorListener extends BaseErrorListener {

    private final SyntaxErrorReportEntry reporter;

    public SyntaxErrorListener(SyntaxErrorReportEntry reporter) {
        this.reporter = reporter;
    }

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        this.reporter.addError(line, charPositionInLine, offendingSymbol, msg);
    }
}
Unit test
package com.baeldung.antlr;

import com.baeldung.antlr.lua.LuaSyntaxVisitor;
import com.baeldung.antlr.lua.SyntaxErrorListener;
import com.baeldung.antlr.lua.model.SyntaxErrorEntry;
import com.baeldung.antlr.lua.model.SyntaxErrorReportEntry;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class LuaSyntaxErrorUnitTest {

    public static List<SyntaxErrorEntry> judgeLuaSyntax(String luaScript) {
        // Create a CharStream that reads the script text
        CharStream charStreams = CharStreams.fromString(luaScript);
        // The lexer groups the input characters into tokens
        LuaLexer luaLexer = new LuaLexer(charStreams);
        // Token buffer that stores the tokens produced by the lexer
        CommonTokenStream tokenStream = new CommonTokenStream(luaLexer);
        // The parser analyses the tokens held in the token buffer
        LuaParser luaParser = new LuaParser(tokenStream);

        // Collect every syntax error the parser reports
        SyntaxErrorReportEntry syntaxErrorReporter = new SyntaxErrorReportEntry();
        SyntaxErrorListener errorListener = new SyntaxErrorListener(syntaxErrorReporter);
        luaParser.addErrorListener(errorListener);

        // Parse from the start rule "chunk" and walk the resulting tree
        LuaSyntaxVisitor luaSyntaxVisitor = new LuaSyntaxVisitor();
        luaSyntaxVisitor.visit(luaParser.chunk());

        return syntaxErrorReporter.getSyntaxErrorReport();
    }

    @Test
    public void testGood() throws Exception {
        List<SyntaxErrorEntry> errorEntryList = judgeLuaSyntax("if a~=1 then print(1) end");
        assertThat(errorEntryList.size(), is(0));
    }

    @Test
    public void testBad() throws Exception {
        // "!=" is not a Lua operator (Lua writes inequality as "~="), so errors are expected
        List<SyntaxErrorEntry> errorEntryList = judgeLuaSyntax("if a!=1 then print(1) end");
        assertThat(errorEntryList.size(), is(2));
    }
}
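By the way: think of antlr4 as a language in its own right, on the same level as Java. The same applies when you use Groovy.
The final directory layout and the unit test results are as follows: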