ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Overview

ANTLR v4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface (or visitor) that makes it easy to respond to the recognition of phrases of interest.

Given day-job constraints, my time working on this project is limited so I'll have to focus first on fixing bugs rather than changing/improving the feature set. Likely I'll do it in bursts every few months. Please do not be offended if your bug or pull request does not yield a response! --parrt

Authors and major contributors

Useful information

You might also find the following pages useful, particularly if you want to mess around with the various target languages.

The Definitive ANTLR 4 Reference

Programmers run into parsing problems all the time. Whether it’s a data format like JSON, a network protocol like SMTP, a server configuration file for Apache, a PostScript/PDF file, or a simple spreadsheet macro language—ANTLR v4 and this book will demystify the process. ANTLR v4 has been rewritten from scratch to make it easier than ever to build parsers and the language applications built on top. This completely rewritten new edition of the bestselling Definitive ANTLR Reference shows you how to take advantage of these new features.

You can buy the book The Definitive ANTLR 4 Reference on Amazon, or an electronic version at the publisher's site.

You may also find the book's source code useful.

Additional grammars

This repository is a collection of grammars without actions, where each root directory name is the all-lowercase name of the language parsed by the grammar. For example: java, cpp, csharp, c, and so on.

Issues
  • New extended Unicode escape \u{10ABCD} to support Unicode literals > U+FFFF

    Fixes #276 .

    This used to be a WIP PR, but it's now ready for review.

    This PR introduces a new extended Unicode escape \u{10ABCD} in ANTLR4 grammars to support Unicode literal values > U+FFFF.

    The serialized ATN represents any atom or range with a Unicode value > U+FFFF as a set. Any such set is serialized in the ATN with 32-bit arguments.

    I bumped the UUID, since this changes the serialized ATN format.

    I included lots of tests and made sure everything is passing on Linux, Mac, and Windows.
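To make the escape concrete, here is a minimal sketch in plain Java of how text in the `\u{...}` form could be turned into a code point, and why values above U+FFFF need wider storage. The class and method names (`ExtendedEscapeSketch`, `parse`, `needsWideStorage`) are hypothetical and are not taken from the PR; the sketch does not reproduce the ANTLR tool's actual parsing code.

```java
final class ExtendedEscapeSketch {
    /** Returns the code point named by text such as "\\u{10ABCD}", or -1 if it is malformed. */
    static int parse(String escape) {
        if (!escape.startsWith("\\u{") || !escape.endsWith("}")) {
            return -1;
        }
        String hex = escape.substring(3, escape.length() - 1);
        if (hex.isEmpty() || hex.length() > 6) {
            return -1; // U+10FFFF needs at most six hex digits
        }
        int codePoint = 0;
        for (int i = 0; i < hex.length(); i++) {
            int digit = Character.digit(hex.charAt(i), 16);
            if (digit < 0) {
                return -1; // not a hexadecimal digit
            }
            codePoint = codePoint * 16 + digit;
        }
        return Character.isValidCodePoint(codePoint) ? codePoint : -1;
    }

    /** True when the value does not fit in a single 16-bit UTF-16 code unit. */
    static boolean needsWideStorage(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint);
    }

    public static void main(String[] args) {
        int cp = parse("\\u{10ABCD}");
        System.out.println(Integer.toHexString(cp)
                + " wide=" + needsWideStorage(cp)
                + " utf16Units=" + Character.charCount(cp));
    }
}
```

A value such as U+10ABCD occupies two UTF-16 code units in a Java `String`, which is consistent with the description above of serializing such values with 32-bit arguments.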

    type:feature unicode 
    opened by bhamiltoncx 115
  • splitting version numbers for targets

    Hiya: @pboyer, @mike-lischke, @janyou, @ewanmellor, @hanjoes, @ericvergnaud, @lingyv-li, @marcospassos

    Eric has raised the point that it would be nice to be able to make quick patches to the various runtimes; e.g., there is a stopping bug now in the JavaScript target. He proposes something along these lines:

    • any change in the tool or the runtime algorithm bumps the middle version #: 4.9 -> 4.10 -> 4.11
    • any bug fix in a runtime we bump the last digit of that runtime only: 4.9 -> 4.9.1 -> 4.9.2
    • if bumping the java runtime for bug fix we also bump the tool since it contains the runtime
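The proposed rules can be sketched as two small functions. This is only an illustration of the numbering scheme described in the bullets; `TargetVersionSketch` and its method names are hypothetical, and it assumes versions are written as `major.minor` or `major.minor.patch`.

```java
final class TargetVersionSketch {
    /** A tool or runtime-algorithm change bumps the middle number and drops any patch digit. */
    static String bumpForToolChange(String version) {
        String[] parts = version.split("\\.");
        int minor = Integer.parseInt(parts[1]);
        return parts[0] + "." + (minor + 1);
    }

    /** A bug fix in one runtime bumps only the last digit of that runtime's version. */
    static String bumpForRuntimeFix(String version) {
        String[] parts = version.split("\\.");
        int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        return parts[0] + "." + parts[1] + "." + (patch + 1);
    }

    public static void main(String[] args) {
        System.out.println(bumpForToolChange("4.9"));   // tool change
        System.out.println(bumpForRuntimeFix("4.9"));   // first runtime patch
        System.out.println(bumpForRuntimeFix("4.9.1")); // second runtime patch
    }
}
```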

    This is not optimal, as people have criticized me in the past for bumping, say, 4.6 to 4.7 for some minor changes. It also has the problem that 4.9.x may not mean the same thing in two different targets, as each target will now have its own version number.

    Rather than break up all of the targets into separate repositories or similar, can you guys think of a better solution? Any suggestions? The goal here is to allow more rapid target releases, and independent of me having to do a major release of the tool.

    type:question 
    opened by parrt 94
  • Improve memory usage and perf of CodePointCharStream: Use 8-bit, 16-bit, or 32-bit buffer

    This greatly improves the memory usage and performance of CodePointCharStream by ensuring the internal storage uses either an 8-bit buffer (for Unicode code points <= U+00FF), 16-bit buffer (for Unicode code points <= U+FFFF), or a 32-bit buffer (Unicode code points > U+FFFF).

    I split out the internal storage into a class CodePointBuffer which has a CodePointBuffer.Builder class which has the logic to upgrade from 8-bit to 16-bit to 32-bit storage.

    I found the perf hotspot in CodePointCharStream on master was the virtual method calls from CharStream.LA(offset) into IntBuffer.

    Refactoring it into CodePointBuffer didn't help (in fact, it added another virtual method call).

    To fix the perf, I made CodePointCharStream an abstract class and made three concrete subclasses: CodePoint8BitCharStream, CodePoint16BitCharStream, and CodePoint32BitCharStream which directly access the array of underlying code points in the CodePointBuffer without virtual method calls.
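The width selection described above can be sketched in plain Java as follows. This is an assumption-laden illustration, not the PR's `CodePointBuffer.Builder`: the `StorageWidthSketch` class and its `Width` enum are hypothetical, and the sketch scans the whole input up front rather than upgrading the buffer incrementally.

```java
final class StorageWidthSketch {
    enum Width { BITS_8, BITS_16, BITS_32 }

    /** Picks the narrowest element width that can hold every code point in the text. */
    static Width widthFor(CharSequence text) {
        int max = text.codePoints().max().orElse(0);
        if (max <= 0xFF) {
            return Width.BITS_8;   // fits in a byte[]
        }
        if (max <= 0xFFFF) {
            return Width.BITS_16;  // fits in a char[]
        }
        return Width.BITS_32;      // needs an int[]
    }

    public static void main(String[] args) {
        System.out.println(widthFor("select"));
        System.out.println(widthFor("\u20AC"));
        System.out.println(widthFor(new String(Character.toChars(0x1F600))));
    }
}
```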

    comp:performance lexers target:java 
    opened by bhamiltoncx 85
  • initial discussion to start integration of new targets

    As promised, I am now ready to integrate the new ANTLR target languages you folks have been working on. This issue is meant to get everybody in sync, check status, and discuss the proper order of integration and resolve issues etc.

    There are two administrative details to get out of the way first:

    1. Please let me know if there is another github user that should be added to one of the categories. Or, of course, if you would like your user ID removed from this discussion.
    2. Nothing can be merged into antlr/antlr4 unless every single committer has added themselves to the contributors.txt file. It's onerous, particularly for simple commits, but it is a requirement for anything merged into the master. Eclipse foundation lawyers tell me that we have one of the cleanest licenses out there and it contributes to ANTLR's widespread use because companies are not afraid to use the software. See the genesis of such heinous requirements in SCO v IBM. This means lead target authors have to go back through their committers list quickly and ask them to sign the contributors file with a new commit. Or, they can remove that commit and enter their own version of the functionality, being careful not to violate copyright on the previous.

    As we proceed, please keep in mind that I have a difficult role, balancing the needs of multiple targets and keeping discussions in the civil and practical zone. Decisions I make come from the perspective of over 25 years managing and leading this project. I look forward to incorporating your hard work into the main antlr repo.

    C++ current location

    • @mike-lischke
    • @DanMcLaughlin
    • @nburles
    • @davesisson

    Go current location, previous discussion

    • @pboyer

    Swift current location: unclear, previous discussion

    • @jeffreyguenther
    • @hanjoes
    • @janyou
    • @ewanmellor

    Likely interested/supporting humans (scraped from github issues):

    • @RYDB3RG
    • @wjkohnen
    • @willfaught
    • @parrt
    • @sharwell
    • @ericvergnaud
    target:cpp target:go target:swift type:improvement 
    opened by parrt 84
  • .NET Core Support

    I added a new solution for the .NET Runtime, which supports .NET Core. Also made some code changes, to fix issues with missing APIs.

    comp:build target:csharp type:improvement 
    opened by lecode-official 80
  • PHP Target

    I would like to propose introducing a PHP target and runtime.

    Is anyone interested in joining me to implement a PHP target?

    opened by marcospassos 75
  • Swift Target

    I did a quick search and I didn't see anything written about this yet. What's the likelihood of a Swift target for ANTLR?

    There are C#, Javascript, and Python targets at the moment.

    What does it take to implement a target? Given that Swift is more Java-like, it seems like it should be possible. Maybe start with a code translator if there is one for (Java to Swift), and iterate towards a more idiomatic implementation.

    type:question 
    opened by jeffreyguenther 69
  • Add a new CharStream that converts the symbols to upper or lower case.

    This is useful for many of the case insensitive grammars found at https://github.com/antlr/grammars-v4/ which assume the input would be all upper or lower case. Related discussion can be found at https://github.com/antlr/antlr4test-maven-plugin/issues/1

    It would be used like so:

    input, _ := antlr.NewFileStream("filename") // error ignored for brevity
    in := antlr.NewCaseChangingStream(input, true) // true forces upper case symbols, false forces lower case.
    lexer := parser.NewAbnfLexer(in)
    

    While writing this, I found other people have written their own similar implementations (go, java). It makes sense to place this in the core, so everyone can use it.

    I would love for the grammar to have an option that says the lexer should upper/lower case all input, and then this code could be moved into the generated Lexer, and no user would need to explicitly use a CaseChangingStream (similar to what's discussed in #1002).
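The core idea of a case-changing stream can be sketched without the ANTLR runtime: the lexer's lookahead sees a case-folded symbol, while the underlying text is left untouched so token text keeps its original spelling. The `CaseChangingSketch` class below is hypothetical and only mimics a lookahead call over a `String`; it is not the `CaseChangingStream` proposed in this PR.

```java
final class CaseChangingSketch {
    static final int EOF = -1;

    /** Returns the case-folded code point at index, or EOF past the end of the input. */
    static int lookAhead(String input, int index, boolean upper) {
        if (index < 0 || index >= input.length()) {
            return EOF;
        }
        int codePoint = input.codePointAt(index);
        return upper ? Character.toUpperCase(codePoint) : Character.toLowerCase(codePoint);
    }

    public static void main(String[] args) {
        String input = "Select";
        StringBuilder seenByLexer = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            seenByLexer.appendCodePoint(lookAhead(input, i, true));
        }
        // The lexer matches against the folded symbols; the original text is unchanged.
        System.out.println(seenByLexer + " <- " + input);
    }
}
```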

    comp:runtime lexers target:go target:java target:javascript 
    opened by bramp 68
  • [CSharp] #2021 fixes nuget packaging options to avoid missing dll exceptions

    @ericvergnaud Hi, I modified the csproj options a bit, and I can now build a working NuGet package locally without the issue we described in #2021. I added .NET 3.5 as a target to the "main" csproj alongside netstandard, since that makes it easier to keep track of the requirements for both sets of APIs when editing code and, ideally, both targets can be packed into a NuGet package with a single command. Right now that is possible only on Windows, via msbuild /t:pack or Visual Studio. Unfortunately, due to https://github.com/Microsoft/msbuild/issues/1333, dotnet build pack currently does not work for the .NET 3.5 target the way it should, so I adjusted the existing script to create packages from .nuspec and different solutions for different targets.

    comp:build target:csharp 
    opened by listerenko 68
  • A few updates to the Unicode documentation.

    It should be made clear that the recommended use of CharStreams.fromPath() is a Java-only solution. The other targets just have their ANTLRInputStream class extended to support full Unicode.

    comp:doc 
    opened by mike-lischke 61
  • [Python] Fix index when combining inserts

    prevInserts list is the subset of rewrites, so indices taken from enumerating its elements do not correspond to the correct indices in rewrites. Use prevIop.instructionIndex instead, which matches the other runtimes.
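The mismatch can be illustrated with a small standalone sketch. The names `prevInserts` and `instructionIndex` come from the description above; the `Op` class and everything else here are hypothetical stand-ins, written in Java rather than taken from the Python runtime.

```java
import java.util.ArrayList;
import java.util.List;

final class SubsetIndexSketch {
    /** Hypothetical stand-in for a rewrite operation. */
    static final class Op {
        final int instructionIndex; // position of this op in the full rewrites list
        final boolean isInsert;

        Op(int instructionIndex, boolean isInsert) {
            this.instructionIndex = instructionIndex;
            this.isInsert = isInsert;
        }
    }

    /** Four ops: replace, insert, replace, insert. */
    static List<Op> sampleRewrites() {
        List<Op> rewrites = new ArrayList<>();
        rewrites.add(new Op(0, false));
        rewrites.add(new Op(1, true));
        rewrites.add(new Op(2, false));
        rewrites.add(new Op(3, true));
        return rewrites;
    }

    /** Buggy approach: positions within the filtered subset, not within rewrites. */
    static List<Integer> indicesFromEnumeration(List<Op> rewrites) {
        List<Op> prevInserts = new ArrayList<>();
        for (Op op : rewrites) {
            if (op.isInsert) {
                prevInserts.add(op);
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < prevInserts.size(); i++) {
            result.add(i);
        }
        return result;
    }

    /** Fixed approach: each op carries its own index into rewrites. */
    static List<Integer> indicesFromInstruction(List<Op> rewrites) {
        List<Integer> result = new ArrayList<>();
        for (Op op : rewrites) {
            if (op.isInsert) {
                result.add(op.instructionIndex);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(indicesFromEnumeration(sampleRewrites()));
        System.out.println(indicesFromInstruction(sampleRewrites()));
    }
}
```

In the sample, enumerating the subset yields positions 0 and 1, which point at the wrong entries of `rewrites`, while `instructionIndex` yields 1 and 3.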

    opened by JamesHutch 0
  • Go target for Antlr tool, type ",int8" => "int8"

    The reserved word table for the Go target contains a typo here. The string ",int8" should be "int8".

    opened by kaby76 0
  • Use of "expressions" in a parser grammar for Dart target causes compiler errors

    This seems like an Antlr4 4.9.2 bug with Dart code generation. When I use "expressions" in grammars-v4/sql/plsql/PlSqlParser.g4, the generated code from Dart contains two methods for "expressions()" with different return values. I can work around the problem by renaming "expressions" to "expressions_" in the grammar.

    opened by kaby76 0
  • Fail to serialize NoViableAltException

    Attempting to serialize a NoViableAltException, or an Exception whose cause chain contains an instance of NoViableAltException fails because the ctx member (RuleContext) from parent class RecognitionException is not serializable.

    Need to make the ANTLR classes that can be included in ANTLR exceptions serializable, or need to make the members of these classes in ANTLR exceptions transient.

    public static class GSContext extends ParserRuleContext {
    ...
    }
    
    Caused by: java.io.NotSerializableException: c.n.j.j.parser.antlr.JP$GSContext
    	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
    	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    	at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:441)
    	at java.lang.Throwable.writeObject(Throwable.java:1024)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1155)
    	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
    	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    	at c.n.j.d.s.s.jse.writeRes(jse.java:880)
    	at c.n.j.d.s.s.jse.writeScriptRes(jse.java:994)
    	at c.n.j.d.s.s.jse.main(jse.java:944)
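
The second option mentioned above, marking the member transient, can be sketched with standard Java serialization alone. All class names below are hypothetical stand-ins: `Context` plays the role of a non-serializable rule context, and the two exception classes are not ANTLR's.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

final class TransientCauseSketch {
    /** Stand-in for a context class that does not implement Serializable. */
    static final class Context { }

    /** Holds the context in an ordinary field, so serialization fails. */
    static final class BrokenException extends RuntimeException {
        final Context ctx = new Context();
    }

    /** Marks the context transient, so it is skipped during serialization. */
    static final class FixedException extends RuntimeException {
        final transient Context ctx = new Context();
    }

    /** Attempts Java serialization and reports whether it succeeded. */
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            return false; // NotSerializableException is an IOException
        }
    }

    public static void main(String[] args) {
        System.out.println("broken: " + canSerialize(new BrokenException()));
        System.out.println("fixed:  " + canSerialize(new FixedException()));
    }
}
```

The trade-off is that a transient field is null after deserialization, so code that reads the context from a deserialized exception would have to tolerate that.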
    
    opened by jbnas 0
  • ExternalAntlr4Cpp: Mitigate build problems with restricted path lengths and zip files

    This deals with #3189 and fixes a problem with specifying a zip file of the Cpp runtime from the website (e.g. https://www.antlr.org/download/antlr4-cpp-runtime-4.9.2-source.zip ).

    Given that you've turned the utfcpp tests off in a more recent commit there may no longer be critical path length problems, but it may be better to be safe than sorry.

    Accept or close this as you will -- I'll answer questions and do fixes as requested.

    opened by skef 0
  • "EMPTY" is a reserved word in CSharp target

    As the title says, EMPTY is a reserved symbol in grammars for the C# target (and likely in the alt tool/runtime Antlr4cs). That's because it's defined in ParserRuleContext here. If you define EMPTY as a lexer symbol in your grammar, the compiler will give warnings. For further information, see this issue. The workaround is to just rename EMPTY to EMPTY_ in your grammar. (I use trrename to bulk rename such conflicts quickly.)

    opened by kaby76 8
  • Crash in Dart runtime, bitset, for sql/plsql

    I've been porting grammars-v4/sql/plsql to Dart. It works much of the time, but fails for grammars/sql/plsql/examples/analyze.sql with the following error:

    Unhandled exception:
    RangeError: index
    #0      BitSet.set (package:antlr4/src/util/bit_set.dart:184)
    #1      PredictionModeExtension.getSingleViableAlt (package:antlr4/src/atn/src/parser_atn_simulator.dart:2622)
    #2      PredictionModeExtension.resolvesToJustOneViableAlt (package:antlr4/src/atn/src/parser_atn_simulator.dart:2466)
    #3      ParserATNSimulator.execATNWithFullContext (package:antlr4/src/atn/src/parser_atn_simulator.dart:682)
    #4      ParserATNSimulator.execATN (package:antlr4/src/atn/src/parser_atn_simulator.dart:485)
    #5      ParserATNSimulator.adaptivePredict (package:antlr4/src/atn/src/parser_atn_simulator.dart:371)
    #6      PlSqlParser.unit_statement (file:///c:/users/kenne/documents/github/issue-2219/sql/plsql/generated/PlSqlParser.dart:2396)
    #7      PlSqlParser.sql_script (file:///c:/users/kenne/documents/github/issue-2219/sql/plsql/generated/PlSqlParser.dart:2358)
    #8      main (file:///c:/users/kenne/documents/github/issue-2219/sql/plsql/generated/cli.dart:175)
    <asynchronous suspension>
    

    To reproduce, clone the grammars-v4 repo, cd to sql/plsql, then use trgen -t Dart, version 8.2 of trgen, and 4.9.2 Antlr tool and runtime to generate a driver for Dart. To build, type "make".

    opened by kaby76 0
  • Don't restrict supported types for clone in antlrcpp::Any

    Issue

    The antlrcpp::Any class does not properly handle storing std::string types in copy constructor/assignment. For example, this code (see #3194 )

    #include <iostream>
    #include <string>
    #include "antlr4-runtime.h"
    
    int main()
    {
        antlrcpp::Any a = std::string("hello");
        antlrcpp::Any b = a;
    
        std::cout << "a = ";
        if (a.isNotNull()) {
            if (a.is<std::string>()) {
                std::cout << a.as<std::string>() << "\n";
            }
        } else {
            std::cout << "null\n";
        }
    
        std::cout << "b = ";
        if (b.isNotNull()) {
            if (b.is<std::string>()) {
                std::cout << b.as<std::string>() << "\n";
            }
        } else {
            std::cout << "null\n";
        }
        return 0;
    }
    

    Would be expected to print (and does if changed to use std::any):

    a = hello
    b = hello
    

    However b does not get assigned the correct value from a and the above code actually prints:

    a = hello
    b = null
    

    Wrapping the std::string in a pointer like std::shared_ptr yields the expected behavior; however it seems like the condition on clone in Any is too restrictive:

        template<int N = 0, typename std::enable_if<N == N && std::is_nothrow_copy_constructible<T>::value, int>::type = 0>
        Base* clone() const {
          return new Derived<T>(value);
        }
    

    Modification in this PR

    This PR resolves #3194 by removing the template restrictions. In looking into this I noticed that Boost's Any actually has no restriction applied, which makes sense to me in terms of cloning any type to itself that is copy/move/etc constructible:

    virtual placeholder * clone() const
    {
        return new holder(held);
    }
    

    So this PR modifies clone to set no restrictions:

        Base* clone() const {
          return new Derived<T>(value);
        }
    

    I've just been testing this in my own project, which is quite small, so I'm not sure if this opens some wider issue. It seems like it should be fine, as this is how Boost's Any works. The C++ runtime test suite also looks to be passing:

    [INFO] Scanning for projects...
    [WARNING] The project org.antlr:antlr4-runtime-testsuite:jar:4.9.3-SNAPSHOT uses prerequisites which is only intended for maven-plugin projects but not for non maven-plugin projects. For such purposes you should use the maven-enforcer-plugin. See https://maven.apache.org/enforcer/enforcer-rules/requireMavenVersion.html
    [INFO] 
    [INFO] -----------------< org.antlr:antlr4-runtime-testsuite >-----------------
    [INFO] Building ANTLR 4 Runtime Tests (2nd generation) 4.9.3-SNAPSHOT
    [INFO] --------------------------------[ jar ]---------------------------------
    [INFO] 
    [INFO] --- maven-enforcer-plugin:1.2:enforce (enforce-maven) @ antlr4-runtime-testsuite ---
    [INFO] 
    [INFO] --- antlr4-maven-plugin:4.9.3-SNAPSHOT:antlr4 (default) @ antlr4-runtime-testsuite ---
    [INFO] No grammars to process
    [INFO] ANTLR 4: Processing source directory /home/will/repos/antlr4/runtime-testsuite/test
    [INFO] 
    [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ antlr4-runtime-testsuite ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] Copying 15 resources
    [INFO] Copying 1242 resources
    [INFO] 
    [INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ antlr4-runtime-testsuite ---
    [INFO] No sources to compile
    [INFO] 
    [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ antlr4-runtime-testsuite ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] Copying 203 resources
    [INFO] 
    [INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ antlr4-runtime-testsuite ---
    [INFO] Nothing to compile - all classes are up to date
    [INFO] 
    [INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ antlr4-runtime-testsuite ---
    
    -------------------------------------------------------
     T E S T S
    -------------------------------------------------------
    Running org.antlr.v4.test.runtime.cpp.TestLexerExec
    Compiler version is: clang version 10.0.0-4ubuntu1 
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    
    Building ANTLR4 C++ runtime (if necessary) at /home/will/repos/antlr4/runtime-testsuite/target/classes/Cpp
    C++ runtime build succeeded
    
    Tests run: 37, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.258 sec - in org.antlr.v4.test.runtime.cpp.TestLexerExec
    Running org.antlr.v4.test.runtime.cpp.TestParseTrees
    Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.744 sec - in org.antlr.v4.test.runtime.cpp.TestParseTrees
    Running org.antlr.v4.test.runtime.cpp.TestSemPredEvalLexer
    Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.048 sec - in org.antlr.v4.test.runtime.cpp.TestSemPredEvalLexer
    Running org.antlr.v4.test.runtime.cpp.TestCompositeLexers
    Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.514 sec - in org.antlr.v4.test.runtime.cpp.TestCompositeLexers
    Running org.antlr.v4.test.runtime.cpp.TestPerformance
    Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.145 sec - in org.antlr.v4.test.runtime.cpp.TestPerformance
    Running org.antlr.v4.test.runtime.cpp.TestFullContextParsing
    Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 60.92 sec - in org.antlr.v4.test.runtime.cpp.TestFullContextParsing
    Running org.antlr.v4.test.runtime.cpp.TestParserExec
    Tests run: 38, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 154.39 sec - in org.antlr.v4.test.runtime.cpp.TestParserExec
    Running org.antlr.v4.test.runtime.cpp.TestLexerErrors
    Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.032 sec - in org.antlr.v4.test.runtime.cpp.TestLexerErrors
    Running org.antlr.v4.test.runtime.cpp.TestListeners
    Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.649 sec - in org.antlr.v4.test.runtime.cpp.TestListeners
    Running org.antlr.v4.test.runtime.cpp.TestCompositeParsers
    Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 61.146 sec - in org.antlr.v4.test.runtime.cpp.TestCompositeParsers
    Running org.antlr.v4.test.runtime.cpp.TestSemPredEvalParser
    Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 101.993 sec - in org.antlr.v4.test.runtime.cpp.TestSemPredEvalParser
    Running org.antlr.v4.test.runtime.cpp.TestLeftRecursion
    Tests run: 98, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 401.897 sec - in org.antlr.v4.test.runtime.cpp.TestLeftRecursion
    Running org.antlr.v4.test.runtime.cpp.TestSets
    Tests run: 32, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.207 sec - in org.antlr.v4.test.runtime.cpp.TestSets
    Running org.antlr.v4.test.runtime.cpp.TestParserErrors
    Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 133.091 sec - in org.antlr.v4.test.runtime.cpp.TestParserErrors
    
    Results :
    
    Tests run: 338, Failures: 0, Errors: 0, Skipped: 0
    
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  20:18 min
    [INFO] Finished at: 2021-06-06T10:49:56-06:00
    [INFO] ------------------------------------------------------------------------
    
    opened by Twinklebear 0
  • Using explicit rule for empty production seems to cause parse fail in some cases.

    Mostly copied from https://groups.google.com/g/antlr-discussion/c/AnG5BA-OrTw.

    (github won't allow preview or text alterations (markdown?) so slight cleanup and hope for the best)

    Grammar is at end. It's hugely minimised from an original sql-like language.

    Essentially a 'select' statement takes an optional scaffold part, followed by an optional 'with' statement, followed by the 'select' statement itself. These have been reduced down to single lexemes here for simplicity.

    Optionality is done with an "empty_production" rule, which I prefer the explicitness of over just '... | ;'. It should not matter.

    So with this grammar, with this:

    scaffold
    select
    ;

    It parses ok, but comment out the 'scaffold':

    --scaffold
    select
    ;

    and you get:

    line 2:0 mismatched input 'select' expecting {'with', 'scaffold'} Stack overflow.

    Scaffold is optional, so why the problem?

    Odder yet (to me), if you add a statement with a scaffold above that statement without a scaffold (which failed previously), it then succeeds overall:

    scaffold
    select
    ;

    --scaffold <<< this failed before
    select
    ;

    Which parses ok.

    From playing about it seems the optionality of the 'with' statement interferes with the optionality of the 'scaffold' statement, but that's just an impression. AIUI they should not interfere because there's no actual ambiguity in the grammar.

    I'm using the AntlrVSIX plugin for visual studio, version 8.3, which reports the Antlr parser version as 4.9.

    grammar LDB;

    start_parse returns [LDBitems ldbis] : siX = ldb_items EOF ;

    ldb_items : ( sisX += select_statement SEMICOLON ) + ;

    select_statement : sns = opt_set_name_scaffold wctec = opt_with_CTEs_clause qe = query_expression ;

    opt_set_name_scaffold : SCAFFOLD
    | empty_production
    ;

    opt_with_CTEs_clause : WITH
    | empty_production
    ;

    query_expression : SELECT
    ;

    empty_production : ;

    SELECT : 'select' ;
    WITH : 'with' ;
    SCAFFOLD : 'scaffold' ;

    SEMICOLON : ';' ;

    fragment WUnl : ( '\r' ? ) '\n' ;

    SLCOMMENT : ( '--' .*? WUnl ) -> skip ;

    fragment ALLWSes : [ \t\r\n]+ ;

    SKIPWS : ALLWSes -> skip ;

    opened by SimonSntPeter 4
  • [C++] antlrcpp::Any incorrect behavior when storing std::string

    The antlrcpp::Any class does not properly handle storing std::string types in copy constructor/assignment. For example, this code:

    #include <iostream>
    #include <string>
    #include "antlr4-runtime.h"
    
    int main()
    {
        antlrcpp::Any a = std::string("hello");
        antlrcpp::Any b = a;
    
        std::cout << "a = ";
        if (a.isNotNull()) {
            if (a.is<std::string>()) {
                std::cout << a.as<std::string>() << "\n";
            }
        } else {
            std::cout << "null\n";
        }
    
        std::cout << "b = ";
        if (b.isNotNull()) {
            if (b.is<std::string>()) {
                std::cout << b.as<std::string>() << "\n";
            }
        } else {
            std::cout << "null\n";
        }
        return 0;
    }
    

    Would be expected to print (and does if changed to use std::any):

    a = hello
    b = hello
    

    However b does not get assigned the correct value from a and the above code actually prints:

    a = hello
    b = null
    

    Wrapping the std::string in a pointer like std::shared_ptr yields the expected behavior; however it seems like the condition on clone in Any is too restrictive:

        template<int N = 0, typename std::enable_if<N == N && std::is_nothrow_copy_constructible<T>::value, int>::type = 0>
        Base* clone() const {
          return new Derived<T>(value);
        }
    

    In looking into this I noticed that Boost's Any actually has no restriction applied, which makes sense to me in terms of cloning any type to itself that is copy/move/etc constructible:

    virtual placeholder * clone() const
    {
        return new holder(held);
    }
    

    So could clone in Any just become this?

        Base* clone() const {
          return new Derived<T>(value);
        }
    

    I changed this in my local build and it fixes the original issue, but I'm not sure whether it has potential side effects, since I'm testing in a small codebase. If it seems OK, I can make this change and open a PR.

    opened by Twinklebear 2
Releases(4.9.2)
Owner
Antlr Project
The Project organization for the ANTLR parser generator.