Antlr4.Runtime.net35

This is an input stream that is loaded from a file all at once when you construct the object. It vacuums all input from a reader or stream and then treats it like a char[] buffer. A string or char[] can also be passed in directly.

If you need a particular encoding, pass in a stream or reader that was constructed with the correct encoding.
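For example, a minimal sketch (using the Java runtime's parallel ANTLRInputStream API, to match the Java-style examples elsewhere in this document; the file name is hypothetical):

            // Decode the file as UTF-8 via the reader; the stream then
            // buffers the entire contents into a char[] up front.
            Reader reader = new InputStreamReader(new FileInputStream("input.txt"), StandardCharsets.UTF_8);
            CharStream input = new ANTLRInputStream(reader);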

A source of characters for an ANTLR lexer.

A simple stream of symbols whose values are represented as integers. This interface provides marked ranges with support for a minimum level of buffering necessary to implement arbitrary lookahead during prediction. For more information on marked ranges, see mark().

Initializing Methods: Some methods in this interface have unspecified behavior if no call to an initializing method has occurred after the stream was constructed. The following is a list of initializing methods:

  • LA
  • consume
  • size

Consumes the current symbol in the stream. This method has the following effects:
  • Forward movement: The value of index() before calling this method is less than the value of index() after calling this method.
  • Ordered lookahead: The value of LA(1) before calling this method becomes the value of LA(-1) after calling this method.
Note that calling this method does not guarantee that index() is incremented by exactly 1, as that would preclude the ability to implement filtering streams (e.g. CommonTokenStream, which distinguishes between "on-channel" and "off-channel" tokens).
Throws an exception if an attempt is made to consume the end of the stream (i.e. if LA(1)==EOF before calling consume).
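A minimal sketch of this contract, in the style of the other examples in this interface (IntStream.EOF is the end-of-stream sentinel):

            IntStream stream = ...;
            while (stream.LA(1) != IntStream.EOF) {
                int before = stream.index();
                stream.consume();
                // Filtering streams may skip symbols, so the index can
                // advance by more than one per call to consume().
                assert stream.index() > before;
            }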
Gets the value of the symbol at offset i from the current position. When i==1 , this method returns the value of the current symbol in the stream (which is the next symbol to be consumed). When i==-1 , this method returns the value of the previously read symbol in the stream. It is not valid to call this method with i==0 , but the specific behavior is unspecified because this method is frequently called from performance-critical code.

This method is guaranteed to succeed if any of the following are true:

  • i>0
  • i==-1 and index() returns a value greater than the value of index() after the stream was constructed and LA(1) was called in that order. Specifying the current index() relative to the index after the stream was created allows for filtering implementations that do not return every symbol from the underlying source. Specifying the call to LA(1) allows for lazily initialized streams.
  • LA(i) refers to a symbol consumed within a marked region that has not yet been released.

If i represents a position at or beyond the end of the stream, this method returns EOF.

The return value is unspecified if i<0 and fewer than -i calls to consume() have occurred from the beginning of the stream before calling this method.

Throws an exception if the stream does not support retrieving the value of the specified symbol.
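For example (a sketch assuming the Java runtime's CharStreams factory):

            CharStream chars = CharStreams.fromString("ab");
            int a = chars.LA(1);     // 'a', the current symbol (next to be consumed)
            chars.consume();
            int prev = chars.LA(-1); // 'a', the previously read symbol
            int b = chars.LA(1);     // 'b'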
A mark provides a guarantee that seek() operations will be valid over a "marked range" extending from the index where mark() was called to the current index() . This allows the use of streaming input sources by specifying the minimum buffering requirements to support arbitrary lookahead during prediction.

The returned mark is an opaque handle (type int ) which is passed to release() when the guarantees provided by the marked range are no longer necessary. When calls to mark() / release() are nested, the marks must be released in reverse order of which they were obtained. Since marked regions are used during performance-critical sections of prediction, the specific behavior of invalid usage is unspecified (i.e. a mark is not released, or a mark is released twice, or marks are not released in reverse order from which they were created).

The behavior of this method is unspecified if no call to an initializing method has occurred after this stream was constructed.

This method does not change the current position in the input stream.

The following example shows the use of mark() , release(mark) , index() , and seek(index) as part of an operation to safely work within a marked region, then restore the stream position to its original value and release the mark.

            IntStream stream = ...;
            int index = -1;
            int mark = stream.mark();
            try {
                index = stream.index();
                // perform work here...
            }
            finally {
                if (index != -1) {
                    stream.seek(index);
                }
                stream.release(mark);
            }
An opaque marker which should be passed to release() when the marked range is no longer required.
This method releases a marked range created by a call to mark() . Calls to release() must appear in the reverse order of the corresponding calls to mark() . If a mark is released twice, or if marks are not released in reverse order of the corresponding calls to mark() , the behavior is unspecified.

For more information and an example, see mark().

A marker returned by a call to mark() .
Set the input cursor to the position indicated by index . If the specified index lies past the end of the stream, the operation behaves as though index was the index of the EOF symbol. After this method returns without throwing an exception, at least one of the following will be true.
  • index() will return the index of the first symbol appearing at or after the specified index . Specifically, implementations which filter their sources should automatically adjust index forward the minimum amount required for the operation to target a non-ignored symbol.
  • LA(1) returns EOF
This operation is guaranteed to not throw an exception if index lies within a marked region. For more information on marked regions, see mark(). The behavior of this method is unspecified if no call to an initializing method has occurred after this stream was constructed.
The absolute index to seek to. Throws an exception if index is less than 0, or if the stream does not support seeking to the specified index.
Return the index into the stream of the input symbol referred to by LA(1) .

The behavior of this method is unspecified if no call to an initializing method has occurred after this stream was constructed.

Returns the total number of symbols in the stream, including a single EOF symbol. Throws an exception if the size of the stream is unknown.

Gets the name of the underlying symbol source. This method returns a non-null, non-empty string. If such a name is not known, this method returns UNKNOWN_SOURCE_NAME.

This method returns the text for a range of characters within this input stream. It is guaranteed to not throw an exception if the specified interval lies entirely within a marked range. For more information about marked ranges, see mark(). Parameter: an interval within the stream. Returns the text of the specified interval. Throws an exception if interval is null; if interval.a < 0, or if interval.b < interval.a - 1, or if interval.b lies at or past the end of the stream; or if the stream does not support getting the text of the specified interval.

Implementation notes for the buffered character stream: the data being scanned; how many characters are actually in the buffer; the 0..n-1 index into the string of the next char; the name or source of this char stream. One constructor copies data in a string to a local char array; the char[] constructor is preferred for strings, as no data is copied. Reset the stream so that it is in the same state it was when the object was created, *except* the data array is not touched. mark/release do nothing, because we have the entire buffer. Seeking forward works by calling consume() ahead until p==index; we can't just set p=index, as we must update line and charPositionInLine. If we seek backwards, we just set p. index() returns the current input symbol index 0..n, where n indicates the last symbol has been read; the index is the index of the char to be returned from LA(1).

Author: Sam Harwell.

An ATN transition between any two ATN states. Subclasses define atom, set, epsilon, action, predicate, and rule transitions.

This is a one way link. It emanates from a state (usually via a list of transitions) and has a target state.

Since we never have to change the ATN transitions once we construct it, we can fix these transitions as specific classes. The DFA transitions on the other hand need to update the labels as it adds transitions to the states. We'll use the term Edge for the DFA to distinguish them from ATN transitions.

The target of this transition.

Determines if the transition is an "epsilon" transition.

The default implementation returns false .

true if traversing this transition in the ATN does not consume an input symbol; otherwise, false if traversing this transition consumes (matches) an input symbol.
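A sketch of how a caller might use this flag while walking a state's outgoing transitions (getNumberOfTransitions and transition are the Java runtime's ATNState accessors):

            ATNState state = ...;
            for (int i = 0; i < state.getNumberOfTransitions(); i++) {
                Transition t = state.transition(i);
                if (t.isEpsilon()) {
                    // traversed without consuming an input symbol
                } else {
                    // traversing this edge matches (consumes) an input symbol
                }
            }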
This class represents profiling event information for an ambiguity. Ambiguities are decisions where a particular input resulted in an SLL conflict, followed by LL prediction also reaching a conflict state (indicating a true ambiguity in the grammar).

This event may be reported during SLL prediction in cases where the conflicting SLL configuration set provides sufficient information to determine that the SLL conflict is truly an ambiguity. For example, if none of the ATN configurations in the conflicting SLL configuration set have traversed a global follow transition (i.e. the reaches-into-outer-context flag is false for all configurations), then the result of SLL prediction for that input is known to be equivalent to the result of LL prediction for that input.

In some cases, the minimum represented alternative in the conflicting LL configuration set is not equal to the minimum represented alternative in the conflicting SLL configuration set. Grammars and inputs which result in this scenario are unable to use the SLL prediction mode, which in turn means they cannot use the two-stage parsing strategy to improve parsing performance for that input.

Since: 4.3.
This is the base class for gathering detailed information about prediction events which occur during parsing. Since: 4.3.

The invoked decision number which this event is related to. The simulator state containing additional information relevant to the prediction state when the current event occurred, or null if no additional information is relevant or available. The input token stream which is being parsed. The token index in the input stream at which the current prediction was originally invoked. The token index in the input stream at which the current event occurred. true if the current event occurred during LL prediction; otherwise, false if the event occurred during SLL prediction.

Constructs a new instance of the class with the specified detailed ambiguity information. Parameters: the decision number; the final simulator state identifying the ambiguous alternatives for the current input; the input token stream; the start index for the current prediction; the index at which the ambiguity was identified during prediction.

Stores the computed hash code of this prediction context. The hash code is computed in parts to match the following reference algorithm.
            private int referenceHashCode() {
                int hash = MurmurHash.initialize(INITIAL_HASH);

                for (int i = 0; i < size(); i++) {
                    hash = MurmurHash.update(hash, getParent(i));
                }

                for (int i = 0; i < size(); i++) {
                    hash = MurmurHash.update(hash, getReturnState(i));
                }

                hash = MurmurHash.finish(hash, 2 * size());
                return hash;
            }
Each subrule/rule is a decision point, and we must track them so we can go back later and build DFA predictors for them. This includes all the rules, subrules, optional blocks, ()+, ()*, etc.

Maps from rule index to starting state number. Maps from rule index to stop state number. The type of the ATN. The maximum value for any symbol recognized by a transition in the ATN.

For lexer ATNs, this maps the rule index to the resulting token type. For parser ATNs, this maps the rule index to the generated bypass token type if the corresponding deserialization option was specified; otherwise, this is null. For lexer ATNs, this is an array of LexerAction objects which may be referenced by action transitions in the ATN. Used for runtime deserialization of ATNs from strings.

Compute the set of valid tokens that can occur starting in state s. If ctx is null, the set of tokens will not include what can follow the rule surrounding s; in other words, the set will be restricted to tokens reachable staying within s's rule.

Compute the set of valid tokens that can occur starting in s and staying in the same rule. Token.EPSILON is in the set if we reach the end of the rule.

Computes the set of input symbols which could follow ATN state number stateNumber in the specified full context. This method considers the complete parser context, but does not evaluate semantic predicates (i.e. all predicates encountered during the calculation are assumed true). If a path in the ATN exists from the starting state to the rule stop state of the outermost context without matching any symbols, Token.EOF is added to the returned set.

If context is null , it is treated as ParserRuleContext.EMPTY.

Parameters: the ATN state number; the full parse context. Returns the set of potentially valid input symbols which could follow the specified state in the specified context. Throws an exception if the ATN does not contain a state with number stateNumber.
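As a usage sketch (Java runtime accessors; useful, for example, when building custom error messages):

            Parser parser = ...;
            IntervalSet expected = parser.getATN().getExpectedTokens(parser.getState(), parser.getContext());
            System.out.println("expecting one of: " + expected.toString(parser.getVocabulary()));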
A tuple: (ATN state, predicted alt, syntactic context, semantic context). The syntactic context is a graph-structured stack node whose path(s) to the root is the rule invocation(s) chain used to arrive at the state. The semantic context is the tree of semantic predicates encountered before reaching an ATN state.

The ATN state associated with this configuration. The stack of invoking states leading to the rule/states associated with this config; we track only those contexts pushed during execution of the ATN simulator.

An ATN configuration is equal to another if both have the same state, they predict the same alternative, and syntactic/semantic contexts are the same.

Gets the ATN state associated with this configuration. What alt (or lexer rule) is predicted by this configuration.

We cannot execute predicates dependent upon local context unless we know for sure we are in the correct context. Because there is no way to do this efficiently, we simply cannot evaluate dependent predicates unless we are in the rule that initially invokes the ATN simulator. closure() tracks the depth of how far we dip into the outer context: depth > 0. Note that it may not be a totally accurate depth since I don't ever decrement. TODO: make it a boolean then

Author: Sam Harwell.

This maps (state, alt) -> merged ATNConfig. The key does not account for the semantic context of the value, which is only a problem if a single ATNConfigSet contains two configs with the same state and alternative but different semantic contexts. When this case arises, the first config added to this map stays, and the remaining configs are placed in the overflow list described below.

This map is only used for optimizing the process of adding configs to the set, and is null for read-only sets stored in the DFA.

This is an "overflow" list holding configs which cannot be merged with one of the configs in but have a colliding key. This occurs when two configs in the set have the same state and alternative but different semantic contexts.

This list is only used for optimizing the process of adding configs to the set, and is null for read-only sets stored in the DFA.

This is a list of all configs in this set.

When true , this config set represents configurations where the entire outer context has been consumed by the ATN interpreter. This prevents the ATN simulator from pursuing the global FOLLOW when a rule stop state is reached with an empty prediction context.

Note: outermostConfigSet and dipsIntoOuterContext should never be true at the same time.

Get the set of all alternatives represented by configurations in this set.

Author: Sam Harwell.

This is the earliest supported serialized UUID. This UUID indicates an extension of the base serialized UUID for the addition of lexer actions encoded as a sequence of LexerAction instances. This list contains all of the currently supported UUIDs, ordered by when the feature first appeared in this branch. This is the current serialized UUID.

Determines if a particular serialized representation of an ATN supports a particular feature, identified by the UUID used for serializing the ATN at the time the feature was first introduced. Parameters: the UUID marking the first time the feature was supported in the serialized ATN; the UUID of the actual serialized ATN which is currently being deserialized. Returns true if the actualUuid value represents a serialized ATN at or after the feature identified by feature was introduced; otherwise, false .

Analyze the states in the specified ATN to set the precedence-decision field to the correct value. Parameter: the ATN.

Must distinguish between a missing edge and an edge we know leads nowhere.

Clear the DFA cache used by the current instance. Since the DFA cache may be shared by multiple ATN simulators, this method may affect the performance (but not accuracy) of other parsers which are being used concurrently. Throws an exception if the current instance does not support clearing the DFA. Since: 4.3.

The following images show the relation of states and transitions for various grammar constructs.
  • Solid edges marked with an ε indicate a required EpsilonTransition.
  • Dashed edges indicate locations where any transition derived from Transition might appear.
  • Dashed nodes are placeholders for either a sequence of linked states or the inclusion of a block representing a nested construct in one of the forms below.
  • Nodes showing multiple outgoing alternatives with a ... support any number of alternatives (one or more). Nodes without the ... only support the exact number of alternatives shown in the diagram.

Basic Blocks

Rule

Block of 1 or more alternatives

Greedy Loops

Greedy Closure: (...)*

Greedy Positive Closure: (...)+

Greedy Optional: (...)?

Non-Greedy Loops

Non-Greedy Closure: (...)*?

Non-Greedy Positive Closure: (...)+?

Non-Greedy Optional: (...)??

Which ATN are we in? Track the transitions emanating from this ATN state. Used to cache lookahead during parsing; not used during construction.

Gets the state number. Returns the state number. For all states except stop states, this returns the state number; returns -1 for stop states.

Represents the type of recognizer an ATN applies to. Author: Sam Harwell.

TODO: make all transitions sets? no, should remove set edges. The token type or character value; or, a sentinel value signifying a special label. Author: Sam Harwell.

The start of a regular (...) block. Author: Sam Harwell.

Terminal node of a simple (a|b|c) block.

This class represents profiling event information for a context sensitivity. Context sensitivities are decisions where a particular input resulted in an SLL conflict, but LL prediction produced a single unique alternative.

In some cases, the unique alternative identified by LL prediction is not equal to the minimum represented alternative in the conflicting SLL configuration set. Grammars and inputs which result in this scenario are unable to use the SLL prediction mode, which in turn means they cannot use the two-stage parsing strategy to improve parsing performance for that input.

Since: 4.3.
Constructs a new instance of the class with the specified detailed context sensitivity information. Parameters: the decision number; the final simulator state containing the unique alternative identified by full-context prediction; the input token stream; the start index for the current prediction; the index at which the context sensitivity was identified during full-context prediction.

This class contains profiling information gathered for a particular decision.

Parsing performance in ANTLR 4 is heavily influenced by both static factors (e.g. the form of the rules in the grammar) and dynamic factors (e.g. the choice of input and the state of the DFA cache at the time profiling operations are started). For best results, gather and use aggregate statistics from a large sample of inputs representing the inputs expected in production before using the results to make changes in the grammar.

Since: 4.3.
The decision number, which is an index into the ATN's decision-to-state map. The total number of times adaptivePredict was invoked for this decision. The total time spent in adaptivePredict for this decision, in nanoseconds.

The value of this field is computed using a nanosecond-resolution timer, and is not adjusted to compensate for JIT and/or garbage collection overhead. For best accuracy, perform profiling in a separate process which is warmed up by parsing the input prior to profiling. If desired, call clearDFA() to reset the DFA cache to its initial state before starting the profiling measurement pass.
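A sketch of collecting these measurements with the Java runtime's profiling hooks (prog is a hypothetical start rule):

            parser.setProfile(true);
            parser.prog();
            for (DecisionInfo decision : parser.getParseInfo().getDecisionInfo()) {
                if (decision.timeInPrediction > 0) {
                    System.out.println(decision);
                }
            }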

The sum of the lookahead required for SLL prediction for this decision. Note that SLL prediction is used before LL prediction for performance reasons, even when PredictionMode.LL or PredictionMode.LL_EXACT_AMBIG_DETECTION is used.

Gets the minimum lookahead required for any single SLL prediction to complete for this decision, by reaching a unique prediction, reaching an SLL conflict state, or encountering a syntax error. Gets the maximum lookahead required for any single SLL prediction to complete for this decision, by reaching a unique prediction, reaching an SLL conflict state, or encountering a syntax error. Gets the lookahead event associated with the event where the maximum SLL lookahead value was set.

The sum of the lookahead required for LL prediction for this decision. Note that LL prediction is only used when SLL prediction reaches a conflict state.

Gets the minimum lookahead required for any single LL prediction to complete for this decision. Gets the maximum lookahead required for any single LL prediction to complete for this decision. An LL prediction completes when the algorithm reaches a unique prediction, a conflict state (for PredictionMode.LL), an ambiguity state (for PredictionMode.LL_EXACT_AMBIG_DETECTION), or a syntax error. Gets the lookahead event associated with the event where the maximum LL lookahead value was set.

A collection of ContextSensitivityInfo instances describing the context sensitivities encountered during LL prediction for this decision. A collection of ErrorInfo instances describing the parse errors identified during calls to adaptivePredict for this decision. A collection of AmbiguityInfo instances describing the ambiguities encountered during LL prediction for this decision. A collection of PredicateEvalInfo instances describing the results of evaluating individual predicates during prediction for this decision.

The total number of ATN transitions required during SLL prediction for this decision. An ATN transition is determined by the number of times the DFA does not contain an edge that is required for prediction, resulting in on-the-fly computation of that edge.

If DFA caching of SLL transitions is employed by the implementation, ATN computation may cache the computed edge for efficient lookup during future parsing of this decision. Otherwise, the SLL parsing algorithm will use ATN transitions exclusively.

The total number of DFA transitions required during SLL prediction for this decision.

If the ATN simulator implementation does not use DFA caching for SLL transitions, this value will be 0.

Gets the total number of times SLL prediction completed in a conflict state, resulting in fallback to LL prediction.

Note that this value is not related to whether or not PredictionMode.SLL may be used successfully with a particular grammar. If the ambiguity resolution algorithm applied to the SLL conflicts for this decision produces the same result as LL prediction for this decision, PredictionMode.SLL would produce the same overall parsing result as PredictionMode.LL.

The total number of ATN transitions required during LL prediction for this decision. An ATN transition is determined by the number of times the DFA does not contain an edge that is required for prediction, resulting in on-the-fly computation of that edge.

If DFA caching of LL transitions is employed by the implementation, ATN computation may cache the computed edge for efficient lookup during future parsing of this decision. Otherwise, the LL parsing algorithm will use ATN transitions exclusively.

The total number of DFA transitions required during LL prediction for this decision.

If the ATN simulator implementation does not use DFA caching for LL transitions, this value will be 0.

Constructs a new instance of the class to contain statistics for a particular decision. Parameter: the decision number.

This class represents profiling event information for a syntax error identified during prediction. Syntax errors occur when the prediction algorithm is unable to identify an alternative which would lead to a successful parse. Since: 4.3. Constructs a new instance of the class with the specified detailed syntax error information. Parameters: the decision number; the final simulator state reached during prediction prior to reaching the ERROR state; the input token stream; the start index for the current prediction; the index at which the syntax error was identified.

Represents a single action which can be executed following the successful match of a lexer rule. Lexer actions are used for both embedded action syntax and ANTLR 4's new lexer command syntax. Author: Sam Harwell. Since: 4.2.

Execute the lexer action in the context of the specified Lexer.

For position-dependent actions, the input stream must already be positioned correctly prior to calling this method.

The lexer instance.
Gets the serialization type of the lexer action. Returns the serialization type of the lexer action. Gets whether the lexer action is position-dependent. Position-dependent actions may have different semantics depending on the input index at the time the action is executed.

Many lexer commands, including type , skip , and more , do not check the input index during their execution. Actions like this are position-independent, and may be stored more efficiently as part of the lexer ATN configuration.

true if the lexer action semantics can be affected by the position of the input at the time it is executed; otherwise, false .
Represents an executor for a sequence of lexer actions which were traversed during the matching operation of a lexer rule (token).

The executor tracks position information for position-dependent lexer actions efficiently, ensuring that actions appearing only at the end of the rule do not cause bloating of the DFA created for the lexer.

Author: Sam Harwell. Since: 4.2.
Caches the result of hashCode(), since the hash code is an element of the performance-critical append operation. Constructs an executor for a sequence of actions. Parameter: the lexer actions to execute.

Creates a LexerActionExecutor which executes the actions for the input lexerActionExecutor followed by a specified lexerAction . Parameters: the executor for actions already traversed by the lexer while matching a token within a particular ATN configuration (if this is null , the method behaves as though it were an empty executor); the lexer action to execute after the actions specified in lexerActionExecutor . Returns a LexerActionExecutor for executing the combined actions of lexerActionExecutor and lexerAction .

Creates a LexerActionExecutor which encodes the current offset for position-dependent lexer actions.

Normally, when the executor encounters lexer actions where isPositionDependent returns true , it calls seek() on the input stream to set the input position to the end of the current token. This behavior provides for efficient DFA representation of lexer actions which appear at the end of a lexer rule, even when the lexer rule matches a variable number of characters.

Prior to traversing a match transition in the ATN, the current offset from the token start index is assigned to all position-dependent lexer actions which have not already been assigned a fixed offset. By storing the offsets relative to the token start index, the DFA representation of lexer actions which appear in the middle of tokens remains efficient due to sharing among tokens of the same length, regardless of their absolute position in the input stream.

If the current executor already has offsets assigned to all position-dependent lexer actions, the method returns this.

Parameter: the current offset to assign to all position-dependent lexer actions which do not already have offsets assigned. Returns a LexerActionExecutor which stores input stream offsets for all position-dependent lexer actions.
Execute the actions encapsulated by this executor within the context of a particular Lexer.

This method calls seek() on the input stream to set the position of the input prior to calling execute() on a position-dependent action. Before the method returns, the input position will be restored to the same position it was in when the method was invoked.

Parameters: the lexer instance; the input stream which is the source for the current token (when this method is called, the current position of input should be the start of the following token, i.e. 1 character past the end of the current token); the token start index (this value may be passed to seek() to set the input position to the beginning of the token).
Gets the lexer actions to be executed by this executor. Returns the lexer actions to be executed by this executor. Author: Sam Harwell. Since: 4.2.

"dup" of ParserInterpreter. The current token's starting index into the character stream. Shared across DFA to ATN simulation in case the ATN fails and the DFA did not have a previous accept state; in this case, we use the ATN-generated exception object. Line number 1..n within the input. The index of the character relative to the beginning of the line, 0..n-1. Used during DFA/ATN exec to record the most recent accept configuration info.

Get an existing target state for an edge in the DFA. If the target state for the edge has not yet been computed or is otherwise not available, this method returns null . Parameters: the current DFA state; the next input symbol. Returns the existing target DFA state for the given input symbol t , or null if the target state for this edge is not already cached.

Compute a target state for an edge in the DFA, and attempt to add the computed state and corresponding edge to the DFA. Parameters: the input stream; the current DFA state; the next input symbol. Returns the computed target DFA state for the given input symbol t ; if t does not lead to a valid DFA state, this method returns the ERROR state.

Given a starting configuration set, figure out all ATN configurations we can reach upon input t . Parameter reach is a return parameter. Since the alternatives within any lexer decision are ordered by preference, this method stops pursuing the closure as soon as an accept state is reached: after the first accept state is reached by depth-first search from config , all other (potentially reachable) states for this rule would have a lower priority. Returns true if an accept state is reached, otherwise false .

Evaluate a predicate specified in the lexer.

If speculative is true , this method was called before consume() for the matched character. This method should call consume() before evaluating the predicate to ensure position-sensitive values, including the current text, line, and character position in line, properly reflect the current lexer state. This method should restore input and the simulator to the original state before returning (i.e. undo the actions made by the call to consume()).

Parameters: the input stream; the rule containing the predicate; the index of the predicate within the rule; speculative, true if the current index in input is one character before the predicate's location. Returns true if the specified predicate evaluates to true .
Add a new DFA state if there isn't one with this set of configurations already. This method also detects the first configuration containing an ATN rule stop state. Later, when traversing the DFA, we will know which rule to accept.

Get the text matched so far for the current token.

When we hit an accept state in either the DFA or the ATN, we have to notify the character stream to start buffering characters via mark() and record the current state. The current sim state includes the current index into the input, the current line, and the current character position in that line. Note that the Lexer is tracking the starting line and character position of the token. These variables track the "state" of the simulator when it hits an accept state.

We track these variables separately for the DFA and ATN simulation because the DFA simulation often has to fail over to the ATN simulation. If the ATN simulation fails, we need the DFA to fall back to its previously accepted state, if any. If the ATN succeeds, then the ATN does the accept and the DFA simulator that invoked it can simply return the predicted token type.

Implements the channel lexer action by calling Lexer.setChannel with the assigned channel. Author: Sam Harwell. Since: 4.2. Constructs a new channel action with the specified channel value. Parameter: the channel value to pass to Lexer.setChannel.

This action is implemented by calling Lexer.setChannel with the value provided by getChannel.
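For example, this is the action generated for a lexer command such as WS : [ \t\r\n]+ -> channel(HIDDEN) ; (HIDDEN is the runtime's predefined hidden channel).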

Gets the channel to use for the Token created by the lexer. Returns the channel to use for the Token created by the lexer. The getActionType method returns LexerActionType.CHANNEL. The isPositionDependent method returns false .

Executes a custom lexer action by calling Recognizer.action with the rule and action indexes assigned to the custom action. The implementation of a custom action is added to the generated code for the lexer in an override of Recognizer.action when the grammar is compiled.

This class may represent embedded actions created with the {...} syntax in ANTLR 4, as well as actions created for lexer commands where the command argument could not be evaluated when the grammar was compiled.

Sam Harwell 4.2
Constructs a custom lexer action with the specified rule and action indexes. Parameters: the rule index to use for calls to Recognizer.action; the action index to use for calls to Recognizer.action.

Custom actions are implemented by calling Recognizer.action with the appropriate rule and action indexes.

Gets the rule index to use for calls to Recognizer.action. Returns the rule index for the custom action. Gets the action index to use for calls to Recognizer.action. Returns the action index for the custom action. The getActionType method returns LexerActionType.CUSTOM. Gets whether the lexer action is position-dependent. Position-dependent actions may have different semantics depending on the input index at the time the action is executed.

Custom actions are position-dependent since they may represent a user-defined embedded action which makes calls to methods like Lexer.getText.

This method returns true.
This implementation of LexerAction is used for tracking input offsets for position-dependent actions within a LexerActionExecutor.

This action is not serialized as part of the ATN, and is only required for position-dependent lexer actions which appear at a location other than the end of a rule. For more information about DFA optimizations employed for lexer actions, see LexerActionExecutor.append and LexerActionExecutor.fixOffsetBeforeMatch.

Author: Sam Harwell. Since: 4.2.
Constructs a new indexed custom action by associating a character offset with a LexerAction.

Note: This class is only required for lexer actions for which isPositionDependent returns true .

Parameters: the offset into the input stream, relative to the token start index, at which the specified lexer action should be executed; the lexer action to execute at a particular offset in the input stream.

This method calls execute on the result of getAction using the provided lexer .

Gets the location in the input stream at which the lexer action should be executed. The value is interpreted as an offset relative to the token start index. Returns the location in the input at which the lexer action should be executed. Gets the lexer action to execute. Returns a LexerAction object which executes the lexer action. The getActionType method returns the result of calling getActionType on the LexerAction returned by getAction. The isPositionDependent method returns true .

Implements the mode lexer action by calling Lexer.mode with the assigned mode. Author: Sam Harwell. Since: 4.2. Constructs a new mode action with the specified mode value. Parameter: the mode value to pass to Lexer.mode.

This action is implemented by calling Lexer.mode with the value provided by getMode.

Get the lexer mode this action should transition the lexer to. Returns the lexer mode for this mode command. The getActionType method returns LexerActionType.MODE. The isPositionDependent method returns false .

Implements the more lexer action by calling Lexer.more.

The more command does not have any parameters, so this action is implemented as a singleton instance exposed by INSTANCE.

Author: Sam Harwell. Since: 4.2.
Provides a singleton instance of this parameterless lexer action. Constructs the singleton instance of the lexer more command.

This action is implemented by calling Lexer.more.

The getActionType method returns LexerActionType.MORE. The isPositionDependent method returns false .

Implements the popMode lexer action by calling Lexer.popMode.

The popMode command does not have any parameters, so this action is implemented as a singleton instance exposed by INSTANCE.

Author: Sam Harwell. Since: 4.2.
Provides a singleton instance of this parameterless lexer action. Constructs the singleton instance of the lexer popMode command.

This action is implemented by calling Lexer.popMode.

The getActionType method returns LexerActionType.POP_MODE. The isPositionDependent method returns false .

Implements the pushMode lexer action by calling Lexer.pushMode with the assigned mode. Author: Sam Harwell. Since: 4.2. Constructs a new pushMode action with the specified mode value. Parameter: the mode value to pass to Lexer.pushMode.

This action is implemented by calling Lexer.pushMode with the value provided by getMode.

Get the lexer mode this action should transition the lexer to. Returns the lexer mode for this pushMode command. The getActionType method returns LexerActionType.PUSH_MODE. The isPositionDependent method returns false .

Implements the skip lexer action by calling Lexer.skip.

The skip command does not have any parameters, so this action is implemented as a singleton instance exposed by INSTANCE.

Author: Sam Harwell. Since: 4.2.
Provides a singleton instance of this parameterless lexer action. Constructs the singleton instance of the lexer skip command.

This action is implemented by calling Lexer.skip.

The getActionType method returns LexerActionType.SKIP. The isPositionDependent method returns false .

Implements the type lexer action by calling Lexer.setType with the assigned type. Author: Sam Harwell. Since: 4.2. Constructs a new type action with the specified token type value. Parameter: the type to assign to the token using Lexer.setType.

This action is implemented by calling Lexer.setType with the value provided by getType.

Gets the type to assign to a token created by the lexer. Returns the type to assign to a token created by the lexer. The getActionType method returns LexerActionType.TYPE. The isPositionDependent method returns false .

Special value added to the lookahead sets to indicate that we hit a predicate during analysis if seeThruPreds==false .

Calculates the SLL(1) expected lookahead set for each outgoing transition of an ATNState. The returned array has one element for each outgoing transition in s . If the closure from transition i leads to a semantic predicate before matching a symbol, the element at index i of the result will be null . Parameter: the ATN state. Returns the expected symbols for each outgoing transition of s .

Compute the set of tokens that can follow s in the ATN in the specified ctx .

If ctx is null and the end of the rule containing s is reached, Token.EPSILON is added to the result set. If ctx is not null and the end of the outermost rule is reached, Token.EOF is added to the result set.

Parameters: the ATN state; the complete parser context, or null if the context should be ignored. Returns the set of tokens that can follow s in the ATN in the specified ctx .
Compute the set of tokens that can follow s in the ATN in the specified ctx .

If ctx is null and the end of the rule containing s is reached, Token.EPSILON is added to the result set. If ctx is not PredictionContext.EMPTY_LOCAL and the end of the outermost rule is reached, Token.EOF is added to the result set.

Parameters: the ATN state; the ATN state to stop at (this can be a BlockEndState to detect epsilon paths through a closure); the complete parser context, or null if the context should be ignored. Returns the set of tokens that can follow s in the ATN in the specified ctx .
Compute the set of tokens that can follow s in the ATN in the specified ctx .

If ctx is PredictionContext.EMPTY_LOCAL and stopState or the end of the rule containing s is reached, Token.EPSILON is added to the result set. If ctx is not PredictionContext.EMPTY_LOCAL and addEOF is true and stopState or the end of the outermost rule is reached, Token.EOF is added to the result set.

Parameters: the ATN state; the ATN state to stop at (this can be a BlockEndState to detect epsilon paths through a closure); the outer context, or PredictionContext.EMPTY_LOCAL if the outer context should not be used; the result lookahead set; a set used for preventing epsilon closures in the ATN from causing a stack overflow (outside code should pass new HashSet<ATNConfig> for this argument); a set used for preventing left recursion in the ATN from causing a stack overflow (outside code should pass new BitSet() for this argument); seeThruPreds, true to treat semantic predicates as implicitly true and "see through them", otherwise false to treat semantic predicates as opaque and add HIT_PRED to the result if one is encountered; addEOF, whether to add Token.EOF to the result if the end of the outermost context is reached (this parameter has no effect if ctx is PredictionContext.EMPTY_LOCAL).
This class represents profiling event information for tracking the lookahead depth required in order to make a prediction. Since: 4.3. Constructs a new instance of the class with the specified detailed lookahead information. Parameters: the decision number; the final simulator state containing the necessary information to determine the result of a prediction, or null if the final state is not available; the input token stream; the start index for the current prediction; the index at which the prediction was finally made; true if the current lookahead is part of an LL prediction, otherwise false if the current lookahead is part of an SLL prediction.

Mark the end of a * or + loop.

A transition containing a set of values. Author: Sam Harwell.

This class provides access to specific and aggregate statistics gathered during profiling of a parser. Since: 4.3.

Gets the decision numbers for decisions that required one or more full-context predictions during parsing. These are decisions for which the LL fallback count is non-zero. Returns a list of decision numbers which required one or more full-context predictions during parsing.

Gets the total time spent during prediction across all decisions made during parsing; this value is the sum of timeInPrediction for all decisions. Gets the total number of SLL lookahead operations across all decisions made during parsing; this value is the sum of SLL_TotalLook for all decisions. Gets the total number of LL lookahead operations across all decisions made during parsing; this value is the sum of LL_TotalLook for all decisions. Gets the total number of ATN lookahead operations for SLL prediction across all decisions made during parsing. Gets the total number of ATN lookahead operations for LL prediction across all decisions made during parsing. Gets the total number of ATN lookahead operations for SLL and LL prediction across all decisions made during parsing.

This value is the sum of the SLL and LL ATN lookahead operation counts.

Gets the total number of DFA states stored in the DFA cache for all decisions in the ATN. Gets the total number of DFA states stored in the DFA cache for a particular decision. Gets an array of DecisionInfo instances containing the profiling information gathered for each decision in the ATN. Returns an array of DecisionInfo instances, indexed by decision number.

The embodiment of the adaptive LL(*), ALL(*), parsing strategy.

The basic complexity of the adaptive strategy makes it harder to understand. We begin with ATN simulation to build paths in a DFA. Subsequent prediction requests go through the DFA first. If they reach a state without an edge for the current symbol, the algorithm fails over to the ATN simulation to complete the DFA path for the current input (until it finds a conflict state or uniquely predicting state).

All of that is done without using the outer context because we want to create a DFA that is not dependent upon the rule invocation stack when we do a prediction. One DFA works in all contexts. We avoid using context not necessarily because it's slower, although it can be, but because of the DFA caching problem. The closure routine only considers the rule invocation stack created during prediction beginning in the decision rule. For example, if prediction occurs without invoking another rule's ATN, there are no context stacks in the configurations. When lack of context leads to a conflict, we don't know if it's an ambiguity or a weakness in the strong LL(*) parsing strategy (versus full LL(*)).

When SLL yields a configuration set with conflict, we rewind the input and retry the ATN simulation, this time using full outer context without adding to the DFA. Configuration context stacks will be the full invocation stacks from the start rule. If we get a conflict using full context, then we can definitively say we have a true ambiguity for that input sequence. If we don't get a conflict, it implies that the decision is sensitive to the outer context. (It is not context-sensitive in the sense of context-sensitive grammars.)

The next time we reach this DFA state with an SLL conflict, through DFA simulation, we will again retry the ATN simulation using full context mode. This is slow because we can't save the results and have to "interpret" the ATN each time we get that input.

CACHING FULL CONTEXT PREDICTIONS

We could cache results from full context to predicted alternative easily and that saves a lot of time but doesn't work in presence of predicates. The set of visible predicates from the ATN start state changes depending on the context, because closure can fall off the end of a rule. I tried to cache tuples (stack context, semantic context, predicted alt) but it was slower than interpreting and much more complicated. Also required a huge amount of memory. The goal is not to create the world's fastest parser anyway. I'd like to keep this algorithm simple. By launching multiple threads, we can improve the speed of parsing across a large number of files.

There is no strict ordering between the amount of input used by SLL vs LL, which makes it really hard to build a cache for full context. Let's say that we have input A B C that leads to an SLL conflict with full context X. That implies that using X we might only use A B but we could also use A B C D to resolve conflict. Input A B C D could predict alternative 1 in one position in the input and A B C E could predict alternative 2 in another position in input. The conflicting SLL configurations could still be non-unique in the full context prediction, which would lead us to requiring more input than the original A B C. To make a prediction cache work, we have to track the exact input used during the previous prediction. That amounts to a cache that maps X to a specific DFA for that context.

Something should be done for left-recursive expression predictions. They are likely LL(1) + pred eval. Easier to do the whole SLL unless error and retry with full LL thing Sam does.

AVOIDING FULL CONTEXT PREDICTION

We avoid doing full context retry when the outer context is empty, we did not dip into the outer context by falling off the end of the decision state rule, or when we force SLL mode.

As an example of the "not dip into outer context" case, consider super constructor calls versus function calls. One grammar might look like this:

            ctorBody
            : '{' superCall? stat* '}'
            ;
            

Or, you might see something like

            stat
            : superCall ';'
            | expression ';'
            | ...
            ;
            

In both cases I believe that no closure operations will dip into the outer context. In the first case ctorBody in the worst case will stop at the '}'. In the 2nd case it should stop at the ';'. Both cases should stay within the entry rule and not dip into the outer context.

PREDICATES

Predicates are always evaluated if present, in both SLL and LL. SLL and LL simulation deal with predicates differently. SLL collects predicates as it performs closure operations, like ANTLR v3 did. It delays predicate evaluation until it reaches an accept state. This allows us to cache the SLL ATN simulation whereas, if we had evaluated predicates on-the-fly during closure, the DFA state configuration sets would be different and we couldn't build up a suitable DFA.

When building a DFA accept state during ATN simulation, we evaluate any predicates and return the sole semantically valid alternative. If there is more than 1 alternative, we report an ambiguity. If there are 0 alternatives, we throw an exception. Alternatives without predicates act like they have true predicates. The simple way to think about it is to strip away all alternatives with false predicates and choose the minimum alternative that remains.
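For example, if an accept state's configurations predict alternatives {1, 2, 3} and only alternative 1 has a predicate, which evaluates to false, then alternatives 2 and 3 remain and alternative 2, the minimum, is predicted.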

When we start in the DFA and reach an accept state that's predicated, we test those and return the minimum semantically viable alternative. If no alternatives are viable, we throw an exception.

During full LL ATN simulation, closure always evaluates predicates on-the-fly. This is crucial to reducing the configuration set size during closure. It hits a landmine when parsing with the Java grammar, for example, without this on-the-fly evaluation.

SHARING DFA

All instances of the same parser share the same decision DFAs through a static field. Each instance gets its own ATN simulator but they share the same decision-to-DFA array. They also share a prediction context cache that makes sure all prediction context objects are shared among the DFA states. This makes a big size difference.

THREAD SAFETY

The ATN simulator locks on the decision-to-DFA field when it adds a new DFA object to that array. addDFAEdge locks on the DFA for the current decision when setting the edges field. addDFAState locks on the DFA for the current decision when looking up a DFA state to see if it already exists. We must make sure that all requests to add DFA states that are equivalent result in the same shared DFA object, because lots of threads will be trying to update the DFA at once. The addDFAState method also locks inside the DFA lock, but this time on the shared context cache when it rebuilds the configurations' prediction context objects using cached subgraphs/nodes. No other locking occurs, even during DFA simulation. This is safe as long as we can guarantee that all threads referencing s.edge[t] get the same physical target DFA state, or null . Once into the DFA, the DFA simulation does not reference the DFA state map; it follows the edges field to new targets. The DFA simulator will either find the edges field to be null , to be non- null and dfa.edges[t] null, or dfa.edges[t] to be non-null. The addDFAEdge method could be racing to set the field, but in either case the DFA simulator works: if null , it requests ATN simulation. It could also race trying to get dfa.edges[t] , but either way it will work because it's not doing a test-and-set operation.

Starting with SLL, then failing over to combined SLL/LL (Two-Stage Parsing)

Sam pointed out that if SLL does not give a syntax error, then there is no point in doing full LL, which is slower. We only have to try LL if we get a syntax error. For maximum speed, Sam starts the parser in pure SLL mode with the BailErrorStrategy:

            parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
            parser.setErrorHandler(new BailErrorStrategy());

If it does not get a syntax error, then we're done. If it does get a syntax error, we need to retry with the combined SLL/LL strategy.

The reason this works is as follows. If there are no SLL conflicts, then the grammar is SLL (at least for that input set). If there is an SLL conflict, the full LL analysis must yield a set of viable alternatives which is a subset of the alternatives reported by SLL. If the LL set is a singleton, then the grammar is LL but not SLL. If the LL set is the same size as the SLL set, the decision is SLL. If the LL set has size > 1, then that decision is truly ambiguous on the current input. If the LL set is smaller, then the SLL conflict resolution might choose an alternative that the full LL would rule out as a possibility based upon better context information. If that's the case, then the SLL parse will definitely get an error because the full LL analysis says it's not viable. If SLL conflict resolution chooses an alternative within the LL set, then both SLL and LL would choose the same alternative because they both choose the minimum of multiple conflicting alternatives.

Let's say we have a set of SLL conflicting alternatives {1, 2, 3} and a smaller LL set called s. If s is {2, 3}, then SLL parsing will get an error because SLL will pursue alternative 1. If s is {1, 2} or {1, 3}, then both SLL and LL will choose the same alternative because alternative one is the minimum of either set. If s is {2} or {3}, then SLL will get a syntax error. If s is {1}, then SLL will succeed.

Of course, if the input is invalid, then we will get an error for sure in both SLL and LL parsing. Erroneous input will therefore require 2 passes over the input.
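A sketch of the two-stage pattern (Java; MyParser and its entry rule file() are hypothetical placeholders for a generated parser, while PredictionMode, BailErrorStrategy, and ParseCancellationException are the Java runtime's names):

            import org.antlr.v4.runtime.*;
            import org.antlr.v4.runtime.atn.PredictionMode;
            import org.antlr.v4.runtime.misc.ParseCancellationException;

            class TwoStageParse {
                static ParserRuleContext parse(MyParser parser, CommonTokenStream tokens) {
                    parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
                    parser.setErrorHandler(new BailErrorStrategy());
                    try {
                        return parser.file(); // stage 1: pure SLL, bails on first error
                    }
                    catch (ParseCancellationException e) {
                        tokens.seek(0);       // rewind the input
                        parser.reset();
                        parser.setErrorHandler(new DefaultErrorStrategy());
                        parser.getInterpreter().setPredictionMode(PredictionMode.LL);
                        return parser.file(); // stage 2: full LL, reports real errors
                    }
                }
            }

Valid input never pays the LL price; only erroneous input (or a decision that is not SLL) triggers the second pass.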

Determines whether the DFA is used for full-context predictions. When true , the DFA stores transition information for both full-context and SLL parsing; otherwise, the DFA only stores SLL transition information.

For some grammars, enabling the full-context DFA can result in a substantial performance improvement. However, this improvement typically comes at the expense of memory used for storing the cached DFA states, configuration sets, and prediction contexts.

The default value is false .

When true , ambiguous alternatives are reported as they are encountered during prediction. When false , these messages are suppressed. The default is false .

When messages about ambiguous alternatives are not required, setting this to false enables additional internal optimizations which may lose this information.

By default we do full context-sensitive LL(*) parsing, not Strong LL(*) parsing. If we fail with Strong LL(*), we try full LL(*). That means we rewind and use context information when closure operations fall off the end of the rule that holds the decision we're evaluating. Testing only!

Performs ATN simulation to compute a predicted alternative based upon the remaining input, but also updates the DFA cache to avoid having to traverse the ATN again for the same input sequence. There are some key conditions we're looking for after computing a new set of ATN configs (proposed DFA state):

  • if the set is empty, there is no viable alternative for the current symbol
  • does the state uniquely predict an alternative?
  • does the state have a conflict that would prevent us from putting it on the work list?
  • if in a non-greedy decision, is there a config at a rule stop state?

We also have some key operations to do:

  • add an edge from the previous DFA state to the potentially new DFA state, D, upon the current symbol, but only if adding to the work list, which means in all cases except no viable alternative (and possibly non-greedy decisions?)
  • collecting predicates and adding semantic context to DFA accept states
  • adding rule context to context-sensitive DFA accept states
  • consuming an input symbol
  • reporting a conflict
  • reporting an ambiguity
  • reporting a context sensitivity
  • reporting insufficient predicates

We should isolate those operations, which are side-effecting, to the main work loop. We can isolate lots of code into other functions, but those functions should be side-effect free. They can return a package that indicates whether we should report something, whether we need to add a DFA edge, and whether we need to augment the accept state with semantic context or rule invocation context. Actually, it seems like we always add predicates if they exist, so that can simply be done in the main loop for any accept state creation or modification request. Cover these cases: dead end; single alt; single alt + preds; conflict; conflict + preds. TODO: greedy + those.

This method is used to improve the localization of error messages by choosing an alternative rather than throwing a NoViableAltException in particular prediction scenarios where the error state was reached during ATN simulation.

The default implementation of this method uses the following algorithm to identify an ATN configuration which successfully parsed the decision entry rule. Choosing such an alternative ensures that the returned by the calling rule will be complete and valid, and the syntax error will be reported later at a more localized location.

  • If no configuration in configs reached the end of the decision rule, return .
  • If all configurations in configs which reached the end of the decision rule predict the same alternative, return that alternative.
  • If the configurations in configs which reached the end of the decision rule predict multiple alternatives (call this S), choose an alternative in the following order.
    1. Filter the configurations in configs to only those configurations which remain viable after evaluating semantic predicates. If the set of these filtered configurations which also reached the end of the decision rule is not empty, return the minimum alternative represented in this set.
    2. Otherwise, choose the minimum alternative in S.

In some scenarios, the algorithm described above could predict an alternative which will result in a FailedPredicateException in the parser. Specifically, this could occur if the only configuration capable of successfully parsing to the end of the decision rule is blocked by a semantic predicate. By choosing this alternative here instead of throwing a NoViableAltException, the resulting FailedPredicateException in the parser will identify the specific predicate which is preventing the parser from successfully parsing the decision rule, which helps developers identify and correct logic errors in semantic predicates.

The input The start index for the current prediction, which is the input index where any semantic context in configs should be evaluated The ATN simulation state immediately before the state was reached The value to return from , or if a suitable alternative was not identified and should report an error instead.
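The selection order in the list above compresses into a short sketch (Java-runtime types; safeAlt is a hypothetical name, and the shipped implementation differs in detail):

            import org.antlr.v4.runtime.Parser;
            import org.antlr.v4.runtime.ParserRuleContext;
            import org.antlr.v4.runtime.atn.ATN;
            import org.antlr.v4.runtime.atn.ATNConfig;
            import org.antlr.v4.runtime.atn.ATNConfigSet;
            import org.antlr.v4.runtime.atn.RuleStopState;

            class SafeAltSketch {
                static int safeAlt(ATNConfigSet configs, Parser parser, ParserRuleContext outerCtx) {
                    int minStopAlt = Integer.MAX_VALUE;       // any config at a rule stop state
                    int minViableStopAlt = Integer.MAX_VALUE; // ...whose predicate also passes
                    for (ATNConfig c : configs) {
                        if (!(c.state instanceof RuleStopState)) continue; // didn't reach rule end
                        minStopAlt = Math.min(minStopAlt, c.alt);
                        if (c.semanticContext.eval(parser, outerCtx)) {
                            minViableStopAlt = Math.min(minViableStopAlt, c.alt);
                        }
                    }
                    if (minViableStopAlt != Integer.MAX_VALUE) return minViableStopAlt;
                    if (minStopAlt != Integer.MAX_VALUE) return minStopAlt;
                    return ATN.INVALID_ALT_NUMBER; // nothing reached the end of the decision rule
                }
            }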
Get an existing target state for an edge in the DFA. If the target state for the edge has not yet been computed or is otherwise not available, this method returns null . The current DFA state The next input symbol The existing target DFA state for the given input symbol t , or null if the target state for this edge is not already cached Compute a target state for an edge in the DFA, and attempt to add the computed state and corresponding edge to the DFA. The current DFA state The next input symbol The computed target DFA state for the given input symbol t . If t does not lead to a valid DFA state, this method returns . Return a configuration set containing only the configurations from configs which are in a rule stop state. If all configurations in configs are already in a rule stop state, this method simply returns configs . the configuration set to update the cache configs if all configurations in configs are in a rule stop state, otherwise return a new configuration set containing only the configurations from configs which are in a rule stop state This method transforms the start state computed by to the special start state used by a precedence DFA for a particular precedence value. The transformation process applies the following changes to the start state's configuration set.
  1. Evaluate the precedence predicates for each configuration using .
  2. Remove all configurations which predict an alternative greater than 1, for which another configuration that predicts alternative 1 is in the same ATN state with the same prediction context. This transformation is valid for the following reasons:
    • The closure block cannot contain any epsilon transitions which bypass the body of the closure, so all states reachable via alternative 1 are part of the precedence alternatives of the transformed left-recursive rule.
    • The "primary" portion of a left recursive rule cannot contain an epsilon transition, so the only way an alternative other than 1 can exist in a state that is also reachable via alternative 1 is by nesting calls to the left-recursive rule, with the outer calls not being at the preferred precedence level.

The prediction context must be considered by this filter to address situations like the following.

            grammar TA;
            prog: statement* EOF;
            statement: letterA | statement letterA 'b' ;
            letterA: 'a';
            

In the above grammar, the ATN state immediately before the token reference 'a' in letterA is reachable from the left edge of both the primary and closure blocks of the left-recursive rule statement . The prediction context associated with each of these configurations distinguishes between them, and prevents the alternative which stepped out to prog (and then back in to statement ) from being eliminated by the filter.

The configuration set computed by as the start state for the DFA. The transformed configuration set representing the start state for a precedence DFA at a particular precedence level (determined by calling ).
collect and set D's semantic context. Look through a list of predicate/alt pairs, returning alts for the pairs that win. A null predicate indicates an alt containing an unpredicated config which behaves as "always true." Evaluate a semantic context within a specific parser context.

This method might not be called for every semantic context evaluated during the prediction process. In particular, we currently do not evaluate the following but it may change in the future:

  • Precedence predicates (represented by ) are not currently evaluated through this method.
  • Operator predicates (represented by and ) are evaluated as a single semantic context, rather than evaluating the operands individually. Implementations which require evaluation results from individual predicates should override this method to explicitly handle evaluation of the operands within operator predicates; a sketch of such an override follows below.
The semantic context to evaluate The parser context in which to evaluate the semantic context The alternative which is guarded by pred 4.3
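A hedged sketch of such an override, using the Java runtime's ParserATNSimulator (the evalSemanticContext signature and the SemanticContext.AND opnds field are that runtime's names; treat this as illustrative, not the shipped implementation):

            import org.antlr.v4.runtime.Parser;
            import org.antlr.v4.runtime.ParserRuleContext;
            import org.antlr.v4.runtime.atn.ATN;
            import org.antlr.v4.runtime.atn.ParserATNSimulator;
            import org.antlr.v4.runtime.atn.PredictionContextCache;
            import org.antlr.v4.runtime.atn.SemanticContext;
            import org.antlr.v4.runtime.dfa.DFA;

            class OperandAwareSimulator extends ParserATNSimulator {
                OperandAwareSimulator(Parser parser, ATN atn, DFA[] decisionToDFA,
                                      PredictionContextCache cache) {
                    super(parser, atn, decisionToDFA, cache);
                }

                @Override
                protected boolean evalSemanticContext(SemanticContext pred,
                                                      ParserRuleContext parserCallStack,
                                                      int alt, boolean fullCtx) {
                    if (pred instanceof SemanticContext.AND) {
                        // Evaluate each operand separately, e.g. to log which one fails.
                        for (SemanticContext opnd : ((SemanticContext.AND) pred).opnds) {
                            if (!evalSemanticContext(opnd, parserCallStack, alt, fullCtx)) {
                                return false;
                            }
                        }
                        return true;
                    }
                    return super.evalSemanticContext(pred, parserCallStack, alt, fullCtx);
                }
            }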
See comment on LexerInterpreter.addDFAState. If context-sensitive parsing, we know it's an ambiguity, not a conflict. 4.3 Start of (A|B|...)+ loop. Technically a decision state, but we don't use it for code generation; somebody might need it, so I'm defining it for completeness. In reality, the node is the real decision-making node for A+ . Decision state for A+ and (A|B)+ . It has two transitions: one to loop back to the start of the block and one to exit. Sam Harwell This class represents profiling event information for semantic predicate evaluations which occur during prediction. 4.3 The semantic context which was evaluated. The alternative number for the decision which is guarded by the semantic context . Note that other ATN configurations may predict the same alternative which are guarded by other semantic contexts and/or . The result of evaluating the semantic context . Constructs a new instance of the class with the specified detailed predicate evaluation information. The simulator state The decision number The input token stream The start index for the current prediction The index at which the predicate evaluation was triggered. Note that the input stream may be reset to other positions for the actual evaluation of individual predicates. The semantic context which was evaluated The results of evaluating the semantic context The alternative number for the decision which is guarded by the semantic context semctx . See for more information. TODO: this is an old comment: A tree of semantic predicates from the grammar AST if label==SEMPRED. In the ATN, labels will always be exactly one predicate, but the DFA may have to combine a bunch of them as it collects predicates from multiple ATN configurations into a single DFA state. Used to cache objects. It is used for the shared context cache associated with contexts in DFA states. This cache can be used for both lexers and parsers. Sam Harwell This enumeration defines the prediction modes available in ANTLR 4 along with utility methods for analyzing configuration sets for conflicts and/or ambiguities. The SLL(*) prediction mode. This prediction mode ignores the current parser context when making predictions. This is the fastest prediction mode, and it provides correct results for many grammars. This prediction mode is more powerful than the prediction mode provided by ANTLR 3, but it may result in syntax errors for grammar and input combinations which are not SLL.

When using this prediction mode, the parser will either return a correct parse tree (i.e. the same parse tree that would be returned with the prediction mode), or it will report a syntax error. If a syntax error is encountered when using the prediction mode, it may be due to either an actual syntax error in the input or indicate that the particular combination of grammar and input requires the more powerful prediction abilities to complete successfully.

This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.

The LL(*) prediction mode. This prediction mode allows the current parser context to be used for resolving SLL conflicts that occur during prediction. This is the fastest prediction mode that guarantees correct parse results for all combinations of grammars with syntactically correct inputs.

When using this prediction mode, the parser will make correct decisions for all syntactically-correct grammar and input combinations. However, in cases where the grammar is truly ambiguous this prediction mode might not report a precise answer for exactly which alternatives are ambiguous.

This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.

The LL(*) prediction mode with exact ambiguity detection. In addition to the correctness guarantees provided by the prediction mode, this prediction mode instructs the prediction algorithm to determine the complete and exact set of ambiguous alternatives for every ambiguous decision encountered while parsing.

This prediction mode may be used for diagnosing ambiguities during grammar development. Due to the performance overhead of calculating sets of ambiguous alternatives, this prediction mode should be avoided when the exact results are not necessary.

This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.

Computes the SLL prediction termination condition.

This method computes the SLL prediction termination condition for both of the following cases.

  • The usual SLL+LL fallback upon SLL conflict
  • Pure SLL without LL fallback

COMBINED SLL+LL PARSING

When LL-fallback is enabled upon SLL conflict, correct predictions are ensured regardless of how the termination condition is computed by this method. Due to the substantially higher cost of LL prediction, the prediction should only fall back to LL when the additional lookahead cannot lead to a unique SLL prediction.

Assuming combined SLL+LL parsing, an SLL configuration set with only conflicting subsets should fall back to full LL, even if the configuration sets don't resolve to the same alternative (e.g. {1,2} and {3,4}). If there is at least one non-conflicting configuration, SLL could continue with the hopes that more lookahead will resolve via one of those non-conflicting configurations.

Here's the prediction termination rule, then: SLL (for SLL+LL parsing) stops when it sees only conflicting configuration subsets. In contrast, full LL keeps going when there is uncertainty.

HEURISTIC

As a heuristic, we stop prediction when we see any conflicting subset unless we see a state that only has one alternative associated with it. The single-alt-state thing lets prediction continue upon rules like (otherwise, it would admit defeat too soon):

s : (ID | ID ID?) ';' ;

When the ATN simulation reaches the state before ';' , it has a DFA state that looks like: [12|1|[], 6|2|[], 12|2|[]] . Naturally 12|1|[] and 12|2|[] conflict, but we cannot stop processing this node because alternative two has another way to continue, via [6|2|[]] .

It also lets us continue for this rule:

a : A | A | A B ;

After matching input A, we reach the DFA state [1|1|[], 1|2|[], 8|3|[]] : we have reached the stop state for rule A, state 1, and state 8 is the state right before B. Clearly alternatives 1 and 2 conflict and no amount of further lookahead will separate the two. However, alternative 3 will be able to continue, so we do not stop working on this state. In the previous example, we were concerned with states associated with the conflicting alternatives. Here alt 3 is not associated with the conflicting configs, but since we can reasonably continue looking at input, we don't declare the state done.

PURE SLL PARSING

To handle pure SLL parsing, all we have to do is make sure that we combine stack contexts for configurations that differ only by semantic predicate. From there, we can do the usual SLL termination heuristic.

PREDICATES IN SLL+LL PARSING

SLL decisions don't evaluate predicates until after they reach DFA stop states because they need to create the DFA cache that works in all semantic situations. In contrast, full LL evaluates predicates collected during start state computation so it can ignore predicates thereafter. This means that SLL termination detection can totally ignore semantic predicates.

Implementation-wise, combines stack contexts but not semantic predicate contexts so we might see two configurations like the following.

{(s, 1, x, {}), (s, 1, x', {p})}

Before testing these configurations against others, we have to merge x and x' (without modifying the existing configurations). For example, we test (x+x')==x'' when looking for conflicts in the following configurations.

{(s, 1, x, {}), (s, 1, x', {p}), (s, 2, x'', {})}

If the configuration set has predicates (as indicated by ), this algorithm makes a copy of the configurations to strip out all of the predicates so that a standard will merge everything ignoring predicates.

Checks if any configuration in configs is in a rule stop state. Configurations meeting this condition have reached the end of the decision rule (local context) or the end of the start rule (full context). the configuration set to test true if any configuration in configs is in a rule stop state, otherwise false Checks if all configurations in configs are in a rule stop state. Configurations meeting this condition have reached the end of the decision rule (local context) or the end of the start rule (full context). the configuration set to test true if all configurations in configs are in a rule stop state, otherwise false Full LL prediction termination.

Can we stop looking ahead during ATN simulation or is there some uncertainty as to which alternative we will ultimately pick, after consuming more input? Even if there are partial conflicts, we might know that everything is going to resolve to the same minimum alternative. That means we can stop since no more lookahead will change that fact. On the other hand, there might be multiple conflicts that resolve to different minimums. That means we need more look ahead to decide which of those alternatives we should predict.

The basic idea is to split the set of configurations C into conflicting subsets (s, _, ctx, _) and singleton subsets with non-conflicting configurations. Two configurations conflict if they have identical state and context values but a different alternative value, e.g. (s, i, ctx, _) and (s, j, ctx, _) for i!=j .

Reduce these configuration subsets to the set of possible alternatives. You can compute the alternative subsets in one pass as follows:

A_s,ctx = {i | (s, i, ctx, _)} for each configuration in C holding s and ctx fixed.

Or in pseudo-code, for each configuration c in C :

            map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred

The values in map are the set of A_s,ctx sets.

If |A_s,ctx|=1 then there is no conflict associated with s and ctx .

Reduce the subsets to singletons by choosing a minimum of each subset. If the union of these alternative subsets is a singleton, then no amount of more lookahead will help us. We will always pick that alternative. If, however, there is more than one alternative, then we are uncertain which alternative to predict and must continue looking for resolution. We may or may not discover an ambiguity in the future, even if there are no conflicting subsets this round.

The biggest sin is to terminate early because it means we've made a decision but were uncertain as to the eventual outcome. We haven't used enough lookahead. On the other hand, announcing a conflict too late is no big deal; you will still have the conflict. It's just inefficient. It might even consume lookahead until the end of the file.

No special consideration for semantic predicates is required because predicates are evaluated on-the-fly for full LL prediction, ensuring that no configuration contains a semantic context during the termination check.

CONFLICTING CONFIGS

Two configurations (s, i, x) and (s, j, x') , conflict when i!=j but x=x' . Because we merge all (s, i, _) configurations together, that means that there are at most n configurations associated with state s for n possible alternatives in the decision. The merged stacks complicate the comparison of configuration contexts x and x' . Sam checks to see if one is a subset of the other by calling merge and checking to see if the merged result is either x or x' . If the x associated with lowest alternative i is the superset, then i is the only possible prediction since the others resolve to min(i) as well. However, if x is associated with j>i then at least one stack configuration for j is not in conflict with alternative i . The algorithm should keep going, looking for more lookahead due to the uncertainty.

For simplicity, I'm doing an equality check between x and x' that lets the algorithm continue to consume lookahead longer than necessary. The reason I like the equality is, of course, the simplicity, but also because that is the test you need to detect the alternatives that are actually in conflict.

CONTINUE/STOP RULE

Continue if union of resolved alternative sets from non-conflicting and conflicting alternative subsets has more than one alternative. We are uncertain about which alternative to predict.

The complete set of alternatives, [i for (_,i,_)] , tells us which alternatives are still in the running for the amount of input we've consumed at this point. The conflicting sets let us strip away configurations that won't lead to more states because we resolve conflicts to the configuration with a minimum alternative for the conflicting set.

CASES

  • no conflicts and more than 1 alternative in set => continue
  • (s, 1, x) , (s, 2, x) , (s, 3, z) , (s', 1, y) , (s', 2, y) yields non-conflicting set {3} U conflicting sets min({1,2}) U min({1,2}) = {1,3} => continue
  • (s, 1, x) , (s, 2, x) , (s', 1, y) , (s', 2, y) , (s'', 1, z) yields non-conflicting set {1} U conflicting sets min({1,2}) U min({1,2}) = {1} => stop and predict 1
  • (s, 1, x) , (s, 2, x) , (s', 1, y) , (s', 2, y) yields conflicting, reduced sets {1} U {1} = {1} => stop and predict 1, can announce ambiguity {1,2}
  • (s, 1, x) , (s, 2, x) , (s', 2, y) , (s', 3, y) yields conflicting, reduced sets {1} U {2} = {1,2} => continue
  • (s, 1, x) , (s, 2, x) , (s', 3, y) , (s', 4, y) yields conflicting, reduced sets {1} U {3} = {1,3} => continue

EXACT AMBIGUITY DETECTION

If all states report the same conflicting set of alternatives, then we know we have the exact ambiguity set.

|A_i|>1 and A_i = A_j for all i, j.

In other words, we continue examining lookahead until all A_i have more than one alternative and all A_i are the same. If A = {{1,2}, {1,3}} , then regular LL prediction would terminate because the resolved set is {1} . To determine what the real ambiguity is, we have to know whether the ambiguity is between one and two or one and three, so we keep going. We can only stop prediction when we need exact ambiguity detection when the sets look like A = {{1,2}} or {{1,2},{1,2}} , etc...
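A small sketch of the alternative-subset bookkeeping described above (plain Java; Config is a hypothetical stand-in for the runtime's ATNConfig, and predicates are ignored, matching the termination check):

            import java.util.Arrays;
            import java.util.BitSet;
            import java.util.Collection;
            import java.util.HashMap;
            import java.util.List;
            import java.util.Map;

            class AltSubsetSketch {
                static final class Config {
                    final int state; final int alt; final String ctx;
                    Config(int state, int alt, String ctx) {
                        this.state = state; this.alt = alt; this.ctx = ctx;
                    }
                }

                // A_{s,ctx}: group alternatives by (state, context), ignoring alt and preds.
                static Collection<BitSet> altSubsets(List<Config> configs) {
                    Map<List<Object>, BitSet> map = new HashMap<List<Object>, BitSet>();
                    for (Config c : configs) {
                        List<Object> key = Arrays.<Object>asList(c.state, c.ctx);
                        BitSet alts = map.get(key);
                        if (alts == null) { alts = new BitSet(); map.put(key, alts); }
                        alts.set(c.alt);
                    }
                    return map.values();
                }

                // Stop when the union of each subset's minimum alternative is a singleton.
                static boolean resolvesToJustOneViableAlt(Collection<BitSet> altsets) {
                    BitSet viable = new BitSet();
                    for (BitSet alts : altsets) {
                        viable.set(alts.nextSetBit(0));
                    }
                    return viable.cardinality() == 1;
                }
            }

For the fourth case above, (s, 1, x), (s, 2, x), (s', 1, y), (s', 2, y) produces the subsets {1,2} and {1,2}, whose minimums union to {1}: stop and predict 1.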

Determines if every alternative subset in altsets contains more than one alternative. a collection of alternative subsets true if every in altsets has cardinality > 1, otherwise false Determines if any single alternative subset in altsets contains exactly one alternative. a collection of alternative subsets true if altsets contains a with cardinality 1, otherwise false Determines if any single alternative subset in altsets contains more than one alternative. a collection of alternative subsets true if altsets contains a with cardinality > 1, otherwise false Determines if every alternative subset in altsets is equivalent. a collection of alternative subsets true if every member of altsets is equal to the others, otherwise false Returns the unique alternative predicted by all alternative subsets in altsets . If no such alternative exists, this method returns . a collection of alternative subsets Gets the complete set of represented alternatives for a collection of alternative subsets. This method returns the union of each in altsets . a collection of alternative subsets the set of represented alternatives in altsets This function gets the conflicting alt subsets from a configuration set. For each configuration c in configs :
            map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred
Get a map from state to alt subset from a configuration set. For each configuration c in configs :
            map[c.state] U= c.getAlt()
A Map that uses just the state and the stack context as the key. The hash code is only a function of the and . 4.3 At the point of LL failover, we record how SLL would resolve the conflict so that we can determine whether or not a decision / input pair is context-sensitive. If LL gives a different result than SLL's predicted alternative, we have a context sensitivity for sure. The converse is not necessarily true, however. It's possible that after conflict resolution chooses minimum alternatives, SLL could get the same answer as LL. Regardless of whether or not the result indicates an ambiguity, it is not treated as a context sensitivity because LL prediction was not required in order to produce a correct prediction for this decision and input sequence. It may in fact still be a context sensitivity, but we don't know by looking at the minimum alternatives for the current input. The last node in the ATN for a rule, unless that rule is the start symbol. In that case, there is one transition to EOF. Later, we might encode references to all calls to this rule to compute FOLLOW sets for error handling. Ptr to the rule definition object for this rule ref What node to begin computations following ref to rule A tree structure used to record the semantic context in which an ATN configuration is valid. It's either a single predicate, a conjunction p1&&p2 , or a sum of products p1||p2 .

I have scoped the Predicate, AND, and OR subclasses of SemanticContext within the scope of this outer class.

The default semantic context, which is semantically equivalent to a predicate of the form {true}?. For context-independent predicates, we evaluate them without a local context (i.e., null context). That way, we can evaluate them without having to create proper rule-specific context during prediction (as opposed to the parser, which creates them naturally). In a practical sense, this avoids a cast exception from RuleContext to myruleContext.

For context dependent predicates, we must pass in a local context so that references such as $arg evaluate properly as _localctx.arg. We only capture context dependent predicates in the context in which we begin prediction, so we passed in the outer context here in case of context dependent predicate evaluation.

Evaluate the precedence predicates for the context and reduce the result. The parser instance. The simplified semantic context after precedence predicates are evaluated, which will be one of the following values.
  • : if the predicate simplifies to true after precedence predicates are evaluated.
  • null : if the predicate simplifies to false after precedence predicates are evaluated.
  • this : if the semantic context is not changed as a result of precedence predicate evaluation.
  • A non- null : the new simplified semantic context after precedence predicates are evaluated.
This is the base class for semantic context "operators", which operate on a collection of semantic context "operands". 4.3 Gets the operands for the semantic context operator. a collection of operands for the operator. 4.3 A semantic context which is true whenever none of the contained contexts is false.

The evaluation of predicates by this context is short-circuiting, but unordered.

A semantic context which is true whenever at least one of the contained contexts is true.

The evaluation of predicates by this context is short-circuiting, but unordered.

Sam Harwell The block that begins a closure loop. Indicates whether this state can benefit from a precedence DFA during SLL decision making.

This is a computed property that is calculated during ATN deserialization and stored for use in and .

The Tokens rule start state linking to each lexer rule start state This implementation of responds to syntax errors by immediately canceling the parse operation with a . The implementation ensures that the field is set for all parse tree nodes that were not completed prior to encountering the error.

This error strategy is useful in the following scenarios.

  • Two-stage parsing: This error strategy allows the first stage of two-stage parsing to immediately terminate if an error is encountered, and immediately fall back to the second stage. In addition to avoiding wasted work by attempting to recover from errors here, the empty implementation of improves the performance of the first stage.
  • Silent validation: When syntax errors are not being reported or logged, and the parse result is simply ignored if errors occur, the BailErrorStrategy avoids wasting work on recovering from errors when the result will be ignored either way.

myparser.setErrorHandler(new BailErrorStrategy());

This is the default implementation of used for error reporting and recovery in ANTLR parsers. The interface for defining strategies to deal with syntax errors encountered during a parse by ANTLR-generated parsers. We distinguish between three different kinds of errors:
  • The parser could not figure out which path to take in the ATN (none of the available alternatives could possibly match)
  • The current input does not match what we were looking for
  • A predicate evaluated to false
Implementations of this interface report syntax errors by calling .

TODO: what to do about lexers

Reset the error handler state for the specified recognizer . the parser instance This method is called when an unexpected symbol is encountered during an inline match operation, such as . If the error strategy successfully recovers from the match failure, this method returns the instance which should be treated as the successful result of the match.

Note that the calling code will not report an error if this method returns successfully. The error strategy implementation is responsible for calling as appropriate.

the parser instance if the error strategy was not able to recover from the unexpected input symbol
This method is called to recover from exception e . This method is called after by the default exception handler generated for a rule method. the parser instance the recognition exception to recover from if the error strategy could not recover from the recognition exception This method provides the error handler with an opportunity to handle syntactic or semantic errors in the input stream before they result in a .

The generated code currently contains calls to after entering the decision state of a closure block ( (...)* or (...)+ ).

For an implementation based on Jim Idle's "magic sync" mechanism, see .

the parser instance if an error is detected by the error strategy but cannot be automatically recovered at the current state in the parsing process
Tests whether or not recognizer is in the process of recovering from an error. In error recovery mode, adds symbols to the parse tree by calling instead of . the parser instance true if the parser is currently recovering from a parse error, otherwise false This method is called by when the parser successfully matches an input symbol. This method is called by when the parser successfully matches an input symbol. the parser instance Report any kind of . This method is called by the default exception handler generated for a rule method. the parser instance the recognition exception to report Indicates whether the error strategy is currently "recovering from an error". Indicates whether the error strategy is currently "recovering from an error". This is used to suppress reporting multiple error messages while attempting to recover from a detected syntax error. The index into the input stream where the last error occurred. The index into the input stream where the last error occurred. This is used to prevent infinite loops where an error is found but no token is consumed during recovery...another error is found, ad nauseum. This is a failsafe mechanism to guarantee that at least one token/tree node is consumed for two errors.

The default implementation simply calls to ensure that the handler is not in error recovery mode.

This method is called to enter error recovery mode when a recognition exception is reported. the parser instance This method is called to leave error recovery mode after recovering from a recognition exception.

The default implementation simply calls .

The default implementation returns immediately if the handler is already in error recovery mode. Otherwise, it calls and dispatches the reporting task based on the runtime type of e according to the following table.

  • : Dispatches the call to
  • : Dispatches the call to
  • : Dispatches the call to
  • All other types: calls to report the exception

The default implementation resynchronizes the parser by consuming tokens until we find one in the resynchronization set--loosely the set of tokens that can follow the current rule.

The default implementation of makes sure that the current lookahead symbol is consistent with what we were expecting at this point in the ATN. You can call this anytime, but ANTLR only generates code to check before subrules/loops and each iteration.

Implements Jim Idle's magic sync mechanism in closures and optional subrules. E.g.,

            a : sync ( stuff sync )* ;
            sync : {consume to what can follow sync} ;
            
At the start of a sub rule upon error, sync performs single token deletion, if possible. If it can't do that, it bails on the current rule and uses the default error recovery, which consumes until the resynchronization set of the current rule.

If the sub rule is optional ( (...)? , (...)* , or block with an empty alternative), then the expected set includes what follows the subrule.

During loop iteration, it consumes until it sees a token that can start a sub rule or what follows the loop. Yes, that is pretty aggressive. We opt to stay in the loop as long as possible.

ORIGINS

Previous versions of ANTLR did a poor job of their recovery within loops. A single mismatched token or missing token would force the parser to bail out of the entire rule surrounding the loop. So, for rule

            classDef : 'class' ID '{' member* '}'
            
input with an extra token between members would force the parser to consume until it found the next class definition rather than the next member definition of the current class.

This functionality costs a little bit of effort because the parser has to compare the token set at the start of the loop and at each iteration. If for some reason speed is suffering for you, you can turn off this functionality by simply overriding this method as a blank { } , as in the sketch below.
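A minimal sketch of that override (Java runtime; NoSyncErrorStrategy is a hypothetical name, and DefaultErrorStrategy.sync is the method being blanked out):

            import org.antlr.v4.runtime.DefaultErrorStrategy;
            import org.antlr.v4.runtime.Parser;

            class NoSyncErrorStrategy extends DefaultErrorStrategy {
                @Override
                public void sync(Parser recognizer) {
                    // intentionally blank: skip the per-loop token-set comparison,
                    // trading recovery quality inside loops for a little speed
                }
            }

            // usage: parser.setErrorHandler(new NoSyncErrorStrategy());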

This is called by when the exception is a . the parser instance the recognition exception This is called by when the exception is an . the parser instance the recognition exception This is called by when the exception is a . the parser instance the recognition exception This method is called to report a syntax error which requires the removal of a token from the input stream. At the time this method is called, the erroneous symbol is the current LT(1) symbol and has not yet been removed from the input stream. When this method returns, recognizer is in error recovery mode.

This method is called when identifies single-token deletion as a viable recovery strategy for a mismatched input error.

The default implementation simply returns if the handler is already in error recovery mode. Otherwise, it calls to enter error recovery mode, followed by calling .

the parser instance
This method is called to report a syntax error which requires the insertion of a missing token into the input stream. At the time this method is called, the missing token has not yet been inserted. When this method returns, recognizer is in error recovery mode.

This method is called when identifies single-token insertion as a viable recovery strategy for a mismatched input error.

The default implementation simply returns if the handler is already in error recovery mode. Otherwise, it calls to enter error recovery mode, followed by calling .

the parser instance

The default implementation attempts to recover from the mismatched input by using single token insertion and deletion as described below. If the recovery attempt fails, this method throws an .

EXTRA TOKEN (single token deletion)

LA(1) is not what we are looking for. If LA(2) has the right token, however, then assume LA(1) is some extra spurious token and delete it. Then consume and return the next token (which was the LA(2) token) as the successful result of the match operation.

This recovery strategy is implemented by .

MISSING TOKEN (single token insertion)

If current token (at LA(1) ) is consistent with what could come after the expected LA(1) token, then assume the token is missing and use the parser's to create it on the fly. The "insertion" is performed by returning the created token as the successful result of the match operation.

This recovery strategy is implemented by .

EXAMPLE

For example, input i=(3; is clearly missing the ')' . When the parser returns from the nested call to expr , it will have the call chain:

            stat → expr → atom
            
and it will be trying to match the ')' at this point in the derivation:
            => ID '=' '(' INT ')' ('+' atom)* ';'
            ^
            
The attempt to match ')' will fail when it sees ';' and call . To recover, it sees that LA(1)==';' is in the set of tokens that can follow the ')' token reference in rule atom . It can assume that you forgot the ')' .
This method implements the single-token insertion inline error recovery strategy. It is called by if the single-token deletion strategy fails to recover from the mismatched input. If this method returns true , recognizer will be in error recovery mode.

This method determines whether or not single-token insertion is viable by checking if the LA(1) input symbol could be successfully matched if it were instead the LA(2) symbol. If this method returns true , the caller is responsible for creating and inserting a token with the correct type to produce this behavior.

the parser instance true if single-token insertion is a viable recovery strategy for the current mismatched input, otherwise false
This method implements the single-token deletion inline error recovery strategy. It is called by to attempt to recover from mismatched input. If this method returns null, the parser and error handler state will not have changed. If this method returns non-null, recognizer will not be in error recovery mode since the returned token was a successful match.

If the single-token deletion is successful, this method calls to report the error, followed by to actually "delete" the extraneous token. Then, before returning is called to signal a successful match.

the parser instance the successfully matched instance if single-token deletion successfully recovers from the mismatched input, otherwise null
Conjure up a missing token during error recovery. The recognizer attempts to recover from single missing symbols. But, actions might refer to that missing symbol. For example, x=ID {f($x);}. The action clearly assumes that there has been an identifier matched previously and that $x points at that token. If that token is missing, but the next token in the stream is what we want, we assume that this token is missing and we keep going. Because we have to return some token to replace the missing token, we have to conjure one up. This method gives the user control over the tokens returned for missing tokens. Mostly, you will want to create something special for identifier tokens. For literals such as '{' and ',', the default action in the parser or tree parser works. It simply creates a CommonToken of the appropriate type. The text will be the token. If you change what tokens must be created by the lexer, override this method to create the appropriate tokens.

How should a token be displayed in an error message? The default is to display just the text, but during development you might want to have a lot of information spit out. Override in that case to use t.toString() (which, for CommonToken, dumps everything about the token). This is better than forcing you to override a method in your token objects because you don't have to go modify your lexer so that it creates a new Java type.

Consume tokens until one matches the given token set. Instead of recovering from exception e , re-throw it wrapped in a so it is not caught by the rule function catches. Use to get the original . Make sure we don't attempt to recover inline; if the parser successfully recovers, it won't throw an exception. Make sure we don't attempt to recover from problems in subrules. Provides an empty default implementation of . The default implementation of each method does nothing, but can be overridden as necessary. Sam Harwell How to emit recognition errors for parsers. How to emit recognition errors. Upon syntax error, notify any interested parties. This is not how to recover from errors or compute error messages. specifies how to recover from syntax errors and how to compute error messages. This listener's job is simply to emit a computed message, though it has enough information to create its own message in many cases.

The is non-null for all syntax errors except when we discover mismatched token errors that we can recover from in-line, without returning from the surrounding rule (via the single token insertion and deletion mechanism).

What parser got the error. From this object, you can access the context as well as the input stream. The offending token in the input token stream, unless recognizer is a lexer (then it's null). If no viable alternative error, e has token at which we started production for the decision. The line number in the input where the error occurred. The character position within that line where the error occurred. The message to emit. The exception generated by the parser that led to the reporting of an error. It is null in the case where the parser was able to recover in line without exiting the surrounding rule.
This method is called by the parser when a full-context prediction results in an ambiguity.

Each full-context prediction which does not result in a syntax error will call either or .

When ambigAlts is not null, it contains the set of potentially viable alternatives identified by the prediction algorithm. When ambigAlts is null, use to obtain the represented alternatives from the configs argument.

When exact is true , all of the potentially viable alternatives are truly viable, i.e. this is reporting an exact ambiguity. When exact is false , at least two of the potentially viable alternatives are viable for the current input, but the prediction algorithm terminated as soon as it determined that at least the minimum potentially viable alternative is truly viable.

When the prediction mode is used, the parser is required to identify exact ambiguities so exact will always be true .

the parser instance the DFA for the current decision the input index where the decision started the input input where the ambiguity was identified true if the ambiguity is exactly known, otherwise false . This is always true when is used. the potentially ambiguous alternatives, or null to indicate that the potentially ambiguous alternatives are the complete set of represented alternatives in configs the ATN configuration set where the ambiguity was identified
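To observe these reports, attach a listener; a minimal sketch using the Java runtime's BaseErrorListener (AmbiguityLogger is a hypothetical name, and the reportAmbiguity signature shown is that runtime's):

            import java.util.BitSet;
            import org.antlr.v4.runtime.BaseErrorListener;
            import org.antlr.v4.runtime.Parser;
            import org.antlr.v4.runtime.atn.ATNConfigSet;
            import org.antlr.v4.runtime.dfa.DFA;

            class AmbiguityLogger extends BaseErrorListener {
                @Override
                public void reportAmbiguity(Parser recognizer, DFA dfa, int startIndex,
                                            int stopIndex, boolean exact, BitSet ambigAlts,
                                            ATNConfigSet configs) {
                    System.err.printf("ambiguity %d..%d exact=%s alts=%s%n",
                                      startIndex, stopIndex, exact, ambigAlts);
                }
            }

            // usage: parser.addErrorListener(new AmbiguityLogger());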
This method is called when an SLL conflict occurs and the parser is about to use the full context information to make an LL decision.

If one or more configurations in configs contains a semantic predicate, the predicates are evaluated before this method is called. The subset of alternatives which are still viable after predicates are evaluated is reported in conflictingAlts .

the parser instance the DFA for the current decision the input index where the decision started the input index where the SLL conflict occurred The specific conflicting alternatives. If this is null , the conflicting alternatives are all alternatives represented in configs . the simulator state when the SLL conflict was detected
This method is called by the parser when a full-context prediction has a unique result.

Each full-context prediction which does not result in a syntax error will call either or .

For prediction implementations that only evaluate full-context predictions when an SLL conflict is found (including the default implementation), this method reports cases where SLL conflicts were resolved to unique full-context predictions, i.e. the decision was context-sensitive. This report does not necessarily indicate a problem, and it may appear even in completely unambiguous grammars.

configs may have more than one represented alternative if the full-context prediction algorithm does not evaluate predicates before beginning the full-context prediction. In all cases, the final prediction is passed as the prediction argument.

Note that the definition of "context sensitivity" in this method differs from the concept in . This method reports all instances where an SLL conflict occurred but LL parsing produced a unique result, whether or not that unique result matches the minimum alternative in the SLL conflicting set.

the parser instance the DFA for the current decision the input index where the decision started the input index where the context sensitivity was finally determined the unambiguous result of the full-context prediction the simulator state when the unambiguous prediction was determined
This implementation of loads tokens from a token source on-demand, and places the tokens in a buffer to provide access to any previous token by index.

This token stream ignores the value of . If your parser requires the token stream to filter tokens to only those on a particular channel, such as or , use a filtering token stream such as CommonTokenStream .
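For example (Java; MyLexer and MyParser stand for your generated classes):

            MyLexer lexer = new MyLexer(new ANTLRInputStream("a b c"));
            CommonTokenStream tokens = new CommonTokenStream(lexer); // parser sees only default-channel tokens
            MyParser parser = new MyParser(tokens);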

An whose symbols are instances. Get the instance associated with the value returned by LA(k) . This method has the same pre- and post-conditions as . In addition, when the preconditions of this method are met, the return value is non-null and the value of LT(k).getType()==LA(k) . Gets the at the specified index in the stream. When the preconditions of this method are met, the return value is non-null.

The preconditions for this method are the same as the preconditions of . If the behavior of seek(index) is unspecified for the current state and given index , then the behavior of this method is also unspecified.

The symbol referred to by index differs from seek() only in the case of filtering streams where index lies before the end of the stream. Unlike seek() , this method does not adjust index to point to a non-ignored symbol.

if index is less than 0 if the stream does not support retrieving the token at the specified index
Return the text of all tokens within the specified interval . This method behaves like the following code (including potential exceptions for violating preconditions of ), but may be optimized by the specific implementation.
            TokenStream stream = ...;
            String text = "";
            for (int i = interval.a; i <= interval.b; i++) {
                text += stream.get(i).getText();
            }
            
The interval of tokens within this stream to get text for. The text of all tokens within the specified interval in this stream. if interval is null
Return the text of all tokens in the stream. Return the text of all tokens in the stream. This method behaves like the following code, including potential exceptions from the calls to and , but may be optimized by the specific implementation.
            TokenStream stream = ...;
            String text = stream.getText(new Interval(0, stream.size()));
            
The text of all tokens in the stream.
Return the text of all tokens in the source interval of the specified context. Return the text of all tokens in the source interval of the specified context. This method behaves like the following code, including potential exceptions from the call to , but may be optimized by the specific implementation.

If ctx.getSourceInterval() does not return a valid interval of tokens provided by this stream, the behavior is unspecified.

            TokenStream stream = ...;
            String text = stream.getText(ctx.getSourceInterval());
            
The context providing the source interval of tokens to get text for. The text of all tokens within the source interval of ctx .
Return the text of all tokens in this stream between start and stop (inclusive).

If the specified start or stop token was not provided by this stream, or if the stop occurred before the start token, the behavior is unspecified.

For streams which ensure that the method is accurate for all of its provided tokens, this method behaves like the following code. Other streams may implement this method in other ways provided the behavior is consistent with this at a high level.

            TokenStream stream = ...;
            String text = "";
            for (int i = start.getTokenIndex(); i <= stop.getTokenIndex(); i++) {
                text += stream.get(i).getText();
            }
            
The first token in the interval to get text for. The last token in the interval to get text for (inclusive). The text of all tokens lying between the specified start and stop tokens. if this stream does not support this method for the specified tokens
Gets the underlying which provides tokens for this stream. The from which tokens for this stream are fetched. A collection of all tokens fetched from the token source. The list is considered a complete view of the input once is set to true . The index into the token list of the current token (the next token to be consumed); the token at this index should be LT(1) .

This field is set to -1 when the stream is first constructed or when is called, indicating that the first token has not yet been fetched from the token source. For additional information, see the documentation of for a description of Initializing Methods.

Indicates whether the EOF token has been fetched from the token source and added to the token list. This field improves performance for the following cases:
  • : The lookahead check in to prevent consuming the EOF symbol is optimized by checking the values of and instead of calling .
  • : The check to prevent adding multiple EOF symbols into is trivial with this field.
Make sure index i in tokens has a token. true if a token is located at index i , otherwise false . Add n elements to the buffer. The actual number of elements added to the buffer. Get all tokens from start..stop inclusively. Allows derived classes to modify the behavior of operations which change the current stream position by adjusting the target token index of a seek operation. The default implementation simply returns i . If an exception is thrown in this method, the current stream index should not be changed.

For example, overrides this method to ensure that the seek target is always an on-channel token.

The target token index. The adjusted target token index.
Reset this token stream by setting its token source. Given a start and stop index, return a List of all tokens in the token type BitSet . Return null if no tokens were found. This method looks at both on and off channel tokens. Given a starting index, return the index of the next token on channel. Return i if tokens[i] is on channel. Return the index of the EOF token if there are no tokens on channel between i and EOF. Given a starting index, return the index of the previous token on channel. Return i if tokens[i] is on channel. Return -1 if there are no tokens on channel between i and 0.

If i specifies an index at or after the EOF token, the EOF token index is returned. This is due to the fact that the EOF token is treated as though it were on every channel.

Collect all tokens on specified channel to the right of the current token up until we see a token on DEFAULT_TOKEN_CHANNEL or EOF. If channel is -1 , find any non default channel token. Collect all hidden tokens (any off-default channel) to the right of the current token up until we see a token on DEFAULT_TOKEN_CHANNEL or EOF. Collect all tokens on specified channel to the left of the current token up until we see a token on DEFAULT_TOKEN_CHANNEL . If channel is -1 , find any non default channel token. Collect all hidden tokens (any off-default channel) to the left of the current token up until we see a token on DEFAULT_TOKEN_CHANNEL . Get the text of all tokens in this buffer. Get all tokens from lexer until EOF. A token has properties: text, type, line, character position in the line (so we can ignore tabs), token channel, index, and source from which we obtained this token. Get the text of the token. Get the token type of the token. The line number on which the 1st character of this token was matched, line=1..n The index of the first character of this token relative to the beginning of the line at which it occurs, 0..n-1 Return the channel of this token. Each token can arrive at the parser on a different channel, but the parser only "tunes" to a single channel. The parser ignores everything not on DEFAULT_CHANNEL. An index from 0..n-1 of the token object in the input stream. This must be valid in order to print token streams and use TokenRewriteStream. Return -1 to indicate that this token was conjured up since it doesn't have a valid index. The starting character index of the token. This method is optional; return -1 if not implemented. The last character index of the token. This method is optional; return -1 if not implemented. Gets the TokenSource which created this token. Gets the CharStream from which this token was derived. An empty source pair which is used as the default value of source for tokens that do not have a source. This is the backing field for the Type property. This is the backing field for the Line property. This is the backing field for the CharPositionInLine property. This is the backing field for the Channel property. This is the backing field for the TokenSource and InputStream properties.

These properties share a field to reduce the memory footprint of CommonToken . Tokens created by a CommonTokenFactory from the same source and input stream share a reference to the same pair containing these values.

This is the backing field for the Text property. This is the backing field for the TokenIndex property. This is the backing field for the StartIndex property. This is the backing field for the StopIndex property. Constructs a new CommonToken with the specified token type. The token type. Constructs a new CommonToken with the specified token type and text. The token type. The text of the token. Constructs a new CommonToken as a copy of another Token .

If oldToken is also a CommonToken instance, the newly constructed token will share a reference to the text field and the source pair stored in source . Otherwise, text will be assigned the result of calling getText , and source will be constructed from the results of getTokenSource and getInputStream .

The token to copy.
Explicitly set the text for this token. If text is not null , then getText will return this value rather than extracting the text from the input. The explicit text of the token, or null if the text should be obtained from the input along with the start and stop indexes of the token. This default implementation of TokenFactory creates CommonToken objects. The default mechanism for creating tokens. It's used by default in Lexer and the error handling strategy (to create missing tokens). Notifying the parser of a new factory means that it notifies its token source and error strategy. This is the method used to create tokens in the lexer and in the error handling strategy. If text is not null , then the start and stop positions are wiped to -1 and the text override is set in the CommonToken. Generically useful. The default CommonTokenFactory instance.

This token factory does not explicitly copy token text when constructing tokens.

Indicates whether setText should be called after constructing tokens to explicitly set the text. This is useful for cases where the input stream might not be able to provide arbitrary substrings of text from the input after the lexer creates a token (e.g. the implementation of getText in UnbufferedCharStream throws an UnsupportedOperationException ). Explicitly setting the token text allows getText to be called at any time regardless of the input stream implementation.

The default value is false to avoid the performance and memory overhead of copying text for every token unless explicitly requested.

Constructs a CommonTokenFactory with the specified value for copyText .

When copyText is false , the DEFAULT instance should be used instead of constructing a new instance.

The value for copyText .
Constructs a CommonTokenFactory with copyText set to false .

The DEFAULT instance should be used instead of calling this constructor directly.
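A usage sketch, assuming a lexer variable is in scope; passing true opts in to the eager text copying described above:

            // Sketch: copy token text eagerly so Token.getText() keeps working even
            // if the char stream cannot serve arbitrary substrings later.
            TokenFactory<CommonToken> factory = new CommonTokenFactory(true);
            lexer.setTokenFactory(factory);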

This class extends BufferedTokenStream with functionality to filter token streams to tokens on a particular channel (tokens where getChannel returns a particular value).

This token stream provides access to all tokens by index or when calling methods like getText . The channel filtering is only used for code accessing tokens via the lookahead methods LA , LT , and LB .

By default, tokens are placed on the default channel ( Token.DEFAULT_CHANNEL ), but may be reassigned by using the ->channel(HIDDEN) lexer command, or by using an embedded action to call Lexer.setChannel .

Note: lexer rules which use the ->skip lexer command or call do not produce tokens at all, so input text matched by such a rule will not be available as part of the token stream, regardless of channel.
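A usage sketch, assuming a lexer whose grammar routes comments and whitespace to the hidden channel; the stream below "tunes" to the default channel while the buffer still retains the hidden tokens by index:

            CommonTokenStream tokens = new CommonTokenStream(lexer);
            tokens.fill(); // force all tokens into the buffer
            // Lookahead methods skip hidden tokens, but get(i) can still reach them.
            Token first = tokens.get(0);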

Specifies the channel to use for filtering tokens.

The default value is Token.DEFAULT_CHANNEL , which matches the default channel assigned to tokens created by the lexer.

Constructs a new CommonTokenStream using the specified token source and the default token channel ( Token.DEFAULT_CHANNEL ). The token source. Constructs a new CommonTokenStream using the specified token source and filtering tokens to the specified channel. Only tokens whose getChannel matches channel or have the token type equal to Token.EOF will be returned by the token stream lookahead methods. The token source. The channel to use for filtering tokens. Count EOF just once. Sam Harwell Provides a default instance of ConsoleErrorListener .

This implementation prints messages to System.err containing the values of line , charPositionInLine , and msg using the following format.

            line line:charPositionInLine msg
            
Sam Harwell A set of all DFA states. Use a map so we can get old state back (a set only allows you to see if it's there). From which ATN state did we create this DFA? true if this DFA is for a precedence decision; otherwise, false . This is the backing field for IsPrecedenceDfa . Get the start state for a specific precedence value. The current precedence. The start state corresponding to the specified precedence, or null if no start state exists for the specified precedence. if this is not a precedence DFA. Set the start state for a specific precedence value. The current precedence. The start state corresponding to the specified precedence. if this is not a precedence DFA. Gets whether this DFA is a precedence DFA. Precedence DFAs use a special start state which is not stored in the states map. The edges array for this start state contains outgoing edges supplying individual start states corresponding to specific precedence values. true if this is a precedence DFA; otherwise, false . Sets whether this is a precedence DFA. If the specified value differs from the current DFA configuration, the following actions are taken; otherwise no changes are made to the current DFA.
  • The states map is cleared
  • If precedenceDfa is false , the initial state is set to null ; otherwise, it is initialized to a new DFAState with an empty outgoing edges array to store the start states for individual precedence values.
  • The precedenceDfa field is updated
true if this is a precedence DFA; otherwise, false
A DFA walker that knows how to dump them to serialized strings. A DFA state represents a set of possible ATN configurations. As Aho, Sethi, Ullman p. 117 says "The DFA uses its state to keep track of all possible states the ATN can be in after reading each input symbol. That is to say, after reading input a1a2..an, the DFA is in a state that represents the subset T of the states of the ATN that are reachable from the ATN's start state along some path labeled a1a2..an." In conventional NFA→DFA conversion, therefore, the subset T would be a bitset representing the set of states the ATN could be in. We need to track the alt predicted by each state as well, however. More importantly, we need to maintain a stack of states, tracking the closure operations as they jump from rule to rule, emulating rule invocations (method calls). I have to add a stack to simulate the proper lookahead sequences for the underlying LL grammar from which the ATN was derived.

I use a set of ATNConfig objects not simple states. An ATNConfig is both a state (ala normal conversion) and a RuleContext describing the chain of rules (if any) followed to arrive at that state.

A DFA state may have multiple references to a particular state, but with different ATN contexts (with same or different alts) meaning that state was reached via a different set of rule invocations.

edges.get(symbol) points to target of symbol. If this is an accept state, what ttype do we match or alt do we predict? This is set to ATN.INVALID_ALT_NUMBER when predicates != null . The keys for these edges are the top level element of the global context. Symbols in this set require a global context transition before matching an input symbol. This list is computed by the ATN simulator. Two instances are equal if their ATN configuration sets are the same. This method is used to see if a state already exists.

Because the number of alternatives and number of ATN configurations are finite, there is a finite number of DFA states that can be processed. This is necessary to show that the algorithm terminates.

Cannot test the DFA state numbers here because in addDFAState we need to know if any other state exists that has this exact set of ATN configurations. The state number is irrelevant.

Map a predicate to a predicted alternative. Sam Harwell This implementation of ANTLRErrorListener can be used to identify certain potential correctness and performance problems in grammars. "Reports" are made by calling notifyErrorListeners with the appropriate message.
  • Ambiguities: These are cases where more than one path through the grammar can match the input.
  • Weak context sensitivity: These are cases where full-context prediction resolved an SLL conflict to a unique alternative which equaled the minimum alternative of the SLL conflict.
  • Strong (forced) context sensitivity: These are cases where the full-context prediction resolved an SLL conflict to a unique alternative, and the minimum alternative of the SLL conflict was found to not be a truly viable alternative. Two-stage parsing cannot be used for inputs where this situation occurs.
Sam Harwell
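A usage sketch showing how this listener is typically attached; forcing exact-ambiguity detection on the interpreter is optional and shown only for completeness:

            // Sketch: report ambiguities and context sensitivities while parsing.
            parser.addErrorListener(new DiagnosticErrorListener());
            parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);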
When true , only exactly known ambiguities are reported. Initializes a new instance of which only reports exact ambiguities. Initializes a new instance of , specifying whether all ambiguities or only exact ambiguities are reported. true to report only exact ambiguities, otherwise false to report all ambiguities. Computes the set of conflicting or ambiguous alternatives from a configuration set, if that information was not already provided by the parser. Computes the set of conflicting or ambiguous alternatives from a configuration set, if that information was not already provided by the parser. The set of conflicting or ambiguous alternatives, as reported by the parser. The conflicting or ambiguous configuration set. Returns reportedAlts if it is not null , otherwise returns the set of alternatives represented in configs . A semantic predicate failed during validation. A semantic predicate failed during validation. Validation of predicates occurs when normally parsing the alternative just like matching a token. Disambiguating predicate evaluation occurs when we test a predicate during prediction. The root of the ANTLR exception hierarchy. The root of the ANTLR exception hierarchy. In general, ANTLR tracks just 3 kinds of errors: prediction errors, failed predicate errors, and mismatched input errors. In each case, the parser knows where it is in the input, where it is in the ATN, the rule invocation stack, and what kind of problem occurred. The where this exception originated. The current when an error occurred. Since not all streams support accessing symbols by index, we have to track the instance itself. Gets the set of input symbols which could potentially follow the previously matched symbol at the time this exception was thrown. Gets the set of input symbols which could potentially follow the previously matched symbol at the time this exception was thrown.

If the set of expected tokens is not known and could not be computed, this method returns null .

The set of token types that could potentially follow the current state in the ATN, or null if the information is not available.
Get the ATN state number the parser was in at the time the error occurred. For NoViableAltException and LexerNoViableAltException exceptions, this is the DecisionState number. For others, it is the state whose outgoing edge we couldn't match.

If the state number is not known, this method returns -1.

Gets the rule context at the time this exception was thrown.

If the context is not available, this method returns null .

The rule context at the time this exception was thrown. If the context is not available, this method returns null .
Gets the input stream which is the symbol source for the recognizer where this exception was thrown. Gets the input stream which is the symbol source for the recognizer where this exception was thrown.

If the input stream is not available, this method returns null .

The input stream which is the symbol source for the recognizer where this exception was thrown, or null if the stream is not available.
Gets the recognizer where this exception occurred.

If the recognizer is not available, this method returns null .

The recognizer where this exception occurred, or null if the recognizer is not available.
The value returned by LA() when the end of the stream is reached. The value returned by getSourceName when the actual name of the underlying source is not known. This signifies any kind of mismatched input exception, such as when the current input does not match the expected token. This class extends ParserRuleContext by allowing the value of the rule index to be explicitly set for the context.

ParserRuleContext does not include field storage for the rule index since the context classes created by the code generator override the getRuleIndex method to return the correct value for that context. Since the parser interpreter does not use the context classes generated for a parser, this class (with slightly more memory overhead per node) is used to provide equivalent functionality.

A rule invocation record for parsing. Contains all of the information about the current rule not stored in the RuleContext. It handles the parse tree children list, any ATN state tracing, and the default values available for rule indications: start, stop, rule index, current alt number, current ATN state. Subclasses made for each rule and grammar track the parameters, return values, locals, and labels specific to that rule. These are the objects that are returned from rules. Note text is not an actual field of a rule return value; it is computed from start and stop using the input stream's toString() method. I could add a ctor to this so that we can pass in and store the input stream, but I'm not sure we want to do that. It would seem to be undefined to get the .text property anyway if the rule matches tokens from multiple input streams. I do not use getters for fields of objects that are used simply to group values such as this aggregate. The getters/setters are there to satisfy the superclass interface. A rule context is a record of a single rule invocation. It knows which context invoked it, if any. If there is no parent context, then naturally the invoking state is not valid. The parent link provides a chain upwards from the current rule invocation to the root of the invocation tree, forming a stack. We actually carry no information about the rule associated with this context (except when parsing). We keep only the state number of the invoking state from the ATN submachine that invoked this. Contrast this with the s pointer inside ParserRuleContext that tracks the current state being "executed" for the current rule. The parent contexts are useful for computing lookahead sets and getting error information. These objects are used during parsing and prediction. For the special case of parsers, we use the subclass ParserRuleContext. An interface to access the tree of RuleContext objects created during a parse that makes the data structure look like a simple parse tree. This node represents both internal nodes (rule invocations) and leaf nodes (token matches).

The payload is either a Token or a RuleContext object.

A tree that knows about an interval in a token stream is some kind of syntax tree. Subinterfaces distinguish between parse trees and other kinds of syntax trees we might want to create. The basic notion of a tree has a parent, a payload, and a list of children. It is the most abstract interface for all the trees used by ANTLR. If there are children, get the i th value indexed from 0. Print out a whole tree, not just a node, in LISP format (root child1 .. childN) . Print just a node if this is a leaf. The parent of this node. If the return value is null, then this node is the root of the tree. This method returns whatever object represents the data at this node. For example, for parse trees, the payload can be a Token representing a leaf node or a RuleContext object representing a rule invocation. For abstract syntax trees (ASTs), this is a Token object. How many children are there? If there is none, then this node represents a leaf node. Return an Interval indicating the index in the TokenStream of the first and last token associated with this subtree. If this node is a leaf, then the interval represents a single token.

If the source interval is unknown, this returns Interval.INVALID .

The ParseTreeVisitor needs a double dispatch method. Return the combined text of all leaf nodes. Does not get any off-channel tokens (if any) so won't return whitespace and comments if they are sent to parser on hidden channel. Specialize toStringTree so that it can print out more information based upon the parser. What context invoked this rule? What state invoked the rule associated with this context? The "return address" is the followState of invokingState. If parent is null, this should be -1. Return the combined text of all child nodes. This method only considers tokens which have been added to the parse tree.

Since tokens on hidden channels (e.g. whitespace or comments) are not added to the parse trees, they will not appear in the output of this method.

Print out a whole tree, not just a node, in LISP format (root child1 .. childN). Print just a node if this is a leaf. We have to know the recognizer so we can get rule names.
Print out a whole tree, not just a node, in LISP format (root child1 .. childN). Print just a node if this is a leaf. A context is empty if there is no invoking state, meaning nobody called the current context. If we are debugging or building a parse tree for a visitor, we need to track all of the tokens and rule invocations associated with this rule's context. This is empty for parsing without tree construction because we do not need to track the details about how we parse this rule. For debugging/tracing purposes, we want to track all of the nodes in the ATN traversed by the parser for a particular rule. This list indicates the sequence of ATN nodes used to match the elements of the children list. This list does not include ATN nodes and other rules used to match rule invocations. It traces the rule invocation node itself but nothing inside that other rule's ATN submachine. There is NOT a one-to-one correspondence between the children and states list. There are typically many nodes in the ATN traversed for each element in the children list. For example, for a rule invocation there is the invoking state and the following state. The parser setState() method updates field s and adds it to this list if we are debugging/tracing. This does not trace states visited during prediction. The exception that forced this rule to return. If the rule successfully completed, this is null . COPY a ctx (I'm deliberately not using a copy constructor). Does not set the parent link; other add methods do that. Used by enterOuterAlt to toss out a RuleContext previously added as we entered a rule. If we have a # label, we will need to remove the generic ruleContext object. Used for rule context info debugging during parse-time, not so much for ATN debugging. This is the backing field for RuleIndex . Constructs a new InterpreterRuleContext with the specified parent, invoking state, and rule index. The parent context. The invoking state number. The rule index for the current context. 
During lookahead operations, this "token" signifies we hit the rule end ATN state and did not follow it despite needing to. All tokens go to the parser (unless skip() is called in that rule) on a particular "channel". The parser tunes to a particular channel so that whitespace etc... can go to the parser on a "hidden" channel. Anything on a different channel than DEFAULT_CHANNEL is not parsed by the parser. A source of tokens must provide a sequence of tokens via nextToken and also must reveal its source of characters; a token's text is computed from a CharStream ; it only stores indices into the char stream.

Errors from the lexer are never passed to the parser. Either you want to keep going or you do not upon token recognition error. If you do not want to continue lexing then you do not want to continue parsing. Just throw an exception not under RecognitionException and Java will naturally toss you all the way out of the recognizers. If you want to continue lexing then you should not throw an exception to the parser--it has already requested a token. Keep lexing until you get a valid one. Just report errors and keep going, looking for a valid token.

Return a Token object from your input stream (usually a CharStream ). Do not fail/return upon lexing error; keep chewing on the characters until you get a good one; errors are not passed through to the parser. Get the line number for the current position in the input stream. The first line in the input is line 1. The line number for the current position in the input stream, or 0 if the current token source does not track line numbers. Get the index into the current line for the current position in the input stream. The first character on a line has position 0. The character position within the current line, or -1 if the current token source does not track character positions. Get the CharStream from which this token source is currently providing tokens. The CharStream associated with the current position in the input, or null if no input stream is available for the token source. Gets the name of the underlying input source. This method returns a non-null, non-empty string. If such a name is not known, this method returns IntStream.UNKNOWN_SOURCE_NAME . Set the TokenFactory this token source should use for creating Token objects from the input. The TokenFactory to use for creating tokens. Gets the TokenFactory this token source is currently using for creating Token objects from the input. The TokenFactory currently used by this token source. A lexer is a recognizer that draws input symbols from a character stream. Lexer grammars result in a subclass of this object. A Lexer object uses simplified match() and error recovery mechanisms in the interest of speed. What is the error header, normally line/character position information? How should a token be displayed in an error message? The default is to display just the text, but during development you might want to have a lot of information spit out. Override in that case to use t.toString() (which, for CommonToken, dumps everything about the token). This is better than forcing you to override a method in your token objects because you don't have to go modify your lexer so that it creates a new Java type. NullPointerException if listener is null . Used to print out token names like ID during debugging and error reporting. The generated parsers implement a method that overrides this to point to their String[] tokenNames. Get a map from token names to token types.

Used for XPath and tree pattern compilation.

Get a map from rule names to rule indexes.

Used for XPath and tree pattern compilation.

If this recognizer was generated, it will have a serialized ATN representation of the grammar.

For interpreters, we don't know their serialized ATN despite having created the interpreter from it.

For debugging and other purposes, might want the grammar name. Have ANTLR generate an implementation for this method. Get the ATN used by the recognizer for prediction. The ATN used by the recognizer for prediction. Get the ATN interpreter used by the recognizer for prediction. The ATN interpreter used by the recognizer for prediction. Set the ATN interpreter used by the recognizer for prediction. The ATN interpreter used by the recognizer for prediction. If profiling during the parse/lex, this will return DecisionInfo records for each decision in recognizer in a ParseInfo object. 4.3 Indicate that the recognizer has changed internal state that is consistent with the ATN state passed in. This way we always know where we are in the ATN as the parser goes along. The rule context objects form a stack that lets us see the stack of invoking rules. Combine this and we have complete ATN configuration information. How to create token objects. The goal of all lexer rules/methods is to create a token object. This is an instance variable as multiple rules may collaborate to create a single token. nextToken will return this object after matching lexer rule(s). If you subclass to allow multiple token emissions, then set this to the last token to be matched or something non-null so that the auto token emit mechanism will not emit another token. What character index in the stream did the current token start at? Needed, for example, to get the text for the current token. Set at the start of nextToken. The line on which the first character of the token resides. The character position of the first character within the line. Once we see EOF on the char stream, the next token will be EOF. If you have DONE : EOF ; then you see DONE EOF. The channel number for the current token. The token type for the current token. You can set the text for the current token to override what is in the input char buffer. Use setText() or can set this instance var. Return a token from this source; i.e., match a token on the char stream. Instruct the lexer to skip creating a token for the current lexer rule and look for another token. nextToken() knows to keep looking when a lexer rule finishes with token set to SKIP_TOKEN. Recall that if token==null at end of any token rule, it creates one for you and emits it. Set the char stream and reset the lexer. By default does not support multiple emits per nextToken invocation for efficiency reasons. 
Subclass and override this method, nextToken, and getToken (to push tokens into a list and pull from that list rather than a single variable as this implementation does). The standard method called to automatically emit a token at the outermost lexical rule. The token object should point into the char buffer start..stop. If there is a text override in 'text', use that to set the token's text. Override this method to emit custom Token objects or provide a new factory. Return a list of all Token objects in input char stream. Forces load of all tokens. Does not include EOF token. Lexers can normally match any char in its vocabulary after matching a token, so do the easy thing and just kill a character and hope it all works out. You can instead use the rule invocation stack to do sophisticated error recovery if you are in a fragment rule. What is the index of the current character of lookahead? Return the text matched so far for the current token or any text override. Set the complete text of this token; it wipes any previous changes to the text. Override if emitting multiple tokens. Used to print out token names like ID during debugging and error reporting. The generated parsers implement a method that overrides this to point to their String[] tokenNames. Matching attempted at what input index? Which configurations did we try at input.index() that couldn't match input.LA(1)? Provides an implementation of TokenSource as a wrapper around a list of Token objects.

If the final token in the list is an EOF token, it will be used as the EOF token for every call to nextToken after the end of the list is reached. Otherwise, an EOF token will be created.
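A usage sketch: replaying a previously captured token list through the standard stream machinery (the collected variable is illustrative):

            List<Token> collected = ...; // tokens captured earlier
            ListTokenSource source = new ListTokenSource(collected);
            CommonTokenStream stream = new CommonTokenStream(source);
            stream.fill();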

The wrapped collection of Token objects to return. The name of the input source. If this value is null , a call to getSourceName should return the source name used to create the next token in tokens (or the previous token if the end of the input has been reached). The index into tokens of the token to return by the next call to nextToken . The end of the input is indicated by this value being greater than or equal to the number of items in tokens . This field caches the EOF token for the token source. This is the backing field for the TokenFactory property. Constructs a new ListTokenSource instance from the specified collection of Token objects. The collection of Token objects to provide as a TokenSource . NullPointerException if tokens is null Constructs a new ListTokenSource instance from the specified collection of Token objects and source name. The collection of Token objects to provide as a TokenSource . The name of the input source. If this value is null , getSourceName will attempt to infer the name from the next token (or the previous token if the end of the input has been reached). NullPointerException if tokens is null Sam Harwell if value is null . Sam Harwell Initialize the hash using the default seed value. the intermediate hash value Initialize the hash using the specified seed . the seed the intermediate hash value Update the intermediate hash value for the next input value . the intermediate hash value the value to add to the current hash the updated intermediate hash value Update the intermediate hash value for the next input value . the intermediate hash value the value to add to the current hash the updated intermediate hash value Apply the final computation steps to the intermediate value hash to form the final result of the MurmurHash 3 hash function. the intermediate hash value the number of integer values added to the hash the final hash result Utility function to compute the hash code of an array using the MurmurHash algorithm. the array data the seed for the MurmurHash algorithm the hash code of the data A generic set of integers. Adds the specified value to the current set. the value to add IllegalStateException if the current set is read-only Modify the current object to contain all elements that are present in itself, the specified set , or both. The set to add to the current set. A null argument is treated as though it were an empty set. this (to support chained calls) IllegalStateException if the current set is read-only Return a new object containing all elements that are present in both the current set and the specified set a . The set to intersect with the current set. A null argument is treated as though it were an empty set. A new instance containing the intersection of the current set and a . The value null may be returned in place of an empty result set. Return a new object containing all elements that are present in elements but not present in the current set. The following expressions are equivalent for input non-null instances x and y .
  • x.complement(y)
  • y.subtract(x)
The set to compare with the current set. A null argument is treated as though it were an empty set. A new instance containing the elements present in elements but not present in the current set. The value null may be returned in place of an empty result set.
Return a new object containing all elements that are present in the current set, the specified set a , or both.

This method is similar to addAll , but returns a new IntSet instance instead of modifying the current set.

The set to union with the current set. A null argument is treated as though it were an empty set. A new instance containing the union of the current set and a . The value null may be returned in place of an empty result set.
Return a new object containing all elements that are present in the current set but not present in the input set a . The following expressions are equivalent for input non-null instances x and y .
  • y.subtract(x)
  • x.complement(y)
The set to compare with the current set. A null argument is treated as though it were an empty set. A new instance containing the elements present in elements but not present in the current set. The value null may be returned in place of an empty result set.
Returns true if the set contains the specified element. The element to check for. true if the set contains el ; otherwise false . Removes the specified value from the current set. If the current set does not contain the element, no changes are made. the value to remove IllegalStateException if the current set is read-only Return a list containing the elements represented by the current set. The list is returned in ascending numerical order. A list containing all elements present in the current set, sorted in ascending numerical order. Return the total number of elements represented by the current set. the total number of elements represented by the current set, regardless of the manner in which the elements are stored. Returns true if this set contains no elements. true if the current set contains no elements; otherwise, false . Returns the single value contained in the set, if the size is 1; otherwise, returns Token.INVALID_TYPE . An immutable inclusive interval a..b. The start of the interval. The end of the interval (inclusive). Interval objects are used readonly so share all with the same single value a==b up to some max size. Use an array as a perfect hash. Return shared object for 0..INTERVAL_POOL_MAX_VALUE or a new Interval object with a..a in it. On Java.g4, 218623 IntervalSets have a..a (set with 1 element). Does this start completely before other? Disjoint Does this start at or before other? Nondisjoint Does this.a start after other.b? May or may not be disjoint Does this start completely after other? Disjoint Does this start after other? NonDisjoint Are both ranges disjoint? I.e., no overlap? Are two intervals adjacent such as 0..41 and 42..42? Return the interval computed from combining this and other Return the interval in common between this and o Return the interval with elements from this not in other ; other must not be totally enclosed (properly contained) within this , which would result in two disjoint intervals instead of the single one returned by this method. return number of elements between a and b inclusively. x..x is length 1. if b < a, then length is 0. 9..10 has length 2. This class implements the IntSet interface backed by a sorted array of non-overlapping intervals. It is particularly efficient for representing large collections of numbers, where the majority of elements appear as part of a sequential range of numbers that are all part of the set. For example, the set { 1, 2, 3, 4, 7, 8 } may be represented as { [1, 4], [7, 8] }.

This class is able to represent sets containing any combination of values in the range Integer.MIN_VALUE to Integer.MAX_VALUE (inclusive).
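A small sketch of the example set above, using the Java-style IntervalSet API with of(a, b), add(a, b), and contains(el):

            // { 1, 2, 3, 4, 7, 8 } stored as the two intervals [1,4] and [7,8].
            IntervalSet set = IntervalSet.of(1, 4);
            set.add(7, 8);
            boolean inSet = set.contains(3);    // true
            boolean outOfSet = set.contains(6); // false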

The list of sorted, disjoint intervals. Create a set with a single element, el. Create a set with all ints within range [a..b] (inclusive). Add a single element to the set. An isolated element is stored as a range el..el. Add interval; i.e., add all integers from a to b to set. If b<a, do nothing. Keep list in sorted order (by left range value). If overlap, combine ranges. For example, if this is {1..5, 10..20}, adding 6..7 yields {1..5, 6..7, 10..20}. Adding 4..8 yields {1..8, 10..20}. Combine all sets in the array. the or'd value Compute the set difference between two interval sets. The specific operation is left - right . If either of the input sets is null , it is treated as though it was an empty set. Return a list of Interval objects. Are two IntervalSets equal? Because all intervals are sorted and disjoint, equals is a simple linear walk over both lists to make sure they are the same. Interval.equals() is used by the List.equals() method to check the ranges. Returns the maximum value contained in the set. the maximum value contained in the set. If the set is empty, this method returns Token.INVALID_TYPE . Returns the minimum value contained in the set. the minimum value contained in the set. If the set is empty, this method returns Token.INVALID_TYPE . This exception is thrown to cancel a parsing operation. This exception does not extend RecognitionException , allowing it to bypass the standard error recovery mechanisms. BailErrorStrategy throws this exception in response to a parse error. Sam Harwell Convert array of strings to string→index map. Useful for converting rulenames to name→ruleindex map. Indicates that the parser could not decide which of two or more paths to take based upon the remaining input. It tracks the starting token of the offending input and also knows where the parser was in the various paths when the error occurred. Reported by reportNoViableAlternative() Which configurations did we try at input.index() that couldn't match input.LT(1)? The token object at the start index; the input stream might not be buffering tokens so get a reference to it. (At the time the error occurred, of course the stream needs to keep a buffer of all of the tokens but later we might not have access to those.) This is all the parsing support code essentially; most of it is error recovery stuff. This field maps from the serialized ATN string to the deserialized ATN with bypass alternatives. The error handling strategy for the parser. The default value is a new instance of DefaultErrorStrategy . The input stream. The ParserRuleContext object for the currently executing rule. This is always non-null during the parsing process. 
Specifies whether or not the parser should construct a parse tree during the parsing process. The default value is true . When setTrace(true) is called, a reference to the TraceListener is stored here so it can be easily removed in a later call to setTrace(false) . The listener itself is implemented as a parser listener so this field is not directly used by other parser methods. The list of listeners registered to receive events during the parse. The number of syntax errors reported during parsing. This value is incremented each time notifyErrorListeners is called. Reset the parser's state. Match current input symbol against ttype . If the symbol type matches, reportMatch and consume are called to complete the match process.

If the symbol type does not match, recoverInline is called on the current error strategy to attempt recovery. If the build-parse-tree flag is true and the token index of the symbol returned by recoverInline is -1, the symbol is added to the parse tree by calling addErrorNode .

the token type to match the matched symbol if the current input symbol did not match ttype and the error strategy could not recover from the mismatched symbol
Match the current input symbol as a wildcard. If the symbol type matches (i.e. has a value greater than 0), reportMatch and consume are called to complete the match process.

If the symbol type does not match, recoverInline is called on the current error strategy to attempt recovery. If the build-parse-tree flag is true and the token index of the symbol returned by recoverInline is -1, the symbol is added to the parse tree by calling addErrorNode .

the matched symbol if the current input symbol did not match a wildcard and the error strategy could not recover from the mismatched symbol
Registers listener to receive events during the parsing process.

To support output-preserving grammar transformations (including but not limited to left-recursion removal, automated left-factoring, and optimized code generation), calls to listener methods during the parse may differ substantially from calls made by ParseTreeWalker used after the parse is complete. In particular, rule entry and exit events may occur in a different order during the parse than after the parse is complete. In addition, calls to certain rule entry methods may be omitted.

With the following specific exceptions, calls to listener events are deterministic, i.e. for identical input the calls to listener methods will be the same.

  • Alterations to the grammar used to generate code may change the behavior of the listener calls.
  • Alterations to the command line options passed to ANTLR 4 when generating the parser may change the behavior of the listener calls.
  • Changing the version of the ANTLR Tool used to generate the parser may change the behavior of the listener calls.
the listener to add if listener is null
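A sketch of a throwaway listener registered this way; unlike a tree walk after the parse, these callbacks fire while the parser is still consuming input:

            parser.addParseListener(new ParseTreeListener() {
                @Override public void visitTerminal(TerminalNode node) {
                    System.out.println("matched: " + node.getSymbol().getText());
                }
                @Override public void visitErrorNode(ErrorNode node) { }
                @Override public void enterEveryRule(ParserRuleContext ctx) { }
                @Override public void exitEveryRule(ParserRuleContext ctx) { }
            });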
Remove listener from the list of parse listeners.

If listener is null or has not been added as a parse listener, this method does nothing.

the listener to remove
Remove all parse listeners. Notify any parse listeners of an enter rule event. Notify any parse listeners of an exit rule event. The ATN with bypass alternatives is expensive to create so we create it lazily. if the current parser does not implement the getSerializedATN() method. The preferred method of getting a tree pattern. For example, here's a sample use:
            ParseTree t = parser.expr();
            ParseTreePattern p = parser.compileParseTreePattern("<ID>+0", MyParser.RULE_expr);
            ParseTreeMatch m = p.match(t);
            String id = m.get("ID");
            
The same as compileParseTreePattern but specify a Lexer rather than trying to deduce it from this parser. Set the token stream and reset the parser. Consume and return the current symbol.

E.g., given the following input with A being the current lookahead symbol, this function moves the cursor to B and returns A .

            A B
            ^
            
If the parser is not in error recovery mode, the consumed symbol is added to the parse tree using addChild , and visitTerminal is called on any parse listeners. If the parser is in error recovery mode, the consumed symbol is added to the parse tree using addErrorNode , and visitErrorNode is called on any parse listeners.
Always called by generated parsers upon entry to a rule. Access the _ctx field to get the current context. Like enterRule but for recursive rules. Checks whether or not symbol can follow the current state in the ATN. The behavior of this method is equivalent to the following, but is implemented such that the complete context-sensitive follow set does not need to be explicitly constructed.
            return getExpectedTokens().contains(symbol);
            
the symbol type to check true if symbol can follow the current state in the ATN, otherwise false .
Computes the set of input symbols which could follow the current parser state and context, as given by getState and getContext , respectively. Get a rule's index (i.e., RULE_ruleName field) or -1 if not found. Return List<String> of the rule names in your parser instance leading up to a call to the current rule. You could override if you want more details such as the file/line info of where in the ATN a rule is invoked. This is very useful for error messages. For debugging and other purposes. For debugging and other purposes. Track the ParserRuleContext objects during the parse and hook them up using the children list so that it forms a parse tree. The ParserRuleContext returned from the start rule represents the root of the parse tree.

Note that if we are not building parse trees, rule contexts only point upwards. When a rule exits, it returns the context but that gets garbage collected if nobody holds a reference. It points upwards but nobody points at it.

When we build parse trees, we are adding all of these contexts to list. Contexts are then not candidates for garbage collection.

Gets whether or not a complete parse tree will be constructed while parsing. This property is true for a newly constructed parser. true if a complete parse tree will be constructed while parsing, otherwise false
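A sketch of turning tree construction off for a validation-only pass (startRule stands in for your grammar's entry rule):

            parser.setBuildParseTree(false); // contexts become garbage as rules return
            parser.startRule();
            boolean ok = parser.getNumberOfSyntaxErrors() == 0;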
Trim the internal lists of the parse tree during parsing to conserve memory. This property is set to false by default for a newly constructed parser. true to trim the capacity of the list to its size after a rule is parsed. true if the list is trimmed using the default behavior during the parse process. Gets the number of syntax errors reported during parsing. This value is incremented each time notifyErrorListeners is called. Match needs to return the current input symbol, which gets put into the label for the associated token ref; e.g., x=ID. Get the precedence level for the top-most precedence rule. The precedence level for the top-most precedence rule, or -1 if the parser context is not nested within a precedence rule. 4.3 During a parse it is sometimes useful to listen in on the rule entry and exit events as well as token matches. This is for quick and dirty debugging. A parser simulator that mimics what ANTLR's generated parser code does. A ParserATNSimulator is used to make predictions via adaptivePredict but this class moves a pointer through the ATN to simulate parsing. ParserATNSimulator just makes us efficient rather than having to backtrack, for example. This properly creates parse trees even for left recursive rules. We rely on the left recursive rule invocation and special predicate transitions to make left recursive rules work. See TestParserInterpreter for examples. Begin parsing at startRuleIndex. This implementation of ParseTreeListener dispatches all calls to a collection of delegate listeners. This reduces the effort required to support multiple listeners. Sam Harwell Useful for rewriting out a buffered input token stream after doing some augmentation or other manipulations on it.

You can insert stuff, replace, and delete chunks. Note that the operations are done lazily--only if you convert the buffer to a String with getText() . This is very efficient because you are not moving data around all the time. As the buffer of tokens is converted to strings, the getText() method(s) scan the input token stream and check to see if there is an operation at the current index. If so, the operation is done and then normal rendering continues on the buffer. This is like having multiple Turing machine instruction streams (programs) operating on a single input tape. :)

This rewriter makes no modifications to the token stream. It does not ask the stream to fill itself up nor does it advance the input cursor. The token stream index() will return the same value before and after any getText() call.

The rewriter only works on tokens that you have in the buffer and ignores the current input cursor. If you are buffering tokens on-demand, calling getText() halfway through the input will only do rewrites for those tokens in the first half of the file.

Since the operations are done lazily at getText() -time, operations do not screw up the token index values. That is, an insert operation at token index i does not change the index values for tokens i +1..n-1.

Because operations never actually alter the buffer, you may always get the original token stream back without undoing anything. Since the instructions are queued up, you can easily simulate transactions and roll back any changes if there is an error just by removing instructions. For example,

            CharStream input = new ANTLRFileStream("input");
            TLexer lex = new TLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lex);
            T parser = new T(tokens);
            TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
            parser.startRule();
            

Then in the rules, you can execute (assuming rewriter is visible):

            Token t,u;
            ...
            rewriter.insertAfter(t, "text to put after t");
            rewriter.insertAfter(u, "text after u");
            System.out.println(rewriter.getText());
            

You can also have multiple "instruction streams" and get multiple rewrites from a single pass over the input. Just name the instruction streams and use that name again when printing the buffer. This could be useful for generating a C file and also its header file--all from the same buffer:

            tokens.insertAfter("pass1", t, "text to put after t");}
            tokens.insertAfter("pass2", u, "text after u");}
            System.out.println(tokens.toString("pass1"));
            System.out.println(tokens.toString("pass2"));
            

If you don't use named rewrite streams, a "default" stream is used as the first example shows.
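
Insertion is only one of the queued operations; insertBefore, replace, and delete work the same lazy way (delete is just a replace with no text). A minimal sketch reusing the setup above, with t and u as hypothetical tokens in the buffer:

            Token t, u;
            ...
            rewriter.insertBefore(t, "/* note */ "); // queue text before t
            rewriter.replace(u, "newText");          // queue a replacement of u's text
            System.out.println(rewriter.getText());  // ops are applied only now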

Our source stream. You may have multiple, named streams of rewrite operations; I'm calling these things "programs." Maps String (name) → rewrite (List). Map String (program name) → Integer index. Rollback the instruction stream for a program so that the indicated instruction (via instructionIndex) is no longer in the stream. UNTESTED! Reset the program so that no instructions exist. Return the text from the original tokens altered per the instructions given to this rewriter. Return the text associated with the tokens in the interval from the original token stream but with the alterations given to this rewriter. The interval refers to the indexes in the original token stream; we do not alter the token stream in any way, so the indexes and intervals are still consistent. Includes any operations done to the first and last token in the interval, so if you did an insertBefore on the first token you would get that insertion, and the same is true if you do an insertAfter on the stop token.

We need to combine operations and report invalid operations (like overlapping replaces that are not completely nested). Inserts to the same index need to be combined, etc. With I.i.u meaning "insert u before the op at index i" and R.x-y.u meaning "replace tokens x-y with u", the cases are:

  • I.i.u I.j.v : leave alone, nonoverlapping
  • I.i.u I.i.v : combine into I.i.vu
  • R.i-j.u R.x-y.v | i-j in x-y : delete first R
  • R.i-j.u R.i-j.v : delete first R
  • R.i-j.u R.x-y.v | x-y in i-j : ERROR
  • R.i-j.u R.x-y.v | boundaries overlap : ERROR
  • Delete is a special case of replace (text==null): D.i-j.u D.x-y.v | boundaries overlap : combine to max(min)..max(right)
  • I.i.u R.x-y.v | i in (x+1)-y : delete I (since the insert is before i, we're not deleting i)
  • I.i.u R.x-y.v | i not in (x+1)-y : leave alone, nonoverlapping
  • R.x-y.v I.i.u | i in x-y : ERROR
  • R.x-y.v I.x.u : combine into R.x-y.uv (delete I)
  • R.x-y.v I.i.u | i not in x-y : leave alone, nonoverlapping

First we need to examine replaces. For any replace op: 1. wipe out any insertions before the op within that range; 2. drop any replace op before it that is contained completely within that range; 3. throw an exception upon boundary overlap with any previous replace. Then we can deal with inserts: 1. for any inserts to the same index, combine even if not adjacent; 2. for any prior replace with the same left boundary, combine this insert with the replace and delete the replace; 3. throw an exception if the index is in the same range as a previous replace. Don't actually delete; make the op null in the list--it's easier to walk the list. Later we can throw as we add to the index → op map. Note that I.2 R.2-2 will wipe out I.2 even though, technically, the inserted stuff would be before the replace range. But if you add tokens in front of a method body '{' and then delete the method body, I think the stuff before the '{' you added should disappear too. A short sketch of the insert-combining case appears below.

Return a map from token index to operation. Get all operations before an index of a particular kind. What index into the rewrites list are we? Token buffer index.
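
For instance, two inserts at the same index collapse into one instruction per the I.i.u + I.i.v → I.i.vu rule above (a sketch reusing the rewriter and token t from the earlier examples; the rendered order follows that rule, with the later instruction's text first):

            rewriter.insertBefore(t, "u");
            rewriter.insertBefore(t, "v");
            // Both ops target the same index, so they collapse into one
            // instruction (I.i.vu); getText() renders "vu" just before t.
            System.out.println(rewriter.getText());
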
Execute the rewrite operation by possibly adding to the buffer; return the index of the next token to operate on. I'm going to try replacing the range x..y with (y-x)+1 ReplaceOp instructions.

This interface defines the basic notion of a parse tree visitor. Generated visitors implement this interface and the XVisitor interface for grammar X. Sam Harwell

Visit a parse tree, and return a user-defined result of the operation: the tree to visit; the result of visiting the parse tree. Visit the children of a node, and return a user-defined result of the operation: the node whose children should be visited; the result of visiting the children of the node. Visit a terminal node, and return a user-defined result of the operation: the node to visit; the result of visiting the node. Visit an error node, and return a user-defined result of the operation: the node to visit; the result of visiting the node.

The default implementation calls accept on the specified tree.

The default implementation initializes the aggregate result to defaultResult() . Before visiting each child, it calls shouldVisitNextChild ; if the result is false no more children are visited and the current aggregate result is returned. After visiting a child, the aggregate result is updated by calling aggregateResult with the previous aggregate result and the result of visiting the child.
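
The algorithm reads roughly as follows (a sketch modeled on the Java runtime's AbstractParseTreeVisitor; treat the exact signatures as illustrative):

            public T visitChildren(RuleNode node) {
                T result = defaultResult();
                int n = node.getChildCount();
                for (int i = 0; i < n; i++) {
                    if (!shouldVisitNextChild(node, result)) {
                        break; // short-circuit: return the aggregate so far
                    }
                    T childResult = node.getChild(i).accept(this);
                    result = aggregateResult(result, childResult);
                }
                return result;
            }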

The default implementation is not safe for use in visitors that modify the tree structure. Visitors that modify the tree should override this method to behave properly in respect to the specific algorithm in use.

The default implementation returns the result of defaultResult .

The default implementation returns the result of defaultResult .

Aggregates the results of visiting multiple children of a node. After either all children are visited or shouldVisitNextChild returns false, the aggregate value is returned as the result of visitChildren.

The default implementation returns nextResult, meaning visitChildren will return the result of the last child visited (or the initial value if the node has no children).

The previous aggregate value. In the default implementation, the aggregate value is initialized to defaultResult(), which is passed as the aggregate argument to this method after the first child node is visited. The result of the immediately preceding call to visit a child node. The updated aggregate result.
This method is called before visiting each child in visitChildren. It is first called before the first child is visited; at that point currentResult will be the initial value (in the default implementation, the initial value is returned by a call to defaultResult()). This method is not called after the last child is visited.

The default implementation always returns true , indicating that visitChildren should only return after all children are visited. One reason to override this method is to provide a "short circuit" evaluation option for situations where the result of visiting a single child has the potential to determine the result of the visit operation as a whole.

The node whose children are currently being visited. The current aggregate result of the children visited to the current point. Return true to continue visiting children; otherwise return false to stop visiting children and immediately return the current aggregate result from visitChildren.
Gets the default value returned by visitor methods. This value is returned by the default implementations of visitTerminal and visitErrorNode. The default implementation of visitChildren initializes its aggregate result to this value.

The base implementation returns null .

The default value returned by visitor methods.
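
As a concrete illustration of these hooks, here is a hypothetical visitor that counts terminal nodes by overriding defaultResult, aggregateResult, and visitTerminal (Java, using the runtime's AbstractParseTreeVisitor):

            import org.antlr.v4.runtime.tree.AbstractParseTreeVisitor;
            import org.antlr.v4.runtime.tree.TerminalNode;

            // Counts every terminal node in a parse tree.
            class TerminalCounter extends AbstractParseTreeVisitor<Integer> {
                @Override
                protected Integer defaultResult() {
                    return 0; // initial aggregate and value for childless nodes
                }

                @Override
                protected Integer aggregateResult(Integer aggregate, Integer nextResult) {
                    return aggregate + nextResult; // sum the child results
                }

                @Override
                public Integer visitTerminal(TerminalNode node) {
                    return 1; // each terminal contributes one
                }
            }

Usage is just int tokens = new TerminalCounter().visit(tree);
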
Represents a token that was consumed during resynchronization rather than during a valid match operation. For example, we will create this kind of node during single token insertion and deletion as well as during "consume until error recovery set" upon no viable alternative exceptions. Associate a property with a parse tree node. Useful with parse tree listeners that need to associate values with particular tree nodes, kind of like specifying a return value for the listener event method that visited a particular node. Example:
            ParseTreeProperty<Integer> values = new ParseTreeProperty<Integer>();
            values.put(tree, 36);
            int x = values.get(tree);
            values.removeFrom(tree);
            
You would make one declaration (values here) in the listener and use it lots of times in your event methods.
The discovery of a rule node involves sending two events: the generic enterEveryRule and a rule-specific event. First we trigger the generic, then the rule-specific. We do them in reverse order upon finishing the node. A chunk is either a token tag, a rule tag, or a span of literal text within a tree pattern.

The split method returns a list of chunks in preparation for creating a token stream by tokenize. From there, we get a parse tree with compile. These chunks are converted to RuleTagToken, TokenTagToken, or the regular tokens of the text surrounding the tags.

Represents the result of matching a parse tree against a tree pattern. Constructs a new instance of a match result from the specified parse tree and pattern: the parse tree to match against the pattern; the parse tree pattern; a mapping from label names to collections of nodes located by the tree pattern matching process; and the first node which failed to match the tree pattern during the matching process. Throws IllegalArgumentException if tree, pattern, or labels is null. Get the last node associated with a specific label.

For example, for pattern <id:ID> , get("id") returns the node matched for that ID . If more than one node matched the specified label, only the last is returned. If there is no node associated with the label, this returns null .

Pattern tags like <ID> and <expr> without labels are considered to be labeled with ID and expr , respectively.

The label to check. The last node to match a tag with the specified label, or null if no parse tree matched a tag with the label.
Return all nodes matching a rule or token tag with the specified label.

If the label is the name of a parser rule or token in the grammar, the resulting list will contain both the parse trees matching rule or tags explicitly labeled with the label and the complete set of parse trees matching the labeled and unlabeled tags in the pattern for the parser rule or token. For example, if label is "foo" , the result will contain all of the following.

  • Parse tree nodes matching tags of the form <foo:anyRuleName> and <foo:AnyTokenName> .
  • Parse tree nodes matching tags of the form <anyLabel:foo> .
  • Parse tree nodes matching tags of the form <foo> .
The label. A collection of all nodes matching tags with the specified label . If no nodes matched the label, an empty list is returned.
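
For instance, with a hypothetical pattern <id:ID> = <expr>;, the labels can be queried like this (method names per the Java runtime; pattern and tree are assumed from context):

            ParseTreeMatch m = pattern.match(tree);
            ParseTree id = m.get("id");                // last node bound to label "id"
            List<ParseTree> exprs = m.getAll("expr");  // every node matching <expr> tags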
Return a mapping from label → [list of nodes].

The map includes special entries corresponding to the names of rules and tokens referenced in tags in the original pattern. For additional information, see the description of getAll.

A mapping from labels to parse tree nodes. If the parse tree pattern did not contain any rule or token tags, this map will be empty.
Get the node at which we first detected a mismatch: the node, or null if the match was successful. Gets a value indicating whether the match operation succeeded: true if it succeeded; otherwise, false. Get the tree pattern we are matching against. Get the parse tree we are trying to match to a pattern.

A pattern like <ID> = <expr>; converted to a parse tree by the pattern compiler. Construct a new instance of a tree pattern: the matcher which created this tree pattern; the tree pattern in concrete syntax form; the parser rule which serves as the root of the tree pattern; and the tree pattern in parse-tree form. Match a specific parse tree against this tree pattern and return a match object describing the result; the result can be used to determine whether or not the match was successful. Determine whether or not a parse tree matches this tree pattern: true if the tree is a match for the current tree pattern; otherwise, false. Find all nodes using XPath and then try to match those subtrees against this tree pattern: given the tree to match against this pattern and an XPath expression matching the nodes, return a collection of match objects describing the successful matches; unsuccessful matches are omitted from the result, regardless of the reason for the failure. Get the matcher which created this tree pattern. Get the tree pattern in concrete syntax form. Get the parser rule which serves as the outermost rule for the tree pattern. Get the tree pattern as a parse tree; the rule and token tags from the pattern are present in the parse tree as terminal nodes with a symbol of type RuleTagToken or TokenTagToken.

A tree pattern matching mechanism for ANTLR parse trees.

Patterns are strings of source input text with special tags representing token or rule references such as:

<ID> = <expr>;

Given a pattern start rule such as statement, this object constructs a parse tree with placeholders for the ID and expr subtrees. Then the match routines can compare an actual parse tree from a parse with this pattern. Tag <ID> matches any ID token and tag <expr> references the result of the expr rule (generally an instance of ExprContext).

Pattern x = 0; is a similar pattern that matches the same syntax except that it requires the identifier to be x and the expression to be 0 .

The matches routines return true or false based upon a match for the tree rooted at the parameter sent in. The match routines return a ParseTreeMatch object that contains the parse tree, the parse tree pattern, and a map from tag name to matched nodes (more below). A subtree that fails to match returns a match result with its mismatched node set to the first tree node that did not match.

For efficiency, you can compile a tree pattern in string form to a ParseTreePattern object.

See TestParseTreeMatcher for lots of examples. ParseTreePattern has two helper methods, findAll and match, that are easy to use but not super efficient because they create new matcher objects each time and have to compile the pattern in string form before using it.

The lexer and parser that you pass into the constructor are used to parse the pattern in string form. The lexer converts <ID> = <expr>; into a sequence of four tokens (assuming the lexer throws out whitespace or puts it on a hidden channel). Be aware that the input stream is reset for the lexer (but not the parser; a ParserInterpreter is created to parse the input). Any user-defined fields you have put into the lexer might get changed when this mechanism asks it to scan the pattern string.

Normally a parser does not accept token <expr> as a valid expr but, from the parser passed in, we create a special version of the underlying grammar representation (an ATN) that allows imaginary tokens representing rules ( <expr> ) to match entire rules. We call these bypass alternatives.

Delimiters are < and > , with \ as the escape string by default, but you can set them to whatever you want using setDelimiters. You must escape both start and stop strings: \< and \> .
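
Putting it together, a typical use looks like this (a sketch assuming a generated parser MyParser with a statement rule; the class name, rule, and pattern are illustrative):

            ParseTree tree = parser.statement();
            ParseTreePattern p =
                parser.compileParseTreePattern("<ID> = <expr>;", MyParser.RULE_statement);
            ParseTreeMatch m = p.match(tree);
            if (m.succeeded()) {
                System.out.println(m.get("ID").getText()); // node matched by <ID>
            }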

Constructs a pattern matcher from a lexer and parser object. The lexer input stream is altered for tokenizing the tree patterns. The parser is used as a convenient mechanism to get the grammar name, plus token and rule names. Set the delimiters used for marking rule and token tags within concrete syntax used by the tree pattern parser: the start delimiter; the stop delimiter; and the escape sequence to use for escaping a start or stop delimiter. Throws IllegalArgumentException if start or stop is null or empty. Does pattern matched as rule patternRuleIndex match tree? An overload lets you pass in a compiled pattern instead of a string representation of a tree pattern. Compare pattern matched as rule patternRuleIndex against tree and return a match object that contains the matched elements, or the node at which the match failed; again, an overload accepts a compiled pattern instead of a string. For repeated use of a tree pattern, compile it to a ParseTreePattern using this method. Recursively walk tree against patternTree, filling in the match labels; returns the first node encountered in tree which does not match a corresponding node in patternTree, or null if the match was successful. The specific node returned depends on the matching algorithm used by the implementation, and may be overridden. Is t an (expr <expr>) subtree? Split <ID> = <e:expr> ; into 4 chunks for tokenizing. Used to convert the tree pattern string into a series of tokens; the input stream is reset. Used to collect the grammar file name, token names, and rule names for the parser used to parse the pattern into a parse tree.

A token object representing an entire subtree matched by a parser rule; e.g., <expr> . These tokens are created for chunks where the tag corresponds to a parser rule. The token type for the current token: this is the token type assigned to the bypass alternative for the rule during ATN deserialization. Constructs a new instance of a rule tag token with the specified rule name and bypass token type and no label: the name of the parser rule this rule tag matches; the bypass token type assigned to the parser rule. Throws IllegalArgumentException if ruleName is null or empty. Constructs a new instance with the specified rule name, bypass token type, and label: the label associated with the rule tag, or null if the rule tag is unlabeled. Throws IllegalArgumentException if ruleName is null or empty.

The implementation of toString returns a string of the form ruleName:bypassTokenType .

Gets the name of the parser rule associated with this rule tag. Gets the label associated with the rule tag, or null if this is an unlabeled rule tag.

Rule tag tokens are always placed on the default channel.

This method returns the rule tag formatted with < and > delimiters.

Rule tag tokens have types assigned according to the rule bypass transitions created during ATN deserialization.

Because rule tag tokens are synthetic, the position-related implementations return placeholder values: the line number implementation always returns 0; the character position, token index, start index, and stop index implementations always return -1; and the token source and input stream implementations always return null .

Represents a placeholder tag in a tree pattern. Represents a placeholder tag in a tree pattern. A tag can have any of the following forms.
  • expr : An unlabeled placeholder for a parser rule expr .
  • ID : An unlabeled placeholder for a token of type ID .
  • e:expr : A labeled placeholder for a parser rule expr .
  • id:ID : A labeled placeholder for a token of type ID .
This class does not perform any validation on the tag or label names aside from ensuring that the tag is a non-null, non-empty string.
Construct a new instance of a tag chunk using the specified tag and no label: the tag, which should be the name of a parser rule or token type. Throws IllegalArgumentException if tag is null or empty. Construct a new instance using the specified label and tag: the label for the tag (if this is null , the chunk represents an unlabeled tag); the tag, which should be the name of a parser rule or token type. Throws IllegalArgumentException if tag is null or empty. This method returns a text representation of the tag chunk: labeled tags are returned in the form label:tag , and unlabeled tags are returned as just the tag name. Get the tag for this chunk. Get the label, if any, assigned to this chunk, or null if no label is assigned.

Represents a span of raw text (concrete syntax) between tags in a tree pattern string. Constructs a new instance of a text chunk with the specified text. Throws IllegalArgumentException if text is null .

The implementation of toString returns the text of the chunk in single quotes.

Gets the raw text of this chunk.

A token object representing a token of a particular type; e.g., <ID> . These tokens are created for chunks where the tag corresponds to a lexer rule or token type. Constructs a new instance of a token tag token for an unlabeled tag with the specified token name and type. Constructs a new instance with the specified token name, type, and label; the label may be null if the token tag is unlabeled.

The implementation of toString returns a string of the form tokenName:type .

Gets the token name. Gets the label associated with the token tag, or null if this is an unlabeled token tag.

The implementation of getText returns the token tag formatted with < and > delimiters.

A set of utility routines useful for all kinds of ANTLR trees. Print out a whole tree in LISP form (several overloads); the node-text helper is used on the node payloads to get the text for the nodes, and parse trees are detected so their data can be extracted appropriately. Return an ordered list of all children of this node. Return a list of all ancestors of this node; the first node of the list is the root and the last is the parent of this node.

Represent a subset of XPath XML path syntax for use in identifying nodes in parse trees.

Split path into words and separators / and // via ANTLR itself then walk path elements from left to right. At each separator-word pair, find set of nodes. Next stage uses those as work list.

The basic interface is ParseTree.findAll(tree, pathString, parser) . But that is just shorthand for:

            
            p = new XPath(parser, pathString);
            return p.evaluate(tree);
            

See org.antlr.v4.test.TestXPath for descriptions. In short, this allows operators:

  • / : root
  • // : anywhere
  • ! : invert; this must appear directly after the root or anywhere operator

and path elements:

  • ID : token name
  • 'string' : any string literal token from the grammar
  • expr : rule name
  • * : wildcard matching any node

Whitespace is not allowed.
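
For example, with a hypothetical expression grammar, collecting every ID token that appears anywhere below an expr node might look like this (using the Java runtime's XPath.findAll; tree and parser are assumed from context):

            import org.antlr.v4.runtime.tree.ParseTree;
            import org.antlr.v4.runtime.tree.xpath.XPath;

            // All ID tokens anywhere below an expr node.
            for (ParseTree t : XPath.findAll(tree, "//expr//ID", parser)) {
                System.out.println(t.getText());
            }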

Convert word like * or ID or expr to a path element. anywhere is true if // precedes the word. Return a list of all nodes starting at t as root that satisfy the path. The root / is relative to the node passed to . Construct element like /ID or ID or /* etc... op is null if just node Given tree rooted at t return all nodes matched by this path element. Either ID at start of path or ...//ID in middle of path. Do not buffer up the entire char stream. Do not buffer up the entire char stream. It does keep a small buffer for efficiency and also buffers while a mark exists (set by the lookahead prediction in parser). "Unbuffered" here refers to fact that it doesn't buffer all data, not that's it's on demand loading of char. A moving window buffer of the data being scanned. A moving window buffer of the data being scanned. While there's a marker, we keep adding to buffer. Otherwise, consume() resets so we start filling at index 0 again. The number of characters currently in data .

This is not the buffer capacity, that's data.length .

0..n-1 index into data of next character.

The LA(1) character is data[p] . If p == n , we are out of buffered characters.

Count up with mark() and down with release() . When we release() the last mark, numMarkers reaches 0 and we reset the buffer: copy data[p]..data[n-1] to data[0]..data[(n-1)-p] . This is the LA(-1) character for the current position; when numMarkers > 0 , it is the LA(-1) character for the first character in data , and otherwise it is unspecified. Absolute character index: the index of the character about to be read via LA(1) . Goes from 0 to the number of characters in the entire stream, although the stream size is unknown before the end is reached. The name or source of this char stream. Useful for subclasses that pull chars from somewhere other than this.input. Make sure we have 'need' elements from the current position p ; the last valid p index is data.length-1 , and p+need-1 is the char index 'need' elements ahead, so if we need 1 element, (p+1-1)==p must be less than data.length . Add n characters to the buffer; returns the number of characters actually added, and if the return value is less than n , EOF was reached before n characters could be added. Override to provide a different source of characters than input . Return a marker that we can release later.

The specific marker value used for this class allows for some level of protection against misuse where seek() is called on a mark or release() is called in the wrong order.
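
A typical setup, per the usual guidance for unbuffered streams (the lexer class and file name are hypothetical; CommonTokenFactory(true) copies each token's text eagerly, since the char window behind it may slide away):

            CharStream input = new UnbufferedCharStream(new FileInputStream("huge.txt"));
            MyLexer lexer = new MyLexer(input);
            // Copy token text out of the sliding window as tokens are created.
            lexer.setTokenFactory(new CommonTokenFactory(true));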

Decrement the number of markers, resetting the buffer if we hit 0. Seek to an absolute character index, which might not be in the current sliding window; move p to index-bufferStartIndex .

A moving window buffer of the tokens being scanned. While there's a marker, we keep adding to the buffer; otherwise, consume() resets so we start filling at index 0 again.

The number of tokens currently in tokens .

This is not the buffer capacity, that's tokens.length .

0..n-1 index into tokens of next token.

The LT(1) token is tokens[p] . If p == n , we are out of buffered tokens.

Count up with mark() and down with release() . When we release() the last mark, numMarkers reaches 0 and we reset the buffer: copy tokens[p]..tokens[n-1] to tokens[0]..tokens[(n-1)-p] . This is the LT(-1) token for the current position; when numMarkers > 0 , it is the LT(-1) token for the first token in the buffer, and otherwise it is null . Absolute token index: the index of the token about to be read via LT(1) . Goes from 0 to the number of tokens in the entire stream, although the stream size is unknown before the end is reached.

This value is used to set the token indexes if the stream provides tokens that implement WritableToken .

Make sure we have 'need' elements from the current position p ; the last valid p index is tokens.length-1 , and p+need-1 is the token index 'need' elements ahead, so if we need 1 element, (p+1-1)==p must be less than tokens.length . Add n elements to the buffer; returns the number of tokens actually added, and if the return value is less than n , EOF was reached before n tokens could be added. Return a marker that we can release later.

The specific marker value used for this class allows for some level of protection against misuse where seek() is called on a mark or release() is called in the wrong order.
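
A sketch of wiring an unbuffered token stream into a parser (the parser and lexer names are hypothetical); building parse trees would pin every token in memory and defeat the purpose, so tree construction is turned off:

            UnbufferedTokenStream tokens = new UnbufferedTokenStream(lexer);
            MyParser parser = new MyParser(tokens);
            parser.setBuildParseTree(false); // keep the token window small
            parser.startRule();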