char[]
buffer. Can also pass in a
char[]
to use.
If you need encoding, pass in stream/reader with correct encoding.
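For example, a minimal sketch of wrapping a UTF-8 file in a reader before constructing the stream (the file name is only a placeholder; the construction may throw IOException):
Reader reader = new InputStreamReader(new FileInputStream("input.txt"), StandardCharsets.UTF_8);
CharStream input = new ANTLRInputStream(reader);  // decodes using the reader's charset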
Initializing Methods: Some methods in this interface have unspecified behavior if no call to an initializing method has occurred after the stream was constructed. The following is a list of initializing methods:
index()
after calling this method.LA(1)
before
calling this method becomes the value of
LA(-1)
after calling
this method.index()
is
incremented by exactly 1, as that would preclude the ability to implement
filtering streams (e.g.
LA(1)==
consume
).
i
from the current
position. When
i==1
, this method returns the value of the current
symbol in the stream (which is the next symbol to be consumed). When
i==-1
, this method returns the value of the previously read
symbol in the stream. It is not valid to call this method with
i==0
, but the specific behavior is unspecified because this
method is frequently called from performance-critical code.
This method is guaranteed to succeed if any of the following are true:
i>0
i==-1
and
index()
after the stream was constructed
and
LA(1)
was called in that order. Specifying the current
index()
relative to the index after the stream was created
allows for filtering implementations that do not return every symbol
from the underlying source. Specifying the call to
LA(1)
allows for lazily initialized streams.LA(i)
refers to a symbol consumed within a marked region
that has not yet been released.If
i
represents a position at or beyond the end of the stream,
this method returns
The return value is unspecified if
i<0
and fewer than
-i
calls to
mark()
was called to the current
The returned mark is an opaque handle (type
int
) which is passed
to
mark()
/
release()
are nested, the marks must be released
in reverse order of which they were obtained. Since marked regions are
used during performance-critical sections of prediction, the specific
behavior of invalid usage is unspecified (i.e. a mark is not released, or
a mark is released twice, or marks are not released in reverse order from
which they were created).
The behavior of this method is unspecified if no call to an
This method does not change the current position in the input stream.
The following example shows the use of
IntStream stream = ...;
int index = -1;
int mark = stream.mark();
try {
    index = stream.index();
    // perform work here...
} finally {
    if (index != -1) {
        stream.seek(index);
    }
    stream.release(mark);
}
release()
must appear in the
reverse order of the corresponding calls to
mark()
. If a mark is
released twice, or if marks are not released in reverse order of the
corresponding calls to
mark()
, the behavior is unspecified.
For more information and an example, see
mark()
.
index
. If the
specified index lies past the end of the stream, the operation behaves as
though
index
was the index of the EOF symbol. After this method
returns without throwing an exception, at least one of the following
will be true.
index
. Specifically,
implementations which filter their sources should automatically
adjust
index
forward the minimum amount required for the
operation to target a non-ignored symbol.LA(1)
returns
index
lies within a marked region. For more information on marked regions, see
index
is less than 0
LA(1)
.
The behavior of this method is unspecified if no call to an
interval
lies entirely within a marked range. For more
information about marked ranges, see
interval
is
null
interval.a < 0
, or if
interval.b < interval.a - 1
, or if
interval.b
lies at or
past the end of the stream
This is a one way link. It emanates from a state (usually via a list of transitions) and has a target state.
Since we never have to change the ATN transitions once we construct it, we can fix these transitions as specific classes. The DFA transitions on the other hand need to update the labels as it adds transitions to the states. We'll use the term Edge for the DFA to distinguish them from ATN transitions.
The default implementation returns
false
.
true
if traversing this transition in the ATN does not
consume an input symbol; otherwise,
false
if traversing this
transition consumes (matches) an input symbol.
This event may be reported during SLL prediction in cases where the
conflicting SLL configuration set provides sufficient information to
determine that the SLL conflict is truly an ambiguity. For example, if none
of the ATN configurations in the conflicting SLL configuration set have
traversed a global follow transition (i.e.
false
for all
configurations), then the result of SLL prediction for that input is known to
be equivalent to the result of LL prediction for that input.
In some cases, the minimum represented alternative in the conflicting LL
configuration set is not equal to the minimum represented alternative in the
conflicting SLL configuration set. Grammars and inputs which result in this
scenario are unable to use
null
if no
additional information is relevant or available.
true
if the current event occurred during LL prediction;
otherwise,
false
if the input occurred during SLL prediction.
private int referenceHashCode() {
    int hash = MurmurHash.initialize();

    for (int i = 0; i < size(); i++) {
        hash = MurmurHash.update(hash, getParent(i));
    }

    for (int i = 0; i < size(); i++) {
        hash = MurmurHash.update(hash, getReturnState(i));
    }

    hash = MurmurHash.finish(hash, 2 * size());
    return hash;
}
null
.
s
.
If
ctx
is
s
. In other words, the set will be
restricted to tokens reachable staying within
s
's rule.
s
and
staying in same rule.
stateNumber
in the specified full
context
. This method
considers the complete parser context, but does not evaluate semantic
predicates (i.e. all predicates encountered during the calculation are
assumed true). If a path in the ATN exists from the starting state to the
If
context
is
null
, it is treated as
stateNumber
ATNConfigSet
contains two configs with the same state and alternative
but different semantic contexts. When this case arises, the first config
added to this map stays, and the remaining configs are placed in
null
for read-only sets stored in the DFA.
null
for read-only sets stored in the DFA.
true
, this config set represents configurations where the entire
outer context has been consumed by the ATN interpreter. This prevents the
outermostConfigSet
and
true
if the
actualUuid
value represents a
serialized ATN at or after the feature identified by
feature
was
introduced; otherwise,
false
.
...
support
any number of alternatives (one or more). Nodes without the
...
only
support the exact number of alternatives shown in the diagram.(...)*
(...)+
(...)?
(...)*?
(...)+?
(...)??
(...)
block.
(a|b|c)
block.
In some cases, the unique alternative identified by LL prediction is not
equal to the minimum represented alternative in the conflicting SLL
configuration set. Grammars and inputs which result in this scenario are
unable to use
Parsing performance in ANTLR 4 is heavily influenced by both static factors (e.g. the form of the rules in the grammar) and dynamic factors (e.g. the choice of input and the state of the DFA cache at the time profiling operations are started). For best results, gather and use aggregate statistics from a large sample of inputs representing the inputs expected in production before using the results to make changes in the grammar.
The value of this field is computed by
If DFA caching of SLL transitions is employed by the implementation, ATN computation may cache the computed edge for efficient lookup during future parsing of this decision. Otherwise, the SLL parsing algorithm will use ATN transitions exclusively.
If the ATN simulator implementation does not use DFA caching for SLL transitions, this value will be 0.
Note that this value is not related to whether or not
If DFA caching of LL transitions is employed by the implementation, ATN computation may cache the computed edge for efficient lookup during future parsing of this decision. Otherwise, the LL parsing algorithm will use ATN transitions exclusively.
If the ATN simulator implementation does not use DFA caching for LL transitions, this value will be 0.
For position-dependent actions, the input stream must already be positioned correctly prior to calling this method.
Many lexer commands, including
type
,
skip
, and
more
, do not check the input index during their execution.
Actions like this are position-independent, and may be stored more
efficiently as part of the
true
if the lexer action semantics can be affected by the
position of the input
false
.
The executor tracks position information for position-dependent lexer actions
efficiently, ensuring that actions appearing only at the end of the rule do
not cause bloating of the
lexerActionExecutor
followed by a specified
lexerAction
.
null
, the method behaves as though
it were an empty executor.
The lexer action to execute after the actions
specified in
lexerActionExecutor
.
lexerActionExecutor
and
lexerAction
.
Normally, when the executor encounters lexer actions where
true
, it calls
Prior to traversing a match transition in the ATN, the current offset from the token start index is assigned to all position-dependent lexer actions which have not already been assigned a fixed offset. By storing the offsets relative to the token start index, the DFA representation of lexer actions which appear in the middle of tokens remains efficient due to sharing among tokens of the same length, regardless of their absolute position in the input stream.
If the current executor already has offsets assigned to all
position-dependent lexer actions, the method returns
this
.
This method calls
input
input
should be the start of the following token, i.e. 1
character past the end of the current token.
The token start index. This value may be passed to
input
position to the beginning
of the token.
null
.
t
, or
null
if the target state for this edge is not
already cached
t
. If
t
does not lead to a valid DFA state, this method
returns
t
. Parameter
reach
is a return
parameter.
config
, all other (potentially reachable) states for
this rule would have a lower priority.
true
if an accept state is reached, otherwise
false
.
If
speculative
is
true
, this method was called before
input
and the simulator
to the original state before returning (i.e. undo the actions made by the
call to
true
if the current index in
input
is
one character before the predicate's location.
true
if the specified predicate evaluates to
true
.
We track these variables separately for the DFA and ATN simulation because the DFA simulation often has to fail over to the ATN simulation. If the ATN simulation fails, we need the DFA to fall back to its previously accepted state, if any. If the ATN succeeds, then the ATN does the accept and the DFA simulator that invoked it can simply return the predicted token type.
channel
lexer action by calling
channel
action with the specified channel value.
This action is implemented by calling
false
.
This class may represent embedded actions created with the {...}
syntax in ANTLR 4, as well as actions created for lexer commands where the
command argument could not be evaluated when the grammar was compiled.
Custom actions are implemented by calling
Custom actions are position-dependent since they may represent a
user-defined embedded action which makes calls to methods like
true
.
This action is not serialized as part of the ATN, and is only required for
position-dependent lexer actions which appear at a location other than the
end of a rule. For more information about DFA optimizations employed for
lexer actions, see
Note: This class is only required for lexer actions for which
true
.
This method calls
lexer
.
true
.
mode
lexer action by calling
mode
action with the specified mode value.
This action is implemented by calling
mode
command.
false
.
more
lexer action by calling
The
more
command does not have any parameters, so this action is
implemented as a singleton instance exposed by
more
command.
This action is implemented by calling
false
.
popMode
lexer action by calling
The
popMode
command does not have any parameters, so this action is
implemented as a singleton instance exposed by
popMode
command.
This action is implemented by calling
false
.
pushMode
lexer action by calling
pushMode
action with the specified mode value.
This action is implemented by calling
pushMode
command.
false
.
skip
lexer action by calling
The
skip
command does not have any parameters, so this action is
implemented as a singleton instance exposed by
skip
command.
This action is implemented by calling
false
.
type
lexer action by calling
type
action with the specified token type value.
This action is implemented by calling
false
.
seeThruPreds==false
.
s
. If the closure from transition
i leads to a semantic predicate before matching a symbol, the
element at index i of the result will be
null
.
s
.
s
in the ATN in the
specified
ctx
.
If
ctx
is
null
and the end of the rule containing
s
is reached,
ctx
is not
null
and the end of the outermost rule is
reached,
null
if the context
should be ignored
s
in the ATN in the
specified
ctx
.
s
in the ATN in the
specified
ctx
.
If
ctx
is
null
and the end of the rule containing
s
is reached,
PredictionContext#EMPTY_LOCAL
and the end of the outermost rule is
reached,
null
if the context
should be ignored
s
in the ATN in the
specified
ctx
.
s
in the ATN in the
specified
ctx
.
If
ctx
is
stopState
or the end of the rule containing
s
is reached,
ctx
is not
addEOF
is
true
and
stopState
or the end of the outermost rule is reached,
new HashSet<ATNConfig>
for this argument.
A set used for preventing left recursion in the
ATN from causing a stack overflow. Outside code should pass
new BitSet()
for this argument.
true
to true semantic predicates as
implicitly
true
and "see through them", otherwise
false
to treat semantic predicates as opaque and add
ctx
is
null
if
the final state is not available
The input token stream
The start index for the current prediction
The index at which the prediction was finally made
true
if the current lookahead is part of an LL
prediction; otherwise,
false
if the current lookahead is part of
an SLL prediction
This value is the sum of
The basic complexity of the adaptive strategy makes it harder to understand. We begin with ATN simulation to build paths in a DFA. Subsequent prediction requests go through the DFA first. If they reach a state without an edge for the current symbol, the algorithm fails over to the ATN simulation to complete the DFA path for the current input (until it finds a conflict state or uniquely predicting state).
All of that is done without using the outer context because we want to create a DFA that is not dependent upon the rule invocation stack when we do a prediction. One DFA works in all contexts. We avoid using context not necessarily because it's slower, although it can be, but because of the DFA caching problem. The closure routine only considers the rule invocation stack created during prediction beginning in the decision rule. For example, if prediction occurs without invoking another rule's ATN, there are no context stacks in the configurations. When lack of context leads to a conflict, we don't know if it's an ambiguity or a weakness in the strong LL(*) parsing strategy (versus full LL(*)).
When SLL yields a configuration set with conflict, we rewind the input and retry the ATN simulation, this time using full outer context without adding to the DFA. Configuration context stacks will be the full invocation stacks from the start rule. If we get a conflict using full context, then we can definitively say we have a true ambiguity for that input sequence. If we don't get a conflict, it implies that the decision is sensitive to the outer context. (It is not context-sensitive in the sense of context-sensitive grammars.)
The next time we reach this DFA state with an SLL conflict, through DFA simulation, we will again retry the ATN simulation using full context mode. This is slow because we can't save the results and have to "interpret" the ATN each time we get that input.
CACHING FULL CONTEXT PREDICTIONS
We could cache results from full context to predicted alternative easily and that saves a lot of time but doesn't work in presence of predicates. The set of visible predicates from the ATN start state changes depending on the context, because closure can fall off the end of a rule. I tried to cache tuples (stack context, semantic context, predicted alt) but it was slower than interpreting and much more complicated. Also required a huge amount of memory. The goal is not to create the world's fastest parser anyway. I'd like to keep this algorithm simple. By launching multiple threads, we can improve the speed of parsing across a large number of files.
There is no strict ordering between the amount of input used by SLL vs LL, which makes it really hard to build a cache for full context. Let's say that we have input A B C that leads to an SLL conflict with full context X. That implies that using X we might only use A B but we could also use A B C D to resolve conflict. Input A B C D could predict alternative 1 in one position in the input and A B C E could predict alternative 2 in another position in input. The conflicting SLL configurations could still be non-unique in the full context prediction, which would lead us to requiring more input than the original A B C. To make a prediction cache work, we have to track the exact input used during the previous prediction. That amounts to a cache that maps X to a specific DFA for that context.
Something should be done for left-recursive expression predictions. They are likely LL(1) + pred eval. Easier to do the whole SLL unless error and retry with full LL thing Sam does.
AVOIDING FULL CONTEXT PREDICTION
We avoid doing full context retry when the outer context is empty, we did not dip into the outer context by falling off the end of the decision state rule, or when we force SLL mode.
As an example of the not dip into outer context case, consider as super constructor calls versus function calls. One grammar might look like this:
ctorBody : '{' superCall? stat* '}' ;
Or, you might see something like
stat : superCall ';' | expression ';' | ... ;
In both cases I believe that no closure operations will dip into the outer context. In the first case ctorBody in the worst case will stop at the '}'. In the 2nd case it should stop at the ';'. Both cases should stay within the entry rule and not dip into the outer context.
PREDICATES
Predicates are always evaluated if present, in both SLL and LL. SLL and LL simulation deal with predicates differently, however. SLL collects predicates as it performs closure operations, like ANTLR v3 did. It delays predicate evaluation until it reaches an accept state. This allows us to cache the SLL ATN simulation whereas, if we had evaluated predicates on-the-fly during closure, the DFA state configuration sets would be different and we couldn't build up a suitable DFA.
When building a DFA accept state during ATN simulation, we evaluate any predicates and return the sole semantically valid alternative. If there is more than 1 alternative, we report an ambiguity. If there are 0 alternatives, we throw an exception. Alternatives without predicates act like they have true predicates. The simple way to think about it is to strip away all alternatives with false predicates and choose the minimum alternative that remains.
When we start in the DFA and reach an accept state that's predicated, we test those and return the minimum semantically viable alternative. If no alternatives are viable, we throw an exception.
During full LL ATN simulation, closure always evaluates predicates on-the-fly. This is crucial to reducing the configuration set size during closure. Without this on-the-fly evaluation, it hits a landmine when parsing with the Java grammar, for example.
SHARING DFA
All instances of the same parser share the same decision DFAs through a
static field. Each instance gets its own ATN simulator but they share the
same
THREAD SAFETY
The
s.edge[t]
get the same physical target
null
. Once into the DFA, the DFA simulation does not reference the
null
, to be non-
null
and
dfa.edges[t]
null, or
dfa.edges[t]
to be non-null. The
null
, and requests ATN
simulation. It could also race trying to get
dfa.edges[t]
, but either
way it will work because it's not doing a test and set operation.
Starting with SLL then failing to combined SLL/LL (Two-Stage Parsing)
Sam pointed out that if SLL does not give a syntax error, then there is no
point in doing full LL, which is slower. We only have to try LL if we get a
syntax error. For maximum speed, Sam starts the parser set to pure SLL
mode with the BailErrorStrategy:
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.setErrorHandler(new BailErrorStrategy());
If it does not get a syntax error, then we're done. If it does get a syntax error, we need to retry with the combined SLL/LL strategy.
The reason this works is as follows. If there are no SLL conflicts, then the grammar is SLL (at least for that input set). If there is an SLL conflict, the full LL analysis must yield a set of viable alternatives which is a subset of the alternatives reported by SLL. If the LL set is a singleton, then the grammar is LL but not SLL. If the LL set is the same size as the SLL set, the decision is SLL. If the LL set has size > 1, then that decision is truly ambiguous on the current input. If the LL set is smaller, then the SLL conflict resolution might choose an alternative that the full LL would rule out as a possibility based upon better context information. If that's the case, then the SLL parse will definitely get an error because the full LL analysis says it's not viable. If SLL conflict resolution chooses an alternative within the LL set, then both SLL and LL would choose the same alternative because they both choose the minimum of multiple conflicting alternatives.
Let's say we have a set of SLL conflicting alternatives {1, 2, 3} and
a smaller LL set called s. If s is {2, 3}, then SLL
parsing will get an error because SLL will pursue alternative 1. If
s is {1, 2} or {1, 3} then both SLL and LL will
choose the same alternative because alternative one is the minimum of either
set. If s is {2} or {3} then SLL will get a syntax
error. If s is {1} then SLL will succeed.
Of course, if the input is invalid, then we will get an error for sure in both SLL and LL parsing. Erroneous input will therefore require 2 passes over the input.
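Putting the two stages together, a sketch of the full fallback loop might look like the following (startRule, tokens, and tree are placeholder names for the entry rule, the token stream, and the result; the reset details may differ for a particular application):
ParserRuleContext tree;
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.removeErrorListeners();                      // keep the SLL attempt quiet
parser.setErrorHandler(new BailErrorStrategy());    // bail at the first syntax error
try {
    tree = parser.startRule();                      // stage 1: pure SLL
}
catch (ParseCancellationException ex) {             // thrown by BailErrorStrategy
    tokens.seek(0);                                 // rewind the token stream
    parser.reset();                                 // reset parser state
    parser.addErrorListener(ConsoleErrorListener.INSTANCE);
    parser.setErrorHandler(new DefaultErrorStrategy());
    parser.getInterpreter().setPredictionMode(PredictionMode.LL);
    tree = parser.startRule();                      // stage 2: full LL, reports errors
}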
true
, the DFA stores transition information for both full-context
and SLL parsing; otherwise, the DFA only stores SLL transition
information.
For some grammars, enabling the full-context DFA can result in a substantial performance improvement. However, this improvement typically comes at the expense of memory used for storing the cached DFA states, configuration sets, and prediction contexts.
The default value is
false
.
true
, ambiguous alternatives are reported when they are
encountered within
false
, these messages
are suppressed. The default is
false
.
When messages about ambiguous alternatives are not required, setting this
to
false
enables additional internal optimizations which may lose
this information.
The default implementation of this method uses the following
algorithm to identify an ATN configuration which successfully parsed the
decision entry rule. Choosing such an alternative ensures that the
configs
reached the end of the
decision rule, return
configs
which reached the end of the
decision rule predict the same alternative, return that alternative.configs
which reached the end of the
decision rule predict multiple alternatives (call this S),
choose an alternative in the following order.
configs
to only those
configurations which remain viable after evaluating semantic predicates.
If the set of these filtered configurations which also reached the end of
the decision rule is not empty, return the minimum alternative
represented in this set.
In some scenarios, the algorithm described above could predict an
alternative which will result in a
configs
should be
evaluated
The ATN simulation state immediately before the
null
.
t
, or
null
if the target state for this edge is not
already cached
t
. If
t
does not lead to a valid DFA state, this method
returns
configs
which are in a
configs
are already in a rule stop state, this
method simply returns
configs
.
configs
if all configurations in
configs
are in a
rule stop state, otherwise return a new configuration set containing only
the configurations from
configs
which are in a rule stop state
The prediction context must be considered by this filter to address situations like the following.
grammar TA;
prog: statement* EOF;
statement: letterA | statement letterA 'b' ;
letterA: 'a';
In the above grammar, the ATN state immediately before the token
reference
'a'
in
letterA
is reachable from the left edge
of both the primary and closure blocks of the left-recursive rule
statement
. The prediction context associated with each of these
configurations distinguishes between them, and prevents the alternative
which stepped out to
prog
(and then back in to
statement
) from being eliminated by the filter.
null
predicate indicates an alt containing an
unpredicated config which behaves as "always true."
This method might not be called for every semantic context evaluated during the prediction process. In particular, we currently do not evaluate the following but it may change in the future:
pred
(A|B|...)+
loop. Technically a decision state, but
we don't use it for code generation; somebody might need it, so I'm defining
it for completeness. In reality, the
A+
.
A+
and
(A|B)+
. It has two transitions:
one to loop back to the start of the block and one to exit.
semctx
. See
When using this prediction mode, the parser will either return a correct
parse tree (i.e. the same parse tree that would be returned with the
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
When using this prediction mode, the parser will make correct decisions for all syntactically-correct grammar and input combinations. However, in cases where the grammar is truly ambiguous this prediction mode might not report a precise answer for exactly which alternatives are ambiguous.
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
This prediction mode may be used for diagnosing ambiguities during grammar development. Due to the performance overhead of calculating sets of ambiguous alternatives, this prediction mode should be avoided when the exact results are not necessary.
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
This method computes the SLL prediction termination condition for both of the following cases.
COMBINED SLL+LL PARSING
When LL-fallback is enabled upon SLL conflict, correct predictions are ensured regardless of how the termination condition is computed by this method. Due to the substantially higher cost of LL prediction, the prediction should only fall back to LL when the additional lookahead cannot lead to a unique SLL prediction.
Assuming combined SLL+LL parsing, an SLL configuration set with only
conflicting subsets should fall back to full LL, even if the
configuration sets don't resolve to the same alternative (e.g.
{1,2} and {3,4}). If there is at least one non-conflicting
configuration, SLL could continue with the hopes that more lookahead will
resolve via one of those non-conflicting configurations.
Here's the prediction termination rule, then: SLL (for SLL+LL parsing) stops when it sees only conflicting configuration subsets. In contrast, full LL keeps going when there is uncertainty.
HEURISTIC
As a heuristic, we stop prediction when we see any conflicting subset unless we see a state that only has one alternative associated with it. The single-alt-state thing lets prediction continue upon rules like (otherwise, it would admit defeat too soon):
[12|1|[], 6|2|[], 12|2|[]]
s : (ID | ID ID?) ';' ;
When the ATN simulation reaches the state before
';'
, it has a
DFA state that looks like:
[12|1|[], 6|2|[], 12|2|[]]
. Naturally
12|1|[]
and
12|2|[]
conflict, but we cannot stop
processing this node because alternative two has another way to continue,
via
[6|2|[]]
.
It also lets us continue for this rule:
[1|1|[], 1|2|[], 8|3|[]]
a : A | A | A B ;
After matching input A, we reach the stop state for rule A, state 1. State 8 is the state right before B. Clearly alternatives 1 and 2 conflict and no amount of further lookahead will separate the two. However, alternative 3 will be able to continue and so we do not stop working on this state. In the previous example, we're concerned with states associated with the conflicting alternatives. Here alt 3 is not associated with the conflicting configs, but since we can continue looking for input reasonably, don't declare the state done.
PURE SLL PARSING
To handle pure SLL parsing, all we have to do is make sure that we combine stack contexts for configurations that differ only by semantic predicate. From there, we can do the usual SLL termination heuristic.
PREDICATES IN SLL+LL PARSING
SLL decisions don't evaluate predicates until after they reach DFA stop states because they need to create the DFA cache that works in all semantic situations. In contrast, full LL evaluates predicates collected during start state computation so it can ignore predicates thereafter. This means that SLL termination detection can totally ignore semantic predicates.
Implementation-wise, the configuration set combines stack contexts but not
semantic predicate contexts, so we might see two configurations like the following:
{(s, 1, x, {}), (s, 1, x', {p})}
Before testing these configurations against others, we have to merge
x
and
x'
(without modifying the existing configurations).
For example, we test
(x+x')==x''
when looking for conflicts in
the following configurations.
{(s, 1, x, {}), (s, 1, x', {p}), (s, 2, x'', {})}
If the configuration set has predicates (as indicated by
configs
is in a
true
if any configuration in
configs
is in a
false
configs
are in a
true
if all configurations in
configs
are in a
false
Can we stop looking ahead during ATN simulation or is there some uncertainty as to which alternative we will ultimately pick, after consuming more input? Even if there are partial conflicts, we might know that everything is going to resolve to the same minimum alternative. That means we can stop since no more lookahead will change that fact. On the other hand, there might be multiple conflicts that resolve to different minimums. That means we need more look ahead to decide which of those alternatives we should predict.
The basic idea is to split the set of configurations
C
, into
conflicting subsets
(s, _, ctx, _)
and singleton subsets with
non-conflicting configurations. Two configurations conflict if they have
identical
(s, i, ctx, _)
and
(s, j, ctx, _)
for
i!=j
.
A_s,ctx = {i | (s, i, ctx, _)} for each configuration in C, holding s and ctx fixed.
Or in pseudo-code, for each configuration
c
in
C
:
map[c] U= c.getAlt() # map hash/equals uses s and x, not alt and not pred
The values in
map
are the set of
A_s,ctx
sets.
If
|A_s,ctx|=1
then there is no conflict associated with
s
and
ctx
.
Reduce the subsets to singletons by choosing a minimum of each subset. If the union of these alternative subsets is a singleton, then no amount of more lookahead will help us. We will always pick that alternative. If, however, there is more than one alternative, then we are uncertain which alternative to predict and must continue looking for resolution. We may or may not discover an ambiguity in the future, even if there are no conflicting subsets this round.
The biggest sin is to terminate early because it means we've made a decision but were uncertain as to the eventual outcome. We haven't used enough lookahead. On the other hand, announcing a conflict too late is no big deal; you will still have the conflict. It's just inefficient. It might even look until the end of file.
No special consideration for semantic predicates is required because predicates are evaluated on-the-fly for full LL prediction, ensuring that no configuration contains a semantic context during the termination check.
CONFLICTING CONFIGS
Two configurations
(s, i, x)
and
(s, j, x')
, conflict
when
i!=j
but
x=x'
. Because we merge all
(s, i, _)
configurations together, that means that there are at
most
n
configurations associated with state
s
for
n
possible alternatives in the decision. The merged stacks
complicate the comparison of configuration contexts
x
and
x'
. Sam checks to see if one is a subset of the other by calling
merge and checking to see if the merged result is either
x
or
x'
. If the
x
associated with lowest alternative
i
is the superset, then
i
is the only possible prediction since the
others resolve to
min(i)
as well. However, if
x
is
associated with
j>i
then at least one stack configuration for
j
is not in conflict with alternative
i
. The algorithm
should keep going, looking for more lookahead due to the uncertainty.
For simplicity, I'm doing an equality check between
x
and
x'
that lets the algorithm continue to consume lookahead longer
than necessary. The reason I like the equality is of course the
simplicity but also because that is the test you need to detect the
alternatives that are actually in conflict.
CONTINUE/STOP RULE
Continue if union of resolved alternative sets from non-conflicting and conflicting alternative subsets has more than one alternative. We are uncertain about which alternative to predict.
The complete set of alternatives,
[i for (_,i,_)]
, tells us which
alternatives are still in the running for the amount of input we've
consumed at this point. The conflicting sets let us strip away
configurations that won't lead to more states because we resolve
conflicts to the configuration with a minimum alternate for the
conflicting set.
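As a rough sketch of that continue/stop rule (not the library's actual implementation; it assumes a Pair helper whose equals/hashCode are based on the ATN state and prediction context, and a configuration set named configs):
// Group the alternatives of each configuration by (state, context).
Map<Pair<ATNState, PredictionContext>, BitSet> subsets = new HashMap<>();
for (ATNConfig c : configs) {
    Pair<ATNState, PredictionContext> key = new Pair<>(c.state, c.context);
    subsets.computeIfAbsent(key, k -> new BitSet()).set(c.alt);
}

// Resolve each subset to its minimum alternative and union the results.
BitSet viable = new BitSet();
for (BitSet alts : subsets.values()) {
    viable.set(alts.nextSetBit(0));
}

// Continue looking ahead only while more than one alternative remains viable.
boolean continuePrediction = viable.cardinality() > 1;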
CASES

(s, 1, x), (s, 2, x), (s, 3, z), (s', 1, y), (s', 2, y) yields non-conflicting set {3} U conflicting sets min({1,2}) U min({1,2}) = {1,3} => continue

(s, 1, x), (s, 2, x), (s', 1, y), (s', 2, y), (s'', 1, z) yields non-conflicting set {1} U conflicting sets min({1,2}) U min({1,2}) = {1} => stop and predict 1

(s, 1, x), (s, 2, x), (s', 1, y), (s', 2, y) yields conflicting, reduced sets {1} U {1} = {1} => stop and predict 1, can announce ambiguity {1,2}

(s, 1, x), (s, 2, x), (s', 2, y), (s', 3, y) yields conflicting, reduced sets {1} U {2} = {1,2} => continue

(s, 1, x), (s, 2, x), (s', 3, y), (s', 4, y) yields conflicting, reduced sets {1} U {3} = {1,3} => continue

EXACT AMBIGUITY DETECTION
If all states report the same conflicting set of alternatives, then we know we have the exact ambiguity set.
|A_i|>1
and
A_i = A_j
for all i, j.
In other words, we continue examining lookahead until all
A_i
have more than one alternative and all
A_i
are the same. If A={{1,2}, {1,3}}, then regular LL prediction would terminate
because the resolved set is {1}. To determine what the real
ambiguity is, we have to know whether the ambiguity is between one and
two or one and three, so we keep going. When we need exact ambiguity detection,
we can only stop prediction when the sets look like
A={{1,2}} or {{1,2},{1,2}}, etc...
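Continuing the earlier sketch, the exact-ambiguity condition can be checked over the same alternative subsets (again only a sketch, not the library's implementation):
// Exact ambiguity: every subset A_i conflicts (|A_i| > 1) and all subsets are identical.
boolean exact = true;
BitSet first = null;
for (BitSet alts : subsets.values()) {
    if (alts.cardinality() < 2) { exact = false; break; }
    if (first == null) {
        first = alts;
    } else if (!first.equals(alts)) {
        exact = false;
        break;
    }
}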
altsets
contains more
than one alternative.
true
if every
altsets
has
false
altsets
contains
exactly one alternative.
true
if
altsets
contains a
false
altsets
contains
more than one alternative.
true
if
altsets
contains a
false
altsets
is equivalent.
true
if every member of
altsets
is equal to the
others, otherwise
false
altsets
. If no such alternative exists, this method returns
altsets
.
altsets
c
in
configs
:
map[c] U= c.getAlt() # map hash/equals uses s and x, not alt and not pred
c
in
configs
:
map[c.state] U= c.alt
p1&&p2
, or a sum of products
p1||p2
.
I have scoped the
true}?}.
For context dependent predicates, we must pass in a local context so that references such as $arg evaluate properly as _localctx.arg. We only capture context dependent predicates in the context in which we begin prediction, so we passed in the outer context here in case of context dependent predicate evaluation.
true after precedence predicates are evaluated.
null: if the predicate simplifies to false after precedence predicates are evaluated.
this: if the semantic context is not changed as a result of precedence predicate evaluation.
null
The evaluation of predicates by this context is short-circuiting, but unordered.
The evaluation of predicates by this context is short-circuiting, but unordered.
This is a computed property that is calculated during ATN deserialization
and stored for use in
This error strategy is useful in the following scenarios.
myparser.setErrorHandler(new BailErrorStrategy());
TODO: what to do about lexers
recognizer
.
Note that the calling code will not report an error if this method
returns successfully. The error strategy implementation is responsible
for calling
e
. This method is
called after
The generated code currently contains calls to
(...)*
or
(...)+
).
For an implementation based on Jim Idle's "magic sync" mechanism, see
recognizer
is in the process of recovering
from an error. In error recovery mode,
true
if the parser is currently recovering from a parse
error, otherwise
false
The default implementation simply calls
The default implementation simply calls
The default implementation returns immediately if the handler is already
in error recovery mode. Otherwise, it calls
e
according to the following table.
The default implementation resynchronizes the parser by consuming tokens until we find one in the resynchronization set--loosely the set of tokens that can follow the current rule.
Implements Jim Idle's magic sync mechanism in closures and optional subrules. E.g.,
a : sync ( stuff sync )* ;
sync : {consume to what can follow sync} ;
At the start of a sub rule upon error,
If the sub rule is optional (
(...)?
,
(...)*
, or block
with an empty alternative), then the expected set includes what follows
the subrule.
During loop iteration, it consumes until it sees a token that can start a sub rule or what follows loop. Yes, that is pretty aggressive. We opt to stay in the loop as long as possible.
ORIGINS
Previous versions of ANTLR did a poor job of their recovery within loops. A single mismatched token or missing token would force the parser to bail out of the entire rule surrounding the loop. So, for rule
classDef : 'class' ID '{' member* '}'
input with an extra token between members would force the parser to consume until it found the next class definition rather than the next member definition of the current class.
This functionality cost a little bit of effort because the parser has to compare token set at the start of the loop and at each iteration. If for some reason speed is suffering for you, you can turn off this functionality by simply overriding this method as a blank { }.
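For instance, a hedged sketch of disabling it by overriding the method on the default strategy:
// Turn off the "magic sync" recovery entirely by giving sync() an empty body.
parser.setErrorHandler(new DefaultErrorStrategy() {
    @Override
    public void sync(Parser recognizer) {
        // intentionally empty: skip the token-set comparison at loop boundaries
    }
});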
LT(1)
symbol and has not yet been
removed from the input stream. When this method returns,
recognizer
is in error recovery mode.
This method is called when
The default implementation simply returns if the handler is already in
error recovery mode. Otherwise, it calls
recognizer
is in error recovery mode.
This method is called when
The default implementation simply returns if the handler is already in
error recovery mode. Otherwise, it calls
The default implementation attempts to recover from the mismatched input
by using single token insertion and deletion as described below. If the
recovery attempt fails, this method throws an
EXTRA TOKEN (single token deletion)
LA(1)
is not what we are looking for. If
LA(2)
has the
right token, however, then assume
LA(1)
is some extra spurious
token and delete it. Then consume and return the next token (which was
the
LA(2)
token) as the successful result of the match operation.
This recovery strategy is implemented by
MISSING TOKEN (single token insertion)
If current token (at
LA(1)
) is consistent with what could come
after the expected
LA(1)
token, then assume the token is missing
and use the parser's
This recovery strategy is implemented by
EXAMPLE
For example, Input
i=(3;
is clearly missing the
')'
. When
the parser returns from the nested call to
expr
, it will have
call chain:
stat → expr → atom
and it will be trying to match the
')'
at this point in the
derivation:
=> ID '=' '(' INT ')' ('+' atom)* ';'
                  ^
The attempt to match
')'
will fail when it sees
';'
and
call
LA(1)==';'
is in the set of tokens that can follow the
')'
token reference
in rule
atom
. It can assume that you forgot the
')'
.
true
,
recognizer
will be in error recovery
mode.
This method determines whether or not single-token insertion is viable by
checking if the
LA(1)
input symbol could be successfully matched
if it were instead the
LA(2)
symbol. If this method returns
true
, the caller is responsible for creating and inserting a
token with the correct type to produce this behavior.
true
if single-token insertion is a viable recovery
strategy for the current mismatched input, otherwise
false
recognizer
will not be in error recovery mode since the
returned token was a successful match.
If the single-token deletion is successful, this method calls
null
e
, re-throw it wrapped
in a
The
e
has token at which we
started production for the decision.
The line number in the input where the error occurred.
The character position within that line where the error occurred.
The message to emit.
The exception generated by the parser that led to
the reporting of an error. It is null in the case where
the parser was able to recover in line without exiting the
surrounding rule.
Each full-context prediction which does not result in a syntax error
will call either
When
ambigAlts
is not null, it contains the set of potentially
viable alternatives identified by the prediction algorithm. When
ambigAlts
is null, use
configs
argument.
When
exact
is
true
, all of the potentially
viable alternatives are truly viable, i.e. this is reporting an exact
ambiguity. When
exact
is
false
, at least two of
the potentially viable alternatives are viable for the current input, but
the prediction algorithm terminated as soon as it determined that at
least the minimum potentially viable alternative is truly
viable.
When the
exact
will always be
true
.
true
if the ambiguity is exactly known, otherwise
false
. This is always
true
when
null
to indicate that the potentially ambiguous alternatives are the complete
set of represented alternatives in
configs
the ATN configuration set where the ambiguity was
identified
If one or more configurations in
configs
contains a semantic
predicate, the predicates are evaluated before this method is called. The
subset of alternatives which are still viable after predicates are
evaluated is reported in
conflictingAlts
.
null
, the conflicting alternatives are all alternatives
represented in
configs
.
the simulator state when the SLL conflict was
detected
Each full-context prediction which does not result in a syntax error
will call either
For prediction implementations that only evaluate full-context
predictions when an SLL conflict is found (including the default
configs
may have more than one represented alternative if the
full-context prediction algorithm does not evaluate predicates before
beginning the full-context prediction. In all cases, the final prediction
is passed as the
prediction
argument.
Note that the definition of "context sensitivity" in this method
differs from the concept in
This token stream ignores the value of
LT(k).getType()==LA(k)
.
index
in the stream. When
the preconditions of this method are met, the return value is non-null.
The preconditions for this method are the same as the preconditions of
seek(index)
is
unspecified for the current state and given
index
, then the
behavior of this method is also unspecified.
The symbol referred to by
index
differs from
seek()
only
in the case of filtering streams where
index
lies before the end
of the stream. Unlike
seek()
, this method does not adjust
index
to point to a non-ignored symbol.
interval
. This
method behaves like the following code (including potential exceptions
for violating preconditions of
TokenStream stream = ...;
String text = "";
for (int i = interval.a; i <= interval.b; i++) {
    text += stream.get(i).getText();
}
interval
is
null
TokenStream stream = ...;
String text = stream.getText(new Interval(0, stream.size()));
If
ctx.getSourceInterval()
does not return a valid interval of
tokens provided by this stream, the behavior is unspecified.
TokenStream stream = ...;
String text = stream.getText(ctx.getSourceInterval());
ctx
.
start
and
stop
(inclusive).
If the specified
start
or
stop
token was not provided by
this stream, or if the
stop
occurred before the
start
token, the behavior is unspecified.
For streams which ensure that the
TokenStream stream = ...;
String text = "";
for (int i = start.getTokenIndex(); i <= stop.getTokenIndex(); i++) {
    text += stream.get(i).getText();
}
start
and
stop
tokens.
true
.
[
]
should be
This field is set to -1 when the stream is first constructed or when
i
in tokens has a token.
true
if a token is located at index
i
, otherwise
false
.
n
elements to buffer.
i
. If an
exception is thrown in this method, the current stream index should not be
changed.
For example,
List
of all tokens in
the token type
BitSet
. Return
null
if no tokens were found. This
method looks at both on and off channel tokens.
i
if
tokens[i]
is on channel. Return the index of
the EOF token if there are no tokens on channel between
i
and
EOF.
i
if
tokens[i]
is on channel. Return -1
if there are no tokens on channel between
i
and 0.
If
i
specifies an index at or after the EOF token, the EOF token
index is returned. This is due to the fact that the EOF token is treated
as though it were on every channel.
channel
is
-1
, find any non default channel token.
channel
is
-1
, find any non default channel token.
These properties share a field to reduce the memory footprint of
If
oldToken
is also a
null
, then
null
if the text
should be obtained from the input along with the start and stop indexes
of the token.
This token factory does not explicitly copy token text when constructing tokens.
The default value is
false
to avoid the performance and memory
overhead of copying text for every token unless explicitly requested.
When
copyText
is
false
, the
false
.
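If explicit copying is wanted, a sketch of opting in might look like this (lexer is a placeholder for an existing lexer instance):
// Create a factory that eagerly copies token text out of the char stream,
// then install it on the lexer before tokenizing.
CommonTokenFactory factory = new CommonTokenFactory(true);
lexer.setTokenFactory(factory);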
The
This token stream provides access to all tokens by index or when calling
methods like
By default, tokens are placed on the default channel
(
->channel(HIDDEN)
lexer command, or by using an embedded action to
call
Note: lexer rules which use the
->skip
lexer command or call
The default value is
channel
or have the
This implementation prints messages to
line
,
charPositionInLine
, and
msg
using
the following format.
line line:charPositionInLine msg
true
if this DFA is for a precedence decision; otherwise,
false
. This is the backing field for null
if no start state exists for the specified precedence.
true
if this is a precedence DFA; otherwise,
false
.
precedenceDfa
is
false
, the initial state
null
; otherwise, it is initialized to a new
true
if this is a precedence DFA; otherwise,
false
I use a set of ATNConfig objects not simple states. An ATNConfig is both a state (ala normal conversion) and a RuleContext describing the chain of rules (if any) followed to arrive at that state.
A DFA state may have multiple references to a particular state, but with different ATN contexts (with same or different alts) meaning that state was reached via a different set of rule invocations.
edges.get(symbol)
points to target of symbol.
!=null
.
Because the number of alternatives and number of ATN configurations are finite, there is a finite number of DFA states that can be processed. This is necessary to show that the algorithm terminates.
Cannot test the DFA state numbers here because in
true
, only exactly known ambiguities are reported.
true
to report only exact ambiguities, otherwise
false
to report all ambiguities.
reportedAlts
if it is not
null
, otherwise
returns the set of alternatives represented in
configs
.
If the set of expected tokens is not known and could not be computed,
this method returns
null
.
null
if the information is not available.
If the state number is not known, this method returns -1.
If the context is not available, this method returns
null
.
null
.
If the input stream is not available, this method returns
null
.
null
if the stream is not
available.
If the recognizer is not available, this method returns
null
.
null
if
the recognizer is not available.
The payload is either a
i
th value indexed from 0.
(root child1 .. childN)
. Print just a node if this is a leaf.
If source interval is unknown, this returns
null
.
Errors from the lexer are never passed to the parser. Either you want to keep
going or you do not upon token recognition error. If you do not want to
continue lexing then you do not want to continue parsing. Just throw an
exception not under
null
if no input stream is available for the token
source.
listener
is
null
.
Used for XPath and tree pattern compilation.
Used for XPath and tree pattern compilation.
For interpreters, we don't know their serialized ATN despite having created the interpreter from it.
If the final token in the list is an
null
, a call to
tokens
is
null
null
,
tokens
is
null
value
is
null
.
seed
.
value
.
value
.
hash
to form the final result of the MurmurHash 3 hash function.
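A typical call sequence looks like the following sketch (the two updated values are placeholders):
// Initialize, mix in each value, then finish with the number of values hashed.
int hash = MurmurHash.initialize();
hash = MurmurHash.update(hash, firstValue);
hash = MurmurHash.update(hash, secondValue);
hash = MurmurHash.finish(hash, 2);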
set
, or both.
null
argument is
treated as though it were an empty set.
this
(to support chained calls)
a
.
null
argument is treated as though it were an empty set.
a
. The value
null
may be returned in
place of an empty result set.
elements
but not present in the current set. The
following expressions are equivalent for input non-null
x
and
y
.
x.complement(y)
y.subtract(x)
null
argument is treated as though it were an empty set.
elements
but not present in the current set. The value
null
may be returned in place of an empty result set.
a
, or both.
This method is similar to
null
argument
is treated as though it were an empty set.
a
. The value
null
may be returned in place of an
empty result set.
a
.
The following expressions are equivalent for input non-null
x
and
y
.
y.subtract(x)
x.complement(y)
null
argument is treated as though it were an empty set.
elements
but not present in the current set. The value
null
may be returned in place of an empty result set.
true
if the set contains the specified element.
true
if the set contains
el
; otherwise
false
.
true
if this set contains no elements.
true
if the current set contains no elements; otherwise,
false
.
this
not in
other
;
other
must not be totally enclosed (properly contained)
within
this
, which would result in two disjoint intervals
instead of the single one returned by this method.
This class is able to represent sets containing any combination of values in
the range
left - right
. If either of the input sets is
null
, it is treated as though it was an empty set.
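For illustration only, a small sketch of working with such interval sets (the numeric ranges are arbitrary):
// Build a set covering 1..10 plus 15, then intersect it with 5..20.
IntervalSet vocab = IntervalSet.of(1, 10);
vocab.add(15);
IntervalSet overlap = vocab.and(IntervalSet.of(5, 20));   // {5..10, 15}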
true
.
(true)
is called, a reference to the
(false)
. The listener itself is
implemented as a parser listener so this field is not directly used by
other parser methods.
ttype
. If the symbol type
matches,
If the symbol type does not match,
true
and the token index of the symbol returned by
ttype
and the error strategy could not recover from the
mismatched symbol
If the symbol type does not match,
true
and the token index of the symbol returned by
listener
to receive events during the parsing process.
To support output-preserving grammar transformations (including but not
limited to left-recursion removal, automated left-factoring, and
optimized code generation), calls to listener methods during the parse
may differ substantially from calls made by
With the following specific exceptions, calls to listener events are deterministic, i.e. for identical input the calls to listener methods will be the same.
listener is
null
listener
from the list of parse listeners.
If
listener
is
null
or has not been added as a parse
listener, this method does nothing.
ParseTree t = parser.expr();
ParseTreePattern p = parser.compileParseTreePattern("<ID>+0", MyParser.RULE_expr);
ParseTreeMatch m = p.match(t);
String id = m.get("ID").getText();
E.g., given the following input with
A
being the current
lookahead symbol, this function moves the cursor to
B
and returns
A
.
A B
^
If the parser is not in error recovery mode, the consumed symbol is added to the parse tree using
symbol
can follow the current state in the
ATN. The behavior of this method is equivalent to the following, but is
implemented such that the complete context-sensitive follow set does not
need to be explicitly constructed.
return getExpectedTokens().contains(symbol);
true
if
symbol
can follow the current state in
the ATN, otherwise
false
.
RULE_ruleName
field) or -1 if not found.
Note that if we are not building parse trees, rule contexts only point upwards. When a rule exits, it returns the context but that gets garbage collected if nobody holds a reference. It points upwards but nobody points at it.
When we build parse trees, we are adding all of these contexts to
true
for a newly constructed parser.
true
if a complete parse tree will be constructed while
parsing, otherwise
false
false
by default for a newly constructed parser.
true
to trim the capacity of the
true
if the
You can insert stuff, replace, and delete chunks. Note that the operations
are done lazily--only if you convert the buffer to a
This rewriter makes no modifications to the token stream. It does not ask the
stream to fill itself up nor does it advance the input cursor. The token
stream
The rewriter only works on tokens that you have in the buffer and ignores the
current input cursor. If you are buffering tokens on-demand, calling
Since the operations are done lazily at
i
does not change the index values for tokens
i
+1..n-1.
Because operations never actually alter the buffer, you may always get the original token stream back without undoing anything. Since the instructions are queued up, you can easily simulate transactions and roll back any changes if there is an error just by removing instructions. For example,
CharStream input = new ANTLRFileStream("input");
TLexer lex = new TLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lex);
T parser = new T(tokens);
TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
parser.startRule();
Then in the rules, you can execute (assuming rewriter is visible):
Token t, u;
...
rewriter.insertAfter(t, "text to put after t");
rewriter.insertAfter(u, "text after u");
System.out.println(rewriter.getText());
You can also have multiple "instruction streams" and get multiple rewrites from a single pass over the input. Just name the instruction streams and use that name again when printing the buffer. This could be useful for generating a C file and also its header file--all from the same buffer:
tokens.insertAfter("pass1", t, "text to put after t");} tokens.insertAfter("pass2", u, "text after u");} System.out.println(tokens.toString("pass1")); System.out.println(tokens.toString("pass2"));
If you don't use named rewrite streams, a "default" stream is used as the first example shows.
XVisitor
interface for
grammar
X
.
The default implementation calls
The default implementation initializes the aggregate result to
false
no more children are visited and the current aggregate
result is returned. After visiting a child, the aggregate result is
updated by calling
The default implementation is not safe for use in visitors that modify the tree structure. Visitors that modify the tree should override this method to behave properly in respect to the specific algorithm in use.
The default implementation returns the result of
The default implementation returns the result of
false
, the aggregate value is returned as the result of
The default implementation returns
nextResult
, meaning
aggregate
argument
to this method after the first child node is visited.
The result of the immediately preceding call to visit
a child node.
currentResult
will be the initial
value (in the default implementation, the initial value is returned by a
call to
The default implementation always returns
true
, indicating that
visitChildren
should only return after all children are visited.
One reason to override this method is to provide a "short circuit"
evaluation option for situations where the result of visiting a single
child has the potential to determine the result of the visit operation as
a whole.
true
to continue visiting children. Otherwise return
false
to stop visiting children and immediately return the
current aggregate result from
The base implementation returns
null
.
ParseTreeProperty<Integer> values = new ParseTreeProperty<Integer>();
values.put(tree, 36);
int x = values.get(tree);
values.removeFrom(tree);
You would make one declaration (values here) in the listener and use it lots of times in your event methods.
Throws IllegalArgumentException if tree is null, if pattern is null, or if labels is null.
Gets the last node associated with a specific label.
For example, for pattern
<id:ID>
,
get("id")
returns the
node matched for that
ID
. If more than one node
matched the specified label, only the last is returned. If there is
no node associated with the label, this returns
null
.
Pattern tags like
<ID>
and
<expr>
without labels are
considered to be labeled with
ID
and
expr
, respectively.
Returns the last node associated with the label, or
null
if no parse tree matched a tag with the label.
If the
label
is the name of a parser rule or token in the
grammar, the resulting list will contain both the parse trees matching
rule or tags explicitly labeled with the label and the complete set of
parse trees matching the labeled and unlabeled tags in the pattern for
the parser rule or token. For example, if
label
is
"foo"
,
the result will contain all of the following.
- Parse tree nodes matching tags of the form <foo:anyRuleName> and <foo:AnyTokenName>.
- Parse tree nodes matching tags of the form <anyLabel:foo>.
- Parse tree nodes matching tags of the form <foo>.
Returns a list of all nodes matching tags with the specified
label
. If no nodes matched the label, an empty list
is returned.
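A small sketch of the difference between the two accessors, assuming a ParseTreeMatch m obtained from an earlier match call; get and getAll are the methods described here:
// Sketch: m is a ParseTreeMatch for the pattern "<ID> = <expr>;" against some tree.
ParseTree last = m.get("ID");          // last node matched by the <ID> tag
List<ParseTree> all = m.getAll("ID");  // every node matched by an ID tag or label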
The map includes special entries corresponding to the names of rules and
tokens referenced in tags in the original pattern. For additional
information, see the description of getAll.
Returns the node at which we first detected a mismatch, or
null
if the match was successful.
true
if the match operation succeeded; otherwise,
false
.
A pattern like
<ID> = <expr>;
converted to a ParseTree by ParseTreePatternMatcher.compile(String, int).
true
if
tree
is a match for the current tree
pattern; otherwise,
false
.
Patterns are strings of source input text with special tags representing token or rule references such as:
<ID> = <expr>;
Given a pattern start rule such as
statement
, this object constructs
a parse tree with placeholders for the
ID
and
expr
subtrees. Then the match routines can compare an actual parse tree from a parse with this pattern. Tag
<ID>
matches
any
ID
token and tag
<expr>
references the result of the
expr
rule (generally an instance of
ExprContext
).
Pattern
x = 0;
is a similar pattern that matches the same pattern
except that it requires the identifier to be
x
and the expression to
be
0
.
The matches routines return
true
or
false
based
upon a match for the tree rooted at the parameter sent in. The match routines return a ParseTreeMatch object that describes the match.
For efficiency, you can compile a tree pattern in string form to a ParseTreePattern object.
See
TestParseTreeMatcher
for lots of examples.
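For example, a minimal sketch of compiling and applying a pattern; the parser and tree variables, the grammar name XParser, and its statement rule are placeholders, while compileParseTreePattern, match, succeeded, and get are the runtime methods described here:
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.pattern.ParseTreeMatch;
import org.antlr.v4.runtime.tree.pattern.ParseTreePattern;

// Sketch: XParser.RULE_statement is the hypothetical generated rule index constant.
ParseTreePattern pattern =
        parser.compileParseTreePattern("<ID> = <expr>;", XParser.RULE_statement);
ParseTreeMatch m = pattern.match(tree);
if (m.succeeded()) {
    ParseTree id = m.get("ID");   // node matched by the unlabeled <ID> tag
    System.out.println(id.getText());
}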
The lexer and parser that you pass into the ParseTreePatternMatcher constructor are used to parse the pattern in string form. The lexer converts
<ID> = <expr>;
into a sequence of four tokens (assuming lexer
throws out whitespace or puts it on a hidden channel). Be aware that the
input stream is reset for the lexer (but not the parser; a ParserInterpreter is created to parse the input). Any user-defined fields you have put into the lexer might get changed when this mechanism asks it to scan the pattern string.
Normally a parser does not accept token
<expr>
as a valid
expr
but, from the parser passed in, we create a special version of
the underlying grammar representation (an ATN) that allows imaginary tokens representing rules (e.g.,
<expr>
) to match entire rules. We call
these bypass alternatives.
Delimiters are
<
and
>
, with
\
as the escape string
by default, but you can set them to whatever you want using setDelimiters. You must escape both start and stop strings
\<
and
\>
.
Throws IllegalArgumentException if
start
is
null
or empty, or if
stop
is
null
or empty.
Does
pattern
matched as rule
patternRuleIndex
match
tree
?
Does pattern
matched as rule patternRuleIndex match tree? Pass in a
compiled pattern instead of a string representation of a tree pattern.
Compare
pattern
matched as rule
patternRuleIndex
against
tree
and return a ParseTreeMatch object that contains the matched elements, or the node at which the match failed.
Compare
pattern
matched against
tree
and return a ParseTreeMatch object that contains the matched elements, or the node at which the match failed.
Recursively match
tree
against
patternTree
, filling
match.labels.
Returns the node in
tree
which does not match
a corresponding node in
patternTree
, or
null
if the match
was successful. The specific node returned depends on the matching
algorithm used by the implementation, and may be overridden.
Is
t
(expr <expr>)
subtree?
Split
<ID> = <e:expr> ;
into 4 chunks for tokenizing by tokenize().
A Token object representing an entire subtree matched by a parser rule, e.g.,
<expr>
. These tokens are created for tag chunks where the tag corresponds to a parser rule.
Throws IllegalArgumentException if
ruleName
is
null
or empty.
The label associated with the rule tag, or
null
if
the rule tag is unlabeled.
Throws IllegalArgumentException if
ruleName
is
null
or empty.
The implementation for RuleTagToken returns a string of the form
ruleName:bypassTokenType
.
Returns the name of the label, or
null
if this is an unlabeled rule tag.
Rule tag tokens are always placed on the DEFAULT_CHANNEL.
This method returns the rule tag formatted with
<
and
>
delimiters.
Rule tag tokens have types assigned according to the rule bypass transitions created during ATN deserialization.
The remaining implementations for RuleTagToken return fixed values: the position-related accessors (line, character position in line, token index, and start/stop indexes) return constants, and the token source and input stream accessors always return
null
.
A tag can have any of the following forms:
- expr: An unlabeled placeholder for a parser rule expr.
- ID: An unlabeled placeholder for a token of type ID.
- e:expr: A labeled placeholder for a parser rule expr.
- id:ID: A labeled placeholder for a token of type ID.
Throws IllegalArgumentException if the
tag
is
null
or
empty.
If the label is
null
, the chunk represents an unlabeled tag. Throws IllegalArgumentException if the
tag
is
null
or
empty.
This method returns a string of the form
label:tag
, and unlabeled tags are
returned as just the tag name.
Returns the label assigned to this chunk, or
null
if no label is
assigned to the chunk.
Throws IllegalArgumentException if
text
is
null
.
The implementation for TextChunk returns the result of getText() in single quotes.
A Token object representing a token of a particular type, e.g.,
<ID>
. These tokens are created for tag chunks where the tag corresponds to a lexer rule or token type.
The label associated with the token tag, or
null
if
the token tag is unlabeled.
The implementation for TokenTagToken returns a string of the form
tokenName:type
.
Returns the token label, or
null
if this is an unlabeled tag.
The implementation for TokenTagToken returns the token tag formatted with
<
and
>
delimiters.
Split path into words and separators
/
and
//
via ANTLR
itself then walk path elements from left to right. At each separator-word
pair, find set of nodes. Next stage uses those as work list.
The basic interface is XPath.findAll(tree, pathString, parser).
But that is just shorthand for:
XPath p = new XPath(parser, pathString);
return p.evaluate(tree);
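For example, a minimal sketch of the shorthand form; the tree and parser variables and the //ID path are placeholders for whatever parse result and query you have:
import java.util.Collection;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.xpath.XPath;

// Sketch: tree and parser come from a normal parse of some grammar.
Collection<ParseTree> ids = XPath.findAll(tree, "//ID", parser);
for (ParseTree id : ids) {
    System.out.println(id.getText());
}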
See
org.antlr.v4.test.TestXPath
for descriptions. In short, this
allows operators such as / (root) and // (anywhere), and path elements such as token names (e.g., ID), rule names (e.g., expr), and the wildcard *.
Whitespace is not allowed.
Convert a word like
*
or
ID
or
expr
to a path
element.
anywhere
is
true
if
//
precedes the
word.
Return a list of all nodes starting at
t
as root that satisfy the
path. The root
/
is relative to the node passed to evaluate().
Construct element like
/ID
or
ID
or
/*
etc...
op is null if just node.
Given tree rooted at
t
return all nodes matched by this path
element.
Either
ID
at start of path or
...//ID
in middle of path.
This is not the buffer capacity, that's
data.length
.
The
LA(1)
character is
data[p]
. If
p == n
, we are
out of buffered characters.
When we
release()
the last mark,
numMarkers
reaches 0 and we reset the buffer. Copy
data[p]..data[n-1]
to
data[0]..data[(n-1)-p]
.
This is the
LA(-1)
character for the current position.
When
numMarkers > 0
, this is the
LA(-1)
character for the
first character in the buffer; otherwise, this value is unspecified.
Absolute character index; the index of the character about to be read via
LA(1)
. Goes from 0 to the number of characters in the
entire stream, although the stream size is unknown before the end is
reached.
Make sure we have 'need' elements from the current position
p
. Last valid
p
index is
data.length-1
.
p+need-1
is
the char index 'need' elements ahead. If we need 1 element,
(p+1-1)==p
must be less than
data.length
.
Add
n
characters to the buffer. Returns the number of characters
actually added to the buffer. If the return value is less than
n
,
then EOF was reached before
n
characters could be added.
The specific marker value used for this class allows for some level of
protection against misuse where
seek()
is called on a mark or
release()
is called in the wrong order.
Seek to an absolute character index, which might not be in the current sliding window. Moves
p
to
index-bufferStartIndex
.
This is not the buffer capacity, that's
tokens.length
.
The
LT(1)
token is
tokens[p]
. If
p == n
, we are
out of buffered tokens.
When we
release()
the last mark,
numMarkers
reaches 0 and we reset the buffer. Copy
tokens[p]..tokens[n-1]
to
tokens[0]..tokens[(n-1)-p]
.
This is the
LT(-1)
token for the current position.
When
numMarkers > 0
, this is the
LT(-1)
token for the
first token in the buffer; otherwise, this is
null
.
Absolute token index; the index of the token about to be read via
LT(1)
. Goes from 0 to the number of tokens in the entire stream,
although the stream size is unknown before the end is reached.
This value is used to set the token indexes if the stream provides tokens
that implement WritableToken.
Make sure we have 'need' elements from the current position
p
. Last valid
p
index is
tokens.length-1
.
p+need-1
is the tokens index 'need' elements
ahead. If we need 1 element,
(p+1-1)==p
must be less than
tokens.length
.
Add
n
elements to the buffer. Returns the number of tokens
actually added to the buffer. If the return value is less than
n
,
then EOF was reached before
n
tokens could be added.
The specific marker value used for this class allows for some level of
protection against misuse where
seek()
is called on a mark or
release()
is called in the wrong order.