public interface Lexer<T extends TokenId>
Lexer reads input characters from a LexerInput and groups
them into tokens.
A TokenFactory instance should be passed to the lexer in its constructor.
The lexer must be able to express its internal lexing
state at token boundaries and it must be able
to restart lexing from such state.
If the input characters following the restart point do not change, the lexer is expected to return the same tokens regardless of whether it was restarted at the restart point or run from the beginning of the input as a batch lexer.
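The restart requirement can be illustrated with a small self-contained sketch. It does not use the NetBeans API: ToyLexer, lexFrom and restartMatchesBatch are hypothetical names, and the toy grammar (a double quote toggles a "string" mode) is an assumption chosen only so that a non-default state exists at some token boundaries.

```java
import java.util.ArrayList;
import java.util.List;

final class RestartSketch {
    /** Toy lexer (hypothetical, not the NetBeans API): '"' toggles string mode. */
    static final class ToyLexer {
        private final String text;
        private int pos;
        private boolean inString;

        // state == null means the default state; non-null means "inside a string".
        ToyLexer(String text, int offset, Object state) {
            this.text = text;
            this.pos = offset;
            this.inString = state != null;
        }

        /** Returns the next token as a string, or null once all input is consumed. */
        String nextToken() {
            if (pos >= text.length()) return null;
            int start = pos;
            if (text.charAt(pos) == '"') {
                pos++;
                inString = !inString;
                return "QUOTE";
            }
            while (pos < text.length() && text.charAt(pos) != '"') pos++;
            return (inString ? "STRING_TEXT:" : "TEXT:") + text.substring(start, pos);
        }

        /** State at the current token boundary, independent of this instance. */
        Object state() { return inString ? Integer.valueOf(1) : null; }

        int offset() { return pos; }
    }

    /** Lexes from the given offset and state to the end of the text. */
    static List<String> lexFrom(String text, int offset, Object state) {
        ToyLexer lexer = new ToyLexer(text, offset, state);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    /** Checks that restarting at every token boundary reproduces the batch tokens. */
    static boolean restartMatchesBatch(String text) {
        List<String> batch = lexFrom(text, 0, null);
        ToyLexer lexer = new ToyLexer(text, 0, null);
        for (int i = 0; i < batch.size(); i++) {
            lexer.nextToken();
            List<String> restarted = lexFrom(text, lexer.offset(), lexer.state());
            if (!restarted.equals(batch.subList(i + 1, batch.size()))) return false;
        }
        return true;
    }
}
```

Restarting this toy lexer after an opening quote with the default state would misclassify the following characters as plain text, which is why the state has to be captured at each token boundary.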
Testing of the lexers:
Newly written lexers can be tested in several ways. The simplest way is to test batch lexing first (see e.g. org.netbeans.lib.lexer.test.simple.SimpleLexerBatchTest in the lexer module tests).
Then the incremental behavior of the new lexer can be tested (see e.g. org.netbeans.lib.lexer.test.simple.SimpleLexerIncTest).
Finally, the lexer can be tested by random tests that randomly insert and remove characters from the document (see e.g. org.netbeans.lib.lexer.test.simple.SimpleLexerRandomTest).
Once these tests pass, the lexer can be considered stable.
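As a rough illustration of the random-testing idea, the following sketch applies random single-character insertions and removals to a document and, after each edit, checks that the tokens still cover every character. It is a much-simplified analogue and does not reproduce the referenced test classes: RandomEditSketch, lex and randomEditsKeepCoverage are hypothetical names, and the toy tokenizer (runs of letters, digits, or other characters) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RandomEditSketch {
    // Character class used by the toy tokenizer: 0 = letter, 1 = digit, 2 = other.
    static int kind(char c) {
        return Character.isLetter(c) ? 0 : Character.isDigit(c) ? 1 : 2;
    }

    /** Toy batch tokenizer (hypothetical): splits text into runs of one character class. */
    static List<String> lex(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int start = pos;
            int k = kind(text.charAt(pos));
            while (pos < text.length() && kind(text.charAt(pos)) == k) pos++;
            tokens.add(text.subSequence(start, pos).toString());
        }
        return tokens;
    }

    /** Randomly inserts and removes characters, re-lexing after every edit. */
    static boolean randomEditsKeepCoverage(long seed, int steps) {
        Random rnd = new Random(seed);
        String alphabet = "ab1 +";
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < steps; i++) {
            if (doc.length() > 0 && rnd.nextBoolean()) {
                doc.deleteCharAt(rnd.nextInt(doc.length()));
            } else {
                char c = alphabet.charAt(rnd.nextInt(alphabet.length()));
                doc.insert(rnd.nextInt(doc.length() + 1), c);
            }
            // The concatenated token texts must reproduce the whole document.
            if (!String.join("", lex(doc)).contentEquals(doc)) return false;
        }
        return true;
    }
}
```

A fixed seed keeps a failing run reproducible, which matters more for random tests than for the batch and incremental ones.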
|Modifier and Type
|Method and Description
Return a token based on characters of the input and possibly additional input properties.
The infrastructure calls this method when it no longer needs this lexer for lexing, so the lexer becomes unused.
The lexer infrastructure calls this method to obtain the lexer's current state once the lexer has recognized and returned a token.
Characters are read by using the LexerInput.read() method. Once the lexer
knows that it has read enough characters to recognize
a token, it calls the TokenFactory
to obtain an instance of a
Token and then returns it.
Note: The lexer must *not* return any
Token instances other than
those obtained from the TokenFactory.
The lexer is required to tokenize all the characters (except EOF)
provided by the
LexerInput prior to returning null
from this method. Not doing so is treated
as a malfunction of the lexer.
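The contract described above (read characters, obtain the token from the factory, return null only after every character has been tokenized) can be sketched as follows. The Input and Factory classes are simplified, hypothetical stand-ins written for this example; they are not the real LexerInput and TokenFactory, and tokens are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

final class NextTokenSketch {
    static final int EOF = -1;

    /** Hypothetical stand-in for a lexer input over a String. */
    static final class Input {
        private final String text;
        private int pos;
        private int tokenStart;

        Input(String text) { this.text = text; }

        /** Returns the next character, or EOF without advancing at the end. */
        int read() { return pos < text.length() ? text.charAt(pos++) : EOF; }

        /** Un-reads the given number of characters. */
        void backup(int count) { pos -= count; }

        /** Returns the text read since the last token and starts a new token. */
        String takeTokenText() {
            String s = text.substring(tokenStart, pos);
            tokenStart = pos;
            return s;
        }
    }

    /** Hypothetical stand-in for a token factory: the only place tokens are created. */
    static final class Factory {
        private final Input input;
        Factory(Input input) { this.input = input; }
        String createToken(String id) { return id + "[" + input.takeTokenText() + "]"; }
    }

    /** Toy lexer: letter runs become WORD tokens, any other character is OTHER. */
    static final class WordLexer {
        private final Input input;
        private final Factory factory;

        WordLexer(Input input) {
            this.input = input;
            this.factory = new Factory(input);
        }

        String nextToken() {
            int c = input.read();
            if (c == EOF) return null; // everything before EOF was tokenized
            if (Character.isLetter(c)) {
                while ((c = input.read()) != EOF && Character.isLetter(c)) { }
                if (c != EOF) input.backup(1); // give back the non-letter
                return factory.createToken("WORD");
            }
            return factory.createToken("OTHER");
        }
    }

    /** Runs the toy lexer until it returns null. */
    static List<String> tokenize(String text) {
        WordLexer lexer = new WordLexer(new Input(text));
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }
}
```

In this sketch the lexer never builds a token itself; every token passes through the factory, mirroring the rule that only factory-created tokens may be returned.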
A dedicated skip token provided by the TokenFactory is returned if the token should be skipped because of a token filter.
IllegalStateException - if the token instance returned by the lexer
was not created by the methods of TokenFactory (there is a common superclass
for those token implementations).
IllegalStateException - if this method returns null but not all
the characters of the lexer input were tokenized.
If the lexer is in no extra state (it is in a default state)
it should return
null. Most lexers are in the default state
all the time.
If possible, the non-default lexer states should be expressed as small non-negative integers.
There is an optimization that shrinks the storage cost of small
java.lang.Integer values to single bytes.
The returned value should not be tied to this particular lexer instance in any way. Another lexer instance may be restarted from this state later.
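One possible way to follow these recommendations is sketched below: the default state maps to null, every other mode maps to a small non-negative Integer, and the encoded value carries no reference to the lexer that produced it. The class name, the mode constants and the encode/decode helpers are hypothetical and chosen only for illustration.

```java
final class StateEncodingSketch {
    // Hypothetical internal modes of a lexer.
    static final int DEFAULT = 0;
    static final int IN_STRING = 1;
    static final int IN_COMMENT = 2;

    /** Encodes a mode as a state object: null for default, a small Integer otherwise. */
    static Object encode(int mode) {
        return mode == DEFAULT ? null : Integer.valueOf(mode);
    }

    /** Decodes a state object when a (possibly different) lexer instance is restarted. */
    static int decode(Object state) {
        return state == null ? DEFAULT : ((Integer) state).intValue();
    }
}
```

Because the encoded state is either null or an immutable Integer, it can safely be handed to another lexer instance that is later restarted from it.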