public interface Lexer<T extends TokenId>
Lexer reads input characters from
LexerInput
and groups them into tokens.
Tokens are created by calling
TokenFactory.createToken(TokenId).
The token factory instance should be given to the lexer in its constructor.
The lexer must be able to express its internal lexing
state at token boundaries, and it must be able
to restart lexing from such a state.
If the input characters following the restart point
do not change, the lexer is expected to return the same tokens
regardless of whether it was restarted at the restart point
or run from the beginning of the input as a batch lexer.
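The restart requirement above can be illustrated with a minimal, self-contained sketch. It does not use the NetBeans classes: `RestartSketch`, `Tok`, `lex` and `IN_STRING` are hypothetical names invented for this illustration, and the toy grammar (quotes toggle a "string" state) is an assumption chosen only because it makes the state matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch, NOT the NetBeans API: a toy lexer used to show restart equivalence. */
final class RestartSketch {

    /** Non-default state: the lexer is between an opening and a closing quote. */
    static final Integer IN_STRING = 1; // the default state is represented by null

    /** A token plus the lexer state valid at the boundary right after it. */
    static final class Tok {
        final String kind;
        final String text;
        final Integer stateAfter;

        Tok(String kind, String text, Integer stateAfter) {
            this.kind = kind;
            this.text = text;
            this.stateAfter = stateAfter;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Tok)) return false;
            Tok t = (Tok) o;
            return kind.equals(t.kind) && text.equals(t.text)
                    && Objects.equals(stateAfter, t.stateAfter);
        }

        @Override public int hashCode() {
            return Objects.hash(kind, text, stateAfter);
        }
    }

    /** Lexes input from offset, starting in the given state (null = default). */
    static List<Tok> lex(String input, int offset, Integer state) {
        List<Tok> out = new ArrayList<>();
        int i = offset;
        while (i < input.length()) {
            int start = i;
            if (input.charAt(i) == '"') {
                i++;
                state = (state == null) ? IN_STRING : null; // a quote toggles the state
                out.add(new Tok("QUOTE", input.substring(start, i), state));
            } else {
                while (i < input.length() && input.charAt(i) != '"') i++;
                String kind = (state == null) ? "TEXT" : "STRING_TEXT";
                out.add(new Tok(kind, input.substring(start, i), state));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "ab\"cd\"ef";
        List<Tok> batch = lex(input, 0, null);
        // Restart at the boundary after the second token, using its saved state.
        int offset = batch.get(0).text.length() + batch.get(1).text.length();
        List<Tok> restarted = lex(input, offset, batch.get(1).stateAfter);
        System.out.println(restarted.equals(batch.subList(2, batch.size()))); // prints true
    }
}
```

Without the saved state, the restarted run would classify `cd` as `TEXT` instead of `STRING_TEXT`, which is the kind of mismatch the requirement rules out.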
Testing of the lexers:
Newly written lexers can be tested in several ways.
The simplest way is to test batch lexing first
(see e.g.
org.netbeans.lib.lexer.test.simple.SimpleLexerBatchTest in the lexer module tests).
Then the incremental behavior of the new lexer can be tested
(see e.g.
org.netbeans.lib.lexer.test.simple.SimpleLexerIncTest).
Finally, the lexer can be tested by random tests that randomly insert and remove
characters from the document
(see e.g.
org.netbeans.lib.lexer.test.simple.SimpleLexerRandomTest).
Once these tests pass, the lexer can be considered stable.
Modifier and Type | Method and Description |
---|---|
Token<T> | nextToken() Return a token based on characters of the input and possibly additional input properties. |
void | release() The infrastructure calls this method when it no longer needs this lexer for lexing, so it becomes unused. |
Object | state() Called by the lexer infrastructure to return the lexer's present state once the lexer has recognized and returned a token. |
Token<T> nextToken()
Characters are read by using the
LexerInput.read()
method. Once the lexer
knows that it has read enough characters to recognize
a token, it calls
TokenFactory.createToken(TokenId)
to obtain an instance of a Token
and then returns it.
Note: The lexer must *not* return any Token
instances other than
those obtained from the TokenFactory.
The lexer is required to tokenize all the characters (except EOF)
provided by the LexerInput
prior to returning null
from this method. Not doing so is treated
as a malfunction of the lexer.
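As a rough illustration of the read-then-create loop and of the "tokenize everything before returning null" rule, here is a self-contained sketch. It deliberately avoids the real NetBeans types: `NextTokenSketch`, `MiniInput`, `finishToken`, `fullyTokenized` and `lexAll` are hypothetical stand-ins, tokens are plain strings, and the grammar (runs of whitespace versus runs of non-whitespace) is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins, NOT the NetBeans classes. */
final class NextTokenSketch {
    static final int EOF = -1;

    /** Minimal character source, loosely modelled on the idea of a lexer input. */
    static final class MiniInput {
        private final String text;
        private int pos;        // index of the next character to read
        private int tokenStart; // index where the token being recognized starts

        MiniInput(String text) { this.text = text; }

        int read() { return pos < text.length() ? text.charAt(pos++) : EOF; }

        void backup(int count) { pos -= count; }

        /** Hypothetical helper: closes the current token and returns its text. */
        String finishToken() {
            String token = text.substring(tokenStart, pos);
            tokenStart = pos;
            return token;
        }

        boolean fullyTokenized() { return tokenStart == text.length(); }
    }

    /** Returns the next token text, or null once every character has been tokenized. */
    static String nextToken(MiniInput in) {
        int ch = in.read();
        if (ch == EOF) return null;
        boolean space = Character.isWhitespace(ch);
        // Keep reading while the characters belong to the same kind of run.
        while ((ch = in.read()) != EOF && Character.isWhitespace(ch) == space) { }
        if (ch != EOF) in.backup(1); // the last character starts the next token
        return in.finishToken();
    }

    /** Driver that enforces the contract: null only after all input is tokenized. */
    static List<String> lexAll(String text) {
        MiniInput in = new MiniInput(text);
        List<String> tokens = new ArrayList<>();
        for (String t; (t = nextToken(in)) != null; ) tokens.add(t);
        if (!in.fullyTokenized()) {
            throw new IllegalStateException("null returned but input not fully tokenized");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lexAll("ab  c").size()); // prints 3
    }
}
```

Concatenating the returned tokens reproduces the input exactly, which is a convenient property to assert in a batch test.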
Returns:
the recognized token, null once all the characters of the input have been tokenized, or
TokenFactory.SKIP_TOKEN
if the token should be skipped because of a token filter.
Throws:
IllegalStateException
- if the token instance created by the lexer
was not created by the methods of TokenFactory (there is a common superclass
for those token implementations).
IllegalStateException
- if this method returns null but not all
the characters of the lexer input were tokenized.
Object state()
If the lexer is in no extra state (it is in the default state),
it should return null.
Most lexers are in the default state all the time.
If possible, the non-default lexer states should be expressed
as small non-negative integers.
There is an optimization that shrinks the storage costs for small
java.lang.Integer
values to single bytes.
The returned value should not be tied to this particular lexer instance in any way. Another lexer instance may be restarted from this state later.
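One possible way to follow these recommendations is sketched below. The class `DepthStateSketch`, its constructor and its `consume` method are hypothetical and do not implement the NetBeans interface; the sketch only assumes a lexer whose sole extra state is a nesting depth.

```java
/** Hypothetical sketch, NOT the NetBeans API: a lexer-like object tracking "(" nesting depth. */
final class DepthStateSketch {
    private int depth; // 0 corresponds to the default state

    /** Restores from a previously returned state; null means the default state. */
    DepthStateSketch(Object state) {
        this.depth = (state == null) ? 0 : (Integer) state;
    }

    /** Consumes one character and updates the nesting depth. */
    void consume(char ch) {
        if (ch == '(') {
            depth++;
        } else if (ch == ')' && depth > 0) {
            depth--;
        }
    }

    /** Returns null in the default state, otherwise a small non-negative Integer. */
    Object state() {
        return (depth == 0) ? null : Integer.valueOf(depth);
    }

    public static void main(String[] args) {
        DepthStateSketch first = new DepthStateSketch(null);
        for (char ch : "((a)".toCharArray()) first.consume(ch);
        Object saved = first.state(); // an immutable Integer, independent of 'first'

        DepthStateSketch second = new DepthStateSketch(saved); // a different instance
        second.consume(')');
        System.out.println(saved + " " + second.state()); // prints "1 null"
    }
}
```

Because the saved value is an immutable `Integer` rather than a reference to the lexer or to any of its mutable fields, a different instance can be restarted from it safely.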
void release()