public final class LexerInput extends Object

Provides characters to feed the Lexer.
It logically corresponds to java.io.Reader, but its LexerInput.read() method does not throw any checked exception.
It allows backing up one or more characters that were already read by LexerInput.read() so that they can be re-read later.
The read characters can be converted into a java.lang.CharSequence by LexerInput.readText(int, int).
The LexerInput can only be used safely by a single thread.
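As a rough illustration of the Reader analogy, the sketch below wraps a java.io.Reader so that read() reports I/O failures as an unchecked UncheckedIOException, and characters that were already read can be backed up and re-read. The class name ReaderBackedInput and its buffering strategy are assumptions made for illustration; this is not the NetBeans implementation, and for brevity it does not model how EOF is counted by backup(int).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical adapter (not the NetBeans implementation): Reader-like input whose
// read() throws no checked exception and whose characters can be backed up.
public class ReaderBackedInput {
    public static final int EOF = -1;

    private final Reader reader;
    // Every character pulled from the reader so far; positions below the
    // buffer length can be handed out again after backup().
    private final StringBuilder buffer = new StringBuilder();
    private int pos;          // index in buffer of the next character to return
    private boolean eofSeen;  // the reader has reported end of stream

    public ReaderBackedInput(Reader reader) {
        this.reader = reader;
    }

    // Returns the next character or EOF; an IOException is rethrown unchecked.
    public int read() {
        if (pos < buffer.length()) {
            return buffer.charAt(pos++);   // re-read a backed-up character
        }
        if (eofSeen) {
            return EOF;                    // repeated reads keep returning EOF
        }
        try {
            int c = reader.read();
            if (c == -1) {
                eofSeen = true;
                return EOF;
            }
            buffer.append((char) c);
            pos++;
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Moves the reading position back by count characters (EOF is not modeled here).
    public void backup(int count) {
        if (count < 0 || count > pos) {
            throw new IndexOutOfBoundsException("count=" + count);
        }
        pos -= count;
    }
}
```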
[Figure: example of Java identifier recognition]
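The identifier recognition in the figure can be approximated with a small sketch: read one character and check Character.isJavaIdentifierStart, keep reading while Character.isJavaIdentifierPart holds, then back up the one character (or EOF) that ended the identifier. IdentifierSketch and its nested Input class are hypothetical stand-ins over a String, not the NetBeans LexerInput; tokenCreated() only plays the role of creating a token.

```java
// Hypothetical sketch of identifier recognition over a minimal String-backed
// stand-in for LexerInput; not the NetBeans classes.
public class IdentifierSketch {
    static final int EOF = -1;

    static final class Input {
        private final String text;
        private int tokenStart;   // offset where the current token begins
        private int pos;          // offset of the next character to read
        private boolean eofRead;  // EOF was returned and not yet backed up

        Input(String text) {
            this.text = text;
        }

        int read() {
            if (pos < text.length()) {
                return text.charAt(pos++);
            }
            eofRead = true;
            return EOF;
        }

        // EOF counts as a single character, so backing it up only clears the flag.
        void backup(int count) {
            if (eofRead && count > 0) {
                eofRead = false;
                count--;
            }
            pos -= count;
        }

        int readLength() {
            return pos - tokenStart;
        }

        CharSequence readText(int start, int end) {
            return text.subSequence(tokenStart + start, tokenStart + end);
        }

        // Stand-in for creating a token: the next token starts here.
        void tokenCreated() {
            tokenStart = pos;
            eofRead = false;
        }
    }

    // Returns the identifier at the current position, or null if there is none.
    static String identifier(Input in) {
        int c = in.read();
        // Test the int against EOF before any cast: (char) EOF would be 0xFFFF.
        if (c == EOF || !Character.isJavaIdentifierStart(c)) {
            in.backup(1);
            return null;
        }
        do {
            c = in.read();
        } while (c != EOF && Character.isJavaIdentifierPart(c));
        // The last read() returned a non-identifier character or EOF: put it back.
        in.backup(1);
        String id = in.readText(0, in.readLength()).toString();
        in.tokenCreated();
        return id;
    }
}
```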
Modifier and Type | Field and Description |
---|---|
static int | EOF: Integer constant -1 returned by LexerInput.read() to signal that there are no more characters available on input. |
Modifier and Type | Method and Description |
---|---|
void | backup(int count): Undo the last count LexerInput.read() operations. |
boolean | consumeNewline(): Read the next character and check whether it is '\n'; if it is not, back it up (otherwise leave it consumed). |
int | read(): Read a single character from input or return LexerInput.EOF. |
int | readLength(): Get the distance between the current reading point and the beginning of the token currently being recognized (excluding a possibly read EOF). |
int | readLengthEOF(): Read length that includes EOF as a single character if it was just read from this input. |
CharSequence | readText(): Return the read text for all the characters consumed from the input for the current token recognition. |
CharSequence | readText(int start, int end): Get the character sequence that corresponds to characters that were read by previous LexerInput.read() operations in the current token. |
public static final int EOF

Integer constant -1 returned by LexerInput.read() to signal that there are no more characters available on input.
It is counted as a single character in LexerInput.backup(int) operations.
When cast to char, it becomes 0xFFFF.

public int read()
Read a single character from input or return LexerInput.EOF.

Returns: a valid character, or LexerInput.EOF when there are no more characters available on input. Reads may be repeated once EOF was returned; all of them will return EOF.

public void backup(int count)
Undo the last count LexerInput.read() operations so that subsequent read operations will re-read the characters that were backed up.
If LexerInput.EOF was returned by LexerInput.read(), it counts as a single character in the backup operation (even if it was returned multiple times), i.e. backup(1) undoes the reading of a previously read EOF.

Example:

```java
// Back up the last character that was read (either a regular char or EOF)
lexerInput.backup(1);

// Back up all characters read during recognition of the current token
lexerInput.backup(lexerInput.readLengthEOF());
```

Parameters:
count - >= 0 amount of characters to return back to the input.

Throws:
IndexOutOfBoundsException - if count > readLengthEOF().

public int readLength()
Get the distance between the current reading point and the beginning of the token currently being recognized (excluding a possibly read EOF).

Returns: the number of LexerInput.read() operations since the last token was returned. LexerInput.backup(int) operations with a positive argument decrease that value, while those with a negative argument increase it.
Once a token gets created by TokenFactory.createToken(TokenId), the value returned by readLength() becomes zero.
If LexerInput.EOF was read, it is not counted into the read length.
public int readLengthEOF()

Read length that includes EOF as a single character if it was just read from this input.
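The counting rules for read(), backup(int), readLength() and readLengthEOF() can be modeled over a plain String. In the hypothetical LexerInputModel below, repeated EOF reads set a single flag, so EOF counts as one character in backup(int) and readLengthEOF() but not in readLength(); createToken() is an assumed stand-in for TokenFactory.createToken(TokenId) resetting the read length to zero. It rejects negative counts, following the count >= 0 parameter contract of backup(int). This is a sketch of the documented behavior, not the actual implementation.

```java
// Hypothetical model of the documented LexerInput counting rules over a String;
// not the NetBeans implementation.
public class LexerInputModel {
    public static final int EOF = -1;

    private final String text;
    private int tokenStart;   // offset where the current token begins
    private int pos;          // offset of the next character to read
    private boolean eofRead;  // EOF returned (possibly repeatedly) and not backed up

    public LexerInputModel(String text) {
        this.text = text;
    }

    public int read() {
        if (pos < text.length()) {
            return text.charAt(pos++);
        }
        eofRead = true;  // repeated EOF reads still count as one character
        return EOF;
    }

    public void backup(int count) {
        if (count < 0 || count > readLengthEOF()) {
            throw new IndexOutOfBoundsException("count=" + count);
        }
        if (eofRead && count > 0) {  // EOF is the most recently read "character"
            eofRead = false;
            count--;
        }
        pos -= count;
    }

    // Excludes EOF.
    public int readLength() {
        return pos - tokenStart;
    }

    // Includes EOF as a single character if it was just read.
    public int readLengthEOF() {
        return readLength() + (eofRead ? 1 : 0);
    }

    // Stand-in for TokenFactory.createToken(TokenId): the read length drops to zero.
    public void createToken() {
        tokenStart = pos;
        eofRead = false;
    }
}
```

For input "ab", two character reads followed by two EOF reads give readLength() == 2 and readLengthEOF() == 3; a subsequent backup(1) undoes only the EOF.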
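public CharSequence readText(int start, int end)

Get the character sequence that corresponds to characters that were read by previous LexerInput.read() operations in the current token.
For example, it can be used to decide between a keyword and an identifier through a hash map lookup: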
```java
private static final Map<String, TokenId> kwdStr2id = new HashMap<>();

static {
    String[] keywords = new String[] { "private", "protected", ... };
    TokenId[] ids = new TokenId[] { JavaLanguage.PRIVATE, JavaLanguage.PROTECTED, ... };
    for (int i = keywords.length - 1; i >= 0; i--) {
        kwdStr2id.put(keywords[i], ids[i]);
    }
}

public Token nextToken() {
    ... read characters of identifier/keyword by lexerInput.read() ...

    // Now decide between keyword or identifier. Map.get() matches via the
    // argument's equals(), so the CharSequence finds String keys directly.
    CharSequence text = lexerInput.readText(0, lexerInput.readLength());
    TokenId id = kwdStr2id.get(text);
    // tokenFactory: the lexer's TokenFactory (field name assumed)
    return tokenFactory.createToken((id != null) ? id : JavaLanguage.IDENTIFIER);
}
```
If LexerInput.EOF was previously returned by LexerInput.read(), it will not be a part of the returned character sequence (it also does not count into LexerInput.readLength()).

Subsequent invocations of this method are cheap because the returned CharSequence instance is reused and just reinitialized.
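The keyword lookup works because the returned CharSequence hashes like a String and compares equal to any CharSequence with the same characters, and because Map.get(key) matches an entry whose key k satisfies key.equals(k). The hypothetical TextView below sketches such a reusable view (reinit re-points one instance instead of allocating), together with a lookup helper; these names and the String-valued keyword map are assumptions for illustration. The equality is one-directional: String.equals returns false for a non-String argument, so the view has to be the argument of get(), not a stored key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a reusable text view with String-compatible
// hashCode() and equals(); not the NetBeans implementation.
public class TextViewSketch {

    static final class TextView implements CharSequence {
        private char[] buf;
        private int start;
        private int end;

        // Re-points the same instance at buf[start, end) instead of allocating.
        TextView reinit(char[] buf, int start, int end) {
            this.buf = buf;
            this.start = start;
            this.end = end;
            return this;
        }

        @Override public int length() {
            return end - start;
        }

        @Override public char charAt(int index) {
            if (index < 0 || index >= length()) {
                throw new IndexOutOfBoundsException("index=" + index);
            }
            return buf[start + index];
        }

        @Override public CharSequence subSequence(int from, int to) {
            return toString().subSequence(from, to);
        }

        @Override public String toString() {
            return new String(buf, start, length());
        }

        // Same formula as String.hashCode(): s[0]*31^(n-1) + ... + s[n-1].
        @Override public int hashCode() {
            int h = 0;
            for (int i = start; i < end; i++) {
                h = 31 * h + buf[i];
            }
            return h;
        }

        // Equal to any CharSequence with the same length and characters.
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CharSequence)) return false;
            CharSequence other = (CharSequence) o;
            if (other.length() != length()) return false;
            for (int i = 0; i < length(); i++) {
                if (charAt(i) != other.charAt(i)) return false;
            }
            return true;
        }
    }

    static final Map<String, String> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("private", "PRIVATE");
        KEYWORDS.put("protected", "PROTECTED");
    }

    // Looks up buf[start, end) without creating a String; null if not a keyword.
    static String lookup(TextView view, char[] buf, int start, int end) {
        return KEYWORDS.get(view.reinit(buf, start, end));
    }
}
```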
Parameters:
start - >= 0 and <= LexerInput.readLength(): the starting index of the character sequence in the previously read characters.
end - >= start and <= LexerInput.readLength(): the ending index (exclusive) of the character sequence in the previously read characters.

Returns: a character sequence that is only valid until any of read(), backup(), createToken() or another readText() is called.
The length() of the returned character sequence equals end - start.
Its hashCode() method works the same way as String.hashCode().
Its equals() method attempts to cast the compared object to CharSequence, compares the lengths and, if they match, compares every character of the given character sequence, i.e. the same way String.equals() compares characters.

Throws:
IndexOutOfBoundsException - if the parameters are not in the required bounds.

public CharSequence readText()

Return the read text for all the characters consumed from the input for the current token recognition.
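public boolean consumeNewline()

Read the next character and check whether it is '\n'; if it is not, back it up (otherwise leave it consumed).
This method is useful in the following scenario:

```java
switch (ch) {
    case 'x':
        ...
        break;
    case 'y':
        ...
        break;
    case '\r':
        input.consumeNewline();
        // fall through: a lone '\r' and "\r\n" both end the line
    case '\n':
        // Line separator recognized
}
```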
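A self-contained version of that scenario counts line separators so that "\r\n" and a lone "\r" each count once. NewlineSketch and its Input class are hypothetical; consumeNewline() here follows the documented contract (read the next character, keep it if it is '\n', otherwise back it up), and backing up an EOF only clears the EOF flag, as described for backup(int).

```java
// Hypothetical sketch of consumeNewline() on a minimal String-backed input;
// not the NetBeans implementation.
public class NewlineSketch {
    static final int EOF = -1;

    static final class Input {
        private final String text;
        private int pos;          // offset of the next character to read
        private boolean eofRead;  // EOF was returned and not yet backed up

        Input(String text) {
            this.text = text;
        }

        int read() {
            if (pos < text.length()) {
                return text.charAt(pos++);
            }
            eofRead = true;
            return EOF;
        }

        // EOF counts as a single character, so backing it up only clears the flag.
        void backup(int count) {
            if (eofRead && count > 0) {
                eofRead = false;
                count--;
            }
            pos -= count;
        }

        // Reads the next char; keeps it if it is '\n', otherwise backs it up.
        boolean consumeNewline() {
            if (read() == '\n') {
                return true;
            }
            backup(1);
            return false;
        }
    }

    // Counts line separators, treating "\r\n" as a single separator.
    static int countLineSeparators(String text) {
        Input input = new Input(text);
        int separators = 0;
        int ch;
        while ((ch = input.read()) != EOF) {
            switch (ch) {
                case '\r':
                    input.consumeNewline();
                    // fall through: "\r" and "\r\n" both end a line
                case '\n':
                    separators++;
                    break;
                default:
                    break;
            }
        }
        return separators;
    }
}
```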