public interface TokenProcessor
Modifier and Type | Method and Description |
---|---|
int | eot(int offset): Notify that end of scanned buffer was found. |
void | nextBuffer(char[] buffer, int offset, int len, int startPos, int preScan, boolean lastBuffer): Notify that the following buffer will be scanned. |
boolean | token(TokenID tokenID, TokenContextPath tokenContextPath, int tokenBufferOffset, int tokenLength): Notify that the token was found. |

boolean token(TokenID tokenID, TokenContextPath tokenContextPath, int tokenBufferOffset, int tokenLength)
tokenID
- ID of the token found.
tokenContextPath
- Context path in which the token was found.
tokenBufferOffset
- Offset of the token in the buffer. The buffer
is provided in the nextBuffer() method.
tokenLength
- Length of the token found.

int eot(int offset)
offset
- Offset of the rest of the characters.

void nextBuffer(char[] buffer, int offset, int len, int startPos, int preScan, boolean lastBuffer)
buffer
- Buffer that will be scanned. To get the text of the tokens,
the buffer should be stored in an instance variable.
offset
- Offset in the buffer of the first character to be scanned.
It doesn't reflect the possible preScan. If the preScan is non-zero,
the first buffer offset that contains valid data is
offset - preScan.
len
- Count of the characters that will be scanned. It doesn't reflect
the possible preScan.
startPos
- Starting position of the scanning in the document. It
logically corresponds to the offset, because the buffer and the
document contain the same text data.
Again, it doesn't reflect the possible preScan; startPos - preScan
gives the real start of the first token. If it's necessary to know
the position of each token, it's a good idea to store the value
startPos - offset in an instance variable that could be called
bufferStartPos. The position of the token can then be computed
as bufferStartPos + tokenBufferOffset.
preScan
- preScan needed for the scanning, i.e. the number of characters
before offset that already contain valid data.
lastBuffer
- Whether this is the last buffer to scan in the document,
so there are no more characters in the document after this buffer.
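
The bufferStartPos advice above can be sketched roughly as follows. This is a minimal illustration, not the library's own code. TokenID, TokenContextPath and TokenProcessor are redeclared here as bare stand-ins so the snippet compiles on its own, and PositionRecordingProcessor and TokenPositionDemo are hypothetical names. The return values of token() and eot() are not described in this section, so the values returned below are placeholders chosen only to make the sketch compile.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the library types so the sketch is self-contained.
// In real code these come from the editor library, not from this file.
interface TokenID {}
interface TokenContextPath {}

interface TokenProcessor {
    boolean token(TokenID tokenID, TokenContextPath tokenContextPath,
                  int tokenBufferOffset, int tokenLength);
    int eot(int offset);
    void nextBuffer(char[] buffer, int offset, int len,
                    int startPos, int preScan, boolean lastBuffer);
}

/** Hypothetical processor that records the text and document position of each token. */
class PositionRecordingProcessor implements TokenProcessor {
    final List<String> texts = new ArrayList<>();
    final List<Integer> positions = new ArrayList<>();

    private char[] buffer;       // kept so token text can be read later
    private int bufferStartPos;  // document position of buffer index 0

    @Override
    public void nextBuffer(char[] buffer, int offset, int len,
                           int startPos, int preScan, boolean lastBuffer) {
        this.buffer = buffer;
        // startPos corresponds to offset, so index 0 maps to startPos - offset.
        this.bufferStartPos = startPos - offset;
    }

    @Override
    public boolean token(TokenID tokenID, TokenContextPath tokenContextPath,
                         int tokenBufferOffset, int tokenLength) {
        texts.add(new String(buffer, tokenBufferOffset, tokenLength));
        positions.add(bufferStartPos + tokenBufferOffset);
        return true; // assumption: the meaning of the result is not given here
    }

    @Override
    public int eot(int offset) {
        return 0; // assumption: the meaning of the result is not given here
    }
}

class TokenPositionDemo {
    public static void main(String[] args) {
        PositionRecordingProcessor p = new PositionRecordingProcessor();
        // Scanning starts at index 4 ('i'), which is document position 100.
        // The two characters before it ("pr") are the preScan.
        char[] buffer = "..print x".toCharArray();
        p.nextBuffer(buffer, 4, 5, 100, 2, true);
        p.token(null, null, 2, 5); // "print", starts at offset - preScan
        p.token(null, null, 7, 1); // " "
        p.token(null, null, 8, 1); // "x"
        System.out.println(p.texts + " " + p.positions);
    }
}
```

In this sketch bufferStartPos is 100 - 4 = 96, so the first token is recorded at position 96 + 2 = 98, which equals startPos - preScan, and the remaining two tokens at 103 and 104.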