public class Syntax extends Object
Modifier and Type | Class and Description |
---|---|
static class |
Syntax.BaseStateInfo
Base implementation of the StateInfo interface
|
static interface |
Syntax.StateInfo
Interface that stores two basic pieces of information about
the state of the whole lexical analyzer - its internal state and preScan.
|
Modifier and Type | Field and Description |
---|---|
protected char[] |
buffer
Text buffer to scan
|
static int |
DIFFERENT_STATE
Is the state of the analyzer different from the given state info?
|
static int |
EQUAL_STATE
Is the state of the analyzer equal to the given state info?
|
static int |
INIT
Initial internal state of the analyzer
|
protected boolean |
lastBuffer
When this flag is true, no more buffers are currently available,
so the analyzer should return all the tokens, including those
whose scanning would otherwise be deferred until the next
buffer becomes available.
|
protected int |
offset
Current offset in the buffer
|
protected int |
state
Internal state of the lexical analyzer.
|
protected int |
stopOffset
Offset in the buffer at which scanning should stop.
|
protected int |
stopPosition
The position in the document that logically corresponds
to the stopOffset value.
|
protected TokenID |
supposedTokenID
This variable can be populated by the parseToken() method
when the user types an erroneous construction but
it is clear which correct token was meant.
|
protected TokenContextPath |
tokenContextPath
Path from which the found token-id comes.
|
protected int |
tokenLength
Length of the token that was found.
|
protected int |
tokenOffset
Offset holding the beginning of the current token
|
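A minimal sketch of how a state snapshot and the two comparison constants might fit together. `ToyStateInfo` and `ToyAnalyzer` are hypothetical stand-ins, not the real `Syntax.StateInfo` and `Syntax`, and the numeric values given to `EQUAL_STATE`, `DIFFERENT_STATE` and `INIT` are assumptions chosen only for illustration. The pre-scan is taken here as the distance between `offset` and `tokenOffset`.

```java
// Hypothetical stand-in for Syntax.StateInfo: holds the internal state and the preScan.
final class ToyStateInfo {
    int state;
    int preScan;
}

// Hypothetical stand-in for an analyzer; this is not the real Syntax class.
final class ToyAnalyzer {
    // Illustrative values only; the real constants may differ.
    static final int EQUAL_STATE = 0;
    static final int DIFFERENT_STATE = 1;
    static final int INIT = -1;

    int state = INIT;
    int offset;
    int tokenOffset;

    // Number of characters already scanned past the start of the current token.
    int getPreScan() {
        return offset - tokenOffset;
    }

    // Copy the analyzer state into the snapshot.
    void storeState(ToyStateInfo info) {
        info.state = state;
        info.preScan = getPreScan();
    }

    // Restore the analyzer state from a non-null snapshot.
    void loadState(ToyStateInfo info) {
        state = info.state;
        tokenOffset = offset - info.preScan;
    }

    // Report whether the analyzer matches the snapshot.
    int compareState(ToyStateInfo info) {
        if (info == null) {
            return DIFFERENT_STATE;
        }
        boolean same = info.state == state && info.preScan == getPreScan();
        return same ? EQUAL_STATE : DIFFERENT_STATE;
    }

    public static void main(String[] args) {
        ToyAnalyzer analyzer = new ToyAnalyzer();
        analyzer.state = 3;
        analyzer.tokenOffset = 10;
        analyzer.offset = 14; // four characters pre-scanned

        ToyStateInfo mark = new ToyStateInfo();
        analyzer.storeState(mark);
        System.out.println(analyzer.compareState(mark) == EQUAL_STATE);     // true

        analyzer.state = INIT;
        System.out.println(analyzer.compareState(mark) == DIFFERENT_STATE); // true
    }
}
```

Comparing a stored snapshot against the live analyzer in this way is one plausible basis for deciding whether previously scanned text has to be scanned again.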
Constructor and Description |
---|
Syntax() |
Modifier and Type | Method and Description |
---|---|
int |
compareState(Syntax.StateInfo stateInfo)
Compare state of this analyzer to given state info
|
Syntax.StateInfo |
createStateInfo()
Create state info appropriate for particular analyzer
|
char[] |
getBuffer()
Get the current buffer
|
int |
getOffset()
Get the current scanning offset
|
int |
getPreScan()
Get the pre-scan, which is the number
of characters between offset and tokenOffset.
|
String |
getStateName(int stateNumber)
Get state name as string.
|
TokenID |
getSupposedTokenID() |
TokenContextPath |
getTokenContextPath()
Get the token-context-path of the returned token.
|
int |
getTokenLength()
Get length of token in scanned buffer.
|
int |
getTokenOffset()
Get start of token in scanned buffer.
|
void |
load(Syntax.StateInfo stateInfo,
char[] buffer,
int offset,
int len,
boolean lastBuffer,
int stopPosition)
Load the state from syntax mark into analyzer.
|
void |
loadInitState()
Initialize the analyzer when scanning from the beginning
of the document, when the state stored in the syntax mark
is null for some reason, or to explicitly reset the analyzer
to the initial state.
|
void |
loadState(Syntax.StateInfo stateInfo)
Load valid mark state into the analyzer.
|
TokenID |
nextToken()
Function that should be called externally to scan the text.
|
protected TokenID |
parseToken()
This is the core function of the analyzer. It returns either the token-id
or null to indicate that the end of the buffer was reached.
|
void |
relocate(char[] buffer,
int offset,
int len,
boolean lastBuffer,
int stopPosition)
Relocate scanning to another buffer.
|
void |
reset() |
void |
storeState(Syntax.StateInfo stateInfo)
Store state of this analyzer into given mark state.
|
String |
toString()
Syntax information as String
|
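The interplay of nextToken(), parseToken(), lastBuffer and getPreScan() summarized above can be illustrated with a toy analyzer. `ToyScanner` is a hypothetical, self-contained class that imitates the documented contract; it is not the real `Syntax` implementation, and its token names `WORD` and `OTHER` are invented for this sketch.

```java
// Hypothetical toy analyzer; it imitates the documented contract but is not the real Syntax class.
final class ToyScanner {
    char[] buffer;
    int offset;         // current scanning offset
    int tokenOffset;    // start of the token currently being scanned
    int tokenLength;    // length of the last returned token
    int stopOffset;     // offset at which scanning stops
    boolean lastBuffer; // true when no further buffer will follow

    void load(char[] buffer, int offset, int len, boolean lastBuffer) {
        this.buffer = buffer;
        this.offset = offset;
        this.tokenOffset = offset;
        this.tokenLength = 0;
        this.stopOffset = offset + len;
        this.lastBuffer = lastBuffer;
    }

    // Returns a token name, or null when the end of the buffer was reached.
    String parseToken() {
        if (offset >= stopOffset) {
            return null;
        }
        if (!Character.isLetter(buffer[offset])) {
            offset++;
            return "OTHER";
        }
        while (offset < stopOffset && Character.isLetter(buffer[offset])) {
            offset++;
        }
        // A word that touches the buffer end may continue in the next buffer,
        // so it is held back unless this is the last buffer.
        if (offset == stopOffset && !lastBuffer) {
            return null;
        }
        return "WORD";
    }

    // Entry point for callers: wraps parseToken() and maintains the token bounds.
    String nextToken() {
        String id = parseToken();
        if (id != null) {
            tokenLength = offset - tokenOffset;
            tokenOffset = offset;
        } else {
            tokenLength = 0;
        }
        return id;
    }

    // Characters scanned past the start of a token that was not returned yet.
    int getPreScan() {
        return offset - tokenOffset;
    }

    // Scans the whole text and reports the token names plus the remaining pre-scan.
    static String scan(String text, boolean lastBuffer) {
        ToyScanner scanner = new ToyScanner();
        scanner.load(text.toCharArray(), 0, text.length(), lastBuffer);
        StringBuilder ids = new StringBuilder();
        for (String id = scanner.nextToken(); id != null; id = scanner.nextToken()) {
            if (ids.length() > 0) {
                ids.append(' ');
            }
            ids.append(id);
        }
        return ids + " | preScan=" + scanner.getPreScan();
    }

    public static void main(String[] args) {
        System.out.println(scan("ab cd", false)); // WORD OTHER | preScan=2
        System.out.println(scan("ab cd", true));  // WORD OTHER WORD | preScan=0
    }
}
```

With lastBuffer set to false the trailing word is not returned and shows up as a pre-scan of two characters; with lastBuffer set to true it is returned as a regular token.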
public static final int EQUAL_STATE
public static final int DIFFERENT_STATE
public static final int INIT
protected int state
protected char[] buffer
protected int offset
protected int tokenOffset
protected int tokenLength
protected TokenContextPath tokenContextPath
protected boolean lastBuffer
protected int stopOffset
protected int stopPosition
protected TokenID supposedTokenID
public TokenID nextToken()
protected TokenID parseToken()
public void load(Syntax.StateInfo stateInfo, char[] buffer, int offset, int len, boolean lastBuffer, int stopPosition)
stateInfo - info about the state of the lexical analyzer to load.
It can be null to indicate that there is no previous state, so the analyzer
starts from its initial state.
buffer - buffer that will be scanned.
offset - offset of the first character that will be scanned.
len - length of the area to be scanned.
lastBuffer - whether this is the last buffer in the document. If so, all the tokens
will be returned, including the last, possibly incomplete one. If the data
come from the document, the simple rule for this parameter
is (doc.getLength() == stop-position), where stop-position
is the position corresponding to (offset + len) in the buffer
that comes from the document data.
stopPosition - position in the document that corresponds to the (offset + len) offset
in the provided buffer. It is meaningful only if the data in the buffer come from the document.
It helps in writing advanced analyzers that need to interact with document data
other than what the character buffer provides.
If there is no relation to the document data, the stopPosition parameter
must be set to -1, which means an invalid value.
The stop-position is passed (instead of the start-position) because it doesn't
change during the analyzer operation. It corresponds to stopOffset,
which also doesn't change during the analyzer operation, so any
buffer-offset can be converted to a position by computing
stopPosition + buffer-offset - stopOffset
where stopOffset is the instance variable that is assigned
offset + len in the body of relocate().
public void relocate(char[] buffer, int offset, int len, boolean lastBuffer, int stopPosition)
buffer - next buffer where the scan will continue.
offset - offset where the scan will continue.
It's not decremented by the current preScan.
len - length of the area to be scanned.
It's not extended by the current preScan.
lastBuffer - whether this is the last buffer in the document. If so, all the tokens
will be returned, including the last, possibly incomplete one. If the data
come from the document, the simple rule for this parameter
is (doc.getLength() == stop-position), where stop-position
is the position corresponding to (offset + len) in the buffer
that comes from the document data.
stopPosition - position in the document that corresponds to the (offset + len) offset
in the provided buffer. It is meaningful only if the data in the buffer come from the document.
It helps in writing advanced analyzers that need to interact with document data
other than what the character buffer provides.
If there is no relation to the document data, the stopPosition parameter
must be set to -1, which means an invalid value.
The stop-position is passed (instead of the start-position) because it doesn't
change during the analyzer operation. It corresponds to stopOffset,
which also doesn't change during the analyzer operation, so any
buffer-offset can be converted to a position by computing
stopPosition + buffer-offset - stopOffset
where stopOffset is the instance variable that is assigned
offset + len in the body of relocate().
public char[] getBuffer()
public int getOffset()
public int getTokenOffset()
public int getTokenLength()
public TokenContextPath getTokenContextPath()
public TokenID getSupposedTokenID()
public int getPreScan()
public void loadInitState()
public void reset()
public void loadState(Syntax.StateInfo stateInfo)
stateInfo - mark state to be loaded into the analyzer. It must be a non-null value.
public void storeState(Syntax.StateInfo stateInfo)
public int compareState(Syntax.StateInfo stateInfo)
public Syntax.StateInfo createStateInfo()
public String getStateName(int stateNumber)
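The offset-to-position arithmetic and the lastBuffer rule described for load() and relocate() can be sketched as follows. `BufferPositions` and its method names are hypothetical helpers written for this illustration; they are not part of the `Syntax` API.

```java
// Hypothetical helper illustrating the documented arithmetic; not part of the Syntax API.
final class BufferPositions {
    // stopOffset is offset + len, as assigned when a buffer is loaded or relocated.
    static int stopOffset(int offset, int len) {
        return offset + len;
    }

    // Converts an offset in the buffer to a document position:
    // stopPosition + buffer-offset - stopOffset
    static int toPosition(int stopPosition, int stopOffset, int bufferOffset) {
        return stopPosition + bufferOffset - stopOffset;
    }

    // Simple rule for lastBuffer when the data come from the document.
    static boolean isLastBuffer(int docLength, int stopPosition) {
        return docLength == stopPosition;
    }

    public static void main(String[] args) {
        // Assumed scenario: document positions 40..60 are copied into the buffer
        // starting at buffer offset 5, so len is 20 and stopPosition is 60.
        int stop = stopOffset(5, 20);                      // 25
        System.out.println(toPosition(60, stop, 5));       // 40
        System.out.println(toPosition(60, stop, 12));      // 47
        System.out.println(isLastBuffer(100, 60));         // false
        System.out.println(isLastBuffer(60, 60));          // true
    }
}
```

Because both stopPosition and stopOffset stay fixed while the analyzer advances, the same two values convert any buffer offset seen during the scan.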
Built on June 4 2024. | Copyright © 2017-2024 Apache Software Foundation. All Rights Reserved.