public final class TokenSequence<T extends TokenId> extends Object
A token sequence (TS) for the top-level language of a token hierarchy is obtained by TokenHierarchy.tokenSequence().
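Use of a token sequence is a two-step operation:

1. Position the TS before the token that should be retrieved first (or behind the desired token when iterating backwards) in one of the following ways:
   - TokenSequence.move(int) positions the TS before the token that either starts at the given offset or "contains" it.
   - TokenSequence.moveIndex(int) positions the TS before the n-th token in the underlying token list.
   - TokenSequence.moveStart() positions the TS before the first token.
   - TokenSequence.moveEnd() positions the TS behind the last token.
   - Doing nothing leaves a newly obtained TS positioned before the first token.

   After any of these, the TS is positioned between tokens, and TokenSequence.token() will return null to signal the between-tokens location.
2. Iterate forward or backward by using TokenSequence.moveNext() or TokenSequence.movePrevious(). If moveNext() or movePrevious() returned true, the TS is positioned over a concrete token retrievable by TokenSequence.token(). Its offset can be retrieved by TokenSequence.offset().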
An example of forward iteration through the tokens:
```java
TokenSequence<?> ts = tokenHierarchy.tokenSequence();
// Optionally position the TS first: ts.move(offset) or ts.moveIndex(index)
while (ts.moveNext()) {
    Token<?> t = ts.token();
    if (t.id() == ...) { ... }
    if (TokenUtilities.equals(t.text(), "mytext")) { ... }
    if (ts.offset() == ...) { ... }
}
```
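The positioning rules above can be tried out without the NetBeans runtime. The sketch below is a hypothetical, stdlib-only model (`CursorModel` is not part of the Lexer API): tokens are plain strings, and the cursor is either between tokens or over one, following the index rules described for index() and moveIndex(int). What the model does after a failed moveNext() or movePrevious() (parking at the end or at the start) is an assumption of the sketch, not documented behavior.

```java
import java.util.List;

/**
 * Hypothetical, stdlib-only model of the TokenSequence cursor; not the NetBeans class.
 * The cursor is either between tokens (before tokens.get(index)) or over tokens.get(index).
 */
final class CursorModel {
    private final List<String> tokens;
    private int index;    // token the cursor is over, or the token it is positioned before
    private boolean over; // false = between tokens, true = over a token

    CursorModel(List<String> tokens) {
        this.tokens = List.copyOf(tokens);
        // Fields start as index 0 / between tokens: a fresh sequence sits before Token[0].
    }

    int tokenCount() { return tokens.size(); }

    int index() { return index; }

    /** Like TokenSequence.token(): null while positioned between tokens. */
    String token() { return over ? tokens.get(index) : null; }

    /** Clamps to [0, tokenCount()] as documented for moveIndex(int). */
    void moveIndex(int i) {
        index = Math.max(0, Math.min(i, tokens.size()));
        over = false;
    }

    void moveStart() { moveIndex(0); }

    void moveEnd() { moveIndex(tokens.size()); }

    boolean moveNext() {
        int next = over ? index + 1 : index; // between: Token[index]; over: Token[index+1]
        if (next < tokens.size()) {
            index = next;
            over = true;
            return true;
        }
        moveEnd(); // model choice: park behind the last token
        return false;
    }

    boolean movePrevious() {
        if (index > 0) { // between or over: the previous token is Token[index-1]
            index--;
            over = true;
            return true;
        }
        moveStart(); // model choice: park before the first token
        return false;
    }
}
```

For example, after `moveEnd()` a single `movePrevious()` lands on the last token, which is the backward counterpart of the forward loop in the example above.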
A token sequence should be used by a single thread only. For token hierarchies over mutable input sources, both obtaining and using the token sequence must be done under a read lock of the input source.
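One way to picture the read-lock rule, together with the ConcurrentModificationException documented for the move methods, is a fail-fast modification counter, the same idea java.util collections use. This is an assumption for illustration only: `LockedInput` and its `Cursor` are hypothetical classes, and the NetBeans implementation is not claimed to work this way. The cursor is obtained and consumed while the read lock is held; a cursor kept across a write becomes invalid.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical mutable input source guarded by a read/write lock (not a NetBeans type). */
final class LockedInput {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> tokens = new ArrayList<>();
    private int modCount; // bumped on every modification

    LockedInput(List<String> initial) {
        tokens.addAll(initial);
    }

    /** Writers modify the input under the write lock. */
    void append(String token) {
        lock.writeLock().lock();
        try {
            tokens.add(token);
            modCount++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Fail-fast cursor: remembers the modCount it was created with. */
    final class Cursor {
        private final int expectedModCount = modCount;
        private int next;

        boolean isValid() {
            return expectedModCount == modCount;
        }

        /** Returns the next token, or null at the end; throws if the input changed. */
        String nextToken() {
            if (!isValid()) {
                throw new ConcurrentModificationException("input modified");
            }
            return next < tokens.size() ? tokens.get(next++) : null;
        }
    }

    /** Obtains and fully uses a cursor while holding the read lock. */
    String readAll() {
        lock.readLock().lock();
        try {
            Cursor c = new Cursor();
            StringBuilder sb = new StringBuilder();
            for (String t = c.nextToken(); t != null; t = c.nextToken()) {
                sb.append(t);
            }
            return sb.toString();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The design point is that `readAll()` never lets the cursor escape the locked region, so a writer cannot interleave between obtaining and using it.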
Modifier and Type | Method | Description
---|---|---
`boolean` | `createEmbedding(Language<?> embeddedLanguage, int startSkipLength, int endSkipLength)` | Create a language embedding without joining the embedded sections.
`boolean` | `createEmbedding(Language<?> embeddedLanguage, int startSkipLength, int endSkipLength, boolean joinSections)` | Create a language embedding described by the given parameters.
`TokenSequence<?>` | `embedded()` | Get the embedded token sequence if the token to which this token sequence is currently positioned has a language embedding.
`<ET extends TokenId> TokenSequence<ET>` | `embedded(Language<ET> embeddedLanguage)` | Get the embedded token sequence if the token to which this token sequence is currently positioned has a language embedding.
`TokenSequence<?>` | `embeddedJoined()` | Get an embedded token sequence that possibly joins multiple embeddings with the same language path (if the embeddings allow it, see `LanguageEmbedding.joinSections()`) into a single input text, which is then lexed as one continuous text.
`<ET extends TokenId> TokenSequence<ET>` | `embeddedJoined(Language<ET> embeddedLanguage)` | Get the embedded token sequence if the token to which this token sequence is currently positioned has a language embedding.
`int` | `index()` | Get the index of the token to which (or before which) this TS is currently positioned.
`boolean` | `isEmpty()` | Check whether this TS contains zero tokens.
`boolean` | `isValid()` | Check whether this token sequence is valid and can be iterated.
`Language<T>` | `language()` | Get the language describing the token ids used by tokens in this token sequence.
`LanguagePath` | `languagePath()` | Get the complete language path of the tokens contained in this token sequence.
`int` | `move(int offset)` | Move the token sequence to be positioned between tokens `index-1` and `index`, where `Token[index]` either starts at `offset` or "contains" it.
`void` | `moveEnd()` | Move the token sequence to be positioned behind the last token.
`int` | `moveIndex(int index)` | Position the token sequence between tokens `index-1` and `index`.
`boolean` | `moveNext()` | Move to the next token in this token sequence.
`boolean` | `movePrevious()` | Move to the previous token in this token sequence.
`void` | `moveStart()` | Move the token sequence to be positioned before the first token.
`int` | `offset()` | Get the offset of the current token in the underlying input.
`Token<T>` | `offsetToken()` | Similar to `TokenSequence.token()`, but always returns a non-flyweight token with the appropriate offset.
`boolean` | `removeEmbedding(Language<?> embeddedLanguage)` | Remove a previously created language embedding.
`TokenSequence<T>` | `subSequence(int startOffset)` | Create a sub-sequence of this token sequence that only returns tokens ending after the given offset.
`TokenSequence<T>` | `subSequence(int startOffset, int endOffset)` | Create a sub-sequence of this token sequence that only returns tokens between the given offsets.
`Token<T>` | `token()` | Get the token to which this token sequence is currently positioned, or `null` if the TS is positioned between tokens (for example, when `TokenSequence.moveNext()` or `TokenSequence.movePrevious()` has not been called yet).
`int` | `tokenCount()` | Return the total count of tokens in this sequence.
`String` | `toString()` |
public Language<T> language()
public LanguagePath languagePath()
public Token<T> token()
Get the token to which this token sequence is currently positioned, or null if the TS is positioned between tokens (for example, when TokenSequence.moveNext() or TokenSequence.movePrevious() has not been called yet).
The returned token instance may be flyweight (Token.isFlyweight() returns true), which means that its Token.offset(TokenHierarchy) will return -1. To find the correct offset, use TokenSequence.offset(). If a regular non-flyweight token is needed, TokenSequence.offsetToken() may be used.
The lifetime of the returned token instance may be limited for mutable inputs. The token instance should not be held across modifications of the input source.
Returns: the token to which this TS is currently positioned, or null if the TS is not positioned over any token, which is the case right after TS creation or after TokenSequence.move(int) or TokenSequence.moveIndex(int).
See also: TokenSequence.offsetToken()
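The flyweight behavior can be sketched with a hypothetical token type (`Tok` and `FlyweightSeq` are illustrative names; neither is the NetBeans `Token` class). In this model all whitespace tokens share one instance whose own offset is -1, while the sequence keeps the real start offsets, so the sequence's offset() stays correct and offsetToken() swaps in a positioned copy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token: offset -1 marks a flyweight (shared, position-less) instance. */
record Tok(String text, int offset) {
    boolean isFlyweight() { return offset < 0; }
}

/** Hypothetical sequence that reuses one flyweight instance for every " " token. */
final class FlyweightSeq {
    static final Tok SPACE = new Tok(" ", -1);
    private final List<Tok> tokens = new ArrayList<>();
    private final List<Integer> starts = new ArrayList<>(); // real offsets, kept by the sequence
    private int index = -1;

    /** Splits text into words and single-space tokens, e.g. "int x" -> "int", " ", "x". */
    FlyweightSeq(String text) {
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            if (i == text.length() || text.charAt(i) == ' ') {
                if (i > start) add(new Tok(text.substring(start, i), start), start);
                if (i < text.length()) add(SPACE, i);
                start = i + 1;
            }
        }
    }

    private void add(Tok t, int start) {
        tokens.add(t);
        starts.add(start);
    }

    boolean moveNext() {
        if (index + 1 >= tokens.size()) return false;
        index++;
        return true;
    }

    Tok token() { return tokens.get(index); }

    /** The sequence knows the offset even when the token itself does not. */
    int offset() { return starts.get(index); }

    /** Replaces a flyweight with a positioned copy; token() returns that copy afterwards. */
    Tok offsetToken() {
        Tok t = tokens.get(index);
        if (t.isFlyweight()) {
            t = new Tok(t.text(), starts.get(index));
            tokens.set(index, t);
        }
        return t;
    }
}
```

In "int x" the space token is the shared `SPACE` instance, so its own offset is -1, while `offset()` reports 3; after `offsetToken()`, `token()` returns a copy that carries offset 3 and the shared instance is left untouched.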
public Token<T> offsetToken()
Similar to TokenSequence.token(), but always returns a non-flyweight token with the appropriate offset.
If the current token is flyweight, it is replaced by a corresponding non-flyweight token, and subsequent calls to TokenSequence.token() will also return this non-flyweight token.
This method may be handy if the token instance is referenced in a standalone way (e.g. in an expression node of a parse tree) and the appropriate offset must later be obtained from the token itself, when a token sequence is no longer available.
Throws: IllegalStateException - if TokenSequence.token() returns null.

public int offset()
Get the offset of the current token in the underlying input.
Throws: IllegalStateException - if TokenSequence.token() returns null.

public int index()
Get the index of the token to which (or before which) this TS is currently positioned.
Initially, or after TokenSequence.move(int) or TokenSequence.moveIndex(int), the token sequence is positioned between tokens:

```
        Token[0]   Token[1]   ...   Token[n]
       ^          ^                ^
Index: 0          1                n
```

After TokenSequence.moveNext() or TokenSequence.movePrevious(), the token sequence is positioned over one of the actual tokens:

```
        Token[0]   Token[1]   ...   Token[n]
           ^          ^                ^
Index:     0          1                n
```
public TokenSequence<?> embedded()
Get the embedded token sequence if the token to which this token sequence is currently positioned has a language embedding.
If there is a custom embedding created by TokenSequence.createEmbedding(Language,int,int), it will be returned instead of the default embedding (the one created by LanguageHierarchy.embedding() or LanguageProvider).
Throws: IllegalStateException - if TokenSequence.token() returns null.

public <ET extends TokenId> TokenSequence<ET> embedded(Language<ET> embeddedLanguage)
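Get the embedded token sequence if the token to which this token sequence is currently positioned has a language embedding.
Throws: IllegalStateException - if TokenSequence.token() returns null.

public TokenSequence<?> embeddedJoined()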
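Get an embedded token sequence that possibly joins multiple embeddings with the same language path (if the embeddings allow it, see LanguageEmbedding.joinSections()) into a single input text, which is then lexed as one continuous text.
See also: TokenSequence.embedded()

public <ET extends TokenId> TokenSequence<ET> embeddedJoined(Language<ET> embeddedLanguage)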
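Get the embedded token sequence if the token to which this token sequence is currently positioned has a language embedding.
Throws: IllegalStateException - if TokenSequence.token() returns null.

public boolean createEmbedding(Language<?> embeddedLanguage, int startSkipLength, int endSkipLength)
Create a language embedding without joining the embedded sections.
Parameters:
startSkipLength - number of characters to be skipped at the token's beginning.
endSkipLength - number of characters to be skipped at the token's end.
Throws: IllegalStateException - if TokenSequence.token() returns null.
See also: TokenSequence.createEmbedding(Language, int, int, boolean)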
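As a sketch of what the skip lengths cut away, the hypothetical helper below returns the part of a token's text that an embedding would lex. `EmbeddingSection` is not a Lexer API class, and its argument validation (both lengths non-negative and fitting inside the token) is an assumption of this sketch; the method documentation only states what the lengths mean.

```java
/** Hypothetical helper: the part of a token's text that an embedding would lex. */
final class EmbeddingSection {
    private EmbeddingSection() {}

    /**
     * Drops startSkipLength leading and endSkipLength trailing characters; in the
     * model, the dropped characters get no embedded tokens.
     */
    static String embeddedText(String tokenText, int startSkipLength, int endSkipLength) {
        // Assumption of this sketch: both lengths must be >= 0 and fit inside the token.
        if (startSkipLength < 0 || endSkipLength < 0
                || startSkipLength + endSkipLength > tokenText.length()) {
            throw new IllegalArgumentException("skip lengths do not fit the token");
        }
        return tokenText.substring(startSkipLength, tokenText.length() - endSkipLength);
    }
}
```

For example, skipping 1 and 1 on a string-literal token `"a\nb"` leaves `a\nb` (the quotes are not lexed), and skipping 2 and 2 on a JSP scriptlet token `<% x++; %>` leaves ` x++; `.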
public boolean createEmbedding(Language<?> embeddedLanguage, int startSkipLength, int endSkipLength, boolean joinSections)
Create a language embedding described by the given parameters.
Parameters:
embeddedLanguage - non-null embedded language.
startSkipLength - >=0 number of characters in the initial part of the token (for which the language embedding is defined) that should be excluded from the embedded section. The excluded characters will not be lexed and no tokens will be created for them.
endSkipLength - >=0 number of characters at the end of the token (for which the language embedding is defined) that should be excluded from the embedded section. The excluded characters will not be lexed and no tokens will be created for them.
joinSections - whether sections with this embedding should be joined across the input source or whether they should stay separate. For example, for HTML sections embedded in JSP this flag should be true, since an HTML comment may span a JSP scriptlet:

```
<!-- HTML comment start
    <% System.out.println("Hello"); %>
still in HTML comment -->
```
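Why joining matters for the JSP example can be shown with a deliberately naive scanner that only counts complete `<!-- ... -->` comments. `JoinDemo` is hypothetical and is not a lexer; it just contrasts scanning the two HTML sections around the scriptlet separately with scanning their concatenation as one continuous text.

```java
import java.util.List;

/** Hypothetical illustration of the effect of joinSections (string search, not a lexer). */
final class JoinDemo {
    private JoinDemo() {}

    /** Counts complete "<!-- ... -->" comments in one piece of text. */
    static int completeComments(String text) {
        int count = 0;
        int from = 0;
        while (true) {
            int open = text.indexOf("<!--", from);
            if (open < 0) return count;
            int close = text.indexOf("-->", open + 4);
            if (close < 0) return count; // comment not terminated within this text
            count++;
            from = close + 3;
        }
    }

    /** Sections scanned separately: a comment spanning sections is never complete. */
    static int separate(List<String> sections) {
        return sections.stream().mapToInt(JoinDemo::completeComments).sum();
    }

    /** Sections joined: concatenated and scanned as one continuous text. */
    static int joined(List<String> sections) {
        return completeComments(String.join("", sections));
    }
}
```

With the two HTML sections of the example, `separate` finds no complete comment while `joined` finds exactly one.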
Throws: IllegalStateException - if TokenSequence.token() returns null.

public boolean removeEmbedding(Language<?> embeddedLanguage)
Remove a previously created language embedding.
public boolean moveNext()
Move to the next token in this token sequence.
The next token does not necessarily start at the offset where the previous token ends (there may be gaps between tokens caused by token filtering). Use TokenSequence.offset() to retrieve the token's offset.
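To see why summing token lengths goes wrong, the hypothetical sketch below lays out raw token texts contiguously, then drops comment tokens the way a filter might, keeping each remaining token's real start offset. `PosToken` and `GapDemo` are illustrative names, not Lexer API types, and dropping comments is just an example of filtering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical positioned token: its text plus the real start offset. */
record PosToken(String text, int start) {}

final class GapDemo {
    private GapDemo() {}

    /** Lays out raw token texts contiguously, then drops the filtered ones. */
    static List<PosToken> filter(List<String> raw, Predicate<String> filtered) {
        List<PosToken> kept = new ArrayList<>();
        int start = 0;
        for (String text : raw) {
            if (!filtered.test(text)) kept.add(new PosToken(text, start));
            start += text.length(); // offsets still advance over filtered text
        }
        return kept;
    }

    /** What a client gets by adding up lengths itself: wrong after a gap. */
    static int summedOffset(List<PosToken> kept, int index) {
        int offset = 0;
        for (int i = 0; i < index; i++) offset += kept.get(i).text().length();
        return offset;
    }
}
```

For the raw tokens `int`, ` `, `/*c*/`, ` `, `x` with the comment filtered out, `x` really starts at offset 10, but summing the lengths of the kept tokens before it gives 5.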
Throws: ConcurrentModificationException - if this token sequence is no longer valid because of a modification of the underlying mutable input source.

public boolean movePrevious()
Move to the previous token in this token sequence.
The previous token does not necessarily end at the offset where the following token starts (there may be gaps between tokens caused by token filtering). Use TokenSequence.offset() to retrieve the token's offset.
Throws: ConcurrentModificationException - if this token sequence is no longer valid because of a modification of the underlying mutable input source.

public int moveIndex(int index)
Position the token sequence between tokens index-1 and index:

```
        Token[0] ... Token[index-1] Token[index] ...
       ^            ^              ^
Index: 0            index-1        index
```

A subsequent TokenSequence.moveNext() or TokenSequence.movePrevious() is needed to fetch a concrete token in the desired direction: TokenSequence.moveNext() will position the TS over Token[index], and TokenSequence.movePrevious() will position it over Token[index-1], so that TokenSequence.token() != null.
Parameters:
index - index of the token to which this sequence should be positioned. If index >= TokenSequence.tokenCount(), the TS will be positioned to TokenSequence.tokenCount(). If index < 0, the TS will be positioned to index 0.
Throws: ConcurrentModificationException - if this token sequence is no longer valid because of a modification of the underlying mutable input source.

public void moveStart()
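Move the token sequence to be positioned before the first token. This is equivalent to moveIndex(0).

public void moveEnd()
Move the token sequence to be positioned behind the last token. This is equivalent to moveIndex(tokenCount()).

public int move(int offset)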
Move the token sequence to be positioned between tokens index-1 and index, where Token[index] either starts at offset or "contains" the offset.

```
       +----------+-----+----------------+--------------+------
       | Token[0] | ... | Token[index-1] | Token[index] | ...
       | "public" | ... | "static"       | "int"        | ...
       +----------+-----+----------------+--------------+------
       ^                ^                ^
Index: 0                index-1          index
Offset:                                     ^^^ (if offset points to 'i', 'n' or 't')
```

A subsequent TokenSequence.moveNext() or TokenSequence.movePrevious() is needed to fetch a concrete token.
If the offset is too big, the token sequence will be positioned behind the last token.
If token filtering is used, there may be gaps that are not covered by any tokens; if the offset lies in such a gap, the token sequence will be positioned before the token that precedes the gap.
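The rules for move(int) can be condensed into a small index-selection function. The sketch is hypothetical: `MoveModel` describes tokens only by start offsets and lengths, uses a binary search (a choice of this sketch, not necessarily the real implementation), and returns the selected index rather than the int that TokenSequence.move(int) returns. Positioning to index 0 for an offset before the first token is also a model choice.

```java
import java.util.Arrays;

/**
 * Hypothetical model of how move(int offset) picks the index. Tokens are given by
 * sorted start offsets and lengths; gaps between them stand for filtered text.
 */
final class MoveModel {
    private final int[] starts;
    private final int[] lengths;

    MoveModel(int[] starts, int[] lengths) {
        this.starts = starts.clone();
        this.lengths = lengths.clone();
    }

    /** Returns the index before which the sequence would be positioned. */
    int indexFor(int offset) {
        int n = starts.length;
        if (n == 0) return 0;
        // Last token whose start is <= offset (binarySearch returns -(insertion point) - 1).
        int pos = Arrays.binarySearch(starts, offset);
        int i = pos >= 0 ? pos : -pos - 2;
        if (i < 0) return 0;                           // model choice: before the first token
        if (offset < starts[i] + lengths[i]) return i; // Token[i] starts at or contains offset
        if (i == n - 1) return n;                      // offset too big: behind the last token
        return i;                                      // in a gap: before the token preceding it
    }
}
```

With tokens "public" (0..5), " " (6), "static" (7..12), a filtered two-character gap (13..14) and "int" (15..17), offsets 15 to 17 select index 3, offset 13 selects index 2, and offset 18 selects index 4, behind the last token.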
Parameters:
offset - absolute offset to which the token sequence should be moved.
Throws: ConcurrentModificationException - if this token sequence is no longer valid because of a modification of the underlying mutable input source.

public boolean isEmpty()
Check whether this TS contains zero tokens, i.e. whether tokenCount() == 0.
See also: TokenSequence.tokenCount()
public int tokenCount()
Return the total count of tokens in this sequence.
public TokenSequence<T> subSequence(int startOffset)
Create a sub-sequence of this token sequence that only returns tokens ending after the given offset.
Parameters:
startOffset - only tokens satisfying tokenStartOffset + tokenLength > startOffset will be present in the returned sequence.

public TokenSequence<T> subSequence(int startOffset, int endOffset)
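Create a sub-sequence of this token sequence that only returns tokens between the given offsets.
Parameters:
startOffset - only tokens satisfying tokenStartOffset + tokenLength > startOffset will be present in the returned sequence.
endOffset - >=startOffset; only tokens satisfying tokenStartOffset < endOffset will be present in the returned sequence.

public boolean isValid()
Check whether this token sequence is valid and can be iterated.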