Class TruncateTokenFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.miscellaneous.TruncateTokenFilter
All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
A token filter for truncating terms to a specific length (number of code points). Fixed-prefix
truncation, as a stemming method, produces good results for the Turkish language. It is
reported that F5, using the first 5 characters, produced the best results in Information Retrieval on Turkish Texts.
Since Lucene 10.5, the filter is able to correctly handle codepoints and truncates after the
given number of codepoints, no longer producing incomplete surrogate pairs. Use the modern
factory method truncateAfterCodePoints(TokenStream, int) to enable this mode. Legacy
behaviour is still available with truncateAfterChars(TokenStream, int).
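The difference between the two modes can be illustrated on plain strings. The following is a minimal sketch using only the Java standard library; it is not Lucene's implementation, and the class TruncateSketch and the helpers prefixByChars and prefixByCodePoints are hypothetical names introduced for illustration.

```java
public class TruncateSketch {

    // Hypothetical helper: keep at most nChars UTF-16 code units.
    // This may cut a surrogate pair in half.
    public static String prefixByChars(String term, int nChars) {
        return term.length() <= nChars ? term : term.substring(0, nChars);
    }

    // Hypothetical helper: keep at most nCodePoints Unicode code points.
    // The cut always falls on a code point boundary.
    public static String prefixByCodePoints(String term, int nCodePoints) {
        if (term.codePointCount(0, term.length()) <= nCodePoints) {
            return term;
        }
        int end = term.offsetByCodePoints(0, nCodePoints);
        return term.substring(0, end);
    }

    public static void main(String[] args) {
        // "ab" + U+1F600 + "cd": 5 code points, but 6 Java chars.
        String term = "ab\uD83D\uDE00cd";

        String byChars = prefixByChars(term, 3);
        String byCodePoints = prefixByCodePoints(term, 3);

        // The char-based prefix ends in a lone high surrogate.
        System.out.println(Character.isHighSurrogate(byChars.charAt(byChars.length() - 1)));
        // The code-point-based prefix keeps the whole pair.
        System.out.println(byCodePoints.codePointCount(0, byCodePoints.length()));
        // F5-style truncation of a Turkish word without supplementary characters.
        System.out.println(prefixByCodePoints("kitaplar", 5));
    }
}
```

For terms that contain no supplementary characters, such as most Turkish words, both helpers return the same prefix; they differ only when a surrogate pair straddles the cut position.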
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
Constructors
Method Summary
Modifier and Type, Method, Description
final boolean incrementToken()
static TruncateTokenFilter truncateAfterChars(TokenStream input, int nChars): Returns a filter with a prefix of nChars Java characters.
static TruncateTokenFilter truncateAfterCodePoints(TokenStream input, int nCodePoints): Returns a filter with a prefix of nCodePoints.
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Constructor Details
TruncateTokenFilter
Deprecated. This constructor is deprecated; use truncateAfterChars(TokenStream, int) for backwards compatibility, or truncateAfterCodePoints(TokenStream, int) to be Unicode conformant.
Instantiates a filter with a prefix of nChars Java characters. This may split surrogate pairs.
Method Details
truncateAfterCodePoints
Returns a filter with a prefix of nCodePoints.
truncateAfterChars
Returns a filter with a prefix of nChars Java characters. This may split surrogate pairs.
incrementToken
Specified by: incrementToken in class TokenStream
Throws: IOException