All Classes
| Class |
Description |
| AbstractEncoder |
Base class for payload encoders.
|
| AbstractWordsFileFilterFactory |
Abstract parent class for analysis factories that accept a stopwords file as input.
|
| AffixedWord |
An object representing the analysis result of a simple (non-compound) word
|
| AffixedWord.Affix |
An object representing a prefix or a suffix applied to a word stem
|
| Among |
Internal class used by Snowball stemmers
|
| ApostropheFilter |
Strips all characters after an apostrophe (including the apostrophe itself).
|
| ApostropheFilterFactory |
|
| ArabicAnalyzer |
|
| ArabicNormalizationFilter |
|
| ArabicNormalizationFilterFactory |
|
| ArabicNormalizer |
Normalizer for Arabic.
|
| ArabicStemFilter |
|
| ArabicStemFilterFactory |
|
| ArabicStemmer |
Stemmer for Arabic.
|
| ArabicStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| ArmenianAnalyzer |
|
| ArmenianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| ASCIIFoldingFilter |
This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the
first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one
exists.
|
| ASCIIFoldingFilterFactory |
|
| BaseCharFilter |
|
| BasqueAnalyzer |
|
| BasqueStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| BengaliAnalyzer |
Analyzer for Bengali.
|
| BengaliNormalizationFilter |
|
| BengaliNormalizationFilterFactory |
|
| BengaliNormalizer |
Normalizer for Bengali.
|
| BengaliStemFilter |
|
| BengaliStemFilterFactory |
|
| BengaliStemmer |
Stemmer for Bengali.
|
| BrazilianAnalyzer |
Analyzer for Brazilian Portuguese language.
|
| BrazilianStemFilter |
|
| BrazilianStemFilterFactory |
|
| BrazilianStemmer |
A stemmer for Brazilian Portuguese words.
|
| BulgarianAnalyzer |
|
| BulgarianStemFilter |
|
| BulgarianStemFilterFactory |
|
| BulgarianStemmer |
Light Stemmer for Bulgarian.
|
| ByteVector |
This class implements a simple byte vector with access to the underlying array.
|
| CapitalizationFilter |
A filter to apply normal capitalization rules to Tokens.
|
| CapitalizationFilterFactory |
|
| CatalanAnalyzer |
|
| CatalanStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| CharArrayIterator |
|
| CharTokenizer |
An abstract base class for simple, character-oriented tokenizers.
|
| CharVector |
This class implements a simple char vector with access to the underlying array.
|
| CJKAnalyzer |
|
| CJKBigramFilter |
Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
|
| CJKBigramFilterFactory |
|
| CJKWidthCharFilter |
A CharFilter that normalizes CJK width differences:
Folds fullwidth ASCII variants into the equivalent basic latin
Folds halfwidth Katakana variants into the equivalent kana
|
| CJKWidthCharFilterFactory |
|
| CJKWidthFilter |
A TokenFilter that normalizes CJK width differences:
Folds fullwidth ASCII variants into the equivalent basic latin
Folds halfwidth Katakana variants into the equivalent kana
|
| CJKWidthFilterFactory |
|
| ClassicAnalyzer |
|
| ClassicFilter |
|
| ClassicFilterFactory |
|
| ClassicTokenizer |
A grammar-based tokenizer constructed with JFlex
|
| ClassicTokenizerFactory |
|
| CodepointCountFilter |
Removes words that are too long or too short from the stream.
|
| CodepointCountFilterFactory |
|
| CollatedTermAttributeImpl |
Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode
collation key instead of as UTF-8 bytes.
|
| CollationAttributeFactory |
Converts each token into its CollationKey, and then encodes the bytes as an
index term.
|
| CollationDocValuesField |
|
| CollationKeyAnalyzer |
|
| CommonGramsFilter |
Construct bigrams for frequently occurring terms while indexing.
|
| CommonGramsFilterFactory |
|
| CommonGramsQueryFilter |
Wrap a CommonGramsFilter optimizing phrase queries by only returning single words when they are
not a member of a bigram.
|
| CommonGramsQueryFilterFactory |
|
| CompoundWordTokenFilterBase |
Base class for decomposition token filters.
|
| ConcatenateGraphFilter |
Concatenates/Joins every incoming token with a separator into one output token for every path
through the token stream (which is a graph).
|
| ConcatenateGraphFilter.BytesRefBuilderTermAttribute |
Attribute providing access to the term builder and UTF-16 conversion
|
| ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl |
|
| ConcatenateGraphFilterFactory |
|
| ConcatenatingTokenStream |
A TokenStream that takes an array of input TokenStreams as sources, and concatenates them
together.
|
| ConditionalTokenFilter |
Allows skipping TokenFilters based on the current set of attributes.
|
| ConditionalTokenFilterFactory |
|
| CSVUtil |
Utility class for parsing CSV text
|
| CustomAnalyzer |
A general-purpose Analyzer that can be created with a builder-style API.
|
| CustomAnalyzer.Builder |
|
| CustomAnalyzer.ConditionBuilder |
|
| CzechAnalyzer |
|
| CzechStemFilter |
|
| CzechStemFilterFactory |
|
| CzechStemmer |
Light Stemmer for Czech.
|
| DanishAnalyzer |
|
| DanishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| DateRecognizerFilter |
Filters all tokens that cannot be parsed to a date, using the provided DateFormat.
|
| DateRecognizerFilterFactory |
|
| DecimalDigitFilter |
Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits
(0-9).
|
| DecimalDigitFilterFactory |
|
| DelimitedBoostTokenFilter |
Characters before the delimiter are the "token", those after are the boost.
|
| DelimitedBoostTokenFilterFactory |
|
| DelimitedPayloadTokenFilter |
Characters before the delimiter are the "token", those after are the payload.
|
| DelimitedPayloadTokenFilterFactory |
|
| DelimitedTermFrequencyTokenFilter |
Characters before the delimiter are the "token", the textual integer after is the term frequency.
|
| DelimitedTermFrequencyTokenFilterFactory |
|
| DictEntries |
An object representing homonym dictionary entries.
|
| DictEntry |
An object representing a *.dic file entry with its word, flags, and morphological data.
|
| Dictionary |
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
|
| DictionaryCompoundWordTokenFilter |
A TokenFilter that decomposes compound words found in many
Germanic languages.
|
| DictionaryCompoundWordTokenFilterFactory |
|
| Dl4jModelReader |
Dl4jModelReader reads the file generated by the Deeplearning4j library and provides a
Word2VecModel with normalized vectors.
|
| DropIfFlaggedFilter |
Allows Tokens with a given combination of flags to be dropped.
|
| DropIfFlaggedFilterFactory |
Provides a filter that will drop tokens matching a set of flags.
|
| DutchAnalyzer |
|
| DutchStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| EdgeNGramFilterFactory |
|
| EdgeNGramTokenFilter |
Tokenizes the given token into n-grams of given size(s).
|
| EdgeNGramTokenizer |
Tokenizes the input from an edge into n-grams of given size(s).
|
| EdgeNGramTokenizerFactory |
|
| ElisionFilter |
|
| ElisionFilterFactory |
|
| EmptyTokenStream |
An always exhausted token stream.
|
| EnglishAnalyzer |
|
| EnglishMinimalStemFilter |
|
| EnglishMinimalStemFilterFactory |
|
| EnglishMinimalStemmer |
Minimal plural stemmer for English.
|
| EnglishPossessiveFilter |
TokenFilter that removes possessives (trailing 's) from words.
|
| EnglishPossessiveFilterFactory |
|
| EnglishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| EntrySuggestion |
|
| EstonianAnalyzer |
|
| EstonianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| FilesystemResourceLoader |
Simple ResourceLoader that opens resource files from the local file system, optionally
resolving against a base directory.
|
| FingerprintFilter |
Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of
input tokens.
|
| FingerprintFilterFactory |
|
| FinnishAnalyzer |
|
| FinnishLightStemFilter |
|
| FinnishLightStemFilterFactory |
|
| FinnishLightStemmer |
Light Stemmer for Finnish.
|
| FinnishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| FixBrokenOffsetsFilter |
Deprecated.
|
| FixBrokenOffsetsFilterFactory |
Deprecated.
|
| FixedShingleFilter |
A FixedShingleFilter constructs shingles (token n-grams) from a token stream.
|
| FixedShingleFilterFactory |
|
| FlattenGraphFilter |
Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat
form so that all nodes form a single linear chain with no side paths.
|
| FlattenGraphFilterFactory |
|
| FloatEncoder |
Encode a character array Float as a BytesRef.
|
| FragmentChecker |
An oracle for quickly checking that a specific part of a word can never be a valid word.
|
| FrenchAnalyzer |
|
| FrenchLightStemFilter |
|
| FrenchLightStemFilterFactory |
|
| FrenchLightStemmer |
Light Stemmer for French.
|
| FrenchMinimalStemFilter |
|
| FrenchMinimalStemFilterFactory |
|
| FrenchMinimalStemmer |
Minimal Stemmer for French.
|
| FrenchStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| GalicianAnalyzer |
|
| GalicianMinimalStemFilter |
|
| GalicianMinimalStemFilterFactory |
|
| GalicianMinimalStemmer |
Minimal Stemmer for Galician
|
| GalicianStemFilter |
|
| GalicianStemFilterFactory |
|
| GalicianStemmer |
Galician stemmer implementing "Regras do lematizador para o galego".
|
| German2Stemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| GermanAnalyzer |
|
| GermanLightStemFilter |
|
| GermanLightStemFilterFactory |
|
| GermanLightStemmer |
Light Stemmer for German.
|
| GermanMinimalStemFilter |
|
| GermanMinimalStemFilterFactory |
|
| GermanMinimalStemmer |
Minimal Stemmer for German.
|
| GermanNormalizationFilter |
|
| GermanNormalizationFilterFactory |
|
| GermanStemFilter |
|
| GermanStemFilterFactory |
|
| GermanStemmer |
A stemmer for German words.
|
| GermanStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| GreekAnalyzer |
|
| GreekLowerCaseFilter |
Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma
to sigma.
|
| GreekLowerCaseFilterFactory |
|
| GreekStemFilter |
|
| GreekStemFilterFactory |
|
| GreekStemmer |
A stemmer for Greek words, according to: Development of a Stemmer for the Greek Language.
Georgios Ntais
|
| GreekStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| HindiAnalyzer |
Analyzer for Hindi.
|
| HindiNormalizationFilter |
|
| HindiNormalizationFilterFactory |
|
| HindiNormalizer |
Normalizer for Hindi.
|
| HindiStemFilter |
|
| HindiStemFilterFactory |
|
| HindiStemmer |
Light Stemmer for Hindi.
|
| HindiStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| HTMLStripCharFilter |
A CharFilter that wraps another Reader and attempts to strip out HTML constructs.
|
| HTMLStripCharFilterFactory |
|
| HungarianAnalyzer |
|
| HungarianLightStemFilter |
|
| HungarianLightStemFilterFactory |
|
| HungarianLightStemmer |
Light Stemmer for Hungarian.
|
| HungarianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| Hunspell |
A spell checker based on Hunspell dictionaries.
|
| HunspellStemFilter |
TokenFilter that uses hunspell affix rules and words to stem tokens.
|
| HunspellStemFilterFactory |
|
| Hyphen |
This class represents a hyphen.
|
| HyphenatedWordsFilter |
When the plain text is extracted from documents, we will often have many words hyphenated and
broken into two lines.
|
| HyphenatedWordsFilterFactory |
|
| Hyphenation |
This class represents a hyphenated word.
|
| HyphenationCompoundWordTokenFilter |
A TokenFilter that decomposes compound words found in many
Germanic languages.
|
| HyphenationCompoundWordTokenFilterFactory |
|
| HyphenationTree |
This tree structure stores the hyphenation patterns in an efficient way for fast lookup.
|
| IdentityEncoder |
Does nothing other than convert the char array to a byte array using the specified encoding.
|
| IndicNormalizationFilter |
|
| IndicNormalizationFilterFactory |
|
| IndicNormalizer |
Normalizes the Unicode representation of text in Indian languages.
|
| IndonesianAnalyzer |
Analyzer for Indonesian (Bahasa)
|
| IndonesianStemFilter |
|
| IndonesianStemFilterFactory |
|
| IndonesianStemmer |
Stemmer for Indonesian.
|
| IndonesianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| IntegerEncoder |
Encode a character array Integer as a BytesRef.
|
| IrishAnalyzer |
|
| IrishLowerCaseFilter |
Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair'
should become 'n-athair')
|
| IrishLowerCaseFilterFactory |
|
| IrishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| ItalianAnalyzer |
|
| ItalianLightStemFilter |
|
| ItalianLightStemFilterFactory |
|
| ItalianLightStemmer |
Light Stemmer for Italian.
|
| ItalianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| KeepWordFilter |
A TokenFilter that only keeps tokens with text contained in the required words.
|
| KeepWordFilterFactory |
|
| KeywordAnalyzer |
"Tokenizes" the entire stream as a single token.
|
| KeywordMarkerFilter |
|
| KeywordMarkerFilterFactory |
|
| KeywordRepeatFilter |
This TokenFilter emits each incoming token twice, once as keyword and once as non-keyword; in other
words, once with KeywordAttribute.setKeyword(boolean) set to true and once
set to false.
|
| KeywordRepeatFilterFactory |
|
| KeywordTokenizer |
Emits the entire input as a single token.
|
| KeywordTokenizerFactory |
|
| KpStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| KStemFilter |
A high-performance kstem filter for English.
|
| KStemFilterFactory |
|
| KStemmer |
This class implements the Kstem algorithm
|
| LatvianAnalyzer |
|
| LatvianStemFilter |
|
| LatvianStemFilterFactory |
|
| LatvianStemmer |
Light stemmer for Latvian.
|
| LengthFilter |
Removes words that are too long or too short from the stream.
|
| LengthFilterFactory |
|
| LetterTokenizer |
A LetterTokenizer is a tokenizer that divides text at non-letters.
|
| LetterTokenizerFactory |
|
| LimitTokenCountAnalyzer |
This Analyzer limits the number of tokens while indexing.
|
| LimitTokenCountFilter |
This TokenFilter limits the number of tokens while indexing.
|
| LimitTokenCountFilterFactory |
|
| LimitTokenOffsetFilter |
Lets tokens pass through as long as their start offset is <= a configured limit; the first token
beyond the limit does not pass and ends the stream.
|
| LimitTokenOffsetFilterFactory |
|
| LimitTokenPositionFilter |
This TokenFilter limits its emitted tokens to those with positions that are not greater than the
configured limit.
|
| LimitTokenPositionFilterFactory |
|
| LithuanianAnalyzer |
|
| LithuanianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| LovinsStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| LowerCaseFilter |
Normalizes token text to lower case.
|
| LowerCaseFilterFactory |
|
| MappingCharFilter |
Simplistic CharFilter that applies the mappings contained in a NormalizeCharMap
to the character stream, and corrects the resulting changes to the offsets.
|
| MappingCharFilterFactory |
|
| MinHashFilter |
Generate min hash tokens from an incoming stream of tokens.
|
| MinHashFilterFactory |
|
| NepaliAnalyzer |
Analyzer for Nepali.
|
| NepaliStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| NGramFilterFactory |
|
| NGramFragmentChecker |
A FragmentChecker based on all character n-grams possible in a certain language, keeping
them in a relatively memory-efficient, but probabilistic data structure.
|
| NGramFragmentChecker.NGramConsumer |
A callback for n-gram ranges in words
|
| NGramTokenFilter |
Tokenizes the input into n-grams of the given size(s).
|
| NGramTokenizer |
Tokenizes the input into n-grams of the given size(s).
|
| NGramTokenizerFactory |
|
| NormalizeCharMap |
|
| NormalizeCharMap.Builder |
Builds a NormalizeCharMap.
|
| NorwegianAnalyzer |
|
| NorwegianLightStemFilter |
|
| NorwegianLightStemFilterFactory |
|
| NorwegianLightStemmer |
Light Stemmer for Norwegian.
|
| NorwegianMinimalStemFilter |
|
| NorwegianMinimalStemFilterFactory |
|
| NorwegianMinimalStemmer |
Minimal Stemmer for Norwegian Bokmål (no-nb) and Nynorsk (no-nn)
|
| NorwegianNormalizationFilter |
This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded
variants (ae, oe, aa) by transforming them to åÅæÆøØ.
|
| NorwegianNormalizationFilterFactory |
|
| NorwegianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| NumericPayloadTokenFilter |
|
| NumericPayloadTokenFilterFactory |
|
| OpenStringBuilder |
A StringBuilder that allows one to access the array.
|
| PathHierarchyTokenizer |
Tokenizer for path-like hierarchies.
|
| PathHierarchyTokenizerFactory |
|
| PatternCaptureGroupFilterFactory |
|
| PatternCaptureGroupTokenFilter |
CaptureGroup uses Java regexes to emit multiple tokens - one for each capture group in one or
more patterns.
|
| PatternConsumer |
This interface is used to connect the XML pattern file parser to the hyphenation tree.
|
| PatternKeywordMarkerFilter |
|
| PatternParser |
A SAX document handler to read and parse hyphenation patterns from an XML file.
|
| PatternReplaceCharFilter |
CharFilter that uses a regular expression for the target of replace string.
|
| PatternReplaceCharFilterFactory |
|
| PatternReplaceFilter |
A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences
with the specified replacement string.
|
| PatternReplaceFilterFactory |
|
| PatternTokenizer |
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
|
| PatternTokenizerFactory |
|
| PatternTypingFilter |
Sets a type attribute to a parameterized value when tokens are matched by any of several regex
patterns.
|
| PatternTypingFilter.PatternTypingRule |
Value holding class for pattern typing rules.
|
| PatternTypingFilterFactory |
Provides a filter that will analyze tokens with the analyzer from an arbitrary field type.
|
| PayloadEncoder |
Mainly for use with the DelimitedPayloadTokenFilter, converts char buffers to BytesRef.
|
| PayloadHelper |
Utility methods for encoding payloads.
|
| PerFieldAnalyzerWrapper |
This analyzer is used to facilitate scenarios where different fields require different analysis
techniques.
|
| PersianAnalyzer |
|
| PersianCharFilter |
CharFilter that replaces instances of Zero-width non-joiner with an ordinary space.
|
| PersianCharFilterFactory |
|
| PersianNormalizationFilter |
|
| PersianNormalizationFilterFactory |
|
| PersianNormalizer |
Normalizer for Persian.
|
| PersianStemFilter |
|
| PersianStemFilterFactory |
|
| PersianStemmer |
Stemmer for Persian.
|
| PorterStemFilter |
Transforms the token stream as per the Porter stemming algorithm.
|
| PorterStemFilterFactory |
|
| PorterStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| PortugueseAnalyzer |
|
| PortugueseLightStemFilter |
|
| PortugueseLightStemFilterFactory |
|
| PortugueseLightStemmer |
Light Stemmer for Portuguese
|
| PortugueseMinimalStemFilter |
|
| PortugueseMinimalStemFilterFactory |
|
| PortugueseMinimalStemmer |
Minimal Stemmer for Portuguese
|
| PortugueseStemFilter |
|
| PortugueseStemFilterFactory |
|
| PortugueseStemmer |
Portuguese stemmer implementing the RSLP (Removedor de Sufixos da Lingua Portuguesa) algorithm.
|
| PortugueseStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| ProtectedTermFilter |
A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained
in a protected set.
|
| ProtectedTermFilterFactory |
|
| QueryAutoStopWordAnalyzer |
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of
protection which prevents very common words from being passed into queries.
|
| RemoveDuplicatesTokenFilter |
A TokenFilter which filters out Tokens at the same position and Term text as the previous token
in the stream.
|
| RemoveDuplicatesTokenFilterFactory |
|
| ReversePathHierarchyTokenizer |
Tokenizer for domain-like hierarchies.
|
| ReverseStringFilter |
Reverse token string, for example "country" => "yrtnuoc".
|
| ReverseStringFilterFactory |
|
| RollingCharBuffer |
Acts like a forever growing char[] as you read characters into it from the provided reader, but
internally it uses a circular buffer to only hold the characters that haven't been freed yet.
|
| RomanianAnalyzer |
|
| RomanianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| RSLPStemmerBase |
Base class for stemmers that use a set of RSLP-like stemming steps.
|
| RSLPStemmerBase.Rule |
A basic rule, with no exceptions.
|
| RSLPStemmerBase.RuleWithSetExceptions |
A rule with a set of whole-word exceptions.
|
| RSLPStemmerBase.RuleWithSuffixExceptions |
A rule with a set of exceptional suffixes.
|
| RSLPStemmerBase.Step |
A step containing a list of rules.
|
| RussianAnalyzer |
|
| RussianLightStemFilter |
|
| RussianLightStemFilterFactory |
|
| RussianLightStemmer |
Light Stemmer for Russian.
|
| RussianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| ScandinavianFoldingFilter |
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
|
| ScandinavianFoldingFilterFactory |
|
| ScandinavianNormalizationFilter |
This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded
variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
|
| ScandinavianNormalizationFilterFactory |
|
| ScandinavianNormalizer |
This Normalizer does the heavy lifting for a set of Scandinavian normalization filters,
normalizing use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa,
ao, ae, oe and oo) by transforming them to åÅæÆøØ.
|
| ScandinavianNormalizer.Foldings |
List of possible foldings that can be used when configuring the filter
|
| SegmentingTokenizerBase |
Breaks text into sentences with a BreakIterator and allows subclasses to decompose these
sentences into words.
|
| SerbianAnalyzer |
|
| SerbianNormalizationFilter |
Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.
|
| SerbianNormalizationFilterFactory |
|
| SerbianNormalizationRegularFilter |
Normalizes Serbian Cyrillic to Latin.
|
| SerbianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| SetKeywordMarkerFilter |
|
| ShingleAnalyzerWrapper |
|
| ShingleFilter |
A ShingleFilter constructs shingles (token n-grams) from a token stream.
|
| ShingleFilterFactory |
|
| SimpleAnalyzer |
|
| SimplePatternSplitTokenizer |
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton, to locate tokens.
|
| SimplePatternSplitTokenizerFactory |
|
| SimplePatternTokenizer |
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton, to locate tokens.
|
| SimplePatternTokenizerFactory |
|
| SnowballFilter |
A filter that stems words using a Snowball-generated stemmer.
|
| SnowballPorterFilterFactory |
|
| SnowballProgram |
Base class for a snowball stemmer
|
| SnowballStemmer |
Parent class of all snowball stemmers, which must implement stem
|
| SolrSynonymParser |
Parser for the Solr synonyms format.
|
| SoraniAnalyzer |
|
| SoraniNormalizationFilter |
|
| SoraniNormalizationFilterFactory |
|
| SoraniNormalizer |
Normalizes the Unicode representation of Sorani text.
|
| SoraniStemFilter |
|
| SoraniStemFilterFactory |
|
| SoraniStemmer |
Light stemmer for Sorani
|
| SortingStrategy |
The strategy defining how a Hunspell dictionary should be loaded, with different tradeoffs.
|
| SpanishAnalyzer |
|
| SpanishLightStemFilter |
|
| SpanishLightStemFilterFactory |
|
| SpanishLightStemmer |
Light Stemmer for Spanish
|
| SpanishMinimalStemFilter |
Deprecated.
|
| SpanishMinimalStemFilterFactory |
Deprecated.
|
| SpanishMinimalStemmer |
Deprecated.
|
| SpanishPluralStemFilter |
|
| SpanishPluralStemFilterFactory |
|
| SpanishPluralStemmer |
Plural Stemmer for Spanish
|
| SpanishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| StemmerOverrideFilter |
Provides the ability to override any KeywordAttribute aware stemmer with custom
dictionary-based stemming.
|
| StemmerOverrideFilter.Builder |
|
| StemmerOverrideFilter.StemmerOverrideMap |
A read-only 4-byte FST backed map that allows fast case-insensitive key value lookups for
StemmerOverrideFilter
|
| StemmerOverrideFilterFactory |
|
| StemmerUtil |
Some commonly-used stemming functions
|
| StopAnalyzer |
|
| StopFilter |
Removes stop words from a token stream.
|
| StopFilterFactory |
|
| Suggester |
A generator for misspelled word corrections based on Hunspell flags.
|
| SuggestionTimeoutException |
|
| SwedishAnalyzer |
|
| SwedishLightStemFilter |
|
| SwedishLightStemFilterFactory |
|
| SwedishLightStemmer |
Light Stemmer for Swedish.
|
| SwedishMinimalStemFilter |
|
| SwedishMinimalStemFilterFactory |
|
| SwedishMinimalStemmer |
Minimal Stemmer for Swedish.
|
| SwedishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| SynonymFilter |
Deprecated.
|
| SynonymFilterFactory |
Deprecated.
|
| SynonymGraphFilter |
Applies single- or multi-token synonyms from a SynonymMap to an incoming TokenStream, producing a fully correct graph output.
|
| SynonymGraphFilterFactory |
|
| SynonymMap |
A map of synonyms; keys and values are phrases.
|
| SynonymMap.Builder |
Builds an FSTSynonymMap.
|
| SynonymMap.Parser |
Abstraction for parsing synonym files.
|
| TamilAnalyzer |
Analyzer for Tamil.
|
| TamilStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| TeeSinkTokenFilter |
This TokenFilter provides the ability to set aside attribute states that have already been
analyzed.
|
| TeeSinkTokenFilter.SinkTokenStream |
TokenStream output from a tee.
|
| TeluguAnalyzer |
Analyzer for Telugu.
|
| TeluguNormalizationFilter |
|
| TeluguNormalizationFilterFactory |
|
| TeluguNormalizer |
Normalizer for Telugu.
|
| TeluguStemFilter |
|
| TeluguStemFilterFactory |
|
| TeluguStemmer |
Stemmer for Telugu.
|
| TermAndBoost |
Wraps a term and boost
|
| TernaryTree |
Ternary Search Tree.
|
| ThaiAnalyzer |
|
| ThaiTokenizer |
|
| ThaiTokenizerFactory |
|
| TimeoutPolicy |
A strategy determining what to do when Hunspell API calls take too much time
|
| TokenOffsetPayloadTokenFilter |
|
| TokenOffsetPayloadTokenFilterFactory |
|
| TrimFilter |
Trims leading and trailing whitespace from Tokens in the stream.
|
| TrimFilterFactory |
|
| TruncateTokenFilter |
A token filter for truncating the terms into a specific length.
|
| TruncateTokenFilterFactory |
|
| TurkishAnalyzer |
|
| TurkishLowerCaseFilter |
Normalizes Turkish token text to lower case.
|
| TurkishLowerCaseFilterFactory |
|
| TurkishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
| TypeAsPayloadTokenFilter |
|
| TypeAsPayloadTokenFilterFactory |
|
| TypeAsSynonymFilter |
|
| TypeAsSynonymFilterFactory |
|
| TypeTokenFilter |
Removes tokens whose types appear in a set of blocked types from a token stream.
|
| TypeTokenFilterFactory |
|
| UAX29URLEmailAnalyzer |
|
| UAX29URLEmailTokenizer |
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified
in Unicode Standard Annex #29. URLs and email
addresses are also tokenized according to the relevant RFCs.
|
| UAX29URLEmailTokenizerFactory |
|
| UAX29URLEmailTokenizerImpl |
This class implements Word Break rules from the Unicode Text Segmentation
algorithm, as specified in Unicode Standard Annex #29.
URLs and email addresses are also tokenized according to the relevant RFCs.
|
| UnicodeProps |
This file contains unicode properties used by various CharTokenizers.
|
| UnicodeWhitespaceAnalyzer |
|
| UnicodeWhitespaceTokenizer |
A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
|
| UpperCaseFilter |
Normalizes token text to UPPER CASE.
|
| UpperCaseFilterFactory |
|
| WhitespaceAnalyzer |
|
| WhitespaceTokenizer |
|
| WhitespaceTokenizerFactory |
|
| WikipediaTokenizer |
Extension of StandardTokenizer that is aware of Wikipedia syntax.
|
| WikipediaTokenizerFactory |
|
| Word2VecModel |
Word2VecModel is a class representing the parsed Word2Vec model containing the vectors for each
word in the dictionary
|
| Word2VecSynonymFilter |
Applies single-token synonyms from a Word2Vec trained network to an incoming TokenStream.
|
| Word2VecSynonymFilterFactory |
|
| Word2VecSynonymProvider |
The Word2VecSynonymProvider generates the list of synonyms of a term.
|
| Word2VecSynonymProviderFactory |
Supplies a cache of Word2VecSynonymProvider instances, so that multiple instances of
Word2VecSynonymFilterFactory do not instantiate multiple instances of the same SynonymProvider.
|
| WordDelimiterFilter |
Deprecated.
|
| WordDelimiterFilterFactory |
Deprecated.
|
| WordDelimiterGraphFilter |
Splits words into subwords and performs optional transformations on subword groups, producing a
correct token graph.
|
| WordDelimiterGraphFilterFactory |
|
| WordDelimiterIterator |
A BreakIterator-like API for iterating over subwords in text, according to
WordDelimiterGraphFilter rules.
|
| WordFormGenerator |
|
| WordnetSynonymParser |
Parser for wordnet prolog format
|
| YiddishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
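
Several of the one-line descriptions above (ReverseStringFilter, FixedShingleFilter, EdgeNGramTokenFilter, FingerprintFilter) name simple token transformations. The plain-Java sketch below illustrates only the ideas behind those descriptions. It does not use the Lucene API; the class name `AnalysisSketch`, its method names, and the space separator are assumptions made for illustration, and the real filters operate on TokenStream attributes and have more options and edge-case handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration class: NOT part of Lucene and not its implementation.
final class AnalysisSketch {

    // ReverseStringFilter idea: reverse the token text, e.g. "country" => "yrtnuoc".
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    // FixedShingleFilter idea: build token n-grams ("shingles") of one fixed size.
    // Joining with a single space is an assumption of this sketch.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    // EdgeNGramTokenFilter idea: emit the leading n-grams (prefixes) of a token
    // for every length from minGram to maxGram. This sketch counts UTF-16 chars.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int n = minGram; n <= upper; n++) {
            out.add(token.substring(0, n));
        }
        return out;
    }

    // FingerprintFilter idea: sort and de-duplicate the input tokens, then
    // concatenate them into a single output token.
    static String fingerprint(List<String> tokens, String separator) {
        return String.join(separator, new TreeSet<>(tokens));
    }

    public static void main(String[] args) {
        System.out.println(reverse("country"));
        System.out.println(shingles(List.of("please", "divide", "this"), 2));
        System.out.println(edgeNGrams("lucene", 1, 3));
        System.out.println(fingerprint(List.of("b", "a", "b"), " "));
    }
}
```

In an actual analysis chain these steps would be composed as TokenFilters, for example through the builder-style CustomAnalyzer listed above, rather than called as static helpers.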