Package org.apache.lucene.analysis.core
Basic, general-purpose analysis components.
Class Summary

Class                       Description
DecimalDigitFilter          Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits (0-9).
DecimalDigitFilterFactory   Factory for DecimalDigitFilter.
FlattenGraphFilter          Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat form so that all nodes form a single linear chain with no side paths.
FlattenGraphFilterFactory   Factory for FlattenGraphFilter.
KeywordAnalyzer             "Tokenizes" the entire stream as a single token.
KeywordTokenizer            Emits the entire input as a single token.
KeywordTokenizerFactory     Factory for KeywordTokenizer.
LetterTokenizer             A tokenizer that divides text at non-letters.
LetterTokenizerFactory      Factory for LetterTokenizer.
LowerCaseFilter             Normalizes token text to lower case.
LowerCaseFilterFactory      Factory for LowerCaseFilter.
SimpleAnalyzer
StopAnalyzer
StopFilter                  Removes stop words from a token stream.
StopFilterFactory           Factory for StopFilter.
TypeTokenFilter             Removes tokens whose types appear in a set of blocked types from a token stream.
TypeTokenFilterFactory      Factory class for TypeTokenFilter.
UnicodeWhitespaceAnalyzer   An Analyzer that uses UnicodeWhitespaceTokenizer.
UnicodeWhitespaceTokenizer  A tokenizer that divides text at whitespace.
UpperCaseFilter             Normalizes token text to UPPER CASE.
UpperCaseFilterFactory      Factory for UpperCaseFilter.
WhitespaceAnalyzer          An Analyzer that uses WhitespaceTokenizer.
WhitespaceTokenizer         A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int).
WhitespaceTokenizerFactory  Factory for WhitespaceTokenizer.
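
Example (illustrative only). The sketch below uses nothing but the JDK to mimic three behaviours described in the class summary: splitting at `Character.isWhitespace(int)` as stated for WhitespaceTokenizer, lower-casing as stated for LowerCaseFilter, and folding Decimal_Number digits to 0-9 as stated for DecimalDigitFilter. It does not call any Lucene API. The class `AnalysisSketch` and its methods `splitOnWhitespace`, `foldDigits`, and `analyze` are hypothetical names introduced here, and the real components may differ in details such as offsets, token length limits, and locale handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; not part of Lucene.
class AnalysisSketch {

    // Splits text at code points for which Character.isWhitespace(int) is true,
    // the rule the summary gives for WhitespaceTokenizer.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Replaces every code point in General_Category=Decimal_Number with the
    // matching Basic Latin digit, as the summary describes for DecimalDigitFilter.
    static String foldDigits(String token) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < token.length(); ) {
            int cp = token.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    // Chains the steps: tokenize, lower-case each token, then fold digits.
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String token : splitOnWhitespace(text)) {
            result.add(foldDigits(token.toLowerCase(Locale.ROOT)));
        }
        return result;
    }

    public static void main(String[] args) {
        // \u0664\u0662 are the Arabic-Indic digits four and two.
        System.out.println(analyze("Order \u0664\u0662 Shipped")); // prints [order, 42, shipped]
    }
}
```

The tokenizer-then-filters ordering mirrors how the package's Analyzer classes are described: a tokenizer produces the tokens and filters then normalize or remove them one at a time.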