Package org.apache.lucene.tests.analysis
Class MockAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.tests.analysis.MockAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
Analyzer for testing
This analyzer is a replacement for Whitespace/Simple/KeywordAnalyzers for unit tests. If you are testing a custom component such as a query parser or analyzer-wrapper that consumes analysis streams, it is a great idea to test it with this analyzer instead. MockAnalyzer has the following behavior:
- By default, the assertions in MockTokenizer are turned on for extra checks that the consumer is consuming properly. These checks can be disabled with setEnableChecks(boolean).
- Payload data is randomly injected into the stream for more thorough testing of payloads.
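The behavior above can be exercised with the normal Analyzer consumer workflow. A minimal sketch, assuming lucene-core and lucene-test-framework on the classpath (the field name and sample text are illustrative only):

```java
import java.util.Random;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.tests.analysis.MockAnalyzer;

public class MockAnalyzerDemo {
  public static void main(String[] args) throws Exception {
    // Default constructor: whitespace tokenization, lowercasing, no stopwords.
    try (Analyzer analyzer = new MockAnalyzer(new Random(42));
         TokenStream ts = analyzer.tokenStream("body", "Hello Lucene World")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();               // MockTokenizer's checks require reset() first
      while (ts.incrementToken()) {
        System.out.println(term.toString());
      }
      ts.end();                 // ...and end() after the last token
    }
  }
}
```

Because the checks are on by default, skipping reset() or end() in a consumer under test will trip an assertion rather than silently pass.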
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor Summary
Constructors:
- MockAnalyzer(Random random): Create a Whitespace-lowercasing analyzer with no stopwords removal.
- MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase)
- MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase, CharacterRunAutomaton filter): Creates a new MockAnalyzer.
Method Summary
Methods:
- createComponents(String fieldName)
- int getOffsetGap(String fieldName): Get the offset gap between tokens in fields if several fields with the same name were added.
- int getPositionIncrementGap(String fieldName)
- protected TokenStream normalize(String fieldName, TokenStream in)
- void setEnableChecks(boolean enableChecks): Toggle consumer workflow checking: if your test consumes token streams normally, you should leave this enabled.
- void setMaxTokenLength(int length): Toggle maxTokenLength for MockTokenizer.
- void setOffsetGap(int offsetGap): Set a new offset gap which will then be added to the offset when several fields with the same name are indexed.
- void setPositionIncrementGap(int positionIncrementGap)

Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
-
Constructor Details
-
MockAnalyzer
public MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase, CharacterRunAutomaton filter)
Creates a new MockAnalyzer.
- Parameters:
  random - Random for payloads behavior
  runAutomaton - DFA describing how tokenization should happen (e.g. [a-zA-Z]+)
  lowerCase - true if the tokenizer should lowercase terms
  filter - DFA describing how terms should be filtered (set of stopwords, etc.)
-
MockAnalyzer
-
MockAnalyzer
Create a Whitespace-lowercasing analyzer with no stopwords removal. Calls
MockAnalyzer(random, MockTokenizer.WHITESPACE, true, MockTokenFilter.EMPTY_STOPSET, false).
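As a sketch of the full constructor with non-default automata, assuming the same classpath as above (MockTokenizer.SIMPLE, roughly letter runs like [a-zA-Z]+, and MockTokenFilter.ENGLISH_STOPSET are existing constants of the test framework; the sample text is illustrative):

```java
import java.util.Random;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.tests.analysis.MockAnalyzer;
import org.apache.lucene.tests.analysis.MockTokenFilter;
import org.apache.lucene.tests.analysis.MockTokenizer;

public class MockAnalyzerCtorDemo {
  public static void main(String[] args) throws Exception {
    // runAutomaton: letter runs (SimpleAnalyzer-like tokenization),
    // lowerCase: true, filter: the classic English stopword set.
    try (Analyzer analyzer = new MockAnalyzer(
             new Random(0), MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET);
         TokenStream ts = analyzer.tokenStream("body", "The quick brown fox")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term.toString());   // "the" is dropped as a stopword
      }
      ts.end();
    }
  }
}
```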
-
-
Method Details
-
createComponents
- Specified by:
  createComponents in class Analyzer
-
normalize
-
setPositionIncrementGap
public void setPositionIncrementGap(int positionIncrementGap)
getPositionIncrementGap
- Overrides:
  getPositionIncrementGap in class Analyzer
-
setOffsetGap
public void setOffsetGap(int offsetGap)
Set a new offset gap which will then be added to the offset when several fields with the same name are indexed.
- Parameters:
  offsetGap - The offset gap that should be used.
-
getOffsetGap
Get the offset gap between tokens in fields if several fields with the same name were added.
- Overrides:
  getOffsetGap in class Analyzer
- Parameters:
  fieldName - Currently not used; the same offset gap is returned for each field.
-
setEnableChecks
public void setEnableChecks(boolean enableChecks)
Toggle consumer workflow checking: if your test consumes token streams normally, you should leave this enabled.
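For code under test that deliberately departs from the full reset()/incrementToken()/end()/close() workflow, the checks can be turned off so they do not (correctly) trip. A minimal sketch, same classpath assumptions as above:

```java
import java.util.Random;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.tests.analysis.MockAnalyzer;

public class EnableChecksDemo {
  public static void main(String[] args) throws Exception {
    try (MockAnalyzer analyzer = new MockAnalyzer(new Random(7))) {
      // The consumer below abandons the stream after one token and never
      // calls end(), so relax the workflow assertions for this test:
      analyzer.setEnableChecks(false);
      try (TokenStream ts = analyzer.tokenStream("f", "one two three")) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        ts.incrementToken();              // read a single token, then stop
        System.out.println(term.toString());
      }
    }
  }
}
```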
setMaxTokenLength
public void setMaxTokenLength(int length)
Toggle maxTokenLength for MockTokenizer.
-