Class ClassicSimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.TFIDFSimilarity
      - org.apache.lucene.search.similarities.ClassicSimilarity
public class ClassicSimilarity extends TFIDFSimilarity
Expert: Historical scoring implementation. You might want to consider using BM25Similarity instead, which is generally considered superior to TF-IDF.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
-
-
Constructor Summary
Constructors
- ClassicSimilarity()
  Default constructor: parameter-free
- ClassicSimilarity(boolean discountOverlaps)
  Primary constructor.
-
Method Summary
- float idf(long docFreq, long docCount)
  Implemented as log((docCount+1)/(docFreq+1)) + 1.
- Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
  Computes a score factor for a simple term and returns an explanation for that score factor.
- float lengthNorm(int numTerms)
  Implemented as 1/sqrt(length).
- float tf(float freq)
  Implemented as sqrt(freq).
- String toString()
Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
idfExplain, scorer
-
Methods inherited from class org.apache.lucene.search.similarities.Similarity
computeNorm, getDiscountOverlaps
-
Method Detail
-
lengthNorm
public float lengthNorm(int numTerms)
Implemented as 1/sqrt(length).
- Specified by:
  lengthNorm in class TFIDFSimilarity
- Parameters:
  numTerms - the number of terms in the field, optionally discounting overlaps
- Returns:
  a length normalization value
- WARNING: This API is experimental and might change in incompatible ways in the next release.
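The formula 1/sqrt(length) can be illustrated with a minimal standalone sketch. It does not call Lucene; the class name LengthNormSketch is hypothetical, and the sketch assumes that "length" in the formula is the numTerms argument.

```java
// Standalone illustration of the documented formula; not the Lucene class itself.
final class LengthNormSketch {

    // Assumption: "length" in 1/sqrt(length) is the numTerms parameter.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Longer fields receive a smaller normalization value.
        for (int numTerms : new int[] {1, 4, 16, 100}) {
            System.out.println(numTerms + " terms: " + lengthNorm(numTerms));
        }
    }
}
```

Under this formula a 4-term field gets 0.5 and a 16-term field gets 0.25, so a match in a short field contributes more than the same match in a long one.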
-
tf
public float tf(float freq)
Implemented as sqrt(freq).
- Specified by:
  tf in class TFIDFSimilarity
- Parameters:
  freq - the frequency of a term within a document
- Returns:
  a score factor based on a term's within-document frequency
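As a rough illustration of why a square root is used, the following standalone sketch reproduces the sqrt(freq) formula outside Lucene. The class name TfSketch is hypothetical.

```java
// Standalone illustration of the documented formula; not the Lucene class itself.
final class TfSketch {

    // Term-frequency factor: grows with freq, but sublinearly.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Compare the raw frequency with the dampened factor.
        for (float freq : new float[] {1f, 4f, 9f, 100f}) {
            System.out.println("freq=" + freq + " tf=" + tf(freq));
        }
    }
}
```

A term occurring 9 times yields a factor of 3 rather than 9, so repeated occurrences still raise the score but with diminishing returns.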
-
idfExplain
public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Description copied from class: TFIDFSimilarity
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses:
idf(docFreq, docCount);
Note that CollectionStatistics.docCount() is used instead of IndexReader#numDocs() because TermStatistics.docFreq() is also used, and when the latter is inaccurate, so is CollectionStatistics.docCount(), and in the same direction. In addition, CollectionStatistics.docCount() does not skew when fields are sparse.
- Overrides:
  idfExplain in class TFIDFSimilarity
- Parameters:
  collectionStats - collection-level statistics
  termStats - term-level statistics for the term
- Returns:
  an Explanation object that includes both an idf score factor and an explanation for the term.
-
idf
public float idf(long docFreq, long docCount)
Implemented as log((docCount+1)/(docFreq+1)) + 1.
- Specified by:
  idf in class TFIDFSimilarity
- Parameters:
  docFreq - the number of documents which contain the term
  docCount - the total number of documents in the collection
- Returns:
  a score factor based on the term's document frequency
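The idf formula can be sketched in plain Java as follows. This is a standalone illustration, not the Lucene source: the class name IdfSketch is hypothetical, and it assumes that "log" denotes the natural logarithm (Math.log) and that the division is carried out in floating point.

```java
// Standalone illustration of the documented formula; not the Lucene class itself.
final class IdfSketch {

    // Assumption: "log" is the natural logarithm.
    static float idf(long docFreq, long docCount) {
        // Cast to double before dividing; long division would truncate the ratio.
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long docCount = 1000;
        // Rarer terms (smaller docFreq) receive a larger factor.
        for (long docFreq : new long[] {1, 10, 100, 1000}) {
            System.out.println("docFreq=" + docFreq + " idf=" + idf(docFreq, docCount));
        }
    }
}
```

The +1 terms keep the ratio defined when docFreq is 0, and the trailing + 1 means a term that appears in every document still gets a factor of exactly 1 rather than 0.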
-
-