Class EnwikiContentSource
- java.lang.Object
  - org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
    - org.apache.lucene.benchmark.byTask.feeds.ContentSource
      - org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
All Implemented Interfaces:
Closeable, AutoCloseable
public class EnwikiContentSource extends ContentSource
A ContentSource which reads the English Wikipedia dump. You can read the .bz2 file directly (it will be decompressed on the fly). Config properties:
- keep.image.only.docs=false|true (default true).
- docs.file=<path to the file>
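As a rough illustration of the two config properties listed above, the following sketch builds them with plain java.util.Properties and reads keep.image.only.docs back with its documented default of true. The class name EnwikiPropsSketch, its helper methods, and the file path are hypothetical. In the benchmark framework these values would normally reach the source through setConfig(Config); the sketch avoids the Lucene classes so that it runs without Lucene on the classpath.

```java
import java.util.Properties;

// Hypothetical helper class; only the two property names come from the documentation above.
class EnwikiPropsSketch {

    // Builds the properties that the content source is documented to read.
    static Properties enwikiProps(String dumpPath, boolean keepImageOnlyDocs) {
        Properties props = new Properties();
        props.setProperty("docs.file", dumpPath);
        props.setProperty("keep.image.only.docs", Boolean.toString(keepImageOnlyDocs));
        return props;
    }

    // Reads keep.image.only.docs, falling back to the documented default of true.
    static boolean keepImageOnlyDocs(Properties props) {
        return Boolean.parseBoolean(props.getProperty("keep.image.only.docs", "true"));
    }

    public static void main(String[] args) {
        // The path is a placeholder, not a real file location.
        Properties props = enwikiProps("/path/to/enwiki-dump.xml.bz2", false);
        System.out.println("docs.file = " + props.getProperty("docs.file"));
        System.out.println("keep.image.only.docs = " + keepImageOnlyDocs(props));
        System.out.println("default when unset = " + keepImageOnlyDocs(new Properties()));
    }
}
```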
Field Summary
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource:
encoding, forever, logStep, verbose
Constructor Summary
- EnwikiContentSource()
Method Summary
- void close(): Called when reading from this content source is no longer required.
- DocData getNextDocData(DocData docData): Returns the next DocData from the content source.
- protected InputStream openInputStream(): Open the input stream.
- void resetInputs(): Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
- void setConfig(Config config): Sets the Config for this content source.
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
addBytes, addItem, collectFiles, getBytesCount, getConfig, getItemsCount, getTotalBytesCount, getTotalItemsCount, printStatistics, shouldLog
Method Detail
close
public void close() throws IOException
Description copied from class: ContentItemsSource
Called when reading from this content source is no longer required.
- Specified by: close in interface AutoCloseable
- Specified by: close in interface Closeable
- Specified by: close in class ContentItemsSource
- Throws: IOException
getNextDocData
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException
Description copied from class: ContentSource
Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
- Specified by: getNextDocData in class ContentSource
- Throws: NoMoreDataException, IOException
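The multi-threading requirement above can be modelled with a small sketch. It does not use the Lucene classes: TitleSource, NoMoreItemsException, and drain are hypothetical stand-ins for a content source, NoMoreDataException, and a set of benchmark threads. Guarding the shared cursor with synchronized is one possible way to satisfy the contract; how EnwikiContentSource itself does so is not described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins: these are NOT the Lucene classes, only a model of the contract.
class SharedSourceSketch {

    // Stand-in for NoMoreDataException: signals that the source is exhausted.
    static class NoMoreItemsException extends Exception {}

    // Stand-in for a content source whose "next" method may be called concurrently.
    static class TitleSource {
        private final List<String> titles;
        private int next = 0; // shared cursor, guarded by this object's monitor

        TitleSource(List<String> titles) {
            this.titles = new ArrayList<>(titles);
        }

        // synchronized so two threads never read or advance the cursor at the same time
        synchronized String nextTitle() throws NoMoreItemsException {
            if (next >= titles.size()) {
                throw new NoMoreItemsException();
            }
            return titles.get(next++);
        }
    }

    // Drains the source from several threads and returns every title that was handed out.
    static List<String> drain(TitleSource source, int threadCount) {
        List<String> read = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        read.add(source.nextTitle());
                    }
                } catch (NoMoreItemsException e) {
                    // normal termination: nothing left to read
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for readers", e);
            }
        }
        return read;
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            titles.add("Article " + i);
        }
        List<String> read = drain(new TitleSource(titles), 4);
        System.out.println("titles handed out: " + read.size());
    }
}
```

Because the cursor is only touched inside the synchronized method, each title is handed to exactly one thread, and every thread sees the end-of-data exception once the list is exhausted.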
resetInputs
public void resetInputs() throws IOException
Description copied from class: ContentItemsSource
Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it's important to call super.resetInputs in case you override this method.
- Overrides: resetInputs in class ContentItemsSource
- Throws: IOException
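The note about calling super.resetInputs can be illustrated with a minimal sketch. CountingSource and RewindingSource are hypothetical stand-ins, not the Lucene classes: the base class models only the "items since the last reset" counter described above, and the subclass adds its own input position.

```java
// Hypothetical stand-ins, NOT the Lucene classes: a base class that counts items
// since the last reset, and a subclass that also rewinds its own input position.
class ResetInputsSketch {

    static class CountingSource {
        private int itemsSinceReset = 0;

        void addItem() {
            itemsSinceReset++;
        }

        int getItemsCount() {
            return itemsSinceReset;
        }

        // Models the documented default behaviour: clear the per-run counter.
        void resetInputs() {
            itemsSinceReset = 0;
        }
    }

    static class RewindingSource extends CountingSource {
        private int position = 0;

        String next() {
            addItem();
            return "doc" + position++;
        }

        @Override
        void resetInputs() {
            super.resetInputs(); // without this call the item counter would keep growing
            position = 0;        // subclass-specific state: start again from the first document
        }
    }

    public static void main(String[] args) {
        RewindingSource source = new RewindingSource();
        source.next();
        source.next();
        source.resetInputs();
        // The count is read before next() runs, so this prints "0 doc0".
        System.out.println(source.getItemsCount() + " " + source.next());
    }
}
```

If the override skipped the super call, the position would rewind but the counter would still report the items of the previous run, which is the inconsistency the note warns about.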
openInputStream
protected InputStream openInputStream() throws IOException
Open the input stream.
- Throws: IOException
setConfig
public void setConfig(Config config)
Description copied from class: ContentItemsSource
Sets the Config for this content source. If you override this method, you must call super.setConfig.
- Overrides: setConfig in class ContentItemsSource