Package org.apache.lucene.util
Class BytesRefHash
java.lang.Object
org.apache.lucene.util.BytesRefHash
- All Implemented Interfaces:
Accountable
BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. It maintains mappings from byte arrays to int ids (conceptually a Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. Id assignment is encapsulated inside BytesRefHash: ids are assigned sequentially, starting at 0, and increase with each newly added BytesRef.
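The following is a minimal, JDK-only sketch of the idea described above, assuming a much simpler design than Lucene's. The class ToyBytesHash and all of its members are hypothetical: the "contiguous storage" is a single growable byte[], and collisions are resolved by linear probing over an int[] table of ids. It shows the two properties the description promises: ids are handed out as 0, 1, 2, ... in insertion order, and the key bytes are stored once, back to back, rather than as separate objects.

```java
import java.util.Arrays;

// Hypothetical toy model of a BytesRefHash-like map (not Lucene's code):
// byte[] keys -> sequential int ids, with all key bytes stored back to back in one array.
final class ToyBytesHash {
    private byte[] pool = new byte[64];  // all added byte sequences, stored contiguously
    private int poolUpto = 0;            // next free position in pool
    private int[] starts = new int[8];   // starts[id] = offset of entry id; starts[count] = poolUpto
    private int[] table = new int[16];   // open-addressing table of ids; -1 marks an empty slot
    private int count = 0;               // number of distinct entries, which is also the next id

    ToyBytesHash() {
        Arrays.fill(table, -1);
    }

    int size() {
        return count;
    }

    /** Returns a new id (>= 0) for unseen bytes, otherwise -(existingId)-1. */
    int add(byte[] bytes) {
        int slot = findSlot(bytes);
        if (table[slot] != -1) {
            return -table[slot] - 1;
        }
        while (poolUpto + bytes.length > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2);
        }
        System.arraycopy(bytes, 0, pool, poolUpto, bytes.length);
        poolUpto += bytes.length;
        if (count + 1 >= starts.length) {
            starts = Arrays.copyOf(starts, starts.length * 2);
        }
        int id = count++;
        starts[count] = poolUpto;        // end of entry id == start of the next entry
        table[slot] = id;
        if (count * 2 > table.length) {
            rehash();                    // grow once more than half the slots are used
        }
        return id;
    }

    /** Returns the id of the given bytes, or -1 if they were never added. */
    int find(byte[] bytes) {
        return table[findSlot(bytes)];
    }

    /** Copies out the bytes stored for id; id must be in [0, size()). */
    byte[] get(int id) {
        return Arrays.copyOfRange(pool, starts[id], starts[id + 1]);
    }

    private boolean equalsAt(int id, byte[] bytes) {
        return Arrays.equals(pool, starts[id], starts[id + 1], bytes, 0, bytes.length);
    }

    private int findSlot(byte[] bytes) {
        int mask = table.length - 1;     // table length is always a power of two
        int slot = Arrays.hashCode(bytes) & mask;
        while (table[slot] != -1 && !equalsAt(table[slot], bytes)) {
            slot = (slot + 1) & mask;    // linear probing
        }
        return slot;
    }

    private void rehash() {
        int[] old = table;
        table = new int[old.length * 2];
        Arrays.fill(table, -1);
        for (int id : old) {
            if (id != -1) {
                table[findSlot(get(id))] = id;
            }
        }
    }
}
```

A duplicate add returns -(id)-1 here, mirroring the convention documented for add(BytesRef). Unlike the real class, this toy has no block size limit, no 2GB accounting, and no BytesRef type.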
Note: A BytesRef passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2 bytes. The internal storage is limited to 2GB of total byte storage.
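A rough sketch of why a per-entry limit appears, under an assumption this page does not spell out: each entry is written into one block of the pool behind a length prefix of one or two bytes, so the largest entry that fits is BYTE_BLOCK_SIZE - 2. LengthLimitSketch, BLOCK_SIZE (taken as 1 << 15 here) and the prefix encoding are illustrative assumptions, not Lucene's definitions.

```java
// Hypothetical sketch of the per-entry length limit (not Lucene's code). It assumes
// each entry is written into a single block behind a 1- or 2-byte length prefix,
// which is one way to account for the "- 2" in BYTE_BLOCK_SIZE - 2.
final class LengthLimitSketch {
    static final int BLOCK_SIZE = 1 << 15;        // assumed stand-in for ByteBlockPool.BYTE_BLOCK_SIZE
    static final int MAX_LENGTH = BLOCK_SIZE - 2; // room left after the largest (2-byte) prefix

    /** Writes the length prefix at dest[pos] and returns how many bytes it used. */
    static int writeLengthPrefix(int length, byte[] dest, int pos) {
        if (length < 0 || length > MAX_LENGTH) {
            throw new IllegalArgumentException("length " + length + " not in [0, " + MAX_LENGTH + "]");
        }
        if (length < 128) {
            dest[pos] = (byte) length;                 // high bit clear: 1-byte prefix
            return 1;
        }
        dest[pos] = (byte) (0x80 | (length & 0x7f));   // high bit set: low 7 bits of the length
        dest[pos + 1] = (byte) (length >>> 7);         // remaining high bits
        return 2;
    }

    /** Decodes a prefix written by writeLengthPrefix. */
    static int readLength(byte[] src, int pos) {
        int b = src[pos] & 0xff;
        if ((b & 0x80) == 0) {
            return b;
        }
        return (b & 0x7f) | ((src[pos + 1] & 0xff) << 7);
    }
}
```

With BLOCK_SIZE = 32768, the largest accepted entry (32766 bytes) plus its 2-byte prefix exactly fills one block.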
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static class: Manages allocation of the per-term addresses.
- static class: A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
- static class
Field Summary
Fields:
- public static final int DEFAULT_CAPACITY
Fields inherited from interface org.apache.lucene.util.Accountable: NULL_ACCOUNTABLE
Constructor Summary
Constructors:
- BytesRefHash()
- BytesRefHash(ByteBlockPool pool): Creates a new BytesRefHash.
- BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray): Creates a new BytesRefHash.
Method Summary
Methods (Modifier and Type, Method, Description):
- int add(BytesRef bytes): Adds a new BytesRef.
- int addByPoolOffset(int offset): Adds an "arbitrary" int offset instead of a BytesRef term.
- int byteStart(int bytesID): Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
- void clear()
- void clear(boolean resetPool)
- void close(): Closes the BytesRefHash and releases all internally used memory.
- int[] compact(): Returns the ids array in arbitrary order.
- int find(BytesRef bytes): Returns the id of the given BytesRef.
- BytesRef get(int bytesID, BytesRef ref): Populates and returns a BytesRef with the bytes for the given bytesID.
- long ramBytesUsed(): Returns the memory usage of this object in bytes.
- void reinit(): Reinitializes the BytesRefHash after a previous clear() call.
- int size(): Returns the number of BytesRef values in this BytesRefHash.
- int[] sort(): Returns the values array sorted by the referenced byte values.
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable: getChildResources
Field Details
DEFAULT_CAPACITY
public static final int DEFAULT_CAPACITY
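Constructor Details
BytesRefHash
public BytesRefHash()
BytesRefHash
public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash.
BytesRefHash
public BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash.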
Method Details
size
public int size()
Returns the number of BytesRef values in this BytesRefHash.
- Returns: the number of BytesRef values in this BytesRefHash.
get
public BytesRef get(int bytesID, BytesRef ref)
Populates and returns a BytesRef with the bytes for the given bytesID.
Note: the given bytesID must be a non-negative integer less than the current size (size()).
- Parameters: bytesID - the id; ref - the BytesRef to populate
- Returns: the given BytesRef instance, populated with the bytes for the given bytesID
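The get contract (fill a caller-supplied holder and return it) lets one holder be reused across many lookups. The sketch below models that with hypothetical ByteSlice and SliceStore classes standing in for BytesRef and BytesRefHash; it points the holder at the shared storage instead of copying, and rejects ids outside [0, size()).

```java
import java.util.Arrays;

// Hypothetical stand-in for BytesRef: a window (offset, length) onto a shared byte[].
final class ByteSlice {
    byte[] bytes;
    int offset;
    int length;
}

// Hypothetical append-only store illustrating the get(bytesID, ref) contract (not Lucene's code).
final class SliceStore {
    private byte[] pool = new byte[16];
    private int[] starts = new int[4]; // starts[id] = start of entry id; starts[size] = end of data
    private int size = 0;

    int append(byte[] value) {
        if (size + 1 >= starts.length) {
            starts = Arrays.copyOf(starts, starts.length * 2);
        }
        int start = starts[size];
        while (start + value.length > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2);
        }
        System.arraycopy(value, 0, pool, start, value.length);
        starts[size + 1] = start + value.length;
        return size++;
    }

    int size() {
        return size;
    }

    /** Points ref at the bytes for id (no copy) and returns the same ref instance. */
    ByteSlice get(int id, ByteSlice ref) {
        if (id < 0 || id >= size) {
            throw new IndexOutOfBoundsException("id " + id + " not in [0, " + size + ")");
        }
        ref.bytes = pool;
        ref.offset = starts[id];
        ref.length = starts[id + 1] - starts[id];
        return ref;
    }
}
```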
compact
public int[] compact()
Returns the ids array in arbitrary order. The valid ids occupy the first size() entries, from offset 0 through size() - 1.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
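One plausible way to implement compact() over an open-addressing table, assuming empty slots hold -1 (an assumption; this page does not describe the table layout), is to slide every occupied slot to the front. The result's first size() entries are the ids in table order, which is effectively arbitrary, and the table can no longer serve lookups, which is why the operation is destructive. CompactSketch is hypothetical.

```java
// Hypothetical sketch of compact() over an open-addressing table (not Lucene's code).
// Assumes empty slots hold -1 and occupied slots hold an id in [0, size).
final class CompactSketch {
    /** Moves all ids to the front of slots, in table order, and returns the same array. */
    static int[] compact(int[] slots, int size) {
        int upto = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != -1) {
                if (upto < i) {
                    slots[upto] = slots[i]; // slide the id into the first free front position
                    slots[i] = -1;
                }
                upto++;
            }
        }
        if (upto != size) {
            throw new IllegalStateException("expected " + size + " ids but found " + upto);
        }
        return slots; // the hash layout is gone, so the table can no longer be probed
    }
}
```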
sort
public int[] sort()
Returns the values array sorted by the referenced byte values.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
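The ordering that sort() promises can be sketched with the JDK alone. SortSketch is hypothetical and, unlike sort(), works on a List of byte arrays without destroying anything. It compares bytes as unsigned values with Arrays.compareUnsigned (Java 9+), on the assumption that "sorted by the referenced byte values" means the unsigned lexicographic order that BytesRef uses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, non-destructive sketch of the ordering sort() promises (not Lucene's code).
final class SortSketch {
    /** Returns ids 0..values.size()-1 ordered by the unsigned lexicographic order of their bytes. */
    static int[] sortIds(List<byte[]> values) {
        Integer[] ids = new Integer[values.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Unsigned comparison: byte 0xFF sorts after 0x01, matching UTF-8 code point order.
        Arrays.sort(ids, (x, y) -> Arrays.compareUnsigned(values.get(x), values.get(y)));
        int[] sorted = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            sorted[i] = ids[i];
        }
        return sorted;
    }
}
```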
clear
public void clear(boolean resetPool)
clear
public void clear()
close
public void close()
Closes the BytesRefHash and releases all internally used memory.
add
public int add(BytesRef bytes)
Adds a new BytesRef.
- Parameters: bytes - the bytes to hash
- Returns: the id assigned to the given bytes if there was no existing mapping for them; otherwise (-(id)-1), where id is the id already assigned. This guarantees that the return value is >= 0 if, and only if, the given bytes had not been hashed before.
- Throws: BytesRefHash.MaxBytesLengthExceededException - if the length of the given bytes exceeds ByteBlockPool.BYTE_BLOCK_SIZE - 2
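Callers of add usually need both pieces of information the return value encodes: whether the bytes were new, and their id. Below is a small, hypothetical decoder for the (-(id)-1) convention, together with a HashMap-backed toy (AddConvention, not Lucene's code) that produces such values.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy showing the add() return convention (not Lucene's code).
final class AddConvention {
    private final Map<ByteBuffer, Integer> ids = new HashMap<>();

    /** Returns a new id (>= 0) for unseen bytes, otherwise -(existingId)-1. */
    int add(byte[] bytes) {
        // Copy the bytes so later changes to the caller's array cannot corrupt the key.
        ByteBuffer key = ByteBuffer.wrap(bytes.clone());
        Integer existing = ids.get(key);
        if (existing != null) {
            return -existing - 1;
        }
        int id = ids.size();
        ids.put(key, id);
        return id;
    }

    /** Recovers the id from either kind of add() result. */
    static int decode(int addResult) {
        return addResult >= 0 ? addResult : -addResult - 1;
    }

    /** True if the add() call that produced addResult inserted new bytes. */
    static boolean wasNew(int addResult) {
        return addResult >= 0;
    }
}
```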
find
public int find(BytesRef bytes)
Returns the id of the given BytesRef.
- Parameters: bytes - the bytes to look for
- Returns: the id of the given bytes, or -1 if there is no mapping for the given bytes.
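find is the read-only counterpart of add: it reports an id or -1 and never inserts. The hypothetical FindSketch below (a HashMap over ByteBuffer keys, not Lucene's storage) makes that difference easy to check.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy (not Lucene's code) contrasting add(), which inserts, with find(), which never does.
final class FindSketch {
    private final Map<ByteBuffer, Integer> ids = new HashMap<>();

    /** Returns a new id for unseen bytes, otherwise -(existingId)-1. */
    int add(byte[] bytes) {
        ByteBuffer key = ByteBuffer.wrap(bytes.clone());
        Integer existing = ids.putIfAbsent(key, ids.size()); // size() is read before inserting
        return existing == null ? ids.size() - 1 : -existing - 1;
    }

    /** Returns the id of bytes, or -1 if they were never added. Does not insert. */
    int find(byte[] bytes) {
        return ids.getOrDefault(ByteBuffer.wrap(bytes), -1);
    }

    int size() {
        return ids.size();
    }
}
```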
addByPoolOffset
public int addByPoolOffset(int offset)
Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors: term vectors do not redundantly store the byte[] term directly, but instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
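To illustrate the idea behind addByPoolOffset, here is a hypothetical OffsetHash: a secondary hash whose keys are int offsets into bytes that some other structure already stores, so the term bytes are never copied a second time. It reuses the (-(id)-1) convention that add documents; whether addByPoolOffset returns values the same way is an assumption here, not something this page states.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Lucene's code): a hash keyed by int offsets into bytes owned by
// another structure, so the bytes themselves are never copied into this one.
final class OffsetHash {
    private final Map<Integer, Integer> idByOffset = new HashMap<>();
    private int[] offsets = new int[4]; // offsets[id] = the pool offset registered for that id

    /** Returns a new id for an unseen offset, otherwise -(existingId)-1 (assumed convention). */
    int addByPoolOffset(int offset) {
        Integer existing = idByOffset.get(offset);
        if (existing != null) {
            return -existing - 1;
        }
        int id = idByOffset.size();
        if (id == offsets.length) {
            offsets = Arrays.copyOf(offsets, id * 2);
        }
        offsets[id] = offset;
        idByOffset.put(offset, id);
        return id;
    }

    /** The offset recorded for id, i.e. where the shared bytes for that entry begin. */
    int byteStart(int id) {
        return offsets[id];
    }
}
```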
reinit
public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
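A hedged sketch of the clear()/reinit() contract described above: clear() drops per-id storage, and reinit() re-allocates it only when it was dropped, so calling reinit() without a preceding clear() changes nothing. LifecycleSketch, its bytesStart field and the exception it throws are hypothetical choices for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of the clear()/reinit() contract (not Lucene's code):
// clear() drops the per-id storage, and reinit() re-allocates it only if it was dropped.
final class LifecycleSketch {
    private static final int INITIAL_SIZE = 16;
    private int[] bytesStart = new int[INITIAL_SIZE]; // per-id start offsets; null after clear()
    private int count = 0;

    /** Records a start offset for a new id and returns that id. */
    int add(int start) {
        if (bytesStart == null) {
            throw new IllegalStateException("reinit() must be called after clear()");
        }
        if (count == bytesStart.length) {
            bytesStart = Arrays.copyOf(bytesStart, count * 2);
        }
        bytesStart[count] = start;
        return count++;
    }

    void clear() {
        count = 0;
        bytesStart = null; // release per-id storage until reinit()
    }

    /** No effect unless clear() has released the storage. */
    void reinit() {
        if (bytesStart == null) {
            bytesStart = new int[INITIAL_SIZE];
        }
    }

    int size() {
        return count;
    }

    int startOf(int id) {
        return bytesStart[id];
    }
}
```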
byteStart
public int byteStart(int bytesID)
Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
- Parameters: bytesID - the id to look up
- Returns: the bytesStart offset into the internally used ByteBlockPool for the given id
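What a bytesStart offset might point at, under an assumed layout: entries are appended to one byte[] as a length prefix followed by the bytes, and only each entry's start is recorded. PrefixedPool is hypothetical and limits entries to fewer than 128 bytes so the prefix is always one byte. The point is that the start offset alone is enough to read an entry back.

```java
import java.util.Arrays;

// Hypothetical pool layout (not Lucene's): each entry is [1-byte length][bytes], appended
// to one byte[], and only the entry's start offset is kept per id.
final class PrefixedPool {
    private byte[] pool = new byte[32];
    private int upto = 0;
    private int[] starts = new int[4];
    private int count = 0;

    int add(byte[] bytes) {
        if (bytes.length >= 128) {
            throw new IllegalArgumentException("this sketch only supports entries shorter than 128 bytes");
        }
        while (upto + 1 + bytes.length > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2);
        }
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
        }
        starts[count] = upto;
        pool[upto] = (byte) bytes.length;
        System.arraycopy(bytes, 0, pool, upto + 1, bytes.length);
        upto += 1 + bytes.length;
        return count++;
    }

    /** Offset in pool where entry id begins (at its length prefix). */
    int byteStart(int id) {
        return starts[id];
    }

    /** Reads entry id back using nothing but its start offset. */
    byte[] read(int id) {
        int start = byteStart(id);
        int length = pool[start]; // < 128, so the signed byte is already non-negative
        return Arrays.copyOfRange(pool, start + 1, start + 1 + length);
    }
}
```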
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by: ramBytesUsed in interface Accountable
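Accountable implementations report a byte count that must never be negative. As a hedged sketch (SizeAccountable and ArraysFootprint are hypothetical stand-ins, not Lucene types), an estimate for a structure like this one can sum the sizes of its backing arrays. It deliberately ignores object headers and alignment, which a real estimate would add.

```java
// Hypothetical stand-in for Lucene's Accountable interface (not the real type).
interface SizeAccountable {
    /** Memory usage of this object in bytes; negative values are illegal. */
    long ramBytesUsed();
}

// Hypothetical holder of the kinds of arrays a BytesRefHash-like structure keeps.
final class ArraysFootprint implements SizeAccountable {
    private final byte[] pool;
    private final int[] ids;
    private final int[] starts;

    ArraysFootprint(byte[] pool, int[] ids, int[] starts) {
        this.pool = pool;
        this.ids = ids;
        this.starts = starts;
    }

    @Override
    public long ramBytesUsed() {
        // Payload only: Byte.BYTES per pool byte plus Integer.BYTES per int slot.
        // Object headers, array headers and alignment are ignored in this sketch.
        return (long) pool.length * Byte.BYTES + (long) (ids.length + starts.length) * Integer.BYTES;
    }
}
```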