dense_hash_map<Key, Data, HashFcn, EqualKey, Alloc>
[Note: this document is formatted similarly to the SGI STL implementation documentation pages, and refers to concepts and classes defined there. However, neither this document nor the code it describes is associated with SGI, nor is it necessary to have SGI's STL implementation installed in order to use this class.]
dense_hash_map is a Hashed Associative Container that associates objects of type Key with objects of type Data. dense_hash_map is a Pair Associative Container, meaning that its value type is pair<const Key, Data>. It is also a Unique Associative Container, meaning that no two elements have keys that compare equal using EqualKey.
Looking up an element in a dense_hash_map by its key is efficient, so dense_hash_map is useful for "dictionaries" where the order of elements is irrelevant. If it is important for the elements to be in a particular order, however, then map is more appropriate.
dense_hash_map is distinguished from other hash-map implementations by its speed and by the ability to save and restore contents to disk. On the other hand, this hash-map implementation can use significantly more space than other hash-map implementations, and it also has requirements -- for instance, for a distinguished "empty key" -- that may not be easy for all applications to satisfy.
This class is appropriate for applications that need speedy access to relatively small "dictionaries" stored in memory, or for applications that need these dictionaries to be persistent. (See the implementation note.)
(Note: The example below uses SGI semantics for hash<> -- the kind used by gcc and most Unix compiler suites -- and not Dinkumware semantics -- the kind used by Microsoft Visual Studio. If you are using MSVC, this example will not compile as-is: you'll need to change hash to hash_compare, and you won't use eqstr at all. See the MSVC documentation for hash_map and hash_compare for more details.)
```cpp
#include <cstring>   // for strcmp
#include <iostream>
#include <google/dense_hash_map>

using google::dense_hash_map;      // namespace where class lives by default
using std::cout;
using std::endl;
using ext::hash;  // or __gnu_cxx::hash, or maybe tr1::hash, depending on your OS

struct eqstr {
  bool operator()(const char* s1, const char* s2) const {
    return (s1 == s2) || (s1 && s2 && strcmp(s1, s2) == 0);
  }
};

int main() {
  dense_hash_map<const char*, int, hash<const char*>, eqstr> months;

  months.set_empty_key(NULL);
  months["january"] = 31;
  months["february"] = 28;
  months["march"] = 31;
  months["april"] = 30;
  months["may"] = 31;
  months["june"] = 30;
  months["july"] = 31;
  months["august"] = 31;
  months["september"] = 30;
  months["october"] = 31;
  months["november"] = 30;
  months["december"] = 31;

  cout << "september -> " << months["september"] << endl;
  cout << "april     -> " << months["april"] << endl;
  cout << "june      -> " << months["june"] << endl;
  cout << "november  -> " << months["november"] << endl;
}
```
(dense_hash_map is similar in API to the tr1 class unordered_map.)
Parameter | Description | Default |
---|---|---|
Key | The hash_map's key type. This is also defined as dense_hash_map::key_type. | |
Data | The hash_map's data type. This is also defined as dense_hash_map::data_type. [7] | |
HashFcn | The hash function used by the hash_map. This is also defined as dense_hash_map::hasher. Note: Hashtable performance depends heavily on the choice of hash function. See the performance page for more information. | hash<Key> |
EqualKey | The hash_map key equality function: a binary predicate that determines whether two keys are equal. This is also defined as dense_hash_map::key_equal. | equal_to<Key> |
Alloc | The STL allocator to use. By default, uses the provided allocator libc_allocator_with_realloc, which likely gives better performance than other STL allocators due to its built-in support for realloc, which this container takes advantage of. If you use an allocator other than the default, note that this container imposes an additional requirement on the STL allocator type beyond those in [lib.allocator.requirements]: it does not support allocators that define alternate memory models. That is, it assumes that pointer, const_pointer, size_type, and difference_type are just T*, const T*, size_t, and ptrdiff_t, respectively. This is also defined as dense_hash_map::allocator_type. | libc_allocator_with_realloc |
Member | Where defined | Description |
---|---|---|
key_type | Associative Container | The dense_hash_map's key type, Key. |
data_type | Pair Associative Container | The type of object associated with the keys. |
value_type | Pair Associative Container | The type of object, pair<const key_type, data_type>, stored in the hash_map. |
hasher | Hashed Associative Container | The dense_hash_map's hash function. |
key_equal | Hashed Associative Container | Function object that compares keys for equality. |
allocator_type | Unordered Associative Container (tr1) | The type of the Allocator given as a template parameter. |
pointer | Container | Pointer to T. |
reference | Container | Reference to T |
const_reference | Container | Const reference to T |
size_type | Container | An unsigned integral type. |
difference_type | Container | A signed integral type. |
iterator | Container | Iterator used to iterate through a dense_hash_map. [1] |
const_iterator | Container | Const iterator used to iterate through a dense_hash_map. |
local_iterator | Unordered Associative Container (tr1) | Iterator used to iterate through a subset of dense_hash_map. [1] |
const_local_iterator | Unordered Associative Container (tr1) | Const iterator used to iterate through a subset of dense_hash_map. |
iterator begin() | Container | Returns an iterator pointing to the beginning of the dense_hash_map. |
iterator end() | Container | Returns an iterator pointing to the end of the dense_hash_map. |
const_iterator begin() const | Container | Returns a const_iterator pointing to the beginning of the dense_hash_map. |
const_iterator end() const | Container | Returns a const_iterator pointing to the end of the dense_hash_map. |
local_iterator begin(size_type i) | Unordered Associative Container (tr1) | Returns a local_iterator pointing to the beginning of bucket i in the dense_hash_map. |
local_iterator end(size_type i) | Unordered Associative Container (tr1) | Returns a local_iterator pointing to the end of bucket i in the dense_hash_map. For dense_hash_map, each bucket contains either 0 or 1 item. |
const_local_iterator begin(size_type i) const | Unordered Associative Container (tr1) | Returns a const_local_iterator pointing to the beginning of bucket i in the dense_hash_map. |
const_local_iterator end(size_type i) const | Unordered Associative Container (tr1) | Returns a const_local_iterator pointing to the end of bucket i in the dense_hash_map. For dense_hash_map, each bucket contains either 0 or 1 item. |
size_type size() const | Container | Returns the size of the dense_hash_map. |
size_type max_size() const | Container | Returns the largest possible size of the dense_hash_map. |
bool empty() const | Container | true if the dense_hash_map's size is 0. |
size_type bucket_count() const | Hashed Associative Container | Returns the number of buckets used by the dense_hash_map. |
size_type max_bucket_count() const | Hashed Associative Container | Returns the largest possible number of buckets used by the dense_hash_map. |
size_type bucket_size(size_type i) const | Unordered Associative Container (tr1) | Returns the number of elements in bucket i. For dense_hash_map, this will be either 0 or 1. |
size_type bucket(const key_type& key) const | Unordered Associative Container (tr1) | If the key exists in the map, returns the index of the bucket containing the given key; otherwise, returns the bucket the key would be inserted into. This value may be passed to begin(size_type) and end(size_type). |
float load_factor() const | Unordered Associative Container (tr1) | The number of elements in the dense_hash_map divided by the number of buckets. |
float max_load_factor() const | Unordered Associative Container (tr1) | The maximum load factor before increasing the number of buckets in the dense_hash_map. |
void max_load_factor(float new_grow) | Unordered Associative Container (tr1) | Sets the maximum load factor before increasing the number of buckets in the dense_hash_map. |
float min_load_factor() const | dense_hash_map | The minimum load factor before decreasing the number of buckets in the dense_hash_map. |
void min_load_factor(float new_grow) | dense_hash_map | Sets the minimum load factor before decreasing the number of buckets in the dense_hash_map. |
void set_resizing_parameters(float shrink, float grow) | dense_hash_map | DEPRECATED. See below. |
void resize(size_type n) | Hashed Associative Container | Increases the bucket count to hold at least n items. [4] [5] |
void rehash(size_type n) | Unordered Associative Container (tr1) | Increases the bucket count to hold at least n items. This is identical to resize. [4] [5] |
hasher hash_funct() const | Hashed Associative Container | Returns the hasher object used by the dense_hash_map. |
hasher hash_function() const | Unordered Associative Container (tr1) | Returns the hasher object used by the dense_hash_map. This is identical to hash_funct. |
key_equal key_eq() const | Hashed Associative Container | Returns the key_equal object used by the dense_hash_map. |
allocator_type get_allocator() const | Unordered Associative Container (tr1) | Returns the allocator_type object used by the dense_hash_map: either the one passed in to the constructor, or a default Alloc instance. |
dense_hash_map() | Container | Creates an empty dense_hash_map. |
dense_hash_map(size_type n) | Hashed Associative Container | Creates an empty dense_hash_map that's optimized for holding up to n items. [5] |
dense_hash_map(size_type n, const hasher& h) | Hashed Associative Container | Creates an empty dense_hash_map that's optimized for up to n items, using h as the hash function. |
dense_hash_map(size_type n, const hasher& h, const key_equal& k) | Hashed Associative Container | Creates an empty dense_hash_map that's optimized for up to n items, using h as the hash function and k as the key equal function. |
dense_hash_map(size_type n, const hasher& h, const key_equal& k, const allocator_type& a) | Unordered Associative Container (tr1) | Creates an empty dense_hash_map that's optimized for up to n items, using h as the hash function, k as the key equal function, and a as the allocator object. |
template <class InputIterator> dense_hash_map(InputIterator f, InputIterator l) [2] | Unique Hashed Associative Container | Creates a dense_hash_map with a copy of a range. |
template <class InputIterator> dense_hash_map(InputIterator f, InputIterator l, size_type n) [2] | Unique Hashed Associative Container | Creates a hash_map with a copy of a range that's optimized to hold up to n items. |
template <class InputIterator> dense_hash_map(InputIterator f, InputIterator l, size_type n, const hasher& h) [2] | Unique Hashed Associative Container | Creates a hash_map with a copy of a range that's optimized to hold up to n items, using h as the hash function. |
template <class InputIterator> dense_hash_map(InputIterator f, InputIterator l, size_type n, const hasher& h, const key_equal& k) [2] | Unique Hashed Associative Container | Creates a hash_map with a copy of a range that's optimized for holding up to n items, using h as the hash function and k as the key equal function. |
template <class InputIterator> dense_hash_map(InputIterator f, InputIterator l, size_type n, const hasher& h, const key_equal& k, const allocator_type& a) [2] | Unordered Associative Container (tr1) | Creates a hash_map with a copy of a range that's optimized for holding up to n items, using h as the hash function, k as the key equal function, and a as the allocator object. |
dense_hash_map(const hash_map&) | Container | The copy constructor. |
dense_hash_map& operator=(const hash_map&) | Container | The assignment operator |
void swap(hash_map&) | Container | Swaps the contents of two hash_maps. |
pair<iterator, bool> insert(const value_type& x) | Unique Associative Container | Inserts x into the dense_hash_map. |
template <class InputIterator> void insert(InputIterator f, InputIterator l) [2] | Unique Associative Container | Inserts a range into the dense_hash_map. |
void set_empty_key(const key_type& key) [6] | dense_hash_map | See below. |
void set_deleted_key(const key_type& key) [6] | dense_hash_map | See below. |
void clear_deleted_key() [6] | dense_hash_map | See below. |
void erase(iterator pos) | Associative Container | Erases the element pointed to by pos. [6] |
size_type erase(const key_type& k) | Associative Container | Erases the element whose key is k. [6] |
void erase(iterator first, iterator last) | Associative Container | Erases all elements in a range. [6] |
void clear() | Associative Container | Erases all of the elements. |
void clear_no_resize() | dense_hash_map | See below. |
const_iterator find(const key_type& k) const | Associative Container | Finds an element whose key is k. |
iterator find(const key_type& k) | Associative Container | Finds an element whose key is k. |
size_type count(const key_type& k) const | Unique Associative Container | Counts the number of elements whose key is k. |
pair<const_iterator, const_iterator> equal_range(const key_type& k) const | Associative Container | Finds a range containing all elements whose key is k. |
pair<iterator, iterator> equal_range(const key_type& k) | Associative Container | Finds a range containing all elements whose key is k. |
data_type& operator[](const key_type& k) [3] | dense_hash_map | See below. |
bool write_metadata(FILE *fp) | dense_hash_map | See below. |
bool read_metadata(FILE *fp) | dense_hash_map | See below. |
bool write_nopointer_data(FILE *fp) | dense_hash_map | See below. |
bool read_nopointer_data(FILE *fp) | dense_hash_map | See below. |
bool operator==(const hash_map&, const hash_map&) | Hashed Associative Container | Tests two hash_maps for equality. This is a global function, not a member function. |
Member | Description |
---|---|
void set_empty_key(const key_type& key) | Sets the distinguished "empty" key to key. This must be called immediately after construction, before any other dense_hash_map operation. [6] |
void set_deleted_key(const key_type& key) | Sets the distinguished "deleted" key to key. This must be called before any calls to erase(). [6] |
void clear_deleted_key() | Clears the distinguished "deleted" key. After this is called, calls to erase() are not valid on this object. [6] |
void clear_no_resize() | Clears the hashtable like clear() does, but does not recover the memory used for hashtable buckets. (The memory used by the items in the hashtable is still recovered.) This can save time for applications that want to reuse a dense_hash_map many times, each time with a similar number of objects. |
data_type& operator[](const key_type& k) [3] | Returns a reference to the object that is associated with a particular key. If the dense_hash_map does not already contain such an object, operator[] inserts the default object data_type(). [3] |
void set_resizing_parameters(float shrink, float grow) | This function is DEPRECATED. It is equivalent to calling min_load_factor(shrink); max_load_factor(grow). |
bool write_metadata(FILE *fp) | Write hashtable metadata to fp. See below. |
bool read_metadata(FILE *fp) | Read hashtable metadata from fp. See below. |
bool write_nopointer_data(FILE *fp) | Write hashtable contents to fp. This is valid only if the hashtable key and value are "plain" data. See below. |
bool read_nopointer_data(FILE *fp) | Read hashtable contents from fp. This is valid only if the hashtable key and value are "plain" data. See below. |
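To make the calling sequence concrete, here is a minimal sketch of these dense_hash_map-specific members in use (the sentinel key values -1 and -2 are arbitrary choices for illustration; any values never used as real keys would do):

```cpp
#include <string>
#include <google/dense_hash_map>

using google::dense_hash_map;

int main() {
  dense_hash_map<int, std::string> m;
  m.set_empty_key(-1);      // required first; -1 must never be a real key
  m.set_deleted_key(-2);    // required before erase(); distinct from -1
  m[1] = "one";
  m[2] = "two";
  m.erase(1);               // legal: a deleted key has been set
  m.clear_no_resize();      // empties the table but keeps its bucket array
  return 0;
}
```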
[1] dense_hash_map::iterator is not a mutable iterator, because dense_hash_map::value_type is not Assignable. That is, if i is of type dense_hash_map::iterator and p is of type dense_hash_map::value_type, then *i = p is not a valid expression. However, dense_hash_map::iterator isn't a constant iterator either, because it can be used to modify the object that it points to. Using the same notation as above, (*i).second = p is a valid expression.
[2] This member function relies on member template functions, which may not be supported by all compilers. If your compiler supports member templates, you can call this function with any type of input iterator. If your compiler does not yet support member templates, though, then the arguments must either be of type const value_type* or of type dense_hash_map::const_iterator.
[3] Since operator[] might insert a new element into the dense_hash_map, it can't possibly be a const member function. Note that the definition of operator[] is extremely simple: m[k] is equivalent to (*((m.insert(value_type(k, data_type()))).first)).second. Strictly speaking, this member function is unnecessary: it exists only for convenience.
[4] In order to preserve iterators, erasing hashtable elements does not cause a hashtable to resize. This means that after a string of erase() calls, the hashtable will use more space than is required. At a cost of invalidating all current iterators, you can call resize() to manually compact the hashtable. The hashtable promotes too-small resize() arguments to the smallest legal value, so to compact a hashtable, it's sufficient to call resize(0).
[5] Unlike some other hashtable implementations, the optional n in the calls to the constructor, resize, and rehash indicates not the desired number of buckets that should be allocated, but instead the expected number of items to be inserted. The class then sizes the hash-map appropriately for the number of items specified. It's not an error to actually insert more or fewer items into the hashtable, but the implementation is most efficient -- does the fewest hashtable resizes -- if the number of inserted items is n or slightly less.
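For example (a sketch; the figure of one million is arbitrary), passing the expected item count to the constructor avoids intermediate resizes:

```cpp
#include <google/dense_hash_map>

using google::dense_hash_map;

int main() {
  // n is the expected number of items, not a bucket count.
  dense_hash_map<int, int> m(1000000);
  m.set_empty_key(-1);     // still required before any other operation
  for (int i = 0; i < 1000000; ++i)
    m[i] = i;              // sized for ~1M items, so few or no resizes
  return 0;
}
```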
[6] dense_hash_map requires you call set_empty_key() immediately after constructing the hash-map, and before calling any other dense_hash_map method. (This is the largest difference between the dense_hash_map API and other hash-map APIs. See implementation.html for why this is necessary.) The argument to set_empty_key() should be a key-value that is never used for legitimate hash-map entries. If you have no such key value, you will be unable to use dense_hash_map. It is an error to call insert() with an item whose key is the "empty key."
dense_hash_map also requires you call set_deleted_key() before calling erase(). The argument to set_deleted_key() should be a key-value that is never used for legitimate hash-map entries. It must be different from the key-value used for set_empty_key(). It is an error to call erase() without first calling set_deleted_key(), and it is also an error to call insert() with an item whose key is the "deleted key." There is no need to call set_deleted_key() if you do not wish to call erase() on the hash-map.
It is acceptable to change the deleted-key at any time by calling set_deleted_key() with a new argument. You can also call clear_deleted_key(), at which point all keys become valid for insertion but no hashtable entries can be deleted until set_deleted_key() is called again.
[7] dense_hash_map requires that data_type has a zero-argument default constructor. This is because dense_hash_map uses the special value pair(empty_key, data_type()) to denote empty buckets, and thus needs to be able to create data_type using a zero-argument constructor.
If your data_type does not have a zero-argument default constructor, there are several workarounds, such as storing a pointer to the object in the map, or wrapping the object in a type that does provide a zero-argument constructor.
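For instance, a minimal sketch of the pointer workaround (the NoDefault type here is hypothetical, purely for illustration):

```cpp
#include <google/dense_hash_map>

using google::dense_hash_map;

struct NoDefault {              // hypothetical: no zero-argument constructor
  explicit NoDefault(int x) : x_(x) {}
  int x_;
};

int main() {
  // Pointers are default-constructible (to NULL), so data_type() works
  // for the pair(empty_key, data_type()) used to mark empty buckets.
  dense_hash_map<int, NoDefault*> m;
  m.set_empty_key(-1);
  m[5] = new NoDefault(42);
  delete m[5];                  // the caller owns the heap objects
  return 0;
}
```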
IMPORTANT IMPLEMENTATION NOTE: In the current version of this code, the input/output routines for dense_hash_map have not yet been implemented. This section explains the API, but note that all calls to these routines will fail (return false). It is a TODO to remedy this situation.
It is possible to save and restore dense_hash_map objects to disk. Storage takes place in two steps. The first writes the hashtable metadata. The second writes the actual data.
To write a hashtable to disk, first call write_metadata() on an open file pointer. This saves the hashtable information in a byte-order-independent format.
After the metadata has been written to disk, you must write the actual data stored in the hash-map to disk. If both the key and data are "simple" enough, you can do this by calling write_nopointer_data(). "Simple" data is data that can be safely copied to disk via fwrite(). Native C data types fall into this category, as do structs of native C data types. Pointers and STL objects do not.
Note that write_nopointer_data() does not do any endian conversion. Thus, it is only appropriate when you intend to read the data on the same endian architecture as you write the data.
If you cannot use write_nopointer_data() for any reason, you can write the data yourself by iterating over the dense_hash_map with a const_iterator and writing the key and data in any manner you wish.
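A sketch of such a custom write loop, assuming for illustration int keys and std::string values (an STL string is exactly the kind of data write_nopointer_data() cannot handle); error handling is omitted, and write_metadata() is assumed to have been called already:

```cpp
#include <cstdio>
#include <string>
#include <google/dense_hash_map>

using google::dense_hash_map;

// Serialize each entry by hand: the int key, then a length-prefixed string.
void write_map_data(const dense_hash_map<int, std::string>& ht, FILE* fp) {
  for (dense_hash_map<int, std::string>::const_iterator it = ht.begin();
       it != ht.end(); ++it) {
    fwrite(&it->first, sizeof(it->first), 1, fp);
    size_t len = it->second.size();
    fwrite(&len, sizeof(len), 1, fp);
    fwrite(it->second.data(), 1, len, fp);
  }
}
```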
To read the hashtable information from disk, first you must create a dense_hash_map object. Then open a file pointer to point to the saved hashtable, and call read_metadata(). If you saved the data via write_nopointer_data(), you can follow the read_metadata() call with a call to read_nopointer_data(). This is all that is needed.
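Putting the pieces together, a hedged round-trip sketch for "simple" keys and values might look like this (per the implementation note above, these routines return false in this release, so the results are checked):

```cpp
#include <cstdio>
#include <google/dense_hash_map>

using google::dense_hash_map;

bool save(dense_hash_map<int, double>& ht, const char* path) {
  FILE* fp = fopen(path, "wb");
  if (!fp) return false;
  // Metadata first, then the raw table contents.
  bool ok = ht.write_metadata(fp) && ht.write_nopointer_data(fp);
  fclose(fp);
  return ok;
}

bool restore(dense_hash_map<int, double>* ht, const char* path) {
  // Assumes *ht was constructed (and set_empty_key() called) beforehand.
  FILE* fp = fopen(path, "rb");
  if (!fp) return false;
  bool ok = ht->read_metadata(fp) && ht->read_nopointer_data(fp);
  fclose(fp);
  return ok;
}
```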
If you saved the data through a custom write routine, you must call a custom read routine to read in the data. To do this, iterate over the dense_hash_map with an iterator; this works because the metadata has already been set up. For each item, read the key and value from disk and set them appropriately. You will need to do a const_cast on the iterator, since it->first is always const. You will also need to use placement-new if the key or value is a C++ object. The code might look like this:
```cpp
for (dense_hash_map<int*, ComplicatedClass>::iterator it = ht.begin();
     it != ht.end(); ++it) {
  // The key is stored in the dense_hash_map as a pointer.  it->first is
  // const, so cast to a non-const reference in order to assign to it.
  const_cast<int*&>(it->first) = new int;
  fread(it->first, sizeof(int), 1, fp);
  // The value is a complicated C++ class that takes an int to construct.
  int ctor_arg;
  fread(&ctor_arg, sizeof(int), 1, fp);
  new (&it->second) ComplicatedClass(ctor_arg);   // "placement new"
}
```
erase() is guaranteed not to invalidate any iterators -- except for any iterators pointing to the item being erased, of course. insert() invalidates all iterators, as does resize().
This is implemented by making erase() not resize the hashtable. If you desire maximum space efficiency, you can call resize(0) after a string of erase() calls, to force the hashtable to resize to the smallest possible size.
In addition to invalidating iterators, insert() and resize() invalidate all pointers into the hashtable. If you want to store a pointer to an object held in a dense_hash_map, either do so after finishing hashtable inserts, or store the object on the heap and a pointer to it in the dense_hash_map.
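A brief sketch of the safe pattern (the Record type is illustrative): store heap objects in the map and keep pointers to those objects, rather than pointers into the table:

```cpp
#include <google/dense_hash_map>

using google::dense_hash_map;

struct Record { int payload; };   // illustrative value type

int main() {
  dense_hash_map<int, Record*> m;
  m.set_empty_key(-1);
  m[1] = new Record();
  Record* stable = m[1];   // points at the heap, not into the table
  m[2] = new Record();     // may move the table's storage; 'stable' is fine
  stable->payload = 7;
  delete m[1];
  delete m[2];
  return 0;
}
```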
The following are SGI STL, and some Google STL, concepts and classes related to dense_hash_map.
hash_map, Associative Container, Hashed Associative Container, Pair Associative Container, Unique Hashed Associative Container, set, map, multiset, multimap, hash_set, hash_multiset, hash_multimap, sparse_hash_map, sparse_hash_set, dense_hash_set

dense_hash_set<Key, HashFcn, EqualKey, Alloc>

[Note: this document is formatted similarly to the SGI STL implementation documentation pages, and refers to concepts and classes defined there. However, neither this document nor the code it describes is associated with SGI, nor is it necessary to have SGI's STL implementation installed in order to use this class.]
dense_hash_set is a Hashed Associative Container that stores objects of type Key. dense_hash_set is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Unique Associative Container, meaning that no two elements have keys that compare equal using EqualKey.
Looking up an element in a dense_hash_set by its key is efficient, so dense_hash_set is useful for "dictionaries" where the order of elements is irrelevant. If it is important for the elements to be in a particular order, however, then set is more appropriate.
dense_hash_set is distinguished from other hash-set implementations by its speed and by the ability to save and restore contents to disk. On the other hand, this hash-set implementation can use significantly more space than other hash-set implementations, and it also has requirements -- for instance, for a distinguished "empty key" -- that may not be easy for all applications to satisfy.
This class is appropriate for applications that need speedy access to relatively small "dictionaries" stored in memory, or for applications that need these dictionaries to be persistent. (See the implementation note.)
(Note: The example below uses SGI semantics for hash<> -- the kind used by gcc and most Unix compiler suites -- and not Dinkumware semantics -- the kind used by Microsoft Visual Studio. If you are using MSVC, this example will not compile as-is: you'll need to change hash to hash_compare, and you won't use eqstr at all. See the MSVC documentation for hash_map and hash_compare for more details.)
```cpp
#include <cstring>   // for strcmp
#include <iostream>
#include <google/dense_hash_set>

using google::dense_hash_set;      // namespace where class lives by default
using std::cout;
using std::endl;
using ext::hash;  // or __gnu_cxx::hash, or maybe tr1::hash, depending on your OS

struct eqstr {
  bool operator()(const char* s1, const char* s2) const {
    return (s1 == s2) || (s1 && s2 && strcmp(s1, s2) == 0);
  }
};

void lookup(const dense_hash_set<const char*, hash<const char*>, eqstr>& Set,
            const char* word) {
  dense_hash_set<const char*, hash<const char*>, eqstr>::const_iterator it
      = Set.find(word);
  cout << word << ": " << (it != Set.end() ? "present" : "not present") << endl;
}

int main() {
  dense_hash_set<const char*, hash<const char*>, eqstr> Set;
  Set.set_empty_key(NULL);

  Set.insert("kiwi");
  Set.insert("plum");
  Set.insert("apple");
  Set.insert("mango");
  Set.insert("apricot");
  Set.insert("banana");

  lookup(Set, "mango");
  lookup(Set, "apple");
  lookup(Set, "durian");
}
```
(dense_hash_set is similar in API to the tr1 class unordered_set.)
Parameter | Description | Default |
---|---|---|
Key | The hash_set's key and value type. This is also defined as dense_hash_set::key_type and dense_hash_set::value_type. | |
HashFcn | The hash function used by the hash_set. This is also defined as dense_hash_set::hasher. Note: Hashtable performance depends heavily on the choice of hash function. See the performance page for more information. | hash<Key> |
EqualKey | The hash_set key equality function: a binary predicate that determines whether two keys are equal. This is also defined as dense_hash_set::key_equal. | equal_to<Key> |
Alloc | The STL allocator to use. By default, uses the provided allocator libc_allocator_with_realloc, which likely gives better performance than other STL allocators due to its built-in support for realloc, which this container takes advantage of. If you use an allocator other than the default, note that this container imposes an additional requirement on the STL allocator type beyond those in [lib.allocator.requirements]: it does not support allocators that define alternate memory models. That is, it assumes that pointer, const_pointer, size_type, and difference_type are just T*, const T*, size_t, and ptrdiff_t, respectively. This is also defined as dense_hash_set::allocator_type. | libc_allocator_with_realloc |
Member | Where defined | Description |
---|---|---|
value_type | Container | The type of object, T, stored in the hash_set. |
key_type | Associative Container | The key type associated with value_type. |
hasher | Hashed Associative Container | The dense_hash_set's hash function. |
key_equal | Hashed Associative Container | Function object that compares keys for equality. |
allocator_type | Unordered Associative Container (tr1) | The type of the Allocator given as a template parameter. |
pointer | Container | Pointer to T. |
reference | Container | Reference to T |
const_reference | Container | Const reference to T |
size_type | Container | An unsigned integral type. |
difference_type | Container | A signed integral type. |
iterator | Container | Iterator used to iterate through a dense_hash_set. |
const_iterator | Container | Const iterator used to iterate through a dense_hash_set. (iterator and const_iterator are the same type.) |
local_iterator | Unordered Associative Container (tr1) | Iterator used to iterate through a subset of dense_hash_set. |
const_local_iterator | Unordered Associative Container (tr1) | Const iterator used to iterate through a subset of dense_hash_set. |
iterator begin() const | Container | Returns an iterator pointing to the beginning of the dense_hash_set. |
iterator end() const | Container | Returns an iterator pointing to the end of the dense_hash_set. |
local_iterator begin(size_type i) | Unordered Associative Container (tr1) | Returns a local_iterator pointing to the beginning of bucket i in the dense_hash_set. |
local_iterator end(size_type i) | Unordered Associative Container (tr1) | Returns a local_iterator pointing to the end of bucket i in the dense_hash_set. For dense_hash_set, each bucket contains either 0 or 1 item. |
const_local_iterator begin(size_type i) const | Unordered Associative Container (tr1) | Returns a const_local_iterator pointing to the beginning of bucket i in the dense_hash_set. |
const_local_iterator end(size_type i) const | Unordered Associative Container (tr1) | Returns a const_local_iterator pointing to the end of bucket i in the dense_hash_set. For dense_hash_set, each bucket contains either 0 or 1 item. |
size_type size() const | Container | Returns the size of the dense_hash_set. |
size_type max_size() const | Container | Returns the largest possible size of the dense_hash_set. |
bool empty() const | Container | true if the dense_hash_set's size is 0. |
size_type bucket_count() const | Hashed Associative Container | Returns the number of buckets used by the dense_hash_set. |
size_type max_bucket_count() const | Hashed Associative Container | Returns the largest possible number of buckets used by the dense_hash_set. |
size_type bucket_size(size_type i) const | Unordered Associative Container (tr1) | Returns the number of elements in bucket i. For dense_hash_set, this will be either 0 or 1. |
size_type bucket(const key_type& key) const | Unordered Associative Container (tr1) | If the key exists in the set, returns the index of the bucket containing the given key; otherwise, returns the bucket the key would be inserted into. This value may be passed to begin(size_type) and end(size_type). |
float load_factor() const | Unordered Associative Container (tr1) | The number of elements in the dense_hash_set divided by the number of buckets. |
float max_load_factor() const | Unordered Associative Container (tr1) | The maximum load factor before increasing the number of buckets in the dense_hash_set. |
void max_load_factor(float new_grow) | Unordered Associative Container (tr1) | Sets the maximum load factor before increasing the number of buckets in the dense_hash_set. |
float min_load_factor() const | dense_hash_set | The minimum load factor before decreasing the number of buckets in the dense_hash_set. |
void min_load_factor(float new_grow) | dense_hash_set | Sets the minimum load factor before decreasing the number of buckets in the dense_hash_set. |
void set_resizing_parameters(float shrink, float grow) | dense_hash_set | DEPRECATED. See below. |
void resize(size_type n) | Hashed Associative Container | Increases the bucket count to hold at least n items. [2] [3] |
void rehash(size_type n) | Unordered Associative Container (tr1) | Increases the bucket count to hold at least n items. This is identical to resize. [2] [3] |
hasher hash_funct() const | Hashed Associative Container | Returns the hasher object used by the dense_hash_set. |
hasher hash_function() const | Unordered Associative Container (tr1) | Returns the hasher object used by the dense_hash_set. This is identical to hash_funct. |
key_equal key_eq() const | Hashed Associative Container | Returns the key_equal object used by the dense_hash_set. |
allocator_type get_allocator() const | Unordered Associative Container (tr1) | Returns the allocator_type object used by the dense_hash_set: either the one passed in to the constructor, or a default Alloc instance. |
dense_hash_set() | Container | Creates an empty dense_hash_set. |
dense_hash_set(size_type n) | Hashed Associative Container | Creates an empty dense_hash_set that's optimized for holding up to n items. [3] |
dense_hash_set(size_type n, const hasher& h) | Hashed Associative Container | Creates an empty dense_hash_set that's optimized for up to n items, using h as the hash function. |
dense_hash_set(size_type n, const hasher& h, const key_equal& k) | Hashed Associative Container | Creates an empty dense_hash_set that's optimized for up to n items, using h as the hash function and k as the key equal function. |
dense_hash_set(size_type n, const hasher& h, const key_equal& k, const allocator_type& a) | Unordered Associative Container (tr1) | Creates an empty dense_hash_set that's optimized for up to n items, using h as the hash function, k as the key equal function, and a as the allocator object. |
template <class InputIterator> dense_hash_set(InputIterator f, InputIterator l) [1] | Unique Hashed Associative Container | Creates a dense_hash_set with a copy of a range. |
template <class InputIterator> dense_hash_set(InputIterator f, InputIterator l, size_type n) [1] | Unique Hashed Associative Container | Creates a hash_set with a copy of a range that's optimized to hold up to n items. |
template <class InputIterator> dense_hash_set(InputIterator f, InputIterator l, size_type n, const hasher& h) [1] | Unique Hashed Associative Container | Creates a hash_set with a copy of a range that's optimized to hold up to n items, using h as the hash function. |
template <class InputIterator> dense_hash_set(InputIterator f, InputIterator l, size_type n, const hasher& h, const key_equal& k) [1] | Unique Hashed Associative Container | Creates a hash_set with a copy of a range that's optimized for holding up to n items, using h as the hash function and k as the key equal function. |
template <class InputIterator> dense_hash_set(InputIterator f, InputIterator l, size_type n, const hasher& h, const key_equal& k, const allocator_type& a) [1] | Unordered Associative Container (tr1) | Creates a hash_set with a copy of a range that's optimized for holding up to n items, using h as the hash function, k as the key equal function, and a as the allocator object. |
dense_hash_set(const hash_set&) | Container | The copy constructor. |
dense_hash_set& operator=(const hash_set&) | Container | The assignment operator |
void swap(hash_set&) | Container | Swaps the contents of two hash_sets. |
pair<iterator, bool> insert(const value_type& x) | Unique Associative Container | Inserts x into the dense_hash_set. |
template <class InputIterator> void insert(InputIterator f, InputIterator l) [1] | Unique Associative Container | Inserts a range into the dense_hash_set. |
void set_empty_key(const key_type& key) [4] | dense_hash_set | See below. |
void set_deleted_key(const key_type& key) [4] | dense_hash_set | See below. |
void clear_deleted_key() [4] | dense_hash_set | See below. |
void erase(iterator pos) | Associative Container | Erases the element pointed to by pos. [4] |
size_type erase(const key_type& k) | Associative Container | Erases the element whose key is k. [4] |
void erase(iterator first, iterator last) | Associative Container | Erases all elements in a range. [4] |
void clear() | Associative Container | Erases all of the elements. |
void clear_no_resize() | dense_hash_set | See below. |
iterator find(const key_type& k) const | Associative Container | Finds an element whose key is k. |
size_type count(const key_type& k) const | Unique Associative Container | Counts the number of elements whose key is k. |
pair<iterator, iterator> equal_range(const key_type& k) const | Associative Container | Finds a range containing all elements whose key is k. |
bool write_metadata(FILE *fp) | dense_hash_set | See below. |
bool read_metadata(FILE *fp) | dense_hash_set | See below. |
bool write_nopointer_data(FILE *fp) | dense_hash_set | See below. |
bool read_nopointer_data(FILE *fp) | dense_hash_set | See below. |
bool operator==(const hash_set&, const hash_set&) | Hashed Associative Container | Tests two hash_sets for equality. This is a global function, not a member function. |
Member | Description |
---|---|
void set_empty_key(const key_type& key) | Sets the distinguished "empty" key to key. This must be called immediately after construction, before any other dense_hash_set operation. [4] |
void set_deleted_key(const key_type& key) | Sets the distinguished "deleted" key to key. This must be called before any calls to erase(). [4] |
void clear_deleted_key() | Clears the distinguished "deleted" key. After this is called, calls to erase() are not valid on this object. [4] |
void clear_no_resize() | Clears the hashtable like clear() does, but does not recover the memory used for hashtable buckets. (The memory used by the items in the hashtable is still recovered.) This can save time for applications that want to reuse a dense_hash_set many times, each time with a similar number of objects. |
void set_resizing_parameters(float shrink, float grow) | This function is DEPRECATED. It is equivalent to calling min_load_factor(shrink); max_load_factor(grow). |
bool write_metadata(FILE *fp) | Write hashtable metadata to fp. See below. |
bool read_metadata(FILE *fp) | Read hashtable metadata from fp. See below. |
bool write_nopointer_data(FILE *fp) | Write hashtable contents to fp. This is valid only if the hashtable key and value are "plain" data. See below. |
bool read_nopointer_data(FILE *fp) | Read hashtable contents from fp. This is valid only if the hashtable key and value are "plain" data. See below. |
[1] This member function relies on member template functions, which may not be supported by all compilers. If your compiler supports member templates, you can call this function with any type of input iterator. If your compiler does not yet support member templates, though, then the arguments must either be of type const value_type* or of type dense_hash_set::const_iterator.
[2] In order to preserve iterators, erasing hashtable elements does not cause a hashtable to resize. This means that after a string of erase() calls, the hashtable will use more space than is required. At a cost of invalidating all current iterators, you can call resize() to manually compact the hashtable. The hashtable promotes too-small resize() arguments to the smallest legal value, so to compact a hashtable, it's sufficient to call resize(0).
[3] Unlike some other hashtable implementations, the optional n in the calls to the constructor, resize, and rehash indicates not the desired number of buckets that should be allocated, but instead the expected number of items to be inserted. The class then sizes the hash-set appropriately for the number of items specified. It's not an error to actually insert more or fewer items into the hashtable, but the implementation is most efficient -- does the fewest hashtable resizes -- if the number of inserted items is n or slightly less.
[4] dense_hash_set requires you call set_empty_key() immediately after constructing the hash-set, and before calling any other dense_hash_set method. (This is the largest difference between the dense_hash_set API and other hash-set APIs. See implementation.html for why this is necessary.) The argument to set_empty_key() should be a key-value that is never used for legitimate hash-set entries. If you have no such key value, you will be unable to use dense_hash_set. It is an error to call insert() with an item whose key is the "empty key."
dense_hash_set also requires you call set_deleted_key() before calling erase(). The argument to set_deleted_key() should be a key-value that is never used for legitimate hash-set entries. It must be different from the key-value used for set_empty_key(). It is an error to call erase() without first calling set_deleted_key(), and it is also an error to call insert() with an item whose key is the "deleted key." There is no need to call set_deleted_key() if you do not wish to call erase() on the hash-set.
It is acceptable to change the deleted-key at any time by calling set_deleted_key() with a new argument. You can also call clear_deleted_key(), at which point all keys become valid for insertion but no hashtable entries can be deleted until set_deleted_key() is called again.
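A minimal sketch of this deleted-key lifecycle for dense_hash_set (the sentinel values are arbitrary choices for illustration):

```cpp
#include <google/dense_hash_set>

using google::dense_hash_set;

int main() {
  dense_hash_set<int> s;
  s.set_empty_key(-1);     // required before any other operation
  s.set_deleted_key(-2);   // required before erase(); distinct from -1
  s.insert(10);
  s.erase(10);
  s.clear_deleted_key();   // -2 becomes insertable; erase() is now invalid
  s.set_deleted_key(-3);   // choose a new deleted key; erase() works again
  return 0;
}
```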
IMPORTANT IMPLEMENTATION NOTE: In the current version of this code, the input/output routines for dense_hash_set have not yet been implemented. This section explains the API, but note that all calls to these routines will fail (return false). It is a TODO to remedy this situation.
It is possible to save and restore dense_hash_set objects to disk. Storage takes place in two steps. The first writes the hashtable metadata. The second writes the actual data.
To write a hashtable to disk, first call write_metadata() on an open file pointer. This saves the hashtable information in a byte-order-independent format.
After the metadata has been written to disk, you must write the actual data stored in the hash-set to disk. If both the key and data are "simple" enough, you can do this by calling write_nopointer_data(). "Simple" data is data that can be safely copied to disk via fwrite(). Native C data types fall into this category, as do structs of native C data types. Pointers and STL objects do not.
Note that write_nopointer_data() does not do any endian conversion. Thus, it is only appropriate when you intend to read the data on the same endian architecture as you write the data.
If you cannot use write_nopointer_data() for any reason, you can write the data yourself by iterating over the dense_hash_set with a const_iterator and writing the key and data in any manner you wish.
To read the hashtable information from disk, first you must create a dense_hash_set object. Then open a file pointer to point to the saved hashtable, and call read_metadata(). If you saved the data via write_nopointer_data(), you can follow the read_metadata() call with a call to read_nopointer_data(). This is all that is needed.
If you saved the data through a custom write routine, you must call a custom read routine to read in the data. To do this, iterate over the dense_hash_set with an iterator; this works because the metadata has already been set up. For each item, read the key and value from disk and set them appropriately. You will need to do a const_cast on the iterator, since *it is always const. The code might look like this:
```cpp
for (dense_hash_set<int*>::iterator it = ht.begin(); it != ht.end(); ++it) {
  // *it is const, so cast to a non-const reference in order to assign to it.
  const_cast<int*&>(*it) = new int;
  fread(*it, sizeof(int), 1, fp);
}
```
Here's another example, where the item stored in the hash-set is a C++ object with a non-trivial constructor. In this case, you must use "placement new" to construct the object at the correct memory location.
```cpp
for (dense_hash_set<ComplicatedClass>::iterator it = ht.begin();
     it != ht.end(); ++it) {
  int ctor_arg;   // ComplicatedClass takes an int as its constructor arg
  fread(&ctor_arg, sizeof(int), 1, fp);
  new (const_cast<ComplicatedClass*>(&(*it))) ComplicatedClass(ctor_arg);
}
```
erase() is guaranteed not to invalidate any iterators -- except for any iterators pointing to the item being erased, of course. insert() invalidates all iterators, as does resize().
This is implemented by making erase() not resize the hashtable. If you desire maximum space efficiency, you can call resize(0) after a string of erase() calls, to force the hashtable to resize to the smallest possible size.
In addition to invalidating iterators, insert() and resize() invalidate all pointers into the hashtable. If you want to store a pointer to an object held in a dense_hash_set, either do so after finishing hashtable inserts, or store the object on the heap and a pointer to it in the dense_hash_set.
The following are SGI STL, and some Google STL, concepts and classes related to dense_hash_set.
hash_set, Associative Container, Hashed Associative Container, Simple Associative Container, Unique Hashed Associative Container, set, map, multiset, multimap, hash_map, hash_multiset, hash_multimap, sparse_hash_set, sparse_hash_map, dense_hash_map

sparse_hash_map<Key, Data, HashFcn, EqualKey, Alloc>

[Note: this document is formatted similarly to the SGI STL implementation documentation pages, and refers to concepts and classes defined there. However, neither this document nor the code it describes is associated with SGI, nor is it necessary to have SGI's STL implementation installed in order to use this class.]
sparse_hash_map is a Hashed Associative Container that associates objects of type Key with objects of type Data. sparse_hash_map is a Pair Associative Container, meaning that its value type is pair<const Key, Data>. It is also a Unique Associative Container, meaning that no two elements have keys that compare equal using EqualKey.
Looking up an element in a sparse_hash_map by its key is efficient, so sparse_hash_map is useful for "dictionaries" where the order of elements is irrelevant. If it is important for the elements to be in a particular order, however, then map is more appropriate.
sparse_hash_map is distinguished from other hash-map implementations by its stingy use of memory and by the ability to save and restore contents to disk. On the other hand, this hash-map implementation, while still efficient, is slower than other hash-map implementations, and it also has requirements -- for instance, for a distinguished "deleted key" -- that may not be easy for all applications to satisfy.
This class is appropriate for applications that need to store large "dictionaries" in memory, or for applications that need these dictionaries to be persistent.
(Note: The example below uses SGI semantics for hash<> -- the kind used by gcc and most Unix compiler suites -- and not Dinkumware semantics -- the kind used by Microsoft Visual Studio. If you are using MSVC, this example will not compile as-is: you'll need to change hash to hash_compare, and you won't use eqstr at all. See the MSVC documentation for hash_map and hash_compare for more details.)
```cpp
#include <cstring>   // for strcmp
#include <iostream>
#include <google/sparse_hash_map>

using google::sparse_hash_map;     // namespace where class lives by default
using std::cout;
using std::endl;
using ext::hash;  // or __gnu_cxx::hash, or maybe tr1::hash, depending on your OS

struct eqstr {
  bool operator()(const char* s1, const char* s2) const {
    return (s1 == s2) || (s1 && s2 && strcmp(s1, s2) == 0);
  }
};

int main() {
  sparse_hash_map<const char*, int, hash<const char*>, eqstr> months;

  months["january"] = 31;
  months["february"] = 28;
  months["march"] = 31;
  months["april"] = 30;
  months["may"] = 31;
  months["june"] = 30;
  months["july"] = 31;
  months["august"] = 31;
  months["september"] = 30;
  months["october"] = 31;
  months["november"] = 30;
  months["december"] = 31;

  cout << "september -> " << months["september"] << endl;
  cout << "april     -> " << months["april"] << endl;
  cout << "june      -> " << months["june"] << endl;
  cout << "november  -> " << months["november"] << endl;
}
```
(sparse_hash_map is similar in API to the tr1 class unordered_map.)
Parameter | Description | Default |
---|---|---|
Key | The hash_map's key type. This is also defined as sparse_hash_map::key_type. | |
Data | The hash_map's data type. This is also defined as sparse_hash_map::data_type. | |
HashFcn | The hash function used by the hash_map. This is also defined as sparse_hash_map::hasher. Note: Hashtable performance depends heavily on the choice of hash function. See the performance page for more information. | hash<Key> |
EqualKey | The hash_map key equality function: a binary predicate that determines whether two keys are equal. This is also defined as sparse_hash_map::key_equal. | equal_to<Key> |
Alloc | The STL allocator to use. By default, uses the provided allocator libc_allocator_with_realloc, which likely gives better performance than other STL allocators due to its built-in support for realloc, which this container takes advantage of. If you use an allocator other than the default, note that this container imposes an additional requirement on the STL allocator type beyond those in [lib.allocator.requirements]: it does not support allocators that define alternate memory models. That is, it assumes that pointer, const_pointer, size_type, and difference_type are just T*, const T*, size_t, and ptrdiff_t, respectively. This is also defined as sparse_hash_map::allocator_type. | libc_allocator_with_realloc |
Member | Where defined | Description |
---|---|---|
key_type | Associative Container | The sparse_hash_map's key type, Key. |
data_type | Pair Associative Container | The type of object associated with the keys. |
value_type | Pair Associative Container | The type of object, pair<const key_type, data_type>, stored in the hash_map. |
hasher | Hashed Associative Container | The sparse_hash_map's hash function. |
key_equal | Hashed Associative Container | Function object that compares keys for equality. |
allocator_type | Unordered Associative Container (tr1) | The type of the Allocator given as a template parameter. |
pointer | Container | Pointer to T. |
reference | Container | Reference to T |
const_reference | Container | Const reference to T |
size_type | Container | An unsigned integral type. |
difference_type | Container | A signed integral type. |
iterator | Container | Iterator used to iterate through a sparse_hash_map. [1] |
const_iterator | Container | Const iterator used to iterate through a sparse_hash_map. |
local_iterator | Unordered Associative Container (tr1) | Iterator used to iterate through a subset of sparse_hash_map. [1] |
const_local_iterator | Unordered Associative Container (tr1) | Const iterator used to iterate through a subset of sparse_hash_map. |
iterator begin() | Container | Returns an iterator pointing to the beginning of the sparse_hash_map. |
iterator end() | Container | Returns an iterator pointing to the end of the sparse_hash_map. |
const_iterator begin() const | Container | Returns a const_iterator pointing to the beginning of the sparse_hash_map. |
const_iterator end() const | Container | Returns a const_iterator pointing to the end of the sparse_hash_map. |
local_iterator begin(size_type i) | Unordered Associative Container (tr1) | Returns a local_iterator pointing to the beginning of bucket i in the sparse_hash_map. |
local_iterator end(size_type i) | Unordered Associative Container (tr1) | Returns a local_iterator pointing to the end of bucket i in the sparse_hash_map. For sparse_hash_map, each bucket contains either 0 or 1 item. |
const_local_iterator begin(size_type i) const | Unordered Associative Container (tr1) | Returns a const_local_iterator pointing to the beginning of bucket i in the sparse_hash_map. |
const_local_iterator end(size_type i) const | Unordered Associative Container (tr1) | Returns a const_local_iterator pointing to the end of bucket i in the sparse_hash_map. For sparse_hash_map, each bucket contains either 0 or 1 item. |
size_type size() const | Container | Returns the size of the sparse_hash_map. |
size_type max_size() const | Container | Returns the largest possible size of the sparse_hash_map. |
bool empty() const | Container | true if the sparse_hash_map's size is 0. |
size_type bucket_count() const | Hashed Associative Container | Returns the number of buckets used by the sparse_hash_map. |
size_type max_bucket_count() const | Hashed Associative Container | Returns the largest possible number of buckets used by the sparse_hash_map. |
size_type bucket_size(size_type i) const | Unordered Associative Container (tr1) | Returns the number of elements in bucket i. For sparse_hash_map, this will be either 0 or 1. |
size_type bucket(const key_type& key) const | Unordered Associative Container (tr1) | If the key exists in the map, returns the index of the bucket containing the given key; otherwise, returns the bucket the key would be inserted into. This value may be passed to begin(size_type) and end(size_type). |
float load_factor() const | Unordered Associative Container (tr1) | The number of elements in the sparse_hash_map divided by the number of buckets. |
float max_load_factor() const | Unordered Associative Container (tr1) | The maximum load factor before increasing the number of buckets in the sparse_hash_map. |
void max_load_factor(float new_grow) | Unordered Associative Container (tr1) | Sets the maximum load factor before increasing the number of buckets in the sparse_hash_map. |
float min_load_factor() const | sparse_hash_map | The minimum load factor before decreasing the number of buckets in the sparse_hash_map. |
void min_load_factor(float new_grow) | sparse_hash_map | Sets the minimum load factor before decreasing the number of buckets in the sparse_hash_map. |
void set_resizing_parameters(float shrink, float grow) | sparse_hash_map | DEPRECATED. See below. |
void resize(size_type n) | Hashed Associative Container | Increases the bucket count to hold at least n items. [4] [5] |
void rehash(size_type n) | Unordered Associative Container (tr1) | Increases the bucket count to hold at least n items. This is identical to resize. [4] [5] |
hasher hash_funct() const | Hashed Associative Container | Returns the hasher object used by the sparse_hash_map. |
hasher hash_function() const | Unordered Associative Container (tr1) | Returns the hasher object used by the sparse_hash_map. This is identical to hash_funct. |
key_equal key_eq() const | Hashed Associative Container | Returns the key_equal object used by the sparse_hash_map. |
allocator_type get_allocator() const | Unordered Associative Container (tr1) | Returns the allocator_type object used by the sparse_hash_map: either the one passed in to the constructor, or a default Alloc instance. |
sparse_hash_map() | Container | Creates an empty sparse_hash_map. |
sparse_hash_map(size_type n) | Hashed Associative Container | Creates an empty sparse_hash_map that's optimized for holding up to n items. [5] |
sparse_hash_map(size_type n, const hasher& h) | Hashed Associative Container | Creates an empty sparse_hash_map that's optimized for up to n items, using h as the hash function. |
sparse_hash_map(size_type n, const hasher& h, const key_equal& k) | Hashed Associative Container | Creates an empty sparse_hash_map that's optimized for up to n items, using h as the hash function and k as the key equal function. |
sparse_hash_map(size_type n, const hasher& h, const key_equal& k, const allocator_type& a) | Unordered Associative Container (tr1) | Creates an empty sparse_hash_map that's optimized for up to n items, using h as the hash function, k as the key equal function, and a as the allocator object. |
template <class InputIterator> sparse_hash_map(InputIterator f, InputIterator l) [2] | Unique Hashed Associative Container | Creates a sparse_hash_map with a copy of a range. |
template <class InputIterator> sparse_hash_map(InputIterator f, InputIterator l, size_type n) [2] | Unique Hashed Associative Container | Creates a hash_map with a copy of a range that's optimized to hold up to n items. |
template <class InputIterator> sparse_hash_map(InputIterator f, InputIterator l, size_type n, const hasher& h) [2] | Unique Hashed Associative Container | Creates a hash_map with a copy of a range that's optimized to hold up to n items, using h as the hash function. |
template <class InputIterator> sparse_hash_map(InputIterator f, InputIterator l, size_type n, const hasher& h, const key_equal& k) [2] | Unique Hashed Associative Container | Creates a hash_map with a copy of a range that's optimized for holding up to n items, using h as the hash function and k as the key equal function. |
template <class InputIterator> sparse_hash_map(InputIterator f, InputIterator l, size_type n, const hasher& h, const key_equal& k, const allocator_type& a) [2] | Unordered Associative Container (tr1) | Creates a hash_map with a copy of a range that's optimized for holding up to n items, using h as the hash function, k as the key equal function, and a as the allocator object. |
sparse_hash_map(const sparse_hash_map&) | Container | The copy constructor. |
sparse_hash_map& operator=(const sparse_hash_map&) | Container | The assignment operator. |
void swap(sparse_hash_map&) | Container | Swaps the contents of two sparse_hash_maps. |
pair<iterator, bool> insert(const value_type& x) |
Unique Associative Container | Inserts x into the sparse_hash_map. |
template <class InputIterator> void insert(InputIterator f, InputIterator l)[2] |
Unique Associative Container | Inserts a range into the sparse_hash_map. |
void set_deleted_key(const key_type& key) [6] | sparse_hash_map | See below. |
void clear_deleted_key() [6] | sparse_hash_map | See below. |
void erase(iterator pos) | Associative Container | Erases the element pointed to by pos. [6] |
size_type erase(const key_type& k) | Associative Container | Erases the element whose key is k. [6] |
void erase(iterator first, iterator last) | Associative Container | Erases all elements in a range. [6] |
void clear() | Associative Container | Erases all of the elements. |
const_iterator find(const key_type& k) const | Associative Container | Finds an element whose key is k. |
iterator find(const key_type& k) | Associative Container | Finds an element whose key is k. |
size_type count(const key_type& k) const | Unique Associative Container | Counts the number of elements whose key is k. |
pair<const_iterator, const_iterator> equal_range(const key_type& k) const |
Associative Container | Finds a range containing all elements whose key is k. |
pair<iterator, iterator> equal_range(const key_type& k) |
Associative Container | Finds a range containing all elements whose key is k. |
data_type& operator[](const key_type& k) [3] |
sparse_hash_map | See below. |
bool write_metadata(FILE *fp) | sparse_hash_map | See below. |
bool read_metadata(FILE *fp) | sparse_hash_map | See below. |
bool write_nopointer_data(FILE *fp) | sparse_hash_map | See below. |
bool read_nopointer_data(FILE *fp) | sparse_hash_map | See below. |
bool operator==(const sparse_hash_map&, const sparse_hash_map&) |
Hashed Associative Container | Tests two sparse_hash_maps for equality. This is a global function, not a member function. |
Member | Description |
---|---|
void set_deleted_key(const key_type& key) | Sets the distinguished "deleted" key to key. This must be called before any calls to erase(). [6] |
void clear_deleted_key() | Clears the distinguished "deleted" key. After this is called, calls to erase() are not valid on this object. [6] |
data_type& operator[](const key_type& k) [3] |
Returns a reference to the object that is associated with a particular key. If the sparse_hash_map does not already contain such an object, operator[] inserts the default object data_type(). [3] |
void set_resizing_parameters(float shrink, float grow) | This function is DEPRECATED. It is equivalent to calling min_load_factor(shrink); max_load_factor(grow). |
bool write_metadata(FILE *fp) | Write hashtable metadata to fp. See below. |
bool read_metadata(FILE *fp) | Read hashtable metadata from fp. See below. |
bool write_nopointer_data(FILE *fp) | Write hashtable contents to fp. This is valid only if the hashtable key and value are "plain" data. See below. |
bool read_nopointer_data(FILE *fp) | Read hashtable contents from fp. This is valid only if the hashtable key and value are "plain" data. See below. |
[1] sparse_hash_map::iterator is not a mutable iterator, because sparse_hash_map::value_type is not Assignable. That is, if i is of type sparse_hash_map::iterator and p is of type sparse_hash_map::value_type, then *i = p is not a valid expression. However, sparse_hash_map::iterator isn't a constant iterator either, because it can be used to modify the object that it points to. Using the same notation as above, (*i).second = p is a valid expression.
[2] This member function relies on member template functions, which may not be supported by all compilers. If your compiler supports member templates, you can call this function with any type of input iterator. If your compiler does not yet support member templates, though, then the arguments must either be of type const value_type* or of type sparse_hash_map::const_iterator.
[3] Since operator[] might insert a new element into the sparse_hash_map, it can't possibly be a const member function. Note that the definition of operator[] is extremely simple: m[k] is equivalent to (*((m.insert(value_type(k, data_type()))).first)).second. Strictly speaking, this member function is unnecessary: it exists only for convenience.
[4] In order to preserve iterators, erasing hashtable elements does not cause a hashtable to resize. This means that after a string of erase() calls, the hashtable will use more space than is required. At a cost of invalidating all current iterators, you can call resize() to manually compact the hashtable. The hashtable promotes too-small resize() arguments to the smallest legal value, so to compact a hashtable, it's sufficient to call resize(0).
[5] Unlike some other hashtable implementations, the optional n in the calls to the constructor, resize, and rehash indicates not the desired number of buckets that should be allocated, but instead the expected number of items to be inserted. The class then sizes the hash-map appropriately for the number of items specified. It's not an error to actually insert more or fewer items into the hashtable, but the implementation is most efficient -- does the fewest hashtable resizes -- if the number of inserted items is n or slightly less.
[6] sparse_hash_map requires you call set_deleted_key() before calling erase(). (This is the largest difference between the sparse_hash_map API and other hash-map APIs. See implementation.html for why this is necessary.) The argument to set_deleted_key() should be a key-value that is never used for legitimate hash-map entries. It is an error to call erase() without first calling set_deleted_key(), and it is also an error to call insert() with an item whose key is the "deleted key."
There is no need to call set_deleted_key if you do not wish to call erase() on the hash-map.
It is acceptable to change the deleted-key at any time by calling set_deleted_key() with a new argument. You can also call clear_deleted_key(), at which point all keys become valid for insertion but no hashtable entries can be deleted until set_deleted_key() is called again.
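As a minimal sketch of this protocol (the default hash function for int keys is assumed, and the choice of -1 as the deleted key is arbitrary):

#include <google/sparse_hash_map>

int main() {
  google::sparse_hash_map<int, int> m;
  m.set_deleted_key(-1);    // -1 must never be used as a real key
  m[5] = 50;
  m.erase(5);               // legal only after set_deleted_key()
  m.clear_deleted_key();    // erase() is now invalid again...
  m[-1] = 10;               // ...but -1 is once more a legal key
  return 0;
}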
Note: If you use set_deleted_key, it is also necessary that data_type has a zero-argument default constructor. This is because sparse_hash_map uses the special value pair(deleted_key, data_type()) to denote deleted buckets, and thus needs to be able to create data_type using a zero-argument constructor.
If your data_type does not have a zero-argument default constructor, there are several workarounds, such as storing pointers to your objects in the hash-map (pointers are default-constructible), or wrapping data_type in a struct that provides a zero-argument constructor; a sketch of the pointer workaround appears below.

If you do not use set_deleted_key, then there is no requirement that data_type have a zero-argument default constructor.
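For instance, a minimal sketch of the pointer workaround (Widget is a hypothetical type with no default constructor):

#include <google/sparse_hash_map>

struct Widget {
  explicit Widget(int n) : n_(n) {}   // no zero-argument constructor
  int n_;
};

int main() {
  // Pointers are default-constructible, so storing Widget* instead of
  // Widget satisfies the data_type() requirement of set_deleted_key().
  google::sparse_hash_map<int, Widget*> m;
  m.set_deleted_key(-1);
  m[1] = new Widget(42);
  delete m[1];              // the caller owns the heap objects
  m.erase(1);
  return 0;
}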
It is possible to save and restore sparse_hash_map objects to disk. Storage takes place in two steps. The first writes the hashtable metadata. The second writes the actual data.
To write a hashtable to disk, first call write_metadata() on an open file pointer. This saves the hashtable information in a byte-order-independent format.
After the metadata has been written to disk, you must write the actual data stored in the hash-map to disk. If both the key and data are "simple" enough, you can do this by calling write_nopointer_data(). "Simple" data is data that can be safely copied to disk via fwrite(). Native C data types fall into this category, as do structs of native C data types. Pointers and STL objects do not.
Note that write_nopointer_data() does not do any endian conversion. Thus, it is only appropriate when you intend to read the data on the same endian architecture as you write the data.
If you cannot use write_nopointer_data() for any reason, you can write the data yourself by iterating over the sparse_hash_map with a const_iterator and writing the key and data in any manner you wish.
To read the hashtable information from disk, first you must create a sparse_hash_map object. Then open a file pointer to point to the saved hashtable, and call read_metadata(). If you saved the data via write_nopointer_data(), you can follow the read_metadata() call with a call to read_nopointer_data(). This is all that is needed.
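For "simple" key and data types, then, the whole round trip is just four calls; here is a minimal sketch (the file name and element types are arbitrary):

#include <cstdio>
#include <google/sparse_hash_map>

typedef google::sparse_hash_map<int, double> Table;  // "plain" key and data

int main() {
  Table t;
  t[1] = 1.5;
  t[2] = 2.5;

  FILE* fp = fopen("saved.ht", "wb");
  t.write_metadata(fp);          // step 1: byte-order-independent metadata
  t.write_nopointer_data(fp);    // step 2: raw contents (same-endian only)
  fclose(fp);

  Table t2;
  fp = fopen("saved.ht", "rb");
  t2.read_metadata(fp);          // restores the table structure
  t2.read_nopointer_data(fp);    // restores the contents
  fclose(fp);
  return 0;
}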
If you saved the data through a custom write routine, you must call a custom read routine to read in the data. To do this, iterate over the sparse_hash_map with an iterator; this is well-defined because the metadata has already been set up. For each iterator item, you can read the key and value from disk, and set it appropriately. You will need to do a const_cast on the iterator, since it->first is always const. You will also need to use placement-new if the key or value is a C++ object. The code might look like this:
for (sparse_hash_map<int*, ComplicatedClass>::iterator it = ht.begin();
     it != ht.end(); ++it) {
    // The key is stored in the sparse_hash_map as a pointer
    const_cast<int*&>(it->first) = new int;
    fread(const_cast<int*>(it->first), sizeof(int), 1, fp);
    // The value is a complicated C++ class that takes an int to construct
    int ctor_arg;
    fread(&ctor_arg, sizeof(int), 1, fp);
    new (&it->second) ComplicatedClass(ctor_arg);   // "placement new"
}
erase() is guaranteed not to invalidate any iterators -- except for any iterators pointing to the item being erased, of course. insert() invalidates all iterators, as does resize().
This is implemented by making erase() not resize the hashtable. If you desire maximum space efficiency, you can call resize(0) after a string of erase() calls, to force the hashtable to resize to the smallest possible size.
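A minimal sketch of that pattern (it assumes the deleted key has already been set):

#include <google/sparse_hash_map>

// Erase a batch of keys, then compact. Too-small resize() arguments are
// promoted to the smallest legal size, so resize(0) reclaims the space
// that the erase() calls left behind.
void erase_and_compact(google::sparse_hash_map<int, int>& m) {
  // assumes m.set_deleted_key() was already called
  for (int k = 0; k < 1000; ++k)
    m.erase(k);    // never shrinks the table on its own...
  m.resize(0);     // ...this does, invalidating all iterators
}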
In addition to invalidating iterators, insert() and resize() invalidate all pointers into the hashtable. If you want to store a pointer to an object held in a sparse_hash_map, either do so after finishing hashtable inserts, or store the object on the heap and a pointer to it in the sparse_hash_map.
The following are SGI STL, and some Google STL, concepts and classes related to sparse_hash_map.
hash_map, Associative Container, Hashed Associative Container, Pair Associative Container, Unique Hashed Associative Container, set, map, multiset, multimap, hash_set, hash_multiset, hash_multimap, sparsetable, sparse_hash_set, dense_hash_set, dense_hash_map

[Note: this document is formatted similarly to the SGI STL implementation documentation pages, and refers to concepts and classes defined there. However, neither this document nor the code it describes is associated with SGI, nor is it necessary to have SGI's STL implementation installed in order to use this class.]
sparse_hash_set is a Hashed Associative Container that stores objects of type Key. sparse_hash_set is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Unique Associative Container, meaning that no two elements have keys that compare equal using EqualKey.
Looking up an element in a sparse_hash_set by its key is efficient, so sparse_hash_set is useful for "dictionaries" where the order of elements is irrelevant. If it is important for the elements to be in a particular order, however, then set is more appropriate.
sparse_hash_set is distinguished from other hash-set implementations by its stingy use of memory and by the ability to save and restore contents to disk. On the other hand, this hash-set implementation, while still efficient, is slower than other hash-set implementations, and it also has requirements -- for instance, for a distinguished "deleted key" -- that may not be easy for all applications to satisfy.
This class is appropriate for applications that need to store large "dictionaries" in memory, or for applications that need these dictionaries to be persistent.
hash<> -- the kind used by gcc and most Unix compiler suites -- and not Dinkumware semantics -- the kind used by Microsoft Visual Studio. If you are using MSVC, this example will not compile as-is: you'll need to change hash to hash_compare, and you won't use eqstr at all. See the MSVC documentation for hash_map and hash_compare, for more details.)
#include <iostream>
#include <cstring>
#include <google/sparse_hash_set>

using google::sparse_hash_set;   // namespace where class lives by default
using std::cout;
using std::endl;
using ext::hash;  // or __gnu_cxx::hash, or maybe tr1::hash, depending on your OS

struct eqstr
{
  bool operator()(const char* s1, const char* s2) const
  {
    return (s1 == s2) || (s1 && s2 && strcmp(s1, s2) == 0);
  }
};

void lookup(const sparse_hash_set<const char*, hash<const char*>, eqstr>& Set,
            const char* word)
{
  sparse_hash_set<const char*, hash<const char*>, eqstr>::const_iterator it
    = Set.find(word);
  cout << word << ": "
       << (it != Set.end() ? "present" : "not present")
       << endl;
}

int main()
{
  sparse_hash_set<const char*, hash<const char*>, eqstr> Set;
  Set.insert("kiwi");
  Set.insert("plum");
  Set.insert("apple");
  Set.insert("mango");
  Set.insert("apricot");
  Set.insert("banana");

  lookup(Set, "mango");
  lookup(Set, "apple");
  lookup(Set, "durian");
}
unordered_set.
Parameter | Description | Default |
---|---|---|
Key | The hash_set's key and value type. This is also defined as sparse_hash_set::key_type and sparse_hash_set::value_type. | |
HashFcn | The hash function used by the hash_set. This is also defined as sparse_hash_set::hasher. Note: Hashtable performance depends heavily on the choice of hash function. See the performance page for more information. | hash<Key> |
EqualKey | The hash_set key equality function: a binary predicate that determines whether two keys are equal. This is also defined as sparse_hash_set::key_equal. | equal_to<Key> |
Alloc | The STL allocator to use. By default, uses the provided allocator libc_allocator_with_realloc, which likely gives better performance than other STL allocators due to its built-in support for realloc, which this container takes advantage of. If you use an allocator other than the default, note that this container imposes an additional requirement on the STL allocator type beyond those in [lib.allocator.requirements]: it does not support allocators that define alternate memory models. That is, it assumes that pointer, const_pointer, size_type, and difference_type are just T*, const T*, size_t, and ptrdiff_t, respectively. This is also defined as sparse_hash_set::allocator_type. |
Member | Where defined | Description |
---|---|---|
value_type | Container | The type of object, T, stored in the hash_set. |
key_type | Associative Container | The key type associated with value_type. |
hasher | Hashed Associative Container | The sparse_hash_set's hash function. |
key_equal | Hashed Associative Container | Function object that compares keys for equality. |
allocator_type | Unordered Associative Container (tr1) | The type of the Allocator given as a template parameter. |
pointer | Container | Pointer to T. |
reference | Container | Reference to T. |
const_reference | Container | Const reference to T. |
size_type | Container | An unsigned integral type. |
difference_type | Container | A signed integral type. |
iterator | Container | Iterator used to iterate through a sparse_hash_set. |
const_iterator | Container | Const iterator used to iterate through a sparse_hash_set. (iterator and const_iterator are the same type.) |
local_iterator | Unordered Associative Container (tr1) | Iterator used to iterate through a subset of sparse_hash_set. |
const_local_iterator | Unordered Associative Container (tr1) | Const iterator used to iterate through a subset of sparse_hash_set. |
iterator begin() const | Container | Returns an iterator pointing to the beginning of the sparse_hash_set. |
iterator end() const | Container | Returns an iterator pointing to the end of the sparse_hash_set. |
local_iterator begin(size_type i) | Unordered Associative Container (tr1) | Returns a local_iterator pointing to the beginning of bucket i in the sparse_hash_set. |
local_iterator end(size_type i) | Unordered Associative Container (tr1) | Returns a local_iterator pointing to the end of bucket i in the sparse_hash_set. For sparse_hash_set, each bucket contains either 0 or 1 item. |
const_local_iterator begin(size_type i) const | Unordered Associative Container (tr1) | Returns a const_local_iterator pointing to the beginning of bucket i in the sparse_hash_set. |
const_local_iterator end(size_type i) const | Unordered Associative Container (tr1) | Returns a const_local_iterator pointing to the end of bucket i in the sparse_hash_set. For sparse_hash_set, each bucket contains either 0 or 1 item. |
size_type size() const | Container | Returns the size of the sparse_hash_set. |
size_type max_size() const | Container | Returns the largest possible size of the sparse_hash_set. |
bool empty() const | Container | true if the sparse_hash_set's size is 0. |
size_type bucket_count() const | Hashed Associative Container | Returns the number of buckets used by the sparse_hash_set. |
size_type max_bucket_count() const | Hashed Associative Container | Returns the largest possible number of buckets used by the sparse_hash_set. |
size_type bucket_size(size_type i) const | Unordered Associative Container (tr1) | Returns the number of elements in bucket i. For sparse_hash_set, this will be either 0 or 1. |
size_type bucket(const key_type& key) const | Unordered Associative Container (tr1) | If the key exists in the set, returns the index of the bucket containing the given key; otherwise, returns the bucket the key would be inserted into. This value may be passed to begin(size_type) and end(size_type). |
float load_factor() const | Unordered Associative Container (tr1) | The number of elements in the sparse_hash_set divided by the number of buckets. |
float max_load_factor() const | Unordered Associative Container (tr1) | The maximum load factor before increasing the number of buckets in the sparse_hash_set. |
void max_load_factor(float new_grow) | Unordered Associative Container (tr1) | Sets the maximum load factor before increasing the number of buckets in the sparse_hash_set. |
float min_load_factor() const | sparse_hash_set | The minimum load factor before decreasing the number of buckets in the sparse_hash_set. |
void min_load_factor(float new_shrink) | sparse_hash_set | Sets the minimum load factor before decreasing the number of buckets in the sparse_hash_set. |
void set_resizing_parameters(float shrink, float grow) | sparse_hash_set | DEPRECATED. See below. |
void resize(size_type n) | Hashed Associative Container | Increases the bucket count to hold at least n items. [2] [3] |
void rehash(size_type n) | Unordered Associative Container (tr1) | Increases the bucket count to hold at least n items. This is identical to resize. [2] [3] |
hasher hash_funct() const | Hashed Associative Container | Returns the hasher object used by the sparse_hash_set. |
hasher hash_function() const | Unordered Associative Container (tr1) | Returns the hasher object used by the sparse_hash_set. This is identical to hash_funct. |
key_equal key_eq() const | Hashed Associative Container | Returns the key_equal object used by the sparse_hash_set. |
allocator_type get_allocator() const | Unordered Associative Container (tr1) | Returns the allocator_type object used by the sparse_hash_set: either the one passed in to the constructor, or a default Alloc instance. |
sparse_hash_set() | Container | Creates an empty sparse_hash_set. |
sparse_hash_set(size_type n) | Hashed Associative Container | Creates an empty sparse_hash_set that's optimized for holding up to n items. [3] |
sparse_hash_set(size_type n, const hasher& h) | Hashed Associative Container | Creates an empty sparse_hash_set that's optimized for up to n items, using h as the hash function. |
sparse_hash_set(size_type n, const hasher& h, const key_equal& k) | Hashed Associative Container | Creates an empty sparse_hash_set that's optimized for up to n items, using h as the hash function and k as the key equal function. |
sparse_hash_set(size_type n, const hasher& h, const key_equal& k, const allocator_type& a) | Unordered Associative Container (tr1) | Creates an empty sparse_hash_set that's optimized for up to n items, using h as the hash function, k as the key equal function, and a as the allocator object. |
template <class InputIterator> sparse_hash_set(InputIterator f, InputIterator l)[1] |
Unique Hashed Associative Container | Creates a sparse_hash_set with a copy of a range. |
template <class InputIterator> sparse_hash_set(InputIterator f, InputIterator l, size_type n)[1] |
Unique Hashed Associative Container | Creates a sparse_hash_set with a copy of a range that's optimized to hold up to n items. |
template <class InputIterator> sparse_hash_set(InputIterator f, InputIterator l, size_type n, const hasher& h)[1] |
Unique Hashed Associative Container | Creates a sparse_hash_set with a copy of a range that's optimized to hold up to n items, using h as the hash function. |
template <class InputIterator> sparse_hash_set(InputIterator f, InputIterator l, size_type n, const hasher& h, const key_equal& k)[1] |
Unique Hashed Associative Container | Creates a sparse_hash_set with a copy of a range that's optimized for holding up to n items, using h as the hash function and k as the key equal function. |
template <class InputIterator> sparse_hash_set(InputIterator f, InputIterator l, size_type n, const hasher& h, const key_equal& k, const allocator_type& a)[1] |
Unordered Associative Container (tr1) | Creates a sparse_hash_set with a copy of a range that's optimized for holding up to n items, using h as the hash function, k as the key equal function, and a as the allocator object. |
sparse_hash_set(const sparse_hash_set&) | Container | The copy constructor. |
sparse_hash_set& operator=(const sparse_hash_set&) | Container | The assignment operator. |
void swap(sparse_hash_set&) | Container | Swaps the contents of two sparse_hash_sets. |
pair<iterator, bool> insert(const value_type& x) |
Unique Associative Container | Inserts x into the sparse_hash_set. |
template <class InputIterator> void insert(InputIterator f, InputIterator l)[1] |
Unique Associative Container | Inserts a range into the sparse_hash_set. |
void set_deleted_key(const key_type& key) [4] | sparse_hash_set | See below. |
void clear_deleted_key() [4] | sparse_hash_set | See below. |
void erase(iterator pos) | Associative Container | Erases the element pointed to by pos. [4] |
size_type erase(const key_type& k) | Associative Container | Erases the element whose key is k. [4] |
void erase(iterator first, iterator last) | Associative Container | Erases all elements in a range. [4] |
void clear() | Associative Container | Erases all of the elements. |
iterator find(const key_type& k) const | Associative Container | Finds an element whose key is k. |
size_type count(const key_type& k) const | Unique Associative Container | Counts the number of elements whose key is k. |
pair<iterator, iterator> equal_range(const key_type& k) const |
Associative Container | Finds a range containing all elements whose key is k. |
bool write_metadata(FILE *fp) | sparse_hash_set | See below. |
bool read_metadata(FILE *fp) | sparse_hash_set | See below. |
bool write_nopointer_data(FILE *fp) | sparse_hash_set | See below. |
bool read_nopointer_data(FILE *fp) | sparse_hash_set | See below. |
bool operator==(const sparse_hash_set&, const sparse_hash_set&) |
Hashed Associative Container | Tests two sparse_hash_sets for equality. This is a global function, not a member function. |
Member | Description |
---|---|
void set_deleted_key(const key_type& key) | Sets the distinguished "deleted" key to key. This must be called before any calls to erase(). [4] |
void clear_deleted_key() | Clears the distinguished "deleted" key. After this is called, calls to erase() are not valid on this object. [4] |
void set_resizing_parameters(float shrink, float grow) | This function is DEPRECATED. It is equivalent to calling min_load_factor(shrink); max_load_factor(grow). |
bool write_metadata(FILE *fp) | Write hashtable metadata to fp. See below. |
bool read_metadata(FILE *fp) | Read hashtable metadata from fp. See below. |
bool write_nopointer_data(FILE *fp) | Write hashtable contents to fp. This is valid only if the hashtable key and value are "plain" data. See below. |
bool read_nopointer_data(FILE *fp) | Read hashtable contents from fp. This is valid only if the hashtable key and value are "plain" data. See below. |
[1] This member function relies on member template functions, which may not be supported by all compilers. If your compiler supports member templates, you can call this function with any type of input iterator. If your compiler does not yet support member templates, though, then the arguments must either be of type const value_type* or of type sparse_hash_set::const_iterator.
[2] In order to preserve iterators, erasing hashtable elements does not cause a hashtable to resize. This means that after a string of erase() calls, the hashtable will use more space than is required. At a cost of invalidating all current iterators, you can call resize() to manually compact the hashtable. The hashtable promotes too-small resize() arguments to the smallest legal value, so to compact a hashtable, it's sufficient to call resize(0).
[3] Unlike some other hashtable implementations, the optional n in the calls to the constructor, resize, and rehash indicates not the desired number of buckets that should be allocated, but instead the expected number of items to be inserted. The class then sizes the hash-set appropriately for the number of items specified. It's not an error to actually insert more or fewer items into the hashtable, but the implementation is most efficient -- does the fewest hashtable resizes -- if the number of inserted items is n or slightly less.
[4] sparse_hash_set requires you call set_deleted_key() before calling erase(). (This is the largest difference between the sparse_hash_set API and other hash-set APIs. See implementation.html for why this is necessary.) The argument to set_deleted_key() should be a key-value that is never used for legitimate hash-set entries. It is an error to call erase() without first calling set_deleted_key(), and it is also an error to call insert() with an item whose key is the "deleted key."
There is no need to call set_deleted_key if you do not wish to call erase() on the hash-set.
It is acceptable to change the deleted-key at any time by calling set_deleted_key() with a new argument. You can also call clear_deleted_key(), at which point all keys become valid for insertion but no hashtable entries can be deleted until set_deleted_key() is called again.
It is possible to save and restore sparse_hash_set objects to disk. Storage takes place in two steps. The first writes the hashtable metadata. The second writes the actual data.
To write a hashtable to disk, first call write_metadata() on an open file pointer. This saves the hashtable information in a byte-order-independent format.
After the metadata has been written to disk, you must write the actual data stored in the hash-set to disk. If both the key and data are "simple" enough, you can do this by calling write_nopointer_data(). "Simple" data is data that can be safely copied to disk via fwrite(). Native C data types fall into this category, as do structs of native C data types. Pointers and STL objects do not.
Note that write_nopointer_data() does not do any endian conversion. Thus, it is only appropriate when you intend to read the data on the same endian architecture as you write the data.
If you cannot use write_nopointer_data() for any reason, you can write the data yourself by iterating over the sparse_hash_set with a const_iterator and writing the key and data in any manner you wish.
To read the hashtable information from disk, first you must create a sparse_hash_set object. Then open a file pointer to point to the saved hashtable, and call read_metadata(). If you saved the data via write_nopointer_data(), you can follow the read_metadata() call with a call to read_nopointer_data(). This is all that is needed.
If you saved the data through a custom write routine, you must call a custom read routine to read in the data. To do this, iterate over the sparse_hash_set with an iterator; this is well-defined because the metadata has already been set up. For each iterator item, you can read the key and value from disk, and set it appropriately. You will need to do a const_cast on the iterator, since *it is always const. The code might look like this:
for (sparse_hash_set<int*>::iterator it = ht.begin();
     it != ht.end(); ++it) {
    const_cast<int*&>(*it) = new int;
    fread(const_cast<int*>(*it), sizeof(int), 1, fp);
}
Here's another example, where the item stored in the hash-set is a C++ object with a non-trivial constructor. In this case, you must use "placement new" to construct the object at the correct memory location.
for (sparse_hash_set<ComplicatedClass>::iterator it = ht.begin();
     it != ht.end(); ++it) {
    int ctor_arg;   // ComplicatedClass takes an int as its constructor arg
    fread(&ctor_arg, sizeof(int), 1, fp);
    new (const_cast<ComplicatedClass*>(&(*it))) ComplicatedClass(ctor_arg);   // "placement new"
}
erase() is guaranteed not to invalidate any iterators -- except for any iterators pointing to the item being erased, of course. insert() invalidates all iterators, as does resize().
This is implemented by making erase() not resize the hashtable. If you desire maximum space efficiency, you can call resize(0) after a string of erase() calls, to force the hashtable to resize to the smallest possible size.
In addition to invalidating iterators, insert() and resize() invalidate all pointers into the hashtable. If you want to store a pointer to an object held in a sparse_hash_set, either do so after finishing hashtable inserts, or store the object on the heap and a pointer to it in the sparse_hash_set.
The following are SGI STL, and some Google STL, concepts and classes related to sparse_hash_set.
hash_set, Associative Container, Hashed Associative Container, Simple Associative Container, Unique Hashed Associative Container, set, map, multiset, multimap, hash_map, hash_multiset, hash_multimap, sparsetable, sparse_hash_map, dense_hash_set, dense_hash_map

[Note: this document is formatted similarly to the SGI STL implementation documentation pages, and refers to concepts and classes defined there. However, neither this document nor the code it describes is associated with SGI, nor is it necessary to have SGI's STL implementation installed in order to use this class.]
A sparsetable is a Random Access Container that supports constant time random access to elements, and constant time insertion and removal of elements. It implements the "array" or "table" abstract data type. The number of elements in a sparsetable is set at constructor time, though you can change it at any time by calling resize().
sparsetable is distinguished from other array implementations, including the default C implementation, in its stingy use of memory -- in particular, unused array elements require only about 1 bit of memory to store, rather than sizeof(T) bytes -- and by the ability to save and restore contents to disk. On the other hand, this array implementation, while still efficient, is slower than other array implementations.

A sparsetable distinguishes between table elements that have been assigned and those that are unassigned. Assigned table elements are those that have had a value set via set(), operator[], assignment via an iterator, and so forth. Unassigned table elements are those that have not had a value set in one of these ways, or that have been explicitly unassigned via a call to erase() or clear(). Lookup is valid on both assigned and unassigned table elements; for unassigned elements, lookup returns the default value T().
This class is appropriate for applications that need to store large arrays in memory, or for applications that need these arrays to be persistent.
#include <iostream>
#include <google/sparsetable>

using google::sparsetable;   // namespace where class lives by default
using std::cout;

int main() {
  sparsetable<int> t(100);
  t[5] = 6;
  cout << "t[5] = " << t[5] << "\n";
  cout << "Default value = " << t[99] << "\n";   // unassigned: returns T()
  return 0;
}
Parameter | Description | Default |
---|---|---|
T | The sparsetable's value type: the type of object that is stored in the table. | |
GROUP_SIZE | The number of elements in each sparsetable group (see the implementation doc for more details on this value). This almost never need be specified; the default template parameter value works well in all situations. |
Member | Where defined | Description |
---|---|---|
value_type | Container | The type of object, T, stored in the table. |
pointer | Container | Pointer to T. |
reference | Container | Reference to T. |
const_reference | Container | Const reference to T. |
size_type | Container | An unsigned integral type. |
difference_type | Container | A signed integral type. |
iterator | Container | Iterator used to iterate through a sparsetable. |
const_iterator | Container | Const iterator used to iterate through a sparsetable. |
reverse_iterator | Reversible Container | Iterator used to iterate backwards through a sparsetable. |
const_reverse_iterator | Reversible Container | Const iterator used to iterate backwards through a sparsetable. |
nonempty_iterator | sparsetable | Iterator used to iterate through the assigned elements of the sparsetable. |
const_nonempty_iterator | sparsetable | Const iterator used to iterate through the assigned elements of the sparsetable. |
reverse_nonempty_iterator | sparsetable | Iterator used to iterate backwards through the assigned elements of the sparsetable. |
const_reverse_nonempty_iterator | sparsetable | Const iterator used to iterate backwards through the assigned elements of the sparsetable. |
destructive_iterator | sparsetable | Iterator used to iterate through the assigned elements of the sparsetable, erasing elements as it iterates. [1] |
iterator begin() | Container | Returns an iterator pointing to the beginning of the sparsetable. |
iterator end() | Container | Returns an iterator pointing to the end of the sparsetable. |
const_iterator begin() const | Container | Returns a const_iterator pointing to the beginning of the sparsetable. |
const_iterator end() const | Container | Returns a const_iterator pointing to the end of the sparsetable. |
reverse_iterator rbegin() | Reversible Container | Returns a reverse_iterator pointing to the beginning of the reversed sparsetable. |
reverse_iterator rend() | Reversible Container | Returns a reverse_iterator pointing to the end of the reversed sparsetable. |
const_reverse_iterator rbegin() const | Reversible Container | Returns a const_reverse_iterator pointing to the beginning of the reversed sparsetable. |
const_reverse_iterator rend() const | Reversible Container | Returns a const_reverse_iterator pointing to the end of the reversed sparsetable. |
nonempty_iterator nonempty_begin() | sparsetable | Returns a nonempty_iterator pointing to the first assigned element of the sparsetable. |
nonempty_iterator nonempty_end() | sparsetable | Returns a nonempty_iterator pointing to the end of the sparsetable. |
const_nonempty_iterator nonempty_begin() const | sparsetable | Returns a const_nonempty_iterator pointing to the first assigned element of the sparsetable. |
const_nonempty_iterator nonempty_end() const | sparsetable | Returns a const_nonempty_iterator pointing to the end of the sparsetable. |
reverse_nonempty_iterator nonempty_rbegin() | sparsetable | Returns a reverse_nonempty_iterator pointing to the first assigned element of the reversed sparsetable. |
reverse_nonempty_iterator nonempty_rend() | sparsetable | Returns a reverse_nonempty_iterator pointing to the end of the reversed sparsetable. |
const_reverse_nonempty_iterator nonempty_rbegin() const | sparsetable | Returns a const_reverse_nonempty_iterator pointing to the first assigned element of the reversed sparsetable. |
const_reverse_nonempty_iterator nonempty_rend() const | sparsetable | Returns a const_reverse_nonempty_iterator pointing to the end of the reversed sparsetable. |
destructive_iterator destructive_begin() | sparsetable | Returns a destructive_iterator pointing to the first assigned element of the sparsetable. |
destructive_iterator destructive_end() | sparsetable | Returns a destructive_iterator pointing to the end of the sparsetable. |
size_type size() const | Container | Returns the size of the sparsetable. |
size_type max_size() const | Container | Returns the largest possible size of the sparsetable. |
bool empty() const | Container | true if the sparsetable's size is 0. |
size_type num_nonempty() const | sparsetable | Returns the number of sparsetable elements that are currently assigned. |
sparsetable(size_type n) | Container | Creates a sparsetable with n elements. |
sparsetable(const sparsetable&) | Container | The copy constructor. |
~sparsetable() | Container | The destructor. |
sparsetable& operator=(const sparsetable&) | Container | The assignment operator. |
void swap(sparsetable&) | Container | Swaps the contents of two sparsetables. |
reference operator[](size_type n) | Random Access Container | Returns the n'th element. [2] |
const_reference operator[](size_type n) const | Random Access Container | Returns the n'th element. |
bool test(size_type i) const | sparsetable | true if the i'th element of the sparsetable is assigned. |
bool test(iterator pos) const | sparsetable | true if the sparsetable element pointed to by pos is assigned. |
bool test(const_iterator pos) const | sparsetable | true if the sparsetable element pointed to by pos is assigned. |
const_reference get(size_type i) const | sparsetable | returns the i'th element of the sparsetable. |
reference set(size_type i, const_reference val) | sparsetable | Sets the i'th element of the sparsetable to value val. |
void erase(size_type i) | sparsetable | Erases the i'th element of the sparsetable. |
void erase(iterator pos) | sparsetable | Erases the element of the sparsetable pointed to by pos. |
void erase(iterator first, iterator last) | sparsetable | Erases the elements of the sparsetable in the range [first, last). |
void clear() | sparsetable | Erases all of the elements. |
void resize(size_type n) | sparsetable | Changes the size of sparsetable to n. |
bool write_metadata(FILE *fp) | sparsetable | See below. |
bool read_metadata(FILE *fp) | sparsetable | See below. |
bool write_nopointer_data(FILE *fp) | sparsetable | See below. |
bool read_nopointer_data(FILE *fp) | sparsetable | See below. |
bool operator==(const sparsetable&, const sparsetable&) |
Forward Container | Tests two sparsetables for equality. This is a global function, not a member function. |
bool operator<(const sparsetable&, const sparsetable&) |
Forward Container | Lexicographical comparison. This is a global function, not a member function. |
Member | Description |
---|---|
nonempty_iterator | Iterator used to iterate through the assigned elements of the sparsetable. |
const_nonempty_iterator | Const iterator used to iterate through the assigned elements of the sparsetable. |
reverse_nonempty_iterator | Iterator used to iterate backwards through the assigned elements of the sparsetable. |
const_reverse_nonempty_iterator | Const iterator used to iterate backwards through the assigned elements of the sparsetable. |
destructive_iterator | Iterator used to iterate through the assigned elements of the sparsetable, erasing elements as it iterates. [1] |
nonempty_iterator nonempty_begin() | Returns a nonempty_iterator pointing to the first assigned element of the sparsetable. |
nonempty_iterator nonempty_end() | Returns a nonempty_iterator pointing to the end of the sparsetable. |
const_nonempty_iterator nonempty_begin() const | Returns a const_nonempty_iterator pointing to the first assigned element of the sparsetable. |
const_nonempty_iterator nonempty_end() const | Returns a const_nonempty_iterator pointing to the end of the sparsetable. |
reverse_nonempty_iterator nonempty_rbegin() | Returns a reverse_nonempty_iterator pointing to the first assigned element of the reversed sparsetable. |
reverse_nonempty_iterator nonempty_rend() | Returns a reverse_nonempty_iterator pointing to the end of the reversed sparsetable. |
const_reverse_nonempty_iterator nonempty_rbegin() const | Returns a const_reverse_nonempty_iterator pointing to the first assigned element of the reversed sparsetable. |
const_reverse_nonempty_iterator nonempty_rend() const | Returns a const_reverse_nonempty_iterator pointing to the end of the reversed sparsetable. |
destructive_iterator destructive_begin() | Returns a destructive_iterator pointing to the first assigned element of the sparsetable. |
destructive_iterator destructive_end() | Returns a destructive_iterator pointing to the end of the sparsetable. |
size_type num_nonempty() const | Returns the number of sparsetable elements that are currently assigned. |
bool test(size_type i) const | true if the i'th element of the sparsetable is assigned. |
bool test(iterator pos) const | true if the sparsetable element pointed to by pos is assigned. |
bool test(const_iterator pos) const | true if the sparsetable element pointed to by pos is assigned. |
const_reference get(size_type i) const | returns the i'th element of the sparsetable. If the i'th element is assigned, the assigned value is returned, otherwise, the default value T() is returned. |
reference set(size_type i, const_reference val) | Sets the i'th element of the sparsetable to value val, and returns a reference to the i'th element of the table. This operation causes the i'th element to be assigned. |
void erase(size_type i) | Erases the i'th element of the sparsetable. This operation causes the i'th element to be unassigned. |
void erase(iterator pos) | Erases the element of the sparsetable pointed to by pos. This operation causes that element to be unassigned. |
void erase(iterator first, iterator last) | Erases the elements of the sparsetable in the range [first, last). This operation causes these elements to be unassigned. |
void clear() | Erases all of the elements. This causes all elements to be unassigned. |
void resize(size_type n) | Changes the size of sparsetable to n. If n is greater than the old size, new, unassigned elements are appended. If n is less than the old size, all elements in positions >= n are deleted. |
bool write_metadata(FILE *fp) | Write table metadata to fp. See below. |
bool read_metadata(FILE *fp) | Read table metadata from fp. See below. |
bool write_nopointer_data(FILE *fp) | Write table contents to fp. This is valid only if the table's values are "plain" data. See below. |
bool read_nopointer_data(FILE *fp) | Read table contents from fp. This is valid only if the table's values are "plain" data. See below. |
[1] sparsetable::destructive_iterator iterates through a sparsetable like a normal iterator, but ++it may delete the element being iterated past. Obviously, this iterator can only be used once on a given table! One application of this iterator is to copy data from a sparsetable to some other data structure without using extra memory to store the data in both places during the copy.
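For example, a minimal sketch of that copy-out use case (int values; drain is a hypothetical helper name):

#include <vector>
#include <google/sparsetable>

// Move the assigned values into a vector without both containers ever
// holding full copies of all the data at once.
std::vector<int> drain(google::sparsetable<int>& t) {
  std::vector<int> out;
  for (google::sparsetable<int>::destructive_iterator it = t.destructive_begin();
       it != t.destructive_end(); ++it)
    out.push_back(*it);   // ++it may erase the element just iterated past
  return out;
}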
[2] Since operator[] might insert a new element into the sparsetable, it can't possibly be a const member function. In theory, since it might insert a new element, it should cause the element it refers to to become assigned. However, this is undesirable when operator[] is used to examine elements, rather than assign them. Thus, as an implementation trick, operator[] does not really return a reference. Instead it returns an object that behaves almost exactly like a reference. This object, however, delays marking the appropriate sparsetable element as assigned until a value is actually assigned to it.
For a bit more detail: the object returned by operator[] is an opaque type which defines operator=, operator reference(), and operator&. The first operator controls assigning to the value. The second controls examining the value. The third controls pointing to the value.
All three operators perform exactly as an object of type reference would perform. Problems arise only when this object is used in situations where C++ cannot do the conversion by default. By far the most common situation is with variadic functions such as printf. In such situations, you may need to manually cast the object to the right type:
printf("%d", static_cast<typename table::reference>(table[i]));
It is possible to save and restore sparsetable objects to disk. Storage takes place in two steps. The first writes the table metadata. The second writes the actual data.
To write a sparsetable to disk, first call write_metadata() on an open file pointer. This saves the sparsetable information in a byte-order-independent format.
After the metadata has been written to disk, you must write the actual data stored in the sparsetable to disk. If the value is "simple" enough, you can do this by calling write_nopointer_data(). "Simple" data is data that can be safely copied to disk via fwrite(). Native C data types fall into this category, as do structs of native C data types. Pointers and STL objects do not.
Note that write_nopointer_data() does not do any endian conversion. Thus, it is only appropriate when you intend to read the data on the same endian architecture as you write the data.
If you cannot use write_nopointer_data() for any reason, you can write the data yourself by iterating over the sparsetable with a const_nonempty_iterator and writing the key and data in any manner you wish.
To read the table information from disk, first you must create a sparsetable object. Then open a file pointer to point to the saved sparsetable, and call read_metadata(). If you saved the data via write_nopointer_data(), you can follow the read_metadata() call with a call to read_nopointer_data(). This is all that is needed.

If you saved the data through a custom write routine, you must call a custom read routine to read in the data. To do this, iterate over the sparsetable with a nonempty_iterator; this is well-defined because the metadata has already been set up. For each iterator item, you can read the value from disk and set it appropriately. The code might look like this:
for (sparsetable<int*>::nonempty_iterator it = t.nonempty_begin();
     it != t.nonempty_end(); ++it) {
    *it = new int;
    fread(*it, sizeof(int), 1, fp);
}
Here's another example, where the item stored in the sparsetable is a C++ object with a non-trivial constructor. In this case, you must use "placement new" to construct the object at the correct memory location.
for (sparsetable<ComplicatedCppClass>::nonempty_iterator it = t.nonempty_begin();
     it != t.nonempty_end(); ++it) {
    int constructor_arg;   // ComplicatedCppClass takes an int to construct
    fread(&constructor_arg, sizeof(int), 1, fp);
    new (&(*it)) ComplicatedCppClass(constructor_arg);   // placement new
}
The following are SGI STL concepts and classes related to sparsetable.
Container, Random Access Container, sparse_hash_set, sparse_hash_map

For specificity, consider the declaration
sparsetable<Foo> t(100); // a sparse array with 100 elements
A sparsetable is a random access container that implements a sparse array, that is, an array that uses very little memory to store unassigned indices (in this case, between 1-2 bits per unassigned index). For instance, if you allocate an array of size 5 and assign a[2] = [big struct], then a[2] will take up a lot of memory but a[0], a[1], a[3], and a[4] will not. Array elements that have a value are called "assigned". Array elements that have no value yet, or have had their value cleared using erase() or clear(), are called "unassigned". For assigned elements, lookups return the assigned value; for unassigned elements, they return the default value, which for t is Foo().
sparsetable is implemented as an array of "groups". Each group is responsible for M array indices. The first group knows about t[0]..t[M-1], the second about t[M]..t[2M-1], and so forth. (M is 48 by default.) At construct time, t creates an array of (99/M + 1) groups. From this point on, all operations -- insert, delete, lookup -- are passed to the appropriate group. In particular, any operation on t[i] is actually performed on (t.group[i / M])[i % M].
Each group consists of a vector, which holds assigned values, and a bitmap of size M, which indicates which indices are assigned. A lookup works as follows: the group is asked to look up index i, where i < M. The group looks at bitmap[i]. If it's 0, the lookup fails. If it's 1, then the group has to find the appropriate value in the vector.
Finding the appropriate vector element is the most expensive part of the lookup. The code counts all bitmap entries <= i that are set to 1. (There's at least 1 of them, since bitmap[i] is 1.) Suppose there are 4 such entries. Then the right value to return is the 4th element of the vector: vector[3]. This takes time O(M), which is a constant since M is a constant.
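An illustrative sketch of this per-group logic (this is not the library's actual code: plain bools stand in for the packed M-bit bitmap, and the stored values are ints):

#include <vector>

struct Group {
  static const int M = 48;
  bool bitmap[M];            // bitmap[i] == true iff index i is assigned
  std::vector<int> values;   // one entry per set bitmap position, in order

  bool lookup(int i, int* out) const {
    if (!bitmap[i]) return false;    // unassigned: lookup fails
    int pos = 0;
    for (int j = 0; j < i; ++j)      // count assigned slots below i
      if (bitmap[j]) ++pos;
    *out = values[pos];              // i is the (pos+1)'th assigned slot
    return true;
  }
};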
Insert starts with a lookup. If the lookup succeeds, the code merely replaces vector[3] with the new value. If the lookup fails, then the code must insert a new entry into the middle of the vector. Again, to insert at position i, the code must count all the bitmap entries <= i that are set to 1. This indicates the position to insert into the vector. All vector entries above that position must be moved to make room for the new entry. This takes time, but still constant time since the vector has size at most M.
(Inserts could be made faster by using a list instead of a vector to hold group values, but this would use much more memory, since each list element requires a full pointer of overhead.)
The only metadata that needs to be updated, after the actual value is inserted, is to set bitmap[i] to 1. No other counts must be maintained.
Deletes are similar to inserts. They start with a lookup. If it fails, the delete is a noop. Otherwise, the appropriate entry is removed from the vector, all the vector elements above it are moved down one, and bitmap[i] is set to 0.
Sparsetable iterators pose a special burden. They must iterate over unassigned array values, but the act of iterating should not cause an assignment to happen -- otherwise, iterating over a sparsetable would cause it to take up much more room. For const iterators, the matter is simple: the iterator is merely programmed to return the default value -- Foo() -- when dereferenced while pointing to an unassigned entry.
For non-const iterators, such simple techniques fail. Instead, dereferencing a sparsetable_iterator returns an opaque object that acts like a Foo in almost all situations, but isn't actually a Foo. (It does this by defining operator=(), operator value_type(), and, most sneakily, operator&().) This works in almost all cases. If it doesn't, an explicit cast to value_type will solve the problem:
printf("%d", static_cast<Foo>(*t.find(0)));
To avoid such problems, consider using get() and set() instead of an iterator:
for (int i = 0; i < t.size(); ++i) if (t.get(i) == ...) t.set(i, ...);
Sparsetable also has a special class of iterator, besides normal and const: nonempty_iterator. This only iterates over array values that are assigned. This is particularly fast given the sparsetable implementation, since it can ignore the bitmaps entirely and just iterate over the various group vectors.
The space overhead for a sparsetable of size N is N + 48N/M bits. For the default value of M, this is exactly 2 bits per array entry. (That's for 32-bit pointers; for machines with 64-bit pointers, it's N + 80N/M bits, or 2.67 bits per entry.) A larger M would use less overhead -- approaching 1 bit per array entry -- but take longer for inserts, deletes, and lookups. A smaller M would use more overhead but make operations somewhat faster.
You can also look at some specific performance numbers.
For specificity, consider the declaration
sparse_hash_set<Foo> t;
sparse_hash_set is a hashtable. For more information on hashtables, see Knuth. Hashtables are basically arrays with complicated logic on top of them. sparse_hash_set uses a sparsetable to implement the underlying array.
In particular, sparse_hash_set stores its data in a sparsetable using quadratic internal probing (see Knuth). Many hashtable implementations use external probing, so each table element is actually a pointer chain, holding many hashtable values. sparse_hash_set, on the other hand, always stores at most one value in each table location. If the hashtable wants to store a second value at a given table location, it can't; it's forced to look somewhere else.
As a specific example, suppose t is a new sparse_hash_set. It then holds a sparsetable of size 32. The code for t.insert(foo) works as follows:
1) Call hash<Foo>(foo) to convert foo into an integer i. (hash<Foo> is the default hash function; you can specify a different one in the template arguments.)
2a) Look at t.sparsetable[i % 32]. If it's unassigned, assign it to foo. foo is now in the hashtable.
2b) If t.sparsetable[i % 32] is assigned, and its value is foo, then do nothing: foo was already in t and the insert is a noop.
2c) If t.sparsetable[i % 32] is assigned, but to a value other than foo, look at t.sparsetable[(i+1) % 32]. If that also fails, try t.sparsetable[(i+3) % 32], then t.sparsetable[(i+6) % 32]. In general, keep trying the next triangular number.
3) If the table is now "too full" -- say, 25 of the 32 table entries are now assigned -- grow the table by creating a new sparsetable that's twice as big, and rehashing every single element from the old table into the new one. This keeps the table from ever filling up.
4) If the table is now "too empty" -- say, only 3 of the 32 table entries are now assigned -- shrink the table by creating a new sparsetable that's half as big, and rehashing every element as in the growing case. This keeps the table overhead proportional to the number of elements in the table.
Instead of using triangular numbers as offsets, one could just use regular integers: try i, then i+1, then i+2, then i+3. This has bad 'clumping' behavior, as explored in Knuth. Quadratic probing, using the triangular numbers, avoids the clumping while keeping cache coherency in the common case. As long as the table size is a power of 2, the quadratic-probing method described above will explore every table element if necessary, to find a good place to insert.
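A sketch of this probe sequence (this is not the library's code; table_size is assumed to be a power of 2):

#include <cstddef>

// Triangular-number probing: offsets 0, 1, 3, 6, 10, ...; the j'th step
// adds j to the previous offset. With a power-of-two table size this
// sequence eventually visits every slot.
std::size_t probe_slot(std::size_t hash, std::size_t table_size,
                       std::size_t num_probes) {
  std::size_t offset = num_probes * (num_probes + 1) / 2;  // j'th triangular number
  return (hash + offset) & (table_size - 1);               // table_size is 2^k
}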
(As a side note, using a table size that's a power of two has several advantages, including the speed of calculating (i % table_size). On the other hand, power-of-two tables are not very forgiving of a poor hash function. Make sure your hash function is a good one! There are plenty of dos and don'ts on the web (and in Knuth), for writing hash functions.)
The "too full" value, also called the "maximum occupancy", determines a time-space tradeoff: in general, the higher it is, the less space is wasted but the more probes must be performed for each insert. sparse_hash_set uses a high maximum occupancy, since space is more important than speed for this data structure.
The "too empty" value is not necessary for performance but helps with space use. It's rare for hashtable implementations to check this value at insert() time -- after all, how will inserting cause a hashtable to get too small? However, the sparse_hash_set implementation never resizes on erase(); it's nice to have an erase() that does not invalidate iterators. Thus, the first insert() after a long string of erase()s could well trigger a hashtable shrink.
find() works similarly to insert. The only difference is in step (2a): if the value is unassigned, then the lookup fails immediately.
delete() is tricky in an internal-probing scheme. The obvious implementation of just "unassigning" the relevant table entry doesn't work. Consider the following scenario:
t.insert(foo1);    // foo1 hashes to 4, is put in table[4]
t.insert(foo2);    // foo2 hashes to 4, is put in table[5]
t.erase(foo1);     // table[4] is now 'unassigned'
t.lookup(foo2);    // fails since table[hash(foo2)] is unassigned
To avoid these failure situations, delete(foo1) is actually implemented by replacing foo1 by a special 'delete' value in the hashtable. This 'delete' value causes the table entry to be considered unassigned for the purposes of insertion -- if foo3 hashes to 4 as well, it can go into table[4] no problem -- but assigned for the purposes of lookup.
What is this special 'delete' value? The delete value has to be an element of type Foo, since the table can't hold anything else. It obviously must be an element the client would never want to insert on its own, or else the code couldn't distinguish deleted entries from 'real' entries with the same value. There's no way to determine a good value automatically. The client has to specify it explicitly. This is what the set_deleted_key() method does.
Note that set_deleted_key() is only necessary if the client actually wants to call t.erase(). For insert-only hash-sets, set_deleted_key() is unnecessary.
When copying the hashtable, either to grow it or shrink it, the special 'delete' values are not copied into the new table. The copy-time rehash makes them unnecessary.
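Putting these rules together, a lookup that honors the 'delete' semantics looks roughly like this sketch (the explicit state enum is illustrative; the real table instead compares each entry against the user-supplied empty and deleted keys):

#include <cstddef>

enum BucketState { kEmpty, kDeleted, kFull };   // illustrative only

struct Bucket { BucketState state; int key; };

// Probing stops at an 'empty' bucket (the key is definitely absent) but
// continues past 'deleted' ones, so erased entries never break a probe
// chain.  The occupancy limit guarantees an empty bucket always exists.
bool contains(const Bucket* table, std::size_t num_buckets,
              std::size_t hash, int key) {
  std::size_t i = hash & (num_buckets - 1);
  for (std::size_t probes = 1; ; ++probes) {
    if (table[i].state == kEmpty) return false;
    if (table[i].state == kFull && table[i].key == key) return true;
    i = (i + probes) & (num_buckets - 1);       // next triangular offset
  }
}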
The data is stored in a sparsetable, so space use is the same as for sparsetable. However, by default the sparse_hash_set implementation tries to keep about half the table buckets empty, to keep lookup-chains short. Since sparsetable has about 2 bits overhead per bucket (or 2.5 bits on 64-bit systems), sparse_hash_map has about 4-5 bits overhead per hashtable item.
Time use is also determined in large part by the sparsetable implementation. However, there is also an extra probing cost in hashtables, which depends in large part on the "too full" value. It should be rare to need more than 4-5 probes per lookup, and usually significantly less will suffice.
A note on growing and shrinking the hashtable: all hashtable implementations use the most memory when growing a hashtable, since they must have room for both the old table and the new table at the same time. sparse_hash_set is careful to delete entries from the old hashtable as soon as they're copied into the new one, to minimize this space overhead. (It does this efficiently by using its knowledge of the sparsetable class and copying one sparsetable group at a time.)
You can also look at some specific performance numbers.
sparse_hash_map is implemented identically to sparse_hash_set. The only difference is that instead of storing just Foo in each table entry, the data structure stores pair<Foo, Value>.
The hashtable aspects of dense_hash_set are identical to sparse_hash_set: it uses quadratic internal probing, and resizes hashtables in exactly the same way. The difference is in the underlying array: instead of using a sparsetable, dense_hash_set uses a C array. This means much more space is used, especially if Foo is big. However, it makes all operations faster, since sparsetable has memory management overhead that C arrays do not.
The use of C arrays instead of sparsetables points to one immediate complication dense_hash_set has that sparse_hash_set does not: the need to distinguish assigned from unassigned entries. In a sparsetable, this is accomplished by a bitmap. dense_hash_set, on the other hand, uses a dedicated value to specify unassigned entries. Thus, dense_hash_set has two special values: one to indicate deleted table entries, and one to indicate unassigned table entries. At construct time, all table entries are initialized to 'unassigned'.
dense_hash_set provides the method set_empty_key() to indicate the value that should be used for unassigned entries. Like set_deleted_key(), set_empty_key() requires a value that will not be used by the client for any legitimate purpose. Unlike set_deleted_key(), set_empty_key() is always required, no matter what hashtable operations the client wishes to perform.
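For example (this uses the documented dense_hash_set interface; the choice of -1 and -2 as the reserved keys is up to the application, as long as neither is ever inserted for real):

#include <google/dense_hash_set>

int main() {
  google::dense_hash_set<int> s;
  s.set_empty_key(-1);     // always required, before any other operation
  s.set_deleted_key(-2);   // required only because we call erase() below
  s.insert(8);
  s.erase(8);
  return 0;
}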
This implementation is fast because even though dense_hash_set may not be space efficient, most lookups are localized: a single lookup may need to access table[i], and maybe table[i+1] and table[i+3], but nothing other than that. For all but the biggest data structures, these will frequently be in a single cache line.
This implementation takes, for every unused bucket, space as big as the key-type. Usually between half and two-thirds of the buckets are empty.
The doubling method used by dense_hash_set tends to work poorly with most memory allocators. This is because memory allocators tend to have memory 'buckets' which are a power of two. Since each doubling of a dense_hash_set doubles the memory use, a single hashtable doubling will require a new memory 'bucket' from the memory allocator, leaving the old bucket stranded as fragmented memory. Hence, it's not recommended this data structure be used with many inserts in memory-constrained situations.
You can also look at some specific performance numbers.
dense_hash_map is identical to dense_hash_set except for what values are stored in each table entry.
Here are some performance numbers from an example desktop machine, taken from a version of time_hash_map that was instrumented to also report memory allocation information (this modification is not included by default because it required a big hack to do, including modifying the STL code to not try to do its own freelist management).
Note there are lots of caveats on these numbers: they may differ from machine to machine and compiler to compiler, and they only test a very particular usage pattern that may not match how you use hashtables -- for instance, they test hashtables with very small keys. However, they're still useful for a baseline comparison of the various hashtable implementations.
These figures are from a 2.80GHz Pentium 4 with 2G of memory. The 'standard' hash_map and map implementations are the SGI STL code included with gcc2, compiled with gcc 2.95.3 -g -O2.
====== Average over 10000000 iterations
Wed Dec 8 14:56:38 PST 2004

SPARSE_HASH_MAP:
map_grow                  665 ns
map_predict/grow          303 ns
map_replace               177 ns
map_fetch                 117 ns
map_remove                192 ns
memory used in map_grow    84.3956 Mbytes

DENSE_HASH_MAP:
map_grow                   84 ns
map_predict/grow           22 ns
map_replace                18 ns
map_fetch                  13 ns
map_remove                 23 ns
memory used in map_grow   256.0000 Mbytes

STANDARD HASH_MAP:
map_grow                  162 ns
map_predict/grow          107 ns
map_replace                44 ns
map_fetch                  22 ns
map_remove                124 ns
memory used in map_grow   204.1643 Mbytes

STANDARD MAP:
map_grow                  297 ns
map_predict/grow          282 ns
map_replace               113 ns
map_fetch                 113 ns
map_remove                238 ns
memory used in map_grow   236.8081 Mbytes
For good performance, the Google hash routines depend on a good hash function: one that distributes data evenly. Many hashtable implementations come with sub-optimal hash functions that can degrade performance. For instance, the hash function given in Knuth's _Art of Computer Programming_, and the default string hash function in SGI's STL implementation, both distribute certain data sets unevenly, leading to poor performance.
As an example, in one test of the default SGI STL string hash function against the Hsieh hash function (see below), for a particular set of string keys, the Hsieh function resulted in hashtable lookups that were 20 times as fast as the STLPort hash function. The string keys were chosen to be "hard" to hash well, so these results may not be typical, but they are suggestive.
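For illustration, here is the well-known FNV-1a string hash, a compact example of a function that mixes every byte into the whole hash value (shown for comparison only; it is not part of this package):

#include <stdint.h>

uint64_t fnv1a_64(const char* s) {
  uint64_t h = 14695981039346656037ULL;     // FNV-1a 64-bit offset basis
  for (; *s; ++s) {
    h ^= static_cast<unsigned char>(*s);    // fold in the next byte
    h *= 1099511628211ULL;                  // FNV-1a 64-bit prime
  }
  return h;
}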
There has been much research over the years into good hash functions. Here are some hash functions of note.
The Google sparsehash package consists of two hashtable implementations: sparse, which is designed to be very space efficient, and dense, which is designed to be very time efficient. For each one, the package provides both a hash-map and a hash-set, to mirror the classes in the common STL implementation.
Documentation on how to use these classes is in the per-class pages: sparse_hash_map, sparse_hash_set, dense_hash_map, dense_hash_set, and sparsetable.
In addition to the hash-map (and hash-set) classes, there's also a lower-level class that implements a "sparse" array. This class can be useful in its own right; consider using it when you'd normally use a sparse_hash_map, but your keys are all small-ish integers.
There is also a doc explaining the implementation details of these classes, for those who are curious. And finally, you can see some performance comparisons, both between the various classes here, but also between these implementations and other standard hashtable implementations.
/* Use the <code> tag for bits of code and for variables and objects. */
code,pre,samp,var {
color: #006000;
}
/* Use the <file> tag for file and directory paths and names. */
file {
color: #905050;
font-family: monospace;
}
/* Use the <kbd> tag for stuff the user should type. */
kbd {
color: #600000;
}
div.note p {
float: right;
width: 3in;
margin-right: 0%;
padding: 1px;
border: 2px solid #6060a0;
background-color: #fffff0;
}
UL.nobullets {
list-style-type: none;
list-style-image: none;
margin-left: -1em;
}
/*
body:after {
content: "Google Confidential";
}
*/
/* pretty printing styles. See prettify.js */
.str { color: #080; }
.kwd { color: #008; }
.com { color: #800; }
.typ { color: #606; }
.lit { color: #066; }
.pun { color: #660; }
.pln { color: #000; }
.tag { color: #008; }
.atn { color: #606; }
.atv { color: #080; }
pre.prettyprint { padding: 2px; border: 1px solid #888; }
.embsrc { background: #eee; }
@media print {
.str { color: #060; }
.kwd { color: #006; font-weight: bold; }
.com { color: #600; font-style: italic; }
.typ { color: #404; font-weight: bold; }
.lit { color: #044; }
.pun { color: #440; }
.pln { color: #000; }
.tag { color: #006; font-weight: bold; }
.atn { color: #404; }
.atv { color: #060; }
}
/* Table Column Headers */
.hdr {
color: #006;
font-weight: bold;
background-color: #dddddd; }
.hdr2 {
color: #006;
background-color: #eeeeee; }sparsehash-1.10/m4/ 0000777 0000764 0011610 00000000000 11516450312 011117 5 0000000 0000000 sparsehash-1.10/m4/acx_pthread.m4 0000444 0000764 0011610 00000031766 11011141342 013560 0000000 0000000 # This was retrieved from
# http://svn.0pointer.de/viewvc/trunk/common/acx_pthread.m4?revision=1277&root=avahi
# See also (perhaps for new versions?)
# http://svn.0pointer.de/viewvc/trunk/common/acx_pthread.m4?root=avahi
#
# We've rewritten the inconsistency check code (from avahi), to work
# more broadly. In particular, it no longer assumes ld accepts -zdefs.
# This caused a restructuring of the code, but the functionality has only
# changed a little.
dnl @synopsis ACX_PTHREAD([ACTION-IF-FOUND[, ACTION-IF-NOT-FOUND]])
dnl
dnl @summary figure out how to build C programs using POSIX threads
dnl
dnl This macro figures out how to build C programs using POSIX threads.
dnl It sets the PTHREAD_LIBS output variable to the threads library and
dnl linker flags, and the PTHREAD_CFLAGS output variable to any special
dnl C compiler flags that are needed. (The user can also force certain
dnl compiler flags/libs to be tested by setting these environment
dnl variables.)
dnl
dnl Also sets PTHREAD_CC to any special C compiler that is needed for
dnl multi-threaded programs (defaults to the value of CC otherwise).
dnl (This is necessary on AIX to use the special cc_r compiler alias.)
dnl
dnl NOTE: You are assumed to not only compile your program with these
dnl flags, but also link it with them as well. e.g. you should link
dnl with $PTHREAD_CC $CFLAGS $PTHREAD_CFLAGS $LDFLAGS ... $PTHREAD_LIBS
dnl $LIBS
dnl
dnl If you are only building threads programs, you may wish to use
dnl these variables in your default LIBS, CFLAGS, and CC:
dnl
dnl LIBS="$PTHREAD_LIBS $LIBS"
dnl CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
dnl CC="$PTHREAD_CC"
dnl
dnl In addition, if the PTHREAD_CREATE_JOINABLE thread-attribute
dnl constant has a nonstandard name, defines PTHREAD_CREATE_JOINABLE to
dnl that name (e.g. PTHREAD_CREATE_UNDETACHED on AIX).
dnl
dnl ACTION-IF-FOUND is a list of shell commands to run if a threads
dnl library is found, and ACTION-IF-NOT-FOUND is a list of commands to
dnl run if it is not found. If ACTION-IF-FOUND is not specified, the
dnl default action will define HAVE_PTHREAD.
dnl
dnl Please let the authors know if this macro fails on any platform, or
dnl if you have any other suggestions or comments. This macro was based
dnl on work by SGJ on autoconf scripts for FFTW (www.fftw.org) (with
dnl help from M. Frigo), as well as ac_pthread and hb_pthread macros
dnl posted by Alejandro Forero Cuervo to the autoconf macro repository.
dnl We are also grateful for the helpful feedback of numerous users.
dnl
dnl @category InstalledPackages
dnl @author Steven G. Johnson
dnl @version 2006-05-29
dnl @license GPLWithACException
dnl
dnl Checks for GCC shared/pthread inconsistency based on work by
dnl Marcin Owsiany
AC_DEFUN([ACX_PTHREAD], [
AC_REQUIRE([AC_CANONICAL_HOST])
AC_LANG_SAVE
AC_LANG_C
acx_pthread_ok=no
# We used to check for pthread.h first, but this fails if pthread.h
# requires special compiler flags (e.g. on Tru64 or Sequent).
# It gets checked for in the link test anyway.
# First of all, check if the user has set any of the PTHREAD_LIBS,
# etcetera environment variables, and if threads linking works using
# them:
if test x"$PTHREAD_LIBS$PTHREAD_CFLAGS" != x; then
save_CFLAGS="$CFLAGS"
CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
save_LIBS="$LIBS"
LIBS="$PTHREAD_LIBS $LIBS"
AC_MSG_CHECKING([for pthread_join in LIBS=$PTHREAD_LIBS with CFLAGS=$PTHREAD_CFLAGS])
AC_TRY_LINK_FUNC(pthread_join, acx_pthread_ok=yes)
AC_MSG_RESULT($acx_pthread_ok)
if test x"$acx_pthread_ok" = xno; then
PTHREAD_LIBS=""
PTHREAD_CFLAGS=""
fi
LIBS="$save_LIBS"
CFLAGS="$save_CFLAGS"
fi
# We must check for the threads library under a number of different
# names; the ordering is very important because some systems
# (e.g. DEC) have both -lpthread and -lpthreads, where one of the
# libraries is broken (non-POSIX).
# Create a list of thread flags to try. Items starting with a "-" are
# C compiler flags, and other items are library names, except for "none"
# which indicates that we try without any flags at all, and "pthread-config"
# which is a program returning the flags for the Pth emulation library.
acx_pthread_flags="pthreads none -Kthread -kthread lthread -pthread -pthreads -mthreads pthread --thread-safe -mt pthread-config"
# The ordering *is* (sometimes) important. Some notes on the
# individual items follow:
# pthreads: AIX (must check this before -lpthread)
# none: in case threads are in libc; should be tried before -Kthread and
# other compiler flags to prevent continual compiler warnings
# -Kthread: Sequent (threads in libc, but -Kthread needed for pthread.h)
# -kthread: FreeBSD kernel threads (preferred to -pthread since SMP-able)
# lthread: LinuxThreads port on FreeBSD (also preferred to -pthread)
# -pthread: Linux/gcc (kernel threads), BSD/gcc (userland threads)
# -pthreads: Solaris/gcc
# -mthreads: Mingw32/gcc, Lynx/gcc
# -mt: Sun Workshop C (may only link SunOS threads [-lthread], but it
# doesn't hurt to check since this sometimes defines pthreads too;
# also defines -D_REENTRANT)
# ... -mt is also the pthreads flag for HP/aCC
# pthread: Linux, etcetera
# --thread-safe: KAI C++
# pthread-config: use pthread-config program (for GNU Pth library)
case "${host_cpu}-${host_os}" in
*solaris*)
# On Solaris (at least, for some versions), libc contains stubbed
# (non-functional) versions of the pthreads routines, so link-based
# tests will erroneously succeed. (We need to link with -pthreads/-mt/
# -lpthread.) (The stubs are missing pthread_cleanup_push, or rather
# a function called by this macro, so we could check for that, but
# who knows whether they'll stub that too in a future libc.) So,
# we'll just look for -pthreads and -lpthread first:
acx_pthread_flags="-pthreads pthread -mt -pthread $acx_pthread_flags"
;;
esac
if test x"$acx_pthread_ok" = xno; then
for flag in $acx_pthread_flags; do
case $flag in
none)
AC_MSG_CHECKING([whether pthreads work without any flags])
;;
-*)
AC_MSG_CHECKING([whether pthreads work with $flag])
PTHREAD_CFLAGS="$flag"
;;
pthread-config)
AC_CHECK_PROG(acx_pthread_config, pthread-config, yes, no)
if test x"$acx_pthread_config" = xno; then continue; fi
PTHREAD_CFLAGS="`pthread-config --cflags`"
PTHREAD_LIBS="`pthread-config --ldflags` `pthread-config --libs`"
;;
*)
AC_MSG_CHECKING([for the pthreads library -l$flag])
PTHREAD_LIBS="-l$flag"
;;
esac
save_LIBS="$LIBS"
save_CFLAGS="$CFLAGS"
LIBS="$PTHREAD_LIBS $LIBS"
CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
# Check for various functions. We must include pthread.h,
# since some functions may be macros. (On the Sequent, we
# need a special flag -Kthread to make this header compile.)
# We check for pthread_join because it is in -lpthread on IRIX
# while pthread_create is in libc. We check for pthread_attr_init
# due to DEC craziness with -lpthreads. We check for
# pthread_cleanup_push because it is one of the few pthread
# functions on Solaris that doesn't have a non-functional libc stub.
# We try pthread_create on general principles.
AC_TRY_LINK([#include <pthread.h>],
[pthread_t th; pthread_join(th, 0);
pthread_attr_init(0); pthread_cleanup_push(0, 0);
pthread_create(0,0,0,0); pthread_cleanup_pop(0); ],
[acx_pthread_ok=yes])
LIBS="$save_LIBS"
CFLAGS="$save_CFLAGS"
AC_MSG_RESULT($acx_pthread_ok)
if test "x$acx_pthread_ok" = xyes; then
break;
fi
PTHREAD_LIBS=""
PTHREAD_CFLAGS=""
done
fi
# Various other checks:
if test "x$acx_pthread_ok" = xyes; then
save_LIBS="$LIBS"
LIBS="$PTHREAD_LIBS $LIBS"
save_CFLAGS="$CFLAGS"
CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
# Detect AIX lossage: JOINABLE attribute is called UNDETACHED.
AC_MSG_CHECKING([for joinable pthread attribute])
attr_name=unknown
for attr in PTHREAD_CREATE_JOINABLE PTHREAD_CREATE_UNDETACHED; do
AC_TRY_LINK([#include <pthread.h>], [int attr=$attr; return attr;],
[attr_name=$attr; break])
done
AC_MSG_RESULT($attr_name)
if test "$attr_name" != PTHREAD_CREATE_JOINABLE; then
AC_DEFINE_UNQUOTED(PTHREAD_CREATE_JOINABLE, $attr_name,
[Define to necessary symbol if this constant
uses a non-standard name on your system.])
fi
AC_MSG_CHECKING([if more special flags are required for pthreads])
flag=no
case "${host_cpu}-${host_os}" in
*-aix* | *-freebsd* | *-darwin*) flag="-D_THREAD_SAFE";;
*solaris* | *-osf* | *-hpux*) flag="-D_REENTRANT";;
esac
AC_MSG_RESULT(${flag})
if test "x$flag" != xno; then
PTHREAD_CFLAGS="$flag $PTHREAD_CFLAGS"
fi
LIBS="$save_LIBS"
CFLAGS="$save_CFLAGS"
# More AIX lossage: must compile with xlc_r or cc_r
if test x"$GCC" != xyes; then
AC_CHECK_PROGS(PTHREAD_CC, xlc_r cc_r, ${CC})
else
PTHREAD_CC=$CC
fi
# The next part tries to detect GCC inconsistency with -shared on some
# architectures and systems. The problem is that in certain
# configurations, when -shared is specified, GCC "forgets" to
# internally use various flags which are still necessary.
#
# Prepare the flags
#
save_CFLAGS="$CFLAGS"
save_LIBS="$LIBS"
save_CC="$CC"
# Try with the flags determined by the earlier checks.
#
# -Wl,-z,defs forces link-time symbol resolution, so that the
# linking checks with -shared actually have any value
#
# FIXME: -fPIC is required for -shared on many architectures,
# so we specify it here, but the right way would probably be to
# properly detect whether it is actually required.
CFLAGS="-shared -fPIC -Wl,-z,defs $CFLAGS $PTHREAD_CFLAGS"
LIBS="$PTHREAD_LIBS $LIBS"
CC="$PTHREAD_CC"
# In order not to create several levels of indentation, we test
# the value of "$done" until we find the cure or run out of ideas.
done="no"
# First, make sure the CFLAGS we added are actually accepted by our
# compiler. If not (and OS X's ld, for instance, does not accept -z),
# then we can't do this test.
if test x"$done" = xno; then
AC_MSG_CHECKING([whether to check for GCC pthread/shared inconsistencies])
AC_TRY_LINK(,, , [done=yes])
if test "x$done" = xyes ; then
AC_MSG_RESULT([no])
else
AC_MSG_RESULT([yes])
fi
fi
if test x"$done" = xno; then
AC_MSG_CHECKING([whether -pthread is sufficient with -shared])
AC_TRY_LINK([#include <pthread.h>],
[pthread_t th; pthread_join(th, 0);
pthread_attr_init(0); pthread_cleanup_push(0, 0);
pthread_create(0,0,0,0); pthread_cleanup_pop(0); ],
[done=yes])
if test "x$done" = xyes; then
AC_MSG_RESULT([yes])
else
AC_MSG_RESULT([no])
fi
fi
#
# Linux gcc on some architectures such as mips/mipsel forgets
# about -lpthread
#
if test x"$done" = xno; then
AC_MSG_CHECKING([whether -lpthread fixes that])
LIBS="-lpthread $PTHREAD_LIBS $save_LIBS"
AC_TRY_LINK([#include <pthread.h>],
[pthread_t th; pthread_join(th, 0);
pthread_attr_init(0); pthread_cleanup_push(0, 0);
pthread_create(0,0,0,0); pthread_cleanup_pop(0); ],
[done=yes])
if test "x$done" = xyes; then
AC_MSG_RESULT([yes])
PTHREAD_LIBS="-lpthread $PTHREAD_LIBS"
else
AC_MSG_RESULT([no])
fi
fi
#
# FreeBSD 4.10 gcc forgets to use -lc_r instead of -lc
#
if test x"$done" = xno; then
AC_MSG_CHECKING([whether -lc_r fixes that])
LIBS="-lc_r $PTHREAD_LIBS $save_LIBS"
AC_TRY_LINK([#include <pthread.h>],
[pthread_t th; pthread_join(th, 0);
pthread_attr_init(0); pthread_cleanup_push(0, 0);
pthread_create(0,0,0,0); pthread_cleanup_pop(0); ],
[done=yes])
if test "x$done" = xyes; then
AC_MSG_RESULT([yes])
PTHREAD_LIBS="-lc_r $PTHREAD_LIBS"
else
AC_MSG_RESULT([no])
fi
fi
if test x"$done" = xno; then
# OK, we have run out of ideas
AC_MSG_WARN([Impossible to determine how to use pthreads with shared libraries])
# so it's not safe to assume that we may use pthreads
acx_pthread_ok=no
fi
CFLAGS="$save_CFLAGS"
LIBS="$save_LIBS"
CC="$save_CC"
else
PTHREAD_CC="$CC"
fi
AC_SUBST(PTHREAD_LIBS)
AC_SUBST(PTHREAD_CFLAGS)
AC_SUBST(PTHREAD_CC)
# Finally, execute ACTION-IF-FOUND/ACTION-IF-NOT-FOUND:
if test x"$acx_pthread_ok" = xyes; then
ifelse([$1],,AC_DEFINE(HAVE_PTHREAD,1,[Define if you have POSIX threads libraries and header files.]),[$1])
:
else
acx_pthread_ok=no
$2
fi
AC_LANG_RESTORE
])dnl ACX_PTHREAD
sparsehash-1.10/m4/google_namespace.m4 0000444 0000764 0011610 00000004121 11021665630 014563 0000000 0000000 # Allow users to override the namespace we define our application's classes in
# Arg $1 is the default namespace to use if --enable-namespace isn't present.
# In general, $1 should be 'google', so we put all our exported symbols in a
# unique namespace that is not likely to conflict with anyone else. However,
# when it makes sense -- for instance, when publishing stl-like code -- you
# may want to go with a different default, like 'std'.
# We guarantee the invariant that GOOGLE_NAMESPACE starts with ::,
# unless it's the empty string. Thus, it's always safe to do
# GOOGLE_NAMESPACE::foo and be sure you're getting the foo that's
# actually in the google namespace, and not some other namespace that
# the namespace rules might kick in.
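# As an illustration of how the three defines created below are consumed
# (this snippet is not part of the macro; the class name is made up):
#    _START_GOOGLE_NAMESPACE_
#    class MyTable { };            // lands in namespace google by default
#    _END_GOOGLE_NAMESPACE_
#    GOOGLE_NAMESPACE::MyTable t;  // always names the class defined above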
AC_DEFUN([AC_DEFINE_GOOGLE_NAMESPACE],
[google_namespace_default=[$1]
AC_ARG_ENABLE(namespace, [ --enable-namespace=FOO to define these Google
classes in the FOO namespace. --disable-namespace
to define them in the global namespace. Default
is to define them in namespace $1.],
[case "$enableval" in
yes) google_namespace="$google_namespace_default" ;;
no) google_namespace="" ;;
*) google_namespace="$enableval" ;;
esac],
[google_namespace="$google_namespace_default"])
if test -n "$google_namespace"; then
ac_google_namespace="::$google_namespace"
ac_google_start_namespace="namespace $google_namespace {"
ac_google_end_namespace="}"
else
ac_google_namespace=""
ac_google_start_namespace=""
ac_google_end_namespace=""
fi
AC_DEFINE_UNQUOTED(GOOGLE_NAMESPACE, $ac_google_namespace,
Namespace for Google classes)
AC_DEFINE_UNQUOTED(_START_GOOGLE_NAMESPACE_, $ac_google_start_namespace,
Puts following code inside the Google namespace)
AC_DEFINE_UNQUOTED(_END_GOOGLE_NAMESPACE_, $ac_google_end_namespace,
Stops putting the code inside the Google namespace)
])
sparsehash-1.10/m4/namespaces.m4 0000444 0000764 0011610 00000001337 10633336042 013420 0000000 0000000 # Checks whether the compiler implements namespaces
AC_DEFUN([AC_CXX_NAMESPACES],
[AC_CACHE_CHECK(whether the compiler implements namespaces,
ac_cv_cxx_namespaces,
[AC_LANG_SAVE
AC_LANG_CPLUSPLUS
AC_TRY_COMPILE([namespace Outer {
namespace Inner { int i = 0; }}],
[using namespace Outer::Inner; return i;],
ac_cv_cxx_namespaces=yes,
ac_cv_cxx_namespaces=no)
AC_LANG_RESTORE])
if test "$ac_cv_cxx_namespaces" = yes; then
AC_DEFINE(HAVE_NAMESPACES, 1, [define if the compiler implements namespaces])
fi])
sparsehash-1.10/m4/stl_hash.m4 0000444 0000764 0011610 00000006216 11200112547 013100 0000000 0000000 # We check two things: where the include file is for
# unordered_map/hash_map (we prefer the first form), and what
# namespace unordered/hash_map lives in within that include file. We
# include AC_TRY_COMPILE for all the combinations we've seen in the
# wild. We define HASH_MAP_H to the location of the header file, and
# HASH_NAMESPACE to the namespace the class (unordered_map or
# hash_map) is in. We define HAVE_UNORDERED_MAP if the class we found
# is named unordered_map, or leave it undefined if not.
# This also checks if unordered map exists.
AC_DEFUN([AC_CXX_STL_HASH],
[AC_REQUIRE([AC_CXX_NAMESPACES])
AC_MSG_CHECKING(the location of hash_map)
AC_LANG_SAVE
AC_LANG_CPLUSPLUS
ac_cv_cxx_hash_map=""
# First try unordered_map, but not on gcc's before 4.2 -- I've
# seen unexplainable unordered_map bugs with -O2 on older gcc's.
AC_TRY_COMPILE([#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 2))
# error GCC too old for unordered_map
#endif
],
[/* no program body necessary */],
[stl_hash_old_gcc=no],
[stl_hash_old_gcc=yes])
for location in unordered_map tr1/unordered_map; do
for namespace in std std::tr1; do
if test -z "$ac_cv_cxx_hash_map" -a "$stl_hash_old_gcc" != yes; then
# Some older gcc's have a buggy tr1, so test a bit of code.
AC_TRY_COMPILE([#include <$location>],
[const ${namespace}::unordered_map<int, int> t;
return t.find(5) == t.end();],
[ac_cv_cxx_hash_map="<$location>";
ac_cv_cxx_hash_namespace="$namespace";
ac_cv_cxx_have_unordered_map="yes";])
fi
done
done
# Now try hash_map
for location in ext/hash_map hash_map; do
for namespace in __gnu_cxx "" std stdext; do
if test -z "$ac_cv_cxx_hash_map"; then
AC_TRY_COMPILE([#include <$location>],
[${namespace}::hash_map<int, int> t],
[ac_cv_cxx_hash_map="<$location>";
ac_cv_cxx_hash_namespace="$namespace";
ac_cv_cxx_have_unordered_map="no";])
fi
done
done
ac_cv_cxx_hash_set=`echo "$ac_cv_cxx_hash_map" | sed s/map/set/`;
if test -n "$ac_cv_cxx_hash_map"; then
AC_DEFINE(HAVE_HASH_MAP, 1, [define if the compiler has hash_map])
AC_DEFINE(HAVE_HASH_SET, 1, [define if the compiler has hash_set])
AC_DEFINE_UNQUOTED(HASH_MAP_H,$ac_cv_cxx_hash_map,
[the location of <unordered_map> or <hash_map>])
AC_DEFINE_UNQUOTED(HASH_SET_H,$ac_cv_cxx_hash_set,
[the location of <unordered_set> or <hash_set>])
AC_DEFINE_UNQUOTED(HASH_NAMESPACE,$ac_cv_cxx_hash_namespace,
[the namespace of hash_map/hash_set])
if test "$ac_cv_cxx_have_unordered_map" = yes; then
AC_DEFINE(HAVE_UNORDERED_MAP,1,
[define if the compiler supports unordered_{map,set}])
fi
AC_MSG_RESULT([$ac_cv_cxx_hash_map])
else
AC_MSG_RESULT()
AC_MSG_WARN([could not find an STL hash_map])
fi
])
sparsehash-1.10/m4/stl_hash_fun.m4 0000444 0000764 0011610 00000002723 11200112716 013745 0000000 0000000 # We just try to figure out where hash<> is defined. It's in some file
# that ends in hash_fun.h...
#
# Ideally we'd use AC_CACHE_CHECK, but that only lets us store one value
# at a time, and we need to store two (filename and namespace).
# AC_CACHE_CHECK also prints messages itself, so we have to do the message-printing ourselves
# via AC_MSG_CHECKING + AC_MSG_RESULT. (TODO(csilvers): can we cache?)
#
# tr1/functional_hash.h: new gcc's with tr1 support
# stl_hash_fun.h: old gcc's (gcc 2.95?)
# ext/hash_fun.h: newer gcc's (gcc4)
# stl/_hash_fun.h: STLport
AC_DEFUN([AC_CXX_STL_HASH_FUN],
[AC_REQUIRE([AC_CXX_STL_HASH])
AC_MSG_CHECKING(how to include hash_fun directly)
AC_LANG_SAVE
AC_LANG_CPLUSPLUS
ac_cv_cxx_stl_hash_fun=""
for location in functional tr1/functional \
ext/hash_fun.h ext/stl_hash_fun.h \
hash_fun.h stl_hash_fun.h \
stl/_hash_fun.h; do
if test -z "$ac_cv_cxx_stl_hash_fun"; then
AC_TRY_COMPILE([#include <$location>],
[int x = ${ac_cv_cxx_hash_namespace}::hash<int>()(5)],
[ac_cv_cxx_stl_hash_fun="<$location>";])
fi
done
AC_LANG_RESTORE
AC_DEFINE_UNQUOTED(HASH_FUN_H,$ac_cv_cxx_stl_hash_fun,
[the location of the header defining hash functions])
AC_DEFINE_UNQUOTED(HASH_NAMESPACE,$ac_cv_cxx_hash_namespace,
[the namespace of the hash<> function])
AC_MSG_RESULT([$ac_cv_cxx_stl_hash_fun])
])
sparsehash-1.10/m4/stl_namespace.m4 0000444 0000764 0011610 00000001625 10633336042 014117 0000000 0000000 # We check what namespace stl code like vector expects to be executed in
AC_DEFUN([AC_CXX_STL_NAMESPACE],
[AC_CACHE_CHECK(
what namespace STL code is in,
ac_cv_cxx_stl_namespace,
[AC_REQUIRE([AC_CXX_NAMESPACES])
AC_LANG_SAVE
AC_LANG_CPLUSPLUS
AC_TRY_COMPILE([#include <vector>],
[vector<int> t; return 0;],
ac_cv_cxx_stl_namespace=none)
AC_TRY_COMPILE([#include <vector>],
[std::vector<int> t; return 0;],
ac_cv_cxx_stl_namespace=std)
AC_LANG_RESTORE])
if test "$ac_cv_cxx_stl_namespace" = none; then
AC_DEFINE(STL_NAMESPACE,,
[the namespace where STL code like vector<> is defined])
fi
if test "$ac_cv_cxx_stl_namespace" = std; then
AC_DEFINE(STL_NAMESPACE,std,
[the namespace where STL code like vector<> is defined])
fi
])
sparsehash-1.10/packages/ 0000777 0000764 0011610 00000000000 11516450313 012356 5 0000000 0000000 sparsehash-1.10/packages/rpm/ 0000777 0000764 0011610 00000000000 11516450313 013154 5 0000000 0000000 sparsehash-1.10/packages/rpm/rpm.spec 0000444 0000764 0011610 00000004066 11453735431 014555 0000000 0000000 %define RELEASE 1
%define rel %{?CUSTOM_RELEASE} %{!?CUSTOM_RELEASE:%RELEASE}
%define prefix /usr
Name: %NAME
Summary: hash_map and hash_set classes with minimal space overhead
Version: %VERSION
Release: %rel
Group: Development/Libraries
URL: http://code.google.com/p/google-sparsehash
License: BSD
Vendor: Google
Packager: Google
Source: http://%{NAME}.googlecode.com/files/%{NAME}-%{VERSION}.tar.gz
Distribution: Redhat 7 and above.
Buildroot: %{_tmppath}/%{name}-root
Prefix: %prefix
Buildarch: noarch
%description
The %name package contains several hash-map implementations, similar
in API to the SGI hash_map class, but with different performance
characteristics. sparse_hash_map uses very little space overhead: 1-2
bits per entry. dense_hash_map is typically faster than the default
SGI STL implementation. This package also includes hash-set analogues
of these classes.
%changelog
* Wed Apr 22 2009 <opensource@google.com>
- Change build rule to use %configure instead of ./configure
- Change install to use DESTDIR instead of prefix for make install
- Use wildcards for doc/ and lib/ directories
- Use {_includedir} instead of {prefix}/include
* Fri Jan 14 2005 <opensource@google.com>
- First draft
%prep
%setup
%build
# I can't use '% configure', because it defines -m32 which breaks on
# my development environment for some reason. But I do take
# as much from % configure (in /usr/lib/rpm/macros) as I can.
./configure --prefix=%{_prefix} --exec-prefix=%{_exec_prefix} --bindir=%{_bindir} --sbindir=%{_sbindir} --sysconfdir=%{_sysconfdir} --datadir=%{_datadir} --includedir=%{_includedir} --libdir=%{_libdir} --libexecdir=%{_libexecdir} --localstatedir=%{_localstatedir} --sharedstatedir=%{_sharedstatedir} --mandir=%{_mandir} --infodir=%{_infodir}
make
%install
rm -rf $RPM_BUILD_ROOT
make DESTDIR=$RPM_BUILD_ROOT install
%clean
rm -rf $RPM_BUILD_ROOT
%files
%defattr(-,root,root)
%docdir %{prefix}/share/doc/%{NAME}-%{VERSION}
%{prefix}/share/doc/%{NAME}-%{VERSION}/*
%{_includedir}/google
%{_libdir}/pkgconfig/*.pc
sparsehash-1.10/packages/rpm.sh 0000555 0000764 0011610 00000005165 11173721123 013433 0000000 0000000 #!/bin/sh -e
# Run this from the 'packages' directory, just under rootdir
# We can only build rpm packages, if the rpm build tools are installed
if [ \! -x /usr/bin/rpmbuild ]
then
echo "Cannot find /usr/bin/rpmbuild. Not building an rpm." 1>&2
exit 0
fi
# Check the commandline flags
PACKAGE="$1"
VERSION="$2"
fullname="${PACKAGE}-${VERSION}"
archive=../$fullname.tar.gz
if [ -z "$1" -o -z "$2" ]
then
echo "Usage: $0 " 1>&2
exit 0
fi
# Double-check we're in the packages directory, just under rootdir
if [ \! -r ../Makefile -a \! -r ../INSTALL ]
then
echo "Must run $0 in the 'packages' directory, under the root directory." 1>&2
echo "Also, you must run \"make dist\" before running this script." 1>&2
exit 0
fi
if [ \! -r "$archive" ]
then
echo "Cannot find $archive. Run \"make dist\" first." 1>&2
exit 0
fi
# Create the directory where the input lives, and where the output should live
RPM_SOURCE_DIR="/tmp/rpmsource-$fullname"
RPM_BUILD_DIR="/tmp/rpmbuild-$fullname"
trap 'rm -rf $RPM_SOURCE_DIR $RPM_BUILD_DIR; exit $?' EXIT SIGHUP SIGINT SIGTERM
rm -rf "$RPM_SOURCE_DIR" "$RPM_BUILD_DIR"
mkdir "$RPM_SOURCE_DIR"
mkdir "$RPM_BUILD_DIR"
cp "$archive" "$RPM_SOURCE_DIR"
# rpmbuild -- as far as I can tell -- asks the OS what CPU it has.
# This may differ from what kind of binaries gcc produces. dpkg
# does a better job of this, so if we can run 'dpkg --print-architecture'
# to get the build CPU, we use that in preference of the rpmbuild
# default.
target=`dpkg --print-architecture 2>/dev/null` # "" if dpkg isn't found
if [ -n "$target" ]
then
target=" --target $target"
fi
rpmbuild -bb rpm/rpm.spec $target \
--define "NAME $PACKAGE" \
--define "VERSION $VERSION" \
--define "_sourcedir $RPM_SOURCE_DIR" \
--define "_builddir $RPM_BUILD_DIR" \
--define "_rpmdir $RPM_SOURCE_DIR"
# We put the output in a directory based on what system we've built for
destdir=rpm-unknown
if [ -r /etc/issue ]
then
grep "Red Hat.*release 7" /etc/issue >/dev/null 2>&1 && destdir=rh7
grep "Red Hat.*release 8" /etc/issue >/dev/null 2>&1 && destdir=rh8
grep "Red Hat.*release 9" /etc/issue >/dev/null 2>&1 && destdir=rh9
grep "Fedora Core.*release 1" /etc/issue >/dev/null 2>&1 && destdir=fc1
grep "Fedora Core.*release 2" /etc/issue >/dev/null 2>&1 && destdir=fc2
grep "Fedora Core.*release 3" /etc/issue >/dev/null 2>&1 && destdir=fc3
fi
rm -rf "$destdir"
mkdir -p "$destdir"
# We want to get not only the main package but devel etc, hence the middle *
mv "$RPM_SOURCE_DIR"/*/"${PACKAGE}"-*"${VERSION}"*.rpm "$destdir"
echo
echo "The rpm package file(s) are located in $PWD/$destdir"
sparsehash-1.10/packages/deb.sh 0000555 0000764 0011610 00000005000 11154067420 013355 0000000 0000000 #!/bin/bash -e
# This takes two commandline arguments: the name of the package and its
# version, which together identify the tar.gz archive built by "make dist"
# in the rootdir.
#
# Run this from the 'packages' directory, just under rootdir
## Set LIB to lib if exporting a library, empty-string else
LIB=
#LIB=lib
PACKAGE="$1"
VERSION="$2"
# We can only build Debian packages, if the Debian build tools are installed
if [ \! -x /usr/bin/debuild ]; then
echo "Cannot find /usr/bin/debuild. Not building Debian packages." 1>&2
exit 0
fi
# Double-check we're in the packages directory, just under rootdir
if [ \! -r ../Makefile -a \! -r ../INSTALL ]; then
echo "Must run $0 in the 'packages' directory, under the root directory." 1>&2
echo "Also, you must run \"make dist\" before running this script." 1>&2
exit 0
fi
# Find the top directory for this package
topdir="${PWD%/*}"
# Find the tar archive built by "make dist"
archive="${PACKAGE}-${VERSION}"
archive_with_underscore="${PACKAGE}_${VERSION}"
if [ -z "${archive}" ]; then
echo "Cannot find ../$PACKAGE*.tar.gz. Run \"make dist\" first." 1>&2
exit 0
fi
# Create a pristine directory for building the Debian package files
trap 'rm -rf '`pwd`/tmp'; exit $?' EXIT SIGHUP SIGINT SIGTERM
rm -rf tmp
mkdir -p tmp
cd tmp
# Debian has very specific requirements about the naming of build
# directories, and tar archives. It also wants to write all generated
# packages to the parent of the source directory. We accommodate these
# requirements by building directly from the tar file.
ln -s "${topdir}/${archive}.tar.gz" "${LIB}${archive}.orig.tar.gz"
# Some version of debuilder want foo.orig.tar.gz with _ between versions.
ln -s "${topdir}/${archive}.tar.gz" "${LIB}${archive_with_underscore}.orig.tar.gz"
tar zfx "${LIB}${archive}.orig.tar.gz"
[ -n "${LIB}" ] && mv "${archive}" "${LIB}${archive}"
cd "${LIB}${archive}"
# This is one of those 'specific requirements': where the deb control files live
cp -a "packages/deb" "debian"
# Now, we can call Debian's standard build tool
debuild -uc -us
cd ../.. # get back to the original top-level dir
# We'll put the result in a subdirectory that's named after the OS version
# we've made this .deb file for.
destdir="debian-$(cat /etc/debian_version 2>/dev/null || echo UNKNOWN)"
rm -rf "$destdir"
mkdir -p "$destdir"
mv $(find tmp -mindepth 1 -maxdepth 1 -type f) "$destdir"
echo
echo "The Debian package files are located in $PWD/$destdir"
sparsehash-1.10/packages/deb/ 0000777 0000764 0011610 00000000000 11516152100 013101 5 0000000 0000000 sparsehash-1.10/packages/deb/README 0000444 0000764 0011610 00000000457 11174146552 013717 0000000 0000000 The list of files here isn't complete. For a step-by-step guide on
how to set this package up correctly, check out
http://www.debian.org/doc/maint-guide/
Most of the files that are in this directory are boilerplate.
However, you may need to change the list of binary-arch dependencies
in 'rules'.
sparsehash-1.10/packages/deb/changelog 0000644 0000764 0011610 00000006604 11516152100 014675 0000000 0000000 sparsehash (1.10-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Thu, 20 Jan 2011 16:07:39 -0800
sparsehash (1.9-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Fri, 24 Sep 2010 11:37:50 -0700
sparsehash (1.8.1-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Thu, 29 Jul 2010 15:01:29 -0700
sparsehash (1.8-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Thu, 29 Jul 2010 09:53:26 -0700
sparsehash (1.7-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Wed, 31 Mar 2010 12:32:03 -0700
sparsehash (1.6-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Fri, 08 Jan 2010 14:47:55 -0800
sparsehash (1.5.2-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Tue, 12 May 2009 14:16:38 -0700
sparsehash (1.5.1-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Fri, 08 May 2009 15:23:44 -0700
sparsehash (1.5-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Wed, 06 May 2009 11:28:49 -0700
sparsehash (1.4-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Wed, 28 Jan 2009 17:11:31 -0800
sparsehash (1.3-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Thu, 06 Nov 2008 15:06:09 -0800
sparsehash (1.2-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Thu, 18 Sep 2008 13:53:20 -0700
sparsehash (1.1-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Mon, 11 Feb 2008 16:30:11 -0800
sparsehash (1.0-1) unstable; urgency=low
* New upstream release. We are now out of beta.
-- Google Inc. <opensource@google.com>  Tue, 13 Nov 2007 15:15:46 -0800
sparsehash (0.9.1-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Fri, 12 Oct 2007 12:35:24 -0700
sparsehash (0.9-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Tue, 09 Oct 2007 14:15:21 -0700
sparsehash (0.8-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Tue, 03 Jul 2007 12:55:04 -0700
sparsehash (0.7-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Mon, 11 Jun 2007 11:33:41 -0700
sparsehash (0.6-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Tue, 20 Mar 2007 17:29:34 -0700
sparsehash (0.5-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Sat, 21 Oct 2006 13:47:47 -0700
sparsehash (0.4-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Sun, 23 Apr 2006 22:42:35 -0700
sparsehash (0.3-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Thu, 03 Nov 2005 20:12:31 -0800
sparsehash (0.2-1) unstable; urgency=low
* New upstream release.
-- Google Inc. <opensource@google.com>  Mon, 02 May 2005 07:04:46 -0700
sparsehash (0.1-1) unstable; urgency=low
* Initial release.
-- Google Inc. <opensource@google.com>  Tue, 15 Feb 2005 07:17:02 -0800
sparsehash-1.10/packages/deb/compat 0000444 0000764 0011610 00000000002 11174146576 014235 0000000 0000000 4
sparsehash-1.10/packages/deb/control 0000444 0000764 0011610 00000001230 11366614051 014425 0000000 0000000 Source: sparsehash
Section: libdevel
Priority: optional
Maintainer: Google Inc. <opensource@google.com>
Build-Depends: debhelper (>= 4.0.0)
Standards-Version: 3.6.1
Package: sparsehash
Section: libs
Architecture: any
Description: hash_map and hash_set classes with minimal space overhead
This package contains several hash-map implementations, similar
in API to SGI's hash_map class, but with different performance
characteristics. sparse_hash_map uses very little space overhead: 1-2
bits per entry. dense_hash_map is typically faster than the default
SGI STL implementation. This package also includes hash-set analogues
of these classes.
sparsehash-1.10/packages/deb/copyright 0000444 0000764 0011610 00000003170 10633336042 014757 0000000 0000000 This package was debianized by Google Inc. on
15 February 2005.
It was downloaded from http://code.google.com/
Upstream Author: opensource@google.com
Copyright (c) 2005, Google Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
sparsehash-1.10/packages/deb/docs 0000444 0000764 0011610 00000000372 10633336042 013700 0000000 0000000 AUTHORS
COPYING
ChangeLog
INSTALL
NEWS
README
TODO
doc/dense_hash_map.html
doc/dense_hash_set.html
doc/sparse_hash_map.html
doc/sparse_hash_set.html
doc/sparsetable.html
doc/implementation.html
doc/performance.html
doc/index.html
doc/designstyle.css
sparsehash-1.10/packages/deb/rules 0000555 0000764 0011610 00000005542 11174146602 014113 0000000 0000000 #!/usr/bin/make -f
# -*- makefile -*-
# Sample debian/rules that uses debhelper.
# This file was originally written by Joey Hess and Craig Small.
# As a special exception, when this file is copied by dh-make into a
# dh-make output file, you may use that output file without restriction.
# This special exception was added by Craig Small in version 0.37 of dh-make.
# Uncomment this to turn on verbose mode.
#export DH_VERBOSE=1
# These are used for cross-compiling and for saving the configure script
# from having to guess our platform (since we know it already)
DEB_HOST_GNU_TYPE ?= $(shell dpkg-architecture -qDEB_HOST_GNU_TYPE)
DEB_BUILD_GNU_TYPE ?= $(shell dpkg-architecture -qDEB_BUILD_GNU_TYPE)
CFLAGS = -Wall -g
ifneq (,$(findstring noopt,$(DEB_BUILD_OPTIONS)))
CFLAGS += -O0
else
CFLAGS += -O2
endif
ifeq (,$(findstring nostrip,$(DEB_BUILD_OPTIONS)))
INSTALL_PROGRAM += -s
endif
# shared library versions, option 1
#version=2.0.5
#major=2
# option 2, assuming the library is created as src/.libs/libfoo.so.2.0.5 or so
version=`ls src/.libs/lib*.so.* | \
awk '{if (match($$0,/[0-9]+\.[0-9]+\.[0-9]+$$/)) print substr($$0,RSTART)}'`
major=`ls src/.libs/lib*.so.* | \
awk '{if (match($$0,/\.so\.[0-9]+$$/)) print substr($$0,RSTART+4)}'`
config.status: configure
dh_testdir
# Add here commands to configure the package.
CFLAGS="$(CFLAGS)" ./configure --host=$(DEB_HOST_GNU_TYPE) --build=$(DEB_BUILD_GNU_TYPE) --prefix=/usr --mandir=\$${prefix}/share/man --infodir=\$${prefix}/share/info
build: build-stamp
build-stamp: config.status
dh_testdir
# Add here commands to compile the package.
$(MAKE)
touch build-stamp
clean:
dh_testdir
dh_testroot
rm -f build-stamp
# Add here commands to clean up after the build process.
-$(MAKE) distclean
ifneq "$(wildcard /usr/share/misc/config.sub)" ""
cp -f /usr/share/misc/config.sub config.sub
endif
ifneq "$(wildcard /usr/share/misc/config.guess)" ""
cp -f /usr/share/misc/config.guess config.guess
endif
dh_clean
install: build
dh_testdir
dh_testroot
dh_clean -k
dh_installdirs
# Add here commands to install the package into debian/tmp
$(MAKE) install DESTDIR=$(CURDIR)/debian/tmp
# Build architecture-independent files here.
binary-indep: build install
# We have nothing to do by default.
# Build architecture-dependent files here.
binary-arch: build install
dh_testdir
dh_testroot
dh_installchangelogs ChangeLog
dh_installdocs
dh_installexamples
dh_install --sourcedir=debian/tmp
# dh_installmenu
# dh_installdebconf
# dh_installlogrotate
# dh_installemacsen
# dh_installpam
# dh_installmime
# dh_installinit
# dh_installcron
# dh_installinfo
dh_installman
dh_link
dh_strip
dh_compress
dh_fixperms
# dh_perl
# dh_python
dh_makeshlibs
dh_installdeb
dh_shlibdeps
dh_gencontrol
dh_md5sums
dh_builddeb
binary: binary-indep binary-arch
.PHONY: build clean binary-indep binary-arch binary install
sparsehash-1.10/packages/deb/sparsehash.dirs 0000444 0000764 0011610 00000000071 11446763731 016061 0000000 0000000 usr/include
usr/include/google
usr/lib
usr/lib/pkgconfig
sparsehash-1.10/packages/deb/sparsehash.install 0000444 0000764 0011610 00000000150 11446763707 016567 0000000 0000000 usr/include/google/*
usr/lib/pkgconfig/*
debian/tmp/usr/include/google/*
debian/tmp/usr/lib/pkgconfig/*
sparsehash-1.10/src/ 0000777 0000764 0011610 00000000000 11516450313 011367 5 0000000 0000000 sparsehash-1.10/src/google/ 0000777 0000764 0011610 00000000000 11516450312 012642 5 0000000 0000000 sparsehash-1.10/src/google/sparsehash/ 0000777 0000764 0011610 00000000000 11516450313 015004 5 0000000 0000000 sparsehash-1.10/src/google/sparsehash/densehashtable.h 0000444 0000764 0011610 00000146011 11471027064 020047 0000000 0000000 // Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
// Author: Craig Silverstein
//
// A dense hashtable is a particular implementation of
// a hashtable: one that is meant to minimize memory allocation.
// It does this by using an array to store all the data. We
// steal a value from the key space to indicate "empty" array
// elements (ie indices where no item lives) and another to indicate
// "deleted" elements.
//
// (Note it is possible to change the value of the delete key
// on the fly; you can even remove it, though after that point
// the hashtable is insert_only until you set it again. The empty
// value however can't be changed.)
//
// To minimize allocation and pointer overhead, we use internal
// probing, in which the hashtable is a single table, and collisions
// are resolved by trying to insert again in another bucket. The
// most cache-efficient internal probing schemes are linear probing
// (which suffers, alas, from clumping) and quadratic probing, which
// is what we implement by default.
//
// Type requirements: value_type is required to be Copy Constructible
// and Default Constructible. It is not required to be (and commonly
// isn't) Assignable.
//
// You probably shouldn't use this code directly. Use
// <google/dense_hash_map> or <google/dense_hash_set> instead.
// You can change the following below:
// HT_OCCUPANCY_PCT -- how full before we double size
// HT_EMPTY_PCT -- how empty before we halve size
// HT_MIN_BUCKETS -- default smallest bucket size
//
// You can also change enlarge_factor (which defaults to
// HT_OCCUPANCY_PCT), and shrink_factor (which defaults to
// HT_EMPTY_PCT) with set_resizing_parameters().
//
// How to decide what values to use?
// shrink_factor's default of .4 * OCCUPANCY_PCT, is probably good.
// HT_MIN_BUCKETS is probably unnecessary since you can specify
// (indirectly) the starting number of buckets at construct-time.
// For enlarge_factor, you can use this chart to try to trade-off
// expected lookup time to the space taken up. By default, this
// code uses quadratic probing, though you can change it to linear
// via _JUMP below if you really want to.
//
// From http://www.augustana.ca/~mohrj/courses/1999.fall/csc210/lecture_notes/hashing.html
// NUMBER OF PROBES / LOOKUP Successful Unsuccessful
// Quadratic collision resolution 1 - ln(1-L) - L/2 1/(1-L) - L - ln(1-L)
// Linear collision resolution [1+1/(1-L)]/2 [1+1/(1-L)^2]/2
//
// -- enlarge_factor -- 0.10 0.50 0.60 0.75 0.80 0.90 0.99
// QUADRATIC COLLISION RES.
// probes/successful lookup 1.05 1.44 1.62 2.01 2.21 2.85 5.11
// probes/unsuccessful lookup 1.11 2.19 2.82 4.64 5.81 11.4 103.6
// LINEAR COLLISION RES.
// probes/successful lookup 1.06 1.5 1.75 2.5 3.0 5.5 50.5
// probes/unsuccessful lookup 1.12 2.5 3.6 8.5 13.0 50.0 5000.0
#ifndef _DENSEHASHTABLE_H_
#define _DENSEHASHTABLE_H_
// The probing method
// Linear probing
// #define JUMP_(key, num_probes) ( 1 )
// Quadratic probing
#define JUMP_(key, num_probes) ( num_probes )
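// With JUMP_(key, num_probes) == num_probes, the n-th probe lands
// 1+2+...+n == n(n+1)/2 buckets past the home bucket, i.e. at the
// triangular-number offsets 1, 3, 6, 10, ...; defining JUMP_ as 1
// instead gives plain linear probing.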
#include <google/sparsehash/sparseconfig.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>              // for abort()
#include <algorithm>             // For swap(), eg
#include <stdexcept>             // For length_error
#include <iostream>              // For cerr
#include <memory>                // For uninitialized_fill, uninitialized_copy
#include <utility>               // for pair<>
#include <iterator>              // for facts about iterator tags
#include <limits>                // for numeric_limits<>
#include <google/sparsehash/libc_allocator_with_realloc.h>
#include <google/sparsehash/hashtable-common.h>
#include <google/type_traits.h>  // for true_type, integral_constant, etc.
_START_GOOGLE_NAMESPACE_
using STL_NAMESPACE::pair;
// Hashtable class, used to implement the hashed associative containers
// hash_set and hash_map.
// Value: what is stored in the table (each bucket is a Value).
// Key: something in a 1-to-1 correspondence to a Value, that can be used
// to search for a Value in the table (find() takes a Key).
// HashFcn: Takes a Key and returns an integer, the more unique the better.
// ExtractKey: given a Value, returns the unique Key associated with it.
// Must inherit from unary_function<Value, Key>, or at least have a
// result_type enum indicating the return type of operator().
// SetKey: given a Value* and a Key, modifies the value such that
// ExtractKey(value) == key. We guarantee this is only called
// with key == deleted_key or key == empty_key.
// EqualKey: Given two Keys, says whether they are the same (that is,
// if they are both associated with the same Value).
// Alloc: STL allocator to use to allocate memory.
template <class Value, class Key, class HashFcn,
          class ExtractKey, class SetKey, class EqualKey, class Alloc>
class dense_hashtable;

template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct dense_hashtable_iterator;

template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct dense_hashtable_const_iterator;
// We're just an array, but we need to skip over empty and deleted elements
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct dense_hashtable_iterator {
private:
typedef typename A::template rebind<V>::other value_alloc_type;
public:
typedef dense_hashtable_iterator<V,K,HF,ExK,SetK,EqK,A> iterator;
typedef dense_hashtable_const_iterator<V,K,HF,ExK,SetK,EqK,A> const_iterator;
typedef STL_NAMESPACE::forward_iterator_tag iterator_category;
typedef V value_type;
typedef typename value_alloc_type::difference_type difference_type;
typedef typename value_alloc_type::size_type size_type;
typedef typename value_alloc_type::reference reference;
typedef typename value_alloc_type::pointer pointer;
// "Real" constructor and default constructor
dense_hashtable_iterator(const dense_hashtable<V,K,HF,ExK,SetK,EqK,A> *h,
pointer it, pointer it_end, bool advance)
: ht(h), pos(it), end(it_end) {
if (advance) advance_past_empty_and_deleted();
}
dense_hashtable_iterator() { }
// The default destructor is fine; we don't define one
// The default operator= is fine; we don't define one
// Happy dereferencer
reference operator*() const { return *pos; }
pointer operator->() const { return &(operator*()); }
// Arithmetic. The only hard part is making sure that
// we're not on an empty or marked-deleted array element
void advance_past_empty_and_deleted() {
while ( pos != end && (ht->test_empty(*this) || ht->test_deleted(*this)) )
++pos;
}
iterator& operator++() {
assert(pos != end); ++pos; advance_past_empty_and_deleted(); return *this;
}
iterator operator++(int) { iterator tmp(*this); ++*this; return tmp; }
// Comparison.
bool operator==(const iterator& it) const { return pos == it.pos; }
bool operator!=(const iterator& it) const { return pos != it.pos; }
// The actual data
const dense_hashtable<V,K,HF,ExK,SetK,EqK,A> *ht;
pointer pos, end;
};
// Now do it all again, but with const-ness!
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct dense_hashtable_const_iterator {
private:
typedef typename A::template rebind<V>::other value_alloc_type;
public:
typedef dense_hashtable_iterator<V,K,HF,ExK,SetK,EqK,A> iterator;
typedef dense_hashtable_const_iterator<V,K,HF,ExK,SetK,EqK,A> const_iterator;
typedef STL_NAMESPACE::forward_iterator_tag iterator_category;
typedef V value_type;
typedef typename value_alloc_type::difference_type difference_type;
typedef typename value_alloc_type::size_type size_type;
typedef typename value_alloc_type::const_reference reference;
typedef typename value_alloc_type::const_pointer pointer;
// "Real" constructor and default constructor
dense_hashtable_const_iterator(
const dense_hashtable<V,K,HF,ExK,SetK,EqK,A> *h,
pointer it, pointer it_end, bool advance)
: ht(h), pos(it), end(it_end) {
if (advance) advance_past_empty_and_deleted();
}
dense_hashtable_const_iterator()
: ht(NULL), pos(pointer()), end(pointer()) { }
// This lets us convert regular iterators to const iterators
dense_hashtable_const_iterator(const iterator &it)
: ht(it.ht), pos(it.pos), end(it.end) { }
// The default destructor is fine; we don't define one
// The default operator= is fine; we don't define one
// Happy dereferencer
reference operator*() const { return *pos; }
pointer operator->() const { return &(operator*()); }
// Arithmetic. The only hard part is making sure that
// we're not on an empty or marked-deleted array element
void advance_past_empty_and_deleted() {
while ( pos != end && (ht->test_empty(*this) || ht->test_deleted(*this)) )
++pos;
}
const_iterator& operator++() {
assert(pos != end); ++pos; advance_past_empty_and_deleted(); return *this;
}
const_iterator operator++(int) { const_iterator tmp(*this); ++*this; return tmp; }
// Comparison.
bool operator==(const const_iterator& it) const { return pos == it.pos; }
bool operator!=(const const_iterator& it) const { return pos != it.pos; }
// The actual data
const dense_hashtable<V,K,HF,ExK,SetK,EqK,A> *ht;
pointer pos, end;
};
template <class Value, class Key, class HashFcn,
          class ExtractKey, class SetKey, class EqualKey, class Alloc>
class dense_hashtable {
private:
typedef typename Alloc::template rebind<Value>::other value_alloc_type;
public:
typedef Key key_type;
typedef Value value_type;
typedef HashFcn hasher;
typedef EqualKey key_equal;
typedef Alloc allocator_type;
typedef typename value_alloc_type::size_type size_type;
typedef typename value_alloc_type::difference_type difference_type;
typedef typename value_alloc_type::reference reference;
typedef typename value_alloc_type::const_reference const_reference;
typedef typename value_alloc_type::pointer pointer;
typedef typename value_alloc_type::const_pointer const_pointer;
typedef dense_hashtable_iterator<Value, Key, HashFcn,
                                 ExtractKey, SetKey, EqualKey, Alloc>
iterator;
typedef dense_hashtable_const_iterator<Value, Key, HashFcn,
                                       ExtractKey, SetKey, EqualKey, Alloc>
const_iterator;
// These come from tr1. For us they're the same as regular iterators.
typedef iterator local_iterator;
typedef const_iterator const_local_iterator;
// How full we let the table get before we resize, by default.
// Knuth says .8 is good -- higher causes us to probe too much,
// though it saves memory.
static const int HT_OCCUPANCY_PCT; // = 50 (out of 100)
// How empty we let the table get before we resize lower, by default.
// (0.0 means never resize lower.)
// It should be less than OCCUPANCY_PCT / 2 or we thrash resizing
static const int HT_EMPTY_PCT; // = 0.4 * HT_OCCUPANCY_PCT;
// Minimum size we're willing to let hashtables be.
// Must be a power of two, and at least 4.
// Note, however, that for a given hashtable, the initial size is a
// function of the first constructor arg, and may be >HT_MIN_BUCKETS.
static const size_type HT_MIN_BUCKETS = 4;
// By default, if you don't specify a hashtable size at
// construction-time, we use this size. Must be a power of two, and
// at least HT_MIN_BUCKETS.
static const size_type HT_DEFAULT_STARTING_BUCKETS = 32;
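// Note: because bucket counts are always powers of two, the code below
// can reduce a hash to a bucket index with "hash & (bucket_count()-1)"
// rather than the slower "hash % bucket_count()".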
// ITERATOR FUNCTIONS
iterator begin() { return iterator(this, table,
table + num_buckets, true); }
iterator end() { return iterator(this, table + num_buckets,
table + num_buckets, true); }
const_iterator begin() const { return const_iterator(this, table,
table+num_buckets,true);}
const_iterator end() const { return const_iterator(this, table + num_buckets,
table+num_buckets,true);}
// These come from tr1 unordered_map. They iterate over 'bucket' n.
// We'll just consider bucket n to be the n-th element of the table.
local_iterator begin(size_type i) {
return local_iterator(this, table + i, table + i+1, false);
}
local_iterator end(size_type i) {
local_iterator it = begin(i);
if (!test_empty(i) && !test_deleted(i))
++it;
return it;
}
const_local_iterator begin(size_type i) const {
return const_local_iterator(this, table + i, table + i+1, false);
}
const_local_iterator end(size_type i) const {
const_local_iterator it = begin(i);
if (!test_empty(i) && !test_deleted(i))
++it;
return it;
}
// ACCESSOR FUNCTIONS for the things we templatize on, basically
hasher hash_funct() const { return settings; }
key_equal key_eq() const { return key_info; }
allocator_type get_allocator() const {
return allocator_type(val_info);
}
// Accessor function for statistics gathering.
int num_table_copies() const { return settings.num_ht_copies(); }
private:
// Annoyingly, we can't copy values around, because they might have
// const components (they're probably pair<const X, Y>). We use
// explicit destructor invocation and placement new to get around
// this. Arg.
void set_value(pointer dst, const_reference src) {
dst->~value_type(); // delete the old value, if any
new(dst) value_type(src);
}
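// For example (illustrative, not part of this file): with a value_type
// like pair<const int, int>, plain assignment would not compile because
// of the const member, but the destroy-then-construct idiom above works:
//   typedef pair<const int, int> P;
//   P v(1, 2);
//   v.~P();           // explicitly end the old object's lifetime
//   new(&v) P(3, 4);  // placement-new a fresh object in the same storage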
void destroy_buckets(size_type first, size_type last) {
for ( ; first != last; ++first)
table[first].~value_type();
}
// DELETE HELPER FUNCTIONS
// This lets the user describe a key that will indicate deleted
// table entries. This key should be an "impossible" entry --
// if you try to insert it for real, you won't be able to retrieve it!
// (NB: while you pass in an entire value, only the key part is looked
// at. This is just because I don't know how to assign just a key.)
private:
void squash_deleted() { // gets rid of any deleted entries we have
if ( num_deleted ) { // get rid of deleted before writing
dense_hashtable tmp(*this); // copying will get rid of deleted
swap(tmp); // now we are tmp
}
assert(num_deleted == 0);
}
bool test_deleted_key(const key_type& key) const {
// The num_deleted test is crucial for read(): after read(), the ht values
// are garbage, and we don't want to think some of them are deleted.
// Invariant: !use_deleted implies num_deleted is 0.
assert(settings.use_deleted() || num_deleted == 0);
return num_deleted > 0 && equals(key_info.delkey, key);
}
public:
void set_deleted_key(const key_type &key) {
// the empty indicator (if specified) and the deleted indicator
// must be different
assert((!settings.use_empty() || !equals(key, get_key(val_info.emptyval)))
&& "Passed the empty-key to set_deleted_key");
// It's only safe to change what "deleted" means if we purge deleted guys
squash_deleted();
settings.set_use_deleted(true);
key_info.delkey = key;
}
void clear_deleted_key() {
squash_deleted();
settings.set_use_deleted(false);
}
key_type deleted_key() const {
assert(settings.use_deleted()
&& "Must set deleted key before calling deleted_key");
return key_info.delkey;
}
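// Illustrative usage via the public wrappers (hypothetical keys): the
// deleted-key must be a key you never insert for real, e.g.
//   dense_hash_map<int, int> m;
//   m.set_empty_key(-1);    // required before any other operation
//   m.set_deleted_key(-2);  // must differ from the empty key
//   m.erase(5);             // just overwrites 5's bucket with key -2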
// These are public so the iterators can use them
// True if the item at position bucknum is "deleted" marker
bool test_deleted(size_type bucknum) const {
return test_deleted_key(get_key(table[bucknum]));
}
bool test_deleted(const iterator &it) const {
return test_deleted_key(get_key(*it));
}
bool test_deleted(const const_iterator &it) const {
return test_deleted_key(get_key(*it));
}
private:
// Set it so test_deleted is true. Returns true if the object wasn't already deleted.
bool set_deleted(iterator &it) {
assert(settings.use_deleted());
bool retval = !test_deleted(it);
// &* converts from iterator to value-type.
set_key(&(*it), key_info.delkey);
return retval;
}
// Set it so test_deleted is false. true if object used to be deleted.
bool clear_deleted(iterator &it) {
assert(settings.use_deleted());
// Happens automatically when we assign something else in its place.
return test_deleted(it);
}
// We also allow you to set/clear the deleted bit on a const iterator.
// We allow a const_iterator for the same reason you can delete a
// const pointer: it's convenient, and semantically you can't use
// 'it' after it's been deleted anyway, so its const-ness doesn't
// really matter.
bool set_deleted(const_iterator &it) {
assert(settings.use_deleted());
bool retval = !test_deleted(it);
set_key(const_cast<pointer>(&(*it)), key_info.delkey);
return retval;
}
// Set it so test_deleted is false. true if object used to be deleted.
bool clear_deleted(const_iterator &it) {
assert(settings.use_deleted());
return test_deleted(it);
}
// EMPTY HELPER FUNCTIONS
// This lets the user describe a key that will indicate empty (unused)
// table entries. This key should be an "impossible" entry --
// if you try to insert it for real, you won't be able to retrieve it!
// (NB: while you pass in an entire value, only the key part is looked
// at. This is just because I don't know how to assign just a key.)
public:
// These are public so the iterators can use them
// True if the item at position bucknum is "empty" marker
bool test_empty(size_type bucknum) const {
assert(settings.use_empty()); // we always need to know what's empty!
return equals(get_key(val_info.emptyval), get_key(table[bucknum]));
}
bool test_empty(const iterator &it) const {
assert(settings.use_empty()); // we always need to know what's empty!
return equals(get_key(val_info.emptyval), get_key(*it));
}
bool test_empty(const const_iterator &it) const {
assert(settings.use_empty()); // we always need to know what's empty!
return equals(get_key(val_info.emptyval), get_key(*it));
}
private:
void fill_range_with_empty(pointer table_start, pointer table_end) {
STL_NAMESPACE::uninitialized_fill(table_start, table_end, val_info.emptyval);
}
public:
// TODO(csilvers): change all callers of this to pass in a key instead,
// and take a const key_type instead of const value_type.
void set_empty_key(const_reference val) {
// Once you set the empty key, you can't change it
assert(!settings.use_empty() && "Calling set_empty_key multiple times");
// The deleted indicator (if specified) and the empty indicator
// must be different.
assert((!settings.use_deleted() || !equals(get_key(val), key_info.delkey))
&& "Setting the empty key the same as the deleted key");
settings.set_use_empty(true);
set_value(&val_info.emptyval, val);
assert(!table); // must set before first use
// num_buckets was set in constructor even though table was NULL
table = val_info.allocate(num_buckets);
assert(table);
fill_range_with_empty(table, table + num_buckets);
}
// TODO(sjackman): return a key_type rather than a value_type
value_type empty_key() const {
assert(settings.use_empty());
return val_info.emptyval;
}
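// Note the ordering this implies: until set_empty_key() is called, table
// is NULL and no operation that touches buckets is legal. A minimal
// sketch of the required sequence at this level:
//   dense_hashtable<...> ht;    // num_buckets chosen, table still NULL
//   ht.set_empty_key(emptyv);   // allocates and fills all buckets
//   ht.insert(v);               // only now is insertion permitted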
// FUNCTIONS CONCERNING SIZE
public:
size_type size() const { return num_elements - num_deleted; }
size_type max_size() const { return val_info.max_size(); }
bool empty() const { return size() == 0; }
size_type bucket_count() const { return num_buckets; }
size_type max_bucket_count() const { return max_size(); }
size_type nonempty_bucket_count() const { return num_elements; }
// These are tr1 methods. Their idea of 'bucket' doesn't map well to
// what we do. We just say every bucket has 0 or 1 items in it.
size_type bucket_size(size_type i) const {
return begin(i) == end(i) ? 0 : 1;
}
private:
// Because of the above, size_type(-1) is never legal; use it for errors
static const size_type ILLEGAL_BUCKET = size_type(-1);
// Used after a string of deletes. Returns true if we actually shrunk.
// TODO(csilvers): take a delta so we can take into account inserts
// done after shrinking. Maybe make part of the Settings class?
bool maybe_shrink() {
assert(num_elements >= num_deleted);
assert((bucket_count() & (bucket_count()-1)) == 0); // is a power of two
assert(bucket_count() >= HT_MIN_BUCKETS);
bool retval = false;
// If you construct a hashtable with < HT_DEFAULT_STARTING_BUCKETS,
// we'll never shrink until you get relatively big, and we'll never
// shrink below HT_DEFAULT_STARTING_BUCKETS. Otherwise, something
// like "dense_hash_set x; x.insert(4); x.erase(4);" will
// shrink us down to HT_MIN_BUCKETS buckets, which is too small.
const size_type num_remain = num_elements - num_deleted;
const size_type shrink_threshold = settings.shrink_threshold();
if (shrink_threshold > 0 && num_remain < shrink_threshold &&
bucket_count() > HT_DEFAULT_STARTING_BUCKETS) {
const float shrink_factor = settings.shrink_factor();
size_type sz = bucket_count() / 2; // find how much we should shrink
while (sz > HT_DEFAULT_STARTING_BUCKETS &&
num_remain < sz * shrink_factor) {
sz /= 2; // stay a power of 2
}
dense_hashtable tmp(*this, sz); // Do the actual resizing
swap(tmp); // now we are tmp
retval = true;
}
settings.set_consider_shrink(false); // because we just considered it
return retval;
}
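// Worked example (illustrative numbers): with 1024 buckets and the
// default shrink factor of 0.2 (i.e. 0.4 * HT_OCCUPANCY_PCT/100), the
// shrink threshold is ~204 live elements; a long run of erases that
// drops us below it halves the table -- 512, 256, ... -- until
// num_remain >= sz * 0.2 or sz reaches HT_DEFAULT_STARTING_BUCKETS.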
// We'll let you resize a hashtable -- though this makes us copy all!
// When you resize, you say, "make it big enough for this many more elements"
// Returns true if we actually resized, false if size was already ok.
bool resize_delta(size_type delta) {
bool did_resize = false;
if ( settings.consider_shrink() ) { // see if lots of deletes happened
if ( maybe_shrink() )
did_resize = true;
}
if (num_elements >= (STL_NAMESPACE::numeric_limits<size_type>::max)() - delta)
throw std::length_error("resize overflow");
if ( bucket_count() >= HT_MIN_BUCKETS &&
(num_elements + delta) <= settings.enlarge_threshold() )
return did_resize; // we're ok as we are
// Sometimes, we need to resize just to get rid of all the
// "deleted" buckets that are clogging up the hashtable. So when
// deciding whether to resize, count the deleted buckets (which
// are currently taking up room). But later, when we decide what
// size to resize to, *don't* count deleted buckets, since they
// get discarded during the resize.
const size_type needed_size = settings.min_buckets(num_elements + delta, 0);
if ( needed_size <= bucket_count() ) // we have enough buckets
return did_resize;
size_type resize_to =
settings.min_buckets(num_elements - num_deleted + delta, bucket_count());
if (resize_to < needed_size && // may double resize_to
resize_to < (STL_NAMESPACE::numeric_limits<size_type>::max)() / 2) {
// This situation means that we have enough deleted elements,
// that once we purge them, we won't actually have needed to
// grow. But we may want to grow anyway: if we just purge one
// element, say, we'll have to grow anyway next time we
// insert. Might as well grow now, since we're already going
// through the trouble of copying (in order to purge the
// deleted elements).
const size_type target =
static_cast<size_type>(settings.shrink_size(resize_to*2));
if (num_elements - num_deleted + delta >= target) {
// Good, we won't be below the shrink threshold even if we double.
resize_to *= 2;
}
}
dense_hashtable tmp(*this, resize_to);
swap(tmp); // now we are tmp
return true;
}
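// Worked example (illustrative numbers): with HT_OCCUPANCY_PCT == 50
// and 32 buckets, enlarge_threshold() is 16, so an insert that would
// push num_elements past 16 copies everything into a table of at
// least 64 buckets (more, if delta is large).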
// We require table be not-NULL and empty before calling this.
void resize_table(size_type /*old_size*/, size_type new_size,
true_type) {
table = val_info.realloc_or_die(table, new_size);
}
void resize_table(size_type old_size, size_type new_size, false_type) {
val_info.deallocate(table, old_size);
table = val_info.allocate(new_size);
}
// Used to actually do the rehashing when we grow/shrink a hashtable
void copy_from(const dense_hashtable &ht, size_type min_buckets_wanted) {
clear_to_size(settings.min_buckets(ht.size(), min_buckets_wanted));
// We use a normal iterator to get non-deleted bcks from ht
// We could use insert() here, but since we know there are
// no duplicates and no deleted items, we can be more efficient
assert((bucket_count() & (bucket_count()-1)) == 0); // a power of two
for ( const_iterator it = ht.begin(); it != ht.end(); ++it ) {
size_type num_probes = 0; // how many times we've probed
size_type bucknum;
const size_type bucket_count_minus_one = bucket_count() - 1;
for (bucknum = hash(get_key(*it)) & bucket_count_minus_one;
!test_empty(bucknum); // not empty
bucknum = (bucknum + JUMP_(key, num_probes)) & bucket_count_minus_one) {
++num_probes;
assert(num_probes < bucket_count()
&& "Hashtable is full: an error in key_equal<> or hash<>");
}
set_value(&table[bucknum], *it); // copies the value to here
num_elements++;
}
settings.inc_num_ht_copies();
}
// Required by the spec for hashed associative container
public:
// Though the docs say this should be num_buckets, I think it's much
// more useful as num_elements. As a special feature, calling with
// req_elements==0 will cause us to shrink if we can, saving space.
void resize(size_type req_elements) { // resize to this or larger
if ( settings.consider_shrink() || req_elements == 0 )
maybe_shrink();
if ( req_elements > num_elements )
resize_delta(req_elements - num_elements);
}
// Get and change the value of shrink_factor and enlarge_factor. The
// description at the beginning of this file explains how to choose
// the values. Setting the shrink parameter to 0.0 ensures that the
// table never shrinks.
void get_resizing_parameters(float* shrink, float* grow) const {
*shrink = settings.shrink_factor();
*grow = settings.enlarge_factor();
}
void set_resizing_parameters(float shrink, float grow) {
settings.set_resizing_parameters(shrink, grow);
settings.reset_thresholds(bucket_count());
}
// CONSTRUCTORS -- as required by the specs, we take a size,
// but also let you specify a hashfunction, key comparator,
// and key extractor. We also define a copy constructor and =.
// DESTRUCTOR -- needs to free the table
explicit dense_hashtable(size_type expected_max_items_in_table = 0,
const HashFcn& hf = HashFcn(),
const EqualKey& eql = EqualKey(),
const ExtractKey& ext = ExtractKey(),
const SetKey& set = SetKey(),
const Alloc& alloc = Alloc())
: settings(hf),
key_info(ext, set, eql),
num_deleted(0),
num_elements(0),
num_buckets(expected_max_items_in_table == 0
? HT_DEFAULT_STARTING_BUCKETS
: settings.min_buckets(expected_max_items_in_table, 0)),
val_info(alloc_impl(alloc)),
table(NULL) {
// table is NULL until emptyval is set. However, we set num_buckets
// here so we know how much space to allocate once emptyval is set
settings.reset_thresholds(bucket_count());
}
// As a convenience for resize(), we allow an optional second argument
// which lets you make this new hashtable a different size than ht
dense_hashtable(const dense_hashtable& ht,
size_type min_buckets_wanted = HT_DEFAULT_STARTING_BUCKETS)
: settings(ht.settings),
key_info(ht.key_info),
num_deleted(0),
num_elements(0),
num_buckets(0),
val_info(ht.val_info),
table(NULL) {
if (!ht.settings.use_empty()) {
// If use_empty isn't set, copy_from will crash, so we do our own copying.
assert(ht.empty());
num_buckets = settings.min_buckets(ht.size(), min_buckets_wanted);
settings.reset_thresholds(bucket_count());
return;
}
settings.reset_thresholds(bucket_count());
copy_from(ht, min_buckets_wanted); // copy_from() ignores deleted entries
}
dense_hashtable& operator= (const dense_hashtable& ht) {
if (&ht == this) return *this; // don't copy onto ourselves
if (!ht.settings.use_empty()) {
assert(ht.empty());
dense_hashtable empty_table(ht); // empty table with ht's thresholds
this->swap(empty_table);
return *this;
}
settings = ht.settings;
key_info = ht.key_info;
set_value(&val_info.emptyval, ht.val_info.emptyval);
// copy_from() calls clear and sets num_deleted to 0 too
copy_from(ht, HT_MIN_BUCKETS);
// we purposefully don't copy the allocator, which may not be copyable
return *this;
}
~dense_hashtable() {
if (table) {
destroy_buckets(0, num_buckets);
val_info.deallocate(table, num_buckets);
}
}
// Many STL algorithms use swap instead of copy constructors
void swap(dense_hashtable& ht) {
STL_NAMESPACE::swap(settings, ht.settings);
STL_NAMESPACE::swap(key_info, ht.key_info);
STL_NAMESPACE::swap(num_deleted, ht.num_deleted);
STL_NAMESPACE::swap(num_elements, ht.num_elements);
STL_NAMESPACE::swap(num_buckets, ht.num_buckets);
{ value_type tmp; // for annoying reasons, swap() doesn't work
set_value(&tmp, val_info.emptyval);
set_value(&val_info.emptyval, ht.val_info.emptyval);
set_value(&ht.val_info.emptyval, tmp);
}
STL_NAMESPACE::swap(table, ht.table);
settings.reset_thresholds(bucket_count()); // this also resets consider_shrink
ht.settings.reset_thresholds(bucket_count());
// we purposefully don't swap the allocator, which may not be swap-able
}
private:
void clear_to_size(size_type new_num_buckets) {
if (!table) {
table = val_info.allocate(new_num_buckets);
} else {
destroy_buckets(0, num_buckets);
if (new_num_buckets != num_buckets) { // resize, if necessary
typedef integral_constant<bool,
        is_same<value_alloc_type,
                libc_allocator_with_realloc<value_type> >::value>
realloc_ok;
resize_table(num_buckets, new_num_buckets, realloc_ok());
}
}
assert(table);
fill_range_with_empty(table, table + new_num_buckets);
num_elements = 0;
num_deleted = 0;
num_buckets = new_num_buckets; // our new size
settings.reset_thresholds(bucket_count());
}
public:
// It's always nice to be able to clear a table without deallocating it
void clear() {
// If the table is already empty, and the number of buckets is
// already as we desire, there's nothing to do.
const size_type new_num_buckets = settings.min_buckets(0, 0);
if (num_elements == 0 && new_num_buckets == num_buckets) {
return;
}
clear_to_size(new_num_buckets);
}
// Clear the table without resizing it.
// Mimics the stl_hashtable's behaviour when clear()-ing in that it
// does not modify the bucket count
void clear_no_resize() {
if (num_elements > 0) {
assert(table);
destroy_buckets(0, num_buckets);
fill_range_with_empty(table, table + num_buckets);
}
// don't consider to shrink before another erase()
settings.reset_thresholds(bucket_count());
num_elements = 0;
num_deleted = 0;
}
// LOOKUP ROUTINES
private:
// Returns a pair of positions: 1st where the object is, 2nd where
// it would go if you wanted to insert it. 1st is ILLEGAL_BUCKET
// if object is not found; 2nd is ILLEGAL_BUCKET if it is.
// Note: because of deletions where-to-insert is not trivial: it's the
// first deleted bucket we see, as long as we don't find the key later
pair<size_type, size_type> find_position(const key_type &key) const {
size_type num_probes = 0; // how many times we've probed
const size_type bucket_count_minus_one = bucket_count() - 1;
size_type bucknum = hash(key) & bucket_count_minus_one;
size_type insert_pos = ILLEGAL_BUCKET; // where we would insert
while ( 1 ) { // probe until something happens
if ( test_empty(bucknum) ) { // bucket is empty
if ( insert_pos == ILLEGAL_BUCKET ) // found no prior place to insert
return pair<size_type,size_type>(ILLEGAL_BUCKET, bucknum);
else
return pair<size_type,size_type>(ILLEGAL_BUCKET, insert_pos);
} else if ( test_deleted(bucknum) ) {// keep searching, but mark to insert
if ( insert_pos == ILLEGAL_BUCKET )
insert_pos = bucknum;
} else if ( equals(key, get_key(table[bucknum])) ) {
return pair<size_type,size_type>(bucknum, ILLEGAL_BUCKET);
}
++num_probes; // we're doing another probe
bucknum = (bucknum + JUMP_(key, num_probes)) & bucket_count_minus_one;
assert(num_probes < bucket_count()
&& "Hashtable is full: an error in key_equal<> or hash<>");
}
}
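// Probe-sequence example: since JUMP_ expands to num_probes, a key
// hashing to bucket h in a table with mask m = bucket_count()-1 is
// looked for in
//   h, (h+1)&m, (h+3)&m, (h+6)&m, ...
// (triangular-number offsets, i.e. quadratic probing); for a
// power-of-two table this visits every bucket before any repeats.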
public:
iterator find(const key_type& key) {
if ( size() == 0 ) return end();
pair<size_type, size_type> pos = find_position(key);
if ( pos.first == ILLEGAL_BUCKET ) // alas, not there
return end();
else
return iterator(this, table + pos.first, table + num_buckets, false);
}
const_iterator find(const key_type& key) const {
if ( size() == 0 ) return end();
pair<size_type, size_type> pos = find_position(key);
if ( pos.first == ILLEGAL_BUCKET ) // alas, not there
return end();
else
return const_iterator(this, table + pos.first, table+num_buckets, false);
}
// This is a tr1 method: the bucket a given key is in, or what bucket
// it would be put in, if it were to be inserted. Shrug.
size_type bucket(const key_type& key) const {
pair<size_type, size_type> pos = find_position(key);
return pos.first == ILLEGAL_BUCKET ? pos.second : pos.first;
}
// Counts how many elements have key key. For maps, it's either 0 or 1.
size_type count(const key_type &key) const {
pair<size_type, size_type> pos = find_position(key);
return pos.first == ILLEGAL_BUCKET ? 0 : 1;
}
// Likewise, equal_range doesn't really make sense for us. Oh well.
pair<iterator,iterator> equal_range(const key_type& key) {
iterator pos = find(key); // either an iterator or end
if (pos == end()) {
return pair<iterator,iterator>(pos, pos);
} else {
const iterator startpos = pos++;
return pair<iterator,iterator>(startpos, pos);
}
}
pair<const_iterator,const_iterator> equal_range(const key_type& key) const {
const_iterator pos = find(key); // either an iterator or end
if (pos == end()) {
return pair<const_iterator,const_iterator>(pos, pos);
} else {
const const_iterator startpos = pos++;
return pair<const_iterator,const_iterator>(startpos, pos);
}
}
// INSERTION ROUTINES
private:
// Private method used by insert_noresize and find_or_insert.
iterator insert_at(const_reference obj, size_type pos) {
if (size() >= max_size())
throw std::length_error("insert overflow");
if ( test_deleted(pos) ) { // just replace if it's been del.
// shrug: shouldn't need to be const.
const_iterator delpos(this, table + pos, table + num_buckets, false);
clear_deleted(delpos);
assert( num_deleted > 0);
--num_deleted; // used to be, now it isn't
} else {
++num_elements; // replacing an empty bucket
}
set_value(&table[pos], obj);
return iterator(this, table + pos, table + num_buckets, false);
}
// If you know *this is big enough to hold obj, use this routine
pair<iterator, bool> insert_noresize(const_reference obj) {
// First, double-check we're not inserting delkey or emptyval
assert((!settings.use_empty() || !equals(get_key(obj),
get_key(val_info.emptyval)))
&& "Inserting the empty key");
assert((!settings.use_deleted() || !equals(get_key(obj), key_info.delkey))
&& "Inserting the deleted key");
const pair<size_type,size_type> pos = find_position(get_key(obj));
if ( pos.first != ILLEGAL_BUCKET) { // object was already there
return pair<iterator,bool>(iterator(this, table + pos.first,
table + num_buckets, false),
false); // false: we didn't insert
} else { // pos.second says where to put it
return pair<iterator,bool>(insert_at(obj, pos.second), true);
}
}
// Specializations of insert(it, it) depending on the power of the iterator:
// (1) Iterator supports operator-, resize before inserting
template <class ForwardIterator>
void insert(ForwardIterator f, ForwardIterator l, STL_NAMESPACE::forward_iterator_tag) {
size_t dist = STL_NAMESPACE::distance(f, l);
if (dist >= (std::numeric_limits<size_type>::max)())
throw std::length_error("insert-range overflow");
resize_delta(static_cast(dist));
for ( ; dist > 0; --dist, ++f) {
insert_noresize(*f);
}
}
// (2) Arbitrary iterator, can't tell how much to resize
template <class InputIterator>
void insert(InputIterator f, InputIterator l, STL_NAMESPACE::input_iterator_tag) {
for ( ; f != l; ++f)
insert(*f);
}
public:
// This is the normal insert routine, used by the outside world
pair<iterator, bool> insert(const_reference obj) {
resize_delta(1); // adding an object, grow if need be
return insert_noresize(obj);
}
// When inserting a lot at a time, we specialize on the type of iterator
template <class InputIterator>
void insert(InputIterator f, InputIterator l) {
// specializes on iterator type
insert(f, l, typename STL_NAMESPACE::iterator_traits<InputIterator>::iterator_category());
}
// DefaultValue is a functor that takes a key and returns a value_type
// representing the default value to be inserted if none is found.
template <class DefaultValue>
value_type& find_or_insert(const key_type& key) {
// First, double-check we're not inserting emptykey or delkey
assert((!settings.use_empty() || !equals(key, get_key(val_info.emptyval)))
&& "Inserting the empty key");
assert((!settings.use_deleted() || !equals(key, key_info.delkey))
&& "Inserting the deleted key");
const pair<size_type,size_type> pos = find_position(key);
DefaultValue default_value;
if ( pos.first != ILLEGAL_BUCKET) { // object was already there
return table[pos.first];
} else if (resize_delta(1)) { // needed to rehash to make room
// Since we resized, we can't use pos, so recalculate where to insert.
return *insert_noresize(default_value(key)).first;
} else { // no need to rehash, insert right here
return *insert_at(default_value(key), pos.second);
}
}
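// find_or_insert is the routine the map wrapper's operator[] builds on.
// An illustrative DefaultValue functor (assuming a map-like value_type
// of pair<const key_type, T>) could be:
//   struct DefaultValue {
//     value_type operator()(const key_type& k) {
//       return value_type(k, T());   // key plus default-constructed data
//     }
//   };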
// DELETION ROUTINES
size_type erase(const key_type& key) {
// First, double-check we're not trying to erase delkey or emptyval.
assert((!settings.use_empty() || !equals(key, get_key(val_info.emptyval)))
&& "Erasing the empty key");
assert((!settings.use_deleted() || !equals(key, key_info.delkey))
&& "Erasing the deleted key");
const_iterator pos = find(key); // shrug: shouldn't need to be const
if ( pos != end() ) {
assert(!test_deleted(pos)); // or find() shouldn't have returned it
set_deleted(pos);
++num_deleted;
settings.set_consider_shrink(true); // will think about shrink after next insert
return 1; // because we deleted one thing
} else {
return 0; // because we deleted nothing
}
}
// Erase the element pointed to by pos. (Note: we return void, not an
// iterator past the deleted item.)
void erase(iterator pos) {
if ( pos == end() ) return; // sanity check
if ( set_deleted(pos) ) { // true if object has been newly deleted
++num_deleted;
settings.set_consider_shrink(true); // will think about shrink after next insert
}
}
void erase(iterator f, iterator l) {
for ( ; f != l; ++f) {
if ( set_deleted(f) ) // should always be true
++num_deleted;
}
settings.set_consider_shrink(true); // will think about shrink after next insert
}
// We allow you to erase a const_iterator just like we allow you to
// erase an iterator. This is in parallel to 'delete': you can delete
// a const pointer just like a non-const pointer. The logic is that
// you can't use the object after it's erased anyway, so it doesn't matter
// if it's const or not.
void erase(const_iterator pos) {
if ( pos == end() ) return; // sanity check
if ( set_deleted(pos) ) { // true if object has been newly deleted
++num_deleted;
settings.set_consider_shrink(true); // will think about shrink after next insert
}
}
void erase(const_iterator f, const_iterator l) {
for ( ; f != l; ++f) {
if ( set_deleted(f) ) // should always be true
++num_deleted;
}
settings.set_consider_shrink(true); // will think about shrink after next insert
}
// COMPARISON
bool operator==(const dense_hashtable& ht) const {
if (size() != ht.size()) {
return false;
} else if (this == &ht) {
return true;
} else {
// Iterate through the elements in "this" and see if the
// corresponding element is in ht
for ( const_iterator it = begin(); it != end(); ++it ) {
const_iterator it2 = ht.find(get_key(*it));
if ((it2 == ht.end()) || (*it != *it2)) {
return false;
}
}
return true;
}
}
bool operator!=(const dense_hashtable& ht) const {
return !(*this == ht);
}
// I/O
// We support reading and writing hashtables to disk. Alas, since
// I don't know how to write a hasher or key_equal, you have to make
// sure everything but the table is the same. We compact before writing
//
// NOTE: These functions are currently TODO. They've not been implemented.
bool write_metadata(FILE * /*fp*/) {
squash_deleted(); // so we don't have to worry about delkey
return false; // TODO
}
bool read_metadata(FILE* /*fp*/) {
num_deleted = 0; // since we got rid before writing
assert(settings.use_empty() && "empty_key not set for read_metadata");
if (table) val_info.deallocate(table, num_buckets); // we'll make our own
// TODO: read magic number
// TODO: read num_buckets
settings.reset_thresholds(bucket_count());
table = val_info.allocate(num_buckets);
assert(table);
fill_range_with_empty(table, table + num_buckets);
// TODO: read num_elements
for ( size_type i = 0; i < num_elements; ++i ) {
// TODO: read bucket_num
// TODO: set with non-empty, non-deleted value
}
return false; // TODO
}
// If your keys and values are simple enough, we can write them to
// disk for you. "simple enough" means value_type is a POD type
// that contains no pointers. However, we don't try to normalize
// endianness
bool write_nopointer_data(FILE *fp) const {
for ( const_iterator it = begin(); it != end(); ++it ) {
// TODO: skip empty/deleted values
if ( !fwrite(&*it, sizeof(*it), 1, fp) ) return false;
}
return false;
}
// When reading, we have to override the potential const-ness of *it
bool read_nopointer_data(FILE *fp) {
for ( iterator it = begin(); it != end(); ++it ) {
// TODO: skip empty/deleted values
if ( !fread(reinterpret_cast(&(*it)), sizeof(*it), 1, fp) )
return false;
}
return false;
}
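// Sketch of the intended round-trip once the TODOs above are filled in
// (assumes a POD, pointer-free value_type and the same table layout on
// the reading side):
//   FILE* fp = fopen("table.bin", "wb");
//   ht.write_metadata(fp);        // squashes deleted entries first
//   ht.write_nopointer_data(fp);  // raw fwrite of each bucket
//   fclose(fp);
// and, symmetrically, read_metadata() then read_nopointer_data() to load.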
private:
template <class A>
class alloc_impl : public A {
public:
typedef typename A::pointer pointer;
typedef typename A::size_type size_type;
// Convert a normal allocator to one that has realloc_or_die()
alloc_impl(const A& a) : A(a) { }
// realloc_or_die should only be used when using the default
// allocator (libc_allocator_with_realloc).
pointer realloc_or_die(pointer /*ptr*/, size_type /*n*/) {
fprintf(stderr, "realloc_or_die is only supported for "
"libc_allocator_with_realloc");
exit(1);
return NULL;
}
};
// A template specialization of alloc_impl for
// libc_allocator_with_realloc that can handle realloc_or_die.
template <class A>
class alloc_impl<libc_allocator_with_realloc<A> >
    : public libc_allocator_with_realloc<A> {
public:
typedef typename libc_allocator_with_realloc<A>::pointer pointer;
typedef typename libc_allocator_with_realloc<A>::size_type size_type;
alloc_impl(const libc_allocator_with_realloc<A>& a)
    : libc_allocator_with_realloc<A>(a) { }
pointer realloc_or_die(pointer ptr, size_type n) {
pointer retval = this->reallocate(ptr, n);
if (retval == NULL) {
// We really should use PRIuS here, but I don't want to have to add
// a whole new configure option, with concomitant macro namespace
// pollution, just to print this (unlikely) error message. So I cast.
fprintf(stderr, "sparsehash: FATAL ERROR: failed to reallocate "
"%lu elements for ptr %p",
static_cast<unsigned long>(n), ptr);
exit(1);
}
return retval;
}
};
// Package allocator with emptyval to eliminate memory needed for
// the zero-size allocator.
// If new fields are added to this class, we should add them to
// operator= and swap.
class ValInfo : public alloc_impl<value_alloc_type> {
public:
typedef typename alloc_impl<value_alloc_type>::value_type value_type;
ValInfo(const alloc_impl<value_alloc_type>& a)
    : alloc_impl<value_alloc_type>(a), emptyval() { }
ValInfo(const ValInfo& v)
    : alloc_impl<value_alloc_type>(v), emptyval(v.emptyval) { }
value_type emptyval; // which key marks unused entries
};
// Package functors with another class to eliminate memory needed for
// zero-size functors. Since ExtractKey and hasher's operator() might
// have the same function signature, they must be packaged in
// different classes.
struct Settings :
    sh_hashtable_settings<key_type, hasher, size_type, HT_MIN_BUCKETS> {
explicit Settings(const hasher& hf)
    : sh_hashtable_settings<key_type, hasher, size_type, HT_MIN_BUCKETS>(
        hf, HT_OCCUPANCY_PCT / 100.0f, HT_EMPTY_PCT / 100.0f) {}
};
// Packages ExtractKey and SetKey functors.
class KeyInfo : public ExtractKey, public SetKey, public key_equal {
public:
KeyInfo(const ExtractKey& ek, const SetKey& sk, const key_equal& eq)
: ExtractKey(ek),
SetKey(sk),
key_equal(eq) {
}
// We want to return the exact same type as ExtractKey: Key or const Key&
typename ExtractKey::result_type get_key(const_reference v) const {
return ExtractKey::operator()(v);
}
void set_key(pointer v, const key_type& k) const {
SetKey::operator()(v, k);
}
bool equals(const key_type& a, const key_type& b) const {
return key_equal::operator()(a, b);
}
// Which key marks deleted entries.
// TODO(csilvers): make a pointer, and get rid of use_deleted (benchmark!)
typename remove_const<key_type>::type delkey;
};
// Utility functions to access the templated operators
size_type hash(const key_type& v) const {
return settings.hash(v);
}
bool equals(const key_type& a, const key_type& b) const {
return key_info.equals(a, b);
}
typename ExtractKey::result_type get_key(const_reference v) const {
return key_info.get_key(v);
}
void set_key(pointer v, const key_type& k) const {
key_info.set_key(v, k);
}
private:
// Actual data
Settings settings;
KeyInfo key_info;
size_type num_deleted; // how many occupied buckets are marked deleted
size_type num_elements;
size_type num_buckets;
ValInfo val_info; // holds emptyval, and also the allocator
pointer table;
};
// We need a global swap as well
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
inline void swap(dense_hashtable<V,K,HF,ExK,SetK,EqK,A> &x,
                 dense_hashtable<V,K,HF,ExK,SetK,EqK,A> &y) {
x.swap(y);
}
#undef JUMP_
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
const typename dense_hashtable<V,K,HF,ExK,SetK,EqK,A>::size_type
dense_hashtable<V,K,HF,ExK,SetK,EqK,A>::ILLEGAL_BUCKET;
// How full we let the table get before we resize. Knuth says .8 is
// good -- higher causes us to probe too much, though saves memory.
// However, we go with .5, getting better performance at the cost of
// more space (a trade-off densehashtable explicitly chooses to make).
// Feel free to play around with different values, though.
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
const int dense_hashtable<V,K,HF,ExK,SetK,EqK,A>::HT_OCCUPANCY_PCT = 50;
// How empty we let the table get before we resize lower.
// It should be less than OCCUPANCY_PCT / 2 or we thrash resizing
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
const int dense_hashtable<V,K,HF,ExK,SetK,EqK,A>::HT_EMPTY_PCT
= static_cast<int>(0.4 *
                   dense_hashtable<V,K,HF,ExK,SetK,EqK,A>::HT_OCCUPANCY_PCT);
_END_GOOGLE_NAMESPACE_
#endif /* _DENSEHASHTABLE_H_ */
sparsehash-1.10/src/google/sparsehash/sparsehashtable.h
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
// Author: Craig Silverstein
//
// A sparse hashtable is a particular implementation of
// a hashtable: one that is meant to minimize memory use.
// It does this by using a *sparse table* (cf sparsetable.h),
// which uses between 1 and 2 bits to store empty buckets
// (we may need another bit for hashtables that support deletion).
//
// When empty buckets are so cheap, an appealing hashtable
// implementation is internal probing, in which the hashtable
// is a single table, and collisions are resolved by trying
// to insert again in another bucket. The most cache-efficient
// internal probing schemes are linear probing (which suffers,
// alas, from clumping) and quadratic probing, which is what
// we implement by default.
//
// Deleted buckets are a bit of a pain. We have to somehow mark
// deleted buckets (the probing must distinguish them from empty
// buckets). The most principled way is to have another bitmap,
// but that's annoying and takes up space. Instead we let the
// user specify an "impossible" key. We set deleted buckets
// to have the impossible key.
//
// Note it is possible to change the value of the delete key
// on the fly; you can even remove it, though after that point
// the hashtable is insert_only until you set it again.
//
// You probably shouldn't use this code directly. Use
// <google/sparse_hash_map> or <google/sparse_hash_set> instead.
//
// You can modify the following, below:
// HT_OCCUPANCY_PCT -- how full before we double size
// HT_EMPTY_PCT -- how empty before we halve size
// HT_MIN_BUCKETS -- smallest bucket size
// HT_DEFAULT_STARTING_BUCKETS -- default bucket size at construct-time
//
// You can also change enlarge_factor (which defaults to
// HT_OCCUPANCY_PCT), and shrink_factor (which defaults to
// HT_EMPTY_PCT) with set_resizing_parameters().
//
// How to decide what values to use?
// shrink_factor's default of .4 * OCCUPANCY_PCT is probably good.
// HT_MIN_BUCKETS is probably unnecessary since you can specify
// (indirectly) the starting number of buckets at construct-time.
// For enlarge_factor, you can use this chart to try to trade-off
// expected lookup time to the space taken up. By default, this
// code uses quadratic probing, though you can change it to linear
// via JUMP_ below if you really want to.
//
// From http://www.augustana.ca/~mohrj/courses/1999.fall/csc210/lecture_notes/hashing.html
// NUMBER OF PROBES / LOOKUP Successful Unsuccessful
// Quadratic collision resolution 1 - ln(1-L) - L/2 1/(1-L) - L - ln(1-L)
// Linear collision resolution [1+1/(1-L)]/2 [1+1/(1-L)^2]/2
//
// -- enlarge_factor -- 0.10 0.50 0.60 0.75 0.80 0.90 0.99
// QUADRATIC COLLISION RES.
// probes/successful lookup 1.05 1.44 1.62 2.01 2.21 2.85 5.11
// probes/unsuccessful lookup 1.11 2.19 2.82 4.64 5.81 11.4 103.6
// LINEAR COLLISION RES.
// probes/successful lookup 1.06 1.5 1.75 2.5 3.0 5.5 50.5
// probes/unsuccessful lookup 1.12 2.5 3.6 8.5 13.0 50.0 5000.0
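// As a worked check of the quadratic row at L = 0.50:
//   successful:   1 - ln(1-.5) - .5/2      = 1 + 0.693 - 0.25 ~= 1.44
//   unsuccessful: 1/(1-.5) - .5 - ln(1-.5) = 2 - 0.5 + 0.693  ~= 2.19
// matching the 0.50 column above.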
//
// The value type is required to be copy constructible and default
// constructible, but it need not be (and commonly isn't) assignable.
#ifndef _SPARSEHASHTABLE_H_
#define _SPARSEHASHTABLE_H_
#ifndef SPARSEHASH_STAT_UPDATE
#define SPARSEHASH_STAT_UPDATE(x) ((void) 0)
#endif
// The probing method
// Linear probing
// #define JUMP_(key, num_probes) ( 1 )
// Quadratic probing
#define JUMP_(key, num_probes) ( num_probes )
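// With JUMP_ == num_probes, the i-th probe lands at offset
// 1 + 2 + ... + i = i*(i+1)/2 from the initial bucket (mod table size);
// these triangular-number offsets visit every bucket of a power-of-two
// table exactly once before repeating.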
#include <config.h>
#include <assert.h>
#include <algorithm> // For swap(), eg
#include <stdexcept> // For length_error
#include <iterator> // for facts about iterator tags
#include <limits> // for numeric_limits<>
#include <utility> // for pair<>
#include <google/sparsehash/hashtable-common.h>
#include <google/sparsetable> // Since that's basically what we are
_START_GOOGLE_NAMESPACE_
using STL_NAMESPACE::pair;
// The smaller this is, the faster lookup is (because the group bitmap is
// smaller) and the faster insert is, because there's less to move.
// On the other hand, there are more groups. Since group::size_type is
// a short, this number should be of the form 32*x + 16 to avoid waste.
static const u_int16_t DEFAULT_GROUP_SIZE = 48; // fits in 1.5 words
// Hashtable class, used to implement the hashed associative containers
// hash_set and hash_map.
//
// Value: what is stored in the table (each bucket is a Value).
// Key: something in a 1-to-1 correspondence to a Value, that can be used
// to search for a Value in the table (find() takes a Key).
// HashFcn: Takes a Key and returns an integer, the more unique the better.
// ExtractKey: given a Value, returns the unique Key associated with it.
// Must inherit from unary_function, or at least have a
// result_type enum indicating the return type of operator().
// SetKey: given a Value* and a Key, modifies the value such that
// ExtractKey(value) == key. We guarantee this is only called
// with key == deleted_key.
// EqualKey: Given two Keys, says whether they are the same (that is,
// if they are both associated with the same Value).
// Alloc: STL allocator to use to allocate memory.
template <class Value, class Key, class HashFcn,
          class ExtractKey, class SetKey, class EqualKey, class Alloc>
class sparse_hashtable;
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct sparse_hashtable_iterator;
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct sparse_hashtable_const_iterator;
// As far as iterating, we're basically just a sparsetable
// that skips over deleted elements.
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct sparse_hashtable_iterator {
private:
typedef typename A::template rebind<V>::other value_alloc_type;
public:
typedef sparse_hashtable_iterator<V,K,HF,ExK,SetK,EqK,A> iterator;
typedef sparse_hashtable_const_iterator<V,K,HF,ExK,SetK,EqK,A> const_iterator;
typedef typename sparsetable<V,DEFAULT_GROUP_SIZE,value_alloc_type>::nonempty_iterator
st_iterator;
typedef STL_NAMESPACE::forward_iterator_tag iterator_category;
typedef V value_type;
typedef typename value_alloc_type::difference_type difference_type;
typedef typename value_alloc_type::size_type size_type;
typedef typename value_alloc_type::reference reference;
typedef typename value_alloc_type::pointer pointer;
// "Real" constructor and default constructor
sparse_hashtable_iterator(const sparse_hashtable<V,K,HF,ExK,SetK,EqK,A> *h,
st_iterator it, st_iterator it_end)
: ht(h), pos(it), end(it_end) { advance_past_deleted(); }
sparse_hashtable_iterator() { } // not ever used internally
// The default destructor is fine; we don't define one
// The default operator= is fine; we don't define one
// Happy dereferencer
reference operator*() const { return *pos; }
pointer operator->() const { return &(operator*()); }
// Arithmetic. The only hard part is making sure that
// we're not on a marked-deleted array element
void advance_past_deleted() {
while ( pos != end && ht->test_deleted(*this) )
++pos;
}
iterator& operator++() {
assert(pos != end); ++pos; advance_past_deleted(); return *this;
}
iterator operator++(int) { iterator tmp(*this); ++*this; return tmp; }
// Comparison.
bool operator==(const iterator& it) const { return pos == it.pos; }
bool operator!=(const iterator& it) const { return pos != it.pos; }
// The actual data
const sparse_hashtable<V,K,HF,ExK,SetK,EqK,A> *ht;
st_iterator pos, end;
};
// Now do it all again, but with const-ness!
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct sparse_hashtable_const_iterator {
private:
typedef typename A::template rebind<V>::other value_alloc_type;
public:
typedef sparse_hashtable_iterator<V,K,HF,ExK,SetK,EqK,A> iterator;
typedef sparse_hashtable_const_iterator<V,K,HF,ExK,SetK,EqK,A> const_iterator;
typedef typename sparsetable<V,DEFAULT_GROUP_SIZE,value_alloc_type>::const_nonempty_iterator
st_iterator;
typedef STL_NAMESPACE::forward_iterator_tag iterator_category;
typedef V value_type;
typedef typename value_alloc_type::difference_type difference_type;
typedef typename value_alloc_type::size_type size_type;
typedef typename value_alloc_type::const_reference reference;
typedef typename value_alloc_type::const_pointer pointer;
// "Real" constructor and default constructor
sparse_hashtable_const_iterator(const sparse_hashtable<V,K,HF,ExK,SetK,EqK,A> *h,
st_iterator it, st_iterator it_end)
: ht(h), pos(it), end(it_end) { advance_past_deleted(); }
// This lets us convert regular iterators to const iterators
sparse_hashtable_const_iterator() { } // never used internally
sparse_hashtable_const_iterator(const iterator &it)
: ht(it.ht), pos(it.pos), end(it.end) { }
// The default destructor is fine; we don't define one
// The default operator= is fine; we don't define one
// Happy dereferencer
reference operator*() const { return *pos; }
pointer operator->() const { return &(operator*()); }
// Arithmetic. The only hard part is making sure that
// we're not on a marked-deleted array element
void advance_past_deleted() {
while ( pos != end && ht->test_deleted(*this) )
++pos;
}
const_iterator& operator++() {
assert(pos != end); ++pos; advance_past_deleted(); return *this;
}
const_iterator operator++(int) { const_iterator tmp(*this); ++*this; return tmp; }
// Comparison.
bool operator==(const const_iterator& it) const { return pos == it.pos; }
bool operator!=(const const_iterator& it) const { return pos != it.pos; }
// The actual data
const sparse_hashtable<V,K,HF,ExK,SetK,EqK,A> *ht;
st_iterator pos, end;
};
// And once again, but this time freeing up memory as we iterate
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
struct sparse_hashtable_destructive_iterator {
private:
typedef typename A::template rebind<V>::other value_alloc_type;
public:
typedef sparse_hashtable_destructive_iterator<V,K,HF,ExK,SetK,EqK,A> iterator;
typedef typename sparsetable<V,DEFAULT_GROUP_SIZE,value_alloc_type>::destructive_iterator
st_iterator;
typedef STL_NAMESPACE::forward_iterator_tag iterator_category;
typedef V value_type;
typedef typename value_alloc_type::difference_type difference_type;
typedef typename value_alloc_type::size_type size_type;
typedef typename value_alloc_type::reference reference;
typedef typename value_alloc_type::pointer pointer;
// "Real" constructor and default constructor
sparse_hashtable_destructive_iterator(const
sparse_hashtable<V,K,HF,ExK,SetK,EqK,A> *h,
st_iterator it, st_iterator it_end)
: ht(h), pos(it), end(it_end) { advance_past_deleted(); }
sparse_hashtable_destructive_iterator() { } // never used internally
// The default destructor is fine; we don't define one
// The default operator= is fine; we don't define one
// Happy dereferencer
reference operator*() const { return *pos; }
pointer operator->() const { return &(operator*()); }
// Arithmetic. The only hard part is making sure that
// we're not on a marked-deleted array element
void advance_past_deleted() {
while ( pos != end && ht->test_deleted(*this) )
++pos;
}
iterator& operator++() {
assert(pos != end); ++pos; advance_past_deleted(); return *this;
}
iterator operator++(int) { iterator tmp(*this); ++*this; return tmp; }
// Comparison.
bool operator==(const iterator& it) const { return pos == it.pos; }
bool operator!=(const iterator& it) const { return pos != it.pos; }
// The actual data
const sparse_hashtable<V,K,HF,ExK,SetK,EqK,A> *ht;
st_iterator pos, end;
};
template <class Value, class Key, class HashFcn,
          class ExtractKey, class SetKey, class EqualKey, class Alloc>
class sparse_hashtable {
private:
typedef typename Alloc::template rebind<Value>::other value_alloc_type;
public:
typedef Key key_type;
typedef Value value_type;
typedef HashFcn hasher;
typedef EqualKey key_equal;
typedef Alloc allocator_type;
typedef typename value_alloc_type::size_type size_type;
typedef typename value_alloc_type::difference_type difference_type;
typedef typename value_alloc_type::reference reference;
typedef typename value_alloc_type::const_reference const_reference;
typedef typename value_alloc_type::pointer pointer;
typedef typename value_alloc_type::const_pointer const_pointer;
typedef sparse_hashtable_iterator<Value, Key, HashFcn,
                                  ExtractKey, SetKey, EqualKey, Alloc>
iterator;
typedef sparse_hashtable_const_iterator<Value, Key, HashFcn,
                                        ExtractKey, SetKey, EqualKey, Alloc>
const_iterator;
typedef sparse_hashtable_destructive_iterator<Value, Key, HashFcn,
                                              ExtractKey, SetKey, EqualKey, Alloc>
destructive_iterator;
// These come from tr1. For us they're the same as regular iterators.
typedef iterator local_iterator;
typedef const_iterator const_local_iterator;
// How full we let the table get before we resize, by default.
// Knuth says .8 is good -- higher causes us to probe too much,
// though it saves memory.
static const int HT_OCCUPANCY_PCT; // = 80 (out of 100);
// How empty we let the table get before we resize lower, by default.
// (0.0 means never resize lower.)
// It should be less than OCCUPANCY_PCT / 2 or we thrash resizing
static const int HT_EMPTY_PCT; // = 0.4 * HT_OCCUPANCY_PCT;
// Minimum size we're willing to let hashtables be.
// Must be a power of two, and at least 4.
// Note, however, that for a given hashtable, the initial size is a
// function of the first constructor arg, and may be >HT_MIN_BUCKETS.
static const size_type HT_MIN_BUCKETS = 4;
// By default, if you don't specify a hashtable size at
// construction-time, we use this size. Must be a power of two, and
// at least HT_MIN_BUCKETS.
static const size_type HT_DEFAULT_STARTING_BUCKETS = 32;
// ITERATOR FUNCTIONS
iterator begin() { return iterator(this, table.nonempty_begin(),
table.nonempty_end()); }
iterator end() { return iterator(this, table.nonempty_end(),
table.nonempty_end()); }
const_iterator begin() const { return const_iterator(this,
table.nonempty_begin(),
table.nonempty_end()); }
const_iterator end() const { return const_iterator(this,
table.nonempty_end(),
table.nonempty_end()); }
// These come from tr1 unordered_map. They iterate over 'bucket' n.
// For sparsehashtable, we could consider each 'group' to be a bucket,
// I guess, but I don't really see the point. We'll just consider
// bucket n to be the n-th element of the sparsetable, if it's occupied,
// or some empty element, otherwise.
local_iterator begin(size_type i) {
if (table.test(i))
return local_iterator(this, table.get_iter(i), table.nonempty_end());
else
return local_iterator(this, table.nonempty_end(), table.nonempty_end());
}
local_iterator end(size_type i) {
local_iterator it = begin(i);
if (table.test(i) && !test_deleted(i))
++it;
return it;
}
const_local_iterator begin(size_type i) const {
if (table.test(i))
return const_local_iterator(this, table.get_iter(i),
table.nonempty_end());
else
return const_local_iterator(this, table.nonempty_end(),
table.nonempty_end());
}
const_local_iterator end(size_type i) const {
const_local_iterator it = begin(i);
if (table.test(i) && !test_deleted(i))
++it;
return it;
}
// This is used when resizing
destructive_iterator destructive_begin() {
return destructive_iterator(this, table.destructive_begin(),
table.destructive_end());
}
destructive_iterator destructive_end() {
return destructive_iterator(this, table.destructive_end(),
table.destructive_end());
}
// ACCESSOR FUNCTIONS for the things we templatize on, basically
hasher hash_funct() const { return settings; }
key_equal key_eq() const { return key_info; }
allocator_type get_allocator() const { return table.get_allocator(); }
// Accessor function for statistics gathering.
int num_table_copies() const { return settings.num_ht_copies(); }
private:
// We need to copy values when we set the special marker for deleted
// elements, but, annoyingly, we can't just use the copy assignment
// operator because value_type might not be assignable (it's often
// pair<const X, Y>). We use explicit destructor invocation and
// placement new to get around this. Arg.
void set_value(pointer dst, const_reference src) {
dst->~value_type(); // delete the old value, if any
new(dst) value_type(src);
}
// This is used as a tag for the copy constructor, saying to destroy its
// arg. We have two ways of destructively copying: with potentially growing
// the hashtable as we copy, and without. To make sure the outside world
// can't do a destructive copy, we make the typename private.
enum MoveDontCopyT {MoveDontCopy, MoveDontGrow};
// DELETE HELPER FUNCTIONS
// This lets the user describe a key that will indicate deleted
// table entries. This key should be an "impossible" entry --
// if you try to insert it for real, you won't be able to retrieve it!
// (NB: while you pass in an entire value, only the key part is looked
// at. This is just because I don't know how to assign just a key.)
private:
void squash_deleted() { // gets rid of any deleted entries we have
if ( num_deleted ) { // get rid of deleted before writing
sparse_hashtable tmp(MoveDontGrow, *this);
swap(tmp); // now we are tmp
}
assert(num_deleted == 0);
}
bool test_deleted_key(const key_type& key) const {
// The num_deleted test is crucial for read(): after read(), the ht values
// are garbage, and we don't want to think some of them are deleted.
// Invariant: !use_deleted implies num_deleted is 0.
assert(settings.use_deleted() || num_deleted == 0);
return num_deleted > 0 && equals(key_info.delkey, key);
}
public:
void set_deleted_key(const key_type &key) {
// It's only safe to change what "deleted" means if we purge deleted guys
squash_deleted();
settings.set_use_deleted(true);
key_info.delkey = key;
}
void clear_deleted_key() {
squash_deleted();
settings.set_use_deleted(false);
}
key_type deleted_key() const {
assert(settings.use_deleted()
&& "Must set deleted key before calling deleted_key");
return key_info.delkey;
}
// These are public so the iterators can use them
// True if the item at position bucknum is "deleted" marker
bool test_deleted(size_type bucknum) const {
if (num_deleted == 0 || !table.test(bucknum)) return false;
return test_deleted_key(get_key(table.unsafe_get(bucknum)));
}
bool test_deleted(const iterator &it) const {
if (!settings.use_deleted()) return false;
return test_deleted_key(get_key(*it));
}
bool test_deleted(const const_iterator &it) const {
if (!settings.use_deleted()) return false;
return test_deleted_key(get_key(*it));
}
bool test_deleted(const destructive_iterator &it) const {
if (!settings.use_deleted()) return false;
return test_deleted_key(get_key(*it));
}
private:
// Set it so test_deleted is true. Returns true if the object wasn't already deleted.
// TODO(csilvers): make these private (also in densehashtable.h)
bool set_deleted(iterator &it) {
assert(settings.use_deleted());
bool retval = !test_deleted(it);
// &* converts from iterator to value-type.
set_key(&(*it), key_info.delkey);
return retval;
}
// Set it so test_deleted is false. true if object used to be deleted.
bool clear_deleted(iterator &it) {
assert(settings.use_deleted());
// Happens automatically when we assign something else in its place.
return test_deleted(it);
}
// We also allow you to set/clear the deleted bit on a const iterator.
// We allow a const_iterator for the same reason you can delete a
// const pointer: it's convenient, and semantically you can't use
// 'it' after it's been deleted anyway, so its const-ness doesn't
// really matter.
bool set_deleted(const_iterator &it) {
assert(settings.use_deleted()); // bad if set_deleted_key() wasn't called
bool retval = !test_deleted(it);
set_key(const_cast<pointer>(&(*it)), key_info.delkey);
return retval;
}
// Set it so test_deleted is false. true if object used to be deleted.
bool clear_deleted(const_iterator &it) {
assert(settings.use_deleted()); // bad if set_deleted_key() wasn't called
return test_deleted(it);
}
// FUNCTIONS CONCERNING SIZE
public:
size_type size() const { return table.num_nonempty() - num_deleted; }
size_type max_size() const { return table.max_size(); }
bool empty() const { return size() == 0; }
size_type bucket_count() const { return table.size(); }
size_type max_bucket_count() const { return max_size(); }
// These are tr1 methods. Their idea of 'bucket' doesn't map well to
// what we do. We just say every bucket has 0 or 1 items in it.
size_type bucket_size(size_type i) const {
return begin(i) == end(i) ? 0 : 1;
}
private:
// Because of the above, size_type(-1) is never legal; use it for errors
static const size_type ILLEGAL_BUCKET = size_type(-1);
// Used after a string of deletes. Returns true if we actually shrunk.
// TODO(csilvers): take a delta so we can take into account inserts
// done after shrinking. Maybe make part of the Settings class?
bool maybe_shrink() {
assert(table.num_nonempty() >= num_deleted);
assert((bucket_count() & (bucket_count()-1)) == 0); // is a power of two
assert(bucket_count() >= HT_MIN_BUCKETS);
bool retval = false;
// If you construct a hashtable with < HT_DEFAULT_STARTING_BUCKETS,
// we'll never shrink until you get relatively big, and we'll never
// shrink below HT_DEFAULT_STARTING_BUCKETS. Otherwise, something
// like "dense_hash_set x; x.insert(4); x.erase(4);" will
// shrink us down to HT_MIN_BUCKETS buckets, which is too small.
const size_type num_remain = table.num_nonempty() - num_deleted;
const size_type shrink_threshold = settings.shrink_threshold();
if (shrink_threshold > 0 && num_remain < shrink_threshold &&
bucket_count() > HT_DEFAULT_STARTING_BUCKETS) {
const float shrink_factor = settings.shrink_factor();
size_type sz = bucket_count() / 2; // find how much we should shrink
while (sz > HT_DEFAULT_STARTING_BUCKETS &&
num_remain < static_cast(sz * shrink_factor)) {
sz /= 2; // stay a power of 2
}
sparse_hashtable tmp(MoveDontCopy, *this, sz);
swap(tmp); // now we are tmp
retval = true;
}
settings.set_consider_shrink(false); // because we just considered it
return retval;
}
// We'll let you resize a hashtable -- though this makes us copy all!
// When you resize, you say, "make it big enough for this many more elements"
// Returns true if we actually resized, false if size was already ok.
bool resize_delta(size_type delta) {
bool did_resize = false;
if ( settings.consider_shrink() ) { // see if lots of deletes happened
if ( maybe_shrink() )
did_resize = true;
}
if (table.num_nonempty() >=
(STL_NAMESPACE::numeric_limits<size_type>::max)() - delta)
throw std::length_error("resize overflow");
if ( bucket_count() >= HT_MIN_BUCKETS &&
(table.num_nonempty() + delta) <= settings.enlarge_threshold() )
return did_resize; // we're ok as we are
// Sometimes, we need to resize just to get rid of all the
// "deleted" buckets that are clogging up the hashtable. So when
// deciding whether to resize, count the deleted buckets (which
// are currently taking up room). But later, when we decide what
// size to resize to, *don't* count deleted buckets, since they
// get discarded during the resize.
const size_type needed_size =
settings.min_buckets(table.num_nonempty() + delta, 0);
if ( needed_size <= bucket_count() ) // we have enough buckets
return did_resize;
size_type resize_to =
settings.min_buckets(table.num_nonempty() - num_deleted + delta,
bucket_count());
if (resize_to < needed_size && // may double resize_to
resize_to < (STL_NAMESPACE::numeric_limits<size_type>::max)() / 2) {
// This situation means that we have enough deleted elements,
// that once we purge them, we won't actually have needed to
// grow. But we may want to grow anyway: if we just purge one
// element, say, we'll have to grow anyway next time we
// insert. Might as well grow now, since we're already going
// through the trouble of copying (in order to purge the
// deleted elements).
const size_type target =
          static_cast<size_type>(settings.shrink_size(resize_to*2));
if (table.num_nonempty() - num_deleted + delta >= target) {
        // Good, we won't be below the shrink threshold even if we double.
resize_to *= 2;
}
}
sparse_hashtable tmp(MoveDontCopy, *this, resize_to);
swap(tmp); // now we are tmp
return true;
}
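  // Illustrative numbers for the tombstone accounting above: with 100
  // occupied buckets of which 90 are deleted and delta == 1, needed_size
  // is computed from all 101 occupants, but resize_to from only the 11
  // survivors (tombstones are discarded during the copy).  When resize_to
  // comes out smaller than needed_size, we double it only if the survivors
  // would keep the doubled table at or above its shrink threshold.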
// Used to actually do the rehashing when we grow/shrink a hashtable
void copy_from(const sparse_hashtable &ht, size_type min_buckets_wanted) {
clear(); // clear table, set num_deleted to 0
// If we need to change the size of our table, do it now
const size_type resize_to =
settings.min_buckets(ht.size(), min_buckets_wanted);
if ( resize_to > bucket_count() ) { // we don't have enough buckets
table.resize(resize_to); // sets the number of buckets
settings.reset_thresholds(bucket_count());
}
    // We use a normal iterator to get non-deleted buckets from ht.
    // We could use insert() here, but since we know there are
    // no duplicates and no deleted items, we can be more efficient.
assert((bucket_count() & (bucket_count()-1)) == 0); // a power of two
for ( const_iterator it = ht.begin(); it != ht.end(); ++it ) {
size_type num_probes = 0; // how many times we've probed
size_type bucknum;
const size_type bucket_count_minus_one = bucket_count() - 1;
for (bucknum = hash(get_key(*it)) & bucket_count_minus_one;
table.test(bucknum); // not empty
bucknum = (bucknum + JUMP_(key, num_probes)) & bucket_count_minus_one) {
++num_probes;
assert(num_probes < bucket_count()
&& "Hashtable is full: an error in key_equal<> or hash<>");
}
table.set(bucknum, *it); // copies the value to here
}
settings.inc_num_ht_copies();
}
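  // Probing sketch: JUMP_(key, num_probes) expands to num_probes here
  // (quadratic, triangular-increment probing; see the JUMP_ definition
  // earlier in this file), so a key hashing to bucket h is tried at
  // h, h+1, h+3, h+6, ... (mod bucket_count).  With a power-of-two
  // bucket count that sequence visits every bucket exactly once, which
  // is why the asserts above insist the size is a power of two.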
// Implementation is like copy_from, but it destroys the table of the
// "from" guy by freeing sparsetable memory as we iterate. This is
// useful in resizing, since we're throwing away the "from" guy anyway.
void move_from(MoveDontCopyT mover, sparse_hashtable &ht,
size_type min_buckets_wanted) {
clear(); // clear table, set num_deleted to 0
// If we need to change the size of our table, do it now
size_type resize_to;
if ( mover == MoveDontGrow )
resize_to = ht.bucket_count(); // keep same size as old ht
else // MoveDontCopy
resize_to = settings.min_buckets(ht.size(), min_buckets_wanted);
if ( resize_to > bucket_count() ) { // we don't have enough buckets
table.resize(resize_to); // sets the number of buckets
settings.reset_thresholds(bucket_count());
}
    // We use a normal iterator to get non-deleted buckets from ht.
    // We could use insert() here, but since we know there are
    // no duplicates and no deleted items, we can be more efficient.
assert( (bucket_count() & (bucket_count()-1)) == 0); // a power of two
// THIS IS THE MAJOR LINE THAT DIFFERS FROM COPY_FROM():
for ( destructive_iterator it = ht.destructive_begin();
it != ht.destructive_end(); ++it ) {
size_type num_probes = 0; // how many times we've probed
size_type bucknum;
for ( bucknum = hash(get_key(*it)) & (bucket_count()-1); // h % buck_cnt
table.test(bucknum); // not empty
bucknum = (bucknum + JUMP_(key, num_probes)) & (bucket_count()-1) ) {
++num_probes;
assert(num_probes < bucket_count()
&& "Hashtable is full: an error in key_equal<> or hash<>");
}
table.set(bucknum, *it); // copies the value to here
}
settings.inc_num_ht_copies();
}
// Required by the spec for hashed associative container
public:
// Though the docs say this should be num_buckets, I think it's much
// more useful as num_elements. As a special feature, calling with
// req_elements==0 will cause us to shrink if we can, saving space.
void resize(size_type req_elements) { // resize to this or larger
if ( settings.consider_shrink() || req_elements == 0 )
maybe_shrink();
if ( req_elements > table.num_nonempty() ) // we only grow
resize_delta(req_elements - table.num_nonempty());
}
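  // Usage sketch: ht.resize(10000) before a bulk load grows the table
  // once instead of several times during insertion, while ht.resize(0)
  // after heavy erasing invites the table to shrink if its thresholds
  // allow it.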
// Get and change the value of shrink_factor and enlarge_factor. The
// description at the beginning of this file explains how to choose
// the values. Setting the shrink parameter to 0.0 ensures that the
// table never shrinks.
void get_resizing_parameters(float* shrink, float* grow) const {
*shrink = settings.shrink_factor();
*grow = settings.enlarge_factor();
}
void set_resizing_parameters(float shrink, float grow) {
settings.set_resizing_parameters(shrink, grow);
settings.reset_thresholds(bucket_count());
}
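  // For example, set_resizing_parameters(0.0f, 0.8f) keeps the default
  // 80% occupancy trigger for growing but turns shrinking off entirely.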
// CONSTRUCTORS -- as required by the specs, we take a size,
// but also let you specify a hashfunction, key comparator,
// and key extractor. We also define a copy constructor and =.
// DESTRUCTOR -- the default is fine, surprisingly.
explicit sparse_hashtable(size_type expected_max_items_in_table = 0,
const HashFcn& hf = HashFcn(),
const EqualKey& eql = EqualKey(),
const ExtractKey& ext = ExtractKey(),
const SetKey& set = SetKey(),
const Alloc& alloc = Alloc())
: settings(hf),
key_info(ext, set, eql),
num_deleted(0),
table((expected_max_items_in_table == 0
? HT_DEFAULT_STARTING_BUCKETS
: settings.min_buckets(expected_max_items_in_table, 0)),
alloc) {
settings.reset_thresholds(bucket_count());
}
// As a convenience for resize(), we allow an optional second argument
// which lets you make this new hashtable a different size than ht.
// We also provide a mechanism of saying you want to "move" the ht argument
// into us instead of copying.
sparse_hashtable(const sparse_hashtable& ht,
size_type min_buckets_wanted = HT_DEFAULT_STARTING_BUCKETS)
: settings(ht.settings),
key_info(ht.key_info),
num_deleted(0),
table(0, ht.get_allocator()) {
settings.reset_thresholds(bucket_count());
copy_from(ht, min_buckets_wanted); // copy_from() ignores deleted entries
}
sparse_hashtable(MoveDontCopyT mover, sparse_hashtable& ht,
size_type min_buckets_wanted = HT_DEFAULT_STARTING_BUCKETS)
: settings(ht.settings),
key_info(ht.key_info),
num_deleted(0),
table(0, ht.get_allocator()) {
settings.reset_thresholds(bucket_count());
move_from(mover, ht, min_buckets_wanted); // ignores deleted entries
}
sparse_hashtable& operator= (const sparse_hashtable& ht) {
if (&ht == this) return *this; // don't copy onto ourselves
settings = ht.settings;
key_info = ht.key_info;
num_deleted = ht.num_deleted;
// copy_from() calls clear and sets num_deleted to 0 too
copy_from(ht, HT_MIN_BUCKETS);
// we purposefully don't copy the allocator, which may not be copyable
return *this;
}
// Many STL algorithms use swap instead of copy constructors
void swap(sparse_hashtable& ht) {
STL_NAMESPACE::swap(settings, ht.settings);
STL_NAMESPACE::swap(key_info, ht.key_info);
STL_NAMESPACE::swap(num_deleted, ht.num_deleted);
table.swap(ht.table);
}
// It's always nice to be able to clear a table without deallocating it
void clear() {
if (!empty() || (num_deleted != 0)) {
table.clear();
}
settings.reset_thresholds(bucket_count());
num_deleted = 0;
}
// LOOKUP ROUTINES
private:
// Returns a pair of positions: 1st where the object is, 2nd where
// it would go if you wanted to insert it. 1st is ILLEGAL_BUCKET
// if object is not found; 2nd is ILLEGAL_BUCKET if it is.
// Note: because of deletions where-to-insert is not trivial: it's the
// first deleted bucket we see, as long as we don't find the key later
  pair<size_type, size_type> find_position(const key_type &key) const {
size_type num_probes = 0; // how many times we've probed
const size_type bucket_count_minus_one = bucket_count() - 1;
size_type bucknum = hash(key) & bucket_count_minus_one;
size_type insert_pos = ILLEGAL_BUCKET; // where we would insert
SPARSEHASH_STAT_UPDATE(total_lookups += 1);
while ( 1 ) { // probe until something happens
if ( !table.test(bucknum) ) { // bucket is empty
SPARSEHASH_STAT_UPDATE(total_probes += num_probes);
        if ( insert_pos == ILLEGAL_BUCKET )  // found no prior place to insert
          return pair<size_type,size_type>(ILLEGAL_BUCKET, bucknum);
        else
          return pair<size_type,size_type>(ILLEGAL_BUCKET, insert_pos);
} else if ( test_deleted(bucknum) ) {// keep searching, but mark to insert
if ( insert_pos == ILLEGAL_BUCKET )
insert_pos = bucknum;
} else if ( equals(key, get_key(table.unsafe_get(bucknum))) ) {
SPARSEHASH_STAT_UPDATE(total_probes += num_probes);
        return pair<size_type,size_type>(bucknum, ILLEGAL_BUCKET);
}
++num_probes; // we're doing another probe
bucknum = (bucknum + JUMP_(key, num_probes)) & bucket_count_minus_one;
assert(num_probes < bucket_count()
&& "Hashtable is full: an error in key_equal<> or hash<>");
}
}
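  // Contract example: if "foo" lives in bucket 12, find_position returns
  // pair<size_type,size_type>(12, ILLEGAL_BUCKET).  If "foo" is absent and
  // its probe chain crosses deleted bucket 7 before reaching empty bucket 9,
  // the result is (ILLEGAL_BUCKET, 7): we prefer recycling the first
  // tombstone we saw over the empty slot that ended the search.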
public:
iterator find(const key_type& key) {
if ( size() == 0 ) return end();
    pair<size_type, size_type> pos = find_position(key);
if ( pos.first == ILLEGAL_BUCKET ) // alas, not there
return end();
else
return iterator(this, table.get_iter(pos.first), table.nonempty_end());
}
const_iterator find(const key_type& key) const {
if ( size() == 0 ) return end();
    pair<size_type, size_type> pos = find_position(key);
if ( pos.first == ILLEGAL_BUCKET ) // alas, not there
return end();
else
return const_iterator(this,
table.get_iter(pos.first), table.nonempty_end());
}
// This is a tr1 method: the bucket a given key is in, or what bucket
// it would be put in, if it were to be inserted. Shrug.
size_type bucket(const key_type& key) const {
    pair<size_type, size_type> pos = find_position(key);
return pos.first == ILLEGAL_BUCKET ? pos.second : pos.first;
}
// Counts how many elements have key key. For maps, it's either 0 or 1.
size_type count(const key_type &key) const {
    pair<size_type, size_type> pos = find_position(key);
return pos.first == ILLEGAL_BUCKET ? 0 : 1;
}
// Likewise, equal_range doesn't really make sense for us. Oh well.
pair equal_range(const key_type& key) {
iterator pos = find(key); // either an iterator or end
if (pos == end()) {
return pair(pos, pos);
} else {
const iterator startpos = pos++;
return pair(startpos, pos);
}
}
pair equal_range(const key_type& key) const {
const_iterator pos = find(key); // either an iterator or end
if (pos == end()) {
return pair(pos, pos);
} else {
const const_iterator startpos = pos++;
return pair(startpos, pos);
}
}
// INSERTION ROUTINES
private:
// Private method used by insert_noresize and find_or_insert.
iterator insert_at(const_reference obj, size_type pos) {
if (size() >= max_size())
throw std::length_error("insert overflow");
if ( test_deleted(pos) ) { // just replace if it's been deleted
// The set() below will undelete this object. We just worry about stats
assert(num_deleted > 0);
--num_deleted; // used to be, now it isn't
}
table.set(pos, obj);
return iterator(this, table.get_iter(pos), table.nonempty_end());
}
// If you know *this is big enough to hold obj, use this routine
  pair<iterator, bool> insert_noresize(const_reference obj) {
// First, double-check we're not inserting delkey
assert((!settings.use_deleted() || !equals(get_key(obj), key_info.delkey))
&& "Inserting the deleted key");
    const pair<size_type,size_type> pos = find_position(get_key(obj));
    if ( pos.first != ILLEGAL_BUCKET) {      // object was already there
      return pair<iterator,bool>(iterator(this, table.get_iter(pos.first),
                                          table.nonempty_end()),
                                 false);     // false: we didn't insert
    } else {                                 // pos.second says where to put it
      return pair<iterator,bool>(insert_at(obj, pos.second), true);
    }
}
// Specializations of insert(it, it) depending on the power of the iterator:
// (1) Iterator supports operator-, resize before inserting
  template <class ForwardIterator>
void insert(ForwardIterator f, ForwardIterator l, STL_NAMESPACE::forward_iterator_tag) {
size_t dist = STL_NAMESPACE::distance(f, l);
if (dist >= (std::numeric_limits::max)())
throw std::length_error("insert-range overflow");
resize_delta(static_cast(dist));
for ( ; dist > 0; --dist, ++f) {
insert_noresize(*f);
}
}
// (2) Arbitrary iterator, can't tell how much to resize
  template <class InputIterator>
void insert(InputIterator f, InputIterator l, STL_NAMESPACE::input_iterator_tag) {
for ( ; f != l; ++f)
insert(*f);
}
public:
// This is the normal insert routine, used by the outside world
  pair<iterator, bool> insert(const_reference obj) {
resize_delta(1); // adding an object, grow if need be
return insert_noresize(obj);
}
// When inserting a lot at a time, we specialize on the type of iterator
  template <class InputIterator>
void insert(InputIterator f, InputIterator l) {
// specializes on iterator type
    insert(f, l, typename STL_NAMESPACE::iterator_traits<InputIterator>::iterator_category());
}
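  // Dispatch in practice: a std::vector<value_type>::iterator selects the
  // forward_iterator_tag overload above (one resize_delta() up front),
  // while a std::istream_iterator can only reach the input_iterator_tag
  // overload and so resizes incrementally, one insert at a time.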
// DefaultValue is a functor that takes a key and returns a value_type
// representing the default value to be inserted if none is found.
  template <class DefaultValue>
value_type& find_or_insert(const key_type& key) {
// First, double-check we're not inserting delkey
assert((!settings.use_deleted() || !equals(key, key_info.delkey))
&& "Inserting the deleted key");
    const pair<size_type,size_type> pos = find_position(key);
DefaultValue default_value;
if ( pos.first != ILLEGAL_BUCKET) { // object was already there
return *table.get_iter(pos.first);
} else if (resize_delta(1)) { // needed to rehash to make room
// Since we resized, we can't use pos, so recalculate where to insert.
return *insert_noresize(default_value(key)).first;
} else { // no need to rehash, insert right here
return *insert_at(default_value(key), pos.second);
}
}
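  // A plausible DefaultValue functor for map-like tables (a sketch; the
  // real functor is supplied by the sparse_hash_map wrapper, not defined
  // here):
  //   struct DefaultZero {
  //     value_type operator()(const key_type& k) const {
  //       return value_type(k, 0);   // key paired with a zero mapped value
  //     }
  //   };
  // so that ht.find_or_insert<DefaultZero>(k) behaves like operator[] on
  // a map with a numeric mapped type.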
// DELETION ROUTINES
size_type erase(const key_type& key) {
// First, double-check we're not erasing delkey.
assert((!settings.use_deleted() || !equals(key, key_info.delkey))
&& "Erasing the deleted key");
const_iterator pos = find(key); // shrug: shouldn't need to be const
if ( pos != end() ) {
assert(!test_deleted(pos)); // or find() shouldn't have returned it
set_deleted(pos);
++num_deleted;
// will think about shrink after next insert
settings.set_consider_shrink(true);
return 1; // because we deleted one thing
} else {
return 0; // because we deleted nothing
}
}
  // Erase the item at pos by marking its bucket deleted.
void erase(iterator pos) {
if ( pos == end() ) return; // sanity check
if ( set_deleted(pos) ) { // true if object has been newly deleted
++num_deleted;
// will think about shrink after next insert
settings.set_consider_shrink(true);
}
}
void erase(iterator f, iterator l) {
for ( ; f != l; ++f) {
if ( set_deleted(f) ) // should always be true
++num_deleted;
}
// will think about shrink after next insert
settings.set_consider_shrink(true);
}
// We allow you to erase a const_iterator just like we allow you to
// erase an iterator. This is in parallel to 'delete': you can delete
// a const pointer just like a non-const pointer. The logic is that
// you can't use the object after it's erased anyway, so it doesn't matter
// if it's const or not.
void erase(const_iterator pos) {
if ( pos == end() ) return; // sanity check
if ( set_deleted(pos) ) { // true if object has been newly deleted
++num_deleted;
// will think about shrink after next insert
settings.set_consider_shrink(true);
}
}
void erase(const_iterator f, const_iterator l) {
for ( ; f != l; ++f) {
if ( set_deleted(f) ) // should always be true
++num_deleted;
}
// will think about shrink after next insert
settings.set_consider_shrink(true);
}
// COMPARISON
bool operator==(const sparse_hashtable& ht) const {
if (size() != ht.size()) {
return false;
} else if (this == &ht) {
return true;
} else {
// Iterate through the elements in "this" and see if the
// corresponding element is in ht
for ( const_iterator it = begin(); it != end(); ++it ) {
const_iterator it2 = ht.find(get_key(*it));
if ((it2 == ht.end()) || (*it != *it2)) {
return false;
}
}
return true;
}
}
bool operator!=(const sparse_hashtable& ht) const {
return !(*this == ht);
}
// I/O
// We support reading and writing hashtables to disk. NOTE that
// this only stores the hashtable metadata, not the stuff you've
  // actually put in the hashtable!  Alas, since there's no way to
  // serialize a hasher or key_equal, you have to make sure everything
  // but the table is the same.  We compact before writing.
bool write_metadata(FILE *fp) {
squash_deleted(); // so we don't have to worry about delkey
return table.write_metadata(fp);
}
bool read_metadata(FILE *fp) {
    num_deleted = 0;            // we squashed deleted items before writing
bool result = table.read_metadata(fp);
settings.reset_thresholds(bucket_count());
return result;
}
// Only meaningful if value_type is a POD.
bool write_nopointer_data(FILE *fp) {
return table.write_nopointer_data(fp);
}
// Only meaningful if value_type is a POD.
bool read_nopointer_data(FILE *fp) {
return table.read_nopointer_data(fp);
}
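  // Persistence sketch (only sound when value_type is a POD containing no
  // pointers; the file name is illustrative):
  //   FILE* fp = fopen("ht.bin", "wb");
  //   ht.write_metadata(fp);          // compacts, then writes the layout
  //   ht.write_nopointer_data(fp);    // raw dump of the stored values
  //   fclose(fp);
  // Reading back requires a table built with the same hasher, key_equal,
  // and value_type: call read_metadata(fp), then read_nopointer_data(fp).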
private:
// Table is the main storage class.
  typedef sparsetable<value_type, DEFAULT_GROUP_SIZE, value_alloc_type> Table;
// Package templated functors with the other types to eliminate memory
// needed for storing these zero-size operators. Since ExtractKey and
// hasher's operator() might have the same function signature, they
// must be packaged in different classes.
  struct Settings :
      sh_hashtable_settings<key_type, hasher, size_type, HT_MIN_BUCKETS> {
    explicit Settings(const hasher& hf)
        : sh_hashtable_settings<key_type, hasher, size_type, HT_MIN_BUCKETS>(
            hf, HT_OCCUPANCY_PCT / 100.0f, HT_EMPTY_PCT / 100.0f) {}
  };
// KeyInfo stores delete key and packages zero-size functors:
// ExtractKey and SetKey.
class KeyInfo : public ExtractKey, public SetKey, public key_equal {
public:
KeyInfo(const ExtractKey& ek, const SetKey& sk, const key_equal& eq)
: ExtractKey(ek),
SetKey(sk),
key_equal(eq) {
}
// We want to return the exact same type as ExtractKey: Key or const Key&
typename ExtractKey::result_type get_key(const_reference v) const {
return ExtractKey::operator()(v);
}
void set_key(pointer v, const key_type& k) const {
SetKey::operator()(v, k);
}
bool equals(const key_type& a, const key_type& b) const {
return key_equal::operator()(a, b);
}
// Which key marks deleted entries.
// TODO(csilvers): make a pointer, and get rid of use_deleted (benchmark!)
    typename remove_const<key_type>::type delkey;
};
// Utility functions to access the templated operators
size_type hash(const key_type& v) const {
return settings.hash(v);
}
bool equals(const key_type& a, const key_type& b) const {
return key_info.equals(a, b);
}
typename ExtractKey::result_type get_key(const_reference v) const {
return key_info.get_key(v);
}
void set_key(pointer v, const key_type& k) const {
key_info.set_key(v, k);
}
private:
// Actual data
Settings settings;
KeyInfo key_info;
size_type num_deleted; // how many occupied buckets are marked deleted
Table table; // holds num_buckets and num_elements too
};
// We need a global swap as well
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
inline void swap(sparse_hashtable<V,K,HF,ExK,SetK,EqK,A> &x,
                 sparse_hashtable<V,K,HF,ExK,SetK,EqK,A> &y) {
x.swap(y);
}
#undef JUMP_
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
const typename sparse_hashtable<V,K,HF,ExK,SetK,EqK,A>::size_type
    sparse_hashtable<V,K,HF,ExK,SetK,EqK,A>::ILLEGAL_BUCKET;
// How full we let the table get before we resize. Knuth says .8 is
// good -- higher causes us to probe too much, though saves memory
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
const int sparse_hashtable<V,K,HF,ExK,SetK,EqK,A>::HT_OCCUPANCY_PCT = 80;
// How empty we let the table get before we resize lower.
// It should be less than OCCUPANCY_PCT / 2 or we thrash resizing
template <class V, class K, class HF, class ExK, class SetK, class EqK, class A>
const int sparse_hashtable<V,K,HF,ExK,SetK,EqK,A>::HT_EMPTY_PCT
    = static_cast<int>(
        0.4 * sparse_hashtable<V,K,HF,ExK,SetK,EqK,A>::HT_OCCUPANCY_PCT);
_END_GOOGLE_NAMESPACE_
#endif /* _SPARSEHASHTABLE_H_ */
sparsehash-1.10/src/google/sparsehash/hashtable-common.h
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
// Author: Giao Nguyen
#ifndef UTIL_GTL_HASHTABLE_COMMON_H_
#define UTIL_GTL_HASHTABLE_COMMON_H_
#include <assert.h>
#include <stdexcept>                 // for std::length_error
// Settings contains parameters for growing and shrinking the table.
// It also packages the zero-size functor (i.e., the hasher).
template <typename Key, typename HashFunc, typename SizeType, int HT_MIN_BUCKETS>
class sh_hashtable_settings : public HashFunc {
public:
typedef Key key_type;
typedef HashFunc hasher;
typedef SizeType size_type;
public:
sh_hashtable_settings(const hasher& hf,
const float ht_occupancy_flt,
const float ht_empty_flt)
: hasher(hf),
enlarge_threshold_(0),
shrink_threshold_(0),
consider_shrink_(false),
use_empty_(false),
use_deleted_(false),
num_ht_copies_(0) {
set_enlarge_factor(ht_occupancy_flt);
set_shrink_factor(ht_empty_flt);
}
size_type hash(const key_type& v) const {
return hasher::operator()(v);
}
float enlarge_factor() const {
return enlarge_factor_;
}
void set_enlarge_factor(float f) {
enlarge_factor_ = f;
}
float shrink_factor() const {
return shrink_factor_;
}
void set_shrink_factor(float f) {
shrink_factor_ = f;
}
size_type enlarge_threshold() const {
return enlarge_threshold_;
}
void set_enlarge_threshold(size_type t) {
enlarge_threshold_ = t;
}
size_type shrink_threshold() const {
return shrink_threshold_;
}
void set_shrink_threshold(size_type t) {
shrink_threshold_ = t;
}
  size_type enlarge_size(size_type x) const {
    return static_cast<size_type>(x * enlarge_factor_);
  }
  size_type shrink_size(size_type x) const {
    return static_cast<size_type>(x * shrink_factor_);
  }
bool consider_shrink() const {
return consider_shrink_;
}
void set_consider_shrink(bool t) {
consider_shrink_ = t;
}
bool use_empty() const {
return use_empty_;
}
void set_use_empty(bool t) {
use_empty_ = t;
}
bool use_deleted() const {
return use_deleted_;
}
void set_use_deleted(bool t) {
use_deleted_ = t;
}
size_type num_ht_copies() const {
    return static_cast<size_type>(num_ht_copies_);
}
void inc_num_ht_copies() {
++num_ht_copies_;
}
// Reset the enlarge and shrink thresholds
void reset_thresholds(size_type num_buckets) {
set_enlarge_threshold(enlarge_size(num_buckets));
set_shrink_threshold(shrink_size(num_buckets));
    // whatever caused us to reset has already been considered
set_consider_shrink(false);
}
  // Caller is responsible for calling reset_thresholds() right after
  // set_resizing_parameters().
void set_resizing_parameters(float shrink, float grow) {
assert(shrink >= 0.0);
assert(grow <= 1.0);
if (shrink > grow/2.0f)
shrink = grow / 2.0f; // otherwise we thrash hashtable size
set_shrink_factor(shrink);
set_enlarge_factor(grow);
}
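  // e.g. set_resizing_parameters(0.5f, 0.8f) clamps shrink to 0.4f:
  // right after the table doubles, occupancy sits at grow/2, so any
  // shrink factor above that would make the freshly grown table
  // immediately eligible to shrink again.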
  // This is the smallest size a hashtable can be without being too crowded.
  // If you like, you can give a min #buckets as well as a min #elts.
size_type min_buckets(size_type num_elts, size_type min_buckets_wanted) {
float enlarge = enlarge_factor();
size_type sz = HT_MIN_BUCKETS; // min buckets allowed
while ( sz < min_buckets_wanted ||
num_elts >= static_cast(sz * enlarge) ) {
// This just prevents overflowing size_type, since sz can exceed
// max_size() here.
      if (static_cast<size_type>(sz * 2) < sz) {
throw std::length_error("resize overflow"); // protect against overflow
}
sz *= 2;
}
return sz;
}
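  // Worked example (assuming enlarge_factor() == 0.8 and HT_MIN_BUCKETS
  // == 4): min_buckets(100, 0) doubles 4 -> 8 -> 16 -> 32 -> 64 -> 128 and
  // returns 128, since 100 >= static_cast<size_type>(64 * 0.8) == 51 but
  // 100 < static_cast<size_type>(128 * 0.8) == 102.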
private:
size_type enlarge_threshold_; // table.size() * enlarge_factor
size_type shrink_threshold_; // table.size() * shrink_factor
float enlarge_factor_; // how full before resize
float shrink_factor_; // how empty before resize
// consider_shrink=true if we should try to shrink before next insert
bool consider_shrink_;
bool use_empty_; // used only by densehashtable, not sparsehashtable
bool use_deleted_; // false until delkey has been set
// num_ht_copies is a counter incremented every Copy/Move
unsigned int num_ht_copies_;
};
#endif // UTIL_GTL_HASHTABLE_COMMON_H_
sparsehash-1.10/src/google/sparsehash/libc_allocator_with_realloc.h
// Copyright (c) 2010, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
// Author: Guilin Chen
#ifndef UTIL_GTL_LIBC_ALLOCATOR_WITH_REALLOC_H_
#define UTIL_GTL_LIBC_ALLOCATOR_WITH_REALLOC_H_
#include <google/sparsehash/sparseconfig.h>
#include <stdlib.h>           // for malloc/realloc/free
#include <stddef.h>           // for ptrdiff_t
_START_GOOGLE_NAMESPACE_
template <class T>
class libc_allocator_with_realloc {
public:
typedef T value_type;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
libc_allocator_with_realloc() {}
libc_allocator_with_realloc(const libc_allocator_with_realloc&) {}
~libc_allocator_with_realloc() {}
pointer address(reference r) const { return &r; }
const_pointer address(const_reference r) const { return &r; }
pointer allocate(size_type n, const_pointer = 0) {
    return static_cast<pointer>(malloc(n * sizeof(value_type)));
}
void deallocate(pointer p, size_type) {
free(p);
}
pointer reallocate(pointer p, size_type n) {
    return static_cast<pointer>(realloc(p, n * sizeof(value_type)));
}
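  // Design note: reallocate() is the nonstandard hook that gives this
  // allocator its name.  A sketch of the in-place growth it enables
  // (hypothetical caller; a plain std::allocator would have to
  // allocate/copy/deallocate instead):
  //   libc_allocator_with_realloc<int> alloc;
  //   int* p = alloc.allocate(16);
  //   p = alloc.reallocate(p, 32);   // may extend the block in place
  //   alloc.deallocate(p, 32);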
size_type max_size() const {
    return static_cast<size_type>(-1) / sizeof(value_type);
}
void construct(pointer p, const value_type& val) {
new(p) value_type(val);
}
void destroy(pointer p) { p->~value_type(); }
  template <class U>
  libc_allocator_with_realloc(const libc_allocator_with_realloc<U>&) {}
  template <class U>
  struct rebind {
typedef libc_allocator_with_realloc