fastmap/0000755000176200001440000000000014003630522011700 5ustar liggesusersfastmap/NAMESPACE0000644000176200001440000000032214003624163013120 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method(print,key_missing) export(fastmap) export(fastqueue) export(faststack) export(is.key_missing) export(key_missing) useDynLib(fastmap, .registration = TRUE) fastmap/LICENSE.note0000644000176200001440000000502013546743337013672 0ustar liggesusersThe fastmap package as a whole is distributed under the MIT license. =============================================================================== MIT License Copyright (c) 2019 RStudio Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
=============================================================================== The fastmap package also includes the following third-party library, with the following license: - hopscotch_map (https://github.com/Tessil/hopscotch-map): MIT license Full text of the license for hopscotch_map: MIT License Copyright (c) 2016 Tessil Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
fastmap/LICENSE0000644000176200001440000000004513546743337012730 0ustar liggesusersYEAR: 2019 COPYRIGHT HOLDER: RStudio fastmap/README.md0000644000176200001440000003025114003624163013164 0ustar liggesusersfastmap ======= [![CRAN status](https://www.r-pkg.org/badges/version/fastmap)](https://cran.r-project.org/package=fastmap) [![R build status](https://github.com/r-lib/fastmap/workflows/R-CMD-check/badge.svg)](https://github.com/r-lib/fastmap/actions) **fastmap** implements the following data structures for R: * `fastmap`: maps (key-value store) * `faststack`: stacks * `fastqueue`: queues The usual way of implementing maps in R is to use environments. However, this method is problematic when using a large set of keys or randomly-generated keys, because each time you use a key or even check for the existence of a key using `exists()`, that key is interned as a symbol and stored in the R symbol table, which is never garbage-collected. This means that every time you use a new key -- whether it is to store an object or just check whether the key exists in the environment, R leaks a little memory. If you have a relatively small, fixed set of keys, or if your R process is a short-running process, this may not be a problem. But if, for example, you have a long-running R process that uses random keys, then the memory leakage can cause a noticeable increase in memory usage. Also, when R's symbol table is large, garbage collection events, which occur regularly, take more time, reducing R's performance in general. (See the _Memory leak examples_ section of this document for more information.) **fastmap** solves this problem by storing the keys as C++ `std::string` objects, and so it does not use the R symbol table at all. The values are stored in a list so that R knows not to garbage-collect them. 
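
As a minimal sketch of the difference, the same lookups can be written against an environment and against a fastmap. The key names `user_42` and `user_43` are made up for illustration; the calls are the ones described in this README.

```R
library(fastmap)

# Environment used as a map. Every key that is used, including one that is
# only looked up, is interned in R's symbol table.
e <- new.env(parent = emptyenv())
assign("user_42", "alice", envir = e)
env_hit  <- exists("user_42", envir = e, inherits = FALSE)
env_miss <- exists("user_43", envir = e, inherits = FALSE)

# fastmap used as a map. Keys are kept on the C++ side, so the same lookups
# do not add anything to the symbol table.
m <- fastmap()
m$set("user_42", "alice")
map_hit  <- m$has("user_42")
map_miss <- m$has("user_43")
```

Both versions give the same answers. The point of the comparison is that, per the explanation above, the environment version interns `"user_43"` even though that lookup fails.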
In C++, fastmap uses a [`tsl::hopscotch_map`](https://github.com/Tessil/hopscotch-map/) (which is similar to `std::unordered_map`) to map from the keys to indices in the list of values.


## Installation

```R
install.packages("fastmap")
```


## Usage

### `fastmap()`

```R
library(fastmap)

# Create a map
m <- fastmap()

# Set some key-value pairs
m$set("x", 100)
m$set("letters", c("a", "b", "c"))
m$mset(numbers = c(10, 20, 30), nothing = NULL)

# Get values using keys
m$get("x")
#> [1] 100
m$get("numbers")
#> [1] 10 20 30
m$mget(c("letters", "numbers"))
#> $letters
#> [1] "a" "b" "c"
#>
#> $numbers
#> [1] 10 20 30

# Missing keys return NULL by default, but this can be customized
m$get("xyz")
#> NULL

# Check for existence of keys
m$has("x")
#> [1] TRUE
m$has("nothing")
#> [1] TRUE
m$has("xyz")
#> [1] FALSE

# Remove one or more items
m$remove(c("letters", "x"))

# Return number of items
m$size()
#> [1] 2

# Get all keys
m$keys()
#> [1] "nothing" "numbers"

# Return named list that represents all key-value pairs
str(m$as_list())
#> List of 2
#>  $ nothing: NULL
#>  $ numbers: num [1:3] 10 20 30

# Clear the map
m$reset()
```

By default, `get()` returns `NULL` for keys that aren't present. You can instead specify a sentinel value to return for missing keys, either when the fastmap is created, or when `get()` is called. For example, you can return a `key_missing()` object to represent missing values:

```R
# Specify missing value when get() is called
m <- fastmap()
m$get("x", missing = key_missing())
#> <Key Missing>

# Specify the default missing value
m <- fastmap(missing_default = key_missing())
m$get("x")
#> <Key Missing>
```

### `faststack()`

```R
s <- faststack()
s$push(10)
s$mpush(11, 12, 13)
s$mpush(.list = list(14, 15))

s$pop()
#> [1] 15
str(s$mpop(3))
#> List of 3
#>  $ : num 14
#>  $ : num 13
#>  $ : num 12
s$peek()
#> [1] 11
s$size()
#> [1] 2

# Get the stack in list form.
# Note that the order is the opposite of $mpop()
str(s$as_list())
#> List of 2
#>  $ : num 10
#>  $ : num 11

s$reset()
```

By default, popping from an empty stack returns `NULL`, but you can specify other values.

```R
s$pop()
#> NULL

# Can specify the default missing value at creation.
s <- faststack(missing_default = key_missing())
s$pop()
#> <Key Missing>

# Can specify a missing value when $pop is called
s$pop(missing = "nope")
#> [1] "nope"
```

### `fastqueue()`

```R
q <- fastqueue()
q$add(10)
q$madd(11, 12, 13)
q$madd(.list = list(14, 15))

q$remove()
#> [1] 10
str(q$mremove(3))
#> List of 3
#>  $ : num 11
#>  $ : num 12
#>  $ : num 13
q$peek()
#> [1] 14
q$size()
#> [1] 2

# Get the queue in list form.
str(q$as_list())
#> List of 2
#>  $ : num 14
#>  $ : num 15

q$reset()
```

By default, removing from an empty queue returns `NULL`, but you can specify other values.

```R
q$remove()
#> NULL

# Can specify the default missing value at creation.
q <- fastqueue(missing_default = key_missing())
q$remove()
#> <Key Missing>

# Can specify a missing value when $remove is called
q$remove(missing = "nope")
#> [1] "nope"
```


## Notes on `fastmap` objects

### Key ordering

When you call `m$keys()` or `m$as_list()`, the items are returned in an arbitrary order. Keep in mind that there is no guarantee that the order will be the same across platforms, or across different builds of fastmap.

If you want to guarantee a particular order, you can call `m$keys(sort=TRUE)` or `m$as_list(sort=TRUE)`. The result will be a locale-independent sorting of the keys by their Unicode code point values. For example, `é` (Unicode code point 233) comes after `z` (122). If you want the keys to be sorted a different way, you will need to sort them yourself.

### Serialization

A `fastmap` object can be serialized (or saved) in one R session and deserialized (or loaded) in another.
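
A minimal sketch of such a round trip, done within a single session: base R's `serialize()` and `unserialize()` stand in here for saving in one session and loading in another, and the keys and values are illustrative only.

```R
library(fastmap)

m <- fastmap()
m$set("x", 100)
m$mset(numbers = c(10, 20, 30))

# Serialize to a raw vector and restore it again. This is the same mechanism
# that saving to and loading from a file relies on.
m2 <- unserialize(serialize(m, NULL))

# The restored map answers lookups like the original one.
restored_x <- m2$get("x")
same_contents <- identical(m$as_list(sort = TRUE), m2$as_list(sort = TRUE))
```

The comparison uses `as_list(sort = TRUE)` so that the arbitrary key order of the two maps does not affect the result.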
For performance, the data structure that tracks the mapping between keys and values is implemented in C++, and this data structure will not be serialized, but fastmap also keeps a copy of the same information in an ordinary R vector, which will be serialized. After a fastmap object is deserialized, the C++ data structure will not exist, but the first time any method on the fastmap is called, the C++ data structure will be rebuilt using information from the R vector. The vector is much slower for lookups, and so it is used only for restoring the C++ data structure after a fastmap object is deserialized or loaded. ### Key encoding Unlike with environments, the keys in a fastmap are always encoded as UTF-8, so if you call `m$set()` with two different strings that have the same Unicode values but have different encodings, the second call will overwrite the first value. If you call `m$keys()`, it will return UTF-8 encoded strings, and similarly, `m$mget()` and `m$as_list()` will return lists with names that have UTF-8 encoding. ### Testing for equality The base R functions `identical()` and `all.equal()` are commonly used to test two objects for equality, but they will not work correctly for fastmap objects. `identical()` will always report `FALSE` for two distinct fastmap objects, even if they have the same contents, while `all.equal()` will always report `TRUE` for two fastmap objects. To test whether two fastmap objects have the same contents, compare the results of `$as_list(sort=TRUE)` for both of the objects. For example: ``` identical(a$as_list(sort = TRUE), b$as_list(sort = TRUE)) # or all.equal(a$as_list(sort = TRUE), b$as_list(sort = TRUE)) ``` These comparisons are subject to the technical details of how `identical()` and `all.equal()` treat named lists. ## Memory leak examples This example shows how using a regular R environment leaks memory, even when simply checking for the existence of a key. 
```R library(pryr) gc() start_mem <- mem_used() start_time <- as.numeric(Sys.time()) for (i in 1:8) { cat(i, ": ", sep = "") print(mem_used()) e <- new.env(parent = emptyenv()) for (j in 1:10000) { # Generate random key x <- as.character(runif(1)) exists(x, envir = e, inherits = FALSE) } rm(e, x) } end_time <- as.numeric(Sys.time()) gc() end_mem <- mem_used() cat("Elapsed time:", round(end_time - start_time, 1), "seconds\n") cat("Memory leaked:", end_mem - start_mem, "bytes\n") ``` The output looks something like this: ``` 1: 57.9 MB 2: 59.9 MB 3: 61.9 MB 4: 64.4 MB 5: 66.4 MB 6: 68.4 MB 7: 70.4 MB 8: 72.4 MB Elapsed time: 1.1 seconds Memory leaked: 16243656 bytes ``` The elapsed time gets progressively slower as the R symbol table gets larger and larger. After running the above code repeatedly, the elapsed time for the fifth run is 3.1 seconds. If you profile the code with [profvis](https://rstudio.github.io/profvis/), you can see that most of the slowdown is not with environment operations themselves, but with garbage collection events. This slowdown appears to affect all GC events, even when no environment-related operations are performed between one GC and the next. For comparison, this example with fastmap does the same thing. 
```R library(fastmap) library(pryr) gc() start_mem <- mem_used() start_time <- as.numeric(Sys.time()) for (i in 1:8) { cat(i, ": ", sep = "") print(mem_used()) m <- fastmap() for (j in 1:10000) { x <- as.character(runif(1)) m$has(x) } rm(m, x) } end_time <- as.numeric(Sys.time()) gc() end_mem <- mem_used() cat("Elapsed time:", round(end_time - start_time, 1), "seconds\n") cat("Memory leaked:", end_mem - start_mem, "bytes\n") ``` The output in a new R session looks something like this (note that this is from the second run of the code above -- for the first run, there is an increase in memory used, but it is probably related to code being run for the first time in the R session): ``` 1: 42.3 MB 2: 42.3 MB 3: 42.3 MB 4: 42.3 MB 5: 42.3 MB 6: 42.3 MB 7: 42.3 MB 8: 42.3 MB Elapsed time: 0.9 seconds Memory leaked: 0 bytes ``` It does not leak memory, and it does not slow down if you run it repeatedly. After running it ten times, it still takes 0.9 seconds, and leaks no memory. The simple tests above simply check for the existence of keys, but with setting values, the results are similar. Note that the environment operations are themselves slightly faster than the fastmap operations, but the penalty is in slower garbage collection when many keys have been used. Also keep in mind that these tests are very artificial and use tens of thousands of random keys; if your application does not do this, then fastmap may have no practical benefit. In general, these operations are so fast that performance bottlenecks almost always lie elsewhere. ## Testing your code for symbol leakage If you want to test your code directly for symbol leakage, you can use the code below. (Note: This only works on Mac.) The `get_symbols()` function returns all symbols that are registered in R's symbol table. `new_symbols()` returns all symbols that have been added since the last time `new_symbols()` was run. 
If you want to test whether your code causes the symbol table to grow, run `new_symbols()`, then run your code, then run `new_symbols()` again.


```R
# Note: this will only compile on a Mac. `R_SymbolTable` is not an exported
# symbol from Defn.h, but on a Mac, the linker exports all C symbols.
get_symbols <- inline::cfunction(
  includes = "
    #define HSIZE 49157 /* The size of the hash table for symbols, from Defn.h */
    extern SEXP* R_SymbolTable;
  ",
  body = "
    int symbol_count = 0;
    SEXP s;
    int j;
    for (j = 0; j < HSIZE; j++) {
      for (s = R_SymbolTable[j]; s != R_NilValue; s = CDR(s)) {
        if (CAR(s) != R_NilValue) {
          symbol_count++;
        }
      }
    }

    SEXP result = PROTECT(Rf_allocVector(STRSXP, symbol_count));
    symbol_count = 0;
    for (j = 0; j < HSIZE; j++) {
      for (s = R_SymbolTable[j]; s != R_NilValue; s = CDR(s)) {
        if (CAR(s) != R_NilValue) {
          SET_STRING_ELT(result, symbol_count, PRINTNAME(CAR(s)));
          symbol_count++;
        }
      }
    }

    UNPROTECT(1);
    return result;
  "
)

# Test it out
get_symbols()


# new_symbols() returns a character vector of symbols that have been added since
# the last time it was run.
last_symbols <- get_symbols()
new_symbols <- function() {
  cur_symbols <- get_symbols()
  res <- setdiff(cur_symbols, last_symbols)
  last_symbols <<- cur_symbols
  res
}

# Example

# The first couple times it's run, R might do something that adds symbols, like
# load the compiler package. Run it a bunch of times until it returns
# character(0).
new_symbols()
new_symbols()
new_symbols()
# character(0)


# After R stops loading things, run our code and see which new symbols have
# been added.
abcdefg <- 1
exists("xyz")
new_symbols()
#> [1] "abcdefg" "xyz"
```
fastmap/man/0000755000176200001440000000000014003624163012457 5ustar liggesusers
fastmap/man/fastqueue.Rd0000644000176200001440000000431114003624163014747 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fastqueue.R
\name{fastqueue}
\alias{fastqueue}
\title{Create a queue}
\usage{
fastqueue(init = 20, missing_default = NULL)
}
\arguments{
\item{init}{Initial size of the list that backs the queue. This is also used
as the minimum size of the list; it will not shrink any smaller.}

\item{missing_default}{The value to return when \code{remove()} or
\code{peek()} is called on an empty queue. Default is \code{NULL}.}
}
\description{
A \code{fastqueue} is backed by a list, which is used in a circular manner.
The backing list will grow or shrink as the queue changes in size.
}
\details{
\code{fastqueue} objects have the following methods:

\describe{
  \item{\code{add(x)}}{
    Add an object to the queue.
  }
  \item{\code{madd(..., .list = NULL)}}{
    Add objects to the queue. \code{.list} can be a list of objects to add.
  }
  \item{\code{remove(missing = missing_default)}}{
    Remove and return the next object in the queue. If the queue is empty,
    this will return \code{missing}, which defaults to the value of
    \code{missing_default} that \code{fastqueue()} was created with
    (typically, \code{NULL}).
  }
  \item{\code{mremove(n, missing = missing_default)}}{
    Remove and return the next \code{n} objects on the queue, in a list. The
    first element of the list is the oldest object in the queue (in other
    words, the next item that would be returned by \code{remove()}). If
    \code{n} is greater than the number of objects in the queue, any
    requested items beyond those in the queue will be replaced with
    \code{missing} (typically, \code{NULL}).
  }
  \item{\code{peek(missing = missing_default)}}{
    Return the next object in the queue but do not remove it from the queue.
    If the queue is empty, this will return \code{missing}.
  }
  \item{\code{reset()}}{
    Reset the queue, clearing all items.
  }
  \item{\code{size()}}{
    Returns the number of items in the queue.
  }
  \item{\code{as_list()}}{
    Return a list containing the objects in the queue, where the first
    element in the list is the oldest object in the queue (in other words, it
    is the next item that would be returned by \code{remove()}), and the last
    element in the list is the most recently added object.
  }
}
}
fastmap/man/key_missing.Rd0000644000176200001440000000052313546743337015307 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/key_missing.R
\name{key_missing}
\alias{key_missing}
\alias{is.key_missing}
\title{A missing key object}
\usage{
key_missing()

is.key_missing(x)
}
\arguments{
\item{x}{An object to test.}
}
\description{
A \code{key_missing} object represents a missing key.
}
fastmap/man/faststack.Rd0000644000176200001440000000375614003624163014744 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/faststack.R
\name{faststack}
\alias{faststack}
\title{Create a stack}
\usage{
faststack(init = 20, missing_default = NULL)
}
\arguments{
\item{init}{Initial size of the list that backs the stack. This is also used
as the minimum size of the list; it will not shrink any smaller.}

\item{missing_default}{The value to return when \code{pop()} or
\code{peek()} are called when the stack is empty. Default is \code{NULL}.}
}
\description{
A \code{faststack} is backed by a list. The backing list will grow or shrink
as the stack changes in size.
}
\details{
\code{faststack} objects have the following methods:

\describe{
  \item{\code{push(x)}}{
    Push an object onto the stack.
  }
  \item{\code{mpush(..., .list = NULL)}}{
    Push objects onto the stack. \code{.list} can be a list of objects to add.
  }
  \item{\code{pop(missing = missing_default)}}{
    Remove and return the top object on the stack. If the stack is empty,
    it will return \code{missing}, which defaults to the value of
    \code{missing_default} that \code{faststack()} was created with
    (typically, \code{NULL}).
  }
  \item{\code{mpop(n, missing = missing_default)}}{
    Remove and return the top \code{n} objects on the stack, in a list. The
    first element of the list is the top object in the stack. If \code{n} is
    greater than the number of objects in the stack, any requested items
    beyond those in the stack will be replaced with \code{missing}
    (typically, \code{NULL}).
  }
  \item{\code{peek(missing = missing_default)}}{
    Return the top object on the stack, but do not remove it from the stack.
    If the stack is empty, this will return \code{missing}.
  }
  \item{\code{reset()}}{
    Reset the stack, clearing all items.
  }
  \item{\code{size()}}{
    Returns the number of items in the stack.
  }
  \item{\code{as_list()}}{
    Return a list containing the objects in the stack, where the first
    element in the list is the object at the bottom of the stack, and the
    last element in the list is the object at the top of the stack.
  }
}
}
fastmap/man/fastmap.Rd0000644000176200001440000001111114003624163014374 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fastmap.R
\name{fastmap}
\alias{fastmap}
\title{Create a fastmap object}
\usage{
fastmap(missing_default = NULL)
}
\arguments{
\item{missing_default}{The value to return when \code{get()} is called with a
key that is not in the map. The default is \code{NULL}, but in some cases
it can be useful to return a sentinel value, such as a
\code{\link{key_missing}} object.}
}
\description{
A fastmap object provides a key-value store where the keys are strings and the
values are any R objects.
} \details{ In R, it is common to use environments as key-value stores, but they can leak memory: every time a new key is used, R registers it in its global symbol table, which only grows and is never garbage collected. If many different keys are used, this can cause a non-trivial amount of memory leakage. Fastmap objects do not use the symbol table and do not leak memory. Unlike with environments, the keys in a fastmap are always encoded as UTF-8, so if you call \code{$set()} with two different strings that have the same Unicode values but have different encodings, the second call will overwrite the first value. If you call \code{$keys()}, it will return UTF-8 encoded strings, and similarly, \code{$as_list()} will return a list with names that have UTF-8 encoding. Note that if you call \code{$mset()} with a named argument, where the name is non-ASCII, R will convert the name to the native encoding before fastmap has the chance to convert them to UTF-8, and the keys may get mangled in the process. However, if you use \code{$mset(.list = x)}, then R will not convert the keys to the native encoding, and the keys will be correctly converted to UTF-8. With \code{$mget()}, the keys will be converted to UTF-8 before they are fetched. \code{fastmap} objects have the following methods: \describe{ \item{\code{set(key, value)}}{ Set a key-value pair. \code{key} must be a string. Returns \code{value}. } \item{\code{mset(..., .list = NULL)}}{ Set multiple key-value pairs. The key-value pairs are named arguments, and/or a list passed in as \code{.list}. Returns a named list where the names are the keys, and the values are the values. } \item{\code{get(key, missing = missing_default)}}{ Get a value corresponding to \code{key}. If the key is not in the map, return \code{missing}. } \item{\code{mget(keys, missing = missing_default)}}{ Get values corresponding to \code{keys}, which is a character vector. 
The values will be returned in a named list where the names are the same as the \code{keys} passed in, in the same order. For keys not in the map, they will have \code{missing} for their value. } \item{\code{has(keys)}}{ Given a vector of keys, returns a logical vector reporting whether each key is contained in the map. } \item{\code{remove(keys)}}{ Given a vector of keys, remove the key-value pairs from the map. Returns a logical vector reporting whether each item existed in (and was removed from) the map. } \item{\code{keys(sort = FALSE)}}{ Returns a character vector of all the keys. By default, the keys will be in arbitrary order. Note that the order can vary across platforms and is not guaranteed to be consistent. With \code{sort=TRUE}, the keys will be sorted according to their Unicode code point values. } \item{\code{size()}}{ Returns the number of items in the map. } \item{\code{as_list(sort = FALSE)}}{ Return a named list where the names are the keys from the map, and the values are the values. By default, the keys will be in arbitrary order. Note that the order can vary across platforms and is not guaranteed to be consistent. With \code{sort=TRUE}, the keys will be sorted according to their Unicode code point values. } \item{\code{reset()}}{ Reset the fastmap object, clearing all items. 
} } } \examples{ # Create the fastmap object m <- fastmap() # Set some key-value pairs m$set("x", 100) m$set("letters", c("a", "b", "c")) m$mset(numbers = c(10, 20, 30), nothing = NULL) # Get values using keys m$get("x") m$get("numbers") m$mget(c("letters", "numbers")) # Missing keys return NULL by default, but this can be customized m$get("xyz") # Check for existence of keys m$has("x") m$has("nothing") m$has("xyz") # Remove one or more items m$remove(c("letters", "x")) # Return number of items m$size() # Get all keys m$keys() # Return named list that represents all key-value pairs str(m$as_list()) # Clear the map m$reset() # Specify missing value when get() is called m <- fastmap() m$get("x", missing = key_missing()) #> # Specify the default missing value m <- fastmap(missing_default = key_missing()) m$get("x") #> } fastmap/DESCRIPTION0000644000176200001440000000234514003630522013412 0ustar liggesusersPackage: fastmap Title: Fast Data Structures Version: 1.1.0 Authors@R: c( person("Winston", "Chang", email = "winston@rstudio.com", role = c("aut", "cre")), person(given = "RStudio", role = c("cph", "fnd")), person(given = "Tessil", role = "cph", comment = "hopscotch_map library") ) Description: Fast implementation of data structures, including a key-value store, stack, and queue. Environments are commonly used as key-value stores in R, but every time a new key is used, it is added to R's global symbol table, causing a small amount of memory leakage. This can be problematic in cases where many different keys are used. Fastmap avoids this memory leak issue by implementing the map using data structures in C++. 
License: MIT + file LICENSE Encoding: UTF-8 LazyData: true RoxygenNote: 7.1.1 Suggests: testthat (>= 2.1.1) URL: https://r-lib.github.io/fastmap/, https://github.com/r-lib/fastmap BugReports: https://github.com/r-lib/fastmap/issues NeedsCompilation: yes Packaged: 2021-01-25 20:28:54 UTC; winston Author: Winston Chang [aut, cre], RStudio [cph, fnd], Tessil [cph] (hopscotch_map library) Maintainer: Winston Chang Repository: CRAN Date/Publication: 2021-01-25 21:00:02 UTC fastmap/tests/0000755000176200001440000000000013546743337013066 5ustar liggesusersfastmap/tests/testthat/0000755000176200001440000000000014003630522014702 5ustar liggesusersfastmap/tests/testthat/test-serialize.R0000644000176200001440000000426413546743337020023 0ustar liggesuserstest_that("Serializing and unserializing a map", { # Test with difficult encodings (code borrowed from encoding tests) m <- fastmap() k1 <- "abc" # "åbc" in UTF-8 k2 <- "\u00e5bc" # "åbc" in latin1 k3 <- iconv(k2, from = "UTF-8", to = "latin1") # "中 A" in UTF-8 k4 <- "\u4e2d A" k5 <- "xyz" m$set(k1, 1) m$set(k2, 2) m$set(k3, 3) m$set(k4, 4) m$set(k5, 5) m$remove(k1) # Make a hole m1 <- unserialize(serialize(m, NULL)) expect_mapequal(m$as_list(), m1$as_list()) expect_identical(m$size(), m1$size()) expect_setequal(m$keys(), m1$keys()) expect_true(all(Encoding(m1$keys()) %in% c("unknown", "UTF-8"))) # Make sure that m1 behaves correctly when modified m$set(k1, 10) m$set(k3, 30) m1$set(k1, 10) m1$set(k3, 30) expect_mapequal(m$as_list(), m1$as_list()) expect_identical(m$size(), m1$size()) expect_true(all(Encoding(m1$keys()) %in% c("unknown", "UTF-8"))) }) test_that("Serializing and unserializing stress test", { set.seed(3524) n <- 1e4 # Generate keys and values. 
values <- rnorm(n) keys <- as.character(values) m <- fastmap() add_order <- sample.int(n) for (i in add_order) { m$set(keys[i], values[i]) } # Then remove 1/3 them in random order remove_order <- sample.int(n, size = round(1/3 * n)) for (i in remove_order) { m$remove(keys[i]) } m1 <- unserialize(serialize(m, NULL)) expect_mapequal(m$as_list(), m1$as_list()) expect_identical(m$size(), m1$size()) expect_setequal(m$keys(), m1$keys()) # Add some random subset of values to m and m1, and make sure the result is # the same. add_order <- sample.int(n, size = round(1/3 * n)) for (i in add_order) { m$set(keys[i], values[i]) m1$set(keys[i], values[i]) } expect_mapequal(m$as_list(), m1$as_list()) expect_identical(m$size(), m1$size()) expect_setequal(m$keys(), m1$keys()) # Remove a subset of values from m and m1, and make sure the result is the # same. remove_order <- sample.int(n, size = round(1/3 * n)) for (i in remove_order) { m$remove(keys[i]) m1$remove(keys[i]) } expect_mapequal(m$as_list(), m1$as_list()) expect_identical(m$size(), m1$size()) expect_setequal(m$keys(), m1$keys()) }) fastmap/tests/testthat/test-queue.R0000644000176200001440000002174014003624163017136 0ustar liggesuserstest_that("Basic operations", { q <- fastqueue(3) q$add(1) q$madd(2) q$add(3) expect_identical(q$as_list(), list(1,2,3)) expect_identical(q$remove(), 1) expect_identical(q$as_list(), list(2,3)) q$add(4) expect_identical(q$as_list(), list(2,3,4)) q$add(5) q$add(list(6,7)) expect_identical(q$as_list(), list(2,3,4,5,list(6,7))) # Grow again q$add(8) q$add(9) expect_identical(q$remove(), 2) }) test_that("Removing from empty queue", { q <- fastqueue() expect_null(q$remove()) expect_null(q$remove()) expect_identical(q$size(), 0L) expect_identical(q$as_list(), list()) q$add(5) q$add(6) expect_identical(q$as_list(), list(5, 6)) }) test_that("Adding NULL to a queue", { q <- fastqueue() q$add(NULL) expect_identical(q$as_list(), list(NULL)) # Do some other weird combinations of adding NULL q$madd(NULL, 
NULL) q$madd(.list = list(NULL)) q$madd(NULL, .list = list(NULL, NULL)) expect_identical(q$as_list(), list(NULL, NULL, NULL, NULL, NULL, NULL, NULL)) expect_identical(q$remove(missing = "foo"), NULL) expect_identical(q$remove(missing = "foo"), NULL) expect_identical(q$remove(missing = "foo"), NULL) expect_identical(q$remove(missing = "foo"), NULL) expect_identical(q$remove(missing = "foo"), NULL) expect_identical(q$remove(missing = "foo"), NULL) expect_identical(q$remove(missing = "foo"), NULL) expect_identical(q$remove(missing = "foo"), "foo") }) test_that("Different values when removing from an empty queue", { q <- fastqueue() expect_identical(q$remove(missing = "foo"), "foo") expect_identical(q$peek(missing = "foo"), "foo") q <- fastqueue(missing_default = key_missing()) expect_identical(q$remove(), key_missing()) q$add(5) q$add(6) expect_identical(q$remove(), 5) expect_identical(q$remove(), 6) expect_identical(q$remove(), key_missing()) expect_identical(q$peek(), key_missing()) expect_identical(q$remove(missing = "foo"), "foo") expect_identical(q$peek(missing = "foo"), "foo") }) test_that("Adding multiple items", { q <- fastqueue(3) q$madd(1, .list = list(3), 2) expect_identical(env(q)$q, list(1,2,3)) expect_identical(q$as_list(), list(1,2,3)) # Should cause two doublings, to capacity of 12 q$madd(4,5,6,7) expect_identical(env(q)$q, list(1,2,3,4,5,6,7,NULL,NULL,NULL,NULL,NULL)) expect_identical(q$as_list(), list(1,2,3,4,5,6,7)) q$madd(8,9) for (i in 1:5) q$remove() # Wrapping around q$madd(10,11,12,13,14,15,16) expect_identical(q$as_list(), as.list(as.numeric(6:16))) expect_identical(env(q)$q, list(13,14,15,16,NULL,6,7,8,9,10,11,12)) q <- fastqueue(3) # Should double to size 6 q$madd(1,2,3,4,5,6) expect_identical(q$as_list(), list(1,2,3,4,5,6)) expect_identical(q$remove(), 1) q$madd(7,8,9,10,11,12,13,14) expect_equal(length(env(q)$q), 24) expect_identical(q$as_list(), as.list(as.numeric(2:14))) expect_identical(env(q)$q[1:13], as.list(as.numeric(2:14))) for (i 
in 1:6) q$remove() expect_equal(length(env(q)$q), 24) expect_identical(q$as_list(), as.list(as.numeric(8:14))) # This should cause a shrink to size 12 because we hit the 1/4 threshold. expect_identical(q$remove(), 8) expect_identical(env(q)$q, list(9,10,11,12,13,14,NULL,NULL,NULL,NULL,NULL,NULL)) }) test_that("Resizing", { # Starting index 1, grow q <- fastqueue(3) q$add(1) q$add(2) q$add(3) env(q)$.resize(5) expect_identical(env(q)$q, list(1,2,3,NULL,NULL)) expect_identical(q$as_list(), list(1,2,3)) # Starting index 2, grow q <- fastqueue(3) q$add(1) q$add(2) q$add(3) expect_identical(q$remove(), 1) env(q)$.resize(4) expect_identical(env(q)$q, list(2,3,NULL,NULL)) expect_identical(q$as_list(), list(2,3)) # Starting index 3, wrap around, grow q <- fastqueue(3) q$add(1) q$add(2) q$add(3) expect_identical(q$remove(), 1) expect_identical(q$remove(), 2) q$add(4) q$add(5) env(q)$.resize(5) expect_identical(env(q)$q, list(3,4,5,NULL,NULL)) expect_identical(q$as_list(), list(3,4,5)) # Starting index 1, shrink q <- fastqueue(4) q$add(1) q$add(2) env(q)$.resize(2) expect_identical(env(q)$q, list(1,2)) expect_identical(q$as_list(), list(1,2)) # Starting index 2, shrink q <- fastqueue(4) q$add(1) q$add(2) q$add(3) expect_identical(q$remove(), 1) env(q)$.resize(2) expect_identical(env(q)$q, list(2,3)) expect_identical(q$as_list(), list(2,3)) # Starting index 3, wrap around, shrink q <- fastqueue(4) q$add(1) q$add(2) q$add(3) q$add(4) expect_identical(q$remove(), 1) expect_identical(q$remove(), 2) q$add(5) env(q)$.resize(3) expect_identical(env(q)$q, list(3,4,5)) expect_identical(q$as_list(), list(3,4,5)) # Can't shrink smaller than number of items q <- fastqueue(4) q$add(1) q$add(2) q$add(3) expect_error(env(q)$.resize(2)) expect_identical(env(q)$q, list(1,2,3,NULL)) expect_identical(q$as_list(), list(1,2,3)) }) test_that("Error expressions don't result in inconsistent state", { q <- fastqueue(4) q$add(1) expect_error(q$add(stop("2"))) q$add(3) expect_error(q$madd(stop("4"))) 
expect_identical(q$size(), 2L) expect_identical(q$peek(), 1) expect_identical(env(q)$q, list(1,3,NULL,NULL)) expect_identical(q$as_list(), list(1,3)) }) test_that("Random walk test", { q <- fastqueue() set.seed(1312) ops <- integer(2e5) csum <- 0 # Set up a random walk where we add and remove items (1=add, -1=remove), but # never try to remove when the queue is empty. # This is a "pool" of possible operations that's larger than the actual number # of operations we'll end up with. The loop below reads from this object so # that it doesn't have to call sample(, 1) over and over, because that takes a # long time. ops_pool <- sample(c(-1, 1), length(ops) * 2, replace = TRUE, prob = c(0.4, 0.6)) for (i in seq_along(ops)) { if (csum <= 0) { # Ensure we never do a -1 when already empty. ops[i] <- 1 } else { # Each position has 1 or -1, but we bias the probability toward increasing # in size. ops[i] <- ops_pool[i] } csum <- csum + ops[i] } # Set up commands to remove items until empty. for (i in (length(ops) + seq_len(csum))) { ops[i] <- -1 # csum <- csum + ops[i] } # At the end, we should have same number of add and remove commands. expect_identical(sum(ops), 0) # The values to add are 1, 2, 3, etc., and we expect them to be removed in # the same order. next_to_add <- 1 next_to_remove <- 1 for (i in seq_along(ops)) { if (ops[i] < 0) { v <- q$remove() if (v != next_to_remove) { # Don't use expect_identical() here because it's too slow in a loop. 
stop("Mismatch between actual and expected value.") } next_to_remove <- next_to_remove + 1 } else { q$add(next_to_add) next_to_add <- next_to_add + 1 } } expect_identical(q$size(), 0L) expect_identical(q$as_list(), list()) # Should be back to original internal state: a list of 20 NULLs expect_identical(env(q)$q, lapply(1:20, function(x) NULL)) }) test_that(".resize_at_least() works", { q <- fastqueue(3) n <- 1:25 names(n) <- n # When calling .resize_at_least() for each value of n, the resulting size should # be enough to fit it, and go up by doubling. capacities <- vapply(n, function(x) { env(q)$.resize_at_least(x) length(env(q)$q) }, numeric(1)) expected <- c( `1` = 3, `2` = 3, `3` = 3, `4` = 6, `5` = 6, `6` = 6, `7` = 12, `8` = 12, `9` = 12, `10` = 12, `11` = 12, `12` = 12, `13` = 24, `14` = 24, `15` = 24, `16` = 24, `17` = 24, `18` = 24, `19` = 24, `20` = 24, `21` = 24, `22` = 24, `23` = 24, `24` = 24, `25` = 48 ) expect_identical(capacities, expected) }) test_that("mremove()", { # Case 1A and 1B q <- fastqueue(5) q$madd(1,2,3,4,5) expect_identical(q$mremove(1), list(1)) expect_identical(q$mremove(4), list(2,3,4,5)) expect_identical(env(q)$q, vector("list", 5)) expect_identical(q$size(), 0L) # Case 1A and 1B, but removing exactly one item in last mremove q <- fastqueue(5) q$madd(1,2,3,4,5) expect_identical(q$mremove(4), list(1,2,3,4)) expect_identical(q$mremove(1), list(5)) expect_identical(env(q)$q, vector("list", 5)) expect_identical(q$size(), 0L) # Wrap around q <- fastqueue(5) q$madd(1,2,3,4,5) expect_identical(q$mremove(1), list(1)) q$madd(6) expect_identical(q$as_list(), list(2,3,4,5,6)) expect_identical(env(q)$q, list(6,2,3,4,5)) expect_identical(q$mremove(4), list(2,3,4,5)) expect_identical(env(q)$q, list(6,NULL,NULL,NULL,NULL)) expect_identical(q$size(), 1L) expect_identical(q$mremove(3, missing = -1), list(6, -1, -1)) q <- fastqueue(5) q$madd(1,2,3,4,5) q$remove() q$add(6) expect_identical(q$mremove(5), list(2,3,4,5,6)) q <- fastqueue(5, missing_default 
= -1) q$madd(1,2,3,4,5) q$remove() q$add(6) expect_identical(q$mremove(6), list(2,3,4,5,6,-1)) q <- fastqueue(5, missing_default = -1) q$madd(1,2,3,4,5) q$mremove(2) q$madd(6,7) expect_identical(q$mremove(6), list(3,4,5,6,7,-1)) }) fastmap/tests/testthat/test-map.R0000644000176200001440000002027214003624163016566 0ustar liggesusers test_that("General correctness", { m <- fastmap() expect_identical(m$set("asdf", c(1, 2, 3)), c(1, 2, 3)) expect_identical(m$set("foo", "blah"), "blah") expect_equal(m$get("asdf"), c(1, 2, 3)) expect_mapequal( m$as_list(), list("asdf" = c(1, 2, 3), "foo"= "blah") ) expect_true(m$has("asdf")) expect_true(m$has("foo")) expect_false(m$has("bar")) expect_identical(m$size(), 2L) expect_identical(m$size(), length(env(m)$values) - env(m)$n_holes) # Removal expect_true(m$remove("asdf")) expect_equal(m$get("asdf"), NULL) expect_mapequal( m$as_list(), list("foo"= "blah") ) expect_false(m$has("asdf")) expect_true(m$has("foo")) expect_false(m$has("bar")) expect_identical(m$size(), 1L) expect_identical(m$size(), length(env(m)$values) - env(m)$n_holes) # Removing non-existent key has no effect expect_false(m$remove("asdf")) expect_equal(m$get("asdf"), NULL) # Adding back m$set("asdf", list("a", "b")) expect_equal(m$get("asdf"), list("a", "b")) expect_mapequal( m$as_list(), list("asdf" = list("a", "b"), "foo"= "blah") ) expect_true(m$has("asdf")) expect_true(m$has("foo")) expect_false(m$has("bar")) expect_identical(m$size(), 2L) expect_identical(m$size(), length(env(m)$values) - env(m)$n_holes) # Replacing existing object m$set("asdf", list("x", "y")) expect_equal(m$get("asdf"), list("x", "y")) expect_mapequal( m$as_list(), list("asdf" = list("x", "y"), "foo"= "blah") ) expect_true(m$has("asdf")) expect_true(m$has("foo")) expect_false(m$has("bar")) expect_identical(m$size(), 2L) expect_identical(m$size(), length(env(m)$values) - env(m)$n_holes) # NULL handling m$set("asdf", NULL) expect_equal(m$get("asdf"), NULL) expect_true(m$has("asdf")) 
expect_mapequal( m$as_list(), list("asdf" = NULL, "foo"= "blah") ) }) test_that("reset", { m <- fastmap() m$set("a", 1) m$set("b", 2) m$reset() expect_equal(m$as_list(), list(a=1)[0]) expect_equal(m$size(), 0) }) test_that("Vectorized operations", { m <- fastmap() expect_identical(m$set("c", 3), 3) expect_identical(m$mset(b = -2, a = 1), list(b = -2, a = 1)) expect_identical(m$mset(b = 2, .list = list(e = 5)), list(b = 2, e = 5)) # Order does not matter for as_list() expect_mapequal( m$as_list(), list(a=1, b=2, c=3, e=5) ) # Order matters for mget(), and keys can be duplicated expect_identical( m$mget(c("e", "c", "a", "e")), list(e=5, c=3, a=1, e=5) ) expect_identical( m$has(c("e", "a", "x", "a", "y")), c(TRUE, TRUE, FALSE, TRUE, FALSE) ) # Note that when removing a duplicated key, like "a" here, it reports TRUE for # the first instance, and FALSE for the second, because it was removed when # the algorithm iterated through the vector and encountered it the first time. # I'm not sure if that's the way it should be, or if it would be better to # report TRUE both times, but that is how it currently works. 
expect_identical( m$remove(c("e", "a", "x", "a", "y")), c(TRUE, TRUE, FALSE, FALSE, FALSE) ) expect_mapequal( m$as_list(), list(b=2, c=3) ) }) test_that("Missing keys", { m <- fastmap() expect_identical(m$get("a"), NULL) expect_identical(m$get("a", missing = key_missing()), key_missing()) expect_identical(m$mget(c("a", "b")), list(a = NULL, b = NULL)) expect_identical( m$mget(c("a", "b"), missing = key_missing()), list(a = key_missing(), b = key_missing()) ) # With a different default for missing m <- fastmap(missing_default = key_missing()) expect_identical(m$get("a"), key_missing()) expect_true(is.key_missing(m$get("a"))) expect_identical(m$get("a", missing = NULL), NULL) expect_identical(m$get("a"), key_missing()) expect_identical(m$mget(c("a", "b")), list(a = key_missing(), b = key_missing())) expect_identical( m$mget(c("a", "b"), missing = NULL), list(a = NULL, b = NULL) ) }) test_that("Malformed keys", { m <- fastmap() expect_error(m$set(1, 123)) expect_error(m$set(TRUE, 123)) expect_error(m$set(NA_character_, 123)) expect_error(m$set(NA_integer_, 123)) expect_error(m$set("", 123)) expect_error(m$set(character(0), 123)) expect_error(m$set(numeric(0), 123)) expect_error(m$set(NULL, 123)) args <- list(1,2,3) names(args) <- c("a", NA_character_, "c") expect_error(m$mset(.list = args)) # Make sure no values got set expect_true(length(m$as_list()) == 0) expect_identical(m$keys(), character(0)) expect_identical(m$size(), 0L) expect_identical(m$get("a"), NULL) expect_error(m$get(1)) expect_error(m$get(TRUE)) expect_error(m$get(NA_character_)) expect_error(m$get(NA_integer_)) expect_error(m$get("")) expect_error(m$get(character(0))) expect_error(m$get(numeric(0))) expect_error(m$get(NULL)) expect_identical(m$mget(NULL), list(a=1)[0]) # Empty named list expect_error(m$mget(c(1, 2))) expect_error(m$mget(c("A", ""))) expect_error(m$mget(c("A", NA))) expect_error(m$has(1)) expect_error(m$has(TRUE)) expect_error(m$has(NA_character_)) expect_error(m$has(NA_integer_)) 
expect_error(m$has("")) # has() is a bit more lenient than get() because it accepts a vector. expect_silent(m$has(character(0))) expect_silent(m$has(NULL)) expect_error(m$has(numeric(0))) expect_error(m$remove(NA_character_)) expect_error(m$remove(NA_integer_)) expect_error(m$remove("")) # remove() is a bit more lenient than get() because it accepts a vector. m$remove(character(0)) m$remove(NULL) expect_error(m$remove(numeric(0))) # Key or value unspecified expect_error(m$set("a")) expect_error(m$set(value = 123)) }) test_that("Vectorized operations are all-or-nothing", { # An error in set() won't leave the map in an inconsistent state. m <- fastmap(missing_default = key_missing()) expect_error(m$set("a", stop("oops"))) expect_identical(m$size(), 0L) expect_true(length(m$as_list()) == 0) expect_identical(m$get("a"), key_missing()) # Same for mset() expect_error(m$mset(a=1, b=stop("oops"))) expect_identical(m$size(), 0L) expect_true(length(m$as_list()) == 0) expect_identical(m$get("a"), key_missing()) # mset(): one bad key stops the entire thing from happening. expect_error(m$mset(a=1, 2, c=3)) expect_identical(m$size(), 0L) expect_true(length(m$as_list()) == 0) expect_identical(m$get("a"), key_missing()) # mget(): one bad key causes the whole call to fail. m$mset(a=1, b=2, c=3) expect_error(m$mget(c("a", NA, "c"))) # remove(): one bad key stops all from being removed. expect_error(m$remove(c("a", NA, "c"))) expect_identical(m$size(), 3L) expect_mapequal(m$as_list(), list(a=1, b=2, c=3)) expect_identical(m$get("a"), 1) # has(): one bad key causes the whole call to fail, and the map is unchanged. 
expect_error(m$has(c("a", "", "c"))) expect_identical(m$size(), 3L) expect_mapequal(m$as_list(), list(a=1, b=2, c=3)) expect_identical(m$get("a"), 1) }) test_that("Sorting keys", { m <- fastmap() m$mset(c = 3, a = 1, ".d" = 4, b = 2) expect_identical(m$keys(sort = TRUE), c(".d", "a", "b", "c")) expect_identical( m$as_list(sort = TRUE), list(".d" = 4, a = 1, b = 2, c = 3) ) # Sorting is done by Unicode code point, and is locale-independent. m <- fastmap() m$set("é", 1) m$set("z", 2) expect_identical(m$keys(sort = TRUE), c("z", "é")) }) test_that("Stress test, compared to environment", { # Randomly add and remove items, and compare results with an environment. # This should add and remove enough items so that grow() and shrink() are called # several times. set.seed(2250) iterations <- 5 n <- 1e4 # Generate keys and values. values <- rnorm(n) keys <- as.character(values) e <- new.env(parent = emptyenv()) m <- fastmap() for (iter in seq_len(iterations)) { # Add a random 3/4 of the keys add_order <- sample.int(n, size = round(3/4 * n)) for (i in add_order) { e[[keys[i]]] <- values[i] m$set(keys[i], values[i]) } # Then remove a random 3/4 of them remove_order <- sample.int(n, size = round(3/4 * n)) for (i in remove_order) { if (exists(keys[i], envir = e)) rm(list = keys[i], envir = e) # No need to check for existence first with fastmap m$remove(keys[i]) } expect_mapequal(as.list(e), m$as_list()) } }) fastmap/tests/testthat/helpers-fastmap.R0000644000176200001440000000022614003624163020124 0ustar liggesusers# Get the environment from a fastmap/fastqueue/faststack object, so we can # access internal objects. 
env <- function(x) { environment(x$as_list) } fastmap/tests/testthat/test-stack.R0000644000176200001440000000662514003624163017124 0ustar liggesuserstest_that("Basic operations", { s <- faststack() expect_identical(s$size(), 0L) s$push(5) s$push(6) s$push(NULL) s$push(list(a=1, b=2)) s$mpush(.list=list(NULL)) s$mpush(.list=list(NULL,7)) s$mpush(8, .list=list(10), 9) # as_list() returns in the order that they were inserted expect_identical( s$as_list(), list( 5, 6, NULL, list(a=1, b=2), NULL, NULL, 7, 8, 9, 10 ) ) expect_identical(s$pop(), 10) expect_identical(s$pop(), 9) expect_identical(s$pop(), 8) expect_identical(s$pop(), 7) expect_identical(s$pop(), NULL) expect_identical(s$pop(), NULL) expect_identical(s$pop(), list(a=1, b=2)) expect_identical(s$peek(), NULL) expect_identical(s$pop(), NULL) expect_identical(s$size(), 2L) expect_identical(s$as_list(), list(5, 6)) s$reset() expect_identical(s$size(), 0L) expect_identical(s$as_list(), list()) }) test_that("Pushing multiple", { s <- faststack() s$mpush(1,2,3) s$mpush(4,5, .list = list(6, list(7,8))) s$mpush(9,10) expect_identical(s$as_list(), list(1,2,3,4,5,6,list(7,8),9,10)) expect_identical(s$pop(), 10) expect_identical(s$pop(), 9) expect_identical(s$pop(), list(7,8)) }) test_that("Popping from empty stack", { s <- faststack() expect_null(s$pop()) expect_null(s$pop()) expect_null(s$peek()) expect_identical(s$size(), 0L) s$push(5) s$push(6) expect_identical(s$as_list(), list(5, 6)) }) test_that("Different values when popping from an empty stack", { s <- faststack() expect_identical(s$pop(missing = "foo"), "foo") expect_identical(s$peek(missing = "foo"), "foo") s <- faststack(missing_default = key_missing()) expect_identical(s$pop(), key_missing()) expect_identical(s$pop(), key_missing()) expect_identical(s$peek(), key_missing()) expect_identical(s$size(), 0L) s$push(5) s$push(6) expect_identical(s$pop(), 6) expect_identical(s$pop(), 5) expect_identical(s$pop(missing = "foo"), "foo") expect_identical(s$pop(), 
key_missing()) }) test_that("Error expressions prevent any from being added", { s <- faststack() expect_error(s$push(1, stop("2"), 3)) expect_identical(s$size(), 0L) expect_null(s$peek()) expect_identical(s$as_list(), list()) expect_error(s$push(1, .list = list(2, stop("3")), 4)) expect_identical(s$size(), 0L) expect_null(s$peek()) expect_identical(s$as_list(), list()) }) test_that("mpop()", { s <- faststack(2L) s$mpush(1,2,3,4,5,6,7,8,9,10,11,12,13) expect_identical(s$mpop(6), list(13,12,11,10,9,8)) expect_identical(s$as_list(), list(1,2,3,4,5,6,7)) expect_identical(s$size(), 7L) # Check that we did NOT resize the underlying list since we haven't gone under # the 1/2 threshold. expect_identical(env(s)$s, c(list(1,2,3,4,5,6,7), rep(list(NULL), 6))) expect_identical(s$mpop(1), list(7)) expect_identical(s$as_list(), list(1,2,3,4,5,6)) expect_identical(s$size(), 6L) # Now we should have resized. expect_identical(env(s)$s, list(1,2,3,4,5,6)) expect_identical(s$mpop(9), list(6,5,4,3,2,1,NULL,NULL,NULL)) expect_identical(env(s)$s, list(NULL,NULL)) expect_identical(s$size(), 0L) # Different `missing` s <- faststack(2, missing_default = NA) s$mpush(1,2,3,4,5,6,7,8,9,10,11,12,13) expect_identical(s$mpop(6), list(13,12,11,10,9,8)) expect_identical(s$mpop(9), list(7,6,5,4,3,2,1,NA,NA)) expect_identical(s$mpop(3, missing = "x"), list("x","x","x")) }) fastmap/tests/testthat/test-shrink.R0000644000176200001440000000130714003624163017305 0ustar liggesusers test_that("Shrinking a map", { m <- fastmap() m$set("d", 4) m$set("g", 7) m$set("b", 2) m$set("a", 1) m$set("c", 3) m$set("e", 5) m$set("f", 6) env(m)$shrink() expect_mapequal( m$as_list(), list(a = 1, b = 2, c = 3, d = 4, e = 5, f= 6, g = 7) ) m$remove("d") m$remove("a") m$remove("c") m$remove("e") m$set("e", 5) env(m)$shrink() expect_mapequal( m$as_list(), list(b = 2, e = 5, f= 6, g = 7) ) # Second shrinking does not change anything env(m)$shrink() expect_mapequal( m$as_list(), list(b = 2, e = 5, f= 6, g = 7) ) }) 
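# Illustrative sketch, not part of the original test suite: after removals, an
# explicit shrink should leave lookups, size, and contents unchanged. It assumes
# only methods already exercised by the surrounding tests (mset, remove,
# as_list(sort = TRUE), size, has, get) and the env() helper; the keys and
# values are arbitrary.
test_that("Shrinking preserves lookups (sketch)", {
  m <- fastmap()
  m$mset(a = 1, b = 2, c = 3)
  m$remove("b")

  # Snapshot the contents in a deterministic order before shrinking.
  before <- m$as_list(sort = TRUE)
  env(m)$shrink()

  expect_identical(m$as_list(sort = TRUE), before)
  expect_identical(m$size(), 2L)
  expect_false(m$has("b"))
  expect_identical(m$get("a"), 1)
})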
test_that("Shrinking empty map", { m <- fastmap() env(m)$shrink() expect_true(length(m$as_list()) == 0) }) fastmap/tests/testthat/test-encoding.R0000644000176200001440000000700713546743337017620 0ustar liggesusers# Note: for manual testing on Mac, the following can be used to set a multi-byte # but non-UTF-8 locale: # Sys.setlocale("LC_ALL", "ja_JP.SJIS") test_that("Non-ASCII keys are represented as UTF-8", { m <- fastmap() k1 <- "abc" # "åbc" in UTF-8 k2 <- "\u00e5bc" # "åbc" in latin1 k3 <- iconv(k2, from = "UTF-8", to = "latin1") # "中 A" in UTF-8 k4 <- "\u4e2d A" expect_identical(Encoding(k2), "UTF-8") expect_identical(Encoding(k3), "latin1") expect_identical(Encoding(k4), "UTF-8") m$set(k1, 1) m$set(k2, 2) # Should overwrite k2 since the keys are the same strings in different # encodings, and fastmap converts keys to UTF-8. m$set(k3, 3) m$set(k4, 4) expect_identical(m$get(k1), 1) expect_identical(m$get(k2), 3) expect_identical(m$get(k3), 3) expect_identical(m$get(k4), 4) # keys() should be in UTF-8 keys <- m$keys() # Note: expect_setequal (and expect_identical, for that matter) compares # strings but converts them to the same encoding before comparison, so we need # to separately check encoding. 
expect_setequal(keys, c(k1, k2, k4)) expect_true(Encoding(keys[keys == k1]) == "unknown") expect_true(Encoding(keys[keys == k2]) == "UTF-8") expect_true(Encoding(keys[keys == k3]) == "UTF-8") expect_true(Encoding(keys[keys == k4]) == "UTF-8") # names for as_list() should be in UTF-8 m_list <- m$as_list() expect_mapequal( m_list, setNames(list(1, 3, 4), c(k1, k2, k4)) ) keys <- names(m_list) expect_setequal(keys, c(k1, k2, k4)) expect_true(Encoding(keys[keys == k1]) == "unknown") expect_true(Encoding(keys[keys == k2]) == "UTF-8") expect_true(Encoding(keys[keys == k3]) == "UTF-8") expect_true(Encoding(keys[keys == k4]) == "UTF-8") }) test_that("Non-ASCII keys with mset and mget", { m <- fastmap() k1 <- "abc" # "åbc" in UTF-8 k2 <- "\u00e5bc" # "åbc" in latin1 k3 <- iconv(k2, from = "UTF-8", to = "latin1") # "中 A" in UTF-8 k4 <- "\u4e2d A" args <- setNames(list(1, 2, 3, 4), c(k1, k2, k3, k4)) expect_identical( Encoding(names(args)), c("unknown", "UTF-8", "latin1", "UTF-8") ) # These are just here for comparison purposes. R will convert the argument # names to native encoding before fastmap can convert the names (keys) to # UTF-8. In a UTF-8 locale, the tests below would pass; in some non-UTF-8 # locales, the tests would fail. They're commented out because we can't expect # them to pass on all platforms. # do.call(m$mset, args) # expect_identical(m$get(k1), 1) # expect_identical(m$get(k2), 3) # expect_identical(m$get(k3), 3) # expect_identical(m$get(k4), 4) # Same as above, but using .list. This should succeed in all locales. 
m <- fastmap() m$mset(.list = args) expect_identical(m$get(k1), 1) expect_identical(m$get(k2), 3) expect_identical(m$get(k3), 3) expect_identical(m$get(k4), 4) # names for as_list() should be in UTF-8 m_list <- m$as_list() expect_mapequal( m_list, setNames(list(1, 3, 4), c(k1, k2, k4)) ) keys <- names(m_list) expect_setequal(keys, c(k1, k2, k4)) expect_true(Encoding(keys[keys == k1]) == "unknown") expect_true(Encoding(keys[keys == k2]) == "UTF-8") expect_true(Encoding(keys[keys == k3]) == "UTF-8") expect_true(Encoding(keys[keys == k4]) == "UTF-8") # mget will convert the latin1 key to UTF-8 res <- m$mget(c(k1, k2, k3, k4)) expect_identical( Encoding(names(res)), c("unknown", "UTF-8", "UTF-8", "UTF-8") ) expect_identical(names(res), c(k1, k2, k2, k4)) expect_identical(unname(res), list(1, 3, 3, 4)) }) fastmap/tests/testthat.R0000644000176200001440000000007213546743337015050 0ustar liggesuserslibrary(testthat) library(fastmap) test_check("fastmap") fastmap/src/0000755000176200001440000000000014003625006012470 5ustar liggesusersfastmap/src/init.c0000644000176200001440000000235213546743337013624 0ustar liggesusers#include #include #include // for NULL #include #include /* .Call calls */ extern SEXP C_map_create(); extern SEXP C_map_set(SEXP, SEXP, SEXP); extern SEXP C_map_get(SEXP, SEXP); extern SEXP C_map_remove(SEXP, SEXP); extern SEXP C_map_keys(SEXP, SEXP); extern SEXP C_map_keys_idxs(SEXP, SEXP); extern SEXP C_char_vec_to_utf8(SEXP); extern SEXP C_xptr_is_null(SEXP); static const R_CallMethodDef CallEntries[] = { {"C_map_create", (DL_FUNC) &C_map_create, 0}, {"C_map_set", (DL_FUNC) &C_map_set, 3}, {"C_map_get", (DL_FUNC) &C_map_get, 2}, {"C_map_remove", (DL_FUNC) &C_map_remove, 2}, {"C_map_keys", (DL_FUNC) &C_map_keys, 2}, {"C_map_keys_idxs", (DL_FUNC) &C_map_keys_idxs, 2}, {"C_char_vec_to_utf8", (DL_FUNC) &C_char_vec_to_utf8, 1}, {"C_xptr_is_null", (DL_FUNC) &C_xptr_is_null, 1}, {NULL, NULL, 0} }; attribute_visible void R_init_fastmap(DllInfo *dll) { 
R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } fastmap/src/Makevars0000644000176200001440000000022113546743337014202 0ustar liggesusers# Use C++11 if available CXX_STD=CXX11 # Require Rf_ prefix for R's functions, to avoid clashes. PKG_CXXFLAGS=-DR_NO_REMAP PKG_CPPFLAGS=-Ilib/ fastmap/src/lib/0000755000176200001440000000000013546743337013261 5ustar liggesusersfastmap/src/lib/tsl/0000755000176200001440000000000013546743337014063 5ustar liggesusersfastmap/src/lib/tsl/hopscotch_hash.h0000644000176200001440000020610313546743337017233 0ustar liggesusers/** * MIT License * * Copyright (c) 2017 Tessil * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell * copies of the Software, and to permit persons to whom the Software is * furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
*/ #ifndef TSL_HOPSCOTCH_HASH_H #define TSL_HOPSCOTCH_HASH_H #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "hopscotch_growth_policy.h" #if (defined(__GNUC__) && (__GNUC__ == 4) && (__GNUC_MINOR__ < 9)) # define TSL_HH_NO_RANGE_ERASE_WITH_CONST_ITERATOR #endif /* * Only activate tsl_hh_assert if TSL_DEBUG is defined. * This way we avoid the performance hit when NDEBUG is not defined with assert as tsl_hh_assert is used a lot * (people usually compile with "-O3" and not "-O3 -DNDEBUG"). */ #ifdef TSL_DEBUG # define tsl_hh_assert(expr) assert(expr) #else # define tsl_hh_assert(expr) (static_cast(0)) #endif namespace tsl { namespace detail_hopscotch_hash { template struct make_void { using type = void; }; template struct has_is_transparent : std::false_type { }; template struct has_is_transparent::type> : std::true_type { }; template struct has_key_compare : std::false_type { }; template struct has_key_compare::type> : std::true_type { }; template struct is_power_of_two_policy: std::false_type { }; template struct is_power_of_two_policy>: std::true_type { }; /* * smallest_type_for_min_bits::type returns the smallest type that can fit MinBits. */ static const std::size_t SMALLEST_TYPE_MAX_BITS_SUPPORTED = 64; template class smallest_type_for_min_bits { }; template class smallest_type_for_min_bits 0) && (MinBits <= 8)>::type> { public: using type = std::uint_least8_t; }; template class smallest_type_for_min_bits 8) && (MinBits <= 16)>::type> { public: using type = std::uint_least16_t; }; template class smallest_type_for_min_bits 16) && (MinBits <= 32)>::type> { public: using type = std::uint_least32_t; }; template class smallest_type_for_min_bits 32) && (MinBits <= 64)>::type> { public: using type = std::uint_least64_t; }; /* * Each bucket may store up to three elements: * - An aligned storage to store a value_type object with placement-new. 
* - An (optional) hash of the value in the bucket. * - An unsigned integer of type neighborhood_bitmap used to tell us which buckets in the neighborhood of the * current bucket contain a value with a hash belonging to the current bucket. * * For a bucket 'bct', a bit 'i' (counting from 0 and from the least significant bit to the most significant) * set to 1 means that the bucket 'bct + i' contains a value with a hash belonging to bucket 'bct'. * The bits used for that start from the third least significant bit. * The two least significant bits are reserved: * - The least significant bit is set to 1 if there is a value in the bucket storage. * - The second least significant bit is set to 1 if there is an overflow. When more than NeighborhoodSize values * give the same hash, all overflow values are stored in the m_overflow_elements list of the map. * * Details regarding hopscotch hashing and its implementation can be found here: * https://tessil.github.io/2016/08/29/hopscotch-hashing.html */ static const std::size_t NB_RESERVED_BITS_IN_NEIGHBORHOOD = 2; using truncated_hash_type = std::uint_least32_t; /** * Helper class that stores a truncated hash if StoreHash is true and nothing otherwise. 
*/ template class hopscotch_bucket_hash { public: bool bucket_hash_equal(std::size_t /*hash*/) const noexcept { return true; } truncated_hash_type truncated_bucket_hash() const noexcept { return 0; } protected: void copy_hash(const hopscotch_bucket_hash& ) noexcept { } void set_hash(truncated_hash_type /*hash*/) noexcept { } }; template<> class hopscotch_bucket_hash { public: bool bucket_hash_equal(std::size_t hash) const noexcept { return m_hash == truncated_hash_type(hash); } truncated_hash_type truncated_bucket_hash() const noexcept { return m_hash; } protected: void copy_hash(const hopscotch_bucket_hash& bucket) noexcept { m_hash = bucket.m_hash; } void set_hash(truncated_hash_type hash) noexcept { m_hash = hash; } private: truncated_hash_type m_hash; }; template class hopscotch_bucket: public hopscotch_bucket_hash { private: static const std::size_t MIN_NEIGHBORHOOD_SIZE = 4; static const std::size_t MAX_NEIGHBORHOOD_SIZE = SMALLEST_TYPE_MAX_BITS_SUPPORTED - NB_RESERVED_BITS_IN_NEIGHBORHOOD; static_assert(NeighborhoodSize >= 4, "NeighborhoodSize should be >= 4."); // We can't put a variable in the message, ensure coherence static_assert(MIN_NEIGHBORHOOD_SIZE == 4, ""); static_assert(NeighborhoodSize <= 62, "NeighborhoodSize should be <= 62."); // We can't put a variable in the message, ensure coherence static_assert(MAX_NEIGHBORHOOD_SIZE == 62, ""); static_assert(!StoreHash || NeighborhoodSize <= 30, "NeighborhoodSize should be <= 30 if StoreHash is true."); // We can't put a variable in the message, ensure coherence static_assert(MAX_NEIGHBORHOOD_SIZE - 32 == 30, ""); using bucket_hash = hopscotch_bucket_hash; public: using value_type = ValueType; using neighborhood_bitmap = typename smallest_type_for_min_bits::type; hopscotch_bucket() noexcept: bucket_hash(), m_neighborhood_infos(0) { tsl_hh_assert(empty()); } hopscotch_bucket(const hopscotch_bucket& bucket) noexcept(std::is_nothrow_copy_constructible::value): bucket_hash(bucket), m_neighborhood_infos(0) { 
if(!bucket.empty()) { ::new (static_cast(std::addressof(m_value))) value_type(bucket.value()); } m_neighborhood_infos = bucket.m_neighborhood_infos; } hopscotch_bucket(hopscotch_bucket&& bucket) noexcept(std::is_nothrow_move_constructible::value) : bucket_hash(std::move(bucket)), m_neighborhood_infos(0) { if(!bucket.empty()) { ::new (static_cast(std::addressof(m_value))) value_type(std::move(bucket.value())); } m_neighborhood_infos = bucket.m_neighborhood_infos; } hopscotch_bucket& operator=(const hopscotch_bucket& bucket) noexcept(std::is_nothrow_copy_constructible::value) { if(this != &bucket) { remove_value(); bucket_hash::operator=(bucket); if(!bucket.empty()) { ::new (static_cast(std::addressof(m_value))) value_type(bucket.value()); } m_neighborhood_infos = bucket.m_neighborhood_infos; } return *this; } hopscotch_bucket& operator=(hopscotch_bucket&& ) = delete; ~hopscotch_bucket() noexcept { if(!empty()) { destroy_value(); } } neighborhood_bitmap neighborhood_infos() const noexcept { return neighborhood_bitmap(m_neighborhood_infos >> NB_RESERVED_BITS_IN_NEIGHBORHOOD); } void set_overflow(bool has_overflow) noexcept { if(has_overflow) { m_neighborhood_infos = neighborhood_bitmap(m_neighborhood_infos | 2); } else { m_neighborhood_infos = neighborhood_bitmap(m_neighborhood_infos & ~2); } } bool has_overflow() const noexcept { return (m_neighborhood_infos & 2) != 0; } bool empty() const noexcept { return (m_neighborhood_infos & 1) == 0; } void toggle_neighbor_presence(std::size_t ineighbor) noexcept { tsl_hh_assert(ineighbor <= NeighborhoodSize); m_neighborhood_infos = neighborhood_bitmap( m_neighborhood_infos ^ (1ull << (ineighbor + NB_RESERVED_BITS_IN_NEIGHBORHOOD))); } bool check_neighbor_presence(std::size_t ineighbor) const noexcept { tsl_hh_assert(ineighbor <= NeighborhoodSize); if(((m_neighborhood_infos >> (ineighbor + NB_RESERVED_BITS_IN_NEIGHBORHOOD)) & 1) == 1) { return true; } return false; } value_type& value() noexcept { tsl_hh_assert(!empty()); 
return *reinterpret_cast(std::addressof(m_value)); } const value_type& value() const noexcept { tsl_hh_assert(!empty()); return *reinterpret_cast(std::addressof(m_value)); } template void set_value_of_empty_bucket(truncated_hash_type hash, Args&&... value_type_args) { tsl_hh_assert(empty()); ::new (static_cast(std::addressof(m_value))) value_type(std::forward(value_type_args)...); set_empty(false); this->set_hash(hash); } void swap_value_into_empty_bucket(hopscotch_bucket& empty_bucket) { tsl_hh_assert(empty_bucket.empty()); if(!empty()) { ::new (static_cast(std::addressof(empty_bucket.m_value))) value_type(std::move(value())); empty_bucket.copy_hash(*this); empty_bucket.set_empty(false); destroy_value(); set_empty(true); } } void remove_value() noexcept { if(!empty()) { destroy_value(); set_empty(true); } } void clear() noexcept { if(!empty()) { destroy_value(); } m_neighborhood_infos = 0; tsl_hh_assert(empty()); } static truncated_hash_type truncate_hash(std::size_t hash) noexcept { return truncated_hash_type(hash); } private: void set_empty(bool is_empty) noexcept { if(is_empty) { m_neighborhood_infos = neighborhood_bitmap(m_neighborhood_infos & ~1); } else { m_neighborhood_infos = neighborhood_bitmap(m_neighborhood_infos | 1); } } void destroy_value() noexcept { tsl_hh_assert(!empty()); value().~value_type(); } private: using storage = typename std::aligned_storage::type; neighborhood_bitmap m_neighborhood_infos; storage m_value; }; /** * Internal common class used by (b)hopscotch_map and (b)hopscotch_set. * * ValueType is what will be stored by hopscotch_hash (usually std::pair for a map and Key for a set). * * KeySelect should be a FunctionObject which takes a ValueType in parameter and returns a reference to the key. * * ValueSelect should be a FunctionObject which takes a ValueType in parameter and returns a reference to the value. * ValueSelect should be void if there is no value (in a set for example). 
* * OverflowContainer will be used as containers for overflown elements. Usually it should be a list * or a set/map. */ template class hopscotch_hash: private Hash, private KeyEqual, private GrowthPolicy { private: template using has_mapped_type = typename std::integral_constant::value>; static_assert(noexcept(std::declval().bucket_for_hash(std::size_t(0))), "GrowthPolicy::bucket_for_hash must be noexcept."); static_assert(noexcept(std::declval().clear()), "GrowthPolicy::clear must be noexcept."); public: template class hopscotch_iterator; using key_type = typename KeySelect::key_type; using value_type = ValueType; using size_type = std::size_t; using difference_type = std::ptrdiff_t; using hasher = Hash; using key_equal = KeyEqual; using allocator_type = Allocator; using reference = value_type&; using const_reference = const value_type&; using pointer = value_type*; using const_pointer = const value_type*; using iterator = hopscotch_iterator; using const_iterator = hopscotch_iterator; private: using hopscotch_bucket = tsl::detail_hopscotch_hash::hopscotch_bucket; using neighborhood_bitmap = typename hopscotch_bucket::neighborhood_bitmap; using buckets_allocator = typename std::allocator_traits::template rebind_alloc; using buckets_container_type = std::vector; using overflow_container_type = OverflowContainer; static_assert(std::is_same::value, "OverflowContainer should have ValueType as type."); static_assert(std::is_same::value, "Invalid allocator, not the same type as the value_type."); using iterator_buckets = typename buckets_container_type::iterator; using const_iterator_buckets = typename buckets_container_type::const_iterator; using iterator_overflow = typename overflow_container_type::iterator; using const_iterator_overflow = typename overflow_container_type::const_iterator; public: /** * The `operator*()` and `operator->()` methods return a const reference and const pointer respectively to the * stored value type. 
* * In case of a map, to get a modifiable reference to the value associated to a key (the `.second` in the * stored pair), you have to call `value()`. */ template class hopscotch_iterator { friend class hopscotch_hash; private: using iterator_bucket = typename std::conditional::type; using iterator_overflow = typename std::conditional::type; hopscotch_iterator(iterator_bucket buckets_iterator, iterator_bucket buckets_end_iterator, iterator_overflow overflow_iterator) noexcept : m_buckets_iterator(buckets_iterator), m_buckets_end_iterator(buckets_end_iterator), m_overflow_iterator(overflow_iterator) { } public: using iterator_category = std::forward_iterator_tag; using value_type = const typename hopscotch_hash::value_type; using difference_type = std::ptrdiff_t; using reference = value_type&; using pointer = value_type*; hopscotch_iterator() noexcept { } // Copy constructor from iterator to const_iterator. template::type* = nullptr> hopscotch_iterator(const hopscotch_iterator& other) noexcept : m_buckets_iterator(other.m_buckets_iterator), m_buckets_end_iterator(other.m_buckets_end_iterator), m_overflow_iterator(other.m_overflow_iterator) { } hopscotch_iterator(const hopscotch_iterator& other) = default; hopscotch_iterator(hopscotch_iterator&& other) = default; hopscotch_iterator& operator=(const hopscotch_iterator& other) = default; hopscotch_iterator& operator=(hopscotch_iterator&& other) = default; const typename hopscotch_hash::key_type& key() const { if(m_buckets_iterator != m_buckets_end_iterator) { return KeySelect()(m_buckets_iterator->value()); } return KeySelect()(*m_overflow_iterator); } template::value>::type* = nullptr> typename std::conditional< IsConst, const typename U::value_type&, typename U::value_type&>::type value() const { if(m_buckets_iterator != m_buckets_end_iterator) { return U()(m_buckets_iterator->value()); } return U()(*m_overflow_iterator); } reference operator*() const { if(m_buckets_iterator != m_buckets_end_iterator) { return 
m_buckets_iterator->value();
            }

            return *m_overflow_iterator;
        }

        pointer operator->() const {
            if(m_buckets_iterator != m_buckets_end_iterator) {
                return std::addressof(m_buckets_iterator->value());
            }

            return std::addressof(*m_overflow_iterator);
        }

        hopscotch_iterator& operator++() {
            if(m_buckets_iterator == m_buckets_end_iterator) {
                ++m_overflow_iterator;
                return *this;
            }

            do {
                ++m_buckets_iterator;
            } while(m_buckets_iterator != m_buckets_end_iterator && m_buckets_iterator->empty());

            return *this;
        }

        hopscotch_iterator operator++(int) {
            hopscotch_iterator tmp(*this);
            ++*this;

            return tmp;
        }

        friend bool operator==(const hopscotch_iterator& lhs, const hopscotch_iterator& rhs) {
            return lhs.m_buckets_iterator == rhs.m_buckets_iterator &&
                   lhs.m_overflow_iterator == rhs.m_overflow_iterator;
        }

        friend bool operator!=(const hopscotch_iterator& lhs, const hopscotch_iterator& rhs) {
            return !(lhs == rhs);
        }

    private:
        iterator_bucket m_buckets_iterator;
        iterator_bucket m_buckets_end_iterator;
        iterator_overflow m_overflow_iterator;
    };

public:
    template<class OC = OverflowContainer, typename std::enable_if<!has_key_compare<OC>::value>::type* = nullptr>
    hopscotch_hash(size_type bucket_count,
                   const Hash& hash,
                   const KeyEqual& equal,
                   const Allocator& alloc,
                   float max_load_factor) : Hash(hash), KeyEqual(equal), GrowthPolicy(bucket_count),
        m_buckets_data(alloc), m_overflow_elements(alloc),
        m_buckets(static_empty_bucket_ptr()), m_nb_elements(0)
    {
        if(bucket_count > max_bucket_count()) {
            throw std::length_error("The map exceeds its maximum size.");
        }

        if(bucket_count > 0) {
            static_assert(NeighborhoodSize - 1 > 0, "");

            // Can't directly construct with the appropriate size in the initializer
            // as m_buckets_data(bucket_count, alloc) is not supported by GCC 4.8
            m_buckets_data.resize(bucket_count + NeighborhoodSize - 1);
            m_buckets = m_buckets_data.data();
        }

        this->max_load_factor(max_load_factor);

        // Check in the constructor instead of outside of a function to avoid compilation issues
        // when value_type is not complete.
        static_assert(std::is_nothrow_move_constructible<value_type>::value ||
                      std::is_copy_constructible<value_type>::value,
                      "value_type must be either copy constructible or nothrow move constructible.");
    }

    template<class OC = OverflowContainer, typename std::enable_if<has_key_compare<OC>::value>::type* = nullptr>
    hopscotch_hash(size_type bucket_count,
                   const Hash& hash,
                   const KeyEqual& equal,
                   const Allocator& alloc,
                   float max_load_factor,
                   const typename OC::key_compare& comp) : Hash(hash), KeyEqual(equal), GrowthPolicy(bucket_count),
        m_buckets_data(alloc), m_overflow_elements(comp, alloc),
        m_buckets(static_empty_bucket_ptr()), m_nb_elements(0)
    {
        if(bucket_count > max_bucket_count()) {
            throw std::length_error("The map exceeds its maximum size.");
        }

        if(bucket_count > 0) {
            static_assert(NeighborhoodSize - 1 > 0, "");

            // Can't directly construct with the appropriate size in the initializer
            // as m_buckets_data(bucket_count, alloc) is not supported by GCC 4.8
            m_buckets_data.resize(bucket_count + NeighborhoodSize - 1);
            m_buckets = m_buckets_data.data();
        }

        this->max_load_factor(max_load_factor);

        // Check in the constructor instead of outside of a function to avoid compilation issues
        // when value_type is not complete.
static_assert(std::is_nothrow_move_constructible::value || std::is_copy_constructible::value, "value_type must be either copy constructible or nothrow move constructible."); } hopscotch_hash(const hopscotch_hash& other): Hash(other), KeyEqual(other), GrowthPolicy(other), m_buckets_data(other.m_buckets_data), m_overflow_elements(other.m_overflow_elements), m_buckets(m_buckets_data.empty()?static_empty_bucket_ptr(): m_buckets_data.data()), m_nb_elements(other.m_nb_elements), m_max_load_factor(other.m_max_load_factor), m_max_load_threshold_rehash(other.m_max_load_threshold_rehash), m_min_load_threshold_rehash(other.m_min_load_threshold_rehash) { } hopscotch_hash(hopscotch_hash&& other) noexcept( std::is_nothrow_move_constructible::value && std::is_nothrow_move_constructible::value && std::is_nothrow_move_constructible::value && std::is_nothrow_move_constructible::value && std::is_nothrow_move_constructible::value ): Hash(std::move(static_cast(other))), KeyEqual(std::move(static_cast(other))), GrowthPolicy(std::move(static_cast(other))), m_buckets_data(std::move(other.m_buckets_data)), m_overflow_elements(std::move(other.m_overflow_elements)), m_buckets(m_buckets_data.empty()?static_empty_bucket_ptr(): m_buckets_data.data()), m_nb_elements(other.m_nb_elements), m_max_load_factor(other.m_max_load_factor), m_max_load_threshold_rehash(other.m_max_load_threshold_rehash), m_min_load_threshold_rehash(other.m_min_load_threshold_rehash) { other.GrowthPolicy::clear(); other.m_buckets_data.clear(); other.m_overflow_elements.clear(); other.m_buckets = static_empty_bucket_ptr(); other.m_nb_elements = 0; other.m_max_load_threshold_rehash = 0; other.m_min_load_threshold_rehash = 0; } hopscotch_hash& operator=(const hopscotch_hash& other) { if(&other != this) { Hash::operator=(other); KeyEqual::operator=(other); GrowthPolicy::operator=(other); m_buckets_data = other.m_buckets_data; m_overflow_elements = other.m_overflow_elements; m_buckets = 
m_buckets_data.empty()?static_empty_bucket_ptr(): m_buckets_data.data(); m_nb_elements = other.m_nb_elements; m_max_load_factor = other.m_max_load_factor; m_max_load_threshold_rehash = other.m_max_load_threshold_rehash; m_min_load_threshold_rehash = other.m_min_load_threshold_rehash; } return *this; } hopscotch_hash& operator=(hopscotch_hash&& other) { other.swap(*this); other.clear(); return *this; } allocator_type get_allocator() const { return m_buckets_data.get_allocator(); } /* * Iterators */ iterator begin() noexcept { auto begin = m_buckets_data.begin(); while(begin != m_buckets_data.end() && begin->empty()) { ++begin; } return iterator(begin, m_buckets_data.end(), m_overflow_elements.begin()); } const_iterator begin() const noexcept { return cbegin(); } const_iterator cbegin() const noexcept { auto begin = m_buckets_data.cbegin(); while(begin != m_buckets_data.cend() && begin->empty()) { ++begin; } return const_iterator(begin, m_buckets_data.cend(), m_overflow_elements.cbegin()); } iterator end() noexcept { return iterator(m_buckets_data.end(), m_buckets_data.end(), m_overflow_elements.end()); } const_iterator end() const noexcept { return cend(); } const_iterator cend() const noexcept { return const_iterator(m_buckets_data.cend(), m_buckets_data.cend(), m_overflow_elements.cend()); } /* * Capacity */ bool empty() const noexcept { return m_nb_elements == 0; } size_type size() const noexcept { return m_nb_elements; } size_type max_size() const noexcept { return m_buckets_data.max_size(); } /* * Modifiers */ void clear() noexcept { for(auto& bucket: m_buckets_data) { bucket.clear(); } m_overflow_elements.clear(); m_nb_elements = 0; } std::pair insert(const value_type& value) { return insert_impl(value); } template::value>::type* = nullptr> std::pair insert(P&& value) { return insert_impl(value_type(std::forward
<P>
(value))); } std::pair insert(value_type&& value) { return insert_impl(std::move(value)); } iterator insert(const_iterator hint, const value_type& value) { if(hint != cend() && compare_keys(KeySelect()(*hint), KeySelect()(value))) { return mutable_iterator(hint); } return insert(value).first; } template::value>::type* = nullptr> iterator insert(const_iterator hint, P&& value) { return emplace_hint(hint, std::forward
<P>
(value)); } iterator insert(const_iterator hint, value_type&& value) { if(hint != cend() && compare_keys(KeySelect()(*hint), KeySelect()(value))) { return mutable_iterator(hint); } return insert(std::move(value)).first; } template void insert(InputIt first, InputIt last) { if(std::is_base_of::iterator_category>::value) { const auto nb_elements_insert = std::distance(first, last); const std::size_t nb_elements_in_buckets = m_nb_elements - m_overflow_elements.size(); const std::size_t nb_free_buckets = m_max_load_threshold_rehash - nb_elements_in_buckets; tsl_hh_assert(m_nb_elements >= m_overflow_elements.size()); tsl_hh_assert(m_max_load_threshold_rehash >= nb_elements_in_buckets); if(nb_elements_insert > 0 && nb_free_buckets < std::size_t(nb_elements_insert)) { reserve(nb_elements_in_buckets + std::size_t(nb_elements_insert)); } } for(; first != last; ++first) { insert(*first); } } template std::pair insert_or_assign(const key_type& k, M&& obj) { return insert_or_assign_impl(k, std::forward(obj)); } template std::pair insert_or_assign(key_type&& k, M&& obj) { return insert_or_assign_impl(std::move(k), std::forward(obj)); } template iterator insert_or_assign(const_iterator hint, const key_type& k, M&& obj) { if(hint != cend() && compare_keys(KeySelect()(*hint), k)) { auto it = mutable_iterator(hint); it.value() = std::forward(obj); return it; } return insert_or_assign(k, std::forward(obj)).first; } template iterator insert_or_assign(const_iterator hint, key_type&& k, M&& obj) { if(hint != cend() && compare_keys(KeySelect()(*hint), k)) { auto it = mutable_iterator(hint); it.value() = std::forward(obj); return it; } return insert_or_assign(std::move(k), std::forward(obj)).first; } template std::pair emplace(Args&&... args) { return insert(value_type(std::forward(args)...)); } template iterator emplace_hint(const_iterator hint, Args&&... args) { return insert(hint, value_type(std::forward(args)...)); } template std::pair try_emplace(const key_type& k, Args&&... 
args) { return try_emplace_impl(k, std::forward(args)...); } template std::pair try_emplace(key_type&& k, Args&&... args) { return try_emplace_impl(std::move(k), std::forward(args)...); } template iterator try_emplace(const_iterator hint, const key_type& k, Args&&... args) { if(hint != cend() && compare_keys(KeySelect()(*hint), k)) { return mutable_iterator(hint); } return try_emplace(k, std::forward(args)...).first; } template iterator try_emplace(const_iterator hint, key_type&& k, Args&&... args) { if(hint != cend() && compare_keys(KeySelect()(*hint), k)) { return mutable_iterator(hint); } return try_emplace(std::move(k), std::forward(args)...).first; } /** * Here to avoid `template size_type erase(const K& key)` being used when * we use an iterator instead of a const_iterator. */ iterator erase(iterator pos) { return erase(const_iterator(pos)); } iterator erase(const_iterator pos) { const std::size_t ibucket_for_hash = bucket_for_hash(hash_key(pos.key())); if(pos.m_buckets_iterator != pos.m_buckets_end_iterator) { auto it_bucket = m_buckets_data.begin() + std::distance(m_buckets_data.cbegin(), pos.m_buckets_iterator); erase_from_bucket(*it_bucket, ibucket_for_hash); return ++iterator(it_bucket, m_buckets_data.end(), m_overflow_elements.begin()); } else { auto it_next_overflow = erase_from_overflow(pos.m_overflow_iterator, ibucket_for_hash); return iterator(m_buckets_data.end(), m_buckets_data.end(), it_next_overflow); } } iterator erase(const_iterator first, const_iterator last) { if(first == last) { return mutable_iterator(first); } auto to_delete = erase(first); while(to_delete != last) { to_delete = erase(to_delete); } return to_delete; } template size_type erase(const K& key) { return erase(key, hash_key(key)); } template size_type erase(const K& key, std::size_t hash) { const std::size_t ibucket_for_hash = bucket_for_hash(hash); hopscotch_bucket* bucket_found = find_in_buckets(key, hash, m_buckets + ibucket_for_hash); if(bucket_found != nullptr) { 
erase_from_bucket(*bucket_found, ibucket_for_hash); return 1; } if(m_buckets[ibucket_for_hash].has_overflow()) { auto it_overflow = find_in_overflow(key); if(it_overflow != m_overflow_elements.end()) { erase_from_overflow(it_overflow, ibucket_for_hash); return 1; } } return 0; } void swap(hopscotch_hash& other) { using std::swap; swap(static_cast(*this), static_cast(other)); swap(static_cast(*this), static_cast(other)); swap(static_cast(*this), static_cast(other)); swap(m_buckets_data, other.m_buckets_data); swap(m_overflow_elements, other.m_overflow_elements); swap(m_buckets, other.m_buckets); swap(m_nb_elements, other.m_nb_elements); swap(m_max_load_factor, other.m_max_load_factor); swap(m_max_load_threshold_rehash, other.m_max_load_threshold_rehash); swap(m_min_load_threshold_rehash, other.m_min_load_threshold_rehash); } /* * Lookup */ template::value>::type* = nullptr> typename U::value_type& at(const K& key) { return at(key, hash_key(key)); } template::value>::type* = nullptr> typename U::value_type& at(const K& key, std::size_t hash) { return const_cast(static_cast(this)->at(key, hash)); } template::value>::type* = nullptr> const typename U::value_type& at(const K& key) const { return at(key, hash_key(key)); } template::value>::type* = nullptr> const typename U::value_type& at(const K& key, std::size_t hash) const { using T = typename U::value_type; const T* value = find_value_impl(key, hash, m_buckets + bucket_for_hash(hash)); if(value == nullptr) { throw std::out_of_range("Couldn't find key."); } else { return *value; } } template::value>::type* = nullptr> typename U::value_type& operator[](K&& key) { using T = typename U::value_type; const std::size_t hash = hash_key(key); const std::size_t ibucket_for_hash = bucket_for_hash(hash); T* value = find_value_impl(key, hash, m_buckets + ibucket_for_hash); if(value != nullptr) { return *value; } else { return insert_value(ibucket_for_hash, hash, std::piecewise_construct, std::forward_as_tuple(std::forward(key)), 
std::forward_as_tuple()).first.value(); } } template size_type count(const K& key) const { return count(key, hash_key(key)); } template size_type count(const K& key, std::size_t hash) const { return count_impl(key, hash, m_buckets + bucket_for_hash(hash)); } template iterator find(const K& key) { return find(key, hash_key(key)); } template iterator find(const K& key, std::size_t hash) { return find_impl(key, hash, m_buckets + bucket_for_hash(hash)); } template const_iterator find(const K& key) const { return find(key, hash_key(key)); } template const_iterator find(const K& key, std::size_t hash) const { return find_impl(key, hash, m_buckets + bucket_for_hash(hash)); } template std::pair equal_range(const K& key) { return equal_range(key, hash_key(key)); } template std::pair equal_range(const K& key, std::size_t hash) { iterator it = find(key, hash); return std::make_pair(it, (it == end())?it:std::next(it)); } template std::pair equal_range(const K& key) const { return equal_range(key, hash_key(key)); } template std::pair equal_range(const K& key, std::size_t hash) const { const_iterator it = find(key, hash); return std::make_pair(it, (it == cend())?it:std::next(it)); } /* * Bucket interface */ size_type bucket_count() const { /* * So that the last bucket can have NeighborhoodSize neighbors, the size of the bucket array is a little * bigger than the real number of buckets when not empty. * We could use some of the buckets at the beginning, but it is faster this way as we avoid extra checks. 
*/ if(m_buckets_data.empty()) { return 0; } return m_buckets_data.size() - NeighborhoodSize + 1; } size_type max_bucket_count() const { const std::size_t max_bucket_count = std::min(GrowthPolicy::max_bucket_count(), m_buckets_data.max_size()); return max_bucket_count - NeighborhoodSize + 1; } /* * Hash policy */ float load_factor() const { if(bucket_count() == 0) { return 0; } return float(m_nb_elements)/float(bucket_count()); } float max_load_factor() const { return m_max_load_factor; } void max_load_factor(float ml) { m_max_load_factor = std::max(0.1f, std::min(ml, 0.95f)); m_max_load_threshold_rehash = size_type(float(bucket_count())*m_max_load_factor); m_min_load_threshold_rehash = size_type(float(bucket_count())*MIN_LOAD_FACTOR_FOR_REHASH); } void rehash(size_type count_) { count_ = std::max(count_, size_type(std::ceil(float(size())/max_load_factor()))); rehash_impl(count_); } void reserve(size_type count_) { rehash(size_type(std::ceil(float(count_)/max_load_factor()))); } /* * Observers */ hasher hash_function() const { return static_cast(*this); } key_equal key_eq() const { return static_cast(*this); } /* * Other */ iterator mutable_iterator(const_iterator pos) { if(pos.m_buckets_iterator != pos.m_buckets_end_iterator) { // Get a non-const iterator auto it = m_buckets_data.begin() + std::distance(m_buckets_data.cbegin(), pos.m_buckets_iterator); return iterator(it, m_buckets_data.end(), m_overflow_elements.begin()); } else { // Get a non-const iterator auto it = mutable_overflow_iterator(pos.m_overflow_iterator); return iterator(m_buckets_data.end(), m_buckets_data.end(), it); } } size_type overflow_size() const noexcept { return m_overflow_elements.size(); } template::value>::type* = nullptr> typename U::key_compare key_comp() const { return m_overflow_elements.key_comp(); } private: template std::size_t hash_key(const K& key) const { return Hash::operator()(key); } template bool compare_keys(const K1& key1, const K2& key2) const { return 
KeyEqual::operator()(key1, key2); } std::size_t bucket_for_hash(std::size_t hash) const { const std::size_t bucket = GrowthPolicy::bucket_for_hash(hash); tsl_hh_assert(bucket < m_buckets_data.size() || (bucket == 0 && m_buckets_data.empty())); return bucket; } template::value>::type* = nullptr> void rehash_impl(size_type count_) { hopscotch_hash new_map = new_hopscotch_hash(count_); if(!m_overflow_elements.empty()) { new_map.m_overflow_elements.swap(m_overflow_elements); new_map.m_nb_elements += new_map.m_overflow_elements.size(); for(const value_type& value : new_map.m_overflow_elements) { const std::size_t ibucket_for_hash = new_map.bucket_for_hash(new_map.hash_key(KeySelect()(value))); new_map.m_buckets[ibucket_for_hash].set_overflow(true); } } try { const bool use_stored_hash = USE_STORED_HASH_ON_REHASH(new_map.bucket_count()); for(auto it_bucket = m_buckets_data.begin(); it_bucket != m_buckets_data.end(); ++it_bucket) { if(it_bucket->empty()) { continue; } const std::size_t hash = use_stored_hash? it_bucket->truncated_bucket_hash(): new_map.hash_key(KeySelect()(it_bucket->value())); const std::size_t ibucket_for_hash = new_map.bucket_for_hash(hash); new_map.insert_value(ibucket_for_hash, hash, std::move(it_bucket->value())); erase_from_bucket(*it_bucket, bucket_for_hash(hash)); } } /* * The call to insert_value may throw an exception if an element is added to the overflow * list. Rollback the elements in this case. */ catch(...) { m_overflow_elements.swap(new_map.m_overflow_elements); const bool use_stored_hash = USE_STORED_HASH_ON_REHASH(new_map.bucket_count()); for(auto it_bucket = new_map.m_buckets_data.begin(); it_bucket != new_map.m_buckets_data.end(); ++it_bucket) { if(it_bucket->empty()) { continue; } const std::size_t hash = use_stored_hash? it_bucket->truncated_bucket_hash(): hash_key(KeySelect()(it_bucket->value())); const std::size_t ibucket_for_hash = bucket_for_hash(hash); // The elements we insert were not in the overflow list before the switch. 
                // They will not go in the overflow list if we roll back the switch.
                insert_value(ibucket_for_hash, hash, std::move(it_bucket->value()));
            }

            throw;
        }

        new_map.swap(*this);
    }

    template<class U = value_type,
             typename std::enable_if<std::is_copy_constructible<U>::value &&
                                     !std::is_nothrow_move_constructible<U>::value>::type* = nullptr>
    void rehash_impl(size_type count_) {
        hopscotch_hash new_map = new_hopscotch_hash(count_);

        const bool use_stored_hash = USE_STORED_HASH_ON_REHASH(new_map.bucket_count());
        for(const hopscotch_bucket& bucket: m_buckets_data) {
            if(bucket.empty()) {
                continue;
            }

            const std::size_t hash = use_stored_hash?bucket.truncated_bucket_hash():
                                                     new_map.hash_key(KeySelect()(bucket.value()));
            const std::size_t ibucket_for_hash = new_map.bucket_for_hash(hash);

            new_map.insert_value(ibucket_for_hash, hash, bucket.value());
        }

        for(const value_type& value: m_overflow_elements) {
            const std::size_t hash = new_map.hash_key(KeySelect()(value));
            const std::size_t ibucket_for_hash = new_map.bucket_for_hash(hash);

            new_map.insert_value(ibucket_for_hash, hash, value);
        }

        new_map.swap(*this);
    }

#ifdef TSL_HH_NO_RANGE_ERASE_WITH_CONST_ITERATOR
    iterator_overflow mutable_overflow_iterator(const_iterator_overflow it) {
        return std::next(m_overflow_elements.begin(), std::distance(m_overflow_elements.cbegin(), it));
    }
#else
    iterator_overflow mutable_overflow_iterator(const_iterator_overflow it) {
        return m_overflow_elements.erase(it, it);
    }
#endif

    // iterator is in overflow list
    iterator_overflow erase_from_overflow(const_iterator_overflow pos, std::size_t ibucket_for_hash) {
#ifdef TSL_HH_NO_RANGE_ERASE_WITH_CONST_ITERATOR
        auto it_next = m_overflow_elements.erase(mutable_overflow_iterator(pos));
#else
        auto it_next = m_overflow_elements.erase(pos);
#endif
        m_nb_elements--;

        // Check if we can remove the overflow flag
        tsl_hh_assert(m_buckets[ibucket_for_hash].has_overflow());
        for(const value_type& value: m_overflow_elements) {
            const std::size_t bucket_for_value = bucket_for_hash(hash_key(KeySelect()(value)));
            if(bucket_for_value == ibucket_for_hash) {
                return
it_next;
            }
        }

        m_buckets[ibucket_for_hash].set_overflow(false);

        return it_next;
    }

    /**
     * bucket_for_value is the bucket in which the value is.
     * ibucket_for_hash is the bucket where the value belongs.
     */
    void erase_from_bucket(hopscotch_bucket& bucket_for_value, std::size_t ibucket_for_hash) noexcept {
        const std::size_t ibucket_for_value = std::distance(m_buckets_data.data(), &bucket_for_value);
        tsl_hh_assert(ibucket_for_value >= ibucket_for_hash);

        bucket_for_value.remove_value();
        m_buckets[ibucket_for_hash].toggle_neighbor_presence(ibucket_for_value - ibucket_for_hash);
        m_nb_elements--;
    }

    template<class K, class M>
    std::pair<iterator, bool> insert_or_assign_impl(K&& key, M&& obj) {
        auto it = try_emplace_impl(std::forward<K>(key), std::forward<M>(obj));
        if(!it.second) {
            it.first.value() = std::forward<M>(obj);
        }

        return it;
    }

    template<typename P, class... Args>
    std::pair<iterator, bool> try_emplace_impl(P&& key, Args&&... args_value) {
        const std::size_t hash = hash_key(key);
        const std::size_t ibucket_for_hash = bucket_for_hash(hash);

        // Check if already present
        auto it_find = find_impl(key, hash, m_buckets + ibucket_for_hash);
        if(it_find != end()) {
            return std::make_pair(it_find, false);
        }

        return insert_value(ibucket_for_hash, hash, std::piecewise_construct,
                            std::forward_as_tuple(std::forward
<P>
(key)),
                            std::forward_as_tuple(std::forward<Args>(args_value)...));
    }

    template<class P>
    std::pair<iterator, bool> insert_impl(P&& value) {
        const std::size_t hash = hash_key(KeySelect()(value));
        const std::size_t ibucket_for_hash = bucket_for_hash(hash);

        // Check if already present
        auto it_find = find_impl(KeySelect()(value), hash, m_buckets + ibucket_for_hash);
        if(it_find != end()) {
            return std::make_pair(it_find, false);
        }

        return insert_value(ibucket_for_hash, hash, std::forward
<P>
(value));
    }

    template<typename... Args>
    std::pair<iterator, bool> insert_value(std::size_t ibucket_for_hash, std::size_t hash, Args&&... value_type_args) {
        if((m_nb_elements - m_overflow_elements.size()) >= m_max_load_threshold_rehash) {
            rehash(GrowthPolicy::next_bucket_count());
            ibucket_for_hash = bucket_for_hash(hash);
        }

        std::size_t ibucket_empty = find_empty_bucket(ibucket_for_hash);
        if(ibucket_empty < m_buckets_data.size()) {
            do {
                tsl_hh_assert(ibucket_empty >= ibucket_for_hash);

                // Empty bucket is in range of NeighborhoodSize, use it
                if(ibucket_empty - ibucket_for_hash < NeighborhoodSize) {
                    auto it = insert_in_bucket(ibucket_empty, ibucket_for_hash,
                                               hash, std::forward<Args>(value_type_args)...);
                    return std::make_pair(iterator(it, m_buckets_data.end(), m_overflow_elements.begin()), true);
                }
            }
            // else, try to swap values to get a closer empty bucket
            while(swap_empty_bucket_closer(ibucket_empty));
        }

        // Load factor is too low or a rehash will not change the neighborhood, put the value in overflow list
        if(size() < m_min_load_threshold_rehash || !will_neighborhood_change_on_rehash(ibucket_for_hash)) {
            auto it = insert_in_overflow(ibucket_for_hash, std::forward<Args>(value_type_args)...);
            return std::make_pair(iterator(m_buckets_data.end(), m_buckets_data.end(), it), true);
        }

        rehash(GrowthPolicy::next_bucket_count());
        ibucket_for_hash = bucket_for_hash(hash);

        return insert_value(ibucket_for_hash, hash, std::forward<Args>(value_type_args)...);
    }

    /*
     * Return true if a rehash will change the position of a key-value in the neighborhood of
     * ibucket_neighborhood_check. In this case a rehash is needed instead of putting the value in overflow list.
*/ bool will_neighborhood_change_on_rehash(size_t ibucket_neighborhood_check) const { std::size_t expand_bucket_count = GrowthPolicy::next_bucket_count(); GrowthPolicy expand_growth_policy(expand_bucket_count); const bool use_stored_hash = USE_STORED_HASH_ON_REHASH(expand_bucket_count); for(size_t ibucket = ibucket_neighborhood_check; ibucket < m_buckets_data.size() && (ibucket - ibucket_neighborhood_check) < NeighborhoodSize; ++ibucket) { tsl_hh_assert(!m_buckets[ibucket].empty()); const size_t hash = use_stored_hash? m_buckets[ibucket].truncated_bucket_hash(): hash_key(KeySelect()(m_buckets[ibucket].value())); if(bucket_for_hash(hash) != expand_growth_policy.bucket_for_hash(hash)) { return true; } } return false; } /* * Return the index of an empty bucket in m_buckets_data. * If none, the returned index equals m_buckets_data.size() */ std::size_t find_empty_bucket(std::size_t ibucket_start) const { const std::size_t limit = std::min(ibucket_start + MAX_PROBES_FOR_EMPTY_BUCKET, m_buckets_data.size()); for(; ibucket_start < limit; ibucket_start++) { if(m_buckets[ibucket_start].empty()) { return ibucket_start; } } return m_buckets_data.size(); } /* * Insert value in ibucket_empty where value originally belongs to ibucket_for_hash * * Return bucket iterator to ibucket_empty */ template iterator_buckets insert_in_bucket(std::size_t ibucket_empty, std::size_t ibucket_for_hash, std::size_t hash, Args&&... value_type_args) { tsl_hh_assert(ibucket_empty >= ibucket_for_hash ); tsl_hh_assert(m_buckets[ibucket_empty].empty()); m_buckets[ibucket_empty].set_value_of_empty_bucket(hopscotch_bucket::truncate_hash(hash), std::forward(value_type_args)...); tsl_hh_assert(!m_buckets[ibucket_for_hash].empty()); m_buckets[ibucket_for_hash].toggle_neighbor_presence(ibucket_empty - ibucket_for_hash); m_nb_elements++; return m_buckets_data.begin() + ibucket_empty; } template::value>::type* = nullptr> iterator_overflow insert_in_overflow(std::size_t ibucket_for_hash, Args&&... 
value_type_args) {
        auto it = m_overflow_elements.emplace(m_overflow_elements.end(), std::forward<Args>(value_type_args)...);
        m_buckets[ibucket_for_hash].set_overflow(true);
        m_nb_elements++;

        return it;
    }

    template<class... Args, class U = OverflowContainer, typename std::enable_if<has_key_compare<U>::value>::type* = nullptr>
    iterator_overflow insert_in_overflow(std::size_t ibucket_for_hash, Args&&... value_type_args) {
        auto it = m_overflow_elements.emplace(std::forward<Args>(value_type_args)...).first;
        m_buckets[ibucket_for_hash].set_overflow(true);
        m_nb_elements++;

        return it;
    }

    /*
     * Try to swap the bucket ibucket_empty_in_out with a bucket preceding it while keeping the neighborhood
     * conditions correct.
     *
     * If a swap was possible, the position of ibucket_empty_in_out will be closer to 0 and true will be returned.
     */
    bool swap_empty_bucket_closer(std::size_t& ibucket_empty_in_out) {
        tsl_hh_assert(ibucket_empty_in_out >= NeighborhoodSize);
        const std::size_t neighborhood_start = ibucket_empty_in_out - NeighborhoodSize + 1;

        for(std::size_t to_check = neighborhood_start; to_check < ibucket_empty_in_out; to_check++) {
            neighborhood_bitmap neighborhood_infos = m_buckets[to_check].neighborhood_infos();
            std::size_t to_swap = to_check;

            while(neighborhood_infos != 0 && to_swap < ibucket_empty_in_out) {
                if((neighborhood_infos & 1) == 1) {
                    tsl_hh_assert(m_buckets[ibucket_empty_in_out].empty());
                    tsl_hh_assert(!m_buckets[to_swap].empty());

                    m_buckets[to_swap].swap_value_into_empty_bucket(m_buckets[ibucket_empty_in_out]);

                    tsl_hh_assert(!m_buckets[to_check].check_neighbor_presence(ibucket_empty_in_out - to_check));
                    tsl_hh_assert(m_buckets[to_check].check_neighbor_presence(to_swap - to_check));

                    m_buckets[to_check].toggle_neighbor_presence(ibucket_empty_in_out - to_check);
                    m_buckets[to_check].toggle_neighbor_presence(to_swap - to_check);

                    ibucket_empty_in_out = to_swap;

                    return true;
                }

                to_swap++;
                neighborhood_infos = neighborhood_bitmap(neighborhood_infos >> 1);
            }
        }

        return false;
    }

    template<class K, class U = ValueSelect, typename std::enable_if<has_mapped_type<U>::value>::type* = nullptr>
    typename U::value_type* find_value_impl(const K& key, std::size_t
hash, hopscotch_bucket* bucket_for_hash) { return const_cast( static_cast(this)->find_value_impl(key, hash, bucket_for_hash)); } /* * Avoid the creation of an iterator to just get the value for operator[] and at() in maps. Faster this way. * * Return null if no value for the key (TODO use std::optional when available). */ template::value>::type* = nullptr> const typename U::value_type* find_value_impl(const K& key, std::size_t hash, const hopscotch_bucket* bucket_for_hash) const { const hopscotch_bucket* bucket_found = find_in_buckets(key, hash, bucket_for_hash); if(bucket_found != nullptr) { return std::addressof(ValueSelect()(bucket_found->value())); } if(bucket_for_hash->has_overflow()) { auto it_overflow = find_in_overflow(key); if(it_overflow != m_overflow_elements.end()) { return std::addressof(ValueSelect()(*it_overflow)); } } return nullptr; } template size_type count_impl(const K& key, std::size_t hash, const hopscotch_bucket* bucket_for_hash) const { if(find_in_buckets(key, hash, bucket_for_hash) != nullptr) { return 1; } else if(bucket_for_hash->has_overflow() && find_in_overflow(key) != m_overflow_elements.cend()) { return 1; } else { return 0; } } template iterator find_impl(const K& key, std::size_t hash, hopscotch_bucket* bucket_for_hash) { hopscotch_bucket* bucket_found = find_in_buckets(key, hash, bucket_for_hash); if(bucket_found != nullptr) { return iterator(m_buckets_data.begin() + std::distance(m_buckets_data.data(), bucket_found), m_buckets_data.end(), m_overflow_elements.begin()); } if(!bucket_for_hash->has_overflow()) { return end(); } return iterator(m_buckets_data.end(), m_buckets_data.end(), find_in_overflow(key)); } template const_iterator find_impl(const K& key, std::size_t hash, const hopscotch_bucket* bucket_for_hash) const { const hopscotch_bucket* bucket_found = find_in_buckets(key, hash, bucket_for_hash); if(bucket_found != nullptr) { return const_iterator(m_buckets_data.cbegin() + std::distance(m_buckets_data.data(), 
bucket_found), m_buckets_data.cend(), m_overflow_elements.cbegin()); } if(!bucket_for_hash->has_overflow()) { return cend(); } return const_iterator(m_buckets_data.cend(), m_buckets_data.cend(), find_in_overflow(key)); } template hopscotch_bucket* find_in_buckets(const K& key, std::size_t hash, hopscotch_bucket* bucket_for_hash) { const hopscotch_bucket* bucket_found = static_cast(this)->find_in_buckets(key, hash, bucket_for_hash); return const_cast(bucket_found); } /** * Return a pointer to the bucket which has the value, nullptr otherwise. */ template const hopscotch_bucket* find_in_buckets(const K& key, std::size_t hash, const hopscotch_bucket* bucket_for_hash) const { (void) hash; // Avoid warning of unused variable when StoreHash is false; // TODO Try to optimize the function. // I tried to use ffs and __builtin_ffs functions but I could not reduce the time the function // takes with -march=native neighborhood_bitmap neighborhood_infos = bucket_for_hash->neighborhood_infos(); while(neighborhood_infos != 0) { if((neighborhood_infos & 1) == 1) { // Check StoreHash before calling bucket_hash_equal. Functionally it doesn't change anything. // If StoreHash is false, bucket_hash_equal is a no-op. Avoiding the call helps // GCC optimize the `hash` parameter away; it seems unable to do so without this hint. 
if((!StoreHash || bucket_for_hash->bucket_hash_equal(hash)) && compare_keys(KeySelect()(bucket_for_hash->value()), key)) { return bucket_for_hash; } } ++bucket_for_hash; neighborhood_infos = neighborhood_bitmap(neighborhood_infos >> 1); } return nullptr; } template::value>::type* = nullptr> iterator_overflow find_in_overflow(const K& key) { return std::find_if(m_overflow_elements.begin(), m_overflow_elements.end(), [&](const value_type& value) { return compare_keys(key, KeySelect()(value)); }); } template::value>::type* = nullptr> const_iterator_overflow find_in_overflow(const K& key) const { return std::find_if(m_overflow_elements.cbegin(), m_overflow_elements.cend(), [&](const value_type& value) { return compare_keys(key, KeySelect()(value)); }); } template::value>::type* = nullptr> iterator_overflow find_in_overflow(const K& key) { return m_overflow_elements.find(key); } template::value>::type* = nullptr> const_iterator_overflow find_in_overflow(const K& key) const { return m_overflow_elements.find(key); } template::value>::type* = nullptr> hopscotch_hash new_hopscotch_hash(size_type bucket_count) { return hopscotch_hash(bucket_count, static_cast(*this), static_cast(*this), get_allocator(), m_max_load_factor); } template::value>::type* = nullptr> hopscotch_hash new_hopscotch_hash(size_type bucket_count) { return hopscotch_hash(bucket_count, static_cast(*this), static_cast(*this), get_allocator(), m_max_load_factor, m_overflow_elements.key_comp()); } public: static const size_type DEFAULT_INIT_BUCKETS_SIZE = 0; static constexpr float DEFAULT_MAX_LOAD_FACTOR = (NeighborhoodSize <= 30)?0.8f:0.9f; private: static const std::size_t MAX_PROBES_FOR_EMPTY_BUCKET = 12*NeighborhoodSize; static constexpr float MIN_LOAD_FACTOR_FOR_REHASH = 0.1f; static bool USE_STORED_HASH_ON_REHASH(size_type bucket_count) { (void) bucket_count; if(StoreHash && is_power_of_two_policy::value) { tsl_hh_assert(bucket_count > 0); return (bucket_count - 1) <= std::numeric_limits::max(); } else { 
return false; } } /** * Return an always valid pointer to a static empty hopscotch_bucket. */ hopscotch_bucket* static_empty_bucket_ptr() { static hopscotch_bucket empty_bucket; return &empty_bucket; } private: buckets_container_type m_buckets_data; overflow_container_type m_overflow_elements; /** * Points to m_buckets_data.data() if !m_buckets_data.empty() otherwise points to static_empty_bucket_ptr. * This variable is useful to avoid the cost of checking if m_buckets_data is empty when trying * to find an element. * * TODO Remove m_buckets_data and only use a pointer+size instead of a pointer+vector to save some space in the hopscotch_hash object. */ hopscotch_bucket* m_buckets; size_type m_nb_elements; float m_max_load_factor; /** * Max size of the hash table before a rehash occurs automatically to grow the table. */ size_type m_max_load_threshold_rehash; /** * Min size of the hash table before a rehash can occur automatically (except if m_max_load_threshold_rehash is reached). * If the neighborhood of a bucket is full before the min is reached, the elements are put into m_overflow_elements. */ size_type m_min_load_threshold_rehash; }; } // end namespace detail_hopscotch_hash } // end namespace tsl #endif fastmap/src/lib/tsl/hopscotch_map.h0000644000176200001440000006166213546743337017076 0ustar liggesusers/** * MIT License * * Copyright (c) 2017 Tessil * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell * copies of the Software, and to permit persons to whom the Software is * furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #ifndef TSL_HOPSCOTCH_MAP_H #define TSL_HOPSCOTCH_MAP_H #include #include #include #include #include #include #include #include #include "hopscotch_hash.h" namespace tsl { /** * Implementation of a hash map using the hopscotch hashing algorithm. * * The Key and the value T must be either nothrow move-constructible, copy-constructible or both. * * The size of the neighborhood (NeighborhoodSize) must be > 0 and <= 62 if StoreHash is false. * When StoreHash is true, 32 bits of the hash will be stored alongside the neighborhood limiting * the NeighborhoodSize to <= 30. There is no memory usage difference between * 'NeighborhoodSize 62; StoreHash false' and 'NeighborhoodSize 30; StoreHash true'. * * Storing the hash may improve performance on insert during the rehash process if the hash takes time * to compute. It may also improve read performance if the KeyEqual function takes time (or incurs a cache-miss). * If used with simple Hash and KeyEqual it may slow things down. * * StoreHash can only be set if the GrowthPolicy is set to tsl::power_of_two_growth_policy. * * GrowthPolicy defines how the map grows and consequently how a hash value is mapped to a bucket. * By default the map uses tsl::power_of_two_growth_policy. This policy keeps the number of buckets * to a power of two and uses a mask to map the hash to a bucket instead of the slow modulo. * You may define your own growth policy, check tsl::power_of_two_growth_policy for the interface. 
* * If the destructors of Key or T throw an exception, behaviour of the class is undefined. * * Iterator invalidation: * - clear, operator=, reserve, rehash: always invalidate the iterators. * - insert, emplace, emplace_hint, operator[]: if there is an effective insert, invalidate the iterators * if a displacement is needed to resolve a collision (which means that most of the time, * insert will invalidate the iterators). Or if there is a rehash. * - erase: the iterator on the erased element is the only one which becomes invalid. */ template, class KeyEqual = std::equal_to, class Allocator = std::allocator>, unsigned int NeighborhoodSize = 62, bool StoreHash = false, class GrowthPolicy = tsl::hh::power_of_two_growth_policy<2>> class hopscotch_map { private: template using has_is_transparent = tsl::detail_hopscotch_hash::has_is_transparent; class KeySelect { public: using key_type = Key; const key_type& operator()(const std::pair& key_value) const { return key_value.first; } key_type& operator()(std::pair& key_value) { return key_value.first; } }; class ValueSelect { public: using value_type = T; const value_type& operator()(const std::pair& key_value) const { return key_value.second; } value_type& operator()(std::pair& key_value) { return key_value.second; } }; using overflow_container_type = std::list, Allocator>; using ht = detail_hopscotch_hash::hopscotch_hash, KeySelect, ValueSelect, Hash, KeyEqual, Allocator, NeighborhoodSize, StoreHash, GrowthPolicy, overflow_container_type>; public: using key_type = typename ht::key_type; using mapped_type = T; using value_type = typename ht::value_type; using size_type = typename ht::size_type; using difference_type = typename ht::difference_type; using hasher = typename ht::hasher; using key_equal = typename ht::key_equal; using allocator_type = typename ht::allocator_type; using reference = typename ht::reference; using const_reference = typename ht::const_reference; using pointer = typename ht::pointer; using const_pointer = 
typename ht::const_pointer; using iterator = typename ht::iterator; using const_iterator = typename ht::const_iterator; /* * Constructors */ hopscotch_map() : hopscotch_map(ht::DEFAULT_INIT_BUCKETS_SIZE) { } explicit hopscotch_map(size_type bucket_count, const Hash& hash = Hash(), const KeyEqual& equal = KeyEqual(), const Allocator& alloc = Allocator()) : m_ht(bucket_count, hash, equal, alloc, ht::DEFAULT_MAX_LOAD_FACTOR) { } hopscotch_map(size_type bucket_count, const Allocator& alloc) : hopscotch_map(bucket_count, Hash(), KeyEqual(), alloc) { } hopscotch_map(size_type bucket_count, const Hash& hash, const Allocator& alloc) : hopscotch_map(bucket_count, hash, KeyEqual(), alloc) { } explicit hopscotch_map(const Allocator& alloc) : hopscotch_map(ht::DEFAULT_INIT_BUCKETS_SIZE, alloc) { } template hopscotch_map(InputIt first, InputIt last, size_type bucket_count = ht::DEFAULT_INIT_BUCKETS_SIZE, const Hash& hash = Hash(), const KeyEqual& equal = KeyEqual(), const Allocator& alloc = Allocator()) : hopscotch_map(bucket_count, hash, equal, alloc) { insert(first, last); } template hopscotch_map(InputIt first, InputIt last, size_type bucket_count, const Allocator& alloc) : hopscotch_map(first, last, bucket_count, Hash(), KeyEqual(), alloc) { } template hopscotch_map(InputIt first, InputIt last, size_type bucket_count, const Hash& hash, const Allocator& alloc) : hopscotch_map(first, last, bucket_count, hash, KeyEqual(), alloc) { } hopscotch_map(std::initializer_list init, size_type bucket_count = ht::DEFAULT_INIT_BUCKETS_SIZE, const Hash& hash = Hash(), const KeyEqual& equal = KeyEqual(), const Allocator& alloc = Allocator()) : hopscotch_map(init.begin(), init.end(), bucket_count, hash, equal, alloc) { } hopscotch_map(std::initializer_list init, size_type bucket_count, const Allocator& alloc) : hopscotch_map(init.begin(), init.end(), bucket_count, Hash(), KeyEqual(), alloc) { } hopscotch_map(std::initializer_list init, size_type bucket_count, const Hash& hash, const 
Allocator& alloc) : hopscotch_map(init.begin(), init.end(), bucket_count, hash, KeyEqual(), alloc) { } hopscotch_map& operator=(std::initializer_list ilist) { m_ht.clear(); m_ht.reserve(ilist.size()); m_ht.insert(ilist.begin(), ilist.end()); return *this; } allocator_type get_allocator() const { return m_ht.get_allocator(); } /* * Iterators */ iterator begin() noexcept { return m_ht.begin(); } const_iterator begin() const noexcept { return m_ht.begin(); } const_iterator cbegin() const noexcept { return m_ht.cbegin(); } iterator end() noexcept { return m_ht.end(); } const_iterator end() const noexcept { return m_ht.end(); } const_iterator cend() const noexcept { return m_ht.cend(); } /* * Capacity */ bool empty() const noexcept { return m_ht.empty(); } size_type size() const noexcept { return m_ht.size(); } size_type max_size() const noexcept { return m_ht.max_size(); } /* * Modifiers */ void clear() noexcept { m_ht.clear(); } std::pair insert(const value_type& value) { return m_ht.insert(value); } template::value>::type* = nullptr> std::pair insert(P&& value) { return m_ht.insert(std::forward
<P>
(value)); } std::pair insert(value_type&& value) { return m_ht.insert(std::move(value)); } iterator insert(const_iterator hint, const value_type& value) { return m_ht.insert(hint, value); } template::value>::type* = nullptr> iterator insert(const_iterator hint, P&& value) { return m_ht.insert(hint, std::forward
<P>
(value)); } iterator insert(const_iterator hint, value_type&& value) { return m_ht.insert(hint, std::move(value)); } template void insert(InputIt first, InputIt last) { m_ht.insert(first, last); } void insert(std::initializer_list ilist) { m_ht.insert(ilist.begin(), ilist.end()); } template std::pair insert_or_assign(const key_type& k, M&& obj) { return m_ht.insert_or_assign(k, std::forward(obj)); } template std::pair insert_or_assign(key_type&& k, M&& obj) { return m_ht.insert_or_assign(std::move(k), std::forward(obj)); } template iterator insert_or_assign(const_iterator hint, const key_type& k, M&& obj) { return m_ht.insert_or_assign(hint, k, std::forward(obj)); } template iterator insert_or_assign(const_iterator hint, key_type&& k, M&& obj) { return m_ht.insert_or_assign(hint, std::move(k), std::forward(obj)); } /** * Due to the way elements are stored, emplace will need to move or copy the key-value once. * The method is equivalent to insert(value_type(std::forward(args)...)); * * Mainly here for compatibility with the std::unordered_map interface. */ template std::pair emplace(Args&&... args) { return m_ht.emplace(std::forward(args)...); } /** * Due to the way elements are stored, emplace_hint will need to move or copy the key-value once. * The method is equivalent to insert(hint, value_type(std::forward(args)...)); * * Mainly here for compatibility with the std::unordered_map interface. */ template iterator emplace_hint(const_iterator hint, Args&&... args) { return m_ht.emplace_hint(hint, std::forward(args)...); } template std::pair try_emplace(const key_type& k, Args&&... args) { return m_ht.try_emplace(k, std::forward(args)...); } template std::pair try_emplace(key_type&& k, Args&&... args) { return m_ht.try_emplace(std::move(k), std::forward(args)...); } template iterator try_emplace(const_iterator hint, const key_type& k, Args&&... 
args) { return m_ht.try_emplace(hint, k, std::forward(args)...); } template iterator try_emplace(const_iterator hint, key_type&& k, Args&&... args) { return m_ht.try_emplace(hint, std::move(k), std::forward(args)...); } iterator erase(iterator pos) { return m_ht.erase(pos); } iterator erase(const_iterator pos) { return m_ht.erase(pos); } iterator erase(const_iterator first, const_iterator last) { return m_ht.erase(first, last); } size_type erase(const key_type& key) { return m_ht.erase(key); } /** * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup to the value if you already have the hash. */ size_type erase(const key_type& key, std::size_t precalculated_hash) { return m_ht.erase(key, precalculated_hash); } /** * This overload only participates in the overload resolution if the typedef KeyEqual::is_transparent exists. * If so, K must be hashable and comparable to Key. */ template::value>::type* = nullptr> size_type erase(const K& key) { return m_ht.erase(key); } /** * @copydoc erase(const K& key) * * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup to the value if you already have the hash. */ template::value>::type* = nullptr> size_type erase(const K& key, std::size_t precalculated_hash) { return m_ht.erase(key, precalculated_hash); } void swap(hopscotch_map& other) { other.m_ht.swap(m_ht); } /* * Lookup */ T& at(const Key& key) { return m_ht.at(key); } /** * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. 
*/ T& at(const Key& key, std::size_t precalculated_hash) { return m_ht.at(key, precalculated_hash); } const T& at(const Key& key) const { return m_ht.at(key); } /** * @copydoc at(const Key& key, std::size_t precalculated_hash) */ const T& at(const Key& key, std::size_t precalculated_hash) const { return m_ht.at(key, precalculated_hash); } /** * This overload only participates in the overload resolution if the typedef KeyEqual::is_transparent exists. * If so, K must be hashable and comparable to Key. */ template::value>::type* = nullptr> T& at(const K& key) { return m_ht.at(key); } /** * @copydoc at(const K& key) * * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. */ template::value>::type* = nullptr> T& at(const K& key, std::size_t precalculated_hash) { return m_ht.at(key, precalculated_hash); } /** * @copydoc at(const K& key) */ template::value>::type* = nullptr> const T& at(const K& key) const { return m_ht.at(key); } /** * @copydoc at(const K& key, std::size_t precalculated_hash) */ template::value>::type* = nullptr> const T& at(const K& key, std::size_t precalculated_hash) const { return m_ht.at(key, precalculated_hash); } T& operator[](const Key& key) { return m_ht[key]; } T& operator[](Key&& key) { return m_ht[std::move(key)]; } size_type count(const Key& key) const { return m_ht.count(key); } /** * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. */ size_type count(const Key& key, std::size_t precalculated_hash) const { return m_ht.count(key, precalculated_hash); } /** * This overload only participates in the overload resolution if the typedef KeyEqual::is_transparent exists. * If so, K must be hashable and comparable to Key. 
*/ template::value>::type* = nullptr> size_type count(const K& key) const { return m_ht.count(key); } /** * @copydoc count(const K& key) const * * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. */ template::value>::type* = nullptr> size_type count(const K& key, std::size_t precalculated_hash) const { return m_ht.count(key, precalculated_hash); } iterator find(const Key& key) { return m_ht.find(key); } /** * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. */ iterator find(const Key& key, std::size_t precalculated_hash) { return m_ht.find(key, precalculated_hash); } const_iterator find(const Key& key) const { return m_ht.find(key); } /** * @copydoc find(const Key& key, std::size_t precalculated_hash) */ const_iterator find(const Key& key, std::size_t precalculated_hash) const { return m_ht.find(key, precalculated_hash); } /** * This overload only participates in the overload resolution if the typedef KeyEqual::is_transparent exists. * If so, K must be hashable and comparable to Key. */ template::value>::type* = nullptr> iterator find(const K& key) { return m_ht.find(key); } /** * @copydoc find(const K& key) * * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. */ template::value>::type* = nullptr> iterator find(const K& key, std::size_t precalculated_hash) { return m_ht.find(key, precalculated_hash); } /** * @copydoc find(const K& key) */ template::value>::type* = nullptr> const_iterator find(const K& key) const { return m_ht.find(key); } /** * @copydoc find(const K& key) * * Use the hash value 'precalculated_hash' instead of hashing the key. 
The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. */ template::value>::type* = nullptr> const_iterator find(const K& key, std::size_t precalculated_hash) const { return m_ht.find(key, precalculated_hash); } std::pair equal_range(const Key& key) { return m_ht.equal_range(key); } /** * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. */ std::pair equal_range(const Key& key, std::size_t precalculated_hash) { return m_ht.equal_range(key, precalculated_hash); } std::pair equal_range(const Key& key) const { return m_ht.equal_range(key); } /** * @copydoc equal_range(const Key& key, std::size_t precalculated_hash) */ std::pair equal_range(const Key& key, std::size_t precalculated_hash) const { return m_ht.equal_range(key, precalculated_hash); } /** * This overload only participates in the overload resolution if the typedef KeyEqual::is_transparent exists. * If so, K must be hashable and comparable to Key. */ template::value>::type* = nullptr> std::pair equal_range(const K& key) { return m_ht.equal_range(key); } /** * @copydoc equal_range(const K& key) * * Use the hash value 'precalculated_hash' instead of hashing the key. The hash value should be the same * as hash_function()(key). Useful to speed up the lookup if you already have the hash. 
*/ template::value>::type* = nullptr> std::pair equal_range(const K& key, std::size_t precalculated_hash) { return m_ht.equal_range(key, precalculated_hash); } /** * @copydoc equal_range(const K& key) */ template::value>::type* = nullptr> std::pair equal_range(const K& key) const { return m_ht.equal_range(key); } /** * @copydoc equal_range(const K& key, std::size_t precalculated_hash) */ template::value>::type* = nullptr> std::pair equal_range(const K& key, std::size_t precalculated_hash) const { return m_ht.equal_range(key, precalculated_hash); } /* * Bucket interface */ size_type bucket_count() const { return m_ht.bucket_count(); } size_type max_bucket_count() const { return m_ht.max_bucket_count(); } /* * Hash policy */ float load_factor() const { return m_ht.load_factor(); } float max_load_factor() const { return m_ht.max_load_factor(); } void max_load_factor(float ml) { m_ht.max_load_factor(ml); } void rehash(size_type count_) { m_ht.rehash(count_); } void reserve(size_type count_) { m_ht.reserve(count_); } /* * Observers */ hasher hash_function() const { return m_ht.hash_function(); } key_equal key_eq() const { return m_ht.key_eq(); } /* * Other */ /** * Convert a const_iterator to an iterator. */ iterator mutable_iterator(const_iterator pos) { return m_ht.mutable_iterator(pos); } size_type overflow_size() const noexcept { return m_ht.overflow_size(); } friend bool operator==(const hopscotch_map& lhs, const hopscotch_map& rhs) { if(lhs.size() != rhs.size()) { return false; } for(const auto& element_lhs : lhs) { const auto it_element_rhs = rhs.find(element_lhs.first); if(it_element_rhs == rhs.cend() || element_lhs.second != it_element_rhs->second) { return false; } } return true; } friend bool operator!=(const hopscotch_map& lhs, const hopscotch_map& rhs) { return !operator==(lhs, rhs); } friend void swap(hopscotch_map& lhs, hopscotch_map& rhs) { lhs.swap(rhs); } private: ht m_ht; }; /** * Same as `tsl::hopscotch_map`. 
*/ template, class KeyEqual = std::equal_to, class Allocator = std::allocator>, unsigned int NeighborhoodSize = 62, bool StoreHash = false> using hopscotch_pg_map = hopscotch_map; } // end namespace tsl #endif fastmap/src/lib/tsl/hopscotch_growth_policy.h0000644000176200001440000002351213546743337021202 0ustar liggesusers/** * MIT License * * Copyright (c) 2018 Tessil * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell * copies of the Software, and to permit persons to whom the Software is * furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #ifndef TSL_HOPSCOTCH_GROWTH_POLICY_H #define TSL_HOPSCOTCH_GROWTH_POLICY_H #include #include #include #include #include #include #include #include #include namespace tsl { namespace hh { /** * Grow the hash table by a factor of GrowthFactor keeping the bucket count to a power of two. It allows * the table to use a mask operation instead of a modulo operation to map a hash to a bucket. * * GrowthFactor must be a power of two >= 2. */ template class power_of_two_growth_policy { public: /** * Called on the hash table creation and on rehash. 
The number of buckets for the table is passed in parameter. * This number is a minimum, the policy may update this value with a higher value if needed (but not lower). * * If 0 is given, min_bucket_count_in_out must still be 0 after the policy creation and * bucket_for_hash must always return 0 in this case. */ explicit power_of_two_growth_policy(std::size_t& min_bucket_count_in_out) { if(min_bucket_count_in_out > max_bucket_count()) { throw std::length_error("The hash table exceeds its maximum size."); } if(min_bucket_count_in_out > 0) { min_bucket_count_in_out = round_up_to_power_of_two(min_bucket_count_in_out); m_mask = min_bucket_count_in_out - 1; } else { m_mask = 0; } } /** * Return the bucket [0, bucket_count()) to which the hash belongs. * If bucket_count() is 0, it must always return 0. */ std::size_t bucket_for_hash(std::size_t hash) const noexcept { return hash & m_mask; } /** * Return the bucket count to use when the bucket array grows on rehash. */ std::size_t next_bucket_count() const { if((m_mask + 1) > max_bucket_count() / GrowthFactor) { throw std::length_error("The hash table exceeds its maximum size."); } return (m_mask + 1) * GrowthFactor; } /** * Return the maximum number of buckets supported by the policy. */ std::size_t max_bucket_count() const { // Largest power of two. return (std::numeric_limits::max() / 2) + 1; } /** * Reset the growth policy as if it was created with a bucket count of 0. * After a clear, the policy must always return 0 when bucket_for_hash is called. 
*/ void clear() noexcept { m_mask = 0; } private: static std::size_t round_up_to_power_of_two(std::size_t value) { if(is_power_of_two(value)) { return value; } if(value == 0) { return 1; } --value; for(std::size_t i = 1; i < sizeof(std::size_t) * CHAR_BIT; i *= 2) { value |= value >> i; } return value + 1; } static constexpr bool is_power_of_two(std::size_t value) { return value != 0 && (value & (value - 1)) == 0; } private: static_assert(is_power_of_two(GrowthFactor) && GrowthFactor >= 2, "GrowthFactor must be a power of two >= 2."); std::size_t m_mask; }; /** * Grow the hash table by GrowthFactor::num / GrowthFactor::den and use a modulo to map a hash * to a bucket. Slower but it can be useful if you want a slower growth. */ template> class mod_growth_policy { public: explicit mod_growth_policy(std::size_t& min_bucket_count_in_out) { if(min_bucket_count_in_out > max_bucket_count()) { throw std::length_error("The hash table exceeds its maximum size."); } if(min_bucket_count_in_out > 0) { m_mod = min_bucket_count_in_out; } else { m_mod = 1; } } std::size_t bucket_for_hash(std::size_t hash) const noexcept { return hash % m_mod; } std::size_t next_bucket_count() const { if(m_mod == max_bucket_count()) { throw std::length_error("The hash table exceeds its maximum size."); } const double next_bucket_count = std::ceil(double(m_mod) * REHASH_SIZE_MULTIPLICATION_FACTOR); if(!std::isnormal(next_bucket_count)) { throw std::length_error("The hash table exceeds its maximum size."); } if(next_bucket_count > double(max_bucket_count())) { return max_bucket_count(); } else { return std::size_t(next_bucket_count); } } std::size_t max_bucket_count() const { return MAX_BUCKET_COUNT; } void clear() noexcept { m_mod = 1; } private: static constexpr double REHASH_SIZE_MULTIPLICATION_FACTOR = 1.0 * GrowthFactor::num / GrowthFactor::den; static const std::size_t MAX_BUCKET_COUNT = std::size_t(double( std::numeric_limits::max() / REHASH_SIZE_MULTIPLICATION_FACTOR )); 
static_assert(REHASH_SIZE_MULTIPLICATION_FACTOR >= 1.1, "Growth factor should be >= 1.1."); std::size_t m_mod; }; namespace detail { static constexpr const std::array PRIMES = {{ 1ul, 5ul, 17ul, 29ul, 37ul, 53ul, 67ul, 79ul, 97ul, 131ul, 193ul, 257ul, 389ul, 521ul, 769ul, 1031ul, 1543ul, 2053ul, 3079ul, 6151ul, 12289ul, 24593ul, 49157ul, 98317ul, 196613ul, 393241ul, 786433ul, 1572869ul, 3145739ul, 6291469ul, 12582917ul, 25165843ul, 50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul, 1610612741ul, 3221225473ul, 4294967291ul }}; template static constexpr std::size_t mod(std::size_t hash) { return hash % PRIMES[IPrime]; } // MOD_PRIME[iprime](hash) returns hash % PRIMES[iprime]. This table allows for faster modulo as the // compiler can optimize the modulo code better with a constant known at the compilation. static constexpr const std::array MOD_PRIME = {{ &mod<0>, &mod<1>, &mod<2>, &mod<3>, &mod<4>, &mod<5>, &mod<6>, &mod<7>, &mod<8>, &mod<9>, &mod<10>, &mod<11>, &mod<12>, &mod<13>, &mod<14>, &mod<15>, &mod<16>, &mod<17>, &mod<18>, &mod<19>, &mod<20>, &mod<21>, &mod<22>, &mod<23>, &mod<24>, &mod<25>, &mod<26>, &mod<27>, &mod<28>, &mod<29>, &mod<30>, &mod<31>, &mod<32>, &mod<33>, &mod<34>, &mod<35>, &mod<36>, &mod<37> , &mod<38>, &mod<39> }}; } /** * Grow the hash table by using prime numbers as bucket count. Slower than tsl::hh::power_of_two_growth_policy in * general but will probably distribute the values around better in the buckets with a poor hash function. * * To allow the compiler to optimize the modulo operation, a lookup table is used with constant primes numbers. * * With a switch the code would look like: * \code * switch(iprime) { // iprime is the current prime of the hash table * case 0: hash % 5ul; * break; * case 1: hash % 17ul; * break; * case 2: hash % 29ul; * break; * ... 
* } * \endcode * * Due to the constant variable in the modulo the compiler is able to optimize the operation * by a series of multiplications, subtractions and shifts. * * The 'hash % 5' could become something like 'hash - ((hash * 0xCCCCCCCD) >> 34) * 5' in a 64-bit environment. */ class prime_growth_policy { public: explicit prime_growth_policy(std::size_t& min_bucket_count_in_out) { auto it_prime = std::lower_bound(detail::PRIMES.begin(), detail::PRIMES.end(), min_bucket_count_in_out); if(it_prime == detail::PRIMES.end()) { throw std::length_error("The hash table exceeds its maximum size."); } m_iprime = static_cast(std::distance(detail::PRIMES.begin(), it_prime)); if(min_bucket_count_in_out > 0) { min_bucket_count_in_out = *it_prime; } else { min_bucket_count_in_out = 0; } } std::size_t bucket_for_hash(std::size_t hash) const noexcept { return detail::MOD_PRIME[m_iprime](hash); } std::size_t next_bucket_count() const { if(m_iprime + 1 >= detail::PRIMES.size()) { throw std::length_error("The hash table exceeds its maximum size."); } return detail::PRIMES[m_iprime + 1]; } std::size_t max_bucket_count() const { return detail::PRIMES.back(); } void clear() noexcept { m_iprime = 0; } private: unsigned int m_iprime; static_assert(std::numeric_limits::max() >= detail::PRIMES.size(), "The type of m_iprime is not big enough."); }; } } #endif fastmap/src/Makevars.win0000644000176200001440000000022113546743337014776 0ustar liggesusers# Use C++11 if available CXX_STD=CXX11 # Require Rf_ prefix for R's functions, to avoid clashes. PKG_CXXFLAGS=-DR_NO_REMAP PKG_CPPFLAGS=-Ilib/ fastmap/src/fastmap.cpp0000644000176200001440000001527713546743337014666 0ustar liggesusers#include #include #include #include #include #if __cplusplus >= 201103L // tsl::hopscotch_map is faster than std::map, but requires C++11. // We're using it instead of std::unordered_map, because the ordering of keys // should be stable across platforms (see #8), and because it's faster. 
#include <tsl/hopscotch_map.h> typedef tsl::hopscotch_map<std::string, int> si_map; #else #include <map> typedef std::map<std::string, int> si_map; #endif // Returns the key as a UTF-8 encoded std::string. The characters are copied // out of the CHARSXP, so the result does not depend on the CHARSXP's lifetime. std::string key_from_sexp(SEXP key_r) { if (TYPEOF(key_r) != STRSXP || Rf_length(key_r) != 1) { Rf_error("key must be a one-element character vector"); } SEXP key_c = STRING_ELT(key_r, 0); if (key_c == NA_STRING || Rf_StringBlank(key_c)) { Rf_error("key must not be \"\" or NA"); } return std::string(Rf_translateCharUTF8(key_c)); } extern "C" { bool is_ascii(const char *str) { while (*str) { if ((unsigned int)*str > 0x7F) { return false; } str++; } return true; } si_map* map_from_xptr(SEXP map_xptr) { if (TYPEOF(map_xptr) != EXTPTRSXP) { Rf_error("map_xptr must be an external pointer."); } si_map* map = (si_map*) R_ExternalPtrAddr(map_xptr); if (!map) { Rf_error("fastmap: external pointer to string-to-index map is null."); } return map; } void map_finalizer(SEXP map_xptr) { si_map* map = map_from_xptr(map_xptr); delete map; R_ClearExternalPtr(map_xptr); } SEXP C_xptr_is_null(SEXP map_xptr) { if (TYPEOF(map_xptr) != EXTPTRSXP) { Rf_error("map_xptr must be an external pointer."); } return Rf_ScalarLogical(R_ExternalPtrAddr(map_xptr) == NULL); } SEXP C_map_create() { si_map* map = new si_map; SEXP map_xptr = PROTECT(R_MakeExternalPtr(map, R_NilValue, R_NilValue)); R_RegisterCFinalizerEx(map_xptr, map_finalizer, TRUE); UNPROTECT(1); return map_xptr; } SEXP C_map_set(SEXP map_xptr, SEXP key_r, SEXP idx_r) { std::string key = key_from_sexp(key_r); if (TYPEOF(idx_r) != INTSXP || Rf_length(idx_r) != 1) { Rf_error("idx must be a one-element integer vector"); } si_map* map = map_from_xptr(map_xptr); int idx = INTEGER(idx_r)[0]; (*map)[key] = idx; return R_NilValue; } SEXP C_map_get(SEXP map_xptr, SEXP key_r) { std::string key = key_from_sexp(key_r); si_map* map = map_from_xptr(map_xptr); si_map::const_iterator it = map->find(key); if (it == 
map->end()) { return Rf_ScalarInteger(-1); } else { return Rf_ScalarInteger(it->second); } } SEXP C_map_remove(SEXP map_xptr, SEXP key_r) { std::string key = key_from_sexp(key_r); si_map* map = map_from_xptr(map_xptr); si_map::iterator it = map->find(key); if (it == map->end()) { return Rf_ScalarInteger(-1); } else { int value = it->second; map->erase(it); return Rf_ScalarInteger(value); } } SEXP C_map_keys(SEXP map_xptr, SEXP sort_r) { si_map* map = map_from_xptr(map_xptr); SEXP keys = PROTECT(Rf_allocVector(STRSXP, map->size())); bool sort = LOGICAL(sort_r)[0]; if (sort) { std::vector keys_vec; keys_vec.reserve(map->size()); // Extract all the keys from the map, then sort them. int i = 0; for(si_map::const_iterator it = map->begin(); it != map->end(); ++it, ++i) { keys_vec.push_back(it->first); } std::sort(keys_vec.begin(), keys_vec.end()); // Put the sorted keys in the character vector. i = 0; for(std::vector::const_iterator it = keys_vec.begin(); it != keys_vec.end(); ++it, ++i) { SET_STRING_ELT(keys, i, Rf_mkCharCE(it->c_str(), CE_UTF8)); } } else { int i = 0; for(si_map::const_iterator it = map->begin(); it != map->end(); ++it, ++i) { SET_STRING_ELT(keys, i, Rf_mkCharCE(it->first.c_str(), CE_UTF8)); } } UNPROTECT(1); return keys; } SEXP C_map_keys_idxs(SEXP map_xptr, SEXP sort_r) { si_map* map = map_from_xptr(map_xptr); SEXP keys = PROTECT(Rf_allocVector(STRSXP, map->size())); SEXP idxs = PROTECT(Rf_allocVector(INTSXP, map->size())); int* idxs_ = INTEGER(idxs); bool sort = LOGICAL(sort_r)[0]; if (sort) { std::vector keys_vec; keys_vec.reserve(map->size()); // Extract all the keys from the map, then sort them. int i = 0; for(si_map::const_iterator it = map->begin(); it != map->end(); ++it, ++i) { keys_vec.push_back(it->first); } std::sort(keys_vec.begin(), keys_vec.end()); // Use the sorted keys to populate `keys`, as well as extract values // from `map` and put them into `idxs_`. 
i = 0; for(std::vector::const_iterator it = keys_vec.begin(); it != keys_vec.end(); ++it, ++i) { SET_STRING_ELT(keys, i, Rf_mkCharCE(it->c_str(), CE_UTF8)); idxs_[i] = (*map)[*it]; } } else { int i = 0; for(si_map::const_iterator it = map->begin(); it != map->end(); ++it, ++i) { SET_STRING_ELT(keys, i, Rf_mkCharCE(it->first.c_str(), CE_UTF8)); idxs_[i] = it->second; } } Rf_setAttrib(idxs, R_NamesSymbol, keys); UNPROTECT(2); return idxs; } // Convert an R character vector to UTF-8. This is necessary because iconv // doesn't really work for vectors where the items have mixed encoding. SEXP C_char_vec_to_utf8(SEXP str) { if (TYPEOF(str) != STRSXP) { Rf_error("str must be a character vector"); } // Our default assumption is that all the keys are UTF-8 (or ASCII), in // which case we do _not_ need to re-encode the keys and copy them to a // new vector. bool need_reencode = false; // Fast path for the common case: check if all the strings are UTF-8. If // yes, just return str. If no, we need to copy and re-encode each element // to a new vector. int n_str = Rf_length(str); SEXP tmp; for (int i=0; i #' #' # Specify the default missing value #' m <- fastmap(missing_default = key_missing()) #' m$get("x") #' #> #' #' @export fastmap <- function(missing_default = NULL) { force(missing_default) # =================================== # Constants # =================================== INITIAL_SIZE <- 32L GROWTH_FACTOR <- 1.2 SHRINK_FACTOR <- 2 # =================================== # Internal state # =================================== # Number of items currently stored in the fastmap object. n <- NULL # External pointer to the C++ object that maps from key (a string) to index # into the list that stores the values (which can be any R object). key_idx_map <- NULL # A vector containing keys, where the keys are in the corresponding position # to the values in the values list. This is only used to repopulate the map # after the fastmap has been serialized and deserialized. 
It contains the same # information as key_idx_map, but, since it is a normal R object, it can be # saved and restored without any extra effort. keys_ <- NULL # Backing store for the R objects. values <- NULL # Indices in the list which are less than n and not currently occupied. These # occur when objects are removed from the map. When a hole is filled, the # entry is replaced with NA, and n_holes is updated to reflect it; this is # instead of simply shrinking the holes vector, because that involves copying # the entire object. holes <- NULL n_holes <- NULL # =================================== # Methods # =================================== reset <- function() { n <<- 0L key_idx_map <<- .Call(C_map_create) keys_ <<- rep(NA_character_, INITIAL_SIZE) values <<- vector(mode = "list", INITIAL_SIZE) holes <<- seq_len(INITIAL_SIZE) n_holes <<- INITIAL_SIZE invisible(NULL) } reset() set <- function(key, value) { # Force evaluation of value here, so that, if it throws an error, the error # will not happen in the middle of this function and leave things in an # inconsistent state. force(value) ensure_restore_map() idx <- .Call(C_map_get, key_idx_map, key) if (idx == -1L) { # This is a new key. n <<- n + 1L # If we have any holes in our values list, store it there. Otherwise # append to the end of the values list. if (n_holes == 0L) { idx <- n # If we got here, we need to grow. This grows values, and holes is # updated to track it. grow() } idx <- holes[n_holes] holes[n_holes] <<- NA_integer_ # Mark as NA, for safety n_holes <<- n_holes - 1L .Call(C_map_set, key_idx_map, key, idx) } if (is.null(value)) { # Need to handle NULLs differently. Wrap them in a list so that this # doesn't result in deletion. 
values[idx] <<- list(NULL) } else { values[[idx]] <<- value } # Store the key, as UTF-8 keys_[idx] <<- .Call(C_char_vec_to_utf8, key) invisible(value) } mset <- function(..., .list = NULL) { objs <- c(list(...), .list) keys <- names(objs) if (is.null(keys) || any(is.na(keys)) || any(keys == "")) { stop("mset: all values must be named.") } for (i in seq_along(objs)) { set(keys[i], objs[[i]]) } invisible(objs) } get <- function(key, missing = missing_default) { ensure_restore_map() idx <- .Call(C_map_get, key_idx_map, key) if (idx == -1L) { return(missing) } values[[idx]] } mget <- function(keys, missing = missing_default) { if (is.null(keys)) { return(list(a=1)[0]) # Special case: return empty named list } if (!is.character(keys)) { stop("mget: `keys` must be a character vector or NULL") } # Make sure keys are encoded in UTF-8. Need this C function because iconv # doesn't work right for vectors with mixed encodings. keys <- .Call(C_char_vec_to_utf8, keys) res <- lapply(keys, get, missing) names(res) <- keys res } # Internal function has_one <- function(key) { ensure_restore_map() idx <- .Call(C_map_get, key_idx_map, key) return(idx != -1L) } has <- function(keys) { if (!(is.character(keys) || is.null(keys))) { stop("has: `keys` must be a character vector or NULL") } if (length(keys) == 1) { # In the common case of only one key, it's faster to avoid vapply. has_one(keys) } else { vapply(keys, has_one, FUN.VALUE = TRUE, USE.NAMES = FALSE) } } # Internal function remove_one <- function(key) { ensure_restore_map() idx <- .Call(C_map_remove, key_idx_map, key) if (idx == -1L) { return(FALSE) } values[idx] <<- list(NULL) keys_[idx] <<- NA_character_ n <<- n - 1L n_holes <<- n_holes + 1L holes[n_holes] <<- idx # Shrink the values list if its length is larger than 32 and it is half or # more empty. 
values_length <- length(values) if (values_length > INITIAL_SIZE && values_length >= n * SHRINK_FACTOR) { shrink() } TRUE } remove <- function(keys) { if (!(is.character(keys) || is.null(keys))) { stop("remove: `keys` must be a character vector or NULL") } if (any(keys == "") || any(is.na(keys))) { stop('remove: `keys` must not be "" or NA') } if (length(keys) == 1) { # In the common case of only one key, it's faster to avoid vapply. invisible(remove_one(keys)) } else { invisible(vapply(keys, remove_one, FUN.VALUE = TRUE, USE.NAMES = FALSE)) } } size <- function() { n } keys <- function(sort = FALSE) { ensure_restore_map() .Call(C_map_keys, key_idx_map, sort) } as_list <- function(sort = FALSE) { ensure_restore_map() keys_idxs <- .Call(C_map_keys_idxs, key_idx_map, sort) result <- values[keys_idxs] names(result) <- names(keys_idxs) result } # Internal function grow <- function() { old_values_length <- length(values) new_values_length <- as.integer(ceiling(old_values_length * GROWTH_FACTOR)) # Increase size of values list by assigning NULL past the end. On R 3.4 and # up, this will grow it in place. values[new_values_length] <<- list(NULL) keys_[new_values_length] <<- NA_character_ # When grow() is called, `holes` is all NAs, but it's not as long as values. # Grow it (possibly in place, depending on R version) to new_values_length, # and have it point to all the new empty spaces in `values`. Strictly # speaking, it doesn't have to be as large as new_values_length -- it only # needs to be of size (new_values_length - new_values_length / # SHRINK_FACTOR), but it's possible that there will be a rounding error and # I'm playing it safe here. 
holes[new_values_length] <<- NA_integer_ n_holes <<- new_values_length - old_values_length holes[seq_len(n_holes)] <<- seq.int(from = old_values_length + 1, to = new_values_length) } # Internal function shrink <- function() { if (n_holes == 0L) return(invisible()) keys_idxs <- .Call(C_map_keys_idxs, key_idx_map, FALSE) # Suppose values is a length-7 list, n==3, holes==c(4,1,3,NA), n_holes==3 # Drop any extra values stored in the holes vector. holes <<- holes[seq_len(n_holes)] remap_inv <- seq_len(length(values)) remap_inv <- remap_inv[-holes] # remap_inv now is c(2, 5, 6, 7). It will be sorted. remap <- integer(length(values)) remap[remap_inv] <- seq_along(remap_inv) # remap is now c(0,1,0,0,2,3,4). The non-zero values will be sorted. if (length(keys_idxs) != length(remap_inv)) { stop("length mismatch of keys_idxs and remap_inv") } keys <- names(keys_idxs) for (i in seq_along(keys)) { idx <- keys_idxs[[i]] .Call(C_map_set, key_idx_map, keys[i], remap[idx]) } values <<- values[-holes] keys_ <<- keys_[-holes] holes <<- integer() n_holes <<- 0L n <<- length(values) } # Internal function. This is useful after a fastmap is deserialized. When that # happens, the key_idx_map xptr will be NULL, and it needs to be repopulated. # Every external-facing method that makes use of key_idx_map should call this # before doing any operations on it. ensure_restore_map <- function() { # If the key_idx_map pointer is not NULL, just return. if (!.Call(C_xptr_is_null, key_idx_map)) { return(invisible()) } # Repopulate key_idx_map. 
key_idx_map <<- .Call(C_map_create) holes <- holes[seq_len(n_holes)] # Use setdiff() rather than [-holes]: negative indexing with a zero-length # vector would select nothing and leave the map empty. idxs <- setdiff(seq_along(keys_), holes) for (idx in idxs) { .Call(C_map_set, key_idx_map, keys_[idx], idx) } } list( reset = reset, set = set, mset = mset, get = get, mget = mget, has = has, remove = remove, keys = keys, size = size, as_list = as_list ) } fastmap/R/key_missing.R0000644000176200001440000000074313546743337014575 0ustar liggesusers#' A missing key object #' #' A \code{key_missing} object represents a missing key. #' #' @param x An object to test. #' #' @export key_missing <- function() { # Note: this is more verbose, but much faster than # structure(list(), class = "key_missing") x <- list() class(x) <- "key_missing" x } #' @rdname key_missing #' @export is.key_missing <- function(x) { inherits(x, "key_missing") } #' @export print.key_missing <- function(x, ...) { cat("<Key Missing>\n") } fastmap/R/faststack.R0000644000176200001440000001174114003624771014224 0ustar liggesusers#' Create a stack #' #' A `faststack` is backed by a list. The backing list will grow or shrink as #' the stack changes in size. #' #' `faststack` objects have the following methods: #' #' \describe{ #' \item{\code{push(x)}}{ #' Push an object onto the stack. #' } #' \item{\code{mpush(..., .list = NULL)}}{ #' Push objects onto the stack. `.list` can be a list of objects to add. #' } #' \item{\code{pop(missing = missing_default)}}{ #' Remove and return the top object on the stack. If the stack is empty, #' it will return `missing`, which defaults to the value of #' `missing_default` that `faststack()` was created with (typically, `NULL`). #' } #' \item{\code{mpop(n, missing = missing_default)}}{ #' Remove and return the top `n` objects on the stack, in a list. The first #' element of the list is the top object in the stack. If `n` is greater #' than the number of objects in the stack, any requested items beyond #' those in the stack will be replaced with `missing` (typically, `NULL`). 
#' } #' \item{\code{peek(missing = missing_default)}}{ #' Return the top object on the stack, but do not remove it from the stack. #' If the stack is empty, this will return `missing`. #' } #' \item{\code{reset()}}{ #' Reset the stack, clearing all items. #' } #' \item{\code{size()}}{ #' Returns the number of items in the stack. #' } #' \item{\code{as_list()}}{ #' Return a list containing the objects in the stack, where the first #' element in the list is the object at the bottom of the stack, and the #' last element in the list is the object at the top of the stack. #' } #' } #' #' #' @param init Initial size of the list that backs the stack. This is also used #' as the minimum size of the list; it will not shrink any smaller. #' @param missing_default The value to return when `pop()` or `peek()` are #' called when the stack is empty. Default is `NULL`. #' @export faststack <- function(init = 20, missing_default = NULL) { force(missing_default) # A list that represents the stack s <- vector("list", init) # Current size of the stack count <- 0L push <- function(x) { new_size <- count + 1L # R 3.4.0 and up will automatically grow vectors in place, if possible, so # we don't need to explicitly grow the list here. if (is.null(x)) { # Special case for NULL (in the normal case, we'll avoid creating a new # list() and then unwrapping it.) s[new_size] <<- list(NULL) } else { s[[new_size]] <<- x } count <<- new_size invisible() } mpush <- function(..., .list = NULL) { if (is.null(.list)) { # Fast path for common case args <- list(...) } else { args <- c(list(...), .list) } if (length(args) == 0) { stop("`mpush`: No items provided to push on stack.") } new_size <- count + length(args) # R 3.4.0 and up will automatically grow vectors in place, if possible, so # we don't need to explicitly grow the list here. 
s[count + seq_along(args)] <<- args count <<- new_size invisible() } pop <- function(missing = missing_default) { if (count == 0L) { return(missing) } value <- s[[count]] s[count] <<- list(NULL) count <<- count - 1L # Shrink list if < 1/2 of the list is used, down to a minimum size of `init` len <- length(s) if (len > init && count < len/2) { new_len <- max(init, count) # Assigning NULL (not list(NULL)) drops the unused tail of the list s[seq.int(new_len + 1L, len)] <<- NULL } value } mpop <- function(n, missing = missing_default) { n <- as.integer(n) if (n < 1) { stop("`n` must be at least 1.") } if (n > count) { n_pop <- count n_extra <- n - count } else { n_pop <- n n_extra <- 0L } idx <- seq.int(count, count - n_pop + 1L) if (n_extra != 0) { values <- vector("list", n) values[seq_len(n_pop)] <- s[idx] if (!is.null(missing)) { # Wrap in list() so that each extra slot receives `missing` as a whole values[seq.int(n_pop + 1, n)] <- list(missing) } } else { values <- s[idx] } s[idx] <<- list(NULL) count <<- count - n_pop # Shrink list if < 1/2 of the list is used, down to a minimum size of `init` len <- length(s) if (len > init && count < len/2) { new_len <- max(init, count) # Assign in place; avoids making copies s[seq.int(new_len + 1L, len)] <<- NULL } values } peek <- function(missing = missing_default) { if (count == 0L) { return(missing) } s[[count]] } reset <- function() { s <<- vector("list", init) count <<- 0L invisible() } size <- function() { count } # Return the entire stack as a list, where the first item in the list is the # oldest item in the stack, and the last item is the most recently added. as_list <- function() { s[seq_len(count)] } list( push = push, mpush = mpush, pop = pop, mpop = mpop, peek = peek, reset = reset, size = size, as_list = as_list ) } fastmap/R/fastqueue.R0000644000176200001440000002347114003624771014246 0ustar liggesusers#' Create a queue #' #' A `fastqueue` is backed by a list, which is used in a circular manner. The #' backing list will grow or shrink as the queue changes in size. 
#' #' `fastqueue` objects have the following methods: #' #' \describe{ #' \item{\code{add(x)}}{ #' Add an object to the queue. #' } #' \item{\code{madd(..., .list = NULL)}}{ #' Add objects to the queue. `.list` can be a list of objects to add. #' } #' \item{\code{remove(missing = missing_default)}}{ #' Remove and return the next object in the queue. If the queue is empty, #' this will return `missing`, which defaults to the value of #' `missing_default` that `fastqueue()` was created with (typically, #' `NULL`). #' } #' \item{\code{mremove(n, missing = missing_default)}}{ #' Remove and return the next `n` objects in the queue, in a list. The first #' element of the list is the oldest object in the queue (in other words, #' the next item that would be returned by `remove()`). If `n` is greater #' than the number of objects in the queue, any requested items beyond #' those in the queue will be replaced with `missing` (typically, `NULL`). #' } #' \item{\code{peek(missing = missing_default)}}{ #' Return the next object in the queue but do not remove it from the queue. #' If the queue is empty, this will return `missing`. #' } #' \item{\code{reset()}}{ #' Reset the queue, clearing all items. #' } #' \item{\code{size()}}{ #' Returns the number of items in the queue. #' } #' \item{\code{as_list()}}{ #' Return a list containing the objects in the queue, where the first #' element in the list is the oldest object in the queue (in other words, it #' is the next item that would be returned by `remove()`), and the last #' element in the list is the most recently added object. #' } #' } #' #' #' @param init Initial size of the list that backs the queue. This is also used #' as the minimum size of the list; it will not shrink any smaller. #' @param missing_default The value to return when `remove()` or `peek()` are #' called when the queue is empty. Default is `NULL`. 
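#' @examples
#' # A brief usage sketch of the methods documented above. It is illustrative
#' # only and assumes the queue is created with the default arguments.
#' q <- fastqueue()
#' q$add("first")
#' q$madd("second", "third")
#' q$size()
#' # peek() returns the oldest item and leaves the queue unchanged.
#' q$peek()
#' # remove() returns the oldest item and takes it off the queue.
#' q$remove()
#' # Asking for more items than the queue holds pads the result with `missing`.
#' q$mremove(3, missing = NA)
#' q$as_list()
#'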
#' @export fastqueue <- function(init = 20, missing_default = NULL) { force(missing_default) q <- vector("list", init) head <- 0L # Index of most recently added item tail <- 0L # Index of oldest item (next to be removed) count <- 0L # Number of items in queue add <- function(x) { force(x) capacity <- length(q) if (count + 1L > capacity) { capacity <- .resize_at_least(count + 1L) } if (capacity - head >= 1L) { # Case 1: We don't need to wrap head <<- head + 1L } else { # Case 2: need to wrap around head <<- 1L } if (is.null(x)) { q[head] <<- list(NULL) } else { q[[head]] <<- x } # If tail was at zero, we had an empty queue, and need to set tail to 1 if (tail == 0L) { tail <<- 1L } count <<- count + 1L invisible() } madd <- function(..., .list = NULL) { if (is.null(.list)) { # Fast path for common case args <- list(...) } else { args <- c(list(...), .list) } n_args <- length(args) if (n_args == 0L) { return(invisible()) } capacity <- length(q) if (count + n_args > capacity) { capacity <- .resize_at_least(count + n_args) } n_until_wrap <- capacity - head if (n_until_wrap >= n_args) { # Case 1: We don't need to wrap q[head + seq_along(args)] <<- args head <<- head + n_args } else { # Case 2: need to wrap around # Fill in from head until end of `q` if (n_until_wrap > 0) { q[seq.int(head + 1, capacity)] <<- args[seq_len(n_until_wrap)] } # Now fill in beginning of q. 
n_after_wrap <- n_args - n_until_wrap q[seq_len(n_after_wrap)] <<- args[seq.int(n_until_wrap + 1, n_args)] head <<- head + n_args - capacity } # If tail was at zero, we had an empty queue, and need to set tail to 1 if (tail == 0L) { tail <<- 1L } count <<- count + n_args invisible() } remove <- function(missing = missing_default) { if (count == 0L) return(missing) capacity <- length(q) value <- q[[tail]] q[tail] <<- list(NULL) if (tail == head) { # We've emptied the queue tail <<- head <<- 0L } else { tail <<- tail + 1L # Wrapped around if (tail > capacity) tail <<- tail - capacity } count <<- count - 1L # Shrink list if <= 1/4 of the list is used, down to a minimum size of # `init`. When we resize, make sure there's room to add items without having # to resize again (that's why the +1 is there). if (capacity > init && count <= capacity/4) { .resize_at_least(count + 1L) } value } mremove <- function(n, missing = missing_default) { n <- as.integer(n) if (n < 1) { stop("`n` must be at least 1.") } capacity <- length(q) values <- vector("list", n) # When removing multiple, there are two variables to deal with: # (1) no wrap vs. (2) wrap # (A) n < count vs. (B) n == count vs. (C) n > count # ===================================================================== # First run: Fill from tail until we hit n items, head, or end of list. # ===================================================================== run_length <- min(n, capacity-tail+1L) if (head >= tail) { # In the case when the queue does NOT wrap around... run_length <- min(run_length, head-tail+1L) } run_idxs <- seq.int(tail, tail + run_length - 1) values[seq_len(run_length)] <- q[run_idxs] q[run_idxs] <<- list(NULL) # After first run, do some bookkeeping. 
total_filled <- run_length remaining_n <- n - run_length count <<- count - run_length tail <<- tail + run_length if (count == 0L) { # We've emptied the queue head <<- 0L tail <<- 0L } else if (tail > capacity) { # We've wrapped around stopifnot(tail == capacity + 1L) # Should always land on one after capacity (debugging) tail <<- 1L } # ========================================================== # Second run: Continue filling until we hit n items or head. # ========================================================== if (remaining_n > 0 && count != 0 && tail <= head) { stopifnot(tail == 1L) # Make sure we've actually wrapped run_length <- min(remaining_n, head) run_idxs <- seq_len(run_length) values[seq.int(total_filled+1, total_filled+run_length)] <- q[run_idxs] q[run_idxs] <<- list(NULL) # Do more bookkeeping. TODO: functionize this part total_filled <- total_filled + run_length remaining_n <- remaining_n - run_length count <<- count - run_length tail <<- tail + run_length if (count == 0L) { # We've emptied the queue stopifnot(tail == head + 1L) # Should land on one after head (debugging) head <<- 0L tail <<- 0L } } # =============================================================== # Third run: We've emptied the queue but still need to fill more. # =============================================================== if (remaining_n > 0) { stopifnot(count == 0) values[seq(total_filled+1, n)] <- list(missing) } # Shrink list if <= 1/4 of the list is used, down to a minimum size of # `init`. When we resize, make sure there's room to add items without having # to resize again (that's why the +1 is there). 
if (capacity > init && count <= capacity/4) { .resize_at_least(count + 1L) } values } peek <- function(missing = missing_default) { if (count == 0L) { return(missing) } q[[tail]] } reset <- function() { q <<- vector("list", init) head <<- 0L tail <<- 0L count <<- 0L invisible() } size <- function() { count } # Return the entire queue as a list, where the first item is the next to be # removed (and oldest in the queue). as_list <- function() { if (count == 0L) return(list()) .as_list() } # Internal version of as_list() # `.size` is the desired size of the output list. .as_list <- function(.size = count) { if (.size < count) { stop("Can't return list smaller than number of items.") } capacity <- length(q) # low_tail can be negative values up to zero, and is always less than head. low_tail <- tail if (head < tail) low_tail <- tail - capacity # Get indices and transfer over old new_q <- vector("list", .size) old_idx <- (seq(low_tail, head) - 1L) %% capacity + 1L new_q[seq_len(length(old_idx))] <- q[old_idx] new_q } # Resize to a specific size. This will also rearrange items so the tail is at # 1 and the head is at count. .resize <- function(n) { if (n < count) { stop("Can't shrink smaller than number of items (", count, ").") } if (n <= 0) { stop("Can't shrink smaller than one.") } # If q is already the right size, don't need to do anything. if (length(q) == n) { return(n) } if (count == 0L) { q <<- vector("list", n) return(n) } q <<- .as_list(n) tail <<- 1L head <<- count n } # Resize the backing list to a size that's `init` times a power of 2, so that # it's at least as large as `n`. 
.resize_at_least <- function(n) { doublings <- ceiling(log2(n / init)) doublings <- max(0, doublings) new_capacity <- init * 2 ^ doublings .resize(new_capacity) } list( add = add, madd = madd, remove = remove, mremove = mremove, peek = peek, reset = reset, size = size, as_list = as_list ) } fastmap/NEWS.md0000644000176200001440000000035414003624774013014 0ustar liggesusersfastmap 1.1.0 ============= * Added `faststack()` and `fastqueue()`. (#15) fastmap 1.0.1 ============= * Fixed #13: fastmap.cpp now explicitly includes the algorithm header, which is needed on some platforms to successfully compile. fastmap/MD50000644000176200001440000000301314003630522012205 0ustar liggesusers52b07130bd3c1ffe1a653f4abf5ad7ca *DESCRIPTION 44b42abf175150c1e37a8d871caac748 *LICENSE bab9cfd3b060292fd8be73b9383d837a *LICENSE.note be6edcb583251591ac59b79ea7dc7e05 *NAMESPACE afab618cdf35f83f382d329b0a73fb8d *NEWS.md c70afdb086973e6d272339606826c9c0 *R/fastmap.R aefa9d80d406f0873781cd251bd8645c *R/fastqueue.R 2f6da4d369b044b0befb783392feae55 *R/faststack.R 612fd3f480c16e159aa86a19926984f4 *R/key_missing.R 28235859554115b0ca81843ba26e56cd *README.md 55ae30983e13f94fcdeade1335ddf7b5 *man/fastmap.Rd c58cb1cd29bd27e3633945b59217f071 *man/fastqueue.Rd aae9d743ea0e61d3d745d9c7cc41cb2d *man/faststack.Rd 8f983010df4f86ddd00cb829b0e0f71c *man/key_missing.Rd 74d53a543fee8e2ad7d166cb846162d6 *src/Makevars 74d53a543fee8e2ad7d166cb846162d6 *src/Makevars.win f5cb37bea6a235abb1b3bedf8bee668e *src/fastmap.cpp 900e937f559e0a17b0ea411337008ba8 *src/init.c b2cd4d0bcf22b279cf11513557a0f6ea *src/lib/tsl/hopscotch_growth_policy.h eaf4188967d1a054772610a8980fe66f *src/lib/tsl/hopscotch_hash.h 9ceb07ebf09e6bca84c04e9036c79edd *src/lib/tsl/hopscotch_map.h c01a691846e6b7ddb09c79088a85d929 *tests/testthat.R 8423e4c6d48bd9902ab99b3aef7305ab *tests/testthat/helpers-fastmap.R 6f18920ed6fc144315f70c11a2c8f5f3 *tests/testthat/test-encoding.R 5fe971182dbeba61501528e3c5ec5b71 *tests/testthat/test-map.R 
af5468baaba7aaeca377b95a753f0d9a *tests/testthat/test-queue.R 1355578c48f3b81dc0674e1e003a1fde *tests/testthat/test-serialize.R 8b0f9c805d1565e072c416eec16bc064 *tests/testthat/test-shrink.R 6139733c067881aa068279f8db1446ab *tests/testthat/test-stack.R