cachem/0000755000176200001440000000000014107546532011501 5ustar liggesuserscachem/NAMESPACE0000644000176200001440000000065514107537575012735 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method(format,cachem) S3method(print,cachem) export(cache_disk) export(cache_layered) export(cache_mem) export(is.key_missing) export(key_missing) import(fastmap) importFrom(fastmap,is.key_missing) importFrom(fastmap,key_missing) importFrom(rlang,as_quosure) importFrom(rlang,enquo) importFrom(rlang,eval_tidy) importFrom(utils,object.size) useDynLib(cachem, .registration = TRUE) cachem/LICENSE0000644000176200001440000000005314107023304012497 0ustar liggesusersYEAR: 2020 COPYRIGHT HOLDER: RStudio, Inc. cachem/README.md0000644000176200001440000003602014107023304012744 0ustar liggesusers - [cachem](#cachem) - [Installation](#installation) - [Usage](#usage) - [Cache types](#cache-types) - [`cache_mem()`](#cache_mem) - [`cache_disk()`](#cache_disk) - [Cache API](#cache-api) - [Pruning](#pruning) - [Layered caches](#layered-caches) # cachem [![R build status](https://github.com/r-lib/cachem/workflows/R-CMD-check/badge.svg)](https://github.com/r-lib/cachem/actions) The **cachem** R package provides objects for creating and managing caches. These cache objects are key-value stores, but unlike other basic key-value stores, they have built-in support for memory and age limits so that they won’t have unbounded growth. The cache objects in **cachem** differ from some other key-value stores in the following ways: - The cache objects provide automatic pruning so that they remain within memory limits. - Fetching a non-existing object returns a sentinel value. An alternative is to simply return `NULL`. This is what R lists and environments do, but it is ambiguous whether the value really is `NULL`, or if it is not present. Another alternative is to throw an exception when fetching a non-existent object. 
However, this results in more complicated code, as every `get()` needs to be wrapped in a `tryCatch()`. ## Installation To install the CRAN version: ``` r install.packages("cachem") ``` You can install the development version from GitHub with: ``` r if (!require("remotes")) install.packages("remotes") remotes::install_github("r-lib/cachem") ``` ## Usage To create a memory-based cache, call `cache_mem()`. ``` r library(cachem) m <- cache_mem() ``` Add arbitrary R objects to the cache using `$set(key, value)`: ``` r m$set("abc123", c("Hello", "world")) m$set("xyz", function() message("Goodbye")) ``` The `key` must be a string consisting of lowercase letters, numbers, and the underscore (`_`) and hyphen (`-`) characters. (Upper-case characters are not allowed because some storage backends do not distinguish between lowercase and uppercase letters.) The `value` can be any R object. Get the values with `$get()`: ``` r m$get("abc123") #> [1] "Hello" "world" m$get("xyz") #> function() message("Goodbye") ``` If you call `get()` on a key that doesn’t exist, it will return a `key_missing()` sentinel value: ``` r m$get("dog") #> <Key Missing> ``` A common usage pattern is to call `get()`, and then check if the result is a `key_missing` object: ``` r value <- m$get(key) if (is.key_missing(value)) { # Cache miss - do something } else { # Cache hit - do another thing } ``` The reason for doing this (instead of calling `$exists(key)` and then `$get(key)`) is that for some storage backends, there is a potential race condition: the object could be removed from the cache between the `exists()` and `get()` calls. For example: - If multiple R processes have `cache_disk`s that share the same directory, one process could remove an object from the cache in between the `exists()` and `get()` calls in another process, resulting in an error. 
- If you use a `cache_mem` with a `max_age`, it’s possible for an object to be present when you call `exists()`, but for its age to exceed `max_age` by the time `get()` is called. In that case, the `get()` will return a `key_missing()` object. ``` r # Avoid this pattern, due to a potential race condition! if (m$exists(key)) { value <- m$get(key) } ``` ## Cache types **cachem** comes with two kinds of cache objects: a memory cache, and a disk cache. ### `cache_mem()` The memory cache stores objects in memory, by simply keeping a reference to each object. To create a memory cache: ``` r m <- cache_mem() ``` The default size of the cache is 512MB, but this can be customized with `max_size`: ``` r m <- cache_mem(max_size = 10 * 1024^2) ``` It may also be useful to set a maximum age of objects. For example, if you only want objects to stay for a maximum of one hour: ``` r m <- cache_mem(max_size = 10 * 1024^2, max_age = 3600) ``` For more about how objects are evicted from the cache, see section [Pruning](#pruning) below. An advantage that the memory cache has over the disk cache (and any other type of cache that stores the objects outside of the R process’s memory) is that it does not need to serialize objects. Instead, it merely stores references to the objects. This means that it can store objects that other caches cannot, and with more efficient use of memory: if two objects in the cache share some of their contents (such that they refer to the same sub-object in memory), then `cache_mem` will not create duplicate copies of the contents, as `cache_disk` would, since it serializes the objects with the `serialize()` function. The size *calculation*, however, is not as intelligent: if there are two objects that share contents, their sizes are computed separately, even if they have items that share the exact same representation in memory. 
This is done with the `object.size()` function, which does not account for multiple references to the same object in memory. In short, a memory cache, if anything, over-counts the amount of memory actually consumed. In practice, this means that if you set a 200MB limit on the size of the cache, and the cache *thinks* it has 200MB of contents, the actual amount of memory consumed could be less than 200MB.
Demonstration of memory over-counting from `object.size()`: ``` r # Create a and b which both contain the same numeric vector. x <- list(rnorm(1e5)) a <- list(1, x) b <- list(2, x) # Add to cache m$set("a", a) m$set("b", b) # Each object is about 800kB in memory, so the cache_mem() will consider the # total memory used to be 1600kB. object.size(m$get("a")) #> 800224 bytes object.size(m$get("b")) #> 800224 bytes ``` For reference, `lobstr::obj_size()` can detect shared objects, and knows that these objects share most of their memory. ``` r lobstr::obj_size(m$get("a")) #> 800,224 B lobstr::obj_size(list(m$get("a"), m$get("b"))) #> 800,408 B ``` However, using lobstr would add another dependency, and if `obj_size()` were used to find the incremental memory used when an object was added to the cache, it would have to walk all objects in the cache every time a single object is added. For these reasons, `cache_mem` uses `object.size()` to compute the object sizes.
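The pruning behavior described above can be seen directly with a tiny cache. This is only a sketch of the behavior; the object name `m2` and the keys are arbitrary, and `max_n = 2` is chosen just to make eviction easy to observe:

``` r
library(cachem)

# A cache that holds at most two objects, using the default LRU policy.
m2 <- cache_mem(max_n = 2)
m2$set("a", 1)
m2$set("b", 2)
m2$get("a")     # Touch "a", so "b" is now the least recently used
m2$set("c", 3)  # Adding "c" triggers pruning, which evicts "b"

is.key_missing(m2$get("b"))  # TRUE
sort(m2$keys())              # "a" "c"
```

The same idea applies to `max_size`, except that eviction is driven by the total `object.size()` of the stored objects rather than by their count.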
### `cache_disk()` Disk caches are stored in a directory on disk. A disk cache is slower than a memory cache, but can generally be larger. To create one: ``` r d <- cache_disk() ``` By default, it creates a subdirectory of the R process’s temp directory, and it will persist until the R process exits. ``` r d$info()$dir #> "/tmp/Rtmp6h5iB3/cache_disk-d1901b2b615a" ``` Like a `cache_mem`, the `max_size`, `max_n`, and `max_age` parameters can be customized. See section [Pruning](#pruning) below for more information. Each object in the cache is stored as an RDS file on disk, using the `serialize()` function. ``` r d$set("abc", 100) d$set("x01", list(1, 2, 3)) dir(d$info()$dir) #> [1] "abc.rds" "x01.rds" ``` Since objects in a disk cache are serialized, they are subject to the limitations of the `serialize()` function. For more information, see section [Limitations of serialized objects](#limitations-of-serialized-objects). The storage directory can be specified with `dir`; it will be created if necessary. ``` r cache_disk(dir = "cachedir") ``` #### Sharing a disk cache among processes Multiple R processes can use `cache_disk` objects that share the same cache directory. To do this, simply point each `cache_disk` to the same directory. #### `cache_disk` pruning For a `cache_disk`, pruning does not happen on every access, because finding the size of files in the cache directory can take a nontrivial amount of time. By default, pruning happens once every 20 times that `$set()` is called, or if at least five seconds have elapsed since the last pruning. The `prune_rate` parameter controls how many times `$set()` must be called before a pruning occurs. It defaults to 20; smaller values result in more frequent pruning and larger values result in less frequent pruning (but keep in mind pruning always occurs if it has been at least five seconds since the last pruning). #### Cleaning up the cache directory The cache directory can be deleted by calling `$destroy()`. 
After it is destroyed, the cache object can no longer be used. ``` r d$destroy() d$set("a", 1) # Error ``` To create a `cache_disk` that will automatically delete its storage directory when garbage collected, use `destroy_on_finalize=TRUE`: ``` r d <- cache_disk(destroy_on_finalize = TRUE) d$set("a", 1) cachedir <- d$info()$dir dir(cachedir) #> [1] "a.rds" # Remove reference to d and trigger a garbage collection rm(d) gc() dir.exists(cachedir) ``` ## Cache API `cache_mem()` and `cache_disk()` support all of the methods listed below. If you want to create a compatible caching object, it must have at least the `get()` and `set()` methods: - `get(key, missing = missing_)`: Get the object associated with `key`. The `missing` parameter allows customized behavior if the key is not present: it actually is an expression which is evaluated when there is a cache miss, and it could return a value or throw an error. - `set(key, value)`: Set a key to a value. - `exists(key)`: Check whether a particular key exists in the cache. - `remove(key)`: Remove a key-value from the cache. Some optional methods: - `reset()`: Clear all objects from the cache. - `keys()`: Return a character vector of all keys in the cache. - `prune()`: Prune the cache. (Some types of caches may not prune on every access, and may temporarily grow past their limits, until the next pruning is triggered automatically, or manually with this function.) - `size()`: Return the number of objects in the cache. For these methods: - `key`: can be any string with lowercase letters, numbers, underscore (`_`) and hyphen (`-`). Some storage backends may not handle very long keys well. For example, with a `cache_disk()`, the key is used as a filename, and on some filesystems, very long filenames may hit limits on path lengths. - `value`: can be any R object, with some exceptions noted below. 
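The required API is small enough that a compatible cache can be sketched in a few lines. The following `cache_env()` function (a made-up name for this example, not part of cachem) builds a minimal, API-compatible cache on top of an environment; it does no pruning or size accounting:

``` r
library(cachem)  # for key_missing() and is.key_missing()

cache_env <- function() {
  e <- new.env(parent = emptyenv())
  list(
    # `missing` is a promise, so an expression such as stop("no key") is
    # only evaluated on a cache miss -- matching cachem's lazy semantics.
    get = function(key, missing = key_missing()) {
      if (exists(key, envir = e, inherits = FALSE)) {
        get(key, envir = e, inherits = FALSE)
      } else {
        missing
      }
    },
    set = function(key, value) assign(key, value, envir = e),
    exists = function(key) exists(key, envir = e, inherits = FALSE),
    remove = function(key) rm(list = key, envir = e),
    keys = function() ls(e),
    size = function() length(ls(e)),
    reset = function() rm(list = ls(e), envir = e),
    prune = function() invisible(NULL)
  )
}

ce <- cache_env()
ce$set("a", 1)
ce$get("a")                  # 1
is.key_missing(ce$get("b"))  # TRUE
```

Because it implements `get()` and `set()` with the same contract, an object like this could be dropped in wherever a cachem cache is expected, though a real implementation would also need pruning to bound its growth.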
#### Limitations of serialized objects For any cache that serializes the object for storage outside of the R process – in other words, any cache other than a `cache_mem()` – some types of objects will not save and restore as well. Notably, reference objects may consume more memory when restored, since R may not know to deduplicate shared objects. External pointers cannot be serialized, since they point to memory in the R process. See `?serialize` for more information. #### Read-only caches It is possible to create a read-only cache by making the `set()`, `remove()`, `reset()`, and `prune()` methods into no-ops. This can be useful when sharing a cache with another R process which can write to the cache. For example, one (or more) processes can write to the cache, and other processes can read from it. The following function wraps a cache object in a read-only wrapper. Note, however, that code that uses such a cache must not require that `$set()` actually sets a value in the cache. This is good practice anyway, because with these cache objects, items can be pruned from them at any time. ``` r cache_readonly_wrap <- function(cache) { structure( list( get = cache$get, set = function(key, value) NULL, exists = cache$exists, keys = cache$keys, remove = function(key) NULL, reset = function() NULL, prune = function() NULL, size = cache$size ), class = c("cache_readonly", class(cache)) ) } mr <- cache_readonly_wrap(m) ``` ## Pruning The cache objects provided by cachem have automatic pruning. (Note that pruning is not required by the API, so one could implement an API-compatible cache without pruning.) This section describes how pruning works for `cache_mem()` and `cache_disk()`. When the cache object is created, the maximum size (in bytes) is specified by `max_size`. When the size of objects in the cache exceeds `max_size`, objects will be pruned from the cache. 
When objects are pruned from the cache, which ones are removed is determined by the eviction policy, `evict`: - **`lru`**: The least-recently-used objects will be removed from the cache, until it fits within the limit. This is the default and is appropriate for most cases. - **`fifo`**: The oldest objects will be removed first. It is also possible to set the maximum number of items that can be in the cache, with `max_n`. By default this is set to `Inf`, or no limit. The `max_age` parameter is somewhat different from `max_size` and `max_n`. The latter two set limits on the cache store as a whole, whereas `max_age` sets limits for each individual item; for each item, if its age exceeds `max_age`, then it will be removed from the cache. ## Layered caches Multiple caches can be composed into a single cache, using `cache_layered()`. This can be used to create a multi-level cache. (Note that `cache_layered()` is currently experimental.) For example, we can create a layered cache with a very fast 100MB memory cache and a larger but slower 2GB disk cache: ``` r m <- cache_mem(max_size = 100 * 1024^2) d <- cache_disk(max_size = 2 * 1024^3) cl <- cache_layered(m, d) ``` The layered cache will have the same API, with `$get()`, `$set()`, and so on, so it can be used interchangeably with other caching objects. For this example, we’ll recreate the `cache_layered` with logging enabled, so that it will show cache hits and misses. ``` r cl <- cache_layered(m, d, logfile = stderr()) # Each of the objects generated by rnorm() is about 40 MB cl$set("a", rnorm(5e6)) cl$set("b", rnorm(5e6)) cl$set("c", rnorm(5e6)) # View the objects in each of the component caches m$keys() #> [1] "c" "b" d$keys() #> [1] "a" "b" "c" # The layered cache reports having all keys cl$keys() #> [1] "c" "b" "a" ``` When `$get()` is called, it searches the first cache, and if it’s missing there, it searches the next cache, and so on. If not found in any caches, it returns `key_missing()`. 
``` r # Get object that exists in the memory cache x <- cl$get("c") #> [2020-10-23 13:11:09.985] cache_layered Get: c #> [2020-10-23 13:11:09.985] cache_layered Get from cache_mem... hit # Get object that doesn't exist in the memory cache x <- cl$get("a") #> [2020-10-23 13:13:10.968] cache_layered Get: a #> [2020-10-23 13:13:10.969] cache_layered Get from cache_mem... miss #> [2020-10-23 13:13:11.329] cache_layered Get from cache_disk... hit # Object is not present in any component caches cl$get("d") #> [2020-10-23 13:13:40.197] cache_layered Get: d #> [2020-10-23 13:13:40.197] cache_layered Get from cache_mem... miss #> [2020-10-23 13:13:40.198] cache_layered Get from cache_disk... miss #> <Key Missing> ``` Multiple cache objects can be layered this way. You could even add a cache which uses a remote store, such as a network file system or even AWS S3. cachem/man/0000755000176200001440000000000014107023360012241 5ustar liggesuserscachem/man/cache_layered.Rd0000644000176200001440000000151414107023304015277 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/cache-layered.R \name{cache_layered} \alias{cache_layered} \title{Compose any number of cache objects into a new, layered cache object} \usage{ cache_layered(..., logfile = NULL) } \arguments{ \item{...}{Cache objects to compose into a new, layered cache object.} \item{logfile}{An optional filename or connection object to where logging information will be written. To log to the console, use \code{stderr()} or \code{stdout()}.} } \value{ A layered caching object, with class \code{cache_layered}. } \description{ Note that \code{cache_layered} is currently experimental. 
} \examples{ # Make a layered cache from a small memory cache and large disk cache m <- cache_mem(max_size = 100 * 1024^2) d <- cache_disk(max_size = 2 * 1024^3) cl <- cache_layered(m, d) } cachem/man/cache_disk.Rd0000644000176200001440000002244314107023304014610 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/cache-disk.R \name{cache_disk} \alias{cache_disk} \title{Create a disk cache object} \usage{ cache_disk( dir = NULL, max_size = 1024 * 1024^2, max_age = Inf, max_n = Inf, evict = c("lru", "fifo"), destroy_on_finalize = FALSE, missing = key_missing(), prune_rate = 20, warn_ref_objects = FALSE, logfile = NULL ) } \arguments{ \item{dir}{Directory to store files for the cache. If \code{NULL} (the default) it will create and use a temporary directory.} \item{max_size}{Maximum size of the cache, in bytes. If the cache exceeds this size, cached objects will be removed according to the value of the \code{evict}. Use \code{Inf} for no size limit. The default is 1 gigabyte.} \item{max_age}{Maximum age of files in cache before they are evicted, in seconds. Use \code{Inf} for no age limit.} \item{max_n}{Maximum number of objects in the cache. If the number of objects exceeds this value, then cached objects will be removed according to the value of \code{evict}. Use \code{Inf} for no limit of number of items.} \item{evict}{The eviction policy to use to decide which objects are removed when a cache pruning occurs. Currently, \code{"lru"} and \code{"fifo"} are supported.} \item{destroy_on_finalize}{If \code{TRUE}, then when the cache_disk object is garbage collected, the cache directory and all objects inside of it will be deleted from disk. If \code{FALSE} (the default), it will do nothing when finalized.} \item{missing}{A value to return when \code{get(key)} is called but the key is not present in the cache. The default is a \code{\link[=key_missing]{key_missing()}} object. 
It is actually an expression that is evaluated each time there is a cache miss. See section Missing keys for more information.} \item{prune_rate}{How often to prune the cache. See section Cache Pruning for more information.} \item{warn_ref_objects}{Should a warning be emitted when a reference is stored in the cache? This can be useful because serializing and deserializing a reference object (such as environments and external pointers) can lead to unexpected behavior.} \item{logfile}{An optional filename or connection object to where logging information will be written. To log to the console, use \code{stderr()} or \code{stdout()}.} } \value{ A disk caching object, with class \code{cache_disk}. } \description{ A disk cache object is a key-value store that saves the values as files in a directory on disk. Objects can be stored and retrieved using the \code{get()} and \code{set()} methods. Objects are automatically pruned from the cache according to the parameters \code{max_size}, \code{max_age}, \code{max_n}, and \code{evict}. } \section{Missing keys}{ The \code{missing} parameter controls what happens when \code{get()} is called with a key that is not in the cache (a cache miss). The default behavior is to return a \code{\link[=key_missing]{key_missing()}} object. This is a \emph{sentinel value} that indicates that the key was not present in the cache. You can test if the returned value represents a missing key by using the \code{\link[=is.key_missing]{is.key_missing()}} function. You can also have \code{get()} return a different sentinel value, like \code{NULL}. If you want to throw an error on a cache miss, you can do so by providing an expression for \code{missing}, as in \code{missing = stop("Missing key")}. When the cache is created, you can supply a value for \code{missing}, which sets the default value to be returned for missing values. It can also be overridden when \code{get()} is called, by supplying a \code{missing} argument. 
For example, if you use \code{cache$get("mykey", missing = NULL)}, it will return \code{NULL} if the key is not in the cache. The \code{missing} parameter is actually an expression which is evaluated each time there is a cache miss. A quosure (from the rlang package) can be used. If the expression throws an error, the code that calls \code{get()} should be wrapped with \code{\link[=tryCatch]{tryCatch()}} to gracefully handle missing keys. } \section{Cache pruning}{ Cache pruning occurs when \code{set()} is called, or it can be invoked manually by calling \code{prune()}. The disk cache will throttle the pruning so that it does not happen on every call to \code{set()}, because the filesystem operations for checking the status of files can be slow. Instead, it will prune once in every \code{prune_rate} calls to \code{set()}, or if at least 5 seconds have elapsed since the last prune occurred, whichever comes first. When a pruning occurs, if there are any objects that are older than \code{max_age}, they will be removed. The \code{max_size} and \code{max_n} parameters are applied to the cache as a whole, in contrast to \code{max_age}, which is applied to each object individually. If the number of objects in the cache exceeds \code{max_n}, then objects will be removed from the cache according to the eviction policy, which is set with the \code{evict} parameter. Objects will be removed so that the number of items is \code{max_n}. If the size of the objects in the cache exceeds \code{max_size}, then objects will be removed from the cache. Objects will be removed from the cache so that the total size remains under \code{max_size}. Note that the size is calculated using the size of the files, not the size of disk space used by the files --- these two values can differ because files are stored in blocks on disk. For example, if the block size is 4096 bytes, then a file that is one byte in size will take 4096 bytes on disk. 
Another time that objects can be removed from the cache is when \code{get()} is called. If the target object is older than \code{max_age}, it will be removed and the cache will report it as a missing value. } \section{Eviction policies}{ If \code{max_n} or \code{max_size} are used, then objects will be removed from the cache according to an eviction policy. The available eviction policies are: \describe{ \item{\code{"lru"}}{ Least Recently Used. The least recently used objects will be removed. This uses the filesystem's mtime property. When "lru" is used, each time \code{get()} is called, it will update the file's mtime using \code{\link[=Sys.setFileTime]{Sys.setFileTime()}}. Note that on some platforms, the resolution of \code{\link[=Sys.setFileTime]{Sys.setFileTime()}} may be as coarse as one or two seconds. } \item{\code{"fifo"}}{ First-in-first-out. The oldest objects will be removed. } } Both of these policies use files' mtime. Note that some filesystems (notably FAT) have poor mtime resolution. (atime is not used because support for atime is worse than mtime.) } \section{Sharing among multiple processes}{ The directory for a cache_disk can be shared among multiple R processes. To do this, each R process should have a cache_disk object that uses the same directory. Each cache_disk will do pruning independently of the others, so if they have different pruning parameters, then one cache_disk may remove cached objects before another cache_disk would do so. Even though it is possible for multiple processes to share a cache_disk directory, this should not be done on networked file systems, because the slow performance of networked file systems can cause problems. If you need a high-performance shared cache, you can use one built on a database like Redis, SQLite, MySQL, or similar. When multiple processes share a cache directory, there are some potential race conditions. 
For example, if your code calls \code{exists(key)} to check if an object is in the cache, and then calls \code{get(key)}, the object may be removed from the cache in between those two calls, and \code{get(key)} will throw an error. Instead of calling the two functions, it is better to simply call \code{get(key)}, and check that the returned object is not a \code{key_missing()} object, using \code{is.key_missing()}. This effectively tests for existence and gets the object in one operation. It is also possible for one process to prune objects at the same time that another process is trying to prune objects. If this happens, you may see a warning from \code{file.remove()} failing to remove a file that has already been deleted. } \section{Methods}{ A disk cache object has the following methods: \describe{ \item{\code{get(key, missing)}}{ Returns the value associated with \code{key}. If the key is not in the cache, then it evaluates the expression specified by \code{missing} and returns the value. If \code{missing} is specified here, then it will override the default that was set when the \code{cache_disk} object was created. See section Missing Keys for more information. } \item{\code{set(key, value)}}{ Stores the \code{key}-\code{value} pair in the cache. } \item{\code{exists(key)}}{ Returns \code{TRUE} if the cache contains the key, otherwise \code{FALSE}. } \item{\code{remove(key)}}{ Removes \code{key} from the cache, if it exists in the cache. If the key is not in the cache, this does nothing. } \item{\code{size()}}{ Returns the number of items currently in the cache. } \item{\code{keys()}}{ Returns a character vector of all keys currently in the cache. } \item{\code{reset()}}{ Clears all objects from the cache. } \item{\code{destroy()}}{ Clears all objects in the cache, and removes the cache directory from disk. } \item{\code{prune()}}{ Prunes the cache, using the parameters specified by \code{max_size}, \code{max_age}, \code{max_n}, and \code{evict}. 
} } } cachem/man/reexports.Rd0000644000176200001440000000074214107023304014564 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/reexports.R \docType{import} \name{reexports} \alias{reexports} \alias{key_missing} \alias{is.key_missing} \title{Objects exported from other packages} \keyword{internal} \description{ These objects are imported from other packages. Follow the links below to see their documentation. \describe{ \item{fastmap}{\code{\link[fastmap:key_missing]{is.key_missing}}, \code{\link[fastmap]{key_missing}}} }} cachem/man/cache_mem.Rd0000644000176200001440000001510714107023304014433 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/cache-mem.R \name{cache_mem} \alias{cache_mem} \title{Create a memory cache object} \usage{ cache_mem( max_size = 512 * 1024^2, max_age = Inf, max_n = Inf, evict = c("lru", "fifo"), missing = key_missing(), logfile = NULL ) } \arguments{ \item{max_size}{Maximum size of the cache, in bytes. If the cache exceeds this size, cached objects will be removed according to the value of the \code{evict}. Use \code{Inf} for no size limit. The default is 512 megabytes.} \item{max_age}{Maximum age of objects in the cache before they are evicted, in seconds. Use \code{Inf} for no age limit.} \item{max_n}{Maximum number of objects in the cache. If the number of objects exceeds this value, then cached objects will be removed according to the value of \code{evict}. Use \code{Inf} for no limit of number of items.} \item{evict}{The eviction policy to use to decide which objects are removed when a cache pruning occurs. Currently, \code{"lru"} and \code{"fifo"} are supported.} \item{missing}{A value to return when \code{get(key)} is called but the key is not present in the cache. The default is a \code{\link[=key_missing]{key_missing()}} object. It is actually an expression that is evaluated each time there is a cache miss. 
See section Missing keys for more information.} \item{logfile}{An optional filename or connection object to where logging information will be written. To log to the console, use \code{stderr()} or \code{stdout()}.} } \value{ A memory caching object, with class \code{cache_mem}. } \description{ A memory cache object is a key-value store that saves the values in an environment. Objects can be stored and retrieved using the \code{get()} and \code{set()} methods. Objects are automatically pruned from the cache according to the parameters \code{max_size}, \code{max_age}, \code{max_n}, and \code{evict}. } \details{ In a \code{cache_mem}, R objects are stored directly in the cache; they are \emph{not} serialized before being stored in the cache. This contrasts with other cache types, like \code{\link[=cache_disk]{cache_disk()}}, where objects are serialized, and the serialized object is cached. This can result in some differences of behavior. For example, as long as an object is stored in a cache_mem, it will not be garbage collected. } \section{Missing keys}{ The \code{missing} parameter controls what happens when \code{get()} is called with a key that is not in the cache (a cache miss). The default behavior is to return a \code{\link[=key_missing]{key_missing()}} object. This is a \emph{sentinel value} that indicates that the key was not present in the cache. You can test if the returned value represents a missing key by using the \code{\link[=is.key_missing]{is.key_missing()}} function. You can also have \code{get()} return a different sentinel value, like \code{NULL}. If you want to throw an error on a cache miss, you can do so by providing an expression for \code{missing}, as in \code{missing = stop("Missing key")}. When the cache is created, you can supply a value for \code{missing}, which sets the default value to be returned for missing values. It can also be overridden when \code{get()} is called, by supplying a \code{missing} argument. 
For example, if you use \code{cache$get("mykey", missing = NULL)}, it will return \code{NULL} if the key is not in the cache. The \code{missing} parameter is actually an expression which is evaluated each time there is a cache miss. A quosure (from the rlang package) can be used. If the expression throws an error, the code that calls \code{get()} should be wrapped with \code{\link[=tryCatch]{tryCatch()}} to gracefully handle missing keys. } \section{Cache pruning}{ Cache pruning occurs when \code{set()} is called, or it can be invoked manually by calling \code{prune()}. When a pruning occurs, if there are any objects that are older than \code{max_age}, they will be removed. The \code{max_size} and \code{max_n} parameters are applied to the cache as a whole, in contrast to \code{max_age}, which is applied to each object individually. If the number of objects in the cache exceeds \code{max_n}, then objects will be removed from the cache according to the eviction policy, which is set with the \code{evict} parameter. Objects will be removed so that the number of items is \code{max_n}. If the size of the objects in the cache exceeds \code{max_size}, then objects will be removed from the cache. Objects will be removed from the cache so that the total size remains under \code{max_size}. Note that object sizes are computed with \code{\link[utils:object.size]{object.size()}}, which does not account for objects that share memory, so the reported size of the cache may over-count the amount of memory actually consumed. Another time that objects can be removed from the cache is when \code{get()} is called. If the target object is older than \code{max_age}, it will be removed and the cache will report it as a missing value. } \section{Eviction policies}{ If \code{max_n} or \code{max_size} are used, then objects will be removed from the cache according to an eviction policy. 
The available eviction policies are: \describe{ \item{\code{"lru"}}{ Least Recently Used. The least recently used objects will be removed. } \item{\code{"fifo"}}{ First-in-first-out. The oldest objects will be removed. } } } \section{Methods}{ A memory cache object has the following methods: \describe{ \item{\code{get(key, missing)}}{ Returns the value associated with \code{key}. If the key is not in the cache, then it evaluates the expression specified by \code{missing} and returns the value. If \code{missing} is specified here, then it will override the default that was set when the \code{cache_mem} object was created. See section Missing keys for more information. } \item{\code{set(key, value)}}{ Stores the \code{key}-\code{value} pair in the cache. } \item{\code{exists(key)}}{ Returns \code{TRUE} if the cache contains the key, otherwise \code{FALSE}. } \item{\code{remove(key)}}{ Removes \code{key} from the cache, if it exists in the cache. If the key is not in the cache, this does nothing. } \item{\code{size()}}{ Returns the number of items currently in the cache. } \item{\code{keys()}}{ Returns a character vector of all keys currently in the cache. } \item{\code{reset()}}{ Clears all objects from the cache. } \item{\code{destroy()}}{ Clears all objects in the cache, and makes the cache object unusable. } \item{\code{prune()}}{ Prunes the cache, using the parameters specified by \code{max_size}, \code{max_age}, \code{max_n}, and \code{evict}. } } } cachem/DESCRIPTION0000644000176200001440000000151014107546532013204 0ustar liggesusersPackage: cachem Version: 1.0.6 Title: Cache R Objects with Automatic Pruning Description: Key-value stores with automatic pruning. Caches can limit either their total size or the age of the oldest object (or both), automatically pruning objects to maintain the constraints.
Authors@R: c( person("Winston", "Chang", , "winston@rstudio.com", c("aut", "cre")), person(family = "RStudio", role = c("cph", "fnd"))) License: MIT + file LICENSE Encoding: UTF-8 ByteCompile: true URL: https://cachem.r-lib.org/, https://github.com/r-lib/cachem Imports: rlang, fastmap Suggests: testthat RoxygenNote: 7.1.1 NeedsCompilation: yes Packaged: 2021-08-19 21:07:46 UTC; barret Author: Winston Chang [aut, cre], RStudio [cph, fnd] Maintainer: Winston Chang Repository: CRAN Date/Publication: 2021-08-19 21:30:02 UTC cachem/tests/0000755000176200001440000000000014107023360012630 5ustar liggesuserscachem/tests/testthat/0000755000176200001440000000000014107546532014503 5ustar liggesuserscachem/tests/testthat/helper-utils.R0000644000176200001440000000011614107541531017233 0ustar liggesusersis_on_github_actions <- function() { nzchar(Sys.getenv("GITHUB_ACTIONS")) } cachem/tests/testthat/test-cache-mem.R0000644000176200001440000001665614107541531017432 0ustar liggesuserstime_factor <- 1 # Do things slower on GHA because of slow machines if (is_on_github_actions()) time_factor <- 4 test_that("cache_mem: handling missing values", { d <- cache_mem() expect_true(is.key_missing(d$get("abcd"))) d$set("a", 100) expect_identical(d$get("a"), 100) expect_identical(d$get("y", missing = NULL), NULL) expect_error( d$get("y", missing = stop("Missing key")), "^Missing key$", ) d <- cache_mem(missing = NULL) expect_true(is.null(d$get("abcd"))) d$set("a", 100) expect_identical(d$get("a"), 100) expect_identical(d$get("y", missing = -1), -1) expect_error( d$get("y", missing = stop("Missing key")), "^Missing key$", ) d <- cache_mem(missing = stop("Missing key")) expect_error(d$get("abcd"), "^Missing key$") d$set("x", NULL) d$set("a", 100) expect_identical(d$get("a"), 100) expect_error(d$get("y"), "^Missing key$") expect_identical(d$get("y", missing = NULL), NULL) expect_true(is.key_missing(d$get("y", missing = key_missing()))) expect_error( d$get("y", missing = stop("Missing key 2")), 
"^Missing key 2$", ) # Pass in a quosure expr <- rlang::quo(stop("Missing key")) d <- cache_mem(missing = !!expr) expect_error(d$get("y"), "^Missing key$") expect_error(d$get("y"), "^Missing key$") # Make sure a second time also throws }) test_that("cache_mem: pruning respects max_n", { delay <- 0.001 * time_factor d <- cache_mem(max_n = 3) # NOTE: The short delays after each item are meant to tests more reliable on # CI systems. d$set("a", rnorm(100)); Sys.sleep(delay) d$set("b", rnorm(100)); Sys.sleep(delay) d$set("c", rnorm(100)); Sys.sleep(delay) d$set("d", rnorm(100)); Sys.sleep(delay) d$set("e", rnorm(100)); Sys.sleep(delay) expect_identical(sort(d$keys()), c("c", "d", "e")) }) test_that("cache_mem: pruning respects max_size", { delay <- 0.001 * time_factor d <- cache_mem(max_size = object.size(123) * 3) d$set("a", rnorm(100)); Sys.sleep(delay) d$set("b", rnorm(100)); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("c")) d$set("d", rnorm(100)); Sys.sleep(delay) # Objects are pruned with oldest first, so even though "c" would fit in the # cache, it is removed after adding "d" (and "d" is removed as well because it # doesn't fit). expect_length(d$keys(), 0) d$set("e", 2); Sys.sleep(delay) d$set("f", 3); Sys.sleep(delay) expect_identical(sort(d$keys()), c("e", "f")) }) test_that("cache_mem: max_size=Inf", { mc <- cachem::cache_mem(max_size = Inf) mc$set("a", 123) expect_identical(mc$get("a"), 123) mc$prune() expect_identical(mc$get("a"), 123) }) test_that("cache_mem: pruning respects both max_n and max_size", { delay <- 0.001 * time_factor d <- cache_mem(max_n = 3, max_size = object.size(123) * 3) # Set some values. Use rnorm so that object size is large; a simple vector # like 1:100 will be stored very efficiently by R's ALTREP, and won't exceed # the max_size. We want each of these objects to exceed max_size so that # they'll be pruned. 
d$set("a", rnorm(100)); Sys.sleep(delay) d$set("b", rnorm(100)); Sys.sleep(delay) d$set("c", rnorm(100)); Sys.sleep(delay) d$set("d", rnorm(100)); Sys.sleep(delay) d$set("e", rnorm(100)); Sys.sleep(delay) d$set("f", 1); Sys.sleep(delay) d$set("g", 1); Sys.sleep(delay) d$set("h", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("f", "g", "h")) # This will cause f to be pruned (due to max_n) and g to be pruned (due to # max_size). d$set("i", c(2, 3)); Sys.sleep(0.001) expect_identical(sort(d$keys()), c("h", "i")) }) test_that('cache_mem: pruning with evict="lru"', { delay <- 0.001 * time_factor d <- cache_mem(max_n = 2) d$set("a", 1); Sys.sleep(delay) d$set("b", 1); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "c")) d$get("b") d$set("d", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "d")) d$get("b") d$set("e", 2); Sys.sleep(delay) d$get("b") d$set("f", 3); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "f")) d <- cache_mem(max_n = 2, evict = "lru") d$set("a", 1); Sys.sleep(delay) d$set("b", 1); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) d$set("b", 2); Sys.sleep(delay) d$set("d", 2); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "d")) }) test_that('cache_mem: pruning with evict="fifo"', { delay <- 0.001 * time_factor d <- cache_mem(max_n = 2, evict = "fifo") d$set("a", 1); Sys.sleep(delay) d$set("b", 1); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "c")) d$get("b") d$set("d", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("c", "d")) d$get("b") d$set("e", 2); Sys.sleep(delay) d$get("b") d$set("f", 3); Sys.sleep(delay) expect_identical(sort(d$keys()), c("e", "f")) d <- cache_mem(max_n = 2, evict = "fifo") d$set("a", 1); Sys.sleep(delay) d$set("b", 1); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) d$set("b", 2); Sys.sleep(delay) d$set("d", 2); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "d")) }) 
test_that("Pruning by max_age", { skip_on_cran() # Should prune target item on get() d <- cache_mem(max_age = 0.25*time_factor) d$set("a", 1) expect_identical(d$get("a"), 1) Sys.sleep(0.3*time_factor) expect_identical(d$get("a"), key_missing()) expect_identical(d$get("x"), key_missing()) # Should prune all items on set() d <- cache_mem(max_age = 0.25*time_factor) d$set("a", 1) expect_identical(d$get("a"), 1) Sys.sleep(0.3*time_factor) d$set("b", 1) expect_identical(d$keys(), "b") # Should prune target item on exists() d <- cache_mem(max_age = 0.25*time_factor) d$set("a", 1) expect_identical(d$get("a"), 1) expect_true(d$exists("a")) expect_false(d$exists("b")) Sys.sleep(0.15*time_factor) d$set("b", 1) expect_true(d$exists("a")) expect_true(d$exists("b")) Sys.sleep(0.15*time_factor) expect_false(d$exists("a")) expect_true(d$exists("b")) # Should prune all items on keys() d <- cache_mem(max_age = 0.25*time_factor) d$set("a", 1) expect_identical(d$keys(), "a") Sys.sleep(0.15*time_factor) d$set("b", 1) Sys.sleep(0.15*time_factor) expect_identical(d$keys(), "b") # Should prune all items on size() d <- cache_mem(max_age = 0.25*time_factor) d$set("a", 1) expect_identical(d$size(), 1L) Sys.sleep(0.15*time_factor) d$set("b", 1) expect_identical(d$size(), 2L) Sys.sleep(0.15*time_factor) expect_identical(d$size(), 1L) }) test_that("Removed objects can be GC'd", { mc <- cache_mem() e <- new.env() finalized <- FALSE reg.finalizer(e, function(x) finalized <<- TRUE) mc$set("e", e) rm(e) mc$set("x", 1) gc() expect_false(finalized) expect_true(is.environment(mc$get("e"))) }) test_that("Pruned objects can be GC'd", { delay <- 0.001 * time_factor # Cache is large enough to hold one environment and one number mc <- cache_mem(max_size = object.size(new.env()) + object.size(1234)) e <- new.env() finalized <- FALSE reg.finalizer(e, function(x) finalized <<- TRUE) mc$set("e", e) rm(e) mc$set("x", 1) gc() expect_false(finalized) expect_true(is.environment(mc$get("e"))) # Get x so that the 
atime is updated Sys.sleep(delay) mc$get("x") Sys.sleep(delay) # e should be pruned when we add another item mc$set("y", 2) gc() expect_true(finalized) expect_true(is.key_missing(mc$get("e"))) }) cachem/tests/testthat/test-cache-disk.R0000644000176200001440000001603114107541533017573 0ustar liggesusers cache_disk_deterministic <- function(...) { d <- cache_disk(...) # Normally the throttle counter starts with a random value, but for these # tests we need to make it deterministic. environment(d$set)$prune_throttle_counter_ <- 0 d } test_that("cache_disk: handling missing values", { d <- cache_disk() expect_true(is.key_missing(d$get("abcd"))) d$set("a", 100) expect_identical(d$get("a"), 100) expect_identical(d$get("y", missing = NULL), NULL) expect_error( d$get("y", missing = stop("Missing key")), "^Missing key$", ) d <- cache_disk(missing = NULL) expect_true(is.null(d$get("abcd"))) d$set("a", 100) expect_identical(d$get("a"), 100) expect_identical(d$get("y", missing = -1), -1) expect_error( d$get("y", missing = stop("Missing key")), "^Missing key$", ) d <- cache_disk(missing = stop("Missing key")) expect_error(d$get("abcd"), "^Missing key$") d$set("x", NULL) d$set("a", 100) expect_identical(d$get("a"), 100) expect_error(d$get("y"), "^Missing key$") expect_identical(d$get("y", missing = NULL), NULL) expect_true(is.key_missing(d$get("y", missing = key_missing()))) expect_error( d$get("y", missing = stop("Missing key 2")), "^Missing key 2$", ) # Pass in a quosure expr <- rlang::quo(stop("Missing key")) d <- cache_disk(missing = !!expr) expect_error(d$get("y"), "^Missing key$") expect_error(d$get("y"), "^Missing key$") # Make sure a second time also throws }) test_that("cache_disk: pruning respects max_n", { # Timing is apparently unreliable on CRAN, so skip tests there. It's possible # that a heavily loaded system will have issues with these tests because of # the time resolution. 
skip_on_cran() delay <- 0.01 d <- cache_disk_deterministic(max_n = 3) # NOTE: The short delays after each item are meant to make tests more reliable on # CI systems. d$set("a", rnorm(100)); Sys.sleep(delay) d$set("b", rnorm(100)); Sys.sleep(delay) d$set("c", rnorm(100)); Sys.sleep(delay) d$set("d", rnorm(100)); Sys.sleep(delay) d$set("e", rnorm(100)); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("c", "d", "e")) }) test_that("cache_disk: pruning respects max_size", { skip_on_cran() delay <- 0.01 d <- cache_disk_deterministic(max_size = 200) d$set("a", rnorm(100)); Sys.sleep(delay) d$set("b", rnorm(100)); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("c")) d$set("d", rnorm(100)); Sys.sleep(delay) # Objects are pruned with oldest first, so even though "c" would fit in the # cache, it is removed after adding "d" (and "d" is removed as well because it # doesn't fit). d$prune() expect_length(d$keys(), 0) d$set("e", 2); Sys.sleep(delay) d$set("f", 3); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("e", "f")) }) # Issue shiny#3033 test_that("cache_disk: pruning respects both max_n and max_size", { skip_on_cran() d <- cache_disk_deterministic(max_n = 3, max_size = 200) # Set some values. Use rnorm so that object size is large; a simple vector # like 1:100 will be stored very efficiently by R's ALTREP, and won't exceed # the max_size. We want each of these objects to exceed max_size so that # they'll be pruned. d$set("a", rnorm(100)) d$set("b", rnorm(100)) d$set("c", rnorm(100)) d$set("d", rnorm(100)) d$set("e", rnorm(100)) Sys.sleep(0.1) # For systems that have low mtime resolution. d$set("f", 1) # This object is small and shouldn't be pruned. d$prune() expect_identical(d$keys(), "f") }) # Return TRUE if the Sys.setFileTime() has subsecond resolution, FALSE # otherwise.
setfiletime_has_subsecond_resolution <- function() { tmp <- tempfile() file.create(tmp) Sys.setFileTime(tmp, Sys.time()) time <- as.numeric(file.info(tmp)[['mtime']]) if (time == floor(time)) { return(FALSE) } else { return(TRUE) } } test_that('cache_disk: pruning with evict="lru"', { skip_on_cran() delay <- 0.01 # For lru tests, make sure there's sub-second resolution for # Sys.setFileTime(), because that's what the lru code uses to update times. skip_if_not( setfiletime_has_subsecond_resolution(), "Sys.setFileTime() does not have subsecond resolution on this platform." ) d <- cache_disk_deterministic(max_n = 2) d$set("a", 1); Sys.sleep(delay) d$set("b", 1); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("b", "c")) d$get("b"); Sys.sleep(delay) d$set("d", 1); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("b", "d")) d$get("b"); Sys.sleep(delay) d$set("e", 2); Sys.sleep(delay) d$get("b"); Sys.sleep(delay) d$set("f", 3); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("b", "f")) }) test_that('cache_disk: pruning with evict="fifo"', { skip_on_cran() delay <- 0.01 d <- cache_disk_deterministic(max_n = 2, evict = "fifo") d$set("a", 1); Sys.sleep(delay) d$set("b", 1); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("b", "c")) d$get("b") d$set("d", 1); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("c", "d")) d$get("b") d$set("e", 2); Sys.sleep(delay) d$get("b") d$set("f", 3); Sys.sleep(delay) d$prune() expect_identical(sort(d$keys()), c("e", "f")) }) test_that("cache_disk: pruning throttling", { skip_on_cran() delay <- 0.01 # Pruning won't happen when the number of items is less than prune_rate AND # the set() calls happen within 5 seconds. 
d <- cache_disk_deterministic(max_n = 2, prune_rate = 20) d$set("a", 1); Sys.sleep(delay) d$set("b", 1); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) d$set("d", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("a", "b", "c", "d")) # Pruning will happen with a lower prune_rate value. d <- cache_disk_deterministic(max_n = 2, prune_rate = 3) d$set("a", 1); Sys.sleep(delay) d$set("b", 1); Sys.sleep(delay) d$set("c", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "c")) d$set("d", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "c", "d")) d$set("e", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("b", "c", "d", "e")) d$set("f", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("e", "f")) Sys.sleep(5) d$set("f", 1); Sys.sleep(delay) expect_identical(sort(d$keys()), c("e", "f")) }) test_that("destroy_on_finalize works", { d <- cache_disk(destroy_on_finalize = TRUE) cache_dir <- d$info()$dir expect_true(dir.exists(cache_dir)) rm(d) gc() expect_false(dir.exists(cache_dir)) }) test_that("Warnings for caching reference objects", { d <- cache_disk(warn_ref_objects = TRUE) expect_warning(d$set("a", new.env())) expect_warning(d$set("a", function() NULL)) expect_warning(d$set("a", fastmap())) # fastmap objects contain an external pointer # Default is to not warn on ref objects d <- cache_disk() expect_silent(d$set("a", new.env())) expect_silent(d$set("a", function() NULL)) expect_silent(d$set("a", fastmap())) }) cachem/tests/testthat/test-utils.R0000644000176200001440000000075514107023304016735 0ustar liggesusers test_that("validate_key", { expect_true(validate_key("e")) expect_true(validate_key("abc")) expect_true(validate_key("abcd123-_")) expect_true(validate_key("-")) expect_true(validate_key("_")) expect_error(validate_key("a.b")) expect_error(validate_key("a,b")) expect_error(validate_key("é")) expect_error(validate_key("ABC")) expect_error(validate_key("_A")) expect_error(validate_key("!")) expect_error(validate_key("a 
b")) expect_error(validate_key("ab\n")) }) cachem/tests/testthat.R0000644000176200001440000000007014107023304014606 0ustar liggesuserslibrary(testthat) library(cachem) test_check("cachem") cachem/src/0000755000176200001440000000000014107544042012262 5ustar liggesuserscachem/src/init.c0000644000176200001440000000073314107023304013365 0ustar liggesusers#include #include #include // for NULL #include #include /* .Call calls */ extern SEXP C_validate_key(SEXP); static const R_CallMethodDef CallEntries[] = { {"C_validate_key", (DL_FUNC) &C_validate_key, 1}, {NULL, NULL, 0} }; attribute_visible void R_init_cachem(DllInfo *dll) { R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } cachem/src/cache.c0000644000176200001440000000116014107023304013460 0ustar liggesusers#include #include #include SEXP C_validate_key(SEXP key_r) { if (TYPEOF(key_r) != STRSXP || Rf_length(key_r) != 1) { Rf_error("key must be a one-element character vector"); } SEXP key_c = STRING_ELT(key_r, 0); if (key_c == NA_STRING || Rf_StringBlank(key_c)) { Rf_error("key must be not be \"\" or NA"); } const char* s = R_CHAR(key_c); char cset[] = "1234567890abcdefghijklmnopqrstuvwxyz_-"; int i = strspn(s, cset); if (i != strlen(s)) { Rf_error("Invalid key: %s. Only lowercase letters and numbers are allowed.", s); } return Rf_ScalarLogical(TRUE); } cachem/R/0000755000176200001440000000000014107541531011674 5ustar liggesuserscachem/R/cache-disk.R0000644000176200001440000005217314107541531014022 0ustar liggesusers#' Create a disk cache object #' #' A disk cache object is a key-value store that saves the values as files in a #' directory on disk. Objects can be stored and retrieved using the `get()` and #' `set()` methods. Objects are automatically pruned from the cache according to #' the parameters `max_size`, `max_age`, `max_n`, and `evict`. 
#' #' #' @section Missing keys: #' #' The `missing` parameter controls what happens when `get()` is called with a #' key that is not in the cache (a cache miss). The default behavior is to #' return a [key_missing()] object. This is a *sentinel value* that indicates #' that the key was not present in the cache. You can test if the returned #' value represents a missing key by using the [is.key_missing()] function. #' You can also have `get()` return a different sentinel value, like `NULL`. #' If you want to throw an error on a cache miss, you can do so by providing #' an expression for `missing`, as in `missing = stop("Missing key")`. #' #' When the cache is created, you can supply a value for `missing`, which sets #' the default value to be returned for missing values. It can also be #' overridden when `get()` is called, by supplying a `missing` argument. For #' example, if you use `cache$get("mykey", missing = NULL)`, it will return #' `NULL` if the key is not in the cache. #' #' The `missing` parameter is actually an expression which is evaluated each #' time there is a cache miss. A quosure (from the rlang package) can be used. #' #' If you use this, the code that calls `get()` should be wrapped with #' [tryCatch()] to gracefully handle missing keys. #' #' #' @section Cache pruning: #' #' Cache pruning occurs when `set()` is called, or it can be invoked manually #' by calling `prune()`. #' #' The disk cache will throttle the pruning so that it does not happen on #' every call to `set()`, because the filesystem operations for checking the #' status of files can be slow. Instead, it will prune once in every #' `prune_rate` calls to `set()`, or if at least 5 seconds have elapsed since #' the last prune occurred, whichever is first. #' #' When a pruning occurs, if there are any objects that are older than #' `max_age`, they will be removed. 
#' #' The `max_size` and `max_n` parameters are applied to the cache as a whole, #' in contrast to `max_age`, which is applied to each object individually. #' #' If the number of objects in the cache exceeds `max_n`, then objects will be #' removed from the cache according to the eviction policy, which is set with #' the `evict` parameter. Objects will be removed so that the number of items #' is `max_n`. #' #' If the size of the objects in the cache exceeds `max_size`, then objects #' will be removed from the cache so that the total size remains under #' `max_size`. Note that the size is calculated using the size of the files, #' not the size of disk space used by the files --- these two values can #' differ because files are stored in blocks on disk. For example, if the #' block size is 4096 bytes, then a file that is one byte in size will take #' 4096 bytes on disk. #' #' Another time that objects can be removed from the cache is when `get()` is #' called. If the target object is older than `max_age`, it will be removed #' and the cache will report it as a missing value. #' #' @section Eviction policies: #' #' If `max_n` or `max_size` are used, then objects will be removed from the #' cache according to an eviction policy. The available eviction policies are: #' #' \describe{ #' \item{`"lru"`}{ #' Least Recently Used. The least recently used objects will be removed. #' This uses the filesystem's mtime property. When "lru" is used, each time #' `get()` is called, it will update the file's mtime using #' [Sys.setFileTime()]. Note that on some platforms, the resolution of #' [Sys.setFileTime()] may be low, one or two seconds. #' } #' \item{`"fifo"`}{ #' First-in-first-out. The oldest objects will be removed. #' } #' } #' #' Both of these policies use files' mtime. Note that some filesystems (notably #' FAT) have poor mtime resolution. (atime is not used because support for atime #' is worse than mtime.)
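The size-based pass of the pruning described above can be sketched in base R on hypothetical file metadata (the `info` data frame and `max_size` value below are made up for illustration): order files newest-first by mtime, then drop everything past the point where the cumulative size exceeds the limit, so the oldest files are the ones evicted.

```r
# Sketch of the max_size pruning pass on hypothetical file metadata.
info <- data.frame(
  name  = c("a.rds", "b.rds", "c.rds"),
  size  = c(150, 100, 120),
  mtime = c(1, 2, 3),  # larger = more recently modified
  stringsAsFactors = FALSE
)
max_size <- 250

info <- info[order(info$mtime, decreasing = TRUE), ]  # newest first
keep <- cumsum(info$size) <= max_size                 # running total vs limit
kept <- info$name[keep]                               # oldest files fall off
```

The `prune()` method later in this file follows the same shape, with the complement (`cum_size > max_size_`) selecting the files to remove.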
#' #' #' @section Sharing among multiple processes: #' #' The directory for a cache_disk can be shared among multiple R processes. To #' do this, each R process should have a cache_disk object that uses the same #' directory. Each cache_disk will do pruning independently of the others, so #' if they have different pruning parameters, then one cache_disk may remove #' cached objects before another cache_disk would do so. #' #' Even though it is possible for multiple processes to share a cache_disk #' directory, this should not be done on networked file systems, because the #' slow performance of networked file systems can cause problems. If you need #' a high-performance shared cache, you can use one built on a database like #' Redis, SQLite, mySQL, or similar. #' #' When multiple processes share a cache directory, there are some potential #' race conditions. For example, if your code calls `exists(key)` to check if #' an object is in the cache, and then calls `get(key)`, the object may be #' removed from the cache in between those two calls, and `get(key)` will #' throw an error. Instead of calling the two functions, it is better to #' simply call `get(key)`, and check that the returned object is not a #' `key_missing()` object, using `is.key_missing()`. This effectively tests #' for existence and gets the object in one operation. #' #' It is also possible for one process to prune objects at the same time #' that another process is trying to prune objects. If this happens, you may #' see a warning from `file.remove()` failing to remove a file that has #' already been deleted. #' #' #' @section Methods: #' #' A disk cache object has the following methods: #' #' \describe{ #' \item{`get(key, missing)`}{ #' Returns the value associated with `key`. If the key is not in the #' cache, then it evaluates the expression specified by `missing` and #' returns the value.
If `missing` is specified here, then it will #' override the default that was set when the `cache_disk` object was #' created. See section Missing keys for more information. #' } #' \item{`set(key, value)`}{ #' Stores the `key`-`value` pair in the cache. #' } #' \item{`exists(key)`}{ #' Returns `TRUE` if the cache contains the key, otherwise #' `FALSE`. #' } #' \item{`remove(key)`}{ #' Removes `key` from the cache, if it exists in the cache. If the key is #' not in the cache, this does nothing. #' } #' \item{`size()`}{ #' Returns the number of items currently in the cache. #' } #' \item{`keys()`}{ #' Returns a character vector of all keys currently in the cache. #' } #' \item{`reset()`}{ #' Clears all objects from the cache. #' } #' \item{`destroy()`}{ #' Clears all objects in the cache, and removes the cache directory from #' disk. #' } #' \item{`prune()`}{ #' Prunes the cache, using the parameters specified by `max_size`, #' `max_age`, `max_n`, and `evict`. #' } #' } #' #' @param dir Directory to store files for the cache. If `NULL` (the default) it #' will create and use a temporary directory. #' @param max_age Maximum age of files in cache before they are evicted, in #' seconds. Use `Inf` for no age limit. #' @param max_size Maximum size of the cache, in bytes. If the cache exceeds #' this size, cached objects will be removed according to the value of #' `evict`. Use `Inf` for no size limit. The default is 1 gigabyte. #' @param max_n Maximum number of objects in the cache. If the number of objects #' exceeds this value, then cached objects will be removed according to the #' value of `evict`. Use `Inf` for no limit on the number of items. #' @param evict The eviction policy to use to decide which objects are removed #' when a cache pruning occurs. Currently, `"lru"` and `"fifo"` are supported.
#' @param destroy_on_finalize If `TRUE`, then when the cache_disk object is #' garbage collected, the cache directory and all objects inside of it will be #' deleted from disk. If `FALSE` (the default), it will do nothing when #' finalized. #' @param missing A value to return when `get(key)` is called but the key is not #' present in the cache. The default is a [key_missing()] object. It is #' actually an expression that is evaluated each time there is a cache miss. #' See section Missing keys for more information. #' @param prune_rate How often to prune the cache. See section Cache Pruning for #' more information. #' @param warn_ref_objects Should a warning be emitted when a reference is #' stored in the cache? This can be useful because serializing and #' deserializing a reference object (such as environments and external #' pointers) can lead to unexpected behavior. #' @param logfile An optional filename or connection object to where logging #' information will be written. To log to the console, use `stderr()` or #' `stdout()`. #' #' @return A disk caching object, with class `cache_disk`. #' @importFrom rlang enquo eval_tidy as_quosure #' @export cache_disk <- function( dir = NULL, max_size = 1024 * 1024 ^ 2, max_age = Inf, max_n = Inf, evict = c("lru", "fifo"), destroy_on_finalize = FALSE, missing = key_missing(), prune_rate = 20, warn_ref_objects = FALSE, logfile = NULL ) { # ============================================================================ # Constants # ============================================================================ PRUNE_THROTTLE_TIME_LIMIT <- 5 # ============================================================================ # Logging # ============================================================================ # This needs to be defined first, because it's used in initialization. 
log_ <- function(text) { if (is.null(logfile_)) return() text <- paste0(format(Sys.time(), "[%Y-%m-%d %H:%M:%OS3] cache_disk "), text) cat(text, sep = "\n", file = logfile_, append = TRUE) } # ============================================================================ # Initialization # ============================================================================ if (is.null(dir)) { dir <- tempfile("cache_disk-") } if (!is.numeric(max_size)) stop("max_size must be a number. Use `Inf` for no limit.") if (!is.numeric(max_age)) stop("max_age must be a number. Use `Inf` for no limit.") if (!is.numeric(max_n)) stop("max_n must be a number. Use `Inf` for no limit.") if (!dir.exists(dir)) { # log_(paste0("initialize: Creating ", dir)) dir.create(dir, recursive = TRUE) } logfile_ <- logfile dir_ <- normalizePath(dir) max_size_ <- max_size max_age_ <- max_age max_n_ <- max_n evict_ <- match.arg(evict) destroy_on_finalize_ <- destroy_on_finalize missing_ <- enquo(missing) prune_rate_ <- prune_rate destroyed_ <- FALSE # Start the prune throttle counter with a random number from 0-19. This is # so that, in the case where multiple cache_disk objects that point to the # same directory are created and discarded after just a few uses each, # pruning will still occur. prune_throttle_counter_ <- sample.int(prune_rate_, 1) - 1 prune_last_time_ <- as.numeric(Sys.time()) if (destroy_on_finalize_) { reg.finalizer( environment(), function(e) { e$destroy() } ) } # ============================================================================ # Public methods # ============================================================================ get <- function(key, missing = missing_) { log_(paste0('get: key "', key, '"')) is_destroyed(throw = TRUE) validate_key(key) maybe_prune_single_(key) filename <- key_to_filename_(key) # Instead of calling exists() before fetching the value, just try to # fetch the value. This reduces the risk of a race condition when # multiple processes share a cache. 
read_error <- FALSE tryCatch( { value <- suppressWarnings(readRDS(filename)) if (evict_ == "lru"){ Sys.setFileTime(filename, Sys.time()) } }, error = function(e) { read_error <<- TRUE } ) if (read_error) { log_(paste0('get: key "', key, '" is missing')) missing <- as_quosure(missing) return(eval_tidy(missing)) } log_(paste0('get: key "', key, '" found')) value } set <- function(key, value) { log_(paste0('set: key "', key, '"')) is_destroyed(throw = TRUE) validate_key(key) file <- key_to_filename_(key) temp_file <- paste0(file, "-temp-", random_hex(16)) save_error <- FALSE ref_object <- FALSE tryCatch( { saveRDS(value, file = temp_file, refhook = function(x) { ref_object <<- TRUE NULL } ) file.rename(temp_file, file) }, error = function(e) { save_error <<- TRUE # Unlike file.remove(), unlink() does not raise warning if file does # not exist. unlink(temp_file) } ) if (save_error) { log_(paste0('set: key "', key, '" error')) stop('Error setting value for key "', key, '".') } if (warn_ref_objects && ref_object) { log_(paste0('set: value is a reference object')) warning("A reference object was cached in a serialized format. The restored object may not work as expected.") } prune_throttled_() invisible(TRUE) } exists <- function(key) { is_destroyed(throw = TRUE) validate_key(key) file.exists(key_to_filename_(key)) } # Return all keys in the cache keys <- function() { is_destroyed(throw = TRUE) files <- dir(dir_, "\\.rds$") sub("\\.rds$", "", files) } remove <- function(key) { log_(paste0('remove: key "', key, '"')) is_destroyed(throw = TRUE) validate_key(key) # Remove file; use unlink() instead of file.remove() because it won't # warn if the file doesn't exist. 
unlink(key_to_filename_(key)) invisible(TRUE) } reset <- function() { log_(paste0('reset')) is_destroyed(throw = TRUE) file.remove(dir(dir_, "\\.rds$", full.names = TRUE)) invisible(TRUE) } prune <- function() { # TODO: It would be good to add parameters `n` and `size`, so that the # cache can be pruned to `max_n - n` and `max_size - size` before adding # an object. Right now we prune after adding the object, so the cache # can temporarily grow past the limits. The reason we don't do this now # is because it is expensive to find the size of the serialized object # before adding it. log_('prune') is_destroyed(throw = TRUE) current_time <- Sys.time() filenames <- dir(dir_, "\\.rds$", full.names = TRUE) info <- file.info(filenames, extra_cols = FALSE) info <- info[info$isdir == FALSE, ] info$name <- rownames(info) rownames(info) <- NULL # Files could be removed between the dir() and file.info() calls. The # entire row for such files will have NA values. Remove those rows. info <- info[!is.na(info$size), ] # 1. Remove any files where the age exceeds max age. if (is.finite(max_age_)) { timediff <- as.numeric(current_time - info$mtime, units = "secs") rm_idx <- timediff > max_age_ if (any(rm_idx)) { log_(paste0("prune max_age: Removing ", paste(info$name[rm_idx], collapse = ", "))) rm_success <- file.remove(info$name[rm_idx]) # This maps rm_success back into the TRUEs in the rm_idx vector. # If (for example) rm_idx is c(F,T,F,T,T) and rm_success is c(T,F,T), # then this line modifies rm_idx to be c(F,T,F,F,T). rm_idx[rm_idx] <- rm_success info <- info[!rm_idx, ] } } # Sort objects by priority. The sorting is done in a function which can be # called multiple times but only does the work the first time. info_is_sorted <- FALSE ensure_info_is_sorted <- function() { if (info_is_sorted) return() info <<- info[order(info$mtime, decreasing = TRUE), ] info_is_sorted <<- TRUE } # 2. Remove files if there are too many. 
    if (is.finite(max_n_) && nrow(info) > max_n_) {
      ensure_info_is_sorted()
      rm_idx <- seq_len(nrow(info)) > max_n_
      log_(paste0("prune max_n: Removing ", paste(info$name[rm_idx], collapse = ", ")))
      rm_success <- file.remove(info$name[rm_idx])
      rm_idx[rm_idx] <- rm_success
      info <- info[!rm_idx, ]
    }

    # 3. Remove files if cache is too large.
    if (is.finite(max_size_) && sum(info$size) > max_size_) {
      ensure_info_is_sorted()
      cum_size <- cumsum(info$size)
      rm_idx <- cum_size > max_size_
      log_(paste0("prune max_size: Removing ", paste(info$name[rm_idx], collapse = ", ")))
      rm_success <- file.remove(info$name[rm_idx])
      rm_idx[rm_idx] <- rm_success
      info <- info[!rm_idx, ]
    }

    prune_last_time_ <<- as.numeric(current_time)

    invisible(TRUE)
  }

  size <- function() {
    is_destroyed(throw = TRUE)
    length(dir(dir_, "\\.rds$"))
  }

  info <- function() {
    list(
      dir = dir_,
      max_size = max_size_,
      max_age = max_age_,
      max_n = max_n_,
      evict = evict_,
      destroy_on_finalize = destroy_on_finalize_,
      missing = missing_,
      prune_rate = prune_rate,
      logfile = logfile_,
      prune_throttle_counter = prune_throttle_counter_,
      prune_last_time = as.POSIXct(prune_last_time_, origin = "1970-01-01")
    )
  }

  destroy <- function() {
    if (is_destroyed()) {
      return(invisible(FALSE))
    }
    log_(paste0("destroy: Removing ", dir_))
    # First create a sentinel file so that other processes sharing this
    # cache know that the cache is to be destroyed. This is needed because
    # the recursive unlink is not atomic: another process can add a file to
    # the directory after unlink starts removing files but before it removes
    # the directory, and when that happens, the directory removal will fail.
    file.create(file.path(dir_, "__destroyed__"))
    # Remove all the .rds files. This will not remove the sentinel file.
    file.remove(dir(dir_, "\\.rds$", full.names = TRUE))
    # Next remove dir recursively, including sentinel file.
unlink(dir_, recursive = TRUE) destroyed_ <<- TRUE invisible(TRUE) } is_destroyed <- function(throw = FALSE) { if (!dir.exists(dir_) || file.exists(file.path(dir_, "__destroyed__"))) { # It's possible for another process to destroy a shared cache directory destroyed_ <<- TRUE } if (throw) { if (destroyed_) { stop("Attempted to use cache which has been destroyed:\n ", dir_) } } else { destroyed_ } } # ============================================================================ # Private methods # ============================================================================ key_to_filename_ <- function(key) { validate_key(key) # Additional validation. This 80-char limit is arbitrary, and is # intended to avoid hitting a filename length limit on Windows. if (nchar(key) > 80) { stop("Invalid key: key must have fewer than 80 characters.") } file.path(dir_, paste0(key, ".rds")) } # A wrapper for prune() that throttles it, because prune() can be expensive # due to filesystem operations. This function will prune only once every # `prune_rate` times it is called, or if it has been more than 5 seconds since # the last time the cache was actually pruned, whichever is first. In the # future, the behavior may be customizable. prune_throttled_ <- function() { # Count the number of times prune() has been called. prune_throttle_counter_ <<- prune_throttle_counter_ + 1 if (prune_throttle_counter_ >= prune_rate_ || as.numeric(Sys.time()) - prune_last_time_ > PRUNE_THROTTLE_TIME_LIMIT) { prune() prune_throttle_counter_ <<- 0 } } # Prunes a single object if it exceeds max_age. If the object does not # exceed max_age, or if the object doesn't exist, do nothing. 
maybe_prune_single_ <- function(key) { # obj <- cache_[[key]] # if (is.null(obj)) return() filepath <- file.path(dir_, paste0(key, ".rds")) info <- file.info(filepath, extra_cols = FALSE) if (is.na(info$mtime)) return() timediff <- as.numeric(Sys.time()) - as.numeric(info$mtime) if (timediff > max_age_) { log_(paste0("pruning single object exceeding max_age: Removing ", key)) unlink(filepath) } } # ============================================================================ # Returned object # ============================================================================ structure( list( get = get, set = set, exists = exists, keys = keys, remove = remove, reset = reset, prune = prune, size = size, destroy = destroy, is_destroyed = is_destroyed, info = info ), class = c("cache_disk", "cachem") ) } cachem/R/utils.R0000644000176200001440000000163714107023304013157 0ustar liggesusershex_digits <- c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", "e", "f") random_hex <- function(digits = 16) { paste(sample(hex_digits, digits, replace = TRUE), collapse = "") } dir_remove <- function(path) { for (p in path) { if (!dir.exists(p)) { stop("Cannot remove non-existent directory ", p, ".") } if (length(dir(p, all.files = TRUE, no.. = TRUE)) != 0) { stop("Cannot remove non-empty directory ", p, ".") } result <- unlink(p, recursive = TRUE) if (result == 1) { stop("Error removing directory ", p, ".") } } } absolute_path <- function(path) { norm_path <- normalizePath(path, mustWork = FALSE) if (path == norm_path) { file.path(getwd(), path) } else { norm_path } } validate_key <- function(key) { # This C function does the same as `grepl("[^a-z0-9_-]")`, but faster. .Call(C_validate_key, key) } cachem/R/cache-layered.R0000644000176200001440000000460214107023304014500 0ustar liggesusers#' Compose any number of cache objects into a new, layered cache object #' #' Note that `cache_layered` is currently experimental. #' #' @param ... 
Cache objects to compose into a new, layered cache object. #' @inheritParams cache_disk #' #' @return A layered caching object, with class `cache_layered`. #' @examples #' #' # Make a layered cache from a small memory cache and large disk cache #' m <- cache_mem(max_size = 100 * 1024^2) #' d <- cache_disk(max_size = 2 * 1024^3) #' cl <- cache_layered(m, d) #' @export cache_layered <- function(..., logfile = NULL) { caches <- list(...) logfile_ <- logfile # ============================================================================ # Logging # ============================================================================ # This needs to be defined first, because it's used in initialization. log_ <- function(text) { if (is.null(logfile_)) return() text <- paste0(format(Sys.time(), "[%Y-%m-%d %H:%M:%OS3] cache_layered "), text) cat(text, sep = "\n", file = logfile_, append = TRUE) } get <- function(key) { log_(paste0("Get: ", key)) value <- NULL # Search down the caches for the object for (i in seq_along(caches)) { value <- caches[[i]]$get(key) if (!is.key_missing(value)) { log_(paste0("Get from ", class(caches[[i]])[1], "... hit")) # Set the value in any caches where we searched and missed. for (j in seq_len(i-1)) { caches[[j]]$set(key, value) } break } else { log_(paste0("Get from ", class(caches[[i]])[1], "... 
miss")) } } value } set <- function(key, value) { for (cache in caches) { cache$set(key, value) } } exists <- function(key) { for (cache in caches) { if (cache$exists(key)) { return(TRUE) } } FALSE } keys <- function() { unique(unlist(lapply(caches, function (cache) { cache$keys() }))) } remove <- function(key) { for (cache in caches) { cache$remove(key) } } reset <- function() { for (cache in caches) { cache$reset() } } get_caches <- function() { caches } structure( list( get = get, set = set, exists = exists, keys = keys, remove = remove, reset = reset, get_caches = get_caches ), class = c("cache_layered", "cachem") ) } cachem/R/cache-mem.R0000644000176200001440000005107314107023304013635 0ustar liggesusers#' Create a memory cache object #' #' A memory cache object is a key-value store that saves the values in an #' environment. Objects can be stored and retrieved using the `get()` and #' `set()` methods. Objects are automatically pruned from the cache according to #' the parameters `max_size`, `max_age`, `max_n`, and `evict`. #' #' In a `cache_mem`, R objects are stored directly in the cache; they are not #' *not* serialized before being stored in the cache. This contrasts with other #' cache types, like [cache_disk()], where objects are serialized, and the #' serialized object is cached. This can result in some differences of behavior. #' For example, as long as an object is stored in a cache_mem, it will not be #' garbage collected. #' #' @section Missing keys: #' #' The `missing` parameter controls what happens when `get()` is called with a #' key that is not in the cache (a cache miss). The default behavior is to #' return a [key_missing()] object. This is a *sentinel value* that indicates #' that the key was not present in the cache. You can test if the returned #' value represents a missing key by using the [is.key_missing()] function. #' You can also have `get()` return a different sentinel value, like `NULL`. 
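#'
#' For example, a minimal sketch of the usual cache-miss pattern, where the
#' hypothetical `compute_value()` stands in for some expensive computation:
#'
#' ```r
#' m <- cache_mem()
#' value <- m$get("mykey")
#' if (is.key_missing(value)) {
#'   value <- compute_value()
#'   m$set("mykey", value)
#' }
#' ```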
#' If you want to throw an error on a cache miss, you can do so by providing
#' an expression for `missing`, as in `missing = stop("Missing key")`.
#'
#' When the cache is created, you can supply a value for `missing`, which sets
#' the default value to be returned for missing keys. It can also be
#' overridden when `get()` is called, by supplying a `missing` argument. For
#' example, if you use `cache$get("mykey", missing = NULL)`, it will return
#' `NULL` if the key is not in the cache.
#'
#' The `missing` parameter is actually an expression which is evaluated each
#' time there is a cache miss. A quosure (from the rlang package) can be used.
#'
#' If you use `missing = stop(...)`, the code that calls `get()` should be
#' wrapped with [tryCatch()] to gracefully handle missing keys.
#'
#'
#' @section Cache pruning:
#'
#' Cache pruning occurs when `set()` is called, or it can be invoked manually
#' by calling `prune()`.
#'
#' When a pruning occurs, if there are any objects that are older than
#' `max_age`, they will be removed.
#'
#' The `max_size` and `max_n` parameters are applied to the cache as a whole,
#' in contrast to `max_age`, which is applied to each object individually.
#'
#' If the number of objects in the cache exceeds `max_n`, then objects will be
#' removed from the cache according to the eviction policy, which is set with
#' the `evict` parameter. Objects will be removed so that the number of items
#' is `max_n`.
#'
#' If the size of the objects in the cache exceeds `max_size`, then objects
#' will be removed from the cache so that the total size remains under
#' `max_size`. Note that the size of each object is estimated with
#' [object.size()], which for some kinds of objects only approximates the
#' amount of memory actually used.
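#'
#' For example, with `max_n = 2`, setting a third key triggers pruning of the
#' least recently used entry:
#'
#' ```r
#' m <- cache_mem(max_n = 2)
#' m$set("a", 1)
#' m$set("b", 2)
#' m$set("c", 3)
#' m$keys()  # "a" has been pruned
#' ```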
#'
#' Another time that objects can be removed from the cache is when `get()` is
#' called. If the target object is older than `max_age`, it will be removed
#' and the cache will report it as a missing value.
#'
#' @section Eviction policies:
#'
#' If `max_n` or `max_size` are used, then objects will be removed
#' from the cache according to an eviction policy. The available eviction
#' policies are:
#'
#' \describe{
#'   \item{`"lru"`}{
#'     Least Recently Used. The least recently used objects will be removed.
#'   }
#'   \item{`"fifo"`}{
#'     First-in-first-out. The oldest objects will be removed.
#'   }
#' }
#'
#' @section Methods:
#'
#' A memory cache object has the following methods:
#'
#' \describe{
#'   \item{`get(key, missing)`}{
#'     Returns the value associated with `key`. If the key is not in the
#'     cache, then it evaluates the expression specified by `missing` and
#'     returns the value. If `missing` is specified here, then it will
#'     override the default that was set when the `cache_mem` object was
#'     created. See the Missing keys section for more information.
#'   }
#'   \item{`set(key, value)`}{
#'     Stores the `key`-`value` pair in the cache.
#'   }
#'   \item{`exists(key)`}{
#'     Returns `TRUE` if the cache contains the key, otherwise
#'     `FALSE`.
#'   }
#'   \item{`remove(key)`}{
#'     Removes `key` from the cache, if it exists in the cache. If the key is
#'     not in the cache, this does nothing.
#'   }
#'   \item{`size()`}{
#'     Returns the number of items currently in the cache.
#'   }
#'   \item{`keys()`}{
#'     Returns a character vector of all keys currently in the cache.
#'   }
#'   \item{`reset()`}{
#'     Clears all objects from the cache.
#'   }
#'   \item{`prune()`}{
#'     Prunes the cache, using the parameters specified by `max_size`,
#'     `max_age`, `max_n`, and `evict`.
#'   }
#' }
#'
#' @inheritParams cache_disk
#'
#' @return A memory caching object, with class `cache_mem`.
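#'
#' @examples
#' # A sketch of basic usage:
#' m <- cache_mem(max_n = 10)
#' m$set("abc", 1:10)
#' m$get("abc")
#' m$exists("abc")
#' m$remove("abc")
#' is.key_missing(m$get("abc"))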
#' @importFrom utils object.size #' @export cache_mem <- function( max_size = 512 * 1024 ^ 2, max_age = Inf, max_n = Inf, evict = c("lru", "fifo"), missing = key_missing(), logfile = NULL) { # ============================================================================ # Constants # ============================================================================ DEBUG <- TRUE INITIAL_SIZE <- 64L COMPACT_LIMIT <- 256L COMPACT_MULT <- 2 # If TRUE, the data will be kept in the correct atime (for lru) or mtime (for # fifo) order each time get() or set() is called, though the metadata log will # grow by one entry each time (it will also occasionally be compacted). If # FALSE, the metadata entry will be kept in place (so the metadata log won't # grow as quickly), but the atimes/mtimes will not be kept in order; instead, # the metadata will be sorted by atime/mtime each time prune() is called (and # prune() is called by set()). The overall behavior is the same, but there are # somewhat different performance characteristics. The tradeoff is either # growing the log for every get() (and needing to occasionally compact it), or # having to sort it every time set() is called. Sorting data of a reasonable # size (up to around 1e5) is fast in R. For larger numbers of items it may be # better to set this to TRUE. MAINTAIN_TIME_SORT <- FALSE # ============================================================================ # Initialization # ============================================================================ if (!is.numeric(max_size)) stop("max_size must be a number. Use `Inf` for no limit.") if (!is.numeric(max_age)) stop("max_age must be a number. Use `Inf` for no limit.") if (!is.numeric(max_n)) stop("max_n must be a number. 
Use `Inf` for no limit.") max_size_ <- max_size max_age_ <- max_age max_n_ <- max_n evict_ <- match.arg(evict) missing_ <- enquo(missing) logfile_ <- logfile PRUNE_BY_SIZE <- is.finite(max_size_) PRUNE_BY_AGE <- is.finite(max_age_) PRUNE_BY_N <- is.finite(max_n_) # ============================================================================ # Internal state # ============================================================================ # The keys, values, and metadata are stored in columnar format. The vectors # key_, value_, size_, mtime_, and atime_ are the columns. Separate vectors # are used instead of a data frame, because operations for modifying and # growing vectors are much faster than the same operations on data frames. # # It uses a column-first format because a row-first format is much slower for # doing the manipulations and computations that are needed for pruning, such # as sorting by atime, and calculating a cumulative sum of sizes. # # For fast get() performance, there is also key_idx_map_, which maps between # the key, and the "row" index in our "data frame". # # An older version of this code stored the value along with metadata (size, # mtime, and atime) in a fastmap object, but this had poor performance for # pruning operations. This is because, for pruning, it needs to fetch the # metadata for all objects, then sort by atime (if evict="lru"), then take a # cumulative sum of sizes. Fetching the metadata for all objects was slow, as # was converting the resulting row-first data into column-first data. The # current column-first approach is much, much faster. 
key_idx_map_ <- fastmap() key_ <- rep_len(NA_character_, INITIAL_SIZE) value_ <- vector("list", INITIAL_SIZE) size_ <- rep_len(NA_real_, INITIAL_SIZE) mtime_ <- rep_len(NA_real_, INITIAL_SIZE) atime_ <- rep_len(NA_real_, INITIAL_SIZE) total_n_ <- 0L # Total number of items total_size_ <- 0 # Total number of bytes used last_idx_ <- 0L # Most recent (and largest) index used # ============================================================================ # Public methods # ============================================================================ get <- function(key, missing = missing_) { log_(paste0('get: key "', key, '"')) validate_key(key) idx <- key_idx_map_$get(key) if (is.null(idx)) { log_(paste0('get: key "', key, '" is missing')) missing <- as_quosure(missing) return(eval_tidy(missing)) } # Prunes a single object if it exceeds max_age. If the object does not # exceed max_age, or if the object doesn't exist, do nothing. if (PRUNE_BY_AGE) { time <- as.numeric(Sys.time()) if (time - mtime_[idx] > max_age_) { log_(paste0("pruning single object exceeding max_age: Removing ", key)) remove_(key) missing <- as_quosure(missing) return(eval_tidy(missing)) } } log_(paste0('get: key "', key, '" found')) # Get the value before updating atime, because that can move items around # when MAINTAIN_TIME_SORT is TRUE. value <- value_[[idx]] update_atime_(key) value } set <- function(key, value) { log_(paste0('set: key "', key, '"')) validate_key(key) time <- as.numeric(Sys.time()) if (PRUNE_BY_SIZE) { # Reported size is rough! See ?object.size. size <- as.numeric(object.size(value)) total_size_ <<- total_size_ + size } else { size <- NA_real_ } old_idx <- key_idx_map_$get(key) # We'll set this to TRUE if we need to append to the data; FALSE if we can # modify the existing entry in place. append <- NULL if (!is.null(old_idx)) { # If there's an existing entry with this key, clear out its row, because # we'll be appending a new one later. 
if (PRUNE_BY_SIZE) { total_size_ <<- total_size_ - size_[old_idx] } if (MAINTAIN_TIME_SORT && old_idx != last_idx_) { append <- TRUE key_ [old_idx] <<- NA_character_ value_[old_idx] <<- list(NULL) size_ [old_idx] <<- NA_real_ mtime_[old_idx] <<- NA_real_ atime_[old_idx] <<- NA_real_ } else { append <- FALSE } } else { append <- TRUE total_n_ <<- total_n_ + 1L } if (append) { # If we're appending, update the last_idx_ and use it for storage. This # assign past the end of the vector. As of R 3.4, this grows the vector in # place if possible, and is generally very fast, because vectors are # allocated with extra memory at the end. For older versions of R, this # can be very slow because a copy of the whole vector must be made each # time. last_idx_ <<- last_idx_ + 1L key_idx_map_$set(key, last_idx_) new_idx <- last_idx_ } else { # Not appending; replace the old item in place. new_idx <- old_idx } key_ [new_idx] <<- key value_[[new_idx]] <<- value size_ [new_idx] <<- size mtime_[new_idx] <<- time atime_[new_idx] <<- time prune() invisible(TRUE) } exists <- function(key) { validate_key(key) if (PRUNE_BY_AGE) { # Prunes a single object if it exceeds max_age. This code path looks a bit # complicated for what it does, but this is for performance. idx <- key_idx_map_$get(key) if (is.null(idx)) { return(FALSE) } time <- as.numeric(Sys.time()) if (time - mtime_[idx] > max_age_) { log_(paste0("pruning single object exceeding max_age: Removing ", key)) remove_(key) return(FALSE) } return(TRUE) } else { key_idx_map_$has(key) } } keys <- function() { if (PRUNE_BY_AGE) { # When there's no max_age, pruning is only needed when set() is called, # because that's the only way for max_n or max_size to be exceeded. But # when there is a max_age, we might need to prune here simply because time # has passed. (This could be made faster by having an option to prune() to # only prunes by age (and not by n or size). It could also avoid sorting # the metadata.) 
      prune()
    }
    key_idx_map_$keys()
  }

  remove <- function(key) {
    log_(paste0('remove: key "', key, '"'))
    validate_key(key)
    remove_(key)
    invisible(TRUE)
  }

  reset <- function() {
    log_(paste0('reset'))
    key_idx_map_$reset()
    # Also reset the metadata columns and the bookkeeping counters. If these
    # were left in place, total_n_ and total_size_ would no longer match the
    # (now empty) key_idx_map_, and the sanity checks in prune() and size()
    # would fail.
    key_   <<- rep_len(NA_character_, INITIAL_SIZE)
    value_ <<- vector("list", INITIAL_SIZE)
    size_  <<- rep_len(NA_real_, INITIAL_SIZE)
    mtime_ <<- rep_len(NA_real_, INITIAL_SIZE)
    atime_ <<- rep_len(NA_real_, INITIAL_SIZE)
    total_n_    <<- 0L
    total_size_ <<- 0
    last_idx_   <<- 0L
    invisible(TRUE)
  }

  prune <- function() {
    log_(paste0('prune'))

    # Quick check to see if we need to prune
    if ((!PRUNE_BY_SIZE || total_size_ <= max_size_) &&
        (!PRUNE_BY_N    || total_n_    <= max_n_) &&
        (!PRUNE_BY_AGE)) {
      return(invisible(TRUE))
    }

    info <- get_metadata_()

    if (DEBUG) {
      # Sanity checks
      if (PRUNE_BY_SIZE && sum(info$size) != total_size_) {
        stop("Size mismatch")
      }
      if (length(info$key) != total_n_) {
        stop("Count mismatch")
      }
    }

    # 1. Remove any objects where the age exceeds max age.
    if (PRUNE_BY_AGE) {
      time <- as.numeric(Sys.time())
      timediff <- time - info$mtime
      rm_idx <- timediff > max_age_
      if (any(rm_idx)) {
        log_(paste0("prune max_age: Removing ", paste(info$key[rm_idx], collapse = ", ")))
        remove_(info$key[rm_idx])
        # Trim all the vectors (need to do each individually since we're using a
        # list of vectors instead of a data frame, for performance).
        info$key   <- info$key  [!rm_idx]
        info$size  <- info$size [!rm_idx]
        info$mtime <- info$mtime[!rm_idx]
        info$atime <- info$atime[!rm_idx]
      }
    }

    # 2. Remove objects if there are too many.
    if (PRUNE_BY_N && length(info$key) > max_n_) {
      rm_idx <- seq_along(info$key) > max_n_
      log_(paste0("prune max_n: Removing ", paste(info$key[rm_idx], collapse = ", ")))
      remove_(info$key[rm_idx])
      info$key   <- info$key  [!rm_idx]
      info$size  <- info$size [!rm_idx]
      info$mtime <- info$mtime[!rm_idx]
      info$atime <- info$atime[!rm_idx]
    }

    # 3. Remove objects if cache is too large.
    if (PRUNE_BY_SIZE && sum(info$size) > max_size_) {
      cum_size <- cumsum(info$size)
      rm_idx <- cum_size > max_size_
      log_(paste0("prune max_size: Removing ", paste(info$key[rm_idx], collapse = ", ")))
      remove_(info$key[rm_idx])
      # No need to trim vectors this time, since this is the last pruning step.
} invisible(TRUE) } size <- function() { if (PRUNE_BY_AGE) { # See note in exists() about why we prune here. prune() } if (DEBUG) { if (key_idx_map_$size() != total_n_) stop("n mismatch") } total_n_ } info <- function() { list( max_size = max_size_, max_age = max_age_, max_n = max_n_, evict = evict_, missing = missing_, logfile = logfile_ ) } # ============================================================================ # Private methods # ============================================================================ # Called when get() with lru. If fifo, no need to update. update_atime_ <- function(key) { if (evict_ != "lru") return() idx <- key_idx_map_$get(key) time <- as.numeric(Sys.time()) if (is.null(idx)) { stop("Can't update atime because entry doesn't exist") } if (MAINTAIN_TIME_SORT) { if (idx == last_idx_) { # last_idx_ entry; simply update time atime_[idx] <<- time } else { # "Move" this entry to the end. last_idx_ <<- last_idx_ + 1L # Add new entry to end. Fast on R 3.4 and above, slow on older versions. 
key_idx_map_$set(key, last_idx_) key_ [last_idx_] <<- key value_[[last_idx_]] <<- value_[[idx]] size_ [last_idx_] <<- size_ [idx] mtime_[last_idx_] <<- mtime_[idx] atime_[last_idx_] <<- time # Clear out old entry key_ [idx] <<- NA_character_ value_[idx] <<- list(NULL) size_ [idx] <<- NA_real_ mtime_[idx] <<- NA_real_ atime_[idx] <<- NA_real_ } } else { atime_[idx] <<- time } } remove_ <- function(keys) { if (length(keys) == 1) { remove_one_(keys) } else { vapply(keys, remove_one_, TRUE) } if (last_idx_ > COMPACT_LIMIT && last_idx_ > total_n_ * COMPACT_MULT) { compact_() } } remove_one_ <- function(key) { idx <- key_idx_map_$get(key) if (is.null(idx)) { return() } # Overall n and size bookkeeping total_n_ <<- total_n_ - 1L if (PRUNE_BY_SIZE) { total_size_ <<- total_size_ - size_[idx] } # Clear out entry key_ [idx] <<- NA_character_ value_[idx] <<- list(NULL) size_ [idx] <<- NA_real_ mtime_[idx] <<- NA_real_ atime_[idx] <<- NA_real_ key_idx_map_$remove(key) } compact_ <- function() { from_idxs <- key_[seq_len(last_idx_)] from_idxs <- !is.na(from_idxs) from_idxs <- which(from_idxs) if (DEBUG) stopifnot(total_n_ == length(from_idxs)) if (total_n_ == 0L) { message("nothing to compact") return() } new_size <- ceiling(total_n_ * COMPACT_MULT) # Allocate new vectors for metadata. new_key_ <- rep_len(NA_character_, new_size) new_value_ <- vector("list", new_size) new_size_ <- rep_len(NA_real_, new_size) new_mtime_ <- rep_len(NA_real_, new_size) new_atime_ <- rep_len(NA_real_, new_size) # Copy (and compact, removing gaps) from old vectors to new ones. to_idxs <- seq_len(total_n_) new_key_ [to_idxs] <- key_ [from_idxs] new_value_[to_idxs] <- value_[from_idxs] new_size_ [to_idxs] <- size_ [from_idxs] new_mtime_[to_idxs] <- mtime_[from_idxs] new_atime_[to_idxs] <- atime_[from_idxs] # Replace old vectors with new ones. key_ <<- new_key_ value_ <<- new_value_ size_ <<- new_size_ mtime_ <<- new_mtime_ atime_ <<- new_atime_ # Update the index values in the key-index map. 
args <- to_idxs names(args) <- key_[to_idxs] key_idx_map_$mset(.list = args) last_idx_ <<- total_n_ } # Returns data frame of info, with gaps removed. # If evict=="lru", this will be sorted by atime. # If evict=="fifo", this will be sorted by mtime. get_metadata_ <- function() { idxs <- !is.na(mtime_[seq_len(last_idx_)]) idxs <- which(idxs) if (!MAINTAIN_TIME_SORT) { if (evict_ == "lru") { idxs <- idxs[order(atime_[idxs])] } else { idxs <- idxs[order(mtime_[idxs])] } } idxs <- rev(idxs) # Return a list -- this basically same structure as a data frame, but # we're using a plain list to avoid data frame slowness list( key = key_ [idxs], size = size_ [idxs], mtime = mtime_[idxs], atime = atime_[idxs] ) } log_ <- function(text) { if (is.null(logfile_)) return() text <- paste0(format(Sys.time(), "[%Y-%m-%d %H:%M:%OS3] cache_mem "), text) cat(text, sep = "\n", file = logfile_, append = TRUE) } # ============================================================================ # Returned object # ============================================================================ structure( list( get = get, set = set, exists = exists, keys = keys, remove = remove, reset = reset, prune = prune, size = size, info = info ), class = c("cache_mem", "cachem") ) } cachem/R/cachem.R0000644000176200001440000000145014107023304013230 0ustar liggesusers#' @export format.cachem <- function(x, ...) { paste0( paste0("<", class(x), ">", collapse= " "), "\n", " Methods:\n", paste0( " ", format_methods(x), collapse ="\n" ) ) } format_methods <- function(x) { vapply(seq_along(x), function(i) { name <- names(x)[i] f <- x[[i]] if (is.function(f)) { paste0(name, "(", format_args(f), ")") } else { name } }, character(1) ) } format_args <- function(x) { nms <- names(formals(x)) vals <- as.character(formals(x)) args <- mapply(nms, vals, FUN = function(name, value) { if (value == "") { name } else { paste0(name, " = ", value) } }) paste(args, collapse = ", ") } #' @export print.cachem <- function(x, ...) 
{ cat(format(x, ...)) } cachem/R/reexports.R0000644000176200001440000000021714107023304014043 0ustar liggesusers#' @importFrom fastmap key_missing #' @export fastmap::key_missing #' @importFrom fastmap is.key_missing #' @export fastmap::is.key_missing cachem/R/cachem-package.R0000644000176200001440000000012714107023304014621 0ustar liggesusers#' @docType package #' @useDynLib cachem, .registration = TRUE #' @import fastmap NULL cachem/NEWS.md0000644000176200001440000000145714107541531012600 0ustar liggesuserscachem 1.0.6 ============ * Fixed #14: Fix off-by-one error when checking pruning throttling counter for `cache_disk`. (#15) * Closed #13: Added documentation for the `remove()` method. cachem 1.0.5 ============ * `cache_mem()` and `cache_disk()` now allow `-` and `_` (hyphen and underscore) characters in the keys. (#9) * `cache_disk()` previously did not correctly throttle pruning. (#11) cachem 1.0.4 ============ * More pruning speed enhancements for `cache_mem()`. (#7) cachem 1.0.3 ============ * Addressed issues with timing-sensitive tests. cachem 1.0.2 ============ * Closed #4: Sped up pruning for `cache_mem`. (#5) * Fixed `cache_mem` pruning with `evict="lru"`. cachem 1.0.1 ============ * Fixed function declaration of `C_validate_key`. cachem 1.0.0 ============ * First CRAN release. 
cachem/MD50000644000176200001440000000221714107546532012013 0ustar liggesusers732692340772d863e430e644b82989b9 *DESCRIPTION 9dcc09d5adf966f50ec6e33aaf1a9299 *LICENSE d04ed310118f0f6793fd21dded64e62c *NAMESPACE 175ed916f234e16bf10f2ea2fc657347 *NEWS.md 5e1f6ed0ae9252a9dad6fb7df855590f *R/cache-disk.R bb0319525b8ff1b5979368ad765dad80 *R/cache-layered.R 2858c6a9409c07b9189d5d6d44cc6cb7 *R/cache-mem.R 9dd00dc2cbe812c13282751505d0e72d *R/cachem-package.R 0ed84916ca8a387d314b047c0b4b1693 *R/cachem.R 104501b9d33ae7b6c81f6313b1b4d892 *R/reexports.R a8c662a3f4f120ee3ef0035d6a7d8158 *R/utils.R 6ee30df63a8375ff84b1dca5008625b4 *README.md 395536841e0ba469aeafa970dd1e7e9b *man/cache_disk.Rd 9b827b902a8ab730b6b7c38be1827de6 *man/cache_layered.Rd 609c4700c54d68222e7c1a72ee7c926a *man/cache_mem.Rd f0451920126d89d9838750df0aa6e54e *man/reexports.Rd 4694a015e14c6a71dd56d7169e9b9ebe *src/cache.c 2979e6fdec726696d457d35a2e0b6cd4 *src/init.c d47ea68deaedae789dac89cd959c4b33 *tests/testthat.R 5a77e6915380c656616b5ea696b33a87 *tests/testthat/helper-utils.R cd27292f59b9f5aea4f48b01c5df0748 *tests/testthat/test-cache-disk.R 2b890cecde4d322822c716cfa4c9b930 *tests/testthat/test-cache-mem.R 7ccaa54736c7dced1b066edf76022572 *tests/testthat/test-utils.R