triebeard/inst/doc/r_radix.html

Radix trees in R

Oliver Keyes

2016-08-03

A radix tree, or trie, is a data structure that stores key-value pairs in a way optimised for searching. This makes tries very good at efficiently matching data against keys and retrieving the values associated with those keys.

triebeard provides an implementation of tries for R (and one that can be used in Rcpp development, too, if that’s your thing) so that useRs can take advantage of the fast, efficient and user-friendly matching that they allow.

Radix usage

Suppose we have observations in a dataset that are labelled with a 2-3 letter code identifying the facility the sample came from:

labels <- c("AO-1002", "AEO-1004", "AAI-1009", "AFT-1403", "QZ-9065", "QZ-1021", "RF-0901",
            "AO-1099", "AFT-1101", "QZ-4933")

We know the facility each code maps to, and we want to be able to map the labels to that - not over 10 entries but over hundreds, or thousands, or hundreds of thousands. Tries are a great way of doing that: we treat the codes as keys and the full facility names as values. So let’s make a trie to do this matching, and then, well, match:

library(triebeard)
trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"),
             values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh"))

longest_match(trie = trie, to_match = labels)

 [1] "Audobon"    "Atlanta"    "Ann Arbor"  "Austin"     "Queensland" "Queensland" "Raleigh"    "Audobon"    "Austin"    
[10] "Queensland"

This pulls out, for each label, the trie value where the associated key has the longest prefix-match to the label. We can also just grab all the values where the key starts with, say, A:

prefix_match(trie = trie, to_match = "A")

[[1]]
[1] "Ann Arbor" "Atlanta"   "Austin"    "Audobon"  

And finally if we want we can match very, very fuzzily using “greedy” matching:

greedy_match(trie = trie, to_match = "AO")

[[1]]
[1] "Ann Arbor" "Atlanta"   "Austin"    "Audobon"  

These operations are very, very efficient. If we use longest-match as an example, since that’s the most useful thing, with a one-million element vector of things to match against:

library(triebeard)
library(microbenchmark)

trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"),
             values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh"))

labels <- rep(c("AO-1002", "AEO-1004", "AAI-1009", "AFT-1403", "QZ-9065", "QZ-1021", "RF-0901",
                "AO-1099", "AFT-1101", "QZ-4933"), 100000)

microbenchmark({longest_match(trie = trie, to_match = labels)})

Unit: milliseconds
                                                  expr      min       lq     mean   median       uq      max neval
 {     longest_match(trie = trie, to_match = labels) } 284.6457 285.5902 289.5342 286.8775 288.4564 327.3878   100

I think we can call <300 milliseconds for a million matches against an entire set of possible values pretty fast.

Radix modification

There’s always the possibility that (horror of horrors) you’ll have to add or remove entries from the trie. Fear not; you can do just that with trie_add and trie_remove respectively, both of which silently modify the trie they’re provided with to add or remove whatever key-value pairs you provide:

to_match <- "198.0.0.1"
trie_inst <- trie(keys = "197", values = "fake range")

longest_match(trie_inst, to_match)
[1] NA

trie_add(trie_inst, keys = "198", values = "home range")
longest_match(trie_inst, to_match)
[1] "home range"

trie_remove(trie_inst, keys = "198")
longest_match(trie_inst, to_match)
[1] NA

Metadata and coercion

You can also extract information from tries without matching against them. dim, str, print and length all work for tries, and you can use get_keys(trie) and get_values(trie) to extract, respectively, the keys and values from a trie object.

You can also coerce tries into other R data structures, specifically lists and data.frames:

trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"),
             values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh"))

str(as.data.frame(trie))
'data.frame':   6 obs. of  2 variables:
 $ keys  : chr  "AAI" "AEO" "AFT" "AO" ...
 $ values: chr  "Ann Arbor" "Atlanta" "Austin" "Audobon" ...

str(as.list(trie))

List of 2
 $ keys  : chr [1:6] "AAI" "AEO" "AFT" "AO" ...
 $ values: chr [1:6] "Ann Arbor" "Atlanta" "Austin" "Audobon" ...

Other trie operations

If you have ideas for other trie-like structures, or functions that would be useful with these tries, the best approach is to either request them (https://github.com/Ironholds/triebeard/issues) or add them (https://github.com/Ironholds/triebeard/pulls)!

triebeard/inst/doc/r_radix.Rmd0000644000176200001440000001174012750461372016043 0ustar liggesusers--- title: "Radix trees in R" author: "Oliver Keyes" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Radix trees in R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- A **radix tree**, or **trie**, is a data structure optimised for storing key-value pairs in a way optimised for searching. This makes them very, very good for efficiently matching data against keys, and retrieving the values *associated* with those keys. `triebeard` provides an implementation of tries for R (and one that can be used in Rcpp development, too, if that's your thing) so that useRs can take advantage of the fast, efficient and user-friendly matching that they allow. ## Radix usage Suppose we have observations in a dataset that are labelled, with a 2-3 letter code that identifies the facility the sample came from: ```{r, eval=FALSE} labels <- c("AO-1002", "AEO-1004", "AAI-1009", "AFT-1403", "QZ-9065", "QZ-1021", "RF-0901", "AO-1099", "AFT-1101", "QZ-4933") ``` We know the facility each code maps to, and we want to be able to map the labels to that - not over 10 entries but over hundreds, or thousands, or hundreds *of* thousands. Tries are a great way of doing that: we treat the codes as *keys* and the full facility names as *values*. So let's make a trie to do this matching, and then, well, match: ```{r, eval=FALSE} library(triebeard) trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"), values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh")) longest_match(trie = trie, to_match = labels) [1] "Audobon" "Atlanta" "Ann Arbor" "Austin" "Queensland" "Queensland" "Raleigh" "Audobon" "Austin" [10] "Queensland" ```` This pulls out, for each label, the trie value where the associated key has the longest prefix-match to the label. 
We can also just grab all the values where the key starts with, say, A: ```{r, eval=FALSE} prefix_match(trie = trie, to_match = "A") [[1]] [1] "Ann Arbor" "Atlanta" "Austin" "Audobon" ``` And finally if we want we can match very, very fuzzily using "greedy" matching: ```{r, eval=FALSE} greedy_match(trie = trie, to_match = "AO") [[1]] [1] "Ann Arbor" "Atlanta" "Austin" "Audobon" ``` These operations are very, very efficient. If we use longest-match as an example, since that's the most useful thing, with a one-million element vector of things to match against: ```{r, eval=FALSE} library(triebeard) library(microbenchmark) trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"), values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh")) labels <- rep(c("AO-1002", "AEO-1004", "AAI-1009", "AFT-1403", "QZ-9065", "QZ-1021", "RF-0901", "AO-1099", "AFT-1101", "QZ-4933"), 100000) microbenchmark({longest_match(trie = trie, to_match = labels)}) Unit: milliseconds expr min lq mean median uq max neval { longest_match(trie = trie, to_match = labels) } 284.6457 285.5902 289.5342 286.8775 288.4564 327.3878 100 ``` I think we can call <300 milliseconds for a million matches against an entire set of possible values pretty fast. ## Radix modification There's always the possibility that (horror of horrors) you'll have to add or remove entries from the trie. 
Fear not; you can do just that with `trie_add` and `trie_remove` respectively, both of which silently modify the trie they're provided with to add or remove whatever key-value pairs you provide: ```{r, eval=FALSE} to_match = "198.0.0.1" trie_inst <- trie(keys = "197", values = "fake range") longest_match(trie_inst, to_match) [1] NA trie_add(trie_inst, keys = "198", values = "home range") longest_match(trie_inst, to_match) [1] "home range" trie_remove(trie_inst, keys = "198") longest_match(trie_inst, to_match) [1] NA ``` ## Metadata and coercion You can also extract information from tries without using them. `dim`, `str`, `print` and `length` all work for tries, and you can use `get_keys(trie)` and `get_values(trie)` to extract, respectively, the keys and values from a trie object. In addition, you can also coerce tries into other R data structures, specifically lists and data.frames: ```{r, eval=FALSE} trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"), values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh")) str(as.data.frame(trie)) 'data.frame': 6 obs. of 2 variables: $ keys : chr "AAI" "AEO" "AFT" "AO" ... $ values: chr "Ann Arbor" "Atlanta" "Austin" "Audobon" ... str(as.list(trie)) List of 2 $ keys : chr [1:6] "AAI" "AEO" "AFT" "AO" ... $ values: chr [1:6] "Ann Arbor" "Atlanta" "Austin" "Audobon" ... ``` ### Other trie operations If you have ideas for other trie-like structures, or functions that would be useful with *these* tries, the best approach is to either [request it](https://github.com/Ironholds/triebeard/issues) or [add it](https://github.com/Ironholds/triebeard/pulls)! 
triebeard/inst/doc/rcpp_radix.R0000644000176200001440000000255712750461372016233 0ustar liggesusers## ---- eval=FALSE, engine="Rcpp"------------------------------------------ ## //[[Rcpp::depends(triebeard)]] ## #include ## ---- eval=FALSE, engine="Rcpp"------------------------------------------ ## radix_tree radix; ## ---- eval=FALSE, engine="Rcpp"------------------------------------------ ## radix_tree radix; ## radix["turnin"] = "entry the first"; ## radix["turin"] = "entry the second"; ## ## radix_tree::iterator it; ## ## it = radix.longest_match("turing"); ## ## if(it = radix.end()){ ## printf("No match was found :("); ## } else { ## std::string result = "Key of longest match: " + it->first + " , value of longest match: " + it->second; ## } ## ---- eval=FALSE, engine="Rcpp"------------------------------------------ ## radix_tree radix; ## radix["turnin"] = "entry the first"; ## radix["turin"] = "entry the second"; ## ## std::vector::iterator> vec; ## std::vector::iterator>::iterator it; ## ## it = radix.prefix_match("tur"); ## ## if(it == vec.end()){ ## printf("No match was found :("); ## } else { ## for (it = vec.begin(); it != vec.end(); ++it) { ## std::string result = "Key of a prefix match: " + it->first + " , value of a prefix match: " + it->second; ## } ## } triebeard/inst/doc/rcpp_radix.html0000644000176200001440000003113412750461372016767 0ustar liggesusers Radix trees in Rcpp

Radix trees in Rcpp

Oliver Keyes

2016-08-03

A radix tree is a data structure that stores key-value pairs in a way optimised for searching. This makes radix trees very good at efficiently matching data against keys and retrieving the values associated with those keys.

triebeard provides an implementation of radix trees for Rcpp (and also for use directly in R). To start using radix trees in your Rcpp development, simply modify your C++ file to include at the top:

//[[Rcpp::depends(triebeard)]]
#include <radix.h>

Constructing trees

Trees are constructed using the syntax:

radix_tree<type1, type2> radix;

Where type1 represents the type of the keys (for example, std::string) and type2 the type of the values.

Radix trees can have any scalar type as keys, although strings are most typical; they can also have any scalar type for values. Once you’ve constructed a tree, new entries can be added in a very R-like way: radix[new_key] = new_value;. Entries can also be removed, with radix.erase(key).

Matching against trees

We then move on to the fun bit: matching! As mentioned, radix trees are really good for matching arbitrary values against keys (well, keys of the same type) and retrieving the associated values.

There are three types of supported matching: longest, prefix, and greedy. Longest does exactly what it says on the tin: it finds the key-value pair whose key is the longest prefix of the value being matched:

radix_tree<std::string, std::string> radix;
radix["turnin"] = "entry the first";
radix["turin"] = "entry the second";

radix_tree<std::string, std::string>::iterator it;

it = radix.longest_match("turing");

if(it == radix.end()){
  printf("No match was found :(");
} else {
  std::string result = "Key of longest match: " + it->first + " , value of longest match: " + it->second;
}
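
To make the meaning of "longest match" concrete, here is a minimal sketch that reproduces the same result with a plain std::map rather than a radix tree. It is not how triebeard implements the lookup, and longest_prefix_value is a hypothetical helper name used only for illustration; it simply tries ever-shorter prefixes of the query until one of them is a stored key.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper, not part of triebeard: returns true and writes to
// `out` the value whose key is the longest prefix of `query`.
bool longest_prefix_value(const std::map<std::string, std::string>& table,
                          const std::string& query, std::string& out) {
  // Try the whole query first, then ever-shorter prefixes of it.
  for (std::string::size_type len = query.size(); len > 0; --len) {
    std::map<std::string, std::string>::const_iterator it =
        table.find(query.substr(0, len));
    if (it != table.end()) {
      out = it->second;
      return true;
    }
  }
  return false;  // no stored key is a prefix of the query
}
```

With the two entries used above, a query of "turing" finds the key "turin" (the longest stored key that "turing" starts with), while a query such as "xyz" finds nothing. A radix tree reaches the same answer by walking the query once instead of doing repeated lookups.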

Prefix matching provides all trie entries where the value-to-match is a prefix of the key:

radix_tree<std::string, std::string> radix;
radix["turnin"] = "entry the first";
radix["turin"] = "entry the second";

std::vector<radix_tree<std::string, std::string>::iterator> vec;
std::vector<radix_tree<std::string, std::string>::iterator>::iterator it;

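// prefix_match() returns nothing; it fills vec with iterators to the matching entries
radix.prefix_match("tur", vec);

if(vec.empty()){
  printf("No match was found :(");
} else {
  for (it = vec.begin(); it != vec.end(); ++it) {
    // *it is a radix_tree iterator, so dereference it before using ->
    std::string result = "Key of a prefix match: " + (*it)->first + " , value of a prefix match: " + (*it)->second;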
  }
}
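
The semantics of prefix matching can also be sketched without the radix tree. The following is an illustration only, using an ordered std::map and a hypothetical helper named keys_with_prefix; it is not triebeard's implementation. Because the map is sorted, all keys that start with the prefix sit next to each other, beginning at lower_bound(prefix).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of triebeard: collects every key in `table`
// that starts with `prefix`, in sorted order.
std::vector<std::string> keys_with_prefix(
    const std::map<std::string, std::string>& table,
    const std::string& prefix) {
  std::vector<std::string> out;
  // First key that is not less than the prefix; matching keys are contiguous from here.
  std::map<std::string, std::string>::const_iterator it = table.lower_bound(prefix);
  // Stop at the first key that no longer begins with the prefix.
  for (; it != table.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
    out.push_back(it->first);
  }
  return out;
}
```

For the two entries above, a prefix of "tur" yields both keys, "turin" and "turnin", and a prefix that no key starts with yields an empty vector, which is the case the vec.empty() check guards against.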

Greedy matching matches very, very fuzzily (a value of ‘bring’, for example, will match ‘blind’, ‘bind’ and ‘binary’) and, syntactically, looks exactly the same as prefix-matching, albeit with radix.greedy_match() instead of radix.prefix_match().

Other trie things

If you have ideas for other trie-like structures, or functions that would be useful with these tries, the best approach is to either request them (https://github.com/Ironholds/triebeard/issues) or add them (https://github.com/Ironholds/triebeard/pulls)!

triebeard/inst/doc/rcpp_radix.Rmd0000644000176200001440000000636012750461372016550 0ustar liggesusers--- title: "Radix trees in Rcpp" author: "Oliver Keyes" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Radix trees in Rcpp} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- A **radix tree** is a data structure optimised for storing key-value pairs in a way optimised for searching. This makes them very, very good for efficiently matching data against keys, and retrieving the values *associated* with those keys. `triebeard` provides an implementation of radix trees for Rcpp (and also for use directly in R). To start using radix trees in your Rcpp development, simply modify your C++ file to include at the top: ```{Rcpp, eval=FALSE} //[[Rcpp::depends(triebeard)]] #include ``` ## Constructing trees Trees are constructed using the syntax: ```{Rcpp, eval=FALSE} radix_tree radix; ``` Where `type` represents the type of the keys (for example, `std::string`) and `type2` the type of the values. Radix trees can have any scalar type as keys, although strings are most typical; they can also have any scalar type for values. Once you've constructed a tree, new entries can be added in a very R-like way: `radix[new_key] = new_value;`. Entries can also be removed, with `radix.erase(key)`. ## Matching against trees We then move on to the fun bit: matching! As mentioned, radix trees are really good for matching arbitrary values against keys (well, keys of the same type) and retrieving the associated values. There are three types of supported matching; longest, prefix, and greedy. 
Longest does exactly what it says on the tin: it finds the key-value pair where the longest initial part of the key matches the arbitrary value: ```{Rcpp, eval=FALSE} radix_tree radix; radix["turnin"] = "entry the first"; radix["turin"] = "entry the second"; radix_tree::iterator it; it = radix.longest_match("turing"); if(it = radix.end()){ printf("No match was found :("); } else { std::string result = "Key of longest match: " + it->first + " , value of longest match: " + it->second; } ``` Prefix matching provides all trie entries where the value-to-match is a *prefix* of the key: ```{Rcpp, eval=FALSE} radix_tree radix; radix["turnin"] = "entry the first"; radix["turin"] = "entry the second"; std::vector::iterator> vec; std::vector::iterator>::iterator it; it = radix.prefix_match("tur"); if(it == vec.end()){ printf("No match was found :("); } else { for (it = vec.begin(); it != vec.end(); ++it) { std::string result = "Key of a prefix match: " + it->first + " , value of a prefix match: " + it->second; } } ``` Greedy matching matches very, very fuzzily (a value of 'bring', for example, will match 'blind', 'bind' and 'binary') and, syntactically, looks exactly the same as prefix-matching, albeit with `radix.greedy_match()` instead of `radix.prefix_match()`. ### Other trie things If you have ideas for other trie-like structures, or functions that would be useful with *these* tries, the best approach is to either [request it](https://github.com/Ironholds/triebeard/issues) or [add it](https://github.com/Ironholds/triebeard/pulls)! 
triebeard/inst/doc/r_radix.R0000644000176200001440000000545712750461372015532 0ustar liggesusers## ---- eval=FALSE--------------------------------------------------------- # labels <- c("AO-1002", "AEO-1004", "AAI-1009", "AFT-1403", "QZ-9065", "QZ-1021", "RF-0901", # "AO-1099", "AFT-1101", "QZ-4933") ## ---- eval=FALSE--------------------------------------------------------- # library(triebeard) # trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"), # values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh")) # # longest_match(trie = trie, to_match = labels) # # [1] "Audobon" "Atlanta" "Ann Arbor" "Austin" "Queensland" "Queensland" "Raleigh" "Audobon" "Austin" # [10] "Queensland" ## ---- eval=FALSE--------------------------------------------------------- # prefix_match(trie = trie, to_match = "A") # # [[1]] # [1] "Ann Arbor" "Atlanta" "Austin" "Audobon" ## ---- eval=FALSE--------------------------------------------------------- # greedy_match(trie = trie, to_match = "AO") # # [[1]] # [1] "Ann Arbor" "Atlanta" "Austin" "Audobon" ## ---- eval=FALSE--------------------------------------------------------- # library(triebeard) # library(microbenchmark) # # trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"), # values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh")) # # labels <- rep(c("AO-1002", "AEO-1004", "AAI-1009", "AFT-1403", "QZ-9065", "QZ-1021", "RF-0901", # "AO-1099", "AFT-1101", "QZ-4933"), 100000) # # microbenchmark({longest_match(trie = trie, to_match = labels)}) # # Unit: milliseconds # expr min lq mean median uq max neval # { longest_match(trie = trie, to_match = labels) } 284.6457 285.5902 289.5342 286.8775 288.4564 327.3878 100 ## ---- eval=FALSE--------------------------------------------------------- # to_match = "198.0.0.1" # trie_inst <- trie(keys = "197", values = "fake range") # # longest_match(trie_inst, to_match) # [1] NA # # trie_add(trie_inst, keys = "198", values = "home 
range") # longest_match(trie_inst, to_match) # [1] "home range" # # trie_remove(trie_inst, keys = "198") # longest_match(trie_inst, to_match) # [1] NA ## ---- eval=FALSE--------------------------------------------------------- # trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"), # values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh")) # # str(as.data.frame(trie)) # 'data.frame': 6 obs. of 2 variables: # $ keys : chr "AAI" "AEO" "AFT" "AO" ... # $ values: chr "Ann Arbor" "Atlanta" "Austin" "Audobon" ... # # str(as.list(trie)) # # List of 2 # $ keys : chr [1:6] "AAI" "AEO" "AFT" "AO" ... # $ values: chr [1:6] "Ann Arbor" "Atlanta" "Austin" "Audobon" ... triebeard/inst/include/0000755000176200001440000000000012742550046014620 5ustar liggesuserstriebeard/inst/include/radix.h0000644000176200001440000000004012742550046016072 0ustar liggesusers#include "radix/radix_tree.hpp" triebeard/inst/include/radix/0000755000176200001440000000000012750473341015730 5ustar liggesuserstriebeard/inst/include/radix/radix_tree.hpp0000644000176200001440000003017412742550046020573 0ustar liggesusers#ifndef RADIX_TREE_HPP #define RADIX_TREE_HPP #include #include #include #include #include "radix_tree_it.hpp" #include "radix_tree_node.hpp" template K radix_substr(const K &key, int begin, int num); template<> inline std::string radix_substr(const std::string &key, int begin, int num) { return key.substr(begin, num); } template K radix_join(const K &key1, const K &key2); template<> inline std::string radix_join(const std::string &key1, const std::string &key2) { return key1 + key2; } template int radix_length(const K &key); template<> inline int radix_length(const std::string &key) { return key.size(); } template class radix_tree { public: typedef K key_type; typedef T mapped_type; typedef std::pair value_type; typedef radix_tree_it iterator; typedef std::size_t size_type; radix_tree() : m_size(0), m_root(NULL) { } ~radix_tree() { delete m_root; } size_type 
size() const { return m_size; } bool empty() const { return m_size == 0; } void clear() { delete m_root; m_root = NULL; m_size = 0; } iterator find(const K &key); iterator begin(); iterator end(); std::pair insert(const value_type &val); bool erase(const K &key); void erase(iterator it); void prefix_match(const K &key, std::vector &vec); void greedy_match(const K &key, std::vector &vec); iterator longest_match(const K &key); T& operator[] (const K &lhs); private: size_type m_size; radix_tree_node* m_root; radix_tree_node* begin(radix_tree_node *node); radix_tree_node* find_node(const K &key, radix_tree_node *node, int depth); radix_tree_node* append(radix_tree_node *parent, const value_type &val); radix_tree_node* prepend(radix_tree_node *node, const value_type &val); void greedy_match(radix_tree_node *node, std::vector &vec); radix_tree(const radix_tree& other); // delete radix_tree& operator =(const radix_tree other); // delete }; template void radix_tree::prefix_match(const K &key, std::vector &vec) { vec.clear(); if (m_root == NULL) return; radix_tree_node *node; K key_sub1, key_sub2; node = find_node(key, m_root, 0); if (node->m_is_leaf) node = node->m_parent; int len = radix_length(key) - node->m_depth; key_sub1 = radix_substr(key, node->m_depth, len); key_sub2 = radix_substr(node->m_key, 0, len); if (key_sub1 != key_sub2) return; greedy_match(node, vec); } template typename radix_tree::iterator radix_tree::longest_match(const K &key) { if (m_root == NULL) return iterator(NULL); radix_tree_node *node; K key_sub; node = find_node(key, m_root, 0); if (node->m_is_leaf) return iterator(node); key_sub = radix_substr(key, node->m_depth, radix_length(node->m_key)); if (! 
(key_sub == node->m_key)) node = node->m_parent; K nul = radix_substr(key, 0, 0); while (node != NULL) { typename radix_tree_node::it_child it; it = node->m_children.find(nul); if (it != node->m_children.end() && it->second->m_is_leaf) return iterator(it->second); node = node->m_parent; } return iterator(NULL); } template typename radix_tree::iterator radix_tree::end() { return iterator(NULL); } template typename radix_tree::iterator radix_tree::begin() { radix_tree_node *node; if (m_root == NULL) node = NULL; else node = begin(m_root); return iterator(node); } template radix_tree_node* radix_tree::begin(radix_tree_node *node) { if (node->m_is_leaf) return node; assert(!node->m_children.empty()); return begin(node->m_children.begin()->second); } template T& radix_tree::operator[] (const K &lhs) { iterator it = find(lhs); if (it == end()) { std::pair val; val.first = lhs; std::pair ret; ret = insert(val); assert(ret.second == true); it = ret.first; } return it->second; } template void radix_tree::greedy_match(const K &key, std::vector &vec) { radix_tree_node *node; vec.clear(); if (m_root == NULL) return; node = find_node(key, m_root, 0); if (node->m_is_leaf) node = node->m_parent; greedy_match(node, vec); } template void radix_tree::greedy_match(radix_tree_node *node, std::vector &vec) { if (node->m_is_leaf) { vec.push_back(iterator(node)); return; } typename std::map*>::iterator it; for (it = node->m_children.begin(); it != node->m_children.end(); ++it) { greedy_match(it->second, vec); } } template void radix_tree::erase(iterator it) { erase(it->first); } template bool radix_tree::erase(const K &key) { if (m_root == NULL) return 0; radix_tree_node *child; radix_tree_node *parent; radix_tree_node *grandparent; K nul = radix_substr(key, 0, 0); child = find_node(key, m_root, 0); if (! 
child->m_is_leaf) return 0; parent = child->m_parent; parent->m_children.erase(nul); delete child; m_size--; if (parent == m_root) return 1; if (parent->m_children.size() > 1) return 1; if (parent->m_children.empty()) { grandparent = parent->m_parent; grandparent->m_children.erase(parent->m_key); delete parent; } else { grandparent = parent; } if (grandparent == m_root) { return 1; } if (grandparent->m_children.size() == 1) { // merge grandparent with the uncle typename std::map*>::iterator it; it = grandparent->m_children.begin(); radix_tree_node *uncle = it->second; if (uncle->m_is_leaf) return 1; uncle->m_depth = grandparent->m_depth; uncle->m_key = radix_join(grandparent->m_key, uncle->m_key); uncle->m_parent = grandparent->m_parent; grandparent->m_children.erase(it); grandparent->m_parent->m_children.erase(grandparent->m_key); grandparent->m_parent->m_children[uncle->m_key] = uncle; delete grandparent; } return 1; } template radix_tree_node* radix_tree::append(radix_tree_node *parent, const value_type &val) { int depth; int len; K nul = radix_substr(val.first, 0, 0); radix_tree_node *node_c, *node_cc; depth = parent->m_depth + radix_length(parent->m_key); len = radix_length(val.first) - depth; if (len == 0) { node_c = new radix_tree_node(val); node_c->m_depth = depth; node_c->m_parent = parent; node_c->m_key = nul; node_c->m_is_leaf = true; parent->m_children[nul] = node_c; return node_c; } else { node_c = new radix_tree_node(val); K key_sub = radix_substr(val.first, depth, len); parent->m_children[key_sub] = node_c; node_c->m_depth = depth; node_c->m_parent = parent; node_c->m_key = key_sub; node_cc = new radix_tree_node(val); node_c->m_children[nul] = node_cc; node_cc->m_depth = depth + len; node_cc->m_parent = node_c; node_cc->m_key = nul; node_cc->m_is_leaf = true; return node_cc; } } template radix_tree_node* radix_tree::prepend(radix_tree_node *node, const value_type &val) { int count; int len1, len2; len1 = radix_length(node->m_key); len2 = 
radix_length(val.first) - node->m_depth; for (count = 0; count < len1 && count < len2; count++) { if (! (node->m_key[count] == val.first[count + node->m_depth]) ) break; } assert(count != 0); node->m_parent->m_children.erase(node->m_key); radix_tree_node *node_a = new radix_tree_node; node_a->m_parent = node->m_parent; node_a->m_key = radix_substr(node->m_key, 0, count); node_a->m_depth = node->m_depth; node_a->m_parent->m_children[node_a->m_key] = node_a; node->m_depth += count; node->m_parent = node_a; node->m_key = radix_substr(node->m_key, count, len1 - count); node->m_parent->m_children[node->m_key] = node; K nul = radix_substr(val.first, 0, 0); if (count == len2) { radix_tree_node *node_b; node_b = new radix_tree_node(val); node_b->m_parent = node_a; node_b->m_key = nul; node_b->m_depth = node_a->m_depth + count; node_b->m_is_leaf = true; node_b->m_parent->m_children[nul] = node_b; return node_b; } else { radix_tree_node *node_b, *node_c; node_b = new radix_tree_node; node_b->m_parent = node_a; node_b->m_depth = node->m_depth; node_b->m_key = radix_substr(val.first, node_b->m_depth, len2 - count); node_b->m_parent->m_children[node_b->m_key] = node_b; node_c = new radix_tree_node(val); node_c->m_parent = node_b; node_c->m_depth = radix_length(val.first); node_c->m_key = nul; node_c->m_is_leaf = true; node_c->m_parent->m_children[nul] = node_c; return node_c; } } template std::pair::iterator, bool> radix_tree::insert(const value_type &val) { if (m_root == NULL) { K nul = radix_substr(val.first, 0, 0); m_root = new radix_tree_node; m_root->m_key = nul; } radix_tree_node *node = find_node(val.first, m_root, 0); if (node->m_is_leaf) { return std::pair(node, false); } else if (node == m_root) { m_size++; return std::pair(append(m_root, val), true); } else { m_size++; int len = radix_length(node->m_key); K key_sub = radix_substr(val.first, node->m_depth, len); if (key_sub == node->m_key) { return std::pair(append(node, val), true); } else { return 
std::pair(prepend(node, val), true); } } } template typename radix_tree::iterator radix_tree::find(const K &key) { if (m_root == NULL) return iterator(NULL); radix_tree_node *node = find_node(key, m_root, 0); // if the node is a internal node, return NULL if (! node->m_is_leaf) return iterator(NULL); return iterator(node); } template radix_tree_node* radix_tree::find_node(const K &key, radix_tree_node *node, int depth) { if (node->m_children.empty()) return node; typename radix_tree_node::it_child it; int len_key = radix_length(key) - depth; for (it = node->m_children.begin(); it != node->m_children.end(); ++it) { if (len_key == 0) { if (it->second->m_is_leaf) return it->second; else continue; } if (! it->second->m_is_leaf && key[depth] == it->first[0] ) { int len_node = radix_length(it->first); K key_sub = radix_substr(key, depth, len_node); if (key_sub == it->first) { return find_node(key, it->second, depth+len_node); } else { return it->second; } } } return node; } #endif // RADIX_TREE_HPP triebeard/inst/include/radix/radix_tree_node.hpp0000644000176200001440000000242612742550046021577 0ustar liggesusers#ifndef RADIX_TREE_NODE_HPP #define RADIX_TREE_NODE_HPP #include template class radix_tree_node { friend class radix_tree; friend class radix_tree_it; typedef std::pair value_type; typedef typename std::map* >::iterator it_child; private: radix_tree_node() : m_children(), m_parent(NULL), m_value(NULL), m_depth(0), m_is_leaf(false), m_key() { } radix_tree_node(const value_type &val); radix_tree_node(const radix_tree_node&); // delete radix_tree_node& operator=(const radix_tree_node&); // delete ~radix_tree_node(); std::map*> m_children; radix_tree_node *m_parent; value_type *m_value; int m_depth; bool m_is_leaf; K m_key; }; template radix_tree_node::radix_tree_node(const value_type &val) : m_children(), m_parent(NULL), m_value(NULL), m_depth(0), m_is_leaf(false), m_key() { m_value = new value_type(val); } template radix_tree_node::~radix_tree_node() { it_child it; 
for (it = m_children.begin(); it != m_children.end(); ++it) { delete it->second; } delete m_value; } #endif // RADIX_TREE_NODE_HPP triebeard/inst/include/radix/radix_tree_it.hpp0000644000176200001440000000576212742550046021274 0ustar liggesusers#ifndef RADIX_TREE_IT #define RADIX_TREE_IT #include // forward declaration template class radix_tree; template class radix_tree_node; template class radix_tree_it : public std::iterator > { friend class radix_tree; public: radix_tree_it() : m_pointee(0) { } radix_tree_it(const radix_tree_it& r) : m_pointee(r.m_pointee) { } radix_tree_it& operator=(const radix_tree_it& r) { m_pointee = r.m_pointee; return *this; } ~radix_tree_it() { } std::pair& operator* () const; std::pair* operator-> () const; const radix_tree_it& operator++ (); radix_tree_it operator++ (int); // const radix_tree_it& operator-- (); bool operator!= (const radix_tree_it &lhs) const; bool operator== (const radix_tree_it &lhs) const; private: radix_tree_node *m_pointee; radix_tree_it(radix_tree_node *p) : m_pointee(p) { } radix_tree_node* increment(radix_tree_node* node) const; radix_tree_node* descend(radix_tree_node* node) const; }; template radix_tree_node* radix_tree_it::increment(radix_tree_node* node) const { radix_tree_node* parent = node->m_parent; if (parent == NULL) return NULL; typename radix_tree_node::it_child it = parent->m_children.find(node->m_key); assert(it != parent->m_children.end()); ++it; if (it == parent->m_children.end()) return increment(parent); else return descend(it->second); } template radix_tree_node* radix_tree_it::descend(radix_tree_node* node) const { if (node->m_is_leaf) return node; typename radix_tree_node::it_child it = node->m_children.begin(); assert(it != node->m_children.end()); return descend(it->second); } template std::pair& radix_tree_it::operator* () const { return *m_pointee->m_value; } template std::pair* radix_tree_it::operator-> () const { return m_pointee->m_value; } template bool radix_tree_it::operator!= 
(const radix_tree_it<K, T> &lhs) const
{
    return m_pointee != lhs.m_pointee;
}

template <typename K, typename T>
bool radix_tree_it<K, T>::operator== (const radix_tree_it<K, T> &lhs) const
{
    return m_pointee == lhs.m_pointee;
}

template <typename K, typename T>
const radix_tree_it<K, T>& radix_tree_it<K, T>::operator++ ()
{
    if (m_pointee != NULL) // it is undefined behaviour to dereference iterator that is out of bounds...
        m_pointee = increment(m_pointee);
    return *this;
}

template <typename K, typename T>
radix_tree_it<K, T> radix_tree_it<K, T>::operator++ (int)
{
    radix_tree_it<K, T> copy(*this);
    ++(*this);
    return copy;
}

#endif // RADIX_TREE_IT
triebeard/tests/0000755000176200001440000000000012742550046013362 5ustar liggesusers
triebeard/tests/testthat.R0000644000176200001440000000007612742550046015350 0ustar liggesusers
library(testthat)
library(triebeard)

test_check("triebeard")
triebeard/tests/testthat/0000755000176200001440000000000012750473341015223 5ustar liggesusers
triebeard/tests/testthat/test_get.R0000644000176200001440000000200712742550046017162 0ustar liggesusers
context("Test key and value retrieval")

testthat::test_that("String keys and values can be retrieved", {
  string_trie <- trie(LETTERS, LETTERS)
  testthat::expect_equal(LETTERS, get_values(string_trie))
  testthat::expect_equal(LETTERS, get_keys(string_trie))
})

testthat::test_that("Integer keys and values can be retrieved", {
  int_trie <- trie(LETTERS, 1:length(LETTERS))
  testthat::expect_equal(1:length(LETTERS), get_values(int_trie))
  testthat::expect_equal(LETTERS, get_keys(int_trie))
})

testthat::test_that("Numeric keys and values can be retrieved", {
  vals <- as.double(1:length(LETTERS))
  double_trie <- trie(LETTERS, vals)
  testthat::expect_equal(vals, get_values(double_trie))
  testthat::expect_equal(LETTERS, get_keys(double_trie))
})

testthat::test_that("Boolean keys and values can be retrieved", {
  vals <- as.logical(rep(c(0,1), (length(LETTERS)/2)))
  bool_trie <- trie(LETTERS, vals)
  testthat::expect_equal(vals, get_values(bool_trie))
  testthat::expect_equal(LETTERS, get_keys(bool_trie))
})
triebeard/tests/testthat/test_convert.R0000644000176200001440000000144212742550046020065 0ustar liggesusers
testthat::context("Test conversion of tries into other R objects")

testthat::test_that("Tries can be turned into lists", {
  trie_inst <- trie("foo", "bar")
  trlist <- as.list(trie_inst)
  testthat::expect_true(is.list(trlist))
  testthat::expect_equal(length(trlist), 2)
  testthat::expect_equal(names(trlist), c("keys", "values"))
  testthat::expect_equal(trlist$values, "bar")
  testthat::expect_equal(trlist$keys, "foo")
})

testthat::test_that("Tries can be turned into data frames", {
  trie_inst <- trie("foo", "bar")
  trlist <- as.data.frame(trie_inst)
  testthat::expect_true(is.data.frame(trlist))
  testthat::expect_equal(ncol(trlist), 2)
  testthat::expect_equal(names(trlist), c("keys", "values"))
  testthat::expect_equal(trlist$values, "bar")
  testthat::expect_equal(trlist$keys, "foo")
})
triebeard/tests/testthat/test_create.R0000644000176200001440000000201512742550046017645 0ustar liggesusers
context("Test trie creation")

testthat::test_that("String tries can be created and safely avoid collection", {
  string_trie <- trie(LETTERS, LETTERS)
  testthat::expect_true(any(class(string_trie) == "string_trie"))
  testthat::expect_true(any(class(string_trie) == "trie"))
})

testthat::test_that("Integer tries can be created", {
  int_trie <- trie(LETTERS, 1:length(LETTERS))
  testthat::expect_true(any(class(int_trie) == "integer_trie"))
  testthat::expect_true(any(class(int_trie) == "trie"))
})

testthat::test_that("Double tries can be created", {
  vals <- as.double(1:length(LETTERS))
  double_trie <- trie(LETTERS, vals)
  testthat::expect_true(any(class(double_trie) == "numeric_trie"))
  testthat::expect_true(any(class(double_trie) == "trie"))
})

testthat::test_that("Logical tries can be created", {
  vals <- as.logical(rep(c(0,1), (length(LETTERS)/2)))
  bool_trie <- trie(LETTERS, vals)
  testthat::expect_true(any(class(bool_trie) == "logical_trie"))
  testthat::expect_true(any(class(bool_trie) == "trie"))
})
triebeard/tests/testthat/test_prefix.R0000644000176200001440000000422412742550046017703 0ustar liggesusers
testthat::context("Test that prefix-matching works")

testthat::test_that("prefix matching works for string tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c("afford", "affair", "available", "binary", "bind", "blind"))
  output <- prefix_match(trie, "bin")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(all(output[[1]] == c("binary", "bind")))
})

testthat::test_that("prefix matching works for integer tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c(1, 2, 3, 4, 5, 6))
  output <- prefix_match(trie, "bin")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(all(output[[1]] == c(4, 5)))
})

testthat::test_that("prefix matching works for numeric tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = as.numeric(c(1, 2, 3, 4, 5, 6)))
  output <- prefix_match(trie, "bin")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(all(output[[1]] == c(4, 5)))
})

testthat::test_that("prefix matching works for logical tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE))
  output <- prefix_match(trie, "bin")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(all(output[[1]] == c(FALSE, TRUE)))
})

testthat::test_that("prefix matching objects to non-trie objects", {
  expect_error(prefix_match("foo", "bar"))
})

testthat::test_that("prefix matching produces NAs with impossibilities", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE))
  output <- prefix_match(trie, "bingo")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(is.na(output[[1]]))
})
triebeard/tests/testthat/test_greedy.R0000644000176200001440000000333212742550046017664 0ustar liggesusers
testthat::context("Test that greedy-matching works")

testthat::test_that("greedy matching works for string tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c("afford", "affair", "available", "binary", "bind", "blind"))
  output <- greedy_match(trie, "avoid")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(all(output[[1]] == "available"))
})

testthat::test_that("greedy matching works for integer tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c(1, 2, 3, 4, 5, 6))
  output <- greedy_match(trie, "avoid")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(all(output[[1]] == 3))
})

testthat::test_that("greedy matching works for numeric tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = as.numeric(c(1, 2, 3, 4, 5, 6)))
  output <- greedy_match(trie, "avoid")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(all(output[[1]] == 3))
})

testthat::test_that("greedy matching works for logical tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE))
  output <- greedy_match(trie, "avoid")
  testthat::expect_equal(length(output), 1)
  testthat::expect_true(is.list(output))
  testthat::expect_true(output[[1]])
})

testthat::test_that("greedy matching objects to non-trie objects", {
  expect_error(greedy_match("foo", "bar"))
})
triebeard/tests/testthat/test_longest.R0000644000176200001440000000244012742550046020057 0ustar liggesusers
testthat::context("Test that longest-matching works")

testthat::test_that("Longest matching works for string tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c("afford", "affair", "available", "binary", "bind", "blind"))
  testthat::expect_equal(longest_match(trie, "binder"), "bind")
})

testthat::test_that("Longest matching works for integer tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c(1, 2, 3, 4, 5, 6))
  testthat::expect_equal(longest_match(trie, "binder"), 5)
})

testthat::test_that("Longest matching works for numeric tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = as.numeric(c(1, 2, 3, 4, 5, 6)))
  testthat::expect_equal(longest_match(trie, "binder"), 5.0)
})

testthat::test_that("Longest matching works for logical tries", {
  trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
               values = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE))
  testthat::expect_true(longest_match(trie, "binder"))
})

testthat::test_that("Longest matching objects to non-trie objects", {
  expect_error(longest_match("foo", "bar"))
})
triebeard/tests/testthat/test_alter.R0000644000176200001440000000324212742550046017514 0ustar liggesusers
context("Test trie alteration")

testthat::test_that("String tries can be altered", {
  trie <- trie("foo", "bar")
  original_length <- length(trie)
  trie_add(trie, "baz", "qux")
  increased_length <- length(trie)
  trie_remove(trie, "baz")
  testthat::expect_true(original_length < increased_length)
  testthat::expect_true(length(trie) == original_length)
})

testthat::test_that("Integer tries can be altered", {
  trie <- trie("foo", 1)
  original_length <- length(trie)
  trie_add(trie, "baz", 2)
  increased_length <- length(trie)
  trie_remove(trie, "baz")
  testthat::expect_true(original_length < increased_length)
  testthat::expect_true(length(trie) == original_length)
})

testthat::test_that("Numeric tries can be altered", {
  trie <- trie("foo", as.numeric(1))
  original_length <- length(trie)
  trie_add(trie, "baz", as.numeric(2))
  increased_length <- length(trie)
  trie_remove(trie, "baz")
  testthat::expect_true(original_length < increased_length)
  testthat::expect_true(length(trie) == original_length)
})

testthat::test_that("Logical tries can be altered", {
  trie <- trie("foo", FALSE)
  original_length <- length(trie)
  trie_add(trie, "baz", TRUE)
  increased_length <- length(trie)
  trie_remove(trie, "baz")
  testthat::expect_true(original_length < increased_length)
  testthat::expect_true(length(trie) == original_length)
})
triebeard/src/0000755000176200001440000000000012750461372013011 5ustar liggesusers
triebeard/src/Makevars0000644000176200001440000000003712750461372014505 0ustar liggesusers
PKG_CXXFLAGS=-I../inst/include
triebeard/src/str.cpp0000644000176200001440000000512012750461372014323 0ustar liggesusers
#include "r_trie.h"
#define PRINTMAX 75

template <typename T>
static inline int numlen(T num){
  return ((int)std::log10(num))+1;
}

// TODO NA's
static inline int printsize(std::string x){
  return x.length();
}

static inline int printsize(int x){
  if (x == NA_INTEGER) return 2;
  else return numlen(x);
}

static inline int printsize(double x){
  if (ISNA(x)) return 2;
  else return numlen(x);
}

static inline int printsize(bool x){
  if (x == NA_LOGICAL) return 2;
  else return 1;
}

// TODO NA's
static inline void valprinter(std::string val){
  Rcout << "\"";
  Rcout << val;
  Rcout << "\"" << " ";
}

static inline void valprinter(int val){
  if (val == NA_INTEGER) Rcout << "NA";
  else Rcout << val;
}

static inline void valprinter(double val){
  if (ISNA(val)) Rcout << "NA";
  else Rcout << val;
}

static inline void valprinter(bool val){
  if
(val == NA_INTEGER) Rcout << "NA"; else { if (val) Rcout << "TRUE"; else Rcout << "FALSE"; } } template static inline void trie_str_generic(SEXP radix, std::string type_str){ r_trie * rt_ptr = (r_trie *) R_ExternalPtrAddr(radix); ptr_check(rt_ptr); int input_size = rt_ptr->size(); int iter; int printed = 0; Rcout << " Keys: chr [1:" << input_size << "] "; printed += 19 + numlen(input_size); typename radix_tree< std::string, T >::iterator it; iter = 0; for (it = rt_ptr->radix.begin(); it != rt_ptr->radix.end() && printed < PRINTMAX; ++it) { printed += it->first.length(); if (iter > 0 && printed > PRINTMAX) break; Rcout << "\"" << it->first << "\"" << " "; iter++; } if (iter < input_size) Rcout << "..."; Rcout << std::endl; printed = 0; Rcout << " Values: " << type_str << " [1:" << input_size << "] "; printed += 15 + type_str.length() + numlen(input_size); iter = 0; for (it = rt_ptr->radix.begin(); it != rt_ptr->radix.end() && iter < 5; ++it) { printed += printsize(it->second); if (iter > 0 && printed > PRINTMAX) break; valprinter(it->second); Rcout << " "; iter++; } if (iter < input_size) Rcout << "..."; Rcout << std::endl; } //[[Rcpp::export]] void trie_str_string(SEXP radix){ trie_str_generic(radix, "str"); } //[[Rcpp::export]] void trie_str_integer(SEXP radix){ trie_str_generic(radix, "int"); } //[[Rcpp::export]] void trie_str_numeric(SEXP radix){ trie_str_generic(radix, "num"); } //[[Rcpp::export]] void trie_str_logical(SEXP radix){ trie_str_generic(radix, "logi"); } triebeard/src/prefix_match.cpp0000644000176200001440000000330012750461372016162 0ustar liggesusers#include #include "r_trie.h" using namespace Rcpp; template List prefix_generic(SEXP radix, CharacterVector to_match, Z missing_val){ r_trie * rt_ptr = (r_trie *) R_ExternalPtrAddr(radix); ptr_check(rt_ptr); typename radix_tree::iterator it; unsigned int input_size = to_match.size(); List output(input_size); for(unsigned int i = 0; i < input_size; i++){ if((i % 10000) == 0){ Rcpp::checkUserInterrupt(); 
} X holding; std::vector::iterator> vec; typename std::vector::iterator>::iterator it; if(to_match[i] == NA_STRING){ holding.push_back(missing_val); } else { rt_ptr->radix.prefix_match(Rcpp::as(to_match[i]), vec); for (it = vec.begin(); it != vec.end(); ++it) { holding.push_back((*it)->second); } if(holding.size() == 0){ holding.push_back(missing_val); } } output[i] = holding; } return output; } //[[Rcpp::export]] List prefix_string(SEXP radix, CharacterVector to_match){ return prefix_generic(radix, to_match, NA_STRING); } //[[Rcpp::export]] List prefix_integer(SEXP radix, CharacterVector to_match){ return prefix_generic(radix, to_match, NA_INTEGER); } //[[Rcpp::export]] List prefix_numeric(SEXP radix, CharacterVector to_match){ return prefix_generic(radix, to_match, NA_REAL); } //[[Rcpp::export]] List prefix_logical(SEXP radix, CharacterVector to_match){ return prefix_generic(radix, to_match, NA_INTEGER); } triebeard/src/get.cpp0000644000176200001440000000260312750461372014275 0ustar liggesusers#include "r_trie.h" template static inline std::vector < std::string > get_keys_generic(SEXP radix){ r_trie * rt_ptr = (r_trie *) R_ExternalPtrAddr(radix); ptr_check(rt_ptr); return rt_ptr->get_keys(); } //[[Rcpp::export]] std::vector < std::string > get_keys_string(SEXP radix){ return get_keys_generic(radix); } //[[Rcpp::export]] std::vector < std::string > get_keys_integer(SEXP radix){ return get_keys_generic(radix); } //[[Rcpp::export]] std::vector < std::string > get_keys_numeric(SEXP radix){ return get_keys_generic(radix); } //[[Rcpp::export]] std::vector < std::string > get_keys_logical(SEXP radix){ return get_keys_generic(radix); } template static inline std::vector < T > get_values_generic(SEXP radix){ r_trie * rt_ptr = (r_trie *) R_ExternalPtrAddr(radix); ptr_check(rt_ptr); return rt_ptr->get_values(); } //[[Rcpp::export]] std::vector < std::string > get_values_string(SEXP radix){ return get_values_generic(radix); } //[[Rcpp::export]] std::vector < int > 
get_values_integer(SEXP radix){ return get_values_generic(radix); } //[[Rcpp::export]] std::vector < double > get_values_numeric(SEXP radix){ return get_values_generic(radix); } //[[Rcpp::export]] std::vector < bool > get_values_logical(SEXP radix){ return get_values_generic(radix); } triebeard/src/create.cpp0000644000176200001440000000160312750461372014760 0ustar liggesusers#include #include "typedef.h" using namespace Rcpp; //[[Rcpp::export]] SEXP radix_create_string(std::vector < std::string > keys, std::vector < std::string > values){ r_trie * radix = new r_trie (keys, values); return XPtrRadixStr(radix); } //[[Rcpp::export]] SEXP radix_create_integer(std::vector < std::string > keys, std::vector < int > values){ r_trie * radix = new r_trie (keys, values); return XPtrRadixInt(radix); } //[[Rcpp::export]] SEXP radix_create_numeric(std::vector < std::string > keys, std::vector < double > values){ r_trie * radix = new r_trie (keys, values); XPtrRadixDouble ptr(radix); return ptr; } //[[Rcpp::export]] SEXP radix_create_logical(std::vector < std::string > keys, std::vector < bool > values){ r_trie * radix = new r_trie (keys, values); return XPtrRadixBool(radix); } triebeard/src/longest_match.cpp0000644000176200001440000000274612750461372016355 0ustar liggesusers#include #include "r_trie.h" using namespace Rcpp; template X longest_generic(SEXP radix, CharacterVector to_match, Z missing_val){ r_trie * rt_ptr = (r_trie *) R_ExternalPtrAddr(radix); ptr_check(rt_ptr); typename radix_tree::iterator it; unsigned int input_size = to_match.size(); X output(input_size); for(unsigned int i = 0; i < input_size; i++){ if((i % 10000) == 0){ Rcpp::checkUserInterrupt(); } if(to_match[i] == NA_STRING){ output[i] = missing_val; } else { it = rt_ptr->radix.longest_match(Rcpp::as(to_match[i])); if(it != rt_ptr->radix.end()){ output[i] = it->second; } else { output[i] = missing_val; } } } return output; } //[[Rcpp::export]] CharacterVector longest_string(SEXP radix, CharacterVector 
to_match){ return longest_generic(radix, to_match, NA_STRING); } //[[Rcpp::export]] IntegerVector longest_integer(SEXP radix, CharacterVector to_match){ return longest_generic(radix, to_match, NA_INTEGER); } //[[Rcpp::export]] NumericVector longest_numeric(SEXP radix, CharacterVector to_match){ return longest_generic(radix, to_match, NA_REAL); } //[[Rcpp::export]] LogicalVector longest_logical(SEXP radix, CharacterVector to_match){ return longest_generic(radix, to_match, NA_INTEGER); } triebeard/src/greedy_match.cpp0000644000176200001440000000322312750461372016150 0ustar liggesusers#include #include "r_trie.h" using namespace Rcpp; template List greedy_generic(SEXP radix, CharacterVector to_match, Y non_match_val){ r_trie * rt_ptr = (r_trie *) R_ExternalPtrAddr(radix); ptr_check(rt_ptr); unsigned int input_size = to_match.size(); List output(input_size); for(unsigned int i = 0; i < input_size; i++){ if((i % 10000) == 0){ Rcpp::checkUserInterrupt(); } X holding; std::vector::iterator> vec; typename std::vector::iterator>::iterator it; if(to_match[i] == NA_STRING){ holding.push_back(non_match_val); } else { rt_ptr->radix.greedy_match(Rcpp::as(to_match[i]), vec); for (it = vec.begin(); it != vec.end(); ++it) { holding.push_back((*it)->second); } if(holding.size() == 0){ holding.push_back(non_match_val); } } output[i] = holding; } return output; } //[[Rcpp::export]] List greedy_string(SEXP radix, CharacterVector to_match){ return greedy_generic(radix, to_match, NA_STRING); } //[[Rcpp::export]] List greedy_integer(SEXP radix, CharacterVector to_match){ return greedy_generic(radix, to_match, NA_INTEGER); } //[[Rcpp::export]] List greedy_numeric(SEXP radix, CharacterVector to_match){ return greedy_generic(radix, to_match, NA_REAL); } //[[Rcpp::export]] List greedy_logical(SEXP radix, CharacterVector to_match){ return greedy_generic(radix, to_match, NA_INTEGER); } triebeard/src/r_trie.h0000644000176200001440000000346212750461372014453 0ustar liggesusers#include #include 
using namespace Rcpp;

#ifndef __RTRIE_CORE__
#define __RTRIE_CORE__

static inline void ptr_check(void *ptr){
  if (ptr == NULL){
    stop("invalid trie object; pointer is NULL");
  }
}

template <typename T>
class r_trie {

public:

  int size(){
    return radix.size();
  }

  radix_tree<std::string, T> radix;
  int radix_size;

  r_trie(std::vector < std::string > keys, std::vector < T > values){
    unsigned int in_size = keys.size();
    for(unsigned int i = 0; i < in_size; i++){
      if((i % 10000) == 0){
        Rcpp::checkUserInterrupt();
      }
      radix[keys[i]] = values[i];
    }
    radix_size = size();
  }

  std::vector < std::string > get_keys(){
    int input_size = size();
    int iter = 0;
    std::vector < std::string > output(input_size);
    typename radix_tree< std::string, T >::iterator it;
    for (it = radix.begin(); it != radix.end(); ++it) {
      output[iter] = it->first;
      iter++;
    }
    return output;
  }

  std::vector < T > get_values(){
    int input_size = size();
    int iter = 0;
    std::vector < T > output(input_size);
    typename radix_tree< std::string, T >::iterator it;
    for (it = radix.begin(); it != radix.end(); ++it) {
      output[iter] = it->second;
      iter++;
    }
    return output;
  }

  void insert_value(std::string key, T value){
    radix[key] = value;
  }

  void remove_values(CharacterVector keys){
    unsigned int in_size = keys.size();
    for(unsigned int i = 0; i < in_size; i++){
      if((i % 10000) == 0){
        Rcpp::checkUserInterrupt();
      }
      if(keys[i] != NA_STRING){
        radix.erase(Rcpp::as<std::string>(keys[i]));
      }
    }
    radix_size = size();
  }
};

#endif
triebeard/src/RcppExports.cpp0000644000176200001440000004121412750461372016010 0ustar liggesusers
// This file was generated by Rcpp::compileAttributes
// Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#include <Rcpp.h>

using namespace Rcpp;

// add_trie_string
void add_trie_string(SEXP trie, CharacterVector keys, CharacterVector values);
RcppExport SEXP triebeard_add_trie_string(SEXP trieSEXP, SEXP keysSEXP, SEXP valuesSEXP) {
BEGIN_RCPP
    Rcpp::RNGScope __rngScope;
    Rcpp::traits::input_parameter< SEXP >::type trie(trieSEXP);
    Rcpp::traits::input_parameter< CharacterVector >::type
keys(keysSEXP); Rcpp::traits::input_parameter< CharacterVector >::type values(valuesSEXP); add_trie_string(trie, keys, values); return R_NilValue; END_RCPP } // add_trie_integer void add_trie_integer(SEXP trie, CharacterVector keys, IntegerVector values); RcppExport SEXP triebeard_add_trie_integer(SEXP trieSEXP, SEXP keysSEXP, SEXP valuesSEXP) { BEGIN_RCPP Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type trie(trieSEXP); Rcpp::traits::input_parameter< CharacterVector >::type keys(keysSEXP); Rcpp::traits::input_parameter< IntegerVector >::type values(valuesSEXP); add_trie_integer(trie, keys, values); return R_NilValue; END_RCPP } // add_trie_numeric void add_trie_numeric(SEXP trie, CharacterVector keys, NumericVector values); RcppExport SEXP triebeard_add_trie_numeric(SEXP trieSEXP, SEXP keysSEXP, SEXP valuesSEXP) { BEGIN_RCPP Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type trie(trieSEXP); Rcpp::traits::input_parameter< CharacterVector >::type keys(keysSEXP); Rcpp::traits::input_parameter< NumericVector >::type values(valuesSEXP); add_trie_numeric(trie, keys, values); return R_NilValue; END_RCPP } // add_trie_logical void add_trie_logical(SEXP trie, CharacterVector keys, LogicalVector values); RcppExport SEXP triebeard_add_trie_logical(SEXP trieSEXP, SEXP keysSEXP, SEXP valuesSEXP) { BEGIN_RCPP Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type trie(trieSEXP); Rcpp::traits::input_parameter< CharacterVector >::type keys(keysSEXP); Rcpp::traits::input_parameter< LogicalVector >::type values(valuesSEXP); add_trie_logical(trie, keys, values); return R_NilValue; END_RCPP } // remove_trie_string void remove_trie_string(SEXP trie, CharacterVector keys); RcppExport SEXP triebeard_remove_trie_string(SEXP trieSEXP, SEXP keysSEXP) { BEGIN_RCPP Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type trie(trieSEXP); Rcpp::traits::input_parameter< CharacterVector >::type keys(keysSEXP); 
remove_trie_string(trie, keys); return R_NilValue; END_RCPP } // remove_trie_integer void remove_trie_integer(SEXP trie, CharacterVector keys); RcppExport SEXP triebeard_remove_trie_integer(SEXP trieSEXP, SEXP keysSEXP) { BEGIN_RCPP Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type trie(trieSEXP); Rcpp::traits::input_parameter< CharacterVector >::type keys(keysSEXP); remove_trie_integer(trie, keys); return R_NilValue; END_RCPP } // remove_trie_numeric void remove_trie_numeric(SEXP trie, CharacterVector keys); RcppExport SEXP triebeard_remove_trie_numeric(SEXP trieSEXP, SEXP keysSEXP) { BEGIN_RCPP Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type trie(trieSEXP); Rcpp::traits::input_parameter< CharacterVector >::type keys(keysSEXP); remove_trie_numeric(trie, keys); return R_NilValue; END_RCPP } // remove_trie_logical void remove_trie_logical(SEXP trie, CharacterVector keys); RcppExport SEXP triebeard_remove_trie_logical(SEXP trieSEXP, SEXP keysSEXP) { BEGIN_RCPP Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type trie(trieSEXP); Rcpp::traits::input_parameter< CharacterVector >::type keys(keysSEXP); remove_trie_logical(trie, keys); return R_NilValue; END_RCPP } // radix_create_string SEXP radix_create_string(std::vector < std::string > keys, std::vector < std::string > values); RcppExport SEXP triebeard_radix_create_string(SEXP keysSEXP, SEXP valuesSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< std::vector < std::string > >::type keys(keysSEXP); Rcpp::traits::input_parameter< std::vector < std::string > >::type values(valuesSEXP); __result = Rcpp::wrap(radix_create_string(keys, values)); return __result; END_RCPP } // radix_create_integer SEXP radix_create_integer(std::vector < std::string > keys, std::vector < int > values); RcppExport SEXP triebeard_radix_create_integer(SEXP keysSEXP, SEXP valuesSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope 
__rngScope; Rcpp::traits::input_parameter< std::vector < std::string > >::type keys(keysSEXP); Rcpp::traits::input_parameter< std::vector < int > >::type values(valuesSEXP); __result = Rcpp::wrap(radix_create_integer(keys, values)); return __result; END_RCPP } // radix_create_numeric SEXP radix_create_numeric(std::vector < std::string > keys, std::vector < double > values); RcppExport SEXP triebeard_radix_create_numeric(SEXP keysSEXP, SEXP valuesSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< std::vector < std::string > >::type keys(keysSEXP); Rcpp::traits::input_parameter< std::vector < double > >::type values(valuesSEXP); __result = Rcpp::wrap(radix_create_numeric(keys, values)); return __result; END_RCPP } // radix_create_logical SEXP radix_create_logical(std::vector < std::string > keys, std::vector < bool > values); RcppExport SEXP triebeard_radix_create_logical(SEXP keysSEXP, SEXP valuesSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< std::vector < std::string > >::type keys(keysSEXP); Rcpp::traits::input_parameter< std::vector < bool > >::type values(valuesSEXP); __result = Rcpp::wrap(radix_create_logical(keys, values)); return __result; END_RCPP } // get_keys_string std::vector < std::string > get_keys_string(SEXP radix); RcppExport SEXP triebeard_get_keys_string(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(get_keys_string(radix)); return __result; END_RCPP } // get_keys_integer std::vector < std::string > get_keys_integer(SEXP radix); RcppExport SEXP triebeard_get_keys_integer(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(get_keys_integer(radix)); return __result; END_RCPP } // get_keys_numeric std::vector < std::string 
> get_keys_numeric(SEXP radix); RcppExport SEXP triebeard_get_keys_numeric(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(get_keys_numeric(radix)); return __result; END_RCPP } // get_keys_logical std::vector < std::string > get_keys_logical(SEXP radix); RcppExport SEXP triebeard_get_keys_logical(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(get_keys_logical(radix)); return __result; END_RCPP } // get_values_string std::vector < std::string > get_values_string(SEXP radix); RcppExport SEXP triebeard_get_values_string(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(get_values_string(radix)); return __result; END_RCPP } // get_values_integer std::vector < int > get_values_integer(SEXP radix); RcppExport SEXP triebeard_get_values_integer(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(get_values_integer(radix)); return __result; END_RCPP } // get_values_numeric std::vector < double > get_values_numeric(SEXP radix); RcppExport SEXP triebeard_get_values_numeric(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(get_values_numeric(radix)); return __result; END_RCPP } // get_values_logical std::vector < bool > get_values_logical(SEXP radix); RcppExport SEXP triebeard_get_values_logical(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(get_values_logical(radix)); return __result; END_RCPP } // 
greedy_string List greedy_string(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_greedy_string(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(greedy_string(radix, to_match)); return __result; END_RCPP } // greedy_integer List greedy_integer(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_greedy_integer(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(greedy_integer(radix, to_match)); return __result; END_RCPP } // greedy_numeric List greedy_numeric(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_greedy_numeric(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(greedy_numeric(radix, to_match)); return __result; END_RCPP } // greedy_logical List greedy_logical(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_greedy_logical(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(greedy_logical(radix, to_match)); return __result; END_RCPP } // radix_len_string int radix_len_string(SEXP radix); RcppExport SEXP triebeard_radix_len_string(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); 
__result = Rcpp::wrap(radix_len_string(radix)); return __result; END_RCPP } // radix_len_integer int radix_len_integer(SEXP radix); RcppExport SEXP triebeard_radix_len_integer(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(radix_len_integer(radix)); return __result; END_RCPP } // radix_len_numeric int radix_len_numeric(SEXP radix); RcppExport SEXP triebeard_radix_len_numeric(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(radix_len_numeric(radix)); return __result; END_RCPP } // radix_len_logical int radix_len_logical(SEXP radix); RcppExport SEXP triebeard_radix_len_logical(SEXP radixSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); __result = Rcpp::wrap(radix_len_logical(radix)); return __result; END_RCPP } // longest_string CharacterVector longest_string(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_longest_string(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(longest_string(radix, to_match)); return __result; END_RCPP } // longest_integer IntegerVector longest_integer(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_longest_integer(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(longest_integer(radix, to_match)); return __result; END_RCPP } // longest_numeric NumericVector longest_numeric(SEXP 
radix, CharacterVector to_match); RcppExport SEXP triebeard_longest_numeric(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(longest_numeric(radix, to_match)); return __result; END_RCPP } // longest_logical LogicalVector longest_logical(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_longest_logical(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(longest_logical(radix, to_match)); return __result; END_RCPP } // prefix_string List prefix_string(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_prefix_string(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(prefix_string(radix, to_match)); return __result; END_RCPP } // prefix_integer List prefix_integer(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_prefix_integer(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP); Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP); __result = Rcpp::wrap(prefix_integer(radix, to_match)); return __result; END_RCPP } // prefix_numeric List prefix_numeric(SEXP radix, CharacterVector to_match); RcppExport SEXP triebeard_prefix_numeric(SEXP radixSEXP, SEXP to_matchSEXP) { BEGIN_RCPP Rcpp::RObject __result; Rcpp::RNGScope __rngScope; Rcpp::traits::input_parameter< SEXP >::type 
radix(radixSEXP);
    Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP);
    __result = Rcpp::wrap(prefix_numeric(radix, to_match));
    return __result;
END_RCPP
}
// prefix_logical
List prefix_logical(SEXP radix, CharacterVector to_match);
RcppExport SEXP triebeard_prefix_logical(SEXP radixSEXP, SEXP to_matchSEXP) {
BEGIN_RCPP
    Rcpp::RObject __result;
    Rcpp::RNGScope __rngScope;
    Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP);
    Rcpp::traits::input_parameter< CharacterVector >::type to_match(to_matchSEXP);
    __result = Rcpp::wrap(prefix_logical(radix, to_match));
    return __result;
END_RCPP
}
// trie_str_string
void trie_str_string(SEXP radix);
RcppExport SEXP triebeard_trie_str_string(SEXP radixSEXP) {
BEGIN_RCPP
    Rcpp::RNGScope __rngScope;
    Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP);
    trie_str_string(radix);
    return R_NilValue;
END_RCPP
}
// trie_str_integer
void trie_str_integer(SEXP radix);
RcppExport SEXP triebeard_trie_str_integer(SEXP radixSEXP) {
BEGIN_RCPP
    Rcpp::RNGScope __rngScope;
    Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP);
    trie_str_integer(radix);
    return R_NilValue;
END_RCPP
}
// trie_str_numeric
void trie_str_numeric(SEXP radix);
RcppExport SEXP triebeard_trie_str_numeric(SEXP radixSEXP) {
BEGIN_RCPP
    Rcpp::RNGScope __rngScope;
    Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP);
    trie_str_numeric(radix);
    return R_NilValue;
END_RCPP
}
// trie_str_logical
void trie_str_logical(SEXP radix);
RcppExport SEXP triebeard_trie_str_logical(SEXP radixSEXP) {
BEGIN_RCPP
    Rcpp::RNGScope __rngScope;
    Rcpp::traits::input_parameter< SEXP >::type radix(radixSEXP);
    trie_str_logical(radix);
    return R_NilValue;
END_RCPP
}
triebeard/src/length.cpp0000644000176200001440000000107112750461372014775 0ustar liggesusers
#include "r_trie.h"

// Returns the number of entries stored in a trie with values of type T.
template <typename T>
static inline int radix_len(SEXP radix){
  r_trie<T>* rt_ptr = (r_trie<T> *) R_ExternalPtrAddr(radix);
  ptr_check(rt_ptr);
  return rt_ptr->radix_size;
}

//[[Rcpp::export]]
int
radix_len_string(SEXP radix){
  return radix_len<std::string>(radix);
}

//[[Rcpp::export]]
int radix_len_integer(SEXP radix){
  return radix_len<int>(radix);
}

//[[Rcpp::export]]
int radix_len_numeric(SEXP radix){
  return radix_len<double>(radix);
}

//[[Rcpp::export]]
int radix_len_logical(SEXP radix){
  return radix_len<bool>(radix);
}
triebeard/src/typedef.h0000644000176200001440000000101412750461372014616 0ustar liggesusers
#include "r_trie.h"

#ifndef __RTRIE_TYPES__
#define __RTRIE_TYPES__

// Finaliser run when the external pointer is garbage-collected.
template <typename T>
void finaliseRadix(r_trie<T>* radix_inst){
  delete radix_inst;
}

typedef Rcpp::XPtr<r_trie<std::string>, Rcpp::PreserveStorage, finaliseRadix> XPtrRadixStr;
typedef Rcpp::XPtr<r_trie<int>, Rcpp::PreserveStorage, finaliseRadix> XPtrRadixInt;
typedef Rcpp::XPtr<r_trie<bool>, Rcpp::PreserveStorage, finaliseRadix> XPtrRadixBool;
typedef Rcpp::XPtr<r_trie<double>, Rcpp::PreserveStorage, finaliseRadix> XPtrRadixDouble;

#endif
triebeard/src/alter.cpp0000644000176200001440000000557712750461372014642 0ustar liggesusers
#include "r_trie.h"

//[[Rcpp::export]]
void add_trie_string(SEXP trie, CharacterVector keys, CharacterVector values){
  r_trie<std::string>* rt_ptr = (r_trie<std::string> *) R_ExternalPtrAddr(trie);
  ptr_check(rt_ptr);
  unsigned int in_size = keys.size();
  for(unsigned int i = 0; i < in_size; i++){
    // Periodically let the user interrupt long-running insertions.
    if((i % 10000) == 0){
      Rcpp::checkUserInterrupt();
    }
    // Entries with an NA key or an NA value are skipped.
    if(keys[i] != NA_STRING && values[i] != NA_STRING){
      rt_ptr->insert_value(Rcpp::as<std::string>(keys[i]), Rcpp::as<std::string>(values[i]));
    }
  }
  rt_ptr->radix_size = rt_ptr->size();
}

//[[Rcpp::export]]
void add_trie_integer(SEXP trie, CharacterVector keys, IntegerVector values){
  r_trie<int>* rt_ptr = (r_trie<int> *) R_ExternalPtrAddr(trie);
  ptr_check(rt_ptr);
  unsigned int in_size = keys.size();
  for(unsigned int i = 0; i < in_size; i++){
    if((i % 10000) == 0){
      Rcpp::checkUserInterrupt();
    }
    if(keys[i] != NA_STRING && values[i] != NA_INTEGER){
      rt_ptr->insert_value(Rcpp::as<std::string>(keys[i]), values[i]);
    }
  }
  rt_ptr->radix_size = rt_ptr->size();
}

//[[Rcpp::export]]
void add_trie_numeric(SEXP trie, CharacterVector keys, NumericVector values){
  r_trie<double>* rt_ptr = (r_trie<double> *) R_ExternalPtrAddr(trie);
  ptr_check(rt_ptr);
  unsigned int in_size = keys.size();
  for(unsigned int i = 0; i < in_size; i++){
    if((i % 10000) == 0){
      Rcpp::checkUserInterrupt();
    }
    // NA_REAL is a NaN, so `values[i] != NA_REAL` is always true;
    // use is_na() to actually skip NA values.
    if(keys[i] != NA_STRING && !NumericVector::is_na(values[i])){
      rt_ptr->insert_value(Rcpp::as<std::string>(keys[i]), values[i]);
    }
  }
  rt_ptr->radix_size = rt_ptr->size();
}

//[[Rcpp::export]]
void add_trie_logical(SEXP trie, CharacterVector keys, LogicalVector values){
  r_trie<bool>* rt_ptr = (r_trie<bool> *) R_ExternalPtrAddr(trie);
  ptr_check(rt_ptr);
  unsigned int in_size = keys.size();
  for(unsigned int i = 0; i < in_size; i++){
    if((i % 10000) == 0){
      Rcpp::checkUserInterrupt();
    }
    if(keys[i] != NA_STRING && values[i] != NA_LOGICAL){
      rt_ptr->insert_value(Rcpp::as<std::string>(keys[i]), values[i]);
    }
  }
  rt_ptr->radix_size = rt_ptr->size();
}

//[[Rcpp::export]]
void remove_trie_string(SEXP trie, CharacterVector keys){
  r_trie<std::string>* rt_ptr = (r_trie<std::string> *) R_ExternalPtrAddr(trie);
  ptr_check(rt_ptr);
  rt_ptr->remove_values(keys);
}

//[[Rcpp::export]]
void remove_trie_integer(SEXP trie, CharacterVector keys){
  r_trie<int>* rt_ptr = (r_trie<int> *) R_ExternalPtrAddr(trie);
  ptr_check(rt_ptr);
  rt_ptr->remove_values(keys);
}

//[[Rcpp::export]]
void remove_trie_numeric(SEXP trie, CharacterVector keys){
  r_trie<double>* rt_ptr = (r_trie<double> *) R_ExternalPtrAddr(trie);
  ptr_check(rt_ptr);
  rt_ptr->remove_values(keys);
}

//[[Rcpp::export]]
void remove_trie_logical(SEXP trie, CharacterVector keys){
  r_trie<bool>* rt_ptr = (r_trie<bool> *) R_ExternalPtrAddr(trie);
  ptr_check(rt_ptr);
  rt_ptr->remove_values(keys);
}
triebeard/NAMESPACE0000644000176200001440000000263312742550046013443 0ustar liggesusers
# Generated by roxygen2: do not edit by hand

S3method(as.data.frame,trie)
S3method(as.list,trie)
S3method(dim,trie)
S3method(get_keys,integer_trie)
S3method(get_keys,logical_trie)
S3method(get_keys,numeric_trie)
S3method(get_keys,string_trie)
S3method(get_values,integer_trie)
S3method(get_values,logical_trie)
S3method(get_values,numeric_trie)
S3method(get_values,string_trie)
S3method(greedy_match,integer_trie)
S3method(greedy_match,logical_trie)
S3method(greedy_match,numeric_trie)
S3method(greedy_match,string_trie)
S3method(length,integer_trie)
S3method(length,logical_trie)
S3method(length,numeric_trie)
S3method(length,string_trie)
S3method(longest_match,integer_trie)
S3method(longest_match,logical_trie)
S3method(longest_match,numeric_trie)
S3method(longest_match,string_trie)
S3method(prefix_match,integer_trie)
S3method(prefix_match,logical_trie)
S3method(prefix_match,numeric_trie)
S3method(prefix_match,string_trie)
S3method(print,trie)
S3method(str,trie)
S3method(trie_add,integer_trie)
S3method(trie_add,logical_trie)
S3method(trie_add,numeric_trie)
S3method(trie_add,string_trie)
S3method(trie_remove,integer_trie)
S3method(trie_remove,logical_trie)
S3method(trie_remove,numeric_trie)
S3method(trie_remove,string_trie)
export(get_keys)
export(get_values)
export(greedy_match)
export(longest_match)
export(prefix_match)
export(trie)
export(trie_add)
export(trie_remove)
importFrom(Rcpp,sourceCpp)
useDynLib(triebeard)
triebeard/NEWS0000644000176200001440000000143512750460170012717 0ustar liggesusers
Version 0.3.0
=================================

DEVELOPMENT
* Internal refactoring using templates has drastically reduced the size of the codebase

Version 0.2.1
=================================

BUG FIXES
* Fixed segfault when `trie_remove()` resulted in a 0 element trie.

Version 0.2.0
=================================

FEATURES
* tries can now be converted into lists and data.frames
* We now have str() and print() methods! That's nice.
* create_trie() renamed trie()

BUGS
* Haven't found any. Probably means they're lurking around and particularly nasty.

DEVELOPMENT
* greedy and prefix matching should now be (slightly) faster.
* Installed size has been slightly reduced, and the C++ code simplified.

Version 0.1.0
=================================
* Initial, GitHub-centred release
triebeard/R/0000755000176200001440000000000012742550046012421 5ustar liggesusers
triebeard/R/triebeard.R0000644000176200001440000000050212742550046014502 0ustar liggesusers
#' @title Radix trees in Rcpp
#' @name triebeard
#' @description This package provides access to Radix tree (or "trie") structures in both
#' R and Rcpp.
#'
#' @docType package
#' @aliases triebeard triebeard-package
#' @useDynLib triebeard
#' @importFrom Rcpp sourceCpp
NULL
triebeard/R/create.R0000644000176200001440000000337512742550046014017 0ustar liggesusers
#'@title Create a Trie
#'@description \code{trie} creates a trie (a key-value store optimised
#'for matching) out of a provided character vector of keys, and a numeric,
#'character, logical or integer vector of values (both the same length).
#'
#'@param keys a character vector containing the keys for the trie.
#'
#'@param values an atomic vector of any type, containing the values to pair with
#'\code{keys}. Must be the same length as \code{keys}.
#'
#'@return a `trie` object.
#'
#'@seealso \code{\link{trie_add}} and \code{\link{trie_remove}} for adding to and removing
#'from tries after their creation, and \code{\link{longest_match}} and other match functions
#'for matching values against the keys of a created trie.
#'
#'@examples
#'# An integer trie (note the L suffix; a bare 1 would create a numeric trie)
#'int_trie <- trie(keys = "foo", values = 1L)
#'
#'# A string trie
#'str_trie <- trie(keys = "foo", values = "bar")
#'
#'@export
trie <- function(keys, values){

  stopifnot(length(keys) == length(values))
  stopifnot(is.character(keys))

  output <- NULL
  output_classes <- c("trie", NA)

  # Dispatch on the type of the values to pick the matching C++ constructor.
  switch(class(values)[1],
         "character" = {
           output <- radix_create_string(keys, values)
           output_classes[2] <- "string_trie"
         },
         "integer" = {
           output <- radix_create_integer(keys, values)
           output_classes[2] <- "integer_trie"
         },
         "numeric" = {
           output <- radix_create_numeric(keys, values)
           output_classes[2] <- "numeric_trie"
         },
         "logical" = {
           output <- radix_create_logical(keys, values)
           output_classes[2] <- "logical_trie"
         },
         stop("'values' must be a numeric, integer, character or logical vector"))

  class(output) <- c(class(output), output_classes)
  return(output)
}
triebeard/R/get.R0000644000176200001440000000241512742550046013325 0ustar liggesusers
#'@title Trie Getters
#'@description "Getters" for the data stored in a trie object. \code{get_keys}
#' gets the keys, \code{get_values} gets the values.
#'
#'@param trie A trie object, created with \code{\link{trie}}.
#'
#'@return An atomic vector of keys or values stored in the trie.
#' #'@name getters #'@rdname getters NULL #'@rdname getters #'@export get_keys <- function(trie){ stopifnot("trie" %in% class(trie)) UseMethod("get_keys", trie) } #'@rdname getters #'@export get_values <- function(trie){ stopifnot("trie" %in% class(trie)) UseMethod("get_values", trie) } #'@export get_keys.string_trie <- function(trie){ return(get_keys_string(trie)) } #'@export get_keys.integer_trie <- function(trie){ return(get_keys_integer(trie)) } #'@export get_keys.numeric_trie <- function(trie){ return(get_keys_numeric(trie)) } #'@export get_keys.logical_trie <- function(trie){ return(get_keys_logical(trie)) } #'@export get_values.string_trie <- function(trie){ return(get_values_string(trie)) } #'@export get_values.integer_trie <- function(trie){ return(get_values_integer(trie)) } #'@export get_values.numeric_trie <- function(trie){ return(get_values_numeric(trie)) } #'@export get_values.logical_trie <- function(trie){ return(get_values_logical(trie)) } triebeard/R/as.R0000644000176200001440000000077712742550046013162 0ustar liggesusers#'@export as.list.trie <- function(x, ...){ return(list(keys = get_keys(x), values = get_values(x))) } #'@export as.data.frame.trie <- function(x, row.names = NULL, optional = FALSE, stringsAsFactors = FALSE, ...){ output <- data.frame(keys = get_keys(x), values = get_values(x), stringsAsFactors = stringsAsFactors, ...) 
if(!is.null(row.names)){ rownames(output) <- row.names } return(output) }triebeard/R/metadata.R0000644000176200001440000000162712742550046014332 0ustar liggesusers#'@export length.string_trie <- function(x){ return(radix_len_string(x)) } #'@export length.integer_trie <- function(x){ return(radix_len_integer(x)) } #'@export length.numeric_trie <- function(x){ return(radix_len_numeric(x)) } #'@export length.logical_trie <- function(x){ return(radix_len_logical(x)) } #'@export dim.trie <- function(x){ return(length(x)) } #'@export str.trie <- function(object, ...){ type <- class(object)[3] cat(paste0(type, "\n")) switch(type, "string_trie" = {trie_str_string(object)}, "integer_trie" = {trie_str_integer(object)}, "numeric_trie" = {trie_str_numeric(object)}, "logical_trie" = {trie_str_logical(object)} ) return(invisible()) } #'@export print.trie <- function(x, ...){ len <- length(x) entry_word <- ifelse(len != 1, "entries", "entry") cat("A", class(x)[3], "object with", len, entry_word, "\n") } triebeard/R/RcppExports.R0000644000176200001440000001202612742550046015036 0ustar liggesusers# This file was generated by Rcpp::compileAttributes # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 add_trie_string <- function(trie, keys, values) { invisible(.Call('triebeard_add_trie_string', PACKAGE = 'triebeard', trie, keys, values)) } add_trie_integer <- function(trie, keys, values) { invisible(.Call('triebeard_add_trie_integer', PACKAGE = 'triebeard', trie, keys, values)) } add_trie_numeric <- function(trie, keys, values) { invisible(.Call('triebeard_add_trie_numeric', PACKAGE = 'triebeard', trie, keys, values)) } add_trie_logical <- function(trie, keys, values) { invisible(.Call('triebeard_add_trie_logical', PACKAGE = 'triebeard', trie, keys, values)) } remove_trie_string <- function(trie, keys) { invisible(.Call('triebeard_remove_trie_string', PACKAGE = 'triebeard', trie, keys)) } remove_trie_integer <- function(trie, keys) { 
invisible(.Call('triebeard_remove_trie_integer', PACKAGE = 'triebeard', trie, keys)) } remove_trie_numeric <- function(trie, keys) { invisible(.Call('triebeard_remove_trie_numeric', PACKAGE = 'triebeard', trie, keys)) } remove_trie_logical <- function(trie, keys) { invisible(.Call('triebeard_remove_trie_logical', PACKAGE = 'triebeard', trie, keys)) } radix_create_string <- function(keys, values) { .Call('triebeard_radix_create_string', PACKAGE = 'triebeard', keys, values) } radix_create_integer <- function(keys, values) { .Call('triebeard_radix_create_integer', PACKAGE = 'triebeard', keys, values) } radix_create_numeric <- function(keys, values) { .Call('triebeard_radix_create_numeric', PACKAGE = 'triebeard', keys, values) } radix_create_logical <- function(keys, values) { .Call('triebeard_radix_create_logical', PACKAGE = 'triebeard', keys, values) } get_keys_string <- function(radix) { .Call('triebeard_get_keys_string', PACKAGE = 'triebeard', radix) } get_keys_integer <- function(radix) { .Call('triebeard_get_keys_integer', PACKAGE = 'triebeard', radix) } get_keys_numeric <- function(radix) { .Call('triebeard_get_keys_numeric', PACKAGE = 'triebeard', radix) } get_keys_logical <- function(radix) { .Call('triebeard_get_keys_logical', PACKAGE = 'triebeard', radix) } get_values_string <- function(radix) { .Call('triebeard_get_values_string', PACKAGE = 'triebeard', radix) } get_values_integer <- function(radix) { .Call('triebeard_get_values_integer', PACKAGE = 'triebeard', radix) } get_values_numeric <- function(radix) { .Call('triebeard_get_values_numeric', PACKAGE = 'triebeard', radix) } get_values_logical <- function(radix) { .Call('triebeard_get_values_logical', PACKAGE = 'triebeard', radix) } greedy_string <- function(radix, to_match) { .Call('triebeard_greedy_string', PACKAGE = 'triebeard', radix, to_match) } greedy_integer <- function(radix, to_match) { .Call('triebeard_greedy_integer', PACKAGE = 'triebeard', radix, to_match) } greedy_numeric <- function(radix, 
to_match) { .Call('triebeard_greedy_numeric', PACKAGE = 'triebeard', radix, to_match) } greedy_logical <- function(radix, to_match) { .Call('triebeard_greedy_logical', PACKAGE = 'triebeard', radix, to_match) } radix_len_string <- function(radix) { .Call('triebeard_radix_len_string', PACKAGE = 'triebeard', radix) } radix_len_integer <- function(radix) { .Call('triebeard_radix_len_integer', PACKAGE = 'triebeard', radix) } radix_len_numeric <- function(radix) { .Call('triebeard_radix_len_numeric', PACKAGE = 'triebeard', radix) } radix_len_logical <- function(radix) { .Call('triebeard_radix_len_logical', PACKAGE = 'triebeard', radix) } longest_string <- function(radix, to_match) { .Call('triebeard_longest_string', PACKAGE = 'triebeard', radix, to_match) } longest_integer <- function(radix, to_match) { .Call('triebeard_longest_integer', PACKAGE = 'triebeard', radix, to_match) } longest_numeric <- function(radix, to_match) { .Call('triebeard_longest_numeric', PACKAGE = 'triebeard', radix, to_match) } longest_logical <- function(radix, to_match) { .Call('triebeard_longest_logical', PACKAGE = 'triebeard', radix, to_match) } prefix_string <- function(radix, to_match) { .Call('triebeard_prefix_string', PACKAGE = 'triebeard', radix, to_match) } prefix_integer <- function(radix, to_match) { .Call('triebeard_prefix_integer', PACKAGE = 'triebeard', radix, to_match) } prefix_numeric <- function(radix, to_match) { .Call('triebeard_prefix_numeric', PACKAGE = 'triebeard', radix, to_match) } prefix_logical <- function(radix, to_match) { .Call('triebeard_prefix_logical', PACKAGE = 'triebeard', radix, to_match) } trie_str_string <- function(radix) { invisible(.Call('triebeard_trie_str_string', PACKAGE = 'triebeard', radix)) } trie_str_integer <- function(radix) { invisible(.Call('triebeard_trie_str_integer', PACKAGE = 'triebeard', radix)) } trie_str_numeric <- function(radix) { invisible(.Call('triebeard_trie_str_numeric', PACKAGE = 'triebeard', radix)) } trie_str_logical <- 
function(radix) { invisible(.Call('triebeard_trie_str_logical', PACKAGE = 'triebeard', radix)) } triebeard/R/alter.R0000644000176200001440000000426112742550046013656 0ustar liggesusers#'@title Add or remove trie entries #' #'@description \code{trie_add} and \code{trie_remove} allow you to #'add or remove entries from tries, respectively. #' #'@param trie a trie object created with \code{\link{trie}} #' #'@param keys a character vector containing the keys of the entries to #'add (or remove). Entries with NA keys will not be added. #' #'@param values an atomic vector, matching the type of the trie, containing #'the values of the entries to add. Entries with NA values will not be added. #' #'@return nothing; the trie is modified in-place #' #'@examples #'trie <- trie("foo", "bar") #'length(trie) #' #'trie_add(trie, "baz", "qux") #'length(trie) #' #'trie_remove(trie, "baz") #'length(trie) #' #'@seealso \code{\link{trie}} for creating tries in the first place. #'@name alter #'@rdname alter #'@export trie_add <- function(trie, keys, values){ stopifnot(length(keys) == length(values)) stopifnot(is.character(keys)) UseMethod("trie_add", trie) } #'@export trie_add.string_trie <- function(trie, keys, values){ stopifnot(is.character(values)) add_trie_string(trie, keys, values) return(invisible()) } #'@export trie_add.integer_trie <- function(trie, keys, values){ stopifnot(is.integer(values)) add_trie_integer(trie, keys, values) return(invisible()) } #'@export trie_add.numeric_trie <- function(trie, keys, values){ stopifnot(is.numeric(values)) add_trie_numeric(trie, keys, values) return(invisible()) } #'@export trie_add.logical_trie <- function(trie, keys, values){ stopifnot(is.logical(values)) add_trie_logical(trie, keys, values) return(invisible()) } #'@rdname alter #'@export trie_remove <- function(trie, keys){ stopifnot(is.character(keys)) UseMethod("trie_remove", trie) } #'@export trie_remove.string_trie <- function(trie, keys){ remove_trie_string(trie, keys) 
return(invisible()) } #'@export trie_remove.integer_trie <- function(trie, keys){ remove_trie_integer(trie, keys) return(invisible()) } #'@export trie_remove.numeric_trie <- function(trie, keys){ remove_trie_numeric(trie, keys) return(invisible()) } #'@export trie_remove.logical_trie <- function(trie, keys){ remove_trie_logical(trie, keys) return(invisible()) } triebeard/R/match.R0000644000176200001440000001072612742550046013646 0ustar liggesusers#'@title Find the longest match in a trie #'@description \code{longest_match} accepts a trie and a character vector #'and returns the value associated with whichever key had the \emph{longest match} #'to each entry in the character vector. A trie of "binary" and "bind", for example, #'with an entry-to-compare of "binder", will match to "bind". #' #'@param trie a trie object, created with \code{\link{trie}} #' #'@param to_match a character vector containing the strings to match against the #'trie's keys. #' #'@examples #'trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"), #' values = c("afford", "affair", "available", "binary", "bind", "blind")) #'longest_match(trie, "binder") #' #'@seealso \code{\link{prefix_match}} and \code{\link{greedy_match}} #'for prefix and greedy matching, respectively. 
#' #'@export longest_match <- function(trie, to_match){ stopifnot("trie" %in% class(trie)) UseMethod("longest_match", trie) } #'@export longest_match.string_trie <- function(trie, to_match){ return(longest_string(trie, to_match)) } #'@export longest_match.integer_trie <- function(trie, to_match){ return(longest_integer(trie, to_match)) } #'@export longest_match.numeric_trie <- function(trie, to_match){ return(longest_numeric(trie, to_match)) } #'@export longest_match.logical_trie <- function(trie, to_match){ return(longest_logical(trie, to_match)) } #'@title Find the prefix matches in a trie #'@description \code{prefix_match} accepts a trie and a character vector #'and returns the values associated with any key that has a particular #'character vector entry as a prefix (see the examples). #' #'@param trie a trie object, created with \code{\link{trie}} #' #'@param to_match a character vector containing the strings to check against the #'trie's keys. #' #'@return a list, the length of \code{to_match}, with each entry containing any trie values #'where the \code{to_match} element was a prefix of the associated key. In the case that #'nothing was found, the entry will contain \code{NA}. #' #'@examples #'trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"), #' values = c("afford", "affair", "available", "binary", "bind", "blind")) #'prefix_match(trie, "aff") #' #'@seealso \code{\link{longest_match}} and \code{\link{greedy_match}} #'for longest and greedy matching, respectively. 
#' #'@export prefix_match <- function(trie, to_match){ stopifnot("trie" %in% class(trie)) UseMethod("prefix_match", trie) } #'@export prefix_match.numeric_trie <- function(trie, to_match){ return(prefix_numeric(trie, to_match)) } #'@export prefix_match.integer_trie <- function(trie, to_match){ return(prefix_integer(trie, to_match)) } #'@export prefix_match.string_trie <- function(trie, to_match){ return(prefix_string(trie, to_match)) } #'@export prefix_match.logical_trie <- function(trie, to_match){ return(prefix_logical(trie, to_match)) } #'@title Greedily match against a tree #'@description \code{greedy_match} accepts a trie and a character vector #'and returns the values associated with any key that is "greedily" (read: fuzzily) #'matched against one of the character vector entries. #' #'@param trie a trie object, created with \code{\link{trie}} #' #'@param to_match a character vector containing the strings to check against the #'trie's keys. #' #'@return a list, the length of \code{to_match}, with each entry containing any trie values #'where the \code{to_match} element greedily matches the associated key. In the case that #'nothing was found, the entry will contain \code{NA}. #' #'@examples #'trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"), #' values = c("afford", "affair", "available", "binary", "bind", "blind")) #'greedy_match(trie, c("avoid", "bring", "attack")) #' #'@seealso \code{\link{longest_match}} and \code{\link{prefix_match}} #'for longest and prefix matching, respectively. 
#' #'@export greedy_match <- function(trie, to_match){ stopifnot("trie" %in% class(trie)) UseMethod("greedy_match", trie) } #'@export greedy_match.numeric_trie <- function(trie, to_match){ return(greedy_numeric(trie, to_match)) } #'@export greedy_match.integer_trie <- function(trie, to_match){ return(greedy_integer(trie, to_match)) } #'@export greedy_match.string_trie <- function(trie, to_match){ return(greedy_string(trie, to_match)) } #'@export greedy_match.logical_trie <- function(trie, to_match){ return(greedy_logical(trie, to_match)) } triebeard/vignettes/0000755000176200001440000000000012750461372014232 5ustar liggesuserstriebeard/vignettes/r_radix.Rmd0000644000176200001440000001174012742550046016327 0ustar liggesusers--- title: "Radix trees in R" author: "Oliver Keyes" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Radix trees in R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- A **radix tree**, or **trie**, is a data structure optimised for storing key-value pairs in a way optimised for searching. This makes them very, very good for efficiently matching data against keys, and retrieving the values *associated* with those keys. `triebeard` provides an implementation of tries for R (and one that can be used in Rcpp development, too, if that's your thing) so that useRs can take advantage of the fast, efficient and user-friendly matching that they allow. ## Radix usage Suppose we have observations in a dataset that are labelled, with a 2-3 letter code that identifies the facility the sample came from: ```{r, eval=FALSE} labels <- c("AO-1002", "AEO-1004", "AAI-1009", "AFT-1403", "QZ-9065", "QZ-1021", "RF-0901", "AO-1099", "AFT-1101", "QZ-4933") ``` We know the facility each code maps to, and we want to be able to map the labels to that - not over 10 entries but over hundreds, or thousands, or hundreds *of* thousands. 
Tries are a great way of doing that: we treat the codes as *keys* and the full facility names as *values*. So let's make a trie to do this matching, and then, well, match:

```{r, eval=FALSE}
library(triebeard)
trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"),
             values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh"))

longest_match(trie = trie, to_match = labels)

 [1] "Audobon"    "Atlanta"    "Ann Arbor"  "Austin"     "Queensland" "Queensland" "Raleigh"    "Audobon"    "Austin"
[10] "Queensland"
```

This pulls out, for each label, the trie value where the associated key has the longest prefix-match to the label. We can also just grab all the values where the key starts with, say, A:

```{r, eval=FALSE}
prefix_match(trie = trie, to_match = "A")

[[1]]
[1] "Ann Arbor" "Atlanta"   "Austin"    "Audobon"
```

And finally if we want we can match very, very fuzzily using "greedy" matching:

```{r, eval=FALSE}
greedy_match(trie = trie, to_match = "AO")

[[1]]
[1] "Ann Arbor" "Atlanta"   "Austin"    "Audobon"
```

These operations are very, very efficient. If we use longest-match as an example, since that's the most useful thing, with a one-million element vector of things to match against:

```{r, eval=FALSE}
library(triebeard)
library(microbenchmark)

trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"),
             values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh"))

labels <- rep(c("AO-1002", "AEO-1004", "AAI-1009", "AFT-1403", "QZ-9065", "QZ-1021", "RF-0901",
                "AO-1099", "AFT-1101", "QZ-4933"), 100000)

microbenchmark({longest_match(trie = trie, to_match = labels)})

Unit: milliseconds
                                                  expr      min       lq     mean   median       uq      max neval
 { longest_match(trie = trie, to_match = labels) } 284.6457 285.5902 289.5342 286.8775 288.4564 327.3878   100
```

I think we can call <300 milliseconds for a million matches against an entire set of possible values pretty fast.

## Radix modification There's always the possibility that (horror of horrors) you'll have to add or remove entries from the trie. Fear not; you can do just that with `trie_add` and `trie_remove` respectively, both of which silently modify the trie they're provided with to add or remove whatever key-value pairs you provide: ```{r, eval=FALSE} to_match = "198.0.0.1" trie_inst <- trie(keys = "197", values = "fake range") longest_match(trie_inst, to_match) [1] NA trie_add(trie_inst, keys = "198", values = "home range") longest_match(trie_inst, to_match) [1] "home range" trie_remove(trie_inst, keys = "198") longest_match(trie_inst, to_match) [1] NA ``` ## Metadata and coercion You can also extract information from tries without using them. `dim`, `str`, `print` and `length` all work for tries, and you can use `get_keys(trie)` and `get_values(trie)` to extract, respectively, the keys and values from a trie object. In addition, you can also coerce tries into other R data structures, specifically lists and data.frames: ```{r, eval=FALSE} trie <- trie(keys = c("AO", "AEO", "AAI", "AFT", "QZ", "RF"), values = c("Audobon", "Atlanta", "Ann Arbor", "Austin", "Queensland", "Raleigh")) str(as.data.frame(trie)) 'data.frame': 6 obs. of 2 variables: $ keys : chr "AAI" "AEO" "AFT" "AO" ... $ values: chr "Ann Arbor" "Atlanta" "Austin" "Audobon" ... str(as.list(trie)) List of 2 $ keys : chr [1:6] "AAI" "AEO" "AFT" "AO" ... $ values: chr [1:6] "Ann Arbor" "Atlanta" "Austin" "Audobon" ... ``` ### Other trie operations If you have ideas for other trie-like structures, or functions that would be useful with *these* tries, the best approach is to either [request it](https://github.com/Ironholds/triebeard/issues) or [add it](https://github.com/Ironholds/triebeard/pulls)! 
triebeard/vignettes/rcpp_radix.Rmd0000644000176200001440000000636012742550046017034 0ustar liggesusers
---
title: "Radix trees in Rcpp"
author: "Oliver Keyes"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Radix trees in Rcpp}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

A **radix tree** is a data structure that stores key-value pairs in a way optimised for searching. This makes them very, very good for efficiently matching data against keys, and retrieving the values *associated* with those keys.

`triebeard` provides an implementation of radix trees for Rcpp (and also for use directly in R). To start using radix trees in your Rcpp development, simply modify your C++ file to include at the top:

```{Rcpp, eval=FALSE}
//[[Rcpp::depends(triebeard)]]
#include <radix.h>
```

## Constructing trees

Trees are constructed using the syntax:

```{Rcpp, eval=FALSE}
radix_tree<type1, type2> radix;
```

Where `type1` represents the type of the keys (for example, `std::string`) and `type2` the type of the values.

Radix trees can have any scalar type as keys, although strings are most typical; they can also have any scalar type for values. Once you've constructed a tree, new entries can be added in a very R-like way: `radix[new_key] = new_value;`. Entries can also be removed, with `radix.erase(key)`.

## Matching against trees

We then move on to the fun bit: matching! As mentioned, radix trees are really good for matching arbitrary values against keys (well, keys of the same type) and retrieving the associated values.

There are three types of supported matching: longest, prefix, and greedy.
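
Before going through each mode with `radix_tree`, it may help to see what longest matching is meant to compute. The following is only a rough, self-contained sketch of the semantics, assuming a plain `std::map` as the store; the helper `longest_prefix_value` is hypothetical, is not part of `triebeard`, and is far less efficient than a real radix tree:

```cpp
#include <map>
#include <string>

// Hypothetical helper, for illustration only (not triebeard's API).
// Returns a pointer to the value whose key is the longest prefix of `query`,
// or nullptr when no stored key is a prefix of `query`.
const std::string* longest_prefix_value(const std::map<std::string, std::string>& store,
                                        const std::string& query) {
  // Try progressively shorter prefixes of the query, longest first,
  // so the first hit is by construction the longest matching key.
  for (std::size_t len = query.size(); len > 0; --len) {
    std::map<std::string, std::string>::const_iterator it = store.find(query.substr(0, len));
    if (it != store.end()) {
      return &it->second;
    }
  }
  return nullptr;
}
```

With the keys `"turnin"` and `"turin"` stored, a query of `"turing"` resolves to the entry for `"turin"`, because that is the longest key the query starts with; a query of `"tur"` finds nothing, since neither key is a prefix of it. A radix tree reaches the same answer in a single walk down the tree instead of repeated lookups.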

Longest does exactly what it says on the tin: it finds the key-value pair where the longest initial part of the key matches the arbitrary value:

```{Rcpp, eval=FALSE}
radix_tree<std::string, std::string> radix;
radix["turnin"] = "entry the first";
radix["turin"] = "entry the second";

radix_tree<std::string, std::string>::iterator it;

it = radix.longest_match("turing");

// Compare with ==, not =: end() signals that nothing matched
if(it == radix.end()){
  printf("No match was found :(");
} else {
  std::string result = "Key of longest match: " + it->first + " , value of longest match: " + it->second;
}
```

Prefix matching provides all trie entries where the value-to-match is a *prefix* of the key:

```{Rcpp, eval=FALSE}
radix_tree<std::string, std::string> radix;
radix["turnin"] = "entry the first";
radix["turin"] = "entry the second";

std::vector<radix_tree<std::string, std::string>::iterator> vec;
std::vector<radix_tree<std::string, std::string>::iterator>::iterator it;

// prefix_match() fills vec with an iterator for each matching entry
radix.prefix_match("tur", vec);

if(vec.empty()){
  printf("No match was found :(");
} else {
  for (it = vec.begin(); it != vec.end(); ++it) {
    // *it is itself a tree iterator, so dereference it before using ->
    std::string result = "Key of a prefix match: " + (*it)->first + " , value of a prefix match: " + (*it)->second;
  }
}
```

Greedy matching matches very, very fuzzily (a value of 'bring', for example, will match 'blind', 'bind' and 'binary') and, syntactically, looks exactly the same as prefix-matching, albeit with `radix.greedy_match()` instead of `radix.prefix_match()`.

### Other trie things

If you have ideas for other trie-like structures, or functions that would be useful with *these* tries, the best approach is to either [request it](https://github.com/Ironholds/triebeard/issues) or [add it](https://github.com/Ironholds/triebeard/pulls)!
triebeard/README.md0000644000176200001440000000302312742550046013475 0ustar liggesusers
##triebeard
Fast key-value matching in R and Rcpp

__Author:__ Oliver Keyes, Drew Schmidt, Yuuki Takano
__License:__ [MIT](http://opensource.org/licenses/MIT)
__Status:__ Stable

[![Travis-CI Build Status](https://travis-ci.org/Ironholds/triebeard.svg?branch=master)](https://travis-ci.org/Ironholds/triebeard) ![downloads](http://cranlogs.r-pkg.org/badges/grand-total/triebeard)

###Description
Tries, or [radix trees](https://en.wikipedia.org/wiki/Radix_tree), are key-value data structures optimised for very, very fast matching of the keys against user-provided data (and then the return of the associated values!)

This is pretty useful in data cleaning and value extraction, and tries let you do it *really* efficiently. `triebeard` contains an implementation that can be used both when writing R, and when writing Rcpp (and imported and linked against, to boot).

For more information see:

1. The [vignette on Rcpp usage](https://cran.r-project.org/web/packages/triebeard/vignettes/rcpp_radix.html);
2. The [vignette on R usage](https://cran.r-project.org/web/packages/triebeard/vignettes/r_radix.html).

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.

###Installation

The stable, CRAN-ready version can be retrieved with:

    install.packages("triebeard")

The latest version can be obtained via:

    devtools::install_github("ironholds/triebeard")

###Dependencies
* R.
* [Rcpp](https://cran.r-project.org/package=Rcpp)
triebeard/MD50000644000176200001440000000535212750473341012536 0ustar liggesusers
9023c4e6b78b5f293f3b4a5deda63658 *DESCRIPTION
f94cca6b956b73bb59133806bf8630ed *LICENSE
381bb68bc39e29d78751ed3a23d1e294 *NAMESPACE
562c97b7461f0633162f51bf069dde04 *NEWS
e6680aff516001a7f35a0a8d8a6ee08f *R/RcppExports.R
6b8c5820078e558fed74ac9abde562d4 *R/alter.R
896e6c1354e06b438613b8bb0102aa82 *R/as.R
697b7a8fdaff92effd3d888bc8488548 *R/create.R
da099d8106bd5e288145f294938b44b1 *R/get.R
1536d13d83bc75c5b1051cd496288ad5 *R/match.R
ffbddee925213dc0f96f435e1ac57e37 *R/metadata.R
0b7942e6e1f579a016a4a9460b077465 *R/triebeard.R
f277730d740739a3dcd52afdb78bcc55 *README.md
4afef02ab2a0dcc9046acac6acad44f7 *build/vignette.rds
7ea9df0d8f6b68c39a1a7f779777c07d *inst/doc/r_radix.R
1724dac3c8942746078c278ae7a0e682 *inst/doc/r_radix.Rmd
5cfbc4d40b2cfb8e9267a7885576c9a9 *inst/doc/r_radix.html
d2a945ffc638ce79f3279132bae1d555 *inst/doc/rcpp_radix.R
0c0cb935a3f5e8c8d89f47aaa9da4857 *inst/doc/rcpp_radix.Rmd
e19f6bda8ae1d0b78144ac7582c686bd *inst/doc/rcpp_radix.html
5d7986ce0d8af2be83ef21f2567aec01 *inst/include/radix.h
88d53ac341ef6cd3de95d9fc08c300d4 *inst/include/radix/radix_tree.hpp
95a4bd770c142b6e2d385520db079616 *inst/include/radix/radix_tree_it.hpp
51cc663dd5e66c53629dd472f3010343 *inst/include/radix/radix_tree_node.hpp
4ee34b1a35ef816c0c24906e7ced9c40 *man/alter.Rd
a216fd245b29c517158c009529eaa119 *man/getters.Rd
2b6725804a7d199b82fde933e37e3be6 *man/greedy_match.Rd
eab84d123bf25cbc86e5fb02942941ad *man/longest_match.Rd
aa1939ddac0c0c1e02ede7e70ce7c346 *man/prefix_match.Rd
58acf216d9f0575c1045ad971515fe01 *man/trie.Rd
fffaff8757543bc5d6d1bcd58f8622cb *man/triebeard.Rd
a4791667d570aa373d3accd2c0342f70 *src/Makevars
f5f4205e4fb0e33e147915abb3992a09 *src/RcppExports.cpp
5ee232743bdd76ee685f76d63a87f29c *src/alter.cpp
03602e2613ad2a39164a3ebb546160fe *src/create.cpp
27d4379b5184abfdeed8bbf4517f5390 *src/get.cpp
5723fe9f3174c8702e999c77285c27ae
*src/greedy_match.cpp
289a0cb64a67563d44a3043f13bd976b *src/length.cpp
b521a66e5a98c2f5d202f80f82ae9f50 *src/longest_match.cpp
f523dd303b2714f0d237b94bb127e36d *src/prefix_match.cpp
ff3819918299a03be89550c3d0216dbd *src/r_trie.h
e8e1a1e438ce94018046f61819e269b3 *src/str.cpp
0ce9673389a6629661d5cb3c771ce283 *src/typedef.h
1924dd7d9fc0a96bd905e1505fed3c17 *tests/testthat.R
e97eaea72cb061e52763587e46a40cf2 *tests/testthat/test_alter.R
4625183c39080e1110621d2a69dca991 *tests/testthat/test_convert.R
cc491a0f239228a9767f802806e68ac3 *tests/testthat/test_create.R
b7aea82f87079307342cd8207b08177e *tests/testthat/test_get.R
47648bfd8332d9b3ebe96c73e29d7aae *tests/testthat/test_greedy.R
849b8fc4c248337974d3e06f87ae81ae *tests/testthat/test_longest.R
d754bb22a652c8ba92de00764268b19b *tests/testthat/test_prefix.R
1724dac3c8942746078c278ae7a0e682 *vignettes/r_radix.Rmd
0c0cb935a3f5e8c8d89f47aaa9da4857 *vignettes/rcpp_radix.Rmd
triebeard/build/0000755000176200001440000000000012750461372013321 5ustar liggesusers
triebeard/build/vignette.rds0000644000176200001440000000034012750461372015655 0ustar liggesusers
Description: 'Radix trees', or 'tries', are key-value data structures optimised for efficient lookups, similar in purpose to hash tables. 'triebeard' provides an implementation of 'radix trees' for use in R programming and in developing packages with 'Rcpp'.
License: MIT + file LICENSE
LazyData: TRUE
LinkingTo: Rcpp
Imports: Rcpp
RoxygenNote: 5.0.1
Suggests: knitr, rmarkdown, testthat
VignetteBuilder: knitr
URL: https://github.com/Ironholds/triebeard/
BugReports: https://github.com/Ironholds/triebeard/issues
Date: 2016-08-03
NeedsCompilation: yes
Packaged: 2016-08-03 21:32:42 UTC; ironholds
Repository: CRAN
Date/Publication: 2016-08-04 00:57:37
triebeard/man/0000755000176200001440000000000012742550046012773 5ustar liggesusers
triebeard/man/trie.Rd0000644000176200001440000000175612742550046014236 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/create.R
\name{trie}
\alias{trie}
\title{Create a Trie}
\usage{
trie(keys, values)
}
\arguments{
\item{keys}{a character vector containing the keys for the trie.}

\item{values}{an atomic vector of any type, containing the values to pair with
\code{keys}. Must be the same length as \code{keys}.}
}
\value{
a `trie` object.
}
\description{
\code{trie} creates a trie (a key-value store optimised
for matching) out of a provided character vector of keys, and a numeric,
character, logical or integer vector of values (both the same length).
}
\examples{
# An integer trie
int_trie <- trie(keys = "foo", values = 1L)

# A string trie
str_trie <- trie(keys = "foo", values = "bar")

}
\seealso{
\code{\link{trie_add}} and \code{\link{trie_remove}} for adding to and removing
from tries after their creation, and \code{\link{longest_match}} and other match functions
for matching values against the keys of a created trie.
}
triebeard/man/getters.Rd0000644000176200001440000000074612742550046014746 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/get.R
\name{getters}
\alias{get_keys}
\alias{get_values}
\alias{getters}
\title{Trie Getters}
\usage{
get_keys(trie)

get_values(trie)
}
\arguments{
\item{trie}{A trie object, created with \code{\link{trie}}.}
}
\value{
An atomic vector of keys or values stored in the trie.
}
\description{
"Getters" for the data stored in a trie object. \code{get_keys}
gets the keys, \code{get_values} gets the values.
}
triebeard/man/triebeard.Rd0000644000176200001440000000053312742550046015224 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/triebeard.R
\docType{package}
\name{triebeard}
\alias{triebeard}
\alias{triebeard-package}
\title{Radix trees in Rcpp}
\description{
This package provides access to Radix tree (or "trie") structures
in both R and Rcpp.
}
triebeard/man/greedy_match.Rd0000644000176200001440000000222312742550046015714 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/match.R
\name{greedy_match}
\alias{greedy_match}
\title{Greedily match against a tree}
\usage{
greedy_match(trie, to_match)
}
\arguments{
\item{trie}{a trie object, created with \code{\link{trie}}}

\item{to_match}{a character vector containing the strings to check against the trie's keys.}
}
\value{
a list, the length of \code{to_match}, with each entry containing any trie values
where the \code{to_match} element greedily matches the associated key. In the case that
nothing was found, the entry will contain \code{NA}.
}
\description{
\code{greedy_match} accepts a trie and a character vector
and returns the values associated with any key that is "greedily" (read: fuzzily)
matched against one of the character vector entries.
}
\examples{
trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
             values = c("afford", "affair", "available", "binary", "bind", "blind"))
greedy_match(trie, c("avoid", "bring", "attack"))

}
\seealso{
\code{\link{longest_match}} and \code{\link{prefix_match}} for longest and prefix matching, respectively.
}
triebeard/man/alter.Rd0000644000176200001440000000171612742550046014376 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/alter.R
\name{alter}
\alias{alter}
\alias{trie_add}
\alias{trie_remove}
\title{Add or remove trie entries}
\usage{
trie_add(trie, keys, values)

trie_remove(trie, keys)
}
\arguments{
\item{trie}{a trie object created with \code{\link{trie}}}

\item{keys}{a character vector containing the keys of the entries to
add (or remove). Entries with NA keys will not be added.}

\item{values}{an atomic vector, matching the type of the trie, containing
the values of the entries to add. Entries with NA values will not be added.}
}
\value{
nothing; the trie is modified in-place
}
\description{
\code{trie_add} and \code{trie_remove} allow you to
add or remove entries from tries, respectively.
}
\examples{
trie <- trie("foo", "bar")
length(trie)

trie_add(trie, "baz", "qux")
length(trie)

trie_remove(trie, "baz")
length(trie)
}
\seealso{
\code{\link{trie}} for creating tries in the first place.
}
triebeard/man/prefix_match.Rd0000644000176200001440000000216312742550046015735 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/match.R
\name{prefix_match}
\alias{prefix_match}
\title{Find the prefix matches in a trie}
\usage{
prefix_match(trie, to_match)
}
\arguments{
\item{trie}{a trie object, created with \code{\link{trie}}}

\item{to_match}{a character vector containing the strings to check against the trie's keys.}
}
\value{
a list, the length of \code{to_match}, with each entry containing any trie values
where the \code{to_match} element was a prefix of the associated key. In the case that
nothing was found, the entry will contain \code{NA}.
}
\description{
\code{prefix_match} accepts a trie and a character vector
and returns the values associated with any key that has a particular
character vector entry as a prefix (see the examples).
}
\examples{
trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
             values = c("afford", "affair", "available", "binary", "bind", "blind"))
prefix_match(trie, "aff")

}
\seealso{
\code{\link{longest_match}} and \code{\link{greedy_match}} for longest and greedy matching, respectively.
}
triebeard/man/longest_match.Rd0000644000176200001440000000176412742550046016121 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/match.R
\name{longest_match}
\alias{longest_match}
\title{Find the longest match in a trie}
\usage{
longest_match(trie, to_match)
}
\arguments{
\item{trie}{a trie object, created with \code{\link{trie}}}

\item{to_match}{a character vector containing the strings to match against the trie's keys.}
}
\description{
\code{longest_match} accepts a trie and a character vector
and returns the value associated with whichever key had the \emph{longest match}
to each entry in the character vector. A trie of "binary" and "bind", for example,
with an entry-to-compare of "binder", will match to "bind".
}
\examples{
trie <- trie(keys = c("afford", "affair", "available", "binary", "bind", "blind"),
             values = c("afford", "affair", "available", "binary", "bind", "blind"))
longest_match(trie, "binder")

}
\seealso{
\code{\link{prefix_match}} and \code{\link{greedy_match}} for prefix and greedy matching, respectively.
}
triebeard/LICENSE0000644000176200001440000000005212742550046013222 0ustar liggesusers
YEAR: 2016
COPYRIGHT HOLDER: Oliver Keyes