stringr/inst/doc/stringr.R

## ---- echo=FALSE---------------------------------------------------------
library("stringr")
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)

## ------------------------------------------------------------------------
strings <- c(
  "apple",
  "219 733 8965",
  "329-293-8753",
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

## ------------------------------------------------------------------------
# Which strings contain phone numbers?
str_detect(strings, phone)
str_subset(strings, phone)

## ------------------------------------------------------------------------
# Where in the string is the phone number located?
(loc <- str_locate(strings, phone))
str_locate_all(strings, phone)

## ------------------------------------------------------------------------
# What are the phone numbers?
str_extract(strings, phone)
str_extract_all(strings, phone)
str_extract_all(strings, phone, simplify = TRUE)

## ------------------------------------------------------------------------
# Pull out the three components of the match
str_match(strings, phone)
str_match_all(strings, phone)

## ------------------------------------------------------------------------
str_replace(strings, phone, "XXX-XXX-XXXX")
str_replace_all(strings, phone, "XXX-XXX-XXXX")

## ------------------------------------------------------------------------
col2hex <- function(col) {
  rgb <- col2rgb(col)
  rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
}

# Goal replace colour names in a string with their hex equivalent
strings <- c("Roses are red, violets are blue", "My favourite colour is green")

colours <- str_c("\\b", colors(), "\\b", collapse="|")
# This gets us the colours, but we have no way of replacing them
str_extract_all(strings, colours)

# Instead, let's work with locations
locs <- str_locate_all(strings, colours)
Map(function(string, loc) {
  hex <- col2hex(str_sub(string, loc))
  str_sub(string, loc) <- hex
  string
}, strings, locs)

## ------------------------------------------------------------------------
matches <- col2hex(colors())
names(matches) <- str_c("\\b", colors(), "\\b")

str_replace_all(strings, matches)

stringr/inst/doc/stringr.html

Introduction to stringr

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The stringr package aims to remedy these problems by providing a clean, modern interface to common string operations.

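More concretely, stringr:

- Simplifies string operations by eliminating options that you don't need 95% of the time (the other 5% of the time you can use functions from base R or stringi).
- Uses consistent function names and arguments.
- Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs. It also processes factors and character vectors in the same way.
- Completes R's string handling functions with useful functions from other programming languages.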

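To meet these goals, stringr provides two basic families of functions:

- basic string operations, and
- pattern matching functions which use regular expressions to detect, locate, match, replace, extract, and split strings.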

As of version 1.0, stringr is a thin wrapper around stringi, which implements all the functions in stringr with efficient C code based on the ICU library. Compared to stringi, stringr is considerably simpler: it provides fewer options and fewer functions. This is great when you’re getting started learning string functions, and if you do need more of stringi’s power, you should find the interface similar.

The two families of functions are described in more detail in the following sections.

Basic string operations

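There are four string functions that are closely related to their base R equivalents, but with a few enhancements:

- str_c() is equivalent to paste(), but it uses the empty string ("") as the default separator and silently removes NULL inputs.
- str_length() is equivalent to nchar(), but it preserves NA's (rather than giving them length 2) and converts factors to characters (not integers).
- str_sub() is equivalent to substr() but it returns a zero length vector if any of its inputs are zero length, and otherwise expands each argument to match the longest. It also accepts negative positions, which are counted backwards from the last character. The end position defaults to -1, which corresponds to the last character.
- str_sub<- is equivalent to substr<-, but like str_sub() it understands negative indices, and replacement strings do not need to be the same length as the string they are replacing.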
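As a minimal sketch of how these base-R counterparts behave, the calls below mirror cases from the package's own test files; the variable name `x` is only illustrative.

```r
library(stringr)

# str_c() joins with "" by default and silently drops NULL inputs
str_c("a", "b", NULL, "c")
str_c(letters[1:3], collapse = "")

# str_length() keeps missing values missing and measures factor levels
str_length(c("abc", NA))
str_length(factor("abc"))

# Negative positions count backwards from the end of the string
str_sub("ABCDEF", -3)
str_sub("ABCDEF", 1, -4)

# The replacement does not need to match the length of the text it replaces
x <- "ABCDEF"
str_sub(x, -2, -1) <- "EFGH"
x
```

Note that the assignment form modifies `x` in place, in the same way as `substr<-`.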

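Three functions add new functionality:

- str_dup() to duplicate the characters within a string.
- str_trim() to remove leading and trailing whitespace.
- str_pad() to pad a string with extra whitespace on the left, right, or both sides.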
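The three additions can be tried out in a few lines. This is a small sketch rather than a full tour; the inputs are taken from the package tests, and the argument names `side` and `width` are written out for readability.

```r
library(stringr)

# Duplicate whole strings; the count is vectorised too
str_dup("abc", 2)
str_dup(c("a", "b"), c(2, 3))

# Trim whitespace from both sides, or from one side only
str_trim("  abc  ")
str_trim(" abc ", side = "left")

# Pad to a fixed width on the left, right, or both sides
str_pad("had", width = 10, side = "left")
str_pad("had", width = 10, side = "right")
str_pad("had", width = 10, side = "both")
```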

Pattern matching

stringr provides pattern matching functions to detect, locate, extract, match, replace, and split strings. I’ll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers:

strings <- c(
  "apple", 
  "219 733 8965", 
  "329-293-8753", 
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
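The sketch below applies each family of pattern matching functions to these strings. The functions are the ones exported by stringr; the result names (`detected`, `with_phone`, and so on) are illustrative only.

```r
library(stringr)

strings <- c(
  "apple",
  "219 733 8965",
  "329-293-8753",
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

# Detect: which strings contain a phone number? (similar to grepl())
detected <- str_detect(strings, phone)
# Subset: keep only the matching strings (similar to grep(value = TRUE))
with_phone <- str_subset(strings, phone)

# Locate: start and end of the first match, or of every match
first_loc <- str_locate(strings, phone)
all_locs <- str_locate_all(strings, phone)

# Extract: text of the first match, or of every match
first_num <- str_extract(strings, phone)
all_nums <- str_extract_all(strings, phone)

# Match: the complete match plus one column per capture group
parts <- str_match(strings, phone)

# Replace: mask every phone number (similar to gsub())
masked <- str_replace_all(strings, phone, "XXX-XXX-XXXX")
```

The `_all` variants return lists because a single input string, such as the last one here, can contain more than one match.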

Arguments

Each pattern matching function has the same first two arguments, a character vector of strings to process and a single pattern (regular expression) to match. The replace functions have an additional argument specifying the replacement string, and the split functions have an argument to specify the number of pieces.
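To make the extra arguments concrete, here is a short sketch using the example string from the package's split tests; `subject` is just an illustrative name.

```r
library(stringr)

subject <- "Subject: Roger: his drinking problems"

# n caps the number of pieces returned by str_split()
str_split(subject, ": ")
str_split(subject, ": ", n = 2)

# str_split_fixed() always returns n columns, padding with ""
str_split_fixed(subject, ": ", n = 4)

# The replace functions take the replacement as a third argument
str_replace("329-293-8753", "-", " ")
str_replace_all("329-293-8753", "-", " ")
```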

Unlike base string functions, stringr offers control over matching not through arguments, but through modifier functions: regex(), coll() and fixed(). This is a deliberate choice made to simplify these functions. For example, while grepl() has six arguments, str_detect() only has two.
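A brief sketch of the modifiers in use, based on the cases in the package's detect tests. A bare pattern is treated as a regular expression; wrapping it changes how it is interpreted.

```r
library(stringr)

# A bare pattern is a case-sensitive regular expression
str_detect("ab", "AB")

# regex() exposes matching options such as case insensitivity
str_detect("ab", regex("AB", ignore_case = TRUE))

# fixed() matches the pattern literally, so "[c]" is no longer a character class
str_detect("abc", "ab[c]")
str_detect("abc", fixed("ab[c]"))
str_detect("ab[c]", fixed("ab[c]"))
```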

Regular expressions

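To be able to use these functions effectively, you’ll need a good knowledge of regular expressions, which this vignette is not going to teach you. Some useful tools to get you started:

- A good reference sheet (http://www.regular-expressions.info/reference.html).
- A tool that allows you to interactively test (http://gskinner.com/RegExr/) what a regular expression will match.
- A tool to build a regular expression (http://www.txt2re.com) from an input string.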

When writing regular expressions, I strongly recommend generating a list of positive (pattern should match) and negative (pattern shouldn’t match) test cases to ensure that you are matching the correct components.
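One possible way to keep such test cases next to the pattern is sketched below. The `positive` and `negative` vectors are made-up examples for the phone pattern used earlier, not cases from the package.

```r
library(stringr)

phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

# Inputs the pattern should match
positive <- c("219 733 8965", "329-293-8753", "543.355.3679")
# Inputs the pattern should not match: no digits, area code starting
# with 1, too few groups, and no separators
negative <- c("apple", "119 733 8965", "219-733", "2197338965")

# Fail loudly if the pattern drifts away from the intended behaviour
stopifnot(all(str_detect(positive, phone)))
stopifnot(!any(str_detect(negative, phone)))
```

Re-running the two checks after every change to the pattern gives quick feedback on whether a tweak broke a case that used to work.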

Functions that return lists

Many of the functions return a list of vectors or matrices. To work with each element of the list there are two strategies: iterate through a common set of indices, or use Map() to iterate through the vectors simultaneously. The second strategy is illustrated below:

# Convert colour names to "#RRGGBB" hex strings
col2hex <- function(col) {
  rgb <- col2rgb(col)
  rgb(rgb["red", ], rgb["green", ], rgb["blue", ], maxColorValue = 255)
}

# Goal: replace colour names in a string with their hex equivalent
strings <- c("Roses are red, violets are blue", "My favourite colour is green")

colours <- str_c("\\b", colors(), "\\b", collapse="|")
# This gets us the colours, but we have no way of replacing them
str_extract_all(strings, colours)
#> [[1]]
#> [1] "red"  "blue"
#> 
#> [[2]]
#> [1] "green"

# Instead, let's work with locations
locs <- str_locate_all(strings, colours)
Map(function(string, loc) {
  hex <- col2hex(str_sub(string, loc))
  str_sub(string, loc) <- hex
  string
}, strings, locs)
#> $`Roses are red, violets are blue`
#> [1] "Roses are #FF0000, violets are blue"
#> [2] "Roses are red, violets are #0000FF" 
#> 
#> $`My favourite colour is green`
#> [1] "My favourite colour is #00FF00"
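For comparison, the first strategy (iterating through a common set of indices) might look like the sketch below. It assumes the same inputs as the Map() example; the names `out`, `x` and `i` are illustrative.

```r
library(stringr)

col2hex <- function(col) {
  rgb <- col2rgb(col)
  rgb(rgb["red", ], rgb["green", ], rgb["blue", ], maxColorValue = 255)
}

strings <- c("Roses are red, violets are blue", "My favourite colour is green")
colours <- str_c("\\b", colors(), "\\b", collapse = "|")
locs <- str_locate_all(strings, colours)

# Walk over strings and locs with a shared index instead of Map()
out <- vector("list", length(strings))
for (i in seq_along(strings)) {
  x <- strings[i]
  hex <- col2hex(str_sub(x, locs[[i]]))
  str_sub(x, locs[[i]]) <- hex
  out[[i]] <- x
}
out
```

As with Map(), a string containing several colours yields one result per match, each with a single colour replaced.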

Another approach is to use the second form of str_replace_all(): if you give it a named vector, it applies each pattern = replacement in turn:

matches <- col2hex(colors())
names(matches) <- str_c("\\b", colors(), "\\b")

str_replace_all(strings, matches)
#> [1] "Roses are #FF0000, violets are #0000FF"
#> [2] "My favourite colour is #00FF00"
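The named-vector form is easier to see with a handful of patterns. In this small sketch the `fruits` and `numbers` vectors are illustrative.

```r
library(stringr)

fruits <- c("one apple", "two pears", "three bananas")

# Names are patterns, values are their replacements
numbers <- c("one" = "1", "two" = "2", "three" = "3")

str_replace_all(fruits, numbers)
```

Every pattern is applied to every string, so the input vector keeps its length.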

Conclusion

stringr provides an opinionated interface to strings in R. It makes string processing simpler by removing uncommon options, and by vigorously enforcing consistency across functions. I have also added new functions that I have found useful from Ruby, and over time, I hope users will suggest useful functions from other programming languages. I will continue to build on the included test suite to ensure that the package behaves as expected and remains bug free.

stringr/inst/doc/stringr.Rmd

---
title: "Introduction to stringr"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to stringr}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo=FALSE}
library("stringr")
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
```

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The __stringr__ package aims to remedy these problems by providing a clean, modern interface to common string operations.

More concretely, stringr:

-   Simplifies string operations by eliminating options that you don't need
    95% of the time (the other 5% of the time you can use functions from base R
    or [stringi](https://github.com/Rexamine/stringi/)).

-   Uses consistent function names and arguments.

-   Produces outputs that can easily be used as inputs. This includes ensuring
    that missing inputs result in missing outputs, and zero length inputs
    result in zero length outputs. It also processes factors and character
    vectors in the same way.

-   Completes R's string handling functions with useful functions from other
    programming languages.

To meet these goals, stringr provides two basic families of functions:

-   basic string operations, and

-   pattern matching functions which use regular expressions to detect,
    locate, match, replace, extract, and split strings.

As of version 1.0, stringr is a thin wrapper around [stringi](https://github.com/Rexamine/stringi/), which implements all the functions in stringr with efficient C code based on the [ICU library](http://site.icu-project.org). Compared to stringi, stringr is considerably simpler: it provides fewer options and fewer functions. This is great when you're getting started learning string functions, and if you do need more of stringi's power, you should find the interface similar.

The two families of functions are described in more detail in the following sections.

## Basic string operations

There are four string functions that are closely related to their base R equivalents, but with a few enhancements:

-   `str_c()` is equivalent to `paste()`, but it uses the empty string ("") as
    the default separator and silently removes `NULL` inputs.

-   `str_length()` is equivalent to `nchar()`, but it preserves NA's (rather than
    giving them length 2) and converts factors to characters (not integers).

-   `str_sub()` is equivalent to `substr()` but it returns a zero length vector
    if any of its inputs are zero length, and otherwise expands each argument
    to match the longest. It also accepts negative positions, which are
    counted backwards from the last character. The end position defaults to
    `-1`, which corresponds to the last character.

-   `str_sub<-` is equivalent to `substr<-`, but like `str_sub()` it understands
    negative indices, and replacement strings do not need to be the same length
    as the string they are replacing.

Three functions add new functionality:

-   `str_dup()` to duplicate the characters within a string.

-   `str_trim()` to remove leading and trailing whitespace.

-   `str_pad()` to pad a string with extra whitespace on the left, right, or
    both sides.

## Pattern matching

stringr provides pattern matching functions to **detect**, **locate**, **extract**, **match**, **replace**, and **split** strings. I'll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers:

```{r}
strings <- c(
  "apple",
  "219 733 8965",
  "329-293-8753",
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
```

-   `str_detect()` detects the presence or absence of a pattern and returns a
    logical vector (similar to `grepl()`). `str_subset()` returns the elements
    of a character vector that match a regular expression (similar to `grep()`
    with `value = TRUE`).

    ```{r}
    # Which strings contain phone numbers?
    str_detect(strings, phone)
    str_subset(strings, phone)
    ```

-   `str_locate()` locates the first position of a pattern and returns a numeric
    matrix with columns start and end. `str_locate_all()` locates all matches,
    returning a list of numeric matrices. Similar to `regexpr()` and `gregexpr()`.

    ```{r}
    # Where in the string is the phone number located?
    (loc <- str_locate(strings, phone))
    str_locate_all(strings, phone)
    ```

-   `str_extract()` extracts text corresponding to the first match, returning a
    character vector. `str_extract_all()` extracts all matches and returns a
    list of character vectors.

    ```{r}
    # What are the phone numbers?
    str_extract(strings, phone)
    str_extract_all(strings, phone)
    str_extract_all(strings, phone, simplify = TRUE)
    ```

-   `str_match()` extracts capture groups formed by `()` from the first match.
    It returns a character matrix with one column for the complete match and
    one column for each group. `str_match_all()` extracts capture groups from
    all matches and returns a list of character matrices. Similar to
    `regmatches()`.

    ```{r}
    # Pull out the three components of the match
    str_match(strings, phone)
    str_match_all(strings, phone)
    ```

-   `str_replace()` replaces the first matched pattern and returns a character
    vector. `str_replace_all()` replaces all matches. Similar to `sub()` and
    `gsub()`.

    ```{r}
    str_replace(strings, phone, "XXX-XXX-XXXX")
    str_replace_all(strings, phone, "XXX-XXX-XXXX")
    ```

-   `str_split_fixed()` splits the string into a fixed number of pieces based
    on a pattern and returns a character matrix. `str_split()` splits a string
    into a variable number of pieces and returns a list of character vectors.

### Arguments

Each pattern matching function has the same first two arguments, a character vector of `string`s to process and a single `pattern` (regular expression) to match. The replace functions have an additional argument specifying the replacement string, and the split functions have an argument to specify the number of pieces.

Unlike base string functions, stringr offers control over matching not through arguments, but through modifier functions: `regex()`, `coll()` and `fixed()`. This is a deliberate choice made to simplify these functions. For example, while `grepl()` has six arguments, `str_detect()` only has two.

### Regular expressions

To be able to use these functions effectively, you'll need a good knowledge of regular expressions, which this vignette is not going to teach you. Some useful tools to get you started:

-   A good [reference sheet](http://www.regular-expressions.info/reference.html).

-   A tool that allows you to [interactively test](http://gskinner.com/RegExr/)
    what a regular expression will match.

-   A tool to [build a regular expression](http://www.txt2re.com) from an
    input string.

When writing regular expressions, I strongly recommend generating a list of positive (pattern should match) and negative (pattern shouldn't match) test cases to ensure that you are matching the correct components.

### Functions that return lists

Many of the functions return a list of vectors or matrices. To work with each element of the list there are two strategies: iterate through a common set of indices, or use `Map()` to iterate through the vectors simultaneously. The second strategy is illustrated below:

```{r}
col2hex <- function(col) {
  rgb <- col2rgb(col)
  rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
}

# Goal: replace colour names in a string with their hex equivalent
strings <- c("Roses are red, violets are blue", "My favourite colour is green")

colours <- str_c("\\b", colors(), "\\b", collapse="|")
# This gets us the colours, but we have no way of replacing them
str_extract_all(strings, colours)

# Instead, let's work with locations
locs <- str_locate_all(strings, colours)
Map(function(string, loc) {
  hex <- col2hex(str_sub(string, loc))
  str_sub(string, loc) <- hex
  string
}, strings, locs)
```

Another approach is to use the second form of `str_replace_all()`: if you give it a named vector, it applies each `pattern = replacement` in turn:

```{r}
matches <- col2hex(colors())
names(matches) <- str_c("\\b", colors(), "\\b")

str_replace_all(strings, matches)
```

## Conclusion

stringr provides an opinionated interface to strings in R. It makes string processing simpler by removing uncommon options, and by vigorously enforcing consistency across functions. I have also added new functions that I have found useful from Ruby, and over time, I hope users will suggest useful functions from other programming languages. I will continue to build on the included test suite to ensure that the package behaves as expected and remains bug free.
stringr/tests/0000755000175100001440000000000012435640121013120 5ustar hornikusersstringr/tests/testthat.R0000644000175100001440000000007212435640121015102 0ustar hornikuserslibrary(testthat) library(stringr) test_check("stringr") stringr/tests/testthat/0000755000175100001440000000000012520375150014762 5ustar hornikusersstringr/tests/testthat/test-trim.r0000644000175100001440000000114712435640121017076 0ustar hornikuserscontext("Trimming strings") test_that("trimming removes spaces", { is_trimmed <- equals("abc") expect_that(str_trim("abc "), is_trimmed) expect_that(str_trim(" abc"), is_trimmed) expect_that(str_trim(" abc "), is_trimmed) }) test_that("trimming removes tabs", { is_trimmed <- equals("abc") expect_that(str_trim("abc\t"), is_trimmed) expect_that(str_trim("\tabc"), is_trimmed) expect_that(str_trim("\tabc\t"), is_trimmed) }) test_that("side argument restricts trimming", { expect_that(str_trim(" abc ", "left"), equals("abc ")) expect_that(str_trim(" abc ", "right"), equals(" abc")) }) stringr/tests/testthat/test-sub.r0000644000175100001440000000340012435640121016706 0ustar hornikuserscontext("Extracting substrings") alphabet <- str_c(letters, collapse = "") test_that("correct substring extracted", { expect_that(str_sub(alphabet, 1, 3), equals("abc")) expect_that(str_sub(alphabet, 24, 26), equals("xyz")) }) test_that("arguments expanded to longest", { alphabet <- str_c(letters, collapse = "") expect_that( str_sub(alphabet, c(1, 24), c(3, 26)), equals(c("abc", "xyz"))) expect_that( str_sub(c("abc", "xyz"), 2, 2), equals(c("b", "y"))) }) test_that("specifying only end subsets from start", { expect_that(str_sub(alphabet, end = 3), equals(c("abc"))) }) test_that("specifying only start subsets to end", { expect_that(str_sub(alphabet, 24), equals(c("xyz"))) }) test_that("specifying -1 as end selects entire string", { expect_that( str_sub("ABCDEF", c(4, 5), c(5, -1)), equals(c("DE", "EF")) ) expect_that( str_sub("ABCDEF", c(4, 5), c(-1, -1)), equals(c("DEF", 
"EF")) ) }) test_that("negative values select from end", { expect_that(str_sub("ABCDEF", 1, -4), equals("ABC")) expect_that(str_sub("ABCDEF", -3), equals("DEF")) }) test_that("missing arguments give missing results", { expect_that(str_sub(NA), equals(NA_character_)) expect_that(str_sub(NA, 1, 3), equals(NA_character_)) expect_that(str_sub(c(NA, "NA"), 1, 3), equals(c(NA, "NA"))) expect_that(str_sub("test", NA, NA), equals(NA_character_)) expect_that(str_sub(c(NA, "test"), NA, NA), equals(rep(NA_character_, 2))) }) test_that("replacement works", { x <- "BBCDEF" str_sub(x, 1, 1) <- "A" expect_that(x, equals("ABCDEF")) str_sub(x, -1, -1) <- "K" expect_that(x, equals("ABCDEK")) str_sub(x, -2, -1) <- "EFGH" expect_that(x, equals("ABCDEFGH")) str_sub(x, 2, -2) <- "" expect_that(x, equals("AH")) }) stringr/tests/testthat/test-length.r0000644000175100001440000000116312435640121017402 0ustar hornikuserscontext("String length") test_that("str_length is number of characters", { expect_that(str_length("a"), equals(1)) expect_that(str_length("ab"), equals(2)) expect_that(str_length("abc"), equals(3)) }) test_that("str_length of missing string is missing", { expect_that(str_length(NA), equals(NA_integer_)) expect_that(str_length(c(NA, 1)), equals(c(NA, 1))) expect_that(str_length("NA"), equals(2)) }) test_that("str_length of factor is length of level", { expect_that(str_length(factor("a")), equals(1)) expect_that(str_length(factor("ab")), equals(2)) expect_that(str_length(factor("abc")), equals(3)) }) stringr/tests/testthat/test-extract.r0000644000175100001440000000071112435640121017571 0ustar hornikuserscontext("Extract patterns") test_that("single pattern extracted correctly", { test <- c("one two three", "a b c") expect_that( str_extract_all(test, "[a-z]+"), equals(list(c("one", "two", "three"), c("a", "b", "c")))) expect_that( str_extract_all(test, "[a-z]{3,}"), equals(list(c("one", "two", "three"), character()))) }) test_that("no match yields empty vector", { 
expect_equal(str_extract_all("a", "b")[[1]], character()) }) stringr/tests/testthat/test-detect.r0000644000175100001440000000140112435640121017364 0ustar hornikuserscontext("Detecting patterns") test_that("special cases are correct", { expect_that(str_detect(NA, "x"), equals(NA)) expect_that(str_detect(character(), "x"), equals(logical())) }) test_that("vectorised patterns work", { expect_that(str_detect("ab", c("a", "b", "c")), equals(c(T, T, F))) expect_that(str_detect(c("ca", "ab"), c("a", "c")), equals(c(T, F))) }) test_that("modifiers work", { expect_that(str_detect("ab", "AB"), equals(FALSE)) expect_that(str_detect("ab", regex("AB", TRUE)), equals(TRUE)) expect_that(str_detect("abc", "ab[c]"), equals(TRUE)) expect_that(str_detect("abc", fixed("ab[c]")), equals(FALSE)) expect_that(str_detect("ab[c]", fixed("ab[c]")), equals(TRUE)) expect_that(str_detect("abc", "(?x)a b c"), equals(TRUE)) }) stringr/tests/testthat/test-split.r0000644000175100001440000000432212440403712017253 0ustar hornikuserscontext("Splitting strings") test_that("special cases are correct", { expect_that(str_split(NA, "")[[1]], equals(NA_character_)) expect_that(str_split(character(), ""), equals(list())) }) test_that("str_split functions as expected", { test <- c("bab", "cac", "dadad") result <- str_split(test, "a") expect_that(result, is_a("list")) expect_that(length(result), equals(3)) lengths <- vapply(result, length, integer(1)) expect_that(lengths, equals(c(2, 2, 3))) expect_that(result, equals( list(c("b", "b"), c("c", "c"), c("d", "d", "d")))) }) test_that("vectors give correct results dealt with correctly", { test <- c("bab", "cac", "dadad", "eae") result <- str_split_fixed(test, "a", 3) expect_that(result, is_a("matrix")) expect_that(nrow(result), equals(4)) expect_that(ncol(result), equals(3)) expect_that(result[1, ], equals(c("b", "b", ""))) expect_that(result[3, ], equals(c("d", "d", "d"))) expect_that(result[, 1], equals(c("b", "c", "d", "e"))) }) test_that("n sets maximum 
number of splits in str_split", { test <- "Subject: Roger: his drinking problems" expect_that(length(str_split(test, ": ")[[1]]), equals(3)) expect_that(length(str_split(test, ": ", 4)[[1]]), equals(3)) expect_that(length(str_split(test, ": ", 3)[[1]]), equals(3)) expect_that(length(str_split(test, ": ", 2)[[1]]), equals(2)) expect_that(length(str_split(test, ": ", 1)[[1]]), equals(1)) expect_that( str_split(test, ": ", 3)[[1]], equals(c("Subject", "Roger", "his drinking problems"))) expect_that( str_split(test, ": ", 2)[[1]], equals(c("Subject", "Roger: his drinking problems"))) }) test_that("n sets exact number of splits in str_split_fixed", { test <- "Subject: Roger: his drinking problems" expect_that(ncol(str_split_fixed(test, ": ", 4)), equals(4)) expect_that(ncol(str_split_fixed(test, ": ", 3)), equals(3)) expect_that(ncol(str_split_fixed(test, ": ", 2)), equals(2)) expect_that(ncol(str_split_fixed(test, ": ", 1)), equals(1)) expect_that( str_split_fixed(test, ": ", 3)[1, ], equals(c("Subject", "Roger", "his drinking problems"))) expect_that( str_split_fixed(test, ": ", 2)[1, ], equals(c("Subject", "Roger: his drinking problems"))) }) stringr/tests/testthat/test-match.r0000644000175100001440000000353512513530736017231 0ustar hornikuserscontext("Matching groups") set.seed(1410) num <- matrix(sample(9, 10 * 10, rep = T), ncol = 10) num_flat <- apply(num, 1, str_c, collapse = "") phones <- str_c( "(", num[, 1], num[ ,2], num[, 3], ") ", num[, 4], num[, 5], num[, 6], " ", num[, 7], num[, 8], num[, 9], num[, 10]) test_that("special case are correct", { expect_equal(str_match(NA, "(a)"), matrix(NA_character_)) expect_equal(str_match(character(), "(a)"), matrix(character(), 0, 1)) }) test_that("no matching cases returns 1 column matrix", { res <- str_match(c("a", "b"), ".") expect_that(nrow(res), equals(2)) expect_that(ncol(res), equals(1)) expect_that(res[, 1], equals(c("a", "b"))) }) test_that("single match works when all match", { matches <- str_match(phones, 
"\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") expect_that(nrow(matches), equals(length(phones))) expect_that(ncol(matches), equals(4)) expect_that(matches[, 1], equals(phones)) matches_flat <- apply(matches[, -1], 1, str_c, collapse = "") expect_that(matches_flat, equals(num_flat)) }) test_that("match returns NA when some inputs don't match", { matches <- str_match(c(phones, "blah", NA), "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") expect_that(nrow(matches), equals(length(phones) + 2)) expect_that(ncol(matches), equals(4)) expect_that(matches[11, ], equals(rep(NA_character_, 4))) expect_that(matches[12, ], equals(rep(NA_character_, 4))) }) test_that("match returns NA when optional group doesn't match", { expect_equal(str_match(c("ab", "a"), "(a)(b)?")[,3], c("b", NA)) }) test_that("multiple match works", { phones_one <- str_c(phones, collapse = " ") multi_match <- str_match_all(phones_one, "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") single_matches <- str_match(phones, "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") expect_that(multi_match[[1]], equals(single_matches)) }) stringr/tests/testthat/test-dup.r0000644000175100001440000000071012435640121016706 0ustar hornikuserscontext("Duplicating strings") test_that("basic duplication works", { expect_that(str_dup("a", 3), equals("aaa")) expect_that(str_dup("abc", 2), equals("abcabc")) expect_that(str_dup(c("a", "b"), 2), equals(c("aa", "bb"))) expect_that(str_dup(c("a", "b"), c(2, 3)), equals(c("aa", "bbb"))) }) test_that("0 duplicates equals empty string", { expect_that(str_dup("a", 0), equals("")) expect_that(str_dup(c("a", "b"), 0), equals(rep("", 2))) }) stringr/tests/testthat/test-pad.r0000644000175100001440000000112512435640121016663 0ustar hornikuserscontext("Test padding") test_that("long strings are unchanged", { lengths <- sample(40:100, 10) strings <- vapply(lengths, function(x) str_c(letters[sample(26, x, rep = T)], collapse = ""), character(1)) padded <- str_pad(strings, width = 30) expect_that(str_length(padded), 
equals(str_length(padded))) }) test_that("directions work for simple case", { pad <- function(direction) str_pad("had", direction, width = 10) expect_that(pad("right"), equals("had ")) expect_that(pad("left"), equals(" had")) expect_that(pad("both"), equals(" had ")) }) stringr/tests/testthat/test-locate.r0000644000175100001440000000222512435640121017370 0ustar hornikuserscontext("Locations") test_that("basic location matching works", { expect_that(str_locate("abc", "a")[1, ], equals(c(1, 1), check.attributes = F)) expect_that(str_locate("abc", "b")[1, ], equals(c(2, 2), check.attributes = F)) expect_that(str_locate("abc", "c")[1, ], equals(c(3, 3), check.attributes = F)) expect_that(str_locate("abc", ".+")[1, ], equals(c(1, 3), check.attributes = F)) }) test_that("locations are integers", { strings <- c("a b c", "d e f") expect_that(is.integer(str_locate(strings, "[a-z]")), is_true()) res <- str_locate_all(strings, "[a-z]")[[1]] expect_that(is.integer(res), is_true()) expect_that(is.integer(invert_match(res)), is_true()) }) test_that("both string and patterns are vectorised", { strings <- c("abc", "def") locs <- str_locate(strings, "a") expect_that(locs[, "start"], equals(c(1, NA))) locs <- str_locate(strings, c("a", "d")) expect_that(locs[, "start"], equals(c(1, 1))) expect_that(locs[, "end"], equals(c(1, 1))) locs <- str_locate_all(c("abab"), c("a", "b")) expect_that(locs[[1]][, "start"], equals(c(1, 3))) expect_that(locs[[2]][, "start"], equals(c(2, 4))) }) stringr/tests/testthat/test-count.r0000644000175100001440000000054312435640121017252 0ustar hornikuserscontext("Counting matches") test_that("counts are as expected", { fruit <- c("apple", "banana", "pear", "pineapple") expect_equal(str_count(fruit, "a"), c(1, 3, 1, 1)) expect_equal(str_count(fruit, "p"), c(2, 0, 1, 3)) expect_equal(str_count(fruit, "e"), c(1, 0, 1, 2)) expect_equal(str_count(fruit, c("a", "b", "p", "n")), c(1, 1, 1, 1)) }) 
stringr/tests/testthat/test-join.r0000644000175100001440000000064012435640121017057 0ustar hornikuserscontext("Joining strings") test_that("basic case works", { test <- c("a", "b", "c") expect_that(str_c(test), equals(test)) expect_that(str_c(test, sep = " "), equals(test)) expect_that(str_c(test, collapse = ""), equals("abc")) }) test_that("NULLs are dropped", { test <- letters[1:3] expect_equal(str_c(test, NULL), test) expect_equal(str_c(test, NULL, "a", sep = " "), c("a a", "b a", "c a")) }) stringr/NAMESPACE0000644000175100001440000000143212442343223013176 0ustar hornikusers# Generated by roxygen2 (4.1.0): do not edit by hand export("%>%") export("str_sub<-") export(boundary) export(coll) export(fixed) export(ignore.case) export(invert_match) export(perl) export(regex) export(str_c) export(str_conv) export(str_count) export(str_detect) export(str_dup) export(str_extract) export(str_extract_all) export(str_join) export(str_length) export(str_locate) export(str_locate_all) export(str_match) export(str_match_all) export(str_order) export(str_pad) export(str_replace) export(str_replace_all) export(str_replace_na) export(str_sort) export(str_split) export(str_split_fixed) export(str_sub) export(str_subset) export(str_to_lower) export(str_to_title) export(str_to_upper) export(str_trim) export(str_wrap) export(word) import(stringi) importFrom(magrittr,"%>%") stringr/R/0000755000175100001440000000000012513233473012164 5ustar hornikusersstringr/R/utils.R0000644000175100001440000000021312435640121013436 0ustar hornikusers#' Pipe operator #' #' @name %>% #' @rdname pipe #' @keywords internal #' @export #' @importFrom magrittr %>% #' @usage lhs \%>\% rhs NULL stringr/R/wrap.r0000644000175100001440000000242612452577753013342 0ustar hornikusers#' Wrap strings into nicely formatted paragraphs. #' #' This is a wrapper around \code{\link[stringi]{stri_wrap}} which implements #' the Knuth-Plass paragraph wrapping algorithm. 
#' #' @param string character vector of strings to reformat. #' @param width positive integer giving target line width in characters. A #' width less than or equal to 1 will put each word on its own line. #' @param indent non-negative integer giving indentation of first line in #' each paragraph #' @param exdent non-negative integer giving indentation of following lines in #' each paragraph #' @return A character vector of re-wrapped strings. #' @export #' @examples #' thanks_path <- file.path(R.home("doc"), "THANKS") #' thanks <- str_c(readLines(thanks_path), collapse = "\n") #' thanks <- word(thanks, 1, 3, fixed("\n\n")) #' cat(str_wrap(thanks), "\n") #' cat(str_wrap(thanks, width = 40), "\n") #' cat(str_wrap(thanks, width = 60, indent = 2), "\n") #' cat(str_wrap(thanks, width = 60, exdent = 2), "\n") #' cat(str_wrap(thanks, width = 0, exdent = 2), "\n") str_wrap <- function(string, width = 80, indent = 0, exdent = 0) { if (width <= 0) width <- 1 out <- stri_wrap(string, width = width, indent = indent, exdent = exdent, simplify = FALSE) vapply(out, str_c, collapse = "\n", character(1)) } stringr/R/subset.R0000644000175100001440000000206712440371067013622 0ustar hornikusers#' Keep strings matching a pattern. #' #' This is a convenient wrapper around \code{x[str_detect(x, pattern)]}. #' Vectorised over \code{string} and \code{pattern} #' #' @inheritParams str_detect #' @return A character vector. #' @seealso \code{\link{grep}} with argument \code{value = TRUE}, #' \code{\link[stringi]{stri_subset}} for the underlying implementation. #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pinapple") #' str_subset(fruit, "a") #' str_subset(fruit, "^a") #' str_subset(fruit, "a$") #' str_subset(fruit, "b") #' str_subset(fruit, "[aeiou]") #' #' # Missings are silently dropped #' str_subset(c("a", NA, "b"), ".") str_subset <- function(string, pattern) { switch(type(pattern), empty = , bound = stop("Not implemented", call. 
= FALSE), fixed = stri_subset_fixed(string, pattern, omit_na = TRUE), coll = stri_subset_coll(string, pattern, omit_na = TRUE, opts_collator = attr(pattern, "options")), regex = stri_subset_regex(string, pattern, omit_na = TRUE, opts_regex = attr(pattern, "options")) ) } stringr/R/case.R0000644000175100001440000000135012436074132013220 0ustar hornikusers#' Convert case of a string. #' #' @param string String to modify #' @param locale Locale to use for translations. #' @examples #' dog <- "The quick brown dog" #' str_to_upper(dog) #' str_to_lower(dog) #' str_to_title(dog) #' #' # Locale matters! #' str_to_upper("i", "en") # English #' str_to_upper("i", "tr") # Turkish #' @name case NULL #' @export #' @rdname case str_to_upper <- function(string, locale = "") { stri_trans_toupper(string, locale = locale) } #' @export #' @rdname case str_to_lower <- function(string, locale = "") { stri_trans_tolower(string, locale = locale) } #' @export #' @rdname case str_to_title <- function(string, locale = "") { stri_trans_totitle(string, opts_brkiter = stri_opts_brkiter(locale = locale)) } stringr/R/split.r0000644000175100001440000000474112442343250013504 0ustar hornikusers#' Split up a string into pieces. #' #' Vectorised over \code{string} and \code{pattern}. #' #' @inheritParams str_detect #' @param n number of pieces to return. Default (Inf) uses all #' possible split positions. #' #' For \code{str_split_fixed}, if n is greater than the number of pieces, #' the result will be padded with empty strings. #' @return For \code{str_split_fixed}, a character matrix with \code{n} columns. #' For \code{str_split}, a list of character vectors. #' @seealso \code{\link{stri_split}} for the underlying implementation. 
#' @export #' @examples #' fruits <- c( #' "apples and oranges and pears and bananas", #' "pineapples and mangos and guavas" #' ) #' #' str_split(fruits, " and ") #' #' # Specify n to restrict the number of possible matches #' str_split(fruits, " and ", n = 3) #' str_split(fruits, " and ", n = 2) #' # If n greater than number of pieces, no padding occurs #' str_split(fruits, " and ", n = 5) #' #' # Use fixed to return a character matrix #' str_split_fixed(fruits, " and ", 3) #' str_split_fixed(fruits, " and ", 4) str_split <- function(string, pattern, n = Inf) { if (identical(n, Inf)) n <- -1L switch(type(pattern), empty = stri_split_boundaries(string, n = n, simplify = FALSE, opts_brkiter = stri_opts_brkiter(type = "character")), bound = stri_split_boundaries(string, n = n, simplify = FALSE, opts_brkiter = attr(pattern, "options")), fixed = stri_split_fixed(string, pattern, n = n, simplify = FALSE, opts_fixed = attr(pattern, "options")), regex = stri_split_regex(string, pattern, n = n, simplify = FALSE, opts_regex = attr(pattern, "options")), coll = stri_split_coll(string, pattern, n = n, simplify = FALSE, opts_collator = attr(pattern, "options")) ) } #' @export #' @rdname str_split str_split_fixed <- function(string, pattern, n) { out <- switch(type(pattern), empty = stri_split_boundaries(string, n = n, simplify = TRUE, opts_brkiter = stri_opts_brkiter(type = "character")), bound = stri_split_boundaries(string, n = n, simplify = TRUE, opts_brkiter = attr(pattern, "options")), fixed = stri_split_fixed(string, pattern, n = n, simplify = TRUE, opts_fixed = attr(pattern, "options")), regex = stri_split_regex(string, pattern, n = n, simplify = TRUE, opts_regex = attr(pattern, "options")), coll = stri_split_coll(string, pattern, n = n, simplify = TRUE, opts_collator = attr(pattern, "options")) ) out[is.na(out)] <- "" out } stringr/R/stringr.R0000644000175100001440000000012612435640121013771 0ustar hornikusers#' Fast and friendly string manipulation. 
#'
#' @name stringr
#' @import stringi
NULL
stringr/R/conv.R0000644000175100001440000000103512435654364013264 0ustar hornikusers
#' Specify the encoding of a string.
#'
#' This is a convenient way to override the current encoding of a string.
#'
#' @param string String to re-encode.
#' @param encoding Name of encoding. See \code{\link[stringi]{stri_enc_list}}
#'   for a complete list.
#' @export
#' @examples
#' # Example from encoding?stringi::stringi
#' x <- rawToChar(as.raw(177))
#' x
#' str_conv(x, "ISO-8859-2") # Polish "a with ogonek"
#' str_conv(x, "ISO-8859-1") # Plus-minus
str_conv <- function(string, encoding) {
  stri_conv(string, encoding, "UTF-8")
}
stringr/R/length.r0000644000175100001440000000204112436074637013636 0ustar hornikusers
#' The length of a string.
#'
#' Technically this returns the number of "code points" in a string. One
#' code point usually corresponds to one character, but not always. For example,
#' a u with an umlaut might be represented as a single character or as the
#' combination of a u and an umlaut.
#'
#' @inheritParams str_detect
#' @return A numeric vector giving number of characters (code points) in each
#'   element of the character vector. Missing strings have missing length.
#' @seealso \code{\link[stringi]{stri_length}} which this function wraps.
#' @export
#' @examples
#' str_length(letters)
#' str_length(NA)
#' str_length(factor("abc"))
#' str_length(c("i", "like", "programming", NA))
#'
#' # Two ways of representing a u with an umlaut
#' u1 <- "\u00fc"
#' u2 <- stringi::stri_trans_nfd(u1)
#' # They print the same:
#' u1
#' u2
#' # But have a different length
#' str_length(u1)
#' str_length(u2)
#' # Even though they have the same number of characters
#' str_count(u1)
#' str_count(u2)
str_length <- function(string) {
  stri_length(string)
}
stringr/R/modifiers.r0000644000175100001440000001061112442343363014330 0ustar hornikusers
#' Control matching behaviour with modifier functions.
#' #' \describe{ #' \item{fixed}{Compare literal bytes in the string. This is very fast, but #' not usually what you want for non-ASCII character sets.} #' \item{coll}{Compare strings respecting standard collation rules.} #' \item{regexp}{The default. Uses ICU regular expressions.} #' \item{boundary}{Match boundaries between things.} #' } #' #' @param pattern Pattern to modify behaviour. #' @param ignore_case Should case differences be ignored in the match? #' @name modifiers #' @examples #' pattern <- "a.b" #' strings <- c("abb", "a.b") #' str_detect(strings, pattern) #' str_detect(strings, fixed(pattern)) #' str_detect(strings, coll(pattern)) #' #' # coll() is useful for locale-aware case-insensitive matching #' i <- c("I", "\u0130", "i") #' i #' str_detect(i, fixed("i", TRUE)) #' str_detect(i, coll("i", TRUE)) #' str_detect(i, coll("i", TRUE, locale = "tr")) #' #' # Word boundaries #' words <- c("These are some words.") #' str_count(words, boundary("word")) #' str_split(words, " ")[[1]] #' str_split(words, boundary("word"))[[1]] #' #' # Regular expression variations #' str_extract_all("The Cat in the Hat", "[a-z]+") #' str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE)) #' #' str_extract_all("a\nb\nc", "^.") #' str_extract_all("a\nb\nc", regex("^.", multiline = TRUE)) #' #' str_extract_all("a\nb\nc", "a.") #' str_extract_all("a\nb\nc", regex("a.", dotall = TRUE)) NULL #' @export #' @rdname modifiers fixed <- function(pattern, ignore_case = FALSE) { options <- stri_opts_fixed(case_insensitive = ignore_case) structure( pattern, options = options, class = c("fixed", "pattern", "character") ) } #' @export #' @rdname modifiers #' @param locale Locale to use for comparisons. See #' \code{\link[stringi]{stri_locale_list}()} for all possible options. #' @param ... 
Other less frequently used arguments passed on to
#'   \code{\link[stringi]{stri_opts_collator}},
#'   \code{\link[stringi]{stri_opts_regex}}, or
#'   \code{\link[stringi]{stri_opts_brkiter}}
coll <- function(pattern, ignore_case = FALSE, locale = NULL, ...) {
  options <- stri_opts_collator(
    strength = if (ignore_case) 2L else 3L,
    locale = locale,
    ...
  )

  structure(
    pattern,
    options = options,
    class = c("coll", "pattern", "character")
  )
}

#' @export
#' @rdname modifiers
#' @param multiline If \code{TRUE}, \code{^} and \code{$} match
#'   the beginning and end of each line. If \code{FALSE}, the
#'   default, they only match the start and end of the input.
#' @param comments If \code{TRUE}, white space and comments beginning with
#'   \code{#} are ignored. Escape literal spaces with \code{\\ }.
#' @param dotall If \code{TRUE}, \code{.} will also match line terminators.
regex <- function(pattern, ignore_case = FALSE, multiline = FALSE,
                  comments = FALSE, dotall = FALSE, ...) {
  options <- stri_opts_regex(
    case_insensitive = ignore_case,
    multiline = multiline,
    comments = comments,
    dotall = dotall,
    ...
  )

  structure(
    pattern,
    options = options,
    class = c("regex", "pattern", "character")
  )
}

#' @param type Boundary type to detect.
#' @param skip_word_none Ignore "words" that don't contain any characters
#'   or numbers - i.e. punctuation.
#' @export
#' @rdname modifiers
boundary <- function(type = c("character", "line_break", "sentence", "word"),
                     skip_word_none = TRUE, ...) {
  type <- match.arg(type)

  options <- stri_opts_brkiter(
    type = type,
    skip_word_none = skip_word_none,
    ...
  )

  structure(
    character(),
    options = options,
    class = c("boundary", "pattern", "character")
  )
}

# Internal generic: map a pattern object to the matching engine to use.
type <- function(x) UseMethod("type")
type.boundary <- function(x) "bound"
# The class set by regex() is "regex", so the method must be named type.regex
type.regex <- function(x) "regex"
type.coll <- function(x) "coll"
type.fixed <- function(x) "fixed"
type.character <- function(x) if (identical(x, "")) "empty" else "regex"

#' Deprecated modifier functions.
#'
#' Please use \code{\link{regex}} and \code{\link{coll}} instead.
#'
#' @name modifier-deprecated
#' @keywords internal
NULL

#' @export
#' @rdname modifier-deprecated
ignore.case <- function(string) {
  message("Please use (fixed|coll|regex)(x, ignore_case = TRUE) instead of ignore.case(x)")
  fixed(string, ignore_case = TRUE)
}

#' @export
#' @rdname modifier-deprecated
perl <- function(pattern) {
  message("perl is deprecated. Please use regex instead")
  regex(pattern)
}
stringr/R/sort.R0000644000175100001440000000233112435651034013275 0ustar hornikusers
#' Order or sort a character vector.
#'
#' @param x A character vector to sort.
#' @param decreasing A boolean. If \code{FALSE}, the default, sorts from
#'   lowest to highest; if \code{TRUE} sorts from highest to lowest.
#' @param na_last Where should \code{NA} go? \code{TRUE} at the end,
#'   \code{FALSE} at the beginning, \code{NA} dropped.
#' @param locale In which locale should the sorting occur? Defaults to
#'   the current locale.
#' @param ... Other options used to control sorting order. Passed on to
#'   \code{\link[stringi]{stri_opts_collator}}.
#' @seealso \code{\link[stringi]{stri_order}} for the underlying implementation.
#' @export
#' @examples
#' str_order(letters, locale = "en")
#' str_sort(letters, locale = "en")
#'
#' str_order(letters, locale = "haw")
#' str_sort(letters, locale = "haw")
str_order <- function(x, decreasing = FALSE, na_last = TRUE, locale = "", ...) {
  stri_order(x, decreasing = decreasing, na_last = na_last,
    opts_collator = stri_opts_collator(locale, ...))
}

#' @export
#' @rdname str_order
str_sort <- function(x, decreasing = FALSE, na_last = TRUE, locale = "", ...) {
  stri_sort(x, decreasing = decreasing, na_last = na_last,
    opts_collator = stri_opts_collator(locale, ...))
}
stringr/R/word.r0000644000175100001440000000342412513233473013325 0ustar hornikusers
#' Extract words from a sentence.
#'
#' @param string input character vector.
#' @param start integer vector giving position of first word to extract. #' Defaults to first word. If negative, counts backwards from last #' character. #' @param end integer vector giving position of last word to extract. #' Defaults to first word. If negative, counts backwards from last #' character. #' @param sep separator between words. Defaults to single space. #' @return character vector of words from \code{start} to \code{end} #' (inclusive). Will be length of longest input argument. #' @export #' @examples #' sentences <- c("Jane saw a cat", "Jane sat down") #' word(sentences, 1) #' word(sentences, 2) #' word(sentences, -1) #' word(sentences, 2, -1) #' #' # Also vectorised over start and end #' word(sentences[1], 1:3, -1) #' word(sentences[1], 1, 1:4) #' #' # Can define words by other separators #' str <- 'abc.def..123.4568.999' #' word(str, 1, sep = fixed('..')) #' word(str, 2, sep = fixed('..')) word <- function(string, start = 1L, end = start, sep = fixed(" ")) { n <- max(length(string), length(start), length(end)) string <- rep(string, length.out = n) start <- rep(start, length.out = n) end <- rep(end, length.out = n) breaks <- str_locate_all(string, sep) words <- lapply(breaks, invert_match) # Convert negative values into actual positions len <- vapply(words, nrow, integer(1)) neg_start <- !is.na(start) & start < 0L start[neg_start] <- start[neg_start] + len[neg_start] + 1L neg_end <- !is.na(end) & end < 0L end[neg_end] <- end[neg_end] + len[neg_end] + 1L # Extract locations starts <- mapply(function(word, loc) word[loc, "start"], words, start) ends <- mapply(function(word, loc) word[loc, "end"], words, end) str_sub(string, starts, ends) } stringr/R/locate.r0000644000175100001440000000562712442343250013624 0ustar hornikusers#' Locate the position of patterns in a string. #' #' Vectorised over \code{string} and \code{pattern}. If the match is of length #' 0, (e.g. from a special match like \code{$}) end will be one character less #' than start. 
#' #' @inheritParams str_detect #' @return For \code{str_locate}, an integer matrix. First column gives start #' postion of match, and second column gives end position. For #' \code{str_locate_all} a list of integer matrices. #' @seealso #' \code{\link{str_extract}} for a convenient way of extracting matches, #' \code{\link[stringi]{stri_locate}} for the underlying implementation. #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_locate(fruit, "$") #' str_locate(fruit, "a") #' str_locate(fruit, "e") #' str_locate(fruit, c("a", "b", "p", "p")) #' #' str_locate_all(fruit, "a") #' str_locate_all(fruit, "e") #' str_locate_all(fruit, c("a", "b", "p", "p")) #' #' # Find location of every character #' str_locate_all(fruit, "") str_locate <- function(string, pattern) { switch(type(pattern), empty = stri_locate_first_boundaries(string, opts_brkiter = stri_opts_brkiter("character")), bound = stri_locate_first_boundaries(string, opts_brkiter = attr(pattern, "options")), fixed = stri_locate_first_fixed(string, pattern, opts_fixed = attr(pattern, "options")), coll = stri_locate_first_coll(string, pattern, opts_collator = attr(pattern, "options")), regex = stri_locate_first_regex(string, pattern, opts_regex = attr(pattern, "options")) ) } #' @rdname str_locate #' @export str_locate_all <- function(string, pattern) { switch(type(pattern), empty = stri_locate_all_boundaries(string, omit_no_match = TRUE, opts_brkiter = stri_opts_brkiter("character")), bound = stri_locate_all_boundaries(string, omit_no_match = TRUE, opts_brkiter = attr(pattern, "options")), fixed = stri_locate_all_fixed(string, pattern, omit_no_match = TRUE, opts_fixed = attr(pattern, "options")), regex = stri_locate_all_regex(string, pattern, omit_no_match = TRUE, opts_regex = attr(pattern, "options")), coll = stri_locate_all_coll(string, pattern, omit_no_match = TRUE, opts_collator = attr(pattern, "options")) ) } #' Switch location of matches to location of non-matches. 
#' #' Invert a matrix of match locations to match the opposite of what was #' previously matched. #' #' @param loc matrix of match locations, as from \code{\link{str_locate_all}} #' @return numeric match giving locations of non-matches #' @export #' @examples #' numbers <- "1 and 2 and 4 and 456" #' num_loc <- str_locate_all(numbers, "[0-9]+")[[1]] #' str_sub(numbers, num_loc[, "start"], num_loc[, "end"]) #' #' text_loc <- invert_match(num_loc) #' str_sub(numbers, text_loc[, "start"], text_loc[, "end"]) invert_match <- function(loc) { cbind( start = c(0L, loc[, "end"] + 1L), end = c(loc[, "start"] - 1L, -1L) ) } stringr/R/sub.r0000644000175100001440000000426112435722031013137 0ustar hornikusers#' Extract and replace substrings from a character vector. #' #' \code{str_sub} will recycle all arguments to be the same length as the #' longest argument. If any arguments are of length 0, the output will be #' a zero length character vector. #' #' Substrings are inclusive - they include the characters at both start and #' end positions. \code{str_sub(string, 1, -1)} will return the complete #' substring, from the first character to the last. #' #' @param string input character vector. #' @param start,end Two integer vectors. \code{start} gives the position #' of the first character (defaults to first), \code{end} gives the position #' of the last (defaults to last character). Alternatively, pass a two-column #' matrix to \code{start}. #' #' Negative values count backwards from the last character. #' @param value replacement string #' @return A character vector of substring from \code{start} to \code{end} #' (inclusive). Will be length of longest input argument. 
#' @seealso The underlying implementation in \code{\link[stringi]{stri_sub}} #' @export #' @examples #' hw <- "Hadley Wickham" #' #' str_sub(hw, 1, 6) #' str_sub(hw, end = 6) #' str_sub(hw, 8, 14) #' str_sub(hw, 8) #' str_sub(hw, c(1, 8), c(6, 14)) #' #' # Negative indices #' str_sub(hw, -1) #' str_sub(hw, -7) #' str_sub(hw, end = -7) #' #' # Alternatively, you can pass in a two colum matrix, as in the #' # output from str_locate_all #' pos <- str_locate_all(hw, "[aeio]")[[1]] #' str_sub(hw, pos) #' str_sub(hw, pos[, 1], pos[, 2]) #' #' # Vectorisation #' str_sub(hw, seq_len(str_length(hw))) #' str_sub(hw, end = seq_len(str_length(hw))) #' #' # Replacement form #' x <- "BBCDEF" #' str_sub(x, 1, 1) <- "A"; x #' str_sub(x, -1, -1) <- "K"; x #' str_sub(x, -2, -2) <- "GHIJ"; x #' str_sub(x, 2, -2) <- ""; x str_sub <- function(string, start = 1L, end = -1L) { if (is.matrix(start)) { stri_sub(string, from = start) } else { stri_sub(string, from = start, to = end) } } #' @export #' @rdname str_sub "str_sub<-" <- function(string, start = 1L, end = -1L, value) { if (is.matrix(start)) { stri_sub(string, from = start) <- value } else { stri_sub(string, from = start, to = end) <- value } string } stringr/R/dup.r0000644000175100001440000000071412435643521013143 0ustar hornikusers#' Duplicate and concatenate strings within a character vector. #' #' Vectorised over \code{string} and \code{times}. #' #' @param string Input character vector. #' @param times Number of times to duplicate each string. #' @return A character vector. #' @export #' @examples #' fruit <- c("apple", "pear", "banana") #' str_dup(fruit, 2) #' str_dup(fruit, 1:3) #' str_c("ba", str_dup("na", 0:5)) str_dup <- function(string, times) { stri_dup(string, times) } stringr/R/pad-trim.r0000644000175100001440000000323312440404606014062 0ustar hornikusers#' Pad a string. #' #' Vectorised over \code{string}, \code{width} and \code{pad}. #' #' @param string A character vector. 
#' @param width Minimum width of padded strings. #' @param side Side on which padding character is added (left, right or both). #' @param pad Single padding character (default is a space). #' @return A character vector. #' @seealso \code{\link{str_trim}} to remove whitespace #' @export #' @examples #' rbind( #' str_pad("hadley", 30, "left"), #' str_pad("hadley", 30, "right"), #' str_pad("hadley", 30, "both") #' ) #' #' # All arguments are vectorised except side #' str_pad(c("a", "abc", "abcdef"), 10) #' str_pad("a", c(5, 10, 20)) #' str_pad("a", 10, pad = c("-", "_", " ")) #' #' # Longer strings are returned unchanged #' str_pad("hadley", 3) str_pad <- function(string, width, side = c("left", "right", "both"), pad = " ") { side <- match.arg(side) switch(side, left = stri_pad_left(string, width, pad = pad), right = stri_pad_right(string, width, pad = pad), both = stri_pad_both(string, width, pad = pad) ) } #' Trim whitespace from start and end of string. #' #' @param string A character vector. #' @param side Side on which to remove whitespace (left, right or both). #' @return A character vector. #' @export #' @seealso \code{\link{str_pad}} to add whitespace #' @examples #' str_trim(" String with trailing and leading white space\t") #' str_trim("\n\nString with trailing and leading white space\n\n") str_trim <- function(string, side = c("both", "left", "right")) { side <- match.arg(side) switch(side, left = stri_trim_left(string), right = stri_trim_right(string), both = stri_trim_both(string) ) } stringr/R/count.r0000644000175100001440000000222512442343250013474 0ustar hornikusers#' Count the number of matches in a string. #' #' Vectorised over \code{string} and \code{pattern}. #' #' @inheritParams str_detect #' @return An integer vector. #' @seealso #' \code{\link[stringi]{stri_count}} which this function wraps. 
#' #' \code{\link{str_locate}}/\code{\link{str_locate_all}} to locate position #' of matches #' #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_count(fruit, "a") #' str_count(fruit, "p") #' str_count(fruit, "e") #' str_count(fruit, c("a", "b", "p", "p")) #' #' str_count(c("a.", "...", ".a.a"), ".") #' str_count(c("a.", "...", ".a.a"), fixed(".")) str_count <- function(string, pattern = "") { switch(type(pattern), empty = stri_count_boundaries(string, opts_brkiter = stri_opts_brkiter(type = "character")), bound = stri_count_boundaries(string, opts_brkiter = attr(pattern, "options")), fixed = stri_count_fixed(string, pattern, opts_fixed = attr(pattern, "options")), coll = stri_count_coll(string, pattern, opts_collator = attr(pattern, "options")), regex = stri_count_regex(string, pattern, opts_regex = attr(pattern, "options")) ) } stringr/R/detect.r0000644000175100001440000000320312442343250013611 0ustar hornikusers#' Detect the presence or absence of a pattern in a string. #' #' Vectorised over \code{string} and \code{pattern}. #' #' @param string Input vector. Either a character vector, or something #' coercible to one. #' @param pattern Pattern to look for. #' #' The default interpretation is a regular expression, as described #' in \link[stringi]{stringi-search-regex}. Control options with #' \code{\link{regex}()}. #' #' Match a fixed string (i.e. by comparing only bytes), using #' \code{\link{fixed}(x)}. This is fast, but approximate. Generally, #' for matching human text, you'll want \code{\link{coll}(x)} which #' respects character matching rules for the specified locale. #' #' Match character, word, line and sentence boundaries with #' \code{\link{boundary}()}. An empty pattern, "", is equivalent to #' \code{boundary("character")}. #' @return A logical vector. 
#' @seealso \code{\link[stringi]{stri_detect}} which this function wraps #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pinapple") #' str_detect(fruit, "a") #' str_detect(fruit, "^a") #' str_detect(fruit, "a$") #' str_detect(fruit, "b") #' str_detect(fruit, "[aeiou]") #' #' # Also vectorised over pattern #' str_detect("aecfg", letters) str_detect <- function(string, pattern) { switch(type(pattern), empty = , bound = stop("Not implemented", call. = FALSE), fixed = stri_detect_fixed(string, pattern, opts_fixed = attr(pattern, "options")), coll = stri_detect_coll(string, pattern, opts_collator = attr(pattern, "options")), regex = stri_detect_regex(string, pattern, opts_regex = attr(pattern, "options")) ) } stringr/R/match.r0000644000175100001440000000344312513530665013452 0ustar hornikusers#' Extract matched groups from a string. #' #' Vectorised over \code{string} and \code{pattern}. #' #' @inheritParams str_detect #' @param pattern Pattern to look for, as defined by an ICU regular #' expression. See \link[stringi]{stringi-search-regex} for more details. #' @return For \code{str_match}, a character matrix. First column is the #' complete match, followed by one column for each capture group. #' For \code{str_match_all}, a list of character matrices. #' #' @seealso \code{\link{str_extract}} to extract the complete match, #' \code{\link[stringi]{stri_match}} for the underlying #' implementation. 
#' @export #' @examples #' strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569", #' "387 287 6718", "apple", "233.398.9187 ", "482 952 3315", #' "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000", #' "Home: 543.355.3679") #' phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" #' #' str_extract(strings, phone) #' str_match(strings, phone) #' #' # Extract/match all #' str_extract_all(strings, phone) #' str_match_all(strings, phone) #' #' x <- c(" ", " <>", "", "", NA) #' str_match(x, "<(.*?)> <(.*?)>") #' str_match_all(x, "<(.*?)>") #' #' str_extract(x, "<.*?>") #' str_extract_all(x, "<.*?>") str_match <- function(string, pattern) { switch(type(pattern), regex = stri_match_first_regex(string, pattern, opts_regex = attr(pattern, "options")), stop("Can only match regular expressions", call. = FALSE) ) } #' @rdname str_match #' @export str_match_all <- function(string, pattern) { switch(type(pattern), regex = stri_match_all_regex(string, pattern, cg_missing = "", omit_no_match = TRUE, opts_regex = attr(pattern, "options")), stop("Can only match regular expressions", call. = FALSE) ) } stringr/R/extract.r0000644000175100001440000000376412442343250014027 0ustar hornikusers#' Extract matching patterns from a string. #' #' Vectorised over \code{string} and \code{pattern}. #' #' @inheritParams str_detect #' @return A character vector. #' @seealso \code{\link[stringi]{stri_extract_first}} and #' \code{\link[stringi]{stri_extract_all}} for the underlying #' implementation. #' @param simplify If \code{FALSE}, the default, returns a list of character #' vectors. If \code{TRUE} returns a character matrix. 
#' @export
#' @examples
#' shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
#' str_extract(shopping_list, "\\d")
#' str_extract(shopping_list, "[a-z]+")
#' str_extract(shopping_list, "[a-z]{1,4}")
#' str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
#'
#' # Extract all matches
#' str_extract_all(shopping_list, "[a-z]+")
#' str_extract_all(shopping_list, "\\b[a-z]+\\b")
#' str_extract_all(shopping_list, "\\d")
#'
#' # Simplify results into character matrix
#' str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
#' str_extract_all(shopping_list, "\\d", simplify = TRUE)
str_extract <- function(string, pattern) {
  switch(type(pattern),
    empty = ,
    bound = stop("Not implemented", call. = FALSE),
    fixed = stri_extract_first_fixed(string, pattern,
      opts_fixed = attr(pattern, "options")),
    coll  = stri_extract_first_coll(string, pattern,
      opts_collator = attr(pattern, "options")),
    regex = stri_extract_first_regex(string, pattern,
      opts_regex = attr(pattern, "options"))
  )
}

#' @rdname str_extract
#' @export
str_extract_all <- function(string, pattern, simplify = FALSE) {
  # All three engines honour simplify and drop non-matches; the options
  # are passed by name so they cannot be bound to the wrong argument.
  switch(type(pattern),
    empty = ,
    bound = stop("Not implemented", call. = FALSE),
    fixed = stri_extract_all_fixed(string, pattern,
      simplify = simplify, omit_no_match = TRUE,
      opts_fixed = attr(pattern, "options")),
    coll  = stri_extract_all_coll(string, pattern,
      simplify = simplify, omit_no_match = TRUE,
      opts_collator = attr(pattern, "options")),
    regex = stri_extract_all_regex(string, pattern,
      simplify = simplify, omit_no_match = TRUE,
      opts_regex = attr(pattern, "options"))
  )
}
stringr/R/replace.r0000644000175100001440000000642512442343250013765 0ustar hornikusers
#' Replace matched patterns in a string.
#'
#' Vectorised over \code{string}, \code{pattern} and \code{replacement}.
#'
#' @inheritParams str_detect
#' @param pattern,replacement Supply separate pattern and replacement strings
#'   to vectorise over the patterns.
References of the form \code{\1}, #' \code{\2} will be replaced with the contents of the respective matched #' group (created by \code{()}) within the pattern. #' #' For \code{str_replace_all} only, you can perform multiple patterns and #' replacements to each string, by passing a named character to #' \code{pattern}. #' @return A character vector. #' @seealso \code{str_replace_na} to turn missing values into "NA"; #' \code{\link{stri_replace}} for the underlying implementation. #' @export #' @examples #' fruits <- c("one apple", "two pears", "three bananas") #' str_replace(fruits, "[aeiou]", "-") #' str_replace_all(fruits, "[aeiou]", "-") #' #' str_replace(fruits, "([aeiou])", "") #' str_replace(fruits, "([aeiou])", "\\1\\1") #' str_replace(fruits, "[aeiou]", c("1", "2", "3")) #' str_replace(fruits, c("a", "e", "i"), "-") #' #' fruits <- c("one apple", "two pears", "three bananas") #' str_replace(fruits, "[aeiou]", "-") #' str_replace_all(fruits, "[aeiou]", "-") #' #' str_replace_all(fruits, "([aeiou])", "") #' str_replace_all(fruits, "([aeiou])", "\\1\\1") #' str_replace_all(fruits, "[aeiou]", c("1", "2", "3")) #' str_replace_all(fruits, c("a", "e", "i"), "-") #' #' # If you want to apply multiple patterns and replacements to the same #' # string, pass a named version to pattern. #' str_replace_all(str_c(fruits, collapse = "---"), #' c("one" = 1, "two" = 2, "three" = 3)) str_replace <- function(string, pattern, replacement) { replacement <- fix_replacement(replacement) switch(type(pattern), empty = , bound = stop("Not implemented", call. 
= FALSE),
    fixed = stri_replace_first_fixed(string, pattern, replacement,
      opts_fixed = attr(pattern, "options")),
    coll  = stri_replace_first_coll(string, pattern, replacement,
      opts_collator = attr(pattern, "options")),
    regex = stri_replace_first_regex(string, pattern, replacement,
      opts_regex = attr(pattern, "options"))
  )
}

#' @export
#' @rdname str_replace
str_replace_all <- function(string, pattern, replacement) {
  # A named pattern vector supplies both patterns (names) and
  # replacements (values); they are then applied in sequence.
  if (!is.null(names(pattern))) {
    replacement <- unname(pattern)
    pattern <- names(pattern)
    vec <- FALSE
  } else {
    vec <- TRUE
  }
  replacement <- fix_replacement(replacement)

  switch(type(pattern),
    empty = ,
    bound = stop("Not implemented", call. = FALSE),
    fixed = stri_replace_all_fixed(string, pattern, replacement,
      vectorize_all = vec, opts_fixed = attr(pattern, "options")),
    coll  = stri_replace_all_coll(string, pattern, replacement,
      vectorize_all = vec, opts_collator = attr(pattern, "options")),
    regex = stri_replace_all_regex(string, pattern, replacement,
      vectorize_all = vec, opts_regex = attr(pattern, "options"))
  )
}

# Translate \1-style group references into the $1 form used by ICU,
# escaping any literal $ first.
fix_replacement <- function(x) {
  stri_replace_all_regex(x, c("\\$", "\\\\(\\d)"), c("\\\\$", "\\$$1"),
    vectorize_all = FALSE)
}

#' Turn NA into "NA"
#'
#' @inheritParams str_replace
#' @export
#' @examples
#' str_replace_na(c(NA, "abc", "def"))
str_replace_na <- function(string, replacement = "NA") {
  stri_replace_na(string, replacement)
}
stringr/R/c.r0000644000175100001440000000344612437166530012604 0ustar hornikusers
#' Join multiple strings into a single string.
#'
#' To understand how \code{str_c} works, you need to imagine that you are
#' building up a matrix of strings. Each input argument forms a column, and
#' is expanded to the length of the longest argument, using the usual
#' recycling rules. The \code{sep} string is inserted between each column. If
#' collapse is \code{NULL} each row is collapsed into a single string.
If
#' non-\code{NULL} that string is inserted at the end of each row, and
#' the entire matrix collapsed to a single string.
#'
#' @param ... One or more character vectors. Zero length arguments
#'   are removed.
#' @param sep String to insert between input vectors.
#' @param collapse Optional string used to combine input vectors into single
#'   string.
#' @return If \code{collapse = NULL} (the default) a character vector with
#'   length equal to the longest input vector. If \code{collapse} is
#'   non-NULL, a character vector of length 1.
#' @seealso \code{\link{paste}} for equivalent base R functionality, and
#'   \code{\link[stringi]{stri_c}} which this function wraps
#' @export str_c
#' @examples
#' str_c("Letter: ", letters)
#' str_c("Letter", letters, sep = ": ")
#' str_c(letters, " is for", "...")
#' str_c(letters[-26], " comes before ", letters[-1])
#'
#' str_c(letters, collapse = "")
#' str_c(letters, collapse = ", ")
#'
#' # Missing inputs give missing outputs
#' str_c(c("a", NA, "b"), "-d")
#' # Use str_replace_na to display literal NAs:
#' str_c(str_replace_na(c("a", NA, "b")), "-d")
str_c <- function(..., sep = "", collapse = NULL) {
  stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
}

#' @export
#' @rdname str_c
str_join <- function(..., sep = "", collapse = NULL) {
  .Deprecated("str_c")
  stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
}
stringr/vignettes/0000755000175100001440000000000012520151252013763 5ustar hornikusers
stringr/vignettes/stringr.Rmd0000644000175100001440000002204112436152754016134 0ustar hornikusers
---
title: "Introduction to stringr"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to stringr}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo=FALSE}
library("stringr")
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
```

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning
and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The __stringr__ package aims to remedy these problems by providing a clean, modern interface to common string operations.

More concretely, stringr:

- Simplifies string operations by eliminating options that you don't need 95% of the time (the other 5% of the time you can use functions from base R or [stringi](https://github.com/Rexamine/stringi/)).

- Uses consistent function names and arguments.

- Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs. It also processes factors and character vectors in the same way.

- Completes R's string handling functions with useful functions from other programming languages.

To meet these goals, stringr provides two basic families of functions:

- basic string operations, and

- pattern matching functions which use regular expressions to detect, locate, match, replace, extract, and split strings.

As of version 1.0, stringr is a thin wrapper around [stringi](https://github.com/Rexamine/stringi/), which implements all the functions in stringr with efficient C code based on the [ICU library](http://site.icu-project.org). Compared to stringi, stringr is considerably simpler: it provides fewer options and fewer functions. This is great when you're getting started learning string functions, and if you do need more of stringi's power, you should find the interface similar.

Both families of functions are described in more detail in the following sections.
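
As a quick illustration of the consistency rules listed above, the following sketch shows how missing values, zero length inputs and factors are handled. The input vectors and the variable names (`na_lengths`, `empty_sub`, `factor_length`) are made up for this example:

```{r}
library(stringr)

# Missing inputs give missing outputs
na_lengths <- str_length(c("apple", NA))
na_lengths
str_detect(c("apple", NA), "p")

# Zero length inputs give zero length outputs
empty_sub <- str_sub(character(0), 1, 2)
empty_sub

# Factors are processed like character vectors
factor_length <- str_length(factor("apple"))
factor_length
```
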

## Basic string operations

There are four string functions that are closely related to their base R equivalents, but with a few enhancements:

- `str_c()` is equivalent to `paste()`, but it uses the empty string ("") as the default separator and silently removes `NULL` inputs.

- `str_length()` is equivalent to `nchar()`, but it preserves NA's (rather than giving them length 2) and converts factors to characters (not integers).

- `str_sub()` is equivalent to `substr()` but it returns a zero length vector if any of its inputs are zero length, and otherwise expands each argument to match the longest. It also accepts negative positions, which are calculated from the left of the last character. The end position defaults to `-1`, which corresponds to the last character.

- `str_sub<-` is equivalent to `substr<-`, but like `str_sub` it understands negative indices, and replacement strings do not need to be the same length as the string they are replacing.

Three functions add new functionality:

- `str_dup()` to duplicate the characters within a string.

- `str_trim()` to remove leading and trailing whitespace.

- `str_pad()` to pad a string with extra whitespace on the left, right, or both sides.

## Pattern matching

stringr provides pattern matching functions to **detect**, **locate**, **extract**, **match**, **replace**, and **split** strings. I'll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers:

```{r}
strings <- c(
  "apple",
  "219 733 8965",
  "329-293-8753",
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
```

- `str_detect()` detects the presence or absence of a pattern and returns a logical vector (similar to `grepl()`). `str_subset()` returns the elements of a character vector that match a regular expression (similar to `grep()` with `value = TRUE`).

```{r}
# Which strings contain phone numbers?
str_detect(strings, phone)
str_subset(strings, phone)
```

- `str_locate()` locates the first position of a pattern and returns a numeric matrix with columns start and end. `str_locate_all()` locates all matches, returning a list of numeric matrices. Similar to `regexpr()` and `gregexpr()`.

```{r}
# Where in the string is the phone number located?
(loc <- str_locate(strings, phone))
str_locate_all(strings, phone)
```

- `str_extract()` extracts text corresponding to the first match, returning a character vector. `str_extract_all()` extracts all matches and returns a list of character vectors.

```{r}
# What are the phone numbers?
str_extract(strings, phone)
str_extract_all(strings, phone)
str_extract_all(strings, phone, simplify = TRUE)
```

- `str_match()` extracts capture groups formed by `()` from the first match. It returns a character matrix with one column for the complete match and one column for each group. `str_match_all()` extracts capture groups from all matches and returns a list of character matrices. Similar to `regmatches()`.

```{r}
# Pull out the three components of the match
str_match(strings, phone)
str_match_all(strings, phone)
```

- `str_replace()` replaces the first matched pattern and returns a character vector. `str_replace_all()` replaces all matches. Similar to `sub()` and `gsub()`.

```{r}
str_replace(strings, phone, "XXX-XXX-XXXX")
str_replace_all(strings, phone, "XXX-XXX-XXXX")
```

- `str_split_fixed()` splits the string into a fixed number of pieces based on a pattern and returns a character matrix. `str_split()` splits a string into a variable number of pieces and returns a list of character vectors.

### Arguments

Each pattern matching function has the same first two arguments, a character vector of `string`s to process and a single `pattern` (regular expression) to match. The replace functions have an additional argument specifying the replacement string, and the split functions have an argument to specify the number of pieces.
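
The split functions are the only family above without an example, so here is a small sketch of how the number-of-pieces argument behaves. The `fruits` vector is borrowed from the `str_split()` help page; the name `pieces` is just an illustrative choice:

```{r}
library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)

# A variable number of pieces per string: returns a list
str_split(fruits, " and ")

# At most two pieces per string: the remainder stays in the last piece
str_split(fruits, " and ", n = 2)

# Exactly three pieces per string: returns a character matrix
pieces <- str_split_fixed(fruits, " and ", 3)
pieces
```
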

Unlike base string functions, stringr offers control over matching not through arguments, but through modifier functions, `regex()`, `coll()` and `fixed()`. This is a deliberate choice made to simplify these functions. For example, while `grepl` has six arguments, `str_detect()` only has two.

### Regular expressions

To be able to use these functions effectively, you'll need a good knowledge of regular expressions, which this vignette is not going to teach you. Some useful tools to get you started:

- A good [reference sheet](http://www.regular-expressions.info/reference.html).

- A tool that allows you to [interactively test](http://gskinner.com/RegExr/) what a regular expression will match.

- A tool to [build a regular expression](http://www.txt2re.com) from an input string.

When writing regular expressions, I strongly recommend generating a list of positive (pattern should match) and negative (pattern shouldn't match) test cases to ensure that you are matching the correct components.

### Functions that return lists

Many of the functions return a list of vectors or matrices. To work with each element of the list there are two strategies: iterate through a common set of indices, or use `Map()` to iterate through the vectors simultaneously.
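
The first strategy, iterating through a common set of indices, might look like the following sketch. The names `phone_strings`, `phone_pattern`, `phone_locs` and `area_codes` are chosen for this illustration only, and taking the first three characters of each match as the area code is an assumption that fits this particular pattern:

```{r}
library(stringr)

phone_strings <- c(
  "219 733 8965",
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone_pattern <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

# One matrix of match locations per input string
phone_locs <- str_locate_all(phone_strings, phone_pattern)

# Walk over the indices shared by the input vector and the list of matches
area_codes <- vector("list", length(phone_strings))
for (i in seq_along(phone_strings)) {
  starts <- phone_locs[[i]][, "start"]
  # The area code is the first three characters of each match
  area_codes[[i]] <- str_sub(phone_strings[i], starts, starts + 2)
}
area_codes
```
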
The second strategy is illustrated below:

```{r}
col2hex <- function(col) {
  rgb <- col2rgb(col)
  rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
}

# Goal: replace colour names in a string with their hex equivalent
strings <- c("Roses are red, violets are blue", "My favourite colour is green")

colours <- str_c("\\b", colors(), "\\b", collapse="|")
# This gets us the colours, but we have no way of replacing them
str_extract_all(strings, colours)

# Instead, let's work with locations
locs <- str_locate_all(strings, colours)
Map(function(string, loc) {
  hex <- col2hex(str_sub(string, loc))
  str_sub(string, loc) <- hex
  string
}, strings, locs)
```

Another approach is to use the second form of `str_replace_all()`: if you give it a named vector, it applies each `pattern = replacement` in turn:

```{r}
matches <- col2hex(colors())
names(matches) <- str_c("\\b", colors(), "\\b")

str_replace_all(strings, matches)
```

## Conclusion

stringr provides an opinionated interface to strings in R. It makes string processing simpler by removing uncommon options, and by vigorously enforcing consistency across functions. I have also added new functions that I have found useful from Ruby, and over time, I hope users will suggest useful functions from other programming languages. I will continue to build on the included test suite to ensure that the package behaves as expected and remains bug free.
stringr/README.md0000644000175100001440000000332212436153510013237 0ustar hornikusers
# stringr

[![Build Status](https://travis-ci.org/hadley/stringr.png?branch=master)](https://travis-ci.org/hadley/stringr)

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparations tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn.
Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The __stringr__ package aims to remedy these problems by providing a clean, modern interface to common string operations.

More concretely, stringr:

* Uses consistent function and argument names.

* Simplifies string operations by eliminating options that you don't need 95% of the time.

* Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs.

* Is built on top of [stringi](https://github.com/Rexamine/stringi/) which uses the [ICU](http://site.icu-project.org) library to provide fast, correct implementations of common string manipulations.

## Installation

To get the current released version from CRAN:

```R
install.packages("stringr")
```

To get the current development version from GitHub:

```R
# install.packages("devtools")
devtools::install_github("Rexamine/stringi")
devtools::install_github("hadley/stringr")
```

## Piping

stringr provides the pipe, `%>%`, from magrittr to make it easy to string together sequences of string operations:

```R
letters %>%
  str_pad(5, "right") %>%
  str_c(letters)
```
stringr/MD50000644000175100001440000000650612520375150012277 0ustar hornikusers
f6c0f7228263e8d0c42f7b8581ee9da7 *DESCRIPTION
54c1589e84ba5df2778a4e893ef72457 *NAMESPACE
87ddd6605b202ee5d3ee22926e447100 *R/c.r
9c5e91f93a404215e8c4946c5a3ac2b7 *R/case.R
bc5a2a73f2842baf45454f53f324f274 *R/conv.R
3ebde5c8b233eb438daa6eb6ef20b580 *R/count.r
5a02f00b78b0090830f05b1747033e83 *R/detect.r
017248629447b588fd6e607c6a5b420e *R/dup.r
033276694abb26130c10b31e087f71b9 *R/extract.r
8ef6b7317657989078722686317f810b *R/length.r
673284eb210305e4983f2618d55c2d7a *R/locate.r
77cb46494e7208f2deb942ef897fdd52 *R/match.r
f5d93cebd0efc3ad2f1a479c86f4c041 *R/modifiers.r
7c467f8789c92d5e437f9b39cbc419f9 *R/pad-trim.r 373404758535e696141d86d2cd4a1f9b *R/replace.r 932f28001598d05ca5f977721e9bb131 *R/sort.R 0b6092c4c346f20b2181c505e7d86912 *R/split.r 994867fa3894453fc4d4832171eafee3 *R/stringr.R 657a92dfd6074f24e81c49279bd54297 *R/sub.r b925d9c285b6b377f1cb26205521ed29 *R/subset.R f583f5b5856f7cb5f2c5fbb04f39f8a8 *R/utils.R e289fd194fc7928e7159cf8afe1d0677 *R/word.r 200bb24c414024721759d59d2907eadd *R/wrap.r b838ad43bd80f67c1348cdfd1db69208 *README.md b536c8a62aa1b18eea448bf780b6f225 *build/vignette.rds 8ffc45088b1068264eba4514b264d53a *inst/doc/stringr.R 2d2afda9742a6d5ef5fa1c51a781bda9 *inst/doc/stringr.Rmd 293072752c163cf1501f8948dd27cb2e *inst/doc/stringr.html 7f5ecca60bf966d675cad61a9f96d5cd *man/case.Rd f2bce59645a8e34f5ce03d3c96b00481 *man/invert_match.Rd 895425e477b9733374018900728a2900 *man/modifier-deprecated.Rd 11c58e978d8a8b356967e4b2c0e74a1e *man/modifiers.Rd 46b011f56b10d41a81084a43102f615b *man/pipe.Rd 870530c70a8db2f1cd85638257365eae *man/str_c.Rd ead10b491a30fc630e7b12ea45a3bd13 *man/str_conv.Rd d99b1cc60142d76eefdc5b301e746de8 *man/str_count.Rd 8277de388e437d825f74f5e14f114bb1 *man/str_detect.Rd 20dccc633025393e296a7f8e0973bfb9 *man/str_dup.Rd 7cd0c123e5673f99e4162b93a80f1674 *man/str_extract.Rd 5cd37c9c64d70595c1ed8ce551f82789 *man/str_length.Rd 81fac342bc1e6fe503d02c9673dce784 *man/str_locate.Rd 3fd2199b7ff06db88c9ba6cd4722d6cd *man/str_match.Rd 92b4ac302da8c85b58e31652655df6ab *man/str_order.Rd 176c05214e4313f394e7714e1d202376 *man/str_pad.Rd 211b5f4ab56a0771b3239d32ab2cd613 *man/str_replace.Rd ccd6299c89a844ec3573455cb4b86825 *man/str_replace_na.Rd ef5fcb6a4e71c6316da2157046a92138 *man/str_split.Rd f81e06be94e067d0ee7e972d2744ea01 *man/str_sub.Rd 353ba68fb55d9274cc38d498d80a947e *man/str_subset.Rd b040d44ded59aaf147050d56ada196da *man/str_trim.Rd 9b2a1aea53b13a1fd0edc924f55c370a *man/str_wrap.Rd 93af066d98be9af372aa91351be491c1 *man/stringr.Rd 9f42a2d37fae6d2bc33ac8e1255c9d42 *man/word.Rd 
4ee9d05bd4688270eca8d85299cedcd1 *tests/testthat.R 61f9d77768cf9ff813d382f9337178fb *tests/testthat/test-count.r c69fd2a84d10850f39b85819d60a32fb *tests/testthat/test-detect.r 065f752787f210c753d5bb5feea7f7a5 *tests/testthat/test-dup.r a7623052beaad8b11fcdc1fbe1504599 *tests/testthat/test-extract.r 0e49f9a28c45a65d7c6893f03a8c9ca0 *tests/testthat/test-join.r 922366c3451f88871b9ce063529edb7a *tests/testthat/test-length.r 76249df3c11c62fb11aef63899029790 *tests/testthat/test-locate.r 3fbd8882c34a3923b7b12707af0a987d *tests/testthat/test-match.r 3cfc28d6785f4a8c0796a7980c9aac90 *tests/testthat/test-pad.r f339473f66b14267ec4b86db14b97820 *tests/testthat/test-split.r c95563eafc4fad4c60504ae59225b9d0 *tests/testthat/test-sub.r 7dc6b256c7c2d3af1483b84698494819 *tests/testthat/test-trim.r 2d2afda9742a6d5ef5fa1c51a781bda9 *vignettes/stringr.Rmd stringr/build/0000755000175100001440000000000012520151252013052 5ustar hornikusersstringr/build/vignette.rds0000644000175100001440000000032312520151252015407 0ustar hornikusersb```b`f@&0`b fd`a%EyEzA)hRy%E)%y % Phx`&dqMA,Q,LH YsSt楀aM wjey~L6̜T!%ps QY_/( @hrNb1GRKҊAɹstringr/DESCRIPTION0000644000175100001440000000162012520375150013465 0ustar hornikusersPackage: stringr Version: 1.0.0 Title: Simple, Consistent Wrappers for Common String Operations Description: A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another. 
Authors@R: c( person("Hadley", "Wickham", , "hadley@rstudio.com", c("aut", "cre", "cph")), person("RStudio", role = "cph") ) License: GPL-2 Depends: R (>= 2.14) Imports: stringi (>= 0.4.1), magrittr Suggests: testthat, knitr VignetteBuilder: knitr NeedsCompilation: no Packaged: 2015-04-29 12:46:34 UTC; hadley Author: Hadley Wickham [aut, cre, cph], RStudio [cph] Maintainer: Hadley Wickham Repository: CRAN Date/Publication: 2015-04-30 11:48:24 stringr/man/0000755000175100001440000000000012442343250012532 5ustar hornikusersstringr/man/word.Rd0000644000175100001440000000223112442343223013772 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/word.r \name{word} \alias{word} \title{Extract words from a sentence.} \usage{ word(string, start = 1L, end = start, sep = fixed(" ")) } \arguments{ \item{string}{input character vector.} \item{start}{integer vector giving position of first word to extract. Defaults to first word. If negative, counts backwards from last character.} \item{end}{integer vector giving position of last word to extract. Defaults to first word. If negative, counts backwards from last character.} \item{sep}{separator between words. Defaults to single space.} } \value{ character vector of words from \code{start} to \code{end} (inclusive). Will be length of longest input argument. } \description{ Extract words from a sentence. 
} \examples{ sentences <- c("Jane saw a cat", "Jane sat down") word(sentences, 1) word(sentences, 2) word(sentences, -1) word(sentences, 2, -1) # Also vectorised over start and end word(sentences[1], 1:3, -1) word(sentences[1], 1, 1:4) # Can define words by other separators str <- 'abc.def..123.4568.999' word(str, 1, sep = fixed('..')) word(str, 2, sep = fixed('..')) } stringr/man/str_sub.Rd0000644000175100001440000000363412442343223014510 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/sub.r \name{str_sub} \alias{str_sub} \alias{str_sub<-} \title{Extract and replace substrings from a character vector.} \usage{ str_sub(string, start = 1L, end = -1L) str_sub(string, start = 1L, end = -1L) <- value } \arguments{ \item{string}{input character vector.} \item{start,end}{Two integer vectors. \code{start} gives the position of the first character (defaults to first), \code{end} gives the position of the last (defaults to last character). Alternatively, pass a two-column matrix to \code{start}. Negative values count backwards from the last character.} \item{value}{replacement string} } \value{ A character vector of substring from \code{start} to \code{end} (inclusive). Will be length of longest input argument. } \description{ \code{str_sub} will recycle all arguments to be the same length as the longest argument. If any arguments are of length 0, the output will be a zero length character vector. } \details{ Substrings are inclusive - they include the characters at both start and end positions. \code{str_sub(string, 1, -1)} will return the complete substring, from the first character to the last. 
}
\examples{
hw <- "Hadley Wickham"

str_sub(hw, 1, 6)
str_sub(hw, end = 6)
str_sub(hw, 8, 14)
str_sub(hw, 8)
str_sub(hw, c(1, 8), c(6, 14))

# Negative indices
str_sub(hw, -1)
str_sub(hw, -7)
str_sub(hw, end = -7)

# Alternatively, you can pass in a two column matrix, as in the
# output from str_locate_all
pos <- str_locate_all(hw, "[aeio]")[[1]]
str_sub(hw, pos)
str_sub(hw, pos[, 1], pos[, 2])

# Vectorisation
str_sub(hw, seq_len(str_length(hw)))
str_sub(hw, end = seq_len(str_length(hw)))

# Replacement form
x <- "BBCDEF"
str_sub(x, 1, 1) <- "A"; x
str_sub(x, -1, -1) <- "K"; x
str_sub(x, -2, -2) <- "GHIJ"; x
str_sub(x, 2, -2) <- ""; x
}
\seealso{
The underlying implementation in \code{\link[stringi]{stri_sub}}
}
stringr/man/str_length.Rd0000644000175100001440000000222512442343223015173 0ustar hornikusers
% Generated by roxygen2 (4.1.0): do not edit by hand
% Please edit documentation in R/length.r
\name{str_length}
\alias{str_length}
\title{The length of a string.}
\usage{
str_length(string)
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}
}
\value{
A numeric vector giving number of characters (code points) in each
   element of the character vector. Missing strings have missing length.
}
\description{
Technically this returns the number of "code points" in a string. One
code point usually corresponds to one character, but not always. For example,
a u with an umlaut might be represented as a single character or as the
combination of a u and an umlaut.
}
\examples{
str_length(letters)
str_length(NA)
str_length(factor("abc"))
str_length(c("i", "like", "programming", NA))

# Two ways of representing a u with an umlaut
u1 <- "\\u00fc"
u2 <- stringi::stri_trans_nfd(u1)
# They print the same:
u1
u2
# But have a different length
str_length(u1)
str_length(u2)
# Even though they have the same number of characters
str_count(u1)
str_count(u2)
}
\seealso{
\code{\link[stringi]{stri_length}} which this function wraps.
} stringr/man/pipe.Rd0000644000175100001440000000033112442343223013753 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/utils.R \name{\%>\%} \alias{\%>\%} \title{Pipe operator} \usage{ lhs \%>\% rhs } \description{ Pipe operator } \keyword{internal} stringr/man/str_trim.Rd0000644000175100001440000000122012442343223014657 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/pad-trim.r \name{str_trim} \alias{str_trim} \title{Trim whitespace from start and end of string.} \usage{ str_trim(string, side = c("both", "left", "right")) } \arguments{ \item{string}{A character vector.} \item{side}{Side on which to remove whitespace (left, right or both).} } \value{ A character vector. } \description{ Trim whitespace from start and end of string. } \examples{ str_trim(" String with trailing and leading white space\\t") str_trim("\\n\\nString with trailing and leading white space\\n\\n") } \seealso{ \code{\link{str_pad}} to add whitespace } stringr/man/str_conv.Rd0000644000175100001440000000114712442343223014661 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/conv.R \name{str_conv} \alias{str_conv} \title{Specify the encoding of a string.} \usage{ str_conv(string, encoding) } \arguments{ \item{string}{String to re-encode.} \item{encoding}{Name of encoding. See \code{\link[stringi]{stri_enc_list}} for a complete list.} } \description{ This is a convenient way to override the current encoding of a string. 
} \examples{ # Example from encoding?stringi::stringi x <- rawToChar(as.raw(177)) x str_conv(x, "ISO-8859-2") # Polish "a with ogonek" str_conv(x, "ISO-8859-1") # Plus-minus } stringr/man/str_replace_na.Rd0000644000175100001440000000146712442343223016012 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/replace.r \name{str_replace_na} \alias{str_replace_na} \title{Turn NA into "NA"} \usage{ str_replace_na(string, replacement = "NA") } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{replacement}{Supply separate pattern and replacement strings to vectorise over the patterns. References of the form \code{\1}, \code{\2} will be replaced with the contents of the respective matched group (created by \code{()}) within the pattern. For \code{str_replace_all} only, you can perform multiple patterns and replacements to each string, by passing a named character to \code{pattern}.} } \description{ Turn NA into "NA" } \examples{ str_replace_na(c("NA", "abc", "def")) } stringr/man/str_subset.Rd0000644000175100001440000000271212442343223015220 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/subset.R \name{str_subset} \alias{str_subset} \title{Keep strings matching a pattern.} \usage{ str_subset(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \link[stringi]{stringi-search-regex}. Control options with \code{\link{regex}()}. Match a fixed string (i.e. by comparing only bytes), using \code{\link{fixed}(x)}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link{coll}(x)} which respects character matching rules for the specified locale. 
Match character, word, line and sentence boundaries with \code{\link{boundary}()}. An empty pattern, "", is equivalent to \code{boundary("character")}.} } \value{ A character vector. } \description{ This is a convenient wrapper around \code{x[str_detect(x, pattern)]}. Vectorised over \code{string} and \code{pattern} } \examples{ fruit <- c("apple", "banana", "pear", "pinapple") str_subset(fruit, "a") str_subset(fruit, "^a") str_subset(fruit, "a$") str_subset(fruit, "b") str_subset(fruit, "[aeiou]") # Missings are silently dropped str_subset(c("a", NA, "b"), ".") } \seealso{ \code{\link{grep}} with argument \code{value = TRUE}, \code{\link[stringi]{stri_subset}} for the underlying implementation. } stringr/man/str_split.Rd0000644000175100001440000000362512442343223015052 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/split.r \name{str_split} \alias{str_split} \alias{str_split_fixed} \title{Split up a string into pieces.} \usage{ str_split(string, pattern, n = Inf) str_split_fixed(string, pattern, n) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \link[stringi]{stringi-search-regex}. Control options with \code{\link{regex}()}. Match a fixed string (i.e. by comparing only bytes), using \code{\link{fixed}(x)}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link{coll}(x)} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link{boundary}()}. An empty pattern, "", is equivalent to \code{boundary("character")}.} \item{n}{number of pieces to return. Default (Inf) uses all possible split positions. 
For \code{str_split_fixed}, if n is greater than the number of pieces,
the result will be padded with empty strings.}
}
\value{
For \code{str_split_fixed}, a character matrix with \code{n} columns.
  For \code{str_split}, a list of character vectors.
}
\description{
Vectorised over \code{string} and \code{pattern}.
}
\examples{
fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)

str_split(fruits, " and ")

# Specify n to restrict the number of possible matches
str_split(fruits, " and ", n = 3)
str_split(fruits, " and ", n = 2)
# If n is greater than the number of pieces, no padding occurs
str_split(fruits, " and ", n = 5)

# Use fixed to return a character matrix
str_split_fixed(fruits, " and ", 3)
str_split_fixed(fruits, " and ", 4)
}
\seealso{
\code{\link{stri_split}} for the underlying implementation.
}
stringr/man/str_c.Rd0000644000175100001440000000323512442343223014136 0ustar hornikusers
% Generated by roxygen2 (4.1.0): do not edit by hand
% Please edit documentation in R/c.r
\name{str_c}
\alias{str_c}
\alias{str_join}
\title{Join multiple strings into a single string.}
\usage{
str_c(..., sep = "", collapse = NULL)

str_join(..., sep = "", collapse = NULL)
}
\arguments{
\item{...}{One or more character vectors. Zero length arguments
are removed.}

\item{sep}{String to insert between input vectors.}

\item{collapse}{Optional string used to combine input vectors into single
string.}
}
\value{
If \code{collapse = NULL} (the default) a character vector with
  length equal to the longest input vector. If \code{collapse} is
  non-NULL, a character vector of length 1.
}
\description{
To understand how \code{str_c} works, you need to imagine that you are
building up a matrix of strings. Each input argument forms a column, and
is expanded to the length of the longest argument, using the usual
recycling rules. The \code{sep} string is inserted between each column. If
collapse is \code{NULL} each row is collapsed into a single string.
If non-\code{NULL} that string is inserted at the end of each row, and
the entire matrix is collapsed to a single string.
}
\examples{
str_c("Letter: ", letters)
str_c("Letter", letters, sep = ": ")
str_c(letters, " is for", "...")
str_c(letters[-26], " comes before ", letters[-1])

str_c(letters, collapse = "")
str_c(letters, collapse = ", ")

# Missing inputs give missing outputs
str_c(c("a", NA, "b"), "-d")
# Use str_replace_na to display literal NAs:
str_c(str_replace_na(c("a", NA, "b")), "-d")
}
\seealso{
\code{\link{paste}} for equivalent base R functionality, and
   \code{\link[stringi]{stri_c}} which this function wraps
}
stringr/man/str_count.Rd0000644000175100001440000000264612442343223015051 0ustar hornikusers
% Generated by roxygen2 (4.1.0): do not edit by hand
% Please edit documentation in R/count.r
\name{str_count}
\alias{str_count}
\title{Count the number of matches in a string.}
\usage{
str_count(string, pattern = "")
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{pattern}{Pattern to look for.

The default interpretation is a regular expression, as described
in \link[stringi]{stringi-search-regex}. Control options with
\code{\link{regex}()}.

Match a fixed string (i.e. by comparing only bytes), using
\code{\link{fixed}(x)}. This is fast, but approximate. Generally,
for matching human text, you'll want \code{\link{coll}(x)} which
respects character matching rules for the specified locale.

Match character, word, line and sentence boundaries with
\code{\link{boundary}()}. An empty pattern, "", is equivalent to
\code{boundary("character")}.}
}
\value{
An integer vector.
}
\description{
Vectorised over \code{string} and \code{pattern}.
} \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_count(fruit, "a") str_count(fruit, "p") str_count(fruit, "e") str_count(fruit, c("a", "b", "p", "p")) str_count(c("a.", "...", ".a.a"), ".") str_count(c("a.", "...", ".a.a"), fixed(".")) } \seealso{ \code{\link[stringi]{stri_count}} which this function wraps. \code{\link{str_locate}}/\code{\link{str_locate_all}} to locate position of matches } stringr/man/case.Rd0000644000175100001440000000121112442343223013727 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/case.R \name{case} \alias{case} \alias{str_to_lower} \alias{str_to_title} \alias{str_to_upper} \title{Convert case of a string.} \usage{ str_to_upper(string, locale = "") str_to_lower(string, locale = "") str_to_title(string, locale = "") } \arguments{ \item{string}{String to modify} \item{locale}{Locale to use for translations.} } \description{ Convert case of a string. } \examples{ dog <- "The quick brown dog" str_to_upper(dog) str_to_lower(dog) str_to_title(dog) # Locale matters! str_to_upper("i", "en") # English str_to_upper("i", "tr") # Turkish } stringr/man/str_replace.Rd0000644000175100001440000000360612442343223015331 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/replace.r \name{str_replace} \alias{str_replace} \alias{str_replace_all} \title{Replace matched patterns in a string.} \usage{ str_replace(string, pattern, replacement) str_replace_all(string, pattern, replacement) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern,replacement}{Supply separate pattern and replacement strings to vectorise over the patterns. References of the form \code{\1}, \code{\2} will be replaced with the contents of the respective matched group (created by \code{()}) within the pattern. 
For \code{str_replace_all} only, you can perform multiple patterns and replacements to each string, by passing a named character to \code{pattern}.} } \value{ A character vector. } \description{ Vectorised over \code{string}, \code{pattern} and \code{replacement}. } \examples{ fruits <- c("one apple", "two pears", "three bananas") str_replace(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", "-") str_replace(fruits, "([aeiou])", "") str_replace(fruits, "([aeiou])", "\\\\1\\\\1") str_replace(fruits, "[aeiou]", c("1", "2", "3")) str_replace(fruits, c("a", "e", "i"), "-") fruits <- c("one apple", "two pears", "three bananas") str_replace(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", "-") str_replace_all(fruits, "([aeiou])", "") str_replace_all(fruits, "([aeiou])", "\\\\1\\\\1") str_replace_all(fruits, "[aeiou]", c("1", "2", "3")) str_replace_all(fruits, c("a", "e", "i"), "-") # If you want to apply multiple patterns and replacements to the same # string, pass a named version to pattern. str_replace_all(str_c(fruits, collapse = "---"), c("one" = 1, "two" = 2, "three" = 3)) } \seealso{ \code{str_replace_na} to turn missing values into "NA"; \code{\link{stri_replace}} for the underlying implementation. } stringr/man/str_wrap.Rd0000644000175100001440000000230412452600062014657 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/wrap.r \name{str_wrap} \alias{str_wrap} \title{Wrap strings into nicely formatted paragraphs.} \usage{ str_wrap(string, width = 80, indent = 0, exdent = 0) } \arguments{ \item{string}{character vector of strings to reformat.} \item{width}{positive integer giving target line width in characters. 
A width less than or equal to 1 will put each word on its own line.} \item{indent}{non-negative integer giving indentation of first line in each paragraph} \item{exdent}{non-negative integer giving indentation of following lines in each paragraph} } \value{ A character vector of re-wrapped strings. } \description{ This is a wrapper around \code{\link[stringi]{stri_wrap}} which implements the Knuth-Plass paragraph wrapping algorithm. } \examples{ thanks_path <- file.path(R.home("doc"), "THANKS") thanks <- str_c(readLines(thanks_path), collapse = "\\n") thanks <- word(thanks, 1, 3, fixed("\\n\\n")) cat(str_wrap(thanks), "\\n") cat(str_wrap(thanks, width = 40), "\\n") cat(str_wrap(thanks, width = 60, indent = 2), "\\n") cat(str_wrap(thanks, width = 60, exdent = 2), "\\n") cat(str_wrap(thanks, width = 0, exdent = 2), "\\n") } stringr/man/str_pad.Rd0000644000175100001440000000167712442343223014470 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/pad-trim.r \name{str_pad} \alias{str_pad} \title{Pad a string.} \usage{ str_pad(string, width, side = c("left", "right", "both"), pad = " ") } \arguments{ \item{string}{A character vector.} \item{width}{Minimum width of padded strings.} \item{side}{Side on which padding character is added (left, right or both).} \item{pad}{Single padding character (default is a space).} } \value{ A character vector. } \description{ Vectorised over \code{string}, \code{width} and \code{pad}. 
} \examples{ rbind( str_pad("hadley", 30, "left"), str_pad("hadley", 30, "right"), str_pad("hadley", 30, "both") ) # All arguments are vectorised except side str_pad(c("a", "abc", "abcdef"), 10) str_pad("a", c(5, 10, 20)) str_pad("a", 10, pad = c("-", "_", " ")) # Longer strings are returned unchanged str_pad("hadley", 3) } \seealso{ \code{\link{str_trim}} to remove whitespace } stringr/man/invert_match.Rd0000644000175100001440000000132412442343223015504 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/locate.r \name{invert_match} \alias{invert_match} \title{Switch location of matches to location of non-matches.} \usage{ invert_match(loc) } \arguments{ \item{loc}{matrix of match locations, as from \code{\link{str_locate_all}}} } \value{ numeric matrix giving locations of non-matches } \description{ Invert a matrix of match locations to match the opposite of what was previously matched. } \examples{ numbers <- "1 and 2 and 4 and 456" num_loc <- str_locate_all(numbers, "[0-9]+")[[1]] str_sub(numbers, num_loc[, "start"], num_loc[, "end"]) text_loc <- invert_match(num_loc) str_sub(numbers, text_loc[, "start"], text_loc[, "end"]) } stringr/man/str_dup.Rd0000644000175100001440000000104712442343223014503 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/dup.r \name{str_dup} \alias{str_dup} \title{Duplicate and concatenate strings within a character vector.} \usage{ str_dup(string, times) } \arguments{ \item{string}{Input character vector.} \item{times}{Number of times to duplicate each string.} } \value{ A character vector. } \description{ Vectorised over \code{string} and \code{times}.
} \examples{ fruit <- c("apple", "pear", "banana") str_dup(fruit, 2) str_dup(fruit, 1:3) str_c("ba", str_dup("na", 0:5)) } stringr/man/str_locate.Rd0000644000175100001440000000347712442343223015173 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/locate.r \name{str_locate} \alias{str_locate} \alias{str_locate_all} \title{Locate the position of patterns in a string.} \usage{ str_locate(string, pattern) str_locate_all(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \link[stringi]{stringi-search-regex}. Control options with \code{\link{regex}()}. Match a fixed string (i.e. by comparing only bytes), using \code{\link{fixed}(x)}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link{coll}(x)} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link{boundary}()}. An empty pattern, "", is equivalent to \code{boundary("character")}.} } \value{ For \code{str_locate}, an integer matrix. First column gives start position of match, and second column gives end position. For \code{str_locate_all}, a list of integer matrices. } \description{ Vectorised over \code{string} and \code{pattern}. If the match is of length 0 (e.g. from a special match like \code{$}), end will be one character less than start.
} \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_locate(fruit, "$") str_locate(fruit, "a") str_locate(fruit, "e") str_locate(fruit, c("a", "b", "p", "p")) str_locate_all(fruit, "a") str_locate_all(fruit, "e") str_locate_all(fruit, c("a", "b", "p", "p")) # Find location of every character str_locate_all(fruit, "") } \seealso{ \code{\link{str_extract}} for a convenient way of extracting matches, \code{\link[stringi]{stri_locate}} for the underlying implementation. } stringr/man/str_detect.Rd0000644000175100001440000000252412442343223015164 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/detect.r \name{str_detect} \alias{str_detect} \title{Detect the presence or absence of a pattern in a string.} \usage{ str_detect(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \link[stringi]{stringi-search-regex}. Control options with \code{\link{regex}()}. Match a fixed string (i.e. by comparing only bytes), using \code{\link{fixed}(x)}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link{coll}(x)} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link{boundary}()}. An empty pattern, "", is equivalent to \code{boundary("character")}.} } \value{ A logical vector. } \description{ Vectorised over \code{string} and \code{pattern}. 
} \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_detect(fruit, "a") str_detect(fruit, "^a") str_detect(fruit, "a$") str_detect(fruit, "b") str_detect(fruit, "[aeiou]") # Also vectorised over pattern str_detect("aecfg", letters) } \seealso{ \code{\link[stringi]{stri_detect}} which this function wraps } stringr/man/stringr.Rd0000644000175100001440000000034612442343223014514 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/stringr.R \name{stringr} \alias{stringr} \title{Fast and friendly string manipulation.} \description{ Fast and friendly string manipulation. } stringr/man/modifier-deprecated.Rd0000644000175100001440000000056212442343223016720 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/modifiers.r \name{modifier-deprecated} \alias{ignore.case} \alias{modifier-deprecated} \alias{perl} \title{Deprecated modifier functions.} \usage{ ignore.case(string) perl(pattern) } \description{ Please use \code{\link{regex}} and \code{\link{coll}} instead. } \keyword{internal} stringr/man/str_order.Rd0000644000175100001440000000214312442343223015024 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/sort.R \name{str_order} \alias{str_order} \alias{str_sort} \title{Order or sort a character vector.} \usage{ str_order(x, decreasing = FALSE, na_last = TRUE, locale = "", ...) str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "", ...) } \arguments{ \item{x}{A character vector to sort.} \item{decreasing}{A boolean. If \code{FALSE}, the default, sorts from lowest to highest; if \code{TRUE} sorts from highest to lowest.} \item{na_last}{Where should \code{NA} go? \code{TRUE} at the end, \code{FALSE} at the beginning, \code{NA} dropped.} \item{locale}{In which locale should the sorting occur? Defaults to the current locale.} \item{...}{Other options used to control sorting order.
Passed on to \code{\link[stringi]{stri_opts_collator}}.} } \description{ Order or sort a character vector. } \examples{ str_order(letters, locale = "en") str_sort(letters, locale = "en") str_order(letters, locale = "haw") str_sort(letters, locale = "haw") } \seealso{ \code{\link[stringi]{stri_order}} for the underlying implementation. } stringr/man/str_extract.Rd0000644000175100001440000000363612442343223015373 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/extract.r \name{str_extract} \alias{str_extract} \alias{str_extract_all} \title{Extract matching patterns from a string.} \usage{ str_extract(string, pattern) str_extract_all(string, pattern, simplify = FALSE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \link[stringi]{stringi-search-regex}. Control options with \code{\link{regex}()}. Match a fixed string (i.e. by comparing only bytes), using \code{\link{fixed}(x)}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link{coll}(x)} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link{boundary}()}. An empty pattern, "", is equivalent to \code{boundary("character")}.} \item{simplify}{If \code{FALSE}, the default, returns a list of character vectors. If \code{TRUE} returns a character matrix.} } \value{ A character vector. } \description{ Vectorised over \code{string} and \code{pattern}. 
} \examples{ shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2") str_extract(shopping_list, "\\\\d") str_extract(shopping_list, "[a-z]+") str_extract(shopping_list, "[a-z]{1,4}") str_extract(shopping_list, "\\\\b[a-z]{1,4}\\\\b") # Extract all matches str_extract_all(shopping_list, "[a-z]+") str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b") str_extract_all(shopping_list, "\\\\d") # Simplify results into character matrix str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b", simplify = TRUE) str_extract_all(shopping_list, "\\\\d", simplify = TRUE) } \seealso{ \code{\link[stringi]{stri_extract_first}} and \code{\link[stringi]{stri_extract_all}} for the underlying implementation. } stringr/man/str_match.Rd0000644000175100001440000000247512442343223015015 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/match.r \name{str_match} \alias{str_match} \alias{str_match_all} \title{Extract matched groups from a string.} \usage{ str_match(string, pattern) str_match_all(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for, as defined by an ICU regular expression. See \link[stringi]{stringi-search-regex} for more details.} } \value{ For \code{str_match}, a character matrix. First column is the complete match, followed by one column for each capture group. For \code{str_match_all}, a list of character matrices. } \description{ Vectorised over \code{string} and \code{pattern}. 
} \examples{ strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569", "387 287 6718", "apple", "233.398.9187 ", "482 952 3315", "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000", "Home: 543.355.3679") phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" str_extract(strings, phone) str_match(strings, phone) # Extract/match all str_extract_all(strings, phone) str_match_all(strings, phone) } \seealso{ \code{\link{str_extract}} to extract the complete match, \code{\link[stringi]{stri_match}} for the underlying implementation. } stringr/man/modifiers.Rd0000644000175100001440000000521412442343462015011 0ustar hornikusers% Generated by roxygen2 (4.1.0): do not edit by hand % Please edit documentation in R/modifiers.r \name{modifiers} \alias{boundary} \alias{coll} \alias{fixed} \alias{modifiers} \alias{regex} \title{Control matching behaviour with modifier functions.} \usage{ fixed(pattern, ignore_case = FALSE) coll(pattern, ignore_case = FALSE, locale = NULL, ...) regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...) boundary(type = c("character", "line_break", "sentence", "word"), skip_word_none = TRUE, ...) } \arguments{ \item{pattern}{Pattern to modify behaviour.} \item{ignore_case}{Should case differences be ignored in the match?} \item{locale}{Locale to use for comparisons. See \code{\link[stringi]{stri_locale_list}()} for all possible options.} \item{...}{Other less frequently used arguments passed on to \code{\link[stringi]{stri_opts_collator}}, \code{\link[stringi]{stri_opts_regex}}, or \code{\link[stringi]{stri_opts_brkiter}}} \item{multiline}{If \code{TRUE}, \code{^} and \code{$} match the beginning and end of each line. If \code{FALSE}, the default, they match only the start and end of the input.} \item{comments}{If \code{TRUE}, white space and comments beginning with \code{#} are ignored.
Escape literal spaces with \code{\\ }.} \item{dotall}{If \code{TRUE}, \code{.} will also match line terminators.} \item{type}{Boundary type to detect.} \item{skip_word_none}{Ignore "words" that don't contain any letters or numbers - i.e. punctuation.} } \description{ \describe{ \item{fixed}{Compare literal bytes in the string. This is very fast, but not usually what you want for non-ASCII character sets.} \item{coll}{Compare strings respecting standard collation rules.} \item{regex}{The default. Uses ICU regular expressions.} \item{boundary}{Match boundaries between things.} } } \examples{ pattern <- "a.b" strings <- c("abb", "a.b") str_detect(strings, pattern) str_detect(strings, fixed(pattern)) str_detect(strings, coll(pattern)) # coll() is useful for locale-aware case-insensitive matching i <- c("I", "\\u0130", "i") i str_detect(i, fixed("i", TRUE)) str_detect(i, coll("i", TRUE)) str_detect(i, coll("i", TRUE, locale = "tr")) # Word boundaries words <- c("These are some words.") str_count(words, boundary("word")) str_split(words, " ")[[1]] str_split(words, boundary("word"))[[1]] # Regular expression variations str_extract_all("The Cat in the Hat", "[a-z]+") str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE)) str_extract_all("a\\nb\\nc", "^.") str_extract_all("a\\nb\\nc", regex("^.", multiline = TRUE)) str_extract_all("a\\nb\\nc", "a.") str_extract_all("a\\nb\\nc", regex("a.", dotall = TRUE)) }