stringr/0000755000176200001440000000000014524777112011754 5ustar liggesusersstringr/NAMESPACE0000644000176200001440000000276414520174727013203 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method("[",stringr_pattern) S3method("[",stringr_view) S3method(print,stringr_view) S3method(type,character) S3method(type,default) S3method(type,stringr_boundary) S3method(type,stringr_coll) S3method(type,stringr_fixed) S3method(type,stringr_regex) export("%>%") export("str_sub<-") export(boundary) export(coll) export(fixed) export(invert_match) export(regex) export(str_c) export(str_conv) export(str_count) export(str_detect) export(str_dup) export(str_ends) export(str_equal) export(str_escape) export(str_extract) export(str_extract_all) export(str_flatten) export(str_flatten_comma) export(str_glue) export(str_glue_data) export(str_interp) export(str_length) export(str_like) export(str_locate) export(str_locate_all) export(str_match) export(str_match_all) export(str_order) export(str_pad) export(str_rank) export(str_remove) export(str_remove_all) export(str_replace) export(str_replace_all) export(str_replace_na) export(str_sort) export(str_split) export(str_split_1) export(str_split_fixed) export(str_split_i) export(str_squish) export(str_starts) export(str_sub) export(str_sub_all) export(str_subset) export(str_to_lower) export(str_to_sentence) export(str_to_title) export(str_to_upper) export(str_trim) export(str_trunc) export(str_unique) export(str_view) export(str_view_all) export(str_which) export(str_width) export(str_wrap) export(word) import(rlang) import(stringi) importFrom(glue,glue) importFrom(lifecycle,deprecated) importFrom(magrittr,"%>%") stringr/LICENSE0000644000176200001440000000005514524700555012756 0ustar liggesusersYEAR: 2023 COPYRIGHT HOLDER: stringr authors stringr/README.md0000644000176200001440000001432314524700555013233 0ustar liggesusers # stringr [![CRAN status](https://www.r-pkg.org/badges/version/stringr)](https://cran.r-project.org/package=stringr) [![R-CMD-check](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/stringr/actions/workflows/R-CMD-check.yaml) [![Codecov test coverage](https://codecov.io/gh/tidyverse/stringr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/stringr?branch=main) [![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) ## Overview Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you’re not familiar with strings, the best place to start is the [chapter on strings](https://r4ds.hadley.nz/strings) in R for Data Science. stringr is built on top of [stringi](https://github.com/gagolews/stringi), which uses the [ICU](https://icu.unicode.org) C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you’ve mastered stringr, you should find stringi similarly easy to use. 
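For example, here is a quick illustration of the shared conventions, using
`stri_detect_regex()` and `stri_sub()` as the stringi counterparts:

``` r
library(stringr)
library(stringi)

x <- c("why", "video", "cross")

# same task, parallel naming and argument order
str_detect(x, "[aeiou]")         # stringr
stri_detect_regex(x, "[aeiou]")  # stringi

str_sub(x, 1, 2)                 # stringr
stri_sub(x, 1, 2)                # stringi
```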
## Installation ``` r # The easiest way to get stringr is to install the whole tidyverse: install.packages("tidyverse") # Alternatively, install just stringr: install.packages("stringr") ``` ## Cheatsheet ## Usage All functions in stringr start with `str_` and take a vector of strings as the first argument: ``` r x <- c("why", "video", "cross", "extra", "deal", "authority") str_length(x) #> [1] 3 5 5 5 4 9 str_c(x, collapse = ", ") #> [1] "why, video, cross, extra, deal, authority" str_sub(x, 1, 2) #> [1] "wh" "vi" "cr" "ex" "de" "au" ``` Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression `"[aeiou]"` matches any single character that is a vowel: ``` r str_subset(x, "[aeiou]") #> [1] "video" "cross" "extra" "deal" "authority" str_count(x, "[aeiou]") #> [1] 0 3 1 2 2 4 ``` There are seven main verbs that work with patterns: - `str_detect(x, pattern)` tells you if there’s any match to the pattern: ``` r str_detect(x, "[aeiou]") #> [1] FALSE TRUE TRUE TRUE TRUE TRUE ``` - `str_count(x, pattern)` counts the number of patterns: ``` r str_count(x, "[aeiou]") #> [1] 0 3 1 2 2 4 ``` - `str_subset(x, pattern)` extracts the matching components: ``` r str_subset(x, "[aeiou]") #> [1] "video" "cross" "extra" "deal" "authority" ``` - `str_locate(x, pattern)` gives the position of the match: ``` r str_locate(x, "[aeiou]") #> start end #> [1,] NA NA #> [2,] 2 2 #> [3,] 3 3 #> [4,] 1 1 #> [5,] 2 2 #> [6,] 1 1 ``` - `str_extract(x, pattern)` extracts the text of the match: ``` r str_extract(x, "[aeiou]") #> [1] NA "i" "o" "e" "e" "a" ``` - `str_match(x, pattern)` extracts parts of the match defined by parentheses: ``` r # extract the characters on either side of the vowel str_match(x, "(.)[aeiou](.)") #> [,1] [,2] [,3] #> [1,] NA NA NA #> [2,] "vid" "v" "d" #> [3,] "ros" "r" "s" #> [4,] NA NA NA #> [5,] "dea" "d" "a" #> [6,] "aut" "a" "t" ``` - `str_replace(x, pattern, replacement)` replaces the matches with new text: ``` r str_replace(x, "[aeiou]", "?") #> [1] "why" "v?deo" "cr?ss" "?xtra" "d?al" "?uthority" ``` - `str_split(x, pattern)` splits up a string into multiple pieces: ``` r str_split(c("a,b", "c,d,e"), ",") #> [[1]] #> [1] "a" "b" #> #> [[2]] #> [1] "c" "d" "e" ``` As well as regular expressions (the default), there are three other pattern matching engines: - `fixed()`: match exact bytes - `coll()`: match human letters - `boundary()`: match boundaries ## RStudio Addin The [RegExplain RStudio addin](https://www.garrickadenbuie.com/project/regexplain/) provides a friendly interface for working with regular expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions. This addin can easily be installed with devtools: ``` r # install.packages("devtools") devtools::install_github("gadenbuie/regexplain") ``` ## Compared to base R R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. - Uses consistent function and argument names. 
The first argument is always the vector of strings to modify, which
  makes stringr work particularly well in conjunction with the pipe:

``` r
letters %>%
  .[1:10] %>%
  str_pad(3, "right") %>%
  str_c(letters[2:11])
#>  [1] "a b" "b c" "c d" "d e" "e f" "f g" "g h" "h i" "i j" "j k"
```

- Simplifies string operations by eliminating options that you don’t
  need 95% of the time.

- Produces outputs that can easily be used as inputs. This includes
  ensuring that missing inputs result in missing outputs, and zero
  length inputs result in zero length outputs.

Learn more in `vignette("from-base")`.
  str_glue_data("{rownames(.)} has {hp} hp")
}
stringr/man/str_pad.Rd0000644000176200001440000000264214520174727014455 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pad.R
\name{str_pad}
\alias{str_pad}
\title{Pad a string to minimum width}
\usage{
str_pad(
  string,
  width,
  side = c("left", "right", "both"),
  pad = " ",
  use_width = TRUE
)
}
\arguments{
\item{string}{Input vector.
Either a character vector, or something coercible to one.}

\item{width}{Minimum width of padded strings.}

\item{side}{Side on which padding character is added (left, right or both).}

\item{pad}{Single padding character (default is a space).}

\item{use_width}{If \code{FALSE}, use the length of the string instead of the
width; see \code{\link[=str_width]{str_width()}}/\code{\link[=str_length]{str_length()}}
for the difference.}
}
\value{
A character vector the same length as \code{string}/\code{width}/\code{pad}.
}
\description{
Pad a string to a fixed width, so that \code{str_length(str_pad(x, n))} is
always greater than or equal to \code{n}.
}
\examples{
rbind(
  str_pad("hadley", 30, "left"),
  str_pad("hadley", 30, "right"),
  str_pad("hadley", 30, "both")
)

# All arguments are vectorised except side
str_pad(c("a", "abc", "abcdef"), 10)
str_pad("a", c(5, 10, 20))
str_pad("a", 10, pad = c("-", "_", " "))

# Longer strings are returned unchanged
str_pad("hadley", 3)
}
\seealso{
\code{\link[=str_trim]{str_trim()}} to remove whitespace;
\code{\link[=str_trunc]{str_trunc()}} to decrease the maximum width of a string.
}
stringr/man/str_escape.Rd0000644000176200001440000000136714520174727015154 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/escape.R
\name{str_escape}
\alias{str_escape}
\title{Escape regular expression metacharacters}
\usage{
str_escape(string)
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}
}
\value{
A character vector the same length as \code{string}.
}
\description{
This function escapes metacharacters, the characters that have special
meaning to the regular expression engine. In most cases you are better off
using \code{\link[=fixed]{fixed()}} since it is faster, but \code{str_escape()} is
useful if you are composing user-provided strings into a pattern.
}
\examples{
str_detect(c("a", "."), ".")
str_detect(c("a", "."), str_escape("."))
}
stringr/man/word.Rd0000644000176200001440000000205714520174727013774 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/word.R
\name{word}
\alias{word}
\title{Extract words from a sentence}
\usage{
word(string, start = 1L, end = start, sep = fixed(" "))
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{start, end}{Pair of integer vectors giving range of words (inclusive)
to extract. If negative, counts backwards from the last word. The default
value selects the first word.}

\item{sep}{Separator between words. Defaults to a single space.}
}
\value{
A character vector with the same length as \code{string}/\code{start}/\code{end}.
}
\description{
Extract words from a sentence
}
\examples{
sentences <- c("Jane saw a cat", "Jane sat down")
word(sentences, 1)
word(sentences, 2)
word(sentences, -1)
word(sentences, 2, -1)

# Also vectorised over start and end
word(sentences[1], 1:3, -1)
word(sentences[1], 1, 1:4)

# Can define words by other separators
str <- 'abc.def..123.4568.999'
word(str, 1, sep = fixed('..'))
word(str, 2, sep = fixed('..'))
}
stringr/man/str_equal.Rd0000644000176200001440000000243114520174727015014 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/equal.R
\name{str_equal}
\alias{str_equal}
\title{Determine if two strings are equivalent}
\usage{
str_equal(x, y, locale = "en", ignore_case = FALSE, ...)
} \arguments{ \item{x, y}{A pair of character vectors.} \item{locale}{Locale to use for comparisons. See \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options. Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.} \item{ignore_case}{Ignore case when comparing strings?} \item{...}{Other options used to control collation. Passed on to \code{\link[stringi:stri_opts_collator]{stringi::stri_opts_collator()}}.} } \value{ An logical vector the same length as \code{x}/\code{y}. } \description{ This uses Unicode canonicalisation rules, and optionally ignores case. } \examples{ # These two strings encode "a" with an accent in two different ways a1 <- "\u00e1" a2 <- "a\u0301" c(a1, a2) a1 == a2 str_equal(a1, a2) # ohm and omega use different code points but should always be treated # as equal ohm <- "\u2126" omega <- "\u03A9" c(ohm, omega) ohm == omega str_equal(ohm, omega) } \seealso{ \code{\link[stringi:stri_compare]{stringi::stri_cmp_equiv()}} for the underlying implementation. } stringr/man/str_dup.Rd0000644000176200001440000000122414520174727014474 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/dup.R \name{str_dup} \alias{str_dup} \title{Duplicate a string} \usage{ str_dup(string, times) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{times}{Number of times to duplicate each string.} } \value{ A character vector the same length as \code{string}/\code{times}. } \description{ \code{str_dup()} duplicates the characters within a string, e.g. \code{str_dup("xy", 3)} returns \code{"xyxyxy"}. } \examples{ fruit <- c("apple", "pear", "banana") str_dup(fruit, 2) str_dup(fruit, 1:3) str_c("ba", str_dup("na", 0:5)) } stringr/man/str_locate.Rd0000644000176200001440000000453714520174727015165 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/locate.R \name{str_locate} \alias{str_locate} \alias{str_locate_all} \title{Find location of match} \usage{ str_locate(string, pattern) str_locate_all(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to \code{boundary("character")}.} } \value{ \itemize{ \item \code{str_locate()} returns an integer matrix with two columns and one row for each element of \code{string}. The first column, \code{start}, gives the position at the start of the match, and the second column, \code{end}, gives the position of the end. \item \code{str_locate_all()} returns a list of integer matrices with the same length as \code{string}/\code{pattern}. The matrices have columns \code{start} and \code{end} as above, and one row for each match. 
} } \description{ \code{str_locate()} returns the \code{start} and \code{end} position of the first match; \code{str_locate_all()} returns the \code{start} and \code{end} position of each match. Because the \code{start} and \code{end} values are inclusive, zero-length matches (e.g. \code{$}, \code{^}, \verb{\\\\b}) will have an \code{end} that is smaller than \code{start}. } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_locate(fruit, "$") str_locate(fruit, "a") str_locate(fruit, "e") str_locate(fruit, c("a", "b", "p", "p")) str_locate_all(fruit, "a") str_locate_all(fruit, "e") str_locate_all(fruit, c("a", "b", "p", "p")) # Find location of every character str_locate_all(fruit, "") } \seealso{ \code{\link[=str_extract]{str_extract()}} for a convenient way of extracting matches, \code{\link[stringi:stri_locate]{stringi::stri_locate()}} for the underlying implementation. } stringr/man/str_order.Rd0000644000176200001440000000404714520174727015025 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/sort.R \name{str_order} \alias{str_order} \alias{str_rank} \alias{str_sort} \title{Order, rank, or sort a character vector} \usage{ str_order( x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ... ) str_rank(x, locale = "en", numeric = FALSE, ...) str_sort( x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ... ) } \arguments{ \item{x}{A character vector to sort.} \item{decreasing}{A boolean. If \code{FALSE}, the default, sorts from lowest to highest; if \code{TRUE} sorts from highest to lowest.} \item{na_last}{Where should \code{NA} go? \code{TRUE} at the end, \code{FALSE} at the beginning, \code{NA} dropped.} \item{locale}{Locale to use for comparisons. See \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options. Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.} \item{numeric}{If \code{TRUE}, will sort digits numerically, instead of as strings.} \item{...}{Other options used to control collation. Passed on to \code{\link[stringi:stri_opts_collator]{stringi::stri_opts_collator()}}.} } \value{ A character vector the same length as \code{string}. } \description{ \itemize{ \item \code{str_sort()} returns the sorted vector. \item \code{str_order()} returns an integer vector that returns the desired order when used for subsetting, i.e. \code{x[str_order(x)]} is the same as \code{str_sort()} \item \code{str_rank()} returns the ranks of the values, i.e. \code{arrange(df, str_rank(x))} is the same as \code{str_sort(df$x)}. } } \examples{ x <- c("apple", "car", "happy", "char") str_sort(x) str_order(x) x[str_order(x)] str_rank(x) # In Czech, ch is a digraph that sorts after h str_sort(x, locale = "cs") # Use numeric = TRUE to sort numbers in strings x <- c("100a10", "100a5", "2b", "2a") str_sort(x) str_sort(x, numeric = TRUE) } \seealso{ \code{\link[stringi:stri_order]{stringi::stri_order()}} for the underlying implementation. } stringr/man/str_trim.Rd0000644000176200001440000000204114520174727014655 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/trim.R \name{str_trim} \alias{str_trim} \alias{str_squish} \title{Remove whitespace} \usage{ str_trim(string, side = c("both", "left", "right")) str_squish(string) } \arguments{ \item{string}{Input vector. 
Either a character vector, or something coercible to one.} \item{side}{Side on which to remove whitespace: "left", "right", or "both", the default.} } \value{ A character vector the same length as \code{string}. } \description{ \code{str_trim()} removes whitespace from start and end of string; \code{str_squish()} removes whitespace at the start and end, and replaces all internal whitespace with a single space. } \examples{ str_trim(" String with trailing and leading white space\t") str_trim("\n\nString with trailing and leading white space\n\n") str_squish(" String with trailing, middle, and leading white space\t") str_squish("\n\nString with excess, trailing and leading white space\n\n") } \seealso{ \code{\link[=str_pad]{str_pad()}} to add whitespace } stringr/man/str_extract.Rd0000644000176200001440000000525614520174727015367 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/extract.R \name{str_extract} \alias{str_extract} \alias{str_extract_all} \title{Extract the complete match} \usage{ str_extract(string, pattern, group = NULL) str_extract_all(string, pattern, simplify = FALSE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to \code{boundary("character")}.} \item{group}{If supplied, instead of returning the complete match, will return the matched text from the specified capturing group.} \item{simplify}{A boolean. \itemize{ \item \code{FALSE} (the default): returns a list of character vectors. \item \code{TRUE}: returns a character matrix. }} } \value{ \itemize{ \item \code{str_extract()}: an character vector the same length as \code{string}/\code{pattern}. \item \code{str_extract_all()}: a list of character vectors the same length as \code{string}/\code{pattern}. } } \description{ \code{str_extract()} extracts the first complete match from each string, \code{str_extract_all()}extracts all matches from each string. 
} \examples{ shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2") str_extract(shopping_list, "\\\\d") str_extract(shopping_list, "[a-z]+") str_extract(shopping_list, "[a-z]{1,4}") str_extract(shopping_list, "\\\\b[a-z]{1,4}\\\\b") str_extract(shopping_list, "([a-z]+) of ([a-z]+)") str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 1) str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 2) # Extract all matches str_extract_all(shopping_list, "[a-z]+") str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b") str_extract_all(shopping_list, "\\\\d") # Simplify results into character matrix str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b", simplify = TRUE) str_extract_all(shopping_list, "\\\\d", simplify = TRUE) # Extract all words str_extract_all("This is, suprisingly, a sentence.", boundary("word")) } \seealso{ \code{\link[=str_match]{str_match()}} to extract matched groups; \code{\link[stringi:stri_extract]{stringi::stri_extract()}} for the underlying implementation. } stringr/man/str_conv.Rd0000644000176200001440000000125114316640452014645 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/conv.R \name{str_conv} \alias{str_conv} \title{Specify the encoding of a string} \usage{ str_conv(string, encoding) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{encoding}{Name of encoding. See \code{\link[stringi:stri_enc_list]{stringi::stri_enc_list()}} for a complete list.} } \description{ This is a convenient way to override the current encoding of a string. } \examples{ # Example from encoding?stringi::stringi x <- rawToChar(as.raw(177)) x str_conv(x, "ISO-8859-2") # Polish "a with ogonek" str_conv(x, "ISO-8859-1") # Plus-minus } stringr/man/stringr-data.Rd0000644000176200001440000000135214316043620015403 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{stringr-data} \alias{stringr-data} \alias{sentences} \alias{fruit} \alias{words} \title{Sample character vectors for practicing string manipulations} \format{ Character vectors. } \usage{ sentences fruit words } \description{ \code{fruit} and \code{words} come from the \code{rcorpora} package written by Gabor Csardi; the data was collected by Darius Kazemi and made available at \url{https://github.com/dariusk/corpora}. \code{sentences} is a collection of "Harvard sentences" used for standardised testing of voice. } \examples{ length(sentences) sentences[1:5] length(fruit) fruit[1:5] length(words) words[1:5] } \keyword{datasets} stringr/man/str_flatten.Rd0000644000176200001440000000322514520174727015344 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/flatten.R \name{str_flatten} \alias{str_flatten} \alias{str_flatten_comma} \title{Flatten a string} \usage{ str_flatten(string, collapse = "", last = NULL, na.rm = FALSE) str_flatten_comma(string, last = NULL, na.rm = FALSE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{collapse}{String to insert between each piece. Defaults to \code{""}.} \item{last}{Optional string to use in place of the final separator.} \item{na.rm}{Remove missing values? If \code{FALSE} (the default), the result will be \code{NA} if any element of \code{string} is \code{NA}.} } \value{ A string, i.e. a character vector of length 1. } \description{ \code{str_flatten()} reduces a character vector to a single string. 
This is a summary function because regardless of the length of the input
\code{x}, it always returns a single string.

\code{str_flatten_comma()} is a variation designed specifically for flattening
with commas. It automatically recognises if \code{last} uses the Oxford comma
and handles the special case of 2 elements.
}
\examples{
str_flatten(letters)
str_flatten(letters, "-")

str_flatten(letters[1:3], ", ")

# Use last to customise the last component
str_flatten(letters[1:3], ", ", " and ")

# this almost works if you want an Oxford (aka serial) comma
str_flatten(letters[1:3], ", ", ", and ")
# but it will always add a comma, even when not necessary
str_flatten(letters[1:2], ", ", ", and ")

# str_flatten_comma knows how to handle the Oxford comma
str_flatten_comma(letters[1:3], ", and ")
str_flatten_comma(letters[1:2], ", and ")
}
stringr/man/str_subset.Rd0000644000176200001440000000361314524677110015213 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/subset.R
\name{str_subset}
\alias{str_subset}
\title{Find matching elements}
\usage{
str_subset(string, pattern, negate = FALSE)
}
\arguments{
\item{string}{Input vector. Either a character vector, or something
coercible to one.}

\item{pattern}{Pattern to look for.

The default interpretation is a regular expression, as described in
\code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for
finer control of the matching behaviour.

Match a fixed string (i.e. by comparing only bytes), using
\code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally,
for matching human text, you'll want \code{\link[=coll]{coll()}} which
respects character matching rules for the specified locale.

Match character, word, line and sentence boundaries with
\code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to
\code{boundary("character")}.}

\item{negate}{If \code{TRUE}, inverts the resulting boolean vector.}
}
\value{
A character vector, usually smaller than \code{string}.
}
\description{
\code{str_subset()} returns all elements of \code{string} where there's at least
one match to \code{pattern}. It's a wrapper around \code{x[str_detect(x, pattern)]},
and is equivalent to \code{grep(pattern, x, value = TRUE)}.

Use \code{\link[=str_extract]{str_extract()}} to extract just the matching part
\emph{within} each string.
} \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_subset(fruit, "a") str_subset(fruit, "^a") str_subset(fruit, "a$") str_subset(fruit, "b") str_subset(fruit, "[aeiou]") # Elements that don't match str_subset(fruit, "^p", negate = TRUE) # Missings never match str_subset(c("a", NA, "b"), ".") } \seealso{ \code{\link[=grep]{grep()}} with argument \code{value = TRUE}, \code{\link[stringi:stri_subset]{stringi::stri_subset()}} for the underlying implementation. } stringr/man/str_length.Rd0000644000176200001440000000260214520174727015166 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/length.R \name{str_length} \alias{str_length} \alias{str_width} \title{Compute the length/width} \usage{ str_length(string) str_width(string) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} } \value{ A numeric vector the same length as \code{string}. } \description{ \code{str_length()} returns the number of codepoints in a string. These are the individual elements (which are often, but not always letters) that can be extracted with \code{\link[=str_sub]{str_sub()}}. \code{str_width()} returns how much space the string will occupy when printed in a fixed width font (i.e. when printed in the console). } \examples{ str_length(letters) str_length(NA) str_length(factor("abc")) str_length(c("i", "like", "programming", NA)) # Some characters, like emoji and Chinese characters (hanzi), are square # which means they take up the width of two Latin characters x <- c("\u6c49\u5b57", "\U0001f60a") str_view(x) str_width(x) str_length(x) # There are two ways of representing a u with an umlaut u <- c("\u00fc", "u\u0308") # They have the same width str_width(u) # But a different length str_length(u) # Because the second element is made up of a u + an accent str_sub(u, 1, 1) } \seealso{ \code{\link[stringi:stri_length]{stringi::stri_length()}} which this function wraps. } stringr/man/str_sub.Rd0000644000176200001440000000442414520174727014502 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/sub.R \name{str_sub} \alias{str_sub} \alias{str_sub<-} \alias{str_sub_all} \title{Get and set substrings using their positions} \usage{ str_sub(string, start = 1L, end = -1L) str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value str_sub_all(string, start = 1L, end = -1L) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{start, end}{A pair of integer vectors defining the range of characters to extract (inclusive). Alternatively, instead of a pair of vectors, you can pass a matrix to \code{start}. The matrix should have two columns, either labelled \code{start} and \code{end}, or \code{start} and \code{length}.} \item{omit_na}{Single logical value. If \code{TRUE}, missing values in any of the arguments provided will result in an unchanged input.} \item{value}{replacement string} } \value{ \itemize{ \item \code{str_sub()}: A character vector the same length as \code{string}/\code{start}/\code{end}. \item \code{str_sub_all()}: A list the same length as \code{string}. Each element is a character vector the same length as \code{start}/\code{end}. } } \description{ \code{str_sub()} extracts or replaces the elements at a single position in each string. \code{str_sub_all()} allows you to extract strings at multiple elements in every string. 
} \examples{ hw <- "Hadley Wickham" str_sub(hw, 1, 6) str_sub(hw, end = 6) str_sub(hw, 8, 14) str_sub(hw, 8) # Negative indices index from end of string str_sub(hw, -1) str_sub(hw, -7) str_sub(hw, end = -7) # str_sub() is vectorised by both string and position str_sub(hw, c(1, 8), c(6, 14)) # if you want to extract multiple positions from multiple strings, # use str_sub_all() x <- c("abcde", "ghifgh") str_sub(x, c(1, 2), c(2, 4)) str_sub_all(x, start = c(1, 2), end = c(2, 4)) # Alternatively, you can pass in a two column matrix, as in the # output from str_locate_all pos <- str_locate_all(hw, "[aeio]")[[1]] pos str_sub(hw, pos) # You can also use `str_sub()` to modify strings: x <- "BBCDEF" str_sub(x, 1, 1) <- "A"; x str_sub(x, -1, -1) <- "K"; x str_sub(x, -2, -2) <- "GHIJ"; x str_sub(x, 2, -2) <- ""; x } \seealso{ The underlying implementation in \code{\link[stringi:stri_sub]{stringi::stri_sub()}} } stringr/man/str_detect.Rd0000644000176200001440000000352514524677110015160 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/detect.R \name{str_detect} \alias{str_detect} \title{Detect the presence/absence of a match} \usage{ str_detect(string, pattern, negate = FALSE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to \code{boundary("character")}.} \item{negate}{If \code{TRUE}, inverts the resulting boolean vector.} } \value{ A logical vector the same length as \code{string}/\code{pattern}. } \description{ \code{str_detect()} returns a logical vector with \code{TRUE} for each element of \code{string} that matches \code{pattern} and \code{FALSE} otherwise. It's equivalent to \code{grepl(pattern, string)}. } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_detect(fruit, "a") str_detect(fruit, "^a") str_detect(fruit, "a$") str_detect(fruit, "b") str_detect(fruit, "[aeiou]") # Also vectorised over pattern str_detect("aecfg", letters) # Returns TRUE if the pattern do NOT match str_detect(fruit, "^p", negate = TRUE) } \seealso{ \code{\link[stringi:stri_detect]{stringi::stri_detect()}} which this function wraps, \code{\link[=str_subset]{str_subset()}} for a convenient wrapper around \code{x[str_detect(x, pattern)]} } stringr/man/str_view.Rd0000644000176200001440000000547614524700753014671 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/view.R \name{str_view} \alias{str_view} \alias{str_view_all} \title{View strings and matches} \usage{ str_view( string, pattern = NULL, match = TRUE, html = FALSE, use_escapes = FALSE ) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. 
Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to \code{boundary("character")}.} \item{match}{If \code{pattern} is supplied, which elements should be shown? \itemize{ \item \code{TRUE}, the default, shows only elements that match the pattern. \item \code{NA} shows all elements. \item \code{FALSE} shows only elements that don't match the pattern. } If \code{pattern} is not supplied, all elements are always shown.} \item{html}{Use HTML output? If \code{TRUE} will create an HTML widget; if \code{FALSE} will style using ANSI escapes.} \item{use_escapes}{If \code{TRUE}, all non-ASCII characters will be rendered with unicode escapes. This is useful to see exactly what underlying values are stored in the string.} } \description{ \code{str_view()} is used to print the underlying representation of a string and to see how a \code{pattern} matches. Matches are surrounded by \verb{<>} and unusual whitespace (i.e. all whitespace apart from \code{" "} and \code{"\\n"}) are surrounded by \code{{}} and escaped. Where possible, matches and unusual whitespace are coloured blue and \code{NA}s red. } \examples{ # Show special characters str_view(c("\"\\\\", "\\\\\\\\\\\\", "fgh", NA, "NA")) # A non-breaking space looks like a regular space: nbsp <- "Hi\u00A0you" nbsp # But it doesn't behave like one: str_detect(nbsp, " ") # So str_view() brings it to your attention with a blue background str_view(nbsp) # You can also use escapes to see all non-ASCII characters str_view(nbsp, use_escapes = TRUE) # Supply a pattern to see where it matches str_view(c("abc", "def", "fghi"), "[aeiou]") str_view(c("abc", "def", "fghi"), "^") str_view(c("abc", "def", "fghi"), "..") # By default, only matching strings will be shown str_view(c("abc", "def", "fghi"), "e") # but you can show all: str_view(c("abc", "def", "fghi"), "e", match = NA) # or just those that don't match: str_view(c("abc", "def", "fghi"), "e", match = FALSE) } stringr/man/str_like.Rd0000644000176200001440000000221514317267003014623 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/detect.R \name{str_like} \alias{str_like} \title{Detect a pattern in the same way as \code{SQL}'s \code{LIKE} operator} \usage{ str_like(string, pattern, ignore_case = TRUE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{A character vector containing a SQL "like" pattern. See above for details.} \item{ignore_case}{Ignore case of matches? Defaults to \code{TRUE} to match the SQL \code{LIKE} operator.} } \value{ A logical vector the same length as \code{string}. } \description{ \code{str_like()} follows the conventions of the SQL \code{LIKE} operator: \itemize{ \item Must match the entire string. \item \verb{_} matches a single character (like \code{.}). \item \verb{\%} matches any number of characters (like \verb{.*}). \item \verb{\\\%} and \verb{\\_} match literal \verb{\%} and \verb{_}. \item The match is case insensitive by default. 
} } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_like(fruit, "app") str_like(fruit, "app\%") str_like(fruit, "ba_ana") str_like(fruit, "\%APPLE") } stringr/man/str_remove.Rd0000644000176200001440000000253214520174727015204 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/remove.R \name{str_remove} \alias{str_remove} \alias{str_remove_all} \title{Remove matched patterns} \usage{ str_remove(string, pattern) str_remove_all(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to \code{boundary("character")}.} } \value{ A character vector the same length as \code{string}/\code{pattern}. } \description{ Remove matches, i.e. replace them with \code{""}. } \examples{ fruits <- c("one apple", "two pears", "three bananas") str_remove(fruits, "[aeiou]") str_remove_all(fruits, "[aeiou]") } \seealso{ \code{\link[=str_replace]{str_replace()}} for the underlying implementation. } stringr/man/pipe.Rd0000644000176200001440000000032013031472465013741 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/utils.R \name{\%>\%} \alias{\%>\%} \title{Pipe operator} \usage{ lhs \%>\% rhs } \description{ Pipe operator } \keyword{internal} stringr/man/str_unique.Rd0000644000176200001440000000261414520174727015216 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/unique.R \name{str_unique} \alias{str_unique} \title{Remove duplicated strings} \usage{ str_unique(string, locale = "en", ignore_case = FALSE, ...) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{locale}{Locale to use for comparisons. See \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options. Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.} \item{ignore_case}{Ignore case when comparing strings?} \item{...}{Other options used to control collation. Passed on to \code{\link[stringi:stri_opts_collator]{stringi::stri_opts_collator()}}.} } \value{ A character vector, usually shorter than \code{string}. } \description{ \code{str_unique()} removes duplicated values, with optional control over how duplication is measured. } \examples{ str_unique(c("a", "b", "c", "b", "a")) str_unique(c("a", "b", "c", "B", "A")) str_unique(c("a", "b", "c", "B", "A"), ignore_case = TRUE) # Use ... to pass additional arguments to stri_unique() str_unique(c("motley", "mötley", "pinguino", "pingüino")) str_unique(c("motley", "mötley", "pinguino", "pingüino"), strength = 1) } \seealso{ \code{\link[=unique]{unique()}}, \code{\link[stringi:stri_unique]{stringi::stri_unique()}} which this function wraps. 
} stringr/man/str_split.Rd0000644000176200001440000000732314524700556015044 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/split.R \name{str_split} \alias{str_split} \alias{str_split_1} \alias{str_split_fixed} \alias{str_split_i} \title{Split up a string into pieces} \usage{ str_split(string, pattern, n = Inf, simplify = FALSE) str_split_1(string, pattern) str_split_fixed(string, pattern, n) str_split_i(string, pattern, i) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to \code{boundary("character")}.} \item{n}{Maximum number of pieces to return. Default (Inf) uses all possible split positions. For \code{str_split()}, this determines the maximum length of each element of the output. For \code{str_split_fixed()}, this determines the number of columns in the output; if an input is too short, the result will be padded with \code{""}.} \item{simplify}{A boolean. \itemize{ \item \code{FALSE} (the default): returns a list of character vectors. \item \code{TRUE}: returns a character matrix. }} \item{i}{Element to return. Use a negative value to count from the right hand side.} } \value{ \itemize{ \item \code{str_split_1()}: a character vector. \item \code{str_split()}: a list the same length as \code{string}/\code{pattern} containing character vectors. \item \code{str_split_fixed()}: a character matrix with \code{n} columns and the same number of rows as the length of \code{string}/\code{pattern}. \item \code{str_split_i()}: a character vector the same length as \code{string}/\code{pattern}. } } \description{ This family of functions provides various ways of splitting a string up into pieces. These two functions return a character vector: \itemize{ \item \code{str_split_1()} takes a single string and splits it into pieces, returning a single character vector. \item \code{str_split_i()} splits each string in a character vector into pieces and extracts the \code{i}th value, returning a character vector. } These two functions return a more complex object: \itemize{ \item \code{str_split()} splits each string in a character vector into a varying number of pieces, returning a list of character vectors. \item \code{str_split_fixed()} splits each string in a character vector into a fixed number of pieces, returning a character matrix. 
} } \examples{ fruits <- c( "apples and oranges and pears and bananas", "pineapples and mangos and guavas" ) str_split(fruits, " and ") str_split(fruits, " and ", simplify = TRUE) # If you want to split a single string, use `str_split_1` str_split_1(fruits[[1]], " and ") # Specify n to restrict the number of possible matches str_split(fruits, " and ", n = 3) str_split(fruits, " and ", n = 2) # If n greater than number of pieces, no padding occurs str_split(fruits, " and ", n = 5) # Use fixed to return a character matrix str_split_fixed(fruits, " and ", 3) str_split_fixed(fruits, " and ", 4) # str_split_i extracts only a single piece from a string str_split_i(fruits, " and ", 1) str_split_i(fruits, " and ", 4) # use a negative number to select from the end str_split_i(fruits, " and ", -1) } \seealso{ \code{\link[=stri_split]{stri_split()}} for the underlying implementation. } stringr/man/str_c.Rd0000644000176200001440000000425314520174727014133 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/c.R \name{str_c} \alias{str_c} \title{Join multiple strings into one string} \usage{ str_c(..., sep = "", collapse = NULL) } \arguments{ \item{...}{One or more character vectors. \code{NULL}s are removed; scalar inputs (vectors of length 1) are recycled to the common length of vector inputs. Like most other R functions, missing values are "infectious": whenever a missing value is combined with another string the result will always be missing. Use \code{\link[dplyr:coalesce]{dplyr::coalesce()}} or \code{\link[=str_replace_na]{str_replace_na()}} to convert to the desired value.} \item{sep}{String to insert between input vectors.} \item{collapse}{Optional string used to combine output into single string. Generally better to use \code{\link[=str_flatten]{str_flatten()}} if you needed this behaviour.} } \value{ If \code{collapse = NULL} (the default) a character vector with length equal to the longest input. If \code{collapse} is a string, a character vector of length 1. } \description{ \code{str_c()} combines multiple character vectors into a single character vector. It's very similar to \code{\link[=paste0]{paste0()}} but uses tidyverse recycling and \code{NA} rules. One way to understand how \code{str_c()} works is picture a 2d matrix of strings, where each argument forms a column. \code{sep} is inserted between each column, and then each row is combined together into a single string. If \code{collapse} is set, it's inserted between each row, and then the result is again combined, this time into a single string. 
} \examples{ str_c("Letter: ", letters) str_c("Letter", letters, sep = ": ") str_c(letters, " is for", "...") str_c(letters[-26], " comes before ", letters[-1]) str_c(letters, collapse = "") str_c(letters, collapse = ", ") # Differences from paste() ---------------------- # Missing inputs give missing outputs str_c(c("a", NA, "b"), "-d") paste0(c("a", NA, "b"), "-d") # Use str_replace_na() to display literal NAs: str_c(str_replace_na(c("a", NA, "b")), "-d") # Uses tidyverse recycling rules \dontrun{str_c(1:2, 1:3)} # errors paste0(1:2, 1:3) str_c("x", character()) paste0("x", character()) } stringr/man/str_match.Rd0000644000176200001440000000437314520174727015010 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/match.R \name{str_match} \alias{str_match} \alias{str_match_all} \title{Extract components (capturing groups) from a match} \usage{ str_match(string, pattern) str_match_all(string, pattern) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Unlike other stringr functions, \code{str_match()} only supports regular expressions, as described in \code{vignette("regular-expressions")}. The pattern should contain at least one capturing group.} } \value{ \itemize{ \item \code{str_match()}: a character matrix with the same number of rows as the length of \code{string}/\code{pattern}. The first column is the complete match, followed by one column for each capture group. The columns will be named if you used "named capture groups", i.e. \verb{(?<name>pattern)}. \item \code{str_match_all()}: a list of the same length as \code{string}/\code{pattern} containing character matrices. Each matrix has columns as described above and one row for each match. } } \description{ Extract any number of matches defined by unnamed, \code{(pattern)}, and named, \verb{(?<name>pattern)} capture groups. Use a non-capturing group, \verb{(?:pattern)}, if you need to override default operator precedence but don't want to capture the result. } \examples{ strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569", "387 287 6718", "apple", "233.398.9187 ", "482 952 3315", "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000", "Home: 543.355.3679") phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" str_extract(strings, phone) str_match(strings, phone) # Extract/match all str_extract_all(strings, phone) str_match_all(strings, phone) # You can also name the groups to make further manipulation easier phone <- "(?<area>[2-9][0-9]{2})[- .](?<number>[0-9]{3}[- .][0-9]{4})" str_match(strings, phone) x <- c("<a> <b>", "<a> <>", "<a>", "", NA) str_match(x, "<(.*?)> <(.*?)>") str_match_all(x, "<(.*?)>") str_extract(x, "<.*?>") str_extract_all(x, "<.*?>") } \seealso{ \code{\link[=str_extract]{str_extract()}} to extract the complete match, \code{\link[stringi:stri_match]{stringi::stri_match()}} for the underlying implementation. } stringr/man/str_count.Rd0000644000176200001440000000310214520174727015031 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/count.R \name{str_count} \alias{str_count} \title{Count number of matches} \usage{ str_count(string, pattern = "") } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour.
Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to \code{boundary("character")}.} } \value{ An integer vector the same length as \code{string}/\code{pattern}. } \description{ Counts the number of times \code{pattern} is found within each element of \code{string.} } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_count(fruit, "a") str_count(fruit, "p") str_count(fruit, "e") str_count(fruit, c("a", "b", "p", "p")) str_count(c("a.", "...", ".a.a"), ".") str_count(c("a.", "...", ".a.a"), fixed(".")) } \seealso{ \code{\link[stringi:stri_count]{stringi::stri_count()}} which this function wraps. \code{\link[=str_locate]{str_locate()}}/\code{\link[=str_locate_all]{str_locate_all()}} to locate position of matches } stringr/man/stringr-package.Rd0000644000176200001440000000204514524700555016075 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/stringr-package.R \docType{package} \name{stringr-package} \alias{stringr} \alias{stringr-package} \title{stringr: Simple, Consistent Wrappers for Common String Operations} \description{ \if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}} A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another. } \seealso{ Useful links: \itemize{ \item \url{https://stringr.tidyverse.org} \item \url{https://github.com/tidyverse/stringr} \item Report bugs at \url{https://github.com/tidyverse/stringr/issues} } } \author{ \strong{Maintainer}: Hadley Wickham \email{hadley@posit.co} [copyright holder] Other contributors: \itemize{ \item Posit Software, PBC [copyright holder, funder] } } \keyword{internal} stringr/man/modifiers.Rd0000644000176200001440000000717414520174727015007 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/modifiers.R \name{modifiers} \alias{modifiers} \alias{fixed} \alias{coll} \alias{regex} \alias{boundary} \title{Control matching behaviour with modifier functions} \usage{ fixed(pattern, ignore_case = FALSE) coll(pattern, ignore_case = FALSE, locale = "en", ...) regex( pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ... ) boundary( type = c("character", "line_break", "sentence", "word"), skip_word_none = NA, ... ) } \arguments{ \item{pattern}{Pattern to modify behaviour.} \item{ignore_case}{Should case differences be ignored in the match? For \code{fixed()}, this uses a simple algorithm which assumes a one-to-one mapping between upper and lower case letters.} \item{locale}{Locale to use for comparisons. See \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options. 
Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.} \item{...}{Other less frequently used arguments passed on to \code{\link[stringi:stri_opts_collator]{stringi::stri_opts_collator()}}, \code{\link[stringi:stri_opts_regex]{stringi::stri_opts_regex()}}, or \code{\link[stringi:stri_opts_brkiter]{stringi::stri_opts_brkiter()}}} \item{multiline}{If \code{TRUE}, \code{$} and \code{^} match the beginning and end of each line. If \code{FALSE}, the default, only match the start and end of the input.} \item{comments}{If \code{TRUE}, white space and comments beginning with \verb{#} are ignored. Escape literal spaces with \verb{\\\\ }.} \item{dotall}{If \code{TRUE}, \code{.} will also match line terminators.} \item{type}{Boundary type to detect. \describe{ \item{\code{character}}{Every character is a boundary.} \item{\code{line_break}}{Boundaries are places where it is acceptable to have a line break in the current locale.} \item{\code{sentence}}{The beginnings and ends of sentences are boundaries, using intelligent rules to avoid counting abbreviations (\href{https://www.unicode.org/reports/tr29/#Sentence_Boundaries}{details}).} \item{\code{word}}{The beginnings and ends of words are boundaries.} }} \item{skip_word_none}{Ignore "words" that don't contain any characters or numbers - i.e. punctuation. Default \code{NA} will skip such "words" only when splitting on \code{word} boundaries.} } \value{ A stringr modifier object, i.e. a character vector with parent S3 class \code{stringr_pattern}. } \description{ Modifier functions control the meaning of the \code{pattern} argument to stringr functions: \itemize{ \item \code{boundary()}: Match boundaries between things. \item \code{coll()}: Compare strings using standard Unicode collation rules. \item \code{fixed()}: Compare literal bytes. \item \code{regex()} (the default): Uses ICU regular expressions. } } \examples{ pattern <- "a.b" strings <- c("abb", "a.b") str_detect(strings, pattern) str_detect(strings, fixed(pattern)) str_detect(strings, coll(pattern)) # coll() is useful for locale-aware case-insensitive matching i <- c("I", "\u0130", "i") i str_detect(i, fixed("i", TRUE)) str_detect(i, coll("i", TRUE)) str_detect(i, coll("i", TRUE, locale = "tr")) # Word boundaries words <- c("These are some words.") str_count(words, boundary("word")) str_split(words, " ")[[1]] str_split(words, boundary("word"))[[1]] # Regular expression variations str_extract_all("The Cat in the Hat", "[a-z]+") str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE)) str_extract_all("a\nb\nc", "^.") str_extract_all("a\nb\nc", regex("^.", multiline = TRUE)) str_extract_all("a\nb\nc", "a.") str_extract_all("a\nb\nc", regex("a.", dotall = TRUE)) } stringr/man/case.Rd0000644000176200001440000000257714520174727013743 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/case.R \name{case} \alias{case} \alias{str_to_upper} \alias{str_to_lower} \alias{str_to_title} \alias{str_to_sentence} \title{Convert string to upper case, lower case, title case, or sentence case} \usage{ str_to_upper(string, locale = "en") str_to_lower(string, locale = "en") str_to_title(string, locale = "en") str_to_sentence(string, locale = "en") } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{locale}{Locale to use for comparisons. See \code{\link[stringi:stri_locale_list]{stringi::stri_locale_list()}} for all possible options. 
Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.} } \value{ A character vector the same length as \code{string}. } \description{ \itemize{ \item \code{str_to_upper()} converts to upper case. \item \code{str_to_lower()} converts to lower case. \item \code{str_to_title()} converts to title case, where only the first letter of each word is capitalized. \item \code{str_to_sentence()} convert to sentence case, where only the first letter of sentence is capitalized. } } \examples{ dog <- "The quick brown dog" str_to_upper(dog) str_to_lower(dog) str_to_title(dog) str_to_sentence("the quick brown dog") # Locale matters! str_to_upper("i") # English str_to_upper("i", "tr") # Turkish } stringr/man/str_interp.Rd0000644000176200001440000000445714316043620015206 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/interp.R \name{str_interp} \alias{str_interp} \title{String interpolation} \usage{ str_interp(string, env = parent.frame()) } \arguments{ \item{string}{A template character string. This function is not vectorised: a character vector will be collapsed into a single string.} \item{env}{The environment in which to evaluate the expressions.} } \value{ An interpolated character string. } \description{ \ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#superseded}{\figure{lifecycle-superseded.svg}{options: alt='[Superseded]'}}}{\strong{[Superseded]}} \code{str_interp()} is superseded in favour of \code{\link[=str_glue]{str_glue()}}. String interpolation is a useful way of specifying a character string which depends on values in a certain environment. It allows for string creation which is easier to read and write when compared to using e.g. \code{\link[=paste]{paste()}} or \code{\link[=sprintf]{sprintf()}}. The (template) string can include expression placeholders of the form \verb{$\{expression\}} or \verb{$[format]\{expression\}}, where expressions are valid R expressions that can be evaluated in the given environment, and \code{format} is a format specification valid for use with \code{\link[=sprintf]{sprintf()}}. } \examples{ # Using values from the environment, and some formats user_name <- "smbache" amount <- 6.656 account <- 1337 str_interp("User ${user_name} (account $[08d]{account}) has $$[.2f]{amount}.") # Nested brace pairs work inside expressions too, and any braces can be # placed outside the expressions. str_interp("Works with } nested { braces too: $[.2f]{{{2 + 2}*{amount}}}") # Values can also come from a list str_interp( "One value, ${value1}, and then another, ${value2*2}.", list(value1 = 10, value2 = 20) ) # Or a data frame str_interp( "Values are $[.2f]{max(Sepal.Width)} and $[.2f]{min(Sepal.Width)}.", iris ) # Use a vector when the string is long: max_char <- 80 str_interp(c( "This particular line is so long that it is hard to write ", "without breaking the ${max_char}-char barrier!" )) } \seealso{ \code{\link[=str_glue]{str_glue()}} and \code{\link[=str_glue_data]{str_glue_data()}} for alternative approaches to the same problem. } \author{ Stefan Milton Bache } \keyword{internal} stringr/man/str_which.Rd0000644000176200001440000000277614524677110015021 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/subset.R \name{str_which} \alias{str_which} \title{Find matching indices} \usage{ str_which(string, pattern, negate = FALSE) } \arguments{ \item{string}{Input vector. 
Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \code{vignette("regular-expressions")}. Use \code{\link[=regex]{regex()}} for finer control of the matching behaviour. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale. Match character, word, line and sentence boundaries with \code{\link[=boundary]{boundary()}}. An empty pattern, "", is equivalent to \code{boundary("character")}.} \item{negate}{If \code{TRUE}, inverts the resulting boolean vector.} } \value{ An integer vector, usually smaller than \code{string}. } \description{ \code{str_which()} returns the indices of \code{string} where there's at least one match to \code{pattern}. It's a wrapper around \code{which(str_detect(x, pattern))}, and is equivalent to \code{grep(pattern, x)}. } \examples{ fruit <- c("apple", "banana", "pear", "pineapple") str_which(fruit, "a") # Elements that don't match str_which(fruit, "^p", negate = TRUE) # Missings never match str_which(c("a", NA, "b"), ".") } stringr/man/str_replace_na.Rd0000644000176200001440000000066214317040167015774 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/replace.R \name{str_replace_na} \alias{str_replace_na} \title{Turn NA into "NA"} \usage{ str_replace_na(string, replacement = "NA") } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{replacement}{A single string.} } \description{ Turn NA into "NA" } \examples{ str_replace_na(c(NA, "abc", "def")) } stringr/man/str_replace.Rd0000644000176200001440000000565114524701215015317 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/replace.R \name{str_replace} \alias{str_replace} \alias{str_replace_all} \title{Replace matches with new text} \usage{ str_replace(string, pattern, replacement) str_replace_all(string, pattern, replacement) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{pattern}{Pattern to look for. The default interpretation is a regular expression, as described in \link[stringi:about_search_regex]{stringi::about_search_regex}. Control options with \code{\link[=regex]{regex()}}. For \code{str_replace_all()} this can also be a named vector (\code{c(pattern1 = replacement1)}), in order to perform multiple replacements in each element of \code{string}. Match a fixed string (i.e. by comparing only bytes), using \code{\link[=fixed]{fixed()}}. This is fast, but approximate. Generally, for matching human text, you'll want \code{\link[=coll]{coll()}} which respects character matching rules for the specified locale.} \item{replacement}{The replacement value, usually a single string, but it can be the a vector the same length as \code{string} or \code{pattern}. References of the form \verb{\\1}, \verb{\\2}, etc will be replaced with the contents of the respective matched group (created by \verb{()}). Alternatively, supply a function, which will be called once for each match (from right to left) and its return value will be used to replace the match.} } \value{ A character vector the same length as \code{string}/\code{pattern}/\code{replacement}. 
} \description{ \code{str_replace()} replaces the first match; \code{str_replace_all()} replaces all matches. } \examples{ fruits <- c("one apple", "two pears", "three bananas") str_replace(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", toupper) str_replace_all(fruits, "b", NA_character_) str_replace(fruits, "([aeiou])", "") str_replace(fruits, "([aeiou])", "\\\\1\\\\1") # Note that str_replace() is vectorised along text, pattern, and replacement str_replace(fruits, "[aeiou]", c("1", "2", "3")) str_replace(fruits, c("a", "e", "i"), "-") # If you want to apply multiple patterns and replacements to the same # string, pass a named vector to pattern. fruits \%>\% str_c(collapse = "---") \%>\% str_replace_all(c("one" = "1", "two" = "2", "three" = "3")) # Use a function for more sophisticated replacement. This example # replaces colour names with their hex values. colours <- str_c("\\\\b", colors(), "\\\\b", collapse="|") col2hex <- function(col) { rgb <- col2rgb(col) rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255) } x <- c( "Roses are red, violets are blue", "My favourite colour is green" ) str_replace_all(x, colours, col2hex) } \seealso{ \code{\link[=str_replace_na]{str_replace_na()}} to turn missing values into "NA"; \code{\link[=stri_replace]{stri_replace()}} for the underlying implementation. } stringr/man/str_trunc.Rd0000644000176200001440000000165014520174727015042 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/trunc.R \name{str_trunc} \alias{str_trunc} \title{Truncate a string to maximum width} \usage{ str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...") } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{width}{Maximum width of string.} \item{side, ellipsis}{Location and content of ellipsis that indicates content has been removed.} } \value{ A character vector the same length as \code{string}. } \description{ Truncate a string to a fixed of characters, so that \code{str_length(str_trunc(x, n))} is always less than or equal to \code{n}. } \examples{ x <- "This string is moderately long" rbind( str_trunc(x, 20, "right"), str_trunc(x, 20, "left"), str_trunc(x, 20, "center") ) } \seealso{ \code{\link[=str_pad]{str_pad()}} to increase the minimum width of a string. } stringr/man/str_wrap.Rd0000644000176200001440000000303614317040167014652 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/wrap.R \name{str_wrap} \alias{str_wrap} \title{Wrap words into nicely formatted paragraphs} \usage{ str_wrap(string, width = 80, indent = 0, exdent = 0, whitespace_only = TRUE) } \arguments{ \item{string}{Input vector. Either a character vector, or something coercible to one.} \item{width}{Positive integer giving target line width (in number of characters). A width less than or equal to 1 will put each word on its own line.} \item{indent, exdent}{A non-negative integer giving the indent for the first line (\code{indent}) and all subsequent lines (\code{exdent}).} \item{whitespace_only}{A boolean. \itemize{ \item If \code{TRUE} (the default) wrapping will only occur at whitespace. \item If \code{FALSE}, can break on any non-word character (e.g. \code{/}, \code{-}). }} } \value{ A character vector the same length as \code{string}. } \description{ Wrap words into paragraphs, minimizing the "raggedness" of the lines (i.e. 
the variation in length line) using the Knuth-Plass algorithm. } \examples{ thanks_path <- file.path(R.home("doc"), "THANKS") thanks <- str_c(readLines(thanks_path), collapse = "\n") thanks <- word(thanks, 1, 3, fixed("\n\n")) cat(str_wrap(thanks), "\n") cat(str_wrap(thanks, width = 40), "\n") cat(str_wrap(thanks, width = 60, indent = 2), "\n") cat(str_wrap(thanks, width = 60, exdent = 2), "\n") cat(str_wrap(thanks, width = 0, exdent = 2), "\n") } \seealso{ \code{\link[stringi:stri_wrap]{stringi::stri_wrap()}} for the underlying implementation. } stringr/man/invert_match.Rd0000644000176200001440000000133514317267003015474 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/locate.R \name{invert_match} \alias{invert_match} \title{Switch location of matches to location of non-matches} \usage{ invert_match(loc) } \arguments{ \item{loc}{matrix of match locations, as from \code{\link[=str_locate_all]{str_locate_all()}}} } \value{ numeric match giving locations of non-matches } \description{ Invert a matrix of match locations to match the opposite of what was previously matched. } \examples{ numbers <- "1 and 2 and 4 and 456" num_loc <- str_locate_all(numbers, "[0-9]+")[[1]] str_sub(numbers, num_loc[, "start"], num_loc[, "end"]) text_loc <- invert_match(num_loc) str_sub(numbers, text_loc[, "start"], text_loc[, "end"]) } stringr/DESCRIPTION0000644000176200001440000000256714524777112013474 0ustar liggesusersPackage: stringr Title: Simple, Consistent Wrappers for Common String Operations Version: 1.5.1 Authors@R: c( person("Hadley", "Wickham", , "hadley@posit.co", role = c("aut", "cre", "cph")), person("Posit Software, PBC", role = c("cph", "fnd")) ) Description: A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another. License: MIT + file LICENSE URL: https://stringr.tidyverse.org, https://github.com/tidyverse/stringr BugReports: https://github.com/tidyverse/stringr/issues Depends: R (>= 3.6) Imports: cli, glue (>= 1.6.1), lifecycle (>= 1.0.3), magrittr, rlang (>= 1.0.0), stringi (>= 1.5.3), vctrs (>= 0.4.0) Suggests: covr, dplyr, gt, htmltools, htmlwidgets, knitr, rmarkdown, testthat (>= 3.0.0), tibble VignetteBuilder: knitr Config/Needs/website: tidyverse/tidytemplate Config/testthat/edition: 3 Encoding: UTF-8 LazyData: true RoxygenNote: 7.2.3 NeedsCompilation: no Packaged: 2023-11-14 15:03:52 UTC; hadleywickham Author: Hadley Wickham [aut, cre, cph], Posit Software, PBC [cph, fnd] Maintainer: Hadley Wickham Repository: CRAN Date/Publication: 2023-11-14 23:10:02 UTC stringr/build/0000755000176200001440000000000014524706130013043 5ustar liggesusersstringr/build/vignette.rds0000644000176200001440000000042514524706130015403 0ustar liggesusersN0 ? 
14!8֧*!.5ި&ipš3:Q_BPQ ˆт„6Hho7e!{RѶWYb'fd9K=bv$Jg$Л?ﮮX}3٫_{6ٛљ&n@GY!Z`8a1 ҝy6.W]'zσA%ho6o/ M|+e#/N TAO/}]Zstringr/tests/0000755000176200001440000000000012435640121013102 5ustar liggesusersstringr/tests/testthat/0000755000176200001440000000000014524777112014756 5ustar liggesusersstringr/tests/testthat/test-match.R0000644000176200001440000000504414317040167017146 0ustar liggesusersset.seed(1410) num <- matrix(sample(9, 10 * 10, replace = T), ncol = 10) num_flat <- apply(num, 1, str_c, collapse = "") phones <- str_c( "(", num[, 1], num[, 2], num[, 3], ") ", num[, 4], num[, 5], num[, 6], " ", num[, 7], num[, 8], num[, 9], num[, 10]) test_that("empty strings return correct matrix of correct size", { skip_if_not_installed("stringi", "1.2.2") expect_equal(str_match(NA, "(a)"), matrix(NA_character_, 1, 2)) expect_equal(str_match(character(), "(a)"), matrix(character(), 0, 2)) }) test_that("no matching cases returns 1 column matrix", { res <- str_match(c("a", "b"), ".") expect_equal(nrow(res), 2) expect_equal(ncol(res), 1) expect_equal(res[, 1], c("a", "b")) }) test_that("single match works when all match", { matches <- str_match(phones, "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") expect_equal(nrow(matches), length(phones)) expect_equal(ncol(matches), 4) expect_equal(matches[, 1], phones) matches_flat <- apply(matches[, -1], 1, str_c, collapse = "") expect_equal(matches_flat, num_flat) }) test_that("match returns NA when some inputs don't match", { matches <- str_match(c(phones, "blah", NA), "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") expect_equal(nrow(matches), length(phones) + 2) expect_equal(ncol(matches), 4) expect_equal(matches[11, ], rep(NA_character_, 4)) expect_equal(matches[12, ], rep(NA_character_, 4)) }) test_that("match returns NA when optional group doesn't match", { expect_equal(str_match(c("ab", "a"), "(a)(b)?")[, 3], c("b", NA)) }) test_that("match_all returns NA when option group doesn't match", { expect_equal(str_match_all("a", "(a)(b)?")[[1]][1, ], c("a", "a", NA)) }) test_that("multiple match works", { phones_one <- str_c(phones, collapse = " ") multi_match <- str_match_all(phones_one, "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") single_matches <- str_match(phones, "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})") expect_equal(multi_match[[1]], single_matches) }) test_that("match and match_all fail when pattern is not a regex", { expect_error(str_match(phones, fixed("3"))) expect_error(str_match_all(phones, coll("9"))) }) test_that("uses tidyverse recycling rules", { expect_error( str_match(c("a", "b"), c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) expect_error( str_match_all(c("a", "b"), c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) }) test_that("match can't use other modifiers", { expect_snapshot(error = TRUE, { str_match("x", coll("y")) str_match_all("x", coll("y")) }) }) stringr/tests/testthat/test-escape.R0000644000176200001440000000026414316043620017305 0ustar liggesuserstest_that("multiplication works", { expect_equal( str_escape(".^$|*+?{}[]()"), "\\.\\^\\$\\|\\*\\+\\?\\{\\}\\[\\]\\(\\)" ) expect_equal(str_escape("\\"), "\\\\") }) stringr/tests/testthat/test-wrap.R0000644000176200001440000000107014317040167017016 0ustar liggesuserstest_that("wrapping removes spaces", { expect_equal(str_wrap(""), "") expect_equal(str_wrap(" "), "") expect_equal(str_wrap(" a "), "a") }) test_that("wrapping with width of 0 puts each word on own line", { n_returns <- letters %>% str_c(collapse = " ") %>% str_wrap(0) %>% str_count("\n") expect_equal(n_returns, length(letters) - 
1) }) test_that("wrapping at whitespace break works", { expect_equal(str_wrap("a/b", width = 0, whitespace_only = TRUE), "a/b") expect_equal(str_wrap("a/b", width = 0, whitespace_only = FALSE), "a/\nb") }) stringr/tests/testthat/test-flatten.R0000644000176200001440000000174514520174727017521 0ustar liggesuserstest_that("equivalent to paste with collapse", { expect_equal(str_flatten(letters), paste0(letters, collapse = "")) }) test_that("collapse must be single string", { expect_error(str_flatten("A", c("a", "b")), "single string") }) test_that("last optionally used instead of final separator", { expect_equal(str_flatten(letters[1:3], ", ", ", and "), "a, b, and c") expect_equal(str_flatten(letters[1:2], ", ", ", and "), "a, and b") expect_equal(str_flatten(letters[1], ", ", ", and "), "a") }) test_that("can remove missing values", { expect_equal(str_flatten(c("a", NA)), NA_character_) expect_equal(str_flatten(c("a", NA), na.rm = TRUE), "a") }) test_that("str_flatten_oxford removes comma iif necessary", { expect_equal(str_flatten_comma(letters[1:2], ", or "), "a or b") expect_equal(str_flatten_comma(letters[1:3], ", or "), "a, b, or c") expect_equal(str_flatten_comma(letters[1:3], " or "), "a, b or c") expect_equal(str_flatten_comma(letters[1:3]), "a, b, c") }) stringr/tests/testthat/test-sort.R0000644000176200001440000000134714316110505017034 0ustar liggesuserstest_that("digits can be sorted/ordered as strings or numbers", { x <- c("2", "1", "10") expect_equal(str_sort(x, numeric = FALSE), c("1", "10", "2")) expect_equal(str_sort(x, numeric = TRUE), c("1", "2", "10")) expect_equal(str_order(x, numeric = FALSE), c(2, 3, 1)) expect_equal(str_order(x, numeric = TRUE), c(2, 1, 3)) expect_equal(str_rank(x, numeric = FALSE), c(3, 1, 2)) expect_equal(str_rank(x, numeric = TRUE), c(2, 1, 3)) }) test_that("NA can be at beginning or end", { x <- c("2", "1", NA, "10") na_end <- str_sort(x, numeric = TRUE, na_last = TRUE) expect_equal(tail(na_end, 1), NA_character_) na_start <- str_sort(x, numeric = TRUE, na_last = FALSE) expect_equal(head(na_start, 1), NA_character_) }) stringr/tests/testthat/test-count.R0000644000176200001440000000144014520174727017204 0ustar liggesuserstest_that("counts are as expected", { fruit <- c("apple", "banana", "pear", "pineapple") expect_equal(str_count(fruit, "a"), c(1, 3, 1, 1)) expect_equal(str_count(fruit, "p"), c(2, 0, 1, 3)) expect_equal(str_count(fruit, "e"), c(1, 0, 1, 2)) expect_equal(str_count(fruit, c("a", "b", "p", "n")), c(1, 1, 1, 1)) }) test_that("uses tidyverse recycling rules", { expect_error(str_count(1:2, 1:3), class = "vctrs_error_incompatible_size") }) test_that("can use fixed() and coll()", { expect_equal(str_count("x.", fixed(".")), 1) expect_equal(str_count("\u0131", turkish_I()), 1) }) test_that("can count boundaries", { # str_count(x, boundary()) == lengths(str_split(x, boundary())) expect_equal(str_count("a b c", ""), 5) expect_equal(str_count("a b c", boundary("word")), 3) }) stringr/tests/testthat/test-c.R0000644000176200001440000000122214317040167016266 0ustar liggesuserstest_that("basic case works", { test <- c("a", "b", "c") expect_equal(str_c(test), test) expect_equal(str_c(test, sep = " "), test) expect_equal(str_c(test, collapse = ""), "abc") }) test_that("obeys tidyverse recycling rules", { expect_equal(str_c(), character()) expect_equal(str_c("x", character()), character()) expect_equal(str_c("x", NULL), "x") expect_snapshot(str_c(c("x", "y"), character()), error = TRUE) expect_equal(str_c(c("x", "y"), NULL), c("x", "y")) }) 
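# Notes on the recycling rules exercised above: only length-1 inputs are
# recycled (e.g. str_c(c("x", "y"), "!") returns c("x!", "y!")), vectors of
# differing non-unit lengths are an error, and NULL arguments are dropped.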
test_that("vectorised arguments error", { expect_snapshot(error = TRUE, { str_c(letters, sep = c("a", "b")) str_c(letters, collapse = c("a", "b")) }) }) stringr/tests/testthat/test-glue.R0000644000176200001440000000032614316043620017000 0ustar liggesuserstest_that("verify wrapper is functional", { expect_equal(as.character(str_glue("a {b}", b = "b")), "a b") df <- data.frame(b = "b") expect_equal(as.character(str_glue_data(df, "a {b}", b = "b")), "a b") }) stringr/tests/testthat/test-sub.R0000644000176200001440000000474014317040167016645 0ustar liggesuserstest_that("correct substring extracted", { alphabet <- str_c(letters, collapse = "") expect_equal(str_sub(alphabet, 1, 3), "abc") expect_equal(str_sub(alphabet, 24, 26), "xyz") }) test_that("can extract multiple substrings", { expect_equal( str_sub_all(c("abc", "def"), list(c(1, 2), 1), list(c(1, 2), 2)), list(c("a", "b"), "de") ) }) test_that("arguments expanded to longest", { alphabet <- str_c(letters, collapse = "") expect_equal( str_sub(alphabet, c(1, 24), c(3, 26)), c("abc", "xyz") ) expect_equal( str_sub(c("abc", "xyz"), 2, 2), c("b", "y") ) }) test_that("can supply start and end/length as a matrix", { x <- c("abc", "def") expect_equal(str_sub(x, cbind(1, end = 1)), c("a", "d")) expect_equal(str_sub(x, cbind(1, length = 2)), c("ab", "de")) expect_equal( str_sub_all(x, cbind(c(1, 2), end = c(2, 3))), list(c("ab", "bc"), c("de", "ef")) ) str_sub(x, cbind(1, end = 1)) <- c("A", "D") expect_equal(x, c("Abc", "Def")) }) test_that("specifying only end subsets from start", { alphabet <- str_c(letters, collapse = "") expect_equal(str_sub(alphabet, end = 3), "abc") }) test_that("specifying only start subsets to end", { alphabet <- str_c(letters, collapse = "") expect_equal(str_sub(alphabet, 24), "xyz") }) test_that("specifying -1 as end selects entire string", { expect_equal( str_sub("ABCDEF", c(4, 5), c(5, -1)), c("DE", "EF") ) expect_equal( str_sub("ABCDEF", c(4, 5), c(-1, -1)), c("DEF", "EF") ) }) test_that("negative values select from end", { expect_equal(str_sub("ABCDEF", 1, -4), "ABC") expect_equal(str_sub("ABCDEF", -3), "DEF") }) test_that("missing arguments give missing results", { expect_equal(str_sub(NA), NA_character_) expect_equal(str_sub(NA, 1, 3), NA_character_) expect_equal(str_sub(c(NA, "NA"), 1, 3), c(NA, "NA")) expect_equal(str_sub("test", NA, NA), NA_character_) expect_equal(str_sub(c(NA, "test"), NA, NA), rep(NA_character_, 2)) }) test_that("replacement works", { x <- "BBCDEF" str_sub(x, 1, 1) <- "A" expect_equal(x, "ABCDEF") str_sub(x, -1, -1) <- "K" expect_equal(x, "ABCDEK") str_sub(x, -2, -1) <- "EFGH" expect_equal(x, "ABCDEFGH") str_sub(x, 2, -2) <- "" expect_equal(x, "AH") }) test_that("replacement with NA works", { x <- "BBCDEF" str_sub(x, NA) <- "A" expect_equal(x, NA_character_) x <- "BBCDEF" str_sub(x, NA, omit_na = TRUE) <- "A" str_sub(x, 1, 1, omit_na = TRUE) <- NA expect_equal(x, "BBCDEF") }) stringr/tests/testthat/test-view.R0000644000176200001440000000355314524677110017033 0ustar liggesuserstest_that("results are truncated", { expect_snapshot(str_view(words)) # and can control with option local_options(stringr.view_n = 5) expect_snapshot(str_view(words)) }) test_that("indices come from original vector", { expect_snapshot(str_view(letters, "a|z", match = TRUE)) }) test_that("view highlights all matches", { x <- c("abc", "def", "fgh") expect_snapshot({ str_view(x, "[aeiou]") str_view(x, "d|e") }) }) test_that("view highlights whitespace (except a space/nl)", { x <- c(" ", "\u00A0", "\n", "\t") 
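# x holds a plain space, a non-breaking space, a newline, and a tab; the
# snapshot below should mark the unusual whitespace (rendered as {\u00a0} and
# {\t}) while the ordinary space and the newline are shown as-is.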
expect_snapshot({ str_view(x) "or can instead use escapes" str_view(x, use_escapes = TRUE) }) }) test_that("view displays nothing for empty vectors",{ expect_snapshot(str_view(character())) }) test_that("match argument controls what is shown", { x <- c("abc", "def", "fgh", NA) out <- str_view(x, "d|e", match = NA) expect_length(out, 4) out <- str_view(x, "d|e", match = TRUE) expect_length(out, 1) out <- str_view(x, "d|e", match = FALSE) expect_length(out, 3) }) test_that("can match across lines", { local_reproducible_output(crayon = TRUE) expect_snapshot(str_view("a\nb\nbbb\nc", "(b|\n)+")) }) test_that("vectorised over pattern", { x <- str_view("a", c("a", "b"), match = NA) expect_equal(length(x), 2) }) test_that("[ preserves class", { x <- str_view(letters) expect_s3_class(x[], "stringr_view") }) test_that("str_view_all() is deprecated", { expect_snapshot(str_view_all("abc", "a|b")) }) test_that("html mode continues to work", { skip_if_not_installed("htmltools") skip_if_not_installed("htmlwidgets") x <- c("abc", "def", "fgh") expect_snapshot({ str_view(x, "[aeiou]", html = TRUE)$x$html str_view(x, "d|e", html = TRUE)$x$html }) # can use escapes x <- c(" ", "\u00A0", "\n") expect_snapshot({ str_view(x, html = TRUE, use_escapes = TRUE)$x$html }) }) stringr/tests/testthat/test-locate.R0000644000176200001440000000377314520174727017336 0ustar liggesuserstest_that("basic location matching works", { expect_equal(str_locate("abc", "a")[1, ], c(start = 1, end = 1)) expect_equal(str_locate("abc", "b")[1, ], c(start = 2, end = 2)) expect_equal(str_locate("abc", "c")[1, ], c(start = 3, end = 3)) expect_equal(str_locate("abc", ".+")[1, ], c(start = 1, end = 3)) }) test_that("uses tidyverse recycling rules", { expect_error(str_locate(1:2, 1:3), class = "vctrs_error_incompatible_size") expect_error(str_locate_all(1:2, 1:3), class = "vctrs_error_incompatible_size") }) test_that("locations are integers", { strings <- c("a b c", "d e f") expect_true(is.integer(str_locate(strings, "[a-z]"))) res <- str_locate_all(strings, "[a-z]")[[1]] expect_true(is.integer(res)) expect_true(is.integer(invert_match(res))) }) test_that("both string and patterns are vectorised", { strings <- c("abc", "def") locs <- str_locate(strings, "a") expect_equal(locs[, "start"], c(1, NA)) locs <- str_locate(strings, c("a", "d")) expect_equal(locs[, "start"], c(1, 1)) expect_equal(locs[, "end"], c(1, 1)) locs <- str_locate_all(c("abab"), c("a", "b")) expect_equal(locs[[1]][, "start"], c(1, 3)) expect_equal(locs[[2]][, "start"], c(2, 4)) }) test_that("can use fixed() and coll()", { expect_equal(str_locate("x.x", fixed(".")), cbind(start = 2, end = 2)) expect_equal( str_locate_all("x.x.", fixed(".")), list(cbind(start = c(2, 4), end = c(2, 4))) ) expect_equal(str_locate("\u0131", turkish_I()), cbind(start = 1, end = 1)) expect_equal( str_locate_all("\u0131I", turkish_I()), list(cbind(start = 1:2, end = 1:2)) ) }) test_that("can use boundaries", { expect_equal( str_locate(" x y", ""), cbind(start = 1, end = 1) ) expect_equal( str_locate_all("abc", ""), list(cbind(start = 1:3, end = 1:3)) ) expect_equal( str_locate(" xy", boundary("word")), cbind(start = 2, end = 3) ) expect_equal( str_locate_all(" ab cd", boundary("word")), list(cbind(start = c(2, 6), end = c(3, 7))) ) }) stringr/tests/testthat/test-case.R0000644000176200001440000000100614316043620016753 0ustar liggesuserstest_that("to_upper and to_lower have equivalent base versions", { x <- "This is a sentence." 
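# For plain ASCII input like this, str_to_upper()/str_to_lower() are expected
# to agree with base toupper()/tolower(); locale-sensitive behaviour (e.g. the
# Turkish dotted/dotless i) is covered by the locale examples in the docs.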
expect_identical(str_to_upper(x), toupper(x)) expect_identical(str_to_lower(x), tolower(x)) }) test_that("to_title creates one capital letter per word", { x <- "This is a sentence." expect_equal(str_count(x, "\\W+"), str_count(str_to_title(x), "[[:upper:]]")) }) test_that("to_sentence capitalizes just the first letter", { x <- "This is a sentence." expect_identical(str_to_sentence("a Test"), "A test") }) stringr/tests/testthat/test-split.R0000644000176200001440000000506714520174727017220 0ustar liggesuserstest_that("special cases are correct", { expect_equal(str_split(NA, "")[[1]], NA_character_) expect_equal(str_split(character(), ""), list()) }) test_that("str_split functions as expected", { expect_equal( str_split(c("bab", "cac", "dadad"), "a"), list(c("b", "b"), c("c", "c"), c("d", "d", "d")) ) }) test_that("str_split() can split by special patterns", { expect_equal(str_split("ab", ""), list(c("a", "b"))) expect_equal(str_split("this that.", boundary("word")), list(c("this", "that"))) expect_equal(str_split("a-b", fixed("-")), list(c("a", "b"))) expect_equal(str_split("aXb", coll("X", ignore_case = TRUE)), list(c("a", "b"))) }) test_that("boundary() can be recycled", { expect_equal(str_split(c("x", "y"), boundary()), list("x", "y")) }) test_that("str_split() can control maximum number of splits", { expect_equal( str_split(c("a", "a-b"), n = 1, "-"), list("a", "a-b") ) expect_equal( str_split(c("a", "a-b"), n = 3, "-"), list("a", c("a", "b")) ) }) test_that("str_split() checks its inputs", { expect_snapshot(error = TRUE, { str_split(letters[1:3], letters[1:2]) str_split("x", 1) str_split("x", "x", n = 0) }) }) test_that("str_split_1 takes string and returns character vector", { expect_equal(str_split_1("abc", ""), c("a", "b", "c")) expect_snapshot_error(str_split_1(letters, "")) }) test_that("str_split_fixed pads with empty string", { expect_equal( str_split_fixed(c("a", "a-b"), "-", 1), cbind(c("a", "a-b"))) expect_equal( str_split_fixed(c("a", "a-b"), "-", 2), cbind(c("a", "a"), c("", "b")) ) expect_equal( str_split_fixed(c("a", "a-b"), "-", 3), cbind(c("a", "a"), c("", "b"), c("", "")) ) }) test_that("str_split_fixed check its inputs", { expect_snapshot(str_split_fixed("x", "x", 0), error = TRUE) }) # str_split_i ------------------------------------------------------------- test_that("str_split_i can extract from LHS or RHS", { expect_equal(str_split_i(c("1-2-3", "4-5"), "-", 1), c("1", "4")) expect_equal(str_split_i(c("1-2-3", "4-5"), "-", -1), c("3", "5")) }) test_that("str_split_i returns NA for absent components", { expect_equal(str_split_i(c("a", "b-c"), "-", 2), c(NA, "c")) expect_equal(str_split_i(c("a", "b-c"), "-", 3), c(NA_character_, NA)) expect_equal(str_split_i(c("1-2-3", "4-5"), "-", -3), c("1", NA)) expect_equal(str_split_i(c("1-2-3", "4-5"), "-", -4), c(NA_character_, NA)) }) test_that("str_split_i check its inputs", { expect_snapshot(error = TRUE, { str_split_i("x", "x", 0) str_split_i("x", "x", 0.5) }) }) stringr/tests/testthat/test-trunc.R0000644000176200001440000000422314524677110017207 0ustar liggesuserstest_that("NA values in input pass through unchanged", { expect_equal( str_trunc(NA_character_, width = 5), NA_character_ ) expect_equal( str_trunc(c("foobar", NA), 5), c("fo...", NA) ) }) test_that("truncations work for all elements of a vector", { expect_equal( str_trunc(c("abcd", "abcde", "abcdef"), width = 5), c("abcd", "abcde", "ab...") ) }) test_that("truncations work for all sides", { trunc <- function(direction, width) str_trunc( "This string is moderately 
long", direction, width = width ) expect_equal(trunc("right", 20), "This string is mo...") expect_equal(trunc("left", 20), "...s moderately long") expect_equal(trunc("center", 20), "This stri...ely long") expect_equal(trunc("right", 3), "...") expect_equal(trunc("left", 3), "...") expect_equal(trunc("center", 3), "...") expect_equal(trunc("right", 4), "T...") expect_equal(trunc("left", 4), "...g") expect_equal(trunc("center", 4), "T...") expect_equal(trunc("right", 5), "Th...") expect_equal(trunc("left", 5), "...ng") expect_equal(trunc("center", 5), "T...g") }) test_that("does not truncate to a length shorter than elipsis", { expect_snapshot(error = TRUE, { str_trunc("foobar", 2) str_trunc("foobar", 3, ellipsis = "....") }) }) test_that("str_trunc correctly snips rhs-of-ellipsis for truncated strings", { trunc <- function(width, side) { str_trunc(c("", "a", "aa", "aaa", "aaaa", "aaaaaaa"), width, side, ellipsis = "..") } expect_equal(trunc(4, "right"), c("", "a", "aa", "aaa", "aaaa", "aa..")) expect_equal(trunc(4, "left"), c("", "a", "aa", "aaa", "aaaa", "..aa")) expect_equal(trunc(4, "center"), c("", "a", "aa", "aaa", "aaaa", "a..a")) expect_equal(trunc(3, "right"), c("", "a", "aa", "aaa", "a..", "a..")) expect_equal(trunc(3, "left"), c("", "a", "aa", "aaa", "..a", "..a")) expect_equal(trunc(3, "center"), c("", "a", "aa", "aaa", "a..", "a..")) expect_equal(trunc(2, "right"), c("", "a", "aa", "..", "..", "..")) expect_equal(trunc(2, "left"), c("", "a", "aa", "..", "..", "..")) expect_equal(trunc(2, "center"), c("", "a", "aa", "..", "..", "..")) }) stringr/tests/testthat/test-length.R0000644000176200001440000000123414317040167017330 0ustar liggesuserstest_that("str_length is number of characters", { expect_equal(str_length("a"), 1) expect_equal(str_length("ab"), 2) expect_equal(str_length("abc"), 3) }) test_that("str_length of missing string is missing", { expect_equal(str_length(NA), NA_integer_) expect_equal(str_length(c(NA, 1)), c(NA, 1)) expect_equal(str_length("NA"), 2) }) test_that("str_length of factor is length of level", { expect_equal(str_length(factor("a")), 1) expect_equal(str_length(factor("ab")), 2) expect_equal(str_length(factor("abc")), 3) }) test_that("str_width returns display width", { x <- c("\u0308", "x", "\U0001f60a") expect_equal(str_width(x), c(0, 1, 2)) }) stringr/tests/testthat/test-unique.R0000644000176200001440000000042614520174727017365 0ustar liggesuserstest_that("unique values returned for strings with duplicate values", { expect_equal(str_unique(c("a", "a", "a")), "a") expect_equal(str_unique(c(NA, NA)), NA_character_) }) test_that("can ignore case", { expect_equal(str_unique(c("a", "A"), ignore_case = TRUE), "a") }) stringr/tests/testthat/test-modifiers.R0000644000176200001440000000152614520174727020042 0ustar liggesuserstest_that("patterns coerced to character", { x <- factor("a") expect_snapshot({ . <- regex(x) . <- coll(x) . 
<- fixed(x) }) }) test_that("useful error message for bad type", { expect_snapshot(error = TRUE, { type(1:3) }) }) test_that("fallback for regex (#433)", { expect_equal(type(structure("x", class = "regex")), "regex") }) test_that("ignore_case sets strength, but can override manually", { x1 <- coll("x", strength = 1) x2 <- coll("x", ignore_case = TRUE) x3 <- coll("x") expect_equal(attr(x1, "options")$strength, 1) expect_equal(attr(x2, "options")$strength, 2) expect_equal(attr(x3, "options")$strength, 3) }) test_that("boundary has length 1", { expect_length(boundary(), 1) }) test_that("subsetting preserves class and options", { x <- regex("a", multiline = TRUE) expect_equal(x[], x) }) stringr/tests/testthat/_snaps/0000755000176200001440000000000014524677110016236 5ustar liggesusersstringr/tests/testthat/_snaps/modifiers.md0000644000176200001440000000105414520174727020543 0ustar liggesusers# patterns coerced to character Code . <- regex(x) Condition Warning in `regex()`: Coercing `pattern` to a plain character vector. Code . <- coll(x) Condition Warning in `coll()`: Coercing `pattern` to a plain character vector. Code . <- fixed(x) Condition Warning in `fixed()`: Coercing `pattern` to a plain character vector. # useful error message for bad type Code type(1:3) Condition Error: ! `pattern` must be a string, not an integer vector. stringr/tests/testthat/_snaps/detect.md0000644000176200001440000000232514520174727020034 0ustar liggesusers# can't empty/boundary Code str_detect("x", "") Condition Error in `str_detect()`: ! `pattern` can't be the empty string (`""`). Code str_starts("x", "") Condition Error in `str_starts()`: ! `pattern` can't be the empty string (`""`). Code str_ends("x", "") Condition Error in `str_ends()`: ! `pattern` can't be the empty string (`""`). # functions use tidyverse recycling rules Code str_detect(1:2, 1:3) Condition Error in `str_detect()`: ! Can't recycle `string` (size 2) to match `pattern` (size 3). Code str_starts(1:2, 1:3) Condition Error in `str_starts()`: ! Can't recycle `string` (size 2) to match `pattern` (size 3). Code str_ends(1:2, 1:3) Condition Error in `str_ends()`: ! Can't recycle `string` (size 2) to match `pattern` (size 3). Code str_like(1:2, c("a", "b", "c")) Condition Error in `str_like()`: ! Can't recycle `string` (size 2) to match `pattern` (size 3). # str_like works Code str_like("abc", regex("x")) Condition Error in `str_like()`: ! `pattern` must be a plain string, not a stringr modifier. stringr/tests/testthat/_snaps/trunc.md0000644000176200001440000000052614520174727017720 0ustar liggesusers# does not truncate to a length shorter than elipsis Code str_trunc("foobar", 2) Condition Error in `str_trunc()`: ! `width` (2) is shorter than `ellipsis` (3). Code str_trunc("foobar", 3, ellipsis = "....") Condition Error in `str_trunc()`: ! `width` (3) is shorter than `ellipsis` (4). stringr/tests/testthat/_snaps/split.md0000644000176200001440000000206614520174727017721 0ustar liggesusers# str_split() checks its inputs Code str_split(letters[1:3], letters[1:2]) Condition Error in `str_split()`: ! Can't recycle `string` (size 3) to match `pattern` (size 2). Code str_split("x", 1) Condition Error in `str_split()`: ! `pattern` must be a string, not a number. Code str_split("x", "x", n = 0) Condition Error in `str_split()`: ! `n` must be a number larger than 1, not the number 0. # str_split_1 takes string and returns character vector `string` must be a single string, not a character vector. 
# str_split_fixed check its inputs Code str_split_fixed("x", "x", 0) Condition Error in `str_split_fixed()`: ! `n` must be a number larger than 1, not the number 0. # str_split_i check its inputs Code str_split_i("x", "x", 0) Condition Error in `str_split_i()`: ! `i` must not be 0. Code str_split_i("x", "x", 0.5) Condition Error in `str_split_i()`: ! `i` must be a whole number, not the number 0.5. stringr/tests/testthat/_snaps/subset.md0000644000176200001440000000046014520174727020067 0ustar liggesusers# can't use boundaries Code str_subset(c("a", "b c"), "") Condition Error in `str_subset()`: ! `pattern` can't be the empty string (`""`). Code str_subset(c("a", "b c"), boundary()) Condition Error in `str_subset()`: ! `pattern` can't be a boundary. stringr/tests/testthat/_snaps/c.md0000644000176200001440000000103414325503444016775 0ustar liggesusers# obeys tidyverse recycling rules Code str_c(c("x", "y"), character()) Condition Error in `str_c()`: ! Can't recycle `..1` (size 2) to match `..2` (size 0). # vectorised arguments error Code str_c(letters, sep = c("a", "b")) Condition Error in `str_c()`: ! `sep` must be a single string, not a character vector. Code str_c(letters, collapse = c("a", "b")) Condition Error in `str_c()`: ! `collapse` must be a single string or `NULL`, not a character vector. stringr/tests/testthat/_snaps/replace.md0000644000176200001440000000176514520175762020206 0ustar liggesusers# replacement must be a string Code str_replace("x", "x", 1) Condition Error in `str_replace()`: ! `replacement` must be a character vector, not the number 1. # can't replace empty/boundary Code str_replace("x", "", "") Condition Error in `str_replace()`: ! `pattern` can't be the empty string (`""`). Code str_replace("x", boundary("word"), "") Condition Error in `str_replace()`: ! `pattern` can't be a boundary. Code str_replace_all("x", "", "") Condition Error in `str_replace_all()`: ! `pattern` can't be empty. Code str_replace_all("x", boundary("word"), "") Condition Error in `str_replace_all()`: ! `pattern` can't be a boundary. # backrefs are correctly translated Code str_replace_all("abcde", "(b)(c)(d)", "\\4") Condition Error in `stri_replace_all_regex()`: ! Trying to access the index that is out of bounds. (U_INDEX_OUTOFBOUNDS_ERROR) stringr/tests/testthat/_snaps/view.md0000644000176200001440000000462014524677110017534 0ustar liggesusers# results are truncated Code str_view(words) Output [1] | a [2] | able [3] | about [4] | absolute [5] | accept [6] | account [7] | achieve [8] | across [9] | act [10] | active [11] | actual [12] | add [13] | address [14] | admit [15] | advertise [16] | affect [17] | afford [18] | after [19] | afternoon [20] | again ... and 960 more --- Code str_view(words) Output [1] | a [2] | able [3] | about [4] | absolute [5] | accept ... 
and 975 more # indices come from original vector Code str_view(letters, "a|z", match = TRUE) Output [1] | [26] | # view highlights all matches Code str_view(x, "[aeiou]") Output [1] | bc [2] | df Code str_view(x, "d|e") Output [2] | f # view highlights whitespace (except a space/nl) Code str_view(x) Output [1] | [2] | {\u00a0} [3] | | [4] | {\t} Code # or can instead use escapes str_view(x, use_escapes = TRUE) Output [1] | [2] | \u00a0 [3] | \n [4] | \t # view displays nothing for empty vectors Code str_view(character()) # can match across lines Code str_view("a\nb\nbbb\nc", "(b|\n)+") Output [1] | a< | b | bbb | >c # str_view_all() is deprecated Code str_view_all("abc", "a|b") Condition Warning: `str_view_all()` was deprecated in stringr 1.5.0. i Please use `str_view()` instead. Output [1] | c # html mode continues to work Code str_view(x, "[aeiou]", html = TRUE)$x$html Output
  • abc
  • def
Code str_view(x, "d|e", html = TRUE)$x$html Output
  • def
--- Code str_view(x, html = TRUE, use_escapes = TRUE)$x$html Output
  •  
  • \u00a0
  • \n
stringr/tests/testthat/_snaps/match.md0000644000176200001440000000046614325503445017660 0ustar liggesusers# match can't use other modifiers Code str_match("x", coll("y")) Condition Error in `str_match()`: ! `pattern` must be a regular expression. Code str_match_all("x", coll("y")) Condition Error in `str_match_all()`: ! `pattern` must be a regular expression. stringr/tests/testthat/test-detect.R0000644000176200001440000000414114520174727017325 0ustar liggesuserstest_that("special cases are correct", { expect_equal(str_detect(NA, "x"), NA) expect_equal(str_detect(character(), "x"), logical()) }) test_that("vectorised patterns work", { expect_equal(str_detect("ab", c("a", "b", "c")), c(T, T, F)) expect_equal(str_detect(c("ca", "ab"), c("a", "c")), c(T, F)) # negation works expect_equal(str_detect("ab", c("a", "b", "c"), negate = TRUE), c(F, F, T)) }) test_that("str_starts() and str_ends() match expected strings", { expect_equal(str_starts(c("ab", "ba"), "a"), c(TRUE, FALSE)) expect_equal(str_ends(c("ab", "ba"), "a"), c(FALSE, TRUE)) # negation expect_equal(str_starts(c("ab", "ba"), "a", negate = TRUE), c(FALSE, TRUE)) expect_equal(str_ends(c("ab", "ba"), "a", negate = TRUE), c(TRUE, FALSE)) # correct precedence expect_equal(str_starts(c("ab", "ba", "cb"), "a|b"), c(TRUE, TRUE, FALSE)) expect_equal(str_ends(c("ab", "ba", "bc"), "a|b"), c(TRUE, TRUE, FALSE)) }) test_that("can use fixed() and coll()", { expect_equal(str_detect("X", fixed(".")), FALSE) expect_equal(str_starts("X", fixed(".")), FALSE) expect_equal(str_ends("X", fixed(".")), FALSE) expect_equal(str_detect("\u0131", turkish_I()), TRUE) expect_equal(str_starts("\u0131", turkish_I()), TRUE) expect_equal(str_ends("\u0131", turkish_I()), TRUE) }) test_that("can't empty/boundary", { expect_snapshot(error = TRUE, { str_detect("x", "") str_starts("x", "") str_ends("x", "") }) }) test_that("functions use tidyverse recycling rules", { expect_snapshot(error = TRUE, { str_detect(1:2, 1:3) str_starts(1:2, 1:3) str_ends(1:2, 1:3) str_like(1:2, c("a", "b", "c")) }) }) # str_like ---------------------------------------------------------------- test_that("str_like works", { expect_true(str_like("abc", "ab%")) expect_snapshot(str_like("abc", regex("x")), error = TRUE) }) test_that("like_to_regex generates expected regexps",{ expect_equal(like_to_regex("ab%"), "^ab.*$") expect_equal(like_to_regex("ab_"), "^ab.$") # escaping expect_equal(like_to_regex("ab\\%"), "^ab\\%$") expect_equal(like_to_regex("ab[%]"), "^ab[%]$") }) stringr/tests/testthat/test-trim.R0000644000176200001440000000132214317040167017020 0ustar liggesuserstest_that("trimming removes spaces", { expect_equal(str_trim("abc "), "abc") expect_equal(str_trim(" abc"), "abc") expect_equal(str_trim(" abc "), "abc") }) test_that("trimming removes tabs", { expect_equal(str_trim("abc\t"), "abc") expect_equal(str_trim("\tabc"), "abc") expect_equal(str_trim("\tabc\t"), "abc") }) test_that("side argument restricts trimming", { expect_equal(str_trim(" abc ", "left"), "abc ") expect_equal(str_trim(" abc ", "right"), " abc") }) test_that("str_squish removes excess spaces from all parts of string", { expect_equal(str_squish("ab\t\tc\t"), "ab c") expect_equal(str_squish("\ta bc"), "a bc") expect_equal(str_squish("\ta\t bc\t"), "a bc") }) stringr/tests/testthat/test-word.R0000644000176200001440000000104114317040167017016 0ustar liggesuserstest_that("word extraction", { expect_equal("walk", word("walk the moon")) expect_equal("walk", word("walk the moon", 1)) expect_equal("moon", word("walk the moon", 3)) 
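# word() counts words from 1 using a single space as the default separator, so
# word 1 of "walk the moon" is "walk" and word 3 is "moon"; giving both a start
# and an end (next expectation) extracts that whole range of words.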
expect_equal("the moon", word("walk the moon", 2, 3)) }) test_that("words past end return NA", { expect_equal(word("a b c", 4), NA_character_) }) test_that("negative parameters", { expect_equal("moon", word("walk the moon", -1, -1)) expect_equal("walk the moon", word("walk the moon", -3, -1)) expect_equal("walk the moon", word("walk the moon", -5, -1)) }) stringr/tests/testthat/test-remove.R0000644000176200001440000000023614317040167017345 0ustar liggesuserstest_that("succesfully wraps str_replace_all", { expect_equal(str_remove_all("abababa", "ba"), "a") expect_equal(str_remove("abababa", "ba"), "ababa") }) stringr/tests/testthat/test-pad.R0000644000176200001440000000211714317040167016614 0ustar liggesuserstest_that("long strings are unchanged", { lengths <- sample(40:100, 10) strings <- vapply(lengths, function(x) str_c(letters[sample(26, x, replace = T)], collapse = ""), character(1)) padded <- str_pad(strings, width = 30) expect_equal(str_length(padded), str_length(strings)) }) test_that("directions work for simple case", { pad <- function(direction) str_pad("had", direction, width = 10) expect_equal(pad("right"), "had ") expect_equal(pad("left"), " had") expect_equal(pad("both"), " had ") }) test_that("padding based of length works", { # \u4e2d is a 2-characters-wide Chinese character pad <- function(...) str_pad("\u4e2d", ..., side = "both") expect_equal(pad(width = 6), " \u4e2d ") expect_equal(pad(width = 5, use_width = FALSE), " \u4e2d ") }) test_that("uses tidyverse recycling rules", { expect_error( str_pad(c("a", "b"), 1:3), class = "vctrs_error_incompatible_size" ) expect_error( str_pad(c("a", "b"), 10, pad = c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) }) stringr/tests/testthat/test-dup.R0000644000176200001440000000077314317040167016646 0ustar liggesuserstest_that("basic duplication works", { expect_equal(str_dup("a", 3), "aaa") expect_equal(str_dup("abc", 2), "abcabc") expect_equal(str_dup(c("a", "b"), 2), c("aa", "bb")) expect_equal(str_dup(c("a", "b"), c(2, 3)), c("aa", "bbb")) }) test_that("0 duplicates equals empty string", { expect_equal(str_dup("a", 0), "") expect_equal(str_dup(c("a", "b"), 0), rep("", 2)) }) test_that("uses tidyverse recycling rules", { expect_error(str_dup(1:2, 1:3), class = "vctrs_error_incompatible_size") }) stringr/tests/testthat/test-equal.R0000644000176200001440000000062114316043620017151 0ustar liggesuserstest_that("vectorised using TRR", { expect_equal(str_equal("a", character()), logical()) expect_equal(str_equal("a", "b"), FALSE) expect_equal(str_equal("a", c("a", "b")), c(TRUE, FALSE)) expect_error(str_equal(letters[1:3], c("a", "b")), "recycle") }) test_that("can ignore case", { expect_equal(str_equal("a", "A"), FALSE) expect_equal(str_equal("a", "A", ignore_case = TRUE), TRUE) }) stringr/tests/testthat/test-extract.R0000644000176200001440000000343714520174727017536 0ustar liggesuserstest_that("single pattern extracted correctly", { test <- c("one two three", "a b c") expect_equal( str_extract_all(test, "[a-z]+"), list(c("one", "two", "three"), c("a", "b", "c")) ) expect_equal( str_extract_all(test, "[a-z]{3,}"), list(c("one", "two", "three"), character()) ) }) test_that("uses tidyverse recycling rules", { expect_error( str_extract(c("a", "b"), c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) expect_error( str_extract_all(c("a", "b"), c("a", "b", "c")), class = "vctrs_error_incompatible_size" ) }) test_that("no match yields empty vector", { expect_equal(str_extract_all("a", "b")[[1]], character()) }) 
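# Sketch of the distinction relied on here: for an element with no match,
# str_extract() returns NA while str_extract_all() returns a zero-length
# character vector, e.g. str_extract("bag of flour", "\\d") is NA and
# str_extract_all("bag of flour", "\\d")[[1]] is character().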
test_that("str_extract extracts first match if found, NA otherwise", { shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2") word_1_to_4 <- str_extract(shopping_list, "\\b[a-z]{1,4}\\b") expect_length(word_1_to_4, length(shopping_list)) expect_equal(word_1_to_4[1], NA_character_) }) test_that("can extract a group", { expect_equal(str_extract("abc", "(.).(.)", group = 1), "a") expect_equal(str_extract("abc", "(.).(.)", group = 2), "c") }) test_that("can use fixed() and coll()", { expect_equal(str_extract("x.x", fixed(".")), ".") expect_equal(str_extract_all("x.x.", fixed(".")), list(c(".", "."))) expect_equal(str_extract("\u0131", turkish_I()), "\u0131") expect_equal(str_extract_all("\u0131I", turkish_I()), list(c("\u0131", "I"))) }) test_that("can extract boundaries", { expect_equal(str_extract("a b c", ""), "a") expect_equal( str_extract_all("a b c", ""), list(c("a", " ", "b", " ", "c")) ) expect_equal(str_extract("a b c", boundary("word")), "a") expect_equal( str_extract_all("a b c", boundary("word")), list(c("a", "b", "c")) ) }) stringr/tests/testthat/test-conv.R0000644000176200001440000000041014316043620017003 0ustar liggesuserstest_that("encoding conversion works", { skip_on_os("windows") x <- rawToChar(as.raw(177)) expect_equal(str_conv(x, "latin1"), "±") }) test_that("check encoding argument", { expect_error(str_conv("A", c("ISO-8859-1", "ISO-8859-2")), "single string") }) stringr/tests/testthat/test-interp.R0000644000176200001440000000366514317040167017362 0ustar liggesuserstest_that("str_interp works with default env", { subject <- "statistics" number <- 7 floating <- 6.656 expect_equal( str_interp("A ${subject}. B $[d]{number}. C $[.2f]{floating}."), "A statistics. B 7. C 6.66." ) expect_equal( str_interp("Pi is approximately $[.5f]{pi}"), "Pi is approximately 3.14159" ) }) test_that("str_interp works with lists and data frames.", { expect_equal( str_interp( "One value, ${value1}, and then another, ${value2*2}.", list(value1 = 10, value2 = 20) ), "One value, 10, and then another, 40." ) expect_equal( str_interp( "Values are $[.2f]{max(Sepal.Width)} and $[.2f]{min(Sepal.Width)}.", iris ), "Values are 4.40 and 2.00." ) }) test_that("str_interp works with nested expressions", { amount <- 1337 expect_equal( str_interp("Works with } nested { braces too: $[.2f]{{{2 + 2}*{amount}}}"), "Works with } nested { braces too: 5348.00" ) }) test_that("str_interp works in the absense of placeholders", { expect_equal( str_interp("A quite static string here."), "A quite static string here." 
) }) test_that("str_interp fails when encountering nested placeholders", { msg <- "This will never see the light of day" num <- 1.2345 expect_error( str_interp("${${msg}}"), "Invalid template string for interpolation" ) expect_error( str_interp("$[.2f]{${msg}}"), "Invalid template string for interpolation" ) }) test_that("str_interp fails when input is not a character string", { expect_error(str_interp(3L)) }) test_that("str_interp formats list independetly of other placeholders", { a_list <- c("item1", "item2", "item3") other <- "1" extract <- function(text) regmatches(text, regexpr("xx[^x]+xx", text)) from_list <- extract(str_interp("list: xx${a_list}xx")) from_both <- extract(str_interp("list: xx${a_list}xx, and another ${other}")) expect_equal(from_list, from_both) }) stringr/tests/testthat/test-subset.R0000644000176200001440000000210014520174727017353 0ustar liggesuserstest_that("can subset with regexps", { x <- c("a", "b", "c") expect_equal(str_subset(x, "a|c"), c("a", "c")) expect_equal(str_subset(x, "a|c", negate = TRUE), "b") }) test_that("can subset with fixed patterns", { expect_equal(str_subset(c("i", "I"), fixed("i")), "i") expect_equal( str_subset(c("i", "I"), fixed("i", ignore_case = TRUE)), c("i", "I") ) # negation works expect_equal(str_subset(c("i", "I"), fixed("i"), negate = TRUE), "I") }) test_that("str_which is equivalent to grep", { expect_equal( str_which(head(letters), "[aeiou]"), grep("[aeiou]", head(letters)) ) # negation works expect_equal( str_which(head(letters), "[aeiou]", negate = TRUE), grep("[aeiou]", head(letters), invert = TRUE) ) }) test_that("can use fixed() and coll()", { expect_equal(str_subset(c("x", "."), fixed(".")), ".") expect_equal(str_subset(c("i", "\u0131"), turkish_I()), "\u0131") }) test_that("can't use boundaries", { expect_snapshot(error = TRUE, { str_subset(c("a", "b c"), "") str_subset(c("a", "b c"), boundary()) }) }) stringr/tests/testthat/test-replace.R0000644000176200001440000000743714520175762017503 0ustar liggesuserstest_that("basic replacement works", { expect_equal(str_replace_all("abababa", "ba", "BA"), "aBABABA") expect_equal(str_replace("abababa", "ba", "BA"), "aBAbaba") }) test_that("can replace multiple matches", { x <- c("a1", "b2") y <- str_replace_all(x, c("a" = "1", "b" = "2")) expect_equal(y, c("11", "22")) }) test_that("even when lengths differ", { x <- c("a1", "b2", "c3") y <- str_replace_all(x, c("a" = "1", "b" = "2")) expect_equal(y, c("11", "22", "c3")) }) test_that("multiple matches respects class", { x <- c("x", "y") y <- str_replace_all(x, regex(c("X" = "a"), ignore_case = TRUE)) expect_equal(y, c("a", "y")) }) test_that("replacement must be a string", { expect_snapshot(str_replace("x", "x", 1), error = TRUE) }) test_that("replacement must be a string", { expect_equal(str_replace("xyz", "x", NA_character_), NA_character_) }) test_that("can replace all types of NA values", { expect_equal(str_replace_na(NA), "NA") expect_equal(str_replace_na(NA_character_), "NA") expect_equal(str_replace_na(NA_complex_), "NA") expect_equal(str_replace_na(NA_integer_), "NA") expect_equal(str_replace_na(NA_real_), "NA") }) test_that("can use fixed() and coll()", { expect_equal(str_replace("x.x", fixed("."), "Y"), "xYx") expect_equal(str_replace_all("x.x.", fixed("."), "Y"), "xYxY") expect_equal(str_replace("\u0131", turkish_I(), "Y"), "Y") expect_equal(str_replace_all("\u0131I", turkish_I(), "Y"), "YY") }) test_that("can't replace empty/boundary", { expect_snapshot(error = TRUE, { str_replace("x", "", "") str_replace("x", 
boundary("word"), "") str_replace_all("x", "", "") str_replace_all("x", boundary("word"), "") }) }) # functions --------------------------------------------------------------- test_that("can supply replacement function", { expect_equal(str_replace("abc", "a|c", toupper), "Abc") expect_equal(str_replace_all("abc", "a|c", toupper), "AbC") }) test_that("replacement can be different length", { double <- function(x) str_dup(x, 2) expect_equal(str_replace_all("abc", "a|c", double), "aabcc") }) test_that("replacement with NA works", { expect_equal(str_replace("abc", "z", toupper), "abc") }) test_that("can use formula", { expect_equal(str_replace("abc", "b", ~ "x"), "axc") expect_equal(str_replace_all("abc", "b", ~ "x"), "axc") }) # fix_replacement --------------------------------------------------------- test_that("backrefs are correctly translated", { expect_equal(str_replace_all("abcde", "(b)(c)(d)", "\\1"), "abe") expect_equal(str_replace_all("abcde", "(b)(c)(d)", "\\2"), "ace") expect_equal(str_replace_all("abcde", "(b)(c)(d)", "\\3"), "ade") # gsub("(b)(c)(d)", "\\0", "abcde", perl=TRUE) gives a0e, # in ICU regex $0 refers to the whole pattern match expect_equal(str_replace_all("abcde", "(b)(c)(d)", "\\0"), "abcde") # gsub("(b)(c)(d)", "\\4", "abcde", perl=TRUE) is legal, # in ICU regex this gives an U_INDEX_OUTOFBOUNDS_ERROR expect_snapshot(str_replace_all("abcde", "(b)(c)(d)", "\\4"), error = TRUE) expect_equal(str_replace_all("abcde", "bcd", "\\\\1"), "a\\1e") expect_equal(str_replace_all("a!1!2!b", "!", "$"), "a$1$2$b") expect_equal(str_replace("aba", "b", "$"), "a$a") expect_equal(str_replace("aba", "b", "$$$"), "a$$$a") expect_equal(str_replace("aba", "(b)", "\\1$\\1$\\1"), "ab$b$ba") expect_equal(str_replace("aba", "(b)", "\\1$\\\\1$\\1"), "ab$\\1$ba") expect_equal(str_replace("aba", "(b)", "\\\\1$\\1$\\\\1"), "a\\1$b$\\1a") }) test_that("$ are escaped", { expect_equal(fix_replacement("$"), "\\$") expect_equal(fix_replacement("\\$"), "\\\\$") }) test_that("\1 converted to $1 etc", { expect_equal(fix_replacement("\\1"), "$1") expect_equal(fix_replacement("\\9"), "$9") }) test_that("\\1 left as is", { expect_equal(fix_replacement("\\\\1"), "\\\\1") }) stringr/tests/testthat.R0000644000176200001440000000007212435640121015064 0ustar liggesuserslibrary(testthat) library(stringr) test_check("stringr") stringr/vignettes/0000755000176200001440000000000014524706130013754 5ustar liggesusersstringr/vignettes/from-base.Rmd0000644000176200001440000004036414524677110016307 0ustar liggesusers--- title: "From base R" author: "Sara Stoudt" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{From base R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} #| label: setup #| include: false knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) library(magrittr) ``` This vignette compares stringr functions to their base R equivalents to help users transitioning from using base R to stringr. # Overall differences We'll begin with a lookup table between the most important stringr functions and their base R equivalents. 
```{r} #| label: stringr-base-r-diff #| echo: false data_stringr_base_diff <- tibble::tribble( ~stringr, ~base_r, "str_detect(string, pattern)", "grepl(pattern, x)", "str_dup(string, times)", "strrep(x, times)", "str_extract(string, pattern)", "regmatches(x, m = regexpr(pattern, text))", "str_extract_all(string, pattern)", "regmatches(x, m = gregexpr(pattern, text))", "str_length(string)", "nchar(x)", "str_locate(string, pattern)", "regexpr(pattern, text)", "str_locate_all(string, pattern)", "gregexpr(pattern, text)", "str_match(string, pattern)", "regmatches(x, m = regexec(pattern, text))", "str_order(string)", "order(...)", "str_replace(string, pattern, replacement)", "sub(pattern, replacement, x)", "str_replace_all(string, pattern, replacement)", "gsub(pattern, replacement, x)", "str_sort(string)", "sort(x)", "str_split(string, pattern)", "strsplit(x, split)", "str_sub(string, start, end)", "substr(x, start, stop)", "str_subset(string, pattern)", "grep(pattern, x, value = TRUE)", "str_to_lower(string)", "tolower(x)", "str_to_title(string)", "tools::toTitleCase(text)", "str_to_upper(string)", "toupper(x)", "str_trim(string)", "trimws(x)", "str_which(string, pattern)", "grep(pattern, x)", "str_wrap(string)", "strwrap(x)" ) # create MD table, arranged alphabetically by stringr fn name data_stringr_base_diff %>% dplyr::mutate(dplyr::across(.fns = ~ paste0("`", .x, "`"))) %>% dplyr::arrange(stringr) %>% dplyr::rename(`base R` = base_r) %>% gt::gt() %>% gt::fmt_markdown(columns = everything()) %>% gt::tab_options(column_labels.font.weight = "bold") ``` Overall the main differences between base R and stringr are: 1. stringr functions start with `str_` prefix; base R string functions have no consistent naming scheme. 1. The order of inputs is usually different between base R and stringr. In base R, the `pattern` to match usually comes first; in stringr, the `string` to manupulate always comes first. This makes stringr easier to use in pipes, and with `lapply()` or `purrr::map()`. 1. Functions in stringr tend to do less, where many of the string processing functions in base R have multiple purposes. 1. The output and input of stringr functions has been carefully designed. For example, the output of `str_locate()` can be fed directly into `str_sub()`; the same is not true of `regpexpr()` and `substr()`. 1. Base functions use arguments (like `perl`, `fixed`, and `ignore.case`) to control how the pattern is interpreted. To avoid dependence between arguments, stringr instead uses helper functions (like `fixed()`, `regex()`, and `coll()`). Next we'll walk through each of the functions, noting the similarities and important differences. These examples are adapted from the stringr documentation and here they are contrasted with the analogous base R operations. # Detect matches ## `str_detect()`: Detect the presence or absence of a pattern in a string Suppose you want to know whether each word in a vector of fruit names contains an "a". ```{r} fruit <- c("apple", "banana", "pear", "pineapple") # base grepl(pattern = "a", x = fruit) # stringr str_detect(fruit, pattern = "a") ``` In base you would use `grepl()` (see the "l" and think logical) while in stringr you use `str_detect()` (see the verb "detect" and think of a yes/no action). ## `str_which()`: Find positions matching a pattern Now you want to identify the positions of the words in a vector of fruit names that contain an "a". 
```{r} # base grep(pattern = "a", x = fruit) # stringr str_which(fruit, pattern = "a") ``` In base you would use `grep()` while in stringr you use `str_which()` (by analogy to `which()`). ## `str_count()`: Count the number of matches in a string How many "a"s are in each fruit? ```{r} # base loc <- gregexpr(pattern = "a", text = fruit, fixed = TRUE) sapply(loc, function(x) length(attr(x, "match.length"))) # stringr str_count(fruit, pattern = "a") ``` This information can be gleaned from `gregexpr()` in base, but you need to look at the `match.length` attribute as the vector uses a length-1 integer vector (`-1`) to indicate no match. ## `str_locate()`: Locate the position of patterns in a string Within each fruit, where does the first "p" occur? Where are all of the "p"s? ```{r} fruit3 <- c("papaya", "lime", "apple") # base str(gregexpr(pattern = "p", text = fruit3)) # stringr str_locate(fruit3, pattern = "p") str_locate_all(fruit3, pattern = "p") ``` # Subset strings ## `str_sub()`: Extract and replace substrings from a character vector What if we want to grab part of a string? ```{r} hw <- "Hadley Wickham" # base substr(hw, start = 1, stop = 6) substring(hw, first = 1) # stringr str_sub(hw, start = 1, end = 6) str_sub(hw, start = 1) str_sub(hw, end = 6) ``` In base you could use `substr()` or `substring()`. The former requires both a start and stop of the substring while the latter assumes the stop will be the end of the string. The stringr version, `str_sub()` has the same functionality, but also gives a default start value (the beginning of the string). Both the base and stringr functions have the same order of expected inputs. In stringr you can use negative numbers to index from the right-hand side string: -1 is the last letter, -2 is the second to last, and so on. ```{r} str_sub(hw, start = 1, end = -1) str_sub(hw, start = -5, end = -2) ``` Both base R and stringr subset are vectorized over their parameters. This means you can either choose the same subset across multiple strings or specify different subsets for different strings. 
```{r} al <- "Ada Lovelace" # base substr(c(hw,al), start = 1, stop = 6) substr(c(hw,al), start = c(1,1), stop = c(6,7)) # stringr str_sub(c(hw,al), start = 1, end = -1) str_sub(c(hw,al), start = c(1,1), end = c(-1,-2)) ``` stringr will automatically recycle the first argument to the same length as `start` and `stop`: ```{r} str_sub(hw, start = 1:5) ``` Whereas the base equivalent silently uses just the first value: ```{r} substr(hw, start = 1:5, stop = 15) ``` ## `str_sub() <- `: Subset assignment `substr()` behaves in a surprising way when you replace a substring with a different number of characters: ```{r} # base x <- "ABCDEF" substr(x, 1, 3) <- "x" x ``` `str_sub()` does what you would expect: ```{r} # stringr x <- "ABCDEF" str_sub(x, 1, 3) <- "x" x ``` ## `str_subset()`: Keep strings matching a pattern, or find positions We may want to retrieve strings that contain a pattern of interest: ```{r} # base grep(pattern = "g", x = fruit, value = TRUE) # stringr str_subset(fruit, pattern = "g") ``` ## `str_extract()`: Extract matching patterns from a string We may want to pick out certain patterns from a string, for example, the digits in a shopping list: ```{r} shopping_list <- c("apples x4", "bag of flour", "10", "milk x2") # base matches <- regexpr(pattern = "\\d+", text = shopping_list) # digits regmatches(shopping_list, m = matches) matches <- gregexpr(pattern = "[a-z]+", text = shopping_list) # words regmatches(shopping_list, m = matches) # stringr str_extract(shopping_list, pattern = "\\d+") str_extract_all(shopping_list, "[a-z]+") ``` Base R requires the combination of `regexpr()` with `regmatches()`; but note that the strings without matches are dropped from the output. stringr provides `str_extract()` and `str_extract_all()`, and the output is always the same length as the input. ## `str_match()`: Extract matched groups from a string We may also want to extract groups from a string. Here I'm going to use the scenario from Section 14.4.3 in [R for Data Science](https://r4ds.had.co.nz/strings.html). ```{r} head(sentences) noun <- "([A]a|[Tt]he) ([^ ]+)" # base matches <- regexec(pattern = noun, text = head(sentences)) do.call("rbind", regmatches(x = head(sentences), m = matches)) # stringr str_match(head(sentences), pattern = noun) ``` As for extracting the full match base R requires the combination of two functions, and inputs with no matches are dropped from the output. # Manage lengths ## `str_length()`: The length of a string To determine the length of a string, base R uses `nchar()` (not to be confused with `length()` which gives the length of vectors, etc.) while stringr uses `str_length()`. ```{r} # base nchar(letters) # stringr str_length(letters) ``` There are some subtle differences between base and stringr here. `nchar()` requires a character vector, so it will return an error if used on a factor. `str_length()` can handle a factor input. ```{r} #| error: true # base nchar(factor("abc")) ``` ```{r} # stringr str_length(factor("abc")) ``` Note that "characters" is a poorly defined concept, and technically both `nchar()` and `str_length()` returns the number of code points. This is usually the same as what you'd consider to be a charcter, but not always: ```{r} x <- c("\u00fc", "u\u0308") x nchar(x) str_length(x) ``` ## `str_pad()`: Pad a string To pad a string to a certain width, use stringr's `str_pad()`. In base R you could use `sprintf()`, but unlike `str_pad()`, `sprintf()` has many other functionalities. 
```{r} # base sprintf("%30s", "hadley") sprintf("%-30s", "hadley") # "both" is not as straightforward # stringr rbind( str_pad("hadley", 30, "left"), str_pad("hadley", 30, "right"), str_pad("hadley", 30, "both") ) ``` ## `str_trunc()`: Truncate a character string The stringr package provides an easy way to truncate a character string: `str_trunc()`. Base R has no function to do this directly. ```{r} x <- "This string is moderately long" # stringr rbind( str_trunc(x, 20, "right"), str_trunc(x, 20, "left"), str_trunc(x, 20, "center") ) ``` ## `str_trim()`: Trim whitespace from a string Similarly, stringr provides `str_trim()` to trim whitespace from a string. This is analogous to base R's `trimws()` added in R 3.3.0. ```{r} # base trimws(" String with trailing and leading white space\t") trimws("\n\nString with trailing and leading white space\n\n") # stringr str_trim(" String with trailing and leading white space\t") str_trim("\n\nString with trailing and leading white space\n\n") ``` The stringr function `str_squish()` allows for extra whitespace within a string to be trimmed (in contrast to `str_trim()` which removes whitespace at the beginning and/or end of string). In base R, one might take advantage of `gsub()` to accomplish the same effect. ```{r} # stringr str_squish(" String with trailing, middle, and leading white space\t") str_squish("\n\nString with excess, trailing and leading white space\n\n") ``` ## `str_wrap()`: Wrap strings into nicely formatted paragraphs `strwrap()` and `str_wrap()` use different algorithms. `str_wrap()` uses the famous [Knuth-Plass algorithm](http://litherum.blogspot.com/2015/07/knuth-plass-line-breaking-algorithm.html). ```{r} gettysburg <- "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal." # base cat(strwrap(gettysburg, width = 60), sep = "\n") # stringr cat(str_wrap(gettysburg, width = 60), "\n") ``` Note that `strwrap()` returns a character vector with one element for each line; `str_wrap()` returns a single string containing line breaks. # Mutate strings ## `str_replace()`: Replace matched patterns in a string To replace certain patterns within a string, stringr provides the functions `str_replace()` and `str_replace_all()`. The base R equivalents are `sub()` and `gsub()`. Note the difference in default input order again. ```{r} fruits <- c("apple", "banana", "pear", "pineapple") # base sub("[aeiou]", "-", fruits) gsub("[aeiou]", "-", fruits) # stringr str_replace(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", "-") ``` ## case: Convert case of a string Both stringr and base R have functions to convert to upper and lower case. Title case is also provided in stringr. ```{r} dog <- "The quick brown dog" # base toupper(dog) tolower(dog) tools::toTitleCase(dog) # stringr str_to_upper(dog) str_to_lower(dog) str_to_title(dog) ``` In stringr we can control the locale, while in base R locale distinctions are controlled with global variables. Therefore, the output of your base R code may vary across different computers with different global settings. ```{r} # stringr str_to_upper("i") # English str_to_upper("i", locale = "tr") # Turkish ``` # Join and split ## `str_flatten()`: Flatten a string If we want to take elements of a string vector and collapse them to a single string we can use the `collapse` argument in `paste()` or use stringr's `str_flatten()`. 
```{r} # base paste0(letters, collapse = "-") # stringr str_flatten(letters, collapse = "-") ``` The advantage of `str_flatten()` is that it always returns a vector the same length as its input; to predict the return length of `paste()` you must carefully read all arguments. ## `str_dup()`: duplicate strings within a character vector To duplicate strings within a character vector use `strrep()` (in R 3.3.0 or greater) or `str_dup()`: ```{r} #| eval: !expr getRversion() >= "3.3.0" fruit <- c("apple", "pear", "banana") # base strrep(fruit, 2) strrep(fruit, 1:3) # stringr str_dup(fruit, 2) str_dup(fruit, 1:3) ``` ## `str_split()`: Split up a string into pieces To split a string into pieces with breaks based on a particular pattern match stringr uses `str_split()` and base R uses `strsplit()`. Unlike most other functions, `strsplit()` starts with the character vector to modify. ```{r} fruits <- c( "apples and oranges and pears and bananas", "pineapples and mangos and guavas" ) # base strsplit(fruits, " and ") # stringr str_split(fruits, " and ") ``` The stringr package's `str_split()` allows for more control over the split, including restricting the number of possible matches. ```{r} # stringr str_split(fruits, " and ", n = 3) str_split(fruits, " and ", n = 2) ``` ## `str_glue()`: Interpolate strings It's often useful to interpolate varying values into a fixed string. In base R, you can use `sprintf()` for this purpose; stringr provides a wrapper for the more general purpose [glue](https://glue.tidyverse.org) package. ```{r} name <- "Fred" age <- 50 anniversary <- as.Date("1991-10-12") # base sprintf( "My name is %s my age next year is %s and my anniversary is %s.", name, age + 1, format(anniversary, "%A, %B %d, %Y") ) # stringr str_glue( "My name is {name}, ", "my age next year is {age + 1}, ", "and my anniversary is {format(anniversary, '%A, %B %d, %Y')}." ) ``` # Order strings ## `str_order()`: Order or sort a character vector Both base R and stringr have separate functions to order and sort strings. ```{r} # base order(letters) sort(letters) # stringr str_order(letters) str_sort(letters) ``` Some options in `str_order()` and `str_sort()` don't have analogous base R options. For example, the stringr functions have a `locale` argument to control how to order or sort. In base R the locale is a global setting, so the outputs of `sort()` and `order()` may differ across different computers. For example, in the Norwegian alphabet, å comes after z: ```{r} x <- c("å", "a", "z") str_sort(x) str_sort(x, locale = "no") ``` The stringr functions also have a `numeric` argument to sort digits numerically instead of treating them as strings. ```{r} # stringr x <- c("100a10", "100a5", "2b", "2a") str_sort(x) str_sort(x, numeric = TRUE) ``` stringr/vignettes/regular-expressions.Rmd0000644000176200001440000003447614520174727020466 0ustar liggesusers--- title: "Regular expressions" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Regular expressions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) ``` Regular expressions are a concise and flexible tool for describing patterns in strings. This vignette describes the key features of stringr's regular expressions, as implemented by [stringi](https://github.com/gagolews/stringi). It is not a tutorial, so if you're unfamiliar regular expressions, I'd recommend starting at . 
If you want to master the details, I'd recommend reading the classic [_Mastering Regular Expressions_](https://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124) by Jeffrey E. F. Friedl. Regular expressions are the default pattern engine in stringr. That means when you use a pattern matching function with a bare string, it's equivalent to wrapping it in a call to `regex()`: ```{r, eval = FALSE} # The regular call: str_extract(fruit, "nana") # Is shorthand for str_extract(fruit, regex("nana")) ``` You will need to use `regex()` explicitly if you want to override the default options, as you'll see in examples below. ## Basic matches The simplest patterns match exact strings: ```{r} x <- c("apple", "banana", "pear") str_extract(x, "an") ``` You can perform a case-insensitive match using `ignore_case = TRUE`: ```{r} bananas <- c("banana", "Banana", "BANANA") str_detect(bananas, "banana") str_detect(bananas, regex("banana", ignore_case = TRUE)) ``` The next step up in complexity is `.`, which matches any character except a newline: ```{r} str_extract(x, ".a.") ``` You can allow `.` to match everything, including `\n`, by setting `dotall = TRUE`: ```{r} str_detect("\nX\n", ".X.") str_detect("\nX\n", regex(".X.", dotall = TRUE)) ``` ## Escaping If "`.`" matches any character, how do you match a literal "`.`"? You need to use an "escape" to tell the regular expression you want to match it exactly, not use its special behaviour. Like strings, regexps use the backslash, `\`, to escape special behaviour. So to match an `.`, you need the regexp `\.`. Unfortunately this creates a problem. We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings. So to create the regular expression `\.` we need the string `"\\."`. ```{r} # To create the regular expression, we need \\ dot <- "\\." # But the expression itself only contains one: writeLines(dot) # And this tells R to look for an explicit . str_extract(c("abc", "a.c", "bef"), "a\\.c") ``` If `\` is used as an escape character in regular expressions, how do you match a literal `\`? Well you need to escape it, creating the regular expression `\\`. To create that regular expression, you need to use a string, which also needs to escape `\`. That means to match a literal `\` you need to write `"\\\\"` --- you need four backslashes to match one! ```{r} x <- "a\\b" writeLines(x) str_extract(x, "\\\\") ``` In this vignette, I use `\.` to denote the regular expression, and `"\\."` to denote the string that represents the regular expression. An alternative quoting mechanism is `\Q...\E`: all the characters in `...` are treated as exact matches. This is useful if you want to exactly match user input as part of a regular expression. ```{r} x <- c("a.b.c.d", "aeb") starts_with <- "a.b" str_detect(x, paste0("^", starts_with)) str_detect(x, paste0("^\\Q", starts_with, "\\E")) ``` ## Special characters Escapes also allow you to specify individual characters that are otherwise hard to type. You can specify individual unicode characters in five ways, either as a variable number of hex digits (four is most common), or by name: * `\xhh`: 2 hex digits. * `\x{hhhh}`: 1-6 hex digits. * `\uhhhh`: 4 hex digits. * `\Uhhhhhhhh`: 8 hex digits. * `\N{name}`, e.g. `\N{grinning face}` matches the basic smiling emoji. Similarly, you can specify many common control characters: * `\a`: bell. * `\cX`: match a control-X character. * `\e`: escape (`\u001B`). * `\f`: form feed (`\u000C`). * `\n`: line feed (`\u000A`). 
* `\r`: carriage return (`\u000D`). * `\t`: horizontal tabulation (`\u0009`). * `\0ooo` match an octal character. 'ooo' is from one to three octal digits, from 000 to 0377. The leading zero is required. (Many of these are only of historical interest and are only included here for the sake of completeness.) ## Matching multiple characters There are a number of patterns that match more than one character. You've already seen `.`, which matches any character (except a newline). A closely related operator is `\X`, which matches a __grapheme cluster__, a set of individual elements that form a single symbol. For example, one way of representing "á" is as the letter "a" plus an accent: `.` will match the component "a", while `\X` will match the complete symbol: ```{r} x <- "a\u0301" str_extract(x, ".") str_extract(x, "\\X") ``` There are five other escaped pairs that match narrower classes of characters: * `\d`: matches any digit. The complement, `\D`, matches any character that is not a decimal digit. ```{r} str_extract_all("1 + 2 = 3", "\\d+")[[1]] ``` Technically, `\d` includes any character in the Unicode Category of Nd ("Number, Decimal Digit"), which also includes numeric symbols from other languages: ```{r} # Some Laotian numbers str_detect("១២៣", "\\d") ``` * `\s`: matches any whitespace. This includes tabs, newlines, form feeds, and any character in the Unicode Z Category (which includes a variety of space characters and other separators.). The complement, `\S`, matches any non-whitespace character. ```{r} (text <- "Some \t badly\n\t\tspaced \f text") str_replace_all(text, "\\s+", " ") ``` * `\p{property name}` matches any character with specific unicode property, like `\p{Uppercase}` or `\p{Diacritic}`. The complement, `\P{property name}`, matches all characters without the property. A complete list of unicode properties can be found at . ```{r} (text <- c('"Double quotes"', "«Guillemet»", "“Fancy quotes”")) str_replace_all(text, "\\p{quotation mark}", "'") ``` * `\w` matches any "word" character, which includes alphabetic characters, marks and decimal numbers. The complement, `\W`, matches any non-word character. ```{r} str_extract_all("Don't eat that!", "\\w+")[[1]] str_split("Don't eat that!", "\\W")[[1]] ``` Technically, `\w` also matches connector punctuation, `\u200c` (zero width connector), and `\u200d` (zero width joiner), but these are rarely seen in the wild. * `\b` matches word boundaries, the transition between word and non-word characters. `\B` matches the opposite: boundaries that have either both word or non-word characters on either side. ```{r} str_replace_all("The quick brown fox", "\\b", "_") str_replace_all("The quick brown fox", "\\B", "_") ``` You can also create your own __character classes__ using `[]`: * `[abc]`: matches a, b, or c. * `[a-z]`: matches every character between a and z (in Unicode code point order). * `[^abc]`: matches anything except a, b, or c. * `[\^\-]`: matches `^` or `-`. There are a number of pre-built classes that you can use inside `[]`: * `[:punct:]`: punctuation. * `[:alpha:]`: letters. * `[:lower:]`: lowercase letters. * `[:upper:]`: upperclass letters. * `[:digit:]`: digits. * `[:xdigit:]`: hex digits. * `[:alnum:]`: letters and numbers. * `[:cntrl:]`: control characters. * `[:graph:]`: letters, numbers, and punctuation. * `[:print:]`: letters, numbers, punctuation, and whitespace. * `[:space:]`: space characters (basically equivalent to `\s`). * `[:blank:]`: space and tab. These all go inside the `[]` for character classes, i.e. 
`[[:digit:]AX]` matches all digits, A, and X. You can also using Unicode properties, like `[\p{Letter}]`, and various set operations, like `[\p{Letter}--\p{script=latin}]`. See `?"stringi-search-charclass"` for details. ## Alternation `|` is the __alternation__ operator, which will pick between one or more possible matches. For example, `abc|def` will match `abc` or `def`: ```{r} str_detect(c("abc", "def", "ghi"), "abc|def") ``` Note that the precedence for `|` is low: `abc|def` is equivalent to `(abc)|(def)` not `ab(c|d)ef`. ## Grouping You can use parentheses to override the default precedence rules: ```{r} str_extract(c("grey", "gray"), "gre|ay") str_extract(c("grey", "gray"), "gr(e|a)y") ``` Parenthesis also define "groups" that you can refer to with __backreferences__, like `\1`, `\2` etc, and can be extracted with `str_match()`. For example, the following regular expression finds all fruits that have a repeated pair of letters: ```{r} pattern <- "(..)\\1" fruit %>% str_subset(pattern) fruit %>% str_subset(pattern) %>% str_match(pattern) ``` You can use `(?:...)`, the non-grouping parentheses, to control precedence but not capture the match in a group. This is slightly more efficient than capturing parentheses. ```{r} str_match(c("grey", "gray"), "gr(e|a)y") str_match(c("grey", "gray"), "gr(?:e|a)y") ``` This is most useful for more complex cases where you need to capture matches and control precedence independently. ## Anchors By default, regular expressions will match any part of a string. It's often useful to __anchor__ the regular expression so that it matches from the start or end of the string: * `^` matches the start of string. * `$` matches the end of the string. ```{r} x <- c("apple", "banana", "pear") str_extract(x, "^a") str_extract(x, "a$") ``` To match a literal "$" or "^", you need to escape them, `\$`, and `\^`. For multiline strings, you can use `regex(multiline = TRUE)`. This changes the behaviour of `^` and `$`, and introduces three new operators: * `^` now matches the start of each line. * `$` now matches the end of each line. * `\A` matches the start of the input. * `\z` matches the end of the input. * `\Z` matches the end of the input, but before the final line terminator, if it exists. ```{r} x <- "Line 1\nLine 2\nLine 3\n" str_extract_all(x, "^Line..")[[1]] str_extract_all(x, regex("^Line..", multiline = TRUE))[[1]] str_extract_all(x, regex("\\ALine..", multiline = TRUE))[[1]] ``` ## Repetition You can control how many times a pattern matches with the repetition operators: * `?`: 0 or 1. * `+`: 1 or more. * `*`: 0 or more. ```{r} x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII" str_extract(x, "CC?") str_extract(x, "CC+") str_extract(x, 'C[LX]+') ``` Note that the precedence of these operators is high, so you can write: `colou?r` to match either American or British spellings. That means most uses will need parentheses, like `bana(na)+`. You can also specify the number of matches precisely: * `{n}`: exactly n * `{n,}`: n or more * `{n,m}`: between n and m ```{r} str_extract(x, "C{2}") str_extract(x, "C{2,}") str_extract(x, "C{2,3}") ``` By default these matches are "greedy": they will match the longest string possible. You can make them "lazy", matching the shortest string possible by putting a `?` after them: * `??`: 0 or 1, prefer 0. * `+?`: 1 or more, match as few times as possible. * `*?`: 0 or more, match as few times as possible. * `{n,}?`: n or more, match as few times as possible. 
* `{n,m}?`: between n and m, , match as few times as possible, but at least n. ```{r} str_extract(x, c("C{2,3}", "C{2,3}?")) str_extract(x, c("C[LX]+", "C[LX]+?")) ``` You can also make the matches possessive by putting a `+` after them, which means that if later parts of the match fail, the repetition will not be re-tried with a smaller number of characters. This is an advanced feature used to improve performance in worst-case scenarios (called "catastrophic backtracking"). * `?+`: 0 or 1, possessive. * `++`: 1 or more, possessive. * `*+`: 0 or more, possessive. * `{n}+`: exactly n, possessive. * `{n,}+`: n or more, possessive. * `{n,m}+`: between n and m, possessive. A related concept is the __atomic-match__ parenthesis, `(?>...)`. If a later match fails and the engine needs to back-track, an atomic match is kept as is: it succeeds or fails as a whole. Compare the following two regular expressions: ```{r} str_detect("ABC", "(?>A|.B)C") str_detect("ABC", "(?:A|.B)C") ``` The atomic match fails because it matches A, and then the next character is a C so it fails. The regular match succeeds because it matches A, but then C doesn't match, so it back-tracks and tries B instead. ## Look arounds These assertions look ahead or behind the current match without "consuming" any characters (i.e. changing the input position). * `(?=...)`: positive look-ahead assertion. Matches if `...` matches at the current input. * `(?!...)`: negative look-ahead assertion. Matches if `...` __does not__ match at the current input. * `(?<=...)`: positive look-behind assertion. Matches if `...` matches text preceding the current position, with the last character of the match being the character just before the current position. Length must be bounded (i.e. no `*` or `+`). * `(? %\VignetteIndexEntry{Introduction to stringr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(stringr) knitr::opts_chunk$set( comment = "#>", collapse = TRUE ) ``` There are four main families of functions in stringr: 1. Character manipulation: these functions allow you to manipulate individual characters within the strings in character vectors. 1. Whitespace tools to add, remove, and manipulate whitespace. 1. Locale sensitive operations whose operations will vary from locale to locale. 1. Pattern matching functions. These recognise four engines of pattern description. The most common is regular expressions, but there are three other tools. ## Getting and setting individual characters You can get the length of the string with `str_length()`: ```{r} str_length("abc") ``` This is now equivalent to the base R function `nchar()`. Previously it was needed to work around issues with `nchar()` such as the fact that it returned 2 for `nchar(NA)`. This has been fixed as of R 3.3.0, so it is no longer so important. You can access individual character using `str_sub()`. It takes three arguments: a character vector, a `start` position and an `end` position. Either position can either be a positive integer, which counts from the left, or a negative integer which counts from the right. The positions are inclusive, and if longer than the string, will be silently truncated. 
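To make that last point concrete, here is a small added example: an `end` that runs past the final character is simply capped at the string's length.

```{r}
str_sub("abc", 1, 10)
```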
```{r} x <- c("abcdef", "ghifjk") # The 3rd letter str_sub(x, 3, 3) # The 2nd to 2nd-to-last character str_sub(x, 2, -2) ``` You can also use `str_sub()` to modify strings: ```{r} str_sub(x, 3, 3) <- "X" x ``` To duplicate individual strings, you can use `str_dup()`: ```{r} str_dup(x, c(2, 3)) ``` ## Whitespace Three functions add, remove, or modify whitespace: 1. `str_pad()` pads a string to a fixed length by adding extra whitespace on the left, right, or both sides. ```{r} x <- c("abc", "defghi") str_pad(x, 10) # default pads on left str_pad(x, 10, "both") ``` (You can pad with other characters by using the `pad` argument.) `str_pad()` will never make a string shorter: ```{r} str_pad(x, 4) ``` So if you want to ensure that all strings are the same length (often useful for print methods), combine `str_pad()` and `str_trunc()`: ```{r} x <- c("Short", "This is a long string") x %>% str_trunc(10) %>% str_pad(10, "right") ``` 1. The opposite of `str_pad()` is `str_trim()`, which removes leading and trailing whitespace: ```{r} x <- c(" a ", "b ", " c") str_trim(x) str_trim(x, "left") ``` 1. You can use `str_wrap()` to modify existing whitespace in order to wrap a paragraph of text, such that the length of each line is as similar as possible. ```{r} jabberwocky <- str_c( "`Twas brillig, and the slithy toves ", "did gyre and gimble in the wabe: ", "All mimsy were the borogoves, ", "and the mome raths outgrabe. " ) cat(str_wrap(jabberwocky, width = 40)) ``` ## Locale sensitive A handful of stringr functions are locale-sensitive: they will perform differently in different regions of the world. These functions are case transformation functions: ```{r} x <- "I like horses." str_to_upper(x) str_to_title(x) str_to_lower(x) # Turkish has two sorts of i: with and without the dot str_to_lower(x, "tr") ``` String ordering and sorting: ```{r} x <- c("y", "i", "k") str_order(x) str_sort(x) # In Lithuanian, y comes between i and k str_sort(x, locale = "lt") ``` The locale always defaults to English to ensure that the default behaviour is identical across systems. Locales always include a two letter ISO-639-1 language code (like "en" for English or "zh" for Chinese), and optionally a ISO-3166 country code (like "en_UK" vs "en_US"). You can see a complete list of available locales by running `stringi::stri_locale_list()`. ## Pattern matching The vast majority of stringr functions work with patterns. These are parameterised by the task they perform and the types of patterns they match. ### Tasks Each pattern matching function has the same first two arguments, a character vector of `string`s to process and a single `pattern` to match. stringr provides pattern matching functions to **detect**, **locate**, **extract**, **match**, **replace**, and **split** strings. I'll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers: ```{r} strings <- c( "apple", "219 733 8965", "329-293-8753", "Work: 579-499-7527; Home: 543.355.3679" ) phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" ``` - `str_detect()` detects the presence or absence of a pattern and returns a logical vector (similar to `grepl()`). `str_subset()` returns the elements of a character vector that match a regular expression (similar to `grep()` with `value = TRUE`)`. ```{r} # Which strings contain phone numbers? str_detect(strings, phone) str_subset(strings, phone) ``` - `str_count()` counts the number of matches: ```{r} # How many phone numbers in each string? 
str_count(strings, phone) ``` - `str_locate()` locates the **first** position of a pattern and returns a numeric matrix with columns start and end. `str_locate_all()` locates all matches, returning a list of numeric matrices. Similar to `regexpr()` and `gregexpr()`. ```{r} # Where in the string is the phone number located? (loc <- str_locate(strings, phone)) str_locate_all(strings, phone) ``` - `str_extract()` extracts text corresponding to the **first** match, returning a character vector. `str_extract_all()` extracts all matches and returns a list of character vectors. ```{r} # What are the phone numbers? str_extract(strings, phone) str_extract_all(strings, phone) str_extract_all(strings, phone, simplify = TRUE) ``` - `str_match()` extracts capture groups formed by `()` from the **first** match. It returns a character matrix with one column for the complete match and one column for each group. `str_match_all()` extracts capture groups from all matches and returns a list of character matrices. Similar to `regmatches()`. ```{r} # Pull out the three components of the match str_match(strings, phone) str_match_all(strings, phone) ``` - `str_replace()` replaces the **first** matched pattern and returns a character vector. `str_replace_all()` replaces all matches. Similar to `sub()` and `gsub()`. ```{r} str_replace(strings, phone, "XXX-XXX-XXXX") str_replace_all(strings, phone, "XXX-XXX-XXXX") ``` - `str_split_fixed()` splits a string into a **fixed** number of pieces based on a pattern and returns a character matrix. `str_split()` splits a string into a **variable** number of pieces and returns a list of character vectors. ```{r} str_split("a-b-c", "-") str_split_fixed("a-b-c", "-", n = 2) ``` ### Engines There are four main engines that stringr can use to describe patterns: * Regular expressions, the default, as shown above, and described in `vignette("regular-expressions")`. * Fixed bytewise matching, with `fixed()`. * Locale-sensitive character matching, with `coll()` * Text boundary analysis with `boundary()`. #### Fixed matches `fixed(x)` only matches the exact sequence of bytes specified by `x`. This is a very limited "pattern", but the restriction can make matching much faster. Beware using `fixed()` with non-English data. It is problematic because there are often multiple ways of representing the same character. For example, there are two ways to define "á": either as a single character or as an "a" plus an accent: ```{r} a1 <- "\u00e1" a2 <- "a\u0301" c(a1, a2) a1 == a2 ``` They render identically, but because they're defined differently, `fixed()` doesn't find a match. Instead, you can use `coll()`, explained below, to respect human character comparison rules: ```{r} str_detect(a1, fixed(a2)) str_detect(a1, coll(a2)) ``` #### Collation search `coll(x)` looks for a match to `x` using human-language **coll**ation rules, and is particularly important if you want to do case insensitive matching. Collation rules differ around the world, so you'll also need to supply a `locale` parameter. ```{r} i <- c("I", "İ", "i", "ı") i str_subset(i, coll("i", ignore_case = TRUE)) str_subset(i, coll("i", ignore_case = TRUE, locale = "tr")) ``` The downside of `coll()` is speed. Because the rules for recognising which characters are the same are complicated, `coll()` is relatively slow compared to `regex()` and `fixed()`. Note that when both `fixed()` and `regex()` have `ignore_case` arguments, they perform a much simpler comparison than `coll()`. 
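To see the difference in practice, here is a small added sketch using the dotless Turkish "ı"; the exact behaviour depends on the ICU case and collation rules rather than on stringr itself.

```{r}
dotless_i <- "\u0131" # Turkish dotless i

# regex() applies locale-independent case folding, which does not treat
# "I" and "ı" as a case pair:
str_detect(dotless_i, regex("I", ignore_case = TRUE))

# coll() consults the Turkish collation rules, where "I" is the upper-case
# form of "ı", so this is expected to match:
str_detect(dotless_i, coll("I", ignore_case = TRUE, locale = "tr"))
```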
#### Boundary `boundary()` matches boundaries between characters, lines, sentences or words. It's most useful with `str_split()`, but can be used with all pattern matching functions: ```{r} x <- "This is a sentence." str_split(x, boundary("word")) str_count(x, boundary("word")) str_extract_all(x, boundary("word")) ``` By convention, `""` is treated as `boundary("character")`: ```{r} str_split(x, "") str_count(x, "") ``` stringr/R/0000755000176200001440000000000014524700556012153 5ustar liggesusersstringr/R/conv.R0000644000176200001440000000104314316640452013236 0ustar liggesusers#' Specify the encoding of a string #' #' This is a convenient way to override the current encoding of a string. #' #' @inheritParams str_detect #' @param encoding Name of encoding. See [stringi::stri_enc_list()] #' for a complete list. #' @export #' @examples #' # Example from encoding?stringi::stringi #' x <- rawToChar(as.raw(177)) #' x #' str_conv(x, "ISO-8859-2") # Polish "a with ogonek" #' str_conv(x, "ISO-8859-1") # Plus-minus str_conv <- function(string, encoding) { check_string(encoding) stri_conv(string, encoding, "UTF-8") } stringr/R/count.R0000644000176200001440000000205514520174727013431 0ustar liggesusers#' Count number of matches #' #' Counts the number of times `pattern` is found within each element #' of `string.` #' #' @inheritParams str_detect #' @return An integer vector the same length as `string`/`pattern`. #' @seealso [stringi::stri_count()] which this function wraps. #' #' [str_locate()]/[str_locate_all()] to locate position #' of matches #' #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_count(fruit, "a") #' str_count(fruit, "p") #' str_count(fruit, "e") #' str_count(fruit, c("a", "b", "p", "p")) #' #' str_count(c("a.", "...", ".a.a"), ".") #' str_count(c("a.", "...", ".a.a"), fixed(".")) str_count <- function(string, pattern = "") { check_lengths(string, pattern) switch(type(pattern), empty = , bound = stri_count_boundaries(string, opts_brkiter = opts(pattern)), fixed = stri_count_fixed(string, pattern, opts_fixed = opts(pattern)), coll = stri_count_coll(string, pattern, opts_collator = opts(pattern)), regex = stri_count_regex(string, pattern, opts_regex = opts(pattern)) ) } stringr/R/wrap.R0000644000176200001440000000336114317040167013245 0ustar liggesusers#' Wrap words into nicely formatted paragraphs #' #' Wrap words into paragraphs, minimizing the "raggedness" of the lines #' (i.e. the variation in length line) using the Knuth-Plass algorithm. #' #' @inheritParams str_detect #' @param width Positive integer giving target line width (in number of #' characters). A width less than or equal to 1 will put each word on its #' own line. #' @param indent,exdent A non-negative integer giving the indent for the #' first line (`indent`) and all subsequent lines (`exdent`). #' @param whitespace_only A boolean. #' * If `TRUE` (the default) wrapping will only occur at whitespace. #' * If `FALSE`, can break on any non-word character (e.g. `/`, `-`). #' @return A character vector the same length as `string`. #' @seealso [stringi::stri_wrap()] for the underlying implementation. 
#' @export #' @examples #' thanks_path <- file.path(R.home("doc"), "THANKS") #' thanks <- str_c(readLines(thanks_path), collapse = "\n") #' thanks <- word(thanks, 1, 3, fixed("\n\n")) #' cat(str_wrap(thanks), "\n") #' cat(str_wrap(thanks, width = 40), "\n") #' cat(str_wrap(thanks, width = 60, indent = 2), "\n") #' cat(str_wrap(thanks, width = 60, exdent = 2), "\n") #' cat(str_wrap(thanks, width = 0, exdent = 2), "\n") str_wrap <- function(string, width = 80, indent = 0, exdent = 0, whitespace_only = TRUE) { check_number_decimal(width) if (width <= 0) { width <- 1 } check_number_whole(indent) check_number_whole(exdent) check_bool(whitespace_only) out <- stri_wrap(string, width = width, indent = indent, exdent = exdent, whitespace_only = whitespace_only, simplify = FALSE) vapply(out, str_c, collapse = "\n", character(1)) } stringr/R/locate.R0000644000176200001440000000575214520174727013557 0ustar liggesusers#' Find location of match #' #' @description #' `str_locate()` returns the `start` and `end` position of the first match; #' `str_locate_all()` returns the `start` and `end` position of each match. #' #' Because the `start` and `end` values are inclusive, zero-length matches #' (e.g. `$`, `^`, `\\b`) will have an `end` that is smaller than `start`. #' #' @inheritParams str_detect #' @returns #' * `str_locate()` returns an integer matrix with two columns and #' one row for each element of `string`. The first column, `start`, #' gives the position at the start of the match, and the second column, `end`, #' gives the position of the end. #' #'* `str_locate_all()` returns a list of integer matrices with the same #' length as `string`/`pattern`. The matrices have columns `start` and `end` #' as above, and one row for each match. #' @seealso #' [str_extract()] for a convenient way of extracting matches, #' [stringi::stri_locate()] for the underlying implementation. #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_locate(fruit, "$") #' str_locate(fruit, "a") #' str_locate(fruit, "e") #' str_locate(fruit, c("a", "b", "p", "p")) #' #' str_locate_all(fruit, "a") #' str_locate_all(fruit, "e") #' str_locate_all(fruit, c("a", "b", "p", "p")) #' #' # Find location of every character #' str_locate_all(fruit, "") str_locate <- function(string, pattern) { check_lengths(string, pattern) switch(type(pattern), empty = , bound = stri_locate_first_boundaries(string, opts_brkiter = opts(pattern)), fixed = stri_locate_first_fixed(string, pattern, opts_fixed = opts(pattern)), coll = stri_locate_first_coll(string, pattern, opts_collator = opts(pattern)), regex = stri_locate_first_regex(string, pattern, opts_regex = opts(pattern)) ) } #' @rdname str_locate #' @export str_locate_all <- function(string, pattern) { check_lengths(string, pattern) opts <- opts(pattern) switch(type(pattern), empty = , bound = stri_locate_all_boundaries(string, omit_no_match = TRUE, opts_brkiter = opts), fixed = stri_locate_all_fixed(string, pattern, omit_no_match = TRUE, opts_fixed = opts), regex = stri_locate_all_regex(string, pattern, omit_no_match = TRUE, opts_regex = opts), coll = stri_locate_all_coll(string, pattern, omit_no_match = TRUE, opts_collator = opts) ) } #' Switch location of matches to location of non-matches #' #' Invert a matrix of match locations to match the opposite of what was #' previously matched. 
#' #' @param loc matrix of match locations, as from [str_locate_all()] #' @return numeric match giving locations of non-matches #' @export #' @examples #' numbers <- "1 and 2 and 4 and 456" #' num_loc <- str_locate_all(numbers, "[0-9]+")[[1]] #' str_sub(numbers, num_loc[, "start"], num_loc[, "end"]) #' #' text_loc <- invert_match(num_loc) #' str_sub(numbers, text_loc[, "start"], text_loc[, "end"]) invert_match <- function(loc) { cbind( start = c(0L, loc[, "end"] + 1L), end = c(loc[, "start"] - 1L, -1L) ) } stringr/R/detect.R0000644000176200001440000001300014524677110013537 0ustar liggesusers#' Detect the presence/absence of a match #' #' `str_detect()` returns a logical vector with `TRUE` for each element of #' `string` that matches `pattern` and `FALSE` otherwise. It's equivalent to #' `grepl(pattern, string)`. #' #' @param string Input vector. Either a character vector, or something #' coercible to one. #' @param pattern Pattern to look for. #' #' The default interpretation is a regular expression, as described in #' `vignette("regular-expressions")`. Use [regex()] for finer control of the #' matching behaviour. #' #' Match a fixed string (i.e. by comparing only bytes), using #' [fixed()]. This is fast, but approximate. Generally, #' for matching human text, you'll want [coll()] which #' respects character matching rules for the specified locale. #' #' Match character, word, line and sentence boundaries with #' [boundary()]. An empty pattern, "", is equivalent to #' `boundary("character")`. #' #' @param negate If `TRUE`, inverts the resulting boolean vector. #' @return A logical vector the same length as `string`/`pattern`. #' @seealso [stringi::stri_detect()] which this function wraps, #' [str_subset()] for a convenient wrapper around #' `x[str_detect(x, pattern)]` #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_detect(fruit, "a") #' str_detect(fruit, "^a") #' str_detect(fruit, "a$") #' str_detect(fruit, "b") #' str_detect(fruit, "[aeiou]") #' #' # Also vectorised over pattern #' str_detect("aecfg", letters) #' #' # Returns TRUE if the pattern do NOT match #' str_detect(fruit, "^p", negate = TRUE) str_detect <- function(string, pattern, negate = FALSE) { check_lengths(string, pattern) check_bool(negate) switch(type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_detect_fixed(string, pattern, negate = negate, opts_fixed = opts(pattern)), coll = stri_detect_coll(string, pattern, negate = negate, opts_collator = opts(pattern)), regex = stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) ) } #' Detect the presence/absence of a match at the start/end #' #' `str_starts()` and `str_ends()` are special cases of [str_detect()] that #' only match at the beginning or end of a string, respectively. #' #' @inheritParams str_detect #' @param pattern Pattern with which the string starts or ends. #' #' The default interpretation is a regular expression, as described in #' [stringi::about_search_regex]. Control options with [regex()]. #' #' Match a fixed string (i.e. by comparing only bytes), using [fixed()]. This #' is fast, but approximate. Generally, for matching human text, you'll want #' [coll()] which respects character matching rules for the specified locale. #' #' @return A logical vector. 
#' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_starts(fruit, "p") #' str_starts(fruit, "p", negate = TRUE) #' str_ends(fruit, "e") #' str_ends(fruit, "e", negate = TRUE) str_starts <- function(string, pattern, negate = FALSE) { check_lengths(string, pattern) check_bool(negate) switch(type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_startswith_fixed(string, pattern, negate = negate, opts_fixed = opts(pattern)), coll = stri_startswith_coll(string, pattern, negate = negate, opts_collator = opts(pattern)), regex = { pattern2 <- paste0("^(", pattern, ")") stri_detect_regex(string, pattern2, negate = negate, opts_regex = opts(pattern)) } ) } #' @rdname str_starts #' @export str_ends <- function(string, pattern, negate = FALSE) { check_lengths(string, pattern) check_bool(negate) switch(type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_endswith_fixed(string, pattern, negate = negate, opts_fixed = opts(pattern)), coll = stri_endswith_coll(string, pattern, negate = negate, opts_collator = opts(pattern)), regex = { pattern2 <- paste0("(", pattern, ")$") stri_detect_regex(string, pattern2, negate = negate, opts_regex = opts(pattern)) } ) } #' Detect a pattern in the same way as `SQL`'s `LIKE` operator #' #' @description #' `str_like()` follows the conventions of the SQL `LIKE` operator: #' #' * Must match the entire string. #' * `_` matches a single character (like `.`). #' * `%` matches any number of characters (like `.*`). #' * `\%` and `\_` match literal `%` and `_`. #' * The match is case insensitive by default. #' #' @inheritParams str_detect #' @param pattern A character vector containing a SQL "like" pattern. #' See above for details. #' @param ignore_case Ignore case of matches? Defaults to `TRUE` to match #' the SQL `LIKE` operator. #' @return A logical vector the same length as `string`. #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_like(fruit, "app") #' str_like(fruit, "app%") #' str_like(fruit, "ba_ana") #' str_like(fruit, "%APPLE") str_like <- function(string, pattern, ignore_case = TRUE) { check_lengths(string, pattern) check_character(pattern) if (inherits(pattern, "stringr_pattern")) { cli::cli_abort("{.arg pattern} must be a plain string, not a stringr modifier.") } check_bool(ignore_case) pattern <- regex(like_to_regex(pattern), ignore_case = ignore_case) stri_detect_regex(string, pattern, opts_regex = opts(pattern)) } like_to_regex <- function(pattern) { converted <- stri_replace_all_regex(pattern, "(?` and unusual whitespace (i.e. all whitespace #' apart from `" "` and `"\n"`) are surrounded by `{}` and escaped. Where #' possible, matches and unusual whitespace are coloured blue and `NA`s red. #' #' @inheritParams str_detect #' @param match If `pattern` is supplied, which elements should be shown? #' #' * `TRUE`, the default, shows only elements that match the pattern. #' * `NA` shows all elements. #' * `FALSE` shows only elements that don't match the pattern. #' #' If `pattern` is not supplied, all elements are always shown. #' @param html Use HTML output? If `TRUE` will create an HTML widget; if `FALSE` #' will style using ANSI escapes. #' @param use_escapes If `TRUE`, all non-ASCII characters will be rendered #' with unicode escapes. This is useful to see exactly what underlying #' values are stored in the string. 
#' @export #' @examples #' # Show special characters #' str_view(c("\"\\", "\\\\\\", "fgh", NA, "NA")) #' #' # A non-breaking space looks like a regular space: #' nbsp <- "Hi\u00A0you" #' nbsp #' # But it doesn't behave like one: #' str_detect(nbsp, " ") #' # So str_view() brings it to your attention with a blue background #' str_view(nbsp) #' #' # You can also use escapes to see all non-ASCII characters #' str_view(nbsp, use_escapes = TRUE) #' #' # Supply a pattern to see where it matches #' str_view(c("abc", "def", "fghi"), "[aeiou]") #' str_view(c("abc", "def", "fghi"), "^") #' str_view(c("abc", "def", "fghi"), "..") #' #' # By default, only matching strings will be shown #' str_view(c("abc", "def", "fghi"), "e") #' # but you can show all: #' str_view(c("abc", "def", "fghi"), "e", match = NA) #' # or just those that don't match: #' str_view(c("abc", "def", "fghi"), "e", match = FALSE) str_view <- function(string, pattern = NULL, match = TRUE, html = FALSE, use_escapes = FALSE) { rec <- vctrs::vec_recycle_common(string = string, pattern = pattern) string <- rec$string pattern <- rec$pattern check_bool(match, allow_na = TRUE) check_bool(html) check_bool(use_escapes) filter <- str_view_filter(string, pattern, match) out <- string[filter] pattern <- pattern[filter] if (!is.null(pattern)) { out <- str_replace_all(out, pattern, str_view_highlighter(html)) } if (use_escapes) { out <- stri_escape_unicode(out) out <- str_replace_all(out, fixed("\\u001b"), "\u001b") } else { out <- str_view_special(out, html = html) } str_view_print(out, filter, html = html) } #' @rdname str_view #' @usage NULL #' @export str_view_all <- function(string, pattern = NULL, match = NA, html = FALSE, use_escapes = FALSE) { lifecycle::deprecate_warn("1.5.0", "str_view_all()", "str_view()") str_view( string = string, pattern = pattern, match = match, html = html, use_escapes = use_escapes ) } str_view_filter <- function(x, pattern, match) { if (is.null(pattern) || inherits(pattern, "stringr_boundary")) { rep(TRUE, length(x)) } else { if (identical(match, TRUE)) { str_detect(x, pattern) & !is.na(x) } else if (identical(match, FALSE)) { !str_detect(x, pattern) | is.na(x) } else { rep(TRUE, length(x)) } } } # Helpers ----------------------------------------------------------------- str_view_highlighter <- function(html = TRUE) { if (html) { function(x) paste0("", x, "") } else { function(x) { out <- cli::col_cyan("<", x, ">") # Ensure styling is starts and ends within each line out <- cli::ansi_strsplit(out, "\n", fixed = TRUE) out <- map_chr(out, str_flatten, "\n") out } } } str_view_special <- function(x, html = TRUE) { if (html) { replace <- function(x) paste0("", x, "") } else { replace <- function(x) cli::col_cyan("{", stri_escape_unicode(x), "}") } # Highlight any non-standard whitespace characters str_replace_all(x, "[\\p{Whitespace}-- \n]+", replace) } str_view_print <- function(x, filter, html = TRUE) { if (html) { str_view_widget(x) } else { structure(x, id = which(filter), class = "stringr_view") } } str_view_widget <- function(lines) { check_installed(c("htmltools", "htmlwidgets")) lines <- str_replace_na(lines) bullets <- str_c( "
<ul>\n",
    str_c("  <li>", lines, "</li>", collapse = "\n"),
    "\n</ul>
" ) html <- htmltools::HTML(bullets) size <- htmlwidgets::sizingPolicy( knitr.figure = FALSE, defaultHeight = pmin(10 * length(lines), 300), knitr.defaultHeight = "100%" ) htmlwidgets::createWidget( "str_view", list(html = html), sizingPolicy = size, package = "stringr" ) } #' @export print.stringr_view <- function(x, ..., n = getOption("stringr.view_n", 20)) { n_extra <- length(x) - n if (n_extra > 0) { x <- x[seq_len(n)] } if (length(x) == 0) { return(invisible(x)) } bar <- if (cli::is_utf8_output()) "\u2502" else "|" id <- format(paste0("[", attr(x, "id"), "] "), justify = "right") indent <- paste0(cli::col_grey(id, bar), " ") exdent <- paste0(strrep(" ", nchar(id[[1]])), cli::col_grey(bar), " ") x[is.na(x)] <- cli::col_red("NA") x <- paste0(indent, x) x <- str_replace_all(x, "\n", paste0("\n", exdent)) cat(x, sep = "\n") if (n_extra > 0) { cat("... and ", n_extra, " more\n", sep = "") } invisible(x) } #' @export `[.stringr_view` <- function(x, i, ...) { structure(NextMethod(), id = attr(x, "id")[i], class = "stringr_view") } stringr/R/utils.R0000644000176200001440000000135114520174727013437 0ustar liggesusers#' Pipe operator #' #' @name %>% #' @rdname pipe #' @keywords internal #' @export #' @importFrom magrittr %>% #' @usage lhs \%>\% rhs NULL check_lengths <- function(string, pattern, replacement = NULL, error_call = caller_env()) { # stringi already correctly recycles vectors of length 0 and 1 # we just want more stringent vctrs checks for other lengths vctrs::vec_size_common( string = string, pattern = pattern, replacement = replacement, .call = error_call ) } no_boundary <- function(call = caller_env()) { cli::cli_abort("{.arg pattern} can't be a boundary.", call = call) } no_empty <- function(call = caller_env()) { cli::cli_abort("{.arg pattern} can't be the empty string ({.code \"\"}).", call = call) } stringr/R/pad.R0000644000176200001440000000301314520174727013040 0ustar liggesusers#' Pad a string to minimum width #' #' Pad a string to a fixed width, so that #' `str_length(str_pad(x, n))` is always greater than or equal to `n`. #' #' @inheritParams str_detect #' @param width Minimum width of padded strings. #' @param side Side on which padding character is added (left, right or both). #' @param pad Single padding character (default is a space). #' @param use_width If `FALSE`, use the length of the string instead of the #' width; see [str_width()]/[str_length()] for the difference. #' @return A character vector the same length as `stringr`/`width`/`pad`. #' @seealso [str_trim()] to remove whitespace; #' [str_trunc()] to decrease the maximum width of a string. 
#' @export #' @examples #' rbind( #' str_pad("hadley", 30, "left"), #' str_pad("hadley", 30, "right"), #' str_pad("hadley", 30, "both") #' ) #' #' # All arguments are vectorised except side #' str_pad(c("a", "abc", "abcdef"), 10) #' str_pad("a", c(5, 10, 20)) #' str_pad("a", 10, pad = c("-", "_", " ")) #' #' # Longer strings are returned unchanged #' str_pad("hadley", 3) str_pad <- function(string, width, side = c("left", "right", "both"), pad = " ", use_width = TRUE) { vctrs::vec_size_common(string = string, width = width, pad = pad) side <- arg_match(side) check_bool(use_width) switch(side, left = stri_pad_left(string, width, pad = pad, use_length = !use_width), right = stri_pad_right(string, width, pad = pad, use_length = !use_width), both = stri_pad_both(string, width, pad = pad, use_length = !use_width) ) } stringr/R/subset.R0000644000176200001440000000407114524677110013604 0ustar liggesusers#' Find matching elements #' #' @description #' `str_subset()` returns all elements of `string` where there's at least #' one match to `pattern`. It's a wrapper around `x[str_detect(x, pattern)]`, #' and is equivalent to `grep(pattern, x, value = TRUE)`. #' #' Use [str_extract()] to find the location of the match _within_ each string. #' #' @inheritParams str_detect #' @return A character vector, usually smaller than `string`. #' @seealso [grep()] with argument `value = TRUE`, #' [stringi::stri_subset()] for the underlying implementation. #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_subset(fruit, "a") #' #' str_subset(fruit, "^a") #' str_subset(fruit, "a$") #' str_subset(fruit, "b") #' str_subset(fruit, "[aeiou]") #' #' # Elements that don't match #' str_subset(fruit, "^p", negate = TRUE) #' #' # Missings never match #' str_subset(c("a", NA, "b"), ".") str_subset <- function(string, pattern, negate = FALSE) { check_lengths(string, pattern) check_bool(negate) switch(type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_subset_fixed(string, pattern, omit_na = TRUE, negate = negate, opts_fixed = opts(pattern)), coll = stri_subset_coll(string, pattern, omit_na = TRUE, negate = negate, opts_collator = opts(pattern)), regex = stri_subset_regex(string, pattern, omit_na = TRUE, negate = negate, opts_regex = opts(pattern)) ) } #' Find matching indices #' #' `str_which()` returns the indices of `string` where there's at least #' one match to `pattern`. It's a wrapper around #' `which(str_detect(x, pattern))`, and is equivalent to `grep(pattern, x)`. #' #' @inheritParams str_detect #' @return An integer vector, usually smaller than `string`. #' @export #' @examples #' fruit <- c("apple", "banana", "pear", "pineapple") #' str_which(fruit, "a") #' #' # Elements that don't match #' str_which(fruit, "^p", negate = TRUE) #' #' # Missings never match #' str_which(c("a", NA, "b"), ".") str_which <- function(string, pattern, negate = FALSE) { which(str_detect(string, pattern, negate = negate)) } stringr/R/split.R0000644000176200001440000001042214524700556013430 0ustar liggesusers#' Split up a string into pieces #' #' @description #' This family of functions provides various ways of splitting a string up #' into pieces. These two functions return a character vector: #' #' * `str_split_1()` takes a single string and splits it into pieces, #' returning a single character vector. #' * `str_split_i()` splits each string in a character vector into pieces and #' extracts the `i`th value, returning a character vector. 
#' #' These two functions return a more complex object: #' #' * `str_split()` splits each string in a character vector into a varying #' number of pieces, returning a list of character vectors. #' * `str_split_fixed()` splits each string in a character vector into a #' fixed number of pieces, returning a character matrix. #' #' @inheritParams str_detect #' @inheritParams str_extract #' @param n Maximum number of pieces to return. Default (Inf) uses all #' possible split positions. #' #' For `str_split()`, this determines the maximum length of each element #' of the output. For `str_split_fixed()`, this determines the number of #' columns in the output; if an input is too short, the result will be padded #' with `""`. #' @return #' * `str_split_1()`: a character vector. #' * `str_split()`: a list the same length as `string`/`pattern` containing #' character vectors. #' * `str_split_fixed()`: a character matrix with `n` columns and the same #' number of rows as the length of `string`/`pattern`. #' * `str_split_i()`: a character vector the same length as `string`/`pattern`. #' @seealso [stri_split()] for the underlying implementation. #' @export #' @examples #' fruits <- c( #' "apples and oranges and pears and bananas", #' "pineapples and mangos and guavas" #' ) #' #' str_split(fruits, " and ") #' str_split(fruits, " and ", simplify = TRUE) #' #' # If you want to split a single string, use `str_split_1` #' str_split_1(fruits[[1]], " and ") #' #' # Specify n to restrict the number of possible matches #' str_split(fruits, " and ", n = 3) #' str_split(fruits, " and ", n = 2) #' # If n greater than number of pieces, no padding occurs #' str_split(fruits, " and ", n = 5) #' #' # Use fixed to return a character matrix #' str_split_fixed(fruits, " and ", 3) #' str_split_fixed(fruits, " and ", 4) #' #' # str_split_i extracts only a single piece from a string #' str_split_i(fruits, " and ", 1) #' str_split_i(fruits, " and ", 4) #' # use a negative number to select from the end #' str_split_i(fruits, " and ", -1) str_split <- function(string, pattern, n = Inf, simplify = FALSE) { check_lengths(string, pattern) check_positive_integer(n) check_bool(simplify, allow_na = TRUE) if (identical(n, Inf)) { n <- -1L } switch(type(pattern), empty = stri_split_boundaries(string, n = n, simplify = simplify, opts_brkiter = opts(pattern)), bound = stri_split_boundaries(string, n = n, simplify = simplify, opts_brkiter = opts(pattern)), fixed = stri_split_fixed(string, pattern, n = n, simplify = simplify, opts_fixed = opts(pattern)), regex = stri_split_regex(string, pattern, n = n, simplify = simplify, opts_regex = opts(pattern)), coll = stri_split_coll(string, pattern, n = n, simplify = simplify, opts_collator = opts(pattern)) ) } #' @export #' @rdname str_split str_split_1 <- function(string, pattern) { check_string(string) str_split(string, pattern)[[1]] } #' @export #' @rdname str_split str_split_fixed <- function(string, pattern, n) { check_lengths(string, pattern) check_positive_integer(n) str_split(string, pattern, n = n, simplify = TRUE) } #' @export #' @rdname str_split #' @param i Element to return. Use a negative value to count from the #' right hand side. 
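#'   For example, `str_split_i("a-b-c", "-", 2)` returns `"b"`, and
#'   `str_split_i("a-b-c", "-", -1)` returns `"c"`.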
str_split_i <- function(string, pattern, i) { check_number_whole(i) if (i > 0) { out <- str_split(string, pattern, simplify = NA, n = i + 1) out[, i] } else if (i < 0) { i <- abs(i) pieces <- str_split(string, pattern) last <- function(x) { n <- length(x) if (i > n) { NA_character_ } else{ x[[n + 1 - i]] } } map_chr(pieces, last) } else { cli::cli_abort("{.arg i} must not be 0.") } } check_positive_integer <- function(x, arg = caller_arg(x), call = caller_env()) { if (!identical(x, Inf)) { check_number_whole(x, min = 1, arg = arg, call = call) } } stringr/R/trunc.R0000644000176200001440000000271314524677110013433 0ustar liggesusers#' Truncate a string to maximum width #' #' Truncate a string to a fixed of characters, so that #' `str_length(str_trunc(x, n))` is always less than or equal to `n`. #' #' @inheritParams str_detect #' @param width Maximum width of string. #' @param side,ellipsis Location and content of ellipsis that indicates #' content has been removed. #' @return A character vector the same length as `string`. #' @seealso [str_pad()] to increase the minimum width of a string. #' @export #' @examples #' x <- "This string is moderately long" #' rbind( #' str_trunc(x, 20, "right"), #' str_trunc(x, 20, "left"), #' str_trunc(x, 20, "center") #' ) str_trunc <- function(string, width, side = c("right", "left", "center"), ellipsis = "...") { check_number_whole(width) side <- arg_match(side) check_string(ellipsis) len <- str_length(string) too_long <- !is.na(string) & len > width width... <- width - str_length(ellipsis) if (width... < 0) { cli::cli_abort( "`width` ({width}) is shorter than `ellipsis` ({str_length(ellipsis)})." ) } string[too_long] <- switch(side, right = str_c(str_sub(string[too_long], 1, width...), ellipsis), left = str_c(ellipsis, str_sub(string[too_long], len[too_long] - width... + 1, -1)), center = str_c( str_sub(string[too_long], 1, ceiling(width... / 2)), ellipsis, str_sub(string[too_long], len[too_long] - floor(width... / 2) + 1, -1) ) ) string } stringr/R/case.R0000644000176200001440000000266714520174727013225 0ustar liggesusers#' Convert string to upper case, lower case, title case, or sentence case #' #' * `str_to_upper()` converts to upper case. #' * `str_to_lower()` converts to lower case. #' * `str_to_title()` converts to title case, where only the first letter of #' each word is capitalized. #' * `str_to_sentence()` convert to sentence case, where only the first letter #' of sentence is capitalized. #' #' @inheritParams str_detect #' @inheritParams coll #' @return A character vector the same length as `string`. #' @examples #' dog <- "The quick brown dog" #' str_to_upper(dog) #' str_to_lower(dog) #' str_to_title(dog) #' str_to_sentence("the quick brown dog") #' #' # Locale matters! 
#' str_to_upper("i") # English #' str_to_upper("i", "tr") # Turkish #' @name case NULL #' @export #' @rdname case str_to_upper <- function(string, locale = "en") { check_string(locale) stri_trans_toupper(string, locale = locale) } #' @export #' @rdname case str_to_lower <- function(string, locale = "en") { check_string(locale) stri_trans_tolower(string, locale = locale) } #' @export #' @rdname case str_to_title <- function(string, locale = "en") { check_string(locale) stri_trans_totitle(string, opts_brkiter = stri_opts_brkiter(locale = locale)) } #' @export #' @rdname case str_to_sentence <- function(string, locale = "en") { check_string(locale) stri_trans_totitle( string, opts_brkiter = stri_opts_brkiter(type = "sentence", locale = locale) ) } stringr/R/glue.R0000644000176200001440000000262114520174727013234 0ustar liggesusers#' Interpolation with glue #' #' @description #' These functions are wrappers around [glue::glue()] and [glue::glue_data()], #' which provide a powerful and elegant syntax for interpolating strings #' with `{}`. #' #' These wrappers provide a small set of the full options. Use `glue()` and #' `glue_data()` directly from glue for more control. #' #' @inheritParams glue::glue #' @return A character vector with same length as the longest input. #' @export #' @examples #' name <- "Fred" #' age <- 50 #' anniversary <- as.Date("1991-10-12") #' str_glue( #' "My name is {name}, ", #' "my age next year is {age + 1}, ", #' "and my anniversary is {format(anniversary, '%A, %B %d, %Y')}." #' ) #' #' # single braces can be inserted by doubling them #' str_glue("My name is {name}, not {{name}}.") #' #' # You can also used named arguments #' str_glue( #' "My name is {name}, ", #' "and my age next year is {age + 1}.", #' name = "Joe", #' age = 40 #' ) #' #' # `str_glue_data()` is useful in data pipelines #' mtcars %>% str_glue_data("{rownames(.)} has {hp} hp") str_glue <- function(..., .sep = "", .envir = parent.frame()) { glue::glue(..., .sep = .sep, .envir = .envir) } #' @export #' @rdname str_glue str_glue_data <- function(.x, ..., .sep = "", .envir = parent.frame(), .na = "NA") { glue::glue_data( .x, ..., .sep = .sep, .envir = .envir, .na = .na ) } stringr/R/interp.R0000644000176200001440000001713314316640452013601 0ustar liggesusers#' String interpolation #' #' @description #' `r lifecycle::badge("superseded")` #' #' `str_interp()` is superseded in favour of [str_glue()]. #' #' String interpolation is a useful way of specifying a character string which #' depends on values in a certain environment. It allows for string creation #' which is easier to read and write when compared to using e.g. #' [paste()] or [sprintf()]. The (template) string can #' include expression placeholders of the form `${expression}` or #' `$[format]{expression}`, where expressions are valid R expressions that #' can be evaluated in the given environment, and `format` is a format #' specification valid for use with [sprintf()]. #' #' @param string A template character string. This function is not vectorised: #' a character vector will be collapsed into a single string. #' @param env The environment in which to evaluate the expressions. #' @seealso [str_glue()] and [str_glue_data()] for alternative approaches to #' the same problem. #' @keywords internal #' @return An interpolated character string. 
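#'
#' @details
#' Most templates translate directly to [str_glue()]: for example,
#' `str_interp("Hello ${name}")` can be written as `str_glue("Hello {name}")`.
#' The `$[format]{...}` placeholders have no direct glue equivalent; calling
#' [sprintf()] or [format()] inside the braces usually serves the same
#' purpose.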
#' @author Stefan Milton Bache #' @export #' @examples #' #' # Using values from the environment, and some formats #' user_name <- "smbache" #' amount <- 6.656 #' account <- 1337 #' str_interp("User ${user_name} (account $[08d]{account}) has $$[.2f]{amount}.") #' #' # Nested brace pairs work inside expressions too, and any braces can be #' # placed outside the expressions. #' str_interp("Works with } nested { braces too: $[.2f]{{{2 + 2}*{amount}}}") #' #' # Values can also come from a list #' str_interp( #' "One value, ${value1}, and then another, ${value2*2}.", #' list(value1 = 10, value2 = 20) #' ) #' #' # Or a data frame #' str_interp( #' "Values are $[.2f]{max(Sepal.Width)} and $[.2f]{min(Sepal.Width)}.", #' iris #' ) #' #' # Use a vector when the string is long: #' max_char <- 80 #' str_interp(c( #' "This particular line is so long that it is hard to write ", #' "without breaking the ${max_char}-char barrier!" #' )) str_interp <- function(string, env = parent.frame()) { check_character(string) string <- str_c(string, collapse = "") # Find expression placeholders matches <- interp_placeholders(string) # Determine if any placeholders were found. if (matches$indices[1] <= 0) { string } else { # Evaluate them to get the replacement strings. replacements <- eval_interp_matches(matches$matches, env) # Replace the expressions by their values and return. `regmatches<-`(string, list(matches$indices), FALSE, list(replacements)) } } #' Match String Interpolation Placeholders #' #' Given a character string a set of expression placeholders are matched. They #' are of the form \code{${...}} or optionally \code{$[f]{...}} where `f` #' is a valid format for [sprintf()]. #' #' @param string character: The string to be interpolated. #' #' @return list containing `indices` (regex match data) and `matches`, #' the string representations of matched expressions. #' #' @noRd #' @author Stefan Milton Bache interp_placeholders <- function(string) { # Find starting position of ${} or $[]{} placeholders. starts <- gregexpr("\\$(\\[.*?\\])?\\{", string)[[1]] # Return immediately if no matches are found. if (starts[1] <= 0) return(list(indices = starts)) # Break up the string in parts parts <- substr(rep(string, length(starts)), start = starts, stop = c(starts[-1L] - 1L, nchar(string))) # If there are nested placeholders, each part will not contain a full # placeholder in which case we report invalid string interpolation template. if (any(!grepl("\\$(\\[.*?\\])?\\{.+\\}", parts))) stop("Invalid template string for interpolation.", call. = FALSE) # For each part, find the opening and closing braces. opens <- lapply(strsplit(parts, ""), function(v) which(v == "{")) closes <- lapply(strsplit(parts, ""), function(v) which(v == "}")) # Identify the positions within the parts of the matching closing braces. # These are the lengths of the placeholder matches. lengths <- mapply(match_brace, opens, closes) # Update the `starts` match data with the attr(starts, "match.length") <- lengths # Return both the indices (regex match data) and the actual placeholder # matches (as strings.) list(indices = starts, matches = mapply(substr, starts, starts + lengths - 1, x = string)) } #' Evaluate String Interpolation Matches #' #' The expression part of string interpolation matches are evaluated in a #' specified environment and formatted for replacement in the original string. #' Used internally by [str_interp()]. #' #' @param matches Match data #' #' @param env The environment in which to evaluate the expressions. 
#' #' @return A character vector of replacement strings. #' #' @noRd #' @author Stefan Milton Bache eval_interp_matches <- function(matches, env) { # Extract expressions from the matches expressions <- extract_expressions(matches) # Evaluate them in the given environment values <- lapply(expressions, eval, envir = env, enclos = if (is.environment(env)) env else environment(env)) # Find the formats to be used formats <- extract_formats(matches) # Format the values and return. mapply(sprintf, formats, values, SIMPLIFY = FALSE) } #' Extract Expression Objects from String Interpolation Matches #' #' An interpolation match object will contain both its wrapping \code{${ }} part #' and possibly a format. This extracts the expression parts and parses them to #' prepare them for evaluation. #' #' @param matches Match data #' #' @return list of R expressions #' #' @noRd #' @author Stefan Milton Bache extract_expressions <- function(matches) { # Parse function for text argument as first argument. parse_text <- function(text) { tryCatch( parse(text = text), error = function(e) stop(conditionMessage(e), call. = FALSE) ) } # string representation of the expressions (without the possible formats). strings <- gsub("\\$(\\[.+?\\])?\\{", "", matches) # Remove the trailing closing brace and parse. lapply(substr(strings, 1L, nchar(strings) - 1), parse_text) } #' Extract String Interpolation Formats from Matched Placeholders #' #' An expression placeholder for string interpolation may optionally contain a #' format valid for [sprintf()]. This function will extract such or #' default to "s" the format for strings. #' #' @param matches Match data #' #' @return A character vector of format specifiers. #' #' @noRd #' @author Stefan Milton Bache extract_formats <- function(matches) { # Extract the optional format parts. formats <- gsub("\\$(\\[(.+?)\\])?.*", "\\2", matches) # Use string options "s" as default when not specified. paste0("%", ifelse(formats == "", "s", formats)) } #' Utility Function for Matching a Closing Brace #' #' Given positions of opening and closing braces `match_brace` identifies #' the closing brace matching the first opening brace. #' #' @param opening integer: Vector with positions of opening braces. #' #' @param closing integer: Vector with positions of closing braces. #' #' @return Integer with the posision of the matching brace. #' #' @noRd #' @author Stefan Milton Bache match_brace <- function(opening, closing) { # maximum index for the matching closing brace max_close <- max(closing) # "path" for mapping opening and closing breaces path <- numeric(max_close) # Set openings to 1, and closings to -1 path[opening[opening < max_close]] <- 1 path[closing] <- -1 # Cumulate the path ... cumpath <- cumsum(path) # ... and the first 0 after the first opening identifies the match. min(which(1:max_close > min(which(cumpath == 1)) & cumpath == 0)) } stringr/R/modifiers.R0000644000176200001440000001471114520174727014264 0ustar liggesusers#' Control matching behaviour with modifier functions #' #' @description #' Modifier functions control the meaning of the `pattern` argument to #' stringr functions: #' #' * `boundary()`: Match boundaries between things. #' * `coll()`: Compare strings using standard Unicode collation rules. #' * `fixed()`: Compare literal bytes. #' * `regex()` (the default): Uses ICU regular expressions. #' #' @param pattern Pattern to modify behaviour. #' @param ignore_case Should case differences be ignored in the match? 
#' For `fixed()`, this uses a simple algorithm which assumes a #' one-to-one mapping between upper and lower case letters. #' @return A stringr modifier object, i.e. a character vector with #' parent S3 class `stringr_pattern`. #' @name modifiers #' @examples #' pattern <- "a.b" #' strings <- c("abb", "a.b") #' str_detect(strings, pattern) #' str_detect(strings, fixed(pattern)) #' str_detect(strings, coll(pattern)) #' #' # coll() is useful for locale-aware case-insensitive matching #' i <- c("I", "\u0130", "i") #' i #' str_detect(i, fixed("i", TRUE)) #' str_detect(i, coll("i", TRUE)) #' str_detect(i, coll("i", TRUE, locale = "tr")) #' #' # Word boundaries #' words <- c("These are some words.") #' str_count(words, boundary("word")) #' str_split(words, " ")[[1]] #' str_split(words, boundary("word"))[[1]] #' #' # Regular expression variations #' str_extract_all("The Cat in the Hat", "[a-z]+") #' str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE)) #' #' str_extract_all("a\nb\nc", "^.") #' str_extract_all("a\nb\nc", regex("^.", multiline = TRUE)) #' #' str_extract_all("a\nb\nc", "a.") #' str_extract_all("a\nb\nc", regex("a.", dotall = TRUE)) NULL #' @export #' @rdname modifiers fixed <- function(pattern, ignore_case = FALSE) { pattern <- as_bare_character(pattern) check_bool(ignore_case) options <- stri_opts_fixed(case_insensitive = ignore_case) structure( pattern, options = options, class = c("stringr_fixed", "stringr_pattern", "character") ) } #' @export #' @rdname modifiers #' @param locale Locale to use for comparisons. See #' [stringi::stri_locale_list()] for all possible options. #' Defaults to "en" (English) to ensure that default behaviour is #' consistent across platforms. #' @param ... Other less frequently used arguments passed on to #' [stringi::stri_opts_collator()], #' [stringi::stri_opts_regex()], or #' [stringi::stri_opts_brkiter()] coll <- function(pattern, ignore_case = FALSE, locale = "en", ...) { pattern <- as_bare_character(pattern) check_bool(ignore_case) check_string(locale) options <- str_opts_collator( ignore_case = ignore_case, locale = locale, ... ) structure( pattern, options = options, class = c("stringr_coll", "stringr_pattern", "character") ) } str_opts_collator <- function(locale = "en", ignore_case = FALSE, strength = NULL, ...) { strength <- strength %||% if (ignore_case) 2L else 3L stri_opts_collator( strength = strength, locale = locale, ... ) } # used for testing turkish_I <- function() { coll("I", ignore_case = TRUE, locale = "tr") } #' @export #' @rdname modifiers #' @param multiline If `TRUE`, `$` and `^` match #' the beginning and end of each line. If `FALSE`, the #' default, only match the start and end of the input. #' @param comments If `TRUE`, white space and comments beginning with #' `#` are ignored. Escape literal spaces with `\\ `. #' @param dotall If `TRUE`, `.` will also match line terminators. regex <- function(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...) { pattern <- as_bare_character(pattern) check_bool(ignore_case) check_bool(multiline) check_bool(comments) check_bool(dotall) options <- stri_opts_regex( case_insensitive = ignore_case, multiline = multiline, comments = comments, dotall = dotall, ... ) structure( pattern, options = options, class = c("stringr_regex", "stringr_pattern", "character") ) } #' @param type Boundary type to detect. 
#' \describe{ #' \item{`character`}{Every character is a boundary.} #' \item{`line_break`}{Boundaries are places where it is acceptable to have #' a line break in the current locale.} #' \item{`sentence`}{The beginnings and ends of sentences are boundaries, #' using intelligent rules to avoid counting abbreviations #' ([details](https://www.unicode.org/reports/tr29/#Sentence_Boundaries)).} #' \item{`word`}{The beginnings and ends of words are boundaries.} #' } #' @param skip_word_none Ignore "words" that don't contain any characters #' or numbers - i.e. punctuation. Default `NA` will skip such "words" #' only when splitting on `word` boundaries. #' @export #' @rdname modifiers boundary <- function(type = c("character", "line_break", "sentence", "word"), skip_word_none = NA, ...) { type <- arg_match(type) check_bool(skip_word_none, allow_na = TRUE) if (identical(skip_word_none, NA)) { skip_word_none <- type == "word" } options <- stri_opts_brkiter( type = type, skip_word_none = skip_word_none, ... ) structure( NA_character_, options = options, class = c("stringr_boundary", "stringr_pattern", "character") ) } opts <- function(x) { if (identical(x, "")) { stri_opts_brkiter(type = "character") } else { attr(x, "options") } } type <- function(x, error_call = caller_env()) { UseMethod("type") } #' @export type.stringr_boundary <- function(x, error_call = caller_env()) { "bound" } #' @export type.stringr_regex <- function(x, error_call = caller_env()) { "regex" } #' @export type.stringr_coll <- function(x, error_call = caller_env()) { "coll" } #' @export type.stringr_fixed <- function(x, error_call = caller_env()) { "fixed" } #' @export type.character <- function(x, error_call = caller_env()) { if (identical(x, "")) "empty" else "regex" } #' @export type.default <- function(x, error_call = caller_env()) { if (inherits(x, "regex")) { # Fallback for rex return("regex") } cli::cli_abort( "`pattern` must be a string, not {.obj_type_friendly {x}}.", call = error_call ) } #' @export `[.stringr_pattern` <- function(x, i) { structure( NextMethod(), options = attr(x, "options"), class = class(x) ) } as_bare_character <- function(x, call = caller_env()) { if (is.character(x) && !is.object(x)) { # All OK! return(x) } warn("Coercing `pattern` to a plain character vector.", call = call) as.character(x) } stringr/R/word.R0000644000176200001440000000337614520174727013263 0ustar liggesusers#' Extract words from a sentence #' #' @inheritParams str_detect #' @param start,end Pair of integer vectors giving range of words (inclusive) #' to extract. If negative, counts backwards from the last word. #' #' The default value select the first word. #' @param sep Separator between words. Defaults to single space. #' @return A character vector with the same length as `string`/`start`/`end`. 
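#'
#' @details
#' `word()` locates every occurrence of `sep` with [str_locate_all()] and then
#' extracts the text between matches (via [invert_match()]), so `sep` can be
#' any stringr pattern, not just a fixed string.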
#' @export #' @examples #' sentences <- c("Jane saw a cat", "Jane sat down") #' word(sentences, 1) #' word(sentences, 2) #' word(sentences, -1) #' word(sentences, 2, -1) #' #' # Also vectorised over start and end #' word(sentences[1], 1:3, -1) #' word(sentences[1], 1, 1:4) #' #' # Can define words by other separators #' str <- 'abc.def..123.4568.999' #' word(str, 1, sep = fixed('..')) #' word(str, 2, sep = fixed('..')) word <- function(string, start = 1L, end = start, sep = fixed(" ")) { args <- vctrs::vec_recycle_common(string = string, start = start, end = end) string <- args$string start <- args$start end <- args$end breaks <- str_locate_all(string, sep) words <- lapply(breaks, invert_match) # Convert negative values into actual positions len <- vapply(words, nrow, integer(1)) neg_start <- !is.na(start) & start < 0L start[neg_start] <- start[neg_start] + len[neg_start] + 1L neg_end <- !is.na(end) & end < 0L end[neg_end] <- end[neg_end] + len[neg_end] + 1L # Replace indexes past end with NA start[start > len] <- NA end[end > len] <- NA # To return all words when trying to extract more words than available start[start < 1L] <- 1 # Extract locations starts <- mapply(function(word, loc) word[loc, "start"], words, start) ends <- mapply(function(word, loc) word[loc, "end"], words, end) str_sub(string, starts, ends) } stringr/R/escape.R0000644000176200001440000000115414520174727013540 0ustar liggesusers#' Escape regular expression metacharacters #' #' This function escapes metacharacter, the characters that have special #' meaning to the regular expression engine. In most cases you are better #' off using [fixed()] since it is faster, but `str_escape()` is useful #' if you are composing user provided strings into a pattern. #' #' @inheritParams str_detect #' @return A character vector the same length as `string`. #' @export #' @examples #' str_detect(c("a", "."), ".") #' str_detect(c("a", "."), str_escape(".")) str_escape <- function(string) { str_replace_all(string, "([.^$\\\\|*+?{}\\[\\]()])", "\\\\\\1") } stringr/R/data.R0000644000176200001440000000125614316043620013202 0ustar liggesusers#' Sample character vectors for practicing string manipulations #' #' `fruit` and `words` come from the `rcorpora` package #' written by Gabor Csardi; the data was collected by Darius Kazemi #' and made available at \url{https://github.com/dariusk/corpora}. #' `sentences` is a collection of "Harvard sentences" used for #' standardised testing of voice. #' #' @format Character vectors. #' @name stringr-data #' @examples #' length(sentences) #' sentences[1:5] #' #' length(fruit) #' fruit[1:5] #' #' length(words) #' words[1:5] NULL #' @rdname stringr-data #' @format NULL "sentences" #' @rdname stringr-data #' @format NULL "fruit" #' @rdname stringr-data #' @format NULL "words" stringr/R/compat-obj-type.R0000644000176200001440000001764214520174727015323 0ustar liggesusers# nocov start --- r-lib/rlang compat-obj-type # # Changelog # ========= # # 2022-10-04: # - `obj_type_friendly(value = TRUE)` now shows numeric scalars # literally. # - `stop_friendly_type()` now takes `show_value`, passed to # `obj_type_friendly()` as the `value` argument. # # 2022-10-03: # - Added `allow_na` and `allow_null` arguments. # - `NULL` is now backticked. # - Better friendly type for infinities and `NaN`. # # 2022-09-16: # - Unprefixed usage of rlang functions with `rlang::` to # avoid onLoad issues when called from rlang (#1482). # # 2022-08-11: # - Prefixed usage of rlang functions with `rlang::`. 
# # 2022-06-22: # - `friendly_type_of()` is now `obj_type_friendly()`. # - Added `obj_type_oo()`. # # 2021-12-20: # - Added support for scalar values and empty vectors. # - Added `stop_input_type()` # # 2021-06-30: # - Added support for missing arguments. # # 2021-04-19: # - Added support for matrices and arrays (#141). # - Added documentation. # - Added changelog. #' Return English-friendly type #' @param x Any R object. #' @param value Whether to describe the value of `x`. Special values #' like `NA` or `""` are always described. #' @param length Whether to mention the length of vectors and lists. #' @return A string describing the type. Starts with an indefinite #' article, e.g. "an integer vector". #' @noRd obj_type_friendly <- function(x, value = TRUE) { if (is_missing(x)) { return("absent") } if (is.object(x)) { if (inherits(x, "quosure")) { type <- "quosure" } else { type <- paste(class(x), collapse = "/") } return(sprintf("a <%s> object", type)) } if (!is_vector(x)) { return(.rlang_as_friendly_type(typeof(x))) } n_dim <- length(dim(x)) if (!n_dim) { if (!is_list(x) && length(x) == 1) { if (is_na(x)) { return(switch( typeof(x), logical = "`NA`", integer = "an integer `NA`", double = if (is.nan(x)) { "`NaN`" } else { "a numeric `NA`" }, complex = "a complex `NA`", character = "a character `NA`", .rlang_stop_unexpected_typeof(x) )) } show_infinites <- function(x) { if (x > 0) { "`Inf`" } else { "`-Inf`" } } str_encode <- function(x, width = 30, ...) { if (nchar(x) > width) { x <- substr(x, 1, width - 3) x <- paste0(x, "...") } encodeString(x, ...) } if (value) { if (is.numeric(x) && is.infinite(x)) { return(show_infinites(x)) } if (is.numeric(x) || is.complex(x)) { number <- as.character(round(x, 2)) what <- if (is.complex(x)) "the complex number" else "the number" return(paste(what, number)) } return(switch( typeof(x), logical = if (x) "`TRUE`" else "`FALSE`", character = { what <- if (nzchar(x)) "the string" else "the empty string" paste(what, str_encode(x, quote = "\"")) }, raw = paste("the raw value", as.character(x)), .rlang_stop_unexpected_typeof(x) )) } return(switch( typeof(x), logical = "a logical value", integer = "an integer", double = if (is.infinite(x)) show_infinites(x) else "a number", complex = "a complex number", character = if (nzchar(x)) "a string" else "\"\"", raw = "a raw value", .rlang_stop_unexpected_typeof(x) )) } if (length(x) == 0) { return(switch( typeof(x), logical = "an empty logical vector", integer = "an empty integer vector", double = "an empty numeric vector", complex = "an empty complex vector", character = "an empty character vector", raw = "an empty raw vector", list = "an empty list", .rlang_stop_unexpected_typeof(x) )) } } vec_type_friendly(x) } vec_type_friendly <- function(x, length = FALSE) { if (!is_vector(x)) { abort("`x` must be a vector.") } type <- typeof(x) n_dim <- length(dim(x)) add_length <- function(type) { if (length && !n_dim) { paste0(type, sprintf(" of length %s", length(x))) } else { type } } if (type == "list") { if (n_dim < 2) { return(add_length("a list")) } else if (is.data.frame(x)) { return("a data frame") } else if (n_dim == 2) { return("a list matrix") } else { return("a list array") } } type <- switch( type, logical = "a logical %s", integer = "an integer %s", numeric = , double = "a double %s", complex = "a complex %s", character = "a character %s", raw = "a raw %s", type = paste0("a ", type, " %s") ) if (n_dim < 2) { kind <- "vector" } else if (n_dim == 2) { kind <- "matrix" } else { kind <- "array" } out <- 
sprintf(type, kind) if (n_dim >= 2) { out } else { add_length(out) } } .rlang_as_friendly_type <- function(type) { switch( type, list = "a list", NULL = "`NULL`", environment = "an environment", externalptr = "a pointer", weakref = "a weak reference", S4 = "an S4 object", name = , symbol = "a symbol", language = "a call", pairlist = "a pairlist node", expression = "an expression vector", char = "an internal string", promise = "an internal promise", ... = "an internal dots object", any = "an internal `any` object", bytecode = "an internal bytecode object", primitive = , builtin = , special = "a primitive function", closure = "a function", type ) } .rlang_stop_unexpected_typeof <- function(x, call = caller_env()) { abort( sprintf("Unexpected type <%s>.", typeof(x)), call = call ) } #' Return OO type #' @param x Any R object. #' @return One of `"bare"` (for non-OO objects), `"S3"`, `"S4"`, #' `"R6"`, or `"R7"`. #' @noRd obj_type_oo <- function(x) { if (!is.object(x)) { return("bare") } class <- inherits(x, c("R6", "R7_object"), which = TRUE) if (class[[1]]) { "R6" } else if (class[[2]]) { "R7" } else if (isS4(x)) { "S4" } else { "S3" } } #' @param x The object type which does not conform to `what`. Its #' `obj_type_friendly()` is taken and mentioned in the error message. #' @param what The friendly expected type as a string. Can be a #' character vector of expected types, in which case the error #' message mentions all of them in an "or" enumeration. #' @param show_value Passed to `value` argument of `obj_type_friendly()`. #' @param ... Arguments passed to [abort()]. #' @inheritParams args_error_context #' @noRd stop_input_type <- function(x, what, ..., allow_na = FALSE, allow_null = FALSE, show_value = TRUE, arg = caller_arg(x), call = caller_env()) { # From compat-cli.R cli <- env_get_list( nms = c("format_arg", "format_code"), last = topenv(), default = function(x) sprintf("`%s`", x), inherit = TRUE ) if (allow_na) { what <- c(what, cli$format_code("NA")) } if (allow_null) { what <- c(what, cli$format_code("NULL")) } if (length(what)) { what <- oxford_comma(what) } message <- sprintf( "%s must be %s, not %s.", cli$format_arg(arg), what, obj_type_friendly(x, value = show_value) ) abort(message, ..., call = call, arg = arg) } oxford_comma <- function(chr, sep = ", ", final = "or") { n <- length(chr) if (n < 2) { return(chr) } head <- chr[seq_len(n - 1)] last <- chr[n] head <- paste(head, collapse = sep) # Write a or b. But a, b, or c. if (n > 2) { paste0(head, sep, final, " ", last) } else { paste0(head, " ", final, " ", last) } } # nocov end stringr/R/length.R0000644000176200001440000000244014520174727013560 0ustar liggesusers#' Compute the length/width #' #' @description #' `str_length()` returns the number of codepoints in a string. These are #' the individual elements (which are often, but not always letters) that #' can be extracted with [str_sub()]. #' #' `str_width()` returns how much space the string will occupy when printed #' in a fixed width font (i.e. when printed in the console). #' #' @inheritParams str_detect #' @return A numeric vector the same length as `string`. #' @seealso [stringi::stri_length()] which this function wraps. 
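#'
#' @details
#' Both functions return `NA` for missing inputs. `str_width()` differs from
#' `str_length()` only when a string contains double-width characters (which
#' count as 2) or combining marks (which count as 0); see the examples below.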
#' @export #' @examples #' str_length(letters) #' str_length(NA) #' str_length(factor("abc")) #' str_length(c("i", "like", "programming", NA)) #' #' # Some characters, like emoji and Chinese characters (hanzi), are square #' # which means they take up the width of two Latin characters #' x <- c("\u6c49\u5b57", "\U0001f60a") #' str_view(x) #' str_width(x) #' str_length(x) #' #' # There are two ways of representing a u with an umlaut #' u <- c("\u00fc", "u\u0308") #' # They have the same width #' str_width(u) #' # But a different length #' str_length(u) #' # Because the second element is made up of a u + an accent #' str_sub(u, 1, 1) str_length <- function(string) { stri_length(string) } #' @export #' @rdname str_length str_width <- function(string) { stri_width(string) } stringr/R/flatten.R0000644000176200001440000000425314520174727013740 0ustar liggesusers#' Flatten a string # #' @description #' `str_flatten()` reduces a character vector to a single string. This is a #' summary function because regardless of the length of the input `x`, it #' always returns a single string. #' #' `str_flatten_comma()` is a variation designed specifically for flattening #' with commas. It automatically recognises if `last` uses the Oxford comma #' and handles the special case of 2 elements. #' #' @inheritParams str_detect #' @param collapse String to insert between each piece. Defaults to `""`. #' @param last Optional string to use in place of the final separator. #' @param na.rm Remove missing values? If `FALSE` (the default), the result #' will be `NA` if any element of `string` is `NA`. #' @return A string, i.e. a character vector of length 1. #' @export #' @examples #' str_flatten(letters) #' str_flatten(letters, "-") #' #' str_flatten(letters[1:3], ", ") #' #' # Use last to customise the last component #' str_flatten(letters[1:3], ", ", " and ") #' #' # this almost works if you want an Oxford (aka serial) comma #' str_flatten(letters[1:3], ", ", ", and ") #' #' # but it will always add a comma, even when not necessary #' str_flatten(letters[1:2], ", ", ", and ") #' #' # str_flatten_comma knows how to handle the Oxford comma #' str_flatten_comma(letters[1:3], ", and ") #' str_flatten_comma(letters[1:2], ", and ") str_flatten <- function(string, collapse = "", last = NULL, na.rm = FALSE) { check_string(collapse) check_string(last, allow_null = TRUE) check_bool(na.rm) if (na.rm) { string <- string[!is.na(string)] } n <- length(string) if (!is.null(last) && n >= 2) { string <- c( string[seq2(1, n - 2)], stringi::stri_c(string[[n - 1]], last, string[[n]]) ) } stri_flatten(string, collapse = collapse) } #' @export #' @rdname str_flatten str_flatten_comma <- function(string, last = NULL, na.rm = FALSE) { check_string(last, allow_null = TRUE) check_bool(na.rm) # Remove comma if exactly two elements, and last uses Oxford comma if (length(string) == 2 && !is.null(last) && str_detect(last, "^,")) { last <- str_replace(last, "^,", "") } str_flatten(string, ", ", last = last, na.rm = na.rm) } stringr/R/trim.R0000644000176200001440000000211314520174727013247 0ustar liggesusers#' Remove whitespace #' #' `str_trim()` removes whitespace from start and end of string; `str_squish()` #' removes whitespace at the start and end, and replaces all internal whitespace #' with a single space. #' #' @inheritParams str_detect #' @param side Side on which to remove whitespace: "left", "right", or #' "both", the default. #' @return A character vector the same length as `string`. 
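#'
#' @details
#' `str_squish()` is implemented as
#' `stri_trim_both(str_replace_all(string, "\\s+", " "))`: every run of
#' whitespace is first collapsed to a single space, then the result is trimmed
#' at both ends.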
#' @export #' @seealso [str_pad()] to add whitespace #' @examples #' str_trim(" String with trailing and leading white space\t") #' str_trim("\n\nString with trailing and leading white space\n\n") #' #' str_squish(" String with trailing, middle, and leading white space\t") #' str_squish("\n\nString with excess, trailing and leading white space\n\n") str_trim <- function(string, side = c("both", "left", "right")) { side <- arg_match(side) switch(side, left = stri_trim_left(string), right = stri_trim_right(string), both = stri_trim_both(string) ) } #' @export #' @rdname str_trim str_squish <- function(string) { stri_trim_both(str_replace_all(string, "\\s+", " ")) } stringr/R/equal.R0000644000176200001440000000204114520174727013403 0ustar liggesusers#' Determine if two strings are equivalent #' #' This uses Unicode canonicalisation rules, and optionally ignores case. #' #' @param x,y A pair of character vectors. #' @inheritParams str_order #' @param ignore_case Ignore case when comparing strings? #' @return An logical vector the same length as `x`/`y`. #' @seealso [stringi::stri_cmp_equiv()] for the underlying implementation. #' @export #' @examples #' # These two strings encode "a" with an accent in two different ways #' a1 <- "\u00e1" #' a2 <- "a\u0301" #' c(a1, a2) #' #' a1 == a2 #' str_equal(a1, a2) #' #' # ohm and omega use different code points but should always be treated #' # as equal #' ohm <- "\u2126" #' omega <- "\u03A9" #' c(ohm, omega) #' #' ohm == omega #' str_equal(ohm, omega) str_equal <- function(x, y, locale = "en", ignore_case = FALSE, ...) { vctrs::vec_size_common(x = x, y = y) check_string(locale) check_bool(ignore_case) opts <- str_opts_collator( locale = locale, ignore_case = ignore_case, ... ) stri_cmp_equiv(x, y, opts_collator = opts) } stringr/R/dup.R0000644000176200001440000000105114520174727013064 0ustar liggesusers#' Duplicate a string #' #' `str_dup()` duplicates the characters within a string, e.g. #' `str_dup("xy", 3)` returns `"xyxyxy"`. #' #' @inheritParams str_detect #' @param times Number of times to duplicate each string. #' @return A character vector the same length as `string`/`times`. #' @export #' @examples #' fruit <- c("apple", "pear", "banana") #' str_dup(fruit, 2) #' str_dup(fruit, 1:3) #' str_c("ba", str_dup("na", 0:5)) str_dup <- function(string, times) { vctrs::vec_size_common(string = string, times = times) stri_dup(string, times) } stringr/R/unique.R0000644000176200001440000000171414520174727013610 0ustar liggesusers#' Remove duplicated strings #' #' `str_unique()` removes duplicated values, with optional control over #' how duplication is measured. #' #' @inheritParams str_detect #' @inheritParams str_equal #' @return A character vector, usually shorter than `string`. #' @seealso [unique()], [stringi::stri_unique()] which this function wraps. #' @examples #' str_unique(c("a", "b", "c", "b", "a")) #' #' str_unique(c("a", "b", "c", "B", "A")) #' str_unique(c("a", "b", "c", "B", "A"), ignore_case = TRUE) #' #' # Use ... to pass additional arguments to stri_unique() #' str_unique(c("motley", "mötley", "pinguino", "pingüino")) #' str_unique(c("motley", "mötley", "pinguino", "pingüino"), strength = 1) #' @export str_unique <- function(string, locale = "en", ignore_case = FALSE, ...) { check_string(locale) check_bool(ignore_case) opts <- str_opts_collator( locale = locale, ignore_case = ignore_case, ... 
) stri_unique(string, opts_collator = opts) } stringr/R/sub.R0000644000176200001440000000534114520174727013073 0ustar liggesusers#' Get and set substrings using their positions #' #' `str_sub()` extracts or replaces the elements at a single position in each #' string. `str_sub_all()` allows you to extract strings at multiple elements #' in every string. #' #' @inheritParams str_detect #' @param start,end A pair of integer vectors defining the range of characters #' to extract (inclusive). #' #' Alternatively, instead of a pair of vectors, you can pass a matrix to #' `start`. The matrix should have two columns, either labelled `start` #' and `end`, or `start` and `length`. #' @param omit_na Single logical value. If `TRUE`, missing values in any of the #' arguments provided will result in an unchanged input. #' @param value replacement string #' @return #' * `str_sub()`: A character vector the same length as `string`/`start`/`end`. #' * `str_sub_all()`: A list the same length as `string`. Each element is #' a character vector the same length as `start`/`end`. #' @seealso The underlying implementation in [stringi::stri_sub()] #' @export #' @examples #' hw <- "Hadley Wickham" #' #' str_sub(hw, 1, 6) #' str_sub(hw, end = 6) #' str_sub(hw, 8, 14) #' str_sub(hw, 8) #' #' # Negative indices index from end of string #' str_sub(hw, -1) #' str_sub(hw, -7) #' str_sub(hw, end = -7) #' #' # str_sub() is vectorised by both string and position #' str_sub(hw, c(1, 8), c(6, 14)) #' #' # if you want to extract multiple positions from multiple strings, #' # use str_sub_all() #' x <- c("abcde", "ghifgh") #' str_sub(x, c(1, 2), c(2, 4)) #' str_sub_all(x, start = c(1, 2), end = c(2, 4)) #' #' # Alternatively, you can pass in a two column matrix, as in the #' # output from str_locate_all #' pos <- str_locate_all(hw, "[aeio]")[[1]] #' pos #' str_sub(hw, pos) #' #' # You can also use `str_sub()` to modify strings: #' x <- "BBCDEF" #' str_sub(x, 1, 1) <- "A"; x #' str_sub(x, -1, -1) <- "K"; x #' str_sub(x, -2, -2) <- "GHIJ"; x #' str_sub(x, 2, -2) <- ""; x str_sub <- function(string, start = 1L, end = -1L) { vctrs::vec_size_common(string = string, start = start, end = end) if (is.matrix(start)) { stri_sub(string, from = start) } else { stri_sub(string, from = start, to = end) } } #' @export #' @rdname str_sub "str_sub<-" <- function(string, start = 1L, end = -1L, omit_na = FALSE, value) { vctrs::vec_size_common(string = string, start = start, end = end) if (is.matrix(start)) { stri_sub(string, from = start, omit_na = omit_na) <- value } else { stri_sub(string, from = start, to = end, omit_na = omit_na) <- value } string } #' @export #' @rdname str_sub str_sub_all <- function(string, start = 1L, end = -1L) { if (is.matrix(start)) { stri_sub_all(string, from = start) } else { stri_sub_all(string, from = start, to = end) } } stringr/R/compat-types-check.R0000644000176200001440000002377714520174727016017 0ustar liggesusers# nocov start --- r-lib/rlang compat-types-check # # Dependencies # ============ # # - compat-obj-type.R # # Changelog # ========= # # 2022-10-04: # - Added `check_name()` that forbids the empty string. # `check_string()` allows the empty string by default. # # 2022-09-28: # - Removed `what` arguments. # - Added `allow_na` and `allow_null` arguments. # - Added `allow_decimal` and `allow_infinite` arguments. # - Improved errors with absent arguments. # # # 2022-09-16: # - Unprefixed usage of rlang functions with `rlang::` to # avoid onLoad issues when called from rlang (#1482). 
# # 2022-08-11: # - Added changelog. # Scalars ----------------------------------------------------------------- check_bool <- function(x, ..., allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_bool(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } if (allow_na && identical(x, NA)) { return(invisible(NULL)) } } stop_input_type( x, c("`TRUE`", "`FALSE`"), ..., allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } check_string <- function(x, ..., allow_empty = TRUE, allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { is_string <- .rlang_check_is_string( x, allow_empty = allow_empty, allow_na = allow_na, allow_null = allow_null ) if (is_string) { return(invisible(NULL)) } } stop_input_type( x, "a single string", ..., allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } .rlang_check_is_string <- function(x, allow_empty, allow_na, allow_null) { if (is_string(x)) { if (allow_empty || !is_string(x, "")) { return(TRUE) } } if (allow_null && is_null(x)) { return(TRUE) } if (allow_na && (identical(x, NA) || identical(x, na_chr))) { return(TRUE) } FALSE } check_name <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { is_string <- .rlang_check_is_string( x, allow_empty = FALSE, allow_na = FALSE, allow_null = allow_null ) if (is_string) { return(invisible(NULL)) } } stop_input_type( x, "a valid name", ..., allow_na = FALSE, allow_null = allow_null, arg = arg, call = call ) } check_number_decimal <- function(x, ..., min = -Inf, max = Inf, allow_infinite = TRUE, allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { .rlang_types_check_number( x, ..., min = min, max = max, allow_decimal = TRUE, allow_infinite = allow_infinite, allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } check_number_whole <- function(x, ..., min = -Inf, max = Inf, allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { .rlang_types_check_number( x, ..., min = min, max = max, allow_decimal = FALSE, allow_infinite = FALSE, allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) } .rlang_types_check_number <- function(x, ..., min = -Inf, max = Inf, allow_decimal = FALSE, allow_infinite = FALSE, allow_na = FALSE, allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (allow_decimal) { what <- "a number" } else { what <- "a whole number" } .stop <- function(x, what, ...) stop_input_type( x, what, ..., allow_na = allow_na, allow_null = allow_null, arg = arg, call = call ) if (!missing(x)) { is_number <- is_number( x, allow_decimal = allow_decimal, allow_infinite = allow_infinite ) if (is_number) { if (min > -Inf && max < Inf) { what <- sprintf("a number between %s and %s", min, max) } else { what <- NULL } if (x < min) { what <- what %||% sprintf("a number larger than %s", min) .stop(x, what, ...) } if (x > max) { what <- what %||% sprintf("a number smaller than %s", max) .stop(x, what, ...) } return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } if (allow_na && (identical(x, NA) || identical(x, na_dbl) || identical(x, na_int))) { return(invisible(NULL)) } } .stop(x, what, ...) 
} is_number <- function(x, allow_decimal = FALSE, allow_infinite = FALSE) { if (!typeof(x) %in% c("integer", "double")) { return(FALSE) } if (length(x) != 1) { return(FALSE) } if (is.na(x)) { return(FALSE) } if (!allow_decimal && !is_integerish(x)) { return(FALSE) } if (!allow_infinite && is.infinite(x)) { return(FALSE) } TRUE } check_symbol <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_symbol(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a symbol", ..., allow_null = allow_null, arg = arg, call = call ) } check_arg <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_symbol(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "an argument name", ..., allow_null = allow_null, arg = arg, call = call ) } check_call <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_call(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a defused call", ..., allow_null = allow_null, arg = arg, call = call ) } check_environment <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_environment(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "an environment", ..., allow_null = allow_null, arg = arg, call = call ) } check_function <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_function(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a function", ..., allow_null = allow_null, arg = arg, call = call ) } check_closure <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_closure(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "an R function", ..., allow_null = allow_null, arg = arg, call = call ) } check_formula <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_formula(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a formula", ..., allow_null = allow_null, arg = arg, call = call ) } # Vectors ----------------------------------------------------------------- check_character <- function(x, ..., allow_null = FALSE, arg = caller_arg(x), call = caller_env()) { if (!missing(x)) { if (is_character(x)) { return(invisible(NULL)) } if (allow_null && is_null(x)) { return(invisible(NULL)) } } stop_input_type( x, "a character vector", ..., allow_null = allow_null, arg = arg, call = call ) } # nocov end stringr/R/remove.R0000644000176200001440000000112214520174727013570 0ustar liggesusers#' Remove matched patterns #' #' Remove matches, i.e. replace them with `""`. #' #' @inheritParams str_detect #' @return A character vector the same length as `string`/`pattern`. #' @seealso [str_replace()] for the underlying implementation. 
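#'
#' @details
#' These are thin wrappers around [str_replace()]: `str_remove(x, pattern)` is
#' equivalent to `str_replace(x, pattern, "")`, and `str_remove_all()` uses
#' [str_replace_all()] in the same way.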
#' @export #' @examples #' fruits <- c("one apple", "two pears", "three bananas") #' str_remove(fruits, "[aeiou]") #' str_remove_all(fruits, "[aeiou]") str_remove <- function(string, pattern) { str_replace(string, pattern, "") } #' @export #' @rdname str_remove str_remove_all <- function(string, pattern) { str_replace_all(string, pattern, "") } stringr/R/sort.R0000644000176200001440000000515014520174727013267 0ustar liggesusers#' Order, rank, or sort a character vector #' #' * `str_sort()` returns the sorted vector. #' * `str_order()` returns an integer vector that returns the desired #' order when used for subsetting, i.e. `x[str_order(x)]` is the same #' as `str_sort()` #' * `str_rank()` returns the ranks of the values, i.e. #' `arrange(df, str_rank(x))` is the same as `str_sort(df$x)`. #' #' @param x A character vector to sort. #' @param decreasing A boolean. If `FALSE`, the default, sorts from #' lowest to highest; if `TRUE` sorts from highest to lowest. #' @param na_last Where should `NA` go? `TRUE` at the end, #' `FALSE` at the beginning, `NA` dropped. #' @param numeric If `TRUE`, will sort digits numerically, instead #' of as strings. #' @param ... Other options used to control collation. Passed on to #' [stringi::stri_opts_collator()]. #' @inheritParams coll #' @return A character vector the same length as `string`. #' @seealso [stringi::stri_order()] for the underlying implementation. #' @export #' @examples #' x <- c("apple", "car", "happy", "char") #' str_sort(x) #' #' str_order(x) #' x[str_order(x)] #' #' str_rank(x) #' #' # In Czech, ch is a digraph that sorts after h #' str_sort(x, locale = "cs") #' #' # Use numeric = TRUE to sort numbers in strings #' x <- c("100a10", "100a5", "2b", "2a") #' str_sort(x) #' str_sort(x, numeric = TRUE) str_order <- function(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) { check_bool(decreasing) check_bool(na_last, allow_na = TRUE) check_string(locale) check_bool(numeric) opts <- stri_opts_collator(locale, numeric = numeric, ...) stri_order(x, decreasing = decreasing, na_last = na_last, opts_collator = opts ) } #' @export #' @rdname str_order str_rank <- function(x, locale = "en", numeric = FALSE, ...) { check_string(locale) check_bool(numeric) opts <- stri_opts_collator(locale, numeric = numeric, ...) stri_rank(x, opts_collator = opts ) } #' @export #' @rdname str_order str_sort <- function(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) { check_bool(decreasing) check_bool(na_last, allow_na = TRUE) check_string(locale) check_bool(numeric) opts <- stri_opts_collator(locale, numeric = numeric, ...) stri_sort(x, decreasing = decreasing, na_last = na_last, opts_collator = opts ) } stringr/R/extract.R0000644000176200001440000000607114520174727013755 0ustar liggesusers#' Extract the complete match #' #' `str_extract()` extracts the first complete match from each string, #' `str_extract_all()`extracts all matches from each string. #' #' @inheritParams str_detect #' @param group If supplied, instead of returning the complete match, will #' return the matched text from the specified capturing group. #' @seealso [str_match()] to extract matched groups; #' [stringi::stri_extract()] for the underlying implementation. #' @param simplify A boolean. #' * `FALSE` (the default): returns a list of character vectors. #' * `TRUE`: returns a character matrix. #' @return #' * `str_extract()`: an character vector the same length as `string`/`pattern`. 
#' * `str_extract_all()`: a list of character vectors the same length as #' `string`/`pattern`. #' @export #' @examples #' shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2") #' str_extract(shopping_list, "\\d") #' str_extract(shopping_list, "[a-z]+") #' str_extract(shopping_list, "[a-z]{1,4}") #' str_extract(shopping_list, "\\b[a-z]{1,4}\\b") #' #' str_extract(shopping_list, "([a-z]+) of ([a-z]+)") #' str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 1) #' str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 2) #' #' # Extract all matches #' str_extract_all(shopping_list, "[a-z]+") #' str_extract_all(shopping_list, "\\b[a-z]+\\b") #' str_extract_all(shopping_list, "\\d") #' #' # Simplify results into character matrix #' str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE) #' str_extract_all(shopping_list, "\\d", simplify = TRUE) #' #' # Extract all words #' str_extract_all("This is, suprisingly, a sentence.", boundary("word")) str_extract <- function(string, pattern, group = NULL) { if (!is.null(group)) { return(str_match(string, pattern)[, group + 1]) } check_lengths(string, pattern) switch(type(pattern), empty = stri_extract_first_boundaries(string, opts_brkiter = opts(pattern)), bound = stri_extract_first_boundaries(string, opts_brkiter = opts(pattern)), fixed = stri_extract_first_fixed(string, pattern, opts_fixed = opts(pattern)), coll = stri_extract_first_coll(string, pattern, opts_collator = opts(pattern)), regex = stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) ) } #' @rdname str_extract #' @export str_extract_all <- function(string, pattern, simplify = FALSE) { check_lengths(string, pattern) check_bool(simplify) switch(type(pattern), empty = stri_extract_all_boundaries(string, simplify = simplify, omit_no_match = TRUE, opts_brkiter = opts(pattern)), bound = stri_extract_all_boundaries(string, simplify = simplify, omit_no_match = TRUE, opts_brkiter = opts(pattern)), fixed = stri_extract_all_fixed(string, pattern, simplify = simplify, omit_no_match = TRUE, opts_fixed = opts(pattern)), coll = stri_extract_all_coll(string, pattern, simplify = simplify, omit_no_match = TRUE, opts_collator = opts(pattern)), regex = stri_extract_all_regex(string, pattern, simplify = simplify, omit_no_match = TRUE, opts_regex = opts(pattern)) ) } stringr/R/match.R0000644000176200001440000000504614520174727013400 0ustar liggesusers#' Extract components (capturing groups) from a match #' #' @description #' Extract any number of matches defined by unnamed, `(pattern)`, and #' named, `(?pattern)` capture groups. #' #' Use a non-capturing group, `(?:pattern)`, if you need to override default #' operate precedence but don't want to capture the result. #' #' @inheritParams str_detect #' @param pattern Unlike other stringr functions, `str_match()` only supports #' regular expressions, as described `vignette("regular-expressions")`. #' The pattern should contain at least one capturing group. #' @return #' * `str_match()`: a character matrix with the same number of rows as the #' length of `string`/`pattern`. The first column is the complete match, #' followed by one column for each capture group. The columns will be named #' if you used "named captured groups", i.e. `(?pattern')`. #' #' * `str_match_all()`: a list of the same length as `string`/`pattern` #' containing character matrices. Each matrix has columns as descrbed above #' and one row for each match. 
#' #' @seealso [str_extract()] to extract the complete match, #' [stringi::stri_match()] for the underlying implementation. #' @export #' @examples #' strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569", #' "387 287 6718", "apple", "233.398.9187 ", "482 952 3315", #' "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000", #' "Home: 543.355.3679") #' phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" #' #' str_extract(strings, phone) #' str_match(strings, phone) #' #' # Extract/match all #' str_extract_all(strings, phone) #' str_match_all(strings, phone) #' #' # You can also name the groups to make further manipulation easier #' phone <- "(?<area>[2-9][0-9]{2})[- .](?<number>[0-9]{3}[- .][0-9]{4})" #' str_match(strings, phone) #' #' x <- c("
", " <>", "", "", NA) #' str_match(x, "<(.*?)> <(.*?)>") #' str_match_all(x, "<(.*?)>") #' #' str_extract(x, "<.*?>") #' str_extract_all(x, "<.*?>") str_match <- function(string, pattern) { check_lengths(string, pattern) if (type(pattern) != "regex") { cli::cli_abort("`pattern` must be a regular expression.") } stri_match_first_regex(string, pattern, opts_regex = opts(pattern) ) } #' @rdname str_match #' @export str_match_all <- function(string, pattern) { check_lengths(string, pattern) if (type(pattern) != "regex") { cli::cli_abort("`pattern` must be a regular expression.") } stri_match_all_regex(string, pattern, omit_no_match = TRUE, opts_regex = opts(pattern) ) } stringr/R/stringr-package.R0000644000176200001440000000027614316043620015353 0ustar liggesusers#' @keywords internal "_PACKAGE" ## usethis namespace: start #' @import stringi #' @import rlang #' @importFrom glue glue #' @importFrom lifecycle deprecated ## usethis namespace: end NULL stringr/R/compat-purrr.R0000644000176200001440000001067114316043620014725 0ustar liggesusers# nocov start - compat-purrr (last updated: rlang 0.3.2.9000) # This file serves as a reference for compatibility functions for # purrr. They are not drop-in replacements but allow a similar style # of programming. This is useful in cases where purrr is too heavy a # package to depend on. Please find the most recent version in rlang's # repository. map <- function(.x, .f, ...) { lapply(.x, .f, ...) } map_mold <- function(.x, .f, .mold, ...) { out <- vapply(.x, .f, .mold, ..., USE.NAMES = FALSE) names(out) <- names(.x) out } map_lgl <- function(.x, .f, ...) { map_mold(.x, .f, logical(1), ...) } map_int <- function(.x, .f, ...) { map_mold(.x, .f, integer(1), ...) } map_dbl <- function(.x, .f, ...) { map_mold(.x, .f, double(1), ...) } map_chr <- function(.x, .f, ...) { map_mold(.x, .f, character(1), ...) } map_cpl <- function(.x, .f, ...) { map_mold(.x, .f, complex(1), ...) } walk <- function(.x, .f, ...) { map(.x, .f, ...) invisible(.x) } pluck <- function(.x, .f) { map(.x, `[[`, .f) } pluck_lgl <- function(.x, .f) { map_lgl(.x, `[[`, .f) } pluck_int <- function(.x, .f) { map_int(.x, `[[`, .f) } pluck_dbl <- function(.x, .f) { map_dbl(.x, `[[`, .f) } pluck_chr <- function(.x, .f) { map_chr(.x, `[[`, .f) } pluck_cpl <- function(.x, .f) { map_cpl(.x, `[[`, .f) } map2 <- function(.x, .y, .f, ...) { out <- mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE) if (length(out) == length(.x)) { set_names(out, names(.x)) } else { set_names(out, NULL) } } map2_lgl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "logical") } map2_int <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "integer") } map2_dbl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "double") } map2_chr <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "character") } map2_cpl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "complex") } args_recycle <- function(args) { lengths <- map_int(args, length) n <- max(lengths) stopifnot(all(lengths == 1L | lengths == n)) to_recycle <- lengths == 1L args[to_recycle] <- map(args[to_recycle], function(x) rep.int(x, n)) args } pmap <- function(.l, .f, ...) { args <- args_recycle(.l) do.call("mapply", c( FUN = list(quote(.f)), args, MoreArgs = quote(list(...)), SIMPLIFY = FALSE, USE.NAMES = FALSE )) } probe <- function(.x, .p, ...) { if (is_logical(.p)) { stopifnot(length(.p) == length(.x)) .p } else { map_lgl(.x, .p, ...) } } keep <- function(.x, .f, ...) 
{ .x[probe(.x, .f, ...)] } discard <- function(.x, .p, ...) { sel <- probe(.x, .p, ...) .x[is.na(sel) | !sel] } map_if <- function(.x, .p, .f, ...) { matches <- probe(.x, .p) .x[matches] <- map(.x[matches], .f, ...) .x } compact <- function(.x) { Filter(length, .x) } transpose <- function(.l) { inner_names <- names(.l[[1]]) if (is.null(inner_names)) { fields <- seq_along(.l[[1]]) } else { fields <- set_names(inner_names) } map(fields, function(i) { map(.l, .subset2, i) }) } every <- function(.x, .p, ...) { for (i in seq_along(.x)) { if (!rlang::is_true(.p(.x[[i]], ...))) return(FALSE) } TRUE } some <- function(.x, .p, ...) { for (i in seq_along(.x)) { if (rlang::is_true(.p(.x[[i]], ...))) return(TRUE) } FALSE } negate <- function(.p) { function(...) !.p(...) } reduce <- function(.x, .f, ..., .init) { f <- function(x, y) .f(x, y, ...) Reduce(f, .x, init = .init) } reduce_right <- function(.x, .f, ..., .init) { f <- function(x, y) .f(y, x, ...) Reduce(f, .x, init = .init, right = TRUE) } accumulate <- function(.x, .f, ..., .init) { f <- function(x, y) .f(x, y, ...) Reduce(f, .x, init = .init, accumulate = TRUE) } accumulate_right <- function(.x, .f, ..., .init) { f <- function(x, y) .f(y, x, ...) Reduce(f, .x, init = .init, right = TRUE, accumulate = TRUE) } detect <- function(.x, .f, ..., .right = FALSE, .p = is_true) { for (i in index(.x, .right)) { if (.p(.f(.x[[i]], ...))) { return(.x[[i]]) } } NULL } detect_index <- function(.x, .f, ..., .right = FALSE, .p = is_true) { for (i in index(.x, .right)) { if (.p(.f(.x[[i]], ...))) { return(i) } } 0L } index <- function(x, right = FALSE) { idx <- seq_along(x) if (right) { idx <- rev(idx) } idx } imap <- function(.x, .f, ...) { map2(.x, vec_index(.x), .f, ...) } vec_index <- function(x) { names(x) %||% seq_along(x) } # nocov end stringr/R/replace.R0000644000176200001440000001414314524701211013701 0ustar liggesusers#' Replace matches with new text #' #' `str_replace()` replaces the first match; `str_replace_all()` replaces #' all matches. #' #' @inheritParams str_detect #' @param pattern Pattern to look for. #' #' The default interpretation is a regular expression, as described #' in [stringi::about_search_regex]. Control options with #' [regex()]. #' #' For `str_replace_all()` this can also be a named vector #' (`c(pattern1 = replacement1)`), in order to perform multiple replacements #' in each element of `string`. #' #' Match a fixed string (i.e. by comparing only bytes), using #' [fixed()]. This is fast, but approximate. Generally, #' for matching human text, you'll want [coll()] which #' respects character matching rules for the specified locale. #' @param replacement The replacement value, usually a single string, #' but it can be the a vector the same length as `string` or `pattern`. #' References of the form `\1`, `\2`, etc will be replaced with #' the contents of the respective matched group (created by `()`). #' #' Alternatively, supply a function, which will be called once for each #' match (from right to left) and its return value will be used to replace #' the match. #' @return A character vector the same length as #' `string`/`pattern`/`replacement`. #' @seealso [str_replace_na()] to turn missing values into "NA"; #' [stri_replace()] for the underlying implementation. 
#' @export #' @examples #' fruits <- c("one apple", "two pears", "three bananas") #' str_replace(fruits, "[aeiou]", "-") #' str_replace_all(fruits, "[aeiou]", "-") #' str_replace_all(fruits, "[aeiou]", toupper) #' str_replace_all(fruits, "b", NA_character_) #' #' str_replace(fruits, "([aeiou])", "") #' str_replace(fruits, "([aeiou])", "\\1\\1") #' #' # Note that str_replace() is vectorised along text, pattern, and replacement #' str_replace(fruits, "[aeiou]", c("1", "2", "3")) #' str_replace(fruits, c("a", "e", "i"), "-") #' #' # If you want to apply multiple patterns and replacements to the same #' # string, pass a named vector to pattern. #' fruits %>% #' str_c(collapse = "---") %>% #' str_replace_all(c("one" = "1", "two" = "2", "three" = "3")) #' #' # Use a function for more sophisticated replacement. This example #' # replaces colour names with their hex values. #' colours <- str_c("\\b", colors(), "\\b", collapse="|") #' col2hex <- function(col) { #' rgb <- col2rgb(col) #' rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255) #' } #' #' x <- c( #' "Roses are red, violets are blue", #' "My favourite colour is green" #' ) #' str_replace_all(x, colours, col2hex) str_replace <- function(string, pattern, replacement) { if (!missing(replacement) && is_replacement_fun(replacement)) { replacement <- as_function(replacement) return(str_transform(string, pattern, replacement)) } check_lengths(string, pattern, replacement) switch(type(pattern), empty = no_empty(), bound = no_boundary(), fixed = stri_replace_first_fixed(string, pattern, replacement, opts_fixed = opts(pattern)), coll = stri_replace_first_coll(string, pattern, replacement, opts_collator = opts(pattern)), regex = stri_replace_first_regex(string, pattern, fix_replacement(replacement), opts_regex = opts(pattern)) ) } #' @export #' @rdname str_replace str_replace_all <- function(string, pattern, replacement) { if (!missing(replacement) && is_replacement_fun(replacement)) { replacement <- as_function(replacement) return(str_transform_all(string, pattern, replacement)) } if (!is.null(names(pattern))) { vec <- FALSE replacement <- unname(pattern) pattern[] <- names(pattern) } else { check_lengths(string, pattern, replacement) vec <- TRUE } switch(type(pattern), empty = cli::cli_abort("{.arg pattern} can't be empty."), bound = cli::cli_abort("{.arg pattern} can't be a boundary."), fixed = stri_replace_all_fixed(string, pattern, replacement, vectorize_all = vec, opts_fixed = opts(pattern)), coll = stri_replace_all_coll(string, pattern, replacement, vectorize_all = vec, opts_collator = opts(pattern)), regex = stri_replace_all_regex(string, pattern, fix_replacement(replacement), vectorize_all = vec, opts_regex = opts(pattern)) ) } is_replacement_fun <- function(x) { is.function(x) || is_formula(x) } fix_replacement <- function(x, error_call = caller_env()) { check_character(x, arg = "replacement", call = error_call) vapply(x, fix_replacement_one, character(1), USE.NAMES = FALSE) } fix_replacement_one <- function(x) { if (is.na(x)) { return(x) } chars <- str_split(x, "")[[1]] out <- character(length(chars)) escaped <- logical(length(chars)) in_escape <- FALSE for (i in seq_along(chars)) { escaped[[i]] <- in_escape char <- chars[[i]] if (in_escape) { # Escape character not printed previously so must include here if (char == "$") { out[[i]] <- "\\\\$" } else if (char >= "0" && char <= "9") { out[[i]] <- paste0("$", char) } else { out[[i]] <- paste0("\\", char) } in_escape <- FALSE } else { if (char == "$") { out[[i]] <- "\\$" } else if 
(char == "\\") { in_escape <- TRUE } else { out[[i]] <- char } } } # tibble::tibble(chars, out, escaped) paste0(out, collapse = "") } #' Turn NA into "NA" #' #' @inheritParams str_replace #' @param replacement A single string. #' @export #' @examples #' str_replace_na(c(NA, "abc", "def")) str_replace_na <- function(string, replacement = "NA") { check_string(replacement) stri_replace_na(string, replacement) } str_transform <- function(string, pattern, replacement) { loc <- str_locate(string, pattern) str_sub(string, loc, omit_na = TRUE) <- replacement(str_sub(string, loc)) string } str_transform_all <- function(string, pattern, replacement) { locs <- str_locate_all(string, pattern) for (i in seq_along(string)) { for (j in rev(seq_len(nrow(locs[[i]])))) { loc <- locs[[i]] str_sub(string[[i]], loc[j, 1], loc[j, 2]) <- replacement(str_sub(string[[i]], loc[j, 1], loc[j, 2])) } } string } stringr/R/c.R0000644000176200001440000000443414520174727012526 0ustar liggesusers#' Join multiple strings into one string #' #' @description #' `str_c()` combines multiple character vectors into a single character #' vector. It's very similar to [paste0()] but uses tidyverse recycling and #' `NA` rules. #' #' One way to understand how `str_c()` works is picture a 2d matrix of strings, #' where each argument forms a column. `sep` is inserted between each column, #' and then each row is combined together into a single string. If `collapse` #' is set, it's inserted between each row, and then the result is again #' combined, this time into a single string. #' #' @param ... One or more character vectors. #' #' `NULL`s are removed; scalar inputs (vectors of length 1) are recycled to #' the common length of vector inputs. #' #' Like most other R functions, missing values are "infectious": whenever #' a missing value is combined with another string the result will always #' be missing. Use [dplyr::coalesce()] or [str_replace_na()] to convert to #' the desired value. #' @param sep String to insert between input vectors. #' @param collapse Optional string used to combine output into single #' string. Generally better to use [str_flatten()] if you needed this #' behaviour. #' @return If `collapse = NULL` (the default) a character vector with #' length equal to the longest input. If `collapse` is a string, a character #' vector of length 1. #' @export #' @examples #' str_c("Letter: ", letters) #' str_c("Letter", letters, sep = ": ") #' str_c(letters, " is for", "...") #' str_c(letters[-26], " comes before ", letters[-1]) #' #' str_c(letters, collapse = "") #' str_c(letters, collapse = ", ") #' #' # Differences from paste() ---------------------- #' # Missing inputs give missing outputs #' str_c(c("a", NA, "b"), "-d") #' paste0(c("a", NA, "b"), "-d") #' # Use str_replace_NA to display literal NAs: #' str_c(str_replace_na(c("a", NA, "b")), "-d") #' #' # Uses tidyverse recycling rules #' \dontrun{str_c(1:2, 1:3)} # errors #' paste0(1:2, 1:3) #' #' str_c("x", character()) #' paste0("x", character()) str_c <- function(..., sep = "", collapse = NULL) { check_string(sep) check_string(collapse, allow_null = TRUE) dots <- list(...) dots <- dots[!map_lgl(dots, is.null)] vctrs::vec_size_common(!!!dots) inject(stri_c(!!!dots, sep = sep, collapse = collapse)) } stringr/NEWS.md0000644000176200001440000003265614524706116013062 0ustar liggesusers# stringr 1.5.1 * Some minor documentation improvements. * `str_trunc()` now correctly truncates strings when `side` is `"left"` or `"center"` (@UchidaMizuki, #512). 
# stringr 1.5.0 ## Breaking changes * stringr functions now consistently implement the tidyverse recycling rules (#372). There are two main changes: * Only vectors of length 1 are recycled. Previously, (e.g.) `str_detect(letters, c("x", "y"))` worked, but it now errors. * `str_c()` ignores `NULLs`, rather than treating them as length 0 vectors. Additionally, many more arguments now throw errors, rather than warnings, if supplied the wrong type of input. * `regex()` and friends now generate class names with `stringr_` prefix (#384). * `str_detect()`, `str_starts()`, `str_ends()` and `str_subset()` now error when used with either an empty string (`""`) or a `boundary()`. These operations didn't really make sense (`str_detect(x, "")` returned `TRUE` for all non-empty strings) and made it easy to make mistakes when programming. ## New features * Many tweaks to the documentation to make it more useful and consistent. * New `vignette("from-base")` by @sastoudt provides a comprehensive comparison between base R functions and their stringr equivalents. It's designed to help you move to stringr if you're already familiar with base R string functions (#266). * New `str_escape()` escapes regular expression metacharacters, providing an alternative to `fixed()` if you want to compose a pattern from user supplied strings (#408). * New `str_equal()` compares two character vectors using unicode rules, optionally ignoring case (#381). * `str_extract()` can now optionally extract a capturing group instead of the complete match (#420). * New `str_flatten_comma()` is a special case of `str_flatten()` designed for comma separated flattening and can correctly apply the Oxford commas when there are only two elements (#444). * New `str_split_1()` is tailored for the special case of splitting up a single string (#409). * New `str_split_i()` extract a single piece from a string (#278, @bfgray3). * New `str_like()` allows the use of SQL wildcards (#280, @rjpat). * New `str_rank()` to complete the set of order/rank/sort functions (#353). * New `str_sub_all()` to extract multiple substrings from each string. * New `str_unique()` is a wrapper around `stri_unique()` and returns unique string values in a character vector (#249, @seasmith). * `str_view()` uses ANSI colouring rather than an HTML widget (#370). This works in more places and requires fewer dependencies. It includes a number of other small improvements: * It no longer requires a pattern so you can use it to display strings with special characters. * It highlights unusual whitespace characters. * It's vectorised over both string` and `pattern` (#407). * It defaults to displaying all matches, making `str_view_all()` redundant (and hence deprecated) (#455). * New `str_width()` returns the display width of a string (#380). * stringr is now licensed as MIT (#351). ## Minor improvements and bug fixes * Better error message if you supply a non-string pattern (#378). * A new data source for `sentences` has fixed many small errors. * `str_extract()` and `str_exctract_all()` now work correctly when `pattern` is a `boundary()`. * `str_flatten()` gains a `last` argument that optionally override the final separator (#377). It gains a `na.rm` argument to remove missing values (since it's a summary function) (#439). * `str_pad()` gains `use_width` argument to control whether to use the total code point width or the number of code points as "width" of a string (#190). 
* `str_replace()` and `str_replace_all()` can use standard tidyverse formula shorthand for `replacement` function (#331). * `str_starts()` and `str_ends()` now correctly respect regex operator precedence (@carlganz). * `str_wrap()` breaks only at whitespace by default; set `whitespace_only = FALSE` to return to the previous behaviour (#335, @rjpat). * `word()` now returns all the sentence when using a negative `start` parameter that is greater or equal than the number of words. (@pdelboca, #245) # stringr 1.4.1 Hot patch release to resolve R CMD check failures. # stringr 1.4.0 * `str_interp()` now renders lists consistently independent on the presence of additional placeholders (@amhrasmussen). * New `str_starts()` and `str_ends()` functions to detect patterns at the beginning or end of strings (@jonthegeek, #258). * `str_subset()`, `str_detect()`, and `str_which()` get `negate` argument, which is useful when you want the elements that do NOT match (#259, @yutannihilation). * New `str_to_sentence()` function to capitalize with sentence case (@jonthegeek, #202). # stringr 1.3.1 * `str_replace_all()` with a named vector now respects modifier functions (#207) * `str_trunc()` is once again vectorised correctly (#203, @austin3dickey). * `str_view()` handles `NA` values more gracefully (#217). I've also tweaked the sizing policy so hopefully it should work better in notebooks, while preserving the existing behaviour in knit documents (#232). # stringr 1.3.0 ## API changes * During package build, you may see `Error : object ‘ignore.case’ is not exported by 'namespace:stringr'`. This is because the long deprecated `str_join()`, `ignore.case()` and `perl()` have now been removed. ## New features * `str_glue()` and `str_glue_data()` provide convenient wrappers around `glue` and `glue_data()` from the [glue](https://glue.tidyverse.org/) package (#157). * `str_flatten()` is a wrapper around `stri_flatten()` and clearly conveys flattening a character vector into a single string (#186). * `str_remove()` and `str_remove_all()` functions. These wrap `str_replace()` and `str_replace_all()` to remove patterns from strings. (@Shians, #178) * `str_squish()` removes spaces from both the left and right side of strings, and also converts multiple space (or space-like characters) to a single space within strings (@stephlocke, #197). * `str_sub()` gains `omit_na` argument for ignoring `NA`. Accordingly, `str_replace()` now ignores `NA`s and keeps the original strings. (@yutannihilation, #164) ## Bug fixes and minor improvements * `str_trunc()` now preserves NAs (@ClaytonJY, #162) * `str_trunc()` now throws an error when `width` is shorter than `ellipsis` (@ClaytonJY, #163). * Long deprecated `str_join()`, `ignore.case()` and `perl()` have now been removed. # stringr 1.2.0 ## API changes * `str_match_all()` now returns NA if an optional group doesn't match (previously it returned ""). This is more consistent with `str_match()` and other match failures (#134). ## New features * In `str_replace()`, `replacement` can now be a function that is called once for each match and whose return value is used to replace the match. * New `str_which()` mimics `grep()` (#129). * A new vignette (`vignette("regular-expressions")`) describes the details of the regular expressions supported by stringr. The main vignette (`vignette("stringr")`) has been updated to give a high-level overview of the package. 
## Minor improvements and bug fixes * `str_order()` and `str_sort()` gain explicit `numeric` argument for sorting mixed numbers and strings. * `str_replace_all()` now throws an error if `replacement` is not a character vector. If `replacement` is `NA_character_` it replaces the complete string with replaces with `NA` (#124). * All functions that take a locale (e.g. `str_to_lower()` and `str_sort()`) default to "en" (English) to ensure that the default is consistent across platforms. # stringr 1.1.0 * Add sample datasets: `fruit`, `words` and `sentences`. * `fixed()`, `regex()`, and `coll()` now throw an error if you use them with anything other than a plain string (#60). I've clarified that the replacement for `perl()` is `regex()` not `regexp()` (#61). `boundary()` has improved defaults when splitting on non-word boundaries (#58, @lmullen). * `str_detect()` now can detect boundaries (by checking for a `str_count()` > 0) (#120). `str_subset()` works similarly. * `str_extract()` and `str_extract_all()` now work with `boundary()`. This is particularly useful if you want to extract logical constructs like words or sentences. `str_extract_all()` respects the `simplify` argument when used with `fixed()` matches. * `str_subset()` now respects custom options for `fixed()` patterns (#79, @gagolews). * `str_replace()` and `str_replace_all()` now behave correctly when a replacement string contains `$`s, `\\\\1`, etc. (#83, #99). * `str_split()` gains a `simplify` argument to match `str_extract_all()` etc. * `str_view()` and `str_view_all()` create HTML widgets that display regular expression matches (#96). * `word()` returns `NA` for indexes greater than number of words (#112). # stringr 1.0.0 * stringr is now powered by [stringi](https://github.com/gagolews/stringi) instead of base R regular expressions. This improves unicode and support, and makes most operations considerably faster. If you find stringr inadequate for your string processing needs, I highly recommend looking at stringi in more detail. * stringr gains a vignette, currently a straight forward update of the article that appeared in the R Journal. * `str_c()` now returns a zero length vector if any of its inputs are zero length vectors. This is consistent with all other functions, and standard R recycling rules. Similarly, using `str_c("x", NA)` now yields `NA`. If you want `"xNA"`, use `str_replace_na()` on the inputs. * `str_replace_all()` gains a convenient syntax for applying multiple pairs of pattern and replacement to the same vector: ```R input <- c("abc", "def") str_replace_all(input, c("[ad]" = "!", "[cf]" = "?")) ``` * `str_match()` now returns NA if an optional group doesn't match (previously it returned ""). This is more consistent with `str_extract()` and other match failures. * New `str_subset()` keeps values that match a pattern. It's a convenient wrapper for `x[str_detect(x)]` (#21, @jiho). * New `str_order()` and `str_sort()` allow you to sort and order strings in a specified locale. * New `str_conv()` to convert strings from specified encoding to UTF-8. * New modifier `boundary()` allows you to count, locate and split by character, word, line and sentence boundaries. * The documentation got a lot of love, and very similar functions (e.g. first and all variants) are now documented together. This should hopefully make it easier to locate the function you need. * `ignore.case(x)` has been deprecated in favour of `fixed|regex|coll(x, ignore.case = TRUE)`, `perl(x)` has been deprecated in favour of `regex(x)`. 
* `str_join()` is deprecated, please use `str_c()` instead. # stringr 0.6.2 * fixed path in `str_wrap` example so works for more R installations. * remove dependency on plyr # stringr 0.6.1 * Zero input to `str_split_fixed` returns 0 row matrix with `n` columns * Export `str_join` # stringr 0.6 * new modifier `perl` that switches to Perl regular expressions * `str_match` now uses new base function `regmatches` to extract matches - this should hopefully be faster than my previous pure R algorithm # stringr 0.5 * new `str_wrap` function which gives `strwrap` output in a more convenient format * new `word` function extract words from a string given user defined separator (thanks to suggestion by David Cooper) * `str_locate` now returns consistent type when matching empty string (thanks to Stavros Macrakis) * new `str_count` counts number of matches in a string. * `str_pad` and `str_trim` receive performance tweaks - for large vectors this should give at least a two order of magnitude speed up * str_length returns NA for invalid multibyte strings * fix small bug in internal `recyclable` function # stringr 0.4 * all functions now vectorised with respect to string, pattern (and where appropriate) replacement parameters * fixed() function now tells stringr functions to use fixed matching, rather than escaping the regular expression. Should improve performance for large vectors. * new ignore.case() modifier tells stringr functions to ignore case of pattern. * str_replace renamed to str_replace_all and new str_replace function added. This makes str_replace consistent with all functions. * new str_sub<- function (analogous to substring<-) for substring replacement * str_sub now understands negative positions as a position from the end of the string. -1 replaces Inf as indicator for string end. * str_pad side argument can be left, right, or both (instead of center) * str_trim gains side argument to better match str_pad * stringr now has a namespace and imports plyr (rather than requiring it) # stringr 0.3 * fixed() now also escapes | * str_join() renamed to str_c() * all functions more carefully check input and return informative error messages if not as expected. * add invert_match() function to convert a matrix of location of matches to locations of non-matches * add fixed() function to allow matching of fixed strings. 
# stringr 0.2 * str_length now returns correct results when used with factors * str_sub now correctly replaces Inf in end argument with length of string * new function str_split_fixed returns fixed number of splits in a character matrix * str_split no longer uses strsplit to preserve trailing breaks stringr/MD50000644000176200001440000001725514524777112012276 0ustar liggesusers6417661a7c8efb838ad5d5b6f7765ba8 *DESCRIPTION a0018b1c7c6a9756addc1b079daa5cf0 *LICENSE 53cac8267191c7135e60ff422a789b57 *NAMESPACE 3c28871ed9da543fe831761643561104 *NEWS.md 11390682e89c2e7653a90d0c72f84241 *R/c.R a4a814af55ec5f84c9c39746ea39edc1 *R/case.R 07c06e6be0443b7d5b9094f11daa406f *R/compat-obj-type.R 6ff61ce96b8a0aca5f241ef7aaab4a68 *R/compat-purrr.R f299eacebcf906163a629b2ef7787ba4 *R/compat-types-check.R 236c313428f675772ae78e10ec583aab *R/conv.R f75b36e8fb84010a555444e6f8950148 *R/count.R d17b3c3f3f546123603f4f5f84a5f3cb *R/data.R a864288668850afe6aa376437d70d344 *R/detect.R 02515c6f25da38950c825474b67623e1 *R/dup.R 1c6b2097f903dde0057acb61c89eeb01 *R/equal.R 0df0460a69007695b24bbd4c2bd57645 *R/escape.R ea6dfb84a0cbb37d017ebe5b8ff64e63 *R/extract.R 61dae6d6a3696574e7d6d86c5cf8673d *R/flatten.R 1f7ebdf1299f7832220640f16f03c7bb *R/glue.R 242f08a29f1cdcdf774bc277a0fec933 *R/interp.R d072949ee58976ca4e02dcb4e775c394 *R/length.R 7d10a9448a8556aaf846a9408c733eb4 *R/locate.R 8c37d98f0218668449fa9378ad693eea *R/match.R 815c2c1b410c4e7a5d26fd35f5e097ce *R/modifiers.R eede1618dad4a0861c54dcdfb2c8ac75 *R/pad.R d21d0843b2319c6f450c1e62459e2502 *R/remove.R 100c2b42b967fff29315697c2ef89225 *R/replace.R dfabd70c0e2d57f9b2c3ce94394a0be5 *R/sort.R 0552ae2a46503839a88cfb707866ba80 *R/split.R 89c1a3fe6a9b2c2627c0c217d3240601 *R/stringr-package.R 95a7f7de995ead49e5728f4d5ac9b6af *R/sub.R 9d96b7aeda42bf69cd9b7a1ce5492e3e *R/subset.R 88afe326dd676421d7cc04309610796a *R/trim.R 467b44328c4da52de9d4820d84cf9a70 *R/trunc.R ee836edf224d73d887bc94cf8453705d *R/unique.R 33b05fa3a588a58663f835ce45783e83 *R/utils.R be25c17f9a9668f766a8a02b73da9268 *R/view.R 9440e9e7a0bbc33ada2e425cf0ed2cbe *R/word.R ccb9c91317f1b34569c8495f4c8f3b9a *R/wrap.R 0b6dd73d03f405561232bb60c5052c33 *README.md effdb89ddbda0e944da4a5d7699bbd4b *build/vignette.rds 89f0d280160eb4419b23251639a728c2 *data/fruit.rda 51f821ca3b3f8a286d0ceabccdab6ca7 *data/sentences.rda c99f00d311e24c76bbeabfc8a58b4b50 *data/words.rda 6259b8c4d33f86a18130632a8c44e770 *inst/doc/from-base.R 55fa03361dfbc453024c7ab8df2ede71 *inst/doc/from-base.Rmd c9b48c49b98c37e75a6f7b64571fd6d6 *inst/doc/from-base.html 13a030b4d1d374db7bc577145b69e24c *inst/doc/regular-expressions.R bee4d7614ef68fce3dc6772a0442a1ff *inst/doc/regular-expressions.Rmd 17c9d247135faeb4b172ebc7b47b97e1 *inst/doc/regular-expressions.html d57838d0deb2a741886f70eeed71604b *inst/doc/stringr.R 2e6abe80c39713fdd5778e6276185408 *inst/doc/stringr.Rmd 80f1f2e645f232b6fb10cc716a5407f4 *inst/doc/stringr.html 5f38e68c0f6148b954a6849fe553b173 *inst/htmlwidgets/lib/str_view.css e7c37a495d4ae965400eeb1000dee672 *inst/htmlwidgets/str_view.js 1763429826b7f9745d2e590e4ca4c119 *inst/htmlwidgets/str_view.yaml 71f33ca928beb04691d6a17102f377f9 *man/case.Rd cb1e46f469cfbbbde29c8b5113e1d789 *man/figures/lifecycle-archived.svg c0d2e5a54f1fa4ff02bf9533079dd1f7 *man/figures/lifecycle-defunct.svg a1b8c987c676c16af790f563f96cbb1f *man/figures/lifecycle-deprecated.svg c3978703d8f40f2679795335715e98f4 *man/figures/lifecycle-experimental.svg 952b59dc07b171b97d5d982924244f61 *man/figures/lifecycle-maturing.svg 27b879bf3677ea76e3991d56ab324081 
*man/figures/lifecycle-questioning.svg 53b3f893324260b737b3c46ed2a0e643 *man/figures/lifecycle-stable.svg 1c1fe7a759b86dc6dbcbe7797ab8246c *man/figures/lifecycle-superseded.svg 86fdd0a3f998a3d5a3ed6a850f5a1a0b *man/figures/logo.png ab69caa66e7d332f9555f8c4ae39e441 *man/invert_match.Rd ef8404412dac3596f66f50c3268fd089 *man/modifiers.Rd a64a7ea44fcaa33c2d3ad0f7909cbc3e *man/pipe.Rd 7ecc8f5ddb1b5fd123cdbf4a532da370 *man/str_c.Rd 4c5363f6872ab4d60fdb91609af1be13 *man/str_conv.Rd d6973b6218a2a1a32eee03e1a0ffa41c *man/str_count.Rd 1bdaa07efcfb600dadd5a93342af1e4b *man/str_detect.Rd b07cca29357daa5c72bad16c916ede42 *man/str_dup.Rd 5f09e863013a8297b0b38ffa1d3ba5fe *man/str_equal.Rd 0e95910de21db06a6e642bbd8cc17e96 *man/str_escape.Rd d4f5015fbae4ef8d749a8d334b274412 *man/str_extract.Rd 8d296bb5ba6ef0bc7d976c12c3af195e *man/str_flatten.Rd 4b55b07f3b462f77dc4ebb5bb22c303b *man/str_glue.Rd 0aaa5a0cce235f807c189a0233464add *man/str_interp.Rd 9a978f65167837c2a82b9eb45e877276 *man/str_length.Rd 743c58ac7940bfdb78022ab998687a3d *man/str_like.Rd fde4acb343e7261725078e7453f1a5fd *man/str_locate.Rd c78d29b9b7696ac56fbd4f8975efa955 *man/str_match.Rd 6ac9678ff6aa4a96ff3beb460a2d565e *man/str_order.Rd df526bd928ec6a89b62d80df016d7aa9 *man/str_pad.Rd ae9a5667a4064d65c3812080b3f80fec *man/str_remove.Rd 94e9edf26c8bdca90bcac40e426ba042 *man/str_replace.Rd 797da58fc306dc7b018434727a32d26a *man/str_replace_na.Rd fffbbf8d258f6be66101c63210570936 *man/str_split.Rd 422188a7f648f1047e529fcafa3dcacc *man/str_starts.Rd a0bda4a83f1bbcce75d6ff072eab3812 *man/str_sub.Rd d92d29fbb86b0fb2a98e496af68e5561 *man/str_subset.Rd 7dffcd7bcfda40c125dd8da67404a32e *man/str_trim.Rd 4e4e96af2d85d1f024a8c3495c7a8400 *man/str_trunc.Rd ad920b9d524a6cdad804abddd67836bc *man/str_unique.Rd fce68051479e8c8a2126cab061cc956a *man/str_view.Rd 6f3edc14ec4040eaa8414f53ead6e9d9 *man/str_which.Rd fb432c6d1ee6a45f6bae4630a72e4c31 *man/str_wrap.Rd b020a9e0c22c8a73988673d8105e6beb *man/stringr-data.Rd faadd8ce234ceadf5aa02978b567398f *man/stringr-package.Rd 0e38c8016c39ab1cbfdac436864918df *man/word.Rd 4ee9d05bd4688270eca8d85299cedcd1 *tests/testthat.R 2495853cc152148a9967b97a742189d2 *tests/testthat/_snaps/c.md e1070ef18b7d962e0060cecc2751f0eb *tests/testthat/_snaps/detect.md feb7439575c71a94b2fd96a1cfe8490f *tests/testthat/_snaps/match.md bb5aa361c47349d8e72ef050f17ce13f *tests/testthat/_snaps/modifiers.md 701413f4ddebcd6f159e3c1bee9ea807 *tests/testthat/_snaps/replace.md f7d84d80aac766e0dabc941f4d1cc99f *tests/testthat/_snaps/split.md b226cf3dc38f2430e39bcdace6baa41f *tests/testthat/_snaps/subset.md 998a7db85bac398d9ad63edc24ae9fb4 *tests/testthat/_snaps/trunc.md 666ca9c4c0f885d6829fadce41a61616 *tests/testthat/_snaps/view.md 04fa3430820d49de179ce1be247461bf *tests/testthat/test-c.R 1b96097e258ae55b2501a037dfb9005f *tests/testthat/test-case.R cbf33254fcbdb2f01825365e017d5ee4 *tests/testthat/test-conv.R fd652a597975ae2e952ca11339671fe2 *tests/testthat/test-count.R e8be5c6fee11af55731655faa27fcc57 *tests/testthat/test-detect.R f9fca2103c840b9ed94900b03d31b5f5 *tests/testthat/test-dup.R 8c45219dd2995f936cb8c3f9716c8011 *tests/testthat/test-equal.R d23fcddf47a45da27a103f808a18e78e *tests/testthat/test-escape.R a5a2d54d820de92d947f38a73aa28fce *tests/testthat/test-extract.R e05f21c06429b79a1448f42578e44408 *tests/testthat/test-flatten.R 5d174d75b4333754c212093685c8dfa9 *tests/testthat/test-glue.R 8d10e593e9d78b50eb581f4e07c73c5c *tests/testthat/test-interp.R 4b370f0991952923730c72182afddb8c *tests/testthat/test-length.R 7fefa323cad92802b1bdc6fcab170240 
*tests/testthat/test-locate.R fe205e7026398ab262c3d9bbcc94579e *tests/testthat/test-match.R 5a6bb7b3ac53d7ef600d8aa86f48c736 *tests/testthat/test-modifiers.R 3542cb39e4cc66103619704b1ab97700 *tests/testthat/test-pad.R 067c9f1b67bd63b8e355cb28e31c976f *tests/testthat/test-remove.R 6edbe852bdb28146002beea516236f6a *tests/testthat/test-replace.R 82139ea85343349a40a44be88aa36d11 *tests/testthat/test-sort.R 0b9db589d9d6b3ea2d058e310a281e61 *tests/testthat/test-split.R 2399869fcfc88342844800f634459d2a *tests/testthat/test-sub.R 968816a52cce5608b6159e758526739d *tests/testthat/test-subset.R d6bcfb2fd01012d8ec733a7600291bed *tests/testthat/test-trim.R bf32022e418f7e355da993ba8b354600 *tests/testthat/test-trunc.R c801dfc65d41c161b000ea64a49f6944 *tests/testthat/test-unique.R 8a43c5c965ddfc1680a58812acde8267 *tests/testthat/test-view.R a1de73cbd092c5c94bf6a51e6103740c *tests/testthat/test-word.R 217ae1d7e7fa11dcf0b5ba0ed1c08473 *tests/testthat/test-wrap.R 55fa03361dfbc453024c7ab8df2ede71 *vignettes/from-base.Rmd bee4d7614ef68fce3dc6772a0442a1ff *vignettes/regular-expressions.Rmd 2e6abe80c39713fdd5778e6276185408 *vignettes/stringr.Rmd stringr/inst/0000755000176200001440000000000014524706130012721 5ustar liggesusersstringr/inst/doc/0000755000176200001440000000000014524706130013466 5ustar liggesusersstringr/inst/doc/from-base.Rmd0000644000176200001440000004036414524677110016021 0ustar liggesusers--- title: "From base R" author: "Sara Stoudt" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{From base R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} #| label: setup #| include: false knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) library(magrittr) ``` This vignette compares stringr functions to their base R equivalents to help users transitioning from using base R to stringr. # Overall differences We'll begin with a lookup table between the most important stringr functions and their base R equivalents. 
```{r} #| label: stringr-base-r-diff #| echo: false data_stringr_base_diff <- tibble::tribble( ~stringr, ~base_r, "str_detect(string, pattern)", "grepl(pattern, x)", "str_dup(string, times)", "strrep(x, times)", "str_extract(string, pattern)", "regmatches(x, m = regexpr(pattern, text))", "str_extract_all(string, pattern)", "regmatches(x, m = gregexpr(pattern, text))", "str_length(string)", "nchar(x)", "str_locate(string, pattern)", "regexpr(pattern, text)", "str_locate_all(string, pattern)", "gregexpr(pattern, text)", "str_match(string, pattern)", "regmatches(x, m = regexec(pattern, text))", "str_order(string)", "order(...)", "str_replace(string, pattern, replacement)", "sub(pattern, replacement, x)", "str_replace_all(string, pattern, replacement)", "gsub(pattern, replacement, x)", "str_sort(string)", "sort(x)", "str_split(string, pattern)", "strsplit(x, split)", "str_sub(string, start, end)", "substr(x, start, stop)", "str_subset(string, pattern)", "grep(pattern, x, value = TRUE)", "str_to_lower(string)", "tolower(x)", "str_to_title(string)", "tools::toTitleCase(text)", "str_to_upper(string)", "toupper(x)", "str_trim(string)", "trimws(x)", "str_which(string, pattern)", "grep(pattern, x)", "str_wrap(string)", "strwrap(x)" ) # create MD table, arranged alphabetically by stringr fn name data_stringr_base_diff %>% dplyr::mutate(dplyr::across(.fns = ~ paste0("`", .x, "`"))) %>% dplyr::arrange(stringr) %>% dplyr::rename(`base R` = base_r) %>% gt::gt() %>% gt::fmt_markdown(columns = everything()) %>% gt::tab_options(column_labels.font.weight = "bold") ``` Overall the main differences between base R and stringr are: 1. stringr functions start with `str_` prefix; base R string functions have no consistent naming scheme. 1. The order of inputs is usually different between base R and stringr. In base R, the `pattern` to match usually comes first; in stringr, the `string` to manupulate always comes first. This makes stringr easier to use in pipes, and with `lapply()` or `purrr::map()`. 1. Functions in stringr tend to do less, where many of the string processing functions in base R have multiple purposes. 1. The output and input of stringr functions has been carefully designed. For example, the output of `str_locate()` can be fed directly into `str_sub()`; the same is not true of `regpexpr()` and `substr()`. 1. Base functions use arguments (like `perl`, `fixed`, and `ignore.case`) to control how the pattern is interpreted. To avoid dependence between arguments, stringr instead uses helper functions (like `fixed()`, `regex()`, and `coll()`). Next we'll walk through each of the functions, noting the similarities and important differences. These examples are adapted from the stringr documentation and here they are contrasted with the analogous base R operations. # Detect matches ## `str_detect()`: Detect the presence or absence of a pattern in a string Suppose you want to know whether each word in a vector of fruit names contains an "a". ```{r} fruit <- c("apple", "banana", "pear", "pineapple") # base grepl(pattern = "a", x = fruit) # stringr str_detect(fruit, pattern = "a") ``` In base you would use `grepl()` (see the "l" and think logical) while in stringr you use `str_detect()` (see the verb "detect" and think of a yes/no action). ## `str_which()`: Find positions matching a pattern Now you want to identify the positions of the words in a vector of fruit names that contain an "a". 
```{r} # base grep(pattern = "a", x = fruit) # stringr str_which(fruit, pattern = "a") ``` In base you would use `grep()` while in stringr you use `str_which()` (by analogy to `which()`). ## `str_count()`: Count the number of matches in a string How many "a"s are in each fruit? ```{r} # base loc <- gregexpr(pattern = "a", text = fruit, fixed = TRUE) sapply(loc, function(x) length(attr(x, "match.length"))) # stringr str_count(fruit, pattern = "a") ``` This information can be gleaned from `gregexpr()` in base, but you need to look at the `match.length` attribute as the vector uses a length-1 integer vector (`-1`) to indicate no match. ## `str_locate()`: Locate the position of patterns in a string Within each fruit, where does the first "p" occur? Where are all of the "p"s? ```{r} fruit3 <- c("papaya", "lime", "apple") # base str(gregexpr(pattern = "p", text = fruit3)) # stringr str_locate(fruit3, pattern = "p") str_locate_all(fruit3, pattern = "p") ``` # Subset strings ## `str_sub()`: Extract and replace substrings from a character vector What if we want to grab part of a string? ```{r} hw <- "Hadley Wickham" # base substr(hw, start = 1, stop = 6) substring(hw, first = 1) # stringr str_sub(hw, start = 1, end = 6) str_sub(hw, start = 1) str_sub(hw, end = 6) ``` In base you could use `substr()` or `substring()`. The former requires both a start and stop of the substring while the latter assumes the stop will be the end of the string. The stringr version, `str_sub()` has the same functionality, but also gives a default start value (the beginning of the string). Both the base and stringr functions have the same order of expected inputs. In stringr you can use negative numbers to index from the right-hand side string: -1 is the last letter, -2 is the second to last, and so on. ```{r} str_sub(hw, start = 1, end = -1) str_sub(hw, start = -5, end = -2) ``` Both base R and stringr subset are vectorized over their parameters. This means you can either choose the same subset across multiple strings or specify different subsets for different strings. 
```{r} al <- "Ada Lovelace" # base substr(c(hw,al), start = 1, stop = 6) substr(c(hw,al), start = c(1,1), stop = c(6,7)) # stringr str_sub(c(hw,al), start = 1, end = -1) str_sub(c(hw,al), start = c(1,1), end = c(-1,-2)) ``` stringr will automatically recycle the first argument to the same length as `start` and `stop`: ```{r} str_sub(hw, start = 1:5) ``` Whereas the base equivalent silently uses just the first value: ```{r} substr(hw, start = 1:5, stop = 15) ``` ## `str_sub() <- `: Subset assignment `substr()` behaves in a surprising way when you replace a substring with a different number of characters: ```{r} # base x <- "ABCDEF" substr(x, 1, 3) <- "x" x ``` `str_sub()` does what you would expect: ```{r} # stringr x <- "ABCDEF" str_sub(x, 1, 3) <- "x" x ``` ## `str_subset()`: Keep strings matching a pattern, or find positions We may want to retrieve strings that contain a pattern of interest: ```{r} # base grep(pattern = "g", x = fruit, value = TRUE) # stringr str_subset(fruit, pattern = "g") ``` ## `str_extract()`: Extract matching patterns from a string We may want to pick out certain patterns from a string, for example, the digits in a shopping list: ```{r} shopping_list <- c("apples x4", "bag of flour", "10", "milk x2") # base matches <- regexpr(pattern = "\\d+", text = shopping_list) # digits regmatches(shopping_list, m = matches) matches <- gregexpr(pattern = "[a-z]+", text = shopping_list) # words regmatches(shopping_list, m = matches) # stringr str_extract(shopping_list, pattern = "\\d+") str_extract_all(shopping_list, "[a-z]+") ``` Base R requires the combination of `regexpr()` with `regmatches()`; but note that the strings without matches are dropped from the output. stringr provides `str_extract()` and `str_extract_all()`, and the output is always the same length as the input. ## `str_match()`: Extract matched groups from a string We may also want to extract groups from a string. Here I'm going to use the scenario from Section 14.4.3 in [R for Data Science](https://r4ds.had.co.nz/strings.html). ```{r} head(sentences) noun <- "([A]a|[Tt]he) ([^ ]+)" # base matches <- regexec(pattern = noun, text = head(sentences)) do.call("rbind", regmatches(x = head(sentences), m = matches)) # stringr str_match(head(sentences), pattern = noun) ``` As for extracting the full match base R requires the combination of two functions, and inputs with no matches are dropped from the output. # Manage lengths ## `str_length()`: The length of a string To determine the length of a string, base R uses `nchar()` (not to be confused with `length()` which gives the length of vectors, etc.) while stringr uses `str_length()`. ```{r} # base nchar(letters) # stringr str_length(letters) ``` There are some subtle differences between base and stringr here. `nchar()` requires a character vector, so it will return an error if used on a factor. `str_length()` can handle a factor input. ```{r} #| error: true # base nchar(factor("abc")) ``` ```{r} # stringr str_length(factor("abc")) ``` Note that "characters" is a poorly defined concept, and technically both `nchar()` and `str_length()` returns the number of code points. This is usually the same as what you'd consider to be a charcter, but not always: ```{r} x <- c("\u00fc", "u\u0308") x nchar(x) str_length(x) ``` ## `str_pad()`: Pad a string To pad a string to a certain width, use stringr's `str_pad()`. In base R you could use `sprintf()`, but unlike `str_pad()`, `sprintf()` has many other functionalities. 
```{r} # base sprintf("%30s", "hadley") sprintf("%-30s", "hadley") # "both" is not as straightforward # stringr rbind( str_pad("hadley", 30, "left"), str_pad("hadley", 30, "right"), str_pad("hadley", 30, "both") ) ``` ## `str_trunc()`: Truncate a character string The stringr package provides an easy way to truncate a character string: `str_trunc()`. Base R has no function to do this directly. ```{r} x <- "This string is moderately long" # stringr rbind( str_trunc(x, 20, "right"), str_trunc(x, 20, "left"), str_trunc(x, 20, "center") ) ``` ## `str_trim()`: Trim whitespace from a string Similarly, stringr provides `str_trim()` to trim whitespace from a string. This is analogous to base R's `trimws()` added in R 3.3.0. ```{r} # base trimws(" String with trailing and leading white space\t") trimws("\n\nString with trailing and leading white space\n\n") # stringr str_trim(" String with trailing and leading white space\t") str_trim("\n\nString with trailing and leading white space\n\n") ``` The stringr function `str_squish()` allows for extra whitespace within a string to be trimmed (in contrast to `str_trim()` which removes whitespace at the beginning and/or end of string). In base R, one might take advantage of `gsub()` to accomplish the same effect. ```{r} # stringr str_squish(" String with trailing, middle, and leading white space\t") str_squish("\n\nString with excess, trailing and leading white space\n\n") ``` ## `str_wrap()`: Wrap strings into nicely formatted paragraphs `strwrap()` and `str_wrap()` use different algorithms. `str_wrap()` uses the famous [Knuth-Plass algorithm](http://litherum.blogspot.com/2015/07/knuth-plass-line-breaking-algorithm.html). ```{r} gettysburg <- "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal." # base cat(strwrap(gettysburg, width = 60), sep = "\n") # stringr cat(str_wrap(gettysburg, width = 60), "\n") ``` Note that `strwrap()` returns a character vector with one element for each line; `str_wrap()` returns a single string containing line breaks. # Mutate strings ## `str_replace()`: Replace matched patterns in a string To replace certain patterns within a string, stringr provides the functions `str_replace()` and `str_replace_all()`. The base R equivalents are `sub()` and `gsub()`. Note the difference in default input order again. ```{r} fruits <- c("apple", "banana", "pear", "pineapple") # base sub("[aeiou]", "-", fruits) gsub("[aeiou]", "-", fruits) # stringr str_replace(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", "-") ``` ## case: Convert case of a string Both stringr and base R have functions to convert to upper and lower case. Title case is also provided in stringr. ```{r} dog <- "The quick brown dog" # base toupper(dog) tolower(dog) tools::toTitleCase(dog) # stringr str_to_upper(dog) str_to_lower(dog) str_to_title(dog) ``` In stringr we can control the locale, while in base R locale distinctions are controlled with global variables. Therefore, the output of your base R code may vary across different computers with different global settings. ```{r} # stringr str_to_upper("i") # English str_to_upper("i", locale = "tr") # Turkish ``` # Join and split ## `str_flatten()`: Flatten a string If we want to take elements of a string vector and collapse them to a single string we can use the `collapse` argument in `paste()` or use stringr's `str_flatten()`. 
```{r} # base paste0(letters, collapse = "-") # stringr str_flatten(letters, collapse = "-") ``` The advantage of `str_flatten()` is that it always returns a vector the same length as its input; to predict the return length of `paste()` you must carefully read all arguments. ## `str_dup()`: duplicate strings within a character vector To duplicate strings within a character vector use `strrep()` (in R 3.3.0 or greater) or `str_dup()`: ```{r} #| eval: !expr getRversion() >= "3.3.0" fruit <- c("apple", "pear", "banana") # base strrep(fruit, 2) strrep(fruit, 1:3) # stringr str_dup(fruit, 2) str_dup(fruit, 1:3) ``` ## `str_split()`: Split up a string into pieces To split a string into pieces with breaks based on a particular pattern match stringr uses `str_split()` and base R uses `strsplit()`. Unlike most other functions, `strsplit()` starts with the character vector to modify. ```{r} fruits <- c( "apples and oranges and pears and bananas", "pineapples and mangos and guavas" ) # base strsplit(fruits, " and ") # stringr str_split(fruits, " and ") ``` The stringr package's `str_split()` allows for more control over the split, including restricting the number of possible matches. ```{r} # stringr str_split(fruits, " and ", n = 3) str_split(fruits, " and ", n = 2) ``` ## `str_glue()`: Interpolate strings It's often useful to interpolate varying values into a fixed string. In base R, you can use `sprintf()` for this purpose; stringr provides a wrapper for the more general purpose [glue](https://glue.tidyverse.org) package. ```{r} name <- "Fred" age <- 50 anniversary <- as.Date("1991-10-12") # base sprintf( "My name is %s my age next year is %s and my anniversary is %s.", name, age + 1, format(anniversary, "%A, %B %d, %Y") ) # stringr str_glue( "My name is {name}, ", "my age next year is {age + 1}, ", "and my anniversary is {format(anniversary, '%A, %B %d, %Y')}." ) ``` # Order strings ## `str_order()`: Order or sort a character vector Both base R and stringr have separate functions to order and sort strings. ```{r} # base order(letters) sort(letters) # stringr str_order(letters) str_sort(letters) ``` Some options in `str_order()` and `str_sort()` don't have analogous base R options. For example, the stringr functions have a `locale` argument to control how to order or sort. In base R the locale is a global setting, so the outputs of `sort()` and `order()` may differ across different computers. For example, in the Norwegian alphabet, å comes after z: ```{r} x <- c("å", "a", "z") str_sort(x) str_sort(x, locale = "no") ``` The stringr functions also have a `numeric` argument to sort digits numerically instead of treating them as strings. ```{r} # stringr x <- c("100a10", "100a5", "2b", "2a") str_sort(x) str_sort(x, numeric = TRUE) ``` stringr/inst/doc/regular-expressions.html0000644000176200001440000014065214524706130020405 0ustar liggesusers Regular expressions

Regular expressions

Regular expressions are a concise and flexible tool for describing patterns in strings. This vignette describes the key features of stringr’s regular expressions, as implemented by stringi. It is not a tutorial, so if you’re unfamiliar with regular expressions, I’d recommend starting at https://r4ds.had.co.nz/strings.html. If you want to master the details, I’d recommend reading the classic Mastering Regular Expressions by Jeffrey E. F. Friedl.

Regular expressions are the default pattern engine in stringr. That means when you use a pattern matching function with a bare string, it’s equivalent to wrapping it in a call to regex():

# The regular call:
str_extract(fruit, "nana")
# Is shorthand for
str_extract(fruit, regex("nana"))

You will need to use regex() explicitly if you want to override the default options, as you’ll see in examples below.

Basic matches

The simplest patterns match exact strings:

x <- c("apple", "banana", "pear")
str_extract(x, "an")
#> [1] NA   "an" NA

You can perform a case-insensitive match using ignore_case = TRUE:

bananas <- c("banana", "Banana", "BANANA")
str_detect(bananas, "banana")
#> [1]  TRUE FALSE FALSE
str_detect(bananas, regex("banana", ignore_case = TRUE))
#> [1] TRUE TRUE TRUE

The next step up in complexity is ., which matches any character except a newline:

str_extract(x, ".a.")
#> [1] NA    "ban" "ear"

You can allow . to match everything, including \n, by setting dotall = TRUE:

str_detect("\nX\n", ".X.")
#> [1] FALSE
str_detect("\nX\n", regex(".X.", dotall = TRUE))
#> [1] TRUE

Escaping

If “.” matches any character, how do you match a literal “.”? You need to use an “escape” to tell the regular expression you want to match it exactly, not use its special behaviour. Like strings, regexps use the backslash, \, to escape special behaviour. So to match an ., you need the regexp \.. Unfortunately this creates a problem. We use strings to represent regular expressions, and \ is also used as an escape symbol in strings. So to create the regular expression \. we need the string "\\.".

# To create the regular expression, we need \\
dot <- "\\."

# But the expression itself only contains one:
writeLines(dot)
#> \.

# And this tells R to look for an explicit .
str_extract(c("abc", "a.c", "bef"), "a\\.c")
#> [1] NA    "a.c" NA

If \ is used as an escape character in regular expressions, how do you match a literal \? Well you need to escape it, creating the regular expression \\. To create that regular expression, you need to use a string, which also needs to escape \. That means to match a literal \ you need to write "\\\\" — you need four backslashes to match one!

x <- "a\\b"
writeLines(x)
#> a\b

str_extract(x, "\\\\")
#> [1] "\\"

In this vignette, I use \. to denote the regular expression, and "\\." to denote the string that represents the regular expression.

An alternative quoting mechanism is \Q...\E: all the characters in ... are treated as exact matches. This is useful if you want to exactly match user input as part of a regular expression.

x <- c("a.b.c.d", "aeb")
starts_with <- "a.b"

str_detect(x, paste0("^", starts_with))
#> [1] TRUE TRUE
str_detect(x, paste0("^\\Q", starts_with, "\\E"))
#> [1]  TRUE FALSE

Special characters

Escapes also allow you to specify individual characters that are otherwise hard to type. You can specify individual unicode characters in five ways, either as a variable number of hex digits (four is most common), or by name:

  • \xhh: 2 hex digits.

  • \x{hhhh}: 1-6 hex digits.

  • \uhhhh: 4 hex digits.

  • \Uhhhhhhhh: 8 hex digits.

  • \N{name}, e.g. \N{grinning face} matches the basic smiling emoji.

Similarly, you can specify many common control characters:

  • \a: bell.

  • \cX: match a control-X character.

  • \e: escape (\u001B).

  • \f: form feed (\u000C).

  • \n: line feed (\u000A).

  • \r: carriage return (\u000D).

  • \t: horizontal tabulation (\u0009).

  • \0ooo: matches an octal character. ‘ooo’ is from one to three octal digits, from 000 to 0377. The leading zero is required.

(Many of these are only of historical interest and are only included here for the sake of completeness.)
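To make the distinction between string escapes and regex escapes concrete, here is a small sketch (not part of the original vignette; the emoji example assumes your console can display it):

# "caf\u00e9" uses an R string escape; "\\x{e9}" is the equivalent regex escape
str_detect("caf\u00e9", "caf\\x{e9}")           # should be TRUE
# Match a character by its Unicode name
str_detect("\U0001F600", "\\N{grinning face}")  # should be TRUE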

Matching multiple characters

There are a number of patterns that match more than one character. You’ve already seen ., which matches any character (except a newline). A closely related operator is \X, which matches a grapheme cluster, a set of individual elements that form a single symbol. For example, one way of representing “á” is as the letter “a” plus an accent: . will match the component “a”, while \X will match the complete symbol:

x <- "a\u0301"
str_extract(x, ".")
#> [1] "a"
str_extract(x, "\\X")
#> [1] "á"

There are five other escaped pairs that match narrower classes of characters:

  • \d: matches any digit. The complement, \D, matches any character that is not a decimal digit.

    str_extract_all("1 + 2 = 3", "\\d+")[[1]]
    #> [1] "1" "2" "3"

    Technically, \d includes any character in the Unicode Category of Nd (“Number, Decimal Digit”), which also includes numeric symbols from other languages:

    # Some Khmer numbers
    str_detect("១២៣", "\\d")
    #> [1] TRUE
  • \s: matches any whitespace. This includes tabs, newlines, form feeds, and any character in the Unicode Z Category (which includes a variety of space characters and other separators). The complement, \S, matches any non-whitespace character.

    (text <- "Some  \t badly\n\t\tspaced \f text")
    #> [1] "Some  \t badly\n\t\tspaced \f text"
    str_replace_all(text, "\\s+", " ")
    #> [1] "Some badly spaced text"
  • \p{property name} matches any character with a specific unicode property, like \p{Uppercase} or \p{Diacritic}. The complement, \P{property name}, matches all characters without the property. A complete list of unicode properties can be found at http://www.unicode.org/reports/tr44/#Property_Index.

    (text <- c('"Double quotes"', "«Guillemet»", "“Fancy quotes”"))
    #> [1] "\"Double quotes\"" "«Guillemet»"       "“Fancy quotes”"
    str_replace_all(text, "\\p{quotation mark}", "'")
    #> [1] "'Double quotes'" "'Guillemet'"     "'Fancy quotes'"
  • \w matches any “word” character, which includes alphabetic characters, marks and decimal numbers. The complement, \W, matches any non-word character.

    str_extract_all("Don't eat that!", "\\w+")[[1]]
    #> [1] "Don"  "t"    "eat"  "that"
    str_split("Don't eat that!", "\\W")[[1]]
    #> [1] "Don"  "t"    "eat"  "that" ""

    Technically, \w also matches connector punctuation, \u200c (zero width connector), and \u200d (zero width joiner), but these are rarely seen in the wild.

  • \b matches word boundaries, the transition between word and non-word characters. \B matches the opposite: positions that are not word boundaries, i.e. where the characters on both sides are either both word characters or both non-word characters.

    str_replace_all("The quick brown fox", "\\b", "_")
    #> [1] "_The_ _quick_ _brown_ _fox_"
    str_replace_all("The quick brown fox", "\\B", "_")
    #> [1] "T_h_e q_u_i_c_k b_r_o_w_n f_o_x"

You can also create your own character classes using []:

  • [abc]: matches a, b, or c.
  • [a-z]: matches every character between a and z (in Unicode code point order).
  • [^abc]: matches anything except a, b, or c.
  • [\^\-]: matches ^ or -.

There are a number of pre-built classes that you can use inside []:

  • [:punct:]: punctuation.
  • [:alpha:]: letters.
  • [:lower:]: lowercase letters.
  • [:upper:]: uppercase letters.
  • [:digit:]: digits.
  • [:xdigit:]: hex digits.
  • [:alnum:]: letters and numbers.
  • [:cntrl:]: control characters.
  • [:graph:]: letters, numbers, and punctuation.
  • [:print:]: letters, numbers, punctuation, and whitespace.
  • [:space:]: space characters (basically equivalent to \s).
  • [:blank:]: space and tab.

These all go inside the [] for character classes, i.e. [[:digit:]AX] matches all digits, A, and X.

You can also use Unicode properties, like [\p{Letter}], and various set operations, like [\p{Letter}--\p{script=latin}]. See ?"stringi-search-charclass" for details.
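For example, a brief sketch (added here, not from the original text) combining custom and pre-built classes:

x <- c("grey", "gray", "r2d2")
str_extract(x, "gr[ea]y")                            # custom class: "grey", "gray", NA
str_extract_all("r2d2 AX-7", "[[:digit:]AX]")[[1]]   # digits plus the literals A and X
str_replace_all("so, many, commas!", "[[:punct:]]", "")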

Alternation

| is the alternation operator, which will pick between one or more possible matches. For example, abc|def will match abc or def:

str_detect(c("abc", "def", "ghi"), "abc|def")
#> [1]  TRUE  TRUE FALSE

Note that the precedence for | is low: abc|def is equivalent to (abc)|(def) not ab(c|d)ef.
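A quick sketch (added here) of that precedence difference:

str_detect(c("abc", "abdef"), "abc|def")    # TRUE TRUE: "abc" or "def" anywhere
str_detect(c("abc", "abdef"), "ab(c|d)ef")  # FALSE TRUE: needs "abcef" or "abdef"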

Grouping

You can use parentheses to override the default precedence rules:

str_extract(c("grey", "gray"), "gre|ay")
#> [1] "gre" "ay"
str_extract(c("grey", "gray"), "gr(e|a)y")
#> [1] "grey" "gray"

Parentheses also define “groups” that you can refer to with backreferences, like \1, \2 etc, and can be extracted with str_match(). For example, the following regular expression finds all fruits that have a repeated pair of letters:

pattern <- "(..)\\1"
fruit %>% 
  str_subset(pattern)
#> [1] "banana"

fruit %>% 
  str_subset(pattern) %>% 
  str_match(pattern)
#>      [,1]   [,2]
#> [1,] "anan" "an"

You can use (?:...), the non-grouping parentheses, to control precedence but not capture the match in a group. This is slightly more efficient than capturing parentheses.

str_match(c("grey", "gray"), "gr(e|a)y")
#>      [,1]   [,2]
#> [1,] "grey" "e" 
#> [2,] "gray" "a"
str_match(c("grey", "gray"), "gr(?:e|a)y")
#>      [,1]  
#> [1,] "grey"
#> [2,] "gray"

This is most useful for more complex cases where you need to capture matches and control precedence independently.
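As a sketch of that mixed use (not part of the original vignette), capture the number word and the fruit while using a non-capturing group only to make the plural “s” optional:

x <- c("one apple", "two apples")
str_match(x, "(\\w+) (apple)(?:s)?")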

Anchors

By default, regular expressions will match any part of a string. It’s often useful to anchor the regular expression so that it matches from the start or end of the string:

  • ^ matches the start of string.
  • $ matches the end of the string.
x <- c("apple", "banana", "pear")
str_extract(x, "^a")
#> [1] "a" NA  NA
str_extract(x, "a$")
#> [1] NA  "a" NA

To match a literal “$” or “^”, you need to escape them: \$ and \^.
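For instance (a small sketch added here):

str_extract(c("cost: $400", "x^2 + y"), "\\$\\d+")   # "$400", NA
str_detect("x^2 + y", "\\^")                         # TRUE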

For multiline strings, you can use regex(multiline = TRUE). This changes the behaviour of ^ and $, and introduces three new operators:

  • ^ now matches the start of each line.

  • $ now matches the end of each line.

  • \A matches the start of the input.

  • \z matches the end of the input.

  • \Z matches the end of the input, but before the final line terminator, if it exists.

x <- "Line 1\nLine 2\nLine 3\n"
str_extract_all(x, "^Line..")[[1]]
#> [1] "Line 1"
str_extract_all(x, regex("^Line..", multiline = TRUE))[[1]]
#> [1] "Line 1" "Line 2" "Line 3"
str_extract_all(x, regex("\\ALine..", multiline = TRUE))[[1]]
#> [1] "Line 1"

Repetition

You can control how many times a pattern matches with the repetition operators:

  • ?: 0 or 1.
  • +: 1 or more.
  • *: 0 or more.
x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"
str_extract(x, "CC?")
#> [1] "CC"
str_extract(x, "CC+")
#> [1] "CCC"
str_extract(x, 'C[LX]+')
#> [1] "CLXXX"

Note that the precedence of these operators is high, so you can write colou?r to match either American or British spellings. That means most uses will need parentheses, like bana(na)+.
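A quick check of those two patterns (a sketch added here):

str_detect(c("color", "colour"), "colou?r")   # TRUE TRUE
str_extract("banananana", "bana(na)+")        # "banananana"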

You can also specify the number of matches precisely:

  • {n}: exactly n
  • {n,}: n or more
  • {n,m}: between n and m
str_extract(x, "C{2}")
#> [1] "CC"
str_extract(x, "C{2,}")
#> [1] "CCC"
str_extract(x, "C{2,3}")
#> [1] "CCC"

By default these matches are “greedy”: they will match the longest string possible. You can make them “lazy”, matching the shortest string possible by putting a ? after them:

  • ??: 0 or 1, prefer 0.
  • +?: 1 or more, match as few times as possible.
  • *?: 0 or more, match as few times as possible.
  • {n,}?: n or more, match as few times as possible.
  • {n,m}?: between n and m, match as few times as possible, but at least n.
str_extract(x, c("C{2,3}", "C{2,3}?"))
#> [1] "CCC" "CC"
str_extract(x, c("C[LX]+", "C[LX]+?"))
#> [1] "CLXXX" "CL"

You can also make the matches possessive by putting a + after them, which means that if later parts of the match fail, the repetition will not be re-tried with a smaller number of characters. This is an advanced feature used to improve performance in worst-case scenarios (called “catastrophic backtracking”).

  • ?+: 0 or 1, possessive.
  • ++: 1 or more, possessive.
  • *+: 0 or more, possessive.
  • {n}+: exactly n, possessive.
  • {n,}+: n or more, possessive.
  • {n,m}+: between n and m, possessive.
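A minimal sketch (not in the original text) of how a possessive quantifier differs from a greedy one: because it never gives characters back, a match that greedy repetition would rescue by back-tracking simply fails:

str_detect("aaaa", "a+a")    # TRUE: a+ backs off one "a" so the final "a" can match
str_detect("aaaa", "a++a")   # FALSE: a++ keeps every "a", leaving none for the final "a"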

A related concept is the atomic-match parenthesis, (?>...). If a later match fails and the engine needs to back-track, an atomic match is kept as is: it succeeds or fails as a whole. Compare the following two regular expressions:

str_detect("ABC", "(?>A|.B)C")
#> [1] FALSE
str_detect("ABC", "(?:A|.B)C")
#> [1] TRUE

The atomic match fails: the group matches A, but the pattern then needs a C while the next character is a B, and because the group is atomic the engine will not go back and try .B instead, so the whole match fails. The regular match succeeds: the group first matches A, C then fails to match B, so the engine back-tracks, matches AB with .B, and then C matches.

Look arounds

These assertions look ahead or behind the current match without “consuming” any characters (i.e. changing the input position).

  • (?=...): positive look-ahead assertion. Matches if ... matches at the current input.

  • (?!...): negative look-ahead assertion. Matches if ... does not match at the current input.

  • (?<=...): positive look-behind assertion. Matches if ... matches text preceding the current position, with the last character of the match being the character just before the current position. Length must be bounded
    (i.e. no * or +).

  • (?<!...): negative look-behind assertion. Matches if ... does not match text preceding the current position. Length must be bounded
    (i.e. no * or +).

These are useful when you want to check that a pattern exists, but you don’t want to include it in the result:

x <- c("1 piece", "2 pieces", "3")
str_extract(x, "\\d+(?= pieces?)")
#> [1] "1" "2" NA

y <- c("100", "$400")
str_extract(y, "(?<=\\$)\\d+")
#> [1] NA    "400"

Comments

There are two ways to include comments in a regular expression. The first is with (?#...):

str_detect("xyz", "x(?#this is a comment)")
#> [1] TRUE

The second is to use regex(comments = TRUE). This form ignores spaces and newlines, and everything after #. To match a literal space, you’ll need to escape it: "\\ ". This is a useful way of describing complex regular expressions:

phone <- regex("
  \\(?       # optional opening parens
  (\\d{3})   # area code
  \\)?       # optional closing parens
  (?:-|\\ )? # optional dash or space
  (\\d{3})   # another three numbers
  (?:-|\\ )? # optional dash or space
  (\\d{3})   # three more numbers
  ", comments = TRUE)

str_match(c("514-791-8141", "(514) 791 8141"), phone)
#>      [,1]            [,2]  [,3]  [,4] 
#> [1,] "514-791-814"   "514" "791" "814"
#> [2,] "(514) 791 814" "514" "791" "814"
stringr/inst/doc/from-base.html0000644000176200001440000026610414524706127016246 0ustar liggesusers From base R

From base R

Sara Stoudt

This vignette compares stringr functions to their base R equivalents to help users transitioning from using base R to stringr.

Overall differences

We’ll begin with a lookup table between the most important stringr functions and their base R equivalents.

stringr                                           base R
-------                                           ------
str_detect(string, pattern)                       grepl(pattern, x)
str_dup(string, times)                            strrep(x, times)
str_extract(string, pattern)                      regmatches(x, m = regexpr(pattern, text))
str_extract_all(string, pattern)                  regmatches(x, m = gregexpr(pattern, text))
str_length(string)                                nchar(x)
str_locate(string, pattern)                       regexpr(pattern, text)
str_locate_all(string, pattern)                   gregexpr(pattern, text)
str_match(string, pattern)                        regmatches(x, m = regexec(pattern, text))
str_order(string)                                 order(...)
str_replace(string, pattern, replacement)         sub(pattern, replacement, x)
str_replace_all(string, pattern, replacement)     gsub(pattern, replacement, x)
str_sort(string)                                  sort(x)
str_split(string, pattern)                        strsplit(x, split)
str_sub(string, start, end)                       substr(x, start, stop)
str_subset(string, pattern)                       grep(pattern, x, value = TRUE)
str_to_lower(string)                              tolower(x)
str_to_title(string)                              tools::toTitleCase(text)
str_to_upper(string)                              toupper(x)
str_trim(string)                                  trimws(x)
str_which(string, pattern)                        grep(pattern, x)
str_wrap(string)                                  strwrap(x)

Overall the main differences between base R and stringr are:

  1. stringr functions start with str_ prefix; base R string functions have no consistent naming scheme.

  2. The order of inputs is usually different between base R and stringr. In base R, the pattern to match usually comes first; in stringr, the string to manipulate always comes first. This makes stringr easier to use in pipes, and with lapply() or purrr::map().

  3. Functions in stringr tend to do less, whereas many of the string processing functions in base R have multiple purposes.

  4. The output and input of stringr functions have been carefully designed. For example, the output of str_locate() can be fed directly into str_sub(); the same is not true of regexpr() and substr() (see the short example after this list).

  5. Base functions use arguments (like perl, fixed, and ignore.case) to control how the pattern is interpreted. To avoid dependence between arguments, stringr instead uses helper functions (like fixed(), regex(), and coll()).
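As a small illustration of point 4 (a sketch, not from the original vignette), the matrix returned by str_locate() slots straight into str_sub():

x <- c("papaya", "apple")
(loc <- str_locate(x, "p+"))
str_sub(x, loc[, "start"], loc[, "end"])   # "p" "pp"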

Next we’ll walk through each of the functions, noting the similarities and important differences. These examples are adapted from the stringr documentation and here they are contrasted with the analogous base R operations.

Detect matches

str_detect(): Detect the presence or absence of a pattern in a string

Suppose you want to know whether each word in a vector of fruit names contains an “a”.

fruit <- c("apple", "banana", "pear", "pineapple")

# base
grepl(pattern = "a", x = fruit)
#> [1] TRUE TRUE TRUE TRUE

# stringr
str_detect(fruit, pattern = "a")
#> [1] TRUE TRUE TRUE TRUE

In base you would use grepl() (see the “l” and think logical) while in stringr you use str_detect() (see the verb “detect” and think of a yes/no action).

str_which(): Find positions matching a pattern

Now you want to identify the positions of the words in a vector of fruit names that contain an “a”.

# base
grep(pattern = "a", x = fruit)
#> [1] 1 2 3 4

# stringr
str_which(fruit, pattern = "a")
#> [1] 1 2 3 4

In base you would use grep() while in stringr you use str_which() (by analogy to which()).

str_count(): Count the number of matches in a string

How many “a”s are in each fruit?

# base 
loc <- gregexpr(pattern = "a", text = fruit, fixed = TRUE)
sapply(loc, function(x) length(attr(x, "match.length")))
#> [1] 1 3 1 1

# stringr
str_count(fruit, pattern = "a")
#> [1] 1 3 1 1

This information can be gleaned from gregexpr() in base R, but you need to look at the match.length attribute, because a non-match is signalled by a length-1 vector containing -1.
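A small sketch (added here) of the no-match case that makes the base approach awkward:

loc <- gregexpr("a", c("banana", "xyz"), fixed = TRUE)
sapply(loc, function(m) sum(attr(m, "match.length") != -1))   # 3 0
str_count(c("banana", "xyz"), "a")                            # 3 0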

str_locate(): Locate the position of patterns in a string

Within each fruit, where does the first “p” occur? Where are all of the “p”s?

fruit3 <- c("papaya", "lime", "apple")

# base
str(gregexpr(pattern = "p", text = fruit3))
#> List of 3
#>  $ : int [1:2] 1 3
#>   ..- attr(*, "match.length")= int [1:2] 1 1
#>   ..- attr(*, "index.type")= chr "chars"
#>   ..- attr(*, "useBytes")= logi TRUE
#>  $ : int -1
#>   ..- attr(*, "match.length")= int -1
#>   ..- attr(*, "index.type")= chr "chars"
#>   ..- attr(*, "useBytes")= logi TRUE
#>  $ : int [1:2] 2 3
#>   ..- attr(*, "match.length")= int [1:2] 1 1
#>   ..- attr(*, "index.type")= chr "chars"
#>   ..- attr(*, "useBytes")= logi TRUE

# stringr
str_locate(fruit3, pattern = "p")
#>      start end
#> [1,]     1   1
#> [2,]    NA  NA
#> [3,]     2   2
str_locate_all(fruit3, pattern = "p")
#> [[1]]
#>      start end
#> [1,]     1   1
#> [2,]     3   3
#> 
#> [[2]]
#>      start end
#> 
#> [[3]]
#>      start end
#> [1,]     2   2
#> [2,]     3   3

Subset strings

str_sub(): Extract and replace substrings from a character vector

What if we want to grab part of a string?

hw <- "Hadley Wickham"

# base
substr(hw, start = 1, stop = 6)
#> [1] "Hadley"
substring(hw, first = 1) 
#> [1] "Hadley Wickham"

# stringr
str_sub(hw, start = 1, end = 6)
#> [1] "Hadley"
str_sub(hw, start = 1)
#> [1] "Hadley Wickham"
str_sub(hw, end = 6)
#> [1] "Hadley"

In base you could use substr() or substring(). The former requires both a start and stop of the substring while the latter assumes the stop will be the end of the string. The stringr version, str_sub() has the same functionality, but also gives a default start value (the beginning of the string). Both the base and stringr functions have the same order of expected inputs.

In stringr you can use negative numbers to index from the right-hand side of the string: -1 is the last letter, -2 is the second to last, and so on.

str_sub(hw, start = 1, end = -1)
#> [1] "Hadley Wickham"
str_sub(hw, start = -5, end = -2)
#> [1] "ckha"

Both the base R and stringr subsetting functions are vectorized over their arguments. This means you can either apply the same subset to multiple strings or specify different subsets for different strings.

al <- "Ada Lovelace"

# base
substr(c(hw,al), start = 1, stop = 6)
#> [1] "Hadley" "Ada Lo"
substr(c(hw,al), start = c(1,1), stop = c(6,7))
#> [1] "Hadley"  "Ada Lov"

# stringr
str_sub(c(hw,al), start = 1, end = -1)
#> [1] "Hadley Wickham" "Ada Lovelace"
str_sub(c(hw,al), start = c(1,1), end = c(-1,-2))
#> [1] "Hadley Wickham" "Ada Lovelac"

stringr will automatically recycle the first argument to the same length as start and stop:

str_sub(hw, start = 1:5)
#> [1] "Hadley Wickham" "adley Wickham"  "dley Wickham"   "ley Wickham"   
#> [5] "ey Wickham"

Whereas the base equivalent silently uses just the first value:

substr(hw, start = 1:5, stop = 15)
#> [1] "Hadley Wickham"

str_sub() <-: Subset assignment

substr() behaves in a surprising way when you replace a substring with a different number of characters:

# base
x <- "ABCDEF"
substr(x, 1, 3) <- "x"
x
#> [1] "xBCDEF"

str_sub() does what you would expect:

# stringr
x <- "ABCDEF"
str_sub(x, 1, 3) <- "x"
x
#> [1] "xDEF"

str_subset(): Keep strings matching a pattern, or find positions

We may want to retrieve strings that contain a pattern of interest:

# base
grep(pattern = "g", x = fruit, value = TRUE)
#> character(0)

# stringr
str_subset(fruit, pattern = "g")
#> character(0)

str_extract(): Extract matching patterns from a string

We may want to pick out certain patterns from a string, for example, the digits in a shopping list:

shopping_list <- c("apples x4", "bag of flour", "10", "milk x2")

# base
matches <- regexpr(pattern = "\\d+", text = shopping_list) # digits
regmatches(shopping_list, m = matches)
#> [1] "4"  "10" "2"

matches <- gregexpr(pattern = "[a-z]+", text = shopping_list) # words
regmatches(shopping_list, m = matches)
#> [[1]]
#> [1] "apples" "x"     
#> 
#> [[2]]
#> [1] "bag"   "of"    "flour"
#> 
#> [[3]]
#> character(0)
#> 
#> [[4]]
#> [1] "milk" "x"

# stringr
str_extract(shopping_list, pattern = "\\d+") 
#> [1] "4"  NA   "10" "2"
str_extract_all(shopping_list, "[a-z]+")
#> [[1]]
#> [1] "apples" "x"     
#> 
#> [[2]]
#> [1] "bag"   "of"    "flour"
#> 
#> [[3]]
#> character(0)
#> 
#> [[4]]
#> [1] "milk" "x"

Base R requires the combination of regexpr() with regmatches(), and note that strings without matches are dropped from the output. stringr provides str_extract() and str_extract_all(), and their output is always the same length as the input.

str_match(): Extract matched groups from a string

We may also want to extract groups from a string. Here I’m going to use the scenario from Section 14.4.3 in R for Data Science.

head(sentences)
#> [1] "The birch canoe slid on the smooth planks." 
#> [2] "Glue the sheet to the dark blue background."
#> [3] "It's easy to tell the depth of a well."     
#> [4] "These days a chicken leg is a rare dish."   
#> [5] "Rice is often served in round bowls."       
#> [6] "The juice of lemons makes fine punch."
noun <- "([A]a|[Tt]he) ([^ ]+)"

# base
matches <- regexec(pattern = noun, text = head(sentences))
do.call("rbind", regmatches(x = head(sentences), m = matches))
#>      [,1]        [,2]  [,3]   
#> [1,] "The birch" "The" "birch"
#> [2,] "the sheet" "the" "sheet"
#> [3,] "the depth" "the" "depth"
#> [4,] "The juice" "The" "juice"

# stringr
str_match(head(sentences), pattern = noun)
#>      [,1]        [,2]  [,3]   
#> [1,] "The birch" "The" "birch"
#> [2,] "the sheet" "the" "sheet"
#> [3,] "the depth" "the" "depth"
#> [4,] NA          NA    NA     
#> [5,] NA          NA    NA     
#> [6,] "The juice" "The" "juice"

As with extracting the full match, base R requires the combination of two functions, and inputs with no matches are dropped from the output.

Manage lengths

str_length(): The length of a string

To determine the length of a string, base R uses nchar() (not to be confused with length() which gives the length of vectors, etc.) while stringr uses str_length().

# base
nchar(letters)
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

# stringr
str_length(letters)
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

There are some subtle differences between base and stringr here. nchar() requires a character vector, so it will return an error if used on a factor. str_length() can handle a factor input.

# base
nchar(factor("abc")) 
#> Error in nchar(factor("abc")): 'nchar()' requires a character vector
# stringr
str_length(factor("abc"))
#> [1] 3

Note that “characters” is a poorly defined concept, and technically both nchar() and str_length() return the number of code points. This is usually the same as what you’d consider to be a character, but not always:

x <- c("\u00fc", "u\u0308")
x
#> [1] "ü" "ü"

nchar(x)
#> [1] 1 2
str_length(x)
#> [1] 1 2

str_pad(): Pad a string

To pad a string to a certain width, use stringr’s str_pad(). In base R you could use sprintf(), but unlike str_pad(), sprintf() has many other functionalities.

# base
sprintf("%30s", "hadley")
#> [1] "                        hadley"
sprintf("%-30s", "hadley")
#> [1] "hadley                        "
# "both" is not as straightforward

# stringr
rbind(
  str_pad("hadley", 30, "left"),
  str_pad("hadley", 30, "right"),
  str_pad("hadley", 30, "both")
)
#>      [,1]                            
#> [1,] "                        hadley"
#> [2,] "hadley                        "
#> [3,] "            hadley            "

str_trunc(): Truncate a character string

The stringr package provides an easy way to truncate a character string: str_trunc(). Base R has no function to do this directly.

x <- "This string is moderately long"

# stringr
rbind(
  str_trunc(x, 20, "right"),
  str_trunc(x, 20, "left"),
  str_trunc(x, 20, "center")
)
#>      [,1]                  
#> [1,] "This string is mo..."
#> [2,] "...s moderately long"
#> [3,] "This stri...ely long"

str_trim(): Trim whitespace from a string

Similarly, stringr provides str_trim() to trim whitespace from a string. This is analogous to base R’s trimws() added in R 3.3.0.

# base
trimws(" String with trailing and leading white space\t")
#> [1] "String with trailing and leading white space"
trimws("\n\nString with trailing and leading white space\n\n")
#> [1] "String with trailing and leading white space"

# stringr
str_trim(" String with trailing and leading white space\t")
#> [1] "String with trailing and leading white space"
str_trim("\n\nString with trailing and leading white space\n\n")
#> [1] "String with trailing and leading white space"

The stringr function str_squish() allows for extra whitespace within a string to be trimmed (in contrast to str_trim() which removes whitespace at the beginning and/or end of string). In base R, one might take advantage of gsub() to accomplish the same effect.

# stringr
str_squish(" String with trailing, middle,   and leading white space\t")
#> [1] "String with trailing, middle, and leading white space"
str_squish("\n\nString with excess, trailing and leading white space\n\n")
#> [1] "String with excess, trailing and leading white space"

str_wrap(): Wrap strings into nicely formatted paragraphs

strwrap() and str_wrap() use different algorithms. str_wrap() uses the famous Knuth-Plass algorithm.

gettysburg <- "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal."

# base
cat(strwrap(gettysburg, width = 60), sep = "\n")
#> Four score and seven years ago our fathers brought forth on
#> this continent, a new nation, conceived in Liberty, and
#> dedicated to the proposition that all men are created
#> equal.

# stringr
cat(str_wrap(gettysburg, width = 60), "\n")
#> Four score and seven years ago our fathers brought forth
#> on this continent, a new nation, conceived in Liberty, and
#> dedicated to the proposition that all men are created equal.

Note that strwrap() returns a character vector with one element for each line; str_wrap() returns a single string containing line breaks.
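A quick way to see that difference in shape (a sketch, not part of the original):

length(strwrap(gettysburg, width = 60))    # one element per wrapped line
length(str_wrap(gettysburg, width = 60))   # always 1: a single string containing "\n"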

Mutate strings

str_replace(): Replace matched patterns in a string

To replace certain patterns within a string, stringr provides the functions str_replace() and str_replace_all(). The base R equivalents are sub() and gsub(). Note the difference in default input order again.

fruits <- c("apple", "banana", "pear", "pineapple")

# base
sub("[aeiou]", "-", fruits)
#> [1] "-pple"     "b-nana"    "p-ar"      "p-neapple"
gsub("[aeiou]", "-", fruits)
#> [1] "-ppl-"     "b-n-n-"    "p--r"      "p-n--ppl-"

# stringr
str_replace(fruits, "[aeiou]", "-")
#> [1] "-pple"     "b-nana"    "p-ar"      "p-neapple"
str_replace_all(fruits, "[aeiou]", "-")
#> [1] "-ppl-"     "b-n-n-"    "p--r"      "p-n--ppl-"

case: Convert case of a string

Both stringr and base R have functions to convert to upper and lower case. Title case is also provided in stringr.

dog <- "The quick brown dog"

# base
toupper(dog)
#> [1] "THE QUICK BROWN DOG"
tolower(dog)
#> [1] "the quick brown dog"
tools::toTitleCase(dog)
#> [1] "The Quick Brown Dog"

# stringr
str_to_upper(dog)
#> [1] "THE QUICK BROWN DOG"
str_to_lower(dog)
#> [1] "the quick brown dog"
str_to_title(dog)
#> [1] "The Quick Brown Dog"

In stringr we can control the locale, while in base R locale distinctions are controlled with global variables. Therefore, the output of your base R code may vary across different computers with different global settings.

# stringr
str_to_upper("i") # English
#> [1] "I"
str_to_upper("i", locale = "tr") # Turkish
#> [1] "İ"

Join and split

str_flatten(): Flatten a string

If we want to take elements of a string vector and collapse them to a single string we can use the collapse argument in paste() or use stringr’s str_flatten().

# base
paste0(letters, collapse = "-")
#> [1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"

# stringr
str_flatten(letters, collapse = "-")
#> [1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"

The advantage of str_flatten() is that it always returns a single string (a character vector of length 1); to predict the return length of paste() you must carefully read all of its arguments.
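A short sketch (added here) of why the return length of paste() is harder to predict:

length(paste(letters, collapse = "-"))   # 1
length(paste(letters))                   # 26: forgetting collapse silently changes the shape
length(str_flatten(letters, "-"))        # always 1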

str_dup(): Duplicate strings within a character vector

To duplicate strings within a character vector use strrep() (in R 3.3.0 or greater) or str_dup():

fruit <- c("apple", "pear", "banana")

# base
strrep(fruit, 2)
#> [1] "appleapple"   "pearpear"     "bananabanana"
strrep(fruit, 1:3)
#> [1] "apple"              "pearpear"           "bananabananabanana"

# stringr
str_dup(fruit, 2)
#> [1] "appleapple"   "pearpear"     "bananabanana"
str_dup(fruit, 1:3)
#> [1] "apple"              "pearpear"           "bananabananabanana"

str_split(): Split up a string into pieces

To split a string into pieces, with breaks based on a particular pattern match, stringr uses str_split() and base R uses strsplit(). Unlike most other base R string functions, strsplit() takes the character vector to split as its first argument.

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)
# base
strsplit(fruits, " and ")
#> [[1]]
#> [1] "apples"  "oranges" "pears"   "bananas"
#> 
#> [[2]]
#> [1] "pineapples" "mangos"     "guavas"

# stringr
str_split(fruits, " and ")
#> [[1]]
#> [1] "apples"  "oranges" "pears"   "bananas"
#> 
#> [[2]]
#> [1] "pineapples" "mangos"     "guavas"

The stringr package’s str_split() allows for more control over the split, including restricting the number of possible matches.

# stringr
str_split(fruits, " and ", n = 3)
#> [[1]]
#> [1] "apples"            "oranges"           "pears and bananas"
#> 
#> [[2]]
#> [1] "pineapples" "mangos"     "guavas"
str_split(fruits, " and ", n = 2)
#> [[1]]
#> [1] "apples"                        "oranges and pears and bananas"
#> 
#> [[2]]
#> [1] "pineapples"        "mangos and guavas"

str_glue(): Interpolate strings

It’s often useful to interpolate varying values into a fixed string. In base R, you can use sprintf() for this purpose; stringr provides a wrapper for the more general purpose glue package.

name <- "Fred"
age <- 50
anniversary <- as.Date("1991-10-12")

# base
sprintf(
  "My name is %s my age next year is %s and my anniversary is %s.", 
  name,
  age + 1,
  format(anniversary, "%A, %B %d, %Y")
)
#> [1] "My name is Fred my age next year is 51 and my anniversary is Saturday, October 12, 1991."

# stringr
str_glue(
  "My name is {name}, ",
  "my age next year is {age + 1}, ",
  "and my anniversary is {format(anniversary, '%A, %B %d, %Y')}."
)
#> My name is Fred, my age next year is 51, and my anniversary is Saturday, October 12, 1991.

Order strings

str_order(): Order or sort a character vector

Both base R and stringr have separate functions to order and sort strings.

# base
order(letters)
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26
sort(letters)
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"

# stringr
str_order(letters)
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26
str_sort(letters)
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"

Some options in str_order() and str_sort() don’t have analogous base R options. For example, the stringr functions have a locale argument to control how to order or sort. In base R the locale is a global setting, so the outputs of sort() and order() may differ across different computers. For example, in the Norwegian alphabet, å comes after z:

x <- c("å", "a", "z")
str_sort(x)
#> [1] "a" "å" "z"
str_sort(x, locale = "no")
#> [1] "a" "z" "å"

The stringr functions also have a numeric argument to sort digits numerically instead of treating them as strings.

# stringr
x <- c("100a10", "100a5", "2b", "2a")
str_sort(x)
#> [1] "100a10" "100a5"  "2a"     "2b"
str_sort(x, numeric = TRUE)
#> [1] "2a"     "2b"     "100a5"  "100a10"
stringr/inst/doc/regular-expressions.Rmd0000644000176200001440000003447614520174727020200 0ustar liggesusers--- title: "Regular expressions" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Regular expressions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) ``` Regular expressions are a concise and flexible tool for describing patterns in strings. This vignette describes the key features of stringr's regular expressions, as implemented by [stringi](https://github.com/gagolews/stringi). It is not a tutorial, so if you're unfamiliar regular expressions, I'd recommend starting at . If you want to master the details, I'd recommend reading the classic [_Mastering Regular Expressions_](https://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124) by Jeffrey E. F. Friedl. Regular expressions are the default pattern engine in stringr. That means when you use a pattern matching function with a bare string, it's equivalent to wrapping it in a call to `regex()`: ```{r, eval = FALSE} # The regular call: str_extract(fruit, "nana") # Is shorthand for str_extract(fruit, regex("nana")) ``` You will need to use `regex()` explicitly if you want to override the default options, as you'll see in examples below. ## Basic matches The simplest patterns match exact strings: ```{r} x <- c("apple", "banana", "pear") str_extract(x, "an") ``` You can perform a case-insensitive match using `ignore_case = TRUE`: ```{r} bananas <- c("banana", "Banana", "BANANA") str_detect(bananas, "banana") str_detect(bananas, regex("banana", ignore_case = TRUE)) ``` The next step up in complexity is `.`, which matches any character except a newline: ```{r} str_extract(x, ".a.") ``` You can allow `.` to match everything, including `\n`, by setting `dotall = TRUE`: ```{r} str_detect("\nX\n", ".X.") str_detect("\nX\n", regex(".X.", dotall = TRUE)) ``` ## Escaping If "`.`" matches any character, how do you match a literal "`.`"? You need to use an "escape" to tell the regular expression you want to match it exactly, not use its special behaviour. Like strings, regexps use the backslash, `\`, to escape special behaviour. So to match an `.`, you need the regexp `\.`. Unfortunately this creates a problem. We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings. So to create the regular expression `\.` we need the string `"\\."`. ```{r} # To create the regular expression, we need \\ dot <- "\\." # But the expression itself only contains one: writeLines(dot) # And this tells R to look for an explicit . str_extract(c("abc", "a.c", "bef"), "a\\.c") ``` If `\` is used as an escape character in regular expressions, how do you match a literal `\`? Well you need to escape it, creating the regular expression `\\`. To create that regular expression, you need to use a string, which also needs to escape `\`. That means to match a literal `\` you need to write `"\\\\"` --- you need four backslashes to match one! ```{r} x <- "a\\b" writeLines(x) str_extract(x, "\\\\") ``` In this vignette, I use `\.` to denote the regular expression, and `"\\."` to denote the string that represents the regular expression. An alternative quoting mechanism is `\Q...\E`: all the characters in `...` are treated as exact matches. This is useful if you want to exactly match user input as part of a regular expression. 
```{r} x <- c("a.b.c.d", "aeb") starts_with <- "a.b" str_detect(x, paste0("^", starts_with)) str_detect(x, paste0("^\\Q", starts_with, "\\E")) ``` ## Special characters Escapes also allow you to specify individual characters that are otherwise hard to type. You can specify individual unicode characters in five ways, either as a variable number of hex digits (four is most common), or by name: * `\xhh`: 2 hex digits. * `\x{hhhh}`: 1-6 hex digits. * `\uhhhh`: 4 hex digits. * `\Uhhhhhhhh`: 8 hex digits. * `\N{name}`, e.g. `\N{grinning face}` matches the basic smiling emoji. Similarly, you can specify many common control characters: * `\a`: bell. * `\cX`: match a control-X character. * `\e`: escape (`\u001B`). * `\f`: form feed (`\u000C`). * `\n`: line feed (`\u000A`). * `\r`: carriage return (`\u000D`). * `\t`: horizontal tabulation (`\u0009`). * `\0ooo` match an octal character. 'ooo' is from one to three octal digits, from 000 to 0377. The leading zero is required. (Many of these are only of historical interest and are only included here for the sake of completeness.) ## Matching multiple characters There are a number of patterns that match more than one character. You've already seen `.`, which matches any character (except a newline). A closely related operator is `\X`, which matches a __grapheme cluster__, a set of individual elements that form a single symbol. For example, one way of representing "á" is as the letter "a" plus an accent: `.` will match the component "a", while `\X` will match the complete symbol: ```{r} x <- "a\u0301" str_extract(x, ".") str_extract(x, "\\X") ``` There are five other escaped pairs that match narrower classes of characters: * `\d`: matches any digit. The complement, `\D`, matches any character that is not a decimal digit. ```{r} str_extract_all("1 + 2 = 3", "\\d+")[[1]] ``` Technically, `\d` includes any character in the Unicode Category of Nd ("Number, Decimal Digit"), which also includes numeric symbols from other languages: ```{r} # Some Laotian numbers str_detect("១២៣", "\\d") ``` * `\s`: matches any whitespace. This includes tabs, newlines, form feeds, and any character in the Unicode Z Category (which includes a variety of space characters and other separators.). The complement, `\S`, matches any non-whitespace character. ```{r} (text <- "Some \t badly\n\t\tspaced \f text") str_replace_all(text, "\\s+", " ") ``` * `\p{property name}` matches any character with specific unicode property, like `\p{Uppercase}` or `\p{Diacritic}`. The complement, `\P{property name}`, matches all characters without the property. A complete list of unicode properties can be found at . ```{r} (text <- c('"Double quotes"', "«Guillemet»", "“Fancy quotes”")) str_replace_all(text, "\\p{quotation mark}", "'") ``` * `\w` matches any "word" character, which includes alphabetic characters, marks and decimal numbers. The complement, `\W`, matches any non-word character. ```{r} str_extract_all("Don't eat that!", "\\w+")[[1]] str_split("Don't eat that!", "\\W")[[1]] ``` Technically, `\w` also matches connector punctuation, `\u200c` (zero width connector), and `\u200d` (zero width joiner), but these are rarely seen in the wild. * `\b` matches word boundaries, the transition between word and non-word characters. `\B` matches the opposite: boundaries that have either both word or non-word characters on either side. 
```{r} str_replace_all("The quick brown fox", "\\b", "_") str_replace_all("The quick brown fox", "\\B", "_") ``` You can also create your own __character classes__ using `[]`: * `[abc]`: matches a, b, or c. * `[a-z]`: matches every character between a and z (in Unicode code point order). * `[^abc]`: matches anything except a, b, or c. * `[\^\-]`: matches `^` or `-`. There are a number of pre-built classes that you can use inside `[]`: * `[:punct:]`: punctuation. * `[:alpha:]`: letters. * `[:lower:]`: lowercase letters. * `[:upper:]`: upperclass letters. * `[:digit:]`: digits. * `[:xdigit:]`: hex digits. * `[:alnum:]`: letters and numbers. * `[:cntrl:]`: control characters. * `[:graph:]`: letters, numbers, and punctuation. * `[:print:]`: letters, numbers, punctuation, and whitespace. * `[:space:]`: space characters (basically equivalent to `\s`). * `[:blank:]`: space and tab. These all go inside the `[]` for character classes, i.e. `[[:digit:]AX]` matches all digits, A, and X. You can also using Unicode properties, like `[\p{Letter}]`, and various set operations, like `[\p{Letter}--\p{script=latin}]`. See `?"stringi-search-charclass"` for details. ## Alternation `|` is the __alternation__ operator, which will pick between one or more possible matches. For example, `abc|def` will match `abc` or `def`: ```{r} str_detect(c("abc", "def", "ghi"), "abc|def") ``` Note that the precedence for `|` is low: `abc|def` is equivalent to `(abc)|(def)` not `ab(c|d)ef`. ## Grouping You can use parentheses to override the default precedence rules: ```{r} str_extract(c("grey", "gray"), "gre|ay") str_extract(c("grey", "gray"), "gr(e|a)y") ``` Parenthesis also define "groups" that you can refer to with __backreferences__, like `\1`, `\2` etc, and can be extracted with `str_match()`. For example, the following regular expression finds all fruits that have a repeated pair of letters: ```{r} pattern <- "(..)\\1" fruit %>% str_subset(pattern) fruit %>% str_subset(pattern) %>% str_match(pattern) ``` You can use `(?:...)`, the non-grouping parentheses, to control precedence but not capture the match in a group. This is slightly more efficient than capturing parentheses. ```{r} str_match(c("grey", "gray"), "gr(e|a)y") str_match(c("grey", "gray"), "gr(?:e|a)y") ``` This is most useful for more complex cases where you need to capture matches and control precedence independently. ## Anchors By default, regular expressions will match any part of a string. It's often useful to __anchor__ the regular expression so that it matches from the start or end of the string: * `^` matches the start of string. * `$` matches the end of the string. ```{r} x <- c("apple", "banana", "pear") str_extract(x, "^a") str_extract(x, "a$") ``` To match a literal "$" or "^", you need to escape them, `\$`, and `\^`. For multiline strings, you can use `regex(multiline = TRUE)`. This changes the behaviour of `^` and `$`, and introduces three new operators: * `^` now matches the start of each line. * `$` now matches the end of each line. * `\A` matches the start of the input. * `\z` matches the end of the input. * `\Z` matches the end of the input, but before the final line terminator, if it exists. ```{r} x <- "Line 1\nLine 2\nLine 3\n" str_extract_all(x, "^Line..")[[1]] str_extract_all(x, regex("^Line..", multiline = TRUE))[[1]] str_extract_all(x, regex("\\ALine..", multiline = TRUE))[[1]] ``` ## Repetition You can control how many times a pattern matches with the repetition operators: * `?`: 0 or 1. * `+`: 1 or more. * `*`: 0 or more. 
```{r} x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII" str_extract(x, "CC?") str_extract(x, "CC+") str_extract(x, 'C[LX]+') ``` Note that the precedence of these operators is high, so you can write: `colou?r` to match either American or British spellings. That means most uses will need parentheses, like `bana(na)+`. You can also specify the number of matches precisely: * `{n}`: exactly n * `{n,}`: n or more * `{n,m}`: between n and m ```{r} str_extract(x, "C{2}") str_extract(x, "C{2,}") str_extract(x, "C{2,3}") ``` By default these matches are "greedy": they will match the longest string possible. You can make them "lazy", matching the shortest string possible by putting a `?` after them: * `??`: 0 or 1, prefer 0. * `+?`: 1 or more, match as few times as possible. * `*?`: 0 or more, match as few times as possible. * `{n,}?`: n or more, match as few times as possible. * `{n,m}?`: between n and m, , match as few times as possible, but at least n. ```{r} str_extract(x, c("C{2,3}", "C{2,3}?")) str_extract(x, c("C[LX]+", "C[LX]+?")) ``` You can also make the matches possessive by putting a `+` after them, which means that if later parts of the match fail, the repetition will not be re-tried with a smaller number of characters. This is an advanced feature used to improve performance in worst-case scenarios (called "catastrophic backtracking"). * `?+`: 0 or 1, possessive. * `++`: 1 or more, possessive. * `*+`: 0 or more, possessive. * `{n}+`: exactly n, possessive. * `{n,}+`: n or more, possessive. * `{n,m}+`: between n and m, possessive. A related concept is the __atomic-match__ parenthesis, `(?>...)`. If a later match fails and the engine needs to back-track, an atomic match is kept as is: it succeeds or fails as a whole. Compare the following two regular expressions: ```{r} str_detect("ABC", "(?>A|.B)C") str_detect("ABC", "(?:A|.B)C") ``` The atomic match fails because it matches A, and then the next character is a C so it fails. The regular match succeeds because it matches A, but then C doesn't match, so it back-tracks and tries B instead. ## Look arounds These assertions look ahead or behind the current match without "consuming" any characters (i.e. changing the input position). * `(?=...)`: positive look-ahead assertion. Matches if `...` matches at the current input. * `(?!...)`: negative look-ahead assertion. Matches if `...` __does not__ match at the current input. * `(?<=...)`: positive look-behind assertion. Matches if `...` matches text preceding the current position, with the last character of the match being the character just before the current position. Length must be bounded (i.e. no `*` or `+`). * `(? Introduction to stringr

Introduction to stringr

There are four main families of functions in stringr:

  1. Character manipulation: these functions allow you to manipulate individual characters within the strings in character vectors.

  2. Whitespace tools to add, remove, and manipulate whitespace.

  3. Locale sensitive operations whose operations will vary from locale to locale.

  4. Pattern matching functions. These recognise four engines of pattern description. The most common is regular expressions, but there are three other tools.

Getting and setting individual characters

You can get the length of the string with str_length():

str_length("abc")
#> [1] 3

This is now equivalent to the base R function nchar(); str_length() originally existed to work around issues with nchar(), such as the fact that it returned 2 for nchar(NA). That has been fixed as of R 3.3.0, so the distinction no longer matters much.

You can access individual characters using str_sub(). It takes three arguments: a character vector, a start position, and an end position. Each position can be either a positive integer, which counts from the left, or a negative integer, which counts from the right. The positions are inclusive and, if longer than the string, will be silently truncated.

x <- c("abcdef", "ghifjk")

# The 3rd letter
str_sub(x, 3, 3)
#> [1] "c" "i"

# The 2nd to 2nd-to-last character
str_sub(x, 2, -2)
#> [1] "bcde" "hifj"

You can also use str_sub() to modify strings:

str_sub(x, 3, 3) <- "X"
x
#> [1] "abXdef" "ghXfjk"

To duplicate individual strings, you can use str_dup():

str_dup(x, c(2, 3))
#> [1] "abXdefabXdef"       "ghXfjkghXfjkghXfjk"

Whitespace

Three functions add, remove, or modify whitespace:

  1. str_pad() pads a string to a fixed length by adding extra whitespace on the left, right, or both sides.

    x <- c("abc", "defghi")
    str_pad(x, 10) # default pads on left
    #> [1] "       abc" "    defghi"
    str_pad(x, 10, "both")
    #> [1] "   abc    " "  defghi  "

    (You can pad with other characters by using the pad argument.)

    str_pad() will never make a string shorter:

    str_pad(x, 4)
    #> [1] " abc"   "defghi"

    So if you want to ensure that all strings are the same length (often useful for print methods), combine str_pad() and str_trunc():

    x <- c("Short", "This is a long string")
    
    x %>% 
      str_trunc(10) %>% 
      str_pad(10, "right")
    #> [1] "Short     " "This is..."
  2. The opposite of str_pad() is str_trim(), which removes leading and trailing whitespace:

    x <- c("  a   ", "b   ",  "   c")
    str_trim(x)
    #> [1] "a" "b" "c"
    str_trim(x, "left")
    #> [1] "a   " "b   " "c"
  3. You can use str_wrap() to modify existing whitespace in order to wrap a paragraph of text, such that the length of each line is as similar as possible.

    jabberwocky <- str_c(
      "`Twas brillig, and the slithy toves ",
      "did gyre and gimble in the wabe: ",
      "All mimsy were the borogoves, ",
      "and the mome raths outgrabe. "
    )
    cat(str_wrap(jabberwocky, width = 40))
    #> `Twas brillig, and the slithy toves did
    #> gyre and gimble in the wabe: All mimsy
    #> were the borogoves, and the mome raths
    #> outgrabe.

Locale sensitive

A handful of stringr functions are locale-sensitive: they will perform differently in different regions of the world. These functions are case transformation functions:

x <- "I like horses."
str_to_upper(x)
#> [1] "I LIKE HORSES."
str_to_title(x)
#> [1] "I Like Horses."

str_to_lower(x)
#> [1] "i like horses."
# Turkish has two sorts of i: with and without the dot
str_to_lower(x, "tr")
#> [1] "ı like horses."

String ordering and sorting:

x <- c("y", "i", "k")
str_order(x)
#> [1] 2 3 1

str_sort(x)
#> [1] "i" "k" "y"
# In Lithuanian, y comes between i and k
str_sort(x, locale = "lt")
#> [1] "i" "y" "k"

The locale always defaults to English to ensure that the default behaviour is identical across systems. Locales always include a two-letter ISO-639-1 language code (like “en” for English or “zh” for Chinese), and optionally an ISO-3166 country code (like “en_UK” vs “en_US”). You can see a complete list of available locales by running stringi::stri_locale_list().
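For example (a small sketch added here; the exact set of locales depends on your ICU build):

stringi::stri_locale_get()          # stringi's current default locale
head(stringi::stri_locale_list())   # first few of the available locales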

Pattern matching

The vast majority of stringr functions work with patterns. These are parameterised by the task they perform and the types of patterns they match.

Tasks

Each pattern matching function has the same first two arguments, a character vector of strings to process and a single pattern to match. stringr provides pattern matching functions to detect, locate, extract, match, replace, and split strings. I’ll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers:

strings <- c(
  "apple", 
  "219 733 8965", 
  "329-293-8753", 
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
  • str_detect() detects the presence or absence of a pattern and returns a logical vector (similar to grepl()). str_subset() returns the elements of a character vector that match a regular expression (similar to grep() with value = TRUE).

    # Which strings contain phone numbers?
    str_detect(strings, phone)
    #> [1] FALSE  TRUE  TRUE  TRUE
    str_subset(strings, phone)
    #> [1] "219 733 8965"                          
    #> [2] "329-293-8753"                          
    #> [3] "Work: 579-499-7527; Home: 543.355.3679"
  • str_count() counts the number of matches:

    # How many phone numbers in each string?
    str_count(strings, phone)
    #> [1] 0 1 1 2
  • str_locate() locates the first position of a pattern and returns a numeric matrix with columns start and end. str_locate_all() locates all matches, returning a list of numeric matrices. Similar to regexpr() and gregexpr().

    # Where in the string is the phone number located?
    (loc <- str_locate(strings, phone))
    #>      start end
    #> [1,]    NA  NA
    #> [2,]     1  12
    #> [3,]     1  12
    #> [4,]     7  18
    str_locate_all(strings, phone)
    #> [[1]]
    #>      start end
    #> 
    #> [[2]]
    #>      start end
    #> [1,]     1  12
    #> 
    #> [[3]]
    #>      start end
    #> [1,]     1  12
    #> 
    #> [[4]]
    #>      start end
    #> [1,]     7  18
    #> [2,]    27  38
  • str_extract() extracts text corresponding to the first match, returning a character vector. str_extract_all() extracts all matches and returns a list of character vectors.

    # What are the phone numbers?
    str_extract(strings, phone)
    #> [1] NA             "219 733 8965" "329-293-8753" "579-499-7527"
    str_extract_all(strings, phone)
    #> [[1]]
    #> character(0)
    #> 
    #> [[2]]
    #> [1] "219 733 8965"
    #> 
    #> [[3]]
    #> [1] "329-293-8753"
    #> 
    #> [[4]]
    #> [1] "579-499-7527" "543.355.3679"
    str_extract_all(strings, phone, simplify = TRUE)
    #>      [,1]           [,2]          
    #> [1,] ""             ""            
    #> [2,] "219 733 8965" ""            
    #> [3,] "329-293-8753" ""            
    #> [4,] "579-499-7527" "543.355.3679"
  • str_match() extracts capture groups formed by () from the first match. It returns a character matrix with one column for the complete match and one column for each group. str_match_all() extracts capture groups from all matches and returns a list of character matrices. Similar to regmatches().

    # Pull out the three components of the match
    str_match(strings, phone)
    #>      [,1]           [,2]  [,3]  [,4]  
    #> [1,] NA             NA    NA    NA    
    #> [2,] "219 733 8965" "219" "733" "8965"
    #> [3,] "329-293-8753" "329" "293" "8753"
    #> [4,] "579-499-7527" "579" "499" "7527"
    str_match_all(strings, phone)
    #> [[1]]
    #>      [,1] [,2] [,3] [,4]
    #> 
    #> [[2]]
    #>      [,1]           [,2]  [,3]  [,4]  
    #> [1,] "219 733 8965" "219" "733" "8965"
    #> 
    #> [[3]]
    #>      [,1]           [,2]  [,3]  [,4]  
    #> [1,] "329-293-8753" "329" "293" "8753"
    #> 
    #> [[4]]
    #>      [,1]           [,2]  [,3]  [,4]  
    #> [1,] "579-499-7527" "579" "499" "7527"
    #> [2,] "543.355.3679" "543" "355" "3679"
  • str_replace() replaces the first matched pattern and returns a character vector. str_replace_all() replaces all matches. Similar to sub() and gsub().

    str_replace(strings, phone, "XXX-XXX-XXXX")
    #> [1] "apple"                                 
    #> [2] "XXX-XXX-XXXX"                          
    #> [3] "XXX-XXX-XXXX"                          
    #> [4] "Work: XXX-XXX-XXXX; Home: 543.355.3679"
    str_replace_all(strings, phone, "XXX-XXX-XXXX")
    #> [1] "apple"                                 
    #> [2] "XXX-XXX-XXXX"                          
    #> [3] "XXX-XXX-XXXX"                          
    #> [4] "Work: XXX-XXX-XXXX; Home: XXX-XXX-XXXX"
  • str_split_fixed() splits a string into a fixed number of pieces based on a pattern and returns a character matrix. str_split() splits a string into a variable number of pieces and returns a list of character vectors.

    str_split("a-b-c", "-")
    #> [[1]]
    #> [1] "a" "b" "c"
    str_split_fixed("a-b-c", "-", n = 2)
    #>      [,1] [,2] 
    #> [1,] "a"  "b-c"
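
Because str_locate() returns an ordinary numeric matrix, its result combines
naturally with str_sub(). A minimal sketch, reusing the loc matrix captured
above; the values should match str_extract(), with NA positions giving NA:

str_sub(strings, loc[, "start"], loc[, "end"])
#> [1] NA             "219 733 8965" "329-293-8753" "579-499-7527"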

Engines

There are four main engines that stringr can use to describe patterns:

  • Regular expressions, the default, as shown above and in the sketch just after this list, and described in vignette("regular-expressions").

  • Fixed bytewise matching, with fixed().

  • Locale-sensitive character matching, with coll().

  • Text boundary analysis with boundary().
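
Under the hood, a bare pattern is just shorthand for wrapping it in regex(), and
the engine constructors are also where modifiers live. A minimal sketch of the
default engine, reusing strings and phone from above:

# A bare pattern is shorthand for regex(pattern)
str_detect(strings, regex(phone))
#> [1] FALSE  TRUE  TRUE  TRUE

# Modifiers such as case-insensitive matching are supplied to the constructor
str_detect(c("banana", "Banana", "BANANA"), regex("banana", ignore_case = TRUE))
#> [1] TRUE TRUE TRUE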

Fixed matches

fixed(x) only matches the exact sequence of bytes specified by x. This is a very limited “pattern”, but the restriction can make matching much faster. Beware of using fixed() with non-English data: it is problematic because there are often multiple ways of representing the same character. For example, there are two ways to define “á”: either as a single character or as an “a” plus an accent:

a1 <- "\u00e1"
a2 <- "a\u0301"
c(a1, a2)
#> [1] "á" "á"
a1 == a2
#> [1] FALSE

They render identically, but because they’re defined differently, fixed() doesn’t find a match. Instead, you can use coll(), explained below, to respect human character comparison rules:

str_detect(a1, fixed(a2))
#> [1] FALSE
str_detect(a1, coll(a2))
#> [1] TRUE
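
The flip side of matching exact bytes is that fixed() is handy when a pattern
contains regex metacharacters that you want treated literally; a small sketch:

# A literal dot, with no escaping needed
str_detect(c("a.c", "abc"), fixed("."))
#> [1]  TRUE FALSE
str_detect(c("a.c", "abc"), ".")
#> [1] TRUE TRUE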

Boundary

boundary() matches boundaries between characters, lines, sentences or words. It’s most useful with str_split(), but can be used with all pattern matching functions:

x <- "This is a sentence."
str_split(x, boundary("word"))
#> [[1]]
#> [1] "This"     "is"       "a"        "sentence"
str_count(x, boundary("word"))
#> [1] 4
str_extract_all(x, boundary("word"))
#> [[1]]
#> [1] "This"     "is"       "a"        "sentence"
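
The same idea extends to the other boundary types. A minimal sketch using
sentence boundaries (the string y is just an illustration):

y <- "This is a sentence. This is another one."
str_count(y, boundary("sentence"))
#> [1] 2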

By convention, "" is treated as boundary("character"):

str_split(x, "")
#> [[1]]
#>  [1] "T" "h" "i" "s" " " "i" "s" " " "a" " " "s" "e" "n" "t" "e" "n" "c" "e" "."
str_count(x, "")
#> [1] 19
stringr/inst/doc/stringr.R0000644000176200001440000001027114524706130015302 0ustar liggesusers## ----include = FALSE---------------------------------------------------------- library(stringr) knitr::opts_chunk$set( comment = "#>", collapse = TRUE ) ## ----------------------------------------------------------------------------- str_length("abc") ## ----------------------------------------------------------------------------- x <- c("abcdef", "ghifjk") # The 3rd letter str_sub(x, 3, 3) # The 2nd to 2nd-to-last character str_sub(x, 2, -2) ## ----------------------------------------------------------------------------- str_sub(x, 3, 3) <- "X" x ## ----------------------------------------------------------------------------- str_dup(x, c(2, 3)) ## ----------------------------------------------------------------------------- x <- c("abc", "defghi") str_pad(x, 10) # default pads on left str_pad(x, 10, "both") ## ----------------------------------------------------------------------------- str_pad(x, 4) ## ----------------------------------------------------------------------------- x <- c("Short", "This is a long string") x %>% str_trunc(10) %>% str_pad(10, "right") ## ----------------------------------------------------------------------------- x <- c(" a ", "b ", " c") str_trim(x) str_trim(x, "left") ## ----------------------------------------------------------------------------- jabberwocky <- str_c( "`Twas brillig, and the slithy toves ", "did gyre and gimble in the wabe: ", "All mimsy were the borogoves, ", "and the mome raths outgrabe. " ) cat(str_wrap(jabberwocky, width = 40)) ## ----------------------------------------------------------------------------- x <- "I like horses." str_to_upper(x) str_to_title(x) str_to_lower(x) # Turkish has two sorts of i: with and without the dot str_to_lower(x, "tr") ## ----------------------------------------------------------------------------- x <- c("y", "i", "k") str_order(x) str_sort(x) # In Lithuanian, y comes between i and k str_sort(x, locale = "lt") ## ----------------------------------------------------------------------------- strings <- c( "apple", "219 733 8965", "329-293-8753", "Work: 579-499-7527; Home: 543.355.3679" ) phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" ## ----------------------------------------------------------------------------- # Which strings contain phone numbers? str_detect(strings, phone) str_subset(strings, phone) ## ----------------------------------------------------------------------------- # How many phone numbers in each string? str_count(strings, phone) ## ----------------------------------------------------------------------------- # Where in the string is the phone number located? (loc <- str_locate(strings, phone)) str_locate_all(strings, phone) ## ----------------------------------------------------------------------------- # What are the phone numbers? 
str_extract(strings, phone) str_extract_all(strings, phone) str_extract_all(strings, phone, simplify = TRUE) ## ----------------------------------------------------------------------------- # Pull out the three components of the match str_match(strings, phone) str_match_all(strings, phone) ## ----------------------------------------------------------------------------- str_replace(strings, phone, "XXX-XXX-XXXX") str_replace_all(strings, phone, "XXX-XXX-XXXX") ## ----------------------------------------------------------------------------- str_split("a-b-c", "-") str_split_fixed("a-b-c", "-", n = 2) ## ----------------------------------------------------------------------------- a1 <- "\u00e1" a2 <- "a\u0301" c(a1, a2) a1 == a2 ## ----------------------------------------------------------------------------- str_detect(a1, fixed(a2)) str_detect(a1, coll(a2)) ## ----------------------------------------------------------------------------- i <- c("I", "İ", "i", "ı") i str_subset(i, coll("i", ignore_case = TRUE)) str_subset(i, coll("i", ignore_case = TRUE, locale = "tr")) ## ----------------------------------------------------------------------------- x <- "This is a sentence." str_split(x, boundary("word")) str_count(x, boundary("word")) str_extract_all(x, boundary("word")) ## ----------------------------------------------------------------------------- str_split(x, "") str_count(x, "") stringr/inst/doc/regular-expressions.R0000644000176200001440000001220214524706130017627 0ustar liggesusers## ----setup, include = FALSE--------------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) ## ----eval = FALSE------------------------------------------------------------- # # The regular call: # str_extract(fruit, "nana") # # Is shorthand for # str_extract(fruit, regex("nana")) ## ----------------------------------------------------------------------------- x <- c("apple", "banana", "pear") str_extract(x, "an") ## ----------------------------------------------------------------------------- bananas <- c("banana", "Banana", "BANANA") str_detect(bananas, "banana") str_detect(bananas, regex("banana", ignore_case = TRUE)) ## ----------------------------------------------------------------------------- str_extract(x, ".a.") ## ----------------------------------------------------------------------------- str_detect("\nX\n", ".X.") str_detect("\nX\n", regex(".X.", dotall = TRUE)) ## ----------------------------------------------------------------------------- # To create the regular expression, we need \\ dot <- "\\." # But the expression itself only contains one: writeLines(dot) # And this tells R to look for an explicit . 
str_extract(c("abc", "a.c", "bef"), "a\\.c") ## ----------------------------------------------------------------------------- x <- "a\\b" writeLines(x) str_extract(x, "\\\\") ## ----------------------------------------------------------------------------- x <- c("a.b.c.d", "aeb") starts_with <- "a.b" str_detect(x, paste0("^", starts_with)) str_detect(x, paste0("^\\Q", starts_with, "\\E")) ## ----------------------------------------------------------------------------- x <- "a\u0301" str_extract(x, ".") str_extract(x, "\\X") ## ----------------------------------------------------------------------------- str_extract_all("1 + 2 = 3", "\\d+")[[1]] ## ----------------------------------------------------------------------------- # Some Laotian numbers str_detect("១២៣", "\\d") ## ----------------------------------------------------------------------------- (text <- "Some \t badly\n\t\tspaced \f text") str_replace_all(text, "\\s+", " ") ## ----------------------------------------------------------------------------- (text <- c('"Double quotes"', "«Guillemet»", "“Fancy quotes”")) str_replace_all(text, "\\p{quotation mark}", "'") ## ----------------------------------------------------------------------------- str_extract_all("Don't eat that!", "\\w+")[[1]] str_split("Don't eat that!", "\\W")[[1]] ## ----------------------------------------------------------------------------- str_replace_all("The quick brown fox", "\\b", "_") str_replace_all("The quick brown fox", "\\B", "_") ## ----------------------------------------------------------------------------- str_detect(c("abc", "def", "ghi"), "abc|def") ## ----------------------------------------------------------------------------- str_extract(c("grey", "gray"), "gre|ay") str_extract(c("grey", "gray"), "gr(e|a)y") ## ----------------------------------------------------------------------------- pattern <- "(..)\\1" fruit %>% str_subset(pattern) fruit %>% str_subset(pattern) %>% str_match(pattern) ## ----------------------------------------------------------------------------- str_match(c("grey", "gray"), "gr(e|a)y") str_match(c("grey", "gray"), "gr(?:e|a)y") ## ----------------------------------------------------------------------------- x <- c("apple", "banana", "pear") str_extract(x, "^a") str_extract(x, "a$") ## ----------------------------------------------------------------------------- x <- "Line 1\nLine 2\nLine 3\n" str_extract_all(x, "^Line..")[[1]] str_extract_all(x, regex("^Line..", multiline = TRUE))[[1]] str_extract_all(x, regex("\\ALine..", multiline = TRUE))[[1]] ## ----------------------------------------------------------------------------- x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII" str_extract(x, "CC?") str_extract(x, "CC+") str_extract(x, 'C[LX]+') ## ----------------------------------------------------------------------------- str_extract(x, "C{2}") str_extract(x, "C{2,}") str_extract(x, "C{2,3}") ## ----------------------------------------------------------------------------- str_extract(x, c("C{2,3}", "C{2,3}?")) str_extract(x, c("C[LX]+", "C[LX]+?")) ## ----------------------------------------------------------------------------- str_detect("ABC", "(?>A|.B)C") str_detect("ABC", "(?:A|.B)C") ## ----------------------------------------------------------------------------- x <- c("1 piece", "2 pieces", "3") str_extract(x, "\\d+(?= pieces?)") y <- c("100", "$400") str_extract(y, "(?<=\\$)\\d+") ## ----------------------------------------------------------------------------- str_detect("xyz", "x(?#this is a 
comment)") ## ----------------------------------------------------------------------------- phone <- regex(" \\(? # optional opening parens (\\d{3}) # area code \\)? # optional closing parens (?:-|\\ )? # optional dash or space (\\d{3}) # another three numbers (?:-|\\ )? # optional dash or space (\\d{3}) # three more numbers ", comments = TRUE) str_match(c("514-791-8141", "(514) 791 8141"), phone) stringr/inst/doc/stringr.Rmd0000644000176200001440000002247514037267204015637 0ustar liggesusers--- title: "Introduction to stringr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to stringr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(stringr) knitr::opts_chunk$set( comment = "#>", collapse = TRUE ) ``` There are four main families of functions in stringr: 1. Character manipulation: these functions allow you to manipulate individual characters within the strings in character vectors. 1. Whitespace tools to add, remove, and manipulate whitespace. 1. Locale sensitive operations whose operations will vary from locale to locale. 1. Pattern matching functions. These recognise four engines of pattern description. The most common is regular expressions, but there are three other tools. ## Getting and setting individual characters You can get the length of the string with `str_length()`: ```{r} str_length("abc") ``` This is now equivalent to the base R function `nchar()`. Previously it was needed to work around issues with `nchar()` such as the fact that it returned 2 for `nchar(NA)`. This has been fixed as of R 3.3.0, so it is no longer so important. You can access individual character using `str_sub()`. It takes three arguments: a character vector, a `start` position and an `end` position. Either position can either be a positive integer, which counts from the left, or a negative integer which counts from the right. The positions are inclusive, and if longer than the string, will be silently truncated. ```{r} x <- c("abcdef", "ghifjk") # The 3rd letter str_sub(x, 3, 3) # The 2nd to 2nd-to-last character str_sub(x, 2, -2) ``` You can also use `str_sub()` to modify strings: ```{r} str_sub(x, 3, 3) <- "X" x ``` To duplicate individual strings, you can use `str_dup()`: ```{r} str_dup(x, c(2, 3)) ``` ## Whitespace Three functions add, remove, or modify whitespace: 1. `str_pad()` pads a string to a fixed length by adding extra whitespace on the left, right, or both sides. ```{r} x <- c("abc", "defghi") str_pad(x, 10) # default pads on left str_pad(x, 10, "both") ``` (You can pad with other characters by using the `pad` argument.) `str_pad()` will never make a string shorter: ```{r} str_pad(x, 4) ``` So if you want to ensure that all strings are the same length (often useful for print methods), combine `str_pad()` and `str_trunc()`: ```{r} x <- c("Short", "This is a long string") x %>% str_trunc(10) %>% str_pad(10, "right") ``` 1. The opposite of `str_pad()` is `str_trim()`, which removes leading and trailing whitespace: ```{r} x <- c(" a ", "b ", " c") str_trim(x) str_trim(x, "left") ``` 1. You can use `str_wrap()` to modify existing whitespace in order to wrap a paragraph of text, such that the length of each line is as similar as possible. ```{r} jabberwocky <- str_c( "`Twas brillig, and the slithy toves ", "did gyre and gimble in the wabe: ", "All mimsy were the borogoves, ", "and the mome raths outgrabe. 
" ) cat(str_wrap(jabberwocky, width = 40)) ``` ## Locale sensitive A handful of stringr functions are locale-sensitive: they will perform differently in different regions of the world. These functions are case transformation functions: ```{r} x <- "I like horses." str_to_upper(x) str_to_title(x) str_to_lower(x) # Turkish has two sorts of i: with and without the dot str_to_lower(x, "tr") ``` String ordering and sorting: ```{r} x <- c("y", "i", "k") str_order(x) str_sort(x) # In Lithuanian, y comes between i and k str_sort(x, locale = "lt") ``` The locale always defaults to English to ensure that the default behaviour is identical across systems. Locales always include a two letter ISO-639-1 language code (like "en" for English or "zh" for Chinese), and optionally a ISO-3166 country code (like "en_UK" vs "en_US"). You can see a complete list of available locales by running `stringi::stri_locale_list()`. ## Pattern matching The vast majority of stringr functions work with patterns. These are parameterised by the task they perform and the types of patterns they match. ### Tasks Each pattern matching function has the same first two arguments, a character vector of `string`s to process and a single `pattern` to match. stringr provides pattern matching functions to **detect**, **locate**, **extract**, **match**, **replace**, and **split** strings. I'll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers: ```{r} strings <- c( "apple", "219 733 8965", "329-293-8753", "Work: 579-499-7527; Home: 543.355.3679" ) phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})" ``` - `str_detect()` detects the presence or absence of a pattern and returns a logical vector (similar to `grepl()`). `str_subset()` returns the elements of a character vector that match a regular expression (similar to `grep()` with `value = TRUE`)`. ```{r} # Which strings contain phone numbers? str_detect(strings, phone) str_subset(strings, phone) ``` - `str_count()` counts the number of matches: ```{r} # How many phone numbers in each string? str_count(strings, phone) ``` - `str_locate()` locates the **first** position of a pattern and returns a numeric matrix with columns start and end. `str_locate_all()` locates all matches, returning a list of numeric matrices. Similar to `regexpr()` and `gregexpr()`. ```{r} # Where in the string is the phone number located? (loc <- str_locate(strings, phone)) str_locate_all(strings, phone) ``` - `str_extract()` extracts text corresponding to the **first** match, returning a character vector. `str_extract_all()` extracts all matches and returns a list of character vectors. ```{r} # What are the phone numbers? str_extract(strings, phone) str_extract_all(strings, phone) str_extract_all(strings, phone, simplify = TRUE) ``` - `str_match()` extracts capture groups formed by `()` from the **first** match. It returns a character matrix with one column for the complete match and one column for each group. `str_match_all()` extracts capture groups from all matches and returns a list of character matrices. Similar to `regmatches()`. ```{r} # Pull out the three components of the match str_match(strings, phone) str_match_all(strings, phone) ``` - `str_replace()` replaces the **first** matched pattern and returns a character vector. `str_replace_all()` replaces all matches. Similar to `sub()` and `gsub()`. 
```{r} str_replace(strings, phone, "XXX-XXX-XXXX") str_replace_all(strings, phone, "XXX-XXX-XXXX") ``` - `str_split_fixed()` splits a string into a **fixed** number of pieces based on a pattern and returns a character matrix. `str_split()` splits a string into a **variable** number of pieces and returns a list of character vectors. ```{r} str_split("a-b-c", "-") str_split_fixed("a-b-c", "-", n = 2) ``` ### Engines There are four main engines that stringr can use to describe patterns: * Regular expressions, the default, as shown above, and described in `vignette("regular-expressions")`. * Fixed bytewise matching, with `fixed()`. * Locale-sensitive character matching, with `coll()` * Text boundary analysis with `boundary()`. #### Fixed matches `fixed(x)` only matches the exact sequence of bytes specified by `x`. This is a very limited "pattern", but the restriction can make matching much faster. Beware using `fixed()` with non-English data. It is problematic because there are often multiple ways of representing the same character. For example, there are two ways to define "á": either as a single character or as an "a" plus an accent: ```{r} a1 <- "\u00e1" a2 <- "a\u0301" c(a1, a2) a1 == a2 ``` They render identically, but because they're defined differently, `fixed()` doesn't find a match. Instead, you can use `coll()`, explained below, to respect human character comparison rules: ```{r} str_detect(a1, fixed(a2)) str_detect(a1, coll(a2)) ``` #### Collation search `coll(x)` looks for a match to `x` using human-language **coll**ation rules, and is particularly important if you want to do case insensitive matching. Collation rules differ around the world, so you'll also need to supply a `locale` parameter. ```{r} i <- c("I", "İ", "i", "ı") i str_subset(i, coll("i", ignore_case = TRUE)) str_subset(i, coll("i", ignore_case = TRUE, locale = "tr")) ``` The downside of `coll()` is speed. Because the rules for recognising which characters are the same are complicated, `coll()` is relatively slow compared to `regex()` and `fixed()`. Note that when both `fixed()` and `regex()` have `ignore_case` arguments, they perform a much simpler comparison than `coll()`. #### Boundary `boundary()` matches boundaries between characters, lines, sentences or words. It's most useful with `str_split()`, but can be used with all pattern matching functions: ```{r} x <- "This is a sentence." 
str_split(x, boundary("word")) str_count(x, boundary("word")) str_extract_all(x, boundary("word")) ``` By convention, `""` is treated as `boundary("character")`: ```{r} str_split(x, "") str_count(x, "") ``` stringr/inst/doc/from-base.R0000644000176200001440000002277114524706127015503 0ustar liggesusers## ----------------------------------------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(stringr) library(magrittr) ## ----------------------------------------------------------------------------- data_stringr_base_diff <- tibble::tribble( ~stringr, ~base_r, "str_detect(string, pattern)", "grepl(pattern, x)", "str_dup(string, times)", "strrep(x, times)", "str_extract(string, pattern)", "regmatches(x, m = regexpr(pattern, text))", "str_extract_all(string, pattern)", "regmatches(x, m = gregexpr(pattern, text))", "str_length(string)", "nchar(x)", "str_locate(string, pattern)", "regexpr(pattern, text)", "str_locate_all(string, pattern)", "gregexpr(pattern, text)", "str_match(string, pattern)", "regmatches(x, m = regexec(pattern, text))", "str_order(string)", "order(...)", "str_replace(string, pattern, replacement)", "sub(pattern, replacement, x)", "str_replace_all(string, pattern, replacement)", "gsub(pattern, replacement, x)", "str_sort(string)", "sort(x)", "str_split(string, pattern)", "strsplit(x, split)", "str_sub(string, start, end)", "substr(x, start, stop)", "str_subset(string, pattern)", "grep(pattern, x, value = TRUE)", "str_to_lower(string)", "tolower(x)", "str_to_title(string)", "tools::toTitleCase(text)", "str_to_upper(string)", "toupper(x)", "str_trim(string)", "trimws(x)", "str_which(string, pattern)", "grep(pattern, x)", "str_wrap(string)", "strwrap(x)" ) # create MD table, arranged alphabetically by stringr fn name data_stringr_base_diff %>% dplyr::mutate(dplyr::across(.fns = ~ paste0("`", .x, "`"))) %>% dplyr::arrange(stringr) %>% dplyr::rename(`base R` = base_r) %>% gt::gt() %>% gt::fmt_markdown(columns = everything()) %>% gt::tab_options(column_labels.font.weight = "bold") ## ----------------------------------------------------------------------------- fruit <- c("apple", "banana", "pear", "pineapple") # base grepl(pattern = "a", x = fruit) # stringr str_detect(fruit, pattern = "a") ## ----------------------------------------------------------------------------- # base grep(pattern = "a", x = fruit) # stringr str_which(fruit, pattern = "a") ## ----------------------------------------------------------------------------- # base loc <- gregexpr(pattern = "a", text = fruit, fixed = TRUE) sapply(loc, function(x) length(attr(x, "match.length"))) # stringr str_count(fruit, pattern = "a") ## ----------------------------------------------------------------------------- fruit3 <- c("papaya", "lime", "apple") # base str(gregexpr(pattern = "p", text = fruit3)) # stringr str_locate(fruit3, pattern = "p") str_locate_all(fruit3, pattern = "p") ## ----------------------------------------------------------------------------- hw <- "Hadley Wickham" # base substr(hw, start = 1, stop = 6) substring(hw, first = 1) # stringr str_sub(hw, start = 1, end = 6) str_sub(hw, start = 1) str_sub(hw, end = 6) ## ----------------------------------------------------------------------------- str_sub(hw, start = 1, end = -1) str_sub(hw, start = -5, end = -2) ## ----------------------------------------------------------------------------- al <- "Ada Lovelace" # base substr(c(hw,al), start = 1, stop = 6) substr(c(hw,al), start = c(1,1), stop = c(6,7)) # 
stringr str_sub(c(hw,al), start = 1, end = -1) str_sub(c(hw,al), start = c(1,1), end = c(-1,-2)) ## ----------------------------------------------------------------------------- str_sub(hw, start = 1:5) ## ----------------------------------------------------------------------------- substr(hw, start = 1:5, stop = 15) ## ----------------------------------------------------------------------------- # base x <- "ABCDEF" substr(x, 1, 3) <- "x" x ## ----------------------------------------------------------------------------- # stringr x <- "ABCDEF" str_sub(x, 1, 3) <- "x" x ## ----------------------------------------------------------------------------- # base grep(pattern = "g", x = fruit, value = TRUE) # stringr str_subset(fruit, pattern = "g") ## ----------------------------------------------------------------------------- shopping_list <- c("apples x4", "bag of flour", "10", "milk x2") # base matches <- regexpr(pattern = "\\d+", text = shopping_list) # digits regmatches(shopping_list, m = matches) matches <- gregexpr(pattern = "[a-z]+", text = shopping_list) # words regmatches(shopping_list, m = matches) # stringr str_extract(shopping_list, pattern = "\\d+") str_extract_all(shopping_list, "[a-z]+") ## ----------------------------------------------------------------------------- head(sentences) noun <- "([A]a|[Tt]he) ([^ ]+)" # base matches <- regexec(pattern = noun, text = head(sentences)) do.call("rbind", regmatches(x = head(sentences), m = matches)) # stringr str_match(head(sentences), pattern = noun) ## ----------------------------------------------------------------------------- # base nchar(letters) # stringr str_length(letters) ## ----------------------------------------------------------------------------- # base nchar(factor("abc")) ## ----------------------------------------------------------------------------- # stringr str_length(factor("abc")) ## ----------------------------------------------------------------------------- x <- c("\u00fc", "u\u0308") x nchar(x) str_length(x) ## ----------------------------------------------------------------------------- # base sprintf("%30s", "hadley") sprintf("%-30s", "hadley") # "both" is not as straightforward # stringr rbind( str_pad("hadley", 30, "left"), str_pad("hadley", 30, "right"), str_pad("hadley", 30, "both") ) ## ----------------------------------------------------------------------------- x <- "This string is moderately long" # stringr rbind( str_trunc(x, 20, "right"), str_trunc(x, 20, "left"), str_trunc(x, 20, "center") ) ## ----------------------------------------------------------------------------- # base trimws(" String with trailing and leading white space\t") trimws("\n\nString with trailing and leading white space\n\n") # stringr str_trim(" String with trailing and leading white space\t") str_trim("\n\nString with trailing and leading white space\n\n") ## ----------------------------------------------------------------------------- # stringr str_squish(" String with trailing, middle, and leading white space\t") str_squish("\n\nString with excess, trailing and leading white space\n\n") ## ----------------------------------------------------------------------------- gettysburg <- "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal." 
# base cat(strwrap(gettysburg, width = 60), sep = "\n") # stringr cat(str_wrap(gettysburg, width = 60), "\n") ## ----------------------------------------------------------------------------- fruits <- c("apple", "banana", "pear", "pineapple") # base sub("[aeiou]", "-", fruits) gsub("[aeiou]", "-", fruits) # stringr str_replace(fruits, "[aeiou]", "-") str_replace_all(fruits, "[aeiou]", "-") ## ----------------------------------------------------------------------------- dog <- "The quick brown dog" # base toupper(dog) tolower(dog) tools::toTitleCase(dog) # stringr str_to_upper(dog) str_to_lower(dog) str_to_title(dog) ## ----------------------------------------------------------------------------- # stringr str_to_upper("i") # English str_to_upper("i", locale = "tr") # Turkish ## ----------------------------------------------------------------------------- # base paste0(letters, collapse = "-") # stringr str_flatten(letters, collapse = "-") ## ----------------------------------------------------------------------------- fruit <- c("apple", "pear", "banana") # base strrep(fruit, 2) strrep(fruit, 1:3) # stringr str_dup(fruit, 2) str_dup(fruit, 1:3) ## ----------------------------------------------------------------------------- fruits <- c( "apples and oranges and pears and bananas", "pineapples and mangos and guavas" ) # base strsplit(fruits, " and ") # stringr str_split(fruits, " and ") ## ----------------------------------------------------------------------------- # stringr str_split(fruits, " and ", n = 3) str_split(fruits, " and ", n = 2) ## ----------------------------------------------------------------------------- name <- "Fred" age <- 50 anniversary <- as.Date("1991-10-12") # base sprintf( "My name is %s my age next year is %s and my anniversary is %s.", name, age + 1, format(anniversary, "%A, %B %d, %Y") ) # stringr str_glue( "My name is {name}, ", "my age next year is {age + 1}, ", "and my anniversary is {format(anniversary, '%A, %B %d, %Y')}." ) ## ----------------------------------------------------------------------------- # base order(letters) sort(letters) # stringr str_order(letters) str_sort(letters) ## ----------------------------------------------------------------------------- x <- c("å", "a", "z") str_sort(x) str_sort(x, locale = "no") ## ----------------------------------------------------------------------------- # stringr x <- c("100a10", "100a5", "2b", "2a") str_sort(x) str_sort(x, numeric = TRUE) stringr/inst/htmlwidgets/0000755000176200001440000000000014524706124015257 5ustar liggesusersstringr/inst/htmlwidgets/str_view.yaml0000644000176200001440000000014712613714471020010 0ustar liggesusersdependencies: - name: str_view version: 0.1.0 src: htmlwidgets/lib/ stylesheet: str_view.css stringr/inst/htmlwidgets/str_view.js0000644000176200001440000000036712613714660017466 0ustar liggesusersHTMLWidgets.widget({ name: 'str_view', type: 'output', initialize: function(el, width, height) { }, renderValue: function(el, x, instance) { el.innerHTML = x.html; }, resize: function(el, width, height, instance) { } }); stringr/inst/htmlwidgets/lib/0000755000176200001440000000000014316043620016017 5ustar liggesusersstringr/inst/htmlwidgets/lib/str_view.css0000644000176200001440000000044014316043620020371 0ustar liggesusers.str_view ul { font-size: 16px; } .str_view ul, .str_view li { list-style: none; padding: 0; margin: 0.5em 0; } .str_view .match { border: 1px solid #ccc; background-color: #eee; border-color: #ccc; border-radius: 3px; } .str_view .special { background-color: red; }