fansi/NAMESPACE
# Generated by roxygen2: do not edit by hand

export(fansi_lines)
export(has_ctl)
export(has_sgr)
export(html_code_block)
export(html_esc)
export(nchar_ctl)
export(nchar_sgr)
export(nzchar_ctl)
export(nzchar_sgr)
export(set_knit_hooks)
export(sgr_to_html)
export(strip_ctl)
export(strip_sgr)
export(strsplit_ctl)
export(strsplit_sgr)
export(strtrim2_ctl)
export(strtrim2_sgr)
export(strtrim_ctl)
export(strtrim_sgr)
export(strwrap2_ctl)
export(strwrap2_sgr)
export(strwrap_ctl)
export(strwrap_sgr)
export(substr2_ctl)
export(substr2_sgr)
export(substr_ctl)
export(substr_sgr)
export(tabs_as_spaces)
export(term_cap_test)
export(unhandled_ctl)
useDynLib(fansi, .registration=TRUE, .fixes="FANSI_")
fansi/README.md

fansi - ANSI Control Sequence Aware String Functions
====================================================

[![](https://travis-ci.org/brodieG/fansi.svg?branch=master)](https://travis-ci.org/brodieG/fansi)
[![](https://codecov.io/github/brodieG/fansi/coverage.svg?branch=master)](https://codecov.io/github/brodieG/fansi?branch=master)
[![](http://www.r-pkg.org/badges/version/fansi)](https://cran.r-project.org/package=fansi)
[![Dependencies direct/recursive](https://tinyverse.netlify.com/badge/fansi)](https://tinyverse.netlify.com/)

Counterparts to R string manipulation functions that account for the
effects of ANSI text formatting control sequences.

Formatting Strings with Control Sequences
-----------------------------------------

Many terminals will recognize special sequences of characters in strings
and change display behavior as a result.
For example, on my terminal the sequence `"\033[42m"` turns the text
background green:

![hello world](https://raw.githubusercontent.com/brodieG/fansi/rc/extra/images/hello.png)

The sequence itself is not shown, but the text display changes.

This type of sequence is called an ANSI CSI SGR control sequence. Most
\*nix terminals support them, and newer versions of Windows and RStudio
consoles do too. You can check whether your display supports them by
running `term_cap_test()`.

Whether the `fansi` functions behave as expected depends on many
factors, including how your particular display handles Control
Sequences. See `?fansi` for details, particularly if you are getting
unexpected results.

Control Sequences Require Special Handling
------------------------------------------

ANSI control characters and sequences (*Control Sequences* hereafter)
break the relationship between byte/character position in a string and
display position.

For example, in `"Hello \033[42mWorld, Good\033[m Night Moon!"` the
space after “World,” is the thirteenth displayed character, but the
eighteenth actual character (“\\033” is a single character, the ESC).
If we try to split the string after the space with `substr`, things go
wrong in several ways:

![bad substring](https://raw.githubusercontent.com/brodieG/fansi/master/extra/images/substr.png)

We end up cutting our string in the middle of “World”, and worse, the
formatting bleeds out of our string into the prompt line. Compare this
with what happens when we use `substr_ctl`, the *Control Sequence*
aware version of `substr`:

![good substring](https://raw.githubusercontent.com/brodieG/fansi/master/extra/images/substr_ctl.png)

Functions
---------

`fansi` provides counterparts to the following string functions:

- `substr`
- `strsplit`
- `strtrim`
- `strwrap`
- `nchar` / `nzchar`

These are drop-in replacements that behave (almost) identically to the
base counterparts, except for the *Control Sequence* awareness.
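To make the position mismatch from the `"Hello \033[42mWorld, Good\033[m Night Moon!"` example concrete, here is a minimal base-R sketch that does not use `fansi` at all. The helpers `strip_sgr_naive()` and `display_to_raw()` are hypothetical, written only for this illustration, and they assume every sequence has the form ESC `[` digits/semicolons `m`; `fansi` itself also handles other CSI sequences, C0 controls, and other escapes.

```r
# Hypothetical helpers for illustration only; NOT part of fansi.  They only
# recognize CSI SGR sequences of the form ESC [ <digits and ;> m.
strip_sgr_naive <- function(x) gsub("\033\\[[0-9;]*m", "", x)

# Map a displayed-character position to the matching raw character position
# by walking the string and skipping over ESC [ ... m sequences.
display_to_raw <- function(x, pos) {
  chars <- strsplit(x, "")[[1L]]
  i <- 1L
  shown <- 0L
  while (i <= length(chars)) {
    if (chars[i] == "\033" && i < length(chars) && chars[i + 1L] == "[") {
      j <- i + 2L
      while (j <= length(chars) && chars[j] != "m") j <- j + 1L
      i <- j + 1L            # resume after the terminating "m"
      next
    }
    shown <- shown + 1L      # a visible character
    if (shown == pos) return(i)
    i <- i + 1L
  }
  NA_integer_                # `pos` is past the end of the displayed text
}

x <- "Hello \033[42mWorld, Good\033[m Night Moon!"
nchar(x)                     # 37 raw characters
nchar(strip_sgr_naive(x))    # 29 displayed characters
display_to_raw(x, 13L)       # the space after "World," is raw character 18
substr(x, 1, 13)             # raw positions: "Hello \033[42mWo", cut mid-word
```

Switching to the raw position, `substr(x, 1, 18)` keeps “World, ” intact but still ends inside the region opened by `\033[42m` with no closing `\033[m`, so the green background stays active in whatever is printed next.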
`fansi` also includes improved versions of some of those functions, such
as `substr2_ctl`, which allows for width-based substrings. There are also
utility functions such as `strip_ctl` to remove *Control Sequences* and
`has_ctl` to detect whether strings contain them.

Most of `fansi` is written in C, so the performance of the `fansi`
functions should be comparable to that of the base functions.
`strwrap_ctl` is much faster, and `strsplit_ctl` somewhat slower, than
the corresponding base functions.

HTML Translation
----------------

You can translate ANSI CSI SGR formatted strings into their HTML
counterparts with `sgr_to_html`:

![translate to html](https://raw.githubusercontent.com/brodieG/fansi/master/extra/images/sgr_to_html.png)

Rmarkdown
---------

It is possible to set `knitr` hooks such that R output that contains
ANSI CSI SGR sequences is automatically converted to the HTML formatted
equivalent and displayed as intended. See the
[vignette](https://htmlpreview.github.io/?https://raw.githubusercontent.com/brodieG/fansi/issue61/doc/sgr-in-rmd.html)
for details.

Installation
------------

This package is available on CRAN:

    install.packages('fansi')

It has no runtime dependencies.

For the development version use
`remotes::install_github('brodieg/fansi@development')` or:

    f.dl <- tempfile()
    f.uz <- tempfile()
    github.url <- 'https://github.com/brodieG/fansi/archive/development.zip'
    download.file(github.url, f.dl)
    unzip(f.dl, exdir=f.uz)
    install.packages(file.path(f.uz, 'fansi-development'), repos=NULL, type='source')
    # `recursive=TRUE` is required for `unlink` to remove the unzipped directory
    unlink(c(f.dl, f.uz), recursive=TRUE)

There is no guarantee that development versions are stable or even
working (Travis build status:
[![](https://travis-ci.org/brodieG/fansi.svg?branch=development)](https://travis-ci.org/brodieG/fansi)).
The master branch typically mirrors CRAN and should be stable.

Related Packages and References
-------------------------------

- [crayon](https://github.com/r-lib/crayon), the library that started
  it all.
- [ansistrings](https://github.com/r-lib/ansistrings/), which implements
  similar functionality.
- [ECMA-48 - Control Functions For Coded Character Sets](http://www.ecma-international.org/publications/standards/Ecma-048.htm),
  in particular pages 10-12 and 61.
- [CCITT Recommendation T.416](https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.416-199303-I!!PDF-E&type=items)
- [ANSI Escape Code - Wikipedia](https://en.wikipedia.org/wiki/ANSI_escape_code)
  for a gentler introduction.

Acknowledgments
---------------

- R Core for developing and maintaining such a wonderful language.
- CRAN maintainers, for patiently shepherding packages onto CRAN and
  maintaining the repository, and Uwe Ligges in particular for
  maintaining [Winbuilder](http://win-builder.r-project.org/).
- [Gábor Csárdi](https://github.com/gaborcsardi) for getting me started
  on the journey into ANSI control sequences, and for many of the ideas
  on how to process them.
- [Jim Hester](https://github.com/jimhester) because
  [covr](https://cran.r-project.org/package=covr) rocks.
- [Dirk Eddelbuettel](https://github.com/eddelbuettel) and
  [Carl Boettiger](https://github.com/cboettig) for the
  [rocker](https://github.com/rocker-org/rocker) project, and
  [Gábor Csárdi](https://github.com/gaborcsardi) and the
  [R-consortium](https://www.r-consortium.org/) for
  [Rhub](https://github.com/r-hub), without which testing bugs on
  R-devel and other platforms would be a nightmare.
- [Tomas Kalibera](https://github.com/kalibera) for
  [rchk](https://github.com/kalibera/rchk) and the accompanying vagrant
  image, and rcnst to help detect errors in compiled code.
- [Winston Chang](https://github.com/wch) for the
  [r-debug](https://hub.docker.com/r/wch1/r-debug/) docker container, in
  particular because of the valgrind level 2 instrumented version of R.
- Hadley Wickham for [devtools](https://cran.r-project.org/package=devtools)
  and [roxygen2](https://cran.r-project.org/package=roxygen2).
- [Yihui Xie](https://github.com/yihui) for
  [knitr](https://cran.r-project.org/package=knitr) and
  [J.J. Allaire](https://github.com/jjallaire) et al. for
  [rmarkdown](https://cran.r-project.org/package=rmarkdown), and by
  extension John MacFarlane for [pandoc](http://pandoc.org/).
- Olaf Mersmann for
  [microbenchmark](https://cran.r-project.org/package=microbenchmark),
  because microseconds matter.
- All open source developers out there that make their work freely
  available for others to use.
- [GitHub](https://github.com/), [Travis-CI](https://travis-ci.org/),
  [Codecov](https://codecov.io/), [Vagrant](https://www.vagrantup.com/),
  [Docker](https://www.docker.com/), [Ubuntu](https://www.ubuntu.com/),
  [Brew](https://brew.sh/) for providing infrastructure that greatly
  simplifies open source development.
- [Free Software Foundation](http://fsf.org/) for developing the GPL
  license and promoting the free software movement.
fansi/man/fansi.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/fansi-package.R
\docType{package}
\name{fansi}
\alias{fansi}
\alias{fansi-package}
\title{Details About Manipulation of Strings Containing Control Sequences}
\description{
Counterparts to R string manipulation functions that account for
the effects of ANSI text formatting control sequences.
}
\section{Control Characters and Sequences}{

Control characters and sequences are non-printing inline characters that
can be used to modify terminal display and behavior, for example by
changing text color or cursor position.

We will refer to ANSI control characters and sequences as
"\emph{Control Sequences}" hereafter.
There are three types of \emph{Control Sequences} that \code{fansi} can
treat specially:
\itemize{
\item "C0" control characters, such as tabs and carriage returns (we
include delete in this set, even though technically it is not part of it).
\item Sequences starting with "ESC[", also known as ANSI CSI sequences.
\item Sequences starting with "ESC" and followed by something other than
"[".
}

\emph{Control Sequences} starting with ESC are assumed to be two characters
long (including the ESC) unless they are of the CSI variety, in which case
their length is computed as per the
\href{http://www.ecma-international.org/publications/standards/Ecma-048.htm}{ECMA-48 specification}.
There are non-CSI escape sequences that may be longer than two characters,
but \code{fansi} will (incorrectly) treat them as if they were two
characters long.

In theory it is possible to encode \emph{Control Sequences} with a single
byte introducing character in the 0x80-0x9F (C1) range instead of the
traditional "ESC[".  Since this is rare and it conflicts with UTF-8
encoding, we do not support it.

The special treatment of \emph{Control Sequences} is to compute their
display/character width as zero.  For the SGR subset of the ANSI CSI
sequences, \code{fansi} will also parse, interpret, and reapply the text
styles they encode as needed.  Whether a particular type of
\emph{Control Sequence} is treated specially can be specified via the
\code{ctl} parameter to the \code{fansi} functions that have it.
}
\section{ANSI CSI SGR Control Sequences}{

\strong{NOTE}: not all displays support ANSI CSI SGR sequences; run
\link{term_cap_test} to see whether your display supports them.

ANSI CSI SGR Control Sequences are the subset of CSI sequences that can be
used to change text appearance (e.g. color).  These sequences begin with
"ESC[" and end in "m".  \code{fansi} interprets these sequences and writes
new ones to the output strings in such a way that the original formatting
is preserved.
In most cases this should be transparent to the user.  Occasionally there
may be mismatches between how \code{fansi} and a display interpret the CSI
SGR sequences, which may produce display artifacts.  The most likely source
of artifacts is \emph{Control Sequences} that move the cursor or change the
display, or that \code{fansi} otherwise fails to interpret, such as:
\itemize{
\item Unknown SGR substrings.
\item "C0" control characters like tabs and carriage returns.
\item Other escape sequences.
}

Another possible source of problems is that different displays parse and
interpret control sequences differently.  The common CSI SGR sequences that
you are likely to encounter in formatted text tend to be treated
consistently, but less common ones may not be.  \code{fansi} tries to hew to
the ECMA-48 specification \strong{for CSI control sequences}, but not all
terminals do.

The most likely source of problems will be 24-bit CSI SGR sequences.  For
example, a 24-bit color sequence such as "ESC[38;2;31;42;4m" is a single
foreground color to a terminal that supports it, or separate foreground,
background, faint, and underline specifications for one that does not.  To
mitigate this particular problem you can tell \code{fansi} what your
terminal capabilities are via the \code{term.cap} parameter or the
"fansi.term.cap" global option, although \code{fansi} does try to detect
them by default.

\code{fansi} will warn if it encounters \emph{Control Sequences} that it
cannot interpret or that might conflict with terminal capabilities.  You
can turn off warnings via the \code{warn} parameter or via the "fansi.warn"
global option.

\code{fansi} can work around "C0" tab control characters by turning them
into spaces first with \code{\link{tabs_as_spaces}} or with the
\code{tabs.as.spaces} parameter available in some of the \code{fansi}
functions.

We chose to interpret ANSI CSI SGR sequences because this reduces how much
string transcription we need to do during string manipulation.
If we do not interpret the sequences then we need to record all of them
from the beginning of the string and prepend all the accumulated tags up to
the beginning of a substring to the substring.  In many cases the bulk of
those accumulated tags will be irrelevant as their effects will have been
superseded by subsequent tags.

\code{fansi} assumes that ANSI CSI SGR sequences should be interpreted in
cumulative "Graphic Rendition Combination Mode".  This means new SGR
sequences add to rather than replace previous ones, although in some cases
the effect is the same as replacement (e.g. if you have a color active and
pick another one).
}
\section{Encodings / UTF-8}{

\code{fansi} will convert any non-ASCII strings to UTF-8 before processing
them, and \code{fansi} functions that return strings will return them
encoded in UTF-8.  In some cases this will be different from what base R
does.  For example, \code{substr} re-encodes substrings to their original
encoding.

Interpretation of UTF-8 strings is intended to be consistent with base R.
There are three ways things may not work out exactly as desired:
\enumerate{
\item \code{fansi}, despite its best intentions, handles a UTF-8 sequence
differently from the way R does.
\item R incorrectly handles a UTF-8 sequence.
\item Your display incorrectly handles a UTF-8 sequence.
}

These issues are most likely to occur with invalid UTF-8 sequences,
combining character sequences, and emoji.  For example, as of this writing
R (and the OSX terminal) consider emoji to be one column wide, when in
reality they are two columns wide.  Do not expect the \code{fansi} width
calculations to work correctly with strings containing emoji.

Internally, \code{fansi} computes the width of every UTF-8 character
sequence outside of the ASCII range using the native \code{R_nchar}
function.  This will cause such characters to be processed more slowly than
ASCII characters.
Additionally, \code{fansi} character width computations can differ from R
width computations despite the use of \code{R_nchar}.  \code{fansi} always
computes width for each character individually, which assumes that the sum
of the widths of each character is equal to the width of a sequence.
However, it is theoretically possible for a character sequence that forms a
single grapheme to break that assumption.  In informal testing we have
found this to be rare because in the most common multi-character graphemes
the trailing characters are computed as zero width.

As of R 3.4.0, \code{substr} appears to use UTF-8 character byte sizes as
indicated by the leading byte, irrespective of whether the subsequent bytes
lead to a valid sequence.  Additionally, UTF-8 byte sequences as long as 5
or 6 bytes may be allowed, which is likely a holdover from older Unicode
versions.  \code{fansi} mimics this behavior.  It is likely \code{substr}
will start failing with invalid UTF-8 byte sequences with R 3.6.0 (as per
SVN r74488).  In general, you should assume that \code{fansi} may not
replicate base R exactly when there are illegal UTF-8 sequences present.

Our long-term objective is to implement proper UTF-8 character width
computations, but for simplicity, and also because R and our terminal do
not do it properly either, we are deferring the issue for now.
}
\section{R < 3.2.2 support}{

Nominally you can build and run this package in R versions between 3.1.0
and 3.2.1.  Things should mostly work, but please be aware we do not run
the test suite under versions of R less than 3.2.2.  One key degraded
capability is width computation of wide-display characters.  Under
R < 3.2.2 \code{fansi} will assume every character is 1 display column
wide.  Additionally, \code{fansi} may not always report malformed UTF-8
sequences as it usually does.  One exception to this is
\code{\link{nchar_ctl}}, as that is just a thin wrapper around
\code{\link[base:nchar]{base::nchar}}.
}
\section{Miscellaneous}{

The native code in this package assumes that all strings are NUL-terminated
and no longer than (32-bit) INT_MAX (excluding the NUL).  This should be a
safe assumption since the code is designed to work with STRSXPs and
CHRSXPs.  Behavior is undefined and probably bad if you somehow manage to
provide \code{fansi} with strings that do not adhere to these assumptions.
}
fansi/man/strsplit_ctl.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/strsplit.R
\name{strsplit_ctl}
\alias{strsplit_ctl}
\alias{strsplit_sgr}
\title{ANSI Control Sequence Aware Version of strsplit}
\usage{
strsplit_ctl(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE,
  warn = getOption("fansi.warn"), term.cap = getOption("fansi.term.cap"),
  ctl = "all")

strsplit_sgr(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE,
  warn = getOption("fansi.warn"), term.cap = getOption("fansi.term.cap"))
}
\arguments{
\item{x}{a character vector, or, unlike \link[base:strsplit]{base::strsplit},
an object that can be coerced to character.}

\item{split}{ character vector (or object which can be coerced to such)
    containing \link{regular expression}(s) (unless \code{fixed = TRUE})
    to use for splitting.  If empty matches occur, in particular if
    \code{split} has length 0, \code{x} is split into single characters.
    If \code{split} has length greater than 1, it is re-cycled along
    \code{x}. }

\item{fixed}{ logical.  If \code{TRUE} match \code{split} exactly,
    otherwise use regular expressions.  Has priority over \code{perl}. }

\item{perl}{logical.  Should Perl-compatible regexps be used?}

\item{useBytes}{logical.  If \code{TRUE} the matching is done
    byte-by-byte rather than character-by-character, and inputs with
    marked encodings are not converted.
    This is forced (with a warning) if any input is found which is
    marked as \code{"bytes"} (see \code{\link{Encoding}}).}

\item{warn}{TRUE (default) or FALSE, whether to warn when potentially
problematic \emph{Control Sequences} are encountered.  These could cause
the assumptions \code{fansi} makes about how strings are rendered on your
display to be incorrect, for example by moving the cursor (see
\link{fansi}).}

\item{term.cap}{character, a vector of the capabilities of the terminal;
can be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR
codes starting with "38;5" or "48;5"), and "truecolor" (SGR codes starting
with "38;2" or "48;2").  Changing this parameter changes how \code{fansi}
interprets escape sequences, so you should ensure that it matches your
terminal capabilities.  See \link{term_cap_test} for details.}

\item{ctl}{character, which \emph{Control Sequences} should be treated
specially.  See the "_ctl vs. _sgr" section for details.
\itemize{
\item "nl": newlines.
\item "c0": all other "C0" control characters (i.e. 0x01-0x1f, 0x7F),
except for newlines and the actual ESC (0x1B) character.
\item "sgr": ANSI CSI SGR sequences.
\item "csi": all non-SGR ANSI CSI sequences.
\item "esc": all other escape sequences.
\item "all": all of the above, except when used in combination with any of
the above, in which case it means "all but".
}}
}
\value{
list, see \link[base:strsplit]{base::strsplit}.
}
\description{
A drop-in replacement for \link[base:strsplit]{base::strsplit}.  It will be
noticeably slower, but should otherwise behave the same way except for
\emph{Control Sequence} awareness.
}
\details{
This function works by computing the positions of the split points after
removing \emph{Control Sequences}, and using those positions in conjunction
with \code{\link{substr_ctl}} to extract the pieces.  This concept is
borrowed from \code{crayon::col_strsplit}.
An important implication of this is that you cannot split by
\emph{Control Sequences} that are being treated as
\emph{Control Sequences}.  You can, however, limit which control sequences
are treated specially via the \code{ctl} parameter (see examples).
}
\note{
Non-ASCII strings are converted to and returned in UTF-8 encoding.  The
split positions are computed after both \code{x} and \code{split} are
converted to UTF-8.
}
\section{_ctl vs. _sgr}{

The \code{*_ctl} versions of the functions treat all
\emph{Control Sequences} specially by default.  Special treatment is
context-dependent, and may include detecting them and/or computing their
display/character width as zero.  For the SGR subset of the ANSI CSI
sequences, \code{fansi} will also parse, interpret, and reapply the text
styles they encode if needed.  You can modify whether a
\emph{Control Sequence} is treated specially with the \code{ctl} parameter.
You can exclude a type of \emph{Control Sequence} from special treatment by
combining "all" with that type of sequence (e.g.
\code{ctl=c("all", "nl")} for special treatment of all
\emph{Control Sequences} \strong{but} newlines).  The \code{*_sgr} versions
only treat ANSI CSI SGR sequences specially, and are equivalent to the
\code{*_ctl} versions with the \code{ctl} parameter set to "sgr".
}

\examples{
strsplit_sgr("\\033[31mhello\\033[42m world!", " ")

## Next two examples allow splitting by newlines, which
## normally doesn't work as newlines are _Control Sequences_
strsplit_sgr("\\033[31mhello\\033[42m\\nworld!", "\\n")
strsplit_ctl("\\033[31mhello\\033[42m\\nworld!", "\\n", ctl=c("all", "nl"))
}
\seealso{
\link{fansi} for details on how \emph{Control Sequences} are
interpreted, particularly if you are getting unexpected results,
\link[base:strsplit]{base::strsplit} for details on the splitting.
}
fansi/man/set_knit_hooks.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/misc.R
\name{set_knit_hooks}
\alias{set_knit_hooks}
\title{Set an Output Hook to Display ANSI CSI SGR in Rmarkdown}
\usage{
set_knit_hooks(hooks, which = "output", proc.fun = function(x, class)
  html_code_block(sgr_to_html(html_esc(x)), class = class),
  class = sprintf("fansi fansi-\%s", which),
  style = getOption("fansi.css"), split.nl = FALSE, .test = FALSE)
}
\arguments{
\item{hooks}{list, this should be the \code{knitr::knit_hooks} object; we
require you to pass this to avoid a run-time dependency on \code{knitr}.}

\item{which}{character vector with the names of the hooks that should be
replaced, defaults to 'output', but can also contain values 'message',
'warning', and 'error'.}

\item{proc.fun}{function that will be applied to output that contains
ANSI CSI SGR sequences.  Should accept parameters \code{x} and
\code{class}, where \code{x} is the output, and \code{class} is the CSS
class that should be applied to the
 blocks the output will be placed in.}

\item{class}{character the CSS class to give the output chunks.  Each type of
output chunk specified in \code{which} will be matched position-wise to the
classes specified here.  This vector should be the same length as \code{which}.}

\item{style}{character a vector of CSS styles; these will be output inside
HTML "))

  if(.test) list(old.hooks=old.hook.list, new.hooks=new.hook.list, res=set.res)
  else old.hook.list
}
fansi/R/strip.R
## Copyright (C) 2020  Brodie Gaslam
##
## This file is part of "fansi - ANSI Control Sequence Aware String Functions"
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## Go to  for a copy of the license.

#' Strip ANSI Control Sequences
#'
#' Removes _Control Sequences_ from strings.  By default it will
#' strip all known _Control Sequences_, including ANSI CSI
#' sequences, two character sequences starting with ESC, and all C0 control
#' characters, including newlines.  You can fine tune this behavior with the
#' `ctl` parameter.  `strip_sgr` only strips ANSI CSI SGR sequences.
#'
#' The `ctl` value contains the names of **non-overlapping** subsets of the
#' known _Control Sequences_ (e.g. "csi" does not contain "sgr", and "c0" does
#' not contain newlines).  The one exception is "all" which means strip every
#' known sequence.  If you combine "all" with any other option then everything
#' **but** that option will be stripped.
#'
#' @note Non-ASCII strings are converted to and returned in UTF-8 encoding.
#' @seealso [fansi] for details on how _Control Sequences_ are
#'   interpreted, particularly if you are getting unexpected results.
#' @inheritParams substr_ctl
#' @inheritSection substr_ctl _ctl vs. _sgr
#' @export
#' @param ctl character, any combination of the following values (see details):
#'   * "nl": strip newlines.
#'   * "c0": strip all other "C0" control characters (i.e. 0x01-0x1f, 0x7F),
#'     except for newlines and the actual ESC character.
#'   * "sgr": strip ANSI CSI SGR sequences.
#'   * "csi": strip all non-SGR CSI sequences.
#'   * "esc": strip all other escape sequences.
#'   * "all": all of the above, except when used in combination with any of the
#'     above, in which case it means "all but" (see details).
#' @param strip character, deprecated in favor of `ctl`.
#' @return character vector of the same length as `x` with the selected
#'   _Control Sequences_ stripped.
#' @examples
#' string <- "hello\033k\033[45p world\n\033[31mgoodbye\a moon"
#' strip_ctl(string)
#' strip_ctl(string, c("nl", "c0", "sgr", "csi", "esc")) # equivalently
#' strip_ctl(string, "sgr")
#' strip_ctl(string, c("c0", "esc"))
#'
#' ## everything but C0 controls; we need to specify "nl"
#' ## in addition to "c0" since "nl" is not part of "c0"
#' ## as far as the `ctl` argument is concerned
#' strip_ctl(string, c("all", "nl", "c0"))
#'
#' ## convenience function, same as `strip_ctl(ctl='sgr')`
#' strip_sgr(string)

strip_ctl <- function(x, ctl='all', warn=getOption('fansi.warn'), strip) {
  if(!missing(strip)) {
    message("Parameter `strip` has been deprecated; use `ctl` instead.")
    ctl <- strip
  }
  if(!is.character(x)) x <- as.character(x)

  if(!is.logical(warn)) warn <- as.logical(warn)
  if(length(warn) != 1L || is.na(warn))
    stop("Argument `warn` must be TRUE or FALSE.")

  if(!is.character(ctl))
    stop("Argument `ctl` must be character.")
  if(length(ctl)) {
    # duplicate values in `ctl` are okay, so save a call to `unique` here
    if(anyNA(ctl.int <- match(ctl, VALID.CTL)))
      stop(
        "Argument `ctl` may contain only values in `",
        deparse(VALID.CTL), "`"
      )
    .Call(FANSI_strip_csi, enc2utf8(x), ctl.int, warn)
  } else x
}
#' @export
#' @rdname strip_ctl

strip_sgr <- function(x, warn=getOption('fansi.warn')) {
  if(!is.character(x)) x <- as.character(x)
  if(!is.logical(warn)) warn <- as.logical(warn)
  if(length(warn) != 1L || is.na(warn))
    stop("Argument `warn` must be TRUE or FALSE.")

  ctl.int <- match("sgr", VALID.CTL)
  if(anyNA(ctl.int))
    stop("Internal Error: invalid ctl type; contact maintainer.") # nocov

  .Call(FANSI_strip_csi, enc2utf8(x), ctl.int, warn)
}

## Process String by Removing Unwanted Characters
##
## This is to simulate what `strwrap` does, exposed for testing purposes.

process <- function(x) .Call(FANSI_process, enc2utf8(x))

fansi/R/strtrim.R
## Copyright (C) 2020  Brodie Gaslam
##
## This file is part of "fansi - ANSI Control Sequence Aware String Functions"
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## Go to  for a copy of the license.

#' ANSI Control Sequence Aware Version of strtrim
#'
#' One difference from [base::strtrim] is that all C0 control characters such as
#' newlines, carriage returns, etc., are treated as zero width.
#'
#' `strtrim2_ctl` adds the option of converting tabs to spaces before trimming.
#' This is the only difference between `strtrim_ctl` and `strtrim2_ctl`.
#'
#' @note Non-ASCII strings are converted to and returned in UTF-8 encoding.
#'   Width calculations will not work correctly with R < 3.2.2.
#' @export
#' @inheritSection substr_ctl _ctl vs. _sgr
#' @seealso [fansi] for details on how _Control Sequences_ are
#'   interpreted, particularly if you are getting unexpected results.
#'   [strwrap_ctl] is used internally by this function.
#' @inheritParams base::strtrim
#' @inheritParams strwrap_ctl
#' @examples
#' strtrim_ctl("\033[42mHello world\033[m", 6)

strtrim_ctl <- function(x, width, warn=getOption('fansi.warn'), ctl='all'){
  if(!is.character(x)) x <- as.character(x)

  if(!is.numeric(width) || length(width) != 1L || is.na(width) || width < 0)
    stop("Argument `width` must be a positive scalar numeric.")

  if(!is.logical(warn)) warn <- as.logical(warn)
  if(length(warn) != 1L || is.na(warn))
    stop("Argument `warn` must be TRUE or FALSE.")

  if(!is.character(ctl))
    stop("Argument `ctl` must be character.")
  ctl.int <- integer()
  if(length(ctl)) {
    # duplicate values in `ctl` are okay, so save a call to `unique` here
    if(anyNA(ctl.int <- match(ctl, VALID.CTL)))
      stop(
        "Argument `ctl` may contain only values in `",
        deparse(VALID.CTL), "`"
      )
  }
  # can assume all term cap available for these purposes

  term.cap.int <- seq_along(VALID.TERM.CAP)
  width <- as.integer(width)

  # a bit inefficient to rely on strwrap, but oh well

  res <- .Call(
    FANSI_strwrap_csi,
    enc2utf8(x), width,
    0L, 0L,    # indent, exdent
    "", "",    # prefix, initial
    TRUE, "",  # wrap always
    FALSE,     # strip spaces
    FALSE, 8L, # tabs.as.spaces, tab.stops
    warn, term.cap.int,
    TRUE,      # first only
    ctl.int
  )
  res
}
#' @export
#' @rdname strtrim_ctl

strtrim2_ctl <- function(
  x, width, warn=getOption('fansi.warn'),
  tabs.as.spaces=getOption('fansi.tabs.as.spaces'),
  tab.stops=getOption('fansi.tab.stops'),
  ctl='all'
) {
  if(!is.character(x)) x <- as.character(x)

  if(!is.numeric(width) || length(width) != 1L || is.na(width) || width < 0)
    stop("Argument `width` must be a positive scalar numeric.")

  if(!is.logical(warn)) warn <- as.logical(warn)
  if(length(warn) != 1L || is.na(warn))
    stop("Argument `warn` must be TRUE or FALSE.")

  if(!is.logical(tabs.as.spaces)) tabs.as.spaces <- as.logical(tabs.as.spaces)
  if(length(tabs.as.spaces) != 1L || is.na(tabs.as.spaces))
    stop("Argument `tabs.as.spaces` must be TRUE or FALSE.")

  if(!is.numeric(tab.stops) || !length(tab.stops) || any(tab.stops < 1))
    stop("Argument `tab.stops` must be numeric and strictly positive")

  if(!is.character(ctl))
    stop("Argument `ctl` must be character.")
  ctl.int <- integer()
  if(length(ctl)) {
    # duplicate values in `ctl` are okay, so save a call to `unique` here
    if(anyNA(ctl.int <- match(ctl, VALID.CTL)))
      stop(
        "Argument `ctl` may contain only values in `",
        deparse(VALID.CTL), "`"
      )
  }
  # can assume all term cap available for these purposes

  term.cap.int <- seq_along(VALID.TERM.CAP)
  width <- as.integer(width)
  tab.stops <- as.integer(tab.stops)

  # a bit inefficient to rely on strwrap, but oh well

  res <- .Call(
    FANSI_strwrap_csi,
    enc2utf8(x), width,
    0L, 0L,    # indent, exdent
    "", "",    # prefix, initial
    TRUE, "",  # wrap.always, pad.end
    FALSE,     # strip spaces
    tabs.as.spaces, tab.stops,
    warn, term.cap.int,
    TRUE,      # first only
    ctl.int
  )
  res
}
#' @export
#' @rdname strtrim_ctl

strtrim_sgr <- function(x, width, warn=getOption('fansi.warn'))
  strtrim_ctl(x=x, width=width, warn=warn, ctl='sgr')

#' @export
#' @rdname strtrim_ctl

strtrim2_sgr <- function(x, width, warn=getOption('fansi.warn'),
  tabs.as.spaces=getOption('fansi.tabs.as.spaces'),
  tab.stops=getOption('fansi.tab.stops')
)
  strtrim2_ctl(
    x=x, width=width, warn=warn, tabs.as.spaces=tabs.as.spaces,
    tab.stops=tab.stops, ctl='sgr'
  )
fansi/R/internal.R0000755000176200001440000000374513604507326013534 0ustar  liggesusers## Copyright (C) 2020  Brodie Gaslam
##
## This file is part of "fansi - ANSI Control Sequence Aware String Functions"
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## Go to  for a copy of the license.

## Tracks whether we are running in R >= 3.2.2 or not (see .onLoad)

R.ver.gte.3.2.2 <- NA

## Internal functions, used primarily for testing

## A version of unique that isn't terrible for very long strings that are
## actually the same

unique_chr <- function(x) .Call(FANSI_unique_chr, x)

## Testing interface for color code to HTML conversion

esc_color_code_to_html <- function(x) {
  if(!is.matrix(x) || !is.integer(x) || nrow(x) != 5)
    stop("Argument `x` must be a five row integer matrix.")
  .Call(FANSI_color_to_html, as.integer(x))
}

check_assumptions <- function() .Call(FANSI_check_assumptions)  # nocov
digits_in_int <- function(x) .Call(FANSI_digits_in_int, x)

add_int <- function(x, y) .Call(FANSI_add_int, as.integer(x), as.integer(y))

## testing interface for low overhead versions of R funs

cleave <- function(x) .Call(FANSI_cleave, x)
forder <- function(x) .Call(FANSI_order, x)
sort_chr <- function(x) .Call(FANSI_sort_chr, x)

set_int_max <- function(x) .Call(FANSI_set_int_max, as.integer(x)[1])
get_int_max <- function(x) .Call(FANSI_get_int_max)  # nocov for debug only

## exposed internals for testing

check_enc <- function(x, i) .Call(FANSI_check_enc, x, as.integer(i)[1])

## make sure `ctl` compression is working

ctl_as_int <- function(x) .Call(FANSI_ctl_as_int, as.integer(x))

fansi/R/unhandled.R0000755000176200001440000001045213604507326013653 0ustar  liggesusers## Copyright (C) 2020  Brodie Gaslam
##
## This file is part of "fansi - ANSI Control Sequence Aware String Functions"
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## Go to  for a copy of the license.

#' Identify Unhandled ANSI Control Sequences
#'
#' Returns the positions and types of unhandled _Control Sequences_ in a
#' character vector.  Unhandled sequences may cause `fansi` to interpret strings
#' differently from how your display does.  See [fansi] for details.
#'
#' This is a debugging function that is not optimized for speed.
#'
#' The return value is a data frame with six columns:
#'
#' * index: integer the index in `x` with the unhandled sequence
#' * start: integer the start position of the sequence (in characters)
#' * stop: integer the end of the sequence (in characters), but note that if
#'   there are multiple ESC sequences abutting each other they will all be
#'   treated as one, even if some of those sequences are valid.
#' * error: the reason why the sequence was not handled:
#'     * exceed-term-cap: contains color codes not supported by the terminal
#'       (see [term_cap_test]).  Bright colors with color codes in the 90-97 and
#'       100-107 range in terminals that do not support them are not considered
#'       errors, whereas 256 or truecolor codes in terminals that do not support
#'       them are.  This is because the latter are often misinterpreted by
#'       terminals that do not support them, whereas the former are typically
#'       silently ignored.
#'     * special: SGR substring contains uncommon characters in ":<=>".
#'     * unknown: SGR substring with a value that does not correspond to a known
#'       SGR code.
#'     * non-SGR: a non-SGR CSI sequence.
#'     * non-CSI: a non-CSI escape sequence, i.e. one where the ESC is
#'       followed by something other than "[".  Since we assume all non-CSI
#'       sequences are only 2 characters long including the ESC, this type of
#'       sequence is the most likely to cause problems as many are not actually
#'       two characters long.
#'     * malformed-CSI: a malformed CSI sequence.
#'     * malformed-ESC: a malformed ESC sequence (i.e. one not ending in
#'       0x40-0x7e).
#'     * C0: a "C0" control character (e.g. tab, bell, etc.).
#'     * malformed-UTF8: an invalid UTF-8 byte sequence.
#' * translated: whether the string was translated to UTF-8; this might be
#'   helpful in odd cases where character offsets change depending on encoding.
#'   You should
#'   only worry about this if you cannot tie out the `start`/`stop` values to
#'   the escape sequence shown.
#' * esc: character the unhandled escape sequence
#'
#' @note Non-ASCII strings are converted to UTF-8 encoding.
#' @export
#' @seealso [fansi] for details on how _Control Sequences_ are
#'   interpreted, particularly if you are getting unexpected results.
#' @param x character vector
#' @inheritParams substr_ctl
#' @return data frame with as many rows as there are unhandled escape
#'   sequences and columns containing useful information for debugging the
#'   problem.  See details.
#'
#' @examples
#' string <- c(
#'   "\033[41mhello world\033[m", "foo\033[22>m", "\033[999mbar",
#'   "baz \033[31#3m", "a\033[31k", "hello\033m world"
#' )
#' unhandled_ctl(string)
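#'
#' ## A 256 color SGR sequence should only be reported (as
#' ## "exceed-term-cap") when `term.cap` does not include "256"
#' unhandled_ctl("\033[38;5;214mhello\033[m", term.cap="bright")
#' unhandled_ctl("\033[38;5;214mhello\033[m", term.cap=c("bright", "256"))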

unhandled_ctl <- function(x, term.cap=getOption('fansi.term.cap')) {
  if(!is.character(term.cap))
    stop("Argument `term.cap` must be character.")
  if(anyNA(term.cap.int <- match(term.cap, VALID.TERM.CAP)))
    stop(
      "Argument `term.cap` may only contain values in ",
      deparse(VALID.TERM.CAP)
    )
  res <- .Call(FANSI_unhandled_esc, enc2utf8(x), term.cap.int)
  names(res) <- c("index", "start", "stop", "error", "translated", "esc")
  errors <- c(
    'unknown', 'special', 'exceed-term-cap', 'non-SGR', 'malformed-CSI',
    'non-CSI', 'malformed-ESC', 'C0', 'malformed-UTF8'
  )
  res[['error']] <- errors[res[['error']]]
  as.data.frame(res, stringsAsFactors=FALSE)
}

fansi/R/substr2.R0000755000176200001440000003134013604507326013314 0ustar  liggesusers## Copyright (C) 2020  Brodie Gaslam
##
## This file is part of "fansi - ANSI Control Sequence Aware String Functions"
##
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## Go to  for a copy of the license.

#' ANSI Control Sequence Aware Version of substr
#'
#' `substr_ctl` is a drop-in replacement for `substr`, albeit slightly slower.
#' ANSI CSI SGR sequences will be included in
#' the substrings to reflect the format of the substring when it was embedded in
#' the source string.  Additionally, other _Control Sequences_ specified in
#' `ctl` are treated as zero-width.
#'
#' `substr2_ctl` and `substr2_sgr` add the ability to retrieve substrings based
#' on display width in addition to the normal character width.
#' `substr2_ctl` also provides the option to convert tabs to spaces with
#' [tabs_as_spaces] prior to taking substrings.
#'
#' Because exact substrings on anything other than character width cannot be
#' guaranteed (e.g. as a result of multi-byte encodings, or double display-width
#' characters) `substr2_ctl` must make assumptions on how to resolve provided
#' `start`/`stop` values that are infeasible and does so via the `round`
#' parameter.
#'
#' If we use "start" as the `round` value, then any time the `start`
#' value corresponds to the middle of a multi-byte or a wide character, then
#' that character is included in the substring, while any similar partially
#' included character via the `stop` is left out.  The converse is true if we
#' use "stop" as the `round` value.  "neither" would cause all partial
#' characters to be dropped irrespective of whether they correspond to `start`
#' or `stop`, and "both" would cause all of them to be included.
#'
#' These functions map string lengths accounting for ANSI CSI SGR sequence
#' semantics to the naive length calculations, and then use the mapping in
#' conjunction with [base::substr()] to extract the string.  This concept is
#' borrowed directly from Gábor Csárdi's `crayon` package, although the
#' implementation of the calculation is different.
#'
#' @section _ctl vs. _sgr:
#'
#' The `*_ctl` versions of the functions treat all _Control Sequences_ specially
#' by default.  Special treatment is context dependent, and may include
#' detecting them and/or computing their display/character width as zero.  For
#' the SGR subset of the ANSI CSI sequences, `fansi` will also parse, interpret,
#' and reapply the text styles they encode if needed.  You can modify whether a
#' _Control Sequence_ is treated specially with the `ctl` parameter.  You can
#' exclude a type of _Control Sequence_ from special treatment by combining
#' "all" with that type of sequence (e.g. `ctl=c("all", "nl")` for special
#' treatment of all _Control Sequences_ **but** newlines).  The `*_sgr` versions
#' only treat ANSI CSI SGR sequences specially, and are equivalent to the
#' `*_ctl` versions with the `ctl` parameter set to "sgr".
#'
#' @note Non-ASCII strings are converted to and returned in UTF-8 encoding.
#' @inheritParams base::substr
#' @export
#' @seealso [fansi] for details on how _Control Sequences_ are
#'   interpreted, particularly if you are getting unexpected results.
#' @param x a character vector or object that can be coerced to character.
#' @param type character(1L) partial matching `c("chars", "width")`, although
#'   `type="width"` only works correctly with R >= 3.2.2.
#' @param round character(1L) partial matching
#'   `c("start", "stop", "both", "neither")`, controls how to resolve
#'   ambiguities when a `start` or `stop` value in "width" `type` mode falls
#'   within a multi-byte character or a wide display character.  See details.
#' @param tabs.as.spaces FALSE (default) or TRUE, whether to convert tabs to
#'   spaces.  This can only be set to TRUE if `strip.spaces` is FALSE.
#' @param tab.stops integer(1:n) indicating position of tab stops to use
#'   when converting tabs to spaces.  If there are more tabs in a line than
#'   defined tab stops the last tab stop is re-used.  For the purposes of
#'   applying tab stops, each input line is considered a line and the character
#'   count begins from the beginning of the input line.
#' @param ctl character, which _Control Sequences_ should be treated
#'   specially. See the "_ctl vs. _sgr" section for details.
#'
#'   * "nl": newlines.
#'   * "c0": all other "C0" control characters (i.e. 0x01-0x1f, 0x7F), except
#'     for newlines and the actual ESC (0x1B) character.
#'   * "sgr": ANSI CSI SGR sequences.
#'   * "csi": all non-SGR ANSI CSI sequences.
#'   * "esc": all other escape sequences.
#'   * "all": all of the above, except when used in combination with any of the
#'     above, in which case it means "all but".
#' @param warn TRUE (default) or FALSE, whether to warn when potentially
#'   problematic _Control Sequences_ are encountered.  These could cause the
#'   assumptions `fansi` makes about how strings are rendered on your display
#'   to be incorrect, for example by moving the cursor (see [fansi]).
#' @param term.cap character, a vector of the capabilities of the terminal,
#'   can be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
#'   starting with "38;5" or "48;5"), and "truecolor" (SGR codes starting with
#'   "38;2" or "48;2"). Changing this parameter changes how `fansi` interprets
#'   escape sequences, so you should ensure that it matches your terminal
#'   capabilities. See [term_cap_test] for details.
#' @examples
#' substr_ctl("\033[42mhello\033[m world", 1, 9)
#' substr_ctl("\033[42mhello\033[m world", 3, 9)
#'
#' ## Width 2 and 3 are in the middle of an ideogram as
#' ## start and stop positions respectively, so we control
#' ## what we get with `round`
#'
#' cn.string <- paste0("\033[42m", "\u4E00\u4E01\u4E03", "\033[m")
#'
#' substr2_ctl(cn.string, 2, 3, type='width')
#' substr2_ctl(cn.string, 2, 3, type='width', round='both')
#' substr2_ctl(cn.string, 2, 3, type='width', round='start')
#' substr2_ctl(cn.string, 2, 3, type='width', round='stop')
#'
#' ## the _sgr variants only treat CSI SGR sequences as special;
#' ## compare the following:
#'
#' substr_sgr("\033[31mhello\tworld", 1, 6)
#' substr_ctl("\033[31mhello\tworld", 1, 6)
#' substr_ctl("\033[31mhello\tworld", 1, 6, ctl=c('all', 'c0'))
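#'
#' ## `start` and `stop` are recycled to the length of `x`, as with
#' ## `substr`; the active SGR state should carry over into each substring
#' substr_ctl(c("\033[42mhello\033[m", "world"), 2, c(3, 4))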

substr_ctl <- function(
  x, start, stop,
  warn=getOption('fansi.warn'),
  term.cap=getOption('fansi.term.cap'),
  ctl='all'
)
  substr2_ctl(
    x=x, start=start, stop=stop, warn=warn, term.cap=term.cap, ctl=ctl
  )

#' @rdname substr_ctl
#' @export

substr2_ctl <- function(
  x, start, stop, type='chars', round='start',
  tabs.as.spaces=getOption('fansi.tabs.as.spaces'),
  tab.stops=getOption('fansi.tab.stops'),
  warn=getOption('fansi.warn'),
  term.cap=getOption('fansi.term.cap'),
  ctl='all'
) {
  if(!is.character(x)) x <- as.character(x)
  x <- enc2utf8(x)
  if(any(Encoding(x) == "bytes"))
    stop("BYTE encoded strings are not supported.")

  if(!is.logical(tabs.as.spaces)) tabs.as.spaces <- as.logical(tabs.as.spaces)
  if(length(tabs.as.spaces) != 1L || is.na(tabs.as.spaces))
    stop("Argument `tabs.as.spaces` must be TRUE or FALSE.")
  if(!is.numeric(tab.stops) || !length(tab.stops) || any(tab.stops < 1))
    stop("Argument `tab.stops` must be numeric and strictly positive")

  if(!is.logical(warn)) warn <- as.logical(warn)
  if(length(warn) != 1L || is.na(warn))
    stop("Argument `warn` must be TRUE or FALSE.")

  if(!is.character(term.cap))
    stop("Argument `term.cap` must be character.")
  if(anyNA(term.cap.int <- match(term.cap, VALID.TERM.CAP)))
    stop(
      "Argument `term.cap` may only contain values in ",
      deparse(VALID.TERM.CAP)
    )
  if(!is.character(ctl))
    stop("Argument `ctl` must be character.")
  ctl.int <- integer()
  if(length(ctl)) {
    # duplicate values in `ctl` are okay, so save a call to `unique` here
    if(anyNA(ctl.int <- match(ctl, VALID.CTL)))
      stop(
        "Argument `ctl` may contain only values in `",
        deparse(VALID.CTL), "`"
      )
  }

  valid.round <- c('start', 'stop', 'both', 'neither')
  if(
    !is.character(round) || length(round) != 1 ||
    is.na(round.int <- pmatch(round, valid.round))
  )
    stop("Argument `round` must partial match one of ", deparse(valid.round))

  round <- valid.round[round.int]

  valid.types <- c('chars', 'width')
  if(
    !is.character(type) || length(type) != 1 ||
    is.na(type.int <- pmatch(type, valid.types))
  )
    stop("Argument `type` must partial match one of ", deparse(valid.types))

  type.m <- type.int - 1L
  x.len <- length(x)

  # Silently recycle start/stop like substr does

  start <- rep(as.integer(start), length.out=x.len)
  stop <- rep(as.integer(stop), length.out=x.len)
  start[start < 1L] <- 1L

  res <- x
  no.na <- !(is.na(x) | is.na(start) | is.na(stop))

  res[no.na] <- substr_ctl_internal(
    x[no.na], start=start[no.na], stop=stop[no.na],
    type.int=type.m,
    tabs.as.spaces=tabs.as.spaces, tab.stops=tab.stops, warn=warn,
    term.cap.int=term.cap.int,
    round.start=round == 'start' || round == 'both',
    round.stop=round == 'stop' || round == 'both',
    x.len=sum(no.na),
    ctl.int=ctl.int
  )
  res[!no.na] <- NA_character_
  res
}
#' @rdname substr_ctl
#' @export

substr_sgr <- function(
  x, start, stop,
  warn=getOption('fansi.warn'),
  term.cap=getOption('fansi.term.cap')
)
  substr2_ctl(
    x=x, start=start, stop=stop, warn=warn, term.cap=term.cap, ctl='sgr'
  )

#' @rdname substr_ctl
#' @export

substr2_sgr <- function(
  x, start, stop, type='chars', round='start',
  tabs.as.spaces=getOption('fansi.tabs.as.spaces'),
  tab.stops=getOption('fansi.tab.stops'),
  warn=getOption('fansi.warn'),
  term.cap=getOption('fansi.term.cap')
)
  substr2_ctl(
    x=x, start=start, stop=stop, type=type, round=round,
    tabs.as.spaces=tabs.as.spaces,
    tab.stops=tab.stops, warn=warn, term.cap=term.cap, ctl='sgr'
  )

## Lower overhead version of the function for use by strwrap
##
## @x must already have been converted to UTF8
## @param type.int is supposed to be the matched version of type, minus 1

substr_ctl_internal <- function(
  x, start, stop, type.int, round, tabs.as.spaces,
  tab.stops, warn, term.cap.int, round.start, round.stop,
  x.len, ctl.int
) {
  # For each unique string, compute the state at each start and stop position
  # and re-map the positions to "ansi" space

  if(tabs.as.spaces)
    x <- .Call(FANSI_tabs_as_spaces, x, tab.stops, warn, term.cap.int, ctl.int)

  res <- character(x.len)
  s.s.valid <- stop >= start & stop

  x.scalar <- length(x) == 1
  x.u <- if(x.scalar) x else unique_chr(x)

  for(u in x.u) {
    elems <- which(x == u & s.s.valid)
    elems.len <- length(elems)
    e.start <- start[elems]
    e.stop <- stop[elems]
    x.elems <- if(x.scalar) rep(x, length.out=elems.len) else x[elems]

    # note, for expediency we're currently assuming that there is no overlap
    # between starts and stops

    e.order <- forder(c(e.start, e.stop))

    e.lag <- rep(c(round.start, round.stop), each=elems.len)[e.order]
    e.ends <- rep(c(FALSE, TRUE), each=elems.len)[e.order]
    e.sort <- c(e.start, e.stop)[e.order]

    state <- .Call(
      FANSI_state_at_pos_ext,
      u, e.sort - 1L, type.int,
      e.lag, e.ends,
      warn, term.cap.int,
      ctl.int
    )
    # Recover the matching values for e.sort

    e.unsort.idx <- match(seq_along(e.order), e.order)
    start.stop.ansi.idx <- .Call(FANSI_cleave, e.unsort.idx)
    start.ansi.idx <- start.stop.ansi.idx[[1L]]
    stop.ansi.idx <- start.stop.ansi.idx[[2L]]

    # And use those to substr with

    start.ansi <- state[[2]][3, start.ansi.idx]
    stop.ansi <- state[[2]][3, stop.ansi.idx]
    start.tag <- state[[1]][start.ansi.idx]
    stop.tag <- state[[1]][stop.ansi.idx]

    # if there is any ANSI CSI at end then add a terminating CSI

    end.csi <- character(length(start.tag))
    end.csi[nzchar(stop.tag)] <- '\033[0m'

    res[elems] <- paste0(
      start.tag, substr(x.elems, start.ansi, stop.ansi), end.csi
    )
  }
  res
}

## Need to expose this so we can test bad UTF8 handling, because substr
## behaves differently with bad UTF8 before and after R 3.6.0

state_at_pos <- function(x, starts, ends, warn=getOption('fansi.warn')) {
  is.start <- c(rep(TRUE, length(starts)), rep(FALSE, length(ends)))
  .Call(
    FANSI_state_at_pos_ext,
    x, as.integer(c(starts, ends)) - 1L,
    0L,      # character type
    is.start,  # lags
    !is.start, # ends
    warn,
    seq_along(VALID.TERM.CAP),
    seq_along(VALID.CTL)
  )
}
fansi/NEWS.md0000755000176200001440000000671713604510134012463 0ustar  liggesusers# fansi Release Notes

## v0.4.1

* Correctly define/declare global symbols as per WRE 1.6.4.1 (h/t Professor
  Ripley, Joshua Ulrich for example fixes).
* [#59](https://github.com/brodieG/fansi/issues/59): Provide a `split.nl` option
  to `set_knit_hooks` to mitigate white space issues when using blackfriday for
  the markdown->html conversion (@krlmlr).

## v0.4.0

* Systematized which control sequences are handled specially by adding the `ctl`
  parameter to most functions.  Some functions such as `strip_ctl` had existing
  parameters that did the same thing (e.g. `strip`, or `which`), and those have
  been deprecated in favor of `ctl`.  While technically this is a change in the
  API, it is backwards compatible (addresses
  [#56](https://github.com/brodieG/fansi/issues/56) among other things).
* Added `*_sgr` version of most `*_ctl` functions.
* `nzchar_ctl` gains the `ctl` parameter.
* [#57](https://github.com/brodieG/fansi/issues/57): Correctly detect when CSI
  sequences are not actually SGR (previously would apply styles from some
  non-SGR CSI sequences).
* [#55](https://github.com/brodieG/fansi/issues/55): `strsplit_ctl` can now work
  with `ctl` parameters containing escape sequences provided those sequences
  are excluded from special treatment by the `ctl` parameter.
* [#54](https://github.com/brodieG/fansi/issues/54): fix `sgr_to_html` so that
  it can handle vector elements with un-terminated SGR sequences (@krlmlr).
* Fix bug in width computation of first line onwards in `strwrap_ctl` when
  indent/exdent/prefix/initial widths vary from first to second line.
* Fix wrapping in `strwrap2_*(..., strip.spaces=FALSE)`, including a bug when
  `wrap.always=TRUE` and a line started in a word-whitespace boundary.
* Add `term.cap` parameter to `unhandled_ctl`.

## v0.3.0

* `fansi::set_knit_hooks` makes it easy to automatically convert ANSI CSI SGR
  sequences to HTML in Rmarkdown documents.  We also add a vignette that
  demonstrates how to do this.
* [#53](https://github.com/brodieG/fansi/issues/53): fix for systems where
  'char' is signed (found and fixed by @QuLogic).
* [#52](https://github.com/brodieG/fansi/issues/52): fix bad compilation under
  ICC (@kazumits).
* [#51](https://github.com/brodieG/fansi/issues/51): documentation improvements
  (@krlmlr).
* [#50](https://github.com/brodieG/fansi/issues/50): run tests on R 3.1 - 3.4
  for the rc branch only (@krlmlr).
* [#48](https://github.com/brodieG/fansi/issues/48): malformed call to error
  in FANSI_check_enc (@msannell).
* [#47](https://github.com/brodieG/fansi/issues/47): compatibility with R
  versions 3.2.0 and 3.2.1 (@andreadega).

## v0.2.3

* [#45](https://github.com/brodieG/fansi/issues/45): add capability to run under
  R 3.1 (h/t [hadley](https://github.com/hadley), [Gábor
  Csárdi](https://github.com/gaborcsardi)).
* [#44](https://github.com/brodieG/fansi/issues/44): include bright color
  support in HTML conversion (h/t [Will Landau](https://github.com/wlandau)).

Other minor fixes ([#43](https://github.com/brodieG/fansi/issues/43), [#46](https://github.com/brodieG/fansi/issues/46)).

## v0.2.2

* Remove valgrind uninitialized string errors by avoiding `strsplit`.
* Reduce R dependency to >= 3.2.x (@gaborcsardi).
* Update tests to handle potential change in `substr` behavior starting with
  R-3.6.

## v0.2.1

* All string inputs are now encoded to UTF-8, not just those that are used in
  width calculations.
* UTF-8 tests skipped on Solaris.

## v0.2.0

* Add `strsplit_ctl`.

## v0.1.0

Initial release.


fansi/MD50000644000176200001440000001070213605457511011671 0ustar  liggesusersede3e24a533391b95ea19af6b3cbe1f3 *DESCRIPTION
5db9714c4ce2a800bcfddcab4a34b8c7 *NAMESPACE
71da6e3a61229e25b0aca4828dda2527 *NEWS.md
f5bac8eb5e3143152a95fb1974e27e96 *R/constants.R
e8560d231177c5c9c920d26a25bac6d7 *R/fansi-package.R
a25510d9ba4406a85a869b659c40f9f8 *R/has.R
ed74d2b53979ceb05542044cf69dd858 *R/internal.R
c113acfdc4935877b75f24128b787f32 *R/load.R
dada871fb611a714543775a211be4a95 *R/misc.R
bcc388ad63e69b71f98e256753a1e64c *R/nchar.R
1d166f5af133ca00b117cfcc2e776bf7 *R/strip.R
b791ccc6335d614c245b6e90b93a813d *R/strsplit.R
01bf095169746463e19c7e059dec5a98 *R/strtrim.R
f68bd496beff7cb21d614c9a4f721a45 *R/strwrap.R
5db21a2fe30dd8cb41ac1bbbf68e53d9 *R/substr2.R
c8fd43ba40bafe777a29090a9c0eb23a *R/tohtml.R
429153156855418278af2339453d2a00 *R/unhandled.R
6df7047de3f8da7631a9acf447d440f6 *README.md
fcd059056bfc76d47e5555b67701f631 *build/vignette.rds
7642f25e75f84d5811fefb105694ff48 *inst/doc/sgr-in-rmd.R
6b49432afbaf7938d27a7cbdc8e91fc6 *inst/doc/sgr-in-rmd.Rmd
19225e3362a781c84ea50f8f4b75664b *inst/doc/sgr-in-rmd.html
c18b8fac22c3814c1fd2e70a3b7875f1 *man/fansi.Rd
09b9afefad88c42caa18f6bdeff4be2c *man/fansi_lines.Rd
4d575c0e8523b729e43f118d434128c4 *man/has_ctl.Rd
3f4c061aea5ad9430e8e91d2ea2baa6c *man/html_code_block.Rd
d0ec8efaaafbe9bcbe983bd39742e649 *man/html_esc.Rd
51a9e27c46af97c5b21af3b46af5cd12 *man/nchar_ctl.Rd
3b9ab75a1e76dd56de28fb974d9a0db2 *man/set_knit_hooks.Rd
32a72fd3bc4ddb29c6b6b2e9f4c04748 *man/sgr_to_html.Rd
0d2dbe2373ce784b7f8b9ca384626a9d *man/strip_ctl.Rd
663b151cf6fc3811ec3d502754125079 *man/strsplit_ctl.Rd
c59aeb37fce5cc2e42c0cf04149c32f3 *man/strtrim_ctl.Rd
bc18a0fbcb1ae8aa13367f9426f28f3f *man/strwrap_ctl.Rd
35948fed25f555c6af4bee8e898dcb49 *man/substr_ctl.Rd
c53dcb763de80106e7e1e9d0382cb851 *man/tabs_as_spaces.Rd
2c68420d9bb7964ffe190b74957ccd93 *man/term_cap_test.Rd
34983f79e8003db003896a8db9cec9ee *man/unhandled_ctl.Rd
f9fed5e7cc9f558e3702cb9523a669ff *src/assumptions.c
6a35b497944d85d2fb24b648e3f77b27 *src/fansi.h
81f10e347c616209b21367fa1afa8fe5 *src/has.c
90312470e4c9a15aaec362ef501997b6 *src/init.c
276da7846816b9db3da00e71c55a6276 *src/nchar.c
be626ebd069a14e404ae840b900b25ce *src/read.c
5305a78d0b5ff588493dda7ffe1c6b7d *src/rnchar.c
48212470130b41d7527ddb44de68a96d *src/state.c
e8ccf59d413a74cdff787ef4986c9db3 *src/strip.c
3f91106a41105a34deebbe3297ca7cfc *src/strsplit.c
750236c29c136c5df550a239bd93c0ae *src/tabs.c
f7a835cf043e80a0448e6f7eead243cc *src/tohtml.c
9e49c3d7230601b57e032679b16cb1af *src/unhandled.c
bba6705198389ff4b77e7771709da4ec *src/unique.c
7eb929712e27a27ed3adfc13a9c038f7 *src/utf8.c
6c41a9478fa5c0295b863936b39054be *src/utils.c
b3de96ec14e57148a9f374a5de4d121d *src/wrap.c
e97721543d482e495b3a7bfdd1ce3422 *tests/run.R
f7a9e4a8d4f338c79b1c07b4ede3ad71 *tests/unitizer/_pre/funs.R
db34e4d2f8850486f5f06d96a1766bdf *tests/unitizer/_pre/lorem.R
a75d0df3c8bac48c6069dfe78774570e *tests/unitizer/_pre/lorem.data/lorem.cn.phrases.RDS
a8eaee7d7eb3d494b11a0d0e5564f962 *tests/unitizer/_pre/strings.R
5d41a6743f5764debbf3c0025d1a84d2 *tests/unitizer/has.R
ec43ad63bd6f6762e0a1ae0a6b4bd2ef *tests/unitizer/has.unitizer/data.rds
c1d25f7da9c443f8ab7b9ed431618c82 *tests/unitizer/misc.R
1849ecdeb97f90ee85676c71266d3997 *tests/unitizer/misc.unitizer/data.rds
c24d67d4d27c6f52f4f1971487f5f85c *tests/unitizer/nchar.R
3a6fd914f9721fab841d0b098ca64a27 *tests/unitizer/nchar.unitizer/data.rds
714ff1a124270cdbcf4aa744099d4bca *tests/unitizer/overflow.R
036bdf683ea5097b1838ad8edd78497a *tests/unitizer/overflow.unitizer/data.rds
428a02558545fbdf9f467a70ac921d5c *tests/unitizer/strip.R
dc95336a5c94b77f03185ca18c101d8e *tests/unitizer/strip.unitizer/data.rds
5c484005be88a3bc8656171617a92407 *tests/unitizer/strsplit.R
10a37bf395c759fbb237f31d1d7a2404 *tests/unitizer/strsplit.unitizer/data.rds
e94fbb025e462d9944fd8e7a72a2e437 *tests/unitizer/substr.R
d06deccfa73c3da4906fce17d9df4ab0 *tests/unitizer/substr.unitizer/data.rds
ab07f3010b93715d0fa5672fdf12310f *tests/unitizer/tabs.R
5a9a9461121f50834cfee6df408c7512 *tests/unitizer/tabs.unitizer/data.rds
31eb6c5ed82603c41768df8af6c5e3c2 *tests/unitizer/tohtml.R
1e42955372df2ccff39582833de82ba2 *tests/unitizer/tohtml.unitizer/data.rds
725282744a077df93334f08d451c9b1d *tests/unitizer/utf8.R
54a0c91503e2a9f0edbb78b9a44a3c4f *tests/unitizer/utf8.unitizer/data.rds
7c84e85c44b5f5432173ec07634465fe *tests/unitizer/wrap.R
e6237d3b4ba0395aa707042c2707e5bb *tests/unitizer/wrap.unitizer/data.rds
6b49432afbaf7938d27a7cbdc8e91fc6 *vignettes/sgr-in-rmd.Rmd
b35b7d6227ab84772153a7c06e2f8dcc *vignettes/styles.css
fansi/inst/0000755000176200001440000000000013604512140012323 5ustar  liggesusersfansi/inst/doc/0000755000176200001440000000000013604512140013070 5ustar  liggesusersfansi/inst/doc/sgr-in-rmd.Rmd0000755000176200001440000001035013604507326015527 0ustar  liggesusers---
title: "ANSI CSI SGR Sequences in Rmarkdown"
author: "Brodie Gaslam"
output:
    rmarkdown::html_vignette:
        css: styles.css
mathjax: local
vignette: >
  %\VignetteIndexEntry{ANSI CSI SGR Sequences in Rmarkdown}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r echo=FALSE}
library(fansi)
knitr::knit_hooks$set(document=function(x, options) gsub("\033", "\uFFFD", x))
```

### Browsers Do Not Interpret ANSI CSI SGR Sequences

Over the past few years color has been gaining traction in the R terminal,
particularly since Gábor Csárdi's [crayon](https://github.com/r-lib/crayon)
made it easy to format text with [ANSI CSI SGR
sequences](https://en.wikipedia.org/wiki/ANSI_escape_code).  At the
same time the advent of JJ Allaire's and Yihui Xie's `rmarkdown` and `knitr`
packages, along with John MacFarlane's `pandoc`, made it easy to automatically
incorporate R code and output in HTML documents.

Unfortunately, ANSI CSI SGR sequences are not recognized by web browsers and end
up rendering weirdly<sup>1</sup>:

```{r}
sgr.string <- c(
  "\033[43;34mday > night\033[0m",
  "\033[44;33mdawn < dusk\033[0m"
)
writeLines(sgr.string)
```

### Automatically Convert ANSI CSI SGR to HTML

`fansi` provides the `sgr_to_html` function which converts the ANSI CSI SGR
sequences into HTML markup.  When we combine it with `knitr::knit_hooks` we can
modify the rendering of the `rmarkdown` document such that ANSI CSI SGR
encoding is shown in the equivalent HTML.

`fansi::set_knit_hooks` is a convenience function that does just this.  You
should call it in an `rmarkdown` document with:

  * Chunk option `results` set to "asis".
  * Chunk option `comment` set to "" (empty string).
  * The `knitr::knit_hooks` object as an argument.

The corresponding `rmarkdown` chunk should look as follows:

````
```{r, comment="", results="asis"}`r ''`
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks)
```
````

```{r comment="", results="asis", echo=FALSE}
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks)
```

We run this function for its side effects, which cause the output to be
displayed as intended:

```{r}
writeLines(sgr.string)
```

If you are seeing extra line breaks in your output, you may need to use:

````
```{r, comment="", results="asis"}`r ''`
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks, split.nl=TRUE)
```
````

If you use `crayon` to generate your ANSI CSI SGR style strings you may need to
set `options(crayon.enabled=TRUE)`, as in some cases `crayon` suppresses the SGR
markup if it thinks it is not outputting to a terminal.
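You can check whether `crayon` actually emitted SGR markup with `fansi::has_sgr`, which reports whether a string contains SGR sequences. `strip_sgr` removes those sequences. This sketch assumes the `crayon` package is installed, and `crayon::blue` is just an arbitrary example style. Whether `has_sgr` returns `TRUE` still depends on `crayon`'s own color detection.

```r
library(fansi)

# Ask crayon to emit ANSI CSI SGR sequences even when it does not think it
# is writing to a terminal (e.g. while knitting); keep the old value
old.opt <- options(crayon.enabled = TRUE)

styled <- crayon::blue("dawn < dusk")

# TRUE if the string carries SGR markup for set_knit_hooks to convert
has_sgr(styled)

# Removing any SGR sequences recovers the plain text either way
strip_sgr(styled)

# Restore the previous crayon setting
options(old.opt)
```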

We can also set hooks for the other output types (warnings, errors, and
messages), and add some additional CSS styles:

````
```{r, comment="", results="asis"}`r ''`
styles <- c(
  getOption("fansi.style"),  # default style
  "PRE.fansi CODE {background-color: transparent;}",
  "PRE.fansi-error {background-color: #DDAAAA;}",
  "PRE.fansi-warning {background-color: #DDDDAA;}",
  "PRE.fansi-message {background-color: #AAAADD;}"
)
old.hooks <- c(
  old.hooks,
  fansi::set_knit_hooks(
    knitr::knit_hooks,
    which=c("warning", "error", "message"),
    style=styles
) )
```
````
```{r comment="", results="asis", echo=FALSE}
styles <- c(
  getOption("fansi.style"),  # default style
  "PRE.fansi CODE {background-color: transparent;}",
  "PRE.fansi-error {background-color: #DDAAAA;}",
  "PRE.fansi-warning {background-color: #DDDDAA;}",
  "PRE.fansi-message {background-color: #AAAADD;}"
)
old.hooks <- c(
  old.hooks,
  fansi::set_knit_hooks(
    knitr::knit_hooks,
    which=c("warning", "error", "message"),
    style=styles
) )
```
```{r error=TRUE}
message(paste0(sgr.string, collapse="\n"))
warning(paste0(c("", sgr.string), collapse="\n"))
stop(paste0(c("", sgr.string), collapse="\n"))
```

You can restore the old hooks at any time in your document with:

```{r}
do.call(knitr::knit_hooks$set, old.hooks)
writeLines(sgr.string)
```

See `?fansi::set_knit_hooks` for details.


----
^1^ For illustrative purposes we output raw ANSI
CSI SGR sequences in this document.  However, because the ESC control character
causes problems with some HTML rendering services, we replace it with the �
symbol.  Depending on the browser and rendering process, ESC would normally
either not be visible at all or be replaced with some other symbol.
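A `document` hook, set in a hidden chunk at the top of this vignette, performs the substitution described in this note. Applied to a single string, it amounts to the following base R sketch, shown only to make the replacement concrete:

```r
# ESC is character 033 (octal); replace every occurrence with U+FFFD,
# the Unicode replacement character, as the vignette's document hook does
raw <- "\033[43;34mday > night\033[0m"
shown <- gsub("\033", "\uFFFD", raw)

# The sequence parameters remain visible, but the ESC characters are gone
shown
```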

## ----echo=FALSE----------------------------------------------------------
library(fansi)
knitr::knit_hooks$set(document=function(x, options) gsub("\033", "\uFFFD", x))

## ------------------------------------------------------------------------
sgr.string <- c(
  "\033[43;34mday > night\033[0m",
  "\033[44;33mdawn < dusk\033[0m"
)
writeLines(sgr.string)

## ----comment="", results="asis", echo=FALSE------------------------------
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks)

## ------------------------------------------------------------------------
writeLines(sgr.string)

## ----comment="", results="asis", echo=FALSE------------------------------
styles <- c(
  getOption("fansi.style"),  # default style
  "PRE.fansi CODE {background-color: transparent;}",
  "PRE.fansi-error {background-color: #DDAAAA;}",
  "PRE.fansi-warning {background-color: #DDDDAA;}",
  "PRE.fansi-message {background-color: #AAAADD;}"
)
old.hooks <- c(
  old.hooks,
  fansi::set_knit_hooks(
    knitr::knit_hooks,
    which=c("warning", "error", "message"),
    style=styles
) )

## ----error=TRUE----------------------------------------------------------
message(paste0(sgr.string, collapse="\n"))
warning(paste0(c("", sgr.string), collapse="\n"))
stop(paste0(c("", sgr.string), collapse="\n"))

## ------------------------------------------------------------------------
do.call(knitr::knit_hooks$set, old.hooks)
writeLines(sgr.string)

ANSI CSI SGR Sequences in Rmarkdown

Brodie Gaslam

Browsers Do Not Interpret ANSI CSI SGR Sequences

Over the past few years color has been gaining traction in the R terminal, particularly since Gábor Csárdi’s crayon made it easy to format text with ANSI CSI SGR sequences. At the same time the advent of JJ Allaire’s and Yihui Xie’s rmarkdown and knitr packages, along with John MacFarlane’s pandoc, made it easy to automatically incorporate R code and output in HTML documents.

Unfortunately, web browsers do not recognize ANSI CSI SGR sequences, so the sequences show up as stray characters around the text¹:

## �[43;34mday > night�[0m
## �[44;33mdawn < dusk�[0m

Automatically Convert ANSI CSI SGR to HTML

fansi provides the sgr_to_html function, which converts ANSI CSI SGR sequences into HTML markup. Combined with knitr::knit_hooks, it lets us modify how the rmarkdown document is rendered so that ANSI CSI SGR formatting is displayed as the equivalent HTML.

fansi::set_knit_hooks is a convenience function that does just this. You should call it in an rmarkdown document with:

  • The chunk option results set to “asis”.
  • The chunk option comment set to “” (empty string).
  • The knitr::knit_hooks object as an argument.

The corresponding rmarkdown chunk should look as follows:

```{r, comment="", results="asis"}
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks)
```

We run this function for its side effects, which cause the output to be displayed as intended:

## day > night
## dawn < dusk

If you are seeing extra line breaks in your output, you may need to use:

```{r, comment="", results="asis"}
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks, split.nl=TRUE)
```

If you use crayon to generate your ANSI CSI SGR style strings you may need to set options(crayon.enabled=TRUE), as in some cases crayon suppresses the SGR markup if it thinks it is not outputting to a terminal.

We can also set hooks for the other output types (warnings, errors, and messages), and add some additional CSS styles:

```{r, comment="", results="asis"}
styles <- c(
  getOption("fansi.style"),  # default style
  "PRE.fansi CODE {background-color: transparent;}",
  "PRE.fansi-error {background-color: #DDAAAA;}",
  "PRE.fansi-warning {background-color: #DDDDAA;}",
  "PRE.fansi-message {background-color: #AAAADD;}"
)
old.hooks <- c(
  old.hooks,
  fansi::set_knit_hooks(
    knitr::knit_hooks,
    which=c("warning", "error", "message"),
    style=styles
) )
```
## day > night
## dawn < dusk
## Warning: 
## day > night
## dawn < dusk
## Error in eval(expr, envir, enclos): 
## day > night
## dawn < dusk

You can restore the old hooks at any time in your document with:

## �[43;34mday > night�[0m
## �[44;33mdawn < dusk�[0m

See ?fansi::set_knit_hooks for details.


¹ For illustrative purposes we output raw ANSI CSI SGR sequences in this document. However, because the ESC control character causes problems with some HTML rendering services, we replace it with the � symbol. Depending on the browser and rendering process, ESC would normally either not be visible at all or be replaced with some other symbol.