spelling/0000755000176200001440000000000014571530062012072 5ustar liggesusersspelling/NAMESPACE0000644000176200001440000000040014571343557013316 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method(print,summary_spellcheck) export(get_wordlist) export(spell_check_files) export(spell_check_package) export(spell_check_setup) export(spell_check_test) export(spell_check_text) export(update_wordlist) spelling/LICENSE0000644000176200001440000000005114571343557013106 0ustar liggesusersYEAR: 2017 COPYRIGHT HOLDER: Jeroen Ooms spelling/man/0000755000176200001440000000000014571343557012660 5ustar liggesusersspelling/man/spell_check_files.Rd0000644000176200001440000000302314571343557016603 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/check-files.R \name{spell_check_files} \alias{spell_check_files} \alias{spell_check_text} \title{Spell Check} \usage{ spell_check_files(path, ignore = character(), lang = "en_US") spell_check_text(text, ignore = character(), lang = "en_US") } \arguments{ \item{path}{path to file(s) to spell check} \item{ignore}{character vector with words which will be added to the \link[hunspell:hunspell]{hunspell::dictionary}} \item{lang}{set \code{Language} field in \code{DESCRIPTION} e.g. \code{"en-US"} or \code{"en-GB"}. For supporting other languages, see the \href{https://docs.ropensci.org/hunspell/articles/intro.html#hunspell-dictionaries}{hunspell vignette}.} \item{text}{character vector with plain text} } \description{ Perform a spell check on document files or plain text. } \details{ This function parses a file based on the file extension, and checks only text fields while ignoring code chunks and meta data. It works particularly well for markdown, but latex, html, xml, pdf, and plain text are also supported. For more information about the underlying spelling engine, see the \href{https://docs.ropensci.org/hunspell/articles/intro.html#hunspell-dictionaries}{hunspell package}. } \examples{ # Example files files <- list.files(system.file("examples", package = "knitr"), pattern = "\\\\.(Rnw|Rmd|html)$", full.names = TRUE) spell_check_files(files) } \seealso{ Other spelling: \code{\link{spell_check_package}()}, \code{\link{wordlist}} } \concept{spelling} spelling/man/wordlist.Rd0000644000176200001440000000263314571343557015022 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/wordlist.R \name{wordlist} \alias{wordlist} \alias{update_wordlist} \alias{get_wordlist} \title{The WORDLIST file} \usage{ update_wordlist(pkg = ".", vignettes = TRUE, confirm = TRUE) get_wordlist(pkg = ".") } \arguments{ \item{pkg}{path to package root directory containing the \code{DESCRIPTION} file} \item{vignettes}{check all \code{rmd} and \code{rnw} files in the pkg root directory (e.g. \code{readme.md}) and package \code{vignettes} folder.} \item{confirm}{show changes and ask confirmation before adding new words to the list} } \description{ The package wordlist file is used to allow custom words which will be added to the dictionary when spell checking. It is stored in \code{inst/WORDLIST} in the source package and must contain one word per line in UTF-8 encoded text. } \details{ The \link{update_wordlist} function runs a full spell check on a package, shows the results, and then prompts to add the found words to the package wordlist. Obviously you should check closely that these are legitimate words and not actual spelling errors.
It also removes words from the wordlist that no longer appear as spelling errors, either because they have been removed from the documentation or added to the \code{lang} dictionary. } \seealso{ Other spelling: \code{\link{spell_check_files}()}, \code{\link{spell_check_package}()} } \concept{spelling} spelling/man/spell_check_package.Rd0000644000176200001440000000464114571343557017103 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/spell-check.R \name{spell_check_package} \alias{spell_check_package} \alias{spelling} \alias{spell_check_setup} \alias{spell_check_test} \title{Package Spell Checking} \usage{ spell_check_package(pkg = ".", vignettes = TRUE, use_wordlist = TRUE) spell_check_setup(pkg = ".", vignettes = TRUE, lang = "en-US", error = FALSE) } \arguments{ \item{pkg}{path to package root directory containing the \code{DESCRIPTION} file} \item{vignettes}{check all \code{rmd} and \code{rnw} files in the pkg root directory (e.g. \code{readme.md}) and package \code{vignettes} folder.} \item{use_wordlist}{ignore words in the package \link[=get_wordlist]{WORDLIST} file} \item{lang}{set \code{Language} field in \code{DESCRIPTION} e.g. \code{"en-US"} or \code{"en-GB"}. For supporting other languages, see the \href{https://docs.ropensci.org/hunspell/articles/intro.html#hunspell-dictionaries}{hunspell vignette}.} \item{error}{should \verb{CMD check} fail if spelling errors are found? The default only prints results.} } \description{ Automatically spell-check package description, documentation, and vignettes. } \details{ Parses and checks R manual pages, rmd/rnw vignettes, and text fields in the package \code{DESCRIPTION} file. The preferred spelling language (typically \code{en-GB} or \code{en-US}) should be specified in the \code{Language} field from your package \code{DESCRIPTION}. To allow custom words, use the package \link[=get_wordlist]{WORDLIST} file which will be added to the dictionary when spell checking. See \link{update_wordlist} to automatically populate and update this file. The \link{spell_check_setup} function adds a unit test to your package which automatically runs a spell check on documentation and vignettes during \verb{R CMD check} if the environment variable \code{NOT_CRAN} is set to \code{TRUE}. By default this unit test never fails; it merely prints potential spelling errors to the console. If not already done, the \link{spell_check_setup} function will add \code{spelling} as a \code{Suggests} dependency, and a \code{Language} field to \code{DESCRIPTION}. Hunspell includes dictionaries for \code{en_US} and \code{en_GB} by default. Other languages require installation of a custom dictionary, see \link[hunspell:hunspell]{hunspell} for details. } \seealso{ Other spelling: \code{\link{spell_check_files}()}, \code{\link{wordlist}} } \concept{spelling} spelling/DESCRIPTION0000644000176200001440000000241514571530062013602 0ustar liggesusersPackage: spelling Title: Tools for Spell Checking in R Version: 2.3.0 Authors@R: c( person("Jeroen", "Ooms", , "jeroen@berkeley.edu", role = c("cre", "aut"), comment = c(ORCID = "0000-0002-4035-0289")), person("Jim", "Hester", , "james.hester@rstudio.com", role = "aut")) Description: Spell checking common document formats including latex, markdown, manual pages, and description files. Includes utilities to automate checking of documentation and vignettes as a unit test during 'R CMD check'. Both British and American English are supported out of the box and other languages can be added.
In addition, packages may define a 'wordlist' to allow custom terminology without producing false positives. License: MIT + file LICENSE Encoding: UTF-8 URL: https://ropensci.r-universe.dev/spelling https://docs.ropensci.org/spelling/ BugReports: https://github.com/ropensci/spelling/issues Imports: commonmark, xml2, hunspell (>= 3.0), knitr Suggests: pdftools RoxygenNote: 7.3.1 Language: en-GB NeedsCompilation: no Packaged: 2024-03-04 21:14:16 UTC; jeroen Author: Jeroen Ooms [cre, aut] (<https://orcid.org/0000-0002-4035-0289>), Jim Hester [aut] Maintainer: Jeroen Ooms <jeroen@berkeley.edu> Repository: CRAN Date/Publication: 2024-03-05 05:40:02 UTC spelling/tests/0000755000176200001440000000000014571343557013247 5ustar liggesusersspelling/tests/spelling.R0000644000176200001440000000007414571343557015210 0ustar liggesusersspelling::spell_check_test(vignettes = TRUE, error = FALSE) spelling/NEWS0000644000176200001440000000321614571404307012575 0ustar liggesusers2.3.0 - Support for spell checking Quarto files (@olivroy, #77) - You can now specify a global wordlist file via the SPELLING_WORDLIST envvar 2.2.1 - Fix for commonmark 1.7.0 change for auto-linked markdown URLs 2.2 - WORDLIST is now sorted with a locale-independent method, which avoids large diffs in version control due to developers using different locales, with different lexicographic ordering rules (#48, @bisaloo) - spell_check_package() now loads Rd macros (#42) 2.1 - Pre-filter script/style/img tags when checking html files because the huge embedded binary blobs produced by rmarkdown slow down the hunspell parser. - Treat input files in spell_check_files() as UTF-8 on all platforms - Fix a sorting bug in spell_check_files() 2.0 - spell_check_package() now also checks README.md and NEWS.md in the package root - Enforce latest hunspell and libhunspell, which include updated dictionaries - Treat all input as UTF-8.
Fixes some false positives on Windows - Ignore yaml front matter in markdown except for 'title', 'subtitle', and 'description' - Markdown: filter words that contain an '@' symbol (citation key or email address) - Properly parse authors@R field for ignore list (issue #2) - Use tools::file_ext instead of knitr:::file_ext 1.2 - Internally normalize the case of lang strings to lower_UPPER, e.g. en_US - Only run automatic check when 'spelling' is available and NOT_CRAN is set 1.1 - Breaking: Package spell-checker now uses language from DESCRIPTION - Require hunspell 2.9 dependency (better parsing and dictionaries) - Change default lang to 'en_US' 1.0 - Initial release spelling/R/0000755000176200001440000000000014571407475012306 5ustar liggesusersspelling/R/rmarkdown.R0000644000176200001440000000211414571343557014433 0ustar liggesusers# This is borrowed from the rmarkdown pkg partition_yaml_front_matter <- function (input_lines) { validate_front_matter <- function(delimiters) { if (length(delimiters) >= 2 && (delimiters[2] - delimiters[1] > 1) && grepl("^---\\s*$", input_lines[delimiters[1]])) { if (delimiters[1] == 1) TRUE else is_blank(input_lines[1:delimiters[1] - 1]) } else { FALSE } } delimiters <- grep("^(---|\\.\\.\\.)\\s*$", input_lines) if (validate_front_matter(delimiters)) { front_matter <- input_lines[(delimiters[1]):(delimiters[2])] input_body <- c() if (delimiters[1] > 1) input_body <- c(input_body, input_lines[1:delimiters[1] - 1]) if (delimiters[2] < length(input_lines)) input_body <- c(input_body, input_lines[-(1:delimiters[2])]) list(front_matter = front_matter, body = input_body) } else { list(front_matter = NULL, body = input_lines) } } is_blank <- function(x) { if (length(x)) all(grepl("^\\s*$", x)) else TRUE } spelling/R/check-files.R0000644000176200001440000001332714571343557014614 0ustar liggesusers#' Spell Check #' #' Perform a spell check on document files or plain text. #' #' This function parses a file based on the file extension, and checks only #' text fields while ignoring code chunks and meta data. It works particularly #' well for markdown, but latex, html, xml, pdf, and plain text are also #' supported. #' #' For more information about the underlying spelling engine, see the #' [hunspell package](https://docs.ropensci.org/hunspell/articles/intro.html#hunspell-dictionaries).
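#'
#' As a minimal illustration (a sketch, assuming the default `en_US` dictionary
#' is installed), a direct call on plain text might look like:
#'
#' ```r
#' # returns a data frame with one row per potentially misspelled word
#' spell_check_text("This sentense contains an error", lang = "en_US")
#' ```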
#' #' @rdname spell_check_files #' @family spelling #' @inheritParams spell_check_package #' @param path path to file(s) to spell check #' @param ignore character vector with words which will be added to the [hunspell::dictionary] #' @export #' @examples # Example files #' files <- list.files(system.file("examples", package = "knitr"), #' pattern = "\\.(Rnw|Rmd|html)$", full.names = TRUE) #' spell_check_files(files) spell_check_files <- function(path, ignore = character(), lang = "en_US"){ stopifnot(is.character(ignore)) lang <- normalize_lang(lang) dict <- hunspell::dictionary(lang, add_words = ignore) path <- sort(normalizePath(path, mustWork = TRUE)) lines <- lapply(path, spell_check_file_one, dict = dict) summarize_words(path, lines) } spell_check_file_one <- function(path, dict){ if(grepl("\\.r?q?md$",path, ignore.case = TRUE)) return(spell_check_file_md(path, dict = dict)) if(grepl("\\.rd$", path, ignore.case = TRUE)) return(spell_check_file_rd(path, dict = dict)) if(grepl("\\.(rnw|snw)$",path, ignore.case = TRUE)) return(spell_check_file_knitr(path = path, format = "latex", dict = dict)) if(grepl("\\.(tex)$",path, ignore.case = TRUE)) return(spell_check_file_plain(path = path, format = "latex", dict = dict)) if(grepl("\\.(html?)$", path, ignore.case = TRUE)){ try({ path <- pre_filter_html(path) }) return(spell_check_file_plain(path = path, format = "html", dict = dict)) } if(grepl("\\.(xml)$",path, ignore.case = TRUE)) return(spell_check_file_plain(path = path, format = "xml", dict = dict)) if(grepl("\\.(pdf)$",path, ignore.case = TRUE)) return(spell_check_file_pdf(path = path, format = "text", dict = dict)) return(spell_check_file_plain(path = path, format = "text", dict = dict)) } #' @rdname spell_check_files #' @export #' @param text character vector with plain text spell_check_text <- function(text, ignore = character(), lang = "en_US"){ stopifnot(is.character(ignore)) lang <- normalize_lang(lang) dict <- hunspell::dictionary(lang, add_words = ignore) bad_words <- hunspell::hunspell(text, dict = dict) words <- sort(unique(unlist(bad_words))) out <- data.frame(word = words, stringsAsFactors = FALSE) out$found <- lapply(words, function(word) { which(vapply(bad_words, `%in%`, x = word, logical(1))) }) out } spell_check_plain <- function(text, dict){ bad_words <- hunspell::hunspell(text, dict = dict) vapply(sort(unique(unlist(bad_words))), function(word) { line_numbers <- which(vapply(bad_words, `%in%`, x = word, logical(1))) paste(line_numbers, collapse = ",") }, character(1)) } spell_check_file_text <- function(file, dict){ spell_check_plain(readLines(file), dict = dict) } spell_check_description_text <- function(file, dict){ lines <- readLines(file) lines <- gsub("<[^>]*>", "", lines) # strip angle-bracketed autolinks like <https://...> (regex reconstructed; original pattern lost in extraction) spell_check_plain(lines, dict = dict) } spell_check_file_rd <- function(rdfile, macros = NULL, dict) { text <- if (!length(macros)) { tools::RdTextFilter(rdfile) } else { tools::RdTextFilter(rdfile, macros = macros) } Encoding(text) <- "UTF-8" spell_check_plain(text, dict = dict) } spell_check_file_md <- function(path, dict){ words <- parse_text_md(path) # Filter out citation keys, see https://github.com/ropensci/spelling/issues/9 words$text <- gsub("\\S*@\\S+", "", words$text, perl = TRUE) words$startline <- vapply(strsplit(words$position, ":", fixed = TRUE), `[[`, character(1), 1) bad_words <- hunspell::hunspell(words$text, dict = dict) vapply(sort(unique(unlist(bad_words))), function(word) { line_numbers <- which(vapply(bad_words, `%in%`, x = word, logical(1))) paste(words$startline[line_numbers],
collapse = ",") }, character(1)) } spell_check_file_knitr <- function(path, format, dict){ latex <- remove_chunks(path) words <- hunspell::hunspell_parse(latex, format = format, dict = dict) text <- vapply(words, paste, character(1), collapse = " ") spell_check_plain(text, dict = dict) } spell_check_file_plain <- function(path, format, dict){ lines <- readLines(path, warn = FALSE, encoding = 'UTF-8') words <- hunspell::hunspell_parse(lines, format = format, dict = dict) text <- vapply(words, paste, character(1), collapse = " ") spell_check_plain(text, dict = dict) } spell_check_file_pdf <- function(path, format, dict){ lines <- pdftools::pdf_text(path) words <- hunspell::hunspell_parse(lines, format = format, dict = dict) text <- vapply(words, paste, character(1), collapse = " ") spell_check_plain(text, dict = dict) } # TODO: this does not retain whitespace in DTD before the tag pre_filter_html <- function(path){ doc <- xml2::read_html(path, options = c("RECOVER", "NOERROR")) src_nodes <- xml2::xml_find_all(doc, ".//*[@src]") xml2::xml_set_attr(src_nodes, 'src', replace_text(xml2::xml_attr(src_nodes, 'src'))) script_nodes <- xml2::xml_find_all(doc, "(.//script|.//style)") xml2::xml_set_text(script_nodes, replace_text(xml2::xml_text(script_nodes))) tmp <- file.path(tempdir(), basename(path)) unlink(tmp) xml2::write_html(doc, tmp, options = 'format_whitespace') return(tmp) } # This replaces all text except for linebreaks. # Therefore line numbers in spelling output should be unaffected replace_text <- function(x){ gsub(".*", "", x, perl = TRUE) } spelling/R/parse-markdown.R0000644000176200001440000000455714571343557015376 0ustar liggesusers#' Text Parsers #' #' Parse text from various formats and return a data frame with text lines #' and position in the source document. #' #' @noRd #' @name parse_text #' @param path markdown file #' @param yaml_fields character vector indicating which fields of the yaml #' front matter should be spell checked. #' @param extensions render markdown extensions? 
Passed to [commonmark][commonmark::markdown_xml] parse_text_md <- function(path, extensions = TRUE, yaml_fields = c("title", "subtitle", "description")){ # Read file and remove yaml front matter text <- readLines(path, warn = FALSE, encoding = 'UTF-8') parts <- partition_yaml_front_matter(text) if(length(parts$front_matter)){ yaml_fields <- paste(yaml_fields, collapse = "|") has_field <- grepl(paste0("^\\s*(",yaml_fields, ")"), parts$front_matter, ignore.case = TRUE) text[which(!has_field)] <- "" } # Get markdown AST as xml doc md <- commonmark::markdown_xml(text, sourcepos = TRUE, extensions = extensions) doc <- xml2::xml_ns_strip(xml2::read_xml(md)) # Filter autolinked URLs from text nodes link_nodes <- xml2::xml_find_all(doc, "//link[@destination]") lapply(link_nodes, function(x){ dest <- xml2::xml_attr(x, 'destination') if(nchar(dest)){ node <- xml2::xml_find_first(x, "./text") xml2::xml_set_text(node, sub(dest, '', xml2::xml_text(node), fixed = TRUE)) } }) # Find text nodes and their location in the markdown source doc sourcepos_nodes <- xml2::xml_find_all(doc, "//*[@sourcepos][text]") sourcepos <- xml2::xml_attr(sourcepos_nodes, "sourcepos") values <- vapply(sourcepos_nodes, function(x) { paste0(collapse = "\n", xml2::xml_text(xml2::xml_find_all(x, "./text"))) }, character(1)) # Strip 'heading identifiers', see: https://pandoc.org/MANUAL.html#heading-identifiers values <- gsub('\\{#[^\\n]+\\}\\s*($|\\r?\\n)', '\\1', values, perl = TRUE) # Strip bookdown text references, see: https://bookdown.org/yihui/bookdown/markdown-extensions-by-bookdown.html#text-references values <- gsub("\\(ref:.*?\\)", "", values) # Quarto references start with @, for example @sec-, @fig- @eq- etc. https://quarto.org/docs/authoring/cross-reference-options.html # No special regex is necessary since all words containing @ are removed from # spell check #9 data.frame( text = values, position = sourcepos, stringsAsFactors = FALSE ) } spelling/R/wordlist.R0000644000176200001440000000554714571407475014303 0ustar liggesusers#' The WORDLIST file #' #' The package wordlist file is used to allow custom words which will be added to the #' dictionary when spell checking. It is stored in `inst/WORDLIST` in the source package #' and must contain one word per line in UTF-8 encoded text. #' #' The [update_wordlist] function runs a full spell check on a package, shows the results, #' and then prompts to add the found words to the package wordlist. Obviously you should #' check closely that these are legitimate words and not actual spelling errors. It also #' removes words from the wordlist that no longer appear as spelling errors, either because #' they have been removed from the documentation or added to the `lang` dictionary.
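#'
#' A typical interactive workflow might look like this (a sketch; not run
#' automatically, since `update_wordlist` prompts before changing anything):
#'
#' ```r
#' # run a full spell check and interactively update inst/WORDLIST
#' update_wordlist(pkg = ".", vignettes = TRUE)
#' # inspect the words currently allowed for this package
#' get_wordlist(pkg = ".")
#' ```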
#' #' @rdname wordlist #' @name wordlist #' @family spelling #' @export #' @param confirm show changes and ask confirmation before adding new words to the list #' @inheritParams spell_check_package update_wordlist <- function(pkg = ".", vignettes = TRUE, confirm = TRUE){ pkg <- as_package(pkg) wordfile <- get_wordfile(pkg$path) old_words <- sort(get_wordlist(pkg$path), method = "radix") new_words <- sort(spell_check_package(pkg$path, vignettes = vignettes, use_wordlist = FALSE)$word, method = "radix") if(isTRUE(all.equal(old_words, new_words))){ cat(sprintf("No changes required to %s\n", wordfile)) } else { words_added <- new_words[is.na(match(new_words, old_words))] words_removed <- old_words[is.na(match(old_words, new_words))] if(length(words_added)){ cat(sprintf("The following words will be added to the wordlist:\n%s\n", paste(" -", words_added, collapse = "\n"))) } if(length(words_removed)){ cat(sprintf("The following words will be removed from the wordlist:\n%s\n", paste(" -", words_removed, collapse = "\n"))) } if(isTRUE(confirm) && length(words_added)){ cat("Are you sure you want to update the wordlist?") if (utils::menu(c("Yes", "No")) != 1){ return(invisible()) } } # Save as UTF-8 dir.create(dirname(wordfile), showWarnings = FALSE) writeLines(enc2utf8(new_words), wordfile, useBytes = TRUE) cat(sprintf("Added %d and removed %d words in %s\n", length(words_added), length(words_removed), wordfile)) } } #' @rdname wordlist #' @export get_wordlist <- function(pkg = "."){ pkg <- as_package(pkg) wordfile <- get_wordfile(pkg$path) pkg_wordlist <- if(file.exists(wordfile)) read_wordfile(wordfile) global_wordfile <- Sys.getenv('SPELLING_WORDLIST') global_wordlist <- if(nchar(global_wordfile) && file.exists(global_wordfile)) read_wordfile(global_wordfile) as.character(c(pkg_wordlist, global_wordlist)) } get_wordfile <- function(path){ normalizePath(file.path(path, "inst/WORDLIST"), mustWork = FALSE) } read_wordfile <- function(wordfile){ unlist(strsplit(readLines(wordfile, warn = FALSE, encoding = "UTF-8"), " ", fixed = TRUE)) } spelling/R/spell-check.R0000644000176200001440000002225014571343557014624 0ustar liggesusers#' Package Spell Checking #' #' Automatically spell-check package description, documentation, and vignettes. #' #' Parses and checks R manual pages, rmd/rnw vignettes, and text fields in the #' package `DESCRIPTION` file. #' #' The preferred spelling language (typically `en-GB` or `en-US`) should be specified #' in the `Language` field from your package `DESCRIPTION`. To allow custom words, #' use the package [WORDLIST][get_wordlist] file which will be added to the dictionary #' when spell checking. See [update_wordlist] to automatically populate and update this #' file. #' #' The [spell_check_setup] function adds a unit test to your package which automatically #' runs a spell check on documentation and vignettes during `R CMD check` if the environment #' variable `NOT_CRAN` is set to `TRUE`. By default this unit test never fails; it merely #' prints potential spelling errors to the console. If not already done, #' the [spell_check_setup] function will add `spelling` as a `Suggests` dependency, #' and a `Language` field to `DESCRIPTION`. #' #' Hunspell includes dictionaries for `en_US` and `en_GB` by default. Other languages #' require installation of a custom dictionary, see [hunspell][hunspell::hunspell] for details. 
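#'
#' As a minimal sketch of typical usage (assuming the working directory is a
#' package source containing a valid `DESCRIPTION`):
#'
#' ```r
#' # print potential spelling errors in manuals, DESCRIPTION, and vignettes
#' spell_check_package(".")
#' # one-time setup: adds tests/spelling.R and a Language field
#' spell_check_setup(".", lang = "en-US")
#' ```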
#' #' @export #' @rdname spell_check_package #' @name spell_check_package #' @aliases spelling #' @family spelling #' @param pkg path to package root directory containing the `DESCRIPTION` file #' @param vignettes check all `rmd` and `rnw` files in the pkg root directory (e.g. #' `readme.md`) and package `vignettes` folder. #' @param use_wordlist ignore words in the package [WORDLIST][get_wordlist] file #' @param lang set `Language` field in `DESCRIPTION` e.g. `"en-US"` or `"en-GB"`. #' For supporting other languages, see the [hunspell vignette](https://docs.ropensci.org/hunspell/articles/intro.html#hunspell-dictionaries). spell_check_package <- function(pkg = ".", vignettes = TRUE, use_wordlist = TRUE){ # Get package info pkg <- as_package(pkg) # Get language from DESCRIPTION lang <- normalize_lang(pkg$language) # Add custom words to the ignore list add_words <- if(isTRUE(use_wordlist)) get_wordlist(pkg$path) author <- if(length(pkg[['authors@r']])){ parse_r_field(pkg[['authors@r']]) } else { strsplit(pkg[['author']], " ", fixed = TRUE)[[1]] } ignore <- unique(c(pkg$package, author, hunspell::en_stats, add_words)) # Create the hunspell dictionary object dict <- hunspell::dictionary(lang, add_words = sort(ignore)) # Check Rd manual files rd_files <- sort(list.files(file.path(pkg$path, "man"), "\\.rd$", ignore.case = TRUE, full.names = TRUE)) macros <- tools::loadRdMacros( file.path(R.home("share"), "Rd", "macros", "system.Rd"), tools::loadPkgRdMacros(pkg$path, macros = NULL) ) rd_lines <- lapply(rd_files, spell_check_file_rd, dict = dict, macros = macros) # Check 'DESCRIPTION' fields pkg_fields <- c("title", "description") pkg_lines <- lapply(pkg_fields, function(x){ spell_check_description_text(textConnection(pkg[[x]]), dict = dict) }) # Combine all_sources <- c(rd_files, pkg_fields) all_lines <- c(rd_lines, pkg_lines) if(isTRUE(vignettes)){ # Where to check for rmd/md files vign_files <- list.files(file.path(pkg$path, "vignettes"), pattern = "\\.q?r?md$", ignore.case = TRUE, full.names = TRUE, recursive = TRUE) root_files <- list.files(pkg$path, pattern = "(readme|news|changes|index).r?q?md", ignore.case = TRUE, full.names = TRUE) # Markdown vignettes md_files <- normalizePath(c(root_files, vign_files)) md_lines <- lapply(sort(md_files), spell_check_file_md, dict = dict) # Sweave vignettes rnw_files <- list.files(file.path(pkg$path, "vignettes"), pattern = "\\.[rs]nw$", ignore.case = TRUE, full.names = TRUE) rnw_lines <- lapply(sort(rnw_files), spell_check_file_knitr, format = "latex", dict = dict) # Combine all_sources <- c(all_sources, md_files, rnw_files) all_lines <- c(all_lines, md_lines, rnw_lines) } summarize_words(all_sources, all_lines) } as_package <- function(pkg){ if(inherits(pkg, 'package')) return(pkg) path <- pkg description <- if(file.exists(file.path(path, "DESCRIPTION.in"))){ file.path(path, "DESCRIPTION.in") } else { normalizePath(file.path(path, "DESCRIPTION"), mustWork = TRUE) } pkg <- read.dcf(description)[1,] Encoding(pkg) = "UTF-8" pkg <- as.list(pkg) names(pkg) <- tolower(names(pkg)) pkg$path <- dirname(description) structure(pkg, class = 'package') } # Find all occurrences for each word summarize_words <- function(file_names, found_line){ words_by_file <- lapply(found_line, names) bad_words <- sort(unique(unlist(words_by_file))) out <- data.frame( word = bad_words, stringsAsFactors = FALSE ) out$found <- lapply(bad_words, function(word) { index <- which(vapply(words_by_file, `%in%`, x = word, logical(1))) reports <- vapply(index, function(i){
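      # format each hit as "<file>:<line numbers>" for the printed summary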
paste0(basename(file_names[i]), ":", found_line[[i]][word]) }, character(1)) }) structure(out, class = c("summary_spellcheck", "data.frame")) } #' @export print.summary_spellcheck <- function(x, ...){ if(!nrow(x)){ cat("No spelling errors found.\n") return(invisible()) } words <- x$word fmt <- paste0("%-", max(nchar(words), 0) + 3, "s") pretty_names <- sprintf(fmt, words) cat(sprintf(fmt, " WORD"), " FOUND IN\n", sep = "") for(i in seq_len(nrow(x))){ cat(pretty_names[i]) cat(paste(x$found[[i]], collapse = paste0("\n", sprintf(fmt, "")))) cat("\n") } invisible(x) } #' @export #' @aliases spell_check_test #' @rdname spell_check_package #' @param error should `CMD check` fail if spelling errors are found? #' The default only prints results. spell_check_setup <- function(pkg = ".", vignettes = TRUE, lang = "en-US", error = FALSE){ # Get package info pkg <- as_package(pkg) lang <- normalize_lang(lang) pkg$language <- lang update_description(pkg, lang = lang) update_wordlist(pkg, vignettes = vignettes) dir.create(file.path(pkg$path, "tests"), showWarnings = FALSE) writeLines(sprintf("if(requireNamespace('spelling', quietly = TRUE)) spelling::spell_check_test(vignettes = %s, error = %s, skip_on_cran = TRUE)", deparse(vignettes), deparse(error)), file.path(pkg$path, "tests/spelling.R")) cat(sprintf("Updated %s\n", file.path(pkg$path, "tests/spelling.R"))) } #' @export spell_check_test <- function(vignettes = TRUE, error = FALSE, lang = NULL, skip_on_cran = TRUE){ if(isTRUE(skip_on_cran)){ not_cran <- Sys.getenv('NOT_CRAN') # See logic in tools:::config_val_to_logical if(is.na(match(tolower(not_cran), c("1", "yes", "true")))) return(NULL) } out_save <- readLines(system.file("templates/spelling.Rout.save", package = 'spelling')) code <- format_syntax(readLines("spelling.R")) out_save <- sub("@INPUT@", code, out_save, fixed = TRUE) writeLines(out_save, "spelling.Rout.save") # Try to find pkg source directory pkg_dir <- list.files("../00_pkg_src", full.names = TRUE) if(!length(pkg_dir)){ # This is where it is on e.g. win builder check_dir <- dirname(getwd()) if(grepl("\\.Rcheck$", check_dir)){ source_dir <- sub("\\.Rcheck$", "", check_dir) if(file.exists(source_dir)) pkg_dir <- source_dir } } if(!length(pkg_dir) && identical(basename(getwd()), 'tests')){ if(file.exists('../DESCRIPTION')){ pkg_dir <- dirname(getwd()) } } if(!length(pkg_dir)){ warning("Failed to find package source directory from: ", getwd()) return(invisible()) } results <- spell_check_package(pkg_dir, vignettes = vignettes) if(nrow(results)){ if(isTRUE(error)){ output <- sprintf("Potential spelling errors: %s\n", paste(results$word, collapse = ", ")) stop(output, "\n", "If these are false positives, run `spelling::update_wordlist()`.", call.
= FALSE) } else { cat("Potential spelling errors:\n") print(results) cat("If these are false positives, run `spelling::update_wordlist()`.") } } cat("All Done!\n") } update_description <- function(pkg, lang = NULL){ desc <- normalizePath(file.path(pkg$path, "DESCRIPTION"), mustWork = TRUE) lines <- readLines(desc, warn = FALSE) if(!any(grepl("spelling", c(pkg$package, pkg$suggests, pkg$imports, pkg$depends)))){ lines <- if(!any(grepl("^Suggests", lines))){ c(lines, "Suggests:\n spelling") } else { sub("^Suggests:", "Suggests:\n spelling,", lines) } } is_lang <- grepl("^Language:", lines, ignore.case = TRUE) isolang <- gsub("_", "-", lang, fixed = TRUE) if(any(is_lang)){ is_lang <- which(grepl("^Language:", lines)) lines[is_lang] <- paste("Language:", isolang) } else { message(sprintf("Adding 'Language: %s' to DESCRIPTION", isolang)) lines <- c(lines, paste("Language:", isolang)) } writeLines(lines, desc) } format_syntax <- function(txt){ pt <- getOption('prompt') ct <- getOption('continue') prefix <- c(pt, rep(ct, length(txt) - 1)) paste(prefix, txt, collapse = "\n", sep = "") } parse_r_field <- function(txt){ tryCatch({ info <- eval(parse(text = txt)) unlist(info, recursive = TRUE, use.names = FALSE) }, error = function(e){ NULL }) } spelling/R/language.R0000644000176200001440000000150414571343557014214 0ustar liggesusers# Very simple right now: # Convert dashes to underscore # Convert 'en' to 'en_US' # Convert e.g. 'de' to 'de_DE' normalize_lang <- function(lang = NULL){ if(!length(lang) || !nchar(lang)){ message("DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.") lang <- "en-US" } if(tolower(lang) == "en" || tolower(lang) == "eng"){ message("Found ambiguous language 'en'. Defaulting to 'en-US'") lang <- "en-US" } if(nchar(lang) == 2){ oldlang <- lang lang <- paste(tolower(lang), toupper(lang), sep = "_") message(sprintf("Found ambiguous language '%s'. Defaulting to '%s'", oldlang, lang)) } lang <- gsub("-", "_", lang, fixed = TRUE) parts <- strsplit(lang, "_", fixed = TRUE)[[1]] parts[1] <- tolower(parts[1]) parts[-1] <- toupper(parts[-1]) paste(parts, collapse = "_") } spelling/R/remove-chunks.R0000644000176200001440000000207114571343557015217 0ustar liggesusers# Adapted from lintr:::extract_r_source remove_chunks <- function(path) { path <- normalizePath(path, mustWork = TRUE) filename <- basename(path) lines <- readLines(path, encoding = 'UTF-8') pattern <- get_knitr_pattern(filename, lines) if (is.null(pattern$chunk.begin) || is.null(pattern$chunk.end)) { return(lines) } starts <- grep(pattern$chunk.begin, lines, perl = TRUE) ends <- grep(pattern$chunk.end, lines, perl = TRUE) # no chunks found, so just return the lines if (length(starts) == 0 || length(ends) == 0) { return(lines) } # Find first ending after a start seqs <- lapply(starts, function(start){ end <- sort(ends[ends > start])[1] if(!is.na(end)) seq(start, end) }) lines[unlist(seqs)] = "" return(lines) } detect_pattern <- function(...){ utils::getFromNamespace('detect_pattern', 'knitr')(...)
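  # getFromNamespace retrieves knitr's unexported helper without the R CMD check NOTE that a knitr::: call would trigger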
} get_knitr_pattern <- function(filename, lines) { pattern <- detect_pattern(lines, tolower(tools::file_ext(filename))) if (!is.null(pattern)) { knitr::all_patterns[[pattern]] } else { NULL } } spelling/MD50000644000176200001440000000152714571530062012407 0ustar liggesusersf4e164d8c82a19cb90bdb186a5550966 *DESCRIPTION 1ee0683cce6d3479250337954c075d63 *LICENSE 151c78c6510bc95a7ee0dda1c7324701 *NAMESPACE af62affba167a3b5ccf91e2d46b9b4e3 *NEWS d97fa6a646288361b62e8afa315b3966 *R/check-files.R fec788201862967ee85f4fa838596c41 *R/language.R f03b5be48b16af7ebe19085ec1875f79 *R/parse-markdown.R 84ab4c59719bd93da028ae5b0d6e7868 *R/remove-chunks.R 7169db7d9fded57a654275109db0d3f6 *R/rmarkdown.R 1fd708e3727ef8425772f61d57d7d386 *R/spell-check.R 3e101eb2ee108a98127e8378abad1ed8 *R/wordlist.R 80f21503dddc391bb76ee403b82031a2 *inst/WORDLIST 221eb7751a0c8a4ef7b3ec62a43b6f0c *inst/templates/spelling.Rout.save 47b05131eea8bff4ad875fa71742aa36 *man/spell_check_files.Rd 5afb3136c43e66c04570d60c1ee85da3 *man/spell_check_package.Rd 334eea235383fd8b9012052f3778c8d3 *man/wordlist.Rd bc882e5235bfdccc9e49f99290365b40 *tests/spelling.R spelling/inst/0000755000176200001440000000000014571347132013053 5ustar liggesusersspelling/inst/templates/0000755000176200001440000000000014571343557015060 5ustar liggesusersspelling/inst/templates/spelling.Rout.save0000644000176200001440000000131414571343557020504 0ustar liggesusers R version 3.4.1 (2017-06-30) -- "Single Candle" Copyright (C) 2017 The R Foundation for Statistical Computing Platform: x86_64-apple-darwin15.6.0 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. @INPUT@ All Done! > > proc.time() user system elapsed 0.372 0.039 0.408 spelling/inst/WORDLIST0000644000176200001440000000006314571347132014244 0ustar liggesusersAppVeyor CMD RStudio devtools hunspell pkg rmd rnw