forcats/ 0000755 0001762 0000144 00000000000 13432271102 011706 5 ustar ligges users forcats/inst/ 0000755 0001762 0000144 00000000000 13432266204 012672 5 ustar ligges users forcats/inst/doc/ 0000755 0001762 0000144 00000000000 13432266204 013437 5 ustar ligges users forcats/inst/doc/forcats.Rmd 0000644 0001762 0000144 00000012144 13432022227 015541 0 ustar ligges users ---
title: "Introduction to forcats"
author: "Emily Robinson"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Introduction to forcats}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The goal of the __forcats__ package is to provide a suite of useful tools that solve common problems with factors. Factors are useful when you have categorical data, variables that have a fixed and known set of values, and when you want to display character vectors in non-alphabetical order. If you want to learn more, the best place to start is the [chapter on factors](http://r4ds.had.co.nz/factors.html) in R for Data Science.
## Ordering by frequency
```{r message = FALSE}
library(dplyr)
library(ggplot2)
library(forcats)
```
Let's try answering the question, "what are the most common hair colors of star wars characters?" Let's start off by making a bar plot:
```{r initial-plot}
ggplot(starwars, aes(x = hair_color)) +
geom_bar() +
coord_flip()
```
That's okay, but it would be more helpful the graph was ordered by count. This is a case of an **unordered** categorical variable where we want it ordered by its frequency. To do so, we can use the function `fct_infreq()`:
```{r fct-infreq-hair}
ggplot(starwars, aes(x = fct_infreq(hair_color))) +
geom_bar() +
coord_flip()
```
Note that `fct_infreq()` it automatically puts NA at the top, even though that doesn't have the smallest number of entries.
## Combining levels
Let's take a look at skin color now:
```{r}
starwars %>%
count(skin_color, sort = TRUE)
```
We see that there's 31 different skin colors - if we want to make a plot this would be way too many to display! Let's reduce it to only be the top 5. We can use `fct_lump()` to "lump" all the infrequent colors into one factor, "other." The argument `n` is the number of levels we want to keep.
```{r}
starwars %>%
mutate(skin_color = fct_lump(skin_color, n = 5)) %>%
count(skin_color, sort = TRUE)
```
We could also have used `prop` instead, which keeps all the levels that appear at least `prop` of the time. For example, let's keep skin colors that at least 10% of the characters have:
```{r}
starwars %>%
mutate(skin_color = fct_lump(skin_color, prop = .1)) %>%
count(skin_color, sort = TRUE)
```
Only light and fair remain; everything else is other.
If you wanted to call it something than "other", you can change it with the argument `other_level`:
```{r}
starwars %>%
mutate(skin_color = fct_lump(skin_color, prop = .1, other_level = "extra")) %>%
count(skin_color, sort = TRUE)
```
What if we wanted to see if the average mass differed by eye color? We'll only look at the 6 most popular eye colors and remove `NA`s.
```{r fct-lump-mean}
avg_mass_eye_color <- starwars %>%
mutate(eye_color = fct_lump(eye_color, n = 6)) %>%
group_by(eye_color) %>%
summarise(mean_mass = mean(mass, na.rm = TRUE))
avg_mass_eye_color
```
## Ordering by another variable
It looks like people (or at least one person) with orange eyes are definitely heavier! If we wanted to make a graph, it would be nice if it was ordered by `mean_mass`. We can do this with `fct_reorder()`, which reorders one variable by another.
```{r fct-reorder}
avg_mass_eye_color %>%
mutate(eye_color = fct_reorder(eye_color, mean_mass)) %>%
ggplot(aes(x = eye_color, y = mean_mass)) +
geom_col()
```
## Manually reordering
Let's switch to using another dataset, `gss_cat`, the general social survey. What is the income distribution among the respondents?
```{r}
gss_cat %>%
count(rincome)
```
Notice that the income levels are in the correct order - they start with the non-answers and then go from highest to lowest. This is the same order you'd see if you plotted it as a bar chart. This is not a coincidence. When you're working with ordinal data, where there is an order, you can have an ordered factor. You can examine them with the base function `levels()`, which prints them in order:
```{r}
levels(gss_cat$rincome)
```
But what if your factor came in the wrong order? Let's simulate that by reordering the levels of `rincome` randomly with `fct_shuffle()`:
```{r}
reshuffled_income <- gss_cat$rincome %>%
fct_shuffle()
levels(reshuffled_income)
```
Now if we plotted it, it would show in this order, which is all over the place! How can we fix this and put it in the right order?
We can use the function `fct_relevel()` when we need to manually reorder our factor levels. In addition to the factor, you give it a character vector of level names, and specify where you want to move them. It defaults to moving them to the front, but you can move them after another level with the argument `after`. If you want to move it to the end, you set `after` equal to `Inf`.
For example, let's say we wanted to move `Lt $1000` and `$1000 to 2999` to the front. We would write:
```{r}
fct_relevel(reshuffled_income, c("Lt $1000", "$1000 to 2999")) %>%
levels()
```
What if we want to move them to the second and third place?
```{r}
fct_relevel(reshuffled_income, c("Lt $1000", "$1000 to 2999"), after = 1) %>%
levels()
```
forcats/inst/doc/forcats.R 0000644 0001762 0000144 00000004600 13432266204 015223 0 ustar ligges users ## ----setup, include = FALSE----------------------------------------------
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
## ----message = FALSE-----------------------------------------------------
library(dplyr)
library(ggplot2)
library(forcats)
## ----initial-plot--------------------------------------------------------
ggplot(starwars, aes(x = hair_color)) +
geom_bar() +
coord_flip()
## ----fct-infreq-hair-----------------------------------------------------
ggplot(starwars, aes(x = fct_infreq(hair_color))) +
geom_bar() +
coord_flip()
## ------------------------------------------------------------------------
starwars %>%
count(skin_color, sort = TRUE)
## ------------------------------------------------------------------------
starwars %>%
mutate(skin_color = fct_lump(skin_color, n = 5)) %>%
count(skin_color, sort = TRUE)
## ------------------------------------------------------------------------
starwars %>%
mutate(skin_color = fct_lump(skin_color, prop = .1)) %>%
count(skin_color, sort = TRUE)
## ------------------------------------------------------------------------
starwars %>%
mutate(skin_color = fct_lump(skin_color, prop = .1, other_level = "extra")) %>%
count(skin_color, sort = TRUE)
## ----fct-lump-mean-------------------------------------------------------
avg_mass_eye_color <- starwars %>%
mutate(eye_color = fct_lump(eye_color, n = 6)) %>%
group_by(eye_color) %>%
summarise(mean_mass = mean(mass, na.rm = TRUE))
avg_mass_eye_color
## ----fct-reorder---------------------------------------------------------
avg_mass_eye_color %>%
mutate(eye_color = fct_reorder(eye_color, mean_mass)) %>%
ggplot(aes(x = eye_color, y = mean_mass)) +
geom_col()
## ------------------------------------------------------------------------
gss_cat %>%
count(rincome)
## ------------------------------------------------------------------------
levels(gss_cat$rincome)
## ------------------------------------------------------------------------
reshuffled_income <- gss_cat$rincome %>%
fct_shuffle()
levels(reshuffled_income)
## ------------------------------------------------------------------------
fct_relevel(reshuffled_income, c("Lt $1000", "$1000 to 2999")) %>%
levels()
## ------------------------------------------------------------------------
fct_relevel(reshuffled_income, c("Lt $1000", "$1000 to 2999"), after = 1) %>%
levels()
forcats/inst/doc/forcats.html 0000644 0001762 0000144 00000262104 13432266204 015773 0 ustar ligges users
Introduction to forcats
Introduction to forcats
Emily Robinson
The goal of the forcats package is to provide a suite of useful tools that solve common problems with factors. Factors are useful when you have categorical data, variables that have a fixed and known set of values, and when you want to display character vectors in non-alphabetical order. If you want to learn more, the best place to start is the chapter on factors in R for Data Science.
Ordering by frequency
Let’s try answering the question, “what are the most common hair colors of star wars characters?” Let’s start off by making a bar plot:

That’s okay, but it would be more helpful the graph was ordered by count. This is a case of an unordered categorical variable where we want it ordered by its frequency. To do so, we can use the function fct_infreq()
:

Note that fct_infreq()
it automatically puts NA at the top, even though that doesn’t have the smallest number of entries.
Combining levels
Let’s take a look at skin color now:
We see that there’s 31 different skin colors - if we want to make a plot this would be way too many to display! Let’s reduce it to only be the top 5. We can use fct_lump()
to “lump” all the infrequent colors into one factor, “other.” The argument n
is the number of levels we want to keep.
We could also have used prop
instead, which keeps all the levels that appear at least prop
of the time. For example, let’s keep skin colors that at least 10% of the characters have:
Only light and fair remain; everything else is other.
If you wanted to call it something than “other”, you can change it with the argument other_level
:
What if we wanted to see if the average mass differed by eye color? We’ll only look at the 6 most popular eye colors and remove NA
s.
Ordering by another variable
It looks like people (or at least one person) with orange eyes are definitely heavier! If we wanted to make a graph, it would be nice if it was ordered by mean_mass
. We can do this with fct_reorder()
, which reorders one variable by another.

Manually reordering
Let’s switch to using another dataset, gss_cat
, the general social survey. What is the income distribution among the respondents?
Notice that the income levels are in the correct order - they start with the non-answers and then go from highest to lowest. This is the same order you’d see if you plotted it as a bar chart. This is not a coincidence. When you’re working with ordinal data, where there is an order, you can have an ordered factor. You can examine them with the base function levels()
, which prints them in order:
But what if your factor came in the wrong order? Let’s simulate that by reordering the levels of rincome
randomly with fct_shuffle()
:
Now if we plotted it, it would show in this order, which is all over the place! How can we fix this and put it in the right order?
We can use the function fct_relevel()
when we need to manually reorder our factor levels. In addition to the factor, you give it a character vector of level names, and specify where you want to move them. It defaults to moving them to the front, but you can move them after another level with the argument after
. If you want to move it to the end, you set after
equal to Inf
.
For example, let’s say we wanted to move Lt $1000
and $1000 to 2999
to the front. We would write:
What if we want to move them to the second and third place?
forcats/tests/ 0000755 0001762 0000144 00000000000 12752645041 013063 5 ustar ligges users forcats/tests/testthat.R 0000644 0001762 0000144 00000000072 12752645041 015045 0 ustar ligges users library(testthat)
library(forcats)
test_check("forcats")
forcats/tests/testthat/ 0000755 0001762 0000144 00000000000 13432271102 014710 5 ustar ligges users forcats/tests/testthat/test-utils.R 0000644 0001762 0000144 00000000422 13432022227 017150 0 ustar ligges users context("test-utils.R")
test_that("check_factor() fails when needed", {
expect_error(check_factor(NA), "factor")
})
test_that("check_factor_list() fails when needed", {
expect_error(check_factor_list(1), "list")
expect_error(check_factor_list(list(1)), "factor")
})
forcats/tests/testthat/helper-lump.R 0000644 0001762 0000144 00000000150 12754332005 017267 0 ustar ligges users lump_test <- function(x) {
paste(ifelse(in_smallest(x), "X", letters[seq_along(x)]), collapse = "")
}
forcats/tests/testthat/test-count.R 0000644 0001762 0000144 00000001240 13432022227 017137 0 ustar ligges users context("test-count.R")
test_that("0 count for empty levels", {
f <- factor(levels = c("a", "b"))
out <- fct_count(f)
expect_equal(out$n, c(0, 0))
})
test_that("counts NA even when not in levels", {
f <- factor(c("a", "a", NA))
out <- fct_count(f)
expect_equal(out$n, c(2, 1))
})
test_that("returns marginal table", {
f <- factor(c("a", "a", "b"))
out <- fct_count(f, prop = TRUE)
expect_equal(out$n, c(2, 1))
expect_equal(out$p, c(2, 1) / 3)
})
test_that("sort = TRUE brings most frequent values to top", {
f <- factor(c("a", "b", "b"))
out <- fct_count(f, sort = TRUE)
expect_equal(out$f, factor(c("b", "a"), levels = c("a", "b")))
})
forcats/tests/testthat/test-fct_c.R 0000644 0001762 0000144 00000000570 13241142210 017063 0 ustar ligges users context("fct_c")
test_that("uses tidy_dots", {
fs <- list(factor("a"), factor("b"))
fab <- factor(c("a", "b"))
expect_equal(fct_c(!!!fs), fab)
expect_equal(fct_c(fs[[1]], fs[[2]]), fab)
})
test_that("all inputs must be factors", {
expect_error(fct_c("a"), "must be factors")
})
test_that("empty input yields empty factor", {
expect_equal(fct_c(), factor())
})
forcats/tests/testthat/test-relevel.R 0000644 0001762 0000144 00000002003 13432022227 017443 0 ustar ligges users context("fct_relevel")
test_that("warns about unknown levels", {
f1 <- factor(c("a", "b"))
expect_warning(f2 <- fct_relevel(f1, "d"), "Unknown levels")
expect_equal(levels(f2), levels(f1))
})
test_that("moves supplied levels to front", {
f1 <- factor(c("a", "b", "c", "d"))
f2 <- fct_relevel(f1, "c", "b")
expect_equal(levels(f2), c("c", "b", "a", "d"))
})
test_that("can moves supplied levels to end", {
f1 <- factor(c("a", "b", "c", "d"))
f2 <- fct_relevel(f1, "a", "b", after = 2)
f3 <- fct_relevel(f1, "a", "b", after = Inf)
expect_equal(levels(f2), c("c", "d", "a", "b"))
expect_equal(levels(f3), c("c", "d", "a", "b"))
})
test_that("can relevel with function ", {
f1 <- fct_rev(factor(c("a", "b")))
f2a <- fct_relevel(f1, rev)
f2b <- fct_relevel(f1, ~ rev(.))
expect_equal(levels(f2a), c("a", "b"))
expect_equal(levels(f2b), c("a", "b"))
})
test_that("function must return character vector", {
f <- factor(c("a", "b"))
expect_error(fct_relevel(f, ~ 1), "character vector")
})
forcats/tests/testthat/test-fct_lump.R 0000644 0001762 0000144 00000011735 13432022227 017632 0 ustar ligges users context("fct_lump")
test_that("positive values keeps most commmon", {
f <- c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g")
expect_equal(levels(fct_lump(f, n = 1)), c("a", "Other"))
expect_equal(levels(fct_lump(f, n = 2)), c("a", "b", "Other"))
expect_equal(levels(fct_lump(f, prop = 0.25)), c("a", "Other"))
expect_equal(levels(fct_lump(f, prop = 0.15)), c("a", "b", "Other"))
})
test_that("ties are respected", {
f <- c("a", "a", "a", "b", "b", "b", "c", "d")
expect_equal(levels(fct_lump(f, 1)), c("a", "b", "Other"))
})
test_that("negative values drop most common" ,{
f <- c("a", "a", "a", "a", "b", "b", "b", "b", "c", "d")
expect_equal(levels(fct_lump(f, n = -1)), c("c", "d", "Other"))
expect_equal(levels(fct_lump(f, prop = -0.2)), c("c", "d", "Other"))
})
test_that("return original factor when all element satisfy n / p condition", {
f <- c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g")
expect_equal(levels(fct_lump(f, n = 4)), c("a", "b", "c", "d", "e", "f", "g"))
expect_equal(levels(fct_lump(f, n = 10)), c("a", "b", "c", "d", "e", "f", "g"))
expect_equal(levels(fct_lump(f, n = -10)), c("a", "b", "c", "d", "e", "f", "g"))
expect_equal(levels(fct_lump(f, prop = 0.01)), c("a", "b", "c", "d", "e", "f", "g"))
expect_equal(levels(fct_lump(f, prop = -1)), c("a", "b", "c", "d", "e", "f", "g"))
})
test_that("different behaviour when apply tie function", {
f <- c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g")
expect_equal(levels(fct_lump(f, n = 4, ties.method = "min")),
c("a", "b", "c", "d", "e", "f", "g"))
expect_equal(levels(fct_lump(f, n = 4, ties.method = "max")),
c("a", "b", "Other" ))
# Rank of c, d, e, f, g is (3+4+5+6+7)/5 = 5
expect_equal(levels(fct_lump(f, n = 4, ties.method = "average")),
c("a", "b", "Other" ))
expect_equal(levels(fct_lump(f, n = 5, ties.method = "average")),
c("a", "b", "c", "d", "e", "f", "g"))
expect_equal(levels(fct_lump(f, n = 4, ties.method = "first")),
c("a", "b", "c", "d", "Other"))
if (getRversion() >= "3.3.0") {
expect_equal(levels(fct_lump(f, n = 4, ties.method = "last")),
c("a", "b", "f", "g", "Other"))
}
})
test_that("NAs included in total", {
f <- factor(c("a", "a", "b", "c", rep(NA, 7)))
o1 <- fct_lump(f, prop = 0.10)
expect_equal(levels(o1), c("a", "Other"))
o2 <- fct_lump(f, w = rep(1, 11), prop = 0.10)
expect_equal(levels(o2), c("a", "Other"))
})
test_that("bad weights generate error messages", {
expect_error(fct_lump(letters, w = letters), "must be a numeric vector")
expect_error(fct_lump(letters, w = 1:10), "must be the same length")
expect_error(fct_lump(letters, w = rep(-1, 26)), "must be non-negative")
})
test_that("values are correctly weighted", {
f <- c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g")
w <- c( 0.2, 0.2, 0.6, 2, 2, 6, 4, 2, 2, 1)
f2 <- c(
"a",
rep("b", 4),
rep("c", 6),
rep("d", 4),
rep("e", 2),
rep("f", 2),
"g"
)
expect_equal(levels(fct_lump(f, w = w)), levels(fct_lump(f2)))
expect_equal(
levels(fct_lump(f, n = 1, w = w)),
levels(fct_lump(f2, n = 1))
)
expect_equal(
levels(fct_lump(f, n = -2, w = w, ties.method = "first")),
levels(fct_lump(f2, n = -2, ties.method = "first"))
)
expect_equal(
levels(fct_lump(f, n = 99, w = w)),
levels(fct_lump(f2, n = 99))
)
expect_equal(
levels(fct_lump(f, prop = 0.01, w = w)),
levels(fct_lump(f2, prop = 0.01))
)
expect_equal(
levels(fct_lump(f, prop = -0.25, w = w, ties.method = "max")),
levels(fct_lump(f2, prop = -0.25, ties.method = "max"))
)
})
test_that("do not change the label when no lumping occurs", {
f <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d")
expect_equal(levels(fct_lump(f, n = 3)), c("a", "b", "c", "d"))
expect_equal(levels(fct_lump(f, prop = 0.1)), c("a", "b", "c", "d"))
})
test_that("fct_lump_min works when not weighted", {
f <- c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g")
expect_equal(levels(fct_lump_min(f, min = 3)), c("a", "Other"))
expect_equal(levels(fct_lump_min(f, min = 2)), c("a", "b", "Other"))
})
test_that("fct_lump_min works when weighted", {
f <- c("a", "b", "c", "d", "e")
w <- c( 0.2, 2, 6, 4, 1)
expect_equal(levels(fct_lump_min(f, min = 6, w = w)), c("c", "Other"))
expect_equal(levels(fct_lump_min(f, min = 1.5, w = w)), c("b", "c", "d", "Other"))
})
# Default -----------------------------------------------------------------
test_that("lumps smallest", {
expect_equal(lump_test(c(1, 2, 3, 6)), "Xbcd")
expect_equal(lump_test(c(1, 2, 3, 7)), "XXXd")
expect_equal(lump_test(c(1, 2, 3, 7, 13)), "XXXde")
expect_equal(lump_test(c(1, 2, 3, 7, 14)), "XXXXe")
})
test_that("doesn't lump if none small enough", {
expect_equal(lump_test(c(2, 2, 4)), "abc")
})
test_that("order doesn't matter", {
expect_equal(lump_test(c(2, 2, 5)), "XXc")
expect_equal(lump_test(c(2, 5, 2)), "XbX")
expect_equal(lump_test(c(5, 2, 2)), "aXX")
})
forcats/tests/testthat/test-reorder.R 0000644 0001762 0000144 00000003025 13432022227 017454 0 ustar ligges users context("test-reorder.R")
test_that("can reorder by 2d summary", {
df <- tibble::tribble(
~g, ~x,
"a", 3,
"a", 3,
"b", 2,
"b", 2,
"b", 1
)
f1 <- fct_reorder(df$g, df$x)
expect_equal(levels(f1), c("b", "a"))
f2 <- fct_reorder(df$g, df$x, .desc = TRUE)
expect_equal(levels(f2), c("a", "b"))
})
test_that("can reorder by 2d summary", {
df <- tibble::tribble(
~g, ~x, ~y,
"a", 1, 10,
"a", 2, 5,
"b", 1, 5,
"b", 2, 10
)
f1 <- fct_reorder2(df$g, df$x, df$y)
expect_equal(levels(f1), c("b", "a"))
f2 <- fct_reorder(df$g, df$x, .desc = TRUE)
expect_equal(levels(f2), c("a", "b"))
})
test_that("complains if summary doesn't return single value", {
fun1 <- function(x, y) c(1, 2)
fun2 <- function(x, y) integer()
expect_error(fct_reorder("a", 1, fun1), "single value per group")
expect_error(fct_reorder("a", 1, fun2), "single value per group")
expect_error(fct_reorder2("a", 1, 2, fun1), "single value per group")
expect_error(fct_reorder2("a", 1, 2, fun2), "single value per group")
})
test_that("fct_infreq respects missing values", {
f <- factor(c("a", "b", "b", NA, NA, NA), exclude = FALSE)
expect_equal(levels(fct_infreq(f)), c(NA, "b", "a"))
})
test_that("fct_inseq sorts in numeric order", {
f <- factor(c("3", "1", "1", "2"))
expect_equal(levels(fct_inseq(f)), c("1", "2", "3"))
})
test_that("fct_inseq gives error for non-numeric levels", {
f <- factor(c("c", "a", "a", "b"))
expect_error(levels(fct_inseq(f)), "level must be coercible to numeric")
})
forcats/tests/testthat/test-as_factor.R 0000644 0001762 0000144 00000000547 13241142430 017756 0 ustar ligges users context("as_factor")
test_that("equivalent to fct_inorder", {
x <- c("a", "z", "g")
expect_equal(as_factor(x), fct_inorder(x))
})
test_that("leaves factors as is", {
f1 <- factor(letters)
f2 <- as_factor(f1)
expect_identical(f1, f2)
})
test_that("supports NA (#89)", {
x <- c("a", "z", "g", NA)
expect_equal(as_factor(x), fct_inorder(x))
})
forcats/tests/testthat/test-rev.R 0000644 0001762 0000144 00000000227 13241140440 016603 0 ustar ligges users context("test-rev.R")
test_that("reverses levels", {
f1 <- factor(c("a", "b", "a"))
f2 <- fct_rev(f1)
expect_equal(levels(f2), c("b", "a"))
})
forcats/tests/testthat/test-fct_relabel.R 0000644 0001762 0000144 00000002232 13413770514 020263 0 ustar ligges users context("fct_relabel")
test_that("identity", {
f1 <- factor(c("a", "b"))
expect_identical(fct_relabel(f1, identity), f1)
})
test_that("error if not function", {
f1 <- factor("a")
expect_error(fct_relabel(f1, 1), "function")
})
test_that("error if level not character", {
f1 <- factor("a")
expect_error(fct_relabel(f1, function(x) 1), "character")
})
test_that("error if level has different length", {
f1 <- factor(letters)
expect_error(fct_relabel(f1, function(x) x[-1]), "expected 26.*got 25")
})
test_that("total collapse", {
f1 <- factor(letters)
new_levels <- function(x) rep("1", length(x))
expect_identical(fct_relabel(f1, new_levels), factor(new_levels(letters)))
})
test_that("additional arguments", {
f1 <- factor(letters)
expect_identical(fct_relabel(f1, paste0, "."), factor(paste0(letters, ".")))
})
test_that("formulas are coerced to functions", {
f1 <- factor(letters)
expect_identical(
fct_relabel(f1, ~paste0(.x, ".")),
factor(paste0(letters, "."))
)
})
test_that("string input is coerced to a factor", {
expect_identical(
fct_relabel(LETTERS[1:2], .fun=function(x) x),
factor(LETTERS[1:2])
)
}) forcats/tests/testthat/test-collapse.R 0000644 0001762 0000144 00000001677 13432022227 017627 0 ustar ligges users context("test-collapse.R")
test_that("can collapse multiple values", {
f1 <- factor(letters[1:3])
f2 <- fct_collapse(f1, x = c("a", "b"), y = "c")
expect_equal(f2, factor(c("x", "x", "y")))
})
test_that("empty dots yields unchanged factor", {
f1 <- factor(letters[1:3])
f2 <- fct_collapse(f1)
expect_identical(f1, f2)
})
test_that("can collapse missing levels", {
f1 <- factor(c("x", NA), exclude = NULL)
f2 <- fct_collapse(f1, y = NA_character_)
expect_equal(f2, factor(c("x", "y")))
})
test_that("can collapse un-named levels to Other", {
f1 <- factor(letters[1:3])
f2 <- fct_collapse(f1, xy = c("a", "b"), group_other = TRUE)
expect_equal(f2, factor(c("xy", "xy", "Other"), levels = c("xy", "Other")))
})
test_that("does not automatically collapse unnamed levels to Other", {
f1 <- factor(letters[1:3])
f2 <- fct_collapse(f1, xy = c("a", "b"))
expect_equal(f2, factor(c("xy", "xy", "c"), levels = c("xy", "c")))
})
forcats/tests/testthat/test-fct_drop.R 0000644 0001762 0000144 00000001162 13240104364 017612 0 ustar ligges users context("fct_drop")
test_that("doesn't add NA level", {
f1 <- factor(c("a", NA), levels = c("a", "b"))
f2 <- fct_drop(f1)
expect_equal(levels(f2), "a")
})
test_that("can optionally restrict levels to drop", {
f1 <- factor("a", levels = c("a", "b", "c"))
expect_equal(levels(fct_drop(f1, only = "b")), c("a", "c"))
expect_equal(levels(fct_drop(f1, only = "a")), c("a", "b", "c"))
})
test_that("preserves ordered class and attributes", {
f1 <- ordered(letters[1:2], letters[1:3])
attr(f1, "x") <- "test"
f2 <- fct_drop(f1)
expect_s3_class(f2, "ordered")
expect_equal(attr(f2, "x"), attr(f1, "x"))
})
forcats/tests/testthat/test-shuffle.R 0000644 0001762 0000144 00000000265 13241140611 017445 0 ustar ligges users context("test-shuffle.R")
test_that("reproducibility shuffles", {
set.seed(1014)
f1 <- factor(c("a", "b"))
f2 <- fct_shuffle(f1)
expect_equal(levels(f2), c("a", "b"))
})
forcats/tests/testthat/test-shift.R 0000644 0001762 0000144 00000000603 13241140304 017121 0 ustar ligges users context("test-shift.R")
test_that("can shift in either direction", {
f1 <- factor(c("a", "b", "c"))
f2_l <- fct_shift(f1, 1)
expect_equal(levels(f2_l), c("b", "c", "a"))
f2_r <- fct_shift(f1, -1)
expect_equal(levels(f2_r), c("c", "a", "b"))
})
test_that("0 shift leaves unchanged", {
f1 <- factor(c("a", "b", "c"))
f2 <- fct_shift(f1, 0)
expect_identical(f1, f2)
})
forcats/tests/testthat/test-lvls_reorder.R 0000644 0001762 0000144 00000002146 13241142372 020521 0 ustar ligges users context("lvls_reorder")
test_that("changes levels, not values", {
f1 <- factor(c("a", "b"))
f2 <- factor(c("a", "b"), levels = c("b", "a"))
expect_equal(lvls_reorder(f1, 2:1), f2)
})
test_that("idx must be numeric", {
f <- factor(c("a", "b"))
expect_error(lvls_reorder(f, "a"), "must be numeric")
})
test_that("must have one integer per level", {
f <- factor(c("a", "b", "c"))
expect_error(lvls_reorder(f, c(1, 2)), "one integer for each level")
expect_error(lvls_reorder(f, c(1, 2, 2)), "one integer for each level")
expect_error(lvls_reorder(f, c(1, 2.5)), "one integer for each level")
})
test_that("can change ordered status of output", {
f1 <- factor(letters[1:3])
f2 <- ordered(f1)
expect_equal(is.ordered(lvls_reorder(f1, 1:3)), FALSE)
expect_equal(is.ordered(lvls_reorder(f1, 1:3, ordered = FALSE)), FALSE)
expect_equal(is.ordered(lvls_reorder(f1, 1:3, ordered = TRUE)), TRUE)
expect_equal(is.ordered(lvls_reorder(f2, 1:3)), TRUE)
expect_equal(is.ordered(lvls_reorder(f2, 1:3, ordered = FALSE)), FALSE)
expect_equal(is.ordered(lvls_reorder(f2, 1:3, ordered = TRUE)), TRUE)
})
forcats/tests/testthat/test-lvls.R 0000644 0001762 0000144 00000002466 13240077457 017016 0 ustar ligges users context("lvls")
# lvls_expand -------------------------------------------------------------
test_that("changes levels, not values", {
f1 <- factor(c("a"))
f2 <- factor(c("a"), levels = c("a", "b"))
expect_equal(lvls_expand(f1, c("a", "b")), f2)
})
test_that("must include all existing levels", {
f1 <- factor(c("a", "b"))
expect_error(lvls_expand(f1, c("a", "c")), "include all existing levels")
})
# lvls_revalue ------------------------------------------------------------
test_that("changes values and levels", {
f1 <- factor(c("a", "b"))
f2 <- factor(c("b", "a"), levels = c("b", "a"))
expect_equal(lvls_revalue(f1, c("b", "a")), f2)
})
test_that("can collapse values", {
f1 <- factor(c("a", "b"))
f2 <- factor(c("a", "a"))
expect_equal(lvls_revalue(f1, c("a", "a")), f2)
})
test_that("preserves missing values", {
f1 <- factor(c("a", NA), exclude = NULL)
f2 <- lvls_revalue(f1, levels(f1))
expect_equal(levels(f2), levels(f1))
})
test_that("`new_levels` must be a character", {
f1 <- factor(c("a", "b"))
expect_error(lvls_revalue(f1, 1:5), "character vector")
})
test_that("`new_levels` must be same length as existing levels", {
f1 <- factor(c("a", "b"))
expect_error(lvls_revalue(f1, c("a")), "same length")
expect_error(lvls_revalue(f1, c("a", "b", "c")), "same length")
})
forcats/tests/testthat/test-fct_other.R 0000644 0001762 0000144 00000001046 13241142307 017770 0 ustar ligges users context("fct_other")
test_that("keeps levels in keep", {
x1 <- factor(c("a", "b"))
x2 <- fct_other(x1, keep = "a")
expect_equal(levels(x2), c("a", "Other"))
})
test_that("drops levels in drop", {
x1 <- factor(c("a", "b"))
x2 <- fct_other(x1, drop = "a")
# other always placed at end
expect_equal(levels(x2), c("b", "Other"))
})
test_that("must supply exactly one of drop and keep", {
f <- factor(c("a", "b"))
expect_error(fct_other(f), "supply one of")
expect_error(fct_other(f, keep = "a", drop = "a"), "supply one of")
})
forcats/tests/testthat/test-explicit_na.R 0000644 0001762 0000144 00000000746 13241136610 020320 0 ustar ligges users context("test-explicit_na.R")
test_that("factor unchanged if no missing levels", {
f1 <- factor(letters[1:3])
f2 <- fct_explicit_na(f1)
expect_identical(f1, f2)
})
test_that("converts implicit NA", {
f1 <- factor(c("a", NA))
f2 <- fct_explicit_na(f1)
expect_equal(f2, fct_inorder(c("a", "(Missing)")))
})
test_that("converts explicit NA", {
f1 <- factor(c("a", NA), exclude = NULL)
f2 <- fct_explicit_na(f1)
expect_equal(f2, fct_inorder(c("a", "(Missing)")))
})
forcats/tests/testthat/test-match.R 0000644 0001762 0000144 00000000526 13432024645 017117 0 ustar ligges users context("test-fct_match.R")
test_that("equivalent to %in% when levels present", {
f <- factor(c("a", "b", "c", NA))
expect_equal(fct_match(f, "a"), f %in% "a")
expect_equal(fct_match(f, NA), f %in% NA)
})
test_that("error when levels are missing", {
f <- factor(c("a", "b", "c"))
expect_error(fct_match(f, "d"), "not present")
})
forcats/tests/testthat/test-fct_cross.R 0000644 0001762 0000144 00000002156 13432022227 020003 0 ustar ligges users context("fct_cross")
test_that("gives correct levels", {
fruit <- as_factor(c("apple", "kiwi", "apple", "apple"))
colour <- as_factor(c("green", "green", "red", "green"))
f2 <- fct_cross(fruit, colour)
expect_setequal(levels(f2), c("apple:green", "kiwi:green", "apple:red"))
})
test_that("keeps empty levels when requested", {
fruit <- as_factor(c("apple", "kiwi", "apple", "apple"))
colour <- as_factor(c("green", "green", "red", "green"))
f2 <- fct_cross(fruit, colour, keep_empty = TRUE)
expect_setequal(levels(f2), c("apple:green", "kiwi:green", "apple:red", "kiwi:red"))
})
test_that("gives NA output on NA input", {
fruit <- as_factor(c("apple", "kiwi", "apple", "apple"))
colour <- as_factor(c("green", "green", "red", "green"))
fruit[1] <- NA
f2 <- fct_cross(fruit, colour)
expect_true(is.na(f2[1]))
})
test_that("gives NA output on NA input, when keeping empty levels", {
fruit <- as_factor(c("apple", "kiwi", "apple", "apple"))
colour <- as_factor(c("green", "green", "red", "green"))
fruit[1] <- NA
f2 <- fct_cross(fruit, colour, keep_empty = TRUE)
expect_true(is.na(f2[1]))
})
forcats/tests/testthat/test-fct_recode.R 0000644 0001762 0000144 00000002135 13031476264 020121 0 ustar ligges users context("fct_recode")
test_that("warns about unknown levels", {
f1 <- factor(c("a", "b"))
expect_warning(f2 <- fct_recode(f1, d = "e"), "Unknown levels")
expect_equal(levels(f2), levels(f1))
})
test_that("can collapse levels", {
f1 <- factor(c("a1", "a2", "b1", "b2"))
f2 <- factor(c("a", "a", "b", "b"))
expect_equal(fct_recode(f1, a = "a1", a = "a2", b = "b1", b = "b2"), f2)
})
test_that("can recode multiple levels to NA", {
f1 <- factor(c("a1", "empty", "a2", "b", "missing"))
f2 <- factor(c("a", NA, "a", "b", NA))
expect_equal(fct_recode(f1, NULL = "missing", NULL = "empty", a = "a1", a = "a2"), f2)
})
test_that("can just remove levels", {
f1 <- factor(c("a", "missing"))
f2 <- factor(c("a", NA))
expect_equal(fct_recode(f1, NULL = "missing"), f2)
})
# check_recode_levels -----------------------------------------------------
test_that("new levels must be character", {
expect_error(check_recode_levels(a = 1), "Problems at positions: 1")
})
test_that("new levels must be length 1", {
expect_error(check_recode_levels(a = c("a", "b")), "Problems at positions: 1")
})
forcats/tests/testthat/test-anon.R 0000644 0001762 0000144 00000000472 13241137443 016755 0 ustar ligges users context("test-anon.R")
test_that("new levels are padded numerics", {
f1 <- factor(letters[1:10])
f2 <- fct_anon(f1)
expect_equal(levels(f2), sprintf("%02d", 1:10))
})
test_that("prefix added to start of level", {
f1 <- factor("x")
f2 <- fct_anon(f1, prefix = "X")
expect_equal(levels(f2), "X1")
})
forcats/NAMESPACE 0000644 0001762 0000144 00000001414 13432024645 013135 0 ustar ligges users # Generated by roxygen2: do not edit by hand
S3method(as_factor,character)
S3method(as_factor,factor)
S3method(as_factor,numeric)
export("%>%")
export(as_factor)
export(fct_anon)
export(fct_c)
export(fct_collapse)
export(fct_count)
export(fct_cross)
export(fct_drop)
export(fct_expand)
export(fct_explicit_na)
export(fct_infreq)
export(fct_inorder)
export(fct_inseq)
export(fct_lump)
export(fct_lump_min)
export(fct_match)
export(fct_other)
export(fct_recode)
export(fct_relabel)
export(fct_relevel)
export(fct_reorder)
export(fct_reorder2)
export(fct_rev)
export(fct_shift)
export(fct_shuffle)
export(fct_unify)
export(fct_unique)
export(last2)
export(lvls_expand)
export(lvls_reorder)
export(lvls_revalue)
export(lvls_union)
importFrom(magrittr,"%>%")
importFrom(stats,median)
forcats/NEWS.md 0000644 0001762 0000144 00000007607 13432266135 013030 0 ustar ligges users # forcats 0.4.0
## New features
* `fct_collapse()` gains a `group_other` argument to allow you to group all
un-named levels into `"Other"`. (#100, @AmeliaMN)
* `fct_cross()` creates a new factor containing the combined levels from two
or more input factors, similar to `base::interaction` (@tslumley, #136)
* `fct_inseq()` reorders labels in numeric order, if possible (#145, @kbodwin).
* `fct_lump_min()` preserves levels that appear at least `min` times (can also
be used with the `w` weighted argument) (@robinsones, #142).
* `fct_match()` performs validated matching, providing a safer alternative to
`f %in% c("x", "y")` which silently returns `FALSE` if `"x"` or `"y"`
are not levels of `f` (e.g. because of a typo) (#126, @jonocarroll).
* `fct_relevel()` can now level factors using a function that is passed the
current levels (#117).
* `as_factor()` now has a numeric method. By default, orders factors in numeric
order, unlike the other methods which default to order of appearance.
(#145, @kbodwin)
## Minor bug fixes and improvements
* `fct_count()` gains a parameter to also compute the proportion
(@zhiiiyang, #146).
* `fct_lump()` now does not change the label if no lumping occurs
(@zhiiiyang, #130).
* `fct_relabel()` now accepts character input.
* `fct_reorder()` and `fct_reorder2()` no longer require that the summary
function return a numeric vector of length 1; instead it can return any
orderable vector of length 1 (#147).
* `fct_reorder()`, `fct_reorder2()` and `as_factor()` now use the ellipsis
package to warn if you pass in named components to `...` (#174).
# forcats 0.3.0
## API changes
* `fct_c()` now requires explicit splicing with `!!!` if you have a
list of factors that you want to combine. This is consistent with an emerging
standards for handling `...` throughout the tidyverse.
* `fct_reorder()` and `fct_reorder2()` have renamed `fun` to `.fun` to
avoid spurious matching of named arguments.
## New features
* All functions that take `...` use "tidy" dots: this means that you use can
`!!!` to splice in a list of values, and trailing empty arguments are
automatically removed. Additionally, all other arguments gain a `.` prefix
in order to avoid unhelpful matching of named arguments (#110).
* `fct_lump()` gains `w` argument (#70, @wilkox) to weight value
frequencies before lumping them together (#68).
## Improvements to NA handling
* `as_factor()` and `fct_inorder()` accept NA levels (#98).
* `fct_explicit_na()` also replaces NAs encoded in levels.
* `fct_lump()` correctly acccounts for `NA` values in input (#41)
* `lvls_revalue()` preserves NA levels.
## Minor improvements and bug fixes
* Test coverage increased from 80% to 99%.
* `fct_drop()` now preserves attributes (#83).
* `fct_expand()` and `lvls_expand()` now also take character vectors (#99).
* `fct_relabel()` now accepts objects coercible to functions
by `rlang::as_function` (#91, @alistaire47)
# forcats 0.2.0
## New functions
* `as_factor()` which works like `as.factor()` but orders levels by
appearance to avoid differences between locales (#39).
* `fct_other()` makes it easier to convert selected levels to "other" (#40)
* `fct_relabel()` allows programmatic relabeling of levels (#50, @krlmlr).
## Minor improvements and bug fixes
* `fct_c()` can take either a list of factors or individual factors (#42).
* `fct_drop()` gains `only` argument to restrict which levels are dropped (#69)
and no longer adds `NA` level if not present (#52).
* `fct_recode()` is now checks that each new value is of length 1 (#56).
* `fct_relevel()` gains `after` argument so you can also move levels
to the end (or any other position you like) (#29).
* `lvls_reorder()`, `fct_inorder()`, and `fct_infreq()` gain an `ordered`
argument, allowing you to override the existing "ordered" status (#54).
# forcats 0.1.1
* Minor fixes for R CMD check
* Add package docs
forcats/data/ 0000755 0001762 0000144 00000000000 12754676654 012652 5 ustar ligges users forcats/data/gss_cat.rda 0000644 0001762 0000144 00000217341 12754704400 014752 0 ustar ligges users BZh91AY&SYC]@ UU ᖘt|$}u3`Zh!ҬDCD = Ѡ: (
/A{@4 ^z(TքP R((UU@ )G )=t6"ҏLeJkFȨU
l-jafiz5+U9R P lҪm()VRI@PڶiZiJkjlf
MEbE-:
tT֮Sz
J=P (D r
!U4-` 0H P (
@ @l@ր ʠ C=1pQzTE tDJ
JEP={
P Sѥ*@-͖1b)oVڭ6D%h
jdF!fPP`
t4l)Z0 B:
ht: į@yP(P U^ƴ4R ( @R" T% J $IP
B%PJ
Q"Un*zJ(
2 "
ȑ S B!I44)hSmz$2 L jx"HBFDz1P= d4 O$DFP 2 RTH5
4 JHOJyMģ6zdz(zOM6Q*`*~,xu\jO