themis/ 0000755 0001762 0000144 00000000000 14466517462 011562 5 ustar ligges users themis/NAMESPACE 0000644 0001762 0000144 00000005362 14434172647 013004 0 ustar ligges users # Generated by roxygen2: do not edit by hand
S3method(bake,step_adasyn)
S3method(bake,step_bsmote)
S3method(bake,step_downsample)
S3method(bake,step_nearmiss)
S3method(bake,step_rose)
S3method(bake,step_smote)
S3method(bake,step_smotenc)
S3method(bake,step_tomek)
S3method(bake,step_upsample)
S3method(prep,step_adasyn)
S3method(prep,step_bsmote)
S3method(prep,step_downsample)
S3method(prep,step_nearmiss)
S3method(prep,step_rose)
S3method(prep,step_smote)
S3method(prep,step_smotenc)
S3method(prep,step_tomek)
S3method(prep,step_upsample)
S3method(print,step_adasyn)
S3method(print,step_bsmote)
S3method(print,step_downsample)
S3method(print,step_nearmiss)
S3method(print,step_rose)
S3method(print,step_smote)
S3method(print,step_smotenc)
S3method(print,step_tomek)
S3method(print,step_upsample)
S3method(required_pkgs,step_adasyn)
S3method(required_pkgs,step_bsmote)
S3method(required_pkgs,step_downsample)
S3method(required_pkgs,step_nearmiss)
S3method(required_pkgs,step_rose)
S3method(required_pkgs,step_smote)
S3method(required_pkgs,step_smotenc)
S3method(required_pkgs,step_tomek)
S3method(required_pkgs,step_upsample)
S3method(tidy,step_adasyn)
S3method(tidy,step_bsmote)
S3method(tidy,step_downsample)
S3method(tidy,step_nearmiss)
S3method(tidy,step_rose)
S3method(tidy,step_smote)
S3method(tidy,step_smotenc)
S3method(tidy,step_tomek)
S3method(tidy,step_upsample)
S3method(tunable,step_adasyn)
S3method(tunable,step_bsmote)
S3method(tunable,step_downsample)
S3method(tunable,step_nearmiss)
S3method(tunable,step_rose)
S3method(tunable,step_smote)
S3method(tunable,step_smotenc)
S3method(tunable,step_upsample)
export(adasyn)
export(bsmote)
export(nearmiss)
export(required_pkgs)
export(smote)
export(smotenc)
export(step_adasyn)
export(step_bsmote)
export(step_downsample)
export(step_nearmiss)
export(step_rose)
export(step_smote)
export(step_smotenc)
export(step_tomek)
export(step_upsample)
export(tidy)
export(tomek)
export(tunable)
import(rlang)
importFrom(ROSE,ROSE)
importFrom(dplyr,all_of)
importFrom(dplyr,bind_rows)
importFrom(dplyr,mutate)
importFrom(dplyr,select)
importFrom(generics,required_pkgs)
importFrom(generics,tidy)
importFrom(generics,tunable)
importFrom(glue,glue)
importFrom(lifecycle,deprecated)
importFrom(purrr,map_dfr)
importFrom(purrr,map_lgl)
importFrom(recipes,add_step)
importFrom(recipes,bake)
importFrom(recipes,check_new_data)
importFrom(recipes,check_type)
importFrom(recipes,is_trained)
importFrom(recipes,prep)
importFrom(recipes,print_step)
importFrom(recipes,rand_id)
importFrom(recipes,recipes_eval_select)
importFrom(recipes,sel2char)
importFrom(recipes,step)
importFrom(rlang,":=")
importFrom(rlang,caller_env)
importFrom(rlang,enquos)
importFrom(tibble,as_tibble)
importFrom(tibble,tibble)
importFrom(vctrs,vec_cbind)
importFrom(withr,with_seed)
themis/LICENSE 0000644 0001762 0000144 00000000054 14406427231 012552 0 ustar ligges users YEAR: 2023
COPYRIGHT HOLDER: themis authors
themis/README.md 0000644 0001762 0000144 00000016530 14466476130 013042 0 ustar ligges users
# themis
[R-CMD-check](https://github.com/tidymodels/themis/actions/workflows/R-CMD-check.yaml)
[Codecov test coverage](https://app.codecov.io/gh/tidymodels/themis?branch=main)
[CRAN status](https://CRAN.R-project.org/package=themis)
[Lifecycle](https://lifecycle.r-lib.org/articles/stages.html)
**themis** contains extra steps for the
[`recipes`](https://CRAN.R-project.org/package=recipes) package for
dealing with unbalanced data. The name **themis** is that of the
[ancient Greek
goddess](https://thishollowearth.wordpress.com/2012/07/02/god-of-the-week-themis/)
who is typically depicted holding a balance.
## Installation
You can install the released version of themis from
[CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("themis")
```
Install the development version from GitHub with:
``` r
# install.packages("pak")
pak::pak("tidymodels/themis")
```
## Example
The following is an example of using the
[SMOTE](https://jair.org/index.php/jair/article/view/10302/24590)
algorithm to deal with unbalanced data:
``` r
library(recipes)
library(modeldata)
library(themis)
data("credit_data")
credit_data0 <- credit_data %>%
  filter(!is.na(Job))

count(credit_data0, Job)
#>         Job    n
#> 1     fixed 2805
#> 2 freelance 1024
#> 3    others  171
#> 4   partime  452

ds_rec <- recipe(Job ~ Time + Age + Expenses, data = credit_data0) %>%
  step_impute_mean(all_predictors()) %>%
  step_smote(Job, over_ratio = 0.25) %>%
  prep()

ds_rec %>%
  bake(new_data = NULL) %>%
  count(Job)
#> # A tibble: 4 × 2
#>   Job           n
#>   <fct>     <int>
#> 1 fixed      2805
#> 2 freelance  1024
#> 3 others      701
#> 4 partime     701
```
## Methods
Below is some unbalanced data, used in the examples that follow.
``` r
example_data <- data.frame(class = letters[rep(1:5, 1:5 * 10)],
                           x = rnorm(150))

library(ggplot2)

example_data %>%
  ggplot(aes(class)) +
  geom_bar()
```
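The imbalance can also be checked numerically. The base R sketch below rebuilds the same data frame and tabulates the class frequencies; it is only an illustration and uses no themis functions.

``` r
# Rebuild the example data: class "a" gets 10 rows, "b" 20, ..., "e" 50.
example_data <- data.frame(class = letters[rep(1:5, 1:5 * 10)],
                           x = rnorm(150))

# Tabulate how many rows fall in each class.
class_counts <- table(example_data$class)
class_counts
```

The five classes hold 10, 20, 30, 40 and 50 rows respectively, so class `e` is the majority class and class `a` the smallest minority class.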
### Upsample / Over-sampling
The following methods all share the tuning parameter `over_ratio`, which
is the ratio of the minority-to-majority frequencies.
| name | function | Multi-class |
|-----------------------------------------------------------------|---------------------------|--------------------|
| Random minority over-sampling with replacement | `step_upsample()` | :heavy_check_mark: |
| Synthetic Minority Over-sampling Technique | `step_smote()` | :heavy_check_mark: |
| Borderline SMOTE-1 | `step_bsmote(method = 1)` | :heavy_check_mark: |
| Borderline SMOTE-2 | `step_bsmote(method = 2)` | :heavy_check_mark: |
| Adaptive synthetic sampling approach for imbalanced learning | `step_adasyn()` | :heavy_check_mark: |
| Generation of synthetic data by Randomly Over Sampling Examples | `step_rose()` | |
By setting `over_ratio = 1` you bring the number of samples in every
minority class up to 100% of the majority class.
``` r
recipe(~., example_data) %>%
  step_upsample(class, over_ratio = 1) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  ggplot(aes(class)) +
  geom_bar()
```
and by setting `over_ratio = 0.5` we upsample any minority class with
fewer samples than 50% of the majority up to 50% of the majority.
``` r
recipe(~., example_data) %>%
  step_upsample(class, over_ratio = 0.5) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  ggplot(aes(class)) +
  geom_bar()
```
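The effect of `over_ratio` on the class counts can be sketched with plain arithmetic. The helper `upsample_targets()` below is hypothetical and not part of themis; it assumes that a fractional target is rounded down, which is consistent with the 701 rows in the SMOTE example (2805 × 0.25 = 701.25) but is not confirmed here as the package's exact rule.

``` r
# Hypothetical helper (not a themis function): expected class counts after
# over-sampling. Assumes fractional targets are rounded down.
upsample_targets <- function(counts, over_ratio = 1) {
  # Target size is a fraction of the largest (majority) class.
  target <- floor(max(counts) * over_ratio)
  # Classes below the target are raised to it; larger classes are untouched.
  pmax(counts, target)
}

# Class counts of the example data used in this README.
counts <- c(a = 10, b = 20, c = 30, d = 40, e = 50)

full_balance <- upsample_targets(counts, over_ratio = 1)
half_balance <- upsample_targets(counts, over_ratio = 0.5)

full_balance
half_balance
```

With `over_ratio = 1` every class ends up with 50 rows. With `over_ratio = 0.5` the target is 25, so only classes `a` and `b` grow while `c`, `d` and `e` keep their original counts.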
### Downsample / Under-sampling
Most of the following methods share the tuning parameter
`under_ratio`, which is the ratio of the majority-to-minority
frequencies.
| name | function | Multi-class | under_ratio |
|-------------------------------------------------|---------------------|--------------------|--------------------|
| Random majority under-sampling with replacement | `step_downsample()` | :heavy_check_mark: | :heavy_check_mark: |
| NearMiss-1 | `step_nearmiss()` | :heavy_check_mark: | :heavy_check_mark: |
| Extraction of majority-minority Tomek links | `step_tomek()` | | |
By setting `under_ratio = 1` you bring the number of samples in every
majority class down to 100% of the minority class.
``` r
recipe(~., example_data) %>%
  step_downsample(class, under_ratio = 1) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  ggplot(aes(class)) +
  geom_bar()
```
and by setting `under_ratio = 2` we downsample any majority class with
more than 200% of the samples of the minority class down to 200% of the
minority.
``` r
recipe(~., example_data) %>%
  step_downsample(class, under_ratio = 2) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  ggplot(aes(class)) +
  geom_bar()
```
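The same arithmetic applies in the other direction for `under_ratio`. The helper `downsample_targets()` below is hypothetical and not part of themis; it assumes a fractional target is rounded down, which is an assumption made for illustration rather than a statement about the package's internals.

``` r
# Hypothetical helper (not a themis function): expected class counts after
# under-sampling. Assumes fractional targets are rounded down.
downsample_targets <- function(counts, under_ratio = 1) {
  # Target size is a multiple of the smallest (minority) class.
  target <- floor(min(counts) * under_ratio)
  # Classes above the target are cut to it; smaller classes are untouched.
  pmin(counts, target)
}

# Class counts of the example data used in this README.
counts <- c(a = 10, b = 20, c = 30, d = 40, e = 50)

equal_to_minority <- downsample_targets(counts, under_ratio = 1)
twice_minority <- downsample_targets(counts, under_ratio = 2)

equal_to_minority
twice_minority
```

With `under_ratio = 1` every class is reduced to 10 rows. With `under_ratio = 2` the target is 20, so classes `c`, `d` and `e` shrink to 20 rows while `a` and `b` are left as they are.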
## Contributing
This project is released with a [Contributor Code of
Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and
machine learning, [join us on RStudio
Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an
issue](https://github.com/tidymodels/themis/issues).
- Either way, learn how to create and share a
[reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
(a minimal, reproducible example), to clearly communicate about your
code.
- Check out further details on [contributing guidelines for tidymodels
packages](https://www.tidymodels.org/contribute/) and [how to get
help](https://www.tidymodels.org/help/).
themis/data/ 0000755 0001762 0000144 00000000000 14401475236 012462 5 ustar ligges users themis/data/circle_example.rda 0000644 0001762 0000144 00000012510 14401475236 016125 0 ustar ligges users