
Creating Custom Step Functions

recipes contains a number of different steps included in the package:

library(recipes)
steps <- apropos("^step_")
steps[!grepl("new$", steps)]
#>  [1] "step_BoxCox"       "step_YeoJohnson"   "step_bagimpute"   
#>  [4] "step_bin2factor"   "step_center"       "step_classdist"   
#>  [7] "step_corr"         "step_date"         "step_depth"       
#> [10] "step_discretize"   "step_dummy"        "step_holiday"     
#> [13] "step_hyperbolic"   "step_ica"          "step_interact"    
#> [16] "step_intercept"    "step_invlogit"     "step_isomap"      
#> [19] "step_knnimpute"    "step_kpca"         "step_lincomb"     
#> [22] "step_log"          "step_logit"        "step_meanimpute"  
#> [25] "step_modeimpute"   "step_ns"           "step_nzv"         
#> [28] "step_ordinalscore" "step_other"        "step_pca"         
#> [31] "step_poly"         "step_range"        "step_ratio"       
#> [34] "step_regex"        "step_rm"           "step_scale"       
#> [37] "step_shuffle"      "step_spatialsign"  "step_sqrt"        
#> [40] "step_window"

You might want to make your own and this page describes how to do that. If you are looking for good examples of existing steps, I would suggest looking at the code for centering or PCA to start.

A new step definition

As an example, let’s create a step that replaces the value of a variable with its percentile from the training set. The data that I’ll use are from the recipes package:

data(biomass)
str(biomass)
#> 'data.frame':    536 obs. of  8 variables:
#>  $ sample  : chr  "Akhrot Shell" "Alabama Oak Wood Waste" "Alder" "Alfalfa" ...
#>  $ dataset : chr  "Training" "Training" "Training" "Training" ...
#>  $ carbon  : num  49.8 49.5 47.8 45.1 46.8 ...
#>  $ hydrogen: num  5.64 5.7 5.8 4.97 5.4 5.75 5.99 5.7 5.5 5.9 ...
#>  $ oxygen  : num  42.9 41.3 46.2 35.6 40.7 ...
#>  $ nitrogen: num  0.41 0.2 0.11 3.3 1 2.04 2.68 1.7 0.8 1.2 ...
#>  $ sulfur  : num  0 0 0.02 0.16 0.02 0.1 0.2 0.2 0 0.1 ...
#>  $ HHV     : num  20 19.2 18.3 18.2 18.4 ...

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

To illustrate the transformation with the carbon variable, the training set distribution of that variable is shown below with a vertical line for the first value of the test set.

library(ggplot2)
theme_set(theme_bw())
ggplot(biomass_tr, aes(x = carbon)) + 
  geom_histogram(binwidth = 5, col = "blue", fill = "blue", alpha = .5) + 
  geom_vline(xintercept = biomass_te$carbon[1], lty = 2)

Based on the training set, 42.1% of the data are less than a value of 46.35. There are some applications where it might be advantageous to represent the predictor values as percentiles rather than their original values.
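
That figure can be verified directly; this mirrors the computation used to produce the number above:

mean(biomass_tr$carbon <= biomass_te$carbon[1])
## 0.421, the 42.1% quoted above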

Our new step will do this computation for any numeric variables of interest. We will call this step_percentile. The code below is designed for illustration and not speed or best practices. I’ve left out a lot of error trapping that we would want in a real implementation.

Create the initial function.

The user-exposed function step_percentile is just a simple wrapper around an internal function called add_step. This function takes the same arguments as your function and simply adds it to a new recipe. The ... signifies the variable selectors that can be used.

step_percentile <- function(recipe, ..., role = NA, 
                            trained = FALSE, ref_dist = NULL,
                            approx = FALSE, 
                            options = list(probs = (0:100)/100, names = TRUE)) {
## bake but do not evaluate the variable selectors with
## the `quos` function in `rlang`
  terms <- rlang::quos(...) 
  if(length(terms) == 0)
    stop("Please supply at least one variable specification. See ?selections.")
  add_step(
    recipe, 
    step_percentile_new(
      terms = terms, 
      trained = trained,
      role = role, 
      ref_dist = ref_dist,
      approx = approx,
      options = options))
}

You should always keep the first four arguments (recipe through trained) the same as listed above. Some notes:

  - the role argument is used when you either 1) create new variables and want their role to be pre-set or 2) replace the existing variables with new values. The latter is what we will be doing, and using role = NA will leave the existing role intact.
  - trained is set by the package when the estimation step has been run. You should default your function definition’s argument to FALSE.

I’ve added extra arguments specific to this step. In order to calculate the percentile, the training data for the relevant columns will need to be saved. This data will be saved in the ref_dist object. However, this might be problematic if the data set is large. approx would be used when you want to save a grid of pre-computed percentiles from the training set and use these to estimate the percentile for a new data point. If approx = TRUE, the argument ref_dist will contain the grid for each variable.

We will use stats::quantile to compute the grid. However, we might also want to have control over the granularity of this grid, so the options argument will be used to define how those calculations are done. We could just use the ellipses (aka ...) so that any options passed to step_percentile that are not one of its arguments would then be passed to stats::quantile. We recommend making a separate list object with the options and using these inside the function.
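
For example, a coarser grid could be requested through options (the deciles here are an arbitrary illustration):

quantile(biomass_tr$carbon, probs = (0:10)/10, names = TRUE)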

Initialization of new objects

Next, you can utilize the internal function step that sets the class of new objects. Using subclass = "percentile" will set the class of new objects to "step_percentile".

step_percentile_new <- function(terms = NULL, role = NA, trained = FALSE, 
                                ref_dist = NULL, approx = NULL, options = NULL) {
  step(
    subclass = "percentile", 
    terms = terms,
    role = role,
    trained = trained,
    ref_dist = ref_dist,
    approx = approx,
    options = options
  )
}

Define the estimation procedure

You will need to create a new prep method for your step’s class. To do this, the method should have three arguments:

function(x, training, info = NULL)

where

  - x will be the step_percentile object,
  - training will be a tibble that has the training set data, and
  - info will also be a tibble that has information on the current set of data available. This information is updated as each step is evaluated by its specific prep method, so it may not have the variables from the original data. The columns in this tibble are variable (the variable name), type (currently either "numeric" or "nominal"), role (defining the variable’s role), and source (either "original" or "derived" depending on where it originated).

You can define other options.

The first thing that you might want to do in the prep function is to translate the specification listed in the terms argument to column names in the current data. There is an internal function called terms_select that can be used to obtain this.

prep.step_percentile <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(terms = x$terms, info = info) 
}

Once we have this, we can either save the original data columns or estimate the approximation grid. For the grid, we will use a helper function that enables us to run do.call on a list of arguments that include the options list.

get_pctl <- function(x, args) {
  args$x <- x
  do.call("quantile", args)
}

prep.step_percentile <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(terms = x$terms, info = info) 
  ## You can add error trapping for non-numeric data here and so on.
  ## We'll use the names later, so require them here
  if(x$options$names == FALSE)
    stop("`names` should be set to TRUE")
  
  if(!x$approx) {
    x$ref_dist <- training[, col_names]
  } else {
    pctl <- lapply(
      training[, col_names],  
      get_pctl, 
      args = x$options
    )
    x$ref_dist <- pctl
  }
  ## Always return the updated step
  x
}

Create the bake method

Remember that the prep function does not apply the step to the data; it only estimates any required values such as ref_dist. We will need to create a new method for our step_percentile class. The minimum arguments for this are

function(object, newdata, ...)

where object is the updated step function that has been through the corresponding prep code and newdata is a tibble of data to be preprocessed.

Here is the code to convert the new data to percentiles. Two initial helper functions handle the two cases (approximation or not). We always return a tibble as the output.

## Two helper functions
pctl_by_mean <- function(x, ref) mean(ref <= x)

pctl_by_approx <- function(x, ref) {
  ## go from 1 column tibble to vector
  x <- getElement(x, names(x))
  ## get the percentile values from the names (e.g. "10%")
  p_grid <- as.numeric(gsub("%$", "", names(ref))) 
  approx(x = ref, y = p_grid, xout = x)$y/100
}

bake.step_percentile <- function(object, newdata, ...) {
  require(tibble)
  ## For illustration (and not speed), we will loop through the affected variables
  ## and do the computations
  vars <- names(object$ref_dist)
  
  for(i in vars) {
    if(!object$approx) {
      ## We can use `apply` since tibbles do not drop dimensions:
      newdata[, i] <- apply(newdata[, i], 1, pctl_by_mean, 
                            ref = object$ref_dist[, i])
    } else 
      newdata[, i] <- pctl_by_approx(newdata[, i], object$ref_dist[[i]])
  }
  ## Always convert to tibbles on the way out
  as_tibble(newdata)
}
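
As a quick illustration of the pctl_by_approx helper on its own, a reference grid can be built by hand (this mirrors the default options; the three test-set values are an arbitrary choice):

ref <- quantile(biomass_tr$carbon, probs = (0:100)/100, names = TRUE)
pctl_by_approx(data.frame(carbon = biomass_te$carbon[1:3]), ref)
## three values on the percentile scale, comparable to pctl_by_mean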

Running the example

Let’s use the example data to make sure that it works:

rec_obj <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)])
rec_obj <- rec_obj %>%
  step_percentile(all_predictors(), approx = TRUE) 

rec_obj <- prep(rec_obj, training = biomass_tr)
#> step 1 percentile training

percentiles <- bake(rec_obj, biomass_te)
percentiles
#> # A tibble: 80 x 5
#>    carbon hydrogen oxygen nitrogen sulfur
#>     <dbl>    <dbl>  <dbl>    <dbl>  <dbl>
#>  1 0.4209   0.4500 0.9026    0.215  0.735
#>  2 0.1800   0.3850 0.9217    0.928  0.839
#>  3 0.1561   0.3850 0.9447    0.900  0.805
#>  4 0.4233   0.7750 0.2800    0.845  0.902
#>  5 0.6662   0.8667 0.6314    0.155  0.090
#>  6 0.2175   0.3850 0.5363    0.495  0.700
#>  7 0.0803   0.2713 0.9859    0.695  0.903
#>  8 0.1395   0.1260 0.1604    0.606  0.700
#>  9 0.0226   0.1035 0.1312    0.126  0.996
#> 10 0.0178   0.0821 0.0987    0.972  0.974
#> # ... with 70 more rows

The plot below shows how the original data line up with the percentiles for each split of the data for one of the predictors:
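
For reference, the code that produces that figure (taken from the vignette source) is:

grid_pct <- rec_obj$steps[[1]]$options$probs
plot_data <- data.frame(
  carbon = c(
    quantile(biomass_tr$carbon, probs = grid_pct),
    biomass_te$carbon
  ),
  percentile = c(grid_pct, percentiles$carbon),
  dataset = rep(
    c("Training", "Testing"),
    c(length(grid_pct), nrow(percentiles))
  )
)

ggplot(plot_data, aes(x = carbon, y = percentile, col = dataset)) + 
  geom_point(alpha = .4, cex = 2) + 
  theme(legend.position = "top")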

Selecting Variables

When recipe steps are used, there are different approaches that can be taken to select which variables or features a step should affect.

The three main characteristics of variables that can be queried are:

  - the name of the variable,
  - the data type (e.g. numeric or nominal), and
  - the role that was declared by the recipe.

The manual pages for ?selections and ?has_role have details about the available selection methods.

To illustrate this, the credit data will be used:

library(recipes)
data("credit_data")
str(credit_data)
#> 'data.frame':    4454 obs. of  14 variables:
#>  $ Status   : Factor w/ 2 levels "bad","good": 2 2 1 2 2 2 2 2 2 1 ...
#>  $ Seniority: int  9 17 10 0 0 1 29 9 0 0 ...
#>  $ Home     : Factor w/ 6 levels "ignore","other",..: 6 6 3 6 6 3 3 4 3 4 ...
#>  $ Time     : int  60 60 36 60 36 60 60 12 60 48 ...
#>  $ Age      : int  30 58 46 24 26 36 44 27 32 41 ...
#>  $ Marital  : Factor w/ 5 levels "divorced","married",..: 2 5 2 4 4 2 2 4 2 2 ...
#>  $ Records  : Factor w/ 2 levels "no","yes": 1 1 2 1 1 1 1 1 1 1 ...
#>  $ Job      : Factor w/ 4 levels "fixed","freelance",..: 2 1 2 1 1 1 1 1 2 4 ...
#>  $ Expenses : int  73 48 90 63 46 75 75 35 90 90 ...
#>  $ Income   : int  129 131 200 182 107 214 125 80 107 80 ...
#>  $ Assets   : int  0 0 3000 2500 0 3500 10000 0 15000 0 ...
#>  $ Debt     : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Amount   : int  800 1000 2000 900 310 650 1600 200 1200 1200 ...
#>  $ Price    : int  846 1658 2985 1325 910 1645 1800 1093 1957 1468 ...

rec <- recipe(Status ~ Seniority + Time + Age + Records, data = credit_data)
rec
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          4

Before any steps are used the information on the original variables is:

summary(rec, original = TRUE)
#> # A tibble: 5 x 4
#>    variable    type      role   source
#>       <chr>   <chr>     <chr>    <chr>
#> 1 Seniority numeric predictor original
#> 2      Time numeric predictor original
#> 3       Age numeric predictor original
#> 4   Records nominal predictor original
#> 5    Status nominal   outcome original

We can add a step to compute dummy variables on the non-numeric data after we impute any missing data:

dummied <- rec %>% step_dummy(all_nominal())

This will capture any variables that are either character strings or factors: Status and Records. However, since Status is our outcome, we might want to keep it as a factor so we can subtract that variable out either by name or by role:

dummied <- rec %>% step_dummy(Records) # or
dummied <- rec %>% step_dummy(all_nominal(), - Status) # or
dummied <- rec %>% step_dummy(all_nominal(), - all_outcomes()) 

Using the last definition:

dummied <- prep(dummied, training = credit_data)
#> step 1 dummy training
with_dummy <- bake(dummied, newdata = credit_data)
with_dummy
#> # A tibble: 4,454 x 4
#>    Seniority  Time   Age Records_yes
#>        <int> <int> <int>       <dbl>
#>  1         9    60    30           0
#>  2        17    60    58           0
#>  3        10    36    46           1
#>  4         0    60    24           0
#>  5         0    36    26           0
#>  6         1    60    36           0
#>  7        29    60    44           0
#>  8         9    12    27           0
#>  9         0    60    32           0
#> 10         0    48    41           0
#> # ... with 4,444 more rows

Status is unaffected.
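
The same subtraction can also be phrased with a role selector (a sketch; has_role matches a declared role by name):

dummied <- rec %>% step_dummy(all_nominal(), - has_role("outcome"))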

One important aspect about selecting variables in steps is that the variable names and types may change as steps are being executed. In the above example, Records is a factor variable before the step is executed. Afterwards, Records is gone and the binary variable Records_yes is in its place. One reason to have general selection routines like all_predictors or contains is to be able to select variables that have not been created yet.
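
For instance, a later step can refer to a column that will only exist once an earlier step has run (an illustrative sketch; centering the new dummy column is an arbitrary choice):

dummied <- rec %>% 
  step_dummy(all_nominal(), - all_outcomes()) %>% 
  step_center(contains("Records"))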


Basic Recipes

This document demonstrates some basic uses of recipes. First, some definitions are required:

  - variables are the original (raw) data columns in a data frame or tibble. For example, in a traditional formula Y ~ A + B + A:B, the variables are A, B, and Y.
  - roles define how variables will be used in the model. Examples are: predictor (independent variables), response, and case weight. This is meant to be open-ended and extensible.
  - terms are columns in a design matrix such as A, B, and A:B. These can be other derived entities that are grouped, such as a set of principal components or a set of columns that define a basis function for a variable. These are synonymous with features in machine learning. Variables that have predictor roles would automatically be main effect terms.

An Example

The cell segmentation data will be used. It has 58 predictor columns, a factor variable Class (the outcome), and two extra labelling columns. Each of the predictors has a suffix for the optical channel ("Ch1"-"Ch4"). We will first separate the data into a training and test set and then remove unimportant variables:

library(recipes)
library(caret)
data(segmentationData)

seg_train <- segmentationData %>% 
  filter(Case == "Train") %>% 
  select(-Case, -Cell)
seg_test  <- segmentationData %>% 
  filter(Case == "Test")  %>% 
  select(-Case, -Cell)

The idea is that the preprocessing operations will all be created using the training set and then these steps will be applied to both the training and test set.

An Initial Recipe

For a first recipe, let’s plan on centering and scaling the predictors. First, we will create a recipe from the original data and then specify the processing steps.

Recipes can be created manually by sequentially adding roles to variables in a data set.

If the analysis only required outcomes and predictors, the easiest way to create the initial recipe is to use the standard formula method:

rec_obj <- recipe(Class ~ ., data = seg_train)
rec_obj
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor         58

The data contained in the data argument need not be the training set; this data is only used to catalog the names of the variables and their types (e.g. numeric, etc.).
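
For example, a recipe declared from just a few rows is equivalent (rec_small is a hypothetical name):

rec_small <- recipe(Class ~ ., data = head(seg_train))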

(Note that the formula method here is used to declare the variables and their roles and nothing else. If you use inline functions (e.g. log) it will complain. These types of operations can be added later.)
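
For example, rather than writing log(AreaCh1) in the formula, the transformation can be added as a step (a sketch using step_log from the package):

rec_logged <- recipe(Class ~ ., data = seg_train) %>%
  step_log(AreaCh1)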

Preprocessing Steps

From here, preprocessing steps can be added sequentially in one of two ways:

rec_obj <- step_name(rec_obj, arguments)    ## or
rec_obj <- rec_obj %>% step_name(arguments)

step_center and the other functions will always return updated recipes.

One other important facet of the code is the method for specifying which variables should be used in different steps. The manual page ?selections has more details but dplyr-like selector functions can be used:

  - basic variable names (e.g. x1, x2),
  - dplyr functions for selecting variables: contains, ends_with, everything, matches, num_range, and starts_with,
  - functions that subset on the role of the variables that have been specified so far: all_outcomes, all_predictors, has_role, or
  - similar functions for the type of data: all_nominal, all_numeric, and has_type.

Note that the functions listed above are the only ones that can be used to select variables inside the steps. Also, minus signs can be used to deselect variables, as in the sketch below.
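
A small sketch of deselection (excluding the channel 4 columns here is an arbitrary choice):

rec_obj %>% step_center(all_predictors(), - contains("Ch4"))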

For our data, we can add the two operations for all of the predictors:

standardized <- rec_obj %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) 
standardized
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor         58
#> 
#> Steps:
#> 
#> Centering for all_predictors()
#> Scaling for all_predictors()

It is important to realize that the specific variables have not been declared yet (in this example). In some preprocessing steps, variables will be added or removed from the current list of possible variables.

If these are the only preprocessing steps for the predictors, we can now estimate the means and standard deviations from the training set. The prep function is used with a recipe and a data set:

trained_rec <- prep(standardized, training = seg_train)
#> step 1 center training 
#> step 2 scale training

Now that the statistics have been estimated, the preprocessing can be applied to the training and test set:

train_data <- bake(trained_rec, newdata = seg_train)
test_data  <- bake(trained_rec, newdata = seg_test)

bake returns a tibble:

class(test_data)
#> [1] "tbl_df"     "tbl"        "data.frame"
test_data
#> # A tibble: 1,010 x 58
#>    AngleCh1 AreaCh1 AvgIntenCh1 AvgIntenCh2 AvgIntenCh3 AvgIntenCh4
#>       <dbl>   <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1   1.0656  -0.647      -0.684      -1.177      -0.926     -0.9238
#>  2  -1.8040  -0.185      -0.632      -0.479      -0.809     -0.6666
#>  3  -1.0300  -0.707       1.207       3.035       0.348      1.3864
#>  4   1.6935  -0.684       0.806       2.664       0.296      0.8934
#>  5   1.8129  -0.342      -0.668      -1.172      -0.843     -0.9282
#>  6  -1.4759   0.784      -0.682      -0.628      -0.881     -0.5939
#>  7   1.2702   0.272      -0.672      -0.625      -0.809     -0.5156
#>  8  -1.5837   0.457       0.283       1.320      -0.613     -0.0891
#>  9  -0.7957  -0.412      -0.669      -1.168      -0.845     -0.9258
#> 10   0.0363  -0.638      -0.535       0.182      -0.555     -0.0253
#> # ... with 1,000 more rows, and 52 more variables:
#> #   ConvexHullAreaRatioCh1 <dbl>, ConvexHullPerimRatioCh1 <dbl>,
#> #   DiffIntenDensityCh1 <dbl>, DiffIntenDensityCh3 <dbl>,
#> #   DiffIntenDensityCh4 <dbl>, EntropyIntenCh1 <dbl>,
#> #   EntropyIntenCh3 <dbl>, EntropyIntenCh4 <dbl>, EqCircDiamCh1 <dbl>,
#> #   EqEllipseLWRCh1 <dbl>, EqEllipseOblateVolCh1 <dbl>,
#> #   EqEllipseProlateVolCh1 <dbl>, EqSphereAreaCh1 <dbl>,
#> #   EqSphereVolCh1 <dbl>, FiberAlign2Ch3 <dbl>, FiberAlign2Ch4 <dbl>,
#> #   FiberLengthCh1 <dbl>, FiberWidthCh1 <dbl>, IntenCoocASMCh3 <dbl>,
#> #   IntenCoocASMCh4 <dbl>, IntenCoocContrastCh3 <dbl>,
#> #   IntenCoocContrastCh4 <dbl>, IntenCoocEntropyCh3 <dbl>,
#> #   IntenCoocEntropyCh4 <dbl>, IntenCoocMaxCh3 <dbl>,
#> #   IntenCoocMaxCh4 <dbl>, KurtIntenCh1 <dbl>, KurtIntenCh3 <dbl>,
#> #   KurtIntenCh4 <dbl>, LengthCh1 <dbl>, NeighborAvgDistCh1 <dbl>,
#> #   NeighborMinDistCh1 <dbl>, NeighborVarDistCh1 <dbl>, PerimCh1 <dbl>,
#> #   ShapeBFRCh1 <dbl>, ShapeLWRCh1 <dbl>, ShapeP2ACh1 <dbl>,
#> #   SkewIntenCh1 <dbl>, SkewIntenCh3 <dbl>, SkewIntenCh4 <dbl>,
#> #   SpotFiberCountCh3 <dbl>, SpotFiberCountCh4 <dbl>, TotalIntenCh1 <dbl>,
#> #   TotalIntenCh2 <dbl>, TotalIntenCh3 <dbl>, TotalIntenCh4 <dbl>,
#> #   VarIntenCh1 <dbl>, VarIntenCh3 <dbl>, VarIntenCh4 <dbl>,
#> #   WidthCh1 <dbl>, XCentroid <dbl>, YCentroid <dbl>

Adding Steps

After exploring the data, more preprocessing might be required. Steps can be added to the trained recipe. Suppose that we need to create PCA components but only from the predictors from channel 1 and any predictors that are areas:

trained_rec <- trained_rec %>%
  step_pca(ends_with("Ch1"), contains("area"), num = 5)
trained_rec
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor         58
#> 
#> Training data contained 1009 data points and no missing data.
#> 
#> Steps:
#> 
#> Centering for AngleCh1, AreaCh1, ... [trained]
#> Scaling for AngleCh1, AreaCh1, ... [trained]
#> PCA extraction with ends_with("Ch1"), contains("area")

Note that only the last step has been estimated; the first two were previously trained and these activities are not duplicated. We can add the PCA estimates using prep again:

trained_rec <- prep(trained_rec, training = seg_train)
#> step 1 center [pre-trained]
#> step 2 scale [pre-trained]
#> step 3 pca training

bake can be reapplied to get the principal components in addition to the other variables:

test_data  <- bake(trained_rec, newdata = seg_test)
names(test_data)
#>  [1] "AvgIntenCh2"          "AvgIntenCh3"          "AvgIntenCh4"         
#>  [4] "DiffIntenDensityCh3"  "DiffIntenDensityCh4"  "EntropyIntenCh3"     
#>  [7] "EntropyIntenCh4"      "FiberAlign2Ch3"       "FiberAlign2Ch4"      
#> [10] "IntenCoocASMCh3"      "IntenCoocASMCh4"      "IntenCoocContrastCh3"
#> [13] "IntenCoocContrastCh4" "IntenCoocEntropyCh3"  "IntenCoocEntropyCh4" 
#> [16] "IntenCoocMaxCh3"      "IntenCoocMaxCh4"      "KurtIntenCh3"        
#> [19] "KurtIntenCh4"         "SkewIntenCh3"         "SkewIntenCh4"        
#> [22] "SpotFiberCountCh3"    "SpotFiberCountCh4"    "TotalIntenCh2"       
#> [25] "TotalIntenCh3"        "TotalIntenCh4"        "VarIntenCh3"         
#> [28] "VarIntenCh4"          "XCentroid"            "YCentroid"           
#> [31] "PC1"                  "PC2"                  "PC3"                 
#> [34] "PC4"                  "PC5"

Note that the PCA components have replaced the original variables that were from channel 1 or measured an area aspect of the cells.

There are a number of different steps included in the package:

steps <- apropos("^step_")
steps[!grepl("new$", steps)]
#>  [1] "step_BoxCox"       "step_YeoJohnson"   "step_bagimpute"   
#>  [4] "step_bin2factor"   "step_center"       "step_classdist"   
#>  [7] "step_corr"         "step_date"         "step_depth"       
#> [10] "step_discretize"   "step_dummy"        "step_holiday"     
#> [13] "step_hyperbolic"   "step_ica"          "step_interact"    
#> [16] "step_intercept"    "step_invlogit"     "step_isomap"      
#> [19] "step_knnimpute"    "step_kpca"         "step_lincomb"     
#> [22] "step_log"          "step_logit"        "step_meanimpute"  
#> [25] "step_modeimpute"   "step_ns"           "step_nzv"         
#> [28] "step_ordinalscore" "step_other"        "step_pca"         
#> [31] "step_percentile"   "step_poly"         "step_range"       
#> [34] "step_ratio"        "step_regex"        "step_rm"          
#> [37] "step_scale"        "step_shuffle"      "step_spatialsign" 
#> [40] "step_sqrt"         "step_window"

Ordering of Steps

In recipes, there are no constraints related to the order in which steps are added to the recipe. However, there are some general suggestions that you should consider:

  - If using a Box-Cox transformation, don’t center the data first or do any operations that might make the data non-positive. Alternatively, use the Yeo-Johnson transformation so you don’t have to worry about this.
  - Recipes do not automatically create dummy variables (unlike most formula methods). If you want to center, scale, or do any other operations on all of the predictors, run step_dummy first so that numeric columns are in the data set instead of factors.
  - As noted in the help file for step_interact, you should make dummy variables before creating the interactions.
  - If you are lumping infrequently occurring categories together with step_other, call step_other before step_dummy.

While your project’s needs may vary, here is a suggested order of potential steps that should work for most problems:

  1. Impute
  2. Individual transformations for skewness and other issues
  3. Discretize (if needed and if you have no other choice)
  4. Create dummy variables
  5. Create interactions
  6. Normalization steps (center, scale, range, etc)
  7. Multivariate transformation (e.g. PCA, spatial sign, etc)

Again, your mileage may vary for your particular problem.
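
Putting those suggestions into practice, a recipe in the suggested order might look like the following sketch (dat and y are hypothetical, and the particular steps chosen are illustrative assumptions, not requirements):

rec <- recipe(y ~ ., data = dat) %>%
  step_meanimpute(all_numeric(), - all_outcomes()) %>%  ## 1. impute
  step_YeoJohnson(all_numeric(), - all_outcomes()) %>%  ## 2. transform for skewness
  step_dummy(all_nominal(), - all_outcomes()) %>%       ## 4. create dummy variables
  step_center(all_predictors()) %>%                     ## 6. normalize
  step_scale(all_predictors()) %>%
  step_pca(all_predictors())                            ## 7. multivariate transformation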

recipes/inst/doc/Custom_Steps.Rmd0000644000177700017770000002326613136242227020067 0ustar herbrandtherbrandt--- title: "Creating Custom Step Functions" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Custom Steps} %\VignetteEncoding{UTF-8} output: knitr:::html_vignette: toc: yes --- ```{r ex_setup, include=FALSE} knitr::opts_chunk$set( message = FALSE, digits = 3, collapse = TRUE, comment = "#>" ) options(digits = 3) ``` `recipes` contains a number of different steps included in the package: ```{r step_list} library(recipes) steps <- apropos("^step_") steps[!grepl("new$", steps)] ``` You might want to make your own and this page describes how to do that. If you are looking for good examples of existing steps, I would suggest looking at the code for [centering](https://github.com/topepo/recipes/blob/master/R/center.R) or [PCA](https://github.com/topepo/recipes/blob/master/R/pca.R) to start. # A new step definition At an example, let's create a step that replaces the value of a variable with its percentile from the training set. The date that I'll use is from the `recipes` package: ```{r initial} data(biomass) str(biomass) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] ``` To illustrate the transformation with the `carbon` variable, the training set distribution of that variables is shown below with a vertical line for the first value of the test set. ```{r carbon_dist} library(ggplot2) theme_set(theme_bw()) ggplot(biomass_tr, aes(x = carbon)) + geom_histogram(binwidth = 5, col = "blue", fill = "blue", alpha = .5) + geom_vline(xintercept = biomass_te$carbon[1], lty = 2) ``` Based on the training set, `r round(mean(biomass_tr$carbon <= biomass_te$carbon[1])*100, 1)`% of the data are less than a value of `r biomass_te$carbon[1]`. There are some applications where it might be advantageous to represent the predictor values are percentiles rather than their original values. Our new step will do this computation for any numeric variables of interest. We will call this `step_percentile`. The code below is designed for illustration and not speed or best practices. I've left out a lot of error trapping that we would want in a real implementation. # Create the initial function. The user-exposed function `step_percentile` is just a simple wrapper around an internal function called `add_step`. This function takes the same arguments as your function and simply adds it to a new recipe. The `...` signfies the variable selectors that can be used. ```{r initial_def} step_percentile <- function(recipe, ..., role = NA, trained = FALSE, ref_dist = NULL, approx = FALSE, options = list(probs = (0:100)/100, names = TRUE)) { ## bake but do not evaluate the variable selectors with ## the `quos` function in `rlang` terms <- rlang::quos(...) if(length(terms) == 0) stop("Please supply at least one variable specification. See ?selections.") add_step( recipe, step_percentile_new( terms = terms, trained = trained, role = role, ref_dist = ref_dist, approx = approx, options = options)) } ``` You should always keep the first four arguments (`recipe` though `trained`) the same as listed above. Some notes: * the `role` argument is used when you either 1) create new variables and want their role to be pre-set or 2) replace the existing variables with new values. The latter is what we will be doing and using `role = NA` will leave the existing role intact. * `trained` is set by the package when the estimation step has been run. 
You should default your function definition's argument to `FALSE`. I've added extra arguments specific to this step. In order to calculate the percentile, the training data for the relevant columns will need to be saved. This data will be saved in the `ref_dist` object. However, this might be problematic if the data set is large. `approx` would be used when you want to save a grid of pre-computed percentiles from the training set and use these to estimate the percentile for a new data point. If `approx = TRUE`, the argument `ref_dist` will contain the grid for each variable. We will use the `stats::quantile` to compute the grid. However, we might also want to have control over the granularity of this grid, so the `options` argument will be used to define how that calculations is done. We could just use the ellipses (aka `...`) so that any options passed to `step_percentile` that are not one of its arguments will then be passed to `stats::quantile`. We recommend making a seperate list object with the options and use these inside the function. # Initialization of new objects Next, you can utilize the internal function `step` that sets the class of new objects. Using `subclass = "percentile"` will set the class of new objects to `"step_percentile". ```{r initialize} step_percentile_new <- function(terms = NULL, role = NA, trained = FALSE, ref_dist = NULL, approx = NULL, options = NULL) { step( subclass = "percentile", terms = terms, role = role, trained = trained, ref_dist = ref_dist, approx = approx, options = options ) } ``` # Define the estimation procedure You will need to create a new `prep` method for your step's class. To do this, three arguments that the method should have: ```r function(x, training, info = NULL) ``` where * `x` will be the `step_percentile` object * `training` will be a _tibble_ that has the training set data * `info` will also be a tibble that has information on the current set of data available. This information is updated as each step is evaluated by its specific `prep` method so it may not have the variables from the original data. The columns in this tibble are `variable` (the variable name), `type` (currently either "numeric" or "nominal"), `role` (defining the variable's role), and `source` (either "original" or "derived" depending on where it originated). You can define other options. The first thing that you might want to do in the `prep` function is to translate the specification listed in the `terms` argument to column names in the current data. There is an internal function called `terms_select` that can be used to obtain this. ```{r prep_1, eval = FALSE} prep.step_percentile <- function(x, training, info = NULL, ...) { col_names <- terms_select(terms = x$terms, info = info) } ``` Once we have this, we can either save the original data columns or estimate the approximation grid. For the grid, we will use a helper function that enables us to run `do.call` on a list of arguments that include the `options` list. ```{r prep_2} get_pctl <- function(x, args) { args$x <- x do.call("quantile", args) } prep.step_percentile <- function(x, training, info = NULL, ...) { col_names <- terms_select(terms = x$terms, info = info) ## You can add error trapping for non-numeric data here and so on. 
## We'll use the names later so if(x$options$names == FALSE) stop("`names` should be set to TRUE") if(!x$approx) { x$ref_dist <- training[, col_names] } else { pctl <- lapply( training[, col_names], get_pctl, args = x$options ) x$ref_dist <- pctl } ## Always return the updated step x } ``` # Create the `bake` method Remember that the `prep` function does not _apply_ the step to the data; it only estimates any required values such as `ref_dist`. We will need to create a new method for our `step_percentile` class. The minimum arguments for this are ```r function(object, newdata, ...) ``` where `object` is the updated step function that has been through the corresponding `prep` code and `newdata` is a tibble of data to be preprocessingcessed. Here is the code to convert the new data to percentiles. Two initial helper functions handle the two cases (approximation or not). We always return a tibble as the output. ```{r bake} ## Two helper functions pctl_by_mean <- function(x, ref) mean(ref <= x) pctl_by_approx <- function(x, ref) { ## go from 1 column tibble to vector x <- getElement(x, names(x)) ## get the percentiles values from the names (e.g. "10%") p_grid <- as.numeric(gsub("%$", "", names(ref))) approx(x = ref, y = p_grid, xout = x)$y/100 } bake.step_percentile <- function(object, newdata, ...) { require(tibble) ## For illustration (and not speed), we will loop through the affected variables ## and do the computations vars <- names(object$ref_dist) for(i in vars) { if(!object$approx) { ## We can use `apply` since tibbles do not drop dimensions: newdata[, i] <- apply(newdata[, i], 1, pctl_by_mean, ref = object$ref_dist[, i]) } else newdata[, i] <- pctl_by_approx(newdata[, i], object$ref_dist[[i]]) } ## Always convert to tibbles on the way out as_tibble(newdata) } ``` # Running the example Let's use the example data to make sure that it works: ```{r example} rec_obj <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)]) rec_obj <- rec_obj %>% step_percentile(all_predictors(), approx = TRUE) rec_obj <- prep(rec_obj, training = biomass_tr) percentiles <- bake(rec_obj, biomass_te) percentiles ``` The plot below shows how the original data line up with the percentiles for each split of the data for one of the predictors: ```{r cdf_plot, echo = FALSE} grid_pct <- rec_obj$steps[[1]]$options$probs plot_data <- data.frame( carbon = c( quantile(biomass_tr$carbon, probs = grid_pct), biomass_te$carbon ), percentile = c(grid_pct, percentiles$carbon), dataset = rep( c("Training", "Testing"), c(length(grid_pct), nrow(percentiles)) ) ) ggplot(plot_data, aes(x = carbon, y = percentile, col = dataset)) + geom_point(alpha = .4, cex = 2) + theme(legend.position = "top") ``` recipes/tests/0000755000177700017770000000000013136242227014402 5ustar herbrandtherbrandtrecipes/tests/testthat.R0000644000177700017770000000011513064546045016367 0ustar herbrandtherbrandtlibrary(testthat) library(recipes) test_check(package = "recipes") q("no") recipes/tests/testthat/0000755000177700017770000000000013136342173016243 5ustar herbrandtherbrandtrecipes/tests/testthat/test_spatialsign.R0000644000177700017770000000164713135741217021754 0ustar herbrandtherbrandtlibrary(testthat) library(recipes) data("biomass") rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, data = biomass) test_that('spatial sign', { sp_sign <- rec %>% step_center(carbon, hydrogen) %>% step_scale(carbon, hydrogen) %>% step_spatialsign(carbon, hydrogen) sp_sign_trained <- prep(sp_sign, training = biomass, verbose = FALSE) sp_sign_pred <- 
bake(sp_sign_trained, newdata = biomass) sp_sign_pred <- as.matrix(sp_sign_pred)[, c("carbon", "hydrogen")] x <- as.matrix(scale(biomass[, 3:4], center = TRUE, scale = TRUE)) x <- t(apply(x, 1, function(x) x/sqrt(sum(x^2)))) expect_equal(sp_sign_pred, x) }) test_that('printing', { sp_sign <- rec %>% step_center(carbon, hydrogen) %>% step_scale(carbon, hydrogen) %>% step_spatialsign(carbon, hydrogen) expect_output(print(sp_sign)) expect_output(prep(sp_sign, training = biomass)) }) recipes/tests/testthat/test_rm.R0000644000177700017770000000131113135741217020040 0ustar herbrandtherbrandtlibrary(testthat) library(recipes) library(tibble) n <- 20 set.seed(12) ex_dat <- data.frame(x1 = rnorm(n), x2 = runif(n)) test_that('simple logit trans', { rec <- recipe(~., data = ex_dat) %>% step_rm(x1) rec_trained <- prep(rec, training = ex_dat, verbose = FALSE) rec_rm <- bake(rec_trained, newdata = ex_dat) expect_equal(colnames(rec_rm), "x2") }) test_that('printing', { rec <- recipe(~., data = ex_dat) %>% step_rm(x1) expect_output(print(rec)) expect_output(prep(rec, training = ex_dat)) }) test_that('printing', { rec <- recipe(~., data = ex_dat) %>% step_rm(x1) expect_output(print(rec)) expect_output(prep(rec, training = ex_dat)) }) recipes/tests/testthat/test_YeoJohnson.R0000644000177700017770000000611713135741217021526 0ustar herbrandtherbrandtlibrary(testthat) library(recipes) n <- 20 set.seed(1) ex_dat <- data.frame(x1 = exp(rnorm(n, mean = .1)), x2 = 1/rnorm(n), x3 = rep(1:2, each = n/2), x4 = rexp(n)) ## from `car` package exp_lambda <- c(x1 = -0.2727204451, x2 = 1.139292543, x3 = NA, x4 = -1.012702061) exp_dat <- structure(list(x1 = c(0.435993557749438, 0.754696454247318, 0.371327932207827, 1.46113017436327, 0.82204097731098, 0.375761562702297, 0.89751975937422, 1.02175936118846, 0.940739811377902, 0.54984302797741, 1.41856737837093, 0.850587387615876, 0.437701618670981, 0.112174615510591, 1.21942112715274, 0.654589551748501, 0.666780580127795, 1.12625135443351, 1.0636850911955, 0.949680956411546), x2 = c(1.15307873387121, 1.36532999080347, 17.4648439780388, -0.487746797875704, 1.74452440065935, -13.3640721541574, -5.35805967319061, -0.653901985285932, -1.90735599477338, 2.65253432454371, 0.76771137336975, -7.79484535687973, 2.87484976680907, -13.8738947581599, -0.696856395842167, -2.17745353101028, -2.28384276604207, -12.7261652971783, 0.95585544349634, 1.40099012093008), x3 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), x4 = c(0.49061104973894, 0.49670370366879, 0.338742419511653, 0.663722100577351, 0.296260662322359, 0.681346128666408, 0.757581280603711, 0.357148961119583, 0.371872889850153, 0.49239057672598, 0.173259524331095, 0.235933290139909, 0.52297977893566, 0.434927187456966, 0.0822501770191215, 0.523479652016858, 0.197977570919824, 0.608108816144845, 0.821913792446345, 0.300608495427594)), .Names = c("x1", "x2", "x3", "x4"), row.names = c(NA, -20L), class = "data.frame") test_that('simple YJ trans', { rec <- recipe(~., data = ex_dat) %>% step_YeoJohnson(x1, x2, x3, x4) rec_trained <- prep(rec, training = ex_dat, verbose = FALSE) rec_trans <- bake(rec_trained, newdata = ex_dat) expect_equal(names(exp_lambda)[!is.na(exp_lambda)], names(rec_trained$steps[[1]]$lambdas)) expect_equal(exp_lambda[!is.na(exp_lambda)], rec_trained$steps[[1]]$lambdas, tol = .001) expect_equal(as.matrix(exp_dat), as.matrix(rec_trans), tol = .05) }) test_that('printing', { rec <- recipe(~., data = ex_dat) %>% step_YeoJohnson(x1, x2, x3, x4) expect_output(print(rec)) 
expect_output(prep(rec, training = ex_dat)) }) recipes/tests/testthat/test_ordinalscore.R0000644000177700017770000000440013135741217022110 0ustar herbrandtherbrandtlibrary(testthat) library(recipes) n <- 20 set.seed(752) ex_dat <- data.frame( numbers = rnorm(n), fact = factor(sample(letters[1:3], n, replace = TRUE)), ord1 = factor(sample(LETTERS[1:3], n, replace = TRUE), ordered = TRUE), ord2 = factor(sample(LETTERS[4:8], n, replace = TRUE), ordered = TRUE), ord3 = factor(sample(LETTERS[10:20], n, replace = TRUE), ordered = TRUE) ) ex_miss <- ex_dat ex_miss$ord1[c(1, 5, 9)] <- NA ex_miss$ord3[2] <- NA score <- function(x) as.numeric(x)^2 test_that('linear scores', { rec1 <- recipe(~ ., data = ex_dat) %>% step_ordinalscore(starts_with("ord")) rec1 <- prep(rec1, training = ex_dat, retain = TRUE, stringsAsFactors = FALSE, verbose = FALSE) rec1_scores <- bake(rec1, newdata = ex_dat) rec1_scores_NA <- bake(rec1, newdata = ex_miss) expect_equal(as.numeric(ex_dat$ord1), rec1_scores$ord1) expect_equal(as.numeric(ex_dat$ord2), rec1_scores$ord2) expect_equal(as.numeric(ex_dat$ord3), rec1_scores$ord3) expect_equal(as.numeric(ex_miss$ord1), rec1_scores_NA$ord1) expect_equal(as.numeric(ex_miss$ord3), rec1_scores_NA$ord3) }) test_that('nonlinear scores', { rec2 <- recipe(~ ., data = ex_dat) %>% step_ordinalscore(starts_with("ord"), convert = score) rec2 <- prep(rec2, training = ex_dat, retain = TRUE, stringsAsFactors = FALSE, verbose = FALSE) rec2_scores <- bake(rec2, newdata = ex_dat) rec2_scores_NA <- bake(rec2, newdata = ex_miss) expect_equal(as.numeric(ex_dat$ord1)^2, rec2_scores$ord1) expect_equal(as.numeric(ex_dat$ord2)^2, rec2_scores$ord2) expect_equal(as.numeric(ex_dat$ord3)^2, rec2_scores$ord3) expect_equal(as.numeric(ex_miss$ord1)^2, rec2_scores_NA$ord1) expect_equal(as.numeric(ex_miss$ord3)^2, rec2_scores_NA$ord3) }) test_that('bad spec', { rec3 <- recipe(~ ., data = ex_dat) %>% step_ordinalscore(everything()) expect_error(prep(rec3, training = ex_dat, verbose = FALSE)) rec4 <- recipe(~ ., data = ex_dat) expect_error(rec4 %>% step_ordinalscore()) }) test_that('printing', { rec5 <- recipe(~ ., data = ex_dat) %>% step_ordinalscore(starts_with("ord")) expect_output(print(rec5)) expect_output(prep(rec5, training = ex_dat)) }) recipes/tests/testthat/test_other.R0000644000177700017770000001172013135741217020550 0ustar herbrandtherbrandtlibrary(testthat) library(recipes) data(okc) set.seed(19) in_train <- sample(1:nrow(okc), size = 30000) okc_tr <- okc[ in_train,] okc_te <- okc[-in_train,] rec <- recipe(~ diet + location, data = okc_tr) test_that('default inputs', { others <- rec %>% step_other(diet, location) others <- prep(others, training = okc_tr) others_te <- bake(others, newdata = okc_te) diet_props <- table(okc_tr$diet)/sum(!is.na(okc_tr$diet)) diet_props <- sort(diet_props, decreasing = TRUE) diet_levels <- names(diet_props)[diet_props >= others$step[[1]]$threshold] for(i in diet_levels) expect_equal(sum(others_te$diet == i, na.rm =TRUE), sum(okc_te$diet == i, na.rm =TRUE)) diet_levels <- c(diet_levels, others$step[[1]]$objects[["diet"]]$other) expect_true(all(levels(others_te$diet) %in% diet_levels)) expect_true(all(diet_levels %in% levels(others_te$diet))) location_props <- table(okc_tr$location)/sum(!is.na(okc_tr$location)) location_props <- sort(location_props, decreasing = TRUE) location_levels <- names(location_props)[location_props >= others$step[[1]]$threshold] for(i in location_levels) expect_equal(sum(others_te$location == i, na.rm =TRUE), sum(okc_te$location == i, na.rm =TRUE)) 
recipes/tests/testthat/test_other.R0000644000177700017770000001172013135741217020550 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

data(okc)
set.seed(19)
in_train <- sample(1:nrow(okc), size = 30000)
okc_tr <- okc[ in_train,]
okc_te <- okc[-in_train,]
rec <- recipe(~ diet + location, data = okc_tr)

test_that('default inputs', {
  others <- rec %>% step_other(diet, location)
  others <- prep(others, training = okc_tr)
  others_te <- bake(others, newdata = okc_te)

  diet_props <- table(okc_tr$diet)/sum(!is.na(okc_tr$diet))
  diet_props <- sort(diet_props, decreasing = TRUE)
  diet_levels <- names(diet_props)[diet_props >= others$step[[1]]$threshold]
  for(i in diet_levels)
    expect_equal(sum(others_te$diet == i, na.rm = TRUE),
                 sum(okc_te$diet == i, na.rm = TRUE))

  diet_levels <- c(diet_levels, others$step[[1]]$objects[["diet"]]$other)
  expect_true(all(levels(others_te$diet) %in% diet_levels))
  expect_true(all(diet_levels %in% levels(others_te$diet)))

  location_props <- table(okc_tr$location)/sum(!is.na(okc_tr$location))
  location_props <- sort(location_props, decreasing = TRUE)
  location_levels <- names(location_props)[location_props >= others$step[[1]]$threshold]
  for(i in location_levels)
    expect_equal(sum(others_te$location == i, na.rm = TRUE),
                 sum(okc_te$location == i, na.rm = TRUE))

  location_levels <- c(location_levels, others$step[[1]]$objects[["location"]]$other)
  expect_true(all(levels(others_te$location) %in% location_levels))
  expect_true(all(location_levels %in% levels(others_te$location)))

  expect_equal(is.na(okc_te$diet), is.na(others_te$diet))
  expect_equal(is.na(okc_te$location), is.na(others_te$location))
})

test_that('high threshold - many removals', {
  others <- rec %>% step_other(diet, location, threshold = .5)
  others <- prep(others, training = okc_tr)
  others_te <- bake(others, newdata = okc_te)

  diet_props <- table(okc_tr$diet)
  diet_levels <- others$steps[[1]]$objects$diet$keep
  for(i in diet_levels)
    expect_equal(sum(others_te$diet == i, na.rm = TRUE),
                 sum(okc_te$diet == i, na.rm = TRUE))

  diet_levels <- c(diet_levels, others$step[[1]]$objects[["diet"]]$other)
  expect_true(all(levels(others_te$diet) %in% diet_levels))
  expect_true(all(diet_levels %in% levels(others_te$diet)))

  location_props <- table(okc_tr$location)
  location_levels <- others$steps[[1]]$objects$location$keep
  for(i in location_levels)
    expect_equal(sum(others_te$location == i, na.rm = TRUE),
                 sum(okc_te$location == i, na.rm = TRUE))

  location_levels <- c(location_levels, others$step[[1]]$objects[["location"]]$other)
  expect_true(all(levels(others_te$location) %in% location_levels))
  expect_true(all(location_levels %in% levels(others_te$location)))

  expect_equal(is.na(okc_te$diet), is.na(others_te$diet))
  expect_equal(is.na(okc_te$location), is.na(others_te$location))
})

test_that('low threshold - no removals', {
  others <- rec %>% step_other(diet, location, threshold = 10^-10)
  others <- prep(others, training = okc_tr, stringsAsFactors = FALSE)
  others_te <- bake(others, newdata = okc_te)

  expect_equal(others$steps[[1]]$objects$diet$collapse, FALSE)
  expect_equal(others$steps[[1]]$objects$location$collapse, FALSE)
  expect_equal(okc_te$diet, others_te$diet)
  expect_equal(okc_te$location, others_te$location)
})

test_that('factor inputs', {
  okc$diet <- as.factor(okc$diet)
  okc$location <- as.factor(okc$location)
  okc_tr <- okc[ in_train,]
  okc_te <- okc[-in_train,]
  rec <- recipe(~ diet + location, data = okc_tr)

  others <- rec %>% step_other(diet, location)
  others <- prep(others, training = okc_tr)
  others_te <- bake(others, newdata = okc_te)

  diet_props <- table(okc_tr$diet)/sum(!is.na(okc_tr$diet))
  diet_props <- sort(diet_props, decreasing = TRUE)
  diet_levels <- names(diet_props)[diet_props >= others$step[[1]]$threshold]
  for(i in diet_levels)
    expect_equal(sum(others_te$diet == i, na.rm = TRUE),
                 sum(okc_te$diet == i, na.rm = TRUE))

  diet_levels <- c(diet_levels, others$step[[1]]$objects[["diet"]]$other)
  expect_true(all(levels(others_te$diet) %in% diet_levels))
  expect_true(all(diet_levels %in% levels(others_te$diet)))

  location_props <- table(okc_tr$location)/sum(!is.na(okc_tr$location))
  location_props <- sort(location_props, decreasing = TRUE)
  location_levels <- names(location_props)[location_props >= others$step[[1]]$threshold]
  for(i in location_levels)
    expect_equal(sum(others_te$location == i, na.rm = TRUE),
                 sum(okc_te$location == i, na.rm = TRUE))

  location_levels <- c(location_levels, others$step[[1]]$objects[["location"]]$other)
  expect_true(all(levels(others_te$location) %in% location_levels))
  expect_true(all(location_levels %in% levels(others_te$location)))

  expect_equal(is.na(okc_te$diet), is.na(others_te$diet))
  expect_equal(is.na(okc_te$location), is.na(others_te$location))
})

test_that('printing', {
  rec <- rec %>% step_other(diet, location)
  expect_output(print(rec))
  expect_output(prep(rec, training = okc_tr))
})
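## The pooling rule exercised above, sketched: keep a level if its training
## frequency is at least `threshold`, otherwise recode it to the "other"
## level. This is a simplification of the actual step (which also handles
## the no-collapse case tested under 'low threshold').
pool_rare <- function(x, threshold = .05, other = "other") {
  props <- table(x) / sum(!is.na(x))
  keep <- names(props)[props >= threshold]
  factor(ifelse(is.na(x) | x %in% keep, as.character(x), other),
         levels = c(keep, other))
}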
recipes/tests/testthat/test_ns.R0000644000177700017770000000420513135741217020047 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
data(biomass)
library(splines)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

test_that('correct basis functions', {
  with_ns <- rec %>% step_ns(carbon, hydrogen)
  with_ns <- prep(with_ns, training = biomass_tr, verbose = FALSE)

  with_ns_pred_tr <- bake(with_ns, newdata = biomass_tr)
  with_ns_pred_te <- bake(with_ns, newdata = biomass_te)

  carbon_ns_tr_exp <- ns(biomass_tr$carbon, df = 2)
  hydrogen_ns_tr_exp <- ns(biomass_tr$hydrogen, df = 2)
  carbon_ns_te_exp <- predict(carbon_ns_tr_exp, biomass_te$carbon)
  hydrogen_ns_te_exp <- predict(hydrogen_ns_tr_exp, biomass_te$hydrogen)

  carbon_ns_tr_res <- as.matrix(with_ns_pred_tr[, grep("carbon", names(with_ns_pred_tr))])
  colnames(carbon_ns_tr_res) <- NULL
  hydrogen_ns_tr_res <- as.matrix(with_ns_pred_tr[, grep("hydrogen", names(with_ns_pred_tr))])
  colnames(hydrogen_ns_tr_res) <- NULL

  carbon_ns_te_res <- as.matrix(with_ns_pred_te[, grep("carbon", names(with_ns_pred_te))])
  colnames(carbon_ns_te_res) <- 1:ncol(carbon_ns_te_res)
  hydrogen_ns_te_res <- as.matrix(with_ns_pred_te[, grep("hydrogen", names(with_ns_pred_te))])
  colnames(hydrogen_ns_te_res) <- 1:ncol(hydrogen_ns_te_res)

  ## remove attributes
  carbon_ns_tr_exp <- matrix(carbon_ns_tr_exp, ncol = 2)
  carbon_ns_te_exp <- matrix(carbon_ns_te_exp, ncol = 2)
  hydrogen_ns_tr_exp <- matrix(hydrogen_ns_tr_exp, ncol = 2)
  hydrogen_ns_te_exp <- matrix(hydrogen_ns_te_exp, ncol = 2)
  dimnames(carbon_ns_tr_res) <- NULL
  dimnames(carbon_ns_te_res) <- NULL
  dimnames(hydrogen_ns_tr_res) <- NULL
  dimnames(hydrogen_ns_te_res) <- NULL

  expect_equal(carbon_ns_tr_res, carbon_ns_tr_exp)
  expect_equal(carbon_ns_te_res, carbon_ns_te_exp)
  expect_equal(hydrogen_ns_tr_res, hydrogen_ns_tr_exp)
  expect_equal(hydrogen_ns_te_res, hydrogen_ns_te_exp)
})

test_that('printing', {
  with_ns <- rec %>% step_ns(carbon, hydrogen)
  expect_output(print(with_ns))
  expect_output(prep(with_ns, training = biomass_tr))
})
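## The pattern verified above -- estimate the basis on the training set, then
## project new data onto the same knots -- can be reproduced directly with
## the splines package attached at the top of this file:
carbon_basis <- ns(biomass_tr$carbon, df = 2)
carbon_new <- predict(carbon_basis, biomass_te$carbon)  # training knots, test rows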
grep("hydrogen", names(with_poly_pred_te))]) colnames(hydrogen_poly_te_res) <- 1:ncol(hydrogen_poly_te_res) ## remove attributes carbon_poly_tr_exp <- matrix(carbon_poly_tr_exp, ncol = 2) carbon_poly_te_exp <- matrix(carbon_poly_te_exp, ncol = 2) hydrogen_poly_tr_exp <- matrix(hydrogen_poly_tr_exp, ncol = 2) hydrogen_poly_te_exp <- matrix(hydrogen_poly_te_exp, ncol = 2) dimnames(carbon_poly_tr_res) <- NULL dimnames(carbon_poly_te_res) <- NULL dimnames(hydrogen_poly_tr_res) <- NULL dimnames(hydrogen_poly_te_res) <- NULL expect_equal(carbon_poly_tr_res, carbon_poly_tr_exp) expect_equal(carbon_poly_te_res, carbon_poly_te_exp) expect_equal(hydrogen_poly_tr_res, hydrogen_poly_tr_exp) expect_equal(hydrogen_poly_te_res, hydrogen_poly_te_exp) }) test_that('printing', { with_poly <- rec %>% step_poly(carbon, hydrogen) expect_output(print(with_poly)) expect_output(prep(with_poly, training = biomass_tr)) }) recipes/tests/testthat/test_interact.R0000644000177700017770000000452513135741217021245 0ustar herbrandtherbrandtlibrary(testthat) library(recipes) data("biomass") tr_biomass <- subset(biomass, dataset == "Training")[, -(1:2)] te_biomass <- subset(biomass, dataset == "Testing")[, -(1:2)] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, data = tr_biomass) test_that('non-factor variables with dot', { int_rec <- rec %>% step_interact(~(.-HHV)^3, sep=":") int_rec_trained <- prep(int_rec, training = tr_biomass, verbose = FALSE) te_new <- bake(int_rec_trained, newdata = te_biomass, all_predictors()) te_new <- te_new[, sort(names(te_new))] te_new <- as.matrix(te_new) og_terms <- terms(~(.-HHV)^3, data = te_biomass) te_og <- model.matrix(og_terms, data = te_biomass)[, -1] te_og <- te_og[, sort(colnames(te_og))] rownames(te_new) <- NULL rownames(te_og) <- NULL expect_equal(te_og, te_new) }) test_that('non-factor variables with specific variables', { int_rec <- rec %>% step_interact(~carbon:hydrogen + oxygen:nitrogen:sulfur, sep = ":") int_rec_trained <- prep(int_rec, training = tr_biomass, verbose = FALSE) te_new <- bake(int_rec_trained, newdata = te_biomass, all_predictors()) te_new <- te_new[, sort(names(te_new))] te_new <- as.matrix(te_new) og_terms <- terms(~carbon + hydrogen + oxygen + nitrogen + sulfur + carbon:hydrogen + oxygen:nitrogen:sulfur, data = te_biomass) te_og <- model.matrix(og_terms, data = te_biomass)[, -1] te_og <- te_og[, sort(colnames(te_og))] rownames(te_new) <- NULL rownames(te_og) <- NULL expect_equal(te_og, te_new) }) test_that('printing', { int_rec <- rec %>% step_interact(~carbon:hydrogen) expect_output(print(int_rec)) expect_output(prep(int_rec, training = tr_biomass)) }) # currently failing; try to figure out why # test_that('with factors', { # int_rec <- recipe(Sepal.Width ~ ., data = iris) %>% # step_interact(~ (. 
recipes/tests/testthat/test_lincomb.R0000644000177700017770000000414213135741217021052 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

dummies <- cbind(model.matrix( ~ block - 1, npk),
                 model.matrix( ~ N - 1, npk),
                 model.matrix( ~ P - 1, npk),
                 model.matrix( ~ K - 1, npk),
                 yield = npk$yield)
dummies <- as.data.frame(dummies)

dum_rec <- recipe(yield ~ . , data = dummies)

###################################################################

data(biomass)

biomass$new_1 <- with(biomass, .1*carbon - .2*hydrogen + .6*sulfur)
biomass$new_2 <- with(biomass, .5*carbon - .2*oxygen + .6*nitrogen)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

biomass_rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen +
                        sulfur + new_1 + new_2,
                      data = biomass_tr)

###################################################################

test_that('example 1', {
  dum_filtered <- dum_rec %>%
    step_lincomb(all_predictors())
  dum_filtered <- prep(dum_filtered, training = dummies, verbose = FALSE)
  removed <- c("N1", "P1", "K1")
  expect_equal(dum_filtered$steps[[1]]$removals, removed)
})

test_that('example 2', {
  lincomb_filter <- biomass_rec %>%
    step_lincomb(all_predictors())
  filtering_trained <- prep(lincomb_filter, training = biomass_tr)
  test_res <- bake(filtering_trained, newdata = biomass_te)
  expect_true(all(!(paste0("new_", 1:2) %in% colnames(test_res))))
})

test_that('no exclusions', {
  biomass_rec_2 <- recipe(HHV ~ carbon + hydrogen, data = biomass_tr)
  lincomb_filter_2 <- biomass_rec_2 %>%
    step_lincomb(all_predictors())
  filtering_trained_2 <- prep(lincomb_filter_2, training = biomass_tr)
  test_res_2 <- bake(filtering_trained_2, newdata = biomass_te)
  expect_true(length(filtering_trained_2$steps[[1]]$removals) == 0)
  expect_true(all(colnames(test_res_2) == c("carbon", "hydrogen")))
})

test_that('printing', {
  dum_filtered <- dum_rec %>% step_lincomb(all_predictors())
  expect_output(print(dum_filtered))
  expect_output(prep(dum_filtered, training = dummies))
})
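## One way exact linear dependencies can be detected, as a sketch (equivalent
## in spirit to the step's filter, not its actual code): pivoted QR flags
## columns beyond the numerical rank as redundant.
find_redundant <- function(x) {
  x <- as.matrix(x)
  qr_x <- qr(x)
  if (qr_x$rank == ncol(x)) return(character(0))
  colnames(x)[qr_x$pivot[(qr_x$rank + 1):ncol(x)]]
}
## e.g. find_redundant(dummies[, names(dummies) != "yield"]) flags three columns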
recipes/tests/testthat/test_depth.R0000644000177700017770000000317613135741217020541 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(ddalpha)

test_that("defaults", {
  rec <- recipe(Species ~ ., data = iris) %>%
    step_depth(all_predictors(), class = "Species", metric = "spatial")
  trained <- prep(rec, training = iris, verbose = FALSE)
  depths <- bake(trained, newdata = iris)
  depths <- depths[, grepl("depth", names(depths))]
  depths <- as.data.frame(depths)

  split_up <- split(iris[, 1:4], iris$Species)
  spatial <- function(x, y) depth.spatial(x = y, data = x)

  exp_res <- lapply(split_up, spatial, y = iris[, 1:4])
  exp_res <- as.data.frame(exp_res)

  for(i in 1:ncol(exp_res))
    expect_equal(depths[, i], exp_res[, i])
})

test_that("alt args", {
  rec <- recipe(Species ~ ., data = iris) %>%
    step_depth(all_predictors(), class = "Species",
               metric = "Mahalanobis",
               options = list(mah.estimate = "MCD", mah.parMcd = .75))
  trained <- prep(rec, training = iris, verbose = FALSE)
  depths <- bake(trained, newdata = iris)
  depths <- depths[, grepl("depth", names(depths))]
  depths <- as.data.frame(depths)

  split_up <- split(iris[, 1:4], iris$Species)
  Mahalanobis <- function(x, y)
    depth.Mahalanobis(x = y, data = x, mah.estimate = "MCD", mah.parMcd = .75)

  exp_res <- lapply(split_up, Mahalanobis, y = iris[, 1:4])
  exp_res <- as.data.frame(exp_res)

  for(i in 1:ncol(exp_res))
    expect_equal(depths[, i], exp_res[, i])
})

test_that('printing', {
  rec <- recipe(Species ~ ., data = iris) %>%
    step_depth(all_predictors(), class = "Species", metric = "spatial")
  expect_output(print(rec))
  expect_output(prep(rec, training = iris))
})
recipes/tests/testthat/test_stringsAsFactors.R0000644000177700017770000000254613135741217022734 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

n <- 20
set.seed(752)
as_fact <- data.frame(
  numbers = rnorm(n),
  fact = factor(sample(letters[1:3], n, replace = TRUE)),
  ord = factor(sample(LETTERS[22:26], n, replace = TRUE), ordered = TRUE)
)
as_str <- as_fact
as_str$fact <- as.character(as_str$fact)
as_str$ord <- as.character(as_str$ord)

test_that('stringsAsFactors = FALSE', {
  rec1 <- recipe(~ ., data = as_fact) %>%
    step_center(numbers)
  rec1 <- prep(rec1, training = as_fact, retain = TRUE,
               stringsAsFactors = FALSE, verbose = FALSE)
  rec1_as_fact <- bake(rec1, newdata = as_fact)
  rec1_as_str <- bake(rec1, newdata = as_str)
  expect_equal(as_fact$fact, rec1_as_fact$fact)
  expect_equal(as_fact$ord, rec1_as_fact$ord)
  expect_equal(as_str$fact, rec1_as_str$fact)
  expect_equal(as_str$ord, rec1_as_str$ord)
})

test_that('stringsAsFactors = TRUE', {
  rec2 <- recipe(~ ., data = as_fact) %>%
    step_center(numbers)
  rec2 <- prep(rec2, training = as_fact, retain = TRUE,
               stringsAsFactors = TRUE, verbose = FALSE)
  rec2_as_fact <- bake(rec2, newdata = as_fact)
  rec2_as_str <- bake(rec2, newdata = as_str)
  expect_equal(as_fact$fact, rec2_as_fact$fact)
  expect_equal(as_fact$ord, rec2_as_fact$ord)
  expect_equal(as_fact$fact, rec2_as_str$fact)
  expect_equal(as_fact$ord, rec2_as_str$ord)
})
recipes/tests/testthat/test_center_scale.R0000644000177700017770000000415413135741217022061 0ustar herbrandtherbrandt
library(testthat)
context("Testing center and scale")
library(recipes)

means <- vapply(biomass[, 3:7], mean, c(mean = 0))
sds <- vapply(biomass[, 3:7], sd, c(sd = 0))

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass)

test_that('correct means and std devs', {
  standardized <- rec %>%
    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur) %>%
    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur)
  standardized_trained <- prep(standardized, training = biomass, verbose = FALSE)
  expect_equal(standardized_trained$steps[[1]]$means, means)
  expect_equal(standardized_trained$steps[[2]]$sds, sds)
})

test_that('training in stages', {
  at_once <- rec %>%
    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur) %>%
    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur)
  at_once_trained <- prep(at_once, training = biomass, verbose = FALSE)

  ## now train in stages
  center_first <- rec %>%
    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur)
  center_first_trained <- prep(center_first, training = biomass, verbose = FALSE)
  in_stages <- center_first_trained %>%
    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur)
  in_stages_trained <- prep(in_stages, training = biomass, verbose = FALSE)
  in_stages_retrained <- prep(in_stages, training = biomass, verbose = FALSE, fresh = TRUE)

  expect_equal(at_once_trained, in_stages_trained)
  expect_equal(at_once_trained, in_stages_retrained)
})

test_that('single predictor', {
  standardized <- rec %>%
    step_center(carbon) %>%
    step_scale(hydrogen)
  standardized_trained <- prep(standardized, training = biomass, verbose = FALSE)
  results <- bake(standardized_trained, biomass)

  exp_res <- biomass[, 3:8]
  exp_res$carbon <- exp_res$carbon - mean(exp_res$carbon)
  exp_res$hydrogen <- exp_res$hydrogen / sd(exp_res$hydrogen)

  expect_equal(as.data.frame(results), exp_res[, colnames(results)])
})

test_that('printing', {
  standardized <- rec %>%
    step_center(carbon) %>%
    step_scale(hydrogen)
  expect_output(print(standardized))
  expect_output(prep(standardized, training = biomass))
})
recipes/tests/testthat/test_holiday.R0000644000177700017770000000424013135741217021057 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(lubridate)

exp_dates <- data.frame(date = ymd(c("2017-12-25", "2017-05-29", "2017-04-16")),
                        holiday = c("ChristmasDay", "USMemorialDay", "Easter"),
                        stringsAsFactors = FALSE)
test_data <- data.frame(day = ymd("2017-01-01") + days(0:364))

test_that('Date class', {
  holiday_rec <- recipe(~ day, test_data) %>%
    step_holiday(all_predictors(), holidays = exp_dates$holiday)
  holiday_rec <- prep(holiday_rec, training = test_data)
  holiday_ind <- bake(holiday_rec, test_data)

  expect_equal(holiday_ind$day[holiday_ind$day_USMemorialDay == 1],
               exp_dates$date[exp_dates$holiday == "USMemorialDay"])
  expect_equal(holiday_ind$day[holiday_ind$day_ChristmasDay == 1],
               exp_dates$date[exp_dates$holiday == "ChristmasDay"])
  expect_equal(holiday_ind$day[holiday_ind$day_Easter == 1],
               exp_dates$date[exp_dates$holiday == "Easter"])
})

test_that('POSIXct class', {
  test_data$day <- as.POSIXct(test_data$day)
  exp_dates$date <- as.POSIXct(exp_dates$date)
  holiday_rec <- recipe(~ day, test_data) %>%
    step_holiday(all_predictors(), holidays = exp_dates$holiday)
  holiday_rec <- prep(holiday_rec, training = test_data)
  holiday_ind <- bake(holiday_rec, test_data)

  expect_equal(holiday_ind$day[holiday_ind$day_USMemorialDay == 1],
               exp_dates$date[exp_dates$holiday == "USMemorialDay"])
  expect_equal(holiday_ind$day[holiday_ind$day_ChristmasDay == 1],
               exp_dates$date[exp_dates$holiday == "ChristmasDay"])
  expect_equal(holiday_ind$day[holiday_ind$day_Easter == 1],
               exp_dates$date[exp_dates$holiday == "Easter"])
})

test_that('printing', {
  holiday_rec <- recipe(~ day, test_data) %>%
    step_holiday(all_predictors(), holidays = exp_dates$holiday)
  expect_output(print(holiday_rec))
  expect_output(prep(holiday_rec, training = test_data))
})
recipes/tests/testthat/test_dummies.R0000644000177700017770000000234513135741217021075 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

data(okc)
okc$location <- gsub(", california", "", okc$location)
okc$diet[is.na(okc$diet)] <- "missing"
okc <- okc[complete.cases(okc), -5]
okc_fac <- data.frame(okc)

test_that('dummy variables with string inputs', {
  rec <- recipe(age ~ ., data = okc)
  dummy <- rec %>% step_dummy(diet, location)
  dummy_trained <- prep(dummy, training = okc, verbose = FALSE,
                        stringsAsFactors = FALSE)
  dummy_pred <- bake(dummy_trained, newdata = okc)
  dummy_pred <- dummy_pred[, order(colnames(dummy_pred))]
  dummy_pred <- as.data.frame(dummy_pred)
  rownames(dummy_pred) <- NULL

  exp_res <- model.matrix(age ~ ., data = okc_fac)[, -1]
  exp_res <- exp_res[, colnames(exp_res) != "age"]
  colnames(exp_res) <- gsub("^location", "location_", colnames(exp_res))
  colnames(exp_res) <- gsub("^diet", "diet_", colnames(exp_res))
  colnames(exp_res) <- make.names(colnames(exp_res))
  exp_res <- exp_res[, order(colnames(exp_res))]
  exp_res <- as.data.frame(exp_res)
  rownames(exp_res) <- NULL
  expect_equal(dummy_pred, exp_res)
})

test_that('printing', {
  rec <- recipe(age ~ ., data = okc)
  dummy <- rec %>% step_dummy(diet, location)
  expect_output(print(dummy))
  expect_output(prep(dummy, training = okc))
})
recipes/tests/testthat/test_meanimpute.R0000644000177700017770000000327213135741217021576 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

data("credit_data")

set.seed(342)
in_training <- sample(1:nrow(credit_data), 2000)

credit_tr <- credit_data[ in_training, ]
credit_te <- credit_data[-in_training, ]

test_that('simple mean', {
  rec <- recipe(Price ~ ., data = credit_tr)
  impute_rec <- rec %>%
    step_meanimpute(Age, Assets, Income)
  imputed <- prep(impute_rec, training = credit_tr, verbose = FALSE)
  te_imputed <- bake(imputed, newdata = credit_te)

  expect_equal(te_imputed$Age, credit_te$Age)
  expect_equal(te_imputed$Assets[is.na(credit_te$Assets)],
               rep(mean(credit_tr$Assets, na.rm = TRUE),
                   sum(is.na(credit_te$Assets))))
  expect_equal(te_imputed$Income[is.na(credit_te$Income)],
               rep(mean(credit_tr$Income, na.rm = TRUE),
                   sum(is.na(credit_te$Income))))
})

test_that('trimmed mean', {
  rec <- recipe(Price ~ ., data = credit_tr)
  impute_rec <- rec %>%
    step_meanimpute(Assets, trim = .1)
  imputed <- prep(impute_rec, training = credit_tr, verbose = FALSE)
  te_imputed <- bake(imputed, newdata = credit_te)

  expect_equal(te_imputed$Assets[is.na(credit_te$Assets)],
               rep(mean(credit_tr$Assets, na.rm = TRUE, trim = .1),
                   sum(is.na(credit_te$Assets))))
})

test_that('non-numeric', {
  rec <- recipe(Price ~ ., data = credit_tr)
  impute_rec <- rec %>%
    step_meanimpute(Assets, Job)
  expect_error(prep(impute_rec, training = credit_tr, verbose = FALSE))
})

test_that('printing', {
  impute_rec <- recipe(Price ~ ., data = credit_tr) %>%
    step_meanimpute(Age, Assets, Income)
  expect_output(print(impute_rec))
  expect_output(prep(impute_rec, training = credit_tr))
})
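## The fill-in value asserted above is just the training-set mean, optionally
## trimmed, with missing values dropped:
assets_fill <- mean(credit_tr$Assets, trim = .1, na.rm = TRUE)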
recipes/tests/testthat/test_ratio.R0000644000177700017770000000437613135741217020552 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(tibble)

n <- 20
ex_dat <- data.frame(
  x1 = -1:8,
  x2 = 1,
  x3 = c(1:9, NA),
  x4 = 11:20,
  x5 = letters[1:10]
)

rec <- recipe( ~ x1 + x2 + x3 + x4 + x5, data = ex_dat)

test_that('1:many', {
  rec1 <- rec %>%
    step_ratio(x1, denom = denom_vars(all_numeric()))
  rec1 <- prep(rec1, ex_dat, verbose = FALSE)
  obs1 <- bake(rec1, ex_dat)
  res1 <- tibble(
    x1_o_x2 = ex_dat$x1/ex_dat$x2,
    x1_o_x3 = ex_dat$x1/ex_dat$x3,
    x1_o_x4 = ex_dat$x1/ex_dat$x4
  )
  for(i in names(res1))
    expect_equal(res1[i], obs1[i])
})

test_that('many:1', {
  rec2 <- rec %>%
    step_ratio(all_numeric(), denom = denom_vars(x1))
  rec2 <- prep(rec2, ex_dat, verbose = FALSE)
  obs2 <- bake(rec2, ex_dat)
  res2 <- tibble(
    x2_o_x1 = ex_dat$x2/ex_dat$x1,
    x3_o_x1 = ex_dat$x3/ex_dat$x1,
    x4_o_x1 = ex_dat$x4/ex_dat$x1
  )
  for(i in names(res2))
    expect_equal(res2[i], obs2[i])
})

test_that('many:many', {
  rec3 <- rec %>%
    step_ratio(all_numeric(), denom = denom_vars(all_numeric()))
  rec3 <- prep(rec3, ex_dat, verbose = FALSE)
  obs3 <- bake(rec3, ex_dat)
  res3 <- tibble(
    x2_o_x1 = ex_dat$x2/ex_dat$x1,
    x3_o_x1 = ex_dat$x3/ex_dat$x1,
    x4_o_x1 = ex_dat$x4/ex_dat$x1,
    x1_o_x2 = ex_dat$x1/ex_dat$x2,
    x3_o_x2 = ex_dat$x3/ex_dat$x2,
    x4_o_x2 = ex_dat$x4/ex_dat$x2,
    x1_o_x3 = ex_dat$x1/ex_dat$x3,
    x2_o_x3 = ex_dat$x2/ex_dat$x3,
    x4_o_x3 = ex_dat$x4/ex_dat$x3,
    x1_o_x4 = ex_dat$x1/ex_dat$x4,
    x2_o_x4 = ex_dat$x2/ex_dat$x4,
    x3_o_x4 = ex_dat$x3/ex_dat$x4
  )
  for(i in names(res3))
    expect_equal(res3[i], obs3[i])
})

test_that('wrong type', {
  rec4 <- rec %>% step_ratio(x1, denom = denom_vars(all_predictors()))
  expect_error(prep(rec4, ex_dat, verbose = FALSE))

  rec5 <- rec %>% step_ratio(all_predictors(), denom = denom_vars(x1))
  expect_error(prep(rec5, ex_dat, verbose = FALSE))

  rec6 <- rec %>% step_ratio(all_predictors(), denom = denom_vars(all_predictors()))
  expect_error(prep(rec6, ex_dat, verbose = FALSE))
})

test_that('printing', {
  rec3 <- rec %>%
    step_ratio(all_numeric(), denom = denom_vars(all_numeric()))
  expect_output(print(rec3))
  expect_output(prep(rec3, training = ex_dat))
})
recipes/tests/testthat/test_sqrt.R0000644000177700017770000000120513135741217020415 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(tibble)

n <- 20
ex_dat <- data.frame(x1 = seq(0, 1, length = n),
                     x2 = rep(1:5, 4))

test_that('simple sqrt trans', {
  rec <- recipe(~., data = ex_dat) %>%
    step_sqrt(x1, x2)
  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
  rec_trans <- bake(rec_trained, newdata = ex_dat)

  exp_res <- as_tibble(lapply(ex_dat, sqrt))
  expect_equal(rec_trans, exp_res)
})

test_that('printing', {
  rec <- recipe(~., data = ex_dat) %>%
    step_sqrt(x1, x2)
  expect_output(print(rec))
  expect_output(prep(rec, training = ex_dat))
})
recipes/tests/testthat/test_kpca.R0000644000177700017770000000215413135741217020346 0ustar herbrandtherbrandt
library(testthat)
library(kernlab)
library(recipes)

set.seed(131)
tr_dat <- matrix(rnorm(100*6), ncol = 6)
te_dat <- matrix(rnorm(20*6), ncol = 6)
colnames(tr_dat) <- paste0("X", 1:6)
colnames(te_dat) <- paste0("X", 1:6)

rec <- recipe(X1 ~ ., data = tr_dat)

test_that('correct kernel PCA values', {
  kpca_rec <- rec %>% step_kpca(X2, X3, X4, X5, X6)
  kpca_trained <- prep(kpca_rec, training = tr_dat, verbose = FALSE)
  pca_pred <- bake(kpca_trained, newdata = te_dat)
  pca_pred <- as.matrix(pca_pred)

  pca_exp <- kpca(as.matrix(tr_dat[, -1]),
                  kernel = kpca_rec$steps[[1]]$options$kernel,
                  kpar = kpca_rec$steps[[1]]$options$kpar)
  pca_pred_exp <- kernlab::predict(pca_exp, te_dat[, -1])[, 1:kpca_trained$steps[[1]]$num]
  colnames(pca_pred_exp) <- paste0("kPC", 1:kpca_trained$steps[[1]]$num)

  rownames(pca_pred) <- NULL
  rownames(pca_pred_exp) <- NULL
  expect_equal(pca_pred, pca_pred_exp)
})

test_that('printing', {
  kpca_rec <- rec %>% step_kpca(X2, X3, X4, X5, X6)
  expect_output(print(kpca_rec))
  expect_output(prep(kpca_rec, training = tr_dat))
})
recipes/tests/testthat/test_ica.R0000644000177700017770000001167213135741217020167 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

## Generated using fastICA
exp_comp <- structure(
  c(-0.741586750989301, -0.473165319478851, -0.532724778033598, 0.347336643017696,
    -0.523140911818999, 0.0839020928800183, -0.689112937865132, 1.1905359062157,
    2.87389193916233, 3.87655326677861, 0.662748883270711, 0.108848159489063,
    0.509384921516091, -0.708397515735095, -0.129606867389727, 1.7900565287023,
    0.171125628795304, 0.314289325954585, -0.142425199147843, -0.619509248504534,
    0.38690051207701, -0.414352364956822, -0.609744599991299, -0.144705519030626,
    -0.293470631707416, -0.791746573929697, -0.634208572824357, 1.36675934105489,
    -0.785855217530414, -0.730790987290872, -0.236417274868796, -0.210596011735952,
    -0.413793241941344, -0.511246150962085, -0.181254985021062, 0.298659496162363,
    -0.757969803548959, -0.666845883775384, -0.240983277334825, -0.394806974813201,
    1.44451054341856, 3.33833135277739, -0.54575996404394, -0.423145023192357,
    -0.388925027133234, -0.418629250017466, -0.463085718807788, -0.14499128867367,
    0.323243757311295, -0.417689940076107, -0.777761367811451, -0.799107717902467,
    -0.548346133015069, 0.769235286712577, -0.40466870434895, -0.591389964794494,
    -0.208052301856056, -0.945352336400244, 0.919793619211536, -0.561549525440524,
    -0.535789943464846, -0.735536725127484, 3.7162236121338, 0.459835444175181,
    0.137984939011763, -0.755831873688866, -0.757751495230731, -0.512815283263682,
    0.901123348803226, -0.755032174981781, -1.04745496967861, -0.481720409476034,
    -0.956534770489922, 2.39775097011864, -0.537189360991569, 0.455171520278689,
    -0.764070183446535, -0.0133182183358093, 0.0084094629323547, -0.11887530759164,
    -0.50492491720854, -0.731237740789087, -0.810056304451282, -0.0654477889270799,
    -0.165218457853762, -0.384457532271443, -1.25744957888255, -0.164838366701182,
    -0.818591960610985, -0.577844253001226, 0.159731749239493, -0.350242543749645,
    3.22437340069565, -0.575271823706669, -0.171250094126726, 1.21819592885382,
    -0.303636775510361, 0.192247367642684, 0.235728177283036, -0.768212986589321,
    0.333147682813931, -0.403932170943429, -0.261749940045069, -0.331436881499356,
    -0.298793661022028, -0.255788540744319, -0.764483629396313, -0.162133725599773,
    -0.10676549266036, -0.349722429991475, -0.340728544016434, -0.358565693266084,
    0.0242508678396987, -0.277425329351928, 0.055217077863271, 0.146403703647814,
    -0.241268230680493, -0.283770652745491, -0.573657866580657, -0.224655195396099,
    0.226079102614757, 2.03305968574443, -0.225655562941607, -0.155789455588855,
    -0.613828894885655, 0.480057477445702, 0.277055812270816, -0.263765068404404,
    0.0411239101983566, 0.30164066516454, -0.760891669412883, -0.478609196612072,
    -0.162692709808673, 3.12547570195871, -0.189300748528298, -0.16882558146447,
    -0.30745201359965, 2.77823976198232, -0.306599455530011, -0.979722296618571,
    -0.913952653732135, -0.608622766593967, -0.061561169157735, 0.0134953299517241,
    -0.111595843415483, -0.0995809192931606, -0.353150299985198, -0.173474040260694,
    -0.11913118533085, -0.268152445374219, -1.64524056576117, -0.052825674116391,
    2.82692828099746, -0.257823074604271, -0.0316348082448068, -0.347414676200845,
    -0.237534967478309, -0.266298103195764, -0.0555773569483491, 2.35155293218832),
  .Dim = c(80L, 2L),
  .Dimnames = list(
    c("15", "20", "26", "31", "36", "41", "46", "51", "55", "65",
      "69", "73", "76", "88", "91", "126", "132", "136", "141", "147",
      "155", "162", "167", "173", "178", "183", "190", "196", "203", "208",
      "213", "218", "223", "230", "235", "241", "252", "257", "262", "267",
      "277", "282", "286", "294", "299", "305", "309", "314", "319", "325",
      "330", "348", "353", "357", "359", "370", "375", "385", "399", "407",
      "409", "414", "419", "424", "429", "434", "439", "448", "467", "473",
      "477", "482", "485", "493", "499", "516", "519", "527", "532", "535"),
    c("IC1", "IC2")))
rownames(exp_comp) <- NULL

test_that('correct ICA values', {
  ica_extract <- rec %>%
    step_ica(carbon, hydrogen, oxygen, nitrogen, sulfur, num = 2)
  set.seed(12)
  ica_extract_trained <- prep(ica_extract, training = biomass_tr, verbose = FALSE)
  ica_pred <- bake(ica_extract_trained, newdata = biomass_te)
  ica_pred <- as.matrix(ica_pred)
  rownames(ica_pred) <- NULL
  expect_equal(ica_pred, exp_comp)
})

test_that('printing', {
  ica_extract <- rec %>% step_ica(carbon, hydrogen, num = 2)
  expect_output(print(ica_extract))
  expect_output(prep(ica_extract, training = biomass_tr))
})
recipes/tests/testthat/test_discretized.R0000644000177700017770000000221413135741217021736 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

ex_tr <- data.frame(x1 = 1:100,
                    x2 = rep(1:5, each = 20),
                    x3 = factor(rep(letters[1:2], each = 50)))
ex_te <- data.frame(x1 = c(1, 50, 101, NA))

lvls_breaks_4 <- c('bin_missing', 'bin1', 'bin2', 'bin3', 'bin4')

test_that('default args', {
  bin_1 <- discretize(ex_tr$x1)
  pred_1 <- predict(bin_1, ex_te$x1)
  exp_1 <- factor(c("bin1", "bin2", "bin4", "bin_missing"), levels = lvls_breaks_4)
  expect_equal(pred_1, exp_1)
})

test_that('NA values', {
  bin_2 <- discretize(ex_tr$x1, keep_na = FALSE)
  pred_2 <- predict(bin_2, ex_te$x1)
  exp_2 <- factor(c("bin1", "bin2", "bin4", NA), levels = lvls_breaks_4[-1])
  expect_equal(pred_2, exp_2)
})

test_that('NA values from out of range', {
  bin_3 <- discretize(ex_tr$x1, keep_na = FALSE, infs = FALSE)
  pred_3 <- predict(bin_3, ex_te$x1)
  exp_3 <- factor(c("bin1", "bin2", NA, NA), levels = lvls_breaks_4[-1])
  expect_equal(pred_3, exp_3)
})

test_that('printing', {
  rec <- recipe(~., data = ex_tr) %>%
    step_discretize(x1)
  expect_output(print(rec))
  expect_output(prep(rec, training = ex_tr))
})
recipes/tests/testthat/test_bagimpute.R0000644000177700017770000000341413135741217021405 0ustar herbrandtherbrandt
library(testthat)
library(ipred)
library(rpart)
library(recipes)

data("biomass")

biomass$fac <- factor(sample(letters[1:2], size = nrow(biomass), replace = TRUE))

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur + fac,
              data = biomass)

test_that('imputation models', {
  imputed <- rec %>%
    step_bagimpute(carbon, fac, impute_with = imp_vars(hydrogen, oxygen),
                   seed_val = 12)
  imputed_trained <- prep(imputed, training = biomass, verbose = FALSE)

  ## make sure we get the same trees given the same random samples
  carb_samps <- lapply(imputed_trained$steps[[1]]$models[["carbon"]]$mtrees,
                       function(x) x$bindx)
  for(i in seq_along(carb_samps)) {
    carb_data <- biomass[carb_samps[[i]], c("carbon", "hydrogen", "oxygen")]
    carb_mod <- rpart(carbon ~ ., data = carb_data,
                      control = rpart.control(xval = 0))
    expect_equal(carb_mod$splits,
                 imputed_trained$steps[[1]]$models[["carbon"]]$mtrees[[i]]$btree$splits)
  }

  fac_samps <- lapply(imputed_trained$steps[[1]]$models[[1]]$mtrees,
                      function(x) x$bindx)
  fac_ctrl <- imputed_trained$steps[[1]]$models[["fac"]]$mtrees[[1]]$btree$control

  ## make sure we get the same trees given the same random samples
  for(i in seq_along(fac_samps)) {
    fac_data <- biomass[fac_samps[[i]], c("fac", "hydrogen", "oxygen")]
    fac_mod <- rpart(fac ~ ., data = fac_data, control = fac_ctrl)
    expect_equal(fac_mod$splits,
                 imputed_trained$steps[[1]]$models[["fac"]]$mtrees[[i]]$btree$splits)
  }
})

test_that('printing', {
  imputed <- rec %>%
    step_bagimpute(carbon, impute_with = imp_vars(hydrogen), seed_val = 12)
  expect_output(print(imputed))
  expect_output(prep(imputed, training = biomass))
})
recipes/tests/testthat/test_select_terms.R0000644000177700017770000000516013125050130022105 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(tibble)
library(tidyselect)
library(rlang)

data(okc)
rec1 <- recipe(~ ., data = okc)
info1 <- summary(rec1)

data(biomass)
rec2 <- recipe(biomass) %>%
  add_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
           new_role = "predictor") %>%
  add_role(HHV, new_role = "outcome") %>%
  add_role(sample, new_role = "id variable") %>%
  add_role(dataset, new_role = "splitting indicator")
info2 <- summary(rec2)

test_that('simple role selections', {
  expect_equal(
    terms_select(info = info1, quos(all_predictors())),
    info1$variable
  )
  expect_error(terms_select(info = info1, quos(all_outcomes())))
  expect_equal(
    terms_select(info = info2, quos(all_outcomes())),
    "HHV"
  )
  expect_equal(
    terms_select(info = info2, quos(has_role("splitting indicator"))),
    "dataset"
  )
})

test_that('simple type selections', {
  expect_equal(
    terms_select(info = info1, quos(all_numeric())),
    c("age", "height")
  )
  expect_equal(
    terms_select(info = info1, quos(has_type("date"))),
    "date"
  )
  expect_equal(
    terms_select(info = info1, quos(all_nominal())),
    c("diet", "location")
  )
})

test_that('simple name selections', {
  expect_equal(
    terms_select(info = info1, quos(matches("e$"))),
    c("age", "date")
  )
  expect_equal(
    terms_select(info = info2, quos(contains("gen"))),
    c("hydrogen", "oxygen", "nitrogen")
  )
  expect_equal(
    terms_select(info = info2, quos(contains("gen"), -nitrogen)),
    c("hydrogen", "oxygen")
  )
  expect_equal(
    terms_select(info = info1, quos(date, age)),
    c("date", "age")
  )
  ## This is weird but consistent with `dplyr::select_vars`
  expect_equal(
    terms_select(info = info1, quos(-age, date)),
    c("diet", "height", "location", "date")
  )
  expect_equal(
    terms_select(info = info1, quos(date, -age)),
    "date"
  )
  expect_error(terms_select(info = info1, quos(log(date))))
  expect_error(terms_select(info = info1, quos(date:age)))
  expect_error(terms_select(info = info1, quos(I(date:age))))
  expect_error(terms_select(info = info1, quos(matches("blahblahblah"))))
  expect_error(terms_select(info = info1))
})

test_that('combinations', {
  expect_equal(
    terms_select(info = info2, quos(matches("[hH]"), -all_outcomes())),
    "hydrogen"
  )
  expect_equal(
    terms_select(info = info2, quos(all_numeric(), -all_predictors())),
    "HHV"
  )
  expect_equal(
    terms_select(info = info2, quos(all_numeric(), -all_predictors(), dataset)),
    c("HHV", "dataset")
  )
  expect_equal(
    terms_select(info = info2, quos(all_numeric(), -all_predictors(), dataset, -dataset)),
    "HHV"
  )
})
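## All of the selectors exercised above resolve to character vectors of
## column names against the `info` tibble, so all_numeric() on `info1` is
## equivalent to:
num_vars <- info1$variable[info1$type == "numeric"]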
recipes/tests/testthat/test_isomap.R0000644000177700017770000000265613135741217020723 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

## expected results from the `dimRed` package
exp_res <- structure(
  list(
    Isomap1 = c(0.312570873898531, 0.371885353599467, 2.23124009833741,
                0.248271457498181, -0.420128801874122),
    Isomap2 = c(-0.443724171391742, -0.407721529759647, 0.245721022395862,
                3.112001672258, 0.0292770508011519),
    Isomap3 = c(0.761529345514676, 0.595015565588918, 1.59943072269788,
                0.566884409484389, 1.53770327701819)
  ),
  .Names = c("Isomap1", "Isomap2", "Isomap3"),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -5L))

set.seed(1)
dat1 <- matrix(rnorm(15), ncol = 3)
dat2 <- matrix(rnorm(15), ncol = 3)
colnames(dat1) <- paste0("x", 1:3)
colnames(dat2) <- paste0("x", 1:3)

rec <- recipe( ~ ., data = dat1)

test_that('correct Isomap values', {
  skip_on_cran()
  im_rec <- rec %>%
    step_isomap(x1, x2, x3, options = list(knn = 3), num = 3)
  im_trained <- prep(im_rec, training = dat1, verbose = FALSE)
  im_pred <- bake(im_trained, newdata = dat2)
  expect_equal(as.matrix(im_pred), as.matrix(exp_res))
})

test_that('printing', {
  im_rec <- rec %>%
    step_isomap(x1, x2, x3, options = list(knn = 3), num = 3)
  expect_output(print(im_rec))
  expect_output(prep(im_rec, training = dat1))
})
recipes/tests/testthat/test_multivariate.R0000644000177700017770000000152113135741217022133 0ustar herbrandtherbrandt
library(tibble)
library(recipes)
data("biomass")

test_that('multivariate outcome', {
  raw_recipe <- recipe(carbon + hydrogen ~ oxygen + nitrogen + sulfur,
                       data = biomass)
  rec <- raw_recipe %>%
    step_center(all_outcomes()) %>%
    step_scale(all_predictors())
  rec_trained <- prep(rec, training = biomass)
  results <- bake(rec_trained, head(biomass))

  exp_res <- biomass
  pred <- c("oxygen", "nitrogen", "sulfur")
  outcome <- c("carbon", "hydrogen")
  for(i in pred)
    exp_res[,i] <- exp_res[,i]/sd(exp_res[,i])
  for(i in outcome)
    exp_res[,i] <- exp_res[,i]-mean(exp_res[,i])

  expect_equal(rec$term_info$variable[rec$term_info$role == "outcome"], outcome)
  expect_equal(rec$term_info$variable[rec$term_info$role == "predictor"], pred)
  expect_equal(exp_res[1:6, colnames(results)], as.data.frame(results))
})
recipes/tests/testthat/test_classdist.R0000644000177700017770000000300613135741217021416 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

test_that("defaults", {
  rec <- recipe(Species ~ ., data = iris) %>%
    step_classdist(all_predictors(), class = "Species", log = FALSE)
  trained <- prep(rec, training = iris, verbose = FALSE)
  dists <- bake(trained, newdata = iris)
  dists <- dists[, grepl("classdist", names(dists))]
  dists <- as.data.frame(dists)

  split_up <- split(iris[, 1:4], iris$Species)
  mahalanobis2 <- function(x, y)
    mahalanobis(y, center = colMeans(x), cov = cov(x))

  exp_res <- lapply(split_up, mahalanobis2, y = iris[, 1:4])
  exp_res <- as.data.frame(exp_res)

  for(i in 1:ncol(exp_res))
    expect_equal(dists[, i], exp_res[, i])
})

test_that("alt args", {
  rec <- recipe(Species ~ ., data = iris) %>%
    step_classdist(all_predictors(), class = "Species",
                   log = FALSE, mean_func = median)
  trained <- prep(rec, training = iris, verbose = FALSE)
  dists <- bake(trained, newdata = iris)
  dists <- dists[, grepl("classdist", names(dists))]
  dists <- as.data.frame(dists)

  split_up <- split(iris[, 1:4], iris$Species)
  mahalanobis2 <- function(x, y)
    mahalanobis(y, center = apply(x, 2, median), cov = cov(x))

  exp_res <- lapply(split_up, mahalanobis2, y = iris[, 1:4])
  exp_res <- as.data.frame(exp_res)

  for(i in 1:ncol(exp_res))
    expect_equal(dists[, i], exp_res[, i])
})

test_that('printing', {
  rec <- recipe(Species ~ ., data = iris) %>%
    step_classdist(all_predictors(), class = "Species", log = FALSE)
  expect_output(print(rec))
  expect_output(prep(rec, training = iris))
})
recipes/tests/testthat/test_shuffle.R0000644000177700017770000000345013135741217021064 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

n <- 50
set.seed(424)
dat <- data.frame(
  x1 = sort(rnorm(n)),
  x2 = sort(rep(1:5, each = 10)),
  x3 = sort(factor(rep(letters[1:3], c(2, 2, 46)))),
  x4 = 1,
  y = sort(runif(n))
)

test_that('numeric data', {
  rec1 <- recipe(y ~ ., data = dat) %>%
    step_shuffle(all_numeric())
  rec1 <- prep(rec1, training = dat, verbose = FALSE)
  set.seed(7046)
  dat1 <- bake(rec1, dat)
  exp1 <- c(FALSE, FALSE, TRUE, TRUE)
  obs1 <- rep(NA, 4)
  for (i in 1:ncol(dat1))
    obs1[i] <- isTRUE(all.equal(dat[, i], getElement(dat1, names(dat)[i])))
  expect_equal(exp1, obs1)
})

test_that('nominal data', {
  rec2 <- recipe(y ~ ., data = dat) %>%
    step_shuffle(all_nominal())
  rec2 <- prep(rec2, training = dat, verbose = FALSE)
  set.seed(804)
  dat2 <- bake(rec2, dat)
  exp2 <- c(TRUE, TRUE, FALSE, TRUE)
  obs2 <- rep(NA, 4)
  for (i in 1:ncol(dat2))
    obs2[i] <- isTRUE(all.equal(dat[, i], getElement(dat2, names(dat)[i])))
  expect_equal(exp2, obs2)
})

test_that('all data', {
  rec3 <- recipe(y ~ ., data = dat) %>%
    step_shuffle(everything())
  rec3 <- prep(rec3, training = dat, verbose = FALSE)
  set.seed(2516)
  dat3 <- bake(rec3, dat)
  exp3 <- c(FALSE, FALSE, FALSE, TRUE)
  obs3 <- rep(NA, 4)
  for (i in 1:ncol(dat3))
    obs3[i] <- isTRUE(all.equal(dat[, i], getElement(dat3, names(dat)[i])))
  expect_equal(exp3, obs3)
})

test_that('printing', {
  rec3 <- recipe(y ~ ., data = dat) %>%
    step_shuffle(everything())
  expect_output(print(rec3))
  expect_output(prep(rec3, training = dat))
})

test_that('bake a single row', {
  rec4 <- recipe(y ~ ., data = dat) %>%
    step_shuffle(everything())
  rec4 <- prep(rec4, training = dat, verbose = FALSE)
  expect_warning(dat4 <- bake(rec4, dat[1,], everything()))
  expect_equal(dat4, dat[1,])
})
recipes/tests/testthat/test-basics.R0000644000177700017770000000325413135757306020612 0ustar herbrandtherbrandt
library(testthat)
context("Testing basic functionalities")
library(tibble)
library(recipes)
data("biomass")

test_that("Recipe correctly identifies output variable", {
  raw_recipe <- recipe(HHV ~ ., data = biomass)
  var_info <- raw_recipe$var_info
  expect_true(is.tibble(var_info))
  outcome_ind <- which(var_info$variable == "HHV")
  expect_true(var_info$role[outcome_ind] == "outcome")
  expect_true(all(var_info$role[-outcome_ind] == rep("predictor", ncol(biomass) - 1)))
})

test_that("Recipe fails on in-line functions", {
  expect_error(recipe(HHV ~ log(nitrogen), data = biomass))
  expect_error(recipe(HHV ~ (.)^2, data = biomass))
  expect_error(recipe(HHV ~ nitrogen + sulfur + nitrogen:sulfur, data = biomass))
  expect_error(recipe(HHV ~ nitrogen^2, data = biomass))
})

test_that("return character or factor values", {
  raw_recipe <- recipe(HHV ~ ., data = biomass)
  centered <- raw_recipe %>%
    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur)

  centered_char <- prep(centered, training = biomass,
                        stringsAsFactors = FALSE, retain = TRUE)
  char_var <- bake(centered_char, newdata = head(biomass))
  expect_equal(class(char_var$sample), "character")

  centered_fac <- prep(centered, training = biomass,
                       stringsAsFactors = TRUE, retain = TRUE)
  fac_var <- bake(centered_fac, newdata = head(biomass))
  expect_equal(class(fac_var$sample), "factor")
  expect_equal(levels(fac_var$sample), sort(unique(biomass$sample)))
})

test_that("Using prepare", {
  expect_error(prepare(recipe(HHV ~ ., data = biomass), training = biomass),
               paste0("As of version 0.0.1.9006, used `prep` ",
                      "instead of `prepare`"))
})
recipes/tests/testthat/test_date.R0000644000177700017770000000644413135741217020350 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(lubridate)
library(tibble)

examples <- data.frame(Dan = ymd("2002-03-04") + days(1:10),
                       Stefan = ymd("2006-01-13") + days(1:10))
examples$Dan <- as.POSIXct(examples$Dan)

date_rec <- recipe(~ Dan + Stefan, examples) %>%
  step_date(all_predictors())

feats <- c("year", "doy", "week", "decimal", "semester", "quarter", "dow", "month")

test_that('default option', {
  date_rec <- recipe(~ Dan + Stefan, examples) %>%
    step_date(all_predictors(), features = feats)
  date_rec <- prep(date_rec, training = examples)
  date_res <- bake(date_rec, newdata = examples)

  date_exp <- tibble(
    Dan = examples$Dan,
    Stefan = examples$Stefan,
    Dan_year = year(examples$Dan),
    Dan_doy = yday(examples$Dan),
    Dan_week = week(examples$Dan),
    Dan_decimal = decimal_date(examples$Dan),
    Dan_semester = semester(examples$Dan),
    Dan_quarter = quarter(examples$Dan),
    Dan_dow = wday(examples$Dan, label = TRUE, abbr = TRUE),
    Dan_month = month(examples$Dan, label = TRUE, abbr = TRUE),
    Stefan_year = year(examples$Stefan),
    Stefan_doy = yday(examples$Stefan),
    Stefan_week = week(examples$Stefan),
    Stefan_decimal = decimal_date(examples$Stefan),
    Stefan_semester = semester(examples$Stefan),
    Stefan_quarter = quarter(examples$Stefan),
    Stefan_dow = wday(examples$Stefan, label = TRUE, abbr = TRUE),
    Stefan_month = month(examples$Stefan, label = TRUE, abbr = TRUE)
  )
  date_exp$Dan_dow <- factor(as.character(date_exp$Dan_dow),
                             levels = levels(date_exp$Dan_dow))
  date_exp$Dan_month <- factor(as.character(date_exp$Dan_month),
                               levels = levels(date_exp$Dan_month))
  date_exp$Stefan_dow <- factor(as.character(date_exp$Stefan_dow),
                                levels = levels(date_exp$Stefan_dow))
  date_exp$Stefan_month <- factor(as.character(date_exp$Stefan_month),
                                  levels = levels(date_exp$Stefan_month))

  expect_equal(date_res, date_exp)
})

test_that('nondefault options', {
  date_rec <- recipe(~ Dan + Stefan, examples) %>%
    step_date(all_predictors(), features = c("dow", "month"), label = FALSE)
  date_rec <- prep(date_rec, training = examples)
  date_res <- bake(date_rec, newdata = examples)

  date_exp <- tibble(
    Dan = examples$Dan,
    Stefan = examples$Stefan,
    Dan_dow = wday(examples$Dan, label = FALSE),
    Dan_month = month(examples$Dan, label = FALSE),
    Stefan_dow = wday(examples$Stefan, label = FALSE),
    Stefan_month = month(examples$Stefan, label = FALSE)
  )

  expect_equal(date_res, date_exp)
})

test_that('ordinal values', {
  date_rec <- recipe(~ Dan + Stefan, examples) %>%
    step_date(all_predictors(), features = c("dow", "month"), ordinal = TRUE)
  date_rec <- prep(date_rec, training = examples)
  date_res <- bake(date_rec, newdata = examples)

  date_exp <- tibble(
    Dan = examples$Dan,
    Stefan = examples$Stefan,
    Dan_dow = wday(examples$Dan, label = TRUE),
    Dan_month = month(examples$Dan, label = TRUE),
    Stefan_dow = wday(examples$Stefan, label = TRUE),
    Stefan_month = month(examples$Stefan, label = TRUE)
  )

  expect_equal(date_res, date_exp)
})

test_that('printing', {
  date_rec <- recipe(~ Dan + Stefan, examples) %>%
    step_date(all_predictors(), features = feats)
  expect_output(print(date_rec))
  expect_output(prep(date_rec, training = examples))
})
recipes/tests/testthat/test_roles.R0000644000177700017770000000227713125447176020564 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(tibble)

data(biomass)

test_that('default method', {
  rec <- recipe(x = biomass)
  exp_res <- tibble(variable = colnames(biomass),
                    type = rep(c("nominal", "numeric"), c(2, 6)),
                    role = NA,
                    source = "original")
  expect_equal(summary(rec, TRUE), exp_res)
})

test_that('changing roles', {
  rec <- recipe(x = biomass)
  rec <- add_role(rec, sample, new_role = "some other role")
  exp_res <- tibble(variable = colnames(biomass),
                    type = rep(c("nominal", "numeric"), c(2, 6)),
                    role = rep(c("some other role", NA), c(1, 7)),
                    source = "original")
  expect_equal(summary(rec, TRUE), exp_res)
})

test_that('change existing role', {
  rec <- recipe(x = biomass)
  rec <- add_role(rec, sample, new_role = "some other role")
  rec <- add_role(rec, sample, new_role = "other other role")
  exp_res <- tibble(variable = colnames(biomass),
                    type = rep(c("nominal", "numeric"), c(2, 6)),
                    role = rep(c("other other role", NA), c(1, 7)),
                    source = "original")
  expect_equal(summary(rec, TRUE), exp_res)
})
recipes/tests/testthat/test_intercept.R0000644000177700017770000000276413135741217021424 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(tibble)

ex_dat <- data.frame(cat = rep(c("A", "B"), each = 5),
                     numer = 1:10)

test_that('add appropriate column with default settings', {
  rec <- recipe(~ ., data = ex_dat) %>%
    step_intercept()
  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
  rec_trans <- bake(rec_trained, newdata = ex_dat)

  exp_res <- tibble::add_column(ex_dat, "intercept" = 1, .before = TRUE)
  expect_equal(rec_trans, exp_res)
})

test_that('adds arbitrary numeric column', {
  rec <- recipe(~ ., data = ex_dat) %>%
    step_intercept(name = "(Intercept)", value = 2.5)
  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
  rec_trans <- bake(rec_trained, newdata = ex_dat)

  exp_res <- tibble::add_column(ex_dat, "(Intercept)" = 2.5, .before = TRUE)
  expect_equal(rec_trans, exp_res)
})

test_that('deals with bad input', {
  expect_error(
    recipe(~ ., data = ex_dat) %>%
      step_intercept(value = "Pie") %>%
      prep(),
    "Intercept value must be numeric."
  )
  expect_error(
    recipe(~ ., data = ex_dat) %>%
      step_intercept(name = 4) %>%
      prep(),
    "Intercept/constant column name must be a character value."
  )
  expect_warning(
    recipe(~ ., data = ex_dat) %>%
      step_intercept(all_predictors()) %>%
      prep(),
    "Selectors are not used for this step."
  )
})

test_that('printing', {
  rec <- recipe(~ ., data = ex_dat) %>%
    step_intercept()
  expect_output(print(rec))
  expect_output(prep(rec, training = ex_dat))
})
recipes/tests/testthat/test_nzv.R0000644000177700017770000000307413135741217020247 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

n <- 50
set.seed(424)
dat <- data.frame(x1 = rnorm(n),
                  x2 = rep(1:5, each = 10),
                  x3 = factor(rep(letters[1:3], c(2, 2, 46))),
                  x4 = 1,
                  y = runif(n))

ratios <- function(x) {
  tab <- sort(table(x), decreasing = TRUE)
  if(length(tab) > 1)
    tab[1]/tab[2]
  else Inf
}

pct_uni <- vapply(dat[, -5], function(x) length(unique(x)), c(val = 0))/nrow(dat)*100
f_ratio <- vapply(dat[, -5], ratios, c(val = 0))
vars <- names(pct_uni)

test_that('nzv filtering', {
  rec <- recipe(y ~ ., data = dat)
  filtering <- rec %>%
    step_nzv(x1, x2, x3, x4)
  filtering_trained <- prep(filtering, training = dat, verbose = FALSE)
  removed <- vars[pct_uni <= filtering_trained$steps[[1]]$options$unique_cut &
                    f_ratio >= filtering_trained$steps[[1]]$options$freq_cut]
  expect_equal(filtering_trained$steps[[1]]$removals, removed)
})

test_that('altered options', {
  rec <- recipe(y ~ ., data = dat)
  filtering <- rec %>%
    step_nzv(x1, x2, x3, x4,
             options = list(freq_cut = 50, unique_cut = 10))
  filtering_trained <- prep(filtering, training = dat, verbose = FALSE)
  removed <- vars[pct_uni <= filtering_trained$steps[[1]]$options$unique_cut &
                    f_ratio >= filtering_trained$steps[[1]]$options$freq_cut]
  expect_equal(filtering_trained$steps[[1]]$removals, removed)
})

test_that('printing', {
  rec <- recipe(y ~ ., data = dat) %>%
    step_nzv(x1, x2, x3, x4)
  expect_output(print(rec))
  expect_output(prep(rec, training = dat))
})
recipes/tests/testthat/test_logit.R0000644000177700017770000000145313135741217020547 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(tibble)

n <- 20
set.seed(12)
ex_dat <- data.frame(x1 = runif(n),
                     x2 = rnorm(n))

test_that('simple logit trans', {
  rec <- recipe(~., data = ex_dat) %>%
    step_logit(x1)
  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
  rec_trans <- bake(rec_trained, newdata = ex_dat)

  exp_res <- as_tibble(ex_dat)
  exp_res$x1 <- binomial()$linkfun(exp_res$x1)
  expect_equal(rec_trans, exp_res)
})

test_that('out of bounds logit trans', {
  rec <- recipe(~., data = ex_dat) %>%
    step_logit(x1, x2)
  expect_error(prep(rec, training = ex_dat, verbose = FALSE))
})

test_that('printing', {
  rec <- recipe(~., data = ex_dat) %>%
    step_logit(x1)
  expect_output(print(rec))
  expect_output(prep(rec, training = ex_dat))
})
recipes/tests/testthat/test_corr.R0000644000177700017770000000175713135741217020401 0ustar herbrandtherbrandt
library(testthat)
library(recipes)

n <- 100
set.seed(424)
dat <- matrix(rnorm(n*5), ncol = 5)
dat <- as.data.frame(dat)
dat$duplicate <- dat$V1
dat$V6 <- -dat$V2 + runif(n)*.2

test_that('high filter', {
  set.seed(1)
  rec <- recipe(~ ., data = dat)
  filtering <- rec %>%
    step_corr(all_predictors(), threshold = .5)
  filtering_trained <- prep(filtering, training = dat, verbose = FALSE)
  removed <- c("V6", "V1")
  expect_equal(filtering_trained$steps[[1]]$removals, removed)
})

test_that('low filter', {
  rec <- recipe(~ ., data = dat)
  filtering <- rec %>%
    step_corr(all_predictors(), threshold = 1)
  filtering_trained <- prep(filtering, training = dat, verbose = FALSE)
  expect_equal(filtering_trained$steps[[1]]$removals, numeric(0))
})

test_that('printing', {
  set.seed(1)
  rec <- recipe(~ ., data = dat)
  filtering <- rec %>% step_corr(all_predictors(), threshold = .5)
  expect_output(print(filtering))
  expect_output(prep(filtering, training = dat))
})
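## The filter under test works from the pairwise correlation matrix; the
## pairs that trip the .5 threshold in `dat` can be listed directly:
cor_mat <- cor(dat)
high_cor <- which(abs(cor_mat) > .5 & upper.tri(cor_mat), arr.ind = TRUE)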
recipes/tests/testthat/test_retraining.R0000644000177700017770000000174213135741217021574 0ustar herbrandtherbrandt
context("Testing retraining")

data(biomass)

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass)

test_that('training in stages', {
  skip_on_cran()
  at_once <- rec %>%
    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur) %>%
    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur)
  at_once_trained <- prep(at_once, training = biomass, verbose = FALSE)

  ## now train in stages
  center_first <- rec %>%
    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur)
  center_first_trained <- prep(center_first, training = biomass, verbose = FALSE)
  in_stages <- center_first_trained %>%
    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur)
  in_stages_trained <- prep(in_stages, training = biomass, verbose = FALSE)
  in_stages_retrained <- prep(in_stages, training = biomass, verbose = FALSE, fresh = TRUE)

  expect_equal(at_once_trained, in_stages_trained)
  expect_equal(at_once_trained, in_stages_retrained)
})
recipes/tests/testthat/test_roll.R0000644000177700017770000000504213135741217020377 0ustar herbrandtherbrandt
library(testthat)
library(recipes)
library(tibble)

set.seed(5522)
sim_dat <- data.frame(x1 = (20:100) / 10)
n <- nrow(sim_dat)
sim_dat$y1 <- sin(sim_dat$x1) + rnorm(n, sd = 0.1)
sim_dat$y2 <- cos(sim_dat$x1) + rnorm(n, sd = 0.1)
sim_dat$x2 <- runif(n)
sim_dat$x3 <- rnorm(n)
sim_dat$fac <- sample(letters[1:3], size = n, replace = TRUE)

rec <- recipe( ~ ., data = sim_dat)

test_that('error checks', {
  expect_error(rec %>% step_window())
  expect_error(rec %>% step_window(y1, size = 6))
  expect_error(rec %>% step_window(y1, size = NA))
  expect_error(rec %>% step_window(y1, size = NULL))
  expect_error(rec %>% step_window(y1, statistic = "average"))
  expect_error(rec %>% step_window(y1, size = 1))
  expect_error(rec %>% step_window(y1, size = 2))
  expect_error(rec %>% step_window(y1, size = -1))
  expect_warning(rec %>% step_window(y1, size = pi))
  expect_error(prep(rec %>% step_window(fac), training = sim_dat))
  expect_error(prep(rec %>% step_window(y1, size = 1000L), training = sim_dat))
  bad_names <- rec %>%
    step_window(starts_with("y"), names = "only_one_name")
  expect_error(prep(bad_names, training = sim_dat))
})

test_that('basic moving average', {
  simple_ma <- rec %>%
    step_window(starts_with("y"))
  simple_ma <- prep(simple_ma, training = sim_dat)
  simple_ma_res <- bake(simple_ma, newdata = sim_dat)
  expect_equal(names(sim_dat), names(simple_ma_res))

  for (i in 2:(n - 1)) {
    expect_equal(simple_ma_res$y1[i], mean(sim_dat$y1[(i - 1):(i + 1)]))
    expect_equal(simple_ma_res$y2[i], mean(sim_dat$y2[(i - 1):(i + 1)]))
  }
  expect_equal(simple_ma_res$y1[1], mean(sim_dat$y1[1:3]))
  expect_equal(simple_ma_res$y2[1], mean(sim_dat$y2[1:3]))
  expect_equal(simple_ma_res$y1[n], mean(sim_dat$y1[(n - 2):n]))
  expect_equal(simple_ma_res$y2[n], mean(sim_dat$y2[(n - 2):n]))
})

test_that('creating new variables', {
  new_names <- rec %>%
    step_window(starts_with("y"), names = paste0("new", 1:2),
                role = "predictor")
  new_names <- prep(new_names, training = sim_dat)
  new_names_res <- bake(new_names, newdata = sim_dat)

  simple_ma <- rec %>%
    step_window(starts_with("y"))
  simple_ma <- prep(simple_ma, training = sim_dat)
  simple_ma_res <- bake(simple_ma, newdata = sim_dat)

  expect_equal(new_names_res$new1, simple_ma_res$y1)
  expect_equal(new_names_res$new2, simple_ma_res$y2)
})

test_that('printing', {
  new_names <- rec %>%
    step_window(starts_with("y"), names = paste0("new", 1:2),
                role = "predictor")
  expect_output(print(new_names))
  expect_output(prep(new_names, training = sim_dat))
})
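## The expected values checked above are a centered moving average whose
## first and last rows reuse the first and last complete windows; a sketch
## for size = 3 (illustrative only, not the step's implementation):
roll_mean3 <- function(y) {
  n <- length(y)
  centers <- pmin(pmax(seq_len(n), 2L), n - 1L)
  vapply(centers, function(i) mean(y[(i - 1):(i + 1)]), numeric(1))
}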
                    scale. = TRUE, retx = TRUE)
  # cumsum(pca_exp$sdev^2)/sum(pca_exp$sdev^2)
  expect_equal(pca_extract_trained$steps[[3]]$num, 2)
})

test_that('Reduced rotation size', {
  pca_extract <- rec %>%
    step_center(carbon, hydrogen, oxygen, nitrogen, sulfur) %>%
    step_scale(carbon, hydrogen, oxygen, nitrogen, sulfur) %>%
    step_pca(carbon, hydrogen, oxygen, nitrogen, sulfur, num = 3)

  pca_extract_trained <- prep(pca_extract, training = biomass_tr,
                              verbose = FALSE)

  pca_pred <- bake(pca_extract_trained, newdata = biomass_te)
  pca_pred <- as.matrix(pca_pred)

  pca_exp <- prcomp(biomass_tr[, 3:7], center = TRUE, scale. = TRUE,
                    retx = TRUE)
  pca_pred_exp <- predict(pca_exp, biomass_te[, 3:7])[, 1:3]
  rownames(pca_pred) <- NULL
  rownames(pca_pred_exp) <- NULL

  expect_equal(pca_pred, pca_pred_exp)
})

test_that('printing', {
  pca_extract <- rec %>%
    step_pca(carbon, hydrogen, oxygen, nitrogen, sulfur)
  expect_output(print(pca_extract))
  expect_output(prep(pca_extract, training = biomass_tr))
})

recipes/tests/testthat/test_knnimpute.R

library(testthat)
library(gower)
library(recipes)
library(dplyr)

data("biomass")

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass)

biomass_tr <- biomass[biomass$dataset == "Training", ]
biomass_te <- biomass[biomass$dataset == "Testing", ]

# induce some missing data at random
set.seed(9039)
carb_missing <- sample(1:nrow(biomass_te), 3)
nitro_missing <- sample(1:nrow(biomass_te), 3)

biomass_te$carbon[carb_missing] <- NA
biomass_te$nitrogen[nitro_missing] <- NA

test_that('imputation values', {
  discr_rec <- rec %>%
    step_discretize(nitrogen, options = list(keep_na = FALSE))
  impute_rec <- discr_rec %>%
    step_knnimpute(carbon, nitrogen,
                   impute_with = imp_vars(hydrogen, oxygen, nitrogen),
                   K = 3)

  discr_rec <- prep(discr_rec, training = biomass_tr, verbose = FALSE)
  tr_data <- bake(discr_rec, newdata = biomass_tr)
  te_data <- bake(discr_rec, newdata = biomass_te) %>%
    dplyr::select(hydrogen, oxygen, nitrogen, carbon)

  nn <- gower_topn(te_data[, c("hydrogen", "oxygen", "nitrogen")],
                   tr_data[, c("hydrogen", "oxygen", "nitrogen")],
                   n = 3)$index

  impute_rec <- prep(impute_rec, training = biomass_tr, verbose = FALSE)
  imputed_te <- bake(impute_rec, newdata = biomass_te)

  for (i in carb_missing) {
    nn_tr_ind <- nn[, i]
    nn_tr_data <- tr_data$carbon[nn_tr_ind]
    expect_equal(imputed_te$carbon[i], mean(nn_tr_data))
  }
  for (i in nitro_missing) {
    nn_tr_ind <- nn[, i]
    nn_tr_data <- tr_data$nitrogen[nn_tr_ind]
    expect_equal(as.character(imputed_te$nitrogen[i]),
                 recipes:::mode_est(nn_tr_data))
  }
})

test_that('printing', {
  discr_rec <- rec %>%
    step_discretize(nitrogen, options = list(keep_na = FALSE))
  expect_output(print(discr_rec))
  expect_output(prep(discr_rec, training = biomass_tr))
})

recipes/tests/testthat/test_bin2factor.R

library(testthat)
library(recipes)

data(covers)

rec <- recipe(~ description, covers) %>%
  step_regex(description, pattern = "(rock|stony)", result = "rocks") %>%
  step_regex(description, pattern = "(rock|stony)", result = "more_rocks")

test_that('default options', {
  rec1 <- rec %>% step_bin2factor(rocks)
  rec1 <- prep(rec1, training = covers)
  res1 <- bake(rec1, newdata = covers)
  expect_true(all(diag(table(res1$rocks, res1$more_rocks)) == 0))
})

test_that('nondefault options', {
  rec2 <- rec %>% step_bin2factor(rocks, levels = letters[2:1])
  rec2 <- prep(rec2, training = covers)
  res2 <- bake(rec2, newdata = covers)
  expect_true(all(diag(table(res2$rocks, res2$more_rocks)) == 0))
})

test_that('bad options', {
  rec3 <- rec %>% step_bin2factor(description)
  expect_error(prep(rec3, training = covers))
  expect_error(rec %>% step_bin2factor(rocks, levels = letters[1:5]))
  expect_error(rec %>% step_bin2factor(rocks, levels = 1:2))
})

test_that('printing', {
  rec2 <- rec %>% step_bin2factor(rocks, levels = letters[2:1])
  expect_output(print(rec2))
  expect_output(prep(rec2, training = covers))
})

recipes/tests/testthat/test_range.R

library(testthat)
library(recipes)

data(biomass)

biomass_tr <- biomass[1:10, ]
biomass_te <- biomass[c(13:14, 19, 522), ]

rec <- recipe(HHV ~ carbon + hydrogen, data = biomass_tr)

test_that('correct values', {
  standardized <- rec %>%
    step_range(carbon, hydrogen, min = -12)

  standardized_trained <- prep(standardized, training = biomass_tr,
                               verbose = FALSE)
  obs_pred <- bake(standardized_trained, newdata = biomass_te)
  obs_pred <- as.matrix(obs_pred)

  mins <- apply(biomass_tr[, c("carbon", "hydrogen")], 2, min)
  maxs <- apply(biomass_tr[, c("carbon", "hydrogen")], 2, max)

  new_min <- -12
  new_max <- 1
  new_range <- new_max - new_min

  carb <- ((new_range * (biomass_te$carbon - mins["carbon"])) /
             (maxs["carbon"] - mins["carbon"])) + new_min
  carb <- ifelse(carb > new_max, new_max, carb)
  carb <- ifelse(carb < new_min, new_min, carb)

  hydro <- ((new_range * (biomass_te$hydrogen - mins["hydrogen"])) /
              (maxs["hydrogen"] - mins["hydrogen"])) + new_min
  hydro <- ifelse(hydro > new_max, new_max, hydro)
  hydro <- ifelse(hydro < new_min, new_min, hydro)

  exp_pred <- cbind(carb, hydro)
  colnames(exp_pred) <- c("carbon", "hydrogen")
  expect_equal(exp_pred, obs_pred)
})

test_that('defaults', {
  standardized <- rec %>%
    step_range(carbon, hydrogen)

  standardized_trained <- prep(standardized, training = biomass_tr,
                               verbose = FALSE)
  obs_pred <- bake(standardized_trained, newdata = biomass_te)
  obs_pred <- as.matrix(obs_pred)

  mins <- apply(biomass_tr[, c("carbon", "hydrogen")], 2, min)
  maxs <- apply(biomass_tr[, c("carbon", "hydrogen")], 2, max)

  new_min <- 0
  new_max <- 1
  new_range <- new_max - new_min

  carb <- ((new_range * (biomass_te$carbon - mins["carbon"])) /
             (maxs["carbon"] - mins["carbon"])) + new_min
  carb <- ifelse(carb > new_max, new_max, carb)
  carb <- ifelse(carb < new_min, new_min, carb)

  hydro <- ((new_range * (biomass_te$hydrogen - mins["hydrogen"])) /
              (maxs["hydrogen"] - mins["hydrogen"])) + new_min
  hydro <- ifelse(hydro > new_max, new_max, hydro)
  hydro <- ifelse(hydro < new_min, new_min, hydro)

  exp_pred <- cbind(carb, hydro)
  colnames(exp_pred) <- c("carbon", "hydrogen")
  expect_equal(exp_pred, obs_pred)
})

test_that('one variable', {
  standardized <- rec %>%
    step_range(carbon)

  standardized_trained <- prep(standardized, training = biomass_tr,
                               verbose = FALSE)
  obs_pred <- bake(standardized_trained, newdata = biomass_te)

  mins <- min(biomass_tr$carbon)
  maxs <- max(biomass_tr$carbon)

  new_min <- 0
  new_max <- 1
  new_range <- new_max - new_min

  carb <- ((new_range * (biomass_te$carbon - mins)) / (maxs - mins)) + new_min
  carb <- ifelse(carb > new_max, new_max, carb)
  carb <- ifelse(carb < new_min, new_min, carb)

  expect_equal(carb, obs_pred$carbon)
})

test_that('printing', {
  standardized <- rec %>%
    step_range(carbon, hydrogen, min = -12)
  expect_output(print(standardized))
  expect_output(prep(standardized, training = biomass_tr))
})

recipes/tests/testthat/test_BoxCox.R

library(testthat)
library(recipes)

n <- 20
set.seed(1)
ex_dat <- data.frame(x1 = exp(rnorm(n, mean = .1)),
                     x2 = 1 / rnorm(n),
                     x3 = rep(1:2, each = n / 2),
                     x4 = rexp(n))

## from `car` package
exp_lambda <- c(x1 = 0.2874304685, x2 = NA, x3 = NA, x4 = 0.06115365314)
exp_dat <- structure(list(
  x1 = c(-0.48855792533959, 0.295526451871788, -0.66306066037752,
         2.18444062220084, 0.45714544418559, -0.650762952308473,
         0.639934327981261, 0.94795174900382, 0.745877376631664,
         -0.199443408020842, 2.05013184840922, 0.526004196848377,
         -0.484073411411316, -1.5846209165316, 1.46827089088108,
         0.0555044880684726, 0.0848273579417863, 1.21733702306844,
         1.05470177834901, 0.76793945044649),
  x2 = c(1.0881660755694, 1.27854953038913, 13.4111208085756,
         -0.502676325196487, 1.61335666257264, -17.8161848705567,
         -6.41867035287092, -0.679924106156326, -2.09139367300257,
         2.39267901359744, 0.736008721758276, -9.72878791903891,
         2.57950278065913, -18.5856192870844, -0.726185004156987,
         -2.40967012205861, -2.5362046143702, -16.8595975858421,
         0.909069940992826, 1.31031417340121),
  x3 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
         2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  x4 = c(-0.0299153493217198, -0.00545480495048682, -0.605467890118739,
         0.771791879612809, -0.763649380406524, 0.872804671752781,
         1.38894407918253, -0.537364454265797, -0.482864603899052,
         -0.0227886234018179, -1.25797709152009, -0.995703197045091,
         0.102163556869708, -0.246753343931442, -1.7395729395129,
         0.104247324965852, -1.15077903230011, 0.48306309307708,
         1.99265865015763, -0.747338829803379)),
  .Names = c("x1", "x2", "x3", "x4"),
  row.names = c(NA, -20L),
  class = "data.frame")

test_that('simple Box Cox', {
  rec <- recipe(~., data = ex_dat) %>%
    step_BoxCox(x1, x2, x3, x4)

  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
  rec_trans <- bake(rec_trained, newdata = ex_dat)

  expect_equal(names(exp_lambda)[!is.na(exp_lambda)],
               names(rec_trained$steps[[1]]$lambdas))
  expect_equal(exp_lambda[!is.na(exp_lambda)],
               rec_trained$steps[[1]]$lambdas, tol = .001)
  expect_equal(as.matrix(exp_dat), as.matrix(rec_trans), tol = .05)
})

test_that('printing', {
  rec <- recipe(~., data = ex_dat) %>%
    step_BoxCox(x1, x2, x3, x4)
  expect_output(print(rec))
  expect_output(prep(rec, training = ex_dat))
})

recipes/tests/testthat/test_invlogit.R

library(testthat)
library(recipes)
library(tibble)

n <- 20
set.seed(12)
ex_dat <- data.frame(x1 = rnorm(n), x2 = runif(n))

test_that('simple logit trans', {
  rec <- recipe(~., data = ex_dat) %>%
    step_invlogit(x1)

  rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
  rec_trans <- bake(rec_trained, newdata = ex_dat)

  exp_res <- as_tibble(ex_dat)
  exp_res$x1 <- binomial()$linkinv(exp_res$x1)
  expect_equal(rec_trans, exp_res)
})

test_that('printing', {
  rec <- recipe(~., data = ex_dat) %>%
    step_invlogit(x1)
  expect_output(print(rec))
  expect_output(prep(rec, training = ex_dat))
})

recipes/tests/testthat/test_regex.R

library(testthat)
library(recipes)

data(covers)
covers$rows <- 1:nrow(covers)
covers$ch_rows <- paste(1:nrow(covers))

rec <- recipe(~ description + rows + ch_rows, covers)

test_that('default options', {
  rec1 <- rec %>%
    step_regex(description, pattern = "(rock|stony)") %>%
    step_regex(description, result = "all ones")
  rec1 <- prep(rec1, training = covers)
  res1 <- bake(rec1, newdata = covers)
  expect_equal(res1$X.rock.stony.,
               as.numeric(grepl("(rock|stony)", covers$description)))
  expect_equal(res1$`all ones`,
               rep(1, nrow(covers)))
})

test_that('nondefault options', {
  rec2 <- rec %>%
    step_regex(description, pattern = "(rock|stony)", result = "rocks",
               options = list(fixed = TRUE))
  rec2 <- prep(rec2, training = covers)
  res2 <- bake(rec2, newdata = covers)
  expect_equal(res2$rocks, rep(0, nrow(covers)))
})

test_that('bad selector(s)', {
  expect_error(rec %>% step_regex(description, rows, pattern = "(rock|stony)"))
  rec3 <- rec %>% step_regex(starts_with("b"), pattern = "(rock|stony)")
  expect_error(prep(rec3, training = covers))
  rec4 <- rec %>% step_regex(rows, pattern = "(rock|stony)")
  expect_error(prep(rec4, training = covers))
})

test_that('printing', {
  rec1 <- rec %>% step_regex(description, pattern = "(rock|stony)")
  expect_output(print(rec1))
  expect_output(prep(rec1, training = covers))
})

recipes/tests/testthat/test_hyperbolic.R

library(testthat)
library(recipes)
library(tibble)

n <- 20
ex_dat <- data.frame(x1 = seq(0, 1, length = n),
                     x2 = seq(1, 0, length = n))

get_exp <- function(x, f) as_tibble(lapply(x, f))

test_that('simple hyperbolic trans', {
  for (func in c("sin", "cos", "tan")) {
    for (invf in c(TRUE, FALSE)) {
      rec <- recipe(~., data = ex_dat) %>%
        step_hyperbolic(x1, x2, func = func, inverse = invf)

      rec_trained <- prep(rec, training = ex_dat, verbose = FALSE)
      rec_trans <- bake(rec_trained, newdata = ex_dat)

      if (invf) {
        foo <- get(paste0("a", func))
      } else {
        foo <- get(func)
      }

      exp_res <- get_exp(ex_dat, foo)
      expect_equal(rec_trans, exp_res)
    }
  }
})

test_that('printing', {
  rec <- recipe(~., data = ex_dat) %>%
    step_hyperbolic(x1, x2, func = "sin", inverse = TRUE)
  expect_output(print(rec))
  expect_output(prep(rec, training = ex_dat))
})

recipes/tests/testthat/test_modeimpute.R

library(testthat)
library(recipes)

data("credit_data")

set.seed(342)
in_training <- sample(1:nrow(credit_data), 2000)

credit_tr <- credit_data[ in_training, ]
credit_te <- credit_data[-in_training, ]

test_that('simple modes', {
  rec <- recipe(Price ~ ., data = credit_tr)
  impute_rec <- rec %>% step_modeimpute(Status, Home, Marital)
  imputed <- prep(impute_rec, training = credit_tr, verbose = FALSE)
  te_imputed <- bake(imputed, newdata = credit_te)

  expect_equal(te_imputed$Status, credit_te$Status)

  home_exp <- rep(recipes:::mode_est(credit_tr$Home),
                  sum(is.na(credit_te$Home)))
  home_exp <- factor(home_exp, levels = levels(credit_te$Home))
  expect_equal(te_imputed$Home[is.na(credit_te$Home)], home_exp)

  marital_exp <- rep(recipes:::mode_est(credit_tr$Marital),
                     sum(is.na(credit_te$Marital)))
  marital_exp <- factor(marital_exp, levels = levels(credit_te$Marital))
  expect_equal(te_imputed$Marital[is.na(credit_te$Marital)], marital_exp)
})

test_that('non-nominal', {
  rec <- recipe(Price ~ ., data = credit_tr)
  impute_rec <- rec %>% step_modeimpute(Assets, Job)
  expect_error(prep(impute_rec, training = credit_tr, verbose = FALSE))
})

test_that('printing', {
  impute_rec <- recipe(Price ~ ., data = credit_tr) %>%
    step_modeimpute(Status, Home, Marital)
  expect_output(print(impute_rec))
  expect_output(prep(impute_rec, training = credit_tr))
})

recipes/NAMESPACE

# Generated by roxygen2: do not edit by hand

S3method(bake,recipe)
S3method(bake,step_BoxCox)
S3method(bake,step_YeoJohnson)
S3method(bake,step_bagimpute)
S3method(bake,step_classdist)
S3method(bake,step_corr)
S3method(bake,step_date)
S3method(bake,step_depth)
S3method(bake,step_discretize)
S3method(bake,step_dummy)
S3method(bake,step_holiday)
S3method(bake,step_hyperbolic)
S3method(bake,step_ica)
S3method(bake,step_interact)
S3method(bake,step_invlogit)
S3method(bake,step_isomap)
S3method(bake,step_knnimpute)
S3method(bake,step_kpca)
S3method(bake,step_lincomb)
S3method(bake,step_log)
S3method(bake,step_logit)
S3method(bake,step_meanimpute)
S3method(bake,step_modeimpute)
S3method(bake,step_ns)
S3method(bake,step_nzv)
S3method(bake,step_ordinalscore)
S3method(bake,step_other)
S3method(bake,step_pca)
S3method(bake,step_poly)
S3method(bake,step_range)
S3method(bake,step_ratio)
S3method(bake,step_rm)
S3method(bake,step_scale)
S3method(bake,step_shuffle)
S3method(bake,step_spatialsign)
S3method(bake,step_sqrt)
S3method(bake,step_window)
S3method(discretize,numeric)
S3method(predict,discretize)
S3method(prep,recipe)
S3method(prep,step_BoxCox)
S3method(prep,step_YeoJohnson)
S3method(prep,step_bagimpute)
S3method(prep,step_bin2factor)
S3method(prep,step_classdist)
S3method(prep,step_corr)
S3method(prep,step_date)
S3method(prep,step_depth)
S3method(prep,step_discretize)
S3method(prep,step_dummy)
S3method(prep,step_holiday)
S3method(prep,step_hyperbolic)
S3method(prep,step_ica)
S3method(prep,step_interact)
S3method(prep,step_invlogit)
S3method(prep,step_isomap)
S3method(prep,step_knnimpute)
S3method(prep,step_kpca)
S3method(prep,step_lincomb)
S3method(prep,step_log)
S3method(prep,step_logit)
S3method(prep,step_meanimpute)
S3method(prep,step_modeimpute)
S3method(prep,step_ns)
S3method(prep,step_nzv)
S3method(prep,step_ordinalscore)
S3method(prep,step_other)
S3method(prep,step_pca)
S3method(prep,step_poly)
S3method(prep,step_range)
S3method(prep,step_ratio)
S3method(prep,step_regex)
S3method(prep,step_rm)
S3method(prep,step_scale)
S3method(prep,step_shuffle)
S3method(prep,step_spatialsign)
S3method(prep,step_sqrt)
S3method(prep,step_window)
S3method(print,discretize)
S3method(print,recipe)
S3method(recipe,data.frame)
S3method(recipe,default)
S3method(recipe,formula)
S3method(recipe,matrix)
S3method(summary,recipe)
export("%>%")
export(add_role)
export(add_step)
export(all_nominal)
export(all_numeric)
export(all_outcomes)
export(all_predictors)
export(bake)
export(current_info)
export(denom_vars)
export(discretize)
export(estimate_yj)
export(has_role)
export(has_type)
export(imp_vars)
export(juice)
export(names0)
export(prep)
export(prepare)
export(recipe)
export(step)
export(step_BoxCox)
export(step_YeoJohnson)
export(step_bagimpute)
export(step_bin2factor)
export(step_center)
export(step_classdist)
export(step_corr)
export(step_date)
export(step_depth)
export(step_discretize)
export(step_dummy)
export(step_holiday)
export(step_hyperbolic)
export(step_ica)
export(step_interact)
export(step_intercept)
export(step_invlogit)
export(step_isomap)
export(step_knnimpute)
export(step_kpca)
export(step_lincomb)
export(step_log)
export(step_logit)
export(step_meanimpute)
export(step_modeimpute)
export(step_ns)
export(step_nzv)
export(step_ordinalscore)
export(step_other)
export(step_pca)
export(step_poly)
export(step_range)
export(step_ratio)
export(step_regex)
export(step_rm)
export(step_scale)
export(step_shuffle)
export(step_spatialsign)
export(step_sqrt)
export(step_window)
export(terms_select)
export(yj_trans)
import(rlang)
import(timeDate)
importFrom(RcppRoll,roll_max)
importFrom(RcppRoll,roll_maxl)
importFrom(RcppRoll,roll_maxr)
importFrom(RcppRoll,roll_mean)
importFrom(RcppRoll,roll_meanl)
importFrom(RcppRoll,roll_meanr)
importFrom(RcppRoll,roll_median)
importFrom(RcppRoll,roll_medianl)
importFrom(RcppRoll,roll_medianr)
importFrom(RcppRoll,roll_min)
importFrom(RcppRoll,roll_minl)
importFrom(RcppRoll,roll_minr)
importFrom(RcppRoll,roll_prod)
importFrom(RcppRoll,roll_prodl)
importFrom(RcppRoll,roll_prodr)
importFrom(RcppRoll,roll_sd)
importFrom(RcppRoll,roll_sdl)
importFrom(RcppRoll,roll_sdr)
importFrom(RcppRoll,roll_sum)
importFrom(RcppRoll,roll_suml)
importFrom(RcppRoll,roll_sumr)
importFrom(RcppRoll,roll_var)
importFrom(RcppRoll,roll_varl)
importFrom(RcppRoll,roll_varr)
importFrom(ddalpha,depth.Mahalanobis)
importFrom(ddalpha,depth.halfspace)
importFrom(ddalpha,depth.potential)
importFrom(ddalpha,depth.projection)
importFrom(ddalpha,depth.simplicial)
importFrom(ddalpha,depth.simplicialVolume)
importFrom(ddalpha,depth.spatial)
importFrom(ddalpha,depth.zonoid)
importFrom(dimRed,FastICA)
importFrom(dimRed,dimRedData)
importFrom(dimRed,embed)
importFrom(dimRed,kPCA)
importFrom(dplyr,filter)
importFrom(dplyr,full_join)
importFrom(dplyr,left_join)
importFrom(gower,gower_topn)
importFrom(ipred,ipredbagg)
importFrom(lubridate,decimal_date)
importFrom(lubridate,is.Date)
importFrom(lubridate,month)
importFrom(lubridate,quarter)
importFrom(lubridate,semester)
importFrom(lubridate,wday)
importFrom(lubridate,week)
importFrom(lubridate,yday)
importFrom(lubridate,year)
importFrom(magrittr,"%>%")
importFrom(purrr,map)
importFrom(purrr,map_chr)
importFrom(purrr,map_if)
importFrom(purrr,map_lgl)
importFrom(rlang,expr)
importFrom(rlang,f_lhs)
importFrom(rlang,is_empty)
importFrom(rlang,names2)
importFrom(rlang,quos)
importFrom(splines,ns)
importFrom(stats,as.formula)
importFrom(stats,binomial)
importFrom(stats,complete.cases)
importFrom(stats,cor)
importFrom(stats,cov)
importFrom(stats,mahalanobis)
importFrom(stats,model.frame)
importFrom(stats,model.matrix)
importFrom(stats,optimize)
importFrom(stats,poly)
importFrom(stats,prcomp)
importFrom(stats,predict)
importFrom(stats,quantile)
importFrom(stats,sd)
importFrom(stats,terms)
importFrom(stats,var)
importFrom(tibble,add_column)
importFrom(tibble,as_tibble)
importFrom(tibble,is_tibble)
importFrom(tibble,tibble)

recipes/NEWS.md

# recipes 0.1.0

First CRAN release.

* Changed `prepare` to `prep` per [issue #59](https://github.com/topepo/recipes/issues/59)

# recipes 0.0.1.9003

* Two of the main functions [changed names](https://github.com/topepo/recipes/issues/57). `learn` has become `prepare` and `process` has become `bake`

# recipes 0.0.1.9002

New steps:

* `step_lincomb` removes variables involved in linear combinations to resolve them.
* A step for converting binary variables to factors (`step_bin2factor`)
* `step_regex` applies a regular expression to a character or factor vector to create dummy variables.

Other changes:

* `step_dummy` and `step_interact` do a better job of respecting missing values in the data set.

# recipes 0.0.1.9001

* The class system for `recipe` objects was changed so that [pipes can be used to create the recipe with a formula](https://github.com/topepo/recipes/issues/46).
* `process.recipe` lost the `role` argument in favor of a general set of [selectors](https://topepo.github.io/recipes/articles/Selecting_Variables.html). If no selector is used, all the predictors are returned.
* Two steps for simple imputation using the mean or mode were added; a sketch follows below.
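A minimal sketch of the two imputation steps used together, following the
`prep()`/`bake()` pattern from the package's tests. The `credit_data` columns
chosen here are illustrative assumptions, not requirements:

```r
library(recipes)
data("credit_data")

rec <- recipe(Price ~ ., data = credit_data) %>%
  # numeric predictors: fill missing values with the training-set mean
  step_meanimpute(Assets, Debt) %>%
  # nominal predictors: fill missing values with the training-set mode
  step_modeimpute(Status, Home, Marital)

rec <- prep(rec, training = credit_data, verbose = FALSE)
imputed <- bake(rec, newdata = credit_data)
```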

recipes/data/covers.RData

[binary RData payload: the `covers` data set shipped with the package; raw contents omitted]

recipes/data/credit_data.RData

[binary RData payload: the `credit_data` data set shipped with the package; raw contents omitted]

recipes/data/okc.RData

[binary RData payload: the `okc` data set shipped with the package; raw contents omitted]
Eusn$㣎uskE9;tDs]r1s)݅CΘqG(ʅē}T};ׇrGZ}PItT%O98Nj)֥>G5Rc_`'UH_Nih.nm͎E|shr 4yνہ hUۗwZH4D^mШnWwdU9؄yD-%tArrF*e\ Lκnk7P(pA '$h5l6 А :4h(4  Apa,1PRQ BPTPB* (@DTJDPHE)@PHIBJUTA OB @M#CAi&ѦLA@0j~٩sNO;3J;і#la`P e}?I @AKK"^ih)wcyQA1ꀗt=oɰqlu͎%Uc7G:tw]1^ie~={EF?%3O*n "EegZT4xR?]"/sV>!X3y<n AAͤ\Ux&f+ߣax@ʭY;An7FZ'XDHS@}l̕w|wvMu}$yYT<5WomvK3F~ qQxd?Thy8%@DWlKloo,AOӣhqe &C;=A Zh [W|Οt0tu OշxpLǃA NAQRYrEٰh 0 n> 0G UV_ӥXgע5UM!f0 0lvz1xյZ¼MbPh% I'7VaP2^eڭ8Be}텧|do}xZ-;l6d6u&t7itˉ7n LJjxߗh:b2eS䰝FMjUPJ%hy}jL2^՗yLHf,kXdh3xJ)/a%Tt ]4͉.ԦC JUMc$u}ECK~r1k}4Ӝ$KM WTsW6KGX84ɯTȃ8 |39D6r.iо߁g}6( 5$]~AEHMdڋf,Yb1߶2e-~)qI|=ؔFF†M¯S<'W WP=BTĻ9gg923 51g1D S8$)|vD~5GuQEf[k~He ǭ (pi?e~ !ZP1"R+Tz`I,@#O r͹G_jFxE@nVJKѵ~zoɉy9Zi4ܥe@^sO!}Hר ^<0UP]k*3΁ٗ@'G{Fl%#xQABTj;>]9_y󋊂H z,0{O|>ǤrًTS慗V-c?R5Z Dl^biArA P_|M60r7a~\m.}}}޳3eԭV»iyg)HsƇjgV&gP) t׉,,v٠AV#Mo{[4qqa9lsx}v ym[W8!'sw_Mr@ "o9%t'*n;7yukMDR}H;]aM%RZq"Fw|i#*>0' *jZPW{֑_񟴃ȉ~тY W]ȵ4S,s)5r>֣)(ފ:R(L?~.̤]~F߄^QbO:}s^='><'e"K^~Iv!&$u{sKIOs=hh)m1y'͝6 YlˎDS2`ju ɟGCWAӗje鑺d"DwPOTnD#TQK>SֹI˕/k*bc"Oz~XsΊpQ_y"1=Pz(G%qb /x=ij,:`?-Oż]'~xIPUK eW ?|\7\˔zj(1uzH-G9E w>J uZnQjj'Ej"}WbScESI"4J7|RBNta&0ʗuoVf].Mq.;ET#N@>}b́ORs"Ip_ƹ vAS?]uPL$`2, }lBKZ֕3`-߻V EEeuuJD3i;3܄L8}eQ"3 XmOTsuxڵW ep#,29](y, pp.۬_62 5 3ot/%~؉XZ+,&LsH>5Eҵt:аG87&٧[(L+.vǜ.r%=׌Jsl8VeAO#Im'=Dd;[^+?'[?]4ݯy3g2r]M=}WH/@ :a7iQ;TlGz4Kz j ϑ;Kt\cO4rƯ ky A[D.ߛTkj,ҁ 6R^hcJ~-2U죃*@C! j8Ům=EUVxsB+[gc yBFq)Ğ;xL@p*" f=BCFn,Rvs hžr=2j6SޠA60a>'N D)!"b=Y}s5L[9M+7%W̝3Sʮ/BWz C|M )єdt/-n`db/ }_x#xGߖ'G ~T<U$=*]CQv%SJt5 G;{#tDϑV։AކrR v :_%X,#?aS螅FR:eGKuǽ_~@bJMxnlP"} W>a QJsS|b?L6"3T'/sm۪OV:`]~]R P0Sm̈")F;HRloHH^<$3y%V ʶ9Z9]$NEa#DEiQ2j#QCNԅCo2z[ k_>eNOo|B1LSU"t>>RZm.*;l@dW_TR $=g1~"X'*_1yFhDϛ[N:e-Pިl%C4H&3"CG)oh]l6'B6蠵 W&ř}GRyd۞9tD 4KϠCih1œd$I! | sl;+]@>-JGN/F9ú`+5%] o!JaX ^^?~k/lyi/",a_[nbikcnBdޢ7'k"h>bG5v٠A<>d.բϹ0~;3u3Jo}GHDz H!yҧdnsvIěOzѸOtӻ0i'Rڌy$H~r \P#OM鷫/UAP㥾D^^2+JO,Wf,o=3Ъ[Dsc,Li#E¶`܏p65DƂ@R'R|q)WMyQL9 k3 1N T, >#NݠYC8#h废ud]z]DxO.@s 9[ca$O(.lPF?Ǝ>𸐲^ɲwюlEVc?]c V Zݯ#wA:9GQ_X-v# lRM9EU,H^λ~DjI3J+ާh8n_S {!ƣyttrZUw9CE|sv[˨sN+wHzZ=NGf{}BTbVWm?S&x6,ZA!ck%$ϙ_AzE#1=ϬC =B SűMF(TU7imql#Vzvd@~VZpA:R!3(Ȋ4_KcʃkvH)SxANY=|hHKx]!MchoB.,m +$}r uJ7%AR(OM$:nH]F]'2z;oMS*|zV;jGXn0i H$jαF{EFÎ@@և hZKXzTUrI:bpui,TXGTao'ܣE\r ޛ ODi(W>>mHmjlPbI~#1;E6NBfz]bjtJ;ё<1=LWVc~gw5oRGU*nRq7t>B:0]ʢRNٗZ4lLPQ(1.΄X PmεNJ'?g>_r^H1wg G;.b1'217t7rodKego0[LywJްیWqЛ>OgrOB&M5Y%D@uW\yU;XXuPb.Jm B9ZJ1c__uVKs!!qBB)L%I T#(w`"IOn2~n"KEFƿP6"LZ3QY41* 9ݢ"$h? :m: ѥJ1=:tA NQ'C4)BG?c>֚(6I.[zw{Dp \qvYa5 ?{tp)]6NpA̒{o `~_Y`~L`(߬IuճjEg.xI1:N^7v40Dv]66{XwiYc=w 5^spmͳ]Zuv`IF&4T+O)VR<Ҕ+EjOK__JLCWu[[l3^%BdzvhZ/IߐY4B8`NuSՔ]F"ڲl 7MyhQjmp)xeMeK5nnBη6b]mm7Ѝ'Wѭ` xCj@b(a1F0="Jcû,ݍ|a^N<&7_vlA<ɬ=+F3K62 *BfCHaQ z1t/Eȁ˟(*hbVq|z_ap'X6ZMJf@G2yI]*sr3Uq,ƵJ \sOHvsgHK{@3ZN ;Nm4Yi[N iyhd$RȂp;]aqC7g̈́⯦G C:ykBiu/KJW$'GtW_Ќy;{)(Afb`D^3Em|/˷GbmaAgw:(Qk„͠-azsń*PgȑEBJ 3Cft- `H&ƗgԆ-7I>$cD8)Õ)DE2ˋc.rs H%ܮ|enmr]"qud5 pF!6iyX^ɉ|Py Ф綀G7':w|MF8w^>hKۍm>z5g!"F[0+( m \m{7)a="94^X ٗ|zz7I8j/SGX^j cEiR9.er}T (sf|o4r{(>30*? 
$)Vnu*(pQzgGU]NkPgĕHR?cޣh66)rb57v Ƈw\tZ$F} ̽;ʒ%H4^fHErtS`䡨[b˵?pJ֩wBU˭$J d.~Mo6w8wHL߲oXؖka^){ ]%م T./ΣhbsbvbHO,8W|ݢ?S8Ő[2* {jnnj]VAПB k%X;u$'1K#1Z 2Wc2f"6wh y{dʮļҵ Qot=59ou]# }U3>_^c;K?.F[ .|Ej4P|ӆ 6=jQA&(\}:ϴK;xʓj&"= L7)}zWD4LkNv*5gzHIy}Ӌx&9w>.J|r6:#rO-m^3,O.]x01%鴱yI&R} \xF1y3Vc)}S!.j14̙)")oԟiB]$| ţ&M9`fH Ԗ'&UIml~lnkj.D&$A\j?Rw?i3f/o1]fg-|wVG!jSTT;HyeBA[r$3N-#463CDL 6*rutA YJƩM~~i=A`396eb*ZT?'X˻Z&ƏWO3Ɨ36N#%w+dwj ,:c:T݃)w&U# '3#!mQb[ch(a{KM>?TRJȪ_ǩUB=!( %Tz)%TpUUML$QY$RE\&TjL  )4*A *UbIC&#Qz)+ǃԇIԵWmaG1&hFL,)Ȓ$eI)($L$2P7`)D $$ eewv$"J39 J(fʊ;8qzږ^]\'rQtB:uQIB H)55!$  )0%%%t(|W;q)CV}ALcQsȪ^w {n's99#aUEqʛ;lgfq_m39IԩΕ MjY Nnj)l25qIKY fCk^ !U1 Hm͚G V"ZgvxPl\儓8ݰ9(̮Y&2?[Q"xN\=3+021)VHBI$d:iӜ+:.f"IMAGXHȹR&yANJDHMy]C9dfٕȍCb9B8FmϚfT_&]ujtlaӕ.pʗfCRQ˜̴F{˃aAI64\wkFŊD:;$zcq:F#s\#l*epG2 !$L',n❤iYW1׽!Rrfj.W>r9"!E![r!\\Pe\=\I&WUP`#$;ܼKI^9 ҋ $:妾 L]E#EU*b:dzEQcds%);,)^8G6I[YV!z4O9z߾aDd1$O9)8'~nv☿I>5V\OXHȯh$uW H8|Ur$V.w7L:Nz k5ζ2Ns8x"ȟNWב * ظDQä+H!{ VgJUpr9rvrM7U92}ݷhYLN^ I#OnfgVX Of^k#!I \Dɘ)$ V&,#"*¹Ċ rw8N"*\:;w;W!pwe:SS{N"KB9^4ATEUdH?- &Q##aSd{jeq='8[ ЬlmDr8$G8q=IkDlM_Wqq>/Jy#&/iQ}9&,CndI Y5sSg[&"s䙗9JéH;vm;&(liQ9ͱȪrO:gV'Sl4Oi$Sms.*Qsߛ3"<,r&,cLFH}$JA`9$qwf穆7˜E<5LO4$Qs"dsft׷}QL&gTԇG:H'Cϧz^Z$rQȟMK'PXI0]2!#'|VB)'"kP2ܲe$T"9PlͶ&x<#u 9r Eb$BsHI>+:T̎GdAν@H7NԾyj(i=>9Ql=os%/RO;Z}tjBäj}2̃׷$S ӑ8QSZ'|sǀO:jghk3\YƖ8fwG}'C.{EDr99Ԗw.Ll nr Rq"z]N/pezN!l~9I wdwk'y$뻆!Fv_v`_ R_Zvr@+0λ:Cǫ崀= 2j(rgS^@QpAsJ|pvUdTp*y-=ͯIv,EX0j[Ru9g9'9kN;ЌH jW/ܯL5ώM8T{!wdc c\3,*rS{ }. $r&D\={FW7InD2IWDdU*~K_^q5LY'u $`rLsVgag\?;02!ɩXq>'9ǥآ➦nYyw:s:p\鏽L+Ѣv䰎K (G@x'zN*ETXHZLEb#29kӞ}r' q |1rlvfje+<܆)\8+gܚo)3Y$P̂)H V89ZITRE'P_v^"']`# r*R`|8B*W,W"1Ű֗*g)zSzʒAB9$1q:;ŮkPqA\HU>Ho#Yqsf Ȳ+\#MNs XVTg$y f(,1~.Ԯ)Uzj.Te[>9t}Ly l^˄p삊K0.&L+Cd pDLÝCe雚ǒx1+Sܕô*`6x93}N$\|>M7Oa0\W#)wP&8LX̀cptZg׷nCҬO\*\.==lAw!TMHLQ~.88[#sy%(׭4\ԏPqgюtjXyH֓99Lvjn9&tUvB8Iy#LG+xڛ r(lI (0QtXɖA[v?$9aNEpreHDR'L\5"Fu w%Rfyt2${uϿ+:"ɲj,~G$IU* ِF$L9Y$py\V^ӾwN'v"M8p\jQAE* xϖԂoƠ)ӅW5ۘF[zpTRM,kLd , v^؂-, CY sVU϶&:@#㆔ ?N+m:ݘD4L~N%#Y`T+B;6dX-*Q1B8P e}ש4 )@=CSa3ԏRdLfM.,b8&>B"ǡ5R1%XpDMVɋo#eJ=wfI;rYH}lHĸ2* CPk`&)56G> k^m+W:s"+ɣ0"kW(|Q1J$xb8~MsRl<פ"bNڥfDM66Ȩ*(|{qg&r#HhNnP"M/%^ZqhAp,}u 'RtfbJw5D72l:}jH!ԜNH"guu0)+'nr!Spiep>3۾k|PQMswɼHVb#"^ңˏ &9l"r LLbET50=2tY3֫\~b"!Vj\GEn=o`+>3O۩B2.ɚac%7a~뾊sԒo_VԁygrEUZLRv13lj_Îy!#\ֽ$}uC>()ۄH̳۴&d- ɻYȵ"ep&*p"D &fȸ.jVC<\\Qqqu0zmS#J3Dg#vxxW:MLg#f~^֚y>8{90R\vk+̀xYL D\5WEn9\>B:@IΜަ{19WwxN)ۑS*b䐲GPPN=%xt5? S5_XrW 5ALȧ. VNH6f0b(QPTI󩉰"hHɢьBC,* PHJBő 214Ld $HBQ *Bd)4) 4m LLL!%2dDɑ A")3 iM fb$!!)$јJ34IC!X"K(CAe 0(b3R3ɁD$Q!dTe12L&LD1*LDYP 2*! `H"bcPJ2 XcE24(IS")I "b()&`YlbBSAI)A2 %1DM1 vJcV($#FI1J4a a!CfąF QScE2fѡ)$@1#Rd 1X &L4Ȃ ,FY!#L) ,Q $D̀M& FPH HICLKJfB(ș!!D& L P a14DD"a#$ b1fQ1 ),%H(( !"MaRh0H)Š P%$3CHd(e0b1IdY 4$3$@23b0`&BDd00b(̌ec &$($4%$(L()444ѡHL F b( H Ƒ6fCRDAl1d aLBJ!e$! ĢēL$(1R$4D ""HhS I"`&BABhALBD)DZE %%"Ha0HR`ҘIRQDabC#&( (0,FLHI2̊ D` $6ZHH @# )%) 4h0$LDRa1L)%#1h "&QHBhHY"0D!d$&FdfH@)#`̒) D3&h1D#5Sd&) 2 ƍ44J"I%&,%LJbH0RȀ IJD0A##EdE"d13%02fFQQ`$b$#eF!* BSd d141%eE̐b !I((Ja 4fE $I0JlH% B$23H*ji,4d!’RDIa! hĄdFL3BbP2$04&FԄDIB0 &D b@iD&"c4LHH &@LI2  AHS%$ġd dQM%"LB4K%D !3&S%"&"$2AIF"04BH%!JБ)6 M1$) R 4) K$HJ F1"! 3L$LALA2i`EI&I2c2d%E BS@!SBbL3$̓14hf0ѱLɌQ&$ $ȤFAA!J`$$"e,H ѤY&%jLX3Ią L3!L56&bA1&2M2dٲ$c&$hd#Ie$6#$LI!(! 6BA0LBi$LPL) `PH,eP%$3aEA)MF0H`%4dQ"#F Hc(̕ "HB&!aE@А&*! FPDMI&!hِJD2db&dĔE#$R`‰41%ZbdQ H2Ld`)1I& HFd0Kb1HͦJReT()HaRII,30јM(b#L"i2d10@E,d)"Db4F342` `QA31Z dQ`Ѧ 1!B3D%"`!1 XS(e& DQ#&Ml2#"LX0&S4DC"fd2 (0 c D5 S(Blf$ac2HHPA4 LXR&h BRd*Da Q $JdFd$4)RF&f2PŒ S2LL6IHI1 J)0LBh@Y)(Ba"&FHa!*$E)) $00(I%H$#B0"hȂ #"2Pi&40a$(DI(L@0$c! 2db@EP0&ɨҍ c As%4 F) 30EH̉i(C($E3d );I%! 
@&`a% I42cd@DE Y"dJ@!PDB1Lȑ $R"X$DBP`@AL)( I$dAIM2F$ǎ)!J%0!(lR͒@RA @ FfMdLa$Y,"Le! A0C&XCA`wW`&H`bHFcI4)AҘPM"JhibcFHFcIQ bIDaa`0@#H6 H(4fI`Df1L%)(0@J$3Rԉ!$3!#L2TQdML"e $B2hD"FLɖi$1,be, 3B!"I)IRB I1% )2 6 #A$-ILT2Q(b#3 $c$AFIe(SD0l d6hB$$ٔ $Ɍhd fSLƐDbI4$Ʋi`J`$HĴ&M)% ˜ &S0P(Pa EjD4hQI$ (I3$BС(1LĄ 1c1#Jc11c$i$A421$ƒI$ I& LљBSL &`R}I2* 4e $Rf)IJ&S3D3(AiQa)Б &"I IHA"e!AB0 iL2 FJdDhb&D iD 2a$ ̈I# RDCQFQĦH*ٍ$ P $a!d&$$1 HFCM4e E16B@@$!H) ȆJi"E& IFd›"HtRS2M(EJ!$BPlALbPI2#1c $14 "F(4ɂb FlB (bA &`IHA &2ibilM` RAPKE 4c(&QR4D $ @ًSđe#2LIe "Fe,řb DX11,Lf@d„R2P`bH3Y)2$@f)2,$&cFiM$b&LbaI# #6)cDQD1II$2` )FPd1H0L"bDh҆LPAFbCh4I) 30PL@i2SE$b D("HLDJHR"A3 SD c`H̙ 1,PIe"&d$Ҙd#PHdBB#$!6`!$PA0fQLEA$hB#(كLCE4i L e"`(aFY#I$aLB3ц#aFR#A&J$L"&S&D$&@TFL܈ X@(HAЌDY 3&Pe (LdȉLD &R4#6%024ƅ DEM#&ȣE͑%%62LA!0Ld4L B4`dP4(ƌM!#,a,$ ZfLDdLLF,$$"4Dъ &$$$(bFRM iI3 L Fe$IJPb RBR$&KK&F)E"X1`DdI, DFBf 4`N6J6`Päca%3R(! ى$AŠ%1d!D2 E0 "!32iF $b B@$ &ɐ̐I$i"0!r) 0Ҙ#HJ $*BK$#L@ QfiR$h IJJ), D0ƆP 2 Bf3#""4!$J@L"$ɓde"QF44!$d  ̔P0ԊL4bQHb1d`̕%%)F1fB 2$1!"!Hd0LdQ3"FCb0ѰD‚ Ȧ33"! # E"RER f#d,"EDYFlL$h)$3i2cI4`*2)BI %,T&SFH0A4Q"a1#B H2! DF`Y%MKƁ3 S%@Ań6$dEJF &$LfCAQ0fB&4$"hD BDJLȴ1E1 (ȔeĊdI0S (%$$diBeM& D LdDXB 1daAL hi#PC $5%IX̌HEMA2M)PS(IL%lR B$L4&)1I$D&1d,))QL@e&LɢHfd0SbH!$`ɐ0 ""D$I0fH4",, dRQ1"@Bj,-E!IH$1$,IEAPa@ĒbD4$d!Bdd6`JR&`!BQcFTRDi20 B$Ȃ&@)H%+$D`(D ER` %&ZakVֶOQU~KxQZ  s7;7yysԕ@m]KqsyIYSI 4l5b+,qp$؆W7K]lpvf-!?ݥs Z 099)Yndqb8dM u!N\*21d؋#fj5$O*:_(IuAy `k!N*#$rUG|^e)w OJ0ڈJ **+UEsIt]rQ:1W'>s4W4tz*Ҏ^8i s+ICW'('RyGfi,I4ff٬S\d_- 8RJ9\E\%/&9vH"RyQtOԝ4JuW`Vٲ Ƭb11cXHTb6EҺUhEDVQɴV*,h,Zf#h\4XlPk@b1lbcjMwk,Z)66 RV1EfR`ةTa Z654Ti66bd֍Ddm4j-cImh&Ɗэ4Xԛ",clh#IEcm6-,cQFFi6ЖEDQbJ-FkDJbc%5Ih5E6nnѱV*1TQEccS65F66b-r( I4mE\ڹ+C@'~!!RۛEE Z st6lj-(QɈGjt*/Έ_l )$dwv̴2 [d+4v9!ǖmf3KbKzk&m=u\:69jdH e͢BH-lG+V^CVvEvʩ! f$OSd9lg'46dJ͚.@2K ЛNW$qد@RG]o9xM"\H%cKI%jtEZ5hK쀿pg*Q.YR]R_|S!=˩w] U+G(Tx2ű-jM5:Ix\Jb:p#"d<J'P$9J])sjyԒ<̤N\AW!G+v:a^w QwRJ9җ~j瞝yNxD:/&UG@jW5Ju}is\a]p'pWu%ByfmfB42% 1*dIZz2H3DRA$f%!@,&ȳKE FM("Zm٣l:N#8Bxy(cmdBF!`* əI$Ab,mX`ْ&1"R)1IREf36qBy0ptR#wf6ڶ4lZŨ؃b5FlZchaB?TДNѨ i,lF/7+؍BEj b(,55mL%FVMXƲk@FѱDbRa-&TmMQƒ 1(5EJAmMhF̪ j@RX-6)*-jM&-IFQF,cE#Db6ɢBmрAb zFW9D5˚A(xjrAS~b"㠮".)G%*@J6{j(|j! v챮nWwb$](y> *.p~iDUQ9Q\x~#UuT+W8 NR Syqyz㯕 !@rox Ւ6 Nuy[9VߙrdScq\B5r󘆡7 4!` %sy7{P)+LI@) 9/$unr{{Lq+\ j:s5Bew 5s$)&! #'!BZZ I]NZ$=9e;  nKXd-h)'((ruJ @{wyl9pnra˓eYMNIr53SM!sAi-NB7^@p伨MPzBSsM&̹7/%#%2GdBѸ]OnKֆ|2MddPP$ȥ22]caz/NdE&Z'@ԇ079=R.5>Ó rMC.=B.익.Iw)ytECHPjΥ(Sd^PG@{)7 o 77{]Nn!MCM ARrJ) \sKyd7SFyreA'R3Qu-AR(ς?.;HڟE2U_00Q~XYun5-I*.qUp!ڈU^SEҗ'R$dڣ%q+4(Alڼ\P1Ex? 
5("]j%!'J ރXef"-YQW>Z^ m6&– (JAiTZhQe[6-EbEP(*"mKdmlմ 6lamKdm!h6Se) RU6H6M d m[j)ؖ6-Ull[(ٵMJl6lڛm6![ImIM![R[Hڭ[mddj+i[[TlڍmUA­l*S5Ce[JmV#h(DFMmRKjAQ$8s:Yģ]ڒyY(2k <0K ym!2]H_VFŹ|GLvrm]KepAqXn&= BN)Q!Lflpf[5ܡxXy \-HccoVH$W gqg6U8[3 Yե~TApyMBZnR?IqKAR LB Xb7|-mPD0_Džf~׫zn7g{?^N hl\!yH۶?:-mͣ+_3/j>2^=q:^}2K.H2U]B/0IԭV7 5V\3jϽ{pnF!\I3*(n¥k39j(Zzeâ`.BR)5w=➅>OS>"lfӃvZ|b5؊rԎ}R$Z*YRyZmWB Exa81zz?4dn z~tqA$vcD wn%H۹8UIKb Pi̭ʢ%tP]}vK5==ڴP.{,qѲP[e a$d4\ .<8ŞEe +#=go'>jDi;RﶹqUܪ~aWֲJytZjTdJ`DdiQđ#GP:-E0`~x\+/Y+:9&SG ?Z/u;^&-{SM\r;+J'c6tssgD"'d*殕 $( Y>ל)nٴv%4JnTX;KP5FmL&AT?DZce9f4ۓkr>YFnDyؐ WFҨ&FF:ce+-ήRbGT|ch,Z kKw5m 76Ѧd #B&FG2SPzi2gj=s灱l0Z= ʮ- v q#<:7O{YIuǮ{\˜i0Y/?TΤ= PPljsP@A`]Uc赹$5h;˗a"pc9þx퟉A@v%K GdڲGj5(C1Txx7v:|eG²yh&H]>2-{P o` ]w2 3Ga`ĘѺ1E"%9 !>4vؼ>qß/Ɏ`h=1'5Gσع}<#bTQ6,!l3N/ǬY /w(m5)2SаAA9!QD2̪+ 5IgCUE!Au,ZEYm~U5ASs;/z۩dx{ݕotsf7=j*&"oν5" si:w굫uhBj7Q2 6r继6Q~duM  ȫb ˇs*j4?af=b>[\*uQd{< #f0֮a ~V} n/+.qĈ2өeYېhgC= , 3g:|k\o8- GIJYMe_UMP6}(I-M7qp L!VGLKH;!,NyNQ[ʂE#U {NC=[7:z &@{8]lٯ-ONcz9&D+vMޑAR\nV!%])߳df_;{>7p* !4n_)+Ui#2LMg d>f,HN\f谮3HաC-ol%t[<2,U^,%% %EG\H>BL~UMR(񷺌b\2{=vX@`Mv`Zs7f{2D|ƞD@ot'%d!sH$>=(Sa7&L2FQ+ٟ+<יv(JIIph<CiUj^w@=r;,8H)w"hZx2hCܶFGM<$K~)!VErFC:)]1"61&ddBV݆e)*ܷw!C{ $x-R|m?GO`AwX[|歶nmǰZ@^%#=07B PgFr"@ѱƢ(›~H:yDөگ֝y4S>wkP([?awQ.{t40m:#LqwҶ{nquQxe۠ $T;-QA ڜ}Lɚ,aUFZ/UPit.B}AN&1 'LquB"uN8?S Yϔ7 {;ǰS01}(aa~h]z}FuV4С2;BӣeRMiFRcvD(m,F\9_enߝh/ j3:DrOn%}KiwJtɸc̟|sb(Nf0>UkC$;Dz! >J}ou)]'&(rP^xp/.5|Sp^65|gn 6\EQr=)~+G]EWK?͚W9eRd{*./)p7tbCAoiӻe"؄"@^[c2u9T^-" ,ƙV]:Z+?9"q$E4S",O9Ef _pZfUb֗'C7˙E3b^wKrGPA %@Q)t>ȴ'( oM/ˁQwGISYe>FC|&q7oR'.aDG&&uQ9|;XdSf\F}ٿU8Q1Au$tPe"Pv1`!m,a쮡n3K»dYGR~-O'װ! ;@vơt7Уcn~yGt;H}#yU6u7 !CpGـ+ebwJ˝ħm%(Vߠrn$;oH?ڷ]0'HꎳM[Ql:jXI=A;S(.}C!Q!+F(osr-Lu ',hq!8vpj]ƉRIÂwmqGʑ aHͺ.YڸToV⃹*C/G*S!)m1.5wv~уz.ܵ[cvVxn`2=۷s, 2Xxk.%p㵼j)r1;@ ŋبvvJ$oOn#~~+C0Ѵ:ǣeƹFʈqfyӤMA>}ܬxhj1tj9,|a_whhz<+$Jr;WZ| I3;fł:~\jϞkܙxa !NGl>mZw@St5Yl}\]2t#S[~^kf2?|Z6%#d|VrfGpPH0J rJx-̟tuX{S o11 *d,98^ *|xf6Gqxˌ+QlX\}ק(;'w&i?{BU[r ģX3g#D H?-|QW`]}anbMZBΊ b:^#. Lo2V5514Nsyh Ae*.>d2s ᷰI8Rbb\MaߑH?(͔ΌbgSDϑN'2. #-C.Fh5n=nc-);џ}.~,}™".徛ȈxD CgYĄkjKi"u+4ȇnU$G 4b̞C}b:/7>N,1ȹ"q|UޕqDinC%߆O}^r;8et~E\ĖoE(C[lnїPZF֗HCWAVm&6+w0eϛm-W"ᦗK49HIӭd>!N  ng:4Xn ߶WvYu2,AΣ fawzzSCW "Ğ3e*OVΰ4B6o9/ȗh49E^qR|:.ҚÉ|Rl:"Oʳku^X#ġ4LT; qazO-\ovW/Tt !&JgUA5>"bK请7,THJoTM&W(*:|԰"2J#H$V;MN-$hnskrH dKBKI4$0Nh0k+. 1g: TazDlgv$'.Q-Km >Zcu4| $0 boE`@-]i;@.Bn^#Xuȶ>h4-TU0l_#"7qavF]T@ ] CB]PDFc$_DekF4awpybl4H=?2+OdX+=.kdI2^29(ŎQ&yRz$AߛdH#R<|g0k}eؗFD>Ut0$gĨqK!'ZԸJRAoIyd:eVlT줆܈jB\="MRY_my_(d@yȕ6F]Rb/VF96el*nvo_O>s_;}g|2K?ہ"e^Ac VxX7X6NC9).&_lURR7M#-Woȱwr Am'vTիB0Huy噎V2cQ-֧ҏ~j{A Y~r)!KM< 5A&^e֒Z~ds ԷGX4/IϽo';B;Nek2r7L Gt;rݫ+d]fnFԡ\P4k;A4#sȽc-B_ 20g j{geܠ<$5ӛL]Bvw - lW'/ ,t$\fnL :Q1_6>ZTxo\(B3=V :j ٪ DP.e(j͚L4yw>c7}j" @"E4Ee=RydMfx@ 5r: UJ6 5 (4"cKmBak-$Zimǀ ʁc"&AJXUXզVuu-"B4[m AZk#Z5EHCcI4-!@L&&LMMF` 0TdLF5JD23. 
i+mNf(i#$][Dor@4sT߇d50⵮{P3pq#֜nwQZ6 MsO~ a ؊R"aX $WLH UA=5s?*I~#Z[XgY%I$@ȯG6^$0&H{e=.|* s,buhE[&n}MHBe*}a\P ^r5q/n/4b"2Ol{t=fEbF߅ ZtUPkjޕpklpW[E )ʭ-4OΓV1p~+73+sFKnU^UCÁDlbAvJϻn,lhFB54әW8n`MŊF3¨ܼĔSA.Rfpڙ y3]Z~SX@ElEhX#a6\ZǁKkؔPnE:M5{(B5,sVQ 1uz9E6=m.kK q&S5 6u{11$V\(Bd3vML^91Oad_TuH&'xhJkm|\b $nA|w2΃}]d= Ja6L/psy}8 v^^^&GYfs./%l_)XœX|Вȃn;cf_'Fͤf:_ v)3-ڔg }VqR*!#KA㦦$,YhO+6Q@Yh]etk:$X;gL"5֏͓ q+?m|DrPľӜ?TWϰ']٤a:]H~ tnRR-mS$?0u`bZ9%+Ae #"4MJYE>ܕ)CQ$]5nF͝㯯eM LJ6-㭱b/wӝ+&0wrF[^6ca,0y?n0턉Ǡf/b314$ e[W]9'Qߞ?J0qأjMަY6 Hrd%֯ 6+zU鉏gԨ*^aTl[ zs}H8z{z( x)2U 32_W|¨{Ee$aV~YK3Vo(dTf5e.J jOp{\x`X`l,ϧ[6D*xWeu3&u;h,y#l'>7難1CTl 8>q1;g))Lf<ߘO\ E4Sr|({\َ%ܗ;A]ßq8{ȏt)m}Ε 1Vk)]BbAي7YI*뾋Ho(G?}p\YH/4f,1S-&']ňf2 iɝ"΅ۥ7IdbajIW|#*N s+wT&d0/%ȅHipDR 14^oW(p8ŪD:eغUJUfXpU};ӲC7h~Kz,R~,Ȍ5܃ 詬}>^Fy ՛DeN y(5K/S% _'}~d>2`$2@픦,6.)P80"j"5s2Bʓ?Vkۋ eL1 N%3VSED햞oUT۞dGlcG ~7L񥳛aRy{!e'O+04QoL+>!/U|6^,X`z`WY'z> f9+˚z8R``-矕.%f\CF8kB|y%KXK[.Xu˿qM59|ڜ&n8jB]˰Ξ 2+3._ذ=93=0`HN6 >@m)atiwBI&)~t7QVç&:&{\^@"$야mZP|մsRPq,\f{Q{;϶U[..BםD dGBF16BE(Pw ҬJ8vz׹G"1aUAԕʎy& A{4k>{?ڒ+ո(P(0.ٯ2X_$.ܶ4Ɦl:γ\X8q!_贵[2tAOo؍%n iV6ضUPBOnkxz?S8S\(iyI6h\-&F;,ޟrzHK)])nXie!H5vܸI1nCH%p֤".NTgڵ !JY>+4"C0q@3H2a.7yB&-5!~͛P:' ^>" )Rq7 9ww1itT\vfZRZDK7?tw3nƕdB1lcx 42ہ}3>Sj+qC%QCWws8\z/n=׿o-/ ]y >kޕ9I߿#|wf7&IZ +Z.s{=e(HcӔ7yJ<NjE52R~X{8.|&t*&Aw\lǦ%hgy)/!$C*}b,Z 1,r:?+SCX#1ʹ/ƺ9_d*z]Պ= }BeĺidPAO6{-:*̳G|ٽv<9}QiG3txXmV~MsߑTR3X"K=boG9Ѝfoލ8OArC\pwװ >Mf.Iomx6LhAk.є hXI*诋EULSC@قޝo8jrp=~X㗰a@5`PTD.2G`!.-_bEd^v''vI}?br=g$;-30>[udNdb N#2󊖓Cq!vw 騁ژ=DT s<ف)uB>n ||XCʚCA"Z1[UUU^2^fpX'[/yiaXu΃bnz+SJ'~b&~0˟АWY6lD\LK?nD/61 !U}I/ڡVƗK X˥J߸P땑S^о-BɵaiyGQ;Y%4 B`_-ȡ.B]UIvPŢ; B3[8Ш6Amv ^ƿ#l/a0! n֡+D@qXF]N>_*YD-qz1ж>flusU :p:X?4ZU~ݯ7A.Yȭ+̓)1a؂]+`TCIAp/U=jP}0*P mU=e7 2 .`v -h-bs A~gfq4MgCfp15a]Gy Nuj>VLECߟ6.Gzdz] ʛKe֭vפo)eh~x!E 8yc[)jM#adޛ>Fxzz8ʔH px@&[1# 'PA5š/ݒ%.|0:ZfJt-J?uiax ؏l3f8u*c U495|m\N 3O`Ƥ>Z=E-7]?;ke$ՔXк 4pjaVo0Hj΅rRFL>єoy٩V2)2术c2D{0EM^8sMגܧ qvjQZOp7bngR$'Sf=ci['EqMY>oWF ' ɪkI)pF"(,K`mHtw@0, g,:8k!eY:Ps }~xL [y$PW v-|9em·nMW @vkJz%QwrVTM݇ V)Eϫq?^M$%mȴX܏I'rUW7Y,nv։Y h}^sUBLę!j֔GΑ(A Ɇ .tFL=?Ģ\~|`)`K^g^\7*/)&~rYPdzRCM@,a3vJ|fkwJ=&2&qh?4?} : Vwߞn&^%{jv .u4= ]p6f \QO-R)svWUfQggܪ{W4{y͡7TT^S\{~B@nN{ XmH)g_ 7=3~aXw;*o$h~}gGO#:a#|#_-3,ra[|R굶hC]^9Le4t3S; 9,ӐnfDhRb6M攎os[Qo9[tM(,%pҋO@1 mڱ Y*k/攮{Є5V+\ȵ`B SHX.%hscPuq/{Bp'u@'2*ΰ5wb;kL#f9P~(Iw9_9}s֬ˍ{$`mUI D#Wk*,,"_Rj߻O)lzvWxeGXF2<\ix=z>i ktV~d T΁E7M<_Zj\>¬))}Fa-"_.Mzpf2zw\qt4` 3 H O&URNpx )uO/{W"v!?sS *Nz󯚵Wh2 4Mq_M%?_xDlxtGeHG}P(:xe֋<Ʈ <4flm[Ն 6Ҙ>NcQ,n!^D/l/W^@k 8Du¢gT/ ƿU[%\} NVsWF?7cp|n~Ӊ@dS $S%TGAr O'3'_1 o0tq Vڊ? 9_$W!^%:%gLqGУ:Q1=d@mSRr,p-3 0 -Xh+Apr|UO2,.r@;Q 1|ԏPr^EX40X|h<ی9\юӻ)jwnaA=mBv4.0ILg Cj-:4O͍ ٘fb"(hd@ܲӆچƽ-r1u#( ˼U#%n Ƣ*Z J\GҨghJ*%m@."Z]Sj$e&kiNCMA-VV^ȯ隂y޼ⓖvNkJ7a#Nv8J*fJA~_&΋e Zmҳh.i4 w9@rB&E$*\r $s&. BߌOȤlwVT~UDfJo&6gfѶn-dq=BvoB^PPFAʻPP]/}ǽ*^6NdIp{jezs Rԭ S0]΂}!hI[u ͼ{oW֯8 Vnu,` Y,SVq!H Kqq5sM ; !Rno*&+ֿU@m5&\8nzd>-ksm^ǢBgU0Ɍ-aZMna*Fb8Ѥ8f0XiQ17Y4d@?:nG#7-< >vyK}=rz]Pg^Ys{N'yuYp#!9LZ+)n}O͉f L{3{ Ŭ *$>^Z;SbOYX?qR~S?a\W<+j+@']|.#7rUiq ;'jIȏ`ƛ]}ahP|}.oR51Z$hA/'yw [՘eIm^4 3Pe1 54 + +(Ku|8yQYζe$+'+gie+^Oog(JR̝C^E?-+J!O7CGW"iiZk43usQ!YmI |{{J=DU?;Q&'&@Cg jw=RFV2=4zm4y Iy84eRMFw)@=a(rf*g&} #9j*F3]OMp–@[nyYh#]vdpPu֌"-xM>TH΅֮M-MP>)$!!%E< kJ$ Hv "=I$$,3K |:0-Uf6Қժ+mqŬYJ[Ƶ2&f֍mb-m5jOŪΪյe`B!$ A$eL`@$S( #6b$52CI (LC!( , $BP #I2%$ @fb&Ab%2dl&B1 $$D))(C1hQ(BD%$ )@Hl3$A`Ta4$m,d 0ș )40$f$cbD2E)R&H2D(ESqDTDdMJDȔbf` $M$Re2E$! 
,beFFA3$D"4I4M H2Bb `D"+HXf3&)0QF )H 2iBA@#H`I IY0$#$2L# J@QQA cYb24d"`AP!JLEI& !0a 1"B2*fBDI@fSE0LeiM% B$Bc#%P2%!I&hS @1CDdXJ$ i*,Ĥ&Y)&$4E@ARM  B(FHdPd )A D(4a Q3#RB ȉ$JH$aJd$$LDa,@a)Hd(PādfDQ b 2%H$ȚeD@4$1JlAL$lDf$HD" ̄(ID2L (Y L!P%1&110#ƅ6"D,&)3 0j,DbE1%a)M$4LJ$$2d$Ldq̓3e!"əLT2EeJ4J$ R1 dID`H1J"2"Q4H0H1IA&bQ,`%,4LLD`fPf%$5H1hF$JQHJR`C1#L fQE2I#"P&JL2bd2Ad2q*P E%12fS2d2MDdHablLD 4$ ALAq BDH 2& J2T2!ASLfcI 4dJ``Ij #(LF2$Hbȑ̑JI)HL$QI )`f 3 61FLX@fBBHR# E3ɘ,i$(1ALђ",F ,&Ldf( @B1 4IB)Sf1#M( 2 dҒ* eH dDF (HHF4H!M$ aMED&lF2d@h64RRR!$A)EAb L3@d 1 BAH2 LHEJR(S" HAJ$FE L  ha)dlŠ4M &"fF4 0%RDP3610FPbCIMJ&X,IK!&iE)F ,IC&0R `!($)&Pј!el4Y&"l!LLl0i"`$LD%b"DdY LɦHdSA"Pd"&LT! dA)Ɋ%F`"HfC#*S3Dd2B2APT2"cRBb$Q RBRM1 (bJ$L4e1f(ALLSaB10&2Ph4`L&@Q FbLJ`XdB@R))`D!"6Qm31P22&BaAIL("@c3LX%Ld50DFFF0Xlb2"0S e6)" fRI441Q&"4 D #E $č$1)LDLS4B$ LC L#PX0 FAc##$4(đ20( e,$)Ib$ ScA&&EBB&($P2YIH0%$S(BL@2b$F)MH2ITRMIH, 00T&$#@ $)#b( "#&fFA!" *"a)d  S$%LPe@&! (XDC4$&f$D0FB F12XFL4 &&ƒ)02$3 )e1D Y)"24@1@D$e$)")1& #,I IdI0J168\0Ff 44P&L!!"b3 i eFDFI ,2X$)`J)dj2F hĀǏf`1 !40HR%$bSDbA c,ĈI1h$$Abk)LZBR@D(!L,52A!BFAl0l`$i(*d"i1IM4T &J6#!IH1I4!&FaH4JLQLPiE$ I" 1A`aI&) 1$I`A(E$IB!1I$ D$IB%4fQRCC+E a2Dh6(,L2$dbI3,@$4Ia1DbD"D30Qa & !$iDIJ#!1@Q@H&S32Ca2PB̒`̣I  `Fi%e&&2", HK6"2 b#2X&2SHJP@$"&bRd#DSBR2hSSFf0hd(Ȉd0 db$BC,BdLRE"JM $E!2$ `!% "R*HX&I@ D! 1H 4Ri2HM 2aLK$ FI"L4S&c#ifI$B"eIaB(LQ2 C4I$C"3 DFI"cdȄ`LC $Ć#h"D"i R`f4i ,2S`K M! AHTԌ0 (,)2PŢb0 F#b DLBC!`LBXD%"%BQ ‰!)&$b6bE0A YPQdL&@"i(R$ dRFS"# Ab HJd$3TH3R`(hE"i(ad#R͒"" I&"*@fDa1P%1Cb3I-0RC4؉6M&4h)#!$2P21$$QQ24YE3d LD0H$1l4m%J`$D4%$Db&@H(E)D6"$I A`&bQE"H$Q&HAh# Ib4HR(ƅ412E$E`L!IAM,̌Fa@d,4B14fI4 H($!@X IJc20DI $hjJS(A1Ih4S2XIHdRAA" fA)DR2i mXB$D"SdbQD4XIA0aI)LQ& "@Hb$$dc$0l"œ&10H HdD$ @#Y"F$IhF ac4M&J#&La@bdf@F! BDD lR) c%$QL$F$0$2F""$m(Dƒ"HLfDd%1$$a J4"0AB$ fأIIR1(b,L6$CIFC  "E$B"&,b$ ,Q%bDf@A3" C f415$d& TQHcYXfJIb4JQ%J"`3@1#b)I0D4$I $%X)J( 0dL"5(Ri,Q3 "L"!2-BI*S@B!Lb0&eJHM!)`I4&0 L2f*S6aH$ɚ I%2CH 2Ѐe HIHLheDX$L$EEM (` )$1(J"EH2Xj34̈`PIH1aɰiP($RcFDc"2h 4(aLMfDDĢ0 S(2l2I"1@Bh@ ‘ 4D"2&YHDI2a )!I(CRE ,ca0Pfb4lJ f ƀ PQL!L"DRB 2LiH1 %3" $D#( i$C@fMAȒ2""B13H f1(61"ě#4)J$3If$""(O@;@ u" ETR]cYP tت+H.lUZ 4M'z^.'Ck/2$fh>hanX= Cj.RO.ƍA~dhw̲/\XMa`:<+کx Qr=T K^OQK9U;@+wU CPO<3Ҝ ?R.zpqcq0;UAjgHc>%N)\GƛQhp0%D*Hu}ẂƠ64."#=4;;!;hO^RZ+!-;Q4.Ah-Yfi"L r H~dVLUxY(񢜀)"Cx-҈Lʍb y7VE|uiQGJ2K1ؐkcLRM uVU)n[3 $͆4qi(Qc)e⧮>H hv A/NѻEt%L#KE9Gï~N[}%G=-r=KƳ, +/Z͉ ?<9@5eZ rTW⬏Y,X֘ZJZ٠9Fz}*9$lLDLY[ TgH'4_ȱupWA jTbߪfvNXsi8ב5#} rmE$iNԉiVlyqo`gVt[q61ɭk&1{M^@SO18b,lF;VǪaŶN-W#_t5!,% j'9Sin08٘#yKi):D,rMe<]Axnmhi"lI'ߡ St_?g)ri/j+ދjt_E巼Pzl%i՚Co$HB50[QԤ=`3,myץKҪ|aVB{{4#s;04'-q-Kmg0`sU.bDvDj[Cz {49r{n24B`6dqyjD<$sMԀo &0 J)@al?ͅOv0I^hUOg=Vf4wsLpA]iBbn DE":!޾۳ksk Bm3p@N1%5\}6J[Tc[5p~$@{Vwpɒj<W{ޜ;hb,64EIkDCTR3,<{gQ> K=B)Ӑ&(OXHB1!A܋: 9H%T٦׉kIcm "Y;Z(-U\\x{7 ;[ܝ\L,wiS}BEOu9oe?Z 鑢ޥf3p.Mi A9FUӨa4UtЍ<8f "qS!TO7XCQ=Ow dUC-%R?^ JV'@OY/ !yt(` U}L"a sd Dx2qnliYzuvB!ov꠳A֪TE9+^iFrCH^]WӤl_8@y( C ٷt ¾ִgO2gT{ I+KTvk'Xp+c~.~oiaNNFa&k:CƩf(;mEXd5^VnĐ&z ZF~p EPb0OtatL<@X% 8+*Yx8h;Ґ \ qJO~%AZc^3Zi[)$XIoT %hJ >gd #krmx5r9RY,tGu@J';&oͲv'/B) YQc 0ҥeFqO:ky"フ2J:efXTj|I^r( ,f=f*= UƜ1勤D" $qXr.Ċv24 cbY2ol|b" 35ǧd'c_)vy,:`^ʋt/v(1-i?- yRozf"D?|4(}OK7>IJP;,gS55AӠG =O{Bs{꽋2q (`^n)Hٛ'9|a 1z<L_ $&D:M~.kؐMvAQoEJ/ayyi ldt{]=+ZTs#?:uSdl۳`sdHD80ImמsGIV0iXɅ봈!g$DXf4c^&rf{KdAH[7$fYIq;XQ cPX18ečB2fh :ivS:id08u5~?w>ÕlBA$=B"? 59]MW(P (2+A} }>/+H@5e$XpG nY Bcb>4zS7C?kF0)EW[8rN;dgP^c#C s ir3`I.NKi*]B%ZLѐ23~ZL `FYꯔ DSƨDD#uBlfv<gco:Z1HE-]Yܜ" ʆB i=A70ATezwi_F^. ؂%1$,ۡJgu(]-v.!w :7deuj[4T\[ W|fBz_Pn,э7]- K\X4G"_H"{GJX;(82aAY9+kr9޻xE<|liTDƧAj[43G-dZ)x(+)_~%mg|2!BAk! 
_bTKeK ՜6aw×Il~6Q(b9uk L-*NygJFXK-1F̼!Mj qQC3(O&>3(DL^65DG`!ttGcR: z0Q Qc:DK,ΌB>hSTd/}x6*}b*a@sӛg&-pb4b ~{DK>5dTRc̘ 9 i{j -.YOӓdn̡;Pʢk1jP4SS%V熢cJs-F\k x ;J:^L*&vL0si PZ@|m절Y"# H86[/'O+s JP˷^ӑv#G/a Y^^PxC1ȬnJV *5Ρ+V#׍oNBD9 =ZInA}H,8E~Il̺^M=A +,9jե"<*e#a>7V9)P'\]1$9Ƥ/!eFf胎n4:rQQ;d %'aԠ:8ޔa Pޛ:(5EꤟDW~e{sxHlnA>Kkڞ<ĦCwP$d'bm Ř o^&Y`iV>ه3gvǥqfuĵlR(!խ[0!7*kPlOB uQdڡO)5i@Dft | U'췡%Ǘ+PQC1ib.,r>MǡI~  Ts4Vj>˭zD,Qa,"{gC945> ÒFM\|!R'*Is*H#PIdq, gF3팚qs:83= r^8 B)1"'ua~C晈z xtc1eѱ~}jzM eL x2:F QI1=cawSVaqBݵ뷃fïvȈT\Pj-&둧|$ҐqO‰) c*4J7ܸߔ2}3ߠr*#8 mR3~=(e"á2, /(۵3%z3xf-TOoj@&RQTx#ͪFWRLe1U#0 iPk)ByT5lH&~4kPJC0S.=*&޷gI( @_۲G$.f}=bmn1'#:M9^;T#[r(H|5M 8;]X0HunStoڌ$;r@N*1Ddif`*K`%nm 4 翴Bm}gU))AxWu8`EQŇ8GN|Rn:?bIe6ûǓvQiw< lWrz }Am-]BM0&OZ1ya|i>& 00ˆ gN2T$ 鉘,F!΃;xh?;jXKeTIa3M98lvn閠sSUtqXߴ\Ƙ4; D9ן9b3 OލlkJ/폈:'M*YX#F!IsR֎[iXL]9`3ϣ䱳V K~Q2""G?&(_P3!]-摨@K7X8`Q5ןpW+!piSaĖ0_v!JsXsM8oyX7XS{%TPIAdC}c[4jN]S41!!yN!ʎf͙=CSL3FS, +pI0,(FYKoVDgIE7p 1׌]2{ vLPv-Vaܛ'B9nڠ.Hx >~b2bc6;zf'@RH j͝ЀS1n]_5l>C!? 6[>ъ 0( zxs>Qaɿ&VaV俒z@an Pz-Ðö1OV;4HjD#)DyB}ٖXH 7{;yT#Ai: -̓ذS *%,qÅƐf8kM 1 #1-Љd.㜢0"uvD* =a4!(2i8r'\~o,eA?B^" \}/933'#\e,Y0V,qp#1'Oao 7SYHA@NOneFlidX "J ?=ѹiЇ3nq]S2) i p,@&1_]PQd*7R9@nHq֛HiH$j#KZ *PX8^niIeJ|eU͌HB cE@c&!=,\L3]byP4{k*e|'ɢxxEk/Uew|ڨOigqO/ UB !S 欓Ń_$Ǡ|`K>a0:_=ދπ~o<  6Rа6\0ዄ9i}jbm'A],&¥>m{cljy_bXY_%kzʦ} D+ՠhČ.dItJ-2N|.c.R;<Ы,e FXHA 2n-&<6 q>@z p⯍馔 Rg|b"SqJ&K8Y%D|~J+gzpؤ6T C`#NdL~Z==BHֲasj֫mҴֶ&@$cFI?.p!.recipes/data/biomass.RData0000644000177700017770000002401413064546045016531 0ustar herbrandtherbrandt7zXZi"6!X7'])TW"nRʟt<3n2uAv ˙Xz]Vݖxg˛4nng-(Տ #M;[=*9&ƄkJwcOv$]HLL+T(1nGȕ{#LLb")MU]."%gȭn9Ry&O1a'^+7eȝզ{ab#ӔԒ gɁr\bspFgiGl9dxfG=]EJ(t\K&ggR\$7Gًr.Bs^mt} U< N`G4+ #r:`1ir'K0=xi(nmvjn cʥJvdҰ7^JܿA3Q\pn)}lVN-CV6Jhtg 齐h ?!G 0(ipGF$:KmBTLʛV}07FBJkBAW <$ B%V4*,'x[bHۄ#K )!7;\h,u !AG@b~o籸 ;tjAv婙ik<7,M eJ낕PUGĆ 64[^6 xcj }д=j$h.Y/s+ Ypn/ #EKOy",NMKRǛTl/ôǽ"Ytpfv1YhבϳnaԦF_ּcB2[9aS>r/Cy:ʔ sJ("z(YAx}<^58*5kСa+ߌ.^8OgsߎP ӆa0gk^F 7 3EGj9$e,%o8~VOsFI 5EdilӢ EI6!{$>Jd!z}"uښ=-R%c"v*FfqI!C ^D;#^ (p ]S3 cIv$Z5.BJΝDce%RT11"798 KY7tX/1O/!z~ެ)^Ӑ>&rhroXCG5{y[gfhGt$a\n2׎l-m3 ֻ^/ wzmzZ7G ϝSD@RB5)Dx2wBq.|W9i+w1c'/J`nU,_{1p;Jz ]{x}PX;Њ$7e$E`Oj6O`+ .MMTo`Pу y\ 8eI EHI}*Yc|xꌨ|YKT+fG#5aAP-t>% uE4 2mG}(la8/ N*˘JT< 㥑MAIY9zUӇsM/ݹym] 9Cvͮ=VYA⇩.Aln!0& Ә{?/H , WUYWExl%*hEk"mhElX8¯ACe5s"{noQ" r&[X# wmͅ."``'>;l'=ՠ^tzRHy΄[.}AA*_tdlpj`eܼw?8x3}U|;Ke)nߝ* :3E٬5XsX(u@ȹhhBPf1*Xܜ KCTM`i9O;bp稴(^gKY[1T7Ɗ=Ian|L_ }['q^5ײ4XėwSJ[ ETaUù:Dܠ cK+!w]*e2~\p5CwgKvFVR^'`'e񅚬*&sA9&6#  qTeL(qj|5QmHa7Ӵ̠F,(gzsS jU*M RKMH?S$vFafp,B7ǵnF6eji xMꤟ|<$R2%Hh17JCdc0Rwf(`p0ISL +un& )2d"Z }d+'|[*Ô݃xYL1\Rv78xz苣giD>aaFw+jx~xky 1n]KE,hsf(]XOA\ FF$#݇,@D\b嵹orG4P*D4;ܿM]Ma]Ĥ|__ds!qA/(c;8UyT'oy/@Y\;9Wx,%~j3#M/•Ftj-X22:u!f3Db `gA0j .dc#uTYIך6 m"~/ѿ- ſt<)19( A|[T1C;cފXz@so.Z\Z&fN#*p!A8f`=Fs/ U O >9cg=SH<|U zJE⠫5U9 'zps-|U.-XnP0t_xZqN__xϒ`"ˆi]O%6=E SKb  W&쯌 >j5Oل X{C9d67Ǎ^:-i(k[ԙՉǹF&QQ~[}]"KIʖ;|X"xg!úS]y9뷿3[teӅ5X52nVERWH!m{%0IU;Q=nkx\@lcjHhVaC rrkDG\F4cseQ#o9*ckI'Mp?㻹}su7&[d0PD;]+A\i}f,J17DO7EF~$<KZ@=[DcEt&BH;IX ;㦿f3'xXw~~Tj$PTþS1dH\=Zr[\ i 4E>xG l:(sTWX㖁d;&J }*/MM = -0*r^gR 7lp3;,IHD"qJ†e7ڱ̘x4b xu*4h ),/_FOE;F`"H"LPrkw`Jmv *!ȆڋR\'$h_8uZn_b'j~T mI{Px&{Cخ֑aw)=!# 9}yGWWNJd=tArәxـ\RF!ae>cpEhn Ȃ/7*F{ݷ#KMhJ.~Z !pLN5@ .{.sΚZ41)Cnd J֚;+(?YoLxМ6QfޛeYS(*<.,>vF){UtoSW[?B(:2.zGC#{ Ѣj$EOm/ oLz S@JɘQncY{ zoܽ7Mԕ:UcHR;i D1q*$['h=rQ9Wifz%ؐ];~\8+3SIꓵǵh Pxc*cȑ"%5q(sy!ʞ$+"nE id_F/{ּnˍ8)q^8.J'Iɑ*q ubަUΈEOJ ۺUh;0DR/ܝMOC'a&D?'Pɮ,ﻈ(ލ/Ɉx JS:!zjaQWǀ~Sҥ_W.땩G$YX=md"Y]t yNMyh9?U(‚S(*8o&M ?2Pigb$ "j<[uËP|[dY4$:g_Z=_5`' x$&+U )܍ mr瀙꼑8fԇ{AȞE=@{#rGGК zPR&-nP'y,l'OK%=5kť0C]Ao}^Ŀ&],I)(c`]ixmwJuռi⣉~8#ׄ#~ O`~n ۑ847Ij 4&_S_K/WC#\] 
recipes/data/datalist
Biomass: Biomass
okc: okc
credit_data: credit_data
covers: covers

recipes/R/window.R
#' Moving Window Functions
#'
#' \code{step_window} creates a \emph{specification} of a recipe step that will
#'  create new columns that are the results of functions that compute
#'  statistics across moving windows.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param role For model terms created by this step, what analysis role should
#'  they be assigned? If \code{names} is left to be \code{NULL}, the rolling
#'  statistics replace the original columns and the roles are left unchanged.
#'  If \code{names} is set, those new columns will have a role of \code{NULL}
#'  unless this argument has a value.
#' @param size An odd integer \code{>= 3} for the window size.
#' @param na.rm A logical for whether missing values should be removed from the
#'  calculations within each window.
#' @param statistic A character string for the type of statistic that should
#'  be calculated for each moving window. Possible values are: \code{'max'},
#'  \code{'mean'}, \code{'median'}, \code{'min'}, \code{'prod'}, \code{'sd'},
#'  \code{'sum'}, \code{'var'}
#' @param columns A character string that contains the names of columns that
#'  should be processed. These values are not determined until
#'  \code{\link{prep.recipe}} is called.
#' @param names An optional character string that is the same length as the
#'  number of terms selected by \code{terms}. If you are not sure what columns
#'  will be selected, use the \code{summary} function (see the example below).
#'  These will be the names of the new columns created by the step.
#' @keywords datagen
#' @concept preprocessing moving_windows
#' @export
#' @details The calculations use a somewhat atypical method for handling the
#'  beginning and end parts of the rolling statistics. The process starts
#'  with the center-justified window calculations, and the beginning and
#'  ending parts of the rolling values are determined using the first and
#'  last rolling values, respectively. For example, if a column \code{x} with
#'  12 values is smoothed with a 5-point moving median, the first three
#'  smoothed values are estimated by \code{median(x[1:5])} and the fourth
#'  uses \code{median(x[2:6])}.
#' @examples
#' library(recipes)
#' library(dplyr)
#' library(rlang)
#' library(ggplot2, quietly = TRUE)
#'
#' set.seed(5522)
#' sim_dat <- data.frame(x1 = (20:100) / 10)
#' n <- nrow(sim_dat)
#' sim_dat$y1 <- sin(sim_dat$x1) + rnorm(n, sd = 0.1)
#' sim_dat$y2 <- cos(sim_dat$x1) + rnorm(n, sd = 0.1)
#' sim_dat$x2 <- runif(n)
#' sim_dat$x3 <- rnorm(n)
#'
#' rec <- recipe(y1 + y2 ~ x1 + x2 + x3, data = sim_dat) %>%
#'   step_window(starts_with("y"), size = 7, statistic = "median",
#'               names = paste0("med_7pt_", 1:2),
#'               role = "outcome") %>%
#'   step_window(starts_with("y"),
#'               names = paste0("mean_3pt_", 1:2),
#'               role = "outcome")
#' rec <- prep(rec, training = sim_dat)
#'
#' # If you aren't sure how to set the names, see which variables are selected
#' # and the order that they are selected:
#' terms_select(info = summary(rec), terms = quos(starts_with("y")))
#'
#' smoothed_dat <- bake(rec, sim_dat, everything())
#'
#' ggplot(data = sim_dat, aes(x = x1, y = y1)) +
#'   geom_point() +
#'   geom_line(data = smoothed_dat, aes(y = med_7pt_1)) +
#'   geom_line(data = smoothed_dat, aes(y = mean_3pt_1), col = "red") +
#'   theme_bw()
#'
#' # If you want to replace the selected variables with the rolling statistic,
#' # don't set `names`
#' sim_dat$original <- sim_dat$y1
#' rec <- recipe(y1 + y2 + original ~ x1 + x2 + x3, data = sim_dat) %>%
#'   step_window(starts_with("y"))
#' rec <- prep(rec, training = sim_dat)
#' smoothed_dat <- bake(rec, sim_dat, everything())
#' ggplot(smoothed_dat, aes(x = original, y = y1)) +
#'   geom_point() +
#'   theme_bw()
step_window <- function(recipe, ..., role = NA, trained = FALSE, size = 3,
                        na.rm = TRUE, statistic = "mean", columns = NULL,
                        names = NULL) {
  if (!(statistic %in% roll_funs) | length(statistic) != 1)
    stop("`statistic` should be one of: ",
         paste0("'", roll_funs, "'", collapse = ", "),
         call. = FALSE)

  ## ensure size is odd, integer, and not too small
  ## (check is.null() first so that a NULL value does not reach is.na())
  if (is.null(size) || is.na(size))
    stop("`size` needs a value.", call. = FALSE)

  if (!is.integer(size)) {
    tmp <- size
    size <- as.integer(size)
    if (!isTRUE(all.equal(tmp, size)))
      warning("`size` was not an integer (", tmp, ") and was ",
              "converted to ", size, ".", call. = FALSE)
  }
  if (size %% 2 == 0)
    stop("`size` should be odd.", call. = FALSE)
  if (size < 3)
    stop("`size` should be at least 3.", call. = FALSE)

  add_step(
    recipe,
    step_window_new(
      terms = check_ellipses(...),
      trained = trained,
      role = role,
      size = size,
      na.rm = na.rm,
      statistic = statistic,
      columns = columns,
      names = names
    )
  )
}

roll_funs <- c("mean", "median", "sd", "var", "sum", "prod", "min", "max")

step_window_new <- function(terms = NULL, role = NA, trained = FALSE,
                            size = NULL, na.rm = NULL, statistic = NULL,
                            columns = NULL, names = NULL) {
  step(
    subclass = "window",
    terms = terms,
    role = role,
    trained = trained,
    size = size,
    na.rm = na.rm,
    statistic = statistic,
    columns = columns,
    names = names
  )
}

#' @export
prep.step_window <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  if (any(info$type[info$variable %in% col_names] != "numeric"))
    stop("The selected variables should be numeric")

  if (!is.null(x$names)) {
    if (length(x$names) != length(col_names))
      stop("There were ", length(col_names), " term(s) selected but ",
           length(x$names), " values for the new features ",
           "were passed to `names`.", call. = FALSE)
  }

  step_window_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    size = x$size,
    na.rm = x$na.rm,
    statistic = x$statistic,
    columns = col_names,
    names = x$names
  )
}
#' @importFrom RcppRoll roll_max roll_maxl roll_maxr
#' @importFrom RcppRoll roll_mean roll_meanl roll_meanr
#' @importFrom RcppRoll roll_median roll_medianl roll_medianr
#' @importFrom RcppRoll roll_min roll_minl roll_minr
#' @importFrom RcppRoll roll_prod roll_prodl roll_prodr
#' @importFrom RcppRoll roll_sd roll_sdl roll_sdr
#' @importFrom RcppRoll roll_sum roll_suml roll_sumr
#' @importFrom RcppRoll roll_var roll_varl roll_varr
roller <- function(x, stat = "mean", window = 3L, na.rm = TRUE) {
  m <- length(x)
  gap <- floor(window / 2)
  if (m - window <= 2)
    stop("The window is too large.", call. = FALSE)

  ## stats for the center-justified window
  roll_cl <- quote(
    roll_mean(
      x = x,
      n = window,
      weights = NULL,
      by = 1L,
      fill = NA,
      partial = FALSE,
      normalize = TRUE,
      na.rm = na.rm
    )
  )
  roll_cl[[1]] <- as.name(paste0("roll_", stat))
  x2 <- eval(roll_cl)

  ## Fill in the left-hand points. Add enough data so that the
  ## missing values at the start can be estimated and filled in
  x2[1:gap] <- x2[gap + 1]
  ## Right-hand points
  x2[(m - gap + 1):m] <- x2[m - gap]
  x2
}

#' @importFrom tibble as_tibble is_tibble
#' @export
bake.step_window <- function(object, newdata, ...) {
  for (i in seq(along = object$columns)) {
    if (!is.null(object$names)) {
      newdata[, object$names[i]] <-
        roller(x = getElement(newdata, object$columns[i]),
               stat = object$statistic,
               na.rm = object$na.rm,
               window = object$size)
    } else {
      newdata[, object$columns[i]] <-
        roller(x = getElement(newdata, object$columns[i]),
               stat = object$statistic,
               na.rm = object$na.rm,
               window = object$size)
    }
  }
  newdata
}

print.step_window <- function(x, width = max(20, options()$width - 28), ...) {
  cat("Moving ", x$size, "-point ", x$statistic, " on ", sep = "")
  if (x$trained) {
    cat(format_ch_vec(x$columns, width = width))
  } else
    cat(format_selectors(x$terms, width = width))
  if (x$trained)
    cat(" [trained]\n")
  else
    cat("\n")
  invisible(x)
}
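To make the edge handling described in the `@details` section above concrete, here is a minimal base-R sketch of what `roller()` does at the boundaries. The data and window size are made up for illustration; this is not package code.

x <- c(2, 4, 3, 5, 7, 6, 8, 9, 7, 10, 12, 11)
size <- 5
gap <- floor(size / 2)
# center-justified moving medians (NA where the full window does not fit)
smoothed <- sapply(seq_along(x), function(i)
  if (i <= gap || i > length(x) - gap) NA else median(x[(i - gap):(i + gap)]))
# fill the ends with the first/last complete values, as roller() does
smoothed[1:gap] <- smoothed[gap + 1]
smoothed[(length(x) - gap + 1):length(x)] <- smoothed[length(x) - gap]
smoothed[1:3]  # all three equal median(x[1:5])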
recipes/R/poly.R
#' Orthogonal Polynomial Basis Functions
#'
#' \code{step_poly} creates a \emph{specification} of a recipe step that will
#'  create new columns that are basis expansions of variables using orthogonal
#'  polynomials.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param role For model terms created by this step, what analysis role should
#'  they be assigned? By default, the function assumes that the new columns
#'  created from the original variables will be used as predictors in a model.
#' @param objects A list of \code{\link[stats]{poly}} objects created once the
#'  step has been trained.
#' @param options A list of options for \code{\link[stats]{poly}} which should
#'  not include \code{x} or \code{simple}. Note that the option
#'  \code{raw = TRUE} will produce the regular polynomial values (not
#'  orthogonalized).
#' @keywords datagen
#' @concept preprocessing basis_expansion
#' @export
#' @details \code{step_poly} can create new features from a single variable
#'  that enable fitting routines to model this variable in a nonlinear manner.
#'  The extent of the possible nonlinearity is determined by the \code{degree}
#'  argument of \code{\link[stats]{poly}}. The original variables are
#'  removed from the data and new columns are added. The naming convention
#'  for the new variables is \code{varname_poly_1} and so on.
#' @examples
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'               data = biomass_tr)
#'
#' quadratic <- rec %>%
#'   step_poly(carbon, hydrogen)
#' quadratic <- prep(quadratic, training = biomass_tr)
#'
#' expanded <- bake(quadratic, biomass_te)
#' expanded
#' @seealso \code{\link{step_ns}} \code{\link{recipe}}
#'  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
step_poly <- function(recipe, ..., role = "predictor", trained = FALSE,
                      objects = NULL, options = list(degree = 2)) {
  add_step(
    recipe,
    step_poly_new(
      terms = check_ellipses(...),
      trained = trained,
      role = role,
      objects = objects,
      options = options
    )
  )
}

step_poly_new <- function(terms = NULL, role = NA, trained = FALSE,
                          objects = NULL, options = NULL) {
  step(
    subclass = "poly",
    terms = terms,
    role = role,
    trained = trained,
    objects = objects,
    options = options
  )
}

poly_wrapper <- function(x, args) {
  args$x <- x
  args$simple <- FALSE
  poly_obj <- do.call("poly", args)

  ## don't need to save the original data so keep 1 row
  out <- matrix(NA, ncol = ncol(poly_obj), nrow = 1)
  class(out) <- c("poly", "basis", "matrix")
  attr(out, "degree") <- attr(poly_obj, "degree")
  attr(out, "coefs") <- attr(poly_obj, "coefs")
  out
}

#' @importFrom stats poly
#' @export
prep.step_poly <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  obj <- lapply(training[, col_names], poly_wrapper, x$options)
  for (i in seq(along = col_names))
    attr(obj[[i]], "var") <- col_names[i]

  step_poly_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    objects = obj,
    options = x$options
  )
}

#' @importFrom tibble as_tibble is_tibble
#' @importFrom stats predict
#' @export
bake.step_poly <- function(object, newdata, ...) {
  ## pre-allocate a matrix for the basis functions.
  new_cols <- vapply(object$objects, ncol, c(int = 1L))
  poly_values <- matrix(NA, nrow = nrow(newdata), ncol = sum(new_cols))
  colnames(poly_values) <- rep("", sum(new_cols))
  strt <- 1
  for (i in names(object$objects)) {
    cols <- (strt):(strt + new_cols[i] - 1)
    orig_var <- attr(object$objects[[i]], "var")

    poly_values[, cols] <-
      predict(object$objects[[i]], getElement(newdata, i))

    new_names <-
      paste(orig_var, "poly", names0(new_cols[i], ""), sep = "_")
    colnames(poly_values)[cols] <- new_names
    strt <- max(cols) + 1

    newdata[, orig_var] <- NULL
  }
  newdata <- cbind(newdata, as_tibble(poly_values))
  if (!is_tibble(newdata))
    newdata <- as_tibble(newdata)
  newdata
}

print.step_poly <- function(x, width = max(20, options()$width - 35), ...) {
  cat("Orthogonal polynomials on ")
  printer(names(x$objects), x$terms, x$trained, width = width)
  invisible(x)
}
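A quick check of the naming convention described in the `@details` above, reusing the biomass data from the example. This is a sketch, not package code; the expected column names follow from the `varname_poly_1` pattern.

library(recipes)
data(biomass)
cubic <- recipe(HHV ~ carbon, data = biomass) %>%
  step_poly(carbon, options = list(degree = 3))
cubic <- prep(cubic, training = biomass)
grep("carbon", names(bake(cubic, biomass)), value = TRUE)
#> expected: "carbon_poly_1" "carbon_poly_2" "carbon_poly_3"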
recipes/R/range.R
#' Scaling Numeric Data to a Specific Range
#'
#' \code{step_range} creates a \emph{specification} of a recipe step that will
#'  scale numeric data to be within a pre-defined range of values.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables will be
#'  scaled. See \code{\link{selections}} for more details.
#' @param role Not used by this step since no new variables are created.
#' @param min A single numeric value for the smallest value in the range.
#' @param max A single numeric value for the largest value in the range.
#' @param ranges A two-row matrix of the smallest and largest training set
#'  values for each selected variable. Note that this is ignored until the
#'  values are determined by \code{\link{prep.recipe}}. Setting this value
#'  will be ineffective.
#' @keywords datagen
#' @concept preprocessing normalization_methods
#' @export
#' @details \code{step_range} estimates the minimum and maximum of each
#'  variable from the data used in the \code{training} argument of
#'  \code{prep.recipe}. \code{bake.recipe} then rescales new data sets to
#'  \code{[min, max]} using these training set ranges; new values that fall
#'  outside the training set range are truncated at \code{min} or \code{max}.
#' @examples
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'               data = biomass_tr)
#'
#' ranged_trans <- rec %>%
#'   step_range(carbon, hydrogen)
#'
#' ranged_obj <- prep(ranged_trans, training = biomass_tr)
#'
#' transformed_te <- bake(ranged_obj, biomass_te)
#'
#' biomass_te[1:10, names(transformed_te)]
#' transformed_te
step_range <- function(recipe, ..., role = NA, trained = FALSE,
                       min = 0, max = 1, ranges = NULL) {
  add_step(
    recipe,
    step_range_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      min = min,
      max = max,
      ranges = ranges
    )
  )
}

step_range_new <- function(terms = NULL, role = NA, trained = FALSE,
                           min = 0, max = 1, ranges = NULL) {
  step(
    subclass = "range",
    terms = terms,
    role = role,
    trained = trained,
    min = min,
    max = max,
    ranges = ranges
  )
}

#' @export
prep.step_range <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  mins <- vapply(training[, col_names], min, c(min = 0), na.rm = TRUE)
  maxs <- vapply(training[, col_names], max, c(max = 0), na.rm = TRUE)
  step_range_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    min = x$min,
    max = x$max,
    ranges = rbind(mins, maxs)
  )
}

#' @export
bake.step_range <- function(object, newdata, ...) {
  tmp <- as.matrix(newdata[, colnames(object$ranges)])
  tmp <- sweep(tmp, 2, object$ranges[1, ], "-")
  tmp <- tmp * (object$max - object$min)
  tmp <- sweep(tmp, 2, object$ranges[2, ] - object$ranges[1, ], "/")
  tmp <- tmp + object$min

  tmp[tmp < object$min] <- object$min
  tmp[tmp > object$max] <- object$max

  if (is.matrix(tmp) && ncol(tmp) == 1)
    tmp <- tmp[, 1]
  newdata[, colnames(object$ranges)] <- tmp
  as_tibble(newdata)
}

print.step_range <- function(x, width = max(20, options()$width - 30), ...) {
  cat("Range scaling to [", x$min, ",", x$max, "] for ", sep = "")
  printer(colnames(x$ranges), x$terms, x$trained, width = width)
  invisible(x)
}
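A small sketch (made-up data, not package code) of the truncation behavior in `bake.step_range()`: values in new data that fall outside the training set range are clamped to `min` and `max`.

library(recipes)
tr <- data.frame(x = 1:10)
te <- data.frame(x = c(-5, 5, 50))
rng <- prep(recipe(~ x, data = tr) %>% step_range(x), training = tr)
# -5 is below the training range, 50 is above; both are truncated
bake(rng, te)$x  # approximately 0.000 0.444 1.000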
recipes/R/scale.R
#' Scaling Numeric Data
#'
#' \code{step_scale} creates a \emph{specification} of a recipe step that
#'  will normalize numeric data to have a standard deviation of one.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param role Not used by this step since no new variables are created.
#' @param sds A named numeric vector of standard deviations. This is
#'  \code{NULL} until computed by \code{\link{prep.recipe}}.
#' @param na.rm A logical value indicating whether \code{NA} values should be
#'  removed when computing the standard deviation.
#' @keywords datagen
#' @concept preprocessing normalization_methods
#' @export
#' @details Scaling data means that the standard deviation of a variable is
#'  divided out of the data. \code{step_scale} estimates the variable
#'  standard deviations from the data used in the \code{training} argument of
#'  \code{prep.recipe}. \code{bake.recipe} then applies the scaling to
#'  new data sets using these standard deviations.
#' @examples
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'               data = biomass_tr)
#'
#' scaled_trans <- rec %>%
#'   step_scale(carbon, hydrogen)
#'
#' scaled_obj <- prep(scaled_trans, training = biomass_tr)
#'
#' transformed_te <- bake(scaled_obj, biomass_te)
#'
#' biomass_te[1:10, names(transformed_te)]
#' transformed_te
step_scale <- function(recipe, ..., role = NA, trained = FALSE,
                       sds = NULL, na.rm = TRUE) {
  add_step(
    recipe,
    step_scale_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      sds = sds,
      na.rm = na.rm
    )
  )
}

step_scale_new <- function(terms = NULL, role = NA, trained = FALSE,
                           sds = NULL, na.rm = NULL) {
  step(
    subclass = "scale",
    terms = terms,
    role = role,
    trained = trained,
    sds = sds,
    na.rm = na.rm
  )
}

#' @importFrom stats sd
#' @export
prep.step_scale <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  sds <- vapply(training[, col_names], sd, c(sd = 0), na.rm = x$na.rm)
  step_scale_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    sds = sds,
    na.rm = x$na.rm
  )
}

#' @export
bake.step_scale <- function(object, newdata, ...) {
  res <- sweep(as.matrix(newdata[, names(object$sds)]), 2, object$sds, "/")
  if (is.matrix(res) && ncol(res) == 1)
    res <- res[, 1]
  newdata[, names(object$sds)] <- res
  as_tibble(newdata)
}

print.step_scale <- function(x, width = max(20, options()$width - 30), ...) {
  cat("Scaling for ", sep = "")
  printer(names(x$sds), x$terms, x$trained, width = width)
  invisible(x)
}
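A one-line check (made-up data, not package code) of the claim in the `@details` above that the training set standard deviation is divided out:

library(recipes)
tr <- data.frame(x = c(1, 3, 5, 7, 9))
sc <- prep(recipe(~ x, data = tr) %>% step_scale(x), training = tr)
all.equal(bake(sc, tr)$x, tr$x / sd(tr$x))  # TRUE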
recipes/R/dummy.R
#' Dummy Variables Creation
#'
#' \code{step_dummy} creates a \emph{specification} of a recipe step that
#'  will convert nominal data (e.g. character or factors) into one or more
#'  numeric binary model terms for the levels of the original data.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables will
#'  be used to create the dummy variables. See \code{\link{selections}} for
#'  more details.
#' @param role For model terms created by this step, what analysis role should
#'  they be assigned? By default, the function assumes that the binary
#'  dummy variable columns created by the original variables will be used as
#'  predictors in a model.
#' @param contrast A specification for which type of contrast should be used
#'  to make a set of full rank dummy variables. See
#'  \code{\link[stats]{contrasts}} for more details. \bold{not currently
#'  working}
#' @param naming A function that defines the naming convention for new binary
#'  columns. See Details below.
#' @param levels A list that contains the information needed to create dummy
#'  variables for each variable contained in \code{terms}. This is
#'  \code{NULL} until the step is trained by \code{\link{prep.recipe}}.
#' @keywords datagen
#' @concept preprocessing dummy_variables model_specification
#'  variable_encodings
#' @export
#' @details \code{step_dummy} will create a set of binary dummy variables
#'  from a factor variable. For example, if a factor column in the data set
#'  has levels of "red", "green", "blue", baking this step will
#'  create two additional columns of 0/1 data for two of those three values
#'  (and remove the original column).
#'
#' By default, the missing dummy variable will correspond to the first level
#'  of the factor being converted.
#'
#' The function allows for non-standard naming of the resulting variables. For
#'  a factor named \code{x}, with levels \code{"a"} and \code{"b"}, the
#'  default naming convention would be to create a new variable called
#'  \code{x_b}. Note that if the factor levels are not valid variable names
#'  (e.g. "some text with spaces"), they will be changed by
#'  \code{\link[base]{make.names}} to be valid (see the example below). The
#'  naming format can be changed using the \code{naming} argument; see the
#'  sketch after this file.
#' @examples
#' data(okc)
#' okc <- okc[complete.cases(okc),]
#'
#' rec <- recipe(~ diet + age + height, data = okc)
#'
#' dummies <- rec %>% step_dummy(diet)
#' dummies <- prep(dummies, training = okc)
#'
#' dummy_data <- bake(dummies, newdata = okc)
#'
#' unique(okc$diet)
#' grep("^diet", names(dummy_data), value = TRUE)
step_dummy <- function(recipe, ..., role = "predictor", trained = FALSE,
                       contrast = options("contrasts"),
                       naming = function(var, lvl)
                         paste(var, make.names(lvl), sep = "_"),
                       levels = NULL) {
  add_step(
    recipe,
    step_dummy_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      contrast = contrast,
      naming = naming,
      levels = levels
    )
  )
}

step_dummy_new <- function(terms = NULL, role = "predictor", trained = FALSE,
                           contrast = contrast, naming = naming,
                           levels = levels) {
  step(
    subclass = "dummy",
    terms = terms,
    role = role,
    trained = trained,
    contrast = contrast,
    naming = naming,
    levels = levels
  )
}

#' @importFrom stats as.formula model.frame
#' @export
prep.step_dummy <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)

  ## I hate doing this but currently we are going to have
  ## to save the terms object from the original (= training)
  ## data
  levels <- vector(mode = "list", length = length(col_names))
  names(levels) <- col_names
  for (i in seq_along(col_names)) {
    form <- as.formula(paste0("~", col_names[i]))
    terms <- model.frame(form, data = training, xlev = x$levels[[i]])
    levels[[i]] <- attr(terms, "terms")
  }

  step_dummy_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    contrast = x$contrast,
    naming = x$naming,
    levels = levels
  )
}

#' @export
bake.step_dummy <- function(object, newdata, ...) {
  ## Maybe do this in C?
  col_names <- names(object$levels)

  ## `na.action` cannot be passed to `model.matrix` but we
  ## can change it globally for a bit
  old_opt <- options()$na.action
  options(na.action = "na.pass")
  on.exit(options(na.action = old_opt))

  for (i in seq_along(object$levels)) {
    indicators <- model.matrix(
      object = object$levels[[i]],
      data = newdata
    )

    options(na.action = old_opt)
    on.exit(expr = NULL)

    indicators <- indicators[, -1, drop = FALSE]

    ## use backticks for nonstandard factor levels here
    used_lvl <- gsub(paste0("^", col_names[i]), "", colnames(indicators))
    colnames(indicators) <- object$naming(col_names[i], used_lvl)
    newdata <- cbind(newdata, as_tibble(indicators))
    newdata[, col_names[i]] <- NULL
  }
  if (!is_tibble(newdata))
    newdata <- as_tibble(newdata)
  newdata
}

print.step_dummy <- function(x, width = max(20, options()$width - 30), ...) {
  cat("Dummy variables from ")
  printer(x$levels, x$terms, x$trained, width = width)
  invisible(x)
}
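The `naming` argument can be swapped for any function of the variable name and level. A sketch with a hypothetical dot-separated scheme (toy data, not from the package); the expected names assume the default reference level is the first factor level, "blue":

library(recipes)
df <- data.frame(color = factor(c("red", "green", "blue")))
dots <- recipe(~ color, data = df) %>%
  step_dummy(color, naming = function(var, lvl) paste(var, lvl, sep = "."))
dots <- prep(dots, training = df)
names(bake(dots, newdata = df))
#> expected: "color.green" "color.red"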
{ cat("Dummy variables from ") printer(x$levels, x$terms, x$trained, width = width) invisible(x) } recipes/R/roles.R0000644000177700017770000000411713125050715014710 0ustar herbrandtherbrandt#' Manually Add Roles #' #' \code{add_role} can add a role definition to an existing variable in the #' recipe. #' #' @param recipe An existing \code{\link{recipe}}. #' @param ... One or more selector functions to choose which variables are #' being assigned a role. See \code{\link{selections}} for more details. #' @param new_role A character string for a single role. #' @return An updated recipe object. #' @details If a variable is selected that currently has a role, the role is #' changed and a warning is issued. #' @keywords datagen #' @concept preprocessing model_specification #' @export #' @examples #' #' data(biomass) #' #' # Create the recipe manually #' rec <- recipe(x = biomass) #' rec #' summary(rec) #' #' rec <- rec %>% #' add_role(carbon, contains("gen"), sulfur, new_role = "predictor") %>% #' add_role(sample, new_role = "id variable") %>% #' add_role(dataset, new_role = "splitting variable") %>% #' add_role(HHV, new_role = "outcome") #' rec #' #'@importFrom rlang quos add_role <- function(recipe, ..., new_role = "predictor") { if (length(new_role) > 1) stop("A single role is required", call. = FALSE) terms <- quos(...) if (is_empty(terms)) warning("No selectors were found", call. = FALSE) vars <- terms_select(terms = terms, info = summary(recipe)) ## check if there are newly defined variables in the list existing_var <- vars %in% recipe$var_info$variable if (any(!existing_var)) { ## Add new variable with role new_vars <- tibble(variable = vars[!existing_var], role = rep(new_role, sum(!existing_var))) recipe$var_info <- rbind(recipe$var_info, new_vars) } else { ## check for current roles that are missing vars2 <- vars[existing_var] has_role <- !is.na(recipe$var_info$role[recipe$var_info$variable %in% vars2]) if (any(has_role)) { warning("Changing role(s) for ", paste0(vars2[has_role], collapse = ", "), call. = FALSE) } recipe$var_info$role[recipe$var_info$variable %in% vars2] <- new_role } recipe$term_info <- recipe$var_info recipe } recipes/R/hyperbolic.R0000644000177700017770000000604713135741217015735 0ustar herbrandtherbrandt#' Hyperbolic Transformations #' #' \code{step_hyperbolic} creates a \emph{specification} of a recipe step that #' will transform data using a hyperbolic function. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param func A character value for the function. Valid values are "sin", #' "cos", or "tan". #' @param inverse A logical: should the inverse function be used? #' @param columns A character string of variable names that will be (eventually) #' populated by the \code{terms} argument. 
recipes/R/hyperbolic.R
#' Hyperbolic Transformations
#'
#' \code{step_hyperbolic} creates a \emph{specification} of a recipe step that
#'  will transform data using a hyperbolic function.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param role Not used by this step since no new variables are created.
#' @param func A character value for the function. Valid values are "sin",
#'  "cos", or "tan".
#' @param inverse A logical: should the inverse function be used?
#' @param columns A character string of variable names that will be
#'  (eventually) populated by the \code{terms} argument.
#' @keywords datagen
#' @concept preprocessing transformation_methods
#' @export
#' @examples
#' set.seed(313)
#' examples <- matrix(rnorm(40), ncol = 2)
#' examples <- as.data.frame(examples)
#'
#' rec <- recipe(~ V1 + V2, data = examples)
#'
#' cos_trans <- rec %>%
#'   step_hyperbolic(all_predictors(),
#'                   func = "cos", inverse = FALSE)
#'
#' cos_obj <- prep(cos_trans, training = examples)
#'
#' transformed_te <- bake(cos_obj, examples)
#' plot(examples$V1, transformed_te$V1)
#' @seealso \code{\link{step_logit}} \code{\link{step_invlogit}}
#'  \code{\link{step_log}} \code{\link{step_sqrt}} \code{\link{recipe}}
#'  \code{\link{prep.recipe}} \code{\link{bake.recipe}}
step_hyperbolic <- function(recipe, ..., role = NA, trained = FALSE,
                            func = "sin", inverse = TRUE, columns = NULL) {
  funcs <- c("sin", "cos", "tan")
  if (!(func %in% funcs))
    stop("`func` should be either `sin`, `cos`, or `tan`", call. = FALSE)

  add_step(
    recipe,
    step_hyperbolic_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      func = func,
      inverse = inverse,
      columns = columns
    )
  )
}

step_hyperbolic_new <- function(terms = NULL, role = NA, trained = FALSE,
                                func = NULL, inverse = NULL, columns = NULL) {
  step(
    subclass = "hyperbolic",
    terms = terms,
    role = role,
    trained = trained,
    func = func,
    inverse = inverse,
    columns = columns
  )
}

#' @export
prep.step_hyperbolic <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  step_hyperbolic_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    func = x$func,
    inverse = x$inverse,
    columns = col_names
  )
}

#' @export
bake.step_hyperbolic <- function(object, newdata, ...) {
  func <- if (object$inverse)
    get(paste0("a", object$func))
  else
    get(object$func)
  col_names <- object$columns
  for (i in seq_along(col_names))
    newdata[, col_names[i]] <- func(getElement(newdata, col_names[i]))
  as_tibble(newdata)
}

print.step_hyperbolic <-
  function(x, width = max(20, options()$width - 32), ...) {
  ttl <- paste("Hyperbolic", x$func)
  if (x$inverse)
    ttl <- paste(ttl, "(inv)")
  cat(ttl, "transformation on ")
  printer(x$columns, x$terms, x$trained, width = width)
  invisible(x)
}
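Note from `bake.step_hyperbolic()` that `inverse = TRUE` (the default) prepends an "a" to `func`, so `func = "sin"` applies `asin()`. A quick sketch with made-up data in the domain of `asin()`; not package code:

library(recipes)
df <- data.frame(x = c(-0.5, 0, 0.5))
inv <- prep(recipe(~ x, data = df) %>% step_hyperbolic(x, func = "sin"),
            training = df)
all.equal(bake(inv, df)$x, asin(df$x))  # TRUE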
Argument defaults are set to #' \code{retx = FALSE}, \code{center = FALSE}, \code{scale. = FALSE}, and #' \code{tol = NULL}. \bold{Note} that the argument \code{x} should not be #' passed here (or at all). #' @param res The \code{\link[stats]{prcomp.default}} object is stored here #' once this preprocessing step has been trained by \code{\link{prep.recipe}}. #' @param prefix A character string that will be the prefix to the resulting #' new variables. See notes below. #' @keywords datagen #' @concept preprocessing pca projection_methods #' @export #' @details #' Principal component analysis (PCA) is a transformation of a group of #' variables that produces a new set of artificial features or components. #' These components are designed to capture the maximum amount of information #' (i.e. variance) in the original variables. Also, the components are #' statistically independent from one another. This means that they can be #' used to combat large inter-variable correlations in a data set. #' #' It is advisable to standardize the variables prior to running PCA. Here, #' each variable will be centered and scaled prior to the PCA calculation. #' This can be changed using the \code{options} argument or by using #' \code{\link{step_center}} and \code{\link{step_scale}}. #' #' The argument \code{num} controls the number of components that will be #' retained (the original variables that are used to derive the components #' are removed from the data). The new components will have names that begin #' with \code{prefix} and a sequence of numbers. The variable names are #' padded with zeros. For example, if \code{num < 10}, their names will be #' \code{PC1} - \code{PC9}. If \code{num = 101}, the names would be #' \code{PC001} - \code{PC101}. #' #' Alternatively, \code{threshold} can be used to determine the number of #' components that are required to capture a specified fraction of the total #' variance in the variables. #' #' @references Jolliffe, I. T. (2010). \emph{Principal Component Analysis}. #' Springer. #' #' @examples #' rec <- recipe( ~ ., data = USArrests) #' pca_trans <- rec %>% #' step_center(all_numeric()) %>% #' step_scale(all_numeric()) %>% #' step_pca(all_numeric(), num = 3) #' pca_estimates <- prep(pca_trans, training = USArrests) #' pca_data <- bake(pca_estimates, USArrests) #' #' rng <- extendrange(c(pca_data$PC1, pca_data$PC2)) #' plot(pca_data$PC1, pca_data$PC2, #' xlim = rng, ylim = rng) #' #' with_thresh <- rec %>% #' step_center(all_numeric()) %>% #' step_scale(all_numeric()) %>% #' step_pca(all_numeric(), threshold = .99) #' with_thresh <- prep(with_thresh, training = USArrests) #' bake(with_thresh, USArrests) #' @seealso \code{\link{step_ica}} \code{\link{step_kpca}} #' \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}} #' \code{\link{bake.recipe}} step_pca <- function(recipe, ..., role = "predictor", trained = FALSE, num = 5, threshold = NA, options = list(), res = NULL, prefix = "PC") { if (!is.na(threshold) && (threshold > 1 | threshold <= 0)) stop("`threshold` should be on (0, 1].", call.
= FALSE) add_step( recipe, step_pca_new( terms = check_ellipses(...), role = role, trained = trained, num = num, threshold = threshold, options = options, res = res, prefix = prefix ) ) } step_pca_new <- function(terms = NULL, role = "predictor", trained = FALSE, num = NULL, threshold = NULL, options = NULL, res = NULL, prefix = "PC") { step( subclass = "pca", terms = terms, role = role, trained = trained, num = num, threshold = threshold, options = options, res = res, prefix = prefix ) } #' @importFrom stats prcomp #' @importFrom rlang expr #' @export prep.step_pca <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) prc_call <- expr(prcomp( retx = FALSE, center = FALSE, scale. = FALSE, tol = NULL )) if (length(x$options) > 0) prc_call <- mod_call_args(prc_call, args = x$options) prc_call$x <- expr(training[, col_names, drop = FALSE]) prc_obj <- eval(prc_call) x$num <- min(x$num, length(col_names)) if (!is.na(x$threshold)) { total_var <- sum(prc_obj$sdev ^ 2) num_comp <- which.max(cumsum(prc_obj$sdev ^ 2 / total_var) >= x$threshold) if (length(num_comp) == 0) num_comp <- length(prc_obj$sdev) x$num <- num_comp } ## decide on removing prc elements that aren't used in new projections ## e.g. `sdev` etc. step_pca_new( terms = x$terms, role = x$role, trained = TRUE, num = x$num, threshold = x$threshold, options = x$options, res = prc_obj, prefix = x$prefix ) } #' @importFrom tibble as_tibble #' @export bake.step_pca <- function(object, newdata, ...) { pca_vars <- rownames(object$res$rotation) comps <- predict(object$res, newdata = newdata[, pca_vars]) comps <- comps[, 1:object$num, drop = FALSE] colnames(comps) <- names0(ncol(comps), object$prefix) newdata <- cbind(newdata, as_tibble(comps)) newdata <- newdata[, !(colnames(newdata) %in% pca_vars), drop = FALSE] as_tibble(newdata) } print.step_pca <- function(x, width = max(20, options()$width - 29), ...) { cat("PCA extraction with ") printer(rownames(x$res$rotation), x$terms, x$trained, width = width) invisible(x) } recipes/R/sqrt.R0000644000177700017770000000443113135741217014561 0ustar herbrandtherbrandt#' Square Root Transformation #' #' \code{step_sqrt} creates a \emph{specification} of a recipe step that will #' square root transform the data. #' #' @inheritParams step_center #' @inherit step_center return #' @param ... One or more selector functions to choose which variables will be #' transformed. See \code{\link{selections}} for more details. #' @param role Not used by this step since no new variables are created. #' @param columns A character string of variable names that will be (eventually) #' populated by the \code{terms} argument. 
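#' @details A minimal sketch of what the trained step computes: each
#'  selected column is simply replaced by its square root, i.e.
#' \preformatted{
#' newdata$V1 <- sqrt(newdata$V1)
#' }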
#' @keywords datagen #' @concept preprocessing transformation_methods #' @export #' @examples #' set.seed(313) #' examples <- matrix(rnorm(40)^2, ncol = 2) #' examples <- as.data.frame(examples) #' #' rec <- recipe(~ V1 + V2, data = examples) #' #' sqrt_trans <- rec %>% #' step_sqrt(all_predictors()) #' #' sqrt_obj <- prep(sqrt_trans, training = examples) #' #' transformed_te <- bake(sqrt_obj, examples) #' plot(examples$V1, transformed_te$V1) #' @seealso \code{\link{step_logit}} \code{\link{step_invlogit}} #' \code{\link{step_log}} \code{\link{step_hyperbolic}} \code{\link{recipe}} #' \code{\link{prep.recipe}} \code{\link{bake.recipe}} step_sqrt <- function(recipe, ..., role = NA, trained = FALSE, columns = NULL) { add_step( recipe, step_sqrt_new( terms = check_ellipses(...), role = role, trained = trained, columns = columns ) ) } step_sqrt_new <- function(terms = NULL, role = NA, trained = FALSE, columns = NULL) { step( subclass = "sqrt", terms = terms, role = role, trained = trained, columns = columns ) } #' @export prep.step_sqrt <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) step_sqrt_new( terms = x$terms, role = x$role, trained = TRUE, columns = col_names ) } #' @export bake.step_sqrt <- function(object, newdata, ...) { col_names <- object$columns for (i in seq_along(col_names)) newdata[, col_names[i]] <- sqrt(getElement(newdata, col_names[i])) as_tibble(newdata) } print.step_sqrt <- function(x, width = max(20, options()$width - 29), ...) { cat("Square root transformation on ", sep = "") printer(x$columns, x$terms, x$trained, width = width) invisible(x) } recipes/R/bag_imp.R0000644000177700017770000001532213135741217015167 0ustar herbrandtherbrandt#' Imputation via Bagged Trees #' #' \code{step_bagimpute} creates a \emph{specification} of a recipe step that #' will create bagged tree models to impute missing data. #' #' @inheritParams step_center #' @inherit step_center return #' @param ... One or more selector functions to choose variables. For #' \code{step_bagimpute}, this indicates the variables to be imputed. When #' used with \code{imp_vars}, the dots indicate which variables are used to #' predict the missing data in each variable. See \code{\link{selections}} #' for more details. #' @param role Not used by this step since no new variables are created. #' @param impute_with A call to \code{imp_vars} to specify which variables are #' used to impute the variables that can include specific variable names #' separated by commas or different selectors (see #' \code{\link{selections}}). If a column is included in both lists to be #' imputed and to be an imputation predictor, it will be removed from the #' latter and not used to impute itself. #' @param options A list of options to \code{\link[ipred]{ipredbagg}}. Defaults #' are set for the arguments \code{nbagg} and \code{keepX} but others can be #' passed in. \bold{Note} that the arguments \code{X} and \code{y} should not #' be passed here. #' @param seed_val An integer used to create reproducible models. The same seed #' is used across all imputation models. #' @param models The \code{\link[ipred]{ipredbagg}} objects are stored here #' once the bagged trees have been trained by \code{\link{prep.recipe}}. #' @keywords datagen #' @concept preprocessing imputation #' @export #' @details For each variable requiring imputation, a bagged tree is created #' where the outcome is the variable of interest and the predictors are any #' other variables listed in the \code{impute_with} formula.
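#'
#' A sketch of restricting the imputation predictors with \code{imp_vars},
#' using columns from the \code{credit_data} example below:
#' \preformatted{
#' rec \%>\%
#'   step_bagimpute(Income, impute_with = imp_vars(Assets, Debt))
#' }
#'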
One advantage of #' the bagged tree is that it can accept predictors that have missing values #' themselves. This imputation method can be used when the variable of #' interest (and predictors) are numeric or categorical. Imputed categorical #' variables will remain categorical. #' #' Note that if a variable that is to be imputed is also in \code{impute_with}, #' this variable will be ignored. #' #' It is possible that missing values will still occur after imputation if a #' large majority (or all) of the imputing variables are also missing. #' @references Kuhn, M. and Johnson, K. (2013). #' \emph{Applied Predictive Modeling}. Springer Verlag. #' @examples #' data("credit_data") #' #' ## missing data per column #' vapply(credit_data, function(x) mean(is.na(x)), c(num = 0)) #' #' set.seed(342) #' in_training <- sample(1:nrow(credit_data), 2000) #' #' credit_tr <- credit_data[ in_training, ] #' credit_te <- credit_data[-in_training, ] #' missing_examples <- c(14, 394, 565) #' #' rec <- recipe(Price ~ ., data = credit_tr) #' #' impute_rec <- rec %>% #' step_bagimpute(Status, Home, Marital, Job, Income, Assets, Debt) #' #' imp_models <- prep(impute_rec, training = credit_tr) #' #' imputed_te <- bake(imp_models, newdata = credit_te, everything()) #' #' credit_te[missing_examples,] #' imputed_te[missing_examples, names(credit_te)] step_bagimpute <- function(recipe, ..., role = NA, trained = FALSE, models = NULL, options = list(nbagg = 25, keepX = FALSE), impute_with = imp_vars(all_predictors()), seed_val = sample.int(10 ^ 4, 1)) { if (is.null(impute_with)) stop("Please list some variables in `impute_with`", call. = FALSE) add_step( recipe, step_bagimpute_new( terms = check_ellipses(...), role = role, trained = trained, models = models, options = options, impute_with = impute_with, seed_val = seed_val ) ) } step_bagimpute_new <- function(terms = NULL, role = NA, trained = FALSE, models = NULL, options = NULL, impute_with = NULL, seed_val = NA) { step( subclass = "bagimpute", terms = terms, role = role, trained = trained, models = models, options = options, impute_with = impute_with, seed_val = seed_val ) } #' @importFrom ipred ipredbagg bag_wrap <- function(vars, dat, opt, seed_val) { seed_val <- seed_val[1] dat <- as.data.frame(dat[, c(vars$y, vars$x)]) if (!is.null(seed_val) && !is.na(seed_val)) set.seed(seed_val) out <- do.call("ipredbagg", c(list(y = dat[, vars$y], X = dat[, vars$x, drop = FALSE]), opt)) out$..imp_vars <- vars$x out } ## This figures out which data should be used to predict each variable ## scheduled for imputation impute_var_lists <- function(to_impute, impute_using, info) { to_impute <- terms_select(terms = to_impute, info = info) impute_using <- terms_select(terms = impute_using, info = info) var_lists <- vector(mode = "list", length = length(to_impute)) for (i in seq_along(var_lists)) { var_lists[[i]] <- list(y = to_impute[i], x = impute_using[!(impute_using %in% to_impute[i])]) } var_lists } #' @export prep.step_bagimpute <- function(x, training, info = NULL, ...) { var_lists <- impute_var_lists( to_impute = x$terms, impute_using = x$impute_with, info = info ) x$models <- lapply( var_lists, bag_wrap, dat = training, opt = x$options, seed_val = x$seed_val ) names(x$models) <- vapply(var_lists, function(x) x$y, c("")) x$trained <- TRUE x } #' @importFrom tibble as_tibble #' @importFrom stats predict complete.cases #' @export bake.step_bagimpute <- function(object, newdata, ...)
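# Only rows with missing values are imputed: for each trained bagged-tree
# model, the rows missing that variable are predicted from `old_data`, the
# original (pre-imputation) copy of the new data.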
{ missing_rows <- !complete.cases(newdata) if (!any(missing_rows)) return(newdata) old_data <- newdata for (i in seq(along = object$models)) { imp_var <- names(object$models)[i] missing_rows <- !complete.cases(newdata[, imp_var]) if (any(missing_rows)) { preds <- object$models[[i]]$..imp_vars pred_data <- old_data[missing_rows, preds, drop = FALSE] ## do a better job of checking this: if (all(is.na(pred_data))) { warning("All predictors are missing; cannot impute", call. = FALSE) } else { pred_vals <- predict(object$models[[i]], pred_data) newdata[missing_rows, imp_var] <- pred_vals } } } ## changes character to factor! as_tibble(newdata) } print.step_bagimpute <- function(x, width = max(20, options()$width - 31), ...) { cat("Bagged tree imputation for ", sep = "") printer(names(x$models), x$terms, x$trained, width = width) invisible(x) } #' @export #' @rdname step_bagimpute imp_vars <- function(...) quos(...) recipes/R/knn_imp.R0000644000177700017770000001432313135741217015224 0ustar herbrandtherbrandt#' Imputation via K-Nearest Neighbors #' #' \code{step_knnimpute} creates a \emph{specification} of a recipe step that #' will impute missing data using nearest neighbors. #' #' @inheritParams step_center #' @inherit step_center return #' @param ... One or more selector functions to choose variables. For #' \code{step_knnimpute}, this indicates the variables to be imputed. When #' used with \code{imp_vars}, the dots indicate which variables are used to #' predict the missing data in each variable. See \code{\link{selections}} #' for more details. #' @param role Not used by this step since no new variables are created. #' @param impute_with A call to \code{imp_vars} to specify which variables are #' used to impute the variables that can include specific variable names #' separated by commas or different selectors (see #' \code{\link{selections}}). If a column is included in both lists to be #' imputed and to be an imputation predictor, it will be removed from the #' latter and not used to impute itself. #' @param K The number of neighbors. #' @param ref_data A tibble of data that will reflect the data preprocessing #' done up to the point of this imputation step. This is #' \code{NULL} until the step is trained by \code{\link{prep.recipe}}. #' @param columns The column names that will be imputed and used for #' imputation. This is \code{NULL} until the step is trained by #' \code{\link{prep.recipe}}. #' @keywords datagen #' @concept preprocessing imputation #' @export #' @details The step uses the training set to impute any other data sets. The #' only distance function available is Gower's distance, which can be used for #' mixtures of nominal and numeric data. #' #' Once the nearest neighbors are determined, the mode is used to predict #' nominal variables and the mean is used for numeric data. #' #' Note that if a variable that is to be imputed is also in \code{impute_with}, #' this variable will be ignored. #' #' It is possible that missing values will still occur after imputation if a #' large majority (or all) of the imputing variables are also missing. #' @references Gower, J. C. (1971) "A general coefficient of similarity and some #' of its properties," Biometrics, 857-871.
#' @examples #' library(recipes) #' data(biomass) #' #' biomass_tr <- biomass[biomass$dataset == "Training", ] #' biomass_te <- biomass[biomass$dataset == "Testing", ] #' biomass_te_whole <- biomass_te #' #' # induce some missing data at random #' set.seed(9039) #' carb_missing <- sample(1:nrow(biomass_te), 3) #' nitro_missing <- sample(1:nrow(biomass_te), 3) #' #' biomass_te$carbon[carb_missing] <- NA #' biomass_te$nitrogen[nitro_missing] <- NA #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' #' ratio_recipe <- rec %>% #' step_knnimpute(all_predictors(), K = 3) #' ratio_recipe2 <- prep(ratio_recipe, training = biomass_tr) #' imputed <- bake(ratio_recipe2, biomass_te) #' #' # how well did it work? #' summary(biomass_te_whole$carbon) #' cbind(before = biomass_te_whole$carbon[carb_missing], #' after = imputed$carbon[carb_missing]) #' #' summary(biomass_te_whole$nitrogen) #' cbind(before = biomass_te_whole$nitrogen[nitro_missing], #' after = imputed$nitrogen[nitro_missing]) step_knnimpute <- function(recipe, ..., role = NA, trained = FALSE, K = 5, impute_with = imp_vars(all_predictors()), ref_data = NULL, columns = NULL) { if (is.null(impute_with)) stop("Please list some variables in `impute_with`", call. = FALSE) add_step( recipe, step_knnimpute_new( terms = check_ellipses(...), role = role, trained = trained, K = K, impute_with = impute_with, ref_data = ref_data, columns = columns ) ) } step_knnimpute_new <- function(terms = NULL, role = NA, trained = FALSE, K = NULL, impute_with = NULL, ref_data = NULL, columns = NA) { step( subclass = "knnimpute", terms = terms, role = role, trained = trained, K = K, impute_with = impute_with, ref_data = ref_data, columns = columns ) } #' @export prep.step_knnimpute <- function(x, training, info = NULL, ...) { var_lists <- impute_var_lists( to_impute = x$terms, impute_using = x$impute_with, info = info ) all_x_vars <- lapply(var_lists, function(x) c(x$x, x$y)) all_x_vars <- unique(unlist(all_x_vars)) x$columns <- var_lists x$ref_data <- training[, all_x_vars] x$trained <- TRUE x } #' @importFrom gower gower_topn nn_index <- function(.new, .old, vars, K) { gower_topn(.old[, vars], .new[, vars], n = K, nthread = 1)$index } nn_pred <- function(index, dat) { dat <- dat[index, ] dat <- getElement(dat, names(dat)) dat <- dat[!is.na(dat)] est <- if (is.factor(dat) | is.character(dat)) mode_est(dat) else mean(dat) est } #' @importFrom tibble as_tibble #' @importFrom stats predict complete.cases #' @export bake.step_knnimpute <- function(object, newdata, ...) { missing_rows <- !complete.cases(newdata) if (!any(missing_rows)) return(newdata) old_data <- newdata for (i in seq(along = object$columns)) { imp_var <- object$columns[[i]]$y missing_rows <- !complete.cases(newdata[, imp_var]) if (any(missing_rows)) { preds <- object$columns[[i]]$x new_data <- old_data[missing_rows, preds, drop = FALSE] ## do a better job of checking this: if (all(is.na(new_data))) { warning("All predictors are missing; cannot impute", call. = FALSE) } else { nn_ind <- nn_index(object$ref_data, new_data, preds, object$K) pred_vals <- apply(nn_ind, 2, nn_pred, dat = object$ref_data[, imp_var]) newdata[missing_rows, imp_var] <- pred_vals } } } newdata } print.step_knnimpute <- function(x, width = max(20, options()$width - 31), ...) 
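# The print method reports the unique set of predictors used across all of
# the per-variable imputation models.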
{ all_x_vars <- lapply(x$columns, function(x) x$x) all_x_vars <- unique(unlist(all_x_vars)) cat(x$K, "-nearest neighbor imputation for ", sep = "") printer(all_x_vars, x$terms, x$trained, width = width) invisible(x) } recipes/R/selections.R0000644000177700017770000002663313135741217015750 0ustar herbrandtherbrandt #' @name selections #' @aliases selections #' @aliases selection #' @title Methods for Selecting Variables in Step Functions #' @description When selecting variables or model terms in \code{step} #' functions, \code{dplyr}-like tools are used. The \emph{selector} #' functions can choose variables based on their name, current role, data #' type, or any combination of these. The selectors are passed as any other #' argument to the step. If the variables are explicitly stated in the step #' function, this might be similar to: #' #' \preformatted{ #' recipe( ~ ., data = USArrests) \%>\% #' step_pca(Murder, Assault, UrbanPop, Rape, num = 3) #' } #' #' The first four arguments indicate which variables should be used in the #' PCA while the last argument is a specific argument to #' \code{\link{step_pca}}. #' #' Note that: #' #' \enumerate{ #' \item The selector arguments should not contain functions beyond those #' supported (see below). #' \item These arguments are not evaluated until the \code{prep} function #' for the step is executed. #' \item The \code{dplyr}-like syntax allows for negative signs to exclude #' variables (e.g. \code{-Murder}) and the set of selectors will #' be processed in order. #' \item A leading exclusion in these arguments (e.g. \code{-Murder}) has #' the effect of adding all variables to the list except the excluded #' variable(s). #' } #' #' Select helpers from the \code{dplyr} package can also be used: #' \code{\link[dplyr]{starts_with}}, \code{\link[dplyr]{ends_with}}, #' \code{\link[dplyr]{contains}}, \code{\link[dplyr]{matches}}, #' \code{\link[dplyr]{num_range}}, and \code{\link[dplyr]{everything}}. #' For example: #' #' \preformatted{ #' recipe(Species ~ ., data = iris) \%>\% #' step_center(starts_with("Sepal"), -contains("Width")) #' } #' #' would only select \code{Sepal.Length} #' #' \bold{Inline} functions that specify computations, such as \code{log(x)}, #' should not be used in selectors and will produce an error. A list of #' allowed selector functions is below. #' #' Columns of the design matrix that may not exist when the step is coded can #' also be selected. For example, when using \code{step_pca}, the number of #' columns created by feature extraction may not be known when subsequent #' steps are defined. In this case, using \code{matches("^PC")} will select #' all of the columns whose names start with "PC" \emph{once those columns #' are created}. #' #' There are sets of functions that can be used to select variables based on #' their role or type: \code{\link{has_role}} and \code{\link{has_type}}. #' For convenience, there are also functions that are more specific: #' \code{\link{all_numeric}}, \code{\link{all_nominal}}, #' \code{\link{all_predictors}}, and \code{\link{all_outcomes}}. These can #' be used in conjunction with the previous functions described for #' selecting variables using their names: #' #' \preformatted{ #' data(biomass) #' recipe(HHV ~ ., data = biomass) \%>\% #' step_center(all_numeric(), -all_outcomes()) #' } #' #' This results in all the numeric predictors: carbon, hydrogen, oxygen, #' nitrogen, and sulfur.
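#'
#' Selectors can also be chained with exclusions. As a small sketch, this
#' would center only carbon and sulfur (the "gen" variables are dropped by
#' the negated selector):
#'
#' \preformatted{
#' recipe(HHV ~ ., data = biomass) \%>\%
#'   step_center(all_numeric(), -all_outcomes(), -contains("gen"))
#' }
#'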
#' #' If a role for a variable has not been defined, it will never be selected #' using role-specific selectors. #' #' All steps use these techniques to define variables, \emph{except one}: #' \code{\link{step_interact}} requires traditional model #' formula representations of the interactions and takes a single formula #' as the argument to select the variables. #' #' The complete list of allowable functions in steps: #' #' \itemize{ #' \item \bold{By name}: \code{\link[dplyr]{starts_with}}, #' \code{\link[dplyr]{ends_with}}, \code{\link[dplyr]{contains}}, #' \code{\link[dplyr]{matches}}, \code{\link[dplyr]{num_range}}, and #' \code{\link[dplyr]{everything}} #' \item \bold{By role}: \code{\link{has_role}}, #' \code{\link{all_predictors}}, and \code{\link{all_outcomes}} #' \item \bold{By type}: \code{\link{has_type}}, \code{\link{all_numeric}}, #' and \code{\link{all_nominal}} #' } NULL ## These are the allowable functions for formulas in the `terms` arguments ## to the steps or to `recipe.formula`. name_selectors <- c("starts_with", "ends_with", "contains", "matches", "num_range", "everything", "_F") role_selectors <- c("has_role", "all_predictors", "all_outcomes", "_F") type_selectors <- c("has_type", "all_numeric", "all_nominal", "_F") selectors <- unique(c(name_selectors, role_selectors, type_selectors)) ## Get the components of the formula split by +/-. The ## function also returns the sign f_elements <- function(x) { trms_obj <- terms(x) ## Their order will change here (minus at the end) clls <- attr(trms_obj, "variables") ## Any formula element with a minus prefix will not ## have a colname in the `factor` attribute of the ## terms object. We will check these against the ## list of calls tmp <- colnames(attr(trms_obj, "factors")) kept <- vector(mode = "list", length = length(tmp)) for (j in seq_along(tmp)) kept[[j]] <- as.name(tmp[j]) term_signs <- rep("", length(clls) - 1) for (i in seq_along(term_signs)) { ## Check to see if the elements are in the `factors` ## part of `terms` and these will have a + sign retained <- any(unlist(lapply(kept, function(x, y) any(y == x), y = clls[[i + 1]]))) term_signs[i] <- if (retained) "+" else "-" } list(terms = clls, signs = term_signs) } ## This adds the appropriate argument based on whether the call is for ## a variable name, role, or data type. add_arg <- function(cl) { func <- fun_calls(cl) if (func %in% name_selectors) { cl$vars <- quote(var_vals) } else { if (func %in% role_selectors) { cl$roles <- quote(role_vals) } else cl$types <- quote(type_vals) } cl } ## This flags formulas that are not allowed. When called from `recipe.formula` ## `allowed` is NULL. check_elements <- function(x, allowed = selectors) { funs <- fun_calls(x) funs <- funs[!(funs %in% c("~", "+", "-"))] if (!is.null(allowed)) { # when called from a step not_good <- funs[!(funs %in% allowed)] if (length(not_good) > 0) stop( "Not all functions are allowed in step function selectors (e.g. ", paste0("`", not_good, "`", collapse = ", "), "). See ?selections.", call. = FALSE ) } else { # when called from recipe.formula if (length(funs) > 0) stop( "No in-line functions should be used here; use steps to define ", "baking actions", call. = FALSE ) } invisible(NULL) } has_selector <- function(x, allowed = selectors) { res <- rep(NA, length(x) - 1) for (i in 2:length(x)) res[[i - 1]] <- isTRUE(fun_calls(x[[i]]) %in% allowed) res } #' Select Terms in a Step Function.
#' #' This function bakes the step function selectors and might be useful #' when creating custom steps. #' #' @param info A tibble with columns \code{variable}, \code{type}, \code{role}, #' and \code{source} that represent the current state of the data. The #' function \code{\link{summary.recipe}} can be used to get this information #' from a recipe. #' @param terms A list of formulas whose right-hand side contains quoted #' expressions. See \code{\link[rlang]{quos}} for examples. #' @keywords datagen #' @concept preprocessing #' @return A character string of column names or an error if there are no #' selectors or if no variables are selected. #' @seealso \code{\link{recipe}} \code{\link{summary.recipe}} #' \code{\link{prep.recipe}} #' @importFrom purrr map_lgl map_if map_chr map #' @importFrom rlang names2 #' @export #' @examples #' library(rlang) #' data(okc) #' rec <- recipe(~ ., data = okc) #' info <- summary(rec) #' terms_select(info = info, quos(all_predictors())) terms_select <- function(terms, info) { vars <- info$variable roles <- info$role types <- info$type if (is_empty(terms)) { stop("At least one selector should be used", call. = FALSE) } ## check arguments against whitelist lapply(terms, check_elements) # Set current_info so available to helpers old_info <- set_current_info(info) on.exit(set_current_info(old_info), add = TRUE) sel <- with_handlers(tidyselect::vars_select(vars, !!! terms), tidyselect_empty = abort_selection ) unname(sel) } abort_selection <- exiting(function(cnd) { abort("No variables or terms were selected.") }) #' Role Selection #' #' \code{has_role}, \code{all_predictors}, and \code{all_outcomes} can be used #' to select variables in a formula that have certain roles. Similarly, #' \code{has_type}, \code{all_numeric}, and \code{all_nominal} are used to #' select columns based on their data type. See \code{\link{selections}} for #' more details. \code{current_info} is an internal function that is #' unlikely to help users while the others have limited utility outside of #' step function arguments. #' #' @param match A single character string for the query. Exact matching is #' used (i.e. regular expressions won't work). #' @param roles A character string of roles for the current set of terms. #' @param types A character string of types for the current set of data types. #' @return Selector functions return an integer vector while #' \code{current_info} returns an environment with vectors \code{vars}, #' \code{roles}, and \code{types}.
#' @keywords datagen #' @examples #' data(biomass) #' #' rec <- recipe(biomass) %>% #' add_role(carbon, hydrogen, oxygen, nitrogen, sulfur, #' new_role = "predictor") %>% #' add_role(HHV, new_role = "outcome") %>% #' add_role(sample, new_role = "id variable") %>% #' add_role(dataset, new_role = "splitting indicator") #' recipe_info <- summary(rec) #' recipe_info #' #' has_role("id variable", roles = recipe_info$role) #' all_outcomes(roles = recipe_info$role) #' @export has_role <- function(match = "predictor", roles = current_info()$roles) which(roles %in% match) #' @export #' @rdname has_role #' @inheritParams has_role all_predictors <- function(roles = current_info()$roles) has_role("predictor", roles = roles) #' @export #' @rdname has_role #' @inheritParams has_role all_outcomes <- function(roles = current_info()$roles) has_role("outcome", roles = roles) #' @export #' @rdname has_role #' @inheritParams has_role has_type <- function(match = "numeric", types = current_info()$types) which(types %in% match) #' @export #' @rdname has_role #' @inheritParams has_role all_numeric <- function(types = current_info()$types) has_type("numeric", types = types) #' @export #' @rdname has_role #' @inheritParams has_role all_nominal <- function(types = current_info()$types) has_type("nominal", types = types) ## functions to get current variable info for selectors modeled after ## dplyr versions #' @import rlang cur_info_env <- child_env(env_parent(env)) set_current_info <- function(x) { # stopifnot(!is.environment(x)) old <- cur_info_env cur_info_env$vars <- x$variable cur_info_env$roles <- x$role cur_info_env$types <- x$type invisible(old) } #' @export #' @rdname has_role current_info <- function() { cur_info_env %||% stop("Variable context not set", call. = FALSE) } recipes/R/YeoJohnson.R0000644000177700017770000001400113135741217015655 0ustar herbrandtherbrandt#' Yeo-Johnson Transformation #' #' \code{step_YeoJohnson} creates a \emph{specification} of a recipe step that #' will transform data using a simple Yeo-Johnson transformation. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param lambdas A numeric vector of transformation values. This is #' \code{NULL} until computed by \code{\link{prep.recipe}}. #' @param limits A length 2 numeric vector defining the range to compute the #' transformation parameter lambda. #' @param nunique An integer where data that have fewer unique values will #' not be evaluated for a transformation. #' @keywords datagen #' @concept preprocessing transformation_methods #' @export #' @details The Yeo-Johnson transformation is very similar to the Box-Cox but #' does not require the input variables to be strictly positive. In the #' package, the partial log-likelihood function is directly optimized within #' a reasonable set of transformation values (which can be changed by the #' user). #' #' This transformation is typically done on the outcome variable using the #' residuals for a statistical model (such as ordinary least squares). Here, #' a simple null model (intercept only) is used to apply the transformation #' to the \emph{predictor} variables individually. This can have the effect #' of making the variable distributions more symmetric. #' #' If the transformation parameters are estimated to be very close to the #' bounds, or if the optimization fails, a value of \code{NA} is used and #' no transformation is applied. #' #' @references Yeo, I. K., and Johnson, R. A. (2000).
A new family of power #' transformations to improve normality or symmetry. \emph{Biometrika}. #' @examples #' #' data(biomass) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' #' yj_trans <- step_YeoJohnson(rec, all_numeric()) #' #' yj_estimates <- prep(yj_trans, training = biomass_tr) #' #' yj_te <- bake(yj_estimates, biomass_te) #' #' plot(density(biomass_te$sulfur), main = "before") #' plot(density(yj_te$sulfur), main = "after") #' @seealso \code{\link{step_BoxCox}} \code{\link{recipe}} #' \code{\link{prep.recipe}} \code{\link{bake.recipe}} step_YeoJohnson <- function(recipe, ..., role = NA, trained = FALSE, lambdas = NULL, limits = c(-5, 5), nunique = 5) { add_step( recipe, step_YeoJohnson_new( terms = check_ellipses(...), role = role, trained = trained, lambdas = lambdas, limits = sort(limits)[1:2], nunique = nunique ) ) } step_YeoJohnson_new <- function(terms = NULL, role = NA, trained = FALSE, lambdas = NULL, limits = NULL, nunique = NULL) { step( subclass = "YeoJohnson", terms = terms, role = role, trained = trained, lambdas = lambdas, limits = limits, nunique = nunique ) } #' @export prep.step_YeoJohnson <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) values <- vapply( training[, col_names], estimate_yj, c(lambda = 0), limits = x$limits, nunique = x$nunique ) values <- values[!is.na(values)] step_YeoJohnson_new( terms = x$terms, role = x$role, trained = TRUE, lambdas = values, limits = x$limits, nunique = x$nunique ) } #' @export bake.step_YeoJohnson <- function(object, newdata, ...) { if (length(object$lambdas) == 0) return(as_tibble(newdata)) param <- names(object$lambdas) for (i in seq_along(object$lambdas)) newdata[, param[i]] <- yj_trans(getElement(newdata, param[i]), lambda = object$lambdas[param[i]]) as_tibble(newdata) } print.step_YeoJohnson <- function(x, width = max(20, options()$width - 39), ...) { cat("Yeo-Johnson transformation on ", sep = "") printer(names(x$lambdas), x$terms, x$trained, width = width) invisible(x) } ## computes the new data given a lambda #' Internal Functions #' #' These are not to be used directly by the users. #' @export #' @keywords internal #' @rdname recipes-internal yj_trans <- function(x, lambda, eps = .001) { if (is.na(lambda)) return(x) if (!inherits(x, "tbl_df") || is.data.frame(x)) { x <- unlist(x, use.names = FALSE) } else { if (!is.vector(x)) x <- as.vector(x) } not_neg <- x >= 0 nn_trans <- function(x, lambda) if (abs(lambda) < eps) log(x + 1) else ((x + 1) ^ lambda - 1) / lambda ng_trans <- function(x, lambda) if (abs(lambda - 2) < eps) - log(-x + 1) else - ((-x + 1) ^ (2 - lambda) - 1) / (2 - lambda) if (any(not_neg)) x[not_neg] <- nn_trans(x[not_neg], lambda) if (any(!not_neg)) x[!not_neg] <- ng_trans(x[!not_neg], lambda) x } ## Helper for the log-likelihood calc for eq 3.1 of Yeo, I. K., ## & Johnson, R. A. (2000). A new family of power transformations ## to improve normality or symmetry. Biometrika. 
page 957 #' @importFrom stats var ll_yj <- function(lambda, y, eps = .001) { n <- length(y) nonneg <- all(y > 0) y_t <- yj_trans(y, lambda) mu_t <- mean(y_t) var_t <- var(y_t) * (n - 1) / n const <- sum(sign(y) * log(abs(y) + 1)) res <- -.5 * n * log(var_t) + (lambda - 1) * const res } #' @importFrom stats complete.cases ## eliminates missing data and returns -llh yj_obj <- function(lam, dat){ dat <- dat[complete.cases(dat)] ll_yj(lambda = lam, y = dat) } ## estimates the values #' @importFrom stats optimize #' @export #' @keywords internal #' @rdname recipes-internal estimate_yj <- function(dat, limits = c(-5, 5), nunique = 5) { eps <- .001 if (length(unique(dat)) < nunique) return(NA) res <- optimize( yj_obj, interval = limits, maximum = TRUE, dat = dat, tol = .0001 ) lam <- res$maximum if (abs(limits[1] - lam) <= eps | abs(limits[2] - lam) <= eps) lam <- NA lam } recipes/R/date.R0000644000177700017770000001577413135741217014515 0ustar herbrandtherbrandt#' Date Feature Generator #' #' \code{step_date} creates a \emph{specification} of a recipe step that will #' convert date data into one or more factor or numeric variables. #' #' @inheritParams step_center #' @inherit step_center return #' @param ... One or more selector functions to choose which variables #' will be used to create the new variables. The selected variables should #' have class \code{Date} or \code{POSIXct}. See \code{\link{selections}} for #' more details. #' @param role For model terms created by this step, what analysis role should #' they be assigned? By default, the function assumes that the new variable #' columns created by the original variables will be used as predictors in a #' model. #' @param features A character string that includes at least one of the #' following values: \code{month}, \code{dow} (day of week), \code{doy} #' (day of year), \code{week}, \code{decimal} (decimal date, #' e.g. 2002.197), \code{quarter}, \code{semester}, \code{year}. #' @param label A logical. Only available for features \code{month} or #' \code{dow}. \code{TRUE} will display the day of the week as an ordered #' factor of character strings, such as "Sunday." \code{FALSE} will display #' the day of the week as a number. #' @param abbr A logical. Only available for features \code{month} or #' \code{dow}. \code{FALSE} will display the day of the week as an ordered #' factor of character strings, such as "Sunday". \code{TRUE} will display #' an abbreviated version of the label, such as "Sun". \code{abbr} is #' disregarded if \code{label = FALSE}. #' @param ordinal A logical: should factors be ordered? Only available for #' features \code{month} or \code{dow}. #' @param columns A character string of variables that will be used as #' inputs. This field is a placeholder and will be populated once #' \code{\link{prep.recipe}} is used. #' @keywords datagen #' @concept preprocessing model_specification variable_encodings dates #' @export #' @details Unlike other steps, \code{step_date} does \emph{not} remove the #' original date variables. \code{\link{step_rm}} can be used for this #' purpose.
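#'
#' A sketch of dropping the original columns afterwards, using the example
#' data below:
#' \preformatted{
#' recipe(~ Dan + Stefan, examples) \%>\%
#'   step_date(all_predictors()) \%>\%
#'   step_rm(Dan, Stefan)
#' }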
#' @examples #' library(lubridate) #' #' examples <- data.frame(Dan = ymd("2002-03-04") + days(1:10), #' Stefan = ymd("2006-01-13") + days(1:10)) #' date_rec <- recipe(~ Dan + Stefan, examples) %>% #' step_date(all_predictors()) #' #' date_rec <- prep(date_rec, training = examples) #' date_values <- bake(date_rec, newdata = examples) #' date_values #' @seealso \code{\link{step_holiday}} \code{\link{step_rm}} #' \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} step_date <- function(recipe, ..., role = "predictor", trained = FALSE, features = c("dow", "month", "year"), abbr = TRUE, label = TRUE, ordinal = FALSE, columns = NULL ) { feat <- c("year", "doy", "week", "decimal", "semester", "quarter", "dow", "month") if (!all(features %in% feat)) stop("Possible values of `features` should include: ", paste0("'", feat, "'", collapse = ", ")) add_step( recipe, step_date_new( terms = check_ellipses(...), role = role, trained = trained, features = features, abbr = abbr, label = label, ordinal = ordinal, columns = columns ) ) } step_date_new <- function( terms = NULL, role = "predictor", trained = FALSE, features = features, abbr = abbr, label = label, ordinal = ordinal, columns = columns ) { step( subclass = "date", terms = terms, role = role, trained = trained, features = features, abbr = abbr, label = label, ordinal = ordinal, columns = columns ) } #' @importFrom stats as.formula model.frame #' @export prep.step_date <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) date_data <- info[info$variable %in% col_names, ] if (any(date_data$type != "date")) stop("All variables for `step_date` should be either `Date` or ", "`POSIXct` classes.", call. = FALSE) step_date_new( terms = x$terms, role = x$role, trained = TRUE, features = x$features, abbr = x$abbr, label = x$label, ordinal = x$ordinal, columns = col_names ) } ord2fac <- function(x, what) { x <- getElement(x, what) factor(as.character(x), levels = levels(x), ordered = FALSE) } #' @importFrom lubridate year yday week decimal_date quarter semester wday month get_date_features <- function(dt, feats, abbr = TRUE, label = TRUE, ord = FALSE) { ## pre-allocate values res <- matrix(NA, nrow = length(dt), ncol = length(feats)) res <- as_tibble(res) colnames(res) <- feats if ("year" %in% feats) res[, grepl("year$", names(res))] <- year(dt) if ("doy" %in% feats) res[, grepl("doy$", names(res))] <- yday(dt) if ("week" %in% feats) res[, grepl("week$", names(res))] <- week(dt) if ("decimal" %in% feats) res[, grepl("decimal$", names(res))] <- decimal_date(dt) if ("quarter" %in% feats) res[, grepl("quarter$", names(res))] <- quarter(dt) if ("semester" %in% feats) res[, grepl("semester$", names(res))] <- semester(dt) if ("dow" %in% feats) { res[, grepl("dow$", names(res))] <- wday(dt, abbr = abbr, label = label) if (!ord & label == TRUE) res[, grepl("dow$", names(res))] <- ord2fac(res, grep("dow$", names(res), value = TRUE)) } if ("month" %in% feats) { res[, grepl("month$", names(res))] <- month(dt, abbr = abbr, label = label) if (!ord & label == TRUE) res[, grepl("month$", names(res))] <- ord2fac(res, grep("month$", names(res), value = TRUE)) } res } #' @importFrom tibble as_tibble is_tibble #' @export bake.step_date <- function(object, newdata, ...)
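# Pre-allocate one column per (date column, feature) pair; the new columns
# are named <column>_<feature> and the original date columns are kept.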
{ new_cols <- rep(length(object$features), each = length(object$columns)) date_values <- matrix(NA, nrow = nrow(newdata), ncol = sum(new_cols)) colnames(date_values) <- rep("", sum(new_cols)) date_values <- as_tibble(date_values) strt <- 1 for (i in seq_along(object$columns)) { cols <- (strt):(strt + new_cols[i] - 1) tmp <- get_date_features( dt = getElement(newdata, object$columns[i]), feats = object$features, abbr = object$abbr, label = object$label, ord = object$ordinal ) date_values[, cols] <- tmp names(date_values)[cols] <- paste(object$columns[i], names(tmp), sep = "_") strt <- max(cols) + 1 } newdata <- cbind(newdata, date_values) if (!is_tibble(newdata)) newdata <- as_tibble(newdata) newdata } print.step_date <- function(x, width = max(20, options()$width - 29), ...) { cat("Date features from ") printer(x$columns, x$terms, x$trained, width = width) invisible(x) } recipes/R/logit.R0000644000177700017770000000476513135741217014714 0ustar herbrandtherbrandt#' Logit Transformation #' #' \code{step_logit} creates a \emph{specification} of a recipe step that will #' logit transform the data. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param columns A character string of variable names that will be (eventually) #' populated by the \code{terms} argument. #' @keywords datagen #' @concept preprocessing transformation_methods #' @export #' @details The logit transformation takes values between zero and one #' and translates them to be on the real line using the function #' \code{f(p) = log(p/(1-p))}. #' @examples #' set.seed(313) #' examples <- matrix(runif(40), ncol = 2) #' examples <- data.frame(examples) #' #' rec <- recipe(~ X1 + X2, data = examples) #' #' logit_trans <- rec %>% #' step_logit(all_predictors()) #' #' logit_obj <- prep(logit_trans, training = examples) #' #' transformed_te <- bake(logit_obj, examples) #' plot(examples$X1, transformed_te$X1) #' @seealso \code{\link{step_invlogit}} \code{\link{step_log}} #' \code{\link{step_sqrt}} \code{\link{step_hyperbolic}} \code{\link{recipe}} #' \code{\link{prep.recipe}} \code{\link{bake.recipe}} step_logit <- function(recipe, ..., role = NA, trained = FALSE, columns = NULL) { add_step(recipe, step_logit_new( terms = check_ellipses(...), role = role, trained = trained, columns = columns )) } step_logit_new <- function(terms = NULL, role = NA, trained = FALSE, columns = NULL) { step( subclass = "logit", terms = terms, role = role, trained = trained, columns = columns ) } #' @export prep.step_logit <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) step_logit_new( terms = x$terms, role = x$role, trained = TRUE, columns = col_names ) } #' @importFrom tibble as_tibble #' @importFrom stats binomial #' @export bake.step_logit <- function(object, newdata, ...) { for (i in seq_along(object$columns)) newdata[, object$columns[i]] <- binomial()$linkfun(getElement(newdata, object$columns[i])) as_tibble(newdata) } print.step_logit <- function(x, width = max(20, options()$width - 33), ...) { cat("Logit transformation on ", sep = "") printer(x$columns, x$terms, x$trained, width = width) invisible(x) } recipes/R/holiday.R0000644000177700017770000001137513135741217015226 0ustar herbrandtherbrandt#' Holiday Feature Generator #' #' \code{step_holiday} creates a \emph{specification} of a recipe step that #' will convert date data into one or more binary indicator variables for #' common holidays.
#' #' @inheritParams step_center #' @inherit step_center return #' @param ... One or more selector functions to choose which variables will be #' used to create the new variables. The selected variables should have #' class \code{Date} or \code{POSIXct}. See \code{\link{selections}} for #' more details. #' @param role For model terms created by this step, what analysis role should #' they be assigned? By default, the function assumes that the new variable #' columns created by the original variables will be used as predictors in #' a model. #' @param holidays A character string that includes at least one holiday #' supported by the \code{timeDate} package. See #' \code{\link[timeDate]{listHolidays}} for a complete list. #' @param columns A character string of variables that will be used as #' inputs. This field is a placeholder and will be populated once #' \code{\link{prep.recipe}} is used. #' @keywords datagen #' @concept preprocessing model_specification variable_encodings dates #' @export #' @details Unlike other steps, \code{step_holiday} does \emph{not} remove the #' original date variables. \code{\link{step_rm}} can be used for #' this purpose. #' @examples #' library(lubridate) #' #' examples <- data.frame(someday = ymd("2000-12-20") + days(0:40)) #' holiday_rec <- recipe(~ someday, examples) %>% #' step_holiday(all_predictors()) #' #' holiday_rec <- prep(holiday_rec, training = examples) #' holiday_values <- bake(holiday_rec, newdata = examples) #' holiday_values #' @seealso \code{\link{step_date}} \code{\link{step_rm}} #' \code{\link{recipe}} \code{\link{prep.recipe}} #' \code{\link{bake.recipe}} \code{\link[timeDate]{listHolidays}} #' @import timeDate step_holiday <- function( recipe, ..., role = "predictor", trained = FALSE, holidays = c("LaborDay", "NewYearsDay", "ChristmasDay"), columns = NULL ) { all_days <- listHolidays() if (!all(holidays %in% all_days)) stop("Invalid `holidays` value. See timeDate::listHolidays", call. = FALSE) add_step( recipe, step_holiday_new( terms = check_ellipses(...), role = role, trained = trained, holidays = holidays, columns = columns ) ) } step_holiday_new <- function( terms = NULL, role = "predictor", trained = FALSE, holidays = holidays, columns = columns ) { step( subclass = "holiday", terms = terms, role = role, trained = trained, holidays = holidays, columns = columns ) } #' @importFrom stats as.formula model.frame #' @export prep.step_holiday <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) holiday_data <- info[info$variable %in% col_names, ] if (any(holiday_data$type != "date")) stop("All variables for `step_holiday` should be either `Date` ", "or `POSIXct` classes.", call. = FALSE) step_holiday_new( terms = x$terms, role = x$role, trained = TRUE, holidays = x$holidays, columns = col_names ) } is_holiday <- function(hol, dt) { hdate <- holiday(year = unique(year(dt)), Holiday = hol) hdate <- as.Date(hdate) out <- rep(0, length(dt)) out[dt %in% hdate] <- 1 out } #' @importFrom lubridate year is.Date get_holiday_features <- function(dt, hdays) { if (!is.Date(dt)) dt <- as.Date(dt) hdays <- as.list(hdays) hfeat <- lapply(hdays, is_holiday, dt = dt) hfeat <- do.call("cbind", hfeat) colnames(hfeat) <- unlist(hdays) as_tibble(hfeat) } #' @importFrom tibble as_tibble is_tibble #' @export bake.step_holiday <- function(object, newdata, ...)
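# As in bake.step_date: build one 0/1 indicator column per (date column,
# holiday) pair, named <column>_<holiday>; the original columns are kept.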
{ new_cols <- rep(length(object$holidays), each = length(object$columns)) holiday_values <- matrix(NA, nrow = nrow(newdata), ncol = sum(new_cols)) colnames(holiday_values) <- rep("", sum(new_cols)) holiday_values <- as_tibble(holiday_values) strt <- 1 for (i in seq_along(object$columns)) { cols <- (strt):(strt + new_cols[i] - 1) tmp <- get_holiday_features(dt = getElement(newdata, object$columns[i]), hdays = object$holidays) holiday_values[, cols] <- tmp names(holiday_values)[cols] <- paste(object$columns[i], names(tmp), sep = "_") strt <- max(cols) + 1 } newdata <- cbind(newdata, as_tibble(holiday_values)) if (!is_tibble(newdata)) newdata <- as_tibble(newdata) newdata } print.step_holiday <- function(x, width = max(20, options()$width - 29), ...) { cat("Holiday features from ") printer(x$columns, x$terms, x$trained, width = width) invisible(x) } recipes/R/classdist.R0000644000177700017770000001317213135741217015563 0ustar herbrandtherbrandt#' Distances to Class Centroids #' #' \code{step_classdist} creates a \emph{specification} of a recipe step #' that will convert numeric data into Mahalanobis distance measurements to #' the data centroid. This is done for each value of a categorical class #' variable. #' #' @inheritParams step_center #' @inherit step_center return #' @param class A single character string that specifies a single categorical #' variable to be used as the class. #' @param role For model terms created by this step, what analysis role should #' they be assigned? By default, the function assumes that the resulting #' distances will be used as predictors in a model. #' @param mean_func A function to compute the center of the distribution. #' @param cov_func A function that computes the covariance matrix #' @param pool A logical: should the covariance matrix be computed by pooling #' the data for all of the classes? #' @param log A logical: should the distances be transformed by the natural #' log function? #' @param objects Statistics are stored here once this step has been trained #' by \code{\link{prep.recipe}}. #' @keywords datagen #' @concept preprocessing dimension_reduction #' @export #' @details \code{step_classdist} will create a new column for every unique value of the #' \code{class} variable. The resulting variables will not replace the #' original values and have the prefix \code{classdist_}. #' #' Note that, by default, the default covariance function requires that each #' class should have at least as many rows as variables listed in the #' \code{terms} argument. If \code{pool = TRUE}, there must be at least as #' many data points as variables overall. #' @examples #' #' # in case of missing data...
#' mean2 <- function(x) mean(x, na.rm = TRUE) #' #' rec <- recipe(Species ~ ., data = iris) %>% #' step_classdist(all_predictors(), class = "Species", #' pool = FALSE, mean_func = mean2) #' #' rec_dists <- prep(rec, training = iris) #' #' dists_to_species <- bake(rec_dists, newdata = iris, everything()) #' ## on log scale: #' dist_cols <- grep("classdist", names(dists_to_species), value = TRUE) #' dists_to_species[, c("Species", dist_cols)] #' @importFrom stats cov step_classdist <- function(recipe, ..., class, role = "predictor", trained = FALSE, mean_func = mean, cov_func = cov, pool = FALSE, log = TRUE, objects = NULL) { if (!is.character(class) || length(class) != 1) stop("`class` should be a single character value.") add_step( recipe, step_classdist_new( terms = check_ellipses(...), class = class, role = role, trained = trained, mean_func = mean_func, cov_func = cov_func, pool = pool, log = log, objects = objects ) ) } step_classdist_new <- function(terms = NULL, class = NULL, role = "predictor", trained = FALSE, mean_func = NULL, cov_func = NULL, pool = NULL, log = NULL, objects = NULL) { step( subclass = "classdist", terms = terms, class = class, role = role, trained = trained, mean_func = mean_func, cov_func = cov_func, pool = pool, log = log, objects = objects ) } get_center <- function(x, mfun = mean) { apply(x, 2, mfun) } get_both <- function(x, mfun = mean, cfun = cov) { list(center = get_center(x, mfun), scale = cfun(x)) } #' @importFrom stats as.formula model.frame #' @export prep.step_classdist <- function(x, training, info = NULL, ...) { class_var <- x$class[1] x_names <- terms_select(x$terms, info = info) x_dat <- split(training[, x_names], getElement(training, class_var)) if (x$pool) { res <- list( center = lapply(x_dat, get_center, mfun = x$mean_func), scale = x$cov_func(training[, x_names]) ) } else { res <- lapply(x_dat, get_both, mfun = x$mean_func, cfun = x$cov_func) } step_classdist_new( terms = x$terms, class = x$class, role = x$role, trained = TRUE, mean_func = x$mean_func, cov_func = x$cov_func, pool = x$pool, log = x$log, objects = res ) } #' @importFrom stats mahalanobis mah_by_class <- function(param, x) mahalanobis(x, param$center, param$scale) mah_pooled <- function(means, x, cov_mat) mahalanobis(x, means, cov_mat) #' @importFrom tibble as_tibble #' @export bake.step_classdist <- function(object, newdata, ...) { if (object$pool) { x_cols <- names(object$objects[["center"]][[1]]) res <- lapply( object$objects$center, mah_pooled, x = newdata[, x_cols], cov_mat = object$objects$scale ) } else { x_cols <- names(object$objects[[1]]$center) res <- lapply(object$objects, mah_by_class, x = newdata[, x_cols]) } if (object$log) res <- lapply(res, log) res <- as_tibble(res) colnames(res) <- paste0("classdist_", colnames(res)) res <- cbind(newdata, res) if (!is_tibble(res)) res <- as_tibble(res) res } print.step_classdist <- function(x, width = max(20, options()$width - 30), ...) { cat("Distances to", x$class, "for ") if (x$trained) { x_names <- if (x$pool) names(x$objects[["center"]][[1]]) else names(x$objects[[1]]$center) } else x_names <- NULL printer(x_names, x$terms, x$trained, width = width) invisible(x) } recipes/R/lincombo.R0000644000177700017770000001332113135741217015370 0ustar herbrandtherbrandt#' Linear Combination Filter #' #' \code{step_lincomb} creates a \emph{specification} of a recipe step that #' will potentially remove numeric variables that have linear combinations #' between them. 
#' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param max_steps The maximum number of times that the filtering algorithm #' is applied (see Details). #' @param removals A character string that contains the names of columns that #' should be removed. These values are not determined until #' \code{\link{prep.recipe}} is called. #' @keywords datagen #' @concept preprocessing variable_filters #' @author Max Kuhn, Kirk Mettler, and Jed Wing #' @export #' #' @details This step finds exact linear combinations between two or more #' variables and recommends which column(s) should be removed to resolve the #' issue. This algorithm may need to be applied multiple times (as defined #' by \code{max_steps}). #' @examples #' data(biomass) #' #' biomass$new_1 <- with(biomass, #' .1*carbon - .2*hydrogen + .6*sulfur) #' biomass$new_2 <- with(biomass, #' .5*carbon - .2*oxygen + .6*nitrogen) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + #' sulfur + new_1 + new_2, #' data = biomass_tr) #' #' lincomb_filter <- rec %>% #' step_lincomb(all_predictors()) #' #' prep(lincomb_filter, training = biomass_tr) #' @seealso \code{\link{step_nzv}} \code{\link{step_corr}} #' \code{\link{recipe}} \code{\link{prep.recipe}} #' \code{\link{bake.recipe}} step_lincomb <- function(recipe, ..., role = NA, trained = FALSE, max_steps = 5, removals = NULL) { add_step( recipe, step_lincomb_new( terms = check_ellipses(...), role = role, trained = trained, max_steps = max_steps, removals = removals ) ) } step_lincomb_new <- function(terms = NULL, role = NA, trained = FALSE, max_steps = NULL, removals = NULL) { step( subclass = "lincomb", terms = terms, role = role, trained = trained, max_steps = max_steps, removals = removals ) } #' @export prep.step_lincomb <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) if (any(info$type[info$variable %in% col_names] != "numeric")) stop("All variables for the linear combination filter should be numeric") filter <- iter_lc_rm(x = training[, col_names], max_steps = x$max_steps) step_lincomb_new( terms = x$terms, role = x$role, trained = TRUE, max_steps = x$max_steps, removals = filter ) } #' @export bake.step_lincomb <- function(object, newdata, ...) { if (length(object$removals) > 0) newdata <- newdata[, !(colnames(newdata) %in% object$removals)] as_tibble(newdata) } print.step_lincomb <- function(x, width = max(20, options()$width - 36), ...) { if (x$trained) { if (length(x$removals) > 0) { cat("Linear combination filter removed ") cat(format_ch_vec(x$removals, width = width)) } else cat("Linear combination filter removed no terms") } else { cat("Linear combination filter on ", sep = "") cat(format_selectors(x$terms, wdth = width)) } if (x$trained) cat(" [trained]\n") else cat("\n") invisible(x) } recommend_rm <- function(x, eps = 1e-6, ...) { if (!is.matrix(x)) x <- as.matrix(x) if (is.null(colnames(x))) stop("`x` should have column names", call.
= FALSE) qr_decomp <- qr(x) qr_decomp_R <- qr.R(qr_decomp) # extract R matrix num_cols <- ncol(qr_decomp_R) # number of columns in R rank <- qr_decomp$rank # number of independent columns pivot <- qr_decomp$pivot # get the pivot vector if (is.null(num_cols) || rank == num_cols) { rm_list <- character(0) # there are no linear combinations } else { p1 <- 1:rank X <- qr_decomp_R[p1, p1] # extract the independent columns Y <- qr_decomp_R[p1, -p1, drop = FALSE] # extract the dependent columns b <- qr(X) # factor the independent columns b <- qr.coef(b, Y) # get regression coefficients of # the dependent columns b[abs(b) < eps] <- 0 # zap small values # generate a list with one element for each dependent column combos <- lapply(1:ncol(Y), function(i) c(pivot[rank + i], pivot[which(b[, i] != 0)])) rm_list <- unlist(lapply(combos, function(x) x[1])) rm_list <- colnames(x)[rm_list] } rm_list } iter_lc_rm <- function(x, max_steps = 10, verbose = FALSE) { if (is.null(colnames(x))) stop("`x` should have column names", call. = FALSE) orig_names <- colnames(x) if (!is.matrix(x)) x <- as.matrix(x) # converting to matrix may alter column names name_df <- data.frame(orig = orig_names, current = colnames(x), stringsAsFactors = FALSE) for (i in 1:max_steps) { if (verbose) cat(i) if (i == max_steps) break () lcs <- recommend_rm(x) if (length(lcs) == 0) break () else { if (verbose) cat(" removing", length(lcs), "\n") x <- x[, !(colnames(x) %in% lcs)] } } if (verbose) cat("\n") name_df <- name_df[!(name_df$current %in% colnames(x)), ] name_df$orig } recipes/R/pkg.R0000644000177700017770000000263513135741217014355 0ustar herbrandtherbrandt#' recipes: A package for computing and preprocessing design matrices. #' #'The \code{recipes} package can be used to create design matrices for modeling #' and to conduct preprocessing of variables. It is meant to be a more #' extensive framework that R's formula method. Some differences between #' simple formula methods and recipes are that #'\enumerate{ #'\item Variables can have arbitrary roles in the analysis beyond predictors #' and outcomes. #'\item A recipe consists of one or more steps that define actions on the #' variables. #'\item Recipes can be defined sequentially using pipes as well as being #' modifiable and extensible. #'} #' #' #' @section Basic Functions: #' The three main functions are \code{\link{recipe}}, \code{\link{prep}}, #' and \code{\link{bake}}. #' #' \code{\link{recipe}} defines the operations on the data and the associated #' roles. Once the preprocessing steps are defined, any parameters are #' estimated using \code{\link{prep}}. Once the data are ready for #' transformation, the \code{\link{bake}} function applies the operations. #' #' @section Step Functions: #' These functions are used to add new actions to the recipe and have the #' naming convention \code{"step_action"}. For example, #' \code{\link{step_center}} centers the data to have a zero mean and #' \code{\link{step_dummy}} is used to create dummy variables. #' @docType package #' @name recipes NULL recipes/R/intercept.R0000644000177700017770000000527713135741217015576 0ustar herbrandtherbrandt#' Add intercept (or constant) column #' #' \code{step_intercept} creates a \emph{specification} of a recipe step that #' will add an intercept or constant term in the first column of a data #' matrix. \code{step_intercept} has defaults to \emph{predictor} role so #' that it is by default called in the bake step. 
Be careful to avoid #' unintentional transformations when calling steps with #' \code{all_predictors}. #' #' @param recipe A recipe object. The step will be added to the sequence of #' operations for this recipe. #' @param ... Argument ignored; included for consistency with other step #' specification functions. #' @param role Defaults to "predictor" #' @param trained A logical to indicate if the quantities for preprocessing #' have been estimated. Again included for consistency. #' @param name Character name for new added column #' @param value A numeric constant to fill the intercept column. Defaults to 1. #' #' @return An updated version of \code{recipe} with the #' new step added to the sequence of existing steps (if any). #' @export #' #' @examples #' data(biomass) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' rec_trans <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)]) %>% #' step_intercept(value = 2) #' #' rec_obj <- prep(rec_trans, training = biomass_tr) #' #' with_intercept <- bake(rec_obj, biomass_te) #' with_intercept #' #' @seealso \code{\link{recipe}} \code{\link{prep.recipe}} #' \code{\link{bake.recipe}} step_intercept <- function(recipe, ..., role = "predictor", trained = FALSE, name = "intercept", value = 1) { if (length(list(...)) > 0) warning("Selectors are not used for this step.", call. = FALSE) if (!is.numeric(value)) stop("Intercept value must be numeric.", call. = FALSE) if (!is.character(name) | length(name) != 1) stop("Intercept/constant column name must be a character value.", call. = FALSE) add_step( recipe, step_intercept_new( role = role, trained = trained, name = name, value = value)) } step_intercept_new <- function(role = "predictor", trained = FALSE, name = "intercept", value = 1) { step( subclass = "intercept", role = role, trained = trained, name = name, value = value ) } prep.step_intercept <- function(x, training, info = NULL, ...) { x$trained <- TRUE x } #' @importFrom tibble add_column bake.step_intercept <- function(object, newdata, ...) { tibble::add_column(newdata, !!object$name := object$value, .before = TRUE) } recipes/R/log.R0000644000177700017770000000461013135741217014350 0ustar herbrandtherbrandt#' Logarithmic Transformation #' #' \code{step_log} creates a \emph{specification} of a recipe step that will #' log transform data. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param base A numeric value for the base. #' @param columns A character string of variable names that will be (eventually) #' populated by the \code{terms} argument. 
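#' @details The base of the logarithm is set with the \code{base} argument.
#'   For example, a base-10 version of the example below would be
#' \preformatted{
#' log10_trans <- rec %>% step_log(all_predictors(), base = 10)
#' }
#'   Note that taking the log of zero or negative values produces
#'   \code{-Inf} or \code{NaN}, so the step is best applied to strictly
#'   positive data.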
#' @keywords datagen #' @concept preprocessing transformation_methods #' @export #' @examples #' set.seed(313) #' examples <- matrix(exp(rnorm(40)), ncol = 2) #' examples <- as.data.frame(examples) #' #' rec <- recipe(~ V1 + V2, data = examples) #' #' log_trans <- rec %>% #' step_log(all_predictors()) #' #' log_obj <- prep(log_trans, training = examples) #' #' transformed_te <- bake(log_obj, examples) #' plot(examples$V1, transformed_te$V1) #' @seealso \code{\link{step_logit}} \code{\link{step_invlogit}} #' \code{\link{step_hyperbolic}} \code{\link{step_sqrt}} #' \code{\link{recipe}} \code{\link{prep.recipe}} #' \code{\link{bake.recipe}} step_log <- function(recipe, ..., role = NA, trained = FALSE, base = exp(1), columns = NULL) { add_step( recipe, step_log_new( terms = check_ellipses(...), role = role, trained = trained, base = base, columns = columns ) ) } step_log_new <- function(terms = NULL, role = NA, trained = FALSE, base = NULL, columns = NULL) { step( subclass = "log", terms = terms, role = role, trained = trained, base = base, columns = columns ) } #' @export prep.step_log <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) step_log_new( terms = x$terms, role = x$role, trained = TRUE, base = x$base, columns = col_names ) } #' @export bake.step_log <- function(object, newdata, ...) { col_names <- object$columns for (i in seq_along(col_names)) newdata[, col_names[i]] <- log(getElement(newdata, col_names[i]), base = object$base) as_tibble(newdata) } print.step_log <- function(x, width = max(20, options()$width - 31), ...) { cat("Log transformation on ", sep = "") printer(x$columns, x$terms, x$trained, width = width) invisible(x) } recipes/R/modeimpute.R0000644000177700017770000000601213135741217015735 0ustar herbrandtherbrandt#' Impute Nominal Data Using the Most Common Value #' #' \code{step_modeimpute} creates a \emph{specification} of a recipe step that #' will substitute missing values of nominal variables by the training set #' mode of those variables. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param modes A named character vector of modes. This is \code{NULL} until #' computed by \code{\link{prep.recipe}}. #' @keywords datagen #' @concept preprocessing imputation #' @export #' @details \code{step_modeimpute} estimates the variable modes from the data #' used in the \code{training} argument of \code{prep.recipe}. #' \code{bake.recipe} then applies the new values to new data sets using #' these values. If the training set data has more than one mode, one is #' selected at random. 
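#' As an illustration of the tie-breaking, for a variable with two equally
#'   common values:
#' \preformatted{
#' x <- factor(c("a", "a", "b", "b", "c"))
#' # the mode is either "a" or "b", chosen at random
#' }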
#' @examples #' data("credit_data") #' #' ## missing data per column #' vapply(credit_data, function(x) mean(is.na(x)), c(num = 0)) #' #' set.seed(342) #' in_training <- sample(1:nrow(credit_data), 2000) #' #' credit_tr <- credit_data[ in_training, ] #' credit_te <- credit_data[-in_training, ] #' missing_examples <- c(14, 394, 565) #' #' rec <- recipe(Price ~ ., data = credit_tr) #' #' impute_rec <- rec %>% #' step_modeimpute(Status, Home, Marital) #' #' imp_models <- prep(impute_rec, training = credit_tr) #' #' imputed_te <- bake(imp_models, newdata = credit_te, everything()) #' #' table(credit_te$Home, imputed_te$Home, useNA = "always") step_modeimpute <- function(recipe, ..., role = NA, trained = FALSE, modes = NULL) { add_step( recipe, step_modeimpute_new( terms = check_ellipses(...), role = role, trained = trained, modes = modes ) ) } step_modeimpute_new <- function(terms = NULL, role = NA, trained = FALSE, modes = NULL) { step( subclass = "modeimpute", terms = terms, role = role, trained = trained, modes = modes ) } #' @export prep.step_modeimpute <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) modes <- vapply(training[, col_names], mode_est, c(mode = "")) step_modeimpute_new( terms = x$terms, role = x$role, trained = TRUE, modes ) } #' @export bake.step_modeimpute <- function(object, newdata, ...) { for (i in names(object$modes)) { if (any(is.na(newdata[, i]))) newdata[is.na(newdata[, i]), i] <- object$modes[i] } as_tibble(newdata) } print.step_modeimpute <- function(x, width = max(20, options()$width - 30), ...) { cat("Mode Imputation for ", sep = "") printer(names(x$modes), x$terms, x$trained, width = width) invisible(x) } mode_est <- function(x) { if (!is.character(x) & !is.factor(x)) stop("The data should be character or factor to compute the mode.", call. = FALSE) tab <- table(x) modes <- names(tab)[tab == max(tab)] sample(modes, size = 1) } recipes/R/depth.R0000644000177700017770000001307413135741262014677 0ustar herbrandtherbrandt#' Data Depths #' #' \code{step_depth} creates a a \emph{specification} of a recipe step that #' will convert numeric data into measurement of \emph{data depth}. This is #' done for each value of a categorical class variable. #' #' @inheritParams step_center #' @inherit step_center return #' @param ... One or more selector functions to choose which variables that #' will be used to create the new features. See \code{\link{selections}} for #' more details. #' @param class A single character string that specifies a single categorical #' variable to be used as the class. #' @param role For model terms created by this step, what analysis role should #' they be assigned?. By default, the function assumes that resulting depth #' estimates will be used as predictors in a model. #' @param metric A character string specifying the depth metric. Possible #' values are "potential", "halfspace", "Mahalanobis", "simplicialVolume", #' "spatial", and "zonoid". #' @param options A list of options to pass to the underlying depth functions. #' See \code{\link[ddalpha]{depth.halfspace}}, #' \code{\link[ddalpha]{depth.Mahalanobis}}, #' \code{\link[ddalpha]{depth.potential}}, #' \code{\link[ddalpha]{depth.projection}}, #' \code{\link[ddalpha]{depth.simplicial}}, #' \code{\link[ddalpha]{depth.simplicialVolume}}, #' \code{\link[ddalpha]{depth.spatial}}, \code{\link[ddalpha]{depth.zonoid}}. #' @param data The training data are stored here once after #' \code{\link{prep.recipe}} is executed. 
#' @keywords datagen
#' @concept preprocessing dimension_reduction
#' @export
#' @details Data depth metrics attempt to measure how close a data point
#'   is to the center of its distribution. There are a number of methods for
#'   calculating depth but a simple example is the inverse of the distance of
#'   a data point to the centroid of the distribution. Generally, small
#'   values indicate that a data point is not close to the centroid.
#'   \code{step_depth} can compute a class-specific depth for a new data
#'   point based on the proximity of the new value to the training set
#'   distribution.
#'
#' Note that the entire training set is saved to compute future depth values.
#'   The saved data have been trained (i.e. prepared) and baked (i.e.
#'   processed) up to the point before the location that \code{step_depth}
#'   occupies in the recipe. Also, the data requirements for the different
#'   step methods may vary. For example, using
#'   \code{metric = "Mahalanobis"} requires that each class should have at
#'   least as many rows as variables listed in the \code{terms} argument.
#'
#' The function will create a new column for every unique value of the
#'   \code{class} variable. The resulting variables will not replace the
#'   original values and have the prefix \code{depth_}.
#'
#' @examples
#'
#' # halfspace depth is the default
#' rec <- recipe(Species ~ ., data = iris) %>%
#'   step_depth(all_predictors(), class = "Species")
#'
#' rec_dists <- prep(rec, training = iris)
#'
#' dists_to_species <- bake(rec_dists, newdata = iris)
#' dists_to_species
step_depth <- function(recipe, ..., class, role = "predictor",
                       trained = FALSE, metric = "halfspace",
                       options = list(), data = NULL) {
  if (!is.character(class) || length(class) != 1)
    stop("`class` should be a single character value.")
  add_step(
    recipe,
    step_depth_new(
      terms = check_ellipses(...),
      class = class,
      role = role,
      trained = trained,
      metric = metric,
      options = options,
      data = data
    )
  )
}

step_depth_new <- function(terms = NULL, class = NULL, role = "predictor",
                           trained = FALSE, metric = NULL,
                           options = NULL, data = NULL) {
  step(
    subclass = "depth",
    terms = terms,
    class = class,
    role = role,
    trained = trained,
    metric = metric,
    options = options,
    data = data
  )
}

#' @importFrom stats as.formula model.frame
#' @export
prep.step_depth <- function(x, training, info = NULL, ...) {
  class_var <- x$class[1]
  x_names <- terms_select(x$terms, info = info)
  x_dat <- split(training[, x_names], getElement(training, class_var))
  x_dat <- lapply(x_dat, as.matrix)
  step_depth_new(
    terms = x$terms,
    class = x$class,
    role = x$role,
    trained = TRUE,
    metric = x$metric,
    options = x$options,
    data = x_dat
  )
}

get_depth <- function(tr_dat, new_dat, metric, opts) {
  if (!is.matrix(new_dat))
    new_dat <- as.matrix(new_dat)
  opts$data <- tr_dat
  opts$x <- new_dat
  do.call(paste0("depth.", metric), opts)
}

#' @importFrom tibble as_tibble
#' @importFrom ddalpha depth.halfspace depth.Mahalanobis depth.potential
#'   depth.projection depth.simplicial depth.simplicialVolume depth.spatial
#'   depth.zonoid
#' @export
bake.step_depth <- function(object, newdata, ...) {
  x_names <- colnames(object$data[[1]])
  x_data <- as.matrix(newdata[, x_names])
  res <- lapply(
    object$data,
    get_depth,
    new_dat = x_data,
    metric = object$metric,
    opts = object$options
  )
  res <- as_tibble(res)
  colnames(res) <- paste0("depth_", colnames(res))
  res <- cbind(newdata, res)
  if (!is_tibble(res))
    res <- as_tibble(res)
  res
}

print.step_depth <- function(x, width = max(20, options()$width - 30), ...)
{ cat("Data depth by ", x$class, "for ") if (x$trained) { cat(format_ch_vec(x_names, width = width)) } else x_names <- NULL printer(x_names, x$terms, x$trained, width = width) invisible(x) } recipes/R/shuffle.R0000644000177700017770000000465713135741217015236 0ustar herbrandtherbrandt#' Shuffle Variables #' #' \code{step_shuffle} creates a \emph{specification} of a recipe step that will #' randomly change the order of rows for selected variables. #' #' @inheritParams step_center #' @inherit step_center return #' @param ... One or more selector functions to choose which variables will #' permuted. See \code{\link{selections}} for more details. #' @param role Not used by this step since no new variables are created. #' @param columns A character string that contains the names of columns that #' should be shuffled. These values are not determined until #' \code{\link{prep.recipe}} is called. #' @keywords datagen #' @concept preprocessing randomization permutation #' @export #' @examples #' integers <- data.frame(A = 1:12, B = 13:24, C = 25:36) #' #' library(dplyr) #' rec <- recipe(~ A + B + C, data = integers) %>% #' step_shuffle(A, B) #' #' rand_set <- prep(rec, training = integers) #' #' set.seed(5377) #' bake(rand_set, integers) step_shuffle <- function(recipe, ..., role = NA, trained = FALSE, columns = NULL) { add_step(recipe, step_shuffle_new( terms = check_ellipses(...), role = role, trained = trained, columns = columns )) } step_shuffle_new <- function(terms = NULL, role = NA, trained = FALSE, columns = NULL) { step( subclass = "shuffle", terms = terms, role = role, trained = trained, columns = columns ) } #' @export prep.step_shuffle <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) step_shuffle_new( terms = x$terms, role = x$role, trained = TRUE, columns = col_names ) } #' @export bake.step_shuffle <- function(object, newdata, ...) { if (nrow(newdata) == 1) { warning("`newdata` contains a single row; unable to shuffle", call. = FALSE) return(newdata) } if (length(object$columns) > 0) for (i in seq_along(object$columns)) newdata[, object$columns[i]] <- sample(getElement(newdata, object$columns[i])) as_tibble(newdata) } print.step_shuffle <- function(x, width = max(20, options()$width - 22), ...) { cat("Shuffled ") printer(x$columns, x$terms, x$trained, width = width) invisible(x) } recipes/R/regex.R0000644000177700017770000001074413135741217014706 0ustar herbrandtherbrandt#' Create Dummy Variables using Regular Expressions #' #' \code{step_regex} creates a \emph{specification} of a recipe step that will #' create a new dummy variable based on a regular expression. #' #' @inheritParams step_center #' @inherit step_center return #' @param ... A single selector functions to choose which variable will be #' searched for the pattern. The selector should resolve into a single #' variable. See \code{\link{selections}} for more details. #' @param role For a variable created by this step, what analysis role should #' they be assigned?. By default, the function assumes that the new dummy #' variable column created by the original variable will be used as a #' predictors in a model. #' @param pattern A character string containing a regular expression (or #' character string for \code{fixed = TRUE}) to be matched in the given #' character vector. Coerced by \code{as.character} to a character string #' if possible. #' @param options A list of options to \code{\link{grepl}} that should not #' include \code{x} or \code{pattern}. 
#' @param result A single character value for the name of the new variable. It #' should be a valid column name. #' @param input A single character value for the name of the variable being #' searched. This is \code{NULL} until computed by #' \code{\link{prep.recipe}}. #' @keywords datagen #' @concept preprocessing dummy_variables regular_expressions #' @export #' @examples #' data(covers) #' #' rec <- recipe(~ description, covers) %>% #' step_regex(description, pattern = "(rock|stony)", result = "rocks") %>% #' step_regex(description, pattern = "ratake families") #' #' rec2 <- prep(rec, training = covers) #' rec2 #' #' with_dummies <- bake(rec2, newdata = covers) #' with_dummies step_regex <- function(recipe, ..., role = "predictor", trained = FALSE, pattern = ".", options = list(), result = make.names(pattern), input = NULL) { if (!is.character(pattern)) stop("`pattern` should be a character string", call. = FALSE) if (length(pattern) != 1) stop("`pattern` should be a single pattern", call. = FALSE) valid_args <- names(formals(grepl))[- (1:2)] if (any(!(names(options) %in% valid_args))) stop("Valid options are: ", paste0(valid_args, collapse = ", "), call. = FALSE) terms <- check_ellipses(...) if (length(terms) > 1) stop("For this step, only a single selector can be used.", call. = FALSE) add_step( recipe, step_regex_new( terms = terms, role = role, trained = trained, pattern = pattern, options = options, result = result, input = input ) ) } step_regex_new <- function(terms = NULL, role = NA, trained = FALSE, pattern = NULL, options = NULL, result = NULL, input = NULL) { step( subclass = "regex", terms = terms, role = role, trained = trained, pattern = pattern, options = options, result = result, input = input ) } #' @export prep.step_regex <- function(x, training, info = NULL, ...) { col_name <- terms_select(x$terms, info = info) if (length(col_name) != 1) stop("The selector should only select a single variable") if (any(info$type[info$variable %in% col_name] != "nominal")) stop("The regular expression input should be character or factor") step_regex_new( terms = x$terms, role = x$role, trained = TRUE, pattern = x$pattern, options = x$options, input = col_name, result = x$result ) } #' @importFrom rlang expr bake.step_regex <- function(object, newdata, ...) { ## sub in options regex <- expr( grepl( x = getElement(newdata, object$input), pattern = object$pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE ) ) if (length(object$options) > 0) regex <- mod_call_args(regex, args = object$options) newdata[, object$result] <- ifelse(eval(regex), 1, 0) newdata } print.step_regex <- function(x, width = max(20, options()$width - 30), ...) { cat("Regular expression dummy variable using `", x$pattern, "`", sep = "") if (x$trained) cat(" [trained]\n") else cat("\n") invisible(x) } recipes/R/data.R0000644000177700017770000000444513105555774014516 0ustar herbrandtherbrandt#' Biomass Data #' #' Ghugare et al (2014) contains a data set where different biomass fuels are #' characterized by the amount of certain molecules (carbon, hydrogen, oxygen, #' nitrogen, and sulfur) and the corresponding higher heating value (HHV). #' These data are from their Table S.2 of the Supplementary Materials #' #' @name biomass #' @aliases biomass #' @docType data #' @return \item{biomass}{a data frame} #' #' @source Ghugare, S. B., Tiwary, S., Elangovan, V., and Tambe, S. S. (2013). #' Prediction of Higher Heating Value of Solid Biomass Fuels Using Artificial #' Intelligence Formalisms. 
\emph{BioEnergy Research}, 1-12.
#'
#' @keywords datasets
#' @examples
#' data(biomass)
#' str(biomass)
NULL

#' OkCupid Data
#'
#' These are a sample of columns for users of the OkCupid dating website. The
#'   data are from Kim and Escobedo-Land (2015).
#'
#' @name okc
#' @aliases okc
#' @docType data
#' @return \item{okc}{a data frame}
#'
#' @source Kim, A. Y., and A. Escobedo-Land. 2015. "OkCupid Data for
#'   Introductory Statistics and Data Science Courses." \emph{Journal of
#'   Statistics Education: An International Journal on the Teaching and
#'   Learning of Statistics}.
#'
#' @keywords datasets
#' @examples
#' data(okc)
#' str(okc)
NULL

#' Credit Data
#'
#' These data are from the website of Dr. Lluís A. Belanche Muñoz by way of a
#'   github repository of Dr. Gaston Sanchez. One data point with a missing
#'   outcome was removed from the original data.
#'
#' @name credit_data
#' @aliases credit_data
#' @docType data
#' @return \item{credit_data}{a data frame}
#'
#' @source \url{https://github.com/gastonstat/CreditScoring},
#'   \url{http://bit.ly/2kkBFrk}
#'
#' @keywords datasets
#' @examples
#' data(credit_data)
#' str(credit_data)
NULL

#' Raw Cover Type Data
#'
#' These are raw data describing different types of forest cover-types
#'   from the UCI Machine Learning Database (see link below). There is one
#'   column in the data that has a few different pieces of textual
#'   information (of variable lengths).
#'
#' @name covers
#' @aliases covers
#' @docType data
#' @return \item{covers}{a data frame}
#'
#' @source \url{https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.info}
#'
#' @keywords datasets
#' @examples
#' data(covers)
#' str(covers)
NULL
recipes/R/bin2factor.R
#' Create a Factor from a Dummy Variable
#'
#' \code{step_bin2factor} creates a \emph{specification} of a recipe step that
#'   will create a two-level factor from a single dummy variable.
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... Selector functions that choose which variables will be
#'   converted. See \code{\link{selections}} for more details.
#' @param role Not used by this step since no new variables are created.
#' @param levels A length-2 character vector that indicates the factor levels
#'   for the 1's (in the first position) and the zeros (second)
#' @param columns A vector with the selected variable names. This is
#'   \code{NULL} until computed by \code{\link{prep.recipe}}.
#' @details This operation may be useful for situations where a binary piece
#'   of information may need to be represented as categorical instead of
#'   numeric. For example, naive Bayes models would do better to have factor
#'   predictors so that the binomial distribution is modeled instead of a
#'   Gaussian probability density of numeric binary data.
#'   Note that the data are only verified to be numeric (the values are not
#'   checked to be zeros and ones).
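#'   For example, custom labels can be supplied for the ones and zeros, in
#'   that order:
#' \preformatted{
#' step_bin2factor(rocks, levels = c("rocks", "no rocks"))
#' }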
#' @keywords datagen #' @concept preprocessing dummy_variables factors #' @export #' @examples #' data(covers) #' #' rec <- recipe(~ description, covers) %>% #' step_regex(description, pattern = "(rock|stony)", result = "rocks") %>% #' step_regex(description, pattern = "(rock|stony)", result = "more_rocks") %>% #' step_bin2factor(rocks) #' #' rec <- prep(rec, training = covers) #' results <- bake(rec, newdata = covers) #' #' table(results$rocks, results$more_rocks) step_bin2factor <- function(recipe, ..., role = NA, trained = FALSE, levels = c("yes", "no"), columns = NULL) { if (length(levels) != 2 | !is.character(levels)) stop("`levels` should be a two element character string", call. = FALSE) add_step( recipe, step_bin2factor_new( terms = check_ellipses(...), role = role, trained = trained, levels = levels, columns = columns ) ) } step_bin2factor_new <- function(terms = NULL, role = NA, trained = FALSE, levels = NULL, columns = NULL) { step( subclass = "bin2factor", terms = terms, role = role, trained = trained, levels = levels, columns = columns ) } #' @export prep.step_bin2factor <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) if (length(col_names) < 1) stop("The selector should only select at least one variable") if (any(info$type[info$variable %in% col_names] != "numeric")) stop("The variables should be numeric") step_bin2factor_new( terms = x$terms, role = x$role, trained = TRUE, levels = x$levels, columns = col_names ) } bake.step_bin2factor <- function(object, newdata, ...) { for (i in seq_along(object$columns)) newdata[, object$columns[i]] <- factor(ifelse( getElement(newdata, object$columns[i]) == 1, object$levels[1], object$levels[2] ), levels = object$levels) newdata } print.step_bin2factor <- function(x, width = max(20, options()$width - 30), ...) { cat("Dummy variable to factor conversion for ", sep = "") printer(x$columns, x$terms, x$trained, width = width) invisible(x) } recipes/R/corr.R0000644000177700017770000001175513135741217014544 0ustar herbrandtherbrandt#' High Correlation Filter #' #' \code{step_corr} creates a \emph{specification} of a recipe step that will #' potentially remove variables that have large absolute correlations with #' other variables. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param threshold A value for the threshold of absolute correlation values. #' The step will try to remove the minimum number of columns so that all the #' resulting absolute correlations are less than this value. #' @param use A character string for the \code{use} argument to the #' \code{\link[stats]{cor}} function. #' @param method A character string for the \code{method} argument to the #' \code{\link[stats]{cor}} function. #' @param removals A character string that contains the names of columns that #' should be removed. These values are not determined until #' \code{\link{prep.recipe}} is called. #' @keywords datagen #' @author Original R code for filtering algorithm by Dong Li, modified by #' Max Kuhn. Contributions by Reynald Lescarbeau (for original in #' \code{caret} package). Max Kuhn for the \code{step} function. #' @concept preprocessing variable_filters #' @export #' #' @details This step attempts to remove variables to keep the largest absolute #' correlation between the variables less than \code{threshold}. 
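#' For instance, with the default \code{threshold = 0.9}, only very highly
#'   correlated columns are candidates for removal; the example below uses a
#'   much more aggressive value of \code{.5}.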
#' @examples #' data(biomass) #' #' set.seed(3535) #' biomass$duplicate <- biomass$carbon + rnorm(nrow(biomass)) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + #' sulfur + duplicate, #' data = biomass_tr) #' #' corr_filter <- rec %>% #' step_corr(all_predictors(), threshold = .5) #' #' filter_obj <- prep(corr_filter, training = biomass_tr) #' #' filtered_te <- bake(filter_obj, biomass_te) #' round(abs(cor(biomass_tr[, c(3:7, 9)])), 2) #' round(abs(cor(filtered_te)), 2) #' @seealso \code{\link{step_nzv}} \code{\link{recipe}} #' \code{\link{prep.recipe}} \code{\link{bake.recipe}} step_corr <- function(recipe, ..., role = NA, trained = FALSE, threshold = 0.9, use = "pairwise.complete.obs", method = "pearson", removals = NULL) { add_step( recipe, step_corr_new( terms = check_ellipses(...), role = role, trained = trained, threshold = threshold, use = use, method = method, removals = removals ) ) } step_corr_new <- function( terms = NULL, role = NA, trained = FALSE, threshold = NULL, use = NULL, method = NULL, removals = NULL ) { step( subclass = "corr", terms = terms, role = role, trained = trained, threshold = threshold, use = use, method = method, removals = removals ) } #' @export prep.step_corr <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) filter <- corr_filter( x = training[, col_names], cutoff = x$threshold, use = x$use, method = x$method ) step_corr_new( terms = x$terms, role = x$role, trained = TRUE, threshold = x$threshold, use = x$use, method = x$method, removals = filter ) } #' @export bake.step_corr <- function(object, newdata, ...) { if (length(object$removals) > 0) newdata <- newdata[,!(colnames(newdata) %in% object$removals)] as_tibble(newdata) } print.step_corr <- function(x, width = max(20, options()$width - 36), ...) { if (x$trained) { if (length(x$removals) > 0) { cat("Correlation filter removed ") cat(format_ch_vec(x$removals, width = width)) } else cat("Correlation filter removed no terms") } else { cat("Correlation filter on ", sep = "") cat(format_selectors(x$terms, wdth = width)) } if (x$trained) cat(" [trained]\n") else cat("\n") invisible(x) } #' @importFrom stats cor corr_filter <- function(x, cutoff = .90, use = "pairwise.complete.obs", method = "pearson") { x <- cor(x, use = use, method = method) if (any(!complete.cases(x))) stop("The correlation matrix has some missing values.") averageCorr <- colMeans(abs(x)) averageCorr <- as.numeric(as.factor(averageCorr)) x[lower.tri(x, diag = TRUE)] <- NA combsAboveCutoff <- which(abs(x) > cutoff) colsToCheck <- ceiling(combsAboveCutoff / nrow(x)) rowsToCheck <- combsAboveCutoff %% nrow(x) colsToDiscard <- averageCorr[colsToCheck] > averageCorr[rowsToCheck] rowsToDiscard <- !colsToDiscard deletecol <- c(colsToCheck[colsToDiscard], rowsToCheck[rowsToDiscard]) deletecol <- unique(deletecol) if (length(deletecol) > 0) deletecol <- colnames(x)[deletecol] deletecol } recipes/R/ns.R0000644000177700017770000001055013135741217014207 0ustar herbrandtherbrandt#' Nature Spline Basis Functions #' #' \code{step_ns} creates a \emph{specification} of a recipe step that will #' create new columns that are basis expansions of variables using natural #' splines. #' #' @inheritParams step_center #' @inherit step_center return #' @param role For model terms created by this step, what analysis role should #' they be assigned?. 
By default, the function assumes that the new columns #' created from the original variables will be used as predictors in a model. #' @param objects A list of \code{\link[splines]{ns}} objects created once the #' step has been trained. #' @param options A list of options for \code{\link[splines]{ns}} which should #' not include \code{x}. #' @keywords datagen #' @concept preprocessing basis_expansion #' @export #' @details \code{step_ns} can new features from a single variable that enable #' fitting routines to model this variable in a nonlinear manner. The extent #' of the possible nonlinearity is determined by the \code{df} or \code{knot} #' arguments of \code{\link[splines]{ns}}. The original variables are #' removed from the data and new columns are added. The naming convention #' for the new variables is \code{varname_ns_1} and so on. #' @examples #' data(biomass) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' #' with_splines <- rec %>% #' step_ns(carbon, hydrogen) #' with_splines <- prep(with_splines, training = biomass_tr) #' #' expanded <- bake(with_splines, biomass_te) #' expanded #' @seealso \code{\link{step_poly}} \code{\link{recipe}} #' \code{\link{prep.recipe}} \code{\link{bake.recipe}} step_ns <- function(recipe, ..., role = "predictor", trained = FALSE, objects = NULL, options = list(df = 2)) { add_step( recipe, step_ns_new( terms = check_ellipses(...), trained = trained, role = role, objects = objects, options = options ) ) } step_ns_new <- function(terms = NULL, role = NA, trained = FALSE, objects = NULL, options = NULL) { step( subclass = "ns", terms = terms, role = role, trained = trained, objects = objects, options = options ) } #' @importFrom splines ns ns_wrapper <- function(x, args) { if (!("Boundary.knots" %in% names(args))) args$Boundary.knots <- range(x) args$x <- x ns_obj <- do.call("ns", args) ## don't need to save the original data so keep 1 row out <- matrix(NA, ncol = ncol(ns_obj), nrow = 1) class(out) <- c("ns", "basis", "matrix") attr(out, "knots") <- attr(ns_obj, "knots")[] attr(out, "Boundary.knots") <- attr(ns_obj, "Boundary.knots") attr(out, "intercept") <- attr(ns_obj, "intercept") out } #' @export prep.step_ns <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) obj <- lapply(training[, col_names], ns_wrapper, x$options) for (i in seq(along = col_names)) attr(obj[[i]], "var") <- col_names[i] step_ns_new( terms = x$terms, role = x$role, trained = TRUE, objects = obj, options = x$options ) } #' @importFrom tibble as_tibble is_tibble #' @importFrom stats predict #' @export bake.step_ns <- function(object, newdata, ...) { ## pre-allocate a matrix for the basis functions. 
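  ## `new_cols` records how many basis columns each original variable
  ## contributes; the loop below fills the matching block of columns in
  ## `ns_values`, names them `<var>_ns_<k>`, and then drops the original
  ## column from `newdata`.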
new_cols <- vapply(object$objects, ncol, c(int = 1L)) ns_values <- matrix(NA, nrow = nrow(newdata), ncol = sum(new_cols)) colnames(ns_values) <- rep("", sum(new_cols)) strt <- 1 for (i in names(object$objects)) { cols <- (strt):(strt + new_cols[i] - 1) orig_var <- attr(object$objects[[i]], "var") ns_values[, cols] <- predict(object$objects[[i]], getElement(newdata, i)) new_names <- paste(orig_var, "ns", names0(new_cols[i], ""), sep = "_") colnames(ns_values)[cols] <- new_names strt <- max(cols) + 1 newdata[, orig_var] <- NULL } newdata <- cbind(newdata, as_tibble(ns_values)) if (!is_tibble(newdata)) newdata <- as_tibble(newdata) newdata } print.step_ns <- function(x, width = max(20, options()$width - 28), ...) { cat("Natural Splines on ") printer(names(x$objects), x$terms, x$trained, width = width) invisible(x) } recipes/R/recipe.R0000644000177700017770000005067213135741262015047 0ustar herbrandtherbrandt#' Create a Recipe for Preprocessing Data #' #' A recipe is a description of what steps should be applied to a data set in #' order to get it ready for data analysis. #' #' @aliases recipe recipe.default recipe.formula #' @author Max Kuhn #' @keywords datagen #' @concept preprocessing model_specification #' @export recipe <- function(x, ...) UseMethod("recipe") #' @rdname recipe #' @export recipe.default <- function(x, ...) stop("`x` should be a data frame, matrix, or tibble", call. = FALSE) #' @rdname recipe #' @param vars A character string of column names corresponding to variables #' that will be used in any context (see below) #' @param roles A character string (the same length of \code{vars}) that #' describes a single role that the variable will take. This value could be #' anything but common roles are \code{"outcome"}, \code{"predictor"}, #' \code{"case_weight"}, or \code{"ID"} #' @param ... Further arguments passed to or from other methods (not currently #' used). #' @param formula A model formula. No in-line functions should be used here #' (e.g. \code{log(x)}, \code{x:y}, etc.). These types of transformations #' should be enacted using \code{step} functions in this package. Dots are #' allowed as are simple multivariate outcome terms (i.e. no need for #' \code{cbind}; see Examples). #' @param x,data A data frame or tibble of the \emph{template} data set #' (see below). #' @return An object of class \code{recipe} with sub-objects: #' \item{var_info}{A tibble containing information about the original data #' set columns} #' \item{term_info}{A tibble that contains the current set of terms in the #' data set. This initially defaults to the same data contained in #' \code{var_info}.} #' \item{steps}{A list of \code{step} objects that define the sequence of #' preprocessing steps that will be applied to data. The default value is #' \code{NULL}} #' \item{template}{A tibble of the data. This is initialized to be the same #' as the data given in the \code{data} argument but can be different after #' the recipe is trained.} #' #' @details Recipes are alternative methods for creating design matrices and #' for preprocessing data. #' #' Variables in recipes can have any type of \emph{role} in subsequent analyses #' such as: outcome, predictor, case weights, stratification variables, etc. #' #' \code{recipe} objects can be created in several ways. If the analysis only #' contains outcomes and predictors, the simplest way to create one is to use #' a simple formula (e.g. \code{y ~ x1 + x2}) that does not contain inline #' functions such as \code{log(x3)}. An example is given below. 
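#' A minimal sketch of that usage (see the Examples section for a full
#'   workflow):
#' \preformatted{
#' recipe(HHV ~ carbon + hydrogen, data = biomass_tr)
#' }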
#' #' Alternatively, a \code{recipe} object can be created by first specifying #' which variables in a data set should be used and then sequentially #' defining their roles (see the last example). #' #' Steps to the recipe can be added sequentially. Steps can include common #' operations like logging a variable, creating dummy variables or #' interactions and so on. More computationally complex actions such as #' dimension reduction or imputation can also be specified. #' #' Once a recipe has been defined, the \code{\link{prep}} function can be #' used to estimate quants required in the steps from a data set (a.k.a. the #' training data). \code{\link{prep}} returns another recipe. #' #' To apply the recipe to a data set, the \code{\link{bake}} function is #' used in the same manner as \code{predict} would be for models. This #' applies the steps to any data set. #' #' Note that the data passed to \code{recipe} need not be the complete data #' that will be used to train the steps (by \code{\link{prep}}). The recipe #' only needs to know the names and types of data that will be used. For #' large data sets, \code{head} could be used to pass the recipe a smaller #' data set to save time and memory. #' #' @export #' @importFrom tibble as_tibble is_tibble tibble #' @importFrom dplyr full_join #' @importFrom stats predict #' @examples #' #' ############################################### #' # simple example: #' data(biomass) #' #' # split data #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' # When only predictors and outcomes, a simplified formula can be used. #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' #' # Now add preprocessing steps to the recipe. #' #' sp_signed <- rec %>% #' step_center(all_predictors()) %>% #' step_scale(all_predictors()) %>% #' step_spatialsign(all_predictors()) #' sp_signed #' #' # now estimate required parameters #' sp_signed_trained <- prep(sp_signed, training = biomass_tr) #' sp_signed_trained #' #' # apply the preprocessing to a data set #' test_set_values <- bake(sp_signed_trained, newdata = biomass_te) #' #' # or use pipes for the entire workflow: #' rec <- biomass_tr %>% #' recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur) %>% #' step_center(all_predictors()) %>% #' step_scale(all_predictors()) %>% #' step_spatialsign(all_predictors()) #' #' ############################################### #' # multivariate example #' #' # no need for `cbind(carbon, hydrogen)` for left-hand side #' multi_y <- recipe(carbon + hydrogen ~ oxygen + nitrogen + sulfur, #' data = biomass_tr) #' multi_y <- multi_y %>% #' step_center(all_outcomes()) %>% #' step_scale(all_predictors()) #' #' multi_y_trained <- prep(multi_y, training = biomass_tr) #' #' results <- bake(multi_y_trained, biomass_te) #' #' ############################################### #' # Creating a recipe manually with different roles #' #' rec <- recipe(biomass_tr) %>% #' add_role(carbon, hydrogen, oxygen, nitrogen, sulfur, #' new_role = "predictor") %>% #' add_role(HHV, new_role = "outcome") %>% #' add_role(sample, new_role = "id variable") %>% #' add_role(dataset, new_role = "splitting indicator") #' rec recipe.data.frame <- function(x, formula = NULL, ..., vars = NULL, roles = NULL) { if (!is.null(formula)) { if (!is.null(vars)) stop("This `vars` specification will be ignored when a formula is ", "used", call. 
= FALSE)
    if (!is.null(roles))
      stop("This `roles` specification will be ignored when a formula is ",
           "used", call. = FALSE)

    obj <- recipe.formula(formula, x, ...)
    return(obj)
  }

  if (is.null(vars))
    vars <- colnames(x)
  if (!is_tibble(x))
    x <- as_tibble(x)
  if (any(table(vars) > 1))
    stop("`vars` should have unique members", call. = FALSE)
  if (any(!(vars %in% colnames(x))))
    stop("1+ elements of `vars` are not in `x`", call. = FALSE)

  x <- x[, vars]

  var_info <- tibble(variable = vars)

  ## Check and add roles when available
  if (!is.null(roles)) {
    if (length(roles) != length(vars))
      stop("The number of roles should be the same as the number of ",
           "variables", call. = FALSE)
    var_info$role <- roles
  } else
    var_info$role <- NA

  ## Add types
  var_info <- full_join(get_types(x), var_info, by = "variable")
  var_info$source <- "original"

  ## Return final object of class `recipe`
  out <- list(
    var_info = var_info,
    term_info = var_info,
    steps = NULL,
    template = x,
    levels = NULL,
    retained = NA
  )
  class(out) <- "recipe"
  out
}

#' @rdname recipe
#' @export
recipe.formula <- function(formula, data, ...) {
  args <- form2args(formula, data, ...)
  obj <- recipe.data.frame(
    x = args$x,
    formula = NULL,
    ...,
    vars = args$vars,
    roles = args$roles
  )
}

#' @rdname recipe
#' @export
recipe.matrix <- function(x, ...)
  recipe.data.frame(x, ...)

#' @importFrom stats as.formula
#' @importFrom tibble as_tibble is_tibble
form2args <- function(formula, data, ...) {
  if (!is_formula(formula))
    formula <- as.formula(formula)
  ## check for in-line formulas
  check_elements(formula, allowed = NULL)

  if (!is_tibble(data))
    data <- as_tibble(data)

  ## use rlang to get both sides of the formula
  outcomes <- get_lhs_vars(formula, data)
  predictors <- get_rhs_vars(formula, data)

  ## get `vars` from lhs and rhs of formula
  vars <- c(predictors, outcomes)

  ## subset data columns
  data <- data[, vars]

  ## derive roles
  roles <- rep("predictor", length(predictors))
  if (length(outcomes) > 0)
    roles <- c(roles, rep("outcome", length(outcomes)))

  ## pass to recipe.default with vars and roles
  list(x = data, vars = vars, roles = roles)
}

#' @aliases prep prep.recipe
#' @param x an object
#' @param ... further arguments passed to or from other methods (not currently
#'   used).
#' @author Max Kuhn
#' @keywords datagen
#' @concept preprocessing model_specification
#' @export
prep <- function(x, ...)
  UseMethod("prep")

#' Train a Data Recipe
#'
#' For a recipe with at least one preprocessing step, estimate the required
#'   parameters from a training set that can be later applied to other data
#'   sets.
#' @param training A data frame or tibble that will be used to estimate
#'   parameters for preprocessing.
#' @param fresh A logical indicating whether already trained steps should be
#'   re-trained. If \code{TRUE}, you should pass in a data set to the argument
#'   \code{training}.
#' @param verbose A logical that controls whether progress is reported as
#'   steps are executed.
#' @param retain A logical: should the \emph{preprocessed} training set be
#'   saved into the \code{template} slot of the recipe after training? This
#'   is a good idea if you want to add more steps later but want to avoid
#'   re-training the existing steps.
#' @param stringsAsFactors A logical: should character columns be converted to
#'   factors? This affects the preprocessed training set (when
#'   \code{retain = TRUE}) as well as the results of \code{bake.recipe}.
#' @return A recipe whose step objects have been updated with the required
#'   quantities (e.g.
parameter estimates, model objects, etc). Also, the #' \code{term_info} object is likely to be modified as the steps are #' executed. #' @details Given a data set, this function estimates the required quantities #' and statistics required by any steps. #' #' \code{\link{prep}} returns an updated recipe with the estimates. #' #' Note that missing data handling is handled in the steps; there is no global #' \code{na.rm} option at the recipe-level or in \code{\link{prep}}. #' #' Also, if a recipe has been trained using \code{\link{prep}} and then steps #' are added, \code{\link{prep}} will only update the new steps. If #' \code{fresh = TRUE}, all of the steps will be (re)estimated. #' #' As the steps are executed, the \code{training} set is updated. For example, #' if the first step is to center the data and the second is to scale the #' data, the step for scaling is given the centered data. #' #' @rdname prep #' @importFrom tibble as_tibble is_tibble tibble #' @export prep.recipe <- function(x, training = NULL, fresh = FALSE, verbose = TRUE, retain = FALSE, stringsAsFactors = TRUE, ...) { if (is.null(training)) { if (fresh) stop("A training set must be supplied to the `training` argument ", "when `fresh = TRUE`", call. = FALSE) training <- x$template tr_data <- train_info(training) } else { training <- if (!is_tibble(training)) as_tibble(training[, x$var_info$variable, drop = FALSE]) else training[, x$var_info$variable] } tr_data <- train_info(training) if (stringsAsFactors) { lvls <- lapply(training, get_levels) training <- strings2factors(training, lvls) } else lvls <- NULL for (i in seq(along = x$steps)) { note <- paste("step", i, gsub("^step_", "", class(x$steps[[i]])[1])) if (!x$steps[[i]]$trained | fresh) { if (verbose) cat(note, "training", "\n") # Compute anything needed for the preprocessing steps # then apply it to the current training set x$steps[[i]] <- prep(x$steps[[i]], training = training, info = x$term_info) training <- bake(x$steps[[i]], newdata = training) x$term_info <- merge_term_info(get_types(training), x$term_info) ## Update the roles and the term source ## These next two steps needs to be smarter to find diffs if (!is.na(x$steps[[i]]$role)) x$term_info$role[is.na(x$term_info$role)] <- x$steps[[i]]$role x$term_info$source[is.na(x$term_info$source)] <- "derived" } else { if (verbose) cat(note, "[pre-trained]\n") } } ## The steps may have changed the data so reassess the levels if (stringsAsFactors) { lvls <- lapply(training, get_levels) check_lvls <- has_lvls(lvls) if (!any(check_lvls)) lvls <- NULL } else lvls <- NULL if (retain) x$template <- training x$tr_info <- tr_data x$levels <- lvls x$retained <- retain x } #' @rdname bake #' @aliases bake bake.recipe #' @author Max Kuhn #' @keywords datagen #' @concept preprocessing model_specification #' @export bake <- function(object, ...) UseMethod("bake") #' Apply a Trained Data Recipe #' #' For a recipe with at least one preprocessing step that has been trained by #' \code{\link{prep.recipe}}, apply the computations to new data. #' @param object A trained object such as a \code{\link{recipe}} with at least #' one preprocessing step. #' @param newdata A data frame or tibble for whom the preprocessing will be #' applied. #' @param ... One or more selector functions to choose which variables will be #' returned by the function. See \code{\link{selections}} for more details. #' If no selectors are given, the default is to use #' \code{\link{all_predictors}}. 
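#'   For example, \code{bake(object, newdata, all_numeric())} would return
#'   only the numeric columns of the processed data, while
#'   \code{everything()} returns all of them.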
#' @return A tibble that may have different columns than the original columns #' in \code{newdata}. #' @details \code{\link{bake}} takes a trained recipe and applies the #' operations to a data set to create a design matrix. #' #' If the original data used to train the data are to be processed, time can be #' saved by using the \code{retain = TRUE} option of \code{\link{prep}} to #' avoid duplicating the same operations. #' #' A tibble is always returned but can be easily converted to a data frame or #' matrix as needed. #' @rdname bake #' @importFrom tibble as_tibble #' @importFrom dplyr filter #' @export bake.recipe <- function(object, newdata = object$template, ...) { if (!is_tibble(newdata)) newdata <- as_tibble(newdata) terms <- quos(...) if (is_empty(terms)) terms <- quos(all_predictors()) ## determine return variables keepers <- terms_select(terms = terms, info = object$term_info) for (i in seq(along = object$steps)) { newdata <- bake(object$steps[[i]], newdata = newdata) if (!is_tibble(newdata)) newdata <- as_tibble(newdata) } newdata <- newdata[, names(newdata) %in% keepers] ## the Levels are not null when no nominal data are present or ## if stringsAsFactors = FALSE in `prep` if (!is.null(object$levels)) { var_levels <- object$levels var_levels <- var_levels[keepers] check_values <- vapply(var_levels, function(x) (!all(is.na(x))), c(all = TRUE)) var_levels <- var_levels[check_values] if (length(var_levels) > 0) newdata <- strings2factors(newdata, var_levels) } newdata } #' Print a Recipe #' #' @aliases print.recipe #' @param x A \code{recipe} object #' @param form_width The number of characters used to print the variables or #' terms in a formula #' @param ... further arguments passed to or from other methods (not currently #' used). #' @return The original object (invisibly) #' #' @author Max Kuhn #' @export print.recipe <- function(x, form_width = 30, ...) { cat("Data Recipe\n\n") cat("Inputs:\n\n") no_role <- is.na(x$var_info$role) if (any(!no_role)) { tab <- as.data.frame(table(x$var_info$role)) colnames(tab) <- c("role", "#variables") print(tab, row.names = FALSE) if (any(no_role)) { cat("\n ", sum(no_role), "variables without declared roles\n") } } else { cat(" ", nrow(x$var_info), "variables (no declared roles)\n") } if ("tr_info" %in% names(x)) { nmiss <- x$tr_info$nrows - x$tr_info$ncomplete cat("\nTraining data contained ", x$tr_info$nrows, " data points and ", sep = "") if (x$tr_info$nrows == x$tr_info$ncomplete) cat("no missing data.\n") else cat(nmiss, "incomplete", ifelse(nmiss > 1, "rows.", "row."), "\n") } if (!is.null(x$steps)) { cat("\nSteps:\n\n") for (i in seq_along(x$steps)) print(x$steps[[i]], form_width = form_width) } invisible(x) } #' Summarize a Recipe #' #' This function prints the current set of variables/features and some of their #' characteristics. #' @aliases summary.recipe #' @param object A \code{recipe} object #' @param original A logical: show the current set of variables or the original #' set when the recipe was defined. #' @param ... further arguments passed to or from other methods (not currently #' used). #' @return A tibble with columns \code{variable}, \code{type}, \code{role}, #' and \code{source}. #' @details Note that, until the recipe has been trained, the currrent and #' original variables are the same. 
#' @examples
#' rec <- recipe( ~ ., data = USArrests)
#' summary(rec)
#' rec <- step_pca(rec, all_numeric(), num = 3)
#' summary(rec) # still the same since not yet trained
#' rec <- prep(rec, training = USArrests)
#' summary(rec)
#' @export
#' @seealso \code{\link{recipe}} \code{\link{prep.recipe}}
summary.recipe <- function(object, original = FALSE, ...) {
  if (original)
    object$var_info
  else
    object$term_info
}

#' Extract Finalized Training Set
#'
#' As steps are estimated by \code{prep}, these operations are
#'   applied to the training set. Rather than running \code{bake}
#'   to duplicate this processing, this function will return
#'   variables from the processed training set.
#' @param object A \code{recipe} object that has been prepared
#'   with the option \code{retain = TRUE}.
#' @param ... One or more selector functions to choose which variables will
#'   be returned by the function. See \code{\link{selections}} for more
#'   details. If no selectors are given, the default is to use
#'   \code{\link{all_predictors}}.
#' @return A tibble.
#' @details When preparing a recipe, if the training data set is retained
#'   using \code{retain = TRUE}, there is no need to \code{bake} the recipe
#'   to get the preprocessed training set.
#' @examples
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'               data = biomass_tr)
#'
#' sp_signed <- rec %>%
#'   step_center(all_predictors()) %>%
#'   step_scale(all_predictors()) %>%
#'   step_spatialsign(all_predictors())
#'
#' sp_signed_trained <- prep(sp_signed, training = biomass_tr, retain = TRUE)
#'
#' tr_values <- bake(sp_signed_trained, newdata = biomass_tr, all_predictors())
#' og_values <- juice(sp_signed_trained, all_predictors())
#'
#' all.equal(tr_values, og_values)
#' @export
#' @seealso \code{\link{recipe}} \code{\link{prep.recipe}}
#'   \code{\link{bake.recipe}}
juice <- function(object, ...) {
  if (!isTRUE(object$retained))
    stop("Use `retain = TRUE` in `prep` to be able to extract the training set",
         call. = FALSE)
  tr_steps <- vapply(object$steps, function(x) x$trained, c(logic = TRUE))
  if (!all(tr_steps))
    stop("At least one step has not been prepared; cannot extract.",
         call. = FALSE)

  terms <- quos(...)
  if (is_empty(terms))
    terms <- quos(all_predictors())
  keepers <- terms_select(terms = terms, info = object$term_info)

  newdata <- object$template[, names(object$template) %in% keepers]

  ## Since most models require factors, do the conversion from character
  if (!is.null(object$levels)) {
    var_levels <- object$levels
    var_levels <- var_levels[keepers]
    check_values <-
      vapply(var_levels, function(x) (!all(is.na(x))), c(all = TRUE))
    var_levels <- var_levels[check_values]
    if (length(var_levels) > 0)
      newdata <- strings2factors(newdata, var_levels)
  }

  newdata
}
recipes/R/kpca.R
#' Kernel PCA Signal Extraction
#'
#' \code{step_kpca} creates a \emph{specification} of a recipe step that will
#'   convert numeric data into one or more principal components using a
#'   kernel basis expansion.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables will
#'   be used to compute the components. See \code{\link{selections}} for more
#'   details.
#' @param role For model terms created by this step, what analysis role should
#'   they be assigned?
By default, the function assumes that the new principal #' component columns created by the original variables will be used as #' predictors in a model. #' @param num The number of PCA components to retain as new predictors. If #' \code{num} is greater than the number of columns or the number of possible #' components, a smaller value will be used. #' @param options A list of options to \code{\link[kernlab]{kpca}}. Defaults #' are set for the arguments \code{kernel} and \code{kpar} but others can be #' passed in. \bold{Note} that the arguments \code{x} and \code{features} #' should not be passed here (or at all). #' @param res An S4 \code{\link[kernlab]{kpca}} object is stored here once this #' preprocessing step has be trained by \code{\link{prep.recipe}}. #' @param prefix A character string that will be the prefix to the resulting #' new variables. See notes below. #' @keywords datagen #' @concept preprocessing pca projection_methods kernel_methods #' @export #' @details Kernel principal component analysis (kPCA) is an extension a PCA #' analysis that conducts the calculations in a broader dimensionality #' defined by a kernel function. For example, if a quadratic kernel function #' were used, each variable would be represented by its original values as #' well as its square. This nonlinear mapping is used during the PCA #' analysis and can potentially help find better representations of the #' original data. #' #' As with ordinary PCA, it is important to standardized the variables prior #' to running PCA (\code{step_center} and \code{step_scale} can be used for #' this purpose). #' #' When performing kPCA, the kernel function (and any important kernel #' parameters) must be chosen. The \pkg{kernlab} package is used and the #' reference below discusses the types of kernels available and their #' parameter(s). These specifications can be made in the \code{kernel} and #' \code{kpar} slots of the \code{options} argument to \code{step_kpca}. #' #' The argument \code{num} controls the number of components that will be #' retained (the original variables that are used to derive the components #' are removed from the data). The new components will have names that begin #' with \code{prefix} and a sequence of numbers. The variable names are #' padded with zeros. For example, if \code{num < 10}, their names will be #' \code{kPC1} - \code{kPC9}. If \code{num = 101}, the names would be #' \code{kPC001} - \code{kPC101}. #' #' @references Scholkopf, B., Smola, A., and Muller, K. (1997). Kernel #' principal component analysis. \emph{Lecture Notes in Computer Science}, #' 1327, 583-588. #' #' Karatzoglou, K., Smola, A., Hornik, K., and Zeileis, A. (2004). kernlab - #' An S4 package for kernel methods in R. \emph{Journal of Statistical #' Software}, 11(1), 1-20. 
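#' As an illustration, a different kernel can be requested through the
#'   \code{options} argument; kernel names and parameters here follow
#'   \pkg{kernlab}, e.g. a degree-2 polynomial kernel:
#' \preformatted{
#' step_kpca(all_predictors(),
#'           options = list(kernel = "polydot",
#'                          kpar = list(degree = 2)))
#' }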
#' #' @examples #' data(biomass) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' #' kpca_trans <- rec %>% #' step_YeoJohnson(all_predictors()) %>% #' step_center(all_predictors()) %>% #' step_scale(all_predictors()) %>% #' step_kpca(all_predictors()) #' #' kpca_estimates <- prep(kpca_trans, training = biomass_tr) #' #' kpca_te <- bake(kpca_estimates, biomass_te) #' #' rng <- extendrange(c(kpca_te$kPC1, kpca_te$kPC2)) #' plot(kpca_te$kPC1, kpca_te$kPC2, #' xlim = rng, ylim = rng) #' @seealso \code{\link{step_pca}} \code{\link{step_ica}} #' \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}} #' \code{\link{bake.recipe}} #' step_kpca <- function(recipe, ..., role = "predictor", trained = FALSE, num = 5, res = NULL, options = list(kernel = "rbfdot", kpar = list(sigma = 0.2)), prefix = "kPC") { add_step( recipe, step_kpca_new( terms = check_ellipses(...), role = role, trained = trained, num = num, res = res, options = options, prefix = prefix ) ) } step_kpca_new <- function(terms = NULL, role = "predictor", trained = FALSE, num = NULL, res = NULL, options = NULL, prefix = "kPC") { step( subclass = "kpca", terms = terms, role = role, trained = trained, num = num, res = res, options = options, prefix = prefix ) } #' @importFrom dimRed kPCA dimRedData #' @export prep.step_kpca <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) kprc <- kPCA(stdpars = c(list(ndim = x$num), x$options)) kprc <- kprc@fun( dimRedData(as.data.frame(training[, col_names, drop = FALSE])), kprc@stdpars ) step_kpca_new( terms = x$terms, role = x$role, trained = TRUE, num = x$num, options = x$options, res = kprc, prefix = x$prefix ) } #' @export bake.step_kpca <- function(object, newdata, ...) { pca_vars <- colnames(environment(object$res@apply)$indata) comps <- object$res@apply( dimRedData(as.data.frame(newdata[, pca_vars, drop = FALSE])) )@data comps <- comps[, 1:object$num, drop = FALSE] colnames(comps) <- names0(ncol(comps), object$prefix) newdata <- cbind(newdata, as_tibble(comps)) newdata <- newdata[, !(colnames(newdata) %in% pca_vars), drop = FALSE] as_tibble(newdata) } print.step_kpca <- function(x, width = max(20, options()$width - 40), ...) { if(x$trained) { cat("Kernel PCA (", x$res@pars$kernel, ") extraction with ", sep = "") cat(format_ch_vec(colnames(x$res@org.data), width = width)) } else { cat("Kernel PCA extraction with ", sep = "") cat(format_selectors(x$terms, wdth = width)) } if(x$trained) cat(" [trained]\n") else cat("\n") invisible(x) } recipes/R/invlogit.R0000644000177700017770000000536213135741217015427 0ustar herbrandtherbrandt#' Inverse Logit Transformation #' #' \code{step_invlogit} creates a \emph{specification} of a recipe step that #' will transform the data from real values to be between zero and one. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param columns A character string of variable names that will be (eventually) #' populated by the \code{terms} argument. #' @keywords datagen #' @concept preprocessing transformation_methods #' @export #' @details The inverse logit transformation takes values on the real line and #' translates them to be between zero and one using the function #' \code{f(x) = 1/(1+exp(-x))}. 
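#'
#' For instance, \code{f(0) = 0.5}, and inputs outside of roughly +/- 5 are
#' mapped very close to zero or one. A quick sketch (not run) using the same
#' link function that this step calls internally:
#' \preformatted{
#' binomial()$linkinv(c(-5, 0, 5))
#' # 0.0067 0.5000 0.9933
#' }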
#' @examples #' data(biomass) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' #' ilogit_trans <- rec %>% #' step_center(carbon, hydrogen) %>% #' step_scale(carbon, hydrogen) %>% #' step_invlogit(carbon, hydrogen) #' #' ilogit_obj <- prep(ilogit_trans, training = biomass_tr) #' #' transformed_te <- bake(ilogit_obj, biomass_te) #' plot(biomass_te$carbon, transformed_te$carbon) #' @seealso \code{\link{step_logit}} \code{\link{step_log}} #' \code{\link{step_sqrt}} \code{\link{step_hyperbolic}} #' \code{\link{recipe}} \code{\link{prep.recipe}} #' \code{\link{bake.recipe}} step_invlogit <- function(recipe, ..., role = NA, trained = FALSE, columns = NULL) { add_step(recipe, step_invlogit_new( terms = check_ellipses(...), role = role, trained = trained, columns = columns )) } step_invlogit_new <- function(terms = NULL, role = NA, trained = FALSE, columns = NULL) { step( subclass = "invlogit", terms = terms, role = role, trained = trained, columns = columns ) } #' @export prep.step_invlogit <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) step_invlogit_new( terms = x$terms, role = x$role, trained = TRUE, columns = col_names ) } #' @importFrom tibble as_tibble #' @importFrom stats binomial #' @export bake.step_invlogit <- function(object, newdata, ...) { for (i in seq_along(object$columns)) newdata[, object$columns[i]] <- binomial()$linkinv(unlist(getElement(newdata, object$columns[i]), use.names = FALSE)) as_tibble(newdata) } print.step_invlogit <- function(x, width = max(20, options()$width - 26), ...) { cat("Inverse logit on ", sep = "") printer(x$columns, x$terms, x$trained, width = width) invisible(x) } recipes/R/center.R0000644000177700017770000000654213135741217015055 0ustar herbrandtherbrandt#' Centering Numeric Data #' #' \code{step_center} creates a \emph{specification} of a recipe step that #' will normalize numeric data to have a mean of zero. #' #' @param recipe A recipe object. The step will be added to the sequence of #' operations for this recipe. #' @param ... One or more selector functions to choose which variables are #' affected by the step. See \code{\link{selections}} for more details. #' @param role Not used by this step since no new variables are created. #' @param trained A logical to indicate if the quantities for preprocessing #' have been estimated. #' @param means A named numeric vector of means. This is \code{NULL} until #' computed by \code{\link{prep.recipe}}. #' @param na.rm A logical value indicating whether \code{NA} values should be #' removed when averaging. #' @return An updated version of \code{recipe} with the #' new step added to the sequence of existing steps (if any). #' @keywords datagen #' @concept preprocessing normalization_methods #' @export #' @details Centering data means that the average of a variable is subtracted #' from the data. \code{step_center} estimates the variable means from the #' data used in the \code{training} argument of \code{prep.recipe}. #' \code{bake.recipe} then applies the centering to new data sets using #' these means. 
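#'
#' In other words, for each selected column the trained step stores the
#' training set mean and \code{bake} subtracts it from new data. A minimal
#' sketch (not run; \code{x_tr} and \code{x_te} are hypothetical vectors):
#' \preformatted{
#' x_tr <- c(1, 3, 5)   # training values; the stored mean is 3
#' x_te <- c(2, 4)      # new data
#' x_te - mean(x_tr)    # -1  1
#' }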
#' #' @examples #' data(biomass) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' #' center_trans <- rec %>% #' step_center(carbon, contains("gen"), -hydrogen) #' #' center_obj <- prep(center_trans, training = biomass_tr) #' #' transformed_te <- bake(center_obj, biomass_te) #' #' biomass_te[1:10, names(transformed_te)] #' transformed_te #' @seealso \code{\link{recipe}} \code{\link{prep.recipe}} #' \code{\link{bake.recipe}} step_center <- function(recipe, ..., role = NA, trained = FALSE, means = NULL, na.rm = TRUE) { add_step( recipe, step_center_new( terms = check_ellipses(...), trained = trained, role = role, means = means, na.rm = na.rm ) ) } ## Initializes a new object step_center_new <- function(terms = NULL, role = NA, trained = FALSE, means = NULL, na.rm = NULL) { step( subclass = "center", terms = terms, role = role, trained = trained, means = means, na.rm = na.rm ) } prep.step_center <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) means <- vapply(training[, col_names], mean, c(mean = 0), na.rm = x$na.rm) step_center_new( terms = x$terms, role = x$role, trained = TRUE, means = means, na.rm = x$na.rm ) } bake.step_center <- function(object, newdata, ...) { res <- sweep(as.matrix(newdata[, names(object$means)]), 2, object$means, "-") if (is.matrix(res) && ncol(res) == 1) res <- res[, 1] newdata[, names(object$means)] <- res as_tibble(newdata) } print.step_center <- function(x, width = max(20, options()$width - 30), ...) { cat("Centering for ", sep = "") printer(names(x$means), x$terms, x$trained, width = width) invisible(x) } recipes/R/spatialsign.R0000644000177700017770000000616113135741217016110 0ustar herbrandtherbrandt#' Spatial Sign Preprocessing #' #' \code{step_spatialsign} is a \emph{specification} of a recipe step that #' will convert numeric data into a projection on to a unit sphere. #' #' @inheritParams step_center #' @inherit step_center return #' @param ... One or more selector functions to choose which variables will be #' used for the normalization. See \code{\link{selections}} for more details. #' @param role For model terms created by this step, what analysis role should #' they be assigned? #' @param columns A character string of variable names that will be (eventually) #' populated by the \code{terms} argument. #' @keywords datagen #' @concept preprocessing projection_methods #' @export #' @details The spatial sign transformation projects the variables onto a unit #' sphere and is related to global contrast normalization. The spatial sign #' of a vector \code{w} is \code{w/norm(w)}. #' #' The variables should be centered and scaled prior to the computations. #' @references Serneels, S., De Nolf, E., and Van Espen, P. (2006). Spatial #' sign preprocessing: a simple way to impart moderate robustness to #' multivariate estimators. \emph{Journal of Chemical Information and #' Modeling}, 46(3), 1402-1409. 
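#'
#' As a small numeric sketch (not run), the spatial sign of the row
#' \code{c(3, 4)} divides by its Euclidean norm, 5:
#' \preformatted{
#' w <- c(3, 4)
#' w / sqrt(sum(w ^ 2))   # 0.6 0.8
#' }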
#' @examples
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'               data = biomass_tr)
#'
#' ss_trans <- rec %>%
#'   step_center(carbon, hydrogen) %>%
#'   step_scale(carbon, hydrogen) %>%
#'   step_spatialsign(carbon, hydrogen)
#'
#' ss_obj <- prep(ss_trans, training = biomass_tr)
#'
#' transformed_te <- bake(ss_obj, biomass_te)
#'
#' plot(biomass_te$carbon, biomass_te$hydrogen)
#'
#' plot(transformed_te$carbon, transformed_te$hydrogen)
step_spatialsign <- function(recipe, ..., role = "predictor",
                             trained = FALSE, columns = NULL) {
  add_step(recipe,
           step_spatialsign_new(
             terms = check_ellipses(...),
             role = role,
             trained = trained,
             columns = columns
           ))
}

step_spatialsign_new <- function(terms = NULL, role = "predictor",
                                 trained = FALSE, columns = NULL) {
  step(
    subclass = "spatialsign",
    terms = terms,
    role = role,
    trained = trained,
    columns = columns
  )
}

#' @export
prep.step_spatialsign <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  step_spatialsign_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    columns = col_names
  )
}

#' @export
bake.step_spatialsign <- function(object, newdata, ...) {
  col_names <- object$columns
  ss <- function(x) x / sqrt(sum(x ^ 2))
  newdata[, col_names] <- t(apply(as.matrix(newdata[, col_names]), 1, ss))
  as_tibble(newdata)
}

print.step_spatialsign <-
  function(x, width = max(20, options()$width - 26), ...) {
    cat("Spatial sign on ", sep = "")
    printer(x$columns, x$terms, x$trained, width = width)
    invisible(x)
  }
recipes/R/ica.R0000644000177700017770000001264513135741217014332 0ustar herbrandtherbrandt#' ICA Signal Extraction
#'
#' \code{step_ica} creates a \emph{specification} of a recipe step that will
#' convert numeric data into one or more independent components.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables will be
#' used to compute the components. See \code{\link{selections}} for more
#' details.
#' @param role For model terms created by this step, what analysis role should
#' they be assigned? By default, the function assumes that the new
#' independent component columns created by the original variables will be
#' used as predictors in a model.
#' @param num The number of ICA components to retain as new predictors. If
#' \code{num} is greater than the number of columns or the number of possible
#' components, a smaller value will be used.
#' @param options A list of options to \code{\link[fastICA]{fastICA}}. No
#' defaults are set here. \bold{Note} that the arguments \code{X} and
#' \code{n.comp} should not be passed here.
#' @param res The \code{\link[fastICA]{fastICA}} object is stored here once
#' this preprocessing step has been trained by \code{\link{prep.recipe}}.
#' @param prefix A character string that will be the prefix to the resulting
#' new variables. See notes below.
#' @keywords datagen
#' @concept preprocessing ica projection_methods
#' @export
#' @details Independent component analysis (ICA) is a transformation of a
#' group of variables that produces a new set of artificial features or
#' components. ICA assumes that the variables are mixtures of a set of
#' distinct, non-Gaussian signals and attempts to transform the data to
#' isolate these signals. Unlike PCA, whose components are merely
#' uncorrelated, the components are statistically independent of one another.
#' This means that they can be used to combat
#' large inter-variable correlations in a data set. Also, as with PCA, it is
#' advisable to center and scale the variables prior to running ICA.
#'
#' This package produces components using the "FastICA" methodology (see
#' reference below).
#'
#' The argument \code{num} controls the number of components that will be
#' retained (the original variables that are used to derive the components
#' are removed from the data). The new components will have names that begin
#' with \code{prefix} and a sequence of numbers. The variable names are
#' padded with zeros. For example, if \code{num < 10}, their names will be
#' \code{IC1} - \code{IC9}. If \code{num = 101}, the names would be
#' \code{IC001} - \code{IC101}.
#'
#' @references Hyvarinen, A., and Oja, E. (2000). Independent component
#' analysis: algorithms and applications. \emph{Neural Networks}, 13(4-5),
#' 411-430.
#'
#' @examples
#' # from fastICA::fastICA
#' set.seed(131)
#' S <- matrix(runif(400), 200, 2)
#' A <- matrix(c(1, 1, -1, 3), 2, 2, byrow = TRUE)
#' X <- as.data.frame(S %*% A)
#'
#' tr <- X[1:100, ]
#' te <- X[101:200, ]
#'
#' rec <- recipe( ~ ., data = tr)
#'
#' ica_trans <- step_center(rec, V1, V2)
#' ica_trans <- step_scale(ica_trans, V1, V2)
#' ica_trans <- step_ica(ica_trans, V1, V2, num = 2)
#' ica_estimates <- prep(ica_trans, training = tr)
#' ica_data <- bake(ica_estimates, te)
#'
#' plot(te$V1, te$V2)
#' plot(ica_data$IC1, ica_data$IC2)
#' @seealso \code{\link{step_pca}} \code{\link{step_kpca}}
#'   \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}}
#'   \code{\link{bake.recipe}}
step_ica <- function(recipe,
                     ...,
                     role = "predictor",
                     trained = FALSE,
                     num = 5,
                     options = list(),
                     res = NULL,
                     prefix = "IC") {
  add_step(
    recipe,
    step_ica_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      num = num,
      options = options,
      res = res,
      prefix = prefix
    )
  )
}

step_ica_new <- function(terms = NULL,
                         role = "predictor",
                         trained = FALSE,
                         num = NULL,
                         options = NULL,
                         res = NULL,
                         prefix = "IC") {
  step(
    subclass = "ica",
    terms = terms,
    role = role,
    trained = trained,
    num = num,
    options = options,
    res = res,
    prefix = prefix
  )
}

#' @importFrom dimRed FastICA dimRedData
#' @export
prep.step_ica <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)

  x$num <- min(x$num, length(col_names))
  indc <- FastICA(stdpars = x$options)
  indc <- indc@fun(dimRedData(as.data.frame(training[, col_names, drop = FALSE])),
                   list(ndim = x$num))

  step_ica_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    num = x$num,
    options = x$options,
    res = indc,
    prefix = x$prefix
  )
}

#' @export
bake.step_ica <- function(object, newdata, ...) {
  ica_vars <- colnames(environment(object$res@apply)$indata)
  comps <- object$res@apply(
    dimRedData(
      as.data.frame(newdata[, ica_vars, drop = FALSE])
    )
  )@data
  comps <- comps[, 1:object$num, drop = FALSE]
  colnames(comps) <- names0(ncol(comps), object$prefix)
  newdata <- cbind(newdata, as_tibble(comps))
  newdata <- newdata[, !(colnames(newdata) %in% ica_vars), drop = FALSE]
  as_tibble(newdata)
}

print.step_ica <- function(x, width = max(20, options()$width - 29), ...)
{ cat("ICA extraction with ") printer(colnames(x$res@org.data), x$terms, x$trained, width = width) invisible(x) } recipes/R/interactions.R0000644000177700017770000001540013135741217016270 0ustar herbrandtherbrandt#' Create Interaction Variables #' #' \code{step_interact} creates a \emph{specification} of a recipe step that #' will create new columns that are interaction terms between two or more #' variables. #' #' @inheritParams step_center #' @inherit step_center return #' @param terms A traditional R formula that contains interaction terms. #' @param role For model terms created by this step, what analysis role should #' they be assigned?. By default, the function assumes that the new columns #' created from the original variables will be used as predictors in a model. #' @param objects A list of \code{terms} objects for each individual interation. #' @param sep A character value used to delinate variables in an interaction #' (e.g. \code{var1_x_var2} instead of the more traditional \code{var1:var2}). #' @keywords datagen #' @concept preprocessing model_specification #' @export #' @details \code{step_interact} can create interactions between variables. It #' is primarily intended for \bold{numeric data}; categorical variables #' should probably be converted to dummy variables using #' \code{\link{step_dummy}} prior to being used for interactions. #' #' Unlike other step functions, the \code{terms} argument should be a #' traditional R model formula but should contain no inline functions (e.g. #' \code{log}). For example, for predictors \code{A}, \code{B}, and \code{C}, #' a formula such as \code{~A:B:C} can be used to make a three way #' interaction between the variables. If the formula contains terms other #' than interactions (e.g. \code{(A+B+C)^3}) only the interaction terms are #' retained for the design matrix. #' #' The separator between the variables defaults to "\code{_x_}" so that the #' three way interaction shown previously would generate a column named #' \code{A_x_B_x_C}. This can be changed using the \code{sep} argument. #' @examples #' data(biomass) #' #' biomass_tr <- biomass[biomass$dataset == "Training",] #' biomass_te <- biomass[biomass$dataset == "Testing",] #' #' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, #' data = biomass_tr) #' #' int_mod_1 <- rec %>% #' step_interact(terms = ~ carbon:hydrogen) #' #' int_mod_2 <- int_mod_1 %>% #' step_interact(terms = ~ (oxygen + nitrogen + sulfur)^3) #' #' int_mod_1 <- prep(int_mod_1, training = biomass_tr) #' int_mod_2 <- prep(int_mod_2, training = biomass_tr) #' #' dat_1 <- bake(int_mod_1, biomass_te) #' dat_2 <- bake(int_mod_2, biomass_te) #' #' names(dat_1) #' names(dat_2) step_interact <- function(recipe, terms, role = "predictor", trained = FALSE, objects = NULL, sep = "_x_") { add_step( recipe, step_interact_new( terms = terms, trained = trained, role = role, objects = objects, sep = sep ) ) } ## Initializes a new object step_interact_new <- function(terms = NULL, role = NA, trained = FALSE, objects = NULL, sep = NULL) { step( subclass = "interact", terms = terms, role = role, trained = trained, objects = objects, sep = sep ) } ## The idea is to save a bunch of x-factor interaction terms instead of ## one large set of collected terms. #' @export prep.step_interact <- function(x, training, info = NULL, ...) 
{
  ## First, find the interaction terms based on the given formula
  int_terms <- get_term_names(x$terms, vnames = colnames(training))

  ## Check to see if any variables are non-numeric and issue a warning
  ## if that is the case
  vars <- unique(unlist(lapply(make_new_formula(int_terms), all.vars)))
  var_check <- info[info$variable %in% vars, ]
  if (any(var_check$type == "nominal"))
    warning(
      "Categorical variables used in `step_interact` should probably be ",
      "avoided; this can lead to differences in dummy variable values that ",
      "are produced by `step_dummy`."
    )

  ## For each interaction, create a new formula that has main effects
  ## and only the interaction of choice (e.g. `a+b+c+a:b:c`)
  int_forms <- make_new_formula(int_terms)

  ## Generate a standard R `terms` object from these short formulas and
  ## save to make future interactions
  int_terms <- make_small_terms(int_forms, training)

  step_interact_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    objects = int_terms,
    sep = x$sep
  )
}

#' @export
bake.step_interact <- function(object, newdata, ...) {
  ## `na.action` cannot be passed to `model.matrix` but we
  ## can change it globally for a bit
  old_opt <- options()$na.action
  options(na.action = "na.pass")
  on.exit(options(na.action = old_opt))

  ## Create low level model matrices then remove the non-interaction terms.
  res <- lapply(object$objects, model.matrix, data = newdata)
  options(na.action = old_opt)
  on.exit(expr = NULL)

  res <-
    lapply(res, function(x)
      x[, grepl(":", colnames(x)), drop = FALSE])
  ncols <- vapply(res, ncol, c(int = 1L))
  out <- matrix(NA, nrow = nrow(newdata), ncol = sum(ncols))
  strt <- 1
  for (i in seq_along(ncols)) {
    cols <- (strt):(strt + ncols[i] - 1)
    out[, cols] <- res[[i]]
    strt <- max(cols) + 1
  }
  colnames(out) <- gsub(":", object$sep, unlist(lapply(res, colnames)))
  newdata <- cbind(newdata, as_tibble(out))
  if (!is_tibble(newdata))
    newdata <- as_tibble(newdata)
  newdata
}

## This uses the highest level of interactions
x_fac_int <- function(x)
  as.formula(
    paste0("~", paste0(x, collapse = "+"), "+", paste0(x, collapse = ":"))
  )

make_new_formula <- function(x) {
  splitup <- strsplit(x, ":")
  lapply(splitup, x_fac_int)
}

#' @importFrom stats model.matrix
## Given a standard model formula and some data, get the
## term expansion (without `.`s). This returns the factor
## names and would not expand dummy variables.
get_term_names <- function(form, vnames) {
  ## We are going to cheat and make a small fake data set to
  ## efficiently get the full formula expansion from
  ## model.matrix (devoid of factor levels) and then
  ## pick off the interactions
  dat <- matrix(1, nrow = 5, ncol = length(vnames))
  colnames(dat) <- vnames
  nms <- colnames(model.matrix(form, data = as.data.frame(dat)))
  nms <- nms[nms != "(Intercept)"]
  nms <- grep(":", nms, value = TRUE)
  nms
}

#' @importFrom stats terms
## For a given data set and a list of formulas, generate the
## standard R `terms` objects
make_small_terms <- function(forms, dat) {
  lapply(forms, terms, data = dat)
}

print.step_interact <-
  function(x, width = max(20, options()$width - 27), ...) {
    cat("Interactions with ", sep = "")
    cat(as.character(x$terms)[-1])
    if (x$trained)
      cat(" [trained]\n")
    else
      cat("\n")
    invisible(x)
  }
recipes/R/misc.R0000644000177700017770000002216413135756756014544 0ustar herbrandtherbrandtfilter_terms <- function(x, ...)
UseMethod("filter_terms") ## Buckets variables into discrete, mutally exclusive types #' @importFrom tibble tibble get_types <- function(x) { var_types <- c( character = "nominal", factor = "nominal", ordered = "nominal", integer = "numeric", numeric = "numeric", double = "numeric", Surv = "censored", logical = "logical", Date = "date", POSIXct = "date" ) classes <- lapply(x, class) res <- lapply(classes, function(x, types) { in_types <- x %in% names(types) if (sum(in_types) > 0) { # not sure what to do with multiple matches; right now ## pick the first match which favors "factor" over "ordered" out <- unname(types[min(which(names(types) %in% x))]) } else out <- "other" out }, types = var_types) res <- unlist(res) tibble(variable = names(res), type = unname(res)) } type_by_var <- function(classes, dat) { res <- sapply(dat, is_one_of, what = classes) names(res)[res] } is_one_of <- function(x, what) { res <- sapply(as.list(what), function(class, obj) inherits(obj, what = class), obj = x) any(res) } ## general error trapping functions check_all_outcomes_same_type <- function(x) x ## get variables from formulas is_formula <- function(x) isTRUE(inherits(x, "formula")) #' @importFrom rlang f_lhs get_lhs_vars <- function(formula, data) { if (!is_formula(formula)) formula <- as.formula(formula) ## Want to make sure that multiple outcomes can be expressed as ## additions with no cbind business and that `.` works too (maybe) formula <- as.formula(paste("~", deparse(f_lhs(formula)))) get_rhs_vars(formula, data) } #' @importFrom stats model.frame get_rhs_vars <- function(formula, data) { if (!is_formula(formula)) formula <- as.formula(formula) ## This will need a lot of work to account for cases with `.` ## or embedded functions like `Sepal.Length + poly(Sepal.Width)`. ## or should it? what about Y ~ log(x)? data_info <- attr(model.frame(formula, data), "terms") response_info <- attr(data_info, "response") predictor_names <- names(attr(data_info, "dataClasses")) if (length(response_info) > 0 && all(response_info > 0)) predictor_names <- predictor_names[-response_info] predictor_names } get_lhs_terms <- function(x) x get_rhs_terms <- function(x) x ## ancillary step functions #' Add a New Step to Current Recipe #' #' \code{add_step} adds a step to the last location in the recipe. #' #' @param rec A \code{\link{recipe}}. #' @param object A step object. #' @keywords datagen #' @concept preprocessing #' @return A updated \code{\link{recipe}} with the new step in the last slot. #' @export add_step <- function(rec, object) { rec$steps[[length(rec$steps) + 1]] <- object rec } var_by_role <- function(rec, role = "predictor", returnform = TRUE) { res <- rec$var_info$variable[rec$var_info$role == role] if (returnform) res <- as.formula(paste("~", paste(res, collapse = "+"))) res } ## Overall wrapper to make new step_X objects #' A General Step Wrapper #' #' \code{step} sets the class of the step. #' #' @param subclass A character string for the resulting class. For example, #' if \code{subclass = "blah"} the step object that is returned has class #' \code{step_blah}. #' @param ... All arguments to the step that should be returned. #' @keywords datagen #' @concept preprocessing #' @return A updated step with the new class. #' @export step <- function(subclass, ...) 
{
  structure(list(...),
            class = c(paste0("step_", subclass), "step"))
}

## then 9 is to keep space for "[trained]"
format_ch_vec <-
  function(x, sep = ", ", width = options()$width - 9) {
    widths <- nchar(x)
    sep_wd <- nchar(sep)
    adj_wd <- widths + sep_wd
    if (sum(adj_wd) >= width) {
      keepers <- max(which(cumsum(adj_wd) < width)) - 1
      if (length(keepers) == 0 || keepers < 1) {
        x <- paste(length(x), "items")
      } else {
        x <- c(x[1:keepers], "...")
      }
    }
    paste0(x, collapse = sep)
  }

format_selectors <- function(x, wdth = options()$width - 9, ...) {
  ## convert to character without the leading ~
  x_items <- lapply(x, function(x)
    as.character(x[-1]))
  x_items <- unlist(x_items)
  format_ch_vec(x_items, width = wdth, sep = ", ")
}

terms.recipe <- function(x, ...)
  x$term_info

filter_terms.formula <- function(formula, data, ...)
  get_rhs_vars(formula, data)

## This function takes the default arguments of `func` and
## replaces them with the matching ones in `options` and
## remove any in `removals`
sub_args <- function(func, options, removals = NULL) {
  args <- formals(func)
  for (i in seq_along(options))
    args[[names(options)[i]]] <- options[[i]]
  if (!is.null(removals))
    args[removals] <- NULL
  args
}

## Same as above but starts with a call object
mod_call_args <- function(cl, args, removals = NULL) {
  if (!is.null(removals))
    for (i in removals)
      cl[[i]] <- NULL
  arg_names <- names(args)
  for (i in arg_names)
    cl[[i]] <- args[[i]]
  cl
}

#' Sequences of Names with Padded Zeros
#'
#' This function creates a series of \code{num} names with a common prefix.
#' The names are numbered with leading zeros (e.g.
#' \code{prefix01}-\code{prefix10} instead of \code{prefix1}-\code{prefix10}).
#'
#' @param num A single integer for how many elements are created.
#' @param prefix A character string that will start each name.
#' @return A character vector of length \code{num}.
#' @keywords datagen
#' @concept string_functions naming_functions
#' @export
names0 <- function(num, prefix = "x") {
  if (num < 1)
    stop("`num` should be > 0", call. = FALSE)
  ind <- format(1:num)
  ind <- gsub(" ", "0", ind)
  paste0(prefix, ind)
}

## As suggested by HW, brought in from the `pryr` package
## https://github.com/hadley/pryr
fun_calls <- function(f) {
  if (is.function(f)) {
    fun_calls(body(f))
  } else if (is.call(f)) {
    fname <- as.character(f[[1]])
    # Calls inside .Internal are special and shouldn't be included
    if (identical(fname, ".Internal"))
      return(fname)
    unique(c(fname, unlist(lapply(f[-1], fun_calls), use.names = FALSE)))
  }
}

get_levels <- function(x) {
  if (!is.factor(x) & !is.character(x))
    return(list(values = NA, ordered = NA))
  out <- if (is.factor(x))
    list(values = levels(x), ordered = is.ordered(x))
  else
    list(values = sort(unique(x)), ordered = FALSE)
  out
}

has_lvls <- function(info)
  !vapply(info, function(x) all(is.na(x$values)), c(logic = TRUE))

strings2factors <- function(x, info) {
  check_lvls <- has_lvls(info)
  if (!any(check_lvls))
    return(x)
  info <- info[check_lvls]
  for (i in seq_along(info)) {
    lcol <- names(info)[i]
    x[, lcol] <- factor(as.character(getElement(x, lcol)),
                        levels = info[[i]]$values,
                        ordered = info[[i]]$ordered)
  }
  x
}

## short summary of training set
train_info <- function(x) {
  data.frame(nrows = nrow(x),
             ncomplete = sum(complete.cases(x)))
}

# Per LH and HW, brought in from the `dplyr` package
is_negated <- function(x) {
  is_lang(x, "-", n = 1)
}

## `merge_term_info` takes the information on the current variable
## list and the information on the new set of variables (after each step)
## and merges them.
## Special attention is paid to cases where the
## _type_ of data is changed for a common column in the data.
#' @importFrom dplyr left_join
merge_term_info <- function(.new, .old) {
  # Look for conflicts where the new variable type is different from
  # the original value
  tmp_new <- .new
  names(tmp_new)[names(tmp_new) == "type"] <- "new_type"
  tmp <- left_join(tmp_new[, c("variable", "new_type")],
                   .old[, c("variable", "type")],
                   by = "variable")
  tmp <- tmp[!(is.na(tmp$new_type) | is.na(tmp$type)), ]
  diff_type <- !(tmp$new_type == tmp$type)
  if (any(diff_type)) {
    ## Override old type to facilitate merge
    .old$type[which(diff_type)] <- .new$type[which(diff_type)]
  }
  left_join(.new, .old, by = c("variable", "type"))
}

#' @importFrom rlang quos is_empty
check_ellipses <- function(...) {
  terms <- quos(...)
  if (is_empty(terms))
    stop("Please supply at least one variable specification. ",
         "See ?selections.",
         call. = FALSE)
  terms
}

#' @importFrom magrittr %>%
#' @export
magrittr::`%>%`

printer <- function(tr_obj = NULL,
                    untr_obj = NULL,
                    trained = FALSE,
                    width = max(20, options()$width - 30)) {
  if (trained) {
    cat(format_ch_vec(tr_obj, width = width))
  } else
    cat(format_selectors(untr_obj, wdth = width))
  if (trained)
    cat(" [trained]\n")
  else
    cat("\n")
}

#' @export
#' @keywords internal
#' @rdname recipes-internal
prepare <- function(x, ...)
  stop("As of version 0.0.1.9006, use `prep` ",
       "instead of `prepare`",
       call. = FALSE)
recipes/R/discretize.R0000644000177700017770000002175513135741217015745 0ustar herbrandtherbrandt#' Discretize Numeric Variables
#'
#' \code{discretize} converts a numeric vector into a factor with bins having
#' approximately the same number of data points (based on a training set).
#'
#' @export
#' @param x A numeric vector
discretize <- function(x, ...)
  UseMethod("discretize")

#' @rdname discretize
discretize.default <- function(x, ...)
  stop("Only numeric `x` is accepted")

#' @rdname discretize
#' @param cuts An integer defining how many cuts to make of the data.
#' @param labels A character vector defining the factor levels that will be in
#' the new factor (from smallest to largest). This should have length
#' \code{cuts} and should not include a level for missing (see
#' \code{keep_na} below).
#' @param prefix A single character value to be used as a prefix for the factor
#' levels (e.g. \code{bin1}, \code{bin2}, ...). If the string is not a valid
#' R name, it is coerced to one.
#' @param keep_na A logical for whether a factor level should be created to
#' identify missing values in \code{x}.
#' @param infs A logical indicating whether the smallest and largest cut point
#' should be infinite.
#' @param min_unique An integer defining a sample size line of dignity for the
#' binning. If (the number of unique values)\code{/(cuts+1)} is less than
#' \code{min_unique}, no discretization takes place.
#' @param ... For \code{discretize}: options to pass to
#' \code{\link[stats]{quantile}} that should not include \code{x} or
#' \code{probs}. For \code{step_discretize}, the dots specify one or more
#' selector functions to choose which variables are affected by the step. See
#' \code{\link{selections}} for more details.
#'
#' @return \code{discretize} returns an object of class \code{discretize}.
#' \code{predict.discretize} returns a factor vector.
#' @keywords datagen
#' @concept preprocessing discretization factors
#' @export
#' @details \code{discretize} estimates the cut points from \code{x} using
#' percentiles.
#' For example, if \code{cuts = 4}, the function estimates the
#' quartiles of \code{x} and uses these as the cut points. If \code{cuts = 2},
#' the bins are defined as being above or below the median of \code{x}.
#'
#' The \code{predict} method can then be used to turn numeric vectors into
#' factor vectors.
#'
#' If \code{keep_na = TRUE}, a suffix of "_missing" is used as a factor level
#' (see the examples below).
#'
#' If \code{infs = FALSE} and a new value is greater than the largest value of
#' \code{x}, a missing value will result.
#' @examples
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' median(biomass_tr$carbon)
#' discretize(biomass_tr$carbon, cuts = 2)
#' discretize(biomass_tr$carbon, cuts = 2, infs = FALSE)
#' discretize(biomass_tr$carbon, cuts = 2, infs = FALSE, keep_na = FALSE)
#' discretize(biomass_tr$carbon, cuts = 2, prefix = "maybe a bad idea to bin")
#'
#' carbon_binned <- discretize(biomass_tr$carbon)
#' table(predict(carbon_binned, biomass_tr$carbon))
#'
#' carbon_no_infs <- discretize(biomass_tr$carbon, infs = FALSE)
#' predict(carbon_no_infs, c(50, 100))
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'               data = biomass_tr)
#' rec <- rec %>% step_discretize(carbon, hydrogen)
#' rec <- prep(rec, biomass_tr)
#' binned_te <- bake(rec, biomass_te)
#' table(binned_te$carbon)
#' @importFrom stats quantile
discretize.numeric <-
  function(x,
           cuts = 4,
           labels = NULL,
           prefix = "bin",
           keep_na = TRUE,
           infs = TRUE,
           min_unique = 10,
           ...) {
    unique_vals <- length(unique(x))
    missing_lab <- "_missing"

    if (cuts < 2)
      stop("There should be at least 2 cuts")

    if (unique_vals / (cuts + 1) >= min_unique) {
      breaks <- quantile(x, probs = seq(0, 1, length = cuts + 1), ...)
      num_breaks <- length(breaks)
      breaks <- unique(breaks)
      if (num_breaks > length(breaks))
        warning(
          "Not enough data for ", cuts, " breaks. Only ",
          length(breaks), " breaks were used.",
          sep = ""
        )
      if (infs) {
        breaks[1] <- -Inf
        breaks[length(breaks)] <- Inf
      }
      breaks <- unique(breaks)

      if (is.null(labels)) {
        prefix <- prefix[1]
        if (make.names(prefix) != prefix) {
          warning(
            "The prefix '", prefix,
            "' is not a valid R name. It has been changed to '",
            make.names(prefix), "'."
          )
          prefix <- make.names(prefix)
        }
        labels <- names0(length(breaks) - 1, "")
      }
      out <- list(
        breaks = breaks,
        bins = length(breaks) - 1,
        prefix = prefix,
        labels = if (keep_na)
          labels <- c(missing_lab, labels)
        else
          labels,
        keep_na = keep_na
      )
    } else {
      out <- list(bins = 0)
      warning("Data not binned; too few unique values per bin. ",
              "Adjust 'min_unique' as needed", call. = FALSE)
    }
    class(out) <- "discretize"
    out
  }

#' @rdname discretize
#' @importFrom stats predict
#' @param object An object of class \code{discretize}.
#' @param newdata A new numeric object to be binned.
#' @export
predict.discretize <- function(object, newdata, ...) {
  if (is.matrix(newdata) | is.data.frame(newdata))
    newdata <- newdata[, 1]
  object$labels <- paste0(object$prefix, object$labels)
  if (object$bins >= 1) {
    labs <- if (object$keep_na)
      object$labels[-1]
    else
      object$labels
    out <- cut(newdata, object$breaks, labels = labs, include.lowest = TRUE)
    if (object$keep_na) {
      out <- as.character(out)
      if (any(is.na(newdata)))
        out[is.na(newdata)] <- object$labels[1]
      out <- factor(out, levels = object$labels)
    }
  } else
    out <- newdata
  out
}

#' @export
print.discretize <-
  function(x, digits = max(3L, getOption("digits") - 3L), ...)
{
  if (length(x$breaks) > 0) {
    cat("Bins:", length(x$labels))
    if (any(grepl("_missing", x$labels)))
      cat(" (includes missing category)")
    cat("\n")
    if (length(x$breaks) <= 6) {
      cat("Breaks:",
          paste(signif(x$breaks, digits = digits), collapse = ", "))
    }
  } else {
    if (x$bins == 0)
      cat("Too few unique data points. No binning.")
    else
      cat("Non-numeric data. No binning was used.")
  }
}

#' @rdname discretize
#' @inheritParams step_center
#' @inherit step_center return
#' @param role Not used by this step since no new variables are created.
#' @param objects The \code{\link{discretize}} objects are stored here once
#' the recipe has been trained by \code{\link{prep.recipe}}.
#' @param options A list of options to \code{\link{discretize}}. A default is
#' set for the argument \code{x}. Note that using the options
#' \code{prefix} and \code{labels} when more than one variable is being
#' transformed might be problematic as all variables inherit those values.
#' @export
step_discretize <- function(recipe, ..., role = NA, trained = FALSE,
                            objects = NULL, options = list()) {
  add_step(
    recipe,
    step_discretize_new(
      terms = check_ellipses(...),
      trained = trained,
      role = role,
      objects = objects,
      options = options
    )
  )
}

step_discretize_new <- function(terms = NULL, role = NA, trained = FALSE,
                                objects = NULL, options = NULL) {
  step(
    subclass = "discretize",
    terms = terms,
    role = role,
    trained = trained,
    objects = objects,
    options = options
  )
}

bin_wrapper <- function(x, args) {
  bin_call <-
    quote(discretize(x, cuts, labels, prefix, keep_na, infs,
                     min_unique, ...))
  args <- sub_args(discretize.numeric, args, "x")
  args$x <- x
  eval(bin_call, envir = args)
}

#' @export
prep.step_discretize <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  if (length(col_names) > 1 &
      any(names(x$options) %in% c("prefix", "labels"))) {
    warning("Note that the options `prefix` and `labels` ",
            "will be applied to all variables")
  }

  obj <- lapply(training[, col_names], bin_wrapper, x$options)
  step_discretize_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    objects = obj,
    options = x$options
  )
}

#' @importFrom tibble as_tibble
#' @importFrom stats predict
#' @export
bake.step_discretize <- function(object, newdata, ...) {
  for (i in names(object$objects))
    newdata[, i] <-
      predict(object$objects[[i]], getElement(newdata, i))
  as_tibble(newdata)
}

print.step_discretize <-
  function(x, width = max(20, options()$width - 30), ...) {
    cat("Discretizing variables from ")
    printer(names(x$objects), x$terms, x$trained, width = width)
    invisible(x)
  }
recipes/R/other.R0000644000177700017770000001207713135741217014716 0ustar herbrandtherbrandt#' Collapse Some Categorical Levels
#'
#' \code{step_other} creates a \emph{specification} of a recipe step that will
#' potentially pool infrequently occurring values into an "other" category.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables
#' will potentially be reduced. See \code{\link{selections}} for more details.
#' @param role Not used by this step since no new variables are created.
#' @param threshold A single numeric value in (0, 1) for pooling.
#' @param other A single character value for the "other" category.
#' @param objects A list of objects that contain the information to pool
#' infrequent levels that is determined by \code{\link{prep.recipe}}.
#' @keywords datagen
#' @concept preprocessing factors
#' @export
#' @details The overall proportions of the categories are computed.
The "other" #' category is used in place of any categorical levels whose individual #' proportion in the training set is less than \code{threshold}. #' #' If no pooling is done the data are unmodified (although character data may #' be changed to factors based on the value of \code{stringsAsFactors} in #' \code{\link{prep.recipe}}). Otherwise, a factor is always returned with #' different factor levels. #' #' If \code{threshold} is less than the largest category proportion, all levels #' except for the most frequent are collapsed to the \code{other} level. #' #' If the retained categories include the value of \code{other}, an error is #' thrown. If \code{other} is in the list of discarded levels, no error #' occurs. #' @examples #' data(okc) #' #' set.seed(19) #' in_train <- sample(1:nrow(okc), size = 30000) #' #' okc_tr <- okc[ in_train,] #' okc_te <- okc[-in_train,] #' #' rec <- recipe(~ diet + location, data = okc_tr) #' #' #' rec <- rec %>% #' step_other(diet, location, threshold = .1, other = "other values") #' rec <- prep(rec, training = okc_tr) #' #' collapsed <- bake(rec, okc_te) #' table(okc_te$diet, collapsed$diet, useNA = "always") step_other <- function(recipe, ..., role = NA, trained = FALSE, threshold = .05, other = "other", objects = NULL) { if (threshold <= 0) stop("`threshold` should be greater than zero", call. = FALSE) if (threshold >= 1) stop("`threshold` should be less than one", call. = FALSE) add_step( recipe, step_other_new( terms = check_ellipses(...), role = role, trained = trained, threshold = threshold, other = other, objects = objects ) ) } step_other_new <- function(terms = NULL, role = NA, trained = FALSE, threshold = NULL, other = NULL, objects = NULL) { step( subclass = "other", terms = terms, role = role, trained = trained, threshold = threshold, other = other, objects = objects ) } #' @importFrom stats sd #' @export prep.step_other <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) objects <- lapply(training[, col_names], keep_levels, prop = x$threshold, other = x$other) step_other_new( terms = x$terms, role = x$role, trained = TRUE, threshold = x$threshold, other = x$other, objects = objects ) } #' @importFrom tibble as_tibble is_tibble #' @export bake.step_other <- function(object, newdata, ...) { for (i in names(object$objects)) { if (object$objects[[i]]$collapse) { tmp <- if (!is.character(newdata[, i])) as.character(getElement(newdata, i)) else getElement(newdata, i) tmp <- ifelse(tmp %in% object$objects[[i]]$keep, tmp, object$objects[[i]]$other) tmp <- factor(tmp, levels = c(object$objects[[i]]$keep, object$objects[[i]]$other)) tmp[is.na(getElement(newdata, i))] <- NA newdata[, i] <- tmp } } if (!is_tibble(newdata)) newdata <- as_tibble(newdata) newdata } print.step_other <- function(x, width = max(20, options()$width - 30), ...) { cat("Collapsing factor levels for ", sep = "") printer(names(x$objects), x$terms, x$trained, width = width) invisible(x) } keep_levels <- function(x, prop = .1, other = "other") { if (!is.factor(x)) x <- factor(x) xtab <- sort(table(x, useNA = "no"), decreasing = TRUE) / sum(!is.na(x)) dropped <- which(xtab < prop) orig <- levels(x) collapse <- length(dropped) > 0 if (collapse) { keepers <- names(xtab[-dropped]) if (length(keepers) == 0) keepers <- names(xtab)[which.max(xtab)] if (other %in% keepers) stop( "The level ", other, " is already a factor level that will be retained. ", "Please choose a different value.", call. 
  } else
    keepers <- orig
  list(keep = orig[orig %in% keepers],
       collapse = collapse,
       other = other)
}
recipes/R/rm.R0000644000177700017770000000522313135741217014206 0ustar herbrandtherbrandt#' General Variable Filter
#'
#' \code{step_rm} creates a \emph{specification} of a recipe step that will
#' remove variables based on their name, type, or role.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables will
#' be evaluated by the filtering step. See \code{\link{selections}} for
#' more details.
#' @param role Not used by this step since no new variables are created.
#' @param removals A character string that contains the names of columns that
#' should be removed. These values are not determined until
#' \code{\link{prep.recipe}} is called.
#' @keywords datagen
#' @concept preprocessing variable_filters
#' @export
#' @examples
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'               data = biomass_tr)
#'
#' library(dplyr)
#' smaller_set <- rec %>%
#'   step_rm(contains("gen"))
#'
#' smaller_set <- prep(smaller_set, training = biomass_tr)
#'
#' filtered_te <- bake(smaller_set, biomass_te)
#' filtered_te
step_rm <- function(recipe, ..., role = NA, trained = FALSE,
                    removals = NULL) {
  add_step(recipe,
           step_rm_new(
             terms = check_ellipses(...),
             role = role,
             trained = trained,
             removals = removals
           ))
}

step_rm_new <- function(terms = NULL, role = NA, trained = FALSE,
                        removals = NULL) {
  step(
    subclass = "rm",
    terms = terms,
    role = role,
    trained = trained,
    removals = removals
  )
}

#' @export
prep.step_rm <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  step_rm_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    removals = col_names
  )
}

#' @export
bake.step_rm <- function(object, newdata, ...) {
  if (length(object$removals) > 0)
    newdata <- newdata[, !(colnames(newdata) %in% object$removals)]
  as_tibble(newdata)
}

print.step_rm <-
  function(x, width = max(20, options()$width - 22), ...) {
    if (x$trained) {
      if (length(x$removals) > 0) {
        cat("Variables removed ")
        cat(format_ch_vec(x$removals, width = width))
      } else
        cat("No variables were removed")
    } else {
      cat("Delete terms ", sep = "")
      cat(format_selectors(x$terms, wdth = width))
    }
    if (x$trained)
      cat(" [trained]\n")
    else
      cat("\n")
    invisible(x)
  }
recipes/R/isomap.R0000644000177700017770000001322513135741217015061 0ustar herbrandtherbrandt#' Isomap Embedding
#'
#' \code{step_isomap} creates a \emph{specification} of a recipe step that will
#' convert numeric data into one or more new dimensions.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables will be
#' used to compute the dimensions. See \code{\link{selections}} for more
#' details.
#' @param role For model terms created by this step, what analysis role should
#' they be assigned? By default, the function assumes that the new
#' dimension columns created by the original variables will be used as
#' predictors in a model.
#' @param num The number of isomap dimensions to retain as new predictors. If
#' \code{num} is greater than the number of columns or the number of
#' possible dimensions, a smaller value will be used.
#' @param options A list of options to \code{\link[dimRed]{Isomap}}.
#' @param res The \code{\link[dimRed]{Isomap}} object is stored here once this
#' preprocessing step has been trained by \code{\link{prep.recipe}}.
#' @param prefix A character string that will be the prefix to the resulting
#' new variables. See notes below.
#' @keywords datagen
#' @concept preprocessing isomap projection_methods
#' @export
#' @details Isomap is a form of multidimensional scaling (MDS). MDS methods
#' try to find a reduced set of dimensions such that the geometric distances
#' between the original data points are preserved. This version of MDS uses
#' nearest neighbors in the data as a method for increasing the fidelity of
#' the new dimensions to the original data values.
#'
#' It is advisable to center and scale the variables prior to running Isomap
#' (\code{step_center} and \code{step_scale} can be used for this purpose).
#'
#' The argument \code{num} controls the number of components that will be
#' retained (the original variables that are used to derive the components
#' are removed from the data). The new components will have names that begin
#' with \code{prefix} and a sequence of numbers. The variable names are
#' padded with zeros. For example, if \code{num < 10}, their names will be
#' \code{Isomap1} - \code{Isomap9}. If \code{num = 101}, the names would be
#' \code{Isomap001} - \code{Isomap101}.
#' @references De Silva, V., and Tenenbaum, J. B. (2003). Global versus local
#' methods in nonlinear dimensionality reduction. \emph{Advances in Neural
#' Information Processing Systems}. 721-728.
#'
#' \pkg{dimRed}, a framework for dimensionality reduction,
#' \url{https://github.com/gdkrmr}
#'
#' @examples
#' data(biomass)
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'               data = biomass_tr)
#'
#' im_trans <- rec %>%
#'   step_YeoJohnson(all_predictors()) %>%
#'   step_center(all_predictors()) %>%
#'   step_scale(all_predictors()) %>%
#'   step_isomap(all_predictors(),
#'               options = list(knn = 100),
#'               num = 2)
#'
#' im_estimates <- prep(im_trans, training = biomass_tr)
#'
#' im_te <- bake(im_estimates, biomass_te)
#'
#' rng <- extendrange(c(im_te$Isomap1, im_te$Isomap2))
#' plot(im_te$Isomap1, im_te$Isomap2,
#'      xlim = rng, ylim = rng)
#' @seealso \code{\link{step_pca}} \code{\link{step_kpca}}
#'   \code{\link{step_ica}} \code{\link{recipe}} \code{\link{prep.recipe}}
#'   \code{\link{bake.recipe}}
step_isomap <- function(recipe,
                        ...,
                        role = "predictor",
                        trained = FALSE,
                        num = 5,
                        options = list(knn = 50, .mute = c("message", "output")),
                        res = NULL,
                        prefix = "Isomap") {
  add_step(
    recipe,
    step_isomap_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      num = num,
      options = options,
      res = res,
      prefix = prefix
    )
  )
}

step_isomap_new <- function(terms = NULL,
                            role = "predictor",
                            trained = FALSE,
                            num = NULL,
                            options = NULL,
                            res = NULL,
                            prefix = "isomap") {
  step(
    subclass = "isomap",
    terms = terms,
    role = role,
    trained = trained,
    num = num,
    options = options,
    res = res,
    prefix = prefix
  )
}

#' @importFrom dimRed embed dimRedData
#' @export
prep.step_isomap <- function(x, training, info = NULL, ...)
{ col_names <- terms_select(x$terms, info = info) x$num <- min(x$num, ncol(training)) x$options$knn <- min(x$options$knn, nrow(training)) imap <- embed( dimRedData(as.data.frame(training[, col_names, drop = FALSE])), "Isomap", knn = x$options$knn, ndim = x$num, .mute = x$options$.mute ) step_isomap_new( terms = x$terms, role = x$role, trained = TRUE, num = x$num, options = x$options, res = imap, prefix = x$prefix ) } #' @export bake.step_isomap <- function(object, newdata, ...) { isomap_vars <- colnames(environment(object$res@apply)$indata) comps <- object$res@apply( dimRedData(as.data.frame(newdata[, isomap_vars, drop = FALSE])) )@data comps <- comps[, 1:object$num, drop = FALSE] colnames(comps) <- names0(ncol(comps), object$prefix) newdata <- cbind(newdata, as_tibble(comps)) newdata <- newdata[, !(colnames(newdata) %in% isomap_vars), drop = FALSE] if (!is_tibble(newdata)) newdata <- as_tibble(newdata) newdata } print.step_isomap <- function(x, width = max(20, options()$width - 35), ...) { cat("Isomap approximation with ") printer(colnames(x$res@org.data), x$terms, x$trained, width = width) invisible(x) } recipes/R/ordinalscore.R0000644000177700017770000000722013135741217016253 0ustar herbrandtherbrandt#' Convert Ordinal Factors to Numeric Scores #' #' \code{step_ordinalscore} creates a \emph{specification} of a recipe step that #' will convert ordinal factor variables into numeric scores. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param columns A character string of variables that will be converted. This is \code{NULL} #' until computed by \code{\link{prep.recipe}}. #' @param convert A function that takes an ordinal factor vector as an input and outputs a single numeric variable. #' @keywords datagen #' @concept preprocessing ordinal_data #' @export #' @details Dummy variables from ordered factors with \code{C} levels will create polynomial basis functions with \code{C-1} terms. As an alternative, this step can be used to translate the ordered levels into a single numeric vector of values that represent (subjective) scores. By default, the translation uses a linear scale (1, 2, 3, ... \code{C}) but custom score functions can also be used (see the example below). 
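#'
#' With the default \code{convert = as.numeric}, an ordered factor with
#' levels \code{low < medium < high} is simply scored as 1, 2, and 3. A
#' minimal sketch (not run) of that default conversion:
#' \preformatted{
#' x <- factor(c("low", "high", "medium"),
#'             levels = c("low", "medium", "high"),
#'             ordered = TRUE)
#' as.numeric(x)   # 1 3 2
#' }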
#' @examples
#' fail_lvls <- c("meh", "annoying", "really_bad")
#'
#' ord_data <-
#'   data.frame(item = c("paperclip", "twitter", "airbag"),
#'              fail_severity = factor(fail_lvls,
#'                                     levels = fail_lvls,
#'                                     ordered = TRUE))
#'
#' model.matrix(~fail_severity, data = ord_data)
#'
#' linear_values <- recipe(~ item + fail_severity, data = ord_data) %>%
#'   step_dummy(item) %>%
#'   step_ordinalscore(fail_severity)
#'
#' linear_values <- prep(linear_values, training = ord_data, retain = TRUE)
#'
#' juice(linear_values, everything())
#'
#' custom <- function(x) {
#'   new_values <- c(1, 3, 7)
#'   new_values[as.numeric(x)]
#' }
#'
#' nonlin_scores <- recipe(~ item + fail_severity, data = ord_data) %>%
#'   step_dummy(item) %>%
#'   step_ordinalscore(fail_severity, convert = custom)
#'
#' nonlin_scores <- prep(nonlin_scores, training = ord_data, retain = TRUE)
#'
#' juice(nonlin_scores, everything())
step_ordinalscore <- function(recipe, ..., role = NA, trained = FALSE,
                              columns = NULL, convert = as.numeric) {
  add_step(
    recipe,
    step_ordinalscore_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      columns = columns,
      convert = convert
    )
  )
}

step_ordinalscore_new <- function(terms = NULL, role = NA, trained = FALSE,
                                  columns = NULL, convert = NULL) {
  step(
    subclass = "ordinalscore",
    terms = terms,
    role = role,
    trained = trained,
    columns = columns,
    convert = convert
  )
}

#' @export
prep.step_ordinalscore <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  ord_check <-
    vapply(training[, col_names], is.ordered, c(logic = TRUE))
  if (!all(ord_check))
    stop("Ordinal factor variables should be selected as ",
         "inputs into this step.", call. = TRUE)
  step_ordinalscore_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    columns = col_names,
    convert = x$convert
  )
}

#' @export
bake.step_ordinalscore <- function(object, newdata, ...) {
  scores <- lapply(newdata[, object$columns], object$convert)
  for (i in object$columns)
    newdata[, i] <- scores[[i]]
  as_tibble(newdata)
}

print.step_ordinalscore <-
  function(x, width = max(20, options()$width - 30), ...) {
    cat("Scoring for ", sep = "")
    printer(x$columns, x$terms, x$trained, width = width)
    invisible(x)
  }
recipes/R/nzv.R0000644000177700017770000001202013135741217014376 0ustar herbrandtherbrandt#' Near-Zero Variance Filter
#'
#' \code{step_nzv} creates a \emph{specification} of a recipe step that will
#' potentially remove variables that are highly sparse and unbalanced.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables will
#' be evaluated by the filtering step. See \code{\link{selections}} for
#' more details.
#' @param role Not used by this step since no new variables are created.
#' @param options A list of options for the filter (see Details below).
#' @param removals A character string that contains the names of columns that
#' should be removed. These values are not determined until
#' \code{\link{prep.recipe}} is called.
#' @keywords datagen
#' @concept preprocessing variable_filters
#' @export
#'
#' @details This step diagnoses predictors that have one unique value (i.e.
#' are zero variance predictors) or predictors that have both of the
#' following characteristics:
#' \enumerate{
#'   \item they have very few unique values relative to the number of samples
#'   and
#'   \item the ratio of the frequency of the most common value to the
#'   frequency of the second most common value is large.
#' }
#'
#' For example, a near-zero variance predictor would be one that, for 1000
#' samples, has two distinct values and 999 of them are a single value.
#'
#' To be flagged, first the frequency of the most prevalent value over the
#' second most frequent value (called the "frequency ratio") must be above
#' \code{freq_cut}. Secondly, the "percent of unique values," the number of
#' unique values divided by the total number of samples (times 100), must
#' also be below \code{unique_cut}.
#'
#' In the above example, the frequency ratio is 999 and the unique value
#' percentage is 0.2.
#' @examples
#' data(biomass)
#'
#' biomass$sparse <- c(1, rep(0, nrow(biomass) - 1))
#'
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur + sparse,
#'               data = biomass_tr)
#'
#' nzv_filter <- rec %>%
#'   step_nzv(all_predictors())
#'
#' filter_obj <- prep(nzv_filter, training = biomass_tr)
#'
#' filtered_te <- bake(filter_obj, biomass_te)
#' any(names(filtered_te) == "sparse")
#' @seealso \code{\link{step_corr}} \code{\link{recipe}}
#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
step_nzv <- function(recipe, ..., role = NA, trained = FALSE,
                     options = list(freq_cut = 95 / 5, unique_cut = 10),
                     removals = NULL) {
  add_step(
    recipe,
    step_nzv_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      options = options,
      removals = removals
    )
  )
}

step_nzv_new <- function(terms = NULL, role = NA, trained = FALSE,
                         options = NULL, removals = NULL) {
  step(
    subclass = "nzv",
    terms = terms,
    role = role,
    trained = trained,
    options = options,
    removals = removals
  )
}

#' @export
prep.step_nzv <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  filter <- nzv(
    x = training[, col_names],
    freq_cut = x$options$freq_cut,
    unique_cut = x$options$unique_cut
  )

  step_nzv_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    options = x$options,
    removals = filter
  )
}

#' @export
bake.step_nzv <- function(object, newdata, ...) {
  if (length(object$removals) > 0)
    newdata <- newdata[, !(colnames(newdata) %in% object$removals)]
  as_tibble(newdata)
}

print.step_nzv <-
  function(x, width = max(20, options()$width - 38), ...)
print.step_nzv <- function(x, width = max(20, options()$width - 38), ...) {
  if (x$trained) {
    if (length(x$removals) > 0) {
      cat("Sparse, unbalanced variable filter removed ")
      cat(format_ch_vec(x$removals, width = width))
    } else
      cat("Sparse, unbalanced variable filter removed no terms")
  } else {
    cat("Sparse, unbalanced variable filter on ", sep = "")
    cat(format_selectors(x$terms, wdth = width))
  }
  if (x$trained) cat(" [trained]\n") else cat("\n")
  invisible(x)
}

nzv <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  if (is.null(dim(x)))
    x <- matrix(x, ncol = 1)

  fr_foo <- function(data) {
    t <- table(data[!is.na(data)])
    if (length(t) <= 1) {
      return(0)
    }
    w <- which.max(t)
    return(max(t, na.rm = TRUE) / max(t[-w], na.rm = TRUE))
  }
  freq_ratio <- vapply(x, fr_foo, c(ratio = 0))

  uni_foo <- function(data) length(unique(data[!is.na(data)]))
  lunique <- vapply(x, uni_foo, c(num = 0))
  pct_unique <- 100 * lunique / vapply(x, length, c(num = 0))

  zero_func <- function(data) all(is.na(data))
  zero_var <- (lunique == 1) | vapply(x, zero_func, c(zv = TRUE))

  out <- which( (freq_ratio > freq_cut & pct_unique <= unique_cut) | zero_var)
  names(out) <- NULL
  colnames(x)[out]
}
recipes/R/BoxCox.R0000644000177700017770000001224113135741217014770 0ustar herbrandtherbrandt#' Box-Cox Transformation for Non-Negative Data
#'
#' \code{step_BoxCox} creates a \emph{specification} of a recipe step that will
#'   transform data using a simple Box-Cox transformation.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param role Not used by this step since no new variables are created.
#' @param lambdas A numeric vector of transformation values. This is
#'   \code{NULL} until computed by \code{\link{prep.recipe}}.
#' @param limits A length 2 numeric vector defining the range to compute the
#'   transformation parameter lambda.
#' @param nunique An integer; variables with fewer unique values than this
#'   will not be evaluated for a transformation.
#' @keywords datagen
#' @concept preprocessing transformation_methods
#' @export
#' @details The Box-Cox transformation, which requires a strictly positive
#'   variable, can be used to rescale a variable to be more similar to a
#'   normal distribution. In this package, the partial log-likelihood function
#'   is directly optimized within a reasonable set of transformation values
#'   (which can be changed by the user).
#'
#'   This transformation is typically done on the outcome variable using the
#'   residuals for a statistical model (such as ordinary least squares).
#'   Here, a simple null model (intercept only) is used to apply the
#'   transformation to the \emph{predictor} variables individually. This can
#'   have the effect of making the variable distributions more symmetric.
#'
#'   If the transformation parameters are estimated to be very close to the
#'   bounds, or if the optimization fails, a value of \code{NA} is used and
#'   no transformation is applied.
#'
#' @references Sakia, R. M. (1992). The Box-Cox transformation technique:
#'   A review. \emph{The Statistician}, 169-178.
#' @examples
#'
#' rec <- recipe(~ ., data = as.data.frame(state.x77))
#'
#' bc_trans <- step_BoxCox(rec, all_numeric())
#'
#' bc_estimates <- prep(bc_trans, training = as.data.frame(state.x77))
#'
#' bc_data <- bake(bc_estimates, as.data.frame(state.x77))
#'
#' plot(density(state.x77[, "Illiteracy"]), main = "before")
#' plot(density(bc_data$Illiteracy), main = "after")
#' @seealso \code{\link{step_YeoJohnson}} \code{\link{recipe}}
#'   \code{\link{prep.recipe}} \code{\link{bake.recipe}}
step_BoxCox <- function(recipe, ..., role = NA, trained = FALSE,
                        lambdas = NULL, limits = c(-5, 5), nunique = 5) {
  add_step(
    recipe,
    step_BoxCox_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      lambdas = lambdas,
      limits = sort(limits)[1:2],
      nunique = nunique
    )
  )
}

step_BoxCox_new <- function(terms = NULL, role = NA, trained = FALSE,
                            lambdas = NULL, limits = NULL, nunique = NULL) {
  step(
    subclass = "BoxCox",
    terms = terms,
    role = role,
    trained = trained,
    lambdas = lambdas,
    limits = limits,
    nunique = nunique
  )
}

#' @export
prep.step_BoxCox <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(x$terms, info = info)
  values <- vapply(
    training[, col_names],
    estimate_bc,
    c(lambda = 0),
    limits = x$limits,
    nunique = x$nunique
  )
  values <- values[!is.na(values)]
  step_BoxCox_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    lambdas = values,
    limits = x$limits,
    nunique = x$nunique
  )
}

#' @export
bake.step_BoxCox <- function(object, newdata, ...) {
  if (length(object$lambdas) == 0)
    return(as_tibble(newdata))
  param <- names(object$lambdas)
  for (i in seq_along(object$lambdas))
    newdata[, param[i]] <- bc_trans(getElement(newdata, param[i]),
                                    lambda = object$lambdas[i])
  as_tibble(newdata)
}

print.step_BoxCox <- function(x, width = max(20, options()$width - 35), ...) {
  cat("Box-Cox transformation on ", sep = "")
  printer(names(x$lambdas), x$terms, x$trained, width = width)
  invisible(x)
}

## computes the new data
bc_trans <- function(x, lambda, eps = .001) {
  if (is.na(lambda))
    return(x)
  if (abs(lambda) < eps)
    log(x)
  else
    (x ^ lambda - 1) / lambda
}

## helper for the log-likelihood calc
#' @importFrom stats var
ll_bc <- function(lambda, y, gm, eps = .001) {
  n <- length(y)
  gm0 <- gm ^ (lambda - 1)
  z <- if (abs(lambda) <= eps)
    log(y) / gm0
  else
    (y ^ lambda - 1) / (lambda * gm0)
  var_z <- var(z) * (n - 1) / n
  -.5 * n * log(var_z)
}

#' @importFrom stats complete.cases
## eliminates missing data and returns the profile log-likelihood
bc_obj <- function(lam, dat) {
  dat <- dat[complete.cases(dat)]
  geo_mean <- exp(mean(log(dat)))
  ll_bc(lambda = lam, y = dat, gm = geo_mean)
}

#' @importFrom stats optimize
## estimates the values
estimate_bc <- function(dat, limits = c(-5, 5), nunique = 5) {
  eps <- .001
  if (length(unique(dat)) < nunique | any(dat[complete.cases(dat)] <= 0))
    return(NA)
  res <- optimize(
    bc_obj,
    interval = limits,
    maximum = TRUE,
    dat = dat,
    tol = .0001
  )
  lam <- res$maximum
  if (abs(limits[1] - lam) <= eps | abs(limits[2] - lam) <= eps)
    lam <- NA
  lam
}
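## Quick sanity check (a hedged sketch, not part of the package source):
## at lambda = 0 the transformation above reduces to the natural log, and
## otherwise follows (x^lambda - 1) / lambda:
##   bc_trans(2, lambda = 0)   # log(2)
##   bc_trans(2, lambda = 2)   # (2^2 - 1) / 2 = 1.5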
recipes/R/ratio.R0000644000177700017770000001126213135741217014706 0ustar herbrandtherbrandt#' Ratio Variable Creation
#'
#' \code{step_ratio} creates a \emph{specification} of a recipe step that
#'   will create one or more ratios out of numeric variables.
#'
#' @inheritParams step_center
#' @inherit step_center return
#' @param ... One or more selector functions to choose which variables will
#'   be used in the \emph{numerator} of the ratio. When used with
#'   \code{denom_vars}, the dots indicate which variables are used in the
#'   \emph{denominator}. See \code{\link{selections}} for more details.
#' @param role For terms created by this step, what analysis role should
#'   they be assigned? By default, the function assumes that the newly
#'   created ratios will be used as predictors in a model.
#' @param denom A call to \code{denom_vars} to specify which variables are
#'   used in the denominator that can include specific variable names
#'   separated by commas or different selectors (see
#'   \code{\link{selections}}). If a column is included in both lists to be
#'   numerator and denominator, it will be removed from the listing.
#' @param naming A function that defines the naming convention for new ratio
#'   columns.
#' @param columns The column names used in the ratios. This argument is
#'   not populated until \code{\link{prep.recipe}} is executed.
#' @keywords datagen
#' @concept preprocessing
#' @export
#' @examples
#' library(recipes)
#' data(biomass)
#'
#' biomass$total <- apply(biomass[, 3:7], 1, sum)
#' biomass_tr <- biomass[biomass$dataset == "Training",]
#' biomass_te <- biomass[biomass$dataset == "Testing",]
#'
#' rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen +
#'                     sulfur + total,
#'               data = biomass_tr)
#'
#' ratio_recipe <- rec %>%
#'   # all predictors over total
#'   step_ratio(all_predictors(), denom = denom_vars(total)) %>%
#'   # get rid of the original predictors
#'   step_rm(all_predictors(), -matches("_o_"))
#'
#' ratio_recipe <- prep(ratio_recipe, training = biomass_tr)
#'
#' ratio_data <- bake(ratio_recipe, biomass_te)
#' ratio_data
step_ratio <- function(recipe, ..., role = "predictor", trained = FALSE,
                       denom = denom_vars(),
                       naming = function(numer, denom)
                         make.names(paste(numer, denom, sep = "_o_")),
                       columns = NULL) {
  if (is_empty(denom))
    stop("Please supply at least one denominator variable specification. ",
         "See ?selections.", call. = FALSE)
  add_step(
    recipe,
    step_ratio_new(
      terms = check_ellipses(...),
      role = role,
      trained = trained,
      denom = denom,
      naming = naming,
      columns = columns
    )
  )
}

step_ratio_new <- function(terms = NULL, role = "predictor", trained = FALSE,
                           denom = NULL, naming = NULL, columns = NULL) {
  step(
    subclass = "ratio",
    terms = terms,
    role = role,
    trained = trained,
    denom = denom,
    naming = naming,
    columns = columns
  )
}

#' @export
prep.step_ratio <- function(x, training, info = NULL, ...) {
  col_names <- expand.grid(
    top = terms_select(x$terms, info = info),
    bottom = terms_select(x$denom, info = info),
    stringsAsFactors = FALSE
  )
  col_names <- col_names[!(col_names$top == col_names$bottom), ]
  if (nrow(col_names) == 0)
    stop("No variables were selected for making ratios", call. = FALSE)
  if (any(info$type[info$variable %in% col_names$top] != "numeric"))
    stop("The ratio variables should be numeric")
  if (any(info$type[info$variable %in% col_names$bottom] != "numeric"))
    stop("The ratio variables should be numeric")
  step_ratio_new(
    terms = x$terms,
    role = x$role,
    trained = TRUE,
    denom = x$denom,
    naming = x$naming,
    columns = col_names
  )
}

#' @export
bake.step_ratio <- function(object, newdata, ...) {
  res <- newdata[, object$columns$top] / newdata[, object$columns$bottom]
  colnames(res) <- apply(object$columns, 1,
                         function(x) object$naming(x[1], x[2]))
  if (!is_tibble(res))
    res <- as_tibble(res)
  newdata <- cbind(newdata, res)
  if (!is_tibble(newdata))
    newdata <- as_tibble(newdata)
  newdata
}

print.step_ratio <- function(x, width = max(20, options()$width - 30), ...) {
  cat("Ratios from ")
  if (x$trained) {
    vars <- c(unique(x$columns$top), unique(x$columns$bottom))
    cat(format_ch_vec(vars, width = width))
  } else
    cat(format_selectors(c(x$terms, x$denom), wdth = width))
  if (x$trained)
    cat(" [trained]\n")
  else
    cat("\n")
  invisible(x)
}

#' @export
#' @rdname step_ratio
denom_vars <- function(...) quos(...)
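## Illustrative note (a hedged sketch, not part of the package source):
## `denom_vars` accepts the same selectors as the step itself, so all
## pairwise ratios of the numeric variables in some recipe `rec` could be
## requested with:
##   rec %>% step_ratio(all_numeric(), denom = denom_vars(all_numeric()))
## Self-ratios (the same column on top and bottom) are dropped in
## `prep.step_ratio` above.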
{ cat("Ratios from ") if (x$trained) { vars <- c(unique(x$columns$top), unique(x$columns$bottom)) cat(format_ch_vec(vars, width = width)) } else cat(format_selectors(c(x$terms, x$denom), wdth = width)) if (x$trained) cat(" [trained]\n") else cat("\n") invisible(x) } #' @export #' @rdname step_ratio denom_vars <- function(...) quos(...) recipes/R/meanimpute.R0000644000177700017770000000631213135741217015734 0ustar herbrandtherbrandt#' Impute Numeric Data Using the Mean #' #' \code{step_meanimpute} creates a \emph{specification} of a recipe step that #' will substitute missing values of numeric variables by the training set #' mean of those variables. #' #' @inheritParams step_center #' @inherit step_center return #' @param role Not used by this step since no new variables are created. #' @param means A named numeric vector of means. This is \code{NULL} until #' computed by \code{\link{prep.recipe}}. #' @param trim The fraction (0 to 0.5) of observations to be trimmed from each #' end of the variables before the mean is computed. Values of trim outside #' that range are taken as the nearest endpoint. #' @keywords datagen #' @concept preprocessing imputation #' @export #' @details \code{step_meanimpute} estimates the variable means from the data #' used in the \code{training} argument of \code{prep.recipe}. #' \code{bake.recipe} then applies the new values to new data sets using #' these averages. #' @examples #' data("credit_data") #' #' ## missing data per column #' vapply(credit_data, function(x) mean(is.na(x)), c(num = 0)) #' #' set.seed(342) #' in_training <- sample(1:nrow(credit_data), 2000) #' #' credit_tr <- credit_data[ in_training, ] #' credit_te <- credit_data[-in_training, ] #' missing_examples <- c(14, 394, 565) #' #' rec <- recipe(Price ~ ., data = credit_tr) #' #' impute_rec <- rec %>% #' step_meanimpute(Income, Assets, Debt) #' #' imp_models <- prep(impute_rec, training = credit_tr) #' #' imputed_te <- bake(imp_models, newdata = credit_te, everything()) #' #' credit_te[missing_examples,] #' imputed_te[missing_examples, names(credit_te)] step_meanimpute <- function(recipe, ..., role = NA, trained = FALSE, means = NULL, trim = 0) { add_step( recipe, step_meanimpute_new( terms = check_ellipses(...), role = role, trained = trained, means = means, trim = trim ) ) } step_meanimpute_new <- function(terms = NULL, role = NA, trained = FALSE, means = NULL, trim = NULL) { step( subclass = "meanimpute", terms = terms, role = role, trained = trained, means = means, trim = trim ) } #' @export prep.step_meanimpute <- function(x, training, info = NULL, ...) { col_names <- terms_select(x$terms, info = info) if (any(info$type[info$variable %in% col_names] != "numeric")) stop("All variables for mean imputation should be numeric") means <- vapply(training[, col_names], mean, c(mean = 0), trim = x$trim, na.rm = TRUE) step_meanimpute_new( terms = x$terms, role = x$role, trained = TRUE, means, trim = x$trim ) } #' @export bake.step_meanimpute <- function(object, newdata, ...) { for (i in names(object$means)) { if (any(is.na(newdata[, i]))) newdata[is.na(newdata[, i]), i] <- object$means[i] } as_tibble(newdata) } print.step_meanimpute <- function(x, width = max(20, options()$width - 30), ...) 
{ cat("Mean Imputation for ", sep = "") printer(names(x$means), x$terms, x$trained, width = width) invisible(x) } recipes/vignettes/0000755000177700017770000000000013136242227015250 5ustar herbrandtherbrandtrecipes/vignettes/Simple_Example.Rmd0000644000177700017770000001361613136241021020616 0ustar herbrandtherbrandt--- title: "Basic Recipes" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Basic Recipes} output: knitr:::html_vignette: toc: yes --- ```{r ex_setup, include=FALSE} knitr::opts_chunk$set( message = FALSE, digits = 3, collapse = TRUE, comment = "#>" ) options(digits = 3) ``` This document demonstrates some basic uses of recipes. First, some definitions are required: * __variables__ are the original (raw) data columns in a data frame or tibble. For example, in a traditional formula `Y ~ A + B + A:B`, the variables are `A`, `B`, and `Y`. * __roles__ define how variables will be used in the model. Examples are: `predictor` (independent variables), `response`, and `case weight`. This is meant to be open-ended and extensible. * __terms__ are columns in a design matrix such as `A`, `B`, and `A:B`. These can be other derived entities that are grouped such a a set of principal components or a set of columns that define a basis function for a variable. These are synonymous with features in machine learning. Variables that have `predictor` roles would automatically be main effect terms ## An Example The cell segmentation data will be used. It has 58 predictor columns, a factor variable `Class` (the outcome), and two extra labelling columns. Each of the predictors has a suffix for the optical channel (`"Ch1"`-`"Ch4"`). We will first separate the data into a training and test set then remove unimportant variables: ```{r data} library(recipes) library(caret) data(segmentationData) seg_train <- segmentationData %>% filter(Case == "Train") %>% select(-Case, -Cell) seg_test <- segmentationData %>% filter(Case == "Test") %>% select(-Case, -Cell) ``` The idea is that the preprocessing operations will all be created using the training set and then these steps will be applied to both the training and test set. ## An Initial Recipe For a first recipe, let's plan on centering and scaling the predictors. First, we will create a recipe from the original data and then specify the processing steps. Recipes can be created manually by sequentially adding roles to variables in a data set. If the analysis only required **outcomes** and **predictors**, the easiest way to create the initial recipe is to use the standard formula method: ```{r first_rec} rec_obj <- recipe(Class ~ ., data = seg_train) rec_obj ``` The data contained in the `data` argument need not be the training set; this data is only used to catalog the names of the variables and their types (e.g. numeric, etc.). (Note that the formula method here is used to declare the variables and their roles and nothing else. If you use inline functions (e.g. `log`) it will complain. These types of operations can be added later.) ## Preprocessing Steps From here, preprocessing steps can be added sequentially in one of two ways: ```{r step_code, eval = FALSE} rec_obj <- step_name(rec_obj, arguments) ## or rec_obj <- rec_obj %>% step_name(arguments) ``` `step_center` and the other functions will always return updated recipes. One other important facet of the code is the method for specifying which variables should be used in different steps. 
The manual page `?selections` has more details but [`dplyr`](https://cran.r-project.org/package=dplyr)-like selector functions can be used:

* use basic variable names (e.g. `x1, x2`),
* [`dplyr`](https://cran.r-project.org/package=dplyr) functions for selecting variables: `contains`, `ends_with`, `everything`, `matches`, `num_range`, and `starts_with`,
* functions that subset on the role of the variables that have been specified so far: `all_outcomes`, `all_predictors`, `has_role`, or
* similar functions for the type of data: `all_nominal`, `all_numeric`, and `has_type`.

Note that the functions listed above are the only ones that can be used to select variables inside the steps. Also, minus signs can be used to deselect variables.

For our data, we can add the two operations for all of the predictors:

```{r center_scale}
standardized <- rec_obj %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())
standardized
```

It is important to realize that the _specific_ variables have not been declared yet (in this example). In some preprocessing steps, variables will be added or removed from the current list of possible variables.

If these are the only preprocessing steps for the predictors, we can now estimate the means and standard deviations from the training set. The `prep` function is used with a recipe and a data set:

```{r trained}
trained_rec <- prep(standardized, training = seg_train)
```

Now that the statistics have been estimated, the preprocessing can be applied to the training and test set:

```{r apply}
train_data <- bake(trained_rec, newdata = seg_train)
test_data <- bake(trained_rec, newdata = seg_test)
```

`bake` returns a tibble:

```{r tibbles}
class(test_data)
test_data
```

## Adding Steps

After exploring the data, more preprocessing might be required. Steps can be added to the trained recipe. Suppose that we need to create PCA components but only from the predictors from channel 1 and any predictors that are areas:

```{r pca}
trained_rec <- trained_rec %>%
  step_pca(ends_with("Ch1"), contains("area"), num = 5)
trained_rec
```

Note that only the last step has been estimated; the first two were previously trained and these activities are not duplicated. We can add the PCA estimates using `prep` again:

```{r pca_training}
trained_rec <- prep(trained_rec, training = seg_train)
```

`bake` can be reapplied to get the principal components in addition to the other variables:

```{r pca_bake}
test_data <- bake(trained_rec, newdata = seg_test)
names(test_data)
```

Note that the PCA components have replaced the original variables that were from channel 1 or measured an area aspect of the cells.

There are a number of different steps included in the package:

```{r step_list}
steps <- apropos("^step_")
steps[!grepl("new$", steps)]
```
recipes/vignettes/Ordering.Rmd0000644000177700017770000000275713132246724017472 0ustar herbrandtherbrandt---
title: "Ordering of Steps"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Ordering of Steps}
output:
  knitr:::html_vignette:
    toc: yes
---

In recipes, there are no constraints related to the order in which steps are added to the recipe. However, there are some general suggestions that you should consider:

* If using a Box-Cox transformation, don't center the data first or do any operations that might make the data non-positive. Alternatively, use the Yeo-Johnson transformation so you don't have to worry about this.
* Recipes do not automatically create dummy variables (unlike _most_ formula methods).
  If you want to center, scale, or do any other operations on _all_ of the predictors, run `step_dummy` first so that numeric columns are in the data set instead of factors.
* As noted in the help file for `step_interact`, you should make dummy variables _before_ creating the interactions.
* If you are lumping infrequently occurring categories together with `step_other`, call `step_other` before `step_dummy`.

While your project's needs may vary, here is a suggested order of _potential_ steps that should work for most problems:

1. Impute
1. Individual transformations for skewness and other issues
1. Discretize (if needed and if you have no other choice)
1. Create dummy variables
1. Create interactions
1. Normalization steps (center, scale, range, etc)
1. Multivariate transformation (e.g. PCA, spatial sign, etc)

Again, your mileage may vary for your particular problem.
recipes/vignettes/Selecting_Variables.Rmd0000644000177700017770000000442713135741217021630 0ustar herbrandtherbrandt---
title: "Selecting Variables"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Selecting Variables}
output:
  knitr:::html_vignette:
    toc: yes
---

```{r ex_setup, include=FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  digits = 3,
  collapse = TRUE,
  comment = "#>"
  )
options(digits = 3)
```

When recipe steps are used, there are different approaches that can be used to select which variables or features should be used.

The three main characteristics of variables that can be queried:

* the name of the variable
* the data type (e.g. numeric or nominal)
* the role that was declared by the recipe

The manual pages for `?selections` and `?has_role` have details about the available selection methods.

To illustrate this, the credit data will be used:

```{r credit}
library(recipes)
data("credit_data")
str(credit_data)

rec <- recipe(Status ~ Seniority + Time + Age + Records, data = credit_data)
rec
```

Before any steps are used the information on the original variables is:

```{r var_info_orig}
summary(rec, original = TRUE)
```

We can add a step to compute dummy variables on the non-numeric data after we impute any missing data:

```{r dummy_1}
dummied <- rec %>% step_dummy(all_nominal())
```

This will capture _any_ variables that are either character strings or factors: `Status` and `Records`. However, since `Status` is our outcome, we might want to keep it as a factor so we can _subtract_ that variable out either by name or by role:

```{r dummy_2}
dummied <- rec %>% step_dummy(Records) # or
dummied <- rec %>% step_dummy(all_nominal(), - Status) # or
dummied <- rec %>% step_dummy(all_nominal(), - all_outcomes())
```

Using the last definition:

```{r dummy_3}
dummied <- prep(dummied, training = credit_data)
with_dummy <- bake(dummied, newdata = credit_data)
with_dummy
```

`Status` is unaffected.

One important aspect about selecting variables in steps is that the variable names and types may change as steps are being executed. In the above example, `Records` is a factor variable before the step is executed. Afterwards, `Records` is gone and the binary variable `Records_yes` is in its place. One reason to have general selection routines like `all_predictors` or `contains` is to be able to select variables that have not been created yet.
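For instance (a hypothetical sketch that is not part of the vignette's original code, and is not evaluated here), a later step can select the yet-to-be-made dummy columns by their naming pattern:

```{r dummy_select, eval = FALSE}
rec %>%
  step_dummy(Records) %>%
  step_center(starts_with("Records_"))
```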
recipes/vignettes/Custom_Steps.Rmd0000644000177700017770000002326613135741217020355 0ustar herbrandtherbrandt---
title: "Creating Custom Step Functions"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Custom Steps}
  %\VignetteEncoding{UTF-8}
output:
  knitr:::html_vignette:
    toc: yes
---

```{r ex_setup, include=FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  digits = 3,
  collapse = TRUE,
  comment = "#>"
  )
options(digits = 3)
```

`recipes` contains a number of different steps included in the package:

```{r step_list}
library(recipes)
steps <- apropos("^step_")
steps[!grepl("new$", steps)]
```

You might want to make your own and this page describes how to do that. If you are looking for good examples of existing steps, I would suggest looking at the code for [centering](https://github.com/topepo/recipes/blob/master/R/center.R) or [PCA](https://github.com/topepo/recipes/blob/master/R/pca.R) to start.

# A new step definition

As an example, let's create a step that replaces the value of a variable with its percentile from the training set. The data that I'll use is from the `recipes` package:

```{r initial}
data(biomass)
str(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]
```

To illustrate the transformation with the `carbon` variable, the training set distribution of that variable is shown below with a vertical line for the first value of the test set.

```{r carbon_dist}
library(ggplot2)
theme_set(theme_bw())
ggplot(biomass_tr, aes(x = carbon)) +
  geom_histogram(binwidth = 5, col = "blue", fill = "blue", alpha = .5) +
  geom_vline(xintercept = biomass_te$carbon[1], lty = 2)
```

Based on the training set, `r round(mean(biomass_tr$carbon <= biomass_te$carbon[1])*100, 1)`% of the data are less than a value of `r biomass_te$carbon[1]`. There are some applications where it might be advantageous to represent the predictor values as percentiles rather than their original values.

Our new step will do this computation for any numeric variables of interest. We will call this `step_percentile`. The code below is designed for illustration and not speed or best practices. I've left out a lot of error trapping that we would want in a real implementation.

# Create the initial function.

The user-exposed function `step_percentile` is just a simple wrapper around an internal function called `add_step`. This function takes the same arguments as your function and simply adds it to a new recipe. The `...` signifies the variable selectors that can be used.

```{r initial_def}
step_percentile <- function(recipe,
                            ...,
                            role = NA,
                            trained = FALSE,
                            ref_dist = NULL,
                            approx = FALSE,
                            options = list(probs = (0:100)/100, names = TRUE)) {
  ## capture but do not evaluate the variable selectors with
  ## the `quos` function in `rlang`
  terms <- rlang::quos(...)
  if(length(terms) == 0)
    stop("Please supply at least one variable specification. See ?selections.")
  add_step(
    recipe,
    step_percentile_new(
      terms = terms,
      trained = trained,
      role = role,
      ref_dist = ref_dist,
      approx = approx,
      options = options))
}
```

You should always keep the first four arguments (`recipe` through `trained`) the same as listed above. Some notes:

* the `role` argument is used when you either 1) create new variables and want their role to be pre-set or 2) replace the existing variables with new values. The latter is what we will be doing and using `role = NA` will leave the existing role intact.
* `trained` is set by the package when the estimation step has been run.
  You should default your function definition's argument to `FALSE`.

I've added extra arguments specific to this step. In order to calculate the percentile, the training data for the relevant columns will need to be saved. This data will be saved in the `ref_dist` object. However, this might be problematic if the data set is large. `approx` would be used when you want to save a grid of pre-computed percentiles from the training set and use these to estimate the percentile for a new data point. If `approx = TRUE`, the argument `ref_dist` will contain the grid for each variable.

We will use `stats::quantile` to compute the grid. However, we might also want to have control over the granularity of this grid, so the `options` argument will be used to define how that calculation is done. We could just use the ellipses (aka `...`) so that any options passed to `step_percentile` that are not one of its arguments will then be passed to `stats::quantile`. We recommend making a separate list object with the options and using it inside the function.

# Initialization of new objects

Next, you can utilize the internal function `step` that sets the class of new objects. Using `subclass = "percentile"` will set the class of new objects to `"step_percentile"`.

```{r initialize}
step_percentile_new <- function(terms = NULL, role = NA, trained = FALSE,
                                ref_dist = NULL, approx = NULL, options = NULL) {
  step(
    subclass = "percentile",
    terms = terms,
    role = role,
    trained = trained,
    ref_dist = ref_dist,
    approx = approx,
    options = options
  )
}
```

# Define the estimation procedure

You will need to create a new `prep` method for your step's class. To do this, the method should have these three arguments:

```r
function(x, training, info = NULL)
```

where

* `x` will be the `step_percentile` object
* `training` will be a _tibble_ that has the training set data
* `info` will also be a tibble that has information on the current set of data available. This information is updated as each step is evaluated by its specific `prep` method so it may not have the variables from the original data. The columns in this tibble are `variable` (the variable name), `type` (currently either "numeric" or "nominal"), `role` (defining the variable's role), and `source` (either "original" or "derived" depending on where it originated).

You can define other arguments as well.

The first thing that you might want to do in the `prep` function is to translate the specification listed in the `terms` argument to column names in the current data. There is an internal function called `terms_select` that can be used to obtain this.

```{r prep_1, eval = FALSE}
prep.step_percentile <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(terms = x$terms, info = info)
}
```

Once we have this, we can either save the original data columns or estimate the approximation grid. For the grid, we will use a helper function that enables us to run `do.call` on a list of arguments that include the `options` list.

```{r prep_2}
get_pctl <- function(x, args) {
  args$x <- x
  do.call("quantile", args)
}

prep.step_percentile <- function(x, training, info = NULL, ...) {
  col_names <- terms_select(terms = x$terms, info = info)
  ## You can add error trapping for non-numeric data here and so on.
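  ## For instance (an illustrative sketch, not part of the original code),
  ## the `info` tibble described above could be used for a type check:
  ##   if (any(info$type[info$variable %in% col_names] != "numeric"))
  ##     stop("All variables for step_percentile should be numeric")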
  ## We'll use the names later so
  if(x$options$names == FALSE)
    stop("`names` should be set to TRUE")

  if(!x$approx) {
    x$ref_dist <- training[, col_names]
  } else {
    pctl <- lapply(
      training[, col_names],
      get_pctl,
      args = x$options
    )
    x$ref_dist <- pctl
  }
  ## Always return the updated step
  x
}
```

# Create the `bake` method

Remember that the `prep` function does not _apply_ the step to the data; it only estimates any required values such as `ref_dist`. We will need to create a new method for our `step_percentile` class. The minimum arguments for this are

```r
function(object, newdata, ...)
```

where `object` is the updated step function that has been through the corresponding `prep` code and `newdata` is a tibble of data to be processed.

Here is the code to convert the new data to percentiles. Two initial helper functions handle the two cases (approximation or not). We always return a tibble as the output.

```{r bake}
## Two helper functions
pctl_by_mean <- function(x, ref) mean(ref <= x)

pctl_by_approx <- function(x, ref) {
  ## go from 1 column tibble to vector
  x <- getElement(x, names(x))
  ## get the percentile values from the names (e.g. "10%")
  p_grid <- as.numeric(gsub("%$", "", names(ref)))
  approx(x = ref, y = p_grid, xout = x)$y/100
}

bake.step_percentile <- function(object, newdata, ...) {
  require(tibble)
  ## For illustration (and not speed), we will loop through the affected variables
  ## and do the computations
  vars <- names(object$ref_dist)
  for(i in vars) {
    if(!object$approx) {
      ## We can use `apply` since tibbles do not drop dimensions:
      newdata[, i] <- apply(newdata[, i], 1, pctl_by_mean,
                            ref = object$ref_dist[, i])
    } else
      newdata[, i] <- pctl_by_approx(newdata[, i], object$ref_dist[[i]])
  }
  ## Always convert to tibbles on the way out
  as_tibble(newdata)
}
```

# Running the example

Let's use the example data to make sure that it works:

```{r example}
rec_obj <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)])
rec_obj <- rec_obj %>%
  step_percentile(all_predictors(), approx = TRUE)

rec_obj <- prep(rec_obj, training = biomass_tr)

percentiles <- bake(rec_obj, biomass_te)
percentiles
```

The plot below shows how the original data line up with the percentiles for each split of the data for one of the predictors:

```{r cdf_plot, echo = FALSE}
grid_pct <- rec_obj$steps[[1]]$options$probs
plot_data <- data.frame(
  carbon = c(
    quantile(biomass_tr$carbon, probs = grid_pct),
    biomass_te$carbon
  ),
  percentile = c(grid_pct, percentiles$carbon),
  dataset = rep(
    c("Training", "Testing"),
    c(length(grid_pct), nrow(percentiles))
  )
)

ggplot(plot_data,
       aes(x = carbon, y = percentile, col = dataset)) +
  geom_point(alpha = .4, cex = 2) +
  theme(legend.position = "top")
```
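Finally, most steps also define a `print` method so that recipes display nicely. A hedged sketch of one possibility is below; this chunk is not part of the example above and is not evaluated, and `printer` is an internal helper used by the package's own steps, so relying on it here is an assumption:

```{r print_method, eval = FALSE}
print.step_percentile <- function(x, width = max(20, options()$width - 30), ...) {
  cat("Percentile transformation on ", sep = "")
  ## `printer` is internal to recipes; shows selectors before training and
  ## the resolved column names afterwards
  printer(names(x$ref_dist), x$terms, x$trained, width = width)
  invisible(x)
}
```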
recipes/build/0000755000177700017770000000000013136242227014337 5ustar herbrandtherbrandtrecipes/build/vignette.rds0000644000177700017770000000045413136242227016701 0ustar herbrandtherbrandtrecipes/DESCRIPTION0000644000177700017770000000252513136342173014753 0ustar herbrandtherbrandtPackage: recipes
Title: Preprocessing Tools to Create Design Matrices
Version: 0.1.0
Authors@R: c(
    person("Max", "Kuhn", , "max@rstudio.com", c("aut", "cre")),
    person("Hadley", "Wickham", , "hadley@rstudio.com", "aut"),
    person("RStudio", role = "cph"))
Description: An extensible framework to create and preprocess design
    matrices. Recipes consist of one or more data manipulation and analysis
    "steps". Statistical parameters for the steps can be estimated from an
    initial data set and then applied to other data sets. The resulting
    design matrices can then be used as inputs into statistical or machine
    learning models.
URL: https://github.com/topepo/recipes
BugReports: https://github.com/topepo/recipes/issues
Depends: R (>= 3.2.3), dplyr
Imports: tibble, stats, ipred, dimRed (>= 0.1.0), lubridate, timeDate,
        ddalpha, purrr, rlang (>= 0.1.1), gower, RcppRoll, tidyselect (>=
        0.1.1), magrittr
Suggests: testthat, rpart, kernlab, fastICA, RANN, igraph, knitr, caret,
        ggplot2, rmarkdown
License: GPL-2
VignetteBuilder: knitr
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
NeedsCompilation: no
Packaged: 2017-07-27 01:40:39 UTC; max
Author: Max Kuhn [aut, cre],
  Hadley Wickham [aut],
  RStudio [cph]
Maintainer: Max Kuhn <max@rstudio.com>
Repository: CRAN
Date/Publication: 2017-07-27 10:46:19 UTC
recipes/man/0000755000177700017770000000000013135742247014021 5ustar herbrandtherbrandtrecipes/man/step_ns.Rd0000644000177700017770000000444713135742247015772 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ns.R
\name{step_ns}
\alias{step_ns}
\title{Natural Spline Basis Functions}
\usage{
step_ns(recipe, ..., role = "predictor", trained = FALSE, objects = NULL,
  options = list(df = 2))
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables are
affected by the step. See \code{\link{selections}} for more details.}

\item{role}{For model terms created by this step, what analysis role should
they be assigned? By default, the function assumes that the new columns
created from the original variables will be used as predictors in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{objects}{A list of \code{\link[splines]{ns}} objects created once the
step has been trained.}

\item{options}{A list of options for \code{\link[splines]{ns}} which should
not include \code{x}.}
}
\value{
An updated version of \code{recipe} with the new step added to the
  sequence of existing steps (if any).
}
\description{
\code{step_ns} creates a \emph{specification} of a recipe step that will
  create new columns that are basis expansions of variables using natural
  splines.
}
\details{
\code{step_ns} can create new features from a single variable that enable
  fitting routines to model this variable in a nonlinear manner. The extent
  of the possible nonlinearity is determined by the \code{df} or
  \code{knots} arguments of \code{\link[splines]{ns}}. The original
  variables are removed from the data and new columns are added. The naming
  convention for the new variables is \code{varname_ns_1} and so on.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

with_splines <- rec \%>\%
  step_ns(carbon, hydrogen)
with_splines <- prep(with_splines, training = biomass_tr)

expanded <- bake(with_splines, biomass_te)
expanded
}
\seealso{
\code{\link{step_poly}} \code{\link{recipe}} \code{\link{prep.recipe}}
  \code{\link{bake.recipe}}
}
\concept{
preprocessing basis_expansion
}
\keyword{datagen}
recipes/man/step_classdist.Rd0000644000177700017770000000544013135742247017337 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/classdist.R
\name{step_classdist}
\alias{step_classdist}
\title{Distances to Class Centroids}
\usage{
step_classdist(recipe, ..., class, role = "predictor", trained = FALSE,
  mean_func = mean, cov_func = cov, pool = FALSE, log = TRUE,
  objects = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables are
affected by the step. See \code{\link{selections}} for more details.}

\item{class}{A single character string that specifies a single categorical
variable to be used as the class.}

\item{role}{For model terms created by this step, what analysis role should
they be assigned? By default, the function assumes that resulting distances
will be used as predictors in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{mean_func}{A function to compute the center of the distribution.}

\item{cov_func}{A function that computes the covariance matrix}

\item{pool}{A logical: should the covariance matrix be computed by pooling
the data for all of the classes?}

\item{log}{A logical: should the distances be transformed by the natural
log function?}

\item{objects}{Statistics are stored here once this step has been trained
by \code{\link{prep.recipe}}.}
}
\value{
An updated version of \code{recipe} with the new step added to the
  sequence of existing steps (if any).
}
\description{
\code{step_classdist} creates a \emph{specification} of a recipe step
  that will convert numeric data into Mahalanobis distance measurements to
  the data centroid. This is done for each value of a categorical class
  variable.
}
\details{
\code{step_classdist} will create a new column for every unique value of
  the \code{class} variable. The resulting variables will not replace the
  original values and have the prefix \code{classdist_}.

Note that the default covariance function requires that each class should
  have at least as many rows as variables listed in the \code{terms}
  argument. If \code{pool = TRUE}, there must be at least as many data
  points as variables overall.
}
\examples{
# in case of missing data...
mean2 <- function(x) mean(x, na.rm = TRUE)

rec <- recipe(Species ~ ., data = iris) \%>\%
  step_classdist(all_predictors(), class = "Species",
                 pool = FALSE, mean_func = mean2)

rec_dists <- prep(rec, training = iris)

dists_to_species <- bake(rec_dists, newdata = iris, everything())

## on log scale:
dist_cols <- grep("classdist", names(dists_to_species), value = TRUE)
dists_to_species[, c("Species", dist_cols)]
}
\concept{
preprocessing dimension_reduction
}
\keyword{datagen}
recipes/man/step_sqrt.Rd0000644000177700017770000000312013135742247016330 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sqrt.R
\name{step_sqrt}
\alias{step_sqrt}
\title{Square Root Transformation}
\usage{
step_sqrt(recipe, ..., role = NA, trained = FALSE, columns = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will be
transformed. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{columns}{A character string of variable names that will be
(eventually) populated by the \code{terms} argument.}
}
\value{
An updated version of \code{recipe} with the new step added to the
  sequence of existing steps (if any).
}
\description{
\code{step_sqrt} creates a \emph{specification} of a recipe step that will
  square root transform the data.
}
\examples{
set.seed(313)
examples <- matrix(rnorm(40)^2, ncol = 2)
examples <- as.data.frame(examples)

rec <- recipe(~ V1 + V2, data = examples)

sqrt_trans <- rec \%>\%
  step_sqrt(all_predictors())

sqrt_obj <- prep(sqrt_trans, training = examples)

transformed_te <- bake(sqrt_obj, examples)
plot(examples$V1, transformed_te$V1)
}
\seealso{
\code{\link{step_logit}} \code{\link{step_invlogit}}
  \code{\link{step_log}}  \code{\link{step_hyperbolic}}
  \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
}
\concept{
preprocessing transformation_methods
}
\keyword{datagen}
recipes/man/step_dummy.Rd0000644000177700017770000000612313135742247016500 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dummy.R
\name{step_dummy}
\alias{step_dummy}
\title{Dummy Variables Creation}
\usage{
step_dummy(recipe, ..., role = "predictor", trained = FALSE,
  contrast = options("contrasts"), naming = function(var, lvl) paste(var,
  make.names(lvl), sep = "_"), levels = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will be
used to create the dummy variables. See \code{\link{selections}} for more
details.}

\item{role}{For model terms created by this step, what analysis role should
they be assigned? By default, the function assumes that the binary dummy
variable columns created by the original variables will be used as
predictors in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{contrast}{A specification for which type of contrast should be used
to make a set of full rank dummy variables. See
\code{\link[stats]{contrasts}} for more details. \bold{not currently
working}}

\item{naming}{A function that defines the naming convention for new binary
columns.
See Details below.}

\item{levels}{A list that contains the information needed to create dummy
variables for each variable contained in \code{terms}. This is \code{NULL}
until the step is trained by \code{\link{prep.recipe}}.}
}
\value{
An updated version of \code{recipe} with the new step added to the
  sequence of existing steps (if any).
}
\description{
\code{step_dummy} creates a \emph{specification} of a recipe step that will
  convert nominal data (e.g. character or factors) into one or more numeric
  binary model terms for the levels of the original data.
}
\details{
\code{step_dummy} will create a set of binary dummy variables from a factor
  variable. For example, if a factor column in the data set has levels of
  "red", "green", "blue", the dummy variable step will create two additional
  columns of 0/1 data for two of those three values (and remove the original
  column). By default, the missing dummy variable will correspond to the
  first level of the factor being converted.

The function allows for non-standard naming of the resulting variables. For
  a factor named \code{x}, with levels \code{"a"} and \code{"b"}, the
  default naming convention would be to create a new variable called
  \code{x_b}. Note that if the factor levels are not valid variable names
  (e.g. "some text with spaces"), it will be changed by
  \code{\link[base]{make.names}} to be valid (see the example below). The
  naming format can be changed using the \code{naming} argument.
}
\examples{
data(okc)
okc <- okc[complete.cases(okc),]

rec <- recipe(~ diet + age + height, data = okc)

dummies <- rec \%>\% step_dummy(diet)
dummies <- prep(dummies, training = okc)

dummy_data <- bake(dummies, newdata = okc)

unique(okc$diet)
grep("^diet", names(dummy_data), value = TRUE)
}
\concept{
preprocessing dummy_variables model_specification dummy_variables
variable_encodings
}
\keyword{datagen}
recipes/man/step_isomap.Rd0000644000177700017770000000746313135742247016641 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/isomap.R
\name{step_isomap}
\alias{step_isomap}
\title{Isomap Embedding}
\usage{
step_isomap(recipe, ..., role = "predictor", trained = FALSE, num = 5,
  options = list(knn = 50, .mute = c("message", "output")), res = NULL,
  prefix = "Isomap")
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will be
used to compute the dimensions. See \code{\link{selections}} for more
details.}

\item{role}{For model terms created by this step, what analysis role should
they be assigned? By default, the function assumes that the new dimension
columns created by the original variables will be used as predictors in a
model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{num}{The number of isomap dimensions to retain as new predictors. If
\code{num} is greater than the number of columns or the number of possible
dimensions, a smaller value will be used.}

\item{options}{A list of options to \code{\link[dimRed]{Isomap}}.}

\item{res}{The \code{\link[dimRed]{Isomap}} object is stored here once this
preprocessing step has been trained by \code{\link{prep.recipe}}.}

\item{prefix}{A character string that will be the prefix to the resulting
new variables. See notes below.}
}
\value{
An updated version of \code{recipe} with the new step added to the
  sequence of existing steps (if any).
}
\description{
\code{step_isomap} creates a \emph{specification} of a recipe step that
  will convert numeric data into one or more new dimensions.
}
\details{
Isomap is a form of multidimensional scaling (MDS). MDS methods try to find
  a reduced set of dimensions such that the geometric distances between the
  original data points are preserved. This version of MDS uses nearest
  neighbors in the data as a method for increasing the fidelity of the new
  dimensions to the original data values.

It is advisable to center and scale the variables prior to running Isomap
  (\code{step_center} and \code{step_scale} can be used for this purpose).

The argument \code{num} controls the number of components that will be
  retained (the original variables that are used to derive the components
  are removed from the data). The new components will have names that begin
  with \code{prefix} and a sequence of numbers. The variable names are
  padded with zeros. For example, if \code{num < 10}, their names will be
  \code{Isomap1} - \code{Isomap9}. If \code{num = 101}, the names would be
  \code{Isomap001} - \code{Isomap101}.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

im_trans <- rec \%>\%
  step_YeoJohnson(all_predictors()) \%>\%
  step_center(all_predictors()) \%>\%
  step_scale(all_predictors()) \%>\%
  step_isomap(all_predictors(), options = list(knn = 100), num = 2)

im_estimates <- prep(im_trans, training = biomass_tr)

im_te <- bake(im_estimates, biomass_te)

rng <- extendrange(c(im_te$Isomap1, im_te$Isomap2))
plot(im_te$Isomap1, im_te$Isomap2, xlim = rng, ylim = rng)
}
\references{
De Silva, V., and Tenenbaum, J. B. (2003). Global versus local methods in
  nonlinear dimensionality reduction. \emph{Advances in Neural Information
  Processing Systems}. 721-728.

\pkg{dimRed}, a framework for dimensionality reduction,
  \url{https://github.com/gdkrmr}
}
\seealso{
\code{\link{step_pca}} \code{\link{step_kpca}} \code{\link{step_ica}}
  \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
}
\concept{
preprocessing isomap projection_methods
}
\keyword{datagen}
recipes/man/credit_data.Rd0000644000177700017770000000106713064546045016556 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{credit_data}
\alias{credit_data}
\title{Credit Data}
\source{
\url{https://github.com/gastonstat/CreditScoring},
\url{http://bit.ly/2kkBFrk}
}
\value{
\item{credit_data}{a data frame}
}
\description{
These data are from the website of Dr. Lluís A. Belanche Muñoz by way of a
github repository of Dr. Gaston Sanchez. One data point with a missing
outcome was removed from the original data.
}
\examples{
data(credit_data)
str(credit_data)
}
\keyword{datasets}
recipes/man/step_regex.Rd0000644000177700017770000000426513135742247016462 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/regex.R
\name{step_regex}
\alias{step_regex}
\title{Create Dummy Variables using Regular Expressions}
\usage{
step_regex(recipe, ..., role = "predictor", trained = FALSE,
  pattern = ".", options = list(), result = make.names(pattern),
  input = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{A single selector function to choose which variable will be
searched for the pattern.
The selector should resolve to a single variable. See
\code{\link{selections}} for more details.}

\item{role}{For the variable created by this step, what analysis role
should it be assigned? By default, the function assumes that the new dummy
variable column created from the original variable will be used as a
predictor in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{pattern}{A character string containing a regular expression (or
character string for \code{fixed = TRUE}) to be matched in the given
character vector. Coerced by \code{as.character} to a character string if
possible.}

\item{options}{A list of options to \code{\link{grepl}} that should not
include \code{x} or \code{pattern}.}

\item{result}{A single character value for the name of the new variable.
It should be a valid column name.}

\item{input}{A single character value for the name of the variable being
searched. This is \code{NULL} until computed by
\code{\link{prep.recipe}}.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_regex} creates a \emph{specification} of a recipe step that
will create a new dummy variable based on a regular expression.
}
\examples{
data(covers)

rec <- recipe(~ description, covers) \%>\%
  step_regex(description, pattern = "(rock|stony)", result = "rocks") \%>\%
  step_regex(description, pattern = "ratake families")

rec2 <- prep(rec, training = covers)
rec2

with_dummies <- bake(rec2, newdata = covers)
with_dummies
}
\concept{
preprocessing dummy_variables regular_expressions
}
\keyword{datagen}

recipes/man/step_ratio.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ratio.R
\name{step_ratio}
\alias{step_ratio}
\alias{denom_vars}
\title{Ratio Variable Creation}
\usage{
step_ratio(recipe, ..., role = "predictor", trained = FALSE,
  denom = denom_vars(), naming = function(numer, denom)
  make.names(paste(numer, denom, sep = "_o_")), columns = NULL)

denom_vars(...)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will
be used in the \emph{numerator} of the ratio. When used with
\code{denom_vars}, the dots indicate which variables are used in the
\emph{denominator}. See \code{\link{selections}} for more details.}

\item{role}{For terms created by this step, what analysis role should they
be assigned? By default, the function assumes that the newly created
ratios will be used as predictors in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{denom}{A call to \code{denom_vars} to specify which variables are
used in the denominator. This can include specific variable names
separated by commas or different selectors (see \code{\link{selections}}).
If a column is included in both the numerator and the denominator, it will
be removed from the listing.}

\item{naming}{A function that defines the naming convention for new ratio
columns.}

\item{columns}{The column names used in the ratios. This argument is not
populated until \code{\link{prep.recipe}} is executed.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_ratio} creates a \emph{specification} of a recipe step that
will create one or more ratios out of numeric variables.
}
\examples{
library(recipes)
data(biomass)

biomass$total <- apply(biomass[, 3:7], 1, sum)
biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen +
              sulfur + total, data = biomass_tr)

ratio_recipe <- rec \%>\%
  # all predictors over total
  step_ratio(all_predictors(), denom = denom_vars(total)) \%>\%
  # get rid of the original predictors
  step_rm(all_predictors(), -matches("_o_"))

ratio_recipe <- prep(ratio_recipe, training = biomass_tr)

ratio_data <- bake(ratio_recipe, biomass_te)
ratio_data
}
\concept{
preprocessing
}
\keyword{datagen}

recipes/man/step_range.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/range.R
\name{step_range}
\alias{step_range}
\title{Scaling Numeric Data to a Specific Range}
\usage{
step_range(recipe, ..., role = NA, trained = FALSE, min = 0, max = 1,
  ranges = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will
be scaled. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{min}{A single numeric value for the smallest value in the range.}

\item{max}{A single numeric value for the largest value in the range.}

\item{ranges}{The numeric ranges estimated from the training data that
are used to rescale the variables. This is ignored until the values are
determined by \code{\link{prep.recipe}}; setting it manually will be
ineffective.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_range} creates a \emph{specification} of a recipe step that
will rescale numeric data to lie within a pre-defined range of values.
}
\details{
\code{step_range} estimates the minimum and maximum of each variable from
the data used in the \code{training} argument of \code{prep.recipe}.
\code{bake.recipe} then linearly rescales new data sets using these
estimates, so that values spanning the training range are mapped to the
interval from \code{min} to \code{max}.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

ranged_trans <- rec \%>\%
  step_range(carbon, hydrogen)

ranged_obj <- prep(ranged_trans, training = biomass_tr)

transformed_te <- bake(ranged_obj, biomass_te)

biomass_te[1:10, names(transformed_te)]
transformed_te
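
# Illustrative check (not part of the original example): on the training
# set itself, each rescaled column should span the full [min, max] range
range(bake(ranged_obj, biomass_tr)$carbon)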
}
\concept{
preprocessing normalization_methods
}
\keyword{datagen}

recipes/man/step_rm.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rm.R
\name{step_rm}
\alias{step_rm}
\title{General Variable Filter}
\usage{
step_rm(recipe, ..., role = NA, trained = FALSE, removals = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will
be evaluated by the filtering step. See \code{\link{selections}} for more
details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{removals}{A character string that contains the names of columns
that should be removed. These values are not determined until
\code{\link{prep.recipe}} is called.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_rm} creates a \emph{specification} of a recipe step that will
remove variables based on their name, type, or role.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

library(dplyr)
smaller_set <- rec \%>\%
  step_rm(contains("gen"))

smaller_set <- prep(smaller_set, training = biomass_tr)

filtered_te <- bake(smaller_set, biomass_te)
filtered_te
}
\concept{
preprocessing variable_filters
}
\keyword{datagen}

recipes/man/step_center.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/center.R
\name{step_center}
\alias{step_center}
\title{Centering Numeric Data}
\usage{
step_center(recipe, ..., role = NA, trained = FALSE, means = NULL,
  na.rm = TRUE)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables are
affected by the step. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{means}{A named numeric vector of means. This is \code{NULL} until
computed by \code{\link{prep.recipe}}.}

\item{na.rm}{A logical value indicating whether \code{NA} values should
be removed when averaging.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_center} creates a \emph{specification} of a recipe step that
will normalize numeric data to have a mean of zero.
}
\details{
Centering data means that the average of a variable is subtracted from
the data. \code{step_center} estimates the variable means from the data
used in the \code{training} argument of \code{prep.recipe}.
\code{bake.recipe} then applies the centering to new data sets using
these means.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

center_trans <- rec \%>\%
  step_center(carbon, contains("gen"), -hydrogen)

center_obj <- prep(center_trans, training = biomass_tr)

transformed_te <- bake(center_obj, biomass_te)

biomass_te[1:10, names(transformed_te)]
transformed_te
}
\seealso{
\code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
}
\concept{
preprocessing normalization_methods
}
\keyword{datagen}

recipes/man/step_modeimpute.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/modeimpute.R
\name{step_modeimpute}
\alias{step_modeimpute}
\title{Impute Nominal Data Using the Most Common Value}
\usage{
step_modeimpute(recipe, ..., role = NA, trained = FALSE, modes = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables are
affected by the step. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{modes}{A named character vector of modes. This is \code{NULL} until
computed by \code{\link{prep.recipe}}.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_modeimpute} creates a \emph{specification} of a recipe step
that will substitute missing values of nominal variables by the training
set mode of those variables.
}
\details{
\code{step_modeimpute} estimates the variable modes from the data used in
the \code{training} argument of \code{prep.recipe}. \code{bake.recipe}
then fills in the missing values in new data sets using these modes. If
the training set data has more than one mode, one is selected at random.
}
\examples{
data("credit_data")

## missing data per column
vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))

set.seed(342)
in_training <- sample(1:nrow(credit_data), 2000)

credit_tr <- credit_data[ in_training, ]
credit_te <- credit_data[-in_training, ]
missing_examples <- c(14, 394, 565)

rec <- recipe(Price ~ ., data = credit_tr)

impute_rec <- rec \%>\%
  step_modeimpute(Status, Home, Marital)

imp_models <- prep(impute_rec, training = credit_tr)

imputed_te <- bake(imp_models, newdata = credit_te, everything())

table(credit_te$Home, imputed_te$Home, useNA = "always")
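
# Illustrative check (not from the original docs): after imputation the
# selected columns should contain no missing values
sum(is.na(imputed_te$Home))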
}
\concept{
preprocessing imputation
}
\keyword{datagen}

recipes/man/step_meanimpute.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/meanimpute.R
\name{step_meanimpute}
\alias{step_meanimpute}
\title{Impute Numeric Data Using the Mean}
\usage{
step_meanimpute(recipe, ..., role = NA, trained = FALSE, means = NULL,
  trim = 0)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables are
affected by the step. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{means}{A named numeric vector of means. This is \code{NULL} until
computed by \code{\link{prep.recipe}}.}

\item{trim}{The fraction (0 to 0.5) of observations to be trimmed from
each end of the variables before the mean is computed. Values of trim
outside that range are taken as the nearest endpoint.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_meanimpute} creates a \emph{specification} of a recipe step
that will substitute missing values of numeric variables by the training
set mean of those variables.
}
\details{
\code{step_meanimpute} estimates the variable means from the data used in
the \code{training} argument of \code{prep.recipe}. \code{bake.recipe}
then applies the new values to new data sets using these averages.
}
\examples{
data("credit_data")

## missing data per column
vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))

set.seed(342)
in_training <- sample(1:nrow(credit_data), 2000)

credit_tr <- credit_data[ in_training, ]
credit_te <- credit_data[-in_training, ]
missing_examples <- c(14, 394, 565)

rec <- recipe(Price ~ ., data = credit_tr)

impute_rec <- rec \%>\%
  step_meanimpute(Income, Assets, Debt)

imp_models <- prep(impute_rec, training = credit_tr)

imputed_te <- bake(imp_models, newdata = credit_te, everything())

credit_te[missing_examples,]
imputed_te[missing_examples, names(credit_te)]
}
\concept{
preprocessing imputation
}
\keyword{datagen}

recipes/man/add_step.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/misc.R
\name{add_step}
\alias{add_step}
\title{Add a New Step to Current Recipe}
\usage{
add_step(rec, object)
}
\arguments{
\item{rec}{A \code{\link{recipe}}.}

\item{object}{A step object.}
}
\value{
An updated \code{\link{recipe}} with the new step in the last slot.
}
\description{
\code{add_step} adds a step to the last location in the recipe.
}
\concept{
preprocessing
}
\keyword{datagen}

recipes/man/step_holiday.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/holiday.R
\name{step_holiday}
\alias{step_holiday}
\title{Holiday Feature Generator}
\usage{
step_holiday(recipe, ..., role = "predictor", trained = FALSE,
  holidays = c("LaborDay", "NewYearsDay", "ChristmasDay"),
  columns = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will
be used to create the new variables. The selected variables should have
class \code{Date} or \code{POSIXct}. See \code{\link{selections}} for
more details.}

\item{role}{For model terms created by this step, what analysis role
should they be assigned? By default, the function assumes that the new
variable columns created from the original variables will be used as
predictors in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{holidays}{A character vector that includes at least one holiday
supported by the \code{timeDate} package.
See \code{\link[timeDate]{listHolidays}} for a complete list.}

\item{columns}{A character string of variables that will be used as
inputs. This field is a placeholder and will be populated once
\code{\link{prep.recipe}} is used.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_holiday} creates a \emph{specification} of a recipe step that
will convert date data into one or more binary indicator variables for
common holidays.
}
\details{
Unlike other steps, \code{step_holiday} does \emph{not} remove the
original date variables. \code{\link{step_rm}} can be used for this
purpose.
}
\examples{
library(lubridate)

examples <- data.frame(someday = ymd("2000-12-20") + days(0:40))
holiday_rec <- recipe(~ someday, examples) \%>\%
  step_holiday(all_predictors())

holiday_rec <- prep(holiday_rec, training = examples)
holiday_values <- bake(holiday_rec, newdata = examples)
holiday_values
}
\seealso{
\code{\link{step_date}} \code{\link{step_rm}} \code{\link{recipe}}
\code{\link{prep.recipe}} \code{\link{bake.recipe}}
\code{\link[timeDate]{listHolidays}}
}
\concept{
preprocessing model_specification variable_encodings dates
}
\keyword{datagen}

recipes/man/step_kpca.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/kpca.R
\name{step_kpca}
\alias{step_kpca}
\title{Kernel PCA Signal Extraction}
\usage{
step_kpca(recipe, ..., role = "predictor", trained = FALSE, num = 5,
  res = NULL, options = list(kernel = "rbfdot",
  kpar = list(sigma = 0.2)), prefix = "kPC")
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will
be used to compute the components. See \code{\link{selections}} for more
details.}

\item{role}{For model terms created by this step, what analysis role
should they be assigned? By default, the function assumes that the new
principal component columns created from the original variables will be
used as predictors in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{num}{The number of PCA components to retain as new predictors. If
\code{num} is greater than the number of columns or the number of
possible components, a smaller value will be used.}

\item{res}{An S4 \code{\link[kernlab]{kpca}} object is stored here once
this preprocessing step has been trained by \code{\link{prep.recipe}}.}

\item{options}{A list of options to \code{\link[kernlab]{kpca}}. Defaults
are set for the arguments \code{kernel} and \code{kpar} but others can be
passed in. \bold{Note} that the arguments \code{x} and \code{features}
should not be passed here (or at all).}

\item{prefix}{A character string that will be the prefix to the resulting
new variables. See notes below.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_kpca} creates a \emph{specification} of a recipe step that
will convert numeric data into one or more principal components using a
kernel basis expansion.
}
\details{
Kernel principal component analysis (kPCA) is an extension of PCA that
conducts the calculations in a broader dimensionality defined by a kernel
function.
For example, if a quadratic kernel function were used, each variable
would be represented by its original values as well as its square. This
nonlinear mapping is used during the PCA analysis and can potentially
help find better representations of the original data.

As with ordinary PCA, it is important to standardize the variables prior
to running PCA (\code{step_center} and \code{step_scale} can be used for
this purpose).

When performing kPCA, the kernel function (and any important kernel
parameters) must be chosen. The \pkg{kernlab} package is used and the
reference below discusses the types of kernels available and their
parameter(s). These specifications can be made in the \code{kernel} and
\code{kpar} slots of the \code{options} argument to \code{step_kpca}.

The argument \code{num} controls the number of components that will be
retained (the original variables that are used to derive the components
are removed from the data). The new components will have names that begin
with \code{prefix} and a sequence of numbers. The variable names are
padded with zeros. For example, if \code{num < 10}, their names will be
\code{kPC1} - \code{kPC9}. If \code{num = 101}, the names would be
\code{kPC001} - \code{kPC101}.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

kpca_trans <- rec \%>\%
  step_YeoJohnson(all_predictors()) \%>\%
  step_center(all_predictors()) \%>\%
  step_scale(all_predictors()) \%>\%
  step_kpca(all_predictors())

kpca_estimates <- prep(kpca_trans, training = biomass_tr)

kpca_te <- bake(kpca_estimates, biomass_te)

rng <- extendrange(c(kpca_te$kPC1, kpca_te$kPC2))
plot(kpca_te$kPC1, kpca_te$kPC2, xlim = rng, ylim = rng)
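
# Illustrative (assumes the default num = 5 retains five components; not
# part of the original example):
grep("^kPC", names(kpca_te), value = TRUE)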
}
\references{
Scholkopf, B., Smola, A., and Muller, K. (1997). Kernel principal
component analysis. \emph{Lecture Notes in Computer Science}, 1327,
583-588.

Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004). kernlab -
An S4 package for kernel methods in R. \emph{Journal of Statistical
Software}, 11(1), 1-20.
}
\seealso{
\code{\link{step_pca}} \code{\link{step_ica}} \code{\link{step_isomap}}
\code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
}
\concept{
preprocessing pca projection_methods kernel_methods
}
\keyword{datagen}

recipes/man/step_ica.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ica.R
\name{step_ica}
\alias{step_ica}
\title{ICA Signal Extraction}
\usage{
step_ica(recipe, ..., role = "predictor", trained = FALSE, num = 5,
  options = list(), res = NULL, prefix = "IC")
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will
be used to compute the components. See \code{\link{selections}} for more
details.}

\item{role}{For model terms created by this step, what analysis role
should they be assigned? By default, the function assumes that the new
independent component columns created from the original variables will
be used as predictors in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{num}{The number of ICA components to retain as new predictors. If
\code{num} is greater than the number of columns or the number of
possible components, a smaller value will be used.}

\item{options}{A list of options to \code{\link[fastICA]{fastICA}}. No
defaults are set here. \bold{Note} that the arguments \code{X} and
\code{n.comp} should not be passed here.}

\item{res}{The \code{\link[fastICA]{fastICA}} object is stored here once
this preprocessing step has been trained by \code{\link{prep.recipe}}.}

\item{prefix}{A character string that will be the prefix to the resulting
new variables. See notes below.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_ica} creates a \emph{specification} of a recipe step that will
convert numeric data into one or more independent components.
}
\details{
Independent component analysis (ICA) is a transformation of a group of
variables that produces a new set of artificial features or components.
ICA assumes that the variables are mixtures of a set of distinct,
non-Gaussian signals and attempts to transform the data to isolate these
signals. Unlike PCA, whose components are merely uncorrelated, the
components of ICA are designed to be statistically independent of one
another. This means that they can be used to combat large inter-variable
correlations in a data set. Also like PCA, it is advisable to center and
scale the variables prior to running ICA.

This package produces components using the "FastICA" methodology (see
reference below).

The argument \code{num} controls the number of components that will be
retained (the original variables that are used to derive the components
are removed from the data). The new components will have names that begin
with \code{prefix} and a sequence of numbers. The variable names are
padded with zeros. For example, if \code{num < 10}, their names will be
\code{IC1} - \code{IC9}. If \code{num = 101}, the names would be
\code{IC001} - \code{IC101}.
}
\examples{
# from fastICA::fastICA
set.seed(131)
S <- matrix(runif(400), 200, 2)
A <- matrix(c(1, 1, -1, 3), 2, 2, byrow = TRUE)
X <- as.data.frame(S \%*\% A)

tr <- X[1:100, ]
te <- X[101:200, ]

rec <- recipe( ~ ., data = tr)

# note: each step builds on the previous result
ica_trans <- step_center(rec, V1, V2)
ica_trans <- step_scale(ica_trans, V1, V2)
ica_trans <- step_ica(ica_trans, V1, V2, num = 2)

ica_estimates <- prep(ica_trans, training = tr)
ica_data <- bake(ica_estimates, te)

plot(te$V1, te$V2)
plot(ica_data$IC1, ica_data$IC2)
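
# Illustrative check (not part of the original example): the recovered
# components should be roughly uncorrelated
round(cor(ica_data$IC1, ica_data$IC2), 2)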
}
\references{
Hyvarinen, A., and Oja, E. (2000). Independent component analysis:
algorithms and applications. \emph{Neural Networks}, 13(4-5), 411-430.
}
\seealso{
\code{\link{step_pca}} \code{\link{step_kpca}} \code{\link{step_isomap}}
\code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
}
\concept{
preprocessing ica projection_methods
}
\keyword{datagen}

recipes/man/step_lincomb.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lincombo.R
\name{step_lincomb}
\alias{step_lincomb}
\title{Linear Combination Filter}
\usage{
step_lincomb(recipe, ..., role = NA, trained = FALSE, max_steps = 5,
  removals = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables are
affected by the step. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{max_steps}{The maximum number of times that the filtering algorithm
is applied (see Details).}

\item{removals}{A character string that contains the names of columns
that should be removed. These values are not determined until
\code{\link{prep.recipe}} is called.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_lincomb} creates a \emph{specification} of a recipe step that
will potentially remove numeric variables that have exact linear
combinations between them.
}
\details{
This step finds exact linear combinations between two or more variables
and recommends which column(s) should be removed to resolve the issue.
This algorithm may need to be applied multiple times (as defined by
\code{max_steps}).
}
\examples{
data(biomass)

biomass$new_1 <- with(biomass,
                      .1*carbon - .2*hydrogen + .6*sulfur)
biomass$new_2 <- with(biomass,
                      .5*carbon - .2*oxygen + .6*nitrogen)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen +
              sulfur + new_1 + new_2, data = biomass_tr)

lincomb_filter <- rec \%>\%
  step_lincomb(all_predictors())

prep(lincomb_filter, training = biomass_tr)
}
\seealso{
\code{\link{step_nzv}} \code{\link{step_corr}} \code{\link{recipe}}
\code{\link{prep.recipe}} \code{\link{bake.recipe}}
}
\author{
Max Kuhn, Kirk Mettler, and Jed Wing
}
\concept{
preprocessing variable_filters
}
\keyword{datagen}

recipes/man/recipes-internal.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/YeoJohnson.R, R/misc.R
\name{yj_trans}
\alias{yj_trans}
\alias{estimate_yj}
\alias{prepare}
\title{Internal Functions}
\usage{
yj_trans(x, lambda, eps = 0.001)

estimate_yj(dat, limits = c(-5, 5), nunique = 5)

prepare(x, ...)
}
\description{
These are not to be used directly by the users.
}
\keyword{internal}

recipes/man/step_knnimpute.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/knn_imp.R
\name{step_knnimpute}
\alias{step_knnimpute}
\title{Imputation via K-Nearest Neighbors}
\usage{
step_knnimpute(recipe, ..., role = NA, trained = FALSE, K = 5,
  impute_with = imp_vars(all_predictors()), ref_data = NULL,
  columns = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose variables. For
\code{step_knnimpute}, this indicates the variables to be imputed. When
used with \code{imp_vars}, the dots indicate which variables are used to
predict the missing data in each variable. See \code{\link{selections}}
for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{K}{The number of neighbors.}

\item{impute_with}{A call to \code{imp_vars} to specify which variables
are used to impute the selected variables; this can include specific
variable names separated by commas or different selectors (see
\code{\link{selections}}).
If a column is included in both lists to be imputed and to be an
imputation predictor, it will be removed from the latter and not used to
impute itself.}

\item{ref_data}{A tibble of data that will reflect the data preprocessing
done up to the point of this imputation step. This is \code{NULL} until
the step is trained by \code{\link{prep.recipe}}.}

\item{columns}{The column names that will be imputed and used for
imputation. This is \code{NULL} until the step is trained by
\code{\link{prep.recipe}}.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_knnimpute} creates a \emph{specification} of a recipe step
that will impute missing data using nearest neighbors.
}
\details{
The step uses the training set to impute any other data sets. The only
distance function available is Gower's distance, which can be used for
mixtures of nominal and numeric data. Once the nearest neighbors are
determined, the mode is used to predict nominal variables and the mean is
used for numeric data.

Note that if a variable that is to be imputed is also in
\code{impute_with}, this variable will be ignored.

It is possible that missing values will still occur after imputation if a
large majority (or all) of the imputing variables are also missing.
}
\examples{
library(recipes)
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training", ]
biomass_te <- biomass[biomass$dataset == "Testing", ]
biomass_te_whole <- biomass_te

# induce some missing data at random
set.seed(9039)
carb_missing <- sample(1:nrow(biomass_te), 3)
nitro_missing <- sample(1:nrow(biomass_te), 3)

biomass_te$carbon[carb_missing] <- NA
biomass_te$nitrogen[nitro_missing] <- NA

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

ratio_recipe <- rec \%>\%
  step_knnimpute(all_predictors(), K = 3)
ratio_recipe2 <- prep(ratio_recipe, training = biomass_tr)
imputed <- bake(ratio_recipe2, biomass_te)

# how well did it work?
summary(biomass_te_whole$carbon)
cbind(before = biomass_te_whole$carbon[carb_missing],
      after = imputed$carbon[carb_missing])

summary(biomass_te_whole$nitrogen)
cbind(before = biomass_te_whole$nitrogen[nitro_missing],
      after = imputed$nitrogen[nitro_missing])
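
# Illustrative summary (not in the original docs): mean absolute error of
# the imputed carbon values
mean(abs(imputed$carbon[carb_missing] -
         biomass_te_whole$carbon[carb_missing]))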
}
\references{
Gower, J. C. (1971) "A general coefficient of similarity and some of its
properties," \emph{Biometrics}, 857-871.
}
\concept{
preprocessing imputation
}
\keyword{datagen}

recipes/man/step_date.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/date.R
\name{step_date}
\alias{step_date}
\title{Date Feature Generator}
\usage{
step_date(recipe, ..., role = "predictor", trained = FALSE,
  features = c("dow", "month", "year"), abbr = TRUE, label = TRUE,
  ordinal = FALSE, columns = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will
be used to create the new variables. The selected variables should have
class \code{Date} or \code{POSIXct}. See \code{\link{selections}} for
more details.}

\item{role}{For model terms created by this step, what analysis role
should they be assigned? By default, the function assumes that the new
variable columns created from the original variables will be used as
predictors in a model.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{features}{A character vector that includes at least one of the
following values: \code{dow} (day of week), \code{doy} (day of year),
\code{week}, \code{month}, \code{decimal} (decimal date, e.g. 2002.197),
\code{quarter}, \code{semester}, \code{year}.}

\item{abbr}{A logical. Only available for features \code{month} or
\code{dow}. \code{FALSE} will display the day of the week as an ordered
factor of character strings, such as "Sunday". \code{TRUE} will display
an abbreviated version of the label, such as "Sun". \code{abbr} is
disregarded if \code{label = FALSE}.}

\item{label}{A logical. Only available for features \code{month} or
\code{dow}. \code{TRUE} will display the day of the week as an ordered
factor of character strings, such as "Sunday." \code{FALSE} will display
the day of the week as a number.}

\item{ordinal}{A logical: should factors be ordered? Only available for
features \code{month} or \code{dow}.}

\item{columns}{A character string of variables that will be used as
inputs. This field is a placeholder and will be populated once
\code{\link{prep.recipe}} is used.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_date} creates a \emph{specification} of a recipe step that
will convert date data into one or more factor or numeric variables.
}
\details{
Unlike other steps, \code{step_date} does \emph{not} remove the original
date variables. \code{\link{step_rm}} can be used for this purpose.
}
\examples{
library(lubridate)

examples <- data.frame(Dan = ymd("2002-03-04") + days(1:10),
                       Stefan = ymd("2006-01-13") + days(1:10))
date_rec <- recipe(~ Dan + Stefan, examples) \%>\%
  step_date(all_predictors())

date_rec <- prep(date_rec, training = examples)

date_values <- bake(date_rec, newdata = examples)
date_values
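
# Illustrative variation (not part of the original example; argument
# names are from the list above): request different date features
date_rec2 <- recipe(~ Dan + Stefan, examples) \%>\%
  step_date(all_predictors(), features = c("doy", "week"))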
}
\seealso{
\code{\link{step_holiday}} \code{\link{step_rm}} \code{\link{recipe}}
\code{\link{prep.recipe}} \code{\link{bake.recipe}}
}
\concept{
preprocessing model_specification variable_encodings dates
}
\keyword{datagen}

recipes/man/recipe.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/recipe.R
\name{recipe}
\alias{recipe}
\alias{recipe.default}
\alias{recipe.data.frame}
\alias{recipe.formula}
\alias{recipe.matrix}
\title{Create a Recipe for Preprocessing Data}
\usage{
recipe(x, ...)

\method{recipe}{default}(x, ...)

\method{recipe}{data.frame}(x, formula = NULL, ..., vars = NULL,
  roles = NULL)

\method{recipe}{formula}(formula, data, ...)

\method{recipe}{matrix}(x, ...)
}
\arguments{
\item{x, data}{A data frame or tibble of the \emph{template} data set
(see below).}

\item{...}{Further arguments passed to or from other methods (not
currently used).}

\item{formula}{A model formula. No in-line functions should be used here
(e.g. \code{log(x)}, \code{x:y}, etc.). These types of transformations
should be enacted using \code{step} functions in this package. Dots are
allowed as are simple multivariate outcome terms (i.e. no need for
\code{cbind}; see Examples).}

\item{vars}{A character vector of column names corresponding to variables
that will be used in any context (see below).}

\item{roles}{A character vector (the same length as \code{vars}) that
describes a single role that the variable will take. This value could be
anything but common roles are \code{"outcome"}, \code{"predictor"},
\code{"case_weight"}, or \code{"ID"}.}
}
\value{
An object of class \code{recipe} with sub-objects:
\item{var_info}{A tibble containing information about the original data
set columns.}
\item{term_info}{A tibble that contains the current set of terms in the
data set. This initially defaults to the same data contained in
\code{var_info}.}
\item{steps}{A list of \code{step} objects that define the sequence of
preprocessing steps that will be applied to data. The default value is
\code{NULL}.}
\item{template}{A tibble of the data. This is initialized to be the same
as the data given in the \code{data} argument but can be different after
the recipe is trained.}
}
\description{
A recipe is a description of what steps should be applied to a data set
in order to get it ready for data analysis.
}
\details{
Recipes are alternative methods for creating design matrices and for
preprocessing data. Variables in recipes can have any type of \emph{role}
in subsequent analyses such as: outcome, predictor, case weights,
stratification variables, etc.

\code{recipe} objects can be created in several ways. If the analysis
only contains outcomes and predictors, the simplest way to create one is
to use a simple formula (e.g. \code{y ~ x1 + x2}) that does not contain
inline functions such as \code{log(x3)}. An example is given below.

Alternatively, a \code{recipe} object can be created by first specifying
which variables in a data set should be used and then sequentially
defining their roles (see the last example).

Steps to the recipe can be added sequentially. Steps can include common
operations like logging a variable, creating dummy variables or
interactions, and so on. More computationally complex actions such as
dimension reduction or imputation can also be specified.

Once a recipe has been defined, the \code{\link{prep}} function can be
used to estimate the quantities required by the steps from a data set
(a.k.a. the training data). \code{\link{prep}} returns another recipe.

To apply the recipe to a data set, the \code{\link{bake}} function is
used in the same manner as \code{predict} would be for models. This
applies the steps to any data set.

Note that the data passed to \code{recipe} need not be the complete data
that will be used to train the steps (by \code{\link{prep}}). The recipe
only needs to know the names and types of data that will be used. For
large data sets, \code{head} could be used to pass the recipe a smaller
data set to save time and memory.
}
\examples{
###############################################
# simple example:
data(biomass)

# split data
biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

# When only predictors and outcomes, a simplified formula can be used.
rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

# Now add preprocessing steps to the recipe.
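# But first, an illustrative aside (not in the original example): the
# summary() method lists each variable and its current role before any
# steps are trained.
summary(rec)
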
sp_signed <- rec \%>\%
  step_center(all_predictors()) \%>\%
  step_scale(all_predictors()) \%>\%
  step_spatialsign(all_predictors())
sp_signed

# now estimate required parameters
sp_signed_trained <- prep(sp_signed, training = biomass_tr)
sp_signed_trained

# apply the preprocessing to a data set
test_set_values <- bake(sp_signed_trained, newdata = biomass_te)

# or use pipes for the entire workflow:
rec <- biomass_tr \%>\%
  recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur) \%>\%
  step_center(all_predictors()) \%>\%
  step_scale(all_predictors()) \%>\%
  step_spatialsign(all_predictors())

###############################################
# multivariate example

# no need for `cbind(carbon, hydrogen)` for left-hand side
multi_y <- recipe(carbon + hydrogen ~ oxygen + nitrogen + sulfur,
                  data = biomass_tr)
multi_y <- multi_y \%>\%
  step_center(all_outcomes()) \%>\%
  step_scale(all_predictors())

multi_y_trained <- prep(multi_y, training = biomass_tr)

results <- bake(multi_y_trained, biomass_te)

###############################################
# Creating a recipe manually with different roles

rec <- recipe(biomass_tr) \%>\%
  add_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
           new_role = "predictor") \%>\%
  add_role(HHV, new_role = "outcome") \%>\%
  add_role(sample, new_role = "id variable") \%>\%
  add_role(dataset, new_role = "splitting indicator")
rec
}
\author{
Max Kuhn
}
\concept{
preprocessing model_specification
}
\keyword{datagen}

recipes/man/step_intercept.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/intercept.R
\name{step_intercept}
\alias{step_intercept}
\title{Add intercept (or constant) column}
\usage{
step_intercept(recipe, ..., role = "predictor", trained = FALSE,
  name = "intercept", value = 1)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{Argument ignored; included for consistency with other step
specification functions.}

\item{role}{Defaults to "predictor".}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated. Again included for consistency.}

\item{name}{A character string for the name of the newly added column.}

\item{value}{A numeric constant to fill the intercept column. Defaults
to 1.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_intercept} creates a \emph{specification} of a recipe step
that will add an intercept or constant term in the first column of a
data matrix. \code{step_intercept} defaults to the \emph{predictor} role
so that it is, by default, applied when \code{bake} is called. Be careful
to avoid unintentional transformations when calling steps with
\code{all_predictors}.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

rec_trans <- recipe(HHV ~ ., data = biomass_tr[, -(1:2)]) \%>\%
  step_intercept(value = 2)

rec_obj <- prep(rec_trans, training = biomass_tr)

with_intercept <- bake(rec_obj, biomass_te)
with_intercept
}
\seealso{
\code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}}
}

recipes/man/step_bin2factor.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bin2factor.R
\name{step_bin2factor}
\alias{step_bin2factor}
\title{Create a Factor from a Dummy Variable}
\usage{
step_bin2factor(recipe, ..., role = NA, trained = FALSE,
  levels = c("yes", "no"), columns = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{Selector functions that choose which variables will be
converted. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{levels}{A length-2 character vector that indicates the factor
levels for the 1's (in the first position) and the zeros (second).}

\item{columns}{A vector with the selected variable names. This is
\code{NULL} until computed by \code{\link{prep.recipe}}.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_bin2factor} creates a \emph{specification} of a recipe step
that will create a two-level factor from a single dummy variable.
}
\details{
This operation may be useful for situations where a binary piece of
information may need to be represented as categorical instead of numeric.
For example, naive Bayes models would do better to have factor predictors
so that the binomial distribution is modeled instead of a Gaussian
probability density of numeric binary data. Note that the input data are
only verified to be numeric; the number of distinct values is not
checked.
}
\examples{
data(covers)

rec <- recipe(~ description, covers) \%>\%
  step_regex(description, pattern = "(rock|stony)", result = "rocks") \%>\%
  step_regex(description, pattern = "(rock|stony)", result = "more_rocks") \%>\%
  step_bin2factor(rocks)

rec <- prep(rec, training = covers)
results <- bake(rec, newdata = covers)

table(results$rocks, results$more_rocks)
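
# Illustrative check (not in the original example): the new column is a
# two-level factor using the default levels c("yes", "no")
levels(results$rocks)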
}
\concept{
preprocessing dummy_variables factors
}
\keyword{datagen}

recipes/man/step_shuffle.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/shuffle.R
\name{step_shuffle}
\alias{step_shuffle}
\title{Shuffle Variables}
\usage{
step_shuffle(recipe, ..., role = NA, trained = FALSE, columns = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables will
be permuted. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{columns}{A character string that contains the names of columns
that should be shuffled. These values are not determined until
\code{\link{prep.recipe}} is called.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_shuffle} creates a \emph{specification} of a recipe step that
will randomly change the order of rows for selected variables.
}
\examples{
integers <- data.frame(A = 1:12, B = 13:24, C = 25:36)

library(dplyr)
rec <- recipe(~ A + B + C, data = integers) \%>\%
  step_shuffle(A, B)

rand_set <- prep(rec, training = integers)

set.seed(5377)
bake(rand_set, integers)
}
\concept{
preprocessing randomization permutation
}
\keyword{datagen}

recipes/man/add_role.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/roles.R
\name{add_role}
\alias{add_role}
\title{Manually Add Roles}
\usage{
add_role(recipe, ..., new_role = "predictor")
}
\arguments{
\item{recipe}{An existing \code{\link{recipe}}.}

\item{...}{One or more selector functions to choose which variables are
being assigned a role. See \code{\link{selections}} for more details.}

\item{new_role}{A character string for a single role.}
}
\value{
An updated recipe object.
}
\description{
\code{add_role} can add a role definition to an existing variable in the
recipe.
}
\details{
If a variable is selected that currently has a role, the role is changed
and a warning is issued.
}
\examples{
data(biomass)

# Create the recipe manually
rec <- recipe(x = biomass)
rec
summary(rec)

rec <- rec \%>\%
  add_role(carbon, contains("gen"), sulfur, new_role = "predictor") \%>\%
  add_role(sample, new_role = "id variable") \%>\%
  add_role(dataset, new_role = "splitting variable") \%>\%
  add_role(HHV, new_role = "outcome")
rec
}
\concept{
preprocessing model_specification
}
\keyword{datagen}

recipes/man/step_invlogit.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/invlogit.R
\name{step_invlogit}
\alias{step_invlogit}
\title{Inverse Logit Transformation}
\usage{
step_invlogit(recipe, ..., role = NA, trained = FALSE, columns = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables are
affected by the step. See \code{\link{selections}} for more details.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{columns}{A character string of variable names that will be
(eventually) populated by the \code{terms} argument.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_invlogit} creates a \emph{specification} of a recipe step
that will transform the data from real values to be between zero and
one.
}
\details{
The inverse logit transformation takes values on the real line and
translates them to be between zero and one using the function
\code{f(x) = 1/(1+exp(-x))}.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

ilogit_trans <- rec \%>\%
  step_center(carbon, hydrogen) \%>\%
  step_scale(carbon, hydrogen) \%>\%
  step_invlogit(carbon, hydrogen)

ilogit_obj <- prep(ilogit_trans, training = biomass_tr)

transformed_te <- bake(ilogit_obj, biomass_te)
plot(biomass_te$carbon, transformed_te$carbon)
}
\seealso{
\code{\link{step_logit}} \code{\link{step_log}} \code{\link{step_sqrt}}
\code{\link{step_hyperbolic}} \code{\link{recipe}}
\code{\link{prep.recipe}} \code{\link{bake.recipe}}
}
\concept{
preprocessing transformation_methods
}
\keyword{datagen}

recipes/man/discretize.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/discretize.R
\name{discretize}
\alias{discretize}
\alias{discretize.default}
\alias{discretize.numeric}
\alias{predict.discretize}
\alias{step_discretize}
\title{Discretize Numeric Variables}
\usage{
discretize(x, ...)

\method{discretize}{default}(x, ...)

\method{discretize}{numeric}(x, cuts = 4, labels = NULL,
  prefix = "bin", keep_na = TRUE, infs = TRUE, min_unique = 10, ...)

\method{predict}{discretize}(object, newdata, ...)

step_discretize(recipe, ..., role = NA, trained = FALSE,
  objects = NULL, options = list())
}
\arguments{
\item{x}{A numeric vector.}

\item{...}{For \code{discretize}: options to pass to
\code{\link[stats]{quantile}} that should not include \code{x} or
\code{probs}. For \code{step_discretize}, the dots specify one or more
selector functions to choose which variables are affected by the step.
See \code{\link{selections}} for more details.}

\item{cuts}{An integer defining how many cuts to make of the data.}

\item{labels}{A character vector defining the factor levels that will be
in the new factor (from smallest to largest). This should have length
\code{cuts+1} and should not include a level for missing (see
\code{keep_na} below).}

\item{prefix}{A single character string to be used as a prefix for the
factor levels (e.g. \code{bin1}, \code{bin2}, ...). If the string is not
a valid R name, it is coerced to one.}

\item{keep_na}{A logical for whether a factor level should be created to
identify missing values in \code{x}.}

\item{infs}{A logical indicating whether the smallest and largest cut
point should be infinite.}

\item{min_unique}{An integer defining a minimum sample size for the
binning: if (the number of unique values)\code{/(cuts+1)} is less than
\code{min_unique}, no discretization takes place.}

\item{object}{An object of class \code{discretize}.}

\item{newdata}{A new numeric object to be binned.}

\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{role}{Not used by this step since no new variables are created.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{objects}{The \code{\link{discretize}} objects are stored here once
the recipe has been trained by \code{\link{prep.recipe}}.}

\item{options}{A list of options to \code{\link{discretize}}. A default
is set for the argument \code{x}. Note that using the options
\code{prefix} and \code{labels} when more than one variable is being
transformed might be problematic as all variables inherit those values.}
}
\value{
\code{discretize} returns an object of class \code{discretize}.
\code{predict.discretize} returns a factor vector.
}
\description{
\code{discretize} converts a numeric vector into a factor with bins
having approximately the same number of data points (based on a training
set).
}
\details{
\code{discretize} estimates the cut points from \code{x} using
percentiles. For example, if \code{cuts = 4}, the function estimates the
three quartiles of \code{x} and uses these as the cut points. If
\code{cuts = 2}, the bins are defined as being above or below the median
of \code{x}.

The \code{predict} method can then be used to turn numeric vectors into
factor vectors.

If \code{keep_na = TRUE}, a suffix of "_missing" is used as a factor
level (see the examples below).

If \code{infs = FALSE} and a new value is greater than the largest value
of \code{x}, a missing value will result.
}
\examples{
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

median(biomass_tr$carbon)
discretize(biomass_tr$carbon, cuts = 2)
discretize(biomass_tr$carbon, cuts = 2, infs = FALSE)
discretize(biomass_tr$carbon, cuts = 2, infs = FALSE, keep_na = FALSE)
discretize(biomass_tr$carbon, cuts = 2, prefix = "maybe a bad idea to bin")

carbon_binned <- discretize(biomass_tr$carbon)
table(predict(carbon_binned, biomass_tr$carbon))

carbon_no_infs <- discretize(biomass_tr$carbon, infs = FALSE)
predict(carbon_no_infs, c(50, 100))

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)
rec <- rec \%>\% step_discretize(carbon, hydrogen)
rec <- prep(rec, biomass_tr)
binned_te <- bake(rec, biomass_te)
table(binned_te$carbon)
}
\concept{
preprocessing discretization factors
}
\keyword{datagen}

recipes/man/step_window.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/window.R
\name{step_window}
\alias{step_window}
\title{Moving Window Functions}
\usage{
step_window(recipe, ..., role = NA, trained = FALSE, size = 3,
  na.rm = TRUE, statistic = "mean", columns = NULL, names = NULL)
}
\arguments{
\item{recipe}{A recipe object. The step will be added to the sequence of
operations for this recipe.}

\item{...}{One or more selector functions to choose which variables are
affected by the step. See \code{\link{selections}} for more details.}

\item{role}{For model terms created by this step, what analysis role
should they be assigned? If \code{names} is left as \code{NULL}, the
rolling statistics replace the original columns and the roles are left
unchanged. If \code{names} is set, those new columns will have a role of
\code{NULL} unless this argument has a value.}

\item{trained}{A logical to indicate if the quantities for preprocessing
have been estimated.}

\item{size}{An odd integer \code{>= 3} for the window size.}

\item{na.rm}{A logical for whether missing values should be removed from
the calculations within each window.}

\item{statistic}{A character string for the type of statistic that
should be calculated for each moving window. Possible values are:
\code{'max'}, \code{'mean'}, \code{'median'}, \code{'min'},
\code{'prod'}, \code{'sd'}, \code{'sum'}, \code{'var'}.}

\item{columns}{A character string that contains the names of columns
that should be processed. These values are not determined until
\code{\link{prep.recipe}} is called.}

\item{names}{An optional character vector that is the same length as the
number of terms selected by \code{terms}.
If you are not sure what columns will be selected, use the \code{summary}
function (see the example below). These will be the names of the new
columns created by the step.}
}
\value{
An updated version of \code{recipe} with the new step added to the
sequence of existing steps (if any).
}
\description{
\code{step_window} creates a \emph{specification} of a recipe step that
will create new columns that are the results of functions that compute
statistics across moving windows.
}
\details{
The calculations use a somewhat atypical method for handling the
beginning and end parts of the rolling statistics. The process starts
with the center-justified window calculations, and the beginning and
ending parts of the rolling values are determined using the first and
last rolling values, respectively. For example, if a column \code{x}
with 12 values is smoothed with a 5-point moving median, the first three
smoothed values are estimated by \code{median(x[1:5])} and the fourth
uses \code{median(x[2:6])}.
}
\examples{
library(recipes)
library(dplyr)
library(rlang)
library(ggplot2, quietly = TRUE)

set.seed(5522)
sim_dat <- data.frame(x1 = (20:100) / 10)
n <- nrow(sim_dat)
sim_dat$y1 <- sin(sim_dat$x1) + rnorm(n, sd = 0.1)
sim_dat$y2 <- cos(sim_dat$x1) + rnorm(n, sd = 0.1)
sim_dat$x2 <- runif(n)
sim_dat$x3 <- rnorm(n)

rec <- recipe(y1 + y2 ~ x1 + x2 + x3, data = sim_dat) \%>\%
  step_window(starts_with("y"), size = 7, statistic = "median",
              names = paste0("med_7pt_", 1:2),
              role = "outcome") \%>\%
  step_window(starts_with("y"),
              names = paste0("mean_3pt_", 1:2),
              role = "outcome")
rec <- prep(rec, training = sim_dat)

# If you aren't sure how to set the names, see which variables are
# selected and the order that they are selected:
terms_select(info = summary(rec), terms = quos(starts_with("y")))

smoothed_dat <- bake(rec, sim_dat, everything())

ggplot(data = sim_dat, aes(x = x1, y = y1)) +
  geom_point() +
  geom_line(data = smoothed_dat, aes(y = med_7pt_1)) +
  geom_line(data = smoothed_dat, aes(y = mean_3pt_1), col = "red") +
  theme_bw()

# If you want to replace the selected variables with the rolling
# statistic, don't set `names`
sim_dat$original <- sim_dat$y1
rec <- recipe(y1 + y2 + original ~ x1 + x2 + x3, data = sim_dat) \%>\%
  step_window(starts_with("y"))
rec <- prep(rec, training = sim_dat)
smoothed_dat <- bake(rec, sim_dat, everything())
ggplot(smoothed_dat, aes(x = original, y = y1)) +
  geom_point() +
  theme_bw()
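
# Illustrative check (not from the original docs): smoothing typically
# shrinks the point-to-point noise relative to the raw series
c(raw = var(diff(smoothed_dat$original)),
  smoothed = var(diff(smoothed_dat$y1)))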
See \code{\link{selections}} for more details.} \item{role}{For model terms created by this step, what analysis role should they be assigned?} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{columns}{A character string of variable names that will be (eventually) populated by the \code{terms} argument.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_spatialsign} is a \emph{specification} of a recipe step that will convert numeric data into a projection onto a unit sphere. } \details{ The spatial sign transformation projects the variables onto a unit sphere and is related to global contrast normalization. The spatial sign of a vector \code{w} is \code{w/norm(w)}. The variables should be centered and scaled prior to the computations. } \examples{ data(biomass) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, data = biomass_tr) ss_trans <- rec \%>\% step_center(carbon, hydrogen) \%>\% step_scale(carbon, hydrogen) \%>\% step_spatialsign(carbon, hydrogen) ss_obj <- prep(ss_trans, training = biomass_tr) transformed_te <- bake(ss_obj, biomass_te) plot(biomass_te$carbon, biomass_te$hydrogen) plot(transformed_te$carbon, transformed_te$hydrogen) } \references{ Serneels, S., De Nolf, E., and Van Espen, P. (2006). Spatial sign preprocessing: a simple way to impart moderate robustness to multivariate estimators. \emph{Journal of Chemical Information and Modeling}, 46(3), 1402-1409. } \concept{ preprocessing projection_methods } \keyword{datagen} recipes/man/step.Rd0000644000177700017770000000105513113567007015257 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/misc.R \name{step} \alias{step} \title{A General Step Wrapper} \usage{ step(subclass, ...) } \arguments{ \item{subclass}{A character string for the resulting class. For example, if \code{subclass = "blah"} the step object that is returned has class \code{step_blah}.} \item{...}{All arguments to the step that should be returned.} } \value{ An updated step with the new class. } \description{ \code{step} sets the class of the step. } \concept{ preprocessing } \keyword{datagen} recipes/man/step_hyperbolic.Rd0000644000177700017770000000354713135742247017510 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/hyperbolic.R \name{step_hyperbolic} \alias{step_hyperbolic} \title{Hyperbolic Transformations} \usage{ step_hyperbolic(recipe, ..., role = NA, trained = FALSE, func = "sin", inverse = TRUE, columns = NULL) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{func}{A character value for the function. 
Valid values are "sin", "cos", or "tan".} \item{inverse}{A logical: should the inverse function be used?} \item{columns}{A character string of variable names that will be (eventually) populated by the \code{terms} argument.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_hyperbolic} creates a \emph{specification} of a recipe step that will transform data using a hyperbolic function. } \examples{ set.seed(313) examples <- matrix(rnorm(40), ncol = 2) examples <- as.data.frame(examples) rec <- recipe(~ V1 + V2, data = examples) cos_trans <- rec \%>\% step_hyperbolic(all_predictors(), func = "cos", inverse = FALSE) cos_obj <- prep(cos_trans, training = examples) transformed_te <- bake(cos_obj, examples) plot(examples$V1, transformed_te$V1) } \seealso{ \code{\link{step_logit}} \code{\link{step_invlogit}} \code{\link{step_log}} \code{\link{step_sqrt}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing transformation_methods } \keyword{datagen} recipes/man/summary.recipe.Rd0000644000177700017770000000205013135742247017250 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/recipe.R \name{summary.recipe} \alias{summary.recipe} \title{Summarize a Recipe} \usage{ \method{summary}{recipe}(object, original = FALSE, ...) } \arguments{ \item{object}{A \code{recipe} object} \item{original}{A logical: show the current set of variables or the original set when the recipe was defined.} \item{...}{further arguments passed to or from other methods (not currently used).} } \value{ A tibble with columns \code{variable}, \code{type}, \code{role}, and \code{source}. } \description{ This function prints the current set of variables/features and some of their characteristics. } \details{ Note that, until the recipe has been trained, the current and original variables are the same. } \examples{ rec <- recipe( ~ ., data = USArrests) summary(rec) rec <- step_pca(rec, all_numeric(), num = 3) summary(rec) # still the same since not yet trained rec <- prep(rec, training = USArrests) summary(rec) } \seealso{ \code{\link{recipe}} \code{\link{prep.recipe}} } recipes/man/names0.Rd0000644000177700017770000000123513106636675015501 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/misc.R \name{names0} \alias{names0} \title{Sequences of Names with Padded Zeros} \usage{ names0(num, prefix = "x") } \arguments{ \item{num}{A single integer for how many elements are created.} \item{prefix}{A character string that will start each name.} } \value{ A character vector of length \code{num}. } \description{ This function creates a series of \code{num} names with a common prefix. The names are numbered with leading zeros (e.g. \code{prefix01}-\code{prefix10} instead of \code{prefix1}-\code{prefix10}). } \concept{ string_functions naming_functions } \keyword{datagen} recipes/man/okc.Rd0000644000177700017770000000112613106636675015071 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{okc} \alias{okc} \title{OkCupid Data} \source{ Kim, A. Y., and A. Escobedo-Land. 2015. "OkCupid Data for Introductory Statistics and Data Science Courses." \emph{Journal of Statistics Education: An International Journal on the Teaching and Learning of Statistics}. 
} \value{ \item{okc}{a data frame} } \description{ These are a sample of columns of users of the OkCupid dating website. The data are from Kim and Escobedo-Land (2015). } \examples{ data(okc) str(okc) } \keyword{datasets} recipes/man/recipes.Rd0000644000177700017770000000273713135742247015743 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/pkg.R \docType{package} \name{recipes} \alias{recipes} \alias{recipes-package} \title{recipes: A package for computing and preprocessing design matrices.} \description{ The \code{recipes} package can be used to create design matrices for modeling and to conduct preprocessing of variables. It is meant to be a more extensive framework than R's formula method. Some differences between simple formula methods and recipes are that \enumerate{ \item Variables can have arbitrary roles in the analysis beyond predictors and outcomes. \item A recipe consists of one or more steps that define actions on the variables. \item Recipes can be defined sequentially using pipes as well as being modifiable and extensible. } } \section{Basic Functions}{ The three main functions are \code{\link{recipe}}, \code{\link{prep}}, and \code{\link{bake}}. \code{\link{recipe}} defines the operations on the data and the associated roles. Once the preprocessing steps are defined, any parameters are estimated using \code{\link{prep}}. Once the data are ready for transformation, the \code{\link{bake}} function applies the operations. } \section{Step Functions}{ These functions are used to add new actions to the recipe and have the naming convention \code{"step_action"}. For example, \code{\link{step_center}} centers the data to have a zero mean and \code{\link{step_dummy}} is used to create dummy variables. } recipes/man/step_logit.Rd0000644000177700017770000000336713135742247016462 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/logit.R \name{step_logit} \alias{step_logit} \title{Logit Transformation} \usage{ step_logit(recipe, ..., role = NA, trained = FALSE, columns = NULL) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{columns}{A character string of variable names that will be (eventually) populated by the \code{terms} argument.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_logit} creates a \emph{specification} of a recipe step that will logit transform the data. } \details{ The logit transformation takes values between zero and one and translates them to be on the real line using the function \code{f(p) = log(p/(1-p))}. 
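As a loose sketch of the computation (the helper function here is illustrative, not part of the package), the transformation applied to each selected column is equivalent to:

\preformatted{
logit <- function(p) log(p / (1 - p))
logit(c(0.1, 0.5, 0.9))  # -2.197  0.000  2.197
}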
} \examples{ set.seed(313) examples <- matrix(runif(40), ncol = 2) examples <- data.frame(examples) rec <- recipe(~ X1 + X2, data = examples) logit_trans <- rec \%>\% step_logit(all_predictors()) logit_obj <- prep(logit_trans, training = examples) transformed_te <- bake(logit_obj, examples) plot(examples$X1, transformed_te$X1) } \seealso{ \code{\link{step_invlogit}} \code{\link{step_log}} \code{\link{step_sqrt}} \code{\link{step_hyperbolic}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing transformation_methods } \keyword{datagen} recipes/man/biomass.Rd0000644000177700017770000000140413064546045015743 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{biomass} \alias{biomass} \title{Biomass Data} \source{ Ghugare, S. B., Tiwary, S., Elangovan, V., and Tambe, S. S. (2013). Prediction of Higher Heating Value of Solid Biomass Fuels Using Artificial Intelligence Formalisms. \emph{BioEnergy Research}, 1-12. } \value{ \item{biomass}{a data frame} } \description{ Ghugare et al (2013) contains a data set where different biomass fuels are characterized by the amount of certain molecules (carbon, hydrogen, oxygen, nitrogen, and sulfur) and the corresponding higher heating value (HHV). These data are from their Table S.2 of the Supplementary Materials. } \examples{ data(biomass) str(biomass) } \keyword{datasets} recipes/man/reexports.Rd0000644000177700017770000000061113125445567016345 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/misc.R \docType{import} \name{reexports} \alias{reexports} \alias{\%>\%} \title{Objects exported from other packages} \keyword{internal} \description{ These objects are imported from other packages. Follow the links below to see their documentation. \describe{ \item{magrittr}{\code{\link[magrittr]{\%>\%}}} }} recipes/man/step_scale.Rd0000644000177700017770000000400213135742247016426 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/scale.R \name{step_scale} \alias{step_scale} \title{Scaling Numeric Data} \usage{ step_scale(recipe, ..., role = NA, trained = FALSE, sds = NULL, na.rm = TRUE) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{sds}{A named numeric vector of standard deviations. This is \code{NULL} until computed by \code{\link{prep.recipe}}.} \item{na.rm}{A logical value indicating whether \code{NA} values should be removed when computing the standard deviation.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_scale} creates a \emph{specification} of a recipe step that will normalize numeric data to have a standard deviation of one. } \details{ Scaling data means that the standard deviation of a variable is divided out of the data. \code{step_scale} estimates the variable standard deviations from the data used in the \code{training} argument of \code{prep.recipe}. \code{bake.recipe} then applies the scaling to new data sets using these standard deviations. 
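In other words, for a single column the computation is equivalent to this loose sketch (the object names are illustrative and reuse the training/testing split from the example below):

\preformatted{
sd_carbon <- sd(biomass_tr$carbon, na.rm = TRUE)  # estimated by prep()
scaled_carbon <- biomass_te$carbon / sd_carbon    # applied by bake()
}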
} \examples{ data(biomass) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, data = biomass_tr) scaled_trans <- rec \%>\% step_scale(carbon, hydrogen) scaled_obj <- prep(scaled_trans, training = biomass_tr) transformed_te <- bake(scaled_obj, biomass_te) biomass_te[1:10, names(transformed_te)] transformed_te } \concept{ preprocessing normalization_methods } \keyword{datagen} recipes/man/step_interact.Rd0000644000177700017770000000556613135742247017164 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/interactions.R \name{step_interact} \alias{step_interact} \title{Create Interaction Variables} \usage{ step_interact(recipe, terms, role = "predictor", trained = FALSE, objects = NULL, sep = "_x_") } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{terms}{A traditional R formula that contains interaction terms.} \item{role}{For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new columns created from the original variables will be used as predictors in a model.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{objects}{A list of \code{terms} objects for each individual interaction.} \item{sep}{A character value used to delineate variables in an interaction (e.g. \code{var1_x_var2} instead of the more traditional \code{var1:var2}).} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_interact} creates a \emph{specification} of a recipe step that will create new columns that are interaction terms between two or more variables. } \details{ \code{step_interact} can create interactions between variables. It is primarily intended for \bold{numeric data}; categorical variables should probably be converted to dummy variables using \code{\link{step_dummy}} prior to being used for interactions. Unlike other step functions, the \code{terms} argument should be a traditional R model formula but should contain no inline functions (e.g. \code{log}). For example, for predictors \code{A}, \code{B}, and \code{C}, a formula such as \code{~A:B:C} can be used to make a three-way interaction between the variables. If the formula contains terms other than interactions (e.g. \code{(A+B+C)^3}) only the interaction terms are retained for the design matrix. The separator between the variables defaults to "\code{_x_}" so that the three-way interaction shown previously would generate a column named \code{A_x_B_x_C}. This can be changed using the \code{sep} argument. 
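For two numeric predictors, the new column is simply their elementwise product; a loose sketch of the default computation and naming (the object names are illustrative):

\preformatted{
carbon_x_hydrogen <- biomass_tr$carbon * biomass_tr$hydrogen
}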
} \examples{ data(biomass) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, data = biomass_tr) int_mod_1 <- rec \%>\% step_interact(terms = ~ carbon:hydrogen) int_mod_2 <- int_mod_1 \%>\% step_interact(terms = ~ (oxygen + nitrogen + sulfur)^3) int_mod_1 <- prep(int_mod_1, training = biomass_tr) int_mod_2 <- prep(int_mod_2, training = biomass_tr) dat_1 <- bake(int_mod_1, biomass_te) dat_2 <- bake(int_mod_2, biomass_te) names(dat_1) names(dat_2) } \concept{ preprocessing model_specification } \keyword{datagen} recipes/man/step_ordinalscore.Rd0000644000177700017770000000513613135742247020034 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/ordinalscore.R \name{step_ordinalscore} \alias{step_ordinalscore} \title{Convert Ordinal Factors to Numeric Scores} \usage{ step_ordinalscore(recipe, ..., role = NA, trained = FALSE, columns = NULL, convert = as.numeric) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{columns}{A character string of variables that will be converted. This is \code{NULL} until computed by \code{\link{prep.recipe}}.} \item{convert}{A function that takes an ordinal factor vector as an input and outputs a single numeric variable.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_ordinalscore} creates a \emph{specification} of a recipe step that will convert ordinal factor variables into numeric scores. } \details{ Dummy variables from ordered factors with \code{C} levels will create polynomial basis functions with \code{C-1} terms. As an alternative, this step can be used to translate the ordered levels into a single numeric vector of values that represent (subjective) scores. By default, the translation uses a linear scale (1, 2, 3, ... \code{C}) but custom score functions can also be used (see the example below). 
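The default linear scoring is equivalent to taking the integer codes of the ordered factor, as in this small illustrative sketch:

\preformatted{
lvls <- c("meh", "annoying", "really_bad")
as.numeric(ordered(lvls, levels = lvls))  # 1 2 3
}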
} \examples{ fail_lvls <- c("meh", "annoying", "really_bad") ord_data <- data.frame(item = c("paperclip", "twitter", "airbag"), fail_severity = factor(fail_lvls, levels = fail_lvls, ordered = TRUE)) model.matrix(~fail_severity, data = ord_data) linear_values <- recipe(~ item + fail_severity, data = ord_data) \%>\% step_dummy(item) \%>\% step_ordinalscore(fail_severity) linear_values <- prep(linear_values, training = ord_data, retain = TRUE) juice(linear_values, everything()) custom <- function(x) { new_values <- c(1, 3, 7) new_values[as.numeric(x)] } nonlin_scores <- recipe(~ item + fail_severity, data = ord_data) \%>\% step_dummy(item) \%>\% step_ordinalscore(fail_severity, convert = custom) nonlin_scores <- prep(nonlin_scores, training = ord_data, retain = TRUE) juice(nonlin_scores, everything()) } \concept{ preprocessing ordinal_data } \keyword{datagen} recipes/man/step_log.Rd0000644000177700017770000000320313135742247016122 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/log.R \name{step_log} \alias{step_log} \title{Logarithmic Transformation} \usage{ step_log(recipe, ..., role = NA, trained = FALSE, base = exp(1), columns = NULL) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{base}{A numeric value for the base.} \item{columns}{A character string of variable names that will be (eventually) populated by the \code{terms} argument.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_log} creates a \emph{specification} of a recipe step that will log transform data. } \examples{ set.seed(313) examples <- matrix(exp(rnorm(40)), ncol = 2) examples <- as.data.frame(examples) rec <- recipe(~ V1 + V2, data = examples) log_trans <- rec \%>\% step_log(all_predictors()) log_obj <- prep(log_trans, training = examples) transformed_te <- bake(log_obj, examples) plot(examples$V1, transformed_te$V1) } \seealso{ \code{\link{step_logit}} \code{\link{step_invlogit}} \code{\link{step_hyperbolic}} \code{\link{step_sqrt}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing transformation_methods } \keyword{datagen} recipes/man/step_other.Rd0000644000177700017770000000466613135742247016500 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/other.R \name{step_other} \alias{step_other} \title{Collapse Some Categorical Levels} \usage{ step_other(recipe, ..., role = NA, trained = FALSE, threshold = 0.05, other = "other", objects = NULL) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables will potentially be reduced. 
See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{threshold}{A single numeric value in (0, 1) for pooling.} \item{other}{A single character value for the "other" category.} \item{objects}{A list of objects that contain the information to pool infrequent levels that is determined by \code{\link{prep.recipe}}.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_other} creates a \emph{specification} of a recipe step that will potentially pool infrequently occurring values into an "other" category. } \details{ The overall proportions of the categories are computed. The "other" category is used in place of any categorical levels whose individual proportion in the training set is less than \code{threshold}. If no pooling is done, the data are unmodified (although character data may be changed to factors based on the value of \code{stringsAsFactors} in \code{\link{prep.recipe}}). Otherwise, a factor is always returned with different factor levels. If \code{threshold} is less than the largest category proportion, all levels except for the most frequent are collapsed to the \code{other} level. If the retained categories include the value of \code{other}, an error is thrown. If \code{other} is in the list of discarded levels, no error occurs. } \examples{ data(okc) set.seed(19) in_train <- sample(1:nrow(okc), size = 30000) okc_tr <- okc[ in_train,] okc_te <- okc[-in_train,] rec <- recipe(~ diet + location, data = okc_tr) rec <- rec \%>\% step_other(diet, location, threshold = .1, other = "other values") rec <- prep(rec, training = okc_tr) collapsed <- bake(rec, okc_te) table(okc_te$diet, collapsed$diet, useNA = "always") } \concept{ preprocessing factors } \keyword{datagen} recipes/man/covers.Rd0000644000177700017770000000114613106636675015620 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{covers} \alias{covers} \title{Raw Cover Type Data} \source{ \url{https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.info} } \value{ \item{covers}{a data frame} } \description{ These data are raw data describing different types of forest cover-types from the UCI Machine Learning Database (see link below). There is one column in the data that has a few different pieces of textual information (of variable lengths). } \examples{ data(covers) str(covers) } \keyword{datasets} recipes/man/step_poly.Rd0000644000177700017770000000465313135742247016332 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/poly.R \name{step_poly} \alias{step_poly} \title{Orthogonal Polynomial Basis Functions} \usage{ step_poly(recipe, ..., role = "predictor", trained = FALSE, objects = NULL, options = list(degree = 2)) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. See \code{\link{selections}} for more details.} \item{role}{For model terms created by this step, what analysis role should they be assigned? 
By default, the function assumes that the new columns created from the original variables will be used as predictors in a model.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{objects}{A list of \code{\link[stats]{poly}} objects created once the step has been trained.} \item{options}{A list of options for \code{\link[stats]{poly}} which should not include \code{x} or \code{simple}. Note that the option \code{raw = TRUE} will produce the regular polynomial values (not orthogonalized).} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_poly} creates a \emph{specification} of a recipe step that will create new columns that are basis expansions of variables using orthogonal polynomials. } \details{ \code{step_poly} can create new features from a single variable that enable fitting routines to model this variable in a nonlinear manner. The extent of the possible nonlinearity is determined by the \code{degree} argument of \code{\link[stats]{poly}}. The original variables are removed from the data and new columns are added. The naming convention for the new variables is \code{varname_poly_1} and so on. } \examples{ data(biomass) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, data = biomass_tr) quadratic <- rec \%>\% step_poly(carbon, hydrogen) quadratic <- prep(quadratic, training = biomass_tr) expanded <- bake(quadratic, biomass_te) expanded } \seealso{ \code{\link{step_ns}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing basis_expansion } \keyword{datagen} recipes/man/print.recipe.Rd0000644000177700017770000000102113106655000016670 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/recipe.R \name{print.recipe} \alias{print.recipe} \title{Print a Recipe} \usage{ \method{print}{recipe}(x, form_width = 30, ...) } \arguments{ \item{x}{A \code{recipe} object} \item{form_width}{The number of characters used to print the variables or terms in a formula} \item{...}{further arguments passed to or from other methods (not currently used).} } \value{ The original object (invisibly) } \description{ Print a Recipe } \author{ Max Kuhn } recipes/man/prep.Rd0000644000177700017770000000467713135742247015272 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/recipe.R \name{prep} \alias{prep} \alias{prep.recipe} \alias{prep.recipe} \title{Train a Data Recipe} \usage{ prep(x, ...) \method{prep}{recipe}(x, training = NULL, fresh = FALSE, verbose = TRUE, retain = FALSE, stringsAsFactors = TRUE, ...) } \arguments{ \item{x}{an object} \item{...}{further arguments passed to or from other methods (not currently used).} \item{training}{A data frame or tibble that will be used to estimate parameters for preprocessing.} \item{fresh}{A logical indicating whether already trained steps should be re-trained. If \code{TRUE}, you should pass in a data set to the argument \code{training}.} \item{verbose}{A logical that controls whether progress is reported as steps are executed.} \item{retain}{A logical: should the \emph{processed} training set be saved into the \code{template} slot of the recipe after training? 
This is a good idea if you want to add more steps later but want to avoid re-training the existing steps.} \item{stringsAsFactors}{A logical: should character columns be converted to factors? This affects the preprocessed training set (when \code{retain = TRUE}) as well as the results of \code{bake.recipe}.} } \value{ A recipe whose step objects have been updated with the required quantities (e.g. parameter estimates, model objects, etc). Also, the \code{term_info} object is likely to be modified as the steps are executed. } \description{ For a recipe with at least one preprocessing step, estimate the required parameters from a training set that can be later applied to other data sets. } \details{ Given a data set, this function estimates the quantities and statistics required by any steps. \code{\link{prep}} returns an updated recipe with the estimates. Note that missing data handling is handled in the steps; there is no global \code{na.rm} option at the recipe-level or in \code{\link{prep}}. Also, if a recipe has been trained using \code{\link{prep}} and then steps are added, \code{\link{prep}} will only update the new steps. If \code{fresh = TRUE}, all of the steps will be (re)estimated. As the steps are executed, the \code{training} set is updated. For example, if the first step is to center the data and the second is to scale the data, the step for scaling is given the centered data. } \author{ Max Kuhn } \concept{ preprocessing model_specification } \keyword{datagen} recipes/man/bake.Rd0000644000177700017770000000266313135742247015215 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/recipe.R \name{bake} \alias{bake} \alias{bake.recipe} \alias{bake.recipe} \title{Apply a Trained Data Recipe} \usage{ bake(object, ...) \method{bake}{recipe}(object, newdata = object$template, ...) } \arguments{ \item{object}{A trained object such as a \code{\link{recipe}} with at least one preprocessing step.} \item{...}{One or more selector functions to choose which variables will be returned by the function. See \code{\link{selections}} for more details. If no selectors are given, the default is to use \code{\link{all_predictors}}.} \item{newdata}{A data frame or tibble to which the preprocessing will be applied.} } \value{ A tibble that may have different columns than the original columns in \code{newdata}. } \description{ For a recipe with at least one preprocessing step that has been trained by \code{\link{prep.recipe}}, apply the computations to new data. } \details{ \code{\link{bake}} takes a trained recipe and applies the operations to a data set to create a design matrix. If the original data used to train the recipe are to be processed, time can be saved by using the \code{retain = TRUE} option of \code{\link{prep}} to avoid duplicating the same operations. A tibble is always returned but can be easily converted to a data frame or matrix as needed. } \author{ Max Kuhn } \concept{ preprocessing model_specification } \keyword{datagen} recipes/man/step_depth.Rd0000644000177700017770000000657713135742247016462 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/depth.R \name{step_depth} \alias{step_depth} \title{Data Depths} \usage{ step_depth(recipe, ..., class, role = "predictor", trained = FALSE, metric = "halfspace", options = list(), data = NULL) } \arguments{ \item{recipe}{A recipe object. 
The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables will be used to create the new features. See \code{\link{selections}} for more details.} \item{class}{A single character string that specifies a single categorical variable to be used as the class.} \item{role}{For model terms created by this step, what analysis role should they be assigned?. By default, the function assumes that resulting depth estimates will be used as predictors in a model.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{metric}{A character string specifying the depth metric. Possible values are "potential", "halfspace", "Mahalanobis", "simplicialVolume", "spatial", and "zonoid".} \item{options}{A list of options to pass to the underlying depth functions. See \code{\link[ddalpha]{depth.halfspace}}, \code{\link[ddalpha]{depth.Mahalanobis}}, \code{\link[ddalpha]{depth.potential}}, \code{\link[ddalpha]{depth.projection}}, \code{\link[ddalpha]{depth.simplicial}}, \code{\link[ddalpha]{depth.simplicialVolume}}, \code{\link[ddalpha]{depth.spatial}}, \code{\link[ddalpha]{depth.zonoid}}.} \item{data}{The training data are stored here once \code{\link{prep.recipe}} is executed.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_depth} creates a \emph{specification} of a recipe step that will convert numeric data into measurements of \emph{data depth}. This is done for each value of a categorical class variable. } \details{ Data depth metrics attempt to measure how close a data point is to the center of its distribution. There are a number of methods for calculating depth, but a simple example is the inverse of the distance of a data point to the centroid of the distribution. Generally, small values indicate that a data point is not close to the centroid. \code{step_depth} can compute a class-specific depth for a new data point based on the proximity of the new value to the training set distribution. Note that the entire training set is saved to compute future depth values. The saved data have been trained (i.e. prepared) and baked (i.e. processed) up to the point before the location that \code{step_depth} occupies in the recipe. Also, the data requirements for the different step methods may vary. For example, using \code{metric = "Mahalanobis"} requires that each class should have at least as many rows as variables listed in the \code{terms} argument. The function will create a new column for every unique value of the \code{class} variable. The resulting variables will not replace the original values and have the prefix \code{depth_}. } \examples{ # halfspace depth is the default rec <- recipe(Species ~ ., data = iris) \%>\% step_depth(all_predictors(), class = "Species") rec_dists <- prep(rec, training = iris) dists_to_species <- bake(rec_dists, newdata = iris) dists_to_species } \concept{ preprocessing dimension_reduction } \keyword{datagen} recipes/man/step_corr.Rd0000644000177700017770000000525413135742247016314 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/corr.R \name{step_corr} \alias{step_corr} \title{High Correlation Filter} \usage{ step_corr(recipe, ..., role = NA, trained = FALSE, threshold = 0.9, use = "pairwise.complete.obs", method = "pearson", removals = NULL) } \arguments{ \item{recipe}{A recipe object. 
The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{threshold}{A value for the threshold of absolute correlation values. The step will try to remove the minimum number of columns so that all the resulting absolute correlations are less than this value.} \item{use}{A character string for the \code{use} argument to the \code{\link[stats]{cor}} function.} \item{method}{A character string for the \code{method} argument to the \code{\link[stats]{cor}} function.} \item{removals}{A character string that contains the names of columns that should be removed. These values are not determined until \code{\link{prep.recipe}} is called.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_corr} creates a \emph{specification} of a recipe step that will potentially remove variables that have large absolute correlations with other variables. } \details{ This step attempts to remove variables to keep the largest absolute correlation between the variables less than \code{threshold}. } \examples{ data(biomass) set.seed(3535) biomass$duplicate <- biomass$carbon + rnorm(nrow(biomass)) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur + duplicate, data = biomass_tr) corr_filter <- rec \%>\% step_corr(all_predictors(), threshold = .5) filter_obj <- prep(corr_filter, training = biomass_tr) filtered_te <- bake(filter_obj, biomass_te) round(abs(cor(biomass_tr[, c(3:7, 9)])), 2) round(abs(cor(filtered_te)), 2) } \seealso{ \code{\link{step_nzv}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \author{ Original R code for filtering algorithm by Dong Li, modified by Max Kuhn. Contributions by Reynald Lescarbeau (for original in \code{caret} package). Max Kuhn for the \code{step} function. } \concept{ preprocessing variable_filters } \keyword{datagen} recipes/man/step_YeoJohnson.Rd0000644000177700017770000000565013135742247017444 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/YeoJohnson.R \name{step_YeoJohnson} \alias{step_YeoJohnson} \title{Yeo-Johnson Transformation} \usage{ step_YeoJohnson(recipe, ..., role = NA, trained = FALSE, lambdas = NULL, limits = c(-5, 5), nunique = 5) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{lambdas}{A numeric vector of transformation values. 
This is \code{NULL} until computed by \code{\link{prep.recipe}}.} \item{limits}{A length 2 numeric vector defining the range to compute the transformation parameter lambda.} \item{nunique}{An integer where data that have fewer possible values will not be evaluated for a transformation.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_YeoJohnson} creates a \emph{specification} of a recipe step that will transform data using a simple Yeo-Johnson transformation. } \details{ The Yeo-Johnson transformation is very similar to the Box-Cox but does not require the input variables to be strictly positive. In the package, the partial log-likelihood function is directly optimized within a reasonable set of transformation values (which can be changed by the user). This transformation is typically done on the outcome variable using the residuals for a statistical model (such as ordinary least squares). Here, a simple null model (intercept only) is used to apply the transformation to the \emph{predictor} variables individually. This can have the effect of making the variable distributions more symmetric. If the transformation parameters are estimated to be very close to the bounds, or if the optimization fails, a value of \code{NA} is used and no transformation is applied. } \examples{ data(biomass) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, data = biomass_tr) yj_trans <- step_YeoJohnson(rec, all_numeric()) yj_estimates <- prep(yj_trans, training = biomass_tr) yj_te <- bake(yj_estimates, biomass_te) plot(density(biomass_te$sulfur), main = "before") plot(density(yj_te$sulfur), main = "after") } \references{ Yeo, I. K., and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. \emph{Biometrika}. } \seealso{ \code{\link{step_BoxCox}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing transformation_methods } \keyword{datagen} recipes/man/terms_select.Rd0000644000177700017770000000210613135742247017000 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/selections.R \name{terms_select} \alias{terms_select} \title{Select Terms in a Step Function.} \usage{ terms_select(terms, info) } \arguments{ \item{terms}{A list of formulas whose right-hand side contains quoted expressions. See \code{\link[rlang]{quos}} for examples.} \item{info}{A tibble with columns \code{variable}, \code{type}, \code{role}, and \code{source} that represent the current state of the data. The function \code{\link{summary.recipe}} can be used to get this information from a recipe.} } \value{ A character string of column names or an error if there are no selectors or if no variables are selected. } \description{ This function evaluates the step function selectors and might be useful when creating custom steps. 
} \examples{ library(rlang) data(okc) rec <- recipe(~ ., data = okc) info <- summary(rec) terms_select(info = info, quos(all_predictors())) } \seealso{ \code{\link{recipe}} \code{\link{summary.recipe}} \code{\link{prep.recipe}} } \concept{ preprocessing } \keyword{datagen} recipes/man/selections.Rd0000644000177700017770000000765713135742247016477 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/selections.R \name{selections} \alias{selections} \title{Methods for Selecting Variables in Step Functions} \description{ When selecting variables or model terms in \code{step} functions, \code{dplyr}-like tools are used. The \emph{selector} functions can choose variables based on their name, current role, data type, or any combination of these. The selectors are passed as any other argument to the step. If the variables are explicitly stated in the step function, this might be similar to: \preformatted{ recipe( ~ ., data = USArrests) \%>\% step_pca(Murder, Assault, UrbanPop, Rape, num = 3) } The first four arguments indicate which variables should be used in the PCA while the last argument is a specific argument to \code{\link{step_pca}}. Note that: \enumerate{ \item The selector arguments should not contain functions beyond those supported (see below). \item These arguments are not evaluated until the \code{prep} function for the step is executed. \item The \code{dplyr}-like syntax allows for negative signs to exclude variables (e.g. \code{-Murder}) and the set of selectors will be processed in order. \item A leading exclusion in these arguments (e.g. \code{-Murder}) has the effect of adding all variables to the list except the excluded variable(s). } Also, select helpers from the \code{dplyr} package can also be used: \code{\link[dplyr]{starts_with}}, \code{\link[dplyr]{ends_with}}, \code{\link[dplyr]{contains}}, \code{\link[dplyr]{matches}}, \code{\link[dplyr]{num_range}}, and \code{\link[dplyr]{everything}}. For example: \preformatted{ recipe(Species ~ ., data = iris) \%>\% step_center(starts_with("Sepal"), -contains("Width")) } would only select \code{Sepal.Length}. \bold{Inline} functions that specify computations, such as \code{log(x)}, should not be used in selectors and will produce an error. A list of allowed selector functions is below. Columns of the design matrix that may not exist when the step is coded can also be selected. For example, when using \code{step_pca}, the number of columns created by feature extraction may not be known when subsequent steps are defined. In this case, using \code{matches("^PC")} will select all of the columns whose names start with "PC" \emph{once those columns are created}. There are sets of functions that can be used to select variables based on their role or type: \code{\link{has_role}} and \code{\link{has_type}}. For convenience, there are also functions that are more specific: \code{\link{all_numeric}}, \code{\link{all_nominal}}, \code{\link{all_predictors}}, and \code{\link{all_outcomes}}. These can be used in conjunction with the previous functions described for selecting variables using their names: \preformatted{ data(biomass) recipe(HHV ~ ., data = biomass) \%>\% step_center(all_numeric(), -all_outcomes()) } This results in all the numeric predictors: carbon, hydrogen, oxygen, nitrogen, and sulfur. If a role for a variable has not been defined, it will never be selected using role-specific selectors. 
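For example, a nonstandard role assigned with \code{add_role} can later be used for selection (a sketch patterned on the usage shown in \code{\link{has_role}}):

\preformatted{
data(biomass)
recipe(biomass) \%>\%
  add_role(carbon, hydrogen, oxygen, nitrogen, sulfur,
           new_role = "predictor") \%>\%
  add_role(sample, new_role = "id variable") \%>\%
  step_center(has_role("predictor"))
}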
All steps use these techniques to define variables for steps \emph{except one}: \code{\link{step_interact}} requires traditional model formula representations of the interactions and takes a single formula as the argument to select the variables. The complete list of allowable functions in steps: \itemize{ \item \bold{By name}: \code{\link[dplyr]{starts_with}}, \code{\link[dplyr]{ends_with}}, \code{\link[dplyr]{contains}}, \code{\link[dplyr]{matches}}, \code{\link[dplyr]{num_range}}, and \code{\link[dplyr]{everything}} \item \bold{By role}: \code{\link{has_role}}, \code{\link{all_predictors}}, and \code{\link{all_outcomes}} \item \bold{By type}: \code{\link{has_type}}, \code{\link{all_numeric}}, and \code{\link{all_nominal}} } } recipes/man/step_nzv.Rd0000644000177700017770000000547313135742247016165 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/nzv.R \name{step_nzv} \alias{step_nzv} \title{Near-Zero Variance Filter} \usage{ step_nzv(recipe, ..., role = NA, trained = FALSE, options = list(freq_cut = 95/5, unique_cut = 10), removals = NULL) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables will be evaluated by the filtering step. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{options}{A list of options for the filter (see Details below).} \item{removals}{A character string that contains the names of columns that should be removed. These values are not determined until \code{\link{prep.recipe}} is called.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_nzv} creates a \emph{specification} of a recipe step that will potentially remove variables that are highly sparse and unbalanced. } \details{ This step diagnoses predictors that have one unique value (i.e. are zero variance predictors) or predictors that have both of the following characteristics: \enumerate{ \item they have very few unique values relative to the number of samples, and \item the ratio of the frequency of the most common value to the frequency of the second most common value is large. } For example, a near-zero variance predictor is one that, for 1000 samples, has two distinct values and 999 of them are a single value. To be flagged, first the frequency of the most prevalent value over the second most frequent value (called the "frequency ratio") must be above \code{freq_cut}. Secondly, the "percent of unique values," the number of unique values divided by the total number of samples (times 100), must also be below \code{unique_cut}. In the above example, the frequency ratio is 999 and the unique value percentage is 0.2. 
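A loose sketch of these two diagnostics for a single vector:

\preformatted{
x <- c(rep(0, 999), 1)               # 1000 samples, two distinct values
tab <- sort(table(x), decreasing = TRUE)
tab[[1]] / tab[[2]]                  # frequency ratio: 999
length(unique(x)) / length(x) * 100  # percent unique: 0.2
}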
} \examples{ data(biomass) biomass$sparse <- c(1, rep(0, nrow(biomass) - 1)) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur + sparse, data = biomass_tr) nzv_filter <- rec \%>\% step_nzv(all_predictors()) filter_obj <- prep(nzv_filter, training = biomass_tr) filtered_te <- bake(filter_obj, biomass_te) any(names(filtered_te) == "sparse") } \seealso{ \code{\link{step_corr}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing variable_filters } \keyword{datagen} recipes/man/step_bagimpute.Rd0000644000177700017770000000717013135742247017325 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/bag_imp.R \name{step_bagimpute} \alias{step_bagimpute} \alias{imp_vars} \title{Imputation via Bagged Trees} \usage{ step_bagimpute(recipe, ..., role = NA, trained = FALSE, models = NULL, options = list(nbagg = 25, keepX = FALSE), impute_with = imp_vars(all_predictors()), seed_val = sample.int(10^4, 1)) imp_vars(...) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose variables. For \code{step_bagimpute}, this indicates the variables to be imputed. When used with \code{imp_vars}, the dots indicates which variables are used to predict the missing data in each variable. See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{models}{The \code{\link[ipred]{ipredbagg}} objects are stored here once these bagged trees have been trained by \code{\link{prep.recipe}}.} \item{options}{A list of options to \code{\link[ipred]{ipredbagg}}. Defaults are set for the arguments \code{nbagg} and \code{keepX} but others can be passed in. \bold{Note} that the arguments \code{X} and \code{y} should not be passed here.} \item{impute_with}{A call to \code{imp_vars} to specify which variables are used to impute the variables that can include specific variable names separated by commas or different selectors (see \code{\link{selections}}). If a column is included in both lists to be imputed and to be an imputation predictor, it will be removed from the latter and not used to impute itself.} \item{seed_val}{An integer used to create reproducible models. The same seed is used across all imputation models.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_bagimpute} creates a \emph{specification} of a recipe step that will create bagged tree models to impute missing data. } \details{ For each variable requiring imputation, a bagged tree is created where the outcome is the variable of interest and the predictors are any other variables listed in the \code{impute_with} formula. One advantage to the bagged tree is that it can accept predictors that have missing values themselves. This imputation method can be used when the variable of interest (and predictors) are numeric or categorical. Imputed categorical variables will remain categorical. Note that if a variable that is to be imputed is also in \code{impute_with}, this variable will be ignored. 
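Conceptually, each per-variable model resembles this loose sketch using \code{\link[ipred]{ipredbagg}} directly (illustrative only, not the step's internal code; the column choices are arbitrary and \code{credit_tr} is the training split from the examples below):

\preformatted{
library(ipred)
complete <- !is.na(credit_tr$Income)
fit <- ipredbagg(y = credit_tr$Income[complete],
                 X = credit_tr[complete, c("Assets", "Debt")],
                 nbagg = 25, keepX = FALSE)
predict(fit, newdata = credit_tr[!complete, c("Assets", "Debt")])
}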
It is possible that missing values will still occur after imputation if a large majority (or all) of the imputing variables are also missing. } \examples{ data("credit_data") ## missing data per column vapply(credit_data, function(x) mean(is.na(x)), c(num = 0)) set.seed(342) in_training <- sample(1:nrow(credit_data), 2000) credit_tr <- credit_data[ in_training, ] credit_te <- credit_data[-in_training, ] missing_examples <- c(14, 394, 565) rec <- recipe(Price ~ ., data = credit_tr) impute_rec <- rec \%>\% step_bagimpute(Status, Home, Marital, Job, Income, Assets, Debt) imp_models <- prep(impute_rec, training = credit_tr) imputed_te <- bake(imp_models, newdata = credit_te, everything()) credit_te[missing_examples,] imputed_te[missing_examples, names(credit_te)] } \references{ Kuhn, M. and Johnson, K. (2013). \emph{Applied Predictive Modeling}. Springer Verlag. } \concept{ preprocessing imputation } \keyword{datagen} recipes/man/has_role.Rd0000644000177700017770000000366013106655000016075 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/selections.R \name{has_role} \alias{has_role} \alias{all_predictors} \alias{all_outcomes} \alias{has_type} \alias{all_numeric} \alias{all_nominal} \alias{current_info} \title{Role Selection} \usage{ has_role(match = "predictor", roles = current_info()$roles) all_predictors(roles = current_info()$roles) all_outcomes(roles = current_info()$roles) has_type(match = "numeric", types = current_info()$types) all_numeric(types = current_info()$types) all_nominal(types = current_info()$types) current_info() } \arguments{ \item{match}{A single character string for the query. Exact matching is used (i.e. regular expressions won't work).} \item{roles}{A character string of roles for the current set of terms.} \item{types}{A character string of types for the current set of terms.} } \value{ Selector functions return an integer vector while \code{current_info} returns an environment with vectors \code{vars}, \code{roles}, and \code{types}. } \description{ \code{has_role}, \code{all_predictors}, and \code{all_outcomes} can be used to select variables in a formula that have certain roles. Similarly, \code{has_type}, \code{all_numeric}, and \code{all_nominal} are used to select columns based on their data type. See \code{\link{selections}} for more details. \code{current_info} is an internal function that is unlikely to help users, while the others have limited utility outside of step function arguments. } \examples{ data(biomass) rec <- recipe(biomass) \%>\% add_role(carbon, hydrogen, oxygen, nitrogen, sulfur, new_role = "predictor") \%>\% add_role(HHV, new_role = "outcome") \%>\% add_role(sample, new_role = "id variable") \%>\% add_role(dataset, new_role = "splitting indicator") recipe_info <- summary(rec) recipe_info has_role("id variable", roles = recipe_info$role) all_outcomes(roles = recipe_info$role) } \keyword{datagen} recipes/man/step_BoxCox.Rd0000644000177700017770000000543713135742247016556 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/BoxCox.R \name{step_BoxCox} \alias{step_BoxCox} \title{Box-Cox Transformation for Non-Negative Data} \usage{ step_BoxCox(recipe, ..., role = NA, trained = FALSE, lambdas = NULL, limits = c(-5, 5), nunique = 5) } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables are affected by the step. 
See \code{\link{selections}} for more details.} \item{role}{Not used by this step since no new variables are created.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{lambdas}{A numeric vector of transformation values. This is \code{NULL} until computed by \code{\link{prep.recipe}}.} \item{limits}{A length 2 numeric vector defining the range to compute the transformation parameter lambda.} \item{nunique}{An integer where data that have fewer possible values will not be evaluated for a transformation.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_BoxCox} creates a \emph{specification} of a recipe step that will transform data using a simple Box-Cox transformation. } \details{ The Box-Cox transformation, which requires a strictly positive variable, can be used to rescale a variable to be more similar to a normal distribution. In this package, the partial log-likelihood function is directly optimized within a reasonable set of transformation values (which can be changed by the user). This transformation is typically done on the outcome variable using the residuals for a statistical model (such as ordinary least squares). Here, a simple null model (intercept only) is used to apply the transformation to the \emph{predictor} variables individually. This can have the effect of making the variable distributions more symmetric. If the transformation parameters are estimated to be very close to the bounds, or if the optimization fails, a value of \code{NA} is used and no transformation is applied. } \examples{ rec <- recipe(~ ., data = as.data.frame(state.x77)) bc_trans <- step_BoxCox(rec, all_numeric()) bc_estimates <- prep(bc_trans, training = as.data.frame(state.x77)) bc_data <- bake(bc_estimates, as.data.frame(state.x77)) plot(density(state.x77[, "Illiteracy"]), main = "before") plot(density(bc_data$Illiteracy), main = "after") } \references{ Sakia, R. M. (1992). The Box-Cox transformation technique: A review. \emph{The Statistician}, 169-178. } \seealso{ \code{\link{step_YeoJohnson}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing transformation_methods } \keyword{datagen} recipes/man/juice.Rd0000644000177700017770000000320613135742247015410 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/recipe.R \name{juice} \alias{juice} \title{Extract Finalized Training Set} \usage{ juice(object, ...) } \arguments{ \item{object}{A \code{recipe} object that has been prepared with the option \code{retain = TRUE}.} \item{...}{One or more selector functions to choose which variables will be returned by the function. See \code{\link{selections}} for more details. If no selectors are given, the default is to use \code{\link{all_predictors}}.} } \value{ A tibble. } \description{ As steps are estimated by \code{prep}, these operations are applied to the training set. Rather than running \code{bake} to duplicate this processing, this function will return variables from the processed training set. } \details{ When preparing a recipe, if the training data set is retained using \code{retain = TRUE}, there is no need to \code{bake} the recipe to get the preprocessed training set. 
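That is, for a recipe prepared with \code{retain = TRUE}, the two calls in this sketch should return the same tibble (\code{trained_rec} and \code{training_data} are placeholders):

\preformatted{
bake(trained_rec, newdata = training_data)  # re-applies every step
juice(trained_rec)                          # returns the stored result
}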
} \examples{ rec <- recipe(~ ., data = as.data.frame(state.x77)) bc_trans <- step_BoxCox(rec, all_numeric()) bc_estimates <- prep(bc_trans, training = as.data.frame(state.x77)) bc_data <- bake(bc_estimates, as.data.frame(state.x77)) plot(density(state.x77[, "Illiteracy"]), main = "before") plot(density(bc_data$Illiteracy), main = "after") } \references{ Sakia, R. M. (1992). The Box-Cox transformation technique: A review. \emph{The Statistician}, 169-178. } \seealso{ \code{\link{step_YeoJohnson}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing transformation_methods } \keyword{datagen} recipes/man/juice.Rd0000644000177700017770000000320613135742247015410 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/recipe.R \name{juice} \alias{juice} \title{Extract Finalized Training Set} \usage{ juice(object, ...) } \arguments{ \item{object}{A \code{recipe} object that has been prepared with the option \code{retain = TRUE}.} \item{...}{One or more selector functions to choose which variables will be returned by the function. See \code{\link{selections}} for more details. If no selectors are given, the default is to use \code{\link{all_predictors}}.} } \value{ A tibble. } \description{ As steps are estimated by \code{prep}, these operations are applied to the training set. Rather than running \code{bake} to duplicate this processing, this function will return variables from the processed training set. } \details{ When preparing a recipe, if the training data set is retained using \code{retain = TRUE}, there is no need to \code{bake} the recipe to get the preprocessed training set. } \examples{ data(biomass) biomass_tr <- biomass[biomass$dataset == "Training",] biomass_te <- biomass[biomass$dataset == "Testing",] rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur, data = biomass_tr) sp_signed <- rec \%>\% step_center(all_predictors()) \%>\% step_scale(all_predictors()) \%>\% step_spatialsign(all_predictors()) sp_signed_trained <- prep(sp_signed, training = biomass_tr, retain = TRUE) tr_values <- bake(sp_signed_trained, newdata = biomass_tr, all_predictors()) og_values <- juice(sp_signed_trained, all_predictors()) all.equal(tr_values, og_values) } \seealso{ \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } recipes/man/step_pca.Rd0000644000177700017770000001064713135742247016112 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand % Please edit documentation in R/pca.R \name{step_pca} \alias{step_pca} \title{PCA Signal Extraction} \usage{ step_pca(recipe, ..., role = "predictor", trained = FALSE, num = 5, threshold = NA, options = list(), res = NULL, prefix = "PC") } \arguments{ \item{recipe}{A recipe object. The step will be added to the sequence of operations for this recipe.} \item{...}{One or more selector functions to choose which variables will be used to compute the components. See \code{\link{selections}} for more details.} \item{role}{For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new principal component columns created from the original variables will be used as predictors in a model.} \item{trained}{A logical to indicate if the quantities for preprocessing have been estimated.} \item{num}{The number of PCA components to retain as new predictors. If \code{num} is greater than the number of columns or the number of possible components, a smaller value will be used.} \item{threshold}{A fraction of the total variance that should be covered by the components. For example, \code{threshold = .75} means that \code{step_pca} should generate enough components to capture 75\% of the variability in the variables. Note: using this argument will override and reset any value given to \code{num}.} \item{options}{A list of options to the default method for \code{\link[stats]{prcomp}}. Argument defaults are set to \code{retx = FALSE}, \code{center = FALSE}, \code{scale. = FALSE}, and \code{tol = NULL}. \bold{Note} that the argument \code{x} should not be passed here (or at all).} \item{res}{The \code{\link[stats]{prcomp.default}} object is stored here once this preprocessing step has been trained by \code{\link{prep.recipe}}.} \item{prefix}{A character string that will be the prefix to the resulting new variables. See the notes below.} } \value{ An updated version of \code{recipe} with the new step added to the sequence of existing steps (if any). } \description{ \code{step_pca} creates a \emph{specification} of a recipe step that will convert numeric data into one or more principal components. } \details{ Principal component analysis (PCA) is a transformation of a group of variables that produces a new set of artificial features or components. These components are designed to capture the maximum amount of information (i.e. variance) in the original variables. Also, the components are statistically uncorrelated with one another. This means that they can be used to combat large inter-variable correlations in a data set. It is advisable to standardize the variables prior to running PCA. Given the default \code{options} above, this step does not center or scale the variables itself; standardization can be requested through the \code{options} argument or, as in the examples below, by preceding this step with \code{\link{step_center}} and \code{\link{step_scale}}. The argument \code{num} controls the number of components that will be retained (the original variables that are used to derive the components are removed from the data). The new components will have names that begin with \code{prefix} and a sequence of numbers. The variable names are padded with zeros. For example, if \code{num < 10}, their names will be \code{PC1} - \code{PC9}. If \code{num = 101}, the names would be \code{PC001} - \code{PC101}. Alternatively, \code{threshold} can be used to determine the number of components that are required to capture a specified fraction of the total variance in the variables.
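Conceptually, the number of components implied by a given \code{threshold} can be sketched with \code{\link[stats]{prcomp}} directly (a rough illustration rather than the exact internal code; \code{dat} stands in for the standardized training data): \preformatted{pc <- prcomp(dat)
var_frac <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
num <- which(var_frac >= threshold)[1]}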
} \examples{ rec <- recipe( ~ ., data = USArrests) pca_trans <- rec \%>\% step_center(all_numeric()) \%>\% step_scale(all_numeric()) \%>\% step_pca(all_numeric(), num = 3) pca_estimates <- prep(pca_trans, training = USArrests) pca_data <- bake(pca_estimates, USArrests) rng <- extendrange(c(pca_data$PC1, pca_data$PC2)) plot(pca_data$PC1, pca_data$PC2, xlim = rng, ylim = rng) with_thresh <- rec \%>\% step_center(all_numeric()) \%>\% step_scale(all_numeric()) \%>\% step_pca(all_numeric(), threshold = .99) with_thresh <- prep(with_thresh, training = USArrests) bake(with_thresh, USArrests) } \references{ Jolliffe, I. T. (2010). \emph{Principal Component Analysis}. Springer. } \seealso{ \code{\link{step_ica}} \code{\link{step_kpca}} \code{\link{step_isomap}} \code{\link{recipe}} \code{\link{prep.recipe}} \code{\link{bake.recipe}} } \concept{ preprocessing pca projection_methods } \keyword{datagen}