MatchIt/R/RcppExports.R
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

all_equal_to <- function(x, y) {
  .Call(`_MatchIt_all_equal_to`, x, y)
}

eucdistC_N1xN0 <- function(x, t) {
  .Call(`_MatchIt_eucdistC_N1xN0`, x, t)
}

get_splitsC <- function(x, caliper) {
  .Call(`_MatchIt_get_splitsC`, x, caliper)
}

has_n_unique <- function(x, n) {
  .Call(`_MatchIt_has_n_unique`, x, n)
}

nn_matchC_distmat <- function(treat_, ord, ratio, discarded, reuse_max, focal_, distance_mat, exact_ = NULL, caliper_dist_ = NULL, caliper_covs_ = NULL, caliper_covs_mat_ = NULL, antiexact_covs_ = NULL, unit_id_ = NULL, disl_prog = FALSE) {
  .Call(`_MatchIt_nn_matchC_distmat`, treat_, ord, ratio, discarded, reuse_max, focal_, distance_mat, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, disl_prog)
}

nn_matchC_distmat_closest <- function(treat, ratio, discarded, reuse_max, distance_mat, exact_ = NULL, caliper_dist_ = NULL, caliper_covs_ = NULL, caliper_covs_mat_ = NULL, antiexact_covs_ = NULL, unit_id_ = NULL, close = TRUE, disl_prog = FALSE) {
  .Call(`_MatchIt_nn_matchC_distmat_closest`, treat, ratio, discarded, reuse_max, distance_mat, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, close, disl_prog)
}

nn_matchC_mahcovs <- function(treat_, ord, ratio, discarded, reuse_max, focal_, mah_covs, distance_ = NULL, exact_ = NULL, caliper_dist_ = NULL, caliper_covs_ = NULL, caliper_covs_mat_ = NULL, antiexact_covs_ = NULL, unit_id_ = NULL, disl_prog = FALSE) {
  .Call(`_MatchIt_nn_matchC_mahcovs`, treat_, ord, ratio, discarded, reuse_max, focal_, mah_covs, distance_, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, disl_prog)
}

nn_matchC_mahcovs_closest <- function(treat, ratio, discarded, reuse_max, mah_covs, distance_ = NULL, exact_ = NULL, caliper_dist_ = NULL, caliper_covs_ = NULL, caliper_covs_mat_ = NULL, antiexact_covs_ = NULL, unit_id_ = NULL, close = TRUE, disl_prog = FALSE) {
  .Call(`_MatchIt_nn_matchC_mahcovs_closest`, treat, ratio, discarded, reuse_max, mah_covs, distance_, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, close, disl_prog)
}

nn_matchC_vec <- function(treat_, ord, ratio, discarded, reuse_max, focal_, distance, exact_ = NULL, caliper_dist_ = NULL, caliper_covs_ = NULL, caliper_covs_mat_ = NULL, antiexact_covs_ = NULL, unit_id_ = NULL, disl_prog = FALSE) {
  .Call(`_MatchIt_nn_matchC_vec`, treat_, ord, ratio, discarded, reuse_max, focal_, distance, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, disl_prog)
}

nn_matchC_vec_closest <- function(treat, ratio, discarded, reuse_max, distance, exact_ = NULL, caliper_dist_ = NULL, caliper_covs_ = NULL, caliper_covs_mat_ = NULL, antiexact_covs_ = NULL, unit_id_ = NULL, close = TRUE, disl_prog = FALSE) {
  .Call(`_MatchIt_nn_matchC_vec_closest`, treat, ratio, discarded, reuse_max, distance, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, close, disl_prog)
}

pairdistsubC <- function(x, t, s) {
  .Call(`_MatchIt_pairdistsubC`, x, t, s)
}

preprocess_matchC <- function(t, p) {
  .Call(`_MatchIt_preprocess_matchC`, t, p)
}

subclass2mmC <- function(subclass_, treat, focal) {
  .Call(`_MatchIt_subclass2mmC`, subclass_, treat, focal)
}

mm2subclassC <- function(mm, treat, focal = NULL) {
  .Call(`_MatchIt_mm2subclassC`, mm, treat, focal)
}

subclass_scootC <- function(subclass_, treat_, x_, min_n) {
  .Call(`_MatchIt_subclass_scootC`, subclass_, treat_, x_, min_n)
}

tabulateC <- function(bins, nbins = NULL) {
  .Call(`_MatchIt_tabulateC`, bins, nbins)
}

weights_matrixC <- function(mm, treat_, focal = NULL) {
  .Call(`_MatchIt_weights_matrixC`, mm, treat_, focal)
}

# Register entry points for exported C++ functions
methods::setLoadAction(function(ns) {
  .Call(`_MatchIt_RcppExport_registerCCallable`)
})
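The generated wrappers above are thin `.Call` bridges into the package's compiled C++ code. As a sketch of what two of the simpler helpers are assumed to compute, here are hypothetical pure-R stand-ins (the `_R`-suffixed names are illustrative only and are not part of the package):

```r
# Hypothetical pure-R stand-ins illustrating the assumed semantics of two of
# the simpler compiled helpers; not the package's actual functions.

all_equal_to_R <- function(x, y) {
  # TRUE when every element of x equals the scalar y
  all(x == y)
}

tabulateC_R <- function(bins, nbins = NULL) {
  # Count occurrences of the integers 1..nbins, like base::tabulate()
  if (is.null(nbins)) {
    nbins <- max(1L, bins, na.rm = TRUE)
  }
  tabulate(bins, nbins = nbins)
}

all_equal_to_R(c(2, 2, 2), 2)    # TRUE
tabulateC_R(c(1L, 2L, 2L, 3L))   # 1 2 1
```

The compiled versions exist for speed on large inputs; the R versions are only meant to clarify the interface.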
MatchIt/R/matchit2subclass.R

#' Subclassification
#' @name method_subclass
#' @aliases method_subclass
#' @usage NULL
#'
#' @description
#' In [matchit()], setting `method = "subclass"` performs
#' subclassification on the distance measure (i.e., propensity score).
#' Treatment and control units are placed into subclasses based on quantiles of
#' the propensity score in the treated group, in the control group, or overall,
#' depending on the desired estimand. Weights are computed based on the
#' proportion of treated units in each subclass. Subclassification as implemented
#' here does not rely on any other package.
#'
#' This page details the allowable arguments with `method = "subclass"`.
#' See [matchit()] for an explanation of what each argument means in a general
#' context and how it can be specified.
#'
#' Below is how `matchit()` is used for subclassification:
#' \preformatted{
#' matchit(formula,
#'         data = NULL,
#'         method = "subclass",
#'         distance = "glm",
#'         link = "logit",
#'         distance.options = list(),
#'         estimand = "ATT",
#'         discard = "none",
#'         reestimate = FALSE,
#'         s.weights = NULL,
#'         verbose = FALSE,
#'         ...) }
#'
#' @param formula a two-sided [formula] object containing the treatment and
#' covariates to be used in creating the distance measure used in the
#' subclassification.
#' @param data a data frame containing the variables named in `formula`.
#' If not found in `data`, the variables will be sought in the
#' environment.
#' @param method set here to `"subclass"`.
#' @param distance the distance measure to be used. See [`distance`]
#' for allowable options. Must be a vector of distance scores or the name of a
#' method of estimating propensity scores.
#' @param link when `distance` is specified as a string, an additional
#' argument controlling the link function used in estimating the distance
#' measure. See [`distance`] for allowable options with each option.
#' @param distance.options a named list containing additional arguments
#' supplied to the function that estimates the distance measure as determined
#' by the argument to `distance`.
#' @param estimand the target `estimand`. If `"ATT"`, the default,
#' subclasses are formed based on quantiles of the distance measure in the
#' treated group; if `"ATC"`, subclasses are formed based on quantiles of
#' the distance measure in the control group; if `"ATE"`, subclasses are
#' formed based on quantiles of the distance measure in the full sample. The
#' estimand also controls how the subclassification weights are computed; see
#' the Computing Weights section at [matchit()] for details.
#' @param discard a string containing a method for discarding units outside a
#' region of common support.
#' @param reestimate if `discard` is not `"none"`, whether to
#' re-estimate the propensity score in the remaining sample prior to
#' subclassification.
#' @param s.weights the variable containing sampling weights to be incorporated
#' into propensity score models and balance statistics.
#' @param verbose `logical`; whether information about the matching
#' process should be printed to the console.
#' @param \dots additional arguments that control the subclassification:
#' \describe{
#' \item{`subclass`}{either the number of subclasses desired
#' or a vector of quantiles used to divide the distance measure into
#' subclasses. Default is 6.}
#' \item{`min.n`}{the minimum number of
#' units of each treatment group that are to be assigned to each subclass. If the
#' distance measure is divided in such a way that fewer than `min.n` units
#' of a treatment group are assigned a given subclass, units from other
#' subclasses will be reassigned to fill the deficient subclass. Default is 1.}
#' }
#'
#' The arguments `exact`, `mahvars`, `replace`, `m.order`, `caliper` (and
#' related arguments), and `ratio` are ignored with a warning.
#'
#' @section Outputs:
#'
#' All outputs described in [matchit()] are returned with
#' `method = "subclass"` except that `match.matrix` is excluded and
#' one additional component, `q.cut`, is included, containing a vector of
#' the distance measure cutpoints used to define the subclasses. Note that when
#' `min.n > 0`, the subclass assignments may not strictly obey the
#' quantiles listed in `q.cut`. `include.obj` is ignored.
#'
#' @details
#' After subclassification, effect estimates can be computed separately in the
#' subclasses and combined, or a single marginal effect can be estimated by
#' using the weights in the full sample. When using the weights, the method is
#' sometimes referred to as marginal mean weighting through stratification
#' (MMWS; Hong, 2010) or fine stratification weighting (Desai et al., 2017).
#' The weights can be interpreted just like inverse probability weights. See
#' `vignette("estimating-effects")` for details.
#'
#' Changing `min.n` can change the quality of the weights. Generally, a
#' low `min.n` will yield better balance because subclasses only contain
#' units with relatively similar distance values, but may yield higher variance
#' because extreme weights can occur due to there being few members of a
#' treatment group in some subclasses. When `min.n = 0`, some subclasses may
#' fail to contain units from both treatment groups, in which case all units
#' in such subclasses will be dropped.
#'
#' Note that subclassification weights can also be estimated using
#' *WeightIt*, which provides some additional methods for estimating
#' propensity scores. Where propensity score-estimation methods overlap, both
#' packages will yield the same weights.
#'
#' @seealso [matchit()] for a detailed explanation of the inputs and outputs of
#' a call to `matchit()`.
#'
#' [`method_full`] for optimal full matching and [`method_quick`] for
#' generalized full matching, which are similar to subclassification except
#' that the number of subclasses and subclass membership are chosen to
#' optimize the within-subclass distance.
#'
#' @references In a manuscript, you don't need to cite another package when
#' using `method = "subclass"` because the subclassification is performed
#' completely within *MatchIt*. For example, a sentence might read:
#'
#' *Propensity score subclassification was performed using the MatchIt
#' package (Ho, Imai, King, & Stuart, 2011) in R.*
#'
#' It may be a good idea to cite Hong (2010) or Desai et al. (2017) if the
#' treatment effect is estimated using the subclassification weights.
#'
#' Desai, R. J., Rothman, K. J., Bateman, B. T., Hernandez-Diaz, S., &
#' Huybrechts, K. F. (2017). A Propensity-score-based Fine Stratification
#' Approach for Confounding Adjustment When Exposure Is Infrequent.
#' *Epidemiology*, 28(2), 249–257. \doi{10.1097/EDE.0000000000000595}
#'
#' Hong, G. (2010). Marginal mean weighting through stratification: Adjustment
#' for selection bias in multilevel data. *Journal of Educational and
#' Behavioral Statistics*, 35(5), 499–531. \doi{10.3102/1076998609359785}
#'
#' @examples
#'
#' data("lalonde")
#'
#' # PS subclassification for the ATT with 7 subclasses
#' s.out1 <- matchit(treat ~ age + educ + race + nodegree +
#'                     married + re74 + re75,
#'                   data = lalonde,
#'                   method = "subclass",
#'                   subclass = 7)
#' s.out1
#' summary(s.out1, subclass = TRUE)
#'
#' # PS subclassification for the ATE with 10 subclasses
#' # and at least 2 units in each group per subclass
#' s.out2 <- matchit(treat ~ age + educ + race + nodegree +
#'                     married + re74 + re75,
#'                   data = lalonde,
#'                   method = "subclass",
#'                   subclass = 10,
#'                   estimand = "ATE",
#'                   min.n = 2)
#' s.out2
#' summary(s.out2)
#'
NULL

matchit2subclass <- function(treat, distance, discarded,
                             replace = FALSE, exact = NULL,
                             estimand = "ATT", verbose = FALSE,
                             subclass = 6L, min.n = 1L, ...) {

  .cat_verbose("Subclassifying...\n", verbose = verbose)

  #Checks
  chk::chk_numeric(subclass)
  if (length(subclass) == 1L) {
    chk::chk_gt(subclass, 1)
  }
  else if (any(subclass > 1) || any(subclass < 0)) {
    .err("when specifying `subclass` as a vector of quantiles, all values must be between 0 and 1")
  }

  if (is_not_null(...get("sub.by"))) {
    .err("`sub.by` is defunct and has been replaced with `estimand`")
  }

  estimand <- toupper(estimand)
  estimand <- match_arg(estimand, c("ATT", "ATC", "ATE"))

  chk::chk_count(min.n)

  ## Setting Cut Points
  if (length(subclass) == 1L) {
    sprobs <- seq(0, 1, length.out = round(subclass) + 1)
  }
  else {
    sprobs <- sort(subclass)
    if (sprobs[1] != 0) sprobs <- c(0, sprobs)
    if (sprobs[length(sprobs)] != 1) sprobs <- c(sprobs, 1)
    subclass <- length(sprobs) - 1L
  }

  qu <- switch(estimand,
               "ATT" = quantile(distance[treat == 1], probs = sprobs, na.rm = TRUE),
               "ATC" = quantile(distance[treat == 0], probs = sprobs, na.rm = TRUE),
               quantile(distance, probs = sprobs, na.rm = TRUE))

  ## Calculating Subclasses
  psclass <- rep_with(NA_integer_, treat)
  psclass[!discarded] <- as.integer(findInterval(distance[!discarded], qu, all.inside = TRUE))

  if (!has_n_unique(na.omit(psclass), subclass)) {
    .wrn("due to discreteness in the distance measure, fewer subclasses were generated than were requested")
  }

  if (min.n == 0) {
    ## If any subclasses are missing treated or control units, set all to NA
    is.na(psclass)[!discarded & !psclass %in% intersect(psclass[!discarded & treat == 1],
                                                        psclass[!discarded & treat == 0])] <- TRUE
  }
  else {
    ## If any subclasses don't have members of a treatment group, fill them
    ## by "scooting" units from nearby subclasses until each subclass has a unit
    ## from each treatment group
    psclass[!discarded] <- subclass_scoot(psclass[!discarded], treat[!discarded],
                                          distance[!discarded], min.n)
  }

  psclass <- setNames(factor(psclass, nmax = length(qu)), names(treat))
  levels(psclass) <- as.character(seq_len(nlevels(psclass)))

  .cat_verbose("Calculating matching weights... ", verbose = verbose)

  res <- list(subclass = psclass,
              q.cut = qu,
              weights = get_weights_from_subclass(psclass, treat, estimand))

  .cat_verbose("Done.\n", verbose = verbose)

  class(res) <- c("matchit.subclass", "matchit")
  res
}

MatchIt/R/dist_functions.R

#' Compute a Distance Matrix
#' @name mahalanobis_dist
#'
#' @description
#' The functions compute a distance matrix, either for a single dataset (i.e.,
#' the distances between all pairs of units) or for two groups defined by a
#' splitting variable (i.e., the distances between all units in one group and
#' all units in the other). These distance matrices include the Mahalanobis
#' distance, Euclidean distance, scaled Euclidean distance, and robust
#' (rank-based) Mahalanobis distance. These functions can be used as inputs to
#' the `distance` argument to [matchit()] and are used to compute the
#' corresponding distance matrices within `matchit()` when named.
#'
#' @aliases euclidean_dist scaled_euclidean_dist mahalanobis_dist
#' robust_mahalanobis_dist
#'
#' @param formula a formula with the treatment (i.e., splitting variable) on
#' the left side and the covariates used to compute the distance matrix on the
#' right side. If there is no left-hand-side variable, the distances will be
#' computed between all pairs of units. If `NULL`, all the variables in
#' `data` will be used as covariates.
#' @param data a data frame containing the variables named in `formula`.
#' If `formula` is `NULL`, all variables in `data` will be used
#' as covariates.
#' @param s.weights when `var = NULL`, an optional vector of sampling
#' weights used to compute the variances used in the Mahalanobis, scaled
#' Euclidean, and robust Mahalanobis distances.
#' @param var for `mahalanobis_dist()`, a covariance matrix used to scale
#' the covariates. For `scaled_euclidean_dist()`, either a covariance
#' matrix (from which only the diagonal elements will be used) or a vector of
#' variances used to scale the covariates. If `NULL`, these values will be
#' calculated using formulas described in Details.
#' @param discarded a `logical` vector denoting which units are to be
#' discarded or not. This is used only when `var = NULL`. The scaling
#' factors will be computed only using the non-discarded units, but the
#' distance matrix will be computed for all units (discarded and
#' non-discarded).
#' @param \dots ignored. Included to make cycling through these functions
#' easier without having to change the arguments supplied.
#'
#' @return A numeric distance matrix. When `formula` has a left-hand-side
#' (treatment) variable, the matrix will have one row for each treated unit and
#' one column for each control unit. Otherwise, the matrix will have one row
#' and one column for each unit.
#'
#' @details
#' The **Euclidean distance** (computed using `euclidean_dist()`) is
#' the raw distance between units, computed as \deqn{d_{ij} = \sqrt{(x_i -
#' x_j)(x_i - x_j)'}} where \eqn{x_i} and \eqn{x_j} are vectors of covariates
#' for units \eqn{i} and \eqn{j}, respectively. The Euclidean distance is
#' sensitive to the scales of the variables and their redundancy (i.e.,
#' correlation). It should probably not be used for matching unless all of the
#' variables have been previously scaled appropriately or are already on the
#' same scale. It forms the basis of the other distance measures.
#'
#' The **scaled Euclidean distance** (computed using
#' `scaled_euclidean_dist()`) is the Euclidean distance computed on the
#' scaled covariates. Typically the covariates are scaled by dividing by their
#' standard deviations, but any scaling factor can be supplied using the
#' `var` argument. This leads to a distance measure computed as
#' \deqn{d_{ij} = \sqrt{(x_i - x_j)S_d^{-1}(x_i - x_j)'}} where \eqn{S_d} is a
#' diagonal matrix with the squared scaling factors on the diagonal. Although
#' this measure is not sensitive to the scales of the variables (because they
#' are all placed on the same scale), it is still sensitive to redundancy among
#' the variables. For example, if 5 variables measure approximately the same
#' construct (i.e., are highly correlated) and 1 variable measures another
#' construct, the first construct will have 5 times as much influence on the
#' distance between units as the second construct. The Mahalanobis distance
#' attempts to address this issue.
#'
#' The **Mahalanobis distance** (computed using `mahalanobis_dist()`)
#' is computed as \deqn{d_{ij} = \sqrt{(x_i - x_j)S^{-1}(x_i - x_j)'}} where
#' \eqn{S} is a scaling matrix, typically the covariance matrix of the
#' covariates. It is essentially equivalent to the Euclidean distance computed
#' on the scaled principal components of the covariates. This is the most
#' popular distance matrix for matching because it is not sensitive to the
#' scale of the covariates and accounts for redundancy between them. The
#' scaling matrix can also be supplied using the `var` argument.
#'
#' The Mahalanobis distance can be sensitive to outliers and long-tailed or
#' otherwise non-normally distributed covariates and may not perform well with
#' categorical variables due to prioritizing rare categories over common ones.
#' One solution is the rank-based **robust Mahalanobis distance**
#' (computed using `robust_mahalanobis_dist()`), which is computed by
#' first replacing the covariates with their ranks (using average ranks for
#' ties) and rescaling each ranked covariate by a constant scaling factor
#' before computing the usual Mahalanobis distance on the rescaled ranks.
#'
#' The Mahalanobis distance and its robust variant are computed internally by
#' transforming the covariates in such a way that the Euclidean distance
#' computed on the scaled covariates is equal to the requested distance. For
#' the Mahalanobis distance, this involves replacing the covariates vector
#' \eqn{x_i} with \eqn{x_iS^{-.5}}, where \eqn{S^{-.5}} is the Cholesky
#' decomposition of the (generalized) inverse of the covariance matrix \eqn{S}.
#'
#' When a left-hand-side splitting variable is present in `formula` and
#' `var = NULL` (i.e., so that the scaling matrix is computed internally),
#' the covariance matrix used is the "pooled" covariance matrix, which
#' essentially is a weighted average of the covariance matrices computed
#' separately within each level of the splitting variable to capture
#' within-group variation and reduce sensitivity to covariate imbalance. This
#' is also true of the scaling factors used in the scaled Euclidean distance.
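The rank-based construction described above can be sketched in a few lines of base R. This is an illustrative sketch only; the package's internal implementation also handles sampling weights, discarded units, and near-singular covariance matrices:

```r
# Sketch of the robust (rank-based) Mahalanobis distance construction:
# replace covariates with average ranks, rescale the rank covariance so each
# column has the spread of an untied ranking 1..n, then apply the usual
# Mahalanobis formula to the ranks.
set.seed(1)
X <- cbind(a = rexp(50), b = rnorm(50))   # one long-tailed covariate

X_r <- apply(X, 2L, rank)                 # average ranks for ties
S_r <- cov(X_r)

mult <- sd(seq_len(nrow(X))) / sqrt(diag(S_r))
S_r <- S_r * outer(mult, mult)

# Squared robust Mahalanobis distances of each unit from the centroid
d2 <- stats::mahalanobis(X_r, colMeans(X_r), S_r)
```

Because the ranks bound the influence of any single extreme value, the resulting distances are far less sensitive to the long tail of `a` than the ordinary Mahalanobis distance on `X` would be.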
#'
#'
#' @author Noah Greifer
#' @seealso [`distance`], [matchit()], [dist()] (which is used
#' internally to compute some Euclidean distances)
#'
#' \pkgfun{optmatch}{match_on}, which provides similar functionality but with
#' fewer options and a focus on efficient storage of the output.
#'
#' @references
#'
#' Rosenbaum, P. R. (2010). *Design of observational studies*. Springer.
#'
#' Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a Control Group Using
#' Multivariate Matched Sampling Methods That Incorporate the Propensity Score.
#' *The American Statistician*, 39(1), 33–38. \doi{10.2307/2683903}
#'
#' Rubin, D. B. (1980). Bias Reduction Using Mahalanobis-Metric Matching.
#' *Biometrics*, 36(2), 293–298. \doi{10.2307/2529981}
#'
#' @examples
#'
#' data("lalonde")
#'
#' # Computing the scaled Euclidean distance between all units:
#' d <- scaled_euclidean_dist(~ age + educ + race + married,
#'                            data = lalonde)
#'
#' # Another interface using the data argument:
#' dat <- subset(lalonde, select = c(age, educ, race, married))
#' d <- scaled_euclidean_dist(data = dat)
#'
#' # Computing the Mahalanobis distance between treated and
#' # control units:
#' d <- mahalanobis_dist(treat ~ age + educ + race + married,
#'                       data = lalonde)
#'
#' # Supplying a covariance matrix or vector of variances (note:
#' # a bit more complicated with factor variables)
#' dat <- subset(lalonde, select = c(age, educ, married, re74))
#' vars <- sapply(dat, var)
#'
#' d <- scaled_euclidean_dist(data = dat, var = vars)
#'
#' # Same result:
#' d <- scaled_euclidean_dist(data = dat, var = diag(vars))
#'
#' # Discard units:
#' discard <- sample(c(TRUE, FALSE), nrow(lalonde),
#'                   replace = TRUE, prob = c(.2, .8))
#'
#' d <- mahalanobis_dist(treat ~ age + educ + race + married,
#'                       data = lalonde, discarded = discard)
#' dim(d) #all units present in distance matrix
#' table(lalonde$treat)
#'

#Functions to compute distance matrices
#' @export
mahalanobis_dist <- function(formula = NULL, data =
                               NULL, s.weights = NULL, var = NULL,
                             discarded = NULL, ...) {
  X <- transform_covariates(formula, data,
                            method = "mahalanobis",
                            s.weights = s.weights, var = var,
                            discarded = discarded)
  eucdist_internal(X, attr(X, "treat"))
}

#' @export
#' @rdname mahalanobis_dist
scaled_euclidean_dist <- function(formula = NULL, data = NULL,
                                  s.weights = NULL, var = NULL,
                                  discarded = NULL, ...) {
  X <- transform_covariates(formula, data = data,
                            method = "scaled_euclidean",
                            s.weights = s.weights, var = var,
                            discarded = discarded)
  eucdist_internal(X, attr(X, "treat"))
}

#' @export
#' @rdname mahalanobis_dist
robust_mahalanobis_dist <- function(formula = NULL, data = NULL,
                                    s.weights = NULL, discarded = NULL, ...) {
  X <- transform_covariates(formula, data = data,
                            method = "robust_mahalanobis",
                            s.weights = s.weights,
                            discarded = discarded)
  eucdist_internal(X, attr(X, "treat"))
}

#' @export
#' @rdname mahalanobis_dist
euclidean_dist <- function(formula = NULL, data = NULL, ...) {
  X <- transform_covariates(formula, data = data,
                            method = "euclidean")
  eucdist_internal(X, attr(X, "treat"))
}

#Transforms covariates so that Euclidean distance computed on transformed
#covariates is equivalent to requested distance. When discarded is not NULL,
#statistics relevant to the transformation are computed using retained units,
#but the full covariate matrix is returned.
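The equivalence described above (plain Euclidean distance on transformed covariates equals the requested Mahalanobis distance on the originals) can be checked directly. A minimal base-R sketch, assuming a well-conditioned covariance matrix:

```r
# Transform X by the inverse Cholesky factor of its covariance; Euclidean
# distances on the transformed data then equal Mahalanobis distances on the
# original data.
set.seed(2)
X <- matrix(rnorm(40), ncol = 2L)
S <- cov(X)

X_t <- X %*% solve(chol(S))      # x_i %*% S^{-1/2}
d_euc <- as.matrix(dist(X_t))    # Euclidean on transformed covariates

# Mahalanobis distance between rows 1 and 2 of the original data
d_mah <- sqrt(stats::mahalanobis(X[1L, , drop = FALSE], X[2L, ], S))

all.equal(unname(d_euc[1L, 2L]), unname(d_mah))   # TRUE
```

Doing the transformation once and reusing fast Euclidean-distance routines is much cheaper than evaluating the quadratic form for every pair of units.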
transform_covariates <- function(formula = NULL, data = NULL, method = "mahalanobis", s.weights = NULL, var = NULL, treat = NULL, discarded = NULL) { X <- get_covs_matrix_for_dist(formula, data) X <- .check_X(X) treat <- check_treat(treat, X) #If allvariables have no variance, use Euclidean to avoid errors #If some have no variance, removes those to avoid messing up distances no_variance <- which(apply(X, 2L, function(x) abs(max(x) - min(x)) < sqrt(.Machine$double.eps))) if (length(no_variance) == ncol(X)) { method <- "euclidean" X <- X[, 1L, drop = FALSE] } else if (is_not_null(no_variance)) { X <- X[, -no_variance, drop = FALSE] } method <- match_arg(method, matchit_distances()) if (is_null(discarded)) { discarded <- rep.int(FALSE, nrow(X)) } if (method == "mahalanobis") { # X <- sweep(X, 2, colMeans(X)) if (is_null(var) && is_null(treat) && is.null(s.weights)) { # https://stats.stackexchange.com/a/81691/116195 dn <- dimnames(X) X <- svd(sweep(X, 2L, colMeans(X)), nv = 0)$u * sqrt(nrow(X) - 1) dimnames(X) <- dn return(X) } if (is_null(var)) { X <- scale(X) #NOTE: optmatch and Rubin (1980) use pooled within-group covariance matrix var <- { if (is_not_null(treat)) pooled_cov(X[!discarded, , drop = FALSE], treat[!discarded], s.weights[!discarded]) else if (is_null(s.weights)) cov(X[!discarded, , drop = FALSE]) else cov.wt(X[!discarded, , drop = FALSE], s.weights[!discarded])$cov } } else if (!is.cov_like(var)) { .err("if `var` is not `NULL`, it must be a covariance matrix with as many entries as supplied variables") } inv_var <- NULL d <- det(var) if (d > 1e-8) { inv_var <- try(solve(var), silent = TRUE) } if (d <= 1e-8 || inherits(inv_var, "try-error")) { inv_var <- generalized_inverse(var) } X <- mahalanobize(X, inv_var) } else if (method == "robust_mahalanobis") { #Rosenbaum (2010, ch8) X_r <- matrix(0, nrow = sum(!discarded), ncol = ncol(X), dimnames = list(rownames(X)[!discarded], colnames(X))) for (i in seq_len(ncol(X_r))) { X_r[, i] <- rank(X[!discarded, i]) 
} var_r <- { if (is_null(s.weights)) cov(X_r) else cov.wt(X_r, s.weights[!discarded])$cov } multiplier <- sd(seq_len(sum(!discarded))) / sqrt(diag(var_r)) var_r <- var_r * outer(multiplier, multiplier, "*") inv_var <- NULL d <- det(var_r) if (d > 1e-8) { inv_var <- try(solve(var_r), silent = TRUE) } if (d <= 1e-8 || inherits(inv_var, "try-error")) { inv_var <- generalized_inverse(var_r) } if (any(discarded)) { X_r <- array(0, dim = dim(X), dimnames = dimnames(X)) for (i in seq_len(ncol(X_r))) { X_r[!discarded, i] <- rank(X[!discarded, i]) } } X <- mahalanobize(X_r, inv_var) } else if (method == "scaled_euclidean") { if (is_null(var)) { if (is_not_null(treat)) { sds <- pooled_sd(X[!discarded, , drop = FALSE], treat[!discarded], s.weights[!discarded]) } else { sds <- sqrt(apply(X[!discarded, , drop = FALSE], 2L, wvar, w = s.weights)) } } else if (is.cov_like(var, X)) { sds <- sqrt(diag(var)) } else if (is.numeric(var) && is.cov_like(diag(var), X)) { sds <- sqrt(var) } else { .err("if `var` is not `NULL`, it must be a covariance matrix or a vector of variances with as many entries as supplied variables") } for (i in seq_len(ncol(X))) { X[, i] <- X[, i] / sds[i] } } else if (method == "euclidean") { #Do nothing } attr(X, "treat") <- treat X } #Internal function for fast(ish) Euclidean distance eucdist_internal <- function(X, treat = NULL) { if (is_null(treat)) { d <- { if (NCOL(X) == 1L) abs(outer(drop(X), drop(X), "-")) else as.matrix(dist(X)) } dimnames(d) <- list(rownames(X), rownames(X)) } else if (!isTRUE(attr(treat, "type") == "multi")) { treat_l <- as.logical(treat) d <- { if (NCOL(X) == 1L) abs(outer(X[treat_l], X[!treat_l], "-")) else eucdistC_N1xN0(X, as.integer(treat)) } dimnames(d) <- list(rownames(X)[treat_l], rownames(X)[!treat_l]) } else { stop("`eucdist_internal()` cannot use a multi-category treat.") } d } #Get covariates (RHS) vars from formula; factor variable contrasts divided by sqrt(2) #to ensure same result as when non-factor binary variable 
#supplied (see optmatch:::contr.match_on)
get_covs_matrix_for_dist <- function(formula = NULL, data = NULL) {
  if (is_null(formula)) {
    if (is_null(colnames(data))) {
      colnames(data) <- paste0("X", seq_len(ncol(data)))
    }

    fnames <- colnames(data)
    fnames[!startsWith(fnames, "`")] <- add_quotes(fnames[!startsWith(fnames, "`")], "`")

    data <- as.data.frame(data)

    formula <- reformulate(fnames)
  }
  else {
    data <- as.data.frame(data)
  }

  formula <- terms(formula, data = data)

  if (rlang::is_formula(formula, lhs = FALSE)) {
    formula <- update(formula, ~ . + 1)
  }
  else {
    formula <- update(formula, . ~ . + 1)
  }

  mf <- model.frame(formula, data, na.action = na.pass)

  chars.in.mf <- vapply(mf, is.character, logical(1L))
  mf[chars.in.mf] <- lapply(mf[chars.in.mf], factor)

  X <- model.matrix(formula, data = mf,
                    contrasts.arg = lapply(Filter(is.factor, mf),
                                           function(x) contrasts(x, contrasts = FALSE) / sqrt(2)))

  if (ncol(X) > 1L) {
    assign <- attr(X, "assign")[-1L]
    X <- X[, -1L, drop = FALSE]
    attr(X, "assign") <- assign
  }

  attr(X, "treat") <- model.response(mf)

  X
}

.check_X <- function(X) {
  if (isTRUE(attr(X, "checked"))) {
    return(X)
  }

  treat <- attr(X, "treat")

  if (is.data.frame(X)) {
    X <- as.matrix(X)
  }
  else if (is.numeric(X) && is_null(dim(X))) {
    X <- matrix(X, nrow = length(X),
                dimnames = list(names(X), NULL))
  }

  chk::chk_not_any_na(X, "the covariates")

  if (!all(is.finite(X))) {
    .err("non-finite values are not allowed in the covariates")
  }

  if (!is.numeric(X) || length(dim(X)) != 2L) {
    .err("the covariates must be formatted as a numeric matrix")
  }

  attr(X, "checked") <- TRUE
  attr(X, "treat") <- treat

  X
}

is.cov_like <- function(var, X) {
  is.numeric(var) &&
    length(dim(var)) == 2L &&
    (missing(X) || all(dim(var) == ncol(X))) &&
    isSymmetric(var) &&
    all(diag(var) >= 0)
}

matchit_distances <- function() {
  c("mahalanobis", "robust_mahalanobis", "euclidean", "scaled_euclidean")
}

mahalanobize <- function(X, inv_var) {
  ## Mahalanobize covariates by computing cholesky decomp,
  ## allowing for NPD cov matrix by pivoting
  ch <- suppressWarnings(chol(inv_var,
                              pivot = TRUE))
  p <- order(attr(ch, "pivot"))
  # r <- seq_len(attr(ch, "rank"))

  tcrossprod(X, ch[, p, drop = FALSE])
}

## MatchIt/R/match.qoi.R

## Functions to calculate summary stats
bal1var <- function(xx, tt, ww = NULL, s.weights, subclass = NULL, mm = NULL,
                    s.d.denom = "treated", standardize = FALSE,
                    compute.pair.dist = TRUE) {
  un <- is_null(ww)
  bin.var <- all(xx == 0 | xx == 1)

  xsum <- rep.int(NA_real_, 7L)

  if (standardize)
    names(xsum) <- c("Means Treated", "Means Control", "Std. Mean Diff.",
                     "Var. Ratio", "eCDF Mean", "eCDF Max", "Std. Pair Dist.")
  else
    names(xsum) <- c("Means Treated", "Means Control", "Mean Diff.",
                     "Var. Ratio", "eQQ Mean", "eQQ Max", "Pair Dist.")

  if (un) ww <- s.weights else ww <- ww * s.weights

  i1 <- which(tt == 1)
  i0 <- which(tt == 0)

  too.small <- sum(ww[i1] != 0) < 2 && sum(ww[i0] != 0) < 2

  xsum["Means Treated"] <- wm(xx[i1], ww[i1], na.rm = TRUE)
  xsum["Means Control"] <- wm(xx[i0], ww[i0], na.rm = TRUE)

  mdiff <- xsum["Means Treated"] - xsum["Means Control"]

  if (standardize && abs(mdiff) > sqrt(.Machine$double.eps)) {
    if (!too.small) {
      if (is.numeric(s.d.denom)) {
        std <- s.d.denom
      }
      else {
        s.d.denom <- match_arg(s.d.denom, c("treated", "control", "pooled"))

        std <- switch(s.d.denom,
                      "treated" = sqrt(wvar(xx[i1], bin.var, s.weights[i1])),
                      "control" = sqrt(wvar(xx[i0], bin.var, s.weights[i0])),
                      "pooled" = pooled_sd(xx, tt, w = s.weights,
                                           bin.var = bin.var,
                                           contribution = "equal"))

        #Avoid divide by zero
        if (!is.finite(std) || std < sqrt(.Machine$double.eps)) {
          std <- pooled_sd(xx, tt, w = s.weights, bin.var = bin.var)
        }
      }

      xsum[3L] <- mdiff / std

      if (!un && compute.pair.dist) {
        xsum[7L] <- pair.dist(xx, tt, subclass, mm, std)
      }
    }
  }
  else {
    xsum[3L] <- mdiff

    if (!un && compute.pair.dist) {
      xsum[7L] <- pair.dist(xx, tt, subclass, mm)
    }
  }

  if (bin.var) {
    xsum[5L:6L] <- abs(mdiff)
  }
  else if (!too.small) {
    xsum["Var. Ratio"] <- wvar(xx[i1], bin.var, ww[i1]) / wvar(xx[i0], bin.var, ww[i0])

    qqmat <- qqsum(xx, tt, ww, standardize = standardize)
    xsum[5L:6L] <- qqmat[c("meandiff", "maxdiff")]
  }

  xsum
}

bal1var.subclass <- function(xx, tt, s.weights, subclass, s.d.denom = "treated",
                             standardize = FALSE, which.subclass = NULL) {
  #Within-subclass balance statistics
  bin.var <- all(xx == 0 | xx == 1)

  in.sub <- !is.na(subclass) & subclass == which.subclass

  xsum <- matrix(NA_real_, nrow = 1L, ncol = 6L)
  rownames(xsum) <- "Subclass"

  if (standardize)
    colnames(xsum) <- c("Means Treated", "Means Control", "Std. Mean Diff.",
                        "Var. Ratio", "eCDF Mean", "eCDF Max")
  else
    colnames(xsum) <- c("Means Treated", "Means Control", "Mean Diff",
                        "Var. Ratio", "eQQ Mean", "eQQ Max")

  i1 <- which(in.sub & tt == 1)
  i0 <- which(in.sub & tt == 0)

  too.small <- length(i1) < 2L && length(i0) < 2L

  xsum["Subclass", "Means Treated"] <- wm(xx[i1], s.weights[i1], na.rm = TRUE)
  xsum["Subclass", "Means Control"] <- wm(xx[i0], s.weights[i0], na.rm = TRUE)

  mdiff <- xsum["Subclass", "Means Treated"] - xsum["Subclass", "Means Control"]

  if (standardize && abs(mdiff) > 1e-8) {
    if (!too.small) {
      if (is.numeric(s.d.denom)) {
        std <- s.d.denom
      }
      else {
        #SD from full sample, not within subclass
        s.d.denom <- match_arg(s.d.denom, c("treated", "control", "pooled"))

        std <- switch(s.d.denom,
                      "treated" = sqrt(wvar(xx[i1], bin.var, s.weights[i1])),
                      "control" = sqrt(wvar(xx[i0], bin.var, s.weights[i0])),
                      "pooled" = pooled_sd(xx, tt, w = s.weights,
                                           bin.var = bin.var,
                                           contribution = "equal"))

        #Avoid divide by zero
        if (!is.finite(std) || std < sqrt(.Machine$double.eps)) {
          std <- pooled_sd(xx, tt, w = s.weights, bin.var = bin.var)
        }
      }

      xsum["Subclass", 3L] <- mdiff / std
    }
  }
  else {
    xsum["Subclass", 3L] <- mdiff
  }

  if (bin.var) {
    xsum["Subclass", 5L:6L] <- abs(mdiff)
  }
  else if (!too.small) {
    xsum["Subclass", "Var. Ratio"] <- wvar(xx[i1], bin.var, s.weights[i1]) / wvar(xx[i0], bin.var, s.weights[i0])

    qqall <- qqsum(xx[in.sub], tt[in.sub], standardize = standardize)
    xsum["Subclass", 5L:6L] <- qqall[c("meandiff", "maxdiff")]
  }

  xsum
}

#Compute within-pair/subclass distances
pair.dist <- function(xx, tt, subclass = NULL, mm = NULL, std = NULL) {
  if (is_not_null(subclass)) {
    mpdiff <- pairdistsubC(as.numeric(xx), as.integer(tt), as.integer(subclass))
  }
  else if (is_not_null(mm)) {
    names(xx) <- names(tt)

    xx_t <- xx[rownames(mm)]

    xx_c <- matrix(0, nrow = nrow(mm), ncol = ncol(mm))
    xx_c[] <- xx[mm]

    mpdiff <- mean(abs(xx_t - xx_c), na.rm = TRUE)
  }
  else {
    return(NA_real_)
  }

  if (is_not_null(std) && abs(mpdiff) > 1e-8) {
    return(mpdiff / std)
  }

  mpdiff
}

## Function for QQ summary stats
qqsum <- function(x, t, w = NULL, standardize = FALSE) {
  #x = variable, t = treat, w = weights

  n.obs <- length(x)

  if (is_null(w)) {
    w <- rep.int(1, n.obs)
  }

  if (has_n_unique(x, 2L) && all(x == 0 | x == 1)) {
    t1 <- which(t == t[1L])

    #For binary variables, just difference in means
    ediff <- abs(wm(x[t1], w[t1]) - wm(x[-t1], w[-t1]))

    return(c(meandiff = ediff, maxdiff = ediff))
  }

  w <- .make_sum_to_1(w, by = t)

  ord <- order(x)
  x_ord <- x[ord]
  w_ord <- w[ord]
  t_ord <- t[ord]

  t1 <- which(t_ord == t_ord[1L])

  if (standardize) {
    #Difference between ecdf of x for each group
    w_ord_ <- w_ord
    w_ord_[t1] <- -w_ord_[t1]

    ediff <- abs(cumsum(w_ord_))[c(diff1(x_ord) != 0, TRUE)]
  }
  else {
    #Horizontal distance of ecdf between groups
    #Need to interpolate larger group to be same size as smaller group
    u <- unique(x_ord)

    w1 <- w_ord[t1]
    w0 <- w_ord[-t1]

    x1 <- x_ord[t1][w1 > 0]
    x0 <- x_ord[-t1][w0 > 0]

    w1 <- w1[w1 > 0]
    w0 <- w0[w0 > 0]

    wn1 <- length(w1)
    wn0 <- length(w0)

    if (wn1 < wn0) {
      if (length(u) <= 5) {
        x0probs <- vapply(u, function(u_) wm(x0 == u_, w0), numeric(1L))
        x0cumprobs <- c(0, .cumsum_prob(x0probs))
        x0 <- u[findInterval(.cumsum_prob(w1), x0cumprobs, rightmost.closed = TRUE)]
      }
      else {
        x0 <- approx(.cumsum_prob(w0), y = x0,
                     xout = .cumsum_prob(w1), rule = 2,
                     method = "constant", ties = "ordered")$y
      }
    }
    else if (wn1 > wn0) {
      if (length(u) <= 5) {
        x1probs <- vapply(u, function(u_) wm(x1 == u_, w1), numeric(1L))
        x1cumprobs <- c(0, .cumsum_prob(x1probs))
        x1 <- u[findInterval(.cumsum_prob(w0), x1cumprobs, rightmost.closed = TRUE)]
      }
      else {
        x1 <- approx(.cumsum_prob(w1), y = x1,
                     xout = .cumsum_prob(w0), rule = 2,
                     method = "constant", ties = "ordered")$y
      }
    }

    ediff <- abs(x1 - x0)
  }

  c(meandiff = mean(ediff), maxdiff = max(ediff))
}

## MatchIt/R/matchit2genetic.R

#' Genetic Matching
#' @name method_genetic
#' @aliases method_genetic
#' @usage NULL
#'
#' @description
#' In [matchit()], setting `method = "genetic"` performs genetic matching.
#' Genetic matching is a form of nearest neighbor matching in which distances
#' are computed as the generalized Mahalanobis distance, a generalization of
#' the Mahalanobis distance that scales each covariate by a factor
#' representing that covariate's importance to the distance. A genetic
#' algorithm is used to select the scaling factors, which are chosen to
#' optimize a criterion related to covariate balance; the criterion can be
#' chosen by the user, but by default the scaling factors maximize the
#' smallest p-value among covariate balance tests of the covariates. This
#' method relies on and is a wrapper for \pkgfun{Matching}{GenMatch} and
#' \pkgfun{Matching}{Match}, which use \pkgfun{rgenoud}{genoud} to perform
#' the optimization using the genetic algorithm.
#'
#' This page details the allowable arguments with `method = "genetic"`.
#' See [matchit()] for an explanation of what each argument means in a general
#' context and how it can be specified.
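The generalized Mahalanobis distance described above can be sketched in standalone R (not package code; `gen_mahalanobis` and the diagonal weight matrix `W` are illustrative names). With `W` equal to the identity, it reduces to the ordinary Mahalanobis distance, which base R's `mahalanobis()` can confirm:

```r
# Sketch: d(x, y) = sqrt((x - y)' S^{-1/2}' W S^{-1/2} (x - y)),
# where W is the diagonal weight matrix the genetic algorithm selects.
gen_mahalanobis <- function(x, y, S, W) {
  R <- chol(S)                       # S = R'R, R upper triangular
  z <- crossprod(solve(R), x - y)    # z = (R^-1)'(x - y), so z'z = (x-y)' S^-1 (x-y)
  sqrt(drop(t(z) %*% W %*% z))
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 10, ncol = 2)
S <- cov(X)

# With W = I this equals the ordinary Mahalanobis distance
# (mahalanobis() returns the squared distance):
d1 <- gen_mahalanobis(X[1, ], X[2, ], S, diag(2))
d2 <- sqrt(mahalanobis(X[1, ], X[2, ], S))
all.equal(d1, d2)  # TRUE
```

Doubling a diagonal entry of `W` makes differences on that covariate count more toward the distance, which is how the genetic algorithm expresses covariate importance.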
#' #' Below is how `matchit()` is used for genetic matching: #' \preformatted{ #' matchit(formula, #' data = NULL, #' method = "genetic", #' distance = "glm", #' link = "logit", #' distance.options = list(), #' estimand = "ATT", #' exact = NULL, #' mahvars = NULL, #' antiexact = NULL, #' discard = "none", #' reestimate = FALSE, #' s.weights = NULL, #' replace = FALSE, #' m.order = NULL, #' caliper = NULL, #' ratio = 1, #' verbose = FALSE, #' ...) } #' #' @param formula a two-sided [formula] object containing the treatment and #' covariates to be used in creating the distance measure used in the matching. #' This formula will be supplied to the functions that estimate the distance #' measure and is used to determine the covariates whose balance is to be #' optimized. #' @param data a data frame containing the variables named in `formula`. #' If not found in `data`, the variables will be sought in the #' environment. #' @param method set here to `"genetic"`. #' @param distance the distance measure to be used. See [`distance`] #' for allowable options. When set to a method of estimating propensity scores #' or a numeric vector of distance values, the distance measure is included #' with the covariates in `formula` to be supplied to the generalized #' Mahalanobis distance matrix unless `mahvars` is specified. Otherwise, #' only the covariates in `formula` are supplied to the generalized #' Mahalanobis distance matrix to have their scaling factors chosen. #' `distance` *cannot* be supplied as a distance matrix. Supplying #' any method of computing a distance matrix (e.g., `"mahalanobis"`) has #' the same effect of omitting propensity score but does not affect how the #' distance between units is computed otherwise. #' @param link when `distance` is specified as a method of estimating #' propensity scores, an additional argument controlling the link function used #' in estimating the distance measure. See [`distance`] for allowable #' options with each option. 
#' @param distance.options a named list containing additional arguments #' supplied to the function that estimates the distance measure as determined #' by the argument to `distance`. #' @param estimand a string containing the desired estimand. Allowable options #' include `"ATT"` and `"ATC"`. See Details. #' @param exact for which variables exact matching should take place. #' @param mahvars when a distance corresponds to a propensity score (e.g., for #' caliper matching or to discard units for common support), which covariates #' should be supplied to the generalized Mahalanobis distance matrix for #' matching. If unspecified, all variables in `formula` will be supplied #' to the distance matrix. Use `mahvars` to only supply a subset. Even if #' `mahvars` is specified, balance will be optimized on all covariates in #' `formula`. See Details. #' @param antiexact for which variables anti-exact matching should take place. #' Anti-exact matching is processed using the `restrict` argument to #' `Matching::GenMatch()` and `Matching::Match()`. #' @param discard a string containing a method for discarding units outside a #' region of common support. Only allowed when `distance` corresponds to a #' propensity score. #' @param reestimate if `discard` is not `"none"`, whether to #' re-estimate the propensity score in the remaining sample prior to matching. #' @param s.weights the variable containing sampling weights to be incorporated #' into propensity score models and balance statistics. These are also supplied #' to `GenMatch()` for use in computing the balance t-test p-values in the #' process of matching. #' @param replace whether matching should be done with replacement. #' @param m.order the order that the matching takes place. 
Allowable options #' include `"largest"`, where matching takes place in descending order of #' distance measures; `"smallest"`, where matching takes place in ascending #' order of distance measures; `"random"`, where matching takes place #' in a random order; and `"data"` where matching takes place based on the #' order of units in the data. When `m.order = "random"`, results may differ #' across different runs of the same code unless a seed is set and specified #' with [set.seed()]. The default of `NULL` corresponds to `"largest"` when a #' propensity score is estimated or supplied as a vector and `"data"` #' otherwise. #' @param caliper the width(s) of the caliper(s) used for caliper matching. See #' Details and Examples. #' @param std.caliper `logical`; when calipers are specified, whether they #' are in standard deviation units (`TRUE`) or raw units (`FALSE`). #' @param ratio how many control units should be matched to each treated unit #' for k:1 matching. Should be a single integer value. #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. When `TRUE`, output from #' `GenMatch()` with `print.level = 2` will be displayed. Default is #' `FALSE` for no printing other than warnings. #' @param \dots additional arguments passed to \pkgfun{Matching}{GenMatch}. #' Potentially useful options include `pop.size`, `max.generations`, #' and `fit.func`. If `pop.size` is not specified, a warning from #' *Matching* will be thrown reminding you to change it. Note that the #' `ties` and `CommonSupport` arguments are set to `FALSE` and #' cannot be changed. If `distance.tolerance` is not specified, it is set #' to 0, whereas the default in *Matching* is 1e-5. #' #' @section Outputs: #' All outputs described in [matchit()] are returned with #' `method = "genetic"`. When `replace = TRUE`, the `subclass` #' component is omitted. 
When `include.obj = TRUE` in the call to #' `matchit()`, the output of the call to \pkgfun{Matching}{GenMatch} will be #' included in the output. #' #' @details #' In genetic matching, covariates play three roles: 1) as the variables on #' which balance is optimized, 2) as the variables in the generalized #' Mahalanobis distance between units, and 3) in estimating the propensity #' score. Variables supplied to `formula` are always used for role (1), as #' the variables on which balance is optimized. When `distance` #' corresponds to a propensity score, the covariates are also used to estimate #' the propensity score (unless it is supplied). When `mahvars` is #' specified, the named variables will form the covariates that go into the #' distance matrix. Otherwise, the variables in `formula` along with the #' propensity score will go into the distance matrix. This leads to three ways #' to use `distance` and `mahvars` to perform the matching: #' #' \enumerate{ #' \item{When `distance` corresponds to a propensity score and `mahvars` #' *is not* specified, the covariates in `formula` along with the #' propensity score are used to form the generalized Mahalanobis distance #' matrix. This is the default and most typical use of `method = #' "genetic"` in `matchit()`. #' } #' \item{When `distance` corresponds to a propensity score and `mahvars` #' *is* specified, the covariates in `mahvars` are used to form the #' generalized Mahalanobis distance matrix. The covariates in `formula` #' are used to estimate the propensity score and have their balance optimized #' by the genetic algorithm. The propensity score is not included in the #' generalized Mahalanobis distance matrix. #' } #' \item{When `distance` is a method of computing a distance matrix #' (e.g.,`"mahalanobis"`), no propensity score is estimated, and the #' covariates in `formula` are used to form the generalized Mahalanobis #' distance matrix. 
Which specific method is supplied has no bearing on how the #' distance matrix is computed; it simply serves as a signal to omit estimation #' of a propensity score. #' } #' } #' #' When a caliper is specified, any variables mentioned in `caliper`, #' possibly including the propensity score, will be added to the matching #' variables used to form the generalized Mahalanobis distance matrix. This is #' because *Matching* doesn't allow for the separation of caliper #' variables and matching variables in genetic matching. #' #' ## Estimand #' #' The `estimand` argument controls whether control #' units are selected to be matched with treated units (`estimand = #' "ATT"`) or treated units are selected to be matched with control units #' (`estimand = "ATC"`). The "focal" group (e.g., the treated units for #' the ATT) is typically made to be the smaller treatment group, and a warning #' will be thrown if it is not set that way unless `replace = TRUE`. #' Setting `estimand = "ATC"` is equivalent to swapping all treated and #' control labels for the treatment variable. When `estimand = "ATC"`, the #' default `m.order` is `"smallest"`, and the `match.matrix` #' component of the output will have the names of the control units as the #' rownames and be filled with the names of the matched treated units (opposite #' to when `estimand = "ATT"`). Note that the argument supplied to #' `estimand` doesn't necessarily correspond to the estimand actually #' targeted; it is merely a switch to trigger which treatment group is #' considered "focal". Note that while `GenMatch()` and `Match()` #' support the ATE as an estimand, `matchit()` only supports the ATT and #' ATC for genetic matching. #' #' ## Reproducibility #' #' Genetic matching involves a random component, so a seed must be set using [set.seed()] to ensure reproducibility. When `cluster` is used for parallel processing, the seed must be compatible with parallel processing (e.g., by setting `kind = "L'Ecuyer-CMRG"`). 
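The reproducibility note above can be illustrated with a standalone sketch (not package code; `run_once` is a hypothetical stand-in for any stochastic matching step, such as the genetic search or `m.order = "random"`):

```r
# A stochastic step gives the same result only when the RNG state is
# fixed immediately beforehand.
run_once <- function() sample.int(1000L, 5L)  # stand-in for a stochastic step

set.seed(2024); r1 <- run_once()
set.seed(2024); r2 <- run_once()
identical(r1, r2)  # TRUE: same seed, same result

# For parallel processing, use an RNG kind that provides
# independent parallel streams:
set.seed(2024, kind = "L'Ecuyer-CMRG")
```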
#' #' @seealso [matchit()] for a detailed explanation of the inputs and outputs of #' a call to `matchit()`. #' #' \pkgfun{Matching}{GenMatch} and \pkgfun{Matching}{Match}, which do the work. #' #' @references In a manuscript, be sure to cite the following papers if using #' `matchit()` with `method = "genetic"`: #' #' Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal #' effects: A general multivariate matching method for achieving balance in #' observational studies. Review of Economics and Statistics, 95(3), 932–945. \doi{10.1162/REST_a_00318} #' #' Sekhon, J. S. (2011). Multivariate and Propensity Score Matching Software #' with Automated Balance Optimization: The Matching package for R. Journal of #' Statistical Software, 42(1), 1–52. \doi{10.18637/jss.v042.i07} #' #' For example, a sentence might read: #' #' *Genetic matching was performed using the MatchIt package (Ho, Imai, #' King, & Stuart, 2011) in R, which calls functions from the Matching package #' (Diamond & Sekhon, 2013; Sekhon, 2011).* #' #' @examplesIf all(sapply(c("Matching", "rgenoud"), requireNamespace, quietly = TRUE)) #' data("lalonde") #' #' # 1:1 genetic matching with PS as a covariate #' m.out1 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "genetic", #' pop.size = 10) #use much larger pop.size #' m.out1 #' summary(m.out1) #' #' # 2:1 genetic matching with replacement without PS #' m.out2 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "genetic", #' replace = TRUE, #' ratio = 2, #' distance = "mahalanobis", #' pop.size = 10) #use much larger pop.size #' m.out2 #' summary(m.out2, un = FALSE) #' #' # 1:1 genetic matching on just age, educ, re74, and re75 #' # within calipers on PS and educ; other variables are #' # used to estimate PS #' m.out3 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = 
"genetic", #' mahvars = ~ age + educ + re74 + re75, #' caliper = c(.05, educ = 2), #' std.caliper = c(TRUE, FALSE), #' pop.size = 10) #use much larger pop.size #' m.out3 #' summary(m.out3, un = FALSE) NULL matchit2genetic <- function(treat, data, distance, discarded, ratio = 1, s.weights = NULL, replace = FALSE, m.order = NULL, caliper = NULL, mahvars = NULL, exact = NULL, formula = NULL, estimand = "ATT", verbose = FALSE, is.full.mahalanobis, use.genetic = TRUE, antiexact = NULL, ...) { rlang::check_installed(c("Matching", "rgenoud")) .cat_verbose("Genetic matching...\n", verbose = verbose) .args <- names(formals(Matching::GenMatch)) A <- ...mget(.args) A[lengths(A) == 0L] <- NULL estimand <- toupper(estimand) estimand <- match_arg(estimand, c("ATT", "ATC")) if (estimand == "ATC") { tc <- c("control", "treated") focal <- 0 } else { tc <- c("treated", "control") focal <- 1 } if (!replace) { if (sum(!discarded & treat != focal) < sum(!discarded & treat == focal)) { .wrn(sprintf("fewer %s units than %s units; not all %s units will get a match", tc[2L], tc[1L], tc[1L])) } else if (sum(!discarded & treat != focal) < sum(!discarded & treat == focal) * ratio) { .err(sprintf("not enough %s units for %s matches for each %s unit", tc[2L], ratio, tc[1L])) } } treat <- setNames(as.integer(treat == focal), names(treat)) n.obs <- length(treat) n1 <- sum(treat == 1) if (is_null(names(treat))) names(treat) <- seq_len(n.obs) m.order <- { if (is_null(distance)) match_arg(m.order, c("data", "random")) else if (is_null(m.order)) switch(estimand, "ATC" = "smallest", "largest") else match_arg(m.order, c("largest", "smallest", "data", "random")) } ord <- switch(m.order, "largest" = order(distance, decreasing = TRUE), "smallest" = order(distance), "random" = sample.int(n.obs), "data" = seq_len(n.obs)) if (any(discarded)) { ord <- ord[!ord %in% which(discarded)] } #Create X (matching variables) and covs_to_balance covs_to_balance <- get_covs_matrix(formula, data = data) if 
(ncol(covs_to_balance) == 0L) { .err("covariates must be specified in the input formula to use genetic matching") } X <- { if (is_not_null(mahvars)) get_covs_matrix_for_dist(mahvars, data = data) else if (is.full.mahalanobis) covs_to_balance else cbind(covs_to_balance, distance) } #Process exact; exact.log will be supplied to GenMatch() and Match() if (is_not_null(exact)) { #Add covariates in exact not in X to X ex <- unclass(exactify(model.frame(exact, data = data), names(treat), sep = ", ", include_vars = TRUE)) cc <- intersect(ex[treat == 1], ex[treat == 0]) if (is_null(cc)) { .err("No matches were found") } X <- cbind(X, ex) exact.log <- c(rep.int(FALSE, ncol(X) - 1L), TRUE) } else { exact.log <- ex <- NULL } #Reorder data according to m.order since Match matches in order of data; #ord already excludes discarded units treat_ <- treat[ord] covs_to_balance <- covs_to_balance[ord, , drop = FALSE] X <- X[ord, , drop = FALSE] if (is_not_null(s.weights)) s.weights <- s.weights[ord] #Process caliper; cal will be supplied to GenMatch() and Match() cal <- dist.cal <- cov.cals <- NULL if (is_not_null(caliper) && any(caliper < 0)) { neg.cal <- names(caliper)[caliper < 0] if (any(nzchar(neg.cal))) { negcalcovs <- get_covs_matrix(reformulate(neg.cal[nzchar(neg.cal)]), data = data)[ord, , drop = FALSE] negcalcovs_restrict <- do.call("rbind", lapply(seq_len(ncol(negcalcovs)), function(i) { do.call("rbind", lapply(which(treat_ == 1), function(j) { restricted_controls <- which(treat_ == 0 & abs(negcalcovs[j, i] - negcalcovs[, i]) <= -caliper[neg.cal[nzchar(neg.cal)][i]]) if (is_null(restricted_controls)) { return(NULL) } cbind(j, restricted_controls, -1) })) })) if (is_not_null(negcalcovs_restrict)) { A[["restrict"]] <- { if (is_null(A[["restrict"]])) unique(negcalcovs_restrict) else rbind(A[["restrict"]], unique(negcalcovs_restrict)) } } } if (!all(nzchar(neg.cal))) { negcaldist_restrict <- do.call("rbind", lapply(which(treat_ == 1), function(j) { restricted_controls <- 
which(treat_ == 0 & abs(distance[ord][j] - distance[ord]) <= -caliper[!nzchar(names(caliper))]) if (is_null(restricted_controls)) { return(NULL) } cbind(j, restricted_controls, -1) })) if (is_not_null(negcaldist_restrict)) { A[["restrict"]] <- { if (is_null(A[["restrict"]])) unique(negcaldist_restrict) else rbind(A[["restrict"]], unique(negcaldist_restrict)) } } } caliper <- caliper[caliper >= 0] } #Add covariates in caliper other than distance (cov.cals) not in X to X if (is_not_null(caliper)) { cov.cals <- setdiff(names(caliper), "") if (is_not_null(cov.cals) && !all(cov.cals %in% colnames(X))) { calcovs <- get_covs_matrix(reformulate(cov.cals[!cov.cals %in% colnames(X)]), data = data)[ord, , drop = FALSE] X <- cbind(X, calcovs) #Expand exact.log for newly added covariates if (is_not_null(exact.log)) { exact.log <- c(exact.log, rep.int(FALSE, ncol(calcovs))) } } #Matching::Match multiplies calipers by pop SD, so we need to divide by pop SD to unstandardize pop.sd <- function(x) sqrt(sum((x - mean(x))^2) / length(x)) caliper <- caliper / vapply(names(caliper), function(x) { if (x == "") pop.sd(distance[ord]) else pop.sd(X[, x]) }, numeric(1L)) #cal needs one value per variable in X cal <- setNames(rep.int(Inf, ncol(X)), colnames(X)) #First put covariate calipers into cal if (is_not_null(cov.cals)) { cal[intersect(cov.cals, names(cal))] <- caliper[intersect(cov.cals, names(cal))] } #Then put distance caliper into cal if (!all(nzchar(names(caliper)))) { dist.cal <- caliper[!nzchar(names(caliper))] if (is_not_null(mahvars)) { #If mahvars specified, distance is not yet in X, so add it to X X <- cbind(X, distance[ord]) cal <- c(cal, dist.cal) #Expand exact.log for newly added distance if (is_not_null(exact.log)) exact.log <- c(exact.log, FALSE) } else { #Otherwise, distance is in X at the specified index cal[ncol(covs_to_balance) + 1L] <- dist.cal } } else { dist.cal <- NULL } } if (is_not_null(antiexact)) { antiexactcovs <- model.frame(antiexact, data)[ord, , drop = 
FALSE] antiexact_restrict <- do.call("rbind", lapply(seq_len(ncol(antiexactcovs)), function(i) { do.call("rbind", lapply(which(treat_ == 1), function(j) { restricted_controls <- which(treat_ == 0 & antiexactcovs[[i]][j] == antiexactcovs[[i]]) if (is_null(restricted_controls)) { return(NULL) } cbind(j, restricted_controls, -1) })) })) if (is_not_null(antiexact_restrict)) { A[["restrict"]] <- { if (is_null(A[["restrict"]])) unique(antiexact_restrict) else rbind(A[["restrict"]], unique(antiexact_restrict)) } } } else { antiexactcovs <- NULL } if (is_null(A[["distance.tolerance"]])) { A[["distance.tolerance"]] <- 0 } if (use.genetic) { matchit_try({ g.out <- do.call(Matching::GenMatch, c(list(Tr = treat_, X = X, BalanceMatrix = covs_to_balance, M = ratio, exact = exact.log, caliper = cal, replace = replace, estimand = "ATT", ties = FALSE, CommonSupport = FALSE, verbose = verbose, weights = s.weights, print.level = 2 * verbose), A[names(A) %in% .args])) }, from = "Matching", dont_warn_if = c("replace==FALSE, but there are more (weighted) treated obs than control obs", "no valid matches")) } else { #For debugging g.out <- NULL } lab <- names(treat) lab1 <- lab[treat == 1] lab_ <- names(treat_) ind_ <- seq_along(treat)[ord] matchit_try({ m.out <- Matching::Match(Tr = treat_, X = X, M = ratio, exact = exact.log, caliper = cal, replace = replace, estimand = "ATT", ties = FALSE, weights = s.weights, CommonSupport = FALSE, distance.tolerance = A[["distance.tolerance"]], Weight = 3, Weight.matrix = { if (use.genetic) g.out else if (is_null(s.weights)) generalized_inverse(cor(X)) else generalized_inverse(cov.wt(X, s.weights, cor = TRUE)$cor) }, restrict = A[["restrict"]], version = "fast") }, from = "Matching", dont_warn_if = c("replace==FALSE, but there are more (weighted) treated obs than control obs", "no valid matches")) if (typeof(m.out) == "logical" && all(is.na(m.out))) { .err("no units were matched") } #Note: must use character match.matrix because of re-ordering treat 
  #into treat_
  mm <- matrix(NA_integer_, nrow = n1, ncol = max(table(m.out$index.treated)),
               dimnames = list(lab1, NULL))

  unique.matched.focal <- unique(m.out$index.treated, nmax = n1)

  ind1__ <- match(lab_, lab1)
  for (i in unique.matched.focal) {
    matched.units <- ind_[m.out$index.control[m.out$index.treated == i]]
    mm[ind1__[i], seq_along(matched.units)] <- matched.units
  }

  .cat_verbose("Calculating matching weights... ", verbose = verbose)

  if (replace) {
    psclass <- NULL
    weights <- get_weights_from_mm(mm, treat, 1L)
  }
  else {
    psclass <- mm2subclass(mm, treat, 1L)
    weights <- get_weights_from_subclass(psclass, treat)
  }

  res <- list(match.matrix = nummm2charmm(mm, treat),
              subclass = psclass,
              weights = weights,
              obj = g.out)

  .cat_verbose("Done.\n", verbose = verbose)

  class(res) <- "matchit"

  res
}

## MatchIt/R/matchit.R

#' Matching for Causal Inference
#'
#' @description
#' `matchit()` is the main function of *MatchIt* and performs
#' pairing, subset selection, and subclassification with the aim of creating
#' treatment and control groups balanced on included covariates. *MatchIt*
#' implements the suggestions of Ho, Imai, King, and Stuart (2007) for
#' improving parametric statistical models by preprocessing data with
#' nonparametric matching methods.
#'
#' This page documents the overall use of `matchit()`, but for specifics
#' of how `matchit()` works with individual matching methods, see the
#' individual pages linked in the Details section below.
#'
#' @param formula a two-sided [`formula`] object containing the treatment and
#' covariates to be used in creating the distance measure used in the matching.
#' This formula will be supplied to the functions that estimate the distance
#' measure. The formula should be specified as `A ~ X1 + X2 + ...`, where
#' `A` represents the treatment variable and `X1` and `X2` are
#' covariates.
#' @param data a data frame containing the variables named in `formula` #' and possible other arguments. If not found in `data`, the variables #' will be sought in the environment. #' @param method the matching method to be used. The allowed methods are #' [`"nearest"`][method_nearest] for nearest neighbor matching (on #' the propensity score by default), [`"optimal"`][method_optimal] #' for optimal pair matching, [`"full"`][method_full] for optimal #' full matching, [`"quick"`][method_quick] for generalized (quick) #' full matching, [`"genetic"`][method_genetic] for genetic #' matching, [`"cem"`][method_cem] for coarsened exact matching, #' [`"exact"`][method_exact] for exact matching, #' [`"cardinality"`][method_cardinality] for cardinality and #' profile matching, and [`"subclass"`][method_subclass] for #' subclassification. When set to `NULL`, no matching will occur, but #' propensity score estimation and common support restrictions will still occur #' if requested. See the linked pages for each method for more details on what #' these methods do, how the arguments below are used by each on, and what #' additional arguments are allowed. #' @param distance the distance measure to be used. Can be either the name of a #' method of estimating propensity scores (e.g., `"glm"`), the name of a #' method of computing a distance matrix from the covariates (e.g., #' `"mahalanobis"`), a vector of already-computed distance measures, or a #' matrix of pairwise distances. See [`distance`] for allowable #' options. The default is `"glm"` for propensity scores estimated with #' logistic regression using [glm()]. Ignored for some methods; see individual #' methods pages for information on whether and how the distance measure is #' used. #' @param link when `distance` is specified as a string, an additional #' argument controlling the link function used in estimating the distance #' measure. Allowable options depend on the specific `distance` value #' specified. 
See [`distance`] for allowable options with each #' option. The default is `"logit"`, which, along with `distance = "glm"`, identifies the default measure as logistic regression propensity scores. #' @param distance.options a named list containing additional arguments #' supplied to the function that estimates the distance measure as determined #' by the argument to `distance`. See [`distance`] for an #' example of its use. #' @param estimand a string containing the name of the target estimand desired. #' Can be one of `"ATT"`, `"ATC"`, or `"ATE"`. Default is `"ATT"`. See Details and the individual methods #' pages for information on how this argument is used. #' @param exact for methods that allow it, for which variables exact matching #' should take place. Can be specified as a string containing the names of #' variables in `data` to be used or a one-sided formula with the desired #' variables on the right-hand side (e.g., `~ X3 + X4`). See the #' individual methods pages for information on whether and how this argument is #' used. #' @param mahvars for methods that allow it, on which variables Mahalanobis #' distance matching should take place when `distance` corresponds to #' propensity scores. Usually used to perform Mahalanobis distance matching #' within propensity score calipers, where the propensity scores are computed #' using `formula` and `distance`. Can be specified as a string #' containing the names of variables in `data` to be used or a one-sided #' formula with the desired variables on the right-hand side (e.g., `~ X3 + X4`). See the individual methods pages for information on whether and how this argument is used. #' @param antiexact for methods that allow it, for which variables anti-exact #' matching should take place. Anti-exact matching ensures paired individuals #' do not have the same value of the anti-exact matching variable(s). 
Can be #' specified as a string containing the names of variables in `data` to be #' used or a one-sided formula with the desired variables on the right-hand #' side (e.g., `~ X3 + X4`). See the individual methods pages for #' information on whether and how this argument is used. #' @param discard a string containing a method for discarding units outside a #' region of common support. When a propensity score is estimated or supplied #' to `distance` as a vector, the options are `"none"`, #' `"treated"`, `"control"`, or `"both"`. For `"none"`, no #' units are discarded for common support. Otherwise, units whose propensity #' scores fall outside the corresponding region are discarded. Can also be a #' `logical` vector where `TRUE` indicates the unit is to be #' discarded. Default is `"none"` for no common support restriction. See #' Details. #' @param reestimate if `discard` is not `"none"` and propensity #' scores are estimated, whether to re-estimate the propensity scores in the #' remaining sample. Default is `FALSE` to use the propensity scores #' estimated in the original sample. #' @param s.weights an optional numeric vector of sampling weights to be #' incorporated into propensity score models and balance statistics. Can also #' be specified as a string containing the name of variable in `data` to #' be used or a one-sided formula with the variable on the right-hand side #' (e.g., `~ SW`). Not all propensity score models accept sampling #' weights; see [`distance`] for information on which do and do not, #' and see `vignette("sampling-weights")` for details on how to use #' sampling weights in a matching analysis. #' @param replace for methods that allow it, whether matching should be done #' with replacement (`TRUE`), where control units are allowed to be #' matched to several treated units, or without replacement (`FALSE`), #' where control units can only be matched to one treated unit each. 
See the #' individual methods pages for information on whether and how this argument is #' used. Default is `FALSE` for matching without replacement. #' @param m.order for methods that allow it, the order in which the matching takes #' place. Allowable options depend on the matching method. The default of #' `NULL` corresponds to `"largest"` when a propensity score is #' estimated or supplied as a vector and `"data"` otherwise. #' @param caliper for methods that allow it, the width(s) of the caliper(s) to #' use in matching. Should be a numeric vector with each value named according #' to the variable to which the caliper applies. To apply to the distance #' measure, the value should be unnamed. See the individual methods pages for #' information on whether and how this argument is used. Positive values require the distance between paired units to be no larger than the supplied caliper; negative values require the distance between paired units to be larger than the absolute value of the supplied caliper. The default is `NULL` for no caliper. #' @param std.caliper `logical`; when a caliper is specified, whether the #' caliper is in standard deviation units (`TRUE`) or raw units #' (`FALSE`). Can either be of length 1, applying to all calipers, or of #' length equal to the length of `caliper`. Default is `TRUE`. #' @param ratio for methods that allow it, how many control units should be #' matched to each treated unit in k:1 matching. Should be a single integer #' value. See the individual methods pages for information on whether and how #' this argument is used. The default is 1 for 1:1 matching. #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. What is printed depends on the #' matching method. Default is `FALSE` for no printing other than #' warnings. 
#' @param include.obj `logical`; whether to include any objects created in #' the matching process in the output, i.e., by the functions from other #' packages `matchit()` calls. What is included depends on the matching #' method. Default is `FALSE`. #' @param normalize `logical`; whether to rescale the nonzero weights in each treatment group to have an average of 1. Default is `TRUE`. See "How Matching Weights Are Computed" below for more details. #' @param \dots additional arguments passed to the functions used in the #' matching process. See the individual methods pages for information on what #' additional arguments are allowed for each method. #' #' @details #' Details for the various matching methods can be found at the following help #' pages: #' * [`method_nearest`] for nearest neighbor matching #' * [`method_optimal`] for optimal pair matching #' * [`method_full`] for optimal full matching #' * [`method_quick`] for generalized (quick) full matching #' * [`method_genetic`] for genetic matching #' * [`method_cem`] for coarsened exact matching #' * [`method_exact`] for exact matching #' * [`method_cardinality`] for cardinality and profile matching #' * [`method_subclass`] for subclassification #' #' The pages contain information on what the method does, which of the arguments above are #' allowed with them and how they are interpreted, and what additional #' arguments can be supplied to further tune the method. Note that the default #' method with no arguments supplied other than `formula` and `data` #' is 1:1 nearest neighbor matching without replacement on a propensity score #' estimated using a logistic regression of the treatment on the covariates. #' This is not the same default offered by other matching programs, such as #' those in *Matching*, `teffects` in Stata, or `PROC PSMATCH` #' in SAS, so care should be taken if trying to replicate the results of those #' programs. 
#' #' When `method = NULL`, no matching will occur, but any propensity score #' estimation and common support restriction will. This can be a simple way to #' estimate the propensity score for use in future matching specifications #' without having to re-estimate it each time. The `matchit()` output with #' no matching can be supplied to `summary()` to examine balance prior to #' matching on any of the included covariates and on the propensity score if #' specified. All arguments other than `distance`, `discard`, and #' `reestimate` will be ignored. #' #' See [`distance`] for details on the several ways to #' specify the `distance`, `link`, and `distance.options` #' arguments to estimate propensity scores and create distance measures. #' #' When the treatment variable is not a `0/1` variable, it will be coerced #' to one and returned as such in the `matchit()` output (see section #' Value, below). The following rules are used: 1) if `0` is one of the #' values, it will be considered the control and the other value the treated; #' 2) otherwise, if the variable is a factor, `levels(treat)[1]` will be #' considered control and the other value the treated; 3) otherwise, #' `sort(unique(treat))[1]` will be considered control and the other value #' the treated. It is safest to ensure the treatment variable is a `0/1` #' variable. #' #' The `discard` option implements a common support restriction. It can #' only be used when a distance measure is an estimated propensity score or supplied as a vector and is ignored for some matching #' methods. When specified as `"treated"`, treated units whose distance #' measure is outside the range of distance measures of the control units will #' be discarded. When specified as `"control"`, control units whose #' distance measure is outside the range of distance measures of the treated #' units will be discarded. 
When specified as `"both"`, treated and #' control units whose distance measure is outside the intersection of the #' range of distance measures of the treated units and the range of distance #' measures of the control units will be discarded. When `reestimate = TRUE` and `distance` corresponds to a propensity score-estimating #' function, the propensity scores are re-estimated in the remaining units #' prior to being used for matching or calipers. #' #' Caution should be used when interpreting effects estimated with various #' values of `estimand`. Setting `estimand = "ATT"` doesn't #' necessarily mean the average treatment effect in the treated is being #' estimated; it just means that for matching methods, treated units will be #' untouched and given weights of 1 and control units will be matched to them #' (and the opposite for `estimand = "ATC"`). If a caliper is supplied or #' treated units are removed for common support or some other reason (e.g., #' lacking matches when using exact matching), the actual estimand targeted is #' not the ATT but the treatment effect in the matched sample. The argument to #' `estimand` simply triggers which units are matched to which, and for #' stratification-based methods (exact matching, CEM, full matching, and #' subclassification), determines the formula used to compute the #' stratification weights. #' #' ## How Matching Weights Are Computed #' #' Matching weights are computed in one of two ways depending on whether matching was done with replacement #' or not. #' #' ### Matching without replacement and subclassification #' #' For matching *without* replacement (except for cardinality matching), including subclassification, each #' unit is assigned to a subclass, which represents the pair they are a part of #' (in the case of k:1 matching) or the stratum they belong to (in the case of #' exact matching, coarsened exact matching, full matching, or #' subclassification). 
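The `discard` rules described above can be sketched in a few lines of base R. This is an illustrative reimplementation, not *MatchIt*'s internal `discard()` function; `discard_flags` and the toy data are hypothetical:

```r
# Illustrative sketch of the common-support (`discard`) rules.
treat <- c(1, 1, 1, 0, 0, 0)
ps    <- c(.2, .5, .9, .1, .4, .6)  # propensity scores

discard_flags <- function(treat, ps, option = "none") {
  r1 <- range(ps[treat == 1])  # range of treated propensity scores
  r0 <- range(ps[treat == 0])  # range of control propensity scores
  switch(option,
         none    = rep(FALSE, length(ps)),
         # treated units outside the control range are dropped
         treated = treat == 1 & (ps < r0[1] | ps > r0[2]),
         # control units outside the treated range are dropped
         control = treat == 0 & (ps < r1[1] | ps > r1[2]),
         # any unit outside the intersection of the two ranges is dropped
         both    = ps < max(r1[1], r0[1]) | ps > min(r1[2], r0[2]))
}

discard_flags(treat, ps, "treated")  # only unit 3 (ps = .9) is outside [.1, .6]
discard_flags(treat, ps, "both")     # units 3 and 4 fall outside [.2, .6]
```

With `reestimate = TRUE`, the propensity score model would then be refit on the units for which the flag is `FALSE`.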
The formula for computing the weights depends on the #' argument supplied to `estimand`. A new "stratum propensity score" #' (\eqn{p^s_i}) is computed for each unit \eqn{i} as \eqn{p^s_i = \frac{1}{n_s}\sum_{j: s_j =s_i}{I(A_j=1)}} where \eqn{n_s} is the size of subclass \eqn{s} and \eqn{I(A_j=1)} is 1 if unit \eqn{j} is treated and 0 otherwise. That is, the stratum propensity score for stratum \eqn{s} is the proportion of units in stratum \eqn{s} that are #' in the treated group, and all units in stratum \eqn{s} are assigned that #' stratum propensity score. This is distinct from the propensity score used for matching, if any. Weights are then computed using the standard formulas for #' inverse probability weights with the stratum propensity score inserted: #' * for the ATT, weights are 1 for the treated #' units and \eqn{\frac{p^s}{1-p^s}} for the control units #' * for the ATC, weights are #' \eqn{\frac{1-p^s}{p^s}} for the treated units and 1 for the control units #' * for the ATE, weights are \eqn{\frac{1}{p^s}} for the treated units and \eqn{\frac{1}{1-p^s}} for the #' control units. #' #' For cardinality matching, all matched units receive a weight #' of 1. #' #' ### Matching with replacement #' #' For matching *with* replacement, units are not assigned to unique strata. For #' the ATT, each treated unit gets a weight of 1. Each control unit is weighted #' as the sum of the inverse of the number of control units matched to the same #' treated unit across its matches. For example, if a control unit was matched #' to a treated unit that had two other control units matched to it, and that #' same control was matched to a treated unit that had one other control unit #' matched to it, the control unit in question would get a weight of \eqn{1/3 + 1/2 = 5/6}. For the ATC, the same is true with the treated and control labels #' switched. The weights are computed using the `match.matrix` component #' of the `matchit()` output object. 
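The stratum-weight formulas above can be illustrated with a small base-R sketch. This is not *MatchIt*'s internal code; the variable names are made up for the example:

```r
# Illustrative computation of stratum propensity scores and weights
# from subclass membership (a sketch, not MatchIt's internals).
treat    <- c(1, 0, 0, 1, 0)
subclass <- c(1, 1, 1, 2, 2)  # pair/stratum membership

# Stratum propensity score: proportion treated within each stratum
ps_s <- ave(treat, subclass)  # stratum 1: 1/3; stratum 2: 1/2

w_att <- ifelse(treat == 1, 1, ps_s / (1 - ps_s))
w_atc <- ifelse(treat == 1, (1 - ps_s) / ps_s, 1)
w_ate <- ifelse(treat == 1, 1 / ps_s, 1 / (1 - ps_s))

# Matching with replacement (ATT): a control matched to a treated unit
# with k matched controls contributes 1/k for that match, summed over
# its matches; e.g., one match in a group of 3 and one in a group of 2:
1/3 + 1/2  # = 5/6
```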
#' #' ### Normalized weights #' #' When `normalize = TRUE` (the default), in each treatment group, weights are divided by the mean of the nonzero #' weights in that treatment group to make the weights sum to the number of #' units in that treatment group (i.e., to have an average of 1). #' #' ### Sampling weights #' #' If sampling weights are included through the #' `s.weights` argument, they will be included in the `matchit()` #' output object but not incorporated into the matching weights. #' [match_data()], which extracts the matched set from a `matchit` object, #' combines the matching weights and sampling weights. #' #' @return When `method` is something other than `"subclass"`, a #' `matchit` object with the following components: #' #' \item{match.matrix}{a matrix containing the matches. The row names correspond #' to the treated units and the values in each row are the names (or indices) #' of the control units matched to each treated unit. When treated units are #' matched to different numbers of control units (e.g., with variable ratio matching or #' matching with a caliper), empty spaces will be filled with `NA`. Not #' included when `method` is `"full"`, `"cem"` (unless `k2k = TRUE`), `"exact"`, `"quick"`, or `"cardinality"` (unless `mahvars` is supplied and `ratio` is an integer).} #' \item{subclass}{a factor #' containing matching pair/stratum membership for each unit. Unmatched units #' will have a value of `NA`. Not included when `replace = TRUE` or when `method = "cardinality"` unless `mahvars` is supplied and `ratio` is an integer.} #' \item{weights}{a numeric vector of estimated matching weights. Unmatched and #' discarded units will have a weight of zero.} #' \item{model}{the fit object of #' the model used to estimate propensity scores when `distance` is #' specified as a method of estimating propensity scores. 
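The normalization step can be sketched as follows. This is an illustrative stand-in for the internal helper *MatchIt* uses, with hypothetical names:

```r
# Sketch of weight normalization: within each treatment group, divide
# the nonzero weights by their group mean so they average 1 (and sum to
# the number of matched units in that group).
treat   <- c(1, 1, 0, 0, 0)
weights <- c(1, 1, 0.5, 1, 0)  # last unit is unmatched (weight 0)

normalize_weights <- function(w, treat) {
  nz <- w > 0
  for (g in unique(treat)) {
    idx <- nz & treat == g
    w[idx] <- w[idx] / mean(w[idx])
  }
  w
}

normalize_weights(weights, treat)
# Control nonzero weights c(0.5, 1) have mean 0.75, so they become
# c(2/3, 4/3); treated weights already average 1 and are unchanged.
```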
When #' `reestimate = TRUE`, this is the model estimated after discarding #' units.} #' \item{X}{a data frame of covariates mentioned in `formula`, `exact`, `mahvars`, `caliper`, and `antiexact`.} #' \item{call}{the `matchit()` call.} #' \item{info}{information on the matching method and distance measures used.} #' \item{estimand}{the argument supplied to `estimand`.} #' \item{formula}{the `formula` supplied.} #' \item{treat}{a vector of treatment status converted to zeros (0) and ones #' (1) if not already in that format.} #' \item{distance}{a vector of distance #' values (i.e., propensity scores) when `distance` is supplied as a #' method of estimating propensity scores or a numeric vector.} #' \item{discarded}{a logical vector denoting whether each observation was #' discarded (`TRUE`) or not (`FALSE`) by the argument to `discard`.} #' \item{s.weights}{the vector of sampling weights supplied to the `s.weights` argument, if any.} #' \item{exact}{a one-sided formula containing the variables, if any, supplied to `exact`.} #' \item{mahvars}{a one-sided formula containing the variables, if any, supplied to `mahvars`.} #' \item{obj}{when `include.obj = TRUE`, an object containing the intermediate results of the matching procedure. See #' the individual methods pages for what this component will contain.} #' #' When `method = "subclass"`, a `matchit.subclass` object with the same #' components as above except that `match.matrix` is excluded and one #' additional component, `q.cut`, is included, containing a vector of the #' distance measure cutpoints used to define the subclasses. See #' [`method_subclass`] for details. #' #' @author Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart wrote the original package. Starting with version 4.0.0, Noah Greifer is the primary maintainer and developer. #' #' @seealso [summary.matchit()] for balance assessment after matching, [plot.matchit()] for plots of covariate balance and propensity score overlap after matching. 
#' #' * `vignette("MatchIt")` for an introduction to matching with *MatchIt* #' * `vignette("matching-methods")` for descriptions of the variety of matching methods and options available #' * `vignette("assessing-balance")` for information on assessing the quality of a matching specification #' * `vignette("estimating-effects")` for instructions on how to estimate treatment effects after matching #' * `vignette("sampling-weights")` for a guide to using *MatchIt* with sampling weights. #' #' @references #' Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching #' as Nonparametric Preprocessing for Reducing Model Dependence in Parametric #' Causal Inference. *Political Analysis*, 15(3), 199–236. \doi{10.1093/pan/mpl013} #' #' Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2011). MatchIt: #' Nonparametric Preprocessing for Parametric Causal Inference. *Journal of Statistical Software*, 42(8). \doi{10.18637/jss.v042.i08} #' #' @examples #' data("lalonde") #' #' # Default: 1:1 NN PS matching w/o replacement #' #' m.out1 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde) #' m.out1 #' summary(m.out1) #' #' # 1:1 NN Mahalanobis distance matching w/ replacement and #' # exact matching on married and race #' #' m.out2 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' distance = "mahalanobis", #' replace = TRUE, #' exact = ~ married + race) #' m.out2 #' summary(m.out2, un = TRUE) #' #' # 2:1 NN Mahalanobis distance matching within caliper defined #' # by a probit regression PS #' #' m.out3 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' distance = "glm", #' link = "probit", #' mahvars = ~ age + educ + re74 + re75, #' caliper = .1, #' ratio = 2) #' m.out3 #' summary(m.out3, un = TRUE) #' #' # Optimal full PS matching for the ATE within calipers on #' # PS, age, and educ #' @examplesIf requireNamespace("optmatch", quietly = 
TRUE) #' m.out4 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "full", #' estimand = "ATE", #' caliper = c(.1, age = 2, educ = 1), #' std.caliper = c(TRUE, FALSE, FALSE)) #' m.out4 #' summary(m.out4, un = TRUE) #' @examples #' # Subclassification on a logistic PS with 10 subclasses after #' # discarding controls outside common support of PS #' #' s.out1 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "subclass", #' distance = "glm", #' discard = "control", #' subclass = 10) #' s.out1 #' summary(s.out1, un = TRUE) #' @export matchit <- function(formula, data = NULL, method = "nearest", distance = "glm", link = "logit", distance.options = list(), estimand = "ATT", exact = NULL, mahvars = NULL, antiexact = NULL, discard = "none", reestimate = FALSE, s.weights = NULL, replace = FALSE, m.order = NULL, caliper = NULL, std.caliper = TRUE, ratio = 1, verbose = FALSE, include.obj = FALSE, normalize = TRUE, ...) { #Checking input format #data input mcall <- match.call() ## Process method chk::chk_null_or(method, vld = chk::vld_string) if (is_null(method)) { fn2 <- "matchit2null" } else { method <- tolower(method) method <- match_arg(method, c("exact", "cem", "nearest", "optimal", "full", "genetic", "subclass", "cardinality", "quick")) fn2 <- paste0("matchit2", method) } #Process formula and data inputs if (!rlang::is_formula(formula, lhs = TRUE)) { .err("`formula` must be a formula relating treatment to covariates") } treat.form <- update(terms(formula, data = data), . 
~ 0) treat.mf <- model.frame(treat.form, data = data, na.action = "na.pass") treat <- model.response(treat.mf) #Check and binarize treat treat <- check_treat(treat) if (is_null(treat)) { .err("the treatment cannot be `NULL`") } names(treat) <- rownames(treat.mf) n.obs <- length(treat) #Process inputs ignored.inputs <- check.inputs(mcall = mcall, method = method, distance = distance, exact = exact, mahvars = mahvars, antiexact = antiexact, caliper = caliper, discard = discard, reestimate = reestimate, s.weights = s.weights, replace = replace, ratio = ratio, m.order = m.order, estimand = estimand) for (i in ignored.inputs) { assign(i, NULL) } #Process replace replace <- process.replace(replace, method, ...) #Process ratio ratio <- process.ratio(ratio, method, ...) #Process s.weights if (is_not_null(s.weights)) { if (is.character(s.weights)) { if (is_null(data) || !is.data.frame(data)) { .err("if `s.weights` is specified as a string, a data frame containing the named variable must be supplied to `data`") } if (!all(hasName(data, s.weights))) { .err("the name supplied to `s.weights` must be a variable in `data`") } s.weights.form <- reformulate(s.weights) s.weights <- model.frame(s.weights.form, data, na.action = "na.pass") if (ncol(s.weights) != 1L) { .err("`s.weights` can only contain one named variable") } s.weights <- s.weights[[1L]] } else if (rlang::is_formula(s.weights)) { s.weights.form <- update(terms(s.weights, data = data), NULL ~ .) 
s.weights <- model.frame(s.weights.form, data, na.action = "na.pass") if (ncol(s.weights) != 1L) { .err("`s.weights` can only contain one named variable") } s.weights <- s.weights[[1L]] } else if (!is.numeric(s.weights)) { .err("`s.weights` must be supplied as a numeric vector, string, or one-sided formula") } chk::chk_not_any_na(s.weights) if (length(s.weights) != n.obs) { .err("`s.weights` must be the same length as the treatment vector") } names(s.weights) <- names(treat) } #Process distance function is.full.mahalanobis <- FALSE fn1 <- NULL if (is_null(method) || !method %in% c("exact", "cem", "cardinality")) { distance <- process.distance(distance, method, treat) if (is.numeric(distance)) { fn1 <- "distance2user" } else if (is.character(distance)) { if (distance %in% matchit_distances()) { fn1 <- "distance2mahalanobis" is.full.mahalanobis <- TRUE attr(is.full.mahalanobis, "transform") <- distance } else { fn1 <- paste0("distance2", distance) } } } #Process covs if (is_not_null(fn1) && fn1 == "distance2gam") { rlang::check_installed("mgcv") env <- environment(formula) covs.formula <- mgcv::interpret.gam(formula)$fake.formula environment(covs.formula) <- env covs.formula <- delete.response(terms(covs.formula, data = data)) } else { covs.formula <- delete.response(terms(formula, data = data)) } covs.formula <- update(covs.formula, ~ .) covs <- model.frame(covs.formula, data = data, na.action = "na.pass") k <- ncol(covs) for (i in seq_len(k)) { if (anyNA(covs[[i]]) || (is.numeric(covs[[i]]) && !all(is.finite(covs[[i]])))) { covariates.with.missingness <- names(covs)[i:k][vapply(i:k, function(j) anyNA(covs[[j]]) || (is.numeric(covs[[j]]) && !all(is.finite(covs[[j]]))), logical(1L))] .err(paste0("Missing and non-finite values are not allowed in the covariates. 
Covariates with missingness or non-finite values:\n\t", toString(covariates.with.missingness)), tidy = FALSE) } if (is.character(covs[[i]])) { covs[[i]] <- factor(covs[[i]]) } } #Process exact, mahvars, and antiexact exactcovs <- process.variable.input(exact, data) exact <- attr(exactcovs, "terms") mahcovs <- process.variable.input(mahvars, data) mahvars <- attr(mahcovs, "terms") antiexactcovs <- process.variable.input(antiexact, data) antiexact <- attr(antiexactcovs, "terms") chk::chk_flag(verbose) chk::chk_flag(normalize) #Estimate distance, discard from common support, optionally re-estimate distance if (is_null(fn1) || is.full.mahalanobis) { #No distance measure dist.model <- distance <- link <- NULL } else if (fn1 == "distance2user") { dist.model <- link <- NULL } else { .cat_verbose("Estimating propensity scores...\n", verbose = verbose) if (is_not_null(s.weights)) { attr(s.weights, "in_ps") <- !distance %in% c("bart") } #Estimate distance if (is_null(distance.options)) { distance.options <- list(formula = formula, data = data, verbose = verbose, estimand = estimand) } else { chk::chk_list(distance.options) if (is_null(distance.options$formula)) distance.options$formula <- formula if (is_null(distance.options$data)) distance.options$data <- data if (is_null(distance.options$verbose)) distance.options$verbose <- verbose if (is_null(distance.options$estimand)) distance.options$estimand <- estimand } if (is_null(distance.options$weights) && !fn1 %in% c("distance2bart")) { distance.options$weights <- s.weights } distance.options$link <- { if (is_not_null(attr(distance, "link"))) attr(distance, "link") else link } dist.out <- do.call(fn1, distance.options) dist.model <- dist.out$model distance <- dist.out$distance #Remove smoothing terms from gam formula if (inherits(dist.model, "gam")) { env <- environment(formula) formula <- mgcv::interpret.gam(formula)$fake.formula environment(formula) <- env } } #Process discard if (is_null(fn1) || is.full.mahalanobis || 
identical(fn1, "distance2user")) { discarded <- discard(treat, distance, discard) } else { discarded <- discard(treat, dist.out$distance, discard) #Optionally reestimate if (reestimate && any(discarded)) { for (i in seq_along(distance.options)) { if (length(distance.options[[i]]) == n.obs) { distance.options[[i]] <- distance.options[[i]][!discarded] } else if (length(dim(distance.options[[i]])) == 2L && nrow(distance.options[[i]]) == n.obs) { distance.options[[i]] <- distance.options[[i]][!discarded, , drop = FALSE] } } dist.out <- do.call(fn1, distance.options, quote = TRUE) dist.model <- dist.out$model distance[!discarded] <- dist.out$distance } } #Process caliper calcovs <- NULL if (is_not_null(caliper)) { caliper <- process.caliper(caliper, method, data, covs, mahcovs, distance, discarded, std.caliper) if (is_not_null(attr(caliper, "cal.formula"))) { calcovs <- model.frame(attr(caliper, "cal.formula"), data = data, na.action = "na.pass") if (anyNA(calcovs)) { .err("missing values are not allowed in the covariates named in `caliper`") } attr(caliper, "cal.formula") <- NULL } } #Matching! 
match.out <- do.call(fn2, list(treat = treat, covs = covs, data = data, distance = distance, discarded = discarded, exact = exact, mahvars = mahvars, replace = replace, m.order = m.order, caliper = caliper, s.weights = s.weights, ratio = ratio, is.full.mahalanobis = is.full.mahalanobis, formula = formula, estimand = estimand, verbose = verbose, antiexact = antiexact, ...), quote = TRUE) weights <- match.out[["weights"]] #Normalize weights if (normalize) { wi <- which(weights > 0) weights[wi] <- .make_sum_to_n(weights[wi], treat[wi]) } info <- create_info(method, fn1, link, discard, replace, ratio, mahalanobis = is.full.mahalanobis || is_not_null(mahvars), transform = attr(is.full.mahalanobis, "transform"), subclass = match.out$subclass, antiexact = colnames(antiexactcovs), distance_is_matrix = is_not_null(distance) && is.matrix(distance)) #Create X output, removing duplicate variables X.list.nm <- c("covs", "exactcovs", "mahcovs", "calcovs", "antiexactcovs") X <- NULL for (i in X.list.nm) { X_tmp <- get0(i, inherits = FALSE) if (is_null(X_tmp)) { next } if (is_null(X)) { X <- X_tmp } else if (!all(hasName(X, names(X_tmp)))) { X <- cbind(X, X_tmp[!names(X_tmp) %in% names(X)]) } } ## putting all the results together out <- list( match.matrix = match.out[["match.matrix"]], subclass = match.out[["subclass"]], weights = weights, X = X, call = mcall, info = info, estimand = estimand, formula = formula, treat = treat, distance = if (is_not_null(distance) && !is.matrix(distance)) setNames(distance, names(treat)), discarded = discarded, s.weights = s.weights, exact = exact, mahvars = mahvars, caliper = caliper, q.cut = match.out[["q.cut"]], model = dist.model, obj = if (include.obj) match.out[["obj"]] ) out[lengths(out) == 0L] <- NULL class(out) <- class(match.out) out } #' @exportS3Method print matchit print.matchit <- function(x, ...) 
{ info <- x[["info"]] cal <- is_not_null(x[["caliper"]]) dis <- c("both", "control", "treat")[pmatch(info$discard, c("both", "control", "treat"), 0L)] disl <- is_not_null(dis) nm <- is_null(x[["method"]]) cat("A `matchit` object\n") cat(sprintf(" - method: %s\n", info_to_method(info))) if (is_not_null(info$distance) || info$mahalanobis) { cat(" - distance: ") if (info$mahalanobis) { if (is_null(info$transform)) #mahvars used cat("Mahalanobis") else { cat(capwords(gsub("_", " ", info$transform, fixed = TRUE))) } } if (is_not_null(info$distance) && !info$distance %in% matchit_distances()) { if (info$mahalanobis) cat(" [matching]\n ") if (info$distance_is_matrix) cat("User-defined (matrix)") else if (info$distance != "user") cat("Propensity score") else if (is_not_null(attr(info$distance, "custom"))) cat(attr(info$distance, "custom")) else cat("User-defined") if (cal || disl) { cal.ps <- hasName(x[["caliper"]], "") cat(sprintf(" [%s]\n", toString(c("matching", "subclassification", "caliper", "common support")[c(!nm && !info$mahalanobis && info$method != "subclass", !nm && info$method == "subclass", cal.ps, disl)]))) } if (info$distance != "user") { cat(sprintf("\n - estimated with %s\n", info_to_distance(info))) if (is_not_null(x[["s.weights"]])) { cat(sprintf(" - sampling weights %s in estimation\n", if (isTRUE(attr(x[["s.weights"]], "in_ps"))) "included" else "not included")) } } } } if (cal) { cat(sprintf(" - caliper: %s\n", toString(vapply(seq_along(x[["caliper"]]), function(z) { sprintf("%s (%s)", if (nzchar(names(x[["caliper"]])[z])) names(x[["caliper"]])[z] else "", format(round(x[["caliper"]][z], 3L))) }, character(1L))))) } if (disl) { cat(sprintf(" - common support: %s dropped\n", switch(dis, "both" = "units from both groups", "treat" = "treated units", "control" = "control units"))) } cat(sprintf(" - number of obs.: %s (original)%s\n", length(x[["treat"]]), if (all_equal_to(x[["weights"]], 1)) "" else sprintf(", %s (matched)", sum(x[["weights"]] != 0)))) if 
(is_not_null(x[["s.weights"]])) { cat(" - sampling weights: present\n") } if (is_not_null(x[["estimand"]])) { cat(sprintf(" - target estimand: %s\n", x[["estimand"]])) } if (is_not_null(x[["X"]])) { cat(sprintf(" - covariates: %s\n", if (length(names(x[["X"]])) > 40L) "too many to name" else toString(names(x[["X"]])))) } invisible(x) } matchit2null <- function(discarded, ...) { res <- list(weights = as.numeric(!discarded)) class(res) <- "matchit" res } MatchIt/R/matchit2cardinality.R0000644000176200001440000007424114763323306016105 0ustar liggesusers#' Cardinality Matching #' @name method_cardinality #' @aliases method_cardinality #' @usage NULL #' #' @description #' In [matchit()], setting `method = "cardinality"` performs cardinality #' matching and other forms of matching that use mixed integer programming. #' Rather than forming pairs, cardinality matching selects the largest subset #' of units that satisfies user-supplied balance constraints on mean #' differences. One of several available optimization programs can be used to #' solve the mixed integer program. The default is the HiGHS library as #' implemented in the *highs* package, both of which are free, but performance can be #' improved using Gurobi and the *gurobi* package, for which there is a #' free academic license. #' #' This page details the allowable arguments with `method = "cardinality"`. See [matchit()] for an explanation of what each argument #' means in a general context and how it can be specified. #' #' Below is how `matchit()` is used for cardinality matching: #' \preformatted{ #' matchit(formula, #' data = NULL, #' method = "cardinality", #' estimand = "ATT", #' exact = NULL, #' mahvars = NULL, #' s.weights = NULL, #' ratio = 1, #' verbose = FALSE, #' tols = .05, #' std.tols = TRUE, #' solver = "highs", #' ...) } #' #' @param formula a two-sided [formula] object containing the treatment and #' covariates to be balanced. 
#' @param data a data frame containing the variables named in `formula`. #' If not found in `data`, the variables will be sought in the #' environment. #' @param method set here to `"cardinality"`. #' @param estimand a string containing the desired estimand. Allowable options #' include `"ATT"`, `"ATC"`, and `"ATE"`. See Details. #' @param exact for which variables exact matching should take place. Separate #' optimization will occur within each subgroup of the exact matching #' variables. #' @param mahvars which variables should be used for pairing after subset selection. Can only be set when `ratio` is a whole number. See Details. #' @param s.weights the variable containing sampling weights to be incorporated #' into the optimization. The balance constraints refer to the product of the #' sampling weights and the matching weights, and the sum of the product of the #' sampling and matching weights will be maximized. #' @param ratio the desired ratio of control to treated units. Can be set to #' `NA` to maximize sample size without concern for this ratio. See #' Details. #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. #' @param \dots additional arguments that control the matching specification: #' \describe{ #' \item{`tols`}{`numeric`; a vector of imbalance #' tolerances for mean differences, one for each covariate in `formula`. #' If only one value is supplied, it is applied to all. See `std.tols` #' below. Default is `.05` for standardized mean differences of at most #' .05 for all covariates between the treatment groups in the matched sample. #' } #' \item{`std.tols`}{`logical`; whether each entry in `tols` #' corresponds to a raw or standardized mean difference. If only one value is #' supplied, it is applied to all. Default is `TRUE` for standardized mean #' differences. 
The standardization factor is the pooled standard deviation #' when `estimand = "ATE"`, the standard deviation of the treated group #' when `estimand = "ATT"`, and the standard deviation of the control #' group when `estimand = "ATC"` (the same as used in #' [summary.matchit()]).} #' \item{`solver`}{ the name of the solver to use to #' solve the optimization problem. Available options include `"highs"`, `"glpk"`, #' `"symphony"`, and `"gurobi"` for HiGHS (implemented in the *highs* package), GLPK (implemented in the #' *Rglpk* package), SYMPHONY (implemented in the *Rsymphony* #' package), and Gurobi (implemented in the *gurobi* package), #' respectively. The differences between them are in speed and solving ability. #' HiGHS (the default) and GLPK are the easiest to install, but Gurobi is recommended as #' it consistently outperforms other solvers and can find solutions even when #' others can't, and in less time. Gurobi is proprietary but can be used with a #' free trial or academic license. SYMPHONY may not produce reproducible #' results, even with a seed set. } #' \item{`time`}{ the maximum amount of #' time before the optimization routine aborts, in seconds. Default is 120 (2 #' minutes). For large problems, this should be set much higher. } #' } #' #' The arguments `distance` (and related arguments), `replace`, `m.order`, and `caliper` (and related arguments) are ignored with a warning. #' #' @section Outputs: #' #' Most outputs described in [matchit()] are returned with #' `method = "cardinality"`. Unless `mahvars` is specified, the `match.matrix` and `subclass` #' components are omitted because no pairing or subclassification is done. When #' `include.obj = TRUE` in the call to `matchit()`, the output of the #' optimization function will be included in the output. When `exact` is #' specified, this will be a list of such objects, one for each stratum of the #' exact variables. 
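#'
#' For instance, the stored solver output could be inspected as follows (a
#' sketch; `d`, `X1`, and `X2` are hypothetical):
#' \preformatted{
#' m <- matchit(treat ~ X1 + X2, data = d,
#'              method = "cardinality",
#'              include.obj = TRUE)
#' m$obj  # raw output of the optimization function
#' }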
#' #' @details #' ## Cardinality and Profile Matching #' #' Two types of matching are #' available with `method = "cardinality"`: cardinality matching and #' profile matching. #' #' **Cardinality matching** finds the largest matched set that satisfies the #' balance constraints between treatment groups, with the additional constraint #' that the ratio of the number of matched control to matched treated units is #' equal to `ratio` (1 by default), mimicking k:1 matching. When not all #' treated units are included in the matched set, the estimand no longer #' corresponds to the ATT, so cardinality matching should be avoided if #' retaining the ATT is desired. To request cardinality matching, #' `estimand` should be set to `"ATT"` or `"ATC"` and #' `ratio` should be set to a positive integer. 1:1 cardinality matching #' is the default method when no arguments are specified. #' #' **Profile matching** finds the largest matched set that satisfies balance #' constraints between each treatment group and a specified target sample. When #' `estimand = "ATT"`, it will find the largest subset of the control #' units that satisfies the balance constraints with respect to the treated #' group, which is left intact. When `estimand = "ATE"`, it will find the #' largest subsets of the treated group and of the control group that are #' balanced to the overall sample. To request profile matching for the ATT, #' `estimand` should be set to `"ATT"` and `ratio` to `NA`. #' To request profile matching for the ATE, `estimand` should be set to #' `"ATE"` and `ratio` can be set either to `NA` to maximize the #' size of each sample independently or to a positive integer to ensure that #' the ratio of matched control units to matched treated units is fixed, #' mimicking k:1 matching. Unlike cardinality matching, profile matching #' retains the requested estimand if a solution is found. 
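#'
#' The two requests can be sketched as follows (`d`, `X1`, and `X2` are
#' hypothetical):
#' \preformatted{
#' # Cardinality matching: a positive integer ratio
#' matchit(treat ~ X1 + X2, data = d,
#'         method = "cardinality",
#'         estimand = "ATT", ratio = 1)
#'
#' # Profile matching for the ATT: ratio = NA
#' matchit(treat ~ X1 + X2, data = d,
#'         method = "cardinality",
#'         estimand = "ATT", ratio = NA)
#' }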
#' #' Neither method involves creating pairs in the matched set, but it is #' possible to perform an additional round of pairing within the matched sample #' after cardinality matching or profile matching for the ATE with a fixed whole number #' sample size ratio by supplying the desired pairing variables to `mahvars`. Doing so will trigger [optimal matching][method_optimal] using `optmatch::pairmatch()` on the Mahalanobis distance computed using the variables supplied to `mahvars`. The balance or composition of the matched sample will not change, but additional #' precision and robustness can be gained by forming the pairs. #' #' The weights are scaled so that the sum of the weights in each group is equal #' to the number of matched units in the smaller group when cardinality #' matching or profile matching for the ATE, and scaled so that the sum of the #' weights in the control group is equal to the number of treated units when #' profile matching for the ATT. When the sample sizes of the matched groups #' are the same (i.e., when `ratio = 1`), no scaling is done. Robust #' standard errors should be used in effect estimation after cardinality or #' profile matching (and cluster-robust standard errors if additional pairing #' is done in the matched sample). See `vignette("estimating-effects")` #' for more information. #' #' ## Specifying Balance Constraints #' #' The balance constraints are on #' the (standardized) mean differences between the matched treatment groups for #' each covariate. Balance constraints should be set by supplying arguments to #' `tols` and `std.tols`. For example, setting `tols = .1` and #' `std.tols = TRUE` requests that all the mean differences in the matched #' sample should be within .1 standard deviations for each covariate. Different #' tolerances can be set for different variables; it might be beneficial to #' constrain the mean differences for highly prognostic covariates more tightly #' than for other variables. 
For example, one could specify `tols = c(.001, .05), std.tols = c(TRUE, FALSE)` #' to request that the standardized #' mean difference for the first covariate is less than .001 and the raw mean #' difference for the second covariate is less than .05. The values should be #' specified in the order they appear in `formula`, except when #' interactions are present. One can run the following code: #' #' \preformatted{MatchIt:::get_assign(model.matrix(~X1*X2 + X3, data = data))[-1]} #' #' which will output a vector of numbers and the variable to which each number #' corresponds; the first entry in `tols` corresponds to the variable #' labeled 1, the second to the variable labeled 2, etc. #' #' ## Dealing with Errors and Warnings #' #' When the optimization cannot be #' solved at all, or at least within the time frame specified in the argument #' to `time`, an error or warning will appear. Unfortunately, it is hard #' to know exactly the cause of the failure and what measures should be taken #' to rectify it. #' #' A warning that says `"The optimizer failed to find an optimal solution in the time allotted. The returned solution may not be optimal."` usually #' means that an optimal solution may be possible to find with more time, in #' which case `time` should be increased or a faster solver should be #' used. Even with this warning, a potentially usable solution will be #' returned, so don't automatically take it to mean the optimization failed. #' Sometimes, when there are multiple solutions with the same resulting sample #' size, the optimizers will stall at one of them, not thinking it has found #' the optimum. The result should be checked to see if it can be used as the #' solution. #' #' An error that says `"The optimization problem may be infeasible."` #' usually means that there is an issue with the optimization problem, i.e., #' that there is no possible way to satisfy the constraints. 
To rectify this, #' one can try relaxing the constraints by increasing the value of `tols` #' or use another solver. Sometimes Gurobi can solve problems that the other #' solvers cannot. #' #' @seealso [matchit()] for a detailed explanation of the inputs and outputs of #' a call to `matchit()`. #' #' *\CRANpkg{designmatch}*, which performs cardinality and profile matching with many more options and #' more flexibility. The implementations of cardinality matching differ between #' *MatchIt* and *designmatch*, so their results might differ. #' #' *\CRANpkg{optweight}*, which offers similar functionality but in the context of weighting rather #' than matching. #' #' @references In a manuscript, you should reference the solver used in the #' optimization. For example, a sentence might read: #' #' *Cardinality matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R with the optimization performed by HiGHS (Huangfu & Hall, 2018).* #' #' See `vignette("matching-methods")` for more literature on cardinality #' matching. 
#' #' @examplesIf requireNamespace("highs", quietly = TRUE) #' data("lalonde") #' #' #Choose your solver; "gurobi" is best, "highs" is free and #' #easy to install #' \donttest{solver <- "highs" #' # 1:1 cardinality matching #' m.out1 <- matchit(treat ~ age + educ + re74, #' data = lalonde, #' method = "cardinality", #' estimand = "ATT", #' ratio = 1, #' tols = .2, #' solver = solver) #' m.out1 #' summary(m.out1) #' #' # Profile matching for the ATT #' m.out2 <- matchit(treat ~ age + educ + re74, #' data = lalonde, #' method = "cardinality", #' estimand = "ATT", #' ratio = NA, #' tols = .2, #' solver = solver) #' m.out2 #' summary(m.out2, un = FALSE) #' #' # Profile matching for the ATE #' m.out3 <- matchit(treat ~ age + educ + re74, #' data = lalonde, #' method = "cardinality", #' estimand = "ATE", #' ratio = NA, #' tols = .2, #' solver = solver) #' m.out3 #' summary(m.out3, un = FALSE)} #' @examplesIf (requireNamespace("highs", quietly = TRUE) && requireNamespace("optmatch", quietly = TRUE)) #' \donttest{# Pairing after 1:1 cardinality matching: #' m.out1b <- matchit(treat ~ age + educ + re74, #' data = lalonde, #' method = "cardinality", #' estimand = "ATT", #' ratio = 1, #' tols = .15, #' solver = solver, #' mahvars = ~ age + educ + re74) #' #' # Note that balance doesn't change but pair distances #' # are lower for the paired-upon variables #' summary(m.out1b, un = FALSE) #' summary(m.out1, un = FALSE)} #' #' # In these examples, a high tol was used and #' # few covariates matched on in order to not take too long; #' # with real data, tols should be much lower and more #' # covariates included if possible. NULL matchit2cardinality <- function(treat, data, discarded, formula, ratio = 1, focal = NULL, s.weights = NULL, replace = FALSE, mahvars = NULL, exact = NULL, estimand = "ATT", verbose = FALSE, tols = .05, std.tols = TRUE, solver = "highs", time = 2 * 60, ...) { .cat_verbose("Cardinality matching... 
\n", verbose = verbose) tvals <- unique(treat) nt <- length(tvals) estimand <- toupper(estimand) estimand <- match_arg(estimand, c("ATT", "ATC", "ATE")) if (is_null(focal)) { focal <- switch(estimand, "ATC" = min(tvals), max(tvals)) } else if (!any(tvals == focal)) { .err("`focal` must be a value of the treatment") } lab <- names(treat) weights <- rep_with(0, treat) X <- get_covs_matrix(formula, data = data) if (is_not_null(exact)) { ex <- exactify(model.frame(exact, data = data), nam = lab, sep = ", ", include_vars = TRUE) cc <- Reduce("intersect", lapply(tvals, function(t) unclass(ex)[treat == t])) if (is_null(cc)) { .err("no matches were found") } } else { ex <- gl(1, length(treat)) cc <- 1 } #Process mahvars if (is_not_null(mahvars)) { if (!is.finite(ratio) || !chk::vld_whole_number(ratio)) { .err("`mahvars` can only be used with `method = \"cardinality\"` when `ratio` is a whole number") } rlang::check_installed("optmatch") mahcovs <- transform_covariates(mahvars, data = data, method = "mahalanobis", s.weights = s.weights, treat = treat, discarded = discarded) pair <- rep_with(NA_character_, treat) } else { pair <- NULL } #Process tols assign <- get_assign(X) chk::chk_numeric(tols) if (length(tols) == 1L) { tols <- rep.int(tols, ncol(X)) } else if (length(tols) == max(assign)) { tols <- tols[assign] } else if (length(tols) != ncol(X)) { .err("`tols` must have length 1 or the number of covariates. See `?method_cardinality` for details") } chk::chk_logical(std.tols) if (length(std.tols) == 1L) { std.tols <- rep.int(std.tols, ncol(X)) } else if (length(std.tols) == max(assign)) { std.tols <- std.tols[assign] } else if (length(std.tols) != ncol(X)) { .err("`std.tols` must have length 1 or the number of covariates. 
See `?method_cardinality` for details") } #Apply std.tols if (any(std.tols)) { std.tols <- which(std.tols) sds <- { if (estimand == "ATE") { pooled_sd(X[, std.tols, drop = FALSE], t = treat, w = s.weights, contribution = "equal") } else { sqrt(apply(X[treat == focal, std.tols, drop = FALSE], 2, wvar, w = s.weights[treat == focal])) } } for (i in which(sds >= 1e-10)) { X[, std.tols[i]] <- X[, std.tols[i]] / sds[i] } } opt.out <- setNames(vector("list", nlevels(ex)), levels(ex)) for (e in levels(ex)[cc]) { if (nlevels(ex) > 1L) { .cat_verbose(sprintf("Matching subgroup %s/%s: %s...\n", match(e, levels(ex)[cc]), length(cc), e), verbose = verbose) } .e <- which(!discarded & ex == e) treat_in.exact <- treat[.e] out <- cardinality_matchit(treat = treat_in.exact, X = X[.e, , drop = FALSE], estimand = estimand, tols = tols, s.weights = s.weights[.e], ratio = ratio, focal = focal, tvals = tvals, solver = solver, time = time, verbose = verbose) weights[.e] <- out[["weights"]] opt.out[[e]] <- out[["opt.out"]] if (is_not_null(mahvars)) { mo <- eucdist_internal(mahcovs[.e[out[["weights"]] > 0], , drop = FALSE], treat_in.exact[out[["weights"]] > 0]) rlang::with_options({ pm <- optmatch::pairmatch(mo, controls = ratio, data = data.frame(treat_in.exact)) }, optmatch_max_problem_size = Inf) pair[names(pm)[!is.na(pm)]] <- paste(as.character(pm[!is.na(pm)]), e, sep = "|") } } if (is_not_null(pair)) { psclass <- factor(pair) levels(psclass) <- seq_len(nlevels(psclass)) names(psclass) <- names(treat) mm <- nummm2charmm(subclass2mmC(psclass, treat, focal = switch(estimand, "ATC" = 0, 1)), treat) } else { mm <- psclass <- NULL } if (length(opt.out) == 1L) { out <- out[[1L]] } res <- list(match.matrix = mm, subclass = psclass, weights = weights, obj = opt.out) .cat_verbose("Done.\n", verbose = verbose) class(res) <- "matchit" res } ## Function to actually do the matching cardinality_matchit <- function(treat, X, estimand = "ATT", tols = .05, s.weights = NULL, ratio = 1, focal = NULL, 
tvals = NULL, solver = "highs", time = 2 * 60, verbose = FALSE) { n <- length(treat) if (is_null(tvals)) tvals <- if (is.factor(treat)) levels(treat) else sort(unique(treat)) nt <- length(tvals) #Check inputs if (is_null(s.weights)) { s.weights <- rep.int(1, n) } else { s.weights <- .make_sum_to_n(s.weights, treat) } if (is_null(focal)) { focal <- tvals[length(tvals)] } chk::chk_number(time) chk::chk_gt(time, 0) chk::chk_string(solver) solver <- match_arg(solver, c("highs", "glpk", "symphony", "gurobi")) rlang::check_installed(switch(solver, glpk = "Rglpk", symphony = "Rsymphony", gurobi = "gurobi", highs = "highs")) #Select match type match_type <- { if (estimand == "ATE") "profile_ate" else if (is.finite(ratio)) "cardinality" else "profile_att" } #Set objective and constraints if (match_type == "profile_ate") { #Find largest sample that matches full sample #Objective function: total sample size O <- c( s.weights, #weight for each unit rep.int(0, nt) #slack coefs for each sample size (n1, n0) ) #Constraint matrix target.means <- apply(X, 2, wm, w = s.weights) C <- matrix(0, nrow = nt * (1 + 2 * ncol(X)), ncol = length(O)) Crhs <- rep.int(0, nrow(C)) Cdir <- rep.int("==", nrow(C)) for (i in seq_len(nt)) { #Num in group i = ni C[i, seq_len(n)] <- s.weights * (treat == tvals[i]) C[i, n + i] <- -1 #Cov means must be less than target.means+tols/2 r1 <- nt + (i - 1) * 2 * ncol(X) + seq_len(ncol(X)) C[r1, seq_len(n)] <- t((treat == tvals[i]) * s.weights * X) C[r1, n + i] <- -target.means - tols / 2 Cdir[r1] <- "<" #Cov means must be greater than target.means-tols/2 r2 <- r1 + ncol(X) C[r2, seq_len(n)] <- t((treat == tvals[i]) * s.weights * X) C[r2, n + i] <- -target.means + tols / 2 Cdir[r2] <- ">" } #If ratio != 0, constrain n0 to be ratio*n1 if (nt == 2L && is.finite(ratio)) { C_ratio <- c(rep.int(0, n), rep.int(-1, nt)) C_ratio[n + which(tvals == focal)] <- ratio C <- rbind(C, C_ratio) Crhs <- c(Crhs, 0) Cdir <- c(Cdir, "==") } #Coef types types <- c(rep.int("B", n), 
#Matching weights rep.int("C", nt)) #Slack coefs for matched group size lower.bound <- c(rep.int(0, n), rep.int(1, nt)) upper.bound <- c(rep.int(1, n), rep.int(Inf, nt)) } else if (match_type == "profile_att") { #Find largest control group that matches treated group nonf <- which(treat != focal) n0 <- length(nonf) tvals_ <- setdiff(tvals, focal) #Objective function: size of matched control group O <- c( rep.int(1, n0), #weights for each non-focal unit rep.int(0, nt - 1) #slack coef for size of non-focal groups ) #Constraint matrix target.means <- apply(X[treat == focal, , drop = FALSE], 2, wm, w = s.weights[treat == focal]) #One row per constraint, one column per coef C <- matrix(0, nrow = (nt - 1) * (1 + 2 * ncol(X)), ncol = length(O)) Crhs <- rep.int(0, nrow(C)) Cdir <- rep.int("==", nrow(C)) for (i in seq_len(nt - 1)) { #Num in group i = ni C[i, seq_len(n0)] <- s.weights[nonf] * (treat[nonf] == tvals_[i]) C[i, n0 + i] <- -1 #Cov means must be less than target.means+tols r1 <- nt - 1 + (i - 1) * 2 * ncol(X) + seq_len(ncol(X)) C[r1, seq_len(n0)] <- t((treat[nonf] == tvals_[i]) * s.weights[nonf] * X[nonf, , drop = FALSE]) C[r1, n0 + i] <- -target.means - tols Cdir[r1] <- "<" #Cov means must be greater than target.means-tols r2 <- r1 + ncol(X) C[r2, seq_len(n0)] <- t((treat[nonf] == tvals_[i]) * s.weights[nonf] * X[nonf, , drop = FALSE]) C[r2, n0 + i] <- -target.means + tols Cdir[r2] <- ">" } #Coef types types <- c(rep.int("B", n0), #Matching weights rep.int("C", nt - 1L)) #Slack for num control matched lower.bound <- c(rep.int(0, n0), rep.int(0, nt - 1L)) upper.bound <- c(rep.int(1, n0), rep.int(Inf, nt - 1L)) } else if (match_type == "cardinality") { #True cardinality matching: find largest balanced sample if (nt > 2) ratio <- 1 #Objective function: total sample size O <- c( s.weights, #weight for each unit 0 #coef for treated sample size (n1) ) #Constraint matrix t_combs <- combn(tvals, 2L, simplify = FALSE) C <- matrix(0, nrow = nt + 2 * ncol(X) * 
length(t_combs), ncol = length(O)) Crhs <- rep.int(0, nrow(C)) Cdir <- rep.int("==", nrow(C)) for (i in seq_len(nt)) { #Num in group i = ni C[i, seq_len(n)] <- s.weights * (treat == tvals[i]) C[i, n + 1L] <- if (tvals[i] == focal) -1 else -ratio } for (j in seq_along(t_combs)) { t_comb <- t_combs[[j]] if (t_comb[2L] == focal) t_comb <- rev(t_comb) r1 <- nt + (j - 1) * 2 * ncol(X) + seq_len(ncol(X)) C[r1, seq_len(n)] <- t(((treat == t_comb[1L]) - (treat == t_comb[2L]) / ratio) * s.weights * X) C[r1, n + 1L] <- -tols Cdir[r1] <- "<" r2 <- r1 + ncol(X) C[r2, seq_len(n)] <- t(((treat == t_comb[1L]) - (treat == t_comb[2L]) / ratio) * s.weights * X) C[r2, n + 1L] <- tols Cdir[r2] <- ">" } #Coef types types <- c(rep.int("B", n), #Matching weights rep.int("C", 1L)) #Slack coef for treated group size (n1) lower.bound <- c(rep.int(0, n), rep.int(0, 1L)) upper.bound <- c(rep.int(1, n), rep.int(min(tabulateC(treat)), 1L)) } weights <- NULL opt.out <- dispatch_optimizer(solver = solver, obj = O, mat = C, dir = Cdir, rhs = Crhs, types = types, max = TRUE, lb = lower.bound, ub = upper.bound, time = time, verbose = verbose) cardinality_error_report(opt.out, solver) sol <- switch(solver, "glpk" = opt.out$solution, "symphony" = opt.out$solution, "gurobi" = opt.out$x, "highs" = opt.out$primal_solution) if (match_type %in% c("profile_ate", "cardinality")) { weights <- round(sol[seq_len(n)]) } else if (match_type %in% c("profile_att")) { weights <- rep.int(1, n) weights[treat != focal] <- round(sol[seq_len(n0)]) } #Make sure sum of weights in both groups is the same (important for exact matching) if (match_type == "profile_att" && (is.na(ratio) || ratio != 1)) { for (t in setdiff(tvals, focal)) { weights[treat == t] <- weights[treat == t] * sum(weights[treat == focal]) / sum(weights[treat == t]) } } else { smallest.group <- tvals[which.min(vapply(tvals, function(t) sum(treat == t), numeric(1L)))] for (t in setdiff(tvals, smallest.group)) { weights[treat == t] <- weights[treat == t] * 
sum(weights[treat == smallest.group]) / sum(weights[treat == t]) } } list(weights = weights, opt.out = opt.out) } cardinality_error_report <- function(out, solver) { if (solver == "glpk") { if (out$status == 1) { if (all_equal_to(out$solution, 0)) { .err("the optimization problem may be infeasible. Try increasing the value of `tols`.\nSee `?method_cardinality` for additional details") } .wrn("the optimizer failed to find an optimal solution in the time allotted. The returned solution may not be optimal.\nSee `?method_cardinality` for additional details") } } else if (solver == "symphony") { if (names(out$status) %in% c("TM_TIME_LIMIT_EXCEEDED") && !all(out$solution == 0) && all(out$solution <= 1)) { .wrn("the optimizer failed to find an optimal solution in the time allotted. The returned solution may not be optimal") } else if (names(out$status) != "TM_OPTIMAL_SOLUTION_FOUND") { .err("the optimizer failed to find an optimal solution in the time allotted. The optimization problem may be infeasible. Try increasing the value of 'tols'.\nSee `?method_cardinality` for additional details") } } else if (solver == "gurobi") { if (out$status %in% c("TIME_LIMIT", "SUBOPTIMAL") && !all(out$x == 0)) { .wrn("the optimizer failed to find an optimal solution in the time allotted. The returned solution may not be optimal.\nSee `?method_cardinality` for additional details") } else if (out$status %in% c("INFEASIBLE", "INF_OR_UNBD", "NUMERIC") || all(out$x == 0)) { .err("The optimization problem may be infeasible. Try increasing the value of `tols`.\nSee `?method_cardinality` for additional details") } } else if (solver == "highs") { if (out$status_message %in% c("Infeasible", "Primal infeasible or unbounded")) { # if (out$status_message %in% c("Infeasible", "Primal infeasible or unbounded") || # all(abs(out$primal_solution) < 1e-8)) { .err("the optimization problem may be infeasible. 
Try increasing the value of `tols`.\nSee `?method_cardinality` for additional details") } if (out$status_message %in% c("Time limit reached", "Iteration limit reached")) { .err("the optimizer failed to find an optimal solution in the time allotted. Try increasing the value of `time`.\nSee `?method_cardinality` for additional details") } } } dispatch_optimizer <- function(solver = "highs", obj, mat, dir, rhs, types, max = TRUE, lb = NULL, ub = NULL, time = NULL, verbose = FALSE) { if (solver == "glpk") { dir[dir == "="] <- "==" opt.out <- Rglpk::Rglpk_solve_LP(obj = obj, mat = mat, dir = dir, rhs = rhs, max = max, types = types, # bounds = list(lower = lb, upper = ub), #Spurious warning when using bounds control = list(tm_limit = time * 1000, verbose = verbose)) } else if (solver == "symphony") { dir[dir == "<"] <- "<=" dir[dir == ">"] <- ">=" dir[dir == "="] <- "==" opt.out <- Rsymphony::Rsymphony_solve_LP(obj = obj, mat = mat, dir = dir, rhs = rhs, max = TRUE, types = types, verbosity = verbose - 2, # bounds = list(lower = lb, upper = ub), #Spurious warning when using bounds time_limit = time) } else if (solver == "gurobi") { dir[dir == "<="] <- "<" dir[dir == ">="] <- ">" dir[dir == "=="] <- "=" opt.out <- gurobi::gurobi(list(A = mat, obj = obj, sense = dir, rhs = rhs, vtype = types, modelsense = "max", lb = lb, ub = ub), params = list(OutputFlag = as.integer(verbose), TimeLimit = time)) } else if (solver == "highs") { rhs_h <- lhs_h <- rhs rhs_h[dir == ">"] <- Inf lhs_h[dir == "<"] <- -Inf types[types == "B"] <- "I" opt.out <- highs::highs_solve(L = obj, lower = lb, upper = ub, A = mat, lhs = lhs_h, rhs = rhs_h, types = types, maximum = max, control = list(time_limit = time, log_to_console = verbose)) } opt.out } MatchIt/R/plot.matchit.R0000644000176200001440000010144614763226540014554 0ustar liggesusers#' Generate Balance Plots after Matching and Subclassification #' #' Generates plots displaying distributional balance and overlap on covariates #' and propensity 
scores before and after matching and subclassification. For #' displaying balance solely on covariate standardized mean differences, see #' [plot.summary.matchit()]. The plots here can be used to assess to what #' degree covariate and propensity score distributions are balanced and how #' weighting and discarding affect the distribution of propensity scores. #' #' @aliases plot.matchit plot.matchit.subclass #' #' @param x a `matchit` object; the output of a call to [matchit()]. #' @param type the type of plot to display. Options include `"qq"`, #' `"ecdf"`, `"density"`, `"jitter"`, and `"histogram"`. #' See Details. Default is `"qq"`. Abbreviations allowed. #' @param interactive `logical`; whether the graphs should be displayed in #' an interactive way. Only applies for `type = "qq"`, `"ecdf"`, #' `"density"`, and `"jitter"`. See Details. #' @param which.xs with `type = "qq"`, `"ecdf"`, or `"density"`, #' for which covariate(s) plots should be displayed. Factor variables should be #' named by the original variable name rather than the names of individual #' dummy variables created after expansion with `model.matrix`. Can be supplied as a character vector or a one-sided formula. #' @param data an optional data frame containing variables named in `which.xs` but not present in the `matchit` object. #' @param subclass with subclassification and `type = "qq"`, #' `"ecdf"`, or `"density"`, whether to display balance for #' individual subclasses, and, if so, for which ones. Can be `TRUE` #' (display plots for all subclasses), `FALSE` (display plots only in #' aggregate), or the indices (e.g., `1:6`) of the specific subclasses for #' which to display balance. When unspecified, if `interactive = TRUE`, #' you will be asked for which subclasses plots are desired, and otherwise, #' plots will be displayed only in aggregate. #' @param \dots arguments passed to [plot()] to control the appearance of the #' plot. Not all options are accepted. 
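#'
#' For example, covariate balance plots for a few variables could be requested
#' as follows (a sketch; `m.out` is a hypothetical `matchit` object):
#' \preformatted{
#' plot(m.out, type = "ecdf",
#'      which.xs = ~age + educ,
#'      interactive = FALSE)
#' }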
#' #' @details #' `plot.matchit()` makes one of five different plots depending on the #' argument supplied to `type`. The first three, `"qq"`, #' `"ecdf"`, and `"density"`, assess balance on the covariates. When #' `interactive = TRUE`, plots for three variables will be displayed at a #' time, and the prompt in the console allows you to move on to the next set of #' variables. When `interactive = FALSE`, multiple pages are plotted at #' the same time, but only the last few variables will be visible in the #' displayed plot. To see only a few specific variables at a time, use the #' `which.xs` argument to display plots for just those variables. If fewer #' than three variables are available (after expanding factors into their #' dummies), `interactive` is ignored. #' #' With `type = "qq"`, empirical quantile-quantile (eQQ) plots are created #' for each covariate before and after matching. The plots involve #' interpolating points in the smaller group based on the weighted quantiles of #' the other group. When points are approximately on the 45-degree line, the #' distributions in the treatment and control groups are approximately equal. #' Major deviations indicate departures from distributional balance. With #' variables with fewer than 5 unique values, points are jittered to more easily #' visualize counts. #' #' With `type = "ecdf"`, empirical cumulative distribution function (eCDF) #' plots are created for each covariate before and after matching. Two eCDF #' lines are produced in each plot: a gray one for control units and a black #' one for treated units. Each point on the lines corresponds to the proportion #' of units (or proportionate share of weights) less than or equal to the #' corresponding covariate value (on the x-axis). Deviations between the lines #' on the same plot indicate distributional imbalance between the treatment #' groups for the covariate. 
The eCDF and eQQ statistics in [summary.matchit()] #' correspond to these plots: the eCDF max (also known as the #' Kolmogorov-Smirnov statistic) and mean are the largest and average vertical #' distance between the lines, and the eQQ max and mean are the largest and #' average horizontal distance between the lines. #' #' With `type = "density"`, density plots are created for each covariate #' before and after matching. Two densities are produced in each plot: a gray #' one for control units and a black one for treated units. The x-axis #' corresponds to the value of the covariate and the y-axis corresponds to the #' density or probability of that covariate value in the corresponding group. #' For binary covariates, bar plots are produced, having the same #' interpretation. Deviations between the black and gray lines represent #' imbalances in the covariate distribution; when the lines coincide (i.e., #' when only the black line is visible), the distributions are identical. #' #' The last two plots, `"jitter"` and `"histogram"`, visualize the #' distance (i.e., propensity score) distributions. These plots are more for #' heuristic purposes since the purpose of matching is to achieve balance on #' the covariates themselves, not the propensity score. #' #' With `type = "jitter"`, a jitter plot is displayed for distance values #' before and after matching. This method requires a distance variable (e.g., a #' propensity score) to have been estimated or supplied in the call to #' `matchit()`. The plot displays individual values for matched and #' unmatched treatment and control units arranged horizontally by their #' propensity scores. Points are jittered so counts are easier to see. The size #' of the points increases when they receive higher weights. When #' `interactive = TRUE`, you can click on points in the graph to identify #' their rownames and indices to further probe extreme values, for example. 
#' With subclassification, vertical lines representing the subclass boundaries
#' are overlaid on the plots.
#'
#' With `type = "histogram"`, a histogram of distance values is displayed
#' for the treatment and control groups before and after matching. This method
#' requires a distance variable (e.g., a propensity score) to have been
#' estimated or supplied in the call to `matchit()`. With
#' subclassification, vertical lines representing the subclass boundaries are
#' overlaid on the plots.
#'
#' With all methods, sampling weights are incorporated into the weights if
#' present.
#'
#' @note Sometimes, bugs in the plotting functions can cause strange layout or
#' size issues. Running [frame()] or [dev.off()] can be used to reset the
#' plotting pane (note the latter will delete any plots in the plot history).
#'
#' @seealso [summary.matchit()] for numerical summaries of balance, including
#' those that rely on the eQQ and eCDF plots.
#'
#' [plot.summary.matchit()] for plotting standardized mean differences in a
#' Love plot.
#'
#' \pkgfun{cobalt}{bal.plot} for displaying distributional balance in several other
#' ways that are more easily customizable and produce *ggplot2* objects.
#' *cobalt* functions natively support `matchit` objects.
#'
#' @examples
#' data("lalonde")
#'
#' m.out <- matchit(treat ~ age + educ + married +
#'                    race + re74,
#'                  data = lalonde,
#'                  method = "nearest")
#' plot(m.out, type = "qq",
#'      interactive = FALSE,
#'      which.xs = ~age + educ + re74)
#' plot(m.out, type = "histogram")
#'
#' s.out <- matchit(treat ~ age + educ + married +
#'                    race + nodegree + re74 + re75,
#'                  data = lalonde,
#'                  method = "subclass")
#' plot(s.out, type = "density",
#'      interactive = FALSE,
#'      which.xs = ~age + educ + re74,
#'      subclass = 3)
#' plot(s.out, type = "jitter",
#'      interactive = FALSE)
#'
#' @exportS3Method plot matchit
plot.matchit <- function(x, type = "qq", interactive = TRUE,
                         which.xs = NULL, data = NULL, ...)
{
  chk::chk_string(type)
  type <- tolower(type)
  type <- match_arg(type, c("qq", "ecdf", "density", "jitter", "histogram"))

  if (type %in% c("qq", "ecdf", "density")) {
    matchit.covplot(x, type = type, interactive = interactive,
                    which.xs = which.xs, data = data, ...)
  } else if (type == "jitter") {
    if (is_null(x$distance)) {
      .err('`type = "jitter"` cannot be used if a distance measure is not estimated or supplied. No plots generated')
    }
    jitter_pscore(x, interactive = interactive, ...)
  } else if (type == "histogram") {
    if (is_null(x$distance)) {
      .err('`type = "histogram"` cannot be used if a distance measure is not estimated or supplied. No plots generated')
    }
    hist_pscore(x, ...)
  }

  invisible(x)
}

#' @exportS3Method plot matchit.subclass
#' @rdname plot.matchit
plot.matchit.subclass <- function(x, type = "qq", interactive = TRUE,
                                  which.xs = NULL, subclass, ...) {
  choice.menu <- function(choices, question) {
    k <- length(choices) - 1
    Choices <- data.frame(choices)
    row.names(Choices) <- 0:k
    names(Choices) <- "Choices"
    print.data.frame(Choices, right = FALSE)
    ans <- readline(question)
    while (!ans %in% 0:k) {
      message("Not valid -- please pick one of the choices")
      print.data.frame(Choices, right = FALSE)
      ans <- readline(question)
    }
    ans
  }

  chk::chk_string(type)
  type <- tolower(type)
  type <- match_arg(type, c("qq", "ecdf", "density", "jitter", "histogram"))

  if (type %in% c("qq", "ecdf", "density")) {
    #If subclass = T, index, or range, display all or range of subclasses, using interactive to advance
    #If subclass = F, display aggregate across subclass, using interactive to advance
    #If subclass = NULL, if interactive, use to choose subclass, else display aggregate across subclass
    subclasses <- levels(x$subclass)
    miss.sub <- missing(subclass) || is_null(subclass)

    if (miss.sub || isFALSE(subclass)) {
      which.subclass <- NULL
    } else if (isTRUE(subclass)) {
      which.subclass <- subclasses
    } else if (is.atomic(subclass) && all(subclass %in% seq_along(subclasses))) {
      which.subclass <-
subclasses[subclass] } else { .err("`subclass` should be `TRUE`, `FALSE`, or a vector of subclass indices for which subclass balance is to be displayed") } if (is_not_null(which.subclass)) { matchit.covplot.subclass(x, type = type, which.subclass = which.subclass, interactive = interactive, which.xs = which.xs, ...) } else if (interactive && miss.sub) { subclasses <- levels(x$subclass) choices <- c("No (Exit)", paste0("Yes: Subclass ", subclasses), "Yes: In aggregate") plot.name <- switch(type, "qq" = "quantile-quantile", "ecdf" = "empirical CDF", "density" = "density") question <- sprintf("Would you like to see %s plots of any subclasses? ", plot.name) ans <- -1 while (ans != 0) { ans <- as.numeric(choice.menu(choices, question)) if (ans %in% seq_along(subclasses) && any(x$subclass == subclasses[ans])) { matchit.covplot.subclass(x, type = type, which.subclass = subclasses[ans], interactive = interactive, which.xs = which.xs, ...) } else if (ans != 0) { matchit.covplot(x, type = type, interactive = interactive, which.xs = which.xs, ...) } } } else { matchit.covplot(x, type = type, interactive = interactive, which.xs = which.xs, ...) } } else if (type == "jitter") { if (is_null(x$distance)) { .err('`type = "jitter"` cannot be used when no distance variable was estimated or supplied') } jitter_pscore(x, interactive = interactive, ...) } else if (type == "histogram") { if (is_null(x$distance)) { .err('`type = "histogram"` cannot be used when no distance variable was estimated or supplied') } hist_pscore(x, ...) } invisible(x) } ## plot helper functions matchit.covplot <- function(object, type = "qq", interactive = TRUE, which.xs = NULL, data = NULL, ...) 
{ if (is_null(which.xs)) { if (is_null(object$X)) { .wrn("No covariates to plot") return(invisible(NULL)) } X <- object$X if (is_not_null(object$exact)) { Xexact <- model.frame(object$exact, data = object$X) X <- cbind(X, Xexact[setdiff(names(Xexact), names(X))]) } if (is_not_null(object$mahvars)) { Xmahvars <- model.frame(object$mahvars, data = object$X) X <- cbind(X, Xmahvars[setdiff(names(Xmahvars), names(X))]) } } else { if (is_not_null(data)) { if (!is.data.frame(data) || nrow(data) != length(object$treat)) { .err("`data` must be a data frame with as many rows as there are units in the supplied `matchit` object") } data <- cbind(data, object$X[setdiff(names(object$X), names(data))]) } else { data <- object$X } if (is_not_null(object$exact)) { Xexact <- model.frame(object$exact, data = object$X) data <- cbind(data, Xexact[setdiff(names(Xexact), names(data))]) } if (is_not_null(object$mahvars)) { Xmahvars <- model.frame(object$mahvars, data = object$X) data <- cbind(data, Xmahvars[setdiff(names(Xmahvars), names(data))]) } if (is.character(which.xs)) { if (!all(hasName(data, which.xs))) { .err("all variables in `which.xs` must be in the supplied `matchit` object or in `data`") } X <- data[which.xs] } else if (rlang::is_formula(which.xs)) { which.xs <- update(terms(which.xs, data = data), NULL ~ .) X <- model.frame(which.xs, data, na.action = "na.pass") } else { .err("`which.xs` must be supplied as a character vector of names or a one-sided formula") } # if (anyNA(X)) { # stop("Missing values are not allowed in the covariates named in `which.xs`.", # call. = FALSE) # } k <- ncol(X) for (i in seq_len(k)) { if (anyNA(X[[i]]) || (is.numeric(X[[i]]) && !all(is.finite(X[[i]])))) { covariates.with.missingness <- names(X)[i:k][vapply(i:k, function(j) anyNA(X[[j]]) || (is.numeric(X[[j]]) && !all(is.finite(X[[j]]))), logical(1L))] .err(paste0("Missing and non-finite values are not allowed in the covariates named in `which.xs`. 
Variables with missingness or non-finite values:\n\t", toString(covariates.with.missingness)), tidy = FALSE) } if (is.character(X[[i]])) { X[[i]] <- factor(X[[i]]) } } } X <- droplevels(X) t <- object$treat sw <- { if (is_null(object$s.weights)) rep.int(1, length(t)) else object$s.weights } w <- object$weights * sw if (is_null(w)) w <- rep.int(1, length(t)) w <- .make_sum_to_1(w, by = t) sw <- .make_sum_to_1(sw, by = t) if (type == "density") { varnames <- names(X) } else { X <- get_covs_matrix(data = X) varnames <- colnames(X) } .pardefault <- par(no.readonly = TRUE) on.exit(par(.pardefault)) oma <- c(2.25, 0, 3.75, 1.5) if (type == "qq") { opar <- par(mfrow = c(3, 3), mar = rep.int(.5, 4L), oma = oma) } else if (type %in% c("ecdf", "density")) { opar <- par(mfrow = c(3, 3), mar = c(1.5, .5, 1.5, .5), oma = oma) } for (i in seq_along(varnames)){ x <- if (type == "density") X[[i]] else X[, i] plot.new() if (((i - 1) %% 3) == 0) { if (type == "qq") { htext <- "eQQ Plots" mtext(htext, 3, 2, TRUE, 0.5, cex = 1.1, font = 2) mtext("All", 3, .25, TRUE, 0.5, cex = 1, font = 1) mtext("Matched", 3, .25, TRUE, 0.83, cex = 1, font = 1) mtext("Control Units", 1, 0, TRUE, 2 / 3, cex = 1, font = 1) mtext("Treated Units", 4, 0, TRUE, 0.5, cex = 1, font = 1) } else if (type == "ecdf") { htext <- "eCDF Plots" mtext(htext, 3, 2, TRUE, 0.5, cex = 1.1, font = 2) mtext("All", 3, .25, TRUE, 0.5, cex = 1, font = 1) mtext("Matched", 3, .25, TRUE, 0.83, cex = 1, font = 1) } else if (type == "density") { htext <- "Density Plots" mtext(htext, 3, 2, TRUE, 0.5, cex = 1.1, font = 2) mtext("All", 3, .25, TRUE, 0.5, cex = 1, font = 1) mtext("Matched", 3, .25, TRUE, 0.83, cex = 1, font = 1) } } par(usr = c(0, 1, 0, 1)) l.wid <- strwidth(varnames, "user") cex.labels <- max(0.75, min(1.45, 0.85 / max(l.wid))) text(0.5, 0.5, varnames[i], cex = cex.labels) if (type == "qq") { qqplot_match(x = x, t = t, w = w, sw = sw, ...) } else if (type == "ecdf") { ecdfplot_match(x = x, t = t, w = w, sw = sw, ...) 
} else if (type == "density") { densityplot_match(x = x, t = t, w = w, sw = sw, ...) } devAskNewPage(ask = interactive) } devAskNewPage(ask = FALSE) invisible(NULL) } matchit.covplot.subclass <- function(object, type = "qq", which.subclass = NULL, interactive = TRUE, which.xs = NULL, data = NULL, ...) { if (is_null(which.xs)) { if (is_null(object$X)) { .wrn("no covariates to plot") return(invisible(NULL)) } X <- object$X if (is_not_null(object$exact)) { Xexact <- model.frame(object$exact, data = object$X) X <- cbind(X, Xexact[setdiff(names(Xexact), names(X))]) } if (is_not_null(object$mahvars)) { Xmahvars <- model.frame(object$mahvars, data = object$X) X <- cbind(X, Xmahvars[setdiff(names(Xmahvars), names(X))]) } } else { if (is_not_null(data)) { if (!is.data.frame(data) || nrow(data) != length(object$treat)) { .err("`data` must be a data frame with as many rows as there are units in the supplied `matchit` object") } data <- cbind(data, object$X[setdiff(names(object$X), names(data))]) } else { data <- object$X } if (is_not_null(object$exact)) { Xexact <- model.frame(object$exact, data = object$X) data <- cbind(data, Xexact[setdiff(names(Xexact), names(data))]) } if (is_not_null(object$mahvars)) { Xmahvars <- model.frame(object$mahvars, data = object$X) data <- cbind(data, Xmahvars[setdiff(names(Xmahvars), names(data))]) } if (is.character(which.xs)) { if (!all(hasName(data, which.xs))) { .err("all variables in `which.xs` must be in the supplied `matchit` object or in `data`") } X <- data[which.xs] } else if (rlang::is_formula(which.xs)) { which.xs <- update(terms(which.xs, data = data), NULL ~ .) 
X <- model.frame(which.xs, data, na.action = "na.pass") chk::chk_not_any_na(X, "the covariates named in `which.xs`") } else { .err("`which.xs` must be supplied as a character vector of names or a one-sided formula") } } chars.in.X <- vapply(X, is.character, logical(1L)) for (i in which(chars.in.X)) { X[[i]] <- factor(X[[i]]) } X <- droplevels(X) t <- object$treat if (!is.atomic(which.subclass)) { .err("the argument to `subclass` must be `NULL` or the indices of the subclasses for which to display covariate distributions") } if (!all(which.subclass %in% object$subclass[!is.na(object$subclass)])) { .err("the argument supplied to `subclass` is not the index of any subclass in the `matchit` object") } if (type == "density") { varnames <- names(X) } else { X <- get_covs_matrix(data = X) varnames <- colnames(X) } .pardefault <- par(no.readonly = TRUE) on.exit(par(.pardefault)) oma <- c(2.25, 0, 3.75, 1.5) for (s in which.subclass) { if (type == "qq") { opar <- par(mfrow = c(3, 3), mar = rep.int(.5, 4), oma = oma) } else if (type %in% c("ecdf", "density")) { opar <- par(mfrow = c(3, 3), mar = c(1.5, .5, 1.5, .5), oma = oma) } sw <- { if (is_null(object$s.weights)) rep.int(1, length(t)) else object$s.weights } w <- sw * (!is.na(object$subclass) & object$subclass == s) w <- .make_sum_to_1(w, by = t) sw <- .make_sum_to_1(sw, by = t) for (i in seq_along(varnames)){ x <- switch(type, "density" = X[[i]], X[, i]) plot.new() if (((i - 1) %% 3) == 0) { if (type == "qq") { htext <- sprintf("eQQ Plots (Subclass %s)", s) mtext(htext, 3, 2, TRUE, 0.5, cex = 1.1, font = 2) mtext("All", 3, .25, TRUE, 0.5, cex = 1, font = 1) mtext("Matched", 3, .25, TRUE, 0.83, cex = 1, font = 1) mtext("Control Units", 1, 0, TRUE, 2 / 3, cex = 1, font = 1) mtext("Treated Units", 4, 0, TRUE, 0.5, cex = 1, font = 1) } else if (type == "ecdf") { htext <- sprintf("eCDF Plots (Subclass %s)", s) mtext(htext, 3, 2, TRUE, 0.5, cex = 1.1, font = 2) mtext("All", 3, .25, TRUE, 0.5, cex = 1, font = 1) 
mtext("Matched", 3, .25, TRUE, 0.83, cex = 1, font = 1) } else if (type == "density") { htext <- sprintf("Density Plots (Subclass %s)", s) mtext(htext, 3, 2, TRUE, 0.5, cex = 1.1, font = 2) mtext("All", 3, .25, TRUE, 0.5, cex = 1, font = 1) mtext("Matched", 3, .25, TRUE, 0.83, cex = 1, font = 1) } } #Empty plot with variable name par(usr = c(0, 1, 0, 1)) l.wid <- strwidth(varnames, "user") cex.labels <- max(0.75, min(1.45, 0.85 / max(l.wid))) text(0.5, 0.5, varnames[i], cex = cex.labels) if (type == "qq") { qqplot_match(x = x, t = t, w = w, sw = sw, ...) } else if (type == "ecdf") { ecdfplot_match(x = x, t = t, w = w, sw = sw, ...) } else if (type == "density") { densityplot_match(x = x, t = t, w = w, sw = sw, ...) } devAskNewPage(ask = interactive) } } devAskNewPage(ask = FALSE) invisible(NULL) } qqplot_match <- function(x, t, w, sw, discrete.cutoff = 5, ...) { ord <- order(x) x_ord <- x[ord] t_ord <- t[ord] u <- unique(x_ord) #Need to interpolate larger group to be same size as smaller group #Unmatched sample sw_ord <- sw[ord] sw1 <- sw_ord[t_ord == 1] sw0 <- sw_ord[t_ord != 1] x1 <- x_ord[t_ord == 1][sw1 > 0] x0 <- x_ord[t_ord != 1][sw0 > 0] sw1 <- sw1[sw1 > 0] sw0 <- sw0[sw0 > 0] swn1 <- length(sw1) swn0 <- length(sw0) if (swn1 < swn0) { if (length(u) <= discrete.cutoff) { x0probs <- vapply(u, function(u_) wm(x0 == u_, sw0), numeric(1L)) x0cumprobs <- c(0, .cumsum_prob(x0probs)) x0 <- u[findInterval(.cumsum_prob(sw1), x0cumprobs, rightmost.closed = TRUE)] } else { x0 <- approx(.cumsum_prob(sw0), y = x0, xout = .cumsum_prob(sw1), rule = 2, method = "constant", ties = "ordered")$y } } else if (swn1 > swn0) { if (length(u) <= discrete.cutoff) { x1probs <- vapply(u, function(u_) wm(x1 == u_, sw1), numeric(1L)) x1cumprobs <- c(0, .cumsum_prob(x1probs)) x1 <- u[findInterval(.cumsum_prob(sw0), x1cumprobs, rightmost.closed = TRUE)] } else { x1 <- approx(.cumsum_prob(sw1), y = x1, xout = .cumsum_prob(sw0), rule = 2, method = "constant", ties = "ordered")$y } } if 
(length(u) <= discrete.cutoff) { md <- min(diff1(u)) x0 <- jitter(x0, amount = .1 * md) x1 <- jitter(x1, amount = .1 * md) } rr <- range(c(x0, x1)) plot(x0, x1, xlab = "", ylab = "", xlim = rr, ylim = rr, axes = FALSE, ...) abline(a = 0, b = 1) abline(a = (rr[2L] - rr[1L]) * 0.1, b = 1, lty = 2) abline(a = (rr[1L] - rr[2L]) * 0.1, b = 1, lty = 2) axis(2) box() #Matched sample w_ord <- w[ord] w1 <- w_ord[t_ord == 1] w0 <- w_ord[t_ord != 1] x1 <- x_ord[t_ord == 1][w1 > 0] x0 <- x_ord[t_ord != 1][w0 > 0] w1 <- w1[w1 > 0] w0 <- w0[w0 > 0] wn1 <- length(w1) wn0 <- length(w0) if (wn1 < wn0) { if (length(u) <= discrete.cutoff) { x0probs <- vapply(u, function(u_) wm(x0 == u_, w0), numeric(1L)) x0cumprobs <- c(0, .cumsum_prob(x0probs)) x0 <- u[findInterval(.cumsum_prob(w1), x0cumprobs, rightmost.closed = TRUE)] } else { x0 <- approx(.cumsum_prob(w0), y = x0, xout = .cumsum_prob(w1), rule = 2, method = "constant", ties = "ordered")$y } } else if (wn1 > wn0) { if (length(u) <= discrete.cutoff) { x1probs <- vapply(u, function(u_) wm(x1 == u_, w1), numeric(1L)) x1cumprobs <- c(0, .cumsum_prob(x1probs)) x1 <- u[findInterval(.cumsum_prob(w0), x1cumprobs, rightmost.closed = TRUE)] } else { x1 <- approx(.cumsum_prob(w1), y = x1, xout = .cumsum_prob(w0), rule = 2, method = "constant", ties = "ordered")$y } } if (length(u) <= discrete.cutoff) { md <- min(diff1(u)) x0 <- jitter(x0, amount = .1 * md) x1 <- jitter(x1, amount = .1 * md) } plot(x0, x1, xlab = "", ylab = "", xlim = rr, ylim = rr, axes = FALSE, ...) abline(a = 0, b = 1) abline(a = (rr[2L] - rr[1L]) * 0.1, b = 1, lty = 2) abline(a = (rr[1L] - rr[2L]) * 0.1, b = 1, lty = 2) box() } ecdfplot_match <- function(x, t, w, sw, ...) { ord <- order(x) x.min <- x[ord][1] x.max <- x[ord][length(x)] x.range <- x.max - x.min #Unmatched samples plot(x = x, y = w, type = "n", xlim = c(x.min - .02 * x.range, x.max + .02 * x.range), ylim = c(0, 1), axes = TRUE, ...) 
for (tr in 0:1) { in.tr <- t[ord] == tr ordt <- ord[in.tr] cswt <- c(0, .cumsum_prob(sw[ordt]), 1) xt <- c(x.min - .02 * x.range, x[ordt], x.max + .02 * x.range) lines(x = xt, y = cswt, type = "s", col = if (tr == 0) "grey60" else "black") } abline(h = 0:1) box() #Matched sample plot(x = x, y = w, type = "n", xlim = c(x.min - .02 * x.range, x.max + .02 * x.range), ylim = c(0, 1), axes = FALSE, ...) for (tr in 0:1) { in.tr <- t[ord] == tr ordt <- ord[in.tr] cwt <- c(0, .cumsum_prob(w[ordt]), 1) xt <- c(x.min - .02 * x.range, x[ordt], x.max + .02 * x.range) lines(x = xt, y = cwt, type = "s", col = if (tr == 0) "grey60" else "black") } abline(h = 0:1) axis(1) box() } densityplot_match <- function(x, t, w, sw, bw = NULL, cut = 3, ...) { if (has_n_unique(x, 2L)) { x <- factor(x, nmax = 2) } u <- unique(t) if (is.factor(x)) { #Bar plot for binary variable x_t_un <- lapply(sort(unique(t)), function(t_) { vapply(levels(x), function(i) { wm(x[t == t_] == i, sw[t == t_]) }, numeric(1L))}) x_t_m <- lapply(sort(unique(t)), function(t_) { vapply(levels(x), function(i) { wm(x[t == t_] == i, w[t == t_]) }, numeric(1L))}) ylim <- c(0, 1.1 * max(unlist(x_t_un), unlist(x_t_m))) borders <- c("grey60", "black") for (i in seq_along(x_t_un)) { barplot(x_t_un[[i]], border = borders[i], col = if (i == 1) "white" else NA_character_, ylim = ylim, add = i != 1L) } abline(h = 0:1) box() for (i in seq_along(x_t_m)) { barplot(x_t_m[[i]], border = borders[i], col = if (i == 1) "white" else NA_character_, ylim = ylim, add = i != 1L, axes = FALSE) } abline(h = 0:1) box() } else { #Density plot for continuous variable small.tr <- u[which.min(vapply(u, function(tr) sum(t == tr), numeric(1L)))] x_small <- x[t == small.tr] x.min <- min(x) x.max <- max(x) if (is_null(bw)) { bw <- bw.nrd0(x_small) } else if (is.character(bw)) { bw <- tolower(bw) bw <- match_arg(bw, c("nrd0", "nrd", "ucv", "bcv", "sj", "sj-ste", "sj-dpi")) bw <- switch(bw, nrd0 = bw.nrd0(x_small), nrd = bw.nrd(x_small), ucv = 
bw.ucv(x_small), bcv = bw.bcv(x_small), sj = , `sj-ste` = bw.SJ(x_small, method = "ste"), `sj-dpi` = bw.SJ(x_small, method = "dpi")) } d_unmatched <- do.call("rbind", lapply(u, function(tr) { cbind(as.data.frame(density(x[t == tr], weights = sw[t == tr], from = x.min - cut * bw, to = x.max + cut * bw, bw = bw, cut = cut, ...)[1:2]), t = tr) })) d_matched <- do.call("rbind", lapply(u, function(tr) { cbind(as.data.frame(density(x[t == tr], weights = w[t == tr], from = x.min - cut * bw, to = x.max + cut * bw, bw = bw, cut = cut, ...)[1:2]), t = tr) })) y.max <- max(d_unmatched$y, d_matched$y) #Unmatched samples plot(x = d_unmatched$x, y = d_unmatched$y, type = "n", xlim = c(x.min - cut * bw, x.max + cut * bw), ylim = c(0, 1.1 * y.max), axes = TRUE, ...) for (tr in 0:1) { in.tr <- d_unmatched$t == tr xt <- d_unmatched$x[in.tr] yt <- d_unmatched$y[in.tr] lines(x = xt, y = yt, type = "l", col = if (tr == 0) "grey60" else "black") } abline(h = 0) box() #Matched sample plot(x = d_matched$x, y = d_matched$y, type = "n", xlim = c(x.min - cut * bw, x.max + cut * bw), ylim = c(0, 1.1 * y.max), axes = FALSE, ...) for (tr in 0:1) { in.tr <- d_matched$t == tr xt <- d_matched$x[in.tr] yt <- d_matched$y[in.tr] lines(x = xt, y = yt, type = "l", col = if (tr == 0) "grey60" else "black") } abline(h = 0) axis(1) box() } } hist_pscore <- function(x, xlab = "Propensity Score", freq = FALSE, ...) 
{ .pardefault <- par(no.readonly = TRUE) on.exit(par(.pardefault)) treat <- x$treat pscore <- x$distance[!is.na(x$distance)] s.weights <- { if (is_null(x$s.weights)) rep.int(1, length(treat)) else x$s.weights } weights <- x$weights * s.weights q.cut <- x$q.cut minp <- min(pscore) maxp <- max(pscore) if (freq) { weights <- .make_sum_to_n(weights, by = treat) s.weights <- .make_sum_to_n(s.weights, by = treat) } else { weights <- .make_sum_to_1(weights, by = treat) s.weights <- .make_sum_to_1(s.weights, by = treat) } ylab <- if (freq) "Count" else "Proportion" par(mfrow = c(2, 2)) # breaks <- pretty(na.omit(pscore), 10) breaks <- seq(minp, maxp, length = 11) xlim <- range(breaks) for (n in c("Raw Treated", "Matched Treated", "Raw Control", "Matched Control")) { w <- if (startsWith(n, "Raw")) s.weights else weights tr <- if (endsWith(n, "Treated")) 1 else 0 #Create histogram using weights #Manually assign density, which is used as height of the bars. The scaling #of the weights above determine whether they are "counts" or "proportions". #Regardless, set freq = FALSE in plot() to ensure density is used for bar #height rather than count. pm <- hist(pscore[treat == tr], plot = FALSE, breaks = breaks) pm[["density"]] <- vapply(seq_len(length(pm$breaks) - 1L), function(i) { sum(w[treat == tr & pscore >= pm$breaks[i] & pscore < pm$breaks[i + 1L]]) }, numeric(1L)) plot(pm, xlim = xlim, xlab = xlab, main = n, ylab = ylab, freq = FALSE, col = "lightgray", ...) if (!startsWith(n, "Raw") && is_not_null(q.cut)) { abline(v = q.cut, lty = 2) } } } jitter_pscore <- function(x, interactive, pch = 1, ...) 
{ .pardefault <- par(no.readonly = TRUE) on.exit(par(.pardefault)) treat <- x$treat pscore <- x$distance s.weights <- if (is_null(x$s.weights)) rep.int(1, length(treat)) else x$s.weights weights <- x$weights * s.weights matched <- weights > 0 q.cut <- x$q.cut jitp <- jitter(rep.int(1, length(treat)), factor = 6) + (treat == 1) * (weights == 0) - (treat == 0) - (weights==0) * (treat == 0) cswt <- sqrt(s.weights) cwt <- sqrt(weights) minp <- min(pscore, na.rm = TRUE) maxp <- max(pscore, na.rm = TRUE) plot(pscore, xlim = c(minp - 0.05 * (maxp-minp), maxp + 0.05 * (maxp - minp)), ylim = c(-1.5, 2.5), type = "n", ylab = "", xlab = "Propensity Score", axes = FALSE, main = "Distribution of Propensity Scores", ...) if (is_not_null(q.cut)) { abline(v = q.cut, col = "grey", lty = 1) } #Matched treated points(pscore[treat == 1 & matched], jitp[treat == 1 & matched], pch = pch, cex = cwt[treat == 1 & matched], ...) #Matched control points(pscore[treat == 0 & matched], jitp[treat == 0 & matched], pch = pch, cex = cwt[treat == 0 & matched], ...) #Unmatched treated points(pscore[treat == 1 & !matched], jitp[treat == 1 & !matched], pch = pch, cex = cswt[treat == 1 & !matched], ...) #Unmatched control points(pscore[treat == 0 & !matched], jitp[treat == 0 & !matched], pch = pch, cex = cswt[treat == 0 & !matched], ...) 
  axis(1)

  center <- mean(par("usr")[1:2])
  text(center, 2.5, "Unmatched Treated Units", adj = .5)
  text(center, 1.5, "Matched Treated Units", adj = .5)
  text(center, 0.5, "Matched Control Units", adj = .5)
  text(center, -0.5, "Unmatched Control Units", adj = .5)
  box()

  if (interactive) {
    cat("To identify the units, use first mouse button; to stop, use second.\n")
    identify(pscore, jitp, names(treat), atpen = TRUE)
  }
}
MatchIt/R/add_s.weights.R
#' Add sampling weights to a `matchit` object
#'
#' @description
#' Adds sampling weights to a `matchit` object so that they are
#' incorporated into balance assessment and creation of the weights. This would
#' typically only be used when an argument to `s.weights` was not supplied
#' to [matchit()] (i.e., because they were not to be included in the estimation
#' of the propensity score) but sampling weights are required for generalizing
#' an effect to the correct population. Without adding sampling weights to the
#' `matchit` object, balance assessment tools (i.e., [summary.matchit()]
#' and [plot.matchit()]) will not calculate balance statistics correctly, and
#' the weights produced by [match_data()] and [get_matches()] will not
#' incorporate the sampling weights.
#'
#' @param m a `matchit` object; the output of a call to [matchit()],
#' typically with the `s.weights` argument unspecified.
#' @param s.weights a numeric vector of sampling weights to be added to the
#' `matchit` object. Can also be specified as a string containing the name
#' of a variable in `data` to be used or a one-sided formula with the
#' variable on the right-hand side (e.g., `~ SW`).
#' @param data a data frame containing the sampling weights if given as a
#' string or formula. If unspecified, `add_s.weights()` will attempt to find
#' the dataset using the environment of the `matchit` object.
#' #' @return a `matchit` object with an `s.weights` component #' containing the supplied sampling weights. If `s.weights = NULL`, the original #' `matchit` object is returned. #' #' @author Noah Greifer #' #' @seealso [matchit()]; [match_data()] #' #' @examples #' #' data("lalonde") #' #' # Generate random sampling weights, just #' # for this example #' sw <- rchisq(nrow(lalonde), 2) #' #' # NN PS match using logistic regression PS that doesn't #' # include sampling weights #' m.out <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde) #' #' m.out #' #' # Add s.weights to the matchit object #' m.out <- add_s.weights(m.out, sw) #' #' m.out #note additional output #' #' # Check balance; note that sample sizes incorporate #' # s.weights #' summary(m.out, improvement = FALSE) #' #' @export add_s.weights <- function(m, s.weights = NULL, data = NULL) { chk::chk_is(m, "matchit") if (is_null(s.weights)) { return(m) } if (!is.numeric(s.weights)) { if (is_null(data)) { if (is_not_null(m$model)) { env <- attributes(terms(m$model))$.Environment } else { env <- parent.frame() } data <- eval(m$call$data, envir = env) if (is_null(data)) { .err("a dataset could not be found. 
Please supply an argument to `data` containing the original dataset used in the matching")
      }
    } else {
      if (!is.data.frame(data)) {
        if (!is.matrix(data)) {
          .err("`data` must be a data frame")
        }
        data <- as.data.frame.matrix(data)
      }
      if (nrow(data) != length(m$treat)) {
        .err("`data` must have as many rows as there were units in the original call to `matchit()`")
      }
    }

    if (is.character(s.weights)) {
      if (is_null(data) || !is.data.frame(data)) {
        .err("if `s.weights` is specified as a string, a data frame containing the named variable must be supplied to `data`")
      }
      if (!all(hasName(data, s.weights))) {
        .err("the name supplied to `s.weights` must be a variable in `data`")
      }
      s.weights.form <- reformulate(s.weights)
      s.weights <- model.frame(s.weights.form, data, na.action = "na.pass")
      if (ncol(s.weights) != 1L) {
        .err("`s.weights` can only contain one named variable")
      }
      s.weights <- s.weights[[1L]]
    } else if (rlang::is_formula(s.weights)) {
      s.weights.form <- update(terms(s.weights, data = data), NULL ~ .)
      s.weights <- model.frame(s.weights.form, data, na.action = "na.pass")
      if (ncol(s.weights) != 1L) {
        .err("`s.weights` can only contain one named variable")
      }
      s.weights <- s.weights[[1L]]
    } else {
      .err("`s.weights` must be supplied as a numeric vector, string, or one-sided formula")
    }
  }

  chk::chk_not_any_na(s.weights)

  if (length(s.weights) != length(m$treat)) {
    .err("`s.weights` must be the same length as the treatment vector")
  }

  names(s.weights) <- names(m$treat)

  attr(s.weights, "in_ps") <- isTRUE(all.equal(s.weights, m$s.weights))

  m$s.weights <- s.weights

  m$nn <- nn(m$treat, m$weights, m$discarded, s.weights)

  m
}
MatchIt/R/discard.R
discard <- function(treat, pscore = NULL, option = NULL) {
  n.obs <- length(treat)

  if (is_null(option)) {
    # keep all units
    return(rep_with(FALSE, treat))
  }

  if (is.logical(option) && length(option) == n.obs && !anyNA(option)) {
    # user input
    return(setNames(option, names(treat)))
  }

  if (!chk::vld_string(option)) {
    .err('`discard` must be "none", "both", "control", "treated" or a logical vector of observations to discard')
  }

  option <- match_arg(option, c("none", "both", "control", "treated"))

  if (option == "none") {
    # keep all units
    return(rep_with(FALSE, treat))
  }

  if (is_null(pscore)) {
    .err('`discard` must be a logical vector or "none" in the absence of a propensity score')
  }

  if (is.matrix(pscore)) {
    .err('`discard` must be a logical vector or "none" when `distance` is supplied as a matrix')
  }

  pmax0 <- max(pscore[treat == 0])
  pmax1 <- max(pscore[treat == 1])
  pmin0 <- min(pscore[treat == 0])
  pmin1 <- min(pscore[treat == 1])

  if (option == "both") # discard units outside of common support
    discarded <- (pscore < max(pmin0, pmin1) | pscore > min(pmax0, pmax1))
  else if (option == "control") # discard control units only
    discarded <- (pscore < pmin1 | pscore > pmax1)
  else if (option == "treated") # discard treated units only
    discarded <- (pscore < pmin0 | pscore > pmax0)

  # NOTE: WhatIf package has been removed from CRAN, so hull options won't work
  # else if (option %in% c("hull.control", "hull.treat", "hull.both")) {
  #   ## convex hull stuff
  #   check.package("WhatIf")
  #   X <- model.matrix(reformulate(names(covs), intercept = FALSE), data = covs,
  #                     contrasts.arg = lapply(Filter(is.factor, covs),
  #                                            function(x) contrasts(x, contrasts = nlevels(x) == 1)))
  #   discarded <- rep.int(FALSE, n.obs)
  #   if (option == "hull.control") { # discard units not in T convex hull
  #     wif <- WhatIf::whatif(cfact = X[treat==0,], data = X[treat==1,])
  #     discarded[treat==0] <- !wif$in.hull
  #   } else if (option == "hull.treat") {
  #     wif <- WhatIf::whatif(cfact = X[treat==1,], data = X[treat==0,])
  #     discarded[treat==1] <- !wif$in.hull
  #   } else if (option == "hull.both") { # discard units not in T&C convex hull
  #     wif <- WhatIf::whatif(cfact = cbind(1-treat, X), data = cbind(treat, X))
  #     discarded <- !wif$in.hull
  #   }
  # }

  setNames(discarded, names(treat))
}
MatchIt/R/zzz.R
#Used to load backports functions. No need to touch, but must always be included somewhere.
.onLoad <- function(libname, pkgname) {
  backports::import(pkgname)
}
MatchIt/R/plot.summary.matchit.R
#' Generate a Love Plot of Standardized Mean Differences
#'
#' Generates a Love plot, which is a dot plot with variable names on the y-axis
#' and standardized mean differences on the x-axis. Each point represents the
#' standardized mean difference of the corresponding covariate in the matched
#' or unmatched sample. Love plots are a simple way to display covariate
#' balance before and after matching. The plots are generated using
#' [dotchart()] and [points()].
#'
#' @param x a `summary.matchit` object; the output of a call to
#' [summary.matchit()]. The `standardize` argument must be set to
#' `TRUE` (which is the default) in the call to `summary`.
#' @param abs `logical`; whether the standardized mean differences should
#' be displayed in absolute value (`TRUE`, default) or not (`FALSE`).
#' @param var.order how the variables should be ordered. Allowable options
#' include `"data"`, ordering the variables as they appear in the
#' `summary` output; `"unmatched"`, ordering the variables based on
#' their standardized mean differences before matching; `"matched"`,
#' ordering the variables based on their standardized mean differences after
#' matching; and `"alphabetical"`, ordering the variables alphabetically.
#' Default is `"data"`. Abbreviations allowed.
#' @param threshold numeric values at which to place vertical lines indicating
#' a balance threshold. These can make it easier to see for which variables
#' balance has been achieved given a threshold. Multiple values can be supplied
#' to add multiple lines. When `abs = FALSE`, the lines will be displayed
#' on both sides of zero. The lines are drawn with `abline` with the
#' linetype (`lty`) argument corresponding to the order of the entered
#' variables (see options at [par()]). The default is `c(.1, .05)` for a
#' solid line (`lty = 1`) at .1 and a dashed line (`lty = 2`) at .05,
#' indicating acceptable and good balance, respectively. Enter a value as
#' `NA` to skip that value of `lty` (e.g., `c(NA, .05)` to have
#' only a dashed vertical line at .05).
#' @param position the position of the legend. Should be one of the allowed
#' keyword options supplied to `x` in [legend()] (e.g., `"right"`,
#' `"bottomright"`, etc.). Default is `"bottomright"`. Set to
#' `NULL` for no legend to be included. Note that the legend will cover up
#' points if you are not careful; setting `var.order` appropriately can
#' help in avoiding this.
#' @param \dots ignored.
#'
#' @return A plot is displayed, and `x` is invisibly returned.
#'
#' @details
#' For matching methods other than subclassification,
#' `plot.summary.matchit` uses `x$sum.all[,"Std.
Mean Diff."]` and #' `x$sum.matched[,"Std. Mean Diff."]` as the x-axis values. For #' subclassification, in addition to points for the unadjusted and aggregate #' subclass balance, numerals representing balance in individual subclasses are #' plotted if `subclass = TRUE` in the call to `summary`. Aggregate #' subclass standardized mean differences are taken from #' `x$sum.across[,"Std. Mean Diff."]` and the subclass-specific mean #' differences are taken from `x$sum.subclass`. #' #' @author Noah Greifer #' #' @seealso [summary.matchit()], [dotchart()] #' #' \pkgfun{cobalt}{love.plot} is a more flexible and sophisticated function to make #' Love plots and is also natively compatible with `matchit` objects. #' #' @examples #' #' data("lalonde") #' m.out <- matchit(treat ~ age + educ + married + #' race + re74, #' data = lalonde, #' method = "nearest") #' plot(summary(m.out, interactions = TRUE), #' var.order = "unmatched") #' #' s.out <- matchit(treat ~ age + educ + married + #' race + nodegree + re74 + re75, #' data = lalonde, #' method = "subclass") #' plot(summary(s.out, subclass = TRUE), #' var.order = "unmatched", #' abs = FALSE) #' #' @exportS3Method plot summary.matchit plot.summary.matchit <- function(x, abs = TRUE, var.order = "data", threshold = c(.1, .05), position = "bottomright", ...) { .pardefault <- par(no.readonly = TRUE) on.exit(par(.pardefault)) sub <- inherits(x, "summary.matchit.subclass") matched <- sub || is_not_null(x[["sum.matched"]]) un <- is_not_null(x[["sum.all"]]) standard.sum <- { if (un) x[["sum.all"]] else if (sub) x[["sum.across"]] else x[["sum.matched"]] } if (!"Std. Mean Diff." %in% colnames(standard.sum)) { .err("not appropriate for unstandardized summary. Run `summary()` with the `standardize = TRUE` option, and then plot") } if (un) { sd.all <- x[["sum.all"]][, "Std. Mean Diff."] } if (matched) { sd.matched <- x[[if (sub) "sum.across" else "sum.matched"]][, "Std. 
Mean Diff."]
  }

  chk::chk_flag(abs)

  var.names <- rownames(standard.sum)

  chk::chk_string(var.order)
  var.order <- match_arg(var.order, c("data", "matched", "unmatched", "alphabetical"))

  if (!un && var.order == "unmatched") {
    .err('`var.order` cannot be "unmatched" if `un = FALSE` in the call to `summary()`')
  }

  if (!matched && var.order == "matched") {
    .err('`var.order` cannot be "matched" if `method = NULL` in the original call to `matchit()`')
  }

  if (abs) {
    if (un) sd.all <- abs(sd.all)
    if (matched) sd.matched <- abs(sd.matched)
    xlab <- "Absolute Standardized\nMean Difference"
  }
  else {
    xlab <- "Standardized Mean Difference"
  }

  ord <- switch(var.order,
                "data" = rev(seq_along(var.names)),
                "matched" = order(sd.matched),
                "unmatched" = order(sd.all),
                "alphabetical" = order(var.names, decreasing = TRUE))

  dotchart(if (un) sd.all[ord] else sd.matched[ord],
           labels = var.names[ord], xlab = xlab,
           bg = NA, color = NA, ...)
  abline(v = 0)

  if (sub && is_not_null(x$sum.subclass)) {
    for (i in seq_along(x$sum.subclass)) {
      sd.sub <- x$sum.subclass[[i]][, "Std.
Mean Diff."] if (abs) sd.sub <- abs(sd.sub) points(x = sd.sub[ord], y = seq_along(sd.sub), pch = as.character(i), col = "gray60", cex = .6) } } if (un) { points(x = sd.all[ord], y = seq_along(sd.all), pch = 21, bg = "white", col = "black") } if (matched) { points(x = sd.matched[ord], y = seq_along(sd.matched), pch = 21, bg = "black", col = "black") } if (is_not_null(threshold)) { if (abs) { abline(v = threshold, lty = seq_along(threshold)) } else { abline(v = threshold, lty = seq_along(threshold)) abline(v = -threshold, lty = seq_along(threshold)) } } if (sum(matched, un) > 1 && is_not_null(position)) { position <- match_arg(position, c("bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center")) legend(position, legend = c("All", "Matched"), pt.bg = c("white", "black"), pch = 21, inset = .015, xpd = TRUE) } invisible(x) } MatchIt/R/get_weights_from_mm.R0000644000176200001440000000100414731642347016162 0ustar liggesusersget_weights_from_mm <- function(match.matrix, treat, focal = NULL) { if (!is.integer(match.matrix)) { match.matrix <- charmm2nummm(match.matrix, treat) } weights <- weights_matrixC(match.matrix, treat, focal) if (all_equal_to(weights, 0)) { .err("no units were matched") } if (all_equal_to(weights[treat == 1], 0)) { .err("no treated units were matched") } if (all_equal_to(weights[treat == 0], 0)) { .err("no control units were matched") } setNames(weights, names(treat)) }MatchIt/R/matchit2quick.R0000644000176200001440000002670314763226441014720 0ustar liggesusers#' Fast Generalized Full Matching #' @name method_quick #' @aliases method_quick #' @usage NULL #' #' @description #' In [matchit()], setting `method = "quick"` performs generalized full #' matching, which is a form of subclassification wherein all units, both #' treatment and control (i.e., the "full" sample), are assigned to a subclass #' and receive at least one match. 
It uses an algorithm that is extremely fast #' compared to optimal full matching, which is why it is labeled as "quick", at the #' expense of true optimality. The method is described in Sävje, Higgins, & Sekhon (2021). The method relies on and is a wrapper #' for \pkgfun{quickmatch}{quickmatch}. #' #' Advantages of generalized full matching include that the matching order is not #' required to be specified, units do not need to be discarded, and it is less #' likely that extreme within-subclass distances will be large, unlike with #' standard subclassification. The primary output of generalized full matching is a set of #' matching weights that can be applied to the matched sample; in this way, #' generalized full matching can be seen as a robust alternative to propensity score #' weighting, robust in the sense that the propensity score model does not need #' to be correct to estimate the treatment effect without bias. #' #' This page details the allowable arguments with `method = "quick"`. #' See [matchit()] for an explanation of what each argument means in a general #' context and how it can be specified. #' #' Below is how `matchit()` is used for generalized full matching: #' \preformatted{ #' matchit(formula, #' data = NULL, #' method = "quick", #' distance = "glm", #' link = "logit", #' distance.options = list(), #' estimand = "ATT", #' exact = NULL, #' mahvars = NULL, #' discard = "none", #' reestimate = FALSE, #' s.weights = NULL, #' caliper = NULL, #' std.caliper = TRUE, #' verbose = FALSE, #' ...) #' } #' #' @param formula a two-sided [formula] object containing the treatment and #' covariates to be used in creating the distance measure used in the matching. #' This formula will be supplied to the functions that estimate the distance #' measure. #' @param data a data frame containing the variables named in `formula`. #' If not found in `data`, the variables will be sought in the #' environment. #' @param method set here to `"quick"`. 
#' @param distance the distance measure to be used. See [`distance`] #' for allowable options. Cannot be supplied as a matrix. #' @param link when `distance` is specified as a method of estimating #' propensity scores, an additional argument controlling the link function used #' in estimating the distance measure. See [`distance`] for allowable #' options with each option. #' @param distance.options a named list containing additional arguments #' supplied to the function that estimates the distance measure as determined #' by the argument to `distance`. #' @param estimand a string containing the desired estimand. Allowable options #' include `"ATT"`, `"ATC"`, and `"ATE"`. The estimand controls #' how the weights are computed; see the Computing Weights section at #' [matchit()] for details. #' @param exact for which variables exact matching should take place. #' @param mahvars for which variables Mahalanobis distance matching should take #' place when `distance` corresponds to a propensity score (e.g., to discard units for common support). If specified, the #' distance measure will not be used in matching. #' @param discard a string containing a method for discarding units outside a #' region of common support. Only allowed when `distance` corresponds to a #' propensity score. #' @param reestimate if `discard` is not `"none"`, whether to #' re-estimate the propensity score in the remaining sample prior to matching. #' @param s.weights the variable containing sampling weights to be incorporated #' into propensity score models and balance statistics. #' @param caliper the width of the caliper used for caliper matching. A caliper can only be placed on the propensity score and cannot be negative. #' @param std.caliper `logical`; when a caliper is specified, whether it #' is in standard deviation units (`TRUE`) or raw units (`FALSE`). #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. 
#' @param \dots additional arguments passed to \pkgfun{quickmatch}{quickmatch}. Allowed arguments include `treatment_constraints`, `size_constraint`, `target`, and other arguments passed to `scclust::sc_clustering()` (see \pkgfun{quickmatch}{quickmatch} for details). In particular, changing `seed_method` from its default can improve performance. #' No arguments will be passed to `distances::distances()`. #' #' The arguments `replace`, `ratio`, `min.controls`, `max.controls`, `m.order`, and `antiexact` are ignored with a warning. #' #' @section Outputs: #' #' All outputs described in [matchit()] are returned with #' `method = "quick"` except for `match.matrix`. This is because #' matching strata are not indexed by treated units as they are in some other #' forms of matching. When `include.obj = TRUE` in the call to #' `matchit()`, the output of the call to \pkgfun{quickmatch}{quickmatch} will be #' included in the output. When `exact` is specified, this will be a list #' of such objects, one for each stratum of the `exact` variables. #' #' @details #' #' Generalized full matching is similar to optimal full matching, but has some additional flexibility that can be controlled by some of the extra arguments available. By default, `method = "quick"` performs a standard full match in which all units are matched (unless restricted by the caliper) and assigned to a subclass. Each subclass could contain multiple units from each treatment group. The subclasses are chosen to minimize the largest within-subclass distance between units (including between units of the same treatment group). Notably, generalized full matching requires less memory and can run much faster than optimal full matching and optimal pair matching and, in some cases, even than nearest neighbor matching, and it can be used with huge datasets (e.g., in the millions) while running in under a minute. 
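#'
#' As a rough sketch of this workflow (the covariates and outcome below are
#' illustrative, not prescriptive), the returned matching weights can be used
#' much like propensity score weights when estimating a treatment effect:
#' \preformatted{
#' m.out <- matchit(treat ~ age + educ + re74,
#'                  data = lalonde,
#'                  method = "quick",
#'                  estimand = "ATE")
#' md <- match_data(m.out)
#' fit <- lm(re78 ~ treat, data = md, weights = weights)
#' }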
#'
#' @references In a manuscript, be sure to cite the *quickmatch* package if using
#' `matchit()` with `method = "quick"`. A citation can be generated using `citation("quickmatch")`.
#'
#' For example, a sentence might read:
#'
#' *Generalized full matching was performed using the MatchIt package (Ho,
#' Imai, King, & Stuart, 2011) in R, which calls functions from the quickmatch
#' package (Sävje, Sekhon, & Higgins, 2024).*
#'
#' You should also cite the following paper, which develops and describes the method:
#'
#' Sävje, F., Higgins, M. J., & Sekhon, J. S. (2021). Generalized Full Matching. *Political Analysis*, 29(4), 423–447. \doi{10.1017/pan.2020.32}
#'
#' @seealso [matchit()] for a detailed explanation of the inputs and outputs of
#' a call to `matchit()`.
#'
#' \pkgfun{quickmatch}{quickmatch}, which is the workhorse.
#'
#' [`method_full`] for optimal full matching, which is nearly the same but offers more customizability and more optimal solutions at the cost of speed.
#'
#' @examplesIf requireNamespace("quickmatch", quietly = TRUE)
#' data("lalonde")
#'
#' # Generalized full PS matching
#' m.out1 <- matchit(treat ~ age + educ + race + nodegree +
#'                     married + re74 + re75,
#'                   data = lalonde,
#'                   method = "quick")
#' m.out1
#' summary(m.out1)
NULL

matchit2quick <- function(treat, formula, data, distance, discarded,
                          s.weights = NULL, caliper = NULL, mahvars = NULL,
                          exact = NULL, estimand = "ATT", verbose = FALSE,
                          is.full.mahalanobis, ...) {

  rlang::check_installed("quickmatch")

  .cat_verbose("Generalized full matching...\n", verbose = verbose)

  A <- list(...)
distances.args <- c("data", "id_variable", "dist_variables", "normalize", "weights") A[names(A) %in% distances.args] <- NULL estimand <- toupper(estimand) estimand <- match_arg(estimand, c("ATT", "ATC", "ATE")) if (estimand == "ATC") { tc <- c("control", "treated") focal <- 0 } else { tc <- c("treated", "control") focal <- 1 } treat_ <- treat[!discarded] # treat_ <- setNames(as.integer(treat[!discarded] == focal), names(treat)[!discarded]) if (is.full.mahalanobis) { if (is_null(attr(terms(formula, data = data), "term.labels"))) { .err(sprintf("covariates must be specified in the input formula when `distance = \"%s\"`", attr(is.full.mahalanobis, "transform"))) } mahvars <- formula } #Exact matching strata if (is_not_null(exact)) { ex <- factor(exactify(model.frame(exact, data = data), sep = ", ", include_vars = TRUE)[!discarded]) cc <- Reduce("intersect", lapply(unique(treat_), function(t) unclass(ex)[treat_ == t])) if (is_null(cc)) { .err("no matches were found") } } else { ex <- gl(1, length(treat_), labels = "_") cc <- 1 } #Create distance matrix; note that Mahalanobis distance computed using entire #sample (minus discarded), like method2nearest, as opposed to within exact strata, like optmatch. 
if (is_not_null(mahvars)) { transform <- if (is.full.mahalanobis) attr(is.full.mahalanobis, "transform") else "mahalanobis" distcovs <- transform_covariates(mahvars, data = data, method = transform, s.weights = s.weights, treat = treat, discarded = discarded) } else { distcovs <- as.matrix(distance) } #Remove discarded units from distance mat distcovs <- distcovs[!discarded, , drop = FALSE] rownames(distcovs) <- names(treat_) #Process caliper if (is_not_null(caliper)) { if (is_not_null(mahvars)) { .err('with `method = "quick"`, a caliper can only be used when `distance` is a propensity score or vector and `mahvars` is not specified') } if (length(caliper) > 1L || !identical(names(caliper), "")) { .err('with `method = "quick"`, calipers cannot be placed on covariates') } } A$caliper <- caliper #Initialize pair membership; must include names pair <- rep_with(NA_character_, treat) p <- setNames(vector("list", nlevels(ex)), levels(ex)) for (e in levels(ex)[cc]) { if (nlevels(ex) > 1L) { .cat_verbose(sprintf("Matching subgroup %s/%s: %s...\n", match(e, levels(ex)[cc]), length(cc), e), verbose = verbose) } A$distances <- distcovs[ex == e, , drop = FALSE] A$treatments <- treat_[ex == e] matchit_try({ p[[e]] <- do.call(quickmatch::quickmatch, A) }, from = "quickmatch") pair[which(ex == e)[!is.na(p[[e]])]] <- paste(as.character(p[[e]][!is.na(p[[e]])]), e, sep = "|") } if (length(p) == 1L) { p <- p[[1L]] } psclass <- factor(pair) levels(psclass) <- seq_len(nlevels(psclass)) names(psclass) <- names(treat) #No match.matrix because treated units don't index matched strata (i.e., more than one #treated unit can be in the same stratum). Stratum information is contained in subclass. .cat_verbose("Calculating matching weights... 
", verbose = verbose) res <- list(subclass = psclass, weights = get_weights_from_subclass(psclass, treat, estimand), obj = p) .cat_verbose("Done.\n", verbose = verbose) class(res) <- "matchit" res } MatchIt/R/match_data.R0000644000176200001440000003613614762405453014237 0ustar liggesusers#' Construct a matched dataset from a `matchit` object #' @name match_data #' @aliases match_data match.data get_matches #' #' @description #' `match_data()` and `get_matches()` create a data frame with #' additional variables for the distance measure, matching weights, and #' subclasses after matching. This dataset can be used to estimate treatment #' effects after matching or subclassification. `get_matches()` is most #' useful after matching with replacement; otherwise, `match_data()` is #' more flexible. See Details below for the difference between them. #' #' @param object a `matchit` object; the output of a call to [matchit()]. #' @param group which group should comprise the matched dataset: `"all"` #' for all units, `"treated"` for just treated units, or `"control"` #' for just control units. Default is `"all"`. #' @param distance a string containing the name that should be given to the #' variable containing the distance measure in the data frame output. Default #' is `"distance"`, but `"prop.score"` or similar might be a good #' alternative if propensity scores were used in matching. Ignored if a #' distance measure was not supplied or estimated in the call to #' `matchit()`. #' @param weights a string containing the name that should be given to the #' variable containing the matching weights in the data frame output. Default #' is `"weights"`. #' @param subclass a string containing the name that should be given to the #' variable containing the subclasses or matched pair membership in the data #' frame output. Default is `"subclass"`. #' @param id a string containing the name that should be given to the variable #' containing the unit IDs in the data frame output. 
Default is `"id"`. #' Only used with `get_matches()`; for `match_data()`, the units IDs #' are stored in the row names of the returned data frame. #' @param data a data frame containing the original dataset to which the #' computed output variables (`distance`, `weights`, and/or #' `subclass`) should be appended. If empty, `match_data()` and #' `get_matches()` will attempt to find the dataset using the environment #' of the `matchit` object, which can be unreliable; see Notes. #' @param include.s.weights `logical`; whether to multiply the estimated #' weights by the sampling weights supplied to `matchit()`, if any. #' Default is `TRUE`. If `FALSE`, the weights in the #' `match_data()` or `get_matches()` output should be multiplied by #' the sampling weights before being supplied to the function estimating the #' treatment effect in the matched data. #' @param drop.unmatched `logical`; whether the returned data frame should #' contain all units (`FALSE`) or only units that were matched (i.e., have #' a matching weight greater than zero) (`TRUE`). Default is `TRUE` #' to drop unmatched units. #' @param \dots arguments passed to `match_data()`. #' #' @details #' `match_data()` creates a dataset with one row per unit. It will be #' identical to the dataset supplied except that several new columns will be #' added containing information related to the matching. When #' `drop.unmatched = TRUE`, the default, units with weights of zero, which #' are those units that were discarded by common support or the caliper or were #' simply not matched, will be dropped from the dataset, leaving only the #' subset of matched units. The idea is for the output of `match_data()` #' to be used as the dataset input in calls to `glm()` or similar to #' estimate treatment effects in the matched sample. It is important to include #' the weights in the estimation of the effect and its standard error. 
The #' subclass column, when created, contains pair or subclass membership and #' should be used to estimate the effect and its standard error. Subclasses #' will only be included if there is a `subclass` component in the #' `matchit` object, which does not occur with matching with replacement, #' in which case `get_matches()` should be used. See #' `vignette("estimating-effects")` for information on how to use #' `match_data()` output to estimate effects. `match.data()` is an alias for `match_data()`. #' #' `get_matches()` is similar to `match_data()`; the primary #' difference occurs when matching is performed with replacement, i.e., when #' units do not belong to a single matched pair. In this case, the output of #' `get_matches()` will be a dataset that contains one row per unit for #' each pair they are a part of. For example, if matching was performed with #' replacement and a control unit was matched to two treated units, that #' control unit will have two rows in the output dataset, one for each pair it #' is a part of. Weights are computed for each row, and, for control units, are equal to the #' inverse of the number of control units in each control unit's subclass; treated units get a weight of 1. #' Unmatched units are dropped. An additional column with unit IDs will be #' created (named using the `id` argument) to identify when the same unit #' is present in multiple rows. This dataset structure allows for the inclusion #' of both subclass membership and repeated use of units, unlike the output of #' `match_data()`, which lacks subclass membership when matching is done #' with replacement. A `match.matrix` component of the `matchit` #' object must be present to use `get_matches()`; in some forms of #' matching, it is absent, in which case `match_data()` should be used #' instead. See `vignette("estimating-effects")` for information on how to #' use `get_matches()` output to estimate effects after matching with #' replacement. 
#'
#' @return
#' A data frame containing the data supplied in the `data` argument or in the
#' original call to `matchit()` with the computed
#' output variables appended as additional columns, named according to the
#' arguments above. For `match_data()`, the `group` and
#' `drop.unmatched` arguments control whether only subsets of the data are
#' returned. See Details above for how `match_data()` and
#' `get_matches()` differ. Note that `get_matches()` sorts the data by
#' subclass and treatment status, unlike `match_data()`, which uses the
#' order of the data.
#'
#' The returned data frame will contain the variables in the original data set
#' or dataset supplied to `data` and the following columns:
#'
#' \item{distance}{The propensity score, if estimated or supplied to the
#' `distance` argument in `matchit()` as a vector.}
#' \item{weights}{The computed matching weights. These must be used in effect
#' estimation to correctly incorporate the matching.}
#' \item{subclass}{Matching
#' strata membership. Units with the same value are in the same stratum.}
#' \item{id}{The ID of each unit, corresponding to the row names in the
#' original data or dataset supplied to `data`. Only included in
#' `get_matches()` output. This column can be used to identify which rows
#' belong to the same unit since the same unit may appear multiple times if
#' reused in matching with replacement.}
#'
#' These columns will take on the name supplied to the corresponding arguments
#' in the call to `match_data()` or `get_matches()`. See Examples for
#' an example of renaming the `distance` column to `"prop.score"`.
#'
#' If `data` or the original dataset supplied to `matchit()` was a
#' `data.table` or `tbl`, the `match_data()` output will have
#' the same class, but the `get_matches()` output will always be a base R
#' `data.frame`.
#'
#' In addition to their base class (e.g., `data.frame` or `tbl`),
#' returned objects have the class `matchdata` or `getmatches`.
This #' class is important when using [`rbind()`][rbind.matchdata] to #' append matched datasets. #' #' @note The most common way to use `match_data()` and #' `get_matches()` is by supplying just the `matchit` object, e.g., #' as `match_data(m.out)`. A data set will first be searched in the #' environment of the `matchit` formula, then in the calling environment #' of `match_data()` or `get_matches()`, and finally in the #' `model` component of the `matchit` object if a propensity score #' was estimated. #' #' When called from an environment different from the one in which #' `matchit()` was originally called and a propensity score was not #' estimated (or was but with `discard` not `"none"` and #' `reestimate = TRUE`), this syntax may not work because the original #' dataset used to construct the matched dataset will not be found. This can #' occur when `matchit()` was run within an [lapply()] or #' `purrr::map()` call. The solution, which is recommended in all cases, #' is simply to supply the original dataset to the `data` argument of #' `match_data()`, e.g., as `match_data(m.out, data = original_data)`, as demonstrated in the Examples. #' #' @seealso #' #' [matchit()]; [rbind.matchdata()] #' #' `vignette("estimating-effects")` for uses of `match_data()` and #' `get_matches()` in estimating treatment effects. 
#' #' @examples #' #' data("lalonde") #' #' # 4:1 matching w/replacement #' m.out1 <- matchit(treat ~ age + educ + married + #' race + nodegree + re74 + re75, #' data = lalonde, #' replace = TRUE, #' caliper = .05, #' ratio = 4) #' #' m.data1 <- match_data(m.out1, #' data = lalonde, #' distance = "prop.score") #' dim(m.data1) #one row per matched unit #' head(m.data1, 10) #' #' g.matches1 <- get_matches(m.out1, #' data = lalonde, #' distance = "prop.score") #' dim(g.matches1) #multiple rows per matched unit #' head(g.matches1, 10) #' #' @export match_data <- function(object, group = "all", distance = "distance", weights = "weights", subclass = "subclass", data = NULL, include.s.weights = TRUE, drop.unmatched = TRUE) { chk::chk_is(object, "matchit") data.found <- FALSE for (i in 1:4) { if (i == 2) { data <- try(eval(object$call$data, envir = environment(object$formula)), silent = TRUE) } else if (i == 3) { data <- try(eval(object$call$data, envir = parent.frame()), silent = TRUE) } else if (i == 4) { data <- object[["model"]][["data"]] } if (!null_or_error(data) && length(dim(data)) == 2L && nrow(data) == length(object[["treat"]])) { data.found <- TRUE break } } if (!data.found) { .err("a valid dataset could not be found. Please supply an argument to `data` containing the original dataset used in the matching") } if (!is.data.frame(data)) { if (!is.matrix(data)) { .err("`data` must be a data frame") } data <- as.data.frame.matrix(data) } if (nrow(data) != length(object$treat)) { .err("`data` must have as many rows as there were units in the original call to `matchit()`") } if (is_not_null(object$distance)) { chk::chk_not_null(distance) chk::chk_string(distance) if (hasName(data, distance)) { .err(sprintf("%s is already the name of a variable in the data. 
Please choose another name for distance using the `distance` argument", add_quotes(distance))) } data[[distance]] <- object$distance } if (is_not_null(object$weights)) { chk::chk_not_null(weights) chk::chk_string(weights) if (hasName(data, weights)) { .err(sprintf("%s is already the name of a variable in the data. Please choose another name for weights using the `weights` argument", add_quotes(weights))) } data[[weights]] <- object$weights if (is_not_null(object$s.weights) && include.s.weights) { data[[weights]] <- data[[weights]] * object$s.weights } } if (is_not_null(object$subclass)) { chk::chk_not_null(subclass) chk::chk_string(subclass) if (hasName(data, subclass)) { .err(sprintf("%s is already the name of a variable in the data. Please choose another name for subclass using the `subclass` argument", add_quotes(subclass))) } data[[subclass]] <- object$subclass } treat <- object$treat if (drop.unmatched && is_not_null(object$weights)) { data <- data[object$weights > 0, , drop = FALSE] treat <- treat[object$weights > 0] } group <- match_arg(group, c("all", "treated", "control")) if (group == "treated") data <- data[treat == 1, , drop = FALSE] else if (group == "control") data <- data[treat == 0, , drop = FALSE] if (is_not_null(object$distance)) attr(data, "distance") <- distance if (is_not_null(object$weights)) attr(data, "weights") <- weights if (is_not_null(object$subclass)) attr(data, "subclass") <- subclass class(data) <- c("matchdata", class(data)) data } #' @export #' @rdname match_data match.data <- function(...) { match_data(...) } #' @export #' @rdname match_data get_matches <- function(object, distance = "distance", weights = "weights", subclass = "subclass", id = "id", data = NULL, include.s.weights = TRUE) { chk::chk_is(object, "matchit") if (is_null(object$match.matrix)) { .err("a match.matrix component must be present in the matchit object, which does not occur with all types of matching. 
Use `match_data()` instead") } #Get initial data using match_data; note weights and subclass will be removed, #including them here just checks their names don't clash m.data <- match_data(object, group = "all", distance = distance, weights = weights, subclass = subclass, data = data, include.s.weights = FALSE, drop.unmatched = TRUE) chk::chk_not_null(id) chk::chk_string(id) if (hasName(m.data, id)) { .err(sprintf("%s is already the name of a variable in the data. Please choose another name for id using the `id` argument", add_quotes(id))) } m.data[[id]] <- names(object$treat)[object$weights > 0] for (i in c(weights, subclass)) { if (hasName(m.data, i)) m.data[[i]] <- NULL } mm <- object$match.matrix mm <- mm[!is.na(mm[, 1L]), , drop = FALSE] tmm <- t(mm) num.matches <- rowSums(!is.na(mm)) matched <- as.data.frame(matrix(NA_character_, nrow = nrow(mm) + sum(!is.na(mm)), ncol = 3)) names(matched) <- c(id, subclass, weights) matched[[id]] <- c(as.vector(tmm[!is.na(tmm)]), rownames(mm)) matched[[subclass]] <- c(as.vector(col(tmm)[!is.na(tmm)]), seq_len(nrow(mm))) matched[[weights]] <- c(1 / num.matches[matched[[subclass]][seq_len(sum(!is.na(mm)))]], rep.int(1, nrow(mm))) if (is_not_null(object$s.weights) && include.s.weights) { matched[[weights]] <- matched[[weights]] * object$s.weights[matched[[id]]] } out <- merge(matched, m.data, by = id, all.x = TRUE, sort = FALSE) out <- out[order(out[[subclass]], object$treat[out[[id]]], method = "radix", decreasing = c(FALSE, TRUE)), ] rownames(out) <- NULL out[[subclass]] <- factor(out[[subclass]], labels = seq_len(nrow(mm))) if (is_not_null(object$distance)) attr(out, "distance") <- distance attr(out, "weights") <- weights attr(out, "subclass") <- subclass attr(out, "id") <- id class(out) <- c("getmatches", class(out)) out } MatchIt/R/matchit2cem.R0000644000176200001440000006200314762404716014343 0ustar liggesusers#' Coarsened Exact Matching #' @name method_cem #' @aliases method_cem #' #' @usage NULL #' #' @description #' In 
[matchit()], setting `method = "cem"` performs coarsened exact #' matching. With coarsened exact matching, covariates are coarsened into bins, #' and a complete cross of the coarsened covariates is used to form subclasses #' defined by each combination of the coarsened covariate levels. Any subclass #' that doesn't contain both treated and control units is discarded, leaving #' only subclasses containing treatment and control units that are exactly #' equal on the coarsened covariates. The coarsening process can be controlled #' by an algorithm or by manually specifying cutpoints and groupings. The #' benefits of coarsened exact matching are that the tradeoff between exact #' matching and approximate balancing can be managed to prevent discarding too #' many units, which can otherwise occur with exact matching. #' #' This page details the allowable arguments with `method = "cem"`. See #' [matchit()] for an explanation of what each argument means in a general #' context and how it can be specified. #' #' Below is how `matchit()` is used for coarsened exact matching: #' \preformatted{ #' matchit(formula, #' data = NULL, #' method = "cem", #' estimand = "ATT", #' s.weights = NULL, #' verbose = FALSE, #' ...) } #' #' @param formula a two-sided [formula] object containing the treatment and #' covariates to be used in creating the subclasses defined by a full cross of #' the coarsened covariate levels. #' @param data a data frame containing the variables named in `formula`. #' If not found in `data`, the variables will be sought in the #' environment. #' @param method set here to `"cem"`. #' @param estimand a string containing the desired estimand. Allowable options #' include `"ATT"`, `"ATC"`, and `"ATE"`. The estimand controls #' how the weights are computed; see the Computing Weights section at #' [matchit()] for details. When `k2k = TRUE` (see below), `estimand` #' also controls how the matching is done. 
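#' As a brief sketch (the covariates here are illustrative):
#' \preformatted{
#' # weights computed to target the ATE rather than the ATT
#' m.ate <- matchit(treat ~ age + educ + married,
#'                  data = lalonde,
#'                  method = "cem",
#'                  estimand = "ATE")
#' }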
#' @param s.weights the variable containing sampling weights to be incorporated #' into balance statistics or the scaling factors when `k2k = TRUE` and #' certain methods are used. #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. #' @param \dots additional arguments to control the matching process. #' \describe{ #' \item{`grouping`}{ a named list with an (optional) entry #' for each categorical variable to be matched on. Each element should itself #' be a list, and each entry of the sublist should be a vector containing #' levels of the variable that should be combined to form a single level. Any #' categorical variables not included in `grouping` will remain as they #' are in the data, which means exact matching, with no coarsening, will take #' place on these variables. See Details. } #' \item{`cutpoints`}{ a named list with an (optional) entry for each numeric variable to be matched on. #' Each element describes a way of coarsening the corresponding variable. They #' can be a vector of cutpoints that demarcate bins, a single number giving the #' number of bins, or a string corresponding to a method of computing the #' number of bins. Allowable strings include `"sturges"`, `"scott"`, #' and `"fd"`, which use the functions #' [grDevices::nclass.Sturges()], [grDevices::nclass.scott()], #' and [grDevices::nclass.FD()], respectively. The default is #' `"sturges"` for variables that are not listed or if no argument is #' supplied. Can also be a single value to be applied to all numeric variables. #' See Details. } #' \item{`k2k`}{ `logical`; whether 1:1 matching should #' occur within the matched strata. If `TRUE` nearest neighbor matching #' without replacement will take place within each stratum, and any unmatched #' units will be dropped (e.g., if there are more treated than control units in #' the stratum, the treated units without a match will be dropped). 
The #' `k2k.method` argument controls how the distance between units is #' calculated. } #' \item{`k2k.method`}{`character`; how the distance #' between units should be calculated if `k2k = TRUE`. Allowable arguments #' include `NULL` (for random matching), any argument to #' [distance()] for computing a distance matrix from covariates #' (e.g., `"mahalanobis"`), or any allowable argument to `method` in #' [dist()]. Matching will take place on the original #' (non-coarsened) variables. The default is `"mahalanobis"`. #' } #' \item{`mpower`}{if `k2k.method = "minkowski"`, the power used in #' creating the distance. This is passed to the `p` argument of [dist()]. #' } #' \item{`m.order`}{`character`; the order that the matching takes place when `k2k = TRUE`. Allowable options #' include `"closest"`, where matching takes place in #' ascending order of the smallest distance between units; `"farthest"`, where matching takes place in #' descending order of the smallest distance between units; `"random"`, where matching takes place #' in a random order; and `"data"` where matching takes place based on the #' order of units in the data. When `m.order = "random"`, results may differ #' across different runs of the same code unless a seed is set and specified #' with [set.seed()]. The default of `NULL` corresponds to `"data"`. See [`method_nearest`] for more information. #' } #' } #' #' The arguments `distance` (and related arguments), `exact`, `mahvars`, `discard` (and related arguments), `replace`, `caliper` (and related arguments), and `ratio` are ignored with a warning. #' #' @section Outputs: #' #' All outputs described in [matchit()] are returned with #' `method = "cem"` except for `match.matrix`. When `k2k = TRUE`, a `match.matrix` component with the matched pairs is also #' included. `include.obj` is ignored. 
#' #' @details #' If the coarsening is such that there are no exact matches with the coarsened #' variables, the `grouping` and `cutpoints` arguments can be used to #' modify the matching specification. Reducing the number of cutpoints or #' grouping some variable values together can make it easier to find matches. #' See Examples below. Removing variables can also help (but they will likely #' not be balanced unless highly correlated with the included variables). To #' take advantage of coarsened exact matching without failing to find any #' matches, the covariates can be manually coarsened outside of #' `matchit()` and then supplied to the `exact` argument in a call to #' `matchit()` with another matching method. #' #' Setting `k2k = TRUE` is equivalent to first doing coarsened exact #' matching with `k2k = FALSE` and then supplying stratum membership as an #' exact matching variable (i.e., in `exact`) to another call to #' `matchit()` with `method = "nearest"`. #' It is also equivalent to performing nearest neighbor matching while supplying #' coarsened versions of the variables to `exact`, except that #' `method = "cem"` automatically coarsens the continuous variables. The #' `estimand` argument supplied with `method = "cem"` functions the #' same way it would in these alternate matching calls, i.e., by determining #' the "focal" group that controls the order of the matching. #' #' ## Grouping and Cutpoints #' #' The `grouping` and `cutpoints` #' arguments allow one to fine-tune the coarsening of the covariates. #' `grouping` is used for combining categories of categorical covariates #' and `cutpoints` is used for binning numeric covariates. The values #' supplied to these arguments should be iteratively changed until a matching #' solution that achieves an acceptable tradeoff between covariate balance and #' remaining sample size is obtained. The arguments are described below.
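#' #' For example (a hypothetical sketch using the included `lalonde` dataset; the #' variables and bin counts are illustrative only), the coarsening might be #' loosened iteratively until enough units are matched: #' \preformatted{ #' # Fine coarsening; may leave few matchable strata #' m <- matchit(treat ~ age + educ, data = lalonde, method = "cem", #'              cutpoints = list(age = 10, educ = 10)) #' summary(m) # check remaining sample size and balance #' #' # Coarser bins make matches easier to find #' m <- matchit(treat ~ age + educ, data = lalonde, method = "cem", #'              cutpoints = list(age = 5, educ = 3)) #' }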
#' #' ### `grouping` #' #' The argument to `grouping` must be a list, where each component has the #' name of a categorical variable, the levels of which are to be combined. Each #' component must itself be a list; this list contains one or more vectors of #' levels, where each vector corresponds to the levels that should be combined #' into a single category. For example, if a variable `amount` had levels #' `"none"`, `"some"`, and `"a lot"`, one could enter #' `grouping = list(amount = list(c("none"), c("some", "a lot")))`, which #' would group `"some"` and `"a lot"` into a single category and #' leave `"none"` in its own category. Any levels left out of the list for #' each variable will be left alone (so `c("none")` could have been #' omitted from the previous code). Note that if a categorical variable does #' not appear in `grouping`, it will not be coarsened, so exact matching #' will take place on it. `grouping` should not be used for numeric #' variables with more than a few values; use `cutpoints`, described below, instead. #' #' ### `cutpoints` #' #' The argument to `cutpoints` must also be a list, where each component #' has the name of a numeric variable that is to be binned. (As a shortcut, it #' can also be a single value that will be applied to all numeric variables.) #' Each component can take one of three forms: a vector of cutpoints that #' separate the bins, a single number giving the number of bins, or a string #' corresponding to an algorithm used to compute the number of bins. Any values #' at a boundary will be placed into the higher bin; e.g., if the cutpoints #' were `c(0, 5, 10)`, values of 5 would be placed into the same bin as #' values of 6, 7, 8, or 9, and values of 10 would be placed into a different #' bin. Internally, values of `-Inf` and `Inf` are appended to the #' beginning and end of the range.
When given as a single number defining the #' number of bins, the outermost bin boundaries are the minimum and maximum #' values of the variable, with the remaining boundaries evenly spaced between them, i.e., not #' quantiles. A value of 0 will not perform any binning (equivalent to exact #' matching on the variable), and a value of 1 will remove the variable from #' the exact matching variables but it will still be used for pair matching #' when `k2k = TRUE`. The allowable strings include `"sturges"`, #' `"scott"`, and `"fd"`, which use the corresponding binning method, #' and `"q#"` where `#` is a number, which splits the variable into #' `#` equally-sized bins (i.e., quantiles). #' #' An example of a way to supply an argument to `cutpoints` would be the #' following: #' \preformatted{ #' cutpoints = list(X1 = 4, #' X2 = c(1.7, 5.5, 10.2), #' X3 = "scott", #' X4 = "q5") } #' #' This would split `X1` into 4 bins, `X2` #' into bins based on the provided boundaries, `X3` into a number of bins #' determined by [grDevices::nclass.scott()], and `X4` into #' quintiles. All other numeric variables would be split into a number of bins #' determined by [grDevices::nclass.Sturges()], the default. #' #' #' @note #' This method does not rely on the *cem* package, instead using #' code written for *MatchIt*, but its design is based on the original #' *cem* functions. Versions of *MatchIt* prior to 4.1.0 did rely on #' *cem*, so results may differ between versions. There are a few ways in which #' *MatchIt* and *cem* (and older #' versions of *MatchIt*) differ in executing coarsened exact matching, #' described below. #' * In *MatchIt*, when a single number is #' supplied to `cutpoints`, it describes the number of bins; in #' *cem*, it describes the number of cutpoints separating bins. The #' *MatchIt* method is closer to how [hist()] processes breakpoints to #' create bins.
#' * In *MatchIt*, values on the cutpoint boundaries will #' be placed into the higher bin; in *cem*, they are placed into the lower #' bin. To avoid consequences of this choice, ensure the bin boundaries do not #' coincide with observed values of the variables. #' * When `cutpoints` are used, `"ss"` (for Shimazaki-Shinomoto's rule) can be used in #' *cem* but not in *MatchIt*. #' * When `k2k = TRUE`, *MatchIt* matches on the original variables (scaled), whereas #' *cem* matches on the coarsened variables. Because the variables are #' already exactly matched on the coarsened variables, matching in *cem* #' is equivalent to random matching within strata. #' * When `k2k = TRUE`, in *MatchIt* matched units are identified by pair membership, and the #' original stratum membership prior to 1:1 matching is discarded. In #' *cem*, pairs are not identified beyond the stratum the members are part of. #' * When `k2k = TRUE`, `k2k.method = "mahalanobis"` can be #' requested in *MatchIt* but not in *cem*. #' #' @seealso [matchit()] for a detailed explanation of the inputs and outputs of #' a call to `matchit()`. #' #' The *cem* package, upon which this method is based and which provided #' the workhorse in previous versions of *MatchIt*. #' #' [`method_exact`] for exact matching, which performs exact matching #' on the covariates without coarsening. #' #' @references #' In a manuscript, you don't need to cite another package when #' using `method = "cem"` because the matching is performed completely #' within *MatchIt*. For example, a sentence might read: #' #' *Coarsened exact matching was performed using the MatchIt package (Ho, #' Imai, King, & Stuart, 2011) in R.* #' #' It would be a good idea to cite the following article, which develops the #' theory behind coarsened exact matching: #' #' Iacus, S. M., King, G., & Porro, G. (2012). Causal Inference without Balance #' Checking: Coarsened Exact Matching. *Political Analysis*, 20(1), 1–24. 
\doi{10.1093/pan/mpr013} #' #' @examples #' data("lalonde") #' #' # Coarsened exact matching on age, race, married, and educ with educ #' # coarsened into 5 bins and race coarsened into 2 categories, #' # grouping "white" and "hispan" together #' cutpoints <- list(educ = 5) #' grouping <- list(race = list(c("white", "hispan"), #' c("black"))) #' #' m.out1 <- matchit(treat ~ age + race + married + educ, #' data = lalonde, #' method = "cem", #' cutpoints = cutpoints, #' grouping = grouping) #' m.out1 #' summary(m.out1) #' #' # The same but requesting 1:1 Mahalanobis distance matching with #' # the k2k and k2k.method argument. Note the remaining number of units #' # is smaller than when retaining the full matched sample. #' m.out2 <- matchit(treat ~ age + race + married + educ, #' data = lalonde, #' method = "cem", #' cutpoints = cutpoints, #' grouping = grouping, #' k2k = TRUE, #' k2k.method = "mahalanobis") #' m.out2 #' summary(m.out2, un = FALSE) NULL matchit2cem <- function(treat, covs, estimand = "ATT", s.weights = NULL, m.order = NULL, verbose = FALSE, ...) { if (is_null(covs)) { .err("Covariates must be specified in the input formula to use coarsened exact matching") } .cat_verbose("Coarsened exact matching... \n", verbose = verbose) # if (isTRUE(A[["k2k"]])) { # if (!has_n_unique(treat, 2L)) { # .err("`k2k` cannot be `TRUE` with a multi-category treatment") # } # } estimand <- toupper(estimand) estimand <- match_arg(estimand, c("ATT", "ATC", "ATE")) #Uses in-house cem, no need for cem package. strat <- cem_matchit(treat = treat, X = covs, ...) mm <- NULL if (isTRUE(...get("k2k"))) { focal <- switch(estimand, "ATC" = 0, 1) mm <- do_k2k(treat = treat, X = covs, subclass = strat, s.weights = s.weights, focal = focal, m.order = m.order, verbose = verbose, ...) 
strat <- mm2subclass(mm, treat, focal = focal) levels(strat) <- seq_len(nlevels(strat)) mm <- nummm2charmm(mm, treat) weights <- get_weights_from_mm(mm, treat, focal) } else { levels(strat) <- seq_len(nlevels(strat)) weights <- get_weights_from_subclass(strat, treat, estimand) } .cat_verbose("Calculating matching weights... ", verbose = verbose) res <- list(match.matrix = mm, subclass = strat, weights = weights) .cat_verbose("Done.\n", verbose = verbose) class(res) <- "matchit" res } cem_matchit <- function(treat, X, cutpoints = "sturges", grouping = list(), ...) { #In-house implementation of cem. Basically the same except: #treat is a vector of treatment status, not the name of a variable #X is a data.frame of covariates #when cutpoints are given as integer or string, they define the number of bins, not the number of breakpoints. "ss" is no longer allowed. for (i in seq_along(X)) { if (is.ordered(X[[i]])) X[[i]] <- unclass(X[[i]]) } is.numeric.cov <- setNames(vapply(X, is.numeric, logical(1L)), names(X)) #Process grouping if (is_not_null(grouping)) { if (!is.list(grouping) || is_null(names(grouping))) { .err("`grouping` must be a named list of grouping values with an element for each variable whose values are to be grouped") } bad.names <- setdiff(names(grouping), names(X)) nb <- length(bad.names) if (nb > 0) { .wrn(sprintf("the variable%%s %s named in `grouping` %%r not in the variables supplied to `matchit()` and will be ignored", word_list(bad.names, quotes = 2, and.or = "and")), n = nb) grouping[bad.names] <- NULL } for (i in names(grouping)) { X[[i]] <- as.character(X[[i]]) } bag.groupings <- names(grouping)[vapply(grouping, function(g) { !is.list(g) || !all(vapply(g, function(gg) is.atomic(gg) && is.vector(gg), logical(1L))) }, logical(1L))] nbg <- length(bag.groupings) if (nbg > 0L) { .err(paste0("Each entry in the list supplied to `grouping` must be a list with entries containing values of the corresponding variable.", "\nIncorrectly specified
variable%s:\n\t"), toString(bag.groupings), tidy = FALSE, n = nbg) } for (g in names(grouping)) { x <- X[[g]] groups <- grouping[[g]] for (i in seq_along(groups)) { groups[[i]] <- as.character(groups[[i]]) x[x %in% groups[[i]]] <- groups[[i]][1L] } X[[g]] <- x #Remove cutpoints if variable named in `grouping` is.numeric.cov[g] <- FALSE } } #Process cutpoints if (!is.list(cutpoints)) { cutpoints <- setNames(rep.int(list(cutpoints), sum(is.numeric.cov)), names(X)[is.numeric.cov]) } if (is_null(names(cutpoints))) { .err("`cutpoints` must be a named list of binning values with an element for each numeric variable") } bad.names <- setdiff(names(cutpoints), names(X)) nb <- length(bad.names) if (nb > 0L) { .wrn(sprintf("the variable%%s %s named in `cutpoints` %%r not in the variables supplied to `matchit()` and will be ignored", word_list(bad.names, quotes = 2, and.or = "and")), n = nb) cutpoints[bad.names] <- NULL } if (is_not_null(grouping)) { grouping.cutpoint.names <- intersect(names(grouping), names(cutpoints)) ngc <- length(grouping.cutpoint.names) if (ngc > 0L) { .wrn(sprintf("the variable%%s %s %%r named in both `grouping` and `cutpoints`; %s entr%%y%%s in `cutpoints` will be ignored", word_list(grouping.cutpoint.names, quotes = 2, and.or = "and"), ngettext(ngc, "its", "their")), n = ngc) cutpoints[grouping.cutpoint.names] <- NULL } } non.numeric.in.cutpoints <- intersect(names(X)[!is.numeric.cov], names(cutpoints)) nnnic <- length(non.numeric.in.cutpoints) if (nnnic > 0L) { .wrn(sprintf("the variable%%s %s named in `cutpoints` %%r not numeric and %s cutpoints will not be applied. 
Use `grouping` for non-numeric variables", word_list(non.numeric.in.cutpoints, quotes = 2, and.or = "and"), ngettext(nnnic, "its", "their")), n = nnnic) } bad.cuts <- rep_with(FALSE, cutpoints) for (i in names(cutpoints)) { if (is_null(cutpoints[[i]])) { cutpoints[[i]] <- "sturges" } else if (length(cutpoints[[i]]) > 1L) { bad.cuts[i] <- !is.numeric(cutpoints[[i]]) } else if (is.na(cutpoints[[i]])) { is.numeric.cov[i] <- FALSE #Will not be binned } else if (is.character(cutpoints[[i]])) { bad.cuts[i] <- !(startsWith(cutpoints[[i]], "q") && can_str2num(substring(cutpoints[[i]], 2))) && is.na(pmatch(cutpoints[[i]], c("sturges", "fd", "scott"))) } else if (!is.numeric(cutpoints[[i]]) || !is.finite(cutpoints[[i]]) || cutpoints[[i]] < 0) { bad.cuts[i] <- TRUE } else if (cutpoints[[i]] == 0) { is.numeric.cov[i] <- FALSE #Will not be binned } else if (cutpoints[[i]] == 1) { X[[i]] <- NULL #Removing from X, still in X.match is.numeric.cov <- is.numeric.cov[names(is.numeric.cov) != i] } } if (any(bad.cuts)) { .err(paste0("All entries in the list supplied to `cutpoints` must be one of the following:", "\n\t- a string containing the name of an allowable binning method", "\n\t- a single number corresponding to the number of bins", "\n\t- a numeric vector containing the cut points separating bins", "\nIncorrectly specified variable%s:\n\t"), toString(names(cutpoints)[bad.cuts]), tidy = FALSE, n = sum(bad.cuts)) } if (is_null(X)) { return(rep_with(1L, treat)) } #Create bins for numeric variables for (i in names(X)[is.numeric.cov]) { bins <- { if (is_not_null(cutpoints) && any(names(cutpoints) == i)) cutpoints[[i]] else "sturges" } if (is.character(bins)) { if (startsWith(bins, "q") && can_str2num(substring(bins, 2))) { #Quantile bins q <- str2num(substring(bins, 2L)) bins <- quantile(X[[i]], probs = seq(1 / q, 1 - 1 / q, by = 1 / q), names = FALSE) #Outer boundaries will be added later } else { bins <- match_arg(tolower(bins), c("sturges", "fd", "scott")) bins <- switch(bins,
sturges = nclass.Sturges(X[[i]]), fd = nclass.FD(X[[i]]), scott = nclass.scott(X[[i]])) #bins is now a single number } } if (length(bins) == 1L) { #cutpoints is number of bins, unlike in cem breaks <- seq(min(X[[i]]), max(X[[i]]), length = bins + 1) breaks[c(1, bins + 1)] <- c(-Inf, Inf) } else { breaks <- c(-Inf, sort(unique(bins)), Inf) } X[[i]] <- findInterval(X[[i]], breaks) } #Exact match ex <- unclass(exactify(X, names(treat))) cc <- Reduce("intersect", lapply(unique(treat), function(t) ex[treat == t])) if (is_null(cc)) { .err("no units were matched. Try coarsening the variables further or decreasing the number of variables to match on") } setNames(factor(match(ex, cc), nmax = length(cc)), names(treat)) } do_k2k <- function(treat, X, subclass, k2k.method = "mahalanobis", mpower = 2, s.weights = NULL, focal, m.order = "data", verbose = FALSE, k2k = TRUE, ...) { #Note: need k2k argument to prevent partial matching for k2k.method m.order <- match_arg(m.order, c("data", "random", "closest", "farthest")) .cat_verbose("K:K matching...\n", verbose = verbose) if (is_not_null(k2k.method)) { chk::chk_string(k2k.method) k2k.method <- tolower(k2k.method) k2k.method <- match_arg(k2k.method, c(matchit_distances(), "maximum", "manhattan", "canberra", "binary", "minkowski")) if (k2k.method == "minkowski") { chk::chk_number(mpower) chk::chk_gt(mpower, 0) if (mpower == 2) { k2k.method <- "euclidean" } } X.match <- transform_covariates(data = X, s.weights = s.weights, treat = treat, method = if (k2k.method %in% matchit_distances()) k2k.method else "euclidean") distance <- NULL } else { k2k.method <- "euclidean" X.match <- NULL distance <- rep.int(0, length(treat)) } reuse.max <- 1L caliper.dist <- caliper.covs <- caliper.covs.mat <- antiexactcovs <- unit.id <- NULL if (k2k.method %in% matchit_distances()) { discarded <- is.na(subclass) ratio <- rep.int(1L, sum(treat == focal)) mm <- nn_matchC_dispatch(treat, focal, ratio, discarded, reuse.max, distance, NULL, subclass,
caliper.dist, caliper.covs, caliper.covs.mat, X.match, antiexactcovs, unit.id, m.order, verbose) } else { mm <- matrix(NA_integer_, ncol = 1, nrow = sum(treat == focal), dimnames = list(names(treat)[treat == focal], NULL)) for (s in levels(subclass)) { .e <- which(subclass == s) treat_ <- treat[.e] discarded_ <- rep.int(FALSE, length(.e)) ex_ <- NULL ratio_ <- rep.int(1L, sum(treat_ == focal)) distance_mat <- as.matrix(dist(X.match[.e, , drop = FALSE], method = k2k.method, p = mpower))[treat_ == focal, treat_ != focal, drop = FALSE] mm_ <- nn_matchC_dispatch(treat_, focal, ratio_, discarded_, reuse.max, distance, distance_mat, ex_, caliper.dist, caliper.covs, caliper.covs.mat, NULL, antiexactcovs, unit.id, m.order, FALSE) #Ensure matched indices correspond to indices in full sample, not subgroup mm_[] <- .e[mm_] mm[rownames(mm_), ] <- mm_ } } mm }MatchIt/R/matchit2exact.R0000644000176200001440000001120614762404624014700 0ustar liggesusers#' Exact Matching #' @name method_exact #' @aliases method_exact #' @usage NULL #' #' @description #' In [matchit()], setting `method = "exact"` performs exact matching. #' With exact matching, a complete cross of the covariates is used to form #' subclasses defined by each combination of the covariate levels. Any subclass #' that doesn't contain both treated and control units is discarded, leaving #' only subclasses containing treatment and control units that are exactly #' equal on the included covariates. The benefit of exact matching is that #' confounding due to the covariates included is completely eliminated, #' regardless of the functional form of the treatment or outcome models. The #' problem is that typically many units will be discarded, sometimes #' dramatically reducing precision and changing the target population of #' inference.
To use exact matching in combination with another matching method #' (i.e., to exact match on some covariates and some other form of matching on #' others), use the `exact` argument with that method. #' #' This page details the allowable arguments with `method = "exact"`. See #' [matchit()] for an explanation of what each argument means in a general #' context and how it can be specified. #' #' Below is how `matchit()` is used for exact matching: #' \preformatted{ #' matchit(formula, #' data = NULL, #' method = "exact", #' estimand = "ATT", #' s.weights = NULL, #' verbose = FALSE, #' ...) #'} #' #' @param formula a two-sided [formula] object containing the treatment and #' covariates to be used in creating the subclasses defined by a full cross of #' the covariate levels. #' @param data a data frame containing the variables named in `formula`. #' If not found in `data`, the variables will be sought in the #' environment. #' @param method set here to `"exact"`. #' @param estimand a string containing the desired estimand. Allowable options #' include `"ATT"`, `"ATC"`, and `"ATE"`. The estimand controls #' how the weights are computed; see the Computing Weights section at #' [matchit()] for details. #' @param s.weights the variable containing sampling weights to be incorporated #' into balance statistics. These weights do not affect the matching process. #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. #' @param \dots ignored. #' #' The arguments `distance` (and related arguments), `exact`, `mahvars`, `discard` (and related arguments), `replace`, `m.order`, `caliper` (and related arguments), and `ratio` are ignored with a warning. #' #' @section Outputs: #' #' All outputs described in [matchit()] are returned with #' `method = "exact"` except for `match.matrix`. This is because #' matching strata are not indexed by treated units as they are in some other #' forms of matching. `include.obj` is ignored. 
#' #' @seealso [matchit()] for a detailed explanation of the inputs and outputs of #' a call to `matchit()`. The `exact` argument can be used with other #' methods to perform exact matching in combination with other matching #' methods. #' #' [method_cem] for coarsened exact matching, which performs exact #' matching on coarsened versions of the covariates. #' #' @references #' In a manuscript, you don't need to cite another package when #' using `method = "exact"` because the matching is performed completely #' within *MatchIt*. For example, a sentence might read: #' #' *Exact matching was performed using the MatchIt package (Ho, Imai, #' King, & Stuart, 2011) in R.* #' #' @examples #' #' data("lalonde") #' #' # Exact matching on age, race, married, and educ #' m.out1 <- matchit(treat ~ age + race + #' married + educ, #' data = lalonde, #' method = "exact") #' m.out1 #' summary(m.out1) #' NULL matchit2exact <- function(treat, covs, data, estimand = "ATT", verbose = FALSE, ...) { .cat_verbose("Exact matching...\n", verbose = verbose) if (is_null(covs)) { .err("covariates must be specified in the input formula to use exact matching") } estimand <- toupper(estimand) estimand <- match_arg(estimand, c("ATT", "ATC", "ATE")) xx <- exactify(covs, names(treat)) cc <- Reduce("intersect", lapply(unique(treat), function(t) xx[treat == t])) if (is_null(cc)) { .err("no exact matches were found") } psclass <- setNames(factor(match(xx, cc), nmax = length(cc)), names(treat)) .cat_verbose("Calculating matching weights... 
", verbose = verbose) res <- list(subclass = psclass, weights = get_weights_from_subclass(psclass, treat, estimand)) .cat_verbose("Done.\n", verbose = verbose) class(res) <- "matchit" res } MatchIt/R/distance2_methods.R0000644000176200001440000006376114762407702015554 0ustar liggesusers#' Propensity scores and other distance measures #' @name distance #' @aliases distance #' @usage NULL #' #' @description #' Several matching methods require or can involve the distance between treated #' and control units. Options include the Mahalanobis distance, propensity #' score distance, or distance between user-supplied values. Propensity scores #' are also used for common support via the `discard` options and for #' defining calipers. This page documents the options that can be supplied to #' the `distance` argument to [matchit()]. #' #' @section Allowable options: #' #' There are four ways to specify the `distance` argument: 1) as a string containing the name of a method for #' estimating propensity scores, 2) as a string containing the name of a method #' for computing pairwise distances from the covariates, 3) as a vector of #' values whose pairwise differences define the distance between units, or 4) #' as a distance matrix containing all pairwise distances. The options are #' detailed below. #' #' ## Propensity score estimation methods #' #' When `distance` is specified as the name of a method for estimating propensity scores #' (described below), a propensity score is estimated using the variables in #' `formula` and the method corresponding to the given argument. This #' propensity score can be used to compute the distance between units as the #' absolute difference between the propensity scores of pairs of units. #' Propensity scores can also be used to create calipers and common support #' restrictions, whether or not they are used in the actual distance measure #' used in the matching, if any. 
#' #' In addition to the `distance` argument, two other arguments can be #' specified that relate to the estimation and manipulation of the propensity #' scores. The `link` argument allows for different links to be used in #' models that require them such as generalized linear models, for which the #' logit and probit links are allowed, among others. In addition to specifying #' the link, the `link` argument can be used to specify whether the #' propensity score or the linearized version of the propensity score should be #' used; by specifying `link = "linear.{link}"`, the linearized version #' will be used. #' #' The `distance.options` argument can also be specified, which should be #' a list of values passed to the propensity score-estimating function, for #' example, to choose specific options or tuning parameters for the estimation #' method. If `formula`, `data`, or `verbose` are not supplied #' to `distance.options`, the corresponding arguments from #' `matchit()` will be automatically supplied. See the Examples for #' demonstrations of the uses of `link` and `distance.options`. When #' `s.weights` is supplied in the call to `matchit()`, it will #' automatically be passed to the propensity score-estimating function as the #' `weights` argument unless otherwise described below. #' #' The following methods for estimating propensity scores are allowed: #' #' \describe{ #' \item{`"glm"`}{ The propensity scores are estimated using #' a generalized linear model (e.g., logistic regression). The `formula` #' supplied to `matchit()` is passed directly to [glm()], and #' [predict.glm()] is used to compute the propensity scores. The `link` #' argument can be specified as a link function supplied to [binomial()], e.g., #' `"logit"`, which is the default. When `link` is prepended by #' `"linear."`, the linear predictor is used instead of the predicted #' probabilities. `distance = "glm"` with `link = "logit"` (logistic #' regression) is the default in `matchit()`. 
#' (This could previously be requested as `distance = "ps"`, which still works.)} #' \item{`"gam"`}{ #' The propensity scores are estimated using a generalized additive model. The #' `formula` supplied to `matchit()` is passed directly to #' \pkgfun{mgcv}{gam}, and \pkgfun{mgcv}{predict.gam} is used to compute the propensity #' scores. The `link` argument can be specified as a link function #' supplied to [binomial()], e.g., `"logit"`, which is the default. When #' `link` is prepended by `"linear."`, the linear predictor is used #' instead of the predicted probabilities. Note that unless the smoothing #' functions \pkgfun{mgcv}{s}, \pkgfun{mgcv}{te}, \pkgfun{mgcv}{ti}, or \pkgfun{mgcv}{t2} are #' used in `formula`, a generalized additive model is identical to a #' generalized linear model and will estimate the same propensity scores as #' `glm()`. See the documentation for \pkgfun{mgcv}{gam}, #' \pkgfun{mgcv}{formula.gam}, and \pkgfun{mgcv}{gam.models} for more information on #' how to specify these models. Also note that the formula returned in the #' `matchit()` output object will be a simplified version of the supplied #' formula with smoothing terms removed (but all named variables present). } #' \item{`"gbm"`}{ The propensity scores are estimated using a #' generalized boosted model. The `formula` supplied to `matchit()` #' is passed directly to \pkgfun{gbm}{gbm}, and \pkgfun{gbm}{predict.gbm} is used to #' compute the propensity scores. The optimal tree is chosen using 5-fold #' cross-validation by default, and this can be changed by supplying an #' argument to `method` within `distance.options`; see \pkgfun{gbm}{gbm.perf} #' for details. The `link` argument can be specified as `"linear"` to #' use the linear predictor instead of the predicted probabilities. No other #' links are allowed.
The tuning parameter defaults differ from #' `gbm::gbm()`; they are as follows: `n.trees = 1e4`, #' `interaction.depth = 3`, `shrinkage = .01`, `bag.fraction = 1`, `cv.folds = 5`, `keep.data = FALSE`. These are the same #' defaults as used in *WeightIt* and *twang*, except for #' `cv.folds` and `keep.data`. Note this is not the same use of #' generalized boosted modeling as in *twang*; here, the number of trees is #' chosen based on cross-validation or out-of-bag error, rather than based on #' optimizing balance. \pkg{twang} should not be cited when using this method #' to estimate propensity scores. Note that because there is a random component to choosing the tuning #' parameter, results will vary across runs unless a [seed][set.seed] is #' set.} #' \item{`"lasso"`, `"ridge"`, `"elasticnet"`}{ #' The propensity #' scores are estimated using a lasso, ridge, or elastic net model, #' respectively. The `formula` supplied to `matchit()` is processed #' with [model.matrix()] and passed to \pkgfun{glmnet}{cv.glmnet}, and #' \pkgfun{glmnet}{predict.cv.glmnet} is used to compute the propensity scores. The #' `link` argument can be specified as a link function supplied to #' [binomial()], e.g., `"logit"`, which is the default. When `link` #' is prepended by `"linear."`, the linear predictor is used instead of #' the predicted probabilities. When `link = "log"`, a Poisson model is #' used. For `distance = "elasticnet"`, the `alpha` argument, which #' controls how to prioritize the lasso and ridge penalties in the elastic net, #' is set to .5 by default and can be changed by supplying an argument to #' `alpha` in `distance.options`. For `"lasso"` and #' `"ridge"`, `alpha` is set to 1 and 0, respectively, and cannot be #' changed. The `cv.glmnet()` defaults are used to select the tuning #' parameters and generate predictions and can be modified using #' `distance.options`. If the `s` argument is passed to #' `distance.options`, it will be passed to `predict.cv.glmnet()`. 
#' Note that because there is a random component to choosing the tuning #' parameter, results will vary across runs unless a [seed][set.seed] is #' set. } #' \item{`"rpart"`}{ The propensity scores are estimated using a #' classification tree. The `formula` supplied to `matchit()` is #' passed directly to \pkgfun{rpart}{rpart}, and \pkgfun{rpart}{predict.rpart} is used #' to compute the propensity scores. The `link` argument is ignored, and #' predicted probabilities are always returned as the distance measure. } #' \item{`"randomforest"`}{ The propensity scores are estimated using a #' random forest. The `formula` supplied to `matchit()` is passed #' directly to \pkgfun{randomForest}{randomForest}, and #' \pkgfun{randomForest}{predict.randomForest} is used to compute the propensity #' scores. The `link` argument is ignored, and predicted probabilities are #' always returned as the distance measure. Note that because there is a random component, results will vary across runs unless a [seed][set.seed] is #' set. } #' \item{`"nnet"`}{ The #' propensity scores are estimated using a single-hidden-layer neural network. #' The `formula` supplied to `matchit()` is passed directly to #' \pkgfun{nnet}{nnet}, and [fitted()] is used to compute the propensity scores. #' The `link` argument is ignored, and predicted probabilities are always #' returned as the distance measure. An argument to `size` must be #' supplied to `distance.options` when using `distance = "nnet"`. } #' \item{`"cbps"`}{ The propensity scores are estimated using the #' covariate balancing propensity score (CBPS) algorithm, which is a form of #' logistic regression in which balance constraints are incorporated into a #' generalized method of moments estimation of the model coefficients. The #' `formula` supplied to `matchit()` is passed directly to #' \pkgfun{CBPS}{CBPS}, and [fitted()] is used to compute the propensity #' scores. 
The `link` argument can be specified as `"linear"` to use #' the linear predictor instead of the predicted probabilities. No other links #' are allowed. The `estimand` argument supplied to `matchit()` will #' be used to select the appropriate estimand for use in defining the balance #' constraints, so no argument needs to be supplied to `ATT` in #' `CBPS`. } #' \item{`"bart"`}{ The propensity scores are estimated #' using Bayesian additive regression trees (BART). The `formula` supplied #' to `matchit()` is passed directly to \pkgfun{dbarts}{bart2}, #' and \pkgfun{dbarts}{fitted.bart} is used to compute the propensity #' scores. The `link` argument can be specified as `"linear"` to use #' the linear predictor instead of the predicted probabilities. When #' `s.weights` is supplied to `matchit()`, it will not be passed to #' `bart2` because the `weights` argument in `bart2` does not #' correspond to sampling weights. Note that because the model fitting has a random #' component, results will vary across runs unless the `seed` argument is supplied to `distance.options`. Note that setting a seed using [set.seed()] is not sufficient to guarantee reproducibility unless single-threading is used. See \pkgfun{dbarts}{bart2} for details.} #' } #' #' ## Methods for computing distances from covariates #' #' The following methods involve computing a distance matrix from the covariates #' themselves without estimating a propensity score. Calipers on the distance #' measure and common support restrictions cannot be used, and the `distance` #' component of the output object will be empty because no propensity scores are #' estimated. The `link` and `distance.options` arguments are ignored with these #' methods. See the individual matching methods pages for whether these #' distances are allowed and how they are used. Each of these distance measures #' can also be calculated outside `matchit()` using its [corresponding function][euclidean_dist]. 
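As a minimal base-R sketch of the kind of treated-by-control distance matrix these methods produce (toy data; all object names here are hypothetical and not part of MatchIt):

```r
# Toy covariates: 3 treated units and 4 control units (hypothetical data)
X_treated <- matrix(c(1, 2,  3, 4,  5, 6), nrow = 3, byrow = TRUE)
X_control <- matrix(c(1, 1,  2, 2,  3, 3,  4, 4), nrow = 4, byrow = TRUE)

# Pairwise Euclidean distances: one row per treated unit and one
# column per control unit -- the rectangular shape matchit() accepts
dist_mat <- outer(seq_len(nrow(X_treated)), seq_len(nrow(X_control)),
                  Vectorize(function(i, j)
                    sqrt(sum((X_treated[i, ] - X_control[j, ]) ^ 2))))
dim(dist_mat)  # 3 4
```

A matrix like this could then be supplied directly to the `distance` argument, as described under "Distances supplied as a numeric vector or matrix" below.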
#' #' \describe{ #' \item{`"euclidean"`}{ The Euclidean distance is the raw #' distance between units, computed as \deqn{d_{ij} = \sqrt{(x_i - x_j)(x_i - x_j)'}} It is sensitive to the scale of the covariates, so covariates with #' larger scales will take higher priority. } #' \item{`"scaled_euclidean"`}{ #' The scaled Euclidean distance is the #' Euclidean distance computed on the scaled (i.e., standardized) covariates. #' This ensures the covariates are on the same scale. The covariates are #' standardized using the pooled within-group standard deviations, computed by #' treatment group-mean centering each covariate before computing the standard #' deviation in the full sample. #' } #' \item{`"mahalanobis"`}{ The Mahalanobis distance is computed as \deqn{d_{ij} = \sqrt{(x_i - x_j)\Sigma^{-1}(x_i - x_j)'}} where \eqn{\Sigma} is the pooled within-group #' covariance matrix of the covariates, computed by treatment group-mean #' centering each covariate before computing the covariance in the full sample. #' This ensures the variables are on the same scale and accounts for the #' correlation between covariates. } #' \item{`"robust_mahalanobis"`}{ The #' robust rank-based Mahalanobis distance is the Mahalanobis distance computed #' on the ranks of the covariates with an adjustment for ties. It is described #' in Rosenbaum (2010, ch. 8) as an alternative to the Mahalanobis distance #' that handles outliers and rare categories better than the standard #' Mahalanobis distance but is not affinely invariant. } #' } #' #' To perform Mahalanobis distance matching *and* estimate propensity scores to #' be used for a purpose other than matching, the `mahvars` argument should be #' used along with a different specification to `distance`. See the individual #' matching method pages for details on how to use `mahvars`. 
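The pooled within-group covariance described for `"mahalanobis"` can be formed in base R by group-mean centering each covariate before taking the covariance in the full sample. A minimal sketch under simulated data (all names hypothetical); note that `stats::mahalanobis()` returns the *squared* distance, so a square root is needed to match \eqn{d_{ij}}:

```r
set.seed(123)
# Simulated data: binary treatment and two covariates on different scales
treat <- rep(0:1, each = 50)
X <- cbind(x1 = rnorm(100, mean = treat), x2 = rnorm(100, sd = 10))

# Group-mean center each covariate, then take the covariance in the
# full sample to get the pooled within-group covariance matrix
Xc <- apply(X, 2, function(x) x - ave(x, treat))
Sigma <- cov(Xc)

# Mahalanobis distance between treated unit 51 and control unit 1
d_51_1 <- sqrt(mahalanobis(X[51, ], center = X[1, ], cov = Sigma))
```

Because the covariance matrix rescales each covariate, `x2`'s much larger variance does not dominate the distance the way it would with `distance = "euclidean"`.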
#' #' ## Distances supplied as a numeric vector or matrix #' #' `distance` can also be supplied as a numeric vector whose values will be #' taken to function like propensity scores; their pairwise difference will #' define the distance between units. This might be useful for supplying #' propensity scores computed outside `matchit()` or resupplying `matchit()` #' with propensity scores estimated previously without having to recompute them. #' #' `distance` can also be supplied as a matrix whose values represent the #' pairwise distances between units. The matrix should either be square, with #' a row and column for each unit (e.g., as the output of a call to #' `as.matrix(`[`dist`]`(.))`), or have as many rows as there are treated units #' and as many columns as there are control units (e.g., as the output of a call #' to [mahalanobis_dist()] or \pkgfun{optmatch}{match_on}). Distance values of #' `Inf` will disallow the corresponding units to be matched. When `distance` is #' supplied as a numeric vector or matrix, `link` and `distance.options` are #' ignored. #' #' @note In versions of *MatchIt* prior to 4.0.0, `distance` was specified in a #' slightly different way. When specifying arguments using the old syntax, they #' will automatically be converted to the corresponding method in the new syntax #' but a warning will be thrown. `distance = "logit"`, the old default, will #' still work in the new syntax, though `distance = "glm", link = "logit"` is #' preferred (note that these are the default settings and don't need to be made #' explicit). 
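A base-R sketch of precomputing such a propensity score vector outside `matchit()` (simulated data; all names hypothetical). It also shows that `qlogis()` of the predicted probabilities recovers the linear predictor, which is what the `"linear."` prefix on `link` returns:

```r
set.seed(42)
# Simulated data for an externally fit propensity score model
d <- data.frame(treat = rbinom(200, 1, 0.5),
                x1 = rnorm(200), x2 = rnorm(200))
fit <- glm(treat ~ x1 + x2, data = d, family = binomial("logit"))

p.score <- fitted(fit)                  # could be passed as distance = p.score
lin     <- predict(fit, type = "link")  # what link = "linear.logit" would use

all.equal(unname(qlogis(p.score)), unname(lin))  # TRUE
```

Supplying a vector like `p.score` skips re-estimation inside `matchit()`, which is the use case described in the paragraph above.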
#' #' @examples #' data("lalonde") #' #' # Linearized probit regression PS: #' m.out1 <- matchit(treat ~ age + educ + race + married + #' nodegree + re74 + re75, #' data = lalonde, #' distance = "glm", #' link = "linear.probit") #' @examplesIf requireNamespace("mgcv", quietly = TRUE) #' # GAM logistic PS with smoothing splines (s()): #' m.out2 <- matchit(treat ~ s(age) + s(educ) + #' race + married + #' nodegree + re74 + re75, #' data = lalonde, #' distance = "gam") #' summary(m.out2$model) #' @examplesIf requireNamespace("CBPS", quietly = TRUE) #' # CBPS for ATC matching w/replacement, using the just- #' # identified version of CBPS (setting method = "exact"): #' m.out3 <- matchit(treat ~ age + educ + race + married + #' nodegree + re74 + re75, #' data = lalonde, #' distance = "cbps", #' estimand = "ATC", #' distance.options = list(method = "exact"), #' replace = TRUE) #' @examples #' # Mahalanobis distance matching - no PS estimated #' m.out4 <- matchit(treat ~ age + educ + race + married + #' nodegree + re74 + re75, #' data = lalonde, #' distance = "mahalanobis") #' #' m.out4$distance #NULL #' #' # Mahalanobis distance matching with PS estimated #' # for use in a caliper; matching done on mahvars #' m.out5 <- matchit(treat ~ age + educ + race + married + #' nodegree + re74 + re75, #' data = lalonde, #' distance = "glm", #' caliper = .1, #' mahvars = ~ age + educ + race + married + #' nodegree + re74 + re75) #' #' summary(m.out5) #' #' # User-supplied propensity scores #' p.score <- fitted(glm(treat ~ age + educ + race + married + #' nodegree + re74 + re75, #' data = lalonde, #' family = binomial)) #' #' m.out6 <- matchit(treat ~ age + educ + race + married + #' nodegree + re74 + re75, #' data = lalonde, #' distance = p.score) #' #' # User-supplied distance matrix using robust_mahalanobis_dist() #' dist_mat <- robust_mahalanobis_dist( #' treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde) #' #' m.out7 <- matchit(treat ~ age + educ + race 
+ nodegree + #' married + re74 + re75, #' data = lalonde, #' distance = dist_mat) NULL #distance2glm----------------- distance2glm <- function(formula, data = NULL, link = "logit", ...) { linear <- is_not_null(link) && startsWith(as.character(link), "linear") if (linear) link <- sub("linear.", "", as.character(link), fixed = TRUE) args <- unique(c(names(formals(glm)), names(formals(glm.control)))) A <- ...mget(args) A[lengths(A) == 0L] <- NULL A[["data"]] <- data A[["formula"]] <- formula A[["family"]] <- { if (is_null(A[["weights"]])) binomial(link = link) else quasibinomial(link = link) } res <- do.call("glm", A) pred <- predict(res, type = if (linear) "link" else "response") list(model = res, distance = pred) } #distance2gam----------------- distance2gam <- function(formula, data = NULL, link = "logit", ...) { rlang::check_installed("mgcv") linear <- is_not_null(link) && startsWith(as.character(link), "linear") if (linear) link <- sub("linear.", "", as.character(link), fixed = TRUE) A <- list(...) weights <- A$weights A$weights <- NULL res <- do.call(mgcv::gam, c(list(formula, data, family = quasibinomial(link), weights = weights), A), quote = TRUE) pred <- predict(res, type = if (linear) "link" else "response") list(model = res, distance = as.numeric(pred)) } #distance2rpart----------------- distance2rpart <- function(formula, data = NULL, link = NULL, ...) { rlang::check_installed("rpart") args <- unique(c(names(formals(rpart::rpart)), names(formals(rpart::rpart.control)))) A <- ...mget(args) A[lengths(A) == 0L] <- NULL A$formula <- formula A$data <- data A$method <- "class" res <- do.call(rpart::rpart, A) list(model = res, distance = predict(res, type = "prob")[, "1"]) } #distance2nnet----------------- distance2nnet <- function(formula, data = NULL, link = NULL, ...) { rlang::check_installed("nnet") A <- list(...) 
weights <- A$weights A$weights <- NULL res <- do.call(nnet::nnet, c(list(formula, data = data, weights = weights, entropy = TRUE), A), quote = TRUE) list(model = res, distance = drop(fitted(res))) } #distance2cbps----------------- distance2cbps <- function(formula, data = NULL, link = NULL, ...) { rlang::check_installed("CBPS") linear <- is_not_null(link) && startsWith(as.character(link), "linear") A <- list(...) A[["standardized"]] <- FALSE if (is_null(A[["ATT"]])) { if (is_null(A[["estimand"]])) { A[["ATT"]] <- 1 } else { estimand <- toupper(A[["estimand"]]) estimand <- match_arg(estimand, c("ATT", "ATC", "ATE")) A[["ATT"]] <- switch(estimand, "ATT" = 1, "ATC" = 2, 0) } } if (is_null(A[["method"]])) { A[["method"]] <- if (isFALSE(A[["over"]])) "exact" else "over" } A[c("estimand", "over")] <- NULL if (is_not_null(A[["weights"]])) { A[["sample.weights"]] <- A[["weights"]] A[["weights"]] <- NULL } A[["formula"]] <- formula A[["data"]] <- data capture.output({ #Keeps from printing message about treatment res <- do.call(CBPS::CBPS, A, quote = TRUE) }) pred <- fitted(res) if (linear) pred <- qlogis(pred) list(model = res, distance = pred) } #distance2bart---------------- distance2bart <- function(formula, data = NULL, link = NULL, ...) { rlang::check_installed("dbarts") linear <- is_not_null(link) && startsWith(as.character(link), "linear") args <- unique(c(names(formals(dbarts::bart2)), names(formals(dbarts::dbartsControl)))) A <- ...mget(args) A[lengths(A) == 0L] <- NULL A$formula <- formula A$data <- data res <- do.call(dbarts::bart2, A) pred <- fitted(res, type = if (linear) "link" else "response") list(model = res, distance = pred) } # distance2bart <- function(formula, data, link = NULL, ...) 
{ # rlang::check_installed("BART") # # if (is_not_null(link) && startsWith(as.character(link), "linear")) { # linear <- TRUE # link <- sub("linear.", "", as.character(link), fixed = TRUE) # } # else linear <- FALSE # # #Keep link probit because default in matchit is logit but probit is much faster with BART # link <- "probit" # # # if (is_null(link)) link <- "probit" # # else if (!link %in% c("probit", "logit")) { # # stop("'link' must be \"probit\" or \"logit\" with distance = \"bart\".", call. = FALSE) # # } # # data <- model.frame(formula, data) # # treat <- binarize(data[[1]]) # X <- data[-1] # # chars <- vapply(X, is.character, logical(1L)) # X[chars] <- lapply(X[chars], factor) # # A <- list(...) # # if (is_not_null(A[["mc.cores"]]) && A[["mc.cores"]][1] > 1) fun <- BART::mc.gbart # else fun <- BART::gbart # # res <- do.call(fun, c(list(X, # y.train = treat, # type = switch(link, "logit" = "lbart", "pbart")), # A[intersect(names(A), setdiff(names(formals(fun)), # c("x.train", "y.train", "x.test", "type", "ntype")))])) # # pred <- res$prob.train.mean # if (linear) pred <- switch(link, logit = qlogis, probit = qnorm)(pred) # # return(list(model = res, distance = pred)) # } #distance2randomforest----------------- distance2randomforest <- function(formula, data = NULL, link = NULL, ...) { rlang::check_installed("randomForest") newdata <- get_all_vars(formula, data) treatvar <- as.character(formula[[2L]]) newdata[[treatvar]] <- factor(newdata[[treatvar]], levels = c("0", "1")) res <- randomForest::randomForest(formula, data = newdata, ...) list(model = res, distance = predict(res, type = "prob")[, "1"]) } #distance2glmnet-------------- distance2elasticnet <- function(formula, data = NULL, link = NULL, ...) 
{ rlang::check_installed("glmnet") linear <- is_not_null(link) && startsWith(as.character(link), "linear") if (linear) link <- sub("linear.", "", as.character(link), fixed = TRUE) s <- ...get("s") if (is_null(s)) { s <- "lambda.1se" } args <- unique(c(names(formals(glmnet::glmnet)), names(formals(glmnet::cv.glmnet)))) A <- ...mget(args) A[lengths(A) == 0L] <- NULL if (is_null(link)) link <- "logit" A$family <- switch(link, "logit" = "binomial", "log" = "poisson", binomial(link = link)) if (is_null(A[["alpha"]])) { A[["alpha"]] <- .5 } mf <- model.frame(formula, data = data) A$y <- model.response(mf) A$x <- model.matrix(update(formula, . ~ . + 1), mf)[, -1L, drop = FALSE] res <- do.call(glmnet::cv.glmnet, A) pred <- drop(predict(res, newx = A$x, s = s, type = if (linear) "link" else "response")) list(model = res, distance = pred) } distance2lasso <- function(formula, data = NULL, link = NULL, ...) { if ("alpha" %in% ...names()) { args <- unique(c("s", names(formals(glmnet::glmnet)), names(formals(glmnet::cv.glmnet)))) A <- ...mget(args) A[lengths(A) == 0L] <- NULL A$alpha <- 1 do.call("distance2elasticnet", c(list(formula, data = data, link = link), A), quote = TRUE) } else { distance2elasticnet(formula = formula, data = data, link = link, alpha = 1, ...) } } distance2ridge <- function(formula, data = NULL, link = NULL, ...) { if ("alpha" %in% ...names()) { args <- unique(c("s", names(formals(glmnet::glmnet)), names(formals(glmnet::cv.glmnet)))) A <- ...mget(args) A[lengths(A) == 0L] <- NULL A$alpha <- 0 do.call("distance2elasticnet", c(list(formula, data = data, link = link), A), quote = TRUE) } else { distance2elasticnet(formula = formula, data = data, link = link, alpha = 0, ...) } } #distance2gbm-------------- distance2gbm <- function(formula, data = NULL, link = NULL, ...) 
{ rlang::check_installed("gbm") linear <- is_not_null(link) && startsWith(as.character(link), "linear") method <- ...get("method") args <- names(formals(gbm::gbm)) A <- ...mget(args) A[lengths(A) == 0L] <- NULL A$formula <- formula A$data <- data A$distribution <- "bernoulli" if (is_null(A[["n.trees"]])) A[["n.trees"]] <- 1e4 if (is_null(A[["interaction.depth"]])) A[["interaction.depth"]] <- 3 if (is_null(A[["shrinkage"]])) A[["shrinkage"]] <- .01 if (is_null(A[["bag.fraction"]])) A[["bag.fraction"]] <- 1 if (is_null(A[["cv.folds"]])) A[["cv.folds"]] <- 5 if (is_null(A[["keep.data"]])) A[["keep.data"]] <- FALSE if (A[["cv.folds"]] <= 1 && A[["bag.fraction"]] == 1) { .err('either `bag.fraction` must be less than 1 or `cv.folds` must be greater than 1 when using `distance = "gbm"`') } if (is_null(method)) { if (A[["bag.fraction"]] < 1) method <- "OOB" else method <- "cv" } else if (!tolower(method) %in% c("oob", "cv")) { .err('`distance.options$method` should be one of "OOB" or "cv"') } res <- do.call(gbm::gbm, A) best.tree <- gbm::gbm.perf(res, plot.it = FALSE, method = method) pred <- drop(predict(res, newdata = data, n.trees = best.tree, type = if (linear) "link" else "response")) list(model = res, distance = pred) } MatchIt/R/lalonde.R0000644000176200001440000000345014375161317013557 0ustar liggesusers#' Data from National Supported Work Demonstration and PSID, as analyzed by #' Dehejia and Wahba (1999). #' #' This is a subsample of the data from the treated group in the National #' Supported Work Demonstration (NSW) and the comparison sample from the #' Population Survey of Income Dynamics (PSID). This data was previously #' analyzed extensively by Lalonde (1986) and Dehejia and Wahba (1999). #' #' #' @name lalonde #' @docType data #' @format A data frame with 614 observations (185 treated, 429 control). #' There are 9 variables measured for each individual. #' \itemize{ #' \item "treat" #' is the treatment assignment (1=treated, 0=control). 
#' \item "age" is age in years. #' \item "educ" is education in number of years of schooling. #' \item "race" is the individual's race/ethnicity (Black, Hispanic, or White). Note #' previous versions of this dataset used indicator variables `black` and #' `hispan` instead of a single race variable. #' \item "married" is an #' indicator for married (1=married, 0=not married). #' \item "nodegree" is an #' indicator for whether the individual has a high school degree (1=no degree, #' 0=degree). #' \item "re74" is income in 1974, in U.S. dollars. #' \item "re75" is #' income in 1975, in U.S. dollars. #' \item "re78" is income in 1978, in U.S. #' dollars. } #' #' "treat" is the treatment variable, "re78" is the outcome, and the #' others are pre-treatment covariates. #' #' @references Lalonde, R. (1986). Evaluating the econometric evaluations of #' training programs with experimental data. *American Economic Review* 76: #' 604-620. #' #' Dehejia, R.H. and Wahba, S. (1999). Causal Effects in Nonexperimental #' Studies: Re-Evaluating the Evaluation of Training Programs. *Journal of the #' American Statistical Association* 94: 1053-1062. #' @keywords datasets NULL MatchIt/R/matchit2optimal.R0000644000176200001440000004726214763323344015244 0ustar liggesusers#' Optimal Pair Matching #' @name method_optimal #' @aliases method_optimal #' @usage NULL #' #' @description #' In [matchit()], setting `method = "optimal"` performs optimal pair #' matching. The matching is optimal in the sense that the sum of the absolute #' pairwise distances in the matched sample is as small as possible. The method #' functionally relies on \pkgfun{optmatch}{fullmatch}. #' #' Advantages of optimal pair matching include that the matching order is not #' required to be specified and it is less likely that extreme within-pair #' distances will be large, unlike with nearest neighbor matching. 
Generally, #' however, as a subset selection method, optimal pair matching tends to #' perform similarly to nearest neighbor matching in that similar subsets of #' units will be selected to be matched. #' #' This page details the allowable arguments with `method = "optimal"`. #' See [matchit()] for an explanation of what each argument means in a general #' context and how it can be specified. #' #' Below is how `matchit()` is used for optimal pair matching: #' \preformatted{ #' matchit(formula, #' data = NULL, #' method = "optimal", #' distance = "glm", #' link = "logit", #' distance.options = list(), #' estimand = "ATT", #' exact = NULL, #' mahvars = NULL, #' antiexact = NULL, #' discard = "none", #' reestimate = FALSE, #' s.weights = NULL, #' ratio = 1, #' min.controls = NULL, #' max.controls = NULL, #' verbose = FALSE, #' ...) } #' #' @param formula a two-sided [formula] object containing the treatment and #' covariates to be used in creating the distance measure used in the matching. #' This formula will be supplied to the functions that estimate the distance #' measure. #' @param data a data frame containing the variables named in `formula`. #' If not found in `data`, the variables will be sought in the #' environment. #' @param method set here to `"optimal"`. #' @param distance the distance measure to be used. See [`distance`] #' for allowable options. Can be supplied as a distance matrix. #' @param link when `distance` is specified as a method of estimating #' propensity scores, an additional argument controlling the link function used #' in estimating the distance measure. See [`distance`] for allowable #' options with each option. #' @param distance.options a named list containing additional arguments #' supplied to the function that estimates the distance measure as determined #' by the argument to `distance`. #' @param estimand a string containing the desired estimand. Allowable options #' include `"ATT"` and `"ATC"`. See Details. 
#' @param exact for which variables exact matching should take place. #' @param mahvars for which variables Mahalanobis distance matching should take #' place when `distance` corresponds to a propensity score (e.g., for #' caliper matching or to discard units for common support). If specified, the #' distance measure will not be used in matching. #' @param antiexact for which variables anti-exact matching should take place. #' Anti-exact matching is processed using \pkgfun{optmatch}{antiExactMatch}. #' @param discard a string containing a method for discarding units outside a #' region of common support. Only allowed when `distance` is not #' `"mahalanobis"` and not a matrix. #' @param reestimate if `discard` is not `"none"`, whether to #' re-estimate the propensity score in the remaining sample prior to matching. #' @param s.weights the variable containing sampling weights to be incorporated #' into propensity score models and balance statistics. #' @param ratio how many control units should be matched to each treated unit #' for k:1 matching. For variable ratio matching, see section "Variable Ratio #' Matching" in Details below. #' @param min.controls,max.controls for variable ratio matching, the minimum #' and maximum number of control units to be matched to each treated unit. See #' section "Variable Ratio Matching" in Details below. #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. What is printed depends on the #' matching method. Default is `FALSE` for no printing other than #' warnings. #' @param \dots additional arguments passed to \pkgfun{optmatch}{fullmatch}. #' Allowed arguments include `tol` and `solver`. See the #' \pkgfun{optmatch}{fullmatch} documentation for details. In general, `tol` #' should be set to a low number (e.g., `1e-7`) to get a more precise #' solution (default is `1e-3`). #' #' The arguments `replace`, `caliper`, and `m.order` are ignored with a warning. 
#' #' @section Outputs: #' #' All outputs described in [matchit()] are returned with #' `method = "optimal"`. When `include.obj = TRUE` in the call to #' `matchit()`, the output of the call to `optmatch::fullmatch()` will be #' included in the output. When `exact` is specified, this will be a list #' of such objects, one for each stratum of the `exact` variables. #' #' @details #' #' ## Mahalanobis Distance Matching #' #' Mahalanobis distance matching can be done in one of two ways: #' \enumerate{ #' \item{If no propensity score needs to be estimated, `distance` should be #' set to `"mahalanobis"`, and Mahalanobis distance matching will occur #' using all the variables in `formula`. Arguments to `discard` and #' `mahvars` will be ignored. For example, to perform simple Mahalanobis #' distance matching, the following could be run: #' #' \preformatted{ #' matchit(treat ~ X1 + X2, method = "optimal", #' distance = "mahalanobis") } #' #' With this code, the Mahalanobis distance is computed using `X1` and #' `X2`, and matching occurs on this distance. The `distance` #' component of the `matchit()` output will be empty. #' } #' \item{If a propensity score needs to be estimated for common support with #' `discard`, `distance` should be whatever method is used to #' estimate the propensity score or a vector of distance measures, i.e., it #' should not be `"mahalanobis"`. Use `mahvars` to specify the #' variables used to create the Mahalanobis distance. For example, to perform #' Mahalanobis distance matching after discarding units outside the common support of the #' propensity score in both groups, the following could be run: #' #' \preformatted{ #' matchit(treat ~ X1 + X2 + X3, method = "optimal", #' distance = "glm", discard = "both", #' mahvars = ~ X1 + X2) } #' #' With this code, `X1`, `X2`, and `X3` are used to estimate the #' propensity score (using the `"glm"` method, which by default is #' logistic regression), which is used to identify the common support. 
The #' actual matching occurs on the Mahalanobis distance computed only using #' `X1` and `X2`, which are supplied to `mahvars`. The estimated #' propensity scores will be included in the `distance` component of the #' `matchit()` output. #' } #' } #' ## Estimand #' #' The `estimand` argument controls whether control units are selected to be matched with treated units #' (`estimand = "ATT"`) or treated units are selected to be matched with #' control units (`estimand = "ATC"`). The "focal" group (e.g., the #' treated units for the ATT) is typically made to be the smaller treatment #' group, and a warning will be thrown if it is not set that way. Setting `estimand = "ATC"` is equivalent to #' swapping all treated and control labels for the treatment variable. When #' `estimand = "ATC"`, the `match.matrix` component of the output #' will have the names of the control units as the rownames and be filled with #' the names of the matched treated units (opposite to when `estimand = "ATT"`). Note that the argument supplied to `estimand` doesn't #' necessarily correspond to the estimand actually targeted; it is merely a #' switch to trigger which treatment group is considered "focal". #' #' ## Variable Ratio Matching #' #' `matchit()` can perform variable #' ratio matching, which involves matching a different number of control units #' to each treated unit. When `ratio > 1`, rather than requiring all #' treated units to receive `ratio` matches, the arguments to #' `max.controls` and `min.controls` can be specified to control the #' maximum and minimum number of matches each treated unit can have. #' `ratio` controls how many total control units will be matched: `n1 * ratio` control #' units will be matched, where `n1` is the number of #' treated units, yielding the same total number of matched controls as fixed #' ratio matching does. #' #' Variable ratio matching can be used with any `distance` specification. 
#' `ratio` does not have to be an integer but must be greater than 1 and #' less than `n0/n1`, where `n0` and `n1` are the number of #' control and treated units, respectively. Setting `ratio = n0/n1` #' performs a restricted form of full matching where all control units are #' matched. If `min.controls` is not specified, it is set to 1 by default. #' `min.controls` must be less than `ratio`, and `max.controls` #' must be greater than `ratio`. See the Examples section of #' [method_nearest()] for an example of their use, which is the same #' as it is with optimal matching. #' #' @note #' Optimal pair matching is a restricted form of optimal full matching #' where the number of treated units in each subclass is equal to 1, whereas in #' unrestricted full matching, multiple treated units can be assigned to the #' same subclass. \pkgfun{optmatch}{pairmatch} is simply a wrapper for #' \pkgfun{optmatch}{fullmatch}, which performs optimal full matching and is the #' workhorse for [`method_full`]. In the same way, `matchit()` #' uses `optmatch::fullmatch()` under the hood, imposing the restrictions that #' make optimal full matching function like optimal pair matching (which is #' simply to set `min.controls >= 1` and to pass `ratio` to the #' `mean.controls` argument). This distinction is not important for #' regular use but may be of interest to those examining the source code. #' #' The option `"optmatch_max_problem_size"` is automatically set to #' `Inf` during the matching process, different from its default in #' *optmatch*. This enables matching problems of any size to be run, but #' may also let huge, infeasible problems get through and potentially take a #' long time or crash R. See \pkgfun{optmatch}{setMaxProblemSize} for more details. #' #' A preprocessing algorithm described by Sävje (2020; \doi{10.1214/19-STS739}) is used to improve the speed of the matching when performing 1:1 matching on a propensity score. 
It does so by adding an additional constraint that guarantees a solution as optimal as the solution that would have been found without the constraint, and that constraint often dramatically reduces the size of the matching problem at no cost. However, this may introduce differences between the results obtained by *MatchIt* and by *optmatch*, though such differences will shrink when smaller values of `tol` are used. #' #' @seealso [matchit()] for a detailed explanation of the inputs and outputs of #' a call to `matchit()`. #' #' \pkgfun{optmatch}{fullmatch}, which is the workhorse. #' #' [`method_full`] for optimal full matching, of which optimal pair #' matching is a special case, and which relies on similar machinery. #' #' @references In a manuscript, be sure to cite the following paper if using #' `matchit()` with `method = "optimal"`: #' #' Hansen, B. B., & Klopfer, S. O. (2006). Optimal Full Matching and Related #' Designs via Network Flows. *Journal of Computational and Graphical #' Statistics*, 15(3), 609–627. 
\doi{10.1198/106186006X137047} #' #' For example, a sentence might read: #' #' *Optimal pair matching was performed using the MatchIt package (Ho, #' Imai, King, & Stuart, 2011) in R, which calls functions from the optmatch #' package (Hansen & Klopfer, 2006).* #' #' @examplesIf requireNamespace("optmatch", quietly = TRUE) #' data("lalonde") #' #' #1:1 optimal PS matching with exact matching on race #' m.out1 <- matchit(treat ~ age + educ + race + #' nodegree + married + re74 + re75, #' data = lalonde, #' method = "optimal", #' exact = ~race) #' m.out1 #' summary(m.out1) #' #' #2:1 optimal matching on the scaled Euclidean distance #' m.out2 <- matchit(treat ~ age + educ + race + #' nodegree + married + re74 + re75, #' data = lalonde, #' method = "optimal", #' ratio = 2, #' distance = "scaled_euclidean") #' m.out2 #' summary(m.out2, un = FALSE) NULL matchit2optimal <- function(treat, formula, data, distance, discarded, ratio = 1, s.weights = NULL, caliper = NULL, mahvars = NULL, exact = NULL, estimand = "ATT", verbose = FALSE, is.full.mahalanobis, antiexact = NULL, ...) 
{ rlang::check_installed("optmatch") .args <- c("tol", "solver") A <- ...mget(.args) A[lengths(A) == 0L] <- NULL estimand <- toupper(estimand) estimand <- match_arg(estimand, c("ATT", "ATC")) if (estimand == "ATC") { tc <- c("control", "treated") focal <- 0 } else { tc <- c("treated", "control") focal <- 1 } treat_ <- setNames(as.integer(treat[!discarded] == focal), names(treat)[!discarded]) # if (is_not_null(data)) data <- data[!discarded,] if (is.full.mahalanobis) { if (is_null(attr(terms(formula, data = data), "term.labels"))) { .err(sprintf("covariates must be specified in the input formula when `distance = \"%s\"`", attr(is.full.mahalanobis, "transform"))) } mahvars <- formula } if (is_not_null(caliper)) { .wrn("calipers are currently not compatible with `method = \"optimal\"` and will be ignored") caliper <- NULL } min.controls <- attr(ratio, "min.controls") max.controls <- attr(ratio, "max.controls") if (is_null(max.controls)) { min.controls <- max.controls <- ratio } #Exact matching strata if (is_not_null(exact)) { ex <- factor(exactify(model.frame(exact, data = data), sep = ", ", include_vars = TRUE)[!discarded]) cc <- Reduce("intersect", lapply(unique(treat_), function(t) unclass(ex)[treat_ == t])) if (is_null(cc)) { .err("no matches were found") } e_ratios <- vapply(levels(ex), function(e) { sum(treat_[ex == e] == 0) / sum(treat_[ex == e] == 1) }, numeric(1L)) if (any(e_ratios < 1)) { .wrn(sprintf("fewer %s units than %s units in some `exact` strata; not all %s units will get a match", tc[2L], tc[1L], tc[1L])) } if (ratio > 1 && any(e_ratios < ratio)) { if (ratio == max.controls) .wrn(sprintf("not all %s units will get %s matches", tc[1L], ratio)) else .wrn(sprintf("not enough %s units for an average of %s matches per %s unit in all `exact` strata", tc[2L], ratio, tc[1L])) } } else { ex <- gl(1, length(treat_), labels = "_") cc <- 1 e_ratios <- setNames(sum(treat_ == 0) / sum(treat_ == 1), levels(ex)) if (e_ratios < 1) { .wrn(sprintf("fewer %s units than 
%s units; not all %s units will get a match", tc[2L], tc[1L], tc[1L])) } else if (e_ratios < ratio) { if (ratio == max.controls) .wrn(sprintf("not all %s units will get %s matches", tc[1L], ratio)) else .wrn(sprintf("not enough %s units for an average of %s matches per %s unit", tc[2L], ratio, tc[1L])) } } #Create distance matrix; note that Mahalanobis distance is computed using entire #sample (minus discarded), like matchit2nearest, as opposed to within exact strata, like optmatch. if (is_not_null(mahvars)) { transform <- if (is.full.mahalanobis) attr(is.full.mahalanobis, "transform") else "mahalanobis" mahcovs <- transform_covariates(mahvars, data = data, method = transform, s.weights = s.weights, treat = treat, discarded = discarded) mo <- eucdist_internal(mahcovs, treat) } else if (is.matrix(distance)) { mo <- distance } else { mo <- eucdist_internal(setNames(as.numeric(distance), names(treat)), treat) } #Transpose distance mat as needed if (focal == 0) { mo <- t(mo) } #Remove discarded units from distance mat mo <- mo[!discarded[treat == focal], !discarded[treat != focal], drop = FALSE] dimnames(mo) <- list(names(treat_)[treat_ == 1], names(treat_)[treat_ == 0]) mo <- optmatch::match_on(mo, data = as.data.frame(data)[!discarded, , drop = FALSE]) mo <- optmatch::as.InfinitySparseMatrix(mo) if (is_null(mahvars) && !is.matrix(distance) && nlevels(ex) == 1L && ratio == 1 && max.controls == 1) { .cat_verbose("Preprocessing to reduce problem size...\n", verbose = verbose) #Preprocess by pruning unnecessary edges as in Sävje (2020) https://doi.org/10.1214/19-STS699 keep <- preprocess_matchC(treat_, distance[!discarded]) if (length(keep) < length(mo)) { mo <- .subset_infsm(mo, keep) } } .cat_verbose("Optimal matching...\n", verbose = verbose) #Process antiexact if (is_not_null(antiexact)) { antiexactcovs <- model.frame(antiexact, data) for (i in seq_len(ncol(antiexactcovs))) { mo <- mo + optmatch::antiExactMatch(antiexactcovs[[i]][!discarded], z = treat_) } } #Initialize
pair membership; must include names pair <- rep_with(NA_character_, treat) p <- setNames(vector("list", nlevels(ex)), levels(ex)) t_df <- data.frame(treat_) for (e in levels(ex)[cc]) { if (nlevels(ex) > 1L) { .cat_verbose(sprintf("Matching subgroup %s/%s: %s...\n", match(e, levels(ex)[cc]), length(cc), e), verbose = verbose) mo_ <- mo[ex[treat_ == 1] == e, ex[treat_ == 0] == e] } else { mo_ <- mo } if (any(dim(mo_) == 0) || !any(is.finite(mo_))) { next } if (all_equal_to(dim(mo_), 1) && all(is.finite(mo_))) { pair[ex == e] <- paste(1, e, sep = "|") next } #Process ratio, etc., when available ratio in exact matching categories #(e_ratio) differs from requested ratio if (e_ratios[e] < 1) { #Switch treatment and control labels; unmatched treated units are dropped ratio_ <- min.controls_ <- max.controls_ <- 1 mo_ <- t(mo_) } else if (e_ratios[e] < ratio) { #Lower ratio and min.controls. ratio_ <- e_ratios[e] min.controls_ <- min(min.controls, floor(e_ratios[e])) max.controls_ <- max.controls } else { ratio_ <- ratio min.controls_ <- min.controls max.controls_ <- max.controls } A$x <- mo_ A$mean.controls <- ratio_ A$min.controls <- min.controls_ A$max.controls <- max.controls_ A$data <- t_df[ex == e, , drop = FALSE] #just to get rownames; not actually used in matching rlang::with_options({ matchit_try({ p[[e]] <- do.call(optmatch::fullmatch, A) }, from = "optmatch") }, optmatch_max_problem_size = Inf) pair[names(p[[e]])[!is.na(p[[e]])]] <- paste(as.character(p[[e]][!is.na(p[[e]])]), e, sep = "|") } if (length(p) == 1L) { p <- p[[1L]] } psclass <- factor(pair) levels(psclass) <- seq_len(nlevels(psclass)) names(psclass) <- names(treat) mm <- nummm2charmm(subclass2mmC(psclass, treat, focal), treat) .cat_verbose("Calculating matching weights...", verbose = verbose) ## calculate weights and return the results res <- list(match.matrix = mm, subclass = psclass, weights = get_weights_from_subclass(psclass, treat, estimand), obj = p) .cat_verbose("Done.\n", verbose = verbose) 
class(res) <- "matchit" res } #Subset InfinitySparseMatrix using vector indices .subset_infsm <- function(y, ss) { y@.Data <- y[ss] y@cols <- y@cols[ss] y@rows <- y@rows[ss] y }

# ---- MatchIt/R/utils.R ----

#Function to turn a vector into a string with "," and "and" or "or" for clean messages. 'and.or' #controls whether words are separated by "and" or "or"; 'is.are' controls whether the list is #followed by "is" or "are" (to avoid manually figuring out if plural); quotes controls whether #quotes should be placed around words in string. From WeightIt. word_list <- function(word.list = NULL, and.or = "and", is.are = FALSE, quotes = FALSE) { #When given a vector of strings, creates a string of the form "a and b" #or "a, b, and c" #If is.are, adds "is" or "are" appropriately word.list <- setdiff(word.list, c(NA_character_, "")) if (is_null(word.list)) { out <- "" attr(out, "plural") <- FALSE return(out) } word.list <- add_quotes(word.list, quotes) L <- length(word.list) if (L == 1L) { out <- word.list if (is.are) out <- paste(out, "is") attr(out, "plural") <- FALSE return(out) } if (is_null(and.or) || isFALSE(and.or)) { out <- toString(word.list) } else { and.or <- match_arg(and.or, c("and", "or")) if (L == 2L) { out <- sprintf("%s %s %s", word.list[1L], and.or, word.list[2L]) } else { out <- sprintf("%s, %s %s", toString(word.list[-L]), and.or, word.list[L]) } } if (is.are) out <- sprintf("%s are", out) attr(out, "plural") <- TRUE out } #Add quotes to a string add_quotes <- function(x, quotes = 2L) { if (isFALSE(quotes)) { return(x) } if (isTRUE(quotes)) { quotes <- '"' } if (chk::vld_string(quotes)) { return(paste0(quotes, x, str_rev(quotes))) } if (!chk::vld_count(quotes) || quotes > 2L) { stop("`quotes` must be boolean, 1, 2, or a string.") } if (quotes == 0L) { return(x) } x <- { if (quotes == 1L) sprintf("'%s'", x) else sprintf('"%s"', x) } x } str_rev <- function(x) { vapply(lapply(strsplit(x, NULL), rev),
paste, character(1L), collapse = "") } #More informative and cleaner version of base::match.arg(). Uses chk. match_arg <- function(arg, choices, several.ok = FALSE) { #Replaces match.arg() but gives cleaner error message and processing #of arg. if (missing(arg)) { stop("No argument was supplied to match_arg.") } arg.name <- deparse1(substitute(arg), width.cutoff = 500L) if (missing(choices)) { sysP <- sys.parent() formal.args <- formals(sys.function(sysP)) choices <- eval(formal.args[[as.character(substitute(arg))]], envir = sys.frame(sysP)) } if (is_null(arg)) { return(choices[1L]) } if (several.ok) { chk::chk_character(arg, x_name = add_quotes(arg.name, "`")) } else { chk::chk_string(arg, x_name = add_quotes(arg.name, "`")) if (identical(arg, choices)) { return(arg[1L]) } } i <- pmatch(arg, choices, nomatch = 0L, duplicates.ok = TRUE) if (all(i == 0L)) .err(sprintf("the argument to `%s` should be %s%s", arg.name, ngettext(length(choices), "", if (several.ok) "at least one of " else "one of "), word_list(choices, and.or = "or", quotes = 2L))) i <- i[i > 0L] choices[i] } # Version of interaction(., drop = TRUE) that doesn't succumb to vector limit reached by # avoiding Cartesian expansion. Falls back to interaction() for small problems. interaction2 <- function(..., sep = ".", lex.order = TRUE) { narg <- ...length() if (narg == 0L) { stop("No factors specified") } if (narg == 1L && is.list(..1)) { args <- ..1 narg <- length(args) } else { args <- list(...) } for (i in seq_len(narg)) { args[[i]] <- as.factor(args[[i]]) } if (do.call("prod", lapply(args, nlevels)) <= 1e6) { return(interaction(args, drop = TRUE, sep = sep, lex.order = if (is.null(lex.order)) TRUE else lex.order)) } out <- do.call(function(...) 
paste(..., sep = sep), args) args_char <- lapply(args, function(x) { x <- unclass(x) formatC(x, format = "d", flag = "0", width = ceiling(log10(max(x)))) }) lev <- { if (is.null(lex.order)) unique(out) else if (lex.order) unique(out[order(do.call("paste", c(args_char, list(sep = sep))))]) else unique(out[order(do.call("paste", c(rev(args_char), list(sep = sep))))]) } factor(out, levels = lev) } #Turn a vector into a 0/1 vector. 'zero' and 'one' can be supplied to make it clear which is #which; otherwise, a guess is used. From WeightIt. binarize <- function(variable, zero = NULL, one = NULL) { var.name <- deparse1(substitute(variable)) if (is.character(variable) || is.factor(variable)) { variable <- factor(variable, nmax = if (is.factor(variable)) nlevels(variable) else NA) unique.vals <- levels(variable) } else { unique.vals <- unique(variable) } if (length(unique.vals) == 1L) { return(rep_with(1L, variable)) } if (length(unique.vals) != 2L) { .err(sprintf("cannot binarize %s: more than two levels", var.name)) } if (is_not_null(zero)) { if (!zero %in% unique.vals) { .err(sprintf("the argument to `zero` is not the name of a level of %s", var.name)) } return(setNames(as.integer(variable != zero), names(variable))) } if (is_not_null(one)) { if (!one %in% unique.vals) { .err(sprintf("the argument to `one` is not the name of a level of %s", var.name)) } return(setNames(as.integer(variable == one), names(variable))) } if (is.logical(variable)) { return(setNames(as.integer(variable), names(variable))) } if (is.numeric(variable)) { zero <- { if (any(unique.vals == 0)) 0 else min(unique.vals, na.rm = TRUE) } return(setNames(as.integer(variable != zero), names(variable))) } variable.numeric <- { if (can_str2num(unique.vals)) setNames(str2num(unique.vals), unique.vals)[variable] else as.numeric(factor(variable, levels = unique.vals)) } zero <- { if (0 %in% variable.numeric) 0 else min(variable.numeric, na.rm = TRUE) } setNames(as.integer(variable.numeric != zero), 
names(variable)) } is_null <- function(x) length(x) == 0L is_not_null <- function(x) !is_null(x) null_or_error <- function(x) {is_null(x) || inherits(x, "try-error")} #Determine whether a character vector can be coerced to numeric can_str2num <- function(x) { if (is.numeric(x) || is.logical(x)) { return(TRUE) } nas <- is.na(x) x_num <- suppressWarnings(as.numeric(as.character(x[!nas]))) !anyNA(x_num) } #Cleanly coerces a character vector to numeric; best to use after can_str2num() str2num <- function(x) { nas <- is.na(x) if (!is.numeric(x) && !is.logical(x)) x <- as.character(x) x_num <- suppressWarnings(as.numeric(x)) is.na(x_num)[nas] <- TRUE x_num } #Capitalize first letter of string firstup <- function(x) { substr(x, 1L, 1L) <- toupper(substr(x, 1L, 1L)) x } #Capitalize first letter of each word capwords <- function(s, strict = FALSE) { cap <- function(s) paste0(toupper(substring(s, 1L, 1L)), {s <- substring(s, 2L) if (strict) tolower(s) else s}, collapse = " ") sapply(strsplit(s, split = " ", fixed = TRUE), cap, USE.NAMES = is_not_null(names(s))) } #Reverse a string str_rev <- function(x) { vapply(lapply(strsplit(x, NULL), rev), paste, character(1L), collapse = "") } #Clean printing of data frames with numeric and NA elements. 
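#For instance (an editor's illustrative aside, not executed by the package),
#calling round_df_char(data.frame(x = c(1.2345, NA, -0.001)), digits = 2)
#rounds the numeric column (1.2345 becomes "1.23"), replaces the NA entry
#with na_vals (by default ""), and restores a leading "-" for negative
#values that would otherwise round to a bare zero.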
round_df_char <- function(df, digits, pad = "0", na_vals = "") { if (NROW(df) == 0L || NCOL(df) == 0L) { return(df) } if (!is.data.frame(df)) { df <- as.data.frame.matrix(df, stringsAsFactors = FALSE) } rn <- rownames(df) cn <- colnames(df) infs <- o.negs <- array(FALSE, dim = dim(df)) nas <- is.na(df) nums <- vapply(df, is.numeric, logical(1)) for (i in which(nums)) { infs[, i] <- !nas[, i] & !is.finite(df[[i]]) } for (i in which(!nums)) { if (can_str2num(df[[i]])) { df[[i]] <- str2num(df[[i]]) nums[i] <- TRUE } } o.negs[, nums] <- !nas[, nums] & df[nums] < 0 & round(df[nums], digits) == 0 df[nums] <- round(df[nums], digits = digits) pad0 <- identical(as.character(pad), "0") for (i in which(nums)) { df[[i]] <- format(df[[i]], scientific = FALSE, justify = "none", trim = TRUE, drop0trailing = !pad0) if (!pad0 && any(grepl(".", df[[i]], fixed = TRUE))) { s <- strsplit(df[[i]], ".", fixed = TRUE) lengths <- lengths(s) digits.r.of.. <- rep.int(0, NROW(df)) digits.r.of..[lengths > 1] <- nchar(vapply(s[lengths > 1], `[[`, character(1L), 2)) dots <- rep.int("", length(s)) dots[lengths <= 1] <- if (as.character(pad) != "") "." else pad pads <- vapply(max(digits.r.of..) 
- digits.r.of.., function(n) paste(rep.int(pad, n), collapse = ""), character(1L)) df[[i]] <- paste0(df[[i]], dots, pads) } } df[o.negs] <- paste0("-", df[o.negs]) # Insert NA placeholders df[nas] <- na_vals df[infs] <- "N/A" if (length(rn) > 0) rownames(df) <- rn if (length(cn) > 0) names(df) <- cn df } #Generalized inverse; port of MASS::ginv() generalized_inverse <- function(sigma, tol = 1e-8) { sigmasvd <- svd(sigma) pos <- sigmasvd$d > max(tol * sigmasvd$d[1L], 0) sigmasvd$v[, pos, drop = FALSE] %*% (sigmasvd$d[pos]^-1 * t(sigmasvd$u[, pos, drop = FALSE])) } #(Weighted) variance that uses special formula for binary variables wvar <- function(x, bin.var = NULL, w = NULL) { if (is_null(w)) w <- rep.int(1, length(x)) if (is_null(bin.var)) bin.var <- all(x == 0 | x == 1) w <- w / sum(w) #weights normalized to sum to 1 mx <- sum(w * x) #weighted mean if (bin.var) { return(mx * (1 - mx)) } #Reliability weights variance; same as cov.wt() sum(w * (x - mx)^2) / (1 - sum(w^2)) } #Weighted mean faster than weighted.mean() wm <- function(x, w = NULL, na.rm = TRUE) { if (is_null(w)) { if (anyNA(x)) { if (!na.rm) return(NA_real_) nas <- which(is.na(x)) x <- x[-nas] } return(sum(x) / length(x)) } if (anyNA(x) || anyNA(w)) { if (!na.rm) return(NA_real_) nas <- which(is.na(x) | is.na(w)) x <- x[-nas] w <- w[-nas] } sum(x * w) / sum(w) } #Faster diff() diff1 <- function(x) { x[-1] - x[-length(x)] } #cumsum() for probabilities to ensure they are between 0 and 1 .cumsum_prob <- function(x) { s <- cumsum(x) s / s[length(s)] } #Make vector sum to 1, optionally by group .make_sum_to_1 <- function(x, by = NULL) { if (is_null(by)) { return(x / sum(x)) } for (i in unique(by)) { in_i <- which(by == i) x[in_i] <- x[in_i] / sum(x[in_i]) } x } #Make vector sum to n (average of 1), optionally by group .make_sum_to_n <- function(x, by = NULL) { if (is_null(by)) { return(length(x) * x / sum(x)) } for (i in unique(by)) { in_i <- which(by == i) x[in_i] <- length(in_i) * x[in_i] / sum(x[in_i]) } 
x } #A faster na.omit for vectors na.rem <- function(x) { x[!is.na(x)] } #Extract variables from ..., similar to ...elt() or get0(), by name without evaluating list(...) ...get <- function(x, ifnotfound = NULL) { eval(quote(if (!anyNA(.m1 <- match(.x, ...names())) && is_not_null(.m2 <- ...elt(.m1))) .m2 else .ifnotfound), pairlist(.x = x[1L], .ifnotfound = ifnotfound), parent.frame(1L)) } #Extract multiple variables from ..., similar to mget(), by name without evaluating list(...) ...mget <- function(x) { found <- match(x, eval(quote(...names()), parent.frame(1L))) not_found <- is.na(found) if (all(not_found)) { return(list()) } setNames(lapply(found[!not_found], function(z) { eval(quote(...elt(.z)), pairlist(.z = z), parent.frame(3L)) }), x[!not_found]) } #Helper function to fill named vectors with x and given names of y rep_with <- function(x, y) { setNames(rep.int(x, length(y)), names(y)) } #cat() if verbose = TRUE (default sep = "", line wrapping) .cat_verbose <- function(..., verbose = TRUE, sep = "") { if (!verbose) { return(invisible(NULL)) } m <- do.call(function(...) 
paste(..., sep = sep), list(...)) if (endsWith(m, "\n")) { m <- paste0(paste(strwrap(m), collapse = "\n"), "\n") } else { m <- paste(strwrap(m), collapse = "\n") } cat(paste(m, collapse = "\n")) } #Functions for error handling; based on chk and rlang pkg_caller_call <- function() { pn <- utils::packageName() package.funs <- c(getNamespaceExports(pn), .getNamespaceInfo(asNamespace(pn), "S3methods")[, 3L]) for (i in seq_len(sys.nframe())) { e <- sys.call(i) n <- rlang::call_name(e) if (is_null(n)) { next } if (n %in% package.funs) { return(e) } } NULL } .err <- function(..., n = NULL, tidy = TRUE) { m <- chk::message_chk(..., n = n, tidy = tidy) rlang::abort(paste(strwrap(m), collapse = "\n"), call = pkg_caller_call()) } .wrn <- function(..., n = NULL, tidy = TRUE, immediate = TRUE) { m <- chk::message_chk(..., n = n, tidy = tidy) if (immediate && isTRUE(all.equal(0, getOption("warn")))) { rlang::with_options({ rlang::warn(paste(strwrap(m), collapse = "\n")) }, warn = 1) } else { rlang::warn(paste(strwrap(m), collapse = "\n")) } } .msg <- function(..., n = NULL, tidy = TRUE) { m <- chk::message_chk(..., n = n, tidy = tidy) rlang::inform(paste(strwrap(m), collapse = "\n"), tidy = FALSE) }

# ---- MatchIt/R/matchit2nearest.R ----

#' Nearest Neighbor Matching #' @name method_nearest #' @aliases method_nearest #' @usage NULL #' #' @description #' In [matchit()], setting `method = "nearest"` performs greedy nearest #' neighbor matching. A distance is computed between each treated unit and each #' control unit, and, one by one, each treated unit is assigned a control unit #' as a match. The matching is "greedy" in the sense that there is no action #' taken to optimize an overall criterion; each match is selected without #' considering the other matches that may occur subsequently. #' #' This page details the allowable arguments with `method = "nearest"`.
#' See [matchit()] for an explanation of what each argument means in a general #' context and how it can be specified. #' #' Below is how `matchit()` is used for nearest neighbor matching: #' \preformatted{ #' matchit(formula, #' data = NULL, #' method = "nearest", #' distance = "glm", #' link = "logit", #' distance.options = list(), #' estimand = "ATT", #' exact = NULL, #' mahvars = NULL, #' antiexact = NULL, #' discard = "none", #' reestimate = FALSE, #' s.weights = NULL, #' replace = FALSE, #' m.order = NULL, #' caliper = NULL, #' ratio = 1, #' min.controls = NULL, #' max.controls = NULL, #' verbose = FALSE, #' ...) } #' #' @param formula a two-sided [formula] object containing the treatment and #' covariates to be used in creating the distance measure used in the matching. #' @param data a data frame containing the variables named in `formula`. #' If not found in `data`, the variables will be sought in the #' environment. #' @param method set here to `"nearest"`. #' @param distance the distance measure to be used. See [`distance`] #' for allowable options. Can be supplied as a distance matrix. #' @param link when `distance` is specified as a method of estimating #' propensity scores, an additional argument controlling the link function used #' in estimating the distance measure. See [`distance`] for allowable #' options with each option. #' @param distance.options a named list containing additional arguments #' supplied to the function that estimates the distance measure as determined #' by the argument to `distance`. #' @param estimand a string containing the desired estimand. Allowable options #' include `"ATT"` and `"ATC"`. See Details. #' @param exact for which variables exact matching should take place; two units with different values of an exact matching variable will not be paired.
#' @param mahvars for which variables Mahalanobis distance matching should take #' place when `distance` corresponds to a propensity score (e.g., for #' caliper matching or to discard units for common support). If specified, the #' distance measure will not be used in matching. #' @param antiexact for which variables anti-exact matching should take place; two units with the same value of an anti-exact matching variable will not be paired. #' @param discard a string containing a method for discarding units outside a #' region of common support. Only allowed when `distance` corresponds to a #' propensity score. #' @param reestimate if `discard` is not `"none"`, whether to #' re-estimate the propensity score in the remaining sample prior to matching. #' @param s.weights the variable containing sampling weights to be incorporated #' into propensity score models and balance statistics. #' @param replace whether matching should be done with replacement (i.e., whether control units can be used as matches multiple times). See also the `reuse.max` argument below. Default is `FALSE` for matching without replacement. #' @param m.order the order that the matching takes place. Allowable options #' include `"largest"`, where matching takes place in descending order of #' distance measures; `"smallest"`, where matching takes place in ascending #' order of distance measures; `"closest"`, where matching takes place in #' ascending order of the smallest distance between units; `"farthest"`, where matching takes place in #' descending order of the smallest distance between units; `"random"`, where matching takes place #' in a random order; and `"data"` where matching takes place based on the #' order of units in the data. When `m.order = "random"`, results may differ #' across different runs of the same code unless a seed is set and specified #' with [set.seed()]. 
The default of `NULL` corresponds to `"largest"` when a #' propensity score is estimated or supplied as a vector and `"data"` #' otherwise. See Details for more information. #' @param caliper the width(s) of the caliper(s) used for caliper matching. Two units with a difference on a caliper variable larger than the caliper will not be paired. See Details and Examples. #' @param std.caliper `logical`; when calipers are specified, whether they #' are in standard deviation units (`TRUE`) or raw units (`FALSE`). #' @param ratio how many control units should be matched to each treated unit #' for k:1 matching. For variable ratio matching, see section "Variable Ratio #' Matching" in Details below. When `ratio` is greater than 1, all treated units will be attempted to be matched with a control unit before any treated unit is matched with a second control unit, etc. This reduces the possibility that control units will be used up before some treated units receive any matches. #' @param min.controls,max.controls for variable ratio matching, the minimum #' and maximum number of controls units to be matched to each treated unit. See #' section "Variable Ratio Matching" in Details below. #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. When `TRUE`, a progress bar #' implemented using *RcppProgress* will be displayed along with an estimate of the time remaining. #' @param \dots additional arguments that control the matching specification: #' \describe{ #' \item{`reuse.max`}{ `numeric`; the maximum number of #' times each control can be used as a match. Setting `reuse.max = 1` #' corresponds to matching without replacement (i.e., `replace = FALSE`), #' and setting `reuse.max = Inf` corresponds to traditional matching with #' replacement (i.e., `replace = TRUE`) with no limit on the number of #' times each control unit can be matched. 
Other values restrict the number of #' times each control can be matched when matching with replacement. #' `replace` is ignored when `reuse.max` is specified. } #' \item{`unit.id`}{ one or more variables containing a unit ID for each #' observation, i.e., in case multiple observations correspond to the same #' unit. Once a control observation has been matched, no other observation with #' the same unit ID can be used as matches. This ensures each control unit is #' used only once even if it has multiple observations associated with it. #' Omitting this argument is the same as giving each observation a unique ID.} #' } #' #' @section Outputs: #' All outputs described in [matchit()] are returned with #' `method = "nearest"`. When `replace = TRUE`, the `subclass` #' component is omitted. `include.obj` is ignored. #' #' @details #' ## Mahalanobis Distance Matching #' #' Mahalanobis distance matching can be done one of two ways: #' \enumerate{ #' \item{If no propensity score needs to be estimated, `distance` should be #' set to `"mahalanobis"`, and Mahalanobis distance matching will occur #' using all the variables in `formula`. Arguments to `discard` and #' `mahvars` will be ignored, and a caliper can only be placed on named #' variables. For example, to perform simple Mahalanobis distance matching, the #' following could be run: #' #' \preformatted{ #' matchit(treat ~ X1 + X2, method = "nearest", #' distance = "mahalanobis") } #' #' With this code, the Mahalanobis distance is computed using `X1` and #' `X2`, and matching occurs on this distance. The `distance` #' component of the `matchit()` output will be empty. #' } #' \item{If a propensity score needs to be estimated for any reason, e.g., for #' common support with `discard` or for creating a caliper, #' `distance` should be whatever method is used to estimate the propensity #' score or a vector of distance measures. Use `mahvars` to specify the #' variables used to create the Mahalanobis distance. 
For example, to perform #' Mahalanobis within a propensity score caliper, the following could be run: #' #' \preformatted{ #' matchit(treat ~ X1 + X2 + X3, method = "nearest", #' distance = "glm", caliper = .25, #' mahvars = ~ X1 + X2) } #' #' With this code, `X1`, `X2`, and `X3` are used to estimate the #' propensity score (using the `"glm"` method, which by default is #' logistic regression), which is used to create a matching caliper. The actual #' matching occurs on the Mahalanobis distance computed only using `X1` #' and `X2`, which are supplied to `mahvars`. Units whose propensity #' score difference is larger than the caliper will not be paired, and some #' treated units may therefore not receive a match. The estimated propensity #' scores will be included in the `distance` component of the #' `matchit()` output. See Examples. #' } #' } #' ## Estimand #' #' The `estimand` argument controls whether control units are selected to be #' matched with treated units (`estimand = "ATT"`) or treated units are #' selected to be matched with control units (`estimand = "ATC"`). The #' "focal" group (e.g., the treated units for the ATT) is typically made to be #' the smaller treatment group, and a warning will be thrown if it is not set #' that way unless `replace = TRUE`. Setting `estimand = "ATC"` is #' equivalent to swapping all treated and control labels for the treatment #' variable. When `estimand = "ATC"`, the default `m.order` is #' `"smallest"`, and the `match.matrix` component of the output will #' have the names of the control units as the rownames and be filled with the #' names of the matched treated units (opposite to when `estimand = "ATT"`). Note that the argument supplied to `estimand` doesn't #' necessarily correspond to the estimand actually targeted; it is merely a #' switch to trigger which treatment group is considered "focal". 
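#'
#' For example (an illustrative sketch using the bundled `lalonde` data), to
#' select treated units as matches for control units, one could run:
#'
#' \preformatted{
#' matchit(treat ~ age + educ + race, data = lalonde,
#'         method = "nearest", estimand = "ATC") }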
#' #' ## Variable Ratio Matching #' #' `matchit()` can perform variable ratio "extremal" matching as described by Ming and Rosenbaum (2000; \doi{10.1111/j.0006-341X.2000.00118.x}). This #' method tends to result in better balance than fixed ratio matching at the #' expense of some precision. When `ratio > 1`, rather than requiring all #' treated units to receive `ratio` matches, each treated unit is assigned #' a value that corresponds to the number of control units they will be matched #' to. These values are controlled by the arguments `min.controls` and #' `max.controls`, which correspond to \eqn{\alpha} and \eqn{\beta}, #' respectively, in Ming and Rosenbaum (2000), and trigger variable ratio #' matching to occur. Some treated units will receive `min.controls` #' matches and others will receive `max.controls` matches (and one unit #' may have an intermediate number of matches); how many units are assigned #' each number of matches is determined by the algorithm described in Ming and #' Rosenbaum (2000, p119). `ratio` controls how many total control units #' will be matched: `n1 * ratio` control units will be matched, where #' `n1` is the number of treated units, yielding the same total number of #' matched controls as fixed ratio matching does. #' #' Variable ratio matching cannot be used with Mahalanobis distance matching or #' when `distance` is supplied as a matrix. The calculations of the #' numbers of control units each treated unit will be matched to occurs without #' consideration of `caliper` or `discard`. `ratio` does not #' have to be an integer but must be greater than 1 and less than `n0/n1`, #' where `n0` and `n1` are the number of control and treated units, #' respectively. Setting `ratio = n0/n1` performs a crude form of full #' matching where all control units are matched. If `min.controls` is not #' specified, it is set to 1 by default. `min.controls` must be less than #' `ratio`, and `max.controls` must be greater than `ratio`. 
See #' Examples below for an example of their use. #' #' ## Using `m.order = "closest"` or `"farthest"` #' #' `m.order` can be set to `"closest"` or `"farthest"`, which work regardless of how the distance measure is specified. These options match in order of the distance between units. First, the closest match is found for each treated unit and the pairwise distances are computed; when `m.order = "closest"` the pair with the smallest of the distances is matched first, and when `m.order = "farthest"`, the pair with the largest of the distances is matched first. Then, the pair with the second smallest (or largest) is matched second. If the matched control is ineligible (i.e., because it has already been used in a prior match), a new match is found for the treated unit, the new pair's distance is re-computed, and the pairs are re-ordered by distance. #' #' Using `m.order = "closest"` ensures that the best possible matches are given priority, and in that sense should perform similarly to `m.order = "smallest"`. It can be used to ensure the best matches, especially when matching with a caliper. Using `m.order = "farthest"` ensures that the hardest units to match are given their best chance to find a close match, and in that sense should perform similarly to `m.order = "largest"`. It can be used to reduce the possibility of extreme imbalance when there are hard-to-match units competing for controls. Note that `m.order = "farthest"` **does not** implement "far matching" (i.e., finding the farthest control unit from each treated unit); it defines the order in which the closest matches are selected. #' #' ## Reproducibility #' #' Nearest neighbor matching involves a random component only when `m.order = "random"` (or when the propensity score is estimated using a method with randomness; see [`distance`] for details), so a seed must be set in that case using [set.seed()] to ensure reproducibility.
Otherwise, it is purely deterministic, and any ties are broken based on the order in which the data appear. #' #' @seealso [matchit()] for a detailed explanation of the inputs and outputs of #' a call to `matchit()`. #' #' [method_optimal()] for optimal pair matching, which is similar to #' nearest neighbor matching without replacement except that an overall distance criterion is #' minimized (i.e., as an alternative to specifying `m.order`). #' #' @references In a manuscript, you don't need to cite another package when #' using `method = "nearest"` because the matching is performed completely #' within *MatchIt*. For example, a sentence might read: #' #' *Nearest neighbor matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R.* #' #' @examples #' data("lalonde") #' #' # 1:1 greedy NN matching on the PS #' m.out1 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "nearest") #' m.out1 #' summary(m.out1) #' #' # 3:1 NN Mahalanobis distance matching with #' # replacement within a PS caliper #' m.out2 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "nearest", #' replace = TRUE, #' mahvars = ~ age + educ + re74 + re75, #' ratio = 3, #' caliper = .02) #' m.out2 #' summary(m.out2, un = FALSE) #' #' # 1:1 NN Mahalanobis distance matching within calipers #' # on re74 and re75 and exact matching on married and race #' m.out3 <- matchit(treat ~ age + educ + re74 + re75, #' data = lalonde, #' method = "nearest", #' distance = "mahalanobis", #' exact = ~ married + race, #' caliper = c(re74 = .2, re75 = .15)) #' m.out3 #' summary(m.out3, un = FALSE) #' #' # 2:1 variable ratio NN matching on the PS #' m.out4 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "nearest", #' ratio = 2, #' min.controls = 1, #' max.controls = 12) #' m.out4 #' summary(m.out4, un = FALSE) #' #' # Some 
units received 1 match and some received 12 #' table(table(m.out4$subclass[m.out4$treat == 0])) #' NULL matchit2nearest <- function(treat, data, distance, discarded, ratio = 1, s.weights = NULL, replace = FALSE, m.order = NULL, caliper = NULL, mahvars = NULL, exact = NULL, formula = NULL, estimand = "ATT", verbose = FALSE, is.full.mahalanobis, antiexact = NULL, unit.id = NULL, ...) { .cat_verbose("Nearest neighbor matching... \n", verbose = verbose) estimand <- toupper(estimand) estimand <- match_arg(estimand, c("ATT", "ATC")) if (estimand == "ATC") { tc <- c("control", "treated") focal <- 0 } else { tc <- c("treated", "control") focal <- 1 } treat <- setNames(as.integer(treat == focal), names(treat)) if (is.full.mahalanobis) { if (is_null(attr(terms(formula, data = data), "term.labels"))) { .err(sprintf("covariates must be specified in the input formula when `distance = %s`", add_quotes(attr(is.full.mahalanobis, "transform")))) } mahvars <- formula } n.obs <- length(treat) n1 <- sum(treat == 1) n0 <- n.obs - n1 min.controls <- attr(ratio, "min.controls") max.controls <- attr(ratio, "max.controls") mahcovs <- distance_mat <- NULL if (is_not_null(mahvars)) { transform <- { if (is.full.mahalanobis) attr(is.full.mahalanobis, "transform") else "mahalanobis" } mahcovs <- transform_covariates(mahvars, data = data, method = transform, s.weights = s.weights, treat = treat, discarded = discarded) } else if (is.matrix(distance)) { distance_mat <- distance distance <- NULL if (focal == 0) { distance_mat <- t(distance_mat) } } #Process caliper caliper.dist <- caliper.covs <- NULL caliper.covs.mat <- NULL ex.caliper <- NULL if (is_not_null(caliper)) { if (any(nzchar(names(caliper)))) { caliper.covs <- caliper[nzchar(names(caliper))] caliper.covs.mat <- get_covs_matrix(reformulate(names(caliper.covs)), data = data) if (is_not_null(mahcovs) || is_not_null(distance_mat)) { ex.caliper.list <- setNames(lapply(names(caliper.covs), function(i) { if (caliper.covs[i] < 0) { 
return(integer(0L)) } splits <- get_splitsC(as.numeric(caliper.covs.mat[, i]), as.numeric(caliper.covs[i])) if (is_null(splits)) { return(integer(0L)) } cut(caliper.covs.mat[, i], breaks = splits, include.lowest = TRUE) }), names(caliper.covs)) ex.caliper.list[lengths(ex.caliper.list) == 0L] <- NULL if (is_not_null(ex.caliper.list)) { for (i in seq_along(ex.caliper.list)) { levels(ex.caliper.list[[i]]) <- paste(names(ex.caliper.list)[i], "\u2208", levels(ex.caliper.list[[i]])) } ex.caliper <- exactify(ex.caliper.list, nam = names(treat), sep = ", ", justify = NULL) } } } if (!all(nzchar(names(caliper)))) { caliper.dist <- caliper[!nzchar(names(caliper))] } } #Process antiexact antiexactcovs <- NULL if (is_not_null(antiexact)) { antiexactcovs <- do.call("cbind", lapply(model.frame(antiexact, data), function(i) { unclass(as.factor(i)) })) } reuse.max <- attr(replace, "reuse.max") if (reuse.max >= n1) { m.order <- "data" } #unit.id if (is_not_null(unit.id) && reuse.max < n1) { unit.id <- process.variable.input(unit.id, data) unit.id <- exactify(model.frame(unit.id, data = data), nam = names(treat), sep = ", ", include_vars = TRUE) num_ctrl_unit.ids <- length(unique(unit.id[treat == 0])) num_trt_unit.ids <- length(unique(unit.id[treat == 1])) #If each unit is a unit.id, unit.ids are meaningless if (num_ctrl_unit.ids == n0 && num_trt_unit.ids == n1) { unit.id <- NULL } } else { unit.id <- NULL } #Process exact ex <- NULL if (is_not_null(exact)) { ex <- exactify(model.frame(exact, data = data), nam = names(treat), sep = ", ", include_vars = TRUE) cc <- Reduce("intersect", lapply(unique(treat), function(t) unclass(ex)[treat == t])) if (is_null(cc)) { .err("no matches were found") } cc <- sort(cc) } if (reuse.max < n1) { if (is_not_null(ex)) { w1 <- w2 <- FALSE for (e in levels(ex)) { ex_e <- which(ex == e) e_ratio <- { if (is_null(unit.id)) reuse.max * sum(treat[ex_e] == 0) / sum(treat[ex_e] == 1) else reuse.max * length(unique(unit.id[ex_e][treat[ex_e] == 0])) / 
sum(treat[ex_e] == 1) } if (!w1 && e_ratio < 1) { .wrn(sprintf("fewer %s %s than %s units in some `exact` strata; not all %s units will get a match", tc[2L], if (is_null(unit.id)) "units" else "unit IDs", tc[1L], tc[1L])) w1 <- TRUE } if (!w2 && ratio > 1 && e_ratio < ratio) { if (is_null(max.controls) || ratio == max.controls) .wrn(sprintf("not all %s units will get %s matches", tc[1L], ratio)) else .wrn(sprintf("not enough %s %s for an average of %s matches per %s unit in all `exact` strata", tc[2L], if (is_null(unit.id)) "units" else "unit IDs", ratio, tc[1L])) w2 <- TRUE } if (w1 && w2) { break } } } else { e_ratio <- { if (is_null(unit.id)) as.numeric(reuse.max) * n0 / n1 else as.numeric(reuse.max) * num_ctrl_unit.ids / num_trt_unit.ids } if (e_ratio < 1) { .wrn(sprintf("fewer %s %s than %s units; not all %s units will get a match", tc[2L], if (is_null(unit.id)) "units" else "unit IDs", tc[1L], tc[1L])) } else if (e_ratio < ratio) { if (is_null(max.controls) || ratio == max.controls) .wrn(sprintf("not all %s units will get %s matches", tc[1], ratio)) else .wrn(sprintf("not enough %s %s for an average of %s matches per %s unit", tc[2L], if (is_null(unit.id)) "units" else "unit IDs", ratio, tc[1L])) } } } #Variable ratio (extremal matching), Ming & Rosenbaum (2000) #Each treated unit get its own value of ratio if (is_not_null(max.controls)) { if (is_null(distance)) { if (is.full.mahalanobis) { .err(sprintf('`distance` cannot be "%s" for variable ratio nearest neighbor matching', transform)) } else { .err("`distance` cannot be supplied as a matrix for variable ratio nearest neighbor matching") } } m <- round(ratio * n1) # if (m > sum(treat == 0)) stop("'ratio' must be less than or equal to n0/n1") kmax <- floor((m - min.controls * (n1 - 1)) / (max.controls - min.controls)) kmin <- n1 - kmax - 1 kmed <- m - (min.controls * kmin + max.controls * kmax) ratio0 <- c(rep.int(min.controls, kmin), kmed, rep.int(max.controls, kmax)) #Make sure no units are assigned 0 
matches while (any(ratio0 == 0)) { ind <- which(ratio0 == 0)[1L] ratio0[ind] <- 1 if (ind == length(ratio0)) { break } ratio0[ind + 1L] <- ratio0[ind + 1] - 1 } ratio <- rep.int(NA_integer_, n1) #Order by distance; treated are supposed to have higher values ratio[order(distance[treat == 1], decreasing = mean(distance[treat == 1]) > mean(distance[treat != 1]))] <- ratio0 ratio <- as.integer(ratio) } else { ratio <- rep.int(as.integer(ratio), n1) } m.order <- { if (is_null(distance)) match_arg(m.order, c("data", "random", "closest", "farthest")) else if (is_null(m.order)) switch(estimand, "ATC" = "smallest", "largest") else match_arg(m.order, c("largest", "smallest", "data", "random", "closest", "farthest")) } #If mahcovs is only 1 variable, use vec matching for speed if (is.full.mahalanobis && is_null(distance) && is_null(caliper.dist) && is_not_null(mahcovs) && ncol(mahcovs) == 1L) { distance <- mahcovs[, 1L] mahcovs <- NULL } if (is_not_null(ex)) { discarded[!ex %in% levels(ex)[cc]] <- TRUE } if (is_null(ex) || is_not_null(unit.id) || (is_null(mahcovs) && is_null(distance_mat))) { if (is_not_null(ex.caliper)) { ex <- exactify(list(ex, ex.caliper), nam = names(treat), sep = ", ", include_vars = FALSE) } mm <- nn_matchC_dispatch(treat, 1L, ratio, discarded, reuse.max, distance, distance_mat, ex, caliper.dist, caliper.covs, caliper.covs.mat, mahcovs, antiexactcovs, unit.id, m.order, verbose) } else { mm_list <- lapply(levels(ex)[cc], function(e) { .cat_verbose(sprintf("Matching subgroup %s/%s: %s...\n", match(e, levels(ex)[cc]), length(cc), e), verbose = verbose) .e <- which(ex == e) .e1 <- which(ex[treat == 1] == e) treat_ <- treat[.e] discarded_ <- discarded[.e] distance_ <- NULL if (is_not_null(distance)) { distance_ <- distance[.e] } ex.caliper_ <- NULL if (is_not_null(ex.caliper)) { ex.caliper_ <- ex.caliper[.e] } caliper.covs.mat_ <- NULL if (is_not_null(caliper.covs.mat)) { caliper.covs.mat_ <- caliper.covs.mat[.e, , drop = FALSE] } mahcovs_ <- NULL if 
(is_not_null(mahcovs)) { mahcovs_ <- mahcovs[.e, , drop = FALSE] } antiexactcovs_ <- NULL if (is_not_null(antiexactcovs)) { antiexactcovs_ <- antiexactcovs[.e, , drop = FALSE] } distance_mat_ <- NULL if (is_not_null(distance_mat)) { .e0 <- which(ex[treat == 0] == e) distance_mat_ <- distance_mat[.e1, .e0, drop = FALSE] } ratio_ <- ratio[.e1] mm_ <- nn_matchC_dispatch(treat_, 1L, ratio_, discarded_, reuse.max, distance_, distance_mat_, ex.caliper_, caliper.dist, caliper.covs, caliper.covs.mat_, mahcovs_, antiexactcovs_, NULL, m.order, verbose) #Ensure matched indices correspond to indices in full sample, not subgroup mm_[] <- .e[mm_] mm_ }) #Construct match.matrix mm <- matrix(NA_integer_, nrow = n1, ncol = max(vapply(mm_list, ncol, numeric(1L))), dimnames = list(names(treat)[treat == 1], NULL)) for (m in mm_list) { mm[rownames(m), seq_len(ncol(m))] <- m } } .cat_verbose("Calculating matching weights... ", verbose = verbose) if (reuse.max > 1) { psclass <- NULL weights <- get_weights_from_mm(mm, treat, 1L) } else { psclass <- mm2subclass(mm, treat, 1L) weights <- get_weights_from_subclass(psclass, treat) } res <- list(match.matrix = nummm2charmm(mm, treat), subclass = psclass, weights = weights) .cat_verbose("Done.\n", verbose = verbose) class(res) <- "matchit" res } # Dispatches Rcpp functions for NN matching nn_matchC_dispatch <- function(treat, focal, ratio, discarded, reuse.max, distance, distance_mat, ex, caliper.dist, caliper.covs, caliper.covs.mat, mahcovs, antiexactcovs, unit.id, m.order, verbose) { if (m.order %in% c("closest", "farthest")) { if (is_not_null(mahcovs)) { nn_matchC_mahcovs_closest(treat, ratio, discarded, reuse.max, mahcovs, distance, ex, caliper.dist, caliper.covs, caliper.covs.mat, antiexactcovs, unit.id, m.order == "closest", verbose) } else if (is_not_null(distance_mat)) { nn_matchC_distmat_closest(treat, ratio, discarded, reuse.max, distance_mat, ex, caliper.dist, caliper.covs, caliper.covs.mat, antiexactcovs, unit.id, m.order == 
"closest", verbose) } else { nn_matchC_vec_closest(treat, ratio, discarded, reuse.max, distance, ex, caliper.dist, caliper.covs, caliper.covs.mat, antiexactcovs, unit.id, m.order == "closest", verbose) } } else { ord <- switch(m.order, "largest" = order(distance[treat == focal], decreasing = TRUE), "smallest" = order(distance[treat == focal], decreasing = FALSE), "random" = sample(which(!discarded[treat == focal])), "data" = which(!discarded[treat == focal])) if (is_not_null(mahcovs)) { nn_matchC_mahcovs(treat, ord, ratio, discarded, reuse.max, focal, mahcovs, distance, ex, caliper.dist, caliper.covs, caliper.covs.mat, antiexactcovs, unit.id, verbose) } else if (is_not_null(distance_mat)) { nn_matchC_distmat(treat, ord, ratio, discarded, reuse.max, focal, distance_mat, ex, caliper.dist, caliper.covs, caliper.covs.mat, antiexactcovs, unit.id, verbose) } else { nn_matchC_vec(treat, ord, ratio, discarded, reuse.max, focal, distance, ex, caliper.dist, caliper.covs, caliper.covs.mat, antiexactcovs, unit.id, verbose) } } } MatchIt/R/input_processing.R0000644000176200001440000004546414763225432015547 0ustar liggesusers#Function to process inputs and throw warnings or errors if inputs are incompatible with methods check.inputs <- function(mcall, method, distance, exact, mahvars, antiexact, caliper, discard, reestimate, s.weights, replace, ratio, m.order, estimand, ..., min.controls = NULL, max.controls = NULL) { null.method <- is_null(method) if (null.method) { method <- "NULL" } else { method <- match_arg(method, c("exact", "cem", "nearest", "optimal", "full", "genetic", "subclass", "cardinality", "quick")) } ignored.inputs <- character(0L) error.inputs <- character(0L) .entered_arg <- function(mcall, i) { if (!hasName(mcall, i)) { return(FALSE) } i_ <- get0(i, envir = pos.to.env(-2L), inherits = FALSE) if (is_null(i_)) { return(FALSE) } !identical(i_, eval(formals(matchit)[[i]])) } if (null.method) { for (i in c("exact", "mahvars", "antiexact", "caliper", "std.caliper", 
"replace", "ratio", "min.controls", "max.controls", "m.order")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } else if (method == "exact") { for (i in c("distance", "exact", "mahvars", "antiexact", "caliper", "std.caliper", "discard", "reestimate", "replace", "ratio", "min.controls", "max.controls", "m.order")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } else if (method == "cem") { for (i in c("distance", "exact", "mahvars", "antiexact", "caliper", "std.caliper", "discard", "reestimate", "replace", "ratio", "min.controls", "max.controls")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } else if (method == "nearest") { if (is.character(distance) && distance %in% matchit_distances()) { for (e in c("mahvars", "reestimate")) { if (.entered_arg(mcall, e)) { error.inputs <- c(error.inputs, e) } } } } else if (method == "optimal") { if (is.character(distance) && distance %in% matchit_distances()) { for (e in c("mahvars", "reestimate")) { if (.entered_arg(mcall, e)) { error.inputs <- c(error.inputs, e) } } } for (i in c("replace", "caliper", "std.caliper", "m.order")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } else if (method == "full") { if (is.character(distance) && distance %in% matchit_distances()) { for (e in c("mahvars", "reestimate")) { if (.entered_arg(mcall, e)) { error.inputs <- c(error.inputs, e) } } } for (i in c("replace", "ratio", "m.order")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } else if (method == "genetic") { if (is.character(distance) && distance %in% matchit_distances()) { for (e in c("mahvars", "reestimate")) { if (.entered_arg(mcall, e)) { error.inputs <- c(error.inputs, e) } } } for (i in c("min.controls", "max.controls")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } else if (method == "cardinality") { for (i in c("distance", "antiexact", "caliper", 
"std.caliper", "reestimate", "replace", "min.controls", "m.order")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } else if (method == "subclass") { for (i in c("exact", "mahvars", "antiexact", "caliper", "std.caliper", "replace", "ratio", "min.controls", "max.controls", "m.order")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } else if (method == "quick") { if (is.character(distance) && distance %in% matchit_distances()) { for (e in c("mahvars", "reestimate")) { if (.entered_arg(mcall, e)) { error.inputs <- c(error.inputs, e) } } } for (i in c("replace", "ratio", "min.controls", "max.controls", "m.order", "antiexact")) { if (.entered_arg(mcall, i)) { ignored.inputs <- c(ignored.inputs, i) } } } if (is_not_null(ignored.inputs)) { .wrn(sprintf("the argument%%s %s %%r not used with `method = %s` and will be ignored", word_list(ignored.inputs, quotes = "`"), add_quotes(method, quotes = !null.method)), n = length(ignored.inputs)) } if (is_not_null(error.inputs)) { .err(sprintf("the argument%%s %s %%r not used with `method = %s` and `distance = %s`", word_list(error.inputs, quotes = "`"), add_quotes(method, quotes = !null.method), add_quotes(distance)), n = length(error.inputs)) } ignored.inputs } #Check treatment for type, binary, missing, num. 
rows check_treat <- function(treat = NULL, X = NULL) { if (is_null(treat)) { if (is_null(X) || is_null(attr(X, "treat"))) { return(NULL) } treat <- attr(X, "treat") } if (isTRUE(attr(treat, "checked"))) { return(treat) } if (!is.atomic(treat) || is_not_null(dim(treat))) { .err("the treatment must be a vector") } if (anyNA(treat)) { .err("missing values are not allowed in the treatment") } if (TRUE) { if (!has_n_unique(treat, 2L)) { .err("the treatment must be a binary variable") } treat <- binarize(treat) #make 0/1 } else { if (has_n_unique(treat, 2L)) { treat <- { if (is.logical(treat) || all(as.character(treat) %in% c("0", "1"))) factor(treat, levels = sort(unique(treat, nmax = 2)), labels = c("control", "treated"), ordered = FALSE) else factor(treat, nmax = 2, ordered = FALSE) } # treat <- binarize(treat) #make 0/1 attr(treat, "type") <- "binary" attr(treat, "treated") <- levels(treat)[2L] attr(treat, "ordered") <- FALSE } else { .err("the treatment must be a binary variable") #Remove to support multi if (!chk::vld_character_or_factor(treat)) { .err("the treatment must be a factor variable if it takes on more than 2 unique values") } treat <- droplevels(as.factor(treat)) attr(treat, "type") <- "multi" # attr(treat, "treated") <- levels(treat)[which.min(tabulateC(treat))] attr(treat, "ordered") <- is.ordered(treat) } } if (is_not_null(X) && length(treat) != nrow(X)) { .err("the treatment and covariates must have the same number of units") } attr(treat, "checked") <- TRUE treat } #Function to process distance and give warnings about new syntax process.distance <- function(distance, method = NULL, treat) { if (is_null(distance)) { if (is_not_null(method) && !method %in% c("cem", "exact", "cardinality")) { .err(sprintf("`distance` cannot be `NULL` with `method = %s`", add_quotes(method))) } return(distance) } if (chk::vld_string(distance)) { allowable.distances <- c( #Propensity score methods "glm", "cbps", "gam", "nnet", "rpart", "bart", "randomforest", 
"elasticnet", "lasso", "ridge", "gbm", #Distance matrices matchit_distances() ) if (tolower(distance) %in% c("cauchit", "cloglog", "linear.cloglog", "linear.log", "linear.logit", "linear.probit", "linear.cauchit", "log", "probit")) { link <- tolower(distance) .wrn(sprintf('`distance = "%s"` will be deprecated; please use `distance = "glm", link = "%s"` in the future', distance, link)) distance <- "glm" attr(distance, "link") <- link } else if (tolower(distance) %in% tolower(c("GAMcloglog", "GAMlog", "GAMlogit", "GAMprobit"))) { link <- tolower(substr(distance, 4, nchar(distance))) .wrn(sprintf('`distance = "%s"` will be deprecated; please use `distance = "gam", link = "%s"` in the future', distance, link)) distance <- "gam" attr(distance, "link") <- link } else if (tolower(distance) == "logit") { distance <- "glm" attr(distance, "link") <- "logit" } else if (tolower(distance) == "glmnet") { distance <- "elasticnet" } else if (!tolower(distance) %in% allowable.distances) { .err('the argument supplied to `distance` is not an allowable value. 
See `help("distance", package = "MatchIt")` for allowable options') } else if (is_not_null(method) && method == "subclass" && tolower(distance) %in% matchit_distances()) { .err(sprintf('`distance` cannot be %s with `method = "subclass"`', add_quotes(distance))) } else { distance <- tolower(distance) } return(distance) } if (!is.numeric(distance) || (is_not_null(dim(distance)) && length(dim(distance)) != 2)) { .err("`distance` must be a string with the name of the distance measure to be used or a numeric vector or matrix containing distance measures") } if (is.matrix(distance) && (is_null(method) || !method %in% c("nearest", "optimal", "full"))) { .err(sprintf("`distance` cannot be supplied as a matrix with `method = %s`", add_quotes(method, quotes = is_not_null(method)))) } if (is.matrix(distance)) { dim.distance <- dim(distance) if (all_equal_to(dim.distance, length(treat))) { if (is_not_null(rownames(distance))) { distance <- distance[names(treat), , drop = FALSE] } if (is_not_null(colnames(distance))) { distance <- distance[, names(treat), drop = FALSE] } distance <- distance[treat == 1, treat == 0, drop = FALSE] } else if (dim.distance[1L] == sum(treat == 1) && dim.distance[2L] == sum(treat == 0)) { if (is_not_null(rownames(distance))) { distance <- distance[names(treat)[treat == 1], , drop = FALSE] } if (is_not_null(colnames(distance))) { distance <- distance[, names(treat)[treat == 0], drop = FALSE] } } else { .err("when supplied as a matrix, `distance` must have dimensions NxN or N1xN0. 
See `help(\"distance\")` for details") } } else if (length(distance) != length(treat)) { .err("`distance` must be the same length as the dataset if specified as a numeric vector") } chk::chk_not_any_na(distance) distance } #Function to check ratio is acceptable process.ratio <- function(ratio, method = NULL, ..., min.controls = NULL, max.controls = NULL) { #Should be run after process.inputs() and ignored inputs set to NULL ratio.null <- is_null(ratio) ratio.na <- !ratio.null && anyNA(ratio) if (is_null(method)) { return(1) } if (method %in% c("nearest", "optimal")) { if (ratio.null) { ratio <- 1 } else { chk::chk_number(ratio) chk::chk_gte(ratio, 1) } if (is_null(max.controls)) { if (!chk::vld_whole_number(ratio)) { .err("`ratio` must be a whole number when `max.controls` is not specified") } ratio <- round(ratio) } else { chk::chk_count(max.controls) if (ratio == 1) { .err("`ratio` must be greater than 1 for variable ratio matching") } if (max.controls <= ratio) { .err("`max.controls` must be greater than `ratio` for variable ratio matching") } if (is_null(min.controls)) { min.controls <- 1 } else { chk::chk_count(min.controls) } if (min.controls < 1) { .err("`min.controls` cannot be less than 1 for variable ratio matching") } if (min.controls >= ratio) { .err("`min.controls` must be less than `ratio` for variable ratio matching") } } } else if (method == "full") { if (is_null(max.controls)) { max.controls <- Inf } else { chk::chk_number(max.controls) chk::chk_gt(max.controls, 0) } if (is_null(min.controls)) { min.controls <- 0 } else { chk::chk_number(min.controls) chk::chk_gt(min.controls, 0) } ratio <- 1 #Just to get min.controls and max.controls out } else if (method == "genetic") { if (ratio.null) { ratio <- 1 } else { chk::chk_count(ratio) } min.controls <- max.controls <- NULL } else if (method == "cardinality") { if (ratio.null) { ratio <- 1 } else if (!ratio.na && (!chk::vld_number(ratio) || !chk::vld_gte(ratio, 0))) { .err("`ratio` must be a single 
positive number or `NA`") } min.controls <- max.controls <- NULL } else { min.controls <- max.controls <- NULL } if (is_not_null(ratio)) { attr(ratio, "min.controls") <- min.controls attr(ratio, "max.controls") <- max.controls } ratio } #Function to check if caliper is okay and process it process.caliper <- function(caliper = NULL, method = NULL, data = NULL, covs = NULL, mahcovs = NULL, distance = NULL, discarded = NULL, std.caliper = TRUE) { #Check method; must be able to use a caliper #Check caliper names; if "" is one of them but distance = "mahal", throw error; #otherwise make sure variables exist in data or covs #Make sure no calipers are used on binary or factor variables (throw error if so) #Ignore calipers used on single-value variables or with caliper = NA or Inf #Export caliper.formula to add to covs #If std, export standardized versions #Check need for caliper if (is_null(caliper) || is_null(method) || !method %in% c("nearest", "genetic", "full", "quick")) { return(NULL) } #Check if form of caliper is okay if (!is.atomic(caliper) || !is.numeric(caliper)) { .err("`caliper` must be a numeric vector") } #Check caliper names if (length(caliper) == 1L && (is_null(names(caliper)) || identical(names(caliper), ""))) { names(caliper) <- "" } else if (is_null(names(caliper))) { .err("`caliper` must be a named vector with names corresponding to the variables for which a caliper is to be applied") } else if (anyNA(names(caliper))) { .err("`caliper` names cannot include `NA`") } else if (sum(!nzchar(names(caliper))) > 1L) { .err("no more than one entry in `caliper` can have no name") } if (hasName(caliper, "") && is_null(distance)) { .err("all entries in `caliper` must be named when `distance` does not correspond to a propensity score") } #Check if caliper name is in available data cal.in.data <- setNames(names(caliper) %in% names(data), names(caliper)) cal.in.covs <- setNames(names(caliper) %in% names(covs), names(caliper)) cal.in.mahcovs <- setNames(names(caliper) 
%in% names(mahcovs), names(caliper)) if (any(nzchar(names(caliper)) & !cal.in.covs & !cal.in.data)) { .err(paste0("All variables named in `caliper` must be in `data`. Variables not in `data`:\n\t", toString(names(caliper)[nzchar(names(caliper)) & !cal.in.data & !cal.in.covs & !cal.in.mahcovs])), tidy = FALSE) } #Check std.caliper chk::chk_logical(std.caliper) if (length(std.caliper) == 1L) { std.caliper <- rep_with(std.caliper, caliper) } else if (length(std.caliper) == length(caliper)) { names(std.caliper) <- names(caliper) } else { .err("`std.caliper` must be the same length as `caliper`") } #Remove trivial calipers caliper <- caliper[is.finite(caliper)] if (is_null(caliper)) { return(NULL) } #Ensure no calipers on categorical variables cat.vars <- vapply(names(caliper), function(x) { v <- { if (!nzchar(x)) distance else if (cal.in.data[x]) data[[x]] else if (cal.in.covs[x]) covs[[x]] else mahcovs[[x]] } chk::vld_character_or_factor(v) }, logical(1L)) if (any(cat.vars)) { .err(paste0("Calipers cannot be used with factor or character variables. 
Offending variables:\n\t", toString(ifelse(nzchar(names(caliper)), names(caliper), "")[cat.vars])), tidy = FALSE) } #Process calipers according to std.caliper std.caliper <- std.caliper[names(std.caliper) %in% names(caliper)] chk::chk_not_any_na(std.caliper) if (any(std.caliper)) { if (hasName(std.caliper, "") && isTRUE(std.caliper[!nzchar(names(std.caliper))]) && is.matrix(distance)) { .err("when `distance` is supplied as a matrix and a caliper for it is specified, `std.caliper` must be `FALSE` for the distance measure") } caliper[std.caliper] <- caliper[std.caliper] * vapply(names(caliper)[std.caliper], function(x) { if (!nzchar(x)) sd(distance[!discarded]) else if (cal.in.data[x]) sd(data[[x]][!discarded]) else if (cal.in.covs[x]) sd(covs[[x]][!discarded]) else sd(mahcovs[[x]][!discarded]) }, numeric(1L)) } if (any(caliper < 0) && !method %in% c("nearest", "genetic", "full")) { .err(sprintf("calipers cannot be negative with `method = %s`", add_quotes(method))) } #Add cal.formula if (any(nzchar(names(caliper)) & !cal.in.covs[names(caliper)] & !cal.in.mahcovs[names(caliper)])) { attr(caliper, "cal.formula") <- reformulate(names(caliper)[nzchar(names(caliper)) & !cal.in.covs[names(caliper)] & !cal.in.mahcovs[names(caliper)]]) } caliper } #Function to process replace argument process.replace <- function(replace, method = NULL, ..., reuse.max = NULL) { if (is_null(method)) { return(FALSE) } if (is_null(replace)) { replace <- FALSE } chk::chk_flag(replace) if (method %in% c("nearest")) { if (is_null(reuse.max)) { reuse.max <- if (replace) .Machine$integer.max else 1L } else { chk::chk_count(reuse.max) chk::chk_gte(reuse.max, 1) if (reuse.max > .Machine$integer.max) { reuse.max <- .Machine$integer.max } } replace <- reuse.max > 1L attr(replace, "reuse.max") <- as.integer(reuse.max) } replace } #Process variable input, e.g., to exact or mahvars, that accept a string or rhs formula #Returns a model.frame object process.variable.input <- function(x, data = NULL) { n <- 
deparse1(substitute(x))
  if (is_null(x)) {
    return(NULL)
  }
  if (is.character(x)) {
    if (is_null(data) || !is.data.frame(data)) {
      .err(sprintf("if `%s` is specified as strings, a data frame containing the named variables must be supplied to `data`", n))
    }
    if (!all(hasName(data, x))) {
      .err(sprintf("All names supplied to `%s` must be variables in `data`. Variables not in `data`:\n\t%s",
                   n, toString(add_quotes(setdiff(x, names(data))))), tidy = FALSE)
    }
    x <- reformulate(x)
  } else if (rlang::is_formula(x)) {
    x <- update(terms(x, data = data), NULL ~ .)
  } else {
    .err(sprintf("`%s` must be supplied as a character vector of names or a one-sided formula.", n))
  }
  x_covs <- model.frame(x, data, na.action = "na.pass")
  if (anyNA(x_covs)) {
    .err(sprintf("missing values are not allowed in the covariates named in `%s`", n))
  }
  x_covs
}
MatchIt/R/matchit2full.R0000644000176200001440000004105714763323324014543 0ustar liggesusers#' Optimal Full Matching
#' @name method_full
#' @aliases method_full
#' @usage NULL
#'
#' @description
#' In [matchit()], setting `method = "full"` performs optimal full
#' matching, which is a form of subclassification wherein all units, both
#' treatment and control (i.e., the "full" sample), are assigned to a subclass
#' and receive at least one match. The matching is optimal in the sense that
#' the sum of the absolute distances between the treated and control units in
#' each subclass is as small as possible. The method relies on and is a wrapper
#' for \pkgfun{optmatch}{fullmatch}.
#'
#' Advantages of optimal full matching include that the matching order is not
#' required to be specified, units do not need to be discarded, and it is less
#' likely that extreme within-subclass distances will be large, unlike with
#' standard subclassification.
The primary output of full matching is a set of
#' matching weights that can be applied to the matched sample; in this way,
#' full matching can be seen as a robust alternative to propensity score
#' weighting, robust in the sense that the propensity score model does not need
#' to be correct to estimate the treatment effect without bias. Note: with large samples, the optimization may fail or run very slowly; one can try using [`method = "quick"`][method_quick] instead, which also performs full matching but can be much faster.
#'
#' This page details the allowable arguments with `method = "full"`.
#' See [matchit()] for an explanation of what each argument means in a general
#' context and how it can be specified.
#'
#' Below is how `matchit()` is used for optimal full matching:
#' \preformatted{
#' matchit(formula,
#'         data = NULL,
#'         method = "full",
#'         distance = "glm",
#'         link = "logit",
#'         distance.options = list(),
#'         estimand = "ATT",
#'         exact = NULL,
#'         mahvars = NULL,
#'         antiexact = NULL,
#'         discard = "none",
#'         reestimate = FALSE,
#'         s.weights = NULL,
#'         caliper = NULL,
#'         std.caliper = TRUE,
#'         verbose = FALSE,
#'         ...)
#' }
#'
#' @param formula a two-sided [formula] object containing the treatment and
#' covariates to be used in creating the distance measure used in the matching.
#' This formula will be supplied to the functions that estimate the distance
#' measure.
#' @param data a data frame containing the variables named in `formula`.
#' If not found in `data`, the variables will be sought in the
#' environment.
#' @param method set here to `"full"`.
#' @param distance the distance measure to be used. See [`distance`]
#' for allowable options. Can be supplied as a distance matrix.
#' @param link when `distance` is specified as a method of estimating
#' propensity scores, an additional argument controlling the link function used
#' in estimating the distance measure. See [`distance`] for allowable
#' options with each method.
#' @param distance.options a named list containing additional arguments #' supplied to the function that estimates the distance measure as determined #' by the argument to `distance`. #' @param estimand a string containing the desired estimand. Allowable options #' include `"ATT"`, `"ATC"`, and `"ATE"`. The estimand controls #' how the weights are computed; see the Computing Weights section at #' [matchit()] for details. #' @param exact for which variables exact matching should take place. #' @param mahvars for which variables Mahalanobis distance matching should take #' place when `distance` corresponds to a propensity score (e.g., for #' caliper matching or to discard units for common support). If specified, the #' distance measure will not be used in matching. #' @param antiexact for which variables anti-exact matching should take place. #' Anti-exact matching is processed using \pkgfun{optmatch}{antiExactMatch}. #' @param discard a string containing a method for discarding units outside a #' region of common support. Only allowed when `distance` corresponds to a #' propensity score. #' @param reestimate if `discard` is not `"none"`, whether to #' re-estimate the propensity score in the remaining sample prior to matching. #' @param s.weights the variable containing sampling weights to be incorporated #' into propensity score models and balance statistics. #' @param caliper the width(s) of the caliper(s) used for caliper matching. #' Calipers are processed by \pkgfun{optmatch}{caliper}. Positive and negative calipers are allowed. See Notes and Examples. #' @param std.caliper `logical`; when calipers are specified, whether they #' are in standard deviation units (`TRUE`) or raw units (`FALSE`). #' @param verbose `logical`; whether information about the matching #' process should be printed to the console. #' @param \dots additional arguments passed to \pkgfun{optmatch}{fullmatch}. 
#' Allowed arguments include `min.controls`, `max.controls`, #' `omit.fraction`, `mean.controls`, `tol`, and `solver`. #' See the \pkgfun{optmatch}{fullmatch} documentation for details. In general, #' `tol` should be set to a low number (e.g., `1e-7`) to get a more #' precise solution. #' #' The arguments `replace`, `m.order`, and `ratio` are ignored with a warning. #' #' @section Outputs: All outputs described in [matchit()] are returned with #' `method = "full"` except for `match.matrix`. This is because #' matching strata are not indexed by treated units as they are in some other #' forms of matching. When `include.obj = TRUE` in the call to #' `matchit()`, the output of the call to \pkgfun{optmatch}{fullmatch} will be #' included in the output. When `exact` is specified, this will be a list #' of such objects, one for each stratum of the `exact` variables. #' #' @details #' ## Mahalanobis Distance Matching #' #' Mahalanobis distance matching can be done one of two ways: #' #' \enumerate{ #' \item{ #' If no propensity score needs to be estimated, `distance` should be #' set to `"mahalanobis"`, and Mahalanobis distance matching will occur #' using all the variables in `formula`. Arguments to `discard` and #' `mahvars` will be ignored, and a caliper can only be placed on named #' variables. For example, to perform simple Mahalanobis distance matching, the #' following could be run: #' #' \preformatted{ #' matchit(treat ~ X1 + X2, method = "full", #' distance = "mahalanobis") } #' #' With this code, the Mahalanobis distance is computed using `X1` and #' `X2`, and matching occurs on this distance. The `distance` #' component of the `matchit()` output will be empty. 
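#'
#' As an additional illustrative sketch (the covariate names `X1` and `X2`
#' are hypothetical), a caliper on a named variable in this setup might
#' look like:
#'
#' \preformatted{
#' matchit(treat ~ X1 + X2, method = "full",
#'         distance = "mahalanobis",
#'         caliper = c(X1 = .2)) }
#'
#' Here, pairs of units differing on `X1` by more than .2 standard
#' deviations (or raw units, if `std.caliper = FALSE`) are forbidden from
#' being placed in the same subclass.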
#' } #' \item{ #' If a propensity score needs to be estimated for any reason, e.g., for #' common support with `discard` or for creating a caliper, #' `distance` should be whatever method is used to estimate the propensity #' score or a vector of distance measures, i.e., it should not be #' `"mahalanobis"`. Use `mahvars` to specify the variables used to #' create the Mahalanobis distance. For example, to perform Mahalanobis within #' a propensity score caliper, the following could be run: #' #' \preformatted{ #' matchit(treat ~ X1 + X2 + X3, method = "full", #' distance = "glm", caliper = .25, #' mahvars = ~ X1 + X2) } #' #' With this code, `X1`, `X2`, and `X3` are used to estimate the #' propensity score (using the `"glm"` method, which by default is #' logistic regression), which is used to create a matching caliper. The actual #' matching occurs on the Mahalanobis distance computed only using `X1` #' and `X2`, which are supplied to `mahvars`. Units whose propensity #' score difference is larger than the caliper will not be paired, and some #' treated units may therefore not receive a match. The estimated propensity #' scores will be included in the `distance` component of the #' `matchit()` output. See Examples. #' } #' } #' #' @note Calipers can only be used when `min.controls` is left at its #' default. #' #' The option `"optmatch_max_problem_size"` is automatically set to #' `Inf` during the matching process, different from its default in #' *optmatch*. This enables matching problems of any size to be run, but #' may also let huge, infeasible problems get through and potentially take a #' long time or crash R. See \pkgfun{optmatch}{setMaxProblemSize} for more details. #' #' @seealso [matchit()] for a detailed explanation of the inputs and outputs of #' a call to `matchit()`. #' #' \pkgfun{optmatch}{fullmatch}, which is the workhorse. 
#' #' [`method_optimal`] for optimal pair matching, which is a special #' case of optimal full matching, and which relies on similar machinery. #' Results from `method = "optimal"` can be replicated with `method = "full"` by setting `min.controls`, `max.controls`, and #' `mean.controls` to the desired `ratio`. #' #' [`method_quick`] for fast generalized quick matching, which is very similar to optimal full matching but can be dramatically faster at the expense of optimality and is less customizable. #' #' @references In a manuscript, be sure to cite the following paper if using #' `matchit()` with `method = "full"`: #' #' Hansen, B. B., & Klopfer, S. O. (2006). Optimal Full Matching and Related #' Designs via Network Flows. *Journal of Computational and Graphical Statistics*, #' 15(3), 609–627. \doi{10.1198/106186006X137047} #' #' For example, a sentence might read: #' #' *Optimal full matching was performed using the MatchIt package (Ho, #' Imai, King, & Stuart, 2011) in R, which calls functions from the optmatch #' package (Hansen & Klopfer, 2006).* #' #' Theory is also developed in the following article: #' #' Hansen, B. B. (2004). Full Matching in an Observational Study of Coaching #' for the SAT. Journal of the American Statistical Association, 99(467), #' 609–618. 
\doi{10.1198/016214504000000647} #' #' @examplesIf requireNamespace("optmatch", quietly = TRUE) #' data("lalonde") #' #' # Optimal full PS matching #' m.out1 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "full") #' m.out1 #' summary(m.out1) #' #' # Optimal full Mahalanobis distance matching within a PS caliper #' m.out2 <- matchit(treat ~ age + educ + race + nodegree + #' married + re74 + re75, #' data = lalonde, #' method = "full", #' caliper = .01, #' mahvars = ~ age + educ + re74 + re75) #' m.out2 #' summary(m.out2, un = FALSE) #' #' # Optimal full Mahalanobis distance matching within calipers #' # of 500 on re74 and re75 #' m.out3 <- matchit(treat ~ age + educ + re74 + re75, #' data = lalonde, #' distance = "mahalanobis", #' method = "full", #' caliper = c(re74 = 500, #' re75 = 500), #' std.caliper = FALSE) #' m.out3 #' summary(m.out3, #' addlvariables = ~race + nodegree + married, #' data = lalonde, #' un = FALSE) NULL matchit2full <- function(treat, formula, data, distance, discarded, ratio = NULL, s.weights = NULL, #min.controls and max.controls in attrs of replace caliper = NULL, mahvars = NULL, exact = NULL, estimand = "ATT", verbose = FALSE, is.full.mahalanobis, antiexact = NULL, ...) { rlang::check_installed("optmatch") .cat_verbose("Full matching... 
\n", verbose = verbose) .args <- c("omit.fraction", "mean.controls", "tol", "solver") A <- ...mget(.args) A[lengths(A) == 0L] <- NULL estimand <- toupper(estimand) estimand <- match_arg(estimand, c("ATT", "ATC", "ATE")) if (estimand == "ATC") { tc <- c("control", "treated") focal <- 0 } else { tc <- c("treated", "control") focal <- 1 } treat_ <- setNames(as.integer(treat[!discarded] == focal), names(treat)[!discarded]) # if (is_not_null(data)) data <- data[!discarded,] if (is.full.mahalanobis) { if (is_null(attr(terms(formula, data = data), "term.labels"))) { .err(sprintf("covariates must be specified in the input formula when `distance = \"%s\"`", attr(is.full.mahalanobis, "transform"))) } mahvars <- formula } min.controls <- attr(ratio, "min.controls") max.controls <- attr(ratio, "max.controls") #Exact matching strata if (is_not_null(exact)) { ex <- factor(exactify(model.frame(exact, data = data), sep = ", ", include_vars = TRUE)[!discarded]) cc <- Reduce("intersect", lapply(unique(treat_), function(t) unclass(ex)[treat_ == t])) if (is_null(cc)) { .err("No matches were found") } } else { ex <- gl(1, length(treat_), labels = "_") cc <- 1 } #Create distance matrix; note that Mahalanobis distance computed using entire #sample (minus discarded), like method2nearest, as opposed to within exact strata, like optmatch. 
if (is_not_null(mahvars)) { transform <- if (is.full.mahalanobis) attr(is.full.mahalanobis, "transform") else "mahalanobis" mahcovs <- transform_covariates(mahvars, data = data, method = transform, s.weights = s.weights, treat = treat, discarded = discarded) mo <- eucdist_internal(mahcovs, treat) } else if (is.matrix(distance)) { mo <- distance } else { mo <- eucdist_internal(setNames(distance, names(treat)), treat) } #Transpose distance mat as needed if (focal == 0) { mo <- t(mo) } #Remove discarded units from distance mat mo <- mo[!discarded[treat == focal], !discarded[treat != focal], drop = FALSE] dimnames(mo) <- list(names(treat_)[treat_ == 1], names(treat_)[treat_ == 0]) mo <- optmatch::match_on(mo, data = as.data.frame(data)[!discarded, , drop = FALSE]) mo <- optmatch::as.InfinitySparseMatrix(mo) #Process antiexact if (is_not_null(antiexact)) { antiexactcovs <- model.frame(antiexact, data) for (i in seq_len(ncol(antiexactcovs))) { mo <- mo + optmatch::antiExactMatch(antiexactcovs[[i]][!discarded], z = treat_) } } #Process caliper if (is_not_null(caliper)) { if (min.controls != 0) { .err("calipers cannot be used with `method = \"full\"` when `min.controls` is specified") } if (any(nzchar(names(caliper)))) { cov.cals <- setdiff(names(caliper), "") calcovs <- get_covs_matrix(reformulate(cov.cals, intercept = FALSE), data = data) } for (i in seq_along(caliper)) { if (nzchar(names(caliper)[i])) { mo_cal <- optmatch::match_on(setNames(calcovs[!discarded, names(caliper)[i]], names(treat_)), z = treat_) } else if (is_null(mahvars) || is.matrix(distance)) { mo_cal <- mo } else { mo_cal <- optmatch::match_on(setNames(distance[!discarded], names(treat_)), z = treat_) } mo <- mo + optmatch::caliper(mo_cal, abs(caliper[i]), compare = if (caliper[i] >= 0) `<=` else `>=`) } rm(mo_cal) } #Initialize pair membership; must include names pair <- rep_with(NA_character_, treat) p <- setNames(vector("list", nlevels(ex)), levels(ex)) A$data <- data.frame(treat) #just to get 
rownames; not actually used in matching A$min.controls <- min.controls A$max.controls <- max.controls for (e in levels(ex)[cc]) { if (nlevels(ex) > 1L) { .cat_verbose(sprintf("Matching subgroup %s/%s: %s...\n", match(e, levels(ex)[cc]), length(cc), e), verbose = verbose) mo_ <- mo[ex[treat_ == 1] == e, ex[treat_ == 0] == e] } else { mo_ <- mo } if (any(dim(mo_) == 0) || !any(is.finite(mo_))) { next } if (all_equal_to(dim(mo_), 1) && all(is.finite(mo_))) { pair[ex == e] <- paste(1, e, sep = "|") next } A$x <- mo_ rlang::with_options({ matchit_try({ p[[e]] <- do.call(optmatch::fullmatch, A) }, from = "optmatch") }, optmatch_max_problem_size = Inf) pair[names(p[[e]])[!is.na(p[[e]])]] <- paste(as.character(p[[e]][!is.na(p[[e]])]), e, sep = "|") } if (all(is.na(pair))) { .err("No matches were found") } if (length(p) == 1L) { p <- p[[1]] } psclass <- factor(pair) levels(psclass) <- seq_len(nlevels(psclass)) names(psclass) <- names(treat) #No match.matrix because treated units don't index matched strata (i.e., more than one #treated unit can be in the same stratum). Stratum information is contained in subclass. .cat_verbose("Calculating matching weights... ", verbose = verbose) res <- list(subclass = psclass, weights = get_weights_from_subclass(psclass, treat, estimand), obj = p) .cat_verbose("Done.\n", verbose = verbose) class(res) <- "matchit" res } MatchIt/R/rbind.matchdata.R0000644000176200001440000001336614762402770015174 0ustar liggesusers#' Append matched datasets together #' #' These functions are [rbind()] methods for objects resulting from calls to #' [match_data()] and [get_matches()]. They function nearly identically to #' `rbind.data.frame()`; see Details for how they differ. #' #' @aliases rbind.matchdata rbind.getmatches #' #' @param \dots Two or more `matchdata` or `getmatches` objects the #' output of calls to [match_data()] and [get_matches()], respectively. #' Supplied objects must either be all `matchdata` objects or all #' `getmatches` objects. 
#' @param deparse.level Passed to [rbind()]. #' #' @return An object of the same class as those supplied to it (i.e., a #' `matchdata` object if `matchdata` objects are supplied and a #' `getmatches` object if `getmatches` objects are supplied). #' [rbind()] is called on the objects after adjusting the variables so that the #' appropriate method will be dispatched corresponding to the class of the #' original data object. #' #' @details #' `rbind()` appends two or more datasets row-wise. This can be useful #' when matching was performed separately on subsets of the original data and #' they are to be combined into a single dataset for effect estimation. Using #' the regular `data.frame` method for `rbind()` would pose a #' problem, however; the `subclass` variable would have repeated names #' across different datasets, even though units only belong to the subclasses #' in their respective datasets. `rbind.matchdata()` renames the #' subclasses so that the correct subclass membership is maintained. #' #' The supplied matched datasets must be generated from the same original #' dataset, that is, having the same variables in it. The added components #' (e.g., weights, subclass) can be named differently in different datasets but #' will be changed to have the same name in the output. #' #' `rbind.getmatches()` and `rbind.matchdata()` are identical. #' #' @author Noah Greifer #' @seealso [match_data()], [rbind()] #' #' See `vignette("estimating-effects")` for details on using #' `rbind()` for effect estimation after subsetting the data. 
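#'
#' As a conceptual sketch of the renaming step described above (using `md`
#' as a placeholder for one supplied matched dataset), each dataset's
#' subclass levels are prefixed with that dataset's position among the
#' inputs, so levels cannot collide across datasets:
#'
#' \preformatted{
#' # d is the position of the dataset among those supplied
#' levels(md$subclass) <- paste(d, levels(md$subclass), sep = "_") }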
#' #' @examples #' #' data("lalonde") #' #' # Matching based on race subsets #' m.out_b <- matchit(treat ~ age + educ + married + #' nodegree + re74 + re75, #' data = subset(lalonde, race == "black")) #' md_b <- match_data(m.out_b) #' #' m.out_h <- matchit(treat ~ age + educ + married + #' nodegree + re74 + re75, #' data = subset(lalonde, race == "hispan")) #' md_h <- match_data(m.out_h) #' #' m.out_w <- matchit(treat ~ age + educ + married + #' nodegree + re74 + re75, #' data = subset(lalonde, race == "white")) #' md_w <- match_data(m.out_w) #' #' #Bind the datasets together #' md_all <- rbind(md_b, md_h, md_w) #' #' #Subclass conflicts are avoided #' levels(md_all$subclass) #' #' @exportS3Method rbind matchdata rbind.matchdata <- function(..., deparse.level = 1) { allargs <- list(...) allargs <- allargs[lengths(allargs) > 0L] if (is_null(names(allargs))) { md_list <- allargs allargs <- list() } else { md_list <- allargs[!nzchar(names(allargs))] allargs[!nzchar(names(allargs))] <- NULL } allargs$deparse.level <- deparse.level type <- intersect(c("matchdata", "getmatches"), unlist(lapply(md_list, class))) if (is_null(type)) { .err("a `matchdata` or `getmatches` object must be supplied") } if (length(type) == 2L) { .err("supplied objects must be all `matchdata` objects or all `getmatches` objects") } attrs <- c("distance", "weights", "subclass", "id") attr_list <- setNames(vector("list", length(attrs)), attrs) key_attrs <- setNames(rep.int(NA_character_, length(attrs)), attrs) for (i in attrs) { attr_list[[i]] <- unlist(lapply(md_list, function(m) { a <- attr(m, i) if (is_null(a)) NA_character_ else a })) if (all(is.na(attr_list[[i]]))) { attr_list[[i]] <- NULL } else { key_attrs[i] <- Find(Negate(is.na), attr_list[[i]]) } } attrs <- names(attr_list) key_attrs <- key_attrs[attrs] #Check if all non-attr columns are the same across datasets other_col_list <- lapply(seq_along(md_list), function(d) { setdiff(names(md_list[[d]]), unlist(lapply(attr_list, `[`, d))) }) for 
(d in seq_along(md_list)[-1L]) { if (length(other_col_list[[d]]) != length(other_col_list[[1L]]) || !all(other_col_list[[d]] %in% other_col_list[[1L]])) { .err(sprintf("the %s inputs must come from the same dataset", switch(type, "matchdata" = "`match_data()`", "`get_matches()`"))) } } for (d in seq_along(md_list)) { for (i in attrs) { #Rename columns of each attribute the same across datasets if (is_null(attr(md_list[[d]], i))) { md_list[[d]] <- setNames(cbind(md_list[[d]], NA), c(names(md_list[[d]]), key_attrs[i])) } else { names(md_list[[d]])[names(md_list[[d]]) == attr_list[[i]][d]] <- key_attrs[i] } #Give subclasses unique values across datasets if (i == "subclass") { if (all(is.na(md_list[[d]][[key_attrs[i]]]))) { md_list[[d]][[key_attrs[i]]] <- factor(md_list[[d]][[key_attrs[i]]], levels = NA) } else { levels(md_list[[d]][[key_attrs[i]]]) <- paste(d, levels(md_list[[d]][[key_attrs[i]]]), sep = "_") } } } #Put all columns in the same order if (d > 1) { md_list[[d]] <- md_list[[d]][names(md_list[[1L]])] } class(md_list[[d]]) <- setdiff(class(md_list[[d]]), type) } out <- do.call("rbind", c(md_list, allargs)) for (i in attrs) { attr(out, i) <- unname(key_attrs[i]) } class(out) <- c(type, class(out)) out } #' @exportS3Method rbind getmatches #' @rdname rbind.matchdata rbind.getmatches <- rbind.matchdata MatchIt/R/aux_functions.R0000644000176200001440000003016714763227504015035 0ustar liggesusers#Function to ensure no subclass is devoid of both treated and control units by "scooting" units #from other subclasses. subclass_scoot <- function(sub, treat, x, min.n = 1) { #Reassigns subclasses so there are no empty subclasses #for each treatment group. 
subtab <- table(treat, sub) if (all(subtab >= min.n)) { return(sub) } nsub <- ncol(subtab) if (any(rowSums(subtab) < nsub * min.n)) { .err(sprintf("not enough units to fit %s treated and control %s in each subclass", min.n, ngettext(min.n, "unit", "units"))) } subclass_scootC(as.integer(sub), as.integer(treat), as.numeric(x), as.integer(min.n)) } #Create info component of matchit object create_info <- function(method, fn1, link, discard, replace, ratio, mahalanobis, transform, subclass, antiexact, distance_is_matrix) { list(method = method, distance = if (is_not_null(fn1)) sub("distance2", "", fn1, fixed = TRUE) else NULL, link = if (is_not_null(link)) link else NULL, discard = discard, replace = if (is_not_null(method) && method %in% c("nearest", "genetic")) replace else NULL, ratio = if (is_not_null(method) && method %in% c("nearest", "optimal", "genetic")) ratio else NULL, max.controls = if (is_not_null(method) && method %in% c("nearest", "optimal")) attr(ratio, "max.controls") else NULL, mahalanobis = mahalanobis, transform = transform, subclass = if (is_not_null(method) && method == "subclass") length(unique(subclass[!is.na(subclass)])) else NULL, antiexact = antiexact, distance_is_matrix = distance_is_matrix) } #Function to turn a method name into a phrase describing the method info_to_method <- function(info) { out.list <- setNames(vector("list", 3L), c("kto1", "type", "replace")) out.list[["kto1"]] <- { if (is_null(info$ratio)) NULL else paste0(if (is_not_null(info$max.controls)) "variable ratio ", round(info$ratio, 2L), ":1") } out.list[["type"]] <- { if (is_null(info$method)) "none (no matching)" else switch(info$method, "exact" = "exact matching", "cem" = "coarsened exact matching", "nearest" = "nearest neighbor matching", "optimal" = "optimal pair matching", "full" = "optimal full matching", "quick" = "generalized full matching", "genetic" = "genetic matching", "subclass" = sprintf("subclassification (%s subclasses)", info$subclass), "cardinality" = 
"cardinality matching", if (is_null(attr(info$method, "method"))) "an unspecified matching method" else attr(info$method, "method")) } out.list[["replace"]] <- { if (is_null(info$replace) || !info$method %in% c("nearest", "genetic")) NULL else if (info$replace) "with replacement" else "without replacement" } firstup(do.call("paste", unname(out.list))) } info_to_distance <- function(info) { distance <- info$distance link <- info$link if (is_not_null(link) && startsWith(as.character(link), "linear")) { linear <- TRUE link <- sub("linear.", "", as.character(link)) } else { linear <- FALSE } dist <- switch(distance, "glm" = switch(link, "logit" = "logistic regression", "probit" = "probit regression", sprintf("GLM with a %s link", link)), "gam" = sprintf("GAM with a %s link", link), "gbm" = "GBM", "elasticnet" = sprintf("an elastic net with a %s link", link), "lasso" = switch(link, "logit" = "lasso logistic regression", sprintf("lasso regression with a %s link", link)), "ridge" = switch(link, "logit" = "ridge logistic regression", sprintf("ridge regression with a %s link", link)), "rpart" = "CART", "nnet" = "a neural network", "cbps" = "CBPS", "bart" = "BART", "randomforest" = "a random forest") if (linear) { dist <- paste(dist, "and linearized") } dist } #Make interaction vector out of matrix of covs; similar to interaction() exactify <- function(X, nam = NULL, sep = "|", include_vars = FALSE, justify = "right") { if (is_null(nam)) { nam <- rownames(X) } if (is.matrix(X)) { X <- as.data.frame.matrix(X) } else if (!is.list(X)) { stop("X must be a matrix, data frame, or list.") } X <- X[lengths(X) > 0] if (is_null(X)) { return(NULL) } for (i in seq_along(X)) { unique_x <- { if (is.factor(X[[i]])) levels(X[[i]]) else sort(unique(X[[i]])) } lev <- { if (include_vars) sprintf("%s = %s", names(X)[i], add_quotes(unique_x, chk::vld_character_or_factor(X[[i]]))) else if (is_null(justify)) unique_x else format(unique_x, justify = justify) } X[[i]] <- factor(X[[i]], levels = 
unique_x, labels = lev) } out <- interaction2(X, sep = sep, lex.order = if (include_vars) TRUE else NULL) if (is_null(nam)) { return(out) } setNames(out, nam) } #Get covariates (RHS) vars from formula get_covs_matrix <- function(formula = NULL, data = NULL) { if (is_null(formula)) { fnames <- colnames(data) fnames[!startsWith(fnames, "`")] <- add_quotes(fnames[!startsWith(fnames, "`")], "`") formula <- reformulate(fnames) } else { formula <- update(terms(formula, data = data), NULL ~ . + 1) } mf <- model.frame(terms(formula, data = data), data, na.action = na.pass) chars.in.mf <- vapply(mf, is.character, logical(1L)) for (i in which(chars.in.mf)) { mf[[i]] <- as.factor(mf[[i]]) } mf <- droplevels(mf) X <- model.matrix(formula, data = mf, contrasts.arg = lapply(Filter(is.factor, mf), contrasts, contrasts = FALSE)) .assign <- attr(X, "assign")[-1L] X <- X[, -1L, drop = FALSE] attr(X, "assign") <- .assign X } #Extracts and names the "assign" attribute from get_covs_matrix() get_assign <- function(mat) { if (is_null(attr(mat, "assign"))) { return(NULL) } setNames(attr(mat, "assign"), colnames(mat)) } #Convert match.matrix (mm) using numerical indices to using char rownames nummm2charmm <- function(nummm, treat) { #Assumes nummm has rownames charmm <- array(NA_character_, dim = dim(nummm), dimnames = dimnames(nummm)) charmm[] <- names(treat)[nummm] charmm } charmm2nummm <- function(charmm, treat) { nummm <- array(NA_integer_, dim = dim(charmm), dimnames = dimnames(charmm)) n_index <- setNames(seq_along(treat), names(treat)) nummm[] <- n_index[charmm] nummm } #Get subclass from match.matrix. Only to be used if replace = FALSE. See subclass2mmC.cpp for reverse. mm2subclass <- function(mm, treat, focal = NULL) { if (!is.integer(mm)) { mm <- charmm2nummm(mm, treat) } mm2subclassC(mm, treat, focal) } #Pooled within-group (weighted) covariance by group-mean centering covariates. 
Used #in Mahalanobis distance pooled_cov <- function(X, t, w = NULL) { unique_t <- unique(t) if (is_null(dim(X))) X <- matrix(X, nrow = length(X)) if (is_null(w)) { n <- nrow(X) for (i in unique_t) { in_t <- which(t == i) for (j in seq_len(ncol(X))) { X[in_t, j] <- X[in_t, j] - mean(X[in_t, j]) } } return(cov(X) * (n - 1) / (n - length(unique_t))) } for (i in unique_t) { in_t <- which(t == i) for (j in seq_len(ncol(X))) { X[in_t, j] <- X[in_t, j] - wm(X[in_t, j], w[in_t]) } } cov.wt(X, w)$cov } pooled_sd <- function(X, t, w = NULL, bin.var = NULL, contribution = "proportional") { contribution <- match_arg(contribution, c("proportional", "equal")) unique_t <- unique(t) if (is_null(dim(X))) X <- matrix(X, nrow = length(X)) n <- nrow(X) if (is_null(bin.var)) { bin.var <- apply(X, 2L, function(x) all(x == 0 | x == 1)) } if (contribution == "equal") { vars <- do.call("rbind", lapply(unique_t, function(i) { in_t <- which(t == i) vapply(seq_len(ncol(X)), function(j) { wvar(X[in_t, j], w = w[in_t], bin.var = bin.var[j]) }, numeric(1L)) })) pooled_var <- colMeans(vars) } else { pooled_var <- vapply(seq_len(ncol(X)), function(j) { x <- X[, j] b <- bin.var[j] if (b) { v <- { if (is_null(w)) vapply(unique_t, function(i) { in_i <- which(t == i) sxi <- sum(x[in_i]) ni <- length(in_i) sxi * (1 - sxi / ni) / n }, numeric(1L)) else vapply(unique_t, function(i) { in_i <- which(t == i) sxi <- sum(x[in_i] * w[in_i]) ni <- sum(w[in_i]) sxi * (1 - sxi / ni) / sum(w) }, numeric(1L)) } return(sum(v)) } if (is_null(w)) { for (i in unique_t) { in_i <- which(t == i) x[in_i] <- x[in_i] - wm(x[in_i]) } return(sum(x^2) / (n - length(unique_t))) } for (i in unique_t) { in_i <- which(t == i) x[in_i] <- x[in_i] - wm(x[in_i], w[in_i]) } w_ <- .make_sum_to_1(w) sum(w_ * x^2) / (1 - sum(w_^2)) }, numeric(1L)) } setNames(sqrt(pooled_var), colnames(X)) } #Effective sample size ESS <- function(w) { sum(w)^2 / sum(w^2) } #Compute sample sizes nn <- function(treat, weights, discarded = NULL, s.weights = 
NULL) { if (is_null(discarded)) { discarded <- rep.int(FALSE, length(treat)) } if (is_null(s.weights)) { s.weights <- rep.int(1, length(treat)) } weights <- weights * s.weights n <- matrix(0, ncol = 2L, nrow = 6L, dimnames = list(c("All (ESS)", "All", "Matched (ESS)", "Matched", "Unmatched", "Discarded"), c("Control", "Treated"))) # Control Treated n["All (ESS)", ] <- c(ESS(s.weights[treat == 0]), ESS(s.weights[treat == 1])) n["All", ] <- c(sum(treat == 0), sum(treat == 1)) n["Matched (ESS)", ] <- c(ESS(weights[treat == 0]), ESS(weights[treat == 1])) n["Matched", ] <- c(sum(treat == 0 & weights > 0), sum(treat == 1 & weights > 0)) n["Unmatched", ] <- c(sum(treat == 0 & weights == 0 & !discarded), sum(treat == 1 & weights == 0 & !discarded)) n["Discarded", ] <- c(sum(treat == 0 & discarded), sum(treat == 1 & discarded)) n } #Compute subclass sample sizes qn <- function(treat, subclass, discarded = NULL) { treat <- factor(treat, levels = 0:1, labels = c("Control", "Treated")) if (is_null(discarded)) { discarded <- rep.int(FALSE, length(treat)) } qn <- table(treat[!discarded], subclass[!discarded]) if (any(is.na(subclass) & !discarded)) { qn <- cbind(qn, table(treat[is.na(subclass) & !discarded])) colnames(qn)[ncol(qn)] <- "Unmatched" } if (any(discarded)) { qn <- cbind(qn, table(treat[discarded])) colnames(qn)[ncol(qn)] <- "Discarded" } qn <- rbind(qn, colSums(qn)) rownames(qn)[nrow(qn)] <- "Total" qn <- cbind(qn, rowSums(qn)) colnames(qn)[ncol(qn)] <- "All" qn } #Function to capture and print errors and warnings better matchit_try <- function(expr, from = NULL, dont_warn_if = NULL) { tryCatch({ withCallingHandlers({ expr }, warning = function(w) { if (is_null(dont_warn_if) || !any(vapply(dont_warn_if, grepl, logical(1L), conditionMessage(w), fixed = TRUE))) { if (is_null(from)) .wrn(conditionMessage(w), tidy = FALSE) else .wrn(sprintf("(from %s) %s", from, conditionMessage(w)), tidy = FALSE) } invokeRestart("muffleWarning") })}, error = function(e) { if 
(is_null(from)) .err(conditionMessage(e), tidy = FALSE) else .err(sprintf("(from %s) %s", from, conditionMessage(e)), tidy = FALSE) }) } MatchIt/R/summary.matchit.R0000644000176200001440000010152014762402720015257 0ustar liggesusers#' View a balance summary of a `matchit` object #' #' Computes and prints balance statistics for `matchit` and #' `matchit.subclass` objects. Balance should be assessed to ensure the #' matching or subclassification was effective at eliminating treatment group #' imbalance and should be reported in the write-up of the results of the #' analysis. #' #' @aliases summary.matchit summary.matchit.subclass print.summary.matchit #' print.summary.matchit.subclass #' #' @param object a `matchit` object; the output of a call to [matchit()]. #' @param interactions `logical`; whether to compute balance statistics #' for two-way interactions and squares of covariates. Default is `FALSE`. #' @param addlvariables additional variable for which balance statistics are to #' be computed along with the covariates in the `matchit` object. Can be #' entered in one of three ways: as a data frame of covariates with as many #' rows as there were units in the original `matchit()` call, as a string #' containing the names of variables in `data`, or as a right-sided #' `formula` with the additional variables (and possibly their #' transformations) found in `data`, the environment, or the #' `matchit` object. Balance on squares and interactions of the additional #' variables will be included if `interactions = TRUE`. #' @param standardize `logical`; whether to compute standardized #' (`TRUE`) or unstandardized (`FALSE`) statistics. The standardized #' statistics are the standardized mean difference and the mean and maximum of #' the difference in the (weighted) empirical cumulative distribution functions #' (ECDFs). The unstandardized statistics are the raw mean difference and the #' mean and maximum of the quantile-quantile (QQ) difference. 
Variance ratios #' are produced either way. See Details below. Default is `TRUE`. #' @param data an optional data frame containing variables named in #' `addlvariables` if specified as a string or formula. #' @param pair.dist `logical`; whether to compute average absolute pair #' distances. For matching methods that don't include a `match.matrix` #' component in the output (i.e., exact matching, coarsened exact matching, #' full matching, and subclassification), computing pair differences can take a #' long time, especially for large datasets and with many covariates. For other #' methods (i.e., nearest neighbor, optimal, and genetic matching), computation #' is fairly quick. Default is `FALSE` for subclassification and #' `TRUE` otherwise. #' @param un `logical`; whether to compute balance statistics for the #' unmatched sample. Default `TRUE`; set to `FALSE` for more concise #' output. #' @param improvement `logical`; whether to compute the percent reduction #' in imbalance. Default `FALSE`. Ignored if `un = FALSE`. #' @param subclass after subclassification, whether to display balance for #' individual subclasses, and, if so, for which ones. Can be `TRUE` #' (display balance for all subclasses), `FALSE` (display balance only in #' aggregate), or the indices (e.g., `1:6`) of the specific subclasses for #' which to display balance. When anything other than `FALSE`, aggregate #' balance statistics will not be displayed. Default is `FALSE`. #' @param digits the number of digits to round balance statistics to. #' @param x a `summary.matchit` or `summary.matchit.subclass` object; #' the output of a call to `summary()`. #' @param \dots ignored. 
#' #' @return For `matchit` objects, a `summary.matchit` object, which #' is a list with the following components: #' #' \item{call}{the original call to [matchit()]} #' \item{nn}{a matrix of the #' sample sizes in the original (unmatched) and matched samples} #' \item{sum.all}{if `un = TRUE`, a matrix of balance statistics for each #' covariate in the original (unmatched) sample} #' \item{sum.matched}{a matrix of #' balance statistics for each covariate in the matched sample} #' \item{reduction}{if `improvement = TRUE`, a matrix of the percent #' reduction in imbalance for each covariate in the matched sample} #' #' For `matchit.subclass` objects, a `summary.matchit.subclass` object, #' which is a list as above containing the following components: #' #' \item{call}{the original call to [matchit()]} #' \item{sum.all}{if `un = TRUE`, a matrix of balance statistics for each covariate in the original #' sample} #' \item{sum.subclass}{if `subclass` is not `FALSE`, a list #' of matrices of balance statistics for each subclass} #' \item{sum.across}{a #' matrix of balance statistics for each covariate computed using the #' subclassification weights} #' \item{reduction}{if `improvement = TRUE`, a #' matrix of the percent reduction in imbalance for each covariate in the #' matched sample} #' \item{qn}{a matrix of sample sizes within each subclass} #' \item{nn}{a matrix of the sample sizes in the original (unmatched) and #' matched samples} #' #' @details #' `summary()` computes a balance summary of a `matchit` object. This #' includes balance before and after matching or subclassification, as well as #' the percent improvement in balance. The variables for which balance #' statistics are computed are those included in the `formula`, #' `exact`, and `mahvars` arguments to [matchit()], as well as the #' distance measure if `distance` was supplied as a numeric vector or as a #' method of estimating propensity scores.
The `X` component of the #' `matchit` object is used to supply the covariates. #' #' The standardized mean differences are computed both before and after #' matching or subclassification as the difference in treatment group means #' divided by a standardization factor computed in the unmatched (original) #' sample. The standardization factor depends on the argument supplied to #' `estimand` in `matchit()`: for `"ATT"`, it is the standard #' deviation in the treated group; for `"ATC"`, it is the standard #' deviation in the control group; for `"ATE"`, it is the square root of #' the average of the variances within each treatment group. The post-matching #' mean difference is computed with weighted means in the treatment groups #' using the matching or subclassification weights. #' #' The variance ratio is computed as the ratio of the treatment group #' variances. Variance ratios are not computed for binary variables because #' their variance is a function solely of their mean. After matching, weighted #' variances are computed using the formula used in [cov.wt()]. The percent #' reduction in bias is computed using the log of the variance ratios. #' #' The eCDF difference statistics are computed by creating a (weighted) eCDF #' for each group and taking the difference between them for each covariate #' value. The eCDF is a function that outputs the (weighted) proportion of #' units with covariate values at or lower than the input value. The maximum #' eCDF difference is the same thing as the Kolmogorov-Smirnov statistic. The #' values are bounded at zero and one, with values closer to zero indicating #' good overlap between the covariate distributions in the treated and control #' groups. For binary variables, all eCDF differences are equal to the #' (weighted) difference in proportion and are computed that way. #' #' The QQ difference statistics are computed by creating two samples of the #' same size by interpolating the values of the larger one. 
The values are #' arranged in order for each sample. The QQ difference for each quantile is #' the difference between the observed covariate values at that quantile #' between the two groups. The difference is on the scale of the original #' covariate. Values close to zero indicate good overlap between the covariate #' distributions in the treated and control groups. A weighted interpolation is #' used for post-matching QQ differences. For binary variables, all QQ #' differences are equal to the (weighted) difference in proportion and are #' computed that way. #' #' The pair distance is the average of the absolute differences of a variable #' between pairs. For example, if a treated unit was paired with four control #' units, that set of units would contribute four absolute differences to the #' average. Within a subclass, each combination of treated and control unit #' forms a pair that contributes once to the average. The pair distance is #' described in Stuart and Green (2008) and is the value that is minimized when #' using optimal (full) matching. When `standardize = TRUE`, the #' standardized versions of the variables are used, where the standardization #' factor is as described above for the standardized mean differences. Pair #' distances are not computed in the unmatched sample (because there are no #' pairs). Because pair distance can take a while to compute, especially with #' large datasets or for many covariates, setting `pair.dist = FALSE` is #' one way to speed up `summary()`. #' #' The effective sample size (ESS) is a measure of the size of a hypothetical #' unweighted sample with roughly the same precision as a weighted sample. When #' non-uniform matching weights are computed (e.g., as a result of full #' matching, matching with replacement, or subclassification), the ESS can be #' used to quantify the potential precision remaining in the matched sample. 
#' The ESS will always be less than or equal to the matched sample size, #' reflecting the loss in precision due to using the weights. With non-uniform #' weights, it is printed in the sample size table; otherwise, it is removed #' because it does not contain additional information above the matched sample #' size. #' #' After subclassification, the aggregate balance statistics are computed using #' the subclassification weights rather than averaging across subclasses. #' #' All balance statistics (except pair differences) are computed incorporating #' the sampling weights supplied to `matchit()`, if any. The unadjusted #' balance statistics include the sampling weights and the adjusted balance #' statistics use the matching weights multiplied by the sampling weights. #' #' When printing, `NA` values are replaced with periods (`.`), and #' the pair distance column in the unmatched and percent balance improvement #' components of the output are omitted. #' #' @seealso [summary()] for the generic method; [plot.summary.matchit()] for #' making a Love plot from `summary()` output. #' #' \pkgfun{cobalt}{bal.tab.matchit}, which also displays balance for `matchit` #' objects. #' #' @examples #' #' data("lalonde") #' m.out <- matchit(treat ~ age + educ + married + #' race + re74, #' data = lalonde, #' method = "nearest", #' exact = ~ married, #' replace = TRUE) #' #' summary(m.out, interactions = TRUE) #' #' s.out <- matchit(treat ~ age + educ + married + #' race + nodegree + re74 + re75, #' data = lalonde, #' method = "subclass") #' #' summary(s.out, addlvariables = ~log(age) + I(re74==0)) #' #' summary(s.out, subclass = TRUE) #' #' @exportS3Method summary matchit summary.matchit <- function(object, interactions = FALSE, addlvariables = NULL, standardize = TRUE, data = NULL, pair.dist = TRUE, un = TRUE, improvement = FALSE, ...) 
{ #Create covariate matrix; include caliper, exact, and mahvars X <- .process_X(object, addlvariables, data) treat <- object$treat weights <- object$weights s.weights <- { if (is_null(object$s.weights)) rep_with(1, weights) else object$s.weights } no_x <- is_null(X) if (no_x) { X <- matrix(1, nrow = length(treat), ncol = 1L, dimnames = list(names(treat), ".1")) nam <- colnames(X) } else { nam <- colnames(X) #Remove tics has_tics <- which(startsWith(nam, "`") & endsWith(nam, "`")) nam[has_tics] <- substr(nam[has_tics], 2, nchar(nam[has_tics]) - 1) } kk <- ncol(X) matched <- is_not_null(object$info$method) un <- un || !matched chk::chk_flag(interactions) chk::chk_flag(standardize) chk::chk_flag(pair.dist) chk::chk_flag(un) chk::chk_flag(improvement) s.d.denom <- { if (standardize) switch(object$estimand, "ATT" = "treated", "ATC" = "control", "ATE" = "pooled") else NULL } ## Summary Stats if (un) { aa.all <- lapply(seq_len(kk), function(i) bal1var(X[, i], tt = treat, ww = NULL, s.weights = s.weights, standardize = standardize, s.d.denom = s.d.denom)) sum.all <- do.call("rbind", aa.all) dimnames(sum.all) <- list(nam, names(aa.all[[1L]])) if (no_x) sum.all <- sum.all[-1L, , drop = FALSE] sum.all.int <- NULL } if (matched) { aa.matched <- lapply(seq_len(kk), function(i) bal1var(X[, i], tt = treat, ww = weights, s.weights = s.weights, subclass = object$subclass, mm = object$match.matrix, standardize = standardize, s.d.denom = s.d.denom, compute.pair.dist = pair.dist)) sum.matched <- do.call("rbind", aa.matched) dimnames(sum.matched) <- list(nam, names(aa.matched[[1L]])) if (no_x) sum.matched <- sum.matched[-1, , drop = FALSE] sum.matched.int <- NULL } if (!no_x && interactions) { n.int <- kk * (kk + 1) / 2 if (un) sum.all.int <- matrix(NA_real_, nrow = n.int, ncol = length(aa.all[[1L]]), dimnames = list(NULL, names(aa.all[[1]]))) if (matched) sum.matched.int <- matrix(NA_real_, nrow = n.int, ncol = length(aa.matched[[1L]]), dimnames = list(NULL, names(aa.matched[[1L]]))) 
to.remove <- rep.int(FALSE, n.int) int.names <- character(n.int) k <- 1 for (i in seq_len(kk)) { for (j in i:kk) { x2 <- X[, i] * X[, j] if (all(abs(x2) < sqrt(.Machine$double.eps)) || all(abs(x2 - X[, i]) < sqrt(.Machine$double.eps))) { #prevent interactions within same factors to.remove[k] <- TRUE } else { if (un) { sum.all.int[k, ] <- bal1var(x2, tt = treat, ww = NULL, s.weights = s.weights, standardize = standardize, s.d.denom = s.d.denom) } if (matched) { sum.matched.int[k, ] <- bal1var(x2, tt = treat, ww = weights, s.weights = s.weights, subclass = object$subclass, mm = object$match.matrix, standardize = standardize, s.d.denom = s.d.denom, compute.pair.dist = pair.dist) } if (i == j) { #Add superscript 2 int.names[k] <- paste0(nam[i], "\u00B2") } else { int.names[k] <- paste(nam[i], nam[j], sep = " * ") } } k <- k + 1 } } if (un) { rownames(sum.all.int) <- int.names sum.all <- rbind(sum.all, sum.all.int[!to.remove, , drop = FALSE]) } if (matched) { rownames(sum.matched.int) <- int.names sum.matched <- rbind(sum.matched, sum.matched.int[!to.remove, , drop = FALSE]) } } if (is_not_null(object$distance)) { if (un) { ad.all <- bal1var(object$distance, tt = treat, ww = NULL, s.weights = s.weights, standardize = standardize, s.d.denom = s.d.denom) if (exists("sum.all", inherits = FALSE)) { sum.all <- rbind(ad.all, sum.all) rownames(sum.all)[1L] <- "distance" } else { sum.all <- matrix(ad.all, nrow = 1L, dimnames = list("distance", names(ad.all))) } } if (matched) { ad.matched <- bal1var(object$distance, tt = treat, ww = weights, s.weights = s.weights, subclass = object$subclass, mm = object$match.matrix, standardize = standardize, s.d.denom = s.d.denom, compute.pair.dist = pair.dist) if (exists("sum.matched", inherits = FALSE)) { sum.matched <- rbind(ad.matched, sum.matched) rownames(sum.matched)[1L] <- "distance" } else { sum.matched <- matrix(ad.matched, nrow = 1L, dimnames = list("distance", names(ad.matched))) } } } ## Imbalance Reduction if (matched && un && 
improvement) { reduction <- matrix(NA_real_, nrow = nrow(sum.all), ncol = ncol(sum.all) - 2, dimnames = list(rownames(sum.all), colnames(sum.all)[-(1:2)])) stat.all <- abs(sum.all[, -(1:2), drop = FALSE]) stat.matched <- abs(sum.matched[, -(1:2), drop = FALSE]) #Everything but variance ratios reduction[, -2] <- 100 * (stat.all[, -2] - stat.matched[, -2]) / stat.all[, -2] #Just variance ratios; turn to log first vr.all <- abs(log(stat.all[, 2])) vr.matched <- abs(log(stat.matched[, 2])) reduction[, 2] <- 100 * (vr.all - vr.matched) / vr.all reduction[stat.all == 0 & stat.matched == 0] <- 0 reduction[stat.all == 0 & stat.matched > 0] <- -Inf } else { reduction <- NULL } #Sample size nn <- nn(treat, weights, object$discarded, s.weights) ## output res <- list(call = object$call, nn = nn, sum.all = if (un) sum.all, sum.matched = if (matched) sum.matched, reduction = reduction) class(res) <- "summary.matchit" res } #' @exportS3Method summary matchit.subclass #' @rdname summary.matchit summary.matchit.subclass <- function(object, interactions = FALSE, addlvariables = NULL, standardize = TRUE, data = NULL, pair.dist = FALSE, subclass = FALSE, un = TRUE, improvement = FALSE, ...) 
{ #Create covariate matrix X <- .process_X(object, addlvariables, data) which.subclass <- subclass treat <- object$treat weights <- object$weights s.weights <- { if (is_null(object$s.weights)) rep_with(1, weights) else object$s.weights } subclass <- object$subclass nam <- colnames(X) kk <- ncol(X) subclasses <- levels(subclass) chk::chk_flag(interactions) chk::chk_flag(standardize) chk::chk_flag(pair.dist) chk::chk_flag(un) chk::chk_flag(improvement) s.d.denom <- { if (standardize) switch(object$estimand, "ATT" = "treated", "ATC" = "control", "ATE" = "pooled") else NULL } if (isTRUE(which.subclass)) { which.subclass <- subclasses } else if (isFALSE(which.subclass)) { which.subclass <- NULL } else if (is.atomic(which.subclass) && all(which.subclass %in% seq_along(subclasses))) { which.subclass <- subclasses[which.subclass] } else { .err("`subclass` should be `TRUE`, `FALSE`, or a vector of subclass indices for which subclass balance is to be displayed") } matched <- TRUE #always compute aggregate balance so plot.summary can use it subs <- is_not_null(which.subclass) ## Aggregate Subclass #Use the estimated weights to compute aggregate balance. 
## Summary Stats sum.all <- sum.matched <- sum.subclass <- reduction <- NULL if (un) { aa.all <- setNames(lapply(seq_len(kk), function(i) bal1var(X[, i], tt = treat, ww = NULL, s.weights = s.weights, standardize = standardize, s.d.denom = s.d.denom)), colnames(X)) sum.all <- do.call("rbind", aa.all) dimnames(sum.all) <- list(nam, names(aa.all[[1L]])) sum.all.int <- NULL } if (matched) { aa.matched <- setNames(lapply(seq_len(kk), function(i) bal1var(X[, i], tt = treat, ww = weights, s.weights = s.weights, subclass = subclass, standardize = standardize, s.d.denom = s.d.denom, compute.pair.dist = pair.dist)), colnames(X)) sum.matched <- do.call("rbind", aa.matched) dimnames(sum.matched) <- list(nam, names(aa.matched[[1L]])) sum.matched.int <- NULL } if (interactions) { n.int <- kk * (kk + 1) / 2 if (un) sum.all.int <- matrix(NA_real_, nrow = n.int, ncol = length(aa.all[[1L]]), dimnames = list(NULL, names(aa.all[[1L]]))) if (matched) sum.matched.int <- matrix(NA_real_, nrow = n.int, ncol = length(aa.matched[[1L]]), dimnames = list(NULL, names(aa.matched[[1L]]))) to.remove <- rep.int(FALSE, n.int) int.names <- character(n.int) k <- 1L for (i in seq_len(kk)) { for (j in i:kk) { x2 <- X[, i] * X[, j] if (all(abs(x2) < sqrt(.Machine$double.eps)) || all(abs(x2 - X[, i]) < sqrt(.Machine$double.eps))) { #prevent interactions within same factors to.remove[k] <- TRUE } else { if (un) { sum.all.int[k, ] <- bal1var(x2, tt = treat, ww = NULL, s.weights = s.weights, standardize = standardize, s.d.denom = s.d.denom) } if (matched) { sum.matched.int[k, ] <- bal1var(x2, tt = treat, ww = weights, s.weights = s.weights, subclass = subclass, standardize = standardize, compute.pair.dist = pair.dist) } if (i == j) { int.names[k] <- paste0(nam[i], "\u00B2") } else { int.names[k] <- paste(nam[i], nam[j], sep = " * ") } } k <- k + 1L } } if (un) { rownames(sum.all.int) <- int.names sum.all <- rbind(sum.all, sum.all.int[!to.remove, , drop = FALSE]) } if (matched) { rownames(sum.matched.int) <- 
int.names sum.matched <- rbind(sum.matched, sum.matched.int[!to.remove, , drop = FALSE]) } } if (is_not_null(object$distance)) { if (un) { ad.all <- bal1var(object$distance, tt = treat, ww = NULL, s.weights = s.weights, standardize = standardize, s.d.denom = s.d.denom) sum.all <- rbind(ad.all, sum.all) rownames(sum.all)[1L] <- "distance" } if (matched) { ad.matched <- bal1var(object$distance, tt = treat, ww = weights, s.weights = s.weights, subclass = subclass, standardize = standardize, s.d.denom = s.d.denom, compute.pair.dist = pair.dist) sum.matched <- rbind(ad.matched, sum.matched) rownames(sum.matched)[1L] <- "distance" } } ## Imbalance Reduction if (un && matched && improvement) { stat.all <- abs(sum.all[, -(1:2)]) stat.matched <- abs(sum.matched[, -(1:2)]) reduction <- 100 * (stat.all - stat.matched) / stat.all reduction[stat.all == 0 & stat.matched == 0] <- 0 reduction[stat.all == 0 & stat.matched > 0] <- -Inf } ## By Subclass if (subs) { sum.subclass <- lapply(which.subclass, function(s) { #bal1var.subclass only returns unmatched stats, which is all we need within #subclasses. Otherwise, identical to matched stats. 
aa <- setNames(lapply(seq_len(kk), function(i) { bal1var.subclass(X[, i], tt = treat, s.weights = s.weights, subclass = subclass, s.d.denom = s.d.denom, standardize = standardize, which.subclass = s) }), colnames(X)) sum.sub <- matrix(NA_real_, nrow = kk, ncol = ncol(aa[[1L]]), dimnames = list(nam, colnames(aa[[1L]]))) sum.sub.int <- NULL for (i in seq_len(kk)) { sum.sub[i, ] <- aa[[i]] } if (interactions) { sum.sub.int <- matrix(NA_real_, nrow = kk * (kk + 1) / 2, ncol = length(aa[[1L]]), dimnames = list(NULL, names(aa[[1L]]))) to.remove <- rep.int(FALSE, nrow(sum.sub.int)) int.names <- character(nrow(sum.sub.int)) k <- 1L for (i in seq_len(kk)) { for (j in i:kk) { if (!to.remove[k]) { #to.remove defined above x2 <- X[, i] * X[, j] jqoi <- bal1var.subclass(x2, tt = treat, s.weights = s.weights, subclass = subclass, s.d.denom = s.d.denom, standardize = standardize, which.subclass = s) sum.sub.int[k, ] <- jqoi if (i == j) { int.names[k] <- paste0(nam[i], "\u00B2") } else { int.names[k] <- paste(nam[i], nam[j], sep = " * ") } } k <- k + 1 } } rownames(sum.sub.int) <- int.names sum.sub <- rbind(sum.sub, sum.sub.int[!to.remove, , drop = FALSE]) } if (is_not_null(object$distance)) { ad <- bal1var.subclass(object$distance, tt = treat, s.weights = s.weights, subclass = subclass, s.d.denom = s.d.denom, standardize = standardize, which.subclass = s) sum.sub <- rbind(ad, sum.sub) rownames(sum.sub)[1L] <- "distance" } sum.sub }) names(sum.subclass) <- paste("Subclass", which.subclass) } ## Sample size qn <- qn(treat, subclass, object$discarded) nn <- nn(treat, weights, object$discarded, s.weights) ## output res <- list(call = object$call, sum.all = sum.all, sum.across = sum.matched, sum.subclass = sum.subclass, reduction = reduction, qn = qn, nn = nn) class(res) <- c("summary.matchit.subclass", "summary.matchit") res } #' @exportS3Method print summary.matchit #' @rdname summary.matchit print.summary.matchit <- function(x, digits = max(3, getOption("digits") - 3), ...) 
{ if (is_not_null(x$call)) { cat("\nCall:", deparse(x$call), sep = "\n") } if (is_not_null(x$sum.all)) { cat("\nSummary of Balance for All Data:\n") print(round_df_char(x$sum.all[, -7, drop = FALSE], digits, pad = "0", na_vals = "."), right = TRUE, quote = FALSE) } if (is_not_null(x$sum.matched)) { cat("\nSummary of Balance for Matched Data:\n") if (all(is.na(x$sum.matched[, 7]))) x$sum.matched <- x$sum.matched[, -7, drop = FALSE] #Remove pair dist if empty print(round_df_char(x$sum.matched, digits, pad = "0", na_vals = "."), right = TRUE, quote = FALSE) } if (is_not_null(x$reduction)) { cat("\nPercent Balance Improvement:\n") print(round_df_char(x$reduction[, -5, drop = FALSE], 1, pad = "0", na_vals = "."), right = TRUE, quote = FALSE) } if (is_not_null(x$nn)) { cat("\nSample Sizes:\n") nn <- x$nn if (isTRUE(all.equal(nn["All (ESS)", ], nn["All", ]))) { #Don't print ESS if same as full SS nn <- nn[rownames(nn) != "All (ESS)", , drop = FALSE] } if (isTRUE(all.equal(nn["Matched (ESS)", ], nn["Matched", ]))) { #Don't print ESS if same as matched SS nn <- nn[rownames(nn) != "Matched (ESS)", , drop = FALSE] } print(round_df_char(nn, 2, pad = " ", na_vals = "."), right = TRUE, quote = FALSE) } cat("\n") invisible(x) } #' @exportS3Method print summary.matchit.subclass print.summary.matchit.subclass <- function(x, digits = max(3L, getOption("digits") - 3L), ...) 
{ if (is_not_null(x$call)) { cat("\nCall:", deparse(x$call), sep = "\n") } if (is_not_null(x$sum.all)) { cat("\nSummary of Balance for All Data:\n") print(round_df_char(x$sum.all[, -7L, drop = FALSE], digits, pad = "0", na_vals = "."), right = TRUE, quote = FALSE) } if (is_not_null(x$sum.subclass)) { cat("\nSummary of Balance by Subclass:\n") for (s in seq_along(x$sum.subclass)) { cat(paste0("\n- ", names(x$sum.subclass)[s], "\n")) print(round_df_char(x$sum.subclass[[s]][, -7L, drop = FALSE], digits, pad = "0", na_vals = "."), right = TRUE, quote = FALSE) } if (is_not_null(x$qn)) { cat("\nSample Sizes by Subclass:\n") print(round_df_char(x$qn, 2, pad = " ", na_vals = "."), right = TRUE, quote = FALSE) } } else { if (is_not_null(x$sum.across)) { cat("\nSummary of Balance Across Subclasses\n") if (all(is.na(x$sum.across[, 7L]))) x$sum.across <- x$sum.across[, -7L, drop = FALSE] print(round_df_char(x$sum.across, digits, pad = "0", na_vals = "."), right = TRUE, quote = FALSE) } if (is_not_null(x$reduction)) { cat("\nPercent Balance Improvement:\n") print(round_df_char(x$reduction[, -5L, drop = FALSE], 1L, pad = "0", na_vals = "."), right = TRUE, quote = FALSE) } if (is_not_null(x$nn)) { cat("\nSample Sizes:\n") nn <- x$nn if (isTRUE(all.equal(nn["All (ESS)", ], nn["All", ]))) { #Don't print ESS if same as full SS nn <- nn[rownames(nn) != "All (ESS)", , drop = FALSE] } if (isTRUE(all.equal(nn["Matched (ESS)", ], nn["Matched", ]))) { #Don't print ESS if same as matched SS nn <- nn[rownames(nn) != "Matched (ESS)", , drop = FALSE] } print(round_df_char(nn, 2, pad = " ", na_vals = "."), right = TRUE, quote = FALSE) } } cat("\n") } .process_X <- function(object, addlvariables = NULL, data = NULL) { X <- { if (is_null(object$X)) matrix(nrow = length(object$treat), ncol = 0) else get_covs_matrix(data = object$X) } if (is_null(addlvariables)) { return(X) } #Attempt to extract data from matchit object; same as match_data() data.found <- FALSE for (i in 1:4) { if (i == 2L) { data 
<- try(eval(object$call$data, envir = environment(object$formula)), silent = TRUE) } else if (i == 3L) { data <- try(eval(object$call$data, envir = parent.frame()), silent = TRUE) } else if (i == 4L) { data <- object[["model"]][["data"]] } if (!null_or_error(data) && length(dim(data)) == 2L && nrow(data) == length(object[["treat"]])) { data.found <- TRUE break } } if (is.character(addlvariables)) { if (is_null(data) || !is.data.frame(data)) { .err("if `addlvariables` is specified as a string, a data frame argument must be supplied to `data`") } if (!all(hasName(data, addlvariables))) { .err("all variables in `addlvariables` must be in `data`") } addlvariables <- data[addlvariables] } else if (rlang::is_formula(addlvariables)) { if (is_not_null(data) && is.data.frame(data)) { vars.in.formula <- all.vars(addlvariables) data <- cbind(data[names(data) %in% vars.in.formula], object$X[names(object$X) %in% setdiff(vars.in.formula, names(data))]) } else { data <- object$X } } else if (!is.matrix(addlvariables) && !is.data.frame(addlvariables)) { .err("the argument to `addlvariables` must be in one of the accepted forms. 
See `?summary.matchit` for details") } af <- rlang::is_formula(addlvariables) if (af) { addvariables_f <- addlvariables addlvariables <- model.frame(addvariables_f, data = data, na.action = "na.pass") } if (nrow(addlvariables) != length(object$treat)) { if (is_null(data) || data.found) { .err("variables specified in `addlvariables` must have the same number of units as are present in the original call to `matchit()`") } else { .err("`data` must have the same number of units as are present in the original call to `matchit()`") } } k <- ncol(addlvariables) for (i in seq_len(k)) { if (anyNA(addlvariables[[i]]) || (is.numeric(addlvariables[[i]]) && !all(is.finite(addlvariables[[i]])))) { covariates.with.missingness <- names(addlvariables)[i:k][vapply(i:k, function(j) anyNA(addlvariables[[j]]) || (is.numeric(addlvariables[[j]]) && !all(is.finite(addlvariables[[j]]))), logical(1L))] .err(paste0("Missing and non-finite values are not allowed in `addlvariables`. Variables with missingness or non-finite values:\n\t", toString(covariates.with.missingness)), tidy = FALSE) } if (is.character(addlvariables[[i]])) { addlvariables[[i]] <- factor(addlvariables[[i]]) } } addlvariables <- { if (af) get_covs_matrix(addvariables_f, data = data) else get_covs_matrix(data = addlvariables) } # addl_assign <- get_assign(addlvariables) cbind(X, addlvariables[, setdiff(colnames(addlvariables), colnames(X)), drop = FALSE]) } MatchIt/R/MatchIt-package.R0000644000176200001440000000067614714554300015065 0ustar liggesusers#' @keywords internal "_PACKAGE" ## usethis namespace: start #' @import graphics #' @import stats #' @importFrom grDevices devAskNewPage #' @importFrom grDevices nclass.FD #' @importFrom grDevices nclass.scott #' @importFrom grDevices nclass.Sturges #' @importFrom Rcpp evalCpp #' @importFrom utils capture.output #' @importFrom utils combn #' @importFrom utils hasName #' @useDynLib MatchIt, .registration = TRUE ## usethis namespace: end NULL 
MatchIt/R/get_weights_from_subclass.R0000644000176200001440000000304014762407632017372 0ustar liggesusersget_weights_from_subclass <- function(subclass, treat, estimand = "ATT") { NAsub <- is.na(subclass) i1 <- which(treat == 1 & !NAsub) i0 <- which(treat == 0 & !NAsub) if (is_null(i1)) { if (is_null(i0)) { .err("No units were matched") } .err("No treated units were matched") } else if (is_null(i0)) { .err("No control units were matched") } weights <- rep_with(0, treat) if (!is.factor(subclass)) { subclass <- factor(subclass, nmax = min(length(i1), length(i0))) } treated_by_sub <- tabulate(subclass[i1], nlevels(subclass)) control_by_sub <- tabulate(subclass[i0], nlevels(subclass)) subclass <- unclass(subclass) if (estimand == "ATT") { weights[i1] <- 1 weights[i0] <- (treated_by_sub / control_by_sub)[subclass[i0]] } else if (estimand == "ATC") { weights[i1] <- (control_by_sub / treated_by_sub)[subclass[i1]] weights[i0] <- 1 } else if (estimand == "ATE") { weights[i1] <- 1 + (control_by_sub / treated_by_sub)[subclass[i1]] weights[i0] <- 1 + (treated_by_sub / control_by_sub)[subclass[i0]] } weights } # get_weights_from_subclass2 <- function(subclass, treat, estimand = "ATT") { # # weights <- weights_subclassC(subclass, treat, # switch(estimand, "ATT" = 1, "ATC" = 0, NULL)) # # if (sum(weights) == 0) # .err("No units were matched") # if (sum(weights[treat == 1]) == 0) # .err("No treated units were matched") # if (sum(weights[treat == 0]) == 0) # .err("No control units were matched") # # weights # }MatchIt/vignettes/0000755000176200001440000000000014763323604013623 5ustar liggesusersMatchIt/vignettes/assessing-balance.Rmd0000644000176200001440000010330714740300476017652 0ustar liggesusers--- title: "Assessing Balance" author: "Noah Greifer" date: "`r Sys.Date()`" output: html_vignette: toc: true vignette: > %\VignetteIndexEntry{Assessing Balance} %\VignetteEngine{knitr::rmarkdown_notangle} %\VignetteEncoding{UTF-8} bibliography: references.bib link-citations: true --- 
```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, message = FALSE, fig.width=7, fig.height=5, fig.align = "center") options(width = 200, digits = 4) ``` ```{=html} ``` ## Introduction Covariate balance is the degree to which the distribution of covariates is similar across levels of the treatment. It has three main roles in causal effect estimation using matching: 1) as a target to optimize with matching, 2) as a method of assessing the quality of the resulting matches, and 3) as evidence to an audience that the estimated effect is close to the true effect. When covariate balance is achieved, the resulting effect estimate is less sensitive to model misspecification and ideally close to the true treatment effect. The benefit of randomization is that covariate balance is achieved automatically (in expectation), which is why unadjusted effects estimated from randomized trial data (in the absence of drop-out) can be validly interpreted as causal effects. When using matching to recover causal effect estimates from observational data, balance is not guaranteed and must be assessed. This document provides instructions for assessing and reporting covariate balance as part of a matching analysis. The tools available in `MatchIt` for balance assessment should be used during the process of selecting a good matching scheme and ensuring that the chosen scheme is adequate. These tools implement the recommendations of @ho2007 and others for assessing balance. In addition to the tools available in `MatchIt`, the `cobalt` package has a suite of functions designed to assess and display balance and is directly compatible with `MatchIt` objects. `cobalt` has extensive documentation, but we describe some of its functionality here as a complement to the tools in `MatchIt`.
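To make this workflow concrete before turning to the recommendations, here is a minimal sketch of fitting a matching scheme and requesting its balance summary, using the `lalonde` dataset shipped with `MatchIt`; the covariates and matching method shown are illustrative choices, not a recommendation:

```{r}
library("MatchIt")
data("lalonde")

# 1:1 nearest-neighbor propensity score matching on a few covariates
m.out <- matchit(treat ~ age + educ + married + race + re74,
                 data = lalonde,
                 method = "nearest")

# Balance statistics before and after matching, including squares
# and two-way interactions of the covariates
summary(m.out, interactions = TRUE)
```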
The structure of this document is as follows: first, we describe some of the recommendations for balance checking and their rationale; next, we describe the tools for assessing balance present in `MatchIt` and display their use in evaluating several matching schemes; finally, we briefly describe some of the functionality in `cobalt` to extend that in `MatchIt`. ## Recommendations for Balance Assessment Assessing balance involves assessing whether the distributions of covariates are similar between the treated and control groups. Balance is typically assessed by examining univariate balance summary statistics for each covariate, though more complicated methods exist for assessing joint distributional balance as well. Visual depictions of distributional balance can be a helpful complement to numerical summaries, especially for hard-to-balance and prognostically important covariates. Many recommendations for balance assessment have been described in the methodological literature. Unfortunately, there is no single best way to assess balance or to weigh balance summary statistics because the degree and form of balance that will yield the least bias in an effect estimate depends on unknown qualities of the outcome data-generating model. Nonetheless, there are a number of valuable recommendations that can be implemented to ensure matching is successful at eliminating or reducing bias. We review some of these here. Common recommendations for assessing balance include the following: - **Standardized mean differences**. The standardized mean difference (SMD) is the difference in the means of each covariate between treatment groups standardized by a standardization factor so that it is on the same scale for all covariates. The standardization factor is typically the standard deviation of the covariate in the treated group when targeting the ATT or the pooled standard deviation across both groups when targeting the ATE.
The standardization factor should be the same before and after matching to ensure changes in the mean difference are not confounded by changes in the standard deviation of the covariate. SMDs close to zero indicate good balance. Several recommended thresholds have been published in the literature; we recommend .1 and .05 for prognostically important covariates. Higher values may be acceptable when using covariate adjustment in the matched sample. In addition to computing SMDs on the covariates themselves, it is important to compute them on squares, cubes, and higher exponents as well as interactions between covariates. Several empirical studies have examined the appropriateness of using SMDs in balance assessment, including @belitser2011, @ali2014, and @stuart2013; in general, there is often a high correlation between the mean or maximum absolute SMD and the degree of bias in the treatment effect. - **Variance Ratios**. The variance ratio is the ratio of the variance of a covariate in one group to that in the other. Variance ratios close to 1 indicate good balance because they imply the variances of the samples are similar [@austin2009]. - **Empirical CDF Statistics**. Statistics related to the difference in the empirical cumulative distribution functions (eCDFs) of each covariate between groups allow assessment of imbalance across the entire covariate distribution of that covariate rather than just its mean or variance. The maximum eCDF difference, also known as the Kolmogorov-Smirnov statistic, is sometimes recommended as a useful supplement to SMDs for assessing balance [@austin2015] and is often used as a criterion in propensity score methods that attempt to optimize balance [e.g., @mccaffrey2004; @diamond2013]. Although the mean eCDF difference has not been as well studied, it provides a summary of imbalance that may be missed by relying solely on the maximum difference. - **Visual Diagnostics**.
Visual diagnostics such as eCDF plots, empirical quantile-quantile (eQQ) plots, and kernel density plots can be used to see exactly how the covariate distributions differ from each other, i.e., where in the distribution the greatest imbalances are [@ho2007; @austin2009]. This can help determine how to tailor a matching method to target imbalance in a specific region of the covariate distribution.

- **Prognostic scores**. The prognostic score is an estimate of the potential outcome under control for each unit [@hansen2008]. Balance on the prognostic score has been shown to be highly correlated with bias in the effect estimate, making it a useful tool in balance assessment [@stuart2013]. Estimating the prognostic score requires having access to the outcome data, and using it may be seen as violating the principle of separating the design and analysis stages of a matching analysis [@rubin2001]. However, because only the outcome values from the control group are required to use the prognostic score, some separation is maintained.

Several multivariate statistics exist that summarize balance across the entire joint covariate distribution. These can be functions of the above measures, like the mean or maximum absolute SMD or the generalized weighted distance [GWD; @franklin2014], which is the sum of SMDs for the covariates and their squares and interactions, or separate statistics that measure quantities that abstract away from the distribution of individual covariates, like the L1 distance [@iacus2011], the cross-match test [@heller2010], or the energy distance [@huling2020].

Balance on the propensity score has often been considered a useful measure of balance, but we do not necessarily recommend it except as a supplement to balance on the covariates. Propensity score balance will generally be good with any matching method regardless of the covariate balancing potential of the propensity score, so a balanced propensity score does not imply balanced covariates [@austin2009].
Similarly, covariates may be well balanced even when the propensity score is not balanced, such as when covariates are prioritized above the propensity score in the matching specification (e.g., with genetic matching). Given these observations, the propensity score should not be relied upon for assessing covariate balance. Simulation studies by @stuart2013 provide evidence for this recommendation against relying on propensity score balance.

There has been some debate about the use of hypothesis tests, such as t-tests or Kolmogorov-Smirnov tests, for assessing covariate balance. The idea is that balance tests test the null hypothesis that the matched sample has equivalent balance to a randomized experiment. There are several problems with balance tests, described by @ho2007 and @imai2008: 1) balance is a property of the sample, not of a population from which the sample was drawn; 2) the power of balance tests depends on the sample size, which changes during matching even if balance does not change; and 3) the use of hypothesis tests implies a uniform decision criterion for rejecting the null hypothesis (e.g., p-value less than .05, potentially with corrections for multiple comparisons), when balance should be improved without limit. `MatchIt` does not report any balance tests or p-values, instead relying on the descriptive statistics described above.

## Recommendations for Balance Reporting

A variety of methods should be used when assessing balance to try to find an optimal matched set that will ideally yield a low-error estimate of the desired effect. However, reporting every balance statistic or plot in a research report or publication can be burdensome and unnecessary. That said, it is critical to report balance to demonstrate to readers that the resulting estimate is approximately unbiased and relies little on extrapolation or correct outcome model specification.
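For reference, the statistics described above can be computed by hand. The sketch below is illustrative only (it is not `MatchIt`'s internal code, and the vectors `x1` and `x0` are simulated stand-ins for the covariate values in the treated and control groups); it computes an SMD standardized by the treated-group standard deviation (as when targeting the ATT), a variance ratio, and the maximum eCDF (Kolmogorov-Smirnov) difference for a single covariate:

```{r}
#Hand computation of three common balance statistics for one covariate;
#x1 and x0 are hypothetical treated and control covariate vectors
smd <- function(x1, x0) {
  #Standardized by the treated-group SD, as when targeting the ATT
  (mean(x1) - mean(x0)) / sd(x1)
}
var_ratio <- function(x1, x0) var(x1) / var(x0)
ks_stat <- function(x1, x0) {
  #Maximum vertical distance between the two eCDFs
  vals <- sort(unique(c(x1, x0)))
  max(abs(ecdf(x1)(vals) - ecdf(x0)(vals)))
}

set.seed(1234)
x1 <- rnorm(100, mean = .3, sd = 1)   #treated
x0 <- rnorm(150, mean = 0,  sd = 1.2) #control

c(SMD = smd(x1, x0), VR = var_ratio(x1, x0), KS = ks_stat(x1, x0))
```

After matching, the same formulas would be applied to the matched (weighted) sample, with the standardization factor still computed in the unmatched sample so that SMDs are comparable before and after matching.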
We recommend the following in reporting balance in a matching analysis:

- Report SMDs before and after matching for each covariate, any prognostically important interactions between covariates, and the prognostic score; these can be reported in a table or in a Love plot.

- Report summaries of balance for other statistics, e.g., the largest mean and maximum eCDF difference among the covariates and the largest SMD among squares, cubes, and interactions of the covariates.

`MatchIt` provides tools for calculating each of these statistics so they can be reported with ease in a manuscript or report.

## Assessing Balance with `MatchIt`

`MatchIt` contains several tools to assess balance numerically and graphically. The primary balance assessment function is `summary.matchit()`, which is called when using `summary()` on a `matchit` object and produces several tables of balance statistics before and after matching. `plot.summary.matchit()` generates a Love plot using R's base graphics system containing the standardized mean differences resulting from a call to `summary.matchit()` and provides a nice way to display balance visually for inclusion in an article or report. `plot.matchit()` generates several plots that display different elements of covariate balance, including propensity score overlap and distribution plots of the covariates. These functions together form a suite that can be used to assess and report balance in a variety of ways.

To demonstrate `MatchIt`'s balance assessment capabilities, we will use the Lalonde data included in `MatchIt` and used in `vignette("MatchIt")`. We will perform 1:1 nearest neighbor matching with replacement on the propensity score, though the functionality is identical across all matching methods except propensity score subclassification, which we illustrate at the end.
```{r}
library("MatchIt")
data("lalonde", package = "MatchIt")

#1:1 NN matching w/ replacement on a logistic regression PS
m.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, replace = TRUE)

m.out
```

### `summary.matchit()`

When `summary()` is called on a `matchit` object, several tables of information are displayed. These include balance statistics for each covariate before matching, balance statistics for each covariate after matching, the percent reduction in imbalance after matching, and the sample sizes before and after matching. `summary.matchit()` has four additional arguments that control how balance is computed:

- `interactions` controls whether balance statistics for all squares and pairwise interactions of the covariates are to be displayed in addition to the covariates themselves. The default is `FALSE`; setting it to `TRUE` can make the output massive when many covariates are present, but it is important to ensure no important interactions remain imbalanced.

- `addlvariables` allows balance to be assessed on variables other than those inside the `matchit` object. For example, if the distance between units only relied on a subset of covariates but balance needed to be achieved on all covariates, `addlvariables` could be used to supply these additional covariates. In addition to adding other variables, `addlvariables` can be used to request balance on specific functions of the covariates already in the `matchit` object, such as polynomial terms or interactions. The input to `addlvariables` can be a one-sided formula with the covariates and any desired transformations thereof on the right-hand side, just like a model formula (e.g., `addlvariables = ~ X1 + X2 + I(X1^2)` would request balance on `X1`, `X2`, and the square of `X1`). Additional variables supplied to `addlvariables` but not present in the `matchit` object can be supplied as a data frame using the `data` argument.
- `standardize` controls whether standardized or unstandardized statistics are to be displayed. Standardized statistics include the standardized mean difference and eCDF statistics; unstandardized statistics include the raw difference in means and eQQ plot statistics. (Regardless, the variance ratio will always be displayed.) The default is `TRUE` for standardized statistics, which are more common to report because they are all on the same scale regardless of the scale of the covariates[^1].

- `pair.dist` controls whether within-pair distances should be computed and displayed. These reflect the average distance between units within the same pair, standardized or unstandardized according to the argument to `standardize`. The default is `TRUE`. With full matching, exact matching, coarsened exact matching, and propensity score subclassification, computing pair distances can take a long time, so it may be beneficial to set `pair.dist` to `FALSE` in these cases.

[^1]: Note that versions of `MatchIt` before 4.0.0 had `standardize` set to `FALSE` by default.

In addition, the arguments `un` (default: `TRUE`) and `improvement` (default: `FALSE`) control whether balance prior to matching should be displayed and whether the percent balance improvement after matching should be displayed. These can be set to `FALSE` to reduce the output.

Below, we call `summary.matchit()` with `addlvariables` to display balance on the covariates and a few functions of them in the matched sample. In particular, we request balance on the square of `age`, the variables representing whether `re74` and `re75` were equal to 0, and the interaction between `educ` and `race`.

```{r}
summary(m.out, addlvariables = ~ I(age^2) + I(re74==0) + I(re75==0) + educ:race)
```

Let's examine the output in detail. The first table (`Summary of Balance for All Data`) provides balance in the sample prior to matching.
The included statistics are the mean of the covariate in the treated group (`Means Treated`), the mean of the covariate in the control group (`Means Control`), the SMD (`Std. Mean Diff.`), the variance ratio (`Var. Ratio`), the average distance between the eCDFs of the covariate across the groups (`eCDF Mean`), and the largest distance between the eCDFs (`eCDF Max`). Setting `un = FALSE` would have suppressed the creation of this table.

The second table (`Summary of Balance for Matched Data`) contains all the same statistics in the matched sample. Because we implicitly requested pair distances, an additional column for standardized pair distances (`Std. Pair Dist.`) is displayed.

The final table (`Sample Sizes`) contains the sizes of the samples before (`All`) and after (`Matched`) matching, as well as the number of units left unmatched (`Unmatched`) and the number of units dropped due to a common support restriction (`Discarded`).

The SMDs are computed as the mean difference divided by a standardization factor computed in the **unmatched** sample. An absolute SMD close to 0 indicates good balance; although a number of recommendations for acceptable values have appeared in the literature, we recommend absolute values less than .1 and less than .05 for potentially prognostically important variables.

The variance ratios are computed as the ratio of the variance of the treated group to that of the control group for each covariate. Variance ratios are not computed for binary covariates because they are a function of the prevalence in each group, which is already captured in the mean difference and eCDF statistics. A variance ratio close to 1 indicates good balance; a commonly used recommendation is for variance ratios to be between .5 and 2.

The eCDF statistics correspond to the difference in the overall distributions of the covariates between the treatment groups. The values of both statistics range from 0 to 1, with values closer to zero indicating better balance.
There are no specific recommendations for the values these statistics should take, though notably high values may indicate imbalance on higher moments of the covariates. The eQQ statistics produced when `standardize = FALSE` are interpreted similarly but are on the scale of the covariate.

All these statistics should be considered together. Imbalance as measured by any of them may indicate a potential failure of the matching scheme to achieve distributional balance.

### `plot.summary.matchit()`

A Love plot is a clean way to visually summarize balance. Using `plot()` on the output of a call to `summary()` on a `matchit` object produces a Love plot of the standardized mean differences. `plot.summary.matchit()` has several additional arguments that can be used to customize the plot.

- `abs` controls whether standardized mean differences should be displayed in absolute value or not. The default is `TRUE`.

- `var.order` controls how the variables are ordered on the y-axis. The options are `"data"` (the default), which orders the variables as they appear in the `summary.matchit()` output; `"unmatched"`, which orders the variables based on their standardized mean differences before matching; `"matched"`, which orders the variables based on their standardized mean differences after matching; and `"alphabetical"`, which orders the variables alphabetically. Using `"unmatched"` tends to result in attractive plots and ensures the legend doesn't overlap with points in its default position.

- `threshold` controls where vertical lines indicating chosen thresholds should appear on the x-axis. It should be a numeric vector. The default is `c(.1, .05)`, which displays vertical lines at .1 and .05 standardized mean difference units.

- `position` controls the position of the legend. The default is `"bottomright"`, which puts the legend in the bottom right corner of the plot; any keyword value that can be supplied to `x` in `legend()` is allowed.
Below we create a Love plot of the covariates.

```{r, fig.alt="A Love plot with most matched dots below the threshold lines, indicating good balance after matching, in contrast to the unmatched dots far from the threshold lines, indicating poor balance before matching."}
m.sum <- summary(m.out, addlvariables = ~ I(age^2) + I(re74==0) + I(re75==0) + educ:race)
plot(m.sum, var.order = "unmatched")
```

From this plot it is clear that balance was quite poor prior to matching, but matching improved balance on all covariates, most to within a threshold of .1. To make the variable names cleaner, the original variables should be renamed prior to matching. `cobalt` provides many additional options to generate and customize Love plots using the `love.plot()` function and should be used if a plot beyond what is available with `plot.summary.matchit()` is desired.

### `plot.matchit()`

In addition to numeric summaries of balance, `MatchIt` offers graphical summaries as well using `plot.matchit()` (i.e., using `plot()` on a `matchit` object). We can create eQQ plots, eCDF plots, or density plots of the covariates and histograms or jitter plots of the propensity score. The covariate plots can provide a summary of the balance of the full marginal distribution of a covariate beyond just the mean and variance. `plot.matchit()` has a few arguments to customize the output:

- `type` corresponds to the type of plot desired. Options include `"qq"` for eQQ plots (the default), `"ecdf"` for eCDF plots, `"density"` for density plots (or bar plots for categorical variables), `"jitter"` for jitter plots, and `"histogram"` for histograms.

- `interactive` controls whether the plot is interactive or not. For eQQ, eCDF, and density plots, this allows us to control when the next page of covariates is to be displayed, since only three can appear at a time. For jitter plots, this allows us to select individual units with extreme values for further inspection. The default is `TRUE`.
- `which.xs` is used to specify for which covariates to display balance in eQQ, eCDF, and density plots. The default is to display balance on all, but we can request balance on just a specific subset. If three or fewer are requested, `interactive` is ignored. The argument can be supplied as a one-sided formula with the variables of interest on the right or as a character vector containing the names of the desired variables. If any variables are not in the `matchit` object, a `data` argument can be supplied with a data set containing the named variables.

Below, we demonstrate the eQQ plot:

```{r, fig.alt ="eQQ plots of age, nodegree, and re74 in the unmatched and matched samples."}
#eQQ plot
plot(m.out, type = "qq", which.xs = ~age + nodegree + re74)
```

The y-axis displays each value of the covariate for the treated units, and the x-axis displays the value of the covariate at the corresponding quantile in the control group. When values fall on the 45-degree line, the groups are balanced. Above, we can see that `age` remains somewhat imbalanced, but `nodegree` and `re74` have much better balance after matching than before. The difference between the x and y values of each point is used to compute the eQQ difference statistics that are displayed in `summary.matchit()` with `standardize = FALSE`.

Below, we demonstrate the eCDF plot:

```{r, fig.alt ="eCDF plots of educ, married, and re75 in the unmatched and matched samples."}
#eCDF plot
plot(m.out, type = "ecdf", which.xs = ~educ + married + re75)
```

The x-axis displays the covariate values and the y-axis displays the proportion of the sample at or below that covariate value. Perfectly overlapping lines indicate good balance. The black line corresponds to the treated group and the gray line to the control group. Although `educ` and `re75` were fairly well balanced before matching, their balance has improved nonetheless. `married` appears far better balanced after matching than before.
The vertical difference between the eCDF lines of each treatment group is used to compute the eCDF difference statistics that are displayed in `summary.matchit()` with `standardize = TRUE`.

Below, we demonstrate the density plot:

```{r, fig.alt ="Density plots of age, educ, and race in the unmatched and matched samples."}
#density plot
plot(m.out, type = "density", which.xs = ~age + educ + race)
```

The x-axis displays the covariate values and the y-axis displays the density of the sample at that covariate value. For categorical variables, the y-axis displays the proportion of the sample at that covariate value. The black line corresponds to the treated group and the gray line to the control group. Perfectly overlapping lines indicate good balance. Density plots display similar information to eCDF plots but may be more intuitive for some users because of their link to histograms.

## Assessing Balance After Subclassification

With subclassification, balance can be checked both within each subclass and overall. With `summary.matchit()`, we can request to view balance only in aggregate or in each subclass. The latter can help us decide whether we can interpret effects estimated within each subclass as unbiased. The `plot.summary.matchit()` and `plot.matchit()` outputs can be requested either in aggregate or for each subclass. We demonstrate this below. First we will perform propensity score subclassification using 4 subclasses (typically more is beneficial).

```{r}
#Subclassification on a logistic regression PS
s.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "subclass", subclass = 4)

s.out
```

When using `summary()`, the default is to display balance only in aggregate using the subclassification weights. This balance output looks similar to that for other matching methods.

```{r}
summary(s.out)
```

An additional option in `summary()`, `subclass`, allows us to request balance for individual subclasses.
`subclass` can be set to `TRUE` to display balance for all subclasses or to the indices of the individual subclasses for which balance is to be displayed. Below we call `summary()` and request balance to be displayed on all subclasses (setting `un = FALSE` to suppress balance in the original sample):

```{r}
summary(s.out, subclass = TRUE, un = FALSE)
```

We can plot the standardized mean differences in a Love plot that also displays balance for the subclasses using `plot.summary.matchit()` on a `summary.matchit()` object with `subclass = TRUE`.

```{r, fig.alt ="Love plot of balance before and after subclassification, with subclass IDs representing balance within each subclass in addition to dots representing balance overall."}
s <- summary(s.out, subclass = TRUE)
plot(s, var.order = "unmatched", abs = FALSE)
```

Note that for some variables, while the groups are balanced in aggregate (black dots), the individual subclasses (gray numbers) may not be balanced, in which case unadjusted effect estimates within these subclasses should not be interpreted as unbiased.

When we plot distributional balance using `plot.matchit()`, we can again choose whether balance should be displayed in aggregate or within subclasses using the `subclass` option, which functions the same as it does with `summary.matchit()`. Below we demonstrate checking balance within a subclass.

```{r, fig.alt ="Density plots of educ, married, and re75 in the unmatched sample and in subclass 1."}
plot(s.out, type = "density", which.xs = ~educ + married + re75,
     subclass = 1)
```

If we had set `subclass = FALSE`, plots would have been displayed in aggregate using the subclassification weights. If `subclass` is unspecified, a prompt will ask us for which subclass we want to see balance.

## Assessing Balance with `cobalt`

```{r, include=FALSE}
ok <- requireNamespace("cobalt", quietly = TRUE)
```

The `cobalt` package was designed specifically for checking balance before and after matching (and weighting).
It offers three main functions, `bal.tab()`, `love.plot()`, and `bal.plot()`, which perform similar actions to `summary.matchit()`, `plot.summary.matchit()`, and `plot.matchit()`, respectively. These functions directly interface with `matchit` objects, making `cobalt` straightforward to use in conjunction with `MatchIt`. `cobalt` can be used as a complement to `MatchIt`, especially for more advanced uses that are not accommodated by `MatchIt`, such as comparing balance across different matching schemes and even different packages, assessing balance in clustered or multiply imputed data, and assessing balance with multi-category, continuous, and time-varying treatments. The main `cobalt` vignette (`vignette("cobalt", package = "cobalt")`) contains many examples of its use with `matchit` objects, so we only provide a short demonstration of its capabilities here.

```{r, message = FALSE, eval = ok}
library("cobalt")
```

### `bal.tab()`

`bal.tab()` produces tables of balance statistics similar to `summary.matchit()`. The columns displayed can be customized to limit how much information is displayed and to isolate desired information. We call `bal.tab()` with a few of its options specified below:

```{r, eval = ok}
bal.tab(m.out, un = TRUE, stats = c("m", "v", "ks"))
```

The output is very similar to that of `summary.matchit()`, except that the balance statistics computed before matching (with the suffix `.Un`) and those computed after matching (with the suffix `.Adj`) are in the same table. By default, only SMDs after matching (`Diff.Adj`) are displayed; by setting `un = TRUE`, we requested that the balance statistics before matching also be displayed, and by setting `stats = c("m", "v", "ks")` we requested mean differences, variance ratios, and Kolmogorov-Smirnov statistics. Other balance statistics and summary statistics can be requested as well.
One important detail to note is that the default for binary covariates is to print the raw difference in proportions rather than the standardized mean difference, so there will be an apparent discrepancy for these variables between the `bal.tab()` and `summary.matchit()` output, though this behavior can be changed by setting `binary = "std"` in the call to `bal.tab()`. Functionality for producing balance statistics for additional variables and for powers and interactions of the covariates is available using the `addl`, `poly`, and `int` options.

`bal.tab()` and other `cobalt` functions can produce balance not just on a single `matchit` object but on several at the same time, which facilitates comparing balance across several matching specifications. For example, if we wanted to compare the results of matching with replacement to the results of nearest neighbor matching without replacement, we could supply both to `bal.tab()`, which we demonstrate below:

```{r, eval = ok}
#Nearest neighbor (NN) matching on the PS without replacement
m.out2 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde)

#Balance on covariates after matching with and without replacement
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
        data = lalonde, un = TRUE,
        weights = list(replace = m.out, nn = m.out2))
```

This time, we supplied `bal.tab()` with the covariates and dataset and supplied the `matchit` output objects in the `weights` argument (which extracts the matching weights from the objects). Here we can see that matching with replacement yields better balance than matching without replacement overall, though balance is slightly worse for `age` and `married` and the effective sample size is lower.

### `love.plot()`

`love.plot()` creates a Love plot of chosen balance statistics. It offers many options for customization, including the shape and colors of the points, how the variable names are displayed, and for which statistics balance is to be displayed.
Below is an example of its basic use:

```{r, eval = ok, fig.alt ="Minimal Love plot of balance before and after matching."}
love.plot(m.out, binary = "std")
```

The syntax is straightforward and similar to that of `bal.tab()`. Below we demonstrate a more advanced use that customizes the appearance of the plot and displays balance not only on mean differences but also on Kolmogorov-Smirnov statistics, and for both matching specifications simultaneously.

```{r, fig.width=7, eval = ok, fig.alt ="A more elaborate Love plot displaying some of cobalt's capabilities for making publication-ready plots."}
love.plot(m.out, stats = c("m", "ks"), poly = 2, abs = TRUE,
          weights = list(nn = m.out2),
          drop.distance = TRUE, thresholds = c(m = .1),
          var.order = "unadjusted", binary = "std",
          shapes = c("circle filled", "triangle", "square"),
          colors = c("red", "blue", "darkgreen"),
          sample.names = c("Original", "NN w/ Replacement", "NN w/o Replacement"),
          position = "bottom")
```

The `love.plot()` documentation explains what each of these arguments does, along with the several other ones available. See `vignette("love.plot", package = "cobalt")` for other advanced customization of `love.plot()`.

### `bal.plot()`

`bal.plot()` displays distributional balance for a single covariate, similar to `plot.matchit()`. Its default is to display kernel density plots for continuous variables and bar graphs for categorical variables. It can also display eCDF plots and histograms.
Below we demonstrate some of its uses:

```{r, eval = ok, fig.alt = c("Density plot for educ before and after matching.", "Bar graph for race before and after matching.", "Mirrored histograms of propensity scores before and after matching.")}
#Density plot for continuous variables
bal.plot(m.out, var.name = "educ", which = "both")

#Bar graph for categorical variables
bal.plot(m.out, var.name = "race", which = "both")

#Mirrored histogram
bal.plot(m.out, var.name = "distance", which = "both",
         type = "histogram", mirror = TRUE)
```

These plots help illuminate the specific ways in which the covariate distributions differ between treatment groups, which can aid in interpreting the balance statistics provided by `bal.tab()` and `summary.matchit()`.

## Conclusion

The goal of matching is to achieve covariate balance, that is, similarity between the covariate distributions of the treated and control groups. Balance should be assessed during the matching phase to find a matching specification that works. Balance must also be reported in the write-up of a matching analysis to demonstrate to readers that matching was successful. `MatchIt` and `cobalt` each offer a suite of functions to implement best practices in balance assessment and reporting.

## References

---
title: "Matching Methods"
author: "Noah Greifer"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Matching Methods}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width = 200)
```

## Introduction

`MatchIt` implements several matching methods with a variety of options.
Though the help pages for the individual methods describe each method and how they can be used, this vignette provides a broad overview of the available matching methods and their associated options. The choice of matching method depends on the goals of the analysis (e.g., the estimand, whether low bias or high precision is important) and the unique qualities of each dataset to be analyzed, so there is no single optimal choice for any given analysis. A benefit of nonparametric preprocessing through matching is that a number of matching methods can be tried and their quality assessed without consulting the outcome, reducing the possibility of capitalizing on chance while allowing for the benefits of an exploratory analysis in the design phase [@ho2007].

This vignette describes each matching method available in `MatchIt`, the various options that are allowed with matching methods, and the consequences of their use. For a brief introduction to the use of `MatchIt` functions, see `vignette("MatchIt")`. For details on how to assess and report covariate balance, see `vignette("assessing-balance")`. For details on how to estimate treatment effects and standard errors after matching, see `vignette("estimating-effects")`.

## Matching

Matching as implemented in `MatchIt` is a form of *subset selection*, that is, the pruning and weighting of units to arrive at a (weighted) subset of the units from the original dataset. Ideally, and if done successfully, subset selection produces a new sample where the treatment is unassociated with the covariates, so that a comparison of the outcomes between the treatment and control groups is not confounded by the measured and balanced covariates. Although statistical estimation methods like regression can also be used to remove confounding due to measured covariates, @ho2007 argue that fitting regression models in matched samples reduces the dependence of the validity of the estimated treatment effect on the correct specification of the model.
Matching is nonparametric in the sense that the estimated weights and pruning of the sample are not direct functions of estimated model parameters but rather depend on the organization of discrete units in the sample; this is in contrast to propensity score weighting (also known as inverse probability weighting), where the weights come more directly from the estimated propensity score model and therefore are more sensitive to its correct specification. These advantages, along with the fact that matching is more intuitive to the public than regression or weighting, make it a robust and effective way to estimate treatment effects.

It is important to note that this implementation of matching differs from the methods described by Abadie and Imbens [-@abadie2006; -@abadie2016] and implemented in the `Matching` R package and `teffects` routine in Stata. That form of matching is *matching imputation*, where the missing potential outcomes for each unit are imputed using the observed outcomes of paired units. This is a critical distinction because matching imputation is a specific estimation method with its own effect and standard error estimators, in contrast to subset selection, which is a preprocessing method that does not require specific estimators and is broadly compatible with other parametric and nonparametric analyses. The benefits of matching imputation are that its theoretical properties (i.e., the rate of convergence and asymptotic variance of the estimator) are well understood, it can be used in a straightforward way to estimate not just the average treatment effect in the treated (ATT) but also the average treatment effect in the population (ATE), and additional effective matching methods can be used in the imputation (e.g., kernel matching).
The benefits of matching as nonparametric preprocessing are that it is far more flexible with respect to the types of effects that can be estimated because it does not involve any specific estimator, its empirical and finite-sample performance has been examined in depth and is generally well understood, and it aligns well with the design of experiments, which are more familiar to non-technical audiences.

In addition to subset selection, matching often (though not always) involves a form of *stratification*, the assignment of units to pairs or strata containing multiple units. The distinction between subset selection and stratification is described by @zubizarretaMatchingBalancePairing2014, who treat them as two separate steps. In `MatchIt`, with almost all matching methods, subset selection is performed by stratification; for example, treated units are paired with control units, and unpaired units are then dropped from the matched sample. With some methods, subclasses are used to assign matching or stratification weights to individual units, which increase or decrease each unit's leverage in a subsequent analysis. There has been some debate about the importance of stratification after subset selection; while some authors have argued that, with some forms of matching, pair membership is incidental [@stuart2008; @schafer2008], others have argued that correctly incorporating pair membership into effect estimation can improve the quality of inferences [@austin2014a; @wan2019]. For methods that allow it, `MatchIt` includes stratum membership as an additional output of each matching specification. How these strata can be used is detailed in `vignette("estimating-effects")`.

At the heart of `MatchIt` are three classes of methods: distance matching, stratum matching, and pure subset selection.
*Distance matching* involves considering a focal group (usually the treated group) and selecting members of the non-focal group (i.e., the control group) to pair with each member of the focal group based on the *distance* between units, which can be computed in one of several ways. Members of either group that are not paired are dropped from the sample. Nearest neighbor matching (`method = "nearest"`), optimal pair matching (`method = "optimal"`), optimal full matching (`method = "full"`), generalized full matching (`method = "quick"`), and genetic matching (`method = "genetic"`) are the methods of distance matching implemented in `MatchIt`. Typically, only the average treatment effect in the treated (ATT) or the average treatment effect in the control (ATC), if the control group is the focal group, can be estimated after distance matching in `MatchIt` (full matching is an exception, described later).

*Stratum matching* involves creating strata based on unique values of the covariates and assigning units with those covariate values into those strata. Any units that are in strata that lack either treated or control units are then dropped from the sample. Strata can be formed using the raw covariates (`method = "exact"`), coarsened versions of the covariates (`method = "cem"`), or coarsened versions of the propensity score (`method = "subclass"`). When no units are discarded, either the ATT, ATC, or ATE can be estimated after stratum matching, though often some units are discarded, especially with exact and coarsened exact matching, making the estimand less clear. For use in estimating marginal treatment effects after exact matching, stratification weights are computed for the matched units first by computing a new "stratum propensity score" for each unit, which is the proportion of treated units in its stratum. The formulas for computing inverse probability weights from standard propensity scores are then applied to the new stratum propensity scores to form the new weights.
*Pure subset selection* involves selecting a subset of units from the original sample without considering the distance between individual units or strata that units might fall into. Subsets are selected to optimize a criterion subject to constraints on balance and remaining sample size. Cardinality and profile matching (`method = "cardinality"`) are the methods of pure subset selection implemented in `MatchIt`. Both methods allow the user to specify the largest imbalance allowed in the resulting matched sample, and an optimization routine attempts to find the largest matched sample that satisfies those balance constraints. While cardinality matching does not target a specific estimand, profile matching can be used to target the ATT, ATC, or ATE.

Below, we describe each of the matching methods implemented in `MatchIt`.

## Matching Methods

### Nearest Neighbor Matching (`method = "nearest"`)

Nearest neighbor matching is also known as greedy matching. It involves running through the list of treated units and selecting the closest eligible control unit to be paired with each treated unit. It is greedy in the sense that each pairing occurs without reference to how other units will be or have been paired, and therefore does not aim to optimize any criterion. Nearest neighbor matching is the most common form of matching used [@thoemmes2011; @zakrison2018] and has been extensively studied through simulations. See `?method_nearest` for the documentation for `matchit()` with `method = "nearest"`.

Nearest neighbor matching requires the specification of a distance measure to define which control unit is closest to each treated unit. The default and most common distance is the *propensity score difference*, which is the difference between the propensity scores of each treated and control unit [@stuart2010]. Another popular distance is the Mahalanobis distance, described in the section "Mahalanobis distance matching" below.
The order in which the treated units are to be paired must also be specified and has the potential to change the quality of the matches [@austin2013b; @rubin1973]; this is specified by the `m.order` argument. With propensity score matching, the default is to go in descending order from the highest propensity score; doing so allows the units that would have the hardest time finding close matches to be matched first [@rubin1973]. Other orderings are possible, including random ordering, which can be tried multiple times until an adequate matched sample is found. When matching with replacement (i.e., where each control unit can be reused to be matched with any number of treated units), the matching order does not matter. When using a matching ratio greater than 1 (i.e., when more than one control unit is requested to be matched to each treated unit), matching occurs in a cycle, where each treated unit is first paired with one control unit, and then each treated unit is paired with a second control unit, etc. Ties are broken deterministically based on the order of the units in the dataset to ensure that multiple runs of the same specification yield the same result (unless the matching order is requested to be random).

Nearest neighbor matching is implemented in `MatchIt` using internal C++ code through `Rcpp`. When matching on a propensity score, this makes matching extremely fast, even for large datasets. Using a caliper on the propensity score (described below) makes it even faster. Run times may be a bit longer when matching on other distance measures (e.g., the Mahalanobis distance). In contrast to optimal pair matching (described below), nearest neighbor matching does not require computing the full distance matrix between units, which makes it more applicable to large datasets.
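As a brief illustration using the `lalonde` dataset included with `MatchIt` (the covariates and matching ratio here are purely illustrative), nearest neighbor matching might be requested as follows:

```r
library("MatchIt")
data("lalonde")

# 2:1 nearest neighbor matching on a logistic regression propensity score,
# pairing treated units in descending order of the propensity score (the default)
m.near <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "nearest",
                  distance = "glm", ratio = 2)

# Matching with replacement, for which the matching order does not matter
m.rep <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest", replace = TRUE)
```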
### Optimal Pair Matching (`method = "optimal"`)

Optimal pair matching (often just called optimal matching) is very similar to nearest neighbor matching in that it attempts to pair each treated unit with one or more control units. Unlike nearest neighbor matching, however, it is "optimal" rather than greedy; it is optimal in the sense that it attempts to choose matches that collectively optimize an overall criterion [@hansen2006; @gu1993]. The criterion used is the sum of the absolute pair distances in the matched sample. See `?method_optimal` for the documentation for `matchit()` with `method = "optimal"`. Optimal pair matching in `MatchIt` depends on the `fullmatch()` function in the `optmatch` package [@hansen2006].

Like nearest neighbor matching, optimal pair matching requires the specification of a distance measure between units. Optimal pair matching can be thought of simply as an alternative to selecting the order of the matching for nearest neighbor matching. Optimal pair matching and nearest neighbor matching often yield the same or very similar matched samples; indeed, some research has indicated that optimal pair matching is not much better than nearest neighbor matching at yielding balanced matched samples [@austin2013b].

The `tol` argument in `fullmatch()` can be supplied to `matchit()` with `method = "optimal"`; this controls the numerical tolerance used to determine whether the optimal solution has been found. The default is fairly high and, for smaller problems, should be set much lower (e.g., by setting `tol = 1e-7`).

### Optimal Full Matching (`method = "full"`)

Optimal full matching (often just called full matching) assigns every treated and control unit in the sample to one subclass each [@hansen2004; @stuart2008a]. Each subclass contains one treated unit and one or more control units or one control unit and one or more treated units.
It is optimal in the sense that the chosen number of subclasses and the assignment of units to subclasses minimize the sum of the absolute within-subclass distances in the matched sample. Weights are computed based on subclass membership, and these weights then function like propensity score weights and can be used to estimate a weighted treatment effect, ideally free of confounding by the measured covariates. See `?method_full` for the documentation for `matchit()` with `method = "full"`. Optimal full matching in `MatchIt` depends on the `fullmatch()` function in the `optmatch` package [@hansen2006].

Like the other distance matching methods, optimal full matching requires the specification of a distance measure between units. It can be seen as a combination of distance matching and stratum matching: subclasses are formed with varying numbers of treated and control units, as with stratum matching, but the subclasses are formed based on minimizing within-pair distances and do not involve forming strata based on any specific variable, similar to distance matching. Unlike other distance matching methods, full matching can be used to estimate the ATE. Full matching can also be seen as a form of propensity score weighting that is less sensitive to the form of the propensity score model because the original propensity scores are used just to create the subclasses, not to form the weights directly [@austin2015a]. In addition, full matching does not have to rely on estimated propensity scores to form the subclasses and weights; other distance measures are allowed as well.

Although full matching uses all available units, there is a loss in precision due to the weights. Units may be weighted in such a way that they contribute less to the sample than would unweighted units, so the effective sample size (ESS) of the full matching weighted sample may be lower than even that of 1:1 pair matching.
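As a sketch (with illustrative covariates from the included `lalonde` dataset), full matching targeting the ATE might be requested as:

```r
library("MatchIt")
data("lalonde")

# Optimal full matching on the propensity score targeting the ATE;
# the resulting matching weights are stored in m.full$weights
m.full <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "full", estimand = "ATE")
```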
Balance is often far better after full matching than it is with 1:k matching, making full matching a good option to consider, especially when 1:k matching is not effective or when the ATE is the target estimand. The specification of the full matching optimization problem can be customized by supplying additional arguments that are passed to `optmatch::fullmatch()`, such as `min.controls`, `max.controls`, `mean.controls`, and `omit.fraction`. As with optimal pair matching, the numerical tolerance value can be set much lower than the default with small problems by setting, e.g., `tol = 1e-7`.

### Generalized Full Matching (`method = "quick"`)

Generalized full matching is a variant of full matching that uses a special fast clustering algorithm to dramatically speed up the matching, even for large datasets [@savjeGeneralizedFullMatching2021]. Like optimal full matching, generalized full matching assigns every unit to a subclass. What makes generalized full matching "generalized" is that the user can customize the matching in a number of ways, such as by specifying an arbitrary minimum number of units from each treatment group or total number of units per subclass, or by allowing some units from a treatment group to remain unmatched. Generalized full matching minimizes the largest within-subclass distances in the matched sample, but it does so in a way that is not completely optimal (though the solution is often very close to the optimal solution). Matching weights are computed based on subclass membership, and these weights then function like propensity score weights and can be used to estimate a weighted treatment effect, ideally free of confounding by the measured covariates. See `?method_quick` for the documentation for `matchit()` with `method = "quick"`. Generalized full matching in `MatchIt` depends on the `quickmatch()` function in the `quickmatch` package [@savjeQuickmatchQuickGeneralized2018].
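A generalized full matching specification looks much like one for optimal full matching (the covariates here are illustrative):

```r
library("MatchIt")
data("lalonde")

# Generalized full matching on the propensity score targeting the ATE
m.quick <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                   data = lalonde, method = "quick", estimand = "ATE")
```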
Generalized full matching includes different options for customization than optimal full matching does. The user cannot supply their own distance matrix, but propensity scores and distance metrics that are computed from the supplied covariates (e.g., the Mahalanobis distance) are allowed. Calipers can only be placed on the propensity score, if supplied. As with optimal full matching, generalized full matching can target the ATE. Matching performance tends to be similar between the two methods, but generalized full matching will be much quicker and can accommodate larger datasets, making it a good substitute. Generalized full matching is often faster than even nearest neighbor matching, especially for large datasets.

### Genetic Matching (`method = "genetic"`)

Genetic matching is less a specific form of matching and more a way of specifying a distance measure for another form of matching. In practice, though, the form of matching used is nearest neighbor pair matching. Genetic matching uses a genetic algorithm, which is an optimization routine used for non-differentiable objective functions, to find scaling factors for each variable in a generalized Mahalanobis distance formula [@diamond2013]. The criterion optimized by the algorithm is one based on covariate balance. Once the scaling factors have been found, nearest neighbor matching is performed on the scaled generalized Mahalanobis distance. See `?method_genetic` for the documentation for `matchit()` with `method = "genetic"`. Genetic matching in `MatchIt` depends on the `GenMatch()` function in the `Matching` package [@sekhon2011] to perform the genetic search and uses the `Match()` function to perform the nearest neighbor match using the scaled generalized Mahalanobis distance.
Genetic matching considers the generalized Mahalanobis distance between a treated unit $i$ and a control unit $j$ as $$\delta_{GMD}(\mathbf{x}_i,\mathbf{x}_j, \mathbf{W})=\sqrt{(\mathbf{x}_i - \mathbf{x}_j)'(\mathbf{S}^{-1/2})'\mathbf{W}(\mathbf{S}^{-1/2})(\mathbf{x}_i - \mathbf{x}_j)}$$ where $\mathbf{x}$ is a $p \times 1$ vector containing the value of each of the $p$ included covariates for that unit, $\mathbf{S}^{-1/2}$ is the Cholesky decomposition of the covariance matrix $\mathbf{S}$ of the covariates, and $\mathbf{W}$ is a diagonal matrix with scaling factors $w$ on the diagonal: $$ \mathbf{W}=\begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots &\\ & & & w_p \\ \end{bmatrix} $$ When $w_k=1$ for all covariates $k$, the computed distance is the standard Mahalanobis distance between units. Genetic matching estimates the optimal values of the $w_k$s, where a user-specified criterion is used to define what is optimal. The default is to maximize the smallest p-value among balance tests for the covariates in the matched sample (both Kolmogorov-Smirnov tests and t-tests for each covariate). In `MatchIt`, if a propensity score is specified, the default is to include the propensity score and the covariates in $\mathbf{x}$ and to optimize balance on the covariates. When `distance = "mahalanobis"` or the `mahvars` argument is specified, the propensity score is left out of $\mathbf{x}$. In all other respects, genetic matching functions just like nearest neighbor matching except that the matching itself is carried out by `Matching::Match()` instead of by `MatchIt`. When using `method = "genetic"` in `MatchIt`, additional arguments passed to `Matching::GenMatch()` to control the genetic search process should be specified; in particular, the `pop.size` argument should be increased from its default of 100 to a much higher value. Doing so will make the algorithm take more time to finish but will generally improve the quality of the resulting matches. 
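For example (with illustrative covariates and an illustrative population size), genetic matching with a larger population for the genetic search might be requested as:

```r
library("MatchIt")
data("lalonde")

# Genetic matching; pop.size is passed through to Matching::GenMatch()
# and should usually be set much larger than its default of 100
m.gen <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "genetic", pop.size = 1000)
```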
Different functions can be supplied to be used as the objective in the optimization using the `fit.func` argument.

### Exact Matching (`method = "exact"`)

Exact matching is a form of stratum matching that involves creating subclasses based on unique combinations of covariate values and assigning each unit into its corresponding subclass so that only units with identical covariate values are placed into the same subclass. Any units that are in subclasses lacking either treated or control units will be dropped. Exact matching is the most powerful matching method in that no functional form assumptions are required on either the treatment or outcome model for the method to remove confounding due to the measured covariates; the covariate distributions are exactly balanced. The problem with exact matching is that, in general, few if any units will remain after matching, so the estimated effect will only generalize to a very limited population and can lack precision. Exact matching is particularly ineffective with continuous covariates, for which it might be that no two units have the same value, and with many covariates, for which it might be the case that no two units have the same combination of all covariates; this latter problem is known as the "curse of dimensionality". See `?method_exact` for the documentation for `matchit()` with `method = "exact"`.

It is possible to use exact matching on some covariates and another form of matching on the rest. This makes it possible to have exact balance on some covariates (typically categorical) and approximate balance on others, thereby gaining the benefits of both exact matching and the other matching method used. To do so, the other matching method should be specified in the `method` argument to `matchit()` and the `exact` argument should be specified to contain the variables on which exact matching is to be done.
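For example (with illustrative covariates from the included `lalonde` dataset):

```r
library("MatchIt")
data("lalonde")

# Exact matching on all supplied covariates
m.exact <- matchit(treat ~ race + married + nodegree,
                   data = lalonde, method = "exact")

# Nearest neighbor matching on the propensity score
# within exact matching strata defined by race
m.comb <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "nearest", exact = ~race)
```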
### Coarsened Exact Matching (`method = "cem"`)

Coarsened exact matching (CEM) is a form of stratum matching that involves first coarsening the covariates by creating bins and then performing exact matching on the new coarsened versions of the covariates [@iacus2012]. The degree and method of coarsening can be controlled by the user to manage the trade-off between exact and approximate balancing. For example, coarsening a covariate to two bins will mean that units that differ greatly on the covariate might be placed into the same subclass, while coarsening a variable to five bins may require units to be dropped due to not finding matches. Like exact matching, CEM is susceptible to the curse of dimensionality, making it a less viable solution with many covariates, especially with few units. Dropping units can also change the target population of the estimated effect. See `?method_cem` for the documentation for `matchit()` with `method = "cem"`. CEM in `MatchIt` does not depend on any other package to perform the coarsening and matching, though it used to rely on the `cem` package.

### Subclassification (`method = "subclass"`)

Propensity score subclassification can be thought of as a form of coarsened exact matching with the propensity score as the sole covariate to be coarsened and matched on. The bins are usually based on specified quantiles of the propensity score distribution either in the treated group, control group, or overall, depending on the desired estimand. Propensity score subclassification is an old and well-studied method, though it can perform poorly compared to other, more modern propensity score methods such as full matching and weighting [@austin2010]. See `?method_subclass` for the documentation for `matchit()` with `method = "subclass"`. The binning of the propensity scores is typically based on dividing the distribution of the propensity score into approximately equally sized bins.
The user specifies the number of subclasses using the `subclass` argument and which group should be used to compute the boundaries of the bins using the `estimand` argument. Sometimes, subclasses can end up with no units from one of the treatment groups; by default, `matchit()` moves a unit from an adjacent subclass into the lacking one to ensure that each subclass has at least one unit from each treatment group. The minimum number of units required in each subclass can be chosen by the `min.n` argument to `matchit()`. If set to 0, an error will be thrown if any subclass lacks units from one of the treatment groups. Moving units from one subclass to another generally worsens the balance in the subclasses but can increase precision. The default number of subclasses is 6, which is arbitrary and should not be taken as a recommended value. Although early theory has recommended the use of 5 subclasses, in general there is an optimal number of subclasses that is typically much larger than 5 but that varies among datasets [@orihara2021]. Rather than trying to figure this out for oneself, one can use optimal full matching (i.e., with `method = "full"`) or generalized full matching (`method = "quick"`) to optimally create subclasses that optimize a within-subclass distance criterion. The output of propensity score subclassification includes the assigned subclasses and the subclassification weights. Effects can be estimated either within each subclass and then averaged across them, or a single marginal effect can be estimated using the subclassification weights. This latter method has been called marginal mean weighting through subclassification [MMWS; @hong2010] and fine stratification weighting [@desai2017]. It is also implemented in the `WeightIt` package. 
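The two stratum matching methods above might be requested as follows (the covariates, cutpoints, and number of subclasses are purely illustrative):

```r
library("MatchIt")
data("lalonde")

# Coarsened exact matching, coarsening age into hypothetical bins
m.cem <- matchit(treat ~ age + educ + race + married + nodegree,
                 data = lalonde, method = "cem",
                 cutpoints = list(age = c(25, 35, 45)))

# Propensity score subclassification with 8 subclasses and at least
# 2 units from each treatment group per subclass
m.sub <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "subclass",
                 subclass = 8, min.n = 2)
```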
### Cardinality and Profile Matching (`method = "cardinality"`)

Cardinality and profile matching are pure subset selection methods that involve selecting a subset of the original sample without considering the distance between individual units or assigning units to pairs or subclasses. They can be thought of as a weighting method where the weights are restricted to be zero or one. Cardinality matching involves finding the largest sample that satisfies user-supplied balance constraints and constraints on the ratio of matched treated to matched control units [@zubizarretaMatchingBalancePairing2014]. It does not consider a specific estimand and can be a useful alternative to matching with a caliper for handling data with little overlap [@visconti2018]. Profile matching involves identifying a target distribution (e.g., the full sample for the ATE or the treated units for the ATT) and finding the largest subset of the treated and control groups that satisfies user-supplied balance constraints with respect to that target [@cohnProfileMatchingGeneralization2021]. See `?method_cardinality` for the documentation for using `matchit()` with `method = "cardinality"`, including which inputs are required to request either cardinality matching or profile matching.

Subset selection is performed by solving a mixed integer programming optimization problem with linear constraints. The problem involves maximizing the size of the matched sample subject to constraints on balance and sample size. For cardinality matching, the balance constraints refer to the mean difference for each covariate between the matched treated and control groups, and the sample size constraints require the matched treated and control groups to be the same size (or differ by a user-supplied factor).
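For example (with an illustrative balance tolerance), cardinality matching constraining all standardized mean differences to be at most .05 might be requested as:

```r
library("MatchIt")
data("lalonde")

# Cardinality matching with standardized mean differences
# constrained to be no larger than .05, using the default HiGHS solver
m.card <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "cardinality",
                  tols = .05, std.tols = TRUE, solver = "highs")
```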
For profile matching, the balance constraints refer to the mean difference for each covariate between each treatment group and the target distribution; for the ATE, this requires the mean of each covariate in each treatment group to be within a given tolerance of the mean of the covariate in the full sample, and for the ATT, this requires the mean of each covariate in the control group to be within a given tolerance of the mean of the covariate in the treated group, which is left intact. The balance tolerances are controlled by the `tols` and `std.tols` arguments. One can also create pairs in the matched sample by using the `mahvars` argument, which requests that optimal Mahalanobis matching be done after subset selection; doing so can add additional precision and robustness [@zubizarretaMatchingBalancePairing2014].

Solving the optimization problem requires a special solver. Currently, the available options in `MatchIt` are the HiGHS solver (through the `highs` package), the GLPK solver (through the `Rglpk` package), the SYMPHONY solver (through the `Rsymphony` package), and the Gurobi solver (through the `gurobi` package). The differences among the solvers are in performance; Gurobi is by far the best (fastest, least likely to fail to find a solution), but it is proprietary (though it has a free trial and an academic license) and is a bit more complicated to install. HiGHS is the default due to being open source, easily installed, and comparable in performance to Gurobi. The `designmatch` package also provides an implementation of cardinality matching with more options than `MatchIt` offers.

## Customizing the Matching Specification

In addition to the specific matching method, other options are available for many of the matching methods to further customize the matching specification.
These include different specifications of the distance measure, methods to perform alternate forms of matching in addition to the main method, ways to prune units far from other units prior to matching, ways to restrict possible matches, etc. Not all options are compatible with all matching methods.

### Specifying the propensity score or other distance measure (`distance`)

The distance measure is used to define how close two units are. In nearest neighbor matching, this is used to choose the nearest control unit to each treated unit. In optimal matching, this is used in the criterion that is optimized. By default, the distance measure is the propensity score difference, and the argument supplied to `distance` corresponds to the method of estimating the propensity score. In `MatchIt`, propensity scores are often labeled as "distance" values, even though the propensity score itself is not a distance measure. This is to reflect that the propensity score is used in creating the distance value, but other scores could be used, such as prognostic scores for prognostic score matching [@hansen2008a]. The propensity score is more like a "position" value, in that it reflects the position of each unit in the matching space, and the difference between positions is the distance between them. If the argument to `distance` is one of the allowed methods for estimating propensity scores (see `?distance` for these values) or is a numeric vector with one value per unit, the distance between units will be computed as the pairwise difference between propensity scores or the supplied values. Propensity scores are also used in propensity score subclassification and can optionally be used in genetic matching as a component of the generalized Mahalanobis distance. For exact, coarsened exact, and cardinality matching, the `distance` argument is ignored.

The default `distance` argument is `"glm"`, which estimates propensity scores using logistic regression or another generalized linear model.
The `link` and `distance.options` arguments can be supplied to further specify the options for the propensity score models, including whether to use the raw propensity score or a linearized version of it (e.g., the logit of a logistic regression propensity score, which has been commonly recommended in the propensity score literature [@austin2011a; @stuart2010]). Allowable options for the propensity score model include parametric and machine learning-based models, each of which has its strengths and limitations and may perform differently depending on the unique qualities of each dataset. We recommend that multiple types of models be tried to find one that yields the best balance, as there is no way to make a single recommendation that will work for all cases.

The `distance` argument can also be specified as a method of computing pairwise distances from the covariates directly (i.e., without estimating propensity scores). The options include `"mahalanobis"`, `"robust_mahalanobis"`, `"euclidean"`, and `"scaled_euclidean"`. These methods compute a distance metric for a treated unit $i$ and a control unit $j$ as $$\delta(\mathbf{x}_i,\mathbf{x}_j)=\sqrt{(\mathbf{x}_i - \mathbf{x}_j)'S^{-1}(\mathbf{x}_i - \mathbf{x}_j)}$$ where $\mathbf{x}$ is a $p \times 1$ vector containing the value of each of the $p$ included covariates for that unit, $S$ is a scaling matrix, and $S^{-1}$ is the (generalized) inverse of $S$. For Mahalanobis distance matching, $S$ is the pooled covariance matrix of the covariates [@rubinBiasReductionUsing1980]; for Euclidean distance matching, $S$ is the identity matrix (i.e., no scaling); and for scaled Euclidean distance matching, $S$ is the diagonal of the pooled covariance matrix (containing just the variances). The robust Mahalanobis distance is computed not on the covariates directly but rather on their ranks and uses a correction for ties (see @rosenbaumDesignObservationalStudies2010, ch 8).
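For example (with illustrative covariates from the included `lalonde` dataset):

```r
library("MatchIt")
data("lalonde")

# Nearest neighbor matching on the linearized (logit) propensity score
m.lin <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest",
                 distance = "glm", link = "linear.logit")

# Nearest neighbor matching on the Mahalanobis distance
# (no propensity score is estimated)
m.mahal <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                   data = lalonde, method = "nearest",
                   distance = "mahalanobis")
```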
For creating close pairs, matching with these distance measures tends to work better than propensity score matching because paired units will have close values on all of the covariates, whereas propensity score-paired units may be close on the propensity score but not on any of the covariates themselves. This feature was the basis of King and Nielsen's [-@king2019] warning against using propensity scores for matching. That said, these distance measures do not always outperform propensity score matching [@ripolloneImplicationsPropensityScore2018].

`distance` can also be supplied as a matrix of distance values between units. This makes it possible to use handcrafted distance matrices or distances created outside `MatchIt`. Only nearest neighbor, optimal pair, and optimal full matching allow this specification.

The propensity score can have uses other than as the basis for matching. It can be used to define a region of common support, outside which units are dropped prior to matching; this is implemented by the `discard` option. It can also be used to define a caliper, the maximum distance two units can be from each other before they are prohibited from being paired with each other; this is implemented by the `caliper` argument. To estimate or supply a propensity score for one of these purposes but not use it as the distance measure for matching (i.e., to perform Mahalanobis distance matching instead), the `mahvars` argument can be specified. These options are described below.

### Implementing common support restrictions (`discard`)

The region of *common support* is the region of overlap between treatment groups. A common support restriction discards units that fall outside of the region of common support, preventing them from being matched to other units and included in the matched sample. This can reduce the potential for extrapolation and help the matching algorithms avoid overly distant matches.
In `MatchIt`, the `discard` option implements a common support restriction based on the propensity score. The argument can be supplied as `"treated"`, `"control"`, or `"both"`, which discards units in the corresponding group that fall outside the region of common support for the propensity score. The `reestimate` argument can be supplied to choose whether to re-estimate the propensity score in the remaining units. **If units from the treated group are discarded based on a common support restriction, the estimand no longer corresponds to the ATT.**

### Caliper matching (`caliper`)

A *caliper* can be thought of as a ring around each unit that limits which other units that unit can be paired with. Calipers are based on the propensity score or other covariates. Two units whose distance on a calipered covariate is larger than the caliper width for that covariate are not allowed to be matched to each other, and any units for which there are no available matches within the caliper are dropped from the matched sample. Calipers ensure paired units are close to each other on the calipered covariates, which can improve balance in the matched sample. Multiple variables can be supplied to `caliper` to enforce calipers on all of them simultaneously. Using calipers can be a good alternative to exact or coarsened exact matching for ensuring only similar units are paired with each other. The `std.caliper` argument controls whether the provided calipers are in raw units or standard deviation units. When negative calipers are supplied, units whose distance on the calipered covariate is *smaller* than the absolute caliper width for that covariate are disallowed from being matched to each other.
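As a sketch of how calipers might be specified (the covariates and widths here are illustrative, not recommendations):

```r
library("MatchIt")
data("lalonde")

# Nearest neighbor matching with a caliper of .2 standard deviations
# on the propensity score (the unnamed entry) and a raw caliper of
# 5 years on age
m.cal <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 distance = "glm",
                 caliper = c(.2, age = 5),
                 std.caliper = c(TRUE, FALSE))
```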
**If units from the treated group are left unmatched due to a caliper, the estimand no longer corresponds to the ATT.**

### Mahalanobis distance matching (`mahvars`)

To perform Mahalanobis distance matching without estimating or using a propensity score, the `distance` argument can be set to `"mahalanobis"`. If a propensity score is to be estimated or used for a different purpose, such as in a common support restriction or a caliper, but you still want to perform Mahalanobis distance matching, variables should be supplied to the `mahvars` argument. The propensity scores will be generated using the `distance` specification, and matching will occur not on the covariates supplied to the main formula of `matchit()` but rather on the covariates supplied to `mahvars`. To perform Mahalanobis distance matching within a propensity score caliper, for example, the `distance` argument should be set to the method of estimating the propensity score (e.g., `"glm"` for logistic regression), the `caliper` argument should be set to the desired caliper width, and `mahvars` should specify the covariates on which Mahalanobis distance matching is to be performed within the caliper. `mahvars` has a special meaning for genetic matching and cardinality matching; see their respective help pages for details.

### Exact matching (`exact`)

To perform exact matching on all supplied covariates, the `method` argument can be set to `"exact"`. To perform exact matching only on some covariates and some other form of matching within exact matching strata on the others, the `exact` argument can be used. Covariates supplied to the `exact` argument will be matched exactly, and the form of matching specified by `method` (e.g., `"nearest"` for nearest neighbor matching) will take place within each exact matching stratum. This can be a good way to gain some of the benefits of exact matching without completely succumbing to the curse of dimensionality.
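For example, Mahalanobis distance matching within a propensity score caliper, combined with exact matching on a single covariate, might be requested as follows (the variables chosen are illustrative):

```r
library("MatchIt")
data("lalonde")

# The propensity score is estimated with logistic regression and used
# only for the caliper; pairing is based on the Mahalanobis distance
# computed from the mahvars covariates, within exact levels of race
m.mah <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 distance = "glm",
                 caliper = .2,
                 mahvars = ~ age + educ + re74 + re75,
                 exact = ~ race)
```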
As with exact matching performed with `method = "exact"`, any units in strata lacking members of one of the treatment groups will be left unmatched. Note that although matching occurs within each exact matching stratum, propensity score estimation and computation of the Mahalanobis or other distance matrix occur in the full sample. **If units from the treated group are unmatched due to an exact matching restriction, the estimand no longer corresponds to the ATT.**

### Anti-exact matching (`antiexact`)

Anti-exact matching adds a restriction such that a treated and a control unit with the same values of any of the specified anti-exact matching variables cannot be paired. This can be useful when finding comparison units outside of a unit's group, such as when matching units in one group to units in another when units within the same group might otherwise be close matches. See examples [here](https://stackoverflow.com/questions/66526115/propensity-score-matching-with-panel-data) and [here](https://stackoverflow.com/questions/61120201/avoiding-duplicates-from-propensity-score-matching?rq=1). A similar effect can be achieved by supplying negative caliper values.

### Matching with replacement (`replace`)

Nearest neighbor matching and genetic matching have the option of matching with or without replacement, controlled by the `replace` argument. Matching without replacement means that each control unit is matched to at most one treated unit, while matching with replacement means that control units can be reused and matched to multiple treated units. Matching without replacement carries certain statistical benefits: weights for each unit can be omitted or are more straightforward to include, and dependence between units depends only on pair membership. However, it is not asymptotically consistent unless the propensity scores for all treated units are below .5 and there are many more control units than treated units [@savjeInconsistencyMatchingReplacement2022].
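A sketch of matching with replacement and extracting the matched pairs afterward (using the `lalonde` data for illustration):

```r
library("MatchIt")
data("lalonde")

# 1:1 NN propensity score matching with replacement; control units
# may be matched to multiple treated units
m.rep <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 distance = "glm",
                 replace = TRUE)

# get_matches() rather than match_data() correctly handles control
# units that appear in more than one matched pair
md <- get_matches(m.rep)
```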
Special standard error estimators are sometimes required for estimating effects after matching with replacement [@austin2020a], and methods for accounting for uncertainty are not well understood for non-continuous outcomes. Matching with replacement will tend to yield better balance, because the problem of "running out" of close control units to match to treated units is avoided, though the reuse of control units will decrease the effective sample size, thereby worsening precision [@austin2013b]. (This problem occurs in the Lalonde dataset used in `vignette("MatchIt")`, which is why nearest neighbor matching without replacement is not very effective there.) After matching with replacement, control units may be assigned to more than one subclass, so the `get_matches()` function should be used instead of `match_data()` if subclasses are to be used in follow-up analyses; see `vignette("estimating-effects")` for details.

The `reuse.max` argument can also be used with `method = "nearest"` to control how many times each control unit can be reused as a match. Setting `reuse.max = 1` is equivalent to matching without replacement, because each control can be used only once. Higher values allow control units to be matched more than once, up to the specified number of times, and will tend to improve balance at the cost of precision.

### $k$:1 matching (`ratio`)

The most common form of matching, 1:1 matching, involves pairing one control unit with each treated unit. To perform $k$:1 matching (e.g., 2:1 or 3:1), which pairs (up to) $k$ control units with each treated unit, the `ratio` argument can be specified. Performing $k$:1 matching can preserve precision by preventing too many control units from being unmatched and dropped from the matched sample, though the gain in precision from increasing $k$ diminishes rapidly after 4 [@rosenbaum2020].
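For example, 2:1 nearest neighbor matching, with each control unit reusable at most twice, might be specified as follows (the values are illustrative):

```r
library("MatchIt")
data("lalonde")

# 2:1 nearest neighbor matching: up to two controls per treated unit,
# with each control unit usable as a match at most twice
m.2to1 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde,
                  method = "nearest",
                  distance = "glm",
                  ratio = 2,
                  reuse.max = 2)
```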
Importantly, for $k>1$, the matches after the first match will generally be worse than the first match in terms of closeness to the treated unit, so increasing $k$ can also worsen balance [@rassenOnetomanyPropensityScore2012]. @austin2010a found that 1:1 or 1:2 matching generally performed best in terms of mean squared error. In general, it makes sense to use higher values of $k$ only while ensuring that balance remains satisfactory. With nearest neighbor and optimal pair matching, variable $k$:1 matching, in which the number of controls matched to each treated unit varies, can also be used; this can improve performance over "fixed" $k$:1 matching [@ming2000; @rassenOnetomanyPropensityScore2012]. See `?method_nearest` and `?method_optimal` for information on implementing variable $k$:1 matching.

### Matching order (`m.order`)

For nearest neighbor matching (including genetic matching), units are matched in an order, and that order can affect the quality of individual matches and of the resulting matched sample. With `method = "nearest"`, the allowable options to `m.order` to control the matching order are `"largest"`, `"smallest"`, `"closest"`, `"farthest"`, `"random"`, and `"data"`. With `method = "genetic"`, all but `"closest"` and `"farthest"` can be used.

Requesting `"largest"` means that treated units with the largest propensity scores, i.e., those least like the control units, will be matched first, which prevents them from having bad matches after all the close control units have been used up. `"smallest"` means that treated units with the smallest propensity scores are matched first. `"closest"` means that potential pairs with the smallest distance between units will be matched first, which ensures that the best possible matches are included in the matched sample but can yield poor matches for units whose best match is far from them; this makes it particularly useful when matching with a caliper.
`"farthest"` means that pairs with the largest distance between them will be matched first, which ensures the hardest-to-match units are given the best chance to find matches. `"random"` matches in a random order, and `"data"` matches in the order of the data. A propensity score is required for `"largest"` and `"smallest"` but not for the other options. @rubin1973 recommends using `"largest"` or `"random"`, though @austin2013b recommends against `"largest"` and instead favors `"closest"` or `"random"`. `"closest"` and `"smallest"` are best for prioritizing the best possible matches, while `"farthest"` and `"largest"` are best for preventing extreme pairwise distances between matched units.

## Choosing a Matching Method

Choosing the best matching method for one's data depends on the unique characteristics of the dataset as well as the goals of the analysis. For example, because different matching methods can target different estimands, when certain estimands are desired, specific methods must be used. On the other hand, some methods may be more effective than others when retaining the target estimand is less important. Below we provide some guidance on choosing a matching method. Remember that multiple methods can (and should) be tried as long as the treatment effect is not estimated until a method has been settled on.

The criteria on which a matching specification should be judged are balance and remaining (effective) sample size after matching. Assessing balance is described in `vignette("assessing-balance")`. A typical workflow is similar to that demonstrated in `vignette("MatchIt")`: try a matching method, and if it yields poor balance or an unacceptably low remaining sample size, try another, until a satisfactory specification has been found.
It is important to assess balance broadly (i.e., beyond comparing the means of the covariates in the treated and control groups), and the search for a matching specification should not stop when a threshold is reached, but should attempt to come as close as possible to perfect balance [@ho2007]. Even if the first matching specification appears successful at reducing imbalance, there may be another specification that could reduce it even further, thereby increasing the robustness of the inference and the plausibility of an unbiased effect estimate.

If the target of inference is the ATE, optimal or generalized full matching, subclassification, or profile matching can be used. If the target of inference is the ATT or ATC, any matching method may be used. When retaining the target estimand is not so important, additional options become available that involve discarding units in such a way that the original estimand is distorted. These include matching with a caliper, matching within a region of common support, cardinality matching, or exact or coarsened exact matching, perhaps on a subset of the covariates.

Because exact and coarsened exact matching aim to balance the entire joint distribution of covariates, they are the most powerful methods. If it is possible to perform exact matching, this method should be used. If continuous covariates are present, coarsened exact matching can be tried. Care should be taken with retaining the target population and ensuring enough matched units remain; unless the control pool is much larger than the treated pool, it is likely some (or many) treated units will be discarded, thereby changing the estimand and possibly dramatically reducing precision. These methods are typically only available in the most optimistic of circumstances, but they should be used first when those circumstances arise.
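Coarsened exact matching, for instance, can be requested by setting `method = "cem"`; a minimal sketch on the `lalonde` data, using the default coarsening:

```r
library("MatchIt")
data("lalonde")

# Coarsened exact matching: continuous covariates are binned and
# units are matched exactly on the coarsened values
m.cem <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde,
                 method = "cem")
```

As discussed above, the sample sizes should be checked afterward (e.g., with `summary()`) to see how many treated units were discarded.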
It may also be useful to combine exact or coarsened exact matching on some covariates with another form of matching on the others (i.e., by using the `exact` argument).

When estimating the ATE, subclassification, full matching, or profile matching can be used. Optimal and generalized full matching can be effective because they optimize a balance criterion, often leading to better balance. With full matching, it's also possible to exact match on some variables and match using the Mahalanobis distance, eliminating the need to estimate propensity scores. Profile matching also ensures good balance, but because units are only given weights of zero or one, a solution may not be feasible and many units may have to be discarded. For large datasets, neither optimal full matching nor profile matching may be possible, in which case generalized full matching and subclassification are faster solutions. When using subclassification, the number of subclasses should be varied. With large samples, higher numbers of subclasses tend to yield better performance; one should not immediately settle for the default (6) or the often-cited recommendation of 5 without trying several other numbers. The documentation for `cobalt::bal.compute()` contains an example of using balance to select the optimal number of subclasses.

When estimating the ATT, a variety of methods can be tried. Genetic matching can perform well at achieving good balance because it directly optimizes covariate balance. With larger datasets, it may take a long time to reach a good solution (though that solution will tend to be good as well). Profile matching will also achieve good balance if a solution is feasible because balance is controlled by the user. Optimal pair matching and nearest neighbor matching without replacement tend to perform similarly to each other; nearest neighbor matching may be preferable for large datasets that cannot be handled by optimal matching.
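Varying the number of subclasses might look like the following sketch (the number tried here is illustrative):

```r
library("MatchIt")
data("lalonde")

# Propensity score subclassification with more than the default
# number of subclasses (6)
m.sub <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde,
                 method = "subclass",
                 subclass = 10,
                 estimand = "ATE")
```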
Nearest neighbor, optimal, and genetic matching allow some customizations, like including covariates on which to exactly match, using the Mahalanobis distance instead of a propensity score difference, and performing $k$:1 matching with $k>1$. Nearest neighbor matching with replacement, full matching, and subclassification all involve weighting the control units with nonuniform weights, which often allows for improved balancing capabilities but can be accompanied by a loss in effective sample size, even when all units are retained. There is no reason not to try many of these methods, varying parameters here and there, in search of good balance and a high remaining sample size. As previously mentioned, no single method can be recommended above all others because the optimal specification depends on the unique qualities of each dataset.

When the target population is less important, for example, when engaging in treatment effect discovery or when the sampled population is not of particular interest (e.g., it corresponds to an arbitrarily chosen hospital or school; see @mao2018 for these and other reasons why retaining the target population may not be important), other methods that do not retain the characteristics of the original sample become available. These include matching with a caliper (on the propensity score or on the covariates themselves), cardinality matching, and more restrictive forms of matching like exact and coarsened exact matching, either on all covariates or just a subset, that are prone to discard units from the sample in such a way that the target population is changed.
@austin2013b and Austin and Stuart [-@austin2015c; -@austin2015a] have found that caliper matching can be a particularly effective modification to nearest neighbor matching for eliminating imbalance and reducing bias when the target population is less relevant, but when inference to a specific target population is desired, using calipers can induce bias due to incomplete matching [@rosenbaum1985; @wang2020]. Cardinality matching can be particularly effective in data with little overlap between the treatment groups [@visconti2018] and can perform better than caliper matching [@delosangelesresaDirectStableWeight2020].

It is important not to rely excessively on theoretical or simulation-based findings or specific recommendations when making choices about the best matching method to use. For example, although nearest neighbor matching without replacement balanced covariates better than subclassification with five or ten subclasses in Austin's [-@austin2009c] simulation, this does not imply it will be superior in all datasets. Likewise, though @rosenbaum1985a and @austin2011a both recommend using a caliper of .2 standard deviations of the logit of the propensity score, this does not imply that caliper will be optimal in all scenarios, and other widths should be tried, though it should be noted that tightening the caliper on the propensity score can sometimes degrade performance [@king2019].

For large datasets (i.e., in the tens of thousands to millions), some matching methods will be too slow to be used at scale. Instead, users should consider generalized full matching, subclassification, or coarsened exact matching, which are all very fast and designed to work with large datasets. Nearest neighbor matching on the propensity score has been optimized to run quickly for large datasets as well.
## Reporting the Matching Specification

When reporting the results of a matching analysis, it is important to include the relevant details of the final matching specification and the process of arriving at it. Using `print()` on the `matchit` object synthesizes information on how the above arguments were used to provide a description of the matching specification. It is best to be as specific as possible to ensure the analysis is replicable and to allow audiences to assess its validity. Although citations recommending specific matching methods can be used to help justify a choice, the only sufficient justification is adequate balance and remaining sample size, regardless of published recommendations for specific methods. See `vignette("assessing-balance")` for instructions on how to assess and report the quality of a matching specification. After matching and estimating an effect, details of the effect estimation must be included as well; see `vignette("estimating-effects")` for instructions on how to perform and report on the analysis of a matched dataset.

## References

---
title: 'MatchIt: Getting Started'
author: "Noah Greifer"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{MatchIt: Getting Started}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE,
                      fig.width = 7, fig.height = 5,
                      fig.align = "center")
options(width = 200)

notice <- "Note: if the `optmatch` package is not available, the subsequent lines will not run."

use <- {
  if (requireNamespace("optmatch", quietly = TRUE)) "full"
  else if (requireNamespace("quickmatch", quietly = TRUE)) "quick"
  else "none"
}

me_ok <- requireNamespace("marginaleffects", quietly = TRUE) &&
  requireNamespace("sandwich", quietly = TRUE) &&
  !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false"))) &&
  utils::packageVersion("marginaleffects") > '0.25.0'
```

## Introduction

`MatchIt` implements the suggestions of @ho2007 for improving parametric statistical models for estimating treatment effects in observational studies and reducing model dependence by preprocessing data with semi-parametric and non-parametric matching methods. After appropriately preprocessing with `MatchIt`, researchers can use whatever parametric model they would have used without `MatchIt` and produce inferences that are more robust and less sensitive to modeling assumptions. `MatchIt` reduces the dependence of causal inferences on commonly made, but hard-to-justify, statistical modeling assumptions using a large range of sophisticated matching methods. The package includes several popular approaches to matching and provides access to methods implemented in other packages through its single, unified, and easy-to-use interface.

Matching is used in the context of estimating the causal effect of a binary treatment or exposure on an outcome while controlling for measured pre-treatment variables, typically confounding variables or variables prognostic of the outcome. Here and throughout the `MatchIt` documentation we use the word "treatment" to refer to the focal causal variable of interest, with "treated" and "control" reflecting the names of the treatment groups. The goal of matching is to produce *covariate balance*, that is, for the distributions of covariates in the two groups to be approximately equal to each other, as they would be in a successful randomized experiment.
The importance of covariate balance is that it allows for increased robustness to the choice of model used to estimate the treatment effect; in perfectly balanced samples, a simple difference in means can be a valid treatment effect estimate. Here we do not aim to provide a full introduction to matching or causal inference theory, but simply to explain how to use `MatchIt` to perform nonparametric preprocessing. For excellent and accessible introductions to matching, see @stuart2010 and @austin2011b.

A matching analysis involves four primary steps: 1) planning, 2) matching, 3) assessing the quality of matches, and 4) estimating the treatment effect and its uncertainty. Here we briefly discuss these steps and how they can be implemented with `MatchIt`; in the other included vignettes, these steps are discussed in more detail.

We will use Lalonde's data on the evaluation of the National Supported Work program to demonstrate `MatchIt`'s capabilities. First, we load `MatchIt` and bring in the `lalonde` dataset.

```{r}
library("MatchIt")
data("lalonde")

head(lalonde)
```

The statistical quantity of interest is the causal effect of the treatment (`treat`) on 1978 earnings (`re78`). The other variables are pre-treatment covariates. See `?lalonde` for more information on this dataset. In particular, the analysis is concerned with the marginal, total effect of the treatment for those who actually received the treatment.

In what follows, we briefly describe the four steps of a matching analysis and how to implement them in `MatchIt`. For more details, we recommend reading the other vignettes, `vignette("matching-methods")`, `vignette("assessing-balance")`, and `vignette("estimating-effects")`, especially for users less familiar with matching methods. For the use of `MatchIt` with sampling weights, also see `vignette("sampling-weights")`.
It is important to recognize that the ease of using `MatchIt` does not imply that matching methods are simple; advanced statistical methods like matching, which require many decisions to be made and caution in their use, should only be performed by those with statistical training.

## Planning

The planning phase of a matching analysis involves selecting the type of effect to be estimated, selecting the target population to which the treatment effect is to generalize, and selecting the covariates for which balance is required for an unbiased estimate of the treatment effect. Each of these is a theoretical step that does not involve performing analyses on the data. Ideally, they should be considered prior to data collection in the planning stage of a study. Thinking about them early can aid in performing a complete and cost-effective analysis.

**Selecting the type of effect to be estimated.** There are a few different types of effects to be estimated. In the presence of mediating variables, one might be interested in the direct effect of the treatment that does not pass through the mediating variables or the total effect of the treatment across all causal pathways. Matching is well suited for estimating total effects; specific mediation methods may be better suited for other mediation-related quantities.

One may be interested in a conditional effect or a marginal effect. A conditional effect is the effect of a treatment within some stratum of other prognostic variables (e.g., at the patient level), and a marginal effect is the average effect of a treatment in a population (e.g., for implementing a broad policy change). Different types of matching are well suited for each of these, but the most common forms are best used for estimating marginal treatment effects; for conditional treatment effects, modeling assumptions are typically required or matching must be done within strata of the conditioning variables.
Matching can reduce the reliance on correct model specification for conditional effects.

**Selecting a target population.** The target population is the population to which the effect estimate is to generalize. Typically, an effect estimated in a sample generalizes to the population from which the sample is a probability sample. If the sample is not a probability sample from any population (e.g., it is a convenience sample or involves patients from an arbitrary hospital), the target population can be unclear. Often, the target population is a group of units who are eligible for the treatment (or a subset thereof). Causal estimands are defined by the target population to which they generalize.

The average treatment effect in the population (ATE) is the average effect of the treatment for all units in the target population. The average treatment effect in the treated (ATT) is the average effect of the treatment for units like those who actually were treated. The most common forms of matching are best suited for estimating the ATT, though some are also available for estimating the ATE. Some matching methods distort the sample in such a way that the estimated treatment effect corresponds neither to the ATE nor to the ATT, but rather to the effect in an unspecified population (sometimes called the ATM, or average treatment effect in the remaining matched sample). When the target population is not so important (e.g., in the case of treatment effect discovery), such methods may be attractive; otherwise, care should be taken to ensure the effect generalizes to the target population of interest. Different matching methods allow for different target populations, so it is important to choose a matching method that allows one to estimate the desired effect. See @greiferChoosingEstimandWhen2021 for guidance on making this choice.
**Selecting covariates to balance.** Selecting covariates carefully is critical for ensuring the resulting treatment effect estimate is free of confounding and can be validly interpreted as a causal effect. To estimate total causal effects, all covariates must be measured prior to treatment (or otherwise not be affected by the treatment). Covariates should be those that cause variation in the outcome and in selection into the treatment group; these are known as confounding variables. See @vanderweele2019 for a guide on covariate selection. Ideally, these covariates are measured without error and are free of missingness.

## Check Initial Imbalance

After planning and prior to matching, it can be a good idea to view the initial imbalance in one's data that matching is attempting to eliminate. We can do this using the code below:

```{r}
# No matching; constructing a pre-match matchit object
m.out0 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde,
                  method = NULL,
                  distance = "glm")
```

The first argument is a `formula` relating the treatment to the covariates used in estimating the propensity score and for which balance is to be assessed. The `data` argument specifies the dataset where these variables exist. Typically, the `method` argument specifies the method of matching to be performed; here, we set it to `NULL` so we can assess balance prior to matching[^1]. The `distance` argument specifies the method for estimating the propensity score, a one-dimensional summary of all the included covariates, computed as the predicted probability of being in the treated group given the covariates; here, we set it to `"glm"` for generalized linear model, which implements logistic regression by default[^2] (see `?distance` for other options).

[^1]: Note that the default for `method` is `"nearest"`, which performs nearest neighbor matching. To prevent any matching from taking place so that pre-matching imbalance can be assessed, `method` must be set to `NULL`.
[^2]: Note that setting `distance = "logit"`, which was the default in `MatchIt` versions prior to 4.0.0, or `"ps"`, which was the default prior to version 4.5.0, will also estimate logistic regression propensity scores. Because it is the default, the `distance` argument can be omitted entirely if logistic regression propensity scores are desired.

Below we assess balance on the unmatched data using `summary()`:

```{r}
# Checking balance prior to matching
summary(m.out0)
```

We can see severe imbalances as measured by the standardized mean differences (`Std. Mean Diff.`), variance ratios (`Var. Ratio`), and empirical cumulative distribution function (eCDF) statistics. Values of standardized mean differences and eCDF statistics close to zero and values of variance ratios close to one indicate good balance, and here many of them are far from their ideal values.

## Matching

Now, matching can be performed. There are several different classes and methods of matching, described in `vignette("matching-methods")`. Here, we begin by briefly demonstrating 1:1 nearest neighbor (NN) matching on the propensity score, which is appropriate for estimating the ATT. One by one, each treated unit is paired with an available control unit that has the closest propensity score to it. Any remaining control units are left unmatched and excluded from further analysis. Due to the theoretical balancing properties of the propensity score described by @rosenbaum1983, propensity score matching can be an effective way to achieve covariate balance in the treatment groups. Below we demonstrate the use of `matchit()` to perform nearest neighbor propensity score matching.
```{r}
# 1:1 NN PS matching w/o replacement
m.out1 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde,
                  method = "nearest",
                  distance = "glm")
```

We use the same syntax as before, but this time specify `method = "nearest"` to implement nearest neighbor matching, again using a logistic regression propensity score. Many other arguments are available for tuning the matching method and the method of propensity score estimation. The matching outputs are contained in the `m.out1` object. Printing this object gives a description of the type of matching performed:

```{r}
m.out1
```

The key components of the `m.out1` object are `weights` (the computed matching weights), `subclass` (matching pair membership), `distance` (the estimated propensity score), and `match.matrix` (which control units are matched to each treated unit). How these can be used for estimating the effect of the treatment after matching is detailed in `vignette("estimating-effects")`.

## Assessing the Quality of Matches

Although matching on the propensity score is often effective at eliminating differences between the treatment groups to achieve covariate balance, its performance in this regard must be assessed. If covariates remain imbalanced after matching, the matching is considered unsuccessful, and a different matching specification should be tried. `MatchIt` offers a few tools for the assessment of covariate balance after matching. These include graphical and statistical methods. More detail on the interpretation of the included plots and statistics can be found in `vignette("assessing-balance")`.

In addition to covariate balance, the quality of the match is determined by how many units remain after matching. Matching often involves discarding units that are not paired with other units, and some matching options, such as setting restrictions for common support or calipers, can further decrease the number of remaining units.
If, after matching, the remaining sample size is small, the resulting effect estimate may be imprecise. In many cases, there will be a trade-off between balance and remaining sample size. How to optimally choose among them is an instance of the fundamental bias-variance trade-off problem that cannot be resolved without substantive knowledge of the phenomena under study. Prospective power analyses can be used to determine how small a sample can be before necessary precision is sacrificed.

To assess the quality of the resulting matches numerically, we can use the `summary()` function on `m.out1` as before. Here we set `un = FALSE` to suppress display of the balance before matching for brevity and because we already saw it. (Leaving it as `TRUE`, its default, would display balance both before and after matching.)

```{r}
# Checking balance after NN matching
summary(m.out1, un = FALSE)
```

At the top is a summary of covariate balance after matching. Although balance has improved for some covariates, in general balance is still quite poor, indicating that nearest neighbor propensity score matching is not sufficient for removing confounding in this dataset. The final column, `Std. Pair Diff`, displays the average absolute within-pair difference of each covariate. When these values are small, better balance is typically achieved and estimated effects are more robust to misspecification of the outcome model [@king2019; @rubin1973a].

Next is a table of the sample sizes before and after matching. The matching procedure left 244 control units unmatched. Ideally, unmatched units would be those far from the treated units and would require greater extrapolation were they to have been retained.
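The numbers in this table can also be extracted programmatically. Below is a minimal sketch, assuming `m.out1` from above and relying on the `nn` component of the object returned by `summary()`:

```{r, eval = FALSE}
# Sketch: pull the sample size table out of the summary output
s <- summary(m.out1, un = FALSE)
s$nn # table of counts (all, matched, unmatched, discarded) by treatment group
```

This can be useful when the counts need to be reported in a table assembled outside of R.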
We can visualize the distribution of propensity scores of those who were matched using `plot()` with `type = "jitter"`:

```{r, fig.alt="Jitter plot of the propensity scores, which shows that no treated units were dropped, and a large number of control units with low propensity scores were dropped."}
plot(m.out1, type = "jitter", interactive = FALSE)
```

We can visually examine balance on the covariates using `plot()` with `type = "density"`:

```{r, fig.alt="Density plots of age, married and re75 in the unmatched and matched samples."}
plot(m.out1, type = "density", interactive = FALSE,
     which.xs = ~age + married + re75)
```

Imbalances are represented by the differences between the black (treated) and gray (control) distributions. Although `married` and `re75` appear to have improved balance after matching, the case is mixed for `age`.

### Trying a Different Matching Specification

Given the poor performance of nearest neighbor matching in this example, we can try a different matching method or make other changes to the matching algorithm or distance specification. Below, we'll try full matching, which matches every treated unit to at least one control and every control to at least one treated unit [@hansen2004; @stuart2008a]. We'll also try a different link (probit) for the propensity score model.

`r if (use == "none") notice`

```{r, eval = (use == "full"), include = (use %in% c("full", "none"))}
# Full matching on a probit PS
m.out2 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "full",
                  distance = "glm", link = "probit")
m.out2
```

```{r, eval = (use == "quick"), include = (use == "quick")}
# Generalized full matching on a probit PS
m.out2 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "quick",
                  distance = "glm", link = "probit")
m.out2
```

We can examine balance on this new matching specification.
```{r, eval = (use != "none")}
# Checking balance after full matching
summary(m.out2, un = FALSE)
```

Balance is far better, as determined by the lower standardized mean differences and eCDF statistics. The balance should be reported when publishing the results of a matching analysis. This can be done either in a table, using the values resulting from `summary()`, or in a plot, such as a Love plot, which we can make by calling `plot()` on the `summary()` output:

```{r, eval = (use != "none"), fig.alt = "A Love plot with matched dots below the threshold lines, indicating good balance after matching, in contrast to the unmatched dots far from the threshold lines, indicating poor balance before matching."}
plot(summary(m.out2))
```

Love plots are a simple and straightforward way to summarize balance visually. See `vignette("assessing-balance")` for more information on how to customize `MatchIt`'s Love plot and how to use `cobalt`, a package designed specifically for balance assessment and reporting that is compatible with `MatchIt`.

## Estimating the Treatment Effect

How treatment effects are estimated depends on what form of matching was performed. See `vignette("estimating-effects")` for information on how to estimate treatment effects in a variety of scenarios (i.e., different matching methods and outcome types). After full matching and most other matching methods, we can run a regression of the outcome on the treatment and covariates in the matched sample (i.e., including the matching weights) and estimate the treatment effect using g-computation as implemented in `marginaleffects::avg_comparisons()`[^est]. Including the covariates used in the matching in the effect estimation can provide additional robustness to slight imbalances remaining after matching and can improve precision.
[^est]: In some cases, the coefficient on the treatment variable in the outcome model can be used as the effect estimate, but g-computation always yields a valid effect estimate regardless of the form of the outcome model, and its use is the same regardless of the outcome model type or matching method (with some slight variations), so we always recommend performing g-computation after fitting the outcome model. G-computation is explained in detail in `vignette("estimating-effects")`.

Because full matching was successful at balancing the covariates, we'll demonstrate here how to estimate a treatment effect after performing such an analysis. First, we'll extract the matched dataset from the `matchit` object using `match_data()`. This dataset only contains the matched units and adds columns for `distance`, `weights`, and `subclass` (described previously).

`r if (use == "none") notice`

```{r, eval = (use != "none")}
m.data <- match_data(m.out2)

head(m.data)
```

We can then model the outcome in this dataset using the standard regression functions in R, like `lm()` or `glm()`, being sure to include the matching weights (stored in the `weights` variable of the `match_data()` output) in the estimation[^3]. Finally, we use `marginaleffects::avg_comparisons()` to perform g-computation to estimate the ATT. We recommend using cluster-robust standard errors for most analyses, with pair membership as the clustering variable; `avg_comparisons()` makes this straightforward.

[^3]: With 1:1 nearest neighbor matching without replacement, excluding the matching weights does not change the estimates. For all other forms of matching, they are required, so we recommend always including them for consistency.
```{r, eval = (use != "none" && me_ok)}
library("marginaleffects")

fit <- lm(re78 ~ treat * (age + educ + race + married + nodegree + re74 + re75),
          data = m.data, weights = weights)

avg_comparisons(fit,
                variables = "treat",
                vcov = ~subclass,
                newdata = subset(treat == 1))
```

```{r, include = FALSE}
est <- {
  if (use != "none" && me_ok) {
    avg_comparisons(fit,
                    variables = "treat",
                    vcov = ~subclass,
                    newdata = subset(treat == 1))
  } else {
    data.frame(type = "response", term = "1 - 0",
               estimate = 2114, std.error = 646,
               statistic = 3.27, p.value = 0.0011,
               conf.low = 848, conf.high = 3380)
  }
}
```

The outcome model coefficients and tests should not be interpreted or reported. See `vignette("estimating-effects")` for more information on how to estimate effects and standard errors with different forms of matching and with different outcome types.

A benefit of matching is that the outcome model used to estimate the treatment effect is robust to misspecification when balance has been achieved. With full matching, we were able to achieve balance, so the effect estimate should depend less on the form of the outcome model used than had we used 1:1 matching without replacement or no matching at all.

## Reporting Results

To report matching results in a manuscript or research report, a few key pieces of information are required. One should be as detailed as possible about the matching procedure and the decisions made to ensure the analysis is replicable and can be adequately assessed for soundness by the audience.
Key pieces of information to include are:

1) the matching specification used (including the method and any additional options, like calipers or common support restrictions),
2) the distance measure used (including how it was estimated, e.g., using logistic regression for propensity scores),
3) which other matching methods were tried prior to settling on a final specification and how the choices were made,
4) the balance of the final matching specification (including standardized mean differences and other balance statistics for the variables, their powers, and their interactions; some of these can be reported as summaries rather than in full detail),
5) the number of matched, unmatched, and discarded units included in the effect estimation, and
6) the method of estimating the treatment effect and standard error or confidence interval (including the specific model used and the specific type of standard error).

See @thoemmes2011 for a complete list of specific details to report. Below is an example of how we might write up the prior analysis:

> We used propensity score matching to estimate the average marginal effect of the treatment on 1978 earnings on those who received it accounting for confounding by the included covariates. We first attempted 1:1 nearest neighbor propensity score matching without replacement with a propensity score estimated using logistic regression of the treatment on the covariates. This matching specification yielded poor balance, so we instead tried full matching on the propensity score, which yielded adequate balance, as indicated in Table 1 and Figure 1. The propensity score was estimated using a probit regression of the treatment on the covariates, which yielded better balance than did a logistic regression. After matching, all standardized mean differences for the covariates were below 0.1 and all standardized mean differences for squares and two-way interactions between covariates were below .15, indicating adequate balance.
> Full matching uses all treated and all control units, so no units were discarded by the matching.
>
> To estimate the treatment effect and its standard error, we fit a linear regression model with 1978 earnings as the outcome and the treatment, covariates, and their interaction as predictors and included the full matching weights in the estimation. The `lm()` function was used to fit the outcome model, and the `avg_comparisons()` function in the `marginaleffects` package was used to perform g-computation in the matched sample to estimate the ATT. A cluster-robust variance was used to estimate its standard error with matching stratum membership as the clustering variable.
>
> The estimated effect was \$`r round(est$estimate)` (SE = `r round(est$std.error, 1)`, p = `r round(est$p.value, 3)`), indicating that the average effect of the treatment for those who received it is to increase earnings.

## Conclusion

Although we have covered the basics of performing a matching analysis here, to use matching to its full potential, the more advanced methods available in `MatchIt` should be considered. We recommend reading the other vignettes included here to gain a better understanding of all that `MatchIt` has to offer and how to use it responsibly and effectively. As previously stated, the ease of using `MatchIt` does not imply that matching or causal inference in general are simple matters; matching is an advanced statistical technique that should be used with care and caution. We hope the capabilities of `MatchIt` ease and encourage the use of nonparametric preprocessing for estimating causal effects in a robust and well-justified way.
## References

---
title: "Matching with Sampling Weights"
author: "Noah Greifer"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Matching with Sampling Weights}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
options(width = 200, digits = 4)
```

```{r, include = FALSE}
#Generating data similar to Austin (2009) for demonstrating
#treatment effect estimation with sampling weights
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] -
    log(2.4)*X[,5] + log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] +
    rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

gen_SW <- function(X) {
  e <- rbinom(nrow(X), 1, .3)
  1/plogis(log(1.4)*X[,2] + log(.7)*X[,4] + log(.9)*X[,6] +
             log(1.5)*X[,8] + log(.9)*e + -log(.5)*e*X[,2] +
             log(.6)*e*X[,4])
}

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)
SW <- gen_SW(X)

Y_C <- gen_Y_C(A, X)

d <- data.frame(A, X, Y_C, SW)

eval_est <- (requireNamespace("optmatch", quietly = TRUE) &&
               requireNamespace("marginaleffects", quietly = TRUE) &&
               !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false"))) &&
               requireNamespace("sandwich", quietly = TRUE) &&
               utils::packageVersion("marginaleffects") > '0.25.0')
```

## Introduction

Sampling weights (also known as survey weights) frequently appear when using large, representative datasets. They are required to ensure any estimated quantities generalize to a target population defined by the weights.
Evidence suggests that sampling weights need to be incorporated into a propensity score matching analysis to obtain valid and unbiased estimates of the treatment effect in the sampling weighted population [@dugoff2014; @austin2016; @lenis2019]. In this guide, we demonstrate how to use sampling weights with `MatchIt` for propensity score estimation, balance assessment, and effect estimation. Fortunately, doing so is not complicated, but some care must be taken to ensure sampling weights are incorporated correctly. It is assumed one has read the other vignettes explaining matching (`vignette("matching-methods")`), balance assessment (`vignette("assessing-balance")`), and effect estimation (`vignette("estimating-effects")`).

We will use the same simulated toy dataset used in `vignette("estimating-effects")` except with the addition of a sampling weights variable, `SW`, which is used to generalize the sample to a specific target population with a distribution of covariates different from that of the sample. Code to generate the covariates, treatment, and outcome is at the bottom of `vignette("estimating-effects")` and code to generate the sampling weights is at the end of this document. We will consider the effect of binary treatment `A` on continuous outcome `Y_C`, adjusting for confounders `X1`-`X9`.

```{r, message = FALSE, warning = FALSE}
head(d)

library("MatchIt")
```

## Matching

When using sampling weights with propensity score matching, one has the option of including the sampling weights in the model used to estimate the propensity scores. Although evidence is mixed on whether this is required [@austin2016; @lenis2019], it can be a good idea. The choice should depend on whether including the sampling weights improves the quality of the matches. Specifications including and excluding sampling weights should be tried to determine which is preferred.
To supply sampling weights to the propensity score-estimating function in `matchit()`, the sampling weights variable should be supplied to the `s.weights` argument. It can be supplied either as a numerical vector containing the sampling weights, or as a string or one-sided formula with the name of the sampling weights variable in the supplied dataset. Below we demonstrate including sampling weights into propensity scores estimated using logistic regression for optimal full matching for the average treatment effect in the population (ATE) (note that all methods and steps apply the same way to all forms of matching and all estimands).

```{asis, echo = eval_est}
Note: if the `optmatch`, `marginaleffects`, or `sandwich` packages are not available, the subsequent lines will not run.
```

```{r, eval = eval_est}
mF_s <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
                data = d, method = "full", distance = "glm",
                estimand = "ATE", s.weights = ~SW)
mF_s
```

Notice that the description of the matching specification when the `matchit` object is printed includes lines indicating that the sampling weights were included in the estimation of the propensity score and that they are present in the `matchit` object. They are stored in the `s.weights` component of the `matchit` object. Note that at this stage, the matching weights (stored in the `weights` component of the `matchit` object) do not incorporate the sampling weights; they are calculated simply as a result of the matching.

Now let's perform full matching on a propensity score that does not include the sampling weights in its estimation. Here we use the same specification as was used in `vignette("estimating-effects")`.

```{r, eval = eval_est}
mF <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
              data = d, method = "full", distance = "glm",
              estimand = "ATE")
mF
```

Notice that there is no mention of sampling weights in the description of the matching specification.
However, to properly assess balance and estimate effects, we need the sampling weights to be included in the `matchit` object, even if they were not used at all in the matching. To do so, we use the function `add_s.weights()`, which adds sampling weights to the supplied `matchit` object.

```{r, eval = eval_est}
mF <- add_s.weights(mF, ~SW)

mF
```

Now when we print the `matchit` object, we can see lines have been added identifying that sampling weights are present but were not used in the estimation of the propensity score used in the matching.

Note that not all methods can involve sampling weights in the estimation. Only methods that use the propensity score will be affected by sampling weights; coarsened exact matching or Mahalanobis distance optimal pair matching, for example, ignore the sampling weights, and some propensity score estimation methods, like `randomForest` and `bart` (as presently implemented), cannot incorporate sampling weights. Sampling weights should still be supplied to `matchit()` even when using these methods, both to avoid having to use `add_s.weights()` and to avoid having to remember which methods do or do not involve sampling weights.

## Assessing Balance

Now we need to decide which matching specification is the best to use for effect estimation. We do this by selecting the one that yields the best balance without sacrificing remaining effective sample size. Because the sampling weights are incorporated into the `matchit` object, the balance assessment tools in `plot.matchit()` and `summary.matchit()` incorporate them into their output.

We'll use `summary()` to examine balance on the two matching specifications. With sampling weights included, the balance statistics for the unmatched data are weighted by the sampling weights. The balance statistics for the matched data are weighted by the product of the sampling weights and the matching weights. It is the product of these weights that will be used in estimating the treatment effect.
Below we use `summary()` to display balance for the two matching specifications. No additional arguments to `summary()` are required for it to use the sampling weights; as long as they are in the `matchit` object (either due to being supplied with the `s.weights` argument in the call to `matchit()` or to being added afterward by `add_s.weights()`), they will be correctly incorporated into the balance statistics.

```{r, eval = eval_est}
#Balance before matching and for the SW propensity score full matching
summary(mF_s)

#Balance for the non-SW propensity score full matching
summary(mF, un = FALSE)
```

The results of the two matching specifications are similar. Balance appears to be slightly better when using the sampling weight-estimated propensity scores than when using the unweighted propensity scores. However, the effective sample size for the control group is larger when using the unweighted propensity scores. Neither propensity score specification achieves excellent balance, and more fiddling with the matching specification (e.g., by changing the method of estimating propensity scores, the type of matching, or the options used with the matching) might yield a better matched set. For the purposes of this analysis, we will move forward with the matching that used the sampling weight-estimated propensity scores (`mF_s`) because of its superior balance. Some of the remaining imbalance may be eliminated by adjusting for the covariates in the outcome model.

Note that had we not added sampling weights to `mF`, the matching specification that did not include the sampling weights, our balance assessment would be inaccurate because the balance statistics would not include the sampling weights. In this case, in fact, assessing balance on `mF` without incorporating the sampling weights would have yielded radically different results and a different conclusion.
It is critical to incorporate sampling weights into the `matchit` object using `add_s.weights()` even if they are not included in the propensity score estimation.

## Estimating the Effect

Estimating the treatment effect after matching is straightforward when using sampling weights. Effects are estimated in the same way as when sampling weights are excluded, except that the matching weights must be multiplied by the sampling weights for use in the outcome model to yield accurate, generalizable estimates. `match_data()` and `get_matches()` do this automatically, so the weights produced by these functions already are a product of the matching weights and the sampling weights. Note this will only be true if sampling weights are incorporated into the `matchit` object. With `avg_comparisons()`, only the sampling weights should be included when estimating the treatment effect.

Below we estimate the effect of `A` on `Y_C` in the matched and sampling weighted sample, adjusting for the covariates to improve precision and decrease bias.

```{r, eval = eval_est}
md_F_s <- match_data(mF_s)

fit <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
          data = md_F_s, weights = weights)

library("marginaleffects")

avg_comparisons(fit,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                wts = "SW")
```

Note that `match_data()` and `get_matches()` have the option `include.s.weights`, which, when set to `FALSE`, makes it so that the returned weights do not incorporate the sampling weights and are simply the matching weights. Because one might forget to multiply the two sets of weights together, it is easier to just use the default of `include.s.weights = TRUE` and ignore the sampling weights in the rest of the analysis (because they are already included in the returned weights).
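To make the relationship between the two sets of weights concrete, here is a sketch (assuming `mF_s` from above; the objects `md_m`, `w`, and `fit2` are hypothetical names introduced only for illustration) of requesting the matching weights alone and multiplying in the sampling weights by hand:

```{r, eval = FALSE}
#Sketch: matching weights only, with the sampling weights
#multiplied in manually before fitting the outcome model
md_m <- match_data(mF_s, include.s.weights = FALSE)
md_m$w <- md_m$weights * md_m$SW

fit2 <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
           data = md_m, weights = w)
```

The default `include.s.weights = TRUE` performs this multiplication for you, which is why it is the recommended setting.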
## Code to Generate Data used in Examples

```{r, eval = FALSE}
#Generating data similar to Austin (2009) for demonstrating
#treatment effect estimation with sampling weights
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] -
    log(2.4)*X[,5] + log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] +
    rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

gen_SW <- function(X) {
  e <- rbinom(nrow(X), 1, .3)
  1/plogis(log(1.4)*X[,2] + log(.7)*X[,4] + log(.9)*X[,6] +
             log(1.5)*X[,8] + log(.9)*e + -log(.5)*e*X[,2] +
             log(.6)*e*X[,4])
}

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)
SW <- gen_SW(X)

Y_C <- gen_Y_C(A, X)

d <- data.frame(A, X, Y_C, SW)
```

## References

---
title: "Estimating Effects After Matching"
author: "Noah Greifer"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Estimating Effects After Matching}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
options(width = 200, digits = 4)

me_ok <- requireNamespace("marginaleffects", quietly = TRUE) &&
  requireNamespace("sandwich", quietly = TRUE) &&
  !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false"))) &&
  utils::packageVersion("marginaleffects") > '0.25.0'
su_ok <- requireNamespace("survival", quietly = TRUE)
boot_ok <- requireNamespace("boot", quietly = TRUE)
```

```{r, include = FALSE}
#Generating data similar to Austin (2009) for demonstrating treatment effect estimation
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n),
              nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] -
    log(2.4)*X[,5] + log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] +
    rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

# Binary outcome
gen_Y_B <- function(A, X) {
  LP_B <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] +
    log(1.5)*X[,4] + log(2.4)*X[,5] + log(1.5)*X[,6]
  P_B <- plogis(LP_B)
  rbinom(length(A), 1, P_B)
}
#Conditional:
#  OR:    2.4
#  logOR: .875
#Marginal:
#  RD:    .144
#  RR:    1.54
#  logRR: .433
#  OR:    1.92
#  logOR: .655

# Survival outcome
gen_Y_S <- function(A, X) {
  LP_S <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] +
    log(1.5)*X[,4] + log(2.4)*X[,5] + log(1.5)*X[,6]
  sqrt(-log(runif(length(A)))*2e4*exp(-LP_S))
}
#Conditional:
#  HR:    2.4
#  logHR: .875
#Marginal:
#  HR:    1.57
#  logHR: .452

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)

Y_C <- gen_Y_C(A, X)
Y_B <- gen_Y_B(A, X)
Y_S <- gen_Y_S(A, X)

d <- data.frame(A, X, Y_C, Y_B, Y_S)
```

## Introduction

After assessing balance and deciding on a matching specification, it comes time to estimate the effect of the treatment in the matched sample. How the effect is estimated and interpreted depends on the desired estimand and the type of model used (if any). In addition to estimating effects, estimating the uncertainty of the effects is critical in communicating them and assessing whether the observed effect is compatible with there being no effect in the population. This guide explains how to estimate effects after various forms of matching and with various outcome types. There may be situations that are not covered here for which additional methodological research may be required, but some of the recommended methods here can be used to guide such applications.
This guide is structured as follows: first, information on the concepts related to effect and standard error (SE) estimation is presented below. Then, instructions for how to estimate effects and SEs are described for the standard case (matching for the ATT with a continuous outcome) and some other common circumstances. Finally, recommendations for reporting results and tips to avoid making common mistakes are presented.

### Identifying the estimand

Before an effect is estimated, the estimand must be specified and clarified. Although some aspects of the estimand depend not only on how the effect is estimated after matching but also on the matching method itself, other aspects must be considered at the time of effect estimation and interpretation. Here, we consider three aspects of the estimand: the population the effect is meant to generalize to (the target population), the effect measure, and whether the effect is marginal or conditional.

**The target population.** Different matching methods allow you to estimate effects that can generalize to different target populations. The most common estimand in matching is the average treatment effect in the treated (ATT), which is the average effect of treatment for those who receive treatment. This estimand is estimable for matching methods that do not change the treated units (i.e., by weighting or discarding units) and is requested in `matchit()` by setting `estimand = "ATT"` (which is the default). The average treatment effect in the population (ATE) is the average effect of treatment for the population from which the sample is a random sample. This estimand is estimable only for methods that allow the ATE and either do not discard units from the sample or explicitly target full sample balance, which in `MatchIt` is limited to full matching, subclassification, and profile matching when setting `estimand = "ATE"`.
When treated units are discarded (e.g., through the use of common support restrictions, calipers, cardinality matching, or [coarsened] exact matching), the estimand corresponds to neither the population ATT nor the population ATE, but rather to an average treatment effect in the remaining matched sample (ATM), which may not correspond to any specific target population. See @greiferChoosingEstimandWhen2021 for a discussion on the substantive considerations involved when choosing the target population of the estimand.

**Marginal and conditional effects.** A marginal effect is a comparison between the expected potential outcome under treatment and the expected potential outcome under control. This is the same quantity estimated in randomized trials without blocking or covariate adjustment and is particularly useful for quantifying the overall effect of a policy or population-wide intervention. A conditional effect is the comparison between the expected potential outcomes in the treatment groups within strata. This is useful for identifying the effect of a treatment for an individual patient or a subset of the population.

**Effect measures.** The outcome types we consider here are continuous, with the effect measured by the mean difference; binary, with the effect measured by the risk difference (RD), risk ratio (RR), or odds ratio (OR); and time-to-event (i.e., survival), with the effect measured by the hazard ratio (HR). The RR, OR, and HR are *noncollapsible* effect measures, which means the marginal effect on that scale is not a (possibly weighted) average of the conditional effects within strata, even if the stratum-specific effects are of the same magnitude. For these effect measures, it is critical to distinguish between marginal and conditional effects because different statistical methods target different types of effects. The mean difference and RD are *collapsible* effect measures, so the same methods can be used to estimate marginal and conditional effects.
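Noncollapsibility of the odds ratio can be seen with a small numerical sketch (the baseline log-odds of -2 and 1 and the conditional OR of 3 are hypothetical values chosen purely for illustration): even when the conditional OR is identical in two equally sized strata, the marginal OR computed from the averaged potential-outcome probabilities is attenuated toward 1.

```{r, eval = FALSE}
b <- log(3) #conditional log OR, identical in both strata

#P(Y = 1 | A, stratum) for two equally sized strata with different baselines
p0 <- plogis(c(-2, 1))     #under control
p1 <- plogis(c(-2, 1) + b) #under treatment

#Conditional ORs: both are exactly 3
(p1 / (1 - p1)) / (p0 / (1 - p0))

#Marginal OR from the averaged probabilities: about 1.94, not 3
m1 <- mean(p1)
m0 <- mean(p0)
(m1 / (1 - m1)) / (m0 / (1 - m0))
```

No such discrepancy arises for the mean difference or RD, which is what makes them collapsible.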
Our primary focus will be on marginal effects, which are appropriate for all effect measures, easily interpretable, and require few modeling assumptions. The "Common Mistakes" section includes examples of commonly used methods that estimate conditional rather than marginal effects and should not be used when marginal effects are desired.

### G-computation

To estimate marginal effects, we use a method known as g-computation [@snowdenImplementationGComputationSimulated2011] or regression estimation [@schaferAverageCausalEffects2008]. This involves first specifying a model for the outcome as a function of the treatment and covariates. Then, for each unit, we compute their predicted values of the outcome setting their treatment status to treated, and then again for control, leaving us with two predicted outcome values for each unit, which are estimates of the potential outcomes under each treatment level. We compute the mean of each of the estimated potential outcomes across the entire sample, which leaves us with two average estimated potential outcomes. Finally, the contrast of these average estimated potential outcomes (e.g., their difference or ratio, depending on the effect measure desired) is the estimate of the treatment effect.

When doing g-computation after matching, a few additional considerations are required. First, when we take the average of the estimated potential outcomes under each treatment level, this must be a weighted average that incorporates the matching weights. Second, if we want to target the ATT or ATC, we only estimate potential outcomes for the treated or control group, respectively (though we still generate predicted values under both treatment and control).

G-computation as a framework for estimating effects after matching has a number of advantages over other approaches.
It works the same regardless of the form of the outcome model or type of outcome (e.g., whether a linear model is used for a continuous outcome or a logistic model is used for a binary outcome); the only difference might be how the average expected potential outcomes are contrasted in the final step. In simple cases, the estimated effect is numerically identical to effects estimated using other methods; for example, if no covariates are included in the outcome model, the g-computation estimate is equal to the difference in means from a t-test or the coefficient on the treatment in a linear model for the outcome. There are analytic approximations to the SEs of the g-computation estimate, and these SEs can incorporate pair/subclass membership (described in more detail below). For all these reasons, we use g-computation when possible for all effect estimates, even if there are simpler methods that would yield the same estimates. Using a single workflow (with some slight modifications depending on the context; see below) facilitates implementing best practices regardless of what choices a user makes.

### Modeling the Outcome

The goal of the outcome model is to generate good predictions for use in the g-computation procedure described above. The type and form of the outcome model should depend on the outcome type. For continuous outcomes, one can use a linear model regressing the outcome on the treatment; for binary outcomes, one can use a generalized linear model with, e.g., a logistic link; for time-to-event outcomes, one can use a Cox proportional hazards model.

An additional decision to make is whether (and how) to include covariates in the outcome model. One may ask, why use matching at all if you are going to model the outcome with covariates anyway? Matching reduces the dependence of the effect estimate on correct specification of the outcome model; this is the central thesis of @ho2007.
Including covariates in the outcome model after matching has several functions: it can increase precision in the effect estimate, reduce the bias due to residual imbalance, and make the effect estimate "doubly robust", which means it is consistent if either the matching sufficiently reduces imbalance in the covariates or the outcome model is correct. For these reasons, we recommend covariate adjustment after matching when possible. There is some evidence that covariate adjustment is most helpful for covariates with standardized mean differences greater than .1 [@nguyen2017], so these covariates and covariates thought to be highly predictive of the outcome should be prioritized in treatment effect models if not all can be included due to sample size constraints.

Although there are many possible ways to include covariates (e.g., not just main effects but interactions, smoothing terms like splines, or other nonlinear transformations), it is important not to engage in specification search (i.e., trying many outcome models in search of the "best" one). Doing so can invalidate results and yield a conclusion that fails to replicate. For this reason, we recommend only including the same terms included in the propensity score model unless there is a strong *a priori* and justifiable reason to model the outcome differently.

It is important not to interpret the coefficients and tests of covariates in the outcome model. These are not causal effects, and their estimates may be severely confounded. Only the treatment effect estimate can be interpreted as causal, assuming the relevant assumptions about unconfoundedness are met. Inappropriately interpreting the coefficients of covariates in the outcome model is known as the Table 2 fallacy [@westreich2013]. To avoid this, we only display the results of the g-computation procedure and do not examine or interpret the outcome models themselves.
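As a stylized illustration of the full procedure, the sketch below performs g-computation by hand on simulated data using only base R. All variable names here are hypothetical; in this vignette, the estimation is instead done with `marginaleffects::avg_comparisons()`, which additionally provides SEs and confidence intervals.

```r
#Minimal base-R sketch of weighted g-computation for the ATT
#(simulated data; names are hypothetical)
set.seed(123)
n  <- 500
x  <- rnorm(n)
a  <- rbinom(n, 1, plogis(x))
y  <- 1 + .5*a + x + rnorm(n)   #true treatment effect of .5
w  <- runif(n)                  #stand-in for matching weights
dd <- data.frame(y, a, x, w)

#1) Fit a weighted outcome model with a treatment-covariate interaction
fit <- lm(y ~ a * x, data = dd, weights = w)

#2) Predict each unit's potential outcomes under both treatment levels;
#   for the ATT, restrict to the treated units
d1  <- subset(dd, a == 1)
EY1 <- weighted.mean(predict(fit, newdata = transform(d1, a = 1)), d1$w)
EY0 <- weighted.mean(predict(fit, newdata = transform(d1, a = 0)), d1$w)

#3) Contrast the average estimated potential outcomes
EY1 - EY0
```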
### Estimating Standard Errors and Confidence Intervals

Uncertainty estimation (i.e., of SEs, confidence intervals, and p-values) should account for the variety of sources of uncertainty present in the analysis, including (but not limited to!) estimation of the propensity score (if used), matching (i.e., because treated units might be matched to different control units if others had been sampled), and estimation of the treatment effect (i.e., because of sampling error). In general, there are no analytic solutions to all these issues, so much of the research done on uncertainty estimation after matching has relied on simulation studies. The two primary methods that have been shown to perform well in matched samples are cluster-robust SEs and the bootstrap, described below.

To compute SEs after g-computation, a method known as the delta method is used; this is a way to compute the SEs of the derived quantities (the expected potential outcomes and their contrast) from the variance of the coefficients of the outcome model. For nonlinear models (e.g., logistic regression), the delta method is only an approximation subject to error (though in many cases this error is small and shrinks in large samples). Because the delta method relies on the variance of the coefficients from the outcome model, it is important to correctly estimate these variances, using either robust or cluster-robust methods as described below.

#### Robust and Cluster-Robust Standard Errors

**Robust standard errors.** Also known as sandwich SEs (due to the form of the formula for computing them), heteroscedasticity-consistent SEs, or Huber-White SEs, robust SEs are an adjustment to the usual maximum likelihood or ordinary least squares SEs that are robust to violations of some of the assumptions required for the usual SEs to be valid [@mackinnon1985]. Although there has been some debate about their utility [@king2015], robust SEs rarely degrade inferences and often improve them.
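For intuition about where the "sandwich" name comes from, the following base-R sketch computes the HC0 version of the robust variance for a toy OLS fit. This is a simplified illustration on simulated data; in practice one would use the `sandwich` package (e.g., `sandwich::vcovHC()`), as this vignette does via `marginaleffects`.

```r
#HC0 sandwich variance for OLS: V = (X'X)^-1 [X' diag(e^2) X] (X'X)^-1
#(toy simulated data with heteroscedastic errors)
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 2*x + rnorm(n, sd = 1 + abs(x))

fit <- lm(y ~ x)
X   <- model.matrix(fit)
e   <- residuals(fit)

bread <- solve(crossprod(X))  #(X'X)^-1
meat  <- crossprod(X * e)     #X' diag(e^2) X
V_hc0 <- bread %*% meat %*% bread

sqrt(diag(V_hc0))  #robust SEs; compare with sqrt(diag(vcov(fit)))
```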
Generally, robust SEs **must** be used when any non-uniform weights are included in the estimation (e.g., with matching with replacement or inverse probability weighting).

**Cluster-robust standard errors.** A version of robust SEs known as cluster-robust SEs [@liang1986] can be used to account for dependence between observations within clusters (e.g., matched pairs). @abadie2019 demonstrate analytically that cluster-robust SEs are generally valid after matching, whereas regular robust SEs can over- or under-estimate the true sampling variability of the effect estimator depending on the specification of the outcome model (if any) and the degree of effect modification. A plethora of simulation studies have further confirmed the validity of cluster-robust SEs after matching [e.g., @austin2009a; @austin2014; @gayat2012; @wan2019; @austin2013]. Given this evidence favoring the use of cluster-robust SEs, we recommend them in most cases and use them judiciously in this guide[^1].

[^1]: Because they are only appropriate with a large number of clusters, cluster-robust SEs are generally not used with subclassification methods. Regular robust SEs are valid with these methods when using the subclassification weights to estimate marginal effects.

#### Bootstrapping

One problem when using robust and cluster-robust SEs along with the delta method is that the delta method is an approximation, as previously mentioned. One solution to this problem is bootstrapping, a technique used to simulate the sampling distribution of an estimator by repeatedly drawing samples with replacement and estimating the effect in each bootstrap sample [@efron1993]. From the bootstrap distribution, SEs and confidence intervals can be computed in several ways, including using the standard deviation of the bootstrap estimates as the SE estimate or using the 2.5 and 97.5 percentiles as 95% confidence interval bounds.
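The logic of the percentile bootstrap can be sketched in a few lines of base R. This toy example bootstraps a simple difference in means; in a real matching analysis, each replicate would re-run the entire pipeline, as demonstrated later in this vignette.

```r
#Toy percentile bootstrap for a difference in means (simulated data)
set.seed(987)
y <- rnorm(100, mean = 2)
a <- rep(0:1, 50)

est <- function(ii) {
  yy <- y[ii]; aa <- a[ii]
  mean(yy[aa == 1]) - mean(yy[aa == 0])
}

#Resample units with replacement and re-estimate in each replicate
boots <- replicate(999, est(sample(seq_along(y), replace = TRUE)))

#SE from the bootstrap standard deviation; 95% CI from the percentiles
sd(boots)
quantile(boots, c(.025, .975))
```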
Bootstrapping tends to be most useful when no analytic estimator of an SE is possible or has been derived yet. Although @abadie2008 found analytically that the bootstrap is inappropriate for matched samples, simulation evidence has found it to be adequate in many cases [@hill2006; @austin2014; @austin2017]. Typically, bootstrapping involves performing the entire estimation process in each bootstrap sample, including propensity score estimation, matching, and effect estimation. This tends to be the most straightforward route, though intervals from this method may be conservative in some cases (i.e., wider than necessary to achieve nominal coverage) [@austin2014]. Less conservative and more accurate intervals have been found when using different forms of the bootstrap, including the wild bootstrap developed by @bodory2020 and the matched/cluster bootstrap described by @austin2014 and @abadie2019. The cluster bootstrap involves sampling matched pairs/strata of units from the matched sample and performing the analysis within each sample composed of the sampled pairs. @abadie2019 derived analytically that the cluster bootstrap is valid for estimating SEs and confidence intervals in the same circumstances in which cluster-robust SEs are; indeed, the cluster bootstrap SE is known to approximate the cluster-robust SE [@cameron2015].

With bootstrapping, more bootstrap replications are always better but take more time and increase the chances that at least one error will occur within the bootstrap analysis (e.g., a bootstrap sample with zero treated units or zero units with an event). In general, numbers of replications upwards of 999 are recommended, with values one less than a multiple of 100 preferred to avoid interpolation when using the percentiles as confidence interval limits [@mackinnon2006].
There are several methods of computing bootstrap confidence intervals, but the bias-corrected accelerated (BCa) bootstrap confidence interval often performs best [@austin2014; @carpenter2000] and is easy to implement, simply by setting `type = "bca"` in the call to `boot::boot.ci()` after running `boot::boot()`[^2].

[^2]: Sometimes, an error will occur with this method, which usually means more bootstrap replications are required. The number of replicates must be greater than the original sample size when using the full bootstrap and greater than the number of pairs/strata when using the block bootstrap.

Most of this guide will consider analytic (i.e., non-bootstrapping) approaches to estimating uncertainty; the section "Using Bootstrapping to Estimate Confidence Intervals" describes broadly how to use bootstrapping. Although analytic estimates are faster to compute, in many cases bootstrap confidence intervals are more accurate.

## Estimating Treatment Effects and Standard Errors After Matching

Below, we describe effect estimation after matching. We'll be using a simulated toy dataset `d` with several outcome types. Code to generate the dataset is at the end of this document. The focus here is not on evaluating the methods but simply on demonstrating them. In all cases, the correct propensity score model is used. Below we display the first six rows of `d`:

```{r}
head(d)
```

`A` is the treatment variable, `X1` through `X9` are covariates, `Y_C` is a continuous outcome, `Y_B` is a binary outcome, and `Y_S` is a survival outcome.
We will need the following packages to perform the desired analyses:

-   `marginaleffects` provides the `avg_comparisons()` function for performing g-computation and estimating the SEs and confidence intervals of the average estimated potential outcomes and treatment effects
-   `sandwich` is used internally by `marginaleffects` to compute robust and cluster-robust SEs
-   `survival` provides `coxph()` to estimate the coefficients in a Cox proportional hazards model for the marginal hazard ratio, which we will use for survival outcomes

Of course, we also need `MatchIt` to perform the matching.

```{r,message=FALSE,warning=FALSE, eval = which(c(TRUE, me_ok))}
library("MatchIt")
library("marginaleffects")
```

All effect estimates will be computed using `marginaleffects::avg_comparisons()`, even when its use may be superfluous (e.g., for performing a t-test in the matched set). As previously mentioned, this is because it is useful to have a single workflow that works no matter the situation, perhaps with very slight modifications to accommodate different contexts. Using `avg_comparisons()` has several advantages, even when the alternatives are simple: it only provides the effect estimate, and not other coefficients; it automatically incorporates robust and cluster-robust SEs if requested; and it always produces average marginal effects for the correct population.

Other packages may be of use but are not used here. There are alternatives to the `marginaleffects` package for computing average marginal effects, including `margins` and `stdReg`. The `survey` package can be used to estimate robust SEs incorporating weights and provides functions for survey-weighted generalized linear models and Cox proportional hazards models.
### The Standard Case

For almost all matching methods, whether a caliper, common support restriction, exact matching specification, or $k$:1 matching specification is used, estimating the effect in the matched dataset is straightforward and involves fitting a model for the outcome that incorporates the matching weights[^3], then estimating the treatment effect using g-computation (i.e., using `marginaleffects::avg_comparisons()`) with a cluster-robust SE to account for pair membership. This procedure is the same for continuous and binary outcomes with and without covariates.

[^3]: The matching weights are not necessary when performing 1:1 matching, but we include them here for generality. When weights are not necessary, including them does not affect the estimates. Because it may not always be clear when weights are required, we recommend always including them.

There are a few adjustments that need to be made for certain scenarios, which we describe in the section "Adjustments to the Standard Case". These adjustments are required in the following cases: when matching for the ATE rather than the ATT, for matching with replacement, for matching with a method that doesn't involve creating pairs (e.g., cardinality and profile matching and coarsened exact matching), for subclassification, for estimating effects with binary outcomes, and for estimating effects with survival outcomes. You must read the Standard Case to understand the basic procedure before reading about these special scenarios.

Here, we demonstrate the faster analytic approach to estimating confidence intervals; for the bootstrap approach, see the section "Using Bootstrapping to Estimate Confidence Intervals" below.

First, we will perform variable-ratio nearest neighbor matching without replacement on the propensity score for the ATT. Remember, all matching methods use this exact procedure or a slight variation, so this section is critical even if you are using a different matching method.
```{r}
#Variable-ratio NN matching on the PS for the ATT
mV <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
              data = d, ratio = 2, max.controls = 4)
mV

#Extract matched data
md <- match_data(mV)

head(md)
```

Typically one would assess balance and ensure that this matching specification works, but we will skip that step here to focus on effect estimation. See `vignette("MatchIt")` and `vignette("assessing-balance")` for more information on this necessary step. Because we did not use a caliper, the target estimand is the ATT.

We perform all analyses using the matched dataset, `md`, which, for matching methods that involve dropping units, contains only the units retained in the sample.

First, we fit a model for the outcome given the treatment and (optionally) the covariates. It's usually a good idea to include treatment-covariate interactions, which we do below, but this is not always necessary, especially when excellent balance has been achieved. You can also include the propensity score (usually labeled `distance` in the `match_data()` output), which can add some robustness, especially when modeled flexibly (e.g., with polynomial terms or splines) [@austinDoublePropensityscoreAdjustment2017]; see [here](https://stats.stackexchange.com/a/580174/116195) for an example.

```{r}
#Linear model with covariates
fit1 <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
           data = md, weights = weights)
```

Next, we use `marginaleffects::avg_comparisons()` to estimate the ATT.
```{r, eval=me_ok}
avg_comparisons(fit1,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1))
```

Let's break down the call to `avg_comparisons()`: to the first argument, we supply the model fit, `fit1`; to the `variables` argument, the name of the treatment (`"A"`); to the `vcov` argument, a formula with subclass membership (`~subclass`) to request cluster-robust SEs; and to the `newdata` argument, a version of the matched dataset containing only the treated units (`subset(A == 1)`) to request the ATT. Some of these arguments differ depending on the specifics of the matching method and outcome type; see the sections below for more information.

If, in addition to the effect estimate, we want the average estimated potential outcomes, we can use `marginaleffects::avg_predictions()`, which we demonstrate below. Note that the interpretation of the resulting estimates as the expected potential outcomes is only valid if all covariates present in the outcome model (if any) are interacted with the treatment.

```{r, eval=me_ok && packageVersion("marginaleffects") >= "0.11.0"}
avg_predictions(fit1,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1))
```

We can see that the difference in potential outcome means is equal to the average treatment effect computed previously[^4]. All of the arguments to `avg_predictions()` are the same as those to `avg_comparisons()`.

[^4]: To verify that they are equal, supply the output of `avg_predictions()` to `hypotheses()`, e.g., `avg_predictions(...) |> hypotheses(~pairwise)`; this explicitly compares the average potential outcomes and should yield identical estimates to the `avg_comparisons()` call.

### Adjustments to the Standard Case

This section explains how the procedure might differ if any of the following special circumstances occur.
#### Matching for the ATE

When matching for the ATE (including [coarsened] exact matching, full matching, subclassification, and cardinality matching), everything is identical to the Standard Case except that in the calls to `avg_comparisons()` and `avg_predictions()`, the `newdata` argument is omitted. This is because the estimated potential outcomes are computed for the full sample rather than just the treated units.

#### Matching with replacement

When matching with replacement (i.e., nearest neighbor or genetic matching with `replace = TRUE`), effect and SE estimation need to account for control unit multiplicity (i.e., repeated use) and within-pair correlations [@hill2006; @austin2020a]. Although @abadie2008 demonstrated analytically that bootstrap SEs may be invalid for matching with replacement, simulation work by @hill2006 and @bodory2020 has found that bootstrap SEs are adequate and generally slightly conservative. See the section "Using Bootstrapping to Estimate Confidence Intervals" for instructions on using the bootstrap and an example that uses matching with replacement.

Because control units do not belong to unique pairs, there is no pair membership in the `match_data()` output. One can simply change `vcov = ~subclass` to `vcov = "HC3"` in the calls to `avg_comparisons()` and `avg_predictions()` to use robust SEs instead of cluster-robust SEs, as recommended by @hill2006.

There is some evidence for an alternative approach that incorporates pair membership and adjusts for reuse of control units, though this has only been studied for survival outcomes [@austin2020a]. This adjustment involves using two-way cluster-robust SEs with pair membership and unit ID as the clustering variables.
For continuous and binary outcomes, this involves the following two changes: 1) replace `match_data()` with `get_matches()`, which produces a dataset with one row per unit per pair, meaning control units matched to multiple treated units will appear multiple times in the dataset; 2) set `vcov = ~subclass + id` in the calls to `avg_comparisons()` and `avg_predictions()`. For survival outcomes, a special procedure must be used; see the section on survival outcomes below.

#### Matching without pairing

Some matching methods do not involve creating pairs; these include cardinality and profile matching with `mahvars = NULL` (the default), exact matching, and coarsened exact matching with `k2k = FALSE` (the default). The only change that needs to be made to the Standard Case is that one should change `vcov = ~subclass` to `vcov = "HC3"` in the calls to `avg_comparisons()` and `avg_predictions()` to use robust SEs instead of cluster-robust SEs. Remember that if matching is done for the ATE (even if units are dropped), the `newdata` argument should be dropped.

#### Propensity score subclassification

There are two natural ways to estimate marginal effects after subclassification: the first is to estimate subclass-specific treatment effects and pool them using an average marginal effects procedure, and the second is to use the stratum weights to estimate a single average marginal effect. This latter approach is also known as marginal mean weighting through stratification (MMWS) and is described in detail by @hong2010[^5]. When done properly, both methods should yield similar or identical estimates of the treatment effect.

[^5]: It is also known as fine stratification weighting, described by @desai2017.
All of the methods described above for the Standard Case also work with MMWS because the formation of the weights is the same; the only difference is that it is not appropriate to use cluster-robust SEs with MMWS because of how few clusters are present, so one should change `vcov = ~subclass` to `vcov = "HC3"` in the calls to `avg_comparisons()` and `avg_predictions()` to use robust SEs instead of cluster-robust SEs. The subclasses can optionally be included in the outcome model (optionally interacting with treatment) as an alternative to including the propensity score.

The subclass-specific approach omits the weights and uses the subclasses directly. It is only appropriate when there are a small number of subclasses relative to the sample size. In the outcome model, `subclass` should interact with all other predictors in the model (including the treatment, covariates, and interactions, if any), and the `weights` argument should be omitted. As with MMWS, one should change `vcov = ~subclass` to `vcov = "HC3"` in the calls to `avg_comparisons()` and `avg_predictions()`. See an example below:

```{r, eval=me_ok}
#Subclassification on the PS for the ATT
mS <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
              data = d, method = "subclass",
              estimand = "ATT")

#Extract matched data
md <- match_data(mS)

fitS <- lm(Y_C ~ subclass * (A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9)),
           data = md)

avg_comparisons(fitS,
                variables = "A",
                vcov = "HC3",
                newdata = subset(A == 1))
```

A model with fewer terms may be required when subclasses are small; removing covariates or their interactions with treatment may be required and can increase precision in smaller datasets. Remember that if subclassification is done for the ATE (even if units are dropped), the `newdata` argument should be dropped.

#### Binary outcomes

Estimating effects on binary outcomes is essentially the same as for continuous outcomes.
The main difference is that there are several measures of the effect one can consider, including the odds ratio (OR), risk ratio/relative risk (RR), and risk difference (RD), and the syntax to `avg_comparisons()` depends on which one is desired. The outcome model should be one appropriate for binary outcomes (e.g., logistic regression) but is unrelated to the desired effect measure because we can compute any of the above effect measures using `avg_comparisons()` after the logistic regression.

To fit a logistic regression model, change `lm()` to `glm()` and set `family = quasibinomial()`[^6]. To compute the marginal RD, we can use exactly the same syntax as in the Standard Case; nothing needs to change[^7].

[^6]: We use `quasibinomial()` instead of `binomial()` simply to avoid a spurious warning that can occur with certain kinds of matching; the results will be identical regardless.

[^7]: Note that for low or high average expected risks computed with `avg_predictions()`, the confidence intervals may go below 0 or above 1; this is because an approximation is used. To avoid this problem, bootstrapping or simulation-based inference can be used instead.

To compute the marginal RR, we need to add `comparison = "lnratioavg"` to the call to `avg_comparisons()`; this computes the marginal log RR. To recover the marginal RR itself, we also add `transform = "exp"`, which exponentiates the marginal log RR and its confidence interval.
The code below computes the effects and displays the statistics of interest:

```{r, eval=me_ok}
#Logistic regression model with covariates
fit2 <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
            data = md, weights = weights,
            family = quasibinomial())

#Compute effects; RR and confidence interval
avg_comparisons(fit2,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                comparison = "lnratioavg",
                transform = "exp")
```

The output displays the marginal RR, its Z-value, the p-value for the Z-test of the log RR against 0, and its confidence interval. (Note that even though the `Contrast` label still suggests the log RR, the RR is actually displayed.) To view the log RR and its standard error, omit the `transform` argument.

For the marginal OR, the only thing that needs to change is that `comparison` should be set to `"lnoravg"`. For the marginal RD, both the `comparison` and `transform` arguments can be removed (yielding the same call as in the Standard Case).

#### Survival outcomes

There are several measures of effect size for survival outcomes. When using the Cox proportional hazards model, the quantity of interest is the hazard ratio (HR) between the treated and control groups. As with the OR, the HR is non-collapsible, which means the estimated HR will only be a valid estimate of the marginal HR when no other covariates are included in the model. Other effect measures, such as the difference in mean survival times or the probability of survival after a given time, can be treated just like continuous and binary outcomes as previously described.

For the HR, we cannot compute average marginal effects and must use the coefficient on treatment in a Cox model fit without covariates[^8]. This means that we cannot use the procedures from the Standard Case. Here we describe estimating the marginal HR using `coxph()` from the `survival` package. (See `help("coxph", package = "survival")` for more information on this model.)
To request cluster-robust SEs as recommended by @austin2013a, we need to supply pair membership (stored in the `subclass` column of `md`) to the `cluster` argument and set `robust = TRUE`. For matching methods that don't involve pairing (e.g., cardinality and profile matching and [coarsened] exact matching), we can omit the `cluster` argument (but keep `robust = TRUE`)[^9].

[^8]: It is not immediately clear how to estimate a marginal HR when covariates are included in the outcome model; though @austin2020 describe several ways of including covariates in a model to estimate the marginal HR, they do not develop SEs and little research has been done on this method, so we will not present it here. Instead, we fit a simple Cox model with the treatment as the sole predictor.

[^9]: For subclassification, only MMWS can be used; this is done simply by including the stratification weights in the Cox model and omitting the `cluster` argument.

```{r, eval=su_ok}
library("survival")

#Cox Regression for marginal HR
coxph(Surv(Y_S) ~ A, data = md, robust = TRUE,
      weights = weights, cluster = subclass)
```

The `coef` column contains the log HR, and `exp(coef)` contains the HR. Remember to always use the `robust se` for the SE of the log HR. The displayed Z-test p-value results from using the robust SE.

For matching with replacement, a special procedure described by @austin2020a may be necessary for valid inference. According to the results of their simulation studies, when the treatment prevalence is low (\<30%), a SE that does not involve pair membership (i.e., the `match_data()` approach, as demonstrated above) is sufficient. When treatment prevalence is higher, the SE that ignores pair membership may be too low, and the authors recommend using a custom SE estimator that uses information about both multiplicity and pairing. Doing so must be done manually for survival models using `get_matches()` and several calls to `coxph()`, as demonstrated in the appendix of @austin2020a.
We demonstrate this below:

```{r, eval = F}
#get_matches() after matching with replacement
gm <- get_matches(mR)

#Austin & Cafri's (2020) SE estimator
fs <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE,
            weights = weights, cluster = subclass)
Vs <- fs$var
ks <- nlevels(gm$subclass)

fi <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE,
            weights = weights, cluster = id)
Vi <- fi$var
ki <- length(unique(gm$id))

fc <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE,
            weights = weights)
Vc <- fc$var
kc <- nrow(gm)

#Compute the variance and sneak it back into the fit object
fc$var <- (ks/(ks-1))*Vs + (ki/(ki-1))*Vi - (kc/(kc-1))*Vc

fc
```

The `robust se` column contains the computed SE, and the reported Z-test uses this SE. The `se(coef)` column should be ignored.

### Using Bootstrapping to Estimate Confidence Intervals

The bootstrap is an alternative to the delta method for estimating confidence intervals for estimated effects. See the section "Bootstrapping" above for details. Here, we'll demonstrate two forms of the bootstrap: 1) the standard bootstrap, which involves resampling units and performing matching and effect estimation within each bootstrap sample, and 2) the cluster bootstrap, which involves resampling pairs after matching and estimating the effect in each bootstrap sample. For both, we will use functionality in the `boot` package. It is critical to set a seed using `set.seed()` prior to performing the bootstrap in order for results to be replicable.

#### The standard bootstrap

For the standard bootstrap, we need a function that takes in the original dataset and a vector of sampled unit indices and returns the estimated quantity of interest. This function should perform the matching on the bootstrap sample, fit the outcome model, and estimate the treatment effect using g-computation.
In this example, we'll use matching with replacement, since the standard bootstrap has been found to work well with it [@bodory2020; @hill2006], despite some analytic results recommending otherwise [@abadie2008]. We'll implement g-computation manually rather than using `avg_comparisons()`, as this dramatically improves the speed of the estimation since we don't require standard errors to be estimated in each sample (or other processing `avg_comparisons()` does). We'll consider the marginal RR ATT of `A` on the binary outcome `Y_B`.

The first step is to write the estimation function, which we call `boot_fun`. This function returns the marginal RR. In it, we perform the matching, estimate the effect, and return the estimate of interest.

```{r}
boot_fun <- function(data, i) {
  boot_data <- data[i,]
  
  #Do 1:1 PS matching with replacement
  m <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
               data = boot_data, replace = TRUE)
  
  #Extract matched dataset
  md <- match_data(m, data = boot_data)
  
  #Fit outcome model
  fit <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
             data = md, weights = weights,
             family = quasibinomial())
  
  ## G-computation ##
  #Subset to treated units for ATT; skip for ATE
  md1 <- subset(md, A == 1)
  
  #Estimated potential outcomes under treatment
  p1 <- predict(fit, type = "response",
                newdata = transform(md1, A = 1))
  Ep1 <- mean(p1)
  
  #Estimated potential outcomes under control
  p0 <- predict(fit, type = "response",
                newdata = transform(md1, A = 0))
  Ep0 <- mean(p0)
  
  #Risk ratio
  Ep1 / Ep0
}
```

Next, we call `boot::boot()` with this function and the original dataset supplied to perform the bootstrapping. We'll request 199 bootstrap replications here, but in practice you should use many more, upwards of 999. More is always better. Using more also allows you to use the bias-corrected and accelerated (BCa) bootstrap confidence intervals (which you can request by setting `type = "bca"` in the call to `boot.ci()`), which are known to be the most accurate.
See `?boot.ci` for details. Here, we'll just use a percentile confidence interval.

```{r, eval = boot_ok, message = FALSE, warning = FALSE}
library("boot")

set.seed(54321)

boot_out <- boot(d, boot_fun, R = 199)

boot_out

boot.ci(boot_out, type = "perc")
```

```{r, include = FALSE}
b <- {
  if (boot_ok) boot::boot.ci(boot_out, type = "perc")
  else list(t0 = 1.347, percent = c(0, 0, 0, 1.144, 1.891))
}
```

We find a RR of `r round(b$t0, 3)` with a confidence interval of (`r round(b$percent[4], 3)`, `r round(b$percent[5], 3)`). If we had wanted a risk difference, we could have changed the final line in `boot_fun()` to be `Ep1 - Ep0`.

#### The cluster bootstrap

For the cluster bootstrap, we need a function that takes in a vector of subclass (e.g., pair) IDs and a vector of sampled pair indices and returns the estimated quantity of interest. This function should fit the outcome model and estimate the treatment effect using g-computation; the matching step occurs prior to the bootstrap. Here, we'll use matching without replacement, since the cluster bootstrap has been found to work well with it [@austin2014; @abadie2019]. This approach can be used with any method that returns pair membership, including other forms of pair matching without replacement and full matching. As before, we'll use g-computation to estimate the marginal RR ATT, and we'll do so manually rather than using `avg_comparisons()` for speed. Note that the cluster bootstrap is already much faster than the standard bootstrap because matching does not need to occur within each bootstrap sample.

First, we'll do a round of matching.

```{r}
mNN <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + 
                 X6 + X7 + X8 + X9, data = d)
mNN

md <- match_data(mNN)
```

Next, we'll write the function that takes in cluster membership and the sampled indices and returns an estimate.
```{r}
#Unique pair IDs
pair_ids <- levels(md$subclass)

#Unit IDs, split by pair membership
split_inds <- split(seq_len(nrow(md)), md$subclass)

cluster_boot_fun <- function(pairs, i) {
  #Extract units corresponding to selected pairs
  ids <- unlist(split_inds[pairs[i]])
  
  #Subset md with block bootstrapped indices
  boot_md <- md[ids,]
  
  #Fit outcome model
  fit <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + 
                          X6 + X7 + X8 + X9),
             data = boot_md, weights = weights,
             family = quasibinomial())
  
  ## G-computation ##
  #Subset to treated units for ATT; skip for ATE
  md1 <- subset(boot_md, A == 1)
  
  #Estimated potential outcomes under treatment
  p1 <- predict(fit, type = "response",
                newdata = transform(md1, A = 1))
  Ep1 <- mean(p1)
  
  #Estimated potential outcomes under control
  p0 <- predict(fit, type = "response",
                newdata = transform(md1, A = 0))
  Ep0 <- mean(p0)
  
  #Risk ratio
  Ep1 / Ep0
}
```

Next, we call `boot::boot()` with this function and the vector of pair IDs supplied to perform the bootstrapping. We'll request 199 bootstrap replications, but in practice you should use many more, upwards of 999; more is always better. Using more replications also allows you to use the bias-corrected and accelerated (BCa) bootstrap confidence intervals, which are known to be the most accurate. See `?boot.ci` for details. Here, we'll just use a percentile confidence interval.

```{r, eval = boot_ok, message = FALSE, warning = FALSE}
library("boot")

set.seed(54321)

cluster_boot_out <- boot(pair_ids, cluster_boot_fun, R = 199)

cluster_boot_out

boot.ci(cluster_boot_out, type = "perc")
```

```{r, include = FALSE}
b <- {
  if (boot_ok) boot::boot.ci(cluster_boot_out, type = "perc")
  else list(t0 = 1.588, percent = c(0, 0, 0, 1.348, 1.877))
}
```

We find a RR of `r round(b$t0, 3)` with a confidence interval of (`r round(b$percent[4], 3)`, `r round(b$percent[5], 3)`). If we had wanted a risk difference, we could have changed the final line in `cluster_boot_fun()` to be `Ep1 - Ep0`.
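The BCa interval mentioned above is requested through the same `type` argument of `boot.ci()`. A hedged sketch (not evaluated, since BCa intervals need substantially more replications than the 199 used here to be reliable):

```{r, eval = FALSE}
#With more replications (e.g., R = 1999), a BCa interval can be requested:
boot.ci(cluster_boot_out, type = "bca")
```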
### Moderation Analysis

Moderation analysis involves determining whether a treatment effect differs across levels of another variable. The use of matching with moderation analysis is described in @greenExaminingModerationAnalyses2014. The goal is to achieve balance within each subgroup of the potential moderating variable, and there are several ways of doing so. Broadly, one can either perform matching in the full dataset, requiring exact matching on the moderator, or one can perform completely separate analyses in each subgroup. We'll demonstrate the first approach below; see the blog post ["Subgroup Analysis After Propensity Score Matching Using R"](https://ngreifer.github.io/blog/subgroup-analysis-psm/) by Noah Greifer for an example of the other approach.

There are benefits to using either approach, and @greenExaminingModerationAnalyses2014 find that either can be successful at balancing the subgroups. The first approach may be most effective with small samples, where separate propensity score models would be fit with greater uncertainty and an increased possibility of perfect prediction or failure to converge [@wangRelativePerformancePropensity2018]. The second approach may be more effective with larger samples or with matching methods that target balance in the matched sample, such as genetic matching [@kreifMethodsEstimatingSubgroup2012]. With genetic matching, separate subgroup analyses ensure balance is optimized within each subgroup rather than just overall. The chosen approach should be the one that achieves the best balance, though we don't demonstrate assessing balance here to maintain focus on effect estimation.

The full dataset approach involves pooling information across subgroups. This could involve estimating propensity scores using a single model for both groups but exact matching on the potential moderator.
The propensity score model could include moderator-by-covariate interactions to allow the propensity score model to vary across subgroups on some covariates. It is critical that exact matching is done on the moderator so that matched pairs are not split across subgroups.

We'll consider the binary variable `X5` to be the potential moderator of the effect of `A` on `Y_C`. Below, we'll estimate a propensity score using a single propensity score model with a few moderator-by-covariate interactions. We'll perform nearest neighbor matching on the propensity score and exact matching on the moderator, `X5`.

```{r}
mP <- matchit(A ~ X1 + X2 + X5*X3 + X4 + X5*X6 + 
                X7 + X5*X8 + X9,
              data = d, exact = ~X5)
mP
```

Although it is straightforward to assess balance overall using `summary()`, it is more challenging to assess balance within subgroups. The easiest way to check subgroup balance would be to use `cobalt::bal.tab()`, which has a `cluster` argument that can be used to assess balance within subgroups, e.g., by `cobalt::bal.tab(mP, cluster = "X5")`. See the vignette "Appendix 2: Using cobalt with Clustered, Multiply Imputed, and Other Segmented Data" on the `cobalt` [website](https://ngreifer.github.io/cobalt/index.html) for details.

If we are satisfied with balance, we can then model the outcome with an interaction between the treatment and the moderator.

```{r}
mdP <- match_data(mP)

fitP <- lm(Y_C ~ A * X5, data = mdP, weights = weights)
```

To estimate the subgroup ATTs, we can use `avg_comparisons()`, this time specifying the `by` argument to signify that we want treatment effects stratified by the moderator.

```{r, eval = me_ok}
avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5")
```

We can see that the subgroup mean differences are quite similar to each other.
Finally, we can test for moderation using another call to `avg_comparisons()`, this time using the `hypothesis` argument to signify that we want to compare effects between subgroups:

```{r, eval = me_ok}
avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5",
                hypothesis = ~pairwise)
```

As expected, the difference between the subgroup treatment effects is small and nonsignificant, so there is no evidence of moderation by `X5`.

When the moderator has more than two levels, it is possible to run an omnibus test for moderation by changing `hypothesis` to `~reference` and supplying the output to `hypotheses()` with `joint = TRUE`, e.g.,

```{r, eval = FALSE}
avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5",
                hypothesis = ~reference) |>
  hypotheses(joint = TRUE)
```

This produces a single p-value for the test that all pairwise differences between subgroups are equal to zero.

### Reporting Results

It is important to be as thorough and complete as possible when describing the methods of estimating the treatment effect and the results of the analysis. This improves transparency and replicability of the analysis.
Results should at least include the following:

-   a description of the outcome model used (e.g., logistic regression, a linear model with treatment-covariate interactions and covariates, a Cox proportional hazards model with the matching weights applied)
-   the way the effect was estimated (e.g., using g-computation or as the coefficient in the outcome model)
-   the way SEs and confidence intervals were estimated (e.g., using robust SEs, using cluster-robust SEs with pair membership as the cluster, or using the BCa bootstrap with 4999 bootstrap replications and the entire process of matching and effect estimation included in each replication)
-   the R packages and functions used in estimating the effect and its SE (e.g., `glm()` in base R, `avg_comparisons()` in `marginaleffects`, `boot()` and `boot.ci()` in `boot`)
-   the effect and its SE and confidence interval

All of this is in addition to information about the matching method, propensity score estimation procedure (if used), balance assessment, etc. mentioned in the other vignettes.

## Common Mistakes

There are a few common mistakes that should be avoided. It is important not only to avoid these mistakes in one's own research but also to be able to spot them in others' analyses.

### 1. Failing to include weights

Several matching methods involve weights that must be used in estimating the treatment effect. With full matching and stratification matching (when analyzed using MMWS), the weights do the entire work of balancing the covariates across the treatment groups; omitting them essentially ignores the entire purpose of matching.

Some cases are less obvious. When performing matching with replacement and estimating the treatment effect using the `match_data()` output, weights must be included to ensure control units matched to multiple treated units are weighted accordingly.
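As a hedged sketch of the correct usage (assuming `md` is the output of `match_data()` after matching with replacement), the weights are passed through the `weights` argument of the modeling function:

```{r, eval = FALSE}
#Incorrect: omits the matching weights, so control units matched to
#multiple treated units are not upweighted
fit_wrong <- glm(Y_B ~ A, data = md, family = quasibinomial())

#Correct: supplies the matching weights stored in the `weights` column
fit_right <- glm(Y_B ~ A, data = md, weights = weights,
                 family = quasibinomial())
```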
Similarly, when performing k:1 matching where not all treated units receive k matches, weights are required to account for the differential weight of the matched control units. The only time weights can be omitted after pair matching is when performing 1:1 matching without replacement. Even in this scenario, including the weights will not affect the analysis, so it is good practice to always include them to prevent this error from occurring. There are some scenarios where weights are not useful because the conditioning occurs through some other means, such as when using the direct subclass strategy rather than MMWS for estimating marginal effects after stratification.

### 2. Failing to use robust or cluster-robust standard errors

Robust SEs are required when using weights to estimate the treatment effect. The model-based SEs resulting from weighted least squares or maximum likelihood are inaccurate when using matching weights because they assume the weights are frequency weights rather than probability weights. Cluster-robust SEs account for both the matching weights and pair membership and should be used when appropriate. Sometimes, researchers use functions in the `survey` package to estimate robust SEs, especially with inverse probability weighting; this is a valid way to compute robust SEs and will give similar results to `sandwich::vcovHC()`.[^10]

[^10]: To use `survey` to adjust for pair membership, one can use the following code to specify the survey design to be used with `svyglm()`: `svydesign(ids = ~subclass, weights = ~weights, data = md)`, where `md` is the output of `match_data()`. After `svyglm()`, `avg_comparisons()` can be used, and the `vcov` argument does not need to be specified.

### 3. Interpreting conditional effects as marginal effects

The distinction between marginal and conditional effects is not always clear in either methodological or applied papers.
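As a quick hedged illustration of the distinction (a standalone simulation, separate from the analyses above): even with a randomized treatment and a correctly specified model, the treatment coefficient from a covariate-adjusted logistic regression (a conditional OR) generally differs from the marginal OR obtained by g-computation, because the OR is noncollapsible.

```{r, eval = FALSE}
set.seed(123)
n <- 1e5
x <- rnorm(n)
a <- rbinom(n, 1, .5) #randomized treatment
y <- rbinom(n, 1, plogis(-1 + log(2)*a + 2*x))

fit <- glm(y ~ a + x, family = binomial())

#Conditional OR: approximately the true value of 2
exp(coef(fit)["a"])

#Marginal OR via g-computation: attenuated toward 1 relative to the
#conditional OR, despite no confounding and a correct model
p1 <- mean(predict(fit, newdata = data.frame(a = 1, x = x), type = "response"))
p0 <- mean(predict(fit, newdata = data.frame(a = 0, x = x), type = "response"))
(p1 / (1 - p1)) / (p0 / (1 - p0))
```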
Some statistical methods are valid only for estimating conditional effects, and they should not be used to estimate marginal effects (without further modification). Sometimes conditional effects are desirable, and such methods may be useful for them, but when marginal effects are the target of inference, it is critical not to interpret estimates resulting from methods aimed at estimating conditional effects as marginal effects. Although this issue is particularly salient with binary and survival outcomes due to the general noncollapsibility of the OR, RR, and HR, it can also occur with linear models for continuous outcomes or the RD.

The following methods estimate **conditional effects** for binary or survival outcomes (with noncollapsible effect measures) and should **not** be used to estimate marginal effects:

-   Logistic regression or a Cox proportional hazards model with covariates and/or the propensity score included, using the coefficient on treatment as the effect estimate
-   Conditional logistic regression after matching (e.g., using `survival::clogit()`)
-   Stratified Cox regression after matching (e.g., using `survival::coxph()` with `strata()` in the model formula)
-   Averaging stratum-specific effect estimates after stratification, including using Mantel-Haenszel OR pooling
-   Including pair or stratum fixed or random effects in a logistic regression model, using the coefficient on treatment as the effect estimate

In addition, with continuous outcomes, conditional effects can be mistakenly interpreted as marginal effect estimates when treatment-covariate interactions are present in the outcome model.
If the covariates are not centered at their means in the target population (e.g., the treated group for the ATT, the full sample for the ATE, or the remaining matched sample for an ATM), the coefficient on treatment will not correspond to the marginal effect in the target population; it will correspond to the effect of treatment when the covariate values are all equal to zero, which may not be meaningful or plausible. G-computation is always the safest way to estimate effects when including covariates in the outcome model, especially in the presence of treatment-covariate interactions.

## References

::: {#refs}
:::

## Code to Generate Data used in Examples

```{r, eval = FALSE}
#Generating data similar to Austin (2009) for demonstrating
#treatment effect estimation
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- -1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] -
    log(2.4)*X[,5] + log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

#Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] +
    rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

#Binary outcome
gen_Y_B <- function(A, X) {
  LP_B <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] +
    log(1.5)*X[,4] + log(2.4)*X[,5] + log(1.5)*X[,6]
  P_B <- plogis(LP_B)
  rbinom(length(A), 1, P_B)
}
#Conditional:
#  OR:    2.4
#  logOR: .875
#Marginal:
#  RD:    .144
#  RR:    1.54
#  logRR: .433
#  OR:    1.92
#  logOR: .655

#Survival outcome
gen_Y_S <- function(A, X) {
  LP_S <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] +
    log(1.5)*X[,4] + log(2.4)*X[,5] + log(1.5)*X[,6]
  sqrt(-log(runif(length(A)))*2e4*exp(-LP_S))
}
#Conditional:
#  HR:    2.4
#  logHR: .875
#Marginal:
#  HR:    1.57
#  logHR: .452

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)

Y_C <- gen_Y_C(A, X)
Y_B <- gen_Y_B(A, X)
Y_S <- gen_Y_S(A, X)

d <- data.frame(A, X,
Y_C, Y_B, Y_S)
```

@article{ho2007, title = {Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference}, author = {{Ho}, {Daniel E.} and {Imai}, {Kosuke} and {King}, {Gary} and {Stuart}, {Elizabeth A.}}, year = {2007}, month = {06}, date = {2007-06-20}, journal = {Political Analysis}, pages = {199--236}, volume = {15}, number = {3}, doi = {10.1093/pan/mpl013}, url = {https://pan.oxfordjournals.org/content/15/3/199}, langid = {en} }

@article{mccaffrey2004, title = {Propensity Score Estimation With Boosted Regression for Evaluating Causal Effects in Observational Studies.}, author = {{McCaffrey}, {Daniel F.} and {Ridgeway}, {Greg} and {Morral}, {Andrew R.}}, year = {2004}, date = {2004}, journal = {Psychological Methods}, pages = {403--425}, volume = {9}, number = {4}, doi = {10.1037/1082-989X.9.4.403}, url = {https://doi.apa.org/getdoi.cfm?doi=10.1037/1082-989X.9.4.403}, langid = {en} }

@article{diamond2013, title = {Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies}, author = {{Diamond}, {Alexis} and {Sekhon}, {Jasjeet S.}}, year = {2013}, date = {2013}, journal = {Review of Economics and Statistics}, pages = {932{\textendash}945}, volume = {95}, number = {3}, doi = {10.1162/REST_a_00318}, url = {https://www.mitpressjournals.org/doi/abs/10.1162/REST_a_00318}, langid = {en} }

@article{belitser2011, title = {Measuring balance and model selection in propensity score methods}, author = {{Belitser}, {Svetlana V.} and {Martens}, {Edwin P.} and {Pestman}, {Wiebe R.} and {Groenwold}, {Rolf H.H.} and {de Boer}, {Anthonius} and {Klungel}, {Olaf H.}}, year = {2011}, month = {11}, date = {2011-11-01}, journal = {Pharmacoepidemiology and Drug Safety}, pages = {1115--1129}, volume = {20}, number = {11}, doi = {10.1002/pds.2188}, url =
{https://onlinelibrary.wiley.com.libproxy.lib.unc.edu/doi/10.1002/pds.2188/abstract}, langid = {en} } @article{ali2014, title = {Propensity score balance measures in pharmacoepidemiology: a simulation study}, author = {{Ali}, {M. Sanni} and {Groenwold}, {Rolf H. H.} and {Pestman}, {Wiebe R.} and {Belitser}, {Svetlana V.} and {Roes}, {Kit C. B.} and {Hoes}, {Arno W.} and {de Boer}, {Anthonius} and {Klungel}, {Olaf H.}}, year = {2014}, month = {08}, date = {2014-08-01}, journal = {Pharmacoepidemiology and Drug Safety}, pages = {802--811}, volume = {23}, number = {8}, doi = {10.1002/pds.3574}, url = {https://onlinelibrary.wiley.com.libproxy.lib.unc.edu/doi/10.1002/pds.3574/abstract}, langid = {en} } @article{stuart2013, title = {Prognostic score-based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research}, author = {{Stuart}, {Elizabeth A.} and {Lee}, {Brian K.} and {Leacy}, {Finbarr P.}}, year = {2013}, month = {08}, date = {2013-08}, journal = {Journal of Clinical Epidemiology}, pages = {S84}, volume = {66}, number = {8}, doi = {10.1016/j.jclinepi.2013.01.013}, langid = {English} } @article{austin2009, title = {Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples}, author = {{Austin}, {Peter C.}}, year = {2009}, month = {09}, date = {2009-09-15}, journal = {Statistics in Medicine}, pages = {3083--3107}, volume = {28}, number = {25}, doi = {10.1002/sim.3697}, url = {https://dx.doi.org/10.1002/sim.3697}, langid = {en} } @article{austin2015, title = {Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies}, author = {{Austin}, {Peter C.} and {Stuart}, {Elizabeth A.}}, year = {2015}, month = {12}, date = {2015-12-10}, journal = {Statistics in Medicine}, pages = {3661--3679}, volume = {34}, number = {28}, doi = 
{10.1002/sim.6607}, url = {https://onlinelibrary.wiley.com.libproxy.lib.unc.edu/doi/10.1002/sim.6607/abstract}, langid = {en} } @article{hansen2008, title = {The prognostic analogue of the propensity score}, author = {{Hansen}, {Ben B.}}, year = {2008}, month = {02}, date = {2008-02-04}, journal = {Biometrika}, pages = {481--488}, volume = {95}, number = {2}, doi = {10.1093/biomet/asn004}, url = {https://academic.oup.com/biomet/article-lookup/doi/10.1093/biomet/asn004}, langid = {en} } @article{rubin2001, title = {Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation}, author = {{Rubin}, {Donald B.}}, year = {2001}, month = {12}, date = {2001-12}, journal = {Health Services and Outcomes Research Methodology}, pages = {169--188}, volume = {2}, number = {3-4}, doi = {10.1023/A:1020363010465}, url = {https://link.springer.com/article/10.1023/A%3A1020363010465}, langid = {en} } @article{franklin2014, title = {Metrics for covariate balance in cohort studies of causal effects}, author = {{Franklin}, {Jessica M.} and {Rassen}, {Jeremy A.} and {Ackermann}, {Diana} and {Bartels}, {Dorothee B.} and {Schneeweiss}, {Sebastian}}, year = {2014}, month = {05}, date = {2014-05-10}, journal = {Statistics in Medicine}, pages = {1685--1699}, volume = {33}, number = {10}, doi = {10.1002/sim.6058}, url = {https://doi.wiley.com/10.1002/sim.6058}, langid = {en} } @article{iacus2011, title = {Multivariate Matching Methods That Are Monotonic Imbalance Bounding}, author = {{Iacus}, {Stefano M.} and {King}, {Gary} and {Porro}, {Giuseppe}}, year = {2011}, month = {03}, date = {2011-03}, journal = {Journal of the American Statistical Association}, pages = {345--361}, volume = {106}, number = {493}, doi = {10.1198/jasa.2011.tm09599}, url = {https://dx.doi.org/10.1198/jasa.2011.tm09599}, langid = {en} } @article{heller2010, title = {Using the Cross-Match Test to Appraise Covariate Balance in Matched Pairs}, author = {{Heller}, {Ruth} and {Rosenbaum}, 
{Paul R.} and {Small}, {Dylan S.}}, year = {2010}, month = {11}, date = {2010-11}, journal = {The American Statistician}, pages = {299--309}, volume = {64}, number = {4}, doi = {10.1198/tast.2010.09210}, langid = {en} } @article{huling2020, title = {Energy Balancing of Covariate Distributions}, author = {{Huling}, {Jared D.} and {Mak}, {Simon}}, year = {2020}, month = {04}, date = {2020-04-29}, journal = {arXiv:2004.13962 [stat]}, url = {https://arxiv.org/abs/2004.13962}, note = {arXiv: 2004.13962} } @article{imai2008, title = {Misunderstandings between Experimentalists and Observationalists about Causal Inference}, author = {{Imai}, {Kosuke} and {King}, {Gary} and {Stuart}, {Elizabeth A.}}, year = {2008}, date = {2008}, journal = {Journal of the Royal Statistical Society. Series A (Statistics in Society)}, pages = {481--502}, volume = {171}, number = {2}, doi = {10.1111/j.1467-985X.2007.00527.x}, } @article{huitfeldt2019, title = {On the collapsibility of measures of effect in the counterfactual causal framework}, author = {{Huitfeldt}, {Anders} and {Stensrud}, {Mats J.} and {Suzuki}, {Etsuji}}, year = {2019}, month = {01}, date = {2019-01-07}, journal = {Emerging Themes in Epidemiology}, pages = {1}, volume = {16}, number = {1}, doi = {10.1186/s12982-018-0083-9}, } @article{mackinnon1985, title = {Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties}, author = {{MacKinnon}, {James G.} and {White}, {Halbert}}, year = {1985}, month = {09}, date = {1985-09}, journal = {Journal of Econometrics}, pages = {305--325}, volume = {29}, number = {3}, doi = {10.1016/0304-4076(85)90158-7}, langid = {en} } @article{king2015, title = {How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It}, author = {{King}, {Gary} and {Roberts}, {Margaret E.}}, year = {2015}, date = {2015}, journal = {Political Analysis}, pages = {159--179}, volume = {23}, number = {2}, doi = {10.1093/pan/mpu015}, 
langid = {en} } @article{wan2019, title = {Matched or unmatched analyses with propensity{-}score{\textendash}matched data?}, author = {{Wan}, {Fei}}, year = {2019}, month = {01}, date = {2019-01-30}, journal = {Statistics in Medicine}, pages = {289--300}, volume = {38}, number = {2}, doi = {10.1002/sim.7976}, langid = {en} } @article{nguyen2017, title = {Double-adjustment in propensity score matching analysis: choosing a threshold for considering residual imbalance}, author = {{Nguyen}, {Tri-Long} and {Collins}, {Gary S.} and {Spence}, {Jessica} and {Daurès}, {Jean-Pierre} and {Devereaux}, {P. J.} and {Landais}, {Paul} and {Le Manach}, {Yannick}}, year = {2017}, date = {2017}, journal = {BMC Medical Research Methodology}, pages = {78}, volume = {17}, doi = {10.1186/s12874-017-0338-0}, } @book{efron1993, title = {An Introduction to the Bootstrap}, author = {{Efron}, {Bradley} and {Tibshirani}, {Robert J.}}, year = {1993}, date = {1993}, publisher = {Springer US}, } @article{austin2014, title = {The use of bootstrapping when using propensity-score matching without replacement: a simulation study}, author = {{Austin}, {Peter C.} and {Small}, {Dylan S.}}, year = {2014}, month = {08}, date = {2014-08-04}, journal = {Statistics in Medicine}, pages = {4306--4319}, volume = {33}, number = {24}, doi = {10.1002/sim.6276}, url = {https://dx.doi.org/10.1002/sim.6276}, langid = {en} } @article{bodory2020, title = {The Finite Sample Performance of Inference Methods for Propensity Score Matching and Weighting Estimators}, author = {{Bodory}, {Hugo} and {Camponovo}, {Lorenzo} and {Huber}, {Martin} and {Lechner}, {Michael}}, year = {2020}, month = {01}, date = {2020-01-02}, journal = {Journal of Business & Economic Statistics}, pages = {183--200}, volume = {38}, number = {1}, doi = {10.1080/07350015.2018.1476247}, langid = {en} } @article{snowden2011, title = {Implementation of G-Computation on a Simulated Data Set: Demonstration of a Causal Inference Technique}, author = 
{{Snowden}, {Jonathan M.} and {Rose}, {Sherri} and {Mortimer}, {Kathleen M.}}, year = {2011}, month = {04}, date = {2011-04-01}, journal = {American Journal of Epidemiology}, pages = {731--738}, volume = {173}, number = {7}, doi = {10.1093/aje/kwq472}, langid = {en} } @article{schafer2008, title = {Average causal effects from nonrandomized studies: A practical guide and simulated example}, author = {{Schafer}, {Joseph L.} and {Kang}, {Joseph}}, year = {2008}, month = {12}, date = {2008-12}, journal = {Psychological Methods}, pages = {279--313}, volume = {13}, number = {4}, doi = {10.1037/a0014268} } @article{pearl2015, title = {Detecting Latent Heterogeneity}, author = {{Pearl}, {Judea}}, year = {2015}, month = {08}, date = {2015-08-27}, journal = {Sociological Methods & Research}, pages = {370--389}, volume = {46}, number = {3}, doi = {10.1177/0049124115600597}, url = {https://dx.doi.org/10.1177/0049124115600597}, langid = {en} } @article{abadie2019, title = {Robust Post-Matching Inference}, author = {{Abadie}, {Alberto} and {Spiess}, {Jann}}, year = {2019}, month = {01}, date = {2019-01}, pages = {34}, langid = {en}, doi = {10.1080/01621459.2020.1840383} } @article{austin2009a, title = {Type I Error Rates, Coverage of Confidence Intervals, and Variance Estimation in Propensity-Score Matched Analyses}, author = {{Austin}, {Peter C.}}, year = {2009}, month = {01}, date = {2009-01-14}, journal = {The International Journal of Biostatistics}, volume = {5}, number = {1}, doi = {10.2202/1557-4679.1146}, url = {https://dx.doi.org/10.2202/1557-4679.1146} } @article{gayat2012, title = {Propensity score applied to survival data analysis through proportional hazards models: a Monte Carlo study}, author = {{Gayat}, {Etienne} and {Resche{-}Rigon}, {Matthieu} and {Mary}, {Jean-Yves} and {Porcher}, {Raphaël}}, year = {2012}, date = {2012}, journal = {Pharmaceutical Statistics}, pages = {222--229}, volume = {11}, number = {3}, doi = {10.1002/pst.537}, url = 
{https://doi.org/10.1002/pst.537}, langid = {en} } @article{austin2012, title = {The performance of different propensity score methods for estimating marginal hazard ratios}, author = {{Austin}, {Peter C.}}, year = {2012}, month = {12}, date = {2012-12-12}, journal = {Statistics in Medicine}, pages = {2837--2849}, volume = {32}, number = {16}, doi = {10.1002/sim.5705}, url = {https://doi.org/10.1002/sim.5705}, langid = {en} } @article{austin2013, title = {The performance of different propensity score methods for estimating marginal hazard ratios}, author = {{Austin}, {Peter C.}}, year = {2013}, month = {07}, date = {2013-07-20}, journal = {Statistics in Medicine}, pages = {2837--2849}, volume = {32}, number = {16}, doi = {10.1002/sim.5705}, url = {https://doi.org/10.1002/sim.5705}, note = {Publisher: John Wiley & Sons, Ltd}, langid = {en} } @article{abadie2008, title = {On the Failure of the Bootstrap for Matching Estimators}, author = {{Abadie}, {Alberto} and {Imbens}, {Guido W.}}, year = {2008}, date = {2008}, journal = {Econometrica}, pages = {1537--1557}, volume = {76}, number = {6}, url = {https://doi.org/10.3982/ECTA6474}, note = {Publisher: [Wiley, Econometric Society]} } @article{hill2006, title = {Interval estimation for treatment effects using propensity score matching}, author = {{Hill}, {Jennifer} and {Reiter}, {Jerome P.}}, year = {2006}, date = {2006}, journal = {Statistics in Medicine}, pages = {2230--2256}, volume = {25}, number = {13}, doi = {10.1002/sim.2277}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.2277}, note = {{\_}eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.2277}, langid = {en} } @article{austin2017, title = {Estimating the effect of treatment on binary outcomes using full matching on the propensity score}, author = {{Austin}, {Peter C.} and {Stuart}, {Elizabeth A.}}, year = {2017}, month = {12}, date = {2017-12}, journal = {Statistical Methods in Medical Research}, pages = {2505--2525}, volume = {26}, number 
= {6}, doi = {10.1177/0962280215601134}, url = {https://journals.sagepub.com/doi/10.1177/0962280215601134}, langid = {en} } @article{mackinnon2006, title = {Bootstrap Methods in Econometrics*}, author = {{MacKinnon}, {James G.}}, year = {2006}, month = {09}, date = {2006-09}, journal = {Economic Record}, pages = {S2--S18}, volume = {82}, number = {s1}, doi = {10.1111/j.1475-4932.2006.00328.x}, url = {https://dx.doi.org/10.1111/j.1475-4932.2006.00328.x}, langid = {en} } @article{carpenter2000, title = {Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians}, author = {{Carpenter}, {James} and {Bithell}, {John}}, year = {2000}, date = {2000}, journal = {Statistics in Medicine}, pages = {1141--1164}, volume = {19}, number = {9}, doi = {10.1002/(SICI)1097-0258(20000515)19:9<1141::AID-SIM479>3.0.CO;2-F}, langid = {en} } @article{westreich2013, title = {The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients}, author = {{Westreich}, {D.} and {Greenland}, {S.}}, year = {2013}, month = {01}, date = {2013-01-30}, journal = {American Journal of Epidemiology}, pages = {292--298}, volume = {177}, number = {4}, doi = {10.1093/aje/kws412}, url = {https://dx.doi.org/10.1093/aje/kws412}, langid = {en} } @article{austin2013a, title = {The use of propensity score methods with survival or time-to-event outcomes: reporting measures of effect similar to those used in randomized experiments}, author = {{Austin}, {Peter C.}}, year = {2013}, month = {09}, date = {2013-09-30}, journal = {Statistics in Medicine}, pages = {1242--1258}, volume = {33}, number = {7}, doi = {10.1002/sim.5984}, url = {https://dx.doi.org/10.1002/sim.5984}, langid = {en} } @article{austin2020, title = {Covariate-adjusted survival analyses in propensity-score matched samples: Imputing potential time-to-event outcomes}, author = {{Austin}, {Peter C.} and {Thomas}, {Neal} and {Rubin}, {Donald B.}}, year = {2020}, month = {03}, date = {2020-03-01}, 
journal = {Statistical Methods in Medical Research}, pages = {728--751}, volume = {29}, number = {3}, doi = {10.1177/0962280218817926}, url = {https://doi.org/10.1177/0962280218817926}, note = {Publisher: SAGE Publications Ltd STM}, langid = {en} } @article{austin2020a, title = {Variance estimation when using propensity{-}score matching with replacement with survival or time{-}to{-}event outcomes}, author = {{Austin}, {Peter C.} and {Cafri}, {Guy}}, year = {2020}, month = {02}, date = {2020-02-28}, journal = {Statistics in Medicine}, pages = {1623--1640}, volume = {39}, number = {11}, doi = {10.1002/sim.8502}, url = {https://dx.doi.org/10.1002/sim.8502}, langid = {en} } @article{austin2015a, title = {The performance of inverse probability of treatment weighting and full matching on the propensity score in the presence of model misspecification when estimating the effect of treatment on survival outcomes}, author = {{Austin}, {Peter C.} and {Stuart}, {Elizabeth A.}}, year = {2015}, month = {04}, date = {2015-04-30}, journal = {Statistical Methods in Medical Research}, pages = {1654--1670}, volume = {26}, number = {4}, doi = {10.1177/0962280215584401}, url = {https://dx.doi.org/10.1177/0962280215584401}, langid = {en} } @article{austin2015b, title = {Optimal full matching for survival outcomes: a method that merits more widespread use}, author = {{Austin}, {Peter C.} and {Stuart}, {Elizabeth A.}}, year = {2015}, month = {08}, date = {2015-08-06}, journal = {Statistics in Medicine}, pages = {3949--3967}, volume = {34}, number = {30}, doi = {10.1002/sim.6602}, url = {https://dx.doi.org/10.1002/sim.6602}, langid = {en} } @article{hong2010, title = {Marginal mean weighting through stratification: Adjustment for selection bias in multilevel data}, author = {{Hong}, {Guanglei}}, year = {2010}, month = {10}, date = {2010-10}, journal = {Journal of Educational and Behavioral Statistics}, pages = {499--531}, volume = {35}, number = {5}, doi = {10.3102/1076998609359785}, url = 
{https://journals.sagepub.com/doi/10.3102/1076998609359785}, langid = {en} } @article{rudolph2016, title = {Optimally combining propensity score subclasses}, author = {{Rudolph}, {Kara E.} and {Colson}, {K. Ellicott} and {Stuart}, {Elizabeth A.} and {Ahern}, {Jennifer}}, year = {2016}, date = {2016}, journal = {Statistics in Medicine}, pages = {4937--4947}, volume = {35}, number = {27}, doi = {10.1002/sim.7046}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.7046}, note = {{\_}eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.7046}, langid = {en} } @article{stampf2010, title = {Estimators and confidence intervals for the marginal odds ratio using logistic regression and propensity score stratification}, author = {{Stampf}, {Susanne} and {Graf}, {Erika} and {Schmoor}, {Claudia} and {Schumacher}, {Martin}}, year = {2010}, date = {2010}, journal = {Statistics in Medicine}, pages = {760--769}, volume = {29}, number = {7-8}, doi = {10.1002/sim.3811}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.3811}, note = {{\_}eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.3811}, langid = {en} } @article{cameron2015, title = {A Practitioner{\textquoteright}s Guide to Cluster-Robust Inference}, author = {{Cameron}, {A. 
Colin} and {Miller}, {Douglas L.}}, year = {2015}, month = {03}, date = {2015-03-31}, journal = {Journal of Human Resources}, pages = {317--372}, volume = {50}, number = {2}, doi = {10.3368/jhr.50.2.317}, url = {https://jhr.uwpress.org/lookup/doi/10.3368/jhr.50.2.317}, note = {Publisher: University of Wisconsin Press}, langid = {en} } @article{desai2017, title = {A Propensity-score-based Fine Stratification Approach for Confounding Adjustment When Exposure Is Infrequent}, author = {{Desai}, {Rishi J.} and {Rothman}, {Kenneth J.} and {Bateman}, {Brian T.} and {Hernandez-Diaz}, {Sonia} and {Huybrechts}, {Krista F.}}, year = {2017}, month = {03}, date = {2017-03}, journal = {Epidemiology}, pages = {249--257}, volume = {28}, number = {2}, doi = {10.1097/EDE.0000000000000595}, url = {https://Insights.ovid.com/crossref?an=00001648-201703000-00014}, langid = {en} } @article{stuart2008, title = {Developing practical recommendations for the use of propensity scores: Discussion of {\textquoteleft}A critical appraisal of propensity score matching in the medical literature between 1996 and 2003{\textquoteright} by Peter Austin, Statistics in Medicine}, author = {{Stuart}, {Elizabeth A.}}, year = {2008}, date = {2008}, journal = {Statistics in Medicine}, pages = {2062--2065}, volume = {27}, number = {12}, doi = {10.1002/sim.3207}, url = {https://dx.doi.org/10.1002/sim.3207}, langid = {en} } @article{austin2014a, title = {The use of bootstrapping when using propensity{-}score matching without replacement: a simulation study}, author = {{Austin}, {Peter C.} and {Small}, {Dylan S.}}, year = {2014}, month = {08}, date = {2014-08-04}, journal = {Statistics in Medicine}, pages = {4306--4319}, volume = {33}, number = {24}, doi = {10.1002/sim.6276}, url = {https://dx.doi.org/10.1002/sim.6276}, langid = {en} } @article{thoemmes2011, title = {A Systematic Review of Propensity Score Methods in the Social Sciences}, author = {{Thoemmes}, {Felix J.} and {Kim}, {Eun Sook}}, year = {2011}, 
month = {02}, date = {2011-02-07}, journal = {Multivariate Behavioral Research}, pages = {90--118}, volume = {46}, number = {1}, doi = {10.1080/00273171.2011.540475}, url = {https://dx.doi.org/10.1080/00273171.2011.540475}, langid = {en} } @article{zakrison2018, title = {A systematic review of propensity score methods in the acute care surgery literature: avoiding the pitfalls and proposing a set of reporting guidelines}, author = {{Zakrison}, {T. L.} and {Austin}, {Peter C.} and {McCredie}, {V. A.}}, year = {2018}, month = {06}, date = {2018-06-01}, journal = {European Journal of Trauma and Emergency Surgery}, pages = {385--395}, volume = {44}, number = {3}, doi = {10.1007/s00068-017-0786-6}, url = {https://doi.org/10.1007/s00068-017-0786-6}, langid = {en} } @article{stuart2010, title = {Matching Methods for Causal Inference: A Review and a Look Forward}, author = {{Stuart}, {Elizabeth A.}}, year = {2010}, month = {02}, date = {2010-02}, journal = {Statistical Science}, pages = {1--21}, volume = {25}, number = {1}, doi = {10.1214/09-STS313}, url = {https://projecteuclid.org/euclid.ss/1280841730}, langid = {en} } @article{austin2013b, title = {A comparison of 12 algorithms for matching on the propensity score}, author = {{Austin}, {Peter C.}}, year = {2013}, month = {10}, date = {2013-10-07}, journal = {Statistics in Medicine}, pages = {1057--1069}, volume = {33}, number = {6}, doi = {10.1002/sim.6004}, url = {https://dx.doi.org/10.1002/sim.6004}, langid = {en} } @article{rubin1973, title = {Matching to Remove Bias in Observational Studies}, author = {{Rubin}, {Donald B.}}, year = {1973}, month = {03}, date = {1973-03}, journal = {Biometrics}, pages = {159}, volume = {29}, number = {1}, doi = {10.2307/2529684}, url = {https://dx.doi.org/10.2307/2529684} } @article{hansen2006, title = {Optimal Full Matching and Related Designs via Network Flows}, author = {{Hansen}, {Ben B.} and {Klopfer}, {Stephanie O.}}, year = {2006}, month = {09}, date = {2006-09}, journal = 
{Journal of Computational and Graphical Statistics}, pages = {609--627}, volume = {15}, number = {3}, doi = {10.1198/106186006X137047}, url = {https://www.tandfonline.com/doi/abs/10.1198/106186006X137047}, langid = {en} } @article{gu1993, title = {Comparison of Multivariate Matching Methods: Structures, Distances, and Algorithms}, author = {{Gu}, {Xing Sam} and {Rosenbaum}, {Paul R.}}, year = {1993}, month = {12}, date = {1993-12}, journal = {Journal of Computational and Graphical Statistics}, pages = {405}, volume = {2}, number = {4}, doi = {10.2307/1390693}, url = {https://www.jstor.org/stable/1390693?origin=crossref}, langid = {en} } @article{hansen2004, title = {Full Matching in an Observational Study of Coaching for the SAT}, author = {{Hansen}, {Ben B.}}, year = {2004}, month = {09}, date = {2004-09}, journal = {Journal of the American Statistical Association}, pages = {609--618}, volume = {99}, number = {467}, doi = {10.1198/016214504000000647}, url = {https://www.tandfonline.com/doi/abs/10.1198/016214504000000647}, langid = {en} } @article{stuart2008a, title = {Using full matching to estimate causal effects in nonexperimental studies: Examining the relationship between adolescent marijuana use and adult outcomes.}, author = {{Stuart}, {Elizabeth A.} and {Green}, {Kerry M.}}, year = {2008}, month = {03}, date = {2008-03}, journal = {Developmental Psychology}, pages = {395--406}, volume = {44}, number = {2}, doi = {10.1037/0012-1649.44.2.395}, url = {https://dx.doi.org/10.1037/0012-1649.44.2.395}, langid = {en} } @article{sekhon2011, title = {Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R}, author = {{Sekhon}, {Jasjeet S.}}, year = {2011}, month = {06}, date = {2011-06-14}, journal = {Journal of Statistical Software}, pages = {1--52}, volume = {42}, number = {1}, doi = {10.18637/jss.v042.i07}, url = {https://www.jstatsoft.org/index.php/jss/article/view/v042i07}, note = {Number: 1}, langid = 
{en} } @article{iacus2012, title = {Causal Inference without Balance Checking: Coarsened Exact Matching}, author = {{Iacus}, {Stefano M.} and {King}, {Gary} and {Porro}, {Giuseppe}}, year = {2012}, date = {2012}, journal = {Political Analysis}, pages = {1--24}, volume = {20}, number = {1}, doi = {10.1093/pan/mpr013}, url = {https://www.cambridge.org/core/product/identifier/S1047198700012985/type/journal_article}, langid = {en} } @article{iacus2009, title = {cem: Software for Coarsened Exact Matching}, author = {{Iacus}, {Stefano M.} and {King}, {Gary} and {Porro}, {Giuseppe}}, year = {2009}, month = {06}, date = {2009-06-25}, journal = {Journal of Statistical Software}, pages = {1--27}, volume = {30}, number = {1}, doi = {10.18637/jss.v030.i09}, url = {https://www.jstatsoft.org/index.php/jss/article/view/v030i09}, note = {Number: 1}, langid = {en} } @article{austin2010, title = {The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies}, author = {{Austin}, {Peter C.}}, year = {2010}, month = {01}, date = {2010-01-27}, journal = {Statistics in Medicine}, pages = {2137--2148}, volume = {29}, number = {20}, doi = {10.1002/sim.3854}, url = {https://dx.doi.org/10.1002/sim.3854}, langid = {en} } @article{orihara2021, title = {Determination of the optimal number of strata for propensity score subclassification}, author = {{Orihara}, {Shunichiro} and {Hamada}, {Etsuo}}, year = {2021}, month = {01}, date = {2021-01-01}, journal = {Statistics & Probability Letters}, pages = {108951}, volume = {168}, doi = {10.1016/j.spl.2020.108951}, url = {https://www.sciencedirect.com/science/article/pii/S0167715220302546}, langid = {en} } @article{austin2011a, title = {Optimal caliper widths for propensity{-}score matching when estimating differences in means and differences in proportions in observational studies}, author = {{Austin}, {Peter C.}}, year = {2011}, month = {03}, 
date = {2011-03}, journal = {Pharmaceutical Statistics}, pages = {150--161}, volume = {10}, number = {2}, doi = {10.1002/pst.433}, url = {https://dx.doi.org/10.1002/pst.433}, langid = {en} } @article{king2019, title = {Why Propensity Scores Should Not Be Used for Matching}, author = {{King}, {Gary} and {Nielsen}, {Richard}}, year = {2019}, month = {05}, date = {2019-05-07}, journal = {Political Analysis}, pages = {1--20}, doi = {10.1017/pan.2019.11}, url = {https://www.cambridge.org/core/product/identifier/S1047198719000111/type/journal_article}, langid = {en} } @article{austin2010a, title = {Statistical Criteria for Selecting the Optimal Number of Untreated Subjects Matched to Each Treated Subject When Using Many-to-One Matching on the Propensity Score}, author = {{Austin}, {Peter C.}}, year = {2010}, month = {08}, date = {2010-08-28}, journal = {American Journal of Epidemiology}, pages = {1092--1097}, volume = {172}, number = {9}, doi = {10.1093/aje/kwq224}, url = {https://dx.doi.org/10.1093/aje/kwq224}, langid = {en} } @article{austin2011b, title = {An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies}, author = {{Austin}, {Peter C.}}, year = {2011}, month = {05}, date = {2011-05-31}, journal = {Multivariate Behavioral Research}, pages = {399--424}, volume = {46}, number = {3}, doi = {10.1080/00273171.2011.568786}, url = {https://dx.doi.org/10.1080/00273171.2011.568786}, langid = {en} } @article{abadie2006, title = {Large Sample Properties of Matching Estimators for Average Treatment Effects}, author = {{Abadie}, {Alberto} and {Imbens}, {Guido W.}}, year = {2006}, month = {01}, date = {2006-01}, journal = {Econometrica}, pages = {235--267}, volume = {74}, number = {1}, doi = {10.1111/j.1468-0262.2006.00655.x}, url = {https://doi.wiley.com/10.1111/j.1468-0262.2006.00655.x}, langid = {en} } @article{abadie2016, title = {Matching on the Estimated Propensity Score}, author = {{Abadie}, {Alberto} and {Imbens}, 
{Guido W.}}, year = {2016}, date = {2016}, journal = {Econometrica}, pages = {781--807}, volume = {84}, number = {2}, doi = {10.3982/ECTA11293}, url = {https://www.econometricsociety.org/doi/10.3982/ECTA11293}, langid = {en} } @article{liang1986, title = {Longitudinal data analysis using generalized linear models}, author = {{Liang}, {Kung-Yee} and {Zeger}, {Scott L.}}, year = {1986}, date = {1986}, journal = {Biometrika}, pages = {13--22}, volume = {73}, number = {1}, doi = {10.1093/biomet/73.1.13}, url = {https://academic.oup.com/biomet/article-lookup/doi/10.1093/biomet/73.1.13}, langid = {en} } @article{austin2007a, title = {The performance of different propensity score methods for estimating marginal odds ratios}, author = {{Austin}, {Peter C.}}, year = {2007}, date = {2007}, journal = {Statistics in Medicine}, pages = {3078--3094}, volume = {26}, number = {16}, doi = {10.1002/sim.2781}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.2781}, note = {{\_}eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.2781}, langid = {en} } @article{austin2009b, title = {Type I Error Rates, Coverage of Confidence Intervals, and Variance Estimation in Propensity-Score Matched Analyses}, author = {{Austin}, {Peter C.}}, year = {2009}, month = {01}, date = {2009-01-14}, journal = {The International Journal of Biostatistics}, volume = {5}, number = {1}, doi = {10.2202/1557-4679.1146}, url = {https://dx.doi.org/10.2202/1557-4679.1146} } @article{austin2008, title = {The performance of different propensity-score methods for estimating relative risks}, author = {{Austin}, {Peter C.}}, year = {2008}, month = {06}, date = {2008-06}, journal = {Journal of Clinical Epidemiology}, pages = {537--545}, volume = {61}, number = {6}, doi = {10.1016/j.jclinepi.2007.07.011}, url = {https://dx.doi.org/10.1016/j.jclinepi.2007.07.011}, langid = {en} } @article{austin2010b, title = {The performance of different propensity-score methods for estimating differences in proportions 
(risk differences or absolute risk reductions) in observational studies}, author = {{Austin}, {Peter C.}}, year = {2010}, month = {01}, date = {2010-01-27}, journal = {Statistics in Medicine}, pages = {2137--2148}, volume = {29}, number = {20}, doi = {10.1002/sim.3854}, url = {https://dx.doi.org/10.1002/sim.3854}, langid = {en} } @article{vanderweele2019, title = {Principles of confounder selection}, author = {{VanderWeele}, {Tyler J.}}, year = {2019}, month = {03}, date = {2019-03}, journal = {European Journal of Epidemiology}, pages = {211--219}, volume = {34}, number = {3}, doi = {10.1007/s10654-019-00494-6}, url = {https://dx.doi.org/10.1007/s10654-019-00494-6}, langid = {en} } @article{rosenbaum1983, title = {The central role of the propensity score in observational studies for causal effects}, author = {{Rosenbaum}, {Paul R.} and {Rubin}, {Donald B.}}, year = {1983}, month = {04}, date = {1983-04-01}, journal = {Biometrika}, pages = {41--55}, volume = {70}, number = {1}, doi = {10.1093/biomet/70.1.41}, url = {https://biomet.oxfordjournals.org/content/70/1/41}, langid = {en} } @article{rubin1973a, title = {Matching to Remove Bias in Observational Studies}, author = {{Rubin}, {Donald B.}}, year = {1973}, month = {03}, date = {1973-03}, journal = {Biometrics}, pages = {159}, volume = {29}, number = {1}, doi = {10.2307/2529684}, url = {https://dx.doi.org/10.2307/2529684} } @article{cochran1973, title = {Controlling Bias in Observational Studies: A Review}, author = {{Cochran}, {William G.} and {Rubin}, {Donald B.}}, year = {1973}, date = {1973}, journal = {Sankhy{\={a}}: The Indian Journal of Statistics, Series A (1961-2002)}, pages = {417--446}, volume = {35}, number = {4}, url = {https://www.jstor.org/stable/25049893} } @article{hansen2008a, title = {The prognostic analogue of the propensity score}, author = {{Hansen}, {Ben B.}}, year = {2008}, month = {02}, date = {2008-02-04}, journal = {Biometrika}, pages = {481--488}, volume = {95}, number = {2}, doi = 
{10.1093/biomet/asn004}, url = {https://academic.oup.com/biomet/article-lookup/doi/10.1093/biomet/asn004}, langid = {en} } @article{mao2018, title = {Propensity score weighting analysis and treatment effect discovery}, author = {{Mao}, {Huzhang} and {Li}, {Liang} and {Greene}, {Tom}}, year = {2018}, month = {06}, date = {2018-06-19}, journal = {Statistical Methods in Medical Research}, pages = {096228021878117}, doi = {10.1177/0962280218781171}, url = {https://journals.sagepub.com/doi/10.1177/0962280218781171}, langid = {en} } @article{rosenbaum1985, title = {The Bias Due to Incomplete Matching}, author = {{Rosenbaum}, {Paul R.} and {Rubin}, {Donald B.}}, year = {1985}, date = {1985}, journal = {Biometrics}, pages = {103--116}, volume = {41}, number = {1}, doi = {10.2307/2530647}, url = {https://www.jstor.org/stable/2530647} } @article{wang2020, title = {To use or not to use propensity score matching?}, author = {{Wang}, {Jixian}}, year = {2020}, month = {08}, date = {2020-08-10}, journal = {Pharmaceutical Statistics}, doi = {10.1002/pst.2051}, url = {https://dx.doi.org/10.1002/pst.2051}, langid = {en} } @article{austin2015c, title = {Estimating the effect of treatment on binary outcomes using full matching on the propensity score}, author = {{Austin}, {Peter C.} and {Stuart}, {Elizabeth A.}}, year = {2015}, month = {09}, date = {2015-09}, journal = {Statistical Methods in Medical Research}, pages = {2505--2525}, volume = {26}, number = {6}, doi = {10.1177/0962280215601134}, url = {https://dx.doi.org/10.1177/0962280215601134}, langid = {en} } @article{austin2009c, title = {The Relative Ability of Different Propensity Score Methods to Balance Measured Covariates Between Treated and Untreated Subjects in Observational Studies}, author = {{Austin}, {Peter C.}}, year = {2009}, month = {08}, date = {2009-08-14}, journal = {Medical Decision Making}, pages = {661--677}, volume = {29}, number = {6}, doi = {10.1177/0272989x09341755}, url = 
{https://dx.doi.org/10.1177/0272989X09341755}, langid = {en} } @article{rosenbaum1985a, title = {Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score}, author = {{Rosenbaum}, {Paul R.} and {Rubin}, {Donald B.}}, year = {1985}, month = {02}, date = {1985-02}, journal = {The American Statistician}, pages = {33}, volume = {39}, number = {1}, doi = {10.2307/2683903}, url = {https://dx.doi.org/10.2307/2683903} } @article{forbes2008, title = {Inverse probability weighted estimation of the marginal odds ratio: Correspondence regarding {\textquoteleft}The performance of different propensity score methods for estimating marginal odds ratios{\textquoteright} by P. Austin, Statistics in Medicine, 2007; 26:3078{\textendash}3094}, author = {{Forbes}, {Andrew} and {Shortreed}, {Susan}}, year = {2008}, date = {2008}, journal = {Statistics in Medicine}, pages = {5556--5559}, volume = {27}, number = {26}, doi = {10.1002/sim.3362}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.3362}, note = {{\_}eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.3362}, langid = {en} } @article{dugoff2014, title = {Generalizing Observational Study Results: Applying Propensity Score Methods to Complex Surveys}, author = {{DuGoff}, {Eva H.} and {Schuler}, {Megan} and {Stuart}, {Elizabeth A.}}, year = {2014}, month = {02}, date = {2014-02}, journal = {Health Services Research}, pages = {284--303}, volume = {49}, number = {1}, doi = {10.1111/1475-6773.12090}, url = {https://doi.wiley.com/10.1111/1475-6773.12090}, langid = {en} } @article{austin2016, title = {Propensity score matching and complex surveys}, author = {{Austin}, {Peter C.} and {Jembere}, {Nathaniel} and {Chiu}, {Maria}}, year = {2016}, month = {07}, date = {2016-07-26}, journal = {Statistical Methods in Medical Research}, pages = {1240--1257}, volume = {27}, number = {4}, doi = {10.1177/0962280216658920}, url = {https://dx.doi.org/10.1177/0962280216658920}, 
langid = {en} } @article{lenis2019, title = {It{\textquoteright}s all about balance: propensity score matching in the context of complex survey data}, author = {{Lenis}, {David} and {Nguyen}, {Trang Quynh} and {Dong}, {Nianbo} and {Stuart}, {Elizabeth A.}}, year = {2019}, month = {01}, date = {2019-01-01}, journal = {Biostatistics}, pages = {147--163}, volume = {20}, number = {1}, doi = {10.1093/biostatistics/kxx063}, url = {https://academic.oup.com/biostatistics/article/20/1/147/4780267}, langid = {en} } @article{rosenbaum2020, title = {Modern Algorithms for Matching in Observational Studies}, author = {{Rosenbaum}, {Paul R.}}, year = {2020}, date = {2020}, journal = {Annual Review of Statistics and Its Application}, pages = {143--176}, volume = {7}, number = {1}, doi = {10.1146/annurev-statistics-031219-041058}, url = {https://doi.org/10.1146/annurev-statistics-031219-041058}, note = {{\_}eprint: https://doi.org/10.1146/annurev-statistics-031219-041058} } @article{ming2000, title = {Substantial Gains in Bias Reduction from Matching with a Variable Number of Controls}, author = {{Ming}, {Kewei} and {Rosenbaum}, {Paul R.}}, year = {2000}, month = {03}, date = {2000-03}, journal = {Biometrics}, pages = {118--124}, volume = {56}, number = {1}, doi = {10.1111/j.0006-341X.2000.00118.x}, url = {https://doi.wiley.com/10.1111/j.0006-341X.2000.00118.x}, langid = {en} } @article{greenExaminingModerationAnalyses2014, title = {Examining Moderation Analyses in Propensity Score Methods: {{Application}} to Depression and Substance Use}, shorttitle = {Examining Moderation Analyses in Propensity Score Methods}, author = {Green, Kerry M. 
and Stuart, Elizabeth A.}, year = {2014}, month = oct, volume = {82}, pages = {773--783}, issn = {0022-006X}, doi = {10.1037/a0036515}, abstract = {Objective: This study provides guidance on how propensity score methods can be combined with moderation analyses (i.e., effect modification) to examine subgroup differences in potential causal effects in nonexperimental studies. As a motivating example, we focus on how depression may affect subsequent substance use differently for men and women. Method: Using data from a longitudinal community cohort study (N = 952) of urban African Americans with assessments in childhood, adolescence, young adulthood, and midlife, we estimate the influence of depression by young adulthood on substance use outcomes in midlife, and whether that influence varies by gender. We illustrate and compare 5 different techniques for estimating subgroup effects using propensity score methods, including separate propensity score models and matching for men and women, a joint propensity score model for men and women with matching separately and together by gender, and a joint male/female propensity score model that includes theoretically important gender interactions with matching separately and together by gender. Results: Analyses showed that estimating separate models for men and women yielded the best balance and, therefore, is a preferred technique when subgroup analyses are of interest, at least in this data. Results also showed substance use consequences of depression but no significant gender differences. Conclusions: It is critical to prespecify subgroup effects before the estimation of propensity scores and to check balance within subgroups regardless of the type of propensity score model used. Results also suggest that depression may affect multiple substance use outcomes in midlife for both men and women relatively equally. (PsycINFO Database Record (c) 2016 APA, all rights reserved). 
(journal abstract)}, file = {/Users/NoahGreifer/Zotero/storage/FUTHAJER/Green and Stuart - 2014 - Examining moderation analyses in propensity score .pdf}, journal = {Journal of Consulting and Clinical Psychology}, keywords = {Blacks,causal effects,causality,depression,Drug Abuse,effect modification,Gender differences,Human Sex Differences,Major Depression,nonexperimental study,Observation Methods,observational data,substance abuse}, number = {5}, series = {Advances in {{Data Analytic Methods}}} } @article{kreifMethodsEstimatingSubgroup2012, title = {Methods for Estimating Subgroup Effects in Cost-Effectiveness Analyses That Use Observational Data}, author = {Kreif, Noemi and Grieve, Richard and Radice, Rosalba and Sadique, Zia and Ramsahai, Roland and Sekhon, Jasjeet S.}, year = {2012}, month = nov, volume = {32}, pages = {750--763}, issn = {0272-989X}, doi = {10.1177/0272989X12448929}, abstract = {Decision makers require cost-effectiveness estimates for patient subgroups. In nonrandomized studies, propensity score (PS) matching and inverse probability of treatment weighting (IPTW) can address overt selection bias, but only if they balance observed covariates between treatment groups. Genetic matching (GM) matches on the PS and individual covariates using an automated search algorithm to directly balance baseline covariates. This article compares these methods for estimating subgroup effects in cost-effectiveness analyses (CEA). The motivating case study is a CEA of a pharmaceutical intervention, drotrecogin alfa (DrotAA), for patient subgroups with severe sepsis (n = 2726). Here, GM reported better covariate balance than PS matching and IPTW. For the subgroup at a high level of baseline risk, the probability that DrotAA was cost-effective ranged from 30\% (IPTW) to 90\% (PS matching and GM), at a threshold of \textsterling 20 000 per quality-adjusted life-year. 
We then compared the methods in a simulation study, in which initially the PS was correctly specified and then misspecified, for example, by ignoring the subgroup-specific treatment assignment. Relative performance was assessed as bias and root mean squared error (RMSE) in the estimated incremental net benefits. When the PS was correctly specified and inverse probability weights were stable, each method performed well; IPTW reported the lowest RMSE. When the subgroup-specific treatment assignment was ignored, PS matching and IPTW reported covariate imbalance and bias; GM reported better balance, less bias, and more precise estimates. We conclude that if the PS is correctly specified and the weights for IPTW are stable, each method can provide unbiased cost-effectiveness estimates. However, unlike IPTW and PS matching, GM is relatively robust to PS misspecification.}, file = {/Users/NoahGreifer/Zotero/storage/2DX775MH/Web Appendix.pdf;/Users/NoahGreifer/Zotero/storage/IUAWWYQP/Kreif et al. - 2012 - Methods for Estimating Subgroup Effects in Cost-Ef.pdf}, journal = {Medical Decision Making}, language = {en}, number = {6} } @article{wangRelativePerformancePropensity2018, title = {Relative {{Performance}} of {{Propensity Score Matching Strategies}} for {{Subgroup Analyses}}}, author = {Wang, Shirley V. and Jin, Yinzhu and Fireman, Bruce and Gruber, Susan and He, Mengdong and Wyss, Richard and Shin, HoJin and Ma, Yong and Keeton, Stephine and Karami, Sara and Major, Jacqueline M. and Schneeweiss, Sebastian and Gagne, Joshua J.}, year = {2018}, month = aug, volume = {187}, pages = {1799--1807}, issn = {0002-9262}, doi = {10.1093/aje/kwy049}, abstract = {Abstract. Postapproval drug safety studies often use propensity scores (PSs) to adjust for a large number of baseline confounders. These studies may involve ex}, file = {/Users/NoahGreifer/Zotero/storage/PFIYDGIG/Wang et al. 
- 2018 - Relative Performance of Propensity Score Matching .pdf}, journal = {American Journal of Epidemiology}, language = {en}, number = {8} } @article{zubizarretaMatchingBalancePairing2014, title = {Matching for Balance, Pairing for Heterogeneity in an Observational Study of the Effectiveness of for-Profit and Not-for-Profit High Schools in {{Chile}}}, author = {Zubizarreta, Jos{\'e} R. and Paredes, Ricardo D. and Rosenbaum, Paul R.}, year = {2014}, month = mar, date = {2014-03}, volume = {8}, pages = {204--231}, issn = {1932-6157}, doi = {10.1214/13-AOAS713}, file = {/Users/NoahGreifer/Zotero/storage/JACNSVBA/Zubizarreta et al. - 2014 - Matching for balance, pairing for heterogeneity in.pdf}, journal = {The Annals of Applied Statistics}, url = {https://projecteuclid.org/euclid.aoas/1396966284}, language = {en}, number = {1} } @article{bennettBuildingRepresentativeMatched2020, title = {Building {{Representative Matched Samples With Multi}}-{{Valued Treatments}} in {{Large Observational Studies}}}, author = {Bennett, Magdalena and Vielma, Juan Pablo and Zubizarreta, Jos{\'e} R.}, year = {2020}, month = may, pages = {1--29}, issn = {1061-8600, 1537-2715}, doi = {10.1080/10618600.2020.1753532}, abstract = {In this article, we present a new way of matching in observational studies that overcomes three limitations of existing matching approaches. First, it directly balances covariates with multi-valued treatments without explicitly estimating the generalized propensity score. Second, it builds self-weighted matched samples that are representative of a target population by design. Third, it can handle large datasets, with hundreds of thousands of observations, in a couple of minutes. The key insights of this new approach to matching are balancing the treatment groups relative to a target population and positing a linear-sized mixed integer formulation of the matching problem. 
We formally show that this formulation is more effective than alternative quadratic-sized formulations, as its reduction in size does not affect its strength from the standpoint of its linear programming relaxation. We also show that this formulation can be used for matching with distributional covariate balance in polynomial time under certain assumptions on the covariates and that it can handle large datasets in practice even when the assumptions are not satisfied. This algorithmic characterization is key to handling large datasets. We illustrate this new approach to matching in both a simulation study and an observational study of the impact of an earthquake on educational attainment. With this approach, the results after matching can be visualized with simple and transparent graphical displays: while increasing levels of exposure to the earthquake have a negative impact on school attendance, there is no effect on college admission test scores. Supplementary materials for this article are available online.}, file = {/Users/NoahGreifer/Zotero/storage/FPDB8NKK/Bennett et al. - 2020 - Building Representative Matched Samples With Multi.pdf}, journal = {Journal of Computational and Graphical Statistics}, language = {en} } @article{delosangelesresaDirectStableWeight2020, title = {Direct and Stable Weight Adjustment in Non-Experimental Studies with Multivalued Treatments: Analysis of the Effect of an Earthquake on Post-Traumatic Stress}, author = {{de los Angeles Resa}, Mar{\'i}a and Zubizarreta, Jos{\'e} R.}, year = {2020}, month = apr, volume = {n/a}, publisher = {{John Wiley \& Sons, Ltd}}, issn = {0964-1998}, doi = {10.1111/rssa.12561}, abstract = {Summary In February 2010, a massive earthquake struck Chile, causing devastation in certain parts of the country, affecting other areas, and leaving territories untouched. 
2 months after the earthquake, Chile's Ministry of Social Development reinterviewed a representative subsample of its National Socioeconomic Characterization Survey, which had been completed 2 months before the earthquake, thereby creating a prospective longitudinal survey with detailed information of the same individuals before and after the earthquake. We use a new weighting method for non-experimental studies with multivalued treatments to estimate the effect of levels of exposure to the earthquake on post-traumatic stress. Unlike common weighting approaches for multivalued treatments, this new method does not require explicit modelling of the generalized propensity score and instead focuses on directly balancing the covariates across the multivalued treatments with weights that have minimum variance. As a result, the weighting estimator is stable and approximately unbiased. Furthermore, the weights are constrained to avoid model extrapolation. We illustrate this new method in a simulation study, with both categorical and continuous treatments. The results show that directly targeting balance instead of explicitly modelling the treatment assignment probabilities tends to provide the best results in terms of bias and root-mean-square error. Using this method, we estimate the effect of the intensity of the earthquake on post-traumatic stress. 
We implement this method in the new package msbw for R.}, journal = {Journal of the Royal Statistical Society: Series A (Statistics in Society)}, keywords = {Causal inference,Inverse probability weights,Observational studies,Propensity score}, number = {n/a} } @article{visconti2018, title = {Handling Limited Overlap in Observational Studies with Cardinality Matching}, author = {{Visconti}, {Giancarlo} and {Zubizarreta}, {Jos{\'e} R.}}, year = {2018}, date = {2018}, journal = {Observational Studies}, pages = {217--249}, volume = {4}, number = {1}, doi = {10.1353/obs.2018.0012}, url = {https://doi.org/10.1353/obs.2018.0012}, langid = {en} } @article{greiferChoosingEstimandWhen2021, title = {Choosing the {{Estimand When Matching}} or {{Weighting}} in {{Observational Studies}}}, author = {Greifer, Noah and Stuart, Elizabeth A.}, year = {2021}, month = jun, journal = {arXiv:2106.10577 [stat]}, eprint = {2106.10577}, eprinttype = {arxiv}, primaryclass = {stat}, abstract = {Matching and weighting methods for observational studies require the choice of an estimand, the causal effect with reference to a specific target population. Commonly used estimands include the average treatment effect in the treated (ATT), the average treatment effect in the untreated (ATU), the average treatment effect in the population (ATE), and the average treatment effect in the overlap (i.e., equipoise population; ATO). Each estimand has its own assumptions, interpretation, and statistical methods that can be used to estimate it. This article provides guidance on selecting and interpreting an estimand to help medical researchers correctly implement statistical methods used to estimate causal effects in observational studies and to help audiences correctly interpret the results and limitations of these studies.
The interpretations of the estimands resulting from regression and instrumental variable analyses are also discussed. Choosing an estimand carefully is essential for making valid inferences from the analysis of observational data and ensuring results are replicable and useful for practitioners.}, archiveprefix = {arXiv}, url = {https://arxiv.org/abs/2106.10577}, keywords = {Statistics - Methodology}, } @book{rosenbaumDesignObservationalStudies2010, title = {Design of Observational Studies}, author = {Rosenbaum, Paul R.}, year = {2010}, series = {Springer Series in Statistics}, publisher = {{Springer}}, address = {{New York}}, isbn = {978-1-4419-1212-1 978-1-4419-1213-8}, langid = {english}, lccn = {QA279 .R669 2010}, keywords = {Analysis of variance,Beobachtungsstudie,Experimental design}, annotation = {OCLC: ocn444428720} } @article{rubinBiasReductionUsing1980, ids = {rubinBiasReductionUsing1980a,rubinBiasReductionUsing1980b}, title = {Bias {{Reduction Using Mahalanobis-Metric Matching}}}, author = {Rubin, Donald B.}, year = {1980}, journal = {Biometrics}, volume = {36}, number = {2}, pages = {293--298}, publisher = {{[Wiley, International Biometric Society]}}, issn = {0006-341X}, doi = {10.2307/2529981}, abstract = {Monte Carlo methods are used to study the ability of nearest-available, Mahalanobis-metric matching to make the means of matching variables more similar in matched samples than in random samples.} } @article{ripolloneImplicationsPropensityScore2018, title = {Implications of the {{Propensity Score Matching Paradox}} in {{Pharmacoepidemiology}}}, author = {Ripollone, John E. and Huybrechts, Krista F. and Rothman, Kenneth J. and Ferguson, Ryan E.
and Franklin, Jessica M.}, year = {2018}, month = sep, journal = {American Journal of Epidemiology}, volume = {187}, number = {9}, pages = {1951--1961}, issn = {0002-9262}, doi = {10.1093/aje/kwy078}, abstract = {Abstract. Recent work has demonstrated that propensity score matching may lead to increased covariate imbalance, even with the corresponding decrease in propen}, langid = {english} } @article{savjeGeneralizedFullMatching2021, title = {Generalized Full Matching}, author = {{Sävje}, Fredrik and Higgins, Michael J. and Sekhon, Jasjeet S.}, year = {2021}, month = {10}, date = {2021-10}, journal = {Political Analysis}, pages = {423--447}, volume = {29}, number = {4}, doi = {10.1017/pan.2020.32}, url = {http://www.cambridge.org/core/journals/political-analysis/article/generalized-full-matching/3DA71D8BEDA6F02B5D36457E114C79B6}, note = {Publisher: Cambridge University Press}, langid = {en} } @book{savjeQuickmatchQuickGeneralized2018, title = {quickmatch: Quick generalized full matching}, author = {Sävje, Fredrik and Sekhon, Jasjeet and Higgins, Michael}, year = {2018}, date = {2018}, url = {https://CRAN.R-project.org/package=quickmatch} } @article{austinDoublePropensityscoreAdjustment2017, title = {Double propensity-score adjustment: A solution to design bias or bias due to incomplete matching}, author = {Austin, Peter C.}, year = {2017}, month = {02}, date = {2017-02-01}, journal = {Statistical Methods in Medical Research}, pages = {201--222}, volume = {26}, number = {1}, doi = {10.1177/0962280214543508}, url = {https://doi.org/10.1177/0962280214543508}, note = {Publisher: SAGE Publications Ltd STM} } @article{snowdenImplementationGComputationSimulated2011, title = {Implementation of G-Computation on a Simulated Data Set: Demonstration of a Causal Inference Technique}, author
= {Snowden, Jonathan M. and Rose, Sherri and Mortimer, Kathleen M.}, year = {2011}, month = {04}, date = {2011-04-01}, journal = {American Journal of Epidemiology}, pages = {731--738}, volume = {173}, number = {7}, doi = {10.1093/aje/kwq472}, url = {https://doi.org/10.1093/aje/kwq472}, langid = {en} } @article{schaferAverageCausalEffects2008, title = {Average causal effects from nonrandomized studies: A practical guide and simulated example}, author = {Schafer, Joseph L. and Kang, Joseph}, year = {2008}, month = {12}, date = {2008-12}, journal = {Psychological Methods}, pages = {279--313}, volume = {13}, number = {4}, doi = {10.1037/a0014268}, url = {https://doi.org/10.1037/a0014268} } @article{bodoryFiniteSamplePerformance2020, title = {The Finite Sample Performance of Inference Methods for Propensity Score Matching and Weighting Estimators}, author = {Bodory, Hugo and Camponovo, Lorenzo and Huber, Martin and Lechner, Michael}, year = {2020}, month = {01}, date = {2020-01-02}, journal = {Journal of Business & Economic Statistics}, pages = {183--200}, volume = {38}, number = {1}, doi = {10.1080/07350015.2018.1476247}, url = {https://doi.org/10.1080/07350015.2018.1476247}, langid = {en} } @article{abadieFailureBootstrapMatching2008, title = {On the Failure of the Bootstrap for Matching Estimators}, author = {Abadie, Alberto and Imbens, Guido W.}, year = {2008}, date = {2008}, journal = {Econometrica}, pages = {1537--1557}, volume = {76}, number = {6}, url = {https://www.jstor.org/stable/40056514}, note = {Publisher: [Wiley, Econometric Society]} } @article{cohnProfileMatchingGeneralization2021, title = {Profile {Matching} for the {Generalization} and {Personalization} of {Causal} {Inferences}}, volume = {33}, issn = {1044-3983}, url = {https://journals.lww.com/epidem/abstract/2022/09000/profile_matching_for_the_generalization_and.11.aspx}, doi = {10.1097/EDE.0000000000001517}, abstract = {We introduce profile matching, a multivariate matching method for randomized 
experiments and observational studies that finds the largest possible unweighted samples across multiple treatment groups that are balanced relative to a covariate profile. This covariate profile can represent a specific population or a target individual, facilitating the generalization and personalization of causal inferences. For generalization, because the profile often amounts to summary statistics for a target population, profile matching does not always require accessing individual-level data, which may be unavailable for confidentiality reasons. For personalization, the profile comprises the characteristics of a single individual. Profile matching achieves covariate balance by construction, but unlike existing approaches to matching, it does not require specifying a matching ratio, as this is implicitly optimized for the data. The method can also be used for the selection of units for study follow-up, and it readily applies to multivalued treatments with many treatment categories. We evaluate the performance of profile matching in a simulation study of the generalization of a randomized trial to a target population. We further illustrate this method in an exploratory observational study of the relationship between opioid use and mental health outcomes. We analyze these relationships for three covariate profiles representing: (i) sexual minorities, (ii) the Appalachian United States, and (iii) the characteristics of a hypothetical vulnerable patient. The method can be implemented via the new function profmatch in the designmatch package for R, for which we provide a step-by-step tutorial.}, language = {en-US}, number = {5}, urldate = {2025-03-05}, journal = {Epidemiology}, author = {Cohn, Eric R. and Zubizarreta, José R.}, month = sep, year = {2022}, pages = {678}, } @article{rassenOnetomanyPropensityScore2012, title = {One-to-many propensity score matching in cohort studies}, author = {Rassen, Jeremy A. and Shelat, Abhi A. 
and Myers, Jessica and Glynn, Robert J. and Rothman, Kenneth J. and Schneeweiss, Sebastian}, year = {2012}, date = {2012}, journal = {Pharmacoepidemiology and Drug Safety}, pages = {69--80}, volume = {21}, number = {S2}, doi = {10.1002/pds.3263}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/pds.3263}, note = {{\_}eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/pds.3263}, langid = {en} } @article{savjeInconsistencyMatchingReplacement2022, title = {On the inconsistency of matching without replacement}, author = {{Sävje}, F}, year = {2022}, month = {06}, date = {2022-06-01}, journal = {Biometrika}, pages = {551--558}, volume = {109}, number = {2}, doi = {10.1093/biomet/asab035}, url = {https://doi.org/10.1093/biomet/asab035} } MatchIt/data/0000755000176200001440000000000014463003201012505 5ustar liggesusersMatchIt/data/lalonde.tab0000644000176200001440000007242613705231557014645 0ustar liggesusers"treat" "age" "educ" "race" "married" "nodegree" "re74" "re75" "re78" "NSW1" 1 37 11 "black" 1 1 0 0 9930.046 "NSW2" 1 22 9 "hispan" 0 1 0 0 3595.894 "NSW3" 1 30 12 "black" 0 0 0 0 24909.45 "NSW4" 1 27 11 "black" 0 1 0 0 7506.146 "NSW5" 1 33 8 "black" 0 1 0 0 289.7899 "NSW6" 1 22 9 "black" 0 1 0 0 4056.494 "NSW7" 1 23 12 "black" 0 0 0 0 0 "NSW8" 1 32 11 "black" 0 1 0 0 8472.158 "NSW9" 1 22 16 "black" 0 0 0 0 2164.022 "NSW10" 1 33 12 "white" 1 0 0 0 12418.07 "NSW11" 1 19 9 "black" 0 1 0 0 8173.908 "NSW12" 1 21 13 "black" 0 0 0 0 17094.64 "NSW13" 1 18 8 "black" 0 1 0 0 0 "NSW14" 1 27 10 "black" 1 1 0 0 18739.93 "NSW15" 1 17 7 "black" 0 1 0 0 3023.879 "NSW16" 1 19 10 "black" 0 1 0 0 3228.503 "NSW17" 1 27 13 "black" 0 0 0 0 14581.86 "NSW18" 1 23 10 "black" 0 1 0 0 7693.4 "NSW19" 1 40 12 "black" 0 0 0 0 10804.32 "NSW20" 1 26 12 "black" 0 0 0 0 10747.35 "NSW21" 1 23 11 "black" 0 1 0 0 0 "NSW22" 1 41 14 "white" 0 0 0 0 5149.501 "NSW23" 1 38 9 "white" 0 1 0 0 6408.95 "NSW24" 1 24 11 "black" 0 1 0 0 1991.4 "NSW25" 1 18 10 "black" 0 1 0 0 11163.17 "NSW26" 1 29 11 
"black" 1 1 0 0 9642.999 "NSW27" 1 25 11 "black" 0 1 0 0 9897.049 "NSW28" 1 27 10 "hispan" 0 1 0 0 11142.87 "NSW29" 1 17 10 "black" 0 1 0 0 16218.04 "NSW30" 1 24 11 "black" 0 1 0 0 995.7002 "NSW31" 1 17 10 "black" 0 1 0 0 0 "NSW32" 1 48 4 "black" 0 1 0 0 6551.592 "NSW33" 1 25 11 "black" 1 1 0 0 1574.424 "NSW34" 1 20 12 "black" 0 0 0 0 0 "NSW35" 1 25 12 "black" 0 0 0 0 3191.753 "NSW36" 1 42 14 "black" 0 0 0 0 20505.93 "NSW37" 1 25 5 "black" 0 1 0 0 6181.88 "NSW38" 1 23 12 "black" 1 0 0 0 5911.551 "NSW39" 1 46 8 "black" 1 1 0 0 3094.156 "NSW40" 1 24 10 "black" 0 1 0 0 0 "NSW41" 1 21 12 "black" 0 0 0 0 1254.582 "NSW42" 1 19 9 "white" 0 1 0 0 13188.83 "NSW43" 1 17 8 "black" 0 1 0 0 8061.485 "NSW44" 1 18 8 "hispan" 1 1 0 0 2787.96 "NSW45" 1 20 11 "black" 0 1 0 0 3972.54 "NSW46" 1 25 11 "black" 1 1 0 0 0 "NSW47" 1 17 8 "black" 0 1 0 0 0 "NSW48" 1 17 9 "black" 0 1 0 0 0 "NSW49" 1 25 5 "black" 0 1 0 0 12187.41 "NSW50" 1 23 12 "black" 0 0 0 0 4843.176 "NSW51" 1 28 8 "black" 0 1 0 0 0 "NSW52" 1 31 11 "black" 1 1 0 0 8087.487 "NSW53" 1 18 11 "black" 0 1 0 0 0 "NSW54" 1 25 12 "black" 0 0 0 0 2348.973 "NSW55" 1 30 11 "black" 1 1 0 0 590.7818 "NSW56" 1 17 10 "black" 0 1 0 0 0 "NSW57" 1 37 9 "black" 0 1 0 0 1067.506 "NSW58" 1 41 4 "black" 1 1 0 0 7284.986 "NSW59" 1 42 14 "black" 1 0 0 0 13167.52 "NSW60" 1 22 11 "white" 0 1 0 0 1048.432 "NSW61" 1 17 8 "black" 0 1 0 0 0 "NSW62" 1 29 8 "black" 0 1 0 0 1923.938 "NSW63" 1 35 10 "black" 0 1 0 0 4666.236 "NSW64" 1 27 11 "black" 0 1 0 0 549.2984 "NSW65" 1 29 4 "black" 0 1 0 0 762.9146 "NSW66" 1 28 9 "black" 0 1 0 0 10694.29 "NSW67" 1 27 11 "black" 0 1 0 0 0 "NSW68" 1 23 7 "white" 0 1 0 0 0 "NSW69" 1 45 5 "black" 1 1 0 0 8546.715 "NSW70" 1 29 13 "black" 0 0 0 0 7479.656 "NSW71" 1 27 9 "black" 0 1 0 0 0 "NSW72" 1 46 13 "black" 0 0 0 0 647.2046 "NSW73" 1 18 6 "black" 0 1 0 0 0 "NSW74" 1 25 12 "black" 0 0 0 0 11965.81 "NSW75" 1 28 15 "black" 0 0 0 0 9598.541 "NSW76" 1 25 11 "white" 0 1 0 0 18783.35 "NSW77" 1 22 12 "black" 0 0 0 0 18678.08 
"NSW78" 1 21 9 "black" 0 1 0 0 0 "NSW79" 1 40 11 "black" 0 1 0 0 23005.6 "NSW80" 1 22 11 "black" 0 1 0 0 6456.697 "NSW81" 1 25 12 "black" 0 0 0 0 0 "NSW82" 1 18 12 "black" 0 0 0 0 2321.107 "NSW83" 1 38 12 "white" 0 0 0 0 4941.849 "NSW84" 1 27 13 "black" 0 0 0 0 0 "NSW85" 1 27 8 "black" 0 1 0 0 0 "NSW86" 1 38 11 "black" 0 1 0 0 0 "NSW87" 1 23 8 "hispan" 0 1 0 0 3881.284 "NSW88" 1 26 11 "black" 0 1 0 0 17230.96 "NSW89" 1 21 12 "white" 0 0 0 0 8048.603 "NSW90" 1 25 8 "black" 0 1 0 0 0 "NSW91" 1 31 11 "black" 1 1 0 0 14509.93 "NSW92" 1 17 10 "black" 0 1 0 0 0 "NSW93" 1 25 11 "black" 0 1 0 0 0 "NSW94" 1 21 12 "black" 0 0 0 0 9983.784 "NSW95" 1 44 11 "black" 0 1 0 0 0 "NSW96" 1 25 12 "white" 0 0 0 0 5587.503 "NSW97" 1 18 9 "black" 0 1 0 0 4482.845 "NSW98" 1 42 12 "black" 0 0 0 0 2456.153 "NSW99" 1 25 10 "black" 0 1 0 0 0 "NSW100" 1 31 9 "hispan" 0 1 0 0 26817.6 "NSW101" 1 24 10 "black" 0 1 0 0 0 "NSW102" 1 26 10 "black" 0 1 0 0 9265.788 "NSW103" 1 25 11 "black" 0 1 0 0 485.2298 "NSW104" 1 18 11 "black" 0 1 0 0 4814.627 "NSW105" 1 19 11 "black" 0 1 0 0 7458.105 "NSW106" 1 43 9 "black" 0 1 0 0 0 "NSW107" 1 27 13 "black" 0 0 0 0 34099.28 "NSW108" 1 17 9 "black" 0 1 0 0 1953.268 "NSW109" 1 30 11 "black" 0 1 0 0 0 "NSW110" 1 26 10 "black" 1 1 2027.999 0 0 "NSW111" 1 20 9 "black" 0 1 6083.994 0 8881.665 "NSW112" 1 17 9 "hispan" 0 1 445.1704 74.34345 6210.67 "NSW113" 1 20 12 "black" 0 0 989.2678 165.2077 0 "NSW114" 1 18 11 "black" 0 1 858.2543 214.5636 929.8839 "NSW115" 1 27 12 "black" 1 0 3670.872 334.0493 0 "NSW116" 1 21 12 "white" 0 0 3670.872 334.0494 12558.02 "NSW117" 1 27 12 "black" 0 0 2143.413 357.9499 22163.25 "NSW118" 1 20 12 "black" 0 0 0 377.5686 1652.637 "NSW119" 1 19 10 "black" 0 1 0 385.2741 8124.715 "NSW120" 1 23 12 "black" 0 0 5506.308 501.0741 671.3318 "NSW121" 1 29 14 "black" 0 0 0 679.6734 17814.98 "NSW122" 1 18 10 "black" 0 1 0 798.9079 9737.154 "NSW123" 1 19 9 "black" 0 1 0 798.9079 17685.18 "NSW124" 1 27 13 "white" 1 0 9381.566 853.7225 0 "NSW125" 1 18 11 
"white" 0 1 3678.231 919.5579 4321.705 "NSW126" 1 27 9 "black" 1 1 0 934.4454 1773.423 "NSW127" 1 22 12 "black" 0 0 5605.852 936.1773 0 "NSW128" 1 23 10 "black" 1 1 0 936.4386 11233.26 "NSW129" 1 23 12 "hispan" 0 0 9385.74 1117.439 559.4432 "NSW130" 1 20 11 "black" 0 1 3637.498 1220.836 1085.44 "NSW131" 1 17 9 "black" 0 1 1716.509 1253.439 5445.2 "NSW132" 1 28 11 "black" 0 1 0 1284.079 60307.93 "NSW133" 1 26 11 "black" 1 1 0 1392.853 1460.36 "NSW134" 1 20 11 "black" 0 1 16318.62 1484.994 6943.342 "NSW135" 1 24 11 "black" 1 1 824.3886 1666.113 4032.708 "NSW136" 1 31 9 "black" 0 1 0 1698.607 10363.27 "NSW137" 1 23 8 "white" 1 1 0 1713.15 4232.309 "NSW138" 1 18 10 "black" 0 1 2143.411 1784.274 11141.39 "NSW139" 1 29 12 "black" 0 0 10881.94 1817.284 0 "NSW140" 1 26 11 "white" 0 1 0 2226.266 13385.86 "NSW141" 1 24 9 "black" 0 1 9154.7 2288.675 4849.559 "NSW142" 1 25 12 "black" 0 0 14426.79 2409.274 0 "NSW143" 1 24 10 "black" 0 1 4250.402 2421.947 1660.508 "NSW144" 1 46 8 "black" 0 1 3165.658 2594.723 0 "NSW145" 1 31 12 "white" 0 0 0 2611.218 2484.549 "NSW146" 1 19 11 "black" 0 1 2305.026 2615.276 4146.603 "NSW147" 1 19 8 "black" 0 1 0 2657.057 9970.681 "NSW148" 1 27 11 "black" 0 1 2206.94 2666.274 0 "NSW149" 1 26 11 "black" 1 1 0 2754.646 26372.28 "NSW150" 1 20 10 "black" 0 1 5005.731 2777.355 5615.189 "NSW151" 1 28 10 "black" 0 1 0 2836.506 3196.571 "NSW152" 1 24 12 "black" 0 0 13765.75 2842.764 6167.681 "NSW153" 1 19 8 "black" 0 1 2636.353 2937.264 7535.942 "NSW154" 1 23 12 "black" 0 0 6269.341 3039.96 8484.239 "NSW155" 1 42 9 "black" 1 1 0 3058.531 1294.409 "NSW156" 1 25 13 "black" 0 0 12362.93 3090.732 0 "NSW157" 1 18 9 "black" 0 1 0 3287.375 5010.342 "NSW158" 1 21 12 "black" 0 0 6473.683 3332.409 9371.037 "NSW159" 1 27 10 "black" 0 1 1001.146 3550.075 0 "NSW160" 1 21 8 "black" 0 1 989.2678 3695.897 4279.613 "NSW161" 1 22 9 "black" 0 1 2192.877 3836.986 3462.564 "NSW162" 1 31 4 "black" 0 1 8517.589 4023.211 7382.549 "NSW163" 1 24 10 "black" 1 1 11703.2 4078.152 0 
"NSW164" 1 29 10 "black" 0 1 0 4398.95 0 "NSW165" 1 29 12 "black" 0 0 9748.387 4878.937 10976.51 "NSW166" 1 19 10 "white" 0 1 0 5324.109 13829.62 "NSW167" 1 19 11 "hispan" 1 1 5424.485 5463.803 6788.463 "NSW168" 1 31 9 "black" 0 1 10717.03 5517.841 9558.501 "NSW169" 1 22 10 "black" 1 1 1468.348 5588.664 13228.28 "NSW170" 1 21 9 "black" 0 1 6416.47 5749.331 743.6666 "NSW171" 1 17 10 "black" 0 1 1291.468 5793.852 5522.788 "NSW172" 1 26 12 "black" 1 0 8408.762 5794.831 1424.944 "NSW173" 1 20 9 "hispan" 0 1 12260.78 5875.049 1358.643 "NSW174" 1 19 10 "black" 0 1 4121.949 6056.754 0 "NSW175" 1 26 10 "black" 0 1 25929.68 6788.958 672.8773 "NSW176" 1 28 11 "black" 0 1 1929.029 6871.856 0 "NSW177" 1 22 12 "hispan" 1 0 492.2305 7055.702 10092.83 "NSW178" 1 33 11 "black" 0 1 0 7867.916 6281.433 "NSW179" 1 22 12 "white" 0 0 6759.994 8455.504 12590.71 "NSW180" 1 29 10 "hispan" 0 1 0 8853.674 5112.014 "NSW181" 1 33 12 "black" 1 0 20279.95 10941.35 15952.6 "NSW182" 1 25 14 "black" 1 0 35040.07 11536.57 36646.95 "NSW183" 1 35 9 "black" 1 1 13602.43 13830.64 12803.97 "NSW184" 1 35 8 "black" 1 1 13732.07 17976.15 3786.628 "NSW185" 1 33 11 "black" 1 1 14660.71 25142.24 4181.942 "PSID1" 0 30 12 "white" 1 0 20166.73 18347.23 25564.67 "PSID2" 0 26 12 "white" 1 0 25862.32 17806.55 25564.67 "PSID3" 0 25 16 "white" 1 0 25862.32 15316.21 25564.67 "PSID4" 0 42 11 "white" 1 1 21787.05 14265.29 15491.01 "PSID5" 0 25 9 "black" 1 1 14829.69 13776.53 0 "PSID6" 0 37 9 "black" 1 1 13685.48 12756.05 17833.2 "PSID7" 0 32 12 "white" 1 0 19067.58 12625.35 14146.28 "PSID8" 0 20 12 "black" 0 0 7392.314 12396.19 17765.23 "PSID9" 0 38 9 "hispan" 1 1 16826.18 12029.18 0 "PSID10" 0 39 10 "white" 1 1 16767.41 12022.02 4433.18 "PSID11" 0 41 5 "white" 1 1 10785.76 11991.58 19451.31 "PSID12" 0 31 14 "white" 1 0 17831.29 11563.69 22094.97 "PSID13" 0 34 8 "white" 1 1 8038.872 11404.35 5486.799 "PSID14" 0 29 12 "white" 1 0 14768.95 11146.55 6420.722 "PSID15" 0 22 14 "black" 1 0 748.4399 11105.37 18208.55 "PSID16" 
0 42 0 "hispan" 1 1 2797.833 10929.92 9922.934 "PSID17" 0 25 9 "hispan" 0 1 5460.477 10589.76 7539.361 "PSID18" 0 28 9 "white" 1 1 11091.41 10357.02 15406.78 "PSID19" 0 40 13 "white" 1 0 3577.621 10301.52 11911.95 "PSID20" 0 35 9 "white" 1 1 11475.43 9397.403 11087.38 "PSID21" 0 27 10 "hispan" 1 1 15711.36 9098.419 17023.41 "PSID22" 0 27 6 "hispan" 1 1 7831.189 9071.565 5661.171 "PSID23" 0 36 12 "white" 1 0 25535.12 8695.597 21905.82 "PSID24" 0 47 8 "black" 1 1 9275.169 8543.419 0 "PSID25" 0 40 11 "white" 1 1 20666.35 8502.242 25564.67 "PSID26" 0 27 7 "white" 1 1 3064.293 8461.065 11149.45 "PSID27" 0 36 9 "black" 1 1 13256.4 8457.484 0 "PSID28" 0 39 6 "hispan" 1 1 13279.91 8441.371 25048.94 "PSID29" 0 21 9 "white" 1 1 11156.07 8441.371 1213.214 "PSID30" 0 29 12 "white" 1 0 11199.17 8081.516 0 "PSID31" 0 22 13 "hispan" 0 0 6404.843 7882.79 9453.017 "PSID32" 0 25 10 "white" 1 1 13634.54 7793.274 11688.82 "PSID33" 0 27 12 "white" 1 0 12270.89 7709.129 7806.829 "PSID34" 0 45 8 "white" 1 1 22415.97 7635.726 15931.37 "PSID35" 0 26 12 "white" 1 0 2345.242 7565.903 2838.713 "PSID36" 0 27 12 "white" 1 0 9788.497 7496.081 14038.4 "PSID37" 0 33 8 "white" 1 1 12312.03 7474.597 25514.43 "PSID38" 0 25 12 "white" 1 0 11381.38 7467.435 4162.756 "PSID39" 0 49 8 "white" 1 1 6459.703 7431.629 7503.896 "PSID40" 0 40 3 "hispan" 1 1 7576.485 7426.258 12104.06 "PSID41" 0 22 12 "black" 1 0 9729.719 7372.548 2231.367 "PSID42" 0 25 5 "white" 1 1 7891.927 7293.774 14617.67 "PSID43" 0 25 12 "white" 1 0 11516.57 7263.339 19588.74 "PSID44" 0 21 12 "white" 1 0 13601.23 7202.468 10746.03 "PSID45" 0 33 9 "hispan" 1 1 11959.36 7087.887 25564.67 "PSID46" 0 20 12 "black" 1 0 9555.344 7055.661 0 "PSID47" 0 19 11 "white" 1 1 4306.468 6978.677 837.871 "PSID48" 0 25 12 "black" 1 0 295.8493 6942.871 461.0507 "PSID49" 0 29 12 "white" 1 0 15303.83 6932.129 24290.87 "PSID50" 0 20 12 "white" 1 0 3558.029 6797.855 6680.802 "PSID51" 0 29 6 "hispan" 1 1 8542.403 6701.177 7196.528 "PSID52" 0 25 13 "white" 1 0 
19259.59 6652.839 13015.82 "PSID53" 0 41 15 "white" 1 0 25862.32 6563.323 24647 "PSID54" 0 39 10 "white" 1 1 22745.13 6493.5 25564.67 "PSID55" 0 33 12 "white" 1 0 10819.07 6369.968 2936.243 "PSID56" 0 29 8 "white" 1 1 9169.369 6352.065 20575.86 "PSID57" 0 21 11 "white" 1 1 10679.96 6276.871 10923.35 "PSID58" 0 31 12 "white" 1 0 23652.27 6228.532 22403.81 "PSID59" 0 36 12 "black" 1 0 11040.47 6221.371 7215.739 "PSID60" 0 25 7 "white" 1 1 5597.625 6099.629 122.6513 "PSID61" 0 35 7 "white" 1 1 10715.23 6087.097 15177.73 "PSID62" 0 22 9 "white" 1 1 5683.833 6038.758 4742.025 "PSID63" 0 31 2 "hispan" 1 1 3262.179 5965.355 9732.307 "PSID64" 0 40 15 "white" 1 0 10907.24 5922.387 6238.962 "PSID65" 0 47 3 "white" 1 1 9047.894 5911.645 6145.865 "PSID66" 0 26 8 "hispan" 0 1 3168.134 5872.258 11136.15 "PSID67" 0 42 7 "white" 1 1 10971.89 5806.016 9241.702 "PSID68" 0 53 12 "white" 0 0 17104.4 5775.581 19965.56 "PSID69" 0 30 17 "black" 0 0 17827.37 5546.419 14421.13 "PSID70" 0 28 10 "white" 1 1 10415.46 5544.629 10289.41 "PSID71" 0 46 11 "white" 1 1 14753.28 5299.355 0 "PSID72" 0 28 12 "white" 0 0 8256.35 5279.661 21602.88 "PSID73" 0 27 12 "hispan" 1 0 17604.01 5222.371 25564.67 "PSID74" 0 25 10 "white" 1 1 4335.857 5181.194 12418.81 "PSID75" 0 38 8 "white" 1 1 11242.27 5174.032 0 "PSID76" 0 26 12 "hispan" 0 0 7968.338 5109.581 4181.966 "PSID77" 0 54 12 "white" 0 0 7165.039 5012.903 0 "PSID78" 0 38 8 "hispan" 1 1 22606.02 4978.887 8720.065 "PSID79" 0 23 17 "white" 0 0 0 4876.839 16747.08 "PSID80" 0 23 8 "white" 1 1 3595.255 4866.097 2782.559 "PSID81" 0 23 12 "white" 1 0 11690.95 4764.048 14065 "PSID82" 0 25 12 "hispan" 1 0 8746.167 4762.258 379.7757 "PSID83" 0 25 15 "white" 1 0 7386.436 4738.984 12705.49 "PSID84" 0 37 11 "hispan" 0 1 615.2098 4713.919 0 "PSID85" 0 40 12 "white" 1 0 18389.68 4688.855 21857.05 "PSID86" 0 19 10 "white" 0 1 5777.878 4672.742 135.9508 "PSID87" 0 48 7 "white" 1 1 13326.93 4636.935 0 "PSID88" 0 19 12 "white" 0 0 8530.648 4620.823 0 "PSID89" 0 16 9 
"white" 0 1 2539.21 4579.645 0 "PSID90" 0 29 10 "white" 1 1 713.1731 4542.048 7781.708 "PSID91" 0 30 16 "white" 0 0 3093.682 4468.645 15538.29 "PSID92" 0 22 11 "white" 1 1 8761.841 4463.274 10642.59 "PSID93" 0 22 10 "white" 0 1 17268.98 4400.613 2453.026 "PSID94" 0 47 10 "black" 1 1 13311.26 4397.032 19330.14 "PSID95" 0 25 12 "hispan" 1 0 2266.872 4361.226 3020.473 "PSID96" 0 47 10 "black" 0 1 21918.32 4323.629 19438.02 "PSID97" 0 24 12 "black" 1 0 8573.752 4293.194 0 "PSID98" 0 20 12 "black" 1 0 2648.929 4273.5 0 "PSID99" 0 28 12 "black" 0 0 16722.34 4253.806 7314.747 "PSID100" 0 47 11 "white" 0 1 8060.424 4232.323 3358.873 "PSID101" 0 50 0 "white" 1 1 10162.72 4218 220.1813 "PSID102" 0 18 12 "white" 0 0 2217.89 4191.145 8957.978 "PSID103" 0 21 12 "white" 0 0 9665.063 4110.581 1687.564 "PSID104" 0 47 11 "white" 1 1 23924.61 4096.258 17358.85 "PSID105" 0 21 12 "white" 0 0 2827.222 4056.871 5937.505 "PSID106" 0 34 11 "white" 1 1 0 4010.323 18133.18 "PSID107" 0 19 12 "white" 1 0 5817.063 3919.016 1066.919 "PSID108" 0 44 13 "white" 1 0 8032.994 3881.419 3104.704 "PSID109" 0 21 15 "white" 1 0 6951.479 3879.629 0 "PSID110" 0 20 12 "black" 0 0 5099.971 3842.032 12718.79 "PSID111" 0 51 11 "white" 0 1 48.98167 3813.387 1525.014 "PSID112" 0 28 13 "white" 0 0 5260.631 3790.113 9253.524 "PSID113" 0 24 15 "white" 0 0 12746.99 3743.565 0 "PSID114" 0 28 8 "hispan" 1 1 8305.332 3718.5 0 "PSID115" 0 20 11 "white" 1 1 5822.941 3532.306 11075.56 "PSID116" 0 29 12 "white" 1 0 14288.93 3503.661 8133.407 "PSID117" 0 23 12 "white" 1 0 14347.71 3482.177 3818.445 "PSID118" 0 20 11 "black" 0 1 0 3480.387 5495.665 "PSID119" 0 42 7 "white" 1 1 4324.102 3457.113 9856.436 "PSID120" 0 43 12 "white" 1 0 14328.12 3453.532 18781.9 "PSID121" 0 27 13 "white" 0 0 16406.9 3426.677 5344.937 "PSID122" 0 27 4 "hispan" 1 1 626.9654 3410.565 3367.739 "PSID123" 0 25 12 "white" 1 0 21469.65 3405.194 7981.201 "PSID124" 0 18 12 "white" 0 0 4729.67 3328.21 12602.05 "PSID125" 0 31 16 "white" 1 0 25862.32 
3254.806 25564.67 "PSID126" 0 27 12 "white" 1 0 4043.927 3231.532 7240.86 "PSID127" 0 18 11 "white" 0 1 0 3226.161 15814.63 "PSID128" 0 24 7 "white" 1 1 7860.578 3213.629 0 "PSID129" 0 23 12 "white" 1 0 7856.66 3213.629 5535.564 "PSID130" 0 50 12 "white" 1 0 19929.66 3190.355 18597.19 "PSID131" 0 19 12 "white" 0 0 99.92261 3172.452 15436.33 "PSID132" 0 23 10 "white" 1 1 15811.28 3145.597 6398.556 "PSID133" 0 51 12 "white" 1 0 21001.38 3140.226 16015.6 "PSID134" 0 19 11 "black" 0 1 5607.422 3054.29 94.5745 "PSID135" 0 20 10 "white" 1 1 3099.56 2970.145 21141.83 "PSID136" 0 20 11 "hispan" 0 1 2868.367 2968.355 7403.41 "PSID137" 0 21 12 "white" 0 0 8128.998 2939.71 0 "PSID138" 0 39 10 "white" 1 1 0 2886 18761.22 "PSID139" 0 36 5 "white" 0 1 3814.692 2873.468 2751.527 "PSID140" 0 19 9 "black" 0 1 1079.556 2873.468 14344.29 "PSID141" 0 42 6 "hispan" 1 1 2425.572 2832.29 1907.745 "PSID142" 0 20 7 "white" 0 1 1902.448 2792.903 6098.578 "PSID143" 0 23 12 "white" 1 0 4954.986 2771.419 0 "PSID144" 0 35 12 "white" 1 0 1469.45 2719.5 0 "PSID145" 0 18 12 "white" 0 0 881.6701 2696.226 12120.31 "PSID146" 0 43 8 "white" 1 1 18338.74 2674.742 6395.601 "PSID147" 0 37 14 "white" 1 0 18501.36 2638.935 13429.58 "PSID148" 0 24 10 "white" 1 1 4719.874 2565.532 2173.736 "PSID149" 0 51 12 "white" 0 0 20742.76 2538.677 1019.631 "PSID150" 0 22 11 "hispan" 0 1 7341.373 2535.097 14187.65 "PSID151" 0 19 12 "white" 0 0 336.9939 2518.984 7118.209 "PSID152" 0 52 0 "hispan" 1 1 773.9104 2506.452 0 "PSID153" 0 21 12 "white" 0 0 2903.633 2456.323 4787.834 "PSID154" 0 24 12 "white" 0 0 9784.578 2413.355 0 "PSID155" 0 35 8 "white" 1 1 2241.401 2399.032 9460.406 "PSID156" 0 20 13 "white" 0 0 0 2352.484 0 "PSID157" 0 17 7 "black" 0 1 1054.086 2286.242 1613.677 "PSID158" 0 18 10 "black" 0 1 311.5234 2284.452 8154.095 "PSID159" 0 28 12 "black" 0 0 6285.328 2255.806 7310.313 "PSID160" 0 25 14 "hispan" 1 0 1622.273 2239.694 1892.968 "PSID161" 0 40 12 "hispan" 0 0 13616.9 2228.952 876.2919 "PSID162" 0 50 3 
"white" 1 1 3136.786 2203.887 13976.34 "PSID163" 0 48 8 "white" 1 1 16050.31 2116.161 11600.15 "PSID164" 0 17 7 "hispan" 0 1 0 2082.145 6460.621 "PSID165" 0 30 12 "white" 1 0 7347.251 2080.355 14475.81 "PSID166" 0 30 7 "white" 1 1 574.0652 2010.532 366.4762 "PSID167" 0 22 11 "white" 1 1 3030.986 1976.516 0 "PSID168" 0 27 12 "white" 1 0 11493.06 1906.694 13419.24 "PSID169" 0 25 9 "white" 1 1 23377.97 1901.323 1898.879 "PSID170" 0 21 14 "white" 0 0 80.32994 1890.581 6389.69 "PSID171" 0 17 10 "white" 0 1 0 1888.79 19993.64 "PSID172" 0 39 7 "white" 0 1 7786.126 1844.032 9206.237 "PSID173" 0 18 9 "black" 0 1 1183.397 1822.548 803.8833 "PSID174" 0 25 12 "white" 1 0 2721.422 1754.516 1037.364 "PSID175" 0 20 8 "white" 1 1 2360.916 1741.984 0 "PSID176" 0 19 13 "white" 0 0 2366.794 1709.758 0 "PSID177" 0 19 11 "white" 0 1 0 1693.645 9853.481 "PSID178" 0 22 12 "white" 0 0 10137.25 1679.323 25564.67 "PSID179" 0 18 11 "black" 0 1 2068.986 1623.823 20243.38 "PSID180" 0 21 10 "white" 0 1 1767.259 1555.79 7675.312 "PSID181" 0 24 12 "white" 1 0 7643.1 1546.839 3262.82 "PSID182" 0 18 11 "white" 0 1 1273.523 1532.516 12489.75 "PSID183" 0 17 10 "white" 0 1 568.1874 1525.355 6231.573 "PSID184" 0 17 10 "white" 0 1 0 1503.871 7843.773 "PSID185" 0 18 10 "white" 0 1 0 1491.339 237.914 "PSID186" 0 53 10 "hispan" 0 1 7878.212 1489.548 13170.98 "PSID187" 0 18 11 "black" 0 1 1191.234 1478.806 3683.972 "PSID188" 0 17 10 "hispan" 0 1 0 1453.742 6918.716 "PSID189" 0 26 12 "black" 0 0 0 1448.371 0 "PSID190" 0 39 5 "white" 1 1 13082.02 1434.048 18323.81 "PSID191" 0 18 12 "black" 0 0 1579.169 1408.984 3057.416 "PSID192" 0 23 13 "white" 0 0 601.4949 1394.661 4975.505 "PSID193" 0 18 8 "white" 0 1 5023.56 1391.081 6756.166 "PSID194" 0 28 10 "white" 1 1 7578.444 1383.919 2404.261 "PSID195" 0 32 4 "white" 1 1 0 1378.548 0 "PSID196" 0 18 11 "black" 0 1 0 1367.806 33.98771 "PSID197" 0 40 10 "white" 1 1 1543.902 1342.742 0 "PSID198" 0 21 14 "white" 0 0 8456.196 1330.21 16967.26 "PSID199" 0 29 10 "hispan" 0 
1 3732.403 1323.048 6694.101 "PSID200" 0 31 6 "white" 0 1 2666.562 1321.258 0 "PSID201" 0 46 7 "white" 1 1 19171.43 1317.677 0 "PSID202" 0 20 9 "hispan" 1 1 0 1283.661 0 "PSID203" 0 36 18 "white" 1 0 3273.935 1269.339 18227.76 "PSID204" 0 45 12 "white" 1 0 16559.72 1265.758 7987.112 "PSID205" 0 16 10 "white" 0 1 1026.656 1224.581 6847.785 "PSID206" 0 18 12 "white" 0 0 818.9735 1208.468 2232.845 "PSID207" 0 40 12 "hispan" 0 0 11867.28 1195.935 3873.121 "PSID208" 0 16 9 "white" 0 1 0 1188.774 2451.548 "PSID209" 0 16 10 "white" 0 1 574.0652 1181.613 5578.418 "PSID210" 0 28 5 "hispan" 1 1 10967.98 1178.032 239.3917 "PSID211" 0 20 12 "white" 0 0 0 1147.597 15554.55 "PSID212" 0 19 8 "white" 1 1 39.18534 1136.855 5327.204 "PSID213" 0 16 8 "white" 0 1 0 1113.581 542.3257 "PSID214" 0 20 11 "white" 1 1 2547.047 1099.258 0 "PSID215" 0 35 10 "white" 1 1 4964.782 1086.726 1745.195 "PSID216" 0 32 6 "hispan" 1 1 979.6334 1036.597 0 "PSID217" 0 32 16 "black" 0 0 17135.75 1031.226 0 "PSID218" 0 17 9 "black" 0 1 0 981.0968 8900.347 "PSID219" 0 16 7 "white" 0 1 0 975.7258 4728.725 "PSID220" 0 32 15 "white" 0 0 489.8167 968.5645 7684.178 "PSID221" 0 19 12 "white" 0 0 815.055 964.9839 12059.73 "PSID222" 0 40 12 "white" 1 0 16851.65 961.4032 17717.94 "PSID223" 0 50 7 "white" 1 1 11473.47 956.0323 0 "PSID224" 0 39 11 "white" 0 1 0 930.9677 0 "PSID225" 0 18 8 "hispan" 0 1 0 902.3226 1306.31 "PSID226" 0 39 10 "black" 0 1 844.444 889.7903 701.9201 "PSID227" 0 17 11 "hispan" 0 1 0 873.6774 7759.542 "PSID228" 0 17 5 "black" 0 1 96.00407 868.3065 0 "PSID229" 0 19 12 "white" 0 0 2425.572 861.1452 2587.499 "PSID230" 0 27 15 "white" 0 0 0 857.5645 3392.86 "PSID231" 0 18 11 "black" 0 1 587.78 841.4516 7933.914 "PSID232" 0 20 14 "white" 1 0 0 805.6452 1454.083 "PSID233" 0 20 12 "white" 1 0 12145.49 791.3226 13683.75 "PSID234" 0 19 13 "black" 0 0 1714.358 785.9516 9067.33 "PSID235" 0 24 8 "white" 1 1 213.5601 760.8871 2340.719 "PSID236" 0 27 12 "white" 1 0 4222.22 751.9355 0 "PSID237" 0 19 9 "white" 
0 1 773.9104 676.7419 5647.871 "PSID238" 0 52 8 "black" 1 1 5454.599 666 0 "PSID239" 0 18 11 "hispan" 0 1 0 630.1935 0 "PSID240" 0 16 10 "hispan" 0 1 0 630.1935 3892.332 "PSID241" 0 18 12 "hispan" 0 0 0 630.1935 4843.988 "PSID242" 0 45 12 "white" 0 0 4473.006 608.7097 0 "PSID243" 0 21 14 "white" 0 0 9708.167 594.3871 2256.488 "PSID244" 0 36 8 "white" 1 1 2715.544 585.4355 0 "PSID245" 0 21 13 "white" 0 0 513.3279 578.2742 0 "PSID246" 0 41 7 "white" 1 1 19573.08 565.7419 0 "PSID247" 0 18 7 "white" 0 1 491.776 558.5806 642.8111 "PSID248" 0 39 9 "white" 0 1 11230.52 537.0968 5752.79 "PSID249" 0 19 3 "white" 1 1 0 537.0968 0 "PSID250" 0 32 13 "white" 1 0 12553.02 524.5645 15353.58 "PSID251" 0 16 9 "white" 0 1 0 485.1774 4112.513 "PSID252" 0 16 7 "white" 0 1 658.3136 479.8065 6210.885 "PSID253" 0 21 9 "black" 0 1 1030.574 470.8548 1223.558 "PSID254" 0 22 12 "white" 1 0 12096.51 469.0645 14289.62 "PSID255" 0 23 11 "hispan" 1 1 8946.012 469.0645 4776.012 "PSID256" 0 17 8 "black" 0 1 0 451.1613 0 "PSID257" 0 21 8 "white" 1 1 5699.507 388.5 8844.194 "PSID258" 0 18 10 "white" 0 1 0 386.7097 0 "PSID259" 0 24 12 "white" 1 0 9051.813 327.629 8547.171 "PSID260" 0 24 12 "black" 1 0 4232.016 320.4677 1273.8 "PSID261" 0 16 9 "white" 0 1 0 320.4677 3707.616 "PSID262" 0 20 8 "white" 1 1 621.0876 306.1452 5551.819 "PSID263" 0 42 8 "white" 0 1 17925.33 300.7742 14116.72 "PSID264" 0 17 8 "hispan" 0 1 391.8534 300.7742 18891.26 "PSID265" 0 19 8 "hispan" 0 1 368.3422 300.7742 18510 "PSID266" 0 17 9 "black" 0 1 0 297.1935 54.67588 "PSID267" 0 21 14 "white" 0 0 107.7597 293.6129 7698.955 "PSID268" 0 16 9 "black" 0 1 0 277.5 3983.951 "PSID269" 0 23 13 "black" 0 0 172.4155 272.129 582.2243 "PSID270" 0 16 9 "white" 0 1 411.446 254.2258 1725.985 "PSID271" 0 17 11 "hispan" 0 1 803.2994 248.8548 5173.521 "PSID272" 0 46 7 "white" 0 1 1081.515 245.2742 0 "PSID273" 0 32 10 "white" 1 1 4145.809 238.1129 8245.714 "PSID274" 0 18 11 "white" 0 1 131.2709 218.4194 7503.896 "PSID275" 0 23 12 "hispan" 1 0 0 
216.629 0 "PSID276" 0 18 10 "white" 1 1 0 211.2581 14053.18 "PSID277" 0 19 10 "black" 0 1 1056.045 205.8871 0 "PSID278" 0 16 7 "black" 0 1 133.2301 205.8871 6145.865 "PSID279" 0 26 7 "white" 1 1 1538.024 189.7742 650.1997 "PSID280" 0 16 10 "white" 0 1 0 189.7742 2136.793 "PSID281" 0 17 10 "white" 0 1 0 182.6129 6423.677 "PSID282" 0 17 10 "white" 0 1 0 171.871 1483.637 "PSID283" 0 23 8 "white" 1 1 33.30754 166.5 0 "PSID284" 0 29 12 "white" 1 0 14641.6 162.9194 9473.705 "PSID285" 0 17 10 "white" 0 1 0 152.1774 10301.23 "PSID286" 0 49 8 "white" 1 1 14684.7 136.0645 14963.46 "PSID287" 0 20 10 "white" 1 1 6563.544 134.2742 15363.92 "PSID288" 0 40 16 "white" 1 0 0 114.5806 0 "PSID289" 0 19 10 "white" 0 1 1933.796 112.7903 675.321 "PSID290" 0 18 11 "white" 0 1 1481.206 57.29032 1421.573 "PSID291" 0 16 6 "black" 0 1 0 44.75806 0 "PSID292" 0 22 8 "white" 1 1 105.8004 42.96774 209.8372 "PSID293" 0 31 12 "black" 1 0 0 42.96774 11023.84 "PSID294" 0 20 11 "white" 1 1 4478.884 39.3871 6280.338 "PSID295" 0 17 11 "hispan" 0 1 601.4949 10.74194 1913.656 "PSID296" 0 50 12 "white" 1 0 25862.32 0 25564.67 "PSID297" 0 49 14 "white" 1 0 25862.32 0 25564.67 "PSID298" 0 47 9 "white" 1 1 25862.32 0 25564.67 "PSID299" 0 34 11 "hispan" 1 1 22198.49 0 0 "PSID300" 0 22 8 "black" 1 1 16961.37 0 959.0445 "PSID301" 0 27 12 "white" 1 0 15509.56 0 12593.19 "PSID302" 0 30 10 "white" 1 1 14913.94 0 11563.21 "PSID303" 0 52 12 "white" 1 0 14780.71 0 25564.67 "PSID304" 0 43 12 "white" 1 0 13321.05 0 16860.86 "PSID305" 0 27 9 "hispan" 1 1 12829.28 0 0 "PSID306" 0 35 13 "white" 0 0 9537.711 0 11269.14 "PSID307" 0 45 12 "white" 1 0 9277.128 0 12108.49 "PSID308" 0 22 11 "black" 1 1 9049.853 0 9088.018 "PSID309" 0 22 12 "white" 1 0 9022.424 0 3342.618 "PSID310" 0 23 11 "white" 1 1 8910.745 0 4183.444 "PSID311" 0 55 7 "white" 1 1 8832.375 0 0 "PSID312" 0 26 14 "white" 0 0 8411.132 0 0 "PSID313" 0 34 12 "white" 0 0 8125.079 0 6032.08 "PSID314" 0 22 11 "white" 0 1 8013.401 0 5748.356 "PSID315" 0 31 12 "white" 0 
0 6156.016 0 4094.78 "PSID316" 0 19 12 "white" 0 0 5797.47 0 2160.436 "PSID317" 0 24 10 "white" 1 1 5523.173 0 5040.525 "PSID318" 0 36 12 "white" 1 0 5374.269 0 0 "PSID319" 0 20 9 "white" 1 1 5229.283 0 15892.95 "PSID320" 0 23 8 "white" 1 1 4610.155 0 0 "PSID321" 0 35 11 "white" 1 1 3975.352 0 21963.45 "PSID322" 0 23 12 "white" 0 0 3893.063 0 16324.45 "PSID323" 0 29 10 "white" 0 1 3751.996 0 251.2135 "PSID324" 0 24 9 "white" 1 1 3438.513 0 818.6605 "PSID325" 0 18 10 "white" 0 1 3360.143 0 0 "PSID326" 0 45 8 "black" 0 1 3299.405 0 31.03226 "PSID327" 0 21 13 "hispan" 0 0 3015.312 0 17627.8 "PSID328" 0 29 13 "white" 1 0 2780.2 0 14339.86 "PSID329" 0 21 15 "white" 0 0 2629.336 0 1717.118 "PSID330" 0 22 16 "black" 0 0 2564.68 0 116.7404 "PSID331" 0 24 12 "black" 1 0 2355.039 0 2448.593 "PSID332" 0 20 14 "white" 0 0 2210.053 0 2813.591 "PSID333" 0 19 6 "black" 0 1 1955.348 0 14998.92 "PSID334" 0 19 9 "hispan" 0 1 1822.118 0 3372.172 "PSID335" 0 19 12 "black" 0 0 1681.051 0 0 "PSID336" 0 20 13 "white" 0 0 1657.54 0 913.235 "PSID337" 0 19 12 "black" 0 0 1655.58 0 0 "PSID338" 0 26 5 "white" 1 1 1573.291 0 3700.227 "PSID339" 0 26 9 "hispan" 0 1 1563.495 0 2862.356 "PSID340" 0 23 12 "white" 0 0 1504.717 0 0 "PSID341" 0 20 9 "hispan" 0 1 1500.798 0 12618.31 "PSID342" 0 20 10 "white" 0 1 1412.631 0 6290.682 "PSID343" 0 36 11 "white" 1 1 1404.794 0 0 "PSID344" 0 39 12 "white" 1 0 1289.198 0 1202.869 "PSID345" 0 17 9 "black" 0 1 1222.582 0 422.6298 "PSID346" 0 55 3 "white" 0 1 1208.868 0 0 "PSID347" 0 28 8 "white" 1 1 1202.99 0 19516.33 "PSID348" 0 19 12 "hispan" 0 0 1058.004 0 8923.991 "PSID349" 0 37 7 "white" 1 1 963.9593 0 0 "PSID350" 0 16 9 "white" 1 1 920.8554 0 15997.87 "PSID351" 0 17 10 "white" 0 1 646.558 0 9438.24 "PSID352" 0 24 12 "black" 0 0 566.2281 0 2284.565 "PSID353" 0 19 11 "white" 0 1 540.7576 0 3406.16 "PSID354" 0 50 5 "black" 1 1 411.446 0 9166.338 "PSID355" 0 19 9 "black" 0 1 384.0163 0 0 "PSID356" 0 36 1 "black" 0 1 348.7495 0 0 "PSID357" 0 18 11 "white" 0 1 
321.3198 0 7722.599 "PSID358" 0 16 7 "hispan" 0 1 289.9715 0 7515.717 "PSID359" 0 21 11 "white" 1 1 246.8676 0 6708.879 "PSID360" 0 55 6 "white" 1 1 111.6782 0 0 "PSID361" 0 37 12 "white" 0 0 48.98167 0 877.7696 "PSID362" 0 26 12 "hispan" 1 0 47.0224 0 0 "PSID363" 0 54 12 "white" 1 0 0 0 0 "PSID364" 0 50 12 "white" 1 0 0 0 0 "PSID365" 0 16 8 "white" 0 1 0 0 2559.422 "PSID366" 0 16 9 "hispan" 0 1 0 0 0 "PSID367" 0 18 10 "black" 0 1 0 0 2281.61 "PSID368" 0 40 11 "black" 1 1 0 0 0 "PSID369" 0 16 8 "white" 0 1 0 0 0 "PSID370" 0 16 9 "black" 0 1 0 0 2158.959 "PSID371" 0 26 14 "white" 0 0 0 0 6717.745 "PSID372" 0 20 9 "black" 0 1 0 0 6083.8 "PSID373" 0 20 12 "black" 0 0 0 0 0 "PSID374" 0 18 11 "black" 0 1 0 0 0 "PSID375" 0 46 11 "black" 1 1 0 0 2820.98 "PSID376" 0 17 8 "black" 0 1 0 0 12760.17 "PSID377" 0 16 9 "white" 0 1 0 0 4974.028 "PSID378" 0 30 10 "white" 1 1 0 0 3151.991 "PSID379" 0 33 12 "hispan" 1 0 0 0 5841.453 "PSID380" 0 34 12 "black" 1 0 0 0 18716.88 "PSID381" 0 21 13 "black" 0 0 0 0 17941.08 "PSID382" 0 29 11 "white" 1 1 0 0 0 "PSID383" 0 19 12 "white" 0 0 0 0 0 "PSID384" 0 31 4 "hispan" 0 1 0 0 1161.493 "PSID385" 0 19 12 "hispan" 0 0 0 0 18573.55 "PSID386" 0 20 12 "black" 0 0 0 0 11594.24 "PSID387" 0 55 4 "black" 0 1 0 0 0 "PSID388" 0 19 11 "black" 0 1 0 0 16485.52 "PSID389" 0 18 11 "black" 0 1 0 0 7146.286 "PSID390" 0 48 13 "white" 1 0 0 0 0 "PSID391" 0 16 9 "hispan" 1 1 0 0 6821.186 "PSID392" 0 17 10 "black" 0 1 0 0 0 "PSID393" 0 38 12 "white" 1 0 0 0 18756.78 "PSID394" 0 34 8 "white" 1 1 0 0 2664.341 "PSID395" 0 53 12 "white" 0 0 0 0 0 "PSID396" 0 48 14 "white" 1 0 0 0 7236.427 "PSID397" 0 16 9 "white" 0 1 0 0 6494.608 "PSID398" 0 17 8 "black" 0 1 0 0 4520.366 "PSID399" 0 27 14 "black" 0 0 0 0 10122.43 "PSID400" 0 37 8 "black" 0 1 0 0 648.722 "PSID401" 0 17 10 "black" 0 1 0 0 1053.619 "PSID402" 0 16 8 "white" 0 1 0 0 0 "PSID403" 0 48 12 "white" 1 0 0 0 1491.026 "PSID404" 0 55 7 "white" 0 1 0 0 0 "PSID405" 0 21 15 "white" 0 0 0 0 0 "PSID406" 0 16 10 
"black" 0 1 0 0 1730.418 "PSID407" 0 23 12 "white" 0 0 0 0 3902.676 "PSID408" 0 46 11 "black" 1 1 0 0 0 "PSID409" 0 17 10 "white" 0 1 0 0 14942.77 "PSID410" 0 42 16 "white" 0 0 0 0 23764.8 "PSID411" 0 18 10 "black" 0 1 0 0 5306.516 "PSID412" 0 53 12 "black" 0 0 0 0 0 "PSID413" 0 17 10 "white" 1 1 0 0 3859.822 "PSID414" 0 17 6 "white" 0 1 0 0 0 "PSID415" 0 43 6 "white" 1 1 0 0 0 "PSID416" 0 34 12 "black" 0 0 0 0 0 "PSID417" 0 16 8 "hispan" 0 1 0 0 12242.96 "PSID418" 0 27 12 "white" 1 0 0 0 1533.88 "PSID419" 0 51 4 "black" 0 1 0 0 0 "PSID420" 0 39 2 "black" 1 1 0 0 964.9555 "PSID421" 0 55 8 "white" 1 1 0 0 0 "PSID422" 0 16 9 "white" 0 1 0 0 5551.819 "PSID423" 0 27 10 "black" 0 1 0 0 7543.794 "PSID424" 0 25 14 "white" 0 0 0 0 0 "PSID425" 0 18 11 "white" 0 1 0 0 10150.5 "PSID426" 0 24 1 "hispan" 1 1 0 0 19464.61 "PSID427" 0 21 18 "white" 0 0 0 0 0 "PSID428" 0 32 5 "black" 1 1 0 0 187.6713 "PSID429" 0 16 9 "white" 0 1 0 0 1495.459 MatchIt/src/0000755000176200001440000000000014763323604012402 5ustar liggesusersMatchIt/src/weights_matrixC.cpp0000644000176200001440000000251114714001607016236 0ustar liggesusers#include "internal.h" using namespace Rcpp; // [[Rcpp::plugins(cpp11)]] // Computes matching weights from match.matrix // [[Rcpp::export]] NumericVector weights_matrixC(const IntegerMatrix& mm, const IntegerVector& treat_, const Nullable& focal = R_NilValue) { CharacterVector lab = treat_.names(); IntegerVector unique_treat = unique(treat_); std::sort(unique_treat.begin(), unique_treat.end()); int g = unique_treat.size(); IntegerVector treat = match(treat_, unique_treat) - 1; R_xlen_t n = treat.size(); int gi; NumericVector weights(n); weights.fill(0.0); weights.names() = lab; IntegerVector row_ind; if (focal.isNotNull()) { row_ind = which(treat == as(focal)); } else { row_ind = match(as(rownames(mm)), lab) - 1; } NumericVector matches_g = rep(0.0, g); IntegerVector row_r(mm.ncol()); for (int r : which(!is_na(mm(_, 0)))) { row_r = na_omit(mm.row(r)); for (gi = 0; gi < 
g; gi++) { matches_g[gi] = 0.0; } for (int i : row_r - 1) { matches_g[treat[i]] += 1.0; } for (int i : row_r - 1) { if (matches_g[treat[i]] == 0.0) { continue; } weights[i] += 1.0/matches_g[treat[i]]; } weights[row_ind[r]] += 1.0; } return weights; } MatchIt/src/subclass_scootC.cpp0000644000176200001440000000612514714207052016235 0ustar liggesusers#include "internal.h" using namespace Rcpp; // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] IntegerVector subclass_scootC(const IntegerVector& subclass_, const IntegerVector& treat_, const NumericVector& x_, const int& min_n) { if (min_n == 0) { return subclass_; } int m, i, s, s2; int best_i; double best_x, score; R_xlen_t nt; LogicalVector na_sub = is_na(subclass_); IntegerVector subclass = subclass_[!na_sub]; IntegerVector treat = treat_[!na_sub]; NumericVector x = x_[!na_sub]; R_xlen_t n = subclass.size(); IntegerVector unique_sub = unique(subclass); std::sort(unique_sub.begin(), unique_sub.end()); subclass = match(subclass, unique_sub) - 1; R_xlen_t nsub = unique_sub.size(); NumericVector subtab(nsub); IntegerVector indt; bool left = false; IntegerVector ut = unique(treat); for (int t : ut) { indt = which(treat == t); nt = indt.size(); //Tabulate subtab.fill(0.0); for (int i : indt) { subtab[subclass[i]]++; } for (m = 0; m < min_n; m++) { while (min(subtab) <= 0) { for (s = 0; s < nsub; s++) { if (subtab[s] == 0) { break; } } //Find which way to look for new member if (s == nsub - 1) { left = true; } else if (s == 0) { left = false; } else { score = 0.; for (s2 = 0; s2 < nsub; s2++) { if (subtab[s2] <= 1) { continue; } if (s2 == s) { continue; } score += (subtab[s2] - 1) / static_cast(s2 - s); } left = (score <= 0); } //Find which subclass to take from (s2) if (left) { for (s2 = s - 1; s2 >= 0; s2--) { if (subtab[s2] > 0) { break; } } } else { for (s2 = s + 1; s2 < nsub; s2++) { if (subtab[s2] > 0) { break; } } } //Find unit with closest x in that subclass to take for (i = 0; i < nt; i++) { if (subclass[indt[i]] == 
s2) { best_i = i; best_x = x[indt[i]]; break; } } for (i = best_i + 1; i < nt; i++) { if (subclass[indt[i]] != s2) { continue; } if (left) { if (x[indt[i]] < best_x) { continue; } } else { if (x[indt[i]] > best_x) { continue; } } best_i = i; best_x = x[indt[i]]; } subclass[indt[best_i]] = s; subtab[s]++; subtab[s2]--; } for (s = 0; s < nsub; s++) { subtab[s]--; } } } for (i = 0; i < n; i++) { subclass[i] = unique_sub[subclass[i]]; } IntegerVector sub_out(subclass_.size()); sub_out.fill(NA_INTEGER); sub_out[!na_sub] = subclass; return sub_out; } MatchIt/src/RcppExports.cpp0000644000176200001440000005576114757125336015422 0ustar liggesusers// Generated by using Rcpp::compileAttributes() -> do not edit by hand // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 #include "../inst/include/MatchIt.h" #include #include #include using namespace Rcpp; #ifdef RCPP_USE_GLOBAL_ROSTREAM Rcpp::Rostream& Rcpp::Rcout = Rcpp::Rcpp_cout_get(); Rcpp::Rostream& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get(); #endif // all_equal_to bool all_equal_to(RObject x, RObject y); RcppExport SEXP _MatchIt_all_equal_to(SEXP xSEXP, SEXP ySEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< RObject >::type x(xSEXP); Rcpp::traits::input_parameter< RObject >::type y(ySEXP); rcpp_result_gen = Rcpp::wrap(all_equal_to(x, y)); return rcpp_result_gen; END_RCPP } // eucdistC_N1xN0 NumericVector eucdistC_N1xN0(const NumericMatrix& x, const IntegerVector& t); RcppExport SEXP _MatchIt_eucdistC_N1xN0(SEXP xSEXP, SEXP tSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const NumericMatrix& >::type x(xSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type t(tSEXP); rcpp_result_gen = Rcpp::wrap(eucdistC_N1xN0(x, t)); return rcpp_result_gen; END_RCPP } // get_splitsC NumericVector get_splitsC(const NumericVector& x, const double& caliper); RcppExport SEXP _MatchIt_get_splitsC(SEXP 
xSEXP, SEXP caliperSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const NumericVector& >::type x(xSEXP); Rcpp::traits::input_parameter< const double& >::type caliper(caliperSEXP); rcpp_result_gen = Rcpp::wrap(get_splitsC(x, caliper)); return rcpp_result_gen; END_RCPP } // has_n_unique bool has_n_unique(const SEXP& x, const int& n); RcppExport SEXP _MatchIt_has_n_unique(SEXP xSEXP, SEXP nSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const SEXP& >::type x(xSEXP); Rcpp::traits::input_parameter< const int& >::type n(nSEXP); rcpp_result_gen = Rcpp::wrap(has_n_unique(x, n)); return rcpp_result_gen; END_RCPP } // nn_matchC_distmat IntegerMatrix nn_matchC_distmat(const IntegerVector& treat_, const IntegerVector& ord, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const int& focal_, const NumericMatrix& distance_mat, const Nullable& exact_, const Nullable& caliper_dist_, const Nullable& caliper_covs_, const Nullable& caliper_covs_mat_, const Nullable& antiexact_covs_, const Nullable& unit_id_, const bool& disl_prog); RcppExport SEXP _MatchIt_nn_matchC_distmat(SEXP treat_SEXP, SEXP ordSEXP, SEXP ratioSEXP, SEXP discardedSEXP, SEXP reuse_maxSEXP, SEXP focal_SEXP, SEXP distance_matSEXP, SEXP exact_SEXP, SEXP caliper_dist_SEXP, SEXP caliper_covs_SEXP, SEXP caliper_covs_mat_SEXP, SEXP antiexact_covs_SEXP, SEXP unit_id_SEXP, SEXP disl_progSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type treat_(treat_SEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ord(ordSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ratio(ratioSEXP); Rcpp::traits::input_parameter< const LogicalVector& >::type discarded(discardedSEXP); Rcpp::traits::input_parameter< const int& >::type reuse_max(reuse_maxSEXP); 
Rcpp::traits::input_parameter< const int& >::type focal_(focal_SEXP); Rcpp::traits::input_parameter< const NumericMatrix& >::type distance_mat(distance_matSEXP); Rcpp::traits::input_parameter< const Nullable& >::type exact_(exact_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_dist_(caliper_dist_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_(caliper_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_mat_(caliper_covs_mat_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type antiexact_covs_(antiexact_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type unit_id_(unit_id_SEXP); Rcpp::traits::input_parameter< const bool& >::type disl_prog(disl_progSEXP); rcpp_result_gen = Rcpp::wrap(nn_matchC_distmat(treat_, ord, ratio, discarded, reuse_max, focal_, distance_mat, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, disl_prog)); return rcpp_result_gen; END_RCPP } // nn_matchC_distmat_closest IntegerMatrix nn_matchC_distmat_closest(const IntegerVector& treat, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const NumericMatrix& distance_mat, const Nullable& exact_, const Nullable& caliper_dist_, const Nullable& caliper_covs_, const Nullable& caliper_covs_mat_, const Nullable& antiexact_covs_, const Nullable& unit_id_, const bool& close, const bool& disl_prog); RcppExport SEXP _MatchIt_nn_matchC_distmat_closest(SEXP treatSEXP, SEXP ratioSEXP, SEXP discardedSEXP, SEXP reuse_maxSEXP, SEXP distance_matSEXP, SEXP exact_SEXP, SEXP caliper_dist_SEXP, SEXP caliper_covs_SEXP, SEXP caliper_covs_mat_SEXP, SEXP antiexact_covs_SEXP, SEXP unit_id_SEXP, SEXP closeSEXP, SEXP disl_progSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type treat(treatSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ratio(ratioSEXP); 
Rcpp::traits::input_parameter< const LogicalVector& >::type discarded(discardedSEXP); Rcpp::traits::input_parameter< const int& >::type reuse_max(reuse_maxSEXP); Rcpp::traits::input_parameter< const NumericMatrix& >::type distance_mat(distance_matSEXP); Rcpp::traits::input_parameter< const Nullable& >::type exact_(exact_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_dist_(caliper_dist_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_(caliper_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_mat_(caliper_covs_mat_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type antiexact_covs_(antiexact_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type unit_id_(unit_id_SEXP); Rcpp::traits::input_parameter< const bool& >::type close(closeSEXP); Rcpp::traits::input_parameter< const bool& >::type disl_prog(disl_progSEXP); rcpp_result_gen = Rcpp::wrap(nn_matchC_distmat_closest(treat, ratio, discarded, reuse_max, distance_mat, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, close, disl_prog)); return rcpp_result_gen; END_RCPP } // nn_matchC_mahcovs IntegerMatrix nn_matchC_mahcovs(const IntegerVector& treat_, const IntegerVector& ord, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const int& focal_, const NumericMatrix& mah_covs, const Nullable& distance_, const Nullable& exact_, const Nullable& caliper_dist_, const Nullable& caliper_covs_, const Nullable& caliper_covs_mat_, const Nullable& antiexact_covs_, const Nullable& unit_id_, const bool& disl_prog); RcppExport SEXP _MatchIt_nn_matchC_mahcovs(SEXP treat_SEXP, SEXP ordSEXP, SEXP ratioSEXP, SEXP discardedSEXP, SEXP reuse_maxSEXP, SEXP focal_SEXP, SEXP mah_covsSEXP, SEXP distance_SEXP, SEXP exact_SEXP, SEXP caliper_dist_SEXP, SEXP caliper_covs_SEXP, SEXP caliper_covs_mat_SEXP, SEXP antiexact_covs_SEXP, SEXP unit_id_SEXP, SEXP disl_progSEXP) { BEGIN_RCPP 
Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type treat_(treat_SEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ord(ordSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ratio(ratioSEXP); Rcpp::traits::input_parameter< const LogicalVector& >::type discarded(discardedSEXP); Rcpp::traits::input_parameter< const int& >::type reuse_max(reuse_maxSEXP); Rcpp::traits::input_parameter< const int& >::type focal_(focal_SEXP); Rcpp::traits::input_parameter< const NumericMatrix& >::type mah_covs(mah_covsSEXP); Rcpp::traits::input_parameter< const Nullable& >::type distance_(distance_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type exact_(exact_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_dist_(caliper_dist_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_(caliper_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_mat_(caliper_covs_mat_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type antiexact_covs_(antiexact_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type unit_id_(unit_id_SEXP); Rcpp::traits::input_parameter< const bool& >::type disl_prog(disl_progSEXP); rcpp_result_gen = Rcpp::wrap(nn_matchC_mahcovs(treat_, ord, ratio, discarded, reuse_max, focal_, mah_covs, distance_, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, disl_prog)); return rcpp_result_gen; END_RCPP } // nn_matchC_mahcovs_closest IntegerMatrix nn_matchC_mahcovs_closest(const IntegerVector& treat, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const NumericMatrix& mah_covs, const Nullable& distance_, const Nullable& exact_, const Nullable& caliper_dist_, const Nullable& caliper_covs_, const Nullable& caliper_covs_mat_, const Nullable& antiexact_covs_, const Nullable& unit_id_, const bool& close, const bool& 
disl_prog); RcppExport SEXP _MatchIt_nn_matchC_mahcovs_closest(SEXP treatSEXP, SEXP ratioSEXP, SEXP discardedSEXP, SEXP reuse_maxSEXP, SEXP mah_covsSEXP, SEXP distance_SEXP, SEXP exact_SEXP, SEXP caliper_dist_SEXP, SEXP caliper_covs_SEXP, SEXP caliper_covs_mat_SEXP, SEXP antiexact_covs_SEXP, SEXP unit_id_SEXP, SEXP closeSEXP, SEXP disl_progSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type treat(treatSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ratio(ratioSEXP); Rcpp::traits::input_parameter< const LogicalVector& >::type discarded(discardedSEXP); Rcpp::traits::input_parameter< const int& >::type reuse_max(reuse_maxSEXP); Rcpp::traits::input_parameter< const NumericMatrix& >::type mah_covs(mah_covsSEXP); Rcpp::traits::input_parameter< const Nullable& >::type distance_(distance_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type exact_(exact_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_dist_(caliper_dist_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_(caliper_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_mat_(caliper_covs_mat_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type antiexact_covs_(antiexact_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type unit_id_(unit_id_SEXP); Rcpp::traits::input_parameter< const bool& >::type close(closeSEXP); Rcpp::traits::input_parameter< const bool& >::type disl_prog(disl_progSEXP); rcpp_result_gen = Rcpp::wrap(nn_matchC_mahcovs_closest(treat, ratio, discarded, reuse_max, mah_covs, distance_, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, close, disl_prog)); return rcpp_result_gen; END_RCPP } // nn_matchC_vec IntegerMatrix nn_matchC_vec(const IntegerVector& treat_, const IntegerVector& ord, const IntegerVector& ratio, const LogicalVector& discarded, const int& 
reuse_max, const int& focal_, const NumericVector& distance, const Nullable& exact_, const Nullable& caliper_dist_, const Nullable& caliper_covs_, const Nullable& caliper_covs_mat_, const Nullable& antiexact_covs_, const Nullable& unit_id_, const bool& disl_prog); RcppExport SEXP _MatchIt_nn_matchC_vec(SEXP treat_SEXP, SEXP ordSEXP, SEXP ratioSEXP, SEXP discardedSEXP, SEXP reuse_maxSEXP, SEXP focal_SEXP, SEXP distanceSEXP, SEXP exact_SEXP, SEXP caliper_dist_SEXP, SEXP caliper_covs_SEXP, SEXP caliper_covs_mat_SEXP, SEXP antiexact_covs_SEXP, SEXP unit_id_SEXP, SEXP disl_progSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type treat_(treat_SEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ord(ordSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ratio(ratioSEXP); Rcpp::traits::input_parameter< const LogicalVector& >::type discarded(discardedSEXP); Rcpp::traits::input_parameter< const int& >::type reuse_max(reuse_maxSEXP); Rcpp::traits::input_parameter< const int& >::type focal_(focal_SEXP); Rcpp::traits::input_parameter< const NumericVector& >::type distance(distanceSEXP); Rcpp::traits::input_parameter< const Nullable& >::type exact_(exact_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_dist_(caliper_dist_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_(caliper_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_mat_(caliper_covs_mat_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type antiexact_covs_(antiexact_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type unit_id_(unit_id_SEXP); Rcpp::traits::input_parameter< const bool& >::type disl_prog(disl_progSEXP); rcpp_result_gen = Rcpp::wrap(nn_matchC_vec(treat_, ord, ratio, discarded, reuse_max, focal_, distance, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, 
disl_prog)); return rcpp_result_gen; END_RCPP } // nn_matchC_vec_closest IntegerMatrix nn_matchC_vec_closest(const IntegerVector& treat, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const NumericVector& distance, const Nullable& exact_, const Nullable& caliper_dist_, const Nullable& caliper_covs_, const Nullable& caliper_covs_mat_, const Nullable& antiexact_covs_, const Nullable& unit_id_, const bool& close, const bool& disl_prog); RcppExport SEXP _MatchIt_nn_matchC_vec_closest(SEXP treatSEXP, SEXP ratioSEXP, SEXP discardedSEXP, SEXP reuse_maxSEXP, SEXP distanceSEXP, SEXP exact_SEXP, SEXP caliper_dist_SEXP, SEXP caliper_covs_SEXP, SEXP caliper_covs_mat_SEXP, SEXP antiexact_covs_SEXP, SEXP unit_id_SEXP, SEXP closeSEXP, SEXP disl_progSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type treat(treatSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type ratio(ratioSEXP); Rcpp::traits::input_parameter< const LogicalVector& >::type discarded(discardedSEXP); Rcpp::traits::input_parameter< const int& >::type reuse_max(reuse_maxSEXP); Rcpp::traits::input_parameter< const NumericVector& >::type distance(distanceSEXP); Rcpp::traits::input_parameter< const Nullable& >::type exact_(exact_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_dist_(caliper_dist_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_(caliper_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type caliper_covs_mat_(caliper_covs_mat_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type antiexact_covs_(antiexact_covs_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type unit_id_(unit_id_SEXP); Rcpp::traits::input_parameter< const bool& >::type close(closeSEXP); Rcpp::traits::input_parameter< const bool& >::type disl_prog(disl_progSEXP); rcpp_result_gen = Rcpp::wrap(nn_matchC_vec_closest(treat, ratio, 
discarded, reuse_max, distance, exact_, caliper_dist_, caliper_covs_, caliper_covs_mat_, antiexact_covs_, unit_id_, close, disl_prog)); return rcpp_result_gen; END_RCPP } // pairdistsubC double pairdistsubC(const NumericVector& x, const IntegerVector& t, const IntegerVector& s); RcppExport SEXP _MatchIt_pairdistsubC(SEXP xSEXP, SEXP tSEXP, SEXP sSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const NumericVector& >::type x(xSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type t(tSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type s(sSEXP); rcpp_result_gen = Rcpp::wrap(pairdistsubC(x, t, s)); return rcpp_result_gen; END_RCPP } // preprocess_matchC IntegerVector preprocess_matchC(IntegerVector t, NumericVector p); RcppExport SEXP _MatchIt_preprocess_matchC(SEXP tSEXP, SEXP pSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerVector >::type t(tSEXP); Rcpp::traits::input_parameter< NumericVector >::type p(pSEXP); rcpp_result_gen = Rcpp::wrap(preprocess_matchC(t, p)); return rcpp_result_gen; END_RCPP } // subclass2mmC IntegerMatrix subclass2mmC(const IntegerVector& subclass_, const IntegerVector& treat, const int& focal); RcppExport SEXP _MatchIt_subclass2mmC(SEXP subclass_SEXP, SEXP treatSEXP, SEXP focalSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type subclass_(subclass_SEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type treat(treatSEXP); Rcpp::traits::input_parameter< const int& >::type focal(focalSEXP); rcpp_result_gen = Rcpp::wrap(subclass2mmC(subclass_, treat, focal)); return rcpp_result_gen; END_RCPP } // mm2subclassC IntegerVector mm2subclassC(const IntegerMatrix& mm, const IntegerVector& treat, const Nullable& focal); RcppExport SEXP _MatchIt_mm2subclassC(SEXP mmSEXP, SEXP treatSEXP, SEXP 
focalSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerMatrix& >::type mm(mmSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type treat(treatSEXP); Rcpp::traits::input_parameter< const Nullable& >::type focal(focalSEXP); rcpp_result_gen = Rcpp::wrap(mm2subclassC(mm, treat, focal)); return rcpp_result_gen; END_RCPP } // subclass_scootC IntegerVector subclass_scootC(const IntegerVector& subclass_, const IntegerVector& treat_, const NumericVector& x_, const int& min_n); RcppExport SEXP _MatchIt_subclass_scootC(SEXP subclass_SEXP, SEXP treat_SEXP, SEXP x_SEXP, SEXP min_nSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type subclass_(subclass_SEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type treat_(treat_SEXP); Rcpp::traits::input_parameter< const NumericVector& >::type x_(x_SEXP); Rcpp::traits::input_parameter< const int& >::type min_n(min_nSEXP); rcpp_result_gen = Rcpp::wrap(subclass_scootC(subclass_, treat_, x_, min_n)); return rcpp_result_gen; END_RCPP } // tabulateC IntegerVector tabulateC(const IntegerVector& bins, const Nullable& nbins); RcppExport SEXP _MatchIt_tabulateC(SEXP binsSEXP, SEXP nbinsSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerVector& >::type bins(binsSEXP); Rcpp::traits::input_parameter< const Nullable& >::type nbins(nbinsSEXP); rcpp_result_gen = Rcpp::wrap(tabulateC(bins, nbins)); return rcpp_result_gen; END_RCPP } // weights_matrixC NumericVector weights_matrixC(const IntegerMatrix& mm, const IntegerVector& treat_, const Nullable& focal); RcppExport SEXP _MatchIt_weights_matrixC(SEXP mmSEXP, SEXP treat_SEXP, SEXP focalSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const IntegerMatrix& >::type mm(mmSEXP); 
Rcpp::traits::input_parameter< const IntegerVector& >::type treat_(treat_SEXP); Rcpp::traits::input_parameter< const Nullable& >::type focal(focalSEXP); rcpp_result_gen = Rcpp::wrap(weights_matrixC(mm, treat_, focal)); return rcpp_result_gen; END_RCPP } // validate (ensure exported C++ functions exist before calling them) static int _MatchIt_RcppExport_validate(const char* sig) { static std::set signatures; if (signatures.empty()) { } return signatures.find(sig) != signatures.end(); } // registerCCallable (register entry points for exported C++ functions) RcppExport SEXP _MatchIt_RcppExport_registerCCallable() { R_RegisterCCallable("MatchIt", "_MatchIt_RcppExport_validate", (DL_FUNC)_MatchIt_RcppExport_validate); return R_NilValue; } static const R_CallMethodDef CallEntries[] = { {"_MatchIt_all_equal_to", (DL_FUNC) &_MatchIt_all_equal_to, 2}, {"_MatchIt_eucdistC_N1xN0", (DL_FUNC) &_MatchIt_eucdistC_N1xN0, 2}, {"_MatchIt_get_splitsC", (DL_FUNC) &_MatchIt_get_splitsC, 2}, {"_MatchIt_has_n_unique", (DL_FUNC) &_MatchIt_has_n_unique, 2}, {"_MatchIt_nn_matchC_distmat", (DL_FUNC) &_MatchIt_nn_matchC_distmat, 14}, {"_MatchIt_nn_matchC_distmat_closest", (DL_FUNC) &_MatchIt_nn_matchC_distmat_closest, 13}, {"_MatchIt_nn_matchC_mahcovs", (DL_FUNC) &_MatchIt_nn_matchC_mahcovs, 15}, {"_MatchIt_nn_matchC_mahcovs_closest", (DL_FUNC) &_MatchIt_nn_matchC_mahcovs_closest, 14}, {"_MatchIt_nn_matchC_vec", (DL_FUNC) &_MatchIt_nn_matchC_vec, 14}, {"_MatchIt_nn_matchC_vec_closest", (DL_FUNC) &_MatchIt_nn_matchC_vec_closest, 13}, {"_MatchIt_pairdistsubC", (DL_FUNC) &_MatchIt_pairdistsubC, 3}, {"_MatchIt_preprocess_matchC", (DL_FUNC) &_MatchIt_preprocess_matchC, 2}, {"_MatchIt_subclass2mmC", (DL_FUNC) &_MatchIt_subclass2mmC, 3}, {"_MatchIt_mm2subclassC", (DL_FUNC) &_MatchIt_mm2subclassC, 3}, {"_MatchIt_subclass_scootC", (DL_FUNC) &_MatchIt_subclass_scootC, 4}, {"_MatchIt_tabulateC", (DL_FUNC) &_MatchIt_tabulateC, 2}, {"_MatchIt_weights_matrixC", (DL_FUNC) &_MatchIt_weights_matrixC, 3}, 
    {"_MatchIt_RcppExport_registerCCallable", (DL_FUNC) &_MatchIt_RcppExport_registerCCallable, 0},
    {NULL, NULL, 0}
};

RcppExport void R_init_MatchIt(DllInfo *dll) {
    R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);
}
MatchIt/src/nn_matchC_vec.cpp0000644000176200001440000002147014757124450015642 0ustar liggesusers// [[Rcpp::depends(RcppProgress)]]
#include "eta_progress_bar.h"
#include "internal.h"

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
IntegerMatrix nn_matchC_vec(const IntegerVector& treat_,
                            const IntegerVector& ord,
                            const IntegerVector& ratio,
                            const LogicalVector& discarded,
                            const int& reuse_max,
                            const int& focal_,
                            const NumericVector& distance,
                            const Nullable<IntegerVector>& exact_ = R_NilValue,
                            const Nullable<double>& caliper_dist_ = R_NilValue,
                            const Nullable<NumericVector>& caliper_covs_ = R_NilValue,
                            const Nullable<NumericMatrix>& caliper_covs_mat_ = R_NilValue,
                            const Nullable<IntegerMatrix>& antiexact_covs_ = R_NilValue,
                            const Nullable<IntegerVector>& unit_id_ = R_NilValue,
                            const bool& disl_prog = false) {

  IntegerVector unique_treat = unique(treat_);
  std::sort(unique_treat.begin(), unique_treat.end());
  int g = unique_treat.size();
  IntegerVector treat = match(treat_, unique_treat) - 1;

  int focal;
  for (focal = 0; focal < g; focal++) {
    if (unique_treat[focal] == focal_) {
      break;
    }
  }

  R_xlen_t n = treat.size();
  IntegerVector ind = Range(0, n - 1);

  R_xlen_t i;
  int gi;
  IntegerVector indt(n);
  IntegerVector indt_sep(g + 1);
  IntegerVector indt_tmp;
  IntegerVector nt(g);
  IntegerVector ind_match(n);
  ind_match.fill(NA_INTEGER);

  LogicalVector eligible = !discarded;

  IntegerVector g_c = Range(0, g - 1);
  g_c = g_c[g_c != focal];

  IntegerVector n_eligible(g);
  for (i = 0; i < n; i++) {
    nt[treat[i]]++;
    if (eligible[i]) {
      n_eligible[treat[i]]++;
    }
  }

  int nf = nt[focal];

  indt_sep[0] = 0;
  for (gi = 0; gi < g; gi++) {
    indt_sep[gi + 1] = indt_sep[gi] + nt[gi];
    indt_tmp = ind[treat == gi];
    for (i = 0; i < nt[gi]; i++) {
      indt[indt_sep[gi] + i] = indt_tmp[i];
      ind_match[indt_tmp[i]] = i;
    }
  }

  IntegerVector
ind_focal = indt[Range(indt_sep[focal], indt_sep[focal + 1] - 1)];

  std::vector<int> times_matched(n, 0);
  std::vector<int> times_matched_allowed(n, reuse_max);
  for (i = 0; i < nf; i++) {
    times_matched_allowed[ind_focal[i]] = ratio[i];
  }

  int max_ratio = max(ratio);

  // Output matrix with sample indices of control units
  IntegerMatrix mm(nf, max_ratio);
  mm.fill(NA_INTEGER);
  CharacterVector lab = treat_.names();

  // Use base::order() because it is faster than an Rcpp implementation of order()
  Function o("order");

  IntegerVector ind_d_ord = o(distance);
  ind_d_ord = ind_d_ord - 1; // location of each unit after sorting

  IntegerVector match_d_ord = o(ind_d_ord);
  match_d_ord = match_d_ord - 1;

  IntegerVector last_control(g);
  last_control.fill(n - 1);
  IntegerVector first_control(g);
  first_control.fill(0);

  // exact
  bool use_exact = false;
  IntegerVector exact;
  if (exact_.isNotNull()) {
    exact = as<IntegerVector>(exact_);
    use_exact = true;
  }

  // caliper_dist
  double caliper_dist;
  if (caliper_dist_.isNotNull()) {
    caliper_dist = as<double>(caliper_dist_);
  }
  else {
    caliper_dist = max_finite(distance) - min_finite(distance) + 1;
  }

  // caliper_covs
  NumericVector caliper_covs;
  NumericMatrix caliper_covs_mat;
  int ncc = 0;
  if (caliper_covs_.isNotNull()) {
    caliper_covs = as<NumericVector>(caliper_covs_);
    caliper_covs_mat = as<NumericMatrix>(caliper_covs_mat_);
    ncc = caliper_covs_mat.ncol();
  }

  // antiexact
  IntegerMatrix antiexact_covs;
  int aenc = 0;
  if (antiexact_covs_.isNotNull()) {
    antiexact_covs = as<IntegerMatrix>(antiexact_covs_);
    aenc = antiexact_covs.ncol();
  }

  // reuse_max
  bool use_reuse_max = (reuse_max < nf);

  // unit_id
  IntegerVector unit_id;
  bool use_unit_id = false;
  if (unit_id_.isNotNull()) {
    unit_id = as<IntegerVector>(unit_id_);
    use_unit_id = true;
    use_reuse_max = true;
  }

  IntegerVector matches_i(1 + max_ratio * (g - 1));
  int k_total;

  // progress bar
  int prog_length;
  if (use_reuse_max) prog_length = sum(ratio) + 1;
  else prog_length = nf + 1;
  ETAProgressBar pb;
  Progress p(prog_length, disl_prog, pb);

  R_xlen_t c;
  int r, t_id_i;
  IntegerVector ck_;
  std::vector<int> k(max_ratio);

  int counter = 0;

  if
(use_reuse_max) {
    IntegerVector ord_r(nf);

    for (r = 1; r <= max_ratio; r++) {
      ord_r = ord[as<IntegerVector>(ratio[ord - 1]) >= r];

      for (int t_id_t_i : ord_r - 1) {
        // t_id_t_i: index of treated unit to match among treated units
        // t_id_i:   index of treated unit to match among all units
        counter++;
        if (counter == 200) {
          counter = 0;
          Rcpp::checkUserInterrupt();
        }

        if (max(as<IntegerVector>(n_eligible[g_c])) == 0) {
          break;
        }

        t_id_i = ind_focal[t_id_t_i];

        p.increment();

        if (!eligible[t_id_i]) {
          continue;
        }

        k_total = 0;

        for (int gi : g_c) {
          update_first_and_last_control(first_control, last_control, ind_d_ord, eligible, treat, gi);

          k = find_control_vec(t_id_i, ind_d_ord, match_d_ord, treat, distance, eligible, gi, r,
                               mm.row(t_id_t_i), ncc, caliper_covs_mat, caliper_covs, caliper_dist,
                               use_exact, exact, aenc, antiexact_covs, first_control, last_control);

          if (k.empty()) {
            if (r == 1) {
              k_total = 0;
              break;
            }
            continue;
          }

          matches_i[k_total] = k[0];
          k_total++;
        }

        if (k_total == 0) {
          eligible[t_id_i] = false;
          n_eligible[focal]--;
          continue;
        }

        for (c = 0; c < k_total; c++) {
          mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = matches_i[c];
        }

        matches_i[k_total] = t_id_i;
        ck_ = matches_i[Range(0, k_total)];

        if (use_unit_id) {
          ck_ = which(!is_na(match(unit_id, as<IntegerVector>(unit_id[ck_]))));
        }

        for (int ck : ck_) {
          if (!eligible[ck]) {
            continue;
          }

          times_matched[ck]++;
          if (times_matched[ck] >= times_matched_allowed[ck]) {
            eligible[ck] = false;
            n_eligible[treat[ck]]--;
          }
        }
      }
    }
  }
  else {
    for (int t_id_t_i : ord - 1) {
      // t_id_t_i: index of treated unit to match among treated units
      // t_id_i:   index of treated unit to match among all units
      counter++;
      if (counter == 200) {
        counter = 0;
        Rcpp::checkUserInterrupt();
      }

      t_id_i = ind_focal[t_id_t_i];

      p.increment();

      if (!eligible[t_id_i]) {
        continue;
      }

      k_total = 0;

      for (int gi : g_c) {
        k = find_control_vec(t_id_i, ind_d_ord, match_d_ord, treat, distance, eligible, gi, 1,
                             mm.row(t_id_t_i), ncc, caliper_covs_mat, caliper_covs, caliper_dist,
                             use_exact, exact, aenc, antiexact_covs, first_control,
last_control, ratio[t_id_t_i]); if (k.empty()) { k_total = 0; break; } for (int cc : k) { matches_i[k_total] = cc; k_total++; } } if (k_total == 0) { continue; } for (c = 0; c < k_total; c++) { mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = matches_i[c]; } } } p.update(prog_length); mm = mm + 1; rownames(mm) = as(lab[ind_focal]); return mm; } MatchIt/src/nn_matchC_distmat_closest.cpp0000644000176200001440000001755214757124473020301 0ustar liggesusers// [[Rcpp::depends(RcppProgress)]] #include "eta_progress_bar.h" #include "internal.h" using namespace Rcpp; // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] IntegerMatrix nn_matchC_distmat_closest(const IntegerVector& treat, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const NumericMatrix& distance_mat, const Nullable& exact_ = R_NilValue, const Nullable& caliper_dist_ = R_NilValue, const Nullable& caliper_covs_ = R_NilValue, const Nullable& caliper_covs_mat_ = R_NilValue, const Nullable& antiexact_covs_ = R_NilValue, const Nullable& unit_id_ = R_NilValue, const bool& close = true, const bool& disl_prog = false) { IntegerVector unique_treat = {0, 1}; int g = unique_treat.size(); int focal = 1; R_xlen_t n = treat.size(); IntegerVector ind = Range(0, n - 1); R_xlen_t i; int gi; IntegerVector indt(n); IntegerVector indt_sep(g + 1); IntegerVector indt_tmp; IntegerVector nt(g); IntegerVector ind_match(n); ind_match.fill(NA_INTEGER); IntegerVector times_matched(n); times_matched.fill(0); LogicalVector eligible = !discarded; for (gi = 0; gi < g; gi++) { nt[gi] = std::count(treat.begin(), treat.end(), gi); } int nf = nt[focal]; indt_sep[0] = 0; for (gi = 0; gi < g; gi++) { indt_sep[gi + 1] = indt_sep[gi] + nt[gi]; indt_tmp = ind[treat == gi]; for (i = 0; i < nt[gi]; i++) { indt[indt_sep[gi] + i] = indt_tmp[i]; } } IntegerVector ind_non_focal = which(treat != focal); IntegerVector ind_focal = which(treat == focal); for (i = 0; i < n - nf; i++) { ind_match[ind_non_focal[i]] = i; } for (i = 0; i 
< nf; i++) { ind_match[ind_focal[i]] = i; } IntegerVector times_matched_allowed(n); times_matched_allowed.fill(reuse_max); times_matched_allowed[ind_focal] = ratio; IntegerVector n_eligible(g); for (i = 0; i < n; i++) { if (eligible[i]) { n_eligible[treat[i]]++; } } int max_ratio = max(ratio); // Output matrix with sample indices of control units IntegerMatrix mm(nf, max_ratio); mm.fill(NA_INTEGER); CharacterVector lab = treat.names(); Function o("order"); //exact bool use_exact = false; IntegerVector exact; if (exact_.isNotNull()) { exact = as(exact_); use_exact = true; } //caliper_dist double caliper_dist; if (caliper_dist_.isNotNull()) { caliper_dist = as(caliper_dist_); } else { caliper_dist = max_finite(distance_mat) + .1; } //caliper_covs NumericVector caliper_covs; NumericMatrix caliper_covs_mat; int ncc = 0; if (caliper_covs_.isNotNull()) { caliper_covs = as(caliper_covs_); caliper_covs_mat = as(caliper_covs_mat_); ncc = caliper_covs_mat.ncol(); } //antiexact IntegerMatrix antiexact_covs; int aenc = 0; if (antiexact_covs_.isNotNull()) { antiexact_covs = as(antiexact_covs_); aenc = antiexact_covs.ncol(); } //unit_id IntegerVector unit_id; bool use_unit_id = false; if (unit_id_.isNotNull()) { unit_id = as(unit_id_); use_unit_id = true; } //storing closeness std::vector t_id, c_id; std::vector dist; t_id.reserve(n_eligible[focal]); c_id.reserve(n_eligible[focal]); dist.reserve(n_eligible[focal]); //progress bar R_xlen_t prog_length = n_eligible[focal] + sum(ratio) + 1; ETAProgressBar pb; Progress p(prog_length, disl_prog, pb); gi = 0; IntegerVector ck_; int c_id_i, t_id_t_i, t_id_i; int counter = 0; int r = 1; IntegerVector heap_ord(n_eligible[focal]); std::vector k; k.reserve(1); R_xlen_t hi; IntegerVector::iterator ci; std::function cmp; if (close) { cmp = [&dist](const int& a, const int& b) {return dist[a] < dist[b];}; } else { cmp = [&dist](const int& a, const int& b) {return dist[a] >= dist[b];}; } for (r = 1; r <= max_ratio; r++) { for (int ti : 
ind_focal) { if (!eligible[ti]) { continue; } counter++; if (counter == 200) { counter = 0; Rcpp::checkUserInterrupt(); } t_id_t_i = ind_match[ti]; k = find_control_mat(ti, treat, ind_non_focal, distance_mat.row(t_id_t_i), eligible, gi, r, mm.row(t_id_t_i), ncc, caliper_covs_mat, caliper_covs, caliper_dist, use_exact, exact, aenc, antiexact_covs); p.increment(); if (k.empty()) { eligible[ti] = false; n_eligible[focal]--; continue; } t_id.push_back(ti); c_id.push_back(k[0]); dist.push_back(distance_mat(t_id_t_i, ind_match[k[0]])); } nf = dist.size(); //Order the list heap_ord = o(dist, _["decreasing"] = !close); heap_ord = heap_ord - 1; i = 0; while (min(n_eligible) > 0 && i < nf) { counter++; if (counter == 200) { counter = 0; Rcpp::checkUserInterrupt(); } hi = heap_ord[i]; t_id_i = t_id[hi]; if (!eligible[t_id_i]) { i++; continue; } t_id_t_i = ind_match[t_id_i]; c_id_i = c_id[hi]; if (!eligible[c_id_i]) { // If control isn't eligible, find new control and try again k = find_control_mat(t_id_i, treat, ind_non_focal, distance_mat.row(t_id_t_i), eligible, gi, r, mm.row(t_id_t_i), ncc, caliper_covs_mat, caliper_covs, caliper_dist, use_exact, exact, aenc, antiexact_covs); //If no matches... 
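// Editor's note (illustrative sketch, not part of the original source): the
// re-insertion just below keeps the pair list ordered by pair distance without
// rebuilding it. The same std::lower_bound + std::rotate idiom on a plain
// vector of indices (hypothetical names):
//
//   std::vector<int> ord = /* indices sorted by dist[.] under cmp */;
//   // after dist[ord[i]] changes, locate its new slot among the later entries
//   auto pos = std::lower_bound(ord.begin() + i, ord.end(), ord[i], cmp);
//   // shift the stale entry into place, preserving the order of the rest
//   std::rotate(ord.begin() + i, ord.begin() + i + 1, pos);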
if (k.empty()) { eligible[t_id_i] = false; n_eligible[focal]--; continue; } c_id[hi] = k[0]; dist[hi] = distance_mat(t_id_t_i, ind_match[k[0]]); // Find new position of pair in heap ci = std::lower_bound(heap_ord.begin() + i, heap_ord.end(), hi, cmp); if (ci != heap_ord.begin() + i) { std::rotate(heap_ord.begin() + i, heap_ord.begin() + i + 1, ci); } continue; } mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = c_id_i; ck_ = {c_id_i, t_id_i}; if (use_unit_id) { ck_ = which(!is_na(match(unit_id, as(unit_id[ck_])))); } for (int ck : ck_) { if (!eligible[ck]) { continue; } times_matched[ck]++; if (times_matched[ck] >= times_matched_allowed[ck]) { eligible[ck] = false; n_eligible[treat[ck]]--; } } p.increment(); i++; } t_id.clear(); c_id.clear(); dist.clear(); } p.update(prog_length); mm = mm + 1; rownames(mm) = as(lab[ind_focal]); return mm; } MatchIt/src/Makevars.win0000644000176200001440000000016114375442615014673 0ustar liggesusersPKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) MatchIt/src/nn_matchC_mahcovs_closest.cpp0000644000176200001440000002220014757124461020253 0ustar liggesusers// [[Rcpp::depends(RcppProgress)]] #include "eta_progress_bar.h" #include "internal.h" using namespace Rcpp; // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] IntegerMatrix nn_matchC_mahcovs_closest(const IntegerVector& treat, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const NumericMatrix& mah_covs, const Nullable& distance_ = R_NilValue, const Nullable& exact_ = R_NilValue, const Nullable& caliper_dist_ = R_NilValue, const Nullable& caliper_covs_ = R_NilValue, const Nullable& caliper_covs_mat_ = R_NilValue, const Nullable& antiexact_covs_ = R_NilValue, const Nullable& unit_id_ = R_NilValue, const bool& close = true, const bool& disl_prog = false) { IntegerVector unique_treat = {0, 1}; int g = unique_treat.size(); int focal = 1; R_xlen_t n = treat.size(); IntegerVector ind = Range(0, n - 
1); R_xlen_t i; int gi; IntegerVector indt(n); IntegerVector indt_sep(g + 1); IntegerVector indt_tmp; IntegerVector nt(g); IntegerVector ind_match(n); ind_match.fill(NA_INTEGER); LogicalVector eligible = !discarded; IntegerVector g_c = Range(0, g - 1); g_c = g_c[g_c != focal]; IntegerVector n_eligible(g); for (i = 0; i < n; i++) { nt[treat[i]]++; if (eligible[i]) { n_eligible[treat[i]]++; } } int nf = nt[focal]; indt_sep[0] = 0; for (gi = 0; gi < g; gi++) { indt_sep[gi + 1] = indt_sep[gi] + nt[gi]; indt_tmp = ind[treat == gi]; for (i = 0; i < nt[gi]; i++) { indt[indt_sep[gi] + i] = indt_tmp[i]; ind_match[indt_tmp[i]] = i; } } IntegerVector ind_focal = indt[Range(indt_sep[focal], indt_sep[focal + 1] - 1)]; std::vector times_matched(n, 0); std::vector times_matched_allowed(n, reuse_max); for (i = 0; i < nf; i++) { times_matched_allowed[ind_focal[i]] = ratio[i]; } int max_ratio = max(ratio); // Output matrix with sample indices of control units IntegerMatrix mm(nf, max_ratio); mm.fill(NA_INTEGER); CharacterVector lab = treat.names(); Function o("order"); NumericVector match_var = mah_covs.column(0); double match_var_caliper = R_PosInf; IntegerVector ind_d_ord = o(match_var); ind_d_ord = ind_d_ord - 1; //location of each unit after sorting IntegerVector match_d_ord = o(ind_d_ord); match_d_ord = match_d_ord - 1; //exact bool use_exact = false; IntegerVector exact; if (exact_.isNotNull()) { exact = as(exact_); use_exact = true; } //distance & caliper_dist bool use_caliper_dist = false; double caliper_dist; NumericVector distance; if (caliper_dist_.isNotNull() && distance_.isNotNull()) { distance = as(distance_); caliper_dist = as(caliper_dist_); use_caliper_dist = true; } //caliper_covs NumericVector caliper_covs; NumericMatrix caliper_covs_mat; int ncc = 0; if (caliper_covs_.isNotNull()) { caliper_covs = as(caliper_covs_); caliper_covs_mat = as(caliper_covs_mat_); ncc = caliper_covs_mat.ncol(); //Find if caliper placed on match_var for (int cci = 0; cci < ncc; cci++) { 
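        // Editor's note (illustrative sketch, not part of the original source):
        // the check just below uses the four-iterator std::equal to test whether
        // the cci-th caliper column is exactly the match_var column. With made-up
        // values the comparison behaves like:
        //
        //   NumericVector a = NumericVector::create(1.0, 2.0, 3.0);
        //   NumericVector b = NumericVector::create(1.0, 2.0, 3.0);
        //   bool same = std::equal(a.begin(), a.end(), b.begin(), b.end()); // true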
if (std::equal(caliper_covs_mat.column(cci).begin(), caliper_covs_mat.column(cci).end(), match_var.begin(), match_var.end())) { match_var_caliper = caliper_covs[cci]; break; } } } //antiexact IntegerMatrix antiexact_covs; int aenc = 0; if (antiexact_covs_.isNotNull()) { antiexact_covs = as(antiexact_covs_); aenc = antiexact_covs.ncol(); } //unit_id IntegerVector unit_id; bool use_unit_id = false; if (unit_id_.isNotNull()) { unit_id = as(unit_id_); use_unit_id = true; } //storing closeness std::vector t_id, c_id; std::vector dist; t_id.reserve(n_eligible[focal]); c_id.reserve(n_eligible[focal]); dist.reserve(n_eligible[focal]); //progress bar R_xlen_t prog_length = n_eligible[focal] + sum(ratio) + 1; ETAProgressBar pb; Progress p(prog_length, disl_prog, pb); gi = 0; IntegerVector ck_; int c_id_i, t_id_t_i, t_id_i; int counter = 0; int r = 1; IntegerVector heap_ord; std::vector k; k.reserve(1); R_xlen_t hi; std::function cmp; if (close) { cmp = [&dist](const int& a, const int& b) {return dist[a] < dist[b];}; } else { cmp = [&dist](const int& a, const int& b) {return dist[a] >= dist[b];}; } IntegerVector::iterator ci; for (r = 1; r <= max_ratio; r++) { //Find closest control unit to each treated unit for (int ti : ind_focal) { if (!eligible[ti]) { continue; } counter++; if (counter == 200) { counter = 0; Rcpp::checkUserInterrupt(); } t_id_t_i = ind_match[ti]; k = find_control_mahcovs(ti, ind_d_ord, match_d_ord, match_var, match_var_caliper, treat, distance, eligible, gi, r, mm.row(t_id_t_i), mah_covs, ncc, caliper_covs_mat, caliper_covs, use_caliper_dist, caliper_dist, use_exact, exact, aenc, antiexact_covs); p.increment(); if (k.empty()) { eligible[ti] = false; n_eligible[focal]--; continue; } t_id.push_back(ti); c_id.push_back(k[0]); dist.push_back(sum(pow(mah_covs.row(ti) - mah_covs.row(k[0]), 2.0))); } nf = dist.size(); //Order the list heap_ord = o(dist, _["decreasing"] = !close); heap_ord = heap_ord - 1; i = 0; while (min(n_eligible) > 0 && i < nf) { counter++; 
if (counter == 200) { counter = 0; Rcpp::checkUserInterrupt(); } hi = heap_ord[i]; t_id_i = t_id[hi]; if (!eligible[t_id_i]) { i++; continue; } t_id_t_i = ind_match[t_id_i]; c_id_i = c_id[hi]; if (!eligible[c_id_i]) { // If control isn't eligible, find new control and try again k = find_control_mahcovs(t_id_i, ind_d_ord, match_d_ord, match_var, match_var_caliper, treat, distance, eligible, gi, r, mm.row(t_id_t_i), mah_covs, ncc, caliper_covs_mat, caliper_covs, use_caliper_dist, caliper_dist, use_exact, exact, aenc, antiexact_covs); //If no matches... if (k.empty()) { eligible[t_id_i] = false; n_eligible[focal]--; continue; } c_id[hi] = k[0]; dist[hi] = sum(pow(mah_covs.row(t_id_i) - mah_covs.row(k[0]), 2.0)); // Find new position of pair in heap ci = std::lower_bound(heap_ord.begin() + i, heap_ord.end(), hi, cmp); if (ci != heap_ord.begin() + i) { std::rotate(heap_ord.begin() + i, heap_ord.begin() + i + 1, ci); } continue; } mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = c_id_i; ck_ = {c_id_i, t_id_i}; if (use_unit_id) { ck_ = which(!is_na(match(unit_id, as(unit_id[ck_])))); } for (int ck : ck_) { if (!eligible[ck]) { continue; } times_matched[ck]++; if (times_matched[ck] >= times_matched_allowed[ck]) { eligible[ck] = false; n_eligible[treat[ck]]--; } } p.increment(); i++; } t_id.clear(); c_id.clear(); dist.clear(); } p.update(prog_length); mm = mm + 1; rownames(mm) = as(lab[ind_focal]); return mm; } MatchIt/src/nn_matchC_distmat.cpp0000644000176200001440000002020514720773323016524 0ustar liggesusers// [[Rcpp::depends(RcppProgress)]] #include "eta_progress_bar.h" #include "internal.h" using namespace Rcpp; // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] IntegerMatrix nn_matchC_distmat(const IntegerVector& treat_, const IntegerVector& ord, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const int& focal_, const NumericMatrix& distance_mat, const Nullable& exact_ = R_NilValue, const Nullable& caliper_dist_ = R_NilValue, const Nullable& 
caliper_covs_ = R_NilValue, const Nullable& caliper_covs_mat_ = R_NilValue, const Nullable& antiexact_covs_ = R_NilValue, const Nullable& unit_id_ = R_NilValue, const bool& disl_prog = false) { IntegerVector unique_treat = unique(treat_); std::sort(unique_treat.begin(), unique_treat.end()); int g = unique_treat.size(); IntegerVector treat = match(treat_, unique_treat) - 1; int focal; for (focal = 0; focal < g; focal++) { if (unique_treat[focal] == focal_) { break; } } R_xlen_t n = treat.size(); IntegerVector ind = Range(0, n - 1); R_xlen_t i; int gi; IntegerVector indt(n); IntegerVector indt_sep(g + 1); IntegerVector indt_tmp; IntegerVector nt(g); IntegerVector ind_match(n); ind_match.fill(NA_INTEGER); LogicalVector eligible = !discarded; IntegerVector g_c = Range(0, g - 1); g_c = g_c[g_c != focal]; IntegerVector n_eligible(g); for (i = 0; i < n; i++) { nt[treat[i]]++; if (eligible[i]) { n_eligible[treat[i]]++; } } int nf = nt[focal]; indt_sep[0] = 0; for (gi = 0; gi < g; gi++) { indt_sep[gi + 1] = indt_sep[gi] + nt[gi]; indt_tmp = ind[treat == gi]; for (i = 0; i < nt[gi]; i++) { indt[indt_sep[gi] + i] = indt_tmp[i]; } } IntegerVector ind_focal = indt[Range(indt_sep[focal], indt_sep[focal + 1] - 1)]; std::vector times_matched(n, 0); std::vector times_matched_allowed(n, reuse_max); for (i = 0; i < nf; i++) { times_matched_allowed[ind_focal[i]] = ratio[i]; } int max_ratio = max(ratio); IntegerVector ind_non_focal = which(treat != focal); for (i = 0; i < n - nf; i++) { ind_match[ind_non_focal[i]] = i; } for (i = 0; i < nf; i++) { ind_match[ind_focal[i]] = i; } // Output matrix with sample indices of control units IntegerMatrix mm(nf, max_ratio); mm.fill(NA_INTEGER); CharacterVector lab = treat_.names(); //exact bool use_exact = false; IntegerVector exact; if (exact_.isNotNull()) { exact = as(exact_); use_exact = true; } //caliper_dist double caliper_dist; if (caliper_dist_.isNotNull()) { caliper_dist = as(caliper_dist_); } else { caliper_dist = 
max_finite(distance_mat) + .1; } //caliper_covs NumericVector caliper_covs; NumericMatrix caliper_covs_mat; int ncc = 0; if (caliper_covs_.isNotNull()) { caliper_covs = as(caliper_covs_); caliper_covs_mat = as(caliper_covs_mat_); ncc = caliper_covs_mat.ncol(); } //antiexact IntegerMatrix antiexact_covs; int aenc = 0; if (antiexact_covs_.isNotNull()) { antiexact_covs = as(antiexact_covs_); aenc = antiexact_covs.ncol(); } //reuse_max bool use_reuse_max = (reuse_max < nf); //unit_id IntegerVector unit_id; bool use_unit_id = false; if (unit_id_.isNotNull()) { unit_id = as(unit_id_); use_unit_id = true; use_reuse_max = true; } IntegerVector matches_i(1 + max_ratio * (g - 1)); int k_total; //progress bar int prog_length; if (use_reuse_max) prog_length = sum(ratio) + 1; else prog_length = nf + 1; ETAProgressBar pb; Progress p(prog_length, disl_prog, pb); R_xlen_t c; int r, t_id_i; IntegerVector ck_; std::vector k(max_ratio); int counter = 0; if (use_reuse_max) { IntegerVector ord_r(nf); for (r = 1; r <= max_ratio; r++) { ord_r = ord[as(ratio[ord - 1]) >= r]; for (int t_id_t_i : ord_r - 1) { // t_id_t_i; index of treated unit to match among treated units // t_id_i: index of treated unit to match among all units counter++; if (counter == 200) { counter = 0; Rcpp::checkUserInterrupt(); } if (max(as(n_eligible[g_c])) == 0) { break; } t_id_i = ind_focal[t_id_t_i]; p.increment(); if (!eligible[t_id_i]) { continue; } k_total = 0; for (int gi : g_c) { k = find_control_mat(t_id_i, treat, ind_non_focal, distance_mat.row(t_id_t_i), eligible, gi, r, mm.row(t_id_t_i), ncc, caliper_covs_mat, caliper_covs, caliper_dist, use_exact, exact, aenc, antiexact_covs); if (k.empty()) { if (r == 1) { k_total = 0; break; } continue; } matches_i[k_total] = k[0]; k_total++; } if (k_total == 0) { eligible[t_id_i] = false; n_eligible[focal]--; continue; } for (c = 0; c < k_total; c++) { mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = matches_i[c]; } matches_i[k_total] = t_id_i; ck_ = matches_i[Range(0, 
k_total)]; if (use_unit_id) { ck_ = which(!is_na(match(unit_id, as(unit_id[ck_])))); } for (int ck : ck_) { if (!eligible[ck]) { continue; } times_matched[ck]++; if (times_matched[ck] >= times_matched_allowed[ck]) { eligible[ck] = false; n_eligible[treat[ck]]--; } } } } } else { for (int t_id_t_i : ord - 1) { // t_id_t_i; index of treated unit to match among treated units // t_id_i: index of treated unit to match among all units counter++; if (counter == 200) { counter = 0; Rcpp::checkUserInterrupt(); } t_id_i = ind_focal[t_id_t_i]; p.increment(); if (!eligible[t_id_i]) { continue; } k_total = 0; for (int gi : g_c) { k = find_control_mat(t_id_i, treat, ind_non_focal, distance_mat.row(t_id_t_i), eligible, gi, 1, mm.row(t_id_t_i), ncc, caliper_covs_mat, caliper_covs, caliper_dist, use_exact, exact, aenc, antiexact_covs, ratio[t_id_t_i]); if (k.empty()) { k_total = 0; break; } for (int cc : k) { matches_i[k_total] = cc; k_total++; } } if (k_total == 0) { continue; } for (c = 0; c < k_total; c++) { mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = matches_i[c]; } } } p.update(prog_length); mm = mm + 1; rownames(mm) = as(lab[ind_focal]); return mm; } MatchIt/src/nn_matchC_mahcovs.cpp0000644000176200001440000002251214757124464016530 0ustar liggesusers// [[Rcpp::depends(RcppProgress)]] #include "eta_progress_bar.h" #include "internal.h" using namespace Rcpp; // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] IntegerMatrix nn_matchC_mahcovs(const IntegerVector& treat_, const IntegerVector& ord, const IntegerVector& ratio, const LogicalVector& discarded, const int& reuse_max, const int& focal_, const NumericMatrix& mah_covs, const Nullable& distance_ = R_NilValue, const Nullable& exact_ = R_NilValue, const Nullable& caliper_dist_ = R_NilValue, const Nullable& caliper_covs_ = R_NilValue, const Nullable& caliper_covs_mat_ = R_NilValue, const Nullable& antiexact_covs_ = R_NilValue, const Nullable& unit_id_ = R_NilValue, const bool& disl_prog = false) { IntegerVector unique_treat = 
unique(treat_); std::sort(unique_treat.begin(), unique_treat.end()); int g = unique_treat.size(); IntegerVector treat = match(treat_, unique_treat) - 1; int focal; for (focal = 0; focal < g; focal++) { if (unique_treat[focal] == focal_) { break; } } R_xlen_t n = treat.size(); IntegerVector ind = Range(0, n - 1); R_xlen_t i; int gi; IntegerVector indt(n); IntegerVector indt_sep(g + 1); IntegerVector indt_tmp; IntegerVector nt(g); IntegerVector ind_match(n); ind_match.fill(NA_INTEGER); LogicalVector eligible = !discarded; IntegerVector g_c = Range(0, g - 1); g_c = g_c[g_c != focal]; IntegerVector n_eligible(g); for (i = 0; i < n; i++) { nt[treat[i]]++; if (eligible[i]) { n_eligible[treat[i]]++; } } int nf = nt[focal]; indt_sep[0] = 0; for (gi = 0; gi < g; gi++) { indt_sep[gi + 1] = indt_sep[gi] + nt[gi]; indt_tmp = ind[treat == gi]; for (i = 0; i < nt[gi]; i++) { indt[indt_sep[gi] + i] = indt_tmp[i]; ind_match[indt_tmp[i]] = i; } } IntegerVector ind_focal = indt[Range(indt_sep[focal], indt_sep[focal + 1] - 1)]; std::vector times_matched(n, 0); std::vector times_matched_allowed(n, reuse_max); for (i = 0; i < nf; i++) { times_matched_allowed[ind_focal[i]] = ratio[i]; } int max_ratio = max(ratio); // Output matrix with sample indices of control units IntegerMatrix mm(nf, max_ratio); mm.fill(NA_INTEGER); CharacterVector lab = treat_.names(); Function o("order"); NumericVector match_var = mah_covs.column(0); double match_var_caliper = R_PosInf; IntegerVector ind_d_ord = o(match_var); ind_d_ord = ind_d_ord - 1; //location of each unit after sorting IntegerVector match_d_ord = o(ind_d_ord); match_d_ord = match_d_ord - 1; //exact bool use_exact = false; IntegerVector exact; if (exact_.isNotNull()) { exact = as(exact_); use_exact = true; } //distance & caliper_dist bool use_caliper_dist = false; double caliper_dist; NumericVector distance; if (caliper_dist_.isNotNull() && distance_.isNotNull()) { distance = as(distance_); caliper_dist = as(caliper_dist_); use_caliper_dist = 
true; } //caliper_covs NumericVector caliper_covs; NumericMatrix caliper_covs_mat; int ncc = 0; if (caliper_covs_.isNotNull()) { caliper_covs = as(caliper_covs_); caliper_covs_mat = as(caliper_covs_mat_); ncc = caliper_covs_mat.ncol(); //Find if caliper placed on match_var for (int cci = 0; cci < ncc; cci++) { if (std::equal(caliper_covs_mat.column(cci).begin(), caliper_covs_mat.column(cci).end(), match_var.begin(), match_var.end())) { match_var_caliper = caliper_covs[cci]; break; } } } //antiexact IntegerMatrix antiexact_covs; int aenc = 0; if (antiexact_covs_.isNotNull()) { antiexact_covs = as(antiexact_covs_); aenc = antiexact_covs.ncol(); } //reuse_max bool use_reuse_max = (reuse_max < nf); //unit_id IntegerVector unit_id; bool use_unit_id = false; if (unit_id_.isNotNull()) { unit_id = as(unit_id_); use_unit_id = true; use_reuse_max = true; } IntegerVector matches_i(1 + max_ratio * (g - 1)); int k_total; //progress bar int prog_length; if (use_reuse_max) prog_length = sum(ratio) + 1; else prog_length = nf + 1; ETAProgressBar pb; Progress p(prog_length, disl_prog, pb); R_xlen_t c; int r, t_id_i; IntegerVector ck_; std::vector k(max_ratio); int counter = 0; if (use_reuse_max) { IntegerVector ord_r(nf); for (r = 1; r <= max_ratio; r++) { ord_r = ord[as(ratio[ord - 1]) >= r]; for (int t_id_t_i : ord_r - 1) { // t_id_t_i; index of treated unit to match among treated units // t_id_i: index of treated unit to match among all units counter++; if (counter == 200) { counter = 0; Rcpp::checkUserInterrupt(); } if (max(as(n_eligible[g_c])) == 0) { break; } t_id_i = ind_focal[t_id_t_i]; p.increment(); if (!eligible[t_id_i]) { continue; } k_total = 0; for (int gi : g_c) { k = find_control_mahcovs(t_id_i, ind_d_ord, match_d_ord, match_var, match_var_caliper, treat, distance, eligible, gi, r, mm.row(t_id_t_i), mah_covs, ncc, caliper_covs_mat, caliper_covs, use_caliper_dist, caliper_dist, use_exact, exact, aenc, antiexact_covs); if (k.empty()) { if (r == 1) { k_total = 0; break; 
} continue; } matches_i[k_total] = k[0]; k_total++; } if (k_total == 0) { eligible[t_id_i] = false; n_eligible[focal]--; continue; } for (c = 0; c < k_total; c++) { mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = matches_i[c]; } matches_i[k_total] = t_id_i; ck_ = matches_i[Range(0, k_total)]; if (use_unit_id) { ck_ = which(!is_na(match(unit_id, as(unit_id[ck_])))); } for (int ck : ck_) { if (!eligible[ck]) { continue; } times_matched[ck]++; if (times_matched[ck] >= times_matched_allowed[ck]) { eligible[ck] = false; n_eligible[treat[ck]]--; } } } } } else { for (int t_id_t_i : ord - 1) { // t_id_t_i; index of treated unit to match among treated units // t_id_i: index of treated unit to match among all units counter++; if (counter == 200) { counter = 0; Rcpp::checkUserInterrupt(); } t_id_i = ind_focal[t_id_t_i]; p.increment(); if (!eligible[t_id_i]) { continue; } k_total = 0; for (int gi : g_c) { k = find_control_mahcovs(t_id_i, ind_d_ord, match_d_ord, match_var, match_var_caliper, treat, distance, eligible, gi, 1, mm.row(t_id_t_i), mah_covs, ncc, caliper_covs_mat, caliper_covs, use_caliper_dist, caliper_dist, use_exact, exact, aenc, antiexact_covs, ratio[t_id_t_i]); if (k.empty()) { k_total = 0; break; } for (int cc : k) { matches_i[k_total] = cc; k_total++; } } if (k_total == 0) { continue; } for (c = 0; c < k_total; c++) { mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = matches_i[c]; } } } p.update(prog_length); mm = mm + 1; rownames(mm) = as(lab[ind_focal]); return mm; } MatchIt/src/get_splitsC.cpp0000644000176200001440000000115414711615533015365 0ustar liggesusers#include "internal.h" using namespace Rcpp; // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] NumericVector get_splitsC(const NumericVector& x, const double& caliper) { NumericVector splits; NumericVector x_ = unique(x); NumericVector x_sorted = x_.sort(); R_xlen_t n = x_sorted.size(); if (n <= 1) { return splits; } splits = x_sorted[0]; for (int i = 1; i < x_sorted.length(); i++) { if (x_sorted[i] - x_sorted[i 
- 1] <= caliper) continue;
    splits.push_back((x_sorted[i] + x_sorted[i - 1]) / 2);
  }

  splits.push_back(x_sorted[n - 1]);

  return splits;
}
MatchIt/src/all_equal_to.cpp0000644000176200001440000000152214740557565015561 0ustar liggesusers#include "internal.h"

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

// Templated function to check if all elements of a vector are equal to a supplied value
template <int RTYPE>
bool all_equal_to_(Vector<RTYPE> x, typename traits::storage_type<RTYPE>::type y) {
  for (auto xi : x) {
    if (xi != y) {
      return false;
    }
  }

  return true;
}

// Wrapper function to handle different types of R vectors
// [[Rcpp::export]]
bool all_equal_to(RObject x, RObject y) {
  switch (TYPEOF(x)) {
  case INTSXP:
    return all_equal_to_(as<IntegerVector>(x), as<int>(y));
  case REALSXP:
    return all_equal_to_(as<NumericVector>(x), as<double>(y));
  case LGLSXP:
    return all_equal_to_(as<LogicalVector>(x), as<int>(y));
  default:
    stop("Unsupported vector type");
  }
}
MatchIt/src/pairdistC.cpp0000644000176200001440000000142514757124425015035 0ustar liggesusers#include "internal.h"

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
double pairdistsubC(const NumericVector& x,
                    const IntegerVector& t,
                    const IntegerVector& s) {
  double dist = 0;
  R_xlen_t i, j;
  int s_i, o_i, o_j;
  int k = 0;

  Function ord("order");
  IntegerVector o = ord(s);
  o = o - 1;

  R_xlen_t n = sum(!is_na(s));

  for (i = 0; i < n; i++) {
    o_i = o[i];
    s_i = s[o_i];

    for (j = i + 1; j < n; j++) {
      o_j = o[j];

      if (s[o_j] != s_i) {
        break;
      }

      if (t[o_j] == t[o_i]) {
        continue;
      }

      // Numerically stable formula for adding a new observation to a mean
      k++;
      dist += (std::abs(x[o_j] - x[o_i]) - dist) / k;
    }
  }

  return dist;
}
MatchIt/src/nn_matchC_vec_closest.cpp0000644000176200001440000002163014757124445017400 0ustar liggesusers// [[Rcpp::depends(RcppProgress)]]
#include "eta_progress_bar.h"
#include "internal.h"

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
IntegerMatrix nn_matchC_vec_closest(const IntegerVector& treat,
                                    const IntegerVector& ratio,
                                    const LogicalVector& discarded,
                                    const
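// Editor's note (illustrative, not part of the original source): pairdistsubC
// above accumulates the mean within-subclass pair distance with the numerically
// stable running-mean update
//
//   k++; dist += (|x_j - x_i| - dist) / k;   i.e.  m_k = m_{k-1} + (v_k - m_{k-1}) / k,
//
// which avoids summing many values before dividing. For example, averaging
// v = {1, 3, 8}: m_1 = 1, m_2 = 1 + (3 - 1)/2 = 2, m_3 = 2 + (8 - 2)/3 = 4,
// matching (1 + 3 + 8)/3 = 4.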
                                    int& reuse_max,
                                    const NumericVector& distance,
                                    const Nullable<IntegerVector>& exact_ = R_NilValue,
                                    const Nullable<double>& caliper_dist_ = R_NilValue,
                                    const Nullable<NumericVector>& caliper_covs_ = R_NilValue,
                                    const Nullable<NumericMatrix>& caliper_covs_mat_ = R_NilValue,
                                    const Nullable<IntegerMatrix>& antiexact_covs_ = R_NilValue,
                                    const Nullable<IntegerVector>& unit_id_ = R_NilValue,
                                    const bool& close = true,
                                    const bool& disl_prog = false) {

  IntegerVector unique_treat = {0, 1};
  int g = unique_treat.size();
  int focal = 1;

  R_xlen_t n = treat.size();
  IntegerVector ind = Range(0, n - 1);

  R_xlen_t i;
  int gi;
  IntegerVector indt(n);
  IntegerVector indt_sep(g + 1);
  IntegerVector indt_tmp;
  IntegerVector nt(g);
  IntegerVector ind_match(n);
  ind_match.fill(NA_INTEGER);

  LogicalVector eligible = !discarded;

  // IntegerVector g_c = Range(0, g - 1);
  // g_c = g_c[g_c != focal];

  IntegerVector n_eligible(g);

  for (i = 0; i < n; i++) {
    nt[treat[i]]++;
    if (eligible[i]) {
      n_eligible[treat[i]]++;
    }
  }

  int nf = nt[focal];

  indt_sep[0] = 0;

  for (gi = 0; gi < g; gi++) {
    indt_sep[gi + 1] = indt_sep[gi] + nt[gi];
    indt_tmp = ind[treat == gi];
    for (i = 0; i < nt[gi]; i++) {
      indt[indt_sep[gi] + i] = indt_tmp[i];
      ind_match[indt_tmp[i]] = i;
    }
  }

  IntegerVector ind_focal = indt[Range(indt_sep[focal], indt_sep[focal + 1] - 1)];

  std::vector<int> times_matched(n, 0);
  std::vector<int> times_matched_allowed(n, reuse_max);
  for (i = 0; i < nf; i++) {
    times_matched_allowed[ind_focal[i]] = ratio[i];
  }

  int max_ratio = max(ratio);

  // Output matrix with sample indices of control units
  IntegerMatrix mm(nf, max_ratio);
  mm.fill(NA_INTEGER);
  CharacterVector lab = treat.names();

  //Use base::order() because faster than C++ std::sort()
  Function o("order");

  IntegerVector ind_d_ord = o(distance);
  ind_d_ord = ind_d_ord - 1;

  IntegerVector match_d_ord = o(ind_d_ord);
  match_d_ord = match_d_ord - 1;

  IntegerVector last_control(g);
  last_control.fill(n - 1);
  IntegerVector first_control(g);
  first_control.fill(0);

  //exact
  bool use_exact = false;
  IntegerVector exact;
  if (exact_.isNotNull()) {
    exact = as<IntegerVector>(exact_);
    use_exact = true;
  }

  //caliper_dist
  double caliper_dist;
  if (caliper_dist_.isNotNull()) {
    caliper_dist = as<double>(caliper_dist_);
  }
  else {
    caliper_dist = max_finite(distance) - min_finite(distance) + 1;
  }

  //caliper_covs
  NumericVector caliper_covs;
  NumericMatrix caliper_covs_mat;
  int ncc = 0;
  if (caliper_covs_.isNotNull()) {
    caliper_covs = as<NumericVector>(caliper_covs_);
    caliper_covs_mat = as<NumericMatrix>(caliper_covs_mat_);
    ncc = caliper_covs_mat.ncol();
  }

  //antiexact
  IntegerMatrix antiexact_covs;
  int aenc = 0;
  if (antiexact_covs_.isNotNull()) {
    antiexact_covs = as<IntegerMatrix>(antiexact_covs_);
    aenc = antiexact_covs.ncol();
  }

  //unit_id
  IntegerVector unit_id;
  bool use_unit_id = false;
  if (unit_id_.isNotNull()) {
    unit_id = as<IntegerVector>(unit_id_);
    use_unit_id = true;
  }

  //storing closeness
  std::vector<int> t_id, c_id;
  std::vector<double> dist;
  t_id.reserve(n_eligible[focal]);
  c_id.reserve(n_eligible[focal]);
  dist.reserve(n_eligible[focal]);

  gi = 0;

  update_first_and_last_control(first_control, last_control, ind_d_ord, eligible, treat, gi);

  IntegerVector ck_;
  int c_id_i, t_id_t_i, t_id_i;
  int counter = 0;
  int r = 1;
  IntegerVector heap_ord;
  std::vector<int> k(1);
  R_xlen_t hi;

  //progress bar
  R_xlen_t prog_length = sum(ratio) + 1;
  ETAProgressBar pb;
  Progress p(prog_length, disl_prog, pb);

  IntegerVector::iterator ci;

  std::function<bool(const int&, const int&)> cmp;
  if (close) {
    cmp = [&dist](const int& a, const int& b) {return dist[a] < dist[b];};
  }
  else {
    cmp = [&dist](const int& a, const int& b) {return dist[a] >= dist[b];};
  }

  for (r = 1; r <= max_ratio; r++) {
    //Find closest control unit to each treated unit
    for (int ti : ind_focal) {
      if (!eligible[ti]) {
        continue;
      }

      counter++;
      if (counter == 200) {
        counter = 0;
        Rcpp::checkUserInterrupt();
      }

      t_id_t_i = ind_match[ti];

      k = find_control_vec(ti, ind_d_ord, match_d_ord, treat, distance, eligible,
                           gi, r, mm.row(t_id_t_i), ncc, caliper_covs_mat, caliper_covs,
                           caliper_dist, use_exact, exact, aenc, antiexact_covs,
                           first_control, last_control);

      if (k.empty()) {
        eligible[ti] = false;
        n_eligible[focal]--;
        continue;
      }

      t_id.push_back(ti);
      c_id.push_back(k[0]);
      dist.push_back(std::abs(distance[ti] - distance[k[0]]));
    }

    nf = dist.size();

    //Order the list
    heap_ord = o(dist, _["decreasing"] = !close);
    heap_ord = heap_ord - 1;

    i = 0;

    //Go through ordered list and assign matches, re-matching when necessary
    while (min(n_eligible) > 0 && i < nf) {
      counter++;
      if (counter == 200) {
        counter = 0;
        Rcpp::checkUserInterrupt();
      }

      hi = heap_ord[i];

      t_id_i = t_id[hi];

      if (!eligible[t_id_i]) {
        i++;
        continue;
      }

      t_id_t_i = ind_match[t_id_i];

      c_id_i = c_id[hi];

      if (!eligible[c_id_i]) {
        // If control isn't eligible, find new control and try again
        update_first_and_last_control(first_control, last_control, ind_d_ord, eligible, treat, gi);

        k = find_control_vec(t_id_i, ind_d_ord, match_d_ord, treat, distance, eligible,
                             gi, r, mm.row(t_id_t_i), ncc, caliper_covs_mat, caliper_covs,
                             caliper_dist, use_exact, exact, aenc, antiexact_covs,
                             first_control, last_control, 1, c_id_i);

        //If no matches...
        if (k.empty()) {
          eligible[t_id_i] = false;
          n_eligible[focal]--;
          continue;
        }

        c_id[hi] = k[0];
        dist[hi] = std::abs(distance[t_id_i] - distance[k[0]]);

        // Find new position of pair in heap
        ci = std::lower_bound(heap_ord.begin() + i, heap_ord.end(), hi, cmp);

        if (ci != heap_ord.begin() + i) {
          std::rotate(heap_ord.begin() + i, heap_ord.begin() + i + 1, ci);
        }

        continue;
      }

      mm(t_id_t_i, sum(!is_na(mm(t_id_t_i, _)))) = c_id_i;

      ck_ = {c_id_i, t_id_i};

      if (use_unit_id) {
        ck_ = which(!is_na(match(unit_id, as<IntegerVector>(unit_id[ck_]))));
      }

      for (int ck : ck_) {
        if (!eligible[ck]) {
          continue;
        }

        times_matched[ck]++;
        if (times_matched[ck] >= times_matched_allowed[ck]) {
          eligible[ck] = false;
          n_eligible[treat[ck]]--;
        }
      }

      p.increment();

      i++;
    }

    t_id.clear();
    c_id.clear();
    dist.clear();
  }

  p.update(prog_length);

  mm = mm + 1;
  rownames(mm) = as<CharacterVector>(lab[ind_focal]);

  return mm;
}
MatchIt/src/Makevars0000644000176200001440000000016114375442624014077 0ustar liggesusers
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)
MatchIt/src/internal.cpp0000644000176200001440000004657314757124506014734 0ustar liggesusers
#include <Rcpp.h>
#include <algorithm>

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

// Rcpp internal functions

//C implementation of tabulate(). Faster than base::tabulate(), but real
//use is in subclass2mmC().
// [[Rcpp::interfaces(cpp)]]
IntegerVector tabulateC_(const IntegerVector& bins,
                         const int& nbins = 0) {
  int max_bin;

  if (nbins > 0) max_bin = nbins;
  else max_bin = max(na_omit(bins));

  IntegerVector counts(max_bin);

  int n = bins.size();
  for (int i = 0; i < n; i++) {
    if (bins[i] > 0 && bins[i] <= max_bin) {
      counts[bins[i] - 1]++;
    }
  }

  return counts;
}

//Rcpp port of base::which
// [[Rcpp::interfaces(cpp)]]
IntegerVector which(const LogicalVector& x) {
  IntegerVector ind = Range(0, x.size() - 1);
  return ind[x];
}

// [[Rcpp::interfaces(cpp)]]
bool antiexact_okay(const int& aenc,
                    const int& i,
                    const int& j,
                    const IntegerMatrix& antiexact_covs) {
  if (aenc == 0) {
    return true;
  }

  for (int k = 0; k < aenc; k++) {
    if (antiexact_covs(i, k) == antiexact_covs(j, k)) {
      return false;
    }
  }

  return true;
}

// [[Rcpp::interfaces(cpp)]]
bool caliper_covs_okay(const int& ncc,
                       const int& i,
                       const int& j,
                       const NumericMatrix& caliper_covs_mat,
                       const NumericVector& caliper_covs) {
  if (ncc == 0) {
    return true;
  }

  for (int k = 0; k < ncc; k++) {
    if (caliper_covs[k] >= 0) {
      if (std::abs(caliper_covs_mat(i, k) - caliper_covs_mat(j, k)) > caliper_covs[k]) {
        return false;
      }
    }
    else {
      if (std::abs(caliper_covs_mat(i, k) - caliper_covs_mat(j, k)) <= -caliper_covs[k]) {
        return false;
      }
    }
  }

  return true;
}

// [[Rcpp::interfaces(cpp)]]
bool caliper_dist_okay(const bool& use_caliper_dist,
                       const int& i,
                       const int& j,
                       const NumericVector& distance,
                       const double& caliper_dist) {
  if (!use_caliper_dist) {
    return true;
  }

  if (caliper_dist >= 0) {
    return std::abs(distance[i] - distance[j]) <= caliper_dist;
  }
  else {
    return std::abs(distance[i] - distance[j]) > -caliper_dist;
  }
}

// [[Rcpp::interfaces(cpp)]]
bool mm_okay(const int& r,
             const int& i,
             const IntegerVector& mm_rowi) {

  if (r > 1) {
    for (int j : na_omit(mm_rowi)) {
      if (i == j) {
        return false;
      }
    }
  }

  return true;
}

// [[Rcpp::interfaces(cpp)]]
bool exact_okay(const bool& use_exact,
                const int& i,
                const int& j,
                const IntegerVector& exact) {

  if (!use_exact) {
    return true;
  }

  return exact[i] == exact[j];
}

// [[Rcpp::interfaces(cpp)]]
std::vector<int> find_control_vec(const int& t_id,
                                  const IntegerVector& ind_d_ord,
                                  const IntegerVector& match_d_ord,
                                  const IntegerVector& treat,
                                  const NumericVector& distance,
                                  const LogicalVector& eligible,
                                  const int& gi,
                                  const int& r,
                                  const IntegerVector& mm_rowi_,
                                  const int& ncc,
                                  const NumericMatrix& caliper_covs_mat,
                                  const NumericVector& caliper_covs,
                                  const double& caliper_dist,
                                  const bool& use_exact,
                                  const IntegerVector& exact,
                                  const int& aenc,
                                  const IntegerMatrix& antiexact_covs,
                                  const IntegerVector& first_control,
                                  const IntegerVector& last_control,
                                  const int& ratio = 1,
                                  const int& prev_start = -1) {
  int ii = match_d_ord[t_id];

  IntegerVector mm_rowi;
  std::vector<int> possible_starts;

  if (r > 1) {
    mm_rowi = na_omit(mm_rowi_);
    mm_rowi = mm_rowi[as<IntegerVector>(treat[mm_rowi]) == gi];

    possible_starts.reserve(mm_rowi.size() + 2);

    for (int mmi : mm_rowi) {
      possible_starts.push_back(match_d_ord[mmi]);
    }
  }
  else {
    possible_starts.reserve(2);
  }

  if (prev_start >= 0) {
    possible_starts.push_back(match_d_ord[prev_start]);
  }

  int iil, iir;
  double min_dist;

  if (possible_starts.empty()) {
    iil = ii;
    iir = ii;
    min_dist = 0;
  }
  else {
    possible_starts.push_back(ii);

    iil = *std::min_element(possible_starts.begin(), possible_starts.end());
    iir = *std::max_element(possible_starts.begin(), possible_starts.end());

    if (iil == ii) {
      min_dist = std::abs(distance[t_id] - distance[ind_d_ord[iir]]);
    }
    else if (iir == ii) {
      min_dist = std::abs(distance[t_id] - distance[ind_d_ord[iil]]);
    }
    else {
      min_dist = std::max(std::abs(distance[t_id] - distance[ind_d_ord[iil]]),
                          std::abs(distance[t_id] - distance[ind_d_ord[iir]]));
    }
  }

  int min_ii = first_control[gi];
  int max_ii = last_control[gi];

  double di = distance[t_id];

  bool l_stop = false;
  bool r_stop = false;

  double dist_c;

  std::vector<int> potential_matches_id;
  potential_matches_id.reserve(2 * ratio);
  std::vector<double> potential_matches_dist;
  potential_matches_dist.reserve(2 * ratio);

  int num_matches_l = 0;
  int num_matches_r = 0;

  int iz;
  bool left = false;
  int num_closer_than_dist_c;

  while (!l_stop || !r_stop) {
    if (l_stop) {
      left = false;
    }
    else if (r_stop) {
      left = true;
    }
    else {
      left = !left;
    }

    if (left) {
      if (iil <= min_ii || num_matches_l == ratio) {
        l_stop = true;
        continue;
      }
      iil -= 1;
      iz = ind_d_ord[iil];
    }
    else {
      if (iir >= max_ii || num_matches_r == ratio) {
        r_stop = true;
        continue;
      }
      iir += 1;
      iz = ind_d_ord[iir];
    }

    if (!eligible[iz]) {
      continue;
    }

    if (treat[iz] != gi) {
      continue;
    }

    if (!mm_okay(r, iz, mm_rowi)) {
      continue;
    }

    dist_c = std::abs(di - distance[iz]);

    if (caliper_dist > 0) {
      if (dist_c > caliper_dist) {
        if (left) {
          l_stop = true;
        }
        else {
          r_stop = true;
        }
        continue;
      }
    }
    else {
      if (dist_c <= -caliper_dist) {
        continue;
      }
    }

    //If current dist is worse than ratio dists, continue
    if (potential_matches_id.size() >= ratio) {
      num_closer_than_dist_c = 0;

      for (double d : potential_matches_dist) {
        if (d < dist_c) {
          num_closer_than_dist_c++;
          if (num_closer_than_dist_c == ratio) {
            break;
          }
        }
      }

      if (num_closer_than_dist_c >= ratio) {
        if (left) {
          l_stop = true;
        }
        else {
          r_stop = true;
        }
        continue;
      }
    }

    if (dist_c < min_dist) {
      continue;
    }

    if (!exact_okay(use_exact, t_id, iz, exact)) {
      continue;
    }

    if (!antiexact_okay(aenc, t_id, iz, antiexact_covs)) {
      continue;
    }

    if (!caliper_covs_okay(ncc, t_id, iz, caliper_covs_mat, caliper_covs)) {
      continue;
    }

    potential_matches_id.push_back(iz);
    potential_matches_dist.push_back(dist_c);

    if (left) {
      num_matches_l++;
      if (num_matches_l == ratio) {
        l_stop = true;
      }
    }
    else {
      num_matches_r++;
      if (num_matches_r == ratio) {
        r_stop = true;
      }
    }
  }

  int n_potential_matches = potential_matches_id.size();

  if (n_potential_matches <= 1) {
    return potential_matches_id;
  }

  if (n_potential_matches <= ratio && std::is_sorted(potential_matches_dist.begin(), potential_matches_dist.end())) {
    return potential_matches_id;
  }

  std::vector<int> ind(n_potential_matches);
  std::iota(ind.begin(), ind.end(), 0);

  std::vector<int> matches_out;

  if (n_potential_matches > ratio) {
    std::partial_sort(ind.begin(), ind.begin() + ratio, ind.end(),
                      [&potential_matches_dist](int a, int b){ return potential_matches_dist[a] < potential_matches_dist[b]; });

    matches_out.reserve(ratio);

    for (auto it = ind.begin(); it != ind.begin() + ratio; ++it) {
      matches_out.push_back(potential_matches_id[*it]);
    }
  }
  else {
    std::sort(ind.begin(), ind.end(),
              [&potential_matches_dist](int a, int b){ return potential_matches_dist[a] < potential_matches_dist[b]; });

    matches_out.reserve(n_potential_matches);

    for (auto it = ind.begin(); it != ind.end(); ++it) {
      matches_out.push_back(potential_matches_id[*it]);
    }
  }

  return matches_out;
}

// [[Rcpp::interfaces(cpp)]]
std::vector<int> find_control_mahcovs(const int& t_id,
                                      const IntegerVector& ind_d_ord,
                                      const IntegerVector& match_d_ord,
                                      const NumericVector& match_var,
                                      const double& match_var_caliper,
                                      const IntegerVector& treat,
                                      const NumericVector& distance,
                                      const LogicalVector& eligible,
                                      const int& gi,
                                      const int& r,
                                      const IntegerVector& mm_rowi,
                                      const NumericMatrix& mah_covs,
                                      const int& ncc,
                                      const NumericMatrix& caliper_covs_mat,
                                      const NumericVector& caliper_covs,
                                      const bool& use_caliper_dist,
                                      const double& caliper_dist,
                                      const bool& use_exact,
                                      const IntegerVector& exact,
                                      const int& aenc,
                                      const IntegerMatrix& antiexact_covs,
                                      const int& ratio = 1) {
  int ii = match_d_ord[t_id];

  int iil, iir;
  iil = ii;
  iir = ii;

  int min_ii = 0;
  int max_ii = match_d_ord.size() - 1;

  bool l_stop = false;
  bool r_stop = false;

  double dist_c;

  std::vector<std::pair<int, double>> potential_matches;
  potential_matches.reserve(ratio);
  std::pair<int, double> new_match;

  int num_matches_l = 0;
  int num_matches_r = 0;

  double mv_i = match_var[t_id];
  double mv_dist;

  int iz;
  bool left = false;

  auto dist_comp = [](std::pair<int, double> a, std::pair<int, double> b) {
    return a.second < b.second;
  };

  while (!l_stop || !r_stop) {
    if (l_stop) {
      left = false;
    }
    else if (r_stop) {
      left = true;
    }
    else {
      left = !left;
    }

    if (left) {
      if (iil <= min_ii || num_matches_l == ratio) {
        l_stop = true;
        continue;
      }
      iil -= 1;
      iz = ind_d_ord[iil];
    }
    else {
      if (iir >= max_ii || num_matches_r == ratio) {
        r_stop = true;
        continue;
      }
      iir += 1;
      iz = ind_d_ord[iir];
    }

    if (!eligible[iz]) {
      continue;
    }

    if (treat[iz] != gi) {
      continue;
    }

    if (!mm_okay(r, iz, mm_rowi)) {
      continue;
    }

    mv_dist = pow(mv_i - match_var[iz], 2);

    if (mv_dist > match_var_caliper) {
      if (left) {
        l_stop = true;
      }
      else {
        r_stop = true;
      }
      continue;
    }

    //If current dist is worse than ratio dists, continue
    if (potential_matches.size() == ratio) {
      if (potential_matches.back().second < mv_dist) {
        if (left) {
          l_stop = true;
        }
        else {
          r_stop = true;
        }
        continue;
      }
    }

    if (!exact_okay(use_exact, t_id, iz, exact)) {
      continue;
    }

    if (!caliper_dist_okay(use_caliper_dist, t_id, iz, distance, caliper_dist)) {
      continue;
    }

    if (!antiexact_okay(aenc, t_id, iz, antiexact_covs)) {
      continue;
    }

    if (!caliper_covs_okay(ncc, t_id, iz, caliper_covs_mat, caliper_covs)) {
      continue;
    }

    dist_c = sum(pow(mah_covs.row(t_id) - mah_covs.row(iz), 2.0));

    if (!std::isfinite(dist_c)) {
      continue;
    }

    new_match = std::pair<int, double>(iz, dist_c);

    if (potential_matches.empty()) {
      potential_matches.push_back(new_match);
    }
    else if (dist_c > potential_matches.back().second) {
      if (potential_matches.size() == ratio) {
        continue;
      }
      potential_matches.push_back(new_match);
    }
    else if (ratio == 1) {
      potential_matches[0] = new_match;
    }
    else {
      if (potential_matches.size() == ratio) {
        potential_matches.pop_back();
      }

      if (dist_c > potential_matches.back().second) {
        potential_matches.push_back(new_match);
      }
      else {
        potential_matches.insert(std::lower_bound(potential_matches.begin(), potential_matches.end(), new_match, dist_comp),
                                 new_match);
      }
    }
  }

  std::vector<int> matches_out;
  matches_out.reserve(potential_matches.size());

  for (auto p : potential_matches) {
    matches_out.push_back(p.first);
  }

  return matches_out;
}

// [[Rcpp::interfaces(cpp)]]
std::vector<int> find_control_mat(const int& t_id,
                                  const IntegerVector& treat,
                                  const IntegerVector& ind_non_focal,
                                  const NumericVector& distance_mat_row_i,
                                  const LogicalVector& eligible,
                                  const int& gi,
                                  const int& r,
                                  const IntegerVector& mm_rowi,
                                  const int& ncc,
                                  const NumericMatrix& caliper_covs_mat,
                                  const NumericVector& caliper_covs,
                                  const double& caliper_dist,
                                  const bool& use_exact,
                                  const IntegerVector& exact,
                                  const int& aenc,
                                  const IntegerMatrix& antiexact_covs,
                                  const int& ratio = 1) {
  int c_id_i;
  double dist_c;

  std::vector<int> potential_matches_id;

  if (ratio < 1) {
    return potential_matches_id;
  }

  std::vector<double> potential_matches_dist;

  double max_dist;

  R_xlen_t nc = distance_mat_row_i.size();

  potential_matches_id.reserve(nc);
  potential_matches_dist.reserve(nc);

  for (R_xlen_t c = 0; c < nc; c++) {
    dist_c = distance_mat_row_i[c];

    if (potential_matches_id.size() == ratio) {
      if (dist_c > max_dist) {
        continue;
      }
    }

    if (caliper_dist >= 0) {
      if (dist_c > caliper_dist) {
        continue;
      }
    }
    else {
      if (dist_c <= -caliper_dist) {
        continue;
      }
    }

    if (!std::isfinite(dist_c)) {
      continue;
    }

    c_id_i = ind_non_focal[c];

    if (!eligible[c_id_i]) {
      continue;
    }

    if (treat[c_id_i] != gi) {
      continue;
    }

    if (!mm_okay(r, c_id_i, mm_rowi)) {
      continue;
    }

    if (!exact_okay(use_exact, t_id, c_id_i, exact)) {
      continue;
    }

    if (!antiexact_okay(aenc, t_id, c_id_i, antiexact_covs)) {
      continue;
    }

    if (!caliper_covs_okay(ncc, t_id, c_id_i, caliper_covs_mat, caliper_covs)) {
      continue;
    }

    potential_matches_id.push_back(c_id_i);
    potential_matches_dist.push_back(dist_c);

    if (potential_matches_id.size() == 1) {
      max_dist = dist_c;
    }
    else if (dist_c > max_dist) {
      max_dist = dist_c;
    }
  }

  int n_potential_matches = potential_matches_id.size();

  if (n_potential_matches <= 1) {
    return potential_matches_id;
  }

  if (n_potential_matches <= ratio && std::is_sorted(potential_matches_dist.begin(), potential_matches_dist.end())) {
    return potential_matches_id;
  }

  std::vector<int> ind(n_potential_matches);
  std::iota(ind.begin(), ind.end(), 0);

  std::vector<int> matches_out;

  if (n_potential_matches > ratio) {
    std::partial_sort(ind.begin(), ind.begin() + ratio, ind.end(),
                      [&potential_matches_dist](int a, int b){ return potential_matches_dist[a] < potential_matches_dist[b]; });

    matches_out.reserve(ratio);

    for (auto it = ind.begin(); it != ind.begin() + ratio; ++it) {
      matches_out.push_back(potential_matches_id[*it]);
    }
  }
  else {
    std::sort(ind.begin(), ind.end(),
              [&potential_matches_dist](int a, int b){ return potential_matches_dist[a] < potential_matches_dist[b]; });

    matches_out.reserve(n_potential_matches);

    for (auto it = ind.begin(); it != ind.end(); ++it) {
      matches_out.push_back(potential_matches_id[*it]);
    }
  }

  return matches_out;
}

// [[Rcpp::interfaces(cpp)]]
double max_finite(const NumericVector& x) {
  double m = NA_REAL;
  R_xlen_t n = x.size();
  R_xlen_t i;
  bool found = false;

  //Find first finite value
  for (i = 0; i < n; i++) {
    if (std::isfinite(x[i])) {
      m = x[i];
      found = true;
      break;
    }
  }

  //If none found, return NA
  if (!found) {
    return m;
  }

  //Find largest finite value
  for (R_xlen_t j = i + 1; j < n; j++) {
    if (!std::isfinite(x[j])) {
      continue;
    }
    if (x[j] > m) {
      m = x[j];
    }
  }

  return m;
}

// [[Rcpp::interfaces(cpp)]]
double min_finite(const NumericVector& x) {
  double m = NA_REAL;
  R_xlen_t n = x.size();
  R_xlen_t i;
  bool found = false;

  //Find first finite value
  for (i = 0; i < n; i++) {
    if (std::isfinite(x[i])) {
      m = x[i];
      found = true;
      break;
    }
  }

  //If none found, return NA
  if (!found) {
    return m;
  }

  //Find smallest finite value
  for (R_xlen_t j = i + 1; j < n; j++) {
    if (!std::isfinite(x[j])) {
      continue;
    }
    if (x[j] < m) {
      m = x[j];
    }
  }

  return m;
}

// [[Rcpp::interfaces(cpp)]]
void update_first_and_last_control(IntegerVector first_control,
                                   IntegerVector last_control,
                                   const IntegerVector& ind_d_ord,
                                   const LogicalVector& eligible,
                                   const IntegerVector& treat,
                                   const int& gi) {
  R_xlen_t c;

  // Update first_control
  if
 (!eligible[ind_d_ord[first_control[gi]]]) {
    for (c = first_control[gi] + 1; c <= last_control[gi]; c++) {
      if (eligible[ind_d_ord[c]]) {
        if (treat[ind_d_ord[c]] == gi) {
          first_control[gi] = c;
          break;
        }
      }
    }
  }

  // Update last_control
  if (!eligible[ind_d_ord[last_control[gi]]]) {
    for (c = last_control[gi] - 1; c >= first_control[gi]; c--) {
      if (eligible[ind_d_ord[c]]) {
        if (treat[ind_d_ord[c]] == gi) {
          last_control[gi] = c;
          break;
        }
      }
    }
  }
}
MatchIt/src/eta_progress_bar.h0000644000176200001440000001716414711615466016104 0ustar liggesusers
/*
 * eta_progress_bar.h
 *
 * A custom ProgressBar class to display a progress bar with time estimation
 *
 * Author: clemens@nevrome.de
 *
 * Copied from https://github.com/kforner/rcpp_progress/blob/master/inst/examples/RcppProgressETA/src/eta_progress_bar.hpp
 * with modifications by NHG
 *
 */
#ifndef _RcppProgress_ETA_PROGRESS_BAR_HPP
#define _RcppProgress_ETA_PROGRESS_BAR_HPP

#include <ctime>
#include <stdio.h>
#include <sstream>
#include <string>
#include <cmath>
#include <R_ext/Print.h>

#include "progress_bar.hpp"

// for unices only
#if !defined(WIN32) && !defined(__WIN32) && !defined(__WIN32__)
#include <Rinterface.h>
#endif

class ETAProgressBar: public ProgressBar{
  public: // ====== LIFECYCLE =====

    /**
     * Main constructor
     */
    ETAProgressBar() {
      _max_ticks = 50;
      _finalized = false;
      _timer_flag = true;
    }

    ~ETAProgressBar() {
    }

  public: // ===== main methods =====

    void display() {
      REprintf("0%%   10   20   30   40   50   60   70   80   90   100%%\n");
      REprintf("[----|----|----|----|----|----|----|----|----|----|\n");
      flush_console();
    }

    // update display
    void update(float progress) {

      // stop if already finalized
      if (_finalized) return;

      // measure current time
      time(&current_time);

      // start time measurement when update() is called the first time
      if (_timer_flag) {
        _timer_flag = false;

        // measure start time
        time_at_start = current_time;
        time_at_last_refresh = current_time;
        progress_at_last_refresh = progress;

        _num_ticks = _compute_nb_ticks(progress);

        time_string = "calculating...";

        // create progress bar string
        std::string progress_bar_string = _current_ticks_display(_num_ticks);

        // merge progress bar and time string
        std::stringstream strs;
        strs << "|" << progress_bar_string << "| ETA: " << time_string;
        std::string temp_str = strs.str();
        char const* char_type = temp_str.c_str();

        // print: remove old and replace with new
        REprintf("\r");
        REprintf("%s", char_type);
      }
      else {
        double time_since_start = std::difftime(current_time, time_at_start);

        if (progress != 1) {
          // ensure overwriting of old time info
          int empty_length = time_string.length();

          int _num_ticks_current = _compute_nb_ticks(progress);
          bool update_bar = (_num_ticks_current != _num_ticks);
          _num_ticks = _num_ticks_current;

          if (progress > 0 && time_since_start > 1) {
            double time_since_last_refresh = std::difftime(current_time, time_at_last_refresh);

            if (time_since_last_refresh >= .5) {
              update_bar = true;

              double progress_since_last_refresh = progress - progress_at_last_refresh;
              double total_rate = progress / time_since_start;

              if (progress_since_last_refresh == 0) {
                progress_since_last_refresh = .0000001;
              }

              double current_rate = progress_since_last_refresh / time_since_last_refresh;

              //alpha weights average rate against recent (current) rate;
              //alpha = 1 => estimate based on total_rate (treats as constant)
              //alpha = 0 => estimate based on recent rate (high fluctuation)
              double alpha = .8;

              double eta = (1 - progress) * (alpha / total_rate + (1 - alpha) / current_rate);

              // convert seconds to time string
              time_string = "~";
              time_string += _time_to_string(eta);

              time_at_last_refresh = current_time;
              progress_at_last_refresh = progress;
            }
          }

          if (update_bar) {
            // create progress bar string
            std::string progress_bar_string = _current_ticks_display(_num_ticks);

            std::string empty_space = std::string(std::fdim(empty_length, time_string.length()), ' ');

            // merge progress bar and time string
            std::stringstream strs;
            strs << "|" << progress_bar_string << "| ETA: " << time_string << empty_space;
            std::string temp_str = strs.str();
            char const* char_type = temp_str.c_str();

            // print: remove old and replace with new
            REprintf("\r");
            REprintf("%s", char_type);
          }
        }
        else {
          // ensure overwriting of old time info
          int empty_length = time_string.length();

          // finalize display when ready
          // convert seconds to time string
          std::string time_string = _time_to_string(time_since_start);
          std::string empty_space = std::string(std::fdim(empty_length, time_string.length()), ' ');

          // create progress bar string
          _num_ticks = _compute_nb_ticks(progress);
          std::string progress_bar_string = _current_ticks_display(_num_ticks);

          // merge progress bar and time string
          std::stringstream strs;
          strs << "|" << progress_bar_string << "| " << "Elapsed: " << time_string << empty_space;
          std::string temp_str = strs.str();
          char const* char_type = temp_str.c_str();

          // print: remove old and replace with new
          REprintf("\r");
          REprintf("%s", char_type);

          _finalize_display();
        }
      }
    }

    void end_display() {
      update(1);
    }

  protected: // ==== other instance methods =====

    // convert double with seconds to time string
    std::string _time_to_string(double seconds) {

      int time = (int) seconds;

      int hour = 0;
      int min = 0;
      int sec = 0;

      hour = time / 3600;
      time = time % 3600;
      min = time / 60;
      time = time % 60;
      sec = time;

      std::stringstream time_strs;
      if (hour != 0) time_strs << hour << "h ";
      if (min != 0) time_strs << min << "min ";
      if (sec != 0 || (hour == 0 && min == 0)) time_strs << sec << "s ";
      std::string time_str = time_strs.str();

      return time_str;
    }

    // update the ticks display corresponding to progress
    std::string _current_ticks_display(int nb) {

      std::stringstream ticks_strs;
      for (int i = 0; i < (_max_ticks - 1); ++i) {
        if (i < nb) {
          ticks_strs << "=";
        }
        else {
          ticks_strs << " ";
        }
      }
      std::string tick_space_string = ticks_strs.str();

      return tick_space_string;
    }

    // finalize
    void _finalize_display() {
      if (_finalized) return;

      REprintf("\n");
      flush_console();
      _finalized = true;
    }

    // compute number of ticks according to progress
    int _compute_nb_ticks(float progress) {
      return int(progress * _max_ticks);
    }

    // N.B: does nothing on windows
    void flush_console() {
#if !defined(WIN32) && !defined(__WIN32) && !defined(__WIN32__)
      R_FlushConsole();
#endif
    }

  private: // ===== INSTANCE VARIABLES ====
    int _max_ticks;   // the total number of ticks to print
    int _num_ticks;
    bool _finalized;
    bool _timer_flag;
    time_t time_at_start, current_time, time_at_last_refresh;
    float progress_at_last_refresh;
    std::string time_string;

};

#endif
MatchIt/src/internal.h0000644000176200001440000001250114757124477014400 0ustar liggesusers
#ifndef INTERNAL_H
#define INTERNAL_H

#include <Rcpp.h>
#include <vector>
#include <algorithm>
#include <numeric>
#include <cmath>
#include <functional>

using namespace Rcpp;

IntegerVector tabulateC_(const IntegerVector& bins,
                         const int& nbins = 0);

IntegerVector which(const LogicalVector& x);

std::vector<int> find_control_vec(const int& t_id,
                                  const IntegerVector& ind_d_ord,
                                  const IntegerVector& match_d_ord,
                                  const IntegerVector& treat,
                                  const NumericVector& distance,
                                  const LogicalVector& eligible,
                                  const int& gi,
                                  const int& r,
                                  const IntegerVector& mm_rowi_,
                                  const int& ncc,
                                  const NumericMatrix& caliper_covs_mat,
                                  const NumericVector& caliper_covs,
                                  const double& caliper_dist,
                                  const bool& use_exact,
                                  const IntegerVector& exact,
                                  const int& aenc,
                                  const IntegerMatrix& antiexact_covs,
                                  const IntegerVector& first_control,
                                  const IntegerVector& last_control,
                                  const int& ratio = 1,
                                  const int& prev_start = -1);

std::vector<int> find_control_mahcovs(const int& t_id,
                                      const IntegerVector& ind_d_ord,
                                      const IntegerVector& match_d_ord,
                                      const NumericVector& match_var,
                                      const double& match_var_caliper,
                                      const IntegerVector& treat,
                                      const NumericVector& distance,
                                      const LogicalVector& eligible,
                                      const int& gi,
                                      const int& r,
                                      const IntegerVector& mm_rowi,
                                      const NumericMatrix& mah_covs,
                                      const int& ncc,
                                      const NumericMatrix& caliper_covs_mat,
                                      const NumericVector& caliper_covs,
                                      const bool& use_caliper_dist,
                                      const double& caliper_dist,
                                      const bool& use_exact,
                                      const IntegerVector& exact,
                                      const int& aenc,
                                      const IntegerMatrix& antiexact_covs,
                                      const
int& ratio = 1);

std::vector<int> find_control_mat(const int& t_id,
                                  const IntegerVector& treat,
                                  const IntegerVector& ind_non_focal,
                                  const NumericVector& distance_mat_row_i,
                                  const LogicalVector& eligible,
                                  const int& gi,
                                  const int& r,
                                  const IntegerVector& mm_rowi,
                                  const int& ncc,
                                  const NumericMatrix& caliper_covs_mat,
                                  const NumericVector& caliper_covs,
                                  const double& caliper_dist,
                                  const bool& use_exact,
                                  const IntegerVector& exact,
                                  const int& aenc,
                                  const IntegerMatrix& antiexact_covs,
                                  const int& ratio = 1);

bool antiexact_okay(const int& aenc,
                    const int& i,
                    const int& j,
                    const IntegerMatrix& antiexact_covs);

bool caliper_covs_okay(const int& ncc,
                       const int& i,
                       const int& j,
                       const NumericMatrix& caliper_covs_mat,
                       const NumericVector& caliper_covs);

bool caliper_dist_okay(const bool& use_caliper_dist,
                       const int& i,
                       const int& j,
                       const NumericVector& distance,
                       const double& caliper_dist);

bool mm_okay(const int& r,
             const int& i,
             const IntegerVector& mm_rowi);

bool exact_okay(const bool& use_exact,
                const int& i,
                const int& j,
                const IntegerVector& exact);

double max_finite(const NumericVector& x);

double min_finite(const NumericVector& x);

void update_first_and_last_control(IntegerVector first_control,
                                   IntegerVector last_control,
                                   const IntegerVector& ind_d_ord,
                                   const LogicalVector& eligible,
                                   const IntegerVector& treat,
                                   const int& gi);

#endif
MatchIt/src/subclass2mm.cpp0000644000176200001440000000523714720776763015353 0ustar liggesusers
#include "internal.h"

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

//Turns subclass vector given as a factor into a numeric match.matrix.
//focal is the treatment level (0/1) that corresponds to the rownames.
// [[Rcpp::export]]
IntegerMatrix subclass2mmC(const IntegerVector& subclass_,
                           const IntegerVector& treat,
                           const int& focal) {
  LogicalVector na_sub = is_na(subclass_);
  IntegerVector unique_sub = unique(as<IntegerVector>(subclass_[!na_sub]));
  IntegerVector subclass = match(subclass_, unique_sub) - 1;

  R_xlen_t nsub = unique_sub.size();

  R_xlen_t n = treat.size();
  IntegerVector ind = Range(0, n - 1);
  IntegerVector ind_focal = ind[treat == focal];
  R_xlen_t n1 = ind_focal.size();

  IntegerVector subtab(nsub);
  subtab.fill(-1);

  R_xlen_t i;
  for (i = 0; i < n; i++) {
    if (na_sub[i]) {
      continue;
    }
    subtab[subclass[i]]++;
  }

  int mm_col = max(subtab);

  IntegerMatrix mm(n1, mm_col);
  mm.fill(NA_INTEGER);
  CharacterVector lab = treat.names();

  IntegerVector ss(n1);
  ss.fill(NA_INTEGER);

  int s, si;
  for (i = 0; i < n1; i++) {
    if (na_sub[ind_focal[i]]) {
      continue;
    }
    ss[i] = subclass[ind_focal[i]];
  }

  for (i = 0; i < n; i++) {
    if (treat[i] == focal) {
      continue;
    }
    if (na_sub[i]) {
      continue;
    }

    si = subclass[i];

    for (s = 0; s < n1; s++) {
      if (!std::isfinite(ss[s])) {
        continue;
      }

      if (si != ss[s]) {
        continue;
      }

      mm(s, sum(!is_na(mm(s, _)))) = i;
      break;
    }
  }

  mm = mm + 1;
  rownames(mm) = as<CharacterVector>(lab[ind_focal]);

  return mm;
}

// [[Rcpp::export]]
IntegerVector mm2subclassC(const IntegerMatrix& mm,
                           const IntegerVector& treat,
                           const Nullable<int>& focal = R_NilValue) {
  CharacterVector lab = treat.names();

  R_xlen_t n1 = treat.size();

  IntegerVector subclass(n1);
  subclass.fill(NA_INTEGER);
  subclass.names() = lab;

  IntegerVector ind1;
  if (focal.isNotNull()) {
    ind1 = which(treat == as<int>(focal));
  }
  else {
    ind1 = match(as<CharacterVector>(rownames(mm)), lab) - 1;
  }

  R_xlen_t r = mm.nrow();
  R_xlen_t ki = 0;
  int ri;

  IntegerVector s(r);
  std::vector<std::string> levs;
  levs.reserve(r);

  for (R_xlen_t i : which(!is_na(mm))) {
    ri = i % r; //row

    //If first in column, assign subclass
    if (i / r == 0) {
      ki++;
      s[ri] = ki;
      subclass[ind1[ri]] = ki;
      levs.push_back(std::to_string(ki));
    }

    subclass[mm[i] - 1] = s[ri];
  }

  subclass.attr("class") = "factor";
  subclass.attr("levels") = levs;

  return subclass;
}
MatchIt/src/preprocess_matchC.cpp0000644000176200001440000000335714757124416016563 0ustar liggesusers
#include "internal.h"

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

//Preprocess by pruning unnecessary edges as in Sävje (2020) https://doi.org/10.1214/19-STS699.
//Returns a vector of matrix indices for n1xn0 distance matrix.
// [[Rcpp::export]]
IntegerVector preprocess_matchC(IntegerVector t,
                                NumericVector p) {
  R_xlen_t n = t.size();
  int n1 = std::count(t.begin(), t.end(), 1);
  int n0 = n - n1;

  int i, j;

  Function ord("order");
  IntegerVector o = ord(p);
  o = o - 1;

  //location of each unit after sorting
  IntegerVector im(n);
  int i0 = 0;
  int i1 = 0;
  for (j = 0; j < n; j++) {
    if (t[j] == 0) {
      im[j] = i0;
      i0++;
    }
    else {
      im[j] = i1;
      i1++;
    }
  }

  IntegerVector a(n0), b(n0);

  std::vector<int> queue;
  queue.reserve(n1);
  int k0 = 0;
  int ci = 0;

  for (i = 0; i < n; i++) {
    if (t[o[i]] == 1) {
      queue.push_back(i);
      continue;
    }

    if (queue.size() > k0) {
      a[ci] = queue[k0];
      k0++;
    }
    else {
      a[ci] = i;
    }

    ci++;
  }

  k0 = 0;
  ci = n0 - 1;
  queue.clear();

  for (i = n - 1; i >= 0; i--) {
    if (t[o[i]] == 1) {
      queue.push_back(i);
      continue;
    }

    if (queue.size() > k0) {
      b[ci] = queue[k0];
      k0++;
    }
    else {
      b[ci] = i;
    }

    ci--;
  }

  std::vector<int> keep;
  keep.reserve(n1 * n0);

  ci = 0;
  for (i = 0; i < n; i++) {
    if (t[o[i]] == 1) {
      continue;
    }

    for (j = a[ci]; j <= b[ci]; j++) {
      if (t[o[j]] == 0) {
        continue;
      }

      keep.push_back(im[o[j]] + n1 * im[o[i]]);
    }

    ci++;
  }

  IntegerVector out(keep.size());
  for (i = 0; i < keep.size(); i++) {
    out[i] = keep[i] + 1;
  }

  return out;
}
MatchIt/src/eucdistC.cpp0000644000176200001440000000124414711615505014647 0ustar liggesusers
#include "internal.h"

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
NumericVector eucdistC_N1xN0(const NumericMatrix& x,
                             const IntegerVector& t) {
  IntegerVector ind0 = which(t == 0);
  IntegerVector ind1 = which(t == 1);

  int p = x.ncol();
  int i;
  double d, di;

  NumericVector dist(ind1.size() * ind0.size());
  int k = 0;

  for (double i0 : ind0) {
    for (double i1 : ind1) {
      d = 0;
      for (i = 0; i < p; i++) {
        di = x(i0, i) - x(i1, i);
        d += di * di;
      }
      dist[k] = sqrt(d);
      k++;
    }
  }

  dist.attr("dim") = Dimension(ind1.size(), ind0.size());

  return dist;
}
MatchIt/src/tabulateC.cpp0000644000176200001440000000046414706110107015003 0ustar liggesusers
#include <Rcpp.h>
#include "internal.h"

using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector tabulateC(const IntegerVector& bins,
                        const Nullable<int>& nbins = R_NilValue) {
  int nbins_ = 0;
  if (nbins.isNotNull()) nbins_ = as<int>(nbins);

  return tabulateC_(bins, nbins_);
}
MatchIt/src/has_n_unique.cpp0000644000176200001440000000242614714020352015556 0ustar liggesusers
#include "internal.h"

using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]

// Templated function to check if a vector has exactly n unique values
template <int RTYPE>
bool has_n_unique_(Vector<RTYPE> x,
                   const int& n) {
  Vector<RTYPE> seen(n);
  seen[0] = x[0];
  int n_seen = 1;
  int j;
  bool was_seen;

  // Iterate over the vector and add elements to the unordered set
  for (auto it = x.begin() + 1; it != x.end(); ++it) {
    if (*it == *(it - 1)) {
      continue;
    }

    was_seen = false;

    for (j = 0; j < n_seen; j++) {
      if (*it == seen[j]) {
        was_seen = true;
        break;
      }
    }

    if (!was_seen) {
      n_seen++;
      if (n_seen > n) {
        return false;
      }
      seen[n_seen - 1] = *it;
    }
  }

  // Check if the number of unique elements is exactly n
  return n_seen == n;
}

// Wrapper function to handle different types of R vectors
// [[Rcpp::export]]
bool has_n_unique(const SEXP& x,
                  const int& n) {
  switch (TYPEOF(x)) {
  case INTSXP:
    return has_n_unique_<INTSXP>(x, n);
  case REALSXP:
    return has_n_unique_<REALSXP>(x, n);
  case STRSXP:
    return has_n_unique_<STRSXP>(x, n);
  case LGLSXP:
    return has_n_unique_<LGLSXP>(x, n);
  default:
    stop("Unsupported vector type");
  }
}
MatchIt/NAMESPACE0000644000176200001440000000154514737562473013045 0ustar liggesusers
# Generated by roxygen2: do not edit by hand

S3method(plot,matchit)
S3method(plot,matchit.subclass)
S3method(plot,summary.matchit)
S3method(print,matchit)
S3method(print,summary.matchit)
S3method(print,summary.matchit.subclass)
S3method(rbind,getmatches)
S3method(rbind,matchdata)
S3method(summary,matchit)
S3method(summary,matchit.subclass)
export(add_s.weights)
export(euclidean_dist)
export(get_matches)
export(mahalanobis_dist)
export(match.data)
export(match_data)
export(matchit)
export(robust_mahalanobis_dist)
export(scaled_euclidean_dist)
import(graphics)
import(stats)
importFrom(Rcpp,evalCpp)
importFrom(grDevices,devAskNewPage)
importFrom(grDevices,nclass.FD)
importFrom(grDevices,nclass.Sturges)
importFrom(grDevices,nclass.scott)
importFrom(utils,capture.output)
importFrom(utils,combn)
importFrom(utils,hasName)
useDynLib(MatchIt, .registration = TRUE)

MatchIt/NEWS.md

---
output:
  html_document: default
  pdf_document: default
---

`MatchIt` News and Updates
======

# MatchIt 4.7.1

* Updates for CRAN compatibility.
* Typo fixes in documentation and vignettes, thanks to @iagogv3.

# MatchIt 4.7.0

* For nearest neighbor matching, optimal full matching, and genetic matching, calipers can now be negative, which forces paired units to be further away from each other on the given variables.
* When using `method = "optimal"` to do 1:1 matching on a propensity score, a new preprocessing algorithm is used to speed up the matching process. This algorithm ensures the resulting match is as good as a match that would have been found without the preprocessing step while shrinking the size of the matching problem. Typically, the same matched set will be selected, but units may be paired differently (this is because the general optimization problem often has multiple solutions with the same value of the objective function; this algorithm adds an additional constraint to select among fewer such solutions). This algorithm is described in [Sävje (2020)](https://doi.org/10.1214/19-STS739) and is implemented in C++ to be fast.
* Fixed a bug when matching with a nonzero `ratio` where subclass membership was incorrectly calculated. Thanks to Simon Loewe (@simon-lowe) for originally pointing it out. (#207, #208)
* `match.data()` has been renamed to `match_data()`, but `match.data()` will remain as an alias for backward compatibility.
* Fixed a bug with printing.
* Documentation fixes.

# MatchIt 4.6.0

Most improvements are related to performance. Some of these dramatically improve speeds for large datasets. Most come from improvements to `Rcpp` code.

* When using `method = "nearest"`, `m.order` can now be set to `"farthest"` to prioritize hard-to-match treated units. Note this **does not** implement "far matching" but simply changes the order in which the closest matches are selected.
* Speed improvements to `method = "nearest"`, especially when matching on a propensity score.
* Speed improvements to `summary()` when `pair.dist = TRUE` and a `match.matrix` component is not included in the output (e.g., for `method = "full"` or `method = "quick"`).
* Speed improvements to `method = "subclass"` with `min.n` greater than 0.
* A new `normalize` argument has been added to `matchit()`. When set to `TRUE` (the default, which used to be the only option), the nonzero weights in each treatment group are rescaled to have an average of 1. When `FALSE`, the weights generated directly by the matching are returned instead.
* When using `method = "nearest"` with `m.order = "closest"`, the full distance matrix is no longer computed, which increases support for larger samples. This uses an adaptation of an algorithm described by [Rassen et al. (2012)](https://doi.org/10.1002/pds.3263).
* When using `method = "nearest"` with `verbose = TRUE`, the progress bar now displays an estimate of how much time remains.
* When using `method = "nearest"` with `m.order = "closest"` and `ratio` greater than 1, all eligible units will receive their first match before any receive their second, etc. Previously, the closest pairs would be matched regardless of whether other units had been matched. This ensures consistency with other `m.order` arguments.
* Speed and memory improvements to `method = "cem"` with many covariates and a large sample size. Previous versions used a Cartesian expansion of all levels of factor variables, which could easily explode.
* When using `method = "cem"` with `k2k = TRUE`, `m.order` can be set to select the matching order. Allowable options include `"data"` (the default), `"closest"`, `"farthest"`, and `"random"`. `"closest"` is recommended, but `"data"` is the default for now to remain consistent with previous versions.
* Documentation updates.
* Fixed a bug when using `method = "optimal"` or `method = "full"` with `discard` specified and `data` given as a tibble (`tbl_df` object). (#185)
* Fixed a bug when using `method = "cardinality"` with a single covariate. (#194)

# MatchIt 4.5.5

* When using `method = "cardinality"`, a new solver, HiGHS, can be requested by setting `solver = "highs"`, which relies on the `highs` package. This is much faster and more reliable than GLPK and is free and easy to install as a regular R package with no additional requirements.
* Fixed a bug when using `method = "optimal"` with `discard` and `exact` specified. Thanks to @NikNakk for the issue and fix. (#171)

# MatchIt 4.5.4

* With `method = "nearest"`, `m.order` can now be set to `"closest"` to request that the closest potential pairs are matched first. This can be used whether a propensity score is used or not.
* Fixed bugs when `distance = NULL` and no covariates are specified in `matchit()`.
* Changed "empirical cumulative density function" to "empirical cumulative distribution function" in documentation. (#166)
* Fixed a bug where calipers would not work properly on some systems. Thanks to Bill Dunlap for the solution. (#163)
* Fixed a bug when `.` was present in formulas. Thanks to @dmolitor. (#167)
* Fixed a bug when nearest neighbor matching for the ATC with `distance` supplied as a numeric distance matrix.

# MatchIt 4.5.3

* Error messages have been improved using `chk` and `rlang`, which are now dependencies.
* Fixed a bug when using `method = "nearest"` with `replace = TRUE` and `ratio` greater than 1. Thanks to Julia Kretschmann. (#159)
* Fixed a bug when using `method = "nearest"` with `exact` and `ratio` greater than 1. Thanks to Sarah Conner.
* Fixed a bug that would occur due to numerical imprecision in `plot.matchit()`. Thanks to @hkmztrk. (#158)
* Fixed bugs when using `method = "cem"` where a covariate was to be omitted from coarsening. Thanks to @jfhelmer. (#160)
* Fixed some typos in the vignettes. Thanks to @fBedecarrats. (#156)
* Updated vignettes to use `marginaleffects` v0.11.0 syntax.

# MatchIt 4.5.2

* Fixed a bug when using `method = "quick"` with `exact` specified. Thanks to @m-marquis. (#149)
* Improved performance and fixed some bugs when using `exact` in cases where some strata contain units from only one treatment group. Thanks to @m-marquis and others for pointing these out. (#151)

# MatchIt 4.5.1

* Nearest neighbor matching now uses a much faster algorithm (up to 6x faster) when `distance` is a propensity score and `mahvars` is not specified. Differences in sort order might cause results to differ from previous versions if there are units with identical propensity scores.
* Template matching has been renamed profile matching in all documentation.
* After cardinality or profile matching using `method = "cardinality"` with `ratio` set to a whole number, it is possible to perform optimal Mahalanobis distance matching in the matched sample by supplying the desired matching variables to `mahvars`. Previously, the user had to run a separate pairing step.
* Fixed some typos in the vignettes.
* Fixed a bug where character variables would be flagged as non-finite. Thanks to @isfraser. (#138)
* Added alt text to images in README and vignettes. (#134)

# MatchIt 4.5.0

* Generalized full matching, as described by [Sävje, Higgins, and Sekhon (2021)](https://doi.org/10.1017/pan.2020.32), can now be implemented by setting `method = "quick"` in `matchit()`. It is a dramatically faster alternative to optimal full matching that can support much larger datasets and otherwise has similar balancing performance. See `?method_quick` and `vignette("matching-methods")` for more information. This functionality relies on the `quickmatch` package.
* The package structure has been updated, including the use of Roxygen for documentation. This should not affect use, but the source code will look different from that of previous versions.
* When `method = "subclass"` and `min.n = 0` (which is not the default), any units not placed into a subclass are now considered "unmatched" and given weights of 0. Previously they were left in.
* When `method = "genetic"`, the default `distance.tolerance` is now 0. In previous versions, this argument was ignored; now it is not.
* For `plot.matchit()`, the `which.xs` argument can be specified as a one-sided formula. A new `data` argument is allowed if the variables in that formula are not among the original covariates.
* When a factor variable is supplied to `plot.matchit()` with `type = "density"`, the plot now displays all factor levels in the same plot instead of in separate plots for each level, similar to `cobalt::bal.plot()`.
* The "Estimating Effects" vignette (`vignette("estimating-effects")`) has been rewritten to be much shorter (and hopefully clearer) and to use the `marginaleffects` package, which is now a Suggested package. The new vignette focuses on using g-computation to estimate treatment effects using a single workflow with slight modifications for different situations.
* The error message when covariates have missing or non-finite values is now clearer, identifying which variables are afflicted. This fixes a bug mentioned in #115.
* Fixed a bug when using `matchit()` with `method = "cem"`, `k2k = TRUE`, and `k2k.method = NULL`. Thanks to Florian B. Mayr.
* Fixed a bug when using `method = "optimal"` and `method = "full"` with `exact` and `antiexact` specified, wherein a warning would occur about the `drop` argument in subsetting.
* Fixed a bug where `antiexact` would not work correctly with `method = "nearest"`. Thanks to @gli-1. (#119)
* Fixed typos in the documentation and vignettes.
* Calculating pair distances in `summary()` with `pair.dist = TRUE` is now faster.
* Improved printing of balance results when no covariates are supplied.
* Updates to the Estimating Effects vignette that dramatically increase the speed of the cluster bootstrap for average marginal effects after matching. Thanks to Yohei Hashimoto for pointing out the inefficiency.
* Updates to the Assessing Balance vignette to fix errors.
* All vignettes and help files are better protected against Suggested packages not available on CRAN.

# MatchIt 4.4.0

* `optmatch` has returned to CRAN, now with an open-source license! A new `solver` argument can be passed to `matchit()` with `method = "full"` and `method = "optimal"` to control the solver used to perform the optimization used in the matching. Note that using the default (open source) solver LEMON may yield results different from those obtained prior to `optmatch` 0.10.0. For reproducibility questions, please contact the `optmatch` maintainers.
* New functions have been added to compute the Euclidean distance (`euclidean_dist()`), scaled Euclidean distance (`scaled_euclidean_dist()`), Mahalanobis distance (`mahalanobis_dist()`), and robust Mahalanobis distance (`robust_mahalanobis_dist()`). They produce distance matrices that can be supplied to the `distance` argument of `matchit()`, but see below.
* New distance options are available for `matchit()` based on the distance functions above: `"robust_mahalanobis"`, `"euclidean"`, and `"scaled_euclidean"`, which complement `"mahalanobis"`. Similar to `"mahalanobis"`, these do not involve estimating a propensity score but rather operate on the covariates directly. These can be used for nearest neighbor matching, optimal matching, full matching, and coarsened exact matching with `k2k = TRUE`.
* The Mahalanobis distance is now computed using the pooled within-group covariance matrix (computed by treatment group-mean centering each covariate before computing the covariance in the full sample), in line with how it is computed in `optmatch` and recommended by Rubin (1980) among others. This will cause results to differ between this version and prior versions of `MatchIt` that used the Mahalanobis distance computed ignoring group membership.
* Added the `unit.id` argument to `matchit()` with `method = "nearest"`, which defines unit IDs so that if a control observation with a given unit ID has been matched to a treated unit, no other control units with the same ID can be used as future matches, ensuring each unit ID is used no more than once. This is useful when, e.g., multiple rows correspond to the same control firm but you only want each control firm to be matched once, in which case firm ID would be supplied to `unit.id`. See [here](https://stats.stackexchange.com/questions/349784/propensity-matching-and-analysis-of-resultant-data-on-a-data-set-with-repeated-m) for an example use case.
* In `summary.matchit()`, `improvement` is now set to `FALSE` by default to hide the percentage improvement in balance. Set to `TRUE` to recover prior behavior.
* Added clearer errors when required packages are missing for certain `distance` methods.
* Fixed a bug when using `matchit()` with `method = "nearest"`, `ratio` greater than 1, and `reuse.max` specified. The bug allowed a previously matched control unit to be matched to the same treatment unit, thereby essentially ignoring the `ratio` argument. It now works as intended.
* Fixed a bug in `matchit()` with `method = "nearest"` when `distance` was supplied as a matrix and `Inf` values were present.
* Fixed a bug when using exact matching that caused an infinite loop when variable levels contained commas. Thanks to @bking124. (#111)
* Fixed a bug introduced by `optmatch` version 0.10.3.
* Documentation updates.
* Updated the logo, thanks to [Ben Stillerman](https://stillben.com).

# MatchIt 4.3.4

* `optmatch` has been removed from CRAN. Instructions on installing it are in `?method_optimal` and `?method_full`.
* When `s.weights` are supplied with `distance = "randomforest"`, the weights are supplied to `randomForest::randomForest()`.
* Improved conditional use of packages, especially `optmatch`. This may mean that certain examples fail to run in the vignettes.

# MatchIt 4.3.3

* Fixed a bug where `rbind.matchdata()` would produce datasets twice their expected length. Thanks to @sconti555. (#98)

# MatchIt 4.3.2

* Fixed a bug where the `q.cut` component of the `matchit` object was not included when `method = "subclass"`. Now it is. Thanks to @aldencabajar. (#92)
* The `nn` and `qn` components of the `matchit` object have been removed. They are now computed by `summary.matchit()` and included in the `summary.matchit` object.
* Removed the code to disable compiler checks to satisfy CRAN requirements.

# MatchIt 4.3.1

* Added the `reuse.max` argument to `matchit()` with `method = "nearest"`. This controls the maximum number of times each control unit can be used as a match. Setting `reuse.max = 1` is equivalent to matching without replacement (i.e., like setting `replace = FALSE`), and setting `reuse.max = Inf` is equivalent to matching with replacement with no restriction on the reuse of controls (i.e., like setting `replace = TRUE`). Values in between restrict how many times each control unit can be used as a match. Higher values will tend to improve balance but decrease precision.
* Mahalanobis distance matching with `method = "nearest"` is now a bit faster.
* Fixed a bug where `method = "full"` would fail when some exact matching strata contained exactly one treated unit and exactly one control unit. (#88)
* Fixed a bug introduced in 4.3.0 where the inclusion of character variables would cause the error `"Non-finite values are not allowed in the covariates."` Thanks to Moaath Mustafa.
* Documentation updates.

# MatchIt 4.3.0

* Cardinality and template matching can now be used by setting `method = "cardinality"` in `matchit()`. These methods use mixed integer programming to directly select, without pairing or stratifying units, a matched subsample that satisfies user-supplied balance constraints. Their results can be dramatically improved when using the Gurobi optimizer. See `?method_cardinality` and `vignette("matching-methods")` for more information.
* Added `"lasso"`, `"ridge"`, and `"elasticnet"` as options for `distance`. These estimate propensity scores using lasso, ridge, or elastic net regression, respectively, as implemented in the `glmnet` package.
* Added `"gbm"` as an option for `distance`. This estimates propensity scores using generalized boosted models as implemented in the `gbm` package. This implementation differs from that in `twang` by using cross-validation or out-of-bag error to choose the tuning parameter as opposed to balance.
* A new argument, `include.obj`, has been added to `matchit()`. When `TRUE`, the intermediate matching object created internally will be included in the output in the `obj` component. See the individual methods pages for information on what is included in each output. This is ignored for some methods.
* Density plots can now be requested using `plot.matchit()` by setting `type = "density"`. These display the density of each covariate in the treatment groups before and after matching and are similar to the plots created by `cobalt::bal.plot()`. Density plots can be easier to interpret than eCDF plots. `vignette("assessing-balance")` has been updated with this addition.
* A clearer error is now produced when the treatment variable is omitted from the `formula` argument to `matchit()`.
* Improvements in how `match.data()` finds the original dataset. It's still always safer to supply an argument to `data`, but now `match.data()` will look in the environment of the `matchit` formula, then the calling environment of `match.data()`, then the `model` component of the `matchit` object. A clearer error message is now printed when a valid dataset cannot be found in these places.
* Fixed a bug that would occur when using `summary.matchit()` with just one covariate.
* When `verbose = TRUE` and a propensity score is estimated (i.e., using the `distance` argument), a message saying so will be displayed.
* Fixed a bug in `print.matchit()` where it would indicate that the propensity score was used in a caliper if any caliper was specified, even if not on the propensity score. Now, it will only indicate that the propensity score was used in a caliper if it actually was.
* Fixed a bug in `plot.matchit()` that would occur when a level of a factor had no values.
* Speed improvements for `method = "full"` with `exact` specified. These changes can make current results differ slightly from past results when the `tol` value is high. It is recommended to always use a low value of `tol`.
* Typo fixes in documentation and vignettes.
* Fixed a bug where supplying a "GAM" string to the `distance` argument (i.e., using the syntax prior to version 4.0.0) would ignore the link supplied.
* When an incompatible argument is supplied to `matchit()` (e.g., `reestimate` with `distance = "mahalanobis"`), an error or warning will only be produced when that argument has been set to a value other than its default (e.g., so setting `reestimate = FALSE` will no longer throw an error). This fixes an issue brought up by Vu Ng when using `MatchThem`.
* A clearer error is produced when non-finite values are present in the covariates.

# MatchIt 4.2.0

* `distance` can now be supplied as a distance matrix containing pairwise distances with nearest neighbor, optimal, and full matching. This means users can create a distance matrix outside `MatchIt` (e.g., using `optmatch::match_on()` or `dist()`) and `matchit()` will use those distances in the matching. See `?distance` for details.
* Added `rbind.matchdata()` method for `matchdata` and `getmatches` objects (the output of `match.data()` and `get_matches()`, respectively) to avoid subclass conflicts when combining matched samples after matching within subgroups.
* Added a section in `vignette("estimating-effects")` on moderation analysis with matching, making use of the new `rbind()` method.
* Added `antiexact` argument to perform anti-exact matching, i.e., matching that ensures treated and control units have different values of certain variables. See [here](https://stackoverflow.com/questions/66526115/propensity-score-matching-with-panel-data) and [here](https://stackoverflow.com/questions/61120201/avoiding-duplicates-from-propensity-score-matching?rq=1) for examples where this feature was requested and might be useful. Anti-exact matching works with nearest neighbor, optimal, full, and genetic matching. The argument to `antiexact` should be similar to an argument to `exact`: either a string or a one-sided `formula` containing the names of the anti-exact matching variables.
* Slight speed improvements for nearest neighbor matching, especially with `exact` specified.
* With `method = "nearest"`, `verbose = TRUE`, and `exact` specified, separate messages and progress bars will be shown for each subgroup of the `exact` variable(s).
* A spurious warning that would appear when using a large `ratio` with `replace = TRUE` and `method = "nearest"` no longer appears.
* Fixed a bug when trying to supply `distance` as a labeled numeric vector (e.g., resulting from `haven`).
* Fixed some typos in the documentation and vignettes.

# MatchIt 4.1.0

* Coarsened exact matching (i.e., `matchit()` with `method = "cem"`) has been completely rewritten and no longer involves the `cem` package, eliminating some spurious warning messages and fixing some bugs. All the same arguments can still be used, so old code will run, though some results will differ slightly. Additional options are available for matching and performance has improved. See `?method_cem` for details on the differences between the implementation in the current version of `MatchIt` and that in `cem` and older versions of `MatchIt`. In general, these changes make coarsened exact matching function as one would expect it to, circumventing some peculiarities and bugs in the `cem` package.
* Variable ratio matching is now compatible with `method = "optimal"` in the same way it is with `method = "nearest"`, i.e., by using the `min.controls` and `max.controls` arguments.
* With `method = "full"` and `method = "optimal"`, the maximum problem size has been set to unlimited, so that larger datasets can be used with these methods without error. They may take a long time to run, though.
* Processing improvements with `method = "optimal"` due to rewriting some functions in `Rcpp`.
* Using `method = "optimal"` runs more smoothly when combining it with exact matching through the `exact` argument.
* When using `ratio` different from 1 with `method = "nearest"` and `method = "optimal"` and with exact matching, errors and warnings about the number of units that will be matched are clearer. Certain `ratio`s that would produce errors now only produce warnings.
* Fixed a bug when no argument was supplied to `data` in `matchit()`.
* Improvements to vignettes and documentation.

# MatchIt 4.0.1

* Restored `cem` functionality after it had been taken down and re-uploaded.
* Added `pkgdown` website.
* Computing matching weights after matching with replacement is faster due to programming in `Rcpp`.
* Fixed issues with `Rcpp` code that required C++11. C++11 has been added to SystemRequirements in DESCRIPTION, and `MatchIt` now requires R version 3.1.0 or later.

# MatchIt 4.0.0

## General Fixes and New Features

* `match.data()`, which is used to create matched datasets, has a few new arguments. The `data` argument can be supplied with a dataset that will have the matching weights and subclasses added. If not supplied, `match.data()` will try to figure out the appropriate dataset like it did in the past. The `drop.unmatched` argument controls whether unmatched units are dropped from the output. The default is `TRUE`, consistent with past behavior. Warnings are now more informative.
* `get_matches()`, which seems to have been rarely used since it performed a similar function to `match.data()`, has been revamped. It creates a dataset with one row per unit per matched pair. If a unit is part of two separate pairs (e.g., as a result of matching with replacement), it will get two rows in the output dataset. The goal here was to be able to implement standard error estimators that rely both on repeated use of the same unit and subclass/pair membership, e.g., Austin & Cafri (2020). Otherwise, it functions similarly to `match.data()`. *NOTE: the changes to `get_matches()` are breaking changes! Legacy code will not work with the new syntax!*
* `print.matchit()` has completely changed and now prints information about the matching type and specifications. `summary.matchit()` contains all the information that was in the old `print` method.
* A new function, `add_s.weights()`, adds sampling weights to `matchit` objects for use in balance checking and effect estimation. Sampling weights can also be directly supplied to `matchit()` through the new `s.weights` argument. A new vignette describing how to use `MatchIt` with sampling weights is available at `vignette("sampling-weights")`.
* The included dataset, `lalonde`, now uses a `race` variable instead of separate `black` and `hispan` variables. This makes it easier to see how character variables are treated by `MatchIt` functions.
* Added extensive documentation for every function, matching method, and distance specification. Documentation no longer links to `gking.harvard.edu/matchit` as it now stands alone.

## `matchit()`

* An argument to `data` is no longer required if the variables in `formula` are present in the environment.
* When missing values are present in the dataset but not in the treatment or matching variables, the error that used to appear no longer does.
* The `exact` argument can be supplied either as a character vector of names of variables in `data` or as a one-sided formula. A full cross of all included variables will be used to create bins within which matching will take place.
* The `mahvars` argument can also be supplied either as a character vector of names of variables in `data` or as a one-sided formula. Mahalanobis distance matching will occur on the variables in the formula, processed by `model.matrix()`. Use this when performing Mahalanobis distance matching on some variables within a caliper defined by the propensity scores estimated from the variables in the main `formula` using the argument to `distance`. For regular Mahalanobis distance matching (without a propensity score caliper), supply the variables in the main `formula` and set `distance = "mahalanobis"`.
* The `caliper` argument can now be specified as a numeric vector with a caliper for each variable named in it.
This means you can separately impose calipers on individual variables as well as or instead of the propensity score. For example, to require that units within pairs must be no more than .2 standard deviations of `X1` away from each other, one could specify `caliper = c(X1 = .2)`. A new option `std.caliper` allows the choice of whether the caliper is in standard deviation units or not, and one value per entry in `caliper` can be supplied. An unnamed entry to `caliper` applies the caliper to the propensity score and the default of `std.caliper` is `FALSE`, so this doesn't change the behavior of old code. These options only apply to the methods that accept calipers, namely `"nearest"`, `"genetic"`, and `"full"`. * A new `estimand` argument can be supplied to specify the target estimand of the analysis. For all methods, the ATT and ATC are available with the ATT as the default, consistent with prior behavior. For some methods, the ATE is additionally available. Note that setting the estimand doesn't actually mean that estimand is being targeted; if calipers, common support, or other restrictions are applied, the target population will shift from that requested. `estimand` just triggers the choice of which level of the treatment is focal and what formula should be used to compute weights from subclasses. * In methods that accept it, `m.order` can be set to "`data`", which matches in the order the data appear. With `distance = "mahalanobis"`, `m.order` can be "`random`" or "`data`", with "`data`" as the default. Otherwise, `m.order` can be `"largest"`, `"smallest"`, `"random"`, or `"data"`, with `"largest"` as the default (consistent with prior behavior). * The output to `matchit()` has changed slightly; the component `X` is now a data frame, the result of a call to `model.frame()` with the formula provided. If `exact` or `mahvars` are specified, their variables are included as well, if not already present. It is included for all methods and is the same for all methods. 
In the past, it was the result of a call to `model.matrix()` and was only included for some methods. * When key arguments are supplied to methods that don't accept them, a warning will be thrown. * `method` can be set to `NULL` to not perform matching but create a `matchit` object, possibly with a propensity score estimated using `distance` or with a common support restriction using `discard`, for the purpose of supplying to `summary.matchit()` to assess balance prior to matching. ### `method = "nearest"` * Matching is much faster due to re-programming with `Rcpp`. * With `method = "nearest"`, a `subclass` component containing pair membership is now included in the output when `replace = FALSE` (the default), as it has been with optimal and full matching. * When using `method = "nearest"` with `distance = "mahalanobis"`, factor variables can now be included in the main `formula`. The design matrix no longer has to be full rank because a generalized inverse is used to compute the Mahalanobis distance. * Unless `m.order = "random"`, results will be identical across runs. Previously, several random choices would occur to break ties. Ties are broken based on the order of the data; shuffling the order of the data may therefore yield different matches. * When using `method = "nearest"` with a caliper specified, the nearest control unit will be matched to the treated unit if one is available. Previously, a random control unit within the caliper would be selected. This eliminates the need for the `calclosest` argument, which has been removed. * Variable ratio extremal matching as described by Ming & Rosenbaum (2000) can be implemented using the new `min.controls` and `max.controls` arguments. * Added ability to display a progress bar during matching, which can be activated by setting `verbose = TRUE`. ### `method = "optimal"` and `method = "full"` * Fixed bug in `method = "optimal"`, which produced results that did not match `optmatch`. Now they do. 
* Added support for optimal and full Mahalanobis distance matching by setting `distance = "mahalanobis"` with `method = "optimal"` and `method = "full"`. Previously, both methods would perform a random match if `distance` was set to `"mahalanobis"`. Now they use the native support in `optmatch::pairmatch()` and `optmatch::fullmatch()` for Mahalanobis distance matching.

* Added support for exact matching with `method = "optimal"` and `method = "full"`. As with `method = "nearest"`, the names of the variables for which exact matches are required should be supplied to the `exact` argument. This relies on `optmatch::exactMatch()`.

* The warning that used to occur about the order of the match not being guaranteed to be the same as that of the original data no longer occurs.

* For `method = "full"`, the `estimand` argument can be set to `"ATT"`, `"ATC"`, or `"ATE"` to compute matching weights that correspond to the given estimand. See `?matchit` for details on how weights are computed for each `estimand`.

### `method = "genetic"`

* Fixed a bug with `method = "genetic"` that caused an error with some `ratio` greater than 1.

* The default of `replace` in `method = "genetic"` is now `FALSE`, as it is with `method = "nearest"`.

* When `verbose = FALSE`, the default, no output is printed with `method = "genetic"`. With `verbose = TRUE`, the printed output of `Matching::GenMatch()` with `print.level = 2` is displayed.

* The `exact` argument now correctly functions with `method = "genetic"`. Previously, it had to be specified in accordance with its use in `Matching::GenMatch()`.

* Different ways to match on variables are now allowed with `method = "genetic"`, similar to how they are with `method = "nearest"`. If `distance = "mahalanobis"`, no propensity score will be computed, and genetic matching will be performed just on the variables supplied to `formula`. If `mahvars` is specified, genetic matching will be performed on the variables supplied to `mahvars`, but balance will be optimized on all covariates supplied to `formula`. Otherwise, genetic matching will be performed on the variables supplied to `formula` and the propensity score. Previously, `mahvars` was ignored. Balance is now always optimized on the variables included in `formula` and never on the propensity score, whereas in the past the propensity score was always included in the balance optimization.

* The `caliper` argument now works as it does with `method = "nearest"` and other methods rather than needing to be supplied in a way that `Matching::Match()` would accept.

* A `subclass` component is now included in the output when `replace = FALSE` (the default), as it has been with optimal and full matching.

### `method = "cem"` and `method = "exact"`

* With `method = "cem"`, the `k2k` argument is now recognized. Previously it was ignored unless an argument to `k2k.method` was supplied.

* The `estimand` argument can be set to `"ATT"`, `"ATC"`, or `"ATE"` to compute matching weights that correspond to the given estimand. Previously, only ATT weights were computed. See `?matchit` for details on how weights are computed for each `estimand`.

### `method = "subclass"`

* Performance improvements.

* A new argument, `min.n`, can be supplied, which controls the minimum size a treatment group can be in each subclass. When any estimated subclass doesn't have enough members from a treatment group, units from other subclasses are pulled to fill it so that every subclass will have at least `min.n` units from each treatment group. This uses the same mechanism as is used in `WeightIt`. The default `min.n` is 1, ensuring there is at least one treated and one control unit in each subclass.

* Rather than producing warnings and just using the default number of subclasses (6), an error will now occur when an inappropriate argument is supplied to `subclass`.
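As a sketch of the new subclassification options (the number of subclasses and `min.n` value here are illustrative, not recommendations):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Propensity score subclassification targeting the ATE, requiring at
# least 2 units from each treatment group in every subclass
s.out <- matchit(treat ~ age + educ + re74 + re75, data = lalonde,
                 method = "subclass", subclass = 6,
                 estimand = "ATE", min.n = 2)
s.out
```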
* The new `subclass` argument to `summary()` can be used to control whether subclass balance statistics are computed; it can be `TRUE` (display balance for all subclasses), `FALSE` (display balance for no subclasses), or a vector of subclass indices on which to assess balance. The default is `FALSE`.

* With `summary()`, balance aggregating across subclasses is now computed using subclass weights instead of by combining the subclass-specific balance statistics.

* The `sub.by` argument has been replaced with `estimand`, which can be set to `"ATT"`, `"ATC"`, or `"ATE"` to replace the `sub.by` inputs of `"treat"`, `"control"`, and `"all"`, respectively. Previously, weights for `sub.by` values other than `"treat"` were incorrect; they are now correctly computed for all inputs to `estimand`.

### `distance`

* The allowable options to `distance` have changed slightly. The input should be either `"mahalanobis"` for Mahalanobis distance matching (without a propensity score caliper), a numeric vector of distance values (i.e., values whose absolute pairwise differences form the distances), or a string naming a propensity score estimation method. The new allowable values include `"glm"` for propensity scores estimated with `glm()`, `"gam"` for propensity scores estimated with `mgcv::gam()`, `"rpart"` for propensity scores estimated with `rpart::rpart()`, `"nnet"` for propensity scores estimated with `nnet::nnet()`, `"cbps"` for propensity scores estimated with `CBPS::CBPS()`, or `"bart"` for propensity scores estimated with `dbarts::bart2()`. To specify a link (e.g., for probit regression), supply an argument to the new `link` parameter. For linear versions of the propensity score, specify `link` as `"linear.{link}"`. For example, for linear probit regression propensity scores, one should specify `distance = "glm", link = "linear.probit"`. The default `distance` is `"glm"` and the default `link` is `"logit"`, so these can be omitted if logistic regression propensity scores are desired. Not all methods accept a `link`, and for those that don't, it will be ignored. If an old-style `distance` is supplied, it will be converted to an appropriate specification with a warning (except for `distance = "logit"`, which will be converted without a warning).

* Added `"cbps"` as an option for `distance`. This estimates propensity scores using the covariate balancing propensity score (CBPS) algorithm as implemented in the `CBPS` package. Set `link = "linear"` to use a linear version of the CBPS.

* Added `"bart"` as an option for `distance`. This estimates propensity scores using Bayesian additive regression trees (BART) as implemented in the `dbarts` package.

* Added `"randomforest"` as an option for `distance`. This estimates propensity scores using random forests as implemented in the `randomForest` package.

* Bugs in `distance = "rpart"` have been fixed.

## `summary.matchit()`

* When `interactions = TRUE`, interactions are no longer computed with the distance measure or between dummy variables of the same factor. Variable names are cleaned up and easier to read.

* The argument to `addlvariables` can be specified as a data frame or matrix of covariates, a formula with the additional covariates (and transformations) on the right side, or a character vector containing the names of the additional covariates. For the latter two, if the variables named do not exist in the `X` component of the `matchit` output object or in the environment, an argument to `data` can be supplied to `summary()` that contains these variables.

* The output of `summary()` is now the same for all methods (except subclassification). Previously there were different methods for a few different types of matching.

* The eCDF median (and QQ median) statistics have been replaced with the variance ratio, which is better studied and part of several sets of published recommendations. The eCDF and QQ median statistics provide little information above and beyond the corresponding mean statistics. The variance ratio uses the variances weighted by the matching weights.

* The eCDF and QQ statistics have been adjusted. Both now use the weights that were computed as part of the matching. The eCDF and QQ statistics for binary variables are set to the difference in group proportions. The standard deviation of the control group has been removed from the output.

* The default for `standardize` is now `TRUE`, so standardized mean differences and eCDF statistics will be displayed by default.

* A new column for the average absolute pair difference for each covariate is included in the output. The values indicate how far treated and control units within pairs are from each other. An additional argument to `summary.matchit()`, `pair.dist`, controls whether this value is computed. It can take a long time to compute for some matching methods and can be omitted to speed up computation.

* Balance prior to matching can now be suppressed by setting `un = FALSE`.

* Percent balance improvement can now be suppressed by setting `improvement = FALSE`. When `un = FALSE`, `improvement` is automatically set to `FALSE`.

## `plot.matchit()`

* Plots now use weighted summaries when weights are present, removing the need for the `num.draws` argument.

* Added a new plot type, `"ecdf"`, which creates empirical CDF plots before and after matching.

* The appearance of some plots has improved (e.g., text is appropriately centered, axes are more clearly labeled). For eQQ plots with binary variables or variables that take on only a few values, the plots look more like clusters than snakes.

* The argument to `type` can be abbreviated (e.g., `"j"` for jitter).

* Fixed a bug that caused all plots generated after using `plot(., type = "hist")` to be small.

* When specifying an argument to `which.xs` to control for which variables balance is displayed graphically, the input should be the name of the original variable rather than the version that appears in the `summary()` output. In particular, if a factor variable was supplied to `matchit()`, it should be referred to by its name rather than the names of its split dummies. This makes it easier to view balance on factor variables without having to know or type the names of all their levels.

* eQQ plots can now be used with all matching methods. Previously, attempting `plot()` after `method = "exact"` would fail.

## `plot.summary.matchit()`

* The summary plot has been completely redesigned. It is now a Love plot made using `graphics::dotchart()`. A few options are available for ordering the variables, presenting absolute or raw standardized mean differences, and placing threshold lines on the plots. For a more sophisticated interface, see `cobalt::love.plot()`, which natively supports `matchit` objects and uses `ggplot2` as its engine.

---
title: "Assessing Balance"
author: "Noah Greifer"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Assessing Balance}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE,
                      fig.width = 7, fig.height = 5,
                      fig.align = "center")
options(width = 200, digits = 4)
```

## Introduction

Covariate balance is the degree to which the distribution of covariates is similar across levels of the treatment. It has three main roles in causal effect estimation using matching: 1) as a target to optimize with matching, 2) as a method of assessing the quality of the resulting matches, and 3) as evidence to an audience that the estimated effect is close to the true effect. When covariate balance is achieved, the resulting effect estimate is less sensitive to model misspecification and ideally close to the true treatment effect.
The benefit of randomization is that covariate balance is achieved automatically (in expectation), which is why unadjusted effects estimated from randomized trial data (in the absence of drop-out) can be validly interpreted as causal effects. When using matching to recover causal effect estimates from observational data, balance is not guaranteed and must be assessed.

This document provides instructions for assessing and reporting covariate balance as part of a matching analysis. The tools available in `MatchIt` for balance assessment should be used during the process of selecting a good matching scheme and ensuring that the chosen scheme is adequate. These tools implement the recommendations of @ho2007 and others for assessing balance.

In addition to the tools available in `MatchIt`, the `cobalt` package has a suite of functions designed to assess and display balance and is directly compatible with `MatchIt` objects. `cobalt` has extensive documentation, but we describe some of its functionality here as a complement to the tools in `MatchIt`.

The structure of this document is as follows: first, we describe some of the recommendations for balance checking and their rationale; next, we describe the tools for assessing balance present in `MatchIt` and display their use in evaluating several matching schemes; finally, we briefly describe some of the functionality in `cobalt` that extends that in `MatchIt`.

## Recommendations for Balance Assessment

Assessing balance involves assessing whether the distributions of covariates are similar between the treated and control groups. Balance is typically assessed by examining univariate balance summary statistics for each covariate, though more complicated methods exist for assessing joint distributional balance as well. Visual depictions of distributional balance can be a helpful complement to numerical summaries, especially for hard-to-balance and prognostically important covariates.
Many recommendations for balance assessment have been described in the methodological literature. Unfortunately, there is no single best way to assess balance or to weigh balance summary statistics, because the degree and form of balance that will yield the least bias in an effect estimate depends on unknown qualities of the outcome data-generating model. Nonetheless, there are a number of valuable recommendations that can be implemented to ensure matching is successful at eliminating or reducing bias. We review some of these here.

Common recommendations for assessing balance include the following:

- **Standardized mean differences**. The standardized mean difference (SMD) is the difference in the means of each covariate between treatment groups standardized by a standardization factor so that it is on the same scale for all covariates. The standardization factor is typically the standard deviation of the covariate in the treated group when targeting the ATT or the pooled standard deviation across both groups when targeting the ATE. The standardization factor should be the same before and after matching to ensure changes in the mean difference are not confounded by changes in the standard deviation of the covariate. SMDs close to zero indicate good balance. Several recommended thresholds have been published in the literature; we recommend .1, and .05 for prognostically important covariates. Higher values may be acceptable when using covariate adjustment in the matched sample. In addition to computing SMDs on the covariates themselves, it is important to compute them on squares, cubes, and higher exponents, as well as on interactions between covariates. Several empirical studies have examined the appropriateness of using SMDs in balance assessment, including @belitser2011, @ali2014, and @stuart2013; in general, there is often a high correlation between the mean or maximum absolute SMD and the degree of bias in the treatment effect.

- **Variance Ratios**.
The variance ratio is the ratio of the variance of a covariate in one group to that in the other. Variance ratios close to 1 indicate good balance because they imply that the variances of the samples are similar [@austin2009].

- **Empirical CDF Statistics**. Statistics related to the difference in the empirical cumulative distribution functions (eCDFs) of each covariate between groups allow assessment of imbalance across the entire distribution of that covariate rather than just its mean or variance. The maximum eCDF difference, also known as the Kolmogorov-Smirnov statistic, is sometimes recommended as a useful supplement to SMDs for assessing balance [@austin2015] and is often used as a criterion in propensity score methods that attempt to optimize balance [e.g., @mccaffrey2004; @diamond2013]. Although the mean eCDF difference has not been as well studied, it provides a summary of imbalance that may be missed by relying solely on the maximum difference.

- **Visual Diagnostics**. Visual diagnostics such as eCDF plots, empirical quantile-quantile (eQQ) plots, and kernel density plots can be used to see exactly how the covariate distributions differ from each other, i.e., where in the distribution the greatest imbalances are [@ho2007; @austin2009]. This can help in figuring out how to tailor a matching method to target imbalance in a specific region of the covariate distribution.

- **Prognostic scores**. The prognostic score is an estimate of the potential outcome under control for each unit [@hansen2008]. Balance on the prognostic score has been shown to be highly correlated with bias in the effect estimate, making it a useful tool in balance assessment [@stuart2013]. Estimating the prognostic score requires having access to the outcome data, and using it may be seen as violating the principle of separating the design and analysis stages of a matching analysis [@rubin2001].
However, because only the outcome values from the control group are required to use the prognostic score, some separation is maintained.

Several multivariate statistics exist that summarize balance across the entire joint covariate distribution. These can be functions of the above measures, like the mean or maximum absolute SMD or the generalized weighted distance [GWD; @franklin2014], which is the sum of SMDs for the covariates and their squares and interactions, or separate statistics that measure quantities that abstract away from the distribution of individual covariates, like the L1 distance [@iacus2011], the cross-match test [@heller2010], or the energy distance [@huling2020].

Balance on the propensity score has often been considered a useful measure of balance, but we do not recommend relying on it except as a supplement to balance on the covariates. Propensity score balance will generally be good with any matching method regardless of the covariate balancing potential of the propensity score, so a balanced propensity score does not imply balanced covariates [@austin2009]. Similarly, covariates may be well balanced even if the propensity score is not, such as when covariates are prioritized above the propensity score in the matching specification (e.g., with genetic matching). Given these observations, the propensity score should not be relied upon for assessing covariate balance. Simulation studies by @stuart2013 provide evidence for this recommendation against relying on propensity score balance.

There has been some debate about the use of hypothesis tests, such as t-tests or Kolmogorov-Smirnov tests, for assessing covariate balance. The idea is that balance tests test the null hypothesis that the matched sample has equivalent balance to a randomized experiment.
There are several problems with balance tests, described by @ho2007 and @imai2008: 1) balance is a property of the sample, not of a population from which the sample was drawn; 2) the power of balance tests depends on the sample size, which changes during matching even if balance does not change; and 3) the use of hypothesis tests implies a uniform decision criterion for rejecting the null hypothesis (e.g., a p-value less than .05, potentially with corrections for multiple comparisons), when balance should be improved without limit. `MatchIt` does not report any balance tests or p-values, instead relying on the descriptive statistics described above.

## Recommendations for Balance Reporting

A variety of methods should be used when assessing balance to try to find an optimal matched set that will ideally yield a low-error estimate of the desired effect. However, reporting every balance statistic or plot in a research report or publication can be burdensome and unnecessary. That said, it is critical to report balance to demonstrate to readers that the resulting estimate is approximately unbiased and relies little on extrapolation or correct outcome model specification. We recommend the following when reporting balance in a matching analysis:

- Report SMDs before and after matching for each covariate, any prognostically important interactions between covariates, and the prognostic score; these can be reported in a table or in a Love plot.

- Report summaries of balance for other statistics, e.g., the largest mean and maximum eCDF difference among the covariates and the largest SMD among squares, cubes, and interactions of the covariates.

`MatchIt` provides tools for calculating each of these statistics so they can be reported with ease in a manuscript or report.

## Assessing Balance with `MatchIt`

`MatchIt` contains several tools to assess balance numerically and graphically.
The primary balance assessment function is `summary.matchit()`, which is called when using `summary()` on a `matchit` object and produces several tables of balance statistics before and after matching. `plot.summary.matchit()` generates a Love plot using R's base graphics system containing the standardized mean differences resulting from a call to `summary.matchit()` and provides a nice way to display balance visually for inclusion in an article or report. `plot.matchit()` generates several plots that display different elements of covariate balance, including propensity score overlap and distribution plots of the covariates. Together, these functions form a suite that can be used to assess and report balance in a variety of ways.

To demonstrate `MatchIt`'s balance assessment capabilities, we will use the Lalonde data included in `MatchIt` and used in `vignette("MatchIt")`. We will perform 1:1 nearest neighbor matching with replacement on the propensity score, though the functionality is identical across all matching methods except propensity score subclassification, which we illustrate at the end.

```{r}
library("MatchIt")
data("lalonde", package = "MatchIt")

#1:1 NN matching w/ replacement on a logistic regression PS
m.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, replace = TRUE)
m.out
```

### `summary.matchit()`

When `summary()` is called on a `matchit` object, several tables of information are displayed. These include balance statistics for each covariate before matching, balance statistics for each covariate after matching, the percent reduction in imbalance after matching, and the sample sizes before and after matching. `summary.matchit()` has four additional arguments that control how balance is computed:

- `interactions` controls whether balance statistics for all squares and pairwise interactions of the covariates are to be displayed in addition to the covariates.
The default is `FALSE`; setting it to `TRUE` can make the output massive when many covariates are present, but it is important to ensure no important interactions remain imbalanced.

- `addlvariables` allows balance to be assessed on variables other than those inside the `matchit` object. For example, if the distance between units only relied on a subset of covariates but balance needed to be achieved on all covariates, `addlvariables` could be used to supply these additional covariates. In addition to adding other variables, `addlvariables` can be used to request balance on specific functions of the covariates already in the `matchit` object, such as polynomial terms or interactions. The input to `addlvariables` can be a one-sided formula with the covariates and any desired transformations thereof on the right-hand side, just like a model formula (e.g., `addlvariables = ~ X1 + X2 + I(X1^2)` would request balance on `X1`, `X2`, and the square of `X1`). Additional variables supplied to `addlvariables` but not present in the `matchit` object can be supplied as a data frame using the `data` argument.

- `standardize` controls whether standardized or unstandardized statistics are to be displayed. Standardized statistics include the standardized mean difference and eCDF statistics; unstandardized statistics include the raw difference in means and eQQ plot statistics. (Regardless, the variance ratio will always be displayed.) The default is `TRUE` for standardized statistics, which are more common to report because they are all on the same scale regardless of the scale of the covariates[^1].

- `pair.dist` controls whether within-pair distances should be computed and displayed. These reflect the average distance between units within the same pair, standardized or unstandardized according to the argument to `standardize`. The default is `TRUE`.
With full matching, exact matching, coarsened exact matching, and propensity score subclassification, computing pair distances can take a long time, so it may be beneficial to set `pair.dist = FALSE` in these cases.

[^1]: Note that versions of `MatchIt` before 4.0.0 had `standardize` set to `FALSE` by default.

In addition, the arguments `un` (default: `TRUE`) and `improvement` (default: `FALSE`) control whether balance prior to matching should be displayed and whether the percent balance improvement after matching should be displayed. These can be set to `FALSE` to reduce the output.

Below, we call `summary.matchit()` with `addlvariables` to display balance on the covariates and a few functions of them in the matched sample. In particular, we request balance on the square of `age`, the variables representing whether `re74` and `re75` were equal to 0, and the interaction between `educ` and `race`.

```{r}
summary(m.out, addlvariables = ~ I(age^2) + I(re74==0) + I(re75==0) + educ:race)
```

Let's examine the output in detail. The first table (`Summary of Balance for All Data`) provides balance in the sample prior to matching. The included statistics are the mean of each covariate in the treated group (`Means Treated`), the mean of each covariate in the control group (`Means Control`), the SMDs (`Std. Mean Diff.`), the variance ratios (`Var. Ratio`), the average distance between the eCDFs of each covariate across the groups (`eCDF Mean`), and the largest distance between the eCDFs (`eCDF Max`). Setting `un = FALSE` would have suppressed the creation of this table.

The second table (`Summary of Balance for Matched Data`) contains all the same statistics in the matched sample. Because we implicitly requested pair distances, an additional column for standardized pair distances (`Std. Pair Dist.`) is displayed.
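To make the `Std. Mean Diff.` column concrete, the SMD for a covariate when targeting the ATT can be computed by hand as the weighted mean difference after matching divided by the treated-group standard deviation in the unmatched sample. A minimal sketch for `age`, using the `m.out` object from above; this mirrors, but is not guaranteed to exactly reproduce, `summary.matchit()`'s internal computation:

```{r}
# Standardization factor: treated-group SD in the *unmatched* sample
sd.treated <- sd(lalonde$age[lalonde$treat == 1])

# Weighted means after matching (weights reflect matching with replacement)
w <- m.out$weights
m.t <- weighted.mean(lalonde$age[lalonde$treat == 1], w[lalonde$treat == 1])
m.c <- weighted.mean(lalonde$age[lalonde$treat == 0], w[lalonde$treat == 0])

# Compare to the Std. Mean Diff. entry for age in the matched-data table
(m.t - m.c) / sd.treated
```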
The final table (`Sample Sizes`) contains the sizes of the samples before (`All`) and after (`Matched`) matching, as well as the number of units left unmatched (`Unmatched`) and the number of units dropped due to a common support restriction (`Discarded`).

The SMDs are computed as the mean difference divided by a standardization factor computed in the **unmatched** sample. An absolute SMD close to 0 indicates good balance; although a number of recommendations for acceptable values have appeared in the literature, we recommend absolute values less than .1, and less than .05 for potentially prognostically important variables.

The variance ratios are computed as the ratio of the variance of the treated group to that of the control group for each covariate. Variance ratios are not computed for binary covariates because they are a function of the prevalence in each group, which is already captured in the mean difference and eCDF statistics. A variance ratio close to 1 indicates good balance; a commonly used recommendation is for variance ratios to be between .5 and 2.

The eCDF statistics correspond to the difference in the overall distributions of the covariates between the treatment groups. The values of both statistics range from 0 to 1, with values closer to zero indicating better balance. There are no specific recommendations for the values these statistics should take, though notably high values may indicate imbalance on higher moments of the covariates. The eQQ statistics produced when `standardize = FALSE` are interpreted similarly but are on the scale of the covariate.

All of these statistics should be considered together. Imbalance as measured by any of them may indicate a potential failure of the matching scheme to achieve distributional balance.

### `plot.summary.matchit()`

A Love plot is a clean way to visually summarize balance. Calling `plot()` on the output of a call to `summary()` on a `matchit` object produces a Love plot of the standardized mean differences.
`plot.summary.matchit()` has several additional arguments that can be used to customize the plot.

- `abs` controls whether standardized mean differences should be displayed in absolute value or not. The default is `TRUE`.

- `var.order` controls how the variables are ordered on the y-axis. The options are `"data"` (the default), which orders the variables as they appear in the `summary.matchit()` output; `"unmatched"`, which orders the variables based on their standardized mean differences before matching; `"matched"`, which orders the variables based on their standardized mean differences after matching; and `"alphabetical"`, which orders the variables alphabetically. Using `"unmatched"` tends to result in attractive plots and ensures the legend doesn't overlap with points in its default position.

- `threshold` controls where vertical lines indicating chosen thresholds should appear on the x-axis. It should be a numeric vector. The default is `c(.1, .05)`, which displays vertical lines at .1 and .05 standardized mean difference units.

- `position` controls the position of the legend. The default is `"bottomright"`, which puts the legend in the bottom right corner of the plot; any keyword value that can be supplied to `x` in `legend()` is allowed.

Below we create a Love plot of the covariates.

```{r, fig.alt="A Love plot with most matched dots below the threshold lines, indicating good balance after matching, in contrast to the unmatched dots far from the threshold lines, indicating poor balance before matching."}
m.sum <- summary(m.out, addlvariables = ~ I(age^2) + I(re74==0) + I(re75==0) + educ:race)
plot(m.sum, var.order = "unmatched")
```

From this plot it is clear that balance was quite poor prior to matching, but matching improved balance on all covariates, most to within a threshold of .1. To make the variable names cleaner, the original variables should be renamed prior to matching.
`cobalt` provides many additional options to generate and customize Love plots using the `love.plot()` function and should be used if a plot beyond what is available with `plot.summary.matchit()` is desired.

### `plot.matchit()`

In addition to numeric summaries of balance, `MatchIt` offers graphical summaries as well using `plot.matchit()` (i.e., using `plot()` on a `matchit` object). We can create eQQ plots, eCDF plots, or density plots of the covariates and histograms or jitter plots of the propensity score. The covariate plots can provide a summary of the balance of the full marginal distribution of a covariate beyond just the mean and variance.

`plot.matchit()` has a few arguments to customize the output:

- `type` corresponds to the type of plot desired. Options include `"qq"` for eQQ plots (the default), `"ecdf"` for eCDF plots, `"density"` for density plots (or bar plots for categorical variables), `"jitter"` for jitter plots, and `"histogram"` for histograms.
- `interactive` controls whether the plot is interactive or not. For eQQ, eCDF, and density plots, this allows us to control when the next page of covariates is to be displayed since only three can appear at a time. For jitter plots, this can allow us to select individual units with extreme values for further inspection. The default is `TRUE`.
- `which.xs` is used to specify for which covariates to display balance in eQQ, eCDF, and density plots. The default is to display balance on all, but we can request balance just on a specific subset. If three or fewer are requested, `interactive` is ignored. The argument can be supplied as a one-sided formula with the variables of interest on the right or a character vector containing the names of the desired variables. If any variables are not in the `matchit` object, a `data` argument can be supplied with a data set containing the named variables.
Below, we demonstrate the eQQ plot:

```{r, fig.alt ="eQQ plots of age, nodegree, and re74 in the unmatched and matched samples."}
#eQQ plot
plot(m.out, type = "qq", which.xs = ~age + nodegree + re74)
```

The y-axis displays each value of the covariate for the treated units, and the x-axis displays the value of the covariate at the corresponding quantile in the control group. When values fall on the 45-degree line, the groups are balanced. Above, we can see that `age` remains somewhat imbalanced, but `nodegree` and `re74` have much better balance after matching than before. The difference between the x and y values of each point is used to compute the eQQ difference statistics that are displayed in `summary.matchit()` with `standardize = FALSE`.

Below, we demonstrate the eCDF plot:

```{r, fig.alt ="eCDF plots of educ, married, and re75 in the unmatched and matched samples."}
#eCDF plot
plot(m.out, type = "ecdf", which.xs = ~educ + married + re75)
```

The x-axis displays the covariate values and the y-axis displays the proportion of the sample at or less than that covariate value. Perfectly overlapping lines indicate good balance. The black line corresponds to the treated group and the gray line to the control group. Although `educ` and `re75` were fairly well balanced before matching, their balance has improved nonetheless. `married` appears far better balanced after matching than before. The vertical difference between the eCDF lines of each treatment group is used to compute the eCDF difference statistics that are displayed in `summary.matchit()` with `standardize = TRUE`.

Below, we demonstrate the density plot:

```{r, fig.alt ="Density plots of age, educ, and race in the unmatched and matched samples."}
#density plot
plot(m.out, type = "density", which.xs = ~age + educ + race)
```

The x-axis displays the covariate values and the y-axis displays the density of the sample at that covariate value.
For categorical variables, the y-axis displays the proportion of the sample at that covariate value. The black line corresponds to the treated group and the gray line to the control group. Perfectly overlapping lines indicate good balance. Density plots display similar information to eCDF plots but may be more intuitive for some users because of their link to histograms.

## Assessing Balance After Subclassification

With subclassification, balance can be checked both within each subclass and overall. With `summary.matchit()`, we can request to view balance only in aggregate or in each subclass. The latter can help us decide if we can interpret effects estimated within each subclass as unbiased. The `plot.summary.matchit()` and `plot.matchit()` outputs can be requested either in aggregate or for each subclass. We demonstrate this below. First we will perform propensity score subclassification using 4 subclasses (typically more are beneficial).

```{r}
#Subclassification on a logistic regression PS
s.out <- matchit(treat ~ age + educ + race + married + nodegree + 
                   re74 + re75, data = lalonde,
                 method = "subclass", subclass = 4)
s.out
```

When using `summary()`, the default is to display balance only in aggregate using the subclassification weights. This balance output looks similar to that for other matching methods.

```{r}
summary(s.out)
```

An additional option in `summary()`, `subclass`, allows us to request balance for individual subclasses. `subclass` can be set to `TRUE` to display balance for all subclasses or to the indices of individual subclasses for which balance is to be displayed.
Below we call `summary()` and request balance to be displayed on all subclasses (setting `un = FALSE` to suppress balance in the original sample):

```{r}
summary(s.out, subclass = TRUE, un = FALSE)
```

We can plot the standardized mean differences in a Love plot that also displays balance for the subclasses using `plot.summary.matchit()` on a `summary.matchit()` object with `subclass = TRUE`.

```{r, fig.alt ="Love plot of balance before and after subclassification, with subclass IDs representing balance within each subclass in addition to dots representing balance overall."}
s <- summary(s.out, subclass = TRUE)
plot(s, var.order = "unmatched", abs = FALSE)
```

Note that for some variables, while the groups are balanced in aggregate (black dots), the individual subclasses (gray numbers) may not be balanced, in which case unadjusted effect estimates within these subclasses should not be interpreted as unbiased.

When we plot distributional balance using `plot.matchit()`, we can again choose whether balance should be displayed in aggregate or within subclasses using the `subclass` option, which functions the same as it does with `summary.matchit()`. Below we demonstrate checking balance within a subclass.

```{r, fig.alt ="Density plots of educ, married, and re75 in the unmatched sample and in subclass 1."}
plot(s.out, type = "density", which.xs = ~educ + married + re75,
     subclass = 1)
```

If we had set `subclass = FALSE`, plots would have been displayed in aggregate using the subclassification weights. If `subclass` is unspecified, a prompt will ask us for which subclass we want to see balance.

## Assessing Balance with `cobalt`

```{r, include=FALSE}
ok <- requireNamespace("cobalt", quietly = TRUE)
```

The `cobalt` package was designed specifically for checking balance before and after matching (and weighting).
It offers three main functions, `bal.tab()`, `love.plot()`, and `bal.plot()`, which perform similar actions to `summary.matchit()`, `plot.summary.matchit()`, and `plot.matchit()`, respectively. These functions directly interface with `matchit` objects, making `cobalt` straightforward to use in conjunction with `MatchIt`. `cobalt` can be used as a complement to `MatchIt`, especially for more advanced uses that are not accommodated by `MatchIt`, such as comparing balance across different matching schemes and even different packages, assessing balance in clustered or multiply imputed data, and assessing balance with multi-category, continuous, and time-varying treatments. The main `cobalt` vignette (`vignette("cobalt", package = "cobalt")`) contains many examples of its use with `MatchIt` objects, so we only provide a short demonstration of its capabilities here.

```{r, message = F, eval = ok}
library("cobalt")
```

### `bal.tab()`

`bal.tab()` produces tables of balance statistics similar to `summary.matchit()`. The columns displayed can be customized to limit how much information is displayed and isolate desired information. We call `bal.tab()` with a few of its options specified below:

```{r, eval = ok}
bal.tab(m.out, un = TRUE, stats = c("m", "v", "ks"))
```

The output is very similar to that of `summary.matchit()`, except that the balance statistics computed before matching (with the suffix `.Un`) and those computed after matching (with the suffix `.Adj`) are in the same table. By default, only SMDs after matching (`Diff.Adj`) are displayed; by setting `un = TRUE`, we requested that the balance statistics before matching also be displayed, and by setting `stats = c("m", "v", "ks")` we requested mean differences, variance ratios, and Kolmogorov-Smirnov statistics. Other balance statistics and summary statistics can be requested as well.
One important detail to note is that the default for binary covariates is to print the raw difference in proportions rather than the standardized mean difference, so there will be an apparent discrepancy for these variables between `bal.tab()` and `summary.matchit()` output, though this behavior can be changed by setting `binary = "std"` in the call to `bal.tab()`. Functionality for producing balance statistics for additional variables and for powers and interactions of the covariates is available using the `addl`, `poly`, and `int` options.

`bal.tab()` and other `cobalt` functions can produce balance statistics not just for a single `matchit` object but for several at the same time, which facilitates comparing balance across several matching specifications. For example, if we wanted to compare the full matching results to the results of nearest neighbor matching without replacement, we could supply both to `bal.tab()`, which we demonstrate below:

```{r, eval = ok}
#Nearest neighbor (NN) matching on the PS
m.out2 <- matchit(treat ~ age + educ + race + married + nodegree + 
                    re74 + re75, data = lalonde)

#Balance on covariates after full and NN matching
bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
        data = lalonde, un = TRUE,
        weights = list(full = m.out, nn = m.out2))
```

This time, we supplied `bal.tab()` with the covariates and dataset and supplied the `matchit` output objects in the `weights` argument (which extracts the matching weights from the objects). Here we can see that full matching yields better balance than nearest neighbor matching overall, though balance is slightly worse for `age` and `married`, and the effective sample size is lower.

### `love.plot()`

`love.plot()` creates a Love plot of chosen balance statistics. It offers many options for customization, including the shape and colors of the points, how the variable names are displayed, and for which statistics balance is to be displayed.
Below is an example of its basic use:

```{r, eval = ok, fig.alt ="Minimal love plot of balance before and after matching."}
love.plot(m.out, binary = "std")
```

The syntax is straightforward and similar to that of `bal.tab()`. Below we demonstrate a more advanced use that customizes the appearance of the plot and displays balance not only on mean differences but also on Kolmogorov-Smirnov statistics and for both full matching and nearest neighbor matching simultaneously.

```{r, fig.width=7, eval = ok, fig.alt ="A more elaborate love plot displaying some of cobalt's capabilities for making publication-ready plots."}
love.plot(m.out, stats = c("m", "ks"), poly = 2, abs = TRUE,
          weights = list(nn = m.out2),
          drop.distance = TRUE, thresholds = c(m = .1),
          var.order = "unadjusted", binary = "std",
          shapes = c("circle filled", "triangle", "square"),
          colors = c("red", "blue", "darkgreen"),
          sample.names = c("Original", "Full Matching", "NN Matching"),
          position = "bottom")
```

The `love.plot()` documentation explains what each of these arguments do and the several other ones available. See `vignette("love.plot", package = "cobalt")` for other advanced customization of `love.plot()`.

### `bal.plot()`

`bal.plot()` displays distributional balance for a single covariate, similar to `plot.matchit()`. Its default is to display kernel density plots for continuous variables and bar graphs for categorical variables. It can also display eCDF plots and histograms.
Below we demonstrate some of its uses:

```{r, eval = ok, fig.alt = c("Density plot for educ before and after matching.", "Bar graph for race before and after matching.", "Mirrored histograms of propensity scores before and after matching.")}
#Density plot for continuous variables
bal.plot(m.out, var.name = "educ", which = "both")

#Bar graph for categorical variables
bal.plot(m.out, var.name = "race", which = "both")

#Mirrored histogram
bal.plot(m.out, var.name = "distance", which = "both",
         type = "histogram", mirror = TRUE)
```

These plots help illuminate the specific ways in which the covariate distributions differ between treatment groups, which can aid in interpreting the balance statistics provided by `bal.tab()` and `summary.matchit()`.

## Conclusion

The goal of matching is to achieve covariate balance: similarity between the covariate distributions of the treated and control groups. Balance should be assessed during the matching phase to find a matching specification that works. Balance must also be reported in the write-up of a matching analysis to demonstrate to readers that matching was successful. `MatchIt` and `cobalt` each offer a suite of functions to implement best practices in balance assessment and reporting.

## References

# Estimating Effects After Matching

Noah Greifer

2025-03-09

## Introduction

After assessing balance and deciding on a matching specification, it comes time to estimate the effect of the treatment in the matched sample. How the effect is estimated and interpreted depends on the desired estimand and the type of model used (if any). In addition to estimating effects, estimating the uncertainty of the effects is critical in communicating them and assessing whether the observed effect is compatible with there being no effect in the population. This guide explains how to estimate effects after various forms of matching and with various outcome types. There may be situations that are not covered here for which additional methodological research may be required, but some of the recommended methods here can be used to guide such applications.

This guide is structured as follows: first, information on the concepts related to effect and standard error (SE) estimation is presented below. Then, instructions for how to estimate effects and SEs are described for the standard case (matching for the ATT with a continuous outcome) and some other common circumstances. Finally, recommendations for reporting results and tips to avoid making common mistakes are presented.

## Identifying the estimand

Before an effect is estimated, the estimand must be specified and clarified. Although some aspects of the estimand depend not only on how the effect is estimated after matching but also on the matching method itself, other aspects must be considered at the time of effect estimation and interpretation. Here, we consider three aspects of the estimand: the population the effect is meant to generalize to (the target population), the effect measure, and whether the effect is marginal or conditional.

**The target population.** Different matching methods allow you to estimate effects that can generalize to different target populations. The most common estimand in matching is the average treatment effect in the treated (ATT), which is the average effect of treatment for those who receive treatment. This estimand is estimable for matching methods that do not change the treated units (i.e., by weighting or discarding units) and is requested in `matchit()` by setting `estimand = "ATT"` (which is the default). The average treatment effect in the population (ATE) is the average effect of treatment for the population from which the sample is a random sample. This estimand is estimable only for methods that allow the ATE and either do not discard units from the sample or explicitly target full-sample balance, which in `MatchIt` is limited to full matching, subclassification, and profile matching when setting `estimand = "ATE"`. When treated units are discarded (e.g., through the use of common support restrictions, calipers, cardinality matching, or [coarsened] exact matching), the estimand corresponds to neither the population ATT nor the population ATE, but rather to an average treatment effect in the remaining matched sample (ATM), which may not correspond to any specific target population. See Greifer and Stuart (2021) for a discussion of the substantive considerations involved when choosing the target population of the estimand.

**Marginal and conditional effects.** A marginal effect is a comparison between the expected potential outcome under treatment and the expected potential outcome under control. This is the same quantity estimated in randomized trials without blocking or covariate adjustment and is particularly useful for quantifying the overall effect of a policy or population-wide intervention. A conditional effect is the comparison between the expected potential outcomes in the treatment groups within strata. This is useful for identifying the effect of a treatment for an individual patient or a subset of the population.

**Effect measures.** The outcome types we consider here are continuous, with the effect measured by the mean difference; binary, with the effect measured by the risk difference (RD), risk ratio (RR), or odds ratio (OR); and time-to-event (i.e., survival), with the effect measured by the hazard ratio (HR). The RR, OR, and HR are noncollapsible effect measures, which means the marginal effect on that scale is not a (possibly) weighted average of the conditional effects within strata, even if the stratum-specific effects are of the same magnitude. For these effect measures, it is critical to distinguish between marginal and conditional effects because different statistical methods target different types of effects. The mean difference and RD are collapsible effect measures, so the same methods can be used to estimate marginal and conditional effects.
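Noncollapsibility can be surprising the first time it is encountered, so here is a small numeric illustration in base R (all numbers are invented for illustration): the conditional OR is exactly 3 in each of two equal-sized strata, yet the marginal OR, computed from the collapsed risks, is smaller.

```r
# Odds-ratio noncollapsibility: conditional OR = 3 in both strata,
# but the marginal OR differs from 3.
p <- function(odds) odds / (1 + odds)  # convert odds to probability

o0 <- c(1, 1/9)   # control-group odds in stratum 1 and stratum 2
o1 <- 3 * o0      # treated-group odds: conditional OR = 3 in each stratum

# Equal-size strata in both arms, so marginal risks are simple averages
r1 <- mean(p(o1)) # marginal risk under treatment: (0.75 + 0.25)/2 = 0.5
r0 <- mean(p(o0)) # marginal risk under control:   (0.50 + 0.10)/2 = 0.3

or_marginal <- (r1 / (1 - r1)) / (r0 / (1 - r0))
or_marginal   # 2.33..., not 3
```

No such discrepancy arises for the mean difference or the RD, which is what "collapsible" means.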

Our primary focus will be on marginal effects, which are appropriate for all effect measures, easily interpretable, and require few modeling assumptions. The “Common Mistakes” section includes examples of commonly used methods that estimate conditional rather than marginal effects and should not be used when marginal effects are desired.

## G-computation

To estimate marginal effects, we use a method known as g-computation (Snowden, Rose, and Mortimer 2011) or regression estimation (Schafer and Kang 2008). This involves first specifying a model for the outcome as a function of the treatment and covariates. Then, for each unit, we compute their predicted values of the outcome setting their treatment status to treated, and then again for control, leaving us with two predicted outcome values for each unit, which are estimates of the potential outcomes under each treatment level. We compute the mean of each of the estimated potential outcomes across the entire sample, which leaves us with two average estimated potential outcomes. Finally, the contrast of these average estimated potential outcomes (e.g., their difference or ratio, depending on the effect measure desired) is the estimate of the treatment effect.
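The procedure just described can be sketched in a few lines of base R. This toy example (with simulated data of our own, not the vignette's dataset) estimates a marginal ATE with a linear outcome model; the true effect is 2.

```r
# G-computation sketch for the ATE with a linear outcome model
set.seed(42)
n <- 500
x <- rnorm(n)                        # a single covariate
a <- rbinom(n, 1, plogis(0.5 * x))   # treatment depends on x
y <- 1 + 2 * a + x + rnorm(n)        # true treatment effect = 2
d_toy <- data.frame(a, x, y)

# 1) Model the outcome as a function of treatment and covariates
fit <- lm(y ~ a * x, data = d_toy)

# 2) Predict each unit's potential outcomes under both treatment levels
p1 <- predict(fit, newdata = transform(d_toy, a = 1))
p0 <- predict(fit, newdata = transform(d_toy, a = 0))

# 3) Average the estimated potential outcomes and contrast them
ate <- mean(p1) - mean(p0)
```

In practice, `marginaleffects::avg_comparisons()` automates these steps and also supplies delta-method SEs.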

When doing g-computation after matching, a few additional considerations are required. First, when we take the average of the estimated potential outcomes under each treatment level, this must be a weighted average that incorporates the matching weights. Second, if we want to target the ATT or ATC, we only estimate potential outcomes for the treated or control group, respectively (though we still generate predicted values under both treatment and control).
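As a toy illustration of these two adjustments (all quantities simulated here; `w` stands in for hypothetical matching weights and `p1`/`p0` for predicted potential outcomes from an outcome model):

```r
# ATT sketch: weighted average of estimated potential outcomes over the
# treated units only, using the matching weights 'w'
set.seed(11)
n  <- 100
a  <- rbinom(n, 1, 0.4)
w  <- ifelse(a == 1, 1, runif(n))   # toy weights; treated units get 1
p1 <- rnorm(n, mean = 3)            # stand-in predictions under treatment
p0 <- rnorm(n, mean = 1)            # stand-in predictions under control

att <- weighted.mean(p1[a == 1], w[a == 1]) -
       weighted.mean(p0[a == 1], w[a == 1])
```

Note that the averages are taken over treated units only (for the ATT) even though predictions are generated under both treatment levels for those units.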

G-computation as a framework for estimating effects after matching has a number of advantages over other approaches. It works the same regardless of the form of the outcome model or type of outcome (e.g., whether a linear model is used for a continuous outcome or a logistic model is used for a binary outcome); the only difference might be how the average expected potential outcomes are contrasted in the final step. In simple cases, the estimated effect is numerically identical to effects estimated using other methods; for example, if no covariates are included in the outcome model, the g-computation estimate is equal to the difference in means from a t-test or coefficient of the treatment in a linear model for the outcome. There are analytic approximations to the SEs of the g-computation estimate, and these SEs can incorporate pair/subclass membership (described in more detail below).

For all these reasons, we use g-computation when possible for all effect estimates, even if there are simpler methods that would yield the same estimates. Using a single workflow (with some slight modifications depending on the context; see below) facilitates implementing best practices regardless of what choices a user makes.

## Modeling the Outcome

The goal of the outcome model is to generate good predictions for use in the g-computation procedure described above. The type and form of the outcome model should depend on the outcome type. For continuous outcomes, one can use a linear model regressing the outcome on the treatment; for binary outcomes, one can use a generalized linear model with, e.g., a logistic link; for time-to-event outcomes, one can use a Cox proportional hazards model.

An additional decision to make is whether (and how) to include covariates in the outcome model. One may ask, why use matching at all if you are going to model the outcome with covariates anyway? Matching reduces the dependence of the effect estimate on correct specification of the outcome model; this is the central thesis of Ho et al. (2007). Including covariates in the outcome model after matching has several functions: it can increase precision in the effect estimate, reduce the bias due to residual imbalance, and make the effect estimate “doubly robust”, which means it is consistent if either the matching reduces sufficient imbalance in the covariates or if the outcome model is correct. For these reasons, we recommend covariate adjustment after matching when possible. There is some evidence that covariate adjustment is most helpful for covariates with standardized mean differences greater than .1 (Nguyen et al. 2017), so these covariates and covariates thought to be highly predictive of the outcome should be prioritized in treatment effect models if not all can be included due to sample size constraints.

Although there are many possible ways to include covariates (e.g., not just main effects but interactions, smoothing terms like splines, or other nonlinear transformations), it is important not to engage in specification search (i.e., trying many outcomes models in search of the “best” one). Doing so can invalidate results and yield a conclusion that fails to replicate. For this reason, we recommend only including the same terms included in the propensity score model unless there is a strong a priori and justifiable reason to model the outcome differently.

It is important not to interpret the coefficients and tests of covariates in the outcome model. These are not causal effects and their estimates may be severely confounded. Only the treatment effect estimate can be interpreted as causal assuming the relevant assumptions about unconfoundedness are met. Inappropriately interpreting the coefficients of covariates in the outcome model is known as the Table 2 fallacy (Westreich and Greenland 2013). To avoid this, we only display the results of the g-computation procedure and do not examine or interpret the outcome models themselves.

## Estimating Standard Errors and Confidence Intervals

Uncertainty estimation (i.e., of SEs, confidence intervals, and p-values) may consider the variety of sources of uncertainty present in the analysis, including (but not limited to!) estimation of the propensity score (if used), matching (i.e., because treated units might be matched to different control units if others had been sampled), and estimation of the treatment effect (i.e., because of sampling error). In general, there are no analytic solutions to all these issues, so much of the research done on uncertainty estimation after matching has relied on simulation studies. The two primary methods that have been shown to perform well in matched samples are using cluster-robust SEs and the bootstrap, described below.

To compute SEs after g-computation, a method known as the delta method is used; this is a way to compute the SEs of the derived quantities (the expected potential outcomes and their contrast) from the variance of the coefficients of the outcome models. For nonlinear models (e.g., logistic regression), the delta method is only an approximation subject to error (though in many cases this error is small and shrinks in large samples). Because the delta method relies on the variance of the coefficients from the outcome model, it is important to correctly estimate these variances, using either robust or cluster-robust methods as described below.

### Robust and Cluster-Robust Standard Errors

**Robust standard errors.** Also known as sandwich SEs (due to the form of the formula for computing them), heteroscedasticity-consistent SEs, or Huber-White SEs, robust SEs are an adjustment to the usual maximum likelihood or ordinary least squares SEs that are robust to violations of some of the assumptions required for usual SEs to be valid (MacKinnon and White 1985). Although there has been some debate about their utility (King and Roberts 2015), robust SEs rarely degrade inferences and often improve them. Generally, robust SEs must be used when any non-uniform weights are included in the estimation (e.g., with matching with replacement or inverse probability weighting).
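To make the "sandwich" concrete, the HC0 robust variance for OLS can be computed by hand in base R (the `sandwich` package automates this and offers refinements like HC1-HC3; the toy data below are ours):

```r
# HC0 sandwich variance for OLS: (X'X)^{-1} [X' diag(e^2) X] (X'X)^{-1}
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + x + rnorm(n, sd = abs(x))   # heteroscedastic errors
fit <- lm(y ~ x)

X <- model.matrix(fit)               # n x 2 design matrix
e <- resid(fit)
bread <- solve(crossprod(X))         # (X'X)^{-1}
meat  <- crossprod(X * e)            # X' diag(e^2) X
vc_hc0 <- bread %*% meat %*% bread
se_robust <- sqrt(diag(vc_hc0))      # robust SEs for (Intercept) and x
```

The cluster-robust version replaces the per-observation terms in the "meat" with per-cluster (e.g., per-pair) sums of `X * e`.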

**Cluster-robust standard errors.** A version of robust SEs known as cluster-robust SEs (Liang and Zeger 1986) can be used to account for dependence between observations within clusters (e.g., matched pairs). Abadie and Spiess (2019) demonstrate analytically that cluster-robust SEs are generally valid after matching, whereas regular robust SEs can over- or under-estimate the true sampling variability of the effect estimator depending on the specification of the outcome model (if any) and degree of effect modification. A plethora of simulation studies have further confirmed the validity of cluster-robust SEs after matching (e.g., Austin 2009, 2013a; Austin and Small 2014; Gayat et al. 2012; Wan 2019). Given this evidence favoring the use of cluster-robust SEs, we recommend them in most cases and use them judiciously in this guide.

### Bootstrapping

One problem when using robust and cluster-robust SEs along with the delta method is that the delta method is an approximation, as previously mentioned. One solution to this problem is bootstrapping, which is a technique used to simulate the sampling distribution of an estimator by repeatedly drawing samples with replacement and estimating the effect in each bootstrap sample (Efron and Tibshirani 1993). From the bootstrap distribution, SEs and confidence intervals can be computed in several ways, including using the standard deviation of the bootstrap estimates as the SE estimate or using the 2.5 and 97.5 percentiles as 95% confidence interval bounds. Bootstrapping tends to be most useful when no analytic estimator of a SE is possible or has been derived yet. Although Abadie and Imbens (2008) found analytically that the bootstrap is inappropriate for matched samples, simulation evidence has found it to be adequate in many cases (Hill and Reiter 2006; Austin and Small 2014; Austin and Stuart 2017).

Typically, bootstrapping involves performing the entire estimation process in each bootstrap sample, including propensity score estimation, matching, and effect estimation. This tends to be the most straightforward route, though intervals from this method may be conservative in some cases (i.e., they are wider than necessary to achieve nominal coverage) (Austin and Small 2014). Less conservative and more accurate intervals have been found when using different forms of the bootstrap, including the wild bootstrap developed by Bodory et al. (2020) and the matched/cluster bootstrap described by Austin and Small (2014) and Abadie and Spiess (2019). The cluster bootstrap involves sampling matched pairs/strata of units from the matched sample and performing the analysis within each sample composed of the sampled pairs. Abadie and Spiess (2019) derived analytically that the cluster bootstrap is valid for estimating SEs and confidence intervals in the same circumstances in which cluster-robust SEs are; indeed, the cluster bootstrap SE is known to approximate the cluster-robust SE (Cameron and Miller 2015).

With bootstrapping, more bootstrap replications are always better but can take time and increase the chances that at least one error will occur within the bootstrap analysis (e.g., a bootstrap sample with zero treated units or zero units with an event). In general, numbers of replications upwards of 999 are recommended, with values one less than a multiple of 100 preferred to avoid interpolation when using the percentiles as confidence interval limits (MacKinnon 2006). There are several methods of computing bootstrap confidence intervals, but the bias-corrected accelerated (BCa) bootstrap confidence interval often performs best (Austin and Small 2014; Carpenter and Bithell 2000) and is easy to implement, simply by setting `type = "bca"` in the call to `boot::boot.ci()` after running `boot::boot()`.
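A minimal sketch of the cluster (pair) bootstrap using the `boot` package, which ships with R: we resample pair IDs with replacement (keeping duplicates), rebuild the sample from the sampled pairs, and re-estimate the effect in each resample. The matched pairs here are simulated and all names are our own.

```r
library("boot")

# Toy matched sample: 100 pairs, one treated and one control unit per pair
set.seed(7)
pairs <- data.frame(pair = rep(1:100, each = 2),
                    a    = rep(c(1, 0), 100))
pairs$y <- 2 * pairs$a + rnorm(200)      # true effect = 2

pair_ids <- unique(pairs$pair)

# Statistic: rebuild the sample from the resampled pair IDs (keeping
# duplicated pairs) and re-estimate the effect
est_fun <- function(ids, i) {
  b <- do.call(rbind, lapply(ids[i], function(id) pairs[pairs$pair == id, ]))
  mean(b$y[b$a == 1]) - mean(b$y[b$a == 0])
}

bt <- boot(pair_ids, est_fun, R = 199)
boot.ci(bt, type = "perc")               # "bca" is also available
```

In a real analysis, `est_fun` would refit the outcome model and perform g-computation within each resample rather than take a simple difference in means.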

Most of this guide will consider analytic (i.e., non-bootstrapping) approaches to estimating uncertainty; the section “Using Bootstrapping to Estimate Confidence Intervals” describes broadly how to use bootstrapping. Although analytic estimates are faster to compute, in many cases bootstrap confidence intervals are more accurate.

## Estimating Treatment Effects and Standard Errors After Matching

Below, we describe effect estimation after matching. We'll be using a simulated toy dataset `d` with several outcome types. Code to generate the dataset is at the end of this document. The focus here is not on evaluating the methods but simply on demonstrating them. In all cases, the correct propensity score model is used. Below we display the first six rows of `d`:

```r
head(d)
##   A      X1      X2      X3       X4 X5      X6      X7      X8       X9      Y_C Y_B     Y_S
## 1 0  0.1725 -1.4283 -0.4103 -2.36059  1 -1.1199  0.6398 -0.4840 -0.59385  0.07104   0  278.46
## 2 0 -1.0959  0.8463  0.2456 -0.12333  1 -2.2687 -1.4491 -0.5514 -0.31439  0.15619   0  330.63
## 3 0  0.1768  0.7905 -0.8436  0.82366  1 -0.2221  0.2971 -0.6966 -0.69516 -0.85180   1  369.94
## 4 0 -0.4595  0.1726  1.9542 -0.62661  1 -0.4019 -0.8294 -0.5384  0.20729 -2.35184   0   91.06
## 5 1  0.3563 -1.8121  0.8135 -0.67189  1 -0.8297  1.7297 -0.6439 -0.02648  0.68058   0  182.73
## 6 0 -2.4313 -1.7984 -1.2940  0.04609  1 -1.2419 -1.1252 -1.8659 -0.56513 -5.62260   0 2563.73
```

`A` is the treatment variable, `X1` through `X9` are covariates, `Y_C` is a continuous outcome, `Y_B` is a binary outcome, and `Y_S` is a survival outcome.

We will need the following packages to perform the desired analyses:

- `marginaleffects` provides the `avg_comparisons()` function for performing g-computation and estimating the SEs and confidence intervals of the average estimated potential outcomes and treatment effects
- `sandwich` is used internally by `marginaleffects` to compute robust and cluster-robust SEs
- `survival` provides `coxph()` to estimate the coefficients in a Cox proportional hazards model for the marginal hazard ratio, which we will use for survival outcomes

Of course, we also need MatchIt to perform the matching.

library("MatchIt")
library("marginaleffects")

All effect estimates will be computed using marginaleffects::avg_comparisons(), even when its use may be superfluous (e.g., for performing a t-test in the matched set). As previously mentioned, this is because it is useful to have a single workflow that works no matter the situation, perhaps with very slight modifications to accommodate different contexts. Using avg_comparisons() has several advantages, even when the alternatives are simple: it provides only the effect estimate, not other coefficients; it automatically incorporates robust and cluster-robust SEs if requested; and it always produces average marginal effects for the correct population if requested.

Other packages may be of use but are not used here. There are alternatives to the marginaleffects package for computing average marginal effects, including margins and stdReg. The survey package can be used to estimate robust SEs incorporating weights and provides functions for survey-weighted generalized linear models and Cox-proportional hazards models.

The Standard Case

For almost all matching methods, whether a caliper, common support restriction, exact matching specification, or \(k\):1 matching specification is used, estimating the effect in the matched dataset is straightforward and involves fitting a model for the outcome that incorporates the matching weights3, then estimating the treatment effect using g-computation (i.e., using marginaleffects::avg_comparisons()) with a cluster-robust SE to account for pair membership. This procedure is the same for continuous and binary outcomes with and without covariates.

There are a few adjustments that need to be made for certain scenarios, which we describe in the section “Adjustments to the Standard Case”. These adjustments apply in the following cases: when matching for the ATE rather than the ATT, when matching with replacement, when matching with a method that doesn’t involve creating pairs (e.g., cardinality and profile matching and coarsened exact matching), when using subclassification, when estimating effects with binary outcomes, and when estimating effects with survival outcomes. You must read the Standard Case to understand the basic procedure before reading about these special scenarios.

Here, we demonstrate the faster analytic approach to estimating confidence intervals; for the bootstrap approach, see the section “Using Bootstrapping to Estimate Confidence Intervals” below.

First, we will perform variable-ratio nearest neighbor matching without replacement on the propensity score for the ATT. Remember, all matching methods use this exact procedure or a slight variation, so this section is critical even if you are using a different matching method.

#Variable-ratio NN matching on the PS for the ATT
mV <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + 
                X6 + X7 + X8 + X9,
              data = d,
              ratio = 2,
              max.controls = 4)
mV
## A `matchit` object
##  - method: Variable ratio 2:1 nearest neighbor matching without replacement
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 2000 (original), 1323 (matched)
##  - target estimand: ATT
##  - covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9
#Extract matched data
md <- match_data(mV)

head(md)
##    A      X1      X2      X3      X4 X5      X6      X7      X8       X9      Y_C Y_B    Y_S distance weights subclass
## 1  0  0.1725 -1.4283 -0.4103 -2.3606  1 -1.1199  0.6398 -0.4840 -0.59385  0.07104   0 278.46  0.08461     0.5      365
## 3  0  0.1768  0.7905 -0.8436  0.8237  1 -0.2221  0.2971 -0.6966 -0.69516 -0.85180   1 369.94  0.22210     0.5       42
## 5  1  0.3563 -1.8121  0.8135 -0.6719  1 -0.8297  1.7297 -0.6439 -0.02648  0.68058   0 182.73  0.43291     1.0        1
## 7  0  1.8402  1.7601 -1.0746 -1.6428  1  1.4482  0.7131  0.6972 -0.94673  4.28651   1  97.49  0.09274     0.5        6
## 9  0  0.7808  1.3137  0.6580  0.8540  1  0.9495 -0.5731 -0.2362 -0.14580 15.89771   1  67.53  0.15751     0.5      218
## 10 1 -0.5651 -0.1053 -0.1369  1.6233  1 -0.5304 -0.3342  0.4184  0.46308  1.07888   1 113.70  0.16697     1.0        2

Typically one would assess balance and ensure that this matching specification works, but we will skip that step here to focus on effect estimation. See vignette("MatchIt") and vignette("assessing-balance") for more information on this necessary step. Because we did not use a caliper, the target estimand is the ATT.

We perform all analyses using the matched dataset, md, which, for matching methods that involve dropping units, contains only the units retained in the sample.

First, we fit a model for the outcome given the treatment and (optionally) the covariates. It’s usually a good idea to include treatment-covariate interactions, which we do below, but this is not always necessary, especially when excellent balance has been achieved. You can also include the propensity score (usually labeled distance in the match_data() output), which can add some robustness, especially when modeled flexibly (e.g., with polynomial terms or splines) (Austin 2017).

#Linear model with covariates
fit1 <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 + 
                        X6 + X7 + X8 + X9),
           data = md,
           weights = weights)

Next, we use marginaleffects::avg_comparisons() to estimate the ATT.

avg_comparisons(fit1,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1))

Let’s break down the call to avg_comparisons(): to the first argument, we supply the model fit, fit1; to the variables argument, the name of the treatment ("A"); to the vcov argument, a formula with subclass membership (~subclass) to request cluster-robust SEs; and to the newdata argument, a version of the matched dataset containing only the treated units (subset(A == 1)) to request the ATT. Some of these arguments differ depending on the specifics of the matching method and outcome type; see the sections below for information.

If, in addition to the effect estimate, we want the average estimated potential outcomes, we can use marginaleffects::avg_predictions(), which we demonstrate below. Note the interpretation of the resulting estimates as the expected potential outcomes is only valid if all covariates present in the outcome model (if any) are interacted with the treatment.

avg_predictions(fit1,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1))

We can see that the difference in potential outcome means is equal to the average treatment effect computed previously4. All of the arguments to avg_predictions() are the same as those to avg_comparisons().

Adjustments to the Standard Case

This section explains how the procedure might differ if any of the following special circumstances occur.

Matching for the ATE

When matching for the ATE (including [coarsened] exact matching, full matching, subclassification, and cardinality matching), everything is identical to the Standard Case except that in the calls to avg_comparisons() and avg_predictions(), the newdata argument is omitted. This is because the estimated potential outcomes are computed for the full sample rather than just the treated units.
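As a sketch of this adjustment (assuming full matching for the ATE on the same toy data; method = "full" requires the optmatch package to be installed), the only change from the Standard Case is that newdata is omitted:

```r
#Full matching on the PS for the ATE (sketch; requires optmatch)
mF <- matchit(A ~ X1 + X2 + X3 + X4 + X5 +
                X6 + X7 + X8 + X9,
              data = d,
              method = "full",
              estimand = "ATE")

mdF <- match_data(mF)

#Outcome model with matching weights, as in the Standard Case
fitF <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 +
                        X6 + X7 + X8 + X9),
           data = mdF,
           weights = weights)

#No newdata argument: potential outcomes are averaged over
#the full sample to target the ATE
avg_comparisons(fitF,
                variables = "A",
                vcov = ~subclass)
```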

Matching with replacement

When matching with replacement (i.e., nearest neighbor or genetic matching with replace = TRUE), effect and SE estimation need to account for control unit multiplicity (i.e., repeated use) and within-pair correlations (Hill and Reiter 2006; Austin and Cafri 2020). Although Abadie and Imbens (2008) demonstrated analytically that bootstrap SEs may be invalid for matching with replacement, simulation work by Hill and Reiter (2006) and Bodory et al. (2020) has found that bootstrap SEs are adequate and generally slightly conservative. See the section “Using Bootstrapping to Estimate Confidence Intervals” for instructions on using the bootstrap and an example that uses matching with replacement.

Because control units do not belong to unique pairs, there is no pair membership in the match_data() output. One can simply change vcov = ~subclass to vcov = "HC3" in the calls to avg_comparisons() and avg_predictions() to use robust SEs instead of cluster-robust SEs, as recommended by Hill and Reiter (2006). There is some evidence for an alternative approach that incorporates pair membership and adjusts for reuse of control units, though this has only been studied for survival outcomes (Austin and Cafri 2020). This adjustment involves using two-way cluster-robust SEs with pair membership and unit ID as the clustering variables. For continuous and binary outcomes, this involves the following two changes: 1) replace match_data() with get_matches(), which produces a dataset with one row per unit per pair, meaning control units matched to multiple treated units will appear multiple times in the dataset; 2) set vcov = ~subclass + id in the calls to avg_comparisons() and avg_predictions(). For survival outcomes, a special procedure must be used; see the section on survival outcomes below.
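For a continuous outcome, those two changes might look like the following sketch (assuming mR is a matchit object from nearest neighbor matching with replace = TRUE, as the name is used elsewhere in this vignette):

```r
#One row per unit per pair; reused control units appear
#multiple times, identified by the id column
gm <- get_matches(mR)

fitR <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 +
                        X6 + X7 + X8 + X9),
           data = gm,
           weights = weights)

#Two-way cluster-robust SEs with pair membership (subclass)
#and unit ID (id) as the clustering variables
avg_comparisons(fitR,
                variables = "A",
                vcov = ~subclass + id,
                newdata = subset(A == 1))
```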

Matching without pairing

Some matching methods do not involve creating pairs; these include cardinality and profile matching with mahvars = NULL (the default), exact matching, and coarsened exact matching with k2k = FALSE (the default). The only change that needs to be made to the Standard Case is that one should change vcov = ~subclass to vcov = "HC3" in the calls to avg_comparisons() and avg_predictions() to use robust SEs instead of cluster-robust SEs. Remember that if matching is done for the ATE (even if units are dropped), the newdata argument should be dropped.
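A minimal sketch of this change, assuming coarsened exact matching with its defaults (no pairing, so robust rather than cluster-robust SEs):

```r
#Coarsened exact matching for the ATT (sketch)
mC <- matchit(A ~ X1 + X2 + X3 + X4 + X5 +
                X6 + X7 + X8 + X9,
              data = d,
              method = "cem",
              estimand = "ATT")

mdC <- match_data(mC)

fitC <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 +
                        X6 + X7 + X8 + X9),
           data = mdC,
           weights = weights)

#Robust (not cluster-robust) SEs because there are no pairs;
#drop newdata if matching instead targeted the ATE
avg_comparisons(fitC,
                variables = "A",
                vcov = "HC3",
                newdata = subset(A == 1))
```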

Propensity score subclassification

There are two natural ways to estimate marginal effects after subclassification: the first is to estimate subclass-specific treatment effects and pool them using an average marginal effects procedure, and the second is to use the stratum weights to estimate a single average marginal effect. This latter approach is also known as marginal mean weighting through stratification (MMWS), and is described in detail by Hong (2010)5. When done properly, both methods should yield similar or identical estimates of the treatment effect.

All of the methods described above for the Standard Case also work with MMWS because the formation of the weights is the same; the only difference is that it is not appropriate to use cluster-robust SEs with MMWS because of how few clusters are present, so one should change vcov = ~subclass to vcov = "HC3" in the calls to avg_comparisons() and avg_predictions() to use robust SEs instead of cluster-robust SEs. The subclasses can optionally be included in the outcome model (optionally interacting with treatment) as an alternative to including the propensity score.

The subclass-specific approach omits the weights and uses the subclasses directly. It is only appropriate when there are a small number of subclasses relative to the sample size. In the outcome model, subclass should interact with all other predictors in the model (including the treatment, covariates, and interactions, if any), and the weights argument should be omitted. As with MMWS, one should change vcov = ~subclass to vcov = "HC3" in the calls to avg_comparisons() and avg_predictions(). See an example below:

#Subclassification on the PS for the ATT
mS <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + 
                X6 + X7 + X8 + X9,
              data = d,
              method = "subclass",
              estimand = "ATT")

#Extract matched data
md <- match_data(mS)

fitS <- lm(Y_C ~ subclass * (A * (X1 + X2 + X3 + X4 + X5 + 
                                    X6 + X7 + X8 + X9)),
           data = md)

avg_comparisons(fitS,
                variables = "A",
                vcov = "HC3",
                newdata = subset(A == 1))

A model with fewer terms may be required when subclasses are small; removing covariates or their interactions with the treatment can increase precision in smaller datasets. Remember that if subclassification is done for the ATE (even if units are dropped), the newdata argument should be dropped.
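For contrast, a minimal sketch of the MMWS approach using the same mS object from above: keep the stratum weights, leave subclass out of the model, and use robust SEs:

```r
#MMWS: stratum weights from match_data() with robust SEs
md <- match_data(mS)

fitM <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 +
                        X6 + X7 + X8 + X9),
           data = md,
           weights = weights)

avg_comparisons(fitM,
                variables = "A",
                vcov = "HC3",
                newdata = subset(A == 1))
```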

Binary outcomes

Estimating effects on binary outcomes is essentially the same as for continuous outcomes. The main difference is that there are several measures of the effect one can consider, which include the odds ratio (OR), risk ratio/relative risk (RR), and risk difference (RD), and the syntax to avg_comparisons() depends on which one is desired. The outcome model should be one appropriate for binary outcomes (e.g., logistic regression) but is unrelated to the desired effect measure because we can compute any of the above effect measures using avg_comparisons() after the logistic regression.

To fit a logistic regression model, change lm() to glm() and set family = quasibinomial()6. To compute the marginal RD, we can use exactly the same syntax as in the Standard Case; nothing needs to change7.

To compute the marginal RR, we need to add comparison = "lnratioavg" to avg_comparisons(); this computes the marginal log RR. To get the marginal RR, we need to add transform = "exp" to avg_comparisons(), which exponentiates the marginal log RR and its confidence interval. The code below computes the effects and displays the statistics of interest:

#Logistic regression model with covariates
fit2 <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + 
                         X6 + X7 + X8 + X9),
            data = md,
            weights = weights,
            family = quasibinomial())

#Compute effects; RR and confidence interval
avg_comparisons(fit2,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                comparison = "lnratioavg",
                transform = "exp")

The output displays the marginal RR, its Z-value, the p-value for the Z-test of the log RR against 0, and its confidence interval. (Note that even though the Contrast label still suggests the log RR, the RR is actually displayed.) To view the log RR and its standard error, omit the transform argument.

For the marginal OR, the only thing that needs to change is that comparison should be set to "lnoravg". For the marginal RD, both the comparison and transform arguments can be removed (yielding the same call as in the standard case).
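Using the fit2 model from above, the marginal OR call is a one-argument change from the RR call:

```r
#Compute effects; marginal OR and confidence interval
avg_comparisons(fit2,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                comparison = "lnoravg",
                transform = "exp")
```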

Survival outcomes

There are several measures of effect size for survival outcomes. When using the Cox proportional hazards model, the quantity of interest is the hazard ratio (HR) between the treated and control groups. As with the OR, the HR is non-collapsible, which means the estimated HR will only be a valid estimate of the marginal HR when no other covariates are included in the model. Other effect measures, such as the difference in mean survival times or probability of survival after a given time, can be treated just like continuous and binary outcomes as previously described.

For the HR, we cannot compute average marginal effects and must use the coefficient on treatment in a Cox model fit without covariates8. This means that we cannot use the procedures from the Standard Case. Here we describe estimating the marginal HR using coxph() from the survival package. (See help("coxph", package = "survival") for more information on this model.) To request cluster-robust SEs as recommended by Austin (2013b), we need to supply pair membership (stored in the subclass column of md) to the cluster argument and set robust = TRUE. For matching methods that don’t involve pairing (e.g., cardinality and profile matching and [coarsened] exact matching), we can omit the cluster argument (but keep robust = TRUE)9.

library("survival")

#Cox Regression for marginal HR
coxph(Surv(Y_S) ~ A,
      data = md,
      robust = TRUE, 
      weights = weights,
      cluster = subclass)
## Call:
## coxph(formula = Surv(Y_S) ~ A, data = md, weights = weights, 
##     robust = TRUE, cluster = subclass)
## 
##   coef exp(coef) se(coef) robust se z     p
## A 0.47      1.60     0.06      0.07 7 2e-12
## 
## Likelihood ratio test=61  on 1 df, p=7e-15
## n= 1323, number of events= 1323

The coef column contains the log HR, and exp(coef) contains the HR. Remember to always use the robust se for the SE of the log HR. The displayed z-test p-value results from using the robust SE.

For matching with replacement, a special procedure described by Austin and Cafri (2020) may be necessary for valid inference. According to the results of their simulation studies, when the treatment prevalence is low (<30%), a SE that does not involve pair membership (i.e., the match_data() approach, as demonstrated above) is sufficient. When treatment prevalence is higher, the SE that ignores pair membership may be too low, and the authors recommend using a custom SE estimator that uses information about both multiplicity and pairing.

This must be done manually for survival models, using get_matches() and several calls to coxph(), as demonstrated in the appendix of Austin and Cafri (2020). We demonstrate this below:

#get_matches() after matching with replacement
gm <- get_matches(mR)

#Austin & Cafri's (2020) SE estimator
fs <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE, 
            weights = weights, cluster = subclass)
Vs <- fs$var
ks <- nlevels(gm$subclass)

fi <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE, 
            weights = weights, cluster = id)
Vi <- fi$var
ki <- length(unique(gm$id))

fc <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE, 
            weights = weights)
Vc <- fc$var
kc <- nrow(gm)

#Compute the variance and sneak it back into the fit object
fc$var <- (ks/(ks-1))*Vs + (ki/(ki-1))*Vi - (kc/(kc-1))*Vc

fc

The robust se column contains the computed SE, and the reported Z-test uses this SE. The se(coef) column should be ignored.

Using Bootstrapping to Estimate Confidence Intervals

The bootstrap is an alternative to the delta method for estimating confidence intervals for estimated effects. See the section Bootstrapping above for details. Here, we’ll demonstrate two forms of the bootstrap: 1) the standard bootstrap, which involves resampling units and performing matching and effect estimation within each bootstrap sample, and 2) the cluster bootstrap, which involves resampling pairs after matching and estimating the effect in each bootstrap sample. For both, we will use functionality in the boot package. It is critical to set a seed using set.seed() prior to performing the bootstrap in order for results to be replicable.

The standard bootstrap

For the standard bootstrap, we need a function that takes in the original dataset and a vector of sampled unit indices and returns the estimated quantity of interest. This function should perform the matching on the bootstrap sample, fit the outcome model, and estimate the treatment effect using g-computation. In this example, we’ll use matching with replacement, since the standard bootstrap has been found to work well with it (Bodory et al. 2020; Hill and Reiter 2006), despite some analytic results recommending otherwise (Abadie and Imbens 2008). We’ll implement g-computation manually rather than using avg_comparisons(), as this dramatically improves the speed of the estimation, since we don’t require standard errors to be estimated in each sample and can skip the other processing avg_comparisons() performs. We’ll consider the marginal RR ATT of A on the binary outcome Y_B.

The first step is to write the estimation function, which we call boot_fun. In it, we perform the matching, estimate the effect, and return the estimate of interest, the marginal RR.

boot_fun <- function(data, i) {
  boot_data <- data[i,]
  
  #Do 1:1 PS matching with replacement
  m <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + 
                 X6 + X7 + X8 + X9,
               data = boot_data,
               replace = TRUE)
  
  #Extract matched dataset
  md <- match_data(m, data = boot_data)
  
  #Fit outcome model
  fit <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + 
                          X6 + X7 + X8 + X9),
             data = md, weights = weights,
             family = quasibinomial())
  
  ## G-computation ##
  #Subset to treated units for ATT; skip for ATE
  md1 <- subset(md, A == 1)
  
  #Estimated potential outcomes under treatment
  p1 <- predict(fit, type = "response",
                newdata = transform(md1, A = 1))
  Ep1 <- mean(p1)
  
  #Estimated potential outcomes under control
  p0 <- predict(fit, type = "response",
                newdata = transform(md1, A = 0))
  Ep0 <- mean(p0)
  
  #Risk ratio
  Ep1 / Ep0
}

Next, we call boot::boot() with this function and the original dataset supplied to perform the bootstrapping. We’ll request 199 bootstrap replications here, but in practice you should use many more, upwards of 999. More is always better. Using more also allows you to use the bias-corrected and accelerated (BCa) bootstrap confidence intervals (which you can request by setting type = "bca" in the call to boot.ci()), which are known to be the most accurate. See ?boot.ci for details. Here, we’ll just use a percentile confidence interval.

library("boot")
set.seed(54321)
boot_out <- boot(d, boot_fun, R = 199)

boot_out
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = d, statistic = boot_fun, R = 199)
## 
## 
## Bootstrap Statistics :
##     original  bias    std. error
## t1*    1.347  0.1417      0.1937
boot.ci(boot_out, type = "perc")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 199 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot_out, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 1.144,  1.891 )  
## Calculations and Intervals on Original Scale
## Some percentile intervals may be unstable

We find a RR of 1.347 with a confidence interval of (1.144, 1.891). If we had wanted a risk difference, we could have changed the final line in boot_fun() to be Ep1 - Ep0.

The cluster bootstrap

For the cluster bootstrap, we need a function that takes in a vector of subclasses (i.e., pairs) and a vector of sampled pair indices and returns the estimated quantity of interest. This function should fit the outcome model and estimate the treatment effect using g-computation, but the matching step occurs prior to the bootstrap. Here, we’ll use matching without replacement, since the cluster bootstrap has been found to work well with it (Austin and Small 2014; Abadie and Spiess 2019). This could be used for any method that returns pair membership, including other pair matching methods without replacement and full matching.

As before, we’ll use g-computation to estimate the marginal RR ATT, and we’ll do so manually rather than using avg_comparisons() for speed. Note that the cluster bootstrap is already much faster than the standard bootstrap because matching does not need to occur within each bootstrap sample. First, we’ll do a round of matching.

mNN <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + 
                 X6 + X7 + X8 + X9, data = d)
mNN
## A `matchit` object
##  - method: 1:1 nearest neighbor matching without replacement
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 2000 (original), 882 (matched)
##  - target estimand: ATT
##  - covariates: X1, X2, X3, X4, X5, X6, X7, X8, X9
md <- match_data(mNN)

Next, we’ll write the function that takes in cluster membership and the sampled indices and returns an estimate.

#Unique pair IDs
pair_ids <- levels(md$subclass)

#Unit IDs, split by pair membership
split_inds <- split(seq_len(nrow(md)), md$subclass)

cluster_boot_fun <- function(pairs, i) {
  
  #Extract units corresponding to selected pairs
  ids <- unlist(split_inds[pairs[i]])
  
  #Subset md with block bootstrapped indices
  boot_md <- md[ids,]
  
  #Fit outcome model
  fit <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + 
                          X6 + X7 + X8 + X9),
             data = boot_md, weights = weights,
             family = quasibinomial())
  
  ## G-computation ##
  #Subset to treated units for ATT; skip for ATE
  md1 <- subset(boot_md, A == 1)
  
  #Estimated potential outcomes under treatment
  p1 <- predict(fit, type = "response",
                newdata = transform(md1, A = 1))
  Ep1 <- mean(p1)
  
  #Estimated potential outcomes under control
  p0 <- predict(fit, type = "response",
                newdata = transform(md1, A = 0))
  Ep0 <- mean(p0)
  
  #Risk ratio
  Ep1 / Ep0
}

Next, we call boot::boot() with this function and the vector of pair membership supplied to perform the bootstrapping. We’ll request 199 bootstrap replications, but in practice you should use many more, upwards of 999. More is always better. Using more also allows you to use the bias-corrected and accelerated (BCa) bootstrap confidence intervals, which are known to be the most accurate. See ?boot.ci for details. Here, we’ll just use a percentile confidence interval.

library("boot")
set.seed(54321)
cluster_boot_out <- boot(pair_ids, cluster_boot_fun,
                         R = 199)

cluster_boot_out
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = pair_ids, statistic = cluster_boot_fun, R = 199)
## 
## 
## Bootstrap Statistics :
##     original   bias    std. error
## t1*    1.588 0.001319      0.1265
boot.ci(cluster_boot_out, type = "perc")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 199 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = cluster_boot_out, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 1.356,  1.857 )  
## Calculations and Intervals on Original Scale
## Some percentile intervals may be unstable

We find a RR of 1.588 with a confidence interval of (1.356, 1.857). If we had wanted a risk difference, we could have changed the final line in cluster_boot_fun() to be Ep1 - Ep0.

Moderation Analysis

Moderation analysis involves determining whether a treatment effect differs across levels of another variable. The use of matching with moderation analysis is described in Green and Stuart (2014). The goal is to achieve balance within each subgroup of the potential moderating variable, and there are several ways of doing so. Broadly, one can either perform matching in the full dataset, requiring exact matching on the moderator, or one can perform completely separate analyses in each subgroup. We’ll demonstrate the first approach below; see the blog post “Subgroup Analysis After Propensity Score Matching Using R” by Noah Greifer for an example of the other approach.

There are benefits to using either approach, and Green and Stuart (2014) find that either can be successful at balancing the subgroups. The first approach may be most effective with small samples, where separate propensity score models would be fit with greater uncertainty and an increased possibility of perfect prediction or failure to converge (Wang et al. 2018). The second approach may be more effective with larger samples or with matching methods that target balance in the matched sample, such as genetic matching (Kreif et al. 2012). With genetic matching, separate subgroup analyses ensure balance is optimized within each subgroup rather than just overall. The chosen approach should be that which achieves the best balance, though we don’t demonstrate assessing balance here to maintain focus on effect estimation.

The full dataset approach involves pooling information across subgroups. This could involve estimating propensity scores using a single model for both groups but exact matching on the potential moderator. The propensity score model could include moderator-by-covariate interactions to allow the propensity score model to vary across subgroups on some covariates. It is critical that exact matching is done on the moderator so that matched pairs are not split across subgroups.

We’ll consider the binary variable X5 to be the potential moderator of the effect of A on Y_C. Below, we’ll estimate a propensity score using a single propensity score model with a few moderator-by-covariate interactions. We’ll perform nearest neighbor matching on the propensity score and exact matching on the moderator, X5.

mP <- matchit(A ~ X1 + X2 + X5*X3 + X4 + 
                X5*X6 + X7 + X5*X8 + X9,
              data = d,
              exact = ~X5)
mP
## A `matchit` object
##  - method: 1:1 nearest neighbor matching without replacement
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 2000 (original), 882 (matched)
##  - target estimand: ATT
##  - covariates: X1, X2, X5, X3, X4, X6, X7, X8, X9

Although it is straightforward to assess balance overall using summary(), it is more challenging to assess balance within subgroups. The easiest way to check subgroup balance would be to use cobalt::bal.tab(), which has a cluster argument that can be used to assess balance within subgroups, e.g., by cobalt::bal.tab(mP, cluster = "X5"). See the vignette “Appendix 2: Using cobalt with Clustered, Multiply Imputed, and Other Segmented Data” on the cobalt website for details.

If we are satisfied with balance, we can then model the outcome with an interaction between the treatment and the moderator.

mdP <- match_data(mP)

fitP <- lm(Y_C ~ A * X5, data = mdP, weights = weights)

To estimate the subgroup ATTs, we can use avg_comparisons(), this time specifying the by argument to signify that we want treatment effects stratified by the moderator.

avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5")

We can see that the subgroup mean differences are quite similar to each other. Finally, we can test for moderation using another call to avg_comparisons(), this time using the hypothesis argument to signify that we want to compare effects between subgroups:

avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5",
                hypothesis = ~pairwise)

As expected, the difference between the subgroup treatment effects is small and nonsignificant, so there is no evidence of moderation by X5.

When the moderator has more than two levels, it is possible to run an omnibus test for moderation by changing hypothesis to ~reference and supplying the output to hypotheses() with joint = TRUE, e.g.,

avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5",
                hypothesis = ~reference) |>
  hypotheses(joint = TRUE)

This produces a single p-value for the test that all pairwise differences between subgroups are equal to zero.

Reporting Results

It is important to be as thorough and complete as possible when describing the methods of estimating the treatment effect and the results of the analysis. This improves transparency and replicability of the analysis. Results should at least include the following:

  • a description of the outcome model used (e.g., logistic regression, a linear model with treatment-covariate interactions and covariates, a Cox proportional hazards model with the matching weights applied)
  • the way the effect was estimated (e.g., using g-computation or as the coefficient in the outcome model)
  • the way SEs and confidence intervals were estimated (e.g., using robust SEs, using cluster-robust SEs with pair membership as the cluster, using the BCa bootstrap with 4999 bootstrap replications and the entire process of matching and effect estimation included in each replication)
  • R packages and functions used in estimating the effect and its SE (e.g., glm() in base R, avg_comparisons() in marginaleffects, boot() and boot.ci() in boot)
  • The effect and its SE and confidence interval

All this is in addition to information about the matching method, propensity score estimation procedure (if used), balance assessment, etc. mentioned in the other vignettes.

Common Mistakes

There are a few common mistakes that should be avoided. It is important not only to avoid these mistakes in one’s own research but also to be able to spot these mistakes in others’ analyses.

1. Failing to include weights

Several methods involve weights that are to be used in estimating the treatment effect. With full matching and stratification matching (when analyzed using MMWS), the weights do all the work of balancing the covariates across the treatment groups; omitting them ignores the entire purpose of matching. Some cases are less obvious. When performing matching with replacement and estimating the treatment effect using the match_data() output, weights must be included to ensure control units matched to multiple treated units are weighted accordingly. Similarly, when performing k:1 matching in which not all treated units receive k matches, weights are required to account for the differential contribution of the matched control units. The only time weights can be omitted after pair matching is when performing 1:1 matching without replacement. Even in this scenario, including weights will not affect the analysis, so it can be good practice to always include them to prevent this error from occurring. There are some scenarios where weights are not useful because the conditioning occurs through some other means, such as when using the direct subclass strategy rather than MMWS for estimating marginal effects after stratification.
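As a minimal sketch of doing this correctly (using the lalonde dataset shipped with MatchIt; the outcome re78 and the matching specification are illustrative choices, not the only valid ones), the weights column returned by match_data() is passed directly to the outcome model:

```r
library("MatchIt")
data("lalonde", package = "MatchIt")

# Matching with replacement: control units can be reused, so the
# matching weights are required in the outcome model
m.out <- matchit(treat ~ age + educ + race + married +
                   nodegree + re74 + re75,
                 data = lalonde, replace = TRUE)
md <- match_data(m.out)

# Omitting `weights = weights` here would be the mistake described above
fit <- lm(re78 ~ treat, data = md, weights = weights)
```

After 1:1 matching without replacement, the weights are all equal to 1 in the matched sample, which is why including them is harmless there.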

2. Failing to use robust or cluster-robust standard errors

Robust SEs are required when using weights to estimate the treatment effect. The model-based SEs resulting from weighted least squares or maximum likelihood are inaccurate when using matching weights because they assume weights are frequency weights rather than probability weights. Cluster-robust SEs account for both the matching weights and pair membership and should be used when appropriate. Sometimes, researchers use functions in the survey package to estimate robust SEs, especially with inverse probability weighting; this is a valid way to compute robust SEs and will give similar results to sandwich::vcovHC().10
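For example (a sketch using lalonde with 1:1 nearest-neighbor matching; the outcome re78 is an illustrative choice), cluster-robust SEs with pair membership as the cluster can be computed with the sandwich and lmtest packages:

```r
library("MatchIt")
library("lmtest")
library("sandwich")
data("lalonde", package = "MatchIt")

# 1:1 nearest-neighbor matching without replacement
m.out <- matchit(treat ~ age + educ + race + married +
                   nodegree + re74 + re75, data = lalonde)
md <- match_data(m.out)

fit <- lm(re78 ~ treat, data = md, weights = weights)

# Cluster-robust SEs with pair membership (the `subclass` column of the
# match_data() output) as the clustering variable; plain model-based SEs
# from summary(fit) would ignore both the weights and the pairing
coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
```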

3. Interpreting conditional effects as marginal effects

The distinction between marginal and conditional effects is not always made clear in both methodological and applied papers. Some statistical methods are valid only for estimating conditional effects and should not be used to estimate marginal effects (without further modification). Sometimes conditional effects are desirable, and such methods may be useful for estimating them, but when marginal effects are the target of inference, it is critical not to interpret estimates resulting from statistical methods aimed at conditional effects as if they were marginal effects. Although this issue is particularly salient with binary and survival outcomes due to the general noncollapsibility of the OR, RR, and HR, it can also occur with linear models for continuous outcomes or the RD.

The following methods estimate conditional effects for binary or survival outcomes (with noncollapsible effect measures) and should not be used to estimate marginal effects:

  • Logistic regression or Cox proportional hazards model with covariates and/or the propensity score included, using the coefficient on treatment as the effect estimate
  • Conditional logistic regression after matching (e.g., using survival::clogit())
  • Stratified Cox regression after matching (e.g., using survival::coxph() with strata() in the model formula)
  • Averaging stratum-specific effect estimates after stratification, including using Mantel-Haenszel OR pooling
  • Including pair or stratum fixed or random effects in a logistic regression model, using the coefficient on treatment as the effect estimate

In addition, with continuous outcomes, conditional effects can be mistakenly interpreted as marginal effect estimates when treatment-covariate interactions are present in the outcome model. If the covariates are not centered at their mean in the target population (e.g., the treated group for the ATT, the full sample for the ATE, or the remaining matched sample for an ATM), the coefficient on treatment will not correspond to the marginal effect in the target population; it will correspond to the effect of treatment when the covariate values are equal to zero, which may not be meaningful or plausible. G-computation is always the safest way to estimate effects when including covariates in the outcome model, especially in the presence of treatment-covariate interactions.
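As a sketch of the safer approach (again using lalonde; the interaction specification is illustrative), g-computation via marginaleffects::avg_comparisons() recovers the marginal ATT even when the outcome model contains uncentered treatment-covariate interactions:

```r
library("MatchIt")
library("marginaleffects")
data("lalonde", package = "MatchIt")

m.out <- matchit(treat ~ age + educ + race + married +
                   nodegree + re74 + re75, data = lalonde)
md <- match_data(m.out)

# The coefficient on `treat` in this model is a conditional effect
# (the effect when age = educ = 0) and should not be reported as the ATT
fit <- lm(re78 ~ treat * (age + educ), data = md, weights = weights)

# G-computation: average the predicted treatment contrasts over the
# treated units, with pair membership as the clustering variable
avg_comparisons(fit, variables = "treat",
                vcov = ~subclass,
                newdata = subset(md, treat == 1))
```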

References

Abadie, Alberto, and Guido W. Imbens. 2008. “On the Failure of the Bootstrap for Matching Estimators.” Econometrica 76 (6): 1537–57. https://doi.org/10.3982/ECTA6474.
Abadie, Alberto, and Jann Spiess. 2019. “Robust Post-Matching Inference.” Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2020.1840383.
Austin, Peter C. 2009. “Type I Error Rates, Coverage of Confidence Intervals, and Variance Estimation in Propensity-Score Matched Analyses.” The International Journal of Biostatistics 5 (1). https://doi.org/10.2202/1557-4679.1146.
———. 2013a. “The Performance of Different Propensity Score Methods for Estimating Marginal Hazard Ratios.” Statistics in Medicine 32 (16): 2837–49. https://doi.org/10.1002/sim.5705.
———. 2013b. “The Use of Propensity Score Methods with Survival or Time-to-Event Outcomes: Reporting Measures of Effect Similar to Those Used in Randomized Experiments.” Statistics in Medicine 33 (7): 1242–58. https://doi.org/10.1002/sim.5984.
———. 2017. “Double Propensity-Score Adjustment: A Solution to Design Bias or Bias Due to Incomplete Matching.” Statistical Methods in Medical Research 26 (1): 201–22. https://doi.org/10.1177/0962280214543508.
Austin, Peter C., and Guy Cafri. 2020. “Variance Estimation When Using Propensity-Score Matching with Replacement with Survival or Time-to-Event Outcomes.” Statistics in Medicine 39 (11): 1623–40. https://doi.org/10.1002/sim.8502.
Austin, Peter C., and Dylan S. Small. 2014. “The Use of Bootstrapping When Using Propensity-Score Matching Without Replacement: A Simulation Study.” Statistics in Medicine 33 (24): 4306–19. https://doi.org/10.1002/sim.6276.
Austin, Peter C., and Elizabeth A. Stuart. 2017. “Estimating the Effect of Treatment on Binary Outcomes Using Full Matching on the Propensity Score.” Statistical Methods in Medical Research 26 (6): 2505–25. https://doi.org/10.1177/0962280215601134.
Austin, Peter C., Neal Thomas, and Donald B. Rubin. 2020. “Covariate-Adjusted Survival Analyses in Propensity-Score Matched Samples: Imputing Potential Time-to-Event Outcomes.” Statistical Methods in Medical Research 29 (3): 728–51. https://doi.org/10.1177/0962280218817926.
Bodory, Hugo, Lorenzo Camponovo, Martin Huber, and Michael Lechner. 2020. “The Finite Sample Performance of Inference Methods for Propensity Score Matching and Weighting Estimators.” Journal of Business & Economic Statistics 38 (1): 183–200. https://doi.org/10.1080/07350015.2018.1476247.
Cameron, A. Colin, and Douglas L. Miller. 2015. “A Practitioner’s Guide to Cluster-Robust Inference.” Journal of Human Resources 50 (2): 317–72. https://doi.org/10.3368/jhr.50.2.317.
Carpenter, James, and John Bithell. 2000. “Bootstrap Confidence Intervals: When, Which, What? A Practical Guide for Medical Statisticians.” Statistics in Medicine 19 (9): 1141–64. https://doi.org/10.1002/(SICI)1097-0258(20000515)19:9<1141::AID-SIM479>3.0.CO;2-F.
Desai, Rishi J., Kenneth J. Rothman, Brian T. Bateman, Sonia Hernandez-Diaz, and Krista F. Huybrechts. 2017. “A Propensity-Score-Based Fine Stratification Approach for Confounding Adjustment When Exposure Is Infrequent.” Epidemiology 28 (2): 249–57. https://doi.org/10.1097/EDE.0000000000000595.
Efron, Bradley, and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. Springer US.
Gayat, Etienne, Matthieu Resche-Rigon, Jean-Yves Mary, and Raphaël Porcher. 2012. “Propensity Score Applied to Survival Data Analysis Through Proportional Hazards Models: A Monte Carlo Study.” Pharmaceutical Statistics 11 (3): 222–29. https://doi.org/10.1002/pst.537.
Green, Kerry M., and Elizabeth A. Stuart. 2014. “Examining Moderation Analyses in Propensity Score Methods: Application to Depression and Substance Use.” Journal of Consulting and Clinical Psychology, Advances in Data Analytic Methods, 82 (5): 773–83. https://doi.org/10.1037/a0036515.
Greifer, Noah, and Elizabeth A. Stuart. 2021. “Choosing the Estimand When Matching or Weighting in Observational Studies.” arXiv:2106.10577 [Stat], June. https://arxiv.org/abs/2106.10577.
Hill, Jennifer, and Jerome P. Reiter. 2006. “Interval Estimation for Treatment Effects Using Propensity Score Matching.” Statistics in Medicine 25 (13): 2230–56. https://doi.org/10.1002/sim.2277.
Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis 15 (3): 199–236. https://doi.org/10.1093/pan/mpl013.
Hong, Guanglei. 2010. “Marginal Mean Weighting Through Stratification: Adjustment for Selection Bias in Multilevel Data.” Journal of Educational and Behavioral Statistics 35 (5): 499–531. https://doi.org/10.3102/1076998609359785.
King, Gary, and Margaret E. Roberts. 2015. “How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It.” Political Analysis 23 (2): 159–79. https://doi.org/10.1093/pan/mpu015.
Kreif, Noemi, Richard Grieve, Rosalba Radice, Zia Sadique, Roland Ramsahai, and Jasjeet S. Sekhon. 2012. “Methods for Estimating Subgroup Effects in Cost-Effectiveness Analyses That Use Observational Data.” Medical Decision Making 32 (6): 750–63. https://doi.org/10.1177/0272989X12448929.
Liang, Kung-Yee, and Scott L. Zeger. 1986. “Longitudinal Data Analysis Using Generalized Linear Models.” Biometrika 73 (1): 13–22. https://doi.org/10.1093/biomet/73.1.13.
MacKinnon, James G. 2006. “Bootstrap Methods in Econometrics.” Economic Record 82 (s1): S2–18. https://doi.org/10.1111/j.1475-4932.2006.00328.x.
MacKinnon, James G., and Halbert White. 1985. “Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties.” Journal of Econometrics 29 (3): 305–25. https://doi.org/10.1016/0304-4076(85)90158-7.
Nguyen, Tri-Long, Gary S. Collins, Jessica Spence, Jean-Pierre Daurès, P. J. Devereaux, Paul Landais, and Yannick Le Manach. 2017. “Double-Adjustment in Propensity Score Matching Analysis: Choosing a Threshold for Considering Residual Imbalance.” BMC Medical Research Methodology 17: 78. https://doi.org/10.1186/s12874-017-0338-0.
Schafer, Joseph L., and Joseph Kang. 2008. “Average Causal Effects from Nonrandomized Studies: A Practical Guide and Simulated Example.” Psychological Methods 13 (4): 279–313. https://doi.org/10.1037/a0014268.
Snowden, Jonathan M., Sherri Rose, and Kathleen M. Mortimer. 2011. “Implementation of G-Computation on a Simulated Data Set: Demonstration of a Causal Inference Technique.” American Journal of Epidemiology 173 (7): 731–38. https://doi.org/10.1093/aje/kwq472.
Wan, Fei. 2019. “Matched or Unmatched Analyses with Propensity-Score Matched Data?” Statistics in Medicine 38 (2): 289–300. https://doi.org/10.1002/sim.7976.
Wang, Shirley V., Yinzhu Jin, Bruce Fireman, Susan Gruber, Mengdong He, Richard Wyss, HoJin Shin, et al. 2018. “Relative Performance of Propensity Score Matching Strategies for Subgroup Analyses.” American Journal of Epidemiology 187 (8): 1799–1807. https://doi.org/10.1093/aje/kwy049.
Westreich, D., and S. Greenland. 2013. “The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients.” American Journal of Epidemiology 177 (4): 292–98. https://doi.org/10.1093/aje/kws412.

Code to Generate Data used in Examples

#Generating data similar to Austin (2009) for demonstrating treatment effect estimation
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] - log(2.4)*X[,5] + log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] + rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

# Binary outcome
gen_Y_B <- function(A, X) {
  LP_B <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] + log(1.5)*X[,4] + log(2.4)*X[,5] + log(1.5)*X[,6]
  P_B <- plogis(LP_B)
  rbinom(length(A), 1, P_B)
}
#Conditional:
#  OR:   2.4
#  logOR: .875
#Marginal:
#  RD:    .144
#  RR:   1.54
#  logRR: .433
#  OR:   1.92
#  logOR  .655

# Survival outcome
gen_Y_S <- function(A, X) {
  LP_S <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] + log(1.5)*X[,4] + log(2.4)*X[,5] + log(1.5)*X[,6]
  sqrt(-log(runif(length(A)))*2e4*exp(-LP_S))
}
#Conditional:
#  HR:   2.4
#  logHR: .875
#Marginal:
#  HR:   1.57
#  logHR: .452

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)

Y_C <- gen_Y_C(A, X)
Y_B <- gen_Y_B(A, X)
Y_S <- gen_Y_S(A, X)

d <- data.frame(A, X, Y_C, Y_B, Y_S)

  1. Because they are only appropriate with a large number of clusters, cluster-robust SEs are generally not used with subclassification methods. Regular robust SEs are valid with these methods when using the subclassification weights to estimate marginal effects.↩︎

  2. Sometimes, an error will occur with this method, which usually means more bootstrap replications are required. The number of replicates must be greater than the original sample size when using the full bootstrap and greater than the number of pairs/strata when using the block bootstrap.↩︎

  3. The matching weights are not necessary when performing 1:1 matching, but we include them here for generality. When weights are not necessary, including them does not affect the estimates. Because it may not always be clear when weights are required, we recommend always including them.↩︎

  4. To verify that they are equal, supply the output of avg_predictions() to hypotheses(), e.g., avg_predictions(...) |> hypotheses(~pairwise); this explicitly compares the average potential outcomes and should yield identical estimates to the avg_comparisons() call.↩︎

  5. It is also known as fine stratification weighting, described by Desai et al. (2017).↩︎

  6. We use quasibinomial() instead of binomial() simply to avoid a spurious warning that can occur with certain kinds of matching; the results will be identical regardless.↩︎

  7. Note that for low or high average expected risks computed with avg_predictions(), the confidence intervals may go below 0 or above 1; this is because an approximation is used. To avoid this problem, bootstrapping or simulation-based inference can be used instead.↩︎

  8. It is not immediately clear how to estimate a marginal HR when covariates are included in the outcome model; though Austin, Thomas, and Rubin (2020) describe several ways of including covariates in a model to estimate the marginal HR, they do not develop SEs and little research has been done on this method, so we will not present it here. Instead, we fit a simple Cox model with the treatment as the sole predictor.↩︎

  9. For subclassification, only MMWS can be used; this is done simply by including the stratification weights in the Cox model and omitting the cluster argument.↩︎

  10. To use survey to adjust for pair membership, one can use the following code to specify the survey design to be used with svyglm(): svydesign(ids = ~subclass, weights = ~weights, data = md) where md is the output of match_data(). After svyglm(), avg_comparisons() can be used, and the vcov argument does not need to be specified.↩︎


Assessing Balance

Noah Greifer

2025-03-09

Introduction

Covariate balance is the degree to which the distribution of covariates is similar across levels of the treatment. It has three main roles in causal effect estimation using matching: 1) as a target to optimize with matching, 2) as a method of assessing the quality of the resulting matches, and 3) as evidence to an audience that the estimated effect is close to the true effect. When covariate balance is achieved, the resulting effect estimate is less sensitive to model misspecification and ideally close to the true treatment effect. The benefit of randomization is that covariate balance is achieved automatically (in expectation), which is why unadjusted effects estimated from randomized trial data (in the absence of drop-out) can be validly interpreted as causal effects. When using matching to recover causal effect estimates from observational data, balance is not guaranteed and must be assessed.

This document provides instructions for assessing and reporting covariate balance as part of a matching analysis. The tools available in MatchIt for balance assessment should be used during the process of selecting a good matching scheme and ensuring that the chosen scheme is adequate. These tools implement the recommendations of Ho et al. (2007) and others for assessing balance.

In addition to the tools available in MatchIt, the cobalt package has a suite of functions designed to assess and display balance and is directly compatible with MatchIt objects. cobalt has extensive documentation, but we describe some of its functionality here as a complement to the tools in MatchIt.

The structure of this document is as follows: first, we describe some of the recommendations for balance checking and their rationale; next, we describe the tools for assessing balance present in MatchIt and display their use in evaluating several matching schemes; finally; we briefly describe some of the functionality in cobalt to extend that in MatchIt.

Recommendations for Balance Assessment

Assessing balance involves assessing whether the distributions of covariates are similar between the treated and control groups. Balance is typically assessed by examining univariate balance summary statistics for each covariate, though more complicated methods exist for assessing joint distributional balance as well. Visual depictions of distributional balance can be a helpful complement to numerical summaries, especially for hard-to-balance and prognostically important covariates.

Many recommendations for balance assessment have been described in the methodological literature. Unfortunately, there is no single best way to assess balance or to weigh balance summary statistics because the degree and form of balance that will yield the least bias in an effect estimate depends on unknown qualities of the outcome data-generating model. Nonetheless, there are a number of valuable recommendations that can be implemented to ensure matching is successful at eliminating or reducing bias. We review some of these here.

Common recommendations for assessing balance include the following:

  • Standardized mean differences. The standardized mean difference (SMD) is the difference in the means of each covariate between treatment groups standardized by a standardization factor so that it is on the same scale for all covariates. The standardization factor is typically the standard deviation of the covariate in the treated group when targeting the ATT or the pooled standard deviation across both groups when targeting the ATE. The standardization factor should be the same before and after matching to ensure changes in the mean difference are not confounded by changes in the standard deviation of the covariate. SMDs close to zero indicate good balance. Several recommended thresholds have been published in the literature; we recommend .1 and .05 for prognostically important covariates. Higher values may be acceptable when using covariate adjustment in the matched sample. In addition to computing SMDs on the covariates themselves, it is important to compute them on squares, cubes, and higher exponents as well as interactions between covariates. Several empirical studies have examined the appropriateness of using SMDs in balance assessment, including Belitser et al. (2011), Ali et al. (2014), and Stuart, Lee, and Leacy (2013); in general, there is often a high correlation between the mean or maximum absolute SMD and the degree of bias in the treatment effect.

  • Variance Ratios. The variance ratio is the ratio of the variance of a covariate in one group to that in the other. Variance ratios close to 1 indicate good balance because they imply the variances of the samples are similar (Austin 2009).

  • Empirical CDF Statistics. Statistics related to the difference in the empirical cumulative distribution functions (eCDFs) of each covariate between groups allow assessment of imbalance across the entire covariate distribution of that covariate rather than just its mean or variance. The maximum eCDF difference, also known as the Kolmogorov-Smirnov statistic, is sometimes recommended as a useful supplement to SMDs for assessing balance (Austin and Stuart 2015) and is often used as a criterion to use in propensity score methods that attempt to optimize balance (e.g., McCaffrey, Ridgeway, and Morral 2004; Diamond and Sekhon 2013). Although the mean eCDF difference has not been as well studied, it provides a summary of imbalance that may be missed by relying solely on the maximum difference.

  • Visual Diagnostics. Visual diagnostics such as eCDF plots, empirical quantile-quantile (eQQ) plots, and kernel density plots can be used to see exactly how the covariate distributions differ from each other, i.e., where in the distribution the greatest imbalances are (Ho et al. 2007; Austin 2009). This can help to figure out how to tailor a matching method to target imbalance in a specific region of the covariate distribution.

  • Prognostic scores. The prognostic score is an estimate of the potential outcome under control for each unit (Hansen 2008). Balance on the prognostic score has been shown to be highly correlated with bias in the effect estimate, making it a useful tool in balance assessment (Stuart, Lee, and Leacy 2013). Estimating the prognostic score requires having access to the outcome data, and using it may be seen as violating the principle of separating the design and analysis stages of a matching analysis (Rubin 2001). However, because only the outcome values from the control group are required to use the prognostic score, some separation is maintained.
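As a small illustration of the SMD computation described above (using lalonde; age is an arbitrary covariate), with the treated group's standard deviation as the standardization factor when targeting the ATT:

```r
library("MatchIt")
data("lalonde", package = "MatchIt")

# SMD for `age` before matching: mean difference divided by the
# standard deviation of `age` in the treated group (ATT factor)
with(lalonde, {
  (mean(age[treat == 1]) - mean(age[treat == 0])) / sd(age[treat == 1])
})
```

The same standardization factor (computed before matching) should be reused when computing the SMD in the matched sample, as noted above.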

Several multivariate statistics exist that summarize balance across the entire joint covariate distribution. These can be functions of the above measures, like the mean or maximum absolute SMD or the generalized weighted distance (GWD; Franklin et al. 2014), which is the sum of SMDs for the covariates and their squares and interactions, or separate statistics that measure quantities that abstract away from the distribution of individual covariates, like the L1 distance (Iacus, King, and Porro 2011), cross-match test (Heller, Rosenbaum, and Small 2010), or energy distance (Huling and Mak 2020).

Balance on the propensity score has often been considered a useful measure of balance, but we do not necessarily recommend it except as a supplement to balance on the covariates. Propensity score balance will generally be good with any matching method regardless of the covariate balancing potential of the propensity score, so a balanced propensity score does not imply balanced covariates (Austin 2009). Similarly, it may happen that covariates may be well balanced even if the propensity score is not balanced, such as when covariates are prioritized above the propensity score in the matching specification (e.g., with genetic matching). Given these observations, the propensity score should not be relied upon for assessing covariate balance. Simulation studies by Stuart, Lee, and Leacy (2013) provide evidence for this recommendation against relying on propensity score balance.

There has been some debate about the use of hypothesis tests, such as t-tests or Kolmogorov-Smirnov tests, for assessing covariate balance. The idea is that balance tests test the null hypothesis that the matched sample has equivalent balance to a randomized experiment. There are several problems with balance tests, described by Ho et al. (2007) and Imai, King, and Stuart (2008): 1) balance is a property of the sample, not of a population from which the sample was drawn; 2) the power of balance tests depends on the sample size, which changes during matching even if balance does not change; and 3) the use of hypothesis tests implies a uniform decision criterion for rejecting the null hypothesis (e.g., p-value less than .05, potentially with corrections for multiple comparisons), when balance should be improved without limit. MatchIt does not report any balance tests or p-values, instead relying on the descriptive statistics described above.

Recommendations for Balance Reporting

A variety of methods should be used when assessing balance to try to find an optimal matched set that will ideally yield a low-error estimate of the desired effect. However, reporting every balance statistic or plot in a research report or publication can be burdensome and unnecessary. That said, it is critical to report balance to demonstrate to readers that the resulting estimate is approximately unbiased and relies little on extrapolation or correct outcome model specification. We recommend the following in reporting balance in a matching analysis:

  • Report SMDs before and after matching for each covariate, any prognostically important interactions between covariates, and the prognostic score; this can be reported in a table or in a Love plot.

  • Report summaries of balance for other statistics, e.g., the largest mean and maximum eCDF difference among the covariates and the largest SMD among squares, cubes, and interactions of the covariates.

MatchIt provides tools for calculating each of these statistics so they can be reported with ease in a manuscript or report.

Assessing Balance with MatchIt

MatchIt contains several tools to assess balance numerically and graphically. The primary balance assessment function is summary.matchit(), which is called when using summary() on a MatchIt object and produces several tables of balance statistics before and after matching. plot.summary.matchit() generates a Love plot using R’s base graphics system containing the standardized mean differences resulting from a call to summary.matchit() and provides a nice way to display balance visually for inclusion in an article or report. plot.matchit() generates several plots that display different elements of covariate balance, including propensity score overlap and distribution plots of the covariates. These functions together form a suite that can be used to assess and report balance in a variety of ways.

To demonstrate MatchIt’s balance assessment capabilities, we will use the Lalonde data included in MatchIt and used in vignette("MatchIt"). We will perform 1:1 nearest neighbor matching with replacement on the propensity score, though the functionality is identical across all matching methods except propensity score subclassification, which we illustrate at the end.

library("MatchIt")
data("lalonde", package = "MatchIt")

#1:1 NN matching w/ replacement on a logistic regression PS
m.out <- matchit(treat ~ age + educ + race + married + 
                   nodegree + re74 + re75, data = lalonde,
                 replace = TRUE)
m.out
## A `matchit` object
##  - method: 1:1 nearest neighbor matching with replacement
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 614 (original), 267 (matched)
##  - target estimand: ATT
##  - covariates: age, educ, race, married, nodegree, re74, re75

summary.matchit()

When summary() is called on a matchit object, several tables of information are displayed. These include balance statistics for each covariate before matching, balance statistics for each covariate after matching, the percent reduction in imbalance after matching, and the sample sizes before and after matching. summary.matchit() has four additional arguments that control how balance is computed:

  • interactions controls whether balance statistics for all squares and pairwise interactions of covariates are to be displayed in addition to the covariates. The default is FALSE, and setting to TRUE can make the output massive when many covariates are present, but it is important to ensure no important interactions remain imbalanced.
  • addlvariables allows for balance to be assessed on variables other than those inside the matchit object. For example, if the distance between units only relied on a subset of covariates but balance needed to be achieved on all covariates, addlvariables could be used to supply these additional covariates. In addition to adding other variables, addlvariables can be used to request balance on specific functions of the covariates already in the matchit object, such as polynomial terms or interactions. The input to addlvariables can be a one-sided formula with the covariates and any desired transformations thereof on the right hand side, just like a model formula (e.g., addlvariables = ~ X1 + X2 + I(X1^2) would request balance on X1, X2, and the square of X1). Additional variables supplied to addlvariables but not present in the matchit object can be supplied as a data frame using the data argument.
  • standardize controls whether standardized or unstandardized statistics are to be displayed. Standardized statistics include the standardized mean difference and eCDF statistics; unstandardized statistics include the raw difference in means and eQQ plot statistics. (Regardless, the variance ratio will always be displayed.) The default is TRUE for standardized statistics, which are more common to report because they are all on the same scale regardless of the scale of the covariates.1
  • pair.dist controls whether within-pair distances should be computed and displayed. These reflect the average distance between units within the same pair, standardized or unstandardized according to the argument to standardize. The default is TRUE. With full matching, exact matching, coarsened exact matching, and propensity score subclassification, computing pair distances can take a long time, and so it may be beneficial to set to FALSE in these cases.

In addition, the arguments un (default: TRUE) and improvement (default: FALSE) control whether balance prior to matching should be displayed and whether the percent balance improvement after matching should be displayed. These can be set to FALSE to reduce the output.

Below, we call summary.matchit() with addlvariables to display balance on covariates and a few functions of them in the matched sample. In particular, we request balance on the square of age, the variables representing whether re74 and re75 were equal to 0, and the interaction between educ and race.

summary(m.out, addlvariables = ~ I(age^2) + I(re74==0) + 
          I(re75==0) + educ:race)
## 
## Call:
## matchit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, replace = TRUE)
## 
## Summary of Balance for All Data:
##                  Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance                 0.577         0.182           1.794      0.921     0.377    0.644
## age                     25.816        28.030          -0.309      0.440     0.081    0.158
## educ                    10.346        10.235           0.055      0.496     0.035    0.111
## raceblack                0.843         0.203           1.762          .     0.640    0.640
## racehispan               0.059         0.142          -0.350          .     0.083    0.083
## racewhite                0.097         0.655          -1.882          .     0.558    0.558
## married                  0.189         0.513          -0.826          .     0.324    0.324
## nodegree                 0.708         0.597           0.245          .     0.111    0.111
## re74                  2095.574      5619.237          -0.721      0.518     0.225    0.447
## re75                  1532.055      2466.484          -0.290      0.956     0.134    0.288
## I(age^2)               717.395       901.779          -0.428      0.363     0.081    0.158
## I(re74 == 0)TRUE         0.708         0.261           0.983          .     0.447    0.447
## I(re75 == 0)TRUE         0.600         0.312           0.587          .     0.288    0.288
## educ:raceblack           8.697         2.047           1.580      0.980     0.354    0.645
## educ:racehispan          0.578         1.263          -0.294      0.487     0.046    0.078
## educ:racewhite           1.070         6.925          -1.767      0.365     0.279    0.555
## 
## Summary of Balance for Matched Data:
##                  Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance                 0.577         0.576           0.004      0.992     0.003    0.049           0.013
## age                     25.816        24.103           0.239      0.557     0.077    0.341           1.262
## educ                    10.346        10.378          -0.016      0.577     0.022    0.059           1.086
## raceblack                0.843         0.838           0.015          .     0.005    0.005           0.045
## racehispan               0.059         0.065          -0.023          .     0.005    0.005           0.297
## racewhite                0.097         0.097           0.000          .     0.000    0.000           0.054
## married                  0.189         0.130           0.152          .     0.059    0.059           0.511
## nodegree                 0.708         0.703           0.012          .     0.005    0.005           0.868
## re74                  2095.574      2336.463          -0.049      1.036     0.041    0.216           0.609
## re75                  1532.055      1503.929           0.009      2.129     0.068    0.238           0.650
## I(age^2)               717.395       670.946           0.108      0.510     0.077    0.341           1.196
## I(re74 == 0)TRUE         0.708         0.492           0.476          .     0.216    0.216           0.975
## I(re75 == 0)TRUE         0.600         0.362           0.485          .     0.238    0.238           1.037
## educ:raceblack           8.697         8.589           0.026      0.869     0.024    0.054           0.468
## educ:racehispan          0.578         0.638          -0.026      0.827     0.007    0.022           0.336
## educ:racewhite           1.070         1.151          -0.024      0.846     0.005    0.022           0.220
## 
## Sample Sizes:
##               Control Treated
## All            429.       185
## Matched (ESS)   46.31     185
## Matched         82.       185
## Unmatched      347.         0
## Discarded        0.         0

Let’s examine the output in detail. The first table (Summary of Balance for All Data) provides balance in the sample prior to matching. The included statistics are the mean of the covariates in the treated group (Means Treated), the mean of the covariate in the control group (Means Control), the SMDs (Std. Mean Diff.), the variance ratio (Var. Ratio), the average distance between the eCDFs of the covariate across the groups (eCDF Mean), and the largest distance between the eCDFs (eCDF Max). Setting un = FALSE would have suppressed the creation of this table.

The second table (Summary of Balance for Matched Data) contains all the same statistics in the matched sample. Because we implicitly requested pair distances (pair.dist defaults to TRUE), an additional column for standardized pair distances (Std. Pair Dist.) is displayed.

The final table (Sample Sizes) contains the sizes of the samples before (All) and after (Matched) matching, as well as the number of units left unmatched (Unmatched) and the number of units dropped due to a common support restriction (Discarded).
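The Matched (ESS) row reports an effective sample size (ESS), which reflects the loss in precision due to unequal matching weights. As a sketch of the idea (Kish's approximation, not necessarily MatchIt's exact internals), it could be computed from the matching weights as follows, assuming the m.out and lalonde objects from above:

```r
# Kish's effective sample size for the matched control group:
# ESS = (sum of weights)^2 / (sum of squared weights)
w <- m.out$weights[lalonde$treat == 0 & m.out$weights > 0]
sum(w)^2 / sum(w^2)
```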

The SMDs are computed as the mean difference divided by a standardization factor computed in the unmatched sample. An absolute SMD close to 0 indicates good balance; although a number of recommendations for acceptable values have appeared in the literature, we recommend absolute values less than .1 and less than .05 for potentially prognostically important variables.

The variance ratios are computed as the ratio of the variance of the treated group to that of the control group for each covariate. Variance ratios are not computed for binary covariates because they are a function of the prevalence in each group, which is captured in the mean difference and eCDF statistics. A variance ratio close to 1 indicates good balance; a commonly used recommendation is for variance ratios to be between .5 and 2.
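As a hand computation (an illustration, not MatchIt's internals), the unmatched SMD and variance ratio for age can be approximated as follows, assuming the standardization factor is the treated-group standard deviation, as is conventional when targeting the ATT:

```r
a_t <- lalonde$age[lalonde$treat == 1]
a_c <- lalonde$age[lalonde$treat == 0]

# Standardized mean difference, using the treated-group SD
(mean(a_t) - mean(a_c)) / sd(a_t)

# Variance ratio: treated-group variance over control-group variance
var(a_t) / var(a_c)
```

Compare these to the age row of the unmatched balance table above.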

The eCDF statistics correspond to the difference in the overall distributions of the covariates between the treatment groups. The values of both statistics range from 0 to 1, with values closer to zero indicating better balance. There are no specific recommendations for the values these statistics should take, though notably high values may indicate imbalance on higher moments of the covariates. The eQQ statistics produced when standardize = FALSE are interpreted similarly but are on the scale of the covariate.
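A rough sketch of how the two eCDF statistics for age could be computed before matching (the eCDF Max is a Kolmogorov-Smirnov-type statistic); this is illustrative and may differ slightly from MatchIt's exact computation:

```r
F_t <- ecdf(lalonde$age[lalonde$treat == 1])
F_c <- ecdf(lalonde$age[lalonde$treat == 0])
xs <- sort(unique(lalonde$age))
d  <- abs(F_t(xs) - F_c(xs))

mean(d)  # eCDF Mean
max(d)   # eCDF Max
```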

All these statistics should be considered together. Imbalance as measured by any of them may indicate a potential failure of the matching scheme to achieve distributional balance.

plot.summary.matchit()

A Love plot is a clean way to visually summarize balance. Using plot on the output of a call to summary() on a matchit object produces a Love plot of the standardized mean differences. plot.summary.matchit() has several additional arguments that can be used to customize the plot.

  • abs controls whether standardized mean differences should be displayed in absolute value or not. The default is TRUE.
  • var.order controls how the variables are ordered on the y-axis. The options are "data" (the default), which orders the variables as they appear in the summary.matchit() output; "unmatched", which orders the variables based on their standardized mean differences before matching; "matched", which orders the variables based on their standardized mean differences after matching; and "alphabetical", which orders the variables alphabetically. Using "unmatched" tends to result in attractive plots and ensures the legend doesn’t overlap with points in its default position.
  • threshold controls where vertical lines indicating chosen thresholds should appear on the x-axis. It should be a numeric vector. The default is c(.1, .05), which displays vertical lines at .1 and .05 standardized mean difference units.
  • position controls the position of the legend. The default is "bottomright", which puts the legend in the bottom right corner of the plot; any keyword value that can be supplied to x in legend() is allowed.

Below we create a Love plot of the covariates.

m.sum <- summary(m.out, addlvariables = ~ I(age^2) + I(re74==0) + 
                   I(re75==0) + educ:race)
plot(m.sum, var.order = "unmatched")

A love plot with most matched dots below the threshold lines, indicating good balance after matching, in contrast to the unmatched dots far from the threshold lines, indicating poor balance before matching.

From this plot it is clear that balance was quite poor prior to matching, but full matching improved balance on all covariates, bringing most within a threshold of .1. To make the variable names cleaner, the original variables should be renamed prior to matching. cobalt provides many additional options to generate and customize Love plots using the love.plot() function and should be used if a plot beyond what is available with plot.summary.matchit() is desired.

plot.matchit()

In addition to numeric summaries of balance, MatchIt offers graphical summaries as well using plot.matchit() (i.e., using plot() on a matchit object). We can create eQQ plots, eCDF plots, or density plots of the covariates and histograms or jitter plots of the propensity score. The covariate plots can provide a summary of the balance of the full marginal distribution of a covariate beyond just the mean and variance.

plot.matchit() has a few arguments to customize the output:

  • type corresponds to the type of plot desired. Options include "qq" for eQQ plots (the default), "ecdf" for eCDF plots, "density" for density plots (or bar plots for categorical variables), "jitter" for jitter plots, and "histogram" for histograms.
  • interactive controls whether the plot is interactive or not. For eQQ, eCDF, and density plots, this allows us to control when the next page of covariates is to be displayed since only three can appear at a time. For jitter plots, this can allow us to select individual units with extreme values for further inspection. The default is TRUE.
  • which.xs is used to specify for which covariates to display balance in eQQ, eCDF, and density plots. The default is to display balance on all, but we can request balance just on a specific subset. If three or fewer are requested, interactive is ignored. The argument can be supplied as a one-sided formula with the variables of interest on the right or a character vector containing the names of the desired variables. If any variables are not in the matchit object, a data argument can be supplied with a data set containing the named variables.

Below, we demonstrate the eQQ plot:

#eQQ plot
plot(m.out, type = "qq", which.xs = ~age + nodegree + re74)

eQQ plots of age, nodegree, and re74 in the unmatched and matched samples.

The y-axis displays each value of the covariate for the treated units, and the x-axis displays the value of the covariate at the corresponding quantile in the control group. When values fall on the 45-degree line, the groups are balanced. Above, we can see that age remains somewhat imbalanced, but nodegree and re74 have much better balance after matching than before. The difference between the x and y values of each point is used to compute the eQQ difference statistics that are displayed in summary.matchit() with standardize = FALSE.

Below, we demonstrate the eCDF plot:

#eCDF plot
plot(m.out, type = "ecdf", which.xs = ~educ + married + re75)

eCDF plots of educ, married, and re75 in the unmatched and matched samples.

The x-axis displays the covariate values and the y-axis displays the proportion of the sample at or less than that covariate value. Perfectly overlapping lines indicate good balance. The black line corresponds to the treated group and the gray line to the control group. Although educ and re75 were fairly well balanced before matching, their balance has improved nonetheless. married appears far better balanced after matching than before. The vertical difference between the eCDF lines of each treatment group is used to compute the eCDF difference statistics that are displayed in summary.matchit() with standardize = TRUE.

Below, we demonstrate the density plot:

#density plot
plot(m.out, type = "density", which.xs = ~age + educ + race)

Density plots of age, educ, and race in the unmatched and matched samples.

The x-axis displays the covariate values and the y-axis displays the density of the sample at that covariate value. For categorical variables, the y-axis displays the proportion of the sample at that covariate value. The black line corresponds to the treated group and the gray line to the control group. Perfectly overlapping lines indicate good balance. Density plots display similar information to eCDF plots but may be more intuitive for some users because of their link to histograms.

Assessing Balance After Subclassification

With subclassification, balance can be checked both within each subclass and overall. With summary.matchit(), we can request to view balance only in aggregate or in each subclass. The latter can help us decide if we can interpret effects estimated within each subclass as unbiased. The plot.summary.matchit() and plot.matchit() outputs can be requested either in aggregate or for each subclass. We demonstrate this below. First we will perform propensity score subclassification using 4 subclasses (typically more is beneficial).

#Subclassification on a logistic regression PS
s.out <- matchit(treat ~ age + educ + race + married + 
                   nodegree + re74 + re75, data = lalonde,
                 method = "subclass", subclass = 4)
s.out
## A `matchit` object
##  - method: Subclassification (4 subclasses)
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 614 (original), 614 (matched)
##  - target estimand: ATT
##  - covariates: age, educ, race, married, nodegree, re74, re75

When using summary(), the default is to display balance only in aggregate using the subclassification weights. This balance output looks similar to that for other matching methods.

summary(s.out)
## 
## Call:
## matchit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, method = "subclass", subclass = 4)
## 
## Summary of Balance for All Data:
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance           0.577         0.182           1.794      0.921     0.377    0.644
## age               25.816        28.030          -0.309      0.440     0.081    0.158
## educ              10.346        10.235           0.055      0.496     0.035    0.111
## raceblack          0.843         0.203           1.762          .     0.640    0.640
## racehispan         0.059         0.142          -0.350          .     0.083    0.083
## racewhite          0.097         0.655          -1.882          .     0.558    0.558
## married            0.189         0.513          -0.826          .     0.324    0.324
## nodegree           0.708         0.597           0.245          .     0.111    0.111
## re74            2095.574      5619.237          -0.721      0.518     0.225    0.447
## re75            1532.055      2466.484          -0.290      0.956     0.134    0.288
## 
## Summary of Balance Across Subclasses
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance           0.577         0.539           0.173      0.678     0.062    0.126
## age               25.816        24.975           0.118      0.465     0.085    0.297
## educ              10.346        10.433          -0.043      0.582     0.023    0.061
## raceblack          0.843         0.767           0.210          .     0.076    0.076
## racehispan         0.059         0.042           0.076          .     0.018    0.018
## racewhite          0.097         0.191          -0.318          .     0.094    0.094
## married            0.189         0.196          -0.017          .     0.007    0.007
## nodegree           0.708         0.657           0.113          .     0.051    0.051
## re74            2095.574      2557.709          -0.095      0.968     0.048    0.264
## re75            1532.055      1490.040           0.013      1.505     0.035    0.146
## 
## Sample Sizes:
##               Control Treated
## All             429.      185
## Matched (ESS)   102.3     185
## Matched         429.      185
## Unmatched         0.        0
## Discarded         0.        0

An additional option in summary(), subclass, allows us to request balance for individual subclasses. subclass can be set to TRUE to display balance for all subclasses or the indices of individual subclasses for which balance is to be displayed. Below we call summary() and request balance to be displayed on all subclasses (setting un = FALSE to suppress balance in the original sample):

summary(s.out, subclass = TRUE, un = FALSE)
## 
## Call:
## matchit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, method = "subclass", subclass = 4)
## 
## Summary of Balance by Subclass:
## 
## - Subclass 1
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance           0.239         0.095           0.879      2.494     0.313    0.508
## age               26.478        28.800          -0.345      0.394     0.090    0.160
## educ              10.304        10.214           0.040      0.620     0.025    0.084
## raceblack          0.370         0.063           0.635          .     0.307    0.307
## racehispan         0.239         0.167           0.169          .     0.072    0.072
## racewhite          0.391         0.770          -0.776          .     0.379    0.379
## married            0.370         0.589          -0.455          .     0.219    0.219
## nodegree           0.587         0.584           0.007          .     0.003    0.003
## re74            5430.539      6363.913          -0.118      1.298     0.087    0.284
## re75            2929.039      2699.399           0.054      1.587     0.047    0.144
## 
## - Subclass 2
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance           0.604         0.612          -0.214      0.905     0.083    0.195
## age               25.556        24.409           0.152      0.461     0.114    0.370
## educ               9.933         9.773           0.066      0.448     0.084    0.188
## raceblack          1.000         1.000           0.000          .     0.000    0.000
## racehispan         0.000         0.000           0.000          .     0.000    0.000
## racewhite          0.000         0.000           0.000          .     0.000    0.000
## married            0.378         0.091           0.592          .     0.287    0.287
## nodegree           0.667         0.500           0.354          .     0.167    0.167
## re74            1777.422      2516.589          -0.219      0.433     0.076    0.280
## re75             972.344      1131.077          -0.100      0.666     0.034    0.086
## 
## - Subclass 3
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance           0.693         0.691           0.130      1.281     0.055    0.189
## age               24.021        22.964           0.158      0.509     0.128    0.281
## educ              10.170        10.286          -0.069      1.040     0.038    0.099
## raceblack          1.000         1.000           0.000          .     0.000    0.000
## racehispan         0.000         0.000           0.000          .     0.000    0.000
## racewhite          0.000         0.000           0.000          .     0.000    0.000
## married            0.021         0.107          -0.595          .     0.086    0.086
## nodegree           0.681         0.750          -0.148          .     0.069    0.069
## re74             939.969       888.947           0.020      2.038     0.059    0.216
## re75            1217.455      1285.387          -0.018      1.535     0.038    0.188
## 
## - Subclass 4
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance           0.767         0.753           0.540      2.961     0.165    0.445
## age               27.213        23.786           0.461      0.521     0.150    0.459
## educ              10.957        11.429          -0.341      0.701     0.059    0.126
## raceblack          1.000         1.000           0.000          .     0.000    0.000
## racehispan         0.000         0.000           0.000          .     0.000    0.000
## racewhite          0.000         0.000           0.000          .     0.000    0.000
## married            0.000         0.000           0.000          .     0.000    0.000
## nodegree           0.894         0.786           0.350          .     0.108    0.108
## re74             291.783       540.618          -0.307      0.917     0.083    0.280
## re75            1015.289       854.751           0.079      3.523     0.112    0.266
## 
## Sample Sizes by Subclass:
##           1  2  3  4 All
## Control 365 22 28 14 429
## Treated  46 45 47 47 185
## Total   411 67 75 61 614

We can plot the standardized mean differences in a Love plot that also displays balance for the subclasses using plot.summary.matchit() on a summary.matchit() object with subclass = TRUE.

s <- summary(s.out, subclass = TRUE)
plot(s, var.order = "unmatched", abs = FALSE)

Love plot of balance before and after subclassification, with subclass IDs representing balance within each subclass in addition to dots representing balance overall.

Note that for some variables, while the groups are balanced in aggregate (black dots), the individual subclasses (gray numbers) may not be balanced, in which case unadjusted effect estimates within these subclasses should not be interpreted as unbiased.

When we plot distributional balance using plot.matchit(), we can again choose whether balance should be displayed in aggregate or within subclasses using the subclass option, which functions the same as it does with summary.matchit(). Below we demonstrate checking balance within a subclass.

plot(s.out, type = "density", which.xs = ~educ + married + re75,
     subclass = 1)

Density plots of educ, married, and re75 in the unmatched sample and in subclass 1.

If we had set subclass = FALSE, plots would have been displayed in aggregate using the subclassification weights. If subclass is unspecified, a prompt will ask us for which subclass we want to see balance.
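For example, the same plots can be requested in aggregate:

```r
# Aggregate density plots using the subclassification weights
plot(s.out, type = "density", which.xs = ~educ + married + re75,
     subclass = FALSE)
```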

Assessing Balance with cobalt

The cobalt package was designed specifically for checking balance before and after matching (and weighting). It offers three main functions, bal.tab(), love.plot(), and bal.plot(), which perform similar actions to summary.matchit(), plot.summary.matchit(), and plot.matchit(), respectively. These functions directly interface with matchit objects, making cobalt straightforward to use in conjunction with MatchIt. cobalt can be used as a complement to MatchIt, especially for more advanced uses that are not accommodated by MatchIt, such as comparing balance across different matching schemes and even different packages, assessing balance in clustered or multiply imputed data, and assessing balance with multi-category, continuous, and time-varying treatments. The main cobalt vignette (vignette("cobalt", package = "cobalt")) contains many examples of its use with MatchIt objects, so we only provide a short demonstration of its capabilities here.

library("cobalt")

bal.tab()

bal.tab() produces tables of balance statistics similar to summary.matchit(). The columns displayed can be customized to limit how much information is displayed and isolate desired information. We call bal.tab() with a few of its options specified below:

bal.tab(m.out, un = TRUE, stats = c("m", "v", "ks"))
## Balance Measures
##                 Type Diff.Un V.Ratio.Un KS.Un Diff.Adj V.Ratio.Adj KS.Adj
## distance    Distance   1.794      0.921 0.644    0.004       0.992  0.049
## age          Contin.  -0.309      0.440 0.158    0.239       0.557  0.341
## educ         Contin.   0.055      0.496 0.111   -0.016       0.577  0.059
## race_black    Binary   0.640          . 0.640    0.005           .  0.005
## race_hispan   Binary  -0.083          . 0.083   -0.005           .  0.005
## race_white    Binary  -0.558          . 0.558    0.000           .  0.000
## married       Binary  -0.324          . 0.324    0.059           .  0.059
## nodegree      Binary   0.111          . 0.111    0.005           .  0.005
## re74         Contin.  -0.721      0.518 0.447   -0.049       1.036  0.216
## re75         Contin.  -0.290      0.956 0.288    0.009       2.129  0.238
## 
## Sample sizes
##                      Control Treated
## All                   429.       185
## Matched (ESS)          46.31     185
## Matched (Unweighted)   82.       185
## Unmatched             347.         0

The output is very similar to that of summary.matchit(), except that the balance statistics computed before matching (with the suffix .Un) and those computed after matching (with the suffix .Adj) are in the same table. By default, only SMDs after matching (Diff.Adj) are displayed; by setting un = TRUE, we requested that the balance statistics before matching also be displayed, and by setting stats = c("m", "v", "ks") we requested mean differences, variance ratios, and Kolmogorov-Smirnov statistics. Other balance statistics and summary statistics can be requested as well. One important detail to note is that the default for binary covariates is to print the raw difference in proportion rather than the standardized mean difference, so there will be an apparent discrepancy for these variables between bal.tab() and summary.matchit() output, though this behavior can be changed by setting binary = "std" in the call to bal.tab(). Functionality for producing balance statistics for additional variables and for powers and interactions of the covariates is available using the addl, poly, and int options.
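For example, the following call (a sketch using the options just mentioned) standardizes the mean differences for binary covariates and adds squared terms and pairwise interactions to the balance table:

```r
# Standardized mean differences for binary covariates, plus
# squares (poly = 2) and interactions (int = TRUE) of the covariates
bal.tab(m.out, binary = "std", poly = 2, int = TRUE)
```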

bal.tab() and other cobalt functions can produce balance not just on a single matchit object but on several at the same time, which facilitates comparing balance across several matching specifications. For example, if we wanted to compare the full matching results to the results of nearest neighbor matching without replacement, we could supply both to bal.tab(), which we demonstrate below:

#Nearest neighbor (NN) matching on the PS
m.out2 <- matchit(treat ~ age + educ + race + married + 
                   nodegree + re74 + re75, data = lalonde)

#Balance on covariates after full and NN matching
bal.tab(treat ~ age + educ + race + married + 
          nodegree + re74 + re75, data = lalonde, 
        un = TRUE, weights = list(full = m.out, nn = m.out2))
## Balance Measures
##                Type Diff.Un Diff.full Diff.nn
## age         Contin.  -0.309     0.239   0.072
## educ        Contin.   0.055    -0.016  -0.129
## race_black   Binary   0.640     0.005   0.373
## race_hispan  Binary  -0.083    -0.005  -0.157
## race_white   Binary  -0.558     0.000  -0.216
## married      Binary  -0.324     0.059  -0.022
## nodegree     Binary   0.111     0.005   0.070
## re74        Contin.  -0.721    -0.049  -0.050
## re75        Contin.  -0.290     0.009  -0.026
## 
## Effective sample sizes
##      Control Treated
## All   429.       185
## full   46.31     185
## nn    185.       185

This time, we supplied bal.tab() with the covariates and dataset and supplied the matchit output objects in the weights argument (which extracts the matching weights from the objects). Here we can see that full matching yields better balance than nearest neighbor matching overall, though balance is slightly worse for age and married and the effective sample size is lower.

love.plot()

love.plot() creates a Love plot of chosen balance statistics. It offers many options for customization, including the shape and colors of the points, how the variable names are displayed, and for which statistics balance is to be displayed. Below is an example of its basic use:

love.plot(m.out, binary = "std")

Minimal love plot of balance before and after matching.

The syntax is straightforward and similar to that of bal.tab(). Below we demonstrate a more advanced use that customizes the appearance of the plot and displays balance not only on mean differences but also on Kolmogorov-Smirnov statistics and for both full matching and nearest neighbor matching simultaneously.

love.plot(m.out, stats = c("m", "ks"), poly = 2, abs = TRUE,
          weights = list(nn = m.out2),
          drop.distance = TRUE, thresholds = c(m = .1),
          var.order = "unadjusted", binary = "std",
          shapes = c("circle filled", "triangle", "square"), 
          colors = c("red", "blue", "darkgreen"),
          sample.names = c("Original", "Full Matching", "NN Matching"),
          position = "bottom")

A more elaborate love plot displaying some of cobalt's capabilities for making publication-ready plots.

The love.plot() documentation explains what each of these arguments do and the several other ones available. See vignette("love.plot", package = "cobalt") for other advanced customization of love.plot().

bal.plot()

bal.plot() displays distributional balance for a single covariate, similar to plot.matchit(). Its default is to display kernel density plots for continuous variables and bar graphs for categorical variables. It can also display eCDF plots and histograms. Below we demonstrate some of its uses:

#Density plot for continuous variables
bal.plot(m.out, var.name = "educ", which = "both")

Density plot for educ before and after matching.

#Bar graph for categorical variables
bal.plot(m.out, var.name = "race", which = "both")

Bar graph for race before and after matching.

#Mirrored histogram
bal.plot(m.out, var.name = "distance", which = "both",
         type = "histogram", mirror = TRUE)

Mirrored histograms of propensity scores before and after matching.

These plots help illuminate the specific ways in which the covariate distributions differ between treatment groups, which can aid in interpreting the balance statistics provided by bal.tab() and summary.matchit().

Conclusion

The goal of matching is to achieve covariate balance, that is, similarity between the covariate distributions of the treated and control groups. Balance should be assessed during the matching phase to find a matching specification that works. Balance must also be reported in the write-up of a matching analysis to demonstrate to readers that matching was successful. MatchIt and cobalt each offer a suite of functions to implement best practices in balance assessment and reporting.


  1. Note that versions of MatchIt before 4.0.0 had standardize set to FALSE by default.

---
title: "Matching Methods"
author: "Noah Greifer"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Matching Methods}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(width = 200)
```

## Introduction

`MatchIt` implements several matching methods with a variety of options. Though the help pages for the individual methods describe each method and how they can be used, this vignette provides a broad overview of the available matching methods and their associated options. The choice of matching method depends on the goals of the analysis (e.g., the estimand, whether low bias or high precision is important) and the unique qualities of each dataset to be analyzed, so there is no single optimal choice for any given analysis. A benefit of nonparametric preprocessing through matching is that a number of matching methods can be tried and their quality assessed without consulting the outcome, reducing the possibility of capitalizing on chance while allowing for the benefits of an exploratory analysis in the design phase [@ho2007].

This vignette describes each matching method available in `MatchIt`, the various options that are allowed with matching methods, and the consequences of their use. For a brief introduction to the use of `MatchIt` functions, see `vignette("MatchIt")`. For details on how to assess and report covariate balance, see `vignette("assessing-balance")`. For details on how to estimate treatment effects and standard errors after matching, see `vignette("estimating-effects")`.

## Matching

Matching as implemented in `MatchIt` is a form of *subset selection*, that is, the pruning and weighting of units to arrive at a (weighted) subset of the units from the original dataset.
Ideally, and if done successfully, subset selection produces a new sample in which the treatment is unassociated with the covariates, so that a comparison of the outcomes between the treatment and control groups is not confounded by the measured and balanced covariates. Although statistical estimation methods like regression can also be used to remove confounding due to measured covariates, @ho2007 argue that fitting regression models in matched samples reduces the dependence of the validity of the estimated treatment effect on the correct specification of the model. Matching is nonparametric in the sense that the estimated weights and pruning of the sample are not direct functions of estimated model parameters but rather depend on the organization of discrete units in the sample; this is in contrast to propensity score weighting (also known as inverse probability weighting), where the weights come more directly from the estimated propensity score model and therefore are more sensitive to its correct specification. These advantages, along with the fact that matching is more intuitive to the public than regression or weighting, make it a robust and effective way to estimate treatment effects.

It is important to note that this implementation of matching differs from the methods described by Abadie and Imbens [-@abadie2006; -@abadie2016] and implemented in the `Matching` R package and `teffects` routine in Stata. That form of matching is *matching imputation*, where the missing potential outcomes for each unit are imputed using the observed outcomes of paired units. This is a critical distinction because matching imputation is a specific estimation method with its own effect and standard error estimators, in contrast to subset selection, which is a preprocessing method that does not require specific estimators and is broadly compatible with other parametric and nonparametric analyses.

The benefits of matching imputation are that its theoretical properties (i.e., the rate of convergence and asymptotic variance of the estimator) are well understood, it can be used in a straightforward way to estimate not just the average treatment effect in the treated (ATT) but also the average treatment effect in the population (ATE), and additional effective matching methods can be used in the imputation (e.g., kernel matching). The benefits of matching as nonparametric preprocessing are that it is far more flexible with respect to the types of effects that can be estimated because it does not involve any specific estimator, its empirical and finite-sample performance has been examined in depth and is generally well understood, and it aligns well with the design of experiments, which are more familiar to non-technical audiences.

In addition to subset selection, matching often (though not always) involves a form of *stratification*, the assignment of units to pairs or strata containing multiple units. The distinction between subset selection and stratification is described by @zubizarretaMatchingBalancePairing2014, who separate them into two separate steps. In `MatchIt`, with almost all matching methods, subset selection is performed by stratification; for example, treated units are paired with control units, and unpaired units are then dropped from the matched sample. With some methods, subclasses are used to assign matching or stratification weights to individual units, which increase or decrease each unit's leverage in a subsequent analysis. There has been some debate about the importance of stratification after subset selection; while some authors have argued that, with some forms of matching, pair membership is incidental [@stuart2008; @schafer2008], others have argued that correctly incorporating pair membership into effect estimation can improve the quality of inferences [@austin2014a; @wan2019].
For methods that allow it, `MatchIt` includes stratum membership as an additional output of each matching specification. How these strata can be used is detailed in `vignette("estimating-effects")`.

At the heart of `MatchIt` are three classes of methods: distance matching, stratum matching, and pure subset selection.

*Distance matching* involves considering a focal group (usually the treated group) and selecting members of the non-focal group (i.e., the control group) to pair with each member of the focal group based on the *distance* between units, which can be computed in one of several ways. Members of either group that are not paired are dropped from the sample. Nearest neighbor matching (`method = "nearest"`), optimal pair matching (`method = "optimal"`), optimal full matching (`method = "full"`), generalized full matching (`method = "quick"`), and genetic matching (`method = "genetic"`) are the methods of distance matching implemented in `MatchIt`. Typically, only the average treatment effect in the treated (ATT) or the average treatment effect in the control (ATC), if the control group is the focal group, can be estimated after distance matching in `MatchIt` (full matching is an exception, described later).

*Stratum matching* involves creating strata based on unique values of the covariates and assigning units with those covariate values into those strata. Any units that are in strata that lack either treated or control units are then dropped from the sample. Strata can be formed using the raw covariates (`method = "exact"`), coarsened versions of the covariates (`method = "cem"`), or coarsened versions of the propensity score (`method = "subclass"`). When no units are discarded, either the ATT, ATC, or ATE can be estimated after stratum matching, though often some units are discarded, especially with exact and coarsened exact matching, making the estimand less clear.
For use in estimating marginal treatment effects after exact matching, stratification weights are computed for the matched units, first by computing a new "stratum propensity score" for each unit, which is the proportion of treated units in its stratum. The formulas for computing inverse probability weights from standard propensity scores are then applied to the new stratum propensity scores to form the new weights.

*Pure subset selection* involves selecting a subset of units from the original sample without considering the distance between individual units or strata that units might fall into. Subsets are selected to optimize a criterion subject to constraints on balance and remaining sample size. Cardinality and profile matching (`method = "cardinality"`) are the methods of pure subset selection implemented in `MatchIt`. Both methods allow the user to specify the largest imbalance allowed in the resulting matched sample, and an optimization routine attempts to find the largest matched sample that satisfies those balance constraints. While cardinality matching does not target a specific estimand, profile matching can be used to target the ATT, ATC, or ATE.

Below, we describe each of the matching methods implemented in `MatchIt`.

## Matching Methods

### Nearest Neighbor Matching (`method = "nearest"`)

Nearest neighbor matching is also known as greedy matching. It involves running through the list of treated units and selecting the closest eligible control unit to be paired with each treated unit. It is greedy in the sense that each pairing occurs without reference to how other units will be or have been paired, and therefore it does not aim to optimize any criterion. Nearest neighbor matching is the most common form of matching used [@thoemmes2011; @zakrison2018] and has been extensively studied through simulations. See `?method_nearest` for the documentation for `matchit()` with `method = "nearest"`.

Nearest neighbor matching requires the specification of a distance measure to define which control unit is closest to each treated unit. The default and most common distance is the *propensity score difference*, which is the difference between the propensity scores of each treated and control unit [@stuart2010]. Another popular distance is the Mahalanobis distance, described in the section "Mahalanobis distance matching" below.

The order in which the treated units are to be paired must also be specified and has the potential to change the quality of the matches [@austin2013b; @rubin1973]; this is specified by the `m.order` argument. With propensity score matching, the default is to go in descending order from the highest propensity score; doing so allows the units that would have the hardest time finding close matches to be matched first [@rubin1973]. Other orderings are possible, including random ordering, which can be tried multiple times until an adequate matched sample is found. When matching with replacement (i.e., where each control unit can be reused to be matched with any number of treated units), the matching order doesn't matter. When using a matching ratio greater than 1 (i.e., when more than one control unit is requested to be matched to each treated unit), matching occurs in a cycle, where each treated unit is first paired with one control unit, and then each treated unit is paired with a second control unit, etc. Ties are broken deterministically based on the order of the units in the dataset to ensure that multiple runs of the same specification yield the same result (unless the matching order is requested to be random).

Nearest neighbor matching is implemented in `MatchIt` using internal C++ code through `Rcpp`. When matching on a propensity score, this makes matching extremely fast, even for large datasets. Using a caliper on the propensity score (described below) makes it even faster.

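As a brief sketch (not evaluated here; the covariate choices from the included `lalonde` data are illustrative, not a recommendation), a nearest neighbor propensity score match using the options described above might look like:

```{r, eval = FALSE}
library("MatchIt")
data("lalonde", package = "MatchIt")

# 1:1 nearest neighbor matching on a logistic regression propensity score,
# matching in descending order of the propensity score
m.near <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "nearest",
                  distance = "glm", m.order = "largest")
```
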
Run times may be a bit longer when matching on other distance measures (e.g., the Mahalanobis distance). In contrast to optimal pair matching (described below), nearest neighbor matching does not require computing the full distance matrix between units, which makes it more applicable to large datasets.

### Optimal Pair Matching (`method = "optimal"`)

Optimal pair matching (often just called optimal matching) is very similar to nearest neighbor matching in that it attempts to pair each treated unit with one or more control units. Unlike nearest neighbor matching, however, it is "optimal" rather than greedy; it is optimal in the sense that it attempts to choose matches that collectively optimize an overall criterion [@hansen2006; @gu1993]. The criterion used is the sum of the absolute pair distances in the matched sample. See `?method_optimal` for the documentation for `matchit()` with `method = "optimal"`. Optimal pair matching in `MatchIt` depends on the `fullmatch()` function in the `optmatch` package [@hansen2006].

Like nearest neighbor matching, optimal pair matching requires the specification of a distance measure between units. Optimal pair matching can be thought of simply as an alternative to selecting the order of the matching for nearest neighbor matching. Optimal pair matching and nearest neighbor matching often yield the same or very similar matched samples; indeed, some research has indicated that optimal pair matching is not much better than nearest neighbor matching at yielding balanced matched samples [@austin2013b].

The `tol` argument in `fullmatch()` can be supplied to `matchit()` with `method = "optimal"`; this controls the numerical tolerance used to determine whether the optimal solution has been found. The default is fairly high and, for smaller problems, should be set much lower (e.g., by setting `tol = 1e-7`).

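A minimal sketch of such a call (covariate choices illustrative; `optmatch` must be installed):

```{r, eval = FALSE}
library("MatchIt")
data("lalonde", package = "MatchIt")

# Optimal pair matching on the propensity score with a tighter
# numerical tolerance, as suggested for smaller problems
m.opt <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "optimal", tol = 1e-7)
```
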
### Optimal Full Matching (`method = "full"`)

Optimal full matching (often just called full matching) assigns every treated and control unit in the sample to one subclass each [@hansen2004; @stuart2008a]. Each subclass contains one treated unit and one or more control units or one control unit and one or more treated units. It is optimal in the sense that the chosen number of subclasses and the assignment of units to subclasses minimize the sum of the absolute within-subclass distances in the matched sample. Weights are computed based on subclass membership, and these weights then function like propensity score weights and can be used to estimate a weighted treatment effect, ideally free of confounding by the measured covariates. See `?method_full` for the documentation for `matchit()` with `method = "full"`. Optimal full matching in `MatchIt` depends on the `fullmatch()` function in the `optmatch` package [@hansen2006].

Like the other distance matching methods, optimal full matching requires the specification of a distance measure between units. It can be seen as a combination of distance matching and stratum matching: subclasses are formed with varying numbers of treated and control units, as with stratum matching, but the subclasses are formed based on minimizing within-pair distances and do not involve forming strata based on any specific variable, similar to distance matching. Unlike other distance matching methods, full matching can be used to estimate the ATE. Full matching can also be seen as a form of propensity score weighting that is less sensitive to the form of the propensity score model because the original propensity scores are used just to create the subclasses, not to form the weights directly [@austin2015a]. In addition, full matching does not have to rely on estimated propensity scores to form the subclasses and weights; other distance measures are allowed as well.

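As a sketch (covariates chosen arbitrarily from `lalonde`; `optmatch` must be installed), full matching targeting the ATE might be requested as:

```{r, eval = FALSE}
library("MatchIt")
data("lalonde", package = "MatchIt")

# Optimal full matching on the propensity score, targeting the ATE
m.full <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "full", estimand = "ATE")
```
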
Although full matching uses all available units, there is a loss in precision due to the weights. Units may be weighted in such a way that they contribute less to the sample than would unweighted units, so the effective sample size (ESS) of the full matching weighted sample may be lower than even that of 1:1 pair matching. Balance is often far better after full matching than it is with 1:k matching, making full matching a good option to consider, especially when 1:k matching is not effective or when the ATE is the target estimand.

The specification of the full matching optimization problem can be customized by supplying additional arguments that are passed to `optmatch::fullmatch()`, such as `min.controls`, `max.controls`, `mean.controls`, and `omit.fraction`. As with optimal pair matching, the numerical tolerance value can be set much lower than the default with small problems by setting, e.g., `tol = 1e-7`.

### Generalized Full Matching (`method = "quick"`)

Generalized full matching is a variant of full matching that uses a special fast clustering algorithm to dramatically speed up the matching, even for large datasets [@savjeGeneralizedFullMatching2021]. Like optimal full matching, generalized full matching assigns every unit to a subclass. What makes generalized full matching "generalized" is that the user can customize the matching in a number of ways, such as by specifying an arbitrary minimum number of units from each treatment group or total number of units per subclass, or by allowing not all units from a treatment group to have to be matched. Generalized full matching minimizes the largest within-subclass distances in the matched sample, but it does so in a way that is not completely optimal (though the solution is often very close to the optimal solution).

Matching weights are computed based on subclass membership, and these weights then function like propensity score weights and can be used to estimate a weighted treatment effect, ideally free of confounding by the measured covariates. See `?method_quick` for the documentation for `matchit()` with `method = "quick"`. Generalized full matching in `MatchIt` depends on the `quickmatch()` function in the `quickmatch` package [@savjeQuickmatchQuickGeneralized2018].

Generalized full matching includes different options for customization than optimal full matching. The user cannot supply their own distance matrix, but propensity scores and distance metrics that are computed from the supplied covariates (e.g., Mahalanobis distance) are allowed. Calipers can only be placed on the propensity score, if supplied. As with optimal full matching, generalized full matching can target the ATE. Matching performance tends to be similar between the two methods, but generalized full matching will be much quicker and can accommodate larger datasets, making it a good substitute. Generalized full matching is often faster than even nearest neighbor matching, especially for large datasets.

### Genetic Matching (`method = "genetic"`)

Genetic matching is less a specific form of matching and more a way of specifying a distance measure for another form of matching. In practice, though, the form of matching used is nearest neighbor pair matching. Genetic matching uses a genetic algorithm, which is an optimization routine used for non-differentiable objective functions, to find scaling factors for each variable in a generalized Mahalanobis distance formula [@diamond2013]. The criterion optimized by the algorithm is one based on covariate balance. Once the scaling factors have been found, nearest neighbor matching is performed on the scaled generalized Mahalanobis distance. See `?method_genetic` for the documentation for `matchit()` with `method = "genetic"`.

Genetic matching in `MatchIt` depends on the `GenMatch()` function in the `Matching` package [@sekhon2011] to perform the genetic search and uses the `Match()` function to perform the nearest neighbor match using the scaled generalized Mahalanobis distance. Genetic matching considers the generalized Mahalanobis distance between a treated unit $i$ and a control unit $j$ as

$$\delta_{GMD}(\mathbf{x}_i,\mathbf{x}_j, \mathbf{W})=\sqrt{(\mathbf{x}_i - \mathbf{x}_j)'(\mathbf{S}^{-1/2})'\mathbf{W}(\mathbf{S}^{-1/2})(\mathbf{x}_i - \mathbf{x}_j)}$$

where $\mathbf{x}$ is a $p \times 1$ vector containing the value of each of the $p$ included covariates for that unit, $\mathbf{S}^{-1/2}$ is the Cholesky decomposition of the covariance matrix $\mathbf{S}$ of the covariates, and $\mathbf{W}$ is a diagonal matrix with scaling factors $w$ on the diagonal:

$$ \mathbf{W}=\begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots &\\ & & & w_p \\ \end{bmatrix} $$

When $w_k=1$ for all covariates $k$, the computed distance is the standard Mahalanobis distance between units. Genetic matching estimates the optimal values of the $w_k$s, where a user-specified criterion is used to define what is optimal. The default is to maximize the smallest p-value among balance tests for the covariates in the matched sample (both Kolmogorov-Smirnov tests and t-tests for each covariate). In `MatchIt`, if a propensity score is specified, the default is to include the propensity score and the covariates in $\mathbf{x}$ and to optimize balance on the covariates. When `distance = "mahalanobis"` or the `mahvars` argument is specified, the propensity score is left out of $\mathbf{x}$. In all other respects, genetic matching functions just like nearest neighbor matching, except that the matching itself is carried out by `Matching::Match()` instead of by `MatchIt`.

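A sketch of such a call (covariate choices illustrative; `Matching` and its dependencies must be installed, and the value of `pop.size` here is an arbitrary example of increasing it from its default):

```{r, eval = FALSE}
library("MatchIt")
data("lalonde", package = "MatchIt")

# Genetic matching; pop.size is passed through to Matching::GenMatch()
m.gen <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "genetic", pop.size = 1000)
```
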
When using `method = "genetic"` in `MatchIt`, additional arguments passed to `Matching::GenMatch()` to control the genetic search process should be specified; in particular, the `pop.size` argument should be increased from its default of 100 to a much higher value. Doing so will make the algorithm take more time to finish but will generally improve the quality of the resulting matches. Different functions can be supplied to be used as the objective in the optimization using the `fit.func` argument.

### Exact Matching (`method = "exact"`)

Exact matching is a form of stratum matching that involves creating subclasses based on unique combinations of covariate values and assigning each unit into its corresponding subclass so that only units with identical covariate values are placed into the same subclass. Any units that are in subclasses lacking either treated or control units will be dropped. Exact matching is the most powerful matching method in that no functional form assumptions are required on either the treatment or outcome model for the method to remove confounding due to the measured covariates; the covariate distributions are exactly balanced. The problem with exact matching is that, in general, few if any units will remain after matching, so the estimated effect will only generalize to a very limited population and can lack precision. Exact matching is particularly ineffective with continuous covariates, for which it might be that no two units have the same value, and with many covariates, for which it might be the case that no two units have the same combination of all covariates; this latter problem is known as the "curse of dimensionality". See `?method_exact` for the documentation for `matchit()` with `method = "exact"`.

It is possible to use exact matching on some covariates and another form of matching on the rest.
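As a sketch (covariate choices from `lalonde` are illustrative), both uses might look like:

```{r, eval = FALSE}
library("MatchIt")
data("lalonde", package = "MatchIt")

# Exact matching on all supplied (categorical) covariates
m.ex <- matchit(treat ~ race + married + nodegree, data = lalonde,
                method = "exact")

# Nearest neighbor matching on the propensity score,
# matching exactly on race via the exact argument
m.nex <- matchit(treat ~ age + educ + race + re74 + re75, data = lalonde,
                 method = "nearest", exact = ~ race)
```
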
Combining methods in this way makes it possible to have exact balance on some covariates (typically categorical) and approximate balance on others, thereby gaining the benefits of both exact matching and the other matching method used. To do so, the other matching method should be specified in the `method` argument to `matchit()` and the `exact` argument should be specified to contain the variables on which exact matching is to be done.

### Coarsened Exact Matching (`method = "cem"`)

Coarsened exact matching (CEM) is a form of stratum matching that involves first coarsening the covariates by creating bins and then performing exact matching on the new coarsened versions of the covariates [@iacus2012]. The degree and method of coarsening can be controlled by the user to manage the trade-off between exact and approximate balancing. For example, coarsening a covariate to two bins will mean that units that differ greatly on the covariate might be placed into the same subclass, while coarsening a covariate to five bins may require units to be dropped due to not finding matches. Like exact matching, CEM is susceptible to the curse of dimensionality, making it a less viable solution with many covariates, especially with few units. Dropping units can also change the target population of the estimated effect. See `?method_cem` for the documentation for `matchit()` with `method = "cem"`. CEM in `MatchIt` does not depend on any other package to perform the coarsening and matching, though it used to rely on the `cem` package.

### Subclassification (`method = "subclass"`)

Propensity score subclassification can be thought of as a form of coarsened exact matching with the propensity score as the sole covariate to be coarsened and matched on. The bins are usually based on specified quantiles of the propensity score distribution either in the treated group, control group, or overall, depending on the desired estimand.

Propensity score subclassification is an old and well-studied method, though it can perform poorly compared to other, more modern propensity score methods such as full matching and weighting [@austin2010]. See `?method_subclass` for the documentation for `matchit()` with `method = "subclass"`.

The binning of the propensity scores is typically based on dividing their distribution into approximately equally sized bins. The user specifies the number of subclasses using the `subclass` argument and which group should be used to compute the boundaries of the bins using the `estimand` argument. Sometimes, subclasses can end up with no units from one of the treatment groups; by default, `matchit()` moves a unit from an adjacent subclass into the lacking one to ensure that each subclass has at least one unit from each treatment group. The minimum number of units required in each subclass can be chosen with the `min.n` argument to `matchit()`. If set to 0, an error will be thrown if any subclass lacks units from one of the treatment groups. Moving units from one subclass to another generally worsens the balance in the subclasses but can increase precision.

The default number of subclasses is 6, which is arbitrary and should not be taken as a recommended value. Although early theory recommended the use of 5 subclasses, in general there is an optimal number of subclasses that is typically much larger than 5 but that varies among datasets [@orihara2021]. Rather than trying to figure this out for oneself, one can use optimal full matching (i.e., with `method = "full"`) or generalized full matching (`method = "quick"`) to optimally create subclasses that optimize a within-subclass distance criterion.

The output of propensity score subclassification includes the assigned subclasses and the subclassification weights.

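A minimal sketch (covariates illustrative; the number of subclasses here simply echoes the default rather than endorsing it):

```{r, eval = FALSE}
library("MatchIt")
data("lalonde", package = "MatchIt")

# Propensity score subclassification with 6 subclasses for the ATT
m.sub <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "subclass",
                 subclass = 6, estimand = "ATT")
```
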
Effects can be estimated either within each subclass and then averaged across them, or a single marginal effect can be estimated using the subclassification weights. This latter method has been called marginal mean weighting through subclassification [MMWS; @hong2010] and fine stratification weighting [@desai2017]. It is also implemented in the `WeightIt` package.

### Cardinality and Profile Matching (`method = "cardinality"`)

Cardinality and profile matching are pure subset selection methods that involve selecting a subset of the original sample without considering the distance between individual units or assigning units to pairs or subclasses. They can be thought of as a weighting method where the weights are restricted to be zero or one. Cardinality matching involves finding the largest sample that satisfies user-supplied balance constraints and constraints on the ratio of matched treated to matched control units [@zubizarretaMatchingBalancePairing2014]. It does not consider a specific estimand and can be a useful alternative to matching with a caliper for handling data with little overlap [@visconti2018]. Profile matching involves identifying a target distribution (e.g., the full sample for the ATE or the treated units for the ATT) and finding the largest subset of the treated and control groups that satisfies user-supplied balance constraints with respect to that target [@cohnProfileMatchingGeneralization2021]. See `?method_cardinality` for the documentation for using `matchit()` with `method = "cardinality"`, including which inputs are required to request either cardinality matching or profile matching.

Subset selection is performed by solving a mixed integer programming optimization problem with linear constraints. The problem involves maximizing the size of the matched sample subject to constraints on balance and sample size.
For cardinality matching, the balance constraints refer to the mean difference for each covariate between the matched treated and control groups, and the sample size constraints require the matched treated and control groups to be the same size (or to differ by a user-supplied factor). For profile matching, the balance constraints refer to the mean difference for each covariate between each treatment group and the target distribution; for the ATE, this requires the mean of each covariate in each treatment group to be within a given tolerance of the mean of the covariate in the full sample, and for the ATT, this requires the mean of each covariate in the control group to be within a given tolerance of the mean of the covariate in the treated group, which is left intact. The balance tolerances are controlled by the `tols` and `std.tols` arguments. One can also create pairs in the matched sample by using the `mahvars` argument, which requests that optimal Mahalanobis matching be done after subset selection; doing so can add additional precision and robustness [@zubizarretaMatchingBalancePairing2014].

The optimization problem requires a special solver. Currently, the available options in `MatchIt` are the HiGHS solver (through the `highs` package), the GLPK solver (through the `Rglpk` package), the SYMPHONY solver (through the `Rsymphony` package), and the Gurobi solver (through the `gurobi` package). The differences among the solvers are in performance; Gurobi is by far the best (fastest, least likely to fail to find a solution), but it is proprietary (though it has a free trial and an academic license) and is a bit more complicated to install. HiGHS is the default due to being open source, easily installed, and comparable in performance to Gurobi. The `designmatch` package also provides an implementation of cardinality matching with more options than `MatchIt` offers.

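A sketch of a cardinality matching call (covariates and the tolerance of .05 are illustrative; the `highs` package must be installed):

```{r, eval = FALSE}
library("MatchIt")
data("lalonde", package = "MatchIt")

# Cardinality matching with standardized mean differences constrained
# to be no greater than .05 for each covariate, using the HiGHS solver
m.card <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "cardinality",
                  tols = .05, std.tols = TRUE, solver = "highs")
```
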
## Customizing the Matching Specification

In addition to the specific matching method, other options are available for many of the matching methods to further customize the matching specification. These include different specifications of the distance measure, methods to perform alternate forms of matching in addition to the main method, ways to prune units far from other units prior to matching, and restrictions on possible matches. Not all options are compatible with all matching methods.

### Specifying the propensity score or other distance measure (`distance`)

The distance measure is used to define how close two units are. In nearest neighbor matching, it is used to choose the nearest control unit to each treated unit. In optimal matching, it is used in the criterion that is optimized. By default, the distance measure is the propensity score difference, and the argument supplied to `distance` corresponds to the method of estimating the propensity score. In `MatchIt`, propensity scores are often labeled as "distance" values, even though the propensity score itself is not a distance measure. This is to reflect that the propensity score is used in creating the distance value, but other scores could be used, such as prognostic scores for prognostic score matching [@hansen2008a]. The propensity score is more like a "position" value, in that it reflects the position of each unit in the matching space, and the difference between positions is the distance between them.

If the argument to `distance` is one of the allowed methods for estimating propensity scores (see `?distance` for these values) or is a numeric vector with one value per unit, the distance between units will be computed as the pairwise difference between propensity scores or the supplied values. Propensity scores are also used in propensity score subclassification and can optionally be used in genetic matching as a component of the generalized Mahalanobis distance.
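For concreteness, a minimal sketch using the `lalonde` dataset bundled with `MatchIt` (the covariates are illustrative): the propensity score can be estimated internally by `matchit()` or supplied directly as a numeric vector.

```{r}
# Nearest neighbor matching on a propensity score estimated internally
# with logistic regression (distance = "glm", the default)
m.ps <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                data = lalonde,
                method = "nearest",
                distance = "glm")

# Equivalently, supplying fitted propensity scores as a numeric vector
ps <- glm(treat ~ age + educ + race + married + nodegree + re74 + re75,
          data = lalonde, family = binomial)$fitted.values
m.ps2 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest", distance = ps)
```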
For exact, coarsened exact, and cardinality matching, the `distance` argument is ignored.

The default `distance` argument is `"glm"`, which estimates propensity scores using logistic regression or another generalized linear model. The `link` and `distance.options` arguments can be supplied to further specify the options for the propensity score model, including whether to use the raw propensity score or a linearized version of it (e.g., the logit of a logistic regression propensity score, which is commonly recommended in the propensity score literature [@austin2011a; @stuart2010]). Allowable options for the propensity score model include parametric and machine learning-based models, each of which has its own strengths and limitations and may perform differently depending on the unique qualities of each dataset. We recommend trying multiple types of models to find one that yields the best balance, as there is no way to make a single recommendation that will work for all cases.

The `distance` argument can also be specified as a method of computing pairwise distances from the covariates directly (i.e., without estimating propensity scores). The options include `"mahalanobis"`, `"robust_mahalanobis"`, `"euclidean"`, and `"scaled_euclidean"`. These methods compute the distance between a treated unit $i$ and a control unit $j$ as $$\delta(\mathbf{x}_i,\mathbf{x}_j)=\sqrt{(\mathbf{x}_i - \mathbf{x}_j)'S^{-1}(\mathbf{x}_i - \mathbf{x}_j)}$$ where $\mathbf{x}$ is a $p \times 1$ vector containing the value of each of the $p$ included covariates for that unit, $S$ is a scaling matrix, and $S^{-1}$ is the (generalized) inverse of $S$.
For Mahalanobis distance matching, $S$ is the pooled covariance matrix of the covariates [@rubinBiasReductionUsing1980]; for Euclidean distance matching, $S$ is the identity matrix (i.e., no scaling); and for scaled Euclidean distance matching, $S$ is the diagonal of the pooled covariance matrix (containing just the variances). The robust Mahalanobis distance is computed not on the covariates directly but rather on their ranks and uses a correction for ties (see @rosenbaumDesignObservationalStudies2010, ch 8).

For creating close pairs, matching with these distance measures tends to work better than propensity score matching because paired units will have close values on all of the covariates, whereas propensity score-paired units may be close on the propensity score but not on any of the covariates themselves. This feature was the basis of King and Nielsen's [-@king2019] warning against using propensity scores for matching. That said, they do not always outperform propensity score matching [@ripolloneImplicationsPropensityScore2018].

`distance` can also be supplied as a matrix of distance values between units. This makes it possible to use handcrafted distance matrices or distances created outside `MatchIt`. Only nearest neighbor, optimal pair, and optimal full matching allow this specification.

The propensity score can have uses other than as the basis for matching. It can be used to define a region of common support, outside which units are dropped prior to matching; this is implemented by the `discard` option. It can also be used to define a caliper, the maximum distance two units can be before they are prohibited from being paired with each other; this is implemented by the `caliper` argument. To estimate or supply a propensity score for one of these purposes but not use it as the distance measure for matching (i.e., to perform Mahalanobis distance matching instead), the `mahvars` argument can be specified. These options are described below.
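As a sketch of supplying a distance matrix (using the `lalonde` dataset and `MatchIt`'s `mahalanobis_dist()` helper; the covariates are illustrative), a distance matrix can be computed outside `matchit()` and passed through `distance`:

```{r}
# Computing a treated-by-control Mahalanobis distance matrix and
# supplying it directly through the distance argument
dist.mat <- mahalanobis_dist(treat ~ age + educ + re74 + re75,
                             data = lalonde)
m.dist <- matchit(treat ~ age + educ + re74 + re75,
                  data = lalonde,
                  method = "nearest",
                  distance = dist.mat)
```

The same matrix could be modified by hand (e.g., to forbid certain pairings by setting entries to `Inf`) before being supplied.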
### Implementing common support restrictions (`discard`)

The region of *common support* is the region of overlap between treatment groups. A common support restriction discards units that fall outside of the region of common support, preventing them from being matched to other units and included in the matched sample. This can reduce the potential for extrapolation and help the matching algorithms avoid overly distant matches. In `MatchIt`, the `discard` option implements a common support restriction based on the propensity score. The argument can be supplied as `"treated"`, `"control"`, or `"both"`, which discards units in the corresponding group that fall outside the region of common support for the propensity score. The `reestimate` argument can be supplied to choose whether to re-estimate the propensity score in the remaining units. **If units from the treated group are discarded based on a common support restriction, the estimand no longer corresponds to the ATT.**

### Caliper matching (`caliper`)

A *caliper* can be thought of as a ring around each unit that limits which other units that unit can be paired with. Calipers are based on the propensity score or other covariates. Two units whose distance on a calipered covariate is larger than the caliper width for that covariate are not allowed to be matched to each other. Any units for which there are no available matches within the caliper are dropped from the matched sample. Calipers ensure paired units are close to each other on the calipered covariates, which can ensure good balance in the matched sample. Multiple variables can be supplied to `caliper` to enforce calipers on all of them simultaneously. Using calipers can be a good alternative to exact or coarsened exact matching to ensure only similar units are paired with each other. The `std.caliper` argument controls whether the provided calipers are in raw units or standard deviation units.
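A minimal sketch combining these two options (using the `lalonde` dataset; the caliper widths are illustrative, not recommendations):

```{r}
# Nearest neighbor matching with a common support restriction and calipers:
# a .2-SD caliper on the propensity score and a 5-year caliper on age
m.cal <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 distance = "glm",
                 discard = "both",              # drop units outside common support
                 caliper = c(.2, age = 5),      # unnamed value applies to the PS
                 std.caliper = c(TRUE, FALSE))  # SD units for the PS; years for age
```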
Supplying a *negative* caliper has the opposite effect: units whose distance on the calipered covariate is *smaller* than the absolute caliper width for that covariate are disallowed from being matched to each other. **If units from the treated group are left unmatched due to a caliper, the estimand no longer corresponds to the ATT.**

### Mahalanobis distance matching (`mahvars`)

To perform Mahalanobis distance matching without the need to estimate or use a propensity score, the `distance` argument can be set to `"mahalanobis"`. If a propensity score is to be estimated or used for a different purpose, such as in a common support restriction or a caliper, but you still want to perform Mahalanobis distance matching, variables should be supplied to the `mahvars` argument. The propensity scores will be generated using the `distance` specification, and matching will occur not on the covariates supplied to the main formula of `matchit()` but rather on the covariates supplied to `mahvars`. To perform Mahalanobis distance matching within a propensity score caliper, for example, the `distance` argument should be set to the method of estimating the propensity score (e.g., `"glm"` for logistic regression), the `caliper` argument should be set to the desired caliper width, and `mahvars` should specify the covariates on which to perform Mahalanobis distance matching within the caliper. `mahvars` has a special meaning for genetic matching and cardinality matching; see their respective help pages for details.

### Exact matching (`exact`)

To perform exact matching on all supplied covariates, the `method` argument can be set to `"exact"`. To perform exact matching only on some covariates and some other form of matching within exact matching strata on other covariates, the `exact` argument can be used.
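A minimal sketch of the `exact` argument (using the `lalonde` dataset; the choice of exact matching variables is illustrative):

```{r}
# Nearest neighbor matching on the propensity score, performed separately
# within exact matching strata defined by race and married
m.ex <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                data = lalonde,
                method = "nearest",
                exact = ~ race + married)
```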
Covariates supplied to the `exact` argument will be matched exactly, and the form of matching specified by `method` (e.g., `"nearest"` for nearest neighbor matching) will take place within each exact matching stratum. This can be a good way to gain some of the benefits of exact matching without completely succumbing to the curse of dimensionality. As with exact matching performed with `method = "exact"`, any units in strata lacking members of one of the treatment groups will be left unmatched. Note that although matching occurs within each exact matching stratum, propensity score estimation and computation of the Mahalanobis or other distance matrix occur in the full sample. **If units from the treated group are unmatched due to an exact matching restriction, the estimand no longer corresponds to the ATT.**

### Anti-exact matching (`antiexact`)

Anti-exact matching adds a restriction such that a treated and control unit with the same values of any of the specified anti-exact matching variables cannot be paired. This can be useful when finding comparison units outside of a unit's group, such as when matching units in one group to units in another when units within the same group might otherwise be close matches. See examples [here](https://stackoverflow.com/questions/66526115/propensity-score-matching-with-panel-data) and [here](https://stackoverflow.com/questions/61120201/avoiding-duplicates-from-propensity-score-matching?rq=1). A similar effect can be implemented by supplying negative caliper values.

### Matching with replacement (`replace`)

Nearest neighbor matching and genetic matching have the option of matching with or without replacement, controlled by the `replace` argument. Matching without replacement means that each control unit is matched to at most one treated unit, while matching with replacement means that control units can be reused and matched to multiple treated units.
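A minimal sketch of matching with replacement (using the `lalonde` dataset):

```{r}
# Nearest neighbor matching with replacement; control units may be reused
m.rep <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 replace = TRUE)

# Because control units can belong to multiple pairs, use get_matches()
# rather than match_data() to recover pair membership
md <- get_matches(m.rep)
```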
Matching without replacement carries certain statistical benefits in that weights for each unit can be omitted or are more straightforward to include and dependence between units depends only on pair membership. However, it is not asymptotically consistent unless the propensity scores for all treated units are below .5 and there are many more control units than treated units [@savjeInconsistencyMatchingReplacement2022]. Special standard error estimators are sometimes required for estimating effects after matching with replacement [@austin2020a], and methods for accounting for uncertainty are not well understood for non-continuous outcomes. Matching with replacement will tend to yield better balance, because the problem of "running out" of close control units to match to treated units is avoided, but the reuse of control units will decrease the effective sample size, thereby worsening precision [@austin2013b]. (This problem occurs in the Lalonde dataset used in `vignette("MatchIt")`, which is why nearest neighbor matching without replacement is not very effective there.)

After matching with replacement, control units can be assigned to more than one subclass, so the `get_matches()` function should be used instead of `match_data()` if subclasses are to be used in follow-up analyses; see `vignette("estimating-effects")` for details. The `reuse.max` argument can also be used with `method = "nearest"` to control how many times each control unit can be reused as a match. Setting `reuse.max = 1` is equivalent to requiring matching without replacement (i.e., because each control can be used only once). Other values allow control units to be matched more than once, though only up to the specified number of times. Higher values will tend to improve balance at the cost of precision.

### $k$:1 matching (`ratio`)

The most common form of matching, 1:1 matching, involves pairing one control unit with each treated unit.
To perform $k$:1 matching (e.g., 2:1 or 3:1), which pairs (up to) $k$ control units with each treated unit, the `ratio` argument can be specified. Performing $k$:1 matching can preserve precision by preventing too many control units from being unmatched and dropped from the matched sample, though the gain in precision by increasing $k$ diminishes rapidly after 4 [@rosenbaum2020]. Importantly, for $k>1$, matches after the first will generally be worse than the first in terms of closeness to the treated unit, so increasing $k$ can also worsen balance [@rassenOnetomanyPropensityScore2012]. @austin2010a found that 1:1 or 1:2 matching generally performed best in terms of mean squared error. In general, it makes sense to use higher values of $k$ while ensuring that balance is satisfactory. With nearest neighbor and optimal pair matching, variable $k$:1 matching, in which the number of controls matched to each treated unit varies, can also be used; this can have improved performance over "fixed" $k$:1 matching [@ming2000; @rassenOnetomanyPropensityScore2012]. See `?method_nearest` and `?method_optimal` for information on implementing variable $k$:1 matching.

### Matching order (`m.order`)

For nearest neighbor matching (including genetic matching), units are matched in an order, and that order can affect the quality of individual matches and of the resulting matched sample. With `method = "nearest"`, the allowable options to `m.order` to control the matching order are `"largest"`, `"smallest"`, `"closest"`, `"farthest"`, `"random"`, and `"data"`. With `method = "genetic"`, all but `"closest"` and `"farthest"` can be used. Requesting `"largest"` means that treated units with the largest propensity scores, i.e., those least like the control units, will be matched first, which prevents them from having bad matches after all the close control units have been used up. `"smallest"` means that treated units with the smallest propensity scores are matched first.
`"closest"` means that potential pairs with the smallest distance between units will be matched first, which ensures that the best possible matches are included in the matched sample but can yield poor matches for units whose best match is far from them; this makes it particularly useful when matching with a caliper. `"farthest"` means that closest pairs with the largest distance between them will be matched first, which ensures the hardest units to match are given the best chance to find matches. `"random"` matches in a random order, and `"data"` matches in order of the data. A propensity score is required for `"largest"` and `"smallest"` but not for the other options. @rubin1973 recommends using `"largest"` or `"random"`, though @austin2013b recommends against `"largest"` and instead favors `"closest"` or `"random"`. `"closest"` and `"smallest"` are best for prioritizing the best possible matches, while `"farthest"` and `"largest"` are best for preventing extreme pairwise distances between matched units. ## Choosing a Matching Method Choosing the best matching method for one's data depends on the unique characteristics of the dataset as well as the goals of the analysis. For example, because different matching methods can target different estimands, when certain estimands are desired, specific methods must be used. On the other hand, some methods may be more effective than others when retaining the target estimand is less important. Below we provide some guidance on choosing a matching method. Remember that multiple methods can (and should) be tried as long as the treatment effect is not estimated until a method has been settled on. The criteria on which a matching specification should be judged are balance and remaining (effective) sample size after matching. Assessing balance is described in `vignette("assessing-balance")`. 
A typical workflow is similar to that demonstrated in `vignette("MatchIt")`: try a matching method, and if it yields poor balance or an unacceptably low remaining sample size, try another, until a satisfactory specification has been found. It is important to assess balance broadly (i.e., beyond comparing the means of the covariates in the treated and control groups), and the search for a matching specification should not stop when a threshold is reached, but should attempt to come as close as possible to perfect balance [@ho2007]. Even if the first matching specification appears successful at reducing imbalance, there may be another specification that could reduce it even further, thereby increasing the robustness of the inference and the plausibility of an unbiased effect estimate.

If the target of inference is the ATE, optimal or generalized full matching, subclassification, or profile matching can be used. If the target of inference is the ATT or ATC, any matching method may be used. When retaining the target estimand is not so important, additional options become available that involve discarding units in such a way that the original estimand is distorted. These include matching with a caliper, matching within a region of common support, cardinality matching, or exact or coarsened exact matching, perhaps on a subset of the covariates.

Because exact and coarsened exact matching aim to balance the entire joint distribution of covariates, they are the most powerful methods. If it is possible to perform exact matching, this method should be used. If continuous covariates are present, coarsened exact matching can be tried. Care should be taken with retaining the target population and ensuring enough matched units remain; unless the control pool is much larger than the treated pool, it is likely some (or many) treated units will be discarded, thereby changing the estimand and possibly dramatically reducing precision.
These methods are typically only available in the most optimistic of circumstances, but they should be used first when those circumstances arise. It may also be useful to combine exact or coarsened exact matching on some covariates with another form of matching on the others (i.e., by using the `exact` argument).

When estimating the ATE, either subclassification, full matching, or profile matching can be used. Optimal and generalized full matching can be effective because they optimize a balance criterion, often leading to better balance. With full matching, it's also possible to exact match on some variables and match using the Mahalanobis distance, eliminating the need to estimate propensity scores. Profile matching also ensures good balance, but because units are only given weights of zero or one, a solution may not be feasible and many units may have to be discarded. For large datasets, neither optimal full matching nor profile matching may be possible, in which case generalized full matching and subclassification are faster solutions.

When using subclassification, the number of subclasses should be varied. With large samples, higher numbers of subclasses tend to yield better performance; one should not immediately settle for the default (6) or the often-cited recommendation of 5 without trying several other numbers. The documentation for `cobalt::bal.compute()` contains an example of using balance to select the optimal number of subclasses.

When estimating the ATT, a variety of methods can be tried. Genetic matching can perform well at achieving good balance because it directly optimizes covariate balance. With larger datasets, it may take a long time to reach a good solution (though that solution will tend to be good as well). Profile matching also will achieve good balance if a solution is feasible because balance is controlled by the user.
Optimal pair matching and nearest neighbor matching without replacement tend to perform similarly to each other; nearest neighbor matching may be preferable for large datasets that cannot be handled by optimal matching. Nearest neighbor, optimal, and genetic matching allow some customizations like including covariates on which to exactly match, using the Mahalanobis distance instead of a propensity score difference, and performing $k$:1 matching with $k>1$.

Nearest neighbor matching with replacement, full matching, and subclassification all involve weighting the control units with nonuniform weights, which often allows for improved balancing capabilities but can be accompanied by a loss in effective sample size, even when all units are retained. There is no reason not to try many of these methods, varying parameters here and there, in search of good balance and high remaining sample size. As previously mentioned, no single method can be recommended above all others because the optimal specification depends on the unique qualities of each dataset.

When the target population is less important, for example, when engaging in treatment effect discovery or when the sampled population is not of particular interest (e.g., it corresponds to an arbitrarily chosen hospital or school; see @mao2018 for these and other reasons why retaining the target population may not be important), other methods that do not retain the characteristics of the original sample become available. These include matching with a caliper (on the propensity score or on the covariates themselves), cardinality matching, and more restrictive forms of matching like exact and coarsened exact matching, either on all covariates or just a subset, that are prone to discard units from the sample in such a way that the target population is changed.
@austin2013b and Austin and Stuart [-@austin2015c; -@austin2015a] have found that caliper matching can be a particularly effective modification to nearest neighbor matching for eliminating imbalance and reducing bias when the target population is less relevant, but when inference to a specific target population is desired, using calipers can induce bias due to incomplete matching [@rosenbaum1985; @wang2020]. Cardinality matching can be particularly effective in data with little overlap between the treatment groups [@visconti2018] and can perform better than caliper matching [@delosangelesresaDirectStableWeight2020].

It is important not to rely excessively on theoretical or simulation-based findings or specific recommendations when making choices about the best matching method to use. For example, although nearest neighbor matching without replacement balanced covariates better than did subclassification with five or ten subclasses in Austin's [-@austin2009c] simulation, this does not imply it will be superior in all datasets. Likewise, though @rosenbaum1985a and @austin2011a both recommend using a caliper of .2 standard deviations of the logit of the propensity score, this does not imply that this caliper width will be optimal in all scenarios, and other widths should be tried, though it should be noted that tightening the caliper on the propensity score can sometimes degrade performance [@king2019].

For large datasets (i.e., in the 10,000s to millions), some matching methods will be too slow to be used at scale. Instead, users should consider generalized full matching, subclassification, or coarsened exact matching, which are all very fast and designed to work with large datasets. Nearest neighbor matching on the propensity score has been optimized to run quickly for large datasets as well.
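A minimal sketch of the trial-and-comparison workflow described above (the specifications are illustrative; balance should be assessed as described in `vignette("assessing-balance")`):

```{r}
f <- treat ~ age + educ + race + married + nodegree + re74 + re75

# Try several specifications, then compare balance and remaining sample size
specs <- list(
  nn     = matchit(f, data = lalonde, method = "nearest"),
  nn.cal = matchit(f, data = lalonde, method = "nearest", caliper = .2),
  sub8   = matchit(f, data = lalonde, method = "subclass", subclass = 8)
)
lapply(specs, summary)
```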
## Reporting the Matching Specification

When reporting the results of a matching analysis, it is important to include the relevant details of the final matching specification and the process of arriving at it. Using `print()` on the `matchit` object synthesizes information on how the above arguments were used to provide a description of the matching specification. It is best to be as specific as possible to ensure the analysis is replicable and to allow audiences to assess its validity. Although citations recommending specific matching methods can be used to help justify a choice, the only sufficient justification is adequate balance and remaining sample size, regardless of published recommendations for specific methods. See `vignette("assessing-balance")` for instructions on how to assess and report the quality of a matching specification. After matching and estimating an effect, details of the effect estimation must be included as well; see `vignette("estimating-effects")` for instructions on how to perform and report on the analysis of a matched dataset.

## References

---
title: 'MatchIt: Getting Started'
author: "Noah Greifer"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: yes
vignette: >
  %\VignetteIndexEntry{MatchIt: Getting Started}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE,
                      fig.width = 7, fig.height = 5, fig.align = "center")
options(width = 200)
notice <- "Note: if the `optmatch` package is not available, the subsequent lines will not run."
use <- {
  if (requireNamespace("optmatch", quietly = TRUE)) "full"
  else if (requireNamespace("quickmatch", quietly = TRUE)) "quick"
  else "none"
}
me_ok <- requireNamespace("marginaleffects", quietly = TRUE) &&
  requireNamespace("sandwich", quietly = TRUE) &&
  !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false"))) &&
  utils::packageVersion("marginaleffects") > '0.25.0'
```

```{=html}
```

## Introduction

`MatchIt` implements the suggestions of @ho2007 for improving parametric statistical models for estimating treatment effects in observational studies and reducing model dependence by preprocessing data with semi-parametric and non-parametric matching methods. After appropriately preprocessing with `MatchIt`, researchers can use whatever parametric model they would have used without `MatchIt` and produce inferences that are more robust and less sensitive to modeling assumptions. `MatchIt` reduces the dependence of causal inferences on commonly made, but hard-to-justify, statistical modeling assumptions using a large range of sophisticated matching methods. The package includes several popular approaches to matching and provides access to methods implemented in other packages through its single, unified, and easy-to-use interface.

Matching is used in the context of estimating the causal effect of a binary treatment or exposure on an outcome while controlling for measured pre-treatment variables, typically confounding variables or variables prognostic of the outcome. Here and throughout the `MatchIt` documentation we use the word "treatment" to refer to the focal causal variable of interest, with "treated" and "control" reflecting the names of the treatment groups. The goal of matching is to produce *covariate balance*, that is, for the distributions of covariates in the two groups to be approximately equal to each other, as they would be in a successful randomized experiment.
The importance of covariate balance is that it allows for increased robustness to the choice of model used to estimate the treatment effect; in perfectly balanced samples, a simple difference in means can be a valid treatment effect estimate. Here we do not aim to provide a full introduction to matching or causal inference theory, but simply to explain how to use `MatchIt` to perform nonparametric preprocessing. For excellent and accessible introductions to matching, see @stuart2010 and @austin2011b.

A matching analysis involves four primary steps: 1) planning, 2) matching, 3) assessing the quality of matches, and 4) estimating the treatment effect and its uncertainty. Here we briefly discuss these steps and how they can be implemented with `MatchIt`; in the other included vignettes, these steps are discussed in more detail.

We will use Lalonde's data on the evaluation of the National Supported Work program to demonstrate `MatchIt`'s capabilities. First, we load `MatchIt` and bring in the `lalonde` dataset.

```{r}
library("MatchIt")
data("lalonde")

head(lalonde)
```

The statistical quantity of interest is the causal effect of the treatment (`treat`) on 1978 earnings (`re78`). The other variables are pre-treatment covariates. See `?lalonde` for more information on this dataset. In particular, the analysis is concerned with the marginal, total effect of the treatment for those who actually received the treatment.

In what follows, we briefly describe the four steps of a matching analysis and how to implement them in `MatchIt`. For more details, we recommend reading the other vignettes, `vignette("matching-methods")`, `vignette("assessing-balance")`, and `vignette("estimating-effects")`, especially for users less familiar with matching methods. For the use of `MatchIt` with sampling weights, also see `vignette("sampling-weights")`.
It is important to recognize that the ease of using `MatchIt` does not imply the simplicity of matching methods; advanced statistical methods like matching that require many decisions to be made and caution in their use should only be performed by those with statistical training.

## Planning

The planning phase of a matching analysis involves selecting the type of effect to be estimated, selecting the target population to which the treatment effect is to generalize, and selecting the covariates for which balance is required for an unbiased estimate of the treatment effect. Each of these is a theoretical step that does not involve performing analyses on the data. Ideally, they should be considered prior to data collection in the planning stage of a study. Thinking about them early can aid in performing a complete and cost-effective analysis.

**Selecting the type of effect to be estimated.** There are a few different types of effects to be estimated. In the presence of mediating variables, one might be interested in the direct effect of the treatment that does not pass through the mediating variables or the total effect of the treatment across all causal pathways. Matching is well suited for estimating total effects, and specific mediation methods may be better suited for other mediation-related quantities. One may be interested in a conditional effect or a marginal effect. A conditional effect is the effect of a treatment within some stratum of other prognostic variables (e.g., at the patient level), and a marginal effect is the average effect of a treatment in a population (e.g., for implementing a broad policy change). Different types of matching are well suited for each of these, but the most common forms are best used for estimating marginal treatment effects; for conditional treatment effects, modeling assumptions are typically required or matching must be done within strata of the conditioning variables.
Matching can reduce the reliance on correct model specification for conditional effects. **Selecting a target population.** The target population is the population to which the effect estimate is to generalize. Typically, an effect estimated in a sample generalizes to the population from which the sample is a probability sample. If the sample is not a probability sample from any population (e.g., it is a convenience sample or involves patients from an arbitrary hospital), the target population can be unclear. Often, the target population is a group of units who are eligible for the treatment (or a subset thereof). Causal estimands are defined by the target population to which they generalize. The average treatment effect in the population (ATE) is the average effect of the treatment for all units in the target population. The average treatment effect in the treated (ATT) is the average effect of the treatment for units like those who actually were treated. The most common forms of matching are best suited for estimating the ATT, though some are also available for estimating the ATE. Some matching methods distort the sample in such a way that the estimated treatment effect corresponds neither to the ATE nor to the ATT, but rather to the effect in an unspecified population (sometimes called the ATM, or average treatment effect in the remaining matched sample). When the target population is not so important (e.g., in the case of treatment effect discovery), such methods may be attractive; otherwise, care should be taken in ensuring the effect generalizes to the target population of interest. Different matching methods allow for different target populations, so it is important to choose a matching method that allows one to estimate the desired effect. See @greiferChoosingEstimandWhen2021 for guidance on making this choice. 
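These estimands have standard expressions in potential-outcomes notation (a standard formulation, not specific to `MatchIt`): writing $Y(1)$ and $Y(0)$ for a unit's outcomes under treatment and control and $A$ for the treatment indicator,

$$\text{ATE} = E[Y(1) - Y(0)], \qquad \text{ATT} = E[Y(1) - Y(0) \mid A = 1].$$

The ATT conditions on having actually received treatment, which is why matching methods that retain all treated units and only discard or reweight control units target it naturally.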
**Selecting covariates to balance.** Selecting covariates carefully is critical for ensuring the resulting treatment effect estimate is free of confounding and can be validly interpreted as a causal effect. To estimate total causal effects, all covariates must be measured prior to treatment (or otherwise not be affected by the treatment). Covariates should be those that cause variation in the outcome and in selection into the treatment group; these are known as confounding variables. See @vanderweele2019 for a guide on covariate selection. Ideally, these covariates are measured without error and are free of missingness.

## Check Initial Imbalance

After planning and prior to matching, it can be a good idea to view the initial imbalance in one's data that matching is attempting to eliminate. We can do this using the code below:

```{r}
# No matching; constructing a pre-match matchit object
m.out0 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = NULL, distance = "glm")
```

The first argument is a `formula` relating the treatment to the covariates used in estimating the propensity score and for which balance is to be assessed. The `data` argument specifies the dataset in which these variables exist. Typically, the `method` argument specifies the method of matching to be performed; here, we set it to `NULL` so we can assess balance prior to matching[^1]. The `distance` argument specifies the method for estimating the propensity score, a one-dimensional summary of all the included covariates, computed as the predicted probability of being in the treated group given the covariates; here, we set it to `"glm"` for generalized linear model, which implements logistic regression by default[^2] (see `?distance` for other options).

[^1]: Note that the default for `method` is `"nearest"`, which performs nearest neighbor matching. To prevent any matching from taking place so that pre-matching imbalance can be assessed, `method` must be set to `NULL`.
[^2]: Note that setting `distance = "logit"`, which was the default in `MatchIt` versions prior to 4.0.0, or `"ps"`, which was the default prior to version 4.5.0, will also estimate logistic regression propensity scores. Because it is the default, the `distance` argument can be omitted entirely if logistic regression propensity scores are desired.

Below we assess balance on the unmatched data using `summary()`:

```{r}
# Checking balance prior to matching
summary(m.out0)
```

We can see severe imbalances as measured by the standardized mean differences (`Std. Mean Diff.`), variance ratios (`Var. Ratio`), and empirical cumulative distribution function (eCDF) statistics. Values of standardized mean differences and eCDF statistics close to zero and values of variance ratios close to one indicate good balance; here, many of them are far from their ideal values.

## Matching

Now, matching can be performed. There are several different classes and methods of matching, described in `vignette("matching-methods")`. Here, we begin by briefly demonstrating 1:1 nearest neighbor (NN) matching on the propensity score, which is appropriate for estimating the ATT. One by one, each treated unit is paired with an available control unit that has the closest propensity score to it. Any remaining control units are left unmatched and excluded from further analysis. Due to the theoretical balancing properties of the propensity score described by @rosenbaum1983, propensity score matching can be an effective way to achieve covariate balance in the treatment groups. Below we demonstrate the use of `matchit()` to perform nearest neighbor propensity score matching.
```{r} # 1:1 NN PS matching w/o replacement m.out1 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde, method = "nearest", distance = "glm") ``` We use the same syntax as before, but this time specify `method = "nearest"` to implement nearest neighbor matching, again using a logistic regression propensity score. Many other arguments are available for tuning the matching method and method of propensity score estimation. The matching outputs are contained in the `m.out1` object. Printing this object gives a description of the type of matching performed: ```{r} m.out1 ``` The key components of the `m.out1` object are `weights` (the computed matching weights), `subclass` (matching pair membership), `distance` (the estimated propensity score), and `match.matrix` (which control units are matched to each treated unit). How these can be used for estimating the effect of the treatment after matching is detailed in `vignette("estimating-effects")`. ## Assessing the Quality of Matches Although matching on the propensity score is often effective at eliminating differences between the treatment groups to achieve covariate balance, its performance in this regard must be assessed. If covariates remain imbalanced after matching, the matching is considered unsuccessful, and a different matching specification should be tried. `MatchIt` offers a few tools for the assessment of covariate balance after matching. These include graphical and statistical methods. More detail on the interpretation of the included plots and statistics can be found in `vignette("assessing-balance")`. In addition to covariate balance, the quality of the match is determined by how many units remain after matching. Matching often involves discarding units that are not paired with other units, and some matching options, such as setting restrictions for common support or calipers, can further decrease the number of remaining units. 
If, after matching, the remaining sample size is small, the resulting effect estimate may be imprecise. In many cases, there will be a trade-off between balance and remaining sample size. How to optimally choose among them is an instance of the fundamental bias-variance trade-off problem that cannot be resolved without substantive knowledge of the phenomena under study. Prospective power analyses can be used to determine how small a sample can be before necessary precision is sacrificed. To assess the quality of the resulting matches numerically, we can use the `summary()` function on `m.out1` as before. Here we set `un = FALSE` to suppress display of the balance before matching for brevity and because we already saw it. (Leaving it as `TRUE`, its default, would display balance both before and after matching.) ```{r} # Checking balance after NN matching summary(m.out1, un = FALSE) ``` At the top is a summary of covariate balance after matching. Although balance has improved for some covariates, in general balance is still quite poor, indicating that nearest neighbor propensity score matching is not sufficient for removing confounding in this dataset. The final column, `Std. Pair Diff`, displays the average absolute within-pair difference of each covariate. When these values are small, better balance is typically achieved and estimated effects are more robust to misspecification of the outcome model [@king2019; @rubin1973a]. Next is a table of the sample sizes before and after matching. The matching procedure left 244 control units unmatched. Ideally, unmatched units would be those far from the treated units and would require greater extrapolation were they to have been retained. 
We can visualize the distribution of propensity scores of those who were matched using `plot()` with `type = "jitter"`:

```{r, fig.alt="Jitter plot of the propensity scores, which shows that no treated units were dropped, and a large number of control units with low propensity scores were dropped."}
plot(m.out1, type = "jitter", interactive = FALSE)
```

We can visually examine balance on the covariates using `plot()` with `type = "density"`:

```{r, fig.alt="Density plots of age, married, and re75 in the unmatched and matched samples."}
plot(m.out1, type = "density", interactive = FALSE,
     which.xs = ~age + married + re75)
```

Imbalances are represented by the differences between the black (treated) and gray (control) distributions. Although `married` and `re75` appear to have improved balance after matching, the case is mixed for `age`.

### Trying a Different Matching Specification

Given the poor performance of nearest neighbor matching in this example, we can try a different matching method or make other changes to the matching algorithm or distance specification. Below, we'll try full matching, which matches every treated unit to at least one control and every control to at least one treated unit [@hansen2004; @stuart2008a]. We'll also try a different link (probit) for the propensity score model. `r if (use == "none") notice`

```{r, eval = (use == "full"), include = (use %in% c("full", "none"))}
# Full matching on a probit PS
m.out2 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "full", distance = "glm",
                  link = "probit")
m.out2
```

```{r, eval = (use == "quick"), include = (use == "quick")}
# Full matching on a probit PS
m.out2 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "quick", distance = "glm",
                  link = "probit")
m.out2
```

We can examine balance on this new matching specification.
```{r, eval = (use != "none")}
# Checking balance after full matching
summary(m.out2, un = FALSE)
```

Balance is far better, as indicated by the lower standardized mean differences and eCDF statistics. Balance should be reported when publishing the results of a matching analysis. This can be done either in a table, using the values resulting from `summary()`, or in a plot, such as a Love plot, which we can make by calling `plot()` on the `summary()` output:

```{r, eval = (use != "none"), fig.alt = "A Love plot with matched dots below the threshold lines, indicating good balance after matching, in contrast to the unmatched dots far from the threshold lines, indicating poor balance before matching."}
plot(summary(m.out2))
```

Love plots are a simple and straightforward way to summarize balance visually. See `vignette("assessing-balance")` for more information on how to customize `MatchIt`'s Love plot and how to use `cobalt`, a package designed specifically for balance assessment and reporting that is compatible with `MatchIt`.

## Estimating the Treatment Effect

How treatment effects are estimated depends on what form of matching was performed. See `vignette("estimating-effects")` for information on how to estimate treatment effects in a variety of scenarios (i.e., different matching methods and outcome types). After full matching and most other matching methods, we can run a regression of the outcome on the treatment and covariates in the matched sample (i.e., including the matching weights) and estimate the treatment effect using g-computation as implemented in `marginaleffects::avg_comparisons()`[^est]. Including the covariates used in the matching in the effect estimation can provide additional robustness to slight imbalances remaining after matching and can improve precision.
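To build intuition for what g-computation does, here is a conceptual sketch using the outcome model `fit` and matched dataset `m.data` constructed later in this vignette; in practice, `avg_comparisons()` should be preferred because it also provides valid standard errors:

```{r, eval = FALSE}
# Conceptual sketch of g-computation for the ATT (an illustration only,
# not a substitute for avg_comparisons(), which also computes SEs)
d1 <- subset(m.data, treat == 1)

# Predicted outcome for each matched treated unit under treatment
p1 <- predict(fit, newdata = transform(d1, treat = 1))

# Predicted outcome for the same units under control
p0 <- predict(fit, newdata = transform(d1, treat = 0))

# The ATT estimate is the weighted mean of the unit-level contrasts,
# using the matching weights
weighted.mean(p1 - p0, w = d1$weights)
```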
[^est]: In some cases, the coefficient on the treatment variable in the outcome model can be used as the effect estimate, but g-computation always yields a valid effect estimate regardless of the form of the outcome model and its use is the same regardless of the outcome model type or matching method (with some slight variations), so we always recommend performing g-computation after fitting the outcome model. G-computation is explained in detail in `vignette("estimating-effects")`. Because full matching was successful at balancing the covariates, we'll demonstrate here how to estimate a treatment effect after performing such an analysis. First, we'll extract the matched dataset from the `matchit` object using `match_data()`. This dataset only contains the matched units and adds columns for `distance`, `weights`, and `subclass` (described previously). `r if (use == "none") notice` ```{r, eval = (use != "none")} m.data <- match_data(m.out2) head(m.data) ``` We can then model the outcome in this dataset using the standard regression functions in R, like `lm()` or `glm()`, being sure to include the matching weights (stored in the `weights` variable of the `match_data()` output) in the estimation[^3]. Finally, we use `marginaleffects::avg_comparisons()` to perform g-computation to estimate the ATT. We recommend using cluster-robust standard errors for most analyses, with pair membership as the clustering variable; `avg_comparisons()` makes this straightforward. [^3]: With 1:1 nearest neighbor matching without replacement, excluding the matching weights does not change the estimates. For all other forms of matching, they are required, so we recommend always including them for consistency. 
```{r, eval = (use != "none" && me_ok)} library("marginaleffects") fit <- lm(re78 ~ treat * (age + educ + race + married + nodegree + re74 + re75), data = m.data, weights = weights) avg_comparisons(fit, variables = "treat", vcov = ~subclass, newdata = subset(treat == 1)) ``` ```{r, include = FALSE} est <- { if (use != "none" && me_ok) { avg_comparisons(fit, variables = "treat", vcov = ~subclass, newdata = subset(treat == 1)) } else data.frame(type = "response", term = "1 - 0", estimate = 2114, std.error = 646, statistic = 3.27, p.value = 0.0011, conf.low = 848, conf.high = 3380) } ``` The outcome model coefficients and tests should not be interpreted or reported. See `vignette("estimating-effects")` for more information on how to estimate effects and standard errors with different forms of matching and with different outcome types. A benefit of matching is that the outcome model used to estimate the treatment effect is robust to misspecification when balance has been achieved. With full matching, we were able to achieve balance, so the effect estimate should depend less on the form of the outcome model used than had we used 1:1 matching without replacement or no matching at all. ## Reporting Results To report matching results in a manuscript or research report, a few key pieces of information are required. One should be as detailed as possible about the matching procedure and the decisions made to ensure the analysis is replicable and can be adequately assessed for soundness by the audience. 
Key pieces of information to include are 1) the matching specification used (including the method and any additional options, like calipers or common support restrictions), 2) the distance measure used (including how it was estimated, e.g., using logistic regression for propensity scores), 3) which other matching methods were tried prior to settling on a final specification and how the choices were made, 4) the balance of the final matching specification (including standardized mean differences and other balance statistics for the variables, their powers, and their interactions; some of these can be reported as summaries rather than in full detail), 5) the number of matched, unmatched, and discarded units included in the effect estimation, and 6) the method of estimating the treatment effect and standard error or confidence interval (including the specific model used and the specific type of standard error). See @thoemmes2011 for a complete list of specific details to report. Below is an example of how we might write up the prior analysis:

> We used propensity score matching to estimate the average marginal effect of the treatment on 1978 earnings for those who received it, accounting for confounding by the included covariates. We first attempted 1:1 nearest neighbor propensity score matching without replacement with a propensity score estimated using logistic regression of the treatment on the covariates. This matching specification yielded poor balance, so we instead tried full matching on the propensity score, which yielded adequate balance, as indicated in Table 1 and Figure 1. The propensity score was estimated using a probit regression of the treatment on the covariates, which yielded better balance than did a logistic regression. After matching, all standardized mean differences for the covariates were below 0.1 and all standardized mean differences for squares and two-way interactions between covariates were below 0.15, indicating adequate balance.
> Full matching uses all treated and all control units, so no units were discarded by the matching.
>
> To estimate the treatment effect and its standard error, we fit a linear regression model with 1978 earnings as the outcome and the treatment, covariates, and their interaction as predictors and included the full matching weights in the estimation. The `lm()` function was used to fit the outcome model, and the `avg_comparisons()` function in the `marginaleffects` package was used to perform g-computation in the matched sample to estimate the ATT. A cluster-robust variance was used to estimate its standard error with matching stratum membership as the clustering variable.
>
> The estimated effect was \$`r round(est$estimate)` (SE = `r round(est$std.error, 1)`, p = `r round(est$p.value, 3)`), indicating that the average effect of the treatment for those who received it is to increase earnings.

## Conclusion

Although we have covered the basics of performing a matching analysis here, to use matching to its full potential, the more advanced methods available in `MatchIt` should be considered. We recommend reading the other vignettes included here to gain a better understanding of all that `MatchIt` has to offer and how to use it responsibly and effectively. As previously stated, the ease of using `MatchIt` does not imply that matching or causal inference in general are simple matters; matching is an advanced statistical technique that should be used with care and caution. We hope the capabilities of `MatchIt` ease and encourage the use of nonparametric preprocessing for estimating causal effects in a robust and well-justified way.
## References

---
title: "Matching with Sampling Weights"
author: "Noah Greifer"
date: "`r Sys.Date()`"
output: 
  html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Matching with Sampling Weights}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
options(width = 200, digits = 4)
```

```{r, include = FALSE}
#Generating data similar to Austin (2009) for demonstrating treatment effect
#estimation with sampling weights
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] -
    log(2.4)*X[,5] + log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] +
    rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

gen_SW <- function(X) {
  e <- rbinom(nrow(X), 1, .3)
  1/plogis(log(1.4)*X[,2] + log(.7)*X[,4] + log(.9)*X[,6] +
             log(1.5)*X[,8] + log(.9)*e + -log(.5)*e*X[,2] +
             log(.6)*e*X[,4])
}

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)
SW <- gen_SW(X)
Y_C <- gen_Y_C(A, X)

d <- data.frame(A, X, Y_C, SW)

eval_est <- (requireNamespace("optmatch", quietly = TRUE) &&
               requireNamespace("marginaleffects", quietly = TRUE) &&
               !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false"))) &&
               requireNamespace("sandwich", quietly = TRUE) &&
               utils::packageVersion("marginaleffects") > '0.25.0')
```

## Introduction

Sampling weights (also known as survey weights) frequently appear when using large, representative datasets. They are required to ensure any estimated quantities generalize to a target population defined by the weights.
Evidence suggests that sampling weights need to be incorporated into a propensity score matching analysis to obtain valid and unbiased estimates of the treatment effect in the sampling weighted population [@dugoff2014; @austin2016; @lenis2019]. In this guide, we demonstrate how to use sampling weights with `MatchIt` for propensity score estimation, balance assessment, and effect estimation. Fortunately, doing so is not complicated, but some care must be taken to ensure sampling weights are incorporated correctly. It is assumed one has read the other vignettes explaining matching (`vignette("matching-methods")`), balance assessment (`vignette("assessing-balance")`), and effect estimation (`vignette("estimating-effects")`).

We will use the same simulated toy dataset used in `vignette("estimating-effects")` except with the addition of a sampling weights variable, `SW`, which is used to generalize the sample to a specific target population with a distribution of covariates different from that of the sample. Code to generate the covariates, treatment, and outcome is at the bottom of `vignette("estimating-effects")`, and code to generate the sampling weights is at the end of this document. We will consider the effect of binary treatment `A` on continuous outcome `Y_C`, adjusting for confounders `X1`-`X9`.

```{r, message = FALSE, warning = FALSE}
library("MatchIt")

head(d)
```

## Matching

When using sampling weights with propensity score matching, one has the option of including the sampling weights in the model used to estimate the propensity scores. Although evidence is mixed on whether this is required [@austin2016; @lenis2019], it can be a good idea. The choice should depend on whether including the sampling weights improves the quality of the matches. Specifications including and excluding sampling weights should be tried to determine which is preferred.
To supply sampling weights to the propensity score-estimating function in `matchit()`, the sampling weights variable should be supplied to the `s.weights` argument. It can be supplied either as a numeric vector containing the sampling weights or as a string or one-sided formula naming the sampling weights variable in the supplied dataset. Below we demonstrate including sampling weights in propensity scores estimated using logistic regression for optimal full matching for the average treatment effect in the population (ATE) (note that all methods and steps apply the same way to all forms of matching and all estimands).

```{asis, echo = !eval_est}
Note: if the `optmatch`, `marginaleffects`, or `sandwich` packages are not available, the subsequent lines will not run.
```

```{r, eval = eval_est}
mF_s <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
                data = d, method = "full", distance = "glm",
                estimand = "ATE", s.weights = ~SW)
mF_s
```

Notice that the description of the matching specification when the `matchit` object is printed includes lines indicating that the sampling weights were included in the estimation of the propensity score and that they are present in the `matchit` object. The sampling weights are stored in the `s.weights` component of the `matchit` object. Note that at this stage, the matching weights (stored in the `weights` component of the `matchit` object) do not incorporate the sampling weights; they are calculated simply as a result of the matching.

Now let's perform full matching on a propensity score that does not include the sampling weights in its estimation. Here we use the same specification as was used in `vignette("estimating-effects")`.

```{r, eval = eval_est}
mF <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
              data = d, method = "full", distance = "glm",
              estimand = "ATE")
mF
```

Notice that there is no mention of sampling weights in the description of the matching specification.
However, to properly assess balance and estimate effects, we need the sampling weights to be included in the `matchit` object, even if they were not used at all in the matching. To do so, we use the function `add_s.weights()`, which adds sampling weights to the supplied `matchit` objects. ```{r, eval = eval_est} mF <- add_s.weights(mF, ~SW) mF ``` Now when we print the `matchit` object, we can see lines have been added identifying that sampling weights are present but they were not used in the estimation of the propensity score used in the matching. Note that not all methods can involve sampling weights in the estimation. Only methods that use the propensity score will be affected by sampling weights; coarsened exact matching or Mahalanobis distance optimal pair matching, for example, ignore the sampling weights, and some propensity score estimation methods, like `randomForest` and `bart` (as presently implemented), cannot incorporate sampling weights. Sampling weights should still be supplied to `matchit()` even when using these methods to avoid having to use `add_s.weights()` and remembering which methods do or do not involve sampling weights. ## Assessing Balance Now we need to decide which matching specification is the best to use for effect estimation. We do this by selecting the one that yields the best balance without sacrificing remaining effective sample size. Because the sampling weights are incorporated into the `matchit` object, the balance assessment tools in `plot.matchit()` and `summary.matchit()` incorporate them into their output. We'll use `summary()` to examine balance on the two matching specifications. With sampling weights included, the balance statistics for the unmatched data are weighted by the sampling weights. The balance statistics for the matched data are weighted by the product of the sampling weights and the matching weights. It is the product of these weights that will be used in estimating the treatment effect. 
Below we use `summary()` to display balance for the two matching specifications. No additional arguments to `summary()` are required for it to use the sampling weights; as long as they are in the `matchit` object (either due to being supplied with the `s.weights` argument in the call to `matchit()` or due to being added afterward by `add_s.weights()`), they will be correctly incorporated into the balance statistics.

```{r, eval = eval_est}
#Balance before matching and for the SW propensity score full matching
summary(mF_s)

#Balance for the non-SW propensity score full matching
summary(mF, un = FALSE)
```

The results of the two matching specifications are similar. Balance appears to be slightly better when using the sampling weight-estimated propensity scores than when using the unweighted propensity scores. However, the effective sample size for the control group is larger when using the unweighted propensity scores. Neither propensity score specification achieves excellent balance, and more fiddling with the matching specification (e.g., by changing the method of estimating propensity scores, the type of matching, or the options used with the matching) might yield a better matched set. For the purposes of this analysis, we will move forward with the matching that used the sampling weight-estimated propensity scores (`mF_s`) because of its superior balance. Some of the remaining imbalance may be eliminated by adjusting for the covariates in the outcome model.

Note that had we not added sampling weights to `mF`, the matching specification that did not include the sampling weights, our balance assessment would have been inaccurate because the balance statistics would not have included the sampling weights. In this case, in fact, assessing balance on `mF` without incorporating the sampling weights would have yielded radically different results and a different conclusion.
It is critical to incorporate sampling weights into the `matchit` object using `add_s.weights()` even if they are not included in the propensity score estimation.

## Estimating the Effect

Estimating the treatment effect after matching is straightforward when using sampling weights. Effects are estimated in the same way as when sampling weights are excluded, except that the matching weights must be multiplied by the sampling weights for use in the outcome model to yield accurate, generalizable estimates. `match_data()` and `get_matches()` do this automatically, so the weights produced by these functions already are the product of the matching weights and the sampling weights. Note this will only be true if sampling weights are incorporated into the `matchit` object. With `avg_comparisons()`, only the sampling weights should be included when estimating the treatment effect.

Below we estimate the effect of `A` on `Y_C` in the matched and sampling weighted sample, adjusting for the covariates to improve precision and decrease bias.

```{r, eval = eval_est}
md_F_s <- match_data(mF_s)

fit <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
          data = md_F_s, weights = weights)

library("marginaleffects")

avg_comparisons(fit,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                wts = "SW")
```

Note that `match_data()` and `get_matches()` have the option `include.s.weights`, which, when set to `FALSE`, makes it so that the returned weights do not incorporate the sampling weights and are simply the matching weights. Because one might forget to multiply the two sets of weights together, it is easier to just use the default of `include.s.weights = TRUE` and ignore the sampling weights in the rest of the analysis (because they are already included in the returned weights).
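As a quick check of this behavior, one can compare the weights returned with and without the sampling weights incorporated. This is a sketch assuming the `mF_s` and `md_F_s` objects from above:

```{r, eval = FALSE}
# Matching weights alone, without the sampling weights folded in
md_raw <- match_data(mF_s, include.s.weights = FALSE)

# The default output's weights should be (proportional to) the product
# of the matching weights and the sampling weights
head(cbind(combined = md_F_s$weights,
           product = md_raw$weights * md_F_s$SW))
```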
## Code to Generate Data used in Examples

```{r, eval = FALSE}
#Generating data similar to Austin (2009) for demonstrating
#treatment effect estimation with sampling weights
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] -
    log(2.4)*X[,5] + log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] +
    rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

gen_SW <- function(X) {
  e <- rbinom(nrow(X), 1, .3)
  1/plogis(log(1.4)*X[,2] + log(.7)*X[,4] + log(.9)*X[,6] +
             log(1.5)*X[,8] + log(.9)*e + -log(.5)*e*X[,2] +
             log(.6)*e*X[,4])
}

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)
SW <- gen_SW(X)
Y_C <- gen_Y_C(A, X)

d <- data.frame(A, X, Y_C, SW)
```

## References

---
title: "Estimating Effects After Matching"
author: "Noah Greifer"
date: "`r Sys.Date()`"
output: 
  html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Estimating Effects After Matching}
  %\VignetteEngine{knitr::rmarkdown_notangle}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
options(width = 200, digits = 4)

me_ok <- requireNamespace("marginaleffects", quietly = TRUE) &&
  requireNamespace("sandwich", quietly = TRUE) &&
  !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false"))) &&
  utils::packageVersion("marginaleffects") > '0.25.0'
su_ok <- requireNamespace("survival", quietly = TRUE)
boot_ok <- requireNamespace("boot", quietly = TRUE)
```

```{r, include = FALSE}
#Generating data similar to Austin (2009) for demonstrating treatment effect
#estimation
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] -
    log(2.4)*X[,5] + log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] +
    rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

# Binary outcome
gen_Y_B <- function(A, X) {
  LP_B <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] +
    log(1.5)*X[,4] + log(2.4)*X[,5] + log(1.5)*X[,6]
  P_B <- plogis(LP_B)
  rbinom(length(A), 1, P_B)
}
#Conditional:
#  OR:    2.4
#  logOR: .875
#Marginal:
#  RD:    .144
#  RR:    1.54
#  logRR: .433
#  OR:    1.92
#  logOR: .655

# Survival outcome
gen_Y_S <- function(A, X) {
  LP_S <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] +
    log(1.5)*X[,4] + log(2.4)*X[,5] + log(1.5)*X[,6]
  sqrt(-log(runif(length(A)))*2e4*exp(-LP_S))
}
#Conditional:
#  HR:    2.4
#  logHR: .875
#Marginal:
#  HR:    1.57
#  logHR: .452

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)

Y_C <- gen_Y_C(A, X)
Y_B <- gen_Y_B(A, X)
Y_S <- gen_Y_S(A, X)

d <- data.frame(A, X, Y_C, Y_B, Y_S)
```

## Introduction

After assessing balance and deciding on a matching specification, it comes time to estimate the effect of the treatment in the matched sample. How the effect is estimated and interpreted depends on the desired estimand and the type of model used (if any). In addition to estimating effects, estimating the uncertainty of the effects is critical in communicating them and assessing whether the observed effect is compatible with there being no effect in the population. This guide explains how to estimate effects after various forms of matching and with various outcome types. There may be situations that are not covered here for which additional methodological research may be required, but some of the methods recommended here can be used to guide such applications.
This guide is structured as follows: first, information on the concepts related to effect and standard error (SE) estimation is presented below. Then, instructions for how to estimate effects and SEs are described for the standard case (matching for the ATT with a continuous outcome) and some other common circumstances. Finally, recommendations for reporting results and tips to avoid making common mistakes are presented.

### Identifying the estimand

Before an effect is estimated, the estimand must be specified and clarified. Although some aspects of the estimand depend not only on how the effect is estimated after matching but also on the matching method itself, other aspects must be considered at the time of effect estimation and interpretation. Here, we consider three aspects of the estimand: the population the effect is meant to generalize to (the target population), the effect measure, and whether the effect is marginal or conditional.

**The target population.** Different matching methods allow you to estimate effects that can generalize to different target populations. The most common estimand in matching is the average treatment effect in the treated (ATT), which is the average effect of treatment for those who receive treatment. This estimand is estimable for matching methods that do not change the treated units (i.e., by weighting or discarding units) and is requested in `matchit()` by setting `estimand = "ATT"` (which is the default). The average treatment effect in the population (ATE) is the average effect of treatment for the population from which the sample is a random sample. This estimand is estimable only for methods that allow the ATE and either do not discard units from the sample or explicitly target full sample balance, which in `MatchIt` is limited to full matching, subclassification, and profile matching when setting `estimand = "ATE"`.
When treated units are discarded (e.g., through the use of common support restrictions, calipers, cardinality matching, or [coarsened] exact matching), the estimand corresponds to neither the population ATT nor the population ATE, but rather to an average treatment effect in the remaining matched sample (ATM), which may not correspond to any specific target population. See @greiferChoosingEstimandWhen2021 for a discussion on the substantive considerations involved when choosing the target population of the estimand.

**Marginal and conditional effects.** A marginal effect is a comparison between the expected potential outcome under treatment and the expected potential outcome under control. This is the same quantity estimated in randomized trials without blocking or covariate adjustment and is particularly useful for quantifying the overall effect of a policy or population-wide intervention. A conditional effect is the comparison between the expected potential outcomes in the treatment groups within strata. This is useful for identifying the effect of a treatment for an individual patient or a subset of the population.

**Effect measures.** The outcome types we consider here are continuous, with the effect measured by the mean difference; binary, with the effect measured by the risk difference (RD), risk ratio (RR), or odds ratio (OR); and time-to-event (i.e., survival), with the effect measured by the hazard ratio (HR). The RR, OR, and HR are *noncollapsible* effect measures, which means the marginal effect on that scale is not a (possibly) weighted average of the conditional effects within strata, even if the stratum-specific effects are of the same magnitude. For these effect measures, it is critical to distinguish between marginal and conditional effects because different statistical methods target different types of effects. The mean difference and RD are *collapsible* effect measures, so the same methods can be used to estimate marginal and conditional effects.
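Noncollapsibility can be demonstrated numerically. In the sketch below (a purely illustrative simulation; the variable names and coefficients are not from this vignette's data-generating code), treatment is randomized and the conditional OR is fixed at 3 in every stratum of a prognostic covariate, yet the marginal OR computed from the 2x2 table is attenuated toward 1:

```{r}
set.seed(4)
n <- 2e5
x <- rnorm(n)
a <- rbinom(n, 1, .5)             #randomized, so no confounding
p <- plogis(-1 + log(3)*a + 2*x)  #conditional OR = 3 in every stratum of x
y <- rbinom(n, 1, p)

#Marginal OR from the 2x2 table of a and y
tab <- table(a, y)
marg_OR <- (tab[2,2]/tab[2,1]) / (tab[1,2]/tab[1,1])
marg_OR #noticeably smaller than the conditional OR of 3
```

Because treatment is randomized here, the gap between the conditional and marginal OR is not confounding; it is purely a property of the OR as an effect measure.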
Our primary focus will be on marginal effects, which are appropriate for all effect measures, easily interpretable, and require few modeling assumptions. The "Common Mistakes" section includes examples of commonly used methods that estimate conditional rather than marginal effects and should not be used when marginal effects are desired.

### G-computation

To estimate marginal effects, we use a method known as g-computation [@snowdenImplementationGComputationSimulated2011] or regression estimation [@schaferAverageCausalEffects2008]. This involves first specifying a model for the outcome as a function of the treatment and covariates. Then, for each unit, we compute their predicted values of the outcome setting their treatment status to treated, and then again for control, leaving us with two predicted outcome values for each unit, which are estimates of the potential outcomes under each treatment level. We compute the mean of each of the estimated potential outcomes across the entire sample, which leaves us with two average estimated potential outcomes. Finally, the contrast of these average estimated potential outcomes (e.g., their difference or ratio, depending on the effect measure desired) is the estimate of the treatment effect.

When doing g-computation after matching, a few additional considerations are required. First, when we take the average of the estimated potential outcomes under each treatment level, this must be a weighted average that incorporates the matching weights. Second, if we want to target the ATT or ATC, we only estimate potential outcomes for the treated or control group, respectively (though we still generate predicted values under both treatment and control).

G-computation as a framework for estimating effects after matching has a number of advantages over other approaches.
It works the same regardless of the form of the outcome model or type of outcome (e.g., whether a linear model is used for a continuous outcome or a logistic model is used for a binary outcome); the only difference might be how the average expected potential outcomes are contrasted in the final step. In simple cases, the estimated effect is numerically identical to effects estimated using other methods; for example, if no covariates are included in the outcome model, the g-computation estimate is equal to the difference in means from a t-test or coefficient of the treatment in a linear model for the outcome. There are analytic approximations to the SEs of the g-computation estimate, and these SEs can incorporate pair/subclass membership (described in more detail below). For all these reasons, we use g-computation when possible for all effect estimates, even if there are simpler methods that would yield the same estimates. Using a single workflow (with some slight modifications depending on the context; see below) facilitates implementing best practices regardless of what choices a user makes.

### Modeling the Outcome

The goal of the outcome model is to generate good predictions for use in the g-computation procedure described above. The type and form of the outcome model should depend on the outcome type. For continuous outcomes, one can use a linear model regressing the outcome on the treatment; for binary outcomes, one can use a generalized linear model with, e.g., a logistic link; for time-to-event outcomes, one can use a Cox proportional hazards model.

An additional decision to make is whether (and how) to include covariates in the outcome model. One may ask, why use matching at all if you are going to model the outcome with covariates anyway? Matching reduces the dependence of the effect estimate on correct specification of the outcome model; this is the central thesis of @ho2007.
Including covariates in the outcome model after matching has several functions: it can increase precision in the effect estimate, reduce the bias due to residual imbalance, and make the effect estimate "doubly robust", which means it is consistent if either the matching sufficiently reduces imbalance in the covariates or the outcome model is correct. For these reasons, we recommend covariate adjustment after matching when possible. There is some evidence that covariate adjustment is most helpful for covariates with standardized mean differences greater than .1 [@nguyen2017], so these covariates and covariates thought to be highly predictive of the outcome should be prioritized in treatment effect models if not all can be included due to sample size constraints.

Although there are many possible ways to include covariates (e.g., not just main effects but interactions, smoothing terms like splines, or other nonlinear transformations), it is important not to engage in specification search (i.e., trying many outcome models in search of the "best" one). Doing so can invalidate results and yield a conclusion that fails to replicate. For this reason, we recommend only including the same terms included in the propensity score model unless there is a strong *a priori* and justifiable reason to model the outcome differently.

It is important not to interpret the coefficients and tests of covariates in the outcome model. These are not causal effects and their estimates may be severely confounded. Only the treatment effect estimate can be interpreted as causal assuming the relevant assumptions about unconfoundedness are met. Inappropriately interpreting the coefficients of covariates in the outcome model is known as the Table 2 fallacy [@westreich2013]. To avoid this, we only display the results of the g-computation procedure and do not examine or interpret the outcome models themselves.
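To make the g-computation steps concrete, here is a minimal by-hand sketch in base R on simulated data (all names here are illustrative stand-ins, not MatchIt output; `avg_comparisons()`, used throughout this guide, automates this computation along with its SE):

```{r}
set.seed(1)
n <- 500
x <- rnorm(n)
a <- rbinom(n, 1, plogis(x))
y <- 1 + 2*a + x + rnorm(n)
w <- rep(1, n)                 #stand-in for matching weights

#Outcome model with treatment-covariate interaction
fit <- lm(y ~ a * x, weights = w)

#Predict each unit's potential outcomes under treatment and under control
p1 <- predict(fit, newdata = transform(data.frame(a, x), a = 1))
p0 <- predict(fit, newdata = transform(data.frame(a, x), a = 0))

#For the ATT, take weighted averages over the treated units only
treated <- a == 1
att <- weighted.mean(p1[treated], w[treated]) -
  weighted.mean(p0[treated], w[treated])
att
```

With uniform weights and this simple model, the estimate recovers the simulated treatment effect of 2 up to sampling error.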
### Estimating Standard Errors and Confidence Intervals

Uncertainty estimation (i.e., of SEs, confidence intervals, and p-values) may consider the variety of sources of uncertainty present in the analysis, including (but not limited to!) estimation of the propensity score (if used), matching (i.e., because treated units might be matched to different control units if others had been sampled), and estimation of the treatment effect (i.e., because of sampling error). In general, there are no analytic solutions to all these issues, so much of the research done on uncertainty estimation after matching has relied on simulation studies. The two primary methods that have been shown to perform well in matched samples are using cluster-robust SEs and the bootstrap, described below.

To compute SEs after g-computation, a method known as the delta method is used; this is a way to compute the SEs of the derived quantities (the expected potential outcomes and their contrast) from the variance of the coefficients of the outcome models. For nonlinear models (e.g., logistic regression), the delta method is only an approximation subject to error (though in many cases this error is small and shrinks in large samples). Because the delta method relies on the variance of the coefficients from the outcome model, it is important to correctly estimate these variances, using either robust or cluster-robust methods as described below.

#### Robust and Cluster-Robust Standard Errors

**Robust standard errors.** Also known as sandwich SEs (due to the form of the formula for computing them), heteroscedasticity-consistent SEs, or Huber-White SEs, robust SEs are an adjustment to the usual maximum likelihood or ordinary least squares SEs that are robust to violations of some of the assumptions required for usual SEs to be valid [@mackinnon1985]. Although there has been some debate about their utility [@king2015], robust SEs rarely degrade inferences and often improve them.
Generally, robust SEs **must** be used when any non-uniform weights are included in the estimation (e.g., with matching with replacement or inverse probability weighting).

**Cluster-robust standard errors.** A version of robust SEs known as cluster-robust SEs [@liang1986] can be used to account for dependence between observations within clusters (e.g., matched pairs). @abadie2019 demonstrate analytically that cluster-robust SEs are generally valid after matching, whereas regular robust SEs can over- or under-estimate the true sampling variability of the effect estimator depending on the specification of the outcome model (if any) and degree of effect modification. A plethora of simulation studies have further confirmed the validity of cluster-robust SEs after matching [e.g., @austin2009a; @austin2014; @gayat2012; @wan2019; @austin2013]. Given this evidence favoring the use of cluster-robust SEs, we recommend them in most cases and use them judiciously in this guide[^1].

[^1]: Because they are only appropriate with a large number of clusters, cluster-robust SEs are generally not used with subclassification methods. Regular robust SEs are valid with these methods when using the subclassification weights to estimate marginal effects.

#### Bootstrapping

One problem when using robust and cluster-robust SEs along with the delta method is that the delta method is an approximation, as previously mentioned. One solution to this problem is bootstrapping, which is a technique used to simulate the sampling distribution of an estimator by repeatedly drawing samples with replacement and estimating the effect in each bootstrap sample [@efron1993]. From the bootstrap distribution, SEs and confidence intervals can be computed in several ways, including using the standard deviation of the bootstrap estimates as the SE estimate or using the 2.5 and 97.5 percentiles as 95% confidence interval bounds.
Bootstrapping tends to be most useful when no analytic estimator of a SE is possible or has been derived yet. Although @abadie2008 found analytically that the bootstrap is inappropriate for matched samples, simulation evidence has found it to be adequate in many cases [@hill2006; @austin2014; @austin2017].

Typically, bootstrapping involves performing the entire estimation process in each bootstrap sample, including propensity score estimation, matching, and effect estimation. This tends to be the most straightforward route, though intervals from this method may be conservative in some cases (i.e., they are wider than necessary to achieve nominal coverage) [@austin2014]. Less conservative and more accurate intervals have been found when using different forms of the bootstrap, including the wild bootstrap developed by @bodory2020 and the matched/cluster bootstrap described by @austin2014 and @abadie2019. The cluster bootstrap involves sampling matched pairs/strata of units from the matched sample and performing the analysis within each sample composed of the sampled pairs. @abadie2019 derived analytically that the cluster bootstrap is valid for estimating SEs and confidence intervals in the same circumstances in which cluster-robust SEs are; indeed, the cluster bootstrap SE is known to approximate the cluster-robust SE [@cameron2015].

With bootstrapping, more bootstrap replications are always better but can take time and increase the chances that at least one error will occur within the bootstrap analysis (e.g., a bootstrap sample with zero treated units or zero units with an event). In general, numbers of replications upwards of 999 are recommended, with values one less than a multiple of 100 preferred to avoid interpolation when using the percentiles as confidence interval limits [@mackinnon2006].
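As an illustration of the cluster (pair) bootstrap just described, the sketch below resamples whole pairs with `boot::boot()`. The toy paired dataset and the simple difference-in-means statistic are stand-ins for a full matching-and-estimation pipeline:

```{r, eval = boot_ok}
library("boot")

set.seed(2)
#Toy matched sample: 100 pairs, one treated and one control unit each
pd <- data.frame(pair = rep(1:100, each = 2), a = rep(1:0, 100))
pd$y <- 1 + 2*pd$a + rnorm(200)

pair_ids <- unique(pd$pair)

#Statistic: rebuild the sample from the resampled pairs, then estimate.
#Pairs can be drawn more than once, so rows are stacked per drawn pair
est_fun <- function(ids, i) {
  b <- do.call(rbind, lapply(ids[i], function(p) pd[pd$pair == p, ]))
  mean(b$y[b$a == 1]) - mean(b$y[b$a == 0])
}

boot_out <- boot(pair_ids, est_fun, R = 199) #use R >= 999 in practice
boot.ci(boot_out, type = "perc")
```

Note that the units resampled are pair IDs, not rows, so within-pair dependence is preserved in every bootstrap sample.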
There are several methods of computing bootstrap confidence intervals, but the bias-corrected accelerated (BCa) bootstrap confidence interval often performs best [@austin2014; @carpenter2000] and is easy to implement, simply by setting `type = "bca"` in the call to `boot::boot.ci()` after running `boot::boot()`[^2].

[^2]: Sometimes, an error will occur with this method, which usually means more bootstrap replications are required. The number of replicates must be greater than the original sample size when using the full bootstrap and greater than the number of pairs/strata when using the block bootstrap.

Most of this guide will consider analytic (i.e., non-bootstrapping) approaches to estimating uncertainty; the section "Using Bootstrapping to Estimate Confidence Intervals" describes broadly how to use bootstrapping. Although analytic estimates are faster to compute, in many cases bootstrap confidence intervals are more accurate.

## Estimating Treatment Effects and Standard Errors After Matching

Below, we describe effect estimation after matching. We'll be using a simulated toy dataset `d` with several outcome types. Code to generate the dataset is at the end of this document. The focus here is not on evaluating the methods but simply on demonstrating them. In all cases, the correct propensity score model is used. Below we display the first six rows of `d`:

```{r}
head(d)
```

`A` is the treatment variable, `X1` through `X9` are covariates, `Y_C` is a continuous outcome, `Y_B` is a binary outcome, and `Y_S` is a survival outcome.
We will need the following packages to perform the desired analyses:

- `marginaleffects` provides the `avg_comparisons()` function for performing g-computation and estimating the SEs and confidence intervals of the average estimated potential outcomes and treatment effects
- `sandwich` is used internally by `marginaleffects` to compute robust and cluster-robust SEs
- `survival` provides `coxph()` to estimate the coefficients in a Cox proportional hazards model for the marginal hazard ratio, which we will use for survival outcomes

Of course, we also need `MatchIt` to perform the matching.

```{r, message = FALSE, warning = FALSE, eval = which(c(TRUE, me_ok))}
library("MatchIt")
library("marginaleffects")
```

All effect estimates will be computed using `marginaleffects::avg_comparisons()`, even when its use may be superfluous (e.g., for performing a t-test in the matched set). As previously mentioned, this is because it is useful to have a single workflow that works no matter the situation, perhaps with very slight modifications to accommodate different contexts. Using `avg_comparisons()` has several advantages, even when the alternatives are simple: it only provides the effect estimate, and not other coefficients; it automatically incorporates robust and cluster-robust SEs if requested; and it always produces average marginal effects for the correct target population.

Other packages may be of use but are not used here. There are alternatives to the `marginaleffects` package for computing average marginal effects, including `margins` and `stdReg`. The `survey` package can be used to estimate robust SEs incorporating weights and provides functions for survey-weighted generalized linear models and Cox proportional hazards models.
### The Standard Case

For almost all matching methods, whether a caliper, common support restriction, exact matching specification, or $k$:1 matching specification is used, estimating the effect in the matched dataset is straightforward and involves fitting a model for the outcome that incorporates the matching weights[^3], then estimating the treatment effect using g-computation (i.e., using `marginaleffects::avg_comparisons()`) with a cluster-robust SE to account for pair membership. This procedure is the same for continuous and binary outcomes with and without covariates.

[^3]: The matching weights are not necessary when performing 1:1 matching, but we include them here for generality. When weights are not necessary, including them does not affect the estimates. Because it may not always be clear when weights are required, we recommend always including them.

There are a few adjustments that need to be made for certain scenarios, which we describe in the section "Adjustments to the Standard Case". These adjustments include the following cases: when matching for the ATE rather than the ATT, for matching with replacement, for matching with a method that doesn't involve creating pairs (e.g., cardinality and profile matching and coarsened exact matching), for subclassification, for estimating effects with binary outcomes, and for estimating effects with survival outcomes. You must read the Standard Case to understand the basic procedure before reading about these special scenarios.

Here, we demonstrate the faster analytic approach to estimating confidence intervals; for the bootstrap approach, see the section "Using Bootstrapping to Estimate Confidence Intervals" below.

First, we will perform variable-ratio nearest neighbor matching without replacement on the propensity score for the ATT. Remember, all matching methods use this exact procedure or a slight variation, so this section is critical even if you are using a different matching method.
```{r}
#Variable-ratio NN matching on the PS for the ATT
mV <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
              data = d, ratio = 2, max.controls = 4)
mV

#Extract matched data
md <- match_data(mV)

head(md)
```

Typically one would assess balance and ensure that this matching specification works, but we will skip that step here to focus on effect estimation. See `vignette("MatchIt")` and `vignette("assessing-balance")` for more information on this necessary step. Because we did not use a caliper, the target estimand is the ATT.

We perform all analyses using the matched dataset, `md`, which, for matching methods that involve dropping units, contains only the units retained in the sample.

First, we fit a model for the outcome given the treatment and (optionally) the covariates. It's usually a good idea to include treatment-covariate interactions, which we do below, but this is not always necessary, especially when excellent balance has been achieved. You can also include the propensity score (usually labeled `distance` in the `match_data()` output), which can add some robustness, especially when modeled flexibly (e.g., with polynomial terms or splines) [@austinDoublePropensityscoreAdjustment2017]; see [here](https://stats.stackexchange.com/a/580174/116195) for an example.

```{r}
#Linear model with covariates
fit1 <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
           data = md, weights = weights)
```

Next, we use `marginaleffects::avg_comparisons()` to estimate the ATT.
```{r, eval = me_ok}
avg_comparisons(fit1,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1))
```

Let's break down the call to `avg_comparisons()`: to the first argument, we supply the model fit, `fit1`; to the `variables` argument, the name of the treatment (`"A"`); to the `vcov` argument, a formula with subclass membership (`~subclass`) to request cluster-robust SEs; and to the `newdata` argument, a version of the matched dataset containing only the treated units (`subset(A == 1)`) to request the ATT. Some of these arguments differ depending on the specifics of the matching method and outcome type; see the sections below for information.

If, in addition to the effect estimate, we want the average estimated potential outcomes, we can use `marginaleffects::avg_predictions()`, which we demonstrate below. Note the interpretation of the resulting estimates as the expected potential outcomes is only valid if all covariates present in the outcome model (if any) are interacted with the treatment.

```{r, eval = me_ok && packageVersion("marginaleffects") >= "0.11.0"}
avg_predictions(fit1,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1))
```

We can see that the difference in potential outcome means is equal to the average treatment effect computed previously[^4]. All of the arguments to `avg_predictions()` are the same as those to `avg_comparisons()`.

[^4]: To verify that they are equal, supply the output of `avg_predictions()` to `hypotheses()`, e.g., `avg_predictions(...) |> hypotheses(~pairwise)`; this explicitly compares the average potential outcomes and should yield identical estimates to the `avg_comparisons()` call.

### Adjustments to the Standard Case

This section explains how the procedure might differ if any of the following special circumstances occur.
#### Matching for the ATE

When matching for the ATE (including [coarsened] exact matching, full matching, subclassification, and cardinality matching), everything is identical to the Standard Case except that in the calls to `avg_comparisons()` and `avg_predictions()`, the `newdata` argument is omitted. This is because the estimated potential outcomes are computed for the full sample rather than just the treated units.

#### Matching with replacement

When matching with replacement (i.e., nearest neighbor or genetic matching with `replace = TRUE`), effect and SE estimation need to account for control unit multiplicity (i.e., repeated use) and within-pair correlations [@hill2006; @austin2020a]. Although @abadie2008 demonstrated analytically that bootstrap SEs may be invalid for matching with replacement, simulation work by @hill2006 and @bodory2020 has found that bootstrap SEs are adequate and generally slightly conservative. See the section "Using Bootstrapping to Estimate Confidence Intervals" for instructions on using the bootstrap and an example that uses matching with replacement.

Because control units do not belong to unique pairs, there is no pair membership in the `match_data()` output. One can simply change `vcov = ~subclass` to `vcov = "HC3"` in the calls to `avg_comparisons()` and `avg_predictions()` to use robust SEs instead of cluster-robust SEs, as recommended by @hill2006.

There is some evidence for an alternative approach that incorporates pair membership and adjusts for reuse of control units, though this has only been studied for survival outcomes [@austin2020a]. This adjustment involves using two-way cluster-robust SEs with pair membership and unit ID as the clustering variables.
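A sketch of this two-way clustering adjustment, using a small simulated dataset rather than `d` (the lowercase variable names are illustrative; `subclass`, `id`, and `weights` are columns that `get_matches()` returns):

```{r, eval = me_ok}
library("MatchIt")
library("marginaleffects")

set.seed(3)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$a <- rbinom(n, 1, plogis(toy$x1))
toy$y <- 1 + 2*toy$a + toy$x1 + toy$x2 + rnorm(n)

#1:1 NN matching with replacement on the PS
mR <- matchit(a ~ x1 + x2, data = toy, replace = TRUE)

#One row per unit per pair; reused controls appear once per pair
gm <- get_matches(mR)

fitR <- lm(y ~ a * (x1 + x2), data = gm, weights = weights)

#Two-way cluster-robust SEs: pair membership and unit ID
est <- avg_comparisons(fitR, variables = "a",
                       vcov = ~subclass + id,
                       newdata = subset(a == 1))
est
```

The formula supplied to `vcov` is passed to `sandwich`, so listing both `subclass` and `id` yields the two-way clustering described above.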
For continuous and binary outcomes, this involves the following two changes: 1) replace `match_data()` with `get_matches()`, which produces a dataset with one row per unit per pair, meaning control units matched to multiple treated units will appear multiple times in the dataset; 2) set `vcov = ~subclass + id` in the calls to `avg_comparisons()` and `avg_predictions()`. For survival outcomes, a special procedure must be used; see the section on survival outcomes below.

#### Matching without pairing

Some matching methods do not involve creating pairs; these include cardinality and profile matching with `mahvars = NULL` (the default), exact matching, and coarsened exact matching with `k2k = FALSE` (the default). The only change that needs to be made to the Standard Case is that one should change `vcov = ~subclass` to `vcov = "HC3"` in the calls to `avg_comparisons()` and `avg_predictions()` to use robust SEs instead of cluster-robust SEs. Remember that if matching is done for the ATE (even if units are dropped), the `newdata` argument should be dropped.

#### Propensity score subclassification

There are two natural ways to estimate marginal effects after subclassification: the first is to estimate subclass-specific treatment effects and pool them using an average marginal effects procedure, and the second is to use the stratum weights to estimate a single average marginal effect. This latter approach is also known as marginal mean weighting through stratification (MMWS) and is described in detail by @hong2010[^5]. When done properly, both methods should yield similar or identical estimates of the treatment effect.

[^5]: It is also known as fine stratification weighting, described by @desai2017.
All of the methods described above for the Standard Case also work with MMWS because the formation of the weights is the same; the only difference is that it is not appropriate to use cluster-robust SEs with MMWS because of how few clusters are present, so one should change `vcov = ~subclass` to `vcov = "HC3"` in the calls to `avg_comparisons()` and `avg_predictions()` to use robust SEs instead of cluster-robust SEs. The subclasses can optionally be included in the outcome model (optionally interacting with treatment) as an alternative to including the propensity score.

The subclass-specific approach omits the weights and uses the subclasses directly. It is only appropriate when there are a small number of subclasses relative to the sample size. In the outcome model, `subclass` should interact with all other predictors in the model (including the treatment, covariates, and interactions, if any), and the `weights` argument should be omitted. As with MMWS, one should change `vcov = ~subclass` to `vcov = "HC3"` in the calls to `avg_comparisons()` and `avg_predictions()`. See an example below:

```{r, eval = me_ok}
#Subclassification on the PS for the ATT
mS <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
              data = d, method = "subclass", estimand = "ATT")

#Extract matched data
md <- match_data(mS)

fitS <- lm(Y_C ~ subclass * (A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9)),
           data = md)

avg_comparisons(fitS,
                variables = "A",
                vcov = "HC3",
                newdata = subset(A == 1))
```

A model with fewer terms may be required when subclasses are small; removing covariates or their interactions with treatment may be required and can increase precision in smaller datasets. Remember that if subclassification is done for the ATE (even if units are dropped), the `newdata` argument should be dropped.

#### Binary outcomes

Estimating effects on binary outcomes is essentially the same as for continuous outcomes.
The main difference is that there are several measures of the effect one can consider, which include the odds ratio (OR), risk ratio/relative risk (RR), and risk difference (RD), and the syntax to `avg_comparisons()` depends on which one is desired. The outcome model should be one appropriate for binary outcomes (e.g., logistic regression) but is unrelated to the desired effect measure because we can compute any of the above effect measures using `avg_comparisons()` after the logistic regression.

To fit a logistic regression model, change `lm()` to `glm()` and set `family = quasibinomial()`[^6]. To compute the marginal RD, we can use exactly the same syntax as in the Standard Case; nothing needs to change[^7].

[^6]: We use `quasibinomial()` instead of `binomial()` simply to avoid a spurious warning that can occur with certain kinds of matching; the results will be identical regardless.

[^7]: Note that for low or high average expected risks computed with `avg_predictions()`, the confidence intervals may go below 0 or above 1; this is because an approximation is used. To avoid this problem, bootstrapping or simulation-based inference can be used instead.

To compute the marginal RR, we need to add `comparison = "lnratioavg"` to `avg_comparisons()`; this computes the marginal log RR. To get the marginal RR, we need to add `transform = "exp"` to `avg_comparisons()`, which exponentiates the marginal log RR and its confidence interval.
The code below computes the effects and displays the statistics of interest:

```{r, eval=me_ok}
#Logistic regression model with covariates
fit2 <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
            data = md, weights = weights,
            family = quasibinomial())

#Compute effects; RR and confidence interval
avg_comparisons(fit2,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                comparison = "lnratioavg",
                transform = "exp")
```

The output displays the marginal RR, its Z-value, the p-value for the Z-test of the log RR against 0, and its confidence interval. (Note that even though the `Contrast` label still suggests the log RR, the RR is actually displayed.) To view the log RR and its standard error, omit the `transform` argument. For the marginal OR, the only thing that needs to change is that `comparison` should be set to `"lnoravg"`. For the marginal RD, both the `comparison` and `transform` arguments can be removed (yielding the same call as in the Standard Case).

#### Survival outcomes

There are several measures of effect size for survival outcomes. When using the Cox proportional hazards model, the quantity of interest is the hazard ratio (HR) between the treated and control groups. As with the OR, the HR is non-collapsible, which means the estimated HR will only be a valid estimate of the marginal HR when no other covariates are included in the model. Other effect measures, such as the difference in mean survival times or probability of survival after a given time, can be treated just like continuous and binary outcomes as previously described.

For the HR, we cannot compute average marginal effects and must use the coefficient on treatment in a Cox model fit without covariates[^8]. This means that we cannot use the procedures from the Standard Case. Here we describe estimating the marginal HR using `coxph()` from the `survival` package. (See `help("coxph", package = "survival")` for more information on this model.)
To request cluster-robust SEs as recommended by @austin2013a, we need to supply pair membership (stored in the `subclass` column of `md`) to the `cluster` argument and set `robust = TRUE`. For matching methods that don't involve pairing (e.g., cardinality and profile matching and [coarsened] exact matching), we can omit the `cluster` argument (but keep `robust = TRUE`)[^9].

[^8]: It is not immediately clear how to estimate a marginal HR when covariates are included in the outcome model; though @austin2020 describe several ways of including covariates in a model to estimate the marginal HR, they do not develop SEs and little research has been done on this method, so we will not present it here. Instead, we fit a simple Cox model with the treatment as the sole predictor.

[^9]: For subclassification, only MMWS can be used; this is done simply by including the stratification weights in the Cox model and omitting the `cluster` argument.

```{r, eval=su_ok}
library("survival")

#Cox Regression for marginal HR
coxph(Surv(Y_S) ~ A, data = md, robust = TRUE,
      weights = weights, cluster = subclass)
```

The `coef` column contains the log HR, and `exp(coef)` contains the HR. Remember to always use the `robust se` for the SE of the log HR. The displayed z-test p-value results from using the robust SE.

For matching with replacement, a special procedure described by @austin2020a may be necessary for valid inference. According to the results of their simulation studies, when the treatment prevalence is low (\<30%), an SE that does not involve pair membership (i.e., the `match_data()` approach, as demonstrated above) is sufficient. When treatment prevalence is higher, the SE that ignores pair membership may be too low, and the authors recommend using a custom SE estimator that uses information about both multiplicity and pairing. Doing so must be done manually for survival models using `get_matches()` and several calls to `coxph()`, as demonstrated in the appendix of @austin2020a.
We demonstrate this below:

```{r, eval = F}
#get_matches() after matching with replacement
gm <- get_matches(mR)

#Austin & Cafri's (2020) SE estimator
fs <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE,
            weights = weights, cluster = subclass)
Vs <- fs$var
ks <- nlevels(gm$subclass)

fi <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE,
            weights = weights, cluster = id)
Vi <- fi$var
ki <- length(unique(gm$id))

fc <- coxph(Surv(Y_S) ~ A, data = gm, robust = TRUE,
            weights = weights)
Vc <- fc$var
kc <- nrow(gm)

#Compute the variance and sneak it back into the fit object
fc$var <- (ks/(ks-1))*Vs + (ki/(ki-1))*Vi - (kc/(kc-1))*Vc

fc
```

The `robust se` column contains the computed SE, and the reported Z-test uses this SE. The `se(coef)` column should be ignored.

### Using Bootstrapping to Estimate Confidence Intervals

The bootstrap is an alternative to the delta method for estimating confidence intervals for estimated effects. See the section Bootstrapping above for details. Here, we'll demonstrate two forms of the bootstrap: 1) the standard bootstrap, which involves resampling units and performing matching and effect estimation within each bootstrap sample, and 2) the cluster bootstrap, which involves resampling pairs after matching and estimating the effect in each bootstrap sample. For both, we will use functionality in the `boot` package. It is critical to set a seed using `set.seed()` prior to performing the bootstrap in order for results to be replicable.

#### The standard bootstrap

For the standard bootstrap, we need a function that takes in the original dataset and a vector of sampled unit indices and returns the estimated quantity of interest. This function should perform the matching on the bootstrap sample, fit the outcome model, and estimate the treatment effect using g-computation.
In this example, we'll use matching with replacement, since the standard bootstrap has been found to work well with it [@bodory2020; @hill2006], despite some analytic results recommending otherwise [@abadie2008]. We'll implement g-computation manually rather than using `avg_comparisons()`, as this dramatically improves the speed of the estimation since we don't require standard errors to be estimated in each sample (or other processing `avg_comparisons()` does). We'll consider the marginal RR ATT of `A` on the binary outcome `Y_B`.

The first step is to write the estimation function, which we call `boot_fun`. This function returns the marginal RR. In it, we perform the matching, estimate the effect, and return the estimate of interest.

```{r}
boot_fun <- function(data, i) {
  boot_data <- data[i,]
  
  #Do 1:1 PS matching with replacement
  m <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
               data = boot_data, replace = TRUE)
  
  #Extract matched dataset
  md <- match_data(m, data = boot_data)
  
  #Fit outcome model
  fit <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
             data = md, weights = weights,
             family = quasibinomial())
  
  ## G-computation ##
  #Subset to treated units for ATT; skip for ATE
  md1 <- subset(md, A == 1)
  
  #Estimated potential outcomes under treatment
  p1 <- predict(fit, type = "response",
                newdata = transform(md1, A = 1))
  Ep1 <- mean(p1)
  
  #Estimated potential outcomes under control
  p0 <- predict(fit, type = "response",
                newdata = transform(md1, A = 0))
  Ep0 <- mean(p0)
  
  #Risk ratio
  Ep1 / Ep0
}
```

Next, we call `boot::boot()` with this function and the original dataset supplied to perform the bootstrapping. We'll request 199 bootstrap replications here, but in practice you should use many more, upwards of 999. More is always better. Using more also allows you to use the bias-corrected and accelerated (BCa) bootstrap confidence intervals (which you can request by setting `type = "bca"` in the call to `boot.ci()`), which are known to be the most accurate.
See `?boot.ci` for details. Here, we'll just use a percentile confidence interval.

```{r, eval = boot_ok, message=F, warning=F}
library("boot")

set.seed(54321)

boot_out <- boot(d, boot_fun, R = 199)

boot_out

boot.ci(boot_out, type = "perc")
```

```{r, include = FALSE}
b <- {
  if (boot_ok) boot::boot.ci(boot_out, type = "perc")
  else list(t0 = 1.347, percent = c(0, 0, 0, 1.144, 1.891))
}
```

We find a RR of `r round(b$t0, 3)` with a confidence interval of (`r round(b$percent[4], 3)`, `r round(b$percent[5], 3)`). If we had wanted a risk difference, we could have changed the final line in `boot_fun()` to be `Ep1 - Ep0`.

#### The cluster bootstrap

For the cluster bootstrap, we need a function that takes in a vector of subclass (e.g., pair) membership and a vector of sampled pair indices and returns the estimated quantity of interest. This function should fit the outcome model and estimate the treatment effect using g-computation, but the matching step occurs prior to the bootstrap. Here, we'll use matching without replacement, since the cluster bootstrap has been found to work well with it [@austin2014; @abadie2019]. This could be used for any method that returns pair membership, including other pair matching methods without replacement and full matching.

As before, we'll use g-computation to estimate the marginal RR ATT, and we'll do so manually rather than using `avg_comparisons()` for speed. Note that the cluster bootstrap is already much faster than the standard bootstrap because matching does not need to occur within each bootstrap sample. First, we'll do a round of matching.

```{r}
mNN <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
               data = d)
mNN

md <- match_data(mNN)
```

Next, we'll write the function that takes in cluster membership and the sampled indices and returns an estimate.
```{r}
#Unique pair IDs
pair_ids <- levels(md$subclass)

#Unit IDs, split by pair membership
split_inds <- split(seq_len(nrow(md)), md$subclass)

cluster_boot_fun <- function(pairs, i) {
  #Extract units corresponding to selected pairs
  ids <- unlist(split_inds[pairs[i]])
  
  #Subset md with block bootstrapped indices
  boot_md <- md[ids,]
  
  #Fit outcome model
  fit <- glm(Y_B ~ A * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9),
             data = boot_md, weights = weights,
             family = quasibinomial())
  
  ## G-computation ##
  #Subset to treated units for ATT; skip for ATE
  md1 <- subset(boot_md, A == 1)
  
  #Estimated potential outcomes under treatment
  p1 <- predict(fit, type = "response",
                newdata = transform(md1, A = 1))
  Ep1 <- mean(p1)
  
  #Estimated potential outcomes under control
  p0 <- predict(fit, type = "response",
                newdata = transform(md1, A = 0))
  Ep0 <- mean(p0)
  
  #Risk ratio
  Ep1 / Ep0
}
```

Next, we call `boot::boot()` with this function and the vector of pair membership supplied to perform the bootstrapping. We'll request 199 bootstrap replications, but in practice you should use many more, upwards of 999. More is always better. Using more also allows you to use the bias-corrected and accelerated (BCa) bootstrap confidence intervals, which are known to be the most accurate. See `?boot.ci` for details. Here, we'll just use a percentile confidence interval.

```{r, eval = boot_ok, message=F, warning=F}
library("boot")

set.seed(54321)

cluster_boot_out <- boot(pair_ids, cluster_boot_fun, R = 199)

cluster_boot_out

boot.ci(cluster_boot_out, type = "perc")
```

```{r, include = FALSE}
b <- {
  if (boot_ok) boot::boot.ci(cluster_boot_out, type = "perc")
  else list(t0 = 1.588, percent = c(0, 0, 0, 1.348, 1.877))
}
```

We find a RR of `r round(b$t0, 3)` with a confidence interval of (`r round(b$percent[4], 3)`, `r round(b$percent[5], 3)`). If we had wanted a risk difference, we could have changed the final line in `cluster_boot_fun()` to be `Ep1 - Ep0`.
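The pair-resampling step inside `cluster_boot_fun()` (turning resampled pair IDs back into unit row indices with `split()` and `unlist()`) is easy to miss. Below is a minimal, self-contained sketch of just that indexing logic on a toy matched sample with made-up pair labels (an illustration added here, not part of the original analysis):

```{r}
#Toy matched data: 4 pairs of 2 units each
toy <- data.frame(id = 1:8,
                  subclass = factor(rep(1:4, each = 2)))

pair_ids <- levels(toy$subclass)
split_inds <- split(seq_len(nrow(toy)), toy$subclass)

#A bootstrap draw that samples pair 2 twice and omits pair 3
i <- c(1, 2, 2, 4)

#Expand the resampled pairs to unit rows: both members of a pair
#always enter together, and a resampled pair appears in full each time
ids <- unlist(split_inds[pair_ids[i]])
toy[ids, "id"]
# -> 1 2 3 4 3 4 7 8
```

This is why the cluster bootstrap resamples whole pairs rather than individual units: the pairing induced by matching is preserved in every bootstrap sample.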
### Moderation Analysis

Moderation analysis involves determining whether a treatment effect differs across levels of another variable. The use of matching with moderation analysis is described in @greenExaminingModerationAnalyses2014. The goal is to achieve balance within each subgroup of the potential moderating variable, and there are several ways of doing so. Broadly, one can either perform matching in the full dataset, requiring exact matching on the moderator, or one can perform completely separate analyses in each subgroup. We'll demonstrate the first approach below; see the blog post ["Subgroup Analysis After Propensity Score Matching Using R"](https://ngreifer.github.io/blog/subgroup-analysis-psm/) by Noah Greifer for an example of the other approach.

There are benefits to using either approach, and @greenExaminingModerationAnalyses2014 find that either can be successful at balancing the subgroups. The first approach may be most effective with small samples, where separate propensity score models would be fit with greater uncertainty and an increased possibility of perfect prediction or failure to converge [@wangRelativePerformancePropensity2018]. The second approach may be more effective with larger samples or with matching methods that target balance in the matched sample, such as genetic matching [@kreifMethodsEstimatingSubgroup2012]. With genetic matching, separate subgroup analyses ensure balance is optimized within each subgroup rather than just overall. The chosen approach should be that which achieves the best balance, though we don't demonstrate assessing balance here to maintain focus on effect estimation.

The full dataset approach involves pooling information across subgroups. This could involve estimating propensity scores using a single model for both groups but exact matching on the potential moderator.
The propensity score model could include moderator-by-covariate interactions to allow the propensity score model to vary across subgroups on some covariates. It is critical that exact matching is done on the moderator so that matched pairs are not split across subgroups.

We'll consider the binary variable `X5` to be the potential moderator of the effect of `A` on `Y_C`. Below, we'll estimate a propensity score using a single propensity score model with a few moderator-by-covariate interactions. We'll perform nearest neighbor matching on the propensity score and exact matching on the moderator, `X5`.

```{r}
mP <- matchit(A ~ X1 + X2 + X5*X3 + X4 + X5*X6 + X7 + X5*X8 + X9,
              data = d, exact = ~X5)
mP
```

Although it is straightforward to assess balance overall using `summary()`, it is more challenging to assess balance within subgroups. The easiest way to check subgroup balance would be to use `cobalt::bal.tab()`, which has a `cluster` argument that can be used to assess balance within subgroups, e.g., by `cobalt::bal.tab(mP, cluster = "X5")`. See the vignette "Appendix 2: Using cobalt with Clustered, Multiply Imputed, and Other Segmented Data" on the `cobalt` [website](https://ngreifer.github.io/cobalt/index.html) for details.

If we are satisfied with balance, we can then model the outcome with an interaction between the treatment and the moderator.

```{r}
mdP <- match_data(mP)

fitP <- lm(Y_C ~ A * X5, data = mdP, weights = weights)
```

To estimate the subgroup ATTs, we can use `avg_comparisons()`, this time specifying the `by` argument to signify that we want treatment effects stratified by the moderator.

```{r, eval=me_ok}
avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5")
```

We can see that the subgroup mean differences are quite similar to each other.
Finally, we can test for moderation using another call to `avg_comparisons()`, this time using the `hypothesis` argument to signify that we want to compare effects between subgroups:

```{r, eval=me_ok}
avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5",
                hypothesis = ~pairwise)
```

As expected, the difference between the subgroup treatment effects is small and nonsignificant, so there is no evidence of moderation by `X5`.

When the moderator has more than two levels, it is possible to run an omnibus test for moderation by changing `hypothesis` to `~reference` and supplying the output to `hypotheses()` with `joint = TRUE`, e.g.,

```{r, eval=FALSE}
avg_comparisons(fitP,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                by = "X5",
                hypothesis = ~reference) |>
  hypotheses(joint = TRUE)
```

This produces a single p-value for the test that all pairwise differences between subgroups are equal to zero.

### Reporting Results

It is important to be as thorough and complete as possible when describing the methods of estimating the treatment effect and the results of the analysis. This improves transparency and replicability of the analysis.
Results should at least include the following:

- a description of the outcome model used (e.g., logistic regression, a linear model with treatment-covariate interactions and covariates, a Cox proportional hazards model with the matching weights applied)
- the way the effect was estimated (e.g., using g-computation or as the coefficient in the outcome model)
- the way SEs and confidence intervals were estimated (e.g., using robust SEs, using cluster-robust SEs with pair membership as the cluster, using the BCa bootstrap with 4999 bootstrap replications and the entire process of matching and effect estimation included in each replication)
- R packages and functions used in estimating the effect and its SE (e.g., `glm()` in base R, `avg_comparisons()` in `marginaleffects`, `boot()` and `boot.ci()` in `boot`)
- the effect and its SE and confidence interval

All this is in addition to information about the matching method, propensity score estimation procedure (if used), balance assessment, etc. mentioned in the other vignettes.

## Common Mistakes

There are a few common mistakes that should be avoided. It is important not only to avoid these mistakes in one's own research but also to be able to spot these mistakes in others' analyses.

### 1. Failing to include weights

Several methods involve weights that are to be used in estimating the treatment effect. With full matching and stratification matching (when analyzed using MMWS), the weights do the entire work of balancing the covariates across the treatment groups. Omitting weights essentially ignores the entire purpose of matching. Some cases are less obvious. When performing matching with replacement and estimating the treatment effect using the `match_data()` output, weights must be included to ensure control units matched to multiple treated units are weighted accordingly.
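As an illustration of the matching-with-replacement case just described, here is a minimal sketch (hedged: it reuses the simulated dataset `d` from this vignette and mirrors the `matchit()` calls shown earlier; the outcome model is simplified to the treatment alone for brevity):

```{r, eval = FALSE}
#1:1 PS matching with replacement
mR <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
              data = d, replace = TRUE)

mdR <- match_data(mR)

#The weights column is essential here: control units matched to
#several treated units receive weights greater than 1
fitR <- lm(Y_C ~ A, data = mdR, weights = weights)
```

Omitting `weights = weights` in the call above would weight each matched control unit equally regardless of how many treated units it was matched to, biasing the estimate.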
Similarly, when performing k:1 matching where not all treated units receive k matches, weights are required to account for the differential weight of the matched control units. The only time weights can be omitted after pair matching is when performing 1:1 matching without replacement. Including weights in this scenario will not affect the analysis, so it can be good practice to always include weights to prevent this error from occurring. There are some scenarios where weights are not useful because the conditioning occurs through some other means, such as when using the direct subclass strategy rather than MMWS for estimating marginal effects after stratification.

### 2. Failing to use robust or cluster-robust standard errors

Robust SEs are required when using weights to estimate the treatment effect. The model-based SEs resulting from weighted least squares or maximum likelihood are inaccurate when using matching weights because they assume the weights are frequency weights rather than probability weights. Cluster-robust SEs account for both the matching weights and pair membership and should be used when appropriate. Sometimes, researchers use functions in the `survey` package to estimate robust SEs, especially with inverse probability weighting; this is a valid way to compute robust SEs and will give similar results to `sandwich::vcovHC()`.[^10]

[^10]: To use `survey` to adjust for pair membership, one can use the following code to specify the survey design to be used with `svyglm()`: `svydesign(ids = ~subclass, weights = ~weights, data = md)`, where `md` is the output of `match_data()`. After `svyglm()`, `avg_comparisons()` can be used, and the `vcov` argument does not need to be specified.

### 3. Interpreting conditional effects as marginal effects

The distinction between marginal and conditional effects is not always clear, in both methodological and applied papers.
Some statistical methods are valid only for estimating conditional effects, and they should not be used to estimate marginal effects (without further modification). Sometimes conditional effects are desirable, and such methods may be useful for them, but when marginal effects are the target of inference, it is critical not to inappropriately interpret estimates resulting from statistical methods aimed at estimating conditional effects as marginal effects. Although this issue is particularly salient with binary and survival outcomes due to the general noncollapsibility of the OR, RR, and HR, this can also occur with linear models for continuous outcomes or the RD.

The following methods estimate **conditional effects** for binary or survival outcomes (with noncollapsible effect measures) and should **not** be used to estimate marginal effects:

- Logistic regression or Cox proportional hazards model with covariates and/or the propensity score included, using the coefficient on treatment as the effect estimate
- Conditional logistic regression after matching (e.g., using `survival::clogit()`)
- Stratified Cox regression after matching (e.g., using `survival::coxph()` with `strata()` in the model formula)
- Averaging stratum-specific effect estimates after stratification, including using Mantel-Haenszel OR pooling
- Including pair or stratum fixed or random effects in a logistic regression model, using the coefficient on treatment as the effect estimate

In addition, with continuous outcomes, conditional effects can be mistakenly interpreted as marginal effect estimates when treatment-covariate interactions are present in the outcome model.
If the covariates are not centered at their mean in the target population (e.g., the treated group for the ATT, the full sample for the ATE, or the remaining matched sample for an ATM), the coefficient on treatment will not correspond to the marginal effect in the target population; it will correspond to the effect of treatment when the covariate values are equal to zero, which may not be meaningful or plausible. G-computation is always the safest way to estimate effects when including covariates in the outcome model, especially in the presence of treatment-covariate interactions.

## References

::: {#refs}
:::

## Code to Generate Data used in Examples

```{r, eval = FALSE}
#Generating data similar to Austin (2009) for demonstrating treatment effect estimation
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] - log(2.4)*X[,5] + 
    log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] + rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2

# Binary outcome
gen_Y_B <- function(A, X) {
  LP_B <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] + log(1.5)*X[,4] + 
    log(2.4)*X[,5] + log(1.5)*X[,6]
  P_B <- plogis(LP_B)
  rbinom(length(A), 1, P_B)
}
#Conditional:
#  OR:    2.4
#  logOR: .875
#Marginal:
#  RD:    .144
#  RR:    1.54
#  logRR: .433
#  OR:    1.92
#  logOR: .655

# Survival outcome
gen_Y_S <- function(A, X) {
  LP_S <- -2 + log(2.4)*A + log(2)*X[,1] + log(2)*X[,2] + log(2)*X[,3] + log(1.5)*X[,4] + 
    log(2.4)*X[,5] + log(1.5)*X[,6]
  sqrt(-log(runif(length(A)))*2e4*exp(-LP_S))
}
#Conditional:
#  HR:    2.4
#  logHR: .875
#Marginal:
#  HR:    1.57
#  logHR: .452

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)

Y_C <- gen_Y_C(A, X)
Y_B <- gen_Y_B(A, X)
Y_S <- gen_Y_S(A, X)

d <- data.frame(A, X,
                Y_C, Y_B, Y_S)
```

MatchIt: Getting Started

Noah Greifer

2025-03-09

## Introduction

MatchIt implements the suggestions of Ho et al. (2007) for improving parametric statistical models for estimating treatment effects in observational studies and reducing model dependence by preprocessing data with semi-parametric and non-parametric matching methods. After appropriately preprocessing with MatchIt, researchers can use whatever parametric model they would have used without MatchIt and produce inferences that are more robust and less sensitive to modeling assumptions. MatchIt reduces the dependence of causal inferences on commonly made, but hard-to-justify, statistical modeling assumptions using a large range of sophisticated matching methods. The package includes several popular approaches to matching and provides access to methods implemented in other packages through its single, unified, and easy-to-use interface.

Matching is used in the context of estimating the causal effect of a binary treatment or exposure on an outcome while controlling for measured pre-treatment variables, typically confounding variables or variables prognostic of the outcome. Here and throughout the MatchIt documentation we use the word “treatment” to refer to the focal causal variable of interest, with “treated” and “control” reflecting the names of the treatment groups. The goal of matching is to produce covariate balance, that is, for the distributions of covariates in the two groups to be approximately equal to each other, as they would be in a successful randomized experiment. The importance of covariate balance is that it allows for increased robustness to the choice of model used to estimate the treatment effect; in perfectly balanced samples, a simple difference in means can be a valid treatment effect estimate. Here we do not aim to provide a full introduction to matching or causal inference theory, but simply to explain how to use MatchIt to perform nonparametric preprocessing. For excellent and accessible introductions to matching, see Stuart (2010) and Austin (2011).

A matching analysis involves four primary steps: 1) planning, 2) matching, 3) assessing the quality of matches, and 4) estimating the treatment effect and its uncertainty. Here we briefly discuss these steps and how they can be implemented with MatchIt; in the other included vignettes, these steps are discussed in more detail.

We will use Lalonde’s data on the evaluation of the National Supported Work program to demonstrate MatchIt’s capabilities. First, we load MatchIt and bring in the `lalonde` dataset.

```r
library("MatchIt")
data("lalonde")

head(lalonde)
##      treat age educ   race married nodegree re74 re75       re78
## NSW1     1  37   11  black       1        1    0    0  9930.0460
## NSW2     1  22    9 hispan       0        1    0    0  3595.8940
## NSW3     1  30   12  black       0        0    0    0 24909.4500
## NSW4     1  27   11  black       0        1    0    0  7506.1460
## NSW5     1  33    8  black       0        1    0    0   289.7899
## NSW6     1  22    9  black       0        1    0    0  4056.4940
```
The statistical quantity of interest is the causal effect of the treatment (`treat`) on 1978 earnings (`re78`). The other variables are pre-treatment covariates. See `?lalonde` for more information on this dataset. In particular, the analysis is concerned with the marginal, total effect of the treatment for those who actually received the treatment.

In what follows, we briefly describe the four steps of a matching analysis and how to implement them in MatchIt. For more details, we recommend reading the other vignettes, `vignette("matching-methods")`, `vignette("assessing-balance")`, and `vignette("estimating-effects")`, especially for users less familiar with matching methods. For the use of MatchIt with sampling weights, also see `vignette("sampling-weights")`. It is important to recognize that the ease of using MatchIt does not imply the simplicity of matching methods; advanced statistical methods like matching that require many decisions to be made and caution in their use should only be performed by those with statistical training.

## Planning

The planning phase of a matching analysis involves selecting the type of effect to be estimated, selecting the target population to which the treatment effect is to generalize, and selecting the covariates for which balance is required for an unbiased estimate of the treatment effect. Each of these are theoretical steps that do not involve performing analyses on the data. Ideally, they should be considered prior to data collection in the planning stage of a study. Thinking about them early can aid in performing a complete and cost-effective analysis.

Selecting the type of effect to be estimated. There are a few different types of effects to be estimated. In the presence of mediating variables, one might be interested in the direct effect of the treatment that does not pass through the mediating variables or the total effect of the treatment across all causal pathways. Matching is well suited for estimating total effects, and specific mediation methods may be better suited for other mediation-related quantities. One may be interested in a conditional effect or a marginal effect. A conditional effect is the effect of a treatment within some strata of other prognostic variables (e.g., at the patient level), and a marginal effect is the average effect of a treatment in a population (e.g., for implementing a broad policy change). Different types of matching are well suited for each of these, but the most common forms are best used for estimating marginal treatment effects; for conditional treatment effects, typically modeling assumptions are required or matching must be done within strata of the conditioning variables. Matching can reduce the reliance on correct model specification for conditional effects.

Selecting a target population. The target population is the population to which the effect estimate is to generalize. Typically, an effect estimated in a sample generalizes to the population from which the sample is a probability sample. If the sample is not a probability sample from any population (e.g., it is a convenience sample or involves patients from an arbitrary hospital), the target population can be unclear. Often, the target population is a group of units who are eligible for the treatment (or a subset thereof). Causal estimands are defined by the target population to which they generalize.

The average treatment effect in the population (ATE) is the average effect of the treatment for all units in the target population. The average treatment effect in the treated (ATT) is the average effect of the treatment for units like those who actually were treated. The most common forms of matching are best suited for estimating the ATT, though some are also available for estimating the ATE. Some matching methods distort the sample in such a way that the estimated treatment effect corresponds neither to the ATE nor to the ATT, but rather to the effect in an unspecified population (sometimes called the ATM, or average treatment effect in the remaining matched sample). When the target population is not so important (e.g., in the case of treatment effect discovery), such methods may be attractive; otherwise, care should be taken in ensuring the effect generalizes to the target population of interest. Different matching methods allow for different target populations, so it is important to choose a matching method that allows one to estimate the desired effect. See Greifer and Stuart (2021) for guidance on making this choice.

Selecting covariates to balance. Selecting covariates carefully is critical for ensuring the resulting treatment effect estimate is free of confounding and can be validly interpreted as a causal effect. To estimate total causal effects, all covariates must be measured prior to treatment (or otherwise not be affected by the treatment). Covariates should be those that cause variation in the outcome and selection into treatment group; these are known as confounding variables. See VanderWeele (2019) for a guide on covariate selection. Ideally these covariates are measured without error and are free of missingness.

Check Initial Imbalance

After planning and prior to matching, it can be a good idea to view the initial imbalance in one’s data that matching is attempting to eliminate. We can do this using the code below:

# No matching; constructing a pre-match matchit object
m.out0 <- matchit(treat ~ age + educ + race + married + 
                    nodegree + re74 + re75,
                  data = lalonde,
                  method = NULL,
                  distance = "glm")

The first argument is a formula relating the treatment to the covariates used in estimating the propensity score and for which balance is to be assessed. The data argument specifies the dataset where these variables exist. Typically, the method argument specifies the method of matching to be performed; here, we set it to NULL so we can assess balance prior to matching¹. The distance argument specifies the method for estimating the propensity score, a one-dimensional summary of all the included covariates, computed as the predicted probability of being in the treated group given the covariates; here, we set it to "glm" for generalized linear model, which implements logistic regression by default² (see ?distance for other options).
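As a sanity check on what distance = "glm" produces, the sketch below (an illustration assuming m.out0 and lalonde from above are in the session; ps.fit is a name introduced here) refits the same logistic regression with glm() and compares its fitted probabilities to the stored distance values:

```r
# Illustrative sketch: the "glm" distance should be the fitted probability
# from a logistic regression of treatment on the covariates
ps.fit <- glm(treat ~ age + educ + race + married + nodegree + re74 + re75,
              data = lalonde, family = binomial)

all.equal(unname(fitted(ps.fit)), unname(m.out0$distance))
```

If the two agree, this should return TRUE.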

Below we assess balance on the unmatched data using summary():

# Checking balance prior to matching
summary(m.out0)
## 
## Call:
## matchit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, method = NULL, distance = "glm")
## 
## Summary of Balance for All Data:
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance          0.5774        0.1822          1.7941     0.9211    0.3774   0.6444
## age              25.8162       28.0303         -0.3094     0.4400    0.0813   0.1577
## educ             10.3459       10.2354          0.0550     0.4959    0.0347   0.1114
## raceblack         0.8432        0.2028          1.7615          .    0.6404   0.6404
## racehispan        0.0595        0.1422         -0.3498          .    0.0827   0.0827
## racewhite         0.0973        0.6550         -1.8819          .    0.5577   0.5577
## married           0.1892        0.5128         -0.8263          .    0.3236   0.3236
## nodegree          0.7081        0.5967          0.2450          .    0.1114   0.1114
## re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248   0.4470
## re75           1532.0553     2466.4844         -0.2903     0.9563    0.1342   0.2876
## 
## Sample Sizes:
##           Control Treated
## All           429     185
## Matched       429     185
## Unmatched       0       0
## Discarded       0       0

We can see severe imbalances as measured by the standardized mean differences (Std. Mean Diff.), variance ratios (Var. Ratio), and empirical cumulative distribution function (eCDF) statistics. Values of standardized mean differences and eCDF statistics close to zero and values of variance ratios close to one indicate good balance, and here many of them are far from their ideal values.
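To make the Std. Mean Diff. column concrete, here is a simplified base-R sketch (not MatchIt's exact internals; smd and the toy data are invented for illustration) of a standardized mean difference, standardizing by the treated-group standard deviation:

```r
# Difference in group means divided by the treated-group standard deviation
smd <- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

# Toy data: a covariate shifted upward by 2 among treated units
set.seed(123)
treat <- rep(c(1, 0), c(185, 429))
x <- rnorm(length(treat), mean = 2 * treat)

smd(x, treat)  # close to 2, reflecting severe imbalance
```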

Matching

Now, matching can be performed. There are several different classes and methods of matching, described in vignette("matching-methods"). Here, we begin by briefly demonstrating 1:1 nearest neighbor (NN) matching on the propensity score, which is appropriate for estimating the ATT. One by one, each treated unit is paired with an available control unit that has the closest propensity score to it. Any remaining control units are left unmatched and excluded from further analysis. Due to the theoretical balancing properties of the propensity score described by Rosenbaum and Rubin (1983), propensity score matching can be an effective way to achieve covariate balance in the treatment groups. Below we demonstrate the use of matchit() to perform nearest neighbor propensity score matching.

# 1:1 NN PS matching w/o replacement
m.out1 <- matchit(treat ~ age + educ + race + married + 
                    nodegree + re74 + re75,
                  data = lalonde,
                  method = "nearest",
                  distance = "glm")

We use the same syntax as before, but this time specify method = "nearest" to implement nearest neighbor matching, again using a logistic regression propensity score. Many other arguments are available for tuning the matching method and method of propensity score estimation.

The matching outputs are contained in the m.out1 object. Printing this object gives a description of the type of matching performed:

m.out1
## A `matchit` object
##  - method: 1:1 nearest neighbor matching without replacement
##  - distance: Propensity score
##              - estimated with logistic regression
##  - number of obs.: 614 (original), 370 (matched)
##  - target estimand: ATT
##  - covariates: age, educ, race, married, nodegree, re74, re75

The key components of the m.out1 object are weights (the computed matching weights), subclass (matching pair membership), distance (the estimated propensity score), and match.matrix (which control units are matched to each treated unit). How these can be used for estimating the effect of the treatment after matching is detailed in vignette("estimating-effects").
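A quick way to peek at these components (a sketch assuming m.out1 from above is still in the session):

```r
head(m.out1$weights)      # matching weights: 1 for matched units, 0 for unmatched
head(m.out1$subclass)     # pair (subclass) membership for each unit
head(m.out1$distance)     # estimated propensity score for each unit
head(m.out1$match.matrix) # which control unit is matched to each treated unit
```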

Assessing the Quality of Matches

Although matching on the propensity score is often effective at eliminating differences between the treatment groups to achieve covariate balance, its performance in this regard must be assessed. If covariates remain imbalanced after matching, the matching is considered unsuccessful, and a different matching specification should be tried. MatchIt offers a few tools for the assessment of covariate balance after matching. These include graphical and statistical methods. More detail on the interpretation of the included plots and statistics can be found in vignette("assessing-balance").

In addition to covariate balance, the quality of the match is determined by how many units remain after matching. Matching often involves discarding units that are not paired with other units, and some matching options, such as setting restrictions for common support or calipers, can further decrease the number of remaining units. If, after matching, the remaining sample size is small, the resulting effect estimate may be imprecise. In many cases, there will be a trade-off between balance and remaining sample size. How to optimally choose among them is an instance of the fundamental bias-variance trade-off problem that cannot be resolved without substantive knowledge of the phenomena under study. Prospective power analyses can be used to determine how small a sample can be before necessary precision is sacrificed.

To assess the quality of the resulting matches numerically, we can use the summary() function on m.out1 as before. Here we set un = FALSE to suppress display of the balance before matching for brevity and because we already saw it. (Leaving it as TRUE, its default, would display balance both before and after matching.)

# Checking balance after NN matching
summary(m.out1, un = FALSE)
## 
## Call:
## matchit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, method = "nearest", distance = "glm")
## 
## Summary of Balance for Matched Data:
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance          0.5774        0.3629          0.9739     0.7566    0.1321   0.4216          0.9740
## age              25.8162       25.3027          0.0718     0.4568    0.0847   0.2541          1.3938
## educ             10.3459       10.6054         -0.1290     0.5721    0.0239   0.0757          1.2474
## raceblack         0.8432        0.4703          1.0259          .    0.3730   0.3730          1.0259
## racehispan        0.0595        0.2162         -0.6629          .    0.1568   0.1568          1.0743
## racewhite         0.0973        0.3135         -0.7296          .    0.2162   0.2162          0.8390
## married           0.1892        0.2108         -0.0552          .    0.0216   0.0216          0.8281
## nodegree          0.7081        0.6378          0.1546          .    0.0703   0.0703          1.0106
## re74           2095.5737     2342.1076         -0.0505     1.3289    0.0469   0.2757          0.7965
## re75           1532.0553     1614.7451         -0.0257     1.4956    0.0452   0.2054          0.7381
## 
## Sample Sizes:
##           Control Treated
## All           429     185
## Matched       185     185
## Unmatched     244       0
## Discarded       0       0

At the top is a summary of covariate balance after matching. Although balance has improved for some covariates, in general balance is still quite poor, indicating that nearest neighbor propensity score matching is not sufficient for removing confounding in this dataset. The final column, Std. Pair Dist., displays the average absolute within-pair difference of each covariate. When these values are small, better balance is typically achieved and estimated effects are more robust to misspecification of the outcome model (King and Nielsen 2019; Rubin 1973).

Next is a table of the sample sizes before and after matching. The matching procedure left 244 control units unmatched. Ideally, unmatched units would be those far from the treated units, which would have required greater extrapolation had they been retained. We can visualize the distribution of propensity scores of those who were matched using plot() with type = "jitter":

plot(m.out1, type = "jitter", interactive = FALSE)

Jitter plot of the propensity scores, which shows that no treated units were dropped, and a large number of control units with low propensity scores were dropped.

We can visually examine balance on the covariates using plot() with type = "density":

plot(m.out1, type = "density", interactive = FALSE,
     which.xs = ~age + married + re75)

Density plots of age, married and re75 in the unmatched and matched samples.

Imbalances are represented by the differences between the black (treated) and gray (control) distributions. Although married and re75 appear to have improved balance after matching, the case is mixed for age.

Trying a Different Matching Specification

Given the poor performance of nearest neighbor matching in this example, we can try a different matching method or make other changes to the matching algorithm or distance specification. Below, we’ll try full matching, which matches every treated unit to at least one control and every control to at least one treated unit (Hansen 2004; Stuart and Green 2008). We’ll also try a different link (probit) for the propensity score model.

# Full matching on a probit PS
m.out2 <- matchit(treat ~ age + educ + race + married + 
                    nodegree + re74 + re75,
                  data = lalonde,
                  method = "full",
                  distance = "glm",
                  link = "probit")
m.out2
## A `matchit` object
##  - method: Optimal full matching
##  - distance: Propensity score
##              - estimated with probit regression
##  - number of obs.: 614 (original), 614 (matched)
##  - target estimand: ATT
##  - covariates: age, educ, race, married, nodegree, re74, re75

We can examine balance on this new matching specification.

# Checking balance after full matching
summary(m.out2, un = FALSE)
## 
## Call:
## matchit(formula = treat ~ age + educ + race + married + nodegree + 
##     re74 + re75, data = lalonde, method = "full", distance = "glm", 
##     link = "probit")
## 
## Summary of Balance for Matched Data:
##            Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance          0.5773        0.5764          0.0045     0.9949    0.0043   0.0486          0.0198
## age              25.8162       25.5347          0.0393     0.4790    0.0787   0.2742          1.2843
## educ             10.3459       10.5381         -0.0956     0.6192    0.0253   0.0730          1.2179
## raceblack         0.8432        0.8389          0.0119          .    0.0043   0.0043          0.0162
## racehispan        0.0595        0.0492          0.0435          .    0.0103   0.0103          0.4412
## racewhite         0.0973        0.1119         -0.0493          .    0.0146   0.0146          0.3454
## married           0.1892        0.1633          0.0660          .    0.0259   0.0259          0.4473
## nodegree          0.7081        0.6577          0.1110          .    0.0504   0.0504          0.9872
## re74           2095.5737     2100.2150         -0.0009     1.3467    0.0314   0.1881          0.8387
## re75           1532.0553     1561.4420         -0.0091     1.5906    0.0536   0.1984          0.8240
## 
## Sample Sizes:
##               Control Treated
## All            429.       185
## Matched (ESS)   50.76     185
## Matched        429.       185
## Unmatched        0.         0
## Discarded        0.         0

Balance is far better, as determined by the lower standardized mean differences and eCDF statistics. The balance should be reported when publishing the results of a matching analysis. This can be done either in a table, using the values resulting from summary(), or in a plot, such as a Love plot, which we can make by calling plot() on the summary() output:

plot(summary(m.out2))

A Love plot with matched dots below the threshold lines, indicating good balance after matching, in contrast to the unmatched dots far from the threshold lines, indicating poor balance before matching.

Love plots are a simple and straightforward way to summarize balance visually. See vignette("assessing-balance") for more information on how to customize MatchIt’s Love plot and how to use cobalt, a package designed specifically for balance assessment and reporting that is compatible with MatchIt.
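For example, a hedged sketch of the cobalt equivalent (assumes the cobalt package is installed; the threshold value of .1 is an arbitrary choice for illustration):

```r
library("cobalt")

# love.plot() accepts matchit objects directly and can mark a balance threshold
love.plot(m.out2, binary = "std", thresholds = c(m = .1))
```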

Estimating the Treatment Effect

How treatment effects are estimated depends on what form of matching was performed. See vignette("estimating-effects") for information on how to estimate treatment effects in a variety of scenarios (i.e., different matching methods and outcome types). After full matching and most other matching methods, we can run a regression of the outcome on the treatment and covariates in the matched sample (i.e., including the matching weights) and estimate the treatment effect using g-computation as implemented in marginaleffects::avg_comparisons()³. Including the covariates used in the matching in the effect estimation can provide additional robustness to slight imbalances remaining after matching and can improve precision.

Because full matching was successful at balancing the covariates, we’ll demonstrate here how to estimate a treatment effect after performing such an analysis. First, we’ll extract the matched dataset from the matchit object using match_data(). This dataset only contains the matched units and adds columns for distance, weights, and subclass (described previously).

m.data <- match_data(m.out2)

head(m.data)
##      treat age educ   race married nodegree re74 re75       re78  distance weights subclass
## NSW1     1  37   11  black       1        1    0    0  9930.0460 0.6356769       1       54
## NSW2     1  22    9 hispan       0        1    0    0  3595.8940 0.2298151       1       62
## NSW3     1  30   12  black       0        0    0    0 24909.4500 0.6813558       1       69
## NSW4     1  27   11  black       0        1    0    0  7506.1460 0.7690590       1       78
## NSW5     1  33    8  black       0        1    0    0   289.7899 0.6954138       1       85
## NSW6     1  22    9  black       0        1    0    0  4056.4940 0.6943658       1       91

We can then model the outcome in this dataset using the standard regression functions in R, like lm() or glm(), being sure to include the matching weights (stored in the weights variable of the match_data() output) in the estimation⁴. Finally, we use marginaleffects::avg_comparisons() to perform g-computation to estimate the ATT. We recommend using cluster-robust standard errors for most analyses, with pair membership as the clustering variable; avg_comparisons() makes this straightforward.

library("marginaleffects")

fit <- lm(re78 ~ treat * (age + educ + race + married +
                            nodegree + re74 + re75),
          data = m.data,
          weights = weights)

avg_comparisons(fit,
                variables = "treat",
                vcov = ~subclass,
                newdata = subset(treat == 1))

The outcome model coefficients and tests should not be interpreted or reported. See vignette("estimating-effects") for more information on how to estimate effects and standard errors with different forms of matching and with different outcome types.

A benefit of matching is that the outcome model used to estimate the treatment effect is robust to misspecification when balance has been achieved. With full matching, we were able to achieve balance, so the effect estimate should depend less on the form of the outcome model used than had we used 1:1 matching without replacement or no matching at all.

Reporting Results

To report matching results in a manuscript or research report, a few key pieces of information are required. One should be as detailed as possible about the matching procedure and the decisions made to ensure the analysis is replicable and can be adequately assessed for soundness by the audience. Key pieces of information to include are 1) the matching specification used (including the method and any additional options, like calipers or common support restrictions), 2) the distance measure used (including how it was estimated, e.g., using logistic regression for propensity scores), 3) which other matching methods were tried prior to settling on a final specification and how the choices were made, 4) the balance of the final matching specification (including standardized mean differences and other balance statistics for the variables, their powers, and their interactions; some of these can be reported as summaries rather than in full detail), 5) the number of matched, unmatched, and discarded units included in the effect estimation, and 6) the method of estimating the treatment effect and standard error or confidence interval (including the specific model used and the specific type of standard error). See Thoemmes and Kim (2011) for a complete list of specific details to report. Below is an example of how we might write up the prior analysis:

We used propensity score matching to estimate the average marginal effect of the treatment on 1978 earnings for those who received it, accounting for confounding by the included covariates. We first attempted 1:1 nearest neighbor propensity score matching without replacement with a propensity score estimated using logistic regression of the treatment on the covariates. This matching specification yielded poor balance, so we instead tried full matching on the propensity score, which yielded adequate balance, as indicated in Table 1 and Figure 1. The propensity score was estimated using a probit regression of the treatment on the covariates, which yielded better balance than did a logistic regression. After matching, all standardized mean differences for the covariates were below 0.1 and all standardized mean differences for squares and two-way interactions between covariates were below .15, indicating adequate balance. Full matching uses all treated and all control units, so no units were discarded by the matching.

To estimate the treatment effect and its standard error, we fit a linear regression model with 1978 earnings as the outcome and the treatment, covariates, and their interaction as predictors and included the full matching weights in the estimation. The lm() function was used to fit the outcome, and the avg_comparisons() function in the marginaleffects package was used to perform g-computation in the matched sample to estimate the ATT. A cluster-robust variance was used to estimate its standard error with matching stratum membership as the clustering variable.

The estimated effect was $2114 (SE = 646, p = 0.001), indicating that the average effect of the treatment for those who received it is to increase earnings.

Conclusion

Although we have covered the basics of performing a matching analysis here, to use matching to its full potential, the more advanced methods available in MatchIt should be considered. We recommend reading the other vignettes included here to gain a better understanding of all that MatchIt has to offer and how to use it responsibly and effectively. As previously stated, the ease of using MatchIt does not imply that matching or causal inference in general are simple matters; matching is an advanced statistical technique that should be used with care and caution. We hope the capabilities of MatchIt ease and encourage the use of nonparametric preprocessing for estimating causal effects in a robust and well-justified way.

References

Austin, Peter C. 2011. “An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies.” Multivariate Behavioral Research 46 (3): 399–424. https://doi.org/10.1080/00273171.2011.568786.
Greifer, Noah, and Elizabeth A. Stuart. 2021. “Choosing the Estimand When Matching or Weighting in Observational Studies.” arXiv:2106.10577 [Stat], June. https://arxiv.org/abs/2106.10577.
Hansen, Ben B. 2004. “Full Matching in an Observational Study of Coaching for the SAT.” Journal of the American Statistical Association 99 (467): 609–18. https://doi.org/10.1198/016214504000000647.
Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis 15 (3): 199–236. https://doi.org/10.1093/pan/mpl013.
King, Gary, and Richard Nielsen. 2019. “Why Propensity Scores Should Not Be Used for Matching.” Political Analysis, May, 1–20. https://doi.org/10.1017/pan.2019.11.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55. https://doi.org/10.1093/biomet/70.1.41.
Rubin, Donald B. 1973. “Matching to Remove Bias in Observational Studies.” Biometrics 29 (1): 159. https://doi.org/10.2307/2529684.
Stuart, Elizabeth A. 2010. “Matching Methods for Causal Inference: A Review and a Look Forward.” Statistical Science 25 (1): 1–21. https://doi.org/10.1214/09-STS313.
Stuart, Elizabeth A., and Kerry M. Green. 2008. “Using Full Matching to Estimate Causal Effects in Nonexperimental Studies: Examining the Relationship Between Adolescent Marijuana Use and Adult Outcomes.” Developmental Psychology 44 (2): 395–406. https://doi.org/10.1037/0012-1649.44.2.395.
Thoemmes, Felix J., and Eun Sook Kim. 2011. “A Systematic Review of Propensity Score Methods in the Social Sciences.” Multivariate Behavioral Research 46 (1): 90–118. https://doi.org/10.1080/00273171.2011.540475.
VanderWeele, Tyler J. 2019. “Principles of Confounder Selection.” European Journal of Epidemiology 34 (3): 211–19. https://doi.org/10.1007/s10654-019-00494-6.

  1. Note that the default for method is "nearest" to perform nearest neighbor matching. To prevent any matching from taking place in order to assess pre-matching imbalance, method must be set to NULL.↩︎

  2. Note that setting distance = "logit", which was the default in MatchIt versions prior to 4.0.0, or "ps", which was the default prior to version 4.5.0, will also estimate logistic regression propensity scores. Because it is the default, the distance argument can actually be omitted if logistic regression propensity scores are desired.↩︎

  3. In some cases, the coefficient on the treatment variable in the outcome model can be used as the effect estimate, but g-computation always yields a valid effect estimate regardless of the form of the outcome model and its use is the same regardless of the outcome model type or matching method (with some slight variations), so we always recommend performing g-computation after fitting the outcome model. G-computation is explained in detail in vignette("estimating-effects").↩︎

  4. With 1:1 nearest neighbor matching without replacement, excluding the matching weights does not change the estimates. For all other forms of matching, they are required, so we recommend always including them for consistency.↩︎


Matching Methods

Noah Greifer

2025-03-09

Introduction

MatchIt implements several matching methods with a variety of options. Though the help pages for the individual methods describe each method and how they can be used, this vignette provides a broad overview of the available matching methods and their associated options. The choice of matching method depends on the goals of the analysis (e.g., the estimand, whether low bias or high precision is important) and the unique qualities of each dataset to be analyzed, so there is no single optimal choice for any given analysis. A benefit of nonparametric preprocessing through matching is that a number of matching methods can be tried and their quality assessed without consulting the outcome, reducing the possibility of capitalizing on chance while allowing for the benefits of an exploratory analysis in the design phase (Ho et al. 2007).

This vignette describes each matching method available in MatchIt and the various options that are allowed with matching methods and the consequences of their use. For a brief introduction to the use of MatchIt functions, see vignette("MatchIt"). For details on how to assess and report covariate balance, see vignette("assessing-balance"). For details on how to estimate treatment effects and standard errors after matching, see vignette("estimating-effects").

Matching

Matching as implemented in MatchIt is a form of subset selection, that is, the pruning and weighting of units to arrive at a (weighted) subset of the units from the original dataset. Ideally, and if done successfully, subset selection produces a new sample where the treatment is unassociated with the covariates so that a comparison of the outcomes of the treatment and control groups is not confounded by the measured and balanced covariates. Although statistical estimation methods like regression can also be used to remove confounding due to measured covariates, Ho et al. (2007) argue that fitting regression models in matched samples reduces the dependence of the validity of the estimated treatment effect on the correct specification of the model.

Matching is nonparametric in the sense that the estimated weights and pruning of the sample are not direct functions of estimated model parameters but rather depend on the organization of discrete units in the sample; this is in contrast to propensity score weighting (also known as inverse probability weighting), where the weights come more directly from the estimated propensity score model and therefore are more sensitive to its correct specification. These advantages, as well as the intuitive understanding of matching by the public compared to regression or weighting, make it a robust and effective way to estimate treatment effects.

It is important to note that this implementation of matching differs from the methods described by Abadie and Imbens (2006, 2016) and implemented in the Matching R package and teffects routine in Stata. That form of matching is matching imputation, where the missing potential outcomes for each unit are imputed using the observed outcomes of paired units. This is a critical distinction because matching imputation is a specific estimation method with its own effect and standard error estimators, in contrast to subset selection, which is a preprocessing method that does not require specific estimators and is broadly compatible with other parametric and nonparametric analyses. The benefits of matching imputation are that its theoretical properties (i.e., the rate of convergence and asymptotic variance of the estimator) are well understood, it can be used in a straightforward way to estimate not just the average treatment effect in the treated (ATT) but also the average treatment effect in the population (ATE), and additional effective matching methods can be used in the imputation (e.g., kernel matching). The benefits of matching as nonparametric preprocessing are that it is far more flexible with respect to the types of effects that can be estimated because it does not involve any specific estimator, its empirical and finite-sample performance has been examined in depth and is generally well understood, and it aligns well with the design of experiments, which are more familiar to non-technical audiences.

In addition to subset selection, matching often (though not always) involves a form of stratification, the assignment of units to pairs or strata containing multiple units. The distinction between subset selection and stratification is described by Zubizarreta, Paredes, and Rosenbaum (2014), who separate them into two separate steps. In MatchIt, with almost all matching methods, subset selection is performed by stratification; for example, treated units are paired with control units, and unpaired units are then dropped from the matched sample. With some methods, subclasses are used to assign matching or stratification weights to individual units, which increase or decrease each unit’s leverage in a subsequent analysis. There has been some debate about the importance of stratification after subset selection; while some authors have argued that, with some forms of matching, pair membership is incidental (Stuart 2008; Schafer and Kang 2008), others have argued that correctly incorporating pair membership into effect estimation can improve the quality of inferences (Austin and Small 2014; Wan 2019). For methods that allow it, MatchIt includes stratum membership as an additional output of each matching specification. How these strata can be used is detailed in vignette("estimating-effects").

At the heart of MatchIt are three classes of methods: distance matching, stratum matching, and pure subset selection. Distance matching involves considering a focal group (usually the treated group) and selecting members of the non-focal group (i.e., the control group) to pair with each member of the focal group based on the distance between units, which can be computed in one of several ways. Members of either group that are not paired are dropped from the sample. Nearest neighbor matching (method = "nearest"), optimal pair matching (method = "optimal"), optimal full matching (method = "full"), generalized full matching (method = "quick"), and genetic matching (method = "genetic") are the methods of distance matching implemented in MatchIt. Typically, only the average treatment effect in the treated (ATT) or the average treatment effect in the control (ATC), if the control group is the focal group, can be estimated after distance matching in MatchIt (full matching is an exception, described later).

Stratum matching involves creating strata based on unique values of the covariates and assigning units with those covariate values into those strata. Any units that are in strata that lack either treated or control units are then dropped from the sample. Strata can be formed using the raw covariates (method = "exact"), coarsened versions of the covariates (method = "cem"), or coarsened versions of the propensity score (method = "subclass"). When no units are discarded, either the ATT, ATC, or ATE can be estimated after stratum matching, though often some units are discarded, especially with exact and coarsened exact matching, making the estimand less clear. For use in estimating marginal treatment effects after exact matching, stratification weights are computed for the matched units first by computing a new “stratum propensity score” for each unit, which is the proportion of treated units in its stratum. The formulas for computing inverse probability weights from standard propensity scores are then applied to the new stratum propensity scores to form the new weights.
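The weight computation described above can be sketched in a few lines of base R (an illustration, not MatchIt's internal code; stratum_att_weights is a name invented here). For the ATT, treated units receive a weight of 1 and control units receive p/(1 - p), where p is the stratum propensity score:

```r
stratum_att_weights <- function(treat, stratum) {
  p <- ave(treat, stratum)            # proportion treated within each stratum
  ifelse(treat == 1, 1, p / (1 - p))  # standard ATT weighting formula applied to p
}

treat   <- c(1, 0, 0, 1, 1, 0)
stratum <- c("a", "a", "a", "b", "b", "b")
stratum_att_weights(treat, stratum)
# 1.0 0.5 0.5 1.0 1.0 2.0
```

In stratum "a", one of three units is treated (p = 1/3), so its controls get weight .5; in stratum "b", two of three are treated (p = 2/3), so its control gets weight 2.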

Pure subset selection involves selecting a subset of units from the original sample without considering the distance between individual units or strata that units might fall into. Subsets are selected to optimize a criterion subject to constraints on balance and remaining sample size. Cardinality and profile matching (method = "cardinality") are the methods of pure subset selection implemented in MatchIt. Both methods allow the user to specify the largest imbalance allowed in the resulting matched sample, and an optimization routine attempts to find the largest matched sample that satisfies those balance constraints. While cardinality matching does not target a specific estimand, profile matching can be used to target the ATT, ATC, or ATE.

Below, we describe each of the matching methods implemented in MatchIt.

Matching Methods

Nearest Neighbor Matching (method = "nearest")

Nearest neighbor matching is also known as greedy matching. It involves running through the list of treated units and selecting the closest eligible control unit to be paired with each treated unit. It is greedy in the sense that each pairing occurs without reference to how other units will be or have been paired, and therefore does not aim to optimize any criterion. Nearest neighbor matching is the most common form of matching used (Thoemmes and Kim 2011; Zakrison, Austin, and McCredie 2018) and has been extensively studied through simulations. See ?method_nearest for the documentation for matchit() with method = "nearest".

Nearest neighbor matching requires the specification of a distance measure to define which control unit is closest to each treated unit. The default and most common distance is the propensity score difference, which is the difference between the propensity scores of each treated and control unit (Stuart 2010). Another popular distance is the Mahalanobis distance, described in the section “Mahalanobis distance matching” below. The order in which the treated units are to be paired must also be specified and has the potential to change the quality of the matches (Austin 2013; Rubin 1973); this is specified by the m.order argument. With propensity score matching, the default is to go in descending order from the highest propensity score; doing so allows the units that would have the hardest time finding close matches to be matched first (Rubin 1973). Other orderings are possible, including random ordering, which can be tried multiple times until an adequate matched sample is found. When matching with replacement (i.e., where each control unit can be reused to be matched with any number of treated units), the matching order doesn’t matter.

When using a matching ratio greater than 1 (i.e., when more than one control unit is requested to be matched to each treated unit), matching occurs in a cycle, where each treated unit is first paired with one control unit, and then each treated unit is paired with a second control unit, etc. Ties are broken deterministically based on the order of the units in the dataset to ensure that multiple runs of the same specification yield the same result (unless the matching order is requested to be random).

Nearest neighbor matching is implemented in MatchIt using internal C++ code through Rcpp. When matching on a propensity score, this makes matching extremely fast, even for large datasets. Using a caliper on the propensity score (described below) makes it even faster. Run times may be a bit longer when matching on other distance measures (e.g., the Mahalanobis distance). In contrast to optimal pair matching (described below), nearest neighbor matching does not require computing the full distance matrix between units, which makes it more applicable to large datasets.
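As a minimal sketch, a 1:1 nearest neighbor propensity score match might look like the following, using the lalonde dataset included with MatchIt:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# 1:1 nearest neighbor matching on a logistic regression propensity score,
# pairing treated units in descending order of the propensity score (the default)
m.near <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "nearest", distance = "glm")
summary(m.near)
```

Because there are more control than treated units in lalonde, every treated unit receives a match here.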

Optimal Pair Matching (method = "optimal")

Optimal pair matching (often just called optimal matching) is very similar to nearest neighbor matching in that it attempts to pair each treated unit with one or more control units. Unlike nearest neighbor matching, however, it is “optimal” rather than greedy; it is optimal in the sense that it attempts to choose matches that collectively optimize an overall criterion (Hansen and Klopfer 2006; Gu and Rosenbaum 1993). The criterion used is the sum of the absolute pair distances in the matched sample. See ?method_optimal for the documentation for matchit() with method = "optimal". Optimal pair matching in MatchIt depends on the fullmatch() function in the optmatch package (Hansen and Klopfer 2006).

Like nearest neighbor matching, optimal pair matching requires the specification of a distance measure between units. Optimal pair matching can be thought of simply as an alternative to selecting the order of the matching for nearest neighbor matching. Optimal pair matching and nearest neighbor matching often yield the same or very similar matched samples; indeed, some research has indicated that optimal pair matching is not much better than nearest neighbor matching at yielding balanced matched samples (Austin 2013).

The tol argument in fullmatch() can be supplied to matchit() with method = "optimal"; this controls the numerical tolerance used to determine whether the optimal solution has been found. The default is fairly high and, for smaller problems, should be set much lower (e.g., by setting tol = 1e-7).
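A minimal sketch of an optimal pair match with a tightened tolerance (assuming the optmatch package is installed):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Optimal pair matching on the propensity score; tol is passed through to
# optmatch::fullmatch(), and a small value is advisable for small problems
m.opt <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "optimal", tol = 1e-7)
```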

Optimal Full Matching (method = "full")

Optimal full matching (often just called full matching) assigns every treated and control unit in the sample to one subclass each (Hansen 2004; Stuart and Green 2008). Each subclass contains one treated unit and one or more control units or one control unit and one or more treated units. It is optimal in the sense that the chosen number of subclasses and the assignment of units to subclasses minimize the sum of the absolute within-subclass distances in the matched sample. Weights are computed based on subclass membership, and these weights then function like propensity score weights and can be used to estimate a weighted treatment effect, ideally free of confounding by the measured covariates. See ?method_full for the documentation for matchit() with method = "full". Optimal full matching in MatchIt depends on the fullmatch() function in the optmatch package (Hansen and Klopfer 2006).

Like the other distance matching methods, optimal full matching requires the specification of a distance measure between units. It can be seen as a combination of distance matching and stratum matching: subclasses are formed with varying numbers of treated and control units, as with stratum matching, but the subclasses are formed based on minimizing within-pair distances and do not involve forming strata based on any specific variable, similar to distance matching. Unlike other distance matching methods, full matching can be used to estimate the ATE. Full matching can also be seen as a form of propensity score weighting that is less sensitive to the form of the propensity score model because the original propensity scores are used just to create the subclasses, not to form the weights directly (Austin and Stuart 2015a). In addition, full matching does not have to rely on estimated propensity scores to form the subclasses and weights; other distance measures are allowed as well.

Although full matching uses all available units, there is a loss in precision due to the weights. Units may be weighted in such a way that they contribute less to the sample than would unweighted units, so the effective sample size (ESS) of the full matching weighted sample may be lower than even that of 1:1 pair matching. Balance is often far better after full matching than it is with 1:k matching, making full matching a good option to consider especially when 1:k matching is not effective or when the ATE is the target estimand.

The specification of the full matching optimization problem can be customized by supplying additional arguments that are passed to optmatch::fullmatch(), such as min.controls, max.controls, mean.controls, and omit.fraction. As with optimal pair matching, the numerical tolerance value can be set much lower than the default with small problems by setting, e.g., tol = 1e-7.
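A hedged sketch of a customized full matching call (again assuming optmatch is installed; max.controls and tol are passed through to optmatch::fullmatch()):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Optimal full matching targeting the ATE, with at most 5 control units
# per subclass and a tightened numerical tolerance
m.full <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "full", estimand = "ATE",
                  max.controls = 5, tol = 1e-7)
```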

Generalized Full Matching (method = "quick")

Generalized full matching is a variant of full matching that uses a special fast clustering algorithm to dramatically speed up the matching, even for large datasets (Fredrik Sävje, Higgins, and Sekhon 2021). As with optimal full matching, generalized full matching assigns every unit to a subclass. What makes generalized full matching “generalized” is that the user can customize the matching in a number of ways, such as by specifying an arbitrary minimum number of units from each treatment group or total number of units per subclass, or by allowing some units from a treatment group to remain unmatched. Generalized full matching minimizes the largest within-subclass distances in the matched sample, but it does so in a way that is not completely optimal (though the solution is often very close to the optimal solution). Matching weights are computed based on subclass membership, and these weights then function like propensity score weights and can be used to estimate a weighted treatment effect, ideally free of confounding by the measured covariates. See ?method_quick for the documentation for matchit() with method = "quick". Generalized full matching in MatchIt depends on the quickmatch() function in the quickmatch package (Fredrik Sävje, Sekhon, and Higgins 2018).

Generalized full matching includes different options for customization than optimal full matching. The user cannot supply their own distance matrix, but propensity scores and distance metrics that are computed from the supplied covariates (e.g., Mahalanobis distance) are allowed. Calipers can only be placed on the propensity score, if supplied. As with optimal full matching, generalized full matching can target the ATE. Matching performance tends to be similar between the two methods, but generalized full matching will be much quicker and can accommodate larger datasets, making it a good substitute. Generalized full matching is often faster than even nearest neighbor matching, especially for large datasets.
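A minimal sketch of a generalized full matching call (assuming the quickmatch package is installed):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Generalized full matching on the propensity score, targeting the ATE
m.gfm <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "quick", estimand = "ATE")
```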

Genetic Matching (method = "genetic")

Genetic matching is less a specific form of matching and more a way of specifying a distance measure for another form of matching. In practice, though, the form of matching used is nearest neighbor pair matching. Genetic matching uses a genetic algorithm, which is an optimization routine used for non-differentiable objective functions, to find scaling factors for each variable in a generalized Mahalanobis distance formula (Diamond and Sekhon 2013). The criterion optimized by the algorithm is one based on covariate balance. Once the scaling factors have been found, nearest neighbor matching is performed on the scaled generalized Mahalanobis distance. See ?method_genetic for the documentation for matchit() with method = "genetic". Genetic matching in MatchIt depends on the GenMatch() function in the Matching package (Sekhon 2011) to perform the genetic search and uses the Match() function to perform the nearest neighbor match using the scaled generalized Mahalanobis distance.

Genetic matching considers the generalized Mahalanobis distance between a treated unit \(i\) and a control unit \(j\) as \[\delta_{GMD}(\mathbf{x}_i,\mathbf{x}_j, \mathbf{W})=\sqrt{(\mathbf{x}_i - \mathbf{x}_j)'(\mathbf{S}^{-1/2})'\mathbf{W}(\mathbf{S}^{-1/2})(\mathbf{x}_i - \mathbf{x}_j)}\] where \(\mathbf{x}\) is a \(p \times 1\) vector containing the value of each of the \(p\) included covariates for that unit, \(\mathbf{S}^{-1/2}\) is the Cholesky decomposition of the covariance matrix \(\mathbf{S}\) of the covariates, and \(\mathbf{W}\) is a diagonal matrix with scaling factors \(w\) on the diagonal: \[ \mathbf{W}=\begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots &\\ & & & w_p \\ \end{bmatrix} \]

When \(w_k=1\) for all covariates \(k\), the computed distance is the standard Mahalanobis distance between units. Genetic matching estimates the optimal values of the \(w_k\)s, where a user-specified criterion is used to define what is optimal. The default is to maximize the smallest p-value among balance tests for the covariates in the matched sample (both Kolmogorov-Smirnov tests and t-tests for each covariate).

In MatchIt, if a propensity score is specified, the default is to include the propensity score and the covariates in \(\mathbf{x}\) and to optimize balance on the covariates. When distance = "mahalanobis" or the mahvars argument is specified, the propensity score is left out of \(\mathbf{x}\).

In all other respects, genetic matching functions just like nearest neighbor matching except that the matching itself is carried out by Matching::Match() instead of by MatchIt. When using method = "genetic" in MatchIt, additional arguments passed to Matching::GenMatch() to control the genetic search process should be specified; in particular, the pop.size argument should be increased from its default of 100 to a much higher value. Doing so will make the algorithm take more time to finish but will generally improve the quality of the resulting matches. Different functions can be supplied to be used as the objective in the optimization using the fit.func argument.
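A hedged sketch of a genetic matching call (assuming the Matching and rgenoud packages are installed; the genetic-search arguments are passed through to Matching::GenMatch()):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Genetic matching; pop.size should be much larger in practice (e.g., 1000),
# and is kept small here, along with max.generations and wait.generations,
# only so the example runs quickly
m.gen <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "genetic",
                 pop.size = 50, max.generations = 3, wait.generations = 1,
                 print.level = 0)
```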

Exact Matching (method = "exact")

Exact matching is a form of stratum matching that involves creating subclasses based on unique combinations of covariate values and assigning each unit into their corresponding subclass so that only units with identical covariate values are placed into the same subclass. Any units that are in subclasses lacking either treated or control units will be dropped. Exact matching is the most powerful matching method in that no functional form assumptions are required on either the treatment or outcome model for the method to remove confounding due to the measured covariates; the covariate distributions are exactly balanced. The problem with exact matching is that in general, few if any units will remain after matching, so the estimated effect will only generalize to a very limited population and can lack precision. Exact matching is particularly ineffective with continuous covariates, for which it might be that no two units have the same value, and with many covariates, for which it might be the case that no two units have the same combination of all covariates; this latter problem is known as the “curse of dimensionality”. See ?method_exact for the documentation for matchit() with method = "exact".

It is possible to use exact matching on some covariates and another form of matching on the rest. This makes it possible to have exact balance on some covariates (typically categorical) and approximate balance on others, thereby gaining the benefits of both exact matching and the other matching method used. To do so, the other matching method should be specified in the method argument to matchit() and the exact argument should be specified to contain the variables on which exact matching is to be done.
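Both uses can be sketched as follows, using the lalonde dataset included with MatchIt:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Exact matching on a few categorical covariates
m.ex <- matchit(treat ~ race + married + nodegree, data = lalonde,
                method = "exact")

# Nearest neighbor propensity score matching performed separately
# within exact-matching strata of race
m.comb <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "nearest", exact = ~race)
```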

Coarsened Exact Matching (method = "cem")

Coarsened exact matching (CEM) is a form of stratum matching that involves first coarsening the covariates by creating bins and then performing exact matching on the new coarsened versions of the covariates (Iacus, King, and Porro 2012). The degree and method of coarsening can be controlled by the user to manage the trade-off between exact and approximate balancing. For example, coarsening a covariate to two bins will mean that units that differ greatly on the covariate might be placed into the same subclass, while coarsening a variable to five bins may require units to be dropped due to not finding matches. Like exact matching, CEM is susceptible to the curse of dimensionality, making it a less viable solution with many covariates, especially with few units. Dropping units can also change the target population of the estimated effect. See ?method_cem for the documentation for matchit() with method = "cem". CEM in MatchIt does not depend on any other package to perform the coarsening and matching, though it used to rely on the cem package.
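A minimal sketch of CEM with user-controlled coarsening (the cutpoints argument; covariates not named there use the default binning):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# CEM with age coarsened into 4 bins and re74 into quintiles ("q5")
m.cem <- matchit(treat ~ age + educ + race + married + re74, data = lalonde,
                 method = "cem", cutpoints = list(age = 4, re74 = "q5"))
```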

Subclassification (method = "subclass")

Propensity score subclassification can be thought of as a form of coarsened exact matching with the propensity score as the sole covariate to be coarsened and matched on. The bins are usually based on specified quantiles of the propensity score distribution either in the treated group, control group, or overall, depending on the desired estimand. Propensity score subclassification is an old and well-studied method, though it can perform poorly compared to other, more modern propensity score methods such as full matching and weighting (Austin 2010a). See ?method_subclass for the documentation for matchit() with method = "subclass".

The binning of the propensity scores is typically based on dividing the distribution of the propensity score into approximately equally sized bins. The user specifies the number of subclasses using the subclass argument and which group should be used to compute the boundaries of the bins using the estimand argument. Sometimes, subclasses can end up with no units from one of the treatment groups; by default, matchit() moves a unit from an adjacent subclass into the lacking one to ensure that each subclass has at least one unit from each treatment group. The minimum number of units required in each subclass can be chosen by the min.n argument to matchit(). If set to 0, an error will be thrown if any subclass lacks units from one of the treatment groups. Moving units from one subclass to another generally worsens the balance in the subclasses but can increase precision.
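A minimal sketch of a subclassification call with these arguments:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Propensity score subclassification with 8 subclasses targeting the ATT;
# min.n requires at least 2 units from each group in every subclass
m.sub <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "subclass",
                 subclass = 8, estimand = "ATT", min.n = 2)
```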

The default number of subclasses is 6, which is arbitrary and should not be taken as a recommended value. Although early theory has recommended the use of 5 subclasses, in general there is an optimal number of subclasses that is typically much larger than 5 but that varies among datasets (Orihara and Hamada 2021). Rather than trying to figure this out for oneself, one can use optimal full matching (i.e., with method = "full") or generalized full matching (method = "quick") to optimally create subclasses that optimize a within-subclass distance criterion.

The output of propensity score subclassification includes the assigned subclasses and the subclassification weights. Effects can be estimated either within each subclass and then averaged across them, or a single marginal effect can be estimated using the subclassification weights. This latter method has been called marginal mean weighting through subclassification [MMWS; Hong (2010)] and fine stratification weighting (Desai et al. 2017). It is also implemented in the WeightIt package.

Cardinality and Profile Matching (method = "cardinality")

Cardinality and profile matching are pure subset selection methods that involve selecting a subset of the original sample without considering the distance between individual units or assigning units to pairs or subclasses. They can be thought of as a weighting method where the weights are restricted to be zero or one. Cardinality matching involves finding the largest sample that satisfies user-supplied balance constraints and constraints on the ratio of matched treated to matched control units (Zubizarreta, Paredes, and Rosenbaum 2014). It does not consider a specific estimand and can be a useful alternative to matching with a caliper for handling data with little overlap (Visconti and Zubizarreta 2018). Profile matching involves identifying a target distribution (e.g., the full sample for the ATE or the treated units for the ATT) and finding the largest subset of the treated and control groups that satisfy user-supplied balance constraints with respect to that target (Cohn and Zubizarreta 2022). See ?method_cardinality for the documentation for using matchit() with method = "cardinality", including which inputs are required to request either cardinality matching or profile matching.

Subset selection is performed by solving a mixed integer programming optimization problem with linear constraints. The problem involves maximizing the size of the matched sample subject to constraints on balance and sample size. For cardinality matching, the balance constraints refer to the mean difference for each covariate between the matched treated and control groups, and the sample size constraints require the matched treated and control groups to be the same size (or differ by a user-supplied factor). For profile matching, the balance constraints refer to the mean difference for each covariate between each treatment group and the target distribution; for the ATE, this requires the mean of each covariate in each treatment group to be within a given tolerance of the mean of the covariate in the full sample, and for the ATT, this requires the mean of each covariate in the control group to be within a given tolerance of the mean of the covariate in the treated group, which is left intact. The balance tolerances are controlled by the tols and std.tols arguments. One can also create pairs in the matched sample by using the mahvars argument, which requests that optimal Mahalanobis matching be done after subset selection; doing so can add additional precision and robustness (Zubizarreta, Paredes, and Rosenbaum 2014).

The optimization problem requires a special solver to solve. Currently, the available options in MatchIt are the HiGHS solver (through the highs package), the GLPK solver (through the Rglpk package), the SYMPHONY solver (through the Rsymphony package), and the Gurobi solver (through the gurobi package). The differences among the solvers are in performance; Gurobi is by far the best (fastest, least likely to fail to find a solution), but it is proprietary (though has a free trial and academic license) and is a bit more complicated to install. HiGHS is the default due to being open source, easily installed, and with performance comparable to Gurobi. The designmatch package also provides an implementation of cardinality matching with more options than MatchIt offers.
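A hedged sketch of a cardinality matching call (assuming the highs package is installed for the default solver):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Cardinality matching constraining the standardized mean difference of
# each covariate to be at most .1 in the matched sample
m.card <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "cardinality",
                  tols = .1, std.tols = TRUE, solver = "highs")
```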

Customizing the Matching Specification

In addition to the specific matching method, other options are available for many of the matching methods to further customize the matching specification. These include different specifications of the distance measure, methods to perform alternate forms of matching in addition to the main method, options to prune units far from other units prior to matching, and restrictions on which units are allowed to be matched. Not all options are compatible with all matching methods.

Specifying the propensity score or other distance measure (distance)

The distance measure is used to define how close two units are. In nearest neighbor matching, this is used to choose the nearest control unit to each treated unit. In optimal matching, this is used in the criterion that is optimized. By default, the distance measure is the propensity score difference, and the argument supplied to distance corresponds to the method of estimating the propensity score. In MatchIt, propensity scores are often labeled as “distance” values, even though the propensity score itself is not a distance measure. This is to reflect that the propensity score is used in creating the distance value, but other scores could be used, such as prognostic scores for prognostic score matching (Hansen 2008). The propensity score is more like a “position” value, in that it reflects the position of each unit in the matching space, and the difference between positions is the distance between them. If the argument to distance is one of the allowed methods for estimating propensity scores (see ?distance for these values) or is a numeric vector with one value per unit, the distance between units will be computed as the pairwise difference between propensity scores or the supplied values. Propensity scores are also used in propensity score subclassification and can optionally be used in genetic matching as a component of the generalized Mahalanobis distance. For exact, coarsened exact, and cardinality matching, the distance argument is ignored.

The default distance argument is "glm", which estimates propensity scores using logistic regression or another generalized linear model. The link and distance.options arguments can be supplied to further specify the options for the propensity score models, including whether to use the raw propensity score or a linearized version of it (e.g., the logit of a logistic regression propensity score, which has been commonly referred to and recommended in the propensity score literature (Austin 2011; Stuart 2010)). Allowable options for the propensity score model include parametric and machine learning-based models, each of which have their strengths and limitations and may perform differently depending on the unique qualities of each dataset. We recommend multiple types of models be tried to find one that yields the best balance, as there is no way to make a single recommendation that will work for all cases.
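For example, matching on the linearized (logit) propensity score rather than on the probability scale can be sketched as:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Logistic regression propensity score, matched on the linear predictor
# (the logit of the propensity score) via link = "linear.logit"
m.lp <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                data = lalonde, method = "nearest",
                distance = "glm", link = "linear.logit")
```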

The distance argument can also be specified as a method of computing pairwise distances from the covariates directly (i.e., without estimating propensity scores). The options include "mahalanobis", "robust_mahalanobis", "euclidean", and "scaled_euclidean". These methods compute a distance metric for a treated unit \(i\) and a control unit \(j\) as \[\delta(\mathbf{x}_i,\mathbf{x}_j)=\sqrt{(\mathbf{x}_i - \mathbf{x}_j)'S^{-1}(\mathbf{x}_i - \mathbf{x}_j)}\]

where \(\mathbf{x}\) is a \(p \times 1\) vector containing the value of each of the \(p\) included covariates for that unit, \(S\) is a scaling matrix, and \(S^{-1}\) is the (generalized) inverse of \(S\). For Mahalanobis distance matching, \(S\) is the pooled covariance matrix of the covariates (Rubin 1980); for Euclidean distance matching, \(S\) is the identity matrix (i.e., no scaling); and for scaled Euclidean distance matching, \(S\) is the diagonal of the pooled covariance matrix (containing just the variances). The robust Mahalanobis distance is computed not on the covariates directly but rather on their ranks and uses a correction for ties (see Rosenbaum (2010), ch 8). For creating close pairs, matching with these distance measures tends to work better than propensity score matching because paired units will have close values on all of the covariates, whereas propensity score-paired units may be close on the propensity score but not on any of the covariates themselves. This feature was the basis of King and Nielsen’s (2019) warning against using propensity scores for matching. That said, they do not always outperform propensity score matching (Ripollone et al. 2018).

distance can also be supplied as a matrix of distance values between units. This makes it possible to use handcrafted distance matrices or distances created outside MatchIt. Only nearest neighbor, optimal pair, and optimal full matching allow this specification.
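One way to construct such a matrix is with MatchIt's own mahalanobis_dist() helper, which returns a treated-by-control distance matrix:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Rows correspond to treated units, columns to control units
dm <- mahalanobis_dist(treat ~ age + educ + re74 + re75, data = lalonde)

# Supply the matrix directly as the distance measure
m.dm <- matchit(treat ~ age + educ + re74 + re75, data = lalonde,
                method = "nearest", distance = dm)
```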

The propensity score can have uses other than as the basis for matching. It can be used to define a region of common support, outside which units are dropped prior to matching; this is implemented by the discard option. It can also be used to define a caliper, the maximum distance two units can be before they are prohibited from being paired with each other; this is implemented by the caliper argument. To estimate or supply a propensity score for one of these purposes but not use it as the distance measure for matching (i.e., to perform Mahalanobis distance matching instead), the mahvars argument can be specified. These options are described below.

Implementing common support restrictions (discard)

The region of common support is the region of overlap between treatment groups. A common support restriction discards units that fall outside of the region of common support, preventing them from being matched to other units and included in the matched sample. This can reduce the potential for extrapolation and help the matching algorithms to avoid overly distant matches from occurring. In MatchIt, the discard option implements a common support restriction based on the propensity score. The argument can be supplied as "treated", "control", or "both", which discards units in the corresponding group that fall outside the region of common support for the propensity score. The reestimate argument can be supplied to choose whether to re-estimate the propensity score in the remaining units. If units from the treated group are discarded based on a common support restriction, the estimand no longer corresponds to the ATT.
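A minimal sketch of a common support restriction on the control group:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Discard control units outside the region of common support of the
# propensity score, then re-estimate the propensity score in the
# remaining units before matching
m.cs <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                data = lalonde, method = "nearest",
                discard = "control", reestimate = TRUE)
```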

Caliper matching (caliper)

A caliper can be thought of as a ring around each unit that limits which other units that unit can be paired with. Calipers are based on the propensity score or other covariates. Two units whose distance on a calipered covariate is larger than the caliper width for that covariate are not allowed to be matched to each other. Any units for which there are no available matches within the caliper are dropped from the matched sample. Calipers ensure paired units are close to each other on the calipered covariates, which can ensure good balance in the matched sample. Multiple variables can be supplied to caliper to enforce calipers on all of them simultaneously. Using calipers can be a good alternative to exact or coarsened exact matching to ensure only similar units are paired with each other. The std.caliper argument controls whether the provided calipers are in raw units or standard deviation units. When negative calipers are supplied, units whose distance on the calipered covariate is smaller than the absolute caliper width for that covariate are disallowed from being matched to each other. If units from the treated group are left unmatched due to a caliper, the estimand no longer corresponds to the ATT.
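A sketch combining a standardized caliper on the propensity score with a raw-unit caliper on a covariate (an unnamed caliper entry applies to the distance measure):

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# A .2-standard-deviation caliper on the propensity score and a
# 5-year caliper on age; std.caliper indicates which entries are in
# standard deviation units
m.cal <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest",
                 caliper = c(.2, age = 5), std.caliper = c(TRUE, FALSE))
```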

Mahalanobis distance matching (mahvars)

To perform Mahalanobis distance matching without the need to estimate or use a propensity score, the distance argument can be set to "mahalanobis". If a propensity score is to be estimated or used for a different purpose, such as in a common support restriction or a caliper, but you still want to perform Mahalanobis distance matching, variables should be supplied to the mahvars argument. The propensity scores will be generated using the distance specification, and matching will occur not on the covariates supplied to the main formula of matchit() but rather on the covariates supplied to mahvars. To perform Mahalanobis distance matching within a propensity score caliper, for example, the distance argument should be set to the method of estimating the propensity score (e.g., "glm" for logistic regression), the caliper argument should be specified to the desired caliper width, and mahvars should be specified to perform Mahalanobis distance matching on the desired covariates within the caliper. mahvars has a special meaning for genetic matching and cardinality matching; see their respective help pages for details.
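The example just described can be sketched as:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Mahalanobis distance matching on the mahvars covariates within a
# .2-standard-deviation caliper on the logistic regression propensity score
m.mah <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest", distance = "glm",
                 caliper = .2, mahvars = ~ age + educ + re74 + re75)
```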

Exact matching (exact)

To perform exact matching on all supplied covariates, the method argument can be set to "exact". To perform exact matching only on some covariates and some other form of matching within exact matching strata on other covariates, the exact argument can be used. Covariates supplied to the exact argument will be matched exactly, and the form of matching specified by method (e.g., "nearest" for nearest neighbor matching) will take place within each exact matching stratum. This can be a good way to gain some of the benefits of exact matching without completely succumbing to the curse of dimensionality. As with exact matching performed with method = "exact", any units in strata lacking members of one of the treatment groups will be left unmatched. Note that although matching occurs within each exact matching stratum, propensity score estimation and computation of the Mahalanobis or other distance matrix occur in the full sample. If units from the treated group are unmatched due to an exact matching restriction, the estimand no longer corresponds to the ATT.

Anti-exact matching (antiexact)

Anti-exact matching adds a restriction such that a treated and control unit with the same values of any of the specified anti-exact matching variables cannot be paired. This can be useful for finding comparison units outside of a unit’s group, such as when matching units in one group to units in another when units within the same group might otherwise be close matches. See examples here and here. A similar effect can be implemented by supplying negative caliper values.
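A minimal sketch of anti-exact matching follows; here, site is a hypothetical grouping variable in a hypothetical dataset d, not part of any bundled dataset:

```r
# Treated and control units from the same (hypothetical) site cannot be paired
m.out <- matchit(treat ~ age + educ + re74 + re75,
                 data = d,            # hypothetical data containing `site`
                 method = "nearest",
                 distance = "glm",
                 antiexact = ~ site)
```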

Matching with replacement (replace)

Nearest neighbor matching and genetic matching have the option of matching with or without replacement, and this is controlled by the replace argument. Matching without replacement means that each control unit is matched to only one treated unit, while matching with replacement means that control units can be reused and matched to multiple treated units. Matching without replacement carries certain statistical benefits in that weights for each unit can be omitted or are more straightforward to include and dependence between units depends only on pair membership. However, it is not asymptotically consistent unless the propensity scores for all treated units are below .5 and there are many more control units than treated units (Sävje 2022). Special standard error estimators are sometimes required for estimating effects after matching with replacement (Austin and Cafri 2020), and methods for accounting for uncertainty are not well understood for non-continuous outcomes. Matching with replacement will tend to yield better balance, however, because the problem of “running out” of close control units to match to treated units is avoided, though the reuse of control units will decrease the effective sample size, thereby worsening precision (Austin 2013). (This problem occurs in the Lalonde dataset used in vignette("MatchIt"), which is why nearest neighbor matching without replacement is not very effective there.) After matching with replacement, control units may be assigned to more than one subclass, so the get_matches() function should be used instead of match_data() if subclasses are to be used in follow-up analyses; see vignette("estimating-effects") for details.

The reuse.max argument can also be used with method = "nearest" to control how many times each control unit can be reused as a match. Setting reuse.max = 1 is equivalent to matching without replacement (because each control can be used only once). Other values allow control units to be matched more than once, though only up to the specified number of times. Higher values will tend to improve balance at the cost of precision.
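A sketch of this option with the bundled lalonde data (covariate choices illustrative):

```r
library("MatchIt")
data("lalonde")

# Each control unit may serve as a match for at most 3 treated units
m.out <- matchit(treat ~ age + educ + race + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 distance = "glm",
                 reuse.max = 3)
```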

\(k\):1 matching (ratio)

The most common form of matching, 1:1 matching, involves pairing one control unit with each treated unit. To perform \(k\):1 matching (e.g., 2:1 or 3:1), which pairs (up to) \(k\) control units with each treated unit, the ratio argument can be specified. Performing \(k\):1 matching can preserve precision by preventing too many control units from being unmatched and dropped from the matched sample, though the gain in precision by increasing \(k\) diminishes rapidly after 4 (Rosenbaum 2020). Importantly, for \(k>1\), the matches after the first match will generally be worse than the first match in terms of closeness to the treated unit, so increasing \(k\) can also worsen balance (Rassen et al. 2012). Austin (2010b) found that 1:1 or 1:2 matching generally performed best in terms of mean squared error. In general, it makes sense to use higher values of \(k\) while ensuring that balance is satisfactory.
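For example, 2:1 nearest neighbor propensity score matching can be sketched as follows (using the bundled lalonde data; covariates are illustrative):

```r
library("MatchIt")
data("lalonde")

# Match up to 2 control units to each treated unit
m.out <- matchit(treat ~ age + educ + race + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 distance = "glm",
                 ratio = 2)
```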

With nearest neighbor and optimal pair matching, variable \(k\):1 matching, in which the number of controls matched to each treated unit varies, can also be used; this can have improved performance over “fixed” \(k\):1 matching (Ming and Rosenbaum 2000; Rassen et al. 2012). See ?method_nearest and ?method_optimal for information on implementing variable \(k\):1 matching.

Matching order (m.order)

For nearest neighbor matching (including genetic matching), units are matched in an order, and that order can affect the quality of individual matches and of the resulting matched sample. With method = "nearest", the allowable options to m.order to control the matching order are "largest", "smallest", "closest", "farthest", "random", and "data". With method = "genetic", all but "closest" and "farthest" can be used. Requesting "largest" means that treated units with the largest propensity scores, i.e., those least like the control units, will be matched first, which prevents them from having bad matches after all the close control units have been used up. "smallest" means that treated units with the smallest propensity scores are matched first. "closest" means that potential pairs with the smallest distance between units will be matched first, which ensures that the best possible matches are included in the matched sample but can yield poor matches for units whose best match is far from them; this makes it particularly useful when matching with a caliper. "farthest" means that pairs with the largest distance between them will be matched first, which ensures the hardest-to-match units are given the best chance to find matches. "random" matches in a random order, and "data" matches in order of the data. A propensity score is required for "largest" and "smallest" but not for the other options.
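A sketch combining a matching order with a caliper (bundled lalonde data; covariates illustrative):

```r
library("MatchIt")
data("lalonde")

# Match the closest pairs first; combined with a caliper, this prioritizes
# the best available matches within the allowed distance
m.out <- matchit(treat ~ age + educ + race + re74 + re75,
                 data = lalonde,
                 method = "nearest",
                 distance = "glm",
                 caliper = .2,
                 m.order = "closest")
```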

Rubin (1973) recommends using "largest" or "random", though Austin (2013) recommends against "largest" and instead favors "closest" or "random". "closest" and "smallest" are best for prioritizing the best possible matches, while "farthest" and "largest" are best for preventing extreme pairwise distances between matched units.

Choosing a Matching Method

Choosing the best matching method for one’s data depends on the unique characteristics of the dataset as well as the goals of the analysis. For example, because different matching methods can target different estimands, when certain estimands are desired, specific methods must be used. On the other hand, some methods may be more effective than others when retaining the target estimand is less important. Below we provide some guidance on choosing a matching method. Remember that multiple methods can (and should) be tried as long as the treatment effect is not estimated until a method has been settled on.

The criteria on which a matching specification should be judged are balance and remaining (effective) sample size after matching. Assessing balance is described in vignette("assessing-balance"). A typical workflow is similar to that demonstrated in vignette("MatchIt"): try a matching method, and if it yields poor balance or an unacceptably low remaining sample size, try another, until a satisfactory specification has been found. It is important to assess balance broadly (i.e., beyond comparing the means of the covariates in the treated and control groups), and the search for a matching specification should not stop when a threshold is reached, but should attempt to come as close as possible to perfect balance (Ho et al. 2007). Even if the first matching specification appears successful at reducing imbalance, there may be another specification that could reduce it even further, thereby increasing the robustness of the inference and the plausibility of an unbiased effect estimate.

If the target of inference is the ATE, optimal or generalized full matching, subclassification, or profile matching can be used. If the target of inference is the ATT or ATC, any matching method may be used. When retaining the target estimand is not so important, additional options become available that involve discarding units in such a way that the original estimand is distorted. These include matching with a caliper, matching within a region of common support, cardinality matching, or exact or coarsened exact matching, perhaps on a subset of the covariates.

Because exact and coarsened exact matching aim to balance the entire joint distribution of covariates, they are the most powerful methods. If it is possible to perform exact matching, this method should be used. If continuous covariates are present, coarsened exact matching can be tried. Care should be taken with retaining the target population and ensuring enough matched units remain; unless the control pool is much larger than the treated pool, it is likely some (or many) treated units will be discarded, thereby changing the estimand and possibly dramatically reducing precision. These methods are typically only available in the most optimistic of circumstances, but they should be used first when those circumstances arise. It may also be useful to combine exact or coarsened exact matching on some covariates with another form of matching on the others (i.e., by using the exact argument).

When estimating the ATE, either subclassification, full matching, or profile matching can be used. Optimal and generalized full matching can be effective because they optimize a balance criterion, often leading to better balance. With full matching, it’s also possible to exact match on some variables and match using the Mahalanobis distance, eliminating the need to estimate propensity scores. Profile matching also ensures good balance, but because units are only given weights of zero or one, a solution may not be feasible and many units may have to be discarded. For large datasets, neither optimal full matching nor profile matching may be possible, in which case generalized full matching and subclassification are faster solutions. When using subclassification, the number of subclasses should be varied. With large samples, higher numbers of subclasses tend to yield better performance; one should not immediately settle for the default (6) or the often-cited recommendation of 5 without trying several other numbers. The documentation for cobalt::bal.compute() contains an example of using balance to select the optimal number of subclasses.
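One simple way to vary the number of subclasses is to refit the specification over a grid and inspect balance for each fit (a sketch with the bundled lalonde data; the grid of values and covariates are illustrative):

```r
library("MatchIt")
data("lalonde")

# Refit with several numbers of subclasses and compare balance
for (s in c(5, 6, 8, 10, 12)) {
  m.s <- matchit(treat ~ age + educ + race + re74 + re75,
                 data = lalonde,
                 method = "subclass",
                 subclass = s,
                 estimand = "ATE")
  cat("Subclasses:", s, "\n")
  print(summary(m.s, un = FALSE))
}
```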

When estimating the ATT, a variety of methods can be tried. Genetic matching can perform well at achieving good balance because it directly optimizes covariate balance. With larger datasets, it may take a long time to reach a good solution (though that solution will tend to be good as well). Profile matching also will achieve good balance if a solution is feasible because balance is controlled by the user. Optimal pair matching and nearest neighbor matching without replacement tend to perform similarly to each other; nearest neighbor matching may be preferable for large datasets that cannot be handled by optimal matching. Nearest neighbor, optimal, and genetic matching allow some customizations like including covariates on which to exactly match, using the Mahalanobis distance instead of a propensity score difference, and performing \(k\):1 matching with \(k>1\). Nearest neighbor matching with replacement, full matching, and subclassification all involve weighting the control units with nonuniform weights, which often allows for improved balancing capabilities but can be accompanied by a loss in effective sample size, even when all units are retained. There is no reason not to try many of these methods, varying parameters here and there, in search of good balance and high remaining sample size. As previously mentioned, no single method can be recommended above all others because the optimal specification depends on the unique qualities of each dataset.

When the target population is less important, for example, when engaging in treatment effect discovery or when the sampled population is not of particular interest (e.g., it corresponds to an arbitrarily chosen hospital or school; see Mao, Li, and Greene (2018) for these and other reasons why retaining the target population may not be important), other methods that do not retain the characteristics of the original sample become available. These include matching with a caliper (on the propensity score or on the covariates themselves), cardinality matching, and more restrictive forms of matching like exact and coarsened exact matching, either on all covariates or just a subset, that are prone to discard units from the sample in such a way that the target population is changed. Austin (2013) and Austin and Stuart (2015b, 2015a) have found that caliper matching can be a particularly effective modification to nearest neighbor matching for eliminating imbalance and reducing bias when the target population is less relevant, but when inference to a specific target population is desired, using calipers can induce bias due to incomplete matching (Rosenbaum and Rubin 1985a; Wang 2020). Cardinality matching can be particularly effective in data with little overlap between the treatment groups (Visconti and Zubizarreta 2018) and can perform better than caliper matching (de los Angeles Resa and Zubizarreta 2020).

It is important not to rely excessively on theoretical or simulation-based findings or specific recommendations when making choices about the best matching method to use. For example, although nearest neighbor matching without replacement balanced covariates better than subclassification with five or ten subclasses did in Austin’s (2009) simulation, this does not imply it will be superior in all datasets. Likewise, though Rosenbaum and Rubin (1985b) and Austin (2011) both recommend using a caliper of .2 standard deviations of the logit of the propensity score, this does not imply that caliper width will be optimal in all scenarios, and other widths should be tried, though it should be noted that tightening the caliper on the propensity score can sometimes degrade performance (King and Nielsen 2019).

For large datasets (i.e., tens of thousands to millions of units), some matching methods will be too slow to be used at scale. Instead, users should consider generalized full matching, subclassification, or coarsened exact matching, which are all very fast and designed to work with large datasets. Nearest neighbor matching on the propensity score has been optimized to run quickly for large datasets as well.

Reporting the Matching Specification

When reporting the results of a matching analysis, it is important to include the relevant details of the final matching specification and the process of arriving at it. Using print() on the matchit object synthesizes information on how the above arguments were used to provide a description of the matching specification. It is best to be as specific as possible to ensure the analysis is replicable and to allow audiences to assess its validity. Although citations recommending specific matching methods can be used to help justify a choice, the only sufficient justification is adequate balance and remaining sample size, regardless of published recommendations for specific methods. See vignette("assessing-balance") for instructions on how to assess and report the quality of a matching specification. After matching and estimating an effect, details of the effect estimation must be included as well; see vignette("estimating-effects") for instructions on how to perform and report on the analysis of a matched dataset.

References

Abadie, Alberto, and Guido W. Imbens. 2006. “Large Sample Properties of Matching Estimators for Average Treatment Effects.” Econometrica 74 (1): 235–67. https://doi.org/10.1111/j.1468-0262.2006.00655.x.
———. 2016. “Matching on the Estimated Propensity Score.” Econometrica 84 (2): 781–807. https://doi.org/10.3982/ECTA11293.
Austin, Peter C. 2009. “The Relative Ability of Different Propensity Score Methods to Balance Measured Covariates Between Treated and Untreated Subjects in Observational Studies.” Medical Decision Making 29 (6): 661–77. https://doi.org/10.1177/0272989x09341755.
———. 2010a. “The Performance of Different Propensity-Score Methods for Estimating Differences in Proportions (Risk Differences or Absolute Risk Reductions) in Observational Studies.” Statistics in Medicine 29 (20): 2137–48. https://doi.org/10.1002/sim.3854.
———. 2010b. “Statistical Criteria for Selecting the Optimal Number of Untreated Subjects Matched to Each Treated Subject When Using Many-to-One Matching on the Propensity Score.” American Journal of Epidemiology 172 (9): 1092–97. https://doi.org/10.1093/aje/kwq224.
———. 2011. “Optimal Caliper Widths for Propensity-Score Matching When Estimating Differences in Means and Differences in Proportions in Observational Studies.” Pharmaceutical Statistics 10 (2): 150–61. https://doi.org/10.1002/pst.433.
———. 2013. “A Comparison of 12 Algorithms for Matching on the Propensity Score.” Statistics in Medicine 33 (6): 1057–69. https://doi.org/10.1002/sim.6004.
Austin, Peter C., and Guy Cafri. 2020. “Variance Estimation When Using Propensity-Score Matching with Replacement with Survival or Time-to-Event Outcomes.” Statistics in Medicine 39 (11): 1623–40. https://doi.org/10.1002/sim.8502.
Austin, Peter C., and Dylan S. Small. 2014. “The Use of Bootstrapping When Using Propensity-Score Matching Without Replacement: A Simulation Study.” Statistics in Medicine 33 (24): 4306–19. https://doi.org/10.1002/sim.6276.
Austin, Peter C., and Elizabeth A. Stuart. 2015a. “The Performance of Inverse Probability of Treatment Weighting and Full Matching on the Propensity Score in the Presence of Model Misspecification When Estimating the Effect of Treatment on Survival Outcomes.” Statistical Methods in Medical Research 26 (4): 1654–70. https://doi.org/10.1177/0962280215584401.
———. 2015b. “Estimating the Effect of Treatment on Binary Outcomes Using Full Matching on the Propensity Score.” Statistical Methods in Medical Research 26 (6): 2505–25. https://doi.org/10.1177/0962280215601134.
Cohn, Eric R., and José R. Zubizarreta. 2022. “Profile Matching for the Generalization and Personalization of Causal Inferences.” Epidemiology 33 (5): 678. https://doi.org/10.1097/EDE.0000000000001517.
de los Angeles Resa, María, and José R. Zubizarreta. 2020. “Direct and Stable Weight Adjustment in Non-Experimental Studies with Multivalued Treatments: Analysis of the Effect of an Earthquake on Post-Traumatic Stress.” Journal of the Royal Statistical Society: Series A (Statistics in Society) n/a (n/a). https://doi.org/10.1111/rssa.12561.
Desai, Rishi J., Kenneth J. Rothman, Brian T. Bateman, Sonia Hernandez-Diaz, and Krista F. Huybrechts. 2017. “A Propensity-Score-Based Fine Stratification Approach for Confounding Adjustment When Exposure Is Infrequent:” Epidemiology 28 (2): 249–57. https://doi.org/10.1097/EDE.0000000000000595.
Diamond, Alexis, and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95 (3): 932–45. https://doi.org/10.1162/REST_a_00318.
Gu, Xing Sam, and Paul R. Rosenbaum. 1993. “Comparison of Multivariate Matching Methods: Structures, Distances, and Algorithms.” Journal of Computational and Graphical Statistics 2 (4): 405. https://doi.org/10.2307/1390693.
Hansen, Ben B. 2004. “Full Matching in an Observational Study of Coaching for the SAT.” Journal of the American Statistical Association 99 (467): 609–18. https://doi.org/10.1198/016214504000000647.
———. 2008. “The Prognostic Analogue of the Propensity Score.” Biometrika 95 (2): 481–88. https://doi.org/10.1093/biomet/asn004.
Hansen, Ben B., and Stephanie O. Klopfer. 2006. “Optimal Full Matching and Related Designs via Network Flows.” Journal of Computational and Graphical Statistics 15 (3): 609–27. https://doi.org/10.1198/106186006X137047.
Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis 15 (3): 199–236. https://doi.org/10.1093/pan/mpl013.
Hong, Guanglei. 2010. “Marginal Mean Weighting Through Stratification: Adjustment for Selection Bias in Multilevel Data.” Journal of Educational and Behavioral Statistics 35 (5): 499–531. https://doi.org/10.3102/1076998609359785.
Iacus, Stefano M., Gary King, and Giuseppe Porro. 2012. “Causal Inference Without Balance Checking: Coarsened Exact Matching.” Political Analysis 20 (1): 1–24. https://doi.org/10.1093/pan/mpr013.
King, Gary, and Richard Nielsen. 2019. “Why Propensity Scores Should Not Be Used for Matching.” Political Analysis, May, 1–20. https://doi.org/10.1017/pan.2019.11.
Mao, Huzhang, Liang Li, and Tom Greene. 2018. “Propensity Score Weighting Analysis and Treatment Effect Discovery.” Statistical Methods in Medical Research, June, 096228021878117. https://doi.org/10.1177/0962280218781171.
Ming, Kewei, and Paul R. Rosenbaum. 2000. “Substantial Gains in Bias Reduction from Matching with a Variable Number of Controls.” Biometrics 56 (1): 118–24. https://doi.org/10.1111/j.0006-341X.2000.00118.x.
Orihara, Shunichiro, and Etsuo Hamada. 2021. “Determination of the Optimal Number of Strata for Propensity Score Subclassification.” Statistics & Probability Letters 168 (January): 108951. https://doi.org/10.1016/j.spl.2020.108951.
Rassen, Jeremy A., Abhi A. Shelat, Jessica Myers, Robert J. Glynn, Kenneth J. Rothman, and Sebastian Schneeweiss. 2012. “One-to-Many Propensity Score Matching in Cohort Studies.” Pharmacoepidemiology and Drug Safety 21 (S2): 69–80. https://doi.org/10.1002/pds.3263.
Ripollone, John E., Krista F. Huybrechts, Kenneth J. Rothman, Ryan E. Ferguson, and Jessica M. Franklin. 2018. “Implications of the Propensity Score Matching Paradox in Pharmacoepidemiology.” American Journal of Epidemiology 187 (9): 1951–61. https://doi.org/10.1093/aje/kwy078.
Rosenbaum, Paul R. 2010. Design of Observational Studies. Springer Series in Statistics. New York: Springer.
———. 2020. “Modern Algorithms for Matching in Observational Studies.” Annual Review of Statistics and Its Application 7 (1): 143–76. https://doi.org/10.1146/annurev-statistics-031219-041058.
Rosenbaum, Paul R., and Donald B. Rubin. 1985a. “The Bias Due to Incomplete Matching.” Biometrics 41 (1): 103–16. https://doi.org/10.2307/2530647.
———. 1985b. “Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score.” The American Statistician 39 (1): 33. https://doi.org/10.2307/2683903.
Rubin, Donald B. 1973. “Matching to Remove Bias in Observational Studies.” Biometrics 29 (1): 159. https://doi.org/10.2307/2529684.
———. 1980. “Bias Reduction Using Mahalanobis-Metric Matching.” Biometrics 36 (2): 293–98. https://doi.org/10.2307/2529981.
Sävje, F. 2022. “On the Inconsistency of Matching Without Replacement.” Biometrika 109 (2): 551–58. https://doi.org/10.1093/biomet/asab035.
Sävje, Fredrik, Michael J. Higgins, and Jasjeet S. Sekhon. 2021. “Generalized Full Matching.” Political Analysis 29 (4): 423–47. https://doi.org/10.1017/pan.2020.32.
Sävje, Fredrik, Jasjeet Sekhon, and Michael Higgins. 2018. Quickmatch: Quick Generalized Full Matching. https://CRAN.R-project.org/package=quickmatch.
Schafer, Joseph L., and Joseph Kang. 2008. “Average Causal Effects from Nonrandomized Studies: A Practical Guide and Simulated Example.” Psychological Methods 13 (4): 279–313. https://doi.org/10.1037/a0014268.
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R.” Journal of Statistical Software 42 (1): 1–52. https://doi.org/10.18637/jss.v042.i07.
Stuart, Elizabeth A. 2008. “Developing Practical Recommendations for the Use of Propensity Scores: Discussion of ‘A Critical Appraisal of Propensity Score Matching in the Medical Literature Between 1996 and 2003’ by Peter Austin, Statistics in Medicine.” Statistics in Medicine 27 (12): 2062–65. https://doi.org/10.1002/sim.3207.
———. 2010. “Matching Methods for Causal Inference: A Review and a Look Forward.” Statistical Science 25 (1): 1–21. https://doi.org/10.1214/09-STS313.
Stuart, Elizabeth A., and Kerry M. Green. 2008. “Using Full Matching to Estimate Causal Effects in Nonexperimental Studies: Examining the Relationship Between Adolescent Marijuana Use and Adult Outcomes.” Developmental Psychology 44 (2): 395–406. https://doi.org/10.1037/0012-1649.44.2.395.
Thoemmes, Felix J., and Eun Sook Kim. 2011. “A Systematic Review of Propensity Score Methods in the Social Sciences.” Multivariate Behavioral Research 46 (1): 90–118. https://doi.org/10.1080/00273171.2011.540475.
Visconti, Giancarlo, and José R. Zubizarreta. 2018. “Handling Limited Overlap in Observational Studies with Cardinality Matching.” Observational Studies 4 (1): 217–49. https://doi.org/10.1353/obs.2018.0012.
Wan, Fei. 2019. “Matched or Unmatched Analyses with Propensity-Score-Matched Data?” Statistics in Medicine 38 (2): 289–300. https://doi.org/10.1002/sim.7976.
Wang, Jixian. 2020. “To Use or Not to Use Propensity Score Matching?” Pharmaceutical Statistics, August. https://doi.org/10.1002/pst.2051.
Zakrison, T. L., Peter C. Austin, and V. A. McCredie. 2018. “A Systematic Review of Propensity Score Methods in the Acute Care Surgery Literature: Avoiding the Pitfalls and Proposing a Set of Reporting Guidelines.” European Journal of Trauma and Emergency Surgery 44 (3): 385–95. https://doi.org/10.1007/s00068-017-0786-6.
Zubizarreta, José R., Ricardo D. Paredes, and Paul R. Rosenbaum. 2014. “Matching for Balance, Pairing for Heterogeneity in an Observational Study of the Effectiveness of for-Profit and Not-for-Profit High Schools in Chile.” The Annals of Applied Statistics 8 (1): 204–31. https://doi.org/10.1214/13-AOAS713.

Matching with Sampling Weights

Noah Greifer

2025-03-09

Introduction

Sampling weights (also known as survey weights) frequently appear when using large, representative datasets. They are required to ensure any estimated quantities generalize to a target population defined by the weights. Evidence suggests that sampling weights need to be incorporated into a propensity score matching analysis to obtain valid and unbiased estimates of the treatment effect in the sampling weighted population (DuGoff, Schuler, and Stuart 2014; Austin, Jembere, and Chiu 2016; Lenis et al. 2019). In this guide, we demonstrate how to use sampling weights with MatchIt for propensity score estimation, balance assessment, and effect estimation. Fortunately, doing so is not complicated, but some care must be taken to ensure sampling weights are incorporated correctly. It is assumed one has read the other vignettes explaining matching (vignette("matching-methods")), balance assessment (vignette("assessing-balance")), and effect estimation (vignette("estimating-effects")).

We will use the same simulated toy dataset used in vignette("estimating-effects") except with the addition of a sampling weights variable, SW, which is used to generalize the sample to a specific target population with a distribution of covariates different from that of the sample. Code to generate the covariates, treatment, and outcome is at the bottom of vignette("estimating-effects") and code to generate the sampling weights is at the end of this document. We will consider the effect of binary treatment A on continuous outcome Y_C, adjusting for confounders X1-X9.

head(d)
##   A      X1      X2      X3       X4 X5      X6      X7      X8       X9     Y_C     SW
## 1 0  0.1725 -1.4283 -0.4103 -2.36059  1 -1.1199  0.6398 -0.4840 -0.59385 -3.5907  1.675
## 2 0 -1.0959  0.8463  0.2456 -0.12333  1 -2.2687 -1.4491 -0.5514 -0.31439 -1.5481  1.411
## 3 0  0.1768  0.7905 -0.8436  0.82366  1 -0.2221  0.2971 -0.6966 -0.69516  6.0714  2.332
## 4 0 -0.4595  0.1726  1.9542 -0.62661  1 -0.4019 -0.8294 -0.5384  0.20729  2.4906  1.644
## 5 1  0.3563 -1.8121  0.8135 -0.67189  1 -0.8297  1.7297 -0.6439 -0.02648 -0.6687  2.722
## 6 0 -2.4313 -1.7984 -1.2940  0.04609  1 -1.2419 -1.1252 -1.8659 -0.56513 -9.8504 14.773
library("MatchIt")

Matching

When using sampling weights with propensity score matching, one has the option of including the sampling weights in the model used to estimate the propensity scores. Although evidence is mixed on whether this is required (Austin, Jembere, and Chiu 2016; Lenis et al. 2019), it can be a good idea. The choice should depend on whether including the sampling weights improves the quality of the matches. Specifications including and excluding sampling weights should be tried to determine which is preferred.

To supply sampling weights to the propensity score-estimating function in matchit(), the sampling weights variable should be supplied to the s.weights argument. It can be supplied either as a numerical vector containing the sampling weights, or a string or one-sided formula with the name of the sampling weights variable in the supplied dataset. Below we demonstrate including sampling weights into propensity scores estimated using logistic regression for optimal full matching for the average treatment effect in the population (ATE) (note that all methods and steps apply the same way to all forms of matching and all estimands).

mF_s <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + 
                  X6 + X7 + X8 + X9, data = d,
                method = "full", distance = "glm",
                estimand = "ATE", s.weights = ~SW)
mF_s

Notice that the description of the matching specification when the matchit object is printed includes lines indicating that the sampling weights were included in the estimation of the propensity score and that they are present in the matchit object. They are stored in the s.weights component of the matchit object. Note that at this stage, the matching weights (stored in the weights component of the matchit object) do not incorporate the sampling weights; they are computed simply as a result of the matching.

Now let’s perform full matching on a propensity score that does not include the sampling weights in its estimation. Here we use the same specification as was used in vignette("estimating-effects").

mF <- matchit(A ~ X1 + X2 + X3 + X4 + X5 + 
                X6 + X7 + X8 + X9, data = d,
              method = "full", distance = "glm",
              estimand = "ATE")
mF

Notice that there is no mention of sampling weights in the description of the matching specification. However, to properly assess balance and estimate effects, we need the sampling weights to be included in the matchit object, even if they were not used at all in the matching. To do so, we use the function add_s.weights(), which adds sampling weights to the supplied matchit objects.

mF <- add_s.weights(mF, ~SW)

mF

Now when we print the matchit object, we can see lines have been added identifying that sampling weights are present but they were not used in the estimation of the propensity score used in the matching.

Note that not all methods can involve sampling weights in the estimation. Only methods that use the propensity score will be affected by sampling weights; coarsened exact matching and Mahalanobis distance optimal pair matching, for example, ignore the sampling weights, and some propensity score estimation methods, like randomForest and bart (as presently implemented), cannot incorporate sampling weights. Sampling weights should still be supplied to matchit() even when using these methods; doing so avoids having to use add_s.weights() and to remember which methods do or do not involve sampling weights.

Assessing Balance

Now we need to decide which matching specification is the best to use for effect estimation. We do this by selecting the one that yields the best balance without sacrificing remaining effective sample size. Because the sampling weights are incorporated into the matchit object, the balance assessment tools in plot.matchit() and summary.matchit() incorporate them into their output.

We’ll use summary() to examine balance on the two matching specifications. With sampling weights included, the balance statistics for the unmatched data are weighted by the sampling weights. The balance statistics for the matched data are weighted by the product of the sampling weights and the matching weights. It is the product of these weights that will be used in estimating the treatment effect. Below we use summary() to display balance for the two matching specifications. No additional arguments to summary() are required for it to use the sampling weights; as long as they are in the matchit object (either due to being supplied with the s.weights argument in the call to matchit() or to being added afterward by add_s.weights()), they will be correctly incorporated into the balance statistics.

#Balance before matching and for the SW propensity score full matching
summary(mF_s)

#Balance for the non-SW propensity score full matching
summary(mF, un = FALSE)

The results of the two matching specifications are similar. Balance appears to be slightly better when using the sampling weight-estimated propensity scores than when using the unweighted propensity scores. However, the effective sample size for the control group is larger when using the unweighted propensity scores. Neither propensity score specification achieves excellent balance, and more fiddling with the matching specification (e.g., by changing the method of estimating propensity scores, the type of matching, or the options used with the matching) might yield a better matched set. For the purposes of this analysis, we will move forward with the matching that used the sampling weight-estimated propensity scores (mF_s) because of its superior balance. Some of the remaining imbalance may be eliminated by adjusting for the covariates in the outcome model.
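
The effective sample size mentioned above reflects the loss of precision caused by variable weights. As a base-R sketch, the usual (Kish) formula on which weighting-based ESS measures are built is shown below; MatchIt's own ESS computation may differ in its details:

```r
#Kish effective sample size: (sum of weights)^2 / (sum of squared weights).
#Equal weights give back n; variable weights shrink the ESS.
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 100))                 # 100
ess(c(rep(1, 50), rep(4, 50)))   # about 73.5, i.e., less than 100
```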

Note that had we not added sampling weights to mF, the matching specification that did not include the sampling weights, our balance assessment would be inaccurate because the balance statistics would not include the sampling weights. In this case, in fact, assessing balance on mF without incorporating the sampling weights would have yielded radically different results and a different conclusion. It is critical to incorporate sampling weights into the matchit object using add_s.weights() even if they are not included in the propensity score estimation.

Estimating the Effect

Estimating the treatment effect after matching is straightforward when using sampling weights. Effects are estimated in the same way as when sampling weights are excluded, except that the matching weights must be multiplied by the sampling weights for use in the outcome model to yield accurate, generalizable estimates. match_data() and get_matches() do this automatically, so the weights produced by these functions are already the product of the matching weights and the sampling weights. Note that this will only be true if the sampling weights are incorporated into the matchit object. With avg_comparisons(), only the sampling weights should be included when estimating the treatment effect.
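
As a toy base-R illustration of the multiplication (hypothetical simulated data, not the vignette's): supplying the product of matching and sampling weights to the outcome model is what yields the weighted estimate.

```r
set.seed(2)
n  <- 200
a  <- rbinom(n, 1, .5)
y  <- 2 * a + rnorm(n)
mw <- runif(n)               #stand-ins for matching weights
sw <- runif(n, .5, 2)        #stand-ins for sampling weights

#The combined weight, analogous to the 'weights' column match_data()
#returns when sampling weights are stored in the matchit object:
w <- mw * sw

fit <- lm(y ~ a, weights = w)
coef(fit)["a"]               #near the true effect of 2
```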

Below we estimate the effect of A on Y_C in the matched and sampling weighted sample, adjusting for the covariates to improve precision and decrease bias.

md_F_s <- match_data(mF_s)

fit <- lm(Y_C ~ A * (X1 + X2 + X3 + X4 + X5 + 
                       X6 + X7 + X8 + X9), data = md_F_s,
          weights = weights)

library("marginaleffects")
avg_comparisons(fit,
                variables = "A",
                vcov = ~subclass,
                newdata = subset(A == 1),
                wts = "SW")

Note that match_data() and get_matches() have the option include.s.weights, which, when set to FALSE, makes the returned weights equal to the matching weights alone, without the sampling weights incorporated. Because one might forget to multiply the two sets of weights together, it is easier to use the default of include.s.weights = TRUE and ignore the sampling weights in the rest of the analysis (because they are already included in the returned weights).
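
A sketch of the relationship, assuming the mF_s object from the examples above (up to any internal rescaling MatchIt applies, the default weights are the unit-wise product):

```r
#Matching weights only:
md_m  <- match_data(mF_s, include.s.weights = FALSE)

#Default: matching weights times sampling weights:
md_ms <- match_data(mF_s)

#The default weights fold the SW column in:
head(cbind(md_m$weights * md_m$SW, md_ms$weights))
```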

Code to Generate Data used in Examples

#Generating data similar to Austin (2009) for demonstrating 
#treatment effect estimation with sampling weights
gen_X <- function(n) {
  X <- matrix(rnorm(9 * n), nrow = n, ncol = 9)
  X[,5] <- as.numeric(X[,5] < .5)
  X
}

#~20% treated
gen_A <- function(X) {
  LP_A <- - 1.2 + log(2)*X[,1] - log(1.5)*X[,2] + log(2)*X[,4] - log(2.4)*X[,5] + 
    log(2)*X[,7] - log(1.5)*X[,8]
  P_A <- plogis(LP_A)
  rbinom(nrow(X), 1, P_A)
}
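
The "~20% treated" figure can be checked directly by averaging the implied treatment probabilities in a large simulated sample (a quick numerical check, not part of the original code; the linear predictor is repeated so the snippet runs on its own):

```r
set.seed(123)
Xbig <- matrix(rnorm(9 * 1e5), ncol = 9)
Xbig[, 5] <- as.numeric(Xbig[, 5] < .5)

LP <- -1.2 + log(2)*Xbig[,1] - log(1.5)*Xbig[,2] + log(2)*Xbig[,4] -
  log(2.4)*Xbig[,5] + log(2)*Xbig[,7] - log(1.5)*Xbig[,8]

mean(plogis(LP))   #close to .2
```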

# Continuous outcome
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] + rnorm(length(A), 0, 5)
}
#Conditional:
#  MD: 2
#Marginal:
#  MD: 2
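
Because A enters gen_Y_C() additively with no treatment-covariate interactions, the unit-level effect is 2 for every unit, which is why the conditional and marginal mean differences coincide at 2. A quick simulation check, repeating the definition so the snippet is self-contained:

```r
gen_Y_C <- function(A, X) {
  2*A + 2*X[,1] + 2*X[,2] + 2*X[,3] + 1*X[,4] + 2*X[,5] + 1*X[,6] +
    rnorm(length(A), 0, 5)
}

set.seed(42)
Xbig <- matrix(rnorm(9 * 1e5), ncol = 9)
Xbig[, 5] <- as.numeric(Xbig[, 5] < .5)

y1 <- gen_Y_C(rep(1, nrow(Xbig)), Xbig)  #potential outcomes under A = 1
y0 <- gen_Y_C(rep(0, nrow(Xbig)), Xbig)  #potential outcomes under A = 0
mean(y1) - mean(y0)                      #close to 2
```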

gen_SW <- function(X) {
  e <- rbinom(nrow(X), 1, .3)
  1/plogis(log(1.4)*X[,2] + log(.7)*X[,4] + log(.9)*X[,6] + log(1.5)*X[,8] + log(.9)*e +
             -log(.5)*e*X[,2] + log(.6)*e*X[,4])
}
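
Because gen_SW() returns 1/plogis(.) and plogis() is strictly between 0 and 1, every generated sampling weight exceeds 1, as inverse-probability-style weights should. A quick check, repeating the definition so the snippet runs on its own:

```r
gen_SW <- function(X) {
  e <- rbinom(nrow(X), 1, .3)
  1/plogis(log(1.4)*X[,2] + log(.7)*X[,4] + log(.9)*X[,6] + log(1.5)*X[,8] +
             log(.9)*e - log(.5)*e*X[,2] + log(.6)*e*X[,4])
}

set.seed(7)
Xbig <- matrix(rnorm(9 * 1e4), ncol = 9)
SW <- gen_SW(Xbig)
range(SW)   #both endpoints above 1
```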

set.seed(19599)

n <- 2000
X <- gen_X(n)
A <- gen_A(X)
SW <- gen_SW(X)

Y_C <- gen_Y_C(A, X)

d <- data.frame(A, X, Y_C, SW)

References

Austin, Peter C., Nathaniel Jembere, and Maria Chiu. 2016. “Propensity Score Matching and Complex Surveys.” Statistical Methods in Medical Research 27 (4): 1240–57. https://doi.org/10.1177/0962280216658920.
DuGoff, Eva H., Megan Schuler, and Elizabeth A. Stuart. 2014. “Generalizing Observational Study Results: Applying Propensity Score Methods to Complex Surveys.” Health Services Research 49 (1): 284–303. https://doi.org/10.1111/1475-6773.12090.
Lenis, David, Trang Quynh Nguyen, Nianbo Dong, and Elizabeth A. Stuart. 2019. “It’s All about Balance: Propensity Score Matching in the Context of Complex Survey Data.” Biostatistics 20 (1): 147–63. https://doi.org/10.1093/biostatistics/kxx063.
MatchIt/man/method_exact.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/matchit2exact.R
\name{method_exact}
\alias{method_exact}
\title{Exact Matching}
\arguments{
\item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be used in creating the subclasses defined by a full cross of the covariate levels.}

\item{data}{a data frame containing the variables named in \code{formula}. If not found in \code{data}, the variables will be sought in the environment.}

\item{method}{set here to \code{"exact"}.}

\item{estimand}{a string containing the desired estimand. Allowable options include \code{"ATT"}, \code{"ATC"}, and \code{"ATE"}. The estimand controls how the weights are computed; see the Computing Weights section at \code{\link[=matchit]{matchit()}} for details.}

\item{s.weights}{the variable containing sampling weights to be incorporated into balance statistics.
These weights do not affect the matching process.} \item{verbose}{\code{logical}; whether information about the matching process should be printed to the console.} \item{\dots}{ignored. The arguments \code{distance} (and related arguments), \code{exact}, \code{mahvars}, \code{discard} (and related arguments), \code{replace}, \code{m.order}, \code{caliper} (and related arguments), and \code{ratio} are ignored with a warning.} } \description{ In \code{\link[=matchit]{matchit()}}, setting \code{method = "exact"} performs exact matching. With exact matching, a complete cross of the covariates is used to form subclasses defined by each combination of the covariate levels. Any subclass that doesn't contain both treated and control units is discarded, leaving only subclasses containing treatment and control units that are exactly equal on the included covariates. The benefits of exact matching are that confounding due to the covariates included is completely eliminated, regardless of the functional form of the treatment or outcome models. The problem is that typically many units will be discarded, sometimes dramatically reducing precision and changing the target population of inference. To use exact matching in combination with another matching method (i.e., to exact match on some covariates and some other form of matching on others), use the \code{exact} argument with that method. This page details the allowable arguments with \code{method = "exact"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified. Below is how \code{matchit()} is used for exact matching: \preformatted{ matchit(formula, data = NULL, method = "exact", estimand = "ATT", s.weights = NULL, verbose = FALSE, ...) } } \section{Outputs}{ All outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "exact"} except for \code{match.matrix}. 
This is because matching strata are not indexed by treated units as they are in some other forms of matching. \code{include.obj} is ignored.
}
\examples{
data("lalonde")

# Exact matching on age, race, married, and educ
m.out1 <- matchit(treat ~ age + race + married + educ,
                  data = lalonde, method = "exact")
m.out1
summary(m.out1)
}
\references{
In a manuscript, you don't need to cite another package when using \code{method = "exact"} because the matching is performed completely within \emph{MatchIt}. For example, a sentence might read:

\emph{Exact matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R.}
}
\seealso{
\code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}. The \code{exact} argument can be used with other methods to perform exact matching in combination with other matching methods.

\link{method_cem} for coarsened exact matching, which performs exact matching on coarsened versions of the covariates.
}

MatchIt/man/match_data.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/match_data.R
\name{match_data}
\alias{match_data}
\alias{match.data}
\alias{get_matches}
\title{Construct a matched dataset from a \code{matchit} object}
\usage{
match_data(
  object,
  group = "all",
  distance = "distance",
  weights = "weights",
  subclass = "subclass",
  data = NULL,
  include.s.weights = TRUE,
  drop.unmatched = TRUE
)

match.data(...)

get_matches(
  object,
  distance = "distance",
  weights = "weights",
  subclass = "subclass",
  id = "id",
  data = NULL,
  include.s.weights = TRUE
)
}
\arguments{
\item{object}{a \code{matchit} object; the output of a call to \code{\link[=matchit]{matchit()}}.}

\item{group}{which group should comprise the matched dataset: \code{"all"} for all units, \code{"treated"} for just treated units, or \code{"control"} for just control units.
Default is \code{"all"}.} \item{distance}{a string containing the name that should be given to the variable containing the distance measure in the data frame output. Default is \code{"distance"}, but \code{"prop.score"} or similar might be a good alternative if propensity scores were used in matching. Ignored if a distance measure was not supplied or estimated in the call to \code{matchit()}.} \item{weights}{a string containing the name that should be given to the variable containing the matching weights in the data frame output. Default is \code{"weights"}.} \item{subclass}{a string containing the name that should be given to the variable containing the subclasses or matched pair membership in the data frame output. Default is \code{"subclass"}.} \item{data}{a data frame containing the original dataset to which the computed output variables (\code{distance}, \code{weights}, and/or \code{subclass}) should be appended. If empty, \code{match_data()} and \code{get_matches()} will attempt to find the dataset using the environment of the \code{matchit} object, which can be unreliable; see Notes.} \item{include.s.weights}{\code{logical}; whether to multiply the estimated weights by the sampling weights supplied to \code{matchit()}, if any. Default is \code{TRUE}. If \code{FALSE}, the weights in the \code{match_data()} or \code{get_matches()} output should be multiplied by the sampling weights before being supplied to the function estimating the treatment effect in the matched data.} \item{drop.unmatched}{\code{logical}; whether the returned data frame should contain all units (\code{FALSE}) or only units that were matched (i.e., have a matching weight greater than zero) (\code{TRUE}). Default is \code{TRUE} to drop unmatched units.} \item{\dots}{arguments passed to \code{match_data()}.} \item{id}{a string containing the name that should be given to the variable containing the unit IDs in the data frame output. Default is \code{"id"}. 
Only used with \code{get_matches()}; for \code{match_data()}, the unit IDs are stored in the row names of the returned data frame.}
}
\value{
A data frame containing the data supplied in the \code{data} argument or in the original call to \code{matchit()} with the computed output variables appended as additional columns, named according to the arguments above. For \code{match_data()}, the \code{group} and \code{drop.unmatched} arguments control whether only subsets of the data are returned. See Details above for how \code{match_data()} and \code{get_matches()} differ. Note that \code{get_matches()} sorts the data by subclass and treatment status, unlike \code{match_data()}, which uses the order of the data.

The returned data frame will contain the variables in the original data set or dataset supplied to \code{data} and the following columns:

\item{distance}{The propensity score, if estimated or supplied to the \code{distance} argument in \code{matchit()} as a vector.}

\item{weights}{The computed matching weights. These must be used in effect estimation to correctly incorporate the matching.}

\item{subclass}{Matching strata membership. Units with the same value are in the same stratum.}

\item{id}{The ID of each unit, corresponding to the row names in the original data or dataset supplied to \code{data}. Only included in \code{get_matches()} output. This column can be used to identify which rows belong to the same unit since the same unit may appear multiple times if reused in matching with replacement.}

These columns will take on the name supplied to the corresponding arguments in the call to \code{match_data()} or \code{get_matches()}. See Examples for an example of renaming the \code{distance} column to \code{"prop.score"}.

If \code{data} or the original dataset supplied to \code{matchit()} was a \code{data.table} or \code{tbl}, the \code{match_data()} output will have the same class, but the \code{get_matches()} output will always be a base R \code{data.frame}.
In addition to their base class (e.g., \code{data.frame} or \code{tbl}), returned objects have the class \code{matchdata} or \code{getmatches}. This class is important when using \code{\link[=rbind.matchdata]{rbind()}} to append matched datasets. } \description{ \code{match_data()} and \code{get_matches()} create a data frame with additional variables for the distance measure, matching weights, and subclasses after matching. This dataset can be used to estimate treatment effects after matching or subclassification. \code{get_matches()} is most useful after matching with replacement; otherwise, \code{match_data()} is more flexible. See Details below for the difference between them. } \details{ \code{match_data()} creates a dataset with one row per unit. It will be identical to the dataset supplied except that several new columns will be added containing information related to the matching. When \code{drop.unmatched = TRUE}, the default, units with weights of zero, which are those units that were discarded by common support or the caliper or were simply not matched, will be dropped from the dataset, leaving only the subset of matched units. The idea is for the output of \code{match_data()} to be used as the dataset input in calls to \code{glm()} or similar to estimate treatment effects in the matched sample. It is important to include the weights in the estimation of the effect and its standard error. The subclass column, when created, contains pair or subclass membership and should be used to estimate the effect and its standard error. Subclasses will only be included if there is a \code{subclass} component in the \code{matchit} object, which does not occur with matching with replacement, in which case \code{get_matches()} should be used. See \code{vignette("estimating-effects")} for information on how to use \code{match_data()} output to estimate effects. \code{match.data()} is an alias for \code{match_data()}. 
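As a concrete illustration of the workflow described above, the following sketch matches, extracts the matched dataset, and fits an outcome model using the matching weights. The covariate and outcome choices are illustrative only, and the simple \code{lm()} fit is a sketch; see \code{vignette("estimating-effects")} for how to obtain valid standard errors that account for pairing.

```r
library("MatchIt")
data("lalonde")

# 1:1 nearest neighbor propensity score matching (illustrative covariates)
m.out <- matchit(treat ~ age + educ + race + re74 + re75,
                 data = lalonde)

# Matched dataset with distance, weights, and subclass columns appended;
# supplying `data` explicitly is the recommended, reliable usage
m.data <- match_data(m.out, data = lalonde)

# The matching weights must be included when estimating the effect
fit <- lm(re78 ~ treat, data = m.data, weights = weights)
coef(fit)["treat"]
```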
\code{get_matches()} is similar to \code{match_data()}; the primary difference occurs when matching is performed with replacement, i.e., when units do not belong to a single matched pair. In this case, the output of \code{get_matches()} will be a dataset that contains one row per unit for each pair they are a part of. For example, if matching was performed with replacement and a control unit was matched to two treated units, that control unit will have two rows in the output dataset, one for each pair it is a part of. Weights are computed for each row, and, for control units, are equal to the inverse of the number of control units in each control unit's subclass; treated units get a weight of 1. Unmatched units are dropped. An additional column with unit IDs will be created (named using the \code{id} argument) to identify when the same unit is present in multiple rows. This dataset structure allows for the inclusion of both subclass membership and repeated use of units, unlike the output of \code{match_data()}, which lacks subclass membership when matching is done with replacement. A \code{match.matrix} component of the \code{matchit} object must be present to use \code{get_matches()}; in some forms of matching, it is absent, in which case \code{match_data()} should be used instead. See \code{vignette("estimating-effects")} for information on how to use \code{get_matches()} output to estimate effects after matching with replacement. } \note{ The most common way to use \code{match_data()} and \code{get_matches()} is by supplying just the \code{matchit} object, e.g., as \code{match_data(m.out)}. A data set will first be searched in the environment of the \code{matchit} formula, then in the calling environment of \code{match_data()} or \code{get_matches()}, and finally in the \code{model} component of the \code{matchit} object if a propensity score was estimated. 
When called from an environment different from the one in which \code{matchit()} was originally called and a propensity score was not estimated (or was but with \code{discard} not \code{"none"} and \code{reestimate = TRUE}), this syntax may not work because the original dataset used to construct the matched dataset will not be found. This can occur when \code{matchit()} was run within an \code{\link[=lapply]{lapply()}} or \code{purrr::map()} call. The solution, which is recommended in all cases, is simply to supply the original dataset to the \code{data} argument of \code{match_data()}, e.g., as \code{match_data(m.out, data = original_data)}, as demonstrated in the Examples.
}
\examples{
data("lalonde")

# 4:1 matching w/replacement
m.out1 <- matchit(treat ~ age + educ + married + race + nodegree +
                    re74 + re75,
                  data = lalonde, replace = TRUE,
                  caliper = .05, ratio = 4)

m.data1 <- match_data(m.out1, data = lalonde,
                      distance = "prop.score")
dim(m.data1) #one row per matched unit
head(m.data1, 10)

g.matches1 <- get_matches(m.out1, data = lalonde,
                          distance = "prop.score")
dim(g.matches1) #multiple rows per matched unit
head(g.matches1, 10)
}
\seealso{
\code{\link[=matchit]{matchit()}}; \code{\link[=rbind.matchdata]{rbind.matchdata()}}

\code{vignette("estimating-effects")} for uses of \code{match_data()} and \code{get_matches()} in estimating treatment effects.
}

MatchIt/man/method_optimal.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/matchit2optimal.R
\name{method_optimal}
\alias{method_optimal}
\title{Optimal Pair Matching}
\arguments{
\item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be used in creating the distance measure used in the matching. This formula will be supplied to the functions that estimate the distance measure.}

\item{data}{a data frame containing the variables named in \code{formula}.
If not found in \code{data}, the variables will be sought in the environment.} \item{method}{set here to \code{"optimal"}.} \item{distance}{the distance measure to be used. See \code{\link{distance}} for allowable options. Can be supplied as a distance matrix.} \item{link}{when \code{distance} is specified as a method of estimating propensity scores, an additional argument controlling the link function used in estimating the distance measure. See \code{\link{distance}} for allowable options with each option.} \item{distance.options}{a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to \code{distance}.} \item{estimand}{a string containing the desired estimand. Allowable options include \code{"ATT"} and \code{"ATC"}. See Details.} \item{exact}{for which variables exact matching should take place.} \item{mahvars}{for which variables Mahalanobis distance matching should take place when \code{distance} corresponds to a propensity score (e.g., for caliper matching or to discard units for common support). If specified, the distance measure will not be used in matching.} \item{antiexact}{for which variables anti-exact matching should take place. Anti-exact matching is processed using \pkgfun{optmatch}{antiExactMatch}.} \item{discard}{a string containing a method for discarding units outside a region of common support. Only allowed when \code{distance} is not \code{"mahalanobis"} and not a matrix.} \item{reestimate}{if \code{discard} is not \code{"none"}, whether to re-estimate the propensity score in the remaining sample prior to matching.} \item{s.weights}{the variable containing sampling weights to be incorporated into propensity score models and balance statistics.} \item{ratio}{how many control units should be matched to each treated unit for k:1 matching. 
For variable ratio matching, see section "Variable Ratio Matching" in Details below.}

\item{min.controls, max.controls}{for variable ratio matching, the minimum and maximum number of control units to be matched to each treated unit. See section "Variable Ratio Matching" in Details below.}

\item{verbose}{\code{logical}; whether information about the matching process should be printed to the console. What is printed depends on the matching method. Default is \code{FALSE} for no printing other than warnings.}

\item{\dots}{additional arguments passed to \pkgfun{optmatch}{fullmatch}. Allowed arguments include \code{tol} and \code{solver}. See the \pkgfun{optmatch}{fullmatch} documentation for details. In general, \code{tol} should be set to a low number (e.g., \code{1e-7}) to get a more precise solution (default is \code{1e-3}).

The arguments \code{replace}, \code{caliper}, and \code{m.order} are ignored with a warning.}
}
\description{
In \code{\link[=matchit]{matchit()}}, setting \code{method = "optimal"} performs optimal pair matching. The matching is optimal in the sense that the sum of the absolute pairwise distances in the matched sample is as small as possible. The method functionally relies on \pkgfun{optmatch}{fullmatch}.

Advantages of optimal pair matching include that the matching order does not need to be specified and it is less likely that extreme within-pair distances will be large, unlike with nearest neighbor matching. Generally, however, as a subset selection method, optimal pair matching tends to perform similarly to nearest neighbor matching in that similar subsets of units will be selected to be matched.

This page details the allowable arguments with \code{method = "optimal"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified.
Below is how \code{matchit()} is used for optimal pair matching:
\preformatted{
matchit(formula,
        data = NULL,
        method = "optimal",
        distance = "glm",
        link = "logit",
        distance.options = list(),
        estimand = "ATT",
        exact = NULL,
        mahvars = NULL,
        antiexact = NULL,
        discard = "none",
        reestimate = FALSE,
        s.weights = NULL,
        ratio = 1,
        min.controls = NULL,
        max.controls = NULL,
        verbose = FALSE,
        ...)
}
}
\details{
\subsection{Mahalanobis Distance Matching}{

Mahalanobis distance matching can be done one of two ways:

\enumerate{
\item{If no propensity score needs to be estimated, \code{distance} should be set to \code{"mahalanobis"}, and Mahalanobis distance matching will occur using all the variables in \code{formula}. Arguments to \code{discard} and \code{mahvars} will be ignored. For example, to perform simple Mahalanobis distance matching, the following could be run:

\preformatted{
matchit(treat ~ X1 + X2, method = "optimal",
        distance = "mahalanobis")
}

With this code, the Mahalanobis distance is computed using \code{X1} and \code{X2}, and matching occurs on this distance. The \code{distance} component of the \code{matchit()} output will be empty.
}
\item{If a propensity score needs to be estimated for common support with \code{discard}, \code{distance} should be whatever method is used to estimate the propensity score or a vector of distance measures, i.e., it should not be \code{"mahalanobis"}. Use \code{mahvars} to specify the variables used to create the Mahalanobis distance. For example, to perform Mahalanobis distance matching after discarding units outside the common support of the propensity score in both groups, the following could be run:

\preformatted{
matchit(treat ~ X1 + X2 + X3, method = "optimal",
        distance = "glm", discard = "both",
        mahvars = ~ X1 + X2)
}

With this code, \code{X1}, \code{X2}, and \code{X3} are used to estimate the propensity score (using the \code{"glm"} method, which by default is logistic regression), which is used to identify the common support.
The actual matching occurs on the Mahalanobis distance computed only using \code{X1} and \code{X2}, which are supplied to \code{mahvars}. The estimated propensity scores will be included in the \code{distance} component of the \code{matchit()} output.
}
}
}
\subsection{Estimand}{

The \code{estimand} argument controls whether control units are selected to be matched with treated units (\code{estimand = "ATT"}) or treated units are selected to be matched with control units (\code{estimand = "ATC"}). The "focal" group (e.g., the treated units for the ATT) is typically made to be the smaller treatment group, and a warning will be thrown if it is not. Setting \code{estimand = "ATC"} is equivalent to swapping all treated and control labels for the treatment variable. When \code{estimand = "ATC"}, the \code{match.matrix} component of the output will have the names of the control units as the rownames and be filled with the names of the matched treated units (opposite to when \code{estimand = "ATT"}). Note that the argument supplied to \code{estimand} doesn't necessarily correspond to the estimand actually targeted; it is merely a switch to trigger which treatment group is considered "focal".
}
\subsection{Variable Ratio Matching}{

\code{matchit()} can perform variable ratio matching, which involves matching a different number of control units to each treated unit. When \code{ratio > 1}, rather than requiring all treated units to receive \code{ratio} matches, the arguments to \code{max.controls} and \code{min.controls} can be specified to control the maximum and minimum number of matches each treated unit can have. \code{ratio} controls how many total control units will be matched: \code{n1 * ratio} control units will be matched, where \code{n1} is the number of treated units, yielding the same total number of matched controls as fixed ratio matching does. Variable ratio matching can be used with any \code{distance} specification.
\code{ratio} does not have to be an integer but must be greater than 1 and less than \code{n0/n1}, where \code{n0} and \code{n1} are the number of control and treated units, respectively. Setting \code{ratio = n0/n1} performs a restricted form of full matching where all control units are matched. If \code{min.controls} is not specified, it is set to 1 by default. \code{min.controls} must be less than \code{ratio}, and \code{max.controls} must be greater than \code{ratio}. See the Examples section of \code{\link[=method_nearest]{method_nearest()}} for an example of their use, which is the same as it is with optimal matching.
}
}
\note{
Optimal pair matching is a restricted form of optimal full matching where the number of treated units in each subclass is equal to 1, whereas in unrestricted full matching, multiple treated units can be assigned to the same subclass. \pkgfun{optmatch}{pairmatch} is simply a wrapper for \pkgfun{optmatch}{fullmatch}, which performs optimal full matching and is the workhorse for \code{\link{method_full}}. In the same way, \code{matchit()} uses \code{optmatch::fullmatch()} under the hood, imposing the restrictions that make optimal full matching function like optimal pair matching (which is simply to set \code{min.controls >= 1} and to pass \code{ratio} to the \code{mean.controls} argument). This distinction is not important for regular use but may be of interest to those examining the source code.

The option \code{"optmatch_max_problem_size"} is automatically set to \code{Inf} during the matching process, different from its default in \emph{optmatch}. This enables matching problems of any size to be run, but may also let huge, infeasible problems get through and potentially take a long time or crash R. See \pkgfun{optmatch}{setMaxProblemSize} for more details.

A preprocessing algorithm described by Sävje (2020; \doi{10.1214/19-STS739}) is used to improve the speed of the matching when 1:1 matching on a propensity score.
It does so by adding an additional constraint that guarantees a solution as optimal as the solution that would have been found without the constraint, and that constraint often dramatically reduces the size of the matching problem at no cost. However, this may introduce differences between the results obtained by \emph{MatchIt} and by \emph{optmatch}, though such differences will shrink when smaller values of \code{tol} are used. } \section{Outputs}{ All outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "optimal"}. When \code{include.obj = TRUE} in the call to \code{matchit()}, the output of the call to \code{optmatch::fullmatch()} will be included in the output. When \code{exact} is specified, this will be a list of such objects, one for each stratum of the \code{exact} variables. } \examples{ \dontshow{if (requireNamespace("optmatch", quietly = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} data("lalonde") #1:1 optimal PS matching with exact matching on race m.out1 <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde, method = "optimal", exact = ~race) m.out1 summary(m.out1) #2:1 optimal matching on the scaled Euclidean distance m.out2 <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde, method = "optimal", ratio = 2, distance = "scaled_euclidean") m.out2 summary(m.out2, un = FALSE) \dontshow{\}) # examplesIf} } \references{ In a manuscript, be sure to cite the following paper if using \code{matchit()} with \code{method = "optimal"}: Hansen, B. B., & Klopfer, S. O. (2006). Optimal Full Matching and Related Designs via Network Flows. Journal of Computational and Graphical Statistics, 15(3), 609–627. 
\doi{10.1198/106186006X137047}

For example, a sentence might read:

\emph{Optimal pair matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R, which calls functions from the optmatch package (Hansen & Klopfer, 2006).}
}
\seealso{
\code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}.

\pkgfun{optmatch}{fullmatch}, which is the workhorse.

\code{\link{method_full}} for optimal full matching, of which optimal pair matching is a special case, and which relies on similar machinery.
}

MatchIt/man/summary.matchit.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/summary.matchit.R
\name{summary.matchit}
\alias{summary.matchit}
\alias{summary.matchit.subclass}
\alias{print.summary.matchit}
\alias{print.summary.matchit.subclass}
\title{View a balance summary of a \code{matchit} object}
\usage{
\method{summary}{matchit}(
  object,
  interactions = FALSE,
  addlvariables = NULL,
  standardize = TRUE,
  data = NULL,
  pair.dist = TRUE,
  un = TRUE,
  improvement = FALSE,
  ...
)

\method{summary}{matchit.subclass}(
  object,
  interactions = FALSE,
  addlvariables = NULL,
  standardize = TRUE,
  data = NULL,
  pair.dist = FALSE,
  subclass = FALSE,
  un = TRUE,
  improvement = FALSE,
  ...
)

\method{print}{summary.matchit}(x, digits = max(3, getOption("digits") - 3), ...)
}
\arguments{
\item{object}{a \code{matchit} object; the output of a call to \code{\link[=matchit]{matchit()}}.}

\item{interactions}{\code{logical}; whether to compute balance statistics for two-way interactions and squares of covariates. Default is \code{FALSE}.}

\item{addlvariables}{additional variables for which balance statistics are to be computed along with the covariates in the \code{matchit} object.
Can be entered in one of three ways: as a data frame of covariates with as many rows as there were units in the original \code{matchit()} call, as a string containing the names of variables in \code{data}, or as a right-sided \code{formula} with the additional variables (and possibly their transformations) found in \code{data}, the environment, or the \code{matchit} object. Balance on squares and interactions of the additional variables will be included if \code{interactions = TRUE}.}

\item{standardize}{\code{logical}; whether to compute standardized (\code{TRUE}) or unstandardized (\code{FALSE}) statistics. The standardized statistics are the standardized mean difference and the mean and maximum of the difference in the (weighted) empirical cumulative distribution functions (ECDFs). The unstandardized statistics are the raw mean difference and the mean and maximum of the quantile-quantile (QQ) difference. Variance ratios are produced either way. See Details below. Default is \code{TRUE}.}

\item{data}{an optional data frame containing variables named in \code{addlvariables} if specified as a string or formula.}

\item{pair.dist}{\code{logical}; whether to compute average absolute pair distances. For matching methods that don't include a \code{match.matrix} component in the output (i.e., exact matching, coarsened exact matching, full matching, and subclassification), computing pair differences can take a long time, especially for large datasets and with many covariates. For other methods (i.e., nearest neighbor, optimal, and genetic matching), computation is fairly quick. Default is \code{FALSE} for subclassification and \code{TRUE} otherwise.}

\item{un}{\code{logical}; whether to compute balance statistics for the unmatched sample. Default \code{TRUE}; set to \code{FALSE} for more concise output.}

\item{improvement}{\code{logical}; whether to compute the percent reduction in imbalance. Default \code{FALSE}.
Ignored if \code{un = FALSE}.} \item{\dots}{ignored.} \item{subclass}{after subclassification, whether to display balance for individual subclasses, and, if so, for which ones. Can be \code{TRUE} (display balance for all subclasses), \code{FALSE} (display balance only in aggregate), or the indices (e.g., \code{1:6}) of the specific subclasses for which to display balance. When anything other than \code{FALSE}, aggregate balance statistics will not be displayed. Default is \code{FALSE}.} \item{x}{a \code{summary.matchit} or \code{summary.matchit.subclass} object; the output of a call to \code{summary()}.} \item{digits}{the number of digits to round balance statistics to.} } \value{ For \code{matchit} objects, a \code{summary.matchit} object, which is a list with the following components: \item{call}{the original call to \code{\link[=matchit]{matchit()}}} \item{nn}{a matrix of the sample sizes in the original (unmatched) and matched samples} \item{sum.all}{if \code{un = TRUE}, a matrix of balance statistics for each covariate in the original (unmatched) sample} \item{sum.matched}{a matrix of balance statistics for each covariate in the matched sample} \item{reduction}{if \code{improvement = TRUE}, a matrix of the percent reduction in imbalance for each covariate in the matched sample} For \code{matchit.subclass} objects, a \code{summary.matchit.subclass} object, which is a list as above containing the following components: \item{call}{the original call to \code{\link[=matchit]{matchit()}}} \item{sum.all}{if \code{un = TRUE}, a matrix of balance statistics for each covariate in the original sample} \item{sum.subclass}{if \code{subclass} is not \code{FALSE}, a list of matrices of balance statistics for each subclass} \item{sum.across}{a matrix of balance statistics for each covariate computed using the subclassification weights} \item{reduction}{if \code{improvement = TRUE}, a matrix of the percent reduction in imbalance for each covariate in the matched sample}
\item{qn}{a matrix of sample sizes within each subclass} \item{nn}{a matrix of the sample sizes in the original (unmatched) and matched samples} } \description{ Computes and prints balance statistics for \code{matchit} and \code{matchit.subclass} objects. Balance should be assessed to ensure the matching or subclassification was effective at eliminating treatment group imbalance and should be reported in the write-up of the results of the analysis. } \details{ \code{summary()} computes a balance summary of a \code{matchit} object. This includes balance before and after matching or subclassification, as well as the percent improvement in balance. The variables for which balance statistics are computed are those included in the \code{formula}, \code{exact}, and \code{mahvars} arguments to \code{\link[=matchit]{matchit()}}, as well as the distance measure if \code{distance} was supplied as a numeric vector or a method of estimating propensity scores. The \code{X} component of the \code{matchit} object is used to supply the covariates. The standardized mean differences are computed both before and after matching or subclassification as the difference in treatment group means divided by a standardization factor computed in the unmatched (original) sample. The standardization factor depends on the argument supplied to \code{estimand} in \code{matchit()}: for \code{"ATT"}, it is the standard deviation in the treated group; for \code{"ATC"}, it is the standard deviation in the control group; for \code{"ATE"}, it is the square root of the average of the variances within each treatment group. The post-matching mean difference is computed with weighted means in the treatment groups using the matching or subclassification weights. The variance ratio is computed as the ratio of the treatment group variances. Variance ratios are not computed for binary variables because their variance is a function solely of their mean.
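As a rough illustration of the standardized mean difference described above (a base-R sketch, not MatchIt's internal code; the names `smd`, `x`, `treat`, and `w` are ours):

```r
# Sketch of the standardized mean difference. `x` is a covariate,
# `treat` a 0/1 treatment indicator, and `w` matching weights; the
# standardization factor `s` always comes from the unmatched sample.
smd <- function(x, treat, w = rep(1, length(x)), estimand = "ATT") {
  s <- switch(estimand,
              ATT = sd(x[treat == 1]),
              ATC = sd(x[treat == 0]),
              ATE = sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2))
  # Weighted means in the (possibly matched) sample
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  (m1 - m0) / s
}

treat <- c(0, 0, 0, 1, 1, 1)
x <- c(1, 2, 3, 2, 3, 4)
smd(x, treat)  # (3 - 2) / sd(c(2, 3, 4)) = 1
```

Passing the matching weights as `w` gives the post-matching version, while the denominator stays fixed at its pre-matching value, so before/after values are on the same scale.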
After matching, weighted variances are computed using the formula used in \code{\link[=cov.wt]{cov.wt()}}. The percent reduction in bias is computed using the log of the variance ratios. The eCDF difference statistics are computed by creating a (weighted) eCDF for each group and taking the difference between them for each covariate value. The eCDF is a function that outputs the (weighted) proportion of units with covariate values at or lower than the input value. The maximum eCDF difference is the same thing as the Kolmogorov-Smirnov statistic. The values are bounded at zero and one, with values closer to zero indicating good overlap between the covariate distributions in the treated and control groups. For binary variables, all eCDF differences are equal to the (weighted) difference in proportion and are computed that way. The QQ difference statistics are computed by creating two samples of the same size by interpolating the values of the larger one. The values are arranged in order for each sample. The QQ difference for each quantile is the difference between the observed covariate values at that quantile between the two groups. The difference is on the scale of the original covariate. Values close to zero indicate good overlap between the covariate distributions in the treated and control groups. A weighted interpolation is used for post-matching QQ differences. For binary variables, all QQ differences are equal to the (weighted) difference in proportion and are computed that way. The pair distance is the average of the absolute differences of a variable between pairs. For example, if a treated unit was paired with four control units, that set of units would contribute four absolute differences to the average. Within a subclass, each combination of treated and control unit forms a pair that contributes once to the average. The pair distance is described in Stuart and Green (2008) and is the value that is minimized when using optimal (full) matching. 
When \code{standardize = TRUE}, the standardized versions of the variables are used, where the standardization factor is as described above for the standardized mean differences. Pair distances are not computed in the unmatched sample (because there are no pairs). Because pair distance can take a while to compute, especially with large datasets or for many covariates, setting \code{pair.dist = FALSE} is one way to speed up \code{summary()}. The effective sample size (ESS) is a measure of the size of a hypothetical unweighted sample with roughly the same precision as a weighted sample. When non-uniform matching weights are computed (e.g., as a result of full matching, matching with replacement, or subclassification), the ESS can be used to quantify the potential precision remaining in the matched sample. The ESS will always be less than or equal to the matched sample size, reflecting the loss in precision due to using the weights. With non-uniform weights, it is printed in the sample size table; otherwise, it is removed because it does not contain additional information above the matched sample size. After subclassification, the aggregate balance statistics are computed using the subclassification weights rather than averaging across subclasses. All balance statistics (except pair differences) are computed incorporating the sampling weights supplied to \code{matchit()}, if any. The unadjusted balance statistics include the sampling weights and the adjusted balance statistics use the matching weights multiplied by the sampling weights. When printing, \code{NA} values are replaced with periods (\code{.}), and the pair distance column in the unmatched and percent balance improvement components of the output are omitted. 
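The ESS described above is the usual Kish formula; a one-line sketch (the function name `ess` is ours, not part of MatchIt):

```r
# Effective sample size of a weighted sample: sum(w)^2 / sum(w^2).
# Uniform weights return the actual sample size; non-uniform weights
# always return something smaller.
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 100))        # 100: no precision lost
ess(c(rep(1, 99), 50))  # far below 100: one extreme weight dominates
```

This makes concrete why full matching or subclassification can leave all units in the sample yet still cost precision: the more variable the weights, the smaller the ESS relative to the matched sample size.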
} \examples{ data("lalonde") m.out <- matchit(treat ~ age + educ + married + race + re74, data = lalonde, method = "nearest", exact = ~ married, replace = TRUE) summary(m.out, interactions = TRUE) s.out <- matchit(treat ~ age + educ + married + race + nodegree + re74 + re75, data = lalonde, method = "subclass") summary(s.out, addlvariables = ~log(age) + I(re74==0)) summary(s.out, subclass = TRUE) } \seealso{ \code{\link[=summary]{summary()}} for the generic method; \code{\link[=plot.summary.matchit]{plot.summary.matchit()}} for making a Love plot from \code{summary()} output. \pkgfun{cobalt}{bal.tab.matchit}, which also displays balance for \code{matchit} objects. } MatchIt/man/method_full.Rd0000644000176200001440000002571014740562365015170 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/matchit2full.R \name{method_full} \alias{method_full} \title{Optimal Full Matching} \arguments{ \item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be used in creating the distance measure used in the matching. This formula will be supplied to the functions that estimate the distance measure.} \item{data}{a data frame containing the variables named in \code{formula}. If not found in \code{data}, the variables will be sought in the environment.} \item{method}{set here to \code{"full"}.} \item{distance}{the distance measure to be used. See \code{\link{distance}} for allowable options. Can be supplied as a distance matrix.} \item{link}{when \code{distance} is specified as a method of estimating propensity scores, an additional argument controlling the link function used in estimating the distance measure. 
See \code{\link{distance}} for allowable options with each method.} \item{distance.options}{a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to \code{distance}.} \item{estimand}{a string containing the desired estimand. Allowable options include \code{"ATT"}, \code{"ATC"}, and \code{"ATE"}. The estimand controls how the weights are computed; see the Computing Weights section at \code{\link[=matchit]{matchit()}} for details.} \item{exact}{for which variables exact matching should take place.} \item{mahvars}{for which variables Mahalanobis distance matching should take place when \code{distance} corresponds to a propensity score (e.g., for caliper matching or to discard units for common support). If specified, the distance measure will not be used in matching.} \item{antiexact}{for which variables anti-exact matching should take place. Anti-exact matching is processed using \pkgfun{optmatch}{antiExactMatch}.} \item{discard}{a string containing a method for discarding units outside a region of common support. Only allowed when \code{distance} corresponds to a propensity score.} \item{reestimate}{if \code{discard} is not \code{"none"}, whether to re-estimate the propensity score in the remaining sample prior to matching.} \item{s.weights}{the variable containing sampling weights to be incorporated into propensity score models and balance statistics.} \item{caliper}{the width(s) of the caliper(s) used for caliper matching. Calipers are processed by \pkgfun{optmatch}{caliper}. Positive and negative calipers are allowed. See Notes and Examples.} \item{std.caliper}{\code{logical}; when calipers are specified, whether they are in standard deviation units (\code{TRUE}) or raw units (\code{FALSE}).} \item{verbose}{\code{logical}; whether information about the matching process should be printed to the console.} \item{\dots}{additional arguments passed to \pkgfun{optmatch}{fullmatch}.
Allowed arguments include \code{min.controls}, \code{max.controls}, \code{omit.fraction}, \code{mean.controls}, \code{tol}, and \code{solver}. See the \pkgfun{optmatch}{fullmatch} documentation for details. In general, \code{tol} should be set to a low number (e.g., \code{1e-7}) to get a more precise solution. The arguments \code{replace}, \code{m.order}, and \code{ratio} are ignored with a warning.} } \description{ In \code{\link[=matchit]{matchit()}}, setting \code{method = "full"} performs optimal full matching, which is a form of subclassification wherein all units, both treatment and control (i.e., the "full" sample), are assigned to a subclass and receive at least one match. The matching is optimal in the sense that the sum of the absolute distances between the treated and control units in each subclass is as small as possible. The method relies on and is a wrapper for \pkgfun{optmatch}{fullmatch}. Advantages of optimal full matching include that the matching order is not required to be specified, units do not need to be discarded, and it is less likely that extreme within-subclass distances will be large, unlike with standard subclassification. The primary output of full matching is a set of matching weights that can be applied to the matched sample; in this way, full matching can be seen as a robust alternative to propensity score weighting, robust in the sense that the propensity score model does not need to be correct to estimate the treatment effect without bias. Note: with large samples, the optimization may fail or run very slowly; one can try using \code{\link[=method_quick]{method = "quick"}} instead, which also performs full matching but can be much faster. This page details the allowable arguments with \code{method = "full"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified.
Below is how \code{matchit()} is used for optimal full matching: \preformatted{ matchit(formula, data = NULL, method = "full", distance = "glm", link = "logit", distance.options = list(), estimand = "ATT", exact = NULL, mahvars = NULL, antiexact = NULL, discard = "none", reestimate = FALSE, s.weights = NULL, caliper = NULL, std.caliper = TRUE, verbose = FALSE, ...) } } \details{ \subsection{Mahalanobis Distance Matching}{ Mahalanobis distance matching can be done one of two ways: \enumerate{ \item{ If no propensity score needs to be estimated, \code{distance} should be set to \code{"mahalanobis"}, and Mahalanobis distance matching will occur using all the variables in \code{formula}. Arguments to \code{discard} and \code{mahvars} will be ignored, and a caliper can only be placed on named variables. For example, to perform simple Mahalanobis distance matching, the following could be run: \preformatted{ matchit(treat ~ X1 + X2, method = "full", distance = "mahalanobis") } With this code, the Mahalanobis distance is computed using \code{X1} and \code{X2}, and matching occurs on this distance. The \code{distance} component of the \code{matchit()} output will be empty. } \item{ If a propensity score needs to be estimated for any reason, e.g., for common support with \code{discard} or for creating a caliper, \code{distance} should be whatever method is used to estimate the propensity score or a vector of distance measures, i.e., it should not be \code{"mahalanobis"}. Use \code{mahvars} to specify the variables used to create the Mahalanobis distance.
For example, to perform Mahalanobis distance matching within a propensity score caliper, the following could be run: \preformatted{ matchit(treat ~ X1 + X2 + X3, method = "full", distance = "glm", caliper = .25, mahvars = ~ X1 + X2) } With this code, \code{X1}, \code{X2}, and \code{X3} are used to estimate the propensity score (using the \code{"glm"} method, which by default is logistic regression), which is used to create a matching caliper. The actual matching occurs on the Mahalanobis distance computed only using \code{X1} and \code{X2}, which are supplied to \code{mahvars}. Units whose propensity score difference is larger than the caliper will not be paired, and some treated units may therefore not receive a match. The estimated propensity scores will be included in the \code{distance} component of the \code{matchit()} output. See Examples. } } } } \note{ Calipers can only be used when \code{min.controls} is left at its default. The option \code{"optmatch_max_problem_size"} is automatically set to \code{Inf} during the matching process, different from its default in \emph{optmatch}. This enables matching problems of any size to be run, but may also let huge, infeasible problems get through and potentially take a long time or crash R. See \pkgfun{optmatch}{setMaxProblemSize} for more details. } \section{Outputs}{ All outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "full"} except for \code{match.matrix}. This is because matching strata are not indexed by treated units as they are in some other forms of matching. When \code{include.obj = TRUE} in the call to \code{matchit()}, the output of the call to \pkgfun{optmatch}{fullmatch} will be included in the output. When \code{exact} is specified, this will be a list of such objects, one for each stratum of the \code{exact} variables.
} \examples{ \dontshow{if (requireNamespace("optmatch", quietly = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} data("lalonde") # Optimal full PS matching m.out1 <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde, method = "full") m.out1 summary(m.out1) # Optimal full Mahalanobis distance matching within a PS caliper m.out2 <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde, method = "full", caliper = .01, mahvars = ~ age + educ + re74 + re75) m.out2 summary(m.out2, un = FALSE) # Optimal full Mahalanobis distance matching within calipers # of 500 on re74 and re75 m.out3 <- matchit(treat ~ age + educ + re74 + re75, data = lalonde, distance = "mahalanobis", method = "full", caliper = c(re74 = 500, re75 = 500), std.caliper = FALSE) m.out3 summary(m.out3, addlvariables = ~race + nodegree + married, data = lalonde, un = FALSE) \dontshow{\}) # examplesIf} } \references{ In a manuscript, be sure to cite the following paper if using \code{matchit()} with \code{method = "full"}: Hansen, B. B., & Klopfer, S. O. (2006). Optimal Full Matching and Related Designs via Network Flows. \emph{Journal of Computational and Graphical Statistics}, 15(3), 609–627. \doi{10.1198/106186006X137047} For example, a sentence might read: \emph{Optimal full matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R, which calls functions from the optmatch package (Hansen & Klopfer, 2006).} Theory is also developed in the following article: Hansen, B. B. (2004). Full Matching in an Observational Study of Coaching for the SAT. \emph{Journal of the American Statistical Association}, 99(467), 609–618. \doi{10.1198/016214504000000647} } \seealso{ \code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}. \pkgfun{optmatch}{fullmatch}, which is the workhorse.
\code{\link{method_optimal}} for optimal pair matching, which is a special case of optimal full matching, and which relies on similar machinery. Results from \code{method = "optimal"} can be replicated with \code{method = "full"} by setting \code{min.controls}, \code{max.controls}, and \code{mean.controls} to the desired \code{ratio}. \code{\link{method_quick}} for fast generalized quick matching, which is very similar to optimal full matching but can be dramatically faster at the expense of optimality and is less customizable. } MatchIt/man/method_subclass.Rd0000644000176200001440000001722214740562365016044 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/matchit2subclass.R \name{method_subclass} \alias{method_subclass} \title{Subclassification} \arguments{ \item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be used in creating the distance measure used in the subclassification.} \item{data}{a data frame containing the variables named in \code{formula}. If not found in \code{data}, the variables will be sought in the environment.} \item{method}{set here to \code{"subclass"}.} \item{distance}{the distance measure to be used. See \code{\link{distance}} for allowable options. Must be a vector of distance scores or the name of a method of estimating propensity scores.} \item{link}{when \code{distance} is specified as a string, an additional argument controlling the link function used in estimating the distance measure. See \code{\link{distance}} for allowable options with each option.} \item{distance.options}{a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to \code{distance}.} \item{estimand}{the target \code{estimand}. 
If \code{"ATT"}, the default, subclasses are formed based on quantiles of the distance measure in the treated group; if \code{"ATC"}, subclasses are formed based on quantiles of the distance measure in the control group; if \code{"ATE"}, subclasses are formed based on quantiles of the distance measure in the full sample. The estimand also controls how the subclassification weights are computed; see the Computing Weights section at \code{\link[=matchit]{matchit()}} for details.} \item{discard}{a string containing a method for discarding units outside a region of common support.} \item{reestimate}{if \code{discard} is not \code{"none"}, whether to re-estimate the propensity score in the remaining sample prior to subclassification.} \item{s.weights}{the variable containing sampling weights to be incorporated into propensity score models and balance statistics.} \item{verbose}{\code{logical}; whether information about the matching process should be printed to the console.} \item{\dots}{additional arguments that control the subclassification: \describe{ \item{\code{subclass}}{either the number of subclasses desired or a vector of quantiles used to divide the distance measure into subclasses. Default is 6.} \item{\code{min.n}}{ the minimum number of units of each treatment group that are to be assigned each subclass. If the distance measure is divided in such a way that fewer than \code{min.n} units of a treatment group are assigned a given subclass, units from other subclasses will be reassigned to fill the deficient subclass. Default is 1. } } The arguments \code{exact}, \code{mahvars}, \code{replace}, \code{m.order}, \code{caliper} (and related arguments), and \code{ratio} are ignored with a warning.} } \description{ In \code{\link[=matchit]{matchit()}}, setting \code{method = "subclass"} performs subclassification on the distance measure (i.e., propensity score). 
Treatment and control units are placed into subclasses based on quantiles of the propensity score in the treated group, in the control group, or overall, depending on the desired estimand. Weights are computed based on the proportion of treated units in each subclass. Subclassification implemented here does not rely on any other package. This page details the allowable arguments with \code{method = "subclass"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified. Below is how \code{matchit()} is used for subclassification: \preformatted{ matchit(formula, data = NULL, method = "subclass", distance = "glm", link = "logit", distance.options = list(), estimand = "ATT", discard = "none", reestimate = FALSE, s.weights = NULL, verbose = FALSE, ...) } } \details{ After subclassification, effect estimates can be computed separately in the subclasses and combined, or a single marginal effect can be estimated by using the weights in the full sample. When using the weights, the method is sometimes referred to as marginal mean weighting through stratification (MMWS; Hong, 2010) or fine stratification weighting (Desai et al., 2017). The weights can be interpreted just like inverse probability weights. See \code{vignette("estimating-effects")} for details. Changing \code{min.n} can change the quality of the weights. Generally, a low \code{min.n} will yield better balance because subclasses only contain units with relatively similar distance values, but may yield higher variance because extreme weights can occur due to there being few members of a treatment group in some subclasses. When \code{min.n = 0}, some subclasses may fail to contain units from both treatment groups, in which case all units in such subclasses will be dropped. Note that subclassification weights can also be estimated using \emph{WeightIt}, which provides some additional methods for estimating propensity scores.
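As a minimal sketch of how ATT-style stratification weights of this kind can be formed (an illustration under assumed conventions, not MatchIt's internal code; the name `mmws_att` is ours): treated units keep weight 1, and the controls in each subclass share that subclass's treated count.

```r
# ATT-style subclassification (MMWS) weights: treated units get 1;
# controls in subclass s get n_treated_s / n_control_s, so the
# weighted control group mirrors the treated group's distribution
# across subclasses.
mmws_att <- function(treat, subclass) {
  w <- rep(1, length(treat))
  for (s in unique(subclass)) {
    i1 <- treat == 1 & subclass == s
    i0 <- treat == 0 & subclass == s
    w[i0] <- if (any(i0)) sum(i1) / sum(i0) else 0
  }
  w
}

treat    <- c(1, 0, 0, 1, 1, 0)
subclass <- c(1, 1, 1, 2, 2, 2)
mmws_att(treat, subclass)  # 1.0 0.5 0.5 1.0 1.0 2.0
```

The control unit alone in subclass 2 stands in for two treated units and so receives weight 2, which is exactly the kind of extreme weight a larger `min.n` guards against.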
Where propensity score-estimation methods overlap, both packages will yield the same weights. } \section{Outputs}{ All outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "subclass"} except that \code{match.matrix} is excluded and one additional component, \code{q.cut}, is included, containing a vector of the distance measure cutpoints used to define the subclasses. Note that when \code{min.n > 0}, the subclass assignments may not strictly obey the quantiles listed in \code{q.cut}. \code{include.obj} is ignored. } \examples{ data("lalonde") # PS subclassification for the ATT with 7 subclasses s.out1 <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde, method = "subclass", subclass = 7) s.out1 summary(s.out1, subclass = TRUE) # PS subclassification for the ATE with 10 subclasses # and at least 2 units in each group per subclass s.out2 <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde, method = "subclass", subclass = 10, estimand = "ATE", min.n = 2) s.out2 summary(s.out2) } \references{ In a manuscript, you don't need to cite another package when using \code{method = "subclass"} because the subclassification is performed completely within \emph{MatchIt}. For example, a sentence might read: \emph{Propensity score subclassification was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R.} It may be a good idea to cite Hong (2010) or Desai et al. (2017) if the treatment effect is estimated using the subclassification weights. Desai, R. J., Rothman, K. J., Bateman, B. T., Hernandez-Diaz, S., & Huybrechts, K. F. (2017). A Propensity-score-based Fine Stratification Approach for Confounding Adjustment When Exposure Is Infrequent. \emph{Epidemiology}, 28(2), 249–257. \doi{10.1097/EDE.0000000000000595} Hong, G. (2010). Marginal mean weighting through stratification: Adjustment for selection bias in multilevel data.
\emph{Journal of Educational and Behavioral Statistics}, 35(5), 499–531. \doi{10.3102/1076998609359785} } \seealso{ \code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}. \code{\link{method_full}} for optimal full matching and \code{\link{method_quick}} for generalized full matching, which are similar to subclassification except that the number of subclasses and subclass membership are chosen to optimize the within-subclass distance. } MatchIt/man/rbind.matchdata.Rd0000644000176200001440000000632214737562474015706 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/rbind.matchdata.R \name{rbind.matchdata} \alias{rbind.matchdata} \alias{rbind.getmatches} \title{Append matched datasets together} \usage{ \method{rbind}{matchdata}(..., deparse.level = 1) \method{rbind}{getmatches}(..., deparse.level = 1) } \arguments{ \item{\dots}{Two or more \code{matchdata} or \code{getmatches} objects, the output of calls to \code{\link[=match_data]{match_data()}} and \code{\link[=get_matches]{get_matches()}}, respectively. Supplied objects must either be all \code{matchdata} objects or all \code{getmatches} objects.} \item{deparse.level}{Passed to \code{\link[=rbind]{rbind()}}.} } \value{ An object of the same class as those supplied to it (i.e., a \code{matchdata} object if \code{matchdata} objects are supplied and a \code{getmatches} object if \code{getmatches} objects are supplied). \code{\link[=rbind]{rbind()}} is called on the objects after adjusting the variables so that the appropriate method will be dispatched corresponding to the class of the original data object. } \description{ These functions are \code{\link[=rbind]{rbind()}} methods for objects resulting from calls to \code{\link[=match_data]{match_data()}} and \code{\link[=get_matches]{get_matches()}}. They function nearly identically to \code{rbind.data.frame()}; see Details for how they differ.
} \details{ \code{rbind()} appends two or more datasets row-wise. This can be useful when matching was performed separately on subsets of the original data and they are to be combined into a single dataset for effect estimation. Using the regular \code{data.frame} method for \code{rbind()} would pose a problem, however; the \code{subclass} variable would have repeated names across different datasets, even though units only belong to the subclasses in their respective datasets. \code{rbind.matchdata()} renames the subclasses so that the correct subclass membership is maintained. The supplied matched datasets must be generated from the same original dataset, that is, having the same variables in it. The added components (e.g., weights, subclass) can be named differently in different datasets but will be changed to have the same name in the output. \code{rbind.getmatches()} and \code{rbind.matchdata()} are identical. } \examples{ data("lalonde") # Matching based on race subsets m.out_b <- matchit(treat ~ age + educ + married + nodegree + re74 + re75, data = subset(lalonde, race == "black")) md_b <- match_data(m.out_b) m.out_h <- matchit(treat ~ age + educ + married + nodegree + re74 + re75, data = subset(lalonde, race == "hispan")) md_h <- match_data(m.out_h) m.out_w <- matchit(treat ~ age + educ + married + nodegree + re74 + re75, data = subset(lalonde, race == "white")) md_w <- match_data(m.out_w) #Bind the datasets together md_all <- rbind(md_b, md_h, md_w) #Subclass conflicts are avoided levels(md_all$subclass) } \seealso{ \code{\link[=match_data]{match_data()}}, \code{\link[=rbind]{rbind()}} See \code{vignette("estimating-effects")} for details on using \code{rbind()} for effect estimation after subsetting the data.
} \author{ Noah Greifer } MatchIt/man/macros/0000755000176200001440000000000014463002323013637 5ustar liggesusersMatchIt/man/macros/macros.Rd0000644000176200001440000000260414207226672015427 0ustar liggesusers% Rd macro for simplifying documentation writing \newcommand{\fun}{\code{\link[=#1]{#1()}}} % Because R packages need conditional use of packages in Suggests, any cross-reference to a doc in another package needs to be conditionally evaluated, too. %\pkgfun{}{})tests whether the package is available, and, if so, produces a cross-reference to the function in the package; if not, the function name is displayed without a cross-reference. The first argument is the package, the second is the function name, e.g., \pkgfun{optmatch}{pairmatch}. \newcommand{\pkgfun}{\ifelse{\Sexpr[results=rd,stage=render]{requireNamespace("#1", quietly = TRUE)}}{\code{\link[#1:#2]{#1::#2()}}}{\code{#1::#2()}}} %\newcommand{\pkgfun}{\code{\link[#1:#2]{#1::#2()}}} %E.g., \pkgfun{sandwich}{vcovCL} is the same as \code{\link[sandwich:vcovCL]{vcovCL}} if the sandwich package is installed and \code{vcovCL} if not. %\pkgfun2{}{}{} does the same but allows the third argument to be printed, e.g., to use text that differs from the name of the function in the new package. \newcommand{\pkgfun2}{\ifelse{\Sexpr[results=rd,stage=render]{requireNamespace("#1", quietly = TRUE)}}{\code{\link[#1:#2]{#3()}}}{\code{#3()}}} %\newcommand{\pkgfun2}{\code{\link[#1:#2]{#3()}}} %E.g., \pkgfun2{sandwich}{vcovCL}{meatCL} is the same as \code{\link[sandwich:vcovCL]{meatCL}} if the sandwich package is installed and \code{meatCL} if not. 
MatchIt/man/add_s.weights.Rd0000644000176200001440000000505514740562365015405 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/add_s.weights.R \name{add_s.weights} \alias{add_s.weights} \title{Add sampling weights to a \code{matchit} object} \usage{ add_s.weights(m, s.weights = NULL, data = NULL) } \arguments{ \item{m}{a \code{matchit} object; the output of a call to \code{\link[=matchit]{matchit()}}, typically with the \code{s.weights} argument unspecified.} \item{s.weights}{a numeric vector of sampling weights to be added to the \code{matchit} object. Can also be specified as a string containing the name of a variable in \code{data} to be used or a one-sided formula with the variable on the right-hand side (e.g., \code{~ SW}).} \item{data}{a data frame containing the sampling weights if given as a string or formula. If unspecified, \code{add_s.weights()} will attempt to find the dataset using the environment of the \code{matchit} object.} } \value{ a \code{matchit} object with an \code{s.weights} component containing the supplied sampling weights. If \code{s.weights = NULL}, the original \code{matchit} object is returned. } \description{ Adds sampling weights to a \code{matchit} object so that they are incorporated into balance assessment and creation of the weights. This would typically only be used when an argument to \code{s.weights} was not supplied to \code{\link[=matchit]{matchit()}} (i.e., because they were not to be included in the estimation of the propensity score) but sampling weights are required for generalizing an effect to the correct population.
Without adding sampling weights to the \code{matchit} object, balance assessment tools (i.e., \code{\link[=summary.matchit]{summary.matchit()}} and \code{\link[=plot.matchit]{plot.matchit()}}) will not calculate balance statistics correctly, and the weights produced by \code{\link[=match_data]{match_data()}} and \code{\link[=get_matches]{get_matches()}} will not incorporate the sampling weights. } \examples{ data("lalonde") # Generate random sampling weights, just # for this example sw <- rchisq(nrow(lalonde), 2) # NN PS match using logistic regression PS that doesn't # include sampling weights m.out <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde) m.out # Add s.weights to the matchit object m.out <- add_s.weights(m.out, sw) m.out # note additional output # Check balance; note that sample sizes incorporate # s.weights summary(m.out, improvement = FALSE) } \seealso{ \code{\link[=matchit]{matchit()}}; \code{\link[=match_data]{match_data()}} } \author{ Noah Greifer } MatchIt/man/method_cardinality.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/matchit2cardinality.R \name{method_cardinality} \alias{method_cardinality} \title{Cardinality Matching} \arguments{ \item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be balanced.} \item{data}{a data frame containing the variables named in \code{formula}. If not found in \code{data}, the variables will be sought in the environment.} \item{method}{set here to \code{"cardinality"}.} \item{estimand}{a string containing the desired estimand. Allowable options include \code{"ATT"}, \code{"ATC"}, and \code{"ATE"}. See Details.} \item{exact}{for which variables exact matching should take place.
Separate optimization will occur within each subgroup of the exact matching variables.} \item{mahvars}{which variables should be used for pairing after subset selection. Can only be set when \code{ratio} is a whole number. See Details.} \item{s.weights}{the variable containing sampling weights to be incorporated into the optimization. The balance constraints refer to the product of the sampling weights and the matching weights, and the sum of the product of the sampling and matching weights will be maximized.} \item{ratio}{the desired ratio of control to treated units. Can be set to \code{NA} to maximize sample size without concern for this ratio. See Details.} \item{verbose}{\code{logical}; whether information about the matching process should be printed to the console.} \item{\dots}{additional arguments that control the matching specification: \describe{ \item{\code{tols}}{\code{numeric}; a vector of imbalance tolerances for mean differences, one for each covariate in \code{formula}. If only one value is supplied, it is applied to all. See \code{std.tols} below. Default is \code{.05} for standardized mean differences of at most .05 for all covariates between the treatment groups in the matched sample. } \item{\code{std.tols}}{\code{logical}; whether each entry in \code{tols} corresponds to a raw or standardized mean difference. If only one value is supplied, it is applied to all. Default is \code{TRUE} for standardized mean differences. The standardization factor is the pooled standard deviation when \code{estimand = "ATE"}, the standard deviation of the treated group when \code{estimand = "ATT"}, and the standard deviation of the control group when \code{estimand = "ATC"} (the same as used in \code{\link[=summary.matchit]{summary.matchit()}}).} \item{\code{solver}}{ the name of the solver to use to solve the optimization problem.
Available options include \code{"highs"}, \code{"glpk"}, \code{"symphony"}, and \code{"gurobi"} for HiGHS (implemented in the \emph{highs} package), GLPK (implemented in the \emph{Rglpk} package), SYMPHONY (implemented in the \emph{Rsymphony} package), and Gurobi (implemented in the \emph{gurobi} package), respectively. The differences between them are in speed and solving ability. HiGHS (the default) and GLPK are the easiest to install, but Gurobi is recommended as it consistently outperforms other solvers and can find solutions even when others can't, and in less time. Gurobi is proprietary but can be used with a free trial or academic license. SYMPHONY may not produce reproducible results, even with a seed set. } \item{\code{time}}{ the maximum amount of time before the optimization routine aborts, in seconds. Default is 120 (2 minutes). For large problems, this should be set much higher. } } The arguments \code{distance} (and related arguments), \code{replace}, \code{m.order}, and \code{caliper} (and related arguments) are ignored with a warning.} } \description{ In \code{\link[=matchit]{matchit()}}, setting \code{method = "cardinality"} performs cardinality matching and other forms of matching that use mixed integer programming. Rather than forming pairs, cardinality matching selects the largest subset of units that satisfies user-supplied balance constraints on mean differences. One of several available optimization programs can be used to solve the mixed integer program. The default is the HiGHS library as implemented in the \emph{highs} package, both of which are free, but performance can be improved using Gurobi and the \emph{gurobi} package, for which there is a free academic license. This page details the allowable arguments with \code{method = "cardinality"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified. 
Below is how \code{matchit()} is used for cardinality matching: \preformatted{ matchit(formula, data = NULL, method = "cardinality", estimand = "ATT", exact = NULL, mahvars = NULL, s.weights = NULL, ratio = 1, verbose = FALSE, tols = .05, std.tols = TRUE, solver = "highs", ...) } } \details{ \subsection{Cardinality and Profile Matching}{ Two types of matching are available with \code{method = "cardinality"}: cardinality matching and profile matching. \strong{Cardinality matching} finds the largest matched set that satisfies the balance constraints between treatment groups, with the additional constraint that the ratio of the number of matched control to matched treated units is equal to \code{ratio} (1 by default), mimicking k:1 matching. When not all treated units are included in the matched set, the estimand no longer corresponds to the ATT, so cardinality matching should be avoided if retaining the ATT is desired. To request cardinality matching, \code{estimand} should be set to \code{"ATT"} or \code{"ATC"} and \code{ratio} should be set to a positive integer. 1:1 cardinality matching is the default method when no arguments are specified. \strong{Profile matching} finds the largest matched set that satisfies balance constraints between each treatment group and a specified target sample. When \code{estimand = "ATT"}, it will find the largest subset of the control units that satisfies the balance constraints with respect to the treated group, which is left intact. When \code{estimand = "ATE"}, it will find the largest subsets of the treated group and of the control group that are balanced to the overall sample. To request profile matching for the ATT, \code{estimand} should be set to \code{"ATT"} and \code{ratio} to \code{NA}. 
To request profile matching for the ATE, \code{estimand} should be set to \code{"ATE"} and \code{ratio} can be set either to \code{NA} to maximize the size of each sample independently or to a positive integer to ensure that the ratio of matched control units to matched treated units is fixed, mimicking k:1 matching. Unlike cardinality matching, profile matching retains the requested estimand if a solution is found. Neither method involves creating pairs in the matched set, but it is possible to perform an additional round of pairing within the matched sample after cardinality matching or profile matching for the ATE with a fixed whole number sample size ratio by supplying the desired pairing variables to \code{mahvars}. Doing so will trigger \link[=method_optimal]{optimal matching} using \code{optmatch::pairmatch()} on the Mahalanobis distance computed using the variables supplied to \code{mahvars}. The balance or composition of the matched sample will not change, but additional precision and robustness can be gained by forming the pairs. The weights are scaled so that the sum of the weights in each group is equal to the number of matched units in the smaller group when cardinality matching or profile matching for the ATE, and scaled so that the sum of the weights in the control group is equal to the number of treated units when profile matching for the ATT. When the sample sizes of the matched groups are the same (i.e., when \code{ratio = 1}), no scaling is done. Robust standard errors should be used in effect estimation after cardinality or profile matching (and cluster-robust standard errors if additional pairing is done in the matched sample). See \code{vignette("estimating-effects")} for more information. } \subsection{Specifying Balance Constraints}{ The balance constraints are on the (standardized) mean differences between the matched treatment groups for each covariate.
Balance constraints should be set by supplying arguments to \code{tols} and \code{std.tols}. For example, setting \code{tols = .1} and \code{std.tols = TRUE} requests that all the mean differences in the matched sample should be within .1 standard deviations for each covariate. Different tolerances can be set for different variables; it might be beneficial to constrain the mean differences for highly prognostic covariates more tightly than for other variables. For example, one could specify \verb{tols = c(.001, .05), std.tols = c(TRUE, FALSE)} to request that the standardized mean difference for the first covariate is less than .001 and the raw mean difference for the second covariate is less than .05. The values should be specified in the order they appear in \code{formula}, except when interactions are present. One can run the following code: \preformatted{MatchIt:::get_assign(model.matrix(~X1*X2 + X3, data = data))[-1]} which will output a vector of numbers and the variable to which each number corresponds; the first entry in \code{tols} corresponds to the variable labeled 1, the second to the variable labeled 2, etc. } \subsection{Dealing with Errors and Warnings}{ When the optimization cannot be solved at all, or at least within the time frame specified in the argument to \code{time}, an error or warning will appear. Unfortunately, it is hard to know exactly the cause of the failure and what measures should be taken to rectify it. A warning that says \code{"The optimizer failed to find an optimal solution in the time alotted. The returned solution may not be optimal."} usually means that an optimal solution may be possible to find with more time, in which case \code{time} should be increased or a faster solver should be used. Even with this warning, a potentially usable solution will be returned, so don't automatically take it to mean the optimization failed. 
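If this warning is encountered, the simplest first step is to re-run the matching with a larger value supplied to \code{time}. A minimal sketch (the covariates, tolerance, and the 600-second limit below are arbitrary choices for illustration, not recommendations):

\preformatted{m.out <- matchit(treat ~ age + educ + re74, data = lalonde,
                 method = "cardinality", tols = .05,
                 solver = "highs", time = 600)}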
Sometimes, when there are multiple solutions with the same resulting sample size, the optimizers will stall at one of them without recognizing it as the optimum. The result should be checked to see if it can be used as the solution. An error that says \code{"The optimization problem may be infeasible."} usually means that there is an issue with the optimization problem, i.e., that there is no possible way to satisfy the constraints. To rectify this, one can try relaxing the constraints by increasing the value of \code{tols} or using another solver. Sometimes Gurobi can solve problems that the other solvers cannot. } } \section{Outputs}{ Most outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "cardinality"}. Unless \code{mahvars} is specified, the \code{match.matrix} and \code{subclass} components are omitted because no pairing or subclassification is done. When \code{include.obj = TRUE} in the call to \code{matchit()}, the output of the optimization function will be included in the output. When \code{exact} is specified, this will be a list of such objects, one for each stratum of the exact variables.
} \examples{ \dontshow{if (requireNamespace("highs", quietly = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} data("lalonde") # Choose your solver; "gurobi" is best, "highs" is free and # easy to install \donttest{solver <- "highs" m.out1 <- matchit(treat ~ age + educ + re74, data = lalonde, method = "cardinality", estimand = "ATT", ratio = 1, tols = .2, solver = solver) m.out1 summary(m.out1) # Profile matching for the ATT m.out2 <- matchit(treat ~ age + educ + re74, data = lalonde, method = "cardinality", estimand = "ATT", ratio = NA, tols = .2, solver = solver) m.out2 summary(m.out2, un = FALSE) # Profile matching for the ATE m.out3 <- matchit(treat ~ age + educ + re74, data = lalonde, method = "cardinality", estimand = "ATE", ratio = NA, tols = .2, solver = solver) m.out3 summary(m.out3, un = FALSE)} \dontshow{\}) # examplesIf} \dontshow{if ((requireNamespace("highs", quietly = TRUE) && requireNamespace("optmatch", quietly = TRUE))) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} \donttest{# Pairing after 1:1 cardinality matching: m.out1b <- matchit(treat ~ age + educ + re74, data = lalonde, method = "cardinality", estimand = "ATT", ratio = 1, tols = .15, solver = solver, mahvars = ~ age + educ + re74) # Note that balance doesn't change but pair distances # are lower for the paired-upon variables summary(m.out1b, un = FALSE) summary(m.out1, un = FALSE)} # In these examples, a high tol was used and # few covariates matched on in order to not take too long; # with real data, tols should be much lower and more # covariates included if possible. \dontshow{\}) # examplesIf} } \references{ In a manuscript, you should reference the solver used in the optimization.
For example, a sentence might read: \emph{Cardinality matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R with the optimization performed by HiGHS (Huangfu & Hall, 2018).} See \code{vignette("matching-methods")} for more literature on cardinality matching. } \seealso{ \code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}. \emph{\CRANpkg{designmatch}}, which performs cardinality and profile matching with many more options and more flexibility. The implementations of cardinality matching differ between \emph{MatchIt} and \emph{designmatch}, so their results might differ. \emph{\CRANpkg{optweight}}, which offers similar functionality but in the context of weighting rather than matching. } MatchIt/man/MatchIt-package.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/MatchIt-package.R \docType{package} \name{MatchIt-package} \alias{MatchIt} \alias{MatchIt-package} \title{MatchIt: Nonparametric Preprocessing for Parametric Causal Inference} \description{ \if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}} Selects matched samples of the original treated and control groups with similar covariate distributions -- can be used to match exactly on covariates, to match on propensity scores, or to perform a variety of other matching procedures. The package also implements a series of recommendations offered in Ho, Imai, King, and Stuart (2007) \doi{10.1093/pan/mpl013}. (The 'gurobi' package, which is not on CRAN, is optional and comes with an installation of the Gurobi Optimizer, available at \url{https://www.gurobi.com}.)
} \seealso{ Useful links: \itemize{ \item \url{https://kosukeimai.github.io/MatchIt/} \item \url{https://github.com/kosukeimai/MatchIt} \item Report bugs at \url{https://github.com/kosukeimai/MatchIt/issues} } } \author{ \strong{Maintainer}: Noah Greifer \email{noah.greifer@gmail.com} (\href{https://orcid.org/0000-0003-3067-7154}{ORCID}) Authors: \itemize{ \item Daniel Ho \email{daniel.e.ho@gmail.com} (\href{https://orcid.org/0000-0002-2195-5469}{ORCID}) \item Kosuke Imai \email{imai@harvard.edu} (\href{https://orcid.org/0000-0002-2748-1022}{ORCID}) \item Gary King \email{king@harvard.edu} (\href{https://orcid.org/0000-0002-5327-7631}{ORCID}) \item Elizabeth Stuart \email{estuart@jhu.edu} (\href{https://orcid.org/0000-0002-9042-8611}{ORCID}) } Other contributors: \itemize{ \item Alex Whitworth \email{whitworth.alex@gmail.com} [contributor] } } \keyword{internal} MatchIt/man/figures/README-unnamed-chunk-4-1.png (binary PNG image data)
zm~+WΟm˖-Vbń!vaAX2e^gVT~^  P *ظqjժZ/K@"H׮Qk@p̙Oڈ#>0_Z^g^{\:dD@ {6m4kݺuWN:]vϦMZ:u!mڴɪUf~x‡۪U Θ1c^zZ'"# D=6~8>@@   .@|_4<;Zf8$@LJiv-Z0H  @2T':  I &MNJ  &:  I &M+._.\9 JP|ٹ5t={ҥKӥ^q_V?^ںvj_}s1TU:駟j%97@D -P tjMUs^r%vW3g?ׯ_̺v#JN:E>GOwq<7?[n]tNNKkJ<~ \ǣ8\.4p?rg hժUjzS+-K@}Yꪫ|#%ijkVɚzIB*Tqw#[dԬA=sK^ٖT~ 7XZ믿ƍ{r )n@ )lkgĈJRJ_/z+lݺ?>椣?+@Xږ_ԭ[75kV̺|)pa4 pmnݳz;F @>[hqS9  -n vꁭC^x?nU_=x^ve!`kWzok(mFge:zz~LAy NcP_}5l0+ΝD@isѩj;w%ꔤ`RyN>ݟUJ4,~LJR^xW|~@u %zGu?ʗ/wj,a @v&m@tcǚJ;WZe쳏?^zEE.* U00i%'}= %N@?5kV6 d@Zm5Ƨ@q.N6u6Rp|oLB@EB@H*DN{͛7Wg#"q#  Һtg8D^̫j]=+A;[e  i hhW~}Ӌ  [%(ݓ0Y%8+@(n.7n :  M2@@F(@@Hh*  @D4B  @*2~T X={3 $(@ (_EB@ >5A@HZ4i:VD@HF45A@HZ4i]qʔ)6z@@PΕ͛g'NΓ@@&d  (Q諯jJ_ E@H〦;Sw5|{衇L 4O?5jd^{5m4SOF@ j;v,։1T*N_|+WΆntIbŊ,L#  e6fѣծC*;|O[:uo9ŋۂ bo6c _K,.]غu;Z޽_Y|MbNG6nh۷o?n8\r̼֑ga;wLJ͛m֭1&Ǐ?@O?wt @xN6Fiwy4U۰aüi߾}}IQ|駂G4vmVzu{7})K.Gyʈ)U4oɶf͚6{bt~!q3?:z-_ܦN3O_*O ~ľ?k-ʹ >.N-AjՂO>$pRաCSNxk֬`4_xʕ+|WzuQ1y^y8[nc>AGNi„ ?v}w%W2e׼!kK}q@ /dWT ҲUwaUҶj*sAvP^N… mɫHfv_vmU׻ϿKMOB@Ht=0szF(#R|oC5LK*UzV-[+ f{ժU5P'+zoذw_8ӧ۷~TT)+8i@(@ZmٲTN*K:=u֑!SV2:Բ/2:5oܷ7Z&M|ǜ8]F A멧A^z@^tEђ@@D2+} >#ytMsy"V!ѩm۶?z}_~kjvȵR''$\pe]oѢܭ[<{c &J{=_%51b9s|x:,o9Ǟ={=Ȫ!CL%yzڶڐj8!1HI  E/m@uryj<]DGZ}y{>c:u2mٲʕ+[ӦM/mذ!rq~wݻhš4ibz-[,|v5XÆ VZvYgZ>k,;m=;ΝFD"ǥux .^h&MTСC3gFՄnӦM&0_ע A@ ]we-_|ƌc?/Fǟ5klv[lw5jXϞ='=zبQcǎ}wꫯZ~[n7ҵ^A`m<믿ڡjWzif#G[8:Μq)֮]kgq4@o>Ņ|w6m(2nݺּyg&@D e…VZ)^(a!Md!E#Kof.2dH\ep[.p}yY|yq6Xb67nԯ_?f~} $~aO pp$?%ŋW^Lqgq>4#;!=L.Heڒ%KJ=UkԨQLaΝpTTi۶ox&N:*Uqd^8q闕 8W *w}S]H"P{_`8&ձG'q?L" g}N1¶m۬lؐHR&ݿKew̥S@%,uAf̱ju%,4pmD#8s[c։pꩧ(vrfѫ7xc$Ot 2w\*>)PqU= :vھ$wv" %-c BUG}tpUWؽOcWЕcYpӾ4C ^v.ʗ^'Q/RSu(R:ި41͇ڈ6$Cǥc?6mSKä|oŕq+ gkI斄 @IwkvfꈤSՎN6w~glܸq15%#h4zv>֭"A 曘e 1Jwޱ-[{*R˵':TW5s|J业5ko`Iݯ,f>i@^x!('ثwi￿R?s|@*0LW^y 4Lre C3*x7(lT%~d<ξK}5:'):ظwq7Opg_=DwirJ=mO?y*xyzw6$p%>Ot|"ǥ\:׿L(vB;p6: 阋2 ^4a„ ?&~R/H ZE@ y0nРqTUJWcf׳lWɡ5hƍ? 
失.!HI]v1GD۷oK$s NqGS"ǥu ueY͚5:vcnIǬ\yӯE  @)ɬRG?@Haݧzi {=V+>)`ݼys:üLmMd涍0=.oOt}%EjǪy?h s%ѧicgLw1jM7䟼JB2 E-4a{1(Kb_RΒS/JZIڜU~ȃ d@tBth@@Wʅj˪$@@ dԲ|-Z^$@@ Qc@@@cE@@d@Qc@@@ݓ+# @a@ QCҳ@@d}@@    - # H@\dF@(OB*`߫W/޽{9 E!@ZY˛^$@@ Qc@@@cE@@d@y+U-Y$n{=gѢEd @@ !Єb3իW.bZj m߾v_)Slщf|r޶mmٲ4MB@ @8{駭VZIovꩧ)b'ODҼylĉd#XNlժU֪U+\5m؆ "W޽{[-I&v뭷ڲe"ׯ_o\s5lgu-]4}]~/wjԨa={'|rFWFUW_~-诽Z\՟`G_=ИQ9s=>8ҽA7nLb @ pAY%\0dȐȱ@/8Gy$2O[\I7vX #y ,^z%\n\ 4(JviJCY"]Vp 7. \tAϚy%/OܔQ9Cbsa.I䟽+prer%+r%ǁ Fc6~wuW~D7|bVXa.j_}2L*tU|Ȼ:K\iꨤR#5]Z Gu͝;׾KUヒV)ltR*%O"[ ׉WXd涭ow_;}25{챌+<'@(@Z'3i$A >uf}~א@aկ][D߱(\믿6Wg`ޕRjt6׫bŊO<RiӦi;1iģ12uL]hd[_dZԩcM%rn1+zf+Vn;$iTor ~sɾƍ+ @ dLJ.m/i˗/q=(lS:\U)Sot'6 4L3Rxȷ԰O$׹4 CrMqi%޺M:U߶z\'3 dh v=젃{ڶmy7ߤ`;/lwq͞=nVwK@@H֬YcwuY֭[7{'-"G:n4WuI7U:X_M4~7dڵkҥK2e=:|d@@6md5kִ3f:~v[6mlyXei{>XNlժUe߿j*WlM65R6l\zm-Z&Me˖E_ޮkذժUEYvO>7sӣ(;vh* ?K9sO;3 ٱkFy 7[駟>تVj۷73(l>Q _& eo/n6K7wyav_Žb +W~㎳?6l]q>y)<3mرmՄ ^It6?:嵾~E'Ǐ?gB6|ٛ^q &c7O}{JThe>UoCoSO=e[SzgS_l^z=c׾+xȑ#;xwL K*er‡IM<9SO=5yoۍ7T*Osj XD+uQvi!C_|9T+yuM]:}-_|=gchٷ~3O)T >:~x¼;AS.3w}_~&͉~{5 8.jo~뱹_ʁ.\Ig:sjFXG\IhPZU}YJYW9+ \U}JCq}.ȏg@R`ʕkv8 x&G|BB@gD=Em>XE#$ Y+2eDՆO4D1Muq"|>WiO*VJ"}8wȅ^yJUZOԩ.W"k:{nD5FNH) fsΝzNjh$@(: N_ Nj /6u.)gÃ4=p 'K%j3@r_S¤6^Hm=Ur[JbŊ|to3|P@WI:')Pn'0Xձ+گǤwyUWhG3?*]`\u~yfز @P$T!y2&EjLxW}4tR8 zkx]^e,<*>+48 Z% TթI@T&٩_$ .k(%ufRp7\'wl*utm?}iRoxÈ#|GW*g7kwzR^OB@ fRM kmhj&5{˘A:aT]I꽮@Q=(5_WظqcSu{؎BC9)Ӑ/'tow^ I㌪UmLU-P%UQ;ꫯEjoRU:F FRd+,B@2SBeҡpUAc;CWM*_XY>fTL6LL+|yћy$VCOfy{M7Yz+ @@4D{3=(^e;Ka矝)ej{Tȹů>bULB@ *1U=1Jo׭[W|;`  P@K@@ Mk! %ZD_^N@H?넔~wD;v4!JB@ MF-ӛ'8e9 IP  $/@k" $!@  $/@]֮|r[pa֞?'  -_V=n8:thV;'  -![@@(hȊ  Px@@  IH7̺-˺wu # @PW\arHK}q |VJ 8R@QjGV -Mf~#?5j԰O</^l3gδN:믿ޖ-[x[l[V|CӦM/A飏>#<{=Z[o/hܸ?. #O?KVd4 {tR6x`;蠃l}.:OOwֲeK[z8׮]k۷oM6kR!* 3л[lv9عkm۶[oFiC2e=:NcDU39:~1"z6 N -P]&*SId~|P +xnRͳA7lw=Sֺuk:uϣTJqeK/d* SZlɦi&2?ۋ/aڇrJ_O>vۻ9ot,jdRU *ܺu/Eĉ#pL7_py-N_&?/[:2# Dq}=]=kWQ>'Uk+u%2OkRJ>YK,aJnBUIyuQ Cu83f>%[}))1ST)ӹQJo*0JyzoР/U'N_&?ժUt8N|i ?ދ/x&z߻z:^HJ?pov_E3.WhAjrQ0tu|…~zF'C: 6%4o=vK(T?Rpnppޣx*T0'۞ZhU8/Bd?7Q\m.Pnm/\+|G% v?}Wht>M} ſ4DQ5>Tia&%5Oܹs&? 
rLL@5 =x|S Vɨu-D8]@ @??rL4ɷaT?w UxaRX_/]uU GnTAj@m*3-<z!?z]nݺ~lL;/83|7|Tv>,Y\d @& u|A`5lJ" `-cAI&SaR=HyJjӣAWF*iu\R^{o2?k+rKvZn@V s1 @ S/wq~W՗EpHj_5?MlN=Tݻ﨣UҪ 7L P۵kxK?X@@'PUAEpUj^E4ؽWW`qFk>CRQK21WxlNICWM0!%M7K5D @O@CPzOTbC` 6 ?I+Vhz@@r *@@t MǫϚI  @2ɨe:jVI  @2ɨ  @Iӱ"  @2ɨ  @%n%X1a;ZfOF@@ Z4ZZhaz@@dOFu@@ M@@ MFu@@ M.{W\|-\0{8s@(hsqСC9k@(h   @A@ E^@@B   D W_}J*ekAȋ  PD< 3q3AmǎUtbGzeݻwS=cE]w]wwm3f̟??Gs5X2eYf>PMdۉ줈&So߾/.1ufԨQQ{38͛X#! @1 үKd.ps=c[tC?ϕ||pJ'#y^{5UyT+p%pU>]E3qAe28p`d냽+hٲed^JWB.2/m+c"y", wz8J_iٲe Fcm۶V6l aÆɮnp7~^d^3 O??OQ|2{ҧ{ZwB:c#wM֭3'KHUu0]JDLb;w6onݺ "&MdG}/ g !gԩmJ UE&lj*:v mU''^QG7/mrUO::i>&숤<.(浾 T}j}5PRrӑExߕ~]K_,'1,K@-Zsx. ^_*L_|1f>h"+7̒/{cL京$̼;|'͛7O5 {@H  Pi )SVo'NɪNA"_b(:EW+m޼jJLUXJʧaRk+~=m''׍rL į ]8`nÇr齺]Bߵ%t$@(~ @ŢT裏6q ./: С馛W^=׮]zcTרo}K/4ݶm[kڴnI6y@LӘm[ɓ۶ ;OCN5QzεI5*3g[nIC@(jߏAUک=L#dr=m~Pד現=1^x_50>3*x% 8zᇣg~YŊsgze֭Ν; U,ٳg[֭w"O?ڷoo ?4+C^nݺ @^i~GpРAv[ƍW_5jO?m?믿ԗ駟M0WI]Jb|/1c;۷oLm۶*i^`/mR(O㎋Ϸ&MX #g:u=sVL6ZI l޼Y͚5Z?VJt͚56}t˭CqF|6nf;o7<3W^>;SW\agu}1Tz?%VƍNi&+Wo.۲e*UvmpoWӅwypk}]GSg}|MٳgLn_zH Kb\zvZժU#ә(~e˖}\GV=L<(3p"rժU \od&\FJ4bo{X\0#_IQV_+%\?>Xred&܏J<Лſʖ-n'p? Jb񋾁ru]"Ӯ#p?b#s[_ 03/M63"Hࡇ \ _-Z\MrQ)Vڥ0#}߃>,r_~%Wq;bA*UE֮];pO2(ԩSs"d:P|k  @1<1Af.5!4$)жm[$ Z浍VZ}U.~-YOunIöW6=nUa 6 ?O?_v] ]^<S=sK2VRsϵH-L=d@Zǵg]ΘqFiѮc̙3MO&ysUƳmИ|>뭷UK>w}H&ؕ6dرEx+FȘTCh\'\K8n?z|D5 FO}Von%maÆ1sW^y3]({ مî@@F=u,SO=zkj@t (@@3z9  N -K@d]9TTrUk ,Z%љ,Ѥ٘O4l޼y6: E h~W.Oviŷq %^ K@@, ͂)" $@NWcA@@4 .2 @IpЉ+" g. ݅g}nL=|@ Νks̱֭[ۓO>gO׽PgGq[P`e@>}p '^2{R<, /؆ +:]gϞ@ af{_ϪUڏ?X ;v?ors9~;/rA]/@G @ضm5k̞z)ꪫ| hڵmv5\Ss֮]sR _ Lׁ@@D ,Z? 
>3h9J@X 07nl|Em9?OIA A~w^2>hz^ @#лwo_M~@t˖-rJ{젃J<{0aw}ߞTSi @@w%@(?ϟQd+W.|^~ONn-]4~1P6ixQ/_nӍ @ATb۷O0*H})|Guկ_?]4FO?rf%u=2hƍgz?A"dJNw4},ӛ7og"=JT:d )7Z5~~"rd뮻Sr,`  @ ]v}~͆n2LSN87bnԨQ 藪~lZ#OHa; 駟Ƭӱye \Dz.^.-[?!;8;"uzLI'd|-[r͚56w\vݑι+7c[8 {wOܕƾ@XGR& ϰ>XNlժUY5CժU}CfG+i1bDS̙cm۶Ek;/^>[~)KIZӵڕI=65xU]yh@HT/:5jذaLQM6QFGagy,WPuYvGGX~ߋ7 {U˓իO,s>f͚e˖-5WҥMOUIѣMѽ5vա_@2Z -P-_z%_TT"/9vjٛo9E?~5\,|ĉ*R)yb*1cF̩i؊?^6:=Z4F &= O6-\՛4;R|tײg[PǢ~)埪gwg1V5y}.SZUPٳgC=4&:l̘1 5k gªjWm=JO\`*UUส4h/ي_;[*U?LlRTzu?p2e$OW QuW~p1?s$kT߿/;?|H@^?)?Dv~]z75Wdɒȱ9^r$k//_>~/uyaf"ݣJC=\wuy.oժU \ b7~:%EaRɗzY%IazWM^|O?da5|d}wZP z+$6Len+<~NG*!WSW[s=*TKҋ> PR׿?CNL\rncǎOCJd;yʬ>]=h4p*Uy3̙3TTm׮]k#=s)S|عjbqꩧvAꈶ~6nhӟof͚&?vq@ =[N?$ Ծc*)EzիmҤI9NXcA>3 $,. @]atinTiU,Xݻw=]}W~.b[MuQ'L[h]/߿El^zf@*%!/Jלxj35MA&UI#L<99>k9u{ュK.XAIq4FQgWz8([sڲpB;TtLCCTְaCUGsI^O>dkdgCʘTCW)nD[n6`kJrK*/$5o\{?k3*C@$Щɵ;̱{oPS(g}rt5Ɩ~')6N?tS\UqU"+3P^i?"b_l <%{z=zG}wlUgj)ik̮/`S)jڲ!e/x7Ux!FHC'fɭTTOItJ"5*k@FTIJ35laDzjOä;:￿<5lX ϔI%* Uƹ9URX%* /׍[]֯_3?(# P4Tnݼ9_1J;HIOKRno~ FU2R0ԂΝ;@zh%KOj|T/7Q_ %ƪt6:f7RFpaBH 'QlTؔN&) P]%IYQ;L}wip| rm|j+ g-SY;_Qr/ZYj 4rifEB@ '|4≒z\Z&T LuU;:i^xSɦTr1>StFWA3SБd MIBn&fX@@ji?w=~dŒƪ=z#qnuA}eu`V!K8xXZyx&i)Ugp0o^-[% 6#݃W/J57N^V @K\@HcíK/Է,졪^I㓪Uڊ+Lm>ySPmn]]r%k׮z+Xֶ4dR"I۫__OۑG7qM7[X<=l>f%@NU+`̭3뮻jԶRC8zvVٳ ?MAQ}Ъ`qix M.RtI3z֫ǽY%zk8Is [QKa ,YfŜ&S UjW{]AJ@UGo+xꬤ 4^v6m |*߯։gY0TGP^j^c)_b8g} ,+P$  E2 >k&,izZT6%jѹ:aݺuE56 &@mWE@v.@6l/  脴/@&^O~n&nj Ntg[haz@@dOFu@@ M@@ MFu@@ M.{W\|-\0{8s@( ŗ+7>3{dzFk;/W")زe5*{,])[D@ +z-ۺukVkd t<"3 KLs=M7{C@^,֯_oGΔ)Sl̙?)aE٪U Μ9sbŊ6mڴC,[jԨOj^I`ƍsz8 o@l}M^Uq ;:h_">y+U/|Sʕ -[ֶmVu t@dNZ@?(ԑ@ה}QO?ڵk9c/xbtÆ ޽{ہgS*p)+V'ڠAl„ ź6{ 8:uꔾɑɓm6iҤbNoΝ;{@ HJ6f?  
^@@ )fg    ) M)7;C@ @@HhJ  (  @J>鷳)(5JR"Rœ;o@^RF͎@0D@(~ߘ=  D Fa0  Po@@@0D@(~7f  QQL" h@@(( &@@_  @h  /@Zi  |ɬSBJ)aNlܸ@}S ϼe۾}{'%:5:ud*Uۄ =YfY=z֨Q#2dH!=5o;8y߳{u]={:yIϞ=-mad< :3fڵkc1^{5wv؁T?7#b 93eÆ j J@XJ) 8dFzw;mvi'SuAZN"RVZ/8d $i&שS'qw*U*0O f6l\?>ƍ+(cղd9묳"-l׮?>4imV*VDe֬YA/O ձջQFAG< ̙3GO.t]ʔ)gZ ύx뮻W^~'Dx_(@j /k׎k֭['o$ _:u2֮]M 2c|>-[&jI&MSr G#l߾}Ѕ3a={ 4Hʗ/_P(LLi"~gk֮i6ad^ko]vVxě@*c9|z~ 7Ț5kl#ۢEan+UV *l_D*Q65ۄ@/Y˕-[6G|0mז-[ea\L 2Փ~X.H=*>`yWc[6+]RܤnyJ (@KH׬YzqH}*U"|kus{He,֭+W^yewjՒ;NΝM$t^; 7=2Y,#SyIdyP>)7&Xb\w$Zy]z)=gO@I-)%\3 &Djf͚E7͛7'F6) u!7'u 2~x ]Ěɓ';cm6RJf~h0J{SʉEĔ~7}SN;utsb]uTMN׀}_~͜9"v7x /@aRTO6:xbqnҥNݝnFroS,q2)!C8=mg IolP~ΊNamC)%_޽{MDyHo!4GS+W?ftӔ(RE}F .-[K wYb+\(}GQ&H 4d1$@%C瞳cA}Y0[8$={HMkҤ'|"Æ U1~aP~йsg9CJ[)0SΒ@v5k2c 97=*4+: h:姟~^xA<@+#{mwţNXK[y $@M?rՓMʉ'(zhr)+zV d5kf¾KdƍAYR>ҰaCKsQG믿wYoa]ٕZjɩjo]ny饗ҌSN9E+D lhVHzu7t\5\#7oG`<0a={뭷m۶6~ڵ>ȃ͛ ,(d/=bҀkpXuNjܸ;I~1 @ qիWwvGuFpNujucƌq^{m{V /zqNEVPEG: V Ҕ/_"B.Fmy\l2vk ݂ ,wWbtcͷ~TX;RBŴSQNbד&Mr*l pO_~ō5ڀ,kԨThucs]Otzߩ5HswJ*9<)~XעE 4h9rSqguX vN-Nܹs]ŊE]T,/iժUrqkr.Kp(WE믿ew1X[TX[[0Vᄏ-[X>`+Xyh;uHKHH  !,Ybmյ&4i;(:uruֵۏ?ʁ y h!`NQ*0¶P^`M6ҥ?;vtpMh"xA,A!>gǡaxe֒ABC4Ӭ[Ξ? 9r:LZZ}65H;̵n:PSݪUļZAd4(P@Pƍ }?fCA]`(@$ ~31 ğ߱ۯ{r6mS逊s9a*>DZ*,ZD-g6.T]V=`-'oΞ=;x}˗/`+ ZQeΜ9HiРTKye>b )63M:}xjN:"Sڨko1M|+WL'.e_F**}UDŽ, @@A S~Z}=H H nBJ FB$PR-XP_HOeɢ CA-vzj|'N=z_o/&X25k"8`oVZe9z4'e33٨Q#RXxbС/'Unf:Pqx/">^B`GLFTzd-b+DD'D,D.\FCX锸 de$ƺQp{EIB裏Z}ҥ> n5 δɂHH `тЄrȐ!k&h7#%g旅  ,a,0siFl\X r} GװT~VCiӦ@H 1)Y OHu]gZXjluCu޼y&Uv{!?Eם0]*eƿԷIJA^3.X- P&&kv)q_PH@$ Ը1 @ ;vcg7v*_>E/>(cjYzQ\ " ^Ŝ!Ey7D&CqW^yej)4O(/BTi> pA*L4juSO]víR8Y"Xu:mg7LNK/npE钊pUvd~]we׉iz§~G{ A7(YL$>ePe  RGű3/k",lX)dVAX40S(V¼a }8xFVW֤@bj9qOxX yT|BP7aŦm ȋanKdiڋٳgOmoގ '\r\LyhGLKAo]nkԨr)M%nZv*O=T;4q>l^lYy=ݪ³+® ,{9f1=n=g{QGɛoiOfdҤIrUW8fo߾A{_޽{K =ݎ)e>$@&@aHJ,n9sfPLŴwVls2bٰa*p<'6`zG˗//Hh"K3ȍ7(gyر=OO-+Vu 65jH^L^XJ! 7'e]&T3>B֫WO}YiڴkDUΝ+ݺu9s昰F|В@Nݯ_?ce^(}. Ĝf  $BW\9 FקO??8wgݏ7.HV9;8hZZ>,Ҩ:LS!t!ԁTȻߴÆ 3fsZDI]zN-ĉ-oga~j ]jժNl NAUcB3;裝 zW ={*W.] 
Ӿ #/HbMk@c#(.Wi98z譬[wJi?Юa<!0/[l޼~0bgL_#`]k*0}bX6a8WX0aM^%o޼y, @7om4b6 :@>IM4֑ zC[f'z˄R]#[#I-w&!a/Ƌ2_BTڦw/zTBa~C~羯Pb`%KVZv - C$[6H$ ԯ__`Fn?G-LՍE .X ^za{!L#ԧekV O%oܰMf|C~}eٯn:Ҿp~^ ď7!oL" 4l @ 4g}&&LSv\&U\ A, 0=vc]#NnL&Bx Eg֬Y n0͏] '\; ӷdec'v D7MW_m\҂dyqp#+1NgG/NxۨmJ k x|<Ɓ ?6z)d. tG9ǔ26 AxA(!\r%6 44hF0*j&0L 'pj(L*,AA+Ob>A쒕j||', g$@$@$@$s4熜&   ,_IDAT,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G4熜&   ,N$@$@$@9G=aFIENDB`MatchIt/man/figures/logo.png0000644000176200001440000114477614235232564015522 0ustar liggesusersPNG  IHDR qy pHYs.#.#x?v OiTXtXML:com.adobe.xmp UIDATxwwA9LI#$: &APE׺XnQw]+*VDA!$!!Oʤg&u2{~a|gg4L`wa`2jz߿ë##VUTTw]{z|Piauj{Yb:;g- ؗ}}ggg>{>}7=)Չq1MOcG~X6G~Zjz յ;k{wjG(wf|::]Ww?`>P~%}Z_]R]Z_]\=t쾽Z٥0 暞DgWgV6=}窓_tDMG4oͭ1p`oXή^X}Yw;W=zf=MU>P]Ym^ _$w&5MOiYj>>3z_tv/V5o=;{To_ڃcמּOMOw=;{q_uX[S-;ëoyTjS^bujvf;g .ߟl~/;êtܰ9bտ.j~9wTW_rÚziӏZSV/v}34xxjzF|c.v0_[_PӏEys2_ëT~6=ꧫ7U ǣVT:jGVYAAk𹬪\sa/oW-qWTo6M?6=;7=?TkaƬcM?2pX]asq'W~'0p`:Ǫ>eM?p~do;Sw ¾ Y;p 1p7=z]{k~1pطW=O' S?~zvvvw}øz|W UU?g0n/ƿ>V}kӏgf;:ꏪ_6$~æ׏6]`Ybޝ?Yzzӏwf;ly|oOT_;p JoίMu-{/Q3'RM?o~z~'P{Uo>TbثۦO?`trG6fW{w=c7TX|u90]C3  Uas`#_hj U?U}iuM?V=u};pNz_#_VY7p >`;]/UG =GWzgգ>s:ss8ϩF vVo>\eo2l 3pصӦAk/ƪ꧚~KuԠ53)Ջ^2p 8ꯪ_4hq;#Tɸf+Vz-3GT>p }GM[f;ꛪwW?W0l\7W6`f[U׫' o5=xE0lsF^;p }տoz; _W^}-Nā[C%] |~sꗪW3p罴C{UZas\`O}ہ/WQlu-{$'CV_]}ou)YSh'|A gwT/m'TP?_V-Lz `wyWߩ>6l }ݩw}k5baë6?iGU?闪?2`w3p5 իo`OsZ+MVE;9ճMIGU?]]զAv2p$z3w^/~z97Vo`_oVU}j`f oN`_v@=Kߨ~e"`d a- MyBCA};zs COze; ; MOlS؁[zyoWW Z<w`W;zcaS"^Fwya_K~z5\\}K각[{5ZMOtj `dcZ}CAK%GT?TzW+Ճ{w}}qK`vzQ; ) xaaTk-`_˫_6;pb-7TgOz^;uC1p}תۛ۟5p Th7ݷ~ok˪߮d`w2p}MOm[sy]կU Zo8k' GWYWwZR0zsu!EzR wV2u `vu/˚?0h ̞'UY:|x-Toh:tꗫ-10;괁[`W;ջU=/-+33UQ{c{3^SMҶ{6{ww\yA'}Zޱ7Wϭ~jy"<7ySJj`F-,{ܿ.m5Yf\jCKz;̗G&]y'U?R}U_6y U::x`ƍZٱ?nmxw;Vh=;キ/H]QqxAVv< zZ?۪O ;{ë^Q9p 0Fs yEt_]۱e[vj?j4_+=pmȟw5gÞpz+K;,?՗7wC?=տ:̲hnh<wvpaknfh4n؆/o'>-[vtMOt_YZ;nN:r[ٱde@`Zzf' `ewu\vѨ6muepe6MC?h\j-msS:9hCkyqQտȃ 
OWOqsغuU_6߿Iq5h4WMjvۥuvԩgu^b+KKC'k: ۪ Z>qT_V:p 0Fq|_~~7}=Ζh4޳Nm'6m 羿x3[YYj2t%0[#տ aǫoh=3 Qs kZ[ol^2l,Dۻ}/d'=U|-mVЙzL5MY\A6wvo~:Z5p 0z]ww5nkRe_o]]w?zy|t+Kۇ fݑM#erv3n<7de?ncmocƸoyn6xU>sύ[Y^:]SQ+;h3GM/6uѨBw]֝vlo4uh4&+mpw|=q兝x;OkeiGdL`vX}YujǪ-`fHY[;V3vf2[\wo\xQ-o}o첍WI}U~L۷ ̶CꗫRm;76=j-/ᇺ+.l˃dѸ͛ʋx3;/ma,:]sտ-`fYՁ3nh}PRvuɤ-Mp\y?}MЕZSZzMA+8_U2p 0F/t[wz[[޾ͰWVz;>gCO|R;W3uPU?S5h{w﮾:Fۺj=xvvCo^[ꗪmW$WWoWqvl _^rnzdNo4&}n6|u<֗5ЉZn=U?YA VtaS75[w۴&K5rjd4=}2Yit9ﮫ/ij^O>JM&CWkMV~r"XS~zkʴإxvm7}a#3'OVzۻ/խ)u:Vvl:mVWz 1p೭9/ p_u5cʰ}W}i6pUqs?N:9Vv ̮q{^v6Tqꧫӫ5n4ԍx7]>`zq57OaUuk+Rrjd4b1i-ەnNyk:'4}ƚ_W}Z ]`מּ:)hx~U=1MZٹq7tqj껫cՃkT/V==h4jҨ6wE 67Ocp+KKvmXwe=ss_Q5q;kUVzSV[KObV l4˾|ڰh׋್j47iM]7헟{MrjGPYygCF1pػUWS}gu@5Y7u7vGkJc'yWڴ._O翺Gye" /~jy(>?{Q1WU?^_hԨQjwleRqN8e>Q5iR#oTTMOs<VgU?Qa; LZھ[/=?>uF5tǺ;ᬗug6&++CWi\=۪ -69q?dR++}]]w54vO2?UnNyWt O\j21tvqT>d%<[]-fh4371Zue\wEa;h:tۺݿcNyJ<;ణ5&'\<o^Sצ'o<` =A+ݤOÞhs~ēI}ErxF5 `U+7:N:v '~V/~z-,MVVn۾eGsg3OOuv3_1O?x)M=-ЉzzT2h>|S j9,^,umMq%_U￷:{_ӣ>c~2p!0FV~ji(Y6xN/ +6eѨxrﻧu羿[/9&5v 3feFsuigu_~~lJv8l {NƁճ3p 0F&++m}n۶iۙYy~;zyGvVkh<7reWZ}C3CCwG7WasY6Um躏'/m]tquWר&C' ߨkuCm YaWGTTpu9,ǍFVc=7\x\󫇮lT{?xGN~w6o24YYMꛫSy /;۪Snfh4j4key[.h+K50t k4WsuWᦫ;/g~t農dbs W]Ab_Ӫk:pEFLVںn}}m}`KúOmw^ua'3Z\OEW~ަK- |ﭾ̲\F-mƛ麏nŚ_=tV׎w{K>){uV7MZY^:]^Yb jE{d_q/ ̆#~:t`Fq+;z]+xn:ػ,JS;9/ÎnMs <7V?Y[7`sۯzRnfxndmk.}558tk2tu_yA'j큍ƣVNf{wWԭ\xTㅡ`߰FQ=e/4Y^n2Y:mV?dM`opw| [8y \=́[Y65oխ}m[VZX=t[v>䖋.Ϋ?ՉgO}v:B+K `8)^]P?dVWO~z-7ԶM+;kqm05Y^e_v^'?;'߁&MTYjAC2peئ ̲xxݻnuͯU ]T5ŵ}wN?qW#Oh~&++MVf[}Akb^Q:a`F-ރw͟h^VVCZ`3~~}mXwm<{ :G5RЙlگz{Ǫ`73p5UW?Rl`F7YY{+.솏e[ZjqmYӷ7~yEWuڃm4R5YT_}Avw`_P\}_3nx̑[^Qvê_Ϋ^[7h[RnfBѨ-m.oݹhq?vZXSɤ~';1'?Տ:&Vv ̮3VYjېA_(w`oՏT 3l<7h`/2n9^pCo}:[\Ҏ&++Ciz[ꧫn` ]El4jn~Jykw\nM&Vv ] ̮~KH{w`OտLK]f47|6?}o;Zuu,/}O;;IOg~Y>&Cg_VTtu{#Wo8p 0Fqs -/mޛKO}Ɏ 󪫻熫;ȧ>8Ud2:MkP}cՇ-S]`3mBok.>ئ[_k f kj2k?nNWny&++CgG_U7*w`W9zaCsMfhbJߺ;/>5 ZXUI몋;Y/'ށGx~_;oTS=8h0s Gڸ:.37hn77\Muckkqvhz;c_uuܳ^cN|rrx喗v ̮ޱİ9,1pIOPH0p 0ss۾6tu7_kC>ރwݧ;gC\Vf׿z{gx$]q3l476/;[>un˵ ]0Ś[7^1?#}I/ny&3ٴ-c"`f|)6}Y^:p 0VhwܝW_-|Mf@jauMVꆏ{itرMZiy+׫VX}PmwV_5p 
0z[K]jam,t͇?^ұgǜ<&++->t&0N~WTC{.wsyf̸|s mCo[.:[B:1Ԧuq={;'Gc{+KCg ;t~9ǜR[6_\Վm[p㕭vERkЅinkx㺎~3;gcOi~՚ok2Y:]?T}C_? ZQ MOl)LlTݻnoVh`TkR7_pAw_{i>gNhR-o^M.fӑMg!=;/R}!l/47Cw=7\~Z\;tg4ز?==tclyǎVv ̮YȠ5 WohLvxUq9yjC0Umwv?'^ұ?GV𨖶mm-hy kquЖmieyyL`v}?6xxB껫nfh4j~՚m+/K/leVFC;/B}ݺ6޼zFG>Yz“[Xm[L 2SM| `wp껪3MfFqr]w\qA?}N⚚[jaMM ..g<ß>&+-o6t%0~zuW Z| `-MOne /;wuvt-w5V7t{ۺC/ӞaZwloy3uB/Q6h9e՛o:mVצ u5t%k~Cת|]uuw?9[74t&0^zIizz۪nfh4na~mP^nnV& ],_x[;+;)<ŵc&3uկ4_2h n[jM.K>Ѷ[X]sC0F&˵K;iȧYأھVvl2$̝׋dwMfh>Vxp7?deeL`v}UMOs>vT\3p 0FVJw^nnvl5_Q-J~ŕsgwӞӡ'>FvlyxJ`vziMOti en:lS܁[6i~qus ou~mgSkq}kqZQ>6xuGzfG>Y=SZڶ[r%.rb3A`:aI;M?m~7xV?t s t=7\ѧ=#??`+KK52tvw^/~as`#7WX8p 0FQuEK+/keo_]Imi}mʎ>{:࠶?PdL`v}mʦVu90 qd֦?a`-^hn.9.;-k4ŵ\^rY]Q]QO=CO|r++-mu m߼ۼ HaS6i~pvɹz'rڡ`0~T\6N}fGvfwloi Y;_yc/1kL&[߁=xwevmw5U{|m6wG Sѧ=;m?ʎ52tvw^_t{>Tz}f`fMZuAm}}⯺Lvno_Ub]=p{x5u:)jALEկU6L;cnfj2+>٭W]͵`ƌO[Y;7_ӆuWvSNɨ[6 ] ̮5[W_JuӀ=1pUշWP=k`M&ͯZ5mnz=qSj`jnSmz}=ǵᖶm܁]GVlz!`Oa+T:aIVP6 羯[/nذv#ӆSQM6Ya*g^ZfgMOmQækRq|T7?M`^w9šԽ7wϺ:gwSlam 6Kz~; /; nfڤ57u5umR kq}@r~{u}Õؓr;n]za/V ![ ̰ɤUc~E=tj~C;O:)gt3AGjiF^~5P}[9{yUqk[Y54k{u%曫Z4lbXk?6t]G>>|h[wGޗ5ꯇ}So:l`ZuA,mO}[?}nw_{EKkauC_55Yon '=b6=0t"0]Iӡ.f;kSnfd-usʋrVڡ/h\˻w }ݥuIOmiۖnv;]}OYjӠE0˾z}GqSvW.K%[.,n@$ܐJ !J &FB!t\eɪVoS#ɽzud=9='8)b0Ƥ%Iҩf 4Ƿ<`kKm`ۘp9CwI嚓=?K2F$I]$IF?<8-4; _@atO};hi&@p&IRje@ c=},z_{+u Σ06L\:t4[n]{sthw7A9$IJ([G9N4kYq4ar١ٲ [sH$I8p$I$~xpn-*! 
wI7?LǞl=Fz}$Id!pG'c ?knb57ih !ԋ wS;}GE$I$}K$I4^p64F=vpb3GaLq҅$T{#]'o seR-(L%]*).| `0"I$IM%I$Ig~YIqL˓kpG3۞&GKR$ bн6?L]v%Iҙ,ծa{f[ϊ ǎW*%͖[ ǒ͑$I$s.I$I:]:`I-R+sP-h63rruDxb$I:% W`D;} ?²d Ѥ%W=D$I$IK$ING\lTcu D<}rb; 9q$I:UbB5B3y#/bkuT Ӕf=]lYF@OL&Z$I$I]$It׫^qes1F }pK$͖ 2Pm[YjVny WPZ.:t4[9y =oIH$I5%I$I[%") w.3#~3tpG+qu1I &h|f^}=+6R!VK-6tkDk$I$Ig=$I~.`U-R.8Z}|G(Md!q҅$@ 2F_w+(N'])) lS3"I$IYˁ$I$))YুnfqL'W@8}G23:I\#avIr P)U>p6zMwq)LQ)xٲ xp+qC@%"I$IYǁ$I$) pK6ERj1a&C~|&8ħ<^j'~/HIo,֮ N|,Z{6Ü131B\.;t4[6 |!I$IYŁ$I$镴A[$UCR7w)"2J:$I:Cd7=L0z%lb͝KabI \ |s`WE$IwI$I+\W.K6ERe [9i[R(l}u$I߹ x dњTeJ3SIgJJ7 J2H$In%I$I[bqL˓pG3Ƿa_$IgrPsF::UkY}]8rɡr5]$Iā$I$yN`A%+ s㟦sN&;![t$I#bdj gUsƻq]lہ+9"I$IɁ$I$XxXp541'v>ɉO3BiJ&z NP$4nW`⍯aɺM'bo6 1Нh$I$]$I_ \pTudu G?~ILmu$IJBS3㽽ɪἋP.LS)<]lYI>h$I$]$Iz)&.IiDŽ L tq+sx?SC@ֿ>$I߱0S tPG;׬c3缋(LW%͖Zீ%#I$I:9p$I$}+.LER9(Lh{q+}T$I^Lar@1]I~| SI*{-$I$K$Is \p! ܷ`ũd<]$ joʼn0P}GqYf=qJ0tZ>0h$I$]$I,VC$YLɑg͏0tq6$IrDﱣеoKWS.LS-5:feP{cG'#I$I:8p$I$WIERjaDa.S#~p*dII$= 1:lg]˪~yƉK^3$$I$I$I$ˁwKnrTE]mR(VI$%"8>?YV_w;QTOqz2DI5 2xE$IDP$InovZ$͚('d=a d꒮$IR>U+Uڙ =G^dp먔TJ3%Q{ীё$IwI$I:{(CI(2dܗo:A|u$IQ*w c]]\p\z̽hIJҙk X9$IWI$I:2p ICkh09NszK$ ,/neWn`ՍwkKqz8LIn.|ؒh$I$]$Inr9ĕ2ۿJێ4S&:n$I:oqzzyc{XrՍ@@0t:xp %$$I$Iρ$I$_x36I)fd9Z# 05E9l$I:a\\fOu`oW\*R-Δ^+n@9$I]$I_6u HJ0509gD߱̌ ǐ%]'IS%jWajdu\r4{!Ij5LIuPg͑$I$$IN$")u Sytɡ^*ed >I$͆( 05:ĉ0pK7]|83]I:꩝x;D$I$IwI$IJsfI5\}[9Q(N2.ItVjjXoǞ2qY:cBҕB z<`E$IS$I$/o0I)Flf7?@K a$Ik+7o[_% T+3%Jg;h$I$es.I$IgwS;1I)!|=cC~tx D٤$I0"(Lsh#,Z{-oP*@\M:SR:A+I$I^2$ItzN(IiS)h0'v?P?B0]$I]8AZ?I_>.(\N:QRz^ v)"I$Iẃ$I$y XpY"z0Ni@AKN$I }^+{z9>UK"VKIgJJ[~ (&$I$I9p$I3ˏ?<3YҬ È0ccOl$Iҷ/kCRaGK/[d)Jҙk.^C?&Z#I$I8p$I3ënj|ʼn1?y:`f|j"$I$% ]1tWn`զ{5Σ\D%VX)-$I$|,)I$Ie ,JERery m;uLQ. 
3%ItjDc'odMADTH:QRz-~H2H$IhR$INO7+nra!#[#t8=MBMN$IiϙrѮN>i:ocpŗWT+3%mmG+$$I$]$IN?o~X %")ł0$d/{ 1%I4vgh>XO7 _u%os/\.AtZ O&Z#I$I:UJ$IවɦHJ(43EG <8h$I$wI$I:u?\D4; iƺ)f$I:{T*ez9й ,դ3%ڡӉI$IR 8p$ISGw|)."h~az<3IC&]&I$%#JOc=^7G=] ED$I$ ]$I^tWKҬ ˆ'v|=[(LQ-U BK$I1&&|m,v2:%͞:` lMH$IP%I$YAm~p^-R.Cjiyq&zK@%]'I$f#VLмazwhAjҕkp'p @wA$Itq.I$Iߙr p.I$ATc ~,pK0j5DIU\ U`WE$Its.I$IE&")ceˣwQ.a$Itݫ {ɪUkk'K5{ ǩ>d$I$K$I+ y`!~w4ۂn2)fq.I$͖ ,t +ײsBKU?c/JE$Itq!I$Ik.`IUqZ=Iǁ&Vbvn$I{iz#j?u7rkI6NReo~x$"I$I:M8p$I H: T+elĶǙR$IRj'{;YzGfU=$$PE$I0H$I:۝ :6`>,i1mG9z)99R$IRBMJ``pβUx@,K풷vPE$I$IV9ǀ_Q{%$͢~o}c{(fc$Ik"Cm0^z 6Eù&')B`N!$I$]$I&~KFR@Lyf;&VvI$t륙lcƲko%k%idQ;׀JM$Iq.I$l4&#lP=֭39K\!a$It& UfGi0=vƻUQyҭ|<@+.I$)K$IJk} )`$͚82Hg Rb$I+`ˌw}`o j4`_#!I$)K$IJoWfGWƆhwR&q$It;R,|U&8 !vo*iVDy~(&$I$I$I4P{uds$Z@JXsVZ=AqbjJk$IҨv{6ox5KD~ّ.>x7('%I$IwI$Iiˁ'Z#)NV2h~&+_ϒ$I- VSvV箸0 KMum'ځJQ$It*8p$I!0iOaERv,{\0abY$IY&jhw{>j-n pHҩV0=&jM$I8p$It\s$ͦ V(MNкI:n43];a$Itv;R,{dCmGYvͭv2s%͖8= $I$I3Ux?p[fEqJDl}aZ8n$I5@Lij}h'+73ٓ/KY {@1(I$IN9p$It?JRI# V 9A\=oH$If*zƪ[LCwI# |$IowI$Ig 5\jޕYPVeˣCT$p.I$;@\0ȧ?W]êayatr])PxGG$Ii΁$I3Ajjads$YTS,m8=M@]$IKԖrh9` È8&]))Bjosǩi`:(I$I8p$It:܏vI(8R*sd7ǟ S@vI$IJ@LqjgDm~.j(%4 "j$I$r.I$t w? %#)N^T>As_fqvI$I'fr_`QD ]숀 >\$I$w%I$N`.wܝJ%Atj!Z^x۩Jͻ}$I$Ͳ Ze0Y.xusDj 8ttޞnNs0`$I$}wI$I9nrART)LѹyZ^xA%I$P-9YzdU7D!qtt eG l&$I$$IV,iAHLL\bMD_w I$I:!'9}`'z?\Cecf͍W4D$I$K$IJJ\.>IirN{T+UOl$Itz9e ._D@q* |7A~d$IwI$IIXmjK, R01NsT%$IN[AqJ} aٵbDjp.iV4 KhE$I*%I$$")0"V(Lu`[0:FA&]'I$Iߞ JH}`;ok7Gf*c=DE$I %I$r2[$\@L\dǞ"mQu$ǏrOйo+,(⸚t'bE$IŔ$I? &0'Ii!rɁo GwSxb$ItϑVv}hVt/ . d!qǀc@/I$IwI$Iep7MnaQ-(NѾ9Zt(]$IR1tA}\5,r QjSI <$#I$)K$I:WP;[$\Fq4GvLf $I2P)8tƥ .]GqtZ'j'-$I>$Itd~*I!P)j/1x л$I&'M V[dWIWJJ:yoீ6h$I3|%I$ S;[$Yr~oy/PB%'I$I Bf;?ҫn_C9e q'))~x ds$I$K$Iz9\l J:nuLeK$ID 9yzb7|{$͒ olF $Itfr.I$hJERaqL0@A=yd &]'I$I( ř=9:m[_@@\$()68L'Z$I$]$Iw"V?ڛ%iVaHTJEF[i0MG8l$Io)jC^v}#\x.A]('jҙ!ZH$IK$Iv]Dae HJ #S|OS.Ş.I$I/A~9P_w;7F3Y2q*:#Q;ٽ? 
I$I?$Io7%Z")LR01J4m~i29K$Ie!4o~۸Ytz2 QD\N:QRz- x=`0h$IӖwI$ILx HJ (j9Lӳ_b(\u$I.<''Oұ9.Aνr2z8&VN^3(PH2H$Iǁ$I[L핱 HJ ˆJx͛>K$Il .v|XZVxs_DW*q5LIv Z$ItZq.I$鿺xpI-, JA:^|OP''I$Ig([ٹw7}pƻXz;(R)C')) $Пd$IӃwI$I$")LZ09N]4=E&'p.I$I !.W9ԣtKn}.l}AQ-N^k>-$I($I- \  HJ ra&=%N @.:I$Ia(/~8-\zۃ,XLZ%V^0p (%Z$I$)Ath% ItY<pn-R,BLJqZ>ACMN$ITD䪍N_HS-9Y6.=I%?I'HOp$I>%")͂(Z.35ONj[8+*.I$Ig(WٶswXn9*]Ҭ83-g%I$K$Ig9zI6ERڅ,11&[vI$I:d\*s/ұy.\p:rs jt@O&-$I4K$Ioޝp ˆ0(hGkn"@!:I$IK!v?seb` 2u T+j%LIzj Q(BB$IJ-$IR-^*8I)a6GP`OӶ{3ղ'K$IRw-GXfoeTEͩY~AශD$I$ $IR:Hf- HJ(Z0=O׾4m23%r擮$I$͆L->Ko~k֓QQ)N^K>l-$ItJ9p$I%\ HJ0! S 4ه%C!:I$IҬ jJ3gh{q3 ZK~q5Z)']))n9yW~h$IS$IQpZ*iaHRf͛!II$I^iaTWc妻h|rZM:SRzx@S9$I^.$Iҙop[$Y橔 Ӿ9ZwI$IYo>t$Iҩp +9 HJ 93ӌvuۓtMC.:I$Iҙ<kx㫙b2z*qͩY5 9%I咟٤W'K$Igہ_I6ERljxo/n Rܜ$I$IgLÉ9U7˒+7p΅druKp6IO-$I?r.I$_^p9F;_fw\w$I#\|a:^%>E^MݼTJd%u'O^vSI$IK$IRG HJ0f2j=F0p(u'q$I$T!n.L | /ΪkhZ.QT.^o7ds$I$K$Iw'")ł $l;Ngi*'K$I429 }MM 4|V-\F&_OX IgJJejc?-$Iu%Iy[nj\Zfi~a Ur$I$Mr@ -[{h+okcy9ZL,Y ,GspE$IK$Ip&")l >7?H9{I$IR2υRľ/}ν[X}\x5.ZR-^0$I$IR?@Uu HJ 29 e:By'q$I$)I1σv}Yv +7sV\JJ@\$]*)~x#Dk$IwI$)K_V'")ł $S.n?Fۮi"$]'I$I7~v8D!.\y-%\-iV4 ;"I$,]$Izen~]fM&_GW.oy95$I^ })leՍhu4wP.N^V`<"I$,]$IzezO%")L0Pi?͛eD:Gm]$It&!n>&gupoW%߸jL\JTRz=xK#ds$Is.I$;K'")ł0"Sd'v4%#I$wI$i\ 4 jGeuTJe;۾Jq $I6Uؽc{Xq],Zw= -'Ρ\/’f?p@E$IR 9p$INz`snr\==23A]#;.Zs1R!RIuqS_[H$IwI$Ժx .dI(d29 # ;FI$IR. }} dWs+7ΧR*R.^ovǀH$I]$I:5V ,LERjaD&_O0M߱|P7&I$I#(W[jia׳|͜u 3 I!i7| hM2H$I:9p$I^s;nN6ERd+ۛ>m13Z%?$I^nUhݺ#Xn. O0_%͒_o740d$Itr.I$4 xք[$\FS}M15BO%I$I@XWBt]s. VIJJN^I$I$IwnWKERQ(81J4=ea%I$Ifb2pg\՜z yTrҥGA3Dk$I3wI$۷ ڰ҄[$XdghO-JƓ]$Io-lP=1z̒7rKϡ<3MW.N ooНh$Itp.I$}ks;w$")2zb`i8B>vI$I^\+fzb ̽h)1L҉U_xLH$I:9p$Ig7? 
x!- S#G 3GS;n$I$饋!~ 9_E]MÂ󩖋TJK%wGDk$IӔwI$[|/n`~-R, #2u9q.I$IҩC,tt3o,r/˯ V![H0C\-Aҵ'|8h$Itq.I$w\lT zеw;m;0u\$IW@c~;XrYz)a.O0 \Ҭ8 I$I $IRMIj'KҬvzuW뙢n.%I$Iz%P7e8S^E_˼ W+%z߀O |&Z$I$%́$Iޙtt 3Ylgoj'[P;n$I$CA90=:/|  q᫮~K3TK%?;#I$)9%It6[ |7K[$VLFdf?ݛڷ8>yb{$I$IRbj6N+K>IJ7qϡ\&zYK,Кd$I$I:-vbɦHJ o Vl=Lׁ|*8l$I$4k wwl7usA@0_%͊ex MH$Iz9p$I.mnQ.Oe.Nlsn>a%I$I:}P?*G*}Gb,Z/XL\R*&]*)8y+IH$I$I:[\  l4 |3#}=L=K$It&!ԾO >V /|ʅrҥ-R;?$IfwI$2j|'pi-R,C2z*=wѱ:_N9@t$I$Iz2u1ZXn#ZK~4qtt:FjC]I$Iā$IҪi΄[$Z@6_G 7ѵ;abH\vI$I$\+p;"7d1tҕ2ுx($Z$I$b%IFwoޞp9L^z}R K$I>1aqG^]{+֮g9Q)JIJJO^ +l9$Iҩ]$Iir% HJ0Sk6v?O4|vI$I/( aOtą#H0CZITRz8}O͑$I^>$IJ Xp LjLڿ=[(LA~A҅$I$))zcw,&u S_S|ශ"0d$Ir8p$Iҙ,^ x]-R.#"FZ>2;I$I$ yP-Cs߼µ뙿x%qB8t(9#͑$I^$I:SC$[ɒ.:^|dr׏%I$ICA#_>k~y Rҥuwx>I$;]$Igˀ?,IERaH{ҹg ]A uII$I^ \lmg7fUYx5)դK%Sx jCD$IowI$)S;q䧁 HJ 'ӵ;{gzB0@I$It5@\]/߼xF]* <34K<| 0h$I-8p$Iҙ[7&")ݢ\(e;h۽asn>>g$I$I/Y/TJ<4MXz,b.ZFTR*&))n=y p9$I7]$I HJ0ʐ33>DtB,?K%I$ICy07@!^nʅiJҥ1o;͑$I_%It:ZF;U HJ S)y`'"A҅$I$)y GjkbK]E&WGif8_K /i#"I$p.II}IERdua#tN,o!>I$ItVjȾZm;l9knd 2B0tSA#ԆD$I$K$qC&")ݢL(g;ػ^2uP7/:I$It6 })=MXvM,b-RR.%)):y|x4I$K$)io~Xl4 l8{s 9R;)mnu$I$Ijhgc]a!\k#HifZI:SRz ǀH$]$IIڰ HJ *=wѹ;^0 %I$IlUC 8’u7p+ Ra8LIt>OI$]$IIxC$[&WG0DׁtX8IN$I$ B7BǟP!WndTEBҙ/>d$I.%IJz ۓna&K&Wp]ѹwAܤ$I$I}aT114ʡe0Kmb 4s4J9LI p?ǀ$IY$I^ S;G HJ S.LӾY:m~*'IJ$I$4d0p{-X]l3q5LI?F46toJH$I]$Ii>&j7=7$")2uu7k tH\D$I$INrW} 8`a_W%cʅ NTR:-~ (dA$IJ'$I-o^tt9\o6F:A1:I$IS/k=E},b#K WP.P)Δ^N^Pg9$IJ$I:n '")ł("o`f|Oӹw+}-vI$Itv25X#_f(Ȣ+n|J3Jҙ!^HF$I]$IrG W$")͂ Zбo sE3 HP$I$镕CCϑ&O41zWnW]CS.Lqҙҩ .ޖh$Ix%IrܖpruD -]198M II$I$%' Љc'95T%*ř3%% I$$%Ir|7VIHJ0%g[>297&]'I$It#5Ǟ~#,^wSZ)%))n;y ds$It&r.I:mNmp l{6z @t$I$I)BnF?`a_Wn$HifZM:SRziDD=H$L]$I߉ Xp4 z .:n. 
UuIJ$I$A\j;`_.8)g .N 0d$I %IznBJҬyl#tFϡv\Cu$I$Ig W+ly# 9Ēn䜥)gIgJJ+>s9$I:9p$IҷrVIHJ0ar/й[;#7&]'I$It 3Gz&u70g ST+3%=]?'#IӕwI$}3kEm~a-R,2u v>Mm=@T{v$](I$I.1 oeYn \Gq'))B-/GÉI$]$Ia HJL0lsp'ӣE2ڸ]$I$I# [ tpG3d&.J2LҙknjC L&Z$IӆwI$WovjIHJ0%[hg+.F:rkHN$I$F1ioyc,z_VPZ.%));yK$Ig9$IM÷<3YҬ lfƆi]3p8z HO$I$le!XO?_f63$qt.n^H$)I%In+wP;}Y-,뉫U:nsV8<aҁ$I$I"L*mf#,^w׬ 43 IJJ:[OHH$Ip.Itv? )I)edrys S%I$ItZ }jwfYzMr R!LI xpD$Ir.Ity໓na![?N_L EkHN$I$IJ\Gi M,bKm%&VܜJ57L9$Iz8p$I:{lvjnbAo8=IG>A҅$I$IND9bjb fU754R$3%wQ;}G1$I}%IoJEReaHϑQj5LIu# H$$IR~0Io$VhU:<񣔋^ -I$I$%"L* 3Љ,fD'^Y`=D$Ib%Iej'I6ERdrDavMsb[X._0q]lڛx9$I]$ԻyN`~)R-͙G8Cӳ_cOP-C$I$I- #]݌w3t(KeJX҉+nؒh$IR8p$I:uV&")b2L9Y8U$B&t$I$I[ onaXyhuT3 ӀAHR_|C7 I4p.I/o'")8&7e[BуL Dy$I$I(jWy@us]NYz 3c咯=4[VS{^x;1O͑$I:9p$Izy lr HJ8&CpǞ^d $I$IQ0۞eEW\n9GUfu_l$IҙɁ$IK%N„[$\n\JmOоYڛ) t$I$It!pg b5d҉ҫ3+D$I0%I3 \lTc|ٺz2r4a.I$I$vZ=hW'}Gr]\pU ST 3.i\ S 0h$I$Iҷg6$") q>#]'h(G198DN$I$I UsnFڹ+X,aJšr9O6G$]$[9`N-R+ $߸(]w0I\L$I$IE9 &f%W_C~ c@oI%ˀ?h$Ií$I7w)k][$ZLnбYNz)T %'I$I$ɷ$Ud|ϊogU7W*f^+/'Z$Itr.I~x#&IiD0d$Ilr.Ig[$UDs2!fFڃ/I$I$Ig0]iz` /[%dcI$I$IeFi۵f^u=p7'ǒN^p?OH$s.Iව;ɦHJL]TcNx۟brBvI$I$)} c=}azeƻXrMqra:JIڛhJ2H$Tq.IҢeMe HJ8&h9Ds_fA]$I$I:a`Rf &?Aׁ~[raj{4+on PLH$er.I-S; l \:;v!"$I$IY-StHW /_%<R#?{e]O{w*˶ܻM5- $!! s~^)ўu^\ˋuϞsF*[S*K$K%Ixv5Ӣ")29]m}޶fyH}$I$I%0-vt? 
^K%SV@_%Ww4$IK`]$G (mx?KҨIe'wmj'?4L"vI$I$I?[Ő&{4nbYq{Q2 x%>@}$I^ $i/H3Jp1J\+D$IwI4֭> lfDERܥ*熩yj}f"^@I$I$IKLCBg;G7ûY|,*D@%Wp9|i"IÂ$Ify4j t9=tՒ",,I$I$IJ݋E{]4}eUk()sQǔ_37> tGH$Xp$Ic_= qIH$IUUC֚C @Xz$I$I$I%H@2AZY+`ŕ` :xZp?FF$Xp$Icmgsjz;8w߹v!$-I$I$Iiy&9]~]7Ln/ڀ, >Iiǀ#M$IwI46>Bi;hHd& "'#O?@_[+\D.B$I$Id"SC4ͲKgy끀Bn(ꈒkp5pp?(I͂$IR/K#")t#{9=A~xt I$I$I$t7dNݎMv/=b>ꘒk.6`o_Q$I5IggAtY9-z{4G~0ѿ$I$I4ImGg+nrrut2J^ Q$IuItm>\L6KnF;(BII$I$I-B `ד.//%itP*s`S$I҄a]$.(ot$d: ܱB>GI$I$IOsͰΡGi69g]@!8|aI4X$I-.+b mIq$$Sikq~8\?A0H$I$IƩ Q]'kP)V^J,XNaxX: GJ&$IeC$*"")ewrѰw'n¢vI$I$IH|x'w_q+Sf68K+?.w"M$Ibɪ$I kWS"),A?}N<(ݝEvI$I$IHApoǟ}=,Z]d:Kax(ꈒX| xһQ$IbC$J?^C:I5d ۨy( %I$I$M AA b6?v?͊+nc cJi _|i"I >$IҩoˀtY$Y xMۑ ~!:$I$I$~A Zq+ CX:Ƒ$IwIr|88Ke troRk =E$I$I?zVavڎ0oE,ʪ'68g 6#M$I- $:3`jY$\2Pql] vI$I$Ii$aP_77?J,:毾D*C1:|8e I4Xp$I/V'\ mIq$A@=zt՟b$I$I$_$}[+_(B1%4 O}&$IFa>uIwوH JǿG}Bj$I$I$E L%8WF 9;"it 5JqIVQGN 7KJc6KҨI3 vshw8}#}aic{u:I$I$IB~zd?/%\Kb.uDIU|x&$IcwIY |r4jdP䎍x/}m<$(%I$I$I/OPz!C~?0 Ib>ꔒX |Pe I46Yp$I?KA=\jQA"IGm ZP>$I$ItAkX7fņۙ2 aHQǔ_ӁR*i`0@$il.I~/TtQHh{hܿa8rT$I$I$iT=,6,zduǙs_~+SS̻]ҨI?DdoGG$!%Is.> l"")>Ǟ!~b5r$I$I$VA!?ɝOztK.o -X.iԔkx6D$)r%IπS 8,H$ =K͓DX̃vI$I$IVAaXd?>{aņۘb1ꔒ//HIXp$i0 $ $j:GC#P"b$I$I$%A9zNgƲUpճEècJ*SB$IgM'Nik|QH$nhؽ|n!A##$I$I$i FCMC,Xeoj2Ţ}SI& ,?-$i.I*#)D'vm!,K/E.I$I$IB '};Yz柷dТS\|x 1$ŜwI/LjQ <-Pnj $I$I$/F@w=YqL[D"I74jWW %ЁEwIb˂$I>KURO?^p0Q$I$Iq0p"V R5cM%0pq?\$I&IR< ~_K6I%A"Po6?H') !A`]$I$I0MwV{E_ɢ &[Q=ݢQ P2$I:,K/I`%Q6"8b-  h]$I$I&s\Gz[Xv-=s-LYZt4*ʀˁ8EwIb$I&Q:}ԑ'IHs=,8yC}κZ?ג$I$I "@G{Uw=͊+neʂdC% O@1P$ Bq :$,zJH_LH6IAaH{ 5OCah]$I$IϋD\n| .iJ#Iފ?iwIƯ p1-Icv!0,.I$I$IGϋ.\t ^F%4p&ۀ(CIς$IO | x# 4:͇wQ=AX$I$I8$ǡ@gXq\vL,K-j)-; #$I^ $ uw#NNz'A R:JV$I$I_$"}mMߘt/9 IREwI^^%`(FK$%Iʁ+dq$W@@ trtӃyb$I$I$yZ!毹%\GA0q.KO> DJ$,K4eS:B͟ݒFM0}x]9!z°b$I$I$y\8w:2ens4:97?CQ$I?%9IƦo!#ʣ#)΂ !ڏߧb]vI$I$Iҩ vsi,˯iVLg ђn% %I~wIƖ10FKBzo +5$I$IP sdӃ4}WkHWH]NX |00]$I&6 $E'~@Q!A:O{;Jr$I$I$iL `EO<)LZt4ZۀO t3I4!k8FA~hHB!G[G7Ok$I$I$i{:3H$ӄ[(6mI*YGA:-.IU>\qIq}]m[!?<u2I$I$I^0]O|h.k/'[5D"IX FB߀Q$i.IQ,H H2KKnjҡ$I$I$G9Ľ4+naHW 
Q'_G/"#IRYp$it%*#)΂ t7R>Z r$I$I$)Ďoӗoae%>|hv$I$Igpq`IQ$Z|Vm~]H.I$I$IR h?v,X}/IҥEwIRW?:"M$IR Yp$ԫRvCQ$]H vR>}},s$I$I$) Nl}X}矟K(Y7'N/@$ʼnwIN _bxWQ$]$ ~0>=  JvI$I$I& \/ϰ|-L]tyah]ҨnG@0i"Ib$I/_|wIH=?4@OI= MwX%I$I$It5ֱ[_dYF*!-HQG_| mI7 $st9Hw$I$I$23b[=Ly%a1$ Q_w> P8t$IaeF4 QA"I~hNly %I$I$I:]$AE\IY2Yb0 ))_|>)D$Vg$I* \ | $,b- LR,ha3 }$R$I$I$z-OҴo+.9g^@j db!m@Iq6w)O-@g$IwIDW |3,b.Hvq5OCWS[%I$I$I )`,f-\EX: xWo`(D$E$i E[?2ƑgA H& Tѧi HzG.I$I$IҘm={-/sIWEwI_NDG:$i"|R]FGH( 1@8#r!t$I$I$Ie`VlN]p5 Ͽ3Hf yhCJ2?#M$Iid]4L>\qI1HEi:edt$I$I$Ii El|[Y&f\Mz2A"Y*KX|x@O$I: H&4p#")D `κ#x'HvI$I$Iƫ]mb0eR2Eb!ꔒ뺑߮$ŖwIR-68 T>Z9!wo%b$I$I$Iq$!t| _}1/sIWP0ꘒ;(\p"8$  \E鈮UggA@"R]9 2Q$I$I$I!B l ;w哧H)`]> (e IN5 .~5,b.L2NK^?Z:H.I$I$IR|C,f.;)$o>  $Iҩb] JPH $L2CG>u?55)HeN'I$I$IN Qz?o+WlLL$L68 D:CnFj>ΉmO i]$I$I -| mGYxTΘK*SN1?LQǔ_ pp"8$t%IL2J]qI1H ښ4G6=P0 :$I$I$I+RiC8Imee72 )4D2E1: || xh4$I/wIxT\6pGY$\""H$>jf)H]$I$I$ A,  羻4+iϠz aH:29`;0e I^ L-dI(H$H2 uTˑи/Ab$I$I$Izao|9gKeҜEd*) ŨcJo p08$0%IBfπggA@20\˂Q1u&L0a1%S0Vc@]$I,Kƺ))I&NC19= tXn$I$I$I/SPzcsrfV\~3V| \!%R_GOHIsXp$Ui`-^Fwd b9Խ8I2 鲨I$I$I8 .^v~d3/) WLXS,))8rm:1ł$i,Zc<,b,$3rt5qG8k+Ah]$I$I$dt8AbX|ULty!°uLInN#]HHc,KƒuggA@2T8r# ꀒ$I$I$iHe vne:*$R¨cJ ;O&$   p}Y$\"&:Zh>'亮dt$I$I$I\KM,ffZMab>uLIuOOQ$Ml%IQ qI1H&I2 vQw#OOK)$I$I$ilH$!S}l^%^VO0r}=4 ˂$)*ˁ L8 T`?]'q|nD1o]$I$I$Mtj:p֚,`=.He+( a1%oo>Ai{Mq$IwI6 p~Y$\*SF${6S~d* 靰$I$I$Iiطܳ.blᡨ#JS*?DG4X$.e)mnQHI$ tR'參TQ$I$I$Izd* ?c=gXq-L_z哧Q,(sQǔ__4$),KNK( I$R$3Yz%\ÔVV", ꨒ#ǁ#I3 Ѵ`^Y$YV۞CJ$I$IHg\r5 ֬z|e ))Ғÿ6$),KFTZශ"")撙,!=-4JS1>@ܭ$I$I$){b={6|M9B*$ CQ_x 2$)^,KNxHD*M2a֚I$I$IїHB z߳eo`3(4B~b>J%W\3͂$T9x@6,* Ir{h;[Bb$I$I$I~5kYrLtEbF/ `oi$IwI˵mH ]VA!?LG] u;7q6  I$I$I&tݶC;Xzu[Y He @ I 3͔&$[%I/Up5[u&{t ۨx?=![MXI$I$I4х@J(C= 'lf73{Z*$,(䆣N*)>#@$IwIKq6=Qg!dTvڎi>|dUL$I$I$I?%jofǷάϲt5L_zeXAQ%m#6GG4Xp$g~qIIe |x'ǟ};P,X]$I$I$IP* |5,`.))&7G.ix=]hH bp gsl%a1OgQl*Tnl$I$I$IzTAXq^zνY SJYǏ-&$i%I,p55FwtD*M{3Mqdtշ*I%I$I$I^DK~aY+ϣb,pI%ׅW<"M$I,K~;$0$J*`c9#4?H@ER$I$I$)>BHKWwC+翘s,f.=L y|U,iTҢſ l6$i.IiˁWEER* CvS>d+%I$I$IFQ(B960:_F`4*R* 
Nr훞:ʈ%jhXu nˋH:4=^v0!TXn}]7_=GiZ 0kdjkE® ~ˊm ,qRz+r,R{oMQ jJHE) 3ְ7cٕs0mKs(u]DFTY|uFN1;&L1tDDDDDDDDdtk^DDDʏ^U5qkc҄Ʋ\Bڞ|!^A˲q%,B; rPr6M+3r$v}C)7K&4](% f,^k 0s TI+ҋ1?)Kz+r*h[ۮUQFz2Z0c=ADD"fTXodEQ쏿B08.M"R0o~G_Un'`QόXQ]DDDʐLn{ee =Eo{,`WPlQLqrP,:e̲cvwwGŸ1\!NI4LhrQ<B;o&aX{2MKWPʍaQpݏ=J@|/SzxjZEƈiDQF$UWR]Dtr{yELoxUDŽ(cvapQ'!nDfrlAɏ:ʰX@ID""""""""à˲1|rh Əgk"Jfw2S) 3Σ~ 6﹃\w|?0"lM=ho+(#x b?̿26(Gm/~oD-qg?Ldaйm ~!ua44o>""""""""G`&7݄!A!3* 1?]WJ'|56 sw֞DuCfc ۮ)1X:ū7f1x0FQGq:N "RfL&0sIUwsNg)tbizB=/wF]o\4yjStDDDDDDDDTpbS]ĭa & 9l)ܼ=6+uh:vmi^}}ya"< q=UU,2l{릨Ha*n-KDD^A\tN:;(G% ط[vN?kIjYt?u?@oQTYE""""""""Gjȫ,[E Lb Ovз{n&fiҸx0v. 3Χa6 n\wg34]>(,I$5Ld 9 ``SQGD]N:=o3؞ӋH|cwH+ }rRMeWDDDDDDD$**Hyrܞ1b@Ǧ̬/阥W5AjWLy)0nXC||_B';N*d"cr88@PN"o""j0 i^:(G剟|m{N/"O{<"Yed-f,  KQGvH7FCDDDDDDD$tNDDDʌWUBBX*RE6;}w=& ӏ]KecaѰ`;7\E3ޓ :«/ "cͶ0pCqcYDՉ [qDiF +b}N/"cn-1c0a|Niupmh^s*vyg~ : JJRo>1Nppi=c ި I$1Z{%""8X˴Ί:Qy${Uvc ;{>!L?vhce$k˝."""e"װYM KCƧ}uQGy\JPzaj73ulvrݝ1̡/n""c̲/(fh :4cg޻ k5)|? ⤢N#ʆxyeYCHDDDDDDDDTpec Fi-Ah{ xm<}14-Z C 0 i\|lg:ڱ1b>Iz.2֬{)=[1}v O[d5]DFcv})Lip, rʧnxU """""""r*Hy,LqW"q 楡g=QGy h^{^現OUC}^A~`b~lx-b}^REƚeCX,rQG TgԵDe:wdύ?kz [g;~3vQ#lCXX6n-#yMGDDDT:MDe4toɞۮ݃缆)+AjLeS "n\gg=c(Q8,7H,|P:TR=2jC]*0y/~D142,;{b~_yn"u S]DDDʋ^*uWJq6tlZO_{gi^eb,%xb?>B-.!aڟ} ŬD „Q'#1U.SW.ȃݽ(f|^D%x?J)42c eYɴv9r}>!"""Nap{į^:6mbGe|>fa߇>\61wпrQ8,KզAw+ BIiM O?EKE s٨SHkP]DDD$ Ƣy1$SY|-d[u[Dw^t v"42*(gcz{Ȅ (MaG@DDDdDIƯylc^7=:K4/\LusK1h۶] X>oh{?8uò/ֳf0eٱ͙uQ~:;5EDdR/~ """RV,Qx9aF∜4=w}w]+zlkN2raG,Ijp&(@ÜyT56DeRw]M/ """""""""""eNw)lYVFk0$(b跗gk߾JJ]e7ф!Xomn2""""[Ƴi,k~ߞ80PUEDDDDDDDDDDTpaA+wc0O94A*h4{9(vv]yaDlTaсȤ~4:ʨVr==.""""""""""RTp2bUs R1NB~ 0w)z"m ,HTUGaˈW |uH]ٶ21 è#Lx$"""C)jj2bxBotT 򈈈`XA)FIA.s=+x3ÿq+8Њ:3 P]uWeH4?ҕ ahxgN%/?""/A 28eg:"LIDDDDDDDDDDDdV""""eILRgc $٧i߾E N4am__"Q~*_O#q$īitf2NL&xi݄PPlpNUKxb;.Ρ-rl,²C?C?,DARf:3ؐnug {9-_DDDDDDDDDDD* """RVT8ȍ1uQɧs ,˨Lͷ@W'n2$cb!eynu ΨD2J x5ո4N2LxC?lūm =s1U3jJDMjU$kiT p)Ǣ2"""#(AfN2b;R*p򼈈:EDDlXx;\YN =Z]&)~A@;n"$c' #%25MIdCCvpk$kU iL꙳iAUT2SHժ.""2A.YLf̨X-ZDDDDDDDDDDDh/HB 
Y0gfǘ{Q'w]:0=xյxeL*>X.$H֓T2'аhR7sӧA"""iF1(#ֳ}3&TEDD, :7leO6dIVQG8,uҙCv' )MHVא6Ƶ|/a|\b0TmꚨX9lEDDDDDDDDDD* """RVd Q (fT/Y׽oxd"t"""&7:ʈ ߍkHR]DDDʆWU)Ah#f`{)=77rX1x-I$@q/Af,[ɒ?}%"""1cB2ULQF,߾H)^&bm $ku~yvԉ-[۽Ǧr>N/cCHjx ,Bq23}GLDDDbʄf2^9HEDDDDDDDDDDDDƚ """R,pUX1m'~)c@Ɗ(o'?:1Op)MpkdX근+"""&/M߉r ^*""""""""""2y."""g: *^R\ 7&ԪIJ """R6T:aDc:h$;3u# %b'[ QG9"Kb??MzTѳr=>EocNOP]_u2)W,ƲkJX*.""""""""""RɅLb^:ɂ@uQwQG-]Ek+Jf ~!u#rR$qpIe$˂l'~<|nu$12l[IN)""""""""""Rܨ&#MsCJg U~}=̼Έ: mx};pLKHP*ElK ',n\.h='HeJ,~>_(,A@,ahA 7]ԕknu)C ~>OP,1^T T&"""""""""""rTp Q]edz]*KN7e[bUpvOݹ7uhX1]7,(ٟXugs任uuj#߶\W~z)QPߋ ()泄r8jA^E_Kλ 8""Rl yR!$#:;u/hR ؎H7!~>?yp` PS_u" މMa򖧁0pDMڏ)؜Ϣ:Nlry`pw& $`Y~NĶw&mqc$Iq,%z#ǽ냓zШBn{Oѵ} ۶tc`{x$v&]q"""fC̔8A"""""""""""rx*HYarLݴ>t.C7m&TܓI!wW㌏~TU&HtlDu?s)p&I.""2l\ :ʈ%=ADDDDDDDDDDҨ."""ft¡1M7.T-_v3eȲvvX>C &YqŌ1`JQ@98_iMi}NZ| / nN(""2|IF,U߈z_DDDDDDDDDDR."""e!Štܤ@CO[2O#˱ӽc^*_:9$k2|vÚ!4X,w-~q/̔"""6BItCat """"""""""RqTp2`e yP}Ga;+zHO,BlcʢdyiZ0!XNiƙhZ;xI/D*ͳul lmKAn|EDDʐ@b_wQF,3'"W 7_1 d#зB݌Y0[@G]=I"3a7cJn'RxiKƇ93nE]_[~L= YUӢN&"""c{P8Punn񒄦O)EDDDDDDDDDD&EDD,qm a3޸rapNG=Q,k< dh?IH5Q2e@+,xy,MQ7{}M_ۋk: AJ7=]S:0 """""""""""21TpMNLBl/$τ$!Q-_,, ^R7(*C5!꣎3z'C$!35T"""2Rl.#fN*e/EDD,x &(R)Npc A!u#]7i a_]Μ3/:Θ۹ﯰki"l61*V<!Q"""R,d'X:a=m3Bz$?7}mՠrH9j0AIDDDDDDDDDDDd,%"""eI&RQNc A&{xj nY0zf~AqLW~o>Ha~RڽADD6(esQGy LuEDDDDDDDDDD0ܨD1^U^,`BR1Gd{^U5Th=Czv ^ʸܽulO2TAED,r==HWg3"3 T"""""""""""&HH񪂒u!rxTE- muYQ۷q'>֟d58)Tn ٮv =QGYsq $"""""""""""2*cT4'UHFU a!wupJ`%wkIe0g:wO|J ,EDD*B^QGtM^  """{^ o<(oY`!(TE 0S}eQG9jmmdmfTnTNvkuQ]zNƩ݁DDDDDDDDDDD&+EDD$,jc=!=# 1$2,!0 kn:Q)Gsõ)UnX Ů\$YWCDDDDDDDDDDDDDƊ """{z,+-aLC\6(Gd.n2MXa u[v9QG9jgx`'P]DD96{{1*MUVI*M1C:s]%6 A>u#mDJ`93} Q9*r#O~_pRTIDDD^B, 6/YH`4]DDDDDwav]]NGmԫ%Yް ` %P)tI Bx O$@hmp,ٖ,mvYg$MV\"Y(ҭќ5}F|b b:!H |/@YLp$7if(4i9JeQ9jC{/JA>vjJ{:kjTmđFQwszŵnJqcL R%Me礖%ձ율3^DI$];47(GKhHT0 ,Mge1l1Fa@&&V#Q'9,n:UU6KcmgK[Kup29Iid:QzJ7)N60,Op: Ʉ?/MV˄MKRǼEd3Q9*hJEU)iA?uҽd ww_D]} ܁80R dՔѤŧG=yx^Y8d9RzF46+3iD@%#pb;WU ˲+ ' Nq|9ejjQJU)IE"fªy 8"Frؾ} *IF S 
noQb͒%7^RӠ@|IEe쌪bS/Hg;(Ge˃kLoPDJ+*);!ޥPQʌKTU!o VDJV6ӱ\Sn5 j1 +nJ9 `_͐:QG9*u2$9&9YI:8nJڷIu5Uq} 6#Fě%L)^1F21jlHʭk $e֩yс O!r\C,7# >F6:Qq]Wg^*>)Cbò]YDW<ܲfel1a TRlG̣X#> xNk m?/,rTu*I-x݄,#QT4婨92 ' IȮPx6&QGxi( 8 zC,;RQaOp)Uuר?"ۍ: eYyЭou+MKûvTĉ@/fڲӔjoT ok[8(sɨ#$ yU>"OЫ u2RC$յ4G刍 k|ۓ/NpbX 8Yd?7[/7夤4uH&4(ِY1ߒx#@^":K2(i٨&J( .ZRP;ؕj*LopXBߗ_,F尹z%upܴԿnFm:Qw)ؠ pTL/T~K^n-825ƒF⥅c;b9wMqlBJIَQ'9*ܣw'1Q'8l&%2Q 5MK#OiGr&-XYK;ߕ3dCRPpfd*/) C45"N %VĖ$e&g:QXu, N$˒UAO&  8&4::-iΛߧTs= RE$lh]pQp:rS$cBP&F0* Г*u#V*TU/NrR MJ7HbQ7>ѾQ9jz2 K RߨdC#pD(Xsd;N1^Z7Bl JPpw\9t%/յw+u#6kߗ@l)(T:aK)ȑ٨mgՃzz}QZN/~ܺ gĖRItK2MM PeIvT\T9Aꅕ2ݑLE㘄njV!(Gldvp^Qh1[:P;j6iGrLN+՘_xZXz%FGrؒMpD3XK6˲{B.F tZ#IPJe3QG9b㽻){8bx-Iz^Ihٖ4I~>:e\F$ST+%)Р0.#mHJd=mU"&[+7QXwJtev_ U¢76u#R7uMajXN}|o:1YV!)eKgV~˔9JGĘD]4A /7&"4/۶%7.P&n|רUP, eYRJrD,Q}Og ԨD{/(ǤklM~dg\)-Կ3$Ga"Ϝ /@|YRQV a`1gKr*(g*rha`R9N,K2FNrDR C \*Uw$Ig~sJ51@`|$Gqt5NE-m1IzŹiO0E JPN* 2Q8*N%vU8˒RI~Q)*)ӣƮQیjv{Tý{NsL:fN׌PvƕNr +j~L)j<p(XsdŹ K9J0_(t'Wp=,UXkeI}Ju֜/ɶ%XT΢L&T*//PӒuҮ;rgă+]|+MN"᪡C^>s#@%xtQ/Q[1򋅨c7X?*8zM$iWj9u.}挨S/ɲ+86u#Ҳh2])xf 8!Tg%Imm:=VN^mIWRat$(GeJO(`;57uӄ1(b%,UN2Ye$۩ 2aek%ɗ߻F3./A5˒bA/R9Q'c$Auq遇^Qᡨc}PpeR2:!^@aIFF~2xn20:ű,FJbIaQ K=B{iQ^H^.@ec;S*ĵjZQq}QG9fTRgr2 EP,[sE刴ϚU(x(]&u^uB=Tܒ14rNeR0.ƥow>g~ԱcK؈:I$jLr%DJyZyR/ڞs -xdREH^nLc;rDDB K [$'q}) KQ=tV֔ ~NTd$Ir.!-$H?I]yʔcv>u#6ռ`88!2mҖoѾD8CQq-"YT-QG9bιT{AZ"V8jQP*FɴReve)Ϋӡ5̒ͨB5ò^::K?ok:pDlW*+jlڨi+Rǂ*r|8HI{|B{3(E^?+Ҫ)*HF^u#6iru/9]!CbMe<]awČ1ǢNqXDBN<=YWf+~4 ز$Jݺ3gJSQ-E)wGIX2R׊dKO#Y%iw{SO]>IRXT[2EP-TRg\$wkt&{YnE3FW:aݤu[l/VƴvuIJ}Y >҂+_Ov|WIK4u BFJKO_zW?uf;Dg2F2Pa,W*** MϥUU: ⌂;D&j :yc#Q8,v"QwK 숷T2SWjeR Ҭ7A֏4uYQHa :;OW\(2g'%ӶoUV >yu]t@\(-rcڵ3(G}LM=*CbIpHA$ KRh+d{•Vq˒bA_yL,7(ذ,O $-Ϟu,ะ0<[rlKu;Լ`6Kn~2(M]K.a) KRv蘆m:%z[ղG%Pp1f)w0b :a݄ "H\e)ݢKG:ΨǍJTQs.д.VaT,Pzi`&mQG9:%?4E'J*H m|:(Gegj%W(_ @l wc$Tx1R0^;¶M+Cd;RadD13˕nmTXy談%:9CWz-y+pNWNRj{rT\т_- '~]pg~K E9]ilV$wNHŁq /j57\ f'FVgǖNc,jH*ypO42eޢKr%w&1ەcڽ+(GŶ%\X@<hhݬ0TyQ^yWn*-S!')nۦ\Ψ۱yJ4e?:XRXrRިeNp+'r:g)攂|i)!mmƨӜ z.w4/;ز0OO:Qkޣ%ܶ bέklC$GN2]!;!{Ku6ulkRIeIސ䕤a?Ԍs/:p8Ii|.~ 
(d[߭zcQ'cu?XNq%˾#eMW^nvR٣ucoל׿Q&ķ1j^E1LpG<c QxYM*v, ɧKTCY =Rg^¿'Nu2लRiӾ+yڗRQ6]{UwI{٫u.o\\+ۑ[Rٌyfe\J7Hv| 2F3A^cYJӕ[pWyB@ cۖޟ(֠u?KF4ߥz-y;F mKckt߾Yt~MuS[U,˕BA~} TkPq?+^H5˒p_bitgY* PrwclVVw1pGcQ8,N2-Q8zNBUŮZ(dŒJR]|;_SQ'"$={효yW\#oL\LLqi-FꚿPשJ9)l˖/Jš?e"9I)7Ч]VE͹ ˮO5.1PL]Gb0Щͨb/B>7T';Iid&:QKZǟR%Ka8F4睿7MZ%X@䜔4ezMQ-i>%IA^tEq$',CC;wDjIAݨ?9!8i_F46ORiDqC0[/=Wk[*Q':)L?S%T~H QsXJۥtL]/M?C}곚vy@sRzϣrܤtG?RViZw[n:uC3Fa!# t)ɨS7% n٤^O:1Ywkk/ONj KRqTj^2O4WiԨcgRa7[% t}R Szt~Pqe:iViD/L n՜ˮRS[v) YNj%O*J+M2-|;436P,G = _*VǕoNgG8i?Un]P+TKrS1nBŢb5#P#TFnd=NJ >&(uWW)+儰$K~Iu:}WZQnmTODJ:~p:7vR܅QG*N"+[E^Q;K/߹FK?1|4u*$0RU{`mI >EGhޣRA*p-uE)+Im\W-?[}拚sk<P4=ubNOJŁDy[L'ʘR:aqi%卍F嘸 i ʏ+S_uc4y^ƏGd&C\./|kձdRLрe9{ڻA>H]ٺ]9=JFNK&YjGŖң_WM> ͻQ':q%\r p5ߕIzb7K7&y%yZ|4s2{ :jeKCV}KSԑi3t>]g/x3祠$l\@RQG8$ KE&# 2FE::y>3Y/ZuZ:q:s^궿vx*e_* JnCӢ^ŧ+]:PYi`rm:EHe5תs)k_A[mz'E)(JA(lWUޡu?{sM}S 辯:3Nlsf_v&8_zkK#rFP,Oj8e]&M?<_ƞJQG"fOim77-8'D*WZ雮#Ɵ٧l;-G_xv%;Y_uSڟjv{@lɘ܍Q!E/TMg7+ ٦=+漢$L+Ϸ9ZU!+(Hq)3U+5~iہIKCvϪ_ˌ:լW\N?)tdNcfj<90RvFzνLݧ 4ciiUt2!ϟF)" <WtRx;--q%^v]zzU㖯lz/dOe򒗓BI-ҌKҴgeB5LQON'')wh?ڂ~-=3?^ 7ޠmWCo K@NkҴ.U꘷@3)9I6I[~@$(1,M|*rQ^(JQ8,v"D %'֯UPP*'Pn]ɿW⥺>JKi-&K-˗謷O3οDͳ3 8Q,qGӶիԳPmsѩjH J]}^UɌvo)߂ﭴԶ u.Yy2cP[ 9W)jZD(rzuz:WI}ԖҦ~u%7]ewKRP.YEҔ+^gIalwOQ*[O[’lG{|6?fuvԉN(ے:,Q˜Ew嫵]CmR2#u~˳-d%Ӗc꘷H3fq,e:njQ"xoQ/f  x߸6 e<S9ܝDB: $H&i}k'4mQ':Zwsb=?oTҨdeo~^jM:5M%ǮP!uzgVoGH$Z\m O_5詛Ӗ_a) 9U&,č0H m2]s<}SjnQQˬ˦⟖.vR yV셚Kq,5O5˵W{5FE'5QĨRƟ84J]^gK6k2ӔmVpt,IT*?C0R~h_]/4OQ'7ۖSLuEN~f6U}}?uԹT_R{[ 7]-7D)8Re١_f,q \smKʴ) Ҭ>9MYV}{0[3df;N}{4df8a=hσl: b.]zzJEqlKmsm|\xF?a\R[]oR"{~7ag a0HvVj]L  7O)S阢dSRJfYPxKbNrc~N0FMgTC@,ɵߥf{Եlm=??Ү[o%)Qϩá'y.kcj2Ǯ8IlF_ݯ0W縒K栛&fjXM3q45NIRCɆF%YM{=_;]+]_uX46(ؠ9sѫSW[Q{jlkbp$-߿&Ϯa EJK fj=_gGJu*بDZGܤ:N=[ݧugek&E%yOk/ VLyvߓ[/|+DokS}[WE6W{{^cxo~K{=@NoQ%j9[MSz0is[+JF8>xQY1?==y @̄^eհWN2UwIiۭ~ƫui59R_z6* HQ 0aP T%Qgsi-ڷmm|J/SoU>ez8wihv mߪ[4}7=Me5ղoKYu@/L8$;%egLRˬңIj4YSݭdSJ5(Ͱ%* ~Ek=j׺'4y)QG-C Zrl۪o3|f_m?ZeżȏߢjRңɪT}{2kiUS&%ur3Y%ERS-cd#Ď1 n'JfM;wߢZtOl잤WQνTzq/ dč &RInuertZfW[Ɔ1I6Jֺ|]ݟרF^z5OѤϕW*4>&o|LȠr{ק} (?دC* ?6&?7P_ 
V#I%eٶl;DCMJ5(Yߠd]uuJ7(ԬtCMJ54(]$;S/7['7I%&\%=/{TQRG4gJ%y|NJ +74 CSaxHF QP,)執.ܪH$lj*Y 7 7UE&jhTEVeijS$abMHM==PҾ[w_^qb#]Wt]NӤΙoo٧@)o@PaxP!S0>P䗞t7FJ$hhoq64)8Рd]uY%-FjhTQnzb%JMo;dskB'_7XP!KR!$q9AL*_Yzhoӌ/:IijRXB_|FO O&/5m7o&?5/Gj\0Gm,W碥꘷@-3(֥LK`ǔ̏a9#_[km_^M[~Zԑb)L*lZZ% x%z%e@Wԋb۲Dǖ-ue Y+)\GN$d;67SqtdptXwGT֭^uH%77?獤0) |?.( /O X,ǑHrr\zA/ˑlh=hhYgG)-IEҴ?F~p?X^I/:9-VLoWN•8 ~,)Q(& B;b|BQPE؎D*Jd-ro4܋*d[R}{yʩϫ8:r{widn}}* VixHؐ<`Yr Y۔R]$e:Tѩ)=j잤Tk J54\`6jk+TzHFip6=_x?/cqR:u) u'ߧ5i)o& u֙3.'Ķ-%)Ni+ ٺ3 %*(Vw˵RU5]MҺkN>?8Jf2Jf2Rg&Jׁ/xR(abɊy{<'D S6'1=~w468uT+=xܴd'8ar%[ɛ:ŖitޫD)z?<8ab)JVA0cIH~n4$Mr$eVQGAp Oߨ(,%)<bcJCZ/k`Ǯ8A,GrSROW7G {Fw-[, t:!*S6J'KSԪ]xqP%;UnvvI\PaD8fjvqV~F [uӇߩ=:TDVa=ψDwKtFV :rU~LEwSijM7RI\_UG,t?-)J UaE09zTމIOE*?ӭzKyᯱ,ILpG0T,j7~8,[rtߨwu$>-iwu"DKbɩ Rc6)S;dSW);%9V^k#F&oՆ,+vI ł|N$N*ܫ7Tgْr۷Q'f=j#ExQ@,%RiCCb1Q㌌2&ۉM;]h7}ڹ#DC7u lONZ!2%%꤁kt>[NDKC{u_~XwWCP1,J1y/@ĒN+e K3F~B 25J6H;nM|*QA̭tߦ ONHvB,[ '-#p2XJ;oS~#u$'ƗtH#{D eK{ОQG;d&uC2aX} H (;ܺiˑukwE<G~Dͧ ,p:,& B)5tgL#<T;'-~N[{}QGb9o]~ G 8;D*uC3FAGxi((NqXl7!7U& z_?^q3xL׾ZoJfTnJa>LU; z~ƆFD2׽@kM}V?u Y M|uSy ܘ܍ xa_p $SH`LwZF1Ous,T<9Ba( 3P:3swko TN0ˑW*}BidϞc@uxI|uިS@<9dبMb1H( )>'z7DЍ~鏨gW]HGZ%* WC{7n:͒FWׯy6S@u0H'mM7l:jw?FJ$SQ84cdЯ l]Lp?[)iբݭk5* F$uX g;Av[CoӖ:NH } oXŻt?|JッQ`6NkwE 5;#%;JuC2a(X8"p"Y ~R&JRG+U'72;i}oК$S,Om?_K,) x Ų%'Q^C|㫺_-[$i]C- cQG6b #z{lbG;^;--[m}7۶cŻ9jR"&ڸ5Я $(v% 4}S$ôZ8. FoP8 ,G?Nݯ3w{8$V p(iMڷB/*Ů-d.0syL0y|1%I$Fa9?O|3oR뢎Q|ϯ4/Pv_?#T~F=zE Ib寋gouziOu?7٥]b߹Y׺jD Uȍ:7 ܘb39NTNܓ*O1~^o^׮WK/d&u:+į|U[6((:%)K CJ{~]\3~NV߫[z=qɥ:#ӔEDEw#otHwߠwie?ւW]tCS)q ym}w[zrR=>p9 }P[ą̏|^S.:w;n&w *d:6j%NrTZ /\ ѯ|V[oVh+^I'biMkwE#O<*e$ۉ.S%=,uT u˯t]KЙ;uϛu<'U>\tV;5aIhkO-SF'Py=yhP_AiLa)e :\XG>K K {5KtG^u<T8 Vd],ێ:ʡR/#:as{׾U-]uWFg~o4i!7 NبKokdZ1). 
_F|^` &ۧ7]wܤ /Ӳ?zN?CVpg ؘJcOk7>uOt5hY*:)X'3Fo w+r %Y*mfbZ[WM]^z8K6-8 lId2/cq4fLRn<M&~LG;nTUo1u_u7WkGzgBŜNL.|CRp@vsf,;UÚ+ T%)z?Ucf4ʫ9w,'_qXQatT֮~]Rž>aAA(HҁPvWV5XٕJG;7Uܛ& QAL1HYsCihտ/I]_D~cc# JI1*gI)l7_KXKCVݥ{׬#ytʛ~_]߂Ey`FZJ;Uw<8=}yQyc7^%5.:S3x Ϝ-uc^c$c4cv|PoMa>/(,y5w~Khߚtߧhe{]Vw{"ٖ2(} lݤkVk}wh-?/8(IӲ⾩eBϣ2a(o|44(&@l&iuhWf}DGz귿ӿX0+(zT YL)0&qvlS?g=S˩4 .VikTHcdP{{G{ߨ{Sd|Gz:4({V GkI5Yzz>_N$ژ5N~hP_߯Rc6JXP4BL`ID#),+k\koyby{5‹:m:-ڿ2ֿG?ݫwF(oѢbMebHcƙ :aSi%2 a}oԸh]wUjTaPam[6WqOKy)(~ZRP*/LdNTʭ+)K_uHZ.Ӭߩ^3e; *so=^{_=߭ѧV˄T~4T7[q(F#?˫w|M$]g%9inQnOiG_kx*)RI&,Uj@ :x% Kx^ _G>P%wiyml[N-m[ģ]BWK8b'b%~!eYGrdZnQA.uʶ 1 >vV=J)M썚{[ճ,ev'4m_6rv ~Ia<Ʋ%UZY*59 rPI/%;5؃Z&T3S/}z:_.S|ղj~m=o@}Bu]]W2' VRϨ," jlz= ى]m5ttr#Y^ށ غE}O'j%+(^I&0Yx@6*h gġUuX|,Qx t~`eͽZR==|sdm[7[ڳ!=t ~  ^d;Bߋ:!q91f^_Kϲu*qЍX my5r5ke+njeQ }_C;kڕvLJz%^J-KY *([~{L1#$rUk5Im?KEkW%ZXݒU^[-ˎC''sn#ܾE6oԾ'طM&&nz2~p?1fTϓ_$#?tS,U}"u}:,Sǜj陡D:SŌJoxfVݯ'PQ%iY߱TɏzdFt/bU٫Я/ _P~etzzSa|uRu- ĚexÃ̷1&ɖ: *KI|]diڐ2Wn&uf 9mP]%Ĝ[נ̤ي{Բm1wm/0]v•&d'Sj?ze Ϛ'7^Mk|_{zRV?F7<*#_)(PVqd)#7 è`ٮ}*퐂?uHcv\G  վpfUˌj\5OШ끽Ńn/5ַWc{4ֻK#;wjxfl|RO>X2Ap~e**9l|8lGPsԾhZfSKL5Mnَ+UѺ{* iww-xfO[7H&u&eg]ڬ9ze'ώ:˲l[~nL]FdFJMRn¸r۟XILWM2dٶbAޭ s9-zh.o=+N췔o/I.9 CQA󁿨:p2PpG͠E8* P*t8n9lוeJ4}+uYXH檮Mr 7 b|#,o܇b^C; }lzwjdoZ[.P C\|7A >Hk6Яӫ;5) oX)p@'րEAT^2}qTA0TX30qyNg0T,X舊#*08¾>g}"&nsЏP A7V} $AȄBo|\^o?[Te֫n)NQSٶN[[nh(MgHd?,#IIzZC Bߗy O~Rn\^>/˩0:Ƞ * *׫OػYI9 3Qfe<UkRTĿ;T(<Ӿwhѻ^ZJwNQ%랢l{R-J56*ФTc%R'V"㺒cI$co<łBA~TT)_*TQidXaSq^wo?:-+|]cT P˞_(d*mZOjǍ΁eJ$T7sg-P]g-ʴ)ШTSR鴜dJtJ]En $\,UUqtTQGFާ=4}FY' TO=h2A76t/zJ)?Nv-5m 4[$ofYa#^d偩aK-ȲydLbQp/dٶڦ/P}d>E;W;[#;%YShiuQ3(%ْH1BNsRwLVfٮ&N TBO3S=+^ miWP*0q:pPpG-0J7jʒݣgӦnA%j ƗTѤg]ιK9E&ǣ&fPD-I$W}dյvgv>S^MKQ1FR"ci9SV{l7!T( QN %#b;,V˴9ʶtcbmX,In*@eK4mj陭T]RQA0*h;jɸ(B0WMe5o;&smy l!7%٬!{Rqj1_.':p$ƣ,#QK܍$+,a _a+ت35|- HImaB+Jٖ殸D,WCI_؎c$,QKFT.SpG ,Rc4e[;:mv>vO~Qr0_,Guι5v((2D8F`!;j^=[p*1F~ v6cں5Tmyv>A+9S<)I huZx%Bca$E8Y( HE΄b^L,W]$u/X-ߡ{$ۉ:%prmujPum] rF$P(;H @u6y,[-߮˜'7%YV)/J\p,Wc4Y-+P=(fPpG-wT+ɲl5Mlsf-ԎGHrQg/T~sڅj4]t0 F$P(d;1RQv2T֭ιK;&9 f@ })𤎹5cj6[٦6'T:pvD8Y;lSyuD1A 
?oҤE+5M{7na)ɲN J~Qh3/U缥k햑؎Z`;jwԒ~dY*=ɲTޭvL]?m)MG=FK\x&/>S S$ |O2&PudZ\&iIY^Imy,e[:1g=rv=V-9C/.ʓ'TOPMg*YWx'NI^!;jGX̯F0aTT"QǬEoTӴ4eܤd9Q@Rɚuj홫Lsہ Pc z T= %ʥgT^)/ҍL5uhSjU~ 7%Y*clsF/~/S}$ɲxF6N| % &~lGF{,[niSyڹm[}OrQ'@ JH3:WSΩr)/c={jwԒ@z(j1$v2m-]ꜷT[M6q%8@ <{M?b5OT] PAr1:pP_D1I$15τRdNs}&-zREC;*,:qfB+JMmy꘵Hvs*AhTI"Fe)ܮ))3{#Э*r* #%)u54ii$vcNIQN& EF.,c,VcTe1m_}zX%8J/HӖ/Rc4W{QHzl{fPUD- $=*͢ CaIN2֞kT׼e۸YnJSRP Ryej2[&*JQ,Z#P3(Qy᧦ 0 P2۠5M{Qic$ շ7j嗫se[:$IہJzX#P3(VTe/Y-|5Owqv݅?疩hhѨjI ۘ@ $ lBz6ݔ%l6$wS6 BB6@ b.ݖ%ZlM!Fgz`q|L~p{wW(r5iJ$I$I$I$i)!.wf%W(!'ME*{GiFqܫ48!iZ\"dhOCK; Wޛ9أdsɦ)I$I$I$I%1$e^/evw?FLҫ48pLw(iIBHjji[9.=;n䥽AIR$I$I$I$IZH\־n]EEjR1=;ndڴ %I$I$I$It P?%]Iע4C!˸͕^vt9pL%KYNo9K8N!l>BI$I$I$I$Md`ƭY=dr5LHJS1I;D:k&;KCBr(Eoks3Gz\"/#$I$I$I$I!>wRϻ~$!)ΓA^J;DJwdef`IEB|m=s͋+d_/(JR$I$I$I$IUPv^MR[ @Rv,Jq!Rk&+K;D&ICQBM (֦](I$I$I$IW+.B>o{Z:2V$j7P9J3wdxxIE.!e(C.\Ne6$I$I$I$I$ C'$K0QqjnA'IgQ !Nd^+F <{ {vZΓ$I$I$I$IrBrp5Ia4p}4#9pLW>Oe.i$&z:DZg;ȋ'%&$I$I$I$ISDyfu0j:.ۥs#~irங.NH/c([h7];w-'5iJ$I$I$I$\q ryXxeJC['LraΓfSW8)-ܥ/[3IB e(ì _p sbώ8.2dӮ$I$I$I$I9B+V3rk Iҹw+p<)M%( 5i& !! \ٽG],"/?H/9$I$I$I$ILJh6|,aJiGHir.ABvG['$1ksJqsח;1J6BI$I$I$IS.@\FnV Q&"Ib!o#Đ7$I$I$I$I떔!.E̚I$NΓf|ʞQќ Jp.*@cL9sY|ɛt={#O=N6Q&JI$I$I$I'$a% n]kLXN;ODŴC9p* ("i IjL־AV}ݷ_#/M9P$I$I$I$i)`N.x#KEB&gf4׍>vEd}C~~l8|څ$I$I$I$ISW\|} K/5 IH OP1J3wJ? ԧ"kB (mjeKX eӎ$I$I$I$I:B !`e4"Om1*R!T]F_~0I(!&d7ޕy8nr5@v$I$I$I$IR>8 olۥ:pT]FGp.MY!Ijh_G}u~%r)J$I$I$I$\6- WkTp.}"; #tHۂy{﹙rD&v$I$I$I$IKb>RמOV "vn;p_T]NÁ4 TnEuM- mūٳF? 
en$I$I$I$I:B{VchM@qTvp.G1%`8I@BL7(_.$I$I$I$I:{g:ol2vi\+;ܥo%ρ?J;D+i\%{CwI$I$I$I4%oibҽt-a49p<iq.}k%_zSn*$("_@mXBɦ](I$I$I$I%1jrx7\D}s+Dљއ$6W8pH;Dk!(x/q]D#%I$I$I$I Ikx9.M v49p"zRn@ehcEߪx/pd^,I$I$I$I1 Ĝ]SMKJ;DB2< W2oC޻gCwI$I$I$IIQyWҳxpTEۥoˁgiH:KB m,[yO^|23W<;t$I$I$I$In= Yt]D*vIw]wRnt}E`}8ul]$I$I$I$Mq cx%l;KFDe(p.2ǀR;wLefst-YCa>IM\.0rzٻcP9E@&r$I$I$I$ISDWkj{:^Evʅ1$I;O4|"U=ҹS뙯 dkY~гr+"/= 1"$I$I$I*$@\] YxiT\K;O51_)ifp.[ts$MK!P oY~}!."2iGJ$I$I$ItnPgs54ϝ^(M'iz+|]9]:ƁO]k$1I1ft`߃;}%8!Tn&I$I$I$IR5ڰ=053r;TI[+*{?o!Cܥt ͑4ťCۮa<{8̣L>y|ʁ$I$I$I$eI DPD,v5s(:L5<]:K~-L~hH7GtWvlVݼirs cDD %I$I$I$Iz}BR9=_@K_? / =`YY1۽bFJw)]E*'_͑4݅$8>lz/rpN=B\.eK$I$I$I Đhf[yIPM;OR8 ?R-Ҍ]J_mEwts$U\&Lߪ-]ox Q(JI$I$I$II\9̭}=+63rj(ӮT]IҌ]:>L)HD/~.ȞOP9E2ٴ %I$I$I$I֒"%,Z)8nt~D!KSͧRnT-B86BCkk8.%:@yb(S]$I$I$I $G~^E c%9$p}!*KSύNER qLq|Xdq#$( H$I$I$IR2,M]]_/"JgےΪ];S8p]5Gˁts$UT`KZwOJ._N<{ ) #!:3t$I$I$I$i%qkmSm Yx5Sq.iUnoŁ4_>#ڄP;ͬ{9fN>@P @IR$I$I$ITmBRydkki+l=I@a$z\$9C$}{ܥ,$q%ki_};#L"H$I$I$I^*gs]/!w'IC$}gܥ^2f#E l%سF>/dA$I$I$I%e ƶVW1j:)R.L'z>p<I35iz95`0IU)Jc#46MŽO7p|LEɦ)I$I$I$I.Buich1c#iIn{2Iwiz(p/͑TBS=EK6s+8|rD& Q&JI$I$I$ITʸ=Wgyx 2iIn/"Ur.M_OW$0?I*.=+9cGKQڕ$I$I$I)#@\|ܹtϢ<1vC$6ܥ TnhL7GR*ƈ2Y_},Ȟ7qG;~"W$I$I$I$xI򵡵 m(:n4F"up&U+ 8I*i[:XwgǗ9q`1L |I$I$I$I̒iW9r@qtyGvρT][\M7GRJ˘`wɾoԑŘL(JI$I$I$Id rj{&KsW/֞O ű$UoϦ",q.U~X ԧ$z"7\L>_8J@68t$I$I$I .CN.]˩kn41ZYKv|>Igwz}WV#j@i|\m=ˮ!zob7ާ?q(lڑ$I$I$I%)W-i_󮤵(qOm4> ^!&GRN3i i~x0?IU.!ޯp^2ʕ$I$I$I)$q{ le)K$by~ශ)b{>wif8 p@kEVR.R,Z6ΩK2tҮ$I$I$I$R!Tg&_L._Kql4$I$I$I4ehsx%[(xy[ <|x)%'?,M7GR !86Bmc3˯Q?s~=ĩDdӮ$I$I$I$}$CY̙.ch9qcΓT>|HE9]bශ2tjeh7Ĝ9?UNzxlLڕ$I$I$I@R|}[{}1Q&!fw)HJwI{w lR-TR\gyt`_pAds@v$I$I$I$@2d2KMp1FJBڅp7)HJwI_s6#j@yb(g%okf~^& _($I$I$It$~}V{KY ZzP4>v 4588~-4c$UFOΪy'G~}}0~jllڕ$I$I$ITB\9y|)]V&(J;OR{R*$dEiWJ$I$I$IRk2з<֜O&8z:K'd!MR$I$I$I Imi\ 6qJiI~iHB 9#vBh;F`8IU.%97MHRu Q&֫Z*f)7J$I$I$I]td+>$Uɔ[$MS%^ p-Нf " WѾ`)Ïn{(+%I$I$I$ "dkk=+зv\H 4y7?[$ p.ijC_~L7GR* P*2gshv>|7/{XB"'!I$I$I$i iZL-BT!(@eؾ?I%MU_x/p-0;"I)JcD,^@*?t;?r?'զ)I$I$I$ öӳb=.vlJc$N;ORu2lsΔ[$[r.i;STBS;ME~] 
W8#:diWJ$I$I$IΒ2阚z2iWJ$I$I$IJCH8uM.]At/FN!DI1/"IwI8{uO?tZ$:@i|(g&:Wp;9>^|n7!|*I$I$I$\)MTg歽|}1B'L;MÁj Yz#Z$8zl.֫X|#O>̉CG!OR$I$I$IdKP.Bko's݅4uR.S=vyoϥ"IgwITNs/CHVqDBRH;ORuʰ}{-4KIvypm9Uc #'ԲkZ;Aʻ_$I$I$I"&`Vzg޺3@0Nadڅ߿!I:WK> | GHVqH\*NxN>%j!]$I$I$Ij"1F5[΅O xI3wI38g~OHR*E:W31OC(JI$I$I$I(Vܽ| Vow5uF!IPRu; ?oSnT8p4= pN@> I))2<7^B CH!HI$I$I$:G!K][mh06B9)QډTNl*#wIҿ]]LeCHN!9I&g+X;##k![v$I$I$ITE(՛f{(ƙ9yY%M>\v$MU%黻ʉ#Z%4tc B<xЄ$I$I$IfiU+]eS?Mc MҤ9De?I[$is.Ignශ/? J:ۢ$FOпB,u//O& C%I$I$Is4q f +7SAib r.i>EpLIYRξ;нr)Vo{Fj(e"K$/> |7wȋjgA&CwI$I$I$M]$e(:nJʹ&$ g]Ҥ `"I9KҹWx+`IEVi|2 3{n?C9=r' %I$I$I4uD@l@t &8v$%M ؟r$8%)=?r hH3HR"I2=76o!탋8^^ D7%I$I$I4I s1oV.]G]sQʅg`%M1ீO"I3wIJg7?-IU)H☉S3pޕt .У;w825%I$I$ItE8-mFh#.NP9yy%Mv$t%ip'p#`K9R  sYÏdxYp.I$I$IA(ioɰ ]_KNeYx-$KTs kIZL0gp $ڦ %I$I$IT !k[ū5R=MHbOl4/)[$IwI>|?T$U("dyzWnm09{v<$I$I$IR)M@\yp>7P?'*v&GOw"IKvxpm9RFNRy>v;8u45 Bڱ$I$I$Iʨ8 sp!7<$)<<&_[$I߆wIb*T~}MAW8 :Xt1Cew0qj!]$I$I$I\!)mlc3(8>Ry%Mǀ?>VGwZ#:E!)>A&_üۘӿᆵsϓ\.I$I$I( 6/gtQ0raI cT$M#%i)Tk$U(")(K51Js{y2 á$I$I$I ^â.g&fw/ -C;8s;vV֒%I$I$IfH`4`hy=KP!&^ρORn$%:<1p;*C{$UB8zL.Gע5tϧcxO>E?}$I$I$If$1t/Åt (őSE8n4I ܗr$,p.IA!F]#*EIS9Aykϧ#?{oCOs$I$I$IfE(C˼vlehl&.)<)r.i31C$IgwI>76Z$JEĥ"qHCK[m"?r/w+ccfAá$I$I$IR5 013 m3QDqן(I>u)H2T%`n=Xi|(uB:z3mC %I$I$ItNC66ҷ< ,!_@ibǞ.i2=Oei[$Iā$Ugwo~ȤZ$D!#]t-^{n{ _v$I$I$I^Rt .`\6ʅ1*v&_lO;D4Kq7p#3UHJQD)a^@K GxʩOQ<Ҏ$I$I$IwUFQh^ƎBW]dQ`"I9]f5S-T jahk'K9s{ ' %I$I$IBSP; _r17;H681Vy%M'Cer8I9]fCTl !٩IZlE4}2R-Tm+;Ŝڹ}ik28t$I$I$IL@aru0|LK ٚ:JFҤ )Y`_-)$=|0@mATFds/XJS{7+{W8Dק)I$I$ITJc^.c`15Z(Q;&S{[$ISwIҷ )HJQDN{Ffwst#fNO4ͩIZIH\nv;/Ï˞73~L,8t$I$I$Iz-"1F~vዮg&:{ Ѵ %U~p,I]J|>WT$Ura(0{> -.C9vqi<0D$I$I$ PlLi _@ibF$: /SHWc7_~ts$Udry:VGעe^,,j$I$I$IHbb`%XJ]s+T%Mr$IwIkq0p#HJQD)k{Ff照w7qqj!M;T$I$I$iIb(Af]Eע54vșg9l4i|.ϕ$8p$V. 
WIZqD4uҿ//S CBڥ$I$I$I) PYK/Ui#S*CI))H)[IFeޔfj(Me+zB|}ڍ$I$I$I)U[/g|,ʅqJ$i> xO9SDHJdy:KְgM$I$I$I:wR1g<\7!.)P;n4i>| -I: KΦ < |ET$U\&)ilg&Zpɝ<{ %_lڕ$I$I$I'4 mys&]CtS$Iā$i2><\0/"IU+.!5g.u/m">}Fq j}[I$I$IT]!W ]L-4uTʒ&>ϥ"IF%Ii?*z߀T$U(Me2 A* KE4C8p$kǀo~LHRՊ(bVG[cr^8N2ٴ+%I$I$ILP.@sW C[c*BByaIwG=H-́$i*n.~XjK@D9Jk;n8+%I$I$IL'1 wfZ;"NTݞ~8r$I%ISf$Zb(hcBڴ%I$I$ILP.Vޙv7^DsWYqv6Y`w)$}wIT-䙇$u!ʅ  A~/=Q+%I$I$IR5ː}A?C[eBjf2xyG<[$I%ISU <7|,"IU+c8%k3#O>ijw#j!ʤ])I$I$IAH\9X*:~v;'K:>lN"Iҷ]4՝nXfꕔKEԷӿp r$IwIt2p -`nAWR.Em]p1N%դ](I$I$Ir YشyΧ\m=IHR.'&%`-$f%IۀwjB .4wgmt-^_gl kI$I$I4$1:1Jf$. iI~Rn$uq.I&O|͑TB'E.gVg//yg︎S^"(v$I$I$I:Bq :0xt .TL7NLCh-$%If $T$U$.CQJ 瞛)(JI$I$I4)K˱+Ymdrg%MGl"n^HEʁ$Z> < m-"I)r(͂WоpCOnI$I$Iʙ3pv#7\̬9H%r) OEs΁$i&z#Z :ZhlgF:N>L(JI$I$I{vmQ36.i nr$Ip.IO mIZ!IL*wL%_$I$I$*)Cmc=%먝B&#Ktw8n$IrB#IRIV1 wT$U$!hμu1 wMRơ$I$I$IX!BmNr5$v.i~珦"IҔtFot$U=T^DS-TB dsyfuнt#{w '@)I$I$IRu BCr M]=7]d+| 8$H4u8p$? 2rT$UBkYs9c~#^$K$I$ItvHb70g`E;J,;?"IҔ]oo4T^\KHR I Dj{F9}q ]$I$I$nI 7^J-6Hr$M"IҔ]p =TiIV$d24`t -g߽_D%I$I$IB4uou$vI~sє[$IKʽ|Yκ!jhǒyyy DQڕ$I$I$M-``UokH$COC@9I$:e96pmAW 70g:{y]~#'{$I$I$IEH~v [sx5 MD !q.i}ʦ1`4I$6cۀ+S-TB@Q;k6+о`)wJ\*;t$I$I$雄|u2o6!.II׀ہ)H4-9p$9|x~% IU*Bd3Է3j.^{nȓ;t%I$I$IJ[@ūr%s撫xjvS @HFí$Ig Y.!&WS,ӳl#%N>%I$I$I3VH`vwC[|3]Ҥ(!PLEiρ$IgOxʭ>|, I+S t dv< {w)Os$I$I$!@M,7_A 46e2$I;MRx0r$IUÁ$Ig(x+CIZ!I mM hXƾ]wʉ$I$I$UQ.˼U["f!Bp.i;pp<IwI&qm{_ TB @6qN.~ 6gq2%I$I$I:Bs3jU/w.ir%M@n$IɁ$I+^'Z$j@AVyi^< %I$I$I]] 5s,%_@D]ҹ tS$In%I:7瀿>lKHR !"jt 9](I$I$Iҫ WWG[Yd2YB8n4vPcTNp$Í$Ip;FTj*I(C]S+/k*s ?~'H$I$I(|9 dy%MoL7EŁ$I8 |YT$U!+Fϊ:CI;P$I$Io-$?֫;@gd*Ifc8If $' m禒κ2g":xᙇٳ&^~(JP$I$Ikx53]ҤJ H49p$)}EqG禒κ=ޕ1g?U?tIOI$I$IRzdjjB歽VLqInF͑$I%I:F[KTY*Dͭ _x-=+68N*SH$I$It.e\m0kNlKTxS^M#IҔ]%>@%m!L6Ǭ^V\vWl;ԑ8t$I$I$M󯡭!|-_#$MLrrE$8p$ij*ǨWnIkdtg矸=w@qtBI$I$I(vV 6^F5BTyiITnrܓ@!I8$ij+;pI2tot `߽_஻IJgp.I$I$Iz*oC].(ʁ+49b*7Hq$I8E{\MdҌTB(~v/>zWnۯ㥽OI$I$I^3+GQ9v Ms Ө$Mx ?&$IK4}`ʋ PSSIgۙly˻9 K>v($$I$I$}w24`1lM- qc*iR~8͑$IwI2p9T^_Ԥ%J@25t-YK[0 1BD%I$I$I$|C 6\Bߚ.iR_<ߒ$iq.IU-`ȧ$Z .]Ϟ7cKxK$I$IҌWC&{z핻:l4J>͑$IwI"i~dҌT|2Q7 =7xs@̭fSn$I$I${g9Z 
kh d%Mx03&I)$U~{*W?Аf*!2Y9NzF_>!4wI$I$IA*gdoiah]lM-!&/o8ItV8p$Tn./uiFIR!B gf .c_awP ]$I$IXeoچ&BÓ%M *q.IRUq.IRu;~XKT! b7ѽlyGzg>HQ$I$It|}L΅fυvI~3-IwI[[sZLQU 5oqzW?ów\ÇD]$I$I{Yxᵴd !T I#^pHTKTpoRlTBDQerس&N&$q4w$I$I4}3g753rzWn&[[OH3vI4#-T\$%Iz%I9? `9PfL.O \;nH$I$I)/$e"57P7k6I8{tv@c:{Y?ɱ=/rQBRȡ$I$I$M!LN^F:Ve2.ieS^H7G$Ł$Izٻ8s>jS)jbɲdɖ+BHX! $xABH#oHXKBB۲-YU-ڪn;+5g>F^~wqE2+MkO1ٽJCCI\;t$I$I3/=u=adw)V]D$&IBI6l~x p$I ́$Imu]t_4-YI\%WoUt-YÎ{oaH*# DЅ$I$I4qI)fCyZ;IJWCIʶ" ?M$Ig $e__~ 3H#I\u2ˮ~3gƯqd.8t$I$I&@Qy \tS{& Ia1$IQ3b IQ/G")\ҘO=73|(iR}$I$IHڸ=еx5Q.O8l4R;`IW!ttFxU$g>` 0)h&ٿ[<Gr%I$I$iHڟ|S{g2޷$Nq.O ͑$IwIR KCmA$eVILۜ齁}6km;FWrE+%I$I$쓦q{.k_ꉽ.in~x*l$IoK.+IA$eVR}2{%Xt.6}}l42H|$I$I$$(21kh4T[%i |;p$IKS~Iʬ\P+^Oul뼰a$I$I&$=̿:&͘E\&.CIʶ2#[$I8]$. m@oITKEZۻXx8yqDA )I$I$IgNP;Rh`J 6\Ky$? "I2$I:>h Z$)88s .dw;9vk$I$I$)|iӘrf{r)t; ImHI%%IXxxj9@K"I/i]pKcƛأT'H]$I$I J塾u =KΡëhA8]XM$IYC$7Ї4F*ZXa O6G$eUi~C'HD7x/@OI/AqqﭜZ \'i|SY7N/J3߁ڂIʬ~ӳl-ٳ{NF#@Q$I$I~()1MӦ1k^A]ch:Iw#`SI4Q8p$Ig}uwE2R(ѽd5nCO=Fe$i 9I$I$I:%U"hȟE˘ZZ:GKk.p$I`sHP xOYXH*4NnckO]_Q|HI$I$I7i )Pob,:JTFCIʶEm!6G$MDQ3b IfHʸ|]o}m$T!ʅ$I$I4 $1 NuWл|=WJ$eAn$!ttFx]$ \p>09h@V]̌y7xO%B>ϓ$I$It塥k[ S.L8 E$_9ِ$Ig_  #)$<2HNwphVv}3ǞK\$(R$I$IR $16s. !) )n'¦H${%I[\LZ$)jrJetYsldϖ;<|$I$I$iZڞ+=9k.c TFCIʾon$I9p$IgmWVA$eV4 QDk*vs3xѓGH ](I$I$i<+F6Om{*/)GJYaSlIwI x'39)44tɬ'~MٽQ9eI$I$IRC@}s sS-R<~Q%g>H$,Q3b I:3W[$eZJ(ʱofv-K\%_(O$I$Iҙ&WP_`JO/}kwiB4:ORn6lE4!ttFx[P$e.m._L Z$)""3W^+ص6m#PÃL$I$IRVP@.gt1sJ[&S>NR -l$I]$e^?X-YDx=ױޛ9#I$I$eJ\6E$il8p$IYqoHʤ("Mb'4e^szf+;GEyd\vI$I$IĵW}SmsUL8jyGkv?v]$)K$.Iʦ("T+:ٷÉS-' BJ$I$Iz)Ҵv=_]ʜUA.OiDK;=-$Ic΁$IH? 
U` HRfUF 1g+`6?HPK$I$I:PԎVL1JS:iIC௩]n$IK諧^ ,{%3&ˣ8j4v_>"I: \CV$I^9]@{ IYK]ez2yOҘ9l6lE$)($I5 7`5HRfGky <(C/@>p$I$I4T˵::Z C'L!i,U-G$IK${@%as$eSDO~ Kdצog($$I$I1T!ar+_J-<2DڰqrIFi~C'HƟyف[$eYRL/rG:Z%_Q.t$I$I-iq fcΚK]q!i\R ')7vn$#:A:#(I~x'p!09hlS_E._G缥ym}Vx(J$I$IzyR!i3{]q>s\N}$JC'BIʾ=_n$I:k9p$In?y5834&RJ'y^t/]ˎ{oS>r"ׇn$I$IƧ KJtϥ8xq_Ҙz(CH$K$>=Eas$eR& _qTV8d+{v<څ)I$I$I?XC\&:ѷ :VP)0zڰq!ICp.I Cm.*+hl"JRo1}yyn=J1PQ.t$I$ItvJ!cA?3Wn`9~as-$IwI[n!$eWetϻ{xNOZB=p$I$IRMZG9CSQ=J G0K$I$IY&P-מn?s3kE) ?S%_n$I $I~ "{ʒDyh(kZ=΁䁃Qm.I$I$eQ Q S{־֎nC)a1n$I$IcwoV͑IQD$O%W3Ԇyc' ?I$I$eDR S'1c2y}( 2zHm]yρrIlr I4CZ`N"IE$ œǘ;maCwrdӔG! *I$I$h27p!3Wng:8f9l4v7SoWILs.Itf~ GWˀI{$eX8 QD阷žTːo{>I$I$iZG9hC}%7R>I%emoon$IK$YwzEas$eVRBr3$I$IY&B\LXʜ599'vIcq[$I&g $I |uZ+hl"JRmB,`4bBgI$I$)4j нd13W\@uDœNvIc p c[$I&,$I= jW A$eVet(1{t_އfcst^HPJ$I$KZ"h;%RS>A$ %e[ ZI ρ$Ik^X6GRViBQz\Z:܃wq >BЕ$I$I( $UhF9<|ɣyAҘr\]$I$IRUH*6 3{]H8BQ!(im5P "IY$IAmOsCIʮZ&,3i ;<Sh(R$I$Ig4j st/Ysӽ<k?] )$Ia9p$Ip0)h̪Geѷ}c߶ٹ$%I$IJR\:ѳt-3W]HC$#CiPR_n6E$I/wI;^o\2FRviJi$| _EG>CO=} _R$I$IgjL{*f)s) R>:OR=$I9p$I> |% 52+Om:Ky3s͡ez.S$I$iJba֪ >JHqX݋ҳ|zHet4MCJʶԆ"I]$ixjC #)422D.g[[r}r])I$IV\89t/YMuN22Dyd(t ˁ;$It8p$Ix:Eas$eUǔN8i*. :-GS2txB|W*I$ItJP-Ak$b91wRy^Ԯǁ[$It9%$Ib#W7}A$eV\)W*L޹-bc9F ʅ$I$Iw TJPg,>I ()v|6E$I!8p$I؞ ,%hJGr9zOgb; B=$I$IR!`Ƃ~aԷLR&MЅ|8pOI$]$IOxuIY& Aruu[}ٷmxc&_Е$I$IO\vҳl-=gJ*a#$e׀ϟzI$is.IMۀք͑UIJz._z:o}|aR|JI$IKb]΁$ $e߃瀏Á[$Itp.I0ajx#03h̪D0m}Kf?(kĖ$I$I镦P-BzW,w:&_WOet4MC'JʶgOOn$IYƁ$Igwo 83tڥiJet\>ϬU޷7-kQE$I$INjv.=ϣ{LR2eIc*;-$I:K9p$Ir׷7ׄ͑UIS}b߶Ol!XI$IYRJ tO{yXϴTEJ#$emԮ]I$ݜH$ov}YIW f{}ؿ~>œ!ʅ$I$I?*Ehh3Y!Mڰ=MC'JʶǨ ? E$IwI$ǀ?W-@{"IٔTKD] W>g}ٷm} u @:T$I$,.нd=ҳl-uT#IPR> | x4l$I$IQ< :mg7͑UiP"Wcy6wo}!_R$I$W .CۜnzOϲLR<2:OR}<K$I/wI$7 wn0uMKd$I$I]CuZ;[^z=Χ}"jy.Pc"$I#q.I+hoo=GR"ID۬ѷ}ρ'<uBWJ$I$yi #[EϲuX\>Oet4MC'Jʶ'n$I8]$I~4-IiR)t/[C{B.csDBWJ$I$9Rm޵pk^ISG+ymؾ9l$I$INOndIY&1! 
i| o_Е$I$Ic'Bfvг|zerrP$u%I$IGQ=ӻl-]KVSJ8Jġ$e1cm[$I4A9p$IRh ٠52+Mb#C u>,sރ<>^ܹFׇ$I$IY\J v1 ^z"呡yo$I]$IgoQ 2h̊j \t-mspy~&N0LC3+%I$IDPIӛ8g=]cƔGCIʾon$IK$Rx=`Q"IU-i=طm3=@i[j$I$I3)F]3} r:*4 ]()۞>Bm|I$_9p$IhW_CIʦ4M^Yt-9̳rPh ])I$IR30s:K㤩TKTGˡ$ep#y-$Iǁ$IfzNm9*c#Cr.C9 _x4M$I$$e(B۬fk4c&qLyd(tw-C$Iǁ$Iƃ[]s& 9*+Zڻ:{:RqDЕ$I$i|-t$Ip.I}M #)"Df1iz}طm {H$4B$I$I!)k lk[BS)~@_S-$Iҏ́$IIjIE)MGt,gJ<9@}SHI$IBe_]+3}r[KE:OR|G-$I]$IYGHʪ$=Nx0;hJFr9f/`rlÛxIF%t$I$I *5u:-ji4IBIʶ__v-$Ip.Il'7 Iʤ4I(+Z}t.\ʞ{(4@I$IZTJ0}`6s\+iFR2:PR9jGz$I Ɂ$IT!zas$eURRQ<٫.f~> OZ ]$I3, A=%_ΌEā[$IK$I5 p;gIʬ l!I$K$I #)JRu*W_F۬>06FiQ$I$"Hc(BkG /%:c& %i|[nt$IÁ$I[oIʬjH5n͝Ͼ7#RЅ$I$OA(4 =g-m擯oZ%Myiπ[$IwI$釷#C? pj*Kӄ0|yK2c.cϖ8S PR$Iq ($UZ4k/1͓EKk/n$I $IK7wo Z#)8&DҵLk#-=wiRI$I2(Cv%3iZ'i;lt&|8p~'I$$%I&)+.Z#)JRiJs^NۜvM&Cï$I$I Mtp=cJlrTKE|-i 9p'p9p$I^ $)\)=}wѷG6T%t$I$Ia!2 6sBC#r4(tz&`wI$i|s.I$6oޅޖtEIBet\@g2&wfem?71:I$IA\Joet/u qL8o?(I_Bbg[$ILpp#I$^['o>lJ*I\e3w.ʮ'hh\$I Bq&wtXi3l1%joJ[$Ip.I$~;> ༐A+!ENfgٽ[NBC+á$I$);" 4W\A5LMZ7’ԃ^"I$ewI$i@mcΠE2(4Z%:s]L>@]SJI$INʩ3W50muT%0$l"I$ewI$ixpCIY&1B˘5%+qmsuk$I' "t̛IcZU*őAI |x8t$Iu%I3ශS_FRf% IBCdzg~>;6 E'Aǡ$I$ACiZ;Y|Ut/YMs "Z ]()n  M$I&$Iҙ5| }Q"IU- hi+h̞CyZCJ$I+ B]#,RzWcےSoH$IwI$)g~HR6)QYvt3c2vm?AБ$I${$1,]L3m|Z+%Kk#"I$MH%Iv6͑MiS-k` &ϘEy78u<~'I$I % ahAW1}`)SIjq$t4QA6E$IK$IU]rEw-YIBBIӘuzgqבֿPQ$I3+4OcWӽt5-3t/qCJ$I&(зn=sϻI3fo$.HS(KS{n$ItwI$l-$I~H~-I$? 
m_v]N$B\ W2w.|g7ЋC7A ])I$I:ۤ G׼h2˥yVO-$I^"$Iu$p%[A+!hƬ6w!?z77.$I$-#P .٫6>\R4 ')۞|8E$Iҏȁ$I4>%hlJSr(cҌ̿X}Ol#P:R$IJ\ۯXNߺ+=fj\ ')Fn$I29p$Ic;7_~6lJ\"_mZһ)u=HrЕ$I3%R] \|-SBTK{><H$I: K$IR~"k-YI\Z\q͌/QD+%I$Ic&(4Mmd{j&O%Wʡ$eFC=-$IN#$IR6Ab5ԆsCIʮR(ijs\r'{u %I$I[:Bfw)]Ie8')r?K$I$IR>M HR6)qD1iL]:ˎ᧟" $I$I[Rf,Xȼ fj $qL\)ΓmUρOiI$IcŁ$I})+BIʮ4I2.dR,?w}Rh\.t$I$J`6_tSѩ'{I'πmpI$Ić$I4q_87hJ*J5ϙm[y-*5.$I$*Ehh)k|&M!W#T1~n-$I$Is ΐA*%橝{سݺ$B}FI$I!k;vtk V.i|E$I]$InGIYf7ҽb-ﺉw$W\>t$I$黒*t1k;BCi8lt&5Ϯ*[$I$]$Iت{/ Z$)$&Mb Y䟜šga]71tu@.t$I$M` T1eƂ5"SIcVC`I$I9p$I/ 6o^X2HRv%*Q2[GaצoRM(4.$IZ .YnuB8VyI;¦H$I:8p$I^ ]5hJI(G/sm!u%I$) D9]q.^IKg78&j5|<;l$IwI$I_S{$o Z#)4!&&waٵogZkۻ|]mh!I$I:*La6wFҴ ɒ4>% P "I$,]$IS |=E2+MbRPH2[=6HBWJ$IR6TK4{ѹ[r$j4Iw'GDI$Ig)$I~WS-YI\(u Vms`{I ](I$IW\|} /ay4\$:n4֞ E$IY΁$I .o Z$)ҔZ!hK^Cx+} \>p$I$#I i 3.akC4IHy))`͑$I48p$IR$.6rS"I& ) B=S{rk;gƛ8yNBUr<-],~Yv>5NA )I$Ig^@B~_t-Sz}$xn$I4EinΈ7~8t$ID ,WnqQ.iB\gƛ~Ery ]'I$Ig@ I -pSo(Gġ$eר=s0E$)!ttFxQQ$I6<WA$ewuMXr.} yٽvʣ%~!I$)*75пrf@}s+|$N u.iL=6pp{{$eXDQ)[w] eצ8jJ](I$IOC+ΣhIPiB;l4Q9͑$IE%I$8 /HRv)Ihf5o{ovE%I$eHOHΚEbڬ u$i oO=@9p$I s.I$L(;>py"I +>w!Sz8F$Iq)q/Q'K: (0E$I]$Iҙ4 l^\EQA$eVmQ[ždCwQ-r %I$K(43k9Z' I]X  8E$I]$IR'~eIA$eTJDS1Yr7 ;#(t$I$r,cWEP4u.i 7_:$I($IBI3SSI]ZrkʑOƛ<<$Ié'Sf1 DDINR?v9$I&*$IB+|cJiZ阷iطu3;7} Q.p$I -Mu}^Aꛈ9$!% ')6lH$IK$I:[ w?š1& D9 ^},g;x~&Rɡ$I3*M P똻4NB+W%x}K`I$I:Ł$I VfK:R$\),t/Y_gHĿy$I$4\>G{/|SfI'PIH=Y A$IK$I:A/ȇQiJ tf~>{oaȡD$INN.z /EA$*O5$Ip.I$lVo6V]$_h`U]އṇCidG&$I^f٫6oh&ˑvIc) ͑$Iρ$I` = 2JR6iBE7ҷ ˮMqe"$I$釔B Zy\M6\41P{Z$Itvs.I$iH}1]é.%M!iɒ+@x &c$II!i˼ byD״^l4R| -$IwI$IM8 4G@}&IYujliO=Ȏ{艣$%I$Rr91oݕt/9\@E%2mශ'NwI$I7K$I뀫]t\]=KϧsR{>\ͤYDѩK;10 j+I$IR8p$Iu)/~|#@k(IE9&a~}Cjm]$I_ B !$M CW$h$I$1$I& p!A`%IREQ Ρm>|7=I8!r.I$ i \I>bfBC#\x15 < p'+$Iy%I$M41ppj_BFIʪإDWҵ`)HRryzc~NmI, I$J#[]k.:K0 <5p$I$K$Iҿw+MzE2Eҽx5;w=AT$M!$I4F:0«i4*!π9$Itp.I$I[NM%4y+Gv?{n穖9t$IN4R(43iz/\M{ߒ%iL>>bI$I:8p$I>\LZ$)j#y:xq%K$I/OZۯ y&Ocy2 C}'Mon$IwI$I^KB)d KsW_JϢ|rq4Nr%I)M G44еx5]ICd4u.iP%$I48p$IWnf*Ic$MS[YtYS).K$IÒkjm/3fIm.Ic' 36G$I$I$1=hLKӔIݳYӏo0rr"$I@ 
:Zg0wt-YQԋoQK$I~H%I$Gsx;1j`J"Iuj|ӽ\:-aw)'$rCwI$&:_e2Ͻ٫/PHĐ:n4N[H$IҸ]$I^+7Iʬ$[%KV[86*!\BI$)4=娡 ӿjv&qm.Ic>|&p$I$k%I$"5ශ3D4Lic٫~-߷j8t$Iē@ Ftf2u))vIc+ #I$IwI$I:}F=z+E+IHiofl6#ǎP-Wr˕$IRij{@K['s\Nϲ^lt&|ы[$I$)3K$IiUE2+I^u!]VoQ>IRMJ$Ic$!_43o+kj!VKkwn$Iq.I$IcNR_A+V71н<{ /x0Q.p$ItI:5vv I6NR֕]_ "I$I]$Iǁ/ #)҄$Nhf叽v=λo}T%ȡ$IƯ44&O«޿ RK:o n$ILs.I$Ig NcΠE2+Ib:-}>|7=x'#Ǐct$IKbabh@T&9t$IRxiIuLCkj VCIʾ}׀/n$I?:A:#(I$In>>p2HRv}wԽd ٳv>cՄ\6J$iJ!!_iTfPH\)}G{ "I$IwI$I:{< 7@s"IWJD<^Eϒ5lf?8 $It$1D4LBe_p mT?n$I$]$I>3nS4jD6V}x2:B(:R$IY&W}S3Sff`õYHWKs;=Q@E$I=4}&7~8t$I ")u imصvZ+.jJ$ICBRBC֎.ο눢% o <E$IzI!ttFx]$InSm05d쪍"f Vs7طu #ׅ.$IxWӳ<歿ITKE:OR'l$I$?]$IƇk7KA*R%_WǢW׳}M~)#'!(I$%Jڟ >]ˤh8IYW> |:l$I$ב$I4|{=OHʬ$IH4Mg/l̉}S%(R$Ig4 MLżuW2} $Rt.i>$I$ip.I$IO]3=A$eVRT+t,mB>|=^|( ])IMBR|CI]sV_L>_vIgA[H$I^"$I4~= w5E2Z\Wнh5;O|#p$I$G]$Iƿ;N~9`Z$)Jq^&foR$ՆL$IZ'>_ǔ>rJq$tlKmGE$I29p$Ij^~-FRf%q$2y,9-GeDQ.t$IΔ4 uMLu{jHS*á$e.#[$I$IwI$IʖA׀O \LZ$)ѽt-a0|rJBWJ$i̤P@>b\WPhhR$n$I$F%I$)?EA*R!Zze=7smO%M!_Q$I[\(mt-^JqJq$tl |9p$I$i 8p$Ilʩ; ,#)$<2D}dV-.o؞]ˇ$I˕ĵWcK ̣[LR<2:OR=|PI$I؉Gjb~C'H$Iu^ ") M۶[ ʅ$IK&u&awz"yop-$IR0:A:#.I$I5oA$eV4 QĬU1}Jvm&=уTPЕ$IR!_]],;y_I}$ʣ1-IcH$I$I4l^>hlJS#' ,'b7̡gx8$IJ)S`eҌYTÔGIʺ H$I0$I4q}k?,#)"8<|i|9};ǞMix\rН$I$ MLG+>9qJydORh( ])I4q)TKPטcJO/W_s/&/P')7vn$I$]$I/ \:HRfUF ot-:e! %I&jrL>e*Q".QҘ9 Ͱ)$IwI$Ikq 鴋 I( k~+ֳ}4œ'r+I1W!qd:-``ë:Jq jo }[(iL'n$I$E4MC7Hg?:A$Io&XERrx!vm6ǞIyLru$Iٓ&u͚uWнh5IS-Γ}_8E$I7!ttFxK$I6p{]E26Yer;{TP׀C%IN*%(_;JsgiյEZҮze[rCx!;15!!   `f:.Wre[,Y{,Mh.v>sYxw~wvҹuzU Iճg-$IAʁ$I$xG;ƠE2*pQ^DXwxh-G"M+C,I$=;"Dlnat|&RHaj: \ |n$I$IGwI$IR\3px prIEIB1686~5O=G#|mPI'.AR 8k>ӖnJ?2lw.zn|)t$I$)K$I//MQD\*w`[Yҷ{xu^(#%j!Wc%I2j/~T-ͳguYL\ཱྀ9lT5?6E$Ie%I$I^%_BIʪR Ǥ9Ki6knf}wwJGf/I4)zZ{ӾD:Nm}# \|XE$I4 8p$I$Ko^ 8Z$)" ~r55rmsk~ڵH!\|6ylj~ |rq(. "I$IFK$IBnas$eQEqLԎżI؃;DK%I~S\#ƍbL;,&LM+voK#z\ā[$I$IÌwI$IRH?#)T .)3x;t׍(ބ@:V$ k11ę3t&?R?oq%U*;-p$I$ir.I$I m+7πwM!$eW!\'='Ķ{bߖ$m܋IR(A.M]]t,>=)O$%U^J-$Ia΁$I$imS-Yas$eR)e/aҬlv~x8hl"L];/~=;gǣz$I 0b&ΜϤ{>qK{YHI$I~ͷ$I$IQ/ A$eSQ.S.53v*6s+{6Zg.f”l[b׆(Fr8t$ISi#3X|N~I偒T;ˀ/H$I8p$I$ Uo^x! 
\dontshow{if (requireNamespace("optmatch", quietly = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
m.out4 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75,
                  data = lalonde,
                  method = "full",
                  estimand = "ATE",
                  caliper = c(.1, age = 2, educ = 1),
                  std.caliper = c(TRUE, FALSE, FALSE))
m.out4
summary(m.out4, un = TRUE)
\dontshow{\}) # examplesIf}

# Subclassification on a logistic PS with 10 subclasses after
# discarding controls outside common support of PS
s.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75,
                  data = lalonde,
                  method = "subclass",
                  distance = "glm",
                  discard = "control",
                  subclass = 10)
s.out1
summary(s.out1, un = TRUE)
}
\references{
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. \emph{Political Analysis}, 15(3), 199–236. \doi{10.1093/pan/mpl013}

Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. \emph{Journal of Statistical Software}, 42(8). \doi{10.18637/jss.v042.i08}
}
\seealso{
\code{\link[=summary.matchit]{summary.matchit()}} for balance assessment after matching, \code{\link[=plot.matchit]{plot.matchit()}} for plots of covariate balance and propensity score overlap after matching.
\itemize{ \item \code{vignette("MatchIt")} for an introduction to matching with \emph{MatchIt} \item \code{vignette("matching-methods")} for descriptions of the variety of matching methods and options available \item \code{vignette("assessing-balance")} for information on assessing the quality of a matching specification \item \code{vignette("estimating-effects")} for instructions on how to estimate treatment effects after matching \item \code{vignette("sampling-weights")} for a guide to using \emph{MatchIt} with sampling weights. } } \author{ Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart wrote the original package. Starting with version 4.0.0, Noah Greifer is the primary maintainer and developer. } MatchIt/man/method_genetic.Rd0000644000176200001440000003246214740562365015646 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/matchit2genetic.R \name{method_genetic} \alias{method_genetic} \title{Genetic Matching} \arguments{ \item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be used in creating the distance measure used in the matching. This formula will be supplied to the functions that estimate the distance measure and is used to determine the covariates whose balance is to be optimized.} \item{data}{a data frame containing the variables named in \code{formula}. If not found in \code{data}, the variables will be sought in the environment.} \item{method}{set here to \code{"genetic"}.} \item{distance}{the distance measure to be used. See \code{\link{distance}} for allowable options. When set to a method of estimating propensity scores or a numeric vector of distance values, the distance measure is included with the covariates in \code{formula} to be supplied to the generalized Mahalanobis distance matrix unless \code{mahvars} is specified. 
Otherwise, only the covariates in \code{formula} are supplied to the generalized Mahalanobis distance matrix to have their scaling factors chosen. \code{distance} \emph{cannot} be supplied as a distance matrix. Supplying any method of computing a distance matrix (e.g., \code{"mahalanobis"}) has the same effect as omitting the propensity score but does not affect how the distance between units is computed otherwise.}

\item{link}{when \code{distance} is specified as a method of estimating propensity scores, an additional argument controlling the link function used in estimating the distance measure. See \code{\link{distance}} for allowable options with each method.}

\item{distance.options}{a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to \code{distance}.}

\item{estimand}{a string containing the desired estimand. Allowable options include \code{"ATT"} and \code{"ATC"}. See Details.}

\item{exact}{for which variables exact matching should take place.}

\item{mahvars}{when a distance corresponds to a propensity score (e.g., for caliper matching or to discard units for common support), which covariates should be supplied to the generalized Mahalanobis distance matrix for matching. If unspecified, all variables in \code{formula} will be supplied to the distance matrix. Use \code{mahvars} to only supply a subset. Even if \code{mahvars} is specified, balance will be optimized on all covariates in \code{formula}. See Details.}

\item{antiexact}{for which variables anti-exact matching should take place. Anti-exact matching is processed using the \code{restrict} argument to \code{Matching::GenMatch()} and \code{Matching::Match()}.}

\item{discard}{a string containing a method for discarding units outside a region of common support.
Only allowed when \code{distance} corresponds to a propensity score.} \item{reestimate}{if \code{discard} is not \code{"none"}, whether to re-estimate the propensity score in the remaining sample prior to matching.} \item{s.weights}{the variable containing sampling weights to be incorporated into propensity score models and balance statistics. These are also supplied to \code{GenMatch()} for use in computing the balance t-test p-values in the process of matching.} \item{replace}{whether matching should be done with replacement.} \item{m.order}{the order that the matching takes place. Allowable options include \code{"largest"}, where matching takes place in descending order of distance measures; \code{"smallest"}, where matching takes place in ascending order of distance measures; \code{"random"}, where matching takes place in a random order; and \code{"data"} where matching takes place based on the order of units in the data. When \code{m.order = "random"}, results may differ across different runs of the same code unless a seed is set and specified with \code{\link[=set.seed]{set.seed()}}. The default of \code{NULL} corresponds to \code{"largest"} when a propensity score is estimated or supplied as a vector and \code{"data"} otherwise.} \item{caliper}{the width(s) of the caliper(s) used for caliper matching. See Details and Examples.} \item{std.caliper}{\code{logical}; when calipers are specified, whether they are in standard deviation units (\code{TRUE}) or raw units (\code{FALSE}).} \item{ratio}{how many control units should be matched to each treated unit for k:1 matching. Should be a single integer value.} \item{verbose}{\code{logical}; whether information about the matching process should be printed to the console. When \code{TRUE}, output from \code{GenMatch()} with \code{print.level = 2} will be displayed. Default is \code{FALSE} for no printing other than warnings.} \item{\dots}{additional arguments passed to \pkgfun{Matching}{GenMatch}. 
Potentially useful options include \code{pop.size}, \code{max.generations}, and \code{fit.func}. If \code{pop.size} is not specified, a warning from \emph{Matching} will be thrown reminding you to change it. Note that the \code{ties} and \code{CommonSupport} arguments are set to \code{FALSE} and cannot be changed. If \code{distance.tolerance} is not specified, it is set to 0, whereas the default in \emph{Matching} is 1e-5.} } \description{ In \code{\link[=matchit]{matchit()}}, setting \code{method = "genetic"} performs genetic matching. Genetic matching is a form of nearest neighbor matching where distances are computed as the generalized Mahalanobis distance, which is a generalization of the Mahalanobis distance with a scaling factor for each covariate that represents the importance of that covariate to the distance. A genetic algorithm is used to select the scaling factors. The scaling factors are chosen as those which maximize a criterion related to covariate balance, which can be chosen, but which by default is the smallest p-value in covariate balance tests among the covariates. This method relies on and is a wrapper for \pkgfun{Matching}{GenMatch} and \pkgfun{Matching}{Match}, which use \pkgfun{rgenoud}{genoud} to perform the optimization using the genetic algorithm. This page details the allowable arguments with \code{method = "genetic"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified. Below is how \code{matchit()} is used for genetic matching: \preformatted{ matchit(formula, data = NULL, method = "genetic", distance = "glm", link = "logit", distance.options = list(), estimand = "ATT", exact = NULL, mahvars = NULL, antiexact = NULL, discard = "none", reestimate = FALSE, s.weights = NULL, replace = FALSE, m.order = NULL, caliper = NULL, ratio = 1, verbose = FALSE, ...) 
} } \details{ In genetic matching, covariates play three roles: 1) as the variables on which balance is optimized, 2) as the variables in the generalized Mahalanobis distance between units, and 3) in estimating the propensity score. Variables supplied to \code{formula} are always used for role (1), as the variables on which balance is optimized. When \code{distance} corresponds to a propensity score, the covariates are also used to estimate the propensity score (unless it is supplied). When \code{mahvars} is specified, the named variables will form the covariates that go into the distance matrix. Otherwise, the variables in \code{formula} along with the propensity score will go into the distance matrix. This leads to three ways to use \code{distance} and \code{mahvars} to perform the matching: \enumerate{ \item{When \code{distance} corresponds to a propensity score and \code{mahvars} \emph{is not} specified, the covariates in \code{formula} along with the propensity score are used to form the generalized Mahalanobis distance matrix. This is the default and most typical use of \code{method = "genetic"} in \code{matchit()}. } \item{When \code{distance} corresponds to a propensity score and \code{mahvars} \emph{is} specified, the covariates in \code{mahvars} are used to form the generalized Mahalanobis distance matrix. The covariates in \code{formula} are used to estimate the propensity score and have their balance optimized by the genetic algorithm. The propensity score is not included in the generalized Mahalanobis distance matrix. } \item{When \code{distance} is a method of computing a distance matrix (e.g.,\code{"mahalanobis"}), no propensity score is estimated, and the covariates in \code{formula} are used to form the generalized Mahalanobis distance matrix. Which specific method is supplied has no bearing on how the distance matrix is computed; it simply serves as a signal to omit estimation of a propensity score. 
} } When a caliper is specified, any variables mentioned in \code{caliper}, possibly including the propensity score, will be added to the matching variables used to form the generalized Mahalanobis distance matrix. This is because \emph{Matching} doesn't allow for the separation of caliper variables and matching variables in genetic matching. \subsection{Estimand}{ The \code{estimand} argument controls whether control units are selected to be matched with treated units (\code{estimand = "ATT"}) or treated units are selected to be matched with control units (\code{estimand = "ATC"}). The "focal" group (e.g., the treated units for the ATT) is typically made to be the smaller treatment group, and a warning will be thrown if it is not set that way unless \code{replace = TRUE}. Setting \code{estimand = "ATC"} is equivalent to swapping all treated and control labels for the treatment variable. When \code{estimand = "ATC"}, the default \code{m.order} is \code{"smallest"}, and the \code{match.matrix} component of the output will have the names of the control units as the rownames and be filled with the names of the matched treated units (opposite to when \code{estimand = "ATT"}). Note that the argument supplied to \code{estimand} doesn't necessarily correspond to the estimand actually targeted; it is merely a switch to trigger which treatment group is considered "focal". Note that while \code{GenMatch()} and \code{Match()} support the ATE as an estimand, \code{matchit()} only supports the ATT and ATC for genetic matching. } \subsection{Reproducibility}{ Genetic matching involves a random component, so a seed must be set using \code{\link[=set.seed]{set.seed()}} to ensure reproducibility. When \code{cluster} is used for parallel processing, the seed must be compatible with parallel processing (e.g., by setting \code{kind = "L'Ecuyer-CMRG"}). } } \section{Outputs}{ All outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "genetic"}. 
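The three \code{distance}/\code{mahvars} configurations described in the Details section above can be sketched as follows. This is a minimal illustration, not part of the package documentation; it assumes the \code{lalonde} dataset shipped with \emph{MatchIt}, and \code{pop.size = 10} is far too small for real applications:

```r
library(MatchIt)
data("lalonde")

# 1) PS estimated; PS and the formula covariates enter the
#    generalized Mahalanobis distance matrix
m1 <- matchit(treat ~ age + educ + re74, data = lalonde,
              method = "genetic", distance = "glm",
              pop.size = 10)

# 2) PS estimated; only the mahvars covariates enter the
#    distance matrix (balance still optimized on formula)
m2 <- matchit(treat ~ age + educ + re74, data = lalonde,
              method = "genetic", distance = "glm",
              mahvars = ~ age + re74,
              pop.size = 10)

# 3) No PS; the formula covariates alone enter the
#    distance matrix
m3 <- matchit(treat ~ age + educ + re74, data = lalonde,
              method = "genetic", distance = "mahalanobis",
              pop.size = 10)
```

In each case, balance is assessed on the covariates in the formula; only the composition of the distance matrix changes.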
When \code{replace = TRUE}, the \code{subclass} component is omitted.

When \code{include.obj = TRUE} in the call to \code{matchit()}, the output of the call to \pkgfun{Matching}{GenMatch} will be included in the output.
}
\examples{
\dontshow{if (all(sapply(c("Matching", "rgenoud"), requireNamespace, quietly = TRUE))) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
data("lalonde")

# 1:1 genetic matching with PS as a covariate
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75,
                  data = lalonde,
                  method = "genetic",
                  pop.size = 10) #use much larger pop.size
m.out1
summary(m.out1)

# 2:1 genetic matching with replacement without PS
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75,
                  data = lalonde,
                  method = "genetic",
                  replace = TRUE,
                  ratio = 2,
                  distance = "mahalanobis",
                  pop.size = 10) #use much larger pop.size
m.out2
summary(m.out2, un = FALSE)

# 1:1 genetic matching on just age, educ, re74, and re75
# within calipers on PS and educ; other variables are
# used to estimate PS
m.out3 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75,
                  data = lalonde,
                  method = "genetic",
                  mahvars = ~ age + educ + re74 + re75,
                  caliper = c(.05, educ = 2),
                  std.caliper = c(TRUE, FALSE),
                  pop.size = 10) #use much larger pop.size
m.out3
summary(m.out3, un = FALSE)
\dontshow{\}) # examplesIf}
}
\references{
In a manuscript, be sure to cite the following papers if using \code{matchit()} with \code{method = "genetic"}:

Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. \emph{Review of Economics and Statistics}, 95(3), 932–945. \doi{10.1162/REST_a_00318}

Sekhon, J. S. (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R. \emph{Journal of Statistical Software}, 42(1), 1–52.
\doi{10.18637/jss.v042.i07} For example, a sentence might read: \emph{Genetic matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R, which calls functions from the Matching package (Diamond & Sekhon, 2013; Sekhon, 2011).} } \seealso{ \code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}. \pkgfun{Matching}{GenMatch} and \pkgfun{Matching}{Match}, which do the work. } MatchIt/man/plot.matchit.Rd0000644000176200001440000002022014740562365015263 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot.matchit.R \name{plot.matchit} \alias{plot.matchit} \alias{plot.matchit.subclass} \title{Generate Balance Plots after Matching and Subclassification} \usage{ \method{plot}{matchit}(x, type = "qq", interactive = TRUE, which.xs = NULL, data = NULL, ...) \method{plot}{matchit.subclass}(x, type = "qq", interactive = TRUE, which.xs = NULL, subclass, ...) } \arguments{ \item{x}{a \code{matchit} object; the output of a call to \code{\link[=matchit]{matchit()}}.} \item{type}{the type of plot to display. Options include \code{"qq"}, \code{"ecdf"}, \code{"density"}, \code{"jitter"}, and \code{"histogram"}. See Details. Default is \code{"qq"}. Abbreviations allowed.} \item{interactive}{\code{logical}; whether the graphs should be displayed in an interactive way. Only applies for \code{type = "qq"}, \code{"ecdf"}, \code{"density"}, and \code{"jitter"}. See Details.} \item{which.xs}{with \code{type = "qq"}, \code{"ecdf"}, or \code{"density"}, for which covariate(s) plots should be displayed. Factor variables should be named by the original variable name rather than the names of individual dummy variables created after expansion with \code{model.matrix}. 
Can be supplied as a character vector or a one-sided formula.} \item{data}{an optional data frame containing variables named in \code{which.xs} but not present in the \code{matchit} object.} \item{\dots}{arguments passed to \code{\link[=plot]{plot()}} to control the appearance of the plot. Not all options are accepted.} \item{subclass}{with subclassification and \code{type = "qq"}, \code{"ecdf"}, or \code{"density"}, whether to display balance for individual subclasses, and, if so, for which ones. Can be \code{TRUE} (display plots for all subclasses), \code{FALSE} (display plots only in aggregate), or the indices (e.g., \code{1:6}) of the specific subclasses for which to display balance. When unspecified, if \code{interactive = TRUE}, you will be asked for which subclasses plots are desired, and otherwise, plots will be displayed only in aggregate.} } \description{ Generates plots displaying distributional balance and overlap on covariates and propensity scores before and after matching and subclassification. For displaying balance solely on covariate standardized mean differences, see \code{\link[=plot.summary.matchit]{plot.summary.matchit()}}. The plots here can be used to assess to what degree covariate and propensity score distributions are balanced and how weighting and discarding affect the distribution of propensity scores. } \details{ \code{plot.matchit()} makes one of five different plots depending on the argument supplied to \code{type}. The first three, \code{"qq"}, \code{"ecdf"}, and \code{"density"}, assess balance on the covariates. When \code{interactive = TRUE}, plots for three variables will be displayed at a time, and the prompt in the console allows you to move on to the next set of variables. When \code{interactive = FALSE}, multiple pages are plotted at the same time, but only the last few variables will be visible in the displayed plot. 
To see only a few specific variables at a time, use the \code{which.xs} argument to display plots for just those variables. If fewer than three variables are available (after expanding factors into their dummies), \code{interactive} is ignored.

With \code{type = "qq"}, empirical quantile-quantile (eQQ) plots are created for each covariate before and after matching. The plots involve interpolating points in the smaller group based on the weighted quantiles of the other group. When points are approximately on the 45-degree line, the distributions in the treatment and control groups are approximately equal. Major deviations indicate departures from distributional balance. With variables with fewer than 5 unique values, points are jittered to more easily visualize counts.

With \code{type = "ecdf"}, empirical cumulative distribution function (eCDF) plots are created for each covariate before and after matching. Two eCDF lines are produced in each plot: a gray one for control units and a black one for treated units. Each point on the lines corresponds to the proportion of units (or proportionate share of weights) less than or equal to the corresponding covariate value (on the x-axis). Deviations between the lines on the same plot indicate distributional imbalance between the treatment groups for the covariate. The eCDF and eQQ statistics in \code{\link[=summary.matchit]{summary.matchit()}} correspond to these plots: the eCDF max (also known as the Kolmogorov-Smirnov statistic) and mean are the largest and average vertical distance between the lines, and the eQQ max and mean are the largest and average horizontal distance between the lines.

With \code{type = "density"}, density plots are created for each covariate before and after matching. Two densities are produced in each plot: a gray one for control units and a black one for treated units.
The x-axis corresponds to the value of the covariate and the y-axis corresponds to the density or probability of that covariate value in the corresponding group. For binary covariates, bar plots are produced, having the same interpretation. Deviations between the black and gray lines represent imbalances in the covariate distribution; when the lines coincide (i.e., when only the black line is visible), the distributions are identical. The last two plots, \code{"jitter"} and \code{"histogram"}, visualize the distance (i.e., propensity score) distributions. These plots are more for heuristic purposes since the purpose of matching is to achieve balance on the covariates themselves, not the propensity score. With \code{type = "jitter"}, a jitter plot is displayed for distance values before and after matching. This method requires a distance variable (e.g., a propensity score) to have been estimated or supplied in the call to \code{matchit()}. The plot displays individual values for matched and unmatched treatment and control units arranged horizontally by their propensity scores. Points are jittered so counts are easier to see. The size of the points increases when they receive higher weights. When \code{interactive = TRUE}, you can click on points in the graph to identify their rownames and indices to further probe extreme values, for example. With subclassification, vertical lines representing the subclass boundaries are overlaid on the plots. With \code{type = "histogram"}, a histogram of distance values is displayed for the treatment and control groups before and after matching. This method requires a distance variable (e.g., a propensity score) to have been estimated or supplied in the call to \code{matchit()}. With subclassification, vertical lines representing the subclass boundaries are overlaid on the plots. With all methods, sampling weights are incorporated into the weights if present.
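The eCDF max and mean statistics described above can also be computed by hand for a single covariate. Below is a minimal unweighted sketch using base R; note that \code{\link[=summary.matchit]{summary.matchit()}} additionally incorporates matching and sampling weights, so the values it reports after matching will generally differ.

\preformatted{
# Unweighted sketch of the eCDF max (Kolmogorov-Smirnov statistic)
# and eCDF mean for a single covariate before matching
library(MatchIt)
data("lalonde")
x_t <- lalonde$age[lalonde$treat == 1]
x_c <- lalonde$age[lalonde$treat == 0]
vals <- sort(unique(c(x_t, x_c)))
gaps <- abs(ecdf(x_t)(vals) - ecdf(x_c)(vals))
max(gaps)  # eCDF max
mean(gaps) # eCDF mean
}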
} \note{ Sometimes, bugs in the plotting functions can cause strange layout or size issues. Running \code{\link[=frame]{frame()}} or \code{\link[=dev.off]{dev.off()}} can be used to reset the plotting pane (note the latter will delete any plots in the plot history). } \examples{ data("lalonde") m.out <- matchit(treat ~ age + educ + married + race + re74, data = lalonde, method = "nearest") plot(m.out, type = "qq", interactive = FALSE, which.xs = ~age + educ + re74) plot(m.out, type = "histogram") s.out <- matchit(treat ~ age + educ + married + race + nodegree + re74 + re75, data = lalonde, method = "subclass") plot(s.out, type = "density", interactive = FALSE, which.xs = ~age + educ + re74, subclass = 3) plot(s.out, type = "jitter", interactive = FALSE) } \seealso{ \code{\link[=summary.matchit]{summary.matchit()}} for numerical summaries of balance, including those that rely on the eQQ and eCDF plots. \code{\link[=plot.summary.matchit]{plot.summary.matchit()}} for plotting standardized mean differences in a Love plot. \pkgfun{cobalt}{bal.plot} for displaying distributional balance in several other ways that are more easily customizable and produce \emph{ggplot2} objects. \emph{cobalt} functions natively support \code{matchit} objects. } MatchIt/man/mahalanobis_dist.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/dist_functions.R \name{mahalanobis_dist} \alias{mahalanobis_dist} \alias{euclidean_dist} \alias{scaled_euclidean_dist} \alias{robust_mahalanobis_dist} \title{Compute a Distance Matrix} \usage{ mahalanobis_dist( formula = NULL, data = NULL, s.weights = NULL, var = NULL, discarded = NULL, ... ) scaled_euclidean_dist( formula = NULL, data = NULL, s.weights = NULL, var = NULL, discarded = NULL, ... ) robust_mahalanobis_dist( formula = NULL, data = NULL, s.weights = NULL, discarded = NULL, ... ) euclidean_dist(formula = NULL, data = NULL, ...)
} \arguments{ \item{formula}{a formula with the treatment (i.e., splitting variable) on the left side and the covariates used to compute the distance matrix on the right side. If there is no left-hand-side variable, the distances will be computed between all pairs of units. If \code{NULL}, all the variables in \code{data} will be used as covariates.} \item{data}{a data frame containing the variables named in \code{formula}. If \code{formula} is \code{NULL}, all variables in \code{data} will be used as covariates.} \item{s.weights}{when \code{var = NULL}, an optional vector of sampling weights used to compute the variances used in the Mahalanobis, scaled Euclidean, and robust Mahalanobis distances.} \item{var}{for \code{mahalanobis_dist()}, a covariance matrix used to scale the covariates. For \code{scaled_euclidean_dist()}, either a covariance matrix (from which only the diagonal elements will be used) or a vector of variances used to scale the covariates. If \code{NULL}, these values will be calculated using formulas described in Details.} \item{discarded}{a \code{logical} vector denoting which units are to be discarded or not. This is used only when \code{var = NULL}. The scaling factors will be computed only using the non-discarded units, but the distance matrix will be computed for all units (discarded and non-discarded).} \item{\dots}{ignored. Included to make cycling through these functions easier without having to change the arguments supplied.} } \value{ A numeric distance matrix. When \code{formula} has a left-hand-side (treatment) variable, the matrix will have one row for each treated unit and one column for each control unit. Otherwise, the matrix will have one row and one column for each unit. 
} \description{ The functions compute a distance matrix, either for a single dataset (i.e., the distances between all pairs of units) or for two groups defined by a splitting variable (i.e., the distances between all units in one group and all units in the other). These distance matrices include the Mahalanobis distance, Euclidean distance, scaled Euclidean distance, and robust (rank-based) Mahalanobis distance. These functions can be used as inputs to the \code{distance} argument to \code{\link[=matchit]{matchit()}} and are used to compute the corresponding distance matrices within \code{matchit()} when named. } \details{ The \strong{Euclidean distance} (computed using \code{euclidean_dist()}) is the raw distance between units, computed as \deqn{d_{ij} = \sqrt{(x_i - x_j)(x_i - x_j)'}} where \eqn{x_i} and \eqn{x_j} are vectors of covariates for units \eqn{i} and \eqn{j}, respectively. The Euclidean distance is sensitive to the scales of the variables and their redundancy (i.e., correlation). It should probably not be used for matching unless all of the variables have been previously scaled appropriately or are already on the same scale. It forms the basis of the other distance measures. The \strong{scaled Euclidean distance} (computed using \code{scaled_euclidean_dist()}) is the Euclidean distance computed on the scaled covariates. Typically the covariates are scaled by dividing by their standard deviations, but any scaling factor can be supplied using the \code{var} argument. This leads to a distance measure computed as \deqn{d_{ij} = \sqrt{(x_i - x_j)S_d^{-1}(x_i - x_j)'}} where \eqn{S_d} is a diagonal matrix with the squared scaling factors on the diagonal. Although this measure is not sensitive to the scales of the variables (because they are all placed on the same scale), it is still sensitive to redundancy among the variables. 
For example, if 5 variables measure approximately the same construct (i.e., are highly correlated) and 1 variable measures another construct, the first construct will have 5 times as much influence on the distance between units as the second construct. The Mahalanobis distance attempts to address this issue. The \strong{Mahalanobis distance} (computed using \code{mahalanobis_dist()}) is computed as \deqn{d_{ij} = \sqrt{(x_i - x_j)S^{-1}(x_i - x_j)'}} where \eqn{S} is a scaling matrix, typically the covariance matrix of the covariates. It is essentially equivalent to the Euclidean distance computed on the scaled principal components of the covariates. This is the most popular distance matrix for matching because it is not sensitive to the scale of the covariates and accounts for redundancy between them. The scaling matrix can also be supplied using the \code{var} argument. The Mahalanobis distance can be sensitive to outliers and long-tailed or otherwise non-normally distributed covariates and may not perform well with categorical variables due to prioritizing rare categories over common ones. One solution is the rank-based \strong{robust Mahalanobis distance} (computed using \code{robust_mahalanobis_dist()}), which is computed by first replacing the covariates with their ranks (using average ranks for ties) and rescaling each ranked covariate by a constant scaling factor before computing the usual Mahalanobis distance on the rescaled ranks. The Mahalanobis distance and its robust variant are computed internally by transforming the covariates in such a way that the Euclidean distance computed on the scaled covariates is equal to the requested distance. For the Mahalanobis distance, this involves replacing the covariates vector \eqn{x_i} with \eqn{x_iS^{-.5}}, where \eqn{S^{-.5}} is the Cholesky decomposition of the (generalized) inverse of the covariance matrix \eqn{S}. 
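As an illustration of the formula above, one entry of the all-pairs Mahalanobis distance matrix can be checked against \code{\link[stats:mahalanobis]{stats::mahalanobis()}}. This is a sketch that assumes the internal scaling matrix coincides with the sample covariance of the full data, as when no splitting variable is supplied and \code{var = NULL}:

\preformatted{
# Sketch: one entry of the all-pairs Mahalanobis distance matrix,
# checked against stats::mahalanobis() (which returns a squared distance)
library(MatchIt)
data("lalonde")
X <- as.matrix(lalonde[c("age", "educ", "re74")])
D <- mahalanobis_dist(data = as.data.frame(X))
# Mahalanobis distance of unit 1 from unit 2, computed directly:
d12 <- sqrt(mahalanobis(X[1, , drop = FALSE], center = X[2, ], cov = cov(X)))
# D[1, 2] and d12 should be approximately equal
}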
When a left-hand-side splitting variable is present in \code{formula} and \code{var = NULL} (i.e., so that the scaling matrix is computed internally), the covariance matrix used is the "pooled" covariance matrix, which essentially is a weighted average of the covariance matrices computed separately within each level of the splitting variable to capture within-group variation and reduce sensitivity to covariate imbalance. This is also true of the scaling factors used in the scaled Euclidean distance. } \examples{ data("lalonde") # Computing the scaled Euclidean distance between all units: d <- scaled_euclidean_dist(~ age + educ + race + married, data = lalonde) # Another interface using the data argument: dat <- subset(lalonde, select = c(age, educ, race, married)) d <- scaled_euclidean_dist(data = dat) # Computing the Mahalanobis distance between treated and # control units: d <- mahalanobis_dist(treat ~ age + educ + race + married, data = lalonde) # Supplying a covariance matrix or vector of variances (note: # a bit more complicated with factor variables) dat <- subset(lalonde, select = c(age, educ, married, re74)) vars <- sapply(dat, var) d <- scaled_euclidean_dist(data = dat, var = vars) # Same result: d <- scaled_euclidean_dist(data = dat, var = diag(vars)) # Discard units: discard <- sample(c(TRUE, FALSE), nrow(lalonde), replace = TRUE, prob = c(.2, .8)) d <- mahalanobis_dist(treat ~ age + educ + race + married, data = lalonde, discarded = discard) dim(d) #all units present in distance matrix table(lalonde$treat) } \references{ Rosenbaum, P. R. (2010). \emph{Design of observational studies}. Springer. Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. \emph{The American Statistician}, 39(1), 33–38. \doi{10.2307/2683903} Rubin, D. B. (1980). Bias Reduction Using Mahalanobis-Metric Matching. \emph{Biometrics}, 36(2), 293–298. 
\doi{10.2307/2529981} } \seealso{ \code{\link{distance}}, \code{\link[=matchit]{matchit()}}, \code{\link[=dist]{dist()}} (which is used internally to compute some Euclidean distances) \pkgfun{optmatch}{match_on}, which provides similar functionality but with fewer options and a focus on efficient storage of the output. } \author{ Noah Greifer } MatchIt/man/method_cem.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/matchit2cem.R \name{method_cem} \alias{method_cem} \title{Coarsened Exact Matching} \arguments{ \item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be used in creating the subclasses defined by a full cross of the coarsened covariate levels.} \item{data}{a data frame containing the variables named in \code{formula}. If not found in \code{data}, the variables will be sought in the environment.} \item{method}{set here to \code{"cem"}.} \item{estimand}{a string containing the desired estimand. Allowable options include \code{"ATT"}, \code{"ATC"}, and \code{"ATE"}. The estimand controls how the weights are computed; see the Computing Weights section at \code{\link[=matchit]{matchit()}} for details. When \code{k2k = TRUE} (see below), \code{estimand} also controls how the matching is done.} \item{s.weights}{the variable containing sampling weights to be incorporated into balance statistics or the scaling factors when \code{k2k = TRUE} and certain methods are used.} \item{verbose}{\code{logical}; whether information about the matching process should be printed to the console.} \item{\dots}{additional arguments to control the matching process. \describe{ \item{\code{grouping}}{ a named list with an (optional) entry for each categorical variable to be matched on.
Each element should itself be a list, and each entry of the sublist should be a vector containing levels of the variable that should be combined to form a single level. Any categorical variables not included in \code{grouping} will remain as they are in the data, which means exact matching, with no coarsening, will take place on these variables. See Details. } \item{\code{cutpoints}}{ a named list with an (optional) entry for each numeric variable to be matched on. Each element describes a way of coarsening the corresponding variable. They can be a vector of cutpoints that demarcate bins, a single number giving the number of bins, or a string corresponding to a method of computing the number of bins. Allowable strings include \code{"sturges"}, \code{"scott"}, and \code{"fd"}, which use the functions \code{\link[grDevices:nclass]{grDevices::nclass.Sturges()}}, \code{\link[grDevices:nclass]{grDevices::nclass.scott()}}, and \code{\link[grDevices:nclass]{grDevices::nclass.FD()}}, respectively. The default is \code{"sturges"} for variables that are not listed or if no argument is supplied. Can also be a single value to be applied to all numeric variables. See Details. } \item{\code{k2k}}{ \code{logical}; whether 1:1 matching should occur within the matched strata. If \code{TRUE} nearest neighbor matching without replacement will take place within each stratum, and any unmatched units will be dropped (e.g., if there are more treated than control units in the stratum, the treated units without a match will be dropped). The \code{k2k.method} argument controls how the distance between units is calculated. } \item{\code{k2k.method}}{\code{character}; how the distance between units should be calculated if \code{k2k = TRUE}. 
Allowable arguments include \code{NULL} (for random matching), any argument to \code{\link[=distance]{distance()}} for computing a distance matrix from covariates (e.g., \code{"mahalanobis"}), or any allowable argument to \code{method} in \code{\link[=dist]{dist()}}. Matching will take place on the original (non-coarsened) variables. The default is \code{"mahalanobis"}. } \item{\code{mpower}}{if \code{k2k.method = "minkowski"}, the power used in creating the distance. This is passed to the \code{p} argument of \code{\link[=dist]{dist()}}. } \item{\code{m.order}}{\code{character}; the order that the matching takes place when \code{k2k = TRUE}. Allowable options include \code{"closest"}, where matching takes place in ascending order of the smallest distance between units; \code{"farthest"}, where matching takes place in descending order of the smallest distance between units; \code{"random"}, where matching takes place in a random order; and \code{"data"} where matching takes place based on the order of units in the data. When \code{m.order = "random"}, results may differ across different runs of the same code unless a seed is set and specified with \code{\link[=set.seed]{set.seed()}}. The default of \code{NULL} corresponds to \code{"data"}. See \code{\link{method_nearest}} for more information. } } The arguments \code{distance} (and related arguments), \code{exact}, \code{mahvars}, \code{discard} (and related arguments), \code{replace}, \code{caliper} (and related arguments), and \code{ratio} are ignored with a warning.} } \description{ In \code{\link[=matchit]{matchit()}}, setting \code{method = "cem"} performs coarsened exact matching. With coarsened exact matching, covariates are coarsened into bins, and a complete cross of the coarsened covariates is used to form subclasses defined by each combination of the coarsened covariate levels. 
Any subclass that doesn't contain both treated and control units is discarded, leaving only subclasses containing treatment and control units that are exactly equal on the coarsened covariates. The coarsening process can be controlled by an algorithm or by manually specifying cutpoints and groupings. The benefits of coarsened exact matching are that the tradeoff between exact matching and approximate balancing can be managed to prevent discarding too many units, which can otherwise occur with exact matching. This page details the allowable arguments with \code{method = "cem"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified. Below is how \code{matchit()} is used for coarsened exact matching: \preformatted{ matchit(formula, data = NULL, method = "cem", estimand = "ATT", s.weights = NULL, verbose = FALSE, ...) } } \details{ If the coarsening is such that there are no exact matches with the coarsened variables, the \code{grouping} and \code{cutpoints} arguments can be used to modify the matching specification. Reducing the number of cutpoints or grouping some variable values together can make it easier to find matches. See Examples below. Removing variables can also help (but they will likely not be balanced unless highly correlated with the included variables). To take advantage of coarsened exact matching without failing to find any matches, the covariates can be manually coarsened outside of \code{matchit()} and then supplied to the \code{exact} argument in a call to \code{matchit()} with another matching method. Setting \code{k2k = TRUE} is equivalent to first doing coarsened exact matching with \code{k2k = FALSE} and then supplying stratum membership as an exact matching variable (i.e., in \code{exact}) to another call to \code{matchit()} with \code{method = "nearest"}. 
It is also equivalent to performing nearest neighbor matching supplying coarsened versions of the variables to \code{exact}, except that \code{method = "cem"} automatically coarsens the continuous variables. The \code{estimand} argument supplied with \code{method = "cem"} functions the same way it would in these alternate matching calls, i.e., by determining the "focal" group that controls the order of the matching. \subsection{Grouping and Cutpoints}{ The \code{grouping} and \code{cutpoints} arguments allow one to fine-tune the coarsening of the covariates. \code{grouping} is used for combining categories of categorical covariates and \code{cutpoints} is used for binning numeric covariates. The values supplied to these arguments should be iteratively changed until a matching solution that balances covariate balance and remaining sample size is obtained. The arguments are described below. \subsection{\code{grouping}}{ The argument to \code{grouping} must be a list, where each component has the name of a categorical variable, the levels of which are to be combined. Each component must itself be a list; this list contains one or more vectors of levels, where each vector corresponds to the levels that should be combined into a single category. For example, if a variable \code{amount} had levels \code{"none"}, \code{"some"}, and \code{"a lot"}, one could enter \code{grouping = list(amount = list(c("none"), c("some", "a lot")))}, which would group \code{"some"} and \code{"a lot"} into a single category and leave \code{"none"} in its own category. Any levels left out of the list for each variable will be left alone (so \code{c("none")} could have been omitted from the previous code). Note that if a categorical variable does not appear in \code{grouping}, it will not be coarsened, so exact matching will take place on it. \code{grouping} should not be used for numeric variables with more than a few values; use \code{cutpoints}, described below, instead. 
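As a concrete sketch with the \code{lalonde} data (mirroring the Examples below), the following combines two levels of \code{race} into a single coarsened category:

\preformatted{
library(MatchIt)
data("lalonde")
# "white" and "hispan" are combined into one category; "black",
# being omitted from the list, is left in its own category
m.out <- matchit(treat ~ age + race + married, data = lalonde,
                 method = "cem",
                 grouping = list(race = list(c("white", "hispan"))))
}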
} \subsection{\code{cutpoints}}{ The argument to \code{cutpoints} must also be a list, where each component has the name of a numeric variable that is to be binned. (As a shortcut, it can also be a single value that will be applied to all numeric variables). Each component can take one of three forms: a vector of cutpoints that separate the bins, a single number giving the number of bins, or a string corresponding to an algorithm used to compute the number of bins. Any values at a boundary will be placed into the higher bin; e.g., if the cutpoints were \code{c(0, 5, 10)}, values of 5 would be placed into the same bin as values of 6, 7, 8, or 9, and values of 10 would be placed into a different bin. Internally, values of \code{-Inf} and \code{Inf} are appended to the beginning and end of the range. When given as a single number defining the number of bins, the bin boundaries are the maximum and minimum values of the variable with bin boundaries evenly spaced between them, i.e., not quantiles. A value of 0 will not perform any binning (equivalent to exact matching on the variable), and a value of 1 will remove the variable from the exact matching variables but it will still be used for pair matching when \code{k2k = TRUE}. The allowable strings include \code{"sturges"}, \code{"scott"}, and \code{"fd"}, which use the corresponding binning method, and \code{"q#"} where \verb{#} is a number, which splits the variable into \verb{#} equally-sized bins (i.e., quantiles). An example of a way to supply an argument to \code{cutpoints} would be the following: \preformatted{ cutpoints = list(X1 = 4, X2 = c(1.7, 5.5, 10.2), X3 = "scott", X4 = "q5") } This would split \code{X1} into 4 bins, \code{X2} into bins based on the provided boundaries, \code{X3} into a number of bins determined by \code{\link[grDevices:nclass]{grDevices::nclass.scott()}}, and \code{X4} into quintiles.
All other numeric variables would be split into a number of bins determined by \code{\link[grDevices:nclass]{grDevices::nclass.Sturges()}}, the default. } } } \note{ This method does not rely on the \emph{cem} package, instead using code written for \emph{MatchIt}, but its design is based on the original \emph{cem} functions. Versions of \emph{MatchIt} prior to 4.1.0 did rely on \emph{cem}, so results may differ between versions. There are a few ways in which \emph{MatchIt} and \emph{cem} (and older versions of \emph{MatchIt}) differ in executing coarsened exact matching, described below. \itemize{ \item In \emph{MatchIt}, when a single number is supplied to \code{cutpoints}, it describes the number of bins; in \emph{cem}, it describes the number of cutpoints separating bins. The \emph{MatchIt} method is closer to how \code{\link[=hist]{hist()}} processes breakpoints to create bins. \item In \emph{MatchIt}, values on the cutpoint boundaries will be placed into the higher bin; in \emph{cem}, they are placed into the lower bin. To avoid consequences of this choice, ensure the bin boundaries do not coincide with observed values of the variables. \item When \code{cutpoints} are used, \code{"ss"} (for Shimazaki-Shinomoto's rule) can be used in \emph{cem} but not in \emph{MatchIt}. \item When \code{k2k = TRUE}, \emph{MatchIt} matches on the original variables (scaled), whereas \emph{cem} matches on the coarsened variables. Because the variables are already exactly matched on the coarsened variables, matching in \emph{cem} is equivalent to random matching within strata. \item When \code{k2k = TRUE}, in \emph{MatchIt} matched units are identified by pair membership, and the original stratum membership prior to 1:1 matching is discarded. In \emph{cem}, pairs are not identified beyond the stratum the members are part of. \item When \code{k2k = TRUE}, \code{k2k.method = "mahalanobis"} can be requested in \emph{MatchIt} but not in \emph{cem}.
} } \section{Outputs}{ All outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "cem"} except for \code{match.matrix}. When \code{k2k = TRUE}, a \code{match.matrix} component with the matched pairs is also included. \code{include.obj} is ignored. } \examples{ data("lalonde") # Coarsened exact matching on age, race, married, and educ with educ # coarsened into 5 bins and race coarsened into 2 categories, # grouping "white" and "hispan" together cutpoints <- list(educ = 5) grouping <- list(race = list(c("white", "hispan"), c("black"))) m.out1 <- matchit(treat ~ age + race + married + educ, data = lalonde, method = "cem", cutpoints = cutpoints, grouping = grouping) m.out1 summary(m.out1) # The same but requesting 1:1 Mahalanobis distance matching with # the k2k and k2k.method argument. Note the remaining number of units # is smaller than when retaining the full matched sample. m.out2 <- matchit(treat ~ age + race + married + educ, data = lalonde, method = "cem", cutpoints = cutpoints, grouping = grouping, k2k = TRUE, k2k.method = "mahalanobis") m.out2 summary(m.out2, un = FALSE) } \references{ In a manuscript, you don't need to cite another package when using \code{method = "cem"} because the matching is performed completely within \emph{MatchIt}. For example, a sentence might read: \emph{Coarsened exact matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R.} It would be a good idea to cite the following article, which develops the theory behind coarsened exact matching: Iacus, S. M., King, G., & Porro, G. (2012). Causal Inference without Balance Checking: Coarsened Exact Matching. \emph{Political Analysis}, 20(1), 1–24. \doi{10.1093/pan/mpr013} } \seealso{ \code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}. The \emph{cem} package, upon which this method is based and which provided the workhorse in previous versions of \emph{MatchIt}. 
\code{\link{method_exact}} for exact matching, which performs exact matching on the covariates without coarsening. } MatchIt/man/lalonde.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/lalonde.R \docType{data} \name{lalonde} \alias{lalonde} \title{Data from National Supported Work Demonstration and PSID, as analyzed by Dehejia and Wahba (1999).} \format{ A data frame with 614 observations (185 treated, 429 control). There are 9 variables measured for each individual. \itemize{ \item "treat" is the treatment assignment (1=treated, 0=control). \item "age" is age in years. \item "educ" is education in number of years of schooling. \item "race" is the individual's race/ethnicity (Black, Hispanic, or White). Note previous versions of this dataset used indicator variables \code{black} and \code{hispan} instead of a single race variable. \item "married" is an indicator for married (1=married, 0=not married). \item "nodegree" is an indicator for whether the individual has a high school degree (1=no degree, 0=degree). \item "re74" is income in 1974, in U.S. dollars. \item "re75" is income in 1975, in U.S. dollars. \item "re78" is income in 1978, in U.S. dollars. } "treat" is the treatment variable, "re78" is the outcome, and the others are pre-treatment covariates. } \description{ This is a subsample of the data from the treated group in the National Supported Work Demonstration (NSW) and the comparison sample from the Population Survey of Income Dynamics (PSID). This data was previously analyzed extensively by Lalonde (1986) and Dehejia and Wahba (1999). } \references{ Lalonde, R. (1986). Evaluating the econometric evaluations of training programs with experimental data. \emph{American Economic Review} 76: 604-620. Dehejia, R.H. and Wahba, S. (1999). Causal Effects in Nonexperimental Studies: Re-Evaluating the Evaluation of Training Programs.
\emph{Journal of the American Statistical Association} 94: 1053-1062. } \keyword{datasets} MatchIt/man/method_quick.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/matchit2quick.R \name{method_quick} \alias{method_quick} \title{Fast Generalized Full Matching} \arguments{ \item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be used in creating the distance measure used in the matching. This formula will be supplied to the functions that estimate the distance measure.} \item{data}{a data frame containing the variables named in \code{formula}. If not found in \code{data}, the variables will be sought in the environment.} \item{method}{set here to \code{"quick"}.} \item{distance}{the distance measure to be used. See \code{\link{distance}} for allowable options. Cannot be supplied as a matrix.} \item{link}{when \code{distance} is specified as a method of estimating propensity scores, an additional argument controlling the link function used in estimating the distance measure. See \code{\link{distance}} for allowable options with each option.} \item{distance.options}{a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to \code{distance}.} \item{estimand}{a string containing the desired estimand. Allowable options include \code{"ATT"}, \code{"ATC"}, and \code{"ATE"}. The estimand controls how the weights are computed; see the Computing Weights section at \code{\link[=matchit]{matchit()}} for details.} \item{exact}{for which variables exact matching should take place.} \item{mahvars}{for which variables Mahalanobis distance matching should take place when \code{distance} corresponds to a propensity score (e.g., for caliper matching or to discard units for common support).
If specified, the distance measure will not be used in matching.} \item{discard}{a string containing a method for discarding units outside a region of common support. Only allowed when \code{distance} corresponds to a propensity score.} \item{reestimate}{if \code{discard} is not \code{"none"}, whether to re-estimate the propensity score in the remaining sample prior to matching.} \item{s.weights}{the variable containing sampling weights to be incorporated into propensity score models and balance statistics.} \item{caliper}{the width of the caliper used for caliper matching. A caliper can only be placed on the propensity score and cannot be negative.} \item{std.caliper}{\code{logical}; when a caliper is specified, whether it is in standard deviation units (\code{TRUE}) or raw units (\code{FALSE}).} \item{verbose}{\code{logical}; whether information about the matching process should be printed to the console.} \item{\dots}{additional arguments passed to \pkgfun{quickmatch}{quickmatch}. Allowed arguments include \code{treatment_constraints}, \code{size_constraint}, \code{target}, and other arguments passed to \code{scclust::sc_clustering()} (see \pkgfun{quickmatch}{quickmatch} for details). In particular, changing \code{seed_method} from its default can improve performance. No arguments will be passed to \code{distances::distances()}. The arguments \code{replace}, \code{ratio}, \code{min.controls}, \code{max.controls}, \code{m.order}, and \code{antiexact} are ignored with a warning.} } \description{ In \code{\link[=matchit]{matchit()}}, setting \code{method = "quick"} performs generalized full matching, which is a form of subclassification wherein all units, both treatment and control (i.e., the "full" sample), are assigned to a subclass and receive at least one match. It uses an algorithm that is extremely fast compared to optimal full matching, which is why it is labeled as "quick", at the expense of true optimality. 
The method is described in Sävje, Higgins, & Sekhon (2021). The method relies on and is a wrapper for \pkgfun{quickmatch}{quickmatch}. Advantages of generalized full matching include that the matching order is not required to be specified, units do not need to be discarded, and it is less likely that extreme within-subclass distances will be large, unlike with standard subclassification. The primary output of generalized full matching is a set of matching weights that can be applied to the matched sample; in this way, generalized full matching can be seen as a robust alternative to propensity score weighting, robust in the sense that the propensity score model does not need to be correct to estimate the treatment effect without bias. This page details the allowable arguments with \code{method = "quick"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified. Below is how \code{matchit()} is used for generalized full matching: \preformatted{ matchit(formula, data = NULL, method = "quick", distance = "glm", link = "logit", distance.options = list(), estimand = "ATT", exact = NULL, mahvars = NULL, discard = "none", reestimate = FALSE, s.weights = NULL, caliper = NULL, std.caliper = TRUE, verbose = FALSE, ...) } } \details{ Generalized full matching is similar to optimal full matching, but has some additional flexibility that can be controlled by some of the extra arguments available. By default, \code{method = "quick"} performs a standard full match in which all units are matched (unless restricted by the caliper) and assigned to a subclass. Each subclass could contain multiple units from each treatment group. The subclasses are chosen to minimize the largest within-subclass distance between units (including between units of the same treatment group). 
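The flexibility mentioned above is controlled through the additional arguments passed on to \pkgfun{quickmatch}{quickmatch}. As a sketch (the constraint values below are purely illustrative, and the treatment variable is assumed to be coded 0/1), one might require each subclass to contain at least one treated unit, at least two control units, and at least four units in total:

\preformatted{
matchit(treat ~ age + educ + re74 + re75,
        data = lalonde, method = "quick",
        treatment_constraints = c("1" = 1L, "0" = 2L),
        size_constraint = 4L)
}

Tightening these constraints can improve within-subclass closeness at the cost of discarding units that cannot satisfy them.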
Notably, generalized full matching requires less memory and can run much faster than optimal full matching and optimal pair matching and, in some cases, even than nearest neighbor matching, and it can be used with huge datasets (e.g., in the millions) while running in under a minute. } \section{Outputs}{ All outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "quick"} except for \code{match.matrix}. This is because matching strata are not indexed by treated units as they are in some other forms of matching. When \code{include.obj = TRUE} in the call to \code{matchit()}, the output of the call to \pkgfun{quickmatch}{quickmatch} will be included in the output. When \code{exact} is specified, this will be a list of such objects, one for each stratum of the \code{exact} variables. } \examples{ \dontshow{if (requireNamespace("quickmatch", quietly = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} data("lalonde") # Generalized full PS matching m.out1 <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde, method = "quick") m.out1 summary(m.out1) \dontshow{\}) # examplesIf} } \references{ In a manuscript, be sure to cite the \emph{quickmatch} package if using \code{matchit()} with \code{method = "quick"}. A citation can be generated using \code{citation("quickmatch")}. For example, a sentence might read: \emph{Generalized full matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R, which calls functions from the quickmatch package (Sävje, Sekhon, & Higgins, 2024).} You should also cite the following paper, which develops and describes the method: Sävje, F., Higgins, M. J., & Sekhon, J. S. (2021). Generalized Full Matching. \emph{Political Analysis}, 29(4), 423–447. \doi{10.1017/pan.2020.32} } \seealso{ \code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}.
\pkgfun{quickmatch}{quickmatch}, which is the workhorse. \code{\link{method_full}} for optimal full matching, which is nearly the same but offers more customizability and more optimal solutions at the cost of speed. } MatchIt/man/distance.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/distance2_methods.R \name{distance} \alias{distance} \title{Propensity scores and other distance measures} \description{ Several matching methods require or can involve the distance between treated and control units. Options include the Mahalanobis distance, propensity score distance, or distance between user-supplied values. Propensity scores are also used for common support via the \code{discard} options and for defining calipers. This page documents the options that can be supplied to the \code{distance} argument to \code{\link[=matchit]{matchit()}}. } \note{ In versions of \emph{MatchIt} prior to 4.0.0, \code{distance} was specified in a slightly different way. When specifying arguments using the old syntax, they will automatically be converted to the corresponding method in the new syntax but a warning will be thrown. \code{distance = "logit"}, the old default, will still work in the new syntax, though \verb{distance = "glm", link = "logit"} is preferred (note that these are the default settings and don't need to be made explicit). } \section{Allowable options}{ There are four ways to specify the \code{distance} argument: 1) as a string containing the name of a method for estimating propensity scores, 2) as a string containing the name of a method for computing pairwise distances from the covariates, 3) as a vector of values whose pairwise differences define the distance between units, or 4) as a distance matrix containing all pairwise distances. The options are detailed below.
\subsection{Propensity score estimation methods}{ When \code{distance} is specified as the name of a method for estimating propensity scores (described below), a propensity score is estimated using the variables in \code{formula} and the method corresponding to the given argument. This propensity score can be used to compute the distance between units as the absolute difference between the propensity scores of pairs of units. Propensity scores can also be used to create calipers and common support restrictions, whether or not they are used in the actual distance measure used in the matching, if any. In addition to the \code{distance} argument, two other arguments can be specified that relate to the estimation and manipulation of the propensity scores. The \code{link} argument allows for different links to be used in models that require them such as generalized linear models, for which the logit and probit links are allowed, among others. In addition to specifying the link, the \code{link} argument can be used to specify whether the propensity score or the linearized version of the propensity score should be used; by specifying \code{link = "linear.{link}"}, the linearized version will be used. The \code{distance.options} argument can also be specified, which should be a list of values passed to the propensity score-estimating function, for example, to choose specific options or tuning parameters for the estimation method. If \code{formula}, \code{data}, or \code{verbose} are not supplied to \code{distance.options}, the corresponding arguments from \code{matchit()} will be automatically supplied. See the Examples for demonstrations of the uses of \code{link} and \code{distance.options}. When \code{s.weights} is supplied in the call to \code{matchit()}, it will automatically be passed to the propensity score-estimating function as the \code{weights} argument unless otherwise described below. 
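As a brief sketch of how \code{link} and \code{distance.options} work together (the argument values below are illustrative only; see the Examples for complete demonstrations):

\preformatted{
# Linearized logistic PS, passing a control argument on to glm():
matchit(treat ~ age + educ + re74 + re75,
        data = lalonde, distance = "glm",
        link = "linear.logit",
        distance.options = list(control = glm.control(maxit = 50)))
}

Here \code{link = "linear.logit"} requests the linear predictor from a logistic regression, and \code{distance.options} forwards \code{control} to \code{glm()}.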
The following methods for estimating propensity scores are allowed: \describe{ \item{\code{"glm"}}{ The propensity scores are estimated using a generalized linear model (e.g., logistic regression). The \code{formula} supplied to \code{matchit()} is passed directly to \code{\link[=glm]{glm()}}, and \code{\link[=predict.glm]{predict.glm()}} is used to compute the propensity scores. The \code{link} argument can be specified as a link function supplied to \code{\link[=binomial]{binomial()}}, e.g., \code{"logit"}, which is the default. When \code{link} is prepended by \code{"linear."}, the linear predictor is used instead of the predicted probabilities. \code{distance = "glm"} with \code{link = "logit"} (logistic regression) is the default in \code{matchit()}. (This could previously be requested as \code{distance = "ps"}, which still works.)} \item{\code{"gam"}}{ The propensity scores are estimated using a generalized additive model. The \code{formula} supplied to \code{matchit()} is passed directly to \pkgfun{mgcv}{gam}, and \pkgfun{mgcv}{predict.gam} is used to compute the propensity scores. The \code{link} argument can be specified as a link function supplied to \code{\link[=binomial]{binomial()}}, e.g., \code{"logit"}, which is the default. When \code{link} is prepended by \code{"linear."}, the linear predictor is used instead of the predicted probabilities. Note that unless the smoothing functions \pkgfun{mgcv}{s}, \pkgfun{mgcv}{te}, \pkgfun{mgcv}{ti}, or \pkgfun{mgcv}{t2} are used in \code{formula}, a generalized additive model is identical to a generalized linear model and will estimate the same propensity scores as \code{glm()}. See the documentation for \pkgfun{mgcv}{gam}, \pkgfun{mgcv}{formula.gam}, and \pkgfun{mgcv}{gam.models} for more information on how to specify these models.
Also note that the formula returned in the \code{matchit()} output object will be a simplified version of the supplied formula with smoothing terms removed (but all named variables present). } \item{\code{"gbm"}}{ The propensity scores are estimated using a generalized boosted model. The \code{formula} supplied to \code{matchit()} is passed directly to \pkgfun{gbm}{gbm}, and \pkgfun{gbm}{predict.gbm} is used to compute the propensity scores. The optimal tree is chosen using 5-fold cross-validation by default, and this can be changed by supplying an argument to \code{method} in \code{distance.options}; see \pkgfun{gbm}{gbm.perf} for details. The \code{link} argument can be specified as \code{"linear"} to use the linear predictor instead of the predicted probabilities. No other links are allowed. The tuning parameter defaults differ from \code{gbm::gbm()}; they are as follows: \code{n.trees = 1e4}, \code{interaction.depth = 3}, \code{shrinkage = .01}, \code{bag.fraction = 1}, \code{cv.folds = 5}, \code{keep.data = FALSE}. These are the same defaults as used in \emph{WeightIt} and \emph{twang}, except for \code{cv.folds} and \code{keep.data}. Note that this is not the same use of generalized boosted modeling as in \emph{twang}; here, the number of trees is chosen based on cross-validation or out-of-bag error, rather than based on optimizing balance. \pkg{twang} should not be cited when using this method to estimate propensity scores. Note that because there is a random component to choosing the tuning parameter, results will vary across runs unless a \link[=set.seed]{seed} is set.}
The \code{link} argument can be specified as a link function supplied to \code{\link[=binomial]{binomial()}}, e.g., \code{"logit"}, which is the default. When \code{link} is prepended by \code{"linear."}, the linear predictor is used instead of the predicted probabilities. When \code{link = "log"}, a Poisson model is used. For \code{distance = "elasticnet"}, the \code{alpha} argument, which controls how to prioritize the lasso and ridge penalties in the elastic net, is set to .5 by default and can be changed by supplying an argument to \code{alpha} in \code{distance.options}. For \code{"lasso"} and \code{"ridge"}, \code{alpha} is set to 1 and 0, respectively, and cannot be changed. The \code{cv.glmnet()} defaults are used to select the tuning parameters and generate predictions and can be modified using \code{distance.options}. If the \code{s} argument is passed to \code{distance.options}, it will be passed to \code{predict.cv.glmnet()}. Note that because there is a random component to choosing the tuning parameter, results will vary across runs unless a \link[=set.seed]{seed} is set. } \item{\code{"rpart"}}{ The propensity scores are estimated using a classification tree. The \code{formula} supplied to \code{matchit()} is passed directly to \pkgfun{rpart}{rpart}, and \pkgfun{rpart}{predict.rpart} is used to compute the propensity scores. The \code{link} argument is ignored, and predicted probabilities are always returned as the distance measure. } \item{\code{"randomforest"}}{ The propensity scores are estimated using a random forest. The \code{formula} supplied to \code{matchit()} is passed directly to \pkgfun{randomForest}{randomForest}, and \pkgfun{randomForest}{predict.randomForest} is used to compute the propensity scores. The \code{link} argument is ignored, and predicted probabilities are always returned as the distance measure. Note that because there is a random component, results will vary across runs unless a \link[=set.seed]{seed} is set. 
} \item{\code{"nnet"}}{ The propensity scores are estimated using a single-hidden-layer neural network. The \code{formula} supplied to \code{matchit()} is passed directly to \pkgfun{nnet}{nnet}, and \code{\link[=fitted]{fitted()}} is used to compute the propensity scores. The \code{link} argument is ignored, and predicted probabilities are always returned as the distance measure. An argument to \code{size} must be supplied to \code{distance.options} when using \code{distance = "nnet"}. } \item{\code{"cbps"}}{ The propensity scores are estimated using the covariate balancing propensity score (CBPS) algorithm, which is a form of logistic regression where balance constraints are incorporated into a generalized method of moments estimation of the model coefficients. The \code{formula} supplied to \code{matchit()} is passed directly to \pkgfun{CBPS}{CBPS}, and \code{\link[=fitted]{fitted()}} is used to compute the propensity scores. The \code{link} argument can be specified as \code{"linear"} to use the linear predictor instead of the predicted probabilities. No other links are allowed. The \code{estimand} argument supplied to \code{matchit()} will be used to select the appropriate estimand for use in defining the balance constraints, so no argument needs to be supplied to \code{ATT} in \code{CBPS}. } \item{\code{"bart"}}{ The propensity scores are estimated using Bayesian additive regression trees (BART). The \code{formula} supplied to \code{matchit()} is passed directly to \pkgfun{dbarts}{bart2}, and \pkgfun{dbarts}{fitted.bart} is used to compute the propensity scores. The \code{link} argument can be specified as \code{"linear"} to use the linear predictor instead of the predicted probabilities. When \code{s.weights} is supplied to \code{matchit()}, it will not be passed to \code{bart2} because the \code{weights} argument in \code{bart2} does not correspond to sampling weights.
Note that because there is a random component to choosing the tuning parameter, results will vary across runs unless the \code{seed} argument is supplied to \code{distance.options}. Note that setting a seed using \code{\link[=set.seed]{set.seed()}} is not sufficient to guarantee reproducibility unless single-threading is used. See \pkgfun{dbarts}{bart2} for details.} } } \subsection{Methods for computing distances from covariates}{ The following methods involve computing a distance matrix from the covariates themselves without estimating a propensity score. Calipers on the distance measure and common support restrictions cannot be used, and the \code{distance} component of the output object will be empty because no propensity scores are estimated. The \code{link} and \code{distance.options} arguments are ignored with these methods. See the individual matching methods pages for whether these distances are allowed and how they are used. Each of these distance measures can also be calculated outside \code{matchit()} using its \link[=euclidean_dist]{corresponding function}. \describe{ \item{\code{"euclidean"}}{ The Euclidean distance is the raw distance between units, computed as \deqn{d_{ij} = \sqrt{(x_i - x_j)(x_i - x_j)'}} It is sensitive to the scale of the covariates, so covariates with larger scales will take higher priority. } \item{\code{"scaled_euclidean"}}{ The scaled Euclidean distance is the Euclidean distance computed on the scaled (i.e., standardized) covariates. This ensures the covariates are on the same scale. The covariates are standardized using the pooled within-group standard deviations, computed by treatment group-mean centering each covariate before computing the standard deviation in the full sample. 
} \item{\code{"mahalanobis"}}{ The Mahalanobis distance is computed as \deqn{d_{ij} = \sqrt{(x_i - x_j)\Sigma^{-1}(x_i - x_j)'}} where \eqn{\Sigma} is the pooled within-group covariance matrix of the covariates, computed by treatment group-mean centering each covariate before computing the covariance in the full sample. This ensures the variables are on the same scale and accounts for the correlation between covariates. } \item{\code{"robust_mahalanobis"}}{ The robust rank-based Mahalanobis distance is the Mahalanobis distance computed on the ranks of the covariates with an adjustment for ties. It is described in Rosenbaum (2010, ch. 8) as an alternative to the Mahalanobis distance that handles outliers and rare categories better than the standard Mahalanobis distance but is not affinely invariant. } } To perform Mahalanobis distance matching \emph{and} estimate propensity scores to be used for a purpose other than matching, the \code{mahvars} argument should be used along with a different specification to \code{distance}. See the individual matching method pages for details on how to use \code{mahvars}. } \subsection{Distances supplied as a numeric vector or matrix}{ \code{distance} can also be supplied as a numeric vector whose values will be taken to function like propensity scores; their pairwise difference will define the distance between units. This might be useful for supplying propensity scores computed outside \code{matchit()} or resupplying \code{matchit()} with propensity scores estimated previously without having to recompute them. \code{distance} can also be supplied as a matrix whose values represent the pairwise distances between units. 
The matrix should either be square, with a row and column for each unit (e.g., as the output of a call to \verb{as.matrix(}\code{\link{dist}}\verb{(.))}), or have as many rows as there are treated units and as many columns as there are control units (e.g., as the output of a call to \code{\link[=mahalanobis_dist]{mahalanobis_dist()}} or \pkgfun{optmatch}{match_on}). Distance values of \code{Inf} will prevent the corresponding units from being matched. When \code{distance} is supplied as a numeric vector or matrix, \code{link} and \code{distance.options} are ignored. } } \examples{ data("lalonde") # Linearized probit regression PS: m.out1 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde, distance = "glm", link = "linear.probit") \dontshow{if (requireNamespace("mgcv", quietly = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} # GAM logistic PS with smoothing splines (s()): m.out2 <- matchit(treat ~ s(age) + s(educ) + race + married + nodegree + re74 + re75, data = lalonde, distance = "gam") summary(m.out2$model) \dontshow{\}) # examplesIf} \dontshow{if (requireNamespace("CBPS", quietly = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} # CBPS for ATC matching w/replacement, using the just- # identified version of CBPS (setting method = "exact"): m.out3 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde, distance = "cbps", estimand = "ATC", distance.options = list(method = "exact"), replace = TRUE) \dontshow{\}) # examplesIf} # Mahalanobis distance matching - no PS estimated m.out4 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde, distance = "mahalanobis") m.out4$distance #NULL # Mahalanobis distance matching with PS estimated # for use in a caliper; matching done on mahvars m.out5 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde, distance = "glm", caliper =
.1, mahvars = ~ age + educ + race + married + nodegree + re74 + re75) summary(m.out5) # User-supplied propensity scores p.score <- fitted(glm(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde, family = binomial)) m.out6 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde, distance = p.score) # User-supplied distance matrix using robust_mahalanobis_dist() dist_mat <- robust_mahalanobis_dist( treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde) m.out7 <- matchit(treat ~ age + educ + race + nodegree + married + re74 + re75, data = lalonde, distance = dist_mat) } MatchIt/man/method_nearest.Rd % Generated by roxygen2: do not edit by hand % Please edit documentation in R/matchit2nearest.R \name{method_nearest} \alias{method_nearest} \title{Nearest Neighbor Matching} \arguments{ \item{formula}{a two-sided \link{formula} object containing the treatment and covariates to be used in creating the distance measure used in the matching.} \item{data}{a data frame containing the variables named in \code{formula}. If not found in \code{data}, the variables will be sought in the environment.} \item{method}{set here to \code{"nearest"}.} \item{distance}{the distance measure to be used. See \code{\link{distance}} for allowable options. Can be supplied as a distance matrix.} \item{link}{when \code{distance} is specified as a method of estimating propensity scores, an additional argument controlling the link function used in estimating the distance measure. See \code{\link{distance}} for allowable options with each option.} \item{distance.options}{a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to \code{distance}.} \item{estimand}{a string containing the desired estimand. Allowable options include \code{"ATT"} and \code{"ATC"}.
See Details.} \item{exact}{for which variables exact matching should take place; two units with different values of an exact matching variable will not be paired.} \item{mahvars}{for which variables Mahalanobis distance matching should take place when \code{distance} corresponds to a propensity score (e.g., for caliper matching or to discard units for common support). If specified, the distance measure will not be used in matching.} \item{antiexact}{for which variables anti-exact matching should take place; two units with the same value of an anti-exact matching variable will not be paired.} \item{discard}{a string containing a method for discarding units outside a region of common support. Only allowed when \code{distance} corresponds to a propensity score.} \item{reestimate}{if \code{discard} is not \code{"none"}, whether to re-estimate the propensity score in the remaining sample prior to matching.} \item{s.weights}{the variable containing sampling weights to be incorporated into propensity score models and balance statistics.} \item{replace}{whether matching should be done with replacement (i.e., whether control units can be used as matches multiple times). See also the \code{reuse.max} argument below. Default is \code{FALSE} for matching without replacement.} \item{m.order}{the order that the matching takes place. Allowable options include \code{"largest"}, where matching takes place in descending order of distance measures; \code{"smallest"}, where matching takes place in ascending order of distance measures; \code{"closest"}, where matching takes place in ascending order of the smallest distance between units; \code{"farthest"}, where matching takes place in descending order of the smallest distance between units; \code{"random"}, where matching takes place in a random order; and \code{"data"} where matching takes place based on the order of units in the data. 
When \code{m.order = "random"}, results may differ across different runs of the same code unless a seed is set and specified with \code{\link[=set.seed]{set.seed()}}. The default of \code{NULL} corresponds to \code{"largest"} when a propensity score is estimated or supplied as a vector and \code{"data"} otherwise. See Details for more information.} \item{caliper}{the width(s) of the caliper(s) used for caliper matching. Two units with a difference on a caliper variable larger than the caliper will not be paired. See Details and Examples.} \item{std.caliper}{\code{logical}; when calipers are specified, whether they are in standard deviation units (\code{TRUE}) or raw units (\code{FALSE}).} \item{ratio}{how many control units should be matched to each treated unit for k:1 matching. For variable ratio matching, see section "Variable Ratio Matching" in Details below. When \code{ratio} is greater than 1, an attempt is made to match each treated unit with one control unit before any treated unit is matched with a second control unit, etc. This reduces the possibility that control units will be used up before some treated units receive any matches.} \item{min.controls, max.controls}{for variable ratio matching, the minimum and maximum number of control units to be matched to each treated unit. See section "Variable Ratio Matching" in Details below.} \item{verbose}{\code{logical}; whether information about the matching process should be printed to the console. When \code{TRUE}, a progress bar implemented using \emph{RcppProgress} will be displayed along with an estimate of the time remaining.} \item{\dots}{additional arguments that control the matching specification: \describe{ \item{\code{reuse.max}}{ \code{numeric}; the maximum number of times each control can be used as a match.
Setting \code{reuse.max = 1} corresponds to matching without replacement (i.e., \code{replace = FALSE}), and setting \code{reuse.max = Inf} corresponds to traditional matching with replacement (i.e., \code{replace = TRUE}) with no limit on the number of times each control unit can be matched. Other values restrict the number of times each control can be matched when matching with replacement. \code{replace} is ignored when \code{reuse.max} is specified. } \item{\code{unit.id}}{ one or more variables containing a unit ID for each observation, i.e., in case multiple observations correspond to the same unit. Once a control observation has been matched, no other observation with the same unit ID can be used as matches. This ensures each control unit is used only once even if it has multiple observations associated with it. Omitting this argument is the same as giving each observation a unique ID.} }} } \description{ In \code{\link[=matchit]{matchit()}}, setting \code{method = "nearest"} performs greedy nearest neighbor matching. A distance is computed between each treated unit and each control unit, and, one by one, each treated unit is assigned a control unit as a match. The matching is "greedy" in the sense that there is no action taken to optimize an overall criterion; each match is selected without considering the other matches that may occur subsequently. This page details the allowable arguments with \code{method = "nearest"}. See \code{\link[=matchit]{matchit()}} for an explanation of what each argument means in a general context and how it can be specified. 
Below is how \code{matchit()} is used for nearest neighbor matching: \preformatted{ matchit(formula, data = NULL, method = "nearest", distance = "glm", link = "logit", distance.options = list(), estimand = "ATT", exact = NULL, mahvars = NULL, antiexact = NULL, discard = "none", reestimate = FALSE, s.weights = NULL, replace = FALSE, m.order = NULL, caliper = NULL, std.caliper = TRUE, ratio = 1, min.controls = NULL, max.controls = NULL, verbose = FALSE, ...) } } \details{ \subsection{Mahalanobis Distance Matching}{ Mahalanobis distance matching can be done one of two ways: \enumerate{ \item{If no propensity score needs to be estimated, \code{distance} should be set to \code{"mahalanobis"}, and Mahalanobis distance matching will occur using all the variables in \code{formula}. Arguments to \code{discard} and \code{mahvars} will be ignored, and a caliper can only be placed on named variables. For example, to perform simple Mahalanobis distance matching, the following could be run: \preformatted{ matchit(treat ~ X1 + X2, method = "nearest", distance = "mahalanobis") } With this code, the Mahalanobis distance is computed using \code{X1} and \code{X2}, and matching occurs on this distance. The \code{distance} component of the \code{matchit()} output will be empty. } \item{If a propensity score needs to be estimated for any reason, e.g., for common support with \code{discard} or for creating a caliper, \code{distance} should be whatever method is used to estimate the propensity score or a vector of distance measures. Use \code{mahvars} to specify the variables used to create the Mahalanobis distance.
For example, to perform Mahalanobis distance matching within a propensity score caliper, the following could be run: \preformatted{ matchit(treat ~ X1 + X2 + X3, method = "nearest", distance = "glm", caliper = .25, mahvars = ~ X1 + X2) } With this code, \code{X1}, \code{X2}, and \code{X3} are used to estimate the propensity score (using the \code{"glm"} method, which by default is logistic regression), which is used to create a matching caliper. The actual matching occurs on the Mahalanobis distance computed only using \code{X1} and \code{X2}, which are supplied to \code{mahvars}. Units whose propensity score difference is larger than the caliper will not be paired, and some treated units may therefore not receive a match. The estimated propensity scores will be included in the \code{distance} component of the \code{matchit()} output. See Examples. } } } \subsection{Estimand}{ The \code{estimand} argument controls whether control units are selected to be matched with treated units (\code{estimand = "ATT"}) or treated units are selected to be matched with control units (\code{estimand = "ATC"}). The "focal" group (e.g., the treated units for the ATT) is typically made to be the smaller treatment group, and a warning will be thrown if it is not set that way unless \code{replace = TRUE}. Setting \code{estimand = "ATC"} is equivalent to swapping all treated and control labels for the treatment variable. When \code{estimand = "ATC"}, the default \code{m.order} is \code{"smallest"}, and the \code{match.matrix} component of the output will have the names of the control units as the rownames and be filled with the names of the matched treated units (opposite to when \code{estimand = "ATT"}). Note that the argument supplied to \code{estimand} doesn't necessarily correspond to the estimand actually targeted; it is merely a switch to trigger which treatment group is considered "focal".
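For example, to select treated units as matches for control units (a sketch using the \code{lalonde} data included with \emph{MatchIt}):

\preformatted{
matchit(treat ~ age + educ + race + re74 + re75,
        data = lalonde, method = "nearest",
        estimand = "ATC")
}

Because the focal group is now the control group, the resulting weights target the control units' covariate distribution.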
} \subsection{Variable Ratio Matching}{ \code{matchit()} can perform variable ratio "extremal" matching as described by Ming and Rosenbaum (2000; \doi{10.1111/j.0006-341X.2000.00118.x}). This method tends to result in better balance than fixed ratio matching at the expense of some precision. When \code{ratio > 1}, rather than requiring all treated units to receive \code{ratio} matches, each treated unit is assigned a value that corresponds to the number of control units they will be matched to. These values are controlled by the arguments \code{min.controls} and \code{max.controls}, which correspond to \eqn{\alpha} and \eqn{\beta}, respectively, in Ming and Rosenbaum (2000), and trigger variable ratio matching to occur. Some treated units will receive \code{min.controls} matches and others will receive \code{max.controls} matches (and one unit may have an intermediate number of matches); how many units are assigned each number of matches is determined by the algorithm described in Ming and Rosenbaum (2000, p119). \code{ratio} controls how many total control units will be matched: \code{n1 * ratio} control units will be matched, where \code{n1} is the number of treated units, yielding the same total number of matched controls as fixed ratio matching does. Variable ratio matching cannot be used with Mahalanobis distance matching or when \code{distance} is supplied as a matrix. The calculations of the numbers of control units each treated unit will be matched to occurs without consideration of \code{caliper} or \code{discard}. \code{ratio} does not have to be an integer but must be greater than 1 and less than \code{n0/n1}, where \code{n0} and \code{n1} are the number of control and treated units, respectively. Setting \code{ratio = n0/n1} performs a crude form of full matching where all control units are matched. If \code{min.controls} is not specified, it is set to 1 by default. 
\code{min.controls} must be less than \code{ratio}, and \code{max.controls} must be greater than \code{ratio}. See Examples below for an example of their use. }

\subsection{Using \code{m.order = "closest"} or \code{"farthest"}}{ \code{m.order} can be set to \code{"closest"} or \code{"farthest"}, which work regardless of how the distance measure is specified. This matches in order of the distance between units. First, the closest match is found for each treated unit and the pairwise distances are computed; when \code{m.order = "closest"}, the pair with the smallest of the distances is matched first, and when \code{m.order = "farthest"}, the pair with the largest of the distances is matched first. Then, the pair with the second smallest (or largest) distance is matched second. If the matched control is ineligible (i.e., because it has already been used in a prior match), a new match is found for the treated unit, the new pair's distance is re-computed, and the pairs are re-ordered by distance.

Using \code{m.order = "closest"} ensures that the best possible matches are given priority, and in that sense should perform similarly to \code{m.order = "smallest"}. It can be used to ensure the best matches, especially when matching with a caliper. Using \code{m.order = "farthest"} ensures that the hardest-to-match units are given their best chance to find a close match, and in that sense should perform similarly to \code{m.order = "largest"}. It can be used to reduce the possibility of extreme imbalance when there are hard-to-match units competing for controls.

Note that \code{m.order = "farthest"} \strong{does not} implement "far matching" (i.e., finding the farthest control unit from each treated unit); it defines the order in which the closest matches are selected.
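As a sketch (again with placeholder covariates \code{X1} and \code{X2}), prioritizing the best pairs when matching within a caliper could look like: \preformatted{ matchit(treat ~ X1 + X2, method = "nearest", caliper = .25, m.order = "closest") } Because the tightest pairs are formed first, treated units near the caliper boundary are less likely to use up controls that would have been closer matches for other treated units.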
} \subsection{Reproducibility}{ Nearest neighbor matching involves a random component only when \code{m.order = "random"} (or when the propensity score is estimated using a method with randomness; see \code{\link{distance}} for details), so a seed must be set in that case using \code{\link[=set.seed]{set.seed()}} to ensure reproducibility. Otherwise, it is purely deterministic, and any ties are broken based on the order in which the data appear. } }

\section{Outputs}{ All outputs described in \code{\link[=matchit]{matchit()}} are returned with \code{method = "nearest"}. When \code{replace = TRUE}, the \code{subclass} component is omitted. \code{include.obj} is ignored. }

\examples{
data("lalonde")

# 1:1 greedy NN matching on the PS
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75,
                  data = lalonde,
                  method = "nearest")
m.out1
summary(m.out1)

# 3:1 NN Mahalanobis distance matching with
# replacement within a PS caliper
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75,
                  data = lalonde,
                  method = "nearest",
                  replace = TRUE,
                  mahvars = ~ age + educ + re74 + re75,
                  ratio = 3,
                  caliper = .02)
m.out2
summary(m.out2, un = FALSE)

# 1:1 NN Mahalanobis distance matching within calipers
# on re74 and re75 and exact matching on married and race
m.out3 <- matchit(treat ~ age + educ + re74 + re75,
                  data = lalonde,
                  method = "nearest",
                  distance = "mahalanobis",
                  exact = ~ married + race,
                  caliper = c(re74 = .2, re75 = .15))
m.out3
summary(m.out3, un = FALSE)

# 2:1 variable ratio NN matching on the PS
m.out4 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75,
                  data = lalonde,
                  method = "nearest",
                  ratio = 2,
                  min.controls = 1,
                  max.controls = 12)
m.out4
summary(m.out4, un = FALSE)

# Some units received 1 match and some received 12
table(table(m.out4$subclass[m.out4$treat == 0]))
}
\references{ In a manuscript, you don't need to cite another package when using \code{method = "nearest"} because the matching is performed
completely within \emph{MatchIt}. For example, a sentence might read:

\emph{Nearest neighbor matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R.} }

\seealso{ \code{\link[=matchit]{matchit()}} for a detailed explanation of the inputs and outputs of a call to \code{matchit()}.

\code{\link[=method_optimal]{method_optimal()}} for optimal pair matching, which is similar to nearest neighbor matching without replacement except that an overall distance criterion is minimized (i.e., as an alternative to specifying \code{m.order}). }

MatchIt/man/plot.summary.matchit.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/plot.summary.matchit.R
\name{plot.summary.matchit}
\alias{plot.summary.matchit}
\title{Generate a Love Plot of Standardized Mean Differences}
\usage{
\method{plot}{summary.matchit}(
  x,
  abs = TRUE,
  var.order = "data",
  threshold = c(0.1, 0.05),
  position = "bottomright",
  ...
)
}
\arguments{
\item{x}{a \code{summary.matchit} object; the output of a call to \code{\link[=summary.matchit]{summary.matchit()}}. The \code{standardize} argument must be set to \code{TRUE} (which is the default) in the call to \code{summary}.}

\item{abs}{\code{logical}; whether the standardized mean differences should be displayed in absolute value (\code{TRUE}, default) or not (\code{FALSE}).}

\item{var.order}{how the variables should be ordered. Allowable options include \code{"data"}, ordering the variables as they appear in the \code{summary} output; \code{"unmatched"}, ordering the variables based on their standardized mean differences before matching; \code{"matched"}, ordering the variables based on their standardized mean differences after matching; and \code{"alphabetical"}, ordering the variables alphabetically. Default is \code{"data"}. Abbreviations allowed.}

\item{threshold}{numeric values at which to place vertical lines indicating a balance threshold.
These can make it easier to see for which variables balance has been achieved given a threshold. Multiple values can be supplied to add multiple lines. When \code{abs = FALSE}, the lines will be displayed on both sides of zero. The lines are drawn with \code{abline}, with the linetype (\code{lty}) argument corresponding to the order of the entered values (see options at \code{\link[=par]{par()}}). The default is \code{c(.1, .05)} for a solid line (\code{lty = 1}) at .1 and a dashed line (\code{lty = 2}) at .05, indicating acceptable and good balance, respectively. Enter a value as \code{NA} to skip that value of \code{lty} (e.g., \code{c(NA, .05)} to have only a dashed vertical line at .05).}

\item{position}{the position of the legend. Should be one of the allowed keyword options supplied to \code{x} in \code{\link[=legend]{legend()}} (e.g., \code{"right"}, \code{"bottomright"}, etc.). Default is \code{"bottomright"}. Set to \code{NULL} for no legend to be included. Note that the legend will cover up points if you are not careful; setting \code{var.order} appropriately can help in avoiding this.}

\item{\dots}{ignored.}
}
\value{
A plot is displayed, and \code{x} is invisibly returned.
}
\description{
Generates a Love plot, which is a dot plot with variable names on the y-axis and standardized mean differences on the x-axis. Each point represents the standardized mean difference of the corresponding covariate in the matched or unmatched sample. Love plots are a simple way to display covariate balance before and after matching. The plots are generated using \code{\link[=dotchart]{dotchart()}} and \code{\link[=points]{points()}}.
}
\details{
For matching methods other than subclassification, \code{plot.summary.matchit} uses \code{x$sum.all[,"Std. Mean Diff."]} and \code{x$sum.matched[,"Std. Mean Diff."]} as the x-axis values.
For subclassification, in addition to points for the unadjusted and aggregate subclass balance, numerals representing balance in individual subclasses are plotted if \code{subclass = TRUE} in the call to \code{summary}. Aggregate subclass standardized mean differences are taken from \code{x$sum.across[,"Std. Mean Diff."]} and the subclass-specific mean differences are taken from \code{x$sum.subclass}.
}
\examples{
data("lalonde")

m.out <- matchit(treat ~ age + educ + married +
                   race + re74,
                 data = lalonde,
                 method = "nearest")
plot(summary(m.out, interactions = TRUE),
     var.order = "unmatched")

s.out <- matchit(treat ~ age + educ + married + race +
                   nodegree + re74 + re75,
                 data = lalonde,
                 method = "subclass")
plot(summary(s.out, subclass = TRUE),
     var.order = "unmatched",
     abs = FALSE)
}
\seealso{ \code{\link[=summary.matchit]{summary.matchit()}}, \code{\link[=dotchart]{dotchart()}}

\pkgfun{cobalt}{love.plot} is a more flexible and sophisticated function to make Love plots and is also natively compatible with \code{matchit} objects. }

\author{ Noah Greifer }

MatchIt/DESCRIPTION

Package: MatchIt
Version: 4.7.1
Title: Nonparametric Preprocessing for Parametric Causal Inference
Description: Selects matched samples of the original treated and control groups with similar covariate distributions -- can be used to match exactly on covariates, to match on propensity scores, or perform a variety of other matching procedures. The package also implements a series of recommendations offered in Ho, Imai, King, and Stuart (2007) . (The 'gurobi' package, which is not on CRAN, is optional and comes with an installation of the Gurobi Optimizer, available at .)
Authors@R: c( person("Daniel", "Ho", email = "daniel.e.ho@gmail.com", role = c("aut"), comment = c(ORCID = "0000-0002-2195-5469")), person("Kosuke", "Imai", email = "imai@harvard.edu", role = c("aut"), comment = c(ORCID = "0000-0002-2748-1022")), person("Gary", "King", email = "king@harvard.edu", role = c("aut"), comment = c(ORCID = "0000-0002-5327-7631")), person("Elizabeth", "Stuart", email = "estuart@jhu.edu", role = c("aut"), comment = c(ORCID = "0000-0002-9042-8611")), person("Alex", "Whitworth", email = "whitworth.alex@gmail.com", role = c("ctb")), person("Noah", "Greifer", role = c("cre", "aut"), email = "noah.greifer@gmail.com", comment = c(ORCID="0000-0003-3067-7154")) ) Depends: R (>= 3.6.0) Imports: backports (>= 1.1.9), chk (>= 0.8.1), rlang (>= 1.1.0), Rcpp, utils, stats, graphics, grDevices Suggests: optmatch (>= 0.10.6), Matching, rgenoud, quickmatch (>= 0.2.1), nnet, rpart, mgcv, CBPS (>= 0.17), dbarts (>= 0.9-28), randomForest (>= 4.7-1), glmnet (>= 4.0), gbm (>= 2.1.7), cobalt (>= 4.2.3), boot, marginaleffects (>= 0.25.0), sandwich (>= 2.5-1), survival, RcppProgress (>= 0.4.2), highs, Rglpk, Rsymphony, gurobi, knitr, rmarkdown, testthat (>= 3.0.0) LinkingTo: Rcpp, RcppProgress Encoding: UTF-8 LazyData: true License: GPL (>= 2) URL: https://kosukeimai.github.io/MatchIt/, https://github.com/kosukeimai/MatchIt BugReports: https://github.com/kosukeimai/MatchIt/issues VignetteBuilder: knitr RoxygenNote: 7.3.2 Config/testthat/edition: 3 NeedsCompilation: yes Packaged: 2025-03-09 14:36:53 UTC; NoahGreifer Author: Daniel Ho [aut] (), Kosuke Imai [aut] (), Gary King [aut] (), Elizabeth Stuart [aut] (), Alex Whitworth [ctb], Noah Greifer [cre, aut] () Maintainer: Noah Greifer Repository: CRAN Date/Publication: 2025-03-10 00:20:02 UTC