polycor/0000755000176200001440000000000014151607547011753 5ustar liggesuserspolycor/NAMESPACE0000644000176200001440000000061613647327064013177 0ustar liggesusers# last modified 2020-04-20 by J. Fox importFrom("stats", "complete.cases", "cor", "dnorm", "na.omit", "optim", "optimise", "pchisq", "pnorm", "qnorm", "sd") importFrom("parallel", "detectCores") export(hetcor, polychor, polyserial) export(detectCores) S3method(hetcor, data.frame) S3method(hetcor, default) S3method(print, polycor) S3method(print, hetcor) S3method(as.matrix, hetcor) polycor/man/0000755000176200001440000000000014151522505012514 5ustar liggesuserspolycor/man/print.polycor.Rd0000644000176200001440000000216113650204752015631 0ustar liggesusers\name{print.polycor} \alias{print.polycor} \title{Print Method for polycor Objects} \description{ \code{print} method for objects of class \code{polycor}, produced by \code{polychor} and \code{polyserial}. } \usage{ \method{print}{polycor}(x, digits = max(3, getOption("digits") - 3), ...) } \arguments{ \item{x}{an object of class \code{polycor}, as returned by \code{polychor} or \code{polyserial}.} \item{digits}{number of significant digits to be printed.} \item{\dots}{not used.} } \value{ Invisibly returns \code{x}; used for its side effect --- i.e., printing. } \author{John Fox \email{jfox@mcmaster.ca}} \seealso{\code{\link{polychor}}, \code{\link{polyserial}}} \examples{ if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ x <- cut(x, c(-Inf, .75, Inf)) y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) print(polychor(x, y, ML=TRUE, std.err=TRUE), digits=3) # polychoric correlation, ML estimate } } \keyword{print} \keyword{methods} polycor/man/polyserial.Rd0000644000176200001440000001072113650204752015173 0ustar liggesusers\name{polyserial} \alias{polyserial} \title{Polyserial Correlation} \description{ Computes the polyserial correlation (and its standard error) between a quantitative variable and an ordinal variable, based on the assumption that the joint distribution of the quantitative variable and a latent continuous variable underlying the ordinal variable is bivariate normal. Either the maximum-likelihood estimator or a quicker ``two-step'' approximation is available. For the ML estimator the estimates of the thresholds and the covariance matrix of the estimates are also available. } \usage{ polyserial(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, bins=4, start) } \arguments{ \item{x}{a numerical variable.} \item{y}{an ordered categorical variable; can be numeric, logical, a factor, an ordered factor, or a character variables, but if a factor, its levels should be in proper order, and the values of a character variable are ordered alphabetically.} \item{ML}{if \code{TRUE}, compute the maximum-likelihood estimate; if \code{FALSE}, the default, compute a quicker ``two-step'' approximation.} \item{control}{optional arguments to be passed to the \code{optim} function.} \item{std.err}{if \code{TRUE}, return the estimated variance of the correlation (for the two-step estimator) or the estimated covariance matrix of the correlation and thresholds (for the ML estimator); the default is \code{FALSE}.} \item{maxcor}{maximum absolute correlation (to insure numerical stability).} \item{bins}{the number of bins into which to dissect \code{x} for a test of bivariate normality; the default is 4.} \item{start}{optional start value(s): if a single number, start value for the correlation; if a list with the elements \code{rho} and \code{thresholds}, start values for these parameters; start values are supplied automatically if omitted, and are only relevant when the ML estimator or standard errors are selected.} } \value{ If \code{std.err} is \code{TRUE}, returns an object of class \code{"polycor"} with the following components: \item{type}{set to \code{"polyserial"}.} \item{rho}{the polyserial correlation.} \item{cuts}{estimated thresholds for the ordinal variable (\code{y}), for the ML estimator.} \item{var}{the estimated variance of the correlation, or, for the ML estimator, the estimated covariance matrix of the correlation and thresholds.} \item{n}{the number of observations on which the correlation is based.} \item{chisq}{chi-square test for bivariate normality.} \item{df}{degrees of freedom for the test of bivariate normality.} \item{ML}{\code{TRUE} for the ML estimate, \code{FALSE} for the two-step estimate.} Othewise, returns the polyserial correlation. } \details{ The ML estimator is computed by maximizing the bivariate-normal likelihood with respect to the thresholds for \eqn{y} (\eqn{\tau^{y}_j, i = 1,\ldots, c - 1}{\tau^y[j], j = 1,\ldots, c - 1}) and the population correlation (\eqn{\rho}). The likelihood is maximized numerically using the \code{\link{optim}} function, and the covariance matrix of the estimated parameters is based on the numerical Hessian computed by \code{optim}. The two-step estimator is computed by first estimating the thresholds (\eqn{\tau^{y}_j, i = 1,\ldots, c - 1}{\tau^y[j], j = 1,\ldots, c - 1}) from the marginal distribution of \eqn{y}. Then if the standard error of \eqn{\hat{\rho}}{\rho hat} is requested, the one-dimensional likelihood for \eqn{\rho} is maximized numerically, using \code{\link{optim}} if standard errors are requested; the standard error computed treats the thresholds as fixed. If the standard error isn't request, \eqn{\hat{\rho}}{\rho hat} is computed directly. } \references{ Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68--74 in S. Kotz and N. Johnson, eds., \emph{The Encyclopedia of Statistics, Volume 7.} Wiley. } \author{John Fox \email{jfox@mcmaster.ca}} \seealso{\code{\link{hetcor}}, \code{\link{polychor}}, \code{\link{print.polycor}}, \code{\link[stats]{optim}}} \examples{ if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polyserial(x, y) # 2-step estimate } if(require(mvtnorm)){ polyserial(x, y, ML=TRUE, std.err=TRUE) # ML estimate } } \keyword{models} polycor/man/hetcor.Rd0000644000176200001440000001405014151520127014265 0ustar liggesusers\name{hetcor} \alias{hetcor} \alias{hetcor.default} \alias{hetcor.data.frame} \alias{print.hetcor} \alias{as.matrix.hetcor} \alias{detectCores} \title{Heterogeneous Correlation Matrix} \description{ \code{hetcor} computes a heterogenous correlation matrix, consisting of Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, and polychoric correlations between ordinal variables. The \code{\link[parallel]{detectCores}} function is imported from the \pkg{parallel} package and re-exported. } \usage{ hetcor(data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE)) \method{hetcor}{data.frame}(data, ML = FALSE, std.err = TRUE, use = c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE), ...) \method{hetcor}{default}(data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE)) \method{print}{hetcor}(x, digits = max(3, getOption("digits") - 3), ...) \method{as.matrix}{hetcor}(x, ...) detectCores(all.tests=FALSE, logical=TRUE) } \arguments{ \item{data}{a data frame consisting of factors, ordered factors, logical variables, character variables, and/or numeric variables, or the first of several variables.} \item{\dots}{variables and/or arguments to be passed down.} \item{ML}{if \code{TRUE}, compute maximum-likelihood estimates; if \code{FALSE}, compute quick two-step estimates.} \item{std.err}{if \code{TRUE}, compute standard errors.} \item{bins}{number of bins to use for continuous variables in testing bivariate normality; the default is 4.} \item{pd}{if \code{TRUE} and if the correlation matrix is not positive-definite, an attempt will be made to adjust it to a positive-definite matrix, using the \code{\link[Matrix]{nearPD}} function in the \code{Matrix} package. Note that default arguments to \code{nearPD} are used (except \code{corr=TRUE}); for more control call \code{nearPD} directly.} \item{parallel}{if \code{TRUE} (the default is \code{FALSE}), perform parallel computations on a computer with multiple CPUs/cores.} \item{ncores}{the number of cores to use for parallel computations; the default is the number of physical cores detected.} \item{use}{if \code{"complete.obs"}, remove observations with any missing data; if \code{"pairwise.complete.obs"}, compute each correlation using all observations with valid data for that pair of variables.} \item{x}{an object of class \code{"hetcor"} to be printed, or from which to extract the correlation matrix.} \item{digits}{number of significant digits.} \item{all.tests}{logical, apply all known tests; default is \code{FALSE}.} \item{logical}{if \code{TRUE}, detect logical CPUs/cores; if \code{FALSE}, detect physical CPUs/cores.} } \value{ \code{hetcor} returns an object of class \code{"hetcor"} with the following components: \item{correlations}{the correlation matrix.} \item{type}{the type of each correlation: \code{"Pearson"}, \code{"Polychoric"}, or \code{"Polyserial"}.} \item{std.errors}{the standard errors of the correlations, if requested.} \item{n}{the number (or numbers) of observations on which the correlations are based.} \item{tests}{p-values for tests of bivariate normality for each pair of variables.} \item{NA.method}{the method by which any missing data were handled: \code{"complete.obs"} or \code{"pairwise.complete.obs"}.} \item{ML}{\code{TRUE} for ML estimates, \code{FALSE} for two-step estimates.} } \author{John Fox \email{jfox@mcmaster.ca}} \note{Although the function reports standard errors for product-moment correlations, transformations (the most well known is Fisher's \emph{z}-transformation) are available that make the approach to asymptotic normality much more rapid.} \section{Warning}{ Be careful with character variables (as opposed to factors), the values of which are ordered alphabetically. Thus, e.g., the values \code{"disagree"}, \code{"neutral"}, \code{"agree"} are ordered \code{"agree"}, \code{"disagree"}, \code{"neutral"}. } \references{ Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68-74 in S. Kotz and N. Johnson, eds., \emph{The Encyclopedia of Statistics, Volume 7.} Wiley. Olsson, U. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. \emph{Psychometrika} \bold{44}, 443-460. Rodriguez, R.N. (1982) Correlation. Pp. 193-204 in S. Kotz and N. Johnson, eds., \emph{The Encyclopedia of Statistics, Volume 2.} Wiley. Ghosh, B.K. (1966) Asymptotic expansion for the moments of the distribution of correlation coefficient. \emph{Biometrika} \bold{53}, 258-262. Olkin, I., and Pratt, J.W. (1958) Unbiased estimation of certain correlation coefficients. \emph{Annals of Mathematical Statistics} \bold{29}, 201-211. } \seealso{\code{\link{polychor}}, \code{\link{polyserial}}, \code{\link[Matrix]{nearPD}}, \code{\link[parallel]{detectCores}}} \examples{ if(require(mvtnorm)){ set.seed(12345) R <- matrix(0, 4, 4) R[upper.tri(R)] <- runif(6) diag(R) <- 1 R <- cov2cor(t(R) \%*\% R) round(R, 4) # population correlations data <- rmvnorm(1000, rep(0, 4), R) round(cor(data), 4) # sample correlations } if(require(mvtnorm)){ x1 <- data[,1] x2 <- data[,2] y1 <- cut(data[,3], c(-Inf, .75, Inf)) y2 <- cut(data[,4], c(-Inf, -1, .5, 1.5, Inf)) data <- data.frame(x1, x2, y1, y2) hetcor(data) # Pearson, polychoric, and polyserial correlations, 2-step est. } if(require(mvtnorm)){ hetcor(x1, x2, y1, y2, ML=TRUE) # Pearson, polychoric, polyserial correlations, ML est. } \dontrun{ hc <- hetcor(data, ML=TRUE) # parallel computation: hc.m <- hetcor(data, ML=TRUE, parallel=TRUE, ncores=min(2, detectCores())) hc.m all.equal(hc, hc.m) } } \keyword{models} polycor/man/polychor.Rd0000644000176200001440000001154013650117202014640 0ustar liggesusers\name{polychor} \alias{polychor} \title{Polychoric Correlation} \description{ Computes the polychoric correlation (and its standard error) between two ordinal variables or from their contingency table, under the assumption that the ordinal variables dissect continuous latent variables that are bivariate normal. Either the maximum-likelihood estimator or a (possibly much) quicker ``two-step'' approximation is available. For the ML estimator, the estimates of the thresholds and the covariance matrix of the estimates are also available. } \usage{ polychor(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, start) } \arguments{ \item{x}{a contingency table of counts or an ordered categorical variable; the latter can be numeric, logical, a factor, an ordered factor, or a character variable, but if a factor, its levels should be in proper order, and the values of a character variable are ordered alphabetically.} \item{y}{if \code{x} is a variable, a second ordered categorical variable.} \item{ML}{if \code{TRUE}, compute the maximum-likelihood estimate; if \code{FALSE}, the default, compute a quicker ``two-step'' approximation.} \item{control}{optional arguments to be passed to the \code{optim} function.} \item{std.err}{if \code{TRUE}, return the estimated variance of the correlation (for the two-step estimator) or the estimated covariance matrix (for the ML estimator) of the correlation and thresholds; the default is \code{FALSE}.} \item{maxcor}{maximum absolute correlation (to insure numerical stability).} \item{start}{optional start value(s): if a single number, start value for the correlation; if a list with the elements \code{rho}, \code{row.thresholds}, and \code{column.thresholds}, start values for these parameters; start values are supplied automatically if omitted, and are only relevant when the ML estimator or standard errors are selected.}} \value{ If \code{std.err} is \code{TRUE}, returns an object of class \code{"polycor"} with the following components: \item{type}{set to \code{"polychoric"}.} \item{rho}{the polychoric correlation.} \item{row.cuts}{estimated thresholds for the row variable (\code{x}), for the ML estimate.} \item{col.cuts}{estimated thresholds for the column variable (\code{y}), for the ML estimate.} \item{var}{the estimated variance of the correlation, or, for the ML estimate, the estimated covariance matrix of the correlation and thresholds.} \item{n}{the number of observations on which the correlation is based.} \item{chisq}{chi-square test for bivariate normality.} \item{df}{degrees of freedom for the test of bivariate normality.} \item{ML}{\code{TRUE} for the ML estimate, \code{FALSE} for the two-step estimate.} Othewise, returns the polychoric correlation. } \details{ The ML estimator is computed by maximizing the bivariate-normal likelihood with respect to the thresholds for the two variables (\eqn{\tau^{x}_i, i = 1,\ldots, r - 1}{\tau^x[i], i = 1,\ldots, r - 1}; \eqn{\tau^{y}_j, j = 1,\ldots, c - 1}{\tau^y[j], j = 1,\ldots, c - 1}) and the population correlation (\eqn{\rho}). Here, \eqn{r} and \eqn{c} are respectively the number of levels of \eqn{x} and \eqn{y}. The likelihood is maximized numerically using the \code{\link{optim}} function, and the covariance matrix of the estimated parameters is based on the numerical Hessian computed by \code{optim}. The two-step estimator is computed by first estimating the thresholds (\eqn{\tau^{x}_i, i = 1,\ldots, r - 1}{\tau^x[i], i = 1,\ldots, r - 1} and \eqn{\tau^{y}_j, i = j,\ldots, c - 1}{\tau^y[j], i = j,\ldots, c - 1}) separately from the marginal distribution of each variable. Then the one-dimensional likelihood for \eqn{\rho} is maximized numerically, using \code{\link{optim}} if standard errors are requested, or \code{\link{optimise}} if they are not. The standard error computed treats the thresholds as fixed. } \references{ Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68--74 in S. Kotz and N. Johnson, eds., \emph{The Encyclopedia of Statistics, Volume 7.} Wiley. Olsson, U. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. \emph{Psychometrika} \bold{44}, 443-460. } \author{John Fox \email{jfox@mcmaster.ca}} \seealso{\code{\link{hetcor}}, \code{\link{polyserial}}, \code{\link{print.polycor}}, \code{\link[stats]{optim}}} \examples{ if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ x <- cut(x, c(-Inf, .75, Inf)) y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polychor(x, y) # 2-step estimate } if(require(mvtnorm)){ set.seed(12345) polychor(x, y, ML=TRUE, std.err=TRUE) # ML estimate } } \keyword{models} polycor/DESCRIPTION0000644000176200001440000000160514151607547013463 0ustar liggesusersPackage: polycor Version: 0.8-0 Date: 2021-11-30 Title: Polychoric and Polyserial Correlations Authors@R: person("John", "Fox", role = c("aut", "cre"), email = "jfox@mcmaster.ca") Depends: R (>= 3.3.0) Imports: stats, mvtnorm, Matrix, parallel ByteCompile: yes LazyLoad: yes Description: Computes polychoric and polyserial correlations by quick "two-step" methods or ML, optionally with standard errors; tetrachoric and biserial correlations are special cases. License: GPL (>= 2) URL: https://r-forge.r-project.org/projects/polycor/, https://CRAN.R-project.org/package=polycor Author: John Fox [aut, cre] Maintainer: John Fox Repository: CRAN Repository/R-Forge/Project: polycor Repository/R-Forge/Revision: 24 Repository/R-Forge/DateTimeStamp: 2021-11-30 22:06:15 Date/Publication: 2021-12-01 06:00:07 UTC NeedsCompilation: no Packaged: 2021-11-30 22:28:07 UTC; rforge polycor/NEWS0000644000176200001440000000222114151477343012446 0ustar liggesusersversion 0.8-0 o hetcor() can optionally use parallel computations (suggestion of Marc Segond). o polychor() and polyserial() now allow user-specified start values (suggestion of someone whose name is lost to antiquity, with my apologies). o cases where correlations can't be computed (e.g., because a factor as only one level in the data) are handled more gracefully. version 0.7-10 o Let hetcor() work with objects that inherit from "data.frame" (suggestion of Emil O. W. Kirkegaard). o Allow empty pairs when use="pairwise.complete.obs" for hetcor() (suggestion of Emil O. W. Kirkegaard). o Allow character variables in hetcor() but print message. version 0.7-9 o Further protection/warnings against inadmissible correlations. o Protect against out-of-order thresholds during optimization (suggestion of Evgeny Mirkes). o hetcor() doesn't die when an error is thrown computing a correlation, which is now set to NA. o hetcor() now uses Matrix::nearPD() to force a postitive-definite result rather than the deprecated sfsmisc::nearcor() (suggestion of Timothy Bates). o Improved clarity of documentation. o See the CHANGES file for prior versions. polycor/R/0000755000176200001440000000000014151522505012142 5ustar liggesuserspolycor/R/utilities.R0000644000176200001440000000232712760047562014315 0ustar liggesusers# last modified 29 Mar 07 by J. Fox binBvn <- function(rho, row.cuts, col.cuts, bins=4){ row.cuts <- if (missing(row.cuts)) c(-Inf, 1:(bins - 1)/bins, Inf) else c(-Inf, row.cuts, Inf) col.cuts <- if (missing(col.cuts)) c(-Inf, 1:(bins - 1)/bins, Inf) else c(-Inf, col.cuts, Inf) r <- length(row.cuts) - 1 c <- length(col.cuts) - 1 P <- matrix(0, r, c) R <- matrix(c(1, rho, rho, 1), 2, 2) for (i in 1:r){ for (j in 1:c){ P[i,j] <- mvtnorm::pmvnorm(lower=c(row.cuts[i], col.cuts[j]), upper=c(row.cuts[i+1], col.cuts[j+1]), corr=R) } } P } chisq <- function(x, y, rho, row.cuts, col.cuts, zerotol=1e-6, bins=4){ if (missing(row.cuts)) row.cuts <- qnorm(1:(bins - 1)/bins) if (missing(col.cuts)) col.cuts <- qnorm(1:(bins - 1)/bins) P <- binBvn(rho, row.cuts, col.cuts, bins=bins) if (!is.factor(x)) x <- cut(scale(x), c(-Inf, row.cuts, Inf)) if (!is.factor(y)) y <- cut(scale(y), c(-Inf, col.cuts, Inf)) tab <- table(x, y) n <- sum(tab) 2*sum(tab*log((tab + zerotol)/(P*n))) } as.matrix.hetcor <- function(x, ...) x$correlations polycor/R/polychor.R0000644000176200001440000001127413650117202014126 0ustar liggesusers# last modified 2020-04-22 by J. Fox polychor <- function (x, y, ML=FALSE, control=list(), std.err=FALSE, maxcor=.9999, start){ f <- function(pars) { if (length(pars) == 1){ rho <- pars if (abs(rho) > maxcor) rho <- sign(rho)*maxcor row.cuts <- rc col.cuts <- cc } else { rho <- pars[1] if (abs(rho) > maxcor) rho <- sign(rho)*maxcor row.cuts <- pars[2:r] col.cuts <- pars[(r+1):(r+c-1)] if (any(diff(row.cuts) < 0) || any(diff(col.cuts) < 0)) return(Inf) } P <- binBvn(rho, row.cuts, col.cuts) - sum(tab * log(P)) } tab <- if (missing(y)) x else table(x, y) zerorows <- apply(tab, 1, function(x) all(x == 0)) zerocols <- apply(tab, 2, function(x) all(x == 0)) zr <- sum(zerorows) if (0 < zr) warning(paste(zr, " row", suffix <- if(zr == 1) "" else "s", " with zero marginal", suffix," removed", sep="")) zc <- sum(zerocols) if (0 < zc) warning(paste(zc, " column", suffix <- if(zc == 1) "" else "s", " with zero marginal", suffix, " removed", sep="")) tab <- tab[!zerorows, ,drop=FALSE] tab <- tab[, !zerocols, drop=FALSE] r <- nrow(tab) c <- ncol(tab) if (r < 2) { warning("the table has fewer than 2 rows") return(NA) } if (c < 2) { warning("the table has fewer than 2 columns") return(NA) } n <- sum(tab) rc <- qnorm(cumsum(rowSums(tab))/n)[-r] cc <- qnorm(cumsum(colSums(tab))/n)[-c] if (!missing(start) && (ML || std.err)) { if (is.list(start)){ rho <- start$rho rc <- start$row.thresholds cc <- start$col.thresholds } else { rho <- start } if (!is.numeric(rho) || length(rho) != 1) stop("start value for rho must be a number") if (!is.numeric(rc) || length(rc) != r - 1) stop("start values for row thresholds must be ", r - 1, "numbers") if (!is.numeric(cc) || length(cc) != c - 1) stop("start values for column thresholds must be ", c - 1, "numbers") } if (ML) { result <- optim( c(if (missing(start)) optimise(f, interval=c(-1, 1))$minimum else rho, rc, cc), f, control=control, hessian=std.err ) if (result$par[1] > 1){ result$par[1] <- maxcor warning(paste("inadmissible correlation set to", maxcor)) } else if (result$par[1] < -1){ result$par[1] <- -maxcor warning(paste("inadmissible correlation set to -", maxcor, sep="")) } if (std.err) { chisq <- 2*(result$value + sum(tab * log((tab + 1e-6)/n))) df <- length(tab) - r - c result <- list(type="polychoric", rho=result$par[1], row.cuts=result$par[2:r], col.cuts=result$par[(r+1):(r+c-1)], var=solve(result$hessian), n=n, chisq=chisq, df=df, ML=TRUE) class(result) <- "polycor" return(result) } else return(as.vector(result$par[1])) } else if (std.err){ result <- optim(if (missing(start)) 0 else rho, f, control=control, hessian=TRUE, method="BFGS") if (result$par > 1){ result$par <- maxcor warning(paste("inadmissible correlation set to", maxcor)) } else if (result$par < -1){ result$par <- -maxcor warning(paste("inadmissible correlation set to -", maxcor, sep="")) } chisq <- 2*(result$value + sum(tab *log((tab + 1e-6)/n))) df <- length(tab) - r - c result <- list(type="polychoric", rho=result$par, var=1/result$hessian, n=n, chisq=chisq, df=df, ML=FALSE) class(result) <- "polycor" return(result) } else optimise(f, interval=c(-maxcor, maxcor))$minimum } polycor/R/print.polycor.R0000644000176200001440000000406711077472524015127 0ustar liggesusers# last modified 24 June 04 by J. Fox "print.polycor" <- function(x, digits = max(3, getOption("digits") - 3), ...){ if (x$type == "polychoric"){ se <- sqrt(diag(x$var)) se.rho <- se[1] est <- if (x$ML) "ML est." else "2-step est." cat("\nPolychoric Correlation, ", est, " = ", signif(x$rho, digits), " (", signif(se.rho, digits), ")", sep="") if (x$df > 0) cat("\nTest of bivariate normality: Chisquare = ", signif(x$chisq, digits), ", df = ", x$df, ", p = ", signif(pchisq(x$chisq, x$df, lower.tail=FALSE), digits), "\n", sep="") else cat("\n") r <- length(x$row.cuts) c <- length(x$col.cuts) if (r == 0) return(invisible(x)) row.cuts.se <- se[2:(r+1)] col.cuts.se <- se[(r+2):(r+c+1)] rowThresh <- signif(cbind(x$row.cuts, row.cuts.se), digits) if (r > 1) cat("\n Row Thresholds\n") else cat("\n Row Threshold\n") colnames(rowThresh) <- c("Threshold", "Std.Err.") rownames(rowThresh) <- if (r > 1) 1:r else " " print(rowThresh) colThresh <- signif(cbind(x$col.cuts, col.cuts.se), digits) if (c > 1) cat("\n\n Column Thresholds\n") else cat("\n\n Column Threshold\n") colnames(colThresh) <- c("Threshold", "Std.Err.") rownames(colThresh) <- if (c > 1) 1:c else " " print(colThresh) } else if (x$type == "polyserial"){ se <- sqrt(diag(x$var)) se.rho <- se[1] est <- if (x$ML) "ML est." else "2-step est." cat("\nPolyserial Correlation, ", est, " = ", signif(x$rho, digits), " (", signif(se.rho, digits), ")", sep="") cat("\nTest of bivariate normality: Chisquare = ", signif(x$chisq, digits), ", df = ", x$df, ", p = ", signif(pchisq(x$chisq, x$df, lower.tail=FALSE), digits), "\n\n", sep="") if (length(se) == 1) return(invisible(x)) cuts.se <- se[-1] thresh <- signif(rbind(x$cuts, cuts.se), digits) colnames(thresh) <- 1:length(x$cuts) rownames(thresh) <- c("Threshold", "Std.Err.") print(thresh) } else print(unclass(x)) invisible(x) } polycor/R/hetcor.R0000644000176200001440000000037613647327064013573 0ustar liggesusers# last modified 2020-04-20 by J. Fox hetcor <- function(data, ..., ML=FALSE, std.err=TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE)){ UseMethod("hetcor") } polycor/R/polyserial.R0000644000176200001440000000540413650117202014450 0ustar liggesusers# last modified 2020-04-22 by J. Fox polyserial <- function(x, y, ML=FALSE, control=list(), std.err=FALSE, maxcor=.9999, bins=4, start){ f <- function(pars){ rho <- pars[1] if (abs(rho) > maxcor) rho <- sign(rho)*maxcor cts <- if (length(pars) == 1) c(-Inf, cuts, Inf) else c(-Inf, pars[-1], Inf) if (any(diff(cts) < 0)) return(Inf) tau <- (matrix(cts, n, s+1, byrow=TRUE) - matrix(rho*z, n, s+1))/ sqrt(1 - rho^2) - sum(log(dnorm(z)*(pnorm(tau[cbind(indices, y+1)]) - pnorm(tau[cbind(indices, y)])))) } if (!is.numeric(x)) stop("x must be numeric") valid <- complete.cases(x, y) x <- x[valid] y <- y[valid] z <- scale(x) tab <- table(y) n <- sum(tab) s <- length(tab) if (s < 2) { warning("y has fewer than 2 levels") return(NA) } if (sum(0 != tab) < 2){ warning("y has fewer than 2 levels with data") return(NA) } indices <- 1:n cuts <- qnorm(cumsum(tab)/n)[-s] y <- as.numeric(as.factor(y)) rho <- sqrt((n - 1)/n)*sd(y)*cor(x, y)/sum(dnorm(cuts)) if (!missing(start) && (ML || std.err)) { if (is.list(start)){ rho <- start$rho cuts <- start$thresholds } else { rho <- start } if (!is.numeric(rho) || length(rho) != 1) stop("start value for rho must be a number") if (!is.numeric(cuts) || length(cuts) != s - 1) stop("start values for thresholds must be ", s - 1, "numbers") } if (abs(rho) > maxcor) { warning("initial correlation inadmissible, ", rho, ", set to ", sign(rho)*maxcor) rho <- sign(rho)*maxcor } if (ML) { result <- optim(c(rho, cuts), f, control=control, hessian=std.err) if (result$par[1] > 1){ result$par[1] <- maxcor warning(paste("inadmissible correlation set to", maxcor)) } else if (result$par[1] < -1){ result$par[1] <- -maxcor warning(paste("inadmissible correlation set to -", maxcor, sep="")) } if (std.err){ chisq <- chisq(y, z, result$par[1], result$par[-1], bins=bins) df <- s*bins - s - 1 result <- list(type="polyserial", rho=result$par[1], cuts=result$par[-1], var=solve(result$hessian), n=n, chisq=chisq, df=df, ML=TRUE) class(result) <- "polycor" return(result) } else return(as.vector(result$par[1])) } else if (std.err){ result <- optim(rho, f, control=control, hessian=TRUE, method="BFGS") if (result$par > 1){ result$par <- maxcor warning(paste("inadmissible correlation set to", maxcor)) } else if (result$par < -1){ result$par <- -maxcor warning(paste("inadmissible correlation set to -", maxcor, sep="")) } chisq <- chisq(y, z, rho, cuts, bins=bins) df <- s*bins - s - 1 result <- list(type="polyserial", rho=result$par, var=1/result$hessian, n=n, chisq=chisq, df=df, ML=FALSE) class(result) <- "polycor" return(result) } else return(rho) } polycor/R/hetcor.default.R0000644000176200001440000000072313647327064015212 0ustar liggesusers# last modified 2020-04-19 by J. Fox hetcor.default <- function (data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins = 4, pd = TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE)) { use <- match.arg(use) dframe <- data.frame(data, ...) if (!missing(...)) names(dframe)[1] <- deparse(substitute(data)) hetcor(dframe, ML = ML, std.err = std.err, use=use, bins = bins, pd = pd, ncores=ncores) } polycor/R/print.hetcor.R0000644000176200001440000000227211077472524014720 0ustar liggesusers# last modified 12 Dec 04 by J. Fox "print.hetcor" <- function(x, digits = max(3, getOption("digits") - 3), ...){ R <- signif(x$correlations, digits=digits) R[upper.tri(R)] <- x$type[upper.tri(R)] R <- as.data.frame(R) if (x$ML) cat("\nMaximum-Likelihood Estimates\n") else cat("\nTwo-Step Estimates\n") cat("\nCorrelations/Type of Correlation:\n") print(R) if (!is.null(x$std.errors)){ SE <- signif(x$std.errors, digits) diag(SE) <- "" if (x$NA.method == "complete.obs"){ SE[upper.tri(SE)] <- "" cat("\nStandard Errors:\n") SE <- as.data.frame(SE) print(SE[,-ncol(SE)]) cat(paste("\nn =", x$n, "\n")) } else { SE[upper.tri(SE)] <- x$n[upper.tri(SE)] diag(SE) <- diag(x$n) SE <- as.data.frame(SE) cat("\nStandard Errors/Numbers of Observations:\n") print(SE) } if (!all(is.na(x$tests[lower.tri(x$tests)]))){ Test <- signif(x$tests, digits) Test[upper.tri(Test)] <- "" diag(Test) <- "" Test <- as.data.frame(Test) cat("\nP-values for Tests of Bivariate Normality:\n") print(Test[,-ncol(Test)]) } } invisible(x) } polycor/R/hetcor.data.frame.R0000644000176200001440000002753614151477343015600 0ustar liggesusers# # last modified 2021-11-30 by J. Fox hetcor.data.frame <- function(data, ML=FALSE, std.err=TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, parallel=FALSE, ncores=detectCores(logical=FALSE), ...){ se.r <- function(r, n){ rho <- r*(1 + (1 - r^2)/(2*(n - 3))) # approx. unbiased estimator v <- (((1 - rho^2)^2)/(n + 6))*(1 + (14 + 11*rho^2)/(2*(n + 6))) sqrt(v) } computeCor <- function(pair){ type <- "" se <- NA test <- NA i <- rows[pair] j <- cols[pair] x <- data[, i] y <- data[, j] n <- sum(complete.cases(x, y)) if (n == 0) { test <- r <- se <- NA warning("no cases for pair ", j, ", ", i) } if (inherits(x, c("numeric", "integer")) && inherits(y, c("numeric", "integer"))) { r <- cor(x, y, use="complete.obs") type <- "Pearson" if (std.err) { se <- se.r(r, n) test <- pchisq(chisq(x, y, r, bins=bins), bins^2 - 2, lower.tail=FALSE) } } else if (inherits(x, c("factor", "logical", "character")) && inherits(y, c("factor", "logical", "character"))) { type <- "Polychoric" result <- try(polychor(x, y, ML=ML, std.err=std.err), silent=TRUE) error <- inherits(result, "try-error") if (error){ warning("could not compute polychoric correlation between variables ", i, " and ", j, "\n Message: ", result, "\n") result <- NA } if (std.err && !error && !(length(result) == 1 && is.na(result))){ r <- result$rho se <- sqrt(result$var[1,1]) test <- if (result$df > 0) pchisq(result$chisq, result$df, lower.tail=FALSE) else NA } else { r <- result test <- se <- NA } } else { if (inherits(x, c("factor", "logical", "character")) && inherits(y, c("numeric", "integer"))) result <- try(polyserial(y, x, ML=ML, std.err=std.err, bins=bins), silent=TRUE) else if (inherits(x, c("numeric", "integer")) && inherits(y, c("factor", "logical", "character"))) result <- try(polyserial(x, y, ML=ML, std.err=std.err, bins=bins), silent=TRUE) else { stop("columns must be numeric, factors, logical, or character.") } type <- "Polyserial" error <- inherits(result, "try-error") if (error){ warning("could not compute polyserial correlation between variables ", i, " and ", j, "\n Message: ", result, "\n") result <- NA } if (std.err && !error && !(length(result) == 1 && is.na(result))){ r <- result$rho se <- sqrt(result$var[1,1]) test <- pchisq(result$chisq, result$df, lower.tail=FALSE) } else { r <- result se <- test <- NA } } list(n=n, r=r, Type=type, SE=se, Test=test) } if (any(sapply(data, function(x) inherits(x, "character")))){ message("data contain one or more character variables", "\nthe values of which are ordered alphabetically") } use <- match.arg(use) if (use == "complete.obs") { data <- na.omit(data) n <- nrow(data) } p <- length(data) if (p < 2) stop("fewer than 2 variables.") R <- matrix(1, p, p) Type <- matrix("", p, p) SE <- matrix(0, p, p) N <- matrix(0, p, p) Test <- matrix(0, p, p) diag(N) <- if (use == "complete.obs") nrow(data) else sapply(data, function(x) sum(!is.na(x))) if (all(diag(N) == 0)) stop("no non-missing cases") npairs <- p*(p -1)/2 rows <- matrix(1:p, p, p) cols <- t(rows) rows <- rows[lower.tri(rows)] cols <- cols[lower.tri(cols)] result <- if (parallel && ncores > 1){ message("Note: using a cluster of ", ncores, " cores") cl <- parallel::makeCluster(ncores) on.exit(parallel::stopCluster(cl)) parallel::clusterApply(cl, 1:npairs, computeCor) } else { lapply(1:npairs, computeCor) } for (pair in 1:npairs){ i <- rows[pair] j <- cols[pair] res <- result[[pair]] N[i, j] <- N[j, i] <- res$n R[i, j] <- R[j, i] <- res$r Type[i, j] <- Type[j, i] <- res$Type SE[i, j] <- SE[j, i] <- res$SE Test[i, j] <- Test[j, i] <- res$Test } if (pd && !any(is.na(R)) && min(eigen(R, only.values=TRUE)$values) < 0){ cor <- Matrix::nearPD(R, corr=TRUE) if (!cor$converged) warning("attempt to make correlation matrix positive-definite failed") else warning("the correlation matrix has been adjusted to make it positive-definite") R <- as.matrix(cor$mat) } rownames(R) <- colnames(R) <- names(data) result <- list(correlations=R, type=Type, NA.method=use, ML=ML) if (std.err) { rownames(SE) <- colnames(SE) <- names(data) rownames(N) <- colnames(N) <- names(N) rownames(Test) <- colnames(Test) <- names(data) result$std.errors <- SE result$n <- if (use == "complete.obs") n else N result$tests <- Test } if (0 < (nNA <- sum(is.na(R[lower.tri(R)])))){ warning(nNA, if (nNA == 1) " correlation" else " correlations", " couldn't be computed and", if (nNA == 1) " is" else " are", " NA") } class(result) <- "hetcor" result } # "hetcor.data.frame" <- # function(data, ML=FALSE, std.err=TRUE, use=c("complete.obs", "pairwise.complete.obs"), # bins=4, pd=TRUE, ...){ # se.r <- function(r, n){ # rho <- r*(1 + (1 - r^2)/(2*(n - 3))) # approx. unbiased estimator # v <- (((1 - rho^2)^2)/(n + 6))*(1 + (14 + 11*rho^2)/(2*(n + 6))) # sqrt(v) # } # if (any(sapply(data, function(x) inherits(x, "character")))){ # message("data contain one or more character variables", # "\nthe values of which are ordered alphabetically") # } # use <- match.arg(use) # if (use == "complete.obs") data <- na.omit(data) # p <- length(data) # if (p < 2) stop("fewer than 2 variables.") # R <- matrix(1, p, p) # Type <- matrix("", p, p) # SE <- matrix(0, p, p) # N <- matrix(0, p, p) # Test <- matrix(0, p, p) # diag(N) <- if (use == "complete.obs") nrow(data) # else sapply(data, function(x) sum(!is.na(x))) # if (all(diag(N) == 0)) stop("no non-missing cases") # for (i in 2:p) { # for (j in 1:(i-1)){ # x <- data[[i]] # y <- data[[j]] # n <- sum(complete.cases(x, y)) # if (n == 0) { # Test[i, j] <- Test[j, i] <- R[i, j] <- R[j, i] <- SE[i, j] <- SE[j, i] <- NA # N[i, j] <- N[j, i] <- 0 # warning("no cases for pair ", j, ", ", i) # next # } # if (inherits(x, c("numeric", "integer")) && inherits(y, c("numeric", "integer"))) { # r <- cor(x, y, use="complete.obs") # Type[i, j] <- Type[j, i] <- "Pearson" # R[i, j] <- R[j, i] <- r # if (std.err) { # SE[i, j] <- SE[j, i] <- se.r(r, n) # N[i, j] <- N[j, i] <- n # Test[i, j] <- pchisq(chisq(x, y, r, bins=bins), bins^2 - 2, lower.tail=FALSE) # } # } # else if (inherits(x, c("factor", "logical", "character")) && # inherits(y, c("factor", "logical", "character"))) { # Type[i, j] <- Type[j, i] <- "Polychoric" # result <- try(polychor(x, y, ML=ML, std.err=std.err), silent=TRUE) # error <- inherits(result, "try-error") # if (error){ # warning("could not compute polychoric correlation between variables ", i, " and ", j, # "\n Message: ", result, "\n") # result <- NA # } # if (std.err && !error){ # R[i, j] <- R[j, i] <- result$rho # SE[i, j] <- SE[j, i] <- sqrt(result$var[1,1]) # N[i, j] <- N[j, i] <- n # Test[i, j] <- if (result$df > 0) # pchisq(result$chisq, result$df, lower.tail=FALSE) # else NA # } # else R[i, j] <- R[j, i] <- result # } # else { # if (inherits(x, c("factor", "logical", "character")) && # inherits(y, c("numeric", "integer"))) # result <- try(polyserial(y, x, ML=ML, std.err=std.err, bins=bins), silent=TRUE) # else if (inherits(x, c("numeric", "integer")) && # inherits(y, c("factor", "logical", "character"))) # result <- try(polyserial(x, y, ML=ML, std.err=std.err, bins=bins), silent=TRUE) # else { # stop("columns must be numeric, factors, logical, or character.") # } # Type[i, j] <- Type[j, i] <- "Polyserial" # error <- inherits(result, "try-error") # if (error){ # warning("could not compute polyserial correlation between variables ", i, " and ", j, # "\n Message: ", result, "\n") # result <- NA # } # if (std.err && !error){ # R[i, j] <- R[j, i] <- result$rho # SE[i, j] <- SE[j, i] <- sqrt(result$var[1,1]) # N[i, j] <- N[j, i] <- n # Test[i, j] <- pchisq(result$chisq, result$df, lower.tail=FALSE) # } # else R[i, j] <- R[j, i] <- result # } # } # } # if (pd && !any(is.na(R)) && min(eigen(R, only.values=TRUE)$values) < 0){ # cor <- Matrix::nearPD(R, corr=TRUE) # if (!cor$converged) warning("attempt to make correlation matrix positive-definite failed") # else warning("the correlation matrix has been adjusted to make it positive-definite") # R <- as.matrix(cor$mat) # } # rownames(R) <- colnames(R) <- names(data) # result <- list(correlations=R, type=Type, NA.method=use, ML=ML) # if (std.err) { # rownames(SE) <- colnames(SE) <- names(data) # rownames(N) <- colnames(N) <- names(N) # rownames(Test) <- colnames(Test) <- names(data) # result$std.errors <- SE # result$n <- if (use == "complete.obs") n else N # result$tests <- Test # } # class(result) <- "hetcor" # result # } polycor/MD50000644000176200001440000000141614151607547012265 0ustar liggesusers26e2b372feb75f3e50249bcd2047d6f8 *DESCRIPTION 691e1f6585f2d07cdbe76c1c5ff0ccca *NAMESPACE 3be45129d5e11ae532e663441ff960e1 *NEWS a39b21774e8f2ff64b2c9e1859f8d286 *R/hetcor.R c2adabbdd372617af490356a5a9b6437 *R/hetcor.data.frame.R a22daf1ad117fe0538df505705bf8ecc *R/hetcor.default.R bd099a4b959dc4b72e6bbd39782420fe *R/polychor.R 7bf6182ca40e94caf32e7379527eff6b *R/polyserial.R 3dd639b4ff9c10632f0c3ef8ffa0d043 *R/print.hetcor.R 7c4b1f775fc5aa944fe83e068524e6c5 *R/print.polycor.R 9a57a3f34449a1270944719c42ca7079 *R/utilities.R 4bd45f5ccc3154f479a7db406681015d *inst/CHANGES ef0a4910f51a861354673dc56952ac54 *man/hetcor.Rd abec2bda968a3dfb9386f4865471061e *man/polychor.Rd 52357e89045479f6fec9cc5027a149b9 *man/polyserial.Rd 289a8f6a6c2d0e11053f81d0324402a7 *man/print.polycor.Rd polycor/inst/0000755000176200001440000000000014151522505012716 5ustar liggesuserspolycor/inst/CHANGES0000644000176200001440000000402712760047562013725 0ustar liggesusersversion 0.5: original version on CRAN version 0.6-0 o added approximate standard errors for quick 2-step estimates. o corrected bug constraining quick estimates in polychor() to 0,1 rather than -1,1. o constraints on correlations (default to -.9999,.9999) for numeric stability. o several small changes. version 0.7-0 o added tests of bivariate normality. o added as.matrix.hetcor(). o fixed bug that caused use="pairwise.complete.obs" in hetcor() to fail. o cleaned up code. version 0.7-1 o suppressed test of bivariate normality in 2 x 2 (tetrachoric) case (where there are 0 df for the test). o examples, which use random data, now set the random-number-generator seed. o small changes to printed output. version 0.7-2 o polychor() and polyserial() now report an error when a categorical variable has just one level (problem pointed out by David Barron). version 0.7-3 o as.matrix.hetcor() now has a second argument, ..., to conform to the as.matrix() generic in R 2.5.0. version 0.7-4 o hetcor() uses the nearcor() function in sfsmisc to insure that the correlation matrix returned is positive-definite. o small changes. version 0.7-5 o polychor() and polyserial() now return NA and report a warning when a factor is invariant (after problem report by Jose Quesada); in polychor(), empty factor levels are automatically removed. version 0.7-6 o polyserial() and polychor() now make sure that final correlations are in the range [-1, 1], reporting a warning when inadmissible values are coerced to -1 or 1 (correcting a problem reported by Marie Loh). version 0.7-7 o hetcor() avoids an incorrect name for the first variable when it's given a matrix as input, rather than a data frame or individual variables, as the documentation indicates (preventing a consequent renaming error reported by Iuri Gavronski). o Small fix to hetcor.Rd. version 0.7-8 o hetcor() now accepts logical variables, treated as two-level factors (suggested by Soren Vogel).