polycor/0000755000176200001440000000000013522074002011734 5ustar liggesuserspolycor/NAMESPACE0000644000176200001440000000053412757604437013201 0ustar liggesusers# last modified 2016-08-25 by J. Fox importFrom("stats", "complete.cases", "cor", "dnorm", "na.omit", "optim", "optimise", "pchisq", "pnorm", "qnorm", "sd") export(hetcor, polychor, polyserial) S3method(hetcor, data.frame) S3method(hetcor, default) S3method(print, polycor) S3method(print, hetcor) S3method(as.matrix, hetcor) polycor/man/0000755000176200001440000000000013522061060012507 5ustar liggesuserspolycor/man/print.polycor.Rd0000644000176200001440000000220712760060024015623 0ustar liggesusers\name{print.polycor} \alias{print.polycor} \title{Print Method for polycor Objects} \description{ \code{print} method for objects of class \code{polycor}, produced by \code{polychor} and \code{polyserial}. } \usage{ \method{print}{polycor}(x, digits = max(3, getOption("digits") - 3), ...) } \arguments{ \item{x}{an object of class \code{polycor}, as returned by \code{polychor} or \code{polyserial}.} \item{digits}{number of significant digits to be printed.} \item{\dots}{not used.} } \value{ Invisibly returns \code{x}; used for its side effect --- i.e., printing. } \author{John Fox \email{jfox@mcmaster.ca}} \seealso{\code{\link{polychor}}, \code{\link{polyserial}}} \examples{ if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ x <- cut(x, c(-Inf, .75, Inf)) y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polychor(x, y, ML=TRUE, std.err=TRUE) # polychoric correlation, ML estimate } } \keyword{print} \keyword{methods} polycor/man/polyserial.Rd0000644000176200001440000001020713522061060015161 0ustar liggesusers\name{polyserial} \alias{polyserial} \title{Polyserial Correlation} \description{ Computes the polyserial correlation (and its standard error) between a quantitative variable and an ordinal variables, based on the assumption that the joint distribution of the quantitative variable and a latent continuous variable underlying the ordinal variable is bivariate normal. Either the maximum-likelihood estimator or a quicker ``two-step'' approximation is available. For the ML estimator the estimates of the thresholds and the covariance matrix of the estimates are also available. } \usage{ polyserial(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, bins=4) } \arguments{ \item{x}{a numerical variable.} \item{y}{an ordered categorical variable; can be numeric, logical, a factor, an ordered factor, or a character variables, but if a factor, its levels should be in proper order, and the values of a character variable are ordered alphabetically.} \item{ML}{if \code{TRUE}, compute the maximum-likelihood estimate; if \code{FALSE}, the default, compute a quicker ``two-step'' approximation.} \item{control}{optional arguments to be passed to the \code{optim} function.} \item{std.err}{if \code{TRUE}, return the estimated variance of the correlation (for the two-step estimator) or the estimated covariance matrix of the correlation and thresholds (for the ML estimator); the default is \code{FALSE}.} \item{maxcor}{maximum absolute correlation (to insure numerical stability).} \item{bins}{the number of bins into which to dissect \code{x} for a test of bivariate normality; the default is 4.} } \value{ If \code{std.err} is \code{TRUE}, returns an object of class \code{"polycor"} with the following components: \item{type}{set to \code{"polyserial"}.} \item{rho}{the polyserial correlation.} \item{cuts}{estimated thresholds for the ordinal variable (\code{y}), for the ML estimator.} \item{var}{the estimated variance of the correlation, or, for the ML estimator, \ the estimated covariance matrix of the correlation and thresholds.} \item{n}{the number of observations on which the correlation is based.} \item{chisq}{chi-square test for bivariate normality.} \item{df}{degrees of freedom for the test of bivariate normality.} \item{ML}{\code{TRUE} for the ML estimate, \code{FALSE} for the two-step estimate.} Othewise, returns the polyserial correlation. } \details{ The ML estimator is computed by maximizing the bivariate-normal likelihood with respect to the thresholds for \eqn{y} (\eqn{\tau^{y}_j, i = 1,\ldots, c - 1}{\tau^y[j], j = 1,\ldots, c - 1}) and the population correlation (\eqn{\rho}). The likelihood is maximized numerically using the \code{\link{optim}} function, and the covariance matrix of the estimated parameters is based on the numerical Hessian computed by \code{optim}. The two-step estimator is computed by first estimating the thresholds (\eqn{\tau^{y}_j, i = 1,\ldots, c - 1}{\tau^y[j], j = 1,\ldots, c - 1}) from the marginal distribution of \eqn{y}. Then if the standard error of \eqn{\hat{\rho}}{\rho hat} is requested, the one-dimensional likelihood for \eqn{\rho} is maximized numerically, using \code{\link{optim}} if standard errors are requested; the standard error computed treats the thresholds as fixed. If the standard error isn't request, \eqn{\hat{\rho}}{\rho hat} is computed directly. } \references{ Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68--74 in S. Kotz and N. Johnson, eds., \emph{The Encyclopedia of Statistics, Volume 7.} Wiley. } \author{John Fox \email{jfox@mcmaster.ca}} \seealso{\code{\link{hetcor}}, \code{\link{polychor}}, \code{\link{print.polycor}}, \code{\link[stats]{optim}}} \examples{ if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polyserial(x, y) # 2-step estimate } if(require(mvtnorm)){ polyserial(x, y, ML=TRUE, std.err=TRUE) # ML estimate } } \keyword{models} polycor/man/hetcor.Rd0000644000176200001440000001200513522061060014260 0ustar liggesusers\name{hetcor} \alias{hetcor} \alias{hetcor.default} \alias{hetcor.data.frame} \alias{print.hetcor} \alias{as.matrix.hetcor} \title{Heterogeneous Correlation Matrix} \description{ Computes a heterogenous correlation matrix, consisting of Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, and polychoric correlations between ordinal variables. } \usage{ hetcor(data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE) \method{hetcor}{data.frame}(data, ML = FALSE, std.err = TRUE, use = c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, ...) \method{hetcor}{default}(data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE) \method{print}{hetcor}(x, digits = max(3, getOption("digits") - 3), ...) \method{as.matrix}{hetcor}(x, ...) } \arguments{ \item{data}{a data frame consisting of factors, ordered factors, logical variables, character variables, and/or numeric variables, or the first of several variables.} \item{\dots}{variables and/or arguments to be passed down.} \item{ML}{if \code{TRUE}, compute maximum-likelihood estimates; if \code{FALSE}, compute quick two-step estimates.} \item{std.err}{if \code{TRUE}, compute standard errors.} \item{bins}{number of bins to use for continuous variables in testing bivariate normality; the default is 4.} \item{pd}{if \code{TRUE} and if the correlation matrix is not positive-definite, an attempt will be made to adjust it to a positive-definite matrix, using the \code{\link[Matrix]{nearPD}} function in the \code{Matrix} package. Note that default arguments to \code{nearPD} are used (except \code{corr=TRUE}); for more control call \code{nearPD} directly.} \item{use}{if \code{"complete.obs"}, remove observations with any missing data; if \code{"pairwise.complete.obs"}, compute each correlation using all observations with valid data for that pair of variables.} \item{x}{an object of class \code{"hetcor"} to be printed, or from which to extract the correlation matrix.} \item{digits}{number of significant digits.} } \value{ Returns an object of class \code{"hetcor"} with the following components: \item{correlations}{the correlation matrix.} \item{type}{the type of each correlation: \code{"Pearson"}, \code{"Polychoric"}, or \code{"Polyserial"}.} \item{std.errors}{the standard errors of the correlations, if requested.} \item{n}{the number (or numbers) of observations on which the correlations are based.} \item{tests}{p-values for tests of bivariate normality for each pair of variables.} \item{NA.method}{the method by which any missing data were handled: \code{"complete.obs"} or \code{"pairwise.complete.obs"}.} \item{ML}{\code{TRUE} for ML estimates, \code{FALSE} for two-step estimates.} } \author{John Fox \email{jfox@mcmaster.ca}} \note{Although the function reports standard errors for product-moment correlations, transformations (the most well known is Fisher's \emph{z}-transformation) are available that make the approach to asymptotic normality much more rapid.} \section{Warning}{ Be careful with character variables (as opposed to factors), the values of which are ordered alphabetically. Thus, e.g., the values \code{"disagree"}, \code{"neutral"}, \code{"agree"} are ordered \code{"agree"}, \code{"disagree"}, \code{"neutral"}. } \references{ Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68-74 in S. Kotz and N. Johnson, eds., \emph{The Encyclopedia of Statistics, Volume 7.} Wiley. Olsson, U. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. \emph{Psychometrika} \bold{44}, 443-460. Rodriguez, R.N. (1982) Correlation. Pp. 193-204 in S. Kotz and N. Johnson, eds., \emph{The Encyclopedia of Statistics, Volume 2.} Wiley. Ghosh, B.K. (1966) Asymptotic expansion for the moments of the distribution of correlation coefficient. \emph{Biometrika} \bold{53}, 258-262. Olkin, I., and Pratt, J.W. (1958) Unbiased estimation of certain correlation coefficients. \emph{Annals of Mathematical Statistics} \bold{29}, 201-211. } \seealso{\code{\link{polychor}}, \code{\link{polyserial}}, \code{\link[Matrix]{nearPD}}} \examples{ if(require(mvtnorm)){ set.seed(12345) R <- matrix(0, 4, 4) R[upper.tri(R)] <- runif(6) diag(R) <- 1 R <- cov2cor(t(R) \%*\% R) round(R, 4) # population correlations data <- rmvnorm(1000, rep(0, 4), R) round(cor(data), 4) # sample correlations } if(require(mvtnorm)){ x1 <- data[,1] x2 <- data[,2] y1 <- cut(data[,3], c(-Inf, .75, Inf)) y2 <- cut(data[,4], c(-Inf, -1, .5, 1.5, Inf)) data <- data.frame(x1, x2, y1, y2) hetcor(data) # Pearson, polychoric, and polyserial correlations, 2-step est. } if(require(mvtnorm)){ hetcor(x1, x2, y1, y2, ML=TRUE) # Pearson, polychoric, polyserial correlations, ML est. } } \keyword{models} polycor/man/polychor.Rd0000644000176200001440000001076613522061060014647 0ustar liggesusers\name{polychor} \alias{polychor} \title{Polychoric Correlation} \description{ Computes the polychoric correlation (and its standard error) between two ordinal variables or from their contingency table, under the assumption that the ordinal variables dissect continuous latent variables that are bivariate normal. Either the maximum-likelihood estimator or a (possibly much) quicker ``two-step'' approximation is available. For the ML estimator, the estimates of the thresholds and the covariance matrix of the estimates are also available. } \usage{ polychor(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999) } \arguments{ \item{x}{a contingency table of counts or an ordered categorical variable; the latter can be numeric, logical, a factor, an ordered factor, or a character variable, but if a factor, its levels should be in proper order, and the values of a character variable are ordered alphabetically.} \item{y}{if \code{x} is a variable, a second ordered categorical variable.} \item{ML}{if \code{TRUE}, compute the maximum-likelihood estimate; if \code{FALSE}, the default, compute a quicker ``two-step'' approximation.} \item{control}{optional arguments to be passed to the \code{optim} function.} \item{std.err}{if \code{TRUE}, return the estimated variance of the correlation (for the two-step estimator) or the estimated covariance matrix (for the ML estimator) of the correlation and thresholds; the default is \code{FALSE}.} \item{maxcor}{maximum absolute correlation (to insure numerical stability).} } \value{ If \code{std.err} is \code{TRUE}, returns an object of class \code{"polycor"} with the following components: \item{type}{set to \code{"polychoric"}.} \item{rho}{the polychoric correlation.} \item{row.cuts}{estimated thresholds for the row variable (\code{x}), for the ML estimate.} \item{col.cuts}{estimated thresholds for the column variable (\code{y}), for the ML estimate.} \item{var}{the estimated variance of the correlation, or, for the ML estimate, the estimated covariance matrix of the correlation and thresholds.} \item{n}{the number of observations on which the correlation is based.} \item{chisq}{chi-square test for bivariate normality.} \item{df}{degrees of freedom for the test of bivariate normality.} \item{ML}{\code{TRUE} for the ML estimate, \code{FALSE} for the two-step estimate.} Othewise, returns the polychoric correlation. } \details{ The ML estimator is computed by maximizing the bivariate-normal likelihood with respect to the thresholds for the two variables (\eqn{\tau^{x}_i, i = 1,\ldots, r - 1}{\tau^x[i], i = 1,\ldots, r - 1}; \eqn{\tau^{y}_j, j = 1,\ldots, c - 1}{\tau^y[j], j = 1,\ldots, c - 1}) and the population correlation (\eqn{\rho}). Here, \eqn{r} and \eqn{c} are respectively the number of levels of \eqn{x} and \eqn{y}. The likelihood is maximized numerically using the \code{\link{optim}} function, and the covariance matrix of the estimated parameters is based on the numerical Hessian computed by \code{optim}. The two-step estimator is computed by first estimating the thresholds (\eqn{\tau^{x}_i, i = 1,\ldots, r - 1}{\tau^x[i], i = 1,\ldots, r - 1} and \eqn{\tau^{y}_j, i = j,\ldots, c - 1}{\tau^y[j], i = j,\ldots, c - 1}) separately from the marginal distribution of each variable. Then the one-dimensional likelihood for \eqn{\rho} is maximized numerically, using \code{\link{optim}} if standard errors are requested, or \code{\link{optimise}} if they are not. The standard error computed treats the thresholds as fixed. } \references{ Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68--74 in S. Kotz and N. Johnson, eds., \emph{The Encyclopedia of Statistics, Volume 7.} Wiley. Olsson, U. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. \emph{Psychometrika} \bold{44}, 443-460. } \author{John Fox \email{jfox@mcmaster.ca}} \seealso{\code{\link{hetcor}}, \code{\link{polyserial}}, \code{\link{print.polycor}}, \code{\link[stats]{optim}}} \examples{ if(require(mvtnorm)){ set.seed(12345) data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2)) x <- data[,1] y <- data[,2] cor(x, y) # sample correlation } if(require(mvtnorm)){ x <- cut(x, c(-Inf, .75, Inf)) y <- cut(y, c(-Inf, -1, .5, 1.5, Inf)) polychor(x, y) # 2-step estimate } if(require(mvtnorm)){ set.seed(12345) polychor(x, y, ML=TRUE, std.err=TRUE) # ML estimate } } \keyword{models} polycor/DESCRIPTION0000644000176200001440000000140013522074002013435 0ustar liggesusersPackage: polycor Version: 0.7-10 Date: 2019-07-22 Title: Polychoric and Polyserial Correlations Authors@R: person("John", "Fox", role = c("aut", "cre"), email = "jfox@mcmaster.ca") Depends: R (>= 3.3.0) Imports: stats, mvtnorm, Matrix ByteCompile: yes LazyLoad: yes Description: Computes polychoric and polyserial correlations by quick "two-step" methods or ML, optionally with standard errors; tetrachoric and biserial correlations are special cases. License: GPL (>= 2) URL: https://r-forge.r-project.org/projects/polycor/, https://CRAN.R-project.org/package=polycor NeedsCompilation: no Packaged: 2019-08-05 17:37:42 UTC; jfox Author: John Fox [aut, cre] Maintainer: John Fox Repository: CRAN Date/Publication: 2019-08-05 18:40:02 UTC polycor/NEWS0000644000176200001440000000143613522061060012437 0ustar liggesusersversion 0.7-10 o Let hetcor() work with objects that inherit from "data.frame" (suggestion of Emil O. W. Kirkegaard). o Allow empty pairs when use="pairwise.complete.obs" for hetcor() (suggestion of Emil O. W. Kirkegaard). o Allow character variables in hetcor() but print message. version 0.7-9 o Further protection/warnings against inadmissible correlations. o Protect against out-of-order thresholds during optimization (suggestion of Evgeny Mirkes). o hetcor() doesn't die when an error is thrown computing a correlation, which is now set to NA. o hetcor() now uses Matrix::nearPD() to force a postitive-definite results rather than the depricated sfsmisc::nearcor() (suggestion of Timothy Bates). o Improved clarity of documentation. o See the CHANGES file for prior versions. polycor/R/0000755000176200001440000000000013522061060012135 5ustar liggesuserspolycor/R/utilities.R0000644000176200001440000000232712760046675014321 0ustar liggesusers# last modified 29 Mar 07 by J. Fox binBvn <- function(rho, row.cuts, col.cuts, bins=4){ row.cuts <- if (missing(row.cuts)) c(-Inf, 1:(bins - 1)/bins, Inf) else c(-Inf, row.cuts, Inf) col.cuts <- if (missing(col.cuts)) c(-Inf, 1:(bins - 1)/bins, Inf) else c(-Inf, col.cuts, Inf) r <- length(row.cuts) - 1 c <- length(col.cuts) - 1 P <- matrix(0, r, c) R <- matrix(c(1, rho, rho, 1), 2, 2) for (i in 1:r){ for (j in 1:c){ P[i,j] <- mvtnorm::pmvnorm(lower=c(row.cuts[i], col.cuts[j]), upper=c(row.cuts[i+1], col.cuts[j+1]), corr=R) } } P } chisq <- function(x, y, rho, row.cuts, col.cuts, zerotol=1e-6, bins=4){ if (missing(row.cuts)) row.cuts <- qnorm(1:(bins - 1)/bins) if (missing(col.cuts)) col.cuts <- qnorm(1:(bins - 1)/bins) P <- binBvn(rho, row.cuts, col.cuts, bins=bins) if (!is.factor(x)) x <- cut(scale(x), c(-Inf, row.cuts, Inf)) if (!is.factor(y)) y <- cut(scale(y), c(-Inf, col.cuts, Inf)) tab <- table(x, y) n <- sum(tab) 2*sum(tab*log((tab + zerotol)/(P*n))) } as.matrix.hetcor <- function(x, ...) x$correlations polycor/R/polychor.R0000644000176200001440000000777412757603745014163 0ustar liggesusers# last modified 2013-12-25 by J. Fox polychor <- function (x, y, ML=FALSE, control=list(), std.err=FALSE, maxcor=.9999){ f <- function(pars) { if (length(pars) == 1){ rho <- pars if (abs(rho) > maxcor) rho <- sign(rho)*maxcor row.cuts <- rc col.cuts <- cc } else { rho <- pars[1] if (abs(rho) > maxcor) rho <- sign(rho)*maxcor row.cuts <- pars[2:r] col.cuts <- pars[(r+1):(r+c-1)] if (any(diff(row.cuts) < 0) || any(diff(col.cuts) < 0)) return(Inf) } P <- binBvn(rho, row.cuts, col.cuts) - sum(tab * log(P)) } tab <- if (missing(y)) x else table(x, y) zerorows <- apply(tab, 1, function(x) all(x == 0)) zerocols <- apply(tab, 2, function(x) all(x == 0)) zr <- sum(zerorows) if (0 < zr) warning(paste(zr, " row", suffix <- if(zr == 1) "" else "s", " with zero marginal", suffix," removed", sep="")) zc <- sum(zerocols) if (0 < zc) warning(paste(zc, " column", suffix <- if(zc == 1) "" else "s", " with zero marginal", suffix, " removed", sep="")) tab <- tab[!zerorows, ,drop=FALSE] tab <- tab[, !zerocols, drop=FALSE] r <- nrow(tab) c <- ncol(tab) if (r < 2) { warning("the table has fewer than 2 rows") return(NA) } if (c < 2) { warning("the table has fewer than 2 columns") return(NA) } n <- sum(tab) rc <- qnorm(cumsum(rowSums(tab))/n)[-r] cc <- qnorm(cumsum(colSums(tab))/n)[-c] if (ML) { result <- optim(c(optimise(f, interval=c(-1, 1))$minimum, rc, cc), f, control=control, hessian=std.err) if (result$par[1] > 1){ result$par[1] <- maxcor warning(paste("inadmissible correlation set to", maxcor)) } else if (result$par[1] < -1){ result$par[1] <- -maxcor warning(paste("inadmissible correlation set to -", maxcor, sep="")) } if (std.err) { chisq <- 2*(result$value + sum(tab * log((tab + 1e-6)/n))) df <- length(tab) - r - c result <- list(type="polychoric", rho=result$par[1], row.cuts=result$par[2:r], col.cuts=result$par[(r+1):(r+c-1)], var=solve(result$hessian), n=n, chisq=chisq, df=df, ML=TRUE) class(result) <- "polycor" return(result) } else return(as.vector(result$par[1])) } else if (std.err){ result <- optim(0, f, control=control, hessian=TRUE, method="BFGS") if (result$par > 1){ result$par <- maxcor warning(paste("inadmissible correlation set to", maxcor)) } else if (result$par < -1){ result$par <- -maxcor warning(paste("inadmissible correlation set to -", maxcor, sep="")) } chisq <- 2*(result$value + sum(tab *log((tab + 1e-6)/n))) df <- length(tab) - r - c result <- list(type="polychoric", rho=result$par, var=1/result$hessian, n=n, chisq=chisq, df=df, ML=FALSE) class(result) <- "polycor" return(result) } else optimise(f, interval=c(-maxcor, maxcor))$minimum } polycor/R/print.polycor.R0000644000176200001440000000406712757603745015136 0ustar liggesusers# last modified 24 June 04 by J. Fox "print.polycor" <- function(x, digits = max(3, getOption("digits") - 3), ...){ if (x$type == "polychoric"){ se <- sqrt(diag(x$var)) se.rho <- se[1] est <- if (x$ML) "ML est." else "2-step est." cat("\nPolychoric Correlation, ", est, " = ", signif(x$rho, digits), " (", signif(se.rho, digits), ")", sep="") if (x$df > 0) cat("\nTest of bivariate normality: Chisquare = ", signif(x$chisq, digits), ", df = ", x$df, ", p = ", signif(pchisq(x$chisq, x$df, lower.tail=FALSE), digits), "\n", sep="") else cat("\n") r <- length(x$row.cuts) c <- length(x$col.cuts) if (r == 0) return(invisible(x)) row.cuts.se <- se[2:(r+1)] col.cuts.se <- se[(r+2):(r+c+1)] rowThresh <- signif(cbind(x$row.cuts, row.cuts.se), digits) if (r > 1) cat("\n Row Thresholds\n") else cat("\n Row Threshold\n") colnames(rowThresh) <- c("Threshold", "Std.Err.") rownames(rowThresh) <- if (r > 1) 1:r else " " print(rowThresh) colThresh <- signif(cbind(x$col.cuts, col.cuts.se), digits) if (c > 1) cat("\n\n Column Thresholds\n") else cat("\n\n Column Threshold\n") colnames(colThresh) <- c("Threshold", "Std.Err.") rownames(colThresh) <- if (c > 1) 1:c else " " print(colThresh) } else if (x$type == "polyserial"){ se <- sqrt(diag(x$var)) se.rho <- se[1] est <- if (x$ML) "ML est." else "2-step est." cat("\nPolyserial Correlation, ", est, " = ", signif(x$rho, digits), " (", signif(se.rho, digits), ")", sep="") cat("\nTest of bivariate normality: Chisquare = ", signif(x$chisq, digits), ", df = ", x$df, ", p = ", signif(pchisq(x$chisq, x$df, lower.tail=FALSE), digits), "\n\n", sep="") if (length(se) == 1) return(invisible(x)) cuts.se <- se[-1] thresh <- signif(rbind(x$cuts, cuts.se), digits) colnames(thresh) <- 1:length(x$cuts) rownames(thresh) <- c("Threshold", "Std.Err.") print(thresh) } else print(unclass(x)) invisible(x) } polycor/R/hetcor.R0000644000176200001440000000031313124217752013552 0ustar liggesusers# last modified 2016-10-05 by J. Fox "hetcor" <- function(data, ..., ML=FALSE, std.err=TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE){ UseMethod("hetcor") } polycor/R/polyserial.R0000644000176200001440000000474212757603745014477 0ustar liggesusers# last modified 2013-12-25 by J. Fox polyserial <- function(x, y, ML=FALSE, control=list(), std.err=FALSE, maxcor=.9999, bins=4){ f <- function(pars){ rho <- pars[1] if (abs(rho) > maxcor) rho <- sign(rho)*maxcor cts <- if (length(pars) == 1) c(-Inf, cuts, Inf) else c(-Inf, pars[-1], Inf) if (any(diff(cts) < 0)) return(Inf) tau <- (matrix(cts, n, s+1, byrow=TRUE) - matrix(rho*z, n, s+1))/ sqrt(1 - rho^2) - sum(log(dnorm(z)*(pnorm(tau[cbind(indices, y+1)]) - pnorm(tau[cbind(indices, y)])))) } if (!is.numeric(x)) stop("x must be numeric") valid <- complete.cases(x, y) x <- x[valid] y <- y[valid] z <- scale(x) tab <- table(y) n <- sum(tab) s <- length(tab) if (s < 2) { warning("y has fewer than 2 levels") return(NA) } if (sum(0 != tab) < 2){ warning("y has fewer than 2 levels with data") return(NA) } indices <- 1:n cuts <- qnorm(cumsum(tab)/n)[-s] y <- as.numeric(as.factor(y)) rho <- sqrt((n - 1)/n)*sd(y)*cor(x, y)/sum(dnorm(cuts)) if (abs(rho) > maxcor) { warning("initial correlation inadmissible, ", rho, ", set to ", sign(rho)*maxcor) rho <- sign(rho)*maxcor } if (ML) { result <- optim(c(rho, cuts), f, control=control, hessian=std.err) if (result$par[1] > 1){ result$par[1] <- maxcor warning(paste("inadmissible correlation set to", maxcor)) } else if (result$par[1] < -1){ result$par[1] <- -maxcor warning(paste("inadmissible correlation set to -", maxcor, sep="")) } if (std.err){ chisq <- chisq(y, z, result$par[1], result$par[-1], bins=bins) df <- s*bins - s - 1 result <- list(type="polyserial", rho=result$par[1], cuts=result$par[-1], var=solve(result$hessian), n=n, chisq=chisq, df=df, ML=TRUE) class(result) <- "polycor" return(result) } else return(as.vector(result$par[1])) } else if (std.err){ result <- optim(rho, f, control=control, hessian=TRUE, method="BFGS") if (result$par > 1){ result$par <- maxcor warning(paste("inadmissible correlation set to", maxcor)) } else if (result$par < -1){ result$par <- -maxcor warning(paste("inadmissible correlation set to -", maxcor, sep="")) } chisq <- chisq(y, z, rho, cuts, bins=bins) df <- s*bins - s - 1 result <- list(type="polyserial", rho=result$par, var=1/result$hessian, n=n, chisq=chisq, df=df, ML=FALSE) class(result) <- "polycor" return(result) } else return(rho) } polycor/R/hetcor.default.R0000644000176200001440000000062213124217752015200 0ustar liggesusers# last modified 2016-10-05 by J. Fox hetcor.default <- function (data, ..., ML = FALSE, std.err = TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins = 4, pd = TRUE) { use <- match.arg(use) dframe <- data.frame(data, ...) if (!missing(...)) names(dframe)[1] <- deparse(substitute(data)) hetcor(dframe, ML = ML, std.err = std.err, use=use, bins = bins, pd = pd) } polycor/R/print.hetcor.R0000644000176200001440000000227212757603745014727 0ustar liggesusers# last modified 12 Dec 04 by J. Fox "print.hetcor" <- function(x, digits = max(3, getOption("digits") - 3), ...){ R <- signif(x$correlations, digits=digits) R[upper.tri(R)] <- x$type[upper.tri(R)] R <- as.data.frame(R) if (x$ML) cat("\nMaximum-Likelihood Estimates\n") else cat("\nTwo-Step Estimates\n") cat("\nCorrelations/Type of Correlation:\n") print(R) if (!is.null(x$std.errors)){ SE <- signif(x$std.errors, digits) diag(SE) <- "" if (x$NA.method == "complete.obs"){ SE[upper.tri(SE)] <- "" cat("\nStandard Errors:\n") SE <- as.data.frame(SE) print(SE[,-ncol(SE)]) cat(paste("\nn =", x$n, "\n")) } else { SE[upper.tri(SE)] <- x$n[upper.tri(SE)] diag(SE) <- diag(x$n) SE <- as.data.frame(SE) cat("\nStandard Errors/Numbers of Observations:\n") print(SE) } if (!all(is.na(x$tests[lower.tri(x$tests)]))){ Test <- signif(x$tests, digits) Test[upper.tri(Test)] <- "" diag(Test) <- "" Test <- as.data.frame(Test) cat("\nP-values for Tests of Bivariate Normality:\n") print(Test[,-ncol(Test)]) } } invisible(x) } polycor/R/hetcor.data.frame.R0000644000176200001440000001265613522061060015557 0ustar liggesusers# last modified 2019-07-22 by J. Fox "hetcor.data.frame" <- function(data, ML=FALSE, std.err=TRUE, use=c("complete.obs", "pairwise.complete.obs"), bins=4, pd=TRUE, ...){ se.r <- function(r, n){ rho <- r*(1 + (1 - r^2)/(2*(n - 3))) # approx. unbiased estimator v <- (((1 - rho^2)^2)/(n + 6))*(1 + (14 + 11*rho^2)/(2*(n + 6))) sqrt(v) } if (any(sapply(data, function(x) inherits(x, "character")))){ message("data contain one or more character variables", "\nthe values of which are ordered alphabetically") } use <- match.arg(use) if (use == "complete.obs") data <- na.omit(data) p <- length(data) if (p < 2) stop("fewer than 2 variables.") R <- matrix(1, p, p) Type <- matrix("", p, p) SE <- matrix(0, p, p) N <- matrix(0, p, p) Test <- matrix(0, p, p) diag(N) <- if (use == "complete.obs") nrow(data) else sapply(data, function(x) sum(!is.na(x))) if (all(diag(N) == 0)) stop("no non-missing cases") for (i in 2:p) { for (j in 1:(i-1)){ x <- data[[i]] y <- data[[j]] n <- sum(complete.cases(x, y)) if (n == 0) { Test[i, j] <- Test[j, i] <- R[i, j] <- R[j, i] <- SE[i, j] <- SE[j, i] <- NA N[i, j] <- N[j, i] <- 0 warning("no cases for pair ", j, ", ", i) next } if (inherits(x, c("numeric", "integer")) && inherits(y, c("numeric", "integer"))) { r <- cor(x, y, use="complete.obs") Type[i, j] <- Type[j, i] <- "Pearson" R[i, j] <- R[j, i] <- r if (std.err) { SE[i, j] <- SE[j, i] <- se.r(r, n) N[i, j] <- N[j, i] <- n Test[i, j] <- pchisq(chisq(x, y, r, bins=bins), bins^2 - 2, lower.tail=FALSE) } } else if (inherits(x, c("factor", "logical", "character")) && inherits(y, c("factor", "logical", "character"))) { Type[i, j] <- Type[j, i] <- "Polychoric" result <- try(polychor(x, y, ML=ML, std.err=std.err), silent=TRUE) error <- inherits(result, "try-error") if (error){ warning("could not compute polychoric correlation between variables ", i, " and ", j, "\n Message: ", result, "\n") result <- NA } if (std.err && !error){ R[i, j] <- R[j, i] <- result$rho SE[i, j] <- SE[j, i] <- sqrt(result$var[1,1]) N[i, j] <- N[j, i] <- n Test[i, j] <- if (result$df > 0) pchisq(result$chisq, result$df, lower.tail=FALSE) else NA } else R[i, j] <- R[j, i] <- result } else { if (inherits(x, c("factor", "logical", "character")) && inherits(y, c("numeric", "integer"))) result <- try(polyserial(y, x, ML=ML, std.err=std.err, bins=bins), silent=TRUE) else if (inherits(x, c("numeric", "integer")) && inherits(y, c("factor", "logical", "character"))) result <- try(polyserial(x, y, ML=ML, std.err=std.err, bins=bins), silent=TRUE) else { stop("columns must be numeric, factors, logical, or character.") } Type[i, j] <- Type[j, i] <- "Polyserial" error <- inherits(result, "try-error") if (error){ warning("could not compute polyserial correlation between variables ", i, " and ", j, "\n Message: ", result, "\n") result <- NA } if (std.err && !error){ R[i, j] <- R[j, i] <- result$rho SE[i, j] <- SE[j, i] <- sqrt(result$var[1,1]) N[i, j] <- N[j, i] <- n Test[i, j] <- pchisq(result$chisq, result$df, lower.tail=FALSE) } else R[i, j] <- R[j, i] <- result } } } if (pd && !any(is.na(R)) && min(eigen(R, only.values=TRUE)$values) < 0){ cor <- Matrix::nearPD(R, corr=TRUE) if (!cor$converged) warning("attempt to make correlation matrix positive-definite failed") else warning("the correlation matrix has been adjusted to make it positive-definite") R <- as.matrix(cor$mat) } rownames(R) <- colnames(R) <- names(data) result <- list(correlations=R, type=Type, NA.method=use, ML=ML) if (std.err) { rownames(SE) <- colnames(SE) <- names(data) rownames(N) <- colnames(N) <- names(N) rownames(Test) <- colnames(Test) <- names(data) result$std.errors <- SE result$n <- if (use == "complete.obs") n else N result$tests <- Test } class(result) <- "hetcor" result } polycor/MD50000644000176200001440000000141613522074002012246 0ustar liggesusers5c7ed877448ad60174f1e0372b32c500 *DESCRIPTION ecb4e291f8cda19809e05eb819c3a148 *NAMESPACE ce26f5a4a4a4ac493456d6c91d2015b9 *NEWS 027489ade839b5eabae30150d91c0afd *R/hetcor.R 9c18019482eb6c609fbd96f50b9e2005 *R/hetcor.data.frame.R df027130be3b5042a97ea83feb9817ce *R/hetcor.default.R 27d46ca3fc8225a279a4318eb9f2369a *R/polychor.R 88da432e897873dbc29202576516ec40 *R/polyserial.R 3dd639b4ff9c10632f0c3ef8ffa0d043 *R/print.hetcor.R 7c4b1f775fc5aa944fe83e068524e6c5 *R/print.polycor.R 9a57a3f34449a1270944719c42ca7079 *R/utilities.R 4bd45f5ccc3154f479a7db406681015d *inst/CHANGES 385b7c57e990065b1d73616e7f181528 *man/hetcor.Rd b6b095e3cf3adb32deeefc4f04026350 *man/polychor.Rd 87cc9c42917b13d1863a23d20a5a8552 *man/polyserial.Rd 6fc81945e0e29714508910c54a1cf07b *man/print.polycor.Rd polycor/inst/0000755000176200001440000000000012757603745012736 5ustar liggesuserspolycor/inst/CHANGES0000644000176200001440000000402712757604125013725 0ustar liggesusersversion 0.5: original version on CRAN version 0.6-0 o added approximate standard errors for quick 2-step estimates. o corrected bug constraining quick estimates in polychor() to 0,1 rather than -1,1. o constraints on correlations (default to -.9999,.9999) for numeric stability. o several small changes. version 0.7-0 o added tests of bivariate normality. o added as.matrix.hetcor(). o fixed bug that caused use="pairwise.complete.obs" in hetcor() to fail. o cleaned up code. version 0.7-1 o suppressed test of bivariate normality in 2 x 2 (tetrachoric) case (where there are 0 df for the test). o examples, which use random data, now set the random-number-generator seed. o small changes to printed output. version 0.7-2 o polychor() and polyserial() now report an error when a categorical variable has just one level (problem pointed out by David Barron). version 0.7-3 o as.matrix.hetcor() now has a second argument, ..., to conform to the as.matrix() generic in R 2.5.0. version 0.7-4 o hetcor() uses the nearcor() function in sfsmisc to insure that the correlation matrix returned is positive-definite. o small changes. version 0.7-5 o polychor() and polyserial() now return NA and report a warning when a factor is invariant (after problem report by Jose Quesada); in polychor(), empty factor levels are automatically removed. version 0.7-6 o polyserial() and polychor() now make sure that final correlations are in the range [-1, 1], reporting a warning when inadmissible values are coerced to -1 or 1 (correcting a problem reported by Marie Loh). version 0.7-7 o hetcor() avoids an incorrect name for the first variable when it's given a matrix as input, rather than a data frame or individual variables, as the documentation indicates (preventing a consequent renaming error reported by Iuri Gavronski). o Small fix to hetcor.Rd. version 0.7-8 o hetcor() now accepts logical variables, treated as two-level factors (suggested by Soren Vogel).