aroma.light/DESCRIPTION
Package: aroma.light
Version: 3.24.0
Depends: R (>= 2.15.2)
Imports: stats, R.methodsS3 (>= 1.7.1), R.oo (>= 1.23.0), R.utils (>=
2.9.0), matrixStats (>= 0.55.0)
Suggests: princurve (>= 2.1.4)
Title: Light-Weight Methods for Normalization and Visualization of
Microarray Data using Only Basic R Data Types
Authors@R: c(
person("Henrik", "Bengtsson", role = c("aut", "cre", "cph"),
email = "henrikb@braju.com"),
person("Pierre", "Neuvial", role = "ctb"),
person("Aaron", "Lun", role = "ctb"))
Description: Methods for microarray analysis that take basic data types such as matrices and lists of vectors. These methods can be used standalone, be utilized in other packages, or be wrapped up in higher-level classes.
License: GPL (>= 2)
biocViews: Infrastructure, Microarray, OneChannel, TwoChannel,
MultiChannel, Visualization, Preprocessing
URL: https://github.com/HenrikBengtsson/aroma.light,
https://www.aroma-project.org
BugReports: https://github.com/HenrikBengtsson/aroma.light/issues
LazyLoad: TRUE
Encoding: latin1
git_url: https://git.bioconductor.org/packages/aroma.light
git_branch: RELEASE_3_14
git_last_commit: d0f8f2b
git_last_commit_date: 2021-10-26
Date/Publication: 2021-10-26
NeedsCompilation: no
Packaged: 2021-10-26 20:49:22 UTC; biocbuild
Author: Henrik Bengtsson [aut, cre, cph],
Pierre Neuvial [ctb],
Aaron Lun [ctb]
Maintainer: Henrik Bengtsson <henrikb@braju.com>
aroma.light/man/print.SmoothSplineLikelihood.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% print.SmoothSplineLikelihood.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{print.SmoothSplineLikelihood}
\alias{print.SmoothSplineLikelihood}
\alias{SmoothSplineLikelihood.print}
\alias{print,SmoothSplineLikelihood-method}
\title{Prints a SmoothSplineLikelihood object}
\description{
 Prints a SmoothSplineLikelihood object. A SmoothSplineLikelihood object is returned by
\code{\link{likelihood.smooth.spline}()}.
}
\usage{
\method{print}{SmoothSplineLikelihood}(x, digits=getOption("digits"), ...)
}
\arguments{
\item{x}{Object to be printed.}
\item{digits}{Minimal number of significant digits to print.}
\item{...}{Not used.}
}
\value{
Returns nothing.
}
\author{Henrik Bengtsson}
\keyword{internal}
\keyword{methods}
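\examples{
# A minimal, hypothetical sketch (not part of the original documentation).
# It assumes that likelihood() applied to a 'smooth.spline' fit returns the
# SmoothSplineLikelihood object described above.
data(cars)
fit <- smooth.spline(cars$speed, cars$dist)
ll <- likelihood(fit)
print(ll)
}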
aroma.light/man/backtransformPrincipalCurve.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% backtransformPrincipalCurve.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{backtransformPrincipalCurve}
\alias{backtransformPrincipalCurve}
\alias{backtransformPrincipalCurve.numeric}
\alias{backtransformPrincipalCurve.matrix}
\title{Reverse transformation of principal-curve fit}
\description{
Reverse transformation of principal-curve fit.
}
\usage{
\method{backtransformPrincipalCurve}{matrix}(X, fit, dimensions=NULL, targetDimension=NULL, ...)
\method{backtransformPrincipalCurve}{numeric}(X, ...)
}
\arguments{
\item{X}{An NxK \code{\link[base]{matrix}} containing data to be backtransformed.}
\item{fit}{An MxL principal-curve fit object of class
\code{principal_curve} as returned by \code{\link{fitPrincipalCurve}}().
Typically \eqn{L = K}, but not always.
}
 \item{dimensions}{An (optional) subset of D dimensions, all in [1,L],
   to be returned (and backtransformed).}
\item{targetDimension}{An (optional) index specifying the dimension
in [1,L] to be used as the target dimension of the \code{fit}.
More details below.}
\item{...}{Passed internally to \code{\link[stats]{smooth.spline}}.}
}
\value{
The backtransformed NxK (or NxD) \code{\link[base]{matrix}}.
}
\details{
Each column in X ("dimension") is backtransformed independently
of the others.
}
\section{Target dimension}{
By default, the backtransform is such that afterward the signals are
approximately proportional to the (first) principal curve as fitted
  by \code{\link{fitPrincipalCurve}}(). The scale and origin of this
  principal curve are not uniquely defined.
If \code{targetDimension} is specified, then the backtransformed signals
are approximately proportional to the signals of the target dimension,
and the signals in the target dimension are unchanged.
}
\section{Subsetting dimensions}{
Argument \code{dimensions} can be used to backtransform a subset of
dimensions (K) based on a subset of the fitted dimensions (L).
  If \eqn{K = L}, then both \code{X} and \code{fit} are subsetted.
  If \eqn{K \ne L}{K != L}, then it is assumed that \code{X} is already
subsetted/expanded and only \code{fit} is subsetted.
}
\examples{
# Consider the case where K=4 measurements have been done
# for the same underlying signals 'x'. The different measurements
# have different systematic variation
#
# y_k = f(x_k) + eps_k; k = 1,...,K.
#
# In this example, we assume non-linear measurement functions
#
# f(x) = a + b*x + x^c + eps(b*x)
#
# where 'a' is an offset, 'b' a scale factor, and 'c' an exponential.
# We also assume heteroscedastic zero-mean noise with standard
# deviation proportional to the rescaled underlying signal 'x'.
#
# Furthermore, we assume that measurements k=2 and k=3 undergo the
# same transformation, which may illustrate that they come from
# the same batch. However, when *fitting* the model below we
# will assume they are independent.
# Transforms
a <- c(2, 15, 15, 3)
b <- c(2, 3, 3, 4)
c <- c(1, 2, 2, 1/2)
K <- length(a)
# The true signal
N <- 1000
x <- rexp(N)
# The noise
bX <- outer(b,x)
E <- apply(bX, MARGIN=2, FUN=function(x) rnorm(K, mean=0, sd=0.1*x))
# The transformed signals with noise
Xc <- t(sapply(c, FUN=function(c) x^c))
Y <- a + bX + Xc + E
Y <- t(Y)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit principal curve
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit principal curve through Y = (y_1, y_2, ..., y_K)
fit <- fitPrincipalCurve(Y)
# Flip direction of 'lambda'?
rho <- cor(fit$lambda, Y[,1], use="complete.obs")
flip <- (rho < 0)
if (flip) {
fit$lambda <- max(fit$lambda, na.rm=TRUE)-fit$lambda
}
L <- ncol(fit$s)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Backtransform data according to model fit
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Backtransform toward the principal curve (the "common scale")
YN1 <- backtransformPrincipalCurve(Y, fit=fit)
stopifnot(ncol(YN1) == K)
# Backtransform toward the first dimension
YN2 <- backtransformPrincipalCurve(Y, fit=fit, targetDimension=1)
stopifnot(ncol(YN2) == K)
# Backtransform toward the last (fitted) dimension
YN3 <- backtransformPrincipalCurve(Y, fit=fit, targetDimension=L)
stopifnot(ncol(YN3) == K)
# Backtransform toward the third dimension (dimension by dimension)
# Note, this assumes that K == L.
YN4 <- Y
for (cc in 1:L) {
YN4[,cc] <- backtransformPrincipalCurve(Y, fit=fit,
targetDimension=1, dimensions=cc)
}
stopifnot(identical(YN4, YN2))
# Backtransform a subset toward the first dimension
# Note, this assumes that K == L.
YN5 <- backtransformPrincipalCurve(Y, fit=fit,
targetDimension=1, dimensions=2:3)
stopifnot(identical(YN5, YN2[,2:3]))
stopifnot(ncol(YN5) == 2)
# Extract signals from measurement #2 and backtransform according
# to its model fit. Signals are standardized to target dimension 1.
y6 <- Y[,2,drop=FALSE]
yN6 <- backtransformPrincipalCurve(y6, fit=fit, dimensions=2,
targetDimension=1)
stopifnot(identical(yN6, YN2[,2,drop=FALSE]))
stopifnot(ncol(yN6) == 1)
# Extract signals from measurement #2 and backtransform according
# to the model fit of measurement #3 (because we believe these
# two have undergone very similar transformations).
# Signals are standardized to target dimension 1.
y7 <- Y[,2,drop=FALSE]
yN7 <- backtransformPrincipalCurve(y7, fit=fit, dimensions=3,
targetDimension=1)
stopifnot(ncol(yN7) == 1)
stopifnot(cor(yN7, yN6) > 0.9999)
}
\seealso{
\code{\link{fitPrincipalCurve}}()
}
\keyword{methods}
aroma.light/man/fitIWPCA.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% fitIWPCA.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{fitIWPCA}
\alias{fitIWPCA}
\alias{fitIWPCA.matrix}
\title{Robust fit of linear subspace through multidimensional data}
\description{
Robust fit of linear subspace through multidimensional data.
}
\usage{
\method{fitIWPCA}{matrix}(X, constraint=c("diagonal", "baseline", "max"), baselineChannel=NULL, ...,
aShift=rep(0, times = ncol(X)), Xmin=NULL)
}
\arguments{
\item{X}{NxK \code{\link[base]{matrix}} where N is the number of observations and
K is the number of dimensions (channels).
}
\item{constraint}{A \code{\link[base]{character}} string or a \code{\link[base]{numeric}} value.
If \code{\link[base]{character}} it specifies which additional constraint to be used
to specify the offset parameters along the fitted line;
If \code{"diagonal"}, the offset vector will be a point on the line
that is closest to the diagonal line (1,...,1).
With this constraint, all bias parameters are identifiable.
If \code{"baseline"} (requires argument \code{baselineChannel}), the
estimates are such that of the bias and scale parameters of the
baseline channel is 0 and 1, respectively.
With this constraint, all bias parameters are identifiable.
If \code{"max"}, the offset vector will the point on the line that is
as "great" as possible, but still such that each of its components is
less than the corresponding minimal signal. This will guarantee that
no negative signals are created in the backward transformation.
   If a \code{\link[base]{numeric}} value, the offset vector will be the point on the line
   such that after applying the backward transformation there are
   \code{constraint*N} negative signals. Note that \code{constraint==0} corresponds
approximately to \code{constraint=="max"}.
With the latter two constraints, the bias parameters are only
identifiable modulo the fitted line.
}
\item{baselineChannel}{Index of channel toward which all other
channels are conform.
This argument is required if \code{constraint=="baseline"}.
This argument is optional if \code{constraint=="diagonal"} and
then the scale factor of the baseline channel will be one. The
estimate of the bias parameters is not affected in this case.
Defaults to one, if missing.
}
\item{...}{Additional arguments accepted by \code{\link{iwpca}}().
   For instance, an N \code{\link[base]{vector}} of weights for each observation may be
given, otherwise they get the same weight.
}
\item{aShift, Xmin}{For internal use only.}
}
\value{
Returns a \code{\link[base]{list}} that contains estimated parameters and algorithm
details;
 \item{a}{A \code{\link[base]{double}} \code{\link[base]{vector}} \eqn{(a[1],...,a[K])} with offset
parameter estimates.
It is made identifiable according to argument \code{constraint}.
}
 \item{b}{A \code{\link[base]{double}} \code{\link[base]{vector}} \eqn{(b[1],...,b[K])} with scale
parameter estimates. It is made identifiable by constraining
\code{b[baselineChannel] == 1}.
These estimates are independent of argument \code{constraint}.
}
\item{adiag}{If identifiability constraint \code{"diagonal"},
a \code{\link[base]{double}} \code{\link[base]{vector}} \eqn{(adiag[1],...,adiag[K])}, where
   \eqn{adiag[1] = adiag[2] = ... = adiag[K]}, specifying the point
on the diagonal line that is closest to the fitted line,
otherwise the zero vector.
}
\item{eigen}{A KxK \code{\link[base]{matrix}} with columns of eigenvectors.
}
\item{converged}{\code{\link[base:logical]{TRUE}} if the algorithm converged, otherwise \code{\link[base:logical]{FALSE}}.
}
\item{nbrOfIterations}{The number of iterations for the algorithm
to converge, or zero if it did not converge.
}
\item{t0}{Internal parameter estimates, which contains no more
information than the above listed elements.
}
\item{t}{Always \code{\link[base]{NULL}}.}
}
\details{
  This method uses iteratively re-weighted principal component analysis (IWPCA)
  to fit the model \eqn{y_n = a + bx_n + eps_n} where \eqn{y_n},
  \eqn{a}, \eqn{b}, and \eqn{eps_n} are vectors of length K and \eqn{x_n}
  is a scalar.
The algorithm is:
For iteration i:
   1) Fit a line \eqn{L} through the data cloud using weighted PCA
with weights \eqn{\{w_n\}}. Let
\eqn{r_n = \{r_{n,1},...,r_{n,K}\}}
be the \eqn{K} principal components.
2) Update the weights as
      \eqn{w_n <- 1 / \sum_{k=2}^{K} (r_{n,k} + \epsilon_r)}
where we have used the residuals of all but the first principal
component.
3) Find the point a on \eqn{L} that is closest to the
line \eqn{D=(1,1,...,1)}. Similarly, denote the point on D that is
closest to \eqn{L} by \eqn{t=a*(1,1,...,1)}.
}
\author{Henrik Bengtsson}
%examples "fitMultiIWPCA.matrix.Rex"
\seealso{
This is an internal method used by the \code{\link{calibrateMultiscan}}()
and \code{\link{normalizeAffine}}() methods.
Internally the function \code{\link{iwpca}}() is used to fit a line
through the data cloud and the function \code{\link{distanceBetweenLines}}() to
find the closest point to the diagonal (1,1,...,1).
}
\keyword{methods}
\keyword{algebra}
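\examples{
# A minimal, hypothetical sketch (not part of the original documentation):
# simulate K=3 channels that are affine transformations of a common
# signal 'x', and recover the offset (a) and scale (b) parameters.
N <- 1000
x <- rexp(N, rate=1/50)
a <- c(10, 50, 100)   # true channel offsets
b <- c(1, 1.5, 2)     # true channel scale factors
X <- sapply(1:3, FUN=function(k) a[k] + b[k]*x + rnorm(N, sd=5))

fit <- fitIWPCA(X, constraint="diagonal")

# Offset and scale estimates (identifiable only up to the chosen constraint)
print(fit$a)
print(fit$b)
}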
aroma.light/man/normalizeQuantileRank.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeQuantileRank.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeQuantileRank}
\alias{normalizeQuantileRank}
\alias{normalizeQuantileRank.numeric}
\alias{normalizeQuantileRank.list}
\alias{normalizeQuantile}
\alias{normalizeQuantile.default}
\title{Normalizes the empirical distribution of one or more samples to a target distribution}
\usage{
\method{normalizeQuantileRank}{numeric}(x, xTarget, ties=FALSE, ...)
\method{normalizeQuantileRank}{list}(X, xTarget=NULL, ...)
\method{normalizeQuantile}{default}(x, ...)
}
\description{
  Normalizes the empirical distribution of one or more samples to a target distribution.
The average sample distribution is calculated either robustly or not
by utilizing either \code{weightedMedian()} or \code{weighted.mean()}.
A weighted method is used if any of the weights are different from one.
}
\arguments{
\item{x, X}{a \code{\link[base]{numeric}} \code{\link[base]{vector}} of length N or a \code{\link[base]{list}} of length N
with \code{\link[base]{numeric}} \code{\link[base]{vector}}s.
If a \code{\link[base]{list}}, then the \code{\link[base]{vector}}s may be of different lengths.}
\item{xTarget}{The target empirical distribution as a \emph{sorted}
\code{\link[base]{numeric}} \code{\link[base]{vector}} of length \eqn{M}.
If \code{\link[base]{NULL}} and \code{X} is a \code{\link[base]{list}}, then the target distribution is
calculated as the average empirical distribution of the samples.}
\item{ties}{Should ties in \code{x} be treated with care or not?
For more details, see "limma:normalizeQuantiles".}
\item{...}{Not used.}
}
\value{
Returns an object of the same shape as the input argument.
}
\section{Missing values}{
  Missing values are excluded when estimating the "common" (baseline) distribution.
Values that are \code{\link[base]{NA}} remain \code{\link[base]{NA}} after normalization.
No new \code{\link[base]{NA}}s are introduced.
}
\section{Weights}{
  Currently only channel weights are supported due to the way quantile
normalization is done.
If signal weights are given, channel weights are calculated from these
by taking the mean of the signal weights in each channel.
}
\examples{
# Simulate eight samples of different lengths
N <- 10000
X <- list()
for (kk in 1:8) {
rfcn <- list(rnorm, rgamma)[[sample(2, size=1)]]
size <- runif(1, min=0.3, max=1)
a <- rgamma(1, shape=20, rate=10)
b <- rgamma(1, shape=10, rate=10)
values <- rfcn(size*N, a, b)
# "Censor" values
values[values < 0 | values > 8] <- NA
X[[kk]] <- values
}
# Add 20\% missing values
X <- lapply(X, FUN=function(x) {
x[sample(length(x), size=0.20*length(x))] <- NA
x
})
# Normalize quantiles
Xn <- normalizeQuantile(X)
# Plot the data
layout(matrix(1:2, ncol=1))
xlim <- range(X, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The original distributions")
plotDensity(Xn, lwd=2, xlim=xlim, main="The normalized distributions")
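# A hypothetical extension of the above (not part of the original example):
# normalize a new sample toward the target distribution estimated from
# the existing samples with averageQuantile().
xTarget <- averageQuantile(X)
xNew <- rexp(N/2)
xNewN <- normalizeQuantileRank(xNew, xTarget=xTarget)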
}
\author{
Adopted from Gordon Smyth (\url{http://www.statsci.org/}) in 2002 \& 2006.
Original code by Ben Bolstad at Statistics Department, University of
California.
}
\seealso{
To calculate a target distribution from a set of samples, see
\code{\link{averageQuantile}}().
For an alternative empirical density normalization methods, see
\code{\link{normalizeQuantileSpline}}().
}
\keyword{methods}
\keyword{nonparametric}
\keyword{multivariate}
\keyword{robust}
aroma.light/man/plotMvsAPairs.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% plotMvsAPairs.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{plotMvsAPairs}
\alias{plotMvsAPairs}
\alias{plotMvsAPairs.matrix}
\title{Plot log-ratios/log-intensities for all unique pairs of data vectors}
\description{
Plot log-ratios/log-intensities for all unique pairs of data vectors.
}
\usage{
\method{plotMvsAPairs}{matrix}(X, Alab="A", Mlab="M", Alim=c(0, 16), Mlim=c(-1, 1) * diff(Alim), pch=".",
..., add=FALSE)
}
\arguments{
\item{X}{NxK \code{\link[base]{matrix}} where N is the number of observations and
K is the number of channels.}
\item{Alab,Mlab}{Labels on the x and y axes.}
\item{Alim,Mlim}{Plot range on the A and M axes.}
\item{pch}{Plot symbol used.}
\item{...}{Additional arguments accepted by \code{\link[graphics]{points}}.}
\item{add}{If \code{\link[base:logical]{TRUE}}, data points are plotted in the current plot,
otherwise a new plot is created.}
}
\details{
Log-ratios and log-intensities are calculated for each neighboring pair
of channels (columns) and plotted. Thus, in total there will be K-1
  data sets plotted.
The colors used for the plotted pairs are 1, 2, and so on. To change
the colors, use a different color palette.
}
\value{
Returns nothing.
}
\author{Henrik Bengtsson}
\keyword{methods}
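\examples{
# A minimal, hypothetical sketch (not part of the original documentation):
# three channels measuring the same log-normal signals.
theta <- 2^rnorm(1000, mean=10, sd=1)
X <- sapply(1:3, FUN=function(k) theta * 2^rnorm(1000, mean=0, sd=0.2))
plotMvsAPairs(X)
}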
aroma.light/man/robustSmoothSpline.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% robustSmoothSpline.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{robustSmoothSpline}
\alias{robustSmoothSpline.default}
\alias{robustSmoothSpline}
\title{Robust fit of a Smoothing Spline}
\usage{
\method{robustSmoothSpline}{default}(x, y=NULL, w=NULL, ..., minIter=3, maxIter=max(minIter, 50),
method=c("L1", "symmetric"), sdCriteria=2e-04, reps=1e-15, tol=1e-06 * IQR(x),
plotCurves=FALSE)
}
\description{
Fits a smoothing spline robustly using the \eqn{L_1} norm. Currently, the
algorithm is an \emph{iterative reweighted smooth spline} algorithm which
calls \code{smooth.spline(x,y,w,...)} at each iteration with the weights
\code{w} equal to the inverse of the absolute value of the residuals for
the last iteration step.
}
\arguments{
\item{x}{a \code{\link[base]{vector}} giving the values of the predictor variable, or a
\code{\link[base]{list}} or a two-column \code{\link[base]{matrix}} specifying \code{x} and \code{y}.
If \code{x} is of class \code{smooth.spline} then \code{x$x} is used
as the \code{x} values and \code{x$yin} are used as the \code{y}
values.}
\item{y}{responses. If \code{y} is missing, the responses are assumed to be
specified by \code{x}.}
\item{w}{a \code{\link[base]{vector}} of weights the same length as \code{x} giving the weights
to use for each element of \code{x}. Default value is equal weight
to all values.}
\item{...}{Other arguments passed to \code{\link[stats]{smooth.spline}}.}
\item{minIter}{the minimum number of iterations used to fit the smoothing
spline robustly. Default value is 3.}
\item{maxIter}{the maximum number of iterations used to fit the smoothing
    spline robustly. Default value is \code{max(minIter, 50)}.}
\item{method}{the method used to compute robustness weights at each
iteration. Default value is \code{"L1"}, which uses the inverse of
the absolute value of the residuals. Using \code{"symmetric"} will
use Tukey's biweight with cut-off equal to six times the MAD of
the residuals, equivalent to \code{\link[stats]{lowess}}.}
  \item{sdCriteria}{Convergence criteria, which is the difference between the
standard deviation of the residuals between two consecutive
iteration steps. Default value is 2e-4.}
\item{reps}{Small positive number added to residuals to avoid division by
zero when calculating new weights for next iteration.}
\item{tol}{Passed to \code{\link[stats]{smooth.spline}} (R >= 2.14.0).}
\item{plotCurves}{If \code{\link[base:logical]{TRUE}}, the fitted splines are added to the current
plot, otherwise not.}
}
\value{
Returns an object of class \code{smooth.spline}.
}
\examples{
data(cars)
attach(cars)
plot(speed, dist, main = "data(cars) & robust smoothing splines")
# Fit a smoothing spline using L_2 norm
cars.spl <- smooth.spline(speed, dist)
lines(cars.spl, col = "blue")
# Fit a smoothing spline using L_1 norm
cars.rspl <- robustSmoothSpline(speed, dist)
lines(cars.rspl, col = "red")
# Fit a smoothing spline using L_2 norm with 10 degrees of freedom
lines(smooth.spline(speed, dist, df=10), lty=2, col = "blue")
# Fit a smoothing spline using L_1 norm with 10 degrees of freedom
lines(robustSmoothSpline(speed, dist, df=10), lty=2, col = "red")
legend(5,120, c(
paste("smooth.spline [C.V.] => df =",round(cars.spl$df,1)),
paste("robustSmoothSpline [C.V.] => df =",round(cars.rspl$df,1)),
"standard with s( * , df = 10)", "robust with s( * , df = 10)"
), col = c("blue","red","blue","red"), lty = c(1,1,2,2), bg='bisque')
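# A rough illustration (not part of the original example) of the
# reweighting idea described above: one manual iteration of
# smooth.spline() with weights equal to the inverse absolute residuals.
fit0 <- smooth.spline(speed, dist)
r <- dist - predict(fit0, speed)$y
w <- 1/(abs(r) + 1e-15)
fit1 <- smooth.spline(speed, dist, w=w)
lines(fit1, col = "purple", lty = 3)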
}
\seealso{
  The implementation of this function was adapted from
\code{\link[stats]{smooth.spline}} of the \pkg{stats} package.
Because of this, this function is also licensed under GPL v2.
}
\author{Henrik Bengtsson}
\keyword{smooth}
\keyword{robust}
aroma.light/man/Non-documented_objects.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% 999.NonDocumentedObjects.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{Non-documented objects}
\alias{Non-documented objects}
\title{Non-documented objects}
% Plot functions
\alias{lines.XYCurveFit}
% Matrix operations
\alias{rowAverages}
\alias{rowAverages.matrix}
% Simple linear-algebra
\alias{projectUontoV}
\alias{scalarProduct}
\alias{tr}
% Miscellaneous statistical functions
\alias{likelihood}
\alias{predict.lowess}
\description{
This page contains aliases for all "non-documented" objects that
\code{R CMD check} detects in this package.
  Almost all of them are \emph{generic} functions that have specific
  documentation for the corresponding method coupled to a specific class.
  Other functions are re-defined by \code{setMethodS3()} to
  \emph{default} methods. Neither of these two classes of functions is
  truly non-documented.
The rest are deprecated methods.
}
\keyword{documentation}
\keyword{internal}
aroma.light/man/findPeaksAndValleys.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% findPeaksAndValleys.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{findPeaksAndValleys}
\alias{findPeaksAndValleys}
\alias{findPeaksAndValleys.density}
\alias{findPeaksAndValleys.numeric}
\title{Finds extreme points in the empirical density estimated from data}
\description{
Finds extreme points in the empirical density estimated from data.
}
\usage{
\method{findPeaksAndValleys}{density}(x, tol=0, ...)
\method{findPeaksAndValleys}{numeric}(x, ..., tol=0, na.rm=TRUE)
}
\arguments{
\item{x}{A \code{\link[base]{numeric}} \code{\link[base]{vector}} containing data points or
a \code{\link[stats]{density}} object.}
\item{...}{Arguments passed to \code{\link[stats]{density}}.
Ignored if \code{x} is a \code{\link[stats]{density}} object.}
\item{tol}{A non-negative \code{\link[base]{numeric}} threshold specifying the minimum
density at the extreme point in order to accept it.}
\item{na.rm}{If \code{\link[base:logical]{TRUE}}, missing values are dropped, otherwise not.}
}
\value{
Returns a \code{\link[base]{data.frame}} (of class 'PeaksAndValleys') containing
of "peaks" and "valleys" filtered by \code{tol}.
}
\examples{
layout(matrix(1:3, ncol=1))
par(mar=c(2,4,4,1)+0.1)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A unimodal distribution
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
x1 <- rnorm(n=10000, mean=0, sd=1)
x <- x1
fit <- findPeaksAndValleys(x)
print(fit)
plot(density(x), lwd=2, main="x1")
abline(v=fit$x)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A trimodal distribution
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
x2 <- rnorm(n=10000, mean=4, sd=1)
x3 <- rnorm(n=10000, mean=8, sd=1)
x <- c(x1,x2,x3)
fit <- findPeaksAndValleys(x)
print(fit)
plot(density(x), lwd=2, main="c(x1,x2,x3)")
abline(v=fit$x)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A trimodal distribution with clear separation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
x1b <- rnorm(n=10000, mean=0, sd=0.1)
x2b <- rnorm(n=10000, mean=4, sd=0.1)
x3b <- rnorm(n=10000, mean=8, sd=0.1)
x <- c(x1b,x2b,x3b)
# Illustrating explicit usage of density()
d <- density(x)
fit <- findPeaksAndValleys(d, tol=0)
print(fit)
plot(d, lwd=2, main="c(x1b,x2b,x3b)")
abline(v=fit$x)
}
\author{Henrik Bengtsson}
\seealso{
This function is used by \code{\link{callNaiveGenotypes}}().
}
\keyword{methods}
\keyword{internal}
aroma.light/man/plotDensity.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% plotDensity.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{plotDensity}
\alias{plotDensity}
\alias{plotDensity.list}
\alias{plotDensity.data.frame}
\alias{plotDensity.matrix}
\alias{plotDensity.numeric}
\alias{plotDensity.density}
\title{Plots density distributions for a set of vectors}
\description{
Plots density distributions for a set of vectors.
}
\usage{
\method{plotDensity}{data.frame}(X, ..., xlab=NULL)
\method{plotDensity}{matrix}(X, ..., xlab=NULL)
\method{plotDensity}{numeric}(X, ..., xlab=NULL)
\method{plotDensity}{list}(X, W=NULL, xlim=NULL, ylim=NULL, xlab=NULL,
ylab="density (integrates to one)", col=1:length(X), lty=NULL, lwd=NULL, ...,
add=FALSE)
}
\arguments{
 \item{X}{A single or a \code{\link[base]{list}} of \code{\link[base]{numeric}} \code{\link[base]{vector}}s or \code{\link[stats]{density}}
objects, a \code{\link[base]{numeric}} \code{\link[base]{matrix}}, or a \code{\link[base]{numeric}} \code{\link[base]{data.frame}}.}
\item{W}{(optional) weights of similar data types and dimensions as
\code{X}.}
 \item{xlim,ylim}{\code{\link[base]{numeric}} \code{\link[base]{vector}} of length 2. The x and y limits.}
\item{xlab,ylab}{\code{\link[base]{character}} string for labels on x and y axis.}
\item{col}{The color(s) of the curves.}
\item{lty}{The types of curves.}
\item{lwd}{The width of curves.}
\item{...}{Additional arguments passed to \code{\link[stats]{density}},
\code{\link[graphics]{plot}}(), and \code{\link[graphics]{lines}}.}
\item{add}{If \code{\link[base:logical]{TRUE}}, the curves are plotted in the current plot,
    otherwise a new plot is created.}
}
\seealso{
Internally, \code{\link[stats]{density}} is used to estimate the
empirical density.
}
\author{Henrik Bengtsson}
\keyword{methods}
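\examples{
# A minimal, hypothetical sketch (not part of the original documentation):
X <- list(
  a = rnorm(10000, mean=0, sd=1),
  b = rnorm(10000, mean=2, sd=1),
  c = rexp(10000, rate=1)
)
plotDensity(X, lwd=2)
}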
aroma.light/man/normalizeFragmentLength.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeFragmentLength.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeFragmentLength}
\alias{normalizeFragmentLength.default}
\alias{normalizeFragmentLength}
\title{Normalizes signals for PCR fragment-length effects}
\description{
 Normalizes signals for PCR fragment-length effects. Some or all signals are used to estimate the
normalization function. All signals are normalized.
}
\usage{
\method{normalizeFragmentLength}{default}(y, fragmentLengths, targetFcns=NULL, subsetToFit=NULL,
onMissing=c("ignore", "median"), .isLogged=TRUE, ..., .returnFit=FALSE)
}
\arguments{
\item{y}{A \code{\link[base]{numeric}} \code{\link[base]{vector}} of length K of signals to be normalized
across E enzymes.}
\item{fragmentLengths}{An \code{\link[base]{integer}} KxE \code{\link[base]{matrix}} of fragment lengths.}
\item{targetFcns}{An optional \code{\link[base]{list}} of E \code{\link[base]{function}}s; one per enzyme.
If \code{\link[base]{NULL}}, the data is normalized to have constant fragment-length
effects (all equal to zero on the log-scale).}
\item{subsetToFit}{The subset of data points used to fit the
normalization function.
If \code{\link[base]{NULL}}, all data points are considered.}
\item{onMissing}{Specifies how data points for which there is no
   fragment length are normalized.
If \code{"ignore"}, the values are not modified.
If \code{"median"}, the values are updated to have the same
robust average as the other data points.
}
\item{.isLogged}{A \code{\link[base]{logical}}.}
\item{...}{Additional arguments passed to \code{\link[stats]{lowess}}.}
\item{.returnFit}{A \code{\link[base]{logical}}.}
}
\value{
Returns a \code{\link[base]{numeric}} \code{\link[base]{vector}} of the normalized signals.
}
\section{Multi-enzyme normalization}{
  It is assumed that the fragment-length effects from multiple enzymes
  are additive (with equal weights) on the intensity scale.
The fragment-length effects are fitted for each enzyme separately based
on units that are exclusively for that enzyme.
  \emph{If there are no or very few such units for an enzyme, the assumptions
of the model are not met and the fit will fail with an error.}
Then, from the above single-enzyme fits the average effect across
  enzymes is then calculated for each unit that is on multiple enzymes.
}
\section{Target functions}{
It is possible to specify custom target function effects for each
enzyme via argument \code{targetFcns}. This argument has to be a
\code{\link[base]{list}} containing one \code{\link[base]{function}} per enzyme and ordered in the same
order as the enzyme are in the columns of argument
\code{fragmentLengths}.
  For instance, if one wishes to normalize the signals such that their
  mean signal as a function of fragment length is constantly
  equal to 2200 (on the intensity scale), then use
\code{targetFcns=function(fl, ...) log2(2200)} which completely
ignores fragment-length argument 'fl' and always returns a
constant.
If two enzymes are used, then use
\code{targetFcns=rep(list(function(fl, ...) log2(2200)), 2)}.
Note, if \code{targetFcns} is \code{\link[base]{NULL}}, this corresponds to
\code{targetFcns=rep(list(function(fl, ...) 0), ncol(fragmentLengths))}.
Alternatively, if one wants to only apply minimal corrections to
the signals, then one can normalize toward target functions that
correspond to the fragment-length effect of the average array.
}
\examples{
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Example 1: Single-enzyme fragment-length normalization of 9 arrays
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Number samples
I <- 9
# Number of loci
J <- 1000
# Fragment lengths
fl <- seq(from=100, to=1000, length.out=J)
# Simulate data points with unknown fragment lengths
hasUnknownFL <- seq(from=1, to=J, by=50)
fl[hasUnknownFL] <- NA
# Simulate data
y <- matrix(0, nrow=J, ncol=I)
maxY <- 12
for (kk in 1:I) {
k <- runif(n=1, min=3, max=5)
mu <- function(fl) {
mu <- rep(maxY, length(fl))
ok <- !is.na(fl)
mu[ok] <- mu[ok] - fl[ok]^{1/k}
mu
}
eps <- rnorm(J, mean=0, sd=1)
y[,kk] <- mu(fl) + eps
}
# Normalize data (to a zero baseline)
yN <- apply(y, MARGIN=2, FUN=function(y) {
normalizeFragmentLength(y, fragmentLengths=fl, onMissing="median")
})
# The correction factors
rho <- y-yN
print(summary(rho))
# The correction for units with unknown fragment lengths
# equals the median correction factor of all other units
print(summary(rho[hasUnknownFL,]))
# Plot raw data
layout(matrix(1:9, ncol=3))
xlim <- c(0,max(fl, na.rm=TRUE))
ylim <- c(0,max(y, na.rm=TRUE))
xlab <- "Fragment length"
ylab <- expression(log2(theta))
for (kk in 1:I) {
plot(fl, y[,kk], xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab)
ok <- (is.finite(fl) & is.finite(y[,kk]))
lines(lowess(fl[ok], y[ok,kk]), col="red", lwd=2)
}
# Plot normalized data
layout(matrix(1:9, ncol=3))
ylim <- c(-1,1)*max(y, na.rm=TRUE)/2
for (kk in 1:I) {
plot(fl, yN[,kk], xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab)
ok <- (is.finite(fl) & is.finite(y[,kk]))
lines(lowess(fl[ok], yN[ok,kk]), col="blue", lwd=2)
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Example 2: Two-enzyme fragment-length normalization of 5 arrays
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
set.seed(0xbeef)
# Number samples
I <- 5
# Number of loci
J <- 3000
# Fragment lengths (two enzymes)
fl <- matrix(0, nrow=J, ncol=2)
fl[,1] <- seq(from=100, to=1000, length.out=J)
fl[,2] <- seq(from=1000, to=100, length.out=J)
# Let 1/2 of the units be on both enzymes
fl[seq(from=1, to=J, by=4),1] <- NA
fl[seq(from=2, to=J, by=4),2] <- NA
# Let some have unknown fragment lengths
hasUnknownFL <- seq(from=1, to=J, by=15)
fl[hasUnknownFL,] <- NA
# Sty/Nsp mixing proportions:
rho <- rep(1, I)
rho[1] <- 1/3; # Less Sty in 1st sample
rho[3] <- 3/2; # More Sty in 3rd sample
# Simulate data
z <- array(0, dim=c(J,2,I))
maxLog2Theta <- 12
for (ii in 1:I) {
# Common effect for both enzymes
mu <- function(fl) {
k <- runif(n=1, min=3, max=5)
mu <- rep(maxLog2Theta, length(fl))
ok <- is.finite(fl)
mu[ok] <- mu[ok] - fl[ok]^{1/k}
mu
}
# Calculate the effect for each data point
for (ee in 1:2) {
z[,ee,ii] <- mu(fl[,ee])
}
# Update the Sty/Nsp mixing proportions
ee <- 2
z[,ee,ii] <- rho[ii]*z[,ee,ii]
# Add random errors
for (ee in 1:2) {
eps <- rnorm(J, mean=0, sd=1/sqrt(2))
z[,ee,ii] <- z[,ee,ii] + eps
}
}
hasFl <- is.finite(fl)
unitSets <- list(
nsp = which( hasFl[,1] & !hasFl[,2]),
sty = which(!hasFl[,1] & hasFl[,2]),
both = which( hasFl[,1] & hasFl[,2]),
none = which(!hasFl[,1] & !hasFl[,2])
)
# The observed data is a mix of two enzymes
theta <- matrix(NA_real_, nrow=J, ncol=I)
# Single-enzyme units
for (ee in 1:2) {
uu <- unitSets[[ee]]
theta[uu,] <- 2^z[uu,ee,]
}
# Both-enzyme units (sum on intensity scale)
uu <- unitSets$both
theta[uu,] <- (2^z[uu,1,]+2^z[uu,2,])/2
# Missing units (sample from the others)
uu <- unitSets$none
theta[uu,] <- apply(theta, MARGIN=2, sample, size=length(uu))
# Calculate target array
thetaT <- rowMeans(theta, na.rm=TRUE)
targetFcns <- list()
for (ee in 1:2) {
uu <- unitSets[[ee]]
fit <- lowess(fl[uu,ee], log2(thetaT[uu]))
class(fit) <- "lowess"
targetFcns[[ee]] <- function(fl, ...) {
predict(fit, newdata=fl)
}
}
# Fit model only to a subset of the data
subsetToFit <- setdiff(1:J, seq(from=1, to=J, by=10))
# Normalize data (to a target baseline)
thetaN <- matrix(NA_real_, nrow=J, ncol=I)
fits <- vector("list", I)
for (ii in 1:I) {
lthetaNi <- normalizeFragmentLength(log2(theta[,ii]), targetFcns=targetFcns,
fragmentLengths=fl, onMissing="median",
subsetToFit=subsetToFit, .returnFit=TRUE)
fits[[ii]] <- attr(lthetaNi, "modelFit")
thetaN[,ii] <- 2^lthetaNi
}
# Plot raw data
xlim <- c(0, max(fl, na.rm=TRUE))
ylim <- c(0, max(log2(theta), na.rm=TRUE))
Mlim <- c(-1,1)*4
xlab <- "Fragment length"
ylab <- expression(log2(theta))
Mlab <- expression(M==log[2](theta/theta[R]))
layout(matrix(1:(3*I), ncol=I, byrow=TRUE))
for (ii in 1:I) {
plot(NA, xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab, main="raw")
# Single-enzyme units
for (ee in 1:2) {
# The raw data
uu <- unitSets[[ee]]
points(fl[uu,ee], log2(theta[uu,ii]), col=ee+1)
}
# Both-enzyme units (use fragment-length for enzyme #1)
uu <- unitSets$both
points(fl[uu,1], log2(theta[uu,ii]), col=3+1)
for (ee in 1:2) {
# The true effects
uu <- unitSets[[ee]]
lines(lowess(fl[uu,ee], log2(theta[uu,ii])), col="black", lwd=4, lty=3)
# The estimated effects
fit <- fits[[ii]][[ee]]$fit
lines(fit, col="orange", lwd=3)
muT <- targetFcns[[ee]](fl[uu,ee])
lines(fl[uu,ee], muT, col="cyan", lwd=1)
}
}
# Calculate log-ratios
thetaR <- rowMeans(thetaN, na.rm=TRUE)
M <- log2(thetaN/thetaR)
# Plot normalized data
for (ii in 1:I) {
plot(NA, xlim=xlim, ylim=Mlim, xlab=xlab, ylab=Mlab, main="normalized")
# Single-enzyme units
for (ee in 1:2) {
# The normalized data
uu <- unitSets[[ee]]
points(fl[uu,ee], M[uu,ii], col=ee+1)
}
# Both-enzyme units (use fragment-length for enzyme #1)
uu <- unitSets$both
points(fl[uu,1], M[uu,ii], col=3+1)
}
ylim <- c(0,1.5)
for (ii in 1:I) {
data <- list()
for (ee in 1:2) {
# The normalized data
uu <- unitSets[[ee]]
data[[ee]] <- M[uu,ii]
}
uu <- unitSets$both
if (length(uu) > 0)
data[[3]] <- M[uu,ii]
uu <- unitSets$none
if (length(uu) > 0)
data[[4]] <- M[uu,ii]
cols <- seq_along(data)+1
plotDensity(data, col=cols, xlim=Mlim, xlab=Mlab, main="normalized")
abline(v=0, lty=2)
}
}
\author{Henrik Bengtsson}
\references{
[1] H. Bengtsson, R. Irizarry, B. Carvalho, and T. Speed, \emph{Estimation and assessment of raw copy numbers at the single locus level}, Bioinformatics, 2008.
\cr
}
\keyword{nonparametric}
\keyword{robust}
aroma.light/man/normalizeDifferencesToAverage.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeDifferencesToAverage.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeDifferencesToAverage}
\alias{normalizeDifferencesToAverage}
\alias{normalizeDifferencesToAverage.list}
\title{Rescales channel vectors to get the same average}
\description{
Rescales channel vectors to get the same average.
}
\usage{
\method{normalizeDifferencesToAverage}{list}(x, baseline=1, FUN=median, ...)
}
\arguments{
\item{x}{A \code{\link[base]{numeric}} \code{\link[base]{list}} of length K.}
\item{baseline}{An \code{\link[base]{integer}} in [1,K] specifying which channel should be
the baseline. The baseline channel will be almost unchanged.
If \code{\link[base]{NULL}}, the channels will be shifted towards median of them all.}
\item{FUN}{A \code{\link[base]{function}} for calculating the average of one channel.}
\item{...}{Additional arguments passed to the \code{avg} \code{\link[base]{function}}.}
}
\value{
Returns a normalized \code{\link[base]{list}} of length K.
}
\examples{
# Simulate three shifted tracks of different lengths with same profiles
ns <- c(A=2, B=1, C=0.25)*1000
xx <- lapply(ns, FUN=function(n) { seq(from=1, to=max(ns), length.out=n) })
zz <- mapply(seq_along(ns), ns, FUN=function(z,n) rep(z,n))
yy <- list(
A = rnorm(ns["A"], mean=0, sd=0.5),
B = rnorm(ns["B"], mean=5, sd=0.4),
C = rnorm(ns["C"], mean=-5, sd=1.1)
)
yy <- lapply(yy, FUN=function(y) {
n <- length(y)
y[1:(n/2)] <- y[1:(n/2)] + 2
y[1:(n/4)] <- y[1:(n/4)] - 4
y
})
# Shift all tracks toward the first track
yyN <- normalizeDifferencesToAverage(yy, baseline=1)
# The baseline channel is not changed
stopifnot(identical(yy[[1]], yyN[[1]]))
# Get the estimated parameters
fit <- attr(yyN, "fit")
# Plot the tracks
layout(matrix(1:2, ncol=1))
x <- unlist(xx)
col <- unlist(zz)
y <- unlist(yy)
yN <- unlist(yyN)
plot(x, y, col=col, ylim=c(-10,10))
plot(x, yN, col=col, ylim=c(-10,10))
}
\author{Henrik Bengtsson}
\keyword{methods}
aroma.light/man/aroma.light-package.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% 999.package.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{aroma.light-package}
\alias{aroma.light-package}
\alias{aroma.light}
\docType{package}
\title{Package aroma.light}
\encoding{latin1}
\description{
Methods for microarray analysis that take basic data types such as matrices and lists of vectors. These methods can be used standalone, be utilized in other packages, or be wrapped up in higher-level classes.
}
\section{Installation}{
To install this package, see
\url{https://bioconductor.org/packages/release/bioc/html/aroma.light.html}.
}
\section{To get started}{
For scanner calibration:
\enumerate{
\item see \code{\link{calibrateMultiscan}}() - scan the same array two or more times to calibrate for scanner effects and extended dynamical range.
}
To normalize multiple single-channel arrays all with the same number of probes/spots:
\enumerate{
\item \code{\link{normalizeAffine}}() - normalizes, on the intensity scale, for differences in offset and scale between channels.
\item \code{\link{normalizeQuantileRank}}(), \code{\link{normalizeQuantileSpline}}() - normalizes, on the intensity scale, for differences in empirical distribution between channels.
}
To normalize multiple single-channel arrays with varying number probes/spots:
\enumerate{
\item \code{\link{normalizeQuantileRank}}(), \code{\link{normalizeQuantileSpline}}() - normalizes, on the intensity scale, for differences in empirical distribution between channels.
}
To normalize two-channel arrays:
\enumerate{
  \item \code{\link{normalizeAffine}}() - normalizes, on the intensity scale, for differences in offset and scale between channels. This will also correct for intensity-dependent effects on the log scale.
\item \code{\link{normalizeCurveFit}}() - Classical intensity-dependent normalization, on the log scale, e.g. lowess normalization.
}
To normalize three or more channels:
\enumerate{
\item \code{\link{normalizeAffine}}() - normalizes, on the intensity scale, for differences in offset and scale between channels. This will minimize the curvature on the log scale between any two channels.
}
}
\section{Further readings}{
Several of the normalization methods proposed in [1]-[7] are
available in this package.
}
\section{How to cite this package}{
Whenever using this package, please cite one or more of [1]-[7].
}
\section{Wishlist}{
Here is a list of features that would be useful, but which I have
too little time to add myself. Contributions are appreciated.
\itemize{
\item At the moment, nothing.
}
If you consider to contribute, make sure it is not already
implemented by downloading the latest "devel" version!
}
\author{Henrik Bengtsson, Pierre Neuvial, Aaron Lun}
\section{License}{
  The releases of this package are licensed under
GPL version 2 or newer.
NB: Except for the \code{robustSmoothSpline()} method,
it is alright to distribute the rest of the package under
LGPL version 2.1 or newer.
  The development code of the package is under a private license
(where applicable) and patches sent to the author fall under the
latter license, but will be, if incorporated, released under the
"release" license above.
}
\references{
Some of the reference below can be found at
\url{https://www.aroma-project.org/publications/}.\cr
[1] H. Bengtsson, \emph{Identification and normalization of plate effects
in cDNA microarray data}, Preprints in Mathematical Sciences,
2002:28, Mathematical Statistics, Centre for Mathematical Sciences,
Lund University, 2002.\cr
[2] H. Bengtsson, \emph{The R.oo package - Object-Oriented Programming with References Using Standard R Code}, In Kurt Hornik, Friedrich Leisch and Achim Zeileis, editors, Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 20-22, Vienna, Austria. \url{http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings/}
\cr
[3] H. Bengtsson, \emph{aroma - An R Object-oriented Microarray
Analysis environment}, Preprints in Mathematical Sciences (manuscript
in preparation), Mathematical Statistics, Centre for Mathematical
Sciences, Lund University, 2004.\cr
 [4] H. Bengtsson, J. Vallon-Christersson and G. \enc{Jönsson}{Jonsson}, \emph{Calibration and assessment of channel-specific biases in microarray data with extended dynamical range}, BMC Bioinformatics, 5:177, 2004.
\cr
 [5] Henrik Bengtsson and Ola \enc{Hössjer}{Hossjer}, \emph{Methodological Study of Affine Transformations of Gene Expression Data}, Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method, BMC Bioinformatics, 2006, 7:100.
\cr
[6] H. Bengtsson, R. Irizarry, B. Carvalho, and T. Speed, \emph{Estimation and assessment of raw copy numbers at the single locus level}, Bioinformatics, 2008.
\cr
 [7] H. Bengtsson, A. Ray, P. Spellman and T.P. Speed, \emph{A single-sample method for normalizing and combining full-resolution copy numbers from multiple platforms, labs and analysis methods}, Bioinformatics, 2009.
\cr
[8] H. Bengtsson, P. Neuvial and T.P. Speed, \emph{TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays}, BMC Bioinformatics, 2010, 11:245. [PMID 20462408]
\cr
}
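\examples{
# A minimal, hypothetical quick-start sketch (not part of the original
# documentation): quantile normalization of three simulated
# single-channel arrays, cf. the "To get started" section above.
X <- list(
  A = rnorm(2000, mean=6.0, sd=1.0),
  B = rnorm(2000, mean=6.5, sd=1.2),
  C = rnorm(2000, mean=5.8, sd=0.9)
)
Xn <- normalizeQuantileRank(X)
plotDensity(Xn, lwd=2)
}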
\keyword{package}
aroma.light/man/normalizeCurveFit.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeCurveFit.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeCurveFit}
\alias{normalizeCurveFit}
\alias{normalizeLoess}
\alias{normalizeLowess}
\alias{normalizeSpline}
\alias{normalizeRobustSpline}
\alias{normalizeCurveFit.matrix}
\alias{normalizeLoess.matrix}
\alias{normalizeLowess.matrix}
\alias{normalizeSpline.matrix}
\alias{normalizeRobustSpline.matrix}
\encoding{latin1}
\title{Weighted curve-fit normalization between a pair of channels}
\description{
Weighted curve-fit normalization between a pair of channels.
This method will estimate a smooth function of the dependency
between the log-ratios and the log-intensity of the two channels and
then correct the log-ratios (only) in order to remove the dependency.
  This method is also known as \emph{intensity-dependent} or
\emph{lowess normalization}.
The curve-fit methods are by nature limited to paired-channel data.
  There exists at least one method trying to overcome this limitation,
namely the cyclic-lowess [1], which applies the paired
curve-fit method iteratively over all pairs of channels/arrays.
Cyclic-lowess is not implemented here.
We recommend that affine normalization [2] is used instead of curve-fit
normalization.
}
\usage{
\method{normalizeCurveFit}{matrix}(X, weights=NULL, typeOfWeights=c("datapoint"),
method=c("loess", "lowess", "spline", "robustSpline"), bandwidth=NULL,
satSignal=2^16 - 1, ...)
\method{normalizeLoess}{matrix}(X, ...)
\method{normalizeLowess}{matrix}(X, ...)
\method{normalizeSpline}{matrix}(X, ...)
\method{normalizeRobustSpline}{matrix}(X, ...)
}
\arguments{
\item{X}{An Nx2 \code{\link[base]{matrix}} where the columns represent the two channels
to be normalized.}
\item{weights}{If \code{\link[base]{NULL}}, non-weighted normalization is done.
If data-point weights are used, this should be a \code{\link[base]{vector}} of length
N of data point weights used when estimating the normalization
function.
}
\item{typeOfWeights}{A \code{\link[base]{character}} string specifying the type of
weights given in argument \code{weights}.
}
\item{method}{\code{\link[base]{character}} string specifying which method to use when
fitting the intensity-dependent function.
Supported methods:
\code{"loess"} (better than lowess),
\code{"lowess"} (classic; supports only zero-one weights),
\code{"spline"} (more robust than lowess at lower and upper
intensities; supports only zero-one weights),
\code{"robustSpline"} (better than spline).
}
\item{bandwidth}{A \code{\link[base]{double}} value specifying the bandwidth of the
estimator used.
}
\item{satSignal}{Signals equal to or above this threshold will not
be used in the fitting.
}
\item{...}{Not used.}
}
\value{
A Nx2 \code{\link[base]{matrix}} of the normalized two channels.
The fitted model is returned as attribute \code{modelFit}.
}
\details{
A smooth function \eqn{c(A)} is fitted through data in \eqn{(A,M)},
where \eqn{M=log_2(y_2/y_1)} and \eqn{A=1/2*log_2(y_2*y_1)}. Data is
normalized by \eqn{M <- M - c(A)}.
Loess is by far the slowest method of the four, then lowess, and then
robust spline, which iteratively calls the spline method.
}
\section{Negative, non-positive, and saturated values}{
Non-positive values are set to not-a-number (\code{\link[base:is.finite]{NaN}}).
Data points that are saturated in one or more channels are not used
to estimate the normalization function, but they are normalized.
}
\section{Missing values}{
The estimation of the normalization function will only be made
based on complete non-saturated observations, i.e. observations that
contains no \code{\link[base]{NA}} values nor saturated values as defined by \code{satSignal}.
}
\section{Weighted normalization}{
Each data point, that is, each row in \code{X}, which is a
vector of length 2, can be assigned a weight in [0,1] specifying how much
it should \emph{affect the fitting of the normalization function}.
Weights are given by argument \code{weights}, which should be a \code{\link[base]{numeric}}
\code{\link[base]{vector}} of length N. Regardless of weights, all data points are
\emph{normalized} based on the fitted normalization function.
Note that the lowess and the spline method only support zero-one
\{0,1\} weights.
For such methods, all weights that are less than a half are set to zero.
}
\section{Details on loess}{
For \code{\link[stats]{loess}}, the arguments \code{family="symmetric"},
\code{degree=1}, \code{span=3/4},
\code{control=loess.control(trace.hat="approximate"},
\code{iterations=5}, \code{surface="direct")} are used.
}
\author{Henrik Bengtsson}
\references{
  [1] M. \enc{Åstrand}{Astrand},
Contrast Normalization of Oligonucleotide Arrays,
Journal Computational Biology, 2003, 10, 95-102. \cr
  [2] Henrik Bengtsson and Ola \enc{Hössjer}{Hossjer}, \emph{Methodological Study of Affine Transformations of Gene Expression Data}, Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method, BMC Bioinformatics, 2006, 7:100.
\cr
}
\examples{
pathname <- system.file("data-ex", "PMT-RGData.dat", package="aroma.light")
rg <- read.table(pathname, header=TRUE, sep="\t")
nbrOfScans <- max(rg$slide)
rg <- as.list(rg)
for (field in c("R", "G"))
rg[[field]] <- matrix(as.double(rg[[field]]), ncol=nbrOfScans)
rg$slide <- rg$spot <- NULL
rg <- as.matrix(as.data.frame(rg))
colnames(rg) <- rep(c("R", "G"), each=nbrOfScans)
layout(matrix(c(1,2,0,3,4,0,5,6,7), ncol=3, byrow=TRUE))
rgC <- rg
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
channelColor <- switch(channel, R="red", G="green")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The raw data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
plotMvsAPairs(rg[,sidx])
title(main=paste("Observed", channel))
box(col=channelColor)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The calibrated data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgC[,sidx] <- calibrateMultiscan(rg[,sidx], average=NULL)
plotMvsAPairs(rgC[,sidx])
title(main=paste("Calibrated", channel))
box(col=channelColor)
} # for (channel ...)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The average calibrated data
#
# Note how the red signals are weaker than the green. The reason
# for this can be that the scale factor in the green channel is
# greater than in the red channel, but it can also be that there
# is a remaining relative difference in bias between the green
# and the red channel, a bias that precedes the scanning.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgCA <- rg
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
rgCA[,sidx] <- calibrateMultiscan(rg[,sidx])
}
rgCAavg <- matrix(NA_real_, nrow=nrow(rgCA), ncol=2)
colnames(rgCAavg) <- c("R", "G")
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
rgCAavg[,channel] <- apply(rgCA[,sidx], MARGIN=1, FUN=median, na.rm=TRUE)
}
# Add some "fake" outliers
outliers <- 1:600
rgCAavg[outliers,"G"] <- 50000
plotMvsA(rgCAavg)
title(main="Average calibrated (AC)")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Normalize data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Weight-down outliers when normalizing
weights <- rep(1, nrow(rgCAavg))
weights[outliers] <- 0.001
# Affine normalization of channels
rgCANa <- normalizeAffine(rgCAavg, weights=weights)
# It is always ok to rescale the affine normalized data if it is
# done on (R,G); not on (A,M)! However, this is only needed for
# esthetic purposes.
rgCANa <- rgCANa *2^1.4
plotMvsA(rgCANa)
title(main="Normalized AC")
# Curve-fit (lowess) normalization
rgCANlw <- normalizeLowess(rgCAavg, weights=weights)
plotMvsA(rgCANlw, col="orange", add=TRUE)
# Curve-fit (loess) normalization
rgCANl <- normalizeLoess(rgCAavg, weights=weights)
plotMvsA(rgCANl, col="red", add=TRUE)
# Curve-fit (robust spline) normalization
rgCANrs <- normalizeRobustSpline(rgCAavg, weights=weights)
plotMvsA(rgCANrs, col="blue", add=TRUE)
legend(x=0,y=16, legend=c("affine", "lowess", "loess", "r. spline"), pch=19,
col=c("black", "orange", "red", "blue"), ncol=2, x.intersp=0.3, bty="n")
plotMvsMPairs(cbind(rgCANa, rgCANlw), col="orange", xlab=expression(M[affine]))
title(main="Normalized AC")
plotMvsMPairs(cbind(rgCANa, rgCANl), col="red", add=TRUE)
plotMvsMPairs(cbind(rgCANa, rgCANrs), col="blue", add=TRUE)
abline(a=0, b=1, lty=2)
legend(x=-6,y=6, legend=c("lowess", "loess", "r. spline"), pch=19,
col=c("orange", "red", "blue"), ncol=2, x.intersp=0.3, bty="n")
}
\seealso{
\code{\link{normalizeAffine}}().
}
\keyword{methods}
aroma.light/man/fitPrincipalCurve.Rd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% fitPrincipalCurve.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{fitPrincipalCurve}
\alias{fitPrincipalCurve}
\alias{fitPrincipalCurve.matrix}
\title{Fit a principal curve in K dimensions}
\description{
Fit a principal curve in K dimensions.
}
\usage{
\method{fitPrincipalCurve}{matrix}(X, ..., verbose=FALSE)
}
\arguments{
\item{X}{An NxK \code{\link[base]{matrix}} (K>=2) where the columns represent the dimension.}
\item{...}{Other arguments passed to \code{\link[princurve]{principal_curve}}.}
\item{verbose}{A \code{\link[base]{logical}} or a \code{\link[R.utils]{Verbose}} object.}
}
\value{
Returns a principal_curve object (which is a \code{\link[base]{list}}).
See \code{\link[princurve]{principal_curve}} for more details.
}
\section{Missing values}{
The estimation of the normalization function will only be made
based on complete observations, i.e. observations that contain no \code{\link[base]{NA}}
values in any of the channels.
}
\author{Henrik Bengtsson}
\references{
[1] Hastie, T. and Stuetzle, W., \emph{Principal Curves}, JASA, 1989.\cr
[2] H. Bengtsson, A. Ray, P. Spellman and T.P. Speed, \emph{A single-sample method for normalizing and combining full-resolution copy numbers from multiple platforms, labs and analysis methods}, Bioinformatics, 2009.
\cr
}
\examples{
# Simulate data from the model y <- a + bx + x^c + eps(bx)
J <- 1000
x <- rexp(J)
a <- c(2,15,3)
b <- c(2,3,4)
c <- c(1,2,1/2)
bx <- outer(b,x)
xc <- t(sapply(c, FUN=function(c) x^c))
eps <- apply(bx, MARGIN=2, FUN=function(x) rnorm(length(b), mean=0, sd=0.1*x))
y <- a + bx + xc + eps
y <- t(y)
# Fit principal curve through (y_1, y_2, y_3)
fit <- fitPrincipalCurve(y, verbose=TRUE)
# Flip direction of 'lambda'?
rho <- cor(fit$lambda, y[,1], use="complete.obs")
flip <- (rho < 0)
if (flip) {
fit$lambda <- max(fit$lambda, na.rm=TRUE)-fit$lambda
}
# Backtransform (y_1, y_2, y_3) to be proportional to each other
yN <- backtransformPrincipalCurve(y, fit=fit)
# Same backtransformation dimension by dimension
yN2 <- y
for (cc in 1:ncol(y)) {
yN2[,cc] <- backtransformPrincipalCurve(y, fit=fit, dimensions=cc)
}
stopifnot(identical(yN2, yN))
xlim <- c(0, 1.04*max(x))
ylim <- range(c(y,yN), na.rm=TRUE)
# Pairwise signals vs x before and after transform
layout(matrix(1:4, nrow=2, byrow=TRUE))
par(mar=c(4,4,3,2)+0.1)
for (cc in 1:3) {
ylab <- substitute(y[c], env=list(c=cc))
plot(NA, xlim=xlim, ylim=ylim, xlab="x", ylab=ylab)
abline(h=a[cc], lty=3)
mtext(side=4, at=a[cc], sprintf("a=\%g", a[cc]),
cex=0.8, las=2, line=0, adj=1.1, padj=-0.2)
points(x, y[,cc])
points(x, yN[,cc], col="tomato")
legend("topleft", col=c("black", "tomato"), pch=19,
c("orignal", "transformed"), bty="n")
}
title(main="Pairwise signals vs x before and after transform", outer=TRUE, line=-2)
# Pairwise signals before and after transform
layout(matrix(1:4, nrow=2, byrow=TRUE))
par(mar=c(4,4,3,2)+0.1)
for (rr in 3:2) {
ylab <- substitute(y[c], env=list(c=rr))
for (cc in 1:2) {
if (cc == rr) {
plot.new()
next
}
xlab <- substitute(y[c], env=list(c=cc))
plot(NA, xlim=ylim, ylim=ylim, xlab=xlab, ylab=ylab)
abline(a=0, b=1, lty=2)
points(y[,c(cc,rr)])
points(yN[,c(cc,rr)], col="tomato")
legend("topleft", col=c("black", "tomato"), pch=19,
c("orignal", "transformed"), bty="n")
}
}
title(main="Pairwise signals before and after transform", outer=TRUE, line=-2)
}
\seealso{
\code{\link{backtransformPrincipalCurve}}().
\code{\link[princurve]{principal_curve}}.
}
\keyword{methods}
aroma.light/man/distanceBetweenLines.Rd 0000644 0001750 0001750 00000010742 14136047216 017762 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% distanceBetweenLines.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{distanceBetweenLines}
\alias{distanceBetweenLines.default}
\alias{distanceBetweenLines}
\title{Finds the shortest distance between two lines}
\description{
Finds the shortest distance between two lines.
Consider the two lines
\eqn{x(s) = a_x + b_x*s} and \eqn{y(t) = a_y + b_y*t}
in a K-dimensional space, where the offset and direction \code{\link[base]{vector}}s \eqn{a_x}
and \eqn{b_x} (in \eqn{R^K}) define the line \eqn{x(s)}
(\eqn{s} is a scalar), and similarly for the line \eqn{y(t)}.
This function finds the point \eqn{(s,t)} for which \eqn{|x(s)-y(t)|}
is minimal.
}
\usage{
\method{distanceBetweenLines}{default}(ax, bx, ay, by, ...)
}
\arguments{
\item{ax,bx}{Offset and direction \code{\link[base]{vector}} of length K for line \eqn{z_x}.}
\item{ay,by}{Offset and direction \code{\link[base]{vector}} of length K for line \eqn{z_y}.}
\item{...}{Not used.}
}
\value{
Returns a \code{\link[base]{list}} containing
\item{ax,bx}{The given line \eqn{x(s)}.}
\item{ay,by}{The given line \eqn{y(t)}.}
\item{s,t}{The values of \eqn{s} and \eqn{t} such that
\eqn{|x(s)-y(t)|} is minimal.}
\item{xs,yt}{The values of \eqn{x(s)} and \eqn{y(t)}
at the optimal point \eqn{(s,t)}.}
\item{distance}{The distance between the lines, i.e. \eqn{|x(s)-y(t)|}
at the optimal point \eqn{(s,t)}.}
}
\author{Henrik Bengtsson}
\examples{
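# First, a minimal plotting-free sketch (no extra packages needed); the two
# 2-D lines below are the same ones used in the plot3d() example that follows.
fitSketch <- distanceBetweenLines(ax=c(1,0), bx=c(1,2), ay=c(0,2), by=c(1,1))
print(fitSketch$distance)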
for (zzz in 0) {
# This example requires plot3d() in R.basic [http://www.braju.com/R/]
if (!require(pkgName <- "R.basic", character.only=TRUE)) break
layout(matrix(1:4, nrow=2, ncol=2, byrow=TRUE))
############################################################
# Lines in two-dimensions
############################################################
x <- list(a=c(1,0), b=c(1,2))
y <- list(a=c(0,2), b=c(1,1))
fit <- distanceBetweenLines(ax=x$a, bx=x$b, ay=y$a, by=y$b)
xlim <- ylim <- c(-1,8)
plot(NA, xlab="", ylab="", xlim=ylim, ylim=ylim)
# Highlight the offset coordinates for both lines
points(t(x$a), pch="+", col="red")
text(t(x$a), label=expression(a[x]), adj=c(-1,0.5))
points(t(y$a), pch="+", col="blue")
text(t(y$a), label=expression(a[y]), adj=c(-1,0.5))
v <- c(-1,1)*10;
xv <- list(x=x$a[1]+x$b[1]*v, y=x$a[2]+x$b[2]*v)
yv <- list(x=y$a[1]+y$b[1]*v, y=y$a[2]+y$b[2]*v)
lines(xv, col="red")
lines(yv, col="blue")
points(t(fit$xs), cex=2.0, col="red")
text(t(fit$xs), label=expression(x(s)), adj=c(+2,0.5))
points(t(fit$yt), cex=1.5, col="blue")
text(t(fit$yt), label=expression(y(t)), adj=c(-1,0.5))
print(fit)
############################################################
# Lines in three-dimensions
############################################################
x <- list(a=c(0,0,0), b=c(1,1,1)) # The 'diagonal'
y <- list(a=c(2,1,2), b=c(2,1,3)) # A 'fitted' line
fit <- distanceBetweenLines(ax=x$a, bx=x$b, ay=y$a, by=y$b)
xlim <- ylim <- zlim <- c(-1,3)
dummy <- t(c(1,1,1))*100;
# Coordinates for the lines in 3d
v <- seq(-10,10, by=1);
xv <- list(x=x$a[1]+x$b[1]*v, y=x$a[2]+x$b[2]*v, z=x$a[3]+x$b[3]*v)
yv <- list(x=y$a[1]+y$b[1]*v, y=y$a[2]+y$b[2]*v, z=y$a[3]+y$b[3]*v)
for (theta in seq(30,140,length.out=3)) {
plot3d(dummy, theta=theta, phi=30, xlab="", ylab="", zlab="",
xlim=ylim, ylim=ylim, zlim=zlim)
# Highlight the offset coordinates for both lines
points3d(t(x$a), pch="+", col="red")
text3d(t(x$a), label=expression(a[x]), adj=c(-1,0.5))
points3d(t(y$a), pch="+", col="blue")
text3d(t(y$a), label=expression(a[y]), adj=c(-1,0.5))
# Draw the lines
lines3d(xv, col="red")
lines3d(yv, col="blue")
# Draw the two points that are closest to each other
points3d(t(fit$xs), cex=2.0, col="red")
text3d(t(fit$xs), label=expression(x(s)), adj=c(+2,0.5))
points3d(t(fit$yt), cex=1.5, col="blue")
text3d(t(fit$yt), label=expression(y(t)), adj=c(-1,0.5))
# Draw the distance between the two points
lines3d(rbind(fit$xs,fit$yt), col="purple", lwd=2)
}
print(fit)
} # for (zzz in 0)
rm(zzz)
}
\references{
[1] M. Bard and D. Himel, \emph{The Minimum Distance Between Two
Lines in n-Space}, September 2001, Advisor Dennis Merino.\cr
[2] Dan Sunday, \emph{Distance between 3D Lines and Segments},
Jan 2016, \url{https://www.geomalgorithms.com/algorithms.html}.\cr
}
\keyword{algebra}
aroma.light/man/fitNaiveGenotypes.Rd 0000644 0001750 0001750 00000004417 14136047216 017330 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% fitNaiveGenotypes.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{fitNaiveGenotypes}
\alias{fitNaiveGenotypes}
\alias{fitNaiveGenotypes.numeric}
\title{Fit naive genotype model from a normal sample}
\description{
Fit naive genotype model from a normal sample.
}
\usage{
\method{fitNaiveGenotypes}{numeric}(y, cn=rep(2L, times = length(y)), subsetToFit=NULL,
flavor=c("density", "fixed"), adjust=1.5, ..., censorAt=c(-0.1, 1.1), verbose=FALSE)
}
\arguments{
\item{y}{A \code{\link[base]{numeric}} \code{\link[base]{vector}} of length J containing allele B fractions
for a normal sample.}
\item{cn}{An optional \code{\link[base]{numeric}} \code{\link[base]{vector}} of length J specifying the true
total copy number in \eqn{\{0,1,2,NA\}} at each locus. This can be
used to specify which loci are diploid and which are not, e.g.
autosomal and sex chromosome copy numbers.}
\item{subsetToFit}{An optional \code{\link[base]{integer}} or \code{\link[base]{logical}} \code{\link[base]{vector}} specifying
which loci should be used for estimating the model.
If \code{\link[base]{NULL}}, all loci are used.}
\item{flavor}{A \code{\link[base]{character}} string specifying the type of algorithm used.}
  \item{adjust}{A positive \code{\link[base]{double}} specifying the amount of smoothing for
the empirical density estimator.}
\item{...}{Additional arguments passed to \code{\link{findPeaksAndValleys}}().}
\item{censorAt}{A \code{\link[base]{double}} \code{\link[base]{vector}} of length two specifying the range
for which values are considered finite. Values below (above) this
range are treated as -\code{\link[base:is.finite]{Inf}} (+\code{\link[base:is.finite]{Inf}}).}
\item{verbose}{A \code{\link[base]{logical}} or a \code{\link[R.utils]{Verbose}} object.}
}
\value{
Returns a \code{\link[base]{list}} of \code{\link[base]{list}}s.
}
\author{Henrik Bengtsson}
\seealso{
To call genotypes see \code{\link{callNaiveGenotypes}}().
Internally \code{\link{findPeaksAndValleys}}() is used to identify the thresholds.
}
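\examples{
# A minimal sketch (simulated allele B fractions for a normal diploid sample):
J <- 2000
mu <- sample(c(0, 1/2, 1), size=J, replace=TRUE, prob=c(0.4, 0.2, 0.4))
y <- mu + rnorm(J, mean=0, sd=0.05)
fit <- fitNaiveGenotypes(y)
str(fit, max.level=1)
}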
\keyword{methods}
aroma.light/man/plotMvsA.Rd 0000644 0001750 0001750 00000002461 14136047216 015427 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% plotMvsA.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{plotMvsA}
\alias{plotMvsA}
\alias{plotMvsA.matrix}
\title{Plot log-ratios vs log-intensities}
\description{
Plot log-ratios vs log-intensities.
}
\usage{
\method{plotMvsA}{matrix}(X, Alab="A", Mlab="M", Alim=c(0, 16), Mlim=c(-1, 1) * diff(Alim) * aspectRatio,
aspectRatio=1, pch=".", ..., add=FALSE)
}
\arguments{
\item{X}{Nx2 \code{\link[base]{matrix}} with two channels and N observations.}
\item{Alab,Mlab}{Labels on the x and y axes.}
\item{Alim,Mlim}{Plot range on the A and M axes.}
\item{aspectRatio}{Aspect ratio between \code{Mlim} and \code{Alim}.}
\item{pch}{Plot symbol used.}
\item{...}{Additional arguments accepted by \code{\link[graphics]{points}}.}
\item{add}{If \code{\link[base:logical]{TRUE}}, data points are plotted in the current plot,
otherwise a new plot is created.}
}
\details{
The red channel is assumed to be in column one and the green channel in column two.
Log-ratios are calculated by taking channel one over channel two.
}
\value{
Returns nothing.
}
\author{Henrik Bengtsson}
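\examples{
# A minimal sketch (simulated non-negative signals): red signals in
# column one and green signals in column two, cf. 'Details'.
X <- cbind(R=rexp(10000, rate=1/400), G=rexp(10000, rate=1/500))
plotMvsA(X)
title(main="Simulated two-channel data")
}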
\keyword{methods}
aroma.light/man/normalizeQuantileSpline.Rd 0000644 0001750 0001750 00000010477 14136047216 020546 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeQuantileSpline.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeQuantileSpline}
\alias{normalizeQuantileSpline}
\alias{normalizeQuantileSpline.numeric}
\alias{normalizeQuantileSpline.matrix}
\alias{normalizeQuantileSpline.list}
\title{Normalizes the empirical distribution of one or more samples to a target distribution}
\usage{
\method{normalizeQuantileSpline}{numeric}(x, w=NULL, xTarget, sortTarget=TRUE, robust=TRUE, ...)
\method{normalizeQuantileSpline}{matrix}(X, w=NULL, xTarget=NULL, sortTarget=TRUE, robust=TRUE, ...)
\method{normalizeQuantileSpline}{list}(X, w=NULL, xTarget=NULL, sortTarget=TRUE, robust=TRUE, ...)
}
\description{
Normalizes the empirical distribution of one or more samples to a target distribution.
After normalization, all samples have the same average empirical
density distribution.
}
\arguments{
\item{x, X}{A single (\eqn{K=1}) \code{\link[base]{numeric}} \code{\link[base]{vector}} of length \eqn{N},
a \code{\link[base]{numeric}} \eqn{NxK} \code{\link[base]{matrix}}, or a \code{\link[base]{list}} of length \eqn{K} with
\code{\link[base]{numeric}} \code{\link[base]{vector}}s, where \eqn{K} represents the number of samples
and \eqn{N} the number of data points.}
\item{w}{An optional \code{\link[base]{numeric}} \code{\link[base]{vector}} of length \eqn{N} of weights
specific to each data point.}
\item{xTarget}{The target empirical distribution as a \emph{sorted}
\code{\link[base]{numeric}} \code{\link[base]{vector}} of length \eqn{M}.
If \code{\link[base]{NULL}} and \code{X} is a \code{\link[base]{list}}, then the target distribution is
calculated as the average empirical distribution of the samples.}
\item{sortTarget}{If \code{\link[base:logical]{TRUE}}, argument \code{xTarget} will be sorted,
otherwise it is assumed to be already sorted.}
\item{robust}{If \code{\link[base:logical]{TRUE}}, the normalization function is
estimated robustly.}
\item{...}{Arguments passed to (\code{\link[stats]{smooth.spline}}
or \code{\link[aroma.light]{robustSmoothSpline}}).}
}
\value{
Returns an object of the same type and dimensions as the input.
}
\section{Missing values}{
Both arguments \code{X} and \code{xTarget} may contain non-finite values.
These values do not affect the estimation of the normalization function.
Missing values and other non-finite values in \code{X}
remain in the output as is. No new missing values are introduced.
}
\examples{
# Simulate three samples with on average 20\% missing values
N <- 10000
X <- cbind(rnorm(N, mean=3, sd=1),
rnorm(N, mean=4, sd=2),
rgamma(N, shape=2, rate=1))
X[sample(3*N, size=0.20*3*N)] <- NA
# Plot the data
layout(matrix(c(1,0,2:5), ncol=2, byrow=TRUE))
xlim <- range(X, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The three original distributions")
Xn <- normalizeQuantile(X)
plotDensity(Xn, lwd=2, xlim=xlim, main="The three normalized distributions")
plotXYCurve(X, Xn, xlim=xlim, main="The three normalized distributions")
Xn2 <- normalizeQuantileSpline(X, xTarget=Xn[,1], spar=0.99)
plotDensity(Xn2, lwd=2, xlim=xlim, main="The three normalized distributions")
plotXYCurve(X, Xn2, xlim=xlim, main="The three normalized distributions")
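# Sketch: normalize toward the average empirical distribution instead,
# calculated with averageQuantile() (cf. 'See also').
xT <- averageQuantile(X)
Xn3 <- normalizeQuantileSpline(X, xTarget=xT, spar=0.99)
plotDensity(Xn3, lwd=2, xlim=xlim, main="The three distributions normalized toward their average")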
}
\author{Henrik Bengtsson}
\seealso{
The target distribution can be calculated as the average
using \code{\link{averageQuantile}}().
Internally either
\code{\link[aroma.light]{robustSmoothSpline}} (\code{robust=TRUE}) or
\code{\link[stats]{smooth.spline}} (\code{robust=FALSE}) is used.
An alternative normalization method that also normalizes the
empirical densities of samples is \code{\link{normalizeQuantileRank}}().
Contrary to this method, that method requires that all samples are
based on the exact same set of data points, and it is also more likely
to over-correct in the tails of the distributions.
}
\references{
[1] H. Bengtsson, R. Irizarry, B. Carvalho, and T. Speed, \emph{Estimation and assessment of raw copy numbers at the single locus level}, Bioinformatics, 2008.
\cr
}
\keyword{methods}
\keyword{nonparametric}
\keyword{multivariate}
\keyword{robust}
aroma.light/man/plotXYCurve.Rd 0000644 0001750 0001750 00000004460 14136047216 016127 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% plotXYCurve.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{plotXYCurve}
\alias{plotXYCurve}
\alias{plotXYCurve.numeric}
\alias{plotXYCurve.matrix}
\title{Plot the relationship between two variables as a smooth curve}
\usage{
\method{plotXYCurve}{numeric}(x, y, col=1L, lwd=2, dlwd=1, dcol=NA, xlim=NULL, ylim=xlim, xlab=NULL,
ylab=NULL, curveFit=smooth.spline, ..., add=FALSE)
\method{plotXYCurve}{matrix}(X, Y, col=seq_len(nrow(X)), lwd=2, dlwd=1, dcol=NA, xlim=NULL, ylim=xlim,
xlab=NULL, ylab=NULL, curveFit=smooth.spline, ..., add=FALSE)
}
\description{
Plot the relationship between two variables as a smooth curve.
}
\arguments{
 \item{x, y, X, Y}{Two \code{\link[base]{numeric}} \code{\link[base]{vector}}s of length N for one curve (K=1),
   or two \code{\link[base]{numeric}} NxK matrices for K curves.}
 \item{col}{The color of each curve.
   Either a scalar specifying the same value for all curves,
   or a \code{\link[base]{vector}} of K curve-specific values.}
 \item{lwd}{The line width of each curve.
   Either a scalar specifying the same value for all curves,
   or a \code{\link[base]{vector}} of K curve-specific values.}
\item{dlwd}{The width of each density curve.}
\item{dcol}{The fill color of the interior of each density curve.}
\item{xlim, ylim}{The x and y plotting limits.}
\item{xlab, ylab}{The x and y labels.}
 \item{curveFit}{The \code{\link[base]{function}} used to fit each curve. The first two
arguments of the function must take \code{x} and \code{y}, and the
function must return a \code{\link[base]{list}} with fitted elements \code{x} and
\code{y}.}
\item{...}{Additional arguments passed to \code{\link[graphics]{lines}}
used to draw each curve.}
\item{add}{If \code{\link[base:logical]{TRUE}}, the graph is added to the current plot, otherwise
a new plot is created.}
}
\value{
Returns nothing.
}
\section{Missing values}{
Data points (x,y) with non-finite values are excluded.
}
\author{Henrik Bengtsson}
\keyword{methods}
\keyword{nonparametric}
\keyword{multivariate}
\keyword{robust}
aroma.light/man/averageQuantile.Rd 0000644 0001750 0001750 00000003011 14136047216 016767 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% averageQuantile.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{averageQuantile}
\alias{averageQuantile}
\alias{averageQuantile.list}
\alias{averageQuantile.matrix}
\title{Gets the average empirical distribution}
\usage{
\method{averageQuantile}{list}(X, ...)
\method{averageQuantile}{matrix}(X, ...)
}
\description{
Gets the average empirical distribution for a set of samples.
}
\arguments{
\item{X}{A \code{\link[base]{list}} with K \code{\link[base]{numeric}} \code{\link[base]{vector}}s, or a \code{\link[base]{numeric}} NxK \code{\link[base]{matrix}}.
If a \code{\link[base]{list}}, the \code{\link[base]{vector}}s may be of different lengths.}
\item{...}{Not used.}
}
\value{
Returns a \code{\link[base]{numeric}} \code{\link[base]{vector}} of length equal to the longest \code{\link[base]{vector}}
in argument \code{X}.
}
\section{Missing values}{
Missing values are excluded.
}
\seealso{
\code{\link{normalizeQuantileRank}}().
\code{\link{normalizeQuantileSpline}}().
\code{\link[stats]{quantile}}.
}
\author{
Parts adopted from Gordon Smyth (\url{http://www.statsci.org/}) in 2002
\& 2006. Original code by Ben Bolstad at Statistics Department,
University of California.
}
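\examples{
# A minimal sketch (simulated data): the average empirical distribution
# of three samples of different lengths.
X <- list(rnorm(1000, mean=3, sd=1), rnorm(2000, mean=4, sd=2), rgamma(1500, shape=2, rate=1))
xT <- averageQuantile(X)
str(xT)  # a numeric vector of length 2000 (the longest sample)
}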
\keyword{methods}
\keyword{nonparametric}
\keyword{multivariate}
\keyword{robust}
aroma.light/man/wpca.Rd 0000644 0001750 0001750 00000016203 14136047216 014613 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% wpca.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{wpca}
\alias{wpca}
\alias{wpca.matrix}
\title{Light-weight Weighted Principal Component Analysis}
\usage{
\method{wpca}{matrix}(x, w=NULL, center=TRUE, scale=FALSE, method=c("dgesdd", "dgesvd"),
swapDirections=FALSE, ...)
}
\description{
Calculates the (weighted) principal components of a matrix, that is,
finds a new coordinate system (not unique) for representing the given
multivariate data such that
i) all dimensions are orthogonal to each other, and
ii) all dimensions have maximal variances.
}
\arguments{
\item{x}{An NxK \code{\link[base]{matrix}}.}
 \item{w}{A \code{\link[base]{vector}} of N weights, one for each row (observation) in
  the data matrix. If \code{\link[base]{NULL}}, all observations get the same weight,
  that is, standard PCA is used.}
 \item{center}{If \code{\link[base:logical]{TRUE}}, the (weighted) sample mean column \code{\link[base]{vector}} is
  first subtracted from each column in \code{x}.
  If data is not centered, the effect will be that a linear subspace
  that goes through the origin is fitted.}
 \item{scale}{If \code{\link[base:logical]{TRUE}}, each column in \code{x} is first
  divided by the (weighted) root-mean-square of the
  centered column.}
\item{method}{If \code{"dgesdd"} LAPACK's divide-and-conquer
based SVD routine is used (faster [1]).
If \code{"dgesvd"}, LAPACK's QR-decomposition-based routine is used.
}
\item{swapDirections}{If \code{\link[base:logical]{TRUE}}, the signs of eigenvectors
that have more negative than positive components are inverted.
The signs of corresponding principal components are also inverted.
This is only of interest when for instance visualizing or comparing
with other PCA estimates from other methods, because the
PCA (SVD) decomposition of a matrix is not unique.
}
\item{...}{Not used.}
}
\value{
Returns a \code{\link[base]{list}} with elements:
\item{pc}{An NxK \code{\link[base]{matrix}} where the column \code{\link[base]{vector}}s are the
principal components (a.k.a. loading vectors,
spectral loadings or factors etc).}
  \item{d}{A \code{\link[base]{vector}} of length K containing the eigenvalues of the
   principal components.}
  \item{vt}{A KxK \code{\link[base]{matrix}} containing the eigenvectors of the
principal components.}
\item{xMean}{The center coordinate.}
It holds that \code{x == t(t(fit$pc \%*\% fit$vt) + fit$xMean)}.
}
\section{Method}{
A singular value decomposition (SVD) is carried out.
  Let \eqn{X} denote the \code{\link[base]{matrix}} \code{x}. The SVD of the matrix is \eqn{X = U D V'}, where
  \eqn{U} and \eqn{V} are orthogonal, and \eqn{D} is a diagonal matrix
  with singular values. The principal components returned by this method are \eqn{U D}.
Internally \code{La.svd()} (or \code{svd()}) of the \pkg{base}
package is used.
For a popular and well written introduction to SVD see for instance [2].
}
\examples{
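# A quick sketch first: verify the reconstruction identity stated under
# 'Value', i.e. x == t(t(fit$pc \%*\% fit$vt) + fit$xMean), up to numerical precision.
x <- matrix(rnorm(300), nrow=100, ncol=3)
fit <- wpca(x)
stopifnot(all.equal(t(t(fit$pc \%*\% fit$vt) + fit$xMean), x, check.attributes=FALSE))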
for (zzz in 0) {
# This example requires plot3d() in R.basic [http://www.braju.com/R/]
if (!require(pkgName <- "R.basic", character.only=TRUE)) break
# -------------------------------------------------------------
# A first example
# -------------------------------------------------------------
# Simulate data from the model y <- a + bx + eps(bx)
x <- rexp(1000)
a <- c(2,15,3)
b <- c(2,3,15)
bx <- outer(b,x)
eps <- apply(bx, MARGIN=2, FUN=function(x) rnorm(length(x), mean=0, sd=0.1*x))
y <- a + bx + eps
y <- t(y)
# Add some outliers by permuting the dimensions for 1/3 of the observations
idx <- sample(1:nrow(y), size=1/3*nrow(y))
y[idx,] <- y[idx,c(2,3,1)]
# Down-weight the outliers W times to demonstrate how weights are used
W <- 10
# Plot the data with fitted lines at four different view points
N <- 4
theta <- seq(0,180,length.out=N)
phi <- rep(30, length.out=N)
# Use a different color for each set of weights
col <- topo.colors(W)
opar <- par(mar=c(1,1,1,1)+0.1)
layout(matrix(1:N, nrow=2, byrow=TRUE))
for (kk in seq_along(theta)) {
# Plot the data
plot3d(y, theta=theta[kk], phi=phi[kk])
# First, same weights for all observations
w <- rep(1, length=nrow(y))
for (ww in 1:W) {
# Fit a line using IWPCA through data
fit <- wpca(y, w=w, swapDirections=TRUE)
# Get the first principal component
ymid <- fit$xMean
d0 <- apply(y, MARGIN=2, FUN=min) - ymid
d1 <- apply(y, MARGIN=2, FUN=max) - ymid
b <- fit$vt[1,]
y0 <- -b * max(abs(d0))
y1 <- b * max(abs(d1))
yline <- matrix(c(y0,y1), nrow=length(b), ncol=2)
yline <- yline + ymid
points3d(t(ymid), col=col)
lines3d(t(yline), col=col)
# Down-weight outliers only, because here we know which they are.
w[idx] <- w[idx]/2
}
# Highlight the last one
lines3d(t(yline), col="red", lwd=3)
}
par(opar)
} # for (zzz in 0)
rm(zzz)
if (dev.cur() > 1) dev.off()
# -------------------------------------------------------------
# A second example
# -------------------------------------------------------------
# Data
x <- c(1,2,3,4,5)
y <- c(2,4,3,3,6)
opar <- par(bty="L")
opalette <- palette(c("blue", "red", "black"))
xlim <- ylim <- c(0,6)
# Plot the data and the center mass
plot(x,y, pch=16, cex=1.5, xlim=xlim, ylim=ylim)
points(mean(x), mean(y), cex=2, lwd=2, col="blue")
# Linear regression y ~ x
fit <- lm(y ~ x)
abline(fit, lty=1, col=1)
# Linear regression y ~ x through without intercept
fit <- lm(y ~ x - 1)
abline(fit, lty=2, col=1)
# Linear regression x ~ y
fit <- lm(x ~ y)
c <- coefficients(fit)
b <- 1/c[2]
a <- -b*c[1]
abline(a=a, b=b, lty=1, col=2)
# Linear regression x ~ y through without intercept
fit <- lm(x ~ y - 1)
b <- 1/coefficients(fit)
abline(a=0, b=b, lty=2, col=2)
# Orthogonal linear "regression"
fit <- wpca(cbind(x,y))
b <- fit$vt[1,2]/fit$vt[1,1]
a <- fit$xMean[2]-b*fit$xMean[1]
abline(a=a, b=b, lwd=2, col=3)
# Orthogonal linear "regression" without intercept
fit <- wpca(cbind(x,y), center=FALSE)
b <- fit$vt[1,2]/fit$vt[1,1]
a <- fit$xMean[2]-b*fit$xMean[1]
abline(a=a, b=b, lty=2, lwd=2, col=3)
legend(xlim[1],ylim[2], legend=c("lm(y~x)", "lm(y~x-1)", "lm(x~y)",
"lm(x~y-1)", "pca", "pca w/o intercept"), lty=rep(1:2,3),
lwd=rep(c(1,1,2),each=2), col=rep(1:3,each=2))
palette(opalette)
par(opar)
}
\author{Henrik Bengtsson}
\references{
[1] J. Demmel and J. Dongarra, \emph{DOE2000 Progress Report}, 2004.
\url{https://people.eecs.berkeley.edu/~demmel/DOE2000/Report0100.html} \cr
[2] Todd Will, \emph{Introduction to the Singular Value Decomposition},
UW-La Crosse, 2004. \url{http://websites.uwlax.edu/twill/svd/} \cr
}
\seealso{
  For an iterative re-weighted PCA method, see \code{\link{iwpca}}().
For Singular Value Decomposition, see \code{\link[base]{svd}}().
For other implementations of Principal Component Analysis functions see
(if they are installed):
\code{\link[stats]{prcomp}} in package \pkg{stats} and \code{pca()} in package
\pkg{pcurve}.
}
\keyword{methods}
\keyword{algebra}
aroma.light/man/plotMvsMPairs.Rd 0000644 0001750 0001750 00000002412 14136047216 016436 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% plotMvsMPairs.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{plotMvsMPairs}
\alias{plotMvsMPairs}
\alias{plotMvsMPairs.matrix}
\title{Plot log-ratios vs log-ratios for all pairs of columns}
\description{
Plot log-ratios vs log-ratios for all pairs of columns.
}
\usage{
\method{plotMvsMPairs}{matrix}(X, xlab="M", ylab="M", xlim=c(-1, 1) * 6, ylim=xlim, pch=".", ...,
add=FALSE)
}
\arguments{
\item{X}{Nx2K \code{\link[base]{matrix}} where N is the number of observations and
2K is an even number of channels.}
\item{xlab,ylab}{Labels on the x and y axes.}
\item{xlim,ylim}{Plot range on the x and y axes.}
\item{pch}{Plot symbol used.}
\item{...}{Additional arguments accepted by \code{\link[graphics]{points}}.}
\item{add}{If \code{\link[base:logical]{TRUE}}, data points are plotted in the current plot,
otherwise a new plot is created.}
}
\details{
  Log-ratios are calculated over paired columns, e.g. columns 1 and 2,
  columns 3 and 4, and so on.
}
\value{
Returns nothing.
}
\author{Henrik Bengtsson}
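\examples{
# A minimal sketch (simulated non-negative signals): with four columns,
# M of columns (1,2) is plotted against M of columns (3,4).
X <- matrix(rexp(4*10000, rate=1/400), ncol=4)
plotMvsMPairs(X)
abline(a=0, b=1, lty=2)
}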
\keyword{methods}
aroma.light/man/normalizeTumorBoost.Rd 0000644 0001750 0001750 00000013267 14136047216 017726 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeTumorBoost.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeTumorBoost}
\alias{normalizeTumorBoost}
\alias{normalizeTumorBoost.numeric}
\title{Normalizes allele B fractions for a tumor given a match normal}
\description{
TumorBoost [1] is a normalization method that normalizes the allele B
fractions of a tumor sample given the allele B fractions and genotypes
of a matched normal.
The method is a single-sample (single-pair) method.
It does not require total copy-number estimates.
The normalization is done such that the total copy number is
unchanged afterwards.
}
\usage{
\method{normalizeTumorBoost}{numeric}(betaT, betaN, muN=callNaiveGenotypes(betaN), preserveScale=FALSE,
flavor=c("v4", "v3", "v2", "v1"), ...)
}
\arguments{
\item{betaT, betaN}{Two \code{\link[base]{numeric}} \code{\link[base]{vector}}s each of length J with
tumor and normal allele B fractions, respectively.}
\item{muN}{An optional \code{\link[base]{vector}} of length J containing
   normal genotype calls in (0,1/2,1,\code{\link[base]{NA}}) for (AA,AB,BB).}
\item{preserveScale}{If \code{\link[base:logical]{TRUE}}, SNPs that are heterozygous in the
matched normal are corrected for signal compression using an estimate
of signal compression based on the amount of correction performed
by TumorBoost on SNPs that are homozygous in the matched normal.}
\item{flavor}{A \code{\link[base]{character}} string specifying the type of
correction applied.}
\item{...}{Not used.}
}
\value{
Returns a \code{\link[base]{numeric}} \code{\link[base]{vector}} of length J containing the normalized
allele B fractions for the tumor.
Attribute \code{modelFit} is a \code{\link[base]{list}} containing model fit parameters.
}
\details{
Allele B fractions are defined as the ratio between the allele B signal
and the sum of both (all) allele signals at the same locus.
Allele B fractions are typically within [0,1], but may have a slightly
wider support due to for instance negative noise.
This is typically also the case for the returned normalized
allele B fractions.
}
\section{Flavors}{
This method provides a few different "flavors" for normalizing the
data. The following values of argument \code{flavor} are accepted:
\itemize{
\item{v4: (default) The TumorBoost method, i.e. Eqns. (8)-(9) in [1].}
\item{v3: Eqn (9) in [1] is applied to both heterozygous and homozygous
SNPs, which effectively is v4 where the normalized allele B
fractions for homozygous SNPs becomes 0 and 1.}
\item{v2: ...}
\item{v1: TumorBoost where correction factor is forced to one, i.e.
\eqn{\eta_j=1}. As explained in [1], this is a suboptimal
normalization method. See also the discussion in the
paragraph following Eqn (12) in [1].}
}
}
\section{Preserving scale}{
\emph{As of \pkg{aroma.light} v1.33.3 (March 30, 2014),
argument \code{preserveScale} no longer has a default value and has
to be specified explicitly. This is done in order to change the
default to \code{\link[base:logical]{FALSE}} in a future version, while minimizing the risk
for surprises.}
Allele B fractions are more or less compressed toward a half, e.g.
the signals for homozygous SNPs are slightly away from zero and one.
The TumorBoost method decreases the correlation in allele B fractions
between the tumor and the normal \emph{conditioned on the genotype}.
What it does not control for is the mean level of the allele B fraction
\emph{conditioned on the genotype}.
By design, most flavors of the method will correct the homozygous SNPs
such that their mean levels get close to the expected zero and
one levels. However, the heterozygous SNPs will typically keep the
same mean levels as before.
  One possibility is to adjust the signals such that the mean levels of
  the heterozygous SNPs relative to those of the homozygous SNPs are
  the same after as before the normalization.
If argument \code{preserveScale=TRUE}, then SNPs that are heterozygous
(in the matched normal) are corrected for signal compression using
an estimate of signal compression based on the amount of correction
performed by TumorBoost on SNPs that are homozygous
(in the matched normal).
The option of preserving the scale is \emph{not} discussed in the
TumorBoost paper [1], which presents the \code{preserveScale=FALSE}
version.
}
\examples{
library(R.utils)
# Load data
pathname <- system.file("data-ex/TumorBoost,fracB,exampleData.Rbin", package="aroma.light")
data <- loadObject(pathname)
attachLocally(data)
pos <- position/1e6
muN <- genotypeN
layout(matrix(1:4, ncol=1))
par(mar=c(2.5,4,0.5,1)+0.1)
ylim <- c(-0.05, 1.05)
col <- rep("#999999", length(muN))
col[muN == 1/2] <- "#000000"
# Allele B fractions for the normal sample
plot(pos, betaN, col=col, ylim=ylim)
# Allele B fractions for the tumor sample
plot(pos, betaT, col=col, ylim=ylim)
# TumorBoost w/ naive genotype calls
betaTN <- normalizeTumorBoost(betaT=betaT, betaN=betaN, preserveScale=FALSE)
plot(pos, betaTN, col=col, ylim=ylim)
# TumorBoost w/ external multi-sample genotype calls
betaTNx <- normalizeTumorBoost(betaT=betaT, betaN=betaN, muN=muN, preserveScale=FALSE)
plot(pos, betaTNx, col=col, ylim=ylim)
}
\author{Henrik Bengtsson, Pierre Neuvial}
\references{
[1] H. Bengtsson, P. Neuvial and T.P. Speed, \emph{TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays}, BMC Bioinformatics, 2010, 11:245. [PMID 20462408]
\cr
}
\keyword{methods}
aroma.light/man/1._Calibration_and_Normalization.Rd 0000644 0001750 0001750 00000022034 14136047216 022135 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% 901.CalibrationAndNormalization.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{1. Calibration and Normalization}
\alias{1. Calibration and Normalization}
\title{1. Calibration and Normalization}
\encoding{latin1}
\description{
In this section we give \emph{our} recommendation on how spotted
two-color (or multi-color) microarray data is best calibrated and
normalized.
}
\section{Classical background subtraction}{
  We do \emph{not} recommend background subtraction by classical
  means, where the background is estimated by various image-analysis
  methods. This means that we will only consider foreground signals
  in the analysis.
  We estimate "background" by other means. In what is explained below,
only a global background, that is, a global bias, is estimated
and removed.
}
\section{Multiscan calibration}{
In Bengtsson et al (2004) we give evidence that microarray scanners
can introduce a significant bias in data. This bias, which is
about 15-25 out of 65535, \emph{will} introduce intensity dependency
in the log-ratios, as explained in Bengtsson &
\enc{Hssjer}{Hossjer} (2006).
In Bengtsson et al (2004) we find that this bias is stable across
arrays (and a couple of months), but further research is needed
in order to tell if this is true over a longer time period.
  To calibrate signals for scanner biases, scan the same array
  at three or more (K >= 3) different PMT settings
  (preferably in decreasing order).
While doing this, \emph{do not adjust the laser power settings}.
  Also, do the multiscan \emph{without} washing, cleaning, or otherwise
  changing the array between subsequent scans.
Although not necessary, it is preferred that the array
remains in the scanner between subsequent scans. This will simplify
  the image analysis, since spot identification can be done once
  if the images align perfectly.
  After image analysis, read all K scans for the same array into
  two matrices, one for the red and one for the green channel, where
  the K columns correspond to scans and the N rows to the spots.
It is enough to use foreground signals.
In order to multiscan calibrate the data, for each channel
separately call \code{Xc <- calibrateMultiscan(X)} where \code{X}
is the NxK matrix of signals for one channel across all scans. The
calibrated signals are returned in the Nx1 matrix \code{Xc}.
Multiscan calibration may sometimes be skipped, especially if affine
normalization is applied immediately after, but we do recommend that
  every lab check at least once whether their scanner introduces bias.
  If the offsets in a scanner are already estimated from earlier
multiscan analyses, or known by other means, they can readily be
subtracted from the signals of each channel. If arrays are still
multiscanned, it is possible to force the calibration method to
fit the model with zero intercept (assuming the scanner offsets
have been subtracted) by adding argument \code{center=FALSE}.
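  For example, a minimal sketch (here \code{R} and \code{G} are hypothetical
  NxK matrices holding the foreground signals of the red and the green
  channel across the K scans):
  \preformatted{
    Rc <- calibrateMultiscan(R)   # Nx1 matrix of calibrated red signals
    Gc <- calibrateMultiscan(G)   # Nx1 matrix of calibrated green signals
  }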
}
\section{Affine normalization}{
In Bengtsson & \enc{Hssjer}{Hossjer} (2006), we carry out a detailed
study on how biases in each channel introduce so called
intensity-dependent log-ratios among other systematic artifacts.
Data with (additive) bias in each channel is said to be \emph{affinely}
  transformed. Data without such bias is said to be \emph{linearly}
  (proportionally) transformed. Ideally, observed signals (data) are a
  linear (proportional) function of the true gene expression levels.
We do \emph{not} assume proportional observations. The scanner bias
is real evidence that assuming linearity is not correct.
Affine normalization corrects for affine transformation in data.
Without control spots it is not possible to estimate the bias in each
of the channels but only the relative bias such that after
  normalization the effective bias is the same in all channels.
This is why we call it normalization and not calibration.
In its simplest form, affine normalization is done by
  \code{Xn <- normalizeAffine(X)}, where \code{X} is an Nx2 matrix whose
  first column holds the foreground signals from the red channel and
  whose second column holds the signals from the green channel. If three- or
  four-channel data is used, these are added the same way. The normalized
  data is returned as an Nx2 matrix \code{Xn}.
To normalize all arrays and all channels at once, one may put all
  data into one big NxK matrix where the K columns hold all the channels
from the first array, then all channels from the second array and so
on. Then \code{Xn <- normalizeAffine(X)} will return the across-array
and across-channel normalized data in the NxK matrix \code{Xn} where
the columns are stored in the same order as in matrix \code{X}.
  Having an equal effective bias in all channels is much better. First of all,
any intensity-dependent bias in the log-ratios is removed \emph{for
all non-differentially expressed genes}. There is still an
intensity-dependent bias in the log-ratios for differentially expressed
genes, but this is now symmetric around log-ratio zero.
Affine normalization will (by default and recommended) normalize
\emph{all} arrays together and at once. This will guarantee that
all arrays are "on the same scale". Thus, it \emph{not} recommended
to apply a classical between-array scale normalization afterward.
Moreover, the average log-ratio will be zero after an affine
normalization.
Note that an affine normalization will only remove curvature in the
log-ratios at lower intensities.
If a strong intensity-dependent bias at high intensities remains,
this is most likely due to saturation effects, such as too high PMT
settings or quenching.
Note that for a perfect affine normalization you \emph{should}
expect much higher noise levels in the \emph{log-ratios} at lower
intensities than at higher. It should also be approximately
symmetric around zero log-ratio.
In other words, \emph{a strong fanning effect is a good sign}.
Due to different noise levels in red and green channels, different
PMT settings in different channels, plus the fact that the
minimum signal is zero, "odd shapes" may be seen in the log-ratio
vs log-intensity graphs at lower intensities. Typically, these
show themselves as non-symmetric in positive and negative log-ratios.
Note that you should not see this at higher intensities.
If there is a strong intensity-dependent effect left after the
affine normalization, we recommend, for now, that a subsequent
curve-fit or quantile normalization is done.
Which one, we do not know.
Why negative signals?
By default, 5\% of the normalized signals will have a non-positive
signal in one or both channels. \emph{This is on purpose}, although
the exact number 5\% is chosen by experience. The reason for
introducing negative signals is that they are indeed expected.
  For instance, when measuring a zero gene expression level, there is
a chance that the observed value is (should be) negative due to
measurement noise. (For this reason it is possible that the scanner
manufacturers have introduced scanner bias on purpose to avoid
negative signals, which then all would be truncated to zero.)
To adjust the ratio (or number) of negative signals allowed, use
for example \code{normalizeAffine(X, constraint=0.01)} for 1\%
  negative signals. If set to zero (or \code{"max"}), only as much
  bias is removed as possible without introducing any negative signals.
  Note that this is also true if there were negative signals
  beforehand.
Why not lowess normalization?
Curve-fit normalization methods such as lowess normalization are
basically designed based on linearity assumptions and will for this
reason not correct for channel biases. Curve-fit normalization
methods can by definition only be applied to one pair of channels
  at a time and therefore require a subsequent between-array
scale normalization, which is by the way very ad hoc.
Why not quantile normalization?
  Affine normalization can be thought of as a special case of quantile
normalization that is more robust than the latter.
See Bengtsson & \enc{Hssjer}{Hossjer} (2006) for details.
Quantile normalization is probably better to apply than curve-fit
normalization methods, but less robust than affine normalization,
especially at extreme (low and high) intensities.
  For this reason, we do recommend using affine normalization first,
and if this is not satisfactory, quantile normalization may be applied.
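  For example, a minimal sketch (here \code{X} is a hypothetical Nx2 matrix
  with red signals in column one and green signals in column two; the 0.01
  value is only an illustration):
  \preformatted{
    Xn <- normalizeAffine(X)                    # default: about five percent non-positive signals
    Xn2 <- normalizeAffine(X, constraint=0.01)  # allow about one percent non-positive signals
  }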
}
\section{Linear (proportional) normalization}{
If the channel offsets are zero, already corrected for, or estimated
by other means, it is possible to normalize the data robustly by
fitting the above affine model without intercept, that is, fitting
  a truly linear model. This is done by adding argument \code{center=FALSE}
when calling \code{normalizeAffine()}.
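  For example, a minimal sketch (here \code{X} is as above and assumed to be
  free of channel offsets):
  \preformatted{
    Xn <- normalizeAffine(X, center=FALSE)
  }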
}
\author{Henrik Bengtsson}
\keyword{documentation}
aroma.light/man/medianPolish.Rd 0000644 0001750 0001750 00000004611 14136047216 016275 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% medianPolish.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{medianPolish}
\alias{medianPolish}
\alias{medianPolish.matrix}
\title{Median polish}
\description{
Median polish.
}
\usage{
\method{medianPolish}{matrix}(X, tol=0.01, maxIter=10L, na.rm=NA, ..., .addExtra=TRUE)
}
\arguments{
\item{X}{N-times-K \code{\link[base]{matrix}}}
\item{tol}{A \code{\link[base]{numeric}} value greater than zero used as a threshold
to identify when the algorithm has converged.}
\item{maxIter}{Maximum number of iterations.}
\item{na.rm}{If \code{\link[base:logical]{TRUE}} (\code{\link[base:logical]{FALSE}}), \code{\link[base]{NA}}s are exclude (not exclude).
If \code{\link[base]{NA}}, it is assumed that \code{X} contains no \code{\link[base]{NA}} values.}
\item{.addExtra}{If \code{\link[base:logical]{TRUE}}, the name of argument \code{X} is returned
and the returned structure is assigned a class. This will make the
result compatible what \code{\link[stats]{medpolish}} returns.}
\item{...}{Not used.}
}
\value{
Returns a named \code{\link[base]{list}} structure with elements:
\item{overall}{The fitted constant term.}
\item{row}{The fitted row effect.}
\item{col}{The fitted column effect.}
\item{residuals}{The residuals.}
\item{converged}{If \code{\link[base:logical]{TRUE}}, the algorithm converged, otherwise not.}
}
\details{
  The implementation of this method gives identical estimates to
\code{\link[stats]{medpolish}}, but is about 3-5 times more efficient when
there are no \code{\link[base]{NA}} values.
}
\author{Henrik Bengtsson}
\examples{
# Deaths from sport parachuting; from ABC of EDA, p.224:
deaths <- matrix(c(14,15,14, 7,4,7, 8,2,10, 15,9,10, 0,2,0), ncol=3, byrow=TRUE)
rownames(deaths) <- c("1-24", "25-74", "75-199", "200++", "NA")
colnames(deaths) <- 1973:1975
print(deaths)
mp <- medianPolish(deaths)
mp1 <- medpolish(deaths, trace=FALSE)
print(mp)
ff <- c("overall", "row", "col", "residuals")
stopifnot(all.equal(mp[ff], mp1[ff]))
# Validate decomposition:
stopifnot(all.equal(deaths, mp$overall+outer(mp$row,mp$col,"+")+mp$resid))
}
\seealso{
\code{\link[stats]{medpolish}}.
}
\keyword{methods}
\keyword{algebra}
aroma.light/man/normalizeQuantileRank.matrix.Rd 0000644 0001750 0001750 00000007002 14136047216 021500 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeQuantileRank.matrix.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeQuantileRank.matrix}
\alias{normalizeQuantileRank.matrix}
\title{Normalizes the empirical distribution of a set of samples to a common target distribution}
\usage{
\method{normalizeQuantileRank}{matrix}(X, ties=FALSE, robust=FALSE, weights=NULL,
typeOfWeights=c("channel", "signal"), ...)
}
\description{
Normalizes the empirical distribution of a set of samples to a common target distribution.
The average sample distribution is calculated either robustly or not
by utilizing either \code{weightedMedian()} or \code{weighted.mean()}.
A weighted method is used if any of the weights are different from one.
}
\arguments{
\item{X}{a numerical NxK \code{\link[base]{matrix}} with the K columns representing the
channels and the N rows representing the data points.}
\item{robust}{If \code{\link[base:logical]{TRUE}}, the (weighted) median function is used for
calculating the average sample distribution, otherwise the
(weighted) mean function is used.}
 \item{ties}{Should ties in \code{X} be treated with care or not?
For more details, see "limma:normalizeQuantiles".}
\item{weights}{If \code{\link[base]{NULL}}, non-weighted normalization is done.
If channel weights, this should be a \code{\link[base]{vector}} of length K specifying
the weights for each channel.
If signal weights, it should be an NxK \code{\link[base]{matrix}} specifying the
weights for each signal.
}
\item{typeOfWeights}{A \code{\link[base]{character}} string specifying the type of
weights given in argument \code{weights}.}
\item{...}{Not used.}
}
\value{
Returns an object of the same shape as the input argument.
}
\section{Missing values}{
Missing values are excluded when estimating the "common" (the baseline).
Values that are \code{\link[base]{NA}} remain \code{\link[base]{NA}} after normalization.
No new \code{\link[base]{NA}}s are introduced.
}
\section{Weights}{
  Currently only channel weights are supported due to the way quantile
normalization is done.
If signal weights are given, channel weights are calculated from these
by taking the mean of the signal weights in each channel.
}
\examples{
# Simulate three samples with on average 20\% missing values
N <- 10000
X <- cbind(rnorm(N, mean=3, sd=1),
rnorm(N, mean=4, sd=2),
rgamma(N, shape=2, rate=1))
X[sample(3*N, size=0.20*3*N)] <- NA
# Normalize quantiles
Xn <- normalizeQuantile(X)
# Plot the data
layout(matrix(1:2, ncol=1))
xlim <- range(X, Xn, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The three original distributions")
plotDensity(Xn, lwd=2, xlim=xlim, main="The three normalized distributions")
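# Sketch: channel-weighted normalization, down-weighting the third channel
# when the average (target) sample distribution is estimated.
Xnw <- normalizeQuantileRank(X, weights=c(1, 1, 0.2), typeOfWeights="channel")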
}
\author{
Adopted from Gordon Smyth (\url{http://www.statsci.org/}) in 2002 \& 2006.
Original code by Ben Bolstad at Statistics Department, University of
California.
Support for calculating the average sample distribution using (weighted)
mean or median was added by Henrik Bengtsson.
}
\seealso{
\code{\link[stats]{median}}, \code{\link[matrixStats]{weightedMedian}},
\code{\link[base]{mean}}() and \code{\link[stats]{weighted.mean}}.
\code{\link{normalizeQuantileSpline}}().
}
\keyword{methods}
\keyword{nonparametric}
\keyword{multivariate}
\keyword{robust}
aroma.light/man/normalizeAverage.Rd 0000644 0001750 0001750 00000002706 14136047216 017157 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeAverage.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeAverage}
\alias{normalizeAverage}
\alias{normalizeAverage.list}
\alias{normalizeAverage.matrix}
\title{Rescales channel vectors to get the same average}
\description{
Rescales channel vectors to get the same average.
}
\usage{
\method{normalizeAverage}{matrix}(x, baseline=1, avg=stats::median, targetAvg=2200, ...)
\method{normalizeAverage}{list}(x, baseline=1, avg=stats::median, targetAvg=2200, ...)
}
\arguments{
\item{x}{A \code{\link[base]{numeric}} NxK \code{\link[base]{matrix}} (or \code{\link[base]{list}} of length K).}
\item{baseline}{An \code{\link[base]{integer}} in [1,K] specifying which channel should be
the baseline.}
\item{avg}{A \code{\link[base]{function}} for calculating the average of one channel.}
\item{targetAvg}{The average that each channel should have afterwards.
If \code{\link[base]{NULL}}, the baseline column sets the target average.}
\item{...}{Additional arguments passed to the \code{avg} \code{\link[base]{function}}.}
}
\value{
Returns a normalized \code{\link[base]{numeric}} NxK \code{\link[base]{matrix}} (or \code{\link[base]{list}} of length K).
}
\author{Henrik Bengtsson}
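\examples{
# A minimal sketch (simulated signals): rescale three channels so that
# they all get the same median (the default target average of 2200).
X <- cbind(rexp(10000, rate=1/500), rexp(10000, rate=1/1000), rexp(10000, rate=1/2000))
Xn <- normalizeAverage(X)
print(apply(Xn, MARGIN=2, FUN=median))  # approximately 2200 in each channel
}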
\keyword{methods}
aroma.light/man/sampleCorrelations.Rd 0000644 0001750 0001750 00000003403 14136047216 017525 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% sampleCorrelations.matrix.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{sampleCorrelations}
\alias{sampleCorrelations}
\alias{sampleCorrelations.matrix}
\title{Calculates the correlation for random pairs of observations}
\description{
Calculates the correlation for random pairs of observations.
}
\usage{
\method{sampleCorrelations}{matrix}(X, MARGIN=1, pairs=NULL, npairs=max(5000, nrow(X)), ...)
}
\arguments{
\item{X}{An NxK \code{\link[base]{matrix}} where N >= 2 and K >= 2.}
\item{MARGIN}{The dimension (1 or 2) in which the observations are.
If \code{MARGIN==1} (\code{==2}), each row (column) is an observation.}
\item{pairs}{If a Lx2 \code{\link[base]{matrix}}, the L index pairs for which the
correlations are calculated.
If \code{\link[base]{NULL}}, pairs of observations are sampled.}
\item{npairs}{The number of correlations to calculate.}
\item{...}{Not used.}
}
\value{
Returns a \code{\link[base]{double}} \code{\link[base]{vector}} of length \code{npairs}.
}
\author{Henrik Bengtsson}
\examples{
# Simulate 20000 genes with 10 observations each
X <- matrix(rnorm(n=20000), ncol=10)
# Calculate the correlation for 5000 random gene pairs
cor <- sampleCorrelations(X, npairs=5000)
print(summary(cor))
}
\seealso{
\code{\link[base]{sample}}().
}
\references{
[1] A. Ploner, L. Miller, P. Hall, J. Bergh & Y. Pawitan.
\emph{Correlation test to assess low-level processing of high-density
oligonucleotide microarray data}. BMC Bioinformatics, 2005, vol 6.
}
\keyword{methods}
\keyword{utilities}
aroma.light/man/pairedAlleleSpecificCopyNumbers.Rd 0000644 0001750 0001750 00000003055 14136047216 022102 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% pairedAlleleSpecificCopyNumbers.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{pairedAlleleSpecificCopyNumbers}
\alias{pairedAlleleSpecificCopyNumbers}
\alias{pairedAlleleSpecificCopyNumbers.numeric}
\title{Calculating tumor-normal paired allele-specific copy number stratified on genotypes}
\description{
Calculating tumor-normal paired allele-specific copy number stratified on genotypes.
The method is a single-sample (single-pair) method.
It requires paired tumor-normal parent-specific copy number signals.
}
\usage{
\method{pairedAlleleSpecificCopyNumbers}{numeric}(thetaT, betaT, thetaN, betaN,
muN=callNaiveGenotypes(betaN), ...)
}
\arguments{
 \item{thetaT, betaT}{Total (theta) and allele-B fraction signals for the tumor.}
 \item{thetaN, betaN}{Total (theta) and allele-B fraction signals for the
   matched normal.}
\item{muN}{An optional \code{\link[base]{vector}} of length J containing
   normal genotype calls in (0,1/2,1,\code{\link[base]{NA}}) for (AA,AB,BB).}
\item{...}{Not used.}
}
\value{
Returns a \code{\link[base]{data.frame}} with elements \code{CT}, \code{betaT} and \code{muN}.
}
\seealso{
This definition of calculating tumor-normal paired ASCN is related
to how the \code{\link{normalizeTumorBoost}}() method calculates normalized
tumor BAFs.
}
\author{Pierre Neuvial, Henrik Bengtsson}
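\examples{
# A minimal sketch with simulated tumor-normal signals (arbitrary values;
# a real analysis would use matched tumor-normal SNP data).
J <- 1000
muN <- sample(c(0, 1/2, 1), size=J, replace=TRUE)
betaN <- muN + rnorm(J, sd=0.03)
betaT <- muN + rnorm(J, sd=0.05)
thetaN <- 2 + rnorm(J, sd=0.2)
thetaT <- 2 + rnorm(J, sd=0.3)
ascn <- pairedAlleleSpecificCopyNumbers(thetaT, betaT, thetaN, betaN, muN=muN)
str(ascn)  # a data.frame with columns CT, betaT and muN
}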
\keyword{methods}
aroma.light/man/calibrateMultiscan.Rd 0000644 0001750 0001750 00000014124 14136047216 017467 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% calibrateMultiscan.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{calibrateMultiscan}
\alias{calibrateMultiscan}
\alias{calibrateMultiscan.matrix}
\encoding{latin1}
\title{Weighted affine calibration of a multiple re-scanned channel}
\description{
Weighted affine calibration of a multiple re-scanned channel.
}
\usage{
\method{calibrateMultiscan}{matrix}(X, weights=NULL, typeOfWeights=c("datapoint"), method="L1",
constraint="diagonal", satSignal=2^16 - 1, ..., average=median, deviance=NULL,
project=FALSE, .fitOnly=FALSE)
}
\arguments{
\item{X}{An NxK \code{\link[base]{matrix}} (K>=2) where the columns represent the
multiple scans of one channel (a two-color array contains two
channels) to be calibrated.}
\item{weights}{If \code{\link[base]{NULL}}, non-weighted normalization is done.
If data-point weights are used, this should be a \code{\link[base]{vector}} of length
N of data point weights used when estimating the normalization
function.
}
\item{typeOfWeights}{A \code{\link[base]{character}} string specifying the type of
weights given in argument \code{weights}.
}
\item{method}{A \code{\link[base]{character}} string specifying how the estimates are
robustified. See \code{\link{iwpca}}() for all accepted values.}
\item{constraint}{Constraint making the bias parameters identifiable.
See \code{\link{fitIWPCA}}() for more details.}
 \item{satSignal}{Signals equal to or above this threshold are considered
saturated signals.}
\item{...}{Other arguments passed to \code{\link{fitIWPCA}}() and in
turn \code{\link{iwpca}}(), e.g. \code{center} (see below).}
\item{average}{A \code{\link[base]{function}} to calculate the average signals between calibrated scans.}
\item{deviance}{A \code{\link[base]{function}} to calculate the deviance of the signals between calibrated scans.}
 \item{project}{If \code{\link[base:logical]{TRUE}}, the calibrated data points are projected onto the
diagonal line, otherwise not. Moreover, if \code{\link[base:logical]{TRUE}}, argument
\code{average} is ignored.}
 \item{.fitOnly}{If \code{\link[base:logical]{TRUE}}, the data will not be back-transformed.}
}
\value{
If \code{average} is specified or \code{project} is \code{\link[base:logical]{TRUE}},
an Nx1 \code{\link[base]{matrix}} is returned, otherwise an NxK \code{\link[base]{matrix}} is returned.
If \code{deviance} is specified, a deviance Nx1 \code{\link[base]{matrix}} is returned
as attribute \code{deviance}.
In addition, the fitted model is returned as attribute \code{modelFit}.
}
\section{Negative, non-positive, and saturated values}{
Affine multiscan calibration applies also to negative values, which are
  therefore also calibrated, if they exist.
Saturated signals in any scan are set to \code{\link[base]{NA}}. Thus, they will not be
used to estimate the calibration function, nor will they affect an
optional projection.
}
\section{Missing values}{
Only observations (rows) in \code{X} that contain all finite values are
used in the estimation of the calibration functions. Thus,
observations can be excluded by setting them to \code{\link[base]{NA}}.
}
\section{Weighted normalization}{
Each data point/observation, that is, each row in \code{X}, which is a
vector of length K, can be assigned a weight in [0,1] specifying how much
it should \emph{affect the fitting of the calibration function}.
Weights are given by argument \code{weights},
which should be a \code{\link[base]{numeric}} \code{\link[base]{vector}} of length N. Regardless of weights,
all data points are \emph{calibrated} based on the fitted calibration
function.
}
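\section{Example of data-point weights (sketch)}{
 The following is a minimal sketch of how such weights can be supplied;
 the simulated signals and the choice of excluded spots are illustrative
 only and are not taken from the package's own examples.
\preformatted{
  # Two scans (columns) of the same channel acquired at different
  # PMT settings, i.e. with different offsets and scale factors
  x <- rexp(500, rate=1/1000)
  X <- cbind(50 + 0.8*x, 200 + 1.2*x) + rnorm(1000, sd=20)

  # Exclude the first ten spots from the estimation of the
  # calibration function; they are still calibrated afterwards
  w <- rep(1, times=nrow(X))
  w[1:10] <- 0

  Xc <- calibrateMultiscan(X, weights=w)
}
}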
\section{Robustness}{
By default, the model fit of multiscan calibration is done in \eqn{L_1}
(\code{method="L1"}). This way, outliers affect the parameter estimates
less than ordinary least-square methods.
When calculating the average calibrated signal from multiple scans,
by default the median is used, which further robustifies against outliers.
For further robustness, downweight outliers such as saturated signals,
if possible.
Tukey's biweight function is supported, but not used by default because
then a "bandwidth" parameter has to be selected. This can indeed be done
automatically by estimating the standard deviation, for instance using
MAD. However, since scanner signals have heteroscedastic noise
(standard deviation is approximately proportional to the non-logged
signal), Tukey's bandwidth parameter has to be a function of the
signal too, cf. \code{\link[stats]{loess}}. We have experimented with this
too, but found that it does not significantly improve the robustness
compared to \eqn{L_1}.
Moreover, using Tukey's biweight as is, that is, assuming homoscedastic
noise, seems to introduce a (scale dependent) bias in the estimates
of the offset terms.
}
\section{Using a known/previously estimated offset}{
If the scanner offsets can be assumed to be known, for instance,
from prior multiscan analyses on the scanner, then it is possible
to fit the scanner model with no (zero) offset by specifying
argument \code{center=FALSE}.
Note that you cannot specify the offset. Instead, subtract it
from all signals before calibrating, e.g.
\code{Xc <- calibrateMultiscan(X-e, center=FALSE)}
where \code{e} is the scanner offset (a scalar).
You can assert that the model is fitted without offset by
\code{stopifnot(all(attr(Xc, "modelFit")$adiag == 0))}.
}
\details{
Fitting is done by iterated re-weighted principal component analysis
(IWPCA).
}
\author{Henrik Bengtsson}
\references{
[1] H. Bengtsson, J. Vallon-Christersson and G. \enc{Jönsson}{Jonsson}, \emph{Calibration and assessment of channel-specific biases in microarray data with extended dynamical range}, BMC Bioinformatics, 5:177, 2004.
\cr
}
\examples{\dontrun{# For an example, see help(normalizeAffine).}}
\seealso{
\code{\link{1. Calibration and Normalization}}.
\code{\link{normalizeAffine}}().
}
\keyword{methods}
aroma.light/man/iwpca.Rd 0000644 0001750 0001750 00000011715 14136047216 014767 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% iwpca.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{iwpca}
\alias{iwpca}
\alias{iwpca.matrix}
\title{Fits an R-dimensional hyperplane using iterative re-weighted PCA}
\description{
Fits an R-dimensional hyperplane using iterative re-weighted PCA.
}
\usage{
\method{iwpca}{matrix}(X, w=NULL, R=1, method=c("symmetric", "bisquare", "tricube", "L1"), maxIter=30,
acc=1e-04, reps=0.02, fit0=NULL, ...)
}
\arguments{
\item{X}{N-times-K \code{\link[base]{matrix}} where N is the number of observations and
K is the number of dimensions.}
\item{w}{An N \code{\link[base]{vector}} of weights for each row (observation) in
the data matrix. If \code{\link[base]{NULL}}, all observations get the same weight.}
\item{R}{Number of principal components to fit. By default a line
is fitted.}
\item{method}{
If \code{"symmetric"} (or \code{"bisquare"}), Tukey's biweight
is used. If \code{"tricube"}, the tricube weight is used.
If \code{"L1"}, the model is fitted in \eqn{L_1}.
If a \code{\link[base]{function}}, it is used to calculate weights for next iteration
based on the current iteration's residuals.}
\item{maxIter}{Maximum number of iterations.}
\item{acc}{The (Euclidean) distance between two subsequent parameter
fits below which the algorithm is considered to have converged.}
\item{reps}{Small value to be added to the residuals before the
weights are calculated based on their inverse. This is to avoid
infinite weights.}
\item{fit0}{A \code{\link[base]{list}} containing elements \code{vt} and \code{pc}
specifying an initial fit.
If \code{\link[base]{NULL}}, the initial guess will be equal to the (weighted) PCA fit.}
\item{...}{Additional arguments accepted by \code{\link{wpca}}().}
}
\value{
Returns the fit (a \code{\link[base]{list}}) from the last call to \code{\link{wpca}}()
with the additional elements \code{nbrOfIterations} and
\code{converged}.
}
\details{
This method uses weighted principal component analysis (WPCA) to fit an
R-dimensional hyperplane through the data with initial internal
weights all equal.
At each iteration the internal weights are recalculated based on
the "residuals".
If \code{method=="L1"}, the internal weights are 1 / sum(abs(r) + reps).
This is the same as \code{method=function(r) 1/sum(abs(r)+reps)}.
The "residuals" are the orthogonal Euclidean distances along the principal
components R, R+1, ..., K.
In each iteration, before doing WPCA, the internal weights are
multiplied by the weights given by argument \code{w}, if specified.
}
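\section{Passing a custom weight function (sketch)}{
 The equivalence stated under 'Details' can be written out explicitly.
 The following is a small sketch on simulated data (not part of the
 package's examples); the constant \code{0.02} is the default value of
 argument \code{reps}.
\preformatted{
  X <- matrix(rnorm(300), ncol=3)

  # Default L1 fit
  fitA <- iwpca(X, method="L1")

  # The same weighting scheme expressed as a user-supplied function
  fitB <- iwpca(X, method=function(r, ...) 1/sum(abs(r) + 0.02))
}
}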
\author{Henrik Bengtsson}
\examples{
for (zzz in 0) {
# This example requires plot3d() in R.basic [http://www.braju.com/R/]
if (!require(pkgName <- "R.basic", character.only=TRUE)) break
# Simulate data from the model y <- a + bx + eps(bx)
x <- rexp(1000)
a <- c(2,15,3)
b <- c(2,3,4)
bx <- outer(b,x)
eps <- apply(bx, MARGIN=2, FUN=function(x) rnorm(length(x), mean=0, sd=0.1*x))
y <- a + bx + eps
y <- t(y)
# Add some outliers by permuting the dimensions for 1/10 of the observations
idx <- sample(1:nrow(y), size=1/10*nrow(y))
y[idx,] <- y[idx,c(2,3,1)]
# Plot the data with fitted lines at four different view points
opar <- par(mar=c(1,1,1,1)+0.1)
N <- 4
layout(matrix(1:N, nrow=2, byrow=TRUE))
theta <- seq(0,270,length.out=N)
phi <- rep(20, length.out=N)
xlim <- ylim <- zlim <- c(0,45);
persp <- list();
for (kk in seq_along(theta)) {
# Plot the data
persp[[kk]] <- plot3d(y, theta=theta[kk], phi=phi[kk], xlim=xlim, ylim=ylim, zlim=zlim)
}
# Weights on the observations
# Example a: Equal weights
w <- NULL
# Example b: Down-weight the outliers slightly (comment out to keep equal weights)
w <- rep(1, length(x)); w[idx] <- 0.8
# ...and show all iterations too with different colors.
maxIter <- c(seq(1,20,length.out=10),Inf)
col <- topo.colors(length(maxIter))
# Show the fitted value for every iteration
for (ii in seq_along(maxIter)) {
# Fit a line using IWPCA through data
fit <- iwpca(y, w=w, maxIter=maxIter[ii], swapDirections=TRUE)
ymid <- fit$xMean
d0 <- apply(y, MARGIN=2, FUN=min) - ymid
d1 <- apply(y, MARGIN=2, FUN=max) - ymid
b <- fit$vt[1,]
y0 <- -b * max(abs(d0))
y1 <- b * max(abs(d1))
yline <- matrix(c(y0,y1), nrow=length(b), ncol=2)
yline <- yline + ymid
for (kk in seq_along(theta)) {
# Set pane to draw in
par(mfg=c((kk-1) \%/\% 2, (kk-1) \%\% 2) + 1);
# Set the viewpoint of the pane
options(persp.matrix=persp[[kk]]);
# Get the first principal component
points3d(t(ymid), col=col[ii])
lines3d(t(yline), col=col[ii])
# Highlight the last one
if (ii == length(maxIter))
lines3d(t(yline), col="red", lwd=3)
}
}
par(opar)
} # for (zzz in 0)
rm(zzz)
}
\seealso{
Internally \code{\link{wpca}}() is used for calculating the weighted PCA.
}
\keyword{methods}
\keyword{algebra}
aroma.light/man/likelihood.smooth.spline.Rd 0000644 0001750 0001750 00000010572 14136047216 020610 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% likelihood.smooth.spline.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{likelihood.smooth.spline}
\alias{likelihood.smooth.spline}
\title{Calculate the log likelihood of a smoothing spline given the data}
\usage{
\method{likelihood}{smooth.spline}(object, x=NULL, y=NULL, w=NULL, base=exp(1),
rel.tol=.Machine$double.eps^(1/8), ...)
}
\arguments{
\item{object}{The smooth.spline object.}
\item{x, y}{The x and y values for which the (weighted) likelihood will
be calculated. If \code{x} is of type \code{xy.coords} any value of
argument \code{y} will be ignored. If \code{x==NULL}, the x and y values
of the smoothing spline will be used.}
\item{w}{The weights for which the (weighted) likelihood will be
calculated. If \code{\link[base]{NULL}}, weights equal to one are assumed.}
\item{base}{The base of the logarithm of the likelihood. If \code{\link[base]{NULL}},
the non-logged likelihood is returned.}
\item{rel.tol}{The relative tolerance used in the call to
\code{integrate}.}
\item{...}{Not used.}
}
\description{
Calculate the (log) likelihood of a spline given the data used to fit
the spline, \eqn{g}. The likelihood consists of two main parts:
1) the (weighted) residual sum of squares, and 2) a penalty term. The
penalty term consists of a \emph{smoothing parameter} \eqn{\lambda}
and a \emph{roughness measure} of the spline
\eqn{J(g) = \int g''(t) dt}. Hence, the overall log likelihood is
\deqn{\log L(g|x) = (y-g(x))'W(y-g(x)) + \lambda J(g)}
In addition to the overall likelihood, all its separate
components are also returned.
Note: when fitting a smooth spline with \eqn{(x,y)} values where the
\eqn{x}'s are \emph{not} unique, \code{smooth.spline} will replace
such \eqn{(x,y)}'s with a new pair \eqn{(x,y')} where \eqn{y'} is a
reweighted average of the original \eqn{y}'s. It is important to
be aware of this. In such cases, the resulting \code{smooth.spline}
object does \emph{not} contain all \eqn{(x,y)}'s and therefore this
function will not calculate the weighted residual sum of squares on
the original data set, but on the data set with unique \eqn{x}'s.
See the examples below for how to calculate the likelihood of the spline
given the original data.
}
\value{
Returns the overall (log) likelihood of class
\code{SmoothSplineLikelihood}, a class with the following attributes:
\item{wrss}{the (weighted) residual sum of squares}
\item{penalty}{the penalty which is equal to \code{-lambda*roughness}.}
\item{lambda}{the smoothing parameter}
\item{roughness}{the value of the roughness functional given the
specific smoothing spline and the range of data}
}
\details{
The roughness penalty for the smoothing spline, \eqn{g}, fitted
from data in the interval \eqn{[a,b]} is defined as
\deqn{J(g) = \int_a^b g''(t) dt}
which is the same as
\deqn{J(g) = g'(b) - g'(a)}
The latter is calculated internally by using
\code{\link[stats]{predict.smooth.spline}}.
}
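\section{Evaluating the roughness directly (sketch)}{
 As a minimal sketch (simulated data; not part of the package's examples),
 the identity \eqn{J(g) = g'(b) - g'(a)} can be evaluated for a fitted
 spline by asking \code{\link[stats]{predict.smooth.spline}} for the first
 derivative at the end points of the data range:
\preformatted{
  x <- seq(-2, 5, length.out=1000)
  y <- x^2 + rnorm(length(x))
  g <- smooth.spline(x, y, df=10)

  # First derivative g'() at the end points a and b
  d1 <- predict(g, x=range(x), deriv=1)$y
  J <- d1[2] - d1[1]   # J(g) = g'(b) - g'(a)
}
}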
\examples{
# Define f(x)
f <- expression(0.1*x^4 + 1*x^3 + 2*x^2 + x + 10*sin(2*x))
# Simulate data from this function in the range [a,b]
a <- -2; b <- 5
x <- seq(a, b, length.out=3000)
y <- eval(f)
# Add some noise to the data
y <- y + rnorm(length(y), 0, 10)
# Plot the noisy data
plot(x,y, type="l", lwd=4)
# Fit a cubic smoothing spline and plot it
g <- smooth.spline(x,y, df=16)
lines(g, col="yellow", lwd=2, lty=2)
# Calculating the (log) likelihood of the fitted spline
l <- likelihood(g)
cat("Log likelihood with unique x values:\n")
print(l)
# Note that this is not the same as the log likelihood of the
# data on the fitted spline if the x values are non-unique
x[1:5] <- x[1] # Non-unique x values
g <- smooth.spline(x,y, df=16)
l <- likelihood(g)
cat("\nLog likelihood of the *spline* data set:\n")
print(l)
# In cases with non-unique x values one has to proceed as
# below if one wants to get the log likelihood for the original
# data.
l <- likelihood(g, x=x, y=y)
cat("\nLog likelihood of the *original* data set:\n")
print(l)
}
\seealso{
\code{\link[stats]{smooth.spline}} and \code{\link{robustSmoothSpline}}().
}
\author{Henrik Bengtsson}
\keyword{methods}
\keyword{smooth}
\keyword{internal}
aroma.light/man/backtransformAffine.Rd 0000644 0001750 0001750 00000007162 14136047216 017632 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% backtransformAffine.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{backtransformAffine}
\alias{backtransformAffine}
\alias{backtransformAffine.matrix}
\title{Reverse affine transformation}
\description{
Reverse affine transformation.
}
\usage{
\method{backtransformAffine}{matrix}(X, a=NULL, b=NULL, project=FALSE, ...)
}
\arguments{
\item{X}{An NxK \code{\link[base]{matrix}} containing data to be backtransformed.}
\item{a}{A scalar, \code{\link[base]{vector}}, a \code{\link[base]{matrix}}, or a \code{\link[base]{list}}.
First, if a \code{\link[base]{list}}, it is assumed to contain the elements \code{a}
and \code{b}, which are then used as if they were passed as separate
arguments.
If a \code{\link[base]{vector}}, a matrix of size NxK is created which is then filled
\emph{row by row} with the values in the vector. Commonly, the
vector is of length K, which means that the matrix will consist of
copies of this vector stacked on top of each other.
If a \code{\link[base]{matrix}}, a matrix of size NxK is created which is then filled
\emph{column by column} with the values in the matrix (collected
column by column). Commonly, the matrix is of size NxK, or NxL with
L < K and then the resulting matrix consists of copies sitting
next to each other.
The resulting NxK matrix is subtracted from the NxK matrix \code{X}.
}
\item{b}{A scalar, \code{\link[base]{vector}}, a \code{\link[base]{matrix}}.
A NxK matrix is created from this argument. For details see
argument \code{a}.
The NxK matrix \code{X-a} is divided by the resulting NxK matrix.
}
\item{project}{
If \code{\link[base:logical]{FALSE}}, the backtransformed values "\code{(X-a)/b}" are
returned (K values per data point are returned).
If \code{\link[base:logical]{TRUE}}, the backtransformed values "\code{(X-a)/b}" are projected
onto the line L(a,b) so that all columns
will be identical.
}
\item{...}{Not used.}
}
\value{
The "\code{(X-a)/b}" backtransformed NxK \code{\link[base]{matrix}} is returned.
If \code{project} is \code{\link[base:logical]{TRUE}}, an Nx1 \code{\link[base]{matrix}} is returned, because
all columns are identical anyway.
}
\section{Missing values}{
Missing values remain missing values. If projected, data points that
contain missing values are projected without these.
}
\examples{
X <- matrix(1:8, nrow=4, ncol=2)
X[2,2] <- NA
print(X)
# Returns a 4x2 matrix
print(backtransformAffine(X, a=c(1,5)))
# Returns a 4x2 matrix
print(backtransformAffine(X, b=c(1,1/2)))
# Returns a 4x2 matrix
print(backtransformAffine(X, a=matrix(1:4,ncol=1)))
# Returns a 4x2 matrix
print(backtransformAffine(X, a=matrix(1:3,ncol=1)))
# Returns a 4x2 matrix
print(backtransformAffine(X, a=matrix(1:2,ncol=1), b=c(1,2)))
# Returns a 4x1 matrix
print(backtransformAffine(X, b=c(1,1/2), project=TRUE))
# If the columns of X are identical, and an identity
# backtransformation is applied and projected, the
# same matrix is returned.
X <- matrix(1:4, nrow=4, ncol=3)
Y <- backtransformAffine(X, b=c(1,1,1), project=TRUE)
print(X)
print(Y)
stopifnot(sum(X[,1]-Y) <= .Machine$double.eps)
# If the columns of X are proportional to each other, then
# backtransforming with the corresponding scale factors and
# projecting recovers the first column.
X <- matrix(1:4, nrow=4, ncol=3)
X[,2] <- X[,2]*2; X[,3] <- X[,3]*3
print(X)
Y <- backtransformAffine(X, b=c(1,2,3))
print(Y)
Y <- backtransformAffine(X, b=c(1,2,3), project=TRUE)
print(Y)
stopifnot(sum(X[,1]-Y) <= .Machine$double.eps)
}
\keyword{methods}
aroma.light/man/callNaiveGenotypes.Rd 0000644 0001750 0001750 00000007267 14136047216 017467 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% callNaiveGenotypes.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{callNaiveGenotypes}
\alias{callNaiveGenotypes}
\alias{callNaiveGenotypes.numeric}
\title{Calls genotypes in a normal sample}
\description{
Calls genotypes in a normal sample.
}
\usage{
\method{callNaiveGenotypes}{numeric}(y, cn=rep(2L, times = length(y)), ..., modelFit=NULL, verbose=FALSE)
}
\arguments{
\item{y}{A \code{\link[base]{numeric}} \code{\link[base]{vector}} of length J containing allele B fractions
for a normal sample.}
\item{cn}{An optional \code{\link[base]{numeric}} \code{\link[base]{vector}} of length J specifying the true
total copy number in \eqn{\{0,1,2,NA\}} at each locus. This can be
used to specify which loci are diploid and which are not, e.g.
autosomal and sex chromosome copy numbers.}
\item{...}{Additional arguments passed to \code{\link{fitNaiveGenotypes}}().}
\item{modelFit}{An optional model fit as returned
by \code{\link{fitNaiveGenotypes}}().}
\item{verbose}{A \code{\link[base]{logical}} or a \code{\link[R.utils]{Verbose}} object.}
}
\value{
Returns a \code{\link[base]{numeric}} \code{\link[base]{vector}} of length J containing the genotype calls
in allele B fraction space, that is, in [0,1] where 1/2 corresponds
to a heterozygous call, and 0 and 1 correspond to homozygous A
and B, respectively.
Non-called genotypes have value \code{\link[base]{NA}}.
}
\examples{
layout(matrix(1:3, ncol=1))
par(mar=c(2,4,4,1)+0.1)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A bimodal distribution
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
xAA <- rnorm(n=10000, mean=0, sd=0.1)
xBB <- rnorm(n=10000, mean=1, sd=0.1)
x <- c(xAA,xBB)
fit <- findPeaksAndValleys(x)
print(fit)
calls <- callNaiveGenotypes(x, cn=rep(1,length(x)), verbose=-20)
xc <- split(x, calls)
print(table(calls))
xx <- c(list(x),xc)
plotDensity(xx, adjust=1.5, lwd=2, col=seq_along(xx), main="(AA,BB)")
abline(v=fit$x)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A trimodal distribution with missing values
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
xAB <- rnorm(n=10000, mean=1/2, sd=0.1)
x <- c(xAA,xAB,xBB)
x[sample(length(x), size=0.05*length(x))] <- NA;
x[sample(length(x), size=0.01*length(x))] <- -Inf;
x[sample(length(x), size=0.01*length(x))] <- +Inf;
fit <- findPeaksAndValleys(x)
print(fit)
calls <- callNaiveGenotypes(x)
xc <- split(x, calls)
print(table(calls))
xx <- c(list(x),xc)
plotDensity(xx, adjust=1.5, lwd=2, col=seq_along(xx), main="(AA,AB,BB)")
abline(v=fit$x)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A trimodal distribution with clear separation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
xAA <- rnorm(n=10000, mean=0, sd=0.02)
xAB <- rnorm(n=10000, mean=1/2, sd=0.02)
xBB <- rnorm(n=10000, mean=1, sd=0.02)
x <- c(xAA,xAB,xBB)
fit <- findPeaksAndValleys(x)
print(fit)
calls <- callNaiveGenotypes(x)
xc <- split(x, calls)
print(table(calls))
xx <- c(list(x),xc)
plotDensity(xx, adjust=1.5, lwd=2, col=seq_along(xx), main="(AA',AB',BB')")
abline(v=fit$x)
}
\section{Missing and non-finite values}{
A missing value always gives a missing (\code{\link[base]{NA}}) genotype call.
Negative infinity (-\code{\link[base:is.finite]{Inf}}) always gives genotype call 0.
Positive infinity (+\code{\link[base:is.finite]{Inf}}) always gives genotype call 1.
}
\author{Henrik Bengtsson}
\seealso{
Internally \code{\link{fitNaiveGenotypes}}() is used to identify the thresholds.
}
\keyword{methods}
aroma.light/man/fitXYCurve.Rd 0000644 0001750 0001750 00000007216 14136047216 015735 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% fitXYCurve.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{fitXYCurve}
\alias{fitXYCurve}
\alias{fitXYCurve.matrix}
\alias{backtransformXYCurve}
\alias{backtransformXYCurve.matrix}
\title{Fitting a smooth curve through paired (x,y) data}
\description{
Fitting a smooth curve through paired (x,y) data.
}
\usage{
\method{fitXYCurve}{matrix}(X, weights=NULL, typeOfWeights=c("datapoint"), method=c("loess", "lowess",
"spline", "robustSpline"), bandwidth=NULL, satSignal=2^16 - 1, ...)
}
\arguments{
\item{X}{An Nx2 \code{\link[base]{matrix}} where the columns represent the two channels
to be normalized.}
\item{weights}{If \code{\link[base]{NULL}}, non-weighted normalization is done.
If data-point weights are used, this should be a \code{\link[base]{vector}} of length
N of data point weights used when estimating the normalization
function.
}
\item{typeOfWeights}{A \code{\link[base]{character}} string specifying the type of
weights given in argument \code{weights}.
}
\item{method}{\code{\link[base]{character}} string specifying which method to use when
fitting the intensity-dependent function.
Supported methods:
\code{"loess"} (better than lowess),
\code{"lowess"} (classic; supports only zero-one weights),
\code{"spline"} (more robust than lowess at lower and upper
intensities; supports only zero-one weights),
\code{"robustSpline"} (better than spline).
}
\item{bandwidth}{A \code{\link[base]{double}} value specifying the bandwidth of the
estimator used.
}
\item{satSignal}{Signals equal to or above this threshold will not
be used in the fitting.
}
\item{...}{Not used.}
}
\value{
A named \code{\link[base]{list}} structure of class \code{XYCurve}.
}
\section{Missing values}{
The estimation of the function will only be made based on complete
non-saturated observations, i.e. observations that contain no \code{\link[base]{NA}}
values nor saturated values as defined by \code{satSignal}.
}
\section{Weighted normalization}{
Each data point, that is, each row in \code{X}, which is a
vector of length 2, can be assigned a weight in [0,1] specifying how much
it should \emph{affect the fitting of the normalization function}.
Weights are given by argument \code{weights}, which should be a \code{\link[base]{numeric}}
\code{\link[base]{vector}} of length N.
Note that the lowess and the spline method only support zero-one
\{0,1\} weights.
For such methods, all weights that are less than a half are set to zero.
}
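\section{Example of data-point weights (sketch)}{
 The following is a minimal sketch; the simulated data and the choice of
 down-weighted points are illustrative only and are not taken from the
 package's own examples.
\preformatted{
  x <- rexp(500)
  Y <- cbind(2 + 2*x, 15 + 3*x) + rnorm(1000, sd=0.1)

  # Down-weight the first 50 data points when fitting the curve;
  # with method="lowess" or method="spline" such weights (< 1/2)
  # would instead be set to zero
  w <- rep(1, times=nrow(Y))
  w[1:50] <- 0.1

  fit <- fitXYCurve(Y, weights=w, method="loess")
  YN <- backtransformXYCurve(Y, fit=fit)
}
}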
\section{Details on loess}{
For \code{\link[stats]{loess}}, the arguments \code{family="symmetric"},
\code{degree=1}, \code{span=3/4}, and
\code{control=loess.control(trace.hat="approximate",
iterations=5, surface="direct")} are used.
}
\author{Henrik Bengtsson}
\examples{
# Simulate data from the model y <- a + bx + x^c + eps(bx)
x <- rexp(1000)
a <- c(2,15)
b <- c(2,1)
c <- c(1,2)
bx <- outer(b,x)
xc <- t(sapply(c, FUN=function(c) x^c))
eps <- apply(bx, MARGIN=2, FUN=function(x) rnorm(length(x), mean=0, sd=0.1*x))
Y <- a + bx + xc + eps
Y <- t(Y)
lim <- c(0,70)
plot(Y, xlim=lim, ylim=lim)
# Fit a smooth XY curve through a subset of (y_1, y_2)
subset <- sample(nrow(Y), size=0.3*nrow(Y))
fit <- fitXYCurve(Y[subset,], bandwidth=0.2)
lines(fit, col="red", lwd=2)
# Backtransform (y_1, y_2) keeping y_1 unchanged
YN <- backtransformXYCurve(Y, fit=fit)
points(YN, col="blue")
abline(a=0, b=1, col="red", lwd=2)
}
\keyword{methods}
aroma.light/man/sampleTuples.Rd 0000644 0001750 0001750 00000002412 14136047216 016334 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% sampleTuples.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{sampleTuples}
\alias{sampleTuples.default}
\alias{sampleTuples}
\title{Sample tuples of elements from a set}
\description{
Sample tuples of elements from a set.
The elements within a sampled tuple are unique, i.e. no two elements
are the same.
}
\usage{
\method{sampleTuples}{default}(x, size, length, ...)
}
\arguments{
\item{x}{A set of elements to sample from.}
\item{size}{The number of tuples to sample.}
\item{length}{The length of each tuple.}
\item{...}{Additional arguments passed to \code{\link[base]{sample}}().}
}
\value{
Returns a NxK \code{\link[base]{matrix}} where N = \code{size} and K = \code{length}.
}
\author{Henrik Bengtsson}
\examples{
pairs <- sampleTuples(1:10, size=5, length=2)
print(pairs)
triples <- sampleTuples(1:10, size=5, length=3)
print(triples)
# Allow tuples with repeated elements
quadruples <- sampleTuples(1:3, size=5, length=4, replace=TRUE)
print(quadruples)
}
\seealso{
\code{\link[base]{sample}}().
}
\keyword{utilities}
aroma.light/man/normalizeAffine.Rd 0000644 0001750 0001750 00000023235 14136047216 016775 0 ustar nilesh nilesh %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do not modify this file since it was automatically generated from:
%
% normalizeAffine.R
%
% by the Rdoc compiler part of the R.oo package.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\name{normalizeAffine}
\alias{normalizeAffine}
\alias{normalizeAffine.matrix}
\encoding{latin1}
\title{Weighted affine normalization between channels and arrays}
\description{
Weighted affine normalization between channels and arrays.
This method will remove curvature in the M vs A plots that are
due to an affine transformation of the data. In other words, if there
are (small or large) biases in the different (red or green) channels,
biases that may even be identical across channels, you will get curvature
in the M vs A plots, and this type of curvature will be removed by this
normalization method.
Moreover, if you normalize all slides at once, this method will also
bring the signals on the same scale such that the log-ratios for
different slides are comparable. Thus, do not normalize the scale of
the log-ratios between slides afterward.
It is recommended to normalize as many slides as possible in one run.
The result is that if creating log-ratios between any channels and any
slides, they will contain as little curvature as possible.
Furthermore, since the relative scale between any two channels on any
two slides will be one if one normalizes all slides (and channels) at
once, it is possible to add or multiply with the \emph{same} constant
to all channels/arrays without introducing curvature. Thus, it is
easy to rescale the data afterwards as demonstrated in the example.
}
\usage{
\method{normalizeAffine}{matrix}(X, weights=NULL, typeOfWeights=c("datapoint"), method="L1",
constraint=0.05, satSignal=2^16 - 1, ..., .fitOnly=FALSE)
}
\arguments{
\item{X}{An NxK \code{\link[base]{matrix}} (K>=2) where the columns represent the channels
to be normalized.}
\item{weights}{If \code{\link[base]{NULL}}, non-weighted normalization is done.
If data-point weights are used, this should be a \code{\link[base]{vector}} of length
N of data point weights used when estimating the normalization
function.
}
\item{typeOfWeights}{A \code{\link[base]{character}} string specifying the type of
weights given in argument \code{weights}.
}
\item{method}{A \code{\link[base]{character}} string specifying how the estimates are
robustified. See \code{\link{iwpca}}() for all accepted values.}
\item{constraint}{Constraint making the bias parameters identifiable.
See \code{\link{fitIWPCA}}() for more details.}
\item{satSignal}{Signals equal to or above this threshold will not
be used in the fitting.}
\item{...}{Other arguments passed to \code{\link{fitIWPCA}}() and in
turn \code{\link{iwpca}}(). For example, the weight argument
of \code{\link{iwpca}}(). See also below.}
\item{.fitOnly}{If \code{\link[base:logical]{TRUE}}, the data will not be back-transformed.}
}
\value{
An NxK \code{\link[base]{matrix}} of the normalized channels.
The fitted model is returned as attribute \code{modelFit}.
}
\section{Negative, non-positive, and saturated values}{
Affine normalization applies equally well to negative values. Thus,
contrary to normalization methods applied to log-ratios, such as curve-fit
normalization methods, affine normalization will not set these to \code{\link[base]{NA}}.
Data points that are saturated in one or more channels are not used
to estimate the normalization function, but they are normalized.
}
\section{Missing values}{
The estimation of the affine normalization function will only be made
based on complete non-saturated observations, i.e. observations that
contain no \code{\link[base]{NA}} values nor saturated values as defined by \code{satSignal}.
}
\section{Weighted normalization}{
Each data point/observation, that is, each row in \code{X}, which is a
vector of length K, can be assigned a weight in [0,1] specifying how much
it should \emph{affect the fitting of the affine normalization function}.
Weights are given by argument \code{weights},
which should be a \code{\link[base]{numeric}} \code{\link[base]{vector}} of length N. Regardless of weights,
all data points are \emph{normalized} based on the fitted normalization
function.
}
\section{Robustness}{
By default, the model fit of affine normalization is done in \eqn{L_1}
(\code{method="L1"}). This way, outliers affect the parameter estimates
less than ordinary least-square methods.
For further robustness, downweight outliers such as saturated signals,
if possible.
We do not use Tukey's biweight function for reasons similar to those
outlined in \code{\link{calibrateMultiscan}}().
}
\section{Using known/previously estimated channel offsets}{
If the channel offsets can be assumed to be known, then it is
possible to fit the affine model with no (zero) offset, which
formally is a linear (proportional) model, by specifying
argument \code{center=FALSE}.
In order to do this, the channel offsets have to be subtracted
from the signals manually before normalizing, e.g.
\code{Xa <- t(t(X)-a)} where \code{a} is a \code{\link[base]{vector}} of length
\code{ncol(X)}. Then normalize by
\code{Xn <- normalizeAffine(Xa, center=FALSE)}.
You can assert that the model is fitted without offset by
\code{stopifnot(all(attr(Xn, "modelFit")$adiag == 0))}.
}
\details{
A line is fitted robustly through the \eqn{(y_R,y_G)} observations
using an iterated re-weighted principal component analysis (IWPCA),
which minimizes the residuals that are orthogonal to the fitted line.
Each observation is down-weighted by the inverse of the absolute
residuals, i.e. the fit is done in \eqn{L_1}.
}
\author{Henrik Bengtsson}
\references{
[1] Henrik Bengtsson and Ola \enc{Hössjer}{Hossjer}, \emph{Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method}, BMC Bioinformatics, 2006, 7:100.
\cr
}
\examples{
pathname <- system.file("data-ex", "PMT-RGData.dat", package="aroma.light")
rg <- read.table(pathname, header=TRUE, sep="\t")
nbrOfScans <- max(rg$slide)
rg <- as.list(rg)
for (field in c("R", "G"))
rg[[field]] <- matrix(as.double(rg[[field]]), ncol=nbrOfScans)
rg$slide <- rg$spot <- NULL
rg <- as.matrix(as.data.frame(rg))
colnames(rg) <- rep(c("R", "G"), each=nbrOfScans)
layout(matrix(c(1,2,0,3,4,0,5,6,7), ncol=3, byrow=TRUE))
rgC <- rg
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
channelColor <- switch(channel, R="red", G="green")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The raw data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
plotMvsAPairs(rg[,sidx])
title(main=paste("Observed", channel))
box(col=channelColor)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The calibrated data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgC[,sidx] <- calibrateMultiscan(rg[,sidx], average=NULL)
plotMvsAPairs(rgC[,sidx])
title(main=paste("Calibrated", channel))
box(col=channelColor)
} # for (channel ...)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The average calibrated data
#
# Note how the red signals are weaker than the green. The reason
# for this can be that the scale factor in the green channel is
# greater than in the red channel, but it can also be that there
# is a remaining relative difference in bias between the green
# and the red channel, a bias that precedes the scanning.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgCA <- rg
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
rgCA[,sidx] <- calibrateMultiscan(rg[,sidx])
}
rgCAavg <- matrix(NA_real_, nrow=nrow(rgCA), ncol=2)
colnames(rgCAavg) <- c("R", "G")
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
rgCAavg[,channel] <- apply(rgCA[,sidx], MARGIN=1, FUN=median, na.rm=TRUE)
}
# Add some "fake" outliers
outliers <- 1:600
rgCAavg[outliers,"G"] <- 50000
plotMvsA(rgCAavg)
title(main="Average calibrated (AC)")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Normalize data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Weight-down outliers when normalizing
weights <- rep(1, nrow(rgCAavg))
weights[outliers] <- 0.001
# Affine normalization of channels
rgCANa <- normalizeAffine(rgCAavg, weights=weights)
# It is always ok to rescale the affine normalized data if it is
# done on (R,G); not on (A,M)! However, this is only needed for
# esthetic purposes.
rgCANa <- rgCANa *2^1.4
plotMvsA(rgCANa)
title(main="Normalized AC")
# Curve-fit (lowess) normalization
rgCANlw <- normalizeLowess(rgCAavg, weights=weights)
plotMvsA(rgCANlw, col="orange", add=TRUE)
# Curve-fit (loess) normalization
rgCANl <- normalizeLoess(rgCAavg, weights=weights)
plotMvsA(rgCANl, col="red", add=TRUE)
# Curve-fit (robust spline) normalization
rgCANrs <- normalizeRobustSpline(rgCAavg, weights=weights)
plotMvsA(rgCANrs, col="blue", add=TRUE)
legend(x=0,y=16, legend=c("affine", "lowess", "loess", "r. spline"), pch=19,
col=c("black", "orange", "red", "blue"), ncol=2, x.intersp=0.3, bty="n")
plotMvsMPairs(cbind(rgCANa, rgCANlw), col="orange", xlab=expression(M[affine]))
title(main="Normalized AC")
plotMvsMPairs(cbind(rgCANa, rgCANl), col="red", add=TRUE)
plotMvsMPairs(cbind(rgCANa, rgCANrs), col="blue", add=TRUE)
abline(a=0, b=1, lty=2)
legend(x=-6,y=6, legend=c("lowess", "loess", "r. spline"), pch=19,
col=c("orange", "red", "blue"), ncol=2, x.intersp=0.3, bty="n")
}
\seealso{
\code{\link{calibrateMultiscan}}().
}
\keyword{methods}
aroma.light/tests/ 0000755 0001750 0001750 00000000000 14136047216 013757 5 ustar nilesh nilesh aroma.light/tests/rowAverages.matrix.R 0000644 0001750 0001750 00000000356 14136047216 017676 0 ustar nilesh nilesh library("aroma.light")
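# Sanity check: rowAverages() should reproduce the row means and attach
# the row standard deviations ("deviance") and the number of columns
# ("df") as attributes, cf. the assertions below.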
X <- matrix(1:30, nrow=5L, ncol=6L)
mu <- rowMeans(X)
sd <- apply(X, MARGIN=1L, FUN=sd)
y <- rowAverages(X)
stopifnot(all(y == mu))
stopifnot(all(attr(y,"deviance") == sd))
stopifnot(all(attr(y,"df") == ncol(X)))
aroma.light/tests/normalizeDifferencesToAverage.R 0000644 0001750 0001750 00000001644 14136047216 022043 0 ustar nilesh nilesh library("aroma.light")
# Simulate three shifted tracks of different lengths with same profiles
ns <- c(A=2, B=1, C=0.25)*1000
xx <- lapply(ns, FUN=function(n) { seq(from=1, to=max(ns), length.out=n) })
zz <- mapply(seq_along(ns), ns, FUN=function(z,n) rep(z,n))
yy <- list(
A = rnorm(ns["A"], mean=0, sd=0.5),
B = rnorm(ns["B"], mean=5, sd=0.4),
C = rnorm(ns["C"], mean=-5, sd=1.1)
)
yy <- lapply(yy, FUN=function(y) {
n <- length(y)
y[1:(n/2)] <- y[1:(n/2)] + 2
y[1:(n/4)] <- y[1:(n/4)] - 4
y
})
# Shift all tracks toward the first track
yyN <- normalizeDifferencesToAverage(yy, baseline=1)
# The baseline channel is not changed
stopifnot(identical(yy[[1]], yyN[[1]]))
# Get the estimated parameters
fit <- attr(yyN, "fit")
# Plot the tracks
layout(matrix(1:2, ncol=1))
x <- unlist(xx)
col <- unlist(zz)
y <- unlist(yy)
yN <- unlist(yyN)
plot(x, y, col=col, ylim=c(-10,10))
plot(x, yN, col=col, ylim=c(-10,10))
aroma.light/tests/wpca2.matrix.R 0000644 0001750 0001750 00000002645 14136047216 016430 0 ustar nilesh nilesh library("aroma.light")
# -------------------------------------------------------------
# A second example
# -------------------------------------------------------------
# Data
x <- c(1,2,3,4,5)
y <- c(2,4,3,3,6)
opar <- par(bty="L")
opalette <- palette(c("blue", "red", "black"))
xlim <- ylim <- c(0,6)
# Plot the data and the center mass
plot(x,y, pch=16, cex=1.5, xlim=xlim, ylim=ylim)
points(mean(x), mean(y), cex=2, lwd=2, col="blue")
# Linear regression y ~ x
fit <- lm(y ~ x)
abline(fit, lty=1, col=1)
# Linear regression y ~ x through the origin (without intercept)
fit <- lm(y ~ x - 1)
abline(fit, lty=2, col=1)
# Linear regression x ~ y
fit <- lm(x ~ y)
c <- coefficients(fit)
b <- 1/c[2]
a <- -b*c[1]
abline(a=a, b=b, lty=1, col=2)
# Linear regression x ~ y through the origin (without intercept)
fit <- lm(x ~ y - 1)
b <- 1/coefficients(fit)
abline(a=0, b=b, lty=2, col=2)
# Orthogonal linear "regression"
fit <- wpca(cbind(x,y))
b <- fit$vt[1,2]/fit$vt[1,1]
a <- fit$xMean[2]-b*fit$xMean[1]
abline(a=a, b=b, lwd=2, col=3)
# Orthogonal linear "regression" without intercept
fit <- wpca(cbind(x,y), center=FALSE)
b <- fit$vt[1,2]/fit$vt[1,1]
a <- fit$xMean[2]-b*fit$xMean[1]
abline(a=a, b=b, lty=2, lwd=2, col=3)
legend(xlim[1],ylim[2], legend=c("lm(y~x)", "lm(y~x-1)", "lm(x~y)",
"lm(x~y-1)", "pca", "pca w/o intercept"), lty=rep(1:2,3),
lwd=rep(c(1,1,2),each=2), col=rep(1:3,each=2))
palette(opalette)
par(opar)
aroma.light/tests/fitPrincipalCurve.matrix.R 0000644 0001750 0001750 00000004220 14136047216 021034 0 ustar nilesh nilesh library("aroma.light")
# Simulate data from the model y <- a + bx + x^c + eps(bx)
J <- 1000
x <- rexp(J)
a <- c(2,15,3)
b <- c(2,3,4)
c <- c(1,2,1/2)
bx <- outer(b,x)
xc <- t(sapply(c, FUN=function(c) x^c))
eps <- apply(bx, MARGIN=2, FUN=function(x) rnorm(length(b), mean=0, sd=0.1*x))
y <- a + bx + xc + eps
y <- t(y)
# Fit principal curve through (y_1, y_2, y_3)
fit <- fitPrincipalCurve(y, verbose=TRUE)
# Flip direction of 'lambda'?
rho <- cor(fit$lambda, y[,1], use="complete.obs")
flip <- (rho < 0)
if (flip) {
fit$lambda <- max(fit$lambda, na.rm=TRUE)-fit$lambda
}
# Backtransform (y_1, y_2, y_3) to be proportional to each other
yN <- backtransformPrincipalCurve(y, fit=fit)
# Same backtransformation dimension by dimension
yN2 <- y
for (cc in 1:ncol(y)) {
yN2[,cc] <- backtransformPrincipalCurve(y, fit=fit, dimensions=cc)
}
stopifnot(identical(yN2, yN))
xlim <- c(0, 1.04*max(x))
ylim <- range(c(y,yN), na.rm=TRUE)
# Pairwise signals vs x before and after transform
layout(matrix(1:4, nrow=2, byrow=TRUE))
par(mar=c(4,4,3,2)+0.1)
for (cc in 1:3) {
ylab <- substitute(y[c], env=list(c=cc))
plot(NA, xlim=xlim, ylim=ylim, xlab="x", ylab=ylab)
abline(h=a[cc], lty=3)
mtext(side=4, at=a[cc], sprintf("a=%g", a[cc]),
cex=0.8, las=2, line=0, adj=1.1, padj=-0.2)
points(x, y[,cc])
points(x, yN[,cc], col="tomato")
legend("topleft", col=c("black", "tomato"), pch=19,
c("original", "transformed"), bty="n")
}
title(main="Pairwise signals vs x before and after transform", outer=TRUE, line=-2)
# Pairwise signals before and after transform
layout(matrix(1:4, nrow=2, byrow=TRUE))
par(mar=c(4,4,3,2)+0.1)
for (rr in 3:2) {
ylab <- substitute(y[c], env=list(c=rr))
for (cc in 1:2) {
if (cc == rr) {
plot.new()
next
}
xlab <- substitute(y[c], env=list(c=cc))
plot(NA, xlim=ylim, ylim=ylim, xlab=xlab, ylab=ylab)
abline(a=0, b=1, lty=2)
points(y[,c(cc,rr)])
points(yN[,c(cc,rr)], col="tomato")
legend("topleft", col=c("black", "tomato"), pch=19,
c("original", "transformed"), bty="n")
}
}
title(main="Pairwise signals before and after transform", outer=TRUE, line=-2)
aroma.light/tests/callNaiveGenotypes.R 0000644 0001750 0001750 00000003236 14136047216 017702 0 ustar nilesh nilesh library("aroma.light")
layout(matrix(1:3, ncol=1))
par(mar=c(2,4,4,1)+0.1)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A bimodal distribution
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
xAA <- rnorm(n=10000, mean=0, sd=0.1)
xBB <- rnorm(n=10000, mean=1, sd=0.1)
x <- c(xAA,xBB)
fit <- findPeaksAndValleys(x)
print(fit)
calls <- callNaiveGenotypes(x, cn=rep(1,length(x)), verbose=-20)
xc <- split(x, calls)
print(table(calls))
xx <- c(list(x),xc)
plotDensity(xx, adjust=1.5, lwd=2, col=seq_along(xx), main="(AA,BB)")
abline(v=fit$x)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A trimodal distribution with missing values
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
xAB <- rnorm(n=10000, mean=1/2, sd=0.1)
x <- c(xAA,xAB,xBB)
x[sample(length(x), size=0.05*length(x))] <- NA_real_
x[sample(length(x), size=0.01*length(x))] <- -Inf
x[sample(length(x), size=0.01*length(x))] <- +Inf
fit <- findPeaksAndValleys(x)
print(fit)
calls <- callNaiveGenotypes(x)
xc <- split(x, calls)
print(table(calls))
xx <- c(list(x),xc)
plotDensity(xx, adjust=1.5, lwd=2, col=seq_along(xx), main="(AA,AB,BB)")
abline(v=fit$x)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A trimodal distribution with clear separation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
xAA <- rnorm(n=10000, mean=0, sd=0.02)
xAB <- rnorm(n=10000, mean=1/2, sd=0.02)
xBB <- rnorm(n=10000, mean=1, sd=0.02)
x <- c(xAA,xAB,xBB)
fit <- findPeaksAndValleys(x)
print(fit)
calls <- callNaiveGenotypes(x)
xc <- split(x, calls)
print(table(calls))
xx <- c(list(x),xc)
plotDensity(xx, adjust=1.5, lwd=2, col=seq_along(xx), main="(AA',AB',BB')")
abline(v=fit$x)
aroma.light/tests/normalizeAverage.matrix.R 0000644 0001750 0001750 00000001071 14136047216 020677 0 ustar nilesh nilesh library("aroma.light")
# Simulate three samples with on average 20% missing values
N <- 10000
X <- cbind(rnorm(N, mean=3, sd=1),
rnorm(N, mean=4, sd=2),
rgamma(N, shape=2, rate=1))
X[sample(3*N, size=0.20*3*N)] <- NA_real_
# Normalize toward a common (target) average
Xn <- normalizeAverage(X, na.rm=TRUE, targetAvg=median(X, na.rm=TRUE))
# Plot the data
layout(matrix(1:2, ncol=1))
xlim <- range(X, Xn, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The three original distributions")
plotDensity(Xn, lwd=2, xlim=xlim, main="The three normalized distributions")
aroma.light/tests/sampleCorrelations.matrix.R 0000644 0001750 0001750 00000000345 14136047216 021255 0 ustar nilesh nilesh library("aroma.light")
# Simulate 20000 genes with 10 observations each
X <- matrix(rnorm(n=20000), ncol=10)
# Calculate the correlation for 5000 random gene pairs
cor <- sampleCorrelations(X, npairs=5000)
print(summary(cor))
aroma.light/tests/normalizeQuantileRank.list.R 0000644 0001750 0001750 00000001405 14136047216 021373 0 ustar nilesh nilesh library("aroma.light")
# Simulate ten samples of different lengths
N <- 10000
X <- list()
for (kk in 1:8) {
rfcn <- list(rnorm, rgamma)[[sample(2, size=1)]]
size <- runif(1, min=0.3, max=1)
a <- rgamma(1, shape=20, rate=10)
b <- rgamma(1, shape=10, rate=10)
values <- rfcn(size*N, a, b)
# "Censor" values
values[values < 0 | values > 8] <- NA_real_
X[[kk]] <- values
}
# Add 20% missing values
X <- lapply(X, FUN=function(x) {
x[sample(length(x), size=0.20*length(x))] <- NA_real_
x
})
# Normalize quantiles
Xn <- normalizeQuantile(X)
# Plot the data
layout(matrix(1:2, ncol=1))
xlim <- range(X, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The original distributions")
plotDensity(Xn, lwd=2, xlim=xlim, main="The normalized distributions")
aroma.light/tests/normalizeAffine.matrix.R 0000644 0001750 0001750 00000005176 14136047216 020527 0 ustar nilesh nilesh library("aroma.light")
pathname <- system.file("data-ex", "PMT-RGData.dat", package="aroma.light")
rg <- read.table(pathname, header=TRUE, sep="\t")
nbrOfScans <- max(rg$slide)
rg <- as.list(rg)
for (field in c("R", "G"))
rg[[field]] <- matrix(as.double(rg[[field]]), ncol=nbrOfScans)
rg$slide <- rg$spot <- NULL
rg <- as.matrix(as.data.frame(rg))
colnames(rg) <- rep(c("R", "G"), each=nbrOfScans)
rgC <- rg
layout(matrix(c(1,2,0,3,4,0,5,6,7), ncol=3, byrow=TRUE))
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
channelColor <- switch(channel, R="red", G="green")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The raw data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
plotMvsAPairs(rg, channel=channel)
title(main=paste("Observed", channel))
box(col=channelColor)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The calibrated data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgC[,sidx] <- calibrateMultiscan(rg[,sidx], average=NULL)
plotMvsAPairs(rgC, channel=channel)
title(main=paste("Calibrated", channel))
box(col=channelColor)
} # for (channel ...)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The average calibrated data
#
# Note how the red signals are weaker than the green. The reason
# for this can be that the scale factor in the green channel is
# greater than in the red channel, but it can also be that there
# is a remaining relative difference in bias between the green
# and the red channel, a bias that precedes the scanning.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgCA <- matrix(NA_real_, nrow=nrow(rg), ncol=2)
colnames(rgCA) <- c("R", "G")
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
rgCA[,channel] <- calibrateMultiscan(rg[,sidx])
}
plotMvsA(rgCA)
title(main="Average calibrated")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The affine normalized average calibrated data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Create a matrix where the columns represent the channels
# to be normalized.
rgCAN <- rgCA
# Affine normalization of channels
rgCAN <- normalizeAffine(rgCAN)
plotMvsA(rgCAN)
title(main="Affine normalized A.C.")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# It is always ok to rescale the affine normalized data if it is
# done on (R,G); not on (A,M)! However, this is only needed for
# esthetic purposes.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgCAN <- rgCAN * 2^5
plotMvsA(rgCAN)
title(main="Rescaled normalized")
aroma.light/tests/robustSmoothSpline.R 0000644 0001750 0001750 00000002213 14136047216 017763 0 ustar nilesh nilesh library("aroma.light")
data(cars)
attach(cars)
plot(speed, dist, main = "data(cars) & robust smoothing splines")
# Fit a smoothing spline using L_2 norm
cars.spl <- smooth.spline(speed, dist)
lines(cars.spl, col = "blue")
# Fit a smoothing spline using L_1 norm
cars.rspl <- robustSmoothSpline(speed, dist)
lines(cars.rspl, col = "red")
# Fit a smoothing spline using L_2 norm with 10 degrees of freedom
lines(smooth.spline(speed, dist, df=10), lty=2, col = "blue")
# Fit a smoothing spline using L_1 norm with 10 degrees of freedom
lines(robustSmoothSpline(speed, dist, df=10), lty=2, col = "red")
# Fit a smoothing spline using Tukey's biweight norm
cars.rspl2 <- robustSmoothSpline(speed, dist, method = "symmetric")
lines(cars.rspl2, col = "purple")
legend(5,120, c(
paste("smooth.spline [C.V.] => df =",round(cars.spl$df,1)),
paste("robustSmoothSpline L1 [C.V.] => df =",round(cars.rspl$df,1)),
paste("robustSmoothSpline symmetric [C.V.] => df =",round(cars.rspl2$df,1)),
"standard with s( * , df = 10)", "robust with s( * , df = 10)"
),
col = c("blue","red","purple","blue","red"), lty = c(1,1,1,2,2),
bg='bisque')
aroma.light/tests/wpca.matrix.R 0000644 0001750 0001750 00000003550 14136047216 016342 0 ustar nilesh nilesh library("aroma.light")
for (zzz in 0) {
# This example requires plot3d() in R.basic [http://www.braju.com/R/]
if (!require(pkgName <- "R.basic", character.only=TRUE)) break
# -------------------------------------------------------------
# A first example
# -------------------------------------------------------------
# Simulate data from the model y <- a + bx + eps(bx)
x <- rexp(1000)
a <- c(2,15,3)
b <- c(2,3,15)
bx <- outer(b,x)
eps <- apply(bx, MARGIN=2, FUN=function(x) rnorm(length(x), mean=0, sd=0.1*x))
y <- a + bx + eps
y <- t(y)
# Add some outliers by permuting the dimensions for 1/3 of the observations
idx <- sample(1:nrow(y), size=1/3*nrow(y))
y[idx,] <- y[idx,c(2,3,1)]
# Down-weight the outliers W times to demonstrate how weights are used
W <- 10
# Plot the data with fitted lines at four different view points
N <- 4
theta <- seq(0,180,length.out=N)
phi <- rep(30, length.out=N)
# Use a different color for each set of weights
col <- topo.colors(W)
opar <- par(mar=c(1,1,1,1)+0.1)
layout(matrix(1:N, nrow=2, byrow=TRUE))
for (kk in seq(theta)) {
# Plot the data
plot3d(y, theta=theta[kk], phi=phi[kk])
# First, same weights for all observations
w <- rep(1, length=nrow(y))
for (ww in 1:W) {
# Fit a line using IWPCA through data
fit <- wpca(y, w=w, swapDirections=TRUE)
# Get the first principal component
ymid <- fit$xMean
d0 <- apply(y, MARGIN=2, FUN=min) - ymid
d1 <- apply(y, MARGIN=2, FUN=max) - ymid
b <- fit$vt[1,]
y0 <- -b * max(abs(d0))
y1 <- b * max(abs(d1))
yline <- matrix(c(y0,y1), nrow=length(b), ncol=2)
yline <- yline + ymid
points3d(t(ymid), col=col)
lines3d(t(yline), col=col)
# Down-weight outliers only, because here we know which they are.
w[idx] <- w[idx]/2
}
# Highlight the last one
lines3d(t(yline), col="red", lwd=3)
}
par(opar)
} # for (zzz in 0)
rm(zzz)
aroma.light/tests/likelihood.smooth.spline.R 0000644 0001750 0001750 00000002106 14136047216 021025 0 ustar nilesh nilesh library("aroma.light")
# Define f(x)
f <- expression(0.1*x^4 + 1*x^3 + 2*x^2 + x + 10*sin(2*x))
# Simulate data from this function in the range [a,b]
a <- -2; b <- 5
x <- seq(a, b, length.out=3000)
y <- eval(f)
# Add some noise to the data
y <- y + rnorm(length(y), 0, 10)
# Plot the noisy data
plot(x,y, type="l", lwd=4)
# Fit a cubic smoothing spline and plot it
g <- smooth.spline(x,y, df=16)
lines(g, col="yellow", lwd=2, lty=2)
# Calculating the (log) likelihood of the fitted spline
l <- likelihood(g)
cat("Log likelihood with unique x values:\n")
print(l)
# Note that this is not the same as the log likelihood of the
# data on the fitted spline if the x values are non-unique
x[1:5] <- x[1] # Non-unique x values
g <- smooth.spline(x,y, df=16)
l <- likelihood(g)
cat("\nLog likelihood of the *spline* data set:\n")
print(l)
# In cases with non-unique x values one has to proceed as
# below if one wants to get the log likelihood for the original
# data.
l <- likelihood(g, x=x, y=y)
cat("\nLog likelihood of the *original* data set:\n")
print(l)
aroma.light/tests/normalizeFragmentLength-ex2.R 0000644 0001750 0001750 00000011063 14136047216 021425 0 ustar nilesh nilesh library("aroma.light")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Example 2: Two-enzyme fragment-length normalization of 6 arrays
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
set.seed(0xbeef)
# Number samples
I <- 5
# Number of loci
J <- 3000
# Fragment lengths (two enzymes)
fl <- matrix(0, nrow=J, ncol=2)
fl[,1] <- seq(from=100, to=1000, length.out=J)
fl[,2] <- seq(from=1000, to=100, length.out=J)
# Let 1/2 of the units be on both enzymes
fl[seq(from=1, to=J, by=4),1] <- NA_real_
fl[seq(from=2, to=J, by=4),2] <- NA_real_
# Let some have unknown fragment lengths
hasUnknownFL <- seq(from=1, to=J, by=15)
fl[hasUnknownFL,] <- NA_real_
# Sty/Nsp mixing proportions:
rho <- rep(1, I)
rho[1] <- 1/3; # Less Sty in 1st sample
rho[3] <- 3/2; # More Sty in 3rd sample
# Simulate data
z <- array(0, dim=c(J,2,I))
maxLog2Theta <- 12
for (ii in 1:I) {
# Common effect for both enzymes
mu <- function(fl) {
k <- runif(n=1, min=3, max=5)
mu <- rep(maxLog2Theta, length(fl))
ok <- is.finite(fl)
mu[ok] <- mu[ok] - fl[ok]^{1/k}
mu
}
# Calculate the effect for each data point
for (ee in 1:2) {
z[,ee,ii] <- mu(fl[,ee])
}
# Update the Sty/Nsp mixing proportions
ee <- 2
z[,ee,ii] <- rho[ii]*z[,ee,ii]
# Add random errors
for (ee in 1:2) {
eps <- rnorm(J, mean=0, sd=1/sqrt(2))
z[,ee,ii] <- z[,ee,ii] + eps
}
}
hasFl <- is.finite(fl)
unitSets <- list(
nsp = which( hasFl[,1] & !hasFl[,2]),
sty = which(!hasFl[,1] & hasFl[,2]),
both = which( hasFl[,1] & hasFl[,2]),
none = which(!hasFl[,1] & !hasFl[,2])
)
# The observed data is a mix of two enzymes
theta <- matrix(NA_real_, nrow=J, ncol=I)
# Single-enzyme units
for (ee in 1:2) {
uu <- unitSets[[ee]]
theta[uu,] <- 2^z[uu,ee,]
}
# Both-enzyme units (sum on intensity scale)
uu <- unitSets$both
theta[uu,] <- (2^z[uu,1,]+2^z[uu,2,])/2
# Missing units (sample from the others)
uu <- unitSets$none
theta[uu,] <- apply(theta, MARGIN=2, sample, size=length(uu))
# Calculate target array
thetaT <- rowMeans(theta, na.rm=TRUE)
targetFcns <- list()
for (ee in 1:2) {
uu <- unitSets[[ee]]
fit <- lowess(fl[uu,ee], log2(thetaT[uu]))
class(fit) <- "lowess"
# Use local() so that each target function captures this iteration's 'fit';
# otherwise both closures would refer to the last fitted curve only.
targetFcns[[ee]] <- local({
  fit <- fit
  function(fl, ...) predict(fit, newdata=fl)
})
}
# Fit model only to a subset of the data
subsetToFit <- setdiff(1:J, seq(from=1, to=J, by=10))
# Normalize data (to a target baseline)
thetaN <- matrix(NA_real_, nrow=J, ncol=I)
fits <- vector("list", I)
for (ii in 1:I) {
lthetaNi <- normalizeFragmentLength(log2(theta[,ii]), targetFcns=targetFcns,
fragmentLengths=fl, onMissing="median",
subsetToFit=subsetToFit, .returnFit=TRUE)
fits[[ii]] <- attr(lthetaNi, "modelFit")
thetaN[,ii] <- 2^lthetaNi
}
# Plot raw data
xlim <- c(0, max(fl, na.rm=TRUE))
ylim <- c(0, max(log2(theta), na.rm=TRUE))
Mlim <- c(-1,1)*4
xlab <- "Fragment length"
ylab <- expression(log2(theta))
Mlab <- expression(M==log[2](theta/theta[R]))
layout(matrix(1:(3*I), ncol=I, byrow=TRUE))
for (ii in 1:I) {
plot(NA, xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab, main="raw")
# Single-enzyme units
for (ee in 1:2) {
# The raw data
uu <- unitSets[[ee]]
points(fl[uu,ee], log2(theta[uu,ii]), col=ee+1)
}
# Both-enzyme units (use fragment-length for enzyme #1)
uu <- unitSets$both
points(fl[uu,1], log2(theta[uu,ii]), col=3+1)
for (ee in 1:2) {
# The true effects
uu <- unitSets[[ee]]
lines(lowess(fl[uu,ee], log2(theta[uu,ii])), col="black", lwd=4, lty=3)
# The estimated effects
fit <- fits[[ii]][[ee]]$fit
lines(fit, col="orange", lwd=3)
muT <- targetFcns[[ee]](fl[uu,ee])
lines(fl[uu,ee], muT, col="cyan", lwd=1)
}
}
# Calculate log-ratios
thetaR <- rowMeans(thetaN, na.rm=TRUE)
M <- log2(thetaN/thetaR)
# Plot normalized data
for (ii in 1:I) {
plot(NA, xlim=xlim, ylim=Mlim, xlab=xlab, ylab=Mlab, main="normalized")
# Single-enzyme units
for (ee in 1:2) {
# The normalized data
uu <- unitSets[[ee]]
points(fl[uu,ee], M[uu,ii], col=ee+1)
}
# Both-enzyme units (use fragment-length for enzyme #1)
uu <- unitSets$both
points(fl[uu,1], M[uu,ii], col=3+1)
}
ylim <- c(0,1.5)
for (ii in 1:I) {
data <- list()
for (ee in 1:2) {
# The normalized data
uu <- unitSets[[ee]]
data[[ee]] <- M[uu,ii]
}
uu <- unitSets$both
if (length(uu) > 0)
data[[3]] <- M[uu,ii]
uu <- unitSets$none
if (length(uu) > 0)
data[[4]] <- M[uu,ii]
cols <- seq_along(data)+1
plotDensity(data, col=cols, xlim=Mlim, xlab=Mlab, main="normalized")
abline(v=0, lty=2)
}
aroma.light/tests/findPeaksAndValleys.R 0000644 0001750 0001750 00000002173 14136047216 017774 0 ustar nilesh nilesh library("aroma.light")
layout(matrix(1:3, ncol=1))
par(mar=c(2,4,4,1)+0.1)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A unimodal distribution
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
x1 <- rnorm(n=10000, mean=0, sd=1)
x <- x1
fit <- findPeaksAndValleys(x)
print(fit)
plot(density(x), lwd=2, main="x1")
abline(v=fit$x)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A trimodal distribution
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
x2 <- rnorm(n=10000, mean=4, sd=1)
x3 <- rnorm(n=10000, mean=8, sd=1)
x <- c(x1,x2,x3)
fit <- findPeaksAndValleys(x)
print(fit)
plot(density(x), lwd=2, main="c(x1,x2,x3)")
abline(v=fit$x)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A trimodal distribution with clear separation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
x1b <- rnorm(n=10000, mean=0, sd=0.1)
x2b <- rnorm(n=10000, mean=4, sd=0.1)
x3b <- rnorm(n=10000, mean=8, sd=0.1)
x <- c(x1b,x2b,x3b)
# Illustrating explicit usage of density()
d <- density(x)
fit <- findPeaksAndValleys(d, tol=0)
print(fit)
plot(d, lwd=2, main="c(x1b,x2b,x3b)")
abline(v=fit$x)
aroma.light/tests/normalizeFragmentLength-ex1.R 0000644 0001750 0001750 00000003316 14136047216 021426 0 ustar nilesh nilesh library("aroma.light")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Example 1: Single-enzyme fragment-length normalization of 6 arrays
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Number of samples
I <- 9
# Number of loci
J <- 1000
# Fragment lengths
fl <- seq(from=100, to=1000, length.out=J)
# Simulate data points with unknown fragment lengths
hasUnknownFL <- seq(from=1, to=J, by=50)
fl[hasUnknownFL] <- NA_real_
# Simulate data
y <- matrix(0, nrow=J, ncol=I)
maxY <- 12
for (kk in 1:I) {
k <- runif(n=1, min=3, max=5)
mu <- function(fl) {
mu <- rep(maxY, length(fl))
ok <- !is.na(fl)
mu[ok] <- mu[ok] - fl[ok]^{1/k}
mu
}
eps <- rnorm(J, mean=0, sd=1)
y[,kk] <- mu(fl) + eps
}
# Normalize data (to a zero baseline)
yN <- apply(y, MARGIN=2, FUN=function(y) {
normalizeFragmentLength(y, fragmentLengths=fl, onMissing="median")
})
# The correction factors
rho <- y-yN
print(summary(rho))
# The correction for units with unknown fragment lengths
# equals the median correction factor of all other units
print(summary(rho[hasUnknownFL,]))
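# Added check (hedged; illustrative addition): per sample, the correction
# applied to units with unknown fragment lengths should agree with the median
# correction among units with known fragment lengths.
medKnown <- apply(rho[-hasUnknownFL, , drop=FALSE], MARGIN=2, FUN=median, na.rm=TRUE)
print(all.equal(rho[hasUnknownFL[1], ], medKnown, check.attributes=FALSE))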
# Plot raw data
layout(matrix(1:9, ncol=3))
xlim <- c(0,max(fl, na.rm=TRUE))
ylim <- c(0,max(y, na.rm=TRUE))
xlab <- "Fragment length"
ylab <- expression(log2(theta))
for (kk in 1:I) {
plot(fl, y[,kk], xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab)
ok <- (is.finite(fl) & is.finite(y[,kk]))
lines(lowess(fl[ok], y[ok,kk]), col="red", lwd=2)
}
# Plot normalized data
layout(matrix(1:9, ncol=3))
ylim <- c(-1,1)*max(y, na.rm=TRUE)/2
for (kk in 1:I) {
plot(fl, yN[,kk], xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab)
ok <- (is.finite(fl) & is.finite(y[,kk]))
lines(lowess(fl[ok], yN[ok,kk]), col="blue", lwd=2)
}
aroma.light/tests/normalizeTumorBoost,flavors.R 0000644 0001750 0001750 00000002362 14136047216 021614 0 ustar nilesh nilesh library("aroma.light")
library("R.utils")
# Load data
pathname <- system.file("data-ex/TumorBoost,fracB,exampleData.Rbin", package="aroma.light")
data <- loadObject(pathname)
# Drop loci with missing values
data <- na.omit(data)
attachLocally(data)
pos <- position/1e6
# Call naive genotypes
muN <- callNaiveGenotypes(betaN)
# Genotype classes
isAA <- (muN == 0)
isAB <- (muN == 1/2)
isBB <- (muN == 1)
# Sanity checks
stopifnot(all(muN[isAA] == 0))
stopifnot(all(muN[isAB] == 1/2))
stopifnot(all(muN[isBB] == 1))
# TumorBoost normalization with different flavors
betaTNs <- list()
for (flavor in c("v1", "v2", "v3", "v4")) {
betaTN <- normalizeTumorBoost(betaT=betaT, betaN=betaN, preserveScale=FALSE, flavor=flavor)
# Assert that no non-finite values are introduced
stopifnot(all(is.finite(betaTN)))
# Assert that nothing is flipped
stopifnot(all(betaTN[isAA] < 1/2))
stopifnot(all(betaTN[isBB] > 1/2))
betaTNs[[flavor]] <- betaTN
}
# Plot
layout(matrix(1:4, ncol=1))
par(mar=c(2.5,4,0.5,1)+0.1)
ylim <- c(-0.05, 1.05)
col <- rep("#999999", length(muN))
col[muN == 1/2] <- "#000000"
for (flavor in names(betaTNs)) {
betaTN <- betaTNs[[flavor]]
ylab <- sprintf("betaTN[%s]", flavor)
plot(pos, betaTN, col=col, ylim=ylim, ylab=ylab)
}
aroma.light/tests/normalizeTumorBoost.R 0000644 0001750 0001750 00000001547 14136047216 020147 0 ustar nilesh nilesh library("aroma.light")
library("R.utils")
# Load data
pathname <- system.file("data-ex/TumorBoost,fracB,exampleData.Rbin", package="aroma.light")
data <- loadObject(pathname)
attachLocally(data)
pos <- position/1e6
muN <- genotypeN
layout(matrix(1:4, ncol=1))
par(mar=c(2.5,4,0.5,1)+0.1)
ylim <- c(-0.05, 1.05)
col <- rep("#999999", length(muN))
col[muN == 1/2] <- "#000000"
# Allele B fractions for the normal sample
plot(pos, betaN, col=col, ylim=ylim)
# Allele B fractions for the tumor sample
plot(pos, betaT, col=col, ylim=ylim)
# TumorBoost w/ naive genotype calls
betaTN <- normalizeTumorBoost(betaT=betaT, betaN=betaN, preserveScale=FALSE)
plot(pos, betaTN, col=col, ylim=ylim)
# TumorBoost w/ external multi-sample genotype calls
betaTNx <- normalizeTumorBoost(betaT=betaT, betaN=betaN, muN=muN, preserveScale=FALSE)
plot(pos, betaTNx, col=col, ylim=ylim)
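# Added illustration (hedged): TumorBoost is expected to reduce the spread of
# tumor allele B fractions at SNPs that are heterozygous in the normal;
# compare the standard deviations before and after normalization.
isHet <- (muN == 1/2)
print(c(raw=sd(betaT[isHet], na.rm=TRUE), normalized=sd(betaTN[isHet], na.rm=TRUE)))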
aroma.light/tests/normalizeQuantileSpline.matrix.R 0000644 0001750 0001750 00000001445 14136047216 022267 0 ustar nilesh nilesh library("aroma.light")
# Simulate three samples with on average 20% missing values
N <- 10000
X <- cbind(rnorm(N, mean=3, sd=1),
rnorm(N, mean=4, sd=2),
rgamma(N, shape=2, rate=1))
X[sample(3*N, size=0.20*3*N)] <- NA_real_
# Plot the data
layout(matrix(c(1,0,2:5), ncol=2, byrow=TRUE))
xlim <- range(X, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The three original distributions")
Xn <- normalizeQuantile(X)
plotDensity(Xn, lwd=2, xlim=xlim, main="The three normalized distributions")
plotXYCurve(X, Xn, xlim=xlim, main="The three normalized distributions")
Xn2 <- normalizeQuantileSpline(X, xTarget=Xn[,1], spar=0.99)
plotDensity(Xn2, lwd=2, xlim=xlim, main="The three normalized distributions")
plotXYCurve(X, Xn2, xlim=xlim, main="The three normalized distributions")
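# Added illustration (hedged): after spline-based quantile normalization
# toward the target 'Xn[,1]', the per-column quantiles should be similar.
print(apply(Xn2, MARGIN=2, FUN=quantile, probs=c(0.25, 0.5, 0.75), na.rm=TRUE))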
aroma.light/tests/distanceBetweenLines.R 0000644 0001750 0001750 00000005054 14136047216 020205 0 ustar nilesh nilesh library("aroma.light")
for (zzz in 0) {
# This example requires plot3d() in R.basic [http://www.braju.com/R/]
if (!require(pkgName <- "R.basic", character.only=TRUE)) break
layout(matrix(1:4, nrow=2, ncol=2, byrow=TRUE))
############################################################
# Lines in two-dimensions
############################################################
x <- list(a=c(1,0), b=c(1,2))
y <- list(a=c(0,2), b=c(1,1))
fit <- distanceBetweenLines(ax=x$a, bx=x$b, ay=y$a, by=y$b)
xlim <- ylim <- c(-1,8)
plot(NA, xlab="", ylab="", xlim=ylim, ylim=ylim)
# Highlight the offset coordinates for both lines
points(t(x$a), pch="+", col="red")
text(t(x$a), label=expression(a[x]), adj=c(-1,0.5))
points(t(y$a), pch="+", col="blue")
text(t(y$a), label=expression(a[y]), adj=c(-1,0.5))
v <- c(-1,1)*10
xv <- list(x=x$a[1]+x$b[1]*v, y=x$a[2]+x$b[2]*v)
yv <- list(x=y$a[1]+y$b[1]*v, y=y$a[2]+y$b[2]*v)
lines(xv, col="red")
lines(yv, col="blue")
points(t(fit$xs), cex=2.0, col="red")
text(t(fit$xs), label=expression(x(s)), adj=c(+2,0.5))
points(t(fit$yt), cex=1.5, col="blue")
text(t(fit$yt), label=expression(y(t)), adj=c(-1,0.5))
print(fit)
############################################################
# Lines in three-dimensions
############################################################
x <- list(a=c(0,0,0), b=c(1,1,1)) # The 'diagonal'
y <- list(a=c(2,1,2), b=c(2,1,3)) # A 'fitted' line
fit <- distanceBetweenLines(ax=x$a, bx=x$b, ay=y$a, by=y$b)
xlim <- ylim <- zlim <- c(-1,3)
dummy <- t(c(1,1,1))*100
# Coordinates for the lines in 3d
v <- seq(-10,10, by=1)
xv <- list(x=x$a[1]+x$b[1]*v, y=x$a[2]+x$b[2]*v, z=x$a[3]+x$b[3]*v)
yv <- list(x=y$a[1]+y$b[1]*v, y=y$a[2]+y$b[2]*v, z=y$a[3]+y$b[3]*v)
for (theta in seq(30,140,length.out=3)) {
plot3d(dummy, theta=theta, phi=30, xlab="", ylab="", zlab="",
xlim=ylim, ylim=ylim, zlim=zlim)
# Highlight the offset coordinates for both lines
points3d(t(x$a), pch="+", col="red")
text3d(t(x$a), label=expression(a[x]), adj=c(-1,0.5))
points3d(t(y$a), pch="+", col="blue")
text3d(t(y$a), label=expression(a[y]), adj=c(-1,0.5))
# Draw the lines
lines3d(xv, col="red")
lines3d(yv, col="blue")
# Draw the two points that are closest to each other
points3d(t(fit$xs), cex=2.0, col="red")
text3d(t(fit$xs), label=expression(x(s)), adj=c(+2,0.5))
points3d(t(fit$yt), cex=1.5, col="blue")
text3d(t(fit$yt), label=expression(y(t)), adj=c(-1,0.5))
# Draw the distance between the two points
lines3d(rbind(fit$xs,fit$yt), col="purple", lwd=2)
}
print(fit)
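# Added check (hedged; illustrative addition): the shortest segment between
# two non-parallel lines is orthogonal to both of them, so both dot products
# below should be approximately zero.
d <- fit$yt - fit$xs
print(c(sum(d * x$b), sum(d * y$b)))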
} # for (zzz in 0)
rm(zzz)
aroma.light/tests/backtransformPrincipalCurve.matrix.R 0000644 0001750 0001750 00000006675 14136047216 023126 0 ustar nilesh nilesh library("aroma.light")
# Consider the case where K=4 measurements have been done
# for the same underlying signals 'x'. The different measurements
# have different systematic variation
#
# y_k = f(x_k) + eps_k; k = 1,...,K.
#
# In this example, we assume non-linear measurement functions
#
# f(x) = a + b*x + x^c + eps(b*x)
#
# where 'a' is an offset, 'b' a scale factor, and 'c' an exponential.
# We also assume heteroscedastic zero-mean noise with standard
# deviation proportional to the rescaled underlying signal 'x'.
#
# Furthermore, we assume that measurements k=2 and k=3 undergo the
# same transformation, which may illustrate that they come from
# the same batch. However, when *fitting* the model below we
# will assume they are independent.
# Transforms
a <- c(2, 15, 15, 3)
b <- c(2, 3, 3, 4)
c <- c(1, 2, 2, 1/2)
K <- length(a)
# The true signal
N <- 1000
x <- rexp(N)
# The noise
bX <- outer(b,x)
E <- apply(bX, MARGIN=2, FUN=function(x) rnorm(K, mean=0, sd=0.1*x))
# The transformed signals with noise
Xc <- t(sapply(c, FUN=function(c) x^c))
Y <- a + bX + Xc + E
Y <- t(Y)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit principal curve
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit principal curve through Y = (y_1, y_2, ..., y_K)
fit <- fitPrincipalCurve(Y)
# Flip direction of 'lambda'?
rho <- cor(fit$lambda, Y[,1], use="complete.obs")
flip <- (rho < 0)
if (flip) {
fit$lambda <- max(fit$lambda, na.rm=TRUE)-fit$lambda
}
L <- ncol(fit$s)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Backtransform data according to model fit
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Backtransform toward the principal curve (the "common scale")
YN1 <- backtransformPrincipalCurve(Y, fit=fit)
stopifnot(ncol(YN1) == K)
# Backtransform toward the first dimension
YN2 <- backtransformPrincipalCurve(Y, fit=fit, targetDimension=1)
stopifnot(ncol(YN2) == K)
# Backtransform toward the last (fitted) dimension
YN3 <- backtransformPrincipalCurve(Y, fit=fit, targetDimension=L)
stopifnot(ncol(YN3) == K)
# Backtransform toward the third dimension (dimension by dimension)
# Note, this assumes that K == L.
YN4 <- Y
for (cc in 1:L) {
YN4[,cc] <- backtransformPrincipalCurve(Y, fit=fit,
targetDimension=1, dimensions=cc)
}
stopifnot(identical(YN4, YN2))
# Backtransform a subset toward the first dimension
# Note, this assumes that K == L.
YN5 <- backtransformPrincipalCurve(Y, fit=fit,
targetDimension=1, dimensions=2:3)
stopifnot(identical(YN5, YN2[,2:3]))
stopifnot(ncol(YN5) == 2)
# Extract signals from measurement #2 and backtransform according to
# its model fit. Signals are standardized to target dimension 1.
y6 <- Y[,2,drop=FALSE]
yN6 <- backtransformPrincipalCurve(y6, fit=fit, dimensions=2,
targetDimension=1)
stopifnot(identical(yN6, YN2[,2,drop=FALSE]))
stopifnot(ncol(yN6) == 1)
# Extract signals from measurement #2 and backtransform according to
# the model fit of measurement #3 (because we believe these two have
# undergone very similar transformations).
# Signals are standardized to target dimension 1.
y7 <- Y[,2,drop=FALSE]
yN7 <- backtransformPrincipalCurve(y7, fit=fit, dimensions=3,
targetDimension=1)
stopifnot(ncol(yN7) == 1)
rho <- cor(yN7, yN6)
print(rho)
stopifnot(rho > 0.999)
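# Added illustration (hedged): after backtransformation toward a common
# scale, all columns estimate the same underlying signal and should
# therefore be strongly correlated with each other.
print(cor(YN1, use="complete.obs"))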
aroma.light/tests/fitXYCurve.matrix.R 0000644 0001750 0001750 00000001235 14136047216 017456 0 ustar nilesh nilesh library("aroma.light")
# Simulate data from the model y <- a + bx + x^c + eps(bx)
x <- rexp(1000)
a <- c(2,15)
b <- c(2,1)
c <- c(1,2)
bx <- outer(b,x)
xc <- t(sapply(c, FUN=function(c) x^c))
eps <- apply(bx, MARGIN=2, FUN=function(x) rnorm(length(x), mean=0, sd=0.1*x))
Y <- a + bx + xc + eps
Y <- t(Y)
lim <- c(0,70)
plot(Y, xlim=lim, ylim=lim)
# Fit principal curve through a subset of (y_1, y_2)
subset <- sample(nrow(Y), size=0.3*nrow(Y))
fit <- fitXYCurve(Y[subset,], bandwidth=0.2)
lines(fit, col="red", lwd=2)
# Backtransform (y_1, y_2) keeping y_1 unchanged
YN <- backtransformXYCurve(Y, fit=fit)
points(YN, col="blue")
abline(a=0, b=1, col="red", lwd=2)
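# Added illustration (hedged): after backtransformation the two channels
# should be on approximately the same scale, so their differences should
# be centered near zero.
print(summary(YN[,2] - YN[,1]))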
aroma.light/tests/normalizeCurveFit.matrix.R 0000644 0001750 0001750 00000007123 14136047216 021060 0 ustar nilesh nilesh library("aroma.light")
pathname <- system.file("data-ex", "PMT-RGData.dat", package="aroma.light")
rg <- read.table(pathname, header=TRUE, sep="\t")
nbrOfScans <- max(rg$slide)
rg <- as.list(rg)
for (field in c("R", "G"))
rg[[field]] <- matrix(as.double(rg[[field]]), ncol=nbrOfScans)
rg$slide <- rg$spot <- NULL
rg <- as.matrix(as.data.frame(rg))
colnames(rg) <- rep(c("R", "G"), each=nbrOfScans)
layout(matrix(c(1,2,0,3,4,0,5,6,7), ncol=3, byrow=TRUE))
rgC <- rg
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
channelColor <- switch(channel, R="red", G="green")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The raw data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
plotMvsAPairs(rg[,sidx])
title(main=paste("Observed", channel))
box(col=channelColor)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The calibrated data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgC[,sidx] <- calibrateMultiscan(rg[,sidx], average=NULL)
plotMvsAPairs(rgC[,sidx])
title(main=paste("Calibrated", channel))
box(col=channelColor)
} # for (channel ...)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The average calibrated data
#
# Note how the red signals are weaker than the green. The reason
# for this can be that the scale factor in the green channel is
# greater than in the red channel, but it can also be that there
# is a remaining relative difference in bias between the green
# and the red channel, a bias that precedes the scanning.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rgCA <- rg
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
rgCA[,sidx] <- calibrateMultiscan(rg[,sidx])
}
rgCAavg <- matrix(NA_real_, nrow=nrow(rgCA), ncol=2)
colnames(rgCAavg) <- c("R", "G")
for (channel in c("R", "G")) {
sidx <- which(colnames(rg) == channel)
rgCAavg[,channel] <- apply(rgCA[,sidx], MARGIN=1, FUN=median, na.rm=TRUE)
}
# Add some "fake" outliers
outliers <- 1:600
rgCAavg[outliers,"G"] <- 50000
plotMvsA(rgCAavg)
title(main="Average calibrated (AC)")
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Normalize data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Weight-down outliers when normalizing
weights <- rep(1, nrow(rgCAavg))
weights[outliers] <- 0.001
# Affine normalization of channels
rgCANa <- normalizeAffine(rgCAavg, weights=weights)
# It is always OK to rescale the affine-normalized data if it is
# done on (R,G); not on (A,M)! However, this is only needed for
# esthetic purposes.
rgCANa <- rgCANa *2^1.4
plotMvsA(rgCANa)
title(main="Normalized AC")
# Curve-fit (lowess) normalization
rgCANlw <- normalizeLowess(rgCAavg, weights=weights)
plotMvsA(rgCANlw, col="orange", add=TRUE)
# Curve-fit (loess) normalization
rgCANl <- normalizeLoess(rgCAavg, weights=weights)
plotMvsA(rgCANl, col="red", add=TRUE)
# Curve-fit (robust spline) normalization
rgCANrs <- normalizeRobustSpline(rgCAavg, weights=weights)
plotMvsA(rgCANrs, col="blue", add=TRUE)
legend(x=0,y=16, legend=c("affine", "lowess", "loess", "r. spline"), pch=19,
col=c("black", "orange", "red", "blue"), ncol=2, x.intersp=0.3, bty="n")
plotMvsMPairs(cbind(rgCANa, rgCANlw), col="orange", xlab=expression(M[affine]))
title(main="Normalized AC")
plotMvsMPairs(cbind(rgCANa, rgCANl), col="red", add=TRUE)
plotMvsMPairs(cbind(rgCANa, rgCANrs), col="blue", add=TRUE)
abline(a=0, b=1, lty=2)
legend(x=-6,y=6, legend=c("lowess", "loess", "r. spline"), pch=19,
col=c("orange", "red", "blue"), ncol=2, x.intersp=0.3, bty="n")
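# Added check (hedged; illustrative addition): excluding the artificial
# outliers and non-positive signals, the median log-ratio after affine
# normalization should be close to zero.
ok <- (rgCANa[,1] > 0 & rgCANa[,2] > 0)
ok[outliers] <- FALSE
print(median(log2(rgCANa[ok,1]/rgCANa[ok,2]), na.rm=TRUE))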
aroma.light/tests/backtransformAffine.matrix.R 0000644 0001750 0001750 00000002244 14136047216 021354 0 ustar nilesh nilesh library("aroma.light")
X <- matrix(1:8, nrow=4, ncol=2)
X[2,2] <- NA_integer_
print(X)
# Returns a 4x2 matrix
print(backtransformAffine(X, a=c(1,5)))
# Returns a 4x2 matrix
print(backtransformAffine(X, b=c(1,1/2)))
# Returns a 4x2 matrix
print(backtransformAffine(X, a=matrix(1:4,ncol=1)))
# Returns a 4x2 matrix
print(backtransformAffine(X, a=matrix(1:3,ncol=1)))
# Returns a 4x2 matrix
print(backtransformAffine(X, a=matrix(1:2,ncol=1), b=c(1,2)))
# Returns a 4x1 matrix
print(backtransformAffine(X, b=c(1,1/2), project=TRUE))
# If the columns of X are identical, and an identity
# backtransformation is applied and projected, the
# same matrix is returned.
X <- matrix(1:4, nrow=4, ncol=3)
Y <- backtransformAffine(X, b=c(1,1,1), project=TRUE)
print(X)
print(Y)
stopifnot(sum(X[,1]-Y) <= .Machine$double.eps)
# If the columns of X are proportional to each other, and the
# corresponding scale backtransformation is applied and projected,
# the first column is recovered.
X <- matrix(1:4, nrow=4, ncol=3)
X[,2] <- X[,2]*2; X[,3] <- X[,3]*3
print(X)
Y <- backtransformAffine(X, b=c(1,2,3))
print(Y)
Y <- backtransformAffine(X, b=c(1,2,3), project=TRUE)
print(Y)
stopifnot(sum(X[,1]-Y) <= .Machine$double.eps)
aroma.light/tests/sampleTuples.R 0000644 0001750 0001750 00000000414 14136047216 016557 0 ustar nilesh nilesh library("aroma.light")
pairs <- sampleTuples(1:10, size=5, length=2)
print(pairs)
triples <- sampleTuples(1:10, size=5, length=3)
print(triples)
# Allow tuples with repeated elements
quadruples <- sampleTuples(1:3, size=5, length=4, replace=TRUE)
print(quadruples)
aroma.light/tests/normalizeQuantileRank.matrix.R 0000644 0001750 0001750 00000001015 14136047216 021721 0 ustar nilesh nilesh library("aroma.light")
# Simulate three samples with on average 20% missing values
N <- 10000
X <- cbind(rnorm(N, mean=3, sd=1),
rnorm(N, mean=4, sd=2),
rgamma(N, shape=2, rate=1))
X[sample(3*N, size=0.20*3*N)] <- NA_real_
# Normalize quantiles
Xn <- normalizeQuantile(X)
# Plot the data
layout(matrix(1:2, ncol=1))
xlim <- range(X, Xn, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The three original distributions")
plotDensity(Xn, lwd=2, xlim=xlim, main="The three normalized distributions")
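# Added illustration (hedged): after quantile normalization the empirical
# quantiles of the columns should be (nearly) identical.
print(apply(Xn, MARGIN=2, FUN=quantile, probs=c(0.25, 0.5, 0.75), na.rm=TRUE))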
aroma.light/tests/iwpca.matrix.R 0000644 0001750 0001750 00000004106 14136047216 016511 0 ustar nilesh nilesh library("aroma.light")
for (zzz in 0) {
# This example requires plot3d() in R.basic [http://www.braju.com/R/]
if (!require(pkgName <- "R.basic", character.only=TRUE)) break
# Simulate data from the model y <- a + bx + eps(bx)
x <- rexp(1000)
a <- c(2,15,3)
b <- c(2,3,4)
bx <- outer(b,x)
eps <- apply(bx, MARGIN=2, FUN=function(x) rnorm(length(x), mean=0, sd=0.1*x))
y <- a + bx + eps
y <- t(y)
# Add some outliers by permuting the dimensions for 1/10 of the observations
idx <- sample(1:nrow(y), size=1/10*nrow(y))
y[idx,] <- y[idx,c(2,3,1)]
# Plot the data with fitted lines at four different view points
opar <- par(mar=c(1,1,1,1)+0.1)
N <- 4
layout(matrix(1:N, nrow=2, byrow=TRUE))
theta <- seq(0,270,length.out=N)
phi <- rep(20, length.out=N)
xlim <- ylim <- zlim <- c(0,45)
persp <- list()
for (kk in seq_along(theta)) {
# Plot the data
persp[[kk]] <- plot3d(y, theta=theta[kk], phi=phi[kk], xlim=xlim, ylim=ylim, zlim=zlim)
}
# Weights on the observations
# Example a: Equal weights
w <- NULL
# Example b: Down-weight the outliers (this overrides Example a)
w <- rep(1, length(x)); w[idx] <- 0.8
# ...and show all iterations too with different colors.
maxIter <- c(seq(1,20,length.out=10),Inf)
col <- topo.colors(length(maxIter))
# Show the fitted value for every iteration
for (ii in seq_along(maxIter)) {
# Fit a line using IWPCA through data
fit <- iwpca(y, w=w, maxIter=maxIter[ii], swapDirections=TRUE)
ymid <- fit$xMean
d0 <- apply(y, MARGIN=2, FUN=min) - ymid
d1 <- apply(y, MARGIN=2, FUN=max) - ymid
b <- fit$vt[1,]
y0 <- -b * max(abs(d0))
y1 <- b * max(abs(d1))
yline <- matrix(c(y0,y1), nrow=length(b), ncol=2)
yline <- yline + ymid
for (kk in seq_along(theta)) {
# Set pane to draw in
par(mfg=c((kk-1) %/% 2, (kk-1) %% 2) + 1)
# Set the viewpoint of the pane
options(persp.matrix=persp[[kk]])
# Get the first principal component
points3d(t(ymid), col=col[ii])
lines3d(t(yline), col=col[ii])
# Highlight the last one
if (ii == length(maxIter))
lines3d(t(yline), col="red", lwd=3)
}
}
par(opar)
} # for (zzz in 0)
rm(zzz)
aroma.light/tests/normalizeAverage.list.R 0000644 0001750 0001750 00000001475 14136047216 020356 0 ustar nilesh nilesh library("aroma.light")
# Simulate ten samples of different lengths
N <- 10000
X <- list()
for (kk in 1:8) {
rfcn <- list(rnorm, rgamma)[[sample(2, size=1)]]
size <- runif(1, min=0.3, max=1)
a <- rgamma(1, shape=20, rate=10)
b <- rgamma(1, shape=10, rate=10)
values <- rfcn(size*N, a, b)
# "Censor" values
values[values < 0 | values > 8] <- NA_real_
X[[kk]] <- values
}
# Add 20% missing values
X <- lapply(X, FUN=function(x) {
x[sample(length(x), size=0.20*length(x))] <- NA_real_
x
})
# Normalize quantiles
Xn <- normalizeAverage(X, na.rm=TRUE, targetAvg=median(unlist(X), na.rm=TRUE))
# Plot the data
layout(matrix(1:2, ncol=1))
xlim <- range(X, Xn, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The original distributions")
plotDensity(Xn, lwd=2, xlim=xlim, main="The normalized distributions")
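# Added illustration (hedged): each normalized sample should have its
# average (by default the median, we assume) rescaled to the common
# target average used above.
print(median(unlist(X), na.rm=TRUE))        # the target average
print(sapply(Xn, FUN=median, na.rm=TRUE))   # per-sample medians after normalization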
aroma.light/tests/medianPolish.matrix.R 0000644 0001750 0001750 00000001027 14136047216 020021 0 ustar nilesh nilesh library("aroma.light")
# Deaths from sport parachuting; from ABC of EDA, p.224:
deaths <- matrix(c(14,15,14, 7,4,7, 8,2,10, 15,9,10, 0,2,0), ncol=3, byrow=TRUE)
rownames(deaths) <- c("1-24", "25-74", "75-199", "200++", "NA")
colnames(deaths) <- 1973:1975
print(deaths)
mp <- medianPolish(deaths)
mp1 <- medpolish(deaths, trace=FALSE)
print(mp)
ff <- c("overall", "row", "col", "residuals")
stopifnot(all.equal(mp[ff], mp1[ff]))
# Validate decomposition:
stopifnot(all.equal(deaths, mp$overall+outer(mp$row,mp$col,"+")+mp$resid))
aroma.light/.Rinstignore 0000644 0001750 0001750 00000000415 14136047216 015121 0 ustar nilesh nilesh doc/Makefile$
# Certain LaTeX files (e.g. bib, bst, sty) must be part of the build
# such that they are available for R CMD check. These are excluded
# from the install using .Rinstignore in the top-level directory
# such as this one.
doc/.*[.](bib|bst|sty)$
aroma.light/R/ 0000755 0001750 0001750 00000000000 14136047216 013016 5 ustar nilesh nilesh aroma.light/R/normalizeQuantileSpline.R 0000644 0001750 0001750 00000017342 14136047216 020026 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric normalizeQuantileSpline
# @alias normalizeQuantileSpline.numeric
# @alias normalizeQuantileSpline.matrix
# @alias normalizeQuantileSpline.list
#
# @title "Normalizes the empirical distribution of one or more samples to a target distribution"
#
# \usage{
# @usage normalizeQuantileSpline,numeric
# @usage normalizeQuantileSpline,matrix
# @usage normalizeQuantileSpline,list
# }
#
# \description{
# @get "title".
# After normalization, all samples have the same average empirical
# density distribution.
# }
#
# \arguments{
# \item{x, X}{A single (\eqn{K=1}) @numeric @vector of length \eqn{N},
# a @numeric \eqn{NxK} @matrix, or a @list of length \eqn{K} with
# @numeric @vectors, where \eqn{K} represents the number of samples
# and \eqn{N} the number of data points.}
# \item{w}{An optional @numeric @vector of length \eqn{N} of weights
# specific to each data point.}
# \item{xTarget}{The target empirical distribution as a \emph{sorted}
# @numeric @vector of length \eqn{M}.
# If @NULL and \code{X} is a @list, then the target distribution is
# calculated as the average empirical distribution of the samples.}
# \item{sortTarget}{If @TRUE, argument \code{xTarget} will be sorted,
# otherwise it is assumed to be already sorted.}
# \item{robust}{If @TRUE, the normalization function is
# estimated robustly.}
# \item{...}{Arguments passed to (@see "stats::smooth.spline"
# or @see "aroma.light::robustSmoothSpline").}
# }
#
# \value{
# Returns an object of the same type and dimensions as the input.
# }
#
# \section{Missing values}{
# Both arguments \code{X} and \code{xTarget} may contain non-finite values.
# These values do not affect the estimation of the normalization function.
# Missing values and other non-finite values in \code{X}
# remain in the output as is. No new missing values are introduced.
# }
#
# @examples "../incl/normalizeQuantileSpline.matrix.Rex"
#
# @author "HB"
#
# \seealso{
# The target distribution can be calculated as the average
# using @see "averageQuantile".
#
# Internally either
# @see "aroma.light::robustSmoothSpline" (\code{robust=TRUE}) or
# @see "stats::smooth.spline" (\code{robust=FALSE}) is used.
#
# An alternative normalization method that is also normalizing the
# empirical densities of samples is @see "normalizeQuantileRank".
# Contrary to this method, that method requires that all samples are
# based on the exact same set of data points and it is also more likely
# to over-correct in the tails of the distributions.
# }
#
# \references{
# [1] @include "../incl/BengtssonH_etal_2008.bib.Rdoc" \cr
# }
#
# @keyword "nonparametric"
# @keyword "multivariate"
# @keyword "robust"
#*/###########################################################################
setMethodS3("normalizeQuantileSpline", "list", function(X, w=NULL, xTarget=NULL, sortTarget=TRUE, robust=TRUE, ...) {
# Argument 'xTarget':
if (is.null(xTarget)) {
# Get the target quantile for all channels?
xTarget <- averageQuantile(X);
sortTarget <- FALSE;
} else if (!is.numeric(xTarget)) {
throw("Argument 'xTarget' is not numeric: ", mode(xTarget));
}
# Sort target distribution?
if (sortTarget) {
xTarget <- sort(xTarget, na.last=TRUE);
}
# Normalizes the data
nTarget <- length(xTarget);
X <- lapply(X, FUN=function(x) {
normalizeQuantileSpline(x, w=w, xTarget=xTarget, sortTarget=FALSE,
robust=robust, ...);
})
X;
})
setMethodS3("normalizeQuantileSpline", "matrix", function(X, w=NULL, xTarget=NULL, sortTarget=TRUE, robust=TRUE, ...) {
# Argument 'xTarget':
if (is.null(xTarget)) {
# Get the target quantile for all channels?
xTarget <- averageQuantile(X);
sortTarget <- FALSE;
} else if (!is.numeric(xTarget)) {
throw("Argument 'xTarget' is not numeric: ", mode(xTarget));
}
if (length(xTarget) != nrow(X)) {
throw("Argument 'xTarget' is of different length than the number of rows in 'X': ", length(xTarget) , " != ", nrow(X));
}
# Sort target distribution?
if (sortTarget) {
xTarget <- sort(xTarget, na.last=TRUE);
}
# Normalize each of the columns towards the target distribution
for (cc in seq_len(ncol(X))) {
X[,cc] <- normalizeQuantileSpline(X[,cc], w=w, xTarget=xTarget,
sortTarget=FALSE, robust=robust, ...);
}
X;
}) # normalizeQuantileSpline.matrix()
setMethodS3("normalizeQuantileSpline", "numeric", function(x, w=NULL, xTarget, sortTarget=TRUE, robust=TRUE, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
n <- length(x);
# Argument 'w':
if (!is.null(w)) {
if (!is.numeric(w)) {
throw("Argument 'w' is not numeric: ", mode(w));
}
if (length(w) != n) {
throw("Argument 'w' is of different length than 'x': ",
length(w), " != ", n);
}
}
# Argument 'xTarget':
if (!is.numeric(xTarget)) {
throw("Argument 'xTarget' is not numeric: ", mode(xTarget));
}
if (length(xTarget) != n) {
throw("Argument 'xTarget' is of different length than 'x': ",
length(xTarget), " != ", n);
}
# Sort target distribution?
if (sortTarget) {
xTarget <- sort(xTarget, na.last=TRUE);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# A) Fit normalization function
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Sort signals (w/ weights) to be normalized
o <- order(x, na.last=TRUE);
xx <- x[o];
if (!is.null(w))
w <- w[o];
# Not needed anymore
o <- NULL;
# Keep only finite values
ok <- (is.finite(xx) & is.finite(xTarget));
# Exclude data points with zero weight
if (!is.null(w))
ok <- (ok & w > 0);
xx <- xx[ok];
if (!is.null(w))
w <- w[ok];
xTarget <- xTarget[ok];
# Not needed anymore
ok <- NULL;
if (robust) {
# robustSmoothSpline() does not return 'data'.
fit <- robustSmoothSpline(x=xx, w=w, y=xTarget, ...);
} else {
# smooth.spline() returns 'data' by default.
fit <- smooth.spline(x=xx, w=w, y=xTarget, keep.data=FALSE, ...);
}
# Not needed anymore
xx <- xTarget <- NULL;
# Not needed below
fit[c("x", "y", "w", "yin", "call")] <- NULL; # Saves < 1MB, though.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# B) Normalize the data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ok <- is.finite(x);
x[ok] <- predict(fit, x=x[ok])$y;
x;
}) # normalizeQuantileSpline.numeric()
##############################################################################
# HISTORY:
# 2013-10-08
# o SPEEDUP: Now normalizeQuantileSpline(..., sortTarget=TRUE) sorts the
# target only once for lists of vectors just as done for matrices.
# 2013-10-07
# o normalizeQuantileSpline() for list:s gained explicit argument 'w',
# 'sortTarget' and 'robust'.
# 2013-05-25
# o SPEEDUP: Removed all gc() calls.
# 2008-04-14
# o Added normalizeQuantileSpline() for list:s, which works analogously as
# normalizeQuantileRank() for list:s.
# 2007-03-28
# o BUG FIX: Weights 'w' are now correctly ordered.
# o BUG FIX: Due to an incorrect if(), TRUE & FALSE was swapped for 'robust'.
# o Memory optimization; now the fitting is not keeping the data.
# o Renamed argument 'sort' to 'sortTarget'.
# 2007-03-22
# o Updated the Rdocs slightly.
# 2007-02-05
# o Now normalizeQuantileSpline() handles NAs too.
# 2007-02-04
# o Created from normalizeQuantile.numeric.R.
##############################################################################
aroma.light/R/rowAverages.matrix.R 0000644 0001750 0001750 00000003174 14136047216 016736 0 ustar nilesh nilesh setMethodS3("rowAverages", "matrix", function(X, average=base::mean, deviance=stats::sd, df=function(x, ...) length(if(na.rm) na.omit(x) else x), na.rm=TRUE, ..., asAttributes=TRUE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'X'
if (!is.matrix(X))
stop(paste("Argument 'X' is not a matrix: ", class(X)[1]));
# Argument: '...', 'average', and 'deviance'.
args <- list(...);
args[["average"]] <- average;
args[["deviance"]] <- deviance;
args[["df"]] <- df;
for (kk in seq_along(args)) {
key <- names(args)[kk];
arg <- args[[kk]];
if (!is.null(arg) && !is.function(arg))
stop(paste("Argument '", key, "' must be a function: ", mode(arg)));
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Calculate the average and the deviance
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
stats <- list();
for (kk in seq_along(args)) {
key <- names(args)[kk];
arg <- args[[kk]];
stats[[key]] <- as.matrix(apply(X, MARGIN=1, FUN=arg, na.rm=na.rm));
}
if (asAttributes) {
attrs <- attributes(X);
X <- stats[["average"]];
stats[["average"]] <- NULL;
mostattributes(X) <- c(stats, attrs);
X;
} else {
stats;
}
}) # rowAverages()
############################################################################
# HISTORY:
# o 2004-05-17
# Recreated. Made into its own function. This is need by *many* methods.
############################################################################
aroma.light/R/normalizeDifferencesToAverage.R 0000644 0001750 0001750 00000004710 14136047216 021077 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric normalizeDifferencesToAverage
# @alias normalizeDifferencesToAverage.list
#
# @title "Rescales channel vectors to get the same average"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage normalizeDifferencesToAverage,list
# }
#
# \arguments{
# \item{x}{A @numeric @list of length K.}
# \item{baseline}{An @integer in [1,K] specifying which channel should be
# the baseline. The baseline channel will be almost unchanged.
# If @NULL, the channels will be shifted towards the median of them all.}
# \item{FUN}{A @function for calculating the average of one channel.}
# \item{...}{Additional arguments passed to the \code{FUN} @function.}
# }
#
# \value{
# Returns a normalized @list of length K.
# }
#
# @examples "../incl/normalizeDifferencesToAverage.Rex"
#
# @author "HB"
#*/###########################################################################
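## Added usage sketch (hedged; comments only -- the official example file is
## referenced in the Rdoc block above):
##   x <- list(a=rnorm(1000, mean=0), b=rnorm(1000, mean=2), c=rnorm(1000, mean=5))
##   xN <- normalizeDifferencesToAverage(x, baseline=1)
##   sapply(xN, FUN=median)  # all medians should now be close to median(x$a)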
setMethodS3("normalizeDifferencesToAverage", "list", function(x, baseline=1, FUN=median, ...) {
# Argument 'x':
if (!is.list(x)) {
throw("Argument 'x' is not a list: ", class(x)[1]);
}
# Number dimensions
ndim <- length(x);
# Argument 'baseline':
if (!is.null(baseline)) {
baseline <- as.integer(baseline);
if (baseline < 1 || baseline > ndim) {
throw(sprintf("Argument 'baseline' is out of range [1,%d]: %d",
ndim, baseline));
}
}
# Calculate the overall average level for each dimension
mus <- sapply(x, FUN=function(y) {
y <- y[is.finite(y)];
FUN(y);
});
# Estimate the overall target level
if (!is.null(baseline)) {
targetMu <- mus[baseline];
} else {
targetMu <- median(mus, na.rm=TRUE);
}
# Calculate the required overall shifts per dimension
deltas <- mus - targetMu;
# Shift all dimensions so that all have the same overall average
xN <- mapply(x, as.list(deltas), FUN=function(y, dy) {
y <- y - dy;
list(y);
});
# Return estimated parameters
fit <- list(mus=mus, baseline=baseline, targetMu=targetMu, deltas=deltas);
attr(xN, "fit") <- fit;
xN;
})
############################################################################
# HISTORY:
# 2010-04-04
# o Made the code independent of R.utils::Arguments.
# 2009-09-30
# o Created from the source of an aroma.tcga vignette from May 2009.
############################################################################
aroma.light/R/plotMvsAPairs.R 0000644 0001750 0001750 00000005001 14136047216 015701 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric plotMvsAPairs
# @alias plotMvsAPairs.matrix
#
# @title "Plot log-ratios/log-intensities for all unique pairs of data vectors"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage plotMvsAPairs,matrix
# }
#
# \arguments{
# \item{X}{NxK @matrix where N is the number of observations and
# K is the number of channels.}
# \item{Alab,Mlab}{Labels on the x and y axes.}
# \item{Alim,Mlim}{Plot range on the A and M axes.}
# \item{pch}{Plot symbol used.}
# \item{...}{Additional arguments accepted by @see "graphics::points".}
# \item{add}{If @TRUE, data points are plotted in the current plot,
# otherwise a new plot is created.}
# }
#
# \details{
# Log-ratios and log-intensities are calculated for each unique pair
# of channels (columns) and plotted. Thus, in total K*(K-1)/2
# data sets are plotted.
#
# The colors used for the plotted pairs are 1, 2, and so on. To change
# the colors, use a different color palette.
# }
#
# \value{
# Returns nothing.
# }
#
# @author "HB"
#*/#########################################################################
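## Added usage sketch (hedged; comments only):
##   X <- matrix(2^rnorm(3000, mean=10, sd=1), ncol=3)
##   plotMvsAPairs(X)  # M vs A for each of the choose(3,2) = 3 channel pairs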
setMethodS3("plotMvsAPairs", "matrix", function(X, Alab="A", Mlab="M", Alim=c(0,16), Mlim=c(-1,1)*diff(Alim), pch=".", ..., add=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'X'
if (ncol(X) < 2)
throw("Argument 'X' must have at least two columns: ", ncol(X));
if (!add) {
plot(NA, xlab=Alab, ylab=Mlab, xlim=Alim, ylim=Mlim);
}
nbrOfChannels <- ncol(X);
# Do not plot (or generate false M vs A values for) non-positive signals.
X[X <= 0] <- NA;
col <- 1;
for (ii in 1:(nbrOfChannels-1)) {
Xii <- as.double(X[,ii]);
for (jj in (ii+1):nbrOfChannels) {
Xjj <- as.double(X[,jj]);
M <- log(Xii/Xjj, base=2);
A <- log(Xii*Xjj, base=2)/2;
points(A,M, pch=pch, col=col, ...);
col <- col + 1;
}
}
}) # plotMvsAPairs()
############################################################################
# HISTORY:
# 2005-09-06
# o Coercing to doubles to avoid overflow when multiplying to integers.
# o Now non-positive signals are excluded.
# 2005-06-03
# o Now using arguments 'Alab' and 'Mlab' instead of 'xlab' and 'ylab'.
# Same for 'Alim' and 'Mlim'.
# 2005-04-07
# o Created Rdoc comments.
#############################################################################
aroma.light/R/fitIWPCA.R 0000644 0001750 0001750 00000035502 14136047216 014514 0 ustar nilesh nilesh ########################################################################/**
# @RdocGeneric fitIWPCA
# @alias fitIWPCA.matrix
#
# @title "Robust fit of linear subspace through multidimensional data"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage fitIWPCA,matrix
# }
#
# \arguments{
# \item{X}{NxK @matrix where N is the number of observations and
# K is the number of dimensions (channels).
# }
#
# \item{constraint}{A @character string or a @numeric value.
# If @character it specifies which additional constraint to be used
# to specify the offset parameters along the fitted line;
#
# If \code{"diagonal"}, the offset vector will be a point on the line
# that is closest to the diagonal line (1,...,1).
# With this constraint, all bias parameters are identifiable.
#
# If \code{"baseline"} (requires argument \code{baselineChannel}), the
# estimates are such that of the bias and scale parameters of the
# baseline channel is 0 and 1, respectively.
# With this constraint, all bias parameters are identifiable.
#
# If \code{"max"}, the offset vector will the point on the line that is
# as "great" as possible, but still such that each of its components is
# less than the corresponding minimal signal. This will guarantee that
# no negative signals are created in the backward transformation.
# If @numeric value, the offset vector will be the point on the line
# such that after applying the backward transformation there are
# approximately \code{constraint*N} non-positive signals in each
# channel. Note that \code{constraint==0} corresponds
# approximately to \code{constraint=="max"}.
# With the latter two constraints, the bias parameters are only
# identifiable modulo the fitted line.
# }
#
# \item{baselineChannel}{Index of channel toward which all other
# channels are conform.
# This argument is required if \code{constraint=="baseline"}.
# This argument is optional if \code{constraint=="diagonal"} and
# then the scale factor of the baseline channel will be one. The
# estimate of the bias parameters is not affected in this case.
# Defaults to one, if missing.
# }
#
# \item{...}{Additional arguments accepted by @see "iwpca".
# For instance, a N @vector of weights for each observation may be
# given, otherwise they get the same weight.
# }
#
# \item{aShift, Xmin}{For internal use only.}
# }
#
# \value{
# Returns a @list that contains estimated parameters and algorithm
# details;
#
# \item{a}{A @double @vector \eqn{(a[1],...,a[K])} with offset
# parameter estimates.
# It is made identifiable according to argument \code{constraint}.
# }
# \item{b}{A @double @vector \eqn{(b[1],...,b[K])} with scale
# parameter estimates. It is made identifiable by constraining
# \code{b[baselineChannel] == 1}.
# These estimates are independent of argument \code{constraint}.
# }
# \item{adiag}{If identifiability constraint \code{"diagonal"},
# a @double @vector \eqn{(adiag[1],...,adiag[K])}, where
# \eqn{adiag[1] = adiag[2] = ... adiag[K]}, specifying the point
# on the diagonal line that is closest to the fitted line,
# otherwise the zero vector.
# }
# \item{eigen}{A KxK @matrix with columns of eigenvectors.
# }
# \item{converged}{@TRUE if the algorithm converged, otherwise @FALSE.
# }
# \item{nbrOfIterations}{The number of iterations for the algorithm
# to converge, or zero if it did not converge.
# }
#
# \item{t0}{Internal parameter estimates, which contains no more
# information than the above listed elements.
# }
# \item{t}{Always @NULL.}
# }
#
# \details{
# This method uses re-weighted principal component analysis (IWPCA)
# to fit a the model \eqn{y_n = a + bx_n + eps_n} where \eqn{y_n},
# \eqn{a}, \eqn{b}, and \eqn{eps_n} are vector of the K and \eqn{x_n}
# is a scalar.
#
# The algorithm is:
# For iteration i:
# 1) Fit a line \eqn{L} through the data cloud using weighted PCA
# with weights \eqn{\{w_n\}}. Let
# \eqn{r_n = \{r_{n,1},...,r_{n,K}\}}
# be the \eqn{K} principal components.
# 2) Update the weights as
# \eqn{w_n <- 1 / \sum_{k=2}^{K} (r_{n,k} + \epsilon_r)}
# where we have used the residuals of all but the first principal
# component.
# 3) Find the point a on \eqn{L} that is closest to the
# line \eqn{D=(1,1,...,1)}. Similarly, denote the point on D that is
# closest to \eqn{L} by \eqn{t=a*(1,1,...,1)}.
# }
#
# @author
#
# %examples "fitMultiIWPCA.matrix.Rex"
#
# \seealso{
# This is an internal method used by the @see "calibrateMultiscan"
# and @see "normalizeAffine" methods.
# Internally the function @see "iwpca" is used to fit a line
# through the data cloud and the function @see "distanceBetweenLines" to
# find the closest point to the diagonal (1,1,...,1).
# }
#
# @keyword "algebra"
#*/########################################################################
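## Added usage sketch (hedged; comments only -- fitIWPCA() is mainly used
## internally by calibrateMultiscan() and normalizeAffine()):
##   x <- rexp(1000)
##   X <- cbind(2 + 3*x, 4 + 5*x) + rnorm(2000, sd=0.1)
##   fit <- fitIWPCA(X, constraint="diagonal")
##   fit$a; fit$b  # offset and scale estimates (b[1] == 1 by construction)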
setMethodS3("fitIWPCA", "matrix", function(X, constraint=c("diagonal", "baseline", "max"), baselineChannel=NULL, ..., aShift=rep(0, times=ncol(X)), Xmin=NULL) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 0. Define local functions
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
statistic <- function(X, ..., constraint="diagonal", Xmin=NULL,
baselineChannel=1, aShift=rep(0, times=ncol(X))) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit an K-dimensional line through the data using iterative
# re-weighted PCA.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- iwpca(X, ...);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Get the fitted line L
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Get the center of the fitted line...
ax <- fit$xMean;
names(ax) <- NULL;
# ...and the fitted eigenvectors (u1,u2,...,uK)
# with ui*uj = 0; i!=j and ui*ui = 1.
U <- t(fit$vt);
colnames(U) <- rownames(U) <- NULL;
# The fitted scale parameters b=(b[1],b[2],...,b[K]) where
# the elements are rescaled such that b[1] == 1.
# [ min(b[i]) == 1. Before it was such that b[1] == 1, but this
# is probably better. ]
# [ Indeed not; this is not good if one do more than one estimate per
# array, e.g. printtip etc. /HB 2004-04-26 ]
# [ With the introduction of 'baselineChannel' it is possible to
# specify which channel should get scale one. / HB 2004-06-30 ]
U1 <- U[,1];
bx <- as.vector(U1/U1[baselineChannel]);
# Shift the data.
# [ This is for instance useful if fitting towards the diagonal line
# and resampling under H0: y_i = alpha + z_i and /HB 2004-01-02 ]
ax <- ax + aShift;
if (identical(constraint, "diagonal")) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Find the point t on the fitted line that is closest to the
# points s on the "diagonal" line (1,1,...,1) in K-space.
# [This works also for lines in two dimension.]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# x(s) is the fitted line (the first IWPCA component)
# y(t) is the diagonal line
ay <- rep(0, times=length(ax)); # (0,0,...,0)
by <- rep(1, times=length(ay)); # (1,1,...,1)
dbl <- distanceBetweenLines(ax=ax,bx=bx, ay=ay,by=by);
a <- as.vector(dbl$xs);
adiag <- as.vector(dbl$yt);
} else if (identical(constraint, "baseline")) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Find the point t on the fitted line for which the bias parameter of
# the baseline channel is zero, i.e. for which a[baselineChannel]==0.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# The scale parameters are already such that b[baselineChannel]==1.
# y[c,i] = a[c] + b[c]*x[c,i] ; c = 1,...,C
# y[b,i] = a[b] + x[b,i] ; b - baseline channel
# Similar to the constraint=="max" reasoning:
# For channel b, find t such that
# ax[b] + bx[b]*t == 0 <=> { bx[b]==1 } <=> t = -ax[b]
# => a[c] <- ax[c] + bx[c]*t <=> a[c] <- ax[c] - bx[c]*ax[b]
a <- ax - bx*ax[baselineChannel];
adiag <- rep(0, times=length(ax));
} else if (identical(constraint, "max")) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Find the "greatest" point t on the fitted line that is within
# the cube C whose upper limits are defined by the minimum value
# in each channel.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Find the minimal value of each X component.
if (is.null(Xmin))
Xmin <- colMins(X, na.rm=TRUE);
# For each component k, find the value t such that
# ax[k] + bx[k]*t[k] == Xmin[k] <=> t[k] == (Xmin[k] - ax[k])/bx[k]
t <- (Xmin-ax)/bx;
# Choose minimum t[k]
# Now, amax <<= Xmin if amax <- ax - bx[k]*min(t) where <<= (\prec)
# means componentswise less or equal than.
a <- ax + bx*min(t);
adiag <- rep(0, times=length(ax));
} else if (is.numeric(constraint)) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Find the "greatest" point t on the fitted line that is within
# the cube C whose upper limits are defined by the alpha quantile
# value in each channel.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Find the alpha quantile value of each X component.
if (is.null(Xmin)) {
Xmin <- colQuantiles(X, probs=constraint, na.rm=TRUE);
}
# For each component k, find the value t such that
# ax[k] + bx[k]*t[k] == Xmin[k] <=> t[k] == (Xmin[k] - ax[k])/bx[k]
t <- (Xmin-ax)/bx;
# Choose minimum t[k]
# Now, amax <<= Xmin if amax <- ax - bx[k]*min(t) where <<= (\prec)
# means componentswise less or equal than.
a <- ax + bx*min(t);
adiag <- rep(0, times=length(ax));
}
# Return the statistic
t <- c(a=a);
t <- c(t, b=bx);
t <- c(t, adiag=adiag);
t <- c(t, U=as.vector(U));
t <- c(t, niter=fit$nbrOfIterations * (fit$converged*2-1));
t;
} # statistic()
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'X'
if (!is.matrix(X))
stop("Argument 'X' must be a matrix:", mode(X));
N <- nrow(X);
K <- ncol(X);
if (K == 1) {
stop("Argument 'X' must have two or more columns:", K);
}
if (N < K) {
stop("Argument 'X' must have at least as many rows as columns:", N, "<", K);
}
# Argument: 'constraint'
if (is.numeric(constraint)) {
if (length(constraint) != 1)
stop("Argument 'constraint' can not be a numerical vector.");
if (constraint < 0 || constraint > 1)
stop("Invalid value of argument 'constraint':", constraint);
} else {
constraint <- match.arg(constraint);
if (identical(constraint, "baseline")) {
if (is.null(baselineChannel)) {
stop("Argument 'baselineChannel' must be given if 'constraint' is \"baseline\".");
}
}
}
# Argument: 'baselineChannel'
if (!is.null(baselineChannel)) {
if (!is.numeric(baselineChannel) || length(baselineChannel) != 1) {
stop("Argument 'baselineChannel' must be a single numeric: ", baselineChannel);
}
if (baselineChannel < 1 || baselineChannel > ncol(X)) {
stop("Argument 'baselineChannel' is out of range [1,", ncol(X),"]: ", baselineChannel);
}
if (!(constraint %in% c("baseline", "diagonal"))) {
stop("Argument 'baselineChannel' must not be specified if argument 'constraint' is \"baseline\" or \"diagonal\": ", constraint);
}
if (!is.null(Xmin)) {
stop("Argument 'Xmin' must not be specified if 'baselineChannel' is specified: ", paste(Xmin, collapse=", "));
}
} else {
baselineChannel <- 1;
}
# Argument: 'aShift'
if (is.null(aShift)) {
aShift <- rep(0, times=ncol(X));
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Prepare the data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (identical(constraint, "max")) {
Xmin <- colMins(X, na.rm=TRUE);
} else {
Xmin <- NULL;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 3. Fit the model
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Use only finite observations
isFinite <- apply(X, MARGIN=1L, FUN=function(r) all(is.finite(r)));
# Number of finite observations
N <- sum(isFinite);
# Validate the number of finite observations
if (N < K) {
stop("Argument 'X' must have at least as many non-finite observations (rows) as columns:", N, "<", K);
}
t0 <- statistic(X[isFinite,], constraint=constraint,
Xmin=Xmin, baselineChannel=baselineChannel, aShift=aShift, ...);
t <- NULL;
# Extract the parameter estimates from the internal estimation vector.
a <- t0[regexpr("^a[0-9]*$", names(t0)) != -1];
b <- t0[regexpr("^b[0-9]*$", names(t0)) != -1];
adiag <- t0[regexpr("^adiag[0-9]*$", names(t0)) != -1];
U <- t0[regexpr("^U[0-9]*$", names(t0)) != -1];
U <- matrix(U, nrow=sqrt(length(U)));
niter <- as.integer(abs(t0["niter"]));
converged <- (niter > 0);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 5. Return the parameter estimates
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
list(a=a, b=b, adiag=adiag, eigen=U,
converged=converged, nbrOfIterations=niter, t0=t0, t=t);
}) # fitIWPCA()
###########################################################################
# HISTORY:
# 2013-09-26
# o Now utilizing colMins() and colQuantiles() of 'matrixStats'.
# 2011-02-05
# o DOCUMENTATION: Fixed broken links to help for iwpca().
# 2006-01-22
# o Added Rdoc help on the returned parameters.
# o If missing, 'baselineChannel' is now set to one before calling the
# internal function.
# o Now fitIWPCA() does not return the data matrix. This is to save memory.
# The calling algorithm can equally well add the data if it is needed.
# 2004-06-30
# o Added argument 'baselineChannel' with 'constraint' "baseline". This
# is useful for instance when normalizing toward a common reference. In
# such cases, the common-reference channel is unaffected and only the
# other channel(s) is affinely transformed.
# 2004-06-28
# o BUG FIX: Forgot to exclude non-finite observations when fitting the
# iwpca(). This was done before, but somehow it disappeared while
# re-organizing the code.
###########################################################################
aroma.light/R/normalizeAffine.R 0000644 0001750 0001750 00000022521 14136047216 016254 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric normalizeAffine
# @alias normalizeAffine.matrix
#
# \encoding{latin1}
#
# @title "Weighted affine normalization between channels and arrays"
#
# \description{
# @get "title".
#
# This method will remove curvature in the M vs A plots that are
# due to an affine transformation of the data. In other words, if there
# are (small or large) biases in the different (red or green) channels,
# biases that can be equal too, you will get curvature in the M vs A plots
# and this type of curvature will be removed by this normalization method.
#
# Moreover, if you normalize all slides at once, this method will also
# bring the signals on the same scale such that the log-ratios for
# different slides are comparable. Thus, do not normalize the scale of
# the log-ratios between slides afterward.
#
# It is recommended to normalize as many slides as possible in one run.
# The result is that if creating log-ratios between any channels and any
# slides, they will contain as little curvature as possible.
#
# Furthermore, since the relative scale between any two channels on any
# two slides will be one if one normalizes all slides (and channels) at
# once it is possible to add or multiply with the \emph{same} constant
# to all channels/arrays without introducing curvature. Thus, it is
# easy to rescale the data afterwards as demonstrated in the example.
# }
#
# \usage{
# @usage normalizeAffine,matrix
# }
#
# \arguments{
# \item{X}{An NxK @matrix (K>=2) where the columns represent the channels,
# to be normalized.}
# \item{weights}{If @NULL, non-weighted normalization is done.
# If data-point weights are used, this should be a @vector of length
# N of data point weights used when estimating the normalization
# function.
# }
# \item{typeOfWeights}{A @character string specifying the type of
# weights given in argument \code{weights}.
# }
# \item{method}{A @character string specifying how the estimates are
# robustified. See @see "iwpca" for all accepted values.}
# \item{constraint}{Constraint making the bias parameters identifiable.
# See @see "fitIWPCA" for more details.}
# \item{satSignal}{Signals equal to or above this threshold will not
# be used in the fitting.}
# \item{...}{Other arguments passed to @see "fitIWPCA" and in
# turn @see "iwpca". For example, the weight argument
# of @see "iwpca". See also below.}
# \item{.fitOnly}{If @TRUE, the data will not be back-transformed.}
# }
#
# \value{
# A NxK @matrix of the normalized channels.
# The fitted model is returned as attribute \code{modelFit}.
# }
#
# \section{Negative, non-positive, and saturated values}{
# Affine normalization applies equally well to negative values. Thus,
# contrary to normalization methods applied to log-ratios, such as curve-fit
# normalization methods, affine normalization, will not set these to @NA.
#
# Data points that are saturated in one or more channels are not used
# to estimate the normalization function, but they are normalized.
# }
#
# \section{Missing values}{
# The estimation of the affine normalization function will only be made
# based on complete non-saturated observations, i.e. observations that
# contains no @NA values nor saturated values as defined by \code{satSignal}.
# }
#
# \section{Weighted normalization}{
# Each data point/observation, that is, each row in \code{X}, which is a
# vector of length K, can be assigned a weight in [0,1] specifying how much
# it should \emph{affect the fitting of the affine normalization function}.
# Weights are given by argument \code{weights},
# which should be a @numeric @vector of length N. Regardless of weights,
# all data points are \emph{normalized} based on the fitted normalization
# function.
# }
#
# \section{Robustness}{
# By default, the model fit of affine normalization is done in \eqn{L_1}
# (\code{method="L1"}). This way, outliers affect the parameter estimates
# less than ordinary least-square methods.
#
# For further robustness, downweight outliers such as saturated signals,
# if possible.
#
# We do not use Tukey's biweight function for reasons similar to those
# outlined in @see "calibrateMultiscan".
# }
#
# \section{Using known/previously estimated channel offsets}{
# If the channel offsets can be assumed to be known, then it is
# possible to fit the affine model with no (zero) offset, which
# formally is a linear (proportional) model, by specifying
# argument \code{center=FALSE}.
# In order to do this, the channel offsets have to be subtracted
# from the signals manually before normalizing, e.g.
# \code{Xa <- t(t(X)-a)} where \code{a} is a @vector of length
# \code{ncol(X)}. Then normalize by
# \code{Xn <- normalizeAffine(Xa, center=FALSE)}.
# You can assert that the model is fitted without offset by
# \code{stopifnot(all(attr(Xn, "modelFit")$adiag == 0))}.
# }
#
# \details{
# A line is fitted robustly through the \eqn{(y_R,y_G)} observations
# using an iterated re-weighted principal component analysis (IWPCA),
# which minimized the residuals that are orthogonal to the fitted line.
# Each observation is down-weighted by the inverse of the absolute
# residuals, i.e. the fit is done in \eqn{L_1}.
# }
#
# @author
#
# \references{
# [1] @include "../incl/BengtssonHossjer_2006.bib.Rdoc" \cr
# }
#
# @examples "../incl/normalizeCurveFit.matrix.Rex"
#
# \seealso{
# @see "calibrateMultiscan".
# }
#*/#########################################################################
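## Added sketch (hedged; comments only) of the known-offsets use case
## described in the section above ('X' and 'a' are illustrative placeholders):
##   a <- c(50, 100)                 # assumed-known channel offsets
##   Xa <- t(t(X) - a)               # subtract the offsets manually
##   Xn <- normalizeAffine(Xa, center=FALSE)
##   stopifnot(all(attr(Xn, "modelFit")$adiag == 0))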
setMethodS3("normalizeAffine", "matrix", function(X, weights=NULL, typeOfWeights=c("datapoint"), method="L1", constraint=0.05, satSignal=2^16-1, ..., .fitOnly=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'X'
if (ncol(X) < 2)
stop("Affine normalization requires at least two channels: ", ncol(X));
if (nrow(X) < 3)
stop("Affine normalization requires at least three observations: ", nrow(X));
# Argument: 'satSignal'
if (satSignal < 0)
stop("Argument 'satSignal' is negative: ", satSignal);
# Argument: 'typeOfWeights'
typeOfWeights <- match.arg(typeOfWeights);
# Argument: 'weights'
datapointWeights <- NULL;
if (!is.null(weights)) {
# If 'weights' is an object of a class with as.double(), cast it.
weights <- as.double(weights);
if (anyMissing(weights))
stop("Argument 'weights' must not contain NA values.");
if (any(weights < 0 | weights > 1)) {
stop("Argument 'weights' out of range [0,1]: ", paste(weights[weights < 0.0 | weights > 1.0], collapse=", "));
}
weights <- as.vector(weights);
if (length(weights) == 1) {
weights <- rep(weights, length.out=nrow(X));
} else if (length(weights) != nrow(X)) {
stop("Argument 'weights' does not have the same length as the number of data points (rows9 in the matrix: ", length(weights), " != ", nrow(X));
}
datapointWeights <- weights;
}
# Argument: 'method'
# Validate by fitIWPCA() -> iwpca()
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Prepare data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Use only non-saturated observations to estimate the normalization
# function (non-finite values are taken care of by fitIWPCA()).
isSaturated <- (is.finite(X) & X >= satSignal);
Xsat <- X[isSaturated];
X[isSaturated] <- NA;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 3. Fit the model
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- fitIWPCA(X, w=datapointWeights, method=method, constraint=constraint, ...);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 4. Backtransform
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
X[isSaturated] <- Xsat;
# Not needed anymore
isSaturated <- Xsat <- NULL;
if (.fitOnly == FALSE) {
X <- backtransformAffine(X, a=fit);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 5. Return the backtransformed data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
attr(X, "modelFit") <- fit;
X;
}) # normalizeAffine()
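# Editor's note: a minimal sketch, in the spirit of the incl/*.Rex example files,
# of the "known channel offsets" workflow described in the Rdoc section above.
# The simulated two-channel data and the offsets in 'a' are assumptions made for
# illustration only; the center=FALSE call and the 'adiag' check follow the
# documentation above.
set.seed(1)
a <- c(50, 100)                            # assumed known channel offsets
x <- rexp(1000, rate=1/3000)               # latent true signals
X <- cbind(R=a[1] + 1.2*x + rnorm(1000, sd=50),
           G=a[2] + 0.8*x + rnorm(1000, sd=50))
Xa <- t(t(X) - a)                          # subtract the known offsets manually
Xn <- normalizeAffine(Xa, center=FALSE)    # zero-offset (proportional) model
stopifnot(all(attr(Xn, "modelFit")$adiag == 0))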
############################################################################
# HISTORY:
# 2013-09-26
# o Now utilizing anyMissing().
# 2011-02-05
# o DOCUMENTATION: Added section on how to normalize when channel offsets
# are supposed to be known/zero.
# o DOCUMENTATION: Fixed broken links to help for iwpca().
# 2005-06-03
# o Added argument 'typeOfWeights' to make it similar to other normalization
# methods, although only "datapoint" weights are allowed.
# 2005-02-27
# o Passes argument 'method' to fitIWPCA() now.
# 2005-02-02
# o BUG FIX: isSaturated could contain NA.
# 2005-02-01
# o Added argument '.fitOnly'.
# o Added validation of argument 'weights'.
# 2005-01-24
# o Added argument 'weights' (instead of passing 'w' to fitIWPCA()).
# o Now, saturated data points are normalized, but just not used when
# estimating the normalization function.
# 2005-01-23
# o Updated the Rdoc comments and error messages.
# 2004-12-28
# o Added Rdoc comments on weights.
# 2004-06-28
# o BUG FIX: Missing braces in Rdoc comments.
############################################################################
aroma.light/R/backtransformPrincipalCurve.R 0000644 0001750 0001750 00000015172 14136047216 020652 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric backtransformPrincipalCurve
# @alias backtransformPrincipalCurve.numeric
# @alias backtransformPrincipalCurve.matrix
#
# @title "Reverse transformation of principal-curve fit"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage backtransformPrincipalCurve,matrix
# @usage backtransformPrincipalCurve,numeric
# }
#
# \arguments{
# \item{X}{An NxK @matrix containing data to be backtransformed.}
# \item{fit}{An MxL principal-curve fit object of class
# \code{principal_curve} as returned by @see "fitPrincipalCurve".
# Typically \eqn{L = K}, but not always.
# }
# \item{dimensions}{An (optional) subset of D dimensions all in [1,L]
# to be returned (and backtransform).}
# \item{targetDimension}{An (optional) index specifying the dimension
# in [1,L] to be used as the target dimension of the \code{fit}.
# More details below.}
# \item{...}{Passed internally to @see "stats::smooth.spline".}
# }
#
# \value{
# The backtransformed NxK (or NxD) @matrix.
# }
#
# \details{
# Each column in X ("dimension") is backtransformed independently
# of the others.
# }
#
# \section{Target dimension}{
# By default, the backtransform is such that afterward the signals are
# approximately proportional to the (first) principal curve as fitted
# by @see "fitPrincipalCurve". This scale and origin of this
# principal curve is not uniquely defined.
# If \code{targetDimension} is specified, then the backtransformed signals
# are approximately proportional to the signals of the target dimension,
# and the signals in the target dimension are unchanged.
# }
#
# \section{Subsetting dimensions}{
# Argument \code{dimensions} can be used to backtransform a subset of
# dimensions (K) based on a subset of the fitted dimensions (L).
# If \eqn{K = L}, then both \code{X} and \code{fit} are subsetted.
# If \eqn{K \neq L}{K != L}, then it is assumed that \code{X} is already
# subsetted/expanded and only \code{fit} is subsetted.
# }
#
# @examples "../incl/backtransformPrincipalCurve.matrix.Rex"
#
# \seealso{
# @see "fitPrincipalCurve"
# }
#*/#########################################################################
setMethodS3("backtransformPrincipalCurve", "matrix", function(X, fit, dimensions=NULL, targetDimension=NULL, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'X'
if (!is.numeric(X)) {
stop("Argument 'X' is not numeric: ", mode(X));
}
dimnamesX <- dimnames(X);
dimX <- dim(X);
K <- dimX[2];
if (!is.matrix(X)) {
X <- as.matrix(X);
}
# Argument 'fit'
if (!inherits(fit, "principal_curve")) {
stop("Argument 'fit' is not a principal_curve object: ", class(fit)[1]);
}
# Argument 'dimensions'
dimS <- dim(fit$s);
L <- dimS[2];
if (!is.null(dimensions)) {
dimensions <- as.integer(dimensions);
if (any(dimensions < 1 | dimensions > L)) {
stop("Argument 'dimensions' contains values out of range [1,", L, "]");
}
}
# Argument 'targetDimension':
if (!is.null(targetDimension)) {
targetDimension <- as.integer(targetDimension);
if (length(targetDimension) != 1L) {
stop("Argument 'targetDimension' should be a scalar or NULL.");
}
if (targetDimension < 1L | targetDimension > L) {
stop("Argument 'targetDimension' is out of range [1,", L, "]: ", targetDimension);
}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Transform towards a target dimension?
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
hasTargetDimension <- (!is.null(targetDimension));
if (hasTargetDimension) {
lambda <- fit$s[,targetDimension];
} else {
lambda <- fit$lambda;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Subset dimensions?
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
s <- fit$s;
if (!is.null(dimensions)) {
s <- s[,dimensions,drop=FALSE];
if (K == L) {
X <- X[,dimensions,drop=FALSE];
dimX <- dim(X);
dimnamesX <- dimnames(X);
}
dimS <- dim(s);
L <- dimS[2];
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Find backtransformations and backtransform data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
naValue <- NA;
mode(naValue) <- mode(X);
Xhat <- matrix(naValue, nrow=dimX[1], ncol=dimX[2]);
okLambda <- is.finite(lambda);
for (kk in seq_len(L)) {
sKK <- s[,kk];
ok <- (is.finite(sKK) & okLambda);
fitKK <- smooth.spline(sKK[ok], lambda[ok], ...);
Xkk <- X[,kk];
keep <- which(is.finite(Xkk));
Xkk <- Xkk[keep];
XhatKK <- predict(fitKK, x=Xkk)$y;
# Sanity check
stopifnot(length(XhatKK) == length(keep));
Xhat[keep,kk] <- XhatKK;
}
# Not needed anymore
sKK <- lambda <- fitKK <- XhatKK <- keep <- s <- NULL;
dim(Xhat) <- dimX;
dimnames(Xhat) <- dimnamesX;
Xhat;
}) # backtransformPrincipalCurve()
setMethodS3("backtransformPrincipalCurve", "numeric", function(X, ...) {
X <- as.matrix(X);
backtransformPrincipalCurve(X, ...);
})
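# Editor's note: a small sketch, in the spirit of the incl/*.Rex example files,
# assuming three channels that are different nonlinear distortions of a common
# latent signal; fitPrincipalCurve() (which uses the suggested 'princurve'
# package) provides the principal_curve fit that is backtransformed here. All
# simulated values are assumptions.
set.seed(0xBEEF)
x <- rexp(1000, rate=1/1000)                         # latent true signal
X <- cbind(x, 40*sqrt(x), 0.5*x)                     # channel-specific distortions
X <- X + rnorm(length(X), sd=10)
fit <- fitPrincipalCurve(X)                          # principal-curve fit
# Backtransform so that all channels become proportional to channel 1:
Xn <- backtransformPrincipalCurve(X, fit=fit, targetDimension=1)
stopifnot(all(dim(Xn) == dim(X)))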
###########################################################################
# HISTORY:
# 2013-04-18
# o BUG FIX: backtransformPrincipalCurve() gave an error if the
# principal curve was fitted using data with missing values.
# Now backtransformPrincipalCurve() preserves dimension names.
# 2009-05-29
# o BUG FIX: Previous bug fix in backtransformPrincipalCurve() regarding
# argument 'dimension' broke the initial purpose of this argument. Since
# both use cases are still of interest, how the subsetting is done is now
# based on whether the number of dimensions of the input data and the
# model fit match or not. See help(backtransformPrincipalCurve.matrix).
# Added several cases to the example code for testing this.
# o Added more Rdoc comments.
# 2009-05-12
# o BUG FIX: backtransformPrincipalCurve(..., dimensions) did not subset
# the 'X' matrix. Also, the method now returns a matrix of the same
# number of columns requested. The Rd example now illustrates this.
# Thanks to Pierre Neuvial, UC Berkeley for the troubleshooting and fix.
# 2009-02-08
# o BUG FIX: An error was thrown in backtransformPrincipalCurve() if
# argument 'dimensions' was specified.
# 2009-01-12
# o Updated validation of arguments such that it does not require R.utils.
# 2008-10-08
# o Added argument 'targetDimension' to backtransformPrincipalCurve().
# 2008-10-07
# o Created.
###########################################################################
aroma.light/R/callNaiveGenotypes.R 0000644 0001750 0001750 00000015731 14136047216 016744 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric callNaiveGenotypes
# @alias callNaiveGenotypes.numeric
#
# @title "Calls genotypes in a normal sample"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage callNaiveGenotypes,numeric
# }
#
# \arguments{
# \item{y}{A @numeric @vector of length J containing allele B fractions
# for a normal sample.}
# \item{cn}{An optional @numeric @vector of length J specifying the true
# total copy number in \eqn{\{0,1,2,NA\}} at each locus. This can be
# used to specify which loci are diploid and which are not, e.g.
# autosomal and sex chromosome copy numbers.}
# \item{...}{Additional arguments passed to @see "fitNaiveGenotypes".}
# \item{modelFit}{An optional model fit as returned
# by @see "fitNaiveGenotypes".}
# \item{verbose}{A @logical or a @see "R.utils::Verbose" object.}
# }
#
# \value{
# Returns a @numeric @vector of length J containing the genotype calls
# in allele B fraction space, that is, in [0,1] where 1/2 corresponds
# to a heterozygous call, and 0 and 1 correspond to homozygous A
# and B, respectively.
# Non-called genotypes have the value @NA.
# }
#
# @examples "../incl/callNaiveGenotypes.Rex"
#
# \section{Missing and non-finite values}{
# A missing value always gives a missing (@NA) genotype call.
# Negative infinity (-@Inf) always gives genotype call 0.
# Positive infinity (+@Inf) always gives genotype call 1.
# }
#
# @author
#
# \seealso{
# Internally @see "fitNaiveGenotypes" is used to identify the thresholds.
# }
#*/###########################################################################
setMethodS3("callNaiveGenotypes", "numeric", function(y, cn=rep(2L, times=length(y)), ..., modelFit=NULL, verbose=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'y':
J <- length(y);
y <- as.double(y);
# Argument 'cn':
cn <- as.integer(cn);
if (length(cn) == 1L) {
cn <- rep(cn, times=J);
} else if (length(cn) != J) {
stop("The length of argument 'cn' does not match 'y': ", length(cn), " != ", J);
}
uniqueCNs <- sort(unique(cn));
unknown <- which(!is.element(uniqueCNs, c(0,1,2,NA)));
if (length(unknown) > 0L) {
unknown <- paste(uniqueCNs[unknown], collapse=", ");
stop("Argument 'cn' contains unknown CN levels: ", unknown);
}
# Argument 'modelFit':
if (!is.null(modelFit)) {
if (!inherits(modelFit, "NaiveGenotypeModelFit")) {
throw("Argument 'modelFit' is not of class NaiveGenotypeModelFit: ", class(modelFit)[1]);
}
}
# Argument 'verbose':
verbose <- Arguments$getVerbose(verbose);
if (verbose) {
pushState(verbose);
on.exit(popState(verbose));
}
verbose && enter(verbose, "Calling genotypes from allele B fractions (BAFs)");
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit naive genotype model?
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (is.null(modelFit)) {
verbose && enter(verbose, "Fitting naive genotype model");
modelFit <- fitNaiveGenotypes(y=y, cn=cn, ..., verbose=verbose);
verbose && print(verbose, modelFit);
verbose && exit(verbose);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Call genotypes
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
mu <- rep(NA_real_, times=J);
# To please R CMD check
type <- NULL; rm(list="type");
# Fitted CNs
cns <- sapply(modelFit, FUN=function(fit) fit$cn);
for (kk in seq_along(uniqueCNs)) {
cnKK <- uniqueCNs[kk];
verbose && enter(verbose, sprintf("Copy number level #%d (C=%g) of %d", kk, cnKK, length(uniqueCNs)));
# Special case
if (cnKK == 0) {
verbose && cat(verbose, "TCN=0 => BAF not defined. Skipping.");
verbose && exit(verbose);
next;
}
keep <- which(cn == cnKK);
yKK <- y[keep];
idx <- which(cnKK == cns);
if (length(idx) != 1L) {
msg <- sprintf("Cannot call genotypes for %d loci with true total copy number %d, because the naive genotype model was not fit for such copy numbers. Skipping.", length(yKK), cnKK);
verbose && cat(verbose, msg);
verbose && exit(verbose);
next;
}
fitKK <- modelFit[[idx]];
verbose && cat(verbose, "Model fit:");
verbose && print(verbose, fitKK);
tau <- fitKK$tau;
if (is.null(tau)) {
# Backward compatibility
fitValleys <- fitKK$fitValleys;
verbose && cat(verbose, "Local minimas (\"valleys\") in BAF:");
verbose && print(verbose, fitValleys);
tau <- fitValleys$x;
# Not needed anymore
fitValleys <- NULL;
}
verbose && printf(verbose, "Genotype threshholds [%d]: %s\n", length(tau), hpaste(tau));
# Call genotypes
muKK <- rep(NA_real_, times=length(yKK));
if (cnKK == 1) {
verbose && cat(verbose, "TCN=1 => BAF in {0,1}.");
a <- tau[1];
verbose && printf(verbose, "Call regions: A = (-Inf,%.3f], B = (%.3f,+Inf)\n", a, a);
muKK[yKK <= a] <- 0;
muKK[a < yKK] <- 1;
} else if (cnKK == 2) {
verbose && cat(verbose, "TCN=2 => BAF in {0,1/2,1}.");
a <- tau[1];
b <- tau[2];
verbose && printf(verbose, "Call regions: AA = (-Inf,%.3f], AB = (%.3f,%.3f], BB = (%.3f,+Inf)\n", a, a, b, b);
muKK[yKK <= a] <- 0;
muKK[a < yKK & yKK <= b] <- 1/2;
muKK[b < yKK] <- 1;
} else {
verbose && printf(verbose, "TCN=%d => Skipping.\n", cnKK);
}
mu[keep] <- muKK;
verbose && exit(verbose);
} # for (kk ...)
# Sanity check
stopifnot(length(mu) == J);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Return genotype calls (and parameter estimates)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
attr(mu, "modelFit") <- modelFit;
verbose && exit(verbose);
mu;
}) # callNaiveGenotypes()
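# Editor's note: a minimal sketch, in the spirit of the incl/*.Rex example files.
# The simulated allele B fractions (and the noise level) are assumptions; all
# loci are treated as diploid, which is the default (cn=2).
set.seed(42)
g <- sample(c(0, 1/2, 1), size=2000, replace=TRUE)   # true genotypes
y <- g + rnorm(length(g), sd=0.05)                   # noisy BAFs
mu <- callNaiveGenotypes(y)                          # calls at 0, 1/2, and 1
table(called=mu, true=g)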
###########################################################################
# HISTORY:
# 2012-04-16
# o CLEANUP: Dropped argument 'flavor' of callNaiveGenotypes(); it is
# now passed to fitNaiveGenotypes() via '...'.
# o GENERALIZATION: Now callNaiveGenotypes() no longer relies on 'modelFit'
# to hold a 'fitValleys' element, but rather a 'tau' element.
# 2010-10-14
# o TYPO FIX: Used name 'fitPeaks' instead of 'fitValleys'.
# 2010-10-07
# o Now callNaiveGenotypes() utilizes fitNaiveGenotypes().
# o Added more detailed verbose to callNaiveGenotypes().
# 2010-07-23
# o Now callNaiveGenotypes() returns the model estimates as attribute
# 'modelFit'.
# 2010-04-04
# o Updated code such that R.utils::Verbose is optional.
# o Corrected an Rdoc tag typo.
# 2009-11-03
# o Added an example() to the Rd help of callNaiveGenotypes().
# 2009-07-08
# o BUG FIX: Was never tested. Now tested via example(normalizeTumorBoost).
# 2009-07-06
# o Created from aroma.cn test script.
###########################################################################
aroma.light/R/linearAlgebra.R 0000644 0001750 0001750 00000001363 14136047216 015674 0 ustar nilesh nilesh # The trace of matrix X
tr <- function(X, na.rm=FALSE) {
sum(diag(X), na.rm=na.rm);
}
# u and v are column vectors.
scalarProduct <- function(u,v, na.rm=FALSE) {
colSums(u*v, na.rm=na.rm);
}
# u are column vectors, v a single vector for now.
projectUontoV <- function(u,v, na.rm=FALSE) {
vN <- v / sqrt(sum(v^2, na.rm=na.rm));
sp <- scalarProduct(u,vN, na.rm=na.rm);
sp <- rep(sp, each=length(vN));
sp <- matrix(sp, nrow=length(vN));
v * sp;
}
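# Editor's note: a tiny illustration (arbitrary values) of the helpers above.
# With a unit-length direction 'v', projectUontoV() returns the orthogonal
# projection of each column of 'u' onto 'v'.
U <- matrix(c(1, 2,
              3, 4), nrow=2)      # two column vectors: (1,2) and (3,4)
v <- c(1, 0)                      # unit-length direction (the x-axis)
tr(diag(c(2, 3)))                 # 5: the trace
scalarProduct(U, v)               # c(1, 3): column-wise scalar products
projectUontoV(U, v)               # columns (1,0) and (3,0)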
############################################################################
# HISTORY:
# 2005-01-24
# o Added arg. na.rm=FALSE to tr(), scalarProduct(), and projectUontoV().
# 2003-11-06
# o Created for readability.
############################################################################
aroma.light/R/000.R 0000644 0001750 0001750 00000000346 14136047216 013443 0 ustar nilesh nilesh ## covr: skip=all
## Look for existing generic functions also in imported namespaces.
## This will affect whether setGenericS3() creates a generic function
## or not.
options("R.methodsS3:checkImports:setGenericS3"=TRUE)
aroma.light/R/normalizeFragmentLength.R 0000644 0001750 0001750 00000034561 14136047216 020000 0 ustar nilesh nilesh ###########################################################################/**
# @RdocDefault normalizeFragmentLength
#
# @title "Normalizes signals for PCR fragment-length effects"
#
# \description{
# @get "title". Some or all signals are used to estimated the
# normalization function. All signals are normalized.
# }
#
# @synopsis
#
# \arguments{
# \item{y}{A @numeric @vector of length K of signals to be normalized
# across E enzymes.}
# \item{fragmentLengths}{An @integer KxE @matrix of fragment lengths.}
# \item{targetFcns}{An optional @list of E @functions; one per enzyme.
# If @NULL, the data is normalized to have constant fragment-length
# effects (all equal to zero on the log-scale).}
# \item{subsetToFit}{The subset of data points used to fit the
# normalization function.
# If @NULL, all data points are considered.}
# \item{onMissing}{Specifies how data points for which there is no
# fragment length are normalized.
# If \code{"ignore"}, the values are not modified.
# If \code{"median"}, the values are updated to have the same
# robust average as the other data points.
# }
# \item{.isLogged}{A @logical.}
# \item{...}{Additional arguments passed to @see "stats::lowess".}
# \item{.returnFit}{A @logical.}
# }
#
# \value{
# Returns a @numeric @vector of the normalized signals.
# }
#
# \section{Multi-enzyme normalization}{
# It is assumed that the fragment-length effects from multiple enzymes
# are added (with equal weights) on the intensity scale.
# The fragment-length effects are fitted for each enzyme separately based
# on units that are exclusively for that enzyme.
# \emph{If there are no or very few such units for an enzyme, the assumptions
# of the model are not met and the fit will fail with an error.}
# Then, from the above single-enzyme fits, the average effect across
# enzymes is calculated for each unit that is on multiple enzymes.
# }
#
# \section{Target functions}{
# It is possible to specify custom target function effects for each
# enzyme via argument \code{targetFcns}. This argument has to be a
# @list containing one @function per enzyme and ordered in the same
# order as the enzymes are in the columns of argument
# \code{fragmentLengths}.
# For instance, if one wishes to normalize the signals such that their
# mean signal as a function of fragment length is constantly
# equal to 2200 (on the intensity scale), then use
# \code{targetFcns=function(fl, ...) log2(2200)}, which completely
# ignores the fragment-length argument 'fl' and always returns a
# constant.
# If two enzymes are used, then use
# \code{targetFcns=rep(list(function(fl, ...) log2(2200)), 2)}.
#
# Note, if \code{targetFcns} is @NULL, this corresponds to
# \code{targetFcns=rep(list(function(fl, ...) 0), ncol(fragmentLengths))}.
#
# Alternatively, if one wants to only apply minimal corrections to
# the signals, then one can normalize toward target functions that
# correspond to the fragment-length effect of the average array.
# }
#
# \examples{
# @include "../incl/normalizeFragmentLength-ex1.Rex"
#
# @include "../incl/normalizeFragmentLength-ex2.Rex"
# }
#
# @author "HB"
#
# \references{
# [1] @include "../incl/BengtssonH_etal_2008.bib.Rdoc" \cr
# }
#
# @keyword "nonparametric"
# @keyword "robust"
#*/###########################################################################
setMethodS3("normalizeFragmentLength", "default", function(y, fragmentLengths, targetFcns=NULL, subsetToFit=NULL, onMissing=c("ignore", "median"), .isLogged=TRUE, ..., .returnFit=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'y':
y <- as.double(y);
nbrOfDataPoints <- length(y);
okY <- is.finite(y);
# Sanity check
if (!any(okY, na.rm=TRUE)) {
throw("Cannot fit normalization function to enzyme, because there are no (finite) data points in argument 'y'.");
}
# Argument 'fragmentLengths':
if (!is.matrix(fragmentLengths)) {
if (is.vector(fragmentLengths)) {
fragmentLengths <- as.matrix(fragmentLengths);
} else {
throw("Argument 'fragmentLengths' must be a matrix: ",
class(fragmentLengths)[[1]]);
}
}
if (nrow(fragmentLengths) != nbrOfDataPoints) {
throw("Number of rows in argument 'fragmentLengths' does not match the length of argument 'y': ", nrow(fragmentLengths), " != ", nbrOfDataPoints);
}
nbrOfEnzymes <- ncol(fragmentLengths);
allEnzymes <- seq_len(nbrOfEnzymes);
# Coerce to doubles
for (ee in allEnzymes) {
fragmentLengths[,ee] <- as.double(fragmentLengths[,ee]);
}
# Assert that there are some finite fragment lengths
hasFL <- is.finite(fragmentLengths);
if (!any(hasFL)) {
throw("Cannot fit normalization function. Argument 'fragmentLengths' contains no finite values.");
}
# Assert that for each enzyme there exist some finite fragment lengths
for (ee in allEnzymes) {
if (sum(hasFL[,ee]) == 0) {
throw(sprintf("Cannot fit normalization function to enzyme #%d, because there are no units with finite fragment lengths for this enzyme: ", ee));
}
}
# Count the number of enzymes per units
countFL <- rep(0L, times=nbrOfDataPoints);
for (ee in allEnzymes) {
countFL <- countFL + as.integer(hasFL[,ee]);
}
# Assert that there are units from a single enzyme
isSingleEnzymed <- (countFL == 1L);
if (sum(isSingleEnzymed) == 0) {
throw("Cannot fit normalization function, because none of the units are on fragments from a single enzyme, or equivalently, there exist no rows in argument 'fragmentLenghts' that only have one finite value.");
}
# Argument 'targetFcns':
if (!is.null(targetFcns)) {
if (!is.list(targetFcns)) {
if (nbrOfEnzymes == 1L) {
targetFcns <- list(targetFcns);
} else {
throw("Argument 'targetFcns' is not a list: ", class(targetFcns)[1]);
}
}
if (length(targetFcns) != nbrOfEnzymes) {
throw("Number of elements in 'targetFcns' does not match the number of columns in 'fragmentLengths': ", length(targetFcns), " != ", nbrOfEnzymes);
}
# Validate each element
for (ee in allEnzymes) {
if (!is.function(targetFcns[[ee]])) {
throw("One element in 'targetFcns' is not a function: ", class(targetFcns[[ee]])[1]);
}
}
}
# Argument 'subsetToFit':
if (!is.null(subsetToFit)) {
subsetToFit <- as.integer(subsetToFit);
if (length(subsetToFit) > nbrOfDataPoints) {
throw("The length of argument 'subsetToFit' does not match the number of data points: ", length(subsetToFit), " != ", nbrOfDataPoints);
}
}
# Argument 'onMissing':
onMissing <- match.arg(onMissing);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Estimate normalization function and predict the signals
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit smooth curve to each enzyme separately
# KxE matrix for sample (and target predictions)
mu <- matrix(NA_real_, nrow=nbrOfDataPoints, ncol=nbrOfEnzymes);
if (!is.null(targetFcns)) {
muT <- matrix(NA_real_, nrow=nbrOfDataPoints, ncol=nbrOfEnzymes);
}
if (.returnFit) {
fits <- vector("list", nbrOfEnzymes);
}
for (ee in allEnzymes) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# (a) Fit normalization function
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# (i) Fit only to units that are on fragments from a single (this)
# enzyme and that there exist finite fragment lengths
ok <- isSingleEnzymed & hasFL[,ee];
if (!any(ok)) {
throw(sprintf("Cannot fit normalization function to enzyme #%d, because there are no units in argument 'fragmentLengths' that are unique to this enzyme and with finite fragment lengths: ", ee));
}
# (ii) Fit only to units with non-missing data points.
ok <- ok & okY;
# Sanity check
if (!any(ok)) {
throw(sprintf("Cannot fit normalization function to enzyme #%d, because there are no units in argument 'fragmentLengths' that are unique to this enzyme and with finite fragment lengths and at the same time have finite values in argument 'y': ", ee));
}
if (!is.null(subsetToFit)) {
ok[-subsetToFit] <- FALSE;
# Sanity check
if (!any(ok)) {
throw(sprintf("Cannot fit normalization function to enzyme #%d, because after subsetting there are no units in argument 'fragmentLengths' that are unique to this enzyme and with finite fragment lengths and at the same time have finite values in argument 'y': ", ee));
}
}
# All fragment lengths for current enzyme
fl <- fragmentLengths[,ee];
# Fit finite {(lambda, log2theta)_j} to data points j on current enzyme
suppressWarnings({
fit <- lowess(fl[ok], y[ok], ...);
class(fit) <- "lowess";
})
# Not needed anymore
ok <- NULL;
if (.returnFit)
fits[[ee]] <- fit;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# (b) Calculate correction factor
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calculate the correction factor for every data point on this enzyme
ok <- (hasFL[,ee] & okY);
mu[ok,ee] <- predict(fit, newdata=fl[ok]);
if (.returnFit) {
fits[[ee]] <- list(fit=fit, mu=mu[,ee]);
}
# Normalize toward a target function?
if (!is.null(targetFcns)) {
muT[ok,ee] <- targetFcns[[ee]](fl[ok]);
if (.returnFit) {
fits[[ee]]$muT <- muT[,ee];
}
}
# Not needed anymore
fit <- fl <- NULL;
} # for (ee ...)
# Not needed anymore
hasFL <- isSingleEnzymed <- NULL;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calculate the *average* predicted signal across enzymes
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Sum on the non-log scale.
if (.isLogged) {
mu <- 2^mu;
if (!is.null(targetFcns))
muT <- 2^muT;
}
mu <- rowSums(mu, na.rm=TRUE);
# mu <- mu / countFL; # Averaging (needed?!?)
if (!is.null(targetFcns)) {
muT <- rowSums(muT, na.rm=TRUE);
# muT <- muT / countFL; # Averaging (needed?!?)
}
# Special case: Units with unknown fragment lengths
if (onMissing != "ignore") {
isMissing <- (countFL == 0);
if (any(isMissing)) {
if (onMissing == "median") {
# Let the predicted value for these units be the robust average
# of all other units (based on the assumption that the missing
# fragment lengths are distributed as the known ones).
# Identify the set to be used to estimate the target average
ok <- (okY & !isMissing);
# Sanity check
if (!any(ok)) {
throw("Cannot fit normalization function to loci with unknown fragment lengths, because there are no (finite) data points to be fitted.");
}
if (!is.null(subsetToFit)) {
ok[-subsetToFit] <- FALSE;
# Sanity check
if (!any(ok)) {
throw("Cannot fit normalization function to loci with unknown fragment lengths, because after subsetting there are no (finite) data points to be fitted.");
}
}
# Substitute the predicted means with the median of the already
# predicted set of loci.
mu[isMissing] <- median(mu[ok], na.rm=TRUE);
if (!is.null(targetFcns)) {
muT[isMissing] <- median(muT[ok], na.rm=TRUE);
}
# Not needed anymore
ok <- NULL;
} # if (onMissing == "median")
}
# Not needed anymore
isMissing <- NULL;
}
# Not needed anymore
countFL <- NULL;
if (.isLogged) {
mu <- log2(mu);
if (!is.null(targetFcns))
muT <- log2(muT);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calculate the correction ("normalization") factor
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calculate correction factors
if (is.null(targetFcns)) {
dy <- mu;
} else {
dy <- (mu - muT);
}
# Not needed anymore
mu <- NULL;
if (!is.null(targetFcns)) {
# Not needed anymore
muT <- NULL;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Normalize signals
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Transform signals
ok <- is.finite(dy) & okY;
# Not needed anymore
okY <- NULL;
y[ok] <- y[ok] - dy[ok];
if (.returnFit) {
attr(y, "modelFit") <- fits;
}
y;
}, private=TRUE)
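# Editor's note: a single-enzyme sketch, in the spirit of the incl/*.Rex example
# files. The fragment lengths, the linear length effect, and the noise level are
# assumptions; with targetFcns=NULL the fitted effect is removed (normalized to
# zero on the log scale), as described in the Rdoc above.
set.seed(1)
fl <- matrix(runif(1000, min=100, max=2000), ncol=1)  # KxE matrix with E=1 enzyme
theta <- rnorm(1000, mean=11, sd=0.3)                 # log2 signals without effect
y <- theta - 0.0005 * (fl[,1] - mean(fl[,1]))         # add a fragment-length effect
yN <- normalizeFragmentLength(y, fragmentLengths=fl)
c(before=cor(y, fl[,1]), after=cor(yN, fl[,1]))       # effect is (mostly) removed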
############################################################################
# HISTORY:
# 2010-09-18
# o ROBUSTNESS: Now normalizeFragmentLength() asserts that arguments
# 'fragmentLengths' and 'y' contain at least some finite values and
# specifies the same number of units. In addition, the method also
# gives more informative error messages in case it cannot fit the
# normalization function due to non-finite values.
# 2008-09-11
# o Now onMissing="median" estimates the median on using the subset to fit.
# 2008-09-10
# o Added argument 'onMissing' to normalizeFragmentLength() for specifying
# how to normalize (if at all) data points for which the fragment lengths
# are unknown. For backward compatibility, we start of by having it
# "ignore" by default.
# 2008-05-10
# o BUG FIX: If the 'subsetToFit' was shorter than the number of data
# points, an exception was thrown. The test was supposed to be assert
# that the subset was not greater than the number of data points.
# 2008-04-14
# o Removed any usage of R.utils::Arguments.
# 2007-11-29
# o BUG FIX: The implemented multi-enzyme model was not the one in mind;
# The correction for the multi-enzyme data points was not right.
# Have now created an updated example that displays the normalized
# log-ratios (as a function of fragment length as well as their densities).
# The example does also test the case for non-aliquot mixing proportions
# between enzymes. This is actually automagically corrected for by the
# way the model was set up, i.e. there is no need to estimate the
# mixing proportions.
# 2007-11-19
# o Added Rdoc examples. From these simulation examples, it looks like the
# multi-enzyme normalization method works.
# o Updated normalizeFragmentLength() to handle multiple enzymes.
# 2006-11-28
# o Created.
############################################################################
aroma.light/R/iwpca.R 0000644 0001750 0001750 00000022237 14136047216 014252 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric iwpca
# @alias iwpca.matrix
#
# @title "Fits an R-dimensional hyperplane using iterative re-weighted PCA"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage iwpca,matrix
# }
#
# \arguments{
# \item{X}{N-times-K @matrix where N is the number of observations and
# K is the number of dimensions.}
# \item{w}{An N @vector of weights for each row (observation) in
# the data matrix. If @NULL, all observations get the same weight.}
# \item{R}{Number of principal components to fit. By default a line
# is fitted.}
# \item{method}{
# If \code{"symmetric"} (or \code{"bisquare"}), Tukey's biweight
# is used. If \code{"tricube"}, the tricube weight is used.
# If \code{"L1"}, the model is fitted in \eqn{L_1}.
# If a @function, it is used to calculate weights for next iteration
# based on the current iteration's residuals.}
# \item{maxIter}{Maximum number of iterations.}
# \item{acc}{The (Euclidean) distance between two subsequent parameters
# fit for which the algorithm is considered to have converged.}
# \item{reps}{Small value to be added to the residuals before the
# weights are calculated based on their inverse. This is to avoid
# infinite weights.}
# \item{fit0}{A @list containing elements \code{vt} and \code{pc}
# specifying an initial fit.
# If @NULL, the initial guess will be equal to the (weighted) PCA fit.}
# \item{...}{Additional arguments accepted by @see "wpca".}
# }
#
# \value{
# Returns the fit (a @list) from the last call to @see "wpca"
# with the additional elements \code{nbrOfIterations} and
# \code{converged}.
# }
#
# \details{
# This method uses weighted principal component analysis (WPCA) to fit a
# R-dimensional hyperplane through the data with initial internal
# weights all equal.
# At each iteration the internal weights are recalculated based on
# the "residuals".
# If \code{method=="L1"}, the internal weights are 1 / sum(abs(r) + reps).
# This is the same as \code{method=function(r) 1/sum(abs(r)+reps)}.
# The "residuals" are orthogonal Euclidean distance of the principal
# components R,R+1,...,K.
# In each iteration before doing WPCA, the internal weighted are
# multiplied by the weights given by argument \code{w}, if specified.
# }
#
# @author
#
# @examples "../incl/iwpca.matrix.Rex"
#
# \seealso{
# Internally @see "wpca" is used for calculating the weighted PCA.
# }
#
# @keyword "algebra"
#*/#########################################################################
setMethodS3("iwpca", "matrix", function(X, w=NULL, R=1, method=c("symmetric", "bisquare", "tricube", "L1"), maxIter=30, acc=1e-4, reps=0.02, fit0=NULL, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'w'
if (!is.null(w))
w <- rep(w, length.out=nrow(X));
w0 <- w;
# Argument: 'method'
if (is.function(method)) {
dummy <- method(1:5);
if (!is.numeric(dummy))
stop("Argument 'method' (weight function) does not return numeric values.");
if (!is.vector(dummy))
stop("Argument 'method' (weight function) does not return a vector.");
if (length(dummy) != 5)
stop("Argument 'method' (weight function) does not return the correct number of values.");
} else {
method <- match.arg(method);
}
# Argument: 'fit0'
if (!is.null(fit0)) {
if (!all(c("vt", "pc") %in% names(fit0))) {
throw("Argument 'fit0' is missing element 'vt' or 'pc': ",
paste(names(fit0), collapse=", "));
}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Fit the model
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Ulast <- 1/.Machine$double.eps; # A large number
iter <- 0;
isConverged <- FALSE;
w <- rep(1, length=nrow(X));
while (!isConverged && iter < maxIter) {
if (iter > 0 || is.null(fit0)) {
iter <- iter + 1;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# "Re-weight the weights"
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (!is.null(w0))
w <- w0 * w;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit N-dimensional weighted PCA.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- wpca(X, w=w, scale=FALSE, ...);
} else {
fit <- fit0;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Get the fitted line L
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Get fitted eigenvectors (u1,u2,...,uN)
# with ui*uj = 0; i!=j and ui*ui = 1.
U <- fit$vt;
colnames(U) <- rownames(U) <- NULL;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Check for convergence
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
isConverged <- (sum(abs(U-Ulast))/length(U) < acc);
Ulast <- U;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Finally, update the weights
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Residuals in the "tailing" dimensions.
r <- fit$pc[,-c(1:R), drop=FALSE];
# Residuals in orthogonal Euclidean distance
if (anyMissing(r)) {
# Sometimes some residuals become NAs.
r <- sqrt(rowSums(r^2, na.rm=TRUE));
} else {
r <- sqrt(rowSums(r^2));
}
# Down-weight points that are "far" away...
if (is.character(method)) {
if (method == "L1") {
# Add small number to residuals to avoid infinite weights
r <- abs(r) + reps;
w <- 1/r;
} else if (method %in% c("symmetric", "bisquare")) {
# Add small number to residuals to avoid infinite weights
r <- abs(r) + reps;
r <- r/6;
# Zero weights introduce NA's (for unknown reasons), therefore
# we use a number very close to zero instead.
w <- rep(.Machine$double.eps, times=length(r));
ii <- (r < 1);
w[ii] <- (1-r[ii]^2)^2;
# Not needed anymore
ii <- NULL;
} else if (method == "tricube") {
# Add small number to residuals to avoid infinite weights
r <- abs(r) + reps;
r <- r/6;
# Zero weights introduce NA's (for unknown reasons), therefore
# we use a number very close to zero instead.
w <- rep(.Machine$double.eps, times=length(r));
ii <- (r < 1);
w[ii] <- (1-r[ii]^3)^3;
# Not needed anymore
ii <- NULL;
}
} else if (is.function(method)) {
# Pass also the "fitted values" to the weight function.
attr(r, "x") <- fit$pc[,1:R, drop=FALSE];
attr(r, "reps") <- reps;
w <- method(r);
# Combine w_i = ||w_{i,j}||_2 (Euclidean distance), if needed.
if (is.matrix(w) && ncol(w) > 1) {
w <- sqrt(rowSums(w^2)/ncol(w));
} else {
w <- as.vector(w);
}
}
# Sometimes some weights become NAs (not sure why). Set them to zero.
if (anyMissing(w)) {
w[is.na(w)] <- 0;
}
# Not needed anymore
r <- NULL;
} # while(...)
fit$w <- w;
fit$nbrOfIterations <- iter;
fit$converged <- isConverged;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 3. Return the estimated parameters
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit;
}) # iwpca()
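# Editor's note: a small sketch, in the spirit of the incl/*.Rex example files.
# The simulated two-channel data and the handful of outliers are assumptions;
# method="L1" gives the iteratively re-weighted L1 fit described above, and the
# custom weight function shown last mirrors the equivalence noted in the Details.
set.seed(2)
x <- rexp(300, rate=1/1000)
X <- cbind(R=100 + 1.0*x, G=200 + 0.7*x) + rnorm(600, sd=20)
X[1:10, "G"] <- X[1:10, "G"] + 3000              # a few gross outliers
fit <- iwpca(X, method="L1")
fit$converged                                    # did the fit converge?
fit$vt                                           # fitted principal directions
# An equivalent custom weight function (cf. the Details section above):
fit2 <- iwpca(X, method=function(r) 1/(abs(r) + 0.02))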
############################################################################
# HISTORY:
# 2013-09-26
# o Now utilizing anyMissing().
# 2006-04-25
# o Updated the example to first plot data from all viewpoints, then just
# the lines. Faster since the lines are only fitted once and nicer.
# 2005-05-03
# o Now test of argument 'fit0' checks if it is NULL.
# 2005-03-28
# o Second try with initial guess; added argument 'fit0'.
# 2005-02-08
# o Added "symmetric" (now default) and "tricube" too.
# 2005-02-07
# o Argument 'method' is now how the weights are calculated from the
# residuals, not how residuals are combined across dimensions.
# Method "L2" is therefore removed, because it corresponds to wpca().
# o Added support for weight functions via argument 'method'. The function
# must take a matrix of residuals as the first argument.
# 2004-05-14
# o Made into a method of class matrix instead of a stand-alone function.
# 2004-04-18
# o Added support for weighted IWPCA; simply by weighing the weights.
# 2004-03-09
# o Now making use of our own weighted PCA method instead of the
# multidim::acp() method, which is not maintained anymore.
# 2003-12-29
# o Generalized the method to fit a R-dimensional hyperplane instead of
# just a line, which is the default fit.
# 2003-12-28
# o Added to the R.basic package.
# o Created by extracted code from RGData.fitMultiIWPCA().
############################################################################
aroma.light/R/zzz.R 0000644 0001750 0001750 00000000621 14136047216 013775 0 ustar nilesh nilesh ## covr: skip=all
# Allows conflicts. For more information, see library() and
# conflicts() in [R] base.
.conflicts.OK <- TRUE
.onLoad <- function(libname, pkgname) {
## covr: skip=3
ns <- getNamespace(pkgname);
pkg <- Package(pkgname);
assign(pkgname, pkg, envir=ns);
}
.onAttach <- function(libname, pkgname) {
pkg <- get(pkgname, envir=getNamespace(pkgname));
startupMessage(pkg);
}
aroma.light/R/sampleCorrelations.matrix.R 0000644 0001750 0001750 00000007355 14136047216 020324 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric sampleCorrelations
# @alias sampleCorrelations.matrix
#
# @title "Calculates the correlation for random pairs of observations"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage sampleCorrelations,matrix
# }
#
# \arguments{
# \item{X}{An NxK @matrix where N >= 2 and K >= 2.}
# \item{MARGIN}{The dimension (1 or 2) in which the observations are.
# If \code{MARGIN==1} (\code{==2}), each row (column) is an observation.}
# \item{pairs}{If an Lx2 @matrix, the L index pairs for which the
# correlations are calculated.
# If @NULL, pairs of observations are sampled.}
# \item{npairs}{The number of correlations to calculate.}
# \item{...}{Not used.}
# }
#
# \value{
# Returns a @double @vector of length \code{npairs}.
# }
#
# @author "HB"
#
# @examples "../incl/sampleCorrelations.matrix.Rex"
#
# \seealso{
# @see "base::sample".
# }
#
# \references{
# [1] A. Ploner, L. Miller, P. Hall, J. Bergh & Y. Pawitan.
# \emph{Correlation test to assess low-level processing of high-density
# oligonucleotide microarray data}. BMC Bioinformatics, 2005, vol 6.
# }
#
# @keyword utilities
#*/#########################################################################
setMethodS3("sampleCorrelations", "matrix", function(X, MARGIN=1, pairs=NULL, npairs=max(5000, nrow(X)), ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Local functions
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
corFast <- function(x, y, ...) {
## .Internal() calls are no longer allowed. /HB 2012-04-16
## # 3 == "pairwise.complete.obs"
## .Internal(cor(x, y, as.integer(3), FALSE));
cor(x=x, y=y, use="pairwise.complete.obs", method="pearson");
} # corFast()
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'X'
if (!is.matrix(X))
throw("Argument 'X' must be a matrix: ", mode(X));
if (nrow(X) < 2)
throw("Argument 'X' must have more than two rows.");
if (ncol(X) < 2)
throw("Argument 'X' must have more than two columns.");
# Argument 'MARGIN'
if (MARGIN < 1 || MARGIN > 2)
throw("Argument 'MARGIN' is out of range [1,2]: ", MARGIN);
# Argument 'npairs'
if (npairs < 1)
throw("Argument 'npairs' must be equal or greater than one: ", npairs);
# Get row/column-index pairs to calculate correlations for.
if (is.null(pairs)) {
pairs <- sampleTuples(dim(X)[MARGIN], size=npairs, length=2);
} else {
npairs <- nrow(pairs);
}
# Are 'pairs' and 'npairs' consistent with each other?
if (nrow(pairs) < npairs) {
throw("The number of pairs in 'pairs' is smaller than 'npairs': ",
nrow(pairs), " < ", npairs);
}
# Pre-create result vector to optimize speed (and memory)
cors <- rep(NA_real_, times=npairs);
if (MARGIN == 1) {
for (kk in 1:npairs) {
pair <- pairs[kk,];
x <- X[pair[1],];
y <- X[pair[2],];
cors[kk] <- corFast(x,y);
}
} else {
for (kk in 1:npairs) {
pair <- pairs[kk,];
x <- X[,pair[1]];
y <- X[,pair[2]];
cors[kk] <- corFast(x,y);
}
}
cors;
}) # sampleCorrelations()
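# Editor's note: a brief sketch, in the spirit of the incl/*.Rex example files.
# The data are pure noise, so the sampled correlations should scatter around
# zero; all sizes are arbitrary assumptions.
set.seed(3)
X <- matrix(rnorm(200*6), nrow=200, ncol=6)      # 200 observations (rows), 6 samples
rho <- sampleCorrelations(X, MARGIN=1, npairs=500)
summary(rho)                                     # correlations of 500 random row pairs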
############################################################################
# HISTORY:
# 2012-04-16
# o sampleCorrelations() no longer utilizes .Internal() calls.
# o Added internal corFast() to sampleCorrelations().
# 2011-04-12
# o Now using NAs of the correct storage type.
# 2005-07-25
# o Added Rdoc comments with a simple example.
# 2005-04-07
# o Created.
############################################################################
aroma.light/R/fitNaiveGenotypes.R 0000644 0001750 0001750 00000017267 14136047216 016621 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric fitNaiveGenotypes
# @alias fitNaiveGenotypes.numeric
#
# @title "Fit naive genotype model from a normal sample"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage fitNaiveGenotypes,numeric
# }
#
# \arguments{
# \item{y}{A @numeric @vector of length J containing allele B fractions
# for a normal sample.}
# \item{cn}{An optional @numeric @vector of length J specifying the true
# total copy number in \eqn{\{0,1,2,NA\}} at each locus. This can be
# used to specify which loci are diploid and which are not, e.g.
# autosomal and sex chromosome copy numbers.}
# \item{subsetToFit}{An optional @integer or @logical @vector specifying
# which loci should be used for estimating the model.
# If @NULL, all loci are used.}
# \item{flavor}{A @character string specifying the type of algorithm used.}
# \item{adjust}{A positive @double specifying the amount of smoothing for
# the empirical density estimator.}
# \item{...}{Additional arguments passed to @see "findPeaksAndValleys".}
# \item{censorAt}{A @double @vector of length two specifying the range
# for which values are considered finite. Values below (above) this
# range are treated as -@Inf (+@Inf).}
# \item{verbose}{A @logical or a @see "R.utils::Verbose" object.}
# }
#
# \value{
# Returns a @list of @lists.
# }
#
# @author
#
# \seealso{
# To call genotypes see @see "callNaiveGenotypes".
# Internally @see "findPeaksAndValleys" is used to identify the thresholds.
# }
#*/###########################################################################
setMethodS3("fitNaiveGenotypes", "numeric", function(y, cn=rep(2L, times=length(y)), subsetToFit=NULL, flavor=c("density", "fixed"), adjust=1.5, ..., censorAt=c(-0.1,1.1), verbose=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'y':
J <- length(y);
y <- as.double(y);
# Argument 'cn':
cn <- as.integer(cn);
if (length(cn) == 1L) {
cn <- rep(cn, times=J);
} else if (length(cn) != J) {
stop("The length of argument 'cn' does not match 'y': ", length(cn), " != ", J);
}
uniqueCNs <- sort(unique(cn));
unknown <- which(!is.element(uniqueCNs, c(0,1,2,NA)));
if (length(unknown) > 0L) {
unknown <- paste(uniqueCNs[unknown], collapse=", ");
stop("Argument 'cn' contains unknown CN levels: ", unknown);
}
# Argument 'subsetToFit':
if (!is.null(subsetToFit)) {
if (is.logical(subsetToFit)) {
if (length(subsetToFit) != J) {
stop("The length of argument 'subsetToFit' does not match 'y': ", length(subsetToFit), " != ", J);
}
subsetToFit <- which(subsetToFit);
} else {
subsetToFit <- as.integer(subsetToFit);
subsetToFit <- sort(unique(subsetToFit));
if (!all(1 <= subsetToFit & subsetToFit <= J)) {
stop(sprintf("Some elements of argument 'subsetToFit' is out of range [1,%d].", J));
}
}
}
# Argument 'flavor':
flavor <- match.arg(flavor);
# Argument 'adjust':
adjust <- as.double(adjust);
if (length(adjust) != 1L) {
stop("Argument 'adjust' must be single value: ", adjust);
}
if (adjust <= 0) {
stop("Argument 'adjust' must be positive: ", adjust);
}
## # Argument 'tol':
## tol <- as.double(tol);
## if (length(tol) != 1) {
## stop("Argument 'tol' must be single value: ", tol);
## }
## if (tol <= 0) {
## stop("Argument 'tol' must be positive: ", tol);
## }
# Argument 'censorAt':
censorAt <- as.double(censorAt);
stopifnot(length(censorAt) == 2L);
stopifnot(censorAt[1] <= censorAt[2]);
# Argument 'verbose':
verbose <- Arguments$getVerbose(verbose);
if (verbose) {
pushState(verbose);
on.exit(popState(verbose));
}
verbose && enter(verbose, "Fitting naive genotype model from normal allele B fractions (BAFs)");
verbose && cat(verbose, "Flavor: ", flavor);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Adjust signals
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (verbose) {
enter(verbose, "Censoring BAFs");
cat(verbose, "Before:");
summary(verbose, y);
print(verbose, sum(is.finite(y)));
}
# Censor values
y[y < censorAt[1]] <- -Inf;
y[y > censorAt[2]] <- +Inf;
if (verbose) {
cat(verbose, "After:");
summary(verbose, y);
print(verbose, sum(is.finite(y)));
exit(verbose);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Subsetting
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (!is.null(subsetToFit)) {
if (verbose) {
enter(verbose, "Subsetting");
cat(verbose, "Number of data points before: ", length(y));
cat(verbose, "Number of true copy-number levels before: ", length(uniqueCNs));
}
y <- y[subsetToFit];
cn <- cn[subsetToFit];
if (verbose) {
uniqueCNs <- sort(unique(cn));
cat(verbose, "Number of data points afterward: ", length(y));
cat(verbose, "Number of true copy-number levels afterward: ", length(uniqueCNs));
exit(verbose);
}
}
# To please R CMD check
type <- NULL; rm(list="type");
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Call genotypes
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fitList <- list();
for (kk in seq_along(uniqueCNs)) {
cnKK <- uniqueCNs[kk];
verbose && enter(verbose, sprintf("Copy number level #%d (C=%g) of %d", kk, cnKK, length(uniqueCNs)));
keep <- which(cn == cnKK);
yKK <- y[keep];
# Exclude missing and non-finite values when fitting the density
yT <- yKK[is.finite(yKK)];
n <- length(yT);
if (flavor == "density") {
fit <- findPeaksAndValleys(yT, adjust=adjust, ...);
verbose && cat(verbose, "Identified extreme points in density of BAF:");
verbose && print(verbose, fit);
fitValleys <- subset(fit, type == "valley");
nbrOfGenotypeGroups <- nrow(fitValleys) + 1L;
verbose && cat(verbose, "Local minimas (\"valleys\") in BAF:");
verbose && print(verbose, fitValleys);
tau <- fitValleys$x;
} else if (flavor == "fixed") {
args <- list(...);
tau <- args$tau;
if (is.null(tau)) {
tau <- seq_len(cnKK) / (cnKK + 1L);
}
nbrOfGenotypeGroups <- length(tau) + 1L;
}
# Sanity check
stopifnot(length(tau) == nbrOfGenotypeGroups - 1L);
# Store
fitKK <- list(
flavor = flavor,
cn=cnKK,
nbrOfGenotypeGroups=nbrOfGenotypeGroups, # Not really used
tau=tau,
n=n
);
if (flavor == "density") {
fitKK$fit <- fit;
fitKK$fitValleys <- fitValleys;
}
fitList[[kk]] <- fitKK;
verbose && exit(verbose);
} # for (kk ...)
verbose && exit(verbose);
class(fitList) <- c("NaiveGenotypeModelFit", class(fitList));
fitList;
}) # fitNaiveGenotypes()
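# Editor's note: a short sketch, in the spirit of the incl/*.Rex example files,
# of fitting the model once and reusing it via callNaiveGenotypes(modelFit=...).
# The simulated BAFs are assumptions; flavor="fixed" places the thresholds at
# 1/3 and 2/3 for diploid loci, as in the code above.
set.seed(4)
g <- sample(c(0, 1/2, 1), size=5000, replace=TRUE)
y <- g + rnorm(length(g), sd=0.04)
fit <- fitNaiveGenotypes(y, flavor="fixed")
fit[[1]]$tau                                     # c(1/3, 2/3)
mu <- callNaiveGenotypes(y, modelFit=fit)
table(called=mu, true=g)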
###########################################################################
# HISTORY:
# 2012-04-16
# o Added support for fitNaiveGenotypes(..., flavor="fixed").
# o GENERALIZATION: Now fitNaiveGenotypes() returns also 'flavor' and
# 'tau'. The latter are the genotype thresholds used by the caller.
# 2010-10-14
# o TYPO FIX: Used name 'fitPeaks' instead of 'fitValleys'.
# 2010-10-12
# o New default of argument 'censorAt' for fitNaiveGenotypes().
# o BUG FIX: fitNaiveGenotypes(..., subsetToFit=) would throw
# an exception reporting "Some elements of argument 'subsetToFit' is
# out of range ...".
# 2010-10-07
# o Created from callNaiveGenotypes.R.
###########################################################################
aroma.light/R/robustSmoothSpline.R 0000644 0001750 0001750 00000026512 14136047216 017032 0 ustar nilesh nilesh ############################################################################/**
# @RdocDefault robustSmoothSpline
#
# @title "Robust fit of a Smoothing Spline"
#
# @synopsis
#
# \description{
# Fits a smoothing spline robustly using the \eqn{L_1} norm. Currently, the
# algorithm is an \emph{iterative reweighted smooth spline} algorithm which
# calls \code{smooth.spline(x,y,w,...)} at each iteration with the weights
# \code{w} equal to the inverse of the absolute value of the residuals for
# the last iteration step.
# }
#
# \arguments{
# \item{x}{a @vector giving the values of the predictor variable, or a
# @list or a two-column @matrix specifying \code{x} and \code{y}.
# If \code{x} is of class \code{smooth.spline} then \code{x$x} is used
# as the \code{x} values and \code{x$yin} are used as the \code{y}
# values.}
# \item{y}{responses. If \code{y} is missing, the responses are assumed to be
# specified by \code{x}.}
# \item{w}{a @vector of weights the same length as \code{x} giving the weights
# to use for each element of \code{x}. Default value is equal weight
# to all values.}
# \item{...}{Other arguments passed to @see "stats::smooth.spline".}
# \item{minIter}{the minimum number of iterations used to fit the smoothing
# spline robustly. Default value is 3.}
# \item{maxIter}{the maximum number of iterations used to fit the smoothing
# spline robustly. Default value is 25.}
# \item{method}{the method used to compute robustness weights at each
# iteration. Default value is \code{"L1"}, which uses the inverse of
# the absolute value of the residuals. Using \code{"symmetric"} will
# use Tukey's biweight with cut-off equal to six times the MAD of
# the residuals, equivalent to @see "stats::lowess".}
# \item{sdCriteria}{Convergence criterion, which is the difference between the
# standard deviation of the residuals between two consecutive
# iteration steps. Default value is 2e-4.}
# \item{reps}{Small positive number added to residuals to avoid division by
# zero when calculating new weights for next iteration.}
# \item{tol}{Passed to @see "stats::smooth.spline" (R >= 2.14.0).}
# \item{plotCurves}{If @TRUE, the fitted splines are added to the current
# plot, otherwise not.}
# }
#
# \value{
# Returns an object of class \code{smooth.spline}.
# }
#
# @examples "../incl/robustSmoothSpline.Rex"
#
# \seealso{
# The implementation of this function was adapted from
# @see "stats::smooth.spline" of the \pkg{stats} package.
# Because of this, this function is also licensed under GPL v2.
# }
#
# @author
#
# @keyword "smooth"
# @keyword "robust"
#*/############################################################################
setMethodS3("robustSmoothSpline", "default", function(x, y=NULL, w=NULL, ..., minIter=3, maxIter=max(minIter, 50), method=c("L1", "symmetric"),
sdCriteria=2e-4, reps=1e-15, tol=1e-6*IQR(x), plotCurves=FALSE) {
requireNamespace("stats") || throw("Package not loaded: stats"); # smooth.spline()
stats.smooth.spline <- smooth.spline;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Verify arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'w'
if (is.numeric(w)) {
w <- as.double(w);
if (anyMissing(w)) {
stop("Weights with value NA are not allowed.");
}
if (any(w < 0 | w > 1)) {
stop("Weights out of range [0,1]: ", paste(w[w < 0.0 | w > 1.0], collapse=", "));
}
} else if (!is.null(w)) {
stop("Argument 'w' is of an unsupported datatype/class: ", class(weights)[1]);
}
# Argument: 'method'
method <- match.arg(method)
# Argument: 'reps'
if (!is.numeric(reps) || length(reps) != 1 || reps <= 0)
throw("Argument 'reps' must be a single postive number.");
# smooth.spline() next will only operate on unique x-values. For this reason,
# we have to remove corresponding weights too. There is a small problem here;
# if different weights are used for data points (x,y,w) with same x-value, which
# data points (x,y,w) should be used? Here we use the first one only. /HB 2005-01-24
uIdxs <- .whichUnique(x, tol = tol);
nu <- length(uIdxs);
w0 <- w[uIdxs];
# WORKAROUND
# We need to make sure that 'g$x == x' below. /HB 2011-10-10
x <- x[uIdxs];
y <- y[uIdxs];
w <- w[uIdxs];
uIdxs <- seq_along(x);
if (inherits(x, "smooth.spline")) {
g <- x;
} else if (missing(w) || is.null(w)) {
x <- as.vector(x);
y <- as.vector(y);
g <- stats.smooth.spline(x, y, ..., tol=tol);
# Sanity check /HB 2011-10-10
stopifnot(length(g$x) == nu);
# Not needed anymore
x <- y <- NULL;
} else {
x <- as.vector(x);
y <- as.vector(y);
w <- as.vector(w);
g <- stats.smooth.spline(x, y, w=w, ..., tol=tol);
# Sanity check /HB 2011-10-10
stopifnot(length(g$x) == nu);
# Not needed anymore
x <- y <- w <- NULL;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Step 0. Initiation
#
# This will generate an object of class smooth.spline
# containing the fields
# x : the distinct `x' values in increasing order.
# y : the fitted values corresponding to `x'.
# yin : the y values used at the unique `y' values.
# From these the residuals can be calculated as
# r <- yin - y
# The important is that we use these (x,yin) as our
# (x,y) in the rest of the algorithm.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -
sdR0 <- NA_real_;
col <- 0;
ready <- FALSE;
iter <- 0;
while (!ready & iter < maxIter) {
iter <- iter + 1;
# Calculate the residuals and the weights
r <- (g$yin-g$y);
if (method == "L1") {
w <- 1/(abs(r)+reps); # Add a small constant for stability.
} else if (method == "symmetric") {
rmad <- mad(r);
threshold <- 6 * rmad; # same as lowess().
threshold <- max(reps, threshold); # avoid instability at very low MADs.
w <- (1 - pmin(1, abs(r)/threshold)^2)^2;
}
# If the user specified weights initially, the weights
# calculated from the inverse of the residuals are themselve
# weighted by the user initial weights.
if (!is.null(w0)) {
w <- w0*w;
}
sdR <- sd(r);
# Not needed anymore
r <- NULL;
if (iter > minIter) {
if (!is.na(sdR0)) {
dSd <- abs(sdR0-sdR);
if (dSd < sdCriteria)
break;
}
}
# Remove "bad" weights. For instance, too large w's gives:
# Error in smooth.spline(g$x, g$yin, w = w, ...) :
# NA/NaN/Inf in foreign function call (arg 4)
ok.weights <- (w != 0 & is.finite(w));
if (!all(ok.weights))
w[!ok.weights] <- 0;
# Not needed anymore
ok.weights <- NULL;
g <- stats.smooth.spline(g$x, g$yin, w=w, ..., tol=tol);
# Not needed anymore
w <- NULL;
if (plotCurves == TRUE)
lines(g, col=(col<-col+1));
sdR0 <- sdR;
} # while ( ... )
g$iter <- iter
g;
}) # robustSmoothSpline()
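# A minimal usage sketch of robustSmoothSpline() on simulated data with a few
# outliers (wrapped in 'if (FALSE)' so it is never evaluated when the package
# is installed; it only illustrates the call):
if (FALSE) {
  x <- runif(300, min=0, max=2*pi)
  y <- sin(x) + rnorm(300, sd=0.2)
  y[1:10] <- y[1:10] + 3                    # a few gross outliers
  fit <- robustSmoothSpline(x, y)           # iteratively down-weights the outliers
  plot(x, y); lines(fit, col="red", lwd=2)
}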
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Local functions
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.whichUnique <- function(x, ..., tol) {
# We need to make sure that 'g$x == x' below. /HB 2011-10-10
xx <- x;
keep <- rep(TRUE, times=length(x));
while (TRUE) {
idxs <- which(keep);
xx <- round((x[idxs] - mean(x[idxs]))/tol); # de-mean to avoid possible overflow
dups <- duplicated(xx);
if (!any(dups)) {
break;
}
keep[idxs[dups]] <- FALSE;
} # while()
nd <- keep;
# Sanity check
stopifnot(length(nd) == length(x));
which(nd);
} # .whichUnique()
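# A tiny sketch of what .whichUnique() returns: the indices of x values that
# are unique up to the tolerance 'tol' (illustration only; never evaluated):
if (FALSE) {
  .whichUnique(c(1, 1 + 1e-9, 2, 2, 3), tol=1e-6)  # -> c(1, 3, 5)
}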
######################################################################
# HISTORY
# 2015-01-06
# o Using requireNamespace() instead of require().
# 2014-03-25
# o CLEANUP: The internal .Fortran() calls no longer pass DUP=FALSE,
# which "may be disabled in future versions of R.".
# 2013-09-26
# o Now utilizing anyMissing().
# 2012-08-30
# o BUG FIX: Now local getNativeSplineFitFunction() sets up the
# function such that it is called via a FortranRoutine object,
# rather than by name.
# 2012-08-19
# o Added local getNativeSplineFitFunction() function to
# robustSmoothSpline() which returns a wrapper to a proper
# native and internal spline fit function of R.
# o Make it clear that robustSmoothSpline() is under GPL (>= 2),
# because it is adapted from smooth.spline() of R by R Core Team.
# Added a GPL source code header.
# 2011-10-10
# o Updated robustSmoothSpline() such that it works with the new
# "uniqueness" scheme of smooth.spline() in R v2.14.0 and newer.
# It is tricky, because robustSmoothSpline() is a reiterative
# algorithm which requires that the chosen "unique" x:s do
# not change in each iteration. Previously, 'signif(x, 6)' was
# used to identify unique x:s, which gives the same set of values
# when called twice, whereas this is not true for the new choice
# with 'round((x - mean(x))/tol)'.
# 2011-04-12
# o Now using as.double(NA) instead of NA, which is logical.
# o Interestingly, stats::smooth.spline() of R v2.14.0 now does
# very similar speedups as robustSmoothSpline() has done
# internally in its smooth.spline.fit() since 2002. Great.
# o CLEANUP: Now robustSmoothSpline() utilizes stats:::n.knots()
# internally, if running on R v2.14.0 or newer.
# 2008-07-20
# o MEMORY OPTIMIZATION: Removing more variables when done etc.
# Helping the garbage collector by doing x <- as.vector(x) before
# calling a function rather than having as.vector(x) as an argument.
# 2007-06-08
# o Added declaration 'nx <- 0' in robustSmoothSpline.matrix() in
# order to please R CMD check R v2.6.0.
# 2007-01-01
# o Removed any code to make method backward compatibility with
# R < 1.9.0, which was before 'modreg' was merged into 'stats'.
# 2005-06-03
# o Now making use of setMethodS3().
# o Renamed to robustSmoothSpline().
# o Copied from R.basic. At the same time, we speedup functions were made
# into local functions.
# 2005-01-24
# o Added support for weights.
# 2002-04-21
# o Updated due to modreg is merged into stats from R v1.9.0.
# 2002-03-02
# o SPEED UP: Since robust.smooth.spline() now makes use of
# the "home-made" smooth.spline.prepare() and smooth.spline0() it
# speed up about three times on my test data; 32secs -> 9secs.
# o Split smooth.spline() into the two functions
# smooth.spline.prepare() and smooth.spline.fit(). The purpose of
# this is to speed up robust.spline(), especially when there are
# duplicate x values!
# 2002-02-19
# o The idea of using w.org is not simple since the data points are
# reordered by smooth.spline.
# o Made w <- as.vector(w).
# 2002-02-18
# o Created the Rd comments with an example adapted from
# smooth.spline.
# o Made it possible to specify weights even in the robust estimation.
# o Added an assertion that the weights are valid and not too
# big.
# o Renamed to robust.smooth.spline() and made analogous to
# smooth.spline().
# 2002-02-15
# o Created. It seems like the robust spline algorithm gives pretty
# much the same result as lowess. If not, the differences are
# quite small compared to the noise level of cDNA microarray data.
######################################################################
aroma.light/R/plotXYCurve.R 0000644 0001750 0001750 00000012305 14136047216 015406 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric plotXYCurve
# @alias plotXYCurve.numeric
# @alias plotXYCurve.matrix
#
# @title "Plot the relationship between two variables as a smooth curve"
#
# \usage{
# @usage plotXYCurve,numeric
# @usage plotXYCurve,matrix
# }
#
# \description{
# @get "title".
# }
#
# \arguments{
# \item{x, y, X, Y}{Two @numeric @vectors of length N for one curve (K=1),
# or two @numeric NxK @matrix:es for K curves.}
# \item{col}{The color of each curve.
# Either a scalar specifying the same value of all curves,
# or a @vector of K curve-specific values.}
# \item{lwd}{The line width of each curve.
# Either a scalar specifying the same value of all curves,
# or a @vector of K curve-specific values.}
# \item{dlwd}{The width of each density curve.}
# \item{dcol}{The fill color of the interior of each density curve.}
# \item{xlim, ylim}{The x and y plotting limits.}
# \item{xlab, ylab}{The x and y labels.}
# \item{curveFit}{The @function used to fit each curve. The two first
# arguments of the function must take \code{x} and \code{y}, and the
# function must return a @list with fitted elements \code{x} and
# \code{y}.}
# \item{...}{Additional arguments passed to @see "graphics::lines"
# used to draw each curve.}
# \item{add}{If @TRUE, the graph is added to the current plot, otherwise
# a new plot is created.}
# }
#
# \value{
# Returns nothing.
# }
#
# \section{Missing values}{
# Data points (x,y) with non-finite values are excluded.
# }
#
# @author "HB"
#
# @keyword "nonparametric"
# @keyword "multivariate"
# @keyword "robust"
#*/###########################################################################
setMethodS3("plotXYCurve", "numeric", function(x, y, col=1L, lwd=2, dlwd=1, dcol=NA, xlim=NULL, ylim=xlim, xlab=NULL, ylab=NULL, curveFit=smooth.spline, ..., add=FALSE) {
if (is.null(xlab))
xlab <- substitute(x);
if (is.null(ylab))
ylab <- substitute(y);
# Exclude non-finite data points
ok <- (is.finite(x) & is.finite(y));
x <- x[ok];
y <- y[ok];
# Create empty plot?
if (!add) {
par(mar=c(5,4,4,5)+0.1);
suppressWarnings({
plot(NA, xlim=xlim, ylim=ylim, xlab="", ylab="", ..., axes=FALSE);
})
cex <- par("cex.lab")*par("cex");
mtext(xlab, side=1, line=3, cex=cex, col=par("col.lab"), font=par("font.lab"));
mtext(ylab, side=4, line=3, cex=cex, col=par("col.lab"), font=par("font.lab"));
abline(a=0, b=1, col="gray", lty=2L);
}
# Estimate and draw smooth function
suppressWarnings({
args <- list(x=x, y=y, ...);
keep <- intersect(names(args), names(formals(curveFit)));
args <- args[keep];
fit <- do.call(curveFit, args=args);
lines(fit, col=col, lwd=lwd, ...);
})
usr <- par("usr");
# Limit the density plot to the plot region and data range
rx <- range(x);
rx[1L] <- max(rx[1], usr[1L]);
rx[2L] <- min(rx[2], usr[2L]);
ry <- range(y);
ry[1L] <- max(ry[1], usr[3L]);
ry[2L] <- min(ry[2], usr[4L]);
# Estimate density of x
d <- density(x, from=rx[1L], to=rx[2L]);
n <- length(d$y);
d$y[c(1L,n)] <- 0;
d$y <- d$y / max(d$y, na.rm=TRUE);
dx <- d;
d$y <- 1/10*(usr[4L]-usr[3L])*d$y;
d$y <- usr[4L]+d$y;
polygon(d, col=dcol, border=col, lwd=dlwd, xpd=TRUE);
# Estimate density of y
d <- density(y, from=ry[1L], to=ry[2L]);
n <- length(d$y);
d$y[c(1L,n)] <- 0;
d$y <- d$y / max(d$y, na.rm=TRUE);
dy <- d;
t <- d$x; d$x <- d$y; d$y <- t;
d$x <- usr[1L]-1/10*(usr[2L]-usr[1L])*d$x;
polygon(d, col=dcol, border=col, lwd=dlwd, xpd=TRUE);
d <- dx;
t <- d$x; d$x <- d$y; d$y <- t;
d$x <- usr[1L]-1/10*(usr[2L]-usr[1L])*d$x;
lines(d, col="black", lwd=0.618*dlwd, lty=2, xpd=TRUE);
if (!add) {
axis(side=1L);
axis(side=4L);
box();
}
invisible(fit);
}) # plotXYCurve.numeric()
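# A small usage sketch on simulated data (never evaluated when the package is
# built). Note that 'xlim' should be given explicitly when a new plot is
# created:
if (FALSE) {
  x <- rnorm(2000)
  y <- x + 0.1*x^2 + rnorm(2000, sd=0.3)
  plotXYCurve(x, y, col="blue", xlim=c(-4,4))
}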
setMethodS3("plotXYCurve", "matrix", function(X, Y, col=seq_len(nrow(X)), lwd=2, dlwd=1, dcol=NA, xlim=NULL, ylim=xlim, xlab=NULL, ylab=NULL, curveFit=smooth.spline, ..., add=FALSE) {
if (!identical(dim(X), dim(Y))) {
throw("Argument 'X' and 'Y' have different dimensions.");
}
if (is.null(xlab))
xlab <- substitute(X);
if (is.null(ylab))
ylab <- substitute(Y);
ncol <- ncol(X);
if (is.null(col)) {
col <- seq_len(ncol);
} else {
col <- rep(col, length.out=ncol);
}
if (!is.null(lwd))
lwd <- rep(lwd, length.out=ncol);
if (is.null(xlim)) {
xlim <- range(X, na.rm=TRUE);
}
if (is.null(ylim)) {
ylim <- range(Y, na.rm=TRUE);
}
for (kk in seq_len(ncol)) {
plotXYCurve(X[,kk], Y[,kk], col=col[kk], lwd=lwd[kk], dlwd=dlwd, dcol=dcol, xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab, curveFit=curveFit, ..., add=add || (kk > 1L));
}
box();
invisible();
}) # plotXYCurve.matrix()
############################################################################
# HISTORY:
# 2013-10-08
# o BUG FIX: Argument 'lwd' of plotXYCurve(X, ...) was ignored if 'X'
# was a matrix.
# o DOCUMENTATION: Now there is one combined plotXYCurve() help pages
# for all data types.
# 2008-04-14
# o Replaced a R.utils::doCall() with a "cleanup" do.call().
# 2007-02-04
# o Created.
############################################################################
aroma.light/R/normalizeQuantileRank.R 0000644 0001750 0001750 00000013136 14136047216 017464 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric normalizeQuantileRank
# @alias normalizeQuantileRank.numeric
# @alias normalizeQuantileRank.list
# @alias normalizeQuantile
# @alias normalizeQuantile.default
#
# @title "Normalizes the empirical distribution of one of more samples to a target distribution"
#
# \usage{
# @usage normalizeQuantileRank,numeric
# @usage normalizeQuantileRank,list
# @usage normalizeQuantile,default
# }
#
# \description{
# @get "title".
#
# The average sample distribution is calculated either robustly or not
# by utilizing either \code{weightedMedian()} or \code{weighted.mean()}.
# A weighted method is used if any of the weights are different from one.
# }
#
# \arguments{
# \item{x, X}{a @numeric @vector of length N or a @list of length N
# with @numeric @vectors.
# If a @list, then the @vectors may be of different lengths.}
# \item{xTarget}{The target empirical distribution as a \emph{sorted}
# @numeric @vector of length \eqn{M}.
# If @NULL and \code{X} is a @list, then the target distribution is
# calculated as the average empirical distribution of the samples.}
# \item{ties}{Should ties in \code{x} be treated with care or not?
# For more details, see "limma::normalizeQuantiles".}
# \item{...}{Not used.}
# }
#
# \value{
# Returns an object of the same shape as the input argument.
# }
#
# \section{Missing values}{
# Missing values are excluded when estimating the "common" (the baseline).
# Values that are @NA remain @NA after normalization.
# No new @NAs are introduced.
# }
#
# \section{Weights}{
# Currently only channel weights are supported due to the way quantile
# normalization is done.
# If signal weights are given, channel weights are calculated from these
# by taking the mean of the signal weights in each channel.
# }
#
# @examples "../incl/normalizeQuantileRank.list.Rex"
#
# \author{
# Adopted from Gordon Smyth (\url{http://www.statsci.org/}) in 2002 \& 2006.
# Original code by Ben Bolstad at Statistics Department, University of
# California.
# }
#
# \seealso{
# To calculate a target distribution from a set of samples, see
# @see "averageQuantile".
# For an alternative empirical-density normalization method, see
# @see "normalizeQuantileSpline".
# }
#
# @keyword "nonparametric"
# @keyword "multivariate"
# @keyword "robust"
#*/###########################################################################
setMethodS3("normalizeQuantileRank", "list", function(X, xTarget=NULL, ...) {
# Get the target quantile for all channels (columns)?
if (is.null(xTarget))
xTarget <- averageQuantile(X);
# Normalizes the data
nTarget <- length(xTarget);
X <- lapply(X, FUN=function(x) {
normalizeQuantileRank(x, xTarget=xTarget, ...);
})
X;
})
setMethodS3("normalizeQuantileRank", "numeric", function(x, xTarget, ties=FALSE, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
n <- length(x);
# Argument 'xTarget':
if (!is.numeric(xTarget)) {
throw("Argument 'xTarget' is not numeric: ", mode(xTarget));
}
nTarget <- length(xTarget);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Different length of sample and target?
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
nDiff <- (nTarget - n);
if (nDiff > 0L) {
# Ad hoc fix for differences in lengths.
naValue <- NA;
storage.mode(naValue) <- storage.mode(x);
x <- c(x, rep(naValue, times=nDiff));
n <- n + nDiff;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# For all columns, get for each sample quantile the value of
# average sample distribution at that quantile.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
quantiles <- (0:(nTarget-1))/(nTarget-1);
ok <- !is.na(x);
nok <- sum(ok);
if(nok < n) {
# Get the sample quantiles for those values
if (ties) {
r <- rank(x[ok]);
xNew <- (r-1)/(nok-1);
} else {
xNew <- (0:(nok-1))/(nok-1);
}
# Interpolate to get the xTarget's at positions specified by
# 'quantile' using data points given by 'xNew' and 'xTarget'.
if (!ties) {
# Order and sort the values
ok <- ((1:n)[ok])[order(x[ok])];
}
x[ok] <- approx(x=quantiles, y=xTarget, xout=xNew, ties="ordered")$y;
if (nDiff > 0L) {
x <- x[1:(n-nDiff)];
}
} else {
if (ties || n != nTarget) {
r <- rank(x);
xNew <- (r-1)/(n-1);
x <- approx(x=quantiles, y=xTarget, xout=xNew, ties="ordered")$y;
} else {
ok <- order(x);
x[ok] <- xTarget;
}
}
x;
})
setMethodS3("normalizeQuantile", "default", function(x, ...) {
normalizeQuantileRank(x, ...);
})
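# A small sketch on simulated data (never evaluated when the package is
# built): normalize samples of different lengths toward their average
# distribution, or toward an explicitly given sorted target distribution.
if (FALSE) {
  X <- list(a=rexp(1000), b=0.5 + 0.8*rexp(2000))
  Xn <- normalizeQuantileRank(X)                  # target = averageQuantile(X)
  xn <- normalizeQuantileRank(rexp(500), xTarget=sort(rnorm(1000)))
}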
##############################################################################
# HISTORY:
# 2013-10-07
# o DOCUMENTATION: Merged the documentation for normalizeQuantileRank()
# for numeric vectors and lists.
# 2011-04-12
# o Now using NAs of the correct storage type.
# 2008-04-14
# o Renamed normalizeQuantile() to normalizeQuantileRank(). Keeping the old
# name for backward compatibility.
# 2006-05-21
# o Now 'x' and 'xTarget' may be of different lengths.
# 2006-05-15
# o Now the method can normalize vectors of length different from 'xTarget'.
# 2006-05-12
# o Created from normalizeQuantile.matrix.R. It has been optimized for
# memory. Hence, the normalization is done using a two-pass procedure.
##############################################################################
aroma.light/R/pairedAlleleSpecificCopyNumbers.R 0000644 0001750 0001750 00000011770 14136047216 021367 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric pairedAlleleSpecificCopyNumbers
# @alias pairedAlleleSpecificCopyNumbers.numeric
#
# @title "Calculating tumor-normal paired allele-specific copy number stratified on genotypes"
#
# \description{
# @get "title".
# The method is a single-sample (single-pair) method.
# It requires paired tumor-normal parent-specific copy number signals.
# }
#
# \usage{
# @usage pairedAlleleSpecificCopyNumbers,numeric
# }
#
# \arguments{
# \item{thetaT, betaT}{Total and allele-B fraction signals for the tumor.}
# \item{thetaN, betaN}{Total and allele-B fraction signals for the
# matched normal.}
# \item{muN}{An optional @vector of length J containing
# normal genotype calls in (0,1/2,1,@NA) for (AA,AB,BB).}
# \item{...}{Not used.}
# }
#
# \value{
# Returns a @data.frame with elements \code{CT}, \code{betaT} and \code{muN}.
# }
#
# \seealso{
# This definition of calculating tumor-normal paired ASCN is related
# to how the @see "normalizeTumorBoost" method calculates normalized
# tumor BAFs.
# }
#
# @author "PN, HB"
#*/###########################################################################
setMethodS3("pairedAlleleSpecificCopyNumbers", "numeric", function(thetaT, betaT, thetaN, betaN, muN=callNaiveGenotypes(betaN), ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'thetaT' & 'betaT':
thetaT <- as.numeric(thetaT);
betaT <- as.numeric(betaT);
J <- length(thetaT);
if (length(betaT) != J) {
stop("The length of arguments 'betaT' and 'thetaT' differ: ", length(betaT), " != ", J);
}
# Argument: 'thetaN' & 'betaN':
thetaN <- as.numeric(thetaN);
betaN <- as.numeric(betaN);
if (length(thetaN) != J) {
stop("The length of arguments 'thetaN' and 'thetaT' differ: ", length(thetaN), " != ", J);
}
if (length(betaN) != J) {
stop("The length of arguments 'betaN' and 'thetaN' differ: ", length(betaN), " != ", J);
}
# Argument: 'muN':
if (length(muN) != J) {
stop("The length of arguments 'muN' and 'betaN' differ: ", length(muN), " != ", J);
}
knownGenotypes <- c(0, 1/2, 1, NA);
unknown <- which(!is.element(muN, knownGenotypes));
n <- length(unknown);
if (n > 0L) {
unknown <- unique(muN[unknown]);
stop("Argument 'muN' contains unknown values: ", hpaste(unknown));
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calculate (thetaA,thetaB) for tumor and normal (for SNP only)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# SNPs are identified as those loci that have non-missing 'betaT' & 'muN'
isSnp <- (!is.na(betaT) & !is.na(muN));
nbrOfSnps <- sum(isSnp);
thetaTs <- thetaT[isSnp] * matrix(c(1-betaT[isSnp], betaT[isSnp]), ncol=2L);
stopifnot(nrow(thetaTs) == nbrOfSnps);
thetaNs <- thetaN[isSnp] * matrix(c(1-betaN[isSnp], betaN[isSnp]), ncol=2L);
stopifnot(nrow(thetaNs) == nbrOfSnps);
muNs <- muN[isSnp];
stopifnot(length(muNs) == nbrOfSnps);
isHomAs <- (muNs == 0);
isHomBs <- (muNs == 1);
stopifnot(length(isHomAs) == nbrOfSnps);
stopifnot(length(isHomBs) == nbrOfSnps);
muNs <- NULL; # Not needed anymore
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calculate tumor (CA,CB) conditioned on genotype (for SNP only)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CTs <- thetaTs / thetaNs;
CTs[isHomAs,1L] <- 2*CTs[isHomAs,1L];
CTs[isHomAs,2L] <- 0;
CTs[isHomBs,1L] <- 0;
CTs[isHomBs,2L] <- 2*CTs[isHomBs,2L];
thetaTs <- thetaNs <- isHomAs <- isHomBs <- NULL; # Not needed anymore
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Translate (CA,CB) to (CT,betaT)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CT <- betaT <- rep(NA_real_, times=J);
# Total CN ratios
CT[isSnp] <- CTs[,1L] + CTs[,2L];
CT[!isSnp] <- 2 * thetaT[!isSnp] / thetaN[!isSnp];
# Tumor BAFs
betaT[isSnp] <- CTs[,2L] / CT[isSnp];
CTs <- isSnp <- NULL; # Not needed anymore
# Sanity checks
stopifnot(length(CT) == J);
stopifnot(length(betaT) == J);
stopifnot(length(muN) == J);
# Return value
data <- data.frame(CT=CT, betaT=betaT, muN=muN);
data;
}) # pairedAlleleSpecificCopyNumbers()
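# A small sketch on simulated tumor-normal data (never evaluated when the
# package is built). Genotypes 'muN' are passed explicitly here, so the
# default callNaiveGenotypes() is not used:
if (FALSE) {
  J <- 1000
  muN <- sample(c(0, 1/2, 1), size=J, replace=TRUE)
  thetaN <- rexp(J, rate=1/2000)
  betaN <- muN + rnorm(J, sd=0.02)
  thetaT <- 1.3 * thetaN
  betaT <- muN + rnorm(J, sd=0.05)
  fit <- pairedAlleleSpecificCopyNumbers(thetaT, betaT, thetaN, betaN, muN=muN)
  str(fit)   # data.frame with columns CT, betaT and muN
}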
##############################################################################
# HISTORY:
# 2014-03-30 [HB in Juvisy]
# o Created from PN's description just to be on the same page. PN has argued
#   for this way of calculating ASCNs since our TumorBoost days (~2009).
# PN has a high-level implementation elsewhere, but HB decided to do this
# from scratch to get a low-level API similar to that of TumorBoost.
##############################################################################
aroma.light/R/plotDensity.R 0000644 0001750 0001750 00000015303 14136047216 015461 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric plotDensity
# @alias plotDensity.list
# @alias plotDensity.data.frame
# @alias plotDensity.matrix
# @alias plotDensity.numeric
# @alias plotDensity.density
#
# @title "Plots density distributions for a set of vectors"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage plotDensity,data.frame
# @usage plotDensity,matrix
# @usage plotDensity,numeric
# @usage plotDensity,list
# }
#
# \arguments{
# \item{X}{A single or @list of @numeric @vectors or @see "stats::density"
# objects, a @numeric @matrix, or a @numeric @data.frame.}
# \item{W}{(optional) weights of similar data types and dimensions as
# \code{X}.}
# \item{xlim,ylim}{@numeric @vector of length 2. The x and y limits.}
# \item{xlab,ylab}{@character string for labels on x and y axis.}
# \item{col}{The color(s) of the curves.}
# \item{lty}{The types of curves.}
# \item{lwd}{The width of curves.}
# \item{...}{Additional arguments passed to @see "stats::density",
# @see "graphics::plot", and @see "graphics::lines".}
# \item{add}{If @TRUE, the curves are plotted in the current plot,
# otherwise a new plot is created.}
# }
#
# \seealso{
# Internally, @see "stats::density" is used to estimate the
# empirical density.
# }
#
# @author "HB"
#*/#########################################################################
setMethodS3("plotDensity", "list", function(X, W=NULL, xlim=NULL, ylim=NULL, xlab=NULL, ylab="density (integrates to one)", col=1:length(X), lty=NULL, lwd=NULL, ..., add=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'X':
nbrOfSamples <- length(X);
# Argument 'W':
if (is.numeric(W)) {
nX <- sapply(X, FUN=length);
if (any(nX != nX[1L])) {
throw("If argument 'W' is a numeric vector or matrix, then all vectors of 'X' must of identical lengths, which is not the case.");
}
nX <- nX[1L];
if (is.vector(W)) {
nW <- length(W);
if (nW != nX) {
throw("Length of argument 'W' and the length of the elements of 'X' does not match: ", nW, " != ", nX);
}
# Coerce into a list of weights of the same number of elements as 'X'
W <- rep(list(W), times=nbrOfSamples);
} else if (is.matrix(W)) {
nW <- nrow(W);
if (nW != nX) {
throw("Number of rows of argument 'W' and the length of the elements of 'X' does not match: ", nW, " != ", nX);
}
# Coerce into a list of weights of the same number of elements as 'X'
Wx <- vector("list", length=ncol(W));
for (kk in 1:ncol(W)) Wx[[kk]] <- W[,kk,drop=TRUE];
W <- Wx;
Wx <- NULL; # Not needed anymore
}
} # if (is.numeric(W))
if (is.list(W)) {
if (length(W) != nbrOfSamples) {
throw("The lists of argument 'W' and 'X' do not have the same number of elements: ", length(W), " != ", nbrOfSamples);
}
for (kk in 1:nbrOfSamples) {
w <- W[[kk]];
nW <- length(w);
nX <- length(X[[kk]]);
if (nW != nX) {
throw(sprintf("Element #%d of arguments 'W' and 'X' are of different lengths: %d != %d", kk, nW, nX));
}
if (any(w < 0)) throw("Argument 'W' contains negative weights.");
w <- nW <- nX <- NULL; # Not needed anymore
}
} else if (!is.null(W)) {
throw("Argument 'W' must be a list, a numeric vector, or a numeric matrix: ", class(W)[1L]);
}
# Argument 'xlab':
if (is.null(xlab))
xlab <- substitute(X);
# Argument 'col':
if (is.null(col)) {
col <- seq_len(nbrOfSamples);
} else {
col <- rep(col, length.out=nbrOfSamples);
}
# Argument 'lty':
if (!is.null(lty))
lty <- rep(lty, length.out=nbrOfSamples);
# Argument 'lwd':
if (!is.null(lwd))
lwd <- rep(lwd, length.out=nbrOfSamples);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Generate all densities first and figure out the plot limits.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ds <- list();
xlimDef <- c(NA_real_,NA_real_);
ylimDef <- c(0,NA_real_);
for(kk in 1:nbrOfSamples) {
x <- X[[kk]];
if (inherits(x, "density")) {
d <- x;
} else {
w <- W[[kk]];
if (is.null(w)) {
keep <- is.finite(x);
x <- x[keep];
keep <- NULL; # Not needed anymore
suppressWarnings({
d <- density(x, ...);
});
x <- NULL; # Not needed anymore
} else {
keep <- is.finite(x) & is.finite(w);
x <- x[keep];
w <- w[keep];
keep <- NULL; # Not needed anymore
# Standardize to sum(w) == 1
w <- w / sum(w);
suppressWarnings({
d <- density(x, weights=w, ...);
});
x <- w <- NULL; # Not needed anymore
}
}
ds[[kk]] <- d;
xlimDef <- range(c(xlimDef, d$x), na.rm=TRUE);
ylimDef <- range(c(ylimDef, d$y), na.rm=TRUE);
} # for (kk ...)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Plot the densities
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (is.null(xlim)) {
xlim <- xlimDef;
} else {
for (kk in 1:2) if (is.na(xlim[kk])) xlim[kk] <- xlimDef[kk]
}
if (is.null(ylim)) {
ylim <- ylimDef;
} else {
for (kk in 1:2) if (is.na(ylim[kk])) ylim[kk] <- ylimDef[kk]
}
if (add == FALSE) {
suppressWarnings({
plot(NA, xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab, ...);
})
}
for(kk in 1:nbrOfSamples) {
suppressWarnings({
lines(ds[[kk]], col=col[kk], lty=lty[kk], lwd=lwd[kk], ...);
})
}
invisible(ds);
}) # plotDensity()
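# A small usage sketch on simulated data (wrapped in 'if (FALSE)' so it is
# never evaluated when the package is built):
if (FALSE) {
  X <- list(A=rnorm(1000, mean=0), B=rnorm(1000, mean=2, sd=2))
  plotDensity(X, col=c("blue", "red"), lwd=2)
}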
setMethodS3("plotDensity", "data.frame", function(X, ..., xlab=NULL) {
# Argument 'xlab':
if (is.null(xlab))
xlab <- substitute(X);
plotDensity(as.list(X), ..., xlab=xlab);
})
setMethodS3("plotDensity", "matrix", function(X, ..., xlab=NULL) {
# Argument 'xlab':
if (is.null(xlab))
xlab <- substitute(X);
plotDensity(as.data.frame(X), ..., xlab=xlab);
})
setMethodS3("plotDensity", "numeric", function(X, ..., xlab=NULL) {
# Argument 'xlab':
if (is.null(xlab))
xlab <- substitute(X);
plotDensity(list(X), ..., xlab=xlab);
})
setMethodS3("plotDensity", "density", function(X, ..., xlab=NULL) {
# Argument 'xlab':
if (is.null(xlab))
xlab <- substitute(X);
plotDensity(list(X), ..., xlab=xlab);
})
##############################################################################
# HISTORY:
# 2014-03-25
# o Now plotDensity() supports weights via argument 'W'.
# o Now plotDensity() also supports density() objects.
# 2006-05-12
# o Created.
##############################################################################
aroma.light/R/901.CalibrationAndNormalization.R 0000644 0001750 0001750 00000023330 14136047216 021073 0 ustar nilesh nilesh #########################################################################/**
# @RdocDocumentation "1. Calibration and Normalization"
#
# \encoding{latin1}
#
# \description{
# In this section we give \emph{our} recommendation on how spotted
# two-color (or multi-color) microarray data is best calibrated and
# normalized.
# }
#
# \section{Classical background subtraction}{
# We do \emph{not} recommend background subtraction in classical
# means where background is estimated by various image analysis
# methods. This means that we will only consider foreground signals
# in the analysis.
#
# We estimate "background" by other means. In what is explain below,
# only a global background, that is, a global bias, is estimated
# and removed.
# }
#
# \section{Multiscan calibration}{
# In Bengtsson et al (2004) we give evidence that microarray scanners
# can introduce a significant bias in data. This bias, which is
# about 15-25 out of 65535, \emph{will} introduce intensity dependency
# in the log-ratios, as explained in Bengtsson &
# \enc{Hssjer}{Hossjer} (2006).
#
# In Bengtsson et al (2004) we find that this bias is stable across
# arrays (and a couple of months), but further research is needed
# in order to tell if this is true over a longer time period.
#
# To calibrate signals for scanner biases, scan the same array
# at three or more (K >= 3) different PMT settings
# (preferably in decreasing order).
# While doing this, \emph{do not adjust the laser power settings}.
# Also, do the multiscan \emph{without} washing, cleaning or by other
# means changing the array between subsequent scans.
# Although not necessary, it is preferred that the array
# remains in the scanner between subsequent scans. This will simplify
# the image analysis since spot identification can be made once
# if images aligns perfectly.
#
# After image analysis, read all K scans for the same array into the
# two matrices, one for the red and one for the green channel, where
# the K columns corresponds to scans and the N rows to the spots.
# It is enough to use foreground signals.
#
# In order to multiscan calibrate the data, for each channel
# separately call \code{Xc <- calibrateMultiscan(X)} where \code{X}
# is the NxK matrix of signals for one channel across all scans. The
# calibrated signals are returned in the Nx1 matrix \code{Xc}.
#
# Multiscan calibration may sometimes be skipped, especially if affine
# normalization is applied immediately after, but we do recommend that
# every lab check at least once whether their scanner introduces bias.
# If the offsets in a scanner are already estimated from earlier
# multiscan analyses, or known by other means, they can readily be
# subtracted from the signals of each channel. If arrays are still
# multiscanned, it is possible to force the calibration method to
# fit the model with zero intercept (assuming the scanner offsets
# have been subtracted) by adding argument \code{center=FALSE}.
# }
#
# \section{Affine normalization}{
# In Bengtsson & \enc{Hssjer}{Hossjer} (2006), we carry out a detailed
# study on how biases in each channel introduce so called
# intensity-dependent log-ratios among other systematic artifacts.
# Data with (additive) bias in each channel is said to be \emph{affinely}
# transformed. Data without such bias is said to be \emph{linearly}
# (proportionally) transformed. Ideally, observed signals (data) are a
# linear (proportional) function of the true gene expression levels.
#
# We do \emph{not} assume proportional observations. The scanner bias
# is real evidence that assuming linearity is not correct.
# Affine normalization corrects for affine transformation in data.
# Without control spots it is not possible to estimate the bias in each
# of the channels but only the relative bias such that after
# normalization the effective bias are the same in all channels.
# This is why we call it normalization and not calibration.
#
# In its simplest form, affine normalization is done by
# \code{Xn <- normalizeAffine(X)} where \code{X} is a Nx2 matrix with
# the first column holds the foreground signals from the red channel and
# the second holds the signals from the green channel. If three- or
# four-channel data is used these are added the same way. The normalized
# data is returned as a Nx2 matrix \code{Xn}.
#
# To normalize all arrays and all channels at once, one may put all
# data into one big NxK matrix where the K columns hold the all channels
# from the first array, then all channels from the second array and so
# on. Then \code{Xn <- normalizeAffine(X)} will return the across-array
# and across-channel normalized data in the NxK matrix \code{Xn} where
# the columns are stored in the same order as in matrix \code{X}.
#
# Equal effective bias in all channels is much better. First of all,
# any intensity-dependent bias in the log-ratios is removed \emph{for
# all non-differentially expressed genes}. There is still an
# intensity-dependent bias in the log-ratios for differentially expressed
# genes, but this is now symmetric around log-ratio zero.
#
# Affine normalization will (by default and recommended) normalize
# \emph{all} arrays together and at once. This will guarantee that
# all arrays are "on the same scale". Thus, it \emph{not} recommended
# to apply a classical between-array scale normalization afterward.
# Moreover, the average log-ratio will be zero after an affine
# normalization.
#
# Note that an affine normalization will only remove curvature in the
# log-ratios at lower intensities.
# If a strong intensity-dependent bias at high intensities remains,
# this is most likely due to saturation effects, such as too high PMT
# settings or quenching.
#
# Note that for a perfect affine normalization you \emph{should}
# expect much higher noise levels in the \emph{log-ratios} at lower
# intensities than at higher. It should also be approximately
# symmetric around zero log-ratio.
# In other words, \emph{a strong fanning effect is a good sign}.
#
# Due to different noise levels in red and green channels, different
# PMT settings in different channels, plus the fact that the
# minimum signal is zero, "odd shapes" may be seen in the log-ratio
# vs log-intensity graphs at lower intensities. Typically, these
# show themselves as non-symmetric in positive and negative log-ratios.
# Note that you should not see this at higher intensities.
#
# If there is a strong intensity-dependent effect left after the
# affine normalization, we recommend, for now, that a subsequent
# curve-fit or quantile normalization is done.
# Which one, we do not know.
#
# Why negative signals?
# By default, 5\% of the normalized signals will have a non-positive
# signal in one or both channels. \emph{This is on purpose}, although
# the exact number 5\% is chosen by experience. The reason for
# introducing negative signals is that they are indeed expected.
# For instance, when measuring a zero gene expression level, there is
# a chance that the observed value is (should be) negative due to
# measurement noise. (For this reason it is possible that the scanner
# manufacturers have introduced scanner bias on purpose to avoid
# negative signals, which then all would be truncated to zero.)
# To adjust the ratio (or number) of negative signals allowed, use
# for example \code{normalizeAffine(X, constraint=0.01)} for 1\%
# negative signals. If set to zero (or \code{"max"}) only as much
# bias is removed such that no negative signals exist afterward.
# Note that this is also true if there were negative signals on
# beforehand.
#
# Why not lowess normalization?
# Curve-fit normalization methods such as lowess normalization are
# basically designed based on linearity assumptions and will for this
# reason not correct for channel biases. Curve-fit normalization
# methods can by definition only be applied to one pair of channels
# at the time and do therefore require a subsequent between-array
# scale normalization, which is by the way very ad hoc.
#
# Why not quantile normalization?
# Affine normalization can be thought of as a special case of quantile
# normalization that is more robust than the latter.
# See Bengtsson & \enc{Hssjer}{Hossjer} (2006) for details.
# Quantile normalization is probably better to apply than curve-fit
# normalization methods, but less robust than affine normalization,
# especially at extreme (low and high) intensities.
# For this reason, we do recommend to use affine normalization first,
# and if this is not satisfactory, quantile normalization may be applied.
# }
#
# \section{Linear (proportional) normalization}{
# If the channel offsets are zero, already corrected for, or estimated
# by other means, it is possible to normalize the data robustly by
# fitting the above affine model without intercept, that is, fitting
# a truly linear model. This is done adding argument \code{center=FALSE}
# when calling \code{normalizeAffine()}.
# }
#
# @author "HB"
#*/#########################################################################
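# A minimal sketch of the workflow described above. The matrices 'XR' and
# 'XG' are placeholders for NxK matrices of foreground signals from K scans
# of the red and green channel, respectively; the block is wrapped in
# 'if (FALSE)' so it is never evaluated when the package is built.
if (FALSE) {
  XcR <- calibrateMultiscan(XR)   # multiscan calibration, red channel
  XcG <- calibrateMultiscan(XG)   # multiscan calibration, green channel
  X <- cbind(XcR, XcG)            # one column per channel
  Xn <- normalizeAffine(X)        # affine normalization across channels
}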
############################################################################
# HISTORY:
# 2011-02-05
# o DOCUMENTATION: Added paragraphs on how to do affine normalization
# when channel offsets are known/zero. Same for multiscan calibration
# when scanner offsets are known/zero.
# 2006-06-29
# o Added to aroma.light instead.
# 2005-02-02
# o Created.
############################################################################
aroma.light/R/normalizeAverage.R 0000644 0001750 0001750 00000004754 14136047216 016446 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric normalizeAverage
# @alias normalizeAverage.list
# @alias normalizeAverage.matrix
#
# @title "Rescales channel vectors to get the same average"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage normalizeAverage,matrix
# @usage normalizeAverage,list
# }
#
# \arguments{
# \item{x}{A @numeric NxK @matrix (or @list of length K).}
# \item{baseline}{An @integer in [1,K] specifying which channel should be
# the baseline.}
# \item{avg}{A @function for calculating the average of one channel.}
# \item{targetAvg}{The average that each channel should have afterwards.
# If @NULL, the baseline column sets the target average.}
# \item{...}{Additional arguments passed to the \code{avg} @function.}
# }
#
# \value{
# Returns a normalized @numeric NxK @matrix (or @list of length K).
# }
#
# @author "HB"
#*/###########################################################################
setMethodS3("normalizeAverage", "matrix", function(x, baseline=1, avg=stats::median, targetAvg=2200, ...) {
# Estimate the scale for each channel
scale <- apply(x, MARGIN=2, FUN=avg, ...);
# The scale of the baseline column
scale1 <- scale[baseline];
# Standardize so that the 'baseline' column is not rescaled (has scale one).
scale <- scale / scale1;
# Rescale to target averages?
if (!is.null(targetAvg)) {
rho <- (scale1 / targetAvg);
scale <- rho * scale;
}
# Rescale so that all channels have the same scale
for (cc in 1:ncol(x)) {
x[,cc] <- x[,cc] / scale[cc];
}
x;
}, private=TRUE)
setMethodS3("normalizeAverage", "list", function(x, baseline=1, avg=stats::median, targetAvg=2200, ...) {
# Estimate the scale for each channel
scale <- lapply(x, FUN=avg, ...);
scale <- unlist(scale, use.names=FALSE);
scale1 <- scale[baseline];
# Standardize so that the 'baseline' channel has scale one.
scale <- scale / scale1;
# Rescale to target averages?
if (!is.null(targetAvg)) {
rho <- (scale1 / targetAvg);
scale <- rho * scale;
}
# Rescale so that all channels have the same scale
for (cc in 1:length(x)) {
x[[cc]] <- x[[cc]] / scale[cc];
}
x;
}, private=TRUE)
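# A small sketch on simulated data (never evaluated when the package is
# built): after normalization, every column has the default target median
# of 2200.
if (FALSE) {
  X <- cbind(a=rexp(1000, rate=1/1000), b=rexp(1000, rate=1/3000))
  Xn <- normalizeAverage(X)
  apply(Xn, MARGIN=2, FUN=median)   # both approximately 2200
}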
############################################################################
# HISTORY:
# 2007-06-04
# o Corrected minor ineffective typo in code.
# 2007-03-29
# o Added Rdoc comments.
# 2006-05-08
# o Created.
############################################################################
aroma.light/R/print.SmoothSplineLikelihood.R 0000644 0001750 0001750 00000003322 14136047216 020724 0 ustar nilesh nilesh ###########################################################################/**
# @class SmoothSplineLikelihood
# @RdocMethod print
#
# @title "Prints an SmoothSplineLikelihood object"
#
# \description{
# @get "title". A SmoothSplineLikelihood object is returned by
# \code{\link{likelihood.smooth.spline}()}.
# }
#
# \usage{
# @usage print,SmoothSplineLikelihood
# }
#
# \arguments{
# \item{x}{Object to be printed.}
# \item{digits}{Minimal number of significant digits to print.}
# \item{...}{Not used.}
# }
#
# \value{
# Returns nothing.
# }
#
# @author "HB"
#
# @keyword internal
#*/###########################################################################
setMethodS3("print", "SmoothSplineLikelihood", function(x, digits=getOption("digits"), ...) {
# To please R CMD check...
object <- x;
s <- paste("Likelihood of smoothing spline:", format(object, digits=digits), "\n");
base <- attr(object, "base");
s <- paste(s, "Log base:", format(base, digits=digits), "\n")
wrss <- attr(object, "wrss");
s <- paste(s, "Weighted residuals sum of square:", format(wrss, digits=digits), "\n");
penalty <- attr(object, "penalty");
s <- paste(s, "Penalty:", format(penalty, digits=digits), "\n");
lambda <- attr(object, "lambda");
s <- paste(s, "Smoothing parameter lambda:", format(lambda, digits=digits), "\n");
roughness <- attr(object, "roughness");
s <- paste(s, "Roughness score:", format(roughness, digits=digits), "\n");
cat(s);
invisible(object);
})
############################################################################
# HISTORY:
# 2005-06-03
# o Added Rdoc comments.
# o Extracted from likelihood.smooth.spline.R.
############################################################################
aroma.light/R/likelihood.smooth.spline.R 0000644 0001750 0001750 00000013206 14136047216 020067 0 ustar nilesh nilesh ###########################################################################/**
# @class smooth.spline
# @RdocMethod likelihood
#
# @title "Calculate the log likelihood of a smoothing spline given the data"
#
# @synopsis
#
# \arguments{
# \item{object}{The smooth.spline object.}
# \item{x, y}{The x and y values for which the (weighted) likelihood will
# be calculated. If \code{x} is of type \code{xy.coords} any value of
# argument \code{y} will be omitted. If \code{x==NULL}, the x and y values
# of the smoothing spline will be used.}
# \item{w}{The weights for which the (weighted) likelihood will be
# calculated. If @NULL, weights equal to one are assumed.}
# \item{base}{The base of the logarithm of the likelihood. If @NULL,
# the non-logged likelihood is returned.}
# \item{rel.tol}{The relative tolerance used in the call to
# \code{integrate}.}
# \item{...}{Not used.}
# }
#
# \description{
# Calculate the (log) likelihood of a spline given the data used to fit
# the spline, \eqn{g}. The likelihood consists of two main parts:
# 1) (weighted) residuals sum of squares, and 2) a penalty term. The
# penalty term consists of a \emph{smoothing parameter} \eqn{lambda}
# and a \emph{roughness measure} of the spline
# \eqn{J(g) = \int g''(t) dt}. Hence, the overall log likelihood is
# \deqn{\log L(g|x) = (y-g(x))'W(y-g(x)) + \lambda J(g)}
# In addition to the overall likelihood, all its separate
# components are also returned.
#
# Note: when fitting a smooth spline with \eqn{(x,y)} values where the
# \eqn{x}'s are \emph{not} unique, \code{smooth.spline} will replace
# such \eqn{(x,y)}'s with a new pair \eqn{(x,y')} where \eqn{y'} is a
# reweighted average on the original \eqn{y}'s. It is important to
# be aware of this. In such cases, the resulting \code{smooth.spline}
# object does \emph{not} contain all \eqn{(x,y)}'s and therefore this
# function will not calculate the weighted residuals sum of square on
# the original data set, but on the data set with unique \eqn{x}'s.
# See examples below how to calculate the likelihood for the spline with
# the original data.
# }
#
# \value{
# Returns the overall (log) likelihood of class
# \code{SmoothSplineLikelihood}, a class with the following attributes:
# \item{wrss}{the (weighted) residual sum of square}
# \item{penalty}{the penalty which is equal to \code{-lambda*roughness}.}
# \item{lambda}{the smoothing parameter}
# \item{roughness}{the value of the roughness functional given the
# specific smoothing spline and the range of data}
# }
#
# \details{
# The roughness penalty for the smoothing spline, \eqn{g}, fitted
# from data in the interval \eqn{[a,b]} is defined as
# \deqn{J(g) = \int_a^b g''(t) dt}
# which is the same as
# \deqn{J(g) = g'(b) - g'(a)}
# The latter is calculated internally by using
# @see "stats::predict.smooth.spline".
# }
#
# @examples "../incl/likelihood.smooth.spline.Rex"
#
# \seealso{
# @see "stats::smooth.spline" and @see "robustSmoothSpline".
# }
#
# @author
#
# @keyword smooth
# @keyword internal
#*/###########################################################################
setMethodS3("likelihood", "smooth.spline", function(object, x=NULL, y=NULL, w=NULL, base=exp(1), rel.tol=.Machine$double.eps^(1/8), ...) {
requireNamespace("stats") || throw("Package not loaded: stats")
smooth.spline <- stats::smooth.spline
g <- object;
if (is.null(x)) {
x <- g$x;
y <- g$yin;
w <- g$w;
yg <- g$y;
} else {
xy <- xy.coords(x, y);
if (is.element("w", names(x)))
w <- x$w;
x <- xy$x;
y <- xy$y;
if (is.null(w))
w <- rep(1, times=length(x));
yg <- NULL;
ok <- (!is.na(x) & !is.na(y) & !is.na(w));
if (any(ok == FALSE)) {
x <- x[ok];
y <- y[ok];
w <- w[ok];
}
}
# The weighted residuals sum of square
if (is.null(yg))
yg <- predict(g, x)$y;
wrss <- sum(w * (y-yg)^2);
# The smoothing parameter
lambda <- g$lambda
# The roughness score J(g) = \int_a^b (g''(t))^2 dt
gbis <- smooth.spline(predict(g, x, deriv=2));
ab <- range(x, na.rm=TRUE);
Jg <- integrate(function(x) predict(gbis, x=x)$y,
lower=ab[1], upper=ab[2],
rel.tol=rel.tol, stop.on.error=FALSE)$value
# The penalty term
penalty <- -lambda * Jg;
# The log likelihood
l <- -(wrss + penalty);
# Return the correct logarithm (if any)
if (missing(base) || (!is.null(base) && base == exp(1))) {
} else if (is.null(base)) {
l <- exp(l);
} else {
l <- l*log(exp(1), base=base);
}
attr(l, "base") <- base;
attr(l, "wrss") <- wrss;
attr(l, "lambda") <- lambda;
attr(l, "roughness") <- Jg;
attr(l, "penalty") <- penalty;
class(l) <- "SmoothSplineLikelihood";
l;
})
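# A small sketch on simulated data (wrapped in 'if (FALSE)' so it is never
# evaluated when the package is built): evaluate the penalized log likelihood
# of a fitted smoothing spline.
if (FALSE) {
  x <- seq(0, 1, length.out=200)
  y <- sin(2*pi*x) + rnorm(200, sd=0.2)
  fit <- smooth.spline(x, y)
  l <- likelihood(fit)   # uses the (x, yin, w) stored in the fit
  print(l)               # dispatches to print.SmoothSplineLikelihood()
}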
############################################################################
# HISTORY:
# 2007-01-01
# o Removed any code to make method backward compatibility with
# R < 1.9.0, which was before 'modreg' was merged into 'stats'.
# 2005-06-03
# o now returns an object of class SmoothSplineLikelihood.
# 2005-02-20
# o Now using setMethodS3() and added '...' to please R CMD check.
# 2002-04-21
# o Updated due to modreg is merged into stats from R v1.9.0.
# 2002-03-04
# o Returns an object of class likelihood. smooth.spline instead of a list.
# o Added the option to explicitly specify x, y and w.
# o Rename from logLikelihood(...) to likelihood(..., base=exp(1)).
# o BUG FIX: Forgot to take the square in the integral of J(g).
# 2002-03-02
# o Added Rdoc details about case with non unique x values.
# 2002-03-01
# o Wrote the Rdoc comments
# o Created.
############################################################################
aroma.light/R/calibrateMultiscan.R 0000644 0001750 0001750 00000024370 14136047216 016755 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric calibrateMultiscan
# @alias calibrateMultiscan.matrix
#
# \encoding{latin1}
#
# @title "Weighted affine calibration of a multiple re-scanned channel"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage calibrateMultiscan,matrix
# }
#
# \arguments{
# \item{X}{An NxK @matrix (K>=2) where the columns represent the
# multiple scans of one channel (a two-color array contains two
# channels) to be calibrated.}
# \item{weights}{If @NULL, non-weighted normalization is done.
# If data-point weights are used, this should be a @vector of length
# N of data point weights used when estimating the normalization
# function.
# }
# \item{typeOfWeights}{A @character string specifying the type of
# weights given in argument \code{weights}.
# }
# \item{method}{A @character string specifying how the estimates are
# robustified. See @see "iwpca" for all accepted values.}
# \item{constraint}{Constraint making the bias parameters identifiable.
# See @see "fitIWPCA" for more details.}
# \item{satSignal}{Signals equal to or above this threshold is considered
# saturated signals.}
# \item{...}{Other arguments passed to @see "fitIWPCA" and in
# turn @see "iwpca", e.g. \code{center} (see below).}
# \item{average}{A @function to calculate the average signals between calibrated scans.}
# \item{deviance}{A @function to calculate the deviance of the signals between calibrated scans.}
# \item{project}{If @TRUE, the calibrated data points projected onto the
# diagonal line, otherwise not. Moreover, if @TRUE, argument
# \code{average} is ignored.}
# \item{.fitOnly}{If @TRUE, the data will not be back-transform.}
# }
#
# \value{
# If \code{average} is specified or \code{project} is @TRUE,
# an Nx1 @matrix is returned, otherwise an NxK @matrix is returned.
# If \code{deviance} is specified, a deviance Nx1 @matrix is returned
# as attribute \code{deviance}.
# In addition, the fitted model is returned as attribute \code{modelFit}.
# }
#
# \section{Negative, non-positive, and saturated values}{
# Affine multiscan calibration applies also to negative values, which are
# therefore also calibrated, if they exist.
#
# Saturated signals in any scan are set to @NA. Thus, they will not be
# used to estimate the calibration function, nor will they affect an
# optional projection.
# }
#
# \section{Missing values}{
# Only observations (rows) in \code{X} that contain all finite values are
# used in the estimation of the calibration functions. Thus,
# observations can be excluded by setting them to @NA.
# }
#
# \section{Weighted normalization}{
# Each data point/observation, that is, each row in \code{X}, which is a
# vector of length K, can be assigned a weight in [0,1] specifying how much
# it should \emph{affect the fitting of the calibration function}.
# Weights are given by argument \code{weights},
# which should be a @numeric @vector of length N. Regardless of weights,
# all data points are \emph{calibrated} based on the fitted calibration
# function.
# }
#
# \section{Robustness}{
# By default, the model fit of multiscan calibration is done in \eqn{L_1}
# (\code{method="L1"}). This way, outliers affect the parameter estimates
# less than ordinary least-square methods.
#
# When calculating the average calibrated signal from multiple scans,
# by default the median is used, which further robustify against outliers.
#
# For further robustness, downweight outliers such as saturated signals,
# if possible.
#
# Tukey's biweight function is supported, but not used by default because
# then a "bandwidth" parameter has to selected. This can indeed be done
# automatically by estimating the standard deviation, for instance using
# MAD. However, since scanner signals have heteroscedastic noise
# (standard deviation is approximately proportional to the non-logged
# signal), Tukey's bandwidth parameter has to be a function of the
# signal too, cf. @see "stats::loess". We have experimented with this
# too, but found that it does not significantly improve the robustness
# compared to \eqn{L_1}.
# Moreover, using Tukey's biweight as is, that is, assuming homoscedastic
# noise, seems to introduce a (scale dependent) bias in the estimates
# of the offset terms.
# }
#
# \section{Using a known/previously estimated offset}{
# If the scanner offsets can be assumed to be known, for instance,
# from prior multiscan analyses on the scanner, then it is possible
# to fit the scanner model with no (zero) offset by specifying
# argument \code{center=FALSE}.
# Note that you cannot specify the offset. Instead, subtract it
# from all signals before calibrating, e.g.
# \code{Xc <- calibrateMultiscan(X-e, center=FALSE)}
# where \code{e} is the scanner offset (a scalar).
# You can assert that the model is fitted without offset by
# \code{stopifnot(all(attr(Xc, "modelFit")$adiag == 0))}.
# }
#
# \details{
# Fitting is done by iterated re-weighted principal component analysis
# (IWPCA).
# }
#
# @author
#
# \references{
# [1] @include "../incl/BengtssonH_etal_2004.bib.Rdoc" \cr
# }
#
# \examples{\dontrun{# For an example, see help(normalizeAffine).}}
#
# \seealso{
# @see "1. Calibration and Normalization".
# @see "normalizeAffine".
# }
#*/#########################################################################
setMethodS3("calibrateMultiscan", "matrix", function(X, weights=NULL, typeOfWeights=c("datapoint"), method="L1", constraint="diagonal", satSignal=2^16-1, ..., average=median, deviance=NULL, project=FALSE, .fitOnly=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'X'
if (ncol(X) < 2L)
stop("Multiscan calibratation requires at least two scans: ", ncol(X));
if (nrow(X) < 3L)
stop("Multiscan calibratation requires at least three observations: ", nrow(X));
# Argument: 'satSignal'
if (satSignal < 0)
stop("Argument 'satSignal' is negative: ", satSignal);
# Argument: 'typeOfWeights'
typeOfWeights <- match.arg(typeOfWeights);
# Argument: 'weights'
datapointWeights <- NULL;
if (!is.null(weights)) {
# If 'weights' is an object of a class with as.double(), cast it.
weights <- as.double(weights);
if (anyMissing(weights))
stop("Argument 'weights' must not contain NA values.");
if (any(weights < 0 | weights > 1)) {
stop("Argument 'weights' out of range [0,1]: ", paste(weights[weights < 0.0 | weights > 1.0], collapse=", "));
}
weights <- as.vector(weights);
if (length(weights) == 1L) {
weights <- rep(weights, length.out=nrow(X));
} else if (length(weights) != nrow(X)) {
stop("Argument 'weights' does not have the same length as the number of data points (rows) in the matrix: ", length(weights), " != ", nrow(X));
}
datapointWeights <- weights;
}
# Argument 'average':
if (!is.null(average) && !is.function(average)) {
throw("Argument 'average' must be a function or NULL: ", class(average)[1]);
}
# Argument 'deviance':
if (!is.null(deviance) && !is.function(deviance)) {
throw("Argument 'deviance' must be a function or NULL: ", class(deviance)[1]);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Prepare the data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Use non-saturated observations (non-finite values are taken care of by
# the fitIWPCA() function).
X[(X >= satSignal)] <- NA_real_;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 3. Fit the model
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- fitIWPCA(X, w=datapointWeights, method=method, constraint=constraint, ...);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 4. Backtransform
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (.fitOnly == FALSE) {
X <- backtransformAffine(X, a=fit, project=project);
if (project == FALSE && !is.null(average)) {
X <- apply(X, MARGIN=1L, FUN=average, na.rm=TRUE);
X <- as.matrix(X);
}
if (!is.null(deviance)) {
deviance <- apply(X, MARGIN=1L, FUN=deviance, na.rm=TRUE);
attr(X, "deviance") <- as.matrix(deviance);
}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 5. Return the backtransformed data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
attr(X, "modelFit") <- fit;
X;
}) # calibrateMultiscan()
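# A small sketch with simulated multiscan data (never evaluated when the
# package is built): the same true signals observed at K=3 PMT settings,
# each with its own assumed offset and gain.
if (FALSE) {
  theta <- rexp(1000, rate=1/1000)
  X <- cbind(20 + 1.0*theta, 25 + 0.6*theta, 30 + 0.3*theta) + rnorm(3*1000, sd=5)
  Xc <- calibrateMultiscan(X)        # Nx1 matrix of calibrated, averaged signals
  fit <- attr(Xc, "modelFit")        # the fitted affine model (offsets etc.)
}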
############################################################################
# HISTORY:
# 2013-09-26
# o Now utilizing anyMissing().
# 2011-02-05
# o DOCUMENTATION: Added section on how to calibrate when scanner offsets
# are supposed to be known/zero.
# o DOCUMENTATION: Fixed broken links to help for iwpca().
# 2005-06-03
# o Added argument 'typeOfWeights' to make it similar to other normalization
# methods, although only "datapoint" weights are allowed.
# 2005-02-13
# o Made argument 'method="L1"' explicit and wrote a Rdoc comment about it
# to document the fact that we have deliberately choosen not to use
# "symmetric" Tukey's biweight.
# 2005-02-04
# o Put arguments 'average' and 'deviance' back again. It is much more
# userfriendly. Averaging with median() is now the default.
# 2005-02-01
# o Added argument '.fitOnly'.
# 2005-01-24
# o Added argument 'weights' (instead of passing 'w' to fitIWPCA()).
# o Saturated values are not used to estimate the calibration function nor
# are the used if data is projected.
# 2004-12-28
# o Added Rdoc comments on weights.
# 2004-06-28
# o BUG FIX: Missing braces in Rdoc comments.
# 2004-05-18
# o Removed averaging etc. That is now in its own function rowAverages().
# o The only difference between calibrateMultiscanSpatial() and
# calibrateMultiscan() is how the parameters are fitted.
# 2004-05-14
# o Cleaned up. Making use of new backtransformAffine(), which makes the
# code clearer. Explicit arguments that were just passed to iwpca() etc
# are now passed as "..." to make the documentation simpler and less
# confusing for the end user. Experts will follow "..." to iwpca().
############################################################################
aroma.light/R/findPeaksAndValleys.R 0000644 0001750 0001750 00000007640 14136047216 017037 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric findPeaksAndValleys
# @alias findPeaksAndValleys.density
# @alias findPeaksAndValleys.numeric
#
# @title "Finds extreme points in the empirical density estimated from data"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage findPeaksAndValleys,density
# @usage findPeaksAndValleys,numeric
# }
#
# \arguments{
# \item{x}{A @numeric @vector containing data points or
# a @see "stats::density" object.}
# \item{...}{Arguments passed to @see "stats::density".
# Ignored if \code{x} is a @see "stats::density" object.}
# \item{tol}{A non-negative @numeric threshold specifying the minimum
# density at the extreme point in order to accept it.}
# \item{na.rm}{If @TRUE, missing values are dropped, otherwise not.}
# }
#
# \value{
# Returns a @data.frame (of class 'PeaksAndValleys') containing
# of "peaks" and "valleys" filtered by \code{tol}.
# }
#
# @examples "../incl/findPeaksAndValleys.Rex"
#
# @author
#
# \seealso{
# This function is used by @see "callNaiveGenotypes".
# }
#
# @keyword internal
#*/###########################################################################
setMethodS3("findPeaksAndValleys", "density", function(x, tol=0, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'x':
d <- x;
# Argument 'tol':
tol <- as.double(tol);
stopifnot(length(tol) == 1);
stopifnot(tol >= 0);
delta <- diff(d$y);
n <- length(delta);
isPeak <- (delta[-n] > 0 & delta[-1] < 0);
isValley <- (delta[-n] < 0 & delta[-1] > 0);
isPeakOrValley <- (isPeak | isValley);
idxs <- which(isPeakOrValley);
types <- c("valley", "peak")[isPeak[idxs]+1];
names(idxs) <- types;
x <- d$x[idxs];
y <- d$y[idxs];
res <- data.frame(type=types, x=x, density=y);
class(res) <- c("PeaksAndValleys", class(res));
# Filter extreme points (peaks and valleys) by density?
if (tol > 0) {
res <- subset(res, density >= tol);
}
res;
}) # findPeaksAndValleys()
setMethodS3("findPeaksAndValleys", "numeric", function(x, ..., tol=0, na.rm=TRUE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'na.rm':
na.rm <- as.logical(na.rm);
stopifnot(length(na.rm) == 1);
# Argument 'tol':
tol <- as.double(tol);
stopifnot(length(tol) == 1);
stopifnot(tol >= 0);
d <- density(x, na.rm=na.rm, ...);
findPeaksAndValleys(d, tol=tol);
}) # findPeaksAndValleys()
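# A minimal, simulated sketch of findPeaksAndValleys() (not part of the
# package code; guarded by if (FALSE) so it is never evaluated when this file
# is sourced). The bimodal mixture below is invented for illustration; see
# ../incl/findPeaksAndValleys.Rex for the package example.
if (FALSE) {
  set.seed(1)
  # A bimodal sample with modes near 0 and 1
  x <- c(rnorm(1000, mean=0, sd=0.1), rnorm(1000, mean=1, sd=0.1))
  fit <- findPeaksAndValleys(x, tol=0.01)
  print(fit)   # a data.frame with columns 'type', 'x' and 'density';
               # expect two "peak" rows (near 0 and 1) and one "valley" row
}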
############################################################################
# HISTORY:
# 2011-03-03 [HB]
# o BUG FIX: findPeaksAndValleys(x, to), where 'x' is numeric, would use
# partial matching and interpret 'to' as argument 'tol' and not as part of
# '...' passed to density(). This problem was solved by placing '...'
# before argument 'tol'. Thanks Oscar Rueda at Cancer Research UK
# for reporting and identifying this bug.
# 2010-10-08 [HB]
# o Now findPeaksAndValleys() returns an object of class PeaksAndValleys,
# which extends data.frame.
# 2010-10-06 [HB]
# o Added findPeaksAndValleys() for the 'density' class, which then
# findPeaksAndValleys() for 'numeric' utilizes.
# 2010-04-04 [HB]
# o Made findPeaksAndValleys() an internal function in Rd.
# o Updated code to validate arguments using R.utils::Arguments.
# o Corrected a non-defined Rdoc tag.
# 2009-11-03 [HB]
# o Added Rdoc comments with an example().
# 2009-03-06 [HB]
# o Created for doing quick naive genotyping of some TCGA normal samples in
# order to highlight the centers of the clouds in a tumor-normal fracB
# scatter plots.
############################################################################
aroma.light/R/normalizeCurveFit.R 0000644 0001750 0001750 00000026072 14136047216 016620 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric normalizeCurveFit
# @alias normalizeLoess
# @alias normalizeLowess
# @alias normalizeSpline
# @alias normalizeRobustSpline
# @alias normalizeCurveFit.matrix
# @alias normalizeLoess.matrix
# @alias normalizeLowess.matrix
# @alias normalizeSpline.matrix
# @alias normalizeRobustSpline.matrix
#
# \encoding{latin1}
#
# @title "Weighted curve-fit normalization between a pair of channels"
#
# \description{
# @get "title".
#
# This method will estimate a smooth function of the dependency
# between the log-ratios and the log-intensity of the two channels and
# then correct the log-ratios (only) in order to remove the dependency.
# This method is also known as \emph{intensity-dependent} or
# \emph{lowess normalization}.
#
# The curve-fit methods are by nature limited to paired-channel data.
# There exists at least one method trying to overcome this limitation,
# namely the cyclic-lowess [1], which applies the paired
# curve-fit method iteratively over all pairs of channels/arrays.
# Cyclic-lowess is not implemented here.
#
# We recommend that affine normalization [2] is used instead of curve-fit
# normalization.
# }
#
# \usage{
# @usage normalizeCurveFit,matrix
# @usage normalizeLoess,matrix
# @usage normalizeLowess,matrix
# @usage normalizeSpline,matrix
# @usage normalizeRobustSpline,matrix
# }
#
# \arguments{
# \item{X}{An Nx2 @matrix where the columns represent the two channels
# to be normalized.}
# \item{weights}{If @NULL, non-weighted normalization is done.
# If data-point weights are used, this should be a @vector of length
# N of data point weights used when estimating the normalization
# function.
# }
# \item{typeOfWeights}{A @character string specifying the type of
# weights given in argument \code{weights}.
# }
# \item{method}{@character string specifying which method to use when
# fitting the intensity-dependent function.
# Supported methods:
# \code{"loess"} (better than lowess),
# \code{"lowess"} (classic; supports only zero-one weights),
# \code{"spline"} (more robust than lowess at lower and upper
# intensities; supports only zero-one weights),
# \code{"robustSpline"} (better than spline).
# }
# \item{bandwidth}{A @double value specifying the bandwidth of the
# estimator used.
# }
# \item{satSignal}{Signals equal to or above this threshold will not
# be used in the fitting.
# }
# \item{...}{Not used.}
# }
#
# \value{
# A Nx2 @matrix of the normalized two channels.
# The fitted model is returned as attribute \code{modelFit}.
# }
#
# \details{
# A smooth function \eqn{c(A)} is fitted through data in \eqn{(A,M)},
# where \eqn{M=log_2(y_2/y_1)} and \eqn{A=1/2*log_2(y_2*y_1)}. Data is
# normalized by \eqn{M <- M - c(A)}.
#
# Loess is by far the slowest method of the four, then lowess, and then
# robust spline, which iteratively calls the spline method.
# }
#
# \section{Negative, non-positive, and saturated values}{
# Non-positive values are set to not-a-number (@NaN).
# Data points that are saturated in one or more channels are not used
# to estimate the normalization function, but they are normalized.
# }
#
# \section{Missing values}{
# The estimation of the normalization function will only be made
# based on complete non-saturated observations, i.e. observations that
# contains no @NA values nor saturated values as defined by \code{satSignal}.
# }
#
# \section{Weighted normalization}{
# Each data point, that is, each row in \code{X}, which is a
# vector of length 2, can be assigned a weight in [0,1] specifying how much
# it should \emph{affect the fitting of the normalization function}.
# Weights are given by argument \code{weights}, which should be a @numeric
# @vector of length N. Regardless of weights, all data points are
# \emph{normalized} based on the fitted normalization function.
#
# Note that the lowess and the spline method only support zero-one
# \{0,1\} weights.
# For such methods, all weights that are less than a half are set to zero.
# }
#
# \section{Details on loess}{
# For @see "stats::loess", the arguments \code{family="symmetric"},
# \code{degree=1}, \code{span=3/4},
# \code{control=loess.control(trace.hat="approximate"},
# \code{iterations=5}, \code{surface="direct")} are used.
# }
#
# @author "HB"
#
# \references{
# [1] M. \enc{Åstrand}{Astrand},
# Contrast Normalization of Oligonucleotide Arrays,
# Journal of Computational Biology, 2003, 10, 95-102. \cr
# [2] @include "../incl/BengtssonHossjer_2006.bib.Rdoc" \cr
# }
#
# \examples{
# @include "../incl/normalizeCurveFit.matrix.Rex"
# }
#
# \seealso{
# @see "normalizeAffine".
# }
#*/#########################################################################
setMethodS3("normalizeCurveFit", "matrix", function(X, weights=NULL, typeOfWeights=c("datapoint"), method=c("loess", "lowess", "spline", "robustSpline"), bandwidth=NULL, satSignal=2^16-1, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'X'
if (ncol(X) != 2)
stop("Curve-fit normalization requires two channels only: ", ncol(X));
if (nrow(X) < 3)
stop("Curve-fit normalization requires at least three observations: ", nrow(X));
# Argument: 'satSignal'
if (satSignal < 0)
stop("Argument 'satSignal' is negative: ", satSignal);
# Argument: 'method'
method <- match.arg(method);
zeroOneWeightsOnly <- (method %in% c("lowess", "spline"));
# Argument: 'typeOfWeights'
typeOfWeights <- match.arg(typeOfWeights);
# Argument: 'weights'
datapointWeights <- NULL;
if (!is.null(weights)) {
# If 'weights' is an object of a class with as.double(), cast it.
weights <- as.double(weights);
if (anyMissing(weights))
stop("Argument 'weights' must not contain NA values.");
if (any(weights < 0 | weights > 1)) {
stop("Argument 'weights' out of range [0,1]: ", paste(weights[weights < 0.0 | weights > 1.0], collapse=", "));
}
if (zeroOneWeightsOnly && any(weights > 0 & weights < 1)) {
weights <- round(weights);
warning("Weights were rounded to {0,1} since '", method, "' normalization supports only zero-one weights.");
}
weights <- as.vector(weights);
if (length(weights) == 1) {
weights <- rep(weights, length.out=nrow(X));
} else if (length(weights) != nrow(X)) {
stop("Argument 'weights' does not have the same length as the number of data points (rows) in the matrix: ", length(weights), " != ", nrow(X));
}
datapointWeights <- weights;
} # if (!is.null(weights))
# Argument: 'bandwidth'
if (is.null(bandwidth)) {
bandwidths <- c("loess"=0.75, "lowess"=0.3, "robustSpline"=0.75,
"spline"=0.75);
bandwidth <- bandwidths[method];
} else if (!is.numeric(bandwidth) || length(bandwidth) != 1) {
stop("Argument 'bandwidth' must be a numeric scalar: ", paste(bandwidth, collapse=", "));
} else if (bandwidth <= 0 || bandwidth > 1) {
stop("Argument 'bandwidth' must be in (0,1]: ", bandwidth);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Prepare data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Convert non-positive signals to NaN. If not done here, the transform
# (R,G) -> (A,M) -> (R,G) will do it anyway.
X[X <= 0] <- NaN;
# Use only positive non-saturated observations to estimate the
# normalization function
isValid <- (is.finite(X) & (X <= satSignal));
isValid <- (isValid[,1] & isValid[,2]);
Y <- X[isValid,];
if (!is.null(datapointWeights))
datapointWeights <- datapointWeights[isValid];
M <- log(Y[,1]/Y[,2], base=2);
A <- log(Y[,1]*Y[,2], base=2)/2;
# Not needed anymore
Y <- NULL;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 3. Estimate the model
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (method == "lowess") {
incl <- if (!is.null(datapointWeights)) (datapointWeights > 0) else TRUE;
fit <- lowess(x=A[incl], y=M[incl], f=bandwidth, ...);
fit$predictM <- function(newA) approx(fit, xout=newA, ties=mean)$y;
} else if (method == "loess") {
fit <- loess(formula=M ~ A, weights=datapointWeights,
family="symmetric", degree=1, span=bandwidth,
control=loess.control(trace.hat="approximate",
iterations=5, surface="direct"), ...);
fit$predictM <- function(newA) predict(fit, newdata=newA);
} else if (method == "spline") {
incl <- if (!is.null(datapointWeights)) (datapointWeights > 0) else TRUE;
fit <- smooth.spline(x=A[incl], y=M[incl], spar=bandwidth, ...);
fit$predictM <- function(newA) predict(fit, x=newA)$y;
} else if (method == "robustSpline") {
fit <- robustSmoothSpline(x=A, y=M, w=datapointWeights, spar=bandwidth, ...);
fit$predictM <- function(newA) predict(fit, x=newA)$y;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 4. Normalize
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Normalize all data
M <- log(X[,1]/X[,2], base=2);
A <- log(X[,1]*X[,2], base=2)/2;
ok <- is.finite(A);
M[ok] <- M[ok] - fit$predictM(A[ok]);
# Not needed anymore
ok <- NULL;
X[,1] <- as.matrix(sqrt(2^(2*A+M)));
X[,2] <- as.matrix(sqrt(2^(2*A-M)));
# Not needed anymore
A <- M <- NULL;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 5. Return the normalized data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
attr(X, "modelFit") <- fit;
X;
}) # normalizeCurveFit()
setMethodS3("normalizeLowess", "matrix", function(X, ...) {
normalizeCurveFit(X, method="lowess", ...);
})
setMethodS3("normalizeLoess", "matrix", function(X, ...) {
normalizeCurveFit(X, method="loess", ...);
})
setMethodS3("normalizeSpline", "matrix", function(X, ...) {
normalizeCurveFit(X, method="spline", ...);
})
setMethodS3("normalizeRobustSpline", "matrix", function(X, ...) {
normalizeCurveFit(X, method="robustSpline", ...);
})
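# A minimal, simulated sketch of curve-fit normalization (not part of the
# package code; guarded by if (FALSE) so it is never evaluated when this file
# is sourced). The affine channel effects below are invented for illustration;
# see ../incl/normalizeCurveFit.matrix.Rex for the package example.
if (FALSE) {
  set.seed(2)
  N <- 5000
  z <- rexp(N, rate=1/2000)                       # "true" signals
  X <- cbind(R=40 + 1.8*z, G=20 + 1.0*z) + rnorm(2*N, sd=10)
  XN <- normalizeLowess(X)       # same as normalizeCurveFit(X, method="lowess")
  fit <- attr(XN, "modelFit")    # the fitted intensity-dependent curve
  # Compare the M-vs-A clouds before and after normalization
  plotMvsA(X)
  plotMvsA(XN, add=TRUE, col="blue")
}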
############################################################################
# HISTORY:
# 2013-09-26
# o Now utilizing anyMissing().
# 2005-06-03
# o Added argument 'typeOfWeights' to make it similar to other normalization
# methods, although only "datapoint" weights are allowed.
# o Removed argument '.fitOnly'.
# o renamed all "robust.spline" to "robustSpline".
# 2005-03-23
# o Updated normalizeCurveFit() so that approx() does not give warnings
# about 'Collapsing to unique x values' when doing lowess normalization.
# 2005-02-02
# o Zero-one weights are now rounded off by round(w).
# o BUG FIX: Forgot to adjust weights vector in normalizeCurveFit() when
# removing non-finite values from data.
# 2005-02-01
# o Added argument '.fitOnly'.
# 2005-01-24
# o Create an Rdoc example with MvsA and MvsM comparisons.
# 2005-01-23
# o Added aliases normalizeLowess() and normalizeLoess().
# o Created from normalizeCurveFit() in MAData().
############################################################################
aroma.light/R/fitPrincipalCurve.R 0000644 0001750 0001750 00000011164 14136047216 016575 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric fitPrincipalCurve
# @alias fitPrincipalCurve.matrix
#
# @title "Fit a principal curve in K dimensions"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage fitPrincipalCurve,matrix
# }
#
# \arguments{
# \item{X}{An NxK @matrix (K>=2) where the columns represent the dimension.}
# \item{...}{Other arguments passed to @see "princurve::principal_curve".}
# \item{verbose}{A @logical or a @see "R.utils::Verbose" object.}
# }
#
# \value{
# Returns a principal_curve object (which is a @list).
# See @see "princurve::principal_curve" for more details.
# }
#
# \section{Missing values}{
# The estimation of the normalization function will only be made
# based on complete observations, i.e. observations that contains no @NA
# values in any of the channels.
# }
#
# @author "HB"
#
# \references{
# [1] Hastie, T. and Stuetzle, W, \emph{Principal Curves}, JASA, 1989.\cr
# [2] @include "../incl/BengtssonH_etal_2009.bib.Rdoc" \cr
# }
#
# @examples "../incl/fitPrincipalCurve.matrix.Rex"
#
# \seealso{
# @see "backtransformPrincipalCurve".
# @see "princurve::principal_curve".
# }
#*/#########################################################################
setMethodS3("fitPrincipalCurve", "matrix", function(X, ..., verbose=FALSE) {
# princurve v1.1-9 and before contain bugs. /HB 2008-05-26
# princurve v2.0.0 replaced principal.curve() with principal_curve(). /HB 2018-09-04
use("princurve (>= 2.1.2)")
principal_curve <- princurve::principal_curve
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
n <- nrow(X);
p <- ncol(X);
# Argument 'verbose':
verbose <- Arguments$getVerbose(verbose);
if (verbose) {
pushState(verbose);
on.exit(popState(verbose));
}
verbose && enter(verbose, "Fitting principal curve");
verbose && cat(verbose, "Data size: ", n, "x", p);
verbose && enter(verbose, "Identifying missing values");
# princurve::principal_curve() does not handle missing values.
keep <- rep(TRUE, times=n);
for (cc in seq_len(p)) {
keep <- keep & is.finite(X[,cc]);
}
anyMissing <- (!all(keep));
if (anyMissing) {
X <- X[keep,, drop=FALSE];
}
verbose && exit(verbose);
verbose && cat(verbose, "Data size after removing non-finite data points: ", nrow(X), "x", p);
verbose && enter(verbose, "Calling principal_curve()");
trace <- as.logical(verbose);
t <- system.time({
fit <- principal_curve(X, ..., trace=trace);
});
attr(fit, "processingTime") <- t;
verbose && printf(verbose, "Converged: %s\n", fit$converged);
verbose && printf(verbose, "Number of iterations: %d\n", fit$num_iterations);
verbose && printf(verbose, "Processing time/iteration: %.1fs (%.1fs/iteration)\n", t[3], t[3]/fit$num_iterations);
verbose && exit(verbose);
# Expand, iff missing values were dropped
if (anyMissing) {
values <- matrix(NA_real_, nrow=n, ncol=p);
values[keep,] <- fit$s;
dimnames(values) <- dimnames(fit$s);
fit$s <- values;
values <- rep(NA_real_, times=n);
for (ff in c("ord", "lambda")) {
values[keep] <- fit[[ff]];
fit[[ff]] <- values;
}
}
verbose && exit(verbose);
class(fit) <- c("PrincipalCurve", class(fit));
fit;
}) # fitPrincipalCurve()
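# A minimal, simulated sketch of fitPrincipalCurve() followed by
# backtransformPrincipalCurve() (not part of the package code; guarded by
# if (FALSE) so it is never evaluated when this file is sourced). It requires
# the suggested 'princurve' package; the channel effects below are invented
# for illustration; see ../incl/fitPrincipalCurve.matrix.Rex for the package
# example.
if (FALSE) {
  set.seed(3)
  N <- 2000
  z <- rexp(N, rate=1/1000)                       # shared "true" signal
  # Three channels observing 'z' with different offsets and scales
  X <- cbind(20 + 1.0*z, 50 + 1.5*z, 100 + 0.8*z) + rnorm(3*N, sd=15)
  fit <- fitPrincipalCurve(X)
  fit$converged                                   # TRUE if the fit converged
  # Backtransform toward the scale of the first channel
  XN <- backtransformPrincipalCurve(X, fit=fit, targetDimension=1)
}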
###########################################################################
# HISTORY:
# 2013-04-18
# o BUG FIX: fitPrincipalCurve() would not preserve dimension names
# if data contain missing values.
# 2011-04-12
# o CLEANUP: Removed internal patch of principal.curve(). If an older
# version than princurve v1.1-10 is used, an informative error is
# thrown requesting an update. The internal patch is part of the
# official princurve v1.1-10 release (since 2009-10-04).
# 2009-11-01
# o Now fitPrincipalCurve() utilizes the bug-fixed princurve v1.1-10. If an
# earlier version is available, it uses the internal patch.
# 2009-07-15
# o Added attribute 'processingTime' to the fit object returned by
# fitPrincipalCurve().
# 2009-01-12
# o Updated code such that R.utils::Verbose is optional.
# 2008-10-08
# o Removed argument 'fixDimension'. That constrain is taken care of
# by backtransformPrincipalCurve().
# o Now the fitted object is of class PrincipalCurve that extends the
# princurve::principal.curve class.
# 2008-10-07
# o Added Rdoc comments and an example.
# o Removed implementation for data.frame:s.
# 2008-10-03
# o Added argument 'fixDimension'.
# 2008-05-27
# o Added fitPrincipalCurve().
# o Created.
###########################################################################
aroma.light/R/999.package.R 0000644 0001750 0001750 00000007764 14136047216 015103 0 ustar nilesh nilesh #########################################################################/**
# @RdocPackage aroma.light
#
# \encoding{latin1}
#
# \description{
# @eval "getDescription(aroma.light)"
# }
#
# \section{Installation}{
# To install this package, see
# \url{https://bioconductor.org/packages/release/bioc/html/aroma.light.html}.
# }
#
# \section{To get started}{
# For scanner calibration:
# \enumerate{
# \item see @see "calibrateMultiscan" - scan the same array two or more times to calibrate for scanner effects and extended dynamical range.
# }
#
# To normalize multiple single-channel arrays all with the same number of probes/spots:
# \enumerate{
# \item @see "normalizeAffine" - normalizes, on the intensity scale, for differences in offset and scale between channels.
# \item @see "normalizeQuantileRank", @see "normalizeQuantileSpline" - normalizes, on the intensity scale, for differences in empirical distribution between channels.
# }
#
# To normalize multiple single-channel arrays with varying number probes/spots:
# \enumerate{
# \item @see "normalizeQuantileRank", @see "normalizeQuantileSpline" - normalizes, on the intensity scale, for differences in empirical distribution between channels.
# }
#
# To normalize two-channel arrays:
# \enumerate{
# \item @see "normalizeAffine" - normalizes, on the intensity scale, for differences in offset and scale between channels. This will also correct for intensity-dependent affects on the log scale.
# \item @see "normalizeCurveFit" - Classical intensity-dependent normalization, on the log scale, e.g. lowess normalization.
# }
#
# To normalize three or more channels:
# \enumerate{
# \item @see "normalizeAffine" - normalizes, on the intensity scale, for differences in offset and scale between channels. This will minimize the curvature on the log scale between any two channels.
# }
# }
#
# \section{Further readings}{
# Several of the normalization methods proposed in [1]-[7] are
# available in this package.
# }
#
# \section{How to cite this package}{
# Whenever using this package, please cite one or more of [1]-[7].
# }
#
# \section{Wishlist}{
# Here is a list of features that would be useful, but which I have
# too little time to add myself. Contributions are appreciated.
# \itemize{
# \item At the moment, nothing.
# }
#
# If you consider to contribute, make sure it is not already
# implemented by downloading the latest "devel" version!
# }
#
# @author "*"
#
# \section{License}{
# The releases of this package are licensed under
# GPL version 2 or newer.
#
# NB: Except for the \code{robustSmoothSpline()} method,
# it is alright to distribute the rest of the package under
# LGPL version 2.1 or newer.
#
# The development code of the package is under a private license
# (where applicable) and patches sent to the author fall under the
# latter license, but will be, if incorporated, released under the
# "release" license above.
# }
#
# \references{
# Some of the references below can be found at
# \url{https://www.aroma-project.org/publications/}.\cr
#
# [1] H. Bengtsson, \emph{Identification and normalization of plate effects
# in cDNA microarray data}, Preprints in Mathematical Sciences,
# 2002:28, Mathematical Statistics, Centre for Mathematical Sciences,
# Lund University, 2002.\cr
#
# [2] @include "../incl/BengtssonH_2003.bib.Rdoc" \cr
#
# [3] H. Bengtsson, \emph{aroma - An R Object-oriented Microarray
# Analysis environment}, Preprints in Mathematical Sciences (manuscript
# in preparation), Mathematical Statistics, Centre for Mathematical
# Sciences, Lund University, 2004.\cr
#
# [4] @include "../incl/BengtssonH_etal_2004.bib.Rdoc" \cr
#
# [5] @include "../incl/BengtssonHossjer_2006.bib.Rdoc" \cr
#
# [6] @include "../incl/BengtssonH_etal_2008.bib.Rdoc" \cr
#
# [7] @include "../incl/BengtssonH_etal_2009.bib.Rdoc" \cr
#
# [8] @include "../incl/BengtssonNeuvial_2010.bib.Rdoc" \cr
# }
#*/#########################################################################
aroma.light/R/999.NonDocumentedObjects.R 0000644 0001750 0001750 00000002266 14136047216 017554 0 ustar nilesh nilesh ###########################################################################/**
# @RdocDocumentation "Non-documented objects"
#
# % Plot functions
# @alias lines.XYCurveFit
#
# % Matrix operations
# @alias rowAverages
# @alias rowAverages.matrix
#
# % Simple linear-algebra
# @alias projectUontoV
# @alias scalarProduct
# @alias tr
#
# % Miscellaneous statistical functions
# @alias likelihood
# @alias predict.lowess
#
# \description{
# This page contains aliases for all "non-documented" objects that
# \code{R CMD check} detects in this package.
#
# Almost all of them are \emph{generic} functions that have specific
# documentation for the corresponding method coupled to a specific class.
# Other functions are re-defined by \code{setMethodS3()} to
# \emph{default} methods. Neither of these two classes is non-documented
# in reality.
# The rest are deprecated methods.
# }
#
# @keyword internal
#*/###########################################################################
############################################################################
# HISTORY:
# 2005-02-10
# o Created to please R CMD check.
############################################################################
aroma.light/R/wpca.R 0000644 0001750 0001750 00000023674 14136047216 014107 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric wpca
# @alias wpca.matrix
#
# @title "Light-weight Weighted Principal Component Analysis"
#
# \usage{
# @usage wpca,matrix
# }
#
# \description{
# Calculates the (weighted) principal components of a matrix, that is,
# finds a new coordinate system (not unique) for representing the given
# multivariate data such that
# i) all dimensions are orthogonal to each other, and
# ii) all dimensions have maximal variances.
# }
#
# \arguments{
# \item{x}{An NxK @matrix.}
# \item{w}{An N @vector of weights for each row (observation) in
# the data matrix. If @NULL, all observations get the same weight,
# that is, standard PCA is used.}
# \item{center}{If @TRUE, the (weighted) sample mean column @vector is
# subtracted from each column in \code{x}, first.
# If data is not centered, the effect will be that a linear subspace
# that goes through the origin is fitted.}
# \item{scale}{If @TRUE, each column in \code{x} is
# divided by its (weighted) root-mean-square of the
# centered column, first.}
# \item{method}{If \code{"dgesdd"} LAPACK's divide-and-conquer
# based SVD routine is used (faster [1]).
# If \code{"dgesvd"}, LAPACK's QR-decomposition-based routine is used.
# }
# \item{swapDirections}{If @TRUE, the signs of eigenvectors
# that have more negative than positive components are inverted.
# The signs of corresponding principal components are also inverted.
# This is only of interest when for instance visualizing or comparing
# with other PCA estimates from other methods, because the
# PCA (SVD) decomposition of a matrix is not unique.
# }
# \item{...}{Not used.}
# }
#
# \value{
# Returns a @list with elements:
# \item{pc}{An NxK @matrix where the column @vectors are the
# principal components (a.k.a. loading vectors,
# spectral loadings or factors etc).}
# \item{d}{An K @vector containing the eigenvalues of the
# principal components.}
# \item{vt}{An KxK @matrix containing the eigenvector of the
# principal components.}
# \item{xMean}{The center coordinate.}
#
# It holds that \code{x == t(t(fit$pc \%*\% fit$vt) + fit$xMean)}.
# }
#
# \section{Method}{
# A singular value decomposition (SVD) is carried out.
# Let X=\code{x}, then the SVD of the matrix is \eqn{X = U D V'}, where
# \eqn{U} and \eqn{V} are orthogonal, and \eqn{D} is a diagonal matrix
# with singular values. The principal components returned by this method are \eqn{U D}.
#
# Internally \code{La.svd()} (or \code{svd()}) of the \pkg{base}
# package is used.
# For a popular and well written introduction to SVD see for instance [2].
# }
#
# \examples{
# @include "../incl/wpca.matrix.Rex"
#
# if (dev.cur() > 1) dev.off()
#
# @include "../incl/wpca2.matrix.Rex"
# }
#
# @author
#
# \references{
# [1] J. Demmel and J. Dongarra, \emph{DOE2000 Progress Report}, 2004.
# \url{https://people.eecs.berkeley.edu/~demmel/DOE2000/Report0100.html} \cr
# [2] Todd Will, \emph{Introduction to the Singular Value Decomposition},
# UW-La Crosse, 2004. \url{http://websites.uwlax.edu/twill/svd/} \cr
# }
#
# \seealso{
# For an iterative re-weighted PCA method, see @see "iwpca".
# For Singular Value Decomposition, see @see "base::svd".
# For other implementations of Principal Component Analysis functions see
# (if they are installed):
# @see "stats::prcomp" in package \pkg{stats} and \code{pca()} in package
# \pkg{pcurve}.
# }
#
# @keyword "algebra"
#*/#########################################################################
setMethodS3("wpca", "matrix", function(x, w=NULL, center=TRUE, scale=FALSE, method=c("dgesdd", "dgesvd"), swapDirections=FALSE, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'x'
x <- as.matrix(x);
# The dimensions of 'x'
N <- nrow(x);
K <- ncol(x);
# Standardizes the weights to [0,1] such that they sum to 1.
if (!is.null(w)) {
w <- rep(w, length.out=N);
w <- w/sum(w);
if (anyMissing(w))
stop("Argument 'w' has missing values.");
}
## Argument 'method':
method <- match.arg(method, choices = c("dgesdd", "dgesvd"))
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Weighted or non-weighted centering and rescaling of the data
#
# Note: The following split of (center == TRUE) and (center == FALSE)
# is to minimize memory usage. In other words, the codes is longer,
# but more memory efficient.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (center || scale) {
if (is.null(w)) {
# Calculates the standard column means
xMean <- colMeans(x, na.rm=TRUE); # a K vector
} else {
# Calculates the weighted column means (recall that sum(w) == 1)
xMean <- as.vector(w %*% x); # a K vector
}
if (center) {
# Centers the data directly by subtracting the column means
for (kk in 1:ncol(x))
x[,kk] <- x[,kk] - xMean[kk];
} else {
# ...or calculates the centered data for rescaling
xc <- x;
for (kk in 1:ncol(x))
xc[,kk] <- x[,kk] - xMean[kk];
}
if (scale) {
if (is.null(w)) {
# Non-weighted root-mean-squares
rms <- function(v) { # v - column vector of length N
v <- v[!is.na(v)];
sqrt(sum(v^2)/max(1, length(v)-1));
}
} else {
# Weighted root-mean-squares
rms <- function(v) { # v - column vector of length N
ok <- !is.na(v);
v <- w[ok]*v[ok];
sqrt(sum(v^2)/max(1, length(v)-1));
}
}
if (center) {
xRMS <- apply(x, MARGIN=2, FUN=rms);
} else {
xRMS <- apply(xc, MARGIN=2, FUN=rms);
# Not needed anymore
xc <- NULL;
}
for (kk in 1:ncol(x))
x[,kk] <- x[,kk] / xRMS[kk];
# Not needed anymore
xRMS <- rms <- NULL;
}
} else {
xMean <- rep(0, length=K);
}
# Weight the observations?
if (!is.null(w)) {
x <- sqrt(w)*x;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 3. Singular Value Decomposition, i.e. X = U D V'
#
# "We compare DGESVD, the original QR-based routines from
# LAPACK 1.0, with DGESDD, the new divide-and-conquer based
# routine from LAPACK 3.0. The table below shows the speeds
# on several machines. The new routine is 5.7 to 16.8 times
# faster than the old routine. Part of the speed results
# from doing many fewer operations, and the rest comes from
# doing them faster (a high Mflop rate)." [1]
# [1] J. Demmel and J. Dongarra, DOE2000 Progress Report,
# http://www.cs.berkeley.edu/~demmel/DOE2000/Report0100.html
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (method == "dgesdd" || method == "dgesvd") {
duvt <- La.svd(x);
} else {
stop(sprintf("Unknown LAPACK or LINPACK routine to solve SVD: %s", method));
}
# 'duvt' is a list with the following components:
#
# u - a NxK matrix whose columns contain the left singular
# vectors (eigenvectors) of 'x'.
# It holds that t(u) %*% u == I
# d - a K vector containing the singular value of
# each principal component on its diagonal.
# It holds that d[1] >= d[2] >= ... d[K] >= 0.
# vt - a KxK transposed matrix whose columns contain the right
# singular vectors (eigenvector) of 'x'.
# It holds that t(v) %*% v == I
# Not needed anymore, in case a local copy has been created!
x <- NULL;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 4. The PCA principal components
#
# (a.k.a. loading vectors, spectral loadings or factors).
#
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
d <- duvt$d;
vt <- duvt$vt;
pc <- duvt$u;
# Not needed anymore
duvt <- NULL;
# Note: D == diag(duvt$d) is memory expensive since the dimensions of D
# are the same as the dimensions of 'x'. Thus, it is unwise to do:
# pc <- duvt$u %*% diag(duvt$d);
for (kk in seq_len(N))
pc[kk,] <- pc[kk,] * d;
if (!is.null(w)) {
# Rescale the principal components
pc <- pc / sqrt(w);
# Not needed anymore
w <- NULL;
}
if (swapDirections) {
swap <- apply(vt, MARGIN=1, FUN=function(z) sum(sign(z)) < 0);
# Which eigenvectors should swap signs?
swap <- which(swap);
for (kk in swap) {
vt[kk,] <- -vt[kk,];
pc[,kk] <- -pc[,kk];
}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 4. Return the parameter estimates
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
res <- list(pc=pc, d=d, vt=vt, xMean=xMean);
class(res) <- "WPCAFit";
res;
}) # wpca()
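# A minimal sketch illustrating the reconstruction identity documented above,
# x == t(t(fit$pc %*% fit$vt) + fit$xMean) (not part of the package code;
# guarded by if (FALSE) so it is never evaluated when this file is sourced).
if (FALSE) {
  set.seed(4)
  x <- matrix(rnorm(300), ncol=3)
  fit <- wpca(x)                                  # unweighted PCA
  xr <- t(t(fit$pc %*% fit$vt) + fit$xMean)       # reconstruct the data
  stopifnot(isTRUE(all.equal(x, xr, check.attributes=FALSE)))
  # Weighted PCA: give the first 50 rows ten times the weight of the rest
  w <- rep(c(10, 1), times=c(50, 50))
  fitW <- wpca(x, w=w)
}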
############################################################################
# HISTORY:
# 2015-05-24
# o Removed obsolete method="dsvdc"; only needed in R (< 1.7.0).
# 2013-09-26
# o Now utilizing anyMissing().
# 2006-06-26
# o Function would not work in R v2.4.0 devel, because argument 'method' was
# removed from La.svd().
# 2006-04-25
# o Updated the URL to Todd Will's webpage.
# 2005-02-20
# o Added '...' to please R CMD check.
# 2005-02-20
# o Now using setMethodS3() and added '...' to please R CMD check.
# 2005-01-24
# o Added a check for missing values of argument 'w'.
# 2004-05-14
# o Made into a method of class matrix instead of a stand-alone function.
# 2003-03-09
# o Created! Verified that it gives similar results as acp().
############################################################################
aroma.light/R/averageQuantile.R 0000644 0001750 0001750 00000011672 14136047216 016265 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric averageQuantile
# @alias averageQuantile.list
# @alias averageQuantile.matrix
#
# @title "Gets the average empirical distribution"
#
# \usage{
# @usage averageQuantile,list
# @usage averageQuantile,matrix
# }
#
# \description{
# @get "title" for a set of samples.
# }
#
# \arguments{
# \item{X}{A @list with K @numeric @vectors, or a @numeric NxK @matrix.
# If a @list, the @vectors may be of different lengths.}
# \item{...}{Not used.}
# }
#
# \value{
# Returns a @numeric @vector of length equal to the longest @vector
# in argument \code{X}.
# }
#
# \section{Missing values}{
# Missing values are excluded.
# }
#
# \seealso{
# @see "normalizeQuantileRank".
# @see "normalizeQuantileSpline".
# @see "stats::quantile".
# }
#
# \author{
# Parts adapted from Gordon Smyth (\url{http://www.statsci.org/}) in 2002
# \& 2006. Original code by Ben Bolstad at Statistics Department,
# University of California.
# }
#
# @keyword "nonparametric"
# @keyword "multivariate"
# @keyword "robust"
#*/###########################################################################
setMethodS3("averageQuantile", "list", function(X, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'X':
nbrOfChannels <- length(X);
# Nothing to do?
if(nbrOfChannels == 1L)
return(X);
nbrOfObservations <- unlist(lapply(X, FUN=length), use.names=FALSE);
maxNbrOfObservations <- max(nbrOfObservations);
# Nothing to do?
if(maxNbrOfObservations == 1L)
return(X);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Get the sample quantile for all channels (columns)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Construct the sample quantiles
quantiles <- (0:(maxNbrOfObservations-1L))/(maxNbrOfObservations-1L);
# Create a vector to hold the target distribution
xTarget <- vector("double", length=maxNbrOfObservations);
for (cc in 1:nbrOfChannels) {
Xcc <- X[[cc]];
# Order and sort the values
Scc <- sort(Xcc, na.last=NA);
# The number of non-NA observations
nobs <- length(Scc);
# Too few data points?
if(nobs < maxNbrOfObservations) {
# Get the sample quantiles for those values
bins <- (0:(nobs-1L))/(nobs-1L);
# Interpolate to get the values at positions specified by
# 'quantile' using data points given by 'bins' and 'Scc'.
Scc <- approx(x=bins, y=Scc, xout=quantiles, ties="ordered")$y;
bins <- NULL; # Not needed anymore
}
# Incremental mean
xTarget <- xTarget + Scc;
Scc <- NULL; # Not needed anymore
}
Xcc <- NULL; # Not needed anymore
xTarget <- xTarget/nbrOfChannels;
xTarget;
}) # averageQuantile.list()
setMethodS3("averageQuantile", "matrix", function(X, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
nbrOfChannels <- ncol(X);
# Nothing to do?
if(nbrOfChannels == 1L)
return(X);
nbrOfObservations <- nrow(X);
# Nothing to do?
if(nbrOfObservations == 1L)
return(X);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Get the sample quantile for all channels (columns)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Construct the sample quantiles
quantiles <- (0:(nbrOfObservations-1L))/(nbrOfObservations-1L);
# Create a vector to hold the target distribution
xTarget <- vector("double", length=nbrOfObservations);
for (cc in 1:nbrOfChannels) {
Xcc <- X[,cc,drop=TRUE];
# Order and sort the values
Scc <- sort(Xcc, na.last=NA);
# The number of non-NA observations
nobs <- length(Scc);
# Too few data points?
if(nobs < nbrOfObservations) {
# Get the sample quantiles for those values
bins <- (0:(nobs-1L))/(nobs-1L);
# Interpolate to get the values at positions specified by
# 'quantile' using data points given by 'bins' and 'Scc'.
Scc <- approx(x=bins, y=Scc, xout=quantiles, ties="ordered")$y;
bins <- NULL; # Not needed anymore
}
# Incremental mean
xTarget <- xTarget + Scc;
Scc <- NULL; # Not needed anymore
}
Xcc <- NULL; # Not needed anymore
xTarget <- xTarget/nbrOfChannels;
xTarget;
}) # averageQuantile.matrix()
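# A minimal, simulated sketch of averageQuantile() (not part of the package
# code; guarded by if (FALSE) so it is never evaluated when this file is
# sourced). The distributions below are invented for illustration.
if (FALSE) {
  set.seed(5)
  # A list of samples of unequal lengths
  X <- list(rnorm(2000, mean=0), rnorm(1500, mean=2), rexp(1000))
  xTarget <- averageQuantile(X)
  length(xTarget)           # 2000, i.e. the length of the longest vector
  # The matrix method works column by column on an NxK matrix
  Y <- matrix(rnorm(4000), ncol=4)
  yTarget <- averageQuantile(Y)
}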
##############################################################################
# HISTORY:
# 2013-10-08
# o Added averageQuantile() for matrices.
# 2007-01-22
# o BUG FIX: averageQuantile.list() did not deal with vectors of different
# length correctly. Thanks Alicia Oshlack, WEHI.
# 2006-05-12
# o Created from normalizeQuantile.matrix.R. It has been optimized for
# memory. Hence, the normalization is done using a two-pass procedure.
##############################################################################
aroma.light/R/backtransformXYCurve.matrix.R 0000644 0001750 0001750 00000002163 14136047216 020570 0 ustar nilesh nilesh setMethodS3("backtransformXYCurve", "matrix", function(X, fit, targetChannel=1, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'X'
if (ncol(X) != 2) {
stop("Curve-fit normalization requires two channels only: ", ncol(X));
}
# Argument 'targetChannel':
# targetChannel <- Arguments$getIndex(targetChannel, range=c(1,ncol(X)));
# Allocate result
XN <- X;
# Predict using only finite covariates (otherwise an error)
keep <- which(is.finite(X[,targetChannel]));
# Nothing to do?
if (length(keep) > 0) {
X <- X[keep,,drop=FALSE];
xN <- fit$predictY(X[,targetChannel]);
delta <- xN - X[,targetChannel];
# Not needed anymore
xN <- NULL;
XN[keep,-targetChannel] <- X[,-targetChannel] - delta;
# Not needed anymore
keep <- delta <- NULL;
}
XN;
})
############################################################################
# HISTORY:
# 2009-07-15
# o Created.
############################################################################
aroma.light/R/predict.lowess.R 0000644 0001750 0001750 00000000542 14136047216 016107 0 ustar nilesh nilesh setMethodS3("predict", "lowess", function(object, newdata=NULL, ties=mean, ...) {
approx(object, xout=newdata, ties=ties, ...)$y;
}, private=TRUE) # predict()
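# A minimal sketch of this internal predict() method for lowess() fits (not
# part of the package code; guarded by if (FALSE) so it is never evaluated
# when this file is sourced). Because stats::lowess() returns a plain list,
# the method is called directly here rather than relying on S3 dispatch.
if (FALSE) {
  set.seed(6)
  x <- runif(200, max=10)
  y <- sin(x) + rnorm(200, sd=0.1)
  fit <- lowess(x, y, f=0.3)
  # Interpolate the fitted curve at new x values
  yhat <- aroma.light:::predict.lowess(fit, newdata=c(2.5, 5.0, 7.5))
}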
############################################################################
# HISTORY:
# 2006-11-28
# o Created.
############################################################################
aroma.light/R/normalizeTumorBoost.R 0000644 0001750 0001750 00000026166 14136047216 017212 0 ustar nilesh nilesh ###########################################################################/**
# @RdocGeneric normalizeTumorBoost
# @alias normalizeTumorBoost.numeric
#
# @title "Normalizes allele B fractions for a tumor given a match normal"
#
# \description{
# TumorBoost [1] is a normalization method that normalizes the allele B
# fractions of a tumor sample given the allele B fractions and genotypes
# of a matched normal.
# The method is a single-sample (single-pair) method.
# It does not require total copy-number estimates.
# The normalization is done such that the total copy number is
# unchanged afterwards.
# }
#
# \usage{
# @usage normalizeTumorBoost,numeric
# }
#
# \arguments{
# \item{betaT, betaN}{Two @numeric @vectors each of length J with
# tumor and normal allele B fractions, respectively.}
# \item{muN}{An optional @vector of length J containing
# normal genotypes calls in (0,1/2,1,@NA) for (AA,AB,BB).}
# \item{preserveScale}{If @TRUE, SNPs that are heterozygous in the
# matched normal are corrected for signal compression using an estimate
# of signal compression based on the amount of correction performed
# by TumorBoost on SNPs that are homozygous in the matched normal.}
# \item{flavor}{A @character string specifying the type of
# correction applied.}
# \item{...}{Not used.}
# }
#
# \value{
# Returns a @numeric @vector of length J containing the normalized
# allele B fractions for the tumor.
# Attribute \code{modelFit} is a @list containing model fit parameters.
# }
#
# \details{
# Allele B fractions are defined as the ratio between the allele B signal
# and the sum of both (all) allele signals at the same locus.
# Allele B fractions are typically within [0,1], but may have a slightly
# wider support due to for instance negative noise.
# This is typically also the case for the returned normalized
# allele B fractions.
# }
#
# \section{Flavors}{
# This method provides a few different "flavors" for normalizing the
# data. The following values of argument \code{flavor} are accepted:
# \itemize{
# \item{v4: (default) The TumorBoost method, i.e. Eqns. (8)-(9) in [1].}
# \item{v3: Eqn (9) in [1] is applied to both heterozygous and homozygous
# SNPs, which effectively is v4 where the normalized allele B
# fractions for homozygous SNPs become 0 and 1.}
# \item{v2: ...}
# \item{v1: TumorBoost where correction factor is forced to one, i.e.
# \eqn{\eta_j=1}. As explained in [1], this is a suboptimal
# normalization method. See also the discussion in the
# paragraph following Eqn (12) in [1].}
# }
# }
#
# \section{Preserving scale}{
# \emph{As of \pkg{aroma.light} v1.33.3 (March 30, 2014),
# argument \code{preserveScale} no longer has a default value and has
# to be specified explicitly. This is done in order to change the
# default to @FALSE in a future version, while minimizing the risk
# for surprises.}
#
# Allele B fractions are more or less compressed toward a half, e.g.
# the signals for homozygous SNPs are slightly away from zero and one.
# The TumorBoost method decreases the correlation in allele B fractions
# between the tumor and the normal \emph{conditioned on the genotype}.
# What it does not control for is the mean level of the allele B fraction
# \emph{conditioned on the genotype}.
#
# By design, most flavors of the method will correct the homozygous SNPs
# such that their mean levels get close to the expected zero and
# one levels. However, the heterozygous SNPs will typically keep the
# same mean levels as before.
# One possibility is to adjust the signals such as the mean levels of
# the heterozygous SNPs relative to that of the homozygous SNPs is
# the same after as before the normalization.
#
# If argument \code{preserveScale=TRUE}, then SNPs that are heterozygous
# (in the matched normal) are corrected for signal compression using
# an estimate of signal compression based on the amount of correction
# performed by TumorBoost on SNPs that are homozygous
# (in the matched normal).
#
# The option of preserving the scale is \emph{not} discussed in the
# TumorBoost paper [1], which presents the \code{preserveScale=FALSE}
# version.
# }
#
# @examples "../incl/normalizeTumorBoost.Rex"
#
# @author "HB, PN"
#
# \references{
# [1] @include "../incl/BengtssonNeuvial_2010.bib.Rdoc" \cr
# }
#*/###########################################################################
setMethodS3("normalizeTumorBoost", "numeric", function(betaT, betaN, muN=callNaiveGenotypes(betaN), preserveScale=FALSE, flavor=c("v4", "v3", "v2", "v1"), ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'betaT' & 'betaN':
betaT <- as.numeric(betaT);
betaN <- as.numeric(betaN);
J <- length(betaT);
if (length(betaN) != J) {
stop("The length of arguments 'betaT' and 'betaN' differ: ", length(betaN), " != ", J);
}
# Argument: 'muN':
if (length(muN) != J) {
stop("Argument 'muN' does not match the number of loci: ", length(muN), " != ", J);
}
knownGenotypes <- c(0, 1/2, 1, NA);
unknown <- which(!is.element(muN, knownGenotypes));
n <- length(unknown);
if (n > 0) {
unknown <- unique(muN[unknown]);
stop("Argument 'muN' contains unknown values: ", hpaste(unknown));
}
# Argument: 'preserveScale':
preserveScale <- as.logical(preserveScale);
# Argument: 'flavor':
flavor <- match.arg(flavor);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Extract data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Identify set to be updated
toUpdate <- which(is.finite(betaT) & is.finite(betaN) & is.finite(muN));
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Estimate delta
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
delta <- (betaN - muN);
if (flavor == "v1") {
b <- 1;
} else if (flavor == "v2") {
b <- rep(1, times=length(delta));
isDown <- (betaT < betaN);
isBetaNZero <- (betaN == 0);
isBetaNOne <- (betaN == 1);
idxs <- which(isDown & !isBetaNZero);
b[idxs] <- betaT[idxs]/betaN[idxs];
idxs <- which(!isDown & !isBetaNOne);
b[idxs] <- (1-betaT[idxs])/(1-betaN[idxs]);
# Not needed anymore
isDown <- idxs <- NULL;
# Treat the case when the estimated SNP effect is zero
# Then we want the normalized value to be exactly zero or one.
idxs <- which(delta == 0);
} else if (flavor == "v3") {
b <- rep(1, times=length(delta));
isHomA <- (muN == 0);
isHomB <- (muN == 1);
isHet <- (!isHomA & !isHomB);
isDown <- (betaT < betaN);
isBetaNZero <- (betaN == 0);
isBetaNOne <- (betaN == 1);
idxs <- which((isHet & isDown & !isBetaNZero) | (isHomA & !isBetaNZero));
b[idxs] <- betaT[idxs]/betaN[idxs];
idxs <- which((isHet & !isDown & !isBetaNOne) | (isHomB & !isBetaNOne));
b[idxs] <- (1-betaT[idxs])/(1-betaN[idxs]);
# Not needed anymore
isDown <- isHet <- isHomA <- isHomB <- idxs <- NULL;
} else if (flavor == "v4") {
# This is the published TumorBoost normalization method
b <- rep(1, times=length(delta));
isHet <- (muN != 0 & muN != 1);
isDown <- (betaT < betaN);
idxs <- which(isHet & isDown);
b[idxs] <- betaT[idxs]/betaN[idxs];
idxs <- which(isHet & !isDown);
b[idxs] <- (1-betaT[idxs])/(1-betaN[idxs]);
# Not needed anymore
isDown <- isHet <- idxs <- NULL;
}
delta <- b * delta;
# Sanity check
stopifnot(length(delta) == J);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Normalize
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# In very rare cases delta can be non-finite while betaT is.
# This can happen whenever muN or betaN is non-finite. Then:
# ok <- is.finite(delta);
# ok <- which(ok);
# betaTN[ok] <- betaT[ok] - delta[ok];
# It can be debated whether one should correct a SNP in this case, for
# which betaTN then become non-finite too. If not correcting, we will
# end up with betaTN == betaT value which is not from the same mixture
# distribution as the other corrected values.
# For now, we ignore this. /HB 2010-08-19 [on the flight to SFO]
betaTN <- betaT - delta;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Preserve scale of heterozygotes relative to homozygotes
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (preserveScale) {
isHom <- (muN == 0 | muN == 1);
idxs <- which(isHom);
# Signal compression in homozygous SNPs before TBN
eta <- median(abs(betaT[idxs]-1/2), na.rm=TRUE);
# Signal compression in homozygous SNPs after TBN
etaC <- median(abs(betaTN[idxs]-1/2), na.rm=TRUE);
# Correction factor
sf <- etaC/eta;
# Correct
isHet <- !isHom;
isDown <- (betaTN < 1/2);
idxs <- which(isHet & isDown);
betaTN[idxs] <- 1/2 - sf * (1/2 - betaTN[idxs]);
idxs <- which(isHet & !isDown);
betaTN[idxs] <- 1/2 + sf * (betaTN[idxs] - 1/2);
# Not needed anymore
isDown <- isHom <- isHet <- idxs <- eta <- etaC <- NULL;
} else {
sf <- NA_real_;
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Return normalized data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
modelFit <- list(
method = "normalizeTumorBoost",
flavor = flavor,
delta = delta,
preserveScale = preserveScale,
scaleFactor = sf
);
attr(betaTN, "modelFit") <- modelFit;
# Sanity check
stopifnot(length(betaTN) == J);
betaTN;
}) # normalizeTumorBoost()
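# A minimal, simulated sketch of normalizeTumorBoost() (not part of the
# package code; guarded by if (FALSE) so it is never evaluated when this file
# is sourced). Genotype frequencies and noise levels are invented for
# illustration; see ../incl/normalizeTumorBoost.Rex for the package example.
if (FALSE) {
  set.seed(7)
  J <- 10000
  muN <- sample(c(0, 1/2, 1), size=J, replace=TRUE)   # normal genotypes
  betaN <- muN + rnorm(J, sd=0.05)                    # normal allele B fractions
  betaT <- muN + rnorm(J, sd=0.10)                    # tumor allele B fractions
  betaTN <- normalizeTumorBoost(betaT=betaT, betaN=betaN, muN=muN,
                                preserveScale=FALSE)
  fitTBN <- attr(betaTN, "modelFit")                  # flavor, delta, etc.
}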
############################################################################
# HISTORY:
# 2014-03-30
# o Argument 'preserveScale' for normalizeTumorBoost() is now required.
# o Swapped the order of argument 'flavor' and 'preserveScale'.
# 2010-09-23
# o CLEANUP: normalizeTumorBoost() now uses which() instead of
# whichVector() of 'R.utils'. The former used to be significantly
# slower than the latter, but that is no longer the case.
# 2010-08-04
# o Added argument 'preserveScale' to normalizeTumorBoost().
# 2010-03-18
# o BUG FIX: For flavors "v2" and "v3" NaN:s could be introduced if betaN
# was exactly zero or exactly one.
# 2009-07-08
# o Now the arguments are 'betaT', 'betaN' and 'muN'.
# o Added an example() with real data.
# 2009-07-06
# o Created from process() of TumorBoostNormalization in aroma.cn.
# o Added model 'flavor' "v4" which corrects heterozygots according to "v2"
# and homozygotes according to "v1".
# o Added model 'flavor' "v3". Suggested by PN last night over a Guinness
# at the pub after a long day of hard work.
# 2009-06-22
# o Added model 'flavor' "v2".
# 2009-06-08
# o The constructor of TumorBoostNormalization now only takes an
# AromaUnitGenotypeCallSet for argument 'gcN'. It no longer takes an
# AromaUnitFracBCnBinarySet object.
# 2009-05-17
# o Now the constructor of TumorBoostNormalization asserts that there are
# no stray arguments.
# 2009-04-29
# o Created.
############################################################################
aroma.light/R/plotMvsA.R 0000644 0001750 0001750 00000004477 14136047216 014722 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric plotMvsA
# @alias plotMvsA.matrix
#
# @title "Plot log-ratios vs log-intensities"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage plotMvsA,matrix
# }
#
# \arguments{
# \item{X}{Nx2 @matrix with two channels and N observations.}
# \item{Alab,Mlab}{Labels on the x and y axes.}
# \item{Alim,Mlim}{Plot range on the A and M axes.}
# \item{aspectRatio}{Aspect ratio between \code{Mlim} and \code{Alim}.}
# \item{pch}{Plot symbol used.}
# \item{...}{Additional arguments accepted by @see "graphics::points".}
# \item{add}{If @TRUE, data points are plotted in the current plot,
# otherwise a new plot is created.}
# }
#
# \details{
# Red channel is assumed to be in column one and green in column two.
# Log-ratios are calculated by taking channel one over channel two.
# }
#
# \value{
# Returns nothing.
# }
#
# @author "HB"
#*/#########################################################################
setMethodS3("plotMvsA", "matrix", function(X, Alab="A", Mlab="M", Alim=c(0,16), Mlim=c(-1,1)*diff(Alim)*aspectRatio, aspectRatio=1, pch=".", ..., add=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'X':
if (ncol(X) != 2) {
throw("Argument 'X' must have exactly two columns: ", ncol(X));
}
if (!add) {
plot(NA, xlab=Alab, ylab=Mlab, xlim=Alim, ylim=Mlim);
}
# Do not plot (or generate false) M vs A for non-positive signals.
X[X <= 0] <- NA;
R <- as.double(X[,1]);
G <- as.double(X[,2]);
M <- log(R/G, base=2);
A <- log(R*G, base=2)/2;
points(A,M, pch=pch, ...);
})
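# A minimal, simulated sketch of plotMvsA() (not part of the package code;
# guarded by if (FALSE) so it is never evaluated when this file is sourced).
# The channel offsets and scales below are invented for illustration.
if (FALSE) {
  set.seed(8)
  N <- 5000
  z <- rexp(N, rate=1/3000)
  X <- cbind(R=50 + 1.6*z, G=30 + 1.0*z)
  plotMvsA(X)                            # new plot
  plotMvsA(X/2, add=TRUE, col="red")     # overlay a rescaled copy of the data
}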
############################################################################
# HISTORY:
# 2011-06-26
# o Added argument 'aspectRatio' to plotMvsA(). It can be used to adjust
# the range of the 'Mlim' argument relative to the 'Alim' argument.
# 2005-09-06
# o Coercing to doubles to avoid overflow when multiplying to integers.
# o Now non-positive signals are excluded.
# 2005-06-11
# o BUG FIX: Used 'rg' instead of 'X' in R <- rg[,1] and G <- rg[,2].
# 2005-06-03
# o Created from the normalizeQuantile.matrix.Rex example.
############################################################################
aroma.light/R/distanceBetweenLines.R 0000644 0001750 0001750 00000010011 14136047216 017231 0 ustar nilesh nilesh #########################################################################/**
# @RdocDefault distanceBetweenLines
#
# @title "Finds the shortest distance between two lines"
#
# \description{
# @get "title".
#
# Consider the two lines
#
# \eqn{x(s) = a_x + b_x*s} and \eqn{y(t) = a_y + b_y*t}
#
# in an K-space where the offset and direction @vectors are \eqn{a_x}
# and \eqn{b_x} (in \eqn{R^K}) that define the line \eqn{x(s)}
# (\eqn{s} is a scalar). Similar for the line \eqn{y(t)}.
# This function finds the point \eqn{(s,t)} for which \eqn{|x(s)-x(t)|}
# is minimal.
# }
#
# @synopsis
#
# \arguments{
# \item{ax,bx}{Offset and direction @vector of length K for line \eqn{z_x}.}
# \item{ay,by}{Offset and direction @vector of length K for line \eqn{z_y}.}
# \item{...}{Not used.}
# }
#
# \value{
# Returns the a @list containing
# \item{ax,bx}{The given line \eqn{x(s)}.}
# \item{ay,by}{The given line \eqn{y(t)}.}
# \item{s,t}{The values of \eqn{s} and \eqn{t} such that
# \eqn{|x(s)-y(t)|} is minimal.}
# \item{xs,yt}{The values of \eqn{x(s)} and \eqn{y(t)}
# at the optimal point \eqn{(s,t)}.}
# \item{distance}{The distance between the lines, i.e. \eqn{|x(s)-y(t)|}
# at the optimal point \eqn{(s,t)}.}
# }
#
# @author "HB"
#
# @examples "../incl/distanceBetweenLines.Rex"
#
# \references{
# [1] M. Bard and D. Himel, \emph{The Minimum Distance Between Two
# Lines in n-Space}, September 2001, Advisor Dennis Merino.\cr
# [2] Dan Sunday, \emph{Distance between 3D Lines and Segments},
# Jan 2016, \url{https://www.geomalgorithms.com/algorithms.html}.\cr
# }
#
# @keyword "algebra"
#*/#########################################################################
setMethodS3("distanceBetweenLines", "default", function(ax, bx, ay, by, ...) {
if (length(ax) != length(bx)) {
stop(sprintf("The length of the offset vector 'ax' (%d) and direction vector 'bx' (%d) are not equal.", length(ax), length(bx)));
}
if (length(ay) != length(by)) {
stop(sprintf("The length of the offset vector 'ay' (%d) and direction vector 'by' (%d) are not equal.", length(ay), length(by)));
}
if (length(ax) != length(ay)) {
stop(sprintf("The line x(s) and y(t) are of different dimensions: %d vs %d", length(ax), length(ay)));
}
if (length(ax) <= 1)
stop(sprintf("The lines must be in two or more dimensions: %d", length(ax)));
ax <- as.vector(ax);
bx <- as.vector(bx);
ay <- as.vector(ay);
by <- as.vector(by);
if (length(ax) == 2) {
# Find (s,t) such that x(s) == y(t) where
# x(s) = a_x + b_x*s
# y(t) = a_y + b_y*t
e <- (ax-ay);
f <- e/by;
g <- bx/by;
s <- (f[2]-f[1])/(g[1]-g[2]);
t <- f[1] + g[1]*s;
d <- 0;
} else {
# Consider the two lines in an K-space
# x(s) = a_x + b_x*t (line 1)
# y(t) = a_y + b_y*s (line 2)
# where s and t are scalars and the other vectors in R^K.
# Some auxiliary calculations
A <- sum(bx*bx);
B <- 2*(sum(bx*ax)-sum(bx*ay));
C <- 2*sum(bx*by);
D <- 2*(sum(by*ay)-sum(by*ax));
E <- sum(by*by);
F <- sum(ax*ax) + sum(ay*ay);
# Shortest distance between the two lines (points)
G <- C^2-4*A*E;
d2 <- (B*C*D+B^2*E+C^2*F+A*(D^2-4*E*F))/G;
d <- sqrt(d2);
# The points that are closest to each other.
t <- (2*A*D+B*C)/G; # t is on y(t)
s <- (C*t-B)/(2*A); # s is on x(s)
}
# Get the coordinates of the two points on x(s) and y(t) that
# are closest to each other.
xs <- ax + bx*s;
yt <- ay + by*t;
list(ax=ax, bx=bx, ay=ay, by=by, s=s, t=t, xs=xs, yt=yt, distance=d);
}) # distanceBetweenLines()
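# A minimal sketch of distanceBetweenLines() in three dimensions (not part of
# the package code; guarded by if (FALSE) so it is never evaluated when this
# file is sourced); see ../incl/distanceBetweenLines.Rex for the package
# example.
if (FALSE) {
  # x(s) = (0,0,0) + s*(1,0,0): the x axis
  # y(t) = (0,0,1) + t*(0,1,0): parallel to the y axis, lifted one unit in z
  fit <- distanceBetweenLines(ax=c(0,0,0), bx=c(1,0,0),
                              ay=c(0,0,1), by=c(0,1,0))
  fit$distance   # 1 - the shortest distance between the two lines
  fit$xs         # c(0,0,0) - the closest point on x(s)
  fit$yt         # c(0,0,1) - the closest point on y(t)
}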
############################################################################
# HISTORY:
# 2005-06-03
# o Made into a default method.
# 2003-12-29
# o Added Rdoc comments.
# o Created by generalizing from formet RGData$fitIWPCA() in com.braju.smax.
############################################################################
aroma.light/R/plotMvsMPairs.R 0000644 0001750 0001750 00000004160 14136047216 015722 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric plotMvsMPairs
# @alias plotMvsMPairs.matrix
#
# @title "Plot log-ratios vs log-ratios for all pairs of columns"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage plotMvsMPairs,matrix
# }
#
# \arguments{
# \item{X}{Nx2K @matrix where N is the number of observations and
# 2K is an even number of channels.}
# \item{xlab,ylab}{Labels on the x and y axes.}
# \item{xlim,ylim}{Plot range on the x and y axes.}
# \item{pch}{Plot symbol used.}
# \item{...}{Additional arguments accepted by @see "graphics::points".}
# \item{add}{If @TRUE, data points are plotted in the current plot,
# otherwise a new plot is created.}
# }
#
# \details{
# Log-ratios are calculated over paired columns, e.g. columns 1 and 2,
# columns 3 and 4, and so on.
# }
#
# \value{
# Returns nothing.
# }
#
# @author "HB"
#*/#########################################################################
setMethodS3("plotMvsMPairs", "matrix", function(X, xlab="M", ylab="M", xlim=c(-1,1)*6, ylim=xlim, pch=".", ..., add=FALSE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument 'X':
if (ncol(X)/2 != round(ncol(X)/2))
throw("Argument 'X' must have an even number of columns: ", ncol(X));
if (!add) {
plot(NA, xlab=xlab, ylab=ylab, xlim=xlim, ylim=ylim);
}
# Do not plot (or generate false) log-ratios for non-positive signals.
X[X <= 0] <- NA;
npairs <- ncol(X)/2;
for (kk in seq_len(npairs-1)) {
R <- X[,2*kk-1];
G <- X[,2*kk];
M1 <- log(R/G, base=2);
R <- X[,2*(kk+1)-1];
G <- X[,2*(kk+1)];
M2 <- log(R/G, base=2);
points(M1,M2, pch=pch, ...);
}
})
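# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Editor's sketch (not from the original sources): simulated usage of
# plotMvsMPairs() on a four-channel matrix, i.e. two (R,G) channel pairs.
# Data and plot settings are illustrative; wrapped in `if (FALSE)` so it is
# not run when the file is sourced.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (FALSE) {
  set.seed(1)
  # Two channel pairs sharing a common underlying signal with multiplicative noise
  x <- rexp(1000, rate=1/1000)
  X <- cbind(x*1.0, x*1.2, x*0.9, x*1.1) * matrix(2^rnorm(4000, sd=0.2), ncol=4)
  plotMvsMPairs(X)                                 # M of pair 1 vs M of pair 2
  plotMvsMPairs(X[1:100,], pch=20, col="red", add=TRUE)  # overlay a subset
}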
############################################################################
# HISTORY:
# 2005-09-06
# o Now non-positive signals are excluded.
# 2005-06-11
# o BUG FIX: Used 'rg' instead of 'X'.
# 2005-06-03
# o Created from the normalizeQuantile.matrix.Rex example.
############################################################################
aroma.light/R/fitXYCurve.R 0000644 0001750 0001750 00000016315 14136047216 015217 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric fitXYCurve
# @alias fitXYCurve.matrix
# @alias backtransformXYCurve
# @alias backtransformXYCurve.matrix
#
# @title "Fitting a smooth curve through paired (x,y) data"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage fitXYCurve,matrix
# }
#
# \arguments{
# \item{X}{An Nx2 @matrix where the columns represent the two channels
# to be normalized.}
# \item{weights}{If @NULL, non-weighted normalization is done.
# If data-point weights are used, this should be a @vector of length
# N of data point weights used when estimating the normalization
# function.
# }
# \item{typeOfWeights}{A @character string specifying the type of
# weights given in argument \code{weights}.
# }
# \item{method}{@character string specifying which method to use when
# fitting the intensity-dependent function.
# Supported methods:
# \code{"loess"} (better than lowess),
# \code{"lowess"} (classic; supports only zero-one weights),
# \code{"spline"} (more robust than lowess at lower and upper
# intensities; supports only zero-one weights),
# \code{"robustSpline"} (better than spline).
# }
# \item{bandwidth}{A @double value specifying the bandwidth of the
# estimator used.
# }
# \item{satSignal}{Signals equal to or above this threshold will not
# be used in the fitting.
# }
# \item{...}{Not used.}
# }
#
# \value{
# A named @list structure of class \code{XYCurveFit}.
# }
#
# \section{Missing values}{
# The estimation of the function will only be made based on complete
# non-saturated observations, i.e. observations that contain no @NA
# values nor saturated values as defined by \code{satSignal}.
# }
#
# \section{Weighted normalization}{
# Each data point, that is, each row in \code{X}, which is a
# vector of length 2, can be assigned a weight in [0,1] specifying how much
# it should \emph{affect the fitting of the normalization function}.
# Weights are given by argument \code{weights}, which should be a @numeric
# @vector of length N.
#
# Note that the lowess and the spline method only support zero-one
# \{0,1\} weights.
# For such methods, all weights that are less than a half are set to zero.
# }
#
# \section{Details on loess}{
# For @see "stats::loess", the arguments \code{family="symmetric"},
# \code{degree=1}, \code{span=3/4},
# \code{control=loess.control(trace.hat="approximate"},
# \code{iterations=5}, \code{surface="direct")} are used.
# }
#
# @author "HB"
#
# \examples{
# @include "../incl/fitXYCurve.matrix.Rex"
# }
#*/#########################################################################
setMethodS3("fitXYCurve", "matrix", function(X, weights=NULL, typeOfWeights=c("datapoint"), method=c("loess", "lowess", "spline", "robustSpline"), bandwidth=NULL, satSignal=2^16-1, ...) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Verify the arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Argument: 'X'
if (ncol(X) != 2) {
stop("Curve-fit normalization requires two channels only: ", ncol(X));
}
if (nrow(X) < 3) {
stop("Curve-fit normalization requires at least three observations: ", nrow(X));
}
# Argument: 'satSignal'
if (satSignal < 0) {
stop("Argument 'satSignal' is negative: ", satSignal);
}
# Argument: 'method'
method <- match.arg(method);
zeroOneWeightsOnly <- (method %in% c("lowess", "spline"));
# Argument: 'typeOfWeights'
typeOfWeights <- match.arg(typeOfWeights);
# Argument: 'weights'
datapointWeights <- NULL;
if (!is.null(weights)) {
# If 'weights' is an object of a class with as.double(), cast it.
weights <- as.double(weights);
if (anyMissing(weights))
stop("Argument 'weights' must not contain NA values.");
if (any(weights < 0 | weights > 1)) {
stop("Argument 'weights' out of range [0,1]: ", paste(weights[weights < 0.0 | weights > 1.0], collapse=", "));
}
if (zeroOneWeightsOnly && any(weights > 0 & weights < 1)) {
weights <- round(weights);
warning("Weights were rounded to {0,1} since '", method, "' normalization supports only zero-one weights.");
}
weights <- as.vector(weights);
if (length(weights) == 1) {
weights <- rep(weights, length.out=nrow(X));
} else if (length(weights) != nrow(X)) {
stop("Argument 'weights' does not have the same length as the number of data points (rows) in the matrix: ", length(weights), " != ", nrow(X));
}
datapointWeights <- weights;
} # if (!is.null(weights))
# Argument: 'bandwidth'
if (is.null(bandwidth)) {
bandwidths <- c("loess"=0.75, "lowess"=0.3, "robustSpline"=0.75,
"spline"=0.75);
bandwidth <- bandwidths[method];
} else if (!is.numeric(bandwidth) || length(bandwidth) != 1) {
stop("Argument 'bandwidth' must be a single numeric value: ", paste(bandwidth, collapse=", "));
} else if (bandwidth <= 0 || bandwidth > 1) {
stop("Argument 'bandwidth' must be in (0,1]: ", bandwidth);
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Prepare data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Use only finite, non-saturated observations to estimate the
# normalization function
isValid <- (is.finite(X) & (X <= satSignal));
isValid <- (isValid[,1] & isValid[,2]);
X <- X[isValid,];
x <- X[,1,drop=TRUE];
y <- X[,2,drop=TRUE];
if (!is.null(datapointWeights)) {
datapointWeights <- datapointWeights[isValid];
}
# Not needed anymore
X <- isValid <- NULL;
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 3. Fit the curve
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (method == "lowess") {
keep <- if (!is.null(datapointWeights)) (datapointWeights > 0) else TRUE;
x <- x[keep];
y <- y[keep];
# Not needed anymore
keep <- NULL;
fit <- lowess(x=x, y=y, f=bandwidth, ...);
fit$predictY <- function(x) approx(fit, xout=x, ties=mean)$y;
} else if (method == "loess") {
fit <- loess(formula=y ~ x, weights=datapointWeights,
family="symmetric", degree=1, span=bandwidth,
control=loess.control(trace.hat="approximate",
iterations=5, surface="direct"), ...);
fit$predictY <- function(x) predict(fit, newdata=x);
} else if (method == "spline") {
keep <- if (!is.null(datapointWeights)) (datapointWeights > 0) else TRUE;
x <- x[keep];
y <- y[keep];
# Not needed anymore
keep <- NULL;
fit <- smooth.spline(x=x, y=y, spar=bandwidth, ...);
fit$predictY <- function(x) predict(fit, x=x)$y;
} else if (method == "robustSpline") {
fit <- robustSmoothSpline(x=x, y=y, w=datapointWeights, spar=bandwidth, ...);
fit$predictY <- function(x) predict(fit, x=x)$y;
}
class(fit) <- c("XYCurveFit", class(fit));
fit;
}) # fitXYCurve()
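# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Editor's sketch (not from the original sources): fitting a smooth curve
# through simulated (x,y) signals and drawing it with lines() for the
# XYCurveFit class (see lines.XYCurveFit.R). Data and parameters are
# illustrative; wrapped in `if (FALSE)` so it is not run when sourced.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (FALSE) {
  set.seed(42)
  x <- rexp(2000, rate=1/1000)
  y <- 50 + 1.4*x + rnorm(2000, sd=0.05*x + 10)   # intensity-dependent noise
  X <- cbind(x, y)
  fit <- fitXYCurve(X, method="loess")
  plot(X, pch=".")
  lines(fit, col="red", lwd=2)
  # The fit object carries a predictY() function for new x values
  fit$predictY(c(500, 1000, 2000))
}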
############################################################################
# HISTORY:
# 2013-10-08
# o DOCUMENTATION: Added backtransformXYCurve() alias to this help page.
# 2013-09-26
# o Now utilizing anyMissing().
# 2009-07-15
# o Created from normalizeCurveFit.R.
############################################################################
aroma.light/R/backtransformAffine.R 0000644 0001750 0001750 00000014461 14136047216 017114 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric backtransformAffine
# @alias backtransformAffine.matrix
#
# @title "Reverse affine transformation"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage backtransformAffine,matrix
# }
#
# \arguments{
# \item{X}{An NxK @matrix containing data to be backtransformed.}
# \item{a}{A scalar, @vector, a @matrix, or a @list.
# First, if a @list, it is assumed to contain the elements \code{a}
# and \code{b}, which are then used as if they were passed as separate
# arguments.
# If a @vector, a matrix of size NxK is created which is then filled
# \emph{row by row} with the values in the vector. Commonly, the
# vector is of length K, which means that the matrix will consist of
# copies of this vector stacked on top of each other.
# If a @matrix, a matrix of size NxK is created which is then filled
# \emph{column by column} with the values in the matrix (collected
# column by column). Commonly, the matrix is of size NxK, or NxL with
# L < K and then the resulting matrix consists of copies sitting
# next to each other.
# The resulting NxK matrix is subtracted from the NxK matrix \code{X}.
# }
# \item{b}{A scalar, @vector, a @matrix.
# A NxK matrix is created from this argument. For details see
# argument \code{a}.
# The NxK matrix \code{X-a} is divided by the resulting NxK matrix.
# }
# \item{project}{
# If @FALSE, the backtransformed NxK @matrix is
# returned (K values per data point are returned).
# If @TRUE, the backtransformed values "\code{(X-a)/b}" are projected
# onto the line L(a,b) so that all columns
# will be identical.
# }
# \item{...}{Not used.}
# }
#
# \value{
# The "\code{(X-a)/b}" backtransformed NxK @matrix is returned.
# If \code{project} is @TRUE, an Nx1 @matrix is returned, because
# all columns are identical anyway.
# }
#
# \section{Missing values}{
# Missing values remain missing values. If projected, data points that
# contain missing values are projected without these.
# }
#
# @examples "../incl/backtransformAffine.matrix.Rex"
#
#*/#########################################################################
setMethodS3("backtransformAffine", "matrix", function(X, a=NULL, b=NULL, project=FALSE, ...) {
# Dimensions of 'X'
nobs <- nrow(X);
ndims <- ncol(X);
if (ndims == 1L) {
stop("Can not fit affine multiscan model. Matrix must contain at least two columns (scans): ", ndims);
}
# If argument 'a' is a list assume it contains the elements 'a' and 'b'.
if (is.list(a)) {
b <- a$b;
a <- a$a;
}
# If 'a' and/or 'b' are vector convert them to row matrices.
if (is.vector(a)) {
# Create a full matrix and filled row by row with 'a'
a <- matrix(a, nrow=nobs, ncol=ndims, byrow=TRUE);
} else if (is.matrix(a)) {
# Create a full matrix and filled column by column by the columns in 'a'
t <- a;
a <- matrix(NA_real_, nrow=nobs, ncol=ndims);
for (cc in 1:ndims) {
# Loop over the columns in a0 too.
col <- ((cc-1) %% ncol(t)) + 1L;
value <- rep(t[,col], length.out=nobs);
a[,cc] <- value;
}
# Not needed anymore
t <- NULL;
} else if (!is.null(a)) {
stop(paste("Unknown data type of argument 'a':", class(a)[1]));
}
if (!project) {
if (is.vector(b)) {
# Create a full matrix and filled row by row with 'b'
b <- matrix(b, nrow=nobs, ncol=ndims, byrow=TRUE);
} else if (is.matrix(b)) {
# Create a full matrix and filled column by column by the columns in 'b'
t <- b;
b <- matrix(NA_real_, nrow=nobs, ncol=ndims);
for (cc in 1:ndims) {
# Recycle the columns of the original 'b' if it has fewer than K columns.
col <- ((cc-1) %% ncol(t)) + 1L;
value <- rep(t[,col], length.out=nobs);
b[,cc] <- value;
}
} else if (!is.null(b)) {
stop(paste("Unknown data type of argument 'b':", class(b)[1]));
}
}
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Subtract the bias and rescale
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (!is.null(a))
X <- X - a;
########################################################################
# Alternative 2
#
# i) Translate the fitted line L to L' such that L' goes through
# (a,a,...,a) and project the data y onto L' to obtain ytilde
# ii) from which we can calculate xtilde = (ytilde - a) / b
# iii) Since all xtilde are the same take the first component to be our
# estimate xhat in y = a + b*xhat.
########################################################################
if (project) {
# In theory:
# ytilde <- projectUontoV(y-a,b) + a;
# xtilde <- (ytilde-a)/b;
# In practice:
X <- t(X);
# 'b' standardized to a unit vector 'v' such that sum(v^2) == 1.
v <- b / sqrt(sum(b^2));
# projectUontoV(): U should be an NxK matrix and v an N vector.
X <- projectUontoV(X,v, na.rm=TRUE);
X <- X[1,] / b[1]; # Note that here 'b' is a vector!
} else {
if (!is.null(b))
X <- X / b; # Note that here 'b' is a matrix!
}
as.matrix(X);
}) # backtransformAffine()
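# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Editor's sketch (not from the original sources): reversing a known affine
# transform Y = a + b*x channel by channel, and with project=TRUE to obtain a
# single backtransformed column. Offsets and scale factors are illustrative
# assumptions; wrapped in `if (FALSE)` so it is not run when sourced.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (FALSE) {
  set.seed(7)
  x <- rexp(500, rate=1/1000)
  a <- c(100, 400); b <- c(1.0, 1.6)
  Y <- cbind(a[1] + b[1]*x, a[2] + b[2]*x)
  X1 <- backtransformAffine(Y, a=a, b=b)               # 500x2; both columns equal x
  X2 <- backtransformAffine(Y, a=a, b=b, project=TRUE) # 500x1; projected onto L(a,b)
  stopifnot(all.equal(X1[,1], x), all.equal(as.vector(X2), x))
}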
############################################################################
# HISTORY:
# 2011-04-12
# o Now using as.double(NA) instead of NA.
# 2006-06-03
# o Minor to merge two different threads of this code.
# o Method passes the tests in the example code (again).
# 2005-01-24
# o Now missing values are excluded before projection.
# 2005-01-08
# o Now, if project is TRUE, only one column is returned.
# o Now the method is guaranteed to return a matrix by calling as.matrix().
# o The average of projected data is now on the same scale as the average of
# non-projected data. Before the data was max(b) times too large.
# o Added argument 'project' to the Rdoc comments.
# o Added test in example to assert that the same matrix is returned if
# projection on an identity transformation is applied.
# 2004-06-28
# o BUG FIX: Applied projection when project was FALSE and vice versa.
# Projection did not work because 'b' was expanded into a full matrix.
# Now this is only done if project == FALSE.
# 2004-05-14
# o Created. Extracted and generalize code from calibrateMultiscan(),
# normalizeAffine() and calibrateMultiscanSpatial().
############################################################################
aroma.light/R/medianPolish.R 0000644 0001750 0001750 00000012613 14136047216 015560 0 ustar nilesh nilesh #########################################################################/**
# @RdocGeneric medianPolish
# @alias medianPolish.matrix
#
# @title "Median polish"
#
# \description{
# @get "title".
# }
#
# \usage{
# @usage medianPolish,matrix
# }
#
# \arguments{
# \item{X}{N-times-K @matrix}
# \item{tol}{A @numeric value greater than zero used as a threshold
# to identify when the algorithm has converged.}
# \item{maxIter}{Maximum number of iterations.}
# \item{na.rm}{If @TRUE (@FALSE), @NAs are excluded (not excluded).
# If @NA, it is assumed that \code{X} contains no @NA values.}
# \item{.addExtra}{If @TRUE, the name of argument \code{X} is returned
# and the returned structure is assigned a class. This will make the
# result compatible with what @see "stats::medpolish" returns.}
# \item{...}{Not used.}
# }
#
# \value{
# Returns a named @list structure with elements:
# \item{overall}{The fitted constant term.}
# \item{row}{The fitted row effect.}
# \item{col}{The fitted column effect.}
# \item{residuals}{The residuals.}
# \item{converged}{If @TRUE, the algorithm converged, otherwise not.}
# }
#
# \details{
# The implementation of this method give identical estimates as
# @see "stats::medpolish", but is about 3-5 times more efficient when
# there are no @NA values.
# }
#
# @author "HB"
#
# @examples "../incl/medianPolish.matrix.Rex"
#
# \seealso{
# @see "stats::medpolish".
# }
#
# @keyword algebra
#*/#########################################################################
setMethodS3("medianPolish", "matrix", function(X, tol=0.01, maxIter=10L, na.rm=NA, ..., .addExtra=TRUE) {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Local functions
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ns <- getNamespace("matrixStats")
.psortKM <- get(".psortKM", mode="function", envir=ns);
dim <- dim(X);
nrow <- dim[1L];
ncol <- dim[2L];
if (.addExtra) {
name <- deparse(substitute(X));
}
# Overall effects
t <- 0;
# Row effects
r <- vector("double", length=nrow);
# Column effects
c <- vector("double", length=ncol);
hasNa <- (!is.na(na.rm) && anyMissing(X));
if (hasNa) {
oldSum <- 0;
for (ii in 1:maxIter) {
# Fit the row effects
rdelta <- rowMedians(X, na.rm=na.rm);
X <- X - rdelta;
r <- r + rdelta;
# Fit the overall effects
delta <- median(c, na.rm=na.rm);
c <- c - delta;
t <- t + delta;
# Fit the column effects
cdelta <- colMedians(X, na.rm=na.rm);
X <- X - matrix(cdelta, nrow=nrow, ncol=ncol, byrow=TRUE);
c <- c + cdelta;
# Fit the overall effects
delta <- median(r, na.rm=na.rm);
r <- r - delta;
t <- t + delta;
# Check for convergence
newSum <- sum(abs(X), na.rm=na.rm);
converged <- (newSum == 0 || abs(newSum - oldSum) < tol * newSum);
if (converged)
break;
oldSum <- newSum;
} # for (ii ...)
} else {
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Optimized code for the case where there are no NAs
#
# Compared to medpolish(..., na.rm=FALSE), this version is about
# 3-4 times faster.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rhalf <- (nrow+1) %/% 2;
if (nrow %% 2 == 1) {
# Get x(rhalf), where x(k) is k:th sorted value in x.
rMedian <- function(x) .psortKM(x, k=rhalf);
} else {
# Average x(rhalf) and x(rhalf+1).
rMedian <- function(x) sum(.psortKM(x, k=rhalf+1L, m=2L))/2;
}
chalf <- (ncol+1) %/% 2;
if (ncol %% 2 == 1) {
# Get x(chalf), where x(k) is k:th sorted value in x.
cMedian <- function(x) .psortKM(x, k=chalf);
} else {
# Average x(chalf) and x(chalf+1).
cMedian <- function(x) sum(.psortKM(x, k=chalf+1L, m=2L))/2;
}
oldSum <- 0;
for (ii in 1:maxIter) {
# Fit the row effects
rdelta <- apply(X, MARGIN=1L, FUN=cMedian);
X <- X - rdelta;
r <- r + rdelta;
# Fit the overall effects
delta <- cMedian(c);
c <- c - delta;
t <- t + delta;
# Fit the column effects
cdelta <- apply(X, MARGIN=2L, FUN=rMedian);
X <- X - matrix(cdelta, nrow=nrow, ncol=ncol, byrow=TRUE);
c <- c + cdelta;
# Fit the overall effects
delta <- rMedian(r);
r <- r - delta;
t <- t + delta;
# Check for convergence
newSum <- sum(abs(X), na.rm=FALSE);
converged <- (newSum == 0 || abs(newSum - oldSum) < tol * newSum);
if (converged)
break;
oldSum <- newSum;
} # for (ii ...)
}
res <- list(overall=t, row=r, col=c, residuals=X, converged=converged);
if (.addExtra) {
res$name <- name;
class(res) <- c("medianPolish", "medpolish");
}
res;
}) # medianPolish()
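# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Editor's sketch (not from the original sources): comparing medianPolish()
# with stats::medpolish() on a small complete (NA-free) matrix. The simulated
# data are illustrative; wrapped in `if (FALSE)` so it is not run when sourced.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (FALSE) {
  set.seed(11)
  X <- matrix(rpois(8*5, lambda=100), nrow=8, ncol=5)
  mp1 <- medianPolish(X)
  mp2 <- stats::medpolish(X, trace.iter=FALSE)
  # The estimates should agree (both use the same convergence tolerance)
  all.equal(mp1$overall, mp2$overall)
  all.equal(mp1$row, mp2$row)
  all.equal(mp1$col, mp2$col)
}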
############################################################################
# HISTORY:
# 2013-09-26
# o CLEANUP: No longer utilizes ':::'.
# o SPEEDUP: Now utilizing anyMissing() and (col|row)Medians() of the
# 'matrixStats' package.
# o TWEAKS: Using integer (e.g. 1L) where possible
# 2012-09-12
# o ROBUSTNESS: Replaced an .Internal(psort(...)) call with a call to
# matrixStats:::.psortKM() in medianPolish().
# 2012-04-16
# o Added local function psortGet() to medianPolish().
# 2006-05-16
# o Created from stats::medpolish().
############################################################################
aroma.light/R/sampleTuples.R 0000644 0001750 0001750 00000003315 14136047216 015621 0 ustar nilesh nilesh #########################################################################/**
# @RdocDefault sampleTuples
#
# @title "Sample tuples of elements from a set"
#
# \description{
# @get "title".
# The elements within a sampled tuple are unique, i.e. no two elements
# are the same.
# }
#
# @synopsis
#
# \arguments{
# \item{x}{A set of elements to sample from.}
# \item{size}{The number of tuples to sample.}
# \item{length}{The length of each tuple.}
# \item{...}{Additional arguments passed to @see "base::sample".}
# }
#
# \value{
# Returns an NxK @matrix where N = \code{size} and K = \code{length}.
# }
#
# @author "HB"
#
# @examples "../incl/sampleTuples.Rex"
#
# \seealso{
# @see "base::sample".
# }
#
# @keyword utilities
#*/#########################################################################
setMethodS3("sampleTuples", "default", function(x, size, length, ...) {
# Argument 'x':
if (length(x) < 1L)
throw("Argument 'x' must be a vector of length one or greater.");
# Argument 'size':
if (size < 0)
throw("Argument 'size' must be a non-negative integer: ", size);
# Argument 'length':
if (length < 1)
throw("Argument 'length' must be one or greater: ", length);
# Sample tuples
naValue <- NA;
storage.mode(naValue) <- storage.mode(x);
tuples <- matrix(naValue, nrow=size, ncol=length);
for (kk in seq_len(size)) {
tuples[kk,] <- sample(x, size=length, ...);
}
tuples;
})
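# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Editor's sketch (not from the original sources): drawing a few random
# tuples. The sets and sizes are illustrative; wrapped in `if (FALSE)` so it
# is not run when sourced.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (FALSE) {
  set.seed(1)
  # Five triples drawn from 1:10; elements within each triple are unique
  sampleTuples(1:10, size=5, length=3)
  # The storage mode of the result follows 'x', so character sets work too
  sampleTuples(letters[1:4], size=2, length=2)
}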
############################################################################
# HISTORY:
# 2011-04-12
# o Now using NAs of the correct storage type.
# 2005-07-25
# o Created generic sampleTuples().
# 2005-04-07
# o Created.
############################################################################
aroma.light/R/lines.XYCurveFit.R 0000644 0001750 0001750 00000000727 14136047216 016270 0 ustar nilesh nilesh setMethodS3("lines", "XYCurveFit", function(x, xNew=NULL, ...) {
# To please R CMD check
fit <- x;
if (is.null(xNew)) {
xNew <- fit$x;
xNew <- sort(xNew);
xNew <- xNew[!duplicated(xNew)];
}
y <- fit$predictY(xNew);
lines(x=xNew, y=y, ...);
}) # lines()
############################################################################
# HISTORY:
# 2009-07-15
# o Created.
############################################################################
aroma.light/R/normalizeQuantileRank.matrix.R 0000644 0001750 0001750 00000023746 14136047216 020777 0 ustar nilesh nilesh ###########################################################################/**
# @set "class=matrix"
# @RdocMethod normalizeQuantileRank
#
# @title "Normalizes the empirical distribution of a set of samples to a common target distribution"
#
# \usage{
# @usage normalizeQuantileRank,matrix
# }
#
# \description{
# @get "title".
#
# The average sample distribution is calculated either robustly or not
# by utilizing either \code{weightedMedian()} or \code{weighted.mean()}.
# A weighted method is used if any of the weights are different from one.
# }
#
# \arguments{
# \item{X}{a numerical NxK @matrix with the K columns representing the
# channels and the N rows representing the data points.}
# \item{robust}{If @TRUE, the (weighted) median function is used for
# calculating the average sample distribution, otherwise the
# (weighted) mean function is used.}
# \item{ties}{Should ties in \code{X} be treated with care or not?
# For more details, see "limma::normalizeQuantiles".}
# \item{weights}{If @NULL, non-weighted normalization is done.
# If channel weights, this should be a @vector of length K specifying
# the weights for each channel.
# If signal weights, it should be an NxK @matrix specifying the
# weights for each signal.
# }
# \item{typeOfWeights}{A @character string specifying the type of
# weights given in argument \code{weights}.}
# \item{...}{Not used.}
# }
#
# \value{
# Returns an object of the same shape as the input argument.
# }
#
# \section{Missing values}{
# Missing values are excluded when estimating the "common" (the baseline).
# Values that are @NA remain @NA after normalization.
# No new @NAs are introduced.
# }
#
# \section{Weights}{
# Currently only channel weights are supported due to the way quantile
# normalization is done.
# If signal weights are given, channel weights are calculated from these
# by taking the mean of the signal weights in each channel.
# }
#
# @examples "../incl/normalizeQuantileRank.matrix.Rex"
#
# \author{
# Adopted from Gordon Smyth (\url{http://www.statsci.org/}) in 2002 \& 2006.
# Original code by Ben Bolstad at Statistics Department, University of
# California.
# Support for calculating the average sample distribution using (weighted)
# mean or median was added by Henrik Bengtsson.
# }
#
# \seealso{
# @see "stats::median", @see "matrixStats::weightedMedian",
# @see "base::mean" and @see "stats::weighted.mean".
# @see "normalizeQuantileSpline".
# }
#
# @keyword "nonparametric"
# @keyword "multivariate"
# @keyword "robust"
#*/###########################################################################
setMethodS3("normalizeQuantileRank", "matrix", function(X, ties=FALSE, robust=FALSE, weights=NULL, typeOfWeights=c("channel", "signal"), ...) {
zeroOneWeightsOnly <- TRUE; # Until supported otherwise.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validate arguments
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
nbrOfChannels <- ncol(X);
if(nbrOfChannels == 1L)
return(X);
nbrOfObservations <- nrow(X);
if(nbrOfObservations == 1L)
return(X);
# Argument 'typeOfWeights':
typeOfWeights <- match.arg(typeOfWeights);
# Argument 'weights':
channelWeights <- NULL;
signalWeights <- NULL;
if (!is.null(weights)) {
# If 'weights' is an object of a class with as.double(), cast it.
dim <- dim(weights);
weights <- as.double(weights);
dim(weights) <- dim;
if (anyMissing(weights))
stop("Argument 'weights' must not contain NA values.");
if (any(weights < 0 | weights > 1)) {
stop("Argument 'weights' out of range [0,1]: ", paste(weights[weights < 0.0 | weights > 1.0], collapse=", "));
}
if (typeOfWeights == "channel") {
if (length(weights) == 1L) {
weights <- rep(weights, length.out=nbrOfChannels);
} else if (length(weights) != nbrOfChannels) {
stop("Argument 'weights' (channel weights) does not have the same length as the number of columns in the matrix: ", length(weights), " != ", nbrOfChannels);
}
channelWeights <- weights;
} else if (typeOfWeights == "signal") {
if (!identical(dim(weights), dim(X))) {
stop("Dimension of argument 'weights' (signal weights) does not match dimension of argument 'X': (", paste(dim(weights), collapse=","), ") != (", paste(dim(X), collapse=","), ")");
}
# Calculate channel weights
channelWeights <- colMeans(weights);
if (zeroOneWeightsOnly && any(weights > 0 & weights < 1)) {
weights <- round(weights);
warning("Weights were rounded to {0,1} since quantile normalization supports only zero-one weights.");
}
signalWeights <- weights;
}
} # if (!is.null(weights))
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 0. Setup
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
maxNbrOfObservations <- nbrOfObservations;
# Create a matrix S to hold the sorted values for each channel
S <- matrix(NA_real_, nrow=maxNbrOfObservations, ncol=nbrOfChannels);
# Create a list O to hold the ordering indices for each channel
O <- vector("list", length=nbrOfChannels);
# A vector specifying the number of observations in each column
nbrOfFiniteObservations <- rep(maxNbrOfObservations, times=nbrOfChannels);
# Construct the sample quantiles
quantiles <- (0:(maxNbrOfObservations-1L))/(maxNbrOfObservations-1L);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 1. Get the sample quantile for all channels (columns)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
for (cc in seq_len(nbrOfChannels)) {
values <- X[,cc,drop=TRUE];
if (!is.null(signalWeights)) {
values[signalWeights[,cc] == 0] <- NA;
}
# Order and sort the values
Scc <- sort(values, index.return=TRUE);
# The number of non-NA observations
nobs <- length(Scc$x);
# Has NAs?
if(nobs < nbrOfObservations) {
nbrOfFiniteObservations[cc] <- nobs;
isOk <- !is.na(values);
# Get the sample quantiles for those values
bins <- (0:(nobs-1L))/(nobs-1L);
# Record the order position for these values.
O[[cc]] <- ((1:nbrOfObservations)[isOk])[Scc$ix];
# Interpolate to get the values at positions specified by
# 'quantile' using data points given by 'bins' and 'Scc$x'.
Scc <- approx(x=bins, y=Scc$x, xout=quantiles, ties="ordered")$y;
} else {
O[[cc]] <- Scc$ix;
Scc <- Scc$x;
}
S[,cc] <- Scc;
} # for (cc ...)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 2. Calculate the average sample distribution, of each quantile
# across all columns. This can be done robustly or not and
# with weights or not.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
useWeighted <- (!is.null(channelWeights) & any(channelWeights != 1));
if (robust) {
if (useWeighted) {
xTarget <- rowWeightedMedians(S, w=channelWeights, na.rm=TRUE);
} else {
xTarget <- rowMedians(S, na.rm=TRUE);
}
} else {
if (useWeighted) {
xTarget <- rowWeightedMeans(S, w=channelWeights, na.rm=TRUE);
} else {
xTarget <- rowMeans(S, na.rm=TRUE);
}
}
# Assert that xTarget is of the same length as the number of observations
stopifnot(length(xTarget) == maxNbrOfObservations);
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 3. For all columns, get for each sample quantile the value of
# average sample distribution at that quantile.
#
# Input: X[r,c], xTarget[r], O[[c]][r], nbrOfFiniteObservations[c].
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
for (cc in seq_len(nbrOfChannels)) {
# Get the number of non-NA observations
nobs <- nbrOfFiniteObservations[cc];
# Has NAs?
if(nobs < nbrOfObservations) {
# Get the NAs
isOk <- !is.na(X[,cc]);
# Get the sample quantiles for those values
if (ties) {
r <- rank(X[isOk,cc]);
bins <- (r-1)/(nobs-1L);
} else {
bins <- (0:(nobs-1L))/(nobs-1L);
}
# Interpolate to get the m's at positions specified by
# 'quantile' using data points given by 'bins' and 'xTarget'.
if (ties) {
idx <- isOk;
} else {
idx <- O[[cc]];
}
X[idx,cc] <- approx(x=quantiles, y=xTarget, xout=bins, ties="ordered")$y;
} else {
if (ties) {
r <- rank(X[,cc]);
bins <- (r-1L)/(nobs-1L);
X[,cc] <- approx(x=quantiles, y=xTarget, xout=bins, ties="ordered")$y;
} else {
X[O[[cc]],cc] <- xTarget;
}
}
} # for (cc ...)
X;
})
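# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Editor's sketch (not from the original sources): quantile normalizing three
# channels with different distributions, optionally down-weighting one channel
# when estimating the target distribution. Data and weights are illustrative;
# wrapped in `if (FALSE)` so it is not run when sourced.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
if (FALSE) {
  set.seed(3)
  X <- cbind(rnorm(1000, mean=4, sd=1),
             rnorm(1000, mean=3, sd=2),
             rexp(1000, rate=1/3))
  Xn <- normalizeQuantileRank(X)
  # After normalization, all channels share essentially the same empirical
  # distribution, here summarized by a few quantiles per channel
  apply(Xn, MARGIN=2, FUN=quantile, probs=c(0.25, 0.50, 0.75))
  # Channel weights (one per column) used when estimating the target distribution
  Xn2 <- normalizeQuantileRank(X, weights=c(1, 1, 0.1), typeOfWeights="channel")
}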
##############################################################################
# HISTORY:
# 2013-09-26
# o Now utilizing anyMissing(), rowMedians(), rowWeightedMedians(), and
# rowWeightedMeans().
# 2011-04-12
# o Now using as.double(NA) instead of NA.
# 2008-04-14
# o Renamed normalizeQuantile() to normalizeQuantileRank(). Keeping the old
# name for backward compatibility.
# 2006-05-12
# o Updated according to Gordon Smyth's package.
# 2006-02-08
# o Rd bug fix: Fixed broken links to help for median() and weighted.mean().
# 2005-06-03
# o Replaced arguments 'channelWeights' and 'signalWeights' with 'weights'
# and 'typeOfWeights'.
# o Added Rdoc help on weights.
# 2005-03-23
# o Updated normalizeQuantile() so that approx() does not give warnings
# about 'Collapsing to unique x values' when doing lowess normalization.
# 2005-02-02
# o Zero-one weights are now round off by round(w).
# 2005-02-01
# o Added argument 'signalWeights'.
# o Added validation of argument 'channelWeights'.
# 2005-01-27
# o Renamed argument 'A' to 'X'.
# o Renamed argument 'weights' to 'channelWeights'.
# o Making use of setMethodS3(). Added some more Rdoc comments.
# o Moved from R.basic to aroma.
# 2003-04-13
# o Updated the Rdoc comments.
# 2002-10-24
# o Adapted from source code by Gordon Smyth's smagks package.
##############################################################################
aroma.light/NEWS 0000644 0001750 0001750 00000072631 14136047216 013325 0 ustar nilesh nilesh Package: aroma.light
====================
Version: 3.23.1 [2021-08-19]
DOCUMENTATION:
* Updated several citation URLs that were either broken or redirected elsewhere.
Version: 3.22.0 [2021-05-19]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.13 for R (>= 4.0.3).
Version: 3.20.0 [2020-10-27]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.12 for R (>= 4.0.0).
Version: 3.18.0 [2020-04-27]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.11 for R (>= 3.6.1).
Version: 3.17.1 [2019-12-09]
CODE REFACTORING:
* Now importing throw() from R.oo instead of R.methodsS3.
Version: 3.17.0 [2019-10-30]
* The version number was bumped for the Bioconductor devel version, which is
now BioC 3.11 for R (>= 3.6.1).
Version: 3.16.0 [2019-10-30]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.10 for R (>= 3.6.1).
Version: 3.15.1 [2019-08-28]
BUG FIXES:
* wpca() for matrices had a 'length > 1 in coercion to logical' bug.
Version: 3.15.0 [2019-05-02]
* The version number was bumped for the Bioconductor devel version, which is
now BioC 3.10 for R (>= 3.6.0).
Version: 3.14.0 [2019-05-02]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.9 for R (>= 3.6.0).
Version: 3.13.0 [2018-10-30]
* The version number was bumped for the Bioconductor devel version, which is
now BioC 3.9 for R (>= 3.6.0).
Version: 3.12.0 [2018-10-30]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.8 for R (>= 3.5.0).
Version: 3.11.2 [2018-09-04]
CODE REFACTORING:
* fitPrincipalCurve() now requires princurve (>= 2.1.2) and was updated to
make use of the new principal_curve class instead of the deprecated
principal.curve class. This update "should not" affect the results, but
see https://github.com/dynverse/princurve/issues/8 for information of what
has changed in the princurve package in this respect.
Version: 3.11.1 [2018-08-28]
* Updated installation instructions in README.md.
Version: 3.11.0 [2018-04-30]
* The version number was bumped for the Bioconductor devel version, which is
now BioC 3.8 for R (>= 3.6.0).
Version: 3.10.0 [2018-04-30]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.7 for R (>= 3.5.0).
Version: 3.9.1 [2017-12-19]
NEW FEATURES:
* robustSmoothSpline() now supports using Tukey's biweight (in addition to
already existing L1) estimators. See argument 'method'. Thanks to
Aaron Lun at the Cancer Research UK Cambridge Institute for adding this
feature.
Version: 3.9.0 [2017-10-30]
* The version number was bumped for the Bioconductor devel version, which is
now BioC 3.7 for R (>= 3.5.0).
Version: 3.8.0 [2017-10-30]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.6 for R (>= 3.4.0).
Version: 3.7.0 [2017-04-23]
* The version number was bumped for the Bioconductor devel version, which is
now BioC 3.6 for R (>= 3.4.0).
Version: 3.6.0 [2017-04-23]
* The version number was bumped for the Bioconductor release version, which is
now BioC 3.5 for R (>= 3.4.0).
Version: 3.5.1 [2017-04-14]
SIGNIFICANT CHANGES:
* robustSmoothSpline() uses a re-weighted re-iterative algorithm that fits a
smooth spline using stats::smooth.spline(), calculates the residuals, which
are then used to fit a re-weighted smooth spline, and so on until convergence.
Due to updates to stats::smooth.spline() in R (>= 3.4.0) it is no longer
feasible to maintain a highly optimized version of the algorithm, because it
was based on internal stats::smooth.spline() code that has now completely
changed. Instead the re-iterative algorithm calls stats::smooth.spline() as
is, which slows it down. More importantly, it will now give slightly
different estimates.
SOFTWARE QUALITY:
* In addition to continuous integration (CI) tests and nightly Bioconductor
tests, the package is now also tested regularly against all reverse package
dependencies available on CRAN and Bioconductor.
Version: 3.5.0 [2016-10-18]
* The version number was bumped for the Bioconductor devel version, which is
now BioC v3.5 for R (>= 3.4.0).
Version: 3.4.0 [2016-10-18]
* The version number was bumped for the Bioconductor release version, which is
now BioC v3.4 for R (>= 3.3.1).
Version: 3.3.2 [2016-09-16]
CODE REFACTORING:
* Using NA_real_ (not just NA) everywhere applicable.
BUG FIXES:
* robustSmoothSpline() gave an error since R-devel (>= 3.4.0 r70682).
Version: 3.3.1 [2016-08-10]
CODE REFACTORING:
* CLEANUP: Using seq_len() and seq_along() everywhere (Issue #8)
Version: 3.3.0 [2016-05-03]
* The version number was bumped for the Bioconductor devel version, which is
now BioC v3.4 for R (>= 3.3.0).
Version: 3.2.0 [2016-05-03]
* The version number was bumped for the Bioconductor release version, which is
now BioC v3.3 for R (>= 3.3.0).
Version: 3.1.1 [2016-01-06]
* Package requires R (>= 2.15.2).
CODE REFACTORING:
* CLEANUP: robustSmoothSpline() no longer generates messages that
".nknots.smspl() is now exported; use it instead of n.knots()" for
R (>= 3.1.1).
Version: 3.1.0 [2015-10-23]
* The version number was bumped for the Bioconductor devel version, which is
now BioC v3.3 for R (>= 3.3.0).
Version: 3.0.0 [2015-10-13]
* The version number was bumped for the Bioconductor release version, which is
now BioC v3.2 for R (>= 3.2.2).
Version: 2.99.0 [2015-10-06]
* No changes.
Version: 2.9.0 [2015-09-17]
SIGNIFICANT CHANGES:
* Argument 'preserveScale' for normalizeTumorBoost() now defaults to FALSE.
Since 1.33.3 (2014-04-30) it had no default and prior to that it was TRUE.
Version: 2.5.3 [2015-09-13]
SOFTWARE QUALITY:
* ROBUSTNESS: Explicitly importing core R functions.
BUG FIXES:
* rowAverages() and normalizeAverages() would give an error if some of the
argument default functions are overridden by non-functions of the same name
in the calling environment.
Version: 2.5.2 [2015-06-16]
SOFTWARE QUALITY:
* Relaxed package test for backtransformPrincipalCurve().
Version: 2.5.1 [2015-05-24]
* Bumped package dependencies.
DEPRECATED AND DEFUNCT:
* CLEANUP: Removed obsolete wpca(..., method = "dsvdc"); was only needed for
backward compatibility with R (< 1.7.0).
Version: 2.5.0 [2015-04-16]
* The version number was bumped for the Bioconductor devel version, which is
now BioC v3.2 for R (>= 3.3.0).
Version: 2.4.0 [2015-04-16]
* The version number was bumped for the Bioconductor release version, which is
now BioC v3.1 for R (>= 3.2.0).
Version: 2.3.3 [2015-02-18]
NEW FEATURES:
* If a value of argument 'xlim' or 'ylim' for plotDensity() is NA, then it
defaults to the corresponding extreme value of the data, e.g.
plotDensity(x, xlim = c(0, NA)).
Version: 2.3.2 [2015-02-17]
SOFTWARE QUALITY:
* ROBUSTNESS: Added package tests. Code coverage is 76%.
CODE REFACTORING:
* CLEANUP: Using requestNamespace() instead of request().
Version: 2.3.1 [2014-12-08]
* Same updates as in 2.2.1.
Version: 2.3.0 [2014-10-13]
* The version number was bumped for the Bioconductor devel version, which is
now BioC v3.1 for R (>= 3.2.0).
Version: 2.2.1 [2014-12-08]
CODE REFACTORING:
* Minor code cleanup.
Version: 2.2.0 [2014-10-13]
* The version number was bumped for the Bioconductor release version, which is
now BioC v3.0 for R (>= 3.1.1).
Version: 2.1.2 [2014-09-23]
* Minor tweaks due to the move to GitHub.
Version: 2.1.1 [2014-09-16]
SOFTWARE QUALITY:
* Fixed some new R CMD check NOTEs.
CODE REFACTORING:
* CLEANUP: Now importing R.utils (instead of only suggesting it).
* IMPORTANT/CLEANUP: The matrixStats package is no longer attached with this
package. In other words, you now might have to add library('matrixStats')
to your scripts.
Version: 2.1.0 [2014-04-11]
* The version number was bumped for the Bioconductor devel version, which is
now BioC v2.15 for R (>= 3.1.0).
Version: 2.0.0 [2014-04-11]
* The version number was bumped for the Bioconductor release version, which is
now BioC v2.14 for R (>= 3.1.0).
Version: 1.99.3 [2014-03-31]
* Bumped the version such that the next release will be 2.0.0.
Version: 1.33.3 [2014-03-30]
SIGNIFICANT CHANGES:
* Argument 'preserveScale' for normalizeTumorBoost() is now required. The goal
with this is to in a future version migrate to use preserveScale = FALSE as
the default (was preserveScale = TRUE) in order to avoid introducing a
global bias in the tumor allele B fraction of heterozygous SNPs. The
examples use preserveScale = FALSE now.
NEW FEATURES:
* Added pairedAlleleSpecificCopyNumbers().
Version: 1.33.2 [2014-03-25]
NEW FEATURES:
* Now plotDensity() supports weights via argument 'W'.
Version: 1.33.1 [2014-03-25]
NEW FEATURES:
* Now plotDensity() also supports density() objects.
CODE REFACTORING:
* CLEANUP: robustSmoothSpline() no longer uses DUP = FALSE in an internal
.Fortran() call.
* Bumped up package dependencies.
Version: 1.33.0 [2013-10-14]
* The version number was bumped for the Bioc devel version.
Version: 1.32.0 [2012-10-14]
* The version number was bumped for the Bioconductor release version, which is
now Bioc v2.13 for R (>= 3.0.0).
Version: 1.31.10 [2013-10-08]
NEW FEATURES:
* Added averageQuantile() for matrices in addition to lists.
PERFORMANCE:
* SPEEDUP: Now normalizeQuantileSpline(..., sortTarget = TRUE) sorts the
target only once for lists of vectors just as done for matrices.
DOCUMENTATION:
* Merged the documentation for normalizeQuantileSpline() for all data types
into one help page. Same for plotXYCurve().
CODE REFACTORING:
* Bumped up package dependencies.
BUG FIXES:
* Argument 'lwd' of plotXYCurve(X, ...) was ignored if 'X' was a matrix.
Version: 1.31.9 [2013-10-07]
NEW FEATURES:
* Now library(aroma.light, quietly = TRUE) attaches the package completely
silently without any messages.
* Now the 'aroma.light' Package object is also available when the package is
only loaded (but not attached).
DOCUMENTATION:
* Merged the documentation for normalizeQuantileRank() for numeric vectors and
lists.
* Now documenting S3 methods under their corresponding generic function.
Version: 1.31.8 [2013-10-02]
DOCUMENTATION:
* More generic functions are now "aliased" under relevant corresponding
methods.
Version: 1.31.7 [2013-09-27]
PERFORMANCE:
* SPEEDUP: Now all package functions utilize 'matrixStats' functions
where possible, e.g. anyMissing(), colMins(), and rowWeightedMedians().
CODE REFACTORING:
* Bumped up package dependencies.
Version: 1.31.6 [2013-09-25]
CODE REFACTORING:
* CLEANUP: Package no longer use a fallback attachment of the 'R.oo' package
upon attachment.
Version: 1.31.5 [2013-09-23]
SOFTWARE QUALITY:
* ROBUSTNESS: Now properly declaring all S3 methods in the NAMESPACE file.
PERFORMANCE:
* SPEEDUP/CLEANUP: normalizeTumorBoost() now uses which() instead of
whichVector() of 'R.utils'. In R (< 2.11.0), which() used to be 10x
slower than whichVector(), but now it's 3x faster.
CODE REFACTORING:
* CLEANUP: Now only using 'Authors@R' in DESCRIPTION, which is possible since
R (>= 2.14.0). Hence the new requirement on the version of R.
* Bumped up package dependencies.
Version: 1.31.4 [2013-09-10]
SOFTWARE QUALITY:
* CLEANUP: Now package explicitly imports what it needs from matrixStats.
CODE REFACTORING:
* Bumped up package dependencies.
Version: 1.31.3 [2013-05-25]
PERFORMANCE:
* SPEEDUP: Removed all remaining gc() calls, which were in
normalizeQuantileSpline().
* SPEEDUP: Replaced all rm() calls with NULL assignments.
CODE REFACTORING:
* Updated the package dependencies.
Version: 1.31.2 [2013-05-20]
* Same updates as in v1.30.2.
Version: 1.31.1 [2011-04-18]
* Same updates as in v1.30.1.
Version: 1.31.0 [2013-04-03]
* The version number was bumped for the Bioc devel version.
Version: 1.30.5 [2013-09-25]
SOFTWARE QUALITY:
* Backport from v1.31.5: Declaring all S3 methods in NAMESPACE.
CODE REFACTORING:
* Backport from v1.31.5: normalizeTumorBoost() now uses which(), which also
removes one dependency on 'R.utils'.
Version: 1.30.4 [2013-09-25]
SOFTWARE QUALITY:
* Backport from v1.31.4: Now package explicitly imports what it needs from
matrixStats.
Version: 1.30.3 [2013-09-25]
PERFORMANCE:
* Backport from v1.31.3: Removal of all gc() calls and removal of variables is
now faster.
BUG FIX:
* Removed one stray str() debug output in robustSmoothSpline().
Version: 1.30.2 [2013-05-20]
* CRAN POLICY: Now all Rd \usage{} lines are at most 90 characters long.
Version: 1.30.1 [2013-04-18]
NEW FEATURES:
* Now backtransformPrincipalCurve() preserves dimension names.
BUG FIXES:
* backtransformPrincipalCurve() gave an error if the principal curve was fitted
using data with missing values.
* fitPrincipalCurve() would not preserve dimension names if data contain
missing values.
Version: 1.30.0 [2012-04-03]
* The version number was bumped for the Bioconductor release version, which
now is Bioc v2.12 for R (>= 3.0.0).
Version: 1.29.0 [2012-10-01]
* The version number was bumped for the Bioc devel version.
Version: 1.28.0 [2012-10-01]
* The version number was bumped for the Bioconductor release version, which
now is Bioc v2.11 for R (>= 2.15.1).
Version: 1.27.1 [2012-09-12]
SOFTWARE QUALITY:
* ROBUSTNESS: Replaced an .Internal(psort(...)) call in medianPolish() with a
call to matrixStats:::.psortKM().
Version: 1.27.0 [2012-08-30]
CODE REFACTORING:
* CLEANUP: Removed weightedMedian(), which has been moved to the matrixStats
package.
* BACKWARD COMPATIBILITY: Now package depends on the matrixStats (>= 0.5.2)
package, so that weightedMedian() is still available when loading this
package. In future releases, matrixStats will be downgraded to only be a
suggested package.
Version: 1.26.1 [2012-08-30]
CODE REFACTORING:
* Updated the package dependencies.
BUG FIXES:
* robustSmoothSpline() would not work with most recent R devel versions.
Version: 1.26.0 [2012-08-19]
SIGNIFICANT CHANGES:
* Changed the license of aroma.light to GPL (>= 2) from LGPL (>= 2), because
some of the implementation was adopted from GPL (>= 2) code, i.e.
robustSmoothSpline() uses code from stats::smooth.spline().
SOFTWARE QUALITY:
* R CMD check no longer warns about some examples depending on the R.basic
package.
Version: 1.25.4 [2012-08-19]
SOFTWARE QUALITY:
* WORKAROUND: Now robustSmoothSpline() robustly locates the proper native R
fit function for smooth splines, which varies between releases of R.
Version: 1.25.3 [2012-04-16]
CODE REFACTORING:
* Package no longer depends on R.methodsS3, only imports.
Version: 1.25.2 [2012-04-16]
SOFTWARE QUALITY:
* 'R CMD check' no longer complains about .Internal() calls.
Version: 1.25.1 [2012-04-16]
NEW FEATURES:
* Added support for fitNaiveGenotypes(..., flavor = "fixed").
* GENERALIZATION: Now fitNaiveGenotypes() returns also 'flavor' and 'tau'.
The latter are the genotype thresholds used by the caller.
CODE REFACTORING:
* CLEANUP: Dropped argument 'flavor' of callNaiveGenotypes(); it is instead
passed to fitNaiveGenotypes() via '...'.
Version: 1.25.0 [2012-03-30]
* The version number was bumped for the Bioconductor devel version.
Version: 1.24.0 [2012-03-30]
* The version number was bumped for the Bioconductor release version, which
now is Bioc v2.10 for R (>= 2.15.0).
Version: 1.23.0 [2011-10-31]
* The version number was bumped for the Bioconductor devel version.
Version: 1.22.0 [2011-10-31]
* The version number was bumped for the Bioconductor release version, which
now is Bioc v2.9 for R (>= 2.14.0).
Version: 1.21.2 [2011-10-10]
CODE REFACTORING:
* Updated robustSmoothSpline() such that it works with the new "uniqueness"
scheme of smooth.spline() in R v2.14.0 and newer. It is tricky, because
robustSmoothSpline() is a reiterative algorithm which requires that the
choosen "unique" x:s does not change in each iteration. Previously,
'signif(x, 6)' was used to identify unique x:s, which gives the same set of
values when called twice, whereas this is not true for the new choice with
'round((x - mean(x))/tol)'.
Version: 1.21.1 [2011-06-26]
NEW FEATURES:
* Added argument 'aspectRatio' to plotMvsA(). It can be used to adjust the
range of the 'Mlim' argument relative to the 'Alim' argument.
Version: 1.21.0 [2011-04-13]
* The version number was bumped for the Bioconductor devel version.
Version: 1.20.0 [2010-04-13]
* The version number was bumped for the Bioconductor release version,
which now is Bioc v2.8 for R (>= 2.13.0).
Version: 1.19.6 [2011-04-12]
CODE REFACTORING:
* CLEANUP: Removed internal patch of principal.curve(). If an older version
than princurve v1.1-10 is used, an informative error is thrown requesting an
update. The internal patch is part of the official princurve v1.1-10 release
(since 2009-10-04).
* Now all methods allocate objects with NAs of the appropriate mode.
KNOWN ISSUES:
* Recent updates to smooth.spline() in R v2.14.0 cause robustSmoothSpline()
to break in some cases.
Version: 1.19.5 [2011-04-08]
NEW FEATURES:
* Now weightedMedian() returns NA:s of the same mode as argument 'x'.
Version: 1.19.4 [2011-03-03]
* Same updates as in v1.18.4.
Version: 1.19.3 [2011-02-05]
* Same updates as in v1.18.3.
Version: 1.19.2 [2010-10-22]
* Same updates as in v1.18.2.
Version: 1.19.1 [2010-10-18]
* Same updates as in v1.18.1.
Version: 1.19.0 [2010-10-18]
* The version number was bumped for the Bioconductor devel version.
Version: 1.18.4 [2011-03-03]
BUG FIXES:
* findPeaksAndValleys(x, to) where 'x' is numeric would use partial match and
interpret 'to' as argument 'tol' and not part of '...' passed to density().
This problem was solved by placing '...' before argument 'tol'. Thanks
Oscar Rueda at Cancer Research UK for reporting on and identifying this
bug.
Version: 1.18.3 [2011-02-05]
DOCUMENTATION:
* Added paragraphs on how to do affine normalization when channel offsets are
known/zero. Same for multiscan calibration when scanner offsets are
known/zero.
* Fixed broken links to help for iwpca().
Version: 1.18.2 [2010-10-22]
DOCUMENTATION:
* Minor clarifications to the help page on "1. Calibration and Normalization".
This page is now also referenced in help("calibrateMultiscan").
Version: 1.18.1 [2010-10-18]
NEW FEATURES:
* Argument 'censorAt' for fitNaiveGenotypes() has new default.
* These updates were supposed to be in v1.17.7, but we forgot to commit them
to the BioC repository before the new BioC release.
BUG FIXES:
* fitNaiveGenotypes(..., subsetToFit = ) would throw an exception
reporting "Some elements of argument 'subsetToFit' is out of range ...".
Version: 1.18.0 [2010-10-18]
* The version number was bumped for the Bioconductor release version, which
now is Bioc v2.7 for R (>= 2.12.0).
Version: 1.17.6 [2010-10-08]
NEW FEATURES:
* Now findPeaksAndValleys() returns an object of class PeaksAndValleys, which
extends data.frame.
Version: 1.17.5 [2010-10-07]
NEW FEATURES:
* Added optional argument 'fit' to callNaiveGenotypes() for passing a model
fit returned by fitNaiveGenotypes(). If not specified, callNaiveGenotypes()
will call fitNaiveGenotypes() internally.
* Added fitNaiveGenotypes(), which previously was only internal of
callNaiveGenotypes().
Version: 1.17.4 [2010-10-06]
NEW FEATURES:
* Added findPeaksAndValleys() for the 'density' class, which then
findPeaksAndValleys() for 'numeric' utilizes.
Version: 1.17.3 [2010-09-18]
SOFTWARE QUALITY:
* ROBUSTNESS: Now normalizeFragmentLength() asserts that arguments
'fragmentLengths' and 'y' contain at least some finite values and specifies
the same number of units. In addition, the method also gives more
informative error messages in case it cannot fit the normalization function
due to non-finite values.
Version: 1.17.2 [2010-08-04]
NEW FEATURES:
* Added argument 'preserveScale' to normalizeTumorBoost() to rescale the
calibrated allele B fractions for heterozygous SNPs such that the
compression relative to the homozgygotes is preserved.
Version: 1.17.1 [2010-07-23]
* Same updates as in release version v1.16.1.
Version: 1.17.0 [2010-04-22]
* The version number was bumped for the Bioconductor devel version.
Version: 1.16.1 [2010-07-23]
NEW FEATURES:
* Now callNaiveGenotypes() returns the model estimates as attribute
'modelFit'. This feature was supposed to be in v1.16.0.
Version: 1.16.0 [2010-04-22]
* The version number was bumped for the Bioconductor release version, which
now is Bioc v2.6 for R (>= 2.11.0).
Version: 1.15.4 [2010-04-08]
SOFTWARE QUALITY:
* R devel assumes ASCII encoding unless specified. Added explicit Latin-1
encoding to the DESCRIPTION file for R CMD check to pass.
Version: 1.15.3 [2010-04-04]
NEW FEATURES:
* Added normalizeDifferencesToAverage(), normalizeTumorBoost(),
callNaiveGenotypes() and internal findPeaksAndValleys(), which all were
moved from the aroma.cn package.
Version: 1.15.2 [2010-03-12]
BUG FIXES:
* The example of fitPrincipalCurve() used a non-existing argument name in the
calls to substitute(). Thanks to Brian Ripley at University of Oxford for
reporting this.
Version: 1.15.1 [2009-11-01]
CODE REFACTORING:
* Now fitPrincipalCurve() only uses the internal bug-fix patch if a version
earlier than princurve v1.1-10 is installed.
Version: 1.15.0 [2009-10-27]
* The version number was bumped for the Bioconductor devel version.
Version: 1.14.0 [2009-10-27]
* The version number was bumped for the Bioconductor release version, which
now is Bioc v2.5 for R (>= 2.10.0).
Version: 1.13.6 [2009-10-20]
DOCUMENTATION:
* FIX: CITATION file reverted to that of v1.13.4.
Version: 1.13.5 [2009-10-08]
DOCUMENTATION:
* CITATION file [incorrectly] updated by BioC maintainers.
Version: 1.13.4 [2009-09-23]
DOCUMENTATION:
* Fixed a few broken Rd links.
Version: 1.13.3 [2009-07-15]
NEW FEATURES:
* ADDED: fit- and backtransformXYCurve().
* Added attribute 'processingTime' to the fit object returned by
fitPrincipalCurve().
Version: 1.13.2 [2009-05-29]
* Incorporating the same updates as in release v1.12.2.
Version: 1.13.1 [2009-05-13]
* Incorporating the same updates as in release v1.12.1.
Version: 1.13.0 [2009-04-20]
* The version number was bumped for the Bioconductor devel version.
Version: 1.12.2 [2009-05-29]
CODE REFACTORING:
* Replacing old HOWTOCITE with a standard CITATION file.
BUG FIXES:
* Previous bug fix in backtransformPrincipalCurve() regarding argument
'dimension' broke the initial purpose of this argument. Since both use cases
are still of interest, how the subsetting is done is now based on whether
the number of dimensions of the input data and the model fit match or not.
See help and example code for 'backtransformPrincipalCurve.matrix'.
Version: 1.12.1 [2009-05-13]
BUG FIXES:
* backtransformPrincipalCurve(..., dimensions) did not subset the 'X' matrix.
Also, the method now returns a matrix with the same number of columns as
requested. The Rd example now illustrates this. Thanks to Pierre Neuvial,
UC Berkeley, for the troubleshooting and fix.
Version: 1.12.0 [2009-04-20]
* The version number was bumped for the Bioconductor release version.
Version: 1.11.2 [2009-02-08]
BUG FIXES:
* An error was thrown in backtransformPrincipalCurve() when argument
'dimensions' was specified.
Version: 1.11.1 [2009-01-12]
NEW FEATURES:
* Added fit- & backtransformPrincipalCurve().
Version: 1.11.0 [2008-10-21]
* The version number was bumped for the Bioconductor devel version.
Version: 1.10.0 [2008-10-21]
* The version number was bumped for the Bioconductor release version.
Version: 1.9.2 [2008-09-11]
NEW FEATURES:
* Added argument 'onMissing' to normalizeFragmentLength() for specifying how
to normalize (if at all) data points for which the fragment lengths are
unknown. For backward compatibility, we start off by having it "ignore" by
default.
CODE REFACTORING:
* MEMORY OPTIMIZATION: robustSmoothSpline() is now cleaning out more variables
when no longer needed.
Version: 1.9.1 [2008-05-10]
* Incorporating the same updates as in release v1.8.1.
Version: 1.9.0 [2008-04-29]
* The version number was bumped for the Bioconductor devel version.
Version: 1.8.1 [2008-05-10]
BUG FIXES:
* If the 'subsetToFit' of normalizeFragmentLength() was shorter than the
number of data points, an exception was thrown. The test was supposed to
assert that the subset was not greater than the number of data points.
Version: 1.8.0 [2008-04-29]
* The version number was bumped for the Bioconductor release version.
Version: 1.7.2 [2008-04-14]
NEW FEATURES:
* Added normalizeFragmentLength().
* Added normalizeQuantileSpline().
* Renamed normalizeQuantile() to normalizeQuantileRank().
* Added plotXYCurve().
* Added predict() for the 'lowess' class.
Version: 1.7.1 [2007-11-28]
NEW FEATURES:
* The startup message when loading the package is generated with
packageStartupMessage() so that it can be suppressed.
DOCUMENTATION:
* TYPO: Corrected a spelling error in the help pages.
SOFTWARE QUALITY:
* Package passes R CMD check R v2.6.1.
CODE REFACTORING:
* Package now only suggests the R.oo package, and instead depends on the new
R.methodsS3.
Version: 1.7.0 [2007-10-08]
* The version number was bumped for the Bioconductor devel version.
Version: 1.6.0 [2007-10-08]
* The version number was bumped for the Bioconductor release version.
Version: 1.5.2 [2007-08-10]
SOFTWARE QUALITY:
 * Package passes R CMD check with R v2.6.0.
Version: 1.5.1 [2007-06-08] (this was mistakenly versioned 1.4.1)
NEW FEATURES:
* Added normalizeAverage().
SOFTWARE QUALITY:
 * Package passes R CMD check with R v2.6.0, with Rd encoding errors.
Version: 1.5.0 [2007-05-09]
* The version number was bumped for the Bioconductor devel version.
Version: 1.4.0 [2007-05-09]
* The version number was bumped up with the Bioconductor release.
Version: 1.3.1 [2007-01-15]
CODE REFACTORING:
* Removed code to use 'modreg' for backward compatibility with R < 1.9.0.
* Added R.utils to Suggests field of DESCRIPTION.
Version: 1.3.0 [2006-10-??]
* The devel version number was bumped up with the Bioconductor release.
Version: 1.2.0 [2006-10-03]
* The version number was bumped up with the Bioconductor release.
Version: 1.1.0 [2006-07-20]
SIGNIFICANT CHANGES:
* Added to Bioconductor v1.9.
NEW FEATURES:
* Added some trial RSP pages. Try browseRsp() in the R.rsp package.
Version: 0.1.7 [2006-06-27]
CODE REFACTORING:
 * Made the package truly standalone except for R.oo. Previously, the R.basic
   package was used in some of the examples.
Version: 0.1.6 [2006-05-22]
NEW FEATURES:
 * Added medianPolish(), which is much faster than stats::medpolish() when
   there are no NAs.
* Added plotDensity() for list of vectors as well as for matrices.
 * Added normalizeQuantile() for lists of vectors as well as for a single
   vector of numerics. To calculate the target quantiles there is a new
   function averageQuantile(), which also takes a list of vectors. The latter
   does not support robust estimation of the average, in order to save memory.
   (See the sketch below.)
* Updated normalizeQuantile() for matrices according to the updates in the
limma package.
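   A minimal sketch (simulated data) of the list-of-vectors interface, written
   with the later function name normalizeQuantileRank() (the rename happened in
   v1.7.2); the 'xTarget' argument name is an assumption:

      library(aroma.light)
      set.seed(3)
      ys <- list(a=rexp(1000), b=2*rexp(1500), c=0.5*rexp(800))
      xTarget <- averageQuantile(ys)               # target (average) quantiles
      ysN <- normalizeQuantileRank(ys, xTarget=xTarget)
      sapply(ysN, median)                          # roughly equal afterwards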
CODE REFACTORING:
* Added a namespace for the package.
* Added 'biocViews' since the package will eventually be added to the
Bioconductor project.
Version: 0.1.5 [2006-04-21]
PERFORMANCE:
 * Minor speedup to weightedMedian(): for instance, negative weights no longer
   give an error, but are treated as zeros instead, which removes some overhead
   from the function. Also, if it is known that there are no NAs, this can be
   indicated with na.rm=NA, which skips the NA checks.
Version: 0.1.4 [2006-03-28]
DOCUMENTATION:
* Updated broken Rd links.
* Updated the references to publications.
Version: 0.1.3 [2006-01-22]
NEW FEATURES:
* Now fitIWPCA() does not return the data matrix. This is to save memory.
The calling algorithm can equally well add the data if it is needed.
DOCUMENTATION:
* Added help on the returned parameters of fitIWPCA().
Version: 0.1.2 [2005-09-06]
NEW FEATURES:
 * All plot methods displaying log-ratios now assure that no fake log-ratios
   are calculated from two negative raw signals. Similarly, methods displaying
   log-intensities now assure that the log-intensities are calculated as
   doubles, to avoid possible overflow warnings for too large integers.
Version: 0.1.1 [2005-07-26]
NEW FEATURES:
* Added sampleCorrelations() and sampleTuples().
* Now argument 'interpolate' of weightedMedian() defaults to TRUE only if
'ties' is NULL.
Version: 0.1.0 [2005-06-03]
SIGNIFICANT CHANGES:
* Created. Most of the matrix methods were copied from the R.basic and the
aroma packages. The purpose of this package is to provide a standalone
package, which does not require any of the aroma classes. This will allow
the methods to be used by end users as is, or be utilized in other packages.
aroma.light/inst/ 0000755 0001750 0001750 00000000000 14136047216 013572 5 ustar nilesh nilesh aroma.light/inst/CITATION 0000644 0001750 0001750 00000016073 14136047216 014736 0 ustar nilesh nilesh citHeader("Please cite aroma.light using one or more of the appropriate references below");
citEntry(
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# BibTeX entry:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
entry = "Article",
author = "H. Bengtsson and Pierre Neuvial and Terence P Speed",
title = "TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays",
journal = "BMC Bioinformatics",
year = "2010",
month = "May",
volume = "11",
number = "245",
doi = "10.1186/1471-2105-11-245",
url = "https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-245",
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Plain-text citation:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
textVersion = paste(sep="",
"H. Bengtsson, P. Neuvial and T.P. Speed. ",
"TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays, ",
"BMC Bioinformatics, ",
"2010"
)
);
citEntry(
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# BibTeX entry:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
entry = "Article",
author = "Henrik Bengtsson and Amrita Ray and Paul Spellman and Terence P Speed",
title = "A single-sample method for normalizing and combining full-resolutioncopy numbers from multiple platforms, labs and analysis methods",
journal = "Bioinformatics",
year = "2009",
volume = "25",
number = "7",
doi = "10.1093/bioinformatics/btp074",
url = "https://bioinformatics.oxfordjournals.org/cgi/content/full/25/7/861",
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Plain-text citation:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
textVersion = paste(sep="",
"H. Bengtsson, A. Ray, P. Spellman and T.P. Speed. ",
"A single-sample method for normalizing and combining full-resolutioncopy numbers from multiple platforms, labs and analysis methods, ",
"Bioinformatics, ",
"2009"
)
);
citEntry(
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# BibTeX entry:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
entry="TechReport",
author = "H. Bengtsson and K. Simpson and J. Bullard and K. Hansen",
title = "{aroma.affymetrix}: A generic framework in {R} for analyzing small
to very large {Affymetrix} data sets in bounded memory",
institution = "Department of Statistics, University of California, Berkeley",
year = "2008",
month = "February",
number = "745",
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Plain-text citation:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
textVersion = paste(sep="",
"H. Bengtsson; K. Simpson; J. Bullard; K. Hansen. ",
"aroma.affymetrix: A generic framework in R for analyzing small to very large Affymetrix data sets in bounded memory, ",
"Tech Report 745, ",
"Department of Statistics, University of California, Berkeley, ",
"February 2008"
)
);
citEntry(
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# BibTeX entry:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
entry="Article",
author = "Henrik Bengtsson and R. Irizarry and B. Carvalho and T.P. Speed",
title = "Estimation and assessment of raw copy numbers at the single locus level",
journal = "Bioinformatics",
year = "2008",
volume = "24",
number = "6",
url = "https://bioinformatics.oxfordjournals.org/cgi/content/full/24/6/759",
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Plain-text citation:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
textVersion = paste(sep="",
"H. Bengtsson, R. Irizarry, B. Carvalho, & T.P. Speed. ",
"Estimation and assessment of raw copy numbers at the single locus level, ",
"Bioinformatics, ",
"2008"
)
);
citEntry(
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# BibTeX entry:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
entry="Article",
author = "H. Bengtsson and O. H{\"o}ssjer",
title = "Methodological study of affine transformations of gene
expressiondata with proposed robust non-parametric multi-dimensional
normalization method",
journal = "BMC Bioinformatics",
year = "2006",
volume = "7",
number = "100",
doi = "10.1186/1471-2105-7-100",
url = "https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-100",
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Plain-text citation:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
textVersion = paste(sep="",
"H. Bengtsson & O. Hssjer. ",
"Methodological study of affine transformations of gene expressiondata
with proposed robust non-parametric multi-dimensional normalization
method, ",
"BMC Bioinformatics, ",
"2006"
)
);
citEntry(
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# BibTeX entry:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
entry="Article",
author = "H. Bengtsson and G. J\"{o}nsson and J. Vallon-Christersson",
title = "Calibration and assessment of channel-specific biases in microarray
data with extended dynamical range",
journal = "BMC Bioinformatics",
year = "2004",
volume = "5",
number = "177",
doi = "10.1186/1471-2105-5-177",
url = "https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-177",
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Plain-text citation:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
textVersion = paste(sep="",
"H. Bengtsson, G. Jnsson & J. Vallon-Christersson. ",
"Calibration and assessment of channel-specific biases in microarray data with extended dynamical range, ",
"BMC Bioinformatics, ",
"2004"
)
);
citEntry(
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# BibTeX entry:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
entry="TechReport",
author = "H. Bengtsson",
title = "{aroma} - {A}n {R} {O}bject-oriented {M}icroarray {A}nalysis environment",
type = "Preprint in Mathematical Sciences",
institution = "Mathematical Statistics, Centre for Mathematical Sciences, Lund University, Sweden",
year = "2004",
number = "18",
url = "https://lup.lub.lu.se/search/publication/929031",
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Plain-text citation:
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
textVersion = paste(sep="",
"H. Bengtsson. ",
"aroma - An R Object-oriented Microarray Analysis environment, ",
"Preprint 2004:18, ",
"Centre for Mathematical Sciences, Lund University, ",
"2004"
)
);
aroma.light/inst/WORDLIST 0000644 0001750 0001750 00000001167 14136047216 014771 0 ustar nilesh nilesh Achim
affine
Affine
affinely
al
AppVeyor
ASCN
backtransform
backtransformed
BAFs
Bergh
Bioinformatics
BMC
Bolstad
Carvalho
cDNA
Centre
CMD
Crosse
Demmel
Dongarra
downweight
DSC
Eqn
Eqns
et
Friedrich
Hastie
heterozygous
Himel
homozygous
Hornik
Irizarry
iteratively
IWPCA
KxE
KxK
LAPACK's
Leisch
licence
limma
loadings
loess
Loess
Lund
Lx
macOS
multiscan
Multiscan
multiscanned
MxL
normalizeQuantiles
Nx
NxD
NxK
NxL
oo
Pawitan
PeaksAndValleys
Ploner
PMID
PMT
pre
Pre
Preprints
rescale
Rescales
resolutioncopy
robustified
robustify
SmoothSplineLikelihood
Smyth
Spellman
Stuetzle
suboptimal
tricube
TumorBoost
UW
Wishlist
WPCA
Zeileis
aroma.light/inst/rsp/ 0000755 0001750 0001750 00000000000 14136047216 014376 5 ustar nilesh nilesh aroma.light/inst/rsp/style.css 0000644 0001750 0001750 00000007530 14136047216 016255 0 ustar nilesh nilesh /*********************************************************************
This HTML style sheet makes your web page conform more closely to
braju.com's graphical profile.
To include this style sheet in your HTML web page, add the following
line *between* <head> and </head>:

  <link rel="stylesheet" type="text/css" href="http://www.braju.com/R/style.css">
URL : http://www.braju.com/R/style.css
Author: Henrik Bengtsson, henrikb@braju.com
*********************************************************************/
/* ----------- M a i n - b o d y ----------- */
body {
font-family: Arial, Verdana, Geneva, sans-serif;
background: #ffffff;
position: absolute;
margin-top: 0;
  top: 8px;
  left: 8px;
}
center, p, b, i, em, small, big {
font-family: Arial, Verdana, Geneva, sans-serif;
}
/* ------------- H e a d e r s ------------- */
h1 {
font-family: Arial, Verdana, Geneva, sans-serif;
color: #3366cc;
}
h2, h3, h4, h5, h6 {
font-family: Arial, Verdana, Geneva, sans-serif;
}
/* --------------- L i n k s --------------- */
a {
font-family: Arial, Verdana, Geneva, sans-serif;
color: #3366cc;
}
a:visited { color: #3366cc; }
/*
a:link { color: #0C2577; }
a:active { color: #0C2577; }
*/
/* --------------- L i s t s --------------- */
ol, ul, li {
font-family: Arial, Verdana, Geneva, sans-serif;
}
/* ------------ V e r b a t i m ------------ */
pre, tt, code {
font-family: 'Courier New', monospace;
}
/* --------------- F o r m s --------------- */
.BrajuButton {
font-family: Arial, Verdana, Geneva, sans-serif;
background-color: #3366cc;
color: white;
font-weight: bold;
font-size: 8pt;
}
/* -------------- T a b l e s -------------- */
table, tr, th, td {
font-family: Arial, Verdana, Geneva, sans-serif;
}
.TableHeader {
font-family: Arial, Verdana, Geneva, sans-serif;
background-color: #3366cc;
color: white;
font-weight: bold;
}
.Table {
font-family: Arial, Verdana, Geneva, sans-serif;
border-width: thin;
border-color: #3366cc;
border-style: solid;
background-color: white;
}
/* --------- C o d e - S t y l e s --------- */
.Code {
font-family: 'Courier New', monospace;
color: #3366cc;
}
.RCode {
font-family: 'Courier New', monospace;
color: #3366cc;
}
.TeXCode {
font-family: 'Courier New', monospace;
color: #3366cc;
}
.CCode {
font-family: 'Courier New', monospace;
color: #3366cc;
}
.JavaCode {
font-family: 'Courier New', monospace;
color: #3366cc;
}
.PerlCode {
font-family: 'Courier New', monospace;
color: #3366cc;
}
.HTMLCode {
font-family: 'Courier New', monospace;
color: #3366cc;
}
.RSource {
font-family: 'Courier New', monospace;
color: #3366cc;
}
/* ----------------- U M L ----------------- */
.UMLClass {
background: #000000;
color: #ffffff;
border-color: #000000;
border-width: 0px;
border-style: solid;
}
.UMLDivider {
background: #000000;
color: #ffffff;
height: 2px;
}
.UMLClassName {
background: #ffffff;
color: #000000;
font-weight: bold;
font-size: x-small;
}
.UMLAttribute {
background: #ffffff;
color: #000000;
font-size: x-small;
}
.UMLMethod {
background: #ffffff;
color: #000000;
font-size: x-small;
}
/* ----------- F o o t n o t e s ----------- */
.Footnote {
color: gray;
font-size: x-small;
}
.FootnoteInverse {
color: white;
}
.FootnoteTable {
font-size: x-small;
color: gray;
}
/* ------- M i s c e l l a n e o u s ------- */
.RError {
color: red;
font-family: monospace;
}
.RPrompt {
color: #3366cc;
font-family: monospace;
}
.Dates {
font-size: x-small;
}
.Filesize {
color: gray;
}
.HorizontalBar {
background-color: #3366cc;
}
.Footer {
font-size: x-small;
color: #6f6f6f;
}
/* HISTORY:
2001-10-28
* Cleaned up the code and added a few comments.
*/
aroma.light/inst/rsp/preprocessing.html.rsp 0000644 0001750 0001750 00000004601 14136047216 020753 0 ustar nilesh nilesh
Pre-processing of signals
Although the discussion below applies equally well to a pair of single-channel arrays, for simplicity consider a two-color experiment (C=2) where each array has I features (spots). Let x_{c,i} denote the (unknown) true biological signal for feature i=1,2,...,I in sample/channel c=1,2,...,C. This is the quantity that we wish to estimate.
With microarrays, we obtain estimates of x_{c,i} called feature signals. Denote these by y_{c,i}. These are not direct measurements, but rather transformed signals, due to additive background, scale effects, non-linear effects, and other systematic effects, all of which we summarize in a function f_c(x_{c,i}).
With random errors e_{c,i}, we measure:
  y_{c,i} = f_c(x_{c,i}) + e_{c,i}.
The most general objective in microarray analysis is to obtain an estimate x*_{c,i} of the true signal, or at least an estimate y*_{c,i} that is close to the true signal up to a less important scale factor d. If we can estimate f_c(), then we can back-transform the data to get:
  y*_{c,i} = d * f_c^{-1}(y_{c,i}).
If we manage to do this, we know that y*_{c,i} is proportional to x_{c,i}, which is often sufficient since we only need to know the relative signal, e.g. when the true signal doubles, our estimate doubles too. Note that this is typically not the case for y_{c,i}.
This is why we pre-process measured signals.
Simply speaking, background subtraction, scanner calibration, and normalization are all about estimating f_c() and back-transforming the signals to obtain y*_{c,i}.
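As a toy illustration (simulated data; the affine form of f_c and all parameter values below are made up for the example), the back-transformation step amounts to:

  set.seed(1)
  x <- rexp(1000, rate=1/500)              # true signals x_{c,i}
  a <- 200; b <- 1.6                       # offset and scale of an affine f_c
  y <- a + b*x + rnorm(1000, sd=10)        # observed y_{c,i} = f_c(x_{c,i}) + e_{c,i}

  aHat <- 200; bHat <- 1.6                 # pretend these were estimated, e.g. by
                                           # normalizeAffine() or calibrateMultiscan()
  d <- 1                                   # unimportant scale factor
  yStar <- d * (y - aHat)/bHat             # y*_{c,i} = d * f_c^{-1}(y_{c,i})

  cor(yStar, x)                            # close to 1: y* is ~proportional to x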
For the definitions and the difference between calibration and normalization, see the introduction of H. Bengtsson (2004).