gbm/0000755000176200001440000000000013417121763011024 5ustar liggesusersgbm/inst/0000755000176200001440000000000013417115400011770 5ustar liggesusersgbm/inst/doc/0000755000176200001440000000000013417115400012535 5ustar liggesusersgbm/inst/doc/gbm.Rnw0000644000176200001440000007264113346511223014010 0ustar liggesusers\documentclass{article} \bibliographystyle{plain} \newcommand{\EV}{\mathrm{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\aRule}{\begin{center} \rule{5in}{1mm} \end{center}} \title{Generalized Boosted Models:\\A guide to the gbm package} \author{Greg Ridgeway} %\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{Generalized Boosted Models: A guide to the gbm package} \newcommand{\mathgbf}[1]{{\mbox{\boldmath$#1$\unboldmath}}} \begin{document} \maketitle Boosting takes on various forms with different programs using different loss functions, different base models, and different optimization schemes. The gbm package takes the approach described in \cite{Friedman:2001} and \cite{Friedman:2002}. Some of the terminology differs, mostly due to an effort to cast boosting terms into more standard statistical terminology (e.g. deviance). In addition, the gbm package implements boosting for models commonly used in statistics but not commonly associated with boosting. The Cox proportional hazard model, for example, is an incredibly useful model and the boosting framework applies quite readily with only slight modification \cite{Ridgeway:1999}. Also some algorithms implemented in the gbm package differ from the standard implementation. The AdaBoost algorithm \cite{FreundSchapire:1997} has a particular loss function and a particular optimization algorithm associated with it. The gbm implementation of AdaBoost adopts AdaBoost's exponential loss function (its bound on misclassification rate) but uses Friedman's gradient descent algorithm rather than the original one proposed. So the main purposes of this document is to spell out in detail what the gbm package implements. \section{Gradient boosting} This section essentially presents the derivation of boosting described in \cite{Friedman:2001}. The gbm package also adopts the stochastic gradient boosting strategy, a small but important tweak on the basic algorithm, described in \cite{Friedman:2002}. \subsection{Friedman's gradient boosting machine} \label{sec:GradientBoostingMachine} \begin{figure} \aRule Initialize $\hat f(\mathbf{x})$ to be a constant, $\hat f(\mathbf{x}) = \arg \min_{\rho} \sum_{i=1}^N \Psi(y_i,\rho)$. \\ For $t$ in $1,\ldots,T$ do \begin{enumerate} \item Compute the negative gradient as the working response \begin{equation} z_i = -\frac{\partial}{\partial f(\mathbf{x}_i)} \Psi(y_i,f(\mathbf{x}_i)) \mbox{\Huge $|$}_{f(\mathbf{x}_i)=\hat f(\mathbf{x}_i)} \end{equation} \item Fit a regression model, $g(\mathbf{x})$, predicting $z_i$ from the covariates $\mathbf{x}_i$. \item Choose a gradient descent step size as \begin{equation} \rho = \arg \min_{\rho} \sum_{i=1}^N \Psi(y_i,\hat f(\mathbf{x}_i)+\rho g(\mathbf{x}_i)) \end{equation} \item Update the estimate of $f(\mathbf{x})$ as \begin{equation} \hat f(\mathbf{x}) \leftarrow \hat f(\mathbf{x}) + \rho g(\mathbf{x}) \end{equation} \end{enumerate} \aRule \caption{Friedman's Gradient Boost algorithm} \label{fig:GradientBoost} \end{figure} Friedman (2001) and the companion paper Friedman (2002) extended the work of Friedman, Hastie, and Tibshirani (2000) and laid the ground work for a new generation of boosting algorithms. Using the connection between boosting and optimization, this new work proposes the Gradient Boosting Machine. In any function estimation problem we wish to find a regression function, $\hat f(\mathbf{x})$, that minimizes the expectation of some loss function, $\Psi(y,f)$, as shown in (\ref{NonparametricRegression1}). \begin{eqnarray} \hspace{0.5in} \hat f(\mathbf{x}) &=& \arg \min_{f(\mathbf{x})} \EV_{y,\mathbf{x}} \Psi(y,f(\mathbf{x})) \nonumber \\ \label{NonparametricRegression1} &=& \arg \min_{f(\mathbf{x})} \EV_x \left[ \EV_{y|\mathbf{x}} \Psi(y,f(\mathbf{x})) \Big| \mathbf{x} \right] \end{eqnarray} We will focus on finding estimates of $f(\mathbf{x})$ such that \begin{equation} \label{NonparametricRegression2} \hspace{0.5in} \hat f(\mathbf{x}) = \arg \min_{f(\mathbf{x})} \EV_{y|\mathbf{x}} \left[ \Psi(y,f(\mathbf{x}))|\mathbf{x} \right] \end{equation} Parametric regression models assume that $f(\mathbf{x})$ is a function with a finite number of parameters, $\beta$, and estimates them by selecting those values that minimize a loss function (e.g. squared error loss) over a training sample of $N$ observations on $(y,\mathbf{x})$ pairs as in (\ref{eq:Friedman1}). \begin{equation} \label{eq:Friedman1} \hspace{0.5in} \hat\beta = \arg \min_{\beta} \sum_{i=1}^N \Psi(y_i,f(\mathbf{x}_i;\beta)) \end{equation} When we wish to estimate $f(\mathbf{x})$ non-parametrically the task becomes more difficult. Again we can proceed similarly to \cite{FHT:2000} and modify our current estimate of $f(\mathbf{x})$ by adding a new function $f(\mathbf{x})$ in a greedy fashion. Letting $f_i = f(\mathbf{x}_i)$, we see that we want to decrease the $N$ dimensional function \begin{eqnarray} \label{EQ:Friedman2} \hspace{0.5in} J(\mathbf{f}) &=& \sum_{i=1}^N \Psi(y_i,f(\mathbf{x}_i)) \nonumber \\ &=& \sum_{i=1}^N \Psi(y_i,F_i). \end{eqnarray} The negative gradient of $J(\mathbf{f})$ indicates the direction of the locally greatest decrease in $J(\mathbf{f})$. Gradient descent would then have us modify $\mathbf{f}$ as \begin{equation} \label{eq:Friedman3} \hspace{0.5in} \hat \mathbf{f} \leftarrow \hat \mathbf{f} - \rho \nabla J(\mathbf{f}) \end{equation} where $\rho$ is the size of the step along the direction of greatest descent. Clearly, this step alone is far from our desired goal. First, it only fits $f$ at values of $\mathbf{x}$ for which we have observations. Second, it does not take into account that observations with similar $\mathbf{x}$ are likely to have similar values of $f(\mathbf{x})$. Both these problems would have disastrous effects on generalization error. However, Friedman suggests selecting a class of functions that use the covariate information to approximate the gradient, usually a regression tree. This line of reasoning produces his Gradient Boosting algorithm shown in Figure~\ref{fig:GradientBoost}. At each iteration the algorithm determines the direction, the gradient, in which it needs to improve the fit to the data and selects a particular model from the allowable class of functions that is in most agreement with the direction. In the case of squared-error loss, $\Psi(y_i,f(\mathbf{x}_i)) = \sum_{i=1}^N (y_i-f(\mathbf{x}_i))^2$, this algorithm corresponds exactly to residual fitting. There are various ways to extend and improve upon the basic framework suggested in Figure~\ref{fig:GradientBoost}. For example, Friedman (2001) substituted several choices in for $\Psi$ to develop new boosting algorithms for robust regression with least absolute deviation and Huber loss functions. Friedman (2002) showed that a simple subsampling trick can greatly improve predictive performance while simultaneously reduce computation time. Section~\ref{GBMModifications} discusses some of these modifications. \section{Improving boosting methods using control of the learning rate, sub-sampling, and a decomposition for interpretation} \label{GBMModifications} This section explores the variations of the previous algorithms that have the potential to improve their predictive performance and interpretability. In particular, by controlling the optimization speed or learning rate, introducing low-variance regression methods, and applying ideas from robust regression we can produce non-parametric regression procedures with many desirable properties. As a by-product some of these modifications lead directly into implementations for learning from massive datasets. All these methods take advantage of the general form of boosting \begin{equation} \hat f(\mathbf{x}) \leftarrow \hat f(\mathbf{x}) + \EV(z(y,\hat f(\mathbf{x}))|\mathbf{x}). \end{equation} So far we have taken advantage of this form only by substituting in our favorite regression procedure for $\EV_w(z|\mathbf{x})$. I will discuss some modifications to estimating $\EV_w(z|\mathbf{x})$ that have the potential to improve our algorithm. \subsection{Decreasing the learning rate} As several authors have phrased slightly differently, ``...boosting, whatever flavor, seldom seems to overfit, no matter how many terms are included in the additive expansion''. This is not true as the discussion to \cite{FHT:2000} points out. In the update step of any boosting algorithm we can introduce a learning rate to dampen the proposed move. \begin{equation} \label{eq:shrinkage} \hat f(\mathbf{x}) \leftarrow \hat f(\mathbf{x}) + \lambda \EV(z(y,\hat f(\mathbf{x}))|\mathbf{x}). \end{equation} By multiplying the gradient step by $\lambda$ as in equation~\ref{eq:shrinkage} we have control on the rate at which the boosting algorithm descends the error surface (or ascends the likelihood surface). When $\lambda=1$ we return to performing full gradient steps. Friedman (2001) relates the learning rate to regularization through shrinkage. The optimal number of iterations, $T$, and the learning rate, $\lambda$, depend on each other. In practice I set $\lambda$ to be as small as possible and then select $T$ by cross-validation. Performance is best when $\lambda$ is as small as possible performance with decreasing marginal utility for smaller and smaller $\lambda$. Slower learning rates do not necessarily scale the number of optimal iterations. That is, if when $\lambda=1.0$ and the optimal $T$ is 100 iterations, does {\it not} necessarily imply that when $\lambda=0.1$ the optimal $T$ is 1000 iterations. \subsection{Variance reduction using subsampling} Friedman (2002) proposed the stochastic gradient boosting algorithm that simply samples uniformly without replacement from the dataset before estimating the next gradient step. He found that this additional step greatly improved performance. We estimate the regression $\EV(z(y,\hat f(\mathbf{x}))|\mathbf{x})$ using a random subsample of the dataset. \subsection{ANOVA decomposition} Certain function approximation methods are decomposable in terms of a ``functional ANOVA decomposition''. That is a function is decomposable as \begin{equation} \label{ANOVAdecomp} f(\mathbf{x}) = \sum_j f_j(x_j) + \sum_{jk} f_{jk}(x_j,x_k) + \sum_{jk\ell} f_{jk\ell}(x_j,x_k,x_\ell) + \cdots. \end{equation} This applies to boosted trees. Regression stumps (one split decision trees) depend on only one variable and fall into the first term of \ref{ANOVAdecomp}. Trees with two splits fall into the second term of \ref{ANOVAdecomp} and so on. By restricting the depth of the trees produced on each boosting iteration we can control the order of approximation. Often additive components are sufficient to approximate a multivariate function well, generalized additive models, the na\"{\i}ve Bayes classifier, and boosted stumps for example. When the approximation is restricted to a first order we can also produce plots of $x_j$ versus $f_j(x_j)$ to demonstrate how changes in $x_j$ might affect changes in the response variable. \subsection{Relative influence} Friedman (2001) also develops an extension of a variable's ``relative influence'' for boosted estimates. For tree based methods the approximate relative influence of a variable $x_j$ is \begin{equation} \label{RelInfluence} \hspace{0.5in} \hat J_j^2 = \hspace{-0.1in}\sum_{\mathrm{splits~on~}x_j}\hspace{-0.2in}I_t^2 \end{equation} where $I_t^2$ is the empirical improvement by splitting on $x_j$ at that point. Friedman's extension to boosted models is to average the relative influence of variable $x_j$ across all the trees generated by the boosting algorithm. \begin{figure} \aRule Select \begin{itemize} \item a loss function (\texttt{distribution}) \item the number of iterations, $T$ (\texttt{n.trees}) \item the depth of each tree, $K$ (\texttt{interaction.depth}) \item the shrinkage (or learning rate) parameter, $\lambda$ (\texttt{shrinkage}) \item the subsampling rate, $p$ (\texttt{bag.fraction}) \end{itemize} Initialize $\hat f(\mathbf{x})$ to be a constant, $\hat f(\mathbf{x}) = \arg \min_{\rho} \sum_{i=1}^N \Psi(y_i,\rho)$ \\ For $t$ in $1,\ldots,T$ do \begin{enumerate} \item Compute the negative gradient as the working response \begin{equation} z_i = -\frac{\partial}{\partial f(\mathbf{x}_i)} \Psi(y_i,f(\mathbf{x}_i)) \mbox{\Huge $|$}_{f(\mathbf{x}_i)=\hat f(\mathbf{x}_i)} \end{equation} \item Randomly select $p\times N$ cases from the dataset \item Fit a regression tree with $K$ terminal nodes, $g(\mathbf{x})=\EV(z|\mathbf{x})$. This tree is fit using only those randomly selected observations \item Compute the optimal terminal node predictions, $\rho_1,\ldots,\rho_K$, as \begin{equation} \rho_k = \arg \min_{\rho} \sum_{\mathbf{x}_i\in S_k} \Psi(y_i,\hat f(\mathbf{x}_i)+\rho) \end{equation} where $S_k$ is the set of $\mathbf{x}$s that define terminal node $k$. Again this step uses only the randomly selected observations. \item Update $\hat f(\mathbf{x})$ as \begin{equation} \hat f(\mathbf{x}) \leftarrow \hat f(\mathbf{x}) + \lambda\rho_{k(\mathbf{x})} \end{equation} where $k(\mathbf{x})$ indicates the index of the terminal node into which an observation with features $\mathbf{x}$ would fall. \end{enumerate} \aRule \caption{Boosting as implemented in \texttt{gbm()}} \label{fig:gbm} \end{figure} \section{Common user options} This section discusses the options to gbm that most users will need to change or tune. \subsection{Loss function} The first and foremost choice is \texttt{distribution}. This should be easily dictated by the application. For most classification problems either \texttt{bernoulli} or \texttt{adaboost} will be appropriate, the former being recommended. For continuous outcomes the choices are \texttt{gaussian} (for minimizing squared error), \texttt{laplace} (for minimizing absolute error), and quantile regression (for estimating percentiles of the conditional distribution of the outcome). Censored survival outcomes should require \texttt{coxph}. Count outcomes may use \texttt{poisson} although one might also consider \texttt{gaussian} or \texttt{laplace} depending on the analytical goals. \subsection{The relationship between shrinkage and number of iterations} The issues that most new users of gbm struggle with are the choice of \texttt{n.trees} and \texttt{shrinkage}. It is important to know that smaller values of \texttt{shrinkage} (almost) always give improved predictive performance. That is, setting \texttt{shrinkage=0.001} will almost certainly result in a model with better out-of-sample predictive performance than setting \texttt{shrinkage=0.01}. However, there are computational costs, both storage and CPU time, associated with setting \texttt{shrinkage} to be low. The model with \texttt{shrinkage=0.001} will likely require ten times as many iterations as the model with \texttt{shrinkage=0.01}, increasing storage and computation time by a factor of 10. Figure~\ref{fig:shrinkViters} shows the relationship between predictive performance, the number of iterations, and the shrinkage parameter. Note that the increase in the optimal number of iterations between two choices for shrinkage is roughly equal to the ratio of the shrinkage parameters. It is generally the case that for small shrinkage parameters, 0.001 for example, there is a fairly long plateau in which predictive performance is at its best. My rule of thumb is to set \texttt{shrinkage} as small as possible while still being able to fit the model in a reasonable amount of time and storage. I usually aim for 3,000 to 10,000 iterations with shrinkage rates between 0.01 and 0.001. \begin{figure}[ht] \begin{center} \includegraphics[width=5in]{shrinkage-v-iterations} \end{center} \caption{Out-of-sample predictive performance by number of iterations and shrinkage. Smaller values of the shrinkage parameter offer improved predictive performance, but with decreasing marginal improvement.} \label{fig:shrinkViters} \end{figure} \subsection{Estimating the optimal number of iterations} gbm offers three methods for estimating the optimal number of iterations after the gbm model has been fit, an independent test set (\texttt{test}), out-of-bag estimation (\texttt{OOB}), and $v$-fold cross validation (\texttt{cv}). The function \texttt{gbm.perf} computes the iteration estimate. Like Friedman's MART software, the independent test set method uses a single holdout test set to select the optimal number of iterations. If \texttt{train.fraction} is set to be less than 1, then only the \textit{first} \texttt{train.fraction}$\times$\texttt{nrow(data)} will be used to fit the model. Note that if the data are sorted in a systematic way (such as cases for which $y=1$ come first), then the data should be shuffled before running gbm. Those observations not used in the model fit can be used to get an unbiased estimate of the optimal number of iterations. The downside of this method is that a considerable number of observations are used to estimate the single regularization parameter (number of iterations) leaving a reduced dataset for estimating the entire multivariate model structure. Use \texttt{gbm.perf(...,method="test")} to obtain an estimate of the optimal number of iterations using the held out test set. If \texttt{bag.fraction} is set to be greater than 0 (0.5 is recommended), gbm computes an out-of-bag estimate of the improvement in predictive performance. It evaluates the reduction in deviance on those observations not used in selecting the next regression tree. The out-of-bag estimator underestimates the reduction in deviance. As a result, it almost always is too conservative in its selection for the optimal number of iterations. The motivation behind this method was to avoid having to set aside a large independent dataset, which reduces the information available for learning the model structure. Use \texttt{gbm.perf(...,method="OOB")} to obtain the OOB estimate. Lastly, gbm offers $v$-fold cross validation for estimating the optimal number of iterations. If when fitting the gbm model, \texttt{cv.folds=5} then gbm will do 5-fold cross validation. gbm will fit five gbm models in order to compute the cross validation error estimate and then will fit a sixth and final gbm model with \texttt{n.trees}iterations using all of the data. The returned model object will have a component labeled \texttt{cv.error}. Note that \texttt{gbm.more} will do additional gbm iterations but will not add to the \texttt{cv.error} component. Use \texttt{gbm.perf(...,method="cv")} to obtain the cross validation estimate. \begin{figure}[ht] \begin{center} \includegraphics[width=5in]{oobperf2} \end{center} \caption{Out-of-sample predictive performance of four methods of selecting the optimal number of iterations. The vertical axis plots performance relative the best. The boxplots indicate relative performance across thirteen real datasets from the UCI repository. See \texttt{demo(OOB-reps)}.} \label{fig:oobperf} \end{figure} Figure~\ref{fig:oobperf} compares the three methods for estimating the optimal number of iterations across 13 datasets. The boxplots show the methods performance relative to the best method on that dataset. For most datasets the method perform similarly, however, 5-fold cross validation is consistently the best of them. OOB, using a 33\% test set, and using a 20\% test set all have datasets for which the perform considerably worse than the best method. My recommendation is to use 5- or 10-fold cross validation if you can afford the computing time. Otherwise you may choose among the other options, knowing that OOB is conservative. \section{Available distributions} This section gives some of the mathematical detail for each of the distribution options that gbm offers. The gbm engine written in C++ has access to a C++ class for each of these distributions. Each class contains methods for computing the associated deviance, initial value, the gradient, and the constants to predict in each terminal node. In the equations shown below, for non-zero offset terms, replace $f(\mathbf{x}_i)$ with $o_i + f(\mathbf{x}_i)$. \subsection{Gaussian} \begin{tabular}{ll} Deviance & $\displaystyle \frac{1}{\sum w_i} \sum w_i(y_i-f(\mathbf{x}_i))^2$ \\ Initial value & $\displaystyle f(\mathbf{x})=\frac{\sum w_i(y_i-o_i)}{\sum w_i}$ \\ Gradient & $z_i=y_i - f(\mathbf{x}_i)$ \\ Terminal node estimates & $\displaystyle \frac{\sum w_i(y_i-f(\mathbf{x}_i))}{\sum w_i}$ \end{tabular} \subsection{AdaBoost} \begin{tabular}{ll} Deviance & $\displaystyle \frac{1}{\sum w_i} \sum w_i\exp(-(2y_i-1)f(\mathbf{x}_i))$ \\ Initial value & $\displaystyle \frac{1}{2}\log\frac{\sum y_iw_ie^{-o_i}}{\sum (1-y_i)w_ie^{o_i}}$ \\ Gradient & $\displaystyle z_i= -(2y_i-1)\exp(-(2y_i-1)f(\mathbf{x}_i))$ \\ Terminal node estimates & $\displaystyle \frac{\sum (2y_i-1)w_i\exp(-(2y_i-1)f(\mathbf{x}_i))} {\sum w_i\exp(-(2y_i-1)f(\mathbf{x}_i))}$ \end{tabular} \subsection{Bernoulli} \begin{tabular}{ll} Deviance & $\displaystyle -2\frac{1}{\sum w_i} \sum w_i(y_if(\mathbf{x}_i)-\log(1+\exp(f(\mathbf{x}_i))))$ \\ Initial value & $\displaystyle \log\frac{\sum w_iy_i}{\sum w_i(1-y_i)}$ \\ Gradient & $\displaystyle z_i=y_i-\frac{1}{1+\exp(-f(\mathbf{x}_i))}$ \\ Terminal node estimates & $\displaystyle \frac{\sum w_i(y_i-p_i)}{\sum w_ip_i(1-p_i)}$ \\ & where $\displaystyle p_i = \frac{1}{1+\exp(-f(\mathbf{x}_i))}$ \\ \end{tabular} Notes: \begin{itemize} \item For non-zero offset terms, the computation of the initial value requires Newton-Raphson. Initialize $f_0=0$ and iterate $\displaystyle f_0 \leftarrow f_0 + \frac{\sum w_i(y_i-p_i)}{\sum w_ip_i(1-p_i)}$ where $\displaystyle p_i = \frac{1}{1+\exp(-(o_i+f_0))}$. \end{itemize} \subsection{Laplace} \begin{tabular}{ll} Deviance & $\frac{1}{\sum w_i} \sum w_i|y_i-f(\mathbf{x}_i)|$ \\ Initial value & $\mbox{median}_w(y)$ \\ Gradient & $z_i=\mbox{sign}(y_i-f(\mathbf{x}_i))$ \\ Terminal node estimates & $\mbox{median}_w(z)$ \end{tabular} Notes: \begin{itemize} \item $\mbox{median}_w(y)$ denotes the weighted median, defined as the solution to the equation $\frac{\sum w_iI(y_i\leq m)}{\sum w_i}=\frac{1}{2}$ \item \texttt{gbm()} currently does not implement the weighted median and issues a warning when the user uses weighted data with \texttt{distribution="laplace"}. \end{itemize} \subsection{Quantile regression} Contributed by Brian Kriegler (see \cite{Kriegler:2010}). \begin{tabular}{ll} Deviance & $\frac{1}{\sum w_i} \left(\alpha\sum_{y_i>f(\mathbf{x}_i)} w_i(y_i-f(\mathbf{x}_i))\right. +$ \\ & \hspace{0.5in}$\left.(1-\alpha)\sum_{y_i\leq f(\mathbf{x}_i)} w_i(f(\mathbf{x}_i)-y_i)\right)$ \\ Initial value & $\mathrm{quantile}^{(\alpha)}_w(y)$ \\ Gradient & $z_i=\alpha I(y_i>f(\mathbf{x}_i))-(1-\alpha)I(y_i\leq f(\mathbf{x}_i))$ \\ Terminal node estimates & $\mathrm{quantile}^{(\alpha)}_w(z)$ \end{tabular} Notes: \begin{itemize} \item $\mathrm{quantile}^{(\alpha)}_w(y)$ denotes the weighted quantile, defined as the solution to the equation $\frac{\sum w_iI(y_i\leq q)}{\sum w_i}=\alpha$ \item \texttt{gbm()} currently does not implement the weighted median and issues a warning when the user uses weighted data with \texttt{distribution=list(name="quantile")}. \end{itemize} \subsection{Cox Proportional Hazard} \begin{tabular}{ll} Deviance & $-2\sum w_i(\delta_i(f(\mathbf{x}_i)-\log(R_i/w_i)))$\\ Gradient & $\displaystyle z_i=\delta_i - \sum_j \delta_j \frac{w_jI(t_i\geq t_j)e^{f(\mathbf{x}_i)}} {\sum_k w_kI(t_k\geq t_j)e^{f(\mathbf{x}_k)}}$ \\ Initial value & 0 \\ Terminal node estimates & Newton-Raphson algorithm \end{tabular} \begin{enumerate} \item Initialize the terminal node predictions to 0, $\mathgbf{\rho}=0$ \item Let $\displaystyle p_i^{(k)}=\frac{\sum_j I(k(j)=k)I(t_j\geq t_i)e^{f(\mathbf{x}_i)+\rho_k}} {\sum_j I(t_j\geq t_i)e^{f(\mathbf{x}_i)+\rho_k}}$ \item Let $g_k=\sum w_i\delta_i\left(I(k(i)=k)-p_i^{(k)}\right)$ \item Let $\mathbf{H}$ be a $k\times k$ matrix with diagonal elements \begin{enumerate} \item Set diagonal elements $H_{mm}=\sum w_i\delta_i p_i^{(m)}\left(1-p_i^{(m)}\right)$ \item Set off diagonal elements $H_{mn}=-\sum w_i\delta_i p_i^{(m)}p_i^{(n)}$ \end{enumerate} \item Newton-Raphson update $\mathgbf{\rho} \leftarrow \mathgbf{\rho} - \mathbf{H}^{-1}\mathbf{g}$ \item Return to step 2 until convergence \end{enumerate} Notes: \begin{itemize} \item $t_i$ is the survival time and $\delta_i$ is the death indicator. \item $R_i$ denotes the hazard for the risk set, $R_i=\sum_{j=1}^N w_jI(t_j\geq t_i)e^{f(\mathbf{x}_i)}$ \item $k(i)$ indexes the terminal node of observation $i$ \item For speed, \texttt{gbm()} does only one step of the Newton-Raphson algorithm rather than iterating to convergence. No appreciable loss of accuracy since the next boosting iteration will simply correct for the prior iterations inadequacy. \item \texttt{gbm()} initially sorts the data by survival time. Doing this reduces the computation of the risk set from $O(n^2)$ to $O(n)$ at the cost of a single up front sort on survival time. After the model is fit, the data are then put back in their original order. \end{itemize} \subsection{Poisson} \begin{tabular}{ll} Deviance & -2$\frac{1}{\sum w_i} \sum w_i(y_if(\mathbf{x}_i)-\exp(f(\mathbf{x}_i)))$ \\ Initial value & $\displaystyle f(\mathbf{x})= \log\left(\frac{\sum w_iy_i}{\sum w_ie^{o_i}}\right)$ \\ Gradient & $z_i=y_i - \exp(f(\mathbf{x}_i))$ \\ Terminal node estimates & $\displaystyle \log\frac{\sum w_iy_i}{\sum w_i\exp(f(\mathbf{x}_i))}$ \end{tabular} The Poisson class includes special safeguards so that the most extreme predicted values are $e^{-19}$ and $e^{+19}$. This behavior is consistent with \texttt{glm()}. \subsection{Pairwise} This distribution implements ranking measures following the \emph{LambdaMart} algorithm \cite{Burges:2010}. Instances belong to \emph{groups}; all pairs of items with different labels, belonging to the same group, are used for training. In \emph{Information Retrieval} applications, groups correspond to user queries, and items to (feature vectors of) documents in the associated match set to be ranked. For consistency with typical usage, our goal is to \emph{maximize} one of the \emph{utility} functions listed below. Consider a group with instances $x_1, \dots, x_n$, ordered such that $f(x_1) \geq f(x_2) \geq \dots f(x_n)$; i.e., the \emph{rank} of $x_i$ is $i$, where smaller ranks are preferable. Let $P$ be the set of all ordered pairs such that $y_i > y_j$. \begin{enumerate} \item[{\bf Concordance:}] Fraction of concordant (i.e, correctly ordered) pairs. For the special case of binary labels, this is equivalent to the Area under the ROC Curve. $$\left\{ \begin{array}{l l}\frac{\|\{(i,j)\in P | f(x_i)>f(x_j)\}\|}{\|P\|} & P \neq \emptyset\\ 0 & \mbox{otherwise.} \end{array}\right. $$ \item[{\bf MRR:}] Mean reciprocal rank of the highest-ranked positive instance (it is assumed $y_i\in\{0,1\}$): $$\left\{ \begin{array}{l l}\frac{1}{\min\{1 \leq i \leq n |y_i=1\}} & \exists i: \, 1 \leq i \leq n, y_i=1\\ 0 & \mbox{otherwise.}\end{array}\right.$$ \item[{\bf MAP:}] Mean average precision, a generalization of MRR to multiple positive instances: $$\left\{ \begin{array}{l l} \frac{\sum_{1\leq i\leq n | y_i=1} \|\{1\leq j\leq i |y_j=1\}\|\,/\,i}{\|\{1\leq i\leq n | y_i=1\}\|} & \exists i: \, 1 \leq i \leq n, y_i=1\\ 0 & \mbox{otherwise.}\end{array}\right.$$ \item[{\bf nDCG:}] Normalized discounted cumulative gain: $$\frac{\sum_{1\leq i\leq n} \log_2(i+1) \, y_i}{\sum_{1\leq i\leq n} \log_2(i+1) \, y'_i},$$ where $y'_1, \dots, y'_n$ is a reordering of $y_1, \dots,y_n$ with $y'_1 \geq y'_2 \geq \dots \geq y'_n$. \end{enumerate} The generalization to multiple (possibly weighted) groups is straightforward. Sometimes a cut-off rank $k$ is given for \emph{MRR} and \emph{nDCG}, in which case we replace the outer index $n$ by $\min(n,k)$. The initial value for $f(x_i)$ is always zero. We derive the gradient of a cost function whose gradient locally approximates the gradient of the IR measure for a fixed ranking: \begin{eqnarray*} \Phi & = & \sum_{(i,j) \in P} \Phi_{ij}\\ & = & \sum_{(i,j) \in P} |\Delta Z_{ij}| \log \left( 1 + e^{-(f(x_i) - f(x_j))}\right), \end{eqnarray*} where $|\Delta Z_{ij}|$ is the absolute utility difference when swapping the ranks of $i$ and $j$, while leaving all other instances the same. Define \begin{eqnarray*} \lambda_{ij} & = & \frac{\partial\Phi_{ij}}{\partial f(x_i)}\\ & = & - |\Delta Z_{ij}| \frac{1}{1 + e^{f(x_i) - f(x_j)}}\\ & = & - |\Delta Z_{ij}| \, \rho_{ij}, \end{eqnarray*} with $$ \rho_{ij} = - \frac{\lambda_{ij }}{|\Delta Z_{ij}|} = \frac{1}{1 + e^{f(x_i) - f(x_j)}}$$ For the gradient of $\Phi$ with respect to $f(x_i)$, define \begin{eqnarray*} \lambda_i & = & \frac{\partial \Phi}{\partial f(x_i)}\\ & = & \sum_{j|(i,j) \in P} \lambda_{ij} - \sum_{j|(j,i) \in P} \lambda_{ji}\\ & = & - \sum_{j|(i,j) \in P} |\Delta Z_{ij}| \, \rho_{ij}\\ & & \mbox{} + \sum_{j|(j,i) \in P} |\Delta Z_{ji}| \, \rho_{ji}. \end{eqnarray*} The second derivative is \begin{eqnarray*} \gamma_i & \stackrel{def}{=} & \frac{\partial^2\Phi}{\partial f(x_i)^2}\\ & = & \sum_{j|(i,j) \in P} |\Delta Z_{ij}| \, \rho_{ij} \, (1-\rho_{ij})\\ & & \mbox{} + \sum_{j|(j,i) \in P} |\Delta Z_{ji}| \, \rho_{ji} \, (1-\rho_{ji}). \end{eqnarray*} Now consider again all groups with associated weights. For a given terminal node, let $i$ range over all contained instances. Then its estimate is $$-\frac{\sum_i v_i\lambda_{i}}{\sum_i v_i \gamma_i},$$ where $v_i=w(\mbox{\em group}(i))/\|\{(j,k)\in\mbox{\em group}(i)\}\|.$ In each iteration, instances are reranked according to the preliminary scores $f(x_i)$ to determine the $|\Delta Z_{ij}|$. Note that in order to avoid ranking bias, we break ties by adding a small amount of random noise. \bibliography{gbm} \end{document} gbm/inst/doc/gbm.pdf0000644000176200001440000100701613417115377014017 0ustar liggesusers%PDF-1.5 % 3 0 obj << /Length 1923 /Filter /FlateDecode >> stream xڅXK6WfjZ{7U$ˆ2Z/@ht7|1H,t,U<`ֻ}'yqƔ6/:`R%Kyǜ-^g!Zb;u%9YyYY:#ʨ(:dwIN:(/2آ#4otp5wީ$2*Or,Xi;XRndeO~>ꖻd*_A1 endstream endobj 14 0 obj << /Length 1258 /Filter /FlateDecode >> stream xXmo6_o+ b0 :@E[len=M˲!'tw|f"^{ֿ+#@@j%Jh Β¿NP-jG暬.ARngE0a(#5:smzS:?[Z5m*x6vӧ=SHyTؿcw>֍؉5`!;X&QE|CD6'>0a)gT߃C]E5H%dҫ[FiEr`!95i&_Rb'sSʄ`5N -gVG'k ՞lV;m N㘒n< qO6Q@NҢѮ Ԣx93rxRt>(d-+ʮnƮi\'IJxbT*5s g~q'!\ДL6-V-"jBNlZE{%dE_v$q&4Hsy'u)*AH7ܫpMW0PKk벻5ݮ/o«Aꭜ*e0'ga)NڈCc S D|n cy48C&!9P?M SctЕOzeQA?n,9ٗi#Jl/#UGj# C9^A[NW0a^öʮ7c5IL"ROz k0cgy>H,=9s)+H9xeZgר*nQnk|gg|Ksջv$UZJ/8gzSl^iEj|v]5m_mL xsڣd<6aޟ!PR endstream endobj 23 0 obj << /Length 2608 /Filter /FlateDecode >> stream xYY6~G5VxC,v xݭYI뷊EKdiC6߽ Wo&Osa\Te eEez;Pc:qUSO=qOo&NÐ a(fvk0|:mC!4.Ѹ|{@"mֻ%"IMrw &? x[i*۷ eI!49+3I`/T֏\kDTLr^,enwjΥֶ iM :!^(Ӷˈ,SYWSJwSsiG>?qgK tᇭ)/27<*@geӹfͦtpie۶i0]% "[1ѷI%nK^%440 ζ^x_$ue }ym`Fdzh;MŌ!>g֘+a q#\")*l¶Cti]FC1r;ДyZp,qpyϯ"%ju(>5R5室ȈpqѲ"l\g=πy$)%wWDcR=gVXr"A <_1?BVԻ1̚t?Q4r +UCq܅\ R"ϒoNAT`7a eu0 +iZǝBYV >) 8~Tͥ \ֺRZͨ_T9r{˦( kCq|\MڻPS />sGtK{VC`Ԃr?tg`s%"|-5ҩG!8b)bT)?P_zwkLFY;J};Gk?o6$,rCQvS>j;;cրb1]f=bp$9)3f Iقg ˮ/$㌀yB?? 4GZMdGP)k EO9WZP|a]*֔D f9=Äj2\b(s zw\dD2fѧ'SMff,Hi#5}п) XIuӠɻmc?/eoCQ(!"Ř$P&,%S648|`5_Kj/Sn /!//ۦ#Wew5ScJ`@MvKlAYׇj^0b,',6&l=D˂/BѬdtƜG#ؓSFzB#XY7A,_P ɜ\q~ЛfWp1}soXeqݢq8w9(~$>Yqر,<''|l7Q&}7} -H/4Wau64{5UQw6K[E$Z7@:tkms{U&܅dkۚ5wo 4]B*yD)6g1!t+1X)2pGTnkkMLE=OK0fefsuɚ;Iִ}8Xy-=Y&31ƃpyQb#>0*ŠVM1؉< P w @`K5ISDeY.ԅ\@5:9|A͋q endstream endobj 27 0 obj << /Length 2704 /Filter /FlateDecode >> stream xM۶_鉚F _L=t:7CxjT@ϿEQM;bXal"w/~]ne{؈4Y4՛Q;Ssd^DM_k)o,p_ 1V%<1FHKPqYfUˮIT;\<IE8—," 8VDkl NCIKQF/lqMBҖWȀNRB û"rQ1)R^/Ūt2%t%P?OqjFFRK%kǕq&h5i9:WuLE\ 6fH4I9t};LFk+~'eK4H`4~h4Mç`G\S m'#T\)rw4lwJ摦"zvcO+[Foa!x>͇5mVDnOgZL :u?LBK"=x |1RضL# [ tn *P䬩O m$ ^ ~;Y+92(cg^:gyS) yPi%R#"2Mw&@ "$0d7@`8|v3\w?9UcFaa5FҜg6R}w \*Y=*5HG halT"4Ⲡ% Go/^5a G=Tj˨L|P`88)x?Aڻ BLA[}2:Cp |Y2\cϘG ,tk(S~3yl0ؓ3 \ң~{b.Uw'&=W9ubHkVEC"K!=)Eo?"z6Smdc z~ZÍ`ep]CGL,C~K9&9k|9=.(A8Mӣ6 $$ϙLXb8Vϳоc!`j{6I{h3M8ah2 ZD=^,}`'I=Q/.QZqqw>eI.@ZZ޵xSsA(IH*H6vO[0e0,M7-ۭ`c+ oX뻆bq6!XY#{.q5j<#MT{G|xBj(p2R)*7 :'V"<9ekPAkt,;1Uk'<;6ώYkw>z؂0jw!yF |` 1L{C@Xt7H2ȗ3Vn@  ZPuMouk{?ZR_cN&}WJR HCv cʨփ̀9"iμ=3EnFBd_܁D-?,WtאIKU,JIP>rY*T_h>$B^pT)Ό/42-=^hdDgI!.+yOw+%PxX9HE^ s)!4qYT,Ŵ $P 4TkT};AcoSU0e8Z\r|$¦#.РXt LN`ZγL˫B0eeZD&@-LqM`3ϙ|\+ݲKL;VFo芏3ޚ~do W0wZd{^5JQIQ'"&v ~U૪y%HJ䔦p2% {1́w(j}~ ct,$SenY_V\>In6]eQ)&[EH<%dyEr-Hڿv>H.Z+ohLvȄr{cEfʪXXI2*i~4 DBL5Ef\C"ډ( L4?d!]!n=‚޹KJcyYsAHsyit[tT&S.d]Kp_rs'"zޖv9saNH-g/*?I!ęT_Y͈8TB®GζP`>!'nHhwٿBRSIW~r wU2OT\J 4]J >L@D,| jt~ڭxڍ&Go5OS~>!f QoyV *1U& 2'%.QvE!FW Ww/ [ü endstream endobj 30 0 obj << /Length 2663 /Filter /FlateDecode >> stream xڽ˒_%TeE/ˇlʛć8Lũ&0HI_O?0" w7lWy[*"n%JmVTfu[=0lޯIʦޕcݵk덖2ZڤPjy?oj}ծqo) $y2ix՜4\%ál9&Gx(oj 60, DZxM}Uu<9nˆgOc.DR-LZB`R'&>L?\T$*֓m/,u :fEvTjʾn>2`@x逿^=Otwy2 _I ܗzxޘ o|D]%qN,rX-&"| 77ט$X "R,O~@%p:-$כ"U,#c,U0KZ/F""E^R)&&-%$ԇc(I{2U\K\R?f39I9T9M\Rţ~B<9\xJ:Yj`4"dyд3k$niRe50ΦFd S_ׅN;Zj;m]PLpE'ZKU8کx:؝~, 9ݑtCg0FV-}9'}+!l42/}׃+=%mpLj8 օ])t_Nm;̡Ng:6: u6~*Q$ʡK~b(Y}IV`3轴` cu'gLPDC{"xO6r`($<ڣ WP ?2nAnYr|@ei `7}N!HR%ߡ2Ōin&gK,eV=BJp䛨t\dKHٲU&JLjzFN3 ҳu6z7c/9E2 !J\&oL/M]mtlPT0{˼_s)~J{_JuEwCM.17EN;/BGd_t!hҸ#O>koAN(G9pi!ũt0qg>&]0NC̐ H*Hρ[Olm* nչ V9&I [Y t٠BZ[`@AWЩb%!J{^藜.NCAӀ9)tx o 99%Sv87CLL|Q-\ (bAM ?DRdD" `y\&$!#6/Jf"^ 7r"  }#L*'? & rERԢz0e 5nŔJL[VrPr+9<>/2tJq* \ ph{4Vw=ji -'r[J~6G  ;At b6f8a}{o=`x?&,p!ST!?gjt K11 r:~L yr-8 YyA ~"ѽ[_gOUG$|v#YGEcP\14V!|yShEx[nN<黆gXDxV>8r]jB`T^$CYW2 ĠjEwW`_Iέq]49'/%o:4׾joM q[0r𒩚obT[R5[P}Ix۔`hUznJ&y{/Swlu*mTR'Dpy53cYLM~ vPdk߃yʫ RR8ڀ<Ȃ"96+)S>\^c N]Bf~gedY6Tе l6%}MIca~0Ү/ CS\sg!S)RDSO]ag4x9MF09zˡ_s4?&߉eti gڿb2O+bgI y8m9(?N]"x KrK Dt4S9VQfX#mN1cJ=. u(5)4vFFdbɡc{@*DӴŲ$OyM˚;#L/pBC3URgS,ÚzB\k։'Amf A>Z2di&զt$ɛH0DM2f7*mF>g.$5cw[=Tc}HY~Kyi ySlM9[O+ʕ3YLu5PN^)9uW~09<Ωq?̙#9~쉏gkJ{T{\pOn ]TL@v> stream xYێ}#[}X v F6QޯOZZHUUN5qp;\qF2RRgaʞw8H3e7wufTfi"H+=rW>:.4 ْ#*1Q's+;bd廦? lEl&BaD.XU]VG7 /')Ň \gI8I[oK{*?.H7%+}ת/}'d`B! cbS-\HzWOpժ<[WiAuuIa4Dn(FUm7"ʝ E+J"0۶Un!D^l)MVhxwBdN7PžӒ~41>}*έ6c<鵵 an5 ]<8O6«M[ҧ[.gK\]Iewh9jK SgK;}2($KIAƼ WaRHItaF$8UN8g+x YԳ($]fz,b k.L!D2ԃs)I( C3漸%E8$eS.䂏r"OrIqsHl lÀ<g`фy nE!X\VYkiMέIf 85k/ a V;ΦMĺ42M?zw96'A:BB܇i f0t3qOEW5tmaHRq;D2-.Iwrs5S1*fq3.VBեUl"9ܤ!7Ce h D1'"B39H1n+5Jöy.%9vcCۦ 1՞ *~ i}waNf:\UOO_ ΁GDe%9PDgA%jdޗޢKuPDA u֞kI%&n4I55I c>/aY}3vRaEa,t[}ag_GTfɫ'$%㞗?[5̄wGddNty- ң爺SM#k\fˢ?w&W\`xqllbC<:P\x07q 8$ Q 97ޭM׏'h?}86u]M0bѧ"5t/sKSFMlw endstream endobj 39 0 obj << /Length 2783 /Filter /FlateDecode >> stream xڝYKﯘ[ CM6eǩ瀑 )O7AQz* huw|G]<(+-A#\Q>dfUJه|ָC%V+QjE<VU* ;\F}=a8d6ֱ3v4z|:3G"a$Ż-4 Sv2X3oꦁl݆43. q r@P/xÚ'Js=zگ8^>׿G1?:Y{ꄾzޯnrQYۍ4~^n*:sxT ԭb 25hk*Gq6+}8qa `]h˰\ `h}8nKG1!CՎ R |[$oYTK BP}uW](U®rzm*]|`RHtqMUdgQ/D+Pwt0"/fLO|pHh$DȤLjf(ǀzUy)JY!|r MRZhi Å@\(:"Kkz\g#p؁sr0#.jN1!]4S}! `a,FS_71.9`yqf=R2RCFT JuȕlnT y2)jej35z(-[0Bu4 `᝟D-]>T6cz F:!9Z1! Ho=Ժ_f ?x-)0{}C+Lcr>RJpM  0 Rq,rL٦4QBnaM h!h 0>@9D[CAc5]@%\ hk!1]i@2J?K!ZbCwͣ:CeX$ uS3Rh-8wQ}qqD;l`< 镊r#ޠBRV m) YU֊`Η%uYfcESu0nCLL9':f _n60 [*`"c:͈+"ƹ,5EG`#ԌЧ ұLYq -i^}|*TłsMdNcN&!!zn~i =oRy{+׵!_@l@?a^SMf9w78^,RRL`F|Sx ƃ/݂sE g`ZKQAM%+l2l&I) $SjU>-x"2nGMhE}qSw5dXR;}}^ܒUZ&L{{~%O4utL}X^JɛvdjwcCafM(}&/$۞p#˧@߹#NbgRyn(`|C@h2j(Iиy(2) j-P8.Sw8$ kP1ZT8pQ l+lxڠ)uxc|uđw:~9zq{S _(!}܊\]zq_gBxQ(*uώ۵R77v! PW?wz!u P3v~zڭ ř-nD@m 8c- !f2WXw{#AY9Y, 0]|Ufҧ-=V'sHhON{ToaPGt[^nG'?aqu 4n' vAAHM`*}-aRsL7k E¯V_a 3nš_MWdH& 4a?2z X11K/k$_Wy z$JK6eoco!REpw>8\Teys9g`|-E ;n뼡yfM endstream endobj 43 0 obj << /Length 1430 /Filter /FlateDecode >> stream xڅVKo6W{X%U`-E큶h.E%M΃Dha 77M.x#72M"/7iYDUowQ/=ݯEna织rSEU,6hO3۝J3~L~7wmۗmR%08Nݣ=S{PJXovG82pɱqkBLe,~tۢTeddF^}0qcYgtglbt^6 .U s{ؘb?y&^oR6ghK:N-ޝK1ᩌ$B JL#: >p)2@t6-6KyotжǙvkfiX< L3/@Vx 潢G{[8J F"4Ո=@Bxm,jã˲I7VY6Fi6?E% 1|Gx䪌jgert56L_ ZL qo .B7=WO;s3Ai؏AS@GBn?x&Bq(ˡr5vtPs/l$N%pMV%cFo2 98Ei)mWHqa vj~ 1FD* 4P8 9o Kp,O[uR!cxvE: _U,9Mv_: %6:f`s 06Z ΐ. 9(yN=2Rm 2ʋ k"&cdQȸfE h Tw7:c`(J?O0*"з, ^SM}T拤Ƒ܊Șkiaݠ98uch J)CChgQ=W+ )::}T&g]l,UHQBƥqpq5[P3sGfM; ;9[@fBku>B˂Ev"#_}y65sU>huNvSdxW3\1sw0o24S|$ endstream endobj 36 0 obj << /Type /XObject /Subtype /Form /FormType 1 /PTEX.FileName (./shrinkage-v-iterations.pdf) /PTEX.PageNumber 1 /PTEX.InfoDict 44 0 R /BBox [0 0 559 432] /Resources << /ProcSet [ /PDF /Text ] /ExtGState << /R4 45 0 R >>/Font << /R10 46 0 R>> >> /Length 9034 /Filter /FlateDecode >> stream x|Mϭ;R=AiYWbI@A}!և\: Zjn~ryݫ|W_O~Bޢ7?x˷fOjkg,~,i:[?owz2INom[|<?4i;wqo_k~oϚ%|_}}Om~oPS٭뿠fݾX?z[WJm֝SΞ+Sh~-#Znŏ[|97uo1V1~5_pZkxpݭ}1s㫕W)ٽoM^Ul3^9NSo~B~qj|̯׿oǟ͟iۊSwnzJ֩FZv .-Ǯ5Y.\p +EY s|Tgճ"ܬT)ϩT9`.}/kYڃ/~3?Kް۩-%zťq{hZ6kκz ُ̃6)N;+B_X?K-?%K;4~k6ȑl۞ow>?&c`3-A}慄hSZO1{pըڸn1|R?X˝0>Xx7YsIm6Kdy-^+n:s[,!ME/dP&K1M&˓d~l<+ oZǕsg̹",͋,v7X,sE?l1-K7-yo7FlD'殕5zS5eHrQ.yݲkgPw9&um 0%Y9OCKsjyN!_܍RTvz&-mC h8^!) o.  [[;ldkADבMt.Ltn61ͭ@5iNk>MtZ&:mѥ(4]#ņ&th4yid3pL?&:i&uDWvM;Q'dS&&;wOemsxp}aO]qos[1׋~˲){E\ N#+cgA:+m¦:tJF:+>,t!6)(} y{:M9as@z*Ӝk-s}x攽dS˓;{&y<6Ǹ.NAO򺦸BuaW^ gJ+>=F˛+>JK3%~;K TeZ6nˁ!+- Mit^x(vٔüpOgvCk(Ł+s]e\!\:ks=Q+vLp<ܾN\ٜaͥ |uzr+ǚ.=`qm/bsA9)/1kTic@gmvn9Mecscp;ܧpO emnUO:_ɝ| grkU= ,`hj$es:[7FmKu]"H9eսuCUu׻y:ߵɕkw:N^q?HXN? kߤ|Pn;O sw]QTi=d# 1+.̺s{k7vm;õlnXU-#Q VjW睏隕]ᚩU>eWr9:+R>d oE .'Vvaѳe⡓3-+gu ̲ٖQZw1pQn69una6`Q[#6ݜ~[cmhaֱYPq[[^ub yN5غ},s2]7p~ \ U ξ$s\s"e0(io (kݼ`\CT?tcLP*XM\V;[nE3 }D@*8{SѵnWU*-g]~rb'eypͫno8D5yJN,2ƓX-X[:-7Vƅv qTI)xxĖfp͊k[1ˠ;N IuJlm6V{XhQ~Ž7@E*.yY=\NO4hE.E+ݦua .dh}Sfs|Xbp9ƨ57F jyj|5\64rxh=i.IYFDطae1$}H(w|,0o:8}\{I齙K u;XF0 Y O6h`}IY$V?dnv рAZ p$or%r{$볐}cJI-)P쓤Tw'Yˢ>]8|{iYRigqC $[v L7[v9=(+2E]}1#g}>emay҇;uEPv0v[v"ޢc ʦdӨ+ڠ˵ŒVjgWߦ_@tasdsúnOᲣy7e"շS:AGwU9A5'AX? R}"}' Ar~`VlN!EahE+g/>I(W z TpǫEtr5 (zJ׾9* 9RL96viODȈ_L0Xer-V~_B+=u7WqHN/=mC8e|R8{&WsgWF+8pD;+ `IDkqç` qNWEg3Ing$qc "s=²<n3>*/3Ì[e*npvŲaL]q֊ iIp&p-Ic6T)#A S0W\4&qO8X[{\|4_-ϦX IVfs7cHɩm.tHp3и3Vj8wD׉yQWWlUXq 앴UY*`Zn9)ڜqf57Ϭq Uft<9׽׺<;61p <3\7ERǓTrV-YN>c\s cS!Zq3{QJi_o`aKqr7U1 aR[lfq@?_8p8-]qDqpq|q7,ǭ)7}܊-̭0^M&G`N8Y*k;S!¦,Q$MCYE&A_2%Z141?ib.T\BPc*[Aw) E; h*3m57Pd1k(gXvZq\bxf՝dj&jqɤ阋N&}LHt1LOZyh@Ndd}$&\Au`nH" W\`rV&E 1k\ÂM"eG$rD"".I!_avE rBcL 3Ht;I뉐\c)1Լ8%8N٬xcyu9٬1XH'g8㔏'VO -e"0Unm/t=2#8}J͸bqDyܮ&:Y,Cl~E qM'Ȁ*vriWuv[bjT%Vw/7QTX)[D~O Rqp>EPS=\pUt侁D+EGs=.n\X<9"i0PFRLy+֒"7WOp=Z=HR_4Yd/LBp*Uug9p~X{·gW]Gh ojӺJ7cv U WW\q:/ 0sCqZ{'5w/ # ab{niYFS%rq6CNZr_:!GY#&̻8_r,N2:rCλNZQ39O%!lWr|8^,.9|#!Gq␣60GfkA3r.Yu1&uk rbAN 68Pb7?d8ucT)[q/NgVJt;8 MLJLb @!ţ8KD2bS`ȁ' r] 4QpìGsWq⑩!9ʚ%7~Y`ȩk9M,G޼p$+%KƈG\MԒGNmxd[EJ uaFF(YdO-2{[kW>@MAqg%sdrǟ 9LoNǁs?Λ&/u ̻U0z"9pbS W %<g20teI+rA83z%6}[6Wf7T]P2YȊY)[Nirdn9e[+qdn9 sGߓ]{fe-3]rtd'2l&Ll{š6(5 +9I-康^);Gl3!1 dacDw~MZ曉(AC/+fr8v|s i$9͢&& D_l^_7 QQGFL4}4e 6k.!*ARNf**R1|31~&S]ˊ[NIjjN%1vyeC+YCWr`&| _\)Ǽ1攉"A8es"S$>'K S^Q6[+l(G2^/". z8=x=0^/f4ǀ׃?|qïB, B4sd=oݵaPN?0 %z77 OE7KA כb*o:Z]j_t4P{VL9|9ү1U{MFؼ9C,{ş69.ps_s,C9揯9̦58 шe~Wz1ی 䋎M}k2Лhϸ? ֥Л>C hD )t)u逡k\ Z7|tf# o:!MG)O H!hs;OK Fk)0I}@N^cmBU:$BX6`B'^2!Qhd3$f`% v3WnqBtPA;7vǼa|!@!w F>n9+ ?-nx &c_r4Jc:MTv3@jl[nz7l_^6C?Jz[z7ݵch~oܵŖkv/j+Sچ6f~Q[3)mksVULzQqeO!~}9jsWy=]J>1p[;.]M$H8JW˵VR^Tpw.~HpxA8os Oy> stream xUU{\aav`%(AE`,X]>peW@G`AE&$&pAH Ķ]"hXQ]QDVҞ!YL_;9w% K $I۰O>wh5ys&DS-i"~)%j!?ĺj*hIP!"ɍeeeoIOMI]&,suwߎO``tͦH4驙GFI k)T\J֨aJV^Mʗ,rz{kr UfT.MФiU9I}fZiN.])%X"'"BA,!ф$܈P%&#1D-aG0P,’#!RF"--,E^um1VV唘rljPg÷J>;Î>{-e=0tHPu]7Nd.)@w"o`@> nI)剌Ίoj$۾Zѩm GHrCTB0VGfXulFt;d&ِC>yB*~.yQ˭Os&LAm^憕)輑ϟWf1sVR1_=7fcf 3([W3 x+1|^M_NxH֌qc BƢ;$̛)qS> A`\ l`OJJ^0̍.ndޘsc>L^`kX4s)d@NuYaD"'URL] Nn31=P>ŠՍ[hx_>&[?Gyexg6FDaOlƟ "ČBkV<qQ/f S| HQ9&q^r犗anVނ|Oر2hbX xj-7'7#;z.:B^mU5D>(r:6G-xmvUGZ۞0_n> stream xڍ]6~EP4e}Зnöbi݃/Q.l+#EٱPp(7$U&ߏw7oU ~%WFg 0WR@A)%sP9H@`B 4ɀ*.KKeKnQJhHs_!mד,BL%7 <^Kȳ|wxn;Lu.`{\ZWHA#*XӦEMT6@fpanO.у TO`[Z)!p0$*-x? }h^ZG ƛ&\p-\D`uV0;X}ZT9= *L@\>y-GvjGu.(5Ǔ1ՄGIq9Ja0B26vG 2H\EE}.(-# *83wd,`snxrY_ZnQHO=X}!bK 0N=xGT&c 7\^! b/[ ]kCy:RJYV׶FBI*SU:07-]_`o@b2۸.p7<M?h+0{qHy.)w Ȇ$.>/Font << /R10 56 0 R/R13 57 0 R>> >> /Length 2497 /Filter /FlateDecode >> stream xZKof n VMMMct^I`O03iKQg{&6M)Ýp?[5v,ḻrvj3vtR.?onp2?[yw!3ḹOl5]Mݺp|zp 8'^Rm~z;\);i.L*[:*nѤ|m`H-kkpgbϧ[d&PrUk눍\FX+*6 {'R2FKHdAli쥝%mXSb{^Kb = ƺ\]fki1ZvYYi|aNf_]WEKkS?E$4l[' z`NϤ^=bvRVBYmC\ nmX+6ZN& ƚXs6DC^hEՐShk]䴎,ド_xmoƫ>BnLԳ05YiSyypVU ll2c8^< Z`73 |6$:a%x[Gi**zQ;Z:bʲ*,֖=U{9gz菼*`e4ҋ23F&YV)[(\e&VoBj`s=4IL]㊸,u:x05Ci2!{l% e6hUI5):%]4-(xi5xsp1nt7.IϰZBW*] TْiO| t3FH\jb\@.C`W,g0ֆ vaV' ©ͫn 4l"&踤Y]EPxPQ&.Sd\Z4a'pv40.ʖ |ʴi"%`&6L^]>B' lN\.Fco%:W,۳xd~{A6 !{ |-}l4mPFEP REE6$U+C d]EU:f"evМIs *ﰋ(DѶ.+t9&l3 = ]|n7A.0)o|"T j͙dQm >idrUɅ[n̖Ud.lxN: \aeEL,̴f|f0?gcՅ^R]ޘXRG+`meV l/X}㥙`Ukya&[&"33?rx3Xe?=7fƙr0!^B^3@]釳81pe֍{]W:?lJ/UFO:L=Czᡥ&9uib#?TqL],~H6+<=ϧ˦ |vld'S ?g aG;kN$Ahe<$X=4ފnxXK1> stream xcd`ab`dduM- M f!Cﵿf2`e̻GOqE% A:CKKKJKjqfzQ_Wb TSYQYP_ᬩ`d`` $JsSu222K*RsSrSRAfU*due&+=o^X) L,|C泭zr2σ< }nx endstream endobj 61 0 obj << /Subtype /Type1C /Filter /FlateDecode /Length 63 0 R >> stream x]V TqVwY=A60:Ȏq[D ↚ {T "aP02䛨?I 4?sTwW}{o2@D"n V^JU2-!&Z?9A ##O_HJq吒{M/(HW薔&j33gΔ/](On2d21m܍VbqTytl2V V)%2,vvdp^*NL+ʸtUtLR5.15)M?%.5>-!=(C^8u k8{)GS6ʟR(;ʓ)/*)*J-|)Wj:5r8ʌ2QF1%L(2ԗxRVʀ) 9`jpqx@e\/Iܕufe&.jA)# LQ3a.$su"#m1 w g:};]#d ZHؒ*LnN~6__}fi\x-6 mvh0ik/wG#6Z˦K\wFQq{iZoHtUc}Xk,\Z+ 1:ڢvovS mͥBӍsYDzE7#mLM+:Sx8ɼAy&tI9/nҰF">&ҘCa.jxX'li8Iq4qMˎ_S$;rw Z~4nȴ$C,LjD:Tl*Ǻ{!eEEyߝ51kٶ qv` 8|D;`g;^hC+LI^DYBP<W1 yw{28xcLJ vllh5΄}$Ep "b5'ꜰ\p"KsgbL1O)&fUϑNWrҊ7M pœ"=6^Nu!R b[.& ހdo^f~*yҬlGȱZtIdJڰ2%-AdUb}jO({(`eOWa;zՒ4eBd }jIizRteH\w 'wbHupGtLU}y VZ?N⾵,jyH+矯IlYH~2cy(e`<.o&x,) 1| mtZ+'ag-8l-,ap.@b@uShH{TS"ahXH t)}u}sg8_+sO2)W~zB2rxDhI)MGs(\"7֥S y(Q2|l]:XwcGPB% sK}vA/d(Fmuck;J\rOn.[m{mvg@Ni 4> stream x[KsWy?R8YC*RLB(R&!k__c HI"11-\&1Lr>O~Nd*4韾NLb}^䓩r&r]|-r3o?|I`,d*RDBWn}o.HNKJ#uVq\yNj S̲E2K^"fӫx*mމuVLM\TIDz:>Z}B LX\5 _^ݺ(|_߽ozl67\- rm\8]<,hox6_G-3{cC^ (//ح/MA|E-w_'_-nK@r lYEl,VfϋYmQJEJpWbYqU`LA.(0./Y\@ W\5_ć$].W鷜 d &&/yΥfhBSz7XsŮ,yGż͘_YxbYp˸vӡD)&Z)>`\Zж64'D %֘w=sҕz9qĥ( bף55n7X\ZgKmQ˥gA)jt~L`<4y#<>] OxCc*yHh^,3:SLJgG Ʉ1dv#eܦeAJ%P3jR;f 3ʧEyofzK黳qjH:v:@ B8TJ@2{յ!h1w(uG@O%QAm w !#<}C'kF=чB/@\`&ƌVcg<8R0={ý3OXjlc+گzEl7N wMέ&qjo`Zǰ@Δozn^@ GS0KB r`ܡ8AB#! YM*<{OҝM9*(8BfȬ0B!uiq4m8Őe[PsZoРit0p2<ƃאiB&ܾp)d5KX]=FijVb{j>] u`=jTH?*'j+ jŮ %8 Pq@,K[[0VRiu7,F! eve؛? -".Cv GNaNd t3= Hr F| k>#.T#$㯘YfV~WǷ=Z9#̷f*O '@`1VUi8Vpq'@^ { _pGopx R 14Uv(]7؉8;A#J(GVhƥzVه8]1o+>gz`62ہDoa\_;.u5ROۂU7S EU1ʎAAzJEGDۀ)x_d@s+ÜcS9~ųǷZ'1:f.:&q8 !-õ{R *\30µ,FѠQ;< 4=HpX{̣ãɘBQg:øOkLhC5?VJV62'9HH MTOВ@ZwWCyTza'/Q H ӛ`pMJP{Kӯb3t4V4[2d Mᶆ&XkYWtZt4kc2=YׯG7#>H}}+[SXG?R-X w%XWvsY-)ݪ&0/!W|{]]wzsv b/<- 8>P Ggw5a2[}Ux[}@cn9ZN}4O(SEVsz׹<hձr@{ODF`hHQ^aqqξZD>)rqV {Z9{H;Cٿ̯ endstream endobj 69 0 obj << /Length 2544 /Filter /FlateDecode >> stream x[I۶ϯ`$ULKc{U!yYRI^l3H_@J\@Hc*EF믛0MxA^^|W cJɓ˷ he%5B&SƸ:MdW?I, $$)3ioW>Yup0~n"~0s.`MX¥%F{_xE)>9DXSY?ºXE(+Ua߃bV5chha}f 0HReu*s*BG{Q:q4N Vh KB}>ÙN*Ef䨺FC#s1FK)SFfsWmkd%nj21" A6}XԆ޵LLp &ƩbXdvS kPPS=%z5?ĺWց/<?&<[mbnanv>EH &;wgF_y6N6u~Z%#,U`p}y__|Ӎ : ~( <B7..~܏͵SmQ GY~u7a!Oe"IM^M\|C,!0WZy\JU.2ͼ2wmszJRA]L#)5 ʈ+#yI9DQmf֨p3$ 1@D=IC7B%6uUcfŤ"LmȈq̑w3GA`Y鶒\ &KCb$N:W!w(Q`ˡoV"O>S.p?̲y3ٽ0Cpw ד4wA6 xPTK=^dWkDݺ[gպB/ww:L:f?Ne#z_{{d'mc2tsä޵(` sc˒H%wh#F|XV*[hhȠBw˰^Yǘ_;| ߮_Vyv5s&Hx#-h"n(:e \*rG_e1NX)@ٝ ]480kz49cy(TJ أǛOj`R!^"Z˯b'S![GMpԤ.kB"U vQ^& 6U1!r-5^\ ŻJUnФc86ep썭CkEPgXZB'>( =b#C6=08i!8@"F[Q:o7Ѩ޶-.XT-mڶ3치̳gO;lz7/Ӛ@cUE~-Wb)ii~)kA֪8 4~i:_/|ޛ0Tt7gO[f`dе~̳l|ϣvl,+Υ}Zͬg}3cɀr'ǏK6~#J"avB5+l $Q$HmZBҬԶ(/Uʽ0lTY#ZKvՀ5 Pj<G?BH:fL7܉"8}%Qw|{kpcxk93QdXlP6W?rv~\NnjrԛE:ŬidmݐDbڢd=ݬyxmޢO2je4UoAQZ gxtYf1~%2X$pM:IQ_*16ܖuu0 ",9ZשLפ8 (E2R(na y>FH麝D"njiAƪӇ#X4~u*%6z'd)yНV=o9`C' z{(F-su_|2 Id86U_E-%XdƲ;{:ssp^ 7poIb> stream xZ[۶~_G299ms{R7uW)R!):3H$\IvI`0o. mft \zwŷ,%zv1H*L+CP\۟ѿ4{Pο--''p.fl03hMf C S݄ww7!2d*! gmg1oNWyʬ㶰[[m3HS6[;hXL?hɖSD*343MR#+#G*ֶ" f @L~18aF#|„G "2&zj;? ()B YQ:k9c5JX@Y"PTcEnIrzDt2BU'T|^?1։ƾ6ׂK ?l!IS}E/s0aK˽)nˈ*7I/#{ꑦ%MY>=f>\O턵Z#ʉ~V/+這suP;0 voh>Ux~߹Z뙑#@˂G;f﨤QJĈt2;r[1rTkЄfjJh VoC,XڜS^Gƶ:+.f-8l@IQ0Q71 ӆ}d˥gKxs@i:2Wԑi64i.y#GO6L^>"نb'vuj-!cVks,F O~B!:aGÀ*1a d3J[Yh0F]cf1zԤ;DvXamާeSZ}ckm^1J $_zIqC dmIOy]tb(J ~=ꤻ"zPVHUÝ H#'Pff0+6US` )3 T˩!4YX2|7@&{[:gy#% Iߠ˼ͳPzCݣ!R.r/@j5ο82yvZ^`ggFvov7Ax y 5:u]mc~%=>X.J_heha"(/3p_U {Lډ9cMdA[n0w{ h+ ^Vwt4#윁,W[ y;NvA\R ;J۬ 0 ;ܷ1ϖ{uOgnѪWg虧C)MR)ʹk(Mp+:3(VE_yP[($UA%z/7q@D&Mly!ȂG7u;4bͿy¨? Eir&f` M w!Aُ9Ԯ y-D[>#Ӕ4?F3$Q7Q$h㣚!7On@|Y4Irj9@٧ )$~w{jJ8Tr|.~îE6zʓ}^'Uc/6.& \:)uWI++_WlH<#;|)v4":Yfت".w\ Q4ѴS860J[wuye6sZ>*9p8e<2*}9.w@8we'q,M;i:X8v e6sB) Sԟ$Kh gfU?n|a@҄!vxvxEH9.BVI YK&&鍍d5:sFd(8TI8l:OWzRJPlZS;zYVe7U )d@8r~BP@±6a%ccMt;.,SuMľ;7Vg{߱ opc C{ 8e3"о$Ȋ`Bp[ԉe?zDiuJNftX|ǻʈc&?k5 &D_yYQxvvD+ln}B@鈭QۗY0̢yݰp:l9IU`Bl65l6x}wY^MNP1J@*mhIAoPrl~цݮsMl"L{=!kתDKS WH~[Xu; A]۬+h h"4ܗ#iĊ05z-QQX7^w$ .[_TE"@SW]~܁ ?oŎ#~p*K>*Ilux$8 [ۼۏlp5\;\(=ȡ Cld!]#M#U 5L'˸3qȢǵLa̿ċ]4 bH2IDphoWzwf}3#'\D#g'׈59'Փ *:OߔjS/c6)L {fcY pXP̉D=.#x1Ҫu'/lbU&nž{z`1 L ?0`b}e,ccWcmk< s5_fC4gE?c+j!p39!Cp l}/gQo)}"g^!t2Ge endstream endobj 76 0 obj << /Length 2825 /Filter /FlateDecode >> stream xZs_q$aO:I:θMʹ$gDLi #HQHbXGEA狳o-q⪰ Z…*.._&׫,W_N Oz3bҔM^MgB l2Ml»_5 (o hX+%әt:n tܫy].U˶BAcĩns^*!&U| 0}٢DB g0ae57~ƔM* ^woq_wlBPoXdC5N$ea R8#,2I8LM|LDf Ӳ?ha_X".,#P…gQDh *78 YI4t=l1ĹNBD ;H'SFbFu"D䛜Ȇ"C-{Dg i .^,4 ߞ-.%(Dgx,ƭ!RDpљqFPACZ :؞쁯g##*8%ҁ"Aڿl6lܖ 'әN}Vd|C!\1'<ޞCo og\5ksS.9}xOf^\Wf"%72nMNWK{ӆgٶ[n#@5D#ҩe8,ml#[ŀT4##,3޳0`EAz`aN{S2Fgp<8 5rlaJ1! qlG/8hzc eGA=r$ʍzI(T\/2ۂ4܃g+*È&$A="|¢VHR`^)z戼TA`;._hKԳtkM=-E>%!mXKd3uގhw8UDR(%tJV9-$5Ӊ&+>폑ecD//Wa0'7WE:RjAH4?=@-EZUf_c%hqP=#CSP.n<79ljg4I(-e5b ~ڝ9 $)\ΰr{໴w#}] NzqKA|hۻ6Lgiʞj㏲ Է2v3[J};wL`|ǐ1 .\ f<^Qw|7`&2_ 4f@k`|;9J{/UΥSOd%^gեUҎ}w ݄Q ^Vs ŔGvw(`ۯT2'6֤Lwm_۩7*ɇғr1Zީ:<|^j'&7E> stream xZ[oF~#ӹ_,mt7@my`DZf,Q^?g.9H#y1̙ܾ34V~>w/tfTfa )3)LdU:kś߽ d"k ZdH EQIdVT "H"nœ;/6] y6KO5H ڇ0NF$bvN/*zi짋Xg$R!a:.V$[n^Y/_f1;'ɨrXl( ,3.Ib1LG%'a{<=I!LT6JP2;d"g)R7Ub7DZ,(7)7Pw;Bd aLqT,#ս)N!l 0DgF``}/vgD@lzJwJؔ|MhSRR755-TR6ep m(cpUUWVM.;&߹uu.z9ɔczTE I#6! "r!XoU {mX0 H!N9h QeR D \Or_Si>XIVQD 2p,j$%Hq.͑fT2eWB2Ye҂3$\5)d422A$l,RD"r!?.E4hE>dBK&(k< **"0id{3-4M\/1CyaSS X#Q!EH<Ɩ*1_LSfscfBiGN٩n@ %aA7_D(s|GQ(-*>K^ lÔ̇n_/ɷgU5@dO@k)@b81^F ]- P՗)r(GAX}Lr*RD*\Pd`#A 8OăxyOC-gER Nṻ'Qk`( i!C*>?Qey?|xŗq|h;ؠ<1=d;J2y]XRɗ۶orU6y^U?s}uu˦9Ӭi_b;tӆIjPw-!90[K 2" ժ:!$} 03TW@{ۿ+?ԴPsދl?km(=L!j9bG_>@G"fdeǎg횚ԙ;K (0W k;\+bT^Y:G$oTH応*.^\hJ ]ي "`4i"&۶x*$?i@տeC- @`Ĺnrsnhmn$ZR܍Brsne`9޷[`*ibM}Zg!pq\O^9&%|@8+*%pݪM1%|c6IAf&fID!6e C箮^]:;]i`D8t%7eqK+pA;>iSy{Q^p=@EXs/2v`׀;e4Òg*G2!9 endstream endobj 85 0 obj << /Length 730 /Filter /FlateDecode >> stream xڍMS0 7h z(1!lIR: K۲ed5؞=O]VUZ2-RgU]+u_WϻB3Fi^HVh9#SdhC=L) a*ϝv4‘)bS%'ΔpxO# !qk<'+TZRϩ}?q. SvLu12D'K"/?>I"A] kfábV/,_u(\jqAM<}Sތwq]Smv aAգTnW9D> stream xڍPZ. ; 0w`p`p'Hp'Np%Xzj|_>UCE$f6I\ؘYJ:lVVfVVv$** k;пHTZ g5؁? ˫LJT;]l6n~6~VV;++߿`g~$ ; HT`GOgkK+8:hl||<Af@ d hPY\<#? ;3 vc[X@Ge=ҘV֐` w3*69@^M\A΀u9EG_dſ`cfpd1 ltvXXہ]<\@?@;LS_+> bmG,ymA#?Ikgk=Y\[e:h:X;$漊Y\\<ܼafG OGПJ?į5z;e|-@?H O#$66 diW1/z}cuvbe YEqKR\f0sظ9<;sJOJ{AK: ?nj<M^I*XpʶϖzOm>jSmH & ~#gyr })Oʋ}'Q_<:WŭX6eX*4k 1=ֹRx$hBo Y vH%>+)jox%EQBK,n'MՀw]6n_B !5NjOU!q.'8=xg )ڇt)feZ3N S63[H.Ϻn @iN1Z rbs!Gw|vx5#i&> ;Q7ۈaff>^z5(HYx 0JeŖx:D˳dv BDF޹g Q'!{)`o8&z Q/N\ko;rB`K}r­13Dj}bҜj,H(]3=duz2PÜ8p cȒ~0e-[ۣ:A2)pһ\垳Y&bgG]rԶK:9PF \"Ύf/ n$` Q]u-ġP528mkێՔ/ްOB]JnB}@M(Ǵ̒U[eW^4EԓE)+o#:h/mLj%ןYj`mibbmS1\L{?ob_&?"<`Qu-!ɻC}S&7sCC1oUz?ӄSG)Qc`~v[5p&JV*n"HQI|t$X5,UئxaaÐyaWY=P8Q?M[\{DBjigZ-,]13T6TA3ҁF/u &75%ܰY\H `V 'ur,f`K7xKs*#fOHԺ8S#3ܝ.;3}}-H8GEwAFWZ* c;oݻWQؙYxv%=k fe[Nm K& mq[1'WK8} :X2hpp1E&mȖe/S8m+&| j*[cٳHO l;-ԔG(_!w-BbItVϩR+ym+jC9$!ҘvVX9}a@h,+ Vd۶ZC_y>!jX 5pV?ochhC4n g0|fo te<Л8i1֕c7{3}BJ>;;uh_{\тn*2&{o6 lAKS*VUPeI7|V뷳v3/(oXe4]kŨ8^sն]D9mXc `\<^ ˤJ_\S"{⠀vzjR&BC?_;)$BS926@ME}p]|[˶O)T2*bMBvҶYΚbFfFڷa "1.C!,-( {j+W{d}.ɐ1pfC鞙J'NbCynυ/҉Et.YFMȋ/y9F;$sxP &+3Mcڜ?fg_l]sAUFw0G Wls%Lt޷2{SU'_PKu9Q;:2d6 s9r[Љ\E@ O8+" awOOYJ2ՠ;EB%np?\u{ca 67!f7cbh5o%|&_y7pc#$?>3W&6wSx4ll+"J7ɪ|'N;:9q귫iUnƩ?S {:$ɿ;sVh!Dy7y6"^ۺX0L$T,o,S gL㻰0R_XKD9Ia#$ʻltP_RI`'ΰٶ =̘V#ES ps7QD8UC1H9 !oM8}dz Df7G:qY-BJczި&$SĪ14inkhlBfӑ6>J)/Sw~VdV/ˡq>+DqM>ױlg?n 꿑պ B=&BMϳ/LG;EIkW,΃R j! Az|Ji歷Z@y"uM۱b3نLovT.,h9#&ʻ: `ŽМaȢAߏ=OVR+mx(2'͏ koXk 6lgUs_TaZBaFQjȔ dU(r]-B+^F0@sY ) 8t~BRTNu%1] 0!別v@q(cn J/k F l,Ig}%K<GAk}$'m7BDJ 9pO$rL &r0:IgP[Z@ʁ#xxbHRei{qң^xyfls;6(x =2Tލa ߽I7å.JO2_wi֏K[@0üv}k@l=O̳ lEpkJkdydݢ3Dȅ]{P򘖿2Ē(U,1[#6䲆D|=bљ;`I@ȳ0Np6_^:-gX#6(9|yIApe\R4-?7:[TXjio>{,W4Y~H7c:؜ 󉄛7KQU-@qr0]C_Lt nQ&/ r!K r98CXe> c(|c SB9*Qq\P?6uB 4(qtRo{k?Q4 $/sB31;X_B~'x>v9^󕟿 C N9NC9PJRꄒEJ]TY_b.өȬ8P" dCmoL]a,ٳ~)u%E^)O@ qO}h~`stA [ ]ڻ?xY4LUdEpoϟ .68wV$K y<+Hh:%B%mSL۶T#)!:yyFR=-JJs^ܶ2 5s47P3,mAtnVcUYcP/`4&_K0ؐ*uVMBOyR6̢8,WA;Jf*թSk- s>\^&GQĶNI)4Dɶqض ;4k𮴰AՇmaH)o8nOD힒ѽ=5zz^&kc=Y8*19Φh;."!o@ͨ8Nc*%`x[y*m\þP$L¶M[O:Q›B̔JCZ'MZ1-*S$h\*淞lf[Z0jY,x''hNM-h YIX+k0˄uMe"rǷ}XQ~1pDi?sgHu#W~Pj; 6į[_Rmr;_fWޤ-ח”g-A7K΃8WOȶ p/(x1㌠$բR8f۟z܇(ɴ(͟+[Vi0Sk«4'ӊ{(Et*gk`3Pm8wz ےvŊT-mf=ե F1#&&/BY/ȴDaLm#`W0[E*4xC"m~ESr,kf`bH|j``&p藵Œۤ mhxSAqƯ 7`{6_V03*fh$3Kt`~)GL\8e\aom: r}VP[0Lj@Wq])? ڳ'([YXLjnR:?;d\KOq2Ÿŕ."`zUݎ/lk&#L"\S՘1St|IlR(1.Q7YAy ~i7如-q<-Vosdl)BVі[ 3 <do Vج* Jz|~-x QlaQh.Xgv͵!2)Y ?6]?OnY< LÖNd Ba'|2צ%X={Hw*ҙg=D$NM" C7ї$5kz~ߦ+>Uƒk2L4IR~grFWKw>#?N:woHKY)JZP(d|ZDYJh^x!`ǰ;Ǭ>g\7mJU0:Fqb>\ܨE(/#"sc22cV/~߷>1Zߛvlq<./aa7tz>GvYIT+p2MȨ9EM쬿2Hh͗^Cf2œ]]J "b_fW8Q=$Z:'ce B7 r!%٠h Մ$\; B¥mA'9:*>͊|$i=\.2fVq^ 1' جYr oNiVH;y'"^0:ofdNKCӥ=1j\kN=띕^g W?{ ,_[N^3L:7Blu.ɀ/ݰCB*yt}AOaqwv?X6sH4V0U#tmt1:7/ *(:)/0!-k}F8g#DN34GLzZfR I֗Tx(y` c@\B]/kcSn MIjW\ ԏ\_2~}{h"ƐXa6d,WG~\h}_.!C^/\+Lhߢ}HԖ֛0+p".fAʸwMY'0܂ Fi,=(p89`zUfz-c뻜br$d)('/M?MuOLSD)=yˉ{H$?fT GǹQC D]}_d ;gOj{Yqqbcs%@{*j0BJؠm%_XzSV þdb- ߱\v\^(jV>wY  CAiQhR4!}[umfAn K:,@- 6[mn(-1Ft3eTTpmS#z L\º5CV?I{fX$wP9DmYPdurllSEC4ݑpyZ%E"P~6 1g_c (;0wX?ޡ q| kDc`t `i ^k&opS/Bl} ]f_@e iSaSU=n!ma3/s&M?/8x:*# *1P Rj:Syj{p;"JOH>] CأHݧJybY,Tc ѕFoJȨ*UqvoSn ,\ g̓->yЏFE%]sHQ2sdudb_\,IOvfgBVzpI.XVg}^c,-p94 +ԗh8<̫T_$\m tn6Y뾻3JeLu^?{ M"c!7c\oK8-Ћ0r ]S4㴨}?>3*&G~1BP z1=DOύ !:|%dmک`XwZҗ&uM &E)ǐ쬈47&{OȐQH_%N yE&~X&ӕ&UڸlN [B'SAL(#DC-Fҧ)]tf^H3fo7 UҷY #1R01wv~H2"s%Ҍ./H?(3_}O4:%pi0W[,W7.Imչ9s6 L_+ w_ݗ! Z!t{?# tV3{pXnP TAW|BI8"8Ƶ@/uYT/9ӉuuuPBrUǦio:rc 0cʰ((rcZ J*yg ,rhBe6V[NY|A^;uȣl_D+:iۧɶ6,Ԓ ˔IRL0".֋P3Lbͽ'Æg` mwk25/J4J˵߲`iߴT}l?Zj|ڜ cwC߹S3`6l {-14_%gAD,,82`~Mf 1-f%\5Kd ȹy|bAOl_@WlvSݢ؆`w:9xM׭=.u[HZee:Ol@zv^5ЦűT,Ĥl-j/m<z}/4?2QiZC 9Jm2zѺY-+j R5Tb< 5cH6}Y b|2LD\e,RC[@H߽Qlӈv.oS' ABq1񃮖c?Fϭh,(dΎ`kgQU> stream xڍT%| W3td6&um۶1d6&krMv~߷:kڿ!%S01X;0sdL0JfaHUf6֜;~Ȅ? elN&+''==?6a}g3# -@C*dcfofb?_6V@{3C}k)#%@ ?!ȹMm9\\\hhmMx).fh2@V oj0%S3)m]큀!hP|ZX_Ԁ7@p+6Vnf&c3K 4#5@/C}K}g}3K}K ?_ fk#!++_  ?Fǵq2662-PB6"?2#] MJf [-ApwzyS`df0Y! +@cDŽX[15$eMJAAW `cxo9}AWr?=^ ƒ\ Ϡkѳ~|0]M_Q_V$di_?z}+3K[|L|5Ukum,NQcM,F3Q3W_r(g`i0vZ|[XM)bmhcז1`?FF@׿@Gkm 0EYYtXtNb A:"6ztN'}d>2|/b }d>2(A:?##?## c~/h>ݿGY >9`K`G?? L?*2Sǻٚ~?2l-?Z FIa+Gm?VOw>n oO>H"GZ:ˇ!OsBhj3|ptG?Qd,rN?<ŏ}Wt,/r?T M$SMXpzBLe'4ҋ#B~˿BqRҚ 0418Ux,P7GĿjgK4ΉA._̵nxž~ˏY(H-9}%IX9ϝ-3rBhE].? @ۤasg?@R~RO7*K+TD2u.̄>'weS*,t3Xѭdk`yO 6zNn{8Z~Bɴ4zvXxr9Gn.B {fd]rb Sݎx f<'dm kg8(K,Rsu?_:A)}9F A1K$T^rU[229542< 1mCjbF%eANr鵰_2?/-|Nj >qQэ 4|le/Ǹ&P5%@8u4`Ia\ $u5slW j45G]Mʫ4s-V6 a@Ik{;rMUrOeڜTmCY~sc}·:rG?r|w1U#t }H$)>4bYPcƂJs&Q-E. IkiN[- 5K}e˲%e8.:^ /]RtsВFwݐmy8JQC'mc+*A~IЙ$ep,Vи BEfG_v ?+Y#p |oNg?=/D\m@VM7Am+Ap,'bj#d&ߠXdu~@哦H.jPo6}] +ӄ@+fMb(ڬ!z'<{?Vf]hH6xg[ Sy+R`"aUuGh$;` ' B 5b.JMCck#by]eOK7lwb>}j]Vd{&qnIN7Ȋ$ׅ$$$3+h6j,@OxGph!hͳQ1("T j2A`1XjꦂЯ"8Ap`Ǜ8G2X?AQr,sn,J U0 uT|E 3Sauo" vFrאՑ?=RK#\dq>c3}. */rYP\$,-['짳uvoB\g=/xxLyIR$Ihij&u,a̶͟ a EJ5e'q~/nZ2ȊKQY7b:/Z,,!I;ASRE:x-i28j_ HR֓`uTQܦ|˙<(y&@TۍxlpG+I8f4@D~xw;b[p".(i9_s4 gN)H11"Wmoech>9WNp e-nJd?68fi)j`,nw~7J;$`<(IBhr;OG>DȠSW˂I_' MOo 9jߺku`fo$*Х3c<$c̃d, YȢ|5&F/.6~AN ;3e#bI̫H ,}%/ƺxX,f6w,&C7*a1yHgΦJJY_W>+W$x[׼9!86RVijseH%ld5mC3W r[s#9%r0 gm#52TVRM-CDktLU\wltl=,Ɂ[=!T |_ vnS<* 7 >lsdD掭skZ4V mECZ`?"y"&BM[_o%DVɹ4pNȶp翖9|zY:?ɚ~!:$<4Ƣ UcBqeKd%z' !F}~Iq*Œ0oK%UF *3ڙ'餘s͝a*ttVIR%ظ H+'R!D`_|\njē|I`hj_g}x;z);خluU>QŇ߮e|ʚl~zu* ̒,)pN\BA|dwrý)|{$t; ~[턟)v8 dtQ eu;Ix>R/З6r‚a*B^TV*~#Y|:$E$ rGAyM Ă;e"kԀ'51n59R+;Vuˏ οL̠,y_* w ,sg>vʂgL ÌU6"^M&#|<-&\O  _eDb]d4C61!%Lfݚ2&(Â苾3Qwnͣr/p6_6ҥִ_ :FF0bOI(wC\q]ݢK?28`Pɨy, 40°Є"ϨXW<k5J==nt?2]6uM|'(6k!-oX$k$ht=5LX ,FQNϱv֍Ef9 Ko@"t犰*^t~[HAMxҘMdOLczElB1jT b*F ۔~NWo v1SqgNʆW4^`@\ (}5e]yp9:X .Vu0͝,{}u3U|'Z}5ѻd;y>L2lvۀvݢ&RyTEi=̱/Ee}0) 宋aK{-K!2U"3߂2OI³( &{~%Xµsc \mujP@SӴc&^s쐼QW[6f>7a#)83XSZ0ӪߨA:Dykq[2LuMTiRa H/;ؼ&,*q}/f.R(o#U26Ig[IsXo!hm\2RuMw؝=qq6^h㕢y Mr+-fueX+:} NA =9i?{O=@}x^nPNƕҟϣ{Z n#y4ݞi>vu,)7ژk궓,!&#Iwފ70ecδ 0u1ڇ2\Mq/:2nbF-xŬ_6l7wm{0H{ Sy:Vֺ1@A1K0c !1YO $K^/B$mszK̓k|*p1pn? | UJMlgp<@tֆaa_7$WؔVjOc csA=WKMv{L,no -VHL9vTaKn"o(3ў#XR?"RHXhZ7Nbt~&(_UÄqyܡ# %OlTtvOQqGZy]BC-4p akҧm%hH@'M,K(6DԇhlWOO\=3Nvh$~ųǤ7*`kEy$j2ZtXu >AxNƭ~X>1^Jل Rr#2hV3۱.k{=H/Ntlk?k#4x13:?06Q:J%9Vy?XlZ#;!4eHGSl!gzλ+X hE|Q*ve `LZxSh]2'3 KC0_5dr&)-iysQU#tQ-61[JEbo2z!:JU/®Xls=%u άa9a~IGi6:9g3j[PbcZurfF!MEE~suo-'r}53E2һPZq`q{e6bE2hyQh-AsN^9}~V{IRs&q+\5*t)q ` 9"Hؽ~pΊa_<%jeԙ0@U[fӇY'WN c1\}2NQF k #6uM{)Ú+/u zv Ҭuw 'gn<GYrXU6hS=$(N G5wIb`}Ί\ĩئ,p` *{@w~FeNs(;@i㜹c52''WiɏSv/N+P7cM P~f_Zι N|*w6|+ ̬yϘ>Jz&!c8=|O.dNeʡzAw`Wsndg|e-w轁/ ?(iu^e\ql}XPdtDph~L(L  J"'N8msV88'hcOE7׷F>΃f =R_,h> Dj*l4Wm%s9R$%vi^GaDPl]B:HW/Ԉ ݘ!'MSkO<,:d{7D粙e̅FEc+ۉsFw)zQ"9փtneGAc5LЃhNCBgFab95tԦOm,j^;~aD | IS~ʝ , (9HO-5)=tT–;8 ߮q%t"J9\ /6 ōҾ$~8uNav G;>VZ_DiEf/jC_~㶒5O'l| ] 9Jba? ZQf ُzvv0%werzbBȮ O,JΥ;ZHMIP2;j-wfHtɐ89H+5gB>L'PLzτT y4,I?㣩z۝@,TQo%Fs'o$g {o -]bSjvŌ83Z7#󠪳xש-0`IjqΓBӮԽ[\ɴU?!OQ+u90xF:j*3`|*%i퉹%.TK7"WLμoPlg}攩!GSJ;ȣY:YBd%D| Afrn~zOIC<6)?ucvSYY'{j* .ܤW:?c(? xi<߽˄足u1+༣ȑ@D 0 _QUR޶R]JU`;wGpSa:=2ohV8)k,'#ޔm-ƻjXɧxD?cEIc$'ΉNL>Q?,la[p0M|='zϟzF@CM.$jWs5RS/YAC:Pb%F`WODݦ Ws;c0- U[&دInZ5=v*m` <*M([e, q|T&(˛J@[.Q3._j  5X*drH7(.}E_ La7FBH*4+w^Tb,qC9T\cBbVZ{A#mY'T}8l5^gC0S,_0XJT㝏o`&a/z\d`8/T<_{_+S6Q67rZLY4; yC0~Y Оՙ߲%q!d`S|'2.yzuM&ji {_RZjw,/;p%o>^Қ4ofQF[ KdCN$C$j~I }(^a3멯7㨗vtb>$],u<՞XHٸ_<ʃ*=ӓ'vM-b<Ιͦ ],f{RCzs姠ro2mg&xtwn2퓐[IlQK:aeNZ(/޵"ELt?y.-eX$/5KGƏw@msDEf3W85Y4VKT_$2#bҢU'\nTD̔d^&PHo$I/c;/SwkEk\-pu~YX!`) 'pD(C}A:V\< ~B0֞.H0yZt -0񙡵h맅7D p3A^aRG{7'm=-3ƹ 29Ԝ#BD B0G+lDtD8 <+\qC(h {~ 17If-"6O Sg{K:̠nNWqTdψCQRpG)!Jv1A/8еcN^eM_XM#44VD6 $GVᝣv;9Qz/0N}.f\i7ᩎ.$gXkӜ ( dj#֬.+{85Tt<# cRSyVN}ɼn#&QiBaBsб_F Ŕw/0DeB4eAb$( Ɨ\+t7eh\Ԫr%؅^ -!6 u6;L毹TD٣)S=ߡC61j3Tj+$І)zmC"%ԑ b#/}t Y1I_wuxo #dPZs:ϼRė](,JfԣS<E1EQ;o~xl%~Cx!N6Kγ3F^p%xDE(>c&y>V#8Q4 95eF=8U@o6J] Z6K~'S~p?( SJ1?MOe;Ny{dFG 8.4j|(+1` Y#IuxKw=K_ yS'mV_@,l351#/P X I2w m#F~f[?CKC&YZL%_#Da|2ݲo*V@K@9&Ffq2dֳDi%)E`jp,TUHϝ}w0{B2 qbW'/^c痩)Hg(e#/ 8{IS1;7${eإlꕘ ƐՒ(_lq)t40O/EXFQ]g!Y%r/@:.֌Wq:?@Vf\q#M 51%TwI !9 ς]˘(6srB׬H:&b&~55L֎#XȤyfG2\0;{kONYdSXRyv-◩YRyԦ{V[YIcSiaZYǢg5P! Lu6gQ }GM5xSTdIJvi  OoT&Q4Ah'߳mi."3P.n-`=?󮘞&yCIH6W\Z=3?>}Y#aJ@]m})UA˰8 ~9Eyz%quX>CCǩ,1YI41`jv͍j -Ԣ&fA`Q";q鮑[co[ DM['ӫ- n Bs'ᥣ;A 7( F}dNڦq{O])o8hߡhV7Frf[.eH<{2oZMla4!lm/&FZOߟM-s$ jK.-H[i$ӽ{Y=HӦBxt#bO+RSnBR!Z#E;vXMmA߉y7SA'=#qh%bB9m@i1*\|$sa}&!ǂ`u6 ! BN[ Җ0!5ǂk3)ks˚XReJ,ʏPgwl%'&N`lob;qVhE~y)OD-ʔc!QZ"+]sxE ң%˩f] Ϩͨ3,(V(Mb> ;{}[fSvolr Ǐ릓5r3PGq lh{" yRˌ{ MA`;k#3tsPfP_(/3:(=DO}emxgYvD}p,qpbR|@*`R˕敢 l̒7YQFW7}:ݬ yG{^clD@;d*8(F Ht +-[L 3rXFR0]Lυ_Z[bUWXY(c#e޺ >w]mH"%8I:\ʒwt.о_Jzk=2G?hU%d V.[胃>:?ϓg1mg7i=ik"(g_ Zԑ)c7 #%(]8 m?e`hoK:3Փ̈pֈ/3E>dv4`Y- | ?[NV[W3k62ā>A_<bU81U srh~ "LFb&bu@`;G:^"V[Q׭CkeӉ8F.Gp }:KoUW aZ'.$PM]̻>c\YbWiDXZ@ű~-{>ZQ` vx[TІ-_?(I/|8B5L2vO3 䛏Q@2I;+GPpNM{%V@{Jw϶t&pgǀPhfri9wS[4ԕ~ Ĝׇ BOjO'Pu.9TIqĂW Js8WcQ!o0~xe'/V,ew":װ 2`O&By*(r9}-Py/LR.)F,^+Rf6wJAr s|o9BVɌ75@<.^ %dx^ՄKofZZ1p1ٽqcS..(;XI~{˸\x-jq)9p(rBb:BI̓)B- Xͤ2|&>J7+۱ Ԩ<OSvvm C¯lyUy"jU_ b\FQ"$ghPuEРv,8,kC;Cudd=6^Rl什M-R'RɝQAwBG UY1^4XLf崽2.'&LzD:\*8cle¦H̿4>#nJ.'84Hu'qg]re!M-}Ei`u:Ľ{HdZ/No7+V.;IJ2|s8'a܅+/xG RG$Nf`C endstream endobj 108 0 obj << /Length1 1362 /Length2 6115 /Length3 0 /Length 7052 /Filter /FlateDecode >> stream xڍtT t3CwH  0 Cw HJRRR %ݡ H 19߹w{׬z|, Z\26+(GqxĀr겆@~  kayEpCB![<uSG*N@^~ G_@R (A@"pFQe df s8C0kACo+ZCk(7'B 80=PEz@mj@&an[' `Pm;*5]?`?N߻xoD0`5퀶0'(PBy8/ q VߝC`m vY#a.(7׈ܿ޲F p~'CBoݛfO߃- nckwn}8,rk  @=z.N_ }].@!0[ D!ݡ//fZA`pd5CmoyMxn -lp'[CIVIS"\@.>A /(PX4Z6x EEt{{M/X \[B!) Co En!0'￀[Һn?E8O ߗsü6Z0Ҙ B~}T\<<햐][aK`|B@ ƹ]I{Do Ap6x;?O^> /?Z#ۢ:V1ƙFX9Ԅ;`4HgE_cUf,#OdR.+K_4ŊhNniqgfwՎLm=.=-+W'Mh*,y"Zg݊^=CO7*T,Տ1 .dɷʞ`Dqb}"<> WIߍ/5^;Y,sd4E?&Ws 'E*OliͧZ  ~U1.+AU < 4IdCTv!cy>\lH(! 8^Nb̳oւpnBDqQEp;D$QV[$W>D($IqDUhR ~V@,K gFiq|(,{wЃ'le@DV]Ob}jF5-lşdZet[k  s k-jen&p MP oi;**(7AAg^[}E |9˥<&p 5i 'i9,Xu.JkcS3:dW] X׿J,a\YdW'oi0Oi)yÖg/6[(}IR3cam\m@|\}nX Dŧ02rӾ%G&&—̵g V[t)lm(NC$7%C“ Sy+rs .d'`a4WRmtߠs,M%G3e]?u3)LVBO.1ݩCa+zAb7x\YZ{(q-fbmhE9G2ptWdОUΩ~^ []\󓺊)E_WB_δ [:hZ]c6n'd}c0Ru̞;5ypZe.pBy[iMaf\_d{<ôÊ"kY/kҚ)tJ袸°eշ?ZMTiƿl4Q̨RID@ 31>m%!EiWpKH18]H=4ʭ16%WR$x#-s+_+vgBe>Sبv لd.?O:Zh* YVcy$X=We8@e9>"_u.㧔Ot* ^pM߇Z:ZR+Vo"_Hjenn JfgO4 0a/tdHO&?!S9X5ǯ27Y/\u8)V}u=܅McE}%k=zkFlcvb<]KZ|7T}\Vu~F!yYt3_vNK e-"I]o҈䰯w,0~j0fR3i A>kyEDqy\ǘȺZfLVxT:z9TBpӹKkߐx 85oX:GK!ip%:[)].u4\Z><.\_7 <ITA7`q)iu~dFT~nmUj>e#XH$8ޱil$5{ nC(@SQ8.E02q`X_I& f=G@sg0ʇ Y8sZ/y5.,CCӟ2MᦆI-} }}&QG./!:za¯*94 w8W =ٝ= >C]YE"r͙ɠ?0,QV<(AQt{:NY4A qHg+NQiw N@T#^}%Vo;]S bhju;_w2%Z3мV|Oqr%rU Xip "nN:(|hvhɬN(+$;6(gcri.dL2/ԫvOմ;R_d,ζ8Ft=f]Tz7g?;d6RkY2\2Ri&j֠YcŘ5Xzk`8[]"*r,ݮXw`$1oEfbp[tz4ۄ8Z$3ԧ%$ 󃩳 /F>MtfYd`I*§Lk{X~42m_[#X);q.vYhՒyld9)E?.yJyF"{pպ"^4Ng;{a0|iZ!1(&^H76M*e{f4H@=1L2Q+67$ }Uy1l7x]]x[HiyL憏ZNݷE5KP F_Z X~UTX$hW B,Ge%7#$&yuY\?:/F)z5vXu=φr{B(p5[@ ܛA iq몃)G(L $ޞG=*}e?JCz dBbaJ}pQ4hdQ,IJ1u@LfFyA;ޅp[Fm̩o4^ =ph0\Hk"̠ݨGIxK6XW-62pC$I..QK R:C?&?ُ!'r.w?_7|af~,ߧx1rI}߯#NvЗKAlMMu#?_Az^ocd?q1R{a51rM)oW:+_ 5 l\uLliڑfžsDes=,9n3\١f3kOfg ;q??au{kzP@ ΞY2C<۹f/U;RXxZw5TQ ,hS+1 -jw Wz:]/ZLDufzzb2̒jZ7AG>)Tm 19#;OJ,~:)^o&79Ŀ#óP0OG sLRUx\5i.W=^n6wS kz uxu4vO8>A ssΥq*=Vo}j׉?M2TvO}I^(DtAwt>c`jOor٪?e[BirMOYIL,CuWpp1A\jۮKЎ~Q#_ X7댆I—6X:h2 ODGgIR$>͙!xQ""61CZ%aQ }2 'eCKėC:'WK8:YՂxׄBot>ZyK}z{V)u9I T0xa0 t(丘 f[>Uw( J=91Wg7`cfv{=5M5sӗ)2Y;;ۅ2O:~[uGzd~0*m6:+./d[;xk.ߧ*&q7: ы-Ka>:[>e{[rzΚfk ǟ}?S+^&f)[qkBiaX‹9Ѭsxݪډ~G$ДV2]B?|A؋IXv ~C]KcXzr5֨UZm>x(D:qpp!*CVTDžoĺWiC:]萿WtiE &'>i U'5^``Dw<< N簣 m<)tM|j&ꌮck@l7тSFK$]%^|w=YXNH5D/y`le^xkO{E)0ДD{R>^BOVlYs0N:ʮSXm~Xq_؜^' ΨS0zn@h'5c+ۉW.9Ǭ(zmtt)P{iGFiwXer{^Blğ;K}$;k[R3_^8Kr?; ٱ_ +KbO+?(]G\$J9xSiۉu 4h8$ ~6F !.(ʀxdX }gF[cQy4ڣK׬i14L#'D+&w V9;sNG-x"I5cj_L}pT)Omrf Ut* &{ >ʥGeysqXz^&H'! Pզ+t>.:?OK endstream endobj 110 0 obj << /Length1 1652 /Length2 7498 /Length3 0 /Length 8555 /Filter /FlateDecode >> stream xڍtT6%ҡҵKtw7Ȳ,.,KwwJ4"H (JtR?y{s9;3ϼ,:<& XX `(gx,FP; C nm `-Qy8@8HDb""' PCx,W$u_HLL;@Ap&u;(8/ŝxP=; *v)`s GءH( @.p[(pv"kEi w?޿࿝ v0g(@[Iᶿ`gwĭ? s~_ Po+S; sEÜ+m(~ 0$rw?u#~:vʰp3< 8&(P_DXu@!|B_\20;; @!=~y0 ` |;$`?[">1ן !~<~1a $($3"sV!bQ?ci!n G@! .7TH7p V5D.пvWj poT]Y sWyCmu`(_nkלapun t7ݟTC_HF">xc= @i -d/uܖC ~TA!PgJfo ; 9AQ =!]ox g.?;'p.._m[3? o OD~QMv/x o3~-_ߏ N# ug5xֆM?Evx$Q;_a!511V? 5O~% '\oO BPN9lB:!s*rr`G#ΡD^;+5mbFZ1{Vc+ *.p D<˥g*[U_{^~QCS%Rd!)oyH׳/g#Y* /ɹ*eK'Vtr%X"sId$*MZ [ &#:s癦yTtt);bN3ϊQƝ;hh.j&Nqɧ@ y +V{^˒W&sĸ_( >-DyYy,/_㐨@6/RaR;jx||']F@~m/6ޟNO<:Xćw9[Nf!ěU¾3O:!0价7$VDS0{߫zCRS\1+72T뇑d ѣsErL"g}UkK\d^Q%rk]S:^Dc3W7>@|<U=;QHGbU-_FTa&jb_u׮{ul`tJM@fЄ>1GU*\ExRΝ| U]? _UWbۭ]Q{~WZٵkWGw o<7g@kD$J?#xpV 0]ĕ|$|Ҳg^մ] :L]U2}kdB ӽMI(p*~F V5e>՛WwDӀ)5瓃\<} 7ŴANu`dTJ]D&pb+D◱~JJ6 xW[e.,6 M0PB^ a,I tl᛹yɡ_džB9l̒"aJ*HfBdZϟ&x~[$g(`KDrí"Yin=WdTU/[M<~މ%EUSK}NrPӦ`#e v .A+NqMB_bxH6OB|Xzh$Ow<$*cΛ&~wdqW@8QP~p<ATx]aɛL []Sy %J$菪e9܎P&9\U#5'1vT&$sD/Q؁o7 \e*UOrQLuDL:gbj BgLYhM xҚn@pQg-.^;1P^9a7:`OL!(ɠ3+56u Hk#c҉y NSwpfAxo÷qiz?pafȪg^ yeyl..+>*= 22yo†{<K:$,i[c9X_mM?8"MNY53c[Y# z*leA4w`#%Ў(9MKy=E5,zr0uN-:r6yvHIU$F`;'o}Ƌ.33$n9B:Si\I\K$hQ5- c:h(Q45̠]|GKl=][$8gYW#34|پڡmHKF+ioy-wR;}P emut7-EVi zSvɻM׊=U70&W$uM:vF{I܀yM|t\2>ղtH9팃^ ,EZov.=9Y1FQ|t%,}a'+<"t5m-j8?V;sp2O=MەYr$N#tƺsȾ{%[M17m:uNDXE-Vo{Pb㝳JsӧCr>/DbhcĄ'-tt@r׭6W_PV8 >ZKNoQ~n=7۱/rE6EޣQ|^NakEU1 klvI='fƳ~\K @}WTbkiGmX Տlp}Ix2M"aQhKSI:&=9F'xx 6JC$?kkxD8huu62~2GG9x_(7ѕGR].8f&w_k/y#xtͨ'aKNٴѸج/Ub`Yz.TMYؒ5AJr'1u\S.Ws1rbMEKۍ4&D~[p1^ ӴKUۙނ&)))ݸx̻oH5=Z>"{4%߯LlGDݍwj[҃MvJ᾵])HwvaۃN3T®oDPœ5дo?s ZΊjw/s36^G?`Â.yU$CiTJODKiEQ{Jnˊ("h&ڮb@T~xM1!yICn_:kO׾ɒ/ܢqk;M';J-#J!O-V4nUPrc5Vc+j/p!9*ӂO/TtXX*zpB'jWO^bbӇEc|Aa?E? fʩaQDZ EuZҩmbP b{SӞJ {[yR aN(dX Vٝ1~PlcZ<ѱuV2yK:`Mk, 8J-&uGzDhu ]VK [ik] J)$!15yxEf^OXe=*LiIo(ijx2KSU%Zs-!qԜtsͮ ğ|&Δ[IDU~xoj2ӣǸm?2`?*Qk" |3F%bk aj2|7dT}ؼg Hco$-W?êN&)0c:Jxh LR*ԫ ߤͨI  )%Vj$3VXDYX1oIF[6gߟ(h{4mKA3ͻ mu\A_$I&VxD_YLUhc=FvVzܫ 2gWMKeIsûugn8O%w?}jeHkrg ߓ=O`y@w{<A)HOp)7DQk/D\Nf bB^ˏcD+ցNs|۹PQ`-0|W"O_lj]BCsCu3)r?׻s7aIH /?Ϧdžٱp!ɛ/6{ѯGoYf,p>AyN}Z*Aݒ:G3KwChHuڃDnFh,y^Zό I~;lb"C+]OQČ4_yb'n|zSFŪ+upc4ϓzÁSS22)6~ϱwѓJp*x:HZf( #g/Yd` b"xҸ[쩑.g^[e Y_i^k^L6 u|!g&uh1\OHwֱL}Pxl0\ﺗ[4Jۍ<~l!y~{^͵W\ES^p.K`gtQQF瀫hʤՇ|!z7aO >;۸\ _`ʏ}o-2$"ɘ 7u9X=[$\AnRz֝oEQ{ P:]`0kfz p &;'a,[(mn}VSn q}HO7$lSɆ@X5X*y%(14ӭTIdױk,0U=_vs\R0-VZGg.J!VDϠ"*<''+w&JRROȵRE{|2ᔴ O9榭 ݋>k[J!uSkХ(> 5viZ!g%bjSLHKN߅Jr<9\T>I;O_}ɦ.#uXJEDG^D]>r+,9yT}Y)%ϕs Rv]@_ f8] !q+MjEȼ\R4cϖ5-#=D~nG>a-s VTI{k |R ˋfF+% YS Ь ^K5lV=v¨RsՇ^nwNk;bkp{6uZsjh;ƣ 8Js*hpb+qgЧs8_ԳTR2#Zɑ lJK텚]r-b0#8$zKs ˗c5AmȽLz59(iJ]VR&ߕ= ::÷>҄)$<Im^? < (ADvqŌS|ěPBZ:͍)u1 -ZxH\aob ۫ȗM oH* Ȥ cٲα:2 BY$%(Ir71Y[$GU=6JV7'+Ny(d57.]u" \Vx"=6pޞa&옾xKݹ{FN+$@,o2'VR0ߨЈU{d k2WE5ry9txØCqM}Яfi =^;{yF-r9% 8 HCpΠyQ|\ ']+I+R0WIѩ/0!&u>hea|Ӌzl=˼gWͳZvXE{NjfhCYðr͎ն_MHwk4"e%on걼 |i|OC"8y endstream endobj 112 0 obj << /Length1 1977 /Length2 13607 /Length3 0 /Length 14844 /Filter /FlateDecode >> stream xڍP w ݂08 0܂[pw'xР Np'ȽUWS5[Wwz 5:$wffga+*ʲ8Y85mA#Sk V`{y;ί6 " b `gcpZYr`{Zdea?:Sz;{?v '+S=@l {h M@CAف͍ha;Y3ܬ-j dewk, K+_ `sg7j2C^C\@NuY/gotGAdeg0l[قR ,L@[5 :Y: % vwS'+g YyfI{3qG}VN }`pmn^AVfaioՄ f} 9@$p/ 9 t\@>^^_02u,a5¯dc{; ^f#fԖg.^̜fn6;/_`dQo=!%SJGllog?[t9l=x(_j]E]uūٹXظ[AAf*VΦ/gkeRCb^딙ڼ^#Wm!io 6c8y@''"n똚T7x`vB߳X082 N? *xJ >kkkk"׸E ucye29c~Mz d/b cMn*^_j/j_hgb^_/^;r:9[mͬ^ lO:,ay!~uq|%q|Omktez G.NN^y ;yal*d]qW#JƼ34Nyy@rh#^=+sQLj]к[I.{ǥv=Y 3ԾI͛[byNrȀ.罥qy= 2XSnnRv(LP-D/JwrF.4Xcqoj|`bsO#ugytmM2ֶ$#~&y` m ]6oeЗe6,NN4W*Mtƽ!"/Oˮ.1EHh(e¼Svo iQnv7 Cv"J;_?>Gd٥A,mK[aפбfP%JW(Ndג#ϧ#tC'\ tP ˹P[(ؓg|+Wc>~66FΤbⶲ3gOÁ1QW93DP;JYJ28-]f uĈ ?"4]hS"n]^?JA O8.Zݢ+gd%|t;~5Բ[ (Af7j͔+zwYFu <5գaN<ӁzqgjIO<}h>o>!ryj\۫]wED=㒴X?e^pN2Fħy"QFqగ'Of`k7q83%3a܅F?hH|NWee|D?bn{,;]ddɐk.57 qha,p3m=R0, 0zN!DUc{ά MxvأeZ'1iAn[>YUcR{;\sR^&c!r+hzzGtF?ܞ4 ,n7hl#fZOY/I{۞ů~ؔэuwߖbʠ0%kPU3qB?.lϲAV"᪜OZ_~e0U-C'0PI$`?."p':ңD|{Ex *WjGUU0ִ2w譒{۸h*jko%s?jt>sd덱Y Iב (v5 idlCm4d=2A/!˙`N o2@A.gkRMһD9MvV>h)7RT_5C0qw\%A@s$#Z@ddP[ɺ$ŽN3Fߣp['&B{xFa"`pyj?vsY=sBAhEl|z*Dz&e:n}QGr}2N7$7~& >B6nQ'`4WPky$^x3gu:(58ysÚR/Kw?b"Y*bmd,Bj,US[r0O|;5}9=x_yL>rͷF3nRbu&8a@Dyف5[O[ oZ's`N5cOzDc5pb$[-Fa8eI)= z1YgH$༲/5j==/Rμ <8{4|;v8\s5Q2nXptRj*8<りB?87F@AOg}L5aJ;iroD~pc+HwF4= (}!Pi>!Ai>nq>sj9lr4 ַⴊXCz)W S||XٙTq>MU$V" _A/M==f1@JNoKY+;8{bƲhoL)ڸzl'\Y5*mJ d7+S;zVu k'p k֥$@K?_qB(.?A{N5o< f쐱v5?+ [1/Ϛ_x^6n$,}9Qr/#2/|lyvМc1$dޮrU8XeRQې?M0*LE7,yHoSjk2w'1|{.qD|1ĺzK Wc;4R<𧾝 ;q1tlw ai$xeMȓΑlAl*SRSo/j kF[٪L9 ?4)ymDe'9HwdeOKb$U ^ }KԡWs?]?wGbY,Dl!5Y,+ V/dէ&C ۶ԶWl@_^~Ů$幚m JUz ˕>,<S⴫zSŒ“CX_ࠚ5eo6<"yXIk)sy A9LڌZ0Oi}CH* +ɋ^Ɂ&*z{\d Kj$P?7-.H1ʤz4 Y!w-e${rK~T3Rrt{]Q\"/F/ O)|;!>cy(LFQ_5czxCxv$ڣ>_yH$.t#i17${ccTYAH'3\簨ͮH/͙O=`֛w((I׌c Ne&T?\B[5/q@e>h;qP0D"xŜ[zקF4mz Nk~B!=yK"ͳӄ k-F7 ygZ:51bGtxa"V˗hF$Q7<Nqe\&R$6Sx;wai$ª߳ 7z3ט*`]5hbxcaAg)< y?[T)g4G,v-Pf,@}9_h,e`mR//^Y@gc󗯹t/&; QDL llͧ5 ZmnPCӉyd d1R)%}KX~`z LI>O? ZYVՍQ>wi0)Bqppk^qV6!>bQvrw:|B[1j+/ktVO0x66j?x"_{^mQeiB!_[7#-eQVب`& -&9ϛWOk{e QSx~{V̈́zq쟺/?K\^:=J1mVB"b%X+FiYO M̦x t8yìK'T04]r<fٱ-XUvyiPo[nzՠOk0Dnicq5)>1χ=k0P]܃fq=ZzDh Fxȵ֑'b$e[Oye5RF?Oxd,$Ivj}26 YZ˓[oT.<@$/Ă Fle('< Q)<a3<γrw;nr@R` fu&Ɇ|\ovp$+cy\6,NXE7o;y:8KƈGN(J[YC|?,y$ 'Ր {[FJpQ8S^tΗL:W%}] w 9L>."!2fۘK?Z JNscYWeYh 60Ȥ`21KGCl!T_dsC6&9HDd)g9XS@v:k8IY%ОDSe9𦃩ZRקk|4+Jx[v I#pqOV)xD~7䂜NNJ6:{-+_oc42e|*bK:] |#UQPg'ҢUvlh7]1?grb=ӣ x(]u^\>^sR=؄*fjwJժYNuc).tEf ^Y/S4NٙpIMqd_O{3i[Y?AB͟+\{9ܘyq%s~7Tӎ7{؁gB8(|*Uy77;"j0qT".oePb!wP F_eȽuIɰ' =V,R<_~ #A Ӆ][OA=%s VŬػ"pfIr|6Ң.9ѽ9T Px!:+›AFȻ+b:q4o {0Gi v̇eiw)/$sX0qͷ\Ӥ( > Fn|k6px_iJݸlAtUWƷ*qRyf-$P&b :@tAym$nм]3ȹ N'QjEz0S! : AZ$z[Y/T0=IUZ# LQXϴh7ژ[! K=_ "K k$£e/tZYعvN[a8UGD'{*mFS= cA8]# ~R[ln$YT=S cӦrã 0'O20n^pfTe{5sסC4REU&7+LLx4$qAk e豬DMiw^of?US] .j5NKpn%솩]8j^#X%TC5{d0Ei0bPMƗ9-9G ==} j" j#eo6 *Rf%qG:HьBX_Ytc#lmQCR4 ;Y.#WĤ%kIAXj͹ROw8jNċ.<*o\*P%#+_yo|k O~:P8+mW.7}q!7D' `-v+#'ۤNM $M²kel]]+v訔Ѹh_6jGH$9O_(k{nǵ"ue_a Vis=r#n;[Yo17{‡U|HvX&.ܙ+R;b ,, `zj\DY$8=2;՚@" O_Ӣ},Ӳ{X c`Fme6*2YZ^i=7kq Wo2_ ҷy1e/(mA@P!$L IfFl4kO&_7wP4s0-$zbG4Y.OΗo/1x1袞K~Ci^Ͽ۩*NϡNYhkME sRSqV[nvX}T#ETk"7}{5HJ}0{>V@vI#]% ~FswF~Wh ǚ°Q߽WQl/nv  UЈ\=u}wKPbP1SN7F L]9 [|!q7oJ$5>^?vO zꘓqARsa76rz(=ǙMsub J07_M]8&u5R35l(`HLm~*Z\^'b,-}F?\S͝]>&fTcoGT ެ] w czi 0/ CffwW/w>3X1'{ƱIAd`T}E'=H.^k~ucwcw:7Cݥnc>N%Xsu4 tu{NuF$BT?f2-v}>%-WAZ5Y'UZ3uR g2$pۘ 9ӌ6o`K+H.%uǼ?XH -Sy9B3#9[Nd/:"y`fn}_zmd S@?vw*wqNөnmK;BiDX, Rʖ"`_vDkXE)})?(CqY4$0*mz{3mG"~byQ#to\jk 6;M;^r-߅7f 8"Bw<0oZlOM{%tMG,'.1mCǭx&RB1Zӛe^iO9 t?)iI~)ڪ!yCq7]4عnZhmou]DTBs4|"KnJ儾$\=~,T}Rmd[C@ Zq,Y]D0n R}gWw9`kƀY[9HD'>Ϲцs\Kpđ|o#KZMw,ACT`o(Ԁ楲qߴC!SPD~3ƨ$1Fsˆܟ_A򽹸gj!/^'-/%}0 u\9v|7gӦ_#L8Nt||sNpmbC^i1fRZEpr7F`y`k68 -FŢ6oh]V]D矑 O٣ Q#| 10\''$|aq% |["wԀFMzzfm93 6٩_Db!CM_Uo7f 7;>PTmQYD=FNoyelj#NhU/}ܵ4Y<<;n> 9*JCEs]oe D^<"w&VFʖU0T uOEHks/ʭi+˘ik&kj%/Нs}^9ح0ԤFH_,gp-pSw=t[e|}Z"7㲺tL^²O'Z:zbcV>b^8,(UYRTF#Ԍ+߻ĥ9{J wiIP,BeT#mfiE3j4nجS K&)N,x=h7m TĆ%#J:Ж36ꌭݧ3xO ͇(ym,sa5.-=Wd. ˡ.,ךs6땏Nr|1Kf ,' x+O,ҳm; "_ W6{?1910˙ 4# j w;LK^~A{2 $GޛNRNxTI^ /ξJ%6Ccq}Xz81!a_RqYQ5@^Gp,[ȧU^S qh翷?{,lh[e~װ7O &'b/Q97I{k!x4 WF fA]vm֕@쌸gZғe/6+Fѳ3 /0wAЎ%۴"3ЄV`z/2pHtژB[hZCeEޯ*L'#FHlrߵSbjHN^'#nm2;Fm2غ_L+:A$Z$`ky{#2B[g+=1|Lۢ6Ʉ䷵I } MrZ@U'+/?WU<3MhoK,;Z/8if8\\ (kzOGo&gK)tFTwjx~Q*Ӷ?!)uW3YM̓e^TakН6]\Y $^<8p{1Җ$EW)㜯jeݎp?\PY}OQ Ky KS&ݹ:AlrR*S^&gmzrj  n_HT#G주G1FaT+/~h < ?ں+{l n.( #"?Rq"yc +dcP`7/^6!6<3̜xԡW8?)/c1L-"S1G9, 1&t{HPjq?!*{Fuh|銯ЅNj?1Ё~D䝬ô!d'kWzЍ 3%U_Ԓe|HDKO>&TiZ7&uuE}|՗9Gjy=h=bJFBL)r$.ܛw'VzeC!z}zc#t5')Տ1LB[2=1V֌|kѪ<ا:։T'$)8 +oq$gq|c+`ɒl e"-!Kddp6o1efmjB Fx}?5,M3K"qpG\`SNwEzȣd{s`.]g{9t1)ܩg8>EՍ]P2T,[Su# |<|:3LOʆӷŀj฻:0o`_ Z[pGJd)ޅ("BkG㷹mV#4auoȇ(A"8P]l@Nĩwÿ#Y!UPo!^At߼9Z/zrq4i\5A|t N9!ԧ]#D_bTʠpkLV8b~nVʙ5[pZb,!1"h].lnYOgYl؂u2myi.R&Uj! b[iWe7d7BuO239\i5y*B1/͓j7Pʓ $)40Ř|{bl R#\;'ݙMA5Yn ;f\w#t x wd>;޻_)`ܴu!X++UA<">3g:V!(a]D=`A{C6 Eiq4̆dS1쨬-&6Wd/6'0%AYC# [ ){bfGKk# 5(zjS\ͩq>ݎcPGM8h& _zJ'JX$]}iKe7;:rp\{mnc쭀V/|mE/'W>r#Z7BvxɄK$GX?`p9BF(vҰV?,G\Ne<&s &l-YYb1+I*DnMtŔmٹmbWpoS $SX; @q]g ~p(;=qtY4>]{6:-)~)CqX>i25T/Z2^j&Ͱ[m@&8YEY15OZ[}wsI^ n߸#ICkhΑt/)裗؆Vj. QC(O16yC d~Ԩ7sL|O8^H\=:R'ue$!kXHW:ԭ? endstream endobj 114 0 obj << /Length1 1445 /Length2 7095 /Length3 0 /Length 8078 /Filter /FlateDecode >> stream xڍtT>)]J HJ 1Hw#) 0 14%ݨ4 % !)ݡt}}Y7g}>y;n\@as>rc30hCv 'g(&N0' FT0/% @Ni+P NP+k6ZY\BBl { v-9G fQkA͍lwgaAM3b50@l36@ǯD {spY@j௳pqp_ٿ Aap{0 XB 5Ye; Y`W0lv9 k+ 5A_ |! >PK3@8@|3O `5G VP?;A{q~^~9U?3cRRpwxOW^!Uп?u T&h~?SwῪo$d]~0j.{eohU P* B:B!PQ;C-Y@en{~8O|/6keC>d`p_Wojql' O;af. SiyON&7]Me~ w9$\$Ц" Dƾ6$~r}(%3r9NY3\&6U/LZ=eݴ1;E:{B7IP$E-\ԅNA؊%aaե@T1HAV@1I[e'B5fk|?ə2GD%)\`ǹ.uREVt_;| }"͋Q;QຓLEW"Ky1-cZN=*40*'5s%V"BC%9\ Lu\9 /"4dk"-Sà.H "й, Rj"|$ 6sez^:pHfKzRpߢABy 0Q7-]'Z$C@+]kf,3Tar&nJPR ?o9)̷<BYn<"M qFu%#&hK;]+$7Q؇a RPUv?ƒ.fWxe30k"UjQtfApfZ;R,곍x@ Wxg}6|i؏j:En(hIa)MSow\ 1&{ӽO,ɫ@C$ ֦J| 3 G:Ya3|+fBHAfKg2u:P u^革xR<>`L} _!估_t\eceqߤ;E?pc4/ &̊So+ʣE{ *1>[1eQYä|4x7<纉b?)yЊhHGU~6S&~N5suhe(;XGmsѡ 8;Po_{Ln]u_k.C9+< gDaָ%PC*j.~vA̠y(ΥZ\w}4NR̷.]e"Ӻu/ +[PPt273VV޺H׏9mѐf^fFxߓ>C2mn$/tp *&4{1qGމD|1Efs&:R q2v'DC|VH?e4@~ݐY}/Hs욖D!u mXFdd]kleԼP1dy0-(255T;A&1G~O2ϰD& yȝ'Yu7FMSp$wc <͜3~d"@%yj0BG@.@Lh]9,߆zgY)"j(q\ڬ̧E\g9_s,ޢ #OV(՛$F?|*N%dUJVcIsiְI [I|I N <.UEkJR@a,ZU,0?aVU|\ :)^\z-mEGEjliwuIc|fSRgr c](OBO@ .z=E*oqL2t,fMH;pAPNk}ZnO`kuI]Rvor^ .uytNS&gdB<f~_!JiC}+p2[W0#':IQʺofbcJq\i*iE}΍W1'sE,Tö 'NUhgUa謎F 1sTˆVƧXI&女 =0뚏Kd\=Wf~?c?0Fd`8#0<'60 鉣^sa;"cٚl9R㪽^(itZ{aJPO؍7ëWg,+SI;֊g!qR<?$&Mk}tCd~:Rf QYWmb|!PSxSRL;]i珆jt*wa۩ўC5ő=F/icoiz{Ki ~)=gTib63q߫gBR~vGCƋ1J}M.D"L6 &G2I;w?`Kjߟiв+s :9ޢrZ[LGc/;۳S܈.. S;.l ~%a(*Kk淃pW3d6s)̴ujiՐ+[NnK兀lfcq!nҶ*c qFJ(Ҍ~3ɗAᙯy/| &GtkQģ>oZROL5u_m]/f߸}rhmLEE{wxl{sCoM?v p'fI7UXk:tg#K-aƌoF yi{[/_}*)|W2J_woki> on$qDc _O[٬_ 9&\-%cBR5ۨ%:*CH73~٠탳KjFr: ۸VAaFBn諐\SB,YD6  *\!~dhm Qq49%ܬ]cAC3_$$S1!OUggO<ȯqՒ4 ޻}q˖1f54?zx ); ZՂw|+AZ`װ WlB.oY XY/uέ0*(7)O bfz˄<3KDfW o)b:lnL [qLjp o"OwNXg-'G(Q?r7EDMOL_LnFznXGGv:6C-ʷ2cvFh㉊)úͼo*_/5wPG^!&]VaW\ȢGN3ƉKd U]+A6d5H [ߓ|3eyl v<;qCKd kDZeW S7Llp̳l&Aɭ @O$t17dJ$Jk.܀}v Ei_fb 6K9_ 9|1fƉܳrYhsWiS2-BO fN!-FM}o :MgjX##"&iӾKN?a$yvq{D%ZހD5hr ^.PIQ6(;XY?oI02srv +F]g\g]dh$ptޓ[NO%ŦQ{%پmt־"䩨d>%)t鸶iLZ]/f.C^۫DPy?yplx|Y#P-}BCҮ,D~o_V $ 4ŃwV#i*rב||iW4 ǴYAS.(REЧ2^tʤ=&&1sX?#i>i{Ee ĺۚ}nKc\hc]m˩ـ]EܔNڔٌn^ꡳw `:k}C A1k&}X^*vˇlĭ1Q ^{jwTi\?bC 'X-˔:}ma]wLYU^ѩ{=% __Hmc]B)' I\v=v+ endstream endobj 116 0 obj << /Length1 1777 /Length2 11314 /Length3 0 /Length 12450 /Filter /FlateDecode >> stream xڍP۲- wwww,!Hp $垽*FݳG*(HDL팁v ,̼1.33#33+<:BAhn7r~)d],lN^.^ff+33yF S#@O!fg2p~?>&4.?"6@G-@h~5@tLLnnnF6Nv47@tthhd3Fx /#nm#\lMj2%{_dtGdg`$ݝF\@F?+7H='GH~bv66@[g'?9Mޯ݃Zڹz @f4abϤa rpʈMy7c3:8ظY@Ă?,;񲷳7{9Ύ.@;LA&c9f_} wgwϗ޻Ll=9_&MeeEi:OT```p?|;* 50U5OŮ w2Ew\ SYo"IX[qـ=&}S-b2F bk.fvFf 'I;Tlbdkj [菷=pz_.#ess`:w&v+'}򾠦@? `bs~0sc\l&?L!? `Rǩ˜ +J>{{@wdc>(9L\&@w_˿(El |o?=?nwſ):0]a\݁&Kv&|"n {gڬ sۉj9krC,}?v=o%/݂/$y.h9)F}25V@j\Ir( V/ϖ"tղ _Jw3YU!4vmYx3ᖉ=5XcI&;GKQ97(OIL?WF '}{(L3?n-C!l5_>J!?+6a2.RY G:G¬T/eFBce`;qFCR~a'U/jx_-^+Vʋx~ͤP%S G#TDPՠA:c-߈&PÄ.1z)P"Bry@ahs+RcoD$ !K_ oWf)x<4?c"cDlD5`x#mYDTT)ۃL+u3Z6۝<EH?L oz /&x`!ik&m3P&K܍Â`NPq"keeYl Sj8&T|0F WW85ޚ`B`Ϩc6| LkKF gHvZcoՍ(4s*ݿc+Z2 HQ|M0+ҍ 016RxT\!L^>Wfj'Df(>Gʼ0Q0l= "7udQNܶvwfJz/l֙'CgE:\ɐq[#w~ #z ;z^-ԤEg$m0h:CYff=Ue^N3ii?=rEm9(}Iz134Vuzq)9|H~Էx \QUqa x8i/_gVqW@\VWaC~A3 )逅?onf;V܌stѪ e^!Fk{e6R{2UDtz;? ]-{˙=YIΛo@V:b*.DUr^ hm~JV_z!OSXx>̙sQ*. 1\>7yPMq}p\np/EuEr6A:/ Ii]9sQZPً5x8!,[Y=G8H 5Q+:aSce2*- 됾dBݎjG0[ HrLQo*ZhkG.خ s~C[-BώE>MLnKsS2 +FNxO<с׆ʼnY,lDxQj|d ٔRrE0sj;q ZuTb0nm%1U쑓[, iV]z-psGR>.OaYǿm9gGGp.,0^*5.5MS:бԆ)+u?cc~ z$};0P&u+m`wir?9SCg$#1{]D+ö)ab)XhSk<;o~⹐cJ90[ n@6!bUv-hH^DܑK O *N箂yíg+Z!}5cvD_%]V4܅h#snbmиqs:0*[ۏk{,MAgW4CUL&#U&7DX8ti3LߌD`eY5n^K NE9Z-UH;2S'+_1 @r+ F TWk[*΋,&($^"zr8*Pv}brw/LPA02q uߗ)]ub߁ߞ "n.FdΊgгea$)eS,<Nj@%b9I ԗ e&L8қS >KV7TAnCABwPu芕/1eIFxKnnU'Ï5<Ϝ8 uw~h'_v0<$+Bi;--k>$޶ RnF^ᨙd.G^qوkv[,cQzK| Kd|ݞj헴h`ѿi~cu׽@6&` 2AQZQ1#ƝջYP6arhp;>O<`H, Ӷ FLX1`97\'=$;͋f ܢKf!Ce4`Mu*1[Ƴ8tm"v]+|y-֤ie}H U Zdž; Ka6iUw)^kQU~,*{LZmB9|XPt`!5Z@:sV+h+[:ǡr/gN"UH,uB~buUDj-#K}{\} AG.p@ >)"zls]E.W @zkt>gɦBF[)lfΣgʫ2r w,$w/ڽt[KLná^7G3]>]Yl-Ē_٨MeH΅7BMKd(GЪҒ//T?Y[EB [Ad?R$v&¿"rQ_!pzA8ȸ ϡ+YI9ĠXb.(1 eS>#[I^֥ =Է"U57\aqZ-D<N`uм mݢV'$[h Fj#5b%o\V+=6W_ŴtW3~$@ƹE` U̅#RpM LR O8%b8Ύ}Z8uiepMȍ_FI#v(X]oC-PݡSjّKRIHkO1pt3nzqe?/25ԘZ1TtGAKQQDϸwU>hgvbWi{$ST=聟uy: )¤g:s)~AKkUԲ!zJKws$ ~#JAv,y2tc\qɆ? .ag C~Bz0O?[ihY\rX`ڷygՐv5 QK&+Gh,s+ʧ:: Cc_ 62RdՑk{/[,kw#y7%)3Xutf窝ؓIAPSDPQqɫ 3FPusA W zcA!@vnK}a9!1Wg G?ޞˌPU"%> 9}1oW  nRaN ad/pVh>ZEn bd$UOٚjy5J)쭮U"?;K83O̢2DS:-( F?ᆤ?R ! )iҟ+`]<aآZy3ez?cm[X`#$ZpHTdQKR7mJfܹ?R<'-= |T1GKj`06M"YR.~Y-lsOS_\YZTE"f"XhWX` \cː3Q WEȦʫO: DN%#Ǟ8&0 .i/dBr))&Tx+=j37Os *HK"YePyT${Az=Hp6~蓰Ma* =HOHU],5 342&4FIN'hOq"mɐԘ.u oNBH _#bo ʎ4 'ʘXl8*U{uEܵwCJnřQL-t}@mӠL %ٖu+(£fg7hHLj=3`P^Xޭ0U1F0=Fwjdtʑӄh iO_7 ^sݾv_#VX}Yg}F {5ƶ Ozd$F g3'g VlHe+EA؈8@ɟ5,'L,Ćql(;HN HB>PgeO߄z$(|(ކ>⺤X q?/d8D$@=]TKJ,2aZQRЫ-n! jK3\^ +bgW3G x2ˡ\x IO!K㏋Mt#X2MB/"ȓ{Ra8y: _N7rlGLkX[ιRksY` pWpѭeT:İue"%S8d/fy`U|e5#L hN YPZ]bE+( UK$3gԃo0aW5Bbl`?~h+h5js 1Sj\'-65[A6v !" ! @b쥆3jE2*e-;?eW6N_r% B u B]^o-α2IM3N괯cO藾~Q6U{Ui+һ'G32/:&Fz+^Fgw=S=j;$v+eӐrb W:Ռb ;: [&pVߑȜ.$K(^~[&…RH`zA@% 6CQG ":6** UOp\|[J6D@`9[5Vp:#6?;B@HZߠ,|J~ХY_Ga$k` ==z6B !nO82/Ob9:i'sR0!OWG8% -p7y_s[䛼 Ld|&Zh([æ7<&241gz2gD.@YW&id1g919$A> xI?,*ܳ( 䆏{;Qeȥj ?KP">Nt4Z EId ^(Jͽ|VS\?4=A`Djb QvTZ #)0W V| 㵶:$sAq| .]ą((8O暼%c/I_t5rQj:$"cP0 а \A8 ^#~ GQ -H͇4-=b {gAضnٻ|%bgz. )X·eOxCQBNV?q;B[ kapX- `>J4=}8չu5-vks" 2CqqTE5iv[5U$Xs+.mPj&UNi<}QeG &ZEwl/Y- (I;V;?$7a"U}}ޒF[M97~03bDqE(NH]~:>R[Oԡ/ tdb~7dwNWE{DkyTvG]Yx [v=źT+ ֓56Jwפ2)tRط߀2 {J̩pj_O>;g,+]yh{}.;(-RReYOn'0}/89fkm8wȐCTJ ghq  G*+XIOX8 tk1:X"`5SDitȫ9ύKy*  .O4~1HV.4/]'7kVo:@a3ގ9׈/8뤾tMǑ8mfUOE!o[7VsSMgFx:oiٞ%L}YW`~h깂zR6kp0kmW03{|xgN|[ǧ/t9$^1 3LAnt#$'MK{U 6M~IYXmi۩] <`i Gq^بa8Cۤxb>c7/I Q ՞-bOhGfnjmXrn-\getb[{iȈcz/ mW4܊f#Vpi0e.,Q87uNoCa(9ʥ bPՍc< ()~oM\*C78V2L\)X _Q wH[N|an;񜓲 -.= d_JD󭐇!tK4vf~a֒J[&jטu{GS M3Ć^I^ 7~lN\/0"@;4,]3gOvkEFd+խO12 62GyK!Kñ#(JukuMXkQ"' #Y̪tEC6% jvV@UIeQM0?ytl_ށʝ?ت?=ZZ|jH> ><{ɯJ$ axP:,V?1ݗMH [N m ^"m6]-L,<9>L[*3BKrl2KL!kwNe<ѩA>(w ?68 endstream endobj 118 0 obj << /Length1 1345 /Length2 5936 /Length3 0 /Length 6850 /Filter /FlateDecode >> stream xڍwuTTm>]J4%fhNpanPQ@@BiDiTRBABѧu:{_}?/ Ġ=" 9@MOO[ @ qH鉂m5c=@԰p'N!0h `)9@1X9@⍄z"m Aq"Wo" 8/4 pm] @vpz Dv@W7vH0 h/ C!H];T1 lEyzx QwhU: .O1>耿E Cj&ztk ©(9=IwP'_)L࿍_j\An7kDq7zo `H'wD)S8 `5@0}ő A-3UU %a1IKҸCпBh^EU4sI ! HŽgv1W]  '@\(?!8zyVA[C,*o'*hG" ?HM/f:A?x/ۏ@C1_$&)@Xn8I6MS@TĹ҃KkbbR( KIP/,hqY+ RL|@oF8WD4~#/&Aٱw@fb&*G+u`ߗcj$F8| U%luW sdة39P&侷4%WWv\C_)r?>)e&27*#y<iHo\,GΧSd+>Ymbֹ;.h0$ 4F 8xft1쏧3>и qn'x֜Mb׃U;sujcB-/n w!*+ DŽ0b3Oi!` ]X%B]C+(uSVI%/"NҔ]N/jVshOOBM՗pII-<%='R v/)Ѥ`P9Žt\ugn4XqW` {$]eHOZDz5F5R[g{i_xNRjtJm?Oux AXQx0m8d$}Fڼ/9a91xVWꆬ̙`2guu l`)_od~^qye0m`Dv5MRjcVۺ{%_ &bw2hdroi-yIuuzzwa;wjv!GB>uk}">R*g"U&+Fwa#Z1WuaSlWR9D^ފO*s ~i7?HDS No\_~.z9z0 ?,x'3uY=lXA/^}C|7dDԾy7sE2.RX'q.mU.,xپ 9~\XQ4=vFa;Qk$n2jkhG[xnM}(7;) Q >TK1Qt0q]b"ؘgig &f|l_67!x?y|t?ϔX!,~i4Gq)qA/Sz--R1ǟ3VZV8M@D@i vi+M>k+) qF$hgFLF|hBć,+p92࡭-*7vsqOEK~*mJ^ӷYlAI!X߂LJ{򾙪b1)ڄTfb8,;漌Xq>s7 IY62nըl+4zX{g'}}f6IsL:Dtf ܥ_.nGCxyORKRM&);AՇ 55۵?%<|1jV(a>竵&m~UmD@WRoC^ ҷd_q]be oŪVPCr-[i`k9d?x9TRJ$,/ʶȅCWDϳX!IRb+R_b'Mxc )_=hN[M~5yۉ8u] =;8\"z8ٹz(v`Nyq^bɹ: eƴD]R$͟fSvp_lѶw kn<*0[jf^GȶИ<|z 2@ֵ8O35$|9NizgY{} ׷ k+=lk^3@4{f-Vtx۸~M}rtrnb _+g鮀N?.}nь.s)!R³@,NyL-=XƳ(fl͌R|0w=Y(AR17fzB*PpA n&fِ5e$wӏK%G\r"KݷWE@]kAMJodک⟊|VZmڮSO˨a;1F',B62,SEooCeeF:FS.*pIf$/L&/5&X5~JSOv0 @ nz5oQ@Dz &Әh]^(?~*.|4㖒; ;l|ӑeͪc|eniOL\P[#a *8ʎX eGq#o9RSϞL~Z3Ln/nsHǕ !Peo nNNQ"Іra@[ѽ/,1`Ҿ$ز o &¢e1jDQ㠔{H0thhBh=}yQԌw [E_8ϐ".r$?Mr:8UR2^fI$5p77YSq\h {X7Fpj /D 24-ʛWm'3Uϙ(̈[Eg S[E~ O*aQ1Fwu4w㩓{~UNb' qa;ԝ+ \A""d%?9d ߤj 6f6}c[5qC'NV0|PziN]Ӗ۰wG='viVTRS[ϴaޘ|g?yí T^_i6!^g,s $*i()J*w,vOwLJ$:Z_ ı!+de H ceb+m8b*T5wlOC^xZ#?K*`I83_$@, ٙ.@-zbRM!,z4Sֹ*[FńWɦUz}D p=s͑[Ys †^[j׿~ڭYRXpދk%BhBAy"EH0s3i'C]'b+oWkjCR20JbDQu h.u>J m>VUIJ`c> F^%kĿBh;*,M~^eyv?u,X:7-z`_2GDJ+Dr3P=ZbEPmsWSw,m xD^ 7WI!w"tYKfm@)%Jp SM̚Slv]=BUqt]X5KX[ }j~{Praq-ō !z~r8GXy`&5z}P8^qyuo_A޷DU,WYn-Yzق3.vv^5N|AEb=Q&V^۷Uao“й]te~V)wd}+S[4㣞zlA({dWh)|P'X!~A՚^d;[XIq̱Ab/ 0ђ " Z4=T#CN=f-V+KmNVтsy3-Cäd~QAuPX]uok{ 1#d P*9ʱ4aj:?iL$ݏvKvNk%pGrŇ0=8IV;Bw#je]uBVku66bmoN_Yf.DX=:mo`:p) 8J-*HlYF.93>Յ_hr=ݙ2:Jz |+T( I e$aLQ endstream endobj 120 0 obj << /Length1 2757 /Length2 24034 /Length3 0 /Length 25579 /Filter /FlateDecode >> stream xڌP\ .4@@܂;.33$WuoQm4j "掦@IG7Ff^* 3RRj $7q)8:d,lN^.^ff+33 ]x&FR 4fD.f&7+=(@4VnnNLL&.onVU+hU0@weu+k׿jn&.@H`gmtpy;]5yoc  px d𗳉%Pgtr{ 0q0ehb7031 )0Oyf.NnvJde s1G{{+/~.@3P۽?WNL@L@"2K t̬~Wvd%U xn.@?E,,sk37)wth7 =?_eRRu^_vf+3גq>7?4qpԦ1gh9Zc):f_,W/+ے_BvviM1-AgMjeL@ `io]%nfVor_WfgTvtX0Ft2=:\A+ p0s4uboАAEs_K `btpt@,]~M$K70F\&߈$/bo5I_ `@R+I7b0F&DN7@~#9E_ drF njoiF nZ( F>A" rӿ PhD49@9@? /Cг7_dX[]RP jtTbN# +=hoM\~//gwеo2$[ zfKۜ㗹A@PlA٬Xd6T>? h Z;2tApM5jP0'? t0Kf{Aqsc~ށ8;MdG_1˯A1_s 2sz1PWk?€jbrry:j4?? #+(߭E?r3w|zՂ/N@ aiь/SX]',V:-K ڬ =oV%h~ /<4F&==SmCX,:OOȠ.l )K΍\q'be,b~WeSb!^#N?t24.ϛYWw'lž |֪Y](tq MS,,z"M[exxYk[1e' `c' ȅL (íHtBtF&pg+`Y8ymxd6"nid_T_ғNJ/fu7( mK ('hp~ߐ,[E̺нcgy~?/nΡ-yr*v]\YIgEgJtG'R}6΅V0]lVc3{ȿX1HQ.#ջP3msSxYіQBs0~<$sc~uSY)juyQ%:]+,?jPnRH,D^xSSuQRDT-u'Ϸ2K r~KŽxuA?X?u(р!eYK[$7n/J~W@ ֻ03M::MkW_ ˀ{GEjC&."y]Sa>snq24%Jƌ)8q" (*Z/|Oα~ Ilq?R@KbC\(J~jCT`%Z҅iyV0zhŁq!pfhp`ߨ1>%PPf' cGF|AEe^$&;&VFY@9;um^AU,S䂥'$GJ( `G4Di/,~pm MYVyaՇApuuBxBW\؈Fcyz^q{)|#p[FEhiwsg9d{!mAڭ7.K1y~k^ѭLkA'=X| K{\檱D_ ܔy^KG26<[Ȕl9U9ܣa8R#r&QbR9D+4(1rJ̳i52vU 8z]sػA|8<*VJZ[1 U}>3"8>ѕ WNtNmʥ`N~׿u*N~tQ?Hk3Wk,XV.b"9v&7&p<ad~8%e~߭sOqz?u;DW9=};cEYHj _Ln1|ț6ky@$\+vIUp<[W#sWs+0D~f]JawG )a ںjLȾOI[D5nQ)%R:4݃cؾd&vĚ"jL{gb9"v|5]4䭋0b¸}Uߣui]@9* aϦim@Q8x ̫75q!Vb9gA͛ߵ1j6MMYd%FMܕġ'ᖹdrwj'Ytg/Tǯ6T|K6ބ]n,PɳY\:V xjh{ZjpwP> 1"CQk UȢ%w;NoHYDIK~9c";݅ 0P`9 G&kxwx1>olތzy^-[kn,"HN"zj)f3E >ا5~ R”J>z.'"O/J, bG厾i@i6[exWHpjjbe{^Ŕ!ٟ)M$0HYDZkTv>{sZ<NxL_ wɔJr1FƵz$ ֘J+TᣆSU>HUZc?0Tܘ*EQVlJW͘?s)|ُq o_$D*]ԉT~Fא(8z *GQ\WExUfiӍp55KSՄC6R=ޡ܊I.sLF ehR>N}DnT$lX04]q8 >5-4[1)o}B:Xm+`OfIpy=s3pu頻I&P~;1T<"I/Vs=fMSpKWts iډL*y~N(fT; $/>e$ߍ>ԌDF\ y̵@]\7)$VdkD wǵ+ #f#<"Qt*$5ۃ)F2%im\bD/*Nً=H:<V멁oEx5wi}J=;ԉS3; 'AV.6FW&ȳ/4^X*oQjǓ\yIT)x 81khb<mznbTE TEM:>-d~P(;[.~mc. bvohӰMdmW,T=I۴e>>Zru$ZIL&m30a5iTWgP@0nlBm +jRom9f^Щ! /ߙ,6R3GKɤSѪ-\aH4nu (mI*I ';4qnK#S*9l P0G3Qh]jl&U A "f1vZ 40.`M+ S:6+l~vD5ƭ2vyρ O?g{FN|S `$j&L l!2ݽW)\.S׮>#X9st{>|~ :Ax[ -vϚW2Rx MP)HR `fo}>9Һ]`S>ۆޔA[_pk71ea3̲Yo/H &[L>62g|gDWo[~"J.@9/K;T&]K#Sf&{t=KCѣ E%CY?½L G%LqԎHHȆ{UPNe ˛C"7İŐkd74c[&EíV[eE;q"dQWw|(F_,0)aT72$a7OÚFÂJxdg|4}~ HfY@-T(HF y(z𤭹Ĉr4!/n]ŊuD,jwuDfh~ )ŕMV2 W нq /֠SdQ;Gs\ۛ5QQ xCrh!_D[[דpT 5)z ۞=m|\!~Fw%q";%oVS6>RqFi;$as4eI|x3SJLmb'Pk•1ɍ|ɛ83?lpܿ)nxoF'}_tk*?{ dLV ȯh` %(ܶôIWG`"C $t&w=&?EYX>;9!QlsG(O|An/B&pAWrgXx!'7Qgn2 $*$/lV{]i+:.'NUOZ|vb0H>uV8+7iJ9~޿]"hޱq}["`59ZzHKgn,rGފb}t[u%!G)2bC6K.Hw܋\ ܫ:͠S: =mhq}V|{e : p fG(EZdP2A4 gjAKBj1LcKOGX3%ZF^pqSD J./R%*? kc.h[zV+2zo<>Q21C׍9U"={f8Pg쌻e}p:t~ϱc28+kŝ#׮(WBom!M-'<}sut}rpDŽ(B/]t*RaFuo,?e\ ms#͖gD8K {y?]i(%1 D7no$imR8ō KTo$@uB\XtMb)V%'*2c6][sAjZ~䏉V-DN_.68FVkyְ%q5@Ga%s m [ S^~gyAJ l;6ǘ⾮?_!j6[Γon_R5f|%F3ÜzF"(>Qtw ~~:܀t;[:b7^T .mZe#4|$F)NW64#'8[H;nڑJA#ZHowv `EfYL#ngֵey:~Q!\NBO#VI_*[.v]JlvQ7s~{?لiJ{̯.ƚOXV_gaRI8KĎ4N\4ߓEDM4[yXCۄ+`uo˵ =&ɧbuZ%!.l A{6q -&t1ϔ0Owf$VOg]-9nDF)/#C}*Wc8x0{zf)y',e(_(9O=zs5<^6ٵl.? p&*{"ᲄb7$<އa5R WgA{|K^1:n8+з0,69Ċ l}O^Uӵ OOw5e+b0O̿0} Y|E> sE3ʵMRRp<)OMpH7{w[\)r^A}Eb!sXF:x]IE bh WGO^Xmm@6;\ΪeI/k/jWnݧ%j/]9<+Apn[e362|һg= *aK#GJÆ-JFA{dfgD6(9L9 r h<@5,Pk`^<2nBl79EĠR/LsGr)Mqps^mQݓKgŔyY׳ Vwlfy`1h͕t5h.gFqlˉ9?} 1D+BQ !";*KCbҽ d5xfa|ͮpđdAP㗼YD osww|Nw&J"Ne}\ Kӎs`)QGR/;{t&R-7m6zTyf( -Ҟ>H| uYK5x}%C#pPIJLR)ed=.؜7>Cx] m׫4ߌg1şW&=L4x1ʂΟr#/r\tES) ?h}AFf2\'X +t@)LpUc |i="oS Uo:;CDub5)g&l~p֖B%& FX'jp<^ ul\{wqf[i]ӫEr~Q! 54PK ςl <+e T:tF7_q+=DEt9NX( dZul^SfTb}4rO3(|)I,P; lJlV"$ibϨEӆK~>oO VK,j+V}9UBBOS-sϽfX*El1c{K. >Gg#-s7_*Z^M%0+xx`/g wb.9:KdIfzgM^_dZMyNJ١g߸DD+ɓ_ن ^9c .b`@rBj~oYr;T'Ói 0_8An!PԐL6Dȣ'T~3^]ѝ rAC_db*䏙9a9PaC 䏦7?lZI8^ҁIșcI@_+xEqǍ=wV꒚ip0Ӧd}ٍF@x6*;K -uO>U2wIA(NS9k|~ȅ׌G3'rBi'|lR*QDÉlH -ٌZD^ bk:\ ~)=9bU* Z7SfȮ[5qk|6BO,|7R{jϭao;l 4S[p='L9`)y>=P1 63pz!#ec|ۇtrN6䈋v$ ;`I"I) 9);r1{Wofwxi +/(k8YB;n: tq)+#.ڨ|AÃE?`Uc|yKzv6,IW" (_ʿonRpG;}3jT=]&4jay%V&C+dI$;=ZM-\ n`\BUÌo =ƒR'> _۔ 锠,5˶iZ is8tvEl\ܠ DRrQ2;Ź8*<Y#"Đ?P^A 3!19Xz)95 q|%mn!OqWWorxVFS?qc A?:4|reW)=C86XH#f"hk\~ݛ<~ֽ2Z_א߬lHOAVdӡɍj❡7'wtfPe!:NC\%ʼVn|$@ǧOwJ > 'pv=EZ'S &~Hr>1${ѕZ8Q_aΨa_+gƬj;T=Eث3g*[gef:Yq;Sbfeִ`()a^ ]R6L/U7z [Hp2Jm~8׮9{WgK0sP6U/h/V3hW50tv(Kʡ i6%Iɺ}9reߞ7fx:ݠiK 1`No \Ew!p3RFNN44j*(nvX {=vgc绱o@PhŒjub(ӳbo;PZJp(; qUZrXYG6E8]5u|Ltğy4=Kp^uZ>2/z&}-[}τsb̪*4ހ'0խS"Ako"E-^X+$xAA=3Ms>lYVx#(;YIlcR产'1| '/W9}Ro~xH,A{w9';~]!fUQ`;F]TS.ūՀ`۹j,r4^m@FUHlni&1/"'QfT-sl q=D~*WԍI쁣/|yfO3 hINx@|r*48DIm(j{}!p`-'ATc}0vDA|5vWI{e AiWqwAG4dG.*dp@tzxߒ/zj~Mt; 1ˠA)h"f&`tdbҩkfV򛙰,k{WБ;o ,`tX<|L}|!$qKk%~5=Gn-/c 4ɡ—6ĻNȸ':ޥ5&G6ꌎXH8Ҙ1ܙq;m1/= EGUF/287S^'BCv?nԻ1 x߻,D$ړ$ h-y>n^h1~AD>Q*-0ٻtXPK 0f*$AI3mAdaM"B!)Ie,#R fU[$Ȯ@ /cNEVoD\e SP'(1",bioskw7IѧTœbL÷H +]3&ʛ#W]Tk<ٻ:ײta\PNiaYLD]UHvu-Iw,/X[4"[D }4ИE-=rHa*Vof< t~Ժ} @7/xa r1M U\( QLA>PpFHrUK 'kfH§h2R`8hhh<|yN! Ifh嶻%RUeJQFnSv9Ě4=^.@T{=It(97pEg-3us 'CEGgDO" jp}Qlʗۑ\V ax!6K"%ED`I Wiu`m>^nڿ }T:Vjb[Ƽ;3( ]ar ]9˺"DK5G;cQX Ǟ}dAQ|4r~~Í{ge9)~^0.%2AI!R*GPac>A|JwFQT@݉QSJ*àx;k_Jڢ(ӮZ/ VHUԚ_߳^VuI\S^xY*G@mFDM$ۗeZKE-uN)B,Ѷd]YbJ'Zes`B,!#6oJ1-Lm"hs#ͻ۶|HӲ!'m^ha[r3r:` 7 $d^z7?]B^N3^*%Jsm Ȋ:Bvc)j%[T(""4Vvgr;ZxP46 G&ɹ˔Ko07~\j!@& 3G"})7 ߘ$&C*76>7$04sԗ Nû7eUf/;K)C"=m9ZdOuWB>%k!A#CT[ ߓ_(of`?sH #X}'_ mNqDˊ\ڈt쬋aye?5(չ]# eAɄ'c-}@(7d~%+۪C%Eg X޿ƅAc˘rfZ?SgMCy6,7ѳE-:r! 9w̪>CBFp-g?PF%uD| wrhh&Rt;z}B-c=6[݄aX{A}*2&˛H|t.WmU5<{j ||*$dԬ5E8ZMGJ%G}v? 0 %[lѸn[NCهv25Ү0ٷSeb zR#t 8h|t}a3rFo`p,asLߵ"~Dm6=BBivi9 OTxq!ˈjl?v"nt;5a<2iBLvK(HT}A=ט8`E5'``'0##)o+66V*DG)r=/MJKR<:?!1sm7®dR<6 "iX?% YY9uR[Yv 3?]BIȺ6@(wH|$(n"59׆='  d|c~QUPBiۨf<%GChuR\B_66 y8_=ʤdA;}yBe:Ͳe }^; ^Brl.y!t2>d,1a2y*-C{KvЩzJD*NXjśEqGq%B)GT}pe)2@73I:wri]zm'b8B]yöz!p\r^k8Vz=ia Ԑ9 m?Bg㍌3GZv M . .S;H@ZWbuXnCrr\^[DN %u:{Dt]>?A5\#5el~h[Q|U6jzr\My%߷mA9Lxъ ݵ@3OLR!Z`R1A7Bjcm!a J<`苊Vz})$NWP?E C>k\˙5ӶNp)~Ss'"En6)7> ;kVK.,(~q9V<0o ׌al,!aЧ TOMnb^+ ªk5WLڀ*O¿ k语N4y 2 e78Hj 7lrmHcb Lx@*ԣ ]*S+NYV}q.U6~Nu)Ίk=f{$vњo0/PPiw I?rX>o$鉨{U b~{4RW~U\ )Dg|"]]~T49-)^ٽ{-^ *k.iS(F7m$o DXW#7o3¡Vd P%tYVU˺F|׌\$b΍9wZxd4ڭIau x&Ջ=fx^JZGJGM p%̄4φƄ |tlOXK%e!" VwhFDIB&efF em?Z |?~25@4` tp:ru9N#>kJyL/l-Mۓ>vΪi7.v0%B_ 44`6+ |ѝMVet<.] f{FȔfP-ˮӡN,:yjo0׉)u>C\h$f.H$gd)ėX_M0hrB MF)h,Γ$ Voc.ihW5)|g{ċm{G+!w +3]~"> .T6L4ĥϝKPr"ڳ7"xV$= K{`t8ȡ ktJ\.(9i60F+Sp A f}FO@BA: L@+)?Z ahTtq W~Uu.=u-i&p͠66S>~Ie6ah1U6Rg[hȣpB'[H ļj5ߪoLv]VP5Qx",xX)0E8^i|ht_zu:7:*Pڅܓ-u_\b2WN0b@~)`R"e9;͋V\ރRGh: K9 C, W! 1).8Ik IV, dR'<c֠ytWOKO*viSŠGibKxR+.h03;yel\>Car9;Q"}KLʛM2e )' 0,Sڱ)? ^a0Og. 60$\4 nN@yPF(ߞ*\ʎA2;B>w}#$n5܆4L">ɍrE{kFrְ;)oc6_1s!H(d+Ʒ$V~ Fƨ$.ɚ\{9qi!6DNa?5;]T+k~pYǒnAMǞ3P^N%.r~?#n!fɫ pc&OBAOI'r%=bn(.ٖVwOK kNDԳ)Zd&[?W``V/.'$Y4(Rc~ٸa Xz7u .K<3Ϣ4 IL;P/$u;Ǝ8;zoȖqeW AKQ16.EB$̶ŭwk!=ܣ:/w0#yq1uڟ.̋~sf됌ق0N_J2ڑZ.]X]U?%&eDJil hkwR+e Vcٶ{ {C'( |WiЃa4E׬ Ԫ= ^UƁf$]4\ 5قڧMcJtƚ>;q+g7`>oE0Ɵ 5h' }9RPS_mۧ'jr"̜ NG۵/#y`-QjRE*}w~ےz0?4j\S`˅X]pcIluȅCg-2V yYm]n񫃫ϩyܛ$PeW`ԥEk!.BW" {?EFEwOj<&;UŐuhN Uhmt!ksʩ TN{S(AN:,^f6-Z !grܤZY<*,2$y;ճ;MDVh2:hw:"9->,nN LuABtDaLVUe;^LĦXR-TԤP|i/bu,m9e;]5vRx(ڸg(# Bo}TjgAGK#l#d\.~LFB'"X, ,Ն/s;r L;#kIxBkaJ0B:6 ;LؚaI<т8oT(X]8|4>!gѓҵY/ :GYlAufejeKh§ <|gHxaM7+7̧uy詁Ao[y1Fhi+WݓV\/X0fmy2d3?AW3aq;#A@ǚ2ٿFu{LC*?#{5[ &]P?-P7_s{A IM}/:̥wPOȉ0KHK9/Oj]ɂ2_\壃t"x1kޒP$z+l@Gd.)o*o q⽻pE.U [?,|Ñ(wJN0[} - r lq~H9ڎ3 FϘ͘Խ]GU@##m8GmB.eYJ%/5+t1X v|\HptN cL{_3 #Km t~E]-d?|~ǃ"3(@H(Yf'`Zcɘ[0ǔSܣUyfïtرK^!ܟ\׼p u_N |jP43pN#P~K`k[lpv/t҅_BʙʥvJ_xePIz׳hxl/Y<UyAnG}gSq3*hVnwIZ *03˥jC"%^S4a[yU Jq]Dg|2tzA@V'&%r+-y\f,o4C_tm|3,8QkH@>( l- ob=iuu\,Jzu`hcYOm吘ӿ[̐N!AĖVܳyR"^E'rվ]O.Hgl,\@a*JP^RkaRDEr+r b::xHZ{c380DǤ`ӷ4p`W*Կ[vZTƆ#{.6X^47@kE+& .W-Խ֬ƏBx+ ! +W 4'#|܍٪FXf]@- \ LkB$q4#?1MNp[ʃ*+S`yh wC^ǥ셴%k%/i5[G\ΒlѱK]nVGL2oI;yZ` f2(2h±uZ*gP.BR2RA=Pgz5 k dCp;ۍLEm NZw=zEP).PZu_38$'yBGmX(ȌzHn^ B.O8իop]JT= k ؗѩR10 q@Go.!5> i#nywhc vߣ#G@ Q@Nw0.Yڤs,b:۞|5CbX3e<~<6Gֺl'$7sʌX8tl{-Kq.u;x+vjBmL}57B.M]Ǿd9;;3i>B1H!je/N2j2H@ޙ7Xrs@ЭOߝ-Nnr5S]Ϗhޗ,uDk),@{S}~ Z0<E)ac Bw-CFČ6o_O;"J&c; $q@aa VrpeFh8=_<9X;=)!2\MX{b Sȸ6tFRs9?3nbc9$wEo7.oeߊ.t,w_ q&\xF-`?oH{OT; *qCs'[na `-]Ji=JgdOg_;Ho׵#k}݊!ה0r2t{CW{PC8b;n!}bCl_ewCxfhDOYȥ$Y +J f®)K d^ZyqH\P ms=ߣILWD:mXm\Oc3.7<|}NKD#z槳я<mĿדD8J;RDݿ/WTl=WP :ׂL\CYqVRG^ a6I\/@ uj P߶]Ʊ-[chCV pE1#y aiEWcvJK2}fjij[^!؁i WQ&ZnSl]br"nZshri#dfN-7>A<:2܌UO> [jBzL{ik(TBo5+!9~R 3Бl6%"Q bbLƾu% e(Z{63%7D5pB`'O! ږ cܷP5?! endstream endobj 122 0 obj << /Length1 1640 /Length2 9242 /Length3 0 /Length 10317 /Filter /FlateDecode >> stream xڍTk6L#]Jww720 0 RH7ttHHJ 4RʇYkv\ְ2*j0$ @YPP ' j ABqYMO;L_0yoS"ta-/(@P (&%(.% O;B :|-wU89##Á ())G:@ 8a v 0rw~E!D¥}||n|'9N 0{`w]pY?FH  7@!`} Fij`؟ @OoA`$@p@=5m>/~@o  r @Mo<8ғ"o)@nn`w}*~~ެ+p@yM`/_!&lN`$@T@@@\R}8; M O7Dxo+(A{'|`%p=AO6à~qzƖvOI+,HJHEA͢U?0GwO?{u'࿹tE pqkQ/g',7oAj^Pn?7 +^^{oCr_&x0'cxA| }Oi7dP V ~\_>RaBb ( @~A`?4 烹#SoT\'k$%7F"w{fP /0_P ޓɾ?v? >rx ާK[F_vB _7Mn}.]t>ۓfnkB4κܰ5ąbX?ʖ*ǹπQ7O ;qSO4ĀCk#4WKPgPݷiry!3>PvOlro^"NlX8h5ǭuh1`hj{@Mxo w9(͹ )Қ`Q Ml%N#TGV߮Iک2wl:63jkD{mqAɏr܇{w&-xYLLrLنUF)wװ(jdZ8"S%$<1 8X 냐ϷHgNOGQp$ÜCN};6&xB@fz~JlKƇ]3VWŻu\>q.`Jٜ<:gL2cq߽:<|dse'o.H˻Ԕ7*N_D6!;3Q Տ[bxOc)N]Scy*ߙsw:8o2W 2(sD-83~䦍q3d'X*JlM0 m˾ޟ(ȏ8^*jZix.I-ʷI×zS2uhdeid[i=Ÿ҆= !#ȴvbXLpf'> 5.RPޅ-&̪(Ajlٟ~ԢGm&?NF:ɒys*%Ҧw\%6W ѩ5BP諤 ߘ|3Y9gYr"]#f<(J}g*5STZa=kN~Źtn SxuW/sRM^9LԮC7Rf0pZ$12Y{eoBY9MGݎ:>V1ʲuqTSi% xfFg7<uA)pdЩiJF9J?nC!ll6 ŘH[Bs.aGU8N/hNe>Icex|ÞS_0KkRհs0ylDV)=^ Ae+%yn;z|D={;W.c6Db Vo8Tk=0\+%*GU=uܚW-_GGϞFӥnoeFG*;z. qʄs]8@- HIya)RZ̅!N5>Y lEص}J $VVِ[ᣃ$ix(QԻlU$Qs Cyђ}s,T5W!/|-mz|TfBrEDkui./ ow5V9Oi)de7uٖJz&qak!fk@3J^h } ݂ P_Ggpb~)2R. Ц?6=E9?iF#5j4KyZKi6HML9wzK}(8onvr2kZt3\Q:_[ obvĢ^SXY?#KYD>ks_@5h7ϷpNס̷SEŷU.Kx]tqε}D\$/er^#aKi0ԉml\QCn/ "SM"h01vYzJv=DW@=ѦkIyAٮC@N`cHh"^JV6'>mًYٷ0yUxBЃ8ֺjEEu-y9[OņIT->aGToM;\}R}Cύ-G}TTrIgZ$ɏ-UX%ig;s.1 )~8|%l,-b#4Q2E eoK;NmfWï$`u^s]bLS2*c<ԗKp2B"W#/NF?%cʿv &$W^oY|QW㽆ϰOs0nV̜y! 4eC9"ZLpsgQ w!-J7DhQ:Ig5V@*W3.a@=/>FhCJI٣-BNGh/oʉח'L'f,k0cNg oy)u+6d%DeNs`6-3ֿYBJPk&x[]D= 3&yP_1Ш!DU?BH)0uq3ed3(͌"nkA%`Y Fn(`h_D?~Z#ƶf(N8 - g!d"'xb 1 oMOmg.V+GsWb%RxA祋726[yF6Gl5(($VӠr6K\D(탷WLN=ؘP[wҟ$M,MPPdZ) 3.6Gy}Dz~5Y4CnS-qDhed99a4=ipR˸n787L4k9`f@TI0asuO]Ӳu-)w]D8|2GԵzw}{K!q!t2㾒gpz.e?dQ$>J_L;=!R*Ļo! QŢO?HSTI+b\LZ l]ܐGV= )]&7&#n#mBʍ0Nj|  g\EayՋ>C%B5 L)@rvG͈iŨZ:X/vi%<sGiQyZɷ GDr-!k?r2"OF4&85{uϖ7ojl9rDS`;|]5!ZH:M 3-~\&++4lHg"xP 霆vJ'|=AptC}Cie_fw. ŞsHJ cn5Cgހ'~|s"deW mxO.snoG"" $ $eqxk0&Ɨ>VJNC7wKVn|K2x0|vNrZWhnz*i6%P UR1́;O7V:^lk 7OՅ%eZJ$9OJĤjp79nYmX[P.EW"Rg-LAL6naV})|x51>;nXIVR7ơވnHt\GD3&W۫m:E'Fa3-&uV[ub,#ه K=܅|{'-3*ģ7 `{]>Cw (X**zlŀ21>mL Ѣt(S|KxổS[׋sj^[̥e熦:1#U>`ڂo|v|'ȾT\;݋D3? 6[yrLIb$O%+Cl^֝ij~zS$œ''?ecqR)u&[Nٛ1~*/R 8&vAWa*p]eq)܁1lg]<:tJd7eVV/LӴrSEcf2 7=5]gO, aδW~]*R+ O>t֯=73Of@;$%k8sM=ՌMCٟ--?L}yaAJ.fScKWsD)> _ݸI0ͷTQw:cy8ϏoWe󷟳Z~ކ$Җ p0GՓ|ʪ$)~ύ=Z:(n'bDKۤ 8-snzat~zxgP|]W y^2SٯK2<2OXM1}ѧ+['ڢyC&_ ֡]~%`8瓦^%juR XL31Yt҈5ET?e܍8L2i&F@]f<y Oy3  v 8Di[]h/EWKh}QrA&*GKo1Ю$%S({-elُXABOG Y;%W.K(34Sڙ?,2@:k:]۠ [i&EaVX YtcDq'K9{.YLRǾYVZ:;Qy\ŋddºMSrS=7 ;!b'8h"vtԱP+ wuc\v30I[.I( )W΋JEʞ6DoWRn lEL"EWr'ekޡ{n nn()\j U~C`iRYni>j/%wTfF(suE8։?Rh߹>n!`QFUyrI7`ztY t4^S njֺd4gm|@_Hpsη-L ߵ_܆)/%~.wRHK)ձ>%%Σ]&"iQ- VB3Ǝp5~?.S!.kW K\m~I"==DGBۧN)rt-ԥœU-g8v 4;ۖY3.c|*Vk9؜J\SZy,% PY!FX(ܝZUkMF @d=99|Q*ggP;dGڭю/X!fX;s6ڤ$;"1*M2J4"( A%զ*~k.Ję hXYa "n?>^Z^yovRk:lc3nx"[݀qc %r;qfӗe %9mwr;u~MPbIxuEwz$rc!ޥG4pGX8lƆyfϪs/ensl4&&4Vw@a?owf?u{Fإm5O {z}>`#Fc9AQ.I3VDtz/Qkx?_/w;!?pWsn,|>u78R2551 3pwf`F S< NT*~_˄%bM'8RoDP3 !eoT@"QE󕚱lP3|̬WXlsX3ڑK=sĹ >bqAS|4҂6E'=[?=d9Fc2/]`2VŏJ3jΫ %z1J&a)1kd[e/p$QlZ"!0K"ӁKŗִC. Y]sz)i'f0<@N_1} c)g̫H]a3ab_!В-.0  ȋ4B13 P,RU8hZۂN&Qmn}@?AsϣХZg ~G4OKxRvFA <( +tVG/]xAZ~bv-Z}X/h\]j!;Q+A"h p V@^y<+.Cgi^G# A /i7'5a3KedmnpH:GkP F뵱O+ ;M鋁*A{xlNMOP+߉p2>h@G5c|<~D}fk=cZ3nl2| ~ˏG^`k 361(B.N]%Ȼ 7WurBIo?t4,wZw$(2[έ@z=N45`~T>HHq=kn`ޒ޵e~/ɸ&քޕ!.p7nWō/x#]-0 ̝p/`%?^E1zg/gAFYN&*NGh| 5Z%nsdqގ,]ۮ3FiU_ hዚ-ADR:X5χUtiJcVvλZ9ECk?FsCCI{Z~5 Iv !6B>WvW c>>̾>E?)Z/.gR*>u>Zwhc8),7 iR=2]Κ߭|)^$fD<_\U"ſ TO>r0}V> stream xڍPMpw 6Hp a`\`\ݻ{5U|}N)iT5X,``iԅ (PR\l@ ':=&^ Ae 6syI<)yW;csHA,JlyN/stXY^0S`% pqrx[߄\`+ObtNi8ߟ= jˮ,)ga/V..+'wU3_UqZtJ)gk7K4`?3np?O.;m iW;? ?j3{_O34J-ϝU[@\W+bbP+, [B\@˟rKfUaΐ߯ fl^秉C~ZN),~o'k':i8yx^OhclP˓ =% I/]+MOd!~;oyҁ`vO GOvS n/[ >%>UbrCJJʄgr ?sZ>=WgRMN2:2}RJ ቗^;I ?G6 q<ȧ0o}UD<\xi~烌tD-jw`=N&{5 XWzP~ b5 l[MH QчCc>UTOW:q=<=U)Ŝ,\oɼm8˩M#MPO[߿Tڸ/T_57/:Ѭ]OuO'9"*{` Q>!rSgfcze;R``$O=.m9*{rVzս&͚j>TAMKd׷x rpųX(:4j3FՏh3Zy^e"BQ!4d:8=: UUEKZѢƇ:_ةfI_X8ap-Ɲtr>UYZ9-߭~P9E6F-2.CX;1xȏr[`'mD&?_%nBF:w*H$V 5<q&M\A%c4NPk"ߴs(-n Su'@х[9yK?ѻk~?THU߼ƛuv%WaWv'(Yf,5.X?K k&>ŕBL9HÇh+]Qq޳f|/D3RJ.1T3\7iFHmsja&]߾=Ao%|0|є{Qo`.FZ\JGcR8U*g&g 07L:#cfdZ/qbr t4Vz7\hNGQX&qt=yJށSWxmXTt9xe{;u5_u*UZR|l5Xr:@>G;huUW#d/7LVC/ڕ/$#VKf"Fz~vNzطeT:QGjb_r_ jGV%!{4j7;mT3brm j'w+C_C[Q/^#4e`/b8T"[Eo%#[6&0uXƚ-+F:B,)o>_NfڃDeB@@^^'J#RPQ5rTA0ݶg`!2qzvՖJz6DH1B.S6)z]օ]Wj[x={! J5iɭU]5rQY,w cr__|2:lڴJz|S:v l~ȑ!Aa;w]mlSÜy:eJ6 F-3U rjf5n[bct=T8NG"鉢a+WFp7)m=à깩rDžJ1R<?\L_^@81F@21;V:Y:,^kq^zuWwK.xQ(٨tI!\. GQr1ћp̋oHVڃm̛'LKhG}H?RExuhZ<&92ȹbwV1{xX{B?z,_%1uTGCA[٭BN\!7b7Q^Yd"A&[A)- /0.F{,i,w1ة7$]N8U60[DdI:`7u*]źZwpAw™;ׄfֺ: Ի~*z eKo'R잧%6T!y{|kB0\z̅'y I*><Tv#R;C}b)ndE/GHݾi8\"Iɶ ;m\;w9rG:@:UNCbN46ZGpWXթFnVgt)V8ЦT|OHƺh\!Οd%GAoH|oW"c@>"]v!r Cd>>z$O}ȊfYMy hX2_Qu*ʀ]"_E&FPWсk9.óXr-#ǒs2rhQ SOg G=N4nUlgCE5[x^t5dⵃ|KG';,Ȕeܰ܄ʴ); zڇ"%?nC8d31N"E^D(M fܰn@oFڌCdvMaIeZ Ԉ}x&Φ i}! Fq2(2R| BOi&vÑɫK&Y p^/Dr}1|5?QQSBGeP}@jUxbyY=9XpL":QN`o? fA;HH򲄉eTm,Zj,$) U! %\ټWV%=e/ =Ԑ t_ Dgu oKk{dl#Jbl]z+NgKAe!gi^pQSf>BJ}y?ywpm|Z=o=s'>`Lt2jLwsݑ޶vL`^$ )l3{uщF묒1Pb^f+{5X9pYyyOfagAWWP>rT jOgg0vq } Y}DUbn1gHeU$ CF=$Mk- >Jk)~ ^xБ8 an壿 ;[Mxm5T`?exTVo4/Z1)Bݫ{OCO *'k3*6N:KV~713^z<ԆkSB7!r3˾)HeMYt}^+h]EJ]/GH%XB:ջ`D0*0$tGW%(V;7M)7Ljcy:^ѣ(h}q`Ӑj/V\z9>>92#D9 RYic^=^\a}-{d#X- 3*=83npKVonǬ-EW/.x/òB b뽒7#Z5uj'F>8I-leX>Nĕ׭oOnM;q/bN͵$_d 8ͨpQ#:vq Pmp=_٫$LC̾OE Oc1dCԞy"uܞ9l!aX7ƽ8\jM)M-~b(Yћ!BN]oymz$$$uBIb98V V /]2rxw@k_2Rq&Z,t}cKjCm)pZ;d|Q:17,2DB?ANezAܶO#4de|: k_~X|Zguu@ .W>tɵ肅uv&i>ϟ)U;}S{w,xhz$+[5spsы74>>mID10dG: 7xN> ,Meu4 8*8Tڌ& ϡpżA[%yh\̑ T@yЭ By?Ow+LS6(T'&NcY[hw9׎I ^5/x͹Ѿ}U6;xȷD]#ImwSIv2ii*ʴy\dͪI^0LuP7!qL;֬e7]N@NRa RwD?(mE[8I}9-x~iKB'sѯ%[kX2%p,vk.E6jZPx3}pY趃oM%ƛ;dg{TA\>Ed>A@ γ¶6YKdOO3H,{֑+{,R^ݫ8{Pv^t@ֻ b<o(3\]L +lO#oc#=x!Ȫ~<\5L,%.u:"e4KxL΍0II#U Leޔ!QXɖK""zM];rGU=9|2;ɧ+ D;M @P"`o"*4a N^W{4peȶ!`g[ G*c9rz18@ Mꖈt8 GÈ%-,C,Hoph\pJUY`^UPݰvV)(2ˢQ[J"<; ~&-*]mbZӟe ޣ(*t=mV xIќ!`y0A]1D񍓏tF x,pNbN}No'#oÜdӋd1iڛ0-sqJwJsܦh@̬mniΔkW2yn>mY 7/#FqV,ǐ#[S=;rF\˟$l#v*z%1̥r&j1sLidȚ*psXe0C;8%2G` Y< lyX8j OŬ} 9a33X_qT-|YmlkaU]9"\t7_Y#HU1zyZiki>H%je Sm >:3H>/58/]e8 rOL6BVLɭ ݬØ gǩ61Ck~*|}Bٻ|r]\ɸqō..gY&PO)T6gI`v$KeoKAlu*C4EDT=]\̣b@īs*-8wT>P2N˘v'#-^GGW6'(vHQp \8-de'pkwasj xHuf, rhl nD((,ߑl&I_P'o"' ki7ֺ;DiL2zEeKgooMf p$hrj39C;Lĸ̥3ёG#eacg-W",.Acm]-Ř2GۋRU01G|O/"'Dm8[8rDL ״6c1H':m} I%i\~WGt\d܆ǟPt^깁!{q9QIJ TfJ?CDqW{IQ#G^Ea%totFqh$XyE=RX>D޸omNs2ۨ9qq@ Uh|;VLJG f~XŤִVUe6cRlˇϤgs(!MjMQ id-+9ԓSVo!E< [=Abq/ӒÓ}F\v-ڿ}Ŭ2(uRl;4RD#Y`_˦TpYS@N@}g^{S9#ˢIi:RDzY6U/i#fTy/F;O}Hdvpk1fy/,_ζZeqʹYϑF&cNrP@s֊ٿJ :KolYp)Dr /'&f$%w'Cb(oҍKb֞5c5䱍g}7|*3t Rxe3c/a޵ PBgL,M-?65T>èVFpJECnK"s ͽ=춋iL {D[.ĸ nZ?ퟔȭL)"0tf%P-3XI?hOt^^KKO4=M}tY*yYa \%j Ap_/)*Q}e$0^ïc=:ʮdq@_ + }&lx}kǴ3jss7@Ϯbgl %췏E"y38藢YC/ $y$ H;(bg@WW{#k9 z=Xl7=5n9dq, /,1O`C@S*aKФ^T8򵚏ViH}4&Z 6&(i\Geo|-={RIt/k +Frs〸]II=cy {(wz~7Mʴd!No[^U5j0LmDZ.;6k*maoШ63mD7[}fSQZi( ~$hԚ ""sߨt eӗ.Y` /:YfC8ZD~@!K mMh"}oavJ\cff)r?oԕ{49A$sjc+]E+3' E3~n؊s=I KM +4Cib7_"Ќ^Ƶۈ#k| Rv:ǂ["C _+;HRmBwAzoN9نշ3b Q\3&﬉(nvzdK"JH-a3 ػ|)ߜjZs| C?iw^ _n|P+~\y*i{9$ij=>Ƭ-Jj3!0>-nh$3?K+O|HnqO;ʂdѯqA@fJRr+c^ /̴SƎ<E/T8gKL{*1 nT,oÈ8qY^~kwb$A⏃SԚk2V*n ɩfg TWh%ȹ6Qڱ5r ˮ{O/+q9t1 e>tx^Ԥ9a4W5iNWE_#c#ZlcL(Xͮn~(WOKfĴS0w\B$ bRNk} ;Y!﮴>Eݙ9۠!^rJx 4މ)u uG d`0?zY,#&']:bP0҄/[k걵^i+> 7gXf+hJx$+NZXz{ŕ7U\CXlTnzM/8USB\Sw|Kid q4rPn1 ,k+s܋!d- endstream endobj 126 0 obj << /Length1 1375 /Length2 6045 /Length3 0 /Length 6997 /Filter /FlateDecode >> stream xڍvTl7!R$ ҈FlF)4HtKI !%H0}?}gl3 =DDA:H\#h_RR G"dT htD%eEdA $/C,@4(wYuDed\p;Aà،v!E# vp4 `EA=B\&`p_bC`.p;(uDC=C6@ X/Aߣ ;޿!vvHW7p8]]5maZA6X)PS@F .;;dU+FONWZHo_g8w n"'VD# @R;c G)[? p ;@?(o?o{` u#H+:un0a' dŖ=?1R~l`!>!Q@HL*.@ {g=*@#N_{}gH,b$~ ݿ?yVZ+o=h,uX /@ឮ!X(!]=D8J ׃`A/oP=$ A`/UvGKTE!KLBbWIE4A0@DDc]H@=SNJHD՟?Ryzx` :uj(jG:;TrV->"G'(2+tD)cš*߱N;d拀_։cͤ3{Gwj{XIn )n\87ks=)riϼ}j{J燞MoTJj*5)}5Ib%CӇr3kvDXeI2#1T7#+1f&ôqBO\뛂 BT›b/+fDyYzi&WY{pުl~Nwn9([Lq'gF$M0I *u<aoTď~<u4(|I TRlaayꞜEl0<;C[g ;<"MԇT ]υ,[T/ͩKt./5'i~$AĶ@P8Ep`:* jq';S㤀=w9J'GULc,ajf ]\B}2x-RJ)=y;VD;;Eag˜<%^O-k@&c*;W8N7%YMͫYe_:~gY6>zy)Y)]IΣEje+q`/}hCe&)'ZI}:aϽfBX5LkML V'K)}_;VF?`X >X&lUyPU|i{Kt,zIV뽍{]K_hW.Il.v9c]o)Bh9TQ7vdp*f УkC,u6fLYD7LdZKE0'a~<%RO܂17)P}]pP5.).DFXgó, 'PcUٮ*sWՅtqW[IQ,Fɏ7C6kcNX2][ d?}t=WQh$DԂm s%hҡ\Mqf2VܛQGR{IN'cEZmi5>O/-o͑|!4wxA*hqŶS򼳙0s}EIqcdh e~e:.#O)Sj۠,vlxkbN Ǚya'sO8~=突0d@syem 1JmUI-8i89Ax 7;iW40<N{kb7wC A6HX0£02/=f,9*9Jl)ryG3$ #Yu4E֌}޲ΩrMhqG,ИKK#}RCwyj$EyG1TAlhsrMᘹ)Wz[/:=Fd7RPe<~8G$ ;WA ҵP~I+ݽRS 6 P|`;8t]U#Ĕd@>.(ږ!y%wO}-w'9δ|_#hk^#~9I@qMlXu>st;y*mz!w/c=A5a;-Y=&+ؘJW1BOQc;>JUo'OD?2q H+L*"4Bv%4dn[ G''s ['pNZ?T[}^z\Ԭ zCnSHqhpԦ\f6lb,׵$YcdBۖLjd&c)>wk٠Ki`"To`|0;/ݺܗi*mr^~Evt g,2y:ܕzңlA7PIx^~b83FcNdY*Se%vUPI*لU).G >p٢ >/3vϿ" ] &+.VjTB/**DQ*qb]}N*x?(?jU|; j,c;)ZV o_}!^簋k!+̨zRK^!/lvmFK$MS6k=J#Dj:͆yQ' VA˂I'@;F HT5IS<' /~:zvd_D7ݎCތI W3WMS,hm[Y7;iuiB"›e|Ѥ,p&2rNtG$ <HV9/c. M_ Q=NE 6eZ{>Oi%8Iha 3 3谨15ŵ7n 7솮VKmsX.ÞBe] K>|]ZGQ/hC|C8JJ^!iگY:N"LdUH'7> saiMQسukSH )D~ڪIbıMf'n#M;eVn5lv$ rl*Q2ZVpOK0%4rSzO8)-h0]Oq,[\X}px).4 m{57zDĔ$j@W6,%vq㝴{zoczsC6% @\̱a$XPCRnkX_x*%ZKE@u,(NO%i`A y)/ a&\aH)I{*ێ ju*tǺ nƆ3&*hn6X6-fm̗A6}3'~BC.t;Y4=7iAғɇ%Vl*lJ3 dWהDO(Jf'eTGu;a(JSzn g^3) 2Z9zw_P{}|u+i$z[*pPU=!Sۃ%e͓;@؈5֘ 'iGM0e hBmman9F[~֚>#֥=a8P,|ؕԌ0:3)Js>qI1\Z|xTۻT "¥88sQZ2{s5aXɆC˷֑Ht>(d7x*!vHgdގF ] xdJLJugG{8ƯE=};7Pfܥ3xi&9{śpN%l~s滊NJ/;|ί^`9=Eg;–Fa Go. jLNqLdvg:ҿ102p政S`*&jU=@cዦD6&jk^7D{FX ҏJAL^*emRږϚrze X:ϧm_tuKt##z4_FWr0ǟ162S؟yTCZӀ!qiέxǚ;5;?UCBi,;{]l1MKM^G쬫VToC%so;.~l`R_ }!,HjUSe})nP譫gmVs~RrWp]h~MQPyӳ]L<*N=kVƇS:g^/)"ނ33\ O"H`_ _: ju'1R_*L6b>Kbg۶GZ ݦ) k+A_o^Xu"f-53WTɝq& enBs FExmjF#2ZFmmcT<$NEC+nр[ ". AUoaFRz6a@6~>Qޫ: bI>LmTȰ07h٬=F߇p!JPI]i?=-MɑT7cZ=<7%XT~6}1iڤ ;BM.:eEB@gª$:s-) 2) ˟;x$4qFhm;E,vB]o { xCZ> yl wӓ010?Х,G:g S,D?~S]3RrǛ/f#oǑ/Եm|֤yQĀf!v9qv^FV9fB'RfԷjbvH,{qnصf"GS5Ӵ}!DZB#͇Y~;6 %B( -M6ǍOҕU8Pz-—o{Ie:(&?sGl ږVU;E]>=cu_T&'luItV62@ΏɌѸT֎!&_([lmwdu/PuGil.w]ffD8>h ٥RJtjѤ endstream endobj 128 0 obj << /Length1 1630 /Length2 8301 /Length3 0 /Length 9369 /Filter /FlateDecode >> stream xڍwTl?%1K`4HA[j5JQRZQDNQNo<>TD@D `Y4 f?O(@d~XL UG:@8"% b`>@5h/ A^UW{@~"##- `PPGboA=h/rn~~^ P``(+qU"܀&p_O+] (h6EB}@,QX3h4PCAgiQ_m H/(*r <@C =Q ?a KCOVwP1|a>/?_Q_ A`rVE#p/W|j8 [`zЁ(gW ^ s`0XZ {A07/f^$? `S!\/0$- 3t"Pca2> m0v @_';l9Qn.HJ[FwQ** FDL("& B bb@i!^?mrAe[g\ } =`I0 &G7X('W?룱oUKwF#bw@W ǨZ0O nEzP"08V0'V @ی $x'+!!` ()G%J`ŀ i3/BD`q,g)JAc G(شB>4{M?G Shح{5Rf Y]|, i? Jx{wP9jvY@i(1͸,!dd 09~ٖrM7u3 P; m:7)hitWGO%=$dh{t8 ~(MN l;Ic dKBL{}QY$ac' ^5gIć*6.dyeA %#'rIOKVҌ9 aI+pzm+F5"4iCAAJsc36:Rn{/ݔ_Rq&V ;lk0*!n":gHl>LsZxOkZMJzPʬV?}Y H̢Q_w{.Dh<39Ž5C~5}!ljI!fZR3"[ eo+MvVmID=X8jFvB7x2? awߚG+L▷3xý;2:$U^'),(#R>NbH)G7/V=k=&^Fv+ yA[zŮ?F uU] FT^|ڽj O`n`FKܟ w3DT xC|M (T:6l־Ϫ;A4WAHNB9 {]O|࣪g\7)5 W{U%ֽb9c=%UP]ۊHѴ&2\ ctVQyp8eZ])+ڥo\f I~ܦbGRuv]«8cpֹ.o)2Kwq0Z"uJeg9z0Xk%% bnEbP2=x`g Slc!1L^@N`?hXh= -niS -u1N ]جOs:@{p`wFwWKYEe׮;f&@hL oZ}ѻ#s|9ZR=ՉG(&x*M"EEAޙ^@H+Ɗ=B7e?QR@$Κ q۶[7$wWnriF|!e@e؊ydTS2 >GɿxGEhY`4 = j*n5w^s*o/]į0o\k[y{C8k/?0SS{/C2X؜yyS`e^Az=$gޕnһ6wϦ,u)25:Zd}/+/x;Ї=YGkv-%,3^g?B/Cϵlkkѭ: 2Ar59eHIiN+qz͇ lXх?tS`opHf$ L :D$Nr> 1t2]PD]o붘!(N%?󫀉%7[}ILo78NKz<NF Ewbc~ OX?MwEw*XC+iS/=_h\!囂y׭qOD]q7MU&sd~L"4%<#~jCMA˃vxUOPnLhM:iqHɻ8!&Za|hA_̉ޢ+Ւ^-ɾ~vi1m1At+oM$_=M7̛3(~MVCeiKy Gw<^ # ^i2|vw3CT^yP_Kʍ^V}޲vOKSq֒EX so^DM\TT۠B*u^ꬿmdl7|:e%<{ )&!1.=#P5/rm빲Ğt3 C柚1SArRK~oL ; c94*~”b8hmj:}ye-1&# 1ps,ٹPTw'f@(EgKgXZaB3?DIyugw>[NSRLoY\#K]7*:7y@Wf W Eag=0'c'>0D7V6ObPveOAkLF_DWF'rN`r-֌/{K cxӬfSWoz);yB 1 U'LeGw%vw$i--z1H`ڋ:i1֊s ԇ =fF @C%,= b)wX^'1x(:i( i~5EO@"Թ#eC0q*s{~`A|<8V&c)+k 1v4`p]|mFԷFvtxp^Jp°ÈafpQMeM): +='ݍBqCS-Z%f߁T4jmLN{R'ɼfJ&f?/v\RM H-!=`~t]">I_՚1Vtss&WBPuz4HអS-=I/T9*cVxj.(,~(^wKnPnWr5ݾQ_$+'Wd$^qY- u~uyzȴJ(%V!8W!COf@i) Re<bspHK[iJҟ a'v$xFx7 WyzaKGa0~ʲڴ%˶ T4n $#$Z}ʓi58O*tX0,aXU 8, :J6oe/48z=OnRUFv!|zD #_хТ>6<[$:]ceP#GɊ.2^eț)"R!T=LxTE4)"9Xuy%Hfzm /{höB-Mz> Yq>*ߘ{dT~IIK dH%|u#a87sZ(աY$ʃ!gRCXbe2?b;423kj^/2ͮI~;"QwW|ϝ >˘y_$5ZmHLMTΝ ]wUaQwɿ囖`Vb}/+>bp75B@u@M{:FS]fMR!rޯ KoʕȭBɾD%ƛ`nihA,-o@qN0.Q6WS9<76sqfrR\E:y5ɇ-Gcݛ.`j`\MQ2}F'j />@ lPRmpo$طT=zJa@[T?0^Lj$E?aӌIr/_wͫù)AvhrH RQW^ }'ꆣ3f\LNv›' }B$Knyꊉy;>y:5>]ژrq0 EѧA H ˽f}oݢ`KzDw]jW:*I3`[5\rxm>=6VEښ0hw%skEBr:jE;!4!]Ͷ`ObO) qmǾ. XzѶ{{OOuCy~\pKkaYwGc4>+@u ,9.{9D+ƪej ,v-9QWdQط&ǎӱkt%]h)ĨTAb5]\#w!íޙ'e/&m(07!ӼT}=&^IIE`El`KG_;5IMGCo. QM>:ĺGO&YxziN.07c#L/*['uNLKO.2+ݛX*e-5$3J޵b# u)aCM SK.O``]s4C\ s/vp7J+bYIQ |~@(%g_[1Ca*洑{EJG0W9#wˮI,FT/6^טwB 5[cds9l 49.Ac6SHc*Y)fj՛ARp8{% h&X3.ս߾{8d{s#ޟ&2)H4&EJCrZiǎcK+Z K4=u'j]ﺳ(yT?=L5 /jaIOs}U q7\QT.·i1~-Gbi*>!5 \棒].0)q}]njOf#iGy6:eL1DG|6iHYse[y a 7t`Celql5(ֺғ R 9t'G?xR{DmܫEmFKgl(ʺSFhQ .M AOm= 2s0\.qfLrt6#MҁN1Չq/tv5A{U,W<{:6R/w^kP06'NAGern=kNQH۱+䕨֞,Р^m0eZ!GuO@ye{z~/$fDr9QS7%=455g.L0EFNxQ9K |}x\M:#sT"K1ќҶ 6su%*y;k~Z5W2 3CGBƲM0*S^-^dBU{'guηSdYLN*~)>24zAvd%Fl~5H@> stream xڍ4[6.GA0zщۨQ 3ƌ0:]]=A袗 &y{}k֚g>Y4#!*8WO@m`*(ce5"axW(. DaJ $4`Aa@H@@?D$@ 8UGW '@PBB;@  = 0@@!HG"%=<<@N|G<(qC_%t@N?  H `PC\Z]g/_ 7_ 3l0@WE_D05;u@E^BU>W3ҕ U#0cVNN8W~JP ܽ4ge mvs7CAԕpPߘ @ 660r6Q58#2 ~P[ ] ~>4{'(Cmkwt k `. @,P #0[̯^! @PPP &&w 'mERv? w,JB" "`zo?fsEANPJnHh#P_c_ CݜתAnR4C>PW' "mR_ѯyA Ay  5d6[%&j25lB" kJ#J0|p@EjAlLQAQ5ol v@W`T7 CQοoL99#\ID"QL(v.^ ? 0ԹENP4CBQ<ᨛ + j7F݃C6n..(QB?w# FH; nZ^i7 AͼPxKli/-άT\$rϚ!o^I{F-A99{.ͷ__߶nIEČ](SFXR47Cu[(hBk4ךylǬޞݕoBɨɽ5s)4\DKS♨ Bٖ>[mIvQ |ޭ!2-ӵtj=~cb=OKJ0@ڷJaXn7%=}$T }>yױOc2,Rd`}_ D0zA7)2V2]FAfh Ctc@W!倎b" ޤ-cNB'c.eQd Ê3.'n_?7Ү*e&'1;g,LgH2ck*"b|+F(W´v9±m7Zî|ߍR08^UaoEVImuiݷ\p3XDrDŽU]Yo샴c{m)ju1ɬT% aq"͊F F!+|A Ζ[u:ߜ˞)(HgC$h`7W"8'% Mma,Q$5Uϻ,5W?󶹸["~DW"A׷|Kc}Tfԏ; Fxi(*\ ~n;Pw}`õt_ES:'hqfnD,w7BN>J:Js`.Cr M3zvvP)Mi?Ĩ~=LT{,_fUqxiby1{s^lWM2TrVG(|e]g !;ꄓo3Hxcl+.@>oNI ,FquRC8Tky:5` _r[ VM;Vτ)i}3+pQ#N[إLۖ {ѹb\"/aG>eX~Uc *@4Mb:D~չ.~8г[q$# 䛎 ڭ2x~ͥ9 4B`JT4^?ͺXllDt/>&r}}oVtw,/tX/ǡ xS[ax *}I$tAJʟ+UKΪܿNGb{I&JqUHo>k͐^ҢEHd%:HqPr'uWirhv/׳{ϙ sGL&t/h,j~;޶TܚJJgnb<"_2 ʌu%bk;n wq3+3:{KD؍!qTL뷾ąGm,Sס\HD?P,l ?[je|*%MHHt>|{BWi'0D\UjjO U=z,͔2xQlJ+"BɎ [{%+mލ1,_XI:~@-'܊i\ӵV\^',v?ZW}Cuy36pg/iPrC vdꌄ@o:m0p aoz`}o搛&ilaSwzXȄS h@nFI'ӄe'GLi,7P2NaI[_Q7tNmp;FGyu0N*K"z?MSu41XXqSl%\t} 76rhDo}Gz^hٯ"3WƝlCEts/v=Wt gFvbu"Lo+i{J;XԒOMiy)ѹ f 'yFemiNk+رKy$Bm Nc 6x]+D(9t> M{dֹ##J CƼ-SO_:3*_%}M^)w>Swaҳvqq S$]2O?4c}ؼ:r̡bCYG\L^ށ#iqdK9sGmO*SVV/9*3g .{6n*sWLD}S^f|W 9͗, 'pM*ԃ)4V.m?4ۅ9R4~6*Ar\pW:dʾ*ŪfgN7X?2H98gnƫ &xV_9p!VeZatItmmDqf)3zYrft dj .o0m s@ZRvѰoNQl~Q}jTZ0nEϧ#Ot<JɧT5ccYclM%V^0D=5[oWYbb?)yA#h:Ai gtk.<Ҟ\)>|L-ENBi*QlB\Qe4|JwbmP̸I{"#_F> Z Vܒ C+Rc)J9b6VnP0M $B?wSҴ{82C/ɴ]#AJj{͙D&Q#*GXܟ5&&ev\<+3eC7|~Q9/"$ƫ5$ oЉEf.m(rbjXd%.e4oU\]G$EղY;yXT⠵eh>h?bĢG}p'B7Tcii8?7? !%|/@tD?fCrW-{Ff^3. guW&\taE2`tګTOKǵc7ЬB%n[6il;eE`&MGcpS\uMI.dl=NMazsZfb) 4QX?<db*Do2;"GzR zqIW\ k-wY{ w~ZbA~<ǐ~i ȋξ;{w-";po[vʭv/S{1ʪGé5~ʽfNP\&@ͺcN=i^>6V9!Xm_ixC02gsCYhQ/SqӃiܾ~sIH\XSRY#$fVl'h<#j9*$kaD9,>6##;qb@dFv V&r+ \Pd\d{CBVCʙIǀ _nZ$Sb4uQ*T[ŻUN >ջW:?IhL^DҸF)x]9poS |kKJC}i,oH&՗F41X^Hw)I7^;C0SRY5fn, ;E%Mz]l0L;9 ".т*WD}y0l\>(0Hn|yCi/kR#g\!\iاo)8<6'h]_/|)-_ PuXNM9ӠT;]X.C^ΩXjmKN&L,[:Mk fNK6~\+4X@V}-~g˥eLҔ!"\͎ipȫҾlZ뽣9=2:iݩш^>g#~CyvAͳ1 ѦBqhy-9Ac4'4_0|cQWAg tlOlMn?Ύ슑&O=x#/P_VLϒL_!PjA'DR}U QYK4fʪpiRQL"3]͓YˇY[֥- %5'z8ی1~^F[up'rZ%~iGl}9!W/}sbWIfT  Sy=C2{z =wv'wkYD oUjpbQ6:({P|:_a܊t>jm1J)OV.1LW?G\. W~ȃM|n_\H 95+ѝX␅)MUs zK9s1K:Ϛ;KJ@A}U)> 2|/'ɝcGdѨcZ6?jqOH~ (heT/L-ЁJ1F^_oaLoˆoXy.OHFo~LpdJ)> HqKZ/#qHĺw}DRlAyǒK+ZEoKSEg  ϼIJōE4܃iEYʾPKEvf$ȉt=C0-T óSOЄ fE?Q/!go xqXQ+l1^$%UoٯK ¢n6Hà>I;$E)ZPoyF #&mY秤R7zka A"arZoxpӲoԳ:Z.nT׽nnB~VctҍأO{rUK)> xHڗ.fjy q`Mpees]N҃rhNK~+AW .u𯑂IMu┪w*rlkr2%M$/ǩ>I=K R ;0a!lEeX8ϯǓ[?@TUlVT Ȱå&tu_en!s WNk*] | ﶳgk>d1bVa?JPj} ]6ug#n" <,sْiΕTJ P@lb:LjaYP΅"zYSY5؛㌗8A6~y,d;>Lz.)@؎*b=+~F毵_\r1m;72B0 tյR͎C=WFe&5vfYecOٷ@`MH!i1?8[1lpǘԘJƜ2PBR*_IPK#I46b EM,nPJjk|LjR؎o`rfT8j Y?'eRhuAh㱡,xl,Z󢔸4ŷ84oiGuuuѹF %ic)b$ڷtD0CFtplS*F mqBwy: 9]ѰFɔ;2X8ꩦh3E#;o88R PwwA8.k cUA_;>Jg $u:;n9@&Z_hS%[~ ь> stream xڍvT6Ht Ail IA@$D:V iy}y߳s}s_wd症"T4o,"`01' s>\HĝEPD h AcxHBw$0,/"uPD\9N(-__n+@HJJwV A1 > C{#][4鎠Ep-s`PWÀ3bNGZ!(V0 Lr@_C!?Cw޿!VVHG'0@[  /"A'!.4=+ "wբ0[VF@0څW}Jp s힂&k@#&NpgW_ "fCb`)qqQq \0mayVH5 (ēQ-G(㷐Aq0=H '/7 0aήB¬oY``0+ϓH+`ry&wX014TW I: L0~=0mO y?ܵa &h3$B-N1 !"o~lH@1K&sklK>cSɮj~\G(4E1y>$jHeeѠ~bInaX={N+7%df[ᓗsl?80̄>)Z[/]֋="@Un#3'{TUS`bWp6x)5Kq4QֹZJl)(S 訣&`k/X*4 t&)+IΆ'0ضuzoS%<_O@?Ps-*?$4^R|(sظM*B{6cWH?/ag6lM8$Dzj1Y+Ua+wP$xkj{UvI#2+M^-ړ#5/n}&տ>}U_ !KmEyV*^.?zk ӭwO x=Rc}+RZ=V'&81Bs1ETr]#M% Hd;b|p7?#= NKcc+5$ b_6EN2W'X먥gEWMG}d|2 ), KbڞO?1g3=9*bs-g1Onih=*}ѰqX7ޮbdvgV(UJd)okql87/{FlݩOe*l\:2rd4N7עdْIwТphMSjv"r+IBT:mxxpzVi Ac$U}\)ncTr|tW㘾nRʨ?eP=֭3*20EܷĒuEQv9 bN|B _oVLJf3@ơ yh-M{]( Tcz9)Bމރ#ۗ\ҩ+e|ᲰqQ=k5SκWϱkZП1LBY)F=Bm EES\*.6ϦQ6,7IfYTm. Eg<9O5};iQSEX^eY qE{<|?Vt+`;m"-4 49VB m6{ \{}k[xTK~}Dw=a#.oa^sNL֛,3 \$ivצ-6OG9 hP 2%:rnCt dʁ-H&S1aD`e<1MYG5ɚ _$}i}Ve]iޔzqUijZ`lBr2kP&iRGJu)+^Yz%%{J?49Zs&ś(/aGn_nɄ| uQNtWaIogt t܊6`mtjj̐j7Hh$5BM%g*}%|3=ᅪI];Tw^>V&Zl&U2s*T.7@еuO7[^m^:/i] Yϔ,N 4i%6URVkA::cI{=̶n궛dyW5be.$:/zkMC5 j-˳\UJv%ae(YFф+`FCֲwl*,ܔLsVwyC5{g\Z_]ʔ o(\c}VY 1^GgW'+~6xӅ3Nf2G83J;-,A^~%$qMF^>ՈKܖbȵ&96QNAj"wUipQr$6Wl\ ֮[,my~h7,sr<ױѪv5*Z^]x´6 B+ҒQ|`hx9'VxRa"jXr0rpe}gS((Q|'o`^I7`rN1 /6!BB#=_zĨvz} VO*{@Cޙ'd sƷwI@uEȽO3Whpߟf7p8vM^t|Y6xkjd*qC6E͒հt$vVfLL?fmͯ ؒ >ui@(X"46X Tgu\VT$kť&Wperu@{y 4}"'3es Ka$a!*5>ekC؅wѧFU=9P7:ЄDvTdrʦU|Jmѐ+#,ލy[G}*&UF+Ba~Jx& 2)r`Ǻ}JW0ӏi%vj%VRx](ʥƫj OՌd2!k+%-2+k59t8Lxw3)lոRXW)~O6݃_w"ػ; S6p-^Zhl6Ld~ށLG Oo#ieh׺fHAFXl>f 7 0B$;֧s‡&qޛu߳tx^%RKVsnQ-!hCOIjdok{{oLEnvy΃mO- 5}$rX0'ѐuO@26d/.B\};L.W?I|uq2J&Vr}',#AgN:+&i6Nn$ `$ꑆCe[)ilpPKm#vycsHr&;ĒNq{ qmgR(bU^%,1C?շ)Qy(qDޛ)[++&tn^Q!haxOl&X#m\'ZtD6_EqФD2V_BWlVjDF;5Eclʻ8@J&~?k岜o]ۗEv4I~S<ҧ-1ð0 -~[(A3O\HI$ӏD hܒXw@3U*n`3FBoqtFKó IO$ocǖ[p¡('wґϖ7w9nmՔzY܅SxzJոZX*^K;Kͩ?Hu حZ!$ThoI3X^xiǙTiRjU =/=g*&4B*^mڳtdf_.zG7o&` [Rח4:guBBg ޥeT%0^Wafm;VnBJ̶ (*Sh뼘t|HYJ+o{g"w;Z)ʹe*D\)&L#:9f8ŀ IlB_D35|E<#&=eٛ`UG6;DfL.5ӎx(-#׼bSIDrډ0N> stream xڍT6NtҍH C0tw#t % Hw7HwH |ck}ߚyϳ<̨)e3àN^.10Yp' *8@b@^!1^a1迈0@qaP F +~; eC@@(@ #w:0p7+މ d{p `'0l50@h33@ׁY"\p0 A@`2jtT`߳r_ @fCVK!ʅpC<@;'2͑ߝRZ r9׈ܿ OYj!CNx 䱻sY[(wa ZXف VKABx`V`@GTHH@v@ܿ;y =`K`o%tpg{ sOu$ F^>0Aj﷗HyYv/6ǟ<9y^>CXe4mDt<uWZ0jDn#B>x;OῪD ;@{_R`H@cZ5J RP+9yx'yBY/A`M׷?1@RC`}{_9(fg|B tAʉOPɋ4\P@ ~]+/v!18[" ^_7Ƈ쐃FӿmBkR3[c vf&aA6A Rc ;,S`²n_9KGAcȧ闂5+W73pz,1HM{!%-pS .C~;@CC6VHCEBu֔Xz ZF:s^ZBܯ'&2ԯIHdO*lOJba=F0^V3RT;o*.:d.ȍ7f`¾ Uo ÏR(YfrkdXj;)-d0yvZ"?9{:d\բ$}خiFilerLx4z8.25X.mܰJ>boř#06Y z)1y6*ߗ0>fK |hBI dV4H!0?HLg ؓA8 G+XMﭟ, u17Y|1bœB |_s!l5:3=ePlt<3f w/ZU'*ܬ6@!_pAً/c&:eW~*ͅs} R^Iby~z-[[Ew@"x4(!-iQ]T 1UYW4@b#]?J^~q!k^t[sОTz:ՈYJ=+0ewqQۃA}^ˣ/Ad㮏>vC-{̝c$2X*\++2 nij,}2) z\((1+A ,Ir>pѧf_jShftob[PQ(CLb8N~1T!qq67bU;3y#̾NѺ ,Q^W|<ݜ=.Kl9Aa%Tw^g[oKx. R<~6BX#I5}}jǯỆVk`) ]R~0@6#XlZ0?_!bW]uʪT+*[K1-{?w/&nͿhg'xy`q@THCv= ~rѠ+#P^KUCU!IzuʉƼ(1j7NSڨBɮ$R\5C'[C1Ed]:3\z\|$1 ̗fN^,!th8@+;k`3Xq wAAڄbw#éIG4w&m=]E:+[pM }[iV=w*A%z80\a8r`GoB4T6B01 )p]7Od>mP=UwNUa~%p+u` uSk =QOAsafFݬѲ~KI-ji[Ox /h( (8cGߙ1T`HH:Jޘ<Ι %`t;mMmnU>FWR}Ce.g%,Job3pHRhqǀ`^lLbw^>Fh>>?ZH5+P/|C O3 %DX=3#qyzl=<&׌ =z,i<ά%byqW3(_a9&@6:Fz̧æ%UOjgXqr~1UB40| TD5XYٗqҽ |6)G#he#::m9zGNY0QU5~Ej_1U*V~pH ؒ , a-ymĸ@>=r._%RZ| my.ש:iűNmzd@.T绋sD7ywb |QŬ 4XcM'W0{ >t2|Ǜ{̶j5K15(TtA_}7]k]7hx}|ߔ.?{XDp=E{lg`cgFWf͔>#ϴ%m4ӬQ(6ȪrSp~C]8{Ī. KGF`TS2^:R 1."s(b9(| X ~ YZ0udAcOxUb6XͼY“z-]*q?~*J8-Q%px.τ5H ܤ|ƌU ma@ 0`}ֺIZBr,f%fQA).z,2u>=>jONJ!3٫1q}̻$ڸhlx%}ȰdFYéL?83P0!$2ޟ͟?MpuCP<·madGt!)f==1#y\Jye/ a;nsj2܆vfpZUf{$i3rK +)=" v޳07 uUWDPɽ5sugʄ'U޸U,fAKypppBQBc.X|W!дјм{2wݑjmUwFLokPZ߾6~I4Z8 |cP(I?k<2iי tjy`d2{oS#h}!nudƟk`'m{2Nk(NbޙލlP׺E"k>BD~j8Xk5;[G4Cqohq[El)Ai Dk Fs[ $ փ} '?4m֛<^k9u5#<]VVJ*e>KW%,0C[FI,]Cf2ŨDx F#">*4]ȭ é@~#UvmNvIxυC}c4fqQNXby4Ԩixo!M[hEv+ɖ%JhBZ_ *ro`}-NŽ[ $ȕEt ,'OBU˼y4j*}sE |x3G`lii'f=8l$(sKʮ;Tr[!eNю8G CŇg~0PXquu7CwVxf9|-/V\6MY 4jmEB%UP3˰ fMe-@7P/ hxh:=Iq]OPK=ʶب0Y37NknamE0 1ayCX\DG+d_zsjqUO\|~Wo%@fS? ;.ҭ~h wgQ9i]7R$iNS1B3);JUr%vZ͟hLoI_7䨡@(e&=(C\<,7<#:ץte}f'JVpMa$_eY>CWAWͭʍGBuIo7V/bm߭V,7/2-%\R Jb OZ3@fas N%#t RvݞQ}sރ#Oܯ_i(ء rvVrὼDgNd`Ic49ʦQS_];b5Tڙj9]+ޫ3f95U=];ʡ%b],[#!*[f]&û&K {!]ʈϦݖ 30O܍GH?yι!o[J}s1_Y ]ֻC[cf@,N)+*P=d)sUW:C/D;![=GGjM3/i=kXc65^3Ғ h%9?r.@mAd.%kj=tOT_˲)'ӵ8@Qtc<+wifJ-O%j /%"u5h7rDX^]XOqj:ŶkG]v yv,E^s/jO;uU+Paz'C{ ۧ;ٛvw6 4I3Ķ_+ J<{m@2-٘ɦWy'ITFNsN|:QWM EYkkٳmT6ek|̂$ZrO.^%έO?]k" 95UJ "MA;k#L4Οçв}H_~fDuaVMBL7z_bzGq>$as@K卢n˞>9K+B/~K57;U]0tQy)6/w>'KJc \]Iԗi =q"5r*pN ?tH j~2P5+N-_yeX^V <UeR 秄Ƹ9 _hwrw?v] 8H'ȝʏlqLGb/UyħY*I endstream endobj 136 0 obj << /Length1 1959 /Length2 15165 /Length3 0 /Length 16368 /Filter /FlateDecode >> stream xڍT۲ 4NN XNpwwwwOp>ܳ?{fլZ֪ZQj0Y:@Ll̬ %M96V++3++;-?v*m?n@3ЇM ATrv{88ll<vVV!:$I[7Ǿp흜Y:YZU PN?6k  6,-/G ~..2V?w3O  #66`uB;hoqn֏cGY:;9Mh(J0:ŝ~L\&v.V;'!Q59Y9-cGz?B߹?: YX->?Bu_Y_*pp߄%YpTGW hir ish&6NfVmݥm wͿZ͛P#>qh3וrpkعfnnf>g~lSi W3XA!Vn,7E/ӿEo `/a8,r#N߈7ȩ_`Q}d}d/3}h1X,7maMa7 !C!~|\3>lnCc`Y`q{ǟ?R]?~p(?Lؓ?.;N 7_G @X^p k{#bڟ`Dٓ~LeX)eN#<9AvQ<~M`:{Fr}LCB KnoJj/FNݺIvp ?RbOoscNd_.Ҟ2L!ixľ%xa5?5ݓf G30T5٦-G2[t0YUk"RF_>RhsO`=Ѱ4a$UZ>e;/]~1[@QmHbT\k}jçN.қ2C_uG}m%0z}muXz*>,`25؂hy_d(wOHFgMi_-eRm-9eWtҟj"`Gp`Weܥţ5y5m`;$y)|JBŰVWCeA2$H%>̑+b<,Mɥ*w]e$Agէ8$s FMKߌGwp /V)PvSs5֮0Y$x;a^ѦW'8 Eq(~{[Ģ;ux$aMn!5qe3P٦"t%Bb!"r I N>8S  K߲ʖGK9%-  >GQm&:MІbqylRlX[TʌR;N8Io!̯K~@k@OkܖG>N2}=u#ݥ:DO%9k|eCLA99BR'ΚkQW] P0J&SRib3f]Gek?OsyβdJ{WCI^}Ή QS~kB5C3XVy3E$%`i[&ƎaO,_ χ[ tm6BjKzX77F0{-gxXk?#85_7OߴPpy&cW\8vKo KYd_K|1O~[+;h_"=c=NQ2{ǔXytCH$0&'}^/I()\}r/ w $"U zx{q(-.x1ا:*G.e2LB Dr`U&?A8%fELe!1 POdJ8E꜋, z>6S JXFkcQZ6 `u$= ]ᅫ $V4[KNXSg-ő ̆}kdX<4WXqaVؠKri'KnUϻڀ[dՙ[tƟN)iċ[SAFRǦliH [n)w@qŸ\B+@ /jkϜj ߽o$=;@`?jCSj}BH ax58]m=.BN#"0H{B]AӎQ˾9\/H(3׌,P~rRѩS.ەdr&:cgԻJ1I"(4νK~7,R~&[=kUTHfў&~]O*wyF$w%v[iC\eԁ=B'3AͯBײ/a_ޏ2h RhZ).5xBXT W9cZz7B4~VQDjБ1-dj^@ۈmC5W.Mxдqf,d36E(:1[eo1LB4kdƒ#prRX"҄ֈ\i|qu(d7S2zn! 9׊{]X6vĶDDg*{PJjYhhUᷝg $^󒷿f3ͺ|^J%џ&k8VRQ sXsH[J=+,0/. c[瓝ntӑ['3>Flp`4ܯ4`'J ~O]G !'nI 85%X :"i6ihqA'>T옪#ZVMaӠœoiQY-U`|y-9ofn_4*>D(*r 6-tU{V58`4%iҶ^2#qSs!WºWxs #-aלo@-sǠfGY(2Uvu3Ác5wΏ&hvѱc <3#cЮˆ7Wy MfQNo0B#O#!Mls1hȒ|Ҧ {Nk]rw.u>"c\Z K׫wm;Z6wv~3腋4Q_gSI_umk«SԮIh}D1Y4ԫWy &,@zRW9~$UREov)Kѥ9IvFFZHlZ)49"3Kt8W!"\1fnYMqAfdss61Jr;>]5 ;a}[^F${IYur0UkWkh1S!WtK = (HJch؃J*6|솪pr@㍙D-qV#2a]ݘ0˸oZ|~\lkSKB".e"m'Ato` z']PH}*5.^7#-0dD$sFJi>tUw4 L"Pد?i32wg~YE-hGFXb6G\9>|¼b'l(UCQQ9#*CHb""_R7[!v݅<7&!xkz M.fk}`3oPt5lbεf($nen_BV?t[h_LK<0x΃j[s$}/N8::Ŝ\4Yuc sV5cE5C? x!ݗ<*j-™<A'X^ܚ=ʚΛRq?"L~]1dNNxqћC( CQU4Rdí!,< F?Kgύty&T՘ nk_,,Rvt]y R}ةi,IC:<ZQCGZh.jb?48g] ckA v 8:3qPK3RPwn¶vR/ BrqֱWq(nۢPq:^8AVYNdRYm|wƒ&{A%; ghdDy| fXЁDEJAx<0G ѭZvENȎ"'-,]ew }O-_4#rh_vMǖ۱40?O*݋!)B"һ[dK<o`Ɓ[Oogm@55Es+q*XTj|Bǃl 1M"R!Uοzg'bN՝Hxc*_lzV.7k>FqJ%hˆuCU1:ƶ/?E l5V?2X*ȕ=W|^h'8{[ήܮZL{`d8|){'WTKi>m(@EZS >h=YKs ߗ-#9]8x~x X |Eԉ'sgܽ|[9> 8TA׌ .EHǬ+8ZAnX |`p15y5PFJfY #53tMfzEyH' :AYG02ՁWؚ&P̃iߝpv}Oތw]=o4iR NW #r5+#{ցRT騎e.b%&t2>^4FUF::L߭~o-8gEuתZPt2WbF*JGL·5-">H>/SVZtEQõ֘W[pjAd:V^gbcm7͐zӊ*>jtD]`m64*$UPS(g8bN0]q<|dkL'WT2qZc"ﯳ"!5r *96nvT/jcRf%!yIE"4Ӡ 嚟jť;qFT4],( ,,R[:,YRF;ѿo/^Ԩb dJw$hrt̥܅4\)k'|fuqp`Mi`1{?@j5l40gGA$dDq۴9]\ոHw::c5;ۻ/lKI_DqMG֜5, SrVa~96HR9!F\NF$ٌm!séIu"/` Tã%%Z\])KI͜hhL9ṬK a5Xo݇oH5D$ kUqGטJ-mKweHx,Qisvrâgf zU!2Yh@#UzhčYVSPv_ K8ʦ7sH8&L>CIT3.";{"aѷ,-qaKz8}֐W 1Y+ƝdC(u/c.n7D`σdB݉Vp7$5;K8/#ͪԋH(}lXPAre.ow*M%n" w˜k;ڗzl;e[SwK.u\g7lV(Æt~VAwh~~1 >=)}o}zFVD*qO0jJc36 Ag5.b8r?3fsxWp!w)4(!-MXٖ6]8|'75ܩt8b莉 %!~^寧/MhCvauFi>1"^d ,ދ &$ͶRfn3> tiqwWO I'3QYd+@,Qy<܆ϼPй 1t;d[mlBT faTؒ <bˮ%|ͷD) Ǯn$#z,S8 !DU_k Fuv"h-صwҜxN=w^Jg^OL&>KՄXVZIlnHz- jiB7 ԗ\d يMM?V1p4 )(pGEa=S*fǘF%\!@xpagw!e^ABfz*i&jPwN3cM{sSMa0<꜔GQ$#b(w@n޼Vx޵LHyny~[6&=5O?7qZyI7("7Ds YaϼJ^y%9|2l"P?fR1kElD\`fn%cn]+LJF"0CyH*𽯝H;oW!͑~S? knxƤQjuVSC8Kdik6^~4m^Oگ!UVȰ-NXfG wGJU2.)SSZH9p`FJD.13TFv:[j|Cv;$&f60SLϒ{0;uZ'КB&r}VBBUpb}1ݽ^E 1R~ŽQ+6Dh@M0jT(Wl¤L4vEØ)nNVỄVK݋G+򿌶y򰆔n';`Zg!&a`1 E$JlB8\= KE$ }Zffb8tLR}W(> tx"[>˔vd8+^rMFa-'($ xO Ndx:ZC,)˄M<r+hylmF4ԏѻ N:a1㶵V<.HX-.ᖿUCOUzF'Zuir`hhQG6KY1 Wo!k,\ņ{h ɚϼ3X/]4# W0YEç`׽CC`W|&|oF%qpb#49<`=[拢V,Lw Ɲ[g:PqG\x[k^U5K9__W7m9O\SEY)rV `XDDǝ ~~w[$y `ԬMОqK&@pнCZp9^l(K(2C\RkhP=dEƭ>zaԊNT=NzuE.D\ljpiL%ؤ E'7"ʢa.wknBew-CċiD57՗jBxYaB ׭ {ҞwΊ'ޡ.sR u{7$ ?ycw#w!qӭyA1H=q*kjf"6Û`b1o ktv ߄!>&4tiNtPJXZjqr GeH] ۣ<}w]5혝/(?7FZQoUAuM(|w _U"QL/)`5gҺ񂎞*WRk!Sp;q;|-I(wq !M2ͧ[S;[/#Oݖ"OӱMxֽB6;=5h3mԗ\5:ds){(:*Hc`Z +Y)=$߃FT90Y[AYOӘ9kjXG3[(u*s`{-%XEv\ ï6=üH;0;,\0{ͼ&j~)d5˺Jتz}f0IC7I~ M:G<[."G8*C1 ;$ c2 7%x/\̺&+DA:7l x3. dl;Tn[s?";%?5w)j(z^QͦK™y] FU~ƁNAY{㮂!'lv]3O]t`*8:}*rC5m*lze{QG#A[/e$Q[iWP1sdS`R(42ʧڲ4/EÎ8J="-*o6I;ԻG_u{0͝7nJ` ٕ2:r`i٬B:'QsoVVAr;Y(S2#fjT /j2:7𝢃br; pɪK2El +C`Q\JW-{2HgC 4&+Ilΐ!Ѕ}H ܸ2#Vur!cYq^'*1ċ_ gBG3X@dK[DL:/7hHL%:zt/1㠛iȯ3/:  %Ŏ jl$$ygNdzlmMGx?!&JL8c6m|KqIAoXƉIlϣ-y˰VHυw I^gr (]CR0O3x@"FWFi Dp ogtSy JLYN'cLjwt,u_hg YHϩ7C~LR*Wyͥ2ݧd%Gf )e~[ZS&iu mEcb?{/eFٽ #.%3 #&sX.\d?=QX'?_1>I^?He|וB:nh"9OtC=ڛtC*øۄ~a!pnx!k.yELY15yJb rEW!ү!^Wy]D*K%/{X]rs8< 9IK""%wy娸bTMdZl}N8i8*Euڃ&qgqYv9U[lup.54\XO+ѥ2=0 ݶ̅%WںQ _E/:k2Yī1* : S"FÏ&K1l󦛿F{$c5L#î.BJiD)`e=5~؂?VGᵠ2gRj?'ΙJ:vϩS^v^F MW=-ڠ{IAJѤ7Mz3%hUq|똥bȢ'¾s~a+3\:Y?ۇt$`Isa] s@"DF]Fk k-Ie#,ܒ!t' ӜpϪ.Ǘ05iԣO#v[roFM/5JJW~i@ӮWʲn-oK,R=@pX0Jp5u&O'%ao$b_MPLCj|'Qch~ B;WC|O$t Ghmݱ+Fj#=9E'PXo|15k VAK)%> ̧;ՂQ(gm HW ~3Dc]y5{Wy\K{NJR%}ÈX=0fy))XqL~%3ƣAVRHӁ73 {+`c96/[3p+Bf8uD71}e4;s5-/X2:]L|Ʋ≮E(ëB&v 8 ?=BGJfXpPrAiu(MX~-O6J>B:L>92P'y;_w-x%Lc;j&784,FtVqtLoRE]OPq{ʐX4(5YeH '╗i5}0ĢRЕ%u(C R)UW!ҥA6!q@.^,djKqMn$$vWͪԮY NvMcX0N"F=AkkU_^iaF;V8jo7_=4%F2=QPt)b }2RAcόC,# 1 tFެPAH_IPݷ`lmwV3=\u#@3C +i] &M3Û C& ,]O]Ӻn1}C:#P'uX0meJ$mC-^UgOr8v`MY.׼\>o_h R{ׄ\LE )# Kqٟ+pAlXd/ BZn'-Jetu[m,"(|Taa>~jk-_姲 YmJ1*v3iy!y{~wԖD^*F+0͞!fgx%ʋB ڌYj~%魀%Mtp#}<3>ԅ@1=vfSO$ݐc9" xn%Yz^hʱU7ieX\Ԯ˞yh^[E0=Cu'½,L17dG&{oEnj#)rZb{ǰ`#ӓ?^(Bv)cCf8!pf$(CO>? h'D;2޶)ğ0M0[t ?M 9^YT/{5r?1uiG'Ѝ\=?!PI1%[ֳ;9:_aH9):NS5Scp}>ȗ"1"Hr{GP g?V5ZCmQ.m ơSꦻ2ddŤUr{Z'4˒ٔT<L+?Oߎx/+]8Xv5y'+A{Ucq#3;A0-rn:7.M}U ~A^9!Z G뛚~ 0oUNMN ݭ+?P[WFh榁[I{V{_j )w^cuRMSD*oRg^S`hUJ E_ۨgdphS*yV兪%^Ikisa'0c?'6ZMdmSEBn*dm{J@ -+3U4-MQC8V1,{`dlj ~%Ka(SOVmV[颚 55LؔT7FFNyw2gcx9xM{R\̓%Rk.!5p1hZZ뉯\pC, ֟ڴUPUo#O?y<#c+:/ɩ Rx.%?ݝMD/v; RޙXdޫ@O2b5ҝ8=&"ah@vTqFT2+ΥxS>a/r?r`E9?S? O g}:U[|*7DVIrf )Fsa5Mq;1٭/;o6>sۆ>1UY5EAAw>uA"Qc2[vXjeA~(申UO`PxXyxX3 ,4ztg4PکAM!ITsRDH|64L&`yA_RwIS5A(my?xׇJ ha|EjEǀj#; f^408}Mf`(@ 'ߥo?D@U\P/*-zH,?6E愽>45_%]^)d*v ܒcgf*527&bŞ9lҋa}#a/8B w$)Dhn Ke A\Med_ =lɳZ˟^^{Sd֭}!VIYN)RKow@C q<>ݑ~Ա9E&\Y1J%ˈ!8NNd*P%x[#XtU{ a VmM 19sۥ{KsjA{v|f-"^#s o~$rx^YGtQeQնbyAʤ4axSs":%$U X/q Ĕ(D)"7XmJ!N=$=V( DCxj[8-4%A[ Ԍ_(Go9 ]Q?{ǯzvzZǢ-(vsW ?a c-Bt9 -)_])9"@C:TK_!E嫚A5M0G * $4׀dA Ğ?uXGgd=(.MM2"(MCV{e[y=0;Rv^S<ZJm5Q#0)a5TДs'w<S"ξy~ i%n"X4>bNi;W&OE{8,a4ׁ./G)4G  ڣb Mh"h!8i[> sE}RSj6Ο7l;6DDTʫ00첿V3WgRxG !o:϶i_,tYX*zۡp^;~QpFBMN.ڂO<7 Kl{R\FUl5|>I 3&ZT P3CV<&#k ygpH;V VK˥2*=9CDO([O[~孍Q$}+n>7 nJr`j l=Xť`z|XRv,AE]\f]r=^j27!Tӎ ގ6n~V6UF<@v௷a^ )/gy5_D*"56TN%S1Fr HQymN}X췈kcAIsP4'P~{iwga@XYmЍ (#`a[bI[nH<"ߟGZ؉)̇l\Bu< LsӾ oh!vOM aƂuw1o?F9c?+k߆J_@0cNkDȱ`6j1Fs[{^IxYrad&`j26TmͶxINJ #aǼ↳8˦ ~7ڕ/Fœ{ˎ==LxXЌukI%b kP R22Xa;s|fLA &XW[m BJOK|tl9R- :Y˼u'w;w[RO9oaXP VE غg d?nk47t_]bsL^D;) p?` endstream endobj 138 0 obj << /Length1 1914 /Length2 12364 /Length3 0 /Length 13560 /Filter /FlateDecode >> stream xڍP c]C]`Hp`8r9}UT^ݻ7CI(jno sadeb+jhXXؙXX))5.6)@N`{;1ĝ@@7卨hosXXXXl,,!;$n`s"@H)nry[?_4fV^^n? 'tپhۛA.FŁݝ hdd)DpX@ '79%-oiL +_u{ wfB\AN e_d 7tGlg0h Xm@e)&"-MHo lvpqfr4o,ig.nok sqF> m=>܏vvA`;s?d:0kځ]AsL,A.Nv60bc OПN?o|o2@` 3 pqrz`6s,vd3,o購+u??YKWCNQou{9lNvn/WHY; {_ž vhZR[67`d1{{c?HO?_?l7o]]f@m/U*+|Q;Kn$Y 2WY53H`dea?2v{8.vf 'Ddy$6NN74ώ>Z!(]yPU =*~-r#=U|gՌ1,,04 օ,f+\"=I,go y* 6n| |Dט3Tbirxe%1낋,L$>c]| x,6NḰ:}9&t Ǩ-ݥ Jn#<6`uOF-pVZV%}a x})Nf=<ԿVJ"fcgpS$^cq vB /Kidr{8-rw'lI*4l%6K<éB͌bH+^#i^#2!u9c\QhAnIp,z t3)֚Mt%é˼-0X|P\> K 9Yl*?V`%f6 G%J\ߵR–j卆 *AmݯhU+ 2f{~xz 4jSI'BdSa6~'%U7^f2~_!i^ۋ5:tUlmӭI:YAarŪJpf;ox\gjTMfmN~6hJ{%/zuț׳TJOM]# R~aEp5EWUPVbdFa: Iw{:ג$EC?d;K6 O_c9X e29~꟠~K֩2}d4m()G*34gn"|I@%94;/,(6Gs|8K}<4_*;sӟ̚,򚎻5A91 1AseZ65Er*{jLa+~_QpO{KqCrHcX-oٺ 8KػD* 긽~fX i#[62]FKzR=⌐2ƶuv 9n=4O0<'dz/n|b|(:yˡQ|ALLC]?0^ʦx!jd%/xeܓ~WtJP =#|\)VZZ-J6% onHn*VM-B8Q-3/&1HR_)6t8ċ؀t2>SҸFAX<.8-D'}*tJIl׋%ݍrs!#b&YjX"$yj4Yj< %>(uvi65T. a_F:?p2~ V/)h:LTW_me)cY~7u'JGtQ- m4§Hj,pH6y=(.5ٰ2EЈb|Hwۗozp .#vXGem߻용*c>Q~xHiBuBB^~I}h(% dſHW3aG"Kh嘸ALx>Vh?D gHa}d8~?W٥E' ehAfAC# M)JR/9tdXƦp|!lEC5,DIK&UQ` Z+:Nc:.xkG=_6Yz`η[—Yif[}tl)6u7:rTv#@HE;dKAYhaOɁ7ۦbySO<>+?rV # nšEzLb糒 mlb㺐m SܦZDw@u+dhkƇP.k/`GઉOL*}BTٓ#Xxvc! @V#uwWwu)BMh{g/=J0DS4+4W&g2)f*J׿J˃a(i){˾5`xE&YcEyj?hsO*a-hREA F|xp1w.T2k*} ۖ E͙WbBFm:{Qw NE*28#>Cp k~N)lŵkÚ~g D{uytҏSX*јʳ8-[F'aelS$ף!$i|_1{Va~>l!Y<nqtѯMTTr08.vm=iIns徢G0`oڕj\>gal =Yiw42UwIly+F~wMF%hS4=H! t1y:M71IK'naAhuՋFWB jY%Ž-,G늋LZѸܖQ̡i­r.*q[w>}N |^Zh*-zj=Taz6ttC%ƒR(-Zӗ}1ړtv"ٟr5BA>~ywRߒ`Gخ)mx+D^% |PޠWٞĎ\;OK; abp"W0w[džo}D6u9n~ʌfmbS攚wa( ] 䩹IVs jrB@QG_9{Һ;rوƨaCid;r&?$ =vs㛊e~3̵ȌN eG25w) $e,$ŵ4C븢QIި0Yʻ˴WMy$Rѯq6#L+PSn!,zDr%6SPt"́ y#zRon,BMdĬ%ՠ*dug1Y֘9ߘKn6>G__U= feT#3ݼ.".8MʆA妷_+kp{6'p ;wů #kRY9e8gOA ~E~ԟ{3 e,aTHYSezO=` 7'=ʔ%L Xƭua@җUHiۙ)Nf@Ue [*~Գ_!,k n퇷~*+4x"!=*c!$V0tF`l)!e Y_dۇ_dĵ=s){zkRo"g,VAlU:xqŃL?{jKAj|?۴+DQ?mj/-g7bF>oI=g>'Q#0ƯJr=H{-~ X}uf0š2j%w4J8zI>ĨQيrd`..rI-m (y9Q(Q_Y6^o hzsN ʼn}k^Jû3 6Df>vq~eQ72FQa^p`aoű*_=;O%?sߠg*45=*pL`=bjY1yީv;]i[Z x@H.Zw]{HcҒ9Ay8]Pu0N{߻}Nhu:ڟDX,UHO|d0;J9H=0ᘥ0`%hr zH=d-/wd cO6^xM7p{hTƺ@\ r"4< LmB^6)ijm 1B/ .f5Y![⤩[v{=!| \MX0LK~Ei?UrJG[,x Xz_/ Ahfư`_ƭSj8pvz ]VWCVNn h [LHl!c|DKf?m!)q^[΀F m.fo'wTœE%e:3JWLhŞ!YvX&?nvۋg8$|{ :0(fOLî[Jiy檅9H>%qN9]}9 @sKc-9Hv?$2u01kgTκ)YT-͞,m !L&E '+?` "Vi~쵸+2Ȅ QZlĹ"ǠI}ŀ 9 HUϮUZ+??:+Ci,MMH_Bϫͷ} ~}ޤ|}Yb` g;1{\sv25$?;m;kX4fpQH%Λ4vɉeSĦɋ.`xqE,,pE=#ߧMք\6 [), gh''X\R'$ \̇\ԝ,((ˊTB?1Y?kw_c^ Q<* !XsGw0vј}L@Z牒\Wg=5

sL%oi_W)DSHfN+j㷖i/qK#kD`=fĺ񬍝 t eoh&,vL}? Ԟ0j}oFY yq.[]ХJ?9*BHT;VC- 60ÊV(q[|m>c5Z)G!?9T1m@V?Uَ)ETz >}3cUF}ct'~ޒU.Z\yve-6Sƌw^yk1gɂe{q~aKPn$8HJJZ:v霷!ry||Y֫tҡ!udz+k|sI>PƁ@.7Zߧ MXm-OaS,jiluxs rFw=GtE7ÄqJV{/24k<-(HqjgR~ļ{/' El[\}.X |Ü"_5h+8C "VŖ~IGҜ}͠abCh5l -f9uôSv '&"vp8lnbo^*L֢z. N07v5Cr)Op}JqɁmg@28xX_7]$4պO峼~zRi&Îw~Ӣ _S(kKUnk(lG'.(/.KV2,-qd"hU\ۄ3J6%E4FOQFs!P E/g S.uZƸ9H`XOK>aCd~@I596HnW5֐js>?')ҙ!bt8{DF7q D(et_{*P7{NQ$JD2grA1p _ ϹQNN 7fPں\^'kwvn>Yr\]U,dΡ53Jc4E@,0xQns,鴶Fʉ] F |G$ L40zݎV*SjA\425M"Efm._BN%@-l&̚aU-u4hHppHg9*3wZG:_mץF'JbCeB>h~;SW'JB~%"1~(w}剢ݐðv '@~wC~VsDuZrcR,Ȥv*3|:-}҉wn&Usf/1Dz~ÎCі7+$)MI "FE3Xe[#=AQ jd,en+Kbi1xiFJۯ~z4 ebNO \\I\OW^Pc0H$+TzV)e%XҒt]Rsuvr(g \Dž=4o"d!-8څFau"HÆR١[XdrZtF6)Z0Pi4Jg 3ԑU Ҧ;Eq{nZq[/#u8k t:cBrqXNۺC'/t-i U ~UQazi<&.G[tյ@+~g4VQkrE濊h?[fe{ҨC)q2VC#[ @:[kk&4R'8k8V0dgZ("9pdg'P?:+-oH}^% @LceeI]? ՠ\8]- j|g`fY]ě=cy@3og ' fr="L[`״fRF3ךڢ5?>e~PUauٱ- V9(qZQNVwv;q~i١O9,2@wJ1Qͪڋ*|Z5rpE. L#FwG_s;.յyL;.eKUW^6O vמa j|}b;=h0{^7u0pCY6ыC?|]0:PeտK ˆJ '>i<; D> üaTYK`lqn9"WhR 'ї2(5sUX̠H,j.as=]=4 o{qޅ\3N.2ĊhK(#nP[!  3Jj|L콩p!eфNm! hsHn ([Pl"q}n-U1^KjDd!9GO!-ihFPzyԝ*Wɪ6`,p4 UF6vڦNm%<QHymK#q|A}Z삀&d~%xFۧ= [:PozSy~YՄ6nreFKwF XrSiU{/idO_NJPlz]RQxWcY0Tq?"B-E6de@ E}L0kh:/F 籢Mk۠J?="3'Zγ8Djf.S;4egOAցIOIPFpcUٝwNܸybdAJ; V!FauCFrv`@]~5netQ7`%-ۇ-kI(Q5$<흃^RcL(-EzA 10umݤy6L>'"*>пJ-T)J|' nFB Y$%_w1"+%EMItV҈nAC'Z+g2j3~{g>]"5C! r4_hy-nՆZ.z'Fd1H4=l Mh =xc>v9xtgD1~`2TdD\ܱSڃ:i;o~iy=s86zRBQUo~]bNq 'p8ov4|72:舏N j-&7ʍs8R3(~e6ynkrd }O{5STDZy#p(%<,GQʮ;Φg5 i i3HD3~C!)Sq{S2A}сk„ٓP[nd?{ >^Vڐj/;hzW#$[ã$}rO(#?&M0zSvTo*)<"k2k2  #b|/RFrk894Z F  V>:u3 *T/wFCXǾj%uk婪5 +UU:^Z57&%NW`34YQc$ְ2nI+ܬm!Q>t%T#LZ- ǟC?pԣPv"Cq[;B(l/Ϗ +†~54'UJ8 r;~}>[%#v˵TY#Vat< `B]Ry>7KvcN+ w>.Tft44${L:X8s]{*7ݦhF ģw) pldWMЖ[@c!q~#H߷>VcTA>TX᳋wP0*2z_X8ĔG4Bo;?kz+o^Fóc<|$aX`^]E~sFW\cA@ml ߁R8lc~rĐDZzT+{_ZCnB8#Nza;HH1Lm$~+WʚLAxU[ηǰuGZyv0ZʘO~]"voh.;djB&i0)lͯ!Ygz@ǶX oZc X߆D6ZP, ^m*vvXF endstream endobj 10 0 obj << /Type /ObjStm /N 100 /First 852 /Length 6037 /Filter /FlateDecode >> stream x\is7_1w+eR[eّEr;|%fVWDʱo?ݘ̐DIQ[8F@Pj*U%JLRtL֕5UӶҾ>VU7NULU)-C VL$RWTlrm+i*N4L JDRWU&NI*g+ct#Җ@@)kLޮgiRR?53ꗨQPUh8``Hr%Ry"!諐H1PU0M-:4 3: NR7I]b]5ZDl HDM"$Fj*, '=(%sPyFQrD&㷈 ߐ|TCT] ;`1)A_RiE\K4Ш-GZ: !6ϗz$KQC+AЍiX}()wajhCE+:^Vpz|].Un_SbyR*b1Q-nկ_Vj>|Y5ۍn^_0С|27{ӓhnՈȟNenrxirIUgNjOs*1Cd y`O97s=*',x0"y6M Y6eM.#8OK 0 ^,kҭ!Z}d귙3kK=M>?֋?z>RslFzRzVe=_T6Y7J etOSh}[ 2by߼NO'g 3.ÌZey۵C%]sl;GsD}0q>9rpj#8<pOSj95wj|x.aK4-S#OoG 2dYm;v%a)ܯZ}Qdlvl% .ߪm V'rK4~Zqx{]_O 5^US}oo2nu?yMX,;t'g: ơ* HmVi5nf bt>XZCb^8'BM6b,^hGf_HP=ڶ=ֶD>VжRcm[&qWx*@l&cr:]ة/~#s|?֟hqR#=Cv}:Ezt\YyƵӺۇ?蘲<,ϧcJo8,'cy@LwҒՄYMHkMa5|*j#]"t5v} iD=Fl%Z"V~1ʯʶ[b<۷]_U6/ն^tkyPv%q]Nmݾ ظmSi(WȪk#:4۸Pim -p)KnBi7&R@]P}DJ4CCx4MԈ8ǁv&<SAIzq,5#HKN{YTȵJɓ`xݺOY]YrBYs 0=ccNCbH cSWo]M;ߊt~) Bk5 Eۇl&gfb8`=K*m8 "BN &dD m6Tu#")Q m Lq%IHHp0J.hf0AS :y&/OQvy,>Bi!AWX6rEuC eÃS7cp>9.r4)K&~5I@kN%DŽmv r液;nɸg/!ʦ@J4{#>i>HRiTuBzY X[tiQ9hG0rz{tӠ `8D(i1ub4=x d(vԞTەPk"Rڝ"$՝Nr ́ppH-)y8!n~h똮>iY^N9JXet6`` ^Ӹ DZq'RJ<;Q\ J&ܣ`9_k0X3"vnVJ-U"p%O+gx-QȪKZxg嵜_Q +Z~eis@ZN-U*QU6in)-L삶rln 6)9~j\uBu]lމʪ Vbo*Kd."pd C|'"Emɲơ\$.1]rp?؀gp;@%v>k)kM␫z%4P xB1^>VkY"wRG)g$ !o@& G2L 呰űrLA,y8P#PrV48?p  J,knVA9Sv!ObI=;/%IjJ8F w&JBw)FTJOoC)޽& 3=s X Ԇ_hy!XW|Eз8e;r2%W"7!xƍJ(ZK ˻N8peȊl^FX4d̄W8jo8%[;|B^aQK !D̑LshDdCó8Š{1Syz M9! z&BYL{A*e÷p r•x3∖-ۥk;]XNfTd]NVYӋ.w of̘D5İgn?'G$i`Z^KCٖB'q)Am, #ۼ%ߡ:>|֭䫁Lse]obzwuS,c+NX-@_<O6~-qL]tLnxng{FǹBL,F/VKk{\ua`\͚Tlz\+rE#  HTJ 2K ! íqpO7޲'=3m$v=Z}pgzXѺѭ\0}VRgOя}LWbq,ϩHox[]N4Z|Ĩ|[nitX2`?o s}ʽs>2TW ohe#f:zGYL]_/K2[&r6,oI875$ӆ9ns3Փ;Ͼyr-k_pB+Fbp-w*}M7k`'YWzRG'g%.3MnýWOggqp`ܤkxaw_a$_c;:?=UrÉ\|enV+s%?Co\O?Sk\[+u\wUo"W{;{Ya]V+EZ%nwvw_+߭5 ~MKSpB5ؒ3 rr4=~X>~?o?cȵG^m|営=Ӱ!bv~v<_LE1.xLbVQ}??;ub6:Lc ViSGp@"׊zMC2_'2bZ>NPr:!;#;89}ƚf)u$a=C_fwX_h~o;ỵ;Uu#fz-8fryy8%r=9J'S_fs\M'"0Ngg$ӫk}6yBq``$l6??KxiUuI=|_!vc}㡄=bC!&]uҪX ..gCb6{oefun>j,%g;@ONןe\v.`b 09޽O՛D_f&_H ,$?zs̹Crħ|yןAu|arԳ$5?g;>>)F3?< j棏ϨF_7B>>nFWkD棏;ƃ/is[棏N&ƣX/vڗ9w;Xh>[iv6|r8X)wwͭ}~+}}r&au ƣ/ikj\Z9~1| [9m*9d_W݅-_xK]7Q7!v%f endstream endobj 143 0 obj << /Producer (pdfTeX-1.40.18) /Creator (TeX) /CreationDate (D:20190114092151-05'00') /ModDate (D:20190114092151-05'00') /Trapped /False /PTEX.Fullbanner (This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) kpathsea version 6.2.3) >> endobj 141 0 obj << /Type /ObjStm /N 2 /First 13 /Length 90 /Filter /FlateDecode >> stream x341P0P041R0ಱ,HUHLO-w/+Q04L)V64*R R0Q\vvH:Ksӹ FMl> endstream endobj 144 0 obj << /Type /XRef /Index [0 145] /Size 145 /W [1 3 1] /Root 142 0 R /Info 143 0 R /ID [<5C1558A9B405C8C8902FD4F5FB129DE2> <5C1558A9B405C8C8902FD4F5FB129DE2>] /Length 403 /Filter /FlateDecode >> stream x%;LQ=B ȥ^ܽ JQpd@ՐFHL\tp0%1qseq,O~wNp̎Y`12OGdX(OIjUrX,$AXwva#U!r&aXf_9INӰ:e_ΓgH-#Dak+Z{g9H.A4Es"im6hw и@KFZa{ : /5v.M.+*FzuK# r$I[?UT* w2@6U ?ީTCpTpkjT-WË֫Kw[: xov%6 `.9? endstream endobj startxref 265083 %%EOF gbm/src/0000755000176200001440000000000013417115400011602 5ustar liggesusersgbm/src/locationm.h0000644000176200001440000000157613417115400013751 0ustar liggesusers//------------------------------------------------------------------------------ // GBM alteration by Daniel Edwards // File: locationm.h // // History: 27/3/2008 created // //------------------------------------------------------------------------------ #ifndef LOCMCGBM_H #define LOCMCGBM_H #include #include #include #include using namespace std; class CLocationM { public: CLocationM(const char *sType, int iN, double *adParams); virtual ~CLocationM(); double Median(int iN, double *adV, double *adW); double PsiFun(double dX); double LocationM(int iN, double *adX, double *adW); private: double *madParams; const char *msType; double mdEps; struct comp{ bool operator()(pair prP, pair prQ) { return (prP.second < prQ.second); } }; }; #endif // LOCMCGBM_H gbm/src/locationm.cpp0000644000176200001440000001145713417115400014303 0ustar liggesusers//------------------------------------------------------------------------------ // GBM alteration by Daniel Edwards // File: locationm.cpp // // Purpose: Class to provide methods to calculate the location M-estimates // of a variety of functions // // History: 31/03/2008 created // //------------------------------------------------------------------------------ #include "locationm.h" #include using namespace std; ///////////////////////////////////////////////// // Constructor // // Creates a new instance of this class ///////////////////////////////////////////////// CLocationM::CLocationM(const char *sType, int iN, double *adParams) { int ii; msType = sType; mdEps = 1e-8; madParams = new double[iN]; for (ii = 0; ii < iN; ii++) { madParams[ii] = adParams[ii]; } } ///////////////////////////////////////////////// // Destructor // // Frees any memory from variables in this class ///////////////////////////////////////////////// CLocationM::~CLocationM() { if (madParams != NULL) { delete[] madParams; } } ///////////////////////////////////////////////// // Median // // Function to return the weighted quantile of // a vector of a given length // // Parameters: iN - Length of vector // adV - Vector of doubles // adW - Array of weights // dAlpha - Quantile to calculate (0.5 for median) // // Returns : Weighted quantile ///////////////////////////////////////////////// double CLocationM::Median(int iN, double *adV, double *adW) { // Local variables int ii, iMedIdx; vector vecW; vector< pair > vecV; double dCumSum, dWSum, dMed; // Check the vector size if (iN == 0) { return 0.0; } else if(iN == 1) { return adV[0]; } // Create vectors containing the values and weights vecV.resize(iN); for (ii = 0; ii < iN; ii++) { vecV[ii] = make_pair(ii, adV[ii]); } // Sort the vector std::stable_sort(vecV.begin(), vecV.end(), comp()); // Sort the weights correspondingly and calculate their sum vecW.resize(iN); dWSum = 0.0; for (ii = 0; ii < iN; ii++) { vecW[ii] = adW[vecV[ii].first]; dWSum += adW[ii]; } // Get the first index where the cumulative weight is >=0.5 iMedIdx = -1; dCumSum = 0.0; while (dCumSum < 0.5 * dWSum) { iMedIdx ++; dCumSum += vecW[iMedIdx]; } // Get the index of the next non-zero weight int iNextNonZero = iN; for (ii = (iN - 1); ii > iMedIdx; ii--) { if (vecW[ii] > 0) { iNextNonZero = ii; } } // Use this index unless the cumulative sum is exactly alpha if (iNextNonZero == iN || dCumSum > 0.5 * dWSum) { dMed = vecV[iMedIdx].second; } else { dMed = 0.5 * (vecV[iMedIdx].second + vecV[iNextNonZero].second); } return dMed; } ///////////////////////////////////////////////// // PsiFun // // Function to calculate the psi of the supplied // value, given the type of function to use and // the supplied parameters // // Parameters: dX - Value // // Returns : Psi(X) ///////////////////////////////////////////////// double CLocationM::PsiFun(double dX) { // Local variables double dPsiVal = 0.0; // Switch on the type of function if(strncmp(msType,"tdist",2) == 0) { dPsiVal = dX / (madParams[0] + (dX * dX)); } else { // TODO: Handle the error Rprintf("Error: Function type %s not found\n", msType); } return dPsiVal; } ///////////////////////////////////////////////// // LocationM // // Function to calculate location M estimate for // the supplied weighted data, with the psi-function // type and parameters specified in this class // // Parameters: iN - Number of data points // adX - Data vector // adW - Weight vector // // Returns : Location M-Estimate of (X, W) ///////////////////////////////////////////////// double CLocationM::LocationM(int iN, double *adX, double *adW) { // Local variables int ii; // Get the initial estimate of location double dBeta0 = Median(iN, adX, adW); // Get the initial estimate of scale double *adDiff = new double[iN]; for (ii = 0; ii < iN; ii++) { adDiff[ii] = fabs(adX[ii] - dBeta0); } double dScale0 = 1.4826 * Median(iN, adDiff, adW); dScale0 = fmax(dScale0, mdEps); // Loop over until the error is low enough double dErr = 1.0; int iCount = 0; while (iCount < 50) { double dSumWX = 0.0; double dSumW = 0.0; for (ii = 0; ii < iN; ii++) { double dT = fabs(adX[ii] - dBeta0) / dScale0; dT = fmax(dT, mdEps); double dWt = adW[ii] * PsiFun(dT) / dT; dSumWX += dWt * adX[ii]; dSumW += dWt; } double dBeta = dBeta0; if (dSumW > 0){ dBeta = dSumWX / dSumW; } dErr = fabs(dBeta - dBeta0); if (dErr > mdEps) { dErr /= fabs(dBeta0); } dBeta0 = dBeta; if (dErr < mdEps) { iCount = 100; } else { iCount++; } } // Cleanup memory delete[] adDiff; return dBeta0; } gbm/src/gbm_engine.h0000644000176200001440000000622613417115400014053 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: gbm_engine.h // // License: GNU GPL (version 2 or later) // // Contents: Generalized boosted model engine // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef GBM_ENGINGBM_H #define GBM_ENGINGBM_H #include #include "buildinfo.h" #include "distribution.h" #include "tree.h" #include "dataset.h" #include "node_factory.h" using namespace std; class CGBM { public: CGBM(); ~CGBM(); GBMRESULT Initialize(CDataset *pData, CDistribution *pDist, double dLambda, unsigned long nTrain, double dBagFraction, unsigned long cLeaves, unsigned long cMinObsInNode, unsigned long cNumClasses, int cGroups); GBMRESULT iterate(double *adF, double &dTrainError, double &dValidError, double &dOOBagImprove, int &cNodes, int cNumClasses, int cClassIdx); GBMRESULT TransferTreeToRList(int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld); GBMRESULT Predict(unsigned long iVar, unsigned long cTrees, double *adF, double *adX, unsigned long cLength); GBMRESULT Predict(double *adX, unsigned long cRow, unsigned long cCol, unsigned long cTrees, double *adF); GBMRESULT GetVarRelativeInfluence(double *adRelInf, unsigned long cTrees); GBMRESULT PrintTree(); bool IsPairwise() const { return (cGroups >= 0); } CDataset *pData; // the data CDistribution *pDist; // the distribution bool fInitialized; // indicates whether the GBM has been initialized CNodeFactory *pNodeFactory; // these objects are for the tree growing // allocate them once here for all trees to use bool *afInBag; unsigned long *aiNodeAssign; CNodeSearch *aNodeSearch; PCCARTTree ptreeTemp; VEC_P_NODETERMINAL vecpTermNodes; double *adZ; double *adFadj; private: double dLambda; unsigned long cTrain; unsigned long cValid; unsigned long cTotalInBag; double dBagFraction; unsigned long cDepth; unsigned long cMinObsInNode; int cGroups; }; #endif // GBM_ENGINGBM_H gbm/src/laplace.h0000644000176200001440000000572213417115400013362 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // File: laplace.h // // License: GNU GPL (version 2 or later) // // Contents: laplace object // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef LAPLACGBM_H #define LAPLACGBM_H #include #include "distribution.h" #include "locationm.h" class CLaplace : public CDistribution { public: CLaplace(); virtual ~CLaplace(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: vector vecd; vector::iterator itMedian; CLocationM *mpLocM; }; #endif // LAPLACGBM_H gbm/src/tree.cpp0000644000176200001440000002666013417115400013257 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "tree.h" CCARTTree::CCARTTree() { pRootNode = NULL; pNodeFactory = NULL; dShrink = 1.0; } CCARTTree::~CCARTTree() { if(pRootNode != NULL) { pRootNode->RecycleSelf(pNodeFactory); } } GBMRESULT CCARTTree::Initialize ( CNodeFactory *pNodeFactory ) { GBMRESULT hr = GBM_OK; this->pNodeFactory = pNodeFactory; return hr; } GBMRESULT CCARTTree::Reset() { GBMRESULT hr = GBM_OK; if(pRootNode != NULL) { // delete the old tree and start over hr = pRootNode->RecycleSelf(pNodeFactory); } if(GBM_FAILED(hr)) { goto Error; } iBestNode = 0; dBestNodeImprovement = 0.0; schWhichNode = 0; pNewSplitNode = NULL; pNewLeftNode = NULL; pNewRightNode = NULL; pNewMissingNode = NULL; pInitialRootNode = NULL; Cleanup: return hr; Error: goto Cleanup; } //------------------------------------------------------------------------------ // Grows a regression tree //------------------------------------------------------------------------------ GBMRESULT CCARTTree::grow ( double *adZ, CDataset *pData, double *adW, double *adF, unsigned long nTrain, unsigned long nBagged, double dLambda, unsigned long cMaxDepth, unsigned long cMinObsInNode, bool *afInBag, unsigned long *aiNodeAssign, CNodeSearch *aNodeSearch, VEC_P_NODETERMINAL &vecpTermNodes ) { GBMRESULT hr = GBM_OK; #ifdef NOISY_DEBUG Rprintf("Growing tree\n"); #endif if((adZ==NULL) || (pData==NULL) || (adW==NULL) || (adF==NULL) || (cMaxDepth < 1)) { hr = GBM_INVALIDARG; goto Error; } dSumZ = 0.0; dSumZ2 = 0.0; dTotalW = 0.0; #ifdef NOISY_DEBUG Rprintf("initial tree calcs\n"); #endif for(iObs=0; iObsGetNewNodeTerminal(); pInitialRootNode->dPrediction = dSumZ/dTotalW; pInitialRootNode->dTrainW = dTotalW; vecpTermNodes.resize(2*cMaxDepth + 1,NULL); // accounts for missing nodes vecpTermNodes[0] = pInitialRootNode; pRootNode = pInitialRootNode; aNodeSearch[0].Set(dSumZ,dTotalW,nBagged, pInitialRootNode, &pRootNode, pNodeFactory); // build the tree structure #ifdef NOISY_DEBUG Rprintf("Building tree 1 "); #endif cTotalNodeCount = 1; cTerminalNodes = 1; for(cDepth=0; cDepthWhichNode(pData,iObs); if(schWhichNode == 1) // goes right { aiNodeAssign[iObs] = cTerminalNodes-2; } else if(schWhichNode == 0) // is missing { aiNodeAssign[iObs] = cTerminalNodes-1; } // those to the left stay with the same node assignment } } // set up the node search for the new right node aNodeSearch[cTerminalNodes-2].Set(aNodeSearch[iBestNode].dBestRightSumZ, aNodeSearch[iBestNode].dBestRightTotalW, aNodeSearch[iBestNode].cBestRightN, pNewRightNode, &(pNewSplitNode->pRightNode), pNodeFactory); // set up the node search for the new missing node aNodeSearch[cTerminalNodes-1].Set(aNodeSearch[iBestNode].dBestMissingSumZ, aNodeSearch[iBestNode].dBestMissingTotalW, aNodeSearch[iBestNode].cBestMissingN, pNewMissingNode, &(pNewSplitNode->pMissingNode), pNodeFactory); // set up the node search for the new left node // must be done second since we need info for right node first aNodeSearch[iBestNode].Set(aNodeSearch[iBestNode].dBestLeftSumZ, aNodeSearch[iBestNode].dBestLeftTotalW, aNodeSearch[iBestNode].cBestLeftN, pNewLeftNode, &(pNewSplitNode->pLeftNode), pNodeFactory); } // end tree growing // DEBUG // Print(); Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CCARTTree::GetBestSplit ( CDataset *pData, unsigned long nTrain, CNodeSearch *aNodeSearch, unsigned long cTerminalNodes, unsigned long *aiNodeAssign, bool *afInBag, double *adZ, double *adW, unsigned long &iBestNode, double &dBestNodeImprovement ) { GBMRESULT hr = GBM_OK; int iVar = 0; unsigned long iNode = 0; unsigned long iOrderObs = 0; unsigned long iWhichObs = 0; unsigned long cVarClasses = 0; double dX = 0.0; for(iVar=0; iVar < pData->cCols; iVar++) { cVarClasses = pData->acVarClasses[iVar]; for(iNode=0; iNode < cTerminalNodes; iNode++) { hr = aNodeSearch[iNode].ResetForNewVar(iVar,cVarClasses); } // distribute the observations in order to the correct node search for(iOrderObs=0; iOrderObs < nTrain; iOrderObs++) { iWhichObs = pData->aiXOrder[iVar*nTrain + iOrderObs]; if(afInBag[iWhichObs]) { iNode = aiNodeAssign[iWhichObs]; dX = pData->adX[iVar*(pData->cRows) + iWhichObs]; hr = aNodeSearch[iNode].IncorporateObs (dX, adZ[iWhichObs], adW[iWhichObs], pData->alMonotoneVar[iVar]); if(GBM_FAILED(hr)) { goto Error; } } } for(iNode=0; iNode dBestNodeImprovement) { iBestNode = iNode; dBestNodeImprovement = aNodeSearch[iNode].BestImprovement(); } } Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CCARTTree::GetNodeCount ( int &cNodes ) { cNodes = cTotalNodeCount; return GBM_OK; } GBMRESULT CCARTTree::PredictValid ( CDataset *pData, unsigned long nValid, double *adFadj ) { GBMRESULT hr = GBM_OK; int i=0; for(i=pData->cRows - nValid; icRows; i++) { pRootNode->Predict(pData, i, adFadj[i]); adFadj[i] *= dShrink; } return hr; } GBMRESULT CCARTTree::Predict ( double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow, double &dFadj ) { if(pRootNode != NULL) { pRootNode->Predict(adX,cRow,cCol,iRow,dFadj); dFadj *= dShrink; } else { dFadj = 0.0; } return GBM_OK; } GBMRESULT CCARTTree::Adjust ( unsigned long *aiNodeAssign, double *adFadj, unsigned long cTrain, VEC_P_NODETERMINAL &vecpTermNodes, unsigned long cMinObsInNode ) { unsigned long hr = GBM_OK; unsigned long iObs = 0; hr = pRootNode->Adjust(cMinObsInNode); if(GBM_FAILED(hr)) { goto Error; } // predict for the training observations for(iObs=0; iObsdPrediction; } Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CCARTTree::Print() { GBMRESULT hr = GBM_OK; if(pRootNode != NULL) { pRootNode->PrintSubtree(0); Rprintf("shrinkage: %f\n",dShrink); Rprintf("initial error: %f\n\n",dError); } return hr; } GBMRESULT CCARTTree::GetVarRelativeInfluence ( double *adRelInf ) { GBMRESULT hr = GBM_OK; if(pRootNode != NULL) { hr = pRootNode->GetVarRelativeInfluence(adRelInf); if(GBM_FAILED(hr)) { goto Error; } } Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CCARTTree::TransferTreeToRList ( CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage ) { GBMRESULT hr = GBM_OK; int iNodeID = 0; if(pRootNode != NULL) { hr = pRootNode->TransferTreeToRList(iNodeID, pData, aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld, dShrinkage); } else { hr = GBM_FAIL; } return hr; } gbm/src/gbm_engine.cpp0000644000176200001440000003063213417115400014404 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 //#define NOISY_DEBUG #include "gbm_engine.h" CGBM::CGBM() { adFadj = NULL; adZ = NULL; afInBag = NULL; aiNodeAssign = NULL; aNodeSearch = NULL; cDepth = 0; cMinObsInNode = 0; dBagFraction = 0.0; dLambda = 0.0; fInitialized = false; cTotalInBag = 0; cTrain = 0; cValid = 0; pData = NULL; pDist = NULL; pNodeFactory = NULL; ptreeTemp = NULL; } CGBM::~CGBM() { if(adFadj != NULL) { delete [] adFadj; adFadj = NULL; } if(adZ != NULL) { delete [] adZ; adZ = NULL; } if(afInBag != NULL) { delete [] afInBag; afInBag = NULL; } if(aiNodeAssign != NULL) { delete [] aiNodeAssign; aiNodeAssign = NULL; } if(aNodeSearch != NULL) { delete [] aNodeSearch; aNodeSearch = NULL; } if(ptreeTemp != NULL) { delete ptreeTemp; ptreeTemp = NULL; } // must delete the node factory last!!! at least after deleting trees if(pNodeFactory != NULL) { delete pNodeFactory; pNodeFactory = NULL; } } GBMRESULT CGBM::Initialize ( CDataset *pData, CDistribution *pDist, double dLambda, unsigned long cTrain, double dBagFraction, unsigned long cDepth, unsigned long cMinObsInNode, unsigned long cNumClasses, int cGroups ) { GBMRESULT hr = GBM_OK; unsigned long i=0; if(pData == NULL) { hr = GBM_INVALIDARG; goto Error; } if(pDist == NULL) { hr = GBM_INVALIDARG; goto Error; } this->pData = pData; this->pDist = pDist; this->dLambda = dLambda; this->cTrain = cTrain; this->dBagFraction = dBagFraction; this->cDepth = cDepth; this->cMinObsInNode = cMinObsInNode; this->cGroups = cGroups; // allocate the tree structure ptreeTemp = new CCARTTree; if(ptreeTemp == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } cValid = pData->cRows - cTrain; cTotalInBag = (unsigned long)(dBagFraction*cTrain); adZ = new double[(pData->cRows) * cNumClasses]; if(adZ == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } adFadj = new double[(pData->cRows) * cNumClasses]; if(adFadj == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } for (i=0; i<(pData->cRows)*cNumClasses; i++) { adFadj[i] = 0.0; } pNodeFactory = new CNodeFactory(); if(pNodeFactory == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } hr = pNodeFactory->Initialize(cDepth); if(GBM_FAILED(hr)) { goto Error; } ptreeTemp->Initialize(pNodeFactory); // array for flagging those observations in the bag afInBag = new bool[cTrain]; if(afInBag==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } // aiNodeAssign tracks to which node each training obs belongs aiNodeAssign = new ULONG[cTrain]; if(aiNodeAssign==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } // NodeSearch objects help decide which nodes to split aNodeSearch = new CNodeSearch[2*cDepth+1]; if(aNodeSearch==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } for(i=0; i<2*cDepth+1; i++) { aNodeSearch[i].Initialize(cMinObsInNode); } vecpTermNodes.resize(2*cDepth+1,NULL); fInitialized = true; Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CGBM::Predict ( unsigned long iVar, unsigned long cTrees, double *adF, double *adX, unsigned long cLength ) { GBMRESULT hr = GBM_OK; return hr; } GBMRESULT CGBM::Predict ( double *adX, unsigned long cRow, unsigned long cCol, unsigned long cTrees, double *adF ) { GBMRESULT hr = GBM_OK; return hr; } GBMRESULT CGBM::GetVarRelativeInfluence ( double *adRelInf, unsigned long cTrees ) { GBMRESULT hr = GBM_OK; int iVar=0; for(iVar=0; iVarcCols; iVar++) { adRelInf[iVar] = 0.0; } return hr; } GBMRESULT CGBM::PrintTree() { GBMRESULT hr = GBM_OK; hr = ptreeTemp->Print(); if(GBM_FAILED(hr)) goto Error; Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CGBM::iterate ( double *adF, double &dTrainError, double &dValidError, double &dOOBagImprove, int &cNodes, int cNumClasses, int cClassIdx ) { GBMRESULT hr = GBM_OK; unsigned long i = 0; unsigned long cBagged = 0; int cIdxOff = cClassIdx * (cTrain + cValid); // for(i=0; i < cTrain + cIdxOff; i++){ adF[i] = 0;} if(!fInitialized) { hr = GBM_FAIL; goto Error; } dTrainError = 0.0; dValidError = 0.0; dOOBagImprove = 0.0; vecpTermNodes.assign(2*cDepth+1,NULL); // randomly assign observations to the Bag if (cClassIdx == 0) { if (!IsPairwise()) { // regular instance based training for(i=0; i= cTotalInBag){ break; } */ } // the remainder is not in the bag for( ; iadMisc[i]; if (dGroup != dLastGroup) { if (cBaggedGroups >= cTotalGroupsInBag) { break; } // Group changed, make a new decision chosen = (unif_rand()*(cGroups - cSeenGroups) < cTotalGroupsInBag - cBaggedGroups); if (chosen) { cBaggedGroups++; } dLastGroup = dGroup; cSeenGroups++; } if (chosen) { afInBag[i] = true; cBagged++; } else { afInBag[i] = false; } } // the remainder is not in the bag for( ; iComputeWorkingResponse(pData->adY, pData->adMisc, pData->adOffset, adF, adZ, pData->adWeight, afInBag, cTrain, cIdxOff); if(GBM_FAILED(hr)) { goto Error; } #ifdef NOISY_DEBUG Rprintf("Reset tree\n"); #endif hr = ptreeTemp->Reset(); #ifdef NOISY_DEBUG Rprintf("grow tree\n"); #endif hr = ptreeTemp->grow(&(adZ[cIdxOff]), pData, &(pData->adWeight[cIdxOff]), &(adFadj[cIdxOff]), cTrain, cTotalInBag, dLambda, cDepth, cMinObsInNode, afInBag, aiNodeAssign, aNodeSearch, vecpTermNodes); if(GBM_FAILED(hr)) { goto Error; } #ifdef NOISY_DEBUG Rprintf("get node count\n"); #endif hr = ptreeTemp->GetNodeCount(cNodes); if(GBM_FAILED(hr)) { goto Error; } // Now I have adF, adZ, and vecpTermNodes (new node assignments) // Fit the best constant within each terminal node #ifdef NOISY_DEBUG Rprintf("fit best constant\n"); #endif hr = pDist->FitBestConstant(pData->adY, pData->adMisc, pData->adOffset, pData->adWeight, adF, adZ, aiNodeAssign, cTrain, vecpTermNodes, (2*cNodes+1)/3, // number of terminal nodes cMinObsInNode, afInBag, adFadj, cIdxOff); if(GBM_FAILED(hr)) { goto Error; } // update training predictions // fill in missing nodes where N < cMinObsInNode hr = ptreeTemp->Adjust(aiNodeAssign,&(adFadj[cIdxOff]),cTrain, vecpTermNodes,cMinObsInNode); if(GBM_FAILED(hr)) { goto Error; } ptreeTemp->SetShrinkage(dLambda); if (cClassIdx == (cNumClasses - 1)) { dOOBagImprove = pDist->BagImprovement(pData->adY, pData->adMisc, pData->adOffset, pData->adWeight, adF, adFadj, afInBag, dLambda, cTrain); } // update the training predictions for(i=0; i < cTrain; i++) { int iIdx = i + cIdxOff; adF[iIdx] += dLambda * adFadj[iIdx]; } dTrainError = pDist->Deviance(pData->adY, pData->adMisc, pData->adOffset, pData->adWeight, adF, cTrain, cIdxOff); // update the validation predictions hr = ptreeTemp->PredictValid(pData,cValid,&(adFadj[cIdxOff])); for(i=cTrain; i < cTrain+cValid; i++) { adF[i + cIdxOff] += adFadj[i + cIdxOff]; } if(pData->fHasOffset) { dValidError = pDist->Deviance(pData->adY, pData->adMisc, pData->adOffset, pData->adWeight, adF, cValid, cIdxOff + cTrain); } else { dValidError = pDist->Deviance(pData->adY, pData->adMisc, NULL, pData->adWeight, adF, cValid, cIdxOff + cTrain); } Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CGBM::TransferTreeToRList ( int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld ) { GBMRESULT hr = GBM_OK; hr = ptreeTemp->TransferTreeToRList(pData, aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld, dLambda); return hr; } gbm/src/gaussian.h0000644000176200001440000000547513417115400013600 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: gaussian.h // // License: GNU GPL (version 2 or later) // // Contents: gaussian object // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef GAUSSIAN_H #define GAUSSIAN_H #include "distribution.h" class CGaussian : public CDistribution { public: CGaussian(); virtual ~CGaussian(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adZ, bool *afInBag, unsigned long nTrain, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); }; #endif // GAUSSIAN_H gbm/src/pairwise.h0000644000176200001440000003254113417115400013603 0ustar liggesusers//--------------------------------------------------------------------------------- // GBM alteration by Stefan Schroedl (schroedl@a9.com) // // File: pairwise // // Contains: Distribution object to implement pairwise distributions for ranking // // History: 12/15/2011 Created // //--------------------------------------------------------------------------------- // This file implements the LambdaMart algorithm for learning ranking functions. // The main idea is to model p_ij, the probability that item i should rank higher // than j, as // p_ij = 1 / (1 + exp(s_i - s_j)), // where s_i, s_j are the model scores for the two items. // // While scores are still generated one item at a time, gradients for learning // depend on _pairs_ of items. The algorithm is aware of _groups_; all pairs of items // with different labels, belonging to the same group, are used for training. A // typical application is ranking for web search: groups correspond to user queries, // and items to (feature vectors of) web pages in the associated match set. // // Different IR measures can be chosen, to weight instances based on their rank. // Generally, changes in top ranks should have more influence than changes at the // bottom of the result list. This function provides the following options: // // * CONC (concordance index, fraction of correctly raked pairs. This is a generalization // of Area under the ROC Curve (AUC) from binary to multivalued labels. // * Normalized Discounted Cumulative Gain (NDCG) // * Mean Reciprocal Rank (MRR) of the highest-ranked positive instance. // * Mean Average Precision (MAP), a generalization of MRR to multiple positive instances. // // While MRR and MAP expect binary target labels, CONC and NDCG can equally work with // continuous values. More precisely, NDCG is defined as // \Sum_{r=1..n} val_r / log2(r+1), // where val_r is the user-specified target for the item at rank r. Note that this is // contrast to some definitions of NDCG that assume integer targets s_i, and // implicitly transform val_r = 2^{s+i}-1. // // Groups are specified using an integer vector of the same length as the training instances. // // Optionally, item weights can be supplied; it is assumed that all instances belonging // to the same group have the same weight. // // For background information on LambdaMart, please see e.g. the following papers: // // * Burges, C., "From RankNet to LambdaRank to LambdaMART: An Overview", Microsoft // Research Technical Report MSR-TR-2010-82, 2010 // * Donmez, P., K. Svore, K., and Burges, C., "On the Local Optimality of // LambdaRank", SIGIR 2009 // * Burges, C., Ragno, R., and Le, Q., "Learning to Rank with Non-Smooth Cost // Functions", NIPS 2006 #ifndef PAIRWISE_H #define PAIRWISE_H #include "distribution.h" #include "buildinfo.h" // A class to rerank groups based on (intermediate) scores // Note: Smaller ranks are better, the top rank is 1 class CRanker { public: // Auxiliary structure to store score and rank typedef std::pair CDoubleUintPair; // Buffer memory allocation void Init(unsigned int cMaxItemsPerGroup); // Initialize ranker with scores of items belonging to the same group // - adScores is a score array, (at least) cNumItems long bool SetGroupScores(const double* const adScores, unsigned int cNumItems); // Perform the ranking // - Return true if any item changed its rank bool Rank(); // Getter / setter unsigned int GetNumItems() const { return cNumItems; } unsigned int GetRank(int i) const { return vecdipScoreRank[i].second; } unsigned int GetItem(unsigned int iRank) const { return (vecpdipScoreRank[iRank-1] - &(vecdipScoreRank[0])); } void SetRank(int i, unsigned int r) { vecdipScoreRank[i].second = r; } void AddToScore(int i, double delta) { vecdipScoreRank[i].first += delta; } protected: // Number of items in current group unsigned int cNumItems; // Pairs of (score, rank) for current group vector vecdipScoreRank; // Array of pointers to elements of vecdipScoreRank, used for sorting // Note: We need a separate array for sorting in order to be able to // quickly look up the rank for any given item. vector vecpdipScoreRank; }; // Abstract base class for all IR Measures class CIRMeasure { public: // Constructor CIRMeasure() : cRankCutoff(UINT_MAX) {} // Destructor virtual ~CIRMeasure() { } // Getter / Setter unsigned int GetCutoffRank() const { return cRankCutoff; } void SetCutoffRank(unsigned int cRankCutoff) { this->cRankCutoff = cRankCutoff; } // Auxiliary function for sanity check bool AnyPairs(const double* const adY, unsigned int cNumItems) const { return (cNumItems >= 2 // at least two instances && adY[0] > 0.0 // at least one positive example (targets are non-increasing) && adY[cNumItems-1] != adY[0]); // at least two different targets } // Memory allocation virtual void Init(unsigned long cMaxGroup, unsigned long cNumItems, unsigned int cRankCutoff = UINT_MAX) { this->cRankCutoff = cRankCutoff; } // Calculate the IR measure for the group of items set in the ranker. // Precondition: CRanker::SetGroupScores() has been called // - adY are the target scores virtual double Measure(const double* const adY, const CRanker& ranker) = 0; // Calculate the maximum achievable IR measure for a given group. // Side effect: the ranker state might change // Default implementation for MRR and MAP: if any positive items exist, // ranking them at the top yields a perfect measure of 1. virtual double MaxMeasure(unsigned int iGroup, const double* const adY, unsigned int cNumItems) { return (AnyPairs(adY, cNumItems) ? 1.0 : 0.0); } // Calculate the difference in the IR measure caused by swapping the ranks of two items. // Assumptions: // * iItemBetter has a higher label than iItemWorse (i.e., adY[iItemBetter] > adY[iItemWorse]). // * ranker.setGroup() has been called. virtual double SwapCost(int iItemBetter, int iItemWorse, const double* const adY, const CRanker& ranker) const = 0; protected: // Cut-off rank below which items are ignored for measure unsigned int cRankCutoff; }; // Class to implement IR Measure 'CONC' (fraction of concordant pairs). For the case of binary labels, this is // equivalent to the area under the ROC curve (AUC). class CConc : public CIRMeasure { public: virtual ~CConc() { } void Init(unsigned long cMaxGroup, unsigned long cNumItems, unsigned int cRankCutoff = UINT_MAX); double Measure(const double* const adY, const CRanker& ranker); // The maximum number of correctly classified pairs is simply all pairs with different labels double MaxMeasure(unsigned int iGroup, const double* const adY, unsigned int cNumItems) { return PairCount(iGroup, adY, cNumItems); } // (Cached) calculation of the number of pairs with different labels unsigned int PairCount(unsigned int iGroup, const double* const adY, unsigned int cNumItems); double SwapCost(int iItemBetter, int iItemWorse, const double* const adY, const CRanker& ranker) const; protected: // Calculate the number of pairs with different labels int ComputePairCount(const double* const adY, unsigned int cNumItems); // Caches the number of pairs with different labels, for each group vector veccPairCount; }; // Class to implement IR Measure 'Normalized Discounted Cumulative Gain' // Note: Labels can have any non-negative value class CNDCG : public CIRMeasure { public: void Init(unsigned long cMaxGroup, unsigned long cNumItems, unsigned int cRankCutoff = UINT_MAX); // Compute DCG double Measure(const double* const adY, const CRanker& ranker); // Compute best possible DCG double MaxMeasure(unsigned int iGroup, const double* const adY, unsigned int cNumItems); double SwapCost(int iItemBetter, int iItemWorse, const double* const adY, const CRanker& ranker) const; protected: // Lookup table for rank weight (w(rank) = 1/log2(1+rank)) vector vecdRankWeight; // Caches the maximum achievable DCG, for each group vector vecdMaxDCG; }; // Class to implement IR Measure 'Mean Reciprocal Rank' // Assumption: Labels are 0 or 1 class CMRR : public CIRMeasure { public: double Measure(const double* const adY, const CRanker& ranker); double SwapCost(int iItemPos, int iItemNeg, const double* const adY, const CRanker& ranker) const; }; // Class to implement IR Measure 'Mean Average Precision' // Assumption: Labels are 0 or 1 class CMAP : public CIRMeasure { public: void Init(unsigned long cMaxGroup, unsigned long cNumItems, unsigned int cRankCutoff = UINT_MAX); double Measure(const double* const adY, const CRanker& ranker); double SwapCost(int iItemPos, int iItemNeg, const double* const adY, const CRanker& ranker) const; protected: // Buffer to hold positions of positive examples mutable vector veccRankPos; }; // Main class for 'pairwise' distribution // Notes and Assumptions: // * The items are sorted such that // * Instances belonging to the same group occur in // a contiguous range // * Within a group, labels are non-increasing. // * adGroup supplies the group ID (positive integer, but double // format for compliance with the base class interface). // * The targets adY are non-negative values, and binary {0,1} // for measures MRR and MAP. // * Higher IR measures are better. // * Only pairs with different labels are used for training. // * Instance weights (adWeight) are constant among groups. // * CPairwise::Initialize() is called before any of the other // functions, with same values for adY, adGroup, adWeight, and // nTrain. Certain values have to be precomputed for // efficiency. class CPairwise : public CDistribution { public: // Constructor: determine IR measure as either "conc", "map", "mrr", or "ndcg" CPairwise(const char* szIRMeasure); virtual ~CPairwise(); GBMRESULT Initialize(double *adY, double *adGroup, double *adOffset, double *adWeight, unsigned long cLength); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adGroup, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff); double Deviance(double *adY, double *adGroup, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); GBMRESULT InitF(double *adY, double *adGroup, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adGroup, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double BagImprovement(double *adY, double *adGroup, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); protected: // Calculate and accumulate up the gradients and Hessians from all training pairs void ComputeLambdas(int iGroup, unsigned int cNumItems, const double* const adY, const double* const adF, const double* const adWeight, double* adZ, double* adDeriv); CIRMeasure* pirm; // The IR measure to use CRanker ranker; // The ranker vector vecdHessian; // Second derivative of loss function, for each training instance; used for Newton step vector vecdNum; // Buffer used for numerator in FitBestConstant(), for each node vector vecdDenom; // Buffer used for denominator in FitBestConstant(), for each node vector vecdFPlusOffset; // Temporary buffer for (adF + adOffset), if the latter is not null }; #endif // PAIRWISE_H gbm/src/adaboost.h0000644000176200001440000000555613417115400013562 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: adaboost.h // // License: GNU GPL (version 2 or later) // // Contents: Object for fitting for the AdaBoost loss function // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef ADABOOST_H #define ADABOOST_H #include "distribution.h" class CAdaBoost : public CDistribution { public: CAdaBoost(); virtual ~CAdaBoost(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adZ, bool *afInBag, unsigned long nTrain, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: vector vecdNum; vector vecdDen; }; #endif // ADABOOST_H gbm/src/pairwise.cpp0000644000176200001440000007344713417115400014150 0ustar liggesusers// Implementation file for 'pairwise' distribution // // Author: Stefan Schroedl (schroedl@a9.com) #include "pairwise.h" #include #include #include #include //#define NOISY_DEBUG #ifdef NOISY_DEBUG #endif void CRanker::Init(unsigned int cMaxItemsPerGroup) { // Allocate sorting buffers vecdipScoreRank.resize(cMaxItemsPerGroup); vecpdipScoreRank.resize(cMaxItemsPerGroup); } bool CRanker::SetGroupScores(const double* const adScores, const unsigned int cNumItems) { const double dEPS = 1e-10; if (cNumItems > vecdipScoreRank.size()) { // Allocate additional space // (We should never get here if CPairwise::Initialize has been called before, as expected) Init(cNumItems); } this->cNumItems = cNumItems; // Copy scores to buffer, and // initialize pointer array to score entries for(unsigned int i = 0; i < cNumItems; i++) { // Add small random number to break possible ties vecdipScoreRank[i].first = adScores[i] + dEPS * (unif_rand() - 0.5); vecpdipScoreRank[i] = &(vecdipScoreRank[i]); } return true; } // Auxiliary struct to compare pair pointers // decreasing order based on the first component (score) struct CDoubleUintPairPtrComparison { bool operator() (const CRanker::CDoubleUintPair* lhs, const CRanker::CDoubleUintPair* rhs) { return (lhs->first > rhs->first); } }; bool CRanker::Rank() { // Sort the pointer array, based on decreasing score CDoubleUintPairPtrComparison comp; sort(vecpdipScoreRank.begin(), vecpdipScoreRank.begin() + cNumItems, comp); bool bChanged = false; // Create inverted rank lookup for(unsigned int i = 0; i < cNumItems; i++) { // Note: ranks are 1-based const unsigned int cNewRank = i + 1; if (!bChanged) { bChanged = (cNewRank != vecpdipScoreRank[i]->second); } // Store the rank with the corresponding score in the vecdipScoreRank array vecpdipScoreRank[i]->second = cNewRank; } return bChanged; } void CConc::Init ( unsigned long cMaxGroup, unsigned long cMaxItemsPerGroup, unsigned int cRankCutoff ) { CIRMeasure::Init(cMaxGroup, cMaxItemsPerGroup, cRankCutoff); veccPairCount.resize(cMaxGroup + 1, -1); } unsigned int CConc::PairCount(unsigned int iGroup, const double* const adY, unsigned int cNumItems) { if (iGroup >= veccPairCount.size()) { // Allocate additional space // (We should never get here if CPairwise::Initialize has been called before, as expected) veccPairCount.resize(iGroup + 1, -1); } if (veccPairCount[iGroup] < 0.0) { // Not yet initialized veccPairCount[iGroup] = ComputePairCount(adY, cNumItems); } return veccPairCount[iGroup]; } // Calculate the number of pairs with different labels, and store in veccPairCount // Assumption: instances are sorted such that labels are non-increasing int CConc::ComputePairCount(const double* const adY, unsigned int cNumItems) { if (!AnyPairs(adY, cNumItems)) { return 0; } double dLabelCurrent = adY[0]; int iLabelEnd = 0; // End of range with higher labels int cPairs = 0; for (unsigned int j = 1; j < cNumItems; j++) { if (adY[j] != dLabelCurrent) { // i.e., dYj < dLabelCurrent iLabelEnd = j; dLabelCurrent = adY[j]; } // All items in 0 .. iLabelEnd - 1 are better than item j; // i.e, we have pairs (j,0), (j,1), ... (j, iLabelEnd - 1) cPairs += iLabelEnd; } return cPairs; } // Count the number of correctly ranked pairs with different labels double CConc::Measure(const double* const adY, const CRanker& ranker) { double dLabelCurrent = adY[0]; int iLabelEnd = 0; // End of the range with higher labels int cGoodPairs = 0; for (unsigned int j = 1; j < ranker.GetNumItems(); j++) { const double dYj = adY[j]; if (dYj != dLabelCurrent) { // i.e., dYj < dLabelCurrent iLabelEnd = j; dLabelCurrent = dYj; } // All items in 0 .. iLabelEnd - 1 are better than this item for (int i = 0; i < iLabelEnd; i++) { if (ranker.GetRank(i) < ranker.GetRank(j)) { cGoodPairs++; } } } return cGoodPairs; } double CConc::SwapCost(int iItemBetter, int iItemWorse, const double* const adY, const CRanker& ranker) const { // Note: this implementation can handle arbitrary non-negative target values. // For binary (0/1) targets, the swap cost would reduce to the much simpler expression: // (int)ranker.GetRank(iItemBetter) - (int)ranker.GetRank(iItemWorse) const unsigned int cRankBetter = ranker.GetRank(iItemBetter); const unsigned int cRankWorse = ranker.GetRank(iItemWorse); // Which one of the two has the higher rank? unsigned int cRankUpper, cRankLower; double dYUpper, dYLower; int cDiff; if (cRankBetter > cRankWorse) { // Concordance increasing cRankUpper = cRankWorse; cRankLower = cRankBetter; dYUpper = adY[iItemWorse]; dYLower = adY[iItemBetter]; cDiff = 1; // The direct impact of the pair (iItemBetter, iItemWorse) } else { // Concordance decreasing cRankUpper = cRankBetter; cRankLower = cRankWorse; dYUpper = adY[iItemBetter]; dYLower = adY[iItemWorse]; cDiff = -1; // // The direct impact of the pair (iItemBetter, iItemWorse) } // Compute indirect impact for pairs involving items in between the two for (unsigned int cRank = cRankUpper + 1; cRank < cRankLower; cRank++) { const double dYi = adY[ranker.GetItem(cRank)]; double dScoreDiff = dYi - dYLower; if (dScoreDiff != 0) { cDiff += (dScoreDiff < 0) ? 1 : -1; } dScoreDiff = dYi - dYUpper; if (dScoreDiff != 0) { cDiff += (dScoreDiff < 0) ? -1 : 1; } } return cDiff; } void CNDCG::Init ( unsigned long cMaxGroup, unsigned long cMaxItemsPerGroup, unsigned int cRankCutoff ) { CIRMeasure::Init(cMaxGroup, cMaxItemsPerGroup, cRankCutoff); // Initialize rank weights (note: ranks are 1-based) vecdRankWeight.resize(cMaxItemsPerGroup + 1, 0.0); const unsigned int cMaxRank = std::min((unsigned int)cMaxItemsPerGroup, GetCutoffRank()); // Precompute rank weights for (unsigned int i = 1; i <= cMaxRank; i++) { vecdRankWeight[i] = log((double)2) / log((double)(i+1)); } // Allocate buffer vecdMaxDCG.resize(cMaxGroup + 1, -1.0); } // Sum of target values, weighted by rank weight double CNDCG::Measure(const double* const adY, const CRanker& ranker) { double dScore = 0; for (unsigned int i = 0; i < ranker.GetNumItems(); i++) { dScore += adY[i] * vecdRankWeight[ranker.GetRank(i)]; } return dScore; } double CNDCG::MaxMeasure(unsigned int iGroup, const double* const adY, unsigned int cNumItems) { if (iGroup >= vecdMaxDCG.size()) { // Allocate additional space // (We should never get here if CPairwise::Initialize has been called before, as expected) vecdMaxDCG.resize(iGroup + 1, -1.0); } if (vecdMaxDCG[iGroup] < 0.0) { // Not initialized if (!AnyPairs(adY, cNumItems)) { // No training pairs exist vecdMaxDCG[iGroup] = 0.0; } else { // Compute maximum possible DCG. // Note: By assumption, items are pre-sorted by descending score. double dScore = 0; unsigned int i = 0; while (i < cNumItems && adY[i] > 0) { // Note: Due to sorting, we can terminate early for a zero score. dScore += adY[i] * vecdRankWeight[i + 1]; i++; } vecdMaxDCG[iGroup] = dScore; #ifdef NOISY_DEBUG if (vecdMaxDCG[iGroup] == 0) { Rprintf("max score is 0: iGroup = %d, maxScore = %f, sz = %d\n", iGroup, vecdMaxDCG[iGroup], ranker.GetNumItems()); assert(false); } #endif } } return vecdMaxDCG[iGroup]; } double CNDCG::SwapCost(int iItemBetter, int iItemWorse, const double* const adY, const CRanker& ranker) const { const unsigned int cRanki = ranker.GetRank(iItemBetter); const unsigned int cRankj = ranker.GetRank(iItemWorse); return (vecdRankWeight[cRanki] - vecdRankWeight[cRankj]) * (adY[iItemBetter] - adY[iItemWorse]); } // Auxiliary function to find the top rank of a positive item (cRankTop), and the number of positive items (cPos) inline void TopRankPos(const double* const adY, const CRanker& ranker, unsigned int& cRankTop, unsigned int& cPos) { const unsigned int cNumItems = ranker.GetNumItems(); cRankTop = cNumItems + 1; // Ranks are 1-based for (cPos = 0; cPos < cNumItems; cPos++) { if (adY[cPos] <= 0.0) { // All subsequent items are zero, because of presorting return; } cRankTop = min(cRankTop, ranker.GetRank(cPos)); } } double CMRR::Measure(const double* const adY, const CRanker& ranker) { unsigned int cRankTop, cPos; TopRankPos(adY, ranker, cRankTop, cPos); const unsigned int cNumItems = min(ranker.GetNumItems(), GetCutoffRank()); if (cRankTop >= cNumItems + 1) { // No positive item found return 0.0; } // Ranks start at 1 return 1.0 / cRankTop; } double CMRR::SwapCost(int iItemPos, int iItemNeg, const double* const adY, const CRanker& ranker) const { unsigned int cRankTop, cPos; TopRankPos(adY, ranker, cRankTop, cPos); const unsigned int cNumItems = ranker.GetNumItems(); if (cRankTop >= cNumItems + 1 // No positive item (ranks are 1-based) || cPos >= cNumItems) // No negative item { return 0.0; } const unsigned int cRankPos = ranker.GetRank(iItemPos); const unsigned int cRankNeg = ranker.GetRank(iItemNeg); const unsigned int cCutoffRank = GetCutoffRank(); const double dMeasureCurrent = (cRankTop > cCutoffRank) ? 0.0 : 1.0 / cRankTop; const double dMeasureNeg = (cRankNeg > cCutoffRank) ? 0.0 : 1.0 / cRankNeg; // Only pairs where the negative item is above the top positive result, // or else where the positive item *is* the top item, can change the MRR return ((cRankNeg < cRankTop || cRankPos == cRankTop) ? (dMeasureNeg - dMeasureCurrent) : 0.0); } void CMAP::Init ( unsigned long cMaxGroup, unsigned long cMaxItemsPerGroup, unsigned int cRankCutoff ) { CIRMeasure::Init(cMaxGroup, cMaxItemsPerGroup, cRankCutoff); // Allocate rank buffer (note: ranks are 1-based) veccRankPos.resize(cMaxItemsPerGroup + 1); } // Auxiliary function to find the sorted ranks of positive items (veccRankPos), and their number (cPos) inline void SortRankPos(const double* const adY, const CRanker& ranker, vector& veccRankPos, unsigned int& cPos) { // Store all ranks of positive items in veccRankPos for (cPos = 0; cPos < ranker.GetNumItems(); cPos++) { if (adY[cPos] <= 0.0) { // All subsequent items are zero, because of presorting break; } veccRankPos[cPos] = ranker.GetRank(cPos); } sort(veccRankPos.begin(), veccRankPos.begin() + cPos); } double CMAP::SwapCost(int iItemPos, int iItemNeg, const double* const adY, const CRanker& ranker) const { unsigned int cPos; SortRankPos(adY, ranker, veccRankPos, cPos); if (cPos == 0) { return 0.0; } // Now veccRankPos[i] is the i-th highest rank of a positive item, and // cPos is the total number of positive items. const int iRankItemPos = ranker.GetRank(iItemPos); const int iRankItemNeg = ranker.GetRank(iItemNeg); // Search for the position of the two items to swap const vector::iterator itItemPos = upper_bound(veccRankPos.begin(), veccRankPos.begin() + cPos, iRankItemPos); const vector::iterator itItemNeg = upper_bound(veccRankPos.begin(), veccRankPos.begin() + cPos, iRankItemNeg); // The number of positive items up to and including iItemPos const unsigned int cNumPosNotBelowItemPos = (unsigned int)(itItemPos - veccRankPos.begin()); // The number of positive items up to iItemNeg (Note: Cannot include iItemNeg itself) const unsigned int cNumPosAboveItemNeg = (unsigned int)(itItemNeg - veccRankPos.begin()); // Range of indices of positive items between iRankItemPos and iRankItemNeg (exclusively) int cIntermediateHigh, cIntermediateLow; // Current contribution of iItemPos double dContribBefore = (double) cNumPosNotBelowItemPos / iRankItemPos; double dSign, dContribAfter; if (iRankItemNeg > iRankItemPos) { // MAP is decreasing dSign = -1.0; // The first positive item after iRankItemPos cIntermediateLow = cNumPosNotBelowItemPos; // The last positive item before iRankItemNeg cIntermediateHigh = cNumPosAboveItemNeg - 1; // Note: iItemPos already counted in cNumPosAboveItemNeg dContribAfter = (double)cNumPosAboveItemNeg / iRankItemNeg; } else { // MAP is increasing dSign = 1.0; // The first positive result after iRankItemNeg cIntermediateLow = cNumPosAboveItemNeg; // The first positive result after iRankItemPos, minus iItemPos itself cIntermediateHigh = cNumPosNotBelowItemPos - 2; // Note: iItemPos not yet counted in cNumPosAboveItemNeg dContribAfter = (double) (cNumPosAboveItemNeg + 1) / iRankItemNeg; } // The direct effect of switching iItemPos double dDiff = dContribAfter - dContribBefore; // The indirect effect for all items in between the two items for (int j = cIntermediateLow; j <= cIntermediateHigh; j++) { dDiff += dSign / veccRankPos[j]; } return dDiff / cPos; } double CMAP::Measure(const double* const adY, const CRanker& ranker) { unsigned int cPos; SortRankPos(adY, ranker, veccRankPos, cPos); if (cPos == 0) { return 0.0; } // Now veccRankPos[i] is the i-th highest rank of a positive item double dPrec = 0.0; for (unsigned int j = 0; j < cPos; j++) { dPrec += double(j + 1) / veccRankPos[j]; } return dPrec / cPos; } CPairwise::CPairwise(const char* szIRMeasure) { // Construct the IR Measure if (!strcmp(szIRMeasure, "conc")) { pirm = new CConc(); } else if (!strcmp(szIRMeasure, "map")) { pirm = new CMAP(); } else if (!strcmp(szIRMeasure, "mrr")) { pirm = new CMRR(); } else { if (strcmp(szIRMeasure, "ndcg")) { Rprintf("Unknown IR measure '%s' in initialization, using 'ndcg' instead\n", szIRMeasure); } pirm = new CNDCG(); } } CPairwise::~CPairwise() { delete pirm; } // Auxiliary function for addition of optional offset parameter inline const double* OffsetVector(const double* const adX, const double* const adOffset, unsigned int iStart, unsigned int iEnd, vector& vecBuffer) { if (adOffset == NULL) { // Optional second argument is not set, just return first one return adX + iStart; } else { for (unsigned int i = iStart, iOut = 0; i < iEnd; i++, iOut++) { vecBuffer[iOut] = adX[i] + adOffset[i]; } return &vecBuffer[0]; } } GBMRESULT CPairwise::ComputeWorkingResponse ( double *adY, double *adGroup, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { #ifdef NOISY_DEBUG Rprintf("compute working response, nTrain = %u, cIdxOff = %d\n", nTrain, cIdxOff); #endif if (nTrain <= 0) { return GBM_OK; } try { // Iterate through all groups, compute gradients unsigned int iItemStart = 0; unsigned int iItemEnd = 0; while (iItemStart < nTrain) { adZ[iItemEnd] = 0; vecdHessian[iItemEnd] = 0; const double dGroup = adGroup[iItemStart]; // Find end of current group, initialize working response for (iItemEnd = iItemStart + 1; iItemEnd < nTrain && adGroup[iItemEnd] == dGroup; iItemEnd++) { // Clear gradients from last iteration adZ[iItemEnd] = 0; vecdHessian[iItemEnd] = 0; } #ifdef NOISY_DEBUG // Check sorting for (unsigned int i = iItemStart; i < iItemEnd-1; i++) { assert(adY[i] >= adY[i+1]); } #endif if (afInBag[iItemStart]) { // Group is part of the training set const int cNumItems = iItemEnd - iItemStart; // If offset given, add up current scores const double* adFPlusOffset = OffsetVector(adF, adOffset, iItemStart, iItemEnd, vecdFPlusOffset); // Accumulate gradients ComputeLambdas((int)dGroup, cNumItems, adY + iItemStart, adFPlusOffset, adWeight + iItemStart, adZ + iItemStart, &vecdHessian[iItemStart]); } // Next group iItemStart = iItemEnd; } } catch (std::bad_alloc&) { return GBM_OUTOFMEMORY; } return GBM_OK; } // Referring to MSR-TR-2010-82-2, section 7 (see also the vignette): // // Let P be the set of pairs (i,j) where Y(i)>Y(j) (i is better than j). // The approximation to the IR measure is the utility function C (to be maximized) // C // = \Sum_{(i,j) in P} |Delta Z_ij| C(s_i - s_j) // = \Sum_{(i,j) in P} |Delta Z_ij| / (1 + exp(-(s_i - s_j))), // where |Delta Z_ij| is the cost of swapping (only) i and j in the current ranking, // and s_i, s_j are the prediction scores (sum of the tree predictions) for items // i and j. // // For (i,j) in P, define // lambda_ij // = dC(s_i-s_j) / ds_i // = - |Delta Z_ij| / (1 + exp(s_i - s_j)) // = - |Delta Z_ij| * rho_ij, // with // rho_ij = - lambda_ij / |Delta Z_ij| = 1 / (1 + exp(s_i - s_j)) // // So the gradient of C with respect to s_i is // dC / ds_i // =(def) lambda_i // = \Sum_{j|(i,j) in P} lambda_ij - \Sum_{j|(j,i) in P} lambda_ji // = - \Sum_{j|(i,j) in P} |Delta Z_ij| * rho_ij // + \Sum_{j|(j,i) in P} |Delta Z_ji| * rho_ji; // it is stored in adZ[i]. // // The second derivative is // d^2C / ds_i^2 // =(def) gamma_i // = \Sum_{j|(i,j) in P} |Delta Z_ij| * rho_ij * (1-rho_ij) // - \Sum_{j|(j,i) in P} |Delta Z_ji| * rho_ji * (1-rho_ji); // it is stored in vecdHessian[i]. // // The Newton step for a particular leaf node is (a fraction of) // g'/g'', where g' (resp. g'') is the sum of dC/ds_i = lambda_i // (resp. d^2C/d^2s_i = gamma_i) over all instances falling into this leaf. This // summation is calculated later in CPairwise::FitBestConstant(). void CPairwise::ComputeLambdas(int iGroup, unsigned int cNumItems, const double* const adY, const double* const adF, const double* const adWeight, double* adZ, double* adDeriv) { // Assumption: Weights are constant within group if (adWeight[0] <= 0) { return; } // Normalize for maximum achievable group score const double dMaxScore = pirm->MaxMeasure(iGroup, adY, cNumItems); if (dMaxScore <= 0.0) { // No pairs return; } // Rank items by current score ranker.SetGroupScores(adF, cNumItems); ranker.Rank(); double dLabelCurrent = adY[0]; // First index of instance that has dLabelCurrent // (i.e., each smaller index corresponds to better item) unsigned int iLabelCurrentStart = 0; // Number of pairs with unequal labels unsigned int cPairs = 0; #ifdef NOISY_DEBUG double dMeasureBefore = pirm->Measure(adY, ranker); #endif for (unsigned int j = 1; j < cNumItems; j++) { const double dYj = adY[j]; if (dYj != dLabelCurrent) { iLabelCurrentStart = j; dLabelCurrent = dYj; } for (unsigned int i = 0; i < iLabelCurrentStart; i++) { // Instance i is better than j const double dSwapCost = fabs(pirm->SwapCost(i, j, adY, ranker)); #ifdef NOISY_DEBUG double dDelta = fabs(pirm->SwapCost(i, j, adY, ranker)); const int cRanki = ranker.GetRank(i); const int cRankj = ranker.GetRank(j); ranker.SetRank(i, cRankj); ranker.SetRank(j, cRanki); double dMeasureAfter = pirm->Measure(adY, ranker); if (fabs(dMeasureBefore-dMeasureAfter) - dDelta > 1e-5) { Rprintf("%f %f %f %f %f %d %d\n", pirm->SwapCost(i, j, adY, ranker), dMeasureBefore, dMeasureAfter, dMeasureBefore - dMeasureAfter, dDelta , i, j); for (unsigned int k = 0; k < cNumItems; k++) { Rprintf("%d\t%d\t%f\t%f\n", k, ranker.GetRank(k), adY[k], adF[k]); } assert(false); } assert(fabs(dMeasureBefore - dMeasureAfter) - fabs(dDelta) < 1e-5); ranker.SetRank(j, cRankj); ranker.SetRank(i, cRanki); #endif assert(isfinite(dSwapCost)); if (dSwapCost > 0.0) { cPairs++; const double dRhoij = 1.0 / (1.0 + exp(adF[i]- adF[j])) ; assert(isfinite(dRhoij)); const double dLambdaij = dSwapCost * dRhoij; adZ[i] += dLambdaij; adZ[j] -= dLambdaij; const double dDerivij = dLambdaij * (1.0 - dRhoij); assert(dDerivij >= 0); adDeriv[i] += dDerivij; adDeriv[j] += dDerivij; } } } if (cPairs > 0) { // Normalize for number of training pairs const double dQNorm = 1.0 / (dMaxScore * cPairs); for (unsigned int j = 0; j < cNumItems; j++) { adZ[j] *= dQNorm; adDeriv[j] *= dQNorm; } } } GBMRESULT CPairwise::Initialize ( double *adY, double *adGroup, double *adOffset, double *adWeight, unsigned long cLength ) { if (cLength <= 0) { return GBM_OK; } try { // Allocate memory for derivative buffer vecdHessian.resize(cLength); // Count the groups and number of items per group unsigned int cMaxItemsPerGroup = 0; double dMaxGroup = 0; unsigned int iItemStart = 0; unsigned int iItemEnd = 0; while (iItemStart < cLength) { const double dGroup = adGroup[iItemStart]; // Find end of current group for (iItemEnd = iItemStart + 1; iItemEnd < cLength && adGroup[iItemEnd] == dGroup; iItemEnd++); const unsigned int cNumItems = iItemEnd - iItemStart; if (cNumItems > cMaxItemsPerGroup) { cMaxItemsPerGroup = cNumItems; } if (dGroup > dMaxGroup) { dMaxGroup = dGroup; } // Next group iItemStart = iItemEnd; } // Allocate buffer for offset addition vecdFPlusOffset.resize(cMaxItemsPerGroup); // Allocate ranker memory ranker.Init(cMaxItemsPerGroup); // Allocate IR measure memory // The last element of adGroup specifies the cutoff // (zero means no cutoff) unsigned int cRankCutoff = cMaxItemsPerGroup; if (adGroup[cLength] > 0) { cRankCutoff = (unsigned int)adGroup[cLength]; } pirm->Init((unsigned long)dMaxGroup, cMaxItemsPerGroup, cRankCutoff); #ifdef NOISY_DEBUG Rprintf("Initialization: instances=%ld, groups=%u, max items per group=%u, rank cutoff=%u, offset specified: %d\n", cLength, (unsigned long)dMaxGroup, cMaxItemsPerGroup, cRankCutoff, (adOffset != NULL)); #endif } catch (std::bad_alloc&) { return GBM_OUTOFMEMORY; } return GBM_OK; } GBMRESULT CPairwise::InitF ( double *adY, double *adGroup, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength ) { dInitF = 0.0; return GBM_OK; } double CPairwise::Deviance ( double *adY, double *adGroup, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff ) { #ifdef NOISY_DEBUG Rprintf("Deviance, cLength = %u, cIdxOff = %d\n", cLength, cIdxOff); #endif if (cLength <= 0) { return 0; } double dL = 0.0; double dW = 0.0; unsigned int iItemStart = cIdxOff; unsigned int iItemEnd = iItemStart; const unsigned int cEnd = cLength + cIdxOff; while (iItemStart < cEnd) { const double dGroup = adGroup[iItemStart]; const double dWi = adWeight[iItemStart]; // Find end of current group for (iItemEnd = iItemStart + 1; iItemEnd < cEnd && adGroup[iItemEnd] == dGroup; iItemEnd++) ; const int cNumItems = iItemEnd - iItemStart; const double dMaxScore = pirm->MaxMeasure((int)dGroup, adY + iItemStart, cNumItems); if (dMaxScore > 0.0) { // Rank items by current score // If offset given, add up current scores const double* adFPlusOffset = OffsetVector(adF, adOffset, iItemStart, iItemEnd, vecdFPlusOffset); ranker.SetGroupScores(adFPlusOffset, cNumItems); ranker.Rank(); dL += dWi * pirm->Measure(adY + iItemStart, ranker) / dMaxScore; dW += dWi; } // Next group iItemStart = iItemEnd; } // Loss = 1 - utility return 1.0 - dL / dW; } GBMRESULT CPairwise::FitBestConstant ( double *adY, double *adGroup, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff ) { #ifdef NOISY_DEBUG Rprintf("FitBestConstant, nTrain = %u, cIdxOff = %d, cTermNodes = %d, \n", nTrain, cIdxOff, cTermNodes); #endif // Assumption: ComputeWorkingResponse() has been executed before with // the same arguments try { // Allocate space for numerators and denominators, and set to zero vecdNum.reserve(cTermNodes); vecdDenom.reserve(cTermNodes); for (unsigned int i = 0; i < cTermNodes; i++) { vecdNum[i] = 0.0; vecdDenom[i] = 0.0; } } catch (std::bad_alloc&) { return GBM_OUTOFMEMORY; } for (unsigned int iObs = 0; iObs < nTrain; iObs++) { if (afInBag[iObs]) { assert(isfinite(adW[iObs])); assert(isfinite(adZ[iObs])); assert(isfinite(vecdHessian[iObs])); vecdNum[aiNodeAssign[iObs]] += adW[iObs] * adZ[iObs]; vecdDenom[aiNodeAssign[iObs]] += adW[iObs] * vecdHessian[iObs]; } } for (unsigned int iNode = 0; iNode < cTermNodes; iNode++) { if (vecpTermNodes[iNode] != NULL) { vecpTermNodes[iNode]->dPrediction = vecdNum[iNode]; if (vecdDenom[iNode] <= 0.0) { vecpTermNodes[iNode]->dPrediction = 0.0; } else { vecpTermNodes[iNode]->dPrediction = vecdNum[iNode]/vecdDenom[iNode]; } } } return GBM_OK; } double CPairwise::BagImprovement ( double *adY, double *adGroup, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { #ifdef NOISY_DEBUG Rprintf("BagImprovement, nTrain = %u\n", nTrain); #endif if (nTrain <= 0) { return 0; } double dL = 0.0; double dW = 0.0; unsigned int iItemStart = 0; unsigned int iItemEnd = 0; while (iItemStart < nTrain) { const double dGroup = adGroup[iItemStart]; // Find end of current group for (iItemEnd = iItemStart + 1; iItemEnd < nTrain && adGroup[iItemEnd] == dGroup; iItemEnd++) ; if (!afInBag[iItemStart]) { // Group was held out of training set const unsigned int cNumItems = iItemEnd - iItemStart; const double dMaxScore = pirm->MaxMeasure((int)dGroup, adY + iItemStart, cNumItems); if (dMaxScore > 0.0) { // If offset given, add up current scores const double* adFPlusOffset = OffsetVector(adF, adOffset, iItemStart, iItemEnd, vecdFPlusOffset); // Compute score according to old score, adF ranker.SetGroupScores(adFPlusOffset, cNumItems); ranker.Rank(); const double dOldScore = pirm->Measure(adY + iItemStart, ranker); // Compute score according to new score: adF' = adF + dStepSize * adFadj for (unsigned int i = 0; i < cNumItems; i++) { ranker.AddToScore(i, adFadj[i+iItemStart] * dStepSize); } const double dWi = adWeight[iItemStart]; if (ranker.Rank()) { // Ranking changed const double dNewScore = pirm->Measure(adY + iItemStart, ranker); dL += dWi * (dNewScore - dOldScore) / dMaxScore; } dW += dWi; } } // Next group iItemStart = iItemEnd; } return dL / dW; } gbm/src/distribution.cpp0000644000176200001440000000022613417115400015025 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "distribution.h" CDistribution::CDistribution() { } CDistribution::~CDistribution() { } gbm/src/gbm-init.c0000644000176200001440000000227413417115400013461 0ustar liggesusers#include #include #include // for NULL #include /* FIXME: Check these declarations against the C/Fortran source code. */ /* .Call calls */ extern SEXP gbm_fit(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP gbm_plot(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP gbm_pred(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP gbm_shrink_gradient(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP gbm_shrink_pred(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); static const R_CallMethodDef CallEntries[] = { {"gbm_fit", (DL_FUNC) &gbm_fit, 22}, {"gbm_plot", (DL_FUNC) &gbm_plot, 10}, {"gbm_pred", (DL_FUNC) &gbm_pred, 10}, {"gbm_shrink_gradient", (DL_FUNC) &gbm_shrink_gradient, 11}, {"gbm_shrink_pred", (DL_FUNC) &gbm_shrink_pred, 10}, {NULL, NULL, 0} }; void R_init_gbm(DllInfo *dll) { R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } gbm/src/node_continuous.cpp0000644000176200001440000001211213417115400015516 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "node_continuous.h" #include "node_factory.h" CNodeContinuous::CNodeContinuous() { dSplitValue = 0.0; } CNodeContinuous::~CNodeContinuous() { #ifdef NOISY_DEBUG Rprintf("continuous destructor\n"); #endif } GBMRESULT CNodeContinuous::PrintSubtree ( unsigned long cIndent ) { GBMRESULT hr = GBM_OK; unsigned long i = 0; for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("N=%f, Improvement=%f, Prediction=%f, NA pred=%f\n", dTrainW, dImprovement, dPrediction, (pMissingNode == NULL ? 0.0 : pMissingNode->dPrediction)); for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("V%d < %f\n", iSplitVar, dSplitValue); hr = pLeftNode->PrintSubtree(cIndent+1); for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("V%d > %f\n", iSplitVar, dSplitValue); hr = pRightNode->PrintSubtree(cIndent+1); for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("missing\n"); hr = pMissingNode->PrintSubtree(cIndent+1); return hr; } signed char CNodeContinuous::WhichNode ( CDataset *pData, unsigned long iObs ) { signed char ReturnValue = 0; double dX = pData->adX[iSplitVar*(pData->cRows) + iObs]; if(!ISNA(dX)) { if(dX < dSplitValue) { ReturnValue = -1; } else { ReturnValue = 1; } } // if missing value returns 0 return ReturnValue; } signed char CNodeContinuous::WhichNode ( double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow ) { signed char ReturnValue = 0; double dX = adX[iSplitVar*cRow + iRow]; if(!ISNA(dX)) { if(dX < dSplitValue) { ReturnValue = -1; } else { ReturnValue = 1; } } // if missing value returns 0 return ReturnValue; } GBMRESULT CNodeContinuous::RecycleSelf ( CNodeFactory *pNodeFactory ) { GBMRESULT hr = GBM_OK; pNodeFactory->RecycleNode(this); return hr; }; GBMRESULT CNodeContinuous::TransferTreeToRList ( int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage ) { GBMRESULT hr = GBM_OK; int iThisNodeID = iNodeID; aiSplitVar[iThisNodeID] = iSplitVar; adSplitPoint[iThisNodeID] = dSplitValue; adErrorReduction[iThisNodeID] = dImprovement; adWeight[iThisNodeID] = dTrainW; adPred[iThisNodeID] = dShrinkage*dPrediction; iNodeID++; aiLeftNode[iThisNodeID] = iNodeID; hr = pLeftNode->TransferTreeToRList(iNodeID, pData, aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld, dShrinkage); if(GBM_FAILED(hr)) goto Error; aiRightNode[iThisNodeID] = iNodeID; hr = pRightNode->TransferTreeToRList(iNodeID, pData, aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld, dShrinkage); if(GBM_FAILED(hr)) goto Error; aiMissingNode[iThisNodeID] = iNodeID; hr = pMissingNode->TransferTreeToRList(iNodeID, pData, aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld, dShrinkage); if(GBM_FAILED(hr)) goto Error; Cleanup: return hr; Error: goto Cleanup; } gbm/src/node.h0000644000176200001440000000670113417115400012704 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: node.h // // License: GNU GPL (version 2 or later) // // Contents: a node in the tree // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef NODGBM_H #define NODGBM_H #include #include "dataset.h" #include "buildinfo.h" class CNodeFactory; using namespace std; typedef vector VEC_CATEGORIES; typedef vector VEC_VEC_CATEGORIES; class CNode { public: CNode(); virtual ~CNode(); virtual GBMRESULT Adjust(unsigned long cMinObsInNode); virtual GBMRESULT Predict(CDataset *pData, unsigned long iRow, double &dFadj); virtual GBMRESULT Predict(double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow, double &dFadj) = 0; static double Improvement ( double dLeftW, double dRightW, double dMissingW, double dLeftSum, double dRightSum, double dMissingSum ) { double dTemp = 0.0; double dResult = 0.0; if(dMissingW == 0.0) { dTemp = dLeftSum/dLeftW - dRightSum/dRightW; dResult = dLeftW*dRightW*dTemp*dTemp/(dLeftW+dRightW); } else { dTemp = dLeftSum/dLeftW - dRightSum/dRightW; dResult += dLeftW*dRightW*dTemp*dTemp; dTemp = dLeftSum/dLeftW - dMissingSum/dMissingW; dResult += dLeftW*dMissingW*dTemp*dTemp; dTemp = dRightSum/dRightW - dMissingSum/dMissingW; dResult += dRightW*dMissingW*dTemp*dTemp; dResult /= (dLeftW + dRightW + dMissingW); } return dResult; } virtual GBMRESULT PrintSubtree(unsigned long cIndent); virtual GBMRESULT TransferTreeToRList(int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage); double TotalError(); virtual GBMRESULT GetVarRelativeInfluence(double *adRelInf); virtual GBMRESULT RecycleSelf(CNodeFactory *pNodeFactory) = 0; double dPrediction; double dTrainW; // total training weight in node unsigned long cN; // number of training observations in node bool isTerminal; protected: double GetXEntry(CDataset *pData, unsigned long iRow, unsigned long iCol) { return pData->adX[iCol*(pData->cRows) + iRow]; } }; typedef CNode *PCNode; #endif // NODGBM_H gbm/src/node_nonterminal.h0000644000176200001440000000470713417115400015316 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: node_nonterminal.h // // License: GNU GPL (version 2 or later) // // Contents: a node in the tree // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef NODENONTERMINAL_H #define NODENONTERMINAL_H #include "node.h" #include "node_terminal.h" class CNodeNonterminal : public CNode { public: CNodeNonterminal(); virtual ~CNodeNonterminal(); virtual GBMRESULT Adjust(unsigned long cMinObsInNode); virtual signed char WhichNode(CDataset *pData, unsigned long iObs) = 0; virtual signed char WhichNode(double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow) = 0; virtual GBMRESULT TransferTreeToRList(int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage) = 0; GBMRESULT Predict(CDataset *pData, unsigned long iRow, double &dFadj); GBMRESULT Predict(double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow, double &dFadj); GBMRESULT GetVarRelativeInfluence(double *adRelInf); virtual GBMRESULT RecycleSelf(CNodeFactory *pNodeFactory) = 0; CNode *pLeftNode; CNode *pRightNode; CNode *pMissingNode; unsigned long iSplitVar; double dImprovement; }; typedef CNodeNonterminal *PCNodeNonterminal; #endif // NODENONTERMINAL_H gbm/src/node_factory.cpp0000644000176200001440000000673313417115400014773 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "node_factory.h" CNodeFactory::CNodeFactory() { } CNodeFactory::~CNodeFactory() { #ifdef NOISY_DEBUG Rprintf("destructing node factory\n"); #endif } GBMRESULT CNodeFactory::Initialize ( unsigned long cDepth ) { GBMRESULT hr = GBM_OK; unsigned long i = 0; for(i=0; idPrediction = 0.0; } return pNodeTerminalTemp; } CNodeContinuous* CNodeFactory::GetNewNodeContinuous() { if(ContinuousStack.empty()) { #ifdef NOISY_DEBUG Rprintf("Continuous stack is empty\n"); #endif pNodeContinuousTemp = NULL; } else { pNodeContinuousTemp = ContinuousStack.top(); ContinuousStack.pop(); pNodeContinuousTemp->dPrediction = 0.0; pNodeContinuousTemp->dImprovement = 0.0; pNodeContinuousTemp->pMissingNode = NULL; pNodeContinuousTemp->pLeftNode = NULL; pNodeContinuousTemp->pRightNode = NULL; pNodeContinuousTemp->iSplitVar = 0; pNodeContinuousTemp->dSplitValue = 0.0; } return pNodeContinuousTemp; } CNodeCategorical* CNodeFactory::GetNewNodeCategorical() { if(CategoricalStack.empty()) { #ifdef NOISY_DEBUG Rprintf("Categorical stack is empty\n"); #endif pNodeCategoricalTemp = NULL; } else { pNodeCategoricalTemp = CategoricalStack.top(); CategoricalStack.pop(); pNodeCategoricalTemp->dPrediction = 0.0; pNodeCategoricalTemp->dImprovement = 0.0; pNodeCategoricalTemp->pMissingNode = NULL; pNodeCategoricalTemp->pLeftNode = NULL; pNodeCategoricalTemp->pRightNode = NULL; pNodeCategoricalTemp->iSplitVar = 0; pNodeCategoricalTemp->aiLeftCategory = NULL; pNodeCategoricalTemp->cLeftCategory = 0; } return pNodeCategoricalTemp; } GBMRESULT CNodeFactory::RecycleNode ( CNodeTerminal *pNode ) { if(pNode != NULL) { TerminalStack.push(pNode); } return GBM_OK; } GBMRESULT CNodeFactory::RecycleNode ( CNodeContinuous *pNode ) { if(pNode != NULL) { if(pNode->pLeftNode != NULL) pNode->pLeftNode->RecycleSelf(this); if(pNode->pRightNode != NULL) pNode->pRightNode->RecycleSelf(this); if(pNode->pMissingNode != NULL) pNode->pMissingNode->RecycleSelf(this); ContinuousStack.push(pNode); } return GBM_OK; } GBMRESULT CNodeFactory::RecycleNode ( CNodeCategorical *pNode ) { if(pNode != NULL) { if(pNode->pLeftNode != NULL) pNode->pLeftNode->RecycleSelf(this); if(pNode->pRightNode != NULL) pNode->pRightNode->RecycleSelf(this); if(pNode->pMissingNode != NULL) pNode->pMissingNode->RecycleSelf(this); if(pNode->aiLeftCategory != NULL) { delete [] pNode->aiLeftCategory; pNode->aiLeftCategory = NULL; } CategoricalStack.push(pNode); } return GBM_OK; } gbm/src/node_search.cpp0000644000176200001440000003045013417115400014562 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: node_search.cpp // //------------------------------------------------------------------------------ #include "node_search.h" CNodeSearch::CNodeSearch() :k_cMaxClasses(1024) { iBestSplitVar = 0; dBestSplitValue = 0.0; fIsSplit = false; dBestMissingTotalW = 0.0; dCurrentMissingTotalW = 0.0; dBestMissingSumZ = 0.0; dCurrentMissingSumZ = 0.0; adGroupSumZ = NULL; adGroupW = NULL; acGroupN = NULL; adGroupMean = NULL; aiCurrentCategory = NULL; aiBestCategory = NULL; iRank = UINT_MAX; } CNodeSearch::~CNodeSearch() { if(adGroupSumZ != NULL) { delete [] adGroupSumZ; adGroupSumZ = NULL; } if(adGroupW != NULL) { delete [] adGroupW; adGroupW = NULL; } if(acGroupN != NULL) { delete [] acGroupN; acGroupN = NULL; } if(adGroupMean != NULL) { delete [] adGroupMean; adGroupMean = NULL; } if(aiCurrentCategory != NULL) { delete [] aiCurrentCategory; aiCurrentCategory = NULL; } if(aiBestCategory != NULL) { delete [] aiBestCategory; aiBestCategory = NULL; } } GBMRESULT CNodeSearch::Initialize ( unsigned long cMinObsInNode ) { GBMRESULT hr = GBM_OK; adGroupSumZ = new double[k_cMaxClasses]; if(adGroupSumZ == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } adGroupW = new double[k_cMaxClasses]; if(adGroupW == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } acGroupN = new ULONG[k_cMaxClasses]; if(acGroupN == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } adGroupMean = new double[k_cMaxClasses]; if(adGroupMean == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } aiCurrentCategory = new int[k_cMaxClasses]; if(aiCurrentCategory == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } aiBestCategory = new ULONG[k_cMaxClasses]; if(aiBestCategory == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } this->cMinObsInNode = cMinObsInNode; Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CNodeSearch::IncorporateObs ( double dX, double dZ, double dW, long lMonotone ) { GBMRESULT hr = GBM_OK; static double dWZ = 0.0; if(fIsSplit) goto Cleanup; dWZ = dW*dZ; if(ISNA(dX)) { dCurrentMissingSumZ += dWZ; dCurrentMissingTotalW += dW; cCurrentMissingN++; dCurrentRightSumZ -= dWZ; dCurrentRightTotalW -= dW; cCurrentRightN--; } else if(cCurrentVarClasses == 0) // variable is continuous { if(dLastXValue > dX) { error("Observations are not in order. gbm() was unable to build an index for the design matrix. Could be a bug in gbm or an unusual data type in data.\n"); hr = GBM_FAIL; goto Error; } // Evaluate the current split // the newest observation is still in the right child dCurrentSplitValue = 0.5*(dLastXValue + dX); if((dLastXValue != dX) && (cCurrentLeftN >= cMinObsInNode) && (cCurrentRightN >= cMinObsInNode) && ((lMonotone==0) || (lMonotone*(dCurrentRightSumZ*dCurrentLeftTotalW - dCurrentLeftSumZ*dCurrentRightTotalW) > 0))) { dCurrentImprovement = CNode::Improvement(dCurrentLeftTotalW,dCurrentRightTotalW, dCurrentMissingTotalW, dCurrentLeftSumZ,dCurrentRightSumZ, dCurrentMissingSumZ); if(dCurrentImprovement > dBestImprovement) { iBestSplitVar = iCurrentSplitVar; dBestSplitValue = dCurrentSplitValue; cBestVarClasses = 0; dBestLeftSumZ = dCurrentLeftSumZ; dBestLeftTotalW = dCurrentLeftTotalW; cBestLeftN = cCurrentLeftN; dBestRightSumZ = dCurrentRightSumZ; dBestRightTotalW = dCurrentRightTotalW; cBestRightN = cCurrentRightN; dBestImprovement = dCurrentImprovement; } } // now move the new observation to the left // if another observation arrives we will evaluate this dCurrentLeftSumZ += dWZ; dCurrentLeftTotalW += dW; cCurrentLeftN++; dCurrentRightSumZ -= dWZ; dCurrentRightTotalW -= dW; cCurrentRightN--; dLastXValue = dX; } else // variable is categorical, evaluates later { adGroupSumZ[(unsigned long)dX] += dWZ; adGroupW[(unsigned long)dX] += dW; acGroupN[(unsigned long)dX] ++; } Cleanup: return hr; Error: goto Cleanup; } GBMRESULT CNodeSearch::Set ( double dSumZ, double dTotalW, unsigned long cTotalN, CNodeTerminal *pThisNode, CNode **ppParentPointerToThisNode, CNodeFactory *pNodeFactory ) { GBMRESULT hr = GBM_OK; dInitSumZ = dSumZ; dInitTotalW = dTotalW; cInitN = cTotalN; dBestLeftSumZ = 0.0; dBestLeftTotalW = 0.0; cBestLeftN = 0; dCurrentLeftSumZ = 0.0; dCurrentLeftTotalW = 0.0; cCurrentLeftN = 0; dBestRightSumZ = dSumZ; dBestRightTotalW = dTotalW; cBestRightN = cTotalN; dCurrentRightSumZ = 0.0; dCurrentRightTotalW = dTotalW; cCurrentRightN = cTotalN; dBestMissingSumZ = 0.0; dBestMissingTotalW = 0.0; cBestMissingN = 0; dCurrentMissingSumZ = 0.0; dCurrentMissingTotalW = 0.0; cCurrentMissingN = 0; dBestImprovement = 0.0; iBestSplitVar = UINT_MAX; dCurrentImprovement = 0.0; iCurrentSplitVar = UINT_MAX; dCurrentSplitValue = -HUGE_VAL; fIsSplit = false; this->pThisNode = pThisNode; this->ppParentPointerToThisNode = ppParentPointerToThisNode; this->pNodeFactory = pNodeFactory; return hr; } GBMRESULT CNodeSearch::ResetForNewVar ( unsigned long iWhichVar, long cCurrentVarClasses ) { GBMRESULT hr = GBM_OK; long i=0; if(fIsSplit) goto Cleanup; for(i=0; icCurrentVarClasses = cCurrentVarClasses; dCurrentLeftSumZ = 0.0; dCurrentLeftTotalW = 0.0; cCurrentLeftN = 0; dCurrentRightSumZ = dInitSumZ; dCurrentRightTotalW = dInitTotalW; cCurrentRightN = cInitN; dCurrentMissingSumZ = 0.0; dCurrentMissingTotalW = 0.0; cCurrentMissingN = 0; dCurrentImprovement = 0.0; dLastXValue = -HUGE_VAL; Cleanup: return hr; } GBMRESULT CNodeSearch::WrapUpCurrentVariable() { GBMRESULT hr = GBM_OK; if(iCurrentSplitVar == iBestSplitVar) { if(cCurrentMissingN > 0) { dBestMissingSumZ = dCurrentMissingSumZ; dBestMissingTotalW = dCurrentMissingTotalW; cBestMissingN = cCurrentMissingN; } else // DEBUG: consider a weighted average with parent node? { dBestMissingSumZ = dInitSumZ; dBestMissingTotalW = dInitTotalW; cBestMissingN = 0; } } return hr; } GBMRESULT CNodeSearch::EvaluateCategoricalSplit() { GBMRESULT hr = GBM_OK; long i=0; long j=0; unsigned long cFiniteMeans = 0; if(fIsSplit) goto Cleanup; if(cCurrentVarClasses == 0) { hr = GBM_INVALIDARG; goto Error; } cFiniteMeans = 0; for(i=0; i1) && ((ULONG)i= cMinObsInNode) && (cCurrentRightN >= cMinObsInNode) && (dCurrentImprovement > dBestImprovement)) { dBestSplitValue = dCurrentSplitValue; if(iBestSplitVar != iCurrentSplitVar) { iBestSplitVar = iCurrentSplitVar; cBestVarClasses = cCurrentVarClasses; for(j=0; jGetNewNodeTerminal(); pNewRightNode = pNodeFactory->GetNewNodeTerminal(); pNewMissingNode = pNodeFactory->GetNewNodeTerminal(); // set up a continuous split if(cBestVarClasses==0) { pNewNodeContinuous = pNodeFactory->GetNewNodeContinuous(); pNewNodeContinuous->dSplitValue = dBestSplitValue; pNewNodeContinuous->iSplitVar = iBestSplitVar; pNewSplitNode = pNewNodeContinuous; } else { // get a new categorical node and its branches pNewNodeCategorical = pNodeFactory->GetNewNodeCategorical(); // set up the categorical split pNewNodeCategorical->iSplitVar = iBestSplitVar; pNewNodeCategorical->cLeftCategory = (ULONG)dBestSplitValue + 1; pNewNodeCategorical->aiLeftCategory = new ULONG[pNewNodeCategorical->cLeftCategory]; for(i=0; icLeftCategory; i++) { pNewNodeCategorical->aiLeftCategory[i] = aiBestCategory[i]; } pNewSplitNode = pNewNodeCategorical; } *ppParentPointerToThisNode = pNewSplitNode; pNewSplitNode->dPrediction = pThisNode->dPrediction; pNewSplitNode->dImprovement = dBestImprovement; pNewSplitNode->dTrainW = pThisNode->dTrainW; pNewSplitNode->pLeftNode = pNewLeftNode; pNewSplitNode->pRightNode = pNewRightNode; pNewSplitNode->pMissingNode = pNewMissingNode; pNewLeftNode->dPrediction = dBestLeftSumZ/dBestLeftTotalW; pNewLeftNode->dTrainW = dBestLeftTotalW; pNewLeftNode->cN = cBestLeftN; pNewRightNode->dPrediction = dBestRightSumZ/dBestRightTotalW; pNewRightNode->dTrainW = dBestRightTotalW; pNewRightNode->cN = cBestRightN; pNewMissingNode->dPrediction = dBestMissingSumZ/dBestMissingTotalW; pNewMissingNode->dTrainW = dBestMissingTotalW; pNewMissingNode->cN = cBestMissingN; pThisNode->RecycleSelf(pNodeFactory); return hr; } gbm/src/node.cpp0000644000176200001440000000224013417115400013231 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "node.h" CNode::CNode() { dPrediction = 0.0; dTrainW = 0.0; isTerminal = false; } CNode::~CNode() { // the nodes get deleted by deleting the node factory } GBMRESULT CNode::Adjust ( unsigned long cMinObsInNode ) { GBMRESULT hr = GBM_NOTIMPL; return hr; } GBMRESULT CNode::Predict ( CDataset *pData, unsigned long iRow, double &dFadj ) { GBMRESULT hr = GBM_NOTIMPL; return hr; } double CNode::TotalError() { GBMRESULT hr = GBM_NOTIMPL; return hr; } GBMRESULT CNode::PrintSubtree ( unsigned long cIndent ) { GBMRESULT hr = GBM_NOTIMPL; return hr; } GBMRESULT CNode::GetVarRelativeInfluence ( double *adRelInf ) { GBMRESULT hr = GBM_NOTIMPL; return hr; } GBMRESULT CNode::TransferTreeToRList ( int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage ) { return GBM_NOTIMPL; } gbm/src/bernoulli.cpp0000644000176200001440000001071513417115400014305 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "bernoulli.h" CBernoulli::CBernoulli() { } CBernoulli::~CBernoulli() { } GBMRESULT CBernoulli::ComputeWorkingResponse ( double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { unsigned long i = 0; double dProb = 0.0; double dF = 0.0; for(i=0; i 0.0001) { dNum=0.0; dDen=0.0; for(i=0; idPrediction = 0.0; } else { vecpTermNodes[iNode]->dPrediction = vecdNum[iNode]/vecdDen[iNode]; } } } return hr; } double CBernoulli::BagImprovement ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dF = 0.0; double dW = 0.0; unsigned long i = 0; for(i=0; idAlpha = dAlpha; } CQuantile::~CQuantile() { } GBMRESULT CQuantile::ComputeWorkingResponse ( double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { unsigned long i = 0; if(adOffset == NULL) { for(i=0; i adF[i]) ? dAlpha : -(1.0-dAlpha); } } else { for(i=0; i adF[i]+adOffset[i]) ? dAlpha : -(1.0-dAlpha); } } return GBM_OK; } // DEBUG: needs weighted quantile GBMRESULT CQuantile::InitF ( double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength ) { double dOffset=0.0; unsigned long i=0; vecd.resize(cLength); for(i=0; i adF[i]) { dL += adWeight[i]*dAlpha *(adY[i] - adF[i]); } else { dL += adWeight[i]*(1.0-dAlpha)*(adF[i] - adY[i]); } dW += adWeight[i]; } } else { for(i=cIdxOff; i adF[i] + adOffset[i]) { dL += adWeight[i]*dAlpha *(adY[i] - adF[i]-adOffset[i]); } else { dL += adWeight[i]*(1.0-dAlpha)*(adF[i]+adOffset[i] - adY[i]); } dW += adWeight[i]; } } return dL/dW; } // DEBUG: needs weighted quantile GBMRESULT CQuantile::FitBestConstant ( double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff ) { GBMRESULT hr = GBM_OK; unsigned long iNode = 0; unsigned long iObs = 0; unsigned long iVecd = 0; double dOffset; vecd.resize(nTrain); // should already be this size from InitF for(iNode=0; iNodecN >= cMinObsInNode) { iVecd = 0; for(iObs=0; iObsdPrediction = *max_element(vecd.begin(), vecd.begin()+iVecd); } else { nth_element(vecd.begin(), vecd.begin() + int(iVecd*dAlpha), vecd.begin() + int(iVecd)); vecpTermNodes[iNode]->dPrediction = *(vecd.begin() + int(iVecd*dAlpha)); } } } return hr; } double CQuantile::BagImprovement ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dF = 0.0; double dW = 0.0; unsigned long i = 0; for(i=0; i dF) { dReturnValue += adWeight[i]*dAlpha*(adY[i]-dF); } else { dReturnValue += adWeight[i]*(1-dAlpha)*(dF-adY[i]); } if(adY[i] > dF+dStepSize*adFadj[i]) { dReturnValue -= adWeight[i]*dAlpha* (adY[i] - dF-dStepSize*adFadj[i]); } else { dReturnValue -= adWeight[i]*(1-dAlpha)* (dF+dStepSize*adFadj[i] - adY[i]); } dW += adWeight[i]; } } return dReturnValue/dW; } gbm/src/huberized.h0000644000176200001440000000555313417115400013744 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: bernoulli.h // // License: GNU GPL (version 2 or later) // // Contents: bernoulli object // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef HUBERIZED_H #define HUBERIZED_H #include "distribution.h" #include "buildinfo.h" class CHuberized : public CDistribution { public: CHuberized(); virtual ~CHuberized(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: vector vecdNum; vector vecdDen; }; #endif // HUBERIZED_H gbm/src/node_search.h0000644000176200001440000000625113417115400014231 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: node_search.h // // License: GNU GPL (version 2 or later) // // Contents: does the searching for where to split a node // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef NODESEARCH_H #define NODESEARCH_H #include "node_factory.h" #include "dataset.h" using namespace std; class CNodeSearch { public: CNodeSearch(); ~CNodeSearch(); GBMRESULT Initialize(unsigned long cMinObsInNode); GBMRESULT IncorporateObs(double dX, double dZ, double dW, long lMonotone); GBMRESULT Set(double dSumZ, double dTotalW, unsigned long cTotalN, CNodeTerminal *pThisNode, CNode **ppParentPointerToThisNode, CNodeFactory *pNodeFactory); GBMRESULT ResetForNewVar(unsigned long iWhichVar, long cVarClasses); double BestImprovement() { return dBestImprovement; } GBMRESULT SetToSplit() { fIsSplit = true; return GBM_OK; }; GBMRESULT SetupNewNodes(PCNodeNonterminal &pNewSplitNode, PCNodeTerminal &pNewLeftNode, PCNodeTerminal &pNewRightNode, PCNodeTerminal &pNewMissingNode); GBMRESULT EvaluateCategoricalSplit(); GBMRESULT WrapUpCurrentVariable(); double ThisNodePrediction() {return pThisNode->dPrediction;} bool operator<(const CNodeSearch &ns) {return dBestImprovement #include #include #include "dataset.h" #include "node_factory.h" #include "node_search.h" class CCARTTree { public: CCARTTree(); ~CCARTTree(); GBMRESULT Initialize(CNodeFactory *pNodeFactory); GBMRESULT grow(double *adZ, CDataset *pData, double *adAlgW, double *adF, unsigned long nTrain, unsigned long nBagged, double dLambda, unsigned long cMaxDepth, unsigned long cMinObsInNode, bool *afInBag, unsigned long *aiNodeAssign, CNodeSearch *aNodeSearch, VEC_P_NODETERMINAL &vecpTermNodes); GBMRESULT Reset(); GBMRESULT TransferTreeToRList(CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage); GBMRESULT PredictValid(CDataset *pData, unsigned long nValid, double *adFadj); GBMRESULT Predict(double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow, double &dFadj); GBMRESULT Adjust(unsigned long *aiNodeAssign, double *adFadj, unsigned long cTrain, VEC_P_NODETERMINAL &vecpTermNodes, unsigned long cMinObsInNode); GBMRESULT GetNodeCount(int &cNodes); GBMRESULT SetShrinkage(double dShrink) { this->dShrink = dShrink; return GBM_OK; } double GetShrinkage() {return dShrink;} GBMRESULT Print(); GBMRESULT GetVarRelativeInfluence(double *adRelInf); double dError; // total squared error before carrying out the splits private: GBMRESULT GetBestSplit(CDataset *pData, unsigned long nTrain, CNodeSearch *aNodeSearch, unsigned long cTerminalNodes, unsigned long *aiNodeAssign, bool *afInBag, double *adZ, double *adW, unsigned long &iBestNode, double &dBestNodeImprovement); CNode *pRootNode; double dShrink; // objects used repeatedly unsigned long cDepth; unsigned long cTerminalNodes; unsigned long cTotalNodeCount; unsigned long iObs; unsigned long iWhichNode; unsigned long iBestNode; double dBestNodeImprovement; double dSumZ; double dSumZ2; double dTotalW; signed char schWhichNode; CNodeFactory *pNodeFactory; CNodeNonterminal *pNewSplitNode; CNodeTerminal *pNewLeftNode; CNodeTerminal *pNewRightNode; CNodeTerminal *pNewMissingNode; CNodeTerminal *pInitialRootNode; }; typedef CCARTTree *PCCARTTree; #endif // TREGBM_H gbm/src/poisson.cpp0000644000176200001440000001234113417115400014001 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "poisson.h" CPoisson::CPoisson() { } CPoisson::~CPoisson() { } GBMRESULT CPoisson::ComputeWorkingResponse ( double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { unsigned long i = 0; double dF = 0.0; // compute working response for(i=0; i < nTrain; i++) { dF = adF[i] + ((adOffset==NULL) ? 0.0 : adOffset[i]); adZ[i] = adY[i] - exp(dF); } return GBM_OK; } GBMRESULT CPoisson::InitF ( double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength ) { GBMRESULT hr = GBM_OK; double dSum = 0.0; double dDenom = 0.0; unsigned long i = 0; if(adOffset == NULL) { for(i=0; idPrediction = -19.0; } else if(vecdDen[iNode] == 0.0) { vecpTermNodes[iNode]->dPrediction = 0.0; } else { vecpTermNodes[iNode]->dPrediction = log(vecdNum[iNode]/vecdDen[iNode]); } vecpTermNodes[iNode]->dPrediction = fmin2(vecpTermNodes[iNode]->dPrediction, 19-vecdMax[iNode]); vecpTermNodes[iNode]->dPrediction = fmax2(vecpTermNodes[iNode]->dPrediction, -19-vecdMin[iNode]); } } return hr; } double CPoisson::BagImprovement ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dF = 0.0; double dW = 0.0; unsigned long i = 0; for(i=0; i #include "distribution.h" #include "locationm.h" class CMultinomial : public CDistribution { public: CMultinomial(int cNumClasses, int cRows); virtual ~CMultinomial(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength); GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: unsigned long mcNumClasses; unsigned long mcRows; double *madProb; }; #endif // KMULTICGBM_H gbm/src/adaboost.cpp0000644000176200001440000001027413417115400014106 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "adaboost.h" CAdaBoost::CAdaBoost() { } CAdaBoost::~CAdaBoost() { } GBMRESULT CAdaBoost::ComputeWorkingResponse ( double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { unsigned long i = 0; if(adOffset == NULL) { for(i=0; idPrediction = 0.0; } else { vecpTermNodes[iNode]->dPrediction = vecdNum[iNode]/vecdDen[iNode]; } } } return hr; } double CAdaBoost::BagImprovement ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dF = 0.0; double dW = 0.0; unsigned long i = 0; for(i=0; i #include #include "node_nonterminal.h" class CNodeCategorical : public CNodeNonterminal { public: CNodeCategorical(); ~CNodeCategorical(); GBMRESULT PrintSubtree(unsigned long cIndent); GBMRESULT TransferTreeToRList(int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage); signed char WhichNode(CDataset *pData, unsigned long iObs); signed char WhichNode(double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow); GBMRESULT RecycleSelf(CNodeFactory *pNodeFactory); unsigned long *aiLeftCategory; unsigned long cLeftCategory; }; typedef CNodeCategorical *PCNodeCategorical; #endif // NODECATEGORICAL_H gbm/src/node_factory.h0000644000176200001440000000317513417115400014435 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: node_factory.h // // License: GNU GPL (version 2 or later) // // Contents: manager for allocation and destruction of all nodes // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef NODEFACTORY_H #define NODEFACTORY_H #include #include #include "node_terminal.h" #include "node_continuous.h" #include "node_categorical.h" #define NODEFACTORY_NODGBM_RESERVE ((unsigned long)101) using namespace std; class CNodeFactory { public: CNodeFactory(); ~CNodeFactory(); GBMRESULT Initialize(unsigned long cDepth); CNodeTerminal* GetNewNodeTerminal(); CNodeContinuous* GetNewNodeContinuous(); CNodeCategorical* GetNewNodeCategorical(); GBMRESULT RecycleNode(CNodeTerminal *pNode); GBMRESULT RecycleNode(CNodeContinuous *pNode); GBMRESULT RecycleNode(CNodeCategorical *pNode); private: stack TerminalStack; stack ContinuousStack; stack CategoricalStack; CNodeTerminal* pNodeTerminalTemp; CNodeContinuous* pNodeContinuousTemp; CNodeCategorical* pNodeCategoricalTemp; CNodeTerminal aBlockTerminal[NODEFACTORY_NODGBM_RESERVE]; CNodeContinuous aBlockContinuous[NODEFACTORY_NODGBM_RESERVE]; CNodeCategorical aBlockCategorical[NODEFACTORY_NODGBM_RESERVE]; }; #endif // NODEFACTORY_H gbm/src/gbm.cpp0000644000176200001440000001332713417115400013061 0ustar liggesusers//------------------------------------------------------------------------------ // // GBM by Greg Ridgeway Copyright (C) 2003 // File: gbm.cpp // //------------------------------------------------------------------------------ #include #include "gbm.h" // Count the number of distinct groups in the input data int num_groups(const double* adMisc, int cTrain) { if (cTrain <= 0) { return 0; } double dLastGroup = adMisc[0]; int cGroups = 1; for(int i=1; iSetData(adX,aiXOrder,adY,adOffset,adWeight,adMisc, cRows,cCols,acVarClasses,alMonotoneVar); if(GBM_FAILED(hr)) { goto Error; } // set the distribution if(strncmp(pszFamily,"bernoulli",2) == 0) { pDist = new CBernoulli(); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"gaussian",2) == 0) { pDist = new CGaussian(); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"poisson",2) == 0) { pDist = new CPoisson(); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"adaboost",2) == 0) { pDist = new CAdaBoost(); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"coxph",2) == 0) { pDist = new CCoxPH(); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"laplace",2) == 0) { pDist = new CLaplace(); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"quantile",2) == 0) { pDist = new CQuantile(adMisc[0]); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"tdist",2) == 0) { pDist = new CTDist(adMisc[0]); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"multinomial",2) == 0) { pDist = new CMultinomial(cNumClasses, cRows); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strncmp(pszFamily,"huberized",2) == 0) { pDist = new CHuberized(); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strcmp(pszFamily,"pairwise_conc") == 0) { pDist = new CPairwise("conc"); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strcmp(pszFamily,"pairwise_ndcg") == 0) { pDist = new CPairwise("ndcg"); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strcmp(pszFamily,"pairwise_map") == 0) { pDist = new CPairwise("map"); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else if(strcmp(pszFamily,"pairwise_mrr") == 0) { pDist = new CPairwise("mrr"); if(pDist==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } } else { hr = GBM_INVALIDARG; goto Error; } if(pDist==NULL) { hr = GBM_INVALIDARG; goto Error; } if (!strncmp(pszFamily, "pairwise", strlen("pairwise"))) { cGroups = num_groups(adMisc, cTrain); } Cleanup: return hr; Error: goto Cleanup; } GBMRESULT gbm_transfer_to_R ( CGBM *pGBM, VEC_VEC_CATEGORIES &vecSplitCodes, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, int cCatSplitsOld ) { GBMRESULT hr = GBM_OK; hr = pGBM->TransferTreeToRList(aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld); if(GBM_FAILED(hr)) goto Error; Cleanup: return hr; Error: goto Cleanup; } GBMRESULT gbm_transfer_catsplits_to_R ( int iCatSplit, VEC_VEC_CATEGORIES &vecSplitCodes, int *aiSplitCodes ) { unsigned long i=0; for(i=0; i #include "node_nonterminal.h" class CNodeContinuous : public CNodeNonterminal { public: CNodeContinuous(); ~CNodeContinuous(); GBMRESULT PrintSubtree(unsigned long cIndent); GBMRESULT TransferTreeToRList(int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage); signed char WhichNode(CDataset *pData, unsigned long iObs); signed char WhichNode(double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow); GBMRESULT RecycleSelf(CNodeFactory *pNodeFactory); double dSplitValue; }; typedef CNodeContinuous *PCNodeContinuous; #endif // NODECONTINUOUS_H gbm/src/coxph.cpp0000644000176200001440000001375513417115400013442 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "coxph.h" CCoxPH::CCoxPH() { } CCoxPH::~CCoxPH() { } GBMRESULT CCoxPH::ComputeWorkingResponse ( double *adT, double *adDelta, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { unsigned long i = 0; double dF = 0.0; double dTot = 0.0; double dRiskTot = 0.0; vecdRiskTot.resize(nTrain); dRiskTot = 0.0; for(i=0; icN >= cMinObsInNode) { veciK2Node[K] = i; veciNode2K[i] = K; K++; } } vecdP.resize(K); matH.setactualsize(K-1); vecdG.resize(K-1); vecdG.assign(K-1,0.0); // zero the Hessian for(k=0; kcN >= cMinObsInNode)) { dF = adF[i] + ((adOffset==NULL) ? 0.0 : adOffset[i]); vecdP[veciNode2K[aiNodeAssign[i]]] += adW[i]*exp(dF); dRiskTot += adW[i]*exp(dF); if(adDelta[i]==1.0) { // compute g and H for(k=0; kdPrediction = 0.0; } for(m=0; mdPrediction = 0.0; break; } else { vecpTermNodes[veciK2Node[k]]->dPrediction -= dTemp*vecdG[m]; } } } // vecpTermNodes[veciK2Node[K-1]]->dPrediction = 0.0; // already set to 0.0 return hr; } double CCoxPH::BagImprovement ( double *adT, double *adDelta, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dNum = 0.0; double dDen = 0.0; double dF = 0.0; double dW = 0.0; unsigned long i = 0; dNum = 0.0; dDen = 0.0; for(i=0; i= cRows) || (iCol >= cCols)) { hr = GBM_INVALIDARG; goto Error; } dValue = adX[iCol*cRows + iRow]; Cleanup: return hr; Error: goto Cleanup; } bool fHasOffset; double *adX; int *aiXOrder; double *adXTemp4Order; double *adY; double *adOffset; double *adWeight; double *adMisc; char **apszVarNames; int *acVarClasses; int *alMonotoneVar; int cRows; int cCols; private: }; #endif // DATASET_H gbm/src/dataset.cpp0000644000176200001440000000314113417115400013732 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "dataset.h" CDataset::CDataset() { fHasOffset = false; adX = NULL; aiXOrder = NULL; adXTemp4Order = NULL; adY = NULL; adOffset = NULL; adWeight = NULL; apszVarNames = NULL; cRows = 0; cCols = 0; } CDataset::~CDataset() { } GBMRESULT CDataset::ResetWeights() { GBMRESULT hr = GBM_OK; int i = 0; if(adWeight == NULL) { hr = GBM_INVALIDARG; goto Error; } for(i=0; icRows = cRows; this->cCols = cCols; this->adX = adX; this->aiXOrder = aiXOrder; this->adY = adY; this->adOffset = adOffset; this->adWeight = adWeight; this->acVarClasses = acVarClasses; this->alMonotoneVar = alMonotoneVar; if((adOffset != NULL) && !ISNA(*adOffset)) { this->adOffset = adOffset; fHasOffset = true; } else { this->adOffset = NULL; fHasOffset = false; } if((adMisc != NULL) && !ISNA(*adMisc)) { this->adMisc = adMisc; } else { this->adMisc = NULL; } Cleanup: return hr; Error: goto Cleanup; } gbm/src/node_categorical.cpp0000644000176200001440000001404113417115400015570 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "node_categorical.h" #include "node_factory.h" CNodeCategorical::CNodeCategorical() { aiLeftCategory = NULL; cLeftCategory = 0; } CNodeCategorical::~CNodeCategorical() { #ifdef NOISY_DEBUG Rprintf("categorical destructor\n"); #endif if(aiLeftCategory != NULL) { delete [] aiLeftCategory; aiLeftCategory = NULL; } } GBMRESULT CNodeCategorical::PrintSubtree ( unsigned long cIndent ) { GBMRESULT hr = GBM_OK; unsigned long i = 0; for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("N=%f, Improvement=%f, Prediction=%f, NA pred=%f\n", dTrainW, dImprovement, dPrediction, (pMissingNode == NULL ? 0.0 : pMissingNode->dPrediction)); for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("V%d in ",iSplitVar); for(i=0; iPrintSubtree(cIndent+1); for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("V%d not in ",iSplitVar); for(i=0; iPrintSubtree(cIndent+1); for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("missing\n"); hr = pMissingNode->PrintSubtree(cIndent+1); return hr; } signed char CNodeCategorical::WhichNode ( CDataset *pData, unsigned long iObs ) { signed char ReturnValue = 0; double dX = pData->adX[iSplitVar*(pData->cRows) + iObs]; if(!ISNA(dX)) { if(std::find(aiLeftCategory, aiLeftCategory+cLeftCategory, (ULONG)dX) != aiLeftCategory+cLeftCategory) { ReturnValue = -1; } else { ReturnValue = 1; } } // if missing value returns 0 return ReturnValue; } signed char CNodeCategorical::WhichNode ( double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow ) { signed char ReturnValue = 0; double dX = adX[iSplitVar*cRow + iRow]; if(!ISNA(dX)) { if(std::find(aiLeftCategory, aiLeftCategory+cLeftCategory, (ULONG)dX) != aiLeftCategory+cLeftCategory) { ReturnValue = -1; } else { ReturnValue = 1; } } // if missing value returns 0 return ReturnValue; } GBMRESULT CNodeCategorical::RecycleSelf ( CNodeFactory *pNodeFactory ) { GBMRESULT hr = GBM_OK; hr = pNodeFactory->RecycleNode(this); return hr; }; GBMRESULT CNodeCategorical::TransferTreeToRList ( int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage ) { GBMRESULT hr = GBM_OK; int iThisNodeID = iNodeID; unsigned long cCatSplits = vecSplitCodes.size(); unsigned long i = 0; int cLevels = pData->acVarClasses[iSplitVar]; aiSplitVar[iThisNodeID] = iSplitVar; adSplitPoint[iThisNodeID] = cCatSplits+cCatSplitsOld; // 0 based adErrorReduction[iThisNodeID] = dImprovement; adWeight[iThisNodeID] = dTrainW; adPred[iThisNodeID] = dShrinkage*dPrediction; vecSplitCodes.push_back(VEC_CATEGORIES()); vecSplitCodes[cCatSplits].resize(cLevels,1); for(i=0; iTransferTreeToRList(iNodeID, pData, aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld, dShrinkage); if(GBM_FAILED(hr)) goto Error; aiRightNode[iThisNodeID] = iNodeID; hr = pRightNode->TransferTreeToRList(iNodeID, pData, aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld, dShrinkage); if(GBM_FAILED(hr)) goto Error; aiMissingNode[iThisNodeID] = iNodeID; hr = pMissingNode->TransferTreeToRList(iNodeID, pData, aiSplitVar, adSplitPoint, aiLeftNode, aiRightNode, aiMissingNode, adErrorReduction, adWeight, adPred, vecSplitCodes, cCatSplitsOld, dShrinkage); if(GBM_FAILED(hr)) goto Error; Cleanup: return hr; Error: goto Cleanup; } gbm/src/coxph.h0000644000176200001440000000577713417115400013114 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: coxph.h // // License: GNU GPL (version 2 or later) // // Contents: Cox proportional hazard object // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef COXPH_H #define COXPH_H #include "distribution.h" #include "matrix.h" class CCoxPH : public CDistribution { public: CCoxPH(); virtual ~CCoxPH(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adT, double *adDelta, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff); GBMRESULT InitF(double *adT, double *adDelta, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adT, double *adDelta, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double Deviance(double *adT, double *adDelta, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); double BagImprovement(double *adT, double *adDelta, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: vector vecdP; vector vecdRiskTot; vector vecdG; vector veciK2Node; vector veciNode2K; matrix matH; matrix matHinv; }; #endif // COXPH_H gbm/src/laplace.cpp0000644000176200001440000001027013417115400013707 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "laplace.h" CLaplace::CLaplace() { mpLocM = NULL; } CLaplace::~CLaplace() { if(mpLocM != NULL) { delete mpLocM; } } GBMRESULT CLaplace::ComputeWorkingResponse ( double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { unsigned long i = 0; if(adOffset == NULL) { for(i=0; i 0.0 ? 1.0 : -1.0; } } else { for(i=0; i 0.0 ? 1.0 : -1.0; } } return GBM_OK; } GBMRESULT CLaplace::InitF ( double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength ) { GBMRESULT hr = GBM_OK; double dOffset = 0.0; unsigned long ii = 0; int nLength = int(cLength); double *adArr = NULL; // Create a new LocationM object (for weighted medians) double *pTemp = NULL; mpLocM = new CLocationM("Other", 0, pTemp); if(mpLocM == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } adArr = new double[cLength]; if(adArr == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } for (ii = 0; ii < cLength; ii++) { dOffset = (adOffset==NULL) ? 0.0 : adOffset[ii]; adArr[ii] = adY[ii] - dOffset; } dInitF = mpLocM->Median(nLength, adArr, adWeight); Cleanup: return hr; Error: goto Cleanup; } double CLaplace::Deviance ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff ) { unsigned long i=0; double dL = 0.0; double dW = 0.0; if(adOffset == NULL) { for(i=cIdxOff; icN >= cMinObsInNode) { iVecd = 0; for(iObs=0; iObsdPrediction = mpLocM->Median(iVecd, adArr, adW2); } } return hr; } double CLaplace::BagImprovement ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dF = 0.0; double dW = 0.0; unsigned long i = 0; for(i=0; i #include "dataset.h" #include "node.h" using namespace std; class CNodeTerminal : public CNode { public: CNodeTerminal(); ~CNodeTerminal(); GBMRESULT Adjust(unsigned long cMinObsInNode); GBMRESULT PrintSubtree(unsigned long cIndent); GBMRESULT TransferTreeToRList(int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage); GBMRESULT ApplyShrinkage(double dLambda); GBMRESULT Predict(CDataset *pData, unsigned long i, double &dFadj); GBMRESULT Predict(double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow, double &dFadj); GBMRESULT GetVarRelativeInfluence(double *adRelInf); GBMRESULT RecycleSelf(CNodeFactory *pNodeFactory); }; typedef CNodeTerminal *PCNodeTerminal; typedef vector VEC_P_NODETERMINAL; #endif // NODETERMINAL_H gbm/src/node_nonterminal.cpp0000644000176200001440000000467513417115400015655 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "node_nonterminal.h" CNodeNonterminal::CNodeNonterminal() { pLeftNode = NULL; pRightNode = NULL; iSplitVar = 0; dImprovement = 0.0; pMissingNode = NULL; } CNodeNonterminal::~CNodeNonterminal() { } GBMRESULT CNodeNonterminal::Adjust ( unsigned long cMinObsInNode ) { GBMRESULT hr = GBM_OK; hr = pLeftNode->Adjust(cMinObsInNode); hr = pRightNode->Adjust(cMinObsInNode); if(pMissingNode->isTerminal && (pMissingNode->cN < cMinObsInNode)) { dPrediction = ((pLeftNode->dTrainW)*(pLeftNode->dPrediction) + (pRightNode->dTrainW)*(pRightNode->dPrediction))/ (pLeftNode->dTrainW + pRightNode->dTrainW); pMissingNode->dPrediction = dPrediction; } else { hr = pMissingNode->Adjust(cMinObsInNode); dPrediction = ((pLeftNode->dTrainW)* (pLeftNode->dPrediction) + (pRightNode->dTrainW)* (pRightNode->dPrediction) + (pMissingNode->dTrainW)*(pMissingNode->dPrediction))/ (pLeftNode->dTrainW + pRightNode->dTrainW + pMissingNode->dTrainW); } return hr; } GBMRESULT CNodeNonterminal::Predict ( CDataset *pData, unsigned long iRow, double &dFadj ) { GBMRESULT hr = GBM_OK; signed char schWhichNode = WhichNode(pData,iRow); if(schWhichNode == -1) { hr = pLeftNode->Predict(pData, iRow, dFadj); } else if(schWhichNode == 1) { hr = pRightNode->Predict(pData, iRow, dFadj); } else { hr = pMissingNode->Predict(pData, iRow, dFadj); } return hr; } GBMRESULT CNodeNonterminal::Predict ( double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow, double &dFadj ) { GBMRESULT hr = GBM_OK; signed char schWhichNode = WhichNode(adX,cRow,cCol,iRow); if(schWhichNode == -1) { hr = pLeftNode->Predict(adX,cRow,cCol,iRow,dFadj); } else if(schWhichNode == 1) { hr = pRightNode->Predict(adX,cRow,cCol,iRow,dFadj); } else { hr = pMissingNode->Predict(adX,cRow,cCol,iRow,dFadj); } return hr; } GBMRESULT CNodeNonterminal::GetVarRelativeInfluence ( double *adRelInf ) { GBMRESULT hr = GBM_OK; adRelInf[iSplitVar] += dImprovement; pLeftNode->GetVarRelativeInfluence(adRelInf); pRightNode->GetVarRelativeInfluence(adRelInf); return hr; } gbm/src/node_terminal.cpp0000644000176200001440000000432313417115400015130 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: node_terminal.cpp // //------------------------------------------------------------------------------ #include "node_terminal.h" #include "node_factory.h" CNodeTerminal::CNodeTerminal() { isTerminal = true; } CNodeTerminal::~CNodeTerminal() { #ifdef NOISY_DEBUG Rprintf("terminal destructor\n"); #endif } GBMRESULT CNodeTerminal::Adjust ( unsigned long cMinObsInNode ) { return GBM_OK; } GBMRESULT CNodeTerminal::ApplyShrinkage ( double dLambda ) { GBMRESULT hr = GBM_OK; dPrediction *= dLambda; return hr; } GBMRESULT CNodeTerminal::Predict ( CDataset *pData, unsigned long iRow, double &dFadj ) { dFadj = dPrediction; return GBM_OK; } GBMRESULT CNodeTerminal::Predict ( double *adX, unsigned long cRow, unsigned long cCol, unsigned long iRow, double &dFadj ) { dFadj = dPrediction; return GBM_OK; } GBMRESULT CNodeTerminal::PrintSubtree ( unsigned long cIndent ) { unsigned long i = 0; for(i=0; i< cIndent; i++) Rprintf(" "); Rprintf("N=%f, Prediction=%f *\n", dTrainW, dPrediction); return GBM_OK; } GBMRESULT CNodeTerminal::GetVarRelativeInfluence ( double *adRelInf ) { return GBM_OK; } GBMRESULT CNodeTerminal::RecycleSelf ( CNodeFactory *pNodeFactory ) { pNodeFactory->RecycleNode(this); return GBM_OK; }; GBMRESULT CNodeTerminal::TransferTreeToRList ( int &iNodeID, CDataset *pData, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, VEC_VEC_CATEGORIES &vecSplitCodes, int cCatSplitsOld, double dShrinkage ) { GBMRESULT hr = GBM_OK; aiSplitVar[iNodeID] = -1; adSplitPoint[iNodeID] = dShrinkage*dPrediction; aiLeftNode[iNodeID] = -1; aiRightNode[iNodeID] = -1; aiMissingNode[iNodeID] = -1; adErrorReduction[iNodeID] = 0.0; adWeight[iNodeID] = dTrainW; adPred[iNodeID] = dShrinkage*dPrediction; iNodeID++; return hr; } gbm/src/gbmentry.cpp0000644000176200001440000011225713417115400014145 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "gbm.h" extern "C" { #include #include SEXP gbm_fit ( SEXP radY, // outcome or response SEXP radOffset, // offset for f(x), NA for no offset SEXP radX, SEXP raiXOrder, SEXP radWeight, SEXP radMisc, // other row specific data (eg failure time), NA=no Misc SEXP rcRows, SEXP rcCols, SEXP racVarClasses, SEXP ralMonotoneVar, SEXP rszFamily, SEXP rcTrees, SEXP rcDepth, // interaction depth SEXP rcMinObsInNode, SEXP rcNumClasses, SEXP rdShrinkage, SEXP rdBagFraction, SEXP rcTrain, SEXP radFOld, SEXP rcCatSplitsOld, SEXP rcTreesOld, SEXP rfVerbose ) { unsigned long hr = 0; SEXP rAns = NULL; SEXP rNewTree = NULL; SEXP riSplitVar = NULL; SEXP rdSplitPoint = NULL; SEXP riLeftNode = NULL; SEXP riRightNode = NULL; SEXP riMissingNode = NULL; SEXP rdErrorReduction = NULL; SEXP rdWeight = NULL; SEXP rdPred = NULL; SEXP rdInitF = NULL; SEXP radF = NULL; SEXP radTrainError = NULL; SEXP radValidError = NULL; SEXP radOOBagImprove = NULL; SEXP rSetOfTrees = NULL; SEXP rSetSplitCodes = NULL; SEXP rSplitCode = NULL; VEC_VEC_CATEGORIES vecSplitCodes; int i = 0; int iT = 0; int iK = 0; int cTrees = INTEGER(rcTrees)[0]; const int cResultComponents = 7; // rdInitF, radF, radTrainError, radValidError, radOOBagImprove // rSetOfTrees, rSetSplitCodes const int cTreeComponents = 8; // riSplitVar, rdSplitPoint, riLeftNode, // riRightNode, riMissingNode, rdErrorReduction, rdWeight, rdPred int cNodes = 0; int cTrain = INTEGER(rcTrain)[0]; int cNumClasses = INTEGER(rcNumClasses)[0]; double dTrainError = 0.0; double dValidError = 0.0; double dOOBagImprove = 0.0; CGBM *pGBM = NULL; CDataset *pData = NULL; CDistribution *pDist = NULL; int cGroups = -1; // set up the dataset pData = new CDataset(); if(pData==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } // initialize R's random number generator GetRNGstate(); // initialize some things hr = gbm_setup(REAL(radY), REAL(radOffset), REAL(radX), INTEGER(raiXOrder), REAL(radWeight), REAL(radMisc), INTEGER(rcRows)[0], INTEGER(rcCols)[0], INTEGER(racVarClasses), INTEGER(ralMonotoneVar), CHAR(STRING_ELT(rszFamily,0)), INTEGER(rcTrees)[0], INTEGER(rcDepth)[0], INTEGER(rcMinObsInNode)[0], INTEGER(rcNumClasses)[0], REAL(rdShrinkage)[0], REAL(rdBagFraction)[0], INTEGER(rcTrain)[0], pData, pDist, cGroups); if(GBM_FAILED(hr)) { goto Error; } // allocate the GBM pGBM = new CGBM(); if(pGBM==NULL) { hr = GBM_OUTOFMEMORY; goto Error; } // initialize the GBM hr = pGBM->Initialize(pData, pDist, REAL(rdShrinkage)[0], cTrain, REAL(rdBagFraction)[0], INTEGER(rcDepth)[0], INTEGER(rcMinObsInNode)[0], INTEGER(rcNumClasses)[0], cGroups); if(GBM_FAILED(hr)) { goto Error; } // allocate the main return object PROTECT(rAns = allocVector(VECSXP, cResultComponents)); // allocate the initial value PROTECT(rdInitF = allocVector(REALSXP, 1)); SET_VECTOR_ELT(rAns,0,rdInitF); UNPROTECT(1); // rdInitF // allocate the predictions PROTECT(radF = allocVector(REALSXP, (pData->cRows) * cNumClasses)); SET_VECTOR_ELT(rAns,1,radF); UNPROTECT(1); // radF hr = pDist->Initialize(pData->adY, pData->adMisc, pData->adOffset, pData->adWeight, pData->cRows); if(ISNA(REAL(radFOld)[0])) // check for old predictions { // set the initial value of F as a constant hr = pDist->InitF(pData->adY, pData->adMisc, pData->adOffset, pData->adWeight, REAL(rdInitF)[0], cTrain); for(i=0; i < (pData->cRows) * cNumClasses; i++) { REAL(radF)[i] = REAL(rdInitF)[0]; } } else { for(i=0; i < (pData->cRows) * cNumClasses; i++) { REAL(radF)[i] = REAL(radFOld)[i]; } } // allocate space for the performance measures PROTECT(radTrainError = allocVector(REALSXP, cTrees)); PROTECT(radValidError = allocVector(REALSXP, cTrees)); PROTECT(radOOBagImprove = allocVector(REALSXP, cTrees)); SET_VECTOR_ELT(rAns,2,radTrainError); SET_VECTOR_ELT(rAns,3,radValidError); SET_VECTOR_ELT(rAns,4,radOOBagImprove); UNPROTECT(3); // radTrainError , radValidError, radOOBagImprove // allocate the component for the tree structures PROTECT(rSetOfTrees = allocVector(VECSXP, cTrees * cNumClasses)); SET_VECTOR_ELT(rAns,5,rSetOfTrees); UNPROTECT(1); // rSetOfTrees if(INTEGER(rfVerbose)[0]) { Rprintf("Iter TrainDeviance ValidDeviance StepSize Improve\n"); } for(iT=0; iTUpdateParams(REAL(radF), pData->adOffset, pData->adWeight, cTrain); if(GBM_FAILED(hr)) { goto Error; } REAL(radTrainError)[iT] = 0.0; REAL(radValidError)[iT] = 0.0; REAL(radOOBagImprove)[iT] = 0.0; for (iK = 0; iK < cNumClasses; iK++) { hr = pGBM->iterate(REAL(radF), dTrainError,dValidError,dOOBagImprove, cNodes, cNumClasses, iK); if(GBM_FAILED(hr)) { goto Error; } // store the performance measures REAL(radTrainError)[iT] += dTrainError; REAL(radValidError)[iT] += dValidError; REAL(radOOBagImprove)[iT] += dOOBagImprove; // allocate the new tree component for the R list structure PROTECT(rNewTree = allocVector(VECSXP, cTreeComponents)); // riNodeID,riSplitVar,rdSplitPoint,riLeftNode, // riRightNode,riMissingNode,rdErrorReduction,rdWeight PROTECT(riSplitVar = allocVector(INTSXP, cNodes)); PROTECT(rdSplitPoint = allocVector(REALSXP, cNodes)); PROTECT(riLeftNode = allocVector(INTSXP, cNodes)); PROTECT(riRightNode = allocVector(INTSXP, cNodes)); PROTECT(riMissingNode = allocVector(INTSXP, cNodes)); PROTECT(rdErrorReduction = allocVector(REALSXP, cNodes)); PROTECT(rdWeight = allocVector(REALSXP, cNodes)); PROTECT(rdPred = allocVector(REALSXP, cNodes)); SET_VECTOR_ELT(rNewTree,0,riSplitVar); SET_VECTOR_ELT(rNewTree,1,rdSplitPoint); SET_VECTOR_ELT(rNewTree,2,riLeftNode); SET_VECTOR_ELT(rNewTree,3,riRightNode); SET_VECTOR_ELT(rNewTree,4,riMissingNode); SET_VECTOR_ELT(rNewTree,5,rdErrorReduction); SET_VECTOR_ELT(rNewTree,6,rdWeight); SET_VECTOR_ELT(rNewTree,7,rdPred); UNPROTECT(cTreeComponents); SET_VECTOR_ELT(rSetOfTrees,(iK + iT * cNumClasses),rNewTree); UNPROTECT(1); // rNewTree hr = gbm_transfer_to_R(pGBM, vecSplitCodes, INTEGER(riSplitVar), REAL(rdSplitPoint), INTEGER(riLeftNode), INTEGER(riRightNode), INTEGER(riMissingNode), REAL(rdErrorReduction), REAL(rdWeight), REAL(rdPred), INTEGER(rcCatSplitsOld)[0]); } // Close for iK // print the information if((iT <= 9) || ((iT+1+INTEGER(rcTreesOld)[0])/20 == (iT+1+INTEGER(rcTreesOld)[0])/20.0) || (iT==cTrees-1)) { R_CheckUserInterrupt(); if(INTEGER(rfVerbose)[0]) { Rprintf("%6d %13.4f %15.4f %10.4f %9.4f\n", iT+1+INTEGER(rcTreesOld)[0], REAL(radTrainError)[iT], REAL(radValidError)[iT], REAL(rdShrinkage)[0], REAL(radOOBagImprove)[iT]); } } } if(INTEGER(rfVerbose)[0]) Rprintf("\n"); // transfer categorical splits to R PROTECT(rSetSplitCodes = allocVector(VECSXP, vecSplitCodes.size())); SET_VECTOR_ELT(rAns,6,rSetSplitCodes); UNPROTECT(1); // rSetSplitCodes for(i=0; i<(int)vecSplitCodes.size(); i++) { PROTECT(rSplitCode = allocVector(INTSXP, size_of_vector(vecSplitCodes,i))); SET_VECTOR_ELT(rSetSplitCodes,i,rSplitCode); UNPROTECT(1); // rSplitCode hr = gbm_transfer_catsplits_to_R(i, vecSplitCodes, INTEGER(rSplitCode)); } // dump random number generator seed #ifdef NOISY_DEBUG Rprintf("PutRNGstate\n"); #endif PutRNGstate(); Cleanup: UNPROTECT(1); // rAns #ifdef NOISY_DEBUG Rprintf("destructing\n"); #endif if(pGBM != NULL) { delete pGBM; pGBM = NULL; } if(pDist != NULL) { delete pDist; pDist = NULL; } if(pData != NULL) { delete pData; pData = NULL; } return rAns; Error: goto Cleanup; } SEXP gbm_pred ( SEXP radX, // the data matrix SEXP rcRows, // number of rows SEXP rcCols, // number of columns SEXP rcNumClasses, // number of classes SEXP rcTrees, // number of trees, may be a vector SEXP rdInitF, // the initial value SEXP rTrees, // the list of trees SEXP rCSplits, // the list of categorical splits SEXP raiVarType, // indicator of continuous/nominal SEXP riSingleTree // boolean whether to return only results for one tree ) { unsigned long hr = 0; int iTree = 0; int iObs = 0; int cRows = INTEGER(rcRows)[0]; int cPredIterations = LENGTH(rcTrees); int iPredIteration = 0; int cTrees = 0; int iClass = 0; int cNumClasses = INTEGER(rcNumClasses)[0]; SEXP rThisTree = NULL; int *aiSplitVar = NULL; double *adSplitCode = NULL; int *aiLeftNode = NULL; int *aiRightNode = NULL; int *aiMissingNode = NULL; int iCurrentNode = 0; double dX = 0.0; int iCatSplitIndicator = 0; bool fSingleTree = (INTEGER(riSingleTree)[0]==1); SEXP radPredF = NULL; // allocate the predictions to return PROTECT(radPredF = allocVector(REALSXP, cRows*cNumClasses*cPredIterations)); if(radPredF == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } // initialize the predicted values if(!fSingleTree) { // initialize with the intercept for only the smallest rcTrees for(iObs=0; iObs0)) { // copy over from the last rcTrees for(iObs=0; iObs 0) { cStackNodes--; iCurrentNode = aiNodeStack[cStackNodes]; if(aiSplitVar[iCurrentNode] == -1) // terminal node { REAL(radPredF)[iClass*cRows + iObs] += adWeightStack[cStackNodes]*adSplitCode[iCurrentNode]; } else // non-terminal node { // is this a split variable that interests me? iPredVar = -1; for(i=0; (iPredVar == -1) && (i < cCols); i++) { if(INTEGER(raiWhichVar)[i] == aiSplitVar[iCurrentNode]) { iPredVar = i; // split is on one that interests me } } if(iPredVar != -1) // this split is among raiWhichVar { dX = REAL(radX)[iPredVar*cRows + iObs]; // missing? if(ISNA(dX)) { aiNodeStack[cStackNodes] = aiMissingNode[iCurrentNode]; cStackNodes++; } // continuous? else if(INTEGER(raiVarType)[aiSplitVar[iCurrentNode]] == 0) { if(dX < adSplitCode[iCurrentNode]) { aiNodeStack[cStackNodes] = aiLeftNode[iCurrentNode]; cStackNodes++; } else { aiNodeStack[cStackNodes] = aiRightNode[iCurrentNode]; cStackNodes++; } } else // categorical { iCatSplitIndicator = INTEGER( VECTOR_ELT(rCSplits, (int)adSplitCode[iCurrentNode]))[(int)dX]; if(iCatSplitIndicator==-1) { aiNodeStack[cStackNodes] = aiLeftNode[iCurrentNode]; cStackNodes++; } else if(iCatSplitIndicator==1) { aiNodeStack[cStackNodes] = aiRightNode[iCurrentNode]; cStackNodes++; } else // handle unused level { iCurrentNode = aiMissingNode[iCurrentNode]; } } } // iPredVar != -1 else // not interested in this split, average left and right { aiNodeStack[cStackNodes] = aiRightNode[iCurrentNode]; dCurrentW = adWeightStack[cStackNodes]; adWeightStack[cStackNodes] = dCurrentW * adW[aiRightNode[iCurrentNode]]/ (adW[aiLeftNode[iCurrentNode]]+ adW[aiRightNode[iCurrentNode]]); cStackNodes++; aiNodeStack[cStackNodes] = aiLeftNode[iCurrentNode]; adWeightStack[cStackNodes] = dCurrentW-adWeightStack[cStackNodes-1]; cStackNodes++; } } // non-terminal node } // while(cStackNodes > 0) } // iObs } // iClass } // iTree Cleanup: UNPROTECT(1); // radPredF return radPredF; Error: goto Cleanup; } // gbm_plot SEXP gbm_shrink_pred ( SEXP radX, SEXP rcRows, SEXP rcCols, SEXP rcNumClasses, SEXP racTrees, SEXP rdInitF, SEXP rTrees, SEXP rCSplits, SEXP raiVarType, SEXP rcInteractionDepth, SEXP radLambda ) { unsigned long hr = 0; int iTree = 0; int iPredictionIter = 0; int iObs = 0; int iClass = 0; int i = 0; int cRows = INTEGER(rcRows)[0]; int cNumClasses = INTEGER(rcNumClasses)[0]; double *adLambda = REAL(radLambda); double dLambda = 0.0; double dPred = 0.0; SEXP rThisTree = NULL; int *aiSplitVar = NULL; double *adSplitCode = NULL; int *aiLeftNode = NULL; int *aiRightNode = NULL; int *aiMissingNode = NULL; double *adNodeW = NULL; int iCurrentNode = 0; double dX = 0.0; int iCatSplitIndicator = 0; SEXP rResult = NULL; SEXP radPredF = NULL; // The predictions double *adPredF = NULL; // The shrunken predictions double *adNodePred = NULL; int *aiNodeStack = NULL; unsigned long cNodeStack = 0; int cMaxNodes = 1+3*(INTEGER(rcInteractionDepth)[0]); adPredF = new double[cRows * cNumClasses]; if(adPredF == NULL) { hr = GBM_OUTOFMEMORY; goto Error; } for(iObs=0; iObs0) { i = aiNodeStack[cNodeStack-1]; if(aiSplitVar[i]==-1) { adNodePred[i] = adSplitCode[i]; cNodeStack--; } else if(ISNA(adNodePred[aiLeftNode[i]])) { aiNodeStack[cNodeStack] = aiLeftNode[i]; cNodeStack++; aiNodeStack[cNodeStack] = aiRightNode[i]; cNodeStack++; // check whether missing node is the same as parent node // occurs when X_i has no missing values if(adNodeW[i] != adNodeW[aiMissingNode[i]]) { aiNodeStack[cNodeStack] = aiMissingNode[i]; cNodeStack++; } else { adNodePred[aiMissingNode[i]] = 0.0; } } else { // compute the parent node's prediction adNodePred[i] = (adNodeW[aiLeftNode[i]]*adNodePred[aiLeftNode[i]] + adNodeW[aiRightNode[i]]*adNodePred[aiRightNode[i]]+ adNodeW[aiMissingNode[i]]*adNodePred[aiMissingNode[i]])/ adNodeW[i]; cNodeStack--; } } // predict for the observations for(iObs=0; iObs 1) { adProb = new double[cNumClasses]; } // initialize the predicted values for(iObs=0; iObs 1) then calculate the probabilities if (cNumClasses > 1) { dDenom = 0.0; for (iClass = 0; iClass < cNumClasses; iClass++) { adProb[iClass] = exp(REAL(radPredF)[iObs + iClass * cRows]); dDenom += adProb[iClass]; } dDJDf = 0.0; for (iClass = 0; iClass < cNumClasses; iClass++) { adProb[iClass] /= dDenom; REAL(rdObjective)[0] += (adY[iObs + iClass * cRows] - adProb[iClass]) * (adY[iObs + iClass * cRows] - adProb[iClass]); dDJDf += -2*(adY[iObs + iClass * cRows] - adProb[iClass]); } REAL(rdObjective)[0] /= double(cNumClasses); dDJDf /= double(cNumClasses); } else { // DEBUG: need to make more general for other loss functions! REAL(rdObjective)[0] += (adY[iObs]-REAL(radPredF)[iObs])* (adY[iObs]-REAL(radPredF)[iObs]); dDJDf = -2*(adY[iObs]-REAL(radPredF)[iObs]); } for(iLambda=0; iLambda #include "distribution.h" #include "locationm.h" class CTDist : public CDistribution { public: CTDist(double adNu); virtual ~CTDist(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: double mdNu; CLocationM *mpLocM; }; #endif // TDISTCGBM_H gbm/src/poisson.h0000644000176200001440000000567413417115400013461 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // File: poisson.h // // License: GNU GPL (version 2 or later) // // Contents: poisson object // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef POISSON_H #define POISSON_H #include #include "distribution.h" class CPoisson : public CDistribution { public: CPoisson(); virtual ~CPoisson(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adZ, bool *afInBag, unsigned long nTrain, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: vector vecdNum; vector vecdDen; vector vecdMax; vector vecdMin; }; #endif // POISSON_H gbm/src/gbm.h0000644000176200001440000000355613417115400012531 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: gbm.h // // License: GNU GPL (version 2 or later) // // Contents: Entry point for gbm.dll // // Owner: gregr@rand.org // // History: 2/14/2003 gregr created // 6/11/2007 gregr added quantile regression // written by Brian Kriegler // //------------------------------------------------------------------------------ #include #include "dataset.h" #include "distribution.h" #include "bernoulli.h" #include "adaboost.h" #include "poisson.h" #include "gaussian.h" #include "coxph.h" #include "laplace.h" #include "quantile.h" #include "tdist.h" #include "multinomial.h" #include "pairwise.h" #include "gbm_engine.h" #include "locationm.h" #include "huberized.h" typedef vector VEC_CATEGORIES; typedef vector VEC_VEC_CATEGORIES; GBMRESULT gbm_setup ( double *adY, double *adOffset, double *adX, int *aiXOrder, double *adWeight, double *adMisc, int cRows, int cCols, int *acVarClasses, int *alMonotoneVar, const char *pszFamily, int cTrees, int cLeaves, int cMinObsInNode, int cNumClasses, double dShrinkage, double dBagFraction, int cTrain, CDataset *pData, PCDistribution &pDist, int& cGroups ); GBMRESULT gbm_transfer_to_R ( CGBM *pGBM, VEC_VEC_CATEGORIES &vecSplitCodes, int *aiSplitVar, double *adSplitPoint, int *aiLeftNode, int *aiRightNode, int *aiMissingNode, double *adErrorReduction, double *adWeight, double *adPred, int cCatSplitsOld ); GBMRESULT gbm_transfer_catsplits_to_R ( int iCatSplit, VEC_VEC_CATEGORIES &vecSplitCodes, int *aiSplitCodes ); int size_of_vector ( VEC_VEC_CATEGORIES &vec, int i ); gbm/src/bernoulli.h0000644000176200001440000000555313417115400013756 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: bernoulli.h // // License: GNU GPL (version 2 or later) // // Contents: bernoulli object // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef BERNOULLI_H #define BERNOULLI_H #include "distribution.h" #include "buildinfo.h" class CBernoulli : public CDistribution { public: CBernoulli(); virtual ~CBernoulli(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: vector vecdNum; vector vecdDen; }; #endif // BERNOULLI_H gbm/src/gaussian.cpp0000644000176200001440000000643113417115400014124 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "gaussian.h" CGaussian::CGaussian() { } CGaussian::~CGaussian() { } GBMRESULT CGaussian::ComputeWorkingResponse ( double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { GBMRESULT hr = GBM_OK; unsigned long i = 0; if((adY == NULL) || (adF == NULL) || (adZ == NULL) || (adWeight == NULL)) { hr = GBM_INVALIDARG; goto Error; } if(adOffset == NULL) { for(i=0; idPrediction = 0.0; } else { vecpTermNodes[iNode]->dPrediction = vecdNum[iNode]/vecdDen[iNode]; } } } return hr; } double CHuberized::BagImprovement ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dF = 0.0; double dW = 0.0; unsigned long i = 0; for(i=0; i 0) ? dClassSum : 1e-8; for (kk = 0; kk < mcNumClasses; kk++) { madProb[ii + kk * mcRows] /= dClassSum; } } return GBM_OK; } GBMRESULT CMultinomial::ComputeWorkingResponse ( double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { unsigned long i = 0; for(i=cIdxOff; icN >= cMinObsInNode) { // Get the number of nodes here double dNum = 0.0; double dDenom = 0.0; for (iObs = 0; iObs < nTrain; iObs++) { if(afInBag[iObs] && (aiNodeAssign[iObs] == iNode)) { int iIdx = iObs + cIdxOff; dNum += adW[iIdx] * adZ[iIdx]; dDenom += adW[iIdx] * fabs(adZ[iIdx]) * (1 - fabs(adZ[iIdx])); } } dDenom = (dDenom > 0) ? dDenom : 1e-8; vecpTermNodes[iNode]->dPrediction = dNum / dDenom; } } return hr; } double CMultinomial::BagImprovement ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dW = 0.0; unsigned long ii; unsigned long kk; // Calculate the probabilities after the step double *adStepProb = new double[mcNumClasses * mcRows]; // Assume that this is last class - calculate new prob as in updateParams but // using (F_ik + ss*Fadj_ik) instead of F_ik. Then calculate OOB improve for (ii = 0; ii < mcRows; ii++) { double dClassSum = 0.0; for (kk = 0; kk < mcNumClasses; kk++) { int iIdx = ii + kk * mcRows; double dF = (adOffset == NULL) ? adF[iIdx] : adF[iIdx] + adOffset[iIdx]; dF += dStepSize * adFadj[iIdx]; adStepProb[iIdx] = adWeight[iIdx] * exp(dF); dClassSum += adWeight[iIdx] * exp(dF); } dClassSum = (dClassSum > 0) ? dClassSum : 1e-8; for (kk = 0; kk < mcNumClasses; kk++) { adStepProb[ii + kk * mcRows] /= dClassSum; } } // Calculate the improvement for(ii=0; ii #include "distribution.h" class CQuantile: public CDistribution { public: CQuantile(double dAlpha); virtual ~CQuantile(); GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; }; GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff); GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength); GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adW, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long nTrain, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff); double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff); double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain); private: vector vecd; double dAlpha; }; #endif // QUANTILE_H gbm/src/distribution.h0000644000176200001440000001312113417115400014470 0ustar liggesusers//------------------------------------------------------------------------------ // GBM by Greg Ridgeway Copyright (C) 2003 // // File: distribution.h // // License: GNU GPL (version 2 or later) // // Contents: distribution object // // Owner: gregr@rand.org // // History: 3/26/2001 gregr created // 2/14/2003 gregr: adapted for R implementation // //------------------------------------------------------------------------------ #ifndef DISTRIBUTION_H #define DISTRIBUTION_H #include "node_terminal.h" class CDistribution { public: CDistribution(); virtual ~CDistribution(); // In the subsequent functions, parameters have the following meaning: // * adY - The target // * adMisc - Optional auxiliary data (the precise meaning is specific to the // derived class) // * adOffset - An optional offset to the score (adF) // * adWeight - Instance training weight // * adF - Current score (sum of all trees generated so far) // * adZ - (Negative) gradient of loss function, to be predicted by tree // * adFadj - Output of current tree, to be added to adF // * cLength - Number of instances (size of vectors) // * afInBag - true if instance is part of training set for current tree // (depends on random subsampling) // * cIdxOff - Offset used for multi-class training (CMultinomial). // Initialize() is called once, before training starts. // It gives derived classes a chance for custom preparations, e.g., to allocate // memory or to pre-compute values that do not change between iterations. virtual GBMRESULT Initialize(double *adY, double *adMisc, double *adOffset, double *adWeight, unsigned long cLength) { return GBM_OK; } // UpdateParams() is called at the start of each iteration. // CMultinomial uses it to normalize predictions across multiple classes. virtual GBMRESULT UpdateParams(double *adF, double *adOffset, double *adWeight, unsigned long cLength) = 0; // ComputeWorkingResonse() calculates the negative gradients of the // loss function, and stores them in adZ. virtual GBMRESULT ComputeWorkingResponse(double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long cLength, int cIdxOff) = 0; // InitF() computes the best constant prediction for all instances, and // stores it in dInitF. virtual GBMRESULT InitF(double *adY, double *adMisc, double *adOffset, double *adWeight, double &dInitF, unsigned long cLength) = 0; // Deviance() returns the value of the loss function, based on the // current predictions (adF). virtual double Deviance(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff) = 0; // FitBestConstant() calculates and sets prediction values for all terminal nodes // of the tree being currently constructed. // Assumptions: // * cTermNodes is the number of terminal nodes of the tree. // * vecpTermNodes is a vector of (pointers to) the terminal nodes of the tree, of // size cTermNodes. // * aiNodeAssign is a vector of size cLength, that maps each instance to an index // into vecpTermNodes for the corresponding terminal node. virtual GBMRESULT FitBestConstant(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adZ, unsigned long *aiNodeAssign, unsigned long cLength, VEC_P_NODETERMINAL vecpTermNodes, unsigned long cTermNodes, unsigned long cMinObsInNode, bool *afInBag, double *adFadj, int cIdxOff) = 0; // BagImprovement() returns the incremental difference in the loss // function induced by scoring with (adF + dStepSize * adFAdj) instead of adF, for // all instances that were not part of the training set for the current tree (i.e., // afInBag set to false). virtual double BagImprovement(double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long cLength) = 0; }; typedef CDistribution *PCDistribution; #endif // DISTRIBUTION_H gbm/src/tdist.cpp0000644000176200001440000001055713417115400013445 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 #include "tdist.h" CTDist::CTDist(double adNu) { mdNu = adNu; double *adParams = new double[1]; adParams[0] = adNu; mpLocM = new CLocationM("tdist", 1, adParams); delete[] adParams; } CTDist::~CTDist() { delete mpLocM; } GBMRESULT CTDist::ComputeWorkingResponse ( double *adY, double *adMisc, double *adOffset, double *adF, double *adZ, double *adWeight, bool *afInBag, unsigned long nTrain, int cIdxOff ) { unsigned long i = 0; double dU = 0.0; if(adOffset == NULL) { for(i=0; iLocationM(iN, adArr, adWeight); delete[] adArr; return GBM_OK; } double CTDist::Deviance ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, unsigned long cLength, int cIdxOff ) { unsigned long i=0; double dL = 0.0; double dW = 0.0; double dU = 0.0; if(adOffset == NULL) { for(i=cIdxOff; icN >= cMinObsInNode) { // Get the number of nodes here int iNumNodes = 0; for (iObs = 0; iObs < nTrain; iObs++) { if(afInBag[iObs] && (aiNodeAssign[iObs] == iNode)) { iNumNodes++; } } // Create the arrays to centre double *adArr = new double[iNumNodes]; double *adWeight = new double[iNumNodes]; int iIdx = 0; for(iObs=0; iObsdPrediction = mpLocM->LocationM(iNumNodes, adArr, adWeight); delete[] adArr; delete[] adWeight; } } return hr; } double CTDist::BagImprovement ( double *adY, double *adMisc, double *adOffset, double *adWeight, double *adF, double *adFadj, bool *afInBag, double dStepSize, unsigned long nTrain ) { double dReturnValue = 0.0; double dF = 0.0; double dW = 0.0; unsigned long i = 0; double dU = 0.0; double dV = 0.0; for(i=0; i // generic object (class) definition of matrix: template class matrix{ // NOTE: maxsize determines available memory storage, but // actualsize determines the actual size of the stored matrix in use // at a particular time. int maxsize; // max number of rows (same as max number of columns) int actualsize; // actual size (rows, or columns) of the stored matrix D* data; // where the data contents of the matrix are stored void allocateD() { delete[] data; data = new D [maxsize*maxsize]; }; public: matrix() { maxsize = 5; actualsize = 5; data = 0; allocateD(); }; // private ctor's matrix(int newmaxsize) {matrix(newmaxsize,newmaxsize);}; matrix(int newmaxsize, int newactualsize) { // the only public ctor if (newmaxsize <= 0) newmaxsize = 5; maxsize = newmaxsize; if ((newactualsize <= newmaxsize)&&(newactualsize>0)) actualsize = newactualsize; else actualsize = newmaxsize; // since allocateD() will first call delete[] on data: data = 0; allocateD(); }; ~matrix() { delete[] data; }; void dumpMatrixValues() { bool xyz; double rv; for (int i=0; i < actualsize; i++) { cout << "i=" << i << ": "; for (int j=0; j maxunitydeviation ) { maxunitydeviation = currentunitydeviation; worstdiagonal = i; } } int worstoffdiagonalrow = 0; int worstoffdiagonalcolumn = 0; D maxzerodeviation = 0.0; D currentzerodeviation ; for ( i = 0; i < actualsize; i++ ) { for ( int j = 0; j < actualsize; j++ ) { if ( i == j ) continue; // we look only at non-diagonal terms currentzerodeviation = data[i*maxsize+j]; if ( currentzerodeviation < 0.0) currentzerodeviation *= -1.0; if ( currentzerodeviation > maxzerodeviation ) { maxzerodeviation = currentzerodeviation; worstoffdiagonalrow = i; worstoffdiagonalcolumn = j; } } } cout << "Worst diagonal value deviation from unity: " << maxunitydeviation << " at row/column " << worstdiagonal << endl; cout << "Worst off-diagonal value deviation from zero: " << maxzerodeviation << " at row = " << worstoffdiagonalrow << ", column = " << worstoffdiagonalcolumn << endl; } void settoproduct(matrix& left, matrix& right) { actualsize = left.getactualsize(); if ( maxsize < left.getactualsize() ) { maxsize = left.getactualsize(); allocateD(); } for ( int i = 0; i < actualsize; i++ ) { for ( int j = 0; j < actualsize; j++ ) { D sum = 0.0; D leftvalue, rightvalue; bool success; for (int c = 0; c < actualsize; c++) { left.getvalue(i,c,leftvalue,success); right.getvalue(c,j,rightvalue,success); sum += leftvalue * rightvalue; } setvalue(i,j,sum); } } } void copymatrix(matrix& source) { actualsize = source.getactualsize(); if ( maxsize < source.getactualsize() ) { maxsize = source.getactualsize(); allocateD(); } for ( int i = 0; i < actualsize; i++ ) { for ( int j = 0; j < actualsize; j++ ) { D value; bool success; source.getvalue(i,j,value,success); data[i*maxsize+j] = value; } } }; void setactualsize(int newactualsize) { if ( newactualsize > maxsize ) { maxsize = newactualsize ; // * 2; // wastes memory but saves // time otherwise required for // operation new[] allocateD(); } if (newactualsize >= 0) actualsize = newactualsize; }; int getactualsize() { return actualsize; }; void getvalue(int row, int column, D& returnvalue, bool& success) { if ( (row>=maxsize) || (column>=maxsize) || (row<0) || (column<0) ) { success = false; return; } returnvalue = data[ row * maxsize + column ]; success = true; }; bool setvalue(int row, int column, D newvalue) { if ( (row >= maxsize) || (column >= maxsize) || (row<0) || (column<0) ) return false; data[ row * maxsize + column ] = newvalue; return true; }; void invert() { int i = 0; int j = 0; int k = 0; if (actualsize <= 0) return; // sanity check if (actualsize == 1) { data[0] = 1.0/data[0]; return; } for (i=1; i < actualsize; i++) data[i] /= data[0]; // normalize row 0 for (i=1; i < actualsize; i++) { for ( j=i; j < actualsize; j++) { // do a column of L D sum = 0.0; for ( k = 0; k < i; k++) sum += data[j*maxsize+k] * data[k*maxsize+i]; data[j*maxsize+i] -= sum; } if (i == actualsize-1) continue; for ( j=i+1; j < actualsize; j++) { // do a row of U D sum = 0.0; for ( k = 0; k < i; k++) sum += data[i*maxsize+k]*data[k*maxsize+j]; data[i*maxsize+j] = (data[i*maxsize+j]-sum) / data[i*maxsize+i]; } } for ( i = 0; i < actualsize; i++ ) // invert L { for ( j = i; j < actualsize; j++ ) { D x = 1.0; if ( i != j ) { x = 0.0; for ( k = i; k < j; k++ ) x -= data[j*maxsize+k]*data[k*maxsize+i]; } data[j*maxsize+i] = x / data[j*maxsize+j]; } } for ( i = 0; i < actualsize; i++ ) // invert U { for ( j = i; j < actualsize; j++ ) { if ( i == j ) continue; D sum = 0.0; for ( k = i; k < j; k++ ) sum += data[k*maxsize+j]*( (i==k) ? 1.0 : data[i*maxsize+k] ); data[i*maxsize+j] = -sum; } } for ( i = 0; i < actualsize; i++ ) // final inversion { for ( j = 0; j < actualsize; j++ ) { D sum = 0.0; for ( k = ((i>j)?i:j); k < actualsize; k++ ) sum += ((j==k)?1.0:data[j*maxsize+k])*data[k*maxsize+i]; data[j*maxsize+i] = sum; } } }; }; #endif gbm/src/buildinfo.h0000644000176200001440000000110213417115400013720 0ustar liggesusers// GBM by Greg Ridgeway Copyright (C) 2003 // License: GNU GPL (version 2 or later) #ifndef BUILDINFO_H #define BUILDINFO_H #undef ERROR #include #define GBM_FAILED(hr) ((unsigned long)hr != 0) typedef unsigned long GBMRESULT; #define GBM_OK 0 #define GBM_FAIL 1 #define GBM_INVALIDARG 2 #define GBM_OUTOFMEMORY 3 #define GBM_INVALID_DATA 4 #define GBM_NOTIMPL 5 #define LEVELS_PER_CHUNK ((unsigned long) 1) typedef unsigned long ULONG; typedef char *PCHAR; // #define NOISY_DEBUG #endif // BUILDINFO_H gbm/NAMESPACE0000644000176200001440000000425613417115354012251 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method(plot,gbm) S3method(predict,gbm) S3method(print,gbm) S3method(summary,gbm) export(basehaz.gbm) export(calibrate.plot) export(checkID) export(checkMissing) export(checkOffset) export(checkWeights) export(gbm) export(gbm.conc) export(gbm.fit) export(gbm.loss) export(gbm.more) export(gbm.perf) export(gbm.roc.area) export(gbmCluster) export(gbmCrossVal) export(gbmCrossValErr) export(gbmCrossValModelBuild) export(gbmCrossValPredictions) export(gbmDoFold) export(getCVgroup) export(getStratify) export(getVarNames) export(grid.arrange) export(guessDist) export(interact.gbm) export(ir.measure.auc) export(ir.measure.conc) export(ir.measure.map) export(ir.measure.mrr) export(ir.measure.ndcg) export(perf.pairwise) export(permutation.test.gbm) export(plot.gbm) export(predict.gbm) export(pretty.gbm.tree) export(quantile.rug) export(reconstructGBMdata) export(relative.influence) export(show.gbm) export(shrink.gbm) export(shrink.gbm.pred) export(summary.gbm) export(test.gbm) export(test.relative.influence) export(validate.gbm) import(lattice) importFrom(grDevices,rainbow) importFrom(graphics,abline) importFrom(graphics,axis) importFrom(graphics,barplot) importFrom(graphics,lines) importFrom(graphics,mtext) importFrom(graphics,par) importFrom(graphics,plot) importFrom(graphics,polygon) importFrom(graphics,rug) importFrom(graphics,segments) importFrom(graphics,title) importFrom(gridExtra,grid.arrange) importFrom(stats,approx) importFrom(stats,binomial) importFrom(stats,delete.response) importFrom(stats,gaussian) importFrom(stats,glm) importFrom(stats,loess) importFrom(stats,model.extract) importFrom(stats,model.frame) importFrom(stats,model.offset) importFrom(stats,model.response) importFrom(stats,model.weights) importFrom(stats,na.pass) importFrom(stats,poisson) importFrom(stats,predict) importFrom(stats,quantile) importFrom(stats,rbinom) importFrom(stats,reformulate) importFrom(stats,reorder) importFrom(stats,rexp) importFrom(stats,rnorm) importFrom(stats,runif) importFrom(stats,sd) importFrom(stats,supsmu) importFrom(stats,terms) importFrom(stats,var) importFrom(stats,weighted.mean) importFrom(survival,Surv) useDynLib(gbm, .registration = TRUE) gbm/demo/0000755000176200001440000000000013346511223011743 5ustar liggesusersgbm/demo/bernoulli.R0000644000176200001440000000641313346511223014065 0ustar liggesusers# LOGISTIC REGRESSION EXAMPLE cat("Running logistic regression example.\n") # create some data N <- 1000 X1 <- runif(N) X2 <- runif(N) X3 <- factor(sample(letters[1:4],N,replace=T)) mu <- c(-1,0,1,2)[as.numeric(X3)] p <- 1/(1+exp(-(sin(3*X1) - 4*X2 + mu))) Y <- rbinom(N,1,p) # random weights if you want to experiment with them w <- rexp(N) w <- N*w/sum(w) data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3) # fit initial model gbm1 <- gbm(Y~X1+X2+X3, # formula data=data, # dataset weights=w, var.monotone=c(0,0,0), # -1: monotone decrease, +1: monotone increase, 0: no monotone restrictions distribution="bernoulli", n.trees=3000, # number of trees shrinkage=0.001, # shrinkage or learning rate, 0.001 to 0.1 usually work interaction.depth=3, # 1: additive model, 2: two-way interactions, etc bag.fraction = 0.5, # subsampling fraction, 0.5 is probably best train.fraction = 0.5, # fraction of data for training, first train.fraction*N used for training cv.folds=5, # do 5-fold cross-validation n.minobsinnode = 10, # minimum total weight needed in each node verbose = FALSE) # don't print progress # plot the performance best.iter.oob <- gbm.perf(gbm1,method="OOB") # returns out-of-bag estimated best number of trees print(best.iter.oob) best.iter.cv <- gbm.perf(gbm1,method="cv") # returns 5-fold cv estimate of best number of trees print(best.iter.cv) best.iter.test <- gbm.perf(gbm1,method="test") # returns test set estimate of best number of trees print(best.iter.test) best.iter <- best.iter.test # plot variable influence summary(gbm1,n.trees=1) # based on the first tree summary(gbm1,n.trees=best.iter) # based on the estimated best number of trees # create marginal plots # plot variable X1,X2,X3 after "best" iterations par(mfrow=c(1,3)) plot.gbm(gbm1,1,best.iter) plot.gbm(gbm1,2,best.iter) plot.gbm(gbm1,3,best.iter) par(mfrow=c(1,1)) plot.gbm(gbm1,1:2,best.iter) # contour plot of variables 1 and 2 after "best" number iterations plot.gbm(gbm1,2:3,best.iter) # lattice plot of variables 2 and 3 after "best" number iterations # 3-way plot plot.gbm(gbm1,1:3,best.iter) # print the first and last trees print(pretty.gbm.tree(gbm1,1)) print(pretty.gbm.tree(gbm1,gbm1$n.trees)) # make some new data N <- 1000 X1 <- runif(N) X2 <- runif(N) X3 <- factor(sample(letters[1:4],N,replace=T)) mu <- c(-1,0,1,2)[as.numeric(X3)] p <- 1/(1+exp(-(sin(3*X1) - 4*X2 + mu))) Y <- rbinom(N,1,p) data2 <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3) # predict on the new data using "best" number of trees # f.predict will be on the canonical scale (logit,log,etc.) f.predict <- predict.gbm(gbm1,data2, n.trees=c(best.iter.oob,best.iter.cv,best.iter.test)) # transform to probability scale for logistic regression p.pred <- 1/(1+exp(-f.predict)) # calibration plot for logistic regression - well calibrated means a 45 degree line par(mfrow=c(1,1)) calibrate.plot(Y,p.pred[,3]) # logistic error sum(data2$Y*f.predict[,1] - log(1+exp(f.predict[,1]))) sum(data2$Y*f.predict[,2] - log(1+exp(f.predict[,2]))) sum(data2$Y*f.predict[,3] - log(1+exp(f.predict[,3]))) gbm/demo/gaussian.R0000644000176200001440000000762213346511223013707 0ustar liggesusers# LEAST SQUARES EXAMPLE cat("Running least squares regression example.\n") # create some data N <- 1000 X1 <- runif(N) X2 <- 2*runif(N) X3 <- factor(sample(letters[1:4],N,replace=T)) X4 <- ordered(sample(letters[1:6],N,replace=T)) X5 <- factor(sample(letters[1:3],N,replace=T)) X6 <- 3*runif(N) mu <- c(-1,0,1,2)[as.numeric(X3)] SNR <- 10 # signal-to-noise ratio Y <- X1**1.5 + 2 * (X2**.5) + mu sigma <- sqrt(var(Y)/SNR) Y <- Y + rnorm(N,0,sigma) # create a bunch of missing values X1[sample(1:N,size=100)] <- NA X3[sample(1:N,size=300)] <- NA # random weights if you want to experiment with them # w <- rexp(N) # w <- N*w/sum(w) w <- rep(1,N) data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6) # fit initial model gbm1 <- gbm(Y~X1+X2+X3+X4+X5+X6, # formula data=data, # dataset var.monotone=c(0,0,0,0,0,0), # -1: monotone decrease, +1: monotone increase, 0: no monotone restrictions distribution="gaussian", # bernoulli, adaboost, gaussian, poisson, coxph, or # list(name="quantile",alpha=0.05) for quantile regression n.trees=2000, # number of trees shrinkage=0.005, # shrinkage or learning rate, 0.001 to 0.1 usually work interaction.depth=3, # 1: additive model, 2: two-way interactions, etc bag.fraction = 0.5, # subsampling fraction, 0.5 is probably best train.fraction = 0.5, # fraction of data for training, first train.fraction*N used for training n.minobsinnode = 10, # minimum number of obs needed in each node keep.data=TRUE, cv.folds=10, # do 10-fold cross-validation verbose = FALSE) # don't print progress # plot the performance best.iter <- gbm.perf(gbm1,method="OOB") # returns out-of-bag estimated best number of trees best.iter <- gbm.perf(gbm1,method="test") # returns test set estimate of best number of trees best.iter <- gbm.perf(gbm1,method="cv") # returns cv estimate of best number of trees # plot variable influence summary(gbm1,n.trees=1) # based on the first tree summary(gbm1,n.trees=best.iter) # based on the estimated best number of trees # print the first and last trees print(pretty.gbm.tree(gbm1,1)) print(pretty.gbm.tree(gbm1,gbm1$n.trees)) print(gbm1$c.splits[1:3]) # make some new data N <- 1000 X1 <- runif(N) X2 <- 2*runif(N) X3 <- factor(sample(letters[1:4],N,replace=TRUE)) X4 <- ordered(sample(letters[1:6],N,replace=TRUE)) X5 <- factor(sample(letters[1:3],N,replace=TRUE)) X6 <- 3*runif(N) mu <- c(-1,0,1,2)[as.numeric(X3)] Y <- X1**1.5 + 2 * (X2**.5) + mu Y <- Y + rnorm(N,0,sigma) data2 <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6) print(data2[1:10,]) # predict on the new data using "best" number of trees f.predict <- predict(gbm1,data2,best.iter) # f.predict will be on the canonical scale (logit,log,etc.) print(f.predict[1:10]) # least squares error print(sum((data2$Y-f.predict)^2)) # create marginal plots # plot variable X1,X2,X3 after "best" iterations par(mfrow=c(1,3)) plot(gbm1,1,best.iter) plot(gbm1,2,best.iter) plot(gbm1,3,best.iter) par(mfrow=c(1,1)) plot(gbm1,1:2,best.iter) # contour plot of variables 1 and 2 after "best" number iterations plot(gbm1,2:3,best.iter) # lattice plot of variables 2 and 3 after "best" number iterations plot(gbm1,3:4,best.iter) # lattice plot of variables 2 and 3 after "best" number iterations plot(gbm1,c(1,2,6),best.iter,cont=20) # 3-way plots plot(gbm1,1:3,best.iter) plot(gbm1,2:4,best.iter) plot(gbm1,3:5,best.iter) # check interactions interact.gbm(gbm1,data=data,i.var=1:2,n.trees=best.iter) # get all two way interactions i.var <- subset(expand.grid(x1=1:6,x2=1:6), x1=data2$tt[i])*exp(f.predict) ) } cat("Boosting:",sum( data2$delta*( f.predict - log(risk) ) ),"\n") # linear model coxph1 <- coxph(Surv(tt,delta)~X1+X2+X3,data=data) f.predict <- predict(coxph1,newdata=data2) risk <- rep(0,N) for(i in 1:N) { risk[i] <- sum( (data2$tt>=data2$tt[i])*exp(f.predict) ) } cat("Linear model:",sum( data2$delta*( f.predict - log(risk) ) ),"\n") gbm/NEWS.md0000644000176200001440000000324413417113151012115 0ustar liggesusers# gbm 2.1.5 * Fixed bug that occurred whenever `distribution` was a list (e.g., "pairwise" regression) [(#27)](https://github.com/gbm-developers/gbm/issues/27). * Fixed a bug that occurred when making predictions on new data with different factor levels [(#28)](https://github.com/gbm-developers/gbm/issues/28). * Fixed a bug that caused `relative.influence()` to give different values whenever `n.trees` was/wasn't given for multinomial distributions [(#31)](https://github.com/gbm-developers/gbm/issues/31). * The `plot.it` argument of `gbm.perf()` is no longer ignored [(#34)](https://github.com/gbm-developers/gbm/issues/34). * Fixed an error that occurred in `gbm.perf()` whenever `oobag.curve = FALSE` and `overlay = FALSE`. # gbm 2.1.4 * Switched from `CHANGES` to `NEWS` file. * Updated links and maintainer field in `DESCRIPTION` file. * Fixed bug caused by factors with unused levels [(#5)](https://github.com/gbm-developers/gbm/issues/5). * Fixed bug with axis labels in the `plot()` method for `"gbm"` objects [(#17)](https://github.com/gbm-developers/gbm/issues/17). * The `plot()` method for `"gbm"` objects is now more consistent and always returns a `"trellis"` object [(#19)](https://github.com/gbm-developers/gbm/issues/19). Consequently, setting graphical parameters via `par` will no longer have an effect on the output from `plot.gbm`. * The `plot()` method for `"gbm"` objects gained five new arguments: `level.plot`, `contour`, `number`, `overlap`, and `col.regions`; see `?plot.gbm` for details. * The default color palette for false color level plots in `plot.gbm()` has changed to the Matplotlib 'viridis' color map. * Fixed a number of references and URLs. gbm/R/0000755000176200001440000000000013413011135011210 5ustar liggesusersgbm/R/utils.R0000644000176200001440000001007613346511223012507 0ustar liggesusers#' Arrange multiple grobs on a page #' #' See \code{\link[gridExtra]{grid.arrange}} for more details. #' #' @name grid.arrange #' @rdname grid.arrange #' @keywords internal #' @export #' @importFrom gridExtra grid.arrange #' @usage grid.arrange(..., newpage = TRUE) NULL #' @keywords internal getAvailableDistributions <- function() { c("adaboost", "bernoulli", "coxph", "gaussian", "huberized", "laplace", "multinomial", "pairwise", "poisson", "quantile", "tdist") } #' @keywords internal guess_error_method <- function(object) { if (has_train_test_split(object)) { "test" } else if (has_cross_validation(object)) { "cv" } else { "OOB" } } #' @keywords internal has_train_test_split <- function(object) { object$train.fraction < 1 } #' @keywords internal has_cross_validation <- function(object) { !is.null(object$cv.error) } #' @keywords internal best_iter <- function(object, method) { check_if_gbm_fit(object) if (method == "OOB") { best_iter_out_of_bag(object) } else if (method == "test") { best_iter_test(object) } else if (method == "cv") { best_iter_cv(object) } else { stop("method must be one of \"cv\", \"test\", or \"OOB\"") } } #' @keywords internal best_iter_test <- function(object) { check_if_gbm_fit(object) best_iter_test <- which.min(object$valid.error) return(best_iter_test) } #' @keywords internal best_iter_cv <- function(object) { check_if_gbm_fit(object) if(!has_cross_validation(object)) { stop('In order to use method="cv" gbm must be called with cv_folds>1.') } best_iter_cv <- which.min(object$cv.error) return(best_iter_cv) } #' @keywords internal best_iter_out_of_bag <- function(object) { check_if_gbm_fit(object) if(object$bag.fraction == 1) { stop("Cannot compute OOB estimate or the OOB curve when bag_fraction=1.") } if(all(!is.finite(object$oobag.improve))) { stop("Cannot compute OOB estimate or the OOB curve. No finite OOB ", "estimates of improvement.") } message("OOB generally underestimates the optimal number of iterations ", "although predictive performance is reasonably competitive. Using ", "cv_folds>1 when calling gbm usually results in improved predictive ", "performance.") smoother <- generate_smoother_oobag(object) best_iter_oob <- smoother$x[which.min(-cumsum(smoother$y))] attr(best_iter_oob, "smoother") <- smoother return(best_iter_oob) } #' @keywords internal generate_smoother_oobag <- function(object) { check_if_gbm_fit(object) x <- seq_len(object$n.trees) smoother <- loess(object$oobag.improve ~ x, enp.target = min(max(4, length(x) / 10), 50)) smoother$y <- smoother$fitted smoother$x <- x return(smoother) } #' @keywords internal check_if_gbm_fit <- function(object) { if (!inherits(object, "gbm")) { stop(deparse(substitute(object)), " is not a valid \"gbm\" object.") } } #' @keywords internal get_ylab <- function(object) { check_if_gbm_fit(object) if (object$distribution$name != "pairwise") { switch(substring(object$distribution$name, 1, 2), ga = "Squared error loss", be = "Bernoulli deviance", po = "Poisson deviance", ad = "AdaBoost exponential bound", co = "Cox partial deviance", la = "Absolute loss", qu = "Quantile loss", mu = "Multinomial deviance", td = "t-distribution deviance") } else { switch(object$distribution$metric, conc = "Fraction of concordant pairs", ndcg = "Normalized discounted cumulative gain", map = "Mean average precision", mrr = "Mean reciprocal rank") } } #' @keywords internal get_ylim <- function(object, method) { check_if_gbm_fit(object) if(object$train.fraction == 1) { if ( method=="cv" ) { range(object$train.error, object$cv.error) } else if ( method == "test" ) { range( object$train.error, object$valid.error) } else { range(object$train.error) } } else { range(object$train.error, object$valid.error) } } gbm/R/gbm.perf.R0000644000176200001440000000711113413011135013033 0ustar liggesusers#' GBM performance #' #' Estimates the optimal number of boosting iterations for a \code{gbm} object #' and optionally plots various performance measures #' #' @param object A \code{\link{gbm.object}} created from an initial call to #' \code{\link{gbm}}. #' #' @param plot.it An indicator of whether or not to plot the performance #' measures. Setting \code{plot.it = TRUE} creates two plots. The first plot #' plots \code{object$train.error} (in black) and \code{object$valid.error} #' (in red) versus the iteration number. The scale of the error measurement, #' shown on the left vertical axis, depends on the \code{distribution} #' argument used in the initial call to \code{\link{gbm}}. #' #' @param oobag.curve Indicates whether to plot the out-of-bag performance #' measures in a second plot. #' #' @param overlay If TRUE and oobag.curve=TRUE then a right y-axis is added to #' the training and test error plot and the estimated cumulative improvement #' in the loss function is plotted versus the iteration number. #' #' @param method Indicate the method used to estimate the optimal number of #' boosting iterations. \code{method = "OOB"} computes the out-of-bag estimate #' and \code{method = "test"} uses the test (or validation) dataset to compute #' an out-of-sample estimate. \code{method = "cv"} extracts the optimal number #' of iterations using cross-validation if \code{gbm} was called with #' \code{cv.folds} > 1. #' #' @return \code{gbm.perf} Returns the estimated optimal number of iterations. #' The method of computation depends on the \code{method} argument. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' #' @seealso \code{\link{gbm}}, \code{\link{gbm.object}} #' #' @keywords nonlinear survival nonparametric tree #' #' @export gbm.perf <- function(object, plot.it = TRUE, oobag.curve = FALSE, overlay = TRUE, method) { # Determine method, if missing if (missing(method)) { method <- guess_error_method(object) } # Determine "optimal" number of iterations best.iter <- best_iter(object, method = method) # Plot results if (plot.it) { # Determine an appropriate y-axis label ylab <- get_ylab(object) # Determine an appropriate range for the y-axis ylim <- get_ylim(object, method = method) # Plot results plot(object$train.error, ylim = ylim, type = "l", xlab = "Iteration", ylab = ylab) if (object$train.fraction != 1) { lines(object$valid.error, col = "red") } if (method=="cv") { lines(object$cv.error, col = "green") } if (!is.na(best.iter)) { abline(v = best.iter, col = "blue", lwd = 2, lty = 2) } if (oobag.curve) { smoother <- attr(best.iter, "smoother") if (overlay) { # smoother <- attr(best.iter, "smoother") par(new = TRUE) plot(smoother$x, cumsum(smoother$y), col = "blue", type = "l", xlab = "", ylab = "", axes = FALSE) axis(4, srt = 0) at <- mean(range(smoother$y)) mtext(paste("OOB improvement in", ylab), side = 4, srt = 270, line = 2) abline(h = 0, col = "blue", lwd = 2) } plot(object$oobag.improve, type = "l", xlab = "Iteration", ylab = paste("OOB change in", ylab)) lines(smoother, col = "red", lwd = 2) abline(h = 0, col = "blue", lwd = 1) abline(v =best.iter, col = "blue", lwd = 1) } } # Return "best" number of iterations (i.e., number of boosted trees) best.iter } gbm/R/interact.gbm.R0000644000176200001440000001277713346511223013736 0ustar liggesusers#' Estimate the strength of interaction effects #' #' Computes Friedman's H-statistic to assess the strength of variable #' interactions. #' #' @param x A \code{\link{gbm.object}} fitted using a call to \code{\link{gbm}}. #' #' @param data The dataset used to construct \code{x}. If the original dataset #' is large, a random subsample may be used to accelerate the computation in #' \code{interact.gbm}. #' #' @param i.var A vector of indices or the names of the variables for compute #' the interaction effect. If using indices, the variables are indexed in the #' same order that they appear in the initial \code{gbm} formula. #' #' @param n.trees The number of trees used to generate the plot. Only the first #' \code{n.trees} trees will be used. #' #' @return Returns the value of \eqn{H}. #' #' @details #' \code{interact.gbm} computes Friedman's H-statistic to assess the relative #' strength of interaction effects in non-linear models. H is on the scale of #' [0-1] with higher values indicating larger interaction effects. To connect #' to a more familiar measure, if \eqn{x_1} and \eqn{x_2} are uncorrelated #' covariates with mean 0 and variance 1 and the model is of the form #' \deqn{y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3} then #' \deqn{H=\frac{\beta_3}{\sqrt{\beta_1^2+\beta_2^2+\beta_3^2}}} #' #' Note that if the main effects are weak, the estimated H will be unstable. #' For example, if (in the case of a two-way interaction) neither main effect #' is in the selected model (relative influence is zero), the result will be #' 0/0. Also, with weak main effects, rounding errors can result in values of H #' > 1 which are not possible. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' @seealso \code{\link{gbm}}, \code{\link{gbm.object}} #' @references J.H. Friedman and B.E. Popescu (2005). \dQuote{Predictive #' Learning via Rule Ensembles.} Section 8.1 #' @keywords methods #' @export interact.gbm <- function(x, data, i.var = 1, n.trees = x$n.trees){ ############################################################### # Do sanity checks on the call if (x$interaction.depth < length(i.var)){ stop("interaction.depth too low in model call") } if (all(is.character(i.var))){ i <- match(i.var, x$var.names) if (any(is.na(i))) { stop("Variables given are not used in gbm model fit: ", i.var[is.na(i)]) } else { i.var <- i } } if ((min(i.var) < 1) || (max(i.var) > length(x$var.names))) { warning("i.var must be between 1 and ", length(x$var.names)) } if (n.trees > x$n.trees) { warning(paste("n.trees exceeds the number of trees in the model, ", x$n.trees,". Using ", x$n.trees, " trees.", sep = "")) n.trees <- x$n.trees } # End of sanity checks ############################################################### unique.tab <- function(z,i.var) { a <- unique(z[,i.var,drop=FALSE]) a$n <- table(factor(apply(z[,i.var,drop=FALSE],1,paste,collapse="\r"), levels=apply(a,1,paste,collapse="\r"))) return(a) } # convert factors for(j in i.var) { if(is.factor(data[,x$var.names[j]])) data[,x$var.names[j]] <- as.numeric(data[,x$var.names[j]])-1 } # generate a list with all combinations of variables a <- apply(expand.grid(rep(list(c(FALSE,TRUE)), length(i.var)))[-1,],1, function(x) as.numeric(which(x))) FF <- vector("list",length(a)) for(j in 1:length(a)) { FF[[j]]$Z <- data.frame(unique.tab(data, x$var.names[i.var[a[[j]]]])) FF[[j]]$n <- as.numeric(FF[[j]]$Z$n) FF[[j]]$Z$n <- NULL FF[[j]]$f <- .Call("gbm_plot", X = as.double(data.matrix(FF[[j]]$Z)), cRows = as.integer(nrow(FF[[j]]$Z)), cCols = as.integer(ncol(FF[[j]]$Z)), n.class = as.integer(x$num.classes), i.var = as.integer(i.var[a[[j]]] - 1), n.trees = as.integer(n.trees), initF = as.double(x$initF), trees = x$trees, c.splits = x$c.splits, var.type = as.integer(x$var.type), PACKAGE = "gbm") # FF[[jj]]$Z is the data, f is the predictions, n is the number of levels for factors # Need to restructure f to deal with multinomial case FF[[j]]$f <- matrix(FF[[j]]$f, ncol=x$num.classes, byrow=FALSE) # center the values FF[[j]]$f <- apply(FF[[j]]$f, 2, function(x, w){ x - weighted.mean(x, w, na.rm=TRUE) }, w=FF[[j]]$n) # precompute the sign of these terms to appear in H FF[[j]]$sign <- ifelse(length(a[[j]]) %% 2 == length(i.var) %% 2, 1, -1) } H <- FF[[length(a)]]$f for(j in 1:(length(a)-1)){ i1 <- apply(FF[[length(a)]]$Z[,a[[j]], drop=FALSE], 1, paste, collapse="\r") i2 <- apply(FF[[j]]$Z,1,paste,collapse="\r") i <- match(i1, i2) H <- H + with(FF[[j]], sign*f[i,]) } # Compute H w <- matrix(FF[[length(a)]]$n, ncol=1) f <- matrix(FF[[length(a)]]$f^2, ncol=x$num.classes, byrow=FALSE) top <- apply(H^2, 2, weighted.mean, w = w, na.rm = TRUE) btm <- apply(f, 2, weighted.mean, w = w, na.rm = TRUE) H <- top / btm if (x$distribution$name=="multinomial"){ names(H) <- x$classes } # If H > 1, rounding and tiny main effects have messed things up H[H > 1] <- NaN return(sqrt(H)) } gbm/R/shrink.gbm.R0000644000176200001440000000545313346511223013414 0ustar liggesusers# evaluates the objective function and gradient with respect to beta # beta = log(lambda/(1-lambda)) #' L1 shrinkage of the predictor variables in a GBM #' #' Performs recursive shrinkage in each of the trees in a GBM fit using #' different shrinkage parameters for each variable. #' #' This function is currently experimental. Used in conjunction with a gradient #' ascent search for inclusion of variables. #' #' @param object A \code{\link{gbm.object}}. #' #' @param n.trees Integer specifying the number of trees to use. #' #' @param lambda Vector of length equal to the number of variables containing #' the shrinkage parameter for each variable. #' #' @param \dots Additional optional arguments. (Currently ignored.) #' #' @return \item{predF}{Predicted values from the shrunken tree} #' \item{objective}{The value of the loss function associated with the #' predicted values} \item{gradient}{A vector with length equal to the number #' of variables containing the derivative of the objective function with #' respect to beta, the logit transform of the shrinkage parameter for each #' variable} #' #' @note Warning: This function is experimental. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' #' @seealso \code{\link{shrink.gbm.pred}}, \code{\link{gbm}} #' #' @references Hastie, T. J., and Pregibon, D. #' \url{https://web.stanford.edu/~hastie/Papers/shrink_tree.pdf}. AT&T Bell #' Laboratories Technical Report (March 1990). #' #' @keywords methods #' #' @export shrink.gbm <- function(object,n.trees, lambda=rep(10,length(object$var.names)), ...) { if(length(lambda) != length(object$var.names)) { stop("lambda must have the same length as the number of variables in the gbm object.") } if(is.null(object$data)) { stop("shrink.gbm requires keep.data=TRUE when gbm model is fit.") } y <- object$data$y x <- object$data$x cCols <- length(object$var.names) cRows <- length(x)/cCols if(missing(n.trees) || (n.trees > object$n.trees)) { n.trees <- object$n.trees warning("n.trees not specified or some values exceeded number fit so far. Using ",n.trees,".") } result <- .Call("gbm_shrink_gradient", y=as.double(y), X=as.double(x), cRows=as.integer(cRows), cCols=as.integer(cCols), n.trees=as.integer(n.trees), initF=object$initF, trees=object$trees, c.split=object$c.split, var.type=as.integer(object$var.type), depth=as.integer(object$interaction.depth), lambda=as.double(lambda), PACKAGE = "gbm") names(result) <- c("predF","objective","gradient") return(result) } gbm/R/calibrate.plot.R0000644000176200001440000001325413346511223014253 0ustar liggesusers#' Quantile rug plot #' #' Marks the quantiles on the axes of the current plot. #' #' @param x A numeric vector. #' #' @param prob The quantiles of x to mark on the x-axis. #' #' @param ... Additional optional arguments to be passed onto #' \code{\link[graphics]{rug}} #' #' @return No return values. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com}. #' #' @seealso \code{\link[graphics]{plot}}, \code{\link[stats]{quantile}}, #' \code{\link[base]{jitter}}, \code{\link[graphics]{rug}}. #' #' @keywords aplot #' #' @export quantile.rug #' #' @examples #' x <- rnorm(100) #' y <- rnorm(100) #' plot(x, y) #' quantile.rug(x) quantile.rug <- function(x, prob = 0:10/10, ...) { quants <- quantile(x[!is.na(x)], prob = prob) if(length(unique(quants)) < length(prob)) { quants <- jitter(quants) } rug(quants, ...) } #' Calibration plot #' #' An experimental diagnostic tool that plots the fitted values versus the #' actual average values. Currently only available when #' \code{distribution = "bernoulli"}. #' #' Uses natural splines to estimate E(y|p). Well-calibrated predictions imply #' that E(y|p) = p. The plot also includes a pointwise 95% confidence band. #' #' @param y The outcome 0-1 variable. #' #' @param p The predictions estimating E(y|x). #' #' @param distribution The loss function used in creating \code{p}. #' \code{bernoulli} and \code{poisson} are currently the only special options. #' All others default to squared error assuming \code{gaussian}. #' #' @param replace Determines whether this plot will replace or overlay the #' current plot. \code{replace=FALSE} is useful for comparing the calibration #' of several methods. #' #' @param line.par Graphics parameters for the line. #' #' @param shade.col Color for shading the 2 SE region. \code{shade.col=NA} #' implies no 2 SE region. #' #' @param shade.density The \code{density} parameter for \code{\link{polygon}}. #' #' @param rug.par Graphics parameters passed to \code{\link{rug}}. #' #' @param xlab x-axis label corresponding to the predicted values. #' #' @param ylab y-axis label corresponding to the observed average. #' #' @param xlim,ylim x- and y-axis limits. If not specified te function will #' select limits. #' #' @param knots,df These parameters are passed directly to #' \code{\link[splines]{ns}} for constructing a natural spline smoother for the #' calibration curve. #' #' @param ... Additional optional arguments to be passed onto #' \code{\link[graphics]{plot}} #' #' @return No return values. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' #' @references #' J.F. Yates (1982). "External correspondence: decomposition of #' the mean probability score," Organisational Behaviour and Human Performance #' 30:132-156. #' #' D.J. Spiegelhalter (1986). "Probabilistic Prediction in Patient Management #' and Clinical Trials," Statistics in Medicine 5:421-433. #' @keywords hplot #' #' @export #' #' @examples #' # Don't want R CMD check to think there is a dependency on rpart #' # so comment out the example #' #library(rpart) #' #data(kyphosis) #' #y <- as.numeric(kyphosis$Kyphosis)-1 #' #x <- kyphosis$Age #' #glm1 <- glm(y~poly(x,2),family=binomial) #' #p <- predict(glm1,type="response") #' #calibrate.plot(y, p, xlim=c(0,0.6), ylim=c(0,0.6)) calibrate.plot <- function(y, p, distribution = "bernoulli", replace = TRUE, line.par = list(col = "black"), shade.col = "lightyellow", shade.density = NULL, rug.par = list(side = 1), xlab = "Predicted value", ylab = "Observed average", xlim = NULL, ylim = NULL, knots = NULL, df = 6, ...) { # Sanity check if (!requireNamespace("splines", quietly = TRUE)) { stop("The splines package is needed for this function to work. Please ", "install it.", call. = FALSE) } data <- data.frame(y = y, p = p) # Check spline parameters if(is.null(knots) && is.null(df)) { stop("Either knots or df must be specified") } if((df != round(df)) || (df < 1)) { stop("df must be a positive integer") } # Check distribution if(distribution == "bernoulli") { family1 <- binomial } else if(distribution == "poisson") { family1 <- poisson } else { family1 <- gaussian } # Fit a GLM using natural cubic splines gam1 <- glm(y ~ splines::ns(p, df = df, knots = knots), data = data, family = family1) # Plotting data x <- seq(min(p), max(p), length = 200) yy <- predict(gam1, newdata = data.frame(p = x), se.fit = TRUE, type = "response") x <- x[!is.na(yy$fit)] yy$se.fit <- yy$se.fit[!is.na(yy$fit)] yy$fit <- yy$fit[!is.na(yy$fit)] # Plotting parameters if(!is.na(shade.col)) { se.lower <- yy$fit - 2 * yy$se.fit se.upper <- yy$fit + 2 * yy$se.fit if(distribution == "bernoulli") { se.lower[se.lower < 0] <- 0 se.upper[se.upper > 1] <- 1 } if(distribution == "poisson") { se.lower[se.lower < 0] <- 0 } if(is.null(xlim)) { xlim <- range(se.lower, se.upper, x) } if(is.null(ylim)) { ylim <- range(se.lower, se.upper, x) } } else { if(is.null(xlim)) { xlim <- range(yy$fit,x) } if(is.null(ylim)) { ylim <- range(yy$fit,x) } } # Construct plot if(replace) { plot(0, 0, type = "n", xlab = xlab, ylab = ylab, xlim = xlim, ylim = ylim, ...) } if(!is.na(shade.col)) { polygon(c(x, rev(x), x[1L]), c(se.lower, rev(se.upper), se.lower[1L]), col = shade.col, border = NA, density = shade.density) } lines(x, yy$fit, col = line.par$col) quantile.rug(p, side = rug.par$side) abline(0, 1, col = "red") } gbm/R/gbm-package.R0000644000176200001440000000422513346511223013504 0ustar liggesusers#' Generalized Boosted Regression Models (GBMs) #' #' This package implements extensions to Freund and Schapire's AdaBoost #' algorithm and J. Friedman's gradient boosting machine. Includes regression #' methods for least squares, absolute loss, logistic, Poisson, Cox #' proportional hazards partial likelihood, multinomial, t-distribution, #' AdaBoost exponential loss, Learning to Rank, and Huberized hinge loss. #' #' Further information is available in vignette: #' \code{browseVignettes(package = "gbm")} #' #' @import lattice #' #' @importFrom grDevices rainbow #' @importFrom graphics abline axis barplot lines mtext par plot polygon rug #' @importFrom graphics segments title #' @importFrom stats approx binomial delete.response gaussian glm loess #' @importFrom stats model.extract model.frame model.offset model.response #' @importFrom stats model.weights na.pass poisson predict quantile rbinom #' @importFrom stats reformulate reorder rexp rnorm runif sd supsmu terms var #' @importFrom stats weighted.mean #' @importFrom survival Surv #' #' @useDynLib gbm, .registration = TRUE #' #' @name gbm-package #' #' @docType package #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} with contributions by #' Daniel Edwards, Brian Kriegler, Stefan Schroedl and Harry Southworth. #' #' @references #' Y. Freund and R.E. Schapire (1997) \dQuote{A decision-theoretic #' generalization of on-line learning and an application to boosting,} #' \emph{Journal of Computer and System Sciences,} 55(1):119-139. #' #' G. Ridgeway (1999). \dQuote{The state of boosting,} \emph{Computing Science #' and Statistics} 31:172-181. #' #' J.H. Friedman, T. Hastie, R. Tibshirani (2000). \dQuote{Additive Logistic #' Regression: a Statistical View of Boosting,} \emph{Annals of Statistics} #' 28(2):337-374. #' #' J.H. Friedman (2001). \dQuote{Greedy Function Approximation: A Gradient #' Boosting Machine,} \emph{Annals of Statistics} 29(5):1189-1232. #' #' J.H. Friedman (2002). \dQuote{Stochastic Gradient Boosting,} #' \emph{Computational Statistics and Data Analysis} 38(4):367-378. #' #' The \url{http://statweb.stanford.edu/~jhf/R-MART} website. #' #' @keywords package NULLgbm/R/pretty.gbm.tree.R0000644000176200001440000000405113346511223014374 0ustar liggesusers#' Print gbm tree components #' #' \code{gbm} stores the collection of trees used to construct the model in a #' compact matrix structure. This function extracts the information from a #' single tree and displays it in a slightly more readable form. This function #' is mostly for debugging purposes and to satisfy some users' curiosity. #' #' #' @param object a \code{\link{gbm.object}} initially fit using #' \code{\link{gbm}} #' @param i.tree the index of the tree component to extract from \code{object} #' and display #' @return \code{pretty.gbm.tree} returns a data frame. Each row corresponds to #' a node in the tree. Columns indicate \item{SplitVar}{index of which variable #' is used to split. -1 indicates a terminal node.} \item{SplitCodePred}{if the #' split variable is continuous then this component is the split point. If the #' split variable is categorical then this component contains the index of #' \code{object$c.split} that describes the categorical split. If the node is a #' terminal node then this is the prediction.} \item{LeftNode}{the index of the #' row corresponding to the left node.} \item{RightNode}{the index of the row #' corresponding to the right node.} \item{ErrorReduction}{the reduction in the #' loss function as a result of splitting this node.} \item{Weight}{the total #' weight of observations in the node. If weights are all equal to 1 then this #' is the number of observations in the node.} #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' @seealso \code{\link{gbm}}, \code{\link{gbm.object}} #' @keywords print #' @export pretty.gbm.tree pretty.gbm.tree <- function(object,i.tree=1) { if((i.tree<1) || (i.tree>length(object$trees))) { stop("i.tree is out of range. Must be less than ",length(object$trees)) } else { temp <- data.frame(object$trees[[i.tree]]) names(temp) <- c("SplitVar","SplitCodePred","LeftNode", "RightNode","MissingNode","ErrorReduction", "Weight","Prediction") row.names(temp) <- 0:(nrow(temp)-1) } return(temp) } gbm/R/test.gbm.R0000644000176200001440000002577113413011135013072 0ustar liggesusers#' Test the \code{gbm} package. #' #' Run tests on \code{gbm} functions to perform logical checks and #' reproducibility. #' #' The function uses functionality in the \code{RUnit} package. A fairly small #' validation suite is executed that checks to see that relative influence #' identifies sensible variables from simulated data, and that predictions from #' GBMs with Gaussian, Cox or binomial distributions are sensible, #' #' @aliases validate.gbm test.gbm test.relative.influence #' @return An object of class \code{RUnitTestData}. See the help for #' \code{RUnit} for details. #' @note The test suite is not comprehensive. #' @author Harry Southworth #' @seealso \code{\link{gbm}} #' @keywords models #' @examples #' #' # Uncomment the following lines to run - commented out to make CRAN happy #' #library(RUnit) #' #val <- validate.texmex() #' #printHTMLProtocol(val, "texmexReport.html") #' @export test.gbm <- function(){ # Based on example in R package # Gaussian example ############################################################################ ## test Gaussian distribution gbm model set.seed(1) cat("Running least squares regression example.\n") # create some data N <- 1000 X1 <- runif(N) X2 <- 2*runif(N) X3 <- factor(sample(letters[1:4],N,replace=T)) X4 <- ordered(sample(letters[1:6],N,replace=T)) X5 <- factor(sample(letters[1:3],N,replace=T)) X6 <- 3*runif(N) mu <- c(-1,0,1,2)[as.numeric(X3)] SNR <- 10 # signal-to-noise ratio Y <- X1**1.5 + 2 * (X2**.5) + mu sigma <- sqrt(var(Y)/SNR) Y <- Y + rnorm(N,0,sigma) # create a bunch of missing values X1[sample(1:N,size=100)] <- NA X3[sample(1:N,size=300)] <- NA w <- rep(1,N) data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6) # fit initial model gbm1 <- gbm(Y~X1+X2+X3+X4+X5+X6, # formula data=data, # dataset var.monotone=c(0,0,0,0,0,0), # -1: monotone decrease, +1: monotone increase, 0: no monotone restrictions distribution="gaussian", # bernoulli, adaboost, gaussian, poisson, coxph, or # list(name="quantile",alpha=0.05) for quantile regression n.trees=2000, # number of trees shrinkage=0.005, # shrinkage or learning rate, 0.001 to 0.1 usually work interaction.depth=3, # 1: additive model, 2: two-way interactions, etc bag.fraction = 0.5, # subsampling fraction, 0.5 is probably best train.fraction = 0.5, # fraction of data for training, first train.fraction*N used for training n.minobsinnode = 10, # minimum number of obs needed in each node keep.data=TRUE, cv.folds=10) # do 10-fold cross-validation # Get best model best.iter <- gbm.perf(gbm1,method="cv", plot.it=FALSE) # returns cv estimate of best number of trees set.seed(2) # make some new data N <- 1000 X1 <- runif(N) X2 <- 2*runif(N) X3 <- factor(sample(letters[1:4],N,replace=TRUE)) X4 <- ordered(sample(letters[1:6],N,replace=TRUE)) X5 <- factor(sample(letters[1:3],N,replace=TRUE)) X6 <- 3*runif(N) mu <- c(-1,0,1,2)[as.numeric(X3)] # Actual underlying signal Y <- X1**1.5 + 2 * (X2**.5) + mu # Want to see how close predictions are to the underlying signal; noise would just interfere with this # Y <- Y + rnorm(N,0,sigma) data2 <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6) # predict on the new data using "best" number of trees f.predict <- predict(gbm1,data2,best.iter) # f.predict will be on the canonical scale (logit,log,etc.) # Base the validation tests on observed discrepancies RUnit::checkTrue(abs(mean(data2$Y-f.predict)) < 0.01, msg="Gaussian absolute error within tolerance") RUnit::checkTrue(sd(data2$Y-f.predict) < sigma , msg="Gaussian squared erroor within tolerance") ############################################################################ ## test coxph distribution gbm model ## COX PROPORTIONAL HAZARDS REGRESSION EXAMPLE cat("Running cox proportional hazards regression example.\n") # create some data set.seed(1) N <- 3000 X1 <- runif(N) X2 <- runif(N) X3 <- factor(sample(letters[1:4],N,replace=T)) mu <- c(-1,0,1,2)[as.numeric(X3)] f <- 0.5*sin(3*X1 + 5*X2^2 + mu/10) tt.surv <- rexp(N,exp(f)) tt.cens <- rexp(N,0.5) delta <- as.numeric(tt.surv <= tt.cens) tt <- apply(cbind(tt.surv,tt.cens),1,min) # throw in some missing values X1[sample(1:N,size=100)] <- NA X3[sample(1:N,size=300)] <- NA # random weights if you want to experiment with them w <- rep(1,N) data <- data.frame(tt=tt,delta=delta,X1=X1,X2=X2,X3=X3) # fit initial model gbm1 <- gbm(Surv(tt,delta)~X1+X2+X3, # formula data=data, # dataset weights=w, var.monotone=c(0,0,0), # -1: monotone decrease, +1: monotone increase, 0: no monotone restrictions distribution="coxph", n.trees=3000, # number of trees shrinkage=0.001, # shrinkage or learning rate, 0.001 to 0.1 usually work interaction.depth=3, # 1: additive model, 2: two-way interactions, etc bag.fraction = 0.5, # subsampling fraction, 0.5 is probably best train.fraction = 0.5, # fraction of data for training, first train.fraction*N used for training cv.folds = 5, # do 5-fold cross-validation n.minobsinnode = 10, # minimum total weight needed in each node keep.data = TRUE) best.iter <- gbm.perf(gbm1,method="test", plot.it=FALSE) # returns test set estimate of best number of trees # make some new data set.seed(2) N <- 1000 X1 <- runif(N) X2 <- runif(N) X3 <- factor(sample(letters[1:4],N,replace=T)) mu <- c(-1,0,1,2)[as.numeric(X3)] f <- 0.5*sin(3*X1 + 5*X2^2 + mu/10) # -0.5 <= f <= 0.5 via sin fn. tt.surv <- rexp(N,exp(f)) tt.cens <- rexp(N,0.5) data2 <- data.frame(tt=apply(cbind(tt.surv,tt.cens),1,min), delta=as.numeric(tt.surv <= tt.cens), f=f, X1=X1,X2=X2,X3=X3) # predict on the new data using "best" number of trees # f.predict will be on the canonical scale (logit,log,etc.) f.predict <- predict(gbm1, newdata = data2, n.trees = best.iter) #plot(data2$f,f.predict) # Use observed sd RUnit::checkTrue(sd(data2$f - f.predict) < 0.4, msg="Coxph: squared error within tolerance") ############################################################################ ## Test bernoulli distribution gbm model set.seed(1) cat("Running logistic regression example.\n") # create some data N <- 1000 X1 <- runif(N) X2 <- runif(N) X3 <- factor(sample(letters[1:4],N,replace=T)) mu <- c(-1,0,1,2)[as.numeric(X3)] p <- 1/(1+exp(-(sin(3*X1) - 4*X2 + mu))) Y <- rbinom(N,1,p) # random weights if you want to experiment with them w <- rexp(N) w <- N*w/sum(w) data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3) # fit initial model gbm1 <- gbm(Y~X1+X2+X3, # formula data=data, # dataset weights=w, var.monotone=c(0,0,0), # -1: monotone decrease, +1: monotone increase, 0: no monotone restrictions distribution="bernoulli", n.trees=3000, # number of trees shrinkage=0.001, # shrinkage or learning rate, 0.001 to 0.1 usually work interaction.depth=3, # 1: additive model, 2: two-way interactions, etc bag.fraction = 0.5, # subsampling fraction, 0.5 is probably best train.fraction = 0.5, # fraction of data for training, first train.fraction*N used for training cv.folds=5, # do 5-fold cross-validation n.minobsinnode = 10) # minimum total weight needed in each node best.iter.test <- gbm.perf(gbm1,method="test", plot.it=FALSE) # returns test set estimate of best number of trees best.iter <- best.iter.test # make some new data set.seed(2) N <- 1000 X1 <- runif(N) X2 <- runif(N) X3 <- factor(sample(letters[1:4],N,replace=T)) mu <- c(-1,0,1,2)[as.numeric(X3)] p <- 1/(1+exp(-(sin(3*X1) - 4*X2 + mu))) Y <- rbinom(N,1,p) data2 <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3) # predict on the new data using "best" number of trees # f.predict will be on the canonical scale (logit,log,etc.) f.1.predict <- predict(gbm1,data2, n.trees=best.iter.test) # compute quantity prior to transformation f.new = sin(3*X1) - 4*X2 + mu # Base the validation tests on observed discrepancies RUnit::checkTrue(sd(f.new - f.1.predict) < 1.0 ) invisible() } ################################################################################ ########################### test.relative.influence() ########################## ########################### ########################## #' @export test.relative.influence <- function(){ # Test that relative.influence really does pick out the true predictors set.seed(1234) X1 <- matrix(nrow=1000, ncol=50) X1 <- apply(X1, 2, function(x) rnorm(1000)) # Random noise X2 <- matrix(nrow=1000, ncol=5) X2 <- apply(X2, 2, function(x) c(rnorm(500), rnorm(500, 3))) # Real predictors cls <- rep(c(0, 1), ea=500) # Class X <- data.frame(cbind(X1, X2, cls)) mod <- gbm(cls ~ ., data= X, n.trees=1000, cv.folds=5, shrinkage=.01, interaction.depth=2) ri <- rev(sort(relative.influence(mod))) wh <- names(ri)[1:5] res <- sum(wh %in% paste("V", 51:55, sep = "")) RUnit::checkEqualsNumeric(res, 5, msg="Testing relative.influence identifies true predictors") } ################################################################################ ################################ validate.gbm() ################################ ################################ ################################ #' @export validate.gbm <- function () { wh <- (1:length(search()))[search() == "package:gbm"] tests <- objects(wh)[substring(objects(wh), 1, 5) == "test."] # Create temporary directory to put tests into sep <- if (.Platform$OS.type == "windows") "\\" else "/" dir <- file.path(tempdir(), "gbm.tests", fsep = sep) dir.create(dir) for (i in 1:length(tests)) { str <- paste(dir, sep, tests[i], ".R", sep = "") dump(tests[i], file = str) } res <- RUnit::defineTestSuite("gbm", dirs = dir, testFuncRegexp = "^test.+", testFileRegexp = "*.R") cat("Running gbm test suite.\nThis will take some time...\n\n") RUnit::runTestSuite(res) } gbm/R/print.gbm.R0000644000176200001440000001620513346754762013270 0ustar liggesusers#' Print model summary #' #' Display basic information about a \code{gbm} object. #' #' Prints some information about the model object. In particular, this method #' prints the call to \code{gbm()}, the type of loss function that was used, #' and the total number of iterations. #' #' If cross-validation was performed, the 'best' number of trees as estimated #' by cross-validation error is displayed. If a test set was used, the 'best' #' number of trees as estimated by the test set error is displayed. #' #' The number of available predictors, and the number of those having non-zero #' influence on predictions is given (which might be interesting in data mining #' applications). #' #' If multinomial, bernoulli or adaboost was used, the confusion matrix and #' prediction accuracy are printed (objects being allocated to the class with #' highest probability for multinomial and bernoulli). These classifications #' are performed on the entire training data using the model with the 'best' #' number of trees as described above, or the maximum number of trees if the #' 'best' cannot be computed. #' #' If the 'distribution' was specified as gaussian, laplace, quantile or #' t-distribution, a summary of the residuals is displayed. The residuals are #' for the training data with the model at the 'best' number of trees, as #' described above, or the maximum number of trees if the 'best' cannot be #' computed. #' #' @aliases print.gbm show.gbm #' @param x an object of class \code{gbm}. #' @param \dots arguments passed to \code{print.default}. #' @author Harry Southworth, Daniel Edwards #' @seealso \code{\link{gbm}} #' @keywords models nonlinear survival nonparametric #' @examples #' #' data(iris) #' iris.mod <- gbm(Species ~ ., distribution="multinomial", data=iris, #' n.trees=2000, shrinkage=0.01, cv.folds=5, #' verbose=FALSE, n.cores=1) #' iris.mod #' #data(lung) #' #lung.mod <- gbm(Surv(time, status) ~ ., distribution="coxph", data=lung, #' # n.trees=2000, shrinkage=0.01, cv.folds=5,verbose =FALSE) #' #lung.mod #' @rdname print.gbm #' @export print.gbm <- function(x, ... ) { if (!is.null(x$call)){ print(x$call) } dist.name <- x$distribution$name if (dist.name == "pairwise") { if (!is.null(x$distribution$max.rank) && x$distribution$max.rank > 0) { dist.name <- sprintf("pairwise (metric=%s, max.rank=%d)", x$distribution$metric, x$distribution$max.rank) } else { dist.name <- sprintf("pairwise (metric=%s)", x$distribution$metric) } } cat( paste( "A gradient boosted model with", dist.name, "loss function.\n" )) cat( paste( length( x$train.error ), "iterations were performed.\n" ) ) best <- length( x$train.error ) if ( !is.null( x$cv.error ) ) { best <- gbm.perf( x, plot.it = FALSE, method="cv" ) cat( paste("The best cross-validation iteration was ", best, ".\n", sep = "" ) ) } if ( x$train.fraction < 1 ) { best <- gbm.perf( x, plot.it = FALSE, method="test" ) cat( paste("The best test-set iteration was ", best, ".\n", sep = "" ) ) } if ( is.null( best ) ) { best <- length( x$train.error ) } ri <- relative.influence( x, n.trees=best ) cat( "There were", length( x$var.names ), "predictors of which", sum( ri > 0 ), "had non-zero influence.\n" ) invisible() } #' @rdname print.gbm #' #' @export show.gbm <- print.gbm #' Summary of a gbm object #' #' Computes the relative influence of each variable in the gbm object. #' #' For \code{distribution="gaussian"} this returns exactly the reduction of #' squared error attributable to each variable. For other loss functions this #' returns the reduction attributable to each variable in sum of squared error #' in predicting the gradient on each iteration. It describes the relative #' influence of each variable in reducing the loss function. See the references #' below for exact details on the computation. #' #' @param object a \code{gbm} object created from an initial call to #' \code{\link{gbm}}. #' @param cBars the number of bars to plot. If \code{order=TRUE} the only the #' variables with the \code{cBars} largest relative influence will appear in #' the barplot. If \code{order=FALSE} then the first \code{cBars} variables #' will appear in the plot. In either case, the function will return the #' relative influence of all of the variables. #' @param n.trees the number of trees used to generate the plot. Only the first #' \code{n.trees} trees will be used. #' @param plotit an indicator as to whether the plot is generated. #' @param order an indicator as to whether the plotted and/or returned relative #' influences are sorted. #' @param method The function used to compute the relative influence. #' \code{\link{relative.influence}} is the default and is the same as that #' described in Friedman (2001). The other current (and experimental) choice is #' \code{\link{permutation.test.gbm}}. This method randomly permutes each #' predictor variable at a time and computes the associated reduction in #' predictive performance. This is similar to the variable importance measures #' Breiman uses for random forests, but \code{gbm} currently computes using the #' entire training dataset (not the out-of-bag observations). #' @param normalize if \code{FALSE} then \code{summary.gbm} returns the #' unnormalized influence. #' @param ... other arguments passed to the plot function. #' @return Returns a data frame where the first component is the variable name #' and the second is the computed relative influence, normalized to sum to 100. #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' @seealso \code{\link{gbm}} #' @references J.H. Friedman (2001). "Greedy Function Approximation: A Gradient #' Boosting Machine," Annals of Statistics 29(5):1189-1232. #' #' L. Breiman #' (2001).\url{https://www.stat.berkeley.edu/users/breiman/randomforest2001.pdf}. #' @keywords hplot #' #' @export summary.gbm #' @export summary.gbm <- function(object, cBars=length(object$var.names), n.trees=object$n.trees, plotit=TRUE, order=TRUE, method=relative.influence, normalize=TRUE, ...) { if(n.trees < 1) { stop("n.trees must be greater than 0.") } if(n.trees > object$n.trees) { warning("Exceeded total number of GBM terms. Results use n.trees=",object$n.trees," terms.\n") n.trees <- object$n.trees } rel.inf <- method(object,n.trees) rel.inf[rel.inf<0] <- 0 if(order) { i <- order(-rel.inf) } else { i <- 1:length(rel.inf) } if(cBars==0) cBars <- min(10,length(object$var.names)) if(cBars>length(object$var.names)) cBars <- length(object$var.names) if(normalize) rel.inf <- 100*rel.inf/sum(rel.inf) if(plotit) { barplot(rel.inf[i[cBars:1]], horiz=TRUE, col=rainbow(cBars,start=3/6,end=4/6), names=object$var.names[i[cBars:1]], xlab="Relative influence",...) } return(data.frame(var=object$var.names[i], rel.inf=rel.inf[i])) } gbm/R/plot.gbm.R0000644000176200001440000003010413346754746013106 0ustar liggesusers#' Marginal plots of fitted gbm objects #' #' Plots the marginal effect of the selected variables by "integrating" out the #' other variables. #' #' \code{plot.gbm} produces low dimensional projections of the #' \code{\link{gbm.object}} by integrating out the variables not included in #' the \code{i.var} argument. The function selects a grid of points and uses #' the weighted tree traversal method described in Friedman (2001) to do the #' integration. Based on the variable types included in the projection, #' \code{plot.gbm} selects an appropriate display choosing amongst line plots, #' contour plots, and \code{\link[lattice]{lattice}} plots. If the default #' graphics are not sufficient the user may set \code{return.grid=TRUE}, store #' the result of the function, and develop another graphic display more #' appropriate to the particular example. #' #' @param x A \code{\link{gbm.object}} that was fit using a call to #' \code{\link{gbm}}. #' #' @param i.var Vector of indices or the names of the variables to plot. If #' using indices, the variables are indexed in the same order that they appear #' in the initial \code{gbm} formula. If \code{length(i.var)} is between 1 and #' 3 then \code{plot.gbm} produces the plots. Otherwise, \code{plot.gbm} #' returns only the grid of evaluation points and their average predictions #' #' @param n.trees Integer specifying the number of trees to use to generate the #' plot. Default is to use \code{x$n.trees} (i.e., the entire ensemble). #' #' @param continuous.resolution Integer specifying the number of equally space #' points at which to evaluate continuous predictors. #' #' @param return.grid Logical indicating whether or not to produce graphics #' \code{FALSE} or only return the grid of evaluation points and their average #' predictions \code{TRUE}. This is useful for customizing the graphics for #' special variable types, or for higher dimensional graphs. #' #' @param type Character string specifying the type of prediction to plot on the #' vertical axis. See \code{\link{predict.gbm}} for details. #' #' @param level.plot Logical indicating whether or not to use a false color #' level plot (\code{TRUE}) or a 3-D surface (\code{FALSE}). Default is #' \code{TRUE}. #' #' @param contour Logical indicating whether or not to add contour lines to the #' level plot. Only used when \code{level.plot = TRUE}. Default is \code{FALSE}. #' #' @param number Integer specifying the number of conditional intervals to use #' for the continuous panel variables. See \code{\link[graphics]{co.intervals}} #' and \code{\link[lattice]{equal.count}} for further details. #' #' @param overlap The fraction of overlap of the conditioning variables. See #' \code{\link[graphics]{co.intervals}} and \code{\link[lattice]{equal.count}} #' for further details. #' #' @param col.regions Color vector to be used if \code{level.plot} is #' \code{TRUE}. Defaults to the wonderful Matplotlib 'viridis' color map #' provided by the \code{viridis} package. See \code{\link[viridis]{viridis}} #' for details. #' #' @param ... Additional optional arguments to be passed onto #' \code{\link[graphics]{plot}}. #' #' @return If \code{return.grid = TRUE}, a grid of evaluation points and their #' average predictions. Otherwise, a plot is returned. #' #' @note More flexible plotting is available using the #' \code{\link[pdp]{partial}} and \code{\link[pdp]{plotPartial}} functions. #' #' @seealso \code{\link[pdp]{partial}}, \code{\link[pdp]{plotPartial}}, #' \code{\link{gbm}}, and \code{\link{gbm.object}}. #' #' @references J. H. Friedman (2001). "Greedy Function Approximation: A Gradient #' Boosting Machine," Annals of Statistics 29(4). #' #' @references B. M. Greenwell (2017). "pdp: An R Package for Constructing #' Partial Dependence Plots," The R Journal 9(1), 421--436. #' \url{https://journal.r-project.org/archive/2017/RJ-2017-016/index.html}. #' #' @export plot.gbm #' @export plot.gbm <- function(x, i.var = 1, n.trees = x$n.trees, continuous.resolution = 100, return.grid = FALSE, type = c("link", "response"), level.plot = TRUE, contour = FALSE, number = 4, overlap = 0.1, col.regions = viridis::viridis, ...) { # Match type argument type <- match.arg(type) # Sanity checks if(all(is.character(i.var))) { i <- match(i.var, x$var.names) if(any(is.na(i))) { stop("Requested variables not found in ", deparse(substitute(x)), ": ", i.var[is.na(i)]) } else { i.var <- i } } if((min(i.var) < 1) || (max(i.var) > length(x$var.names))) { warning("i.var must be between 1 and ", length(x$var.names)) } if(n.trees > x$n.trees) { warning(paste("n.trees exceeds the number of tree(s) in the model: ", x$n.trees, ". Using ", x$n.trees, " tree(s) instead.", sep = "")) n.trees <- x$n.trees } if(length(i.var) > 3) { warning("plot.gbm() will only create up to (and including) 3-way ", "interaction plots.\nBeyond that, plot.gbm() will only return ", "the plotting data structure.") return.grid <- TRUE } # Generate grid of predictor values on which to compute the partial # dependence values grid.levels <- vector("list", length(i.var)) for(i in 1:length(i.var)) { if(is.numeric(x$var.levels[[i.var[i]]])) { # continuous grid.levels[[i]] <- seq(from = min(x$var.levels[[i.var[i]]]), to = max(x$var.levels[[i.var[i]]]), length = continuous.resolution) } else { # categorical grid.levels[[i]] <- as.numeric(factor(x$var.levels[[i.var[i]]], levels = x$var.levels[[i.var[i]]])) - 1 } } X <- expand.grid(grid.levels) names(X) <- paste("X", 1:length(i.var), sep = "") # For compatibility with gbm version 1.6 if (is.null(x$num.classes)) { x$num.classes <- 1 } # Compute partial dependence values y <- .Call("gbm_plot", X = as.double(data.matrix(X)), cRows = as.integer(nrow(X)), cCols = as.integer(ncol(X)), n.class = as.integer(x$num.classes), i.var = as.integer(i.var - 1), n.trees = as.integer(n.trees), initF = as.double(x$initF), trees = x$trees, c.splits = x$c.splits, var.type = as.integer(x$var.type), PACKAGE = "gbm") if (x$distribution$name == "multinomial") { # reshape into matrix X$y <- matrix(y, ncol = x$num.classes) colnames(X$y) <- x$classes # Convert to class probabilities (if requested) if (type == "response") { X$y <- exp(X$y) X$y <- X$y / matrix(rowSums(X$y), ncol = ncol(X$y), nrow = nrow(X$y)) } } else if(is.element(x$distribution$name, c("bernoulli", "pairwise")) && type == "response") { X$y <- 1 / (1 + exp(-y)) } else if ((x$distribution$name == "poisson") && (type == "response")) { X$y <- exp(y) } else if (type == "response"){ warning("`type = \"response\"` only implemented for \"bernoulli\", ", "\"poisson\", \"multinomial\", and \"pairwise\" distributions. ", "Ignoring." ) } else { X$y <- y } # Transform categorical variables back to factors f.factor <- rep(FALSE, length(i.var)) for(i in 1:length(i.var)) { if(!is.numeric(x$var.levels[[i.var[i]]])) { X[,i] <- factor(x$var.levels[[i.var[i]]][X[, i] + 1], levels = x$var.levels[[i.var[i]]]) f.factor[i] <- TRUE } } # Return original variable names names(X)[1:length(i.var)] <- x$var.names[i.var] # Return grid only (if requested) if(return.grid) { return(X) } # Determine number of predictors nx <- length(i.var) # Determine which type of plot to draw based on the number of predictors if (nx == 1L) { # Single predictor plotOnePredictorPDP(X, ...) } else if (nx == 2) { # Two predictors plotTwoPredictorPDP(X, level.plot = level.plot, contour = contour, col.regions = col.regions, ...) } else { # Three predictors (paneled version of plotTwoPredictorPDP) plotThreePredictorPDP(X, nx = nx, level.plot = level.plot, contour = contour, col.regions = col.regions, number = number, overlap = overlap, ...) } } #' @keywords internal plotOnePredictorPDP <- function(X, ...) { # Use the first column to determine which type of plot to construct if (is.numeric(X[[1L]])) { # Draw a line plot lattice::xyplot(stats::as.formula(paste("y ~", names(X)[1L])), data = X, type = "l", ...) } else { # Draw a Cleveland dot plot lattice::dotplot(stats::as.formula(paste("y ~", names(X)[1L])), data = X, xlab = names(X)[1L], ...) } } #' @keywords internal plotTwoPredictorPDP <- function(X, level.plot, contour, col.regions, ...) { # Use the first two columns to determine which type of plot to construct if (is.factor(X[[1L]]) && is.factor(X[[2L]])) { # Draw a Cleveland dot plot lattice::dotplot(stats::as.formula( paste("y ~", paste(names(X)[1L:2L], collapse = "|")) ), data = X, xlab = names(X)[1L], ...) } else if (is.factor(X[[1L]]) || is.factor(X[[2L]])) { # Lattice plot formula form <- if (is.factor(X[[1L]])) { stats::as.formula(paste("y ~", paste(names(X)[2L:1L], collapse = "|"))) } else { stats::as.formula(paste("y ~", paste(names(X)[1L:2L], collapse = "|"))) } # Draw a paneled line plot lattice::xyplot(form, data = X, type = "l", ...) } else { # Lattice plot formula form <- stats::as.formula( paste("y ~", paste(names(X)[1L:2L], collapse = "*")) ) # Draw a three-dimensional surface if (level.plot) { # Draw a false color level plot lattice::levelplot(form, data = X, col.regions = col.regions, contour = contour, ...) } else { # Draw a wireframe plot lattice::wireframe(form, data = X, ...) } } } #' @keywords internal plotThreePredictorPDP <- function(X, nx, level.plot, contour, col.regions, number, overlap, ...) { # Factor, numeric, numeric if (is.factor(X[[1L]]) && !is.factor(X[[2L]]) && !is.factor(X[[3L]])) { X[, 1L:3L] <- X[, c(2L, 3L, 1L)] } # Numeric, factor, numeric if (!is.factor(X[[1L]]) && is.factor(X[[2L]]) && !is.factor(X[[3L]])) { X[, 1L:3L] <- X[, c(1L, 3L, 2L)] } # Factor, factor, numeric if (is.factor(X[[1L]]) && is.factor(X[[2L]]) && !is.factor(X[[3L]])) { X[, 1L:3L] <- X[, c(3L, 1L, 2L)] } # Factor, numeric, factor if (is.factor(X[[1L]]) && !is.factor(X[[2L]]) && is.factor(X[[3L]])) { X[, 1L:3L] <- X[, c(2L, 1L, 3L)] } # Convert third predictor to a factor using the equal count algorithm if (is.numeric(X[[3L]])) { X[[3L]] <- equal.count(X[[3L]], number = number, overlap = overlap) } if (is.factor(X[[1L]]) && is.factor(X[[2L]])) { # Lattice plot formula form <- stats::as.formula( paste("y ~", names(X)[1L], "|", paste(names(X)[2L:nx], collapse = "*")) ) # Produce a paneled dotplot lattice::dotplot(form, data = X, xlab = names(X)[1L], ...) } else if (is.numeric(X[[1L]]) && is.factor(X[[2L]])) { # Lattice plot formula form <- stats::as.formula( paste("y ~", names(X)[1L], "|", paste(names(X)[2L:nx], collapse = "*")) ) # Produce a paneled lineplot lattice::xyplot(form, data = X, type = "l", ...) } else { # Lattice plot formula form <- stats::as.formula( paste("y ~", paste(names(X)[1L:2L], collapse = "*"), "|", paste(names(X)[3L:nx], collapse = "*")) ) # Draw a three-dimensional surface if (level.plot) { # Draw a false color level plot lattice::levelplot(form, data = X, col.regions = col.regions, contour = contour, ...) } else { # Draw a wireframe plot lattice::wireframe(form, data = X, ...) } } }gbm/R/predict.gbm.R0000644000176200001440000001376413413011135013544 0ustar liggesusers#' Predict method for GBM Model Fits #' #' Predicted values based on a generalized boosted model object #' #' \code{predict.gbm} produces predicted values for each observation in #' \code{newdata} using the the first \code{n.trees} iterations of the boosting #' sequence. If \code{n.trees} is a vector than the result is a matrix with #' each column representing the predictions from gbm models with #' \code{n.trees[1]} iterations, \code{n.trees[2]} iterations, and so on. #' #' The predictions from \code{gbm} do not include the offset term. The user may #' add the value of the offset to the predicted value if desired. #' #' If \code{object} was fit using \code{\link{gbm.fit}} there will be no #' \code{Terms} component. Therefore, the user has greater responsibility to #' make sure that \code{newdata} is of the same format (order and number of #' variables) as the one originally used to fit the model. #' #' @param object Object of class inheriting from (\code{\link{gbm.object}}) #' #' @param newdata Data frame of observations for which to make predictions #' #' @param n.trees Number of trees used in the prediction. \code{n.trees} may be #' a vector in which case predictions are returned for each iteration specified #' #' @param type The scale on which gbm makes the predictions #' #' @param single.tree If \code{single.tree=TRUE} then \code{predict.gbm} #' returns only the predictions from tree(s) \code{n.trees} #' #' @param \dots further arguments passed to or from other methods #' #' @return Returns a vector of predictions. By default the predictions are on #' the scale of f(x). For example, for the Bernoulli loss the returned value is #' on the log odds scale, poisson loss on the log scale, and coxph is on the #' log hazard scale. #' #' If \code{type="response"} then \code{gbm} converts back to the same scale as #' the outcome. Currently the only effect this will have is returning #' probabilities for bernoulli and expected counts for poisson. For the other #' distributions "response" and "link" return the same. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' #' @seealso \code{\link{gbm}}, \code{\link{gbm.object}} #' #' @keywords models regression #' #' @export predict.gbm #' @export predict.gbm <- function(object,newdata,n.trees, type="link", single.tree = FALSE, ...) { if ( missing( newdata ) ){ newdata <- reconstructGBMdata(object) } if ( missing(n.trees) ) { if ( object$train.fraction < 1 ){ n.trees <- gbm.perf( object, method="test", plot.it = FALSE ) } else if (!is.null(object$cv.error)){ n.trees <- gbm.perf( object, method="cv", plot.it = FALSE ) } else{ best <- length( object$train.error ) } cat( paste( "Using", n.trees, "trees...\n" ) ) } if(!is.element(type, c("link","response" ))) { stop("type must be either 'link' or 'response'") } if(!is.null(object$Terms)) { x <- model.frame(terms(reformulate(object$var.names)), newdata, na.action=na.pass) } else { x <- newdata } cRows <- nrow(x) cCols <- ncol(x) for(i in 1:cCols) { if(is.factor(x[,i])) { if (length(levels(x[,i])) > length(object$var.levels[[i]])) { new.compare <- levels(x[,i])[1:length(object$var.levels[[i]])] } else { new.compare <- levels(x[,i]) } if (!identical(object$var.levels[[i]], new.compare)) { x[,i] <- factor(x[,i], union(object$var.levels[[i]], levels(x[,i]))) } x[,i] <- as.numeric(factor(x[,i], levels = object$var.levels[[i]]))-1 } } x <- as.vector(unlist(x, use.names=FALSE)) if(missing(n.trees) || any(n.trees > object$n.trees)) { n.trees[n.trees>object$n.trees] <- object$n.trees warning("Number of trees not specified or exceeded number fit so far. Using ",paste(n.trees,collapse=" "),".") } i.ntree.order <- order(n.trees) # Next if block for compatibility with objects created with version 1.6. if (is.null(object$num.classes)){ object$num.classes <- 1 } predF <- .Call("gbm_pred", X=as.double(x), cRows=as.integer(cRows), cCols=as.integer(cCols), cNumClasses = as.integer(object$num.classes), n.trees=as.integer(n.trees[i.ntree.order]), initF=object$initF, trees=object$trees, c.split=object$c.split, var.type=as.integer(object$var.type), single.tree = as.integer(single.tree), PACKAGE = "gbm") if((length(n.trees) > 1) || (object$num.classes > 1)) { if(object$distribution$name=="multinomial") { predF <- array(predF, dim=c(cRows,object$num.classes,length(n.trees))) dimnames(predF) <- list(NULL, object$classes, n.trees) predF[,,i.ntree.order] <- predF } else { predF <- matrix(predF, ncol=length(n.trees), byrow=FALSE) colnames(predF) <- n.trees predF[,i.ntree.order] <- predF } } if(type=="response") { if(is.element(object$distribution$name, c("bernoulli", "pairwise"))) { predF <- 1/(1+exp(-predF)) } else if(object$distribution$name=="poisson") { predF <- exp(predF) } else if (object$distribution$name == "adaboost"){ predF <- 1 / (1 + exp(-2*predF)) } if(object$distribution$name=="multinomial") { pexp <- exp(predF) psum <- apply(pexp, c(1, 3), function(x) { x / sum(x) }) # Transpose each 2d array predF <- aperm(psum, c(2, 1, 3)) } if((length(n.trees)==1) && (object$distribution$name!="multinomial")) { predF <- as.vector(predF) } } if(!is.null(attr(object$Terms,"offset"))) { warning("predict.gbm does not add the offset to the predicted values.") } return(predF) } gbm/R/gbm.R0000644000176200001440000005230013414472304012112 0ustar liggesusers#' Generalized Boosted Regression Modeling (GBM) #' #' Fits generalized boosted regression models. For technical details, see the #' vignette: \code{utils::browseVignettes("gbm")}. #' #' \code{gbm.fit} provides the link between R and the C++ gbm engine. #' \code{gbm} is a front-end to \code{gbm.fit} that uses the familiar R #' modeling formulas. However, \code{\link[stats]{model.frame}} is very slow if #' there are many predictor variables. For power-users with many variables use #' \code{gbm.fit}. For general practice \code{gbm} is preferable. #' #' @param formula A symbolic description of the model to be fit. The formula #' may include an offset term (e.g. y~offset(n)+x). If #' \code{keep.data = FALSE} in the initial call to \code{gbm} then it is the #' user's responsibility to resupply the offset to \code{\link{gbm.more}}. #' #' @param distribution Either a character string specifying the name of the #' distribution to use or a list with a component \code{name} specifying the #' distribution and any additional parameters needed. If not specified, #' \code{gbm} will try to guess: if the response has only 2 unique values, #' bernoulli is assumed; otherwise, if the response is a factor, multinomial is #' assumed; otherwise, if the response has class \code{"Surv"}, coxph is #' assumed; otherwise, gaussian is assumed. #' #' Currently available options are \code{"gaussian"} (squared error), #' \code{"laplace"} (absolute loss), \code{"tdist"} (t-distribution loss), #' \code{"bernoulli"} (logistic regression for 0-1 outcomes), #' \code{"huberized"} (huberized hinge loss for 0-1 outcomes), classes), #' \code{"adaboost"} (the AdaBoost exponential loss for 0-1 outcomes), #' \code{"poisson"} (count outcomes), \code{"coxph"} (right censored #' observations), \code{"quantile"}, or \code{"pairwise"} (ranking measure #' using the LambdaMart algorithm). #' #' If quantile regression is specified, \code{distribution} must be a list of #' the form \code{list(name = "quantile", alpha = 0.25)} where \code{alpha} is #' the quantile to estimate. The current version's quantile regression method #' does not handle non-constant weights and will stop. #' #' If \code{"tdist"} is specified, the default degrees of freedom is 4 and #' this can be controlled by specifying #' \code{distribution = list(name = "tdist", df = DF)} where \code{DF} is your #' chosen degrees of freedom. #' #' If "pairwise" regression is specified, \code{distribution} must be a list of #' the form \code{list(name="pairwise",group=...,metric=...,max.rank=...)} #' (\code{metric} and \code{max.rank} are optional, see below). \code{group} is #' a character vector with the column names of \code{data} that jointly #' indicate the group an instance belongs to (typically a query in Information #' Retrieval applications). For training, only pairs of instances from the same #' group and with different target labels can be considered. \code{metric} is #' the IR measure to use, one of #' \describe{ #' \item{list("conc")}{Fraction of concordant pairs; for binary labels, this #' is equivalent to the Area under the ROC Curve} #' \item{:}{Fraction of concordant pairs; for binary labels, this is #' equivalent to the Area under the ROC Curve} #' \item{list("mrr")}{Mean reciprocal rank of the highest-ranked positive #' instance} #' \item{:}{Mean reciprocal rank of the highest-ranked positive instance} #' \item{list("map")}{Mean average precision, a generalization of \code{mrr} #' to multiple positive instances}\item{:}{Mean average precision, a #' generalization of \code{mrr} to multiple positive instances} #' \item{list("ndcg:")}{Normalized discounted cumulative gain. The score is #' the weighted sum (DCG) of the user-supplied target values, weighted #' by log(rank+1), and normalized to the maximum achievable value. This #' is the default if the user did not specify a metric.} #' } #' #' \code{ndcg} and \code{conc} allow arbitrary target values, while binary #' targets {0,1} are expected for \code{map} and \code{mrr}. For \code{ndcg} #' and \code{mrr}, a cut-off can be chosen using a positive integer parameter #' \code{max.rank}. If left unspecified, all ranks are taken into account. #' #' Note that splitting of instances into training and validation sets follows #' group boundaries and therefore only approximates the specified #' \code{train.fraction} ratio (the same applies to cross-validation folds). #' Internally, queries are randomly shuffled before training, to avoid bias. #' #' Weights can be used in conjunction with pairwise metrics, however it is #' assumed that they are constant for instances from the same group. #' #' For details and background on the algorithm, see e.g. Burges (2010). #' #' @param data an optional data frame containing the variables in the model. By #' default the variables are taken from \code{environment(formula)}, typically #' the environment from which \code{gbm} is called. If \code{keep.data=TRUE} in #' the initial call to \code{gbm} then \code{gbm} stores a copy with the #' object. If \code{keep.data=FALSE} then subsequent calls to #' \code{\link{gbm.more}} must resupply the same dataset. It becomes the user's #' responsibility to resupply the same data at this point. #' #' @param weights an optional vector of weights to be used in the fitting #' process. Must be positive but do not need to be normalized. If #' \code{keep.data=FALSE} in the initial call to \code{gbm} then it is the #' user's responsibility to resupply the weights to \code{\link{gbm.more}}. #' #' @param var.monotone an optional vector, the same length as the number of #' predictors, indicating which variables have a monotone increasing (+1), #' decreasing (-1), or arbitrary (0) relationship with the outcome. #' #' @param n.trees Integer specifying the total number of trees to fit. This is #' equivalent to the number of iterations and the number of basis functions in #' the additive expansion. Default is 100. #' #' @param interaction.depth Integer specifying the maximum depth of each tree #' (i.e., the highest level of variable interactions allowed). A value of 1 #' implies an additive model, a value of 2 implies a model with up to 2-way #' interactions, etc. Default is 1. #' #' @param n.minobsinnode Integer specifying the minimum number of observations #' in the terminal nodes of the trees. Note that this is the actual number of #' observations, not the total weight. #' #' @param shrinkage a shrinkage parameter applied to each tree in the #' expansion. Also known as the learning rate or step-size reduction; 0.001 to #' 0.1 usually work, but a smaller learning rate typically requires more trees. #' Default is 0.1. #' #' @param bag.fraction the fraction of the training set observations randomly #' selected to propose the next tree in the expansion. This introduces #' randomnesses into the model fit. If \code{bag.fraction} < 1 then running the #' same model twice will result in similar but different fits. \code{gbm} uses #' the R random number generator so \code{set.seed} can ensure that the model #' can be reconstructed. Preferably, the user can save the returned #' \code{\link{gbm.object}} using \code{\link{save}}. Default is 0.5. #' #' @param train.fraction The first \code{train.fraction * nrows(data)} #' observations are used to fit the \code{gbm} and the remainder are used for #' computing out-of-sample estimates of the loss function. #' #' @param cv.folds Number of cross-validation folds to perform. If #' \code{cv.folds}>1 then \code{gbm}, in addition to the usual fit, will #' perform a cross-validation, calculate an estimate of generalization error #' returned in \code{cv.error}. #' #' @param keep.data a logical variable indicating whether to keep the data and #' an index of the data stored with the object. Keeping the data and index #' makes subsequent calls to \code{\link{gbm.more}} faster at the cost of #' storing an extra copy of the dataset. #' #' @param verbose Logical indicating whether or not to print out progress and #' performance indicators (\code{TRUE}). If this option is left unspecified for #' \code{gbm.more}, then it uses \code{verbose} from \code{object}. Default is #' \code{FALSE}. #' #' @param class.stratify.cv Logical indicating whether or not the #' cross-validation should be stratified by class. Defaults to \code{TRUE} for #' \code{distribution = "multinomial"} and is only implemented for #' \code{"multinomial"} and \code{"bernoulli"}. The purpose of stratifying the #' cross-validation is to help avoiding situations in which training sets do #' not contain all classes. #' #' @param n.cores The number of CPU cores to use. The cross-validation loop #' will attempt to send different CV folds off to different cores. If #' \code{n.cores} is not specified by the user, it is guessed using the #' \code{detectCores} function in the \code{parallel} package. Note that the #' documentation for \code{detectCores} makes clear that it is not failsafe and #' could return a spurious number of available cores. #' #' @return A \code{\link{gbm.object}} object. #' #' @details #' This package implements the generalized boosted modeling framework. Boosting #' is the process of iteratively adding basis functions in a greedy fashion so #' that each additional basis function further reduces the selected loss #' function. This implementation closely follows Friedman's Gradient Boosting #' Machine (Friedman, 2001). #' #' In addition to many of the features documented in the Gradient Boosting #' Machine, \code{gbm} offers additional features including the out-of-bag #' estimator for the optimal number of iterations, the ability to store and #' manipulate the resulting \code{gbm} object, and a variety of other loss #' functions that had not previously had associated boosting algorithms, #' including the Cox partial likelihood for censored data, the poisson #' likelihood for count outcomes, and a gradient boosting implementation to #' minimize the AdaBoost exponential loss function. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' #' Quantile regression code developed by Brian Kriegler #' \email{bk@@stat.ucla.edu} #' #' t-distribution, and multinomial code developed by Harry Southworth and #' Daniel Edwards #' #' Pairwise code developed by Stefan Schroedl \email{schroedl@@a9.com} #' #' @seealso \code{\link{gbm.object}}, \code{\link{gbm.perf}}, #' \code{\link{plot.gbm}}, \code{\link{predict.gbm}}, \code{\link{summary.gbm}}, #' and \code{\link{pretty.gbm.tree}}. #' #' @references #' Y. Freund and R.E. Schapire (1997) \dQuote{A decision-theoretic #' generalization of on-line learning and an application to boosting,} #' \emph{Journal of Computer and System Sciences,} 55(1):119-139. #' #' G. Ridgeway (1999). \dQuote{The state of boosting,} \emph{Computing Science #' and Statistics} 31:172-181. #' #' J.H. Friedman, T. Hastie, R. Tibshirani (2000). \dQuote{Additive Logistic #' Regression: a Statistical View of Boosting,} \emph{Annals of Statistics} #' 28(2):337-374. #' #' J.H. Friedman (2001). \dQuote{Greedy Function Approximation: A Gradient #' Boosting Machine,} \emph{Annals of Statistics} 29(5):1189-1232. #' #' J.H. Friedman (2002). \dQuote{Stochastic Gradient Boosting,} #' \emph{Computational Statistics and Data Analysis} 38(4):367-378. #' #' B. Kriegler (2007). Cost-Sensitive Stochastic Gradient Boosting Within a #' Quantitative Regression Framework. Ph.D. Dissertation. University of #' California at Los Angeles, Los Angeles, CA, USA. Advisor(s) Richard A. Berk. #' url{https://dl.acm.org/citation.cfm?id=1354603}. #' #' C. Burges (2010). \dQuote{From RankNet to LambdaRank to LambdaMART: An #' Overview,} Microsoft Research Technical Report MSR-TR-2010-82. #' #' @export #' #' @examples #' # #' # A least squares regression example #' # #' #' # Simulate data #' set.seed(101) # for reproducibility #' N <- 1000 #' X1 <- runif(N) #' X2 <- 2 * runif(N) #' X3 <- ordered(sample(letters[1:4], N, replace = TRUE), levels = letters[4:1]) #' X4 <- factor(sample(letters[1:6], N, replace = TRUE)) #' X5 <- factor(sample(letters[1:3], N, replace = TRUE)) #' X6 <- 3 * runif(N) #' mu <- c(-1, 0, 1, 2)[as.numeric(X3)] #' SNR <- 10 # signal-to-noise ratio #' Y <- X1 ^ 1.5 + 2 * (X2 ^ 0.5) + mu #' sigma <- sqrt(var(Y) / SNR) #' Y <- Y + rnorm(N, 0, sigma) #' X1[sample(1:N, size = 500)] <- NA # introduce some missing values #' X4[sample(1:N, size = 300)] <- NA # introduce some missing values #' data <- data.frame(Y, X1, X2, X3, X4, X5, X6) #' #' # Fit a GBM #' set.seed(102) # for reproducibility #' gbm1 <- gbm(Y ~ ., data = data, var.monotone = c(0, 0, 0, 0, 0, 0), #' distribution = "gaussian", n.trees = 100, shrinkage = 0.1, #' interaction.depth = 3, bag.fraction = 0.5, train.fraction = 0.5, #' n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, #' verbose = FALSE, n.cores = 1) #' #' # Check performance using the out-of-bag (OOB) error; the OOB error typically #' # underestimates the optimal number of iterations #' best.iter <- gbm.perf(gbm1, method = "OOB") #' print(best.iter) #' #' # Check performance using the 50% heldout test set #' best.iter <- gbm.perf(gbm1, method = "test") #' print(best.iter) #' #' # Check performance using 5-fold cross-validation #' best.iter <- gbm.perf(gbm1, method = "cv") #' print(best.iter) #' #' # Plot relative influence of each variable #' par(mfrow = c(1, 2)) #' summary(gbm1, n.trees = 1) # using first tree #' summary(gbm1, n.trees = best.iter) # using estimated best number of trees #' #' # Compactly print the first and last trees for curiosity #' print(pretty.gbm.tree(gbm1, i.tree = 1)) #' print(pretty.gbm.tree(gbm1, i.tree = gbm1$n.trees)) #' #' # Simulate new data #' set.seed(103) # for reproducibility #' N <- 1000 #' X1 <- runif(N) #' X2 <- 2 * runif(N) #' X3 <- ordered(sample(letters[1:4], N, replace = TRUE)) #' X4 <- factor(sample(letters[1:6], N, replace = TRUE)) #' X5 <- factor(sample(letters[1:3], N, replace = TRUE)) #' X6 <- 3 * runif(N) #' mu <- c(-1, 0, 1, 2)[as.numeric(X3)] #' Y <- X1 ^ 1.5 + 2 * (X2 ^ 0.5) + mu + rnorm(N, 0, sigma) #' data2 <- data.frame(Y, X1, X2, X3, X4, X5, X6) #' #' # Predict on the new data using the "best" number of trees; by default, #' # predictions will be on the link scale #' Yhat <- predict(gbm1, newdata = data2, n.trees = best.iter, type = "link") #' #' # least squares error #' print(sum((data2$Y - Yhat)^2)) #' #' # Construct univariate partial dependence plots #' p1 <- plot(gbm1, i.var = 1, n.trees = best.iter) #' p2 <- plot(gbm1, i.var = 2, n.trees = best.iter) #' p3 <- plot(gbm1, i.var = "X3", n.trees = best.iter) # can use index or name #' grid.arrange(p1, p2, p3, ncol = 3) #' #' # Construct bivariate partial dependence plots #' plot(gbm1, i.var = 1:2, n.trees = best.iter) #' plot(gbm1, i.var = c("X2", "X3"), n.trees = best.iter) #' plot(gbm1, i.var = 3:4, n.trees = best.iter) #' #' # Construct trivariate partial dependence plots #' plot(gbm1, i.var = c(1, 2, 6), n.trees = best.iter, #' continuous.resolution = 20) #' plot(gbm1, i.var = 1:3, n.trees = best.iter) #' plot(gbm1, i.var = 2:4, n.trees = best.iter) #' plot(gbm1, i.var = 3:5, n.trees = best.iter) #' #' # Add more (i.e., 100) boosting iterations to the ensemble #' gbm2 <- gbm.more(gbm1, n.new.trees = 100, verbose = FALSE) gbm <- function(formula = formula(data), distribution = "bernoulli", data = list(), weights, var.monotone = NULL, n.trees = 100, interaction.depth = 1, n.minobsinnode = 10, shrinkage = 0.1, bag.fraction = 0.5, train.fraction = 1.0, cv.folds = 0, keep.data = TRUE, verbose = FALSE, class.stratify.cv = NULL, n.cores = NULL) { # Match the call to gbm mcall <- match.call() # Verbose output? lVerbose <- if (!is.logical(verbose)) { FALSE } else { verbose } # Construct model frame, terms object, weights, and offset mf <- match.call(expand.dots = FALSE) m <- match(c("formula", "data", "weights", "offset"), names(mf), 0) mf <- mf[c(1, m)] mf$drop.unused.levels <- TRUE mf$na.action <- na.pass mf[[1]] <- as.name("model.frame") m <- mf mf <- eval(mf, parent.frame()) Terms <- attr(mf, "terms") w <- model.weights(mf) offset <- model.offset(mf) # Determine and check response distribution if (missing(distribution)) { y <- data[, all.vars(formula)[1L], drop = TRUE] distribution <- guessDist(y) } if (is.character(distribution)) { distribution <- list(name = distribution) } if (!is.element(distribution$name, getAvailableDistributions())) { stop("Distribution ", distribution$name, " is not supported.") } # Extract and check response values y <- model.response(mf) # Construct data frame of predictor values var.names <- attributes(Terms)$term.labels x <- model.frame(terms(reformulate(var.names)), data = data, na.action = na.pass) # Extract response name as a character string response.name <- as.character(formula[[2L]]) # Stratify cross-validation by class (only for bernoulli and multinomial) class.stratify.cv <- getStratify(class.stratify.cv, d = distribution) # Groups (for pairwise distribution only) group <- NULL num.groups <- 0 # Determine number of training instances if (distribution$name != "pairwise"){ # Number of training instances nTrain <- floor(train.fraction * nrow(x)) } else { # Sampling is by group, so we need to calculate them here distribution.group <- distribution[["group"]] if (is.null(distribution.group)) { stop(paste("For pairwise regression, `distribution` must be a list of", "the form `list(name = \"pairwise\", group = c(\"date\",", "\"session\", \"category\", \"keywords\"))`.")) } # Check if group names are valid i <- match(distribution.group, colnames(data)) if (any(is.na(i))) { stop("Group column does not occur in data: ", distribution.group[is.na(i)], ".") } # Construct group index group <- factor( do.call(paste, c(data[, distribution.group, drop = FALSE], sep = ":")) ) # Check that weights are constant across groups if ((!missing(weights)) && (!is.null(weights))) { w.min <- tapply(w, INDEX = group, FUN = min) w.max <- tapply(w, INDEX = group, FUN = max) if (any(w.min != w.max)) { stop("For `distribution = \"pairwise\"`, all instances for the same ", "group must have the same weight.") } w <- w * length(w.min) / sum(w.min) # normalize across groups } # Shuffle groups to remove bias when split into train/test sets and/or CV # folds perm.levels <- levels(group)[sample(1:nlevels(group))] group <- factor(group, levels = perm.levels) # The C function expects instances to be sorted by group and descending by # target ord.group <- order(group, -y) group <- group[ord.group] y <- y[ord.group] x <- x[ord.group, , drop = FALSE] w <- w[ord.group] # Split into train and validation sets at group boundary num.groups.train <- max(1, round(train.fraction * nlevels(group))) # Include all groups up to the num.groups.train nTrain <- max(which(group==levels(group)[num.groups.train])) Misc <- group } # Set up for k-fold cross-validation cv.error <- NULL if(cv.folds > 1) { cv.results <- gbmCrossVal(cv.folds = cv.folds, nTrain = nTrain, n.cores = n.cores, class.stratify.cv = class.stratify.cv, data = data, x = x, y = y, offset = offset, distribution = distribution, w = w, var.monotone = var.monotone, n.trees = n.trees, interaction.depth = interaction.depth, n.minobsinnode = n.minobsinnode, shrinkage = shrinkage, bag.fraction = bag.fraction, var.names = var.names, response.name = response.name, group = group) cv.error <- cv.results$error p <- cv.results$predictions } # Fit a GBM gbm.obj <- gbm.fit(x = x, y = y, offset = offset, distribution = distribution, w = w, var.monotone = var.monotone, n.trees = n.trees, interaction.depth = interaction.depth, n.minobsinnode = n.minobsinnode, shrinkage = shrinkage, bag.fraction = bag.fraction, nTrain = nTrain, keep.data = keep.data, verbose = lVerbose, var.names = var.names, response.name = response.name, group = group) # Attach further components gbm.obj$train.fraction <- train.fraction gbm.obj$Terms <- Terms gbm.obj$cv.error <- cv.error gbm.obj$cv.folds <- cv.folds gbm.obj$call <- mcall gbm.obj$m <- m if (cv.folds > 0) { gbm.obj$cv.fitted <- p } if (distribution$name == "pairwise") { # Data has been reordered according to queries. We need to permute the # fitted values so that they correspond to the original order. gbm.obj$ord.group <- ord.group gbm.obj$fit <- gbm.obj$fit[order(ord.group)] } # Return "gbm" object gbm.obj } gbm/R/gbm.fit.R0000644000176200001440000005552213360715724012712 0ustar liggesusers#' Generalized Boosted Regression Modeling (GBM) #' #' Workhorse function providing the link between R and the C++ gbm engine. #' \code{gbm} is a front-end to \code{gbm.fit} that uses the familiar R #' modeling formulas. However, \code{\link[stats]{model.frame}} is very slow if #' there are many predictor variables. For power-users with many variables use #' \code{gbm.fit}. For general practice \code{gbm} is preferable. #' #' @param x A data frame or matrix containing the predictor variables. The #' number of rows in \code{x} must be the same as the length of \code{y}. #' #' @param y A vector of outcomes. The number of rows in \code{x} must be the #' same as the length of \code{y}. #' #' @param offset A vector of offset values. #' #' @param misc An R object that is simply passed on to the gbm engine. It can be #' used for additional data for the specific distribution. Currently it is only #' used for passing the censoring indicator for the Cox proportional hazards #' model. #' #' @param distribution Either a character string specifying the name of the #' distribution to use or a list with a component \code{name} specifying the #' distribution and any additional parameters needed. If not specified, #' \code{gbm} will try to guess: if the response has only 2 unique values, #' bernoulli is assumed; otherwise, if the response is a factor, multinomial is #' assumed; otherwise, if the response has class \code{"Surv"}, coxph is #' assumed; otherwise, gaussian is assumed. #' #' Currently available options are \code{"gaussian"} (squared error), #' \code{"laplace"} (absolute loss), \code{"tdist"} (t-distribution loss), #' \code{"bernoulli"} (logistic regression for 0-1 outcomes), #' \code{"huberized"} (huberized hinge loss for 0-1 outcomes), classes), #' \code{"adaboost"} (the AdaBoost exponential loss for 0-1 outcomes), #' \code{"poisson"} (count outcomes), \code{"coxph"} (right censored #' observations), \code{"quantile"}, or \code{"pairwise"} (ranking measure #' using the LambdaMart algorithm). #' #' If quantile regression is specified, \code{distribution} must be a list of #' the form \code{list(name = "quantile", alpha = 0.25)} where \code{alpha} is #' the quantile to estimate. The current version's quantile regression method #' does not handle non-constant weights and will stop. #' #' If \code{"tdist"} is specified, the default degrees of freedom is 4 and #' this can be controlled by specifying #' \code{distribution = list(name = "tdist", df = DF)} where \code{DF} is your #' chosen degrees of freedom. #' #' If "pairwise" regression is specified, \code{distribution} must be a list of #' the form \code{list(name="pairwise",group=...,metric=...,max.rank=...)} #' (\code{metric} and \code{max.rank} are optional, see below). \code{group} is #' a character vector with the column names of \code{data} that jointly #' indicate the group an instance belongs to (typically a query in Information #' Retrieval applications). For training, only pairs of instances from the same #' group and with different target labels can be considered. \code{metric} is #' the IR measure to use, one of #' \describe{ #' \item{list("conc")}{Fraction of concordant pairs; for binary labels, this #' is equivalent to the Area under the ROC Curve} #' \item{:}{Fraction of concordant pairs; for binary labels, this is #' equivalent to the Area under the ROC Curve} #' \item{list("mrr")}{Mean reciprocal rank of the highest-ranked positive #' instance} #' \item{:}{Mean reciprocal rank of the highest-ranked positive instance} #' \item{list("map")}{Mean average precision, a generalization of \code{mrr} #' to multiple positive instances}\item{:}{Mean average precision, a #' generalization of \code{mrr} to multiple positive instances} #' \item{list("ndcg:")}{Normalized discounted cumulative gain. The score is #' the weighted sum (DCG) of the user-supplied target values, weighted #' by log(rank+1), and normalized to the maximum achievable value. This #' is the default if the user did not specify a metric.} #' } #' #' \code{ndcg} and \code{conc} allow arbitrary target values, while binary #' targets {0,1} are expected for \code{map} and \code{mrr}. For \code{ndcg} #' and \code{mrr}, a cut-off can be chosen using a positive integer parameter #' \code{max.rank}. If left unspecified, all ranks are taken into account. #' #' Note that splitting of instances into training and validation sets follows #' group boundaries and therefore only approximates the specified #' \code{train.fraction} ratio (the same applies to cross-validation folds). #' Internally, queries are randomly shuffled before training, to avoid bias. #' #' Weights can be used in conjunction with pairwise metrics, however it is #' assumed that they are constant for instances from the same group. #' #' For details and background on the algorithm, see e.g. Burges (2010). #' #' @param w A vector of weights of the same length as the \code{y}. #' #' @param var.monotone an optional vector, the same length as the number of #' predictors, indicating which variables have a monotone increasing (+1), #' decreasing (-1), or arbitrary (0) relationship with the outcome. #' #' @param n.trees the total number of trees to fit. This is equivalent to the #' number of iterations and the number of basis functions in the additive #' expansion. #' #' @param interaction.depth The maximum depth of variable interactions. A value #' of 1 implies an additive model, a value of 2 implies a model with up to 2-way #' interactions, etc. Default is \code{1}. #' #' @param n.minobsinnode Integer specifying the minimum number of observations #' in the trees terminal nodes. Note that this is the actual number of #' observations not the total weight. #' #' @param shrinkage The shrinkage parameter applied to each tree in the #' expansion. Also known as the learning rate or step-size reduction; 0.001 to #' 0.1 usually work, but a smaller learning rate typically requires more trees. #' Default is \code{0.1}. #' #' @param bag.fraction The fraction of the training set observations randomly #' selected to propose the next tree in the expansion. This introduces #' randomnesses into the model fit. If \code{bag.fraction} < 1 then running the #' same model twice will result in similar but different fits. \code{gbm} uses #' the R random number generator so \code{set.seed} can ensure that the model #' can be reconstructed. Preferably, the user can save the returned #' \code{\link{gbm.object}} using \code{\link{save}}. Default is \code{0.5}. #' #' @param nTrain An integer representing the number of cases on which to train. #' This is the preferred way of specification for \code{gbm.fit}; The option #' \code{train.fraction} in \code{gbm.fit} is deprecated and only maintained #' for backward compatibility. These two parameters are mutually exclusive. If #' both are unspecified, all data is used for training. #' #' @param train.fraction The first \code{train.fraction * nrows(data)} #' observations are used to fit the \code{gbm} and the remainder are used for #' computing out-of-sample estimates of the loss function. #' #' @param keep.data Logical indicating whether or not to keep the data and an #' index of the data stored with the object. Keeping the data and index makes #' subsequent calls to \code{\link{gbm.more}} faster at the cost of storing an #' extra copy of the dataset. #' #' @param verbose Logical indicating whether or not to print out progress and #' performance indicators (\code{TRUE}). If this option is left unspecified for #' \code{gbm.more}, then it uses \code{verbose} from \code{object}. Default is #' \code{FALSE}. #' #' @param var.names Vector of strings of length equal to the number of columns #' of \code{x} containing the names of the predictor variables. #' #' @param response.name Character string label for the response variable. #' #' @param group The \code{group} to use when \code{distribution = "pairwise"}. #' #' @return A \code{\link{gbm.object}} object. #' #' @details #' This package implements the generalized boosted modeling framework. Boosting #' is the process of iteratively adding basis functions in a greedy fashion so #' that each additional basis function further reduces the selected loss #' function. This implementation closely follows Friedman's Gradient Boosting #' Machine (Friedman, 2001). #' #' In addition to many of the features documented in the Gradient Boosting #' Machine, \code{gbm} offers additional features including the out-of-bag #' estimator for the optimal number of iterations, the ability to store and #' manipulate the resulting \code{gbm} object, and a variety of other loss #' functions that had not previously had associated boosting algorithms, #' including the Cox partial likelihood for censored data, the poisson #' likelihood for count outcomes, and a gradient boosting implementation to #' minimize the AdaBoost exponential loss function. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' #' Quantile regression code developed by Brian Kriegler #' \email{bk@@stat.ucla.edu} #' #' t-distribution, and multinomial code developed by Harry Southworth and #' Daniel Edwards #' #' Pairwise code developed by Stefan Schroedl \email{schroedl@@a9.com} #' #' @seealso \code{\link{gbm.object}}, \code{\link{gbm.perf}}, #' \code{\link{plot.gbm}}, \code{\link{predict.gbm}}, \code{\link{summary.gbm}}, #' and \code{\link{pretty.gbm.tree}}. #' #' @references #' Y. Freund and R.E. Schapire (1997) \dQuote{A decision-theoretic #' generalization of on-line learning and an application to boosting,} #' \emph{Journal of Computer and System Sciences,} 55(1):119-139. #' #' G. Ridgeway (1999). \dQuote{The state of boosting,} \emph{Computing Science #' and Statistics} 31:172-181. #' #' J.H. Friedman, T. Hastie, R. Tibshirani (2000). \dQuote{Additive Logistic #' Regression: a Statistical View of Boosting,} \emph{Annals of Statistics} #' 28(2):337-374. #' #' J.H. Friedman (2001). \dQuote{Greedy Function Approximation: A Gradient #' Boosting Machine,} \emph{Annals of Statistics} 29(5):1189-1232. #' #' J.H. Friedman (2002). \dQuote{Stochastic Gradient Boosting,} #' \emph{Computational Statistics and Data Analysis} 38(4):367-378. #' #' B. Kriegler (2007). Cost-Sensitive Stochastic Gradient Boosting Within a #' Quantitative Regression Framework. Ph.D. Dissertation. University of #' California at Los Angeles, Los Angeles, CA, USA. Advisor(s) Richard A. Berk. #' url{https://dl.acm.org/citation.cfm?id=1354603}. #' #' C. Burges (2010). \dQuote{From RankNet to LambdaRank to LambdaMART: An #' Overview,} Microsoft Research Technical Report MSR-TR-2010-82. #' #' @export gbm.fit <- function(x, y, offset = NULL, misc = NULL, distribution = "bernoulli", w = NULL, var.monotone = NULL, n.trees = 100, interaction.depth = 1, n.minobsinnode = 10, shrinkage = 0.001, bag.fraction = 0.5, nTrain = NULL, train.fraction = NULL, keep.data = TRUE, verbose = TRUE, var.names = NULL, response.name = "y", group = NULL) { # Reformat distribution into a named list if(is.character(distribution)) { distribution <- list(name = distribution) } # Dimensions of predictor data cRows <- nrow(x) cCols <- ncol(x) if(nrow(x) != ifelse(class(y) == "Surv", nrow(y), length(y))) { stop("The number of rows in x does not equal the length of y.") } # The preferred way to specify the number of training instances is via the # parameter `nTrain`. The parameter `train.fraction` is only maintained for # back compatibility. if(!is.null(nTrain) && !is.null(train.fraction)) { stop("Parameters `nTrain` and `train.fraction` cannot both be specified.") } else if(!is.null(train.fraction)) { warning("Parameter `train.fraction` is deprecated, please specify ", "`nTrain` instead.") nTrain <- floor(train.fraction*cRows) } else if(is.null(nTrain)) { nTrain <- cRows # both undefined, use all training data } if (is.null(train.fraction)){ train.fraction <- nTrain / cRows } # Extract var.names if NULL if(is.null(var.names)) { var.names <- getVarNames(x) } # Check size of data if(nTrain * bag.fraction <= 2 * n.minobsinnode + 1) { stop("The data set is too small or the subsampling rate is too large: ", "`nTrain * bag.fraction <= n.minobsinnode`") } if (distribution$name != "pairwise") { w <- w * length(w) / sum(w) # normalize to N } # Sanity checks ch <- checkMissing(x, y) interaction.depth <- checkID(interaction.depth) w <- checkWeights(w, length(y)) offset <- checkOffset(offset, y) Misc <- NA # setup variable types var.type <- rep(0,cCols) var.levels <- vector("list",cCols) for(i in 1:length(var.type)) { if(all(is.na(x[,i]))) { stop("variable ",i,": ",var.names[i]," has only missing values.") } if(is.ordered(x[,i])) { var.levels[[i]] <- levels(factor(x[,i])) x[,i] <- as.numeric(factor(x[,i]))-1 var.type[i] <- 0 } else if(is.factor(x[,i])) { if(length(levels(x[,i]))>1024) stop("gbm does not currently handle categorical variables with more than 1024 levels. Variable ",i,": ",var.names[i]," has ",length(levels(x[,i]))," levels.") var.levels[[i]] <- levels(factor(x[,i])) x[,i] <- as.numeric(factor(x[,i]))-1 var.type[i] <- max(x[,i],na.rm=TRUE)+1 } else if(is.numeric(x[,i])) { var.levels[[i]] <- quantile(x[,i],prob=(0:10)/10,na.rm=TRUE) } else { stop("variable ",i,": ",var.names[i]," is not of type numeric, ordered, or factor.") } # check for some variation in each variable if(length(unique(var.levels[[i]])) == 1) { warning("variable ",i,": ",var.names[i]," has no variation.") } } nClass <- 1 if(!("name" %in% names(distribution))) { stop("The distribution is missing a `name` component; for example, ", "distribution = list(name = \"gaussian\").") } supported.distributions <- getAvailableDistributions() distribution.call.name <- distribution$name # Check for potential problems with the distribution if(!is.element(distribution$name, supported.distributions)) { stop("Distribution ",distribution$name," is not supported") } if((distribution$name == "bernoulli") && !all(is.element(y,0:1))) { stop("Bernoulli requires the response to be in {0,1}") if (is.factor(y)) { y <- as.integer(y) - 1 } } if((distribution$name == "huberized") && !all(is.element(y,0:1))) { stop("Huberized square hinged loss requires the response to be in {0,1}") if (is.factor(y)) { y <- as.integer(y) - 1 } } if((distribution$name == "poisson") && any(y<0)) { stop("Poisson requires the response to be positive") } if((distribution$name == "poisson") && any(y != trunc(y))) { stop("Poisson requires the response to be a positive integer") } if((distribution$name == "adaboost") && !all(is.element(y,0:1))) { stop("This version of AdaBoost requires the response to be in {0,1}") if (is.factor(y)) { y <- as.integer(y) - 1 } } if(distribution$name == "quantile") { if(length(unique(w)) > 1) { stop("This version of gbm for the quantile regression lacks a weighted quantile. For now the weights must be constant.") } if(is.null(distribution$alpha)) { stop("For quantile regression, the distribution parameter must be a list with a parameter 'alpha' indicating the quantile, for example list(name=\"quantile\",alpha=0.95).") } else { if((distribution$alpha < 0) || (distribution$alpha > 1)) { stop("alpha must be between 0 and 1.") } } Misc <- c(alpha=distribution$alpha) } if(distribution$name == "coxph") { if(class(y)!="Surv") { stop("Outcome must be a survival object Surv(time,failure)") } if(attr(y,"type")!="right") { stop("gbm() currently only handles right censored observations") } Misc <- y[,2] y <- y[,1] # reverse sort the failure times to compute risk sets on the fly i.train <- order(-y[1:nTrain]) n.test <- cRows - nTrain if(n.test > 0) { i.test <- order(-y[(nTrain+1):cRows]) + nTrain } else { i.test <- NULL } i.timeorder <- c(i.train,i.test) y <- y[i.timeorder] Misc <- Misc[i.timeorder] x <- x[i.timeorder,,drop=FALSE] w <- w[i.timeorder] if(!is.na(offset)) offset <- offset[i.timeorder] } if(distribution$name == "tdist") { if (is.null(distribution$df) || !is.numeric(distribution$df)){ Misc <- 4 } else { Misc <- distribution$df[1] } } if (distribution$name == "multinomial") { ## Ensure that the training set contains all classes classes <- attr(factor(y), "levels") nClass <- length(classes) if (nClass > nTrain) { stop(paste("Number of classes (", nClass, ") must be less than the", " size of the training set (", nTrain, ").", sep = "")) } new.idx <- as.vector(sapply(classes, function(a,x){ min((1:length(x))[x==a]) }, y)) all.idx <- 1:length(y) new.idx <- c(new.idx, all.idx[!(all.idx %in% new.idx)]) y <- y[new.idx] x <- x[new.idx, ] w <- w[new.idx] if (!is.null(offset)) { offset <- offset[new.idx] } ## Get the factors y <- as.numeric(as.vector(outer(y, classes, "=="))) ## Fill out the weight and offset w <- rep(w, nClass) if (!is.null(offset)) { offset <- rep(offset, nClass) } } # close if (dist... == "multinomial" if(distribution$name == "pairwise") { distribution.metric <- distribution[["metric"]] if (!is.null(distribution.metric)) { distribution.metric <- tolower(distribution.metric) supported.metrics <- c("conc", "ndcg", "map", "mrr") if (!is.element(distribution.metric, supported.metrics)) { stop("Metric '", distribution.metric, "' is not supported, use either 'conc', 'ndcg', 'map', or 'mrr'") } metric <- distribution.metric } else { warning("No metric specified, using 'ndcg'") metric <- "ndcg" # default distribution[["metric"]] <- metric } if (any(y<0)) { stop("targets for 'pairwise' should be non-negative") } if (is.element(metric, c("mrr", "map")) && (!all(is.element(y, 0:1)))) { stop("Metrics 'map' and 'mrr' require the response to be in {0,1}") } # Cut-off rank for metrics # Default of 0 means no cutoff max.rank <- 0 if (!is.null(distribution[["max.rank"]]) && distribution[["max.rank"]] > 0) { if (is.element(metric, c("ndcg", "mrr"))) { max.rank <- distribution[["max.rank"]] } else { stop("Parameter 'max.rank' cannot be specified for metric '", distribution.metric, "', only supported for 'ndcg' and 'mrr'") } } # We pass the cut-off rank to the C function as the last element in the Misc vector Misc <- c(group, max.rank) distribution.call.name <- sprintf("pairwise_%s", metric) } # close if (dist... == "pairwise" # create index upfront... subtract one for 0 based order x.order <- apply(x[1:nTrain,,drop=FALSE],2,order,na.last=FALSE)-1 x <- as.vector(data.matrix(x)) predF <- rep(0,length(y)) train.error <- rep(0,n.trees) valid.error <- rep(0,n.trees) oobag.improve <- rep(0,n.trees) if(is.null(var.monotone)) { var.monotone <- rep(0,cCols) } else if(length(var.monotone)!=cCols) { stop("Length of var.monotone != number of predictors") } else if(!all(is.element(var.monotone,-1:1))) { stop("var.monotone must be -1, 0, or 1") } fError <- FALSE gbm.obj <- .Call("gbm_fit", Y=as.double(y), Offset=as.double(offset), X=as.double(x), X.order=as.integer(x.order), weights=as.double(w), Misc=as.double(Misc), cRows=as.integer(cRows), cCols=as.integer(cCols), var.type=as.integer(var.type), var.monotone=as.integer(var.monotone), distribution=as.character(distribution.call.name), n.trees=as.integer(n.trees), interaction.depth=as.integer(interaction.depth), n.minobsinnode=as.integer(n.minobsinnode), n.classes = as.integer(nClass), shrinkage=as.double(shrinkage), bag.fraction=as.double(bag.fraction), nTrain=as.integer(nTrain), fit.old=as.double(NA), n.cat.splits.old=as.integer(0), n.trees.old=as.integer(0), verbose=as.integer(verbose), PACKAGE = "gbm") names(gbm.obj) <- c("initF","fit","train.error","valid.error", "oobag.improve","trees","c.splits") gbm.obj$bag.fraction <- bag.fraction gbm.obj$distribution <- distribution gbm.obj$interaction.depth <- interaction.depth gbm.obj$n.minobsinnode <- n.minobsinnode gbm.obj$num.classes <- nClass gbm.obj$n.trees <- length(gbm.obj$trees) / nClass gbm.obj$nTrain <- nTrain gbm.obj$train.fraction <- train.fraction gbm.obj$response.name <- response.name gbm.obj$shrinkage <- shrinkage gbm.obj$var.levels <- var.levels gbm.obj$var.monotone <- var.monotone gbm.obj$var.names <- var.names gbm.obj$var.type <- var.type gbm.obj$verbose <- verbose gbm.obj$Terms <- NULL if(distribution$name == "coxph") { gbm.obj$fit[i.timeorder] <- gbm.obj$fit } ## If K-Classification is used then split the fit and tree components if (distribution$name == "multinomial") { gbm.obj$fit <- matrix(gbm.obj$fit, ncol = nClass) dimnames(gbm.obj$fit)[[2]] <- classes gbm.obj$classes <- classes ## Also get the class estimators exp.f <- exp(gbm.obj$fit) denom <- matrix(rep(rowSums(exp.f), nClass), ncol = nClass) gbm.obj$estimator <- exp.f/denom } if(keep.data) { if(distribution$name == "coxph") { # Put the observations back in order gbm.obj$data <- list( y = y, x = x, x.order = x.order, offset = offset, Misc = Misc, w = w, i.timeorder = i.timeorder ) } else if ( distribution$name == "multinomial" ) { # Restore original order of the data new.idx <- order(new.idx) gbm.obj$data <- list( y = as.vector(matrix(y, ncol = length(classes), byrow = FALSE)[new.idx, ]), x = as.vector(matrix(x, ncol = length(var.names), byrow = FALSE)[new.idx, ]), x.order = x.order, offset = offset[new.idx], Misc = Misc, w = w[new.idx] ) } else { gbm.obj$data <- list( y = y, x = x, x.order = x.order, offset = offset, Misc = Misc, w = w ) } } else { gbm.obj$data <- NULL } # Reuturn object of class "gbm" class(gbm.obj) <- "gbm" gbm.obj } gbm/R/basehaz.gbm.R0000644000176200001440000000535713346511223013536 0ustar liggesusers# rd2rox <- function(path = file.choose()) { # info <- Rd2roxygen::parse_file(path) # cat(Rd2roxygen::create_roxygen(info), sep = "\n") # } #' Baseline hazard function #' #' Computes the Breslow estimator of the baseline hazard function for a #' proportional hazard regression model. #' #' The proportional hazard model assumes h(t|x)=lambda(t)*exp(f(x)). #' \code{\link{gbm}} can estimate the f(x) component via partial likelihood. #' After estimating f(x), \code{basehaz.gbm} can compute the a nonparametric #' estimate of lambda(t). #' #' @param t The survival times. #' @param delta The censoring indicator. #' @param f.x The predicted values of the regression model on the log hazard #' scale. #' @param t.eval Values at which the baseline hazard will be evaluated. #' @param smooth If \code{TRUE} \code{basehaz.gbm} will smooth the estimated #' baseline hazard using Friedman's super smoother \code{\link{supsmu}}. #' @param cumulative If \code{TRUE} the cumulative survival function will be #' computed. #' @return A vector of length equal to the length of t (or of length #' \code{t.eval} if \code{t.eval} is not \code{NULL}) containing the baseline #' hazard evaluated at t (or at \code{t.eval} if \code{t.eval} is not #' \code{NULL}). If \code{cumulative} is set to \code{TRUE} then the returned #' vector evaluates the cumulative hazard function at those values. #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' @seealso \code{\link[survival]{survfit}}, \code{\link{gbm}} #' @references #' N. Breslow (1972). "Discussion of `Regression Models and #' Life-Tables' by D.R. Cox," Journal of the Royal Statistical Society, Series #' B, 34(2):216-217. #' #' N. Breslow (1974). "Covariance analysis of censored survival data," #' Biometrics 30:89-99. #' @keywords methods survival #' @export basehaz.gbm <- function(t,delta, f.x, t.eval = NULL, smooth = FALSE, cumulative = TRUE) { t.unique <- sort(unique(t[delta==1])) alpha <- length(t.unique) for(i in 1:length(t.unique)) { alpha[i] <- sum(t[delta==1]==t.unique[i])/ sum(exp(f.x[t>=t.unique[i]])) } if(!smooth && !cumulative) { if(!is.null(t.eval)) { stop("Cannot evaluate unsmoothed baseline hazard at t.eval.") } } else { if(smooth && !cumulative) { lambda.smooth <- supsmu(t.unique,alpha) } else { if(smooth && cumulative) { lambda.smooth <- supsmu(t.unique, cumsum(alpha)) } else { # (!smooth && cumulative) - THE DEFAULT lambda.smooth <- list(x = t.unique, y = cumsum(alpha)) } } } obj <- if(!is.null(t.eval)) { approx(lambda.smooth$x, lambda.smooth$y, xout = t.eval)$y } else { approx(lambda.smooth$x, lambda.smooth$y, xout = t)$y } return(obj) } gbm/R/gbm.object.R0000644000176200001440000000460313346511223013360 0ustar liggesusers#' Generalized Boosted Regression Model Object #' #' These are objects representing fitted \code{gbm}s. #' #' @return \item{initF}{the "intercept" term, the initial predicted value to #' which trees make adjustments} \item{fit}{a vector containing the fitted #' values on the scale of regression function (e.g. log-odds scale for #' bernoulli, log scale for poisson)} \item{train.error}{a vector of length #' equal to the number of fitted trees containing the value of the loss #' function for each boosting iteration evaluated on the training data} #' \item{valid.error}{a vector of length equal to the number of fitted trees #' containing the value of the loss function for each boosting iteration #' evaluated on the validation data} \item{cv.error}{if \code{cv.folds}<2 this #' component is NULL. Otherwise, this component is a vector of length equal to #' the number of fitted trees containing a cross-validated estimate of the loss #' function for each boosting iteration} \item{oobag.improve}{a vector of #' length equal to the number of fitted trees containing an out-of-bag estimate #' of the marginal reduction in the expected value of the loss function. The #' out-of-bag estimate uses only the training data and is useful for estimating #' the optimal number of boosting iterations. See \code{\link{gbm.perf}}} #' \item{trees}{a list containing the tree structures. The components are best #' viewed using \code{\link{pretty.gbm.tree}}} \item{c.splits}{a list of all #' the categorical splits in the collection of trees. If the \code{trees[[i]]} #' component of a \code{gbm} object describes a categorical split then the #' splitting value will refer to a component of \code{c.splits}. That component #' of \code{c.splits} will be a vector of length equal to the number of levels #' in the categorical split variable. -1 indicates left, +1 indicates right, #' and 0 indicates that the level was not present in the training data} #' \item{cv.fitted}{If cross-validation was performed, the cross-validation #' predicted values on the scale of the linear predictor. That is, the fitted #' values from the ith CV-fold, for the model having been trained on the data #' in all other folds.} #' @section Structure: The following components must be included in a #' legitimate \code{gbm} object. #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' @seealso \code{\link{gbm}} #' @keywords methods #' @name gbm.object NULL gbm/R/reconstructGBMdata.R0000644000176200001440000000335413346511223015103 0ustar liggesusers#' Reconstruct a GBM's Source Data #' #' Helper function to reconstitute the data for plots and summaries. This #' function is not intended for the user to call directly. #' #' #' @param x a \code{\link{gbm.object}} initially fit using \code{\link{gbm}} #' @return Returns a data used to fit the gbm in a format that can subsequently #' be used for plots and summaries #' @author Harry Southworth #' @seealso \code{\link{gbm}}, \code{\link{gbm.object}} #' @keywords manip #' @export reconstructGBMdata <- function(x) { if(class(x) != "gbm") { stop( "This function is for use only with objects having class 'gbm'" ) } else if (is.null(x$data)) { stop("Cannot reconstruct data from gbm object. gbm() was called with keep.data=FALSE") } else if (x$distribution$name=="multinomial") { y <- matrix(x$data$y, ncol=x$num.classes, byrow=FALSE) yn <- apply(y, 1, function(z,nc) {(1:nc)[z == 1]}, nc = x$num.classes) y <- factor(yn, labels=x$classes) xdat <- matrix(x$data$x, ncol=ncol(x$data$x.order), byrow=FALSE) d <- data.frame(y, xdat) names(d) <- c(x$response.name, x$var.names) } else if (x$distribution$name == "coxph") { xdat <- matrix(x$data$x, ncol=ncol(x$data$x.order), byrow=FALSE) status <- x$data$Misc y <- x$data$y[order(x$data$i.timeorder)] d <- data.frame(y, status, xdat) names(d) <- c(x$response.name[-1], colnames(x$data$x.order)) } else { y <- x$data$y xdat <- matrix(x$data$x, ncol=ncol(x$data$x.order), byrow=FALSE) d <- data.frame(y, xdat) rn <- ifelse(length(x$response.name) > 1, x$response.name[2], x$response.name) names(d) <- c(rn, colnames(x$data$x.order)) } invisible(d) } gbm/R/relative.influence.R0000644000176200001440000001446013413011135015122 0ustar liggesusers#' Methods for estimating relative influence #' #' Helper functions for computing the relative influence of each variable in #' the gbm object. #' #' @details #' This is not intended for end-user use. These functions offer the different #' methods for computing the relative influence in \code{\link{summary.gbm}}. #' \code{gbm.loss} is a helper function for \code{permutation.test.gbm}. #' #' @aliases relative.influence permutation.test.gbm gbm.loss #' #' @param object a \code{gbm} object created from an initial call to #' \code{\link{gbm}}. #' #' @param n.trees the number of trees to use for computations. If not provided, #' the the function will guess: if a test set was used in fitting, the number #' of trees resulting in lowest test set error will be used; otherwise, if #' cross-validation was performed, the number of trees resulting in lowest #' cross-validation error will be used; otherwise, all trees will be used. #' #' @param scale. whether or not the result should be scaled. Defaults to #' \code{FALSE}. #' #' @param sort. whether or not the results should be (reverse) sorted. #' Defaults to \code{FALSE}. #' #' @param y,f,w,offset,dist,baseline For \code{gbm.loss}: These components are #' the outcome, predicted value, observation weight, offset, distribution, and #' comparison loss function, respectively. #' #' @param group,max.rank Used internally when \code{distribution = #' \'pairwise\'}. #' #' @return By default, returns an unprocessed vector of estimated relative #' influences. If the \code{scale.} and \code{sort.} arguments are used, #' returns a processed version of the same. #' #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' #' @seealso \code{\link{summary.gbm}} #' #' @references J.H. Friedman (2001). "Greedy Function Approximation: A Gradient #' Boosting Machine," Annals of Statistics 29(5):1189-1232. #' #' L. Breiman (2001). #' \url{https://www.stat.berkeley.edu/users/breiman/randomforest2001.pdf}. #' #' @keywords hplot #' #' @rdname relative.influence #' #' @export relative.influence <- function(object, n.trees, scale. = FALSE, sort. = FALSE ) { if( missing( n.trees ) ){ if ( object$train.fraction < 1 ){ n.trees <- gbm.perf( object, method="test", plot.it=FALSE ) } else if ( !is.null( object$cv.error ) ){ n.trees <- gbm.perf( object, method="cv", plot.it = FALSE ) } else{ # If dist=multinomial, object$n.trees = n.trees * num.classes # so use the following instead. n.trees <- length( object$train.error ) } cat( paste( "n.trees not given. Using", n.trees, "trees.\n" ) ) } if (object$distribution == "multinomial") { n.trees <- n.trees * object$num.classes } get.rel.inf <- function(obj) { lapply(split(obj[[6]],obj[[1]]),sum) # 6 - Improvement, 1 - var name } temp <- unlist(lapply(object$trees[1:n.trees],get.rel.inf)) rel.inf.compact <- unlist(lapply(split(temp,names(temp)),sum)) rel.inf.compact <- rel.inf.compact[names(rel.inf.compact)!="-1"] # rel.inf.compact excludes those variable that never entered the model # insert 0's for the excluded variables rel.inf <- rep(0,length(object$var.names)) i <- as.numeric(names(rel.inf.compact))+1 rel.inf[i] <- rel.inf.compact names(rel.inf) <- object$var.names if (scale.){ rel.inf <- rel.inf / max(rel.inf) } if (sort.){ rel.inf <- rev(sort(rel.inf)) } return(rel.inf=rel.inf) } #' @rdname relative.influence #' @export permutation.test.gbm <- function(object, n.trees) { # get variables used in the model i.vars <- sort(unique(unlist(lapply(object$trees[1:n.trees], function(x){unique(x[[1]])})))) i.vars <- i.vars[i.vars!=-1] + 1 rel.inf <- rep(0,length(object$var.names)) if(!is.null(object$data)) { y <- object$data$y os <- object$data$offset Misc <- object$data$Misc w <- object$data$w x <- matrix(object$data$x, ncol=length(object$var.names)) object$Terms <- NULL # this makes predict.gbm take x as it is if (object$distribution$name == "pairwise") { # group and cutoff are only relevant for distribution "pairwise" # in this case, the last element specifies the max rank # max rank = 0 means no cut off group <- Misc[1:length(y)] max.rank <- Misc[length(y)+1] } } else { stop("Model was fit with keep.data=FALSE. permutation.test.gbm has not been implemented for that case.") } # the index shuffler j <- sample(1:nrow(x)) for(i in 1:length(i.vars)) { x[ ,i.vars[i]] <- x[j,i.vars[i]] new.pred <- predict.gbm(object,newdata=x,n.trees=n.trees) rel.inf[i.vars[i]] <- gbm.loss(y,new.pred,w,os, object$distribution, object$train.error[n.trees], group, max.rank) x[j,i.vars[i]] <- x[ ,i.vars[i]] } return(rel.inf=rel.inf) } #' @rdname relative.influence #' @export gbm.loss <- function(y, f, w, offset, dist, baseline, group=NULL, max.rank=NULL) { if (!is.na(offset)) { f <- offset+f } if (dist$name != "pairwise") { switch(dist$name, gaussian = weighted.mean((y - f)^2,w) - baseline, bernoulli = -2*weighted.mean(y*f - log(1+exp(f)),w) - baseline, laplace = weighted.mean(abs(y-f),w) - baseline, adaboost = weighted.mean(exp(-(2*y-1)*f),w) - baseline, poisson = -2*weighted.mean(y*f-exp(f),w) - baseline, stop(paste("Distribution",dist$name,"is not yet supported for method=permutation.test.gbm"))) } else # dist$name == "pairwise" { if (is.null(dist$metric)) { stop("No metric specified for distribution 'pairwise'") } if (!is.element(dist$metric, c("conc", "ndcg", "map", "mrr"))) { stop("Invalid metric '", dist$metric, "' specified for distribution 'pairwise'") } if (is.null(group)) { stop("For distribution 'pairwise', parameter 'group' has to be supplied") } # Loss = 1 - utility (1 - perf.pairwise(y, f, group, dist$metric, w, max.rank)) - baseline } } gbm/R/gbm.more.R0000644000176200001440000003421013346511223013051 0ustar liggesusers#' Generalized Boosted Regression Modeling (GBM) #' #' Adds additional trees to a \code{\link{gbm.object}} object. #' #' @param object A \code{\link{gbm.object}} object created from an initial call #' to \code{\link{gbm}}. #' #' @param n.new.trees Integer specifying the number of additional trees to add #' to \code{object}. Default is 100. #' #' @param data An optional data frame containing the variables in the model. By #' default the variables are taken from \code{environment(formula)}, typically #' the environment from which \code{gbm} is called. If \code{keep.data=TRUE} in #' the initial call to \code{gbm} then \code{gbm} stores a copy with the #' object. If \code{keep.data=FALSE} then subsequent calls to #' \code{\link{gbm.more}} must resupply the same dataset. It becomes the user's #' responsibility to resupply the same data at this point. #' #' @param weights An optional vector of weights to be used in the fitting #' process. Must be positive but do not need to be normalized. If #' \code{keep.data=FALSE} in the initial call to \code{gbm} then it is the #' user's responsibility to resupply the weights to \code{\link{gbm.more}}. #' #' @param offset A vector of offset values. #' #' @param verbose Logical indicating whether or not to print out progress and #' performance indicators (\code{TRUE}). If this option is left unspecified for #' \code{gbm.more}, then it uses \code{verbose} from \code{object}. Default is #' \code{FALSE}. #' #' @return A \code{\link{gbm.object}} object. #' #' @export #' #' @examples #' # #' # A least squares regression example #' # #' #' # Simulate data #' set.seed(101) # for reproducibility #' N <- 1000 #' X1 <- runif(N) #' X2 <- 2 * runif(N) #' X3 <- ordered(sample(letters[1:4], N, replace = TRUE), levels = letters[4:1]) #' X4 <- factor(sample(letters[1:6], N, replace = TRUE)) #' X5 <- factor(sample(letters[1:3], N, replace = TRUE)) #' X6 <- 3 * runif(N) #' mu <- c(-1, 0, 1, 2)[as.numeric(X3)] #' SNR <- 10 # signal-to-noise ratio #' Y <- X1 ^ 1.5 + 2 * (X2 ^ 0.5) + mu #' sigma <- sqrt(var(Y) / SNR) #' Y <- Y + rnorm(N, 0, sigma) #' X1[sample(1:N,size=500)] <- NA # introduce some missing values #' X4[sample(1:N,size=300)] <- NA # introduce some missing values #' data <- data.frame(Y, X1, X2, X3, X4, X5, X6) #' #' # Fit a GBM #' set.seed(102) # for reproducibility #' gbm1 <- gbm(Y ~ ., data = data, var.monotone = c(0, 0, 0, 0, 0, 0), #' distribution = "gaussian", n.trees = 100, shrinkage = 0.1, #' interaction.depth = 3, bag.fraction = 0.5, train.fraction = 0.5, #' n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, #' verbose = FALSE, n.cores = 1) #' #' # Check performance using the out-of-bag (OOB) error; the OOB error typically #' # underestimates the optimal number of iterations #' best.iter <- gbm.perf(gbm1, method = "OOB") #' print(best.iter) #' #' # Check performance using the 50% heldout test set #' best.iter <- gbm.perf(gbm1, method = "test") #' print(best.iter) #' #' # Check performance using 5-fold cross-validation #' best.iter <- gbm.perf(gbm1, method = "cv") #' print(best.iter) #' #' # Plot relative influence of each variable #' par(mfrow = c(1, 2)) #' summary(gbm1, n.trees = 1) # using first tree #' summary(gbm1, n.trees = best.iter) # using estimated best number of trees #' #' # Compactly print the first and last trees for curiosity #' print(pretty.gbm.tree(gbm1, i.tree = 1)) #' print(pretty.gbm.tree(gbm1, i.tree = gbm1$n.trees)) #' #' # Simulate new data #' set.seed(103) # for reproducibility #' N <- 1000 #' X1 <- runif(N) #' X2 <- 2 * runif(N) #' X3 <- ordered(sample(letters[1:4], N, replace = TRUE)) #' X4 <- factor(sample(letters[1:6], N, replace = TRUE)) #' X5 <- factor(sample(letters[1:3], N, replace = TRUE)) #' X6 <- 3 * runif(N) #' mu <- c(-1, 0, 1, 2)[as.numeric(X3)] #' Y <- X1 ^ 1.5 + 2 * (X2 ^ 0.5) + mu + rnorm(N, 0, sigma) #' data2 <- data.frame(Y, X1, X2, X3, X4, X5, X6) #' #' # Predict on the new data using the "best" number of trees; by default, #' # predictions will be on the link scale #' Yhat <- predict(gbm1, newdata = data2, n.trees = best.iter, type = "link") #' #' # least squares error #' print(sum((data2$Y - Yhat)^2)) #' #' # Construct univariate partial dependence plots #' p1 <- plot(gbm1, i.var = 1, n.trees = best.iter) #' p2 <- plot(gbm1, i.var = 2, n.trees = best.iter) #' p3 <- plot(gbm1, i.var = "X3", n.trees = best.iter) # can use index or name #' grid.arrange(p1, p2, p3, ncol = 3) #' #' # Construct bivariate partial dependence plots #' plot(gbm1, i.var = 1:2, n.trees = best.iter) #' plot(gbm1, i.var = c("X2", "X3"), n.trees = best.iter) #' plot(gbm1, i.var = 3:4, n.trees = best.iter) #' #' # Construct trivariate partial dependence plots #' plot(gbm1, i.var = c(1, 2, 6), n.trees = best.iter, #' continuous.resolution = 20) #' plot(gbm1, i.var = 1:3, n.trees = best.iter) #' plot(gbm1, i.var = 2:4, n.trees = best.iter) #' plot(gbm1, i.var = 3:5, n.trees = best.iter) #' #' # Add more (i.e., 100) boosting iterations to the ensemble #' gbm2 <- gbm.more(gbm1, n.new.trees = 100, verbose = FALSE) gbm.more <- function(object, n.new.trees = 100, data = NULL, weights = NULL, offset = NULL, verbose = NULL) { theCall <- match.call() nTrain <- object$nTrain if (object$distribution$name != "pairwise") { distribution.call.name <- object$distribution$name } else { distribution.call.name <- sprintf("pairwise_%s", object$distribution$metric) } if(is.null(object$Terms) && is.null(object$data)) { stop("The gbm model was fit using gbm.fit (rather than gbm) and keep.data was set to FALSE. gbm.more cannot locate the dataset.") } else if(is.null(object$data) && is.null(data)) { stop("keep.data was set to FALSE on original gbm call and argument 'data' is NULL") } else if(is.null(object$data)) { m <- eval(object$m, parent.frame()) Terms <- attr(m, "terms") a <- attributes(Terms) y <- as.vector(model.extract(m, "response")) offset <- model.extract(m,offset) x <- model.frame(delete.response(Terms), data, na.action=na.pass) w <- weights if(length(w)==0) w <- rep(1, nrow(x)) if (object$distribution$name != "pairwise") { w <- w*length(w)/sum(w) # normalize to N } if(is.null(offset) || (offset==0)) { offset <- NA } Misc <- NA if(object$distribution$name == "coxph") { Misc <- as.numeric(y)[-(1:cRows)] y <- as.numeric(y)[1:cRows] # reverse sort the failure times to compute risk sets on the fly i.train <- order(-y[1:nTrain]) i.test <- order(-y[(nTrain+1):cRows]) + nTrain i.timeorder <- c(i.train,i.test) y <- y[i.timeorder] Misc <- Misc[i.timeorder] x <- x[i.timeorder,,drop=FALSE] w <- w[i.timeorder] if(!is.na(offset)) offset <- offset[i.timeorder] object$fit <- object$fit[i.timeorder] } else if(object$distribution$name == "tdist" ){ Misc <- object$distribution$df } else if (object$distribution$name == "pairwise"){ # Check if group names are valid distribution.group <- object$distribution$group i <- match(distribution.group, colnames(data)) if (any(is.na(i))) { stop("Group column does not occur in data: ", distribution.group[is.na(i)]) } # construct group index group <- factor(do.call(paste, c(data[,distribution.group, drop=FALSE], sep=":"))) # Check that weights are constant across groups if ((!missing(weights)) && (!is.null(weights))) { w.min <- tapply(w, INDEX=group, FUN=min) w.max <- tapply(w, INDEX=group, FUN=max) if (any(w.min != w.max)) { stop("For distribution 'pairwise', all instances for the same group must have the same weight") } # Normalize across groups w <- w * length(w.min) / sum(w.min) } # Shuffle groups, to remove bias when splitting into train/test set and/or CV folds perm.levels <- levels(group)[sample(1:nlevels(group))] group <- factor(group, levels=perm.levels) # The C function expects instances to be sorted by group and descending by target ord.group <- object$ord.group group <- group[ord.group] y <- y[ord.group] x <- x[ord.group,,drop=FALSE] w <- x[ord.group] object$fit <- object$fit[ord.group] # object$fit is stored in the original order # Split into train and validation set, at group boundary num.groups.train <- max(1, round(object$train.fraction * nlevels(group))) # include all groups up to the num.groups.train nTrain <- max(which(group==levels(group)[num.groups.train])) metric <- object$distribution[["metric"]] if (is.element(metric, c("mrr", "map")) && (!all(is.element(y, 0:1)))) { stop("Metrics 'map' and 'mrr' require the response to be in {0,1}") } # Cut-off rank for metrics # We pass this argument as the last element in the Misc vector # Default of 0 means no cutoff max.rank <- 0 if (!is.null(object$distribution[["max.rank"]]) && object$distribution[["max.rank"]] > 0) { if (is.element(metric, c("ndcg", "mrr"))) { max.rank <- object$distribution[["max.rank"]] } else { stop("Parameter 'max.rank' cannot be specified for metric '", metric, "', only supported for 'ndcg' and 'mrr'") } } Misc <- c(group, max.rank) } # create index upfront... subtract one for 0 based order x.order <- apply(x[1:nTrain,,drop=FALSE],2,order,na.last=FALSE)-1 x <- data.matrix(x) cRows <- nrow(x) cCols <- ncol(x) } else { y <- object$data$y x <- object$data$x x.order <- object$data$x.order offset <- object$data$offset Misc <- object$data$Misc w <- object$data$w nTrain <- object$nTrain cRows <- length(y) cCols <- length(x)/cRows if(object$distribution$name == "coxph") { i.timeorder <- object$data$i.timeorder object$fit <- object$fit[i.timeorder] } if (object$distribution$name == "pairwise") { object$fit <- object$fit[object$ord.group] # object$fit is stored in the original order } } if(is.null(verbose)) { verbose <- object$verbose } x <- as.vector(x) gbm.obj <- .Call("gbm_fit", Y = as.double(y), Offset = as.double(offset), X = as.double(x), X.order = as.integer(x.order), weights = as.double(w), Misc = as.double(Misc), cRows = as.integer(cRows), cCols = as.integer(cCols), var.type = as.integer(object$var.type), var.monotone = as.integer(object$var.monotone), distribution = as.character(distribution.call.name), n.trees = as.integer(n.new.trees), interaction.depth = as.integer(object$interaction.depth), n.minobsinnode = as.integer(object$n.minobsinnode), n.classes = as.integer(object$num.classes), shrinkage = as.double(object$shrinkage), bag.fraction = as.double(object$bag.fraction), train.fraction = as.integer(nTrain), fit.old = as.double(object$fit), n.cat.splits.old = as.integer(length(object$c.splits)), n.trees.old = as.integer(object$n.trees), verbose = as.integer(verbose), PACKAGE = "gbm") names(gbm.obj) <- c("initF","fit","train.error","valid.error", "oobag.improve","trees","c.splits") gbm.obj$initF <- object$initF gbm.obj$train.error <- c(object$train.error, gbm.obj$train.error) gbm.obj$valid.error <- c(object$valid.error, gbm.obj$valid.error) gbm.obj$oobag.improve <- c(object$oobag.improve, gbm.obj$oobag.improve) gbm.obj$trees <- c(object$trees, gbm.obj$trees) gbm.obj$c.splits <- c(object$c.splits, gbm.obj$c.splits) # cv.error not updated when using gbm.more gbm.obj$cv.error <- object$cv.error gbm.obj$cv.folds <- object$cv.folds gbm.obj$n.trees <- length(gbm.obj$trees) gbm.obj$distribution <- object$distribution gbm.obj$train.fraction <- object$train.fraction gbm.obj$shrinkage <- object$shrinkage gbm.obj$bag.fraction <- object$bag.fraction gbm.obj$var.type <- object$var.type gbm.obj$var.monotone <- object$var.monotone gbm.obj$var.names <- object$var.names gbm.obj$interaction.depth <- object$interaction.depth gbm.obj$n.minobsinnode <- object$n.minobsinnode gbm.obj$num.classes <- object$num.classes gbm.obj$nTrain <- object$nTrain gbm.obj$response.name <- object$response.name gbm.obj$Terms <- object$Terms gbm.obj$var.levels <- object$var.levels gbm.obj$verbose <- verbose if(object$distribution$name == "coxph") { gbm.obj$fit[i.timeorder] <- gbm.obj$fit } if (object$distribution$name == "pairwise") { # Data has been reordered according to queries. # We need to permute the fitted values to correspond # to the original order. gbm.obj$fit <- gbm.obj$fit[order(object$ord.group)] object$fit <- object$fit[order(object$ord.group)] gbm.obj$ord.group <- object$ord.group } if(!is.null(object$data)) { gbm.obj$data <- object$data } else { gbm.obj$data <- NULL } gbm.obj$m <- object$m gbm.obj$call <- theCall class(gbm.obj) <- "gbm" return(gbm.obj) } gbm/R/shrink.gbm.pred.R0000644000176200001440000000501313346511223014335 0ustar liggesusers#' Predictions from a shrunked GBM #' #' Makes predictions from a shrunken GBM model. #' #' @param object a \code{\link{gbm.object}} #' @param newdata dataset for predictions #' @param n.trees the number of trees to use #' @param lambda a vector with length equal to the number of variables #' containing the shrinkage parameter for each variable #' @param \dots other parameters (ignored) #' @return A vector with length equal to the number of observations in newdata #' containing the predictions #' @section Warning: This function is experimental #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' @seealso \code{\link{shrink.gbm}}, \code{\link{gbm}} #' @keywords methods #' @export shrink.gbm.pred <- function(object,newdata,n.trees, lambda=rep(1,length(object$var.names)), ...) { if(length(lambda) != length(object$var.names)) { stop("lambda must have the same length as the number of variables in the gbm object.") } if(!is.null(object$Terms)) { x <- model.frame(delete.response(object$Terms), newdata, na.action=na.pass) } else { x <- newdata } cRows <- nrow(x) cCols <- ncol(x) for(i in 1:cCols) { if(is.factor(x[,i])) { j <- match(levels(x[,i]), object$var.levels[[i]]) if(any(is.na(j))) { stop(paste("New levels for variable ", object$var.names[i],": ", levels(x[,i])[is.na(j)],sep="")) } x[,i] <- as.numeric(x[,i])-1 } } x <- as.vector(unlist(x)) if(missing(n.trees) || any(n.trees > object$n.trees)) { n.trees <- n.trees[n.trees<=object$n.trees] if(length(n.trees)==0) n.trees <- object$n.trees warning("n.trees not specified or some values exceeded number fit so far. Using ",n.trees,".") } # sort n.trees so that predictions are easier to generate and store n.trees <- sort(n.trees) predF <- .Call("gbm_shrink_pred", X=as.double(x), cRows=as.integer(cRows), cCols=as.integer(cCols), n.trees=as.integer(n.trees), initF=object$initF, trees=object$trees, c.split=object$c.split, var.type=as.integer(object$var.type), depth=as.integer(object$interaction.depth), lambda=as.double(lambda), PACKAGE = "gbm") return(predF) } gbm/R/gbm-internals.R0000644000176200001440000001075213346511223014112 0ustar liggesusers#' gbm internal functions #' #' Helper functions for preprocessing data prior to building a \code{"gbm"} #' object. #' #' @param y The response variable. #' @param d,distribution The distribution, either specified by the user or #' implied. #' @param class.stratify.cv Whether or not to stratify, if provided by the user. #' @param i.train Computed internally by \code{gbm}. #' @param group The group, if using \code{distibution = "pairwise"}. #' @param strat Whether or not to stratify. #' @param cv.folds The number of cross-validation folds. #' @param x The design matrix. #' @param id The interaction depth. #' @param w The weights. #' @param n The number of cores to use in the cluster. #' @param o The offset. #' #' @details #' These are functions used internally by \code{gbm} and not intended for direct #' use by the user. #' #' @aliases guessDist getStratify getCVgroup checkMissing checkID checkWeights #' checkOffset getVarNames gbmCluster #' #' @rdname gbm-internals #' @export guessDist <- function(y){ # If distribution is not given, try to guess it if (length(unique(y)) == 2){ d <- "bernoulli" } else if (class(y) == "Surv" ){ d <- "coxph" } else if (is.factor(y)){ d <- "multinomial" } else{ d <- "gaussian" } cat(paste("Distribution not specified, assuming", d, "...\n")) list(name=d) } #' @rdname gbm-internals #' @export getCVgroup <- function(distribution, class.stratify.cv, y, i.train, cv.folds, group) { # Construct cross-validation groups depending on the type of model to be fit if (distribution$name %in% c( "bernoulli", "multinomial" ) & class.stratify.cv ){ nc <- table(y[i.train]) # Number in each class uc <- names(nc) if (min(nc) < cv.folds){ stop( paste("The smallest class has only", min(nc), "objects in the training set. Can't do", cv.folds, "fold cross-validation.")) } cv.group <- vector(length = length(i.train)) for (i in 1:length(uc)){ cv.group[y[i.train] == uc[i]] <- sample(rep(1:cv.folds , length = nc[i])) } } # Close if else if (distribution$name == "pairwise") { # Split into CV folds at group boundaries s <- sample(rep(1:cv.folds, length=nlevels(group))) cv.group <- s[as.integer(group[i.train])] } else { cv.group <- sample(rep(1:cv.folds, length=length(i.train))) } cv.group } #' @rdname gbm-internals #' @export getStratify <- function(strat, d){ if (is.null(strat)){ if (d$name == "multinomial" ){ strat <- TRUE } else { strat <- FALSE } } else { if (!is.element(d$name, c( "bernoulli", "multinomial"))){ warning("You can only use class.stratify.cv when distribution is bernoulli or multinomial. Ignored.") strat <- FALSE } } # Close else strat } #' @rdname gbm-internals #' @export checkMissing <- function(x, y){ nms <- getVarNames(x) #### Check for NaNs in x and NAs in response j <- apply(x, 2, function(z) any(is.nan(z))) if(any(j)) { stop("Use NA for missing values. NaN found in predictor variables:", paste(nms[j],collapse=",")) } if(any(is.na(y))) stop("Missing values are not allowed in the response") invisible(NULL) } #' @rdname gbm-internals #' @export checkWeights <- function(w, n){ # Logical checks on weights if(length(w)==0) { w <- rep(1, n) } else if(any(w < 0)) stop("negative weights not allowed") w } #' @rdname gbm-internals #' @export checkID <- function(id){ # Check for disallowed interaction.depth if(id < 1) { stop("interaction.depth must be at least 1.") } else if(id > 49) { stop("interaction.depth must be less than 50. You should also ask yourself why you want such large interaction terms. A value between 1 and 5 should be sufficient for most applications.") } invisible(id) } #' @rdname gbm-internals #' @export checkOffset <- function(o, y){ # Check offset if(is.null(o) | all(o==0)) { o <- NA } else if(length(o) != length(y)) { stop("The length of offset does not equal the length of y.") } o } #' @rdname gbm-internals #' @export getVarNames <- function(x){ if(is.matrix(x)) { var.names <- colnames(x) } else if(is.data.frame(x)) { var.names <- names(x) } else { var.names <- paste("X",1:ncol(x),sep="") } var.names } #' @rdname gbm-internals #' @export gbmCluster <- function(n){ # If number of cores (n) not given, try to work it out from the number # that appear to be available and the number of CV folds. if (is.null(n)){ n <- parallel::detectCores() } parallel::makeCluster(n) } gbm/R/ir.measures.R0000644000176200001440000001472013346511223013604 0ustar liggesusers# Functions to compute IR measures for pairwise loss for # a single group # Notes: # * Inputs are passed as a 2-elemen (y,f) list, to # facilitate the 'by' iteration # * Return the respective metric, or a negative value if # it is undefined for the given group # * For simplicity, we have no special handling for ties; # instead, we break ties randomly. This is slightly # inaccurate for individual groups, but should have # a small effect on the overall measure. #' Compute Information Retrieval measures. #' #' Functions to compute Information Retrieval measures for pairwise loss for a #' single group. The function returns the respective metric, or a negative #' value if it is undefined for the given group. #' #' @param obs Observed value. #' @param pred Predicted value. #' @param metric What type of performance measure to compute. #' @param y,y.f,f,w,group,max.rank Used internally. #' @param x ?. #' @return The requested performance measure. #' #' @details #' For simplicity, we have no special handling for ties; instead, we break ties #' randomly. This is slightly inaccurate for individual groups, but should have #' only a small effect on the overall measure. #' #' \code{gbm.conc} computes the concordance index: Fraction of all pairs (i,j) #' with i Define data, use random, #' ##-- or do help(data=index) for the standard data sets. # Area under ROC curve = ratio of correctly ranking pairs #' @rdname gbm.roc.area #' @export gbm.roc.area <- function(obs, pred) { n1 <- sum(obs) n <- length(obs) if (n==n1) { return(1) } # Fraction of concordant pairs # = sum_{pos}(rank-1) / #pairs with different labels # #pairs = n1 * (n-n1) return ((mean(rank(pred)[obs > 0]) - (n1 + 1)/2)/(n - n1)) } # Concordance Index: # Fraction of all pairs (i,j) with i0) if (length(f) <= 1 || num.pos == 0 || num.pos == length(f)) { return (-1.0) } else { return (gbm.roc.area(obs=y, pred=f)) } } #' @rdname gbm.roc.area #' @export ir.measure.mrr <- function(y.f, max.rank) { y <- y.f[[1]] f <- y.f[[2]] num.pos <- sum(y>0) if (length(f) <= 1 || num.pos == 0 || num.pos == length(f)) { return (-1.0) } ord <- order(f, decreasing=TRUE) min.idx.pos <- min(which(y[ord]>0)) if (min.idx.pos <= max.rank) { return (1.0 / min.idx.pos) } else { return (0.0) } } #' @rdname gbm.roc.area #' @export ir.measure.map <- function(y.f, max.rank=0) { # Note: max.rank is meaningless for MAP y <- y.f[[1]] f <- y.f[[2]] ord <- order(f, decreasing=TRUE) idx.pos <- which(y[ord]>0) num.pos <- length(idx.pos) if (length(f) <= 1 || num.pos == 0 || num.pos == length(f)) { return (-1.0) } # Above and including the rank of the i-th positive result, # there are exactly i positives and rank(i) total results return (sum((1:length(idx.pos))/idx.pos) / num.pos) } #' @rdname gbm.roc.area #' @export ir.measure.ndcg <- function(y.f, max.rank) { y <- y.f[[1]] f <- y.f[[2]] if (length(f) <= 1 || all(diff(y)==0)) { return (-1.0) } num.items <- min(length(f), max.rank) ord <- order(f, decreasing=TRUE) dcg <- sum(y[ord][1:num.items] / log2(2:(num.items+1))) # The best possible DCG: order by target ord.max <- order(y, decreasing=TRUE) dcg.max <- sum(y[ord.max][1:num.items] / log2(2:(num.items+1))) # Normalize return (dcg / dcg.max) } #' @rdname gbm.roc.area #' @export perf.pairwise <- function(y, f, group, metric="ndcg", w=NULL, max.rank=0) { func.name <- switch(metric, conc = "ir.measure.conc", mrr = "ir.measure.mrr", map = "ir.measure.map", ndcg = "ir.measure.ndcg", stop(paste("Metric",metric,"is not supported")) ) # Optimization: for binary targets, # AUC is equivalent but faster than CONC if (metric == "conc" && all(is.element(y, 0:1))) { func.name <- "ir.measure.auc" } # Max rank = 0 means no cut off if (max.rank <= 0) { max.rank <- length(y)+1 } # Random tie breaking in case of duplicate scores. # (Without tie breaking, we would overestimate if instances are # sorted descending on target) f <- f + 1E-10 * runif(length(f), min=-0.5, max=0.5) measure.by.group <- as.matrix(by(list(y, f), INDICES=group, FUN=get(func.name), max.rank=max.rank)) # Exclude groups with single result or only negative or positive instances idx <- which((!is.null(measure.by.group)) & measure.by.group >= 0) if (is.null(w)) { return (mean(measure.by.group[idx])) } else { # Assumption: weights are constant per group w.by.group <- tapply(w, group, mean) return (weighted.mean(measure.by.group[idx], w=w.by.group[idx])) } } gbm/R/gbmCrossVal.R0000644000176200001440000001727313346511223013577 0ustar liggesusers#' Cross-validate a gbm #' #' Functions for cross-validating gbm. These functions are used internally and #' are not intended for end-user direct usage. #' #' These functions are not intended for end-user direct usage, but are used #' internally by \code{gbm}. #' #' @aliases gbmCrossVal gbmCrossValModelBuild gbmDoFold gbmCrossValErr #' gbmCrossValPredictions #' @param cv.folds The number of cross-validation folds. #' @param nTrain The number of training samples. #' @param n.cores The number of cores to use. #' @param class.stratify.cv Whether or not stratified cross-validation samples #' are used. #' @param data The data. #' @param x The model matrix. #' @param y The response variable. #' @param offset The offset. #' @param distribution The type of loss function. See \code{\link{gbm}}. #' @param w Observation weights. #' @param var.monotone See \code{\link{gbm}}. #' @param n.trees The number of trees to fit. #' @param interaction.depth The degree of allowed interactions. See #' \code{\link{gbm}}. #' @param n.minobsinnode See \code{\link{gbm}}. #' @param shrinkage See \code{\link{gbm}}. #' @param bag.fraction See \code{\link{gbm}}. #' @param var.names See \code{\link{gbm}}. #' @param response.name See \code{\link{gbm}}. #' @param group Used when \code{distribution = "pairwise"}. See #' \code{\link{gbm}}. #' @param i.train Items in the training set. #' @param cv.models A list containing the models for each fold. #' @param cv.group A vector indicating the cross-validation fold for each #' member of the training set. #' @param best.iter.cv The iteration with lowest cross-validation error. #' @param X Index (cross-validation fold) on which to subset. #' @param s Random seed. #' @return A list containing the cross-validation error and predictions. #' @author Greg Ridgeway \email{gregridgeway@@gmail.com} #' @seealso \code{\link{gbm}} #' @references J.H. Friedman (2001). "Greedy Function Approximation: A Gradient #' Boosting Machine," Annals of Statistics 29(5):1189-1232. #' #' L. Breiman (2001). #' \url{https://www.stat.berkeley.edu/users/breiman/randomforest2001.pdf}. #' @keywords models # Perform gbm cross-validation # # This function has far too many arguments, but there isn't the # abstraction in gbm to lose them. #' @rdname gbmCrossVal #' @export gbmCrossVal <- function(cv.folds, nTrain, n.cores, class.stratify.cv, data, x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth, n.minobsinnode, shrinkage, bag.fraction, var.names, response.name, group) { i.train <- 1:nTrain cv.group <- getCVgroup(distribution, class.stratify.cv, y, i.train, cv.folds, group) ## build the models cv.models <- gbmCrossValModelBuild(cv.folds, cv.group, n.cores, i.train, x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth, n.minobsinnode, shrinkage, bag.fraction, var.names, response.name, group) ## get the errors cv.error <- gbmCrossValErr(cv.models, cv.folds, cv.group, nTrain, n.trees) best.iter.cv <- which.min(cv.error) ## get the predictions predictions <- gbmCrossValPredictions(cv.models, cv.folds, cv.group, best.iter.cv, distribution, data[i.train, ], y) list(error = cv.error, predictions = predictions) } # Get the gbm cross-validation error #' @rdname gbmCrossVal #' @export gbmCrossValErr <- function(cv.models, cv.folds, cv.group, nTrain, n.trees) { in.group <- tabulate(cv.group, nbins=cv.folds) cv.error <- vapply(1:cv.folds, function(index) { model <- cv.models[[index]] model$valid.error * in.group[[index]] }, double(n.trees)) ## this is now a (n.trees, cv.folds) matrix ## and now a n.trees vector rowSums(cv.error) / nTrain } #' @rdname gbmCrossVal #' @export gbmCrossValPredictions <- function(cv.models, cv.folds, cv.group, best.iter.cv, distribution, data, y) { # Get the predictions for GBM cross validation. This function is not as nice # as it could be (i.e., leakage of y) # Test that cv.group and data match if (nrow(data) != length(cv.group)) { stop("Mismatch between `data` and `cv.group`.") } # This is a little complicated due to multinomial distribution num.cols <- if (distribution$name == "multinomial") { nlevels(factor(y)) } else { 1 } # Initialize results matrix res <- matrix(nrow = nrow(data), ncol = num.cols) # There's no real reason to do this as other than a for loop data.names <- names(data) # column names for (ind in 1:cv.folds) { # These are the particular elements flag <- cv.group == ind model <- cv.models[[ind]] # The %in% here is to handle coxph my.data <- data[flag, !(data.names %in% model$response.name)] predictions <- predict(model, newdata = my.data, n.trees = best.iter.cv) # FIXME predictions <- matrix(predictions, ncol = num.cols) res[flag, ] <- predictions } # Handle multinomial case if (distribution$name != "multinomial") { res <- as.numeric(res) } # Return the result res } # Perform gbm cross-validation # # This function has far too many arguments. #' @rdname gbmCrossVal #' @export gbmCrossValModelBuild <- function(cv.folds, cv.group, n.cores, i.train, x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth, n.minobsinnode, shrinkage, bag.fraction, var.names, response.name, group) { # Set up cluster and add finalizer cluster <- gbmCluster(n.cores) on.exit(parallel::stopCluster(cluster)) # Set random seeds seeds <- as.integer(runif(cv.folds, -(2^31 - 1), 2^31)) # Perform cross-validation model builds parallel::parLapply(cl = cluster, X = 1:cv.folds, fun = gbmDoFold, i.train, x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth, n.minobsinnode, shrinkage, bag.fraction, cv.group, var.names, response.name, group, seeds) } #' @rdname gbmCrossVal #' @export gbmDoFold <- function(X, i.train, x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth, n.minobsinnode, shrinkage, bag.fraction, cv.group, var.names, response.name, group, s) { # Do specified cross-validation fold - a self-contained function for passing # to individual cores. # Load required packages for core library(gbm, quietly=TRUE) # Print CV information cat("CV:", X, "\n") # Setup set.seed(s[[X]]) i <- order(cv.group == X) x <- x[i.train,,drop=TRUE][i,,drop=FALSE] y <- y[i.train][i] offset <- offset[i.train][i] nTrain <- length(which(cv.group != X)) group <- group[i.train][i] # Fit a GBM res <- gbm.fit(x = x, y = y, offset = offset, distribution = distribution, w = w, var.monotone = var.monotone, n.trees = n.trees, interaction.depth = interaction.depth, n.minobsinnode = n.minobsinnode, shrinkage = shrinkage, bag.fraction = bag.fraction, nTrain = nTrain, keep.data = FALSE, verbose = FALSE, response.name = response.name, group = group) # Return the result res } gbm/R/zzz.R0000644000176200001440000000022413346511223012176 0ustar liggesusers#' @keywords internal .onAttach <- function(lib, pkg) { vers <- utils::packageVersion("gbm") packageStartupMessage(paste("Loaded gbm", vers)) } gbm/vignettes/0000755000176200001440000000000013417115400013023 5ustar liggesusersgbm/vignettes/oobperf2.pdf0000644000176200001440000002317113346511223015244 0ustar liggesusers%PDF-1.3 %쏢 6 0 obj <> stream xZKof n VMMMct^I`O03iKQg{&6M)Ýp?[5v,ḻrvj3vtR.?onp2?[yw!3ḹOl5]Mݺp|zp 8'^Rm~z;\);i.L*[:*nѤ|m`H-kkpgbϧ[d&PrUk눍\FX+*6 {'R2FKHdAli쥝%mXSb{^Kb = ƺ\]fki1ZvYYi|aNf_]WEKkS?E$4l[' z`NϤ^=bvRVBYmC\ nmX+6ZN& ƚXs6DC^hEՐShk]䴎,ド_xmoƫ>BnLԳ05YiSyypVU ll2c8^< Z`73 |6$:a%x[Gi**zQ;Z:bʲ*,֖=U{9gz菼*`e4ҋ23F&YV)[(\e&VoBj`s=4IL]㊸,u:x05Ci2!{l% e6hUI5):%]4-(xi5xsp1nt7.IϰZBW*] TْiO| t3FH\jb\@.C`W,g0ֆ vaV' ©ͫn 4l"&踤Y]EPxPQ&.Sd\Z4a'pv40.ʖ |ʴi"%`&6L^]>B' lN\.Fco%:W,۳xd~{A6 !{ |-}l4mPFEP REE6$U+C d]EU:f"evМIs *ﰋ(DѶ.+t9&l3 = ]|n7A.0)o|"T j͙dQm >idrUɅ[n̖Ud.lxN: \aeEL,̴f|f0?gcՅ^R]ޘXRG+`meV l/X}㥙`Ukya&[&"33?rx3Xe?=7fƙr0!^B^3@]釳81pe֍{]W:?lJ/UFO:L=Czᡥ&9uib#?TqL],~H6+<=ϧ˦ |vld'S ?g aG;kN$Ahe<$X=4ފnxXK1> /Contents 6 0 R >> endobj 3 0 obj << /Type /Pages /Kids [ 5 0 R ] /Count 1 >> endobj 1 0 obj <> endobj 4 0 obj <> endobj 14 0 obj <> endobj 15 0 obj <> endobj 12 0 obj <> endobj 11 0 obj <>stream x]V TqVwY=A60:Ȏq[D ↚ {T "aP02䛨?I 4?sTwW}{o2@D"n V^JU2-!&Z?9A ##O_HJq吒{M/(HW薔&j33gΔ/](On2d21m܍VbqTytl2V V)%2,vvdp^*NL+ʸtUtLR5.15)M?%.5>-!=(C^8u k8{)GS6ʟR(;ʓ)/*)*J-|)Wj:5r8ʌ2QF1%L(2ԗxRVʀ) 9`jpqx@e\/Iܕufe&.jA)# LQ3a.$su"#m1 w g:};]#d ZHؒ*LnN~6__}fi\x-6 mvh0ik/wG#6Z˦K\wFQq{iZoHtUc}Xk,\Z+ 1:ڢvovS mͥBӍsYDzE7#mLM+:Sx8ɼAy&tI9/nҰF">&ҘCa.jxX'li8Iq4qMˎ_S$;rw Z~4nȴ$C,LjD:Tl*Ǻ{!eEEyߝ51kٶ qv` 8|D;`g;^hC+LI^DYBP<W1 yw{28xcLJ vllh5΄}$Ep "b5'ꜰ\p"KsgbL1O)&fUϑNWrҊ7M pœ"=6^Nu!R b[.& ހdo^f~*yҬlGȱZtIdJڰ2%-AdUb}jO({(`eOWa;zՒ4eBd }jIizRteH\w 'wbHupGtLU}y VZ?N⾵,jyH+矯IlYH~2cy(e`<.o&x,) 1| mtZ+'ag-8l-,ap.@b@uShH{TS"ahXH t)}u}sg8_+sO2)W~zB2rxDhI)MGs(\"7֥S y(Q2|l]:XwcGPB% sK}vA/d(Fmuck;J\rOn.[m{mvg@Ni 4> endobj 8 0 obj <>stream xcd`ab`dduM- M f!Cﵿf2`e̻GOqE% A:CKKKJKjqfzQ_Wb TSYQYP_ᬩ`d`` $JsSu222K*RsSrSRAfU*due&+=o^X) L,|C泭zr2σ< }nx endstream endobj 17 0 obj 298 endobj 10 0 obj <> endobj 13 0 obj <> endobj 2 0 obj <>endobj xref 0 18 0000000000 65535 f 0000002821 00000 n 0000009361 00000 n 0000002762 00000 n 0000002869 00000 n 0000002602 00000 n 0000000015 00000 n 0000002582 00000 n 0000006633 00000 n 0000006395 00000 n 0000007036 00000 n 0000003360 00000 n 0000003011 00000 n 0000008197 00000 n 0000002938 00000 n 0000002968 00000 n 0000006374 00000 n 0000007016 00000 n trailer << /Size 18 /Root 1 0 R /Info 2 0 R >> startxref 9411 %%EOF gbm/vignettes/srcltx.sty0000644000176200001440000001170413346511223015112 0ustar liggesusers%% %% This is file `srcltx.sty', %% generated with the docstrip utility. %% %% The original source files were: %% %% srcltx.dtx (with options: `package,latex') %% %% This package is in the public domain. It comes with no guarantees %% and no reserved rights. You can use or modify this package at your %% own risk. %% Originally written by: Aleksander Simonic %% Current maintainer: Stefan Ulrich %% \NeedsTeXFormat{LaTeX2e} \ProvidesPackage{srcltx}[2006/11/12 v1.6 Source specials for inverse search in DVI files] \newif\ifSRCOK \SRCOKtrue \newif\ifsrc@debug@ \newif\ifsrc@dviwin@ \newif\ifsrc@winedt@\src@winedt@true \newif\ifsrc@everypar@\src@everypar@true \newif\ifsrc@everymath@\src@everymath@true \RequirePackage{ifthen} \DeclareOption{active}{\SRCOKtrue} \DeclareOption{inactive}{\SRCOKfalse} \DeclareOption{nowinedt}{\src@winedt@false} \DeclareOption{debug}{\src@debug@true} \DeclareOption{nopar}{\global\src@everypar@false} \DeclareOption{nomath}{\global\src@everymath@false} \newcommand*\src@maybe@space{} \let\src@maybe@space\space \DeclareOption{dviwin}{\let\src@maybe@space\relax} \ExecuteOptions{active} \ProcessOptions \newcount\src@lastline \global\src@lastline=-1 \newcommand*\src@debug{} \def\src@debug#1{\ifsrc@debug@\typeout{DBG: |#1|}\fi} \newcommand*\MainFile{} \def\MainFile{\jobname.tex} \newcommand*\CurrentInput{} \gdef\CurrentInput{\MainFile} \newcommand*\WinEdt{} \def\WinEdt#1{\ifsrc@winedt@\typeout{:#1}\fi} \newcommand\src@AfterFi{} \def\src@AfterFi#1\fi{\fi#1} \AtBeginDocument{% \@ifpackageloaded{soul}{% \let\src@SOUL@\SOUL@ \def\SOUL@#1{% \ifSRCOK \SRCOKfalse\src@SOUL@{#1}\SRCOKtrue \else \src@AfterFi\src@SOUL@{#1}% \fi }% }{}% } \newcommand*\srcIncludeHook[1]{\protected@xdef\CurrentInput{#1.tex}} \newcommand*\srcInputHook[1]{% \src@getfilename@with@ext{#1}% } \newcommand*\src@spec{} \def\src@spec{% \ifSRCOK \ifnum\inputlineno>\src@lastline \global\src@lastline=\inputlineno \src@debug{% src:\the\inputlineno\src@maybe@space\CurrentInput}% \special{src:\the\inputlineno\src@maybe@space\CurrentInput}% \fi \fi } \newcommand\src@before@file@hook{} \newcommand\src@after@file@hook{} \def\src@before@file@hook{% \WinEdt{<+ \CurrentInput}% \global\src@lastline=0 \ifSRCOK\special{src:1\src@maybe@space\CurrentInput}\fi } \def\src@after@file@hook#1{% \WinEdt{<-}% \global\src@lastline=\inputlineno \global\advance\src@lastline by -1% \gdef\CurrentInput{#1}% \src@spec } \newcommand*\src@fname{}% \newcommand*\src@tempa{}% \newcommand*\src@extensions@path{}% \newcommand*\src@getfilename@with@ext{}% \def\src@extensions@path#1.#2\end{% \ifthenelse{\equal{#2}{}}{% \protected@edef\src@extensions@last{#1}% \let\src@tempa\relax }{% \def\src@tempa{\src@extensions@path#2\end}% }% \src@tempa } \def\src@getfilename@with@ext#1{% \expandafter\src@extensions@path#1.\end \ifthenelse{\equal{\src@extensions@last}{tex}}{% \protected@xdef\CurrentInput{#1}% }{% \protected@xdef\CurrentInput{#1.tex}% }% \PackageInfo{srcltx}{Expanded filename `#1' to `\CurrentInput'}% } \newcommand*\src@include{} \newcommand*\src@@include{} \let\src@include\include \def\include#1{% \src@spec \clearpage \expandafter\src@@include\expandafter{\CurrentInput}{#1}% }% \def\src@@include#1#2{% \srcIncludeHook{#2}% \src@before@file@hook \src@include{#2}% \src@after@file@hook{#1}% } \newcommand*\src@input{} \newcommand*\src@@input{} \newcommand*\src@@@input{} \let\src@input\input \def\input{\src@spec\@ifnextchar\bgroup\src@@input\@@input}% \def\src@@input#1{% \expandafter\src@@@input\expandafter{\CurrentInput}{#1}% } \def\src@@@input#1#2{% \srcInputHook{#2}% \src@before@file@hook \src@input{#2}% \src@after@file@hook{#1}% } \newcommand\Input{} \let\Input\input \ifsrc@everypar@ \newcommand*\src@old@everypar{} \let\src@old@everypar\everypar \newtoks\src@new@everypar \let\everypar\src@new@everypar \everypar\expandafter{\the\src@old@everypar} \src@old@everypar{\the\src@new@everypar\src@spec} \fi \ifsrc@everymath@ \def\@tempa#1\the\everymath#2\delimiter{{#1\src@spec\the\everymath#2}} \frozen@everymath=\expandafter\@tempa\the\frozen@everymath\delimiter \fi \newcommand*\src@bibliography{} \newcommand*\src@@bibliography{} \let\src@bibliography\bibliography \def\bibliography#1{% \expandafter\src@@bibliography\expandafter{\CurrentInput}{#1}% } \def\src@@bibliography#1#2{% \protected@xdef\CurrentInput{\jobname.bbl}% \src@before@file@hook \src@bibliography{#2}% \src@after@file@hook{#1}% } \newcommand*\src@old@output{} \let\src@old@output\output \newtoks\src@new@output \let\output\src@new@output \output\expandafter{\the\src@old@output} \src@old@output{\SRCOKfalse\the\src@new@output} \endinput %% %% End of file `srcltx.sty'. gbm/vignettes/gbm.Rnw0000644000176200001440000007264113346511223014276 0ustar liggesusers\documentclass{article} \bibliographystyle{plain} \newcommand{\EV}{\mathrm{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\aRule}{\begin{center} \rule{5in}{1mm} \end{center}} \title{Generalized Boosted Models:\\A guide to the gbm package} \author{Greg Ridgeway} %\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{Generalized Boosted Models: A guide to the gbm package} \newcommand{\mathgbf}[1]{{\mbox{\boldmath$#1$\unboldmath}}} \begin{document} \maketitle Boosting takes on various forms with different programs using different loss functions, different base models, and different optimization schemes. The gbm package takes the approach described in \cite{Friedman:2001} and \cite{Friedman:2002}. Some of the terminology differs, mostly due to an effort to cast boosting terms into more standard statistical terminology (e.g. deviance). In addition, the gbm package implements boosting for models commonly used in statistics but not commonly associated with boosting. The Cox proportional hazard model, for example, is an incredibly useful model and the boosting framework applies quite readily with only slight modification \cite{Ridgeway:1999}. Also some algorithms implemented in the gbm package differ from the standard implementation. The AdaBoost algorithm \cite{FreundSchapire:1997} has a particular loss function and a particular optimization algorithm associated with it. The gbm implementation of AdaBoost adopts AdaBoost's exponential loss function (its bound on misclassification rate) but uses Friedman's gradient descent algorithm rather than the original one proposed. So the main purposes of this document is to spell out in detail what the gbm package implements. \section{Gradient boosting} This section essentially presents the derivation of boosting described in \cite{Friedman:2001}. The gbm package also adopts the stochastic gradient boosting strategy, a small but important tweak on the basic algorithm, described in \cite{Friedman:2002}. \subsection{Friedman's gradient boosting machine} \label{sec:GradientBoostingMachine} \begin{figure} \aRule Initialize $\hat f(\mathbf{x})$ to be a constant, $\hat f(\mathbf{x}) = \arg \min_{\rho} \sum_{i=1}^N \Psi(y_i,\rho)$. \\ For $t$ in $1,\ldots,T$ do \begin{enumerate} \item Compute the negative gradient as the working response \begin{equation} z_i = -\frac{\partial}{\partial f(\mathbf{x}_i)} \Psi(y_i,f(\mathbf{x}_i)) \mbox{\Huge $|$}_{f(\mathbf{x}_i)=\hat f(\mathbf{x}_i)} \end{equation} \item Fit a regression model, $g(\mathbf{x})$, predicting $z_i$ from the covariates $\mathbf{x}_i$. \item Choose a gradient descent step size as \begin{equation} \rho = \arg \min_{\rho} \sum_{i=1}^N \Psi(y_i,\hat f(\mathbf{x}_i)+\rho g(\mathbf{x}_i)) \end{equation} \item Update the estimate of $f(\mathbf{x})$ as \begin{equation} \hat f(\mathbf{x}) \leftarrow \hat f(\mathbf{x}) + \rho g(\mathbf{x}) \end{equation} \end{enumerate} \aRule \caption{Friedman's Gradient Boost algorithm} \label{fig:GradientBoost} \end{figure} Friedman (2001) and the companion paper Friedman (2002) extended the work of Friedman, Hastie, and Tibshirani (2000) and laid the ground work for a new generation of boosting algorithms. Using the connection between boosting and optimization, this new work proposes the Gradient Boosting Machine. In any function estimation problem we wish to find a regression function, $\hat f(\mathbf{x})$, that minimizes the expectation of some loss function, $\Psi(y,f)$, as shown in (\ref{NonparametricRegression1}). \begin{eqnarray} \hspace{0.5in} \hat f(\mathbf{x}) &=& \arg \min_{f(\mathbf{x})} \EV_{y,\mathbf{x}} \Psi(y,f(\mathbf{x})) \nonumber \\ \label{NonparametricRegression1} &=& \arg \min_{f(\mathbf{x})} \EV_x \left[ \EV_{y|\mathbf{x}} \Psi(y,f(\mathbf{x})) \Big| \mathbf{x} \right] \end{eqnarray} We will focus on finding estimates of $f(\mathbf{x})$ such that \begin{equation} \label{NonparametricRegression2} \hspace{0.5in} \hat f(\mathbf{x}) = \arg \min_{f(\mathbf{x})} \EV_{y|\mathbf{x}} \left[ \Psi(y,f(\mathbf{x}))|\mathbf{x} \right] \end{equation} Parametric regression models assume that $f(\mathbf{x})$ is a function with a finite number of parameters, $\beta$, and estimates them by selecting those values that minimize a loss function (e.g. squared error loss) over a training sample of $N$ observations on $(y,\mathbf{x})$ pairs as in (\ref{eq:Friedman1}). \begin{equation} \label{eq:Friedman1} \hspace{0.5in} \hat\beta = \arg \min_{\beta} \sum_{i=1}^N \Psi(y_i,f(\mathbf{x}_i;\beta)) \end{equation} When we wish to estimate $f(\mathbf{x})$ non-parametrically the task becomes more difficult. Again we can proceed similarly to \cite{FHT:2000} and modify our current estimate of $f(\mathbf{x})$ by adding a new function $f(\mathbf{x})$ in a greedy fashion. Letting $f_i = f(\mathbf{x}_i)$, we see that we want to decrease the $N$ dimensional function \begin{eqnarray} \label{EQ:Friedman2} \hspace{0.5in} J(\mathbf{f}) &=& \sum_{i=1}^N \Psi(y_i,f(\mathbf{x}_i)) \nonumber \\ &=& \sum_{i=1}^N \Psi(y_i,F_i). \end{eqnarray} The negative gradient of $J(\mathbf{f})$ indicates the direction of the locally greatest decrease in $J(\mathbf{f})$. Gradient descent would then have us modify $\mathbf{f}$ as \begin{equation} \label{eq:Friedman3} \hspace{0.5in} \hat \mathbf{f} \leftarrow \hat \mathbf{f} - \rho \nabla J(\mathbf{f}) \end{equation} where $\rho$ is the size of the step along the direction of greatest descent. Clearly, this step alone is far from our desired goal. First, it only fits $f$ at values of $\mathbf{x}$ for which we have observations. Second, it does not take into account that observations with similar $\mathbf{x}$ are likely to have similar values of $f(\mathbf{x})$. Both these problems would have disastrous effects on generalization error. However, Friedman suggests selecting a class of functions that use the covariate information to approximate the gradient, usually a regression tree. This line of reasoning produces his Gradient Boosting algorithm shown in Figure~\ref{fig:GradientBoost}. At each iteration the algorithm determines the direction, the gradient, in which it needs to improve the fit to the data and selects a particular model from the allowable class of functions that is in most agreement with the direction. In the case of squared-error loss, $\Psi(y_i,f(\mathbf{x}_i)) = \sum_{i=1}^N (y_i-f(\mathbf{x}_i))^2$, this algorithm corresponds exactly to residual fitting. There are various ways to extend and improve upon the basic framework suggested in Figure~\ref{fig:GradientBoost}. For example, Friedman (2001) substituted several choices in for $\Psi$ to develop new boosting algorithms for robust regression with least absolute deviation and Huber loss functions. Friedman (2002) showed that a simple subsampling trick can greatly improve predictive performance while simultaneously reduce computation time. Section~\ref{GBMModifications} discusses some of these modifications. \section{Improving boosting methods using control of the learning rate, sub-sampling, and a decomposition for interpretation} \label{GBMModifications} This section explores the variations of the previous algorithms that have the potential to improve their predictive performance and interpretability. In particular, by controlling the optimization speed or learning rate, introducing low-variance regression methods, and applying ideas from robust regression we can produce non-parametric regression procedures with many desirable properties. As a by-product some of these modifications lead directly into implementations for learning from massive datasets. All these methods take advantage of the general form of boosting \begin{equation} \hat f(\mathbf{x}) \leftarrow \hat f(\mathbf{x}) + \EV(z(y,\hat f(\mathbf{x}))|\mathbf{x}). \end{equation} So far we have taken advantage of this form only by substituting in our favorite regression procedure for $\EV_w(z|\mathbf{x})$. I will discuss some modifications to estimating $\EV_w(z|\mathbf{x})$ that have the potential to improve our algorithm. \subsection{Decreasing the learning rate} As several authors have phrased slightly differently, ``...boosting, whatever flavor, seldom seems to overfit, no matter how many terms are included in the additive expansion''. This is not true as the discussion to \cite{FHT:2000} points out. In the update step of any boosting algorithm we can introduce a learning rate to dampen the proposed move. \begin{equation} \label{eq:shrinkage} \hat f(\mathbf{x}) \leftarrow \hat f(\mathbf{x}) + \lambda \EV(z(y,\hat f(\mathbf{x}))|\mathbf{x}). \end{equation} By multiplying the gradient step by $\lambda$ as in equation~\ref{eq:shrinkage} we have control on the rate at which the boosting algorithm descends the error surface (or ascends the likelihood surface). When $\lambda=1$ we return to performing full gradient steps. Friedman (2001) relates the learning rate to regularization through shrinkage. The optimal number of iterations, $T$, and the learning rate, $\lambda$, depend on each other. In practice I set $\lambda$ to be as small as possible and then select $T$ by cross-validation. Performance is best when $\lambda$ is as small as possible performance with decreasing marginal utility for smaller and smaller $\lambda$. Slower learning rates do not necessarily scale the number of optimal iterations. That is, if when $\lambda=1.0$ and the optimal $T$ is 100 iterations, does {\it not} necessarily imply that when $\lambda=0.1$ the optimal $T$ is 1000 iterations. \subsection{Variance reduction using subsampling} Friedman (2002) proposed the stochastic gradient boosting algorithm that simply samples uniformly without replacement from the dataset before estimating the next gradient step. He found that this additional step greatly improved performance. We estimate the regression $\EV(z(y,\hat f(\mathbf{x}))|\mathbf{x})$ using a random subsample of the dataset. \subsection{ANOVA decomposition} Certain function approximation methods are decomposable in terms of a ``functional ANOVA decomposition''. That is a function is decomposable as \begin{equation} \label{ANOVAdecomp} f(\mathbf{x}) = \sum_j f_j(x_j) + \sum_{jk} f_{jk}(x_j,x_k) + \sum_{jk\ell} f_{jk\ell}(x_j,x_k,x_\ell) + \cdots. \end{equation} This applies to boosted trees. Regression stumps (one split decision trees) depend on only one variable and fall into the first term of \ref{ANOVAdecomp}. Trees with two splits fall into the second term of \ref{ANOVAdecomp} and so on. By restricting the depth of the trees produced on each boosting iteration we can control the order of approximation. Often additive components are sufficient to approximate a multivariate function well, generalized additive models, the na\"{\i}ve Bayes classifier, and boosted stumps for example. When the approximation is restricted to a first order we can also produce plots of $x_j$ versus $f_j(x_j)$ to demonstrate how changes in $x_j$ might affect changes in the response variable. \subsection{Relative influence} Friedman (2001) also develops an extension of a variable's ``relative influence'' for boosted estimates. For tree based methods the approximate relative influence of a variable $x_j$ is \begin{equation} \label{RelInfluence} \hspace{0.5in} \hat J_j^2 = \hspace{-0.1in}\sum_{\mathrm{splits~on~}x_j}\hspace{-0.2in}I_t^2 \end{equation} where $I_t^2$ is the empirical improvement by splitting on $x_j$ at that point. Friedman's extension to boosted models is to average the relative influence of variable $x_j$ across all the trees generated by the boosting algorithm. \begin{figure} \aRule Select \begin{itemize} \item a loss function (\texttt{distribution}) \item the number of iterations, $T$ (\texttt{n.trees}) \item the depth of each tree, $K$ (\texttt{interaction.depth}) \item the shrinkage (or learning rate) parameter, $\lambda$ (\texttt{shrinkage}) \item the subsampling rate, $p$ (\texttt{bag.fraction}) \end{itemize} Initialize $\hat f(\mathbf{x})$ to be a constant, $\hat f(\mathbf{x}) = \arg \min_{\rho} \sum_{i=1}^N \Psi(y_i,\rho)$ \\ For $t$ in $1,\ldots,T$ do \begin{enumerate} \item Compute the negative gradient as the working response \begin{equation} z_i = -\frac{\partial}{\partial f(\mathbf{x}_i)} \Psi(y_i,f(\mathbf{x}_i)) \mbox{\Huge $|$}_{f(\mathbf{x}_i)=\hat f(\mathbf{x}_i)} \end{equation} \item Randomly select $p\times N$ cases from the dataset \item Fit a regression tree with $K$ terminal nodes, $g(\mathbf{x})=\EV(z|\mathbf{x})$. This tree is fit using only those randomly selected observations \item Compute the optimal terminal node predictions, $\rho_1,\ldots,\rho_K$, as \begin{equation} \rho_k = \arg \min_{\rho} \sum_{\mathbf{x}_i\in S_k} \Psi(y_i,\hat f(\mathbf{x}_i)+\rho) \end{equation} where $S_k$ is the set of $\mathbf{x}$s that define terminal node $k$. Again this step uses only the randomly selected observations. \item Update $\hat f(\mathbf{x})$ as \begin{equation} \hat f(\mathbf{x}) \leftarrow \hat f(\mathbf{x}) + \lambda\rho_{k(\mathbf{x})} \end{equation} where $k(\mathbf{x})$ indicates the index of the terminal node into which an observation with features $\mathbf{x}$ would fall. \end{enumerate} \aRule \caption{Boosting as implemented in \texttt{gbm()}} \label{fig:gbm} \end{figure} \section{Common user options} This section discusses the options to gbm that most users will need to change or tune. \subsection{Loss function} The first and foremost choice is \texttt{distribution}. This should be easily dictated by the application. For most classification problems either \texttt{bernoulli} or \texttt{adaboost} will be appropriate, the former being recommended. For continuous outcomes the choices are \texttt{gaussian} (for minimizing squared error), \texttt{laplace} (for minimizing absolute error), and quantile regression (for estimating percentiles of the conditional distribution of the outcome). Censored survival outcomes should require \texttt{coxph}. Count outcomes may use \texttt{poisson} although one might also consider \texttt{gaussian} or \texttt{laplace} depending on the analytical goals. \subsection{The relationship between shrinkage and number of iterations} The issues that most new users of gbm struggle with are the choice of \texttt{n.trees} and \texttt{shrinkage}. It is important to know that smaller values of \texttt{shrinkage} (almost) always give improved predictive performance. That is, setting \texttt{shrinkage=0.001} will almost certainly result in a model with better out-of-sample predictive performance than setting \texttt{shrinkage=0.01}. However, there are computational costs, both storage and CPU time, associated with setting \texttt{shrinkage} to be low. The model with \texttt{shrinkage=0.001} will likely require ten times as many iterations as the model with \texttt{shrinkage=0.01}, increasing storage and computation time by a factor of 10. Figure~\ref{fig:shrinkViters} shows the relationship between predictive performance, the number of iterations, and the shrinkage parameter. Note that the increase in the optimal number of iterations between two choices for shrinkage is roughly equal to the ratio of the shrinkage parameters. It is generally the case that for small shrinkage parameters, 0.001 for example, there is a fairly long plateau in which predictive performance is at its best. My rule of thumb is to set \texttt{shrinkage} as small as possible while still being able to fit the model in a reasonable amount of time and storage. I usually aim for 3,000 to 10,000 iterations with shrinkage rates between 0.01 and 0.001. \begin{figure}[ht] \begin{center} \includegraphics[width=5in]{shrinkage-v-iterations} \end{center} \caption{Out-of-sample predictive performance by number of iterations and shrinkage. Smaller values of the shrinkage parameter offer improved predictive performance, but with decreasing marginal improvement.} \label{fig:shrinkViters} \end{figure} \subsection{Estimating the optimal number of iterations} gbm offers three methods for estimating the optimal number of iterations after the gbm model has been fit, an independent test set (\texttt{test}), out-of-bag estimation (\texttt{OOB}), and $v$-fold cross validation (\texttt{cv}). The function \texttt{gbm.perf} computes the iteration estimate. Like Friedman's MART software, the independent test set method uses a single holdout test set to select the optimal number of iterations. If \texttt{train.fraction} is set to be less than 1, then only the \textit{first} \texttt{train.fraction}$\times$\texttt{nrow(data)} will be used to fit the model. Note that if the data are sorted in a systematic way (such as cases for which $y=1$ come first), then the data should be shuffled before running gbm. Those observations not used in the model fit can be used to get an unbiased estimate of the optimal number of iterations. The downside of this method is that a considerable number of observations are used to estimate the single regularization parameter (number of iterations) leaving a reduced dataset for estimating the entire multivariate model structure. Use \texttt{gbm.perf(...,method="test")} to obtain an estimate of the optimal number of iterations using the held out test set. If \texttt{bag.fraction} is set to be greater than 0 (0.5 is recommended), gbm computes an out-of-bag estimate of the improvement in predictive performance. It evaluates the reduction in deviance on those observations not used in selecting the next regression tree. The out-of-bag estimator underestimates the reduction in deviance. As a result, it almost always is too conservative in its selection for the optimal number of iterations. The motivation behind this method was to avoid having to set aside a large independent dataset, which reduces the information available for learning the model structure. Use \texttt{gbm.perf(...,method="OOB")} to obtain the OOB estimate. Lastly, gbm offers $v$-fold cross validation for estimating the optimal number of iterations. If when fitting the gbm model, \texttt{cv.folds=5} then gbm will do 5-fold cross validation. gbm will fit five gbm models in order to compute the cross validation error estimate and then will fit a sixth and final gbm model with \texttt{n.trees}iterations using all of the data. The returned model object will have a component labeled \texttt{cv.error}. Note that \texttt{gbm.more} will do additional gbm iterations but will not add to the \texttt{cv.error} component. Use \texttt{gbm.perf(...,method="cv")} to obtain the cross validation estimate. \begin{figure}[ht] \begin{center} \includegraphics[width=5in]{oobperf2} \end{center} \caption{Out-of-sample predictive performance of four methods of selecting the optimal number of iterations. The vertical axis plots performance relative the best. The boxplots indicate relative performance across thirteen real datasets from the UCI repository. See \texttt{demo(OOB-reps)}.} \label{fig:oobperf} \end{figure} Figure~\ref{fig:oobperf} compares the three methods for estimating the optimal number of iterations across 13 datasets. The boxplots show the methods performance relative to the best method on that dataset. For most datasets the method perform similarly, however, 5-fold cross validation is consistently the best of them. OOB, using a 33\% test set, and using a 20\% test set all have datasets for which the perform considerably worse than the best method. My recommendation is to use 5- or 10-fold cross validation if you can afford the computing time. Otherwise you may choose among the other options, knowing that OOB is conservative. \section{Available distributions} This section gives some of the mathematical detail for each of the distribution options that gbm offers. The gbm engine written in C++ has access to a C++ class for each of these distributions. Each class contains methods for computing the associated deviance, initial value, the gradient, and the constants to predict in each terminal node. In the equations shown below, for non-zero offset terms, replace $f(\mathbf{x}_i)$ with $o_i + f(\mathbf{x}_i)$. \subsection{Gaussian} \begin{tabular}{ll} Deviance & $\displaystyle \frac{1}{\sum w_i} \sum w_i(y_i-f(\mathbf{x}_i))^2$ \\ Initial value & $\displaystyle f(\mathbf{x})=\frac{\sum w_i(y_i-o_i)}{\sum w_i}$ \\ Gradient & $z_i=y_i - f(\mathbf{x}_i)$ \\ Terminal node estimates & $\displaystyle \frac{\sum w_i(y_i-f(\mathbf{x}_i))}{\sum w_i}$ \end{tabular} \subsection{AdaBoost} \begin{tabular}{ll} Deviance & $\displaystyle \frac{1}{\sum w_i} \sum w_i\exp(-(2y_i-1)f(\mathbf{x}_i))$ \\ Initial value & $\displaystyle \frac{1}{2}\log\frac{\sum y_iw_ie^{-o_i}}{\sum (1-y_i)w_ie^{o_i}}$ \\ Gradient & $\displaystyle z_i= -(2y_i-1)\exp(-(2y_i-1)f(\mathbf{x}_i))$ \\ Terminal node estimates & $\displaystyle \frac{\sum (2y_i-1)w_i\exp(-(2y_i-1)f(\mathbf{x}_i))} {\sum w_i\exp(-(2y_i-1)f(\mathbf{x}_i))}$ \end{tabular} \subsection{Bernoulli} \begin{tabular}{ll} Deviance & $\displaystyle -2\frac{1}{\sum w_i} \sum w_i(y_if(\mathbf{x}_i)-\log(1+\exp(f(\mathbf{x}_i))))$ \\ Initial value & $\displaystyle \log\frac{\sum w_iy_i}{\sum w_i(1-y_i)}$ \\ Gradient & $\displaystyle z_i=y_i-\frac{1}{1+\exp(-f(\mathbf{x}_i))}$ \\ Terminal node estimates & $\displaystyle \frac{\sum w_i(y_i-p_i)}{\sum w_ip_i(1-p_i)}$ \\ & where $\displaystyle p_i = \frac{1}{1+\exp(-f(\mathbf{x}_i))}$ \\ \end{tabular} Notes: \begin{itemize} \item For non-zero offset terms, the computation of the initial value requires Newton-Raphson. Initialize $f_0=0$ and iterate $\displaystyle f_0 \leftarrow f_0 + \frac{\sum w_i(y_i-p_i)}{\sum w_ip_i(1-p_i)}$ where $\displaystyle p_i = \frac{1}{1+\exp(-(o_i+f_0))}$. \end{itemize} \subsection{Laplace} \begin{tabular}{ll} Deviance & $\frac{1}{\sum w_i} \sum w_i|y_i-f(\mathbf{x}_i)|$ \\ Initial value & $\mbox{median}_w(y)$ \\ Gradient & $z_i=\mbox{sign}(y_i-f(\mathbf{x}_i))$ \\ Terminal node estimates & $\mbox{median}_w(z)$ \end{tabular} Notes: \begin{itemize} \item $\mbox{median}_w(y)$ denotes the weighted median, defined as the solution to the equation $\frac{\sum w_iI(y_i\leq m)}{\sum w_i}=\frac{1}{2}$ \item \texttt{gbm()} currently does not implement the weighted median and issues a warning when the user uses weighted data with \texttt{distribution="laplace"}. \end{itemize} \subsection{Quantile regression} Contributed by Brian Kriegler (see \cite{Kriegler:2010}). \begin{tabular}{ll} Deviance & $\frac{1}{\sum w_i} \left(\alpha\sum_{y_i>f(\mathbf{x}_i)} w_i(y_i-f(\mathbf{x}_i))\right. +$ \\ & \hspace{0.5in}$\left.(1-\alpha)\sum_{y_i\leq f(\mathbf{x}_i)} w_i(f(\mathbf{x}_i)-y_i)\right)$ \\ Initial value & $\mathrm{quantile}^{(\alpha)}_w(y)$ \\ Gradient & $z_i=\alpha I(y_i>f(\mathbf{x}_i))-(1-\alpha)I(y_i\leq f(\mathbf{x}_i))$ \\ Terminal node estimates & $\mathrm{quantile}^{(\alpha)}_w(z)$ \end{tabular} Notes: \begin{itemize} \item $\mathrm{quantile}^{(\alpha)}_w(y)$ denotes the weighted quantile, defined as the solution to the equation $\frac{\sum w_iI(y_i\leq q)}{\sum w_i}=\alpha$ \item \texttt{gbm()} currently does not implement the weighted median and issues a warning when the user uses weighted data with \texttt{distribution=list(name="quantile")}. \end{itemize} \subsection{Cox Proportional Hazard} \begin{tabular}{ll} Deviance & $-2\sum w_i(\delta_i(f(\mathbf{x}_i)-\log(R_i/w_i)))$\\ Gradient & $\displaystyle z_i=\delta_i - \sum_j \delta_j \frac{w_jI(t_i\geq t_j)e^{f(\mathbf{x}_i)}} {\sum_k w_kI(t_k\geq t_j)e^{f(\mathbf{x}_k)}}$ \\ Initial value & 0 \\ Terminal node estimates & Newton-Raphson algorithm \end{tabular} \begin{enumerate} \item Initialize the terminal node predictions to 0, $\mathgbf{\rho}=0$ \item Let $\displaystyle p_i^{(k)}=\frac{\sum_j I(k(j)=k)I(t_j\geq t_i)e^{f(\mathbf{x}_i)+\rho_k}} {\sum_j I(t_j\geq t_i)e^{f(\mathbf{x}_i)+\rho_k}}$ \item Let $g_k=\sum w_i\delta_i\left(I(k(i)=k)-p_i^{(k)}\right)$ \item Let $\mathbf{H}$ be a $k\times k$ matrix with diagonal elements \begin{enumerate} \item Set diagonal elements $H_{mm}=\sum w_i\delta_i p_i^{(m)}\left(1-p_i^{(m)}\right)$ \item Set off diagonal elements $H_{mn}=-\sum w_i\delta_i p_i^{(m)}p_i^{(n)}$ \end{enumerate} \item Newton-Raphson update $\mathgbf{\rho} \leftarrow \mathgbf{\rho} - \mathbf{H}^{-1}\mathbf{g}$ \item Return to step 2 until convergence \end{enumerate} Notes: \begin{itemize} \item $t_i$ is the survival time and $\delta_i$ is the death indicator. \item $R_i$ denotes the hazard for the risk set, $R_i=\sum_{j=1}^N w_jI(t_j\geq t_i)e^{f(\mathbf{x}_i)}$ \item $k(i)$ indexes the terminal node of observation $i$ \item For speed, \texttt{gbm()} does only one step of the Newton-Raphson algorithm rather than iterating to convergence. No appreciable loss of accuracy since the next boosting iteration will simply correct for the prior iterations inadequacy. \item \texttt{gbm()} initially sorts the data by survival time. Doing this reduces the computation of the risk set from $O(n^2)$ to $O(n)$ at the cost of a single up front sort on survival time. After the model is fit, the data are then put back in their original order. \end{itemize} \subsection{Poisson} \begin{tabular}{ll} Deviance & -2$\frac{1}{\sum w_i} \sum w_i(y_if(\mathbf{x}_i)-\exp(f(\mathbf{x}_i)))$ \\ Initial value & $\displaystyle f(\mathbf{x})= \log\left(\frac{\sum w_iy_i}{\sum w_ie^{o_i}}\right)$ \\ Gradient & $z_i=y_i - \exp(f(\mathbf{x}_i))$ \\ Terminal node estimates & $\displaystyle \log\frac{\sum w_iy_i}{\sum w_i\exp(f(\mathbf{x}_i))}$ \end{tabular} The Poisson class includes special safeguards so that the most extreme predicted values are $e^{-19}$ and $e^{+19}$. This behavior is consistent with \texttt{glm()}. \subsection{Pairwise} This distribution implements ranking measures following the \emph{LambdaMart} algorithm \cite{Burges:2010}. Instances belong to \emph{groups}; all pairs of items with different labels, belonging to the same group, are used for training. In \emph{Information Retrieval} applications, groups correspond to user queries, and items to (feature vectors of) documents in the associated match set to be ranked. For consistency with typical usage, our goal is to \emph{maximize} one of the \emph{utility} functions listed below. Consider a group with instances $x_1, \dots, x_n$, ordered such that $f(x_1) \geq f(x_2) \geq \dots f(x_n)$; i.e., the \emph{rank} of $x_i$ is $i$, where smaller ranks are preferable. Let $P$ be the set of all ordered pairs such that $y_i > y_j$. \begin{enumerate} \item[{\bf Concordance:}] Fraction of concordant (i.e, correctly ordered) pairs. For the special case of binary labels, this is equivalent to the Area under the ROC Curve. $$\left\{ \begin{array}{l l}\frac{\|\{(i,j)\in P | f(x_i)>f(x_j)\}\|}{\|P\|} & P \neq \emptyset\\ 0 & \mbox{otherwise.} \end{array}\right. $$ \item[{\bf MRR:}] Mean reciprocal rank of the highest-ranked positive instance (it is assumed $y_i\in\{0,1\}$): $$\left\{ \begin{array}{l l}\frac{1}{\min\{1 \leq i \leq n |y_i=1\}} & \exists i: \, 1 \leq i \leq n, y_i=1\\ 0 & \mbox{otherwise.}\end{array}\right.$$ \item[{\bf MAP:}] Mean average precision, a generalization of MRR to multiple positive instances: $$\left\{ \begin{array}{l l} \frac{\sum_{1\leq i\leq n | y_i=1} \|\{1\leq j\leq i |y_j=1\}\|\,/\,i}{\|\{1\leq i\leq n | y_i=1\}\|} & \exists i: \, 1 \leq i \leq n, y_i=1\\ 0 & \mbox{otherwise.}\end{array}\right.$$ \item[{\bf nDCG:}] Normalized discounted cumulative gain: $$\frac{\sum_{1\leq i\leq n} \log_2(i+1) \, y_i}{\sum_{1\leq i\leq n} \log_2(i+1) \, y'_i},$$ where $y'_1, \dots, y'_n$ is a reordering of $y_1, \dots,y_n$ with $y'_1 \geq y'_2 \geq \dots \geq y'_n$. \end{enumerate} The generalization to multiple (possibly weighted) groups is straightforward. Sometimes a cut-off rank $k$ is given for \emph{MRR} and \emph{nDCG}, in which case we replace the outer index $n$ by $\min(n,k)$. The initial value for $f(x_i)$ is always zero. We derive the gradient of a cost function whose gradient locally approximates the gradient of the IR measure for a fixed ranking: \begin{eqnarray*} \Phi & = & \sum_{(i,j) \in P} \Phi_{ij}\\ & = & \sum_{(i,j) \in P} |\Delta Z_{ij}| \log \left( 1 + e^{-(f(x_i) - f(x_j))}\right), \end{eqnarray*} where $|\Delta Z_{ij}|$ is the absolute utility difference when swapping the ranks of $i$ and $j$, while leaving all other instances the same. Define \begin{eqnarray*} \lambda_{ij} & = & \frac{\partial\Phi_{ij}}{\partial f(x_i)}\\ & = & - |\Delta Z_{ij}| \frac{1}{1 + e^{f(x_i) - f(x_j)}}\\ & = & - |\Delta Z_{ij}| \, \rho_{ij}, \end{eqnarray*} with $$ \rho_{ij} = - \frac{\lambda_{ij }}{|\Delta Z_{ij}|} = \frac{1}{1 + e^{f(x_i) - f(x_j)}}$$ For the gradient of $\Phi$ with respect to $f(x_i)$, define \begin{eqnarray*} \lambda_i & = & \frac{\partial \Phi}{\partial f(x_i)}\\ & = & \sum_{j|(i,j) \in P} \lambda_{ij} - \sum_{j|(j,i) \in P} \lambda_{ji}\\ & = & - \sum_{j|(i,j) \in P} |\Delta Z_{ij}| \, \rho_{ij}\\ & & \mbox{} + \sum_{j|(j,i) \in P} |\Delta Z_{ji}| \, \rho_{ji}. \end{eqnarray*} The second derivative is \begin{eqnarray*} \gamma_i & \stackrel{def}{=} & \frac{\partial^2\Phi}{\partial f(x_i)^2}\\ & = & \sum_{j|(i,j) \in P} |\Delta Z_{ij}| \, \rho_{ij} \, (1-\rho_{ij})\\ & & \mbox{} + \sum_{j|(j,i) \in P} |\Delta Z_{ji}| \, \rho_{ji} \, (1-\rho_{ji}). \end{eqnarray*} Now consider again all groups with associated weights. For a given terminal node, let $i$ range over all contained instances. Then its estimate is $$-\frac{\sum_i v_i\lambda_{i}}{\sum_i v_i \gamma_i},$$ where $v_i=w(\mbox{\em group}(i))/\|\{(j,k)\in\mbox{\em group}(i)\}\|.$ In each iteration, instances are reranked according to the preliminary scores $f(x_i)$ to determine the $|\Delta Z_{ij}|$. Note that in order to avoid ranking bias, we break ties by adding a small amount of random noise. \bibliography{gbm} \end{document} gbm/vignettes/gbm.bib0000644000176200001440000000331013346511223014247 0ustar liggesusers@article{FreundSchapire:1997, author = {Y. Freund and R. E. Schapire}, title = {A decision-theoretic generalization of on-line learning and an application to boosting}, journal = {Journal of Computer and System Sciences}, volume = {55}, number = {1}, pages = {119--139}, year = {1997} } @article{Friedman:2001, author = {J. H. Friedman}, title = {Greedy Function Approximation: A Gradient Boosting Machine}, journal = {Annals of Statistics}, volume = {29}, number = {5}, pages = {1189--1232}, year = {2001} } @article{Friedman:2002, author = {J. H. Friedman}, title = {Stochastic Gradient Boosting}, journal = {Computational Statistics and Data Analysis}, volume = {38}, number = {4}, pages = {367--378}, year = {2002} } @article{FHT:2000, author = {J. H. Friedman and T. Hastie and and R. Tibshirani}, title = {Additive Logistic Regression: a Statistical View of Boosting}, journal = {Annals of Statistics}, volume = {28}, number = {2}, pages = {337--374}, year = {2000} } @article{Kriegler:2010, author = {B. Kriegler and R. Berk}, title = {Small Area Estimation of the Homeless in Los Angeles, An Application of Cost-Sensitive Stochastic Gradient Boosting}, journal = {Annals of Applied Statistics}, volume = {4}, number = {3}, pages = {1234--1255}, year = {2010} } @article{Ridgeway:1999, author = {G. Ridgeway}, title = {The state of boosting}, journal = {Computing Science and Statistics}, volume = {31}, pages = {172--181}, year = {1999} } @article{Burges:2010, author = {C. Burges}, title = {From RankNet to LambdaRank to LambdaMART: An Overview}, journal = {Microsoft Research Technical Report MSR-TR-2010-82}, year = {2010} } gbm/vignettes/shrinkage-v-iterations.pdf0000644000176200001440000003213613346511223020124 0ustar liggesusers%PDF-1.3 %쏢 6 0 obj <> stream x|Mϭ;R=AiYWbI@A}!և\: Zjn~ryݫ|W_O~Bޢ7?x˷fOjkg,~,i:[?owz2INom[|<?4i;wqo_k~oϚ%|_}}Om~oPS٭뿠fݾX?z[WJm֝SΞ+Sh~-#Znŏ[|97uo1V1~5_pZkxpݭ}1s㫕W)ٽoM^Ul3^9NSo~B~qj|̯׿oǟ͟iۊSwnzJ֩FZv .-Ǯ5Y.\p +EY s|Tgճ"ܬT)ϩT9`.}/kYڃ/~3?Kް۩-%zťq{hZ6kκz ُ̃6)N;+B_X?K-?%K;4~k6ȑl۞ow>?&c`3-A}慄hSZO1{pըڸn1|R?X˝0>Xx7YsIm6Kdy-^+n:s[,!ME/dP&K1M&˓d~l<+ oZǕsg̹",͋,v7X,sE?l1-K7-yo7FlD'殕5zS5eHrQ.yݲkgPw9&um 0%Y9OCKsjyN!_܍RTvz&-mC h8^!) o.  [[;ldkADבMt.Ltn61ͭ@5iNk>MtZ&:mѥ(4]#ņ&th4yid3pL?&:i&uDWvM;Q'dS&&;wOemsxp}aO]qos[1׋~˲){E\ N#+cgA:+m¦:tJF:+>,t!6)(} y{:M9as@z*Ӝk-s}x攽dS˓;{&y<6Ǹ.NAO򺦸BuaW^ gJ+>=F˛+>JK3%~;K TeZ6nˁ!+- Mit^x(vٔüpOgvCk(Ł+s]e\!\:ks=Q+vLp<ܾN\ٜaͥ |uzr+ǚ.=`qm/bsA9)/1kTic@gmvn9Mecscp;ܧpO emnUO:_ɝ| grkU= ,`hj$es:[7FmKu]"H9eսuCUu׻y:ߵɕkw:N^q?HXN? kߤ|Pn;O sw]QTi=d# 1+.̺s{k7vm;õlnXU-#Q VjW睏隕]ᚩU>eWr9:+R>d oE .'Vvaѳe⡓3-+gu ̲ٖQZw1pQn69una6`Q[#6ݜ~[cmhaֱYPq[[^ub yN5غ},s2]7p~ \ U ξ$s\s"e0(io (kݼ`\CT?tcLP*XM\V;[nE3 }D@*8{SѵnWU*-g]~rb'eypͫno8D5yJN,2ƓX-X[:-7Vƅv qTI)xxĖfp͊k[1ˠ;N IuJlm6V{XhQ~Ž7@E*.yY=\NO4hE.E+ݦua .dh}Sfs|Xbp9ƨ57F jyj|5\64rxh=i.IYFDطae1$}H(w|,0o:8}\{I齙K u;XF0 Y O6h`}IY$V?dnv рAZ p$or%r{$볐}cJI-)P쓤Tw'Yˢ>]8|{iYRigqC $[v L7[v9=(+2E]}1#g}>emay҇;uEPv0v[v"ޢc ʦdӨ+ڠ˵ŒVjgWߦ_@tasdsúnOᲣy7e"շS:AGwU9A5'AX? R}"}' Ar~`VlN!EahE+g/>I(W z TpǫEtr5 (zJ׾9* 9RL96viODȈ_L0Xer-V~_B+=u7WqHN/=mC8e|R8{&WsgWF+8pD;+ `IDkqç` qNWEg3Ing$qc "s=²<n3>*/3Ì[e*npvŲaL]q֊ iIp&p-Ic6T)#A S0W\4&qO8X[{\|4_-ϦX IVfs7cHɩm.tHp3и3Vj8wD׉yQWWlUXq 앴UY*`Zn9)ڜqf57Ϭq Uft<9׽׺<;61p <3\7ERǓTrV-YN>c\s cS!Zq3{QJi_o`aKqr7U1 aR[lfq@?_8p8-]qDqpq|q7,ǭ)7}܊-̭0^M&G`N8Y*k;S!¦,Q$MCYE&A_2%Z141?ib.T\BPc*[Aw) E; h*3m57Pd1k(gXvZq\bxf՝dj&jqɤ阋N&}LHt1LOZyh@Ndd}$&\Au`nH" W\`rV&E 1k\ÂM"eG$rD"".I!_avE rBcL 3Ht;I뉐\c)1Լ8%8N٬xcyu9٬1XH'g8㔏'VO -e"0Unm/t=2#8}J͸bqDyܮ&:Y,Cl~E qM'Ȁ*vriWuv[bjT%Vw/7QTX)[D~O Rqp>EPS=\pUt侁D+EGs=.n\X<9"i0PFRLy+֒"7WOp=Z=HR_4Yd/LBp*Uug9p~X{·gW]Gh ojӺJ7cv U WW\q:/ 0sCqZ{'5w/ # ab{niYFS%rq6CNZr_:!GY#&̻8_r,N2:rCλNZQ39O%!lWr|8^,.9|#!Gq␣60GfkA3r.Yu1&uk rbAN 68Pb7?d8ucT)[q/NgVJt;8 MLJLb @!ţ8KD2bS`ȁ' r] 4QpìGsWq⑩!9ʚ%7~Y`ȩk9M,G޼p$+%KƈG\MԒGNmxd[EJ uaFF(YdO-2{[kW>@MAqg%sdrǟ 9LoNǁs?Λ&/u ̻U0z"9pbS W %<g20teI+rA83z%6}[6Wf7T]P2YȊY)[Nirdn9e[+qdn9 sGߓ]{fe-3]rtd'2l&Ll{š6(5 +9I-康^);Gl3!1 dacDw~MZ曉(AC/+fr8v|s i$9͢&& D_l^_7 QQGFL4}4e 6k.!*ARNf**R1|31~&S]ˊ[NIjjN%1vyeC+YCWr`&| _\)Ǽ1攉"A8es"S$>'K S^Q6[+l(G2^/". z8=x=0^/f4ǀ׃?|qïB, B4sd=oݵaPN?0 %z77 OE7KA כb*o:Z]j_t4P{VL9|9ү1U{MFؼ9C,{ş69.ps_s,C9揯9̦58 шe~Wz1ی 䋎M}k2Лhϸ? ֥Л>C hD )t)u逡k\ Z7|tf# o:!MG)O H!hs;OK Fk)0I}@N^cmBU:$BX6`B'^2!Qhd3$f`% v3WnqBtPA;7vǼa|!@!w F>n9+ ?-nx &c_r4Jc:MTv3@jl[nz7l_^6C?Jz[z7ݵch~oܵŖkv/j+Sچ6f~Q[3)mksVULzQqeO!~}9jsWy=]J>1p[;.]M$H8JW˵VR^Tpw.~HpxA8os Oy> /Contents 6 0 R >> endobj 3 0 obj << /Type /Pages /Kids [ 5 0 R ] /Count 1 >> endobj 1 0 obj <> endobj 4 0 obj <> endobj 11 0 obj <> endobj 12 0 obj <> endobj 9 0 obj <> endobj 8 0 obj <>stream xUU{\aav`%(AE`,X]>peW@G`AE&$&pAH Ķ]"hXQ]QDVҞ!YL_;9w% K $I۰O>wh5ys&DS-i"~)%j!?ĺj*hIP!"ɍeeeoIOMI]&,suwߎO``tͦH4驙GFI k)T\J֨aJV^Mʗ,rz{kr UfT.MФiU9I}fZiN.])%X"'"BA,!ф$܈P%&#1D-aG0P,’#!RF"--,E^um1VV唘rljPg÷J>;Î>{-e=0tHPu]7Nd.)@w"o`@> nI)剌Ίoj$۾Zѩm GHrCTB0VGfXulFt;d&ِC>yB*~.yQ˭Os&LAm^憕)輑ϟWf1sVR1_=7fcf 3([W3 x+1|^M_NxH֌qc BƢ;$̛)qS> A`\ l`OJJ^0̍.ndޘsc>L^`kX4s)d@NuYaD"'URL] Nn31=P>ŠՍ[hx_>&[?Gyexg6FDaOlƟ "ČBkV<qQ/f S| HQ9&q^r犗anVނ|Oر2hbX xj-7'7#;z.:B^mU5D>(r:6G-xmvUGZ۞0_n> endobj 2 0 obj <>endobj xref 0 14 0000000000 65535 f 0000009358 00000 n 0000012997 00000 n 0000009299 00000 n 0000009406 00000 n 0000009139 00000 n 0000000015 00000 n 0000009119 00000 n 0000009844 00000 n 0000009537 00000 n 0000011962 00000 n 0000009475 00000 n 0000009505 00000 n 0000011941 00000 n trailer << /Size 14 /Root 1 0 R /Info 2 0 R >> startxref 13047 %%EOF gbm/README.md0000644000176200001440000000401413346511223012275 0ustar liggesusersgbm === [![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/gbm)](https://cran.r-project.org/package=gbm) [![Build Status](https://travis-ci.org/gbm-developers/gbm.svg?branch=master)](https://travis-ci.org/gbm-developers/gbm) [![Downloads](http://cranlogs.r-pkg.org/badges/gbm)](http://cranlogs.r-pkg.org/badges/gbm) [![Total Downloads](http://cranlogs.r-pkg.org/badges/grand-total/gbm)](http://cranlogs.r-pkg.org/badges/grand-total/gbm) Overview -------- The gbm package (which stands for **g**eneralized **b**oosted **m**odels) implements extensions to Freund and Schapire’s AdaBoost algorithm and [Friedman’s gradient boosting machine](http://projecteuclid.org/euclid.aos/1013203451). It includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (i.e., [LambdaMart](https://www.microsoft.com/en-us/research/publication/from-ranknet-to-lambdarank-to-lambdamart-an-overview/)). Installation ------------ ``` r # The easiest way to get gbm is to it install from CRAN: install.packages("gbm") # Or the the development version from GitHub: # install.packages("devtools") devtools::install_github("gbm-developers/gbm") ``` Lifecycle --------- [![lifecycle](https://img.shields.io/badge/lifecycle-retired-orange.svg)](https://www.tidyverse.org/lifecycle/#retired) The gbm package is retired and no longer under active development. We will only make the necessary changes to ensure that gbm remain on CRAN. For the most part, no new features will be added, and only the most critical of bugs will be fixed. This is a maintained version of `gbm` back compatible to CRAN versions of `gbm` 2.1.x. It exists mainly for the purpose of reproducible research and data analyses performed with the 2.1.x versions of `gbm`. For newer development, and a more consistent API, try out the [gbm3](https://github.com/gbm-developers/gbm3) package! gbm/MD50000644000176200001440000001417513417121763011344 0ustar liggesusers108bdba2eb6f2ba6ce890f47224ef68f *CHANGES 894d28d233ef8843240f7fca545caea0 *DESCRIPTION 67f2f9cc8297be2f12dfe86e05277383 *LICENSE 00dda5f78be66b96a668b74b523fcac1 *NAMESPACE f49617fcc735cf616817886d616d9ee2 *NEWS.md 061c315ef880f845918ff59cce721239 *R/basehaz.gbm.R aef3622e1f5a19f9c74616130321851f *R/calibrate.plot.R af7dcaeddbc7e6eb31b66290a98c0a1c *R/gbm-internals.R 2f21a77c0c4d5274533173b223f7f05e *R/gbm-package.R 6c851a0da731e8611f499738d7ebc3b7 *R/gbm.R bd784b825af03b7576017cdd45c696fa *R/gbm.fit.R 2f6a79af8a23dd4be5283881a82e5f5c *R/gbm.more.R cdcc395f477e8a83fde52d313d5d9760 *R/gbm.object.R f2d808e5f68996a79a14d575ae20ab16 *R/gbm.perf.R f17f3d39a4d6820e78130748ce8032ff *R/gbmCrossVal.R 40231a31962f0df1ab182edcffe51b9f *R/interact.gbm.R fc877c59338b8343545050803c29ec95 *R/ir.measures.R 1e1e9648a40d27a07c63e9c4103ba4d0 *R/plot.gbm.R e8f2da715b200da15e92a70e483207ce *R/predict.gbm.R 48438bd417c4a7b3c0495c901c5d5060 *R/pretty.gbm.tree.R b068e5396186cc21060477aac914abe7 *R/print.gbm.R af4fd23ba860c912a1a237fb3b5631d1 *R/reconstructGBMdata.R 7d953fa9013fdb90ae01e67e336b2747 *R/relative.influence.R 81f913b053b7d402f4a808aeb3670e2f *R/shrink.gbm.R d001fbd3c7de86463f4d0f1dff63a70b *R/shrink.gbm.pred.R 21f1a9fdd69be98ad81bbca7e18ec8a7 *R/test.gbm.R 3fc23fb8a1c816ac430c4e836a08078a *R/utils.R 08ab323918a24917e4d4638ca01c841a *R/zzz.R 55ae3c9b2954cd0ac1c317b5698d77c3 *README.md 4dc9151409b8112474ac3f1da044f7f7 *build/vignette.rds 4e38ebb4d3578e523b7d94fc9ece3d65 *demo/00Index e3bd8606063f15ded6ab3261c13d22af *demo/OOB-reps.R 354344b4f6e8a232508ef872ced5efa3 *demo/bernoulli.R f7599f6ddc6852ba0721651a46601b06 *demo/coxph.R bb1c84d68320171ac205bb33114d49e1 *demo/gaussian.R 31906c0a7bce9676949413f0fbff2c6c *demo/multinomial.R af763746809ed98e48e065f77942cb05 *demo/pairwise.R dbff7ebcc6a18e27c1b423fd5db70ae3 *demo/printExamples.R 79316127956b8f5291f5021f1e7c89ef *demo/robustReg.R c044e4fcd21ef75478830ede774cfba7 *inst/doc/gbm.Rnw ecaf68f8e96581dbbd9735927f42c462 *inst/doc/gbm.pdf e89d6b6a7a2f19974d5c7916c9e2ae66 *man/basehaz.gbm.Rd c606780ccf3028850a848dfc2b3f4739 *man/calibrate.plot.Rd bf74b54c920807d509d5ff19e45e95d4 *man/gbm-internals.Rd 5f96c05f991a485fbfe7a23b87b3d649 *man/gbm-package.Rd 15763b8625b44991118470ad6057b6da *man/gbm.Rd 94befbc345d33d0ed250a227a1268603 *man/gbm.fit.Rd a65152118be58b4d8bf48ad8c93614c7 *man/gbm.more.Rd 728fa0d75f96519d0156aa2891362b9b *man/gbm.object.Rd d007fd2b010c4b6ccbd4c0ec2aba9ea0 *man/gbm.perf.Rd c43f6a77ca7bec407e85b642d6dfa2be *man/gbm.roc.area.Rd 2cd76f2ffbdc511bb0ac0a9dc1fb393b *man/gbmCrossVal.Rd 7d42ecd6cfbbb3e83f94685f0ef7add4 *man/grid.arrange.Rd c1789d7d5b7fc9be7665be55c1893d35 *man/interact.gbm.Rd 0a3f9f38c375609ef6380dceb1d4128c *man/plot.gbm.Rd 2a0d1ae9483de0ffb214d25623821f68 *man/predict.gbm.Rd e368dcac4b75c8273529151e0087c5d4 *man/pretty.gbm.tree.Rd 21c028bad14805f40e0a7a0dc7e49e64 *man/print.gbm.Rd f9563a4ec1265edfec56ecbdb8148e38 *man/quantile.rug.Rd 27aa52e20ea8281697e8357a36d58b85 *man/reconstructGBMdata.Rd f17f451739be17e89ec1b227b6602c86 *man/relative.influence.Rd 6f99e3dde82cbc922d9f1fc7f22bdcd9 *man/shrink.gbm.Rd d75c1d9e1ff0c6a83bb37df2591ae4d9 *man/shrink.gbm.pred.Rd dd2dfa92c91ff3ae020d9dbdd23657fb *man/summary.gbm.Rd 8201654f42537ca205d0d5b138848df8 *man/test.gbm.Rd 0d32ce72a7b02fc57d602c60b9ba8305 *src/adaboost.cpp 2f5d22dc3043e69628763cbe303e6b5f *src/adaboost.h 6d2bd44a11975c8f023640eb7a9036c3 *src/bac/gaussian.cpp c877a1d31fa93463ed5d3ccd2164aa80 *src/bernoulli.cpp 323f73ab809cff64ad5b4f336157f295 *src/bernoulli.h 088062cab2532d24fa3a9fc5affcf69a *src/buildinfo.h e15f767c646f66e54eb5bb20ccd7cebd *src/coxph.cpp e110cbd0b715934c4e0257cf20e9c1da *src/coxph.h 3616890b5d7af2b3edd52dc5f29544b0 *src/dataset.cpp d30f46362b1915f76e5a328ce95c7136 *src/dataset.h b5824ccf353076bf59018429ae3ac6ac *src/distribution.cpp 91d88e455827695f63bf23df5dfb3108 *src/distribution.h 6d2bd44a11975c8f023640eb7a9036c3 *src/gaussian.cpp 6c2bf2616a3b4491aaaf501346246d3f *src/gaussian.h 889bfcdd44dc35824be51ba8ae2bd517 *src/gbm-init.c 1d8d4e59887769602b1d3c8dc3d5f94f *src/gbm.cpp 0f49e8549558916322ec80e29b591a73 *src/gbm.h c0c572eb464dae70700ffe8fdc3f6b9f *src/gbm_engine.cpp b3f1f49fa614ac6cfd52b28191bfdb70 *src/gbm_engine.h 1d924856d046e942a312d373cfce230f *src/gbmentry.cpp 1fba83f37e9f092d8b005e0c8f32a97b *src/huberized.cpp 141e5b762944c14a0b6294e15046296f *src/huberized.h 10dcf061e2807ca52f811ec6650f33ad *src/laplace.cpp 53b4d97c482517fbbc97162da1adf891 *src/laplace.h d25bcfb8da3565604f902270b25eb470 *src/locationm.cpp 932f3d98f158ebf6ae11ed47e873a7f3 *src/locationm.h 39094967ceaabf7c744bc93d0b86d22f *src/matrix.h 7242e54abea29c46990c4aabba7a65b6 *src/multinomial.cpp 8798fe266a8bad59ac9b3e7019cebbe8 *src/multinomial.h 75737afcbdd3162c62fcdd82b027e1d2 *src/node.cpp 3f7d35689f88a25a8f536d31c4ce172b *src/node.h 49da51b394dccb0063fa7b5e4ed662d6 *src/node_categorical.cpp 98afbdcf5bb70211102e58ed262fcec1 *src/node_categorical.h 74913ea93e6707eb49e52ac24047ae07 *src/node_continuous.cpp f09bd89f861430f58cb80ccf0de77c6a *src/node_continuous.h af2b9dd107d657344891521829c52243 *src/node_factory.cpp 3b80b8101a773a42a06eb41b5c6b01c9 *src/node_factory.h 56dc9a7a6309294654e641c14a32023d *src/node_nonterminal.cpp 062cbcf913ad61d33048c36ab0b76735 *src/node_nonterminal.h a99c0738f82cb857c87b45a65d4e8f25 *src/node_search.cpp 76b812a554f8ce9e7ea64c6f3c7631ee *src/node_search.h c6943942255ce8138259b6b47caa0c08 *src/node_terminal.cpp 084bcc63d1b33ca200460b88ef36b8f6 *src/node_terminal.h b763976a9c68d9e975417a84b7e2b3c4 *src/pairwise.cpp 8dc9c440afcb8d96f881c6d56ecae4d6 *src/pairwise.h 756422dc1f3f394260fa4d77ec42d1ed *src/poisson.cpp 0c901877981c1df8c4d82f6dd99c9231 *src/poisson.h 64e10460138c1b67923020b58cf1a599 *src/quantile.cpp 491d792d90d047d5a8c192253b632252 *src/quantile.h 519b30584e7e752480750e86027aea7e *src/tdist.cpp 9ab15eb81fc9a18ee7d14a76f7aefd2a *src/tdist.h 276e36bf158250eb458a1cdabcf975b5 *src/tree.cpp 6b2f1cd60e5d67638e110e1ac9552b27 *src/tree.h c044e4fcd21ef75478830ede774cfba7 *vignettes/gbm.Rnw b5633beb372053eac8730e76d8999ce9 *vignettes/gbm.bib 7ba661d197d25537a69fc34d737b4d29 *vignettes/oobperf2.pdf 3fda19791155842b0e48565781441aa2 *vignettes/shrinkage-v-iterations.pdf 90fd593dd07098b5600fb650e86733ff *vignettes/srcltx.sty gbm/build/0000755000176200001440000000000013417115400012112 5ustar liggesusersgbm/build/vignette.rds0000644000176200001440000000034413417115400014452 0ustar liggesusersmOK 0MAr.]SD"Hq66c-֦$+Onj+o !.(%chBIs'ޝ0+h(&Wl*6WR@ls"2GG?\qu_s4~2~ms LTSZ.T k]yB&1[u ?_,FàR{PBk[h_pÃB|ugbm/DESCRIPTION0000644000176200001440000000341013417121763012530 0ustar liggesusersPackage: gbm Version: 2.1.5 Title: Generalized Boosted Regression Models Authors@R: c( person("Brandon", "Greenwell", email = "greenwell.brandon@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-8120-0084")), person("Bradley", "Boehmke", email = "bradleyboehmke@gmail.com", role = "aut", comment = c(ORCID = "0000-0002-3611-8516")), person("Jay", "Cunningham", email = "james@notbadafterall.com", role = "aut"), person("GBM", "Developers", role = "aut", comment = "https://github.com/gbm-developers") ) Depends: R (>= 2.9.0) Imports: gridExtra, lattice, parallel, survival Suggests: knitr, pdp, RUnit, splines, viridis Description: An implementation of extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). Originally developed by Greg Ridgeway. License: GPL (>= 2) | file LICENSE URL: https://github.com/gbm-developers/gbm BugReports: https://github.com/gbm-developers/gbm/issues RoxygenNote: 6.1.1 VignetteBuilder: knitr NeedsCompilation: yes Packaged: 2019-01-14 14:21:52 UTC; bgreenwell Author: Brandon Greenwell [aut, cre] (), Bradley Boehmke [aut] (), Jay Cunningham [aut], GBM Developers [aut] (https://github.com/gbm-developers) Maintainer: Brandon Greenwell Repository: CRAN Date/Publication: 2019-01-14 15:00:03 UTC gbm/man/0000755000176200001440000000000013346511223011572 5ustar liggesusersgbm/man/gbm.more.Rd0000644000176200001440000001170013346511223013566 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/gbm.more.R \name{gbm.more} \alias{gbm.more} \title{Generalized Boosted Regression Modeling (GBM)} \usage{ gbm.more(object, n.new.trees = 100, data = NULL, weights = NULL, offset = NULL, verbose = NULL) } \arguments{ \item{object}{A \code{\link{gbm.object}} object created from an initial call to \code{\link{gbm}}.} \item{n.new.trees}{Integer specifying the number of additional trees to add to \code{object}. Default is 100.} \item{data}{An optional data frame containing the variables in the model. By default the variables are taken from \code{environment(formula)}, typically the environment from which \code{gbm} is called. If \code{keep.data=TRUE} in the initial call to \code{gbm} then \code{gbm} stores a copy with the object. If \code{keep.data=FALSE} then subsequent calls to \code{\link{gbm.more}} must resupply the same dataset. It becomes the user's responsibility to resupply the same data at this point.} \item{weights}{An optional vector of weights to be used in the fitting process. Must be positive but do not need to be normalized. If \code{keep.data=FALSE} in the initial call to \code{gbm} then it is the user's responsibility to resupply the weights to \code{\link{gbm.more}}.} \item{offset}{A vector of offset values.} \item{verbose}{Logical indicating whether or not to print out progress and performance indicators (\code{TRUE}). If this option is left unspecified for \code{gbm.more}, then it uses \code{verbose} from \code{object}. Default is \code{FALSE}.} } \value{ A \code{\link{gbm.object}} object. } \description{ Adds additional trees to a \code{\link{gbm.object}} object. } \examples{ # # A least squares regression example # # Simulate data set.seed(101) # for reproducibility N <- 1000 X1 <- runif(N) X2 <- 2 * runif(N) X3 <- ordered(sample(letters[1:4], N, replace = TRUE), levels = letters[4:1]) X4 <- factor(sample(letters[1:6], N, replace = TRUE)) X5 <- factor(sample(letters[1:3], N, replace = TRUE)) X6 <- 3 * runif(N) mu <- c(-1, 0, 1, 2)[as.numeric(X3)] SNR <- 10 # signal-to-noise ratio Y <- X1 ^ 1.5 + 2 * (X2 ^ 0.5) + mu sigma <- sqrt(var(Y) / SNR) Y <- Y + rnorm(N, 0, sigma) X1[sample(1:N,size=500)] <- NA # introduce some missing values X4[sample(1:N,size=300)] <- NA # introduce some missing values data <- data.frame(Y, X1, X2, X3, X4, X5, X6) # Fit a GBM set.seed(102) # for reproducibility gbm1 <- gbm(Y ~ ., data = data, var.monotone = c(0, 0, 0, 0, 0, 0), distribution = "gaussian", n.trees = 100, shrinkage = 0.1, interaction.depth = 3, bag.fraction = 0.5, train.fraction = 0.5, n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, verbose = FALSE, n.cores = 1) # Check performance using the out-of-bag (OOB) error; the OOB error typically # underestimates the optimal number of iterations best.iter <- gbm.perf(gbm1, method = "OOB") print(best.iter) # Check performance using the 50\% heldout test set best.iter <- gbm.perf(gbm1, method = "test") print(best.iter) # Check performance using 5-fold cross-validation best.iter <- gbm.perf(gbm1, method = "cv") print(best.iter) # Plot relative influence of each variable par(mfrow = c(1, 2)) summary(gbm1, n.trees = 1) # using first tree summary(gbm1, n.trees = best.iter) # using estimated best number of trees # Compactly print the first and last trees for curiosity print(pretty.gbm.tree(gbm1, i.tree = 1)) print(pretty.gbm.tree(gbm1, i.tree = gbm1$n.trees)) # Simulate new data set.seed(103) # for reproducibility N <- 1000 X1 <- runif(N) X2 <- 2 * runif(N) X3 <- ordered(sample(letters[1:4], N, replace = TRUE)) X4 <- factor(sample(letters[1:6], N, replace = TRUE)) X5 <- factor(sample(letters[1:3], N, replace = TRUE)) X6 <- 3 * runif(N) mu <- c(-1, 0, 1, 2)[as.numeric(X3)] Y <- X1 ^ 1.5 + 2 * (X2 ^ 0.5) + mu + rnorm(N, 0, sigma) data2 <- data.frame(Y, X1, X2, X3, X4, X5, X6) # Predict on the new data using the "best" number of trees; by default, # predictions will be on the link scale Yhat <- predict(gbm1, newdata = data2, n.trees = best.iter, type = "link") # least squares error print(sum((data2$Y - Yhat)^2)) # Construct univariate partial dependence plots p1 <- plot(gbm1, i.var = 1, n.trees = best.iter) p2 <- plot(gbm1, i.var = 2, n.trees = best.iter) p3 <- plot(gbm1, i.var = "X3", n.trees = best.iter) # can use index or name grid.arrange(p1, p2, p3, ncol = 3) # Construct bivariate partial dependence plots plot(gbm1, i.var = 1:2, n.trees = best.iter) plot(gbm1, i.var = c("X2", "X3"), n.trees = best.iter) plot(gbm1, i.var = 3:4, n.trees = best.iter) # Construct trivariate partial dependence plots plot(gbm1, i.var = c(1, 2, 6), n.trees = best.iter, continuous.resolution = 20) plot(gbm1, i.var = 1:3, n.trees = best.iter) plot(gbm1, i.var = 2:4, n.trees = best.iter) plot(gbm1, i.var = 3:5, n.trees = best.iter) # Add more (i.e., 100) boosting iterations to the ensemble gbm2 <- gbm.more(gbm1, n.new.trees = 100, verbose = FALSE) } gbm/man/summary.gbm.Rd0000644000176200001440000000534413346511223014330 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/print.gbm.R \name{summary.gbm} \alias{summary.gbm} \title{Summary of a gbm object} \usage{ \method{summary}{gbm}(object, cBars = length(object$var.names), n.trees = object$n.trees, plotit = TRUE, order = TRUE, method = relative.influence, normalize = TRUE, ...) } \arguments{ \item{object}{a \code{gbm} object created from an initial call to \code{\link{gbm}}.} \item{cBars}{the number of bars to plot. If \code{order=TRUE} the only the variables with the \code{cBars} largest relative influence will appear in the barplot. If \code{order=FALSE} then the first \code{cBars} variables will appear in the plot. In either case, the function will return the relative influence of all of the variables.} \item{n.trees}{the number of trees used to generate the plot. Only the first \code{n.trees} trees will be used.} \item{plotit}{an indicator as to whether the plot is generated.} \item{order}{an indicator as to whether the plotted and/or returned relative influences are sorted.} \item{method}{The function used to compute the relative influence. \code{\link{relative.influence}} is the default and is the same as that described in Friedman (2001). The other current (and experimental) choice is \code{\link{permutation.test.gbm}}. This method randomly permutes each predictor variable at a time and computes the associated reduction in predictive performance. This is similar to the variable importance measures Breiman uses for random forests, but \code{gbm} currently computes using the entire training dataset (not the out-of-bag observations).} \item{normalize}{if \code{FALSE} then \code{summary.gbm} returns the unnormalized influence.} \item{...}{other arguments passed to the plot function.} } \value{ Returns a data frame where the first component is the variable name and the second is the computed relative influence, normalized to sum to 100. } \description{ Computes the relative influence of each variable in the gbm object. } \details{ For \code{distribution="gaussian"} this returns exactly the reduction of squared error attributable to each variable. For other loss functions this returns the reduction attributable to each variable in sum of squared error in predicting the gradient on each iteration. It describes the relative influence of each variable in reducing the loss function. See the references below for exact details on the computation. } \references{ J.H. Friedman (2001). "Greedy Function Approximation: A Gradient Boosting Machine," Annals of Statistics 29(5):1189-1232. L. Breiman (2001).\url{https://www.stat.berkeley.edu/users/breiman/randomforest2001.pdf}. } \seealso{ \code{\link{gbm}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{hplot} gbm/man/predict.gbm.Rd0000644000176200001440000000450213346511223014260 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/predict.gbm.R \name{predict.gbm} \alias{predict.gbm} \title{Predict method for GBM Model Fits} \usage{ \method{predict}{gbm}(object, newdata, n.trees, type = "link", single.tree = FALSE, ...) } \arguments{ \item{object}{Object of class inheriting from (\code{\link{gbm.object}})} \item{newdata}{Data frame of observations for which to make predictions} \item{n.trees}{Number of trees used in the prediction. \code{n.trees} may be a vector in which case predictions are returned for each iteration specified} \item{type}{The scale on which gbm makes the predictions} \item{single.tree}{If \code{single.tree=TRUE} then \code{predict.gbm} returns only the predictions from tree(s) \code{n.trees}} \item{\dots}{further arguments passed to or from other methods} } \value{ Returns a vector of predictions. By default the predictions are on the scale of f(x). For example, for the Bernoulli loss the returned value is on the log odds scale, poisson loss on the log scale, and coxph is on the log hazard scale. If \code{type="response"} then \code{gbm} converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson. For the other distributions "response" and "link" return the same. } \description{ Predicted values based on a generalized boosted model object } \details{ \code{predict.gbm} produces predicted values for each observation in \code{newdata} using the the first \code{n.trees} iterations of the boosting sequence. If \code{n.trees} is a vector than the result is a matrix with each column representing the predictions from gbm models with \code{n.trees[1]} iterations, \code{n.trees[2]} iterations, and so on. The predictions from \code{gbm} do not include the offset term. The user may add the value of the offset to the predicted value if desired. If \code{object} was fit using \code{\link{gbm.fit}} there will be no \code{Terms} component. Therefore, the user has greater responsibility to make sure that \code{newdata} is of the same format (order and number of variables) as the one originally used to fit the model. } \seealso{ \code{\link{gbm}}, \code{\link{gbm.object}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{models} \keyword{regression} gbm/man/shrink.gbm.pred.Rd0000644000176200001440000000165413346511223015062 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/shrink.gbm.pred.R \name{shrink.gbm.pred} \alias{shrink.gbm.pred} \title{Predictions from a shrunked GBM} \usage{ shrink.gbm.pred(object, newdata, n.trees, lambda = rep(1, length(object$var.names)), ...) } \arguments{ \item{object}{a \code{\link{gbm.object}}} \item{newdata}{dataset for predictions} \item{n.trees}{the number of trees to use} \item{lambda}{a vector with length equal to the number of variables containing the shrinkage parameter for each variable} \item{\dots}{other parameters (ignored)} } \value{ A vector with length equal to the number of observations in newdata containing the predictions } \description{ Makes predictions from a shrunken GBM model. } \section{Warning}{ This function is experimental } \seealso{ \code{\link{shrink.gbm}}, \code{\link{gbm}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{methods} gbm/man/shrink.gbm.Rd0000644000176200001440000000302613346511223014124 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/shrink.gbm.R \name{shrink.gbm} \alias{shrink.gbm} \title{L1 shrinkage of the predictor variables in a GBM} \usage{ shrink.gbm(object, n.trees, lambda = rep(10, length(object$var.names)), ...) } \arguments{ \item{object}{A \code{\link{gbm.object}}.} \item{n.trees}{Integer specifying the number of trees to use.} \item{lambda}{Vector of length equal to the number of variables containing the shrinkage parameter for each variable.} \item{\dots}{Additional optional arguments. (Currently ignored.)} } \value{ \item{predF}{Predicted values from the shrunken tree} \item{objective}{The value of the loss function associated with the predicted values} \item{gradient}{A vector with length equal to the number of variables containing the derivative of the objective function with respect to beta, the logit transform of the shrinkage parameter for each variable} } \description{ Performs recursive shrinkage in each of the trees in a GBM fit using different shrinkage parameters for each variable. } \details{ This function is currently experimental. Used in conjunction with a gradient ascent search for inclusion of variables. } \note{ Warning: This function is experimental. } \references{ Hastie, T. J., and Pregibon, D. \url{https://web.stanford.edu/~hastie/Papers/shrink_tree.pdf}. AT&T Bell Laboratories Technical Report (March 1990). } \seealso{ \code{\link{shrink.gbm.pred}}, \code{\link{gbm}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{methods} gbm/man/plot.gbm.Rd0000644000176200001440000001003213346511223013577 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/plot.gbm.R \name{plot.gbm} \alias{plot.gbm} \title{Marginal plots of fitted gbm objects} \usage{ \method{plot}{gbm}(x, i.var = 1, n.trees = x$n.trees, continuous.resolution = 100, return.grid = FALSE, type = c("link", "response"), level.plot = TRUE, contour = FALSE, number = 4, overlap = 0.1, col.regions = viridis::viridis, ...) } \arguments{ \item{x}{A \code{\link{gbm.object}} that was fit using a call to \code{\link{gbm}}.} \item{i.var}{Vector of indices or the names of the variables to plot. If using indices, the variables are indexed in the same order that they appear in the initial \code{gbm} formula. If \code{length(i.var)} is between 1 and 3 then \code{plot.gbm} produces the plots. Otherwise, \code{plot.gbm} returns only the grid of evaluation points and their average predictions} \item{n.trees}{Integer specifying the number of trees to use to generate the plot. Default is to use \code{x$n.trees} (i.e., the entire ensemble).} \item{continuous.resolution}{Integer specifying the number of equally space points at which to evaluate continuous predictors.} \item{return.grid}{Logical indicating whether or not to produce graphics \code{FALSE} or only return the grid of evaluation points and their average predictions \code{TRUE}. This is useful for customizing the graphics for special variable types, or for higher dimensional graphs.} \item{type}{Character string specifying the type of prediction to plot on the vertical axis. See \code{\link{predict.gbm}} for details.} \item{level.plot}{Logical indicating whether or not to use a false color level plot (\code{TRUE}) or a 3-D surface (\code{FALSE}). Default is \code{TRUE}.} \item{contour}{Logical indicating whether or not to add contour lines to the level plot. Only used when \code{level.plot = TRUE}. Default is \code{FALSE}.} \item{number}{Integer specifying the number of conditional intervals to use for the continuous panel variables. See \code{\link[graphics]{co.intervals}} and \code{\link[lattice]{equal.count}} for further details.} \item{overlap}{The fraction of overlap of the conditioning variables. See \code{\link[graphics]{co.intervals}} and \code{\link[lattice]{equal.count}} for further details.} \item{col.regions}{Color vector to be used if \code{level.plot} is \code{TRUE}. Defaults to the wonderful Matplotlib 'viridis' color map provided by the \code{viridis} package. See \code{\link[viridis]{viridis}} for details.} \item{...}{Additional optional arguments to be passed onto \code{\link[graphics]{plot}}.} } \value{ If \code{return.grid = TRUE}, a grid of evaluation points and their average predictions. Otherwise, a plot is returned. } \description{ Plots the marginal effect of the selected variables by "integrating" out the other variables. } \details{ \code{plot.gbm} produces low dimensional projections of the \code{\link{gbm.object}} by integrating out the variables not included in the \code{i.var} argument. The function selects a grid of points and uses the weighted tree traversal method described in Friedman (2001) to do the integration. Based on the variable types included in the projection, \code{plot.gbm} selects an appropriate display choosing amongst line plots, contour plots, and \code{\link[lattice]{lattice}} plots. If the default graphics are not sufficient the user may set \code{return.grid=TRUE}, store the result of the function, and develop another graphic display more appropriate to the particular example. } \note{ More flexible plotting is available using the \code{\link[pdp]{partial}} and \code{\link[pdp]{plotPartial}} functions. } \references{ J. H. Friedman (2001). "Greedy Function Approximation: A Gradient Boosting Machine," Annals of Statistics 29(4). B. M. Greenwell (2017). "pdp: An R Package for Constructing Partial Dependence Plots," The R Journal 9(1), 421--436. \url{https://journal.r-project.org/archive/2017/RJ-2017-016/index.html}. } \seealso{ \code{\link[pdp]{partial}}, \code{\link[pdp]{plotPartial}}, \code{\link{gbm}}, and \code{\link{gbm.object}}. } gbm/man/print.gbm.Rd0000644000176200001440000000425013346511223013762 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/print.gbm.R \name{print.gbm} \alias{print.gbm} \alias{show.gbm} \title{Print model summary} \usage{ \method{print}{gbm}(x, ...) show.gbm(x, ...) } \arguments{ \item{x}{an object of class \code{gbm}.} \item{\dots}{arguments passed to \code{print.default}.} } \description{ Display basic information about a \code{gbm} object. } \details{ Prints some information about the model object. In particular, this method prints the call to \code{gbm()}, the type of loss function that was used, and the total number of iterations. If cross-validation was performed, the 'best' number of trees as estimated by cross-validation error is displayed. If a test set was used, the 'best' number of trees as estimated by the test set error is displayed. The number of available predictors, and the number of those having non-zero influence on predictions is given (which might be interesting in data mining applications). If multinomial, bernoulli or adaboost was used, the confusion matrix and prediction accuracy are printed (objects being allocated to the class with highest probability for multinomial and bernoulli). These classifications are performed on the entire training data using the model with the 'best' number of trees as described above, or the maximum number of trees if the 'best' cannot be computed. If the 'distribution' was specified as gaussian, laplace, quantile or t-distribution, a summary of the residuals is displayed. The residuals are for the training data with the model at the 'best' number of trees, as described above, or the maximum number of trees if the 'best' cannot be computed. } \examples{ data(iris) iris.mod <- gbm(Species ~ ., distribution="multinomial", data=iris, n.trees=2000, shrinkage=0.01, cv.folds=5, verbose=FALSE, n.cores=1) iris.mod #data(lung) #lung.mod <- gbm(Surv(time, status) ~ ., distribution="coxph", data=lung, # n.trees=2000, shrinkage=0.01, cv.folds=5,verbose =FALSE) #lung.mod } \seealso{ \code{\link{gbm}} } \author{ Harry Southworth, Daniel Edwards } \keyword{models} \keyword{nonlinear} \keyword{nonparametric} \keyword{survival} gbm/man/gbm.object.Rd0000644000176200001440000000463513346511223014103 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/gbm.object.R \name{gbm.object} \alias{gbm.object} \title{Generalized Boosted Regression Model Object} \value{ \item{initF}{the "intercept" term, the initial predicted value to which trees make adjustments} \item{fit}{a vector containing the fitted values on the scale of regression function (e.g. log-odds scale for bernoulli, log scale for poisson)} \item{train.error}{a vector of length equal to the number of fitted trees containing the value of the loss function for each boosting iteration evaluated on the training data} \item{valid.error}{a vector of length equal to the number of fitted trees containing the value of the loss function for each boosting iteration evaluated on the validation data} \item{cv.error}{if \code{cv.folds}<2 this component is NULL. Otherwise, this component is a vector of length equal to the number of fitted trees containing a cross-validated estimate of the loss function for each boosting iteration} \item{oobag.improve}{a vector of length equal to the number of fitted trees containing an out-of-bag estimate of the marginal reduction in the expected value of the loss function. The out-of-bag estimate uses only the training data and is useful for estimating the optimal number of boosting iterations. See \code{\link{gbm.perf}}} \item{trees}{a list containing the tree structures. The components are best viewed using \code{\link{pretty.gbm.tree}}} \item{c.splits}{a list of all the categorical splits in the collection of trees. If the \code{trees[[i]]} component of a \code{gbm} object describes a categorical split then the splitting value will refer to a component of \code{c.splits}. That component of \code{c.splits} will be a vector of length equal to the number of levels in the categorical split variable. -1 indicates left, +1 indicates right, and 0 indicates that the level was not present in the training data} \item{cv.fitted}{If cross-validation was performed, the cross-validation predicted values on the scale of the linear predictor. That is, the fitted values from the ith CV-fold, for the model having been trained on the data in all other folds.} } \description{ These are objects representing fitted \code{gbm}s. } \section{Structure}{ The following components must be included in a legitimate \code{gbm} object. } \seealso{ \code{\link{gbm}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{methods} gbm/man/gbmCrossVal.Rd0000644000176200001440000000570113346511223014306 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/gbmCrossVal.R \name{gbmCrossVal} \alias{gbmCrossVal} \alias{gbmCrossValModelBuild} \alias{gbmDoFold} \alias{gbmCrossValErr} \alias{gbmCrossValPredictions} \title{Cross-validate a gbm} \usage{ gbmCrossVal(cv.folds, nTrain, n.cores, class.stratify.cv, data, x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth, n.minobsinnode, shrinkage, bag.fraction, var.names, response.name, group) gbmCrossValErr(cv.models, cv.folds, cv.group, nTrain, n.trees) gbmCrossValPredictions(cv.models, cv.folds, cv.group, best.iter.cv, distribution, data, y) gbmCrossValModelBuild(cv.folds, cv.group, n.cores, i.train, x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth, n.minobsinnode, shrinkage, bag.fraction, var.names, response.name, group) gbmDoFold(X, i.train, x, y, offset, distribution, w, var.monotone, n.trees, interaction.depth, n.minobsinnode, shrinkage, bag.fraction, cv.group, var.names, response.name, group, s) } \arguments{ \item{cv.folds}{The number of cross-validation folds.} \item{nTrain}{The number of training samples.} \item{n.cores}{The number of cores to use.} \item{class.stratify.cv}{Whether or not stratified cross-validation samples are used.} \item{data}{The data.} \item{x}{The model matrix.} \item{y}{The response variable.} \item{offset}{The offset.} \item{distribution}{The type of loss function. See \code{\link{gbm}}.} \item{w}{Observation weights.} \item{var.monotone}{See \code{\link{gbm}}.} \item{n.trees}{The number of trees to fit.} \item{interaction.depth}{The degree of allowed interactions. See \code{\link{gbm}}.} \item{n.minobsinnode}{See \code{\link{gbm}}.} \item{shrinkage}{See \code{\link{gbm}}.} \item{bag.fraction}{See \code{\link{gbm}}.} \item{var.names}{See \code{\link{gbm}}.} \item{response.name}{See \code{\link{gbm}}.} \item{group}{Used when \code{distribution = "pairwise"}. See \code{\link{gbm}}.} \item{cv.models}{A list containing the models for each fold.} \item{cv.group}{A vector indicating the cross-validation fold for each member of the training set.} \item{best.iter.cv}{The iteration with lowest cross-validation error.} \item{i.train}{Items in the training set.} \item{X}{Index (cross-validation fold) on which to subset.} \item{s}{Random seed.} } \value{ A list containing the cross-validation error and predictions. } \description{ Functions for cross-validating gbm. These functions are used internally and are not intended for end-user direct usage. } \details{ These functions are not intended for end-user direct usage, but are used internally by \code{gbm}. } \references{ J.H. Friedman (2001). "Greedy Function Approximation: A Gradient Boosting Machine," Annals of Statistics 29(5):1189-1232. L. Breiman (2001). \url{https://www.stat.berkeley.edu/users/breiman/randomforest2001.pdf}. } \seealso{ \code{\link{gbm}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{models} gbm/man/pretty.gbm.tree.Rd0000644000176200001440000000331113346511223015110 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/pretty.gbm.tree.R \name{pretty.gbm.tree} \alias{pretty.gbm.tree} \title{Print gbm tree components} \usage{ \method{pretty}{gbm.tree}(object, i.tree = 1) } \arguments{ \item{object}{a \code{\link{gbm.object}} initially fit using \code{\link{gbm}}} \item{i.tree}{the index of the tree component to extract from \code{object} and display} } \value{ \code{pretty.gbm.tree} returns a data frame. Each row corresponds to a node in the tree. Columns indicate \item{SplitVar}{index of which variable is used to split. -1 indicates a terminal node.} \item{SplitCodePred}{if the split variable is continuous then this component is the split point. If the split variable is categorical then this component contains the index of \code{object$c.split} that describes the categorical split. If the node is a terminal node then this is the prediction.} \item{LeftNode}{the index of the row corresponding to the left node.} \item{RightNode}{the index of the row corresponding to the right node.} \item{ErrorReduction}{the reduction in the loss function as a result of splitting this node.} \item{Weight}{the total weight of observations in the node. If weights are all equal to 1 then this is the number of observations in the node.} } \description{ \code{gbm} stores the collection of trees used to construct the model in a compact matrix structure. This function extracts the information from a single tree and displays it in a slightly more readable form. This function is mostly for debugging purposes and to satisfy some users' curiosity. } \seealso{ \code{\link{gbm}}, \code{\link{gbm.object}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{print} gbm/man/relative.influence.Rd0000644000176200001440000000427313346511223015651 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/relative.influence.R \name{relative.influence} \alias{relative.influence} \alias{permutation.test.gbm} \alias{gbm.loss} \title{Methods for estimating relative influence} \usage{ relative.influence(object, n.trees, scale. = FALSE, sort. = FALSE) permutation.test.gbm(object, n.trees) gbm.loss(y, f, w, offset, dist, baseline, group = NULL, max.rank = NULL) } \arguments{ \item{object}{a \code{gbm} object created from an initial call to \code{\link{gbm}}.} \item{n.trees}{the number of trees to use for computations. If not provided, the the function will guess: if a test set was used in fitting, the number of trees resulting in lowest test set error will be used; otherwise, if cross-validation was performed, the number of trees resulting in lowest cross-validation error will be used; otherwise, all trees will be used.} \item{scale.}{whether or not the result should be scaled. Defaults to \code{FALSE}.} \item{sort.}{whether or not the results should be (reverse) sorted. Defaults to \code{FALSE}.} \item{y, f, w, offset, dist, baseline}{For \code{gbm.loss}: These components are the outcome, predicted value, observation weight, offset, distribution, and comparison loss function, respectively.} \item{group, max.rank}{Used internally when \code{distribution = \'pairwise\'}.} } \value{ By default, returns an unprocessed vector of estimated relative influences. If the \code{scale.} and \code{sort.} arguments are used, returns a processed version of the same. } \description{ Helper functions for computing the relative influence of each variable in the gbm object. } \details{ This is not intended for end-user use. These functions offer the different methods for computing the relative influence in \code{\link{summary.gbm}}. \code{gbm.loss} is a helper function for \code{permutation.test.gbm}. } \references{ J.H. Friedman (2001). "Greedy Function Approximation: A Gradient Boosting Machine," Annals of Statistics 29(5):1189-1232. L. Breiman (2001). \url{https://www.stat.berkeley.edu/users/breiman/randomforest2001.pdf}. } \seealso{ \code{\link{summary.gbm}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{hplot} gbm/man/gbm.perf.Rd0000644000176200001440000000363413346511223013567 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/gbm.perf.R \name{gbm.perf} \alias{gbm.perf} \title{GBM performance} \usage{ gbm.perf(object, plot.it = TRUE, oobag.curve = FALSE, overlay = TRUE, method) } \arguments{ \item{object}{A \code{\link{gbm.object}} created from an initial call to \code{\link{gbm}}.} \item{plot.it}{An indicator of whether or not to plot the performance measures. Setting \code{plot.it = TRUE} creates two plots. The first plot plots \code{object$train.error} (in black) and \code{object$valid.error} (in red) versus the iteration number. The scale of the error measurement, shown on the left vertical axis, depends on the \code{distribution} argument used in the initial call to \code{\link{gbm}}.} \item{oobag.curve}{Indicates whether to plot the out-of-bag performance measures in a second plot.} \item{overlay}{If TRUE and oobag.curve=TRUE then a right y-axis is added to the training and test error plot and the estimated cumulative improvement in the loss function is plotted versus the iteration number.} \item{method}{Indicate the method used to estimate the optimal number of boosting iterations. \code{method = "OOB"} computes the out-of-bag estimate and \code{method = "test"} uses the test (or validation) dataset to compute an out-of-sample estimate. \code{method = "cv"} extracts the optimal number of iterations using cross-validation if \code{gbm} was called with \code{cv.folds} > 1.} } \value{ \code{gbm.perf} Returns the estimated optimal number of iterations. The method of computation depends on the \code{method} argument. } \description{ Estimates the optimal number of boosting iterations for a \code{gbm} object and optionally plots various performance measures } \seealso{ \code{\link{gbm}}, \code{\link{gbm.object}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{nonlinear} \keyword{nonparametric} \keyword{survival} \keyword{tree} gbm/man/grid.arrange.Rd0000644000176200001440000000046313346511223014427 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/utils.R \name{grid.arrange} \alias{grid.arrange} \title{Arrange multiple grobs on a page} \usage{ grid.arrange(..., newpage = TRUE) } \description{ See \code{\link[gridExtra]{grid.arrange}} for more details. } \keyword{internal} gbm/man/calibrate.plot.Rd0000644000176200001440000000530313346511223014765 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/calibrate.plot.R \name{calibrate.plot} \alias{calibrate.plot} \title{Calibration plot} \usage{ calibrate.plot(y, p, distribution = "bernoulli", replace = TRUE, line.par = list(col = "black"), shade.col = "lightyellow", shade.density = NULL, rug.par = list(side = 1), xlab = "Predicted value", ylab = "Observed average", xlim = NULL, ylim = NULL, knots = NULL, df = 6, ...) } \arguments{ \item{y}{The outcome 0-1 variable.} \item{p}{The predictions estimating E(y|x).} \item{distribution}{The loss function used in creating \code{p}. \code{bernoulli} and \code{poisson} are currently the only special options. All others default to squared error assuming \code{gaussian}.} \item{replace}{Determines whether this plot will replace or overlay the current plot. \code{replace=FALSE} is useful for comparing the calibration of several methods.} \item{line.par}{Graphics parameters for the line.} \item{shade.col}{Color for shading the 2 SE region. \code{shade.col=NA} implies no 2 SE region.} \item{shade.density}{The \code{density} parameter for \code{\link{polygon}}.} \item{rug.par}{Graphics parameters passed to \code{\link{rug}}.} \item{xlab}{x-axis label corresponding to the predicted values.} \item{ylab}{y-axis label corresponding to the observed average.} \item{xlim, ylim}{x- and y-axis limits. If not specified te function will select limits.} \item{knots, df}{These parameters are passed directly to \code{\link[splines]{ns}} for constructing a natural spline smoother for the calibration curve.} \item{...}{Additional optional arguments to be passed onto \code{\link[graphics]{plot}}} } \value{ No return values. } \description{ An experimental diagnostic tool that plots the fitted values versus the actual average values. Currently only available when \code{distribution = "bernoulli"}. } \details{ Uses natural splines to estimate E(y|p). Well-calibrated predictions imply that E(y|p) = p. The plot also includes a pointwise 95% confidence band. } \examples{ # Don't want R CMD check to think there is a dependency on rpart # so comment out the example #library(rpart) #data(kyphosis) #y <- as.numeric(kyphosis$Kyphosis)-1 #x <- kyphosis$Age #glm1 <- glm(y~poly(x,2),family=binomial) #p <- predict(glm1,type="response") #calibrate.plot(y, p, xlim=c(0,0.6), ylim=c(0,0.6)) } \references{ J.F. Yates (1982). "External correspondence: decomposition of the mean probability score," Organisational Behaviour and Human Performance 30:132-156. D.J. Spiegelhalter (1986). "Probabilistic Prediction in Patient Management and Clinical Trials," Statistics in Medicine 5:421-433. } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{hplot} gbm/man/test.gbm.Rd0000644000176200001440000000177513346511223013616 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/test.gbm.R \name{test.gbm} \alias{test.gbm} \alias{validate.gbm} \alias{test.relative.influence} \title{Test the \code{gbm} package.} \usage{ test.gbm() } \value{ An object of class \code{RUnitTestData}. See the help for \code{RUnit} for details. } \description{ Run tests on \code{gbm} functions to perform logical checks and reproducibility. } \details{ The function uses functionality in the \code{RUnit} package. A fairly small validation suite is executed that checks to see that relative influence identifies sensible variables from simulated data, and that predictions from GBMs with Gaussian, Cox or binomial distributions are sensible, } \note{ The test suite is not comprehensive. } \examples{ # Uncomment the following lines to run - commented out to make CRAN happy #library(RUnit) #val <- validate.texmex() #printHTMLProtocol(val, "texmexReport.html") } \seealso{ \code{\link{gbm}} } \author{ Harry Southworth } \keyword{models} gbm/man/quantile.rug.Rd0000644000176200001440000000141313346511223014476 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/calibrate.plot.R \name{quantile.rug} \alias{quantile.rug} \title{Quantile rug plot} \usage{ \method{quantile}{rug}(x, prob = 0:10/10, ...) } \arguments{ \item{x}{A numeric vector.} \item{prob}{The quantiles of x to mark on the x-axis.} \item{...}{Additional optional arguments to be passed onto \code{\link[graphics]{rug}}} } \value{ No return values. } \description{ Marks the quantiles on the axes of the current plot. } \examples{ x <- rnorm(100) y <- rnorm(100) plot(x, y) quantile.rug(x) } \seealso{ \code{\link[graphics]{plot}}, \code{\link[stats]{quantile}}, \code{\link[base]{jitter}}, \code{\link[graphics]{rug}}. } \author{ Greg Ridgeway \email{gregridgeway@gmail.com}. } \keyword{aplot} gbm/man/reconstructGBMdata.Rd0000644000176200001440000000123313346511223015613 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/reconstructGBMdata.R \name{reconstructGBMdata} \alias{reconstructGBMdata} \title{Reconstruct a GBM's Source Data} \usage{ reconstructGBMdata(x) } \arguments{ \item{x}{a \code{\link{gbm.object}} initially fit using \code{\link{gbm}}} } \value{ Returns a data used to fit the gbm in a format that can subsequently be used for plots and summaries } \description{ Helper function to reconstitute the data for plots and summaries. This function is not intended for the user to call directly. } \seealso{ \code{\link{gbm}}, \code{\link{gbm.object}} } \author{ Harry Southworth } \keyword{manip} gbm/man/gbm.roc.area.Rd0000644000176200001440000000366213346511223014326 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/ir.measures.R \name{gbm.roc.area} \alias{gbm.roc.area} \alias{gbm.conc} \alias{ir.measure.conc} \alias{ir.measure.auc} \alias{ir.measure.mrr} \alias{ir.measure.map} \alias{ir.measure.ndcg} \alias{perf.pairwise} \title{Compute Information Retrieval measures.} \usage{ gbm.roc.area(obs, pred) gbm.conc(x) ir.measure.conc(y.f, max.rank = 0) ir.measure.auc(y.f, max.rank = 0) ir.measure.mrr(y.f, max.rank) ir.measure.map(y.f, max.rank = 0) ir.measure.ndcg(y.f, max.rank) perf.pairwise(y, f, group, metric = "ndcg", w = NULL, max.rank = 0) } \arguments{ \item{obs}{Observed value.} \item{pred}{Predicted value.} \item{x}{?.} \item{y, y.f, f, w, group, max.rank}{Used internally.} \item{metric}{What type of performance measure to compute.} } \value{ The requested performance measure. } \description{ Functions to compute Information Retrieval measures for pairwise loss for a single group. The function returns the respective metric, or a negative value if it is undefined for the given group. } \details{ For simplicity, we have no special handling for ties; instead, we break ties randomly. This is slightly inaccurate for individual groups, but should have only a small effect on the overall measure. \code{gbm.conc} computes the concordance index: Fraction of all pairs (i,j) with i Define data, use random, ##-- or do help(data=index) for the standard data sets. } \references{ C. Burges (2010). "From RankNet to LambdaRank to LambdaMART: An Overview", Microsoft Research Technical Report MSR-TR-2010-82. } \seealso{ \code{\link{gbm}} } \author{ Stefan Schroedl } \keyword{models} gbm/man/interact.gbm.Rd0000644000176200001440000000400213346511223014432 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/interact.gbm.R \name{interact.gbm} \alias{interact.gbm} \title{Estimate the strength of interaction effects} \usage{ interact.gbm(x, data, i.var = 1, n.trees = x$n.trees) } \arguments{ \item{x}{A \code{\link{gbm.object}} fitted using a call to \code{\link{gbm}}.} \item{data}{The dataset used to construct \code{x}. If the original dataset is large, a random subsample may be used to accelerate the computation in \code{interact.gbm}.} \item{i.var}{A vector of indices or the names of the variables for compute the interaction effect. If using indices, the variables are indexed in the same order that they appear in the initial \code{gbm} formula.} \item{n.trees}{The number of trees used to generate the plot. Only the first \code{n.trees} trees will be used.} } \value{ Returns the value of \eqn{H}. } \description{ Computes Friedman's H-statistic to assess the strength of variable interactions. } \details{ \code{interact.gbm} computes Friedman's H-statistic to assess the relative strength of interaction effects in non-linear models. H is on the scale of [0-1] with higher values indicating larger interaction effects. To connect to a more familiar measure, if \eqn{x_1} and \eqn{x_2} are uncorrelated covariates with mean 0 and variance 1 and the model is of the form \deqn{y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3} then \deqn{H=\frac{\beta_3}{\sqrt{\beta_1^2+\beta_2^2+\beta_3^2}}} Note that if the main effects are weak, the estimated H will be unstable. For example, if (in the case of a two-way interaction) neither main effect is in the selected model (relative influence is zero), the result will be 0/0. Also, with weak main effects, rounding errors can result in values of H > 1 which are not possible. } \references{ J.H. Friedman and B.E. Popescu (2005). \dQuote{Predictive Learning via Rule Ensembles.} Section 8.1 } \seealso{ \code{\link{gbm}}, \code{\link{gbm.object}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{methods} gbm/man/gbm.fit.Rd0000644000176200001440000002472113346511223013415 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/gbm.fit.R \name{gbm.fit} \alias{gbm.fit} \title{Generalized Boosted Regression Modeling (GBM)} \usage{ gbm.fit(x, y, offset = NULL, misc = NULL, distribution = "bernoulli", w = NULL, var.monotone = NULL, n.trees = 100, interaction.depth = 1, n.minobsinnode = 10, shrinkage = 0.001, bag.fraction = 0.5, nTrain = NULL, train.fraction = NULL, keep.data = TRUE, verbose = TRUE, var.names = NULL, response.name = "y", group = NULL) } \arguments{ \item{x}{A data frame or matrix containing the predictor variables. The number of rows in \code{x} must be the same as the length of \code{y}.} \item{y}{A vector of outcomes. The number of rows in \code{x} must be the same as the length of \code{y}.} \item{offset}{A vector of offset values.} \item{misc}{An R object that is simply passed on to the gbm engine. It can be used for additional data for the specific distribution. Currently it is only used for passing the censoring indicator for the Cox proportional hazards model.} \item{distribution}{Either a character string specifying the name of the distribution to use or a list with a component \code{name} specifying the distribution and any additional parameters needed. If not specified, \code{gbm} will try to guess: if the response has only 2 unique values, bernoulli is assumed; otherwise, if the response is a factor, multinomial is assumed; otherwise, if the response has class \code{"Surv"}, coxph is assumed; otherwise, gaussian is assumed. Currently available options are \code{"gaussian"} (squared error), \code{"laplace"} (absolute loss), \code{"tdist"} (t-distribution loss), \code{"bernoulli"} (logistic regression for 0-1 outcomes), \code{"huberized"} (huberized hinge loss for 0-1 outcomes), classes), \code{"adaboost"} (the AdaBoost exponential loss for 0-1 outcomes), \code{"poisson"} (count outcomes), \code{"coxph"} (right censored observations), \code{"quantile"}, or \code{"pairwise"} (ranking measure using the LambdaMart algorithm). If quantile regression is specified, \code{distribution} must be a list of the form \code{list(name = "quantile", alpha = 0.25)} where \code{alpha} is the quantile to estimate. The current version's quantile regression method does not handle non-constant weights and will stop. If \code{"tdist"} is specified, the default degrees of freedom is 4 and this can be controlled by specifying \code{distribution = list(name = "tdist", df = DF)} where \code{DF} is your chosen degrees of freedom. If "pairwise" regression is specified, \code{distribution} must be a list of the form \code{list(name="pairwise",group=...,metric=...,max.rank=...)} (\code{metric} and \code{max.rank} are optional, see below). \code{group} is a character vector with the column names of \code{data} that jointly indicate the group an instance belongs to (typically a query in Information Retrieval applications). For training, only pairs of instances from the same group and with different target labels can be considered. \code{metric} is the IR measure to use, one of \describe{ \item{list("conc")}{Fraction of concordant pairs; for binary labels, this is equivalent to the Area under the ROC Curve} \item{:}{Fraction of concordant pairs; for binary labels, this is equivalent to the Area under the ROC Curve} \item{list("mrr")}{Mean reciprocal rank of the highest-ranked positive instance} \item{:}{Mean reciprocal rank of the highest-ranked positive instance} \item{list("map")}{Mean average precision, a generalization of \code{mrr} to multiple positive instances}\item{:}{Mean average precision, a generalization of \code{mrr} to multiple positive instances} \item{list("ndcg:")}{Normalized discounted cumulative gain. The score is the weighted sum (DCG) of the user-supplied target values, weighted by log(rank+1), and normalized to the maximum achievable value. This is the default if the user did not specify a metric.} } \code{ndcg} and \code{conc} allow arbitrary target values, while binary targets {0,1} are expected for \code{map} and \code{mrr}. For \code{ndcg} and \code{mrr}, a cut-off can be chosen using a positive integer parameter \code{max.rank}. If left unspecified, all ranks are taken into account. Note that splitting of instances into training and validation sets follows group boundaries and therefore only approximates the specified \code{train.fraction} ratio (the same applies to cross-validation folds). Internally, queries are randomly shuffled before training, to avoid bias. Weights can be used in conjunction with pairwise metrics, however it is assumed that they are constant for instances from the same group. For details and background on the algorithm, see e.g. Burges (2010).} \item{w}{A vector of weights of the same length as the \code{y}.} \item{var.monotone}{an optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome.} \item{n.trees}{the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion.} \item{interaction.depth}{The maximum depth of variable interactions. A value of 1 implies an additive model, a value of 2 implies a model with up to 2-way interactions, etc. Default is \code{1}.} \item{n.minobsinnode}{Integer specifying the minimum number of observations in the trees terminal nodes. Note that this is the actual number of observations not the total weight.} \item{shrinkage}{The shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction; 0.001 to 0.1 usually work, but a smaller learning rate typically requires more trees. Default is \code{0.1}.} \item{bag.fraction}{The fraction of the training set observations randomly selected to propose the next tree in the expansion. This introduces randomnesses into the model fit. If \code{bag.fraction} < 1 then running the same model twice will result in similar but different fits. \code{gbm} uses the R random number generator so \code{set.seed} can ensure that the model can be reconstructed. Preferably, the user can save the returned \code{\link{gbm.object}} using \code{\link{save}}. Default is \code{0.5}.} \item{nTrain}{An integer representing the number of cases on which to train. This is the preferred way of specification for \code{gbm.fit}; The option \code{train.fraction} in \code{gbm.fit} is deprecated and only maintained for backward compatibility. These two parameters are mutually exclusive. If both are unspecified, all data is used for training.} \item{train.fraction}{The first \code{train.fraction * nrows(data)} observations are used to fit the \code{gbm} and the remainder are used for computing out-of-sample estimates of the loss function.} \item{keep.data}{Logical indicating whether or not to keep the data and an index of the data stored with the object. Keeping the data and index makes subsequent calls to \code{\link{gbm.more}} faster at the cost of storing an extra copy of the dataset.} \item{verbose}{Logical indicating whether or not to print out progress and performance indicators (\code{TRUE}). If this option is left unspecified for \code{gbm.more}, then it uses \code{verbose} from \code{object}. Default is \code{FALSE}.} \item{var.names}{Vector of strings of length equal to the number of columns of \code{x} containing the names of the predictor variables.} \item{response.name}{Character string label for the response variable.} \item{group}{The \code{group} to use when \code{distribution = "pairwise"}.} } \value{ A \code{\link{gbm.object}} object. } \description{ Workhorse function providing the link between R and the C++ gbm engine. \code{gbm} is a front-end to \code{gbm.fit} that uses the familiar R modeling formulas. However, \code{\link[stats]{model.frame}} is very slow if there are many predictor variables. For power-users with many variables use \code{gbm.fit}. For general practice \code{gbm} is preferable. } \details{ This package implements the generalized boosted modeling framework. Boosting is the process of iteratively adding basis functions in a greedy fashion so that each additional basis function further reduces the selected loss function. This implementation closely follows Friedman's Gradient Boosting Machine (Friedman, 2001). In addition to many of the features documented in the Gradient Boosting Machine, \code{gbm} offers additional features including the out-of-bag estimator for the optimal number of iterations, the ability to store and manipulate the resulting \code{gbm} object, and a variety of other loss functions that had not previously had associated boosting algorithms, including the Cox partial likelihood for censored data, the poisson likelihood for count outcomes, and a gradient boosting implementation to minimize the AdaBoost exponential loss function. } \references{ Y. Freund and R.E. Schapire (1997) \dQuote{A decision-theoretic generalization of on-line learning and an application to boosting,} \emph{Journal of Computer and System Sciences,} 55(1):119-139. G. Ridgeway (1999). \dQuote{The state of boosting,} \emph{Computing Science and Statistics} 31:172-181. J.H. Friedman, T. Hastie, R. Tibshirani (2000). \dQuote{Additive Logistic Regression: a Statistical View of Boosting,} \emph{Annals of Statistics} 28(2):337-374. J.H. Friedman (2001). \dQuote{Greedy Function Approximation: A Gradient Boosting Machine,} \emph{Annals of Statistics} 29(5):1189-1232. J.H. Friedman (2002). \dQuote{Stochastic Gradient Boosting,} \emph{Computational Statistics and Data Analysis} 38(4):367-378. B. Kriegler (2007). Cost-Sensitive Stochastic Gradient Boosting Within a Quantitative Regression Framework. Ph.D. Dissertation. University of California at Los Angeles, Los Angeles, CA, USA. Advisor(s) Richard A. Berk. url{https://dl.acm.org/citation.cfm?id=1354603}. C. Burges (2010). \dQuote{From RankNet to LambdaRank to LambdaMART: An Overview,} Microsoft Research Technical Report MSR-TR-2010-82. } \seealso{ \code{\link{gbm.object}}, \code{\link{gbm.perf}}, \code{\link{plot.gbm}}, \code{\link{predict.gbm}}, \code{\link{summary.gbm}}, and \code{\link{pretty.gbm.tree}}. } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} Quantile regression code developed by Brian Kriegler \email{bk@stat.ucla.edu} t-distribution, and multinomial code developed by Harry Southworth and Daniel Edwards Pairwise code developed by Stefan Schroedl \email{schroedl@a9.com} } gbm/man/gbm.Rd0000644000176200001440000003535613414472361012647 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/gbm.R \name{gbm} \alias{gbm} \title{Generalized Boosted Regression Modeling (GBM)} \usage{ gbm(formula = formula(data), distribution = "bernoulli", data = list(), weights, var.monotone = NULL, n.trees = 100, interaction.depth = 1, n.minobsinnode = 10, shrinkage = 0.1, bag.fraction = 0.5, train.fraction = 1, cv.folds = 0, keep.data = TRUE, verbose = FALSE, class.stratify.cv = NULL, n.cores = NULL) } \arguments{ \item{formula}{A symbolic description of the model to be fit. The formula may include an offset term (e.g. y~offset(n)+x). If \code{keep.data = FALSE} in the initial call to \code{gbm} then it is the user's responsibility to resupply the offset to \code{\link{gbm.more}}.} \item{distribution}{Either a character string specifying the name of the distribution to use or a list with a component \code{name} specifying the distribution and any additional parameters needed. If not specified, \code{gbm} will try to guess: if the response has only 2 unique values, bernoulli is assumed; otherwise, if the response is a factor, multinomial is assumed; otherwise, if the response has class \code{"Surv"}, coxph is assumed; otherwise, gaussian is assumed. Currently available options are \code{"gaussian"} (squared error), \code{"laplace"} (absolute loss), \code{"tdist"} (t-distribution loss), \code{"bernoulli"} (logistic regression for 0-1 outcomes), \code{"huberized"} (huberized hinge loss for 0-1 outcomes), classes), \code{"adaboost"} (the AdaBoost exponential loss for 0-1 outcomes), \code{"poisson"} (count outcomes), \code{"coxph"} (right censored observations), \code{"quantile"}, or \code{"pairwise"} (ranking measure using the LambdaMart algorithm). If quantile regression is specified, \code{distribution} must be a list of the form \code{list(name = "quantile", alpha = 0.25)} where \code{alpha} is the quantile to estimate. The current version's quantile regression method does not handle non-constant weights and will stop. If \code{"tdist"} is specified, the default degrees of freedom is 4 and this can be controlled by specifying \code{distribution = list(name = "tdist", df = DF)} where \code{DF} is your chosen degrees of freedom. If "pairwise" regression is specified, \code{distribution} must be a list of the form \code{list(name="pairwise",group=...,metric=...,max.rank=...)} (\code{metric} and \code{max.rank} are optional, see below). \code{group} is a character vector with the column names of \code{data} that jointly indicate the group an instance belongs to (typically a query in Information Retrieval applications). For training, only pairs of instances from the same group and with different target labels can be considered. \code{metric} is the IR measure to use, one of \describe{ \item{list("conc")}{Fraction of concordant pairs; for binary labels, this is equivalent to the Area under the ROC Curve} \item{:}{Fraction of concordant pairs; for binary labels, this is equivalent to the Area under the ROC Curve} \item{list("mrr")}{Mean reciprocal rank of the highest-ranked positive instance} \item{:}{Mean reciprocal rank of the highest-ranked positive instance} \item{list("map")}{Mean average precision, a generalization of \code{mrr} to multiple positive instances}\item{:}{Mean average precision, a generalization of \code{mrr} to multiple positive instances} \item{list("ndcg:")}{Normalized discounted cumulative gain. The score is the weighted sum (DCG) of the user-supplied target values, weighted by log(rank+1), and normalized to the maximum achievable value. This is the default if the user did not specify a metric.} } \code{ndcg} and \code{conc} allow arbitrary target values, while binary targets {0,1} are expected for \code{map} and \code{mrr}. For \code{ndcg} and \code{mrr}, a cut-off can be chosen using a positive integer parameter \code{max.rank}. If left unspecified, all ranks are taken into account. Note that splitting of instances into training and validation sets follows group boundaries and therefore only approximates the specified \code{train.fraction} ratio (the same applies to cross-validation folds). Internally, queries are randomly shuffled before training, to avoid bias. Weights can be used in conjunction with pairwise metrics, however it is assumed that they are constant for instances from the same group. For details and background on the algorithm, see e.g. Burges (2010).} \item{data}{an optional data frame containing the variables in the model. By default the variables are taken from \code{environment(formula)}, typically the environment from which \code{gbm} is called. If \code{keep.data=TRUE} in the initial call to \code{gbm} then \code{gbm} stores a copy with the object. If \code{keep.data=FALSE} then subsequent calls to \code{\link{gbm.more}} must resupply the same dataset. It becomes the user's responsibility to resupply the same data at this point.} \item{weights}{an optional vector of weights to be used in the fitting process. Must be positive but do not need to be normalized. If \code{keep.data=FALSE} in the initial call to \code{gbm} then it is the user's responsibility to resupply the weights to \code{\link{gbm.more}}.} \item{var.monotone}{an optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome.} \item{n.trees}{Integer specifying the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. Default is 100.} \item{interaction.depth}{Integer specifying the maximum depth of each tree (i.e., the highest level of variable interactions allowed). A value of 1 implies an additive model, a value of 2 implies a model with up to 2-way interactions, etc. Default is 1.} \item{n.minobsinnode}{Integer specifying the minimum number of observations in the terminal nodes of the trees. Note that this is the actual number of observations, not the total weight.} \item{shrinkage}{a shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction; 0.001 to 0.1 usually work, but a smaller learning rate typically requires more trees. Default is 0.1.} \item{bag.fraction}{the fraction of the training set observations randomly selected to propose the next tree in the expansion. This introduces randomnesses into the model fit. If \code{bag.fraction} < 1 then running the same model twice will result in similar but different fits. \code{gbm} uses the R random number generator so \code{set.seed} can ensure that the model can be reconstructed. Preferably, the user can save the returned \code{\link{gbm.object}} using \code{\link{save}}. Default is 0.5.} \item{train.fraction}{The first \code{train.fraction * nrows(data)} observations are used to fit the \code{gbm} and the remainder are used for computing out-of-sample estimates of the loss function.} \item{cv.folds}{Number of cross-validation folds to perform. If \code{cv.folds}>1 then \code{gbm}, in addition to the usual fit, will perform a cross-validation, calculate an estimate of generalization error returned in \code{cv.error}.} \item{keep.data}{a logical variable indicating whether to keep the data and an index of the data stored with the object. Keeping the data and index makes subsequent calls to \code{\link{gbm.more}} faster at the cost of storing an extra copy of the dataset.} \item{verbose}{Logical indicating whether or not to print out progress and performance indicators (\code{TRUE}). If this option is left unspecified for \code{gbm.more}, then it uses \code{verbose} from \code{object}. Default is \code{FALSE}.} \item{class.stratify.cv}{Logical indicating whether or not the cross-validation should be stratified by class. Defaults to \code{TRUE} for \code{distribution = "multinomial"} and is only implemented for \code{"multinomial"} and \code{"bernoulli"}. The purpose of stratifying the cross-validation is to help avoiding situations in which training sets do not contain all classes.} \item{n.cores}{The number of CPU cores to use. The cross-validation loop will attempt to send different CV folds off to different cores. If \code{n.cores} is not specified by the user, it is guessed using the \code{detectCores} function in the \code{parallel} package. Note that the documentation for \code{detectCores} makes clear that it is not failsafe and could return a spurious number of available cores.} } \value{ A \code{\link{gbm.object}} object. } \description{ Fits generalized boosted regression models. For technical details, see the vignette: \code{utils::browseVignettes("gbm")}. } \details{ \code{gbm.fit} provides the link between R and the C++ gbm engine. \code{gbm} is a front-end to \code{gbm.fit} that uses the familiar R modeling formulas. However, \code{\link[stats]{model.frame}} is very slow if there are many predictor variables. For power-users with many variables use \code{gbm.fit}. For general practice \code{gbm} is preferable. This package implements the generalized boosted modeling framework. Boosting is the process of iteratively adding basis functions in a greedy fashion so that each additional basis function further reduces the selected loss function. This implementation closely follows Friedman's Gradient Boosting Machine (Friedman, 2001). In addition to many of the features documented in the Gradient Boosting Machine, \code{gbm} offers additional features including the out-of-bag estimator for the optimal number of iterations, the ability to store and manipulate the resulting \code{gbm} object, and a variety of other loss functions that had not previously had associated boosting algorithms, including the Cox partial likelihood for censored data, the poisson likelihood for count outcomes, and a gradient boosting implementation to minimize the AdaBoost exponential loss function. } \examples{ # # A least squares regression example # # Simulate data set.seed(101) # for reproducibility N <- 1000 X1 <- runif(N) X2 <- 2 * runif(N) X3 <- ordered(sample(letters[1:4], N, replace = TRUE), levels = letters[4:1]) X4 <- factor(sample(letters[1:6], N, replace = TRUE)) X5 <- factor(sample(letters[1:3], N, replace = TRUE)) X6 <- 3 * runif(N) mu <- c(-1, 0, 1, 2)[as.numeric(X3)] SNR <- 10 # signal-to-noise ratio Y <- X1 ^ 1.5 + 2 * (X2 ^ 0.5) + mu sigma <- sqrt(var(Y) / SNR) Y <- Y + rnorm(N, 0, sigma) X1[sample(1:N, size = 500)] <- NA # introduce some missing values X4[sample(1:N, size = 300)] <- NA # introduce some missing values data <- data.frame(Y, X1, X2, X3, X4, X5, X6) # Fit a GBM set.seed(102) # for reproducibility gbm1 <- gbm(Y ~ ., data = data, var.monotone = c(0, 0, 0, 0, 0, 0), distribution = "gaussian", n.trees = 100, shrinkage = 0.1, interaction.depth = 3, bag.fraction = 0.5, train.fraction = 0.5, n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, verbose = FALSE, n.cores = 1) # Check performance using the out-of-bag (OOB) error; the OOB error typically # underestimates the optimal number of iterations best.iter <- gbm.perf(gbm1, method = "OOB") print(best.iter) # Check performance using the 50\% heldout test set best.iter <- gbm.perf(gbm1, method = "test") print(best.iter) # Check performance using 5-fold cross-validation best.iter <- gbm.perf(gbm1, method = "cv") print(best.iter) # Plot relative influence of each variable par(mfrow = c(1, 2)) summary(gbm1, n.trees = 1) # using first tree summary(gbm1, n.trees = best.iter) # using estimated best number of trees # Compactly print the first and last trees for curiosity print(pretty.gbm.tree(gbm1, i.tree = 1)) print(pretty.gbm.tree(gbm1, i.tree = gbm1$n.trees)) # Simulate new data set.seed(103) # for reproducibility N <- 1000 X1 <- runif(N) X2 <- 2 * runif(N) X3 <- ordered(sample(letters[1:4], N, replace = TRUE)) X4 <- factor(sample(letters[1:6], N, replace = TRUE)) X5 <- factor(sample(letters[1:3], N, replace = TRUE)) X6 <- 3 * runif(N) mu <- c(-1, 0, 1, 2)[as.numeric(X3)] Y <- X1 ^ 1.5 + 2 * (X2 ^ 0.5) + mu + rnorm(N, 0, sigma) data2 <- data.frame(Y, X1, X2, X3, X4, X5, X6) # Predict on the new data using the "best" number of trees; by default, # predictions will be on the link scale Yhat <- predict(gbm1, newdata = data2, n.trees = best.iter, type = "link") # least squares error print(sum((data2$Y - Yhat)^2)) # Construct univariate partial dependence plots p1 <- plot(gbm1, i.var = 1, n.trees = best.iter) p2 <- plot(gbm1, i.var = 2, n.trees = best.iter) p3 <- plot(gbm1, i.var = "X3", n.trees = best.iter) # can use index or name grid.arrange(p1, p2, p3, ncol = 3) # Construct bivariate partial dependence plots plot(gbm1, i.var = 1:2, n.trees = best.iter) plot(gbm1, i.var = c("X2", "X3"), n.trees = best.iter) plot(gbm1, i.var = 3:4, n.trees = best.iter) # Construct trivariate partial dependence plots plot(gbm1, i.var = c(1, 2, 6), n.trees = best.iter, continuous.resolution = 20) plot(gbm1, i.var = 1:3, n.trees = best.iter) plot(gbm1, i.var = 2:4, n.trees = best.iter) plot(gbm1, i.var = 3:5, n.trees = best.iter) # Add more (i.e., 100) boosting iterations to the ensemble gbm2 <- gbm.more(gbm1, n.new.trees = 100, verbose = FALSE) } \references{ Y. Freund and R.E. Schapire (1997) \dQuote{A decision-theoretic generalization of on-line learning and an application to boosting,} \emph{Journal of Computer and System Sciences,} 55(1):119-139. G. Ridgeway (1999). \dQuote{The state of boosting,} \emph{Computing Science and Statistics} 31:172-181. J.H. Friedman, T. Hastie, R. Tibshirani (2000). \dQuote{Additive Logistic Regression: a Statistical View of Boosting,} \emph{Annals of Statistics} 28(2):337-374. J.H. Friedman (2001). \dQuote{Greedy Function Approximation: A Gradient Boosting Machine,} \emph{Annals of Statistics} 29(5):1189-1232. J.H. Friedman (2002). \dQuote{Stochastic Gradient Boosting,} \emph{Computational Statistics and Data Analysis} 38(4):367-378. B. Kriegler (2007). Cost-Sensitive Stochastic Gradient Boosting Within a Quantitative Regression Framework. Ph.D. Dissertation. University of California at Los Angeles, Los Angeles, CA, USA. Advisor(s) Richard A. Berk. url{https://dl.acm.org/citation.cfm?id=1354603}. C. Burges (2010). \dQuote{From RankNet to LambdaRank to LambdaMART: An Overview,} Microsoft Research Technical Report MSR-TR-2010-82. } \seealso{ \code{\link{gbm.object}}, \code{\link{gbm.perf}}, \code{\link{plot.gbm}}, \code{\link{predict.gbm}}, \code{\link{summary.gbm}}, and \code{\link{pretty.gbm.tree}}. } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} Quantile regression code developed by Brian Kriegler \email{bk@stat.ucla.edu} t-distribution, and multinomial code developed by Harry Southworth and Daniel Edwards Pairwise code developed by Stefan Schroedl \email{schroedl@a9.com} } gbm/man/gbm-internals.Rd0000644000176200001440000000247413346511223014632 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/gbm-internals.R \name{guessDist} \alias{guessDist} \alias{getStratify} \alias{getCVgroup} \alias{checkMissing} \alias{checkID} \alias{checkWeights} \alias{checkOffset} \alias{getVarNames} \alias{gbmCluster} \title{gbm internal functions} \usage{ guessDist(y) getCVgroup(distribution, class.stratify.cv, y, i.train, cv.folds, group) getStratify(strat, d) checkMissing(x, y) checkWeights(w, n) checkID(id) checkOffset(o, y) getVarNames(x) gbmCluster(n) } \arguments{ \item{y}{The response variable.} \item{class.stratify.cv}{Whether or not to stratify, if provided by the user.} \item{i.train}{Computed internally by \code{gbm}.} \item{cv.folds}{The number of cross-validation folds.} \item{group}{The group, if using \code{distibution = "pairwise"}.} \item{strat}{Whether or not to stratify.} \item{d, distribution}{The distribution, either specified by the user or implied.} \item{x}{The design matrix.} \item{w}{The weights.} \item{n}{The number of cores to use in the cluster.} \item{id}{The interaction depth.} \item{o}{The offset.} } \description{ Helper functions for preprocessing data prior to building a \code{"gbm"} object. } \details{ These are functions used internally by \code{gbm} and not intended for direct use by the user. } gbm/man/basehaz.gbm.Rd0000644000176200001440000000351013346511223014241 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/basehaz.gbm.R \name{basehaz.gbm} \alias{basehaz.gbm} \title{Baseline hazard function} \usage{ basehaz.gbm(t, delta, f.x, t.eval = NULL, smooth = FALSE, cumulative = TRUE) } \arguments{ \item{t}{The survival times.} \item{delta}{The censoring indicator.} \item{f.x}{The predicted values of the regression model on the log hazard scale.} \item{t.eval}{Values at which the baseline hazard will be evaluated.} \item{smooth}{If \code{TRUE} \code{basehaz.gbm} will smooth the estimated baseline hazard using Friedman's super smoother \code{\link{supsmu}}.} \item{cumulative}{If \code{TRUE} the cumulative survival function will be computed.} } \value{ A vector of length equal to the length of t (or of length \code{t.eval} if \code{t.eval} is not \code{NULL}) containing the baseline hazard evaluated at t (or at \code{t.eval} if \code{t.eval} is not \code{NULL}). If \code{cumulative} is set to \code{TRUE} then the returned vector evaluates the cumulative hazard function at those values. } \description{ Computes the Breslow estimator of the baseline hazard function for a proportional hazard regression model. } \details{ The proportional hazard model assumes h(t|x)=lambda(t)*exp(f(x)). \code{\link{gbm}} can estimate the f(x) component via partial likelihood. After estimating f(x), \code{basehaz.gbm} can compute the a nonparametric estimate of lambda(t). } \references{ N. Breslow (1972). "Discussion of `Regression Models and Life-Tables' by D.R. Cox," Journal of the Royal Statistical Society, Series B, 34(2):216-217. N. Breslow (1974). "Covariance analysis of censored survival data," Biometrics 30:89-99. } \seealso{ \code{\link[survival]{survfit}}, \code{\link{gbm}} } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} } \keyword{methods} \keyword{survival} gbm/man/gbm-package.Rd0000644000176200001440000000314313346511223014220 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/gbm-package.R \docType{package} \name{gbm-package} \alias{gbm-package} \title{Generalized Boosted Regression Models (GBMs)} \description{ This package implements extensions to Freund and Schapire's AdaBoost algorithm and J. Friedman's gradient boosting machine. Includes regression methods for least squares, absolute loss, logistic, Poisson, Cox proportional hazards partial likelihood, multinomial, t-distribution, AdaBoost exponential loss, Learning to Rank, and Huberized hinge loss. } \details{ Further information is available in vignette: \code{browseVignettes(package = "gbm")} } \references{ Y. Freund and R.E. Schapire (1997) \dQuote{A decision-theoretic generalization of on-line learning and an application to boosting,} \emph{Journal of Computer and System Sciences,} 55(1):119-139. G. Ridgeway (1999). \dQuote{The state of boosting,} \emph{Computing Science and Statistics} 31:172-181. J.H. Friedman, T. Hastie, R. Tibshirani (2000). \dQuote{Additive Logistic Regression: a Statistical View of Boosting,} \emph{Annals of Statistics} 28(2):337-374. J.H. Friedman (2001). \dQuote{Greedy Function Approximation: A Gradient Boosting Machine,} \emph{Annals of Statistics} 29(5):1189-1232. J.H. Friedman (2002). \dQuote{Stochastic Gradient Boosting,} \emph{Computational Statistics and Data Analysis} 38(4):367-378. The \url{http://statweb.stanford.edu/~jhf/R-MART} website. } \author{ Greg Ridgeway \email{gregridgeway@gmail.com} with contributions by Daniel Edwards, Brian Kriegler, Stefan Schroedl and Harry Southworth. } \keyword{package} gbm/.Rinstignore0000644000176200001440000000026213346511223013323 0ustar liggesusersinst/doc/gbm.tex inst/doc/srcltx.sty inst/doc/shrinkage-v-iterations.eps inst/doc/shrinkage-v-iterations.pdf inst/doc/oobperf2.eps inst/doc/oobperf2.pdf inst/doc/shrinkageplot.R gbm/LICENSE0000644000176200001440000000122213346511223012021 0ustar liggesusersGeneralized Boosted Regression package for the R environment Copyright (C) 2003 Greg Ridgeway This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. Copies of the relevant licenses can be found at: https://www.r-project.org/Licenses/ gbm/CHANGES0000644000176200001440000003330513346511223012016 0ustar liggesusersChanges in version 2.1 - The cross-validation loop is now parallelized. The functions attempt to guess a sensible number of cores to use, or the user can specify how many through new argument n.cores. - A fair amount of code refactoring. - Added type='response' for predict when distribution='adaboost'. - Fixed a bug that caused offset not to be used if the first element of offset was 0. - Updated predict.gbm and plot.gbm to cope with objects created using gbm version 1.6. - Changed default value of verbose to 'CV'. gbm now defaults to letting the user know which block of CV folds it is running. If verbose=TRUE is specified, the final run of the model also prints its progress to screen as in earlier versions. - Fixed bug that caused predict to return wrong result when distribution == 'multinomial' and length(n.trees) > 1. - Fixed bug that caused n.trees to be wrong in relative.influence if no CV or validation set was used. - Relative influence was computed wrongly when distribution="multinomial". Fixed. - Cross-validation predictions now included in the output object. - Fixed bug in relative.influence that caused labels to be wrong when sort.=TRUE. - Modified interact.gbm to do additional sanity check, updated help file - Fixed bug in interact.gbm so that it now works for distribution="multinomial" - Modified predict.gbm to improve performance on large datasets Changes in version 2.0 Lots of new features added so it warrants a change to the first digit of the version number. Major changes: - Several new distributions are now available thanks to Harry Southworth and Daniel Edwards: multinomial and tdist. - New distribution 'pairwise' for Learning to Rank Applications (LambdaMART), including four different ranking measures, thanks to Stefan Schroedl. - The gbm package is now managed on R-Forge by Greg Ridgeway and Harry Southworth. Visit http://r-forge.r-project.org/projects/gbm/ to get the latest or to contribute to the package Minor changes: - the "quantile" distribution now handles weighted data - relative.influence changed to give names to the returned vector - Added print.gbm and show.gbm. These give basic summaries of the fitted model - Added support function and reconstructGBMdata() to facilitate reconstituting the data for certain plots and summaries - gbm was not using the weights when using cross-validation due to a bug. That's been fixed (Thanks to Trevor Hastie for catching this) - predict.gbm now tries to guess the number of trees, also defaults to using the training data if no newdata is given. - relative.influence has has 2 new arguments, scale. and sort. that default to FALSE. The returned vector now has names. - gbm now tries to guess what distribution you meant if you didn't specify. - gbm has a new argument, class.stratifiy.cv, to control if cross-validation is stratified by class with distribution is "bernoulli" or "multinomial". Defaults to TRUE for multinomial, FALSE for bernoulli. The purpose is to avoid unusable training sets. - gbm.perf now puts a vertical line at the best number of trees when method = "cv" or "test". Tries to guess what method you meant if you don't tell it. - .First.lib had a bug that would crash gbm if gbm was installed as a local library. Fixed. - plot.gbm has a new argument, type, defaulting to "link". For bernoulli, multinomial, poisson, "response" is allowed. - models with large interactions (>24) were using up all the terminal nodes in the stack. The stack has been increased to 101 nodes allowing interaction.depth up to 49. A more graceful error is now issued if interaction.depth exceeds 49. (Thanks to Tom Dietterich for catching this). - gbm now uses the R macro R_NaN in the C++ code rather than NAN, which would not compile on Sun OS. - If covariates marked missing values with NaN instead of NA, the model fit would not be consistent (Thanks to JR Lockwood for noting this) Changes in version 1.6 - Quantile regression is now available thanks to a contribution from Brian Kriegler. Use list(name="quantile",alpha=0.05) as the distribution parameter to construct a predictor of the 5% of the conditional distribution - gbm() now stores cv.folds in the returned gbm object - Added a normalize parameter to summary.gbm that allows one to choose whether or not to normalize the variable influence to sum to 100 or not - Corrected a minor bug in plot.gbm that put the wrong variable label on the x axis when plotting a numeric variable and a factor variable - the C function gbm_plot can now handle missing values. This does not effect the R function plot.gbm(), but it makes gbm_plot potentially more useful for computing partial dependence plots - mgcv is no longer a required package, but the splines package is needed for calibrate.plot() - minor changes for compatibility with R 2.6.0 (thanks to Seth Falcon) - corrected a bug in the cox model computation when all terminal nodes had exactly the minimum number of observations permitted, which caused gbm and R to crash ungracefully. This was likely to occur with small datasets (thanks to Brian Ring) - corrected a bug in Laplace that always made the terminal node predictions slightly larger than the median. Corrected again in a minor release due to a bug caught by Jon McAuliffe - corrected a bug in interact.gbm that caused it to crash for factors. Caught by David Carslaw - added a plot of cross-validated error to the plots generated by gbm.perf Changes in version 1.5 - gbm would fail if there was only one x. Now drop=FALSE is set in all data.frame subsetting (thanks to Gregg Keller for noticing this). - Corrected gbm.perf() to check if bag.fraction=1 and skips trying to create the OOB plots and estimates. - Corrected a typo in the vignette specifying the gradient for the Cox model. - Fixed the OOB-reps.R demo. For non-Gaussian cases it was maximizing the deviance rather than minimizing. - Increased the largest factor variable allowed from 256 levels to 1024 levels. gbm stops if any factor variable exceeds 1024. Will try to make this cleaner in the future. - predict.gbm now allows n.trees to be a vector and efficiently computes predictions for each indicated model. Avoids having to call predict.gbm several times for different choices of n.trees. - fixed a bug that occurred when using cross-validation for coxph. Was computing length(y) when y is a Surv object which return 2*N rather than N. This generated out-of-range indices for the training dataset. - Changed the method for extracting the name of the outcome variable to work around a change in terms.formula() when using "." in formulas. Changes in version 1.4 - The formula interface now allows for "-x" to indicate not including certain variables in the model fit. - Fixed the formula interface to allow offset(). The offset argument has now been removed from gbm(). - Added basehaz.gbm that computes the Breslow estimate of the baseline hazard. At a later stage this will be substituted with a call to survfit, which is much more general handling not only left-censored data. - OOB estimator is known to be conservative. A warning is now issued when using method="OOB" and there is no longer a default method for gbm.perf() - cv.folds now an option to gbm and method="cv" is an option for gbm.perf. Performs v-fold cross validation for estimating the optimal number of iterations - There is now a package vignette with details on the user options and the mathematics behind the gbm engine. Changes in version 1.3 - All likelihood based loss functions are now in terms of Deviance (-2*log likelihood). As a result, gbm always minimizes the loss. Previous versions minimized losses for some choices of distribution and maximized a likelihood for other choices. - Fixed the Poisson regression to avoid predicting +/- infinity which occurs when a terminal node has only observations with y=0. The largest predicted value is now +/-19, similar to what glm predicts for these extreme cases for linear Poisson regression. The shrinkage factor will be applied to the -19 predictions so it will take 1/shrinkage gbm iterations locating pure terminal nodes before gbm would actually return a predicted value of +/-19. - Introduces shrink.gbm.pred() that does a lasso-style variable selection Consider this function as still in an experimental phase. - Bug fix in plot.gbm - All calls to ISNAN now call ISNA (avoids using isnan) Changes in version 1.2 - fixed gbm.object help file and updated the function to check for missing values to the latest R standard. - gbm.plot now allows i.var to be the names of the variables to plot or the index of the variables used - gbm now requires "stats" package into which "modreg" has been merged - documentation for predict.gbm corrected Changes in version 1.1 - all calculations of loss functions now compute averages rather than totals. That is, all performance measures (text of progress, gbm.perf) now report average log-likelihood rather than total log-likelihood (e.g. mean squared error rather than sum of squared error). A slight exception applies to distribution="coxph". For these models the averaging pertains only to the uncensored observations. The denominator is sum(w[i]*delta[i]) rather than the usual sum(w[i]). - summary.gbm now has an experimental "method" argument. The default computes the relative influence as before. The option "method=permutation.test.gbm" performs a permutation test for the relative influence. Give it a try and let me know how it works. It currently is not implemented for "distribution=coxph". - added gbm.fit, a function that avoids the model.frame call, which is tragically slow with lots of variables. gbm is now just a formula/model.frame wrapper for the gbm.fit function. (based on a suggestion and code from Jim Garrett) - corrected a bug in the use of offsets. Now the user must pass the offset vector with the offset argument rather than in the formula. Previously, offsets were being used once as offsets and a second time as a predictor. - predict.gbm now has a single.tree option. When set to TRUE the function will return predictions from only that tree. The idea is that this may be useful for reweighting the trees using a post-model fit adjustment. - corrected a bug in CPoisson::BagImprovement that incorrectly computed the bagged estimate of improvement - corrected a bug for distribution="coxph" in gbm() and gbm.more(). If there was a single predictor the functions would drop the unused array dimension issuing an error. - corrected gbm() distribution="coxph" when train.fraction=1.0. The program would set two non-existant observations in the validation set and issue a warning. - if a predictor variable has no variation a warning (rather than an error) is now issued - updated the documentation for calibrate.plot to match the implementation - changed the some of the default values in gbm(), bag.fraction=0.5, train.fraction=1.0, and shrinkage=0.001. - corrected a bug in predict.gbm. The C code producing the predictions would go into an infinite loop if predicting an observation with a level of a categorical variable not seen in the training dataset. Now the routine uses the missing value prediction. (Feng Zeng) - added a "type" parameter to predict.gbm. The default ("link") is the same as before, predictions are on the canonical scale (gradient scale). The new option ("response") converts back the same scale as the outcome (probability for bernoulli, mean for gaussian, etc.). - gbm and gbm.more now have verbose options which can be set to FALSE to suppress the progress and performance indicators. (several users requested this nice feature) - gbm.perf no longer prints out verbose information about the best iteration estimate. It simply returns the estimate and creates the plots if requested. - ISNAN, since R 1.8.0, R.h changed declarations for ISNAN(). These changes broke gbm 1.0. I added the following code to buildinfo.h to fix this #ifdef IEEE_754 #undef ISNAN #define ISNAN(x) R_IsNaNorNA(x) #endif seems to work now but I'll look for a more elegant solution. Changes in version 0.8 - Additional documentation about the loss functions, graphics, and methods is now available with the package - Fixed the initial value for the adaboost exponential loss. Prior to version 0.8 the initial value was 0.0, now half the baseline log-odds - Changes in some headers and #define's to compile under gcc 3.2 (Brian Ripley) Changes in version 0.7 - gbm.perf, the argument named best.iter.calc has been renamed "method" for greater simplicity - all entries in the design matrix are now coerced to doubles (Thanks to Bonnie Ghosh) - now checks that all predictors are either numeric, ordinal, or factor - summary.gbm now reports the correct relative influence when some variables do not enter the model. (Thanks to Hugh Chipman) - renamed several #define'd variables in buildinfo.h so they do not conflict with standard winerror.h names. Planned future changes 1. Add weighted median functionality to Laplace 2. Automate the fitting process, ie, selecting shrinkage and number of iterations 3. Add overlay factor*continuous predictor plot as an option rather than lattice plots 4. Add multinomial and ordered logistic regression procedures Thanks to RAND for sponsoring the development of this software through statistical methods funding. Kurt Hornik, Brian Ripley, and Jan De Leeuw for helping me get gbm up to the R standard and into CRAN. Dan McCaffrey for testing and evangelizing the utility of this program. Bonnie Ghosh for finding bugs. Arnab Mukherji for testing and suggesting new features. Daniela Golinelli for finding bugs and marrying me. Andrew Morral for suggesting improvements and finding new applications of the method in the evaluation of drug treatment programs. Katrin Hambarsoomians for finding bugs. Hugh Chipman for finding bugs. Jim Garrett for many suggestions and contributions.