
Getting started with the pwr package

Clay Ford

2018-03-03

The basic idea of calculating power or sample size with functions in the pwr package is to leave out the argument that you want to calculate. If you want to calculate power, then leave the power argument out of the function. If you want to calculate sample size, leave n out of the function. Whatever parameter you want to calculate is determined from the others.

You select a function based on the statistical test you plan to use to analyze your data. If you plan to use a two-sample t-test to compare two means, you would use the pwr.t.test function for estimating sample size or power. All functions for power and sample size analysis in the pwr package begin with pwr. Functions are available for the following statistical tests:

pwr.p.test: one-sample proportion test
pwr.2p.test: two-sample proportion test
pwr.2p2n.test: two-sample proportion test (unequal sample sizes)
pwr.t.test: two-sample, one-sample and paired t-tests
pwr.t2n.test: two-sample t-tests (unequal sample sizes)
pwr.anova.test: balanced one-way ANOVA
pwr.r.test: correlation test
pwr.chisq.test: chi-square tests (goodness of fit and association)
pwr.f2.test: test for the general linear model

There are also a few convenience functions for calculating effect size as well as a generic plot function for plotting power versus sample size. All of these are demonstrated in the examples below.

A simple example

Let’s say we suspect we have a loaded coin that lands heads 75% of the time instead of the expected 50%. We wish to create an experiment to test this. We will flip the coin a certain number of times and observe the proportion of heads. We will then conduct a one-sample proportion test to see if the proportion of heads is significantly different from what we would expect with a fair coin. We will judge significance by our p-value. If our p-value falls below a certain threshold, say 0.05, we will conclude our coin’s behavior is inconsistent with that of a fair coin.

How many times should we flip the coin to have a high probability (or power), say 0.80, of correctly rejecting the null of \(\pi\) = 0.5 if our coin is indeed loaded to land heads 75% of the time?

Here is how we can determine this using the pwr.p.test function.

library(pwr)
pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), 
           sig.level = 0.05, 
           power = 0.80, 
           alternative = "greater")
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.5235988
##               n = 22.55126
##       sig.level = 0.05
##           power = 0.8
##     alternative = greater

The function tells us we should flip the coin 22.55126 times, which we round up to 23. Always round sample size estimates up. If we’re correct that our coin lands heads 75% of the time, we need to flip it at least 23 times to have an 80% chance of correctly rejecting the null hypothesis at the 0.05 significance level.

Notice that since we wanted to determine sample size (n), we left it out of the function. Our effect size is entered in the h argument. The label h is due to Cohen (1988). The function ES.h is used to calculate a unitless effect size using the arcsine transformation. (More on effect size below.) sig.level is the argument for our desired significance level. This is also sometimes referred to as our tolerance for a Type I error (\(\alpha\)). power is our desired power. It is sometimes referred to as 1 - \(\beta\), where \(\beta\) is Type II error. The alternative argument says we think the alternative is “greater” than the null, not just different.

Type I error, \(\alpha\), is the probability of rejecting the null hypothesis when it is true. This is thinking we have found an effect where none exist. This is considered the more serious error. Our tolerance for Type I error is usually 0.05 or lower.

Type II error, \(\beta\), is the probability of failing to reject the null hypothesis when it is false. This is thinking there is no effect when in fact there is. Our tolerance for Type II error is usually 0.20 or lower. Type II error is 1 - Power. If we desire a power of 0.90, then we implicitly specify a Type II error tolerance of 0.10.
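To make these definitions concrete, we can estimate the power of the coin experiment by simulation instead of by formula. The following is a base-R sketch that assumes we analyze 23 flips of the loaded coin with an exact binomial test; the simulated rejection rate approximates power, and one minus that rate approximates \(\beta\).

```r
# Simulate the coin experiment: 23 flips of a coin loaded at p = 0.75,
# analyzed with an exact binomial test against the null p = 0.5.
set.seed(1)
n_flips <- 23
p_values <- replicate(10000, {
  heads <- rbinom(1, size = n_flips, prob = 0.75)
  binom.test(heads, n_flips, p = 0.5, alternative = "greater")$p.value
})
power_est <- mean(p_values < 0.05)  # proportion of correct rejections
power_est      # roughly 0.80
1 - power_est  # estimated Type II error, roughly 0.20
```

The simulated power lands near the 0.80 we asked pwr.p.test for, with small differences due to the discreteness of the binomial test and simulation error.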

The pwr package provides a generic plot function that allows us to see how power changes as we change our sample size. If you have the ggplot2 package installed, it will create a plot using ggplot. Otherwise base R graphics are used.

p.out <- pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50),
                    sig.level = 0.05, 
                    power = 0.80, 
                    alternative = "greater")
plot(p.out)

What is the power of our test if we flip the coin 40 times and lower our Type I error tolerance to 0.01? Notice we leave out the power argument, add n = 40, and change sig.level = 0.01:

pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50),
           sig.level = 0.01, 
           n = 40,
           alternative = "greater")
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.5235988
##               n = 40
##       sig.level = 0.01
##           power = 0.8377325
##     alternative = greater

The power of our test is about 84%.

We specified alternative = "greater" since we assumed the coin was loaded for more heads (not less). This is a stronger assumption than assuming that the coin is simply unfair in one way or another. In practice, sample size and power calculations will usually make the more conservative “two-sided” assumption. In fact this is the default for pwr functions with an alternative argument. If we wish to assume a “two-sided” alternative, we can simply leave it out of the function. Notice how our power estimate drops below 80% when we do this.

pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50),
           sig.level = 0.01, 
           n = 40)
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.5235988
##               n = 40
##       sig.level = 0.01
##           power = 0.7690434
##     alternative = two.sided

What if we assume the “loaded” effect is smaller? Maybe the coin lands heads 65% of the time. How many flips do we need to perform to detect this smaller effect at the 0.05 level with 80% power and the more conservative two-sided alternative?

pwr.p.test(h = ES.h(p1 = 0.65, p2 = 0.50),
           sig.level = 0.05, 
           power = 0.80)
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.3046927
##               n = 84.54397
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

About 85 coin flips. Detecting smaller effects requires larger sample sizes.

More on effect size

Cohen describes effect size as “the degree to which the null hypothesis is false.” In our coin flipping example, this is the difference between 75% and 50%. We could say the effect was 25% but recall we had to transform the absolute difference in proportions to another quantity using the ES.h function. This is a crucial part of using the pwr package correctly: You must provide an effect size on the expected scale. Doing otherwise will produce wrong sample size and power calculations.
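Under the hood, ES.h computes Cohen’s h, the difference of arcsine-transformed proportions: \(h = 2 \arcsin\sqrt{p_1} - 2 \arcsin\sqrt{p_2}\). We can reproduce the effect size from the coin example with base R:

```r
# Cohen's h for the coin example:
# h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))
h_manual <- 2 * asin(sqrt(0.75)) - 2 * asin(sqrt(0.50))
round(h_manual, 7)  # 0.5235988, matching ES.h(p1 = 0.75, p2 = 0.50)
```

(Here h happens to equal exactly \(\pi/6\), since the transformed proportions are \(2\pi/3\) and \(\pi/2\).)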

When in doubt, we can use Conventional Effect Sizes. These are pre-determined effect sizes for “small”, “medium”, and “large” effects. The cohen.ES function returns a conventional effect size for a given test and size. For example, the medium effect size for the correlation test is 0.3:

cohen.ES(test = "r", size = "medium")
## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = r
##            size = medium
##     effect.size = 0.3

For convenience, here are all conventional effect sizes for all tests in the pwr package:

Test                         small  medium  large
tests for proportions (p)    0.2    0.5     0.8
tests for means (t)          0.2    0.5     0.8
chi-square tests (chisq)     0.1    0.3     0.5
correlation test (r)         0.1    0.3     0.5
anova (anov)                 0.1    0.25    0.4
general linear model (f2)    0.02   0.15    0.35

It is worth noting that pwr functions can take vectors for effect size and n arguments. This allows us to make many power calculations at once, either for multiple effect sizes or multiple sample sizes. For example, let’s see how power changes for our coin flipping experiment for the three conventional effect sizes of 0.2, 0.5, and 0.8, assuming a sample size of 20.

pwr.p.test(h = c(0.2,0.5,0.8),
           n = 20,
           sig.level = 0.05)
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.2, 0.5, 0.8
##               n = 20
##       sig.level = 0.05
##           power = 0.1454725, 0.6087795, 0.9471412
##     alternative = two.sided

As we demonstrated with the plot function above, we can save our results. This produces a list object from which we can extract quantities for further manipulation. For example, we can calculate power for sample sizes ranging from 10 to 100 in steps of 10, with an assumed “medium” effect of 0.5, and output to a data frame with some formatting:

n <- seq(10,100,10)
p.out <- pwr.p.test(h = 0.5,
                    n = n,
                    sig.level = 0.05)
data.frame(n, power = sprintf("%.2f%%", p.out$power * 100))
##      n  power
## 1   10 35.26%
## 2   20 60.88%
## 3   30 78.19%
## 4   40 88.54%
## 5   50 94.24%
## 6   60 97.21%
## 7   70 98.69%
## 8   80 99.40%
## 9   90 99.73%
## 10 100 99.88%

We can also extract quantities directly by appending $ and the name of the quantity to the pwr function call. For example,

pwr.p.test(h = 0.5, n = n, sig.level = 0.05)$power
##  [1] 0.3526081 0.6087795 0.7819080 0.8853791 0.9424375 0.9721272 0.9869034
##  [8] 0.9940005 0.9973108 0.9988173

More examples

pwr.2p.test - two-sample test for proportions

Let’s say we want to randomly sample male and female college undergraduate students and ask them if they consume alcohol at least once a week. Our null hypothesis is that there is no difference in the proportion that answer yes. Our alternative hypothesis is that there is a difference. This is a two-sided alternative; one gender has a higher proportion, but we don’t know which. We would like to detect a difference as small as 5%. How many students do we need to sample in each group if we want 80% power and a significance level of 0.05?

If we think one group proportion is 55% and the other 50%:

pwr.2p.test(h = ES.h(p1 = 0.55, p2 = 0.50), sig.level = 0.05, power = .80)
## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.1001674
##               n = 1564.529
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: same sample sizes

Notice the sample size is per group. We need to sample 1,565 males and 1,565 females to detect the 5% difference with 80% power.

If we think one group proportion is 10% and the other 5%:

pwr.2p.test(h = ES.h(p1 = 0.10, p2 = 0.05), sig.level = 0.05, power = .80)
## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.1924743
##               n = 423.7319
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: same sample sizes

Even though the absolute difference between proportions is the same (5%), the optimum sample size is now 424 per group. 10% vs 5% is actually a bigger difference than 55% vs 50%. A heuristic approach for understanding why is to compare the ratios: 55/50 = 1.1 while 10/5 = 2.

The ES.h function performs an arcsine transformation on both proportions and returns the difference. By setting p2 to 0, we can see the transformed value for p1. We can exploit this to help us visualize how the transformation creates larger effects for two proportions closer to 0 or 1. Below we plot transformed proportions versus untransformed proportions and then compare the distance between pairs of proportions on each axis.

addSegs <- function(p1, p2){
  tp1 <- ES.h(p1, 0); tp2 <- ES.h(p2, 0)
  segments(p1,0,p1,tp1, col="blue"); segments(p2,0,p2,tp2,col="blue")
  segments(0, tp1, p1, tp1, col="red"); segments(0, tp2, p2, tp2, col="red")
}

curve(expr = ES.h(p1 = x, p2 = 0), xlim = c(0,1),
      xlab = "proportion", ylab = "transformed proportion")
addSegs(p1 = 0.50, p2 = 0.55) # 50% vs 55%
addSegs(p1 = 0.05, p2 = 0.10) # 5% vs 10%

The difference on the x-axis between the two pairs of proportions is the same (0.05), but the difference on the y-axis is larger for 5% vs 10%. The ES.h function returns the distance between the red lines.

Base R has a function called power.prop.test that allows us to use the raw proportions in the function without a need for a separate effect size function.

power.prop.test(p1 = 0.55, p2 = 0.50, sig.level = 0.05, power = .80)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 1564.672
##              p1 = 0.55
##              p2 = 0.5
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Notice the results are slightly different because power.prop.test calculates effect size differently.

If we don’t have any preconceived estimates of proportions or don’t feel comfortable making estimates, we can use conventional effect sizes of 0.2 (small), 0.5 (medium), or 0.8 (large). The sample size per group needed to detect a “small” effect with 80% power and 0.05 significance is about 393:

pwr.2p.test(h = 0.2, sig.level = 0.05, power = .80)
## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.2
##               n = 392.443
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: same sample sizes

pwr.2p2n.test - two-sample test for proportions, unequal sample sizes

Let’s return to our undergraduate survey of alcohol consumption. It turns out we were able to survey 543 males and 675 females. The power of our test if we’re interested in being able to detect a “small” effect size with 0.05 significance is about 93%.

cohen.ES(test = "p", size = "small")
## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = p
##            size = small
##     effect.size = 0.2
pwr.2p2n.test(h = 0.2, n1 = 543, n2 = 675, sig.level = 0.05)
## 
##      difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.2
##              n1 = 543
##              n2 = 675
##       sig.level = 0.05
##           power = 0.9344102
##     alternative = two.sided
## 
## NOTE: different sample sizes

Let’s say we previously surveyed 763 female undergraduates and found that p% said they consumed alcohol once a week. We would like to survey some males and see if a significantly different proportion respond yes. How many do I need to sample to detect a small effect size (0.2) in either direction with 80% power and a significance level of 0.05?

pwr.2p2n.test(h = 0.2, n1 = 763, power = 0.8, sig.level = 0.05)
## 
##      difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.2
##              n1 = 763
##              n2 = 264.1544
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: different sample sizes

About 265.

pwr.t.test - one-sample and two-sample t tests for means

We’re interested to know if there is a difference in the mean price of what male and female students pay at a library coffee shop. Let’s say we randomly observe 30 male and 30 female students check out from the coffee shop and calculate the mean purchase price for each gender. We’ll test for a difference in means using a two-sample t-test. How powerful is this experiment if we want to detect a “medium” effect in either direction with a significance level of 0.05?

cohen.ES(test = "t", size = "medium")
## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = t
##            size = medium
##     effect.size = 0.5
pwr.t.test(n = 30, d = 0.5, sig.level = 0.05)
## 
##      Two-sample t test power calculation 
## 
##               n = 30
##               d = 0.5
##       sig.level = 0.05
##           power = 0.4778965
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Only 48%. Not very powerful. How many students should we observe for a test with 80% power?

pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05)
## 
##      Two-sample t test power calculation 
## 
##               n = 63.76561
##               d = 0.5
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

About 64 per group.

Let’s say we want to be able to detect a difference of at least 75 cents in the mean purchase price. We need to convert that to an effect size using the following formula:

\[d = \frac{m_{1} - m_{2}}{\sigma} \]

where \(m_{1}\) and \(m_{2}\) are the means of each group, respectively, and \(\sigma\) is the common standard deviation of the two groups. Again, the label d is due to Cohen (1988).

We have \(m_{1} - m_{2} =\) 0.75. We need to make a guess at the population standard deviation. If we have absolutely no idea, one rule of thumb is to take the difference between the maximum and minimum values and divide by 4. Let’s say the maximum purchase is $10 and the minimum purchase is $1. Our estimated standard deviation is (10 - 1)/4 = 2.25. Therefore our effect size is 0.75/2.25 \(\approx\) 0.333.

d <- 0.75/2.25
pwr.t.test(d = d, power = 0.80, sig.level = 0.05)
## 
##      Two-sample t test power calculation 
## 
##               n = 142.2462
##               d = 0.3333333
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

For a desired power of 80%, Type I error tolerance of 0.05, and a hypothesized effect size of 0.333, we should sample at least 143 per group.

Performing the same analysis with the base R function power.t.test is a little easier. The difference \(m_{1} - m_{2} =\) 0.75 is entered in the delta argument and the estimated \(\sigma\) = 2.25 is entered in the sd argument:

power.t.test(delta = 0.75, sd = 2.25, sig.level = 0.05, power = 0.8)
## 
##      Two-sample t test power calculation 
## 
##               n = 142.2466
##           delta = 0.75
##              sd = 2.25
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

To calculate power and sample size for one-sample t-tests, we need to set the type argument to "one.sample". By default it is set to "two.sample".

For example, we think the average purchase price at the Library coffee shop is over $3 per student. Our null is $3 or less; our alternative is greater than $3. We can use a one-sample t-test to investigate this hunch. If the true average purchase price is $3.50, we would like to have 90% power to declare the estimated average purchase price is greater than $3. How many transactions do we need to observe assuming a significance level of 0.05? Let’s say the maximum purchase price is $10 and the minimum is $1. So our guess at a standard deviation is 9/4 = 2.25.

d <- 0.50/2.25
pwr.t.test(d = d, sig.level = 0.05, power = 0.90, alternative = "greater", 
           type = "one.sample")
## 
##      One-sample t test power calculation 
## 
##               n = 174.7796
##               d = 0.2222222
##       sig.level = 0.05
##           power = 0.9
##     alternative = greater

We should plan on observing at least 175 transactions.

To use the power.t.test function, set type = "one.sample" and alternative = "one.sided":

power.t.test(delta = 0.50, sd = 2.25, power = 0.90, sig.level = 0.05, 
             alternative = "one.sided", type = "one.sample")
## 
##      One-sample t test power calculation 
## 
##               n = 174.7796
##           delta = 0.5
##              sd = 2.25
##       sig.level = 0.05
##           power = 0.9
##     alternative = one.sided

“Paired” t-tests are basically the same as one-sample t-tests, except our one sample is usually differences in pairs. The following example should make this clear.

(From Hogg & Tanis, exercise 6.5-12) 24 high school boys are put on an ultra-heavy rope-jumping program. Does this decrease their 40-yard dash time (i.e., make them faster)? We’ll measure their 40-yard dash time in seconds before the program and after. We’ll use a paired t-test to see if the difference in times is greater than 0 (before - after). Assume the standard deviation of the differences will be about 0.25 seconds. How powerful is the test to detect a difference of about 0.08 seconds with 0.05 significance?

Notice we set type = "paired":

pwr.t.test(n = 24, d = 0.08 / 0.25, 
           type = "paired", alternative = "greater")
## 
##      Paired t test power calculation 
## 
##               n = 24
##               d = 0.32
##       sig.level = 0.05
##           power = 0.4508691
##     alternative = greater
## 
## NOTE: n is number of *pairs*

Only 45%. Not all that powerful. How many high school boys should we sample for 80% power?

pwr.t.test(d = 0.08 / 0.25, power = 0.8,
           type = "paired", alternative = "greater")
## 
##      Paired t test power calculation 
## 
##               n = 61.75209
##               d = 0.32
##       sig.level = 0.05
##           power = 0.8
##     alternative = greater
## 
## NOTE: n is number of *pairs*

About 62.

For paired t-tests we sometimes estimate a standard deviation for within pairs instead of for the difference in pairs. In our example, this would mean an estimated standard deviation for each boy’s 40-yard dash times. When dealing with this type of estimated standard deviation we need to multiply it by \(\sqrt{2}\) in the pwr.t.test function. Let’s say we estimate the standard deviation of each boy’s 40-yard dash time to be about 0.10 seconds. The sample size needed to detect a difference of 0.08 seconds is now calculated as follows:

pwr.t.test(d = 0.08 / (0.1 * sqrt(2)), power = 0.8, 
           type = "paired", alternative = "greater")
## 
##      Paired t test power calculation 
## 
##               n = 20.74232
##               d = 0.5656854
##       sig.level = 0.05
##           power = 0.8
##     alternative = greater
## 
## NOTE: n is number of *pairs*

We need to sample at least 21 students.
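The \(\sqrt{2}\) factor comes from the fact that, for independent measurements each with standard deviation \(\sigma\), the difference has variance \(\sigma^2 + \sigma^2 = 2\sigma^2\), and hence standard deviation \(\sigma\sqrt{2}\). A quick simulation (a sketch assuming independent before and after times, with a hypothetical mean of 5.5 seconds) illustrates this:

```r
# For independent before/after times with sd 0.10 each,
# the differences have sd 0.10 * sqrt(2), about 0.1414.
set.seed(2)
before <- rnorm(1e5, mean = 5.5, sd = 0.10)  # hypothetical 40-yard times
after  <- rnorm(1e5, mean = 5.5, sd = 0.10)
sd(before - after)  # close to 0.1414
```

In practice, before and after times on the same boy are positively correlated, which shrinks the standard deviation of the differences, so the \(\sqrt{2}\) factor is a conservative, worst-case adjustment.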

pwr.t2n.test - two-sample t test for means, unequal sample sizes

Find power for a two-sample t-test with 28 in one group and 35 in the other group and a medium effect size. (sig.level defaults to 0.05.)

pwr.t2n.test(n1 = 28, n2 = 35, d = 0.5)
## 
##      t test power calculation 
## 
##              n1 = 28
##              n2 = 35
##               d = 0.5
##       sig.level = 0.05
##           power = 0.4924588
##     alternative = two.sided

pwr.chisq.test - Goodness of fit test

(From Cohen, example 7.1) A market researcher is seeking to determine preference among 4 package designs. He arranges to have a panel of 100 consumers rate their favorite package design. He wants to perform a chi-square goodness of fit test against the null of equal preference (25% for each design) with a significance level of 0.05. What’s the power of the test if 3/8 of the population actually prefers one of the designs and the remaining 5/8 are split over the other 3 designs?

We use the ES.w1 function to calculate effect size. To do so, we need to create vectors of null and alternative proportions:

null <- rep(0.25, 4)
alt <- c(3/8, rep((5/8)/3, 3))
ES.w1(null,alt)
## [1] 0.2886751

To calculate power, specify effect size (w), sample size (N), and degrees of freedom, which is the number of categories minus 1 (df = 4 - 1).

pwr.chisq.test(w=ES.w1(null,alt), N=100, df=(4-1), sig.level=0.05)
## 
##      Chi squared power calculation 
## 
##               w = 0.2886751
##               N = 100
##              df = 3
##       sig.level = 0.05
##           power = 0.6739834
## 
## NOTE: N is the number of observations

If our estimated effect size is correct, we only have about a 67% chance of finding it (i.e., rejecting the null hypothesis of equal preference).

How many subjects do we need to achieve 80% power?

pwr.chisq.test(w=ES.w1(null,alt), df=(4-1), power=0.8, sig.level = 0.05)
## 
##      Chi squared power calculation 
## 
##               w = 0.2886751
##               N = 130.8308
##              df = 3
##       sig.level = 0.05
##           power = 0.8
## 
## NOTE: N is the number of observations

If our alternative hypothesis is correct then we need to survey at least 131 people to detect it with 80% power.

pwr.chisq.test - test of association

We want to see if there’s an association between gender and flossing teeth among college students. We randomly sample 100 students (male and female) and ask whether or not they floss daily. We want to carry out a chi-square test of association to determine if there’s an association between these two variables. We set our significance level to 0.01. To determine effect size we need to propose an alternative hypothesis, which in this case is a table of proportions. We propose the following:

gender   Floss   No Floss
Male     0.1     0.4
Female   0.2     0.3

We use the ES.w2 function to calculate effect size for chi-square tests of association.

prob <- matrix(c(0.1,0.2,0.4,0.3), ncol=2, 
               dimnames = list(c("M","F"),c("Floss","No Floss")))
prob
##   Floss No Floss
## M   0.1      0.4
## F   0.2      0.3

This says we sample equal proportions of males and females, but believe 10% more females floss.

Now use the matrix to calculate effect size:

ES.w2(prob)
## [1] 0.2182179

We also need the degrees of freedom: for a 2 x 2 table, df = (2 - 1) * (2 - 1) = 1.

And now to calculate power:

pwr.chisq.test(w = ES.w2(prob), N = 100, df = 1, sig.level = 0.01)
## 
##      Chi squared power calculation 
## 
##               w = 0.2182179
##               N = 100
##              df = 1
##       sig.level = 0.01
##           power = 0.3469206
## 
## NOTE: N is the number of observations

At only 35% this is not a very powerful experiment. How many students should I survey if I wish to achieve 90% power?

pwr.chisq.test(w = ES.w2(prob), power = 0.9, df = 1, sig.level = 0.01)
## 
##      Chi squared power calculation 
## 
##               w = 0.2182179
##               N = 312.4671
##              df = 1
##       sig.level = 0.01
##           power = 0.9
## 
## NOTE: N is the number of observations

About 313.

If you don’t suspect association in either direction, or you don’t feel like building a matrix in R, you can try a conventional effect size. For example, how many students should we sample to detect a small effect?

cohen.ES(test = "chisq", size = "small")
## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = chisq
##            size = small
##     effect.size = 0.1
pwr.chisq.test(w = 0.1, power = 0.9, df = 1, sig.level = 0.01)
## 
##      Chi squared power calculation 
## 
##               w = 0.1
##               N = 1487.939
##              df = 1
##       sig.level = 0.01
##           power = 0.9
## 
## NOTE: N is the number of observations

1,488 students. Perhaps more than we thought we might need.

We could consider reframing the question as a two-sample proportion test. What sample size do we need to detect a “small” effect in gender on the proportion of students who floss with 90% power and a significance level of 0.01?

pwr.2p.test(h = 0.2, sig.level = 0.01, power = 0.9)
## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.2
##               n = 743.9694
##       sig.level = 0.01
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: same sample sizes

About 744 per group. Notice that 744 \(\times\) 2 = 1,488, the sample size returned previously by pwr.chisq.test. In fact the test statistic for a two-sample proportion test and chi-square test of association are one and the same.
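We can check this equivalence directly in base R with some made-up counts (the counts below are purely hypothetical): without the continuity correction, prop.test and chisq.test return the same test statistic.

```r
# Hypothetical counts: 10 of 50 males and 20 of 50 females floss daily.
flossers <- c(10, 20)
totals   <- c(50, 50)
pt <- prop.test(flossers, totals, correct = FALSE)
ct <- chisq.test(cbind(flossers, totals - flossers), correct = FALSE)
unname(pt$statistic)  # X-squared from the two-sample proportion test
unname(ct$statistic)  # identical X-squared from the test of association
```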

pwr.r.test - correlation test

(From Hogg & Tanis, exercise 8.9-12) A graduate student is investigating the effectiveness of a fitness program. She wants to see if there is a correlation between the weight of a participant at the beginning of the program and the participant’s weight change after 6 months. She suspects there is a “small” positive linear relationship between these two quantities. She will measure this relationship with correlation, r, and conduct a correlation test to determine if the estimated correlation is statistically greater than 0. How many subjects does she need to sample to detect this small positive (i.e., r > 0) relationship with 80% power and 0.01 significance level?

There is nothing tricky about the effect size argument, r. It is simply the hypothesized correlation. It can take values ranging from -1 to 1.

cohen.ES(test = "r", size = "small")
## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = r
##            size = small
##     effect.size = 0.1
pwr.r.test(r = 0.1, sig.level = 0.01, power = 0.8, alternative = "greater")
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 999.2054
##               r = 0.1
##       sig.level = 0.01
##           power = 0.8
##     alternative = greater

She needs to observe about 1,000 subjects.

The default is a two-sided test. We specified alternative = "greater" since we believe there is a small positive effect.

If she just wants to detect a small effect in either direction (positive or negative correlation), she can use the default “two.sided” alternative by removing the alternative argument from the function.

pwr.r.test(r = 0.1, sig.level = 0.01, power = 0.8)
## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 1162.564
##               r = 0.1
##       sig.level = 0.01
##           power = 0.8
##     alternative = two.sided

Now she needs to observe at least 1163 subjects. Detecting small effects requires large sample sizes.

pwr.anova.test - balanced one-way analysis of variance tests

(From Hogg & Tanis, exercise 8.7-11) The driver of a diesel-powered car decides to test the quality of three types of fuel sold in his area based on the miles per gallon (mpg) his car gets on each fuel. He will use a balanced one-way ANOVA to test the null that the mean mpg is the same for each fuel versus the alternative that the means are different. (“balanced” means equal sample size in each group; “one-way” means one grouping variable.) How many times does he need to try each fuel to have 90% power to detect a “medium” effect with a significance of 0.01?

We use cohen.ES to learn that the “medium” effect value is 0.25. We put that in the f argument of pwr.anova.test. We also need to specify the number of groups using the k argument.

cohen.ES(test = "anov", size = "medium")
## 
##      Conventional effect size from Cohen (1982) 
## 
##            test = anov
##            size = medium
##     effect.size = 0.25
pwr.anova.test(k = 3, f = 0.25, sig.level = 0.01, power = 0.9)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 94.48714
##               f = 0.25
##       sig.level = 0.01
##           power = 0.9
## 
## NOTE: n is number in each group

He would need to measure mpg 95 times for each type of fuel. His experiment may take a while to complete.

The effect size f is calculated as follows:

\[f = \frac{\sigma_{means}}{\sigma_{pop'n}}\]

where \(\sigma_{means}\) is the standard deviation of the k means and \(\sigma_{pop'n}\) is the common standard deviation of the k groups. These two quantities are also known as the between-group and within-group standard deviations. If our driver suspects the between-group standard deviation is 5 mpg and the within-group standard deviation is 3 mpg, f = 5/3.

pwr.anova.test(k = 3, f = 5/3, sig.level = 0.01, power = 0.9)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 3.842228
##               f = 1.666667
##       sig.level = 0.01
##           power = 0.9
## 
## NOTE: n is number in each group

In this case he only needs to try each fuel 4 times. Clearly the hypothesized effect size has important consequences for the estimated sample size.
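If the driver prefers to reason from hypothesized group means rather than a guessed between-group standard deviation, Cohen’s f can be computed directly. The mpg values below are invented purely for illustration; note that f uses the population form of the standard deviation of the k means (denominator k, not k - 1):

```r
# Invented mean mpg for the three fuels and a guessed within-group SD
means <- c(20, 25, 30)
sd_within <- 3

# Cohen's f: SD of the k means (denominator k) over the within-group SD
k <- length(means)
sd_means <- sqrt(sum((means - mean(means))^2) / k)
f <- sd_means / sd_within
f
## [1] 1.360828
```

The resulting f could then be supplied to pwr.anova.test exactly as 5/3 was above.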

We can also use the power.anova.test function that comes with base R. It requires between-group and within-group variances. To get the same result as pwr.anova.test we need to square the standard deviations to get variances and multiply the between-group variance by \(\frac{k}{k-1}\). This is because the effect size formula for the ANOVA test assumes the between-group variance has a denominator of k instead of k - 1.

power.anova.test(groups = 3, 
                 within.var = 3^2, 
                 between.var = 5^2 * (3/2), 
                 sig.level = 0.01, power = 0.90)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##          groups = 3
##               n = 3.842225
##     between.var = 37.5
##      within.var = 9
##       sig.level = 0.01
##           power = 0.9
## 
## NOTE: n is number in each group

pwr.f2.test - test for the general linear model

(From Kutner, et al, exercise 8.43) A director of admissions at a university wants to determine how accurately students’ grade-point averages (gpa) at the end of their first year can be predicted or explained by SAT scores and high school class rank. A common approach to answering this kind of question is to model gpa as a function of SAT score and class rank. Or, to put it another way, we can perform a multiple regression with gpa as the dependent variable and SAT and class rank as independent variables.

The null hypothesis is that none of the independent variables explain any of the variability in gpa. This would mean their regression coefficients are statistically indistinguishable from 0. The alternative is that at least one of the coefficients is not 0. This is tested with an F test. We can estimate power and sample size for this test using the pwr.f2.test function.

The F test has numerator and denominator degrees of freedom. The numerator degrees of freedom, u, is the number of coefficients you’ll have in your model (minus the intercept). In our example, u = 2. The denominator degrees of freedom, v, is the number of error degrees of freedom: \(v = n - u - 1\). This implies \(n = v + u + 1\).

The effect size, f2, is \(R^{2}/(1 - R^{2})\), where \(R^{2}\) is the coefficient of determination, aka the “proportion of variance explained”. To determine effect size you hypothesize the proportion of variance your model explains, or the \(R^{2}\). For example, if I think my model explains 45% of the variance in my dependent variable, the effect size is 0.45/(1 - 0.45) \(\approx\) 0.81.
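As a quick sketch, the conversion from a hypothesized \(R^{2}\) to f2 is a single line of R (the 0.45 here is the illustrative value from the text):

```r
# Convert a hypothesized R^2 into Cohen's f2
R2 <- 0.45
f2 <- R2 / (1 - R2)
f2
## [1] 0.8181818
```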

Returning to our example, let’s say the director of admissions hypothesizes his model explains about 30% of the variability in gpa. How large of a sample does he need to take to detect this effect with 80% power at a 0.001 significance level?

pwr.f2.test(u = 2, f2 = 0.3/(1 - 0.3), sig.level = 0.001, power = 0.8)
## 
##      Multiple regression power calculation 
## 
##               u = 2
##               v = 49.88971
##              f2 = 0.4285714
##       sig.level = 0.001
##           power = 0.8

Recall \(n = v + u + 1\). Therefore he needs 50 + 2 + 1 = 53 student records.
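The bookkeeping from v back to n is easy to get wrong. A small sketch, rounding v up before adding back u and the intercept:

```r
# Recover the required sample size from the pwr.f2.test output above
u <- 2
v <- 49.88971          # v returned by pwr.f2.test
n <- ceiling(v) + u + 1
n
## [1] 53
```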

What is the power of the test with 40 subjects and a significance level of 0.01? Recall \(v = n - u - 1\).

pwr.f2.test(u = 2, v = 40 - 2 - 1, f2 = 0.3/(1 - 0.3), sig.level = 0.01)
## 
##      Multiple regression power calculation 
## 
##               u = 2
##               v = 37
##              f2 = 0.4285714
##       sig.level = 0.01
##           power = 0.8406124

Power is about 84%.

References and Further Reading

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). LEA.
Dalgaard, P. (2002). Introductory Statistics with R. Springer. (Ch. 2)
Hogg, R. and Tanis, E. (2006). Probability and Statistical Inference (7th ed.). Pearson. (Ch. 9)
Kabacoff, R. (2011). R in Action. Manning. (Ch. 10)
Kutner, et al. (2005). Applied Linear Statistical Models. McGraw-Hill. (Ch. 16)
Ryan, T. (2013). Sample Size Determination and Power. Wiley.

The CRAN Task View for Clinical Trial Design, Monitoring, and Analysis lists various R packages that also perform sample size and power calculations.

pwr/inst/doc/pwr-vignette.R0000644000176200001440000001606713246571172015404 0ustar liggesusers## ----echo=TRUE----------------------------------------------------------- library(pwr) pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), sig.level = 0.05, power = 0.80, alternative = "greater") ## ----fig.height=5, fig.width=5------------------------------------------- p.out <- pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), sig.level = 0.05, power = 0.80, alternative = "greater") plot(p.out) ## ------------------------------------------------------------------------ pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), sig.level = 0.01, n = 40, alternative = "greater") ## ------------------------------------------------------------------------ pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), sig.level = 0.01, n = 40) ## ------------------------------------------------------------------------ pwr.p.test(h = ES.h(p1 = 0.65, p2 = 0.50), sig.level = 0.05, power = 0.80) ## ------------------------------------------------------------------------ cohen.ES(test = "r", size = "medium") ## ------------------------------------------------------------------------ pwr.p.test(h = c(0.2,0.5,0.8), n = 20, sig.level = 0.05) ## ------------------------------------------------------------------------ n <- seq(10,100,10) p.out <- pwr.p.test(h = 0.5, n = n, sig.level = 0.05) data.frame(n, power = sprintf("%.2f%%", p.out$power * 100)) ## ------------------------------------------------------------------------ pwr.p.test(h = 0.5, n = n, sig.level = 0.05)$power ## ------------------------------------------------------------------------ pwr.2p.test(h = ES.h(p1 = 0.55, p2 = 0.50), sig.level = 0.05, power = .80) ## ------------------------------------------------------------------------ pwr.2p.test(h = ES.h(p1 = 0.10, p2 = 0.05), sig.level = 0.05, power = .80) ## ----fig.height=5, fig.width=5------------------------------------------- addSegs <- function(p1, p2){ tp1 <- ES.h(p1, 0); tp2 <- ES.h(p2, 0) segments(p1,0,p1,tp1, 
col="blue"); segments(p2,0,p2,tp2,col="blue") segments(0, tp1, p1, tp1, col="red"); segments(0, tp2, p2, tp2, col="red") } curve(expr = ES.h(p1 = x, p2 = 0), xlim = c(0,1), xlab = "proportion", ylab = "transformed proportion") addSegs(p1 = 0.50, p2 = 0.55) # 50% vs 55% addSegs(p1 = 0.05, p2 = 0.10) # 5% vs 10% ## ------------------------------------------------------------------------ power.prop.test(p1 = 0.55, p2 = 0.50, sig.level = 0.05, power = .80) ## ------------------------------------------------------------------------ pwr.2p.test(h = 0.2, sig.level = 0.05, power = .80) ## ------------------------------------------------------------------------ cohen.ES(test = "p", size = "small") pwr.2p2n.test(h = 0.2, n1 = 543, n2 = 675, sig.level = 0.05) ## ------------------------------------------------------------------------ pwr.2p2n.test(h = 0.2, n1 = 763, power = 0.8, sig.level = 0.05) ## ------------------------------------------------------------------------ cohen.ES(test = "t", size = "medium") pwr.t.test(n = 30, d = 0.5, sig.level = 0.05) ## ------------------------------------------------------------------------ pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05) ## ------------------------------------------------------------------------ d <- 0.75/2.25 pwr.t.test(d = d, power = 0.80, sig.level = 0.05) ## ------------------------------------------------------------------------ power.t.test(delta = 0.75, sd = 2.25, sig.level = 0.05, power = 0.8) ## ------------------------------------------------------------------------ d <- 0.50/2.25 pwr.t.test(d = d, sig.level = 0.05, power = 0.90, alternative = "greater", type = "one.sample") ## ------------------------------------------------------------------------ power.t.test(delta = 0.50, sd = 2.25, power = 0.90, sig.level = 0.05, alternative = "one.sided", type = "one.sample") ## ------------------------------------------------------------------------ pwr.t.test(n = 24, d = 0.08 / 0.25, type = "paired", alternative = 
"greater") ## ------------------------------------------------------------------------ pwr.t.test(d = 0.08 / 0.25, power = 0.8, type = "paired", alternative = "greater") ## ------------------------------------------------------------------------ pwr.t.test(d = 0.08 / (0.1 * sqrt(2)), power = 0.8, type = "paired", alternative = "greater") ## ------------------------------------------------------------------------ pwr.t2n.test(n1 = 28, n2 = 35, d = 0.5) ## ------------------------------------------------------------------------ null <- rep(0.25, 4) alt <- c(3/8, rep((5/8)/3, 3)) ES.w1(null,alt) ## ------------------------------------------------------------------------ pwr.chisq.test(w=ES.w1(null,alt), N=100, df=(4-1), sig.level=0.05) ## ------------------------------------------------------------------------ pwr.chisq.test(w=ES.w1(null,alt), df=(4-1), power=0.8, sig.level = 0.05) ## ------------------------------------------------------------------------ prob <- matrix(c(0.1,0.2,0.4,0.3), ncol=2, dimnames = list(c("M","F"),c("Floss","No Floss"))) prob ## ------------------------------------------------------------------------ ES.w2(prob) ## ------------------------------------------------------------------------ pwr.chisq.test(w = ES.w2(prob), N = 100, df = 1, sig.level = 0.01) ## ------------------------------------------------------------------------ pwr.chisq.test(w = ES.w2(prob), power = 0.9, df = 1, sig.level = 0.01) ## ------------------------------------------------------------------------ cohen.ES(test = "chisq", size = "small") pwr.chisq.test(w = 0.1, power = 0.9, df = 1, sig.level = 0.01) ## ------------------------------------------------------------------------ pwr.2p.test(h = 0.2, sig.level = 0.01, power = 0.9) ## ------------------------------------------------------------------------ cohen.ES(test = "r", size = "small") pwr.r.test(r = 0.1, sig.level = 0.01, power = 0.8, alternative = "greater") ## 
------------------------------------------------------------------------ pwr.r.test(r = 0.1, sig.level = 0.01, power = 0.8) ## ------------------------------------------------------------------------ cohen.ES(test = "anov", size = "medium") pwr.anova.test(k = 3, f = 0.25, sig.level = 0.01, power = 0.9) ## ------------------------------------------------------------------------ pwr.anova.test(k = 3, f = 5/3, sig.level = 0.01, power = 0.9) ## ------------------------------------------------------------------------ power.anova.test(groups = 3, within.var = 3^2, between.var = 5^2 * (3/2), sig.level = 0.01, power = 0.90) ## ------------------------------------------------------------------------ pwr.f2.test(u = 2, f2 = 0.3/(1 - 0.3), sig.level = 0.001, power = 0.8) ## ------------------------------------------------------------------------ pwr.f2.test(u = 2, v = 40 - 2 - 1, f2 = 0.3/(1 - 0.3), sig.level = 0.01) pwr/inst/doc/pwr-vignette.Rmd0000644000176200001440000007137013065040321015705 0ustar liggesusers--- title: "Getting started with the pwr package" author: "Clay Ford" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started with the pwr package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- The basic idea of calculating power or sample size with functions in the pwr package is to _leave out_ the argument that you want to calculate. If you want to calculate power, then leave the `power` argument out of the function. If you want to calculate sample size, leave `n` out of the function. Whatever parameter you want to calculate is determined from the others. You select a function based on the statistical test you plan to use to analyze your data. If you plan to use a two-sample t-test to compare two means, you would use the `pwr.t.test` function for estimating sample size or power. All functions for power and sample size analysis in the pwr package begin with `pwr`. 
Functions are available for the following statistical tests: - `pwr.p.test`: one-sample proportion test - `pwr.2p.test`: two-sample proportion test - `pwr.2p2n.test`: two-sample proportion test (unequal sample sizes) - `pwr.t.test`: two-sample, one-sample and paired t-tests - `pwr.t2n.test`: two-sample t-tests (unequal sample sizes) - `pwr.anova.test`: one-way balanced ANOVA - `pwr.r.test`: correlation test - `pwr.chisq.test`: chi-squared test (goodness of fit and association) - `pwr.f2.test`: test for the general linear model There are also a few convenience functions for calculating effect size as well as a generic `plot` function for plotting power versus sample size. All of these are demonstrated in the examples below. ## A simple example Let's say we suspect we have a loaded coin that lands heads 75% of the time instead of the expected 50%. We wish to create an experiment to test this. We will flip the coin a certain number of times and observe the proportion of heads. We will then conduct a one-sample proportion test to see if the proportion of heads is significantly different from what we would expect with a fair coin. We will judge significance by our p-value. If our p-value falls below a certain threshold, say 0.05, we will conclude our coin's behavior is inconsistent with that of a fair coin. - Our null hypothesis is that the coin is fair and lands heads 50% of the time ($\pi$ = 0.50). - Our alternative hypothesis is that the coin is loaded to land heads more then 50% of the time ($\pi$ > 0.50). How many times should we flip the coin to have a high probability (or _power_), say 0.80, of correctly rejecting the null of $\pi$ = 0.5 if our coin is indeed loaded to land heads 75% of the time? Here is how we can determine this using the `pwr.p.test` function. 
```{r echo=TRUE} library(pwr) pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), sig.level = 0.05, power = 0.80, alternative = "greater") ``` The function tells us we should flip the coin 22.55127 times, which we round up to 23. Always round sample size estimates up. If we're correct that our coin lands heads 75% of the time, we need to flip it at least 23 times to have an 80% chance of correctly rejecting the null hypothesis at the 0.05 significance level. Notice that since we wanted to determine sample size (`n`), we left it out of the function. Our _effect size_ is entered in the `h` argument. The label `h` is due to Cohen (1988). The function `ES.h` is used to calculate a unitless effect size using the arcsine transformation. (More on effect size below.) `sig.level` is the argument for our desired significance level. This is also sometimes referred to as our tolerance for a Type I error ($\alpha$). `power` is our desired power. It is sometimes referred to as 1 - $\beta$, where $\beta$ is Type II error. The `alternative` argument says we think the alternative is "greater" than the null, not just different. Type I error, $\alpha$, is the probability of rejecting the null hypothesis when it is true. This is thinking we have found an effect where none exist. This is considered the more serious error. Our tolerance for Type I error is usually 0.05 or lower. Type II error, $\beta$, is the probability of failing to reject the null hypothesis when it is false. This is thinking there is no effect when in fact there is. Our tolerance for Type II error is usually 0.20 or lower. Type II error is 1 - Power. If we desire a power of 0.90, then we implicitly specify a Type II error tolerance of 0.10. The pwr package provides a generic `plot` function that allows us to see how power changes as we change our sample size. If you have the ggplot2 package installed, it will create a plot using `ggplot`. Otherwise base R graphics are used. 
```{r fig.height=5, fig.width=5} p.out <- pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), sig.level = 0.05, power = 0.80, alternative = "greater") plot(p.out) ``` What is the power of our test if we flip the coin 40 times and lower our Type I error tolerance to 0.01? Notice we leave out the `power` argument, add `n = 40`, and change `sig.level = 0.01`: ```{r} pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), sig.level = 0.01, n = 40, alternative = "greater") ``` The power of our test is about 84%. We specified `alternative = "greater"` since we assumed the coin was loaded for more heads (not less). This is a stronger assumption than assuming that the coin is simply unfair in one way or another. In practice, sample size and power calculations will usually make the more conservative "two-sided" assumption. In fact this is the default for pwr functions with an `alternative` argument. If we wish to assume a "two-sided" alternative, we can simply leave it out of the function. Notice how our power estimate drops below 80% when we do this. ```{r} pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50), sig.level = 0.01, n = 40) ``` What if we assume the "loaded" effect is smaller? Maybe the coin lands heads 65% of the time. How many flips do we need to perform to detect this smaller effect at the 0.05 level with 80% power and the more conservative two-sided alternative? ```{r} pwr.p.test(h = ES.h(p1 = 0.65, p2 = 0.50), sig.level = 0.05, power = 0.80) ``` About 85 coin flips. Detecting smaller effects require larger sample sizes. ## More on effect size Cohen describes effect size as "the degree to which the null hypothesis is false." In our coin flipping example, this is the difference between 75% and 50%. We could say the effect was 25% but recall we had to transform the absolute difference in proportions to another quantity using the `ES.h` function. 
This is a crucial part of using the pwr package correctly: _You must provide an effect size on the expected scale._ Doing otherwise will produce wrong sample size and power calculations. When in doubt, we can use _Conventional Effect Sizes_. These are pre-determined effect sizes for "small", "medium", and "large" effects. The `cohen.ES` function returns a conventional effect size for a given test and size. For example, the medium effect size for the correlation test is 0.3: ```{r} cohen.ES(test = "r", size = "medium") ``` For convenience, here are all conventional effect sizes for all tests in the pwr package: Test | `small` | `medium` | `large` ----------------------------|-------|--------|------- tests for proportions (`p`) | 0.2 | 0.5 | 0.8 tests for means (`t`) | 0.2 | 0.5 | 0.8 chi-square tests (`chisq`) | 0.1 | 0.3 | 0.5 correlation test (`r`) | 0.1 | 0.3 | 0.5 anova (`anov`) | 0.1 | 0.25 | 0.4 general linear model (`f2`) | 0.02 | 0.15 | 0.35 It is worth noting that pwr functions can take vectors for effect size and n arguments. This allows us to make many power calculations at once, either for multiple effect sizes or multiple sample sizes. For example, let's see how power changes for our coin flipping experiment for the three conventional effect sizes of 0.2, 0.5, and 0.8, assuming a sample size of 20. ```{r} pwr.p.test(h = c(0.2,0.5,0.8), n = 20, sig.level = 0.05) ``` As we demonstrated with the `plot` function above, we can save our results. This produces a list object from which we can extract quantities for further manipulation. For example, we can calculate power for sample sizes ranging from 10 to 100 in steps of 10, with an assumed "medium" effect of 0.5, and output to a data frame with some formatting: ```{r} n <- seq(10,100,10) p.out <- pwr.p.test(h = 0.5, n = n, sig.level = 0.05) data.frame(n, power = sprintf("%.2f%%", p.out$power * 100)) ``` We can also directly extract quantities with the `$` function appended to the end of a pwr function. 
For example, ```{r} pwr.p.test(h = 0.5, n = n, sig.level = 0.05)$power ``` ## More examples ### pwr.2p.test - two-sample test for proportions Let's say we want to randomly sample male and female college undergraduate students and ask them if they consume alcohol at least once a week. Our null hypothesis is no difference in the proportion that answer yes. Our alternative hypothesis is that there is a difference. This is a two-sided alternative; one gender has higher proportion but we don't know which. We would like to detect a difference as small as 5%. How many students do we need to sample in each group if we want 80% power and a significance level of 0.05? If we think one group proportion is 55% and the other 50%: ```{r} pwr.2p.test(h = ES.h(p1 = 0.55, p2 = 0.50), sig.level = 0.05, power = .80) ``` Notice the sample size is _per group_. We need to sample 1,565 males and 1,565 females to detect the 5% difference with 80% power. If we think one group proportion is 10% and the other 5%: ```{r} pwr.2p.test(h = ES.h(p1 = 0.10, p2 = 0.05), sig.level = 0.05, power = .80) ``` Even though the absolute difference between proportions is the same (5%), the optimum sample size is now 424 per group. 10% vs 5% is actually a bigger difference than 55% vs 50%. A heuristic approach for understanding why is to compare the ratios: 55/50 = 1.1 while 10/5 = 2. The `ES.h` function performs an arcsine transformation on both proportions and returns the difference. By setting `p2` to 0, we can see the transformed value for `p1`. We can exploit this to help us visualize how the transformation creates larger effects for two proportions closer to 0 or 1. Below we plot transformed proportions versus untransformed proportions and then compare the distance between pairs of proportions on each axis. 
```{r fig.height=5, fig.width=5} addSegs <- function(p1, p2){ tp1 <- ES.h(p1, 0); tp2 <- ES.h(p2, 0) segments(p1,0,p1,tp1, col="blue"); segments(p2,0,p2,tp2,col="blue") segments(0, tp1, p1, tp1, col="red"); segments(0, tp2, p2, tp2, col="red") } curve(expr = ES.h(p1 = x, p2 = 0), xlim = c(0,1), xlab = "proportion", ylab = "transformed proportion") addSegs(p1 = 0.50, p2 = 0.55) # 50% vs 55% addSegs(p1 = 0.05, p2 = 0.10) # 5% vs 10% ``` The differences on the x-axis between the two pairs of proportions is the same (0.05), but the difference is larger for 5% vs 10% on the y-axis. The `ES.h` function returns the distance between the red lines. Base R has a function called `power.prop.test` that allows us to use the raw proportions in the function without a need for a separate effect size function. ```{r} power.prop.test(p1 = 0.55, p2 = 0.50, sig.level = 0.05, power = .80) ``` Notice the results are slightly different. It calculates effect size differently. If we don't have any preconceived estimates of proportions or don't feel comfortable making estimates, we can use conventional effect sizes of 0.2 (small), 0.5 (medium), or 0.8 (large). The sample size per group needed to detect a "small" effect with 80% power and 0.05 significance is about 393: ```{r} pwr.2p.test(h = 0.2, sig.level = 0.05, power = .80) ``` ### pwr.2p2n.test - two-sample test for proportions, unequal sample sizes Let's return to our undergraduate survey of alcohol consumption. It turns out we were able to survey 543 males and 675 females. The power of our test if we're interested in being able to detect a "small" effect size with 0.05 significance is about 93%. ```{r} cohen.ES(test = "p", size = "small") pwr.2p2n.test(h = 0.2, n1 = 543, n2 = 675, sig.level = 0.05) ``` Let's say we previously surveyed 763 female undergraduates and found that _p_% said they consumed alcohol once a week. We would like to survey some males and see if a significantly different proportion respond yes. 
How many do I need to sample to detect a small effect size (0.2) in either direction with 80% power and a significance level of 0.05? ```{r} pwr.2p2n.test(h = 0.2, n1 = 763, power = 0.8, sig.level = 0.05) ``` About 265. ### pwr.t.test - one-sample and two-sample t tests for means We're interested to know if there is a difference in the mean price of what male and female students pay at a library coffee shop. Let's say we randomly observe 30 male and 30 female students check out from the coffee shop and calculate the mean purchase price for each gender. We'll test for a difference in means using a two-sample t-test. How powerful is this experiment if we want to detect a "medium" effect in either direction with a significance level of 0.05? ```{r} cohen.ES(test = "t", size = "medium") pwr.t.test(n = 30, d = 0.5, sig.level = 0.05) ``` Only 48%. Not very powerful. How many students should we observe for a test with 80% power? ```{r} pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05) ``` About 64 per group. Let's say we want to be able to detect a difference of at least 75 cents in the mean purchase price. We need to convert that to an effect size using the following formula: $$d = \frac{m_{1} - m_{2}}{\sigma} $$ where $m_{1}$ and $m_{2}$ are the means of each group, respectively, and $\sigma$ is the common standard deviation of the two groups. Again, the label _d_ is due to Cohen (1988). We have $m_{1} - m_{2} =$ 0.75. We need to make a guess at the population standard deviation. If we have absolutely no idea, one rule of thumb is to take the difference between the maximum and minimum values and divide by 4. Let's say the maximum purchase is $10 and the minimum purchase is $1. Our estimated standard deviation is (10 - 1)/4 = 2.25. Therefore our effect size is 0.75/2.25 $\approx$ 0.333. 
```{r} d <- 0.75/2.25 pwr.t.test(d = d, power = 0.80, sig.level = 0.05) ``` For a desired power of 80%, Type I error tolerance of 0.05, and a hypothesized effect size of 0.333, we should sample at least 143 per group. Performing the same analysis with the base R function `power.t.test` is a little easier. The difference $m_{1} - m_{2} =$ 0.75 is entered in the `delta` argument and the estimated $\sigma$ = 2.25 is entered in the `sd` argument: ```{r} power.t.test(delta = 0.75, sd = 2.25, sig.level = 0.05, power = 0.8) ``` To calculate power and sample size for one-sample t-tests, we need to set the `type` argument to `"one.sample"`. By default it is set to `"two.sample"`. For example, we think the average purchase price at the Library coffee shop is over $3 per student. Our null is $3 or less; our alternative is greater than $3. We can use a one-sample t-test to investigate this hunch. If the true average purchase price is $3.50, we would like to have 90% power to declare the estimated average purchase price is greater than $3. How many transactions do we need to observe assuming a significance level of 0.05? Let's say the maximum purchase price is $10 and the minimum is $1. So our guess at a standard deviation is 9/4 = 2.25. ```{r} d <- 0.50/2.25 pwr.t.test(d = d, sig.level = 0.05, power = 0.90, alternative = "greater", type = "one.sample") ``` We should plan on observing at least 175 transactions. To use the `power.t.test` function, set `type = "one.sample"` and `alternative = "one.sided"`: ```{r} power.t.test(delta = 0.50, sd = 2.25, power = 0.90, sig.level = 0.05, alternative = "one.sided", type = "one.sample") ``` "Paired" t-tests are basically the same as one-sample t-tests, except our one sample is usually differences in pairs. The following example should make this clear. (_From Hogg & Tanis, exercise 6.5-12_) 24 high school boys are put on a ultra-heavy rope-jumping program. Does this decrease their 40-yard dash time (i.e., make them faster)? 
We'll measure their 40 time in seconds before the program and after. We'll use a paired t-test to see if the difference in times is greater than 0 (before - after). Assume the standard deviation of the differences will be about 0.25 seconds. How powerful is the test to detect a difference of about 0.08 seconds with 0.05 significance? Notice we set `type = "paired"`: ```{r} pwr.t.test(n = 24, d = 0.08 / 0.25, type = "paired", alternative = "greater") ``` Only 45%. Not all that powerful. How many high school boys should we sample for 80% power? ```{r} pwr.t.test(d = 0.08 / 0.25, power = 0.8, type = "paired", alternative = "greater") ``` About 62. For paired t-tests we sometimes estimate a standard deviation for _within_ pairs instead of for the difference in pairs. In our example, this would mean an estimated standard deviation for each boy's 40-yard dash times. When dealing with this type of estimated standard deviation we need to multiply it by $\sqrt{2}$ in the `pwr.t.test` function. Let's say we estimate the standard deviation of each boy's 40-yard dash time to be about 0.10 seconds. The sample size needed to detect a difference of 0.08 seconds is now calculated as follows: ```{r} pwr.t.test(d = 0.08 / (0.1 * sqrt(2)), power = 0.8, type = "paired", alternative = "greater") ``` We need to sample at least 21 students. ### pwr.t2n.test - two-sample t test for means, unequal sample sizes Find power for a two-sample t-test with 28 in one group and 35 in the other group and a medium effect size. (sig.level defaults to 0.05.) ```{r} pwr.t2n.test(n1 = 28, n2 = 35, d = 0.5) ``` ### pwr.chisq.test - Goodness of fit test (_From Cohen, example 7.1_) A market researcher is seeking to determine preference among 4 package designs. He arranges to have a panel of 100 consumers rate their favorite package design. He wants to perform a chi-square goodness of fit test against the null of equal preference (25% for each design) with a significance level of 0.05. 
What's the power of the test if 3/8 of the population actually prefers one of the designs and the remaining 5/8 are split over the other 3 designs? We use the `ES.w1` function to calculate effect size. To do so, we need to create vectors of null and alternative proportions: ```{r} null <- rep(0.25, 4) alt <- c(3/8, rep((5/8)/3, 3)) ES.w1(null,alt) ``` To calculate power, specify effect size (`w`), sample size (`N`), and degrees of freedom, which is the number of categories minus 1 (`df` = 4 - 1). ```{r} pwr.chisq.test(w=ES.w1(null,alt), N=100, df=(4-1), sig.level=0.05) ``` If our estimated effect size is correct, we only have about a 67% chance of finding it (i.e., rejecting the null hypothesis of equal preference). How many subjects do we need to achieve 80% power? ```{r} pwr.chisq.test(w=ES.w1(null,alt), df=(4-1), power=0.8, sig.level = 0.05) ``` If our alternative hypothesis is correct then we need to survey at least 131 people to detect it with 80% power. ### pwr.chisq.test - test of association We want to see if there's an association between gender and flossing teeth among college students. We randomly sample 100 students (male and female) and ask whether or not they floss daily. We want to carry out a chi-square test of association to determine if there's an association between these two variables. We set our significance level to 0.01. To determine effect size we need to propose an alternative hypothesis, which in this case is a table of proportions. We propose the following: gender | Floss |No Floss --|------|-------- Male | 0.1 | 0.4 Female | 0.2 | 0.3 We use the `ES.w2` function to calculate effect size for chi-square tests of association ```{r} prob <- matrix(c(0.1,0.2,0.4,0.3), ncol=2, dimnames = list(c("M","F"),c("Floss","No Floss"))) prob ``` This says we sample even proportions of male and females, but believe 10% more females floss. Now use the matrix to calculate effect size: ```{r} ES.w2(prob) ``` We also need degrees of freedom. 
`df` = (2 - 1) * (2 - 1) = 1 And now to calculate power: ```{r} pwr.chisq.test(w = ES.w2(prob), N = 100, df = 1, sig.level = 0.01) ``` At only 35% this is not a very powerful experiment. How many students should I survey if I wish to achieve 90% power? ```{r} pwr.chisq.test(w = ES.w2(prob), power = 0.9, df = 1, sig.level = 0.01) ``` About 313. If you don't suspect association in either direction, or you don't feel like building a matrix in R, you can try a conventional effect size. For example, how many students should we sample to detect a small effect? ```{r} cohen.ES(test = "chisq", size = "small") pwr.chisq.test(w = 0.1, power = 0.9, df = 1, sig.level = 0.01) ``` 1,488 students. Perhaps more than we thought we might need. We could consider reframing the question as a two-sample proportion test. What sample size do we need to detect a "small" effect in gender on the proportion of students who floss with 90% power and a significance level of 0.01? ```{r} pwr.2p.test(h = 0.2, sig.level = 0.01, power = 0.9) ``` About 744 per group. Notice that 744 $\times$ 2 = 1,488, the sample size returned previously by `pwr.chisq.test`. In fact the test statistic for a two-sample proportion test and chi-square test of association are one and the same. ### pwr.r.test - correlation test (_From Hogg & Tanis, exercise 8.9-12_) A graduate student is investigating the effectiveness of a fitness program. She wants to see if there is a correlation between the weight of a participant at the beginning of the program and the participant's weight change after 6 months. She suspects there is a "small" positive linear relationship between these two quantities. She will measure this relationship with correlation, _r_, and conduct a correlation test to determine if the estimated correlation is statistically greater than 0. How many subjects does she need to sample to detect this small positive (i.e., _r_ > 0) relationship with 80% power and 0.01 significance level? 
There is nothing tricky about the effect size argument, `r`. It is simply the hypothesized correlation. It can take values ranging from -1 to 1.

```{r}
cohen.ES(test = "r", size = "small")
pwr.r.test(r = 0.1, sig.level = 0.01, power = 0.8, alternative = "greater")
```

She needs to observe about 1,000 subjects. The default is a two-sided test. We specify `alternative = "greater"` since we believe there is a small positive effect. If she just wants to detect a small effect in either direction (positive or negative correlation), use the default setting of "two.sided", which we can do by removing the `alternative` argument from the function.

```{r}
pwr.r.test(r = 0.1, sig.level = 0.01, power = 0.8)
```

Now she needs to observe 1,163 subjects. Detecting small effects requires large sample sizes.

### pwr.anova.test - balanced one-way analysis of variance tests

(_From Hogg & Tanis, exercise 8.7-11_)

The driver of a diesel-powered car decides to test the quality of three types of fuel sold in his area based on the miles per gallon (mpg) his car gets on each fuel. He will use a balanced one-way ANOVA to test the null that the mean mpg is the same for each fuel versus the alternative that the means are different. ("Balanced" means equal sample size in each group; "one-way" means one grouping variable.) How many times does he need to try each fuel to have 90% power to detect a "medium" effect at a significance level of 0.01?

We use `cohen.ES` to learn that the "medium" effect value is 0.25. We put that in the `f` argument of `pwr.anova.test`. We also need to specify the number of groups using the `k` argument.

```{r}
cohen.ES(test = "anov", size = "medium")
pwr.anova.test(k = 3, f = 0.25, sig.level = 0.01, power = 0.9)
```

He would need to measure mpg 95 times for each type of fuel. His experiment may take a while to complete. 
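Every `pwr` function returns a `power.htest` object, and the package supplies a `plot` method for these objects. As a quick sketch (assuming the `pwr` package is loaded, as in the rest of this vignette), we can plot the power curve for the fuel experiment to see how power grows with the per-group sample size:

```{r}
# power versus sample size for the balanced one-way ANOVA design above
plot(pwr.anova.test(k = 3, f = 0.25, sig.level = 0.01, power = 0.9))
```

The plot annotates the optimal sample size, which makes it easy to see how much power a smaller, cheaper design would give up.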
The effect size `f` is calculated as follows:

$$f = \frac{\sigma_{means}}{\sigma_{pop'n}}$$

where $\sigma_{means}$ is the standard deviation of the _k_ means and $\sigma_{pop'n}$ is the common standard deviation of the _k_ groups. These two quantities are also known as the _between-group_ and _within-group_ standard deviations. If our driver suspects the between-group standard deviation is 5 mpg and the within-group standard deviation is 3 mpg, `f` = 5/3.

```{r}
pwr.anova.test(k = 3, f = 5/3, sig.level = 0.01, power = 0.9)
```

In this case he only needs to try each fuel 4 times. Clearly the hypothesized effect size has important consequences for the estimated optimal sample size.

We can also use the `power.anova.test` function that comes with base R. It requires between-group and within-group _variances_. To get the same result as `pwr.anova.test` we need to square the standard deviations to get variances and multiply the between-group variance by $\frac{k}{k-1}$. This is because the effect size formula for the ANOVA test assumes the between-group variance has a denominator of _k_ instead of _k - 1_.

```{r}
power.anova.test(groups = 3, within.var = 3^2, between.var = 5^2 * (3/2), sig.level = 0.01, power = 0.90)
```

### pwr.f2.test - test for the general linear model

(_From Kutner, et al, exercise 8.43_)

A director of admissions at a university wants to determine how accurately students' grade-point averages (gpa) at the end of their first year can be predicted or explained by SAT scores and high school class rank. A common approach to answering this kind of question is to model gpa as a function of SAT score and class rank. Or, to put it another way, we can perform a multiple regression with gpa as the dependent variable and SAT score and class rank as independent variables. The null hypothesis is that none of the independent variables explains any of the variability in gpa. This would mean their regression coefficients are statistically indistinguishable from 0. 
The alternative is that at least one of the coefficients is not 0. This is tested with an F test. We can estimate power and sample size for this test using the `pwr.f2.test` function.

The F test has numerator and denominator degrees of freedom. The numerator degrees of freedom, `u`, is the number of coefficients in your model (not counting the intercept). In our example, `u = 2`. The denominator degrees of freedom, `v`, is the number of error degrees of freedom: $v = n - u - 1$. This implies $n = v + u + 1$.

The effect size, `f2`, is $R^{2}/(1 - R^{2})$, where $R^{2}$ is the coefficient of determination, aka the "proportion of variance explained". To determine effect size you hypothesize the proportion of variance your model explains, or the $R^{2}$. For example, if I think my model explains 45% of the variance in my dependent variable, the effect size is 0.45/(1 - 0.45) $\approx$ 0.81.

Returning to our example, let's say the director of admissions hypothesizes his model explains about 30% of the variability in gpa. How large a sample does he need to detect this effect with 80% power at a 0.001 significance level?

```{r}
pwr.f2.test(u = 2, f2 = 0.3/(1 - 0.3), sig.level = 0.001, power = 0.8)
```

Recall $n = v + u + 1$. Therefore he needs 50 + 2 + 1 = 53 student records. What is the power of the test with 40 subjects and a significance level of 0.01? Recall $v = n - u - 1$.

```{r}
pwr.f2.test(u = 2, v = 40 - 2 - 1, f2 = 0.3/(1 - 0.3), sig.level = 0.01)
```

Power is about 84%.

## References and Further Reading

Cohen, J. (1988). _Statistical Power Analysis for the Behavioral Sciences (2nd ed.)_. LEA.

Dalgaard, P. (2002). _Introductory Statistics with R_. Springer. (Ch. 2)

Hogg, R. and Tanis, E. (2006). _Probability and Statistical Inference (7th ed.)_. Pearson. (Ch. 9)

Kabacoff, R. (2011). _R in Action_. Manning. (Ch. 10)

Kutner, et al. (2005). _Applied Linear Statistical Models_. McGraw-Hill. (Ch. 16)

Ryan, T. (2013). 
_Sample Size Determination and Power_. Wiley. The [CRAN Task View for Clinical Trial Design, Monitoring, and Analysis](https://CRAN.R-project.org/view=ClinicalTrials) lists various R packages that also perform sample size and power calculations. pwr/NAMESPACE0000644000176200001440000000030613046210416012275 0ustar liggesusers# Default NAMESPACE created by R # Remove the previous line if you edit this file # Export all names exportPattern(".") # Imports import(stats, graphics) # S3 methods S3method(plot, power.htest) pwr/NEWS0000644000176200001440000000335413246553766011607 0ustar liggesusers2018-03-03 * Version 1.2-2 * Updates in documentation. 2017-03-21 * Version 1.2-1 * vignettes/pwr-vignette.Rmd: New vignette by Clay Ford * R: Add upper limits of n (Aditya Anandkumar's pull request) * R/plot.pwer.htest: Robert Volcic's fix for pwr.t.test * NEWS: ChangeLog moved to "NEWS" file 2016-08-06 Helios De Rosario-Martinez * Version 1.2-0 * DESCRIPTION: move ggplot2 and scales to "Suggests"; import from graphics. * NAMESPACE: remove ggplot2 and scales from import; add graphics. * R/plot.power.htest: falling back to generic plot tools if ggplot2 and scales do not exist. 2016-08-04 Helios De Rosario-Martinez * Version 1.1-4 * R/plot.power.htest: New function contributed by Stephan Weibelzahl. * DESCRIPTION, NAMESPACE, man: Adapt to include new plot.power.htest function. 2015-09-02 Helios De Rosario-Martinez * R/pwr.t2n.test: Revert to previous version. 2015-08-18 Helios De Rosario-Martinez * NAMESPACE, DESCRIPTION: Add import from stats. 2015-04-02 Helios De Rosario-Martinez * R/pwr.t2n.test: Fix following Jan Wunder's report. 2015-02-27 Helios De Rosario-Martinez * DESCRIPTION: Authors redefined as Authors@R. Add Claus Ekstrom, Peter Daalgard and Jeffrey Gill as contributors. New maintainer (Helios De Rosario, taking over Stephane Champely's work). Update License to GPL>=3. Added link to Github development repository. * LICENSE: Added GPL 3 license. 
* man: Fixed line widths of usage and example in various files. (pwr-package.Rd)Updated authors. * R/pwr.r.test.R: Fixed calculation of zrc following Jeffrey Gill's report. * Version 1.1-2 pwr/R/0000755000176200001440000000000013246571173011274 5ustar liggesuserspwr/R/pwr.2p.test.R0000644000176200001440000000456413064333546013534 0ustar liggesusers"pwr.2p.test" <- function (h = NULL, n = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided","less","greater")) { if (sum(sapply(list(h, n, power, sig.level), is.null)) != 1) stop("exactly one of h, n, power, and sig.level must be NULL") if (!is.null(n) && n < 1) stop("number of observations in each group must be at least 1") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") alternative <- match.arg(alternative) tside <- switch(alternative, less = 1, two.sided = 2, greater=3) if (tside == 2 && !is.null(h)) h <- abs(h) if (tside == 3) { p.body <- quote({ pnorm(qnorm(sig.level, lower = FALSE) - h * sqrt(n/2), lower = FALSE) }) } if (tside == 2) { p.body <- quote({ pnorm(qnorm(sig.level/2, lower = FALSE) - h * sqrt(n/2), lower = FALSE) + pnorm(qnorm(sig.level/2, lower = TRUE) - h * sqrt(n/2), lower = TRUE) }) } if (tside ==1) { p.body <- quote({ pnorm(qnorm(sig.level, lower = TRUE) - h * sqrt(n/2), lower = TRUE) }) } if (is.null(power)) power <- eval(p.body) else if (is.null(h)){ if(tside==2){ h <- uniroot(function(h) eval(p.body) - power, c(1e-10,10))$root} if(tside==1){ h <- uniroot(function(h) eval(p.body) - power, c(-10,5))$root} if(tside==3){ h <- uniroot(function(h) eval(p.body) - power, c(-5,10))$root} } else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(2 + 1e-10, 1e+09))$root else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, 
c(1e-10, 1 - 1e-10))$root else stop("internal error") NOTE <- "same sample sizes" METHOD <- "Difference of proportion power calculation for binomial distribution (arcsine transformation)" structure(list(h = h, n = n, sig.level = sig.level, power = power, alternative = alternative, method = METHOD, note = NOTE), class = "power.htest") } pwr/R/pwr.p.test.R0000644000176200001440000000443713064333546013451 0ustar liggesusers"pwr.p.test" <- function (h = NULL, n = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided","less","greater")) { if (sum(sapply(list(h, n, power, sig.level), is.null)) != 1) stop("exactly one of h, n, power, and sig.level must be NULL") if (!is.null(n) && n < 1) stop("number of observations in each group must be at least 1") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") alternative <- match.arg(alternative) tside <- switch(alternative, less = 1, two.sided = 2, greater=3) if (tside == 2 && !is.null(h)) h <- abs(h) if (tside == 2) { p.body <- quote({ pnorm(qnorm(sig.level/2, lower = FALSE) - h * sqrt(n), lower = FALSE) + pnorm(qnorm(sig.level/2, lower = TRUE) - h * sqrt(n), lower = TRUE) }) } if (tside == 3) { p.body <- quote({ pnorm(qnorm(sig.level, lower = FALSE) - h * sqrt(n), lower = FALSE) }) } if (tside == 1) { p.body <- quote({ pnorm(qnorm(sig.level, lower = TRUE) - h * sqrt(n), lower = TRUE) }) } if (is.null(power)) power <- eval(p.body) else if (is.null(h)){ if(tside==2){ h <- uniroot(function(h) eval(p.body) - power, c(1e-10,10))$root} if(tside==1){ h <- uniroot(function(h) eval(p.body) - power, c(-10,5))$root} if(tside==3){ h <- uniroot(function(h) eval(p.body) - power, c(-5,10))$root} } else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(2 + 1e-10, 1e+09))$root else if 
(is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root else stop("internal error") METHOD <- "proportion power calculation for binomial distribution (arcsine transformation)" structure(list(h = h, n = n, sig.level = sig.level, power = power, alternative = alternative, method = METHOD), class = "power.htest") } pwr/R/pwr.f2.test.R0000644000176200001440000000333113064333546013511 0ustar liggesusers"pwr.f2.test" <- function (u = NULL, v = NULL, f2 = NULL, sig.level = 0.05, power = NULL) { if (sum(sapply(list(u,v, f2, power, sig.level), is.null)) != 1) stop("exactly one of u, v, f2, power, and sig.level must be NULL") if (!is.null(f2) && f2 < 0) stop("f2 must be positive") if (!is.null(u) && u < 1) stop("degree of freedom u for numerator must be at least 1") if (!is.null(v) && v < 1) stop("degree of freedom v for denominator must be at least 1") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") p.body <- quote({ lambda <- f2*(u+v+1) pf(qf(sig.level, u, v, lower = FALSE), u, v, lambda, lower = FALSE) }) if (is.null(power)) power <- eval(p.body) else if (is.null(u)) u <- uniroot(function(u) eval(p.body) - power, c(1 + 1e-10, 100))$root else if (is.null(v)) v <- uniroot(function(v) eval(p.body) - power, c(1 + 1e-10, 1e+09))$root else if (is.null(f2)) f2 <- uniroot(function(f2) eval(p.body) - power, c(1e-07, 1e+07))$root else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root else stop("internal error") METHOD <- "Multiple regression power calculation" structure(list(u = u, v = v, f2 = f2, sig.level = sig.level, power = power, method = METHOD), class = "power.htest") } pwr/R/pwr.t.test.R0000644000176200001440000000542113064333546013447 0ustar 
liggesusers"pwr.t.test" <- function (n = NULL, d = NULL, sig.level = 0.05, power = NULL, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "less","greater")) { if (sum(sapply(list(n, d, power, sig.level), is.null)) != 1) stop("exactly one of n, d, power, and sig.level must be NULL") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") type <- match.arg(type) alternative <- match.arg(alternative) tsample <- switch(type, one.sample = 1, two.sample = 2, paired = 1) ttside<-switch(alternative, less = 1, two.sided = 2, greater=3) tside <- switch(alternative, less = 1, two.sided = 2, greater =1) if (tside == 2 && !is.null(d)) d <- abs(d) if (ttside == 1) { p.body <- quote({ nu <- (n - 1) * tsample pt(qt(sig.level/tside, nu, lower = TRUE), nu, ncp = sqrt(n/tsample) * d, lower = TRUE) }) } if (ttside == 2) { p.body <- quote({ nu <- (n - 1) * tsample qu <- qt(sig.level/tside, nu, lower = FALSE) pt(qu, nu, ncp = sqrt(n/tsample) * d, lower = FALSE) + pt(-qu, nu, ncp = sqrt(n/tsample) * d, lower = TRUE) }) } if (ttside == 3) { p.body <- quote({ nu <- (n - 1) * tsample pt(qt(sig.level/tside, nu, lower = FALSE), nu, ncp = sqrt(n/tsample) * d, lower = FALSE) }) } if (is.null(power)) power <- eval(p.body) else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(2 + 1e-10, 1e+09))$root else if (is.null(d)) { if(ttside==2){ d <- uniroot(function(d) eval(p.body) - power, c(1e-07, 10))$root} if(ttside==1){ d <- uniroot(function(d) eval(p.body) - power, c(-10, 5))$root} if(ttside==3){ d <- uniroot(function(d) eval(p.body) - power, c(-5, 10))$root} } else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root else stop("internal error") NOTE <- switch(type, paired = "n is 
number of *pairs*", two.sample = "n is number in *each* group", NULL) METHOD <- paste(switch(type, one.sample = "One-sample", two.sample = "Two-sample", paired = "Paired"), "t test power calculation") structure(list(n = n, d = d, sig.level = sig.level, power = power, alternative = alternative, note = NOTE, method = METHOD), class = "power.htest") } pwr/R/pwr.t2n.test.R0000644000176200001440000000527713064333546013720 0ustar liggesusers"pwr.t2n.test" <- function (n1 = NULL, n2= NULL, d = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less","greater")) { if (sum(sapply(list(n1,n2, d, power, sig.level), is.null)) != 1) stop("exactly one of n1, n2, d, power, and sig.level must be NULL") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") if (!is.null(n1) && n1 < 2) stop("number of observations in the first group must be at least 2") if (!is.null(n2) && n2 < 2) stop("number of observations in the second group must be at least 2") alternative <- match.arg(alternative) tsample <-2 ttside<-switch(alternative, less = 1, two.sided = 2, greater=3) tside <- switch(alternative, less = 1, two.sided = 2, greater =1) if (tside==2 && !is.null(d)) d <- abs(d) if (ttside == 1) { p.body <- quote({ nu <- n1+n2-2 pt(qt(sig.level/tside, nu, lower = TRUE), nu, ncp = d*(1/sqrt(1/n1+1/n2)), lower = TRUE) }) } if (ttside == 2) { p.body <- quote({ nu <- n1+n2-2 qu <- qt(sig.level/tside, nu, lower = FALSE) pt(qu, nu, ncp = d*(1/sqrt(1/n1+1/n2)), lower = FALSE) + pt(-qu, nu,ncp = d*(1/sqrt(1/n1+1/n2)), lower = TRUE) }) } if (ttside == 3) { p.body <- quote({ nu <- n1+n2-2 pt(qt(sig.level/tside, nu, lower = FALSE), nu, ncp = d*(1/sqrt(1/n1+1/n2)), lower = FALSE) }) } if (is.null(power)) power <- eval(p.body) else if (is.null(n1)) n1 <- uniroot(function(n1) 
eval(p.body) - power, c(2 + 1e-10, 1e+09))$root else if (is.null(n2)) n2 <- uniroot(function(n2) eval(p.body) - power, c(2 + 1e-10, 1e+09))$root else if (is.null(d)) { if(ttside==2){ d <- uniroot(function(d) eval(p.body) - power, c(1e-07, 10))$root} if(ttside==1){ d <- uniroot(function(d) eval(p.body) - power, c(-10, 5))$root} if(ttside==3){ d <- uniroot(function(d) eval(p.body) - power, c(-5, 10))$root} } else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root else stop("internal error") METHOD <- c("t test power calculation") structure(list(n1 = n1,n2=n2, d = d, sig.level = sig.level, power = power, alternative = alternative,method = METHOD), class = "power.htest") } pwr/R/cohen.ES.R0000644000176200001440000000135313046210414013005 0ustar liggesusers"cohen.ES" <- function(test=c("p","t","r","anov","chisq","f2"),size=c("small","medium","large")){ test <- match.arg(test) size <- match.arg(size) ntest <- switch(test, p = 1, t = 2,r=3,anov=4,chisq=5,f2=6) if(ntest==1){ ES<-switch(size,small=0.2,medium=0.5,large=0.8) } if(ntest==2){ ES<-switch(size,small=0.2,medium=0.5,large=0.8) } if(ntest==3){ ES<-switch(size,small=0.1,medium=0.3,large=0.5) } if(ntest==4){ ES<-switch(size,small=0.1,medium=0.25,large=0.4) } if(ntest==5){ ES<-switch(size,small=0.1,medium=0.3,large=0.5) } if(ntest==6){ ES<-switch(size,small=0.02,medium=0.15,large=0.35) } METHOD <- "Conventional effect size from Cohen (1982)" structure(list(test = test,size=size,effect.size=ES, method = METHOD), class = "power.htest") } pwr/R/pwr.norm.test.R0000644000176200001440000000445613064333546014166 0ustar liggesusers"pwr.norm.test" <- function (d = NULL, n = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided","less","greater")) { if (sum(sapply(list(d, n, power, sig.level), is.null)) != 1) stop("exactly one of d, n, power, and sig.level must be NULL") if (!is.null(n) && n < 1) stop("number of observations in each group must be at least 
1") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") alternative <- match.arg(alternative) tside <- switch(alternative, less = 1, two.sided = 2, greater=3) if (tside == 2 && !is.null(d)) d <- abs(d) if (tside == 2) { p.body <- quote({ pnorm(qnorm(sig.level/2, lower = FALSE) - d * sqrt(n), lower = FALSE) + pnorm(qnorm(sig.level/2, lower = TRUE) - d * sqrt(n), lower = TRUE) }) } if (tside==1) { p.body <- quote({ pnorm(qnorm(sig.level, lower = TRUE) - d * sqrt(n), lower = TRUE) }) } if (tside==3) { p.body <- quote({ pnorm(qnorm(sig.level, lower = FALSE) - d * sqrt(n), lower = FALSE) }) } if (is.null(power)) power <- eval(p.body) else if (is.null(d)) { if (tside == 2){ d <- uniroot(function(d) eval(p.body) - power, c(1e-10, 10))$root} if (tside == 1){ d <- uniroot(function(d) eval(p.body) - power, c(-10, 5))$root} if (tside == 3){ d <- uniroot(function(d) eval(p.body) - power, c(-5, 10))$root} } else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(1 + 1e-10, 1e+09))$root else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root else stop("internal error") METHOD <- "Mean power calculation for normal distribution with known variance" structure(list(d = d, n = n, sig.level = sig.level, power = power, alternative = alternative, method = METHOD), class = "power.htest") } pwr/R/pwr.2p2n.test.R0000644000176200001440000000533613064333546013772 0ustar liggesusers"pwr.2p2n.test" <- function (h = NULL, n1 = NULL, n2 = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less","greater")) { if (sum(sapply(list(h, n1, n2, power, sig.level), is.null)) != 1) stop("exactly one of h, n1, n2, power, and sig.level must be NULL") if (!is.null(n1) && n1 < 2) stop("number of 
observations in the first group must be at least 2") if (!is.null(n2) && n2 < 2) stop("number of observations in the second group must be at least 2") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") alternative <- match.arg(alternative) tside <- switch(alternative, less = 1, two.sided = 2,greater=3) if (tside == 2 && !is.null(h)) h <- abs(h) if (tside == 3) { p.body <- quote({ pnorm(qnorm(sig.level, lower = FALSE) - h * sqrt((n1 * n2)/(n1 + n2)), lower = FALSE) }) } if (tside == 1) { p.body <- quote({ pnorm(qnorm(sig.level, lower = TRUE) - h * sqrt((n1 * n2)/(n1 + n2)), lower = TRUE) }) } if (tside == 2) { p.body <- quote({ pnorm(qnorm(sig.level/2, lower = FALSE) - h * sqrt((n1 * n2)/(n1 + n2)), lower = FALSE) + pnorm(qnorm(sig.level/2, lower = TRUE) - h * sqrt((n1 * n2)/(n1 + n2)), lower = TRUE) }) } if (is.null(power)) power <- eval(p.body) else if (is.null(h)){ if(tside==2){ h <- uniroot(function(h) eval(p.body) - power, c(1e-10,10))$root} if(tside==1){ h <- uniroot(function(h) eval(p.body) - power, c(-10,5))$root} if(tside==3){ h <- uniroot(function(h) eval(p.body) - power, c(-5,10))$root} } else if (is.null(n1)) n1 <- uniroot(function(n1) eval(p.body) - power, c(2 + 1e-10, 1e+09))$root else if (is.null(n2)) n2 <- uniroot(function(n2) eval(p.body) - power, c(2 + 1e-10, 1e+09))$root else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root else stop("internal error") NOTE <- "different sample sizes" METHOD <- "difference of proportion power calculation for binomial distribution (arcsine transformation)" structure(list(h = h, n1 = n1, n2 = n2, sig.level = sig.level, power = power, alternative = alternative, method = METHOD, note = NOTE), class = "power.htest") } 
pwr/R/ES.h.R0000644000176200001440000000011513046210414012133 0ustar liggesusers"ES.h" <- function (p1, p2) { 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)) } pwr/R/ES.w1.R0000644000176200001440000000006713046210414012241 0ustar liggesusers"ES.w1" <- function(P0,P1){ sqrt(sum((P1-P0)^2/P0)) } pwr/R/pwr.r.test.R0000644000176200001440000000511613064333546013446 0ustar liggesusers"pwr.r.test" <- function (n = NULL, r = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less","greater")) { if (sum(sapply(list(n, r, power, sig.level), is.null)) != 1) stop("exactly one of n, r, power, and sig.level must be NULL") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") if (!is.null(n) && n < 4) stop("number of observations must be at least 4") alternative <- match.arg(alternative) tside <- switch(alternative, less = 1, two.sided = 2,greater=3) if (tside == 2 && !is.null(r)) r <- abs(r) if (tside == 3) { p.body <- quote({ ttt <- qt(sig.level, df = n - 2, lower = FALSE) rc <- sqrt(ttt^2/(ttt^2 + n - 2)) zr <- atanh(r) + r/(2 * (n - 1)) zrc <- atanh(rc) # + rc/(2 * (n - 1)) pnorm((zr - zrc) * sqrt(n - 3)) }) } if (tside == 1) { p.body <- quote({ r<--r ttt <- qt(sig.level, df = n - 2, lower = FALSE) rc <- sqrt(ttt^2/(ttt^2 + n - 2)) zr <- atanh(r) + r/(2 * (n - 1)) zrc <- atanh(rc) # + rc/(2 * (n - 1)) pnorm((zr - zrc) * sqrt(n - 3)) }) } if (tside == 2) { p.body <- quote({ ttt <- qt(sig.level/2, df = n - 2, lower = FALSE) rc <- sqrt(ttt^2/(ttt^2 + n - 2)) zr <- atanh(r) + r/(2 * (n - 1)) zrc <- atanh(rc) # + rc/(2 * (n - 1)) pnorm((zr - zrc) * sqrt(n - 3)) + pnorm((-zr - zrc) * sqrt(n - 3)) }) } if (is.null(power)) power <- eval(p.body) else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(4 + 1e-10, 1e+09))$root else if (is.null(r)) { 
if(tside==2){r <- uniroot(function(r) eval(p.body) - power, c(1e-10,1 - 1e-10))$root} else {r <- uniroot(function(r) eval(p.body) - power, c(-1+1e-10, 1 - 1e-10))$root} } else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root else stop("internal error") METHOD <- "approximate correlation power calculation (arctangh transformation)" structure(list(n = n, r = r, sig.level = sig.level, power = power, alternative = alternative, method = METHOD), class = "power.htest") } pwr/R/pwr.anova.test.R0000644000176200001440000000344613064333546014315 0ustar liggesusers"pwr.anova.test" <- function (k = NULL, n = NULL, f = NULL, sig.level = 0.05, power = NULL) { if (sum(sapply(list(k, n, f, power, sig.level), is.null)) != 1) stop("exactly one of k, n, f, power, and sig.level must be NULL") if (!is.null(f) && f < 0) stop("f must be positive") if (!is.null(k) && k < 2) stop("number of groups must be at least 2") if (!is.null(n) && n < 2) stop("number of observations in each group must be at least 2") if (!is.null(sig.level) && !is.numeric(sig.level) || any(0 > sig.level | sig.level > 1)) stop(sQuote("sig.level"), " must be numeric in [0, 1]") if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1)) stop(sQuote("power"), " must be numeric in [0, 1]") p.body <- quote({ lambda <- k * n * f^2 pf(qf(sig.level, k - 1, (n - 1) * k, lower = FALSE), k - 1, (n - 1) * k, lambda, lower = FALSE) }) if (is.null(power)) power <- eval(p.body) else if (is.null(k)) k <- uniroot(function(k) eval(p.body) - power, c(2 + 1e-10, 100))$root else if (is.null(n)) n <- uniroot(function(n) eval(p.body) - power, c(2 + 1e-10, 1e+09))$root else if (is.null(f)) f <- uniroot(function(f) eval(p.body) - power, c(1e-07, 1e+07))$root else if (is.null(sig.level)) sig.level <- uniroot(function(sig.level) eval(p.body) - power, c(1e-10, 1 - 1e-10))$root else stop("internal error") NOTE <- "n is number in each group" METHOD <- "Balanced one-way 
analysis of variance power calculation" structure(list(k = k, n = n, f = f, sig.level = sig.level, power = power, note = NOTE, method = METHOD), class = "power.htest") } pwr/R/plot.power.htest.R0000644000176200001440000002771313064333546014666 0ustar liggesusersplot.power.htest <- function (x, ...){ # initial checks if (class(x) != "power.htest") stop("argument must be of class power.htest") pwr.methods <- c("One-sample t test power calculation", "Two-sample t test power calculation", "Paired t test power calculation", "t test power calculation", "Difference of proportion power calculation for binomial distribution (arcsine transformation)", "difference of proportion power calculation for binomial distribution (arcsine transformation)", "Balanced one-way analysis of variance power calculation", "Chi squared power calculation", "Mean power calculation for normal distribution with known variance", "proportion power calculation for binomial distribution (arcsine transformation)", "approximate correlation power calculation (arctangh transformation)") if(!(x$method %in% pwr.methods)) stop(paste("the method ", x$method, " is not supported. 
Supported methods include:", paste(pwr.methods, collapse = "; ")))

  # settings
  breaks <- 20

  # case: One-sample, Two-sample or Paired t test
  if(x$method == "One-sample t test power calculation" ||
     x$method == "Two-sample t test power calculation" ||
     x$method == "Paired t test power calculation") {
    if(x$method == "One-sample t test power calculation"){x$type = "one.sample"}
    if(x$method == "Two-sample t test power calculation"){x$type = "two.sample"}
    if(x$method == "Paired t test power calculation"){x$type = "paired"}
    n <- x$n
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      return(pwr.t.test(n=ss, d=x$d, sig.level = x$sig.level,
                        type=x$type, alternative = x$alternative)$power)
    }, simplify = TRUE)

    # create labels
    title_string <- x$method
    legend_string <- paste("tails =", x$alternative,
                           "\neffect size d =", x$d,
                           "\nalpha =", x$sig.level)
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nn = ", ceiling(n),
                            "\n", x$note, sep = "")
  }
  # case: Two-sample t test with n1 and n2
  else if(x$method == "t test power calculation") {
    n <- x$n1 + x$n2
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n
    n_rel <- x$n1 / n # relative sample size; will be kept constant in calculations

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      n1 <- ceiling(ss*n_rel)
      n2 <- ss - n1
      if(n1 < 2 || n2 < 2){
        return(NA)
      }else{
        return(pwr.t2n.test(n1=n1, n2=n2, d=x$d, sig.level = x$sig.level,
                            alternative = x$alternative)$power)
      }
    }, simplify = TRUE)

    # create labels
    title_string <- x$method
    legend_string <- paste("tails =", x$alternative,
                           "\neffect size d =", x$d,
                           "\nalpha =", x$sig.level,
                           "\nn1/n2 = ", round(n_rel, 2))
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nn = ", x$n1, " + ", x$n2,
                            " = ", n, sep = "")
  }
  # case: Difference of proportion (same sample size)
  else if(x$method == "Difference of proportion power calculation for binomial distribution (arcsine transformation)") {
    n <- x$n
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      return(pwr.2p.test(n=ss, h=x$h, sig.level = x$sig.level,
                         alternative = x$alternative)$power)
    }, simplify = TRUE)

    # create labels
    title_string <- "Difference of proportion power calculation\nfor binomial distribution (arcsine transformation)"
    legend_string <- paste("tails =", x$alternative,
                           "\neffect size h =", x$h,
                           "\nalpha =", x$sig.level)
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nn = ", ceiling(n),
                            "\n", x$note, sep = "")
  }
  # case: difference of proportion (different sample size)
  else if(x$method == "difference of proportion power calculation for binomial distribution (arcsine transformation)") {
    n <- x$n1 + x$n2
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n
    n_rel <- x$n1 / n # relative sample size; will be kept constant in calculations

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      n1 <- ceiling(ss*n_rel)
      n2 <- ss - n1
      if(n1 < 2 || n2 < 2){
        return(NA)
      }else{
        return(pwr.2p2n.test(n1=n1, n2=n2, h=x$h, sig.level = x$sig.level,
                             alternative = x$alternative)$power)
      }
    }, simplify = TRUE)

    # create labels
    title_string <- "Difference of proportion power calculation\nfor binomial distribution (arcsine transformation)"
    legend_string <- paste("tails =", x$alternative,
                           "\neffect size h =", x$h,
                           "\nalpha =", x$sig.level,
                           "\nn1/n2 = ", round(n_rel, 2))
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nn = ", x$n1, " + ", x$n2,
                            " = ", n, sep = "")
  }
  # case: ANOVA
  else if(x$method == "Balanced one-way analysis of variance power calculation") {
    n <- x$n
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      return(pwr.anova.test(n=ss, k=x$k, f=x$f, sig.level = x$sig.level)$power)
    }, simplify = TRUE)

    # create labels
    title_string <- "Balanced one-way analysis of variance \npower calculation"
    legend_string <- paste("groups k =", x$k,
                           "\neffect size f =", x$f,
                           "\nalpha =", x$sig.level)
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nn = ", ceiling(n),
                            "\n", x$note, sep = "")
  }
  # case: Chi Squared
  else if(x$method == "Chi squared power calculation") {
    n <- x$N
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      return(pwr.chisq.test(N=ss, w=x$w, sig.level = x$sig.level, df=x$df)$power)
    }, simplify = TRUE)

    # create labels
    title_string <- x$method
    legend_string <- paste("effect size w =", x$w,
                           "\ndf =", x$df,
                           "\nalpha =", x$sig.level)
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nN = ", ceiling(n),
                            "\n", x$note, sep = "")
  }
  # case: Normal distribution
  else if(x$method == "Mean power calculation for normal distribution with known variance") {
    n <- x$n
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      return(pwr.norm.test(n=ss, d=x$d, sig.level = x$sig.level,
                           alternative = x$alternative)$power)
    }, simplify = TRUE)

    # create labels
    title_string <- "Mean power calculation for normal distribution\nwith known variance"
    legend_string <- paste("tails =", x$alternative,
                           "\neffect size d =", x$d,
                           "\nalpha =", x$sig.level)
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nn = ", ceiling(n),
                            "\n", x$note, sep = "")
  }
  # case: proportion
  else if(x$method == "proportion power calculation for binomial distribution (arcsine transformation)") {
    n <- x$n
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      return(pwr.p.test(n=ss, h=x$h, sig.level = x$sig.level,
                        alternative = x$alternative)$power)
    }, simplify = TRUE)

    # create labels
    title_string <- "proportion power calculation\nfor binomial distribution (arcsine transformation)"
    legend_string <- paste("tails =", x$alternative,
                           "\neffect size h =", x$h,
                           "\nalpha =", x$sig.level)
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nn = ", ceiling(n),
                            "\n", x$note, sep = "")
  }
  # case: correlation
  else if(x$method == "approximate correlation power calculation (arctangh transformation)") {
    n <- x$n
    n_upper <- max(n*1.5, n+30) # upper at least 30 above n

    # generate data
    sample_sizes <- seq.int(from=10, to=n_upper, by=(n_upper - 10)/breaks)
    data <- data.frame(sample_sizes)
    data$power <- sapply(sample_sizes, FUN = function(ss) {
      return(pwr.r.test(n=ss, r=x$r, sig.level = x$sig.level,
                        alternative = x$alternative)$power)
    }, simplify = TRUE)

    # create labels
    title_string <- "approximate correlation power calculation\n(arctangh transformation)"
    legend_string <- paste("tails =", x$alternative,
                           "\nr =", x$r,
                           "\nalpha =", x$sig.level)
    xlab_string <- "sample size"
    ylab_string <- expression(paste("test power = 1 - ", beta))
    optimal_string <- paste("optimal sample size \nn = ", ceiling(n), sep = "")
  }

  # pass arguments if required
  if(length(dots <- list(...)) && !is.null(dots$xlab)){ xlab_string <- dots$xlab }
  if(length(dots <- list(...)) && !is.null(dots$ylab)){ ylab_string <- dots$ylab }
  if(length(dots <- list(...)) && !is.null(dots$main)){ title_string <- dots$main }

  # position of text in plot
  if(x$power < 0.5){
    text_anchor <- 1
    text_vjust <- 1
  }else{
    text_anchor <- 0
    text_vjust <- 0
  }
  if(min(data$power, na.rm = TRUE) < 0.6){
    legend_anchor <- 1
    legend_vjust <- 1
  }else{
    legend_anchor <- 0
    legend_vjust <- 0
  }

  # plot
  if (requireNamespace("ggplot2", quietly = TRUE) &&
      requireNamespace("scales", quietly = TRUE)) {
    ggplot2::ggplot(data = data, ggplot2::aes(x=sample_sizes, y=power)) +
      ggplot2::geom_line(colour="red", size=0.1, na.rm = TRUE) +
      ggplot2::geom_point(na.rm = TRUE) +
      ggplot2::geom_vline(xintercept = ceiling(n), linetype=3, size=0.8,
                          colour="darkblue") +
      ggplot2::xlab(xlab_string) +
      ggplot2::ylab(ylab_string) +
      ggplot2::ggtitle(title_string) +
      ggplot2::scale_y_continuous(labels=scales::percent, limits = c(0,1)) +
      ggplot2::annotate("text", 10, legend_anchor, label=legend_string,
                        hjust=0, vjust=legend_vjust, size=3.5) +
      ggplot2::annotate("text", n + ((n_upper-10)/breaks), text_anchor,
                        label=optimal_string, hjust=0, vjust=text_vjust,
                        colour="darkblue", size=3.5)
  }else{
    # Alternative if ggplot2 or scales are not installed
    plot(power~sample_sizes, data=data, type="l", col="red",
         xlab=xlab_string, ylab=ylab_string, yaxt="n", ylim=c(0,1))
    points(power~sample_sizes, data=data, pch=16)
    axis(2, at=pretty(data$power),
         labels=paste0(pretty(data$power)*100,"%"), las=TRUE)
    title(title_string)
    grid()
    abline(v=ceiling(n), lty=3, col="darkblue")
    text(10, legend_anchor, labels=legend_string,
         adj=c(0,legend_vjust), cex=.8)
    text(n + ((n_upper-10)/breaks), text_anchor, labels=optimal_string,
         adj=c(0,text_vjust), cex=.8, col="darkblue")
    grid(TRUE)
  }
}

pwr/R/ES.w2.R

"ES.w2" <- function(P){
  pi <- apply(P, 1, sum)
  pj <- apply(P, 2, sum)
  P0 <- pi %*% t(pj)
  sqrt(sum((P - P0)^2 / P0))
}

pwr/R/pwr.chisq.test.R

"pwr.chisq.test" <- function (w = NULL, N = NULL, df = NULL,
                              sig.level = 0.05, power = NULL)
{
  if (sum(sapply(list(w, N, df, power, sig.level), is.null)) != 1)
    stop("exactly one of w, N, df, power, and sig.level must be NULL")
  if (!is.null(w) && w < 0)
    stop("w must be positive")
  if (!is.null(N) && N < 1)
    stop("number of observations must be at least 1")
  if (!is.null(sig.level) && !is.numeric(sig.level) ||
      any(0 > sig.level | sig.level > 1))
    stop(sQuote("sig.level"), " must be numeric in [0, 1]")
  if (!is.null(power) && !is.numeric(power) || any(0 > power | power > 1))
    stop(sQuote("power"), " must be numeric in [0, 1]")
  p.body <- quote({
    k <- qchisq(sig.level, df = df, lower = FALSE)
    pchisq(k, df = df, ncp = N * w^2, lower = FALSE)
  })
  if (is.null(power))
    power <- eval(p.body)
  else if (is.null(w))
    w <- uniroot(function(w) eval(p.body) - power, c(1e-10, 1e+09))$root
  else if (is.null(N))
    N <- uniroot(function(N) eval(p.body) - power, c(1 + 1e-10, 1e+09))$root
  else if (is.null(sig.level))
    sig.level <- uniroot(function(sig.level) eval(p.body) - power,
                         c(1e-10, 1 - 1e-10))$root
  else stop("internal error")
  METHOD <- "Chi squared power calculation"
  NOTE <- "N is the number of observations"
  structure(list(w = w, N = N, df = df, sig.level = sig.level,
                 power = power, method = METHOD, note = NOTE),
            class = "power.htest")
}

pwr/vignettes/pwr-vignette.Rmd

---
title: "Getting started with the pwr package"
author: "Clay Ford"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with the pwr package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The basic idea of calculating power or sample size with functions in the pwr package is to _leave out_ the argument that you want to calculate. If you want to calculate power, then leave the `power` argument out of the function. If you want to calculate sample size, leave `n` out of the function. Whatever parameter you want to calculate is determined from the others.

You select a function based on the statistical test you plan to use to analyze your data. If you plan to use a two-sample t-test to compare two means, you would use the `pwr.t.test` function for estimating sample size or power. All functions for power and sample size analysis in the pwr package begin with `pwr`. Functions are available for the following statistical tests:

- `pwr.p.test`: one-sample proportion test
- `pwr.2p.test`: two-sample proportion test
- `pwr.2p2n.test`: two-sample proportion test (unequal sample sizes)
- `pwr.t.test`: two-sample, one-sample and paired t-tests
- `pwr.t2n.test`: two-sample t-tests (unequal sample sizes)
- `pwr.anova.test`: one-way balanced ANOVA
- `pwr.r.test`: correlation test
- `pwr.chisq.test`: chi-squared test (goodness of fit and association)
- `pwr.f2.test`: test for the general linear model

There are also a few convenience functions for calculating effect size as well as a generic `plot` function for plotting power versus sample size. All of these are demonstrated in the examples below.

## A simple example

Let's say we suspect we have a loaded coin that lands heads 75% of the time instead of the expected 50%. We wish to create an experiment to test this.
We will flip the coin a certain number of times and observe the proportion of heads. We will then conduct a one-sample proportion test to see if the proportion of heads is significantly different from what we would expect with a fair coin. We will judge significance by our p-value. If our p-value falls below a certain threshold, say 0.05, we will conclude our coin's behavior is inconsistent with that of a fair coin.

- Our null hypothesis is that the coin is fair and lands heads 50% of the time ($\pi$ = 0.50).
- Our alternative hypothesis is that the coin is loaded to land heads more than 50% of the time ($\pi$ > 0.50).

How many times should we flip the coin to have a high probability (or _power_), say 0.80, of correctly rejecting the null of $\pi$ = 0.5 if our coin is indeed loaded to land heads 75% of the time?

Here is how we can determine this using the `pwr.p.test` function.

```{r echo=TRUE}
library(pwr)
pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50),
           sig.level = 0.05,
           power = 0.80,
           alternative = "greater")
```

The function tells us we should flip the coin 22.55127 times, which we round up to 23. Always round sample size estimates up. If we're correct that our coin lands heads 75% of the time, we need to flip it at least 23 times to have an 80% chance of correctly rejecting the null hypothesis at the 0.05 significance level.

Notice that since we wanted to determine sample size (`n`), we left it out of the function. Our _effect size_ is entered in the `h` argument. The label `h` is due to Cohen (1988). The function `ES.h` is used to calculate a unitless effect size using the arcsine transformation. (More on effect size below.)

`sig.level` is the argument for our desired significance level. This is also sometimes referred to as our tolerance for a Type I error ($\alpha$). `power` is our desired power. It is sometimes referred to as 1 - $\beta$, where $\beta$ is Type II error.
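A power calculation like this can also be checked by simulation. The sketch below (an illustration, not part of the pwr workflow) repeatedly simulates 23 flips of a coin loaded at 75% heads, applies a one-sided exact binomial test to each simulated experiment, and estimates power as the proportion of rejections. Because `binom.test` is exact while `pwr.p.test` uses the arcsine approximation, the two answers will be close but not identical.

```{r}
# Monte Carlo check: simulate many experiments of 23 flips with a loaded
# coin (P(heads) = 0.75) and count how often a one-sided exact binomial
# test rejects the null of a fair coin at the 0.05 level.
set.seed(1)
reps <- 10000
heads <- rbinom(reps, size = 23, prob = 0.75)
p_values <- sapply(heads, function(x)
  binom.test(x, n = 23, p = 0.50, alternative = "greater")$p.value)
mean(p_values < 0.05)  # estimated power, roughly 0.80
```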
The `alternative` argument says we think the alternative is "greater" than the null, not just different.

Type I error, $\alpha$, is the probability of rejecting the null hypothesis when it is true. This is thinking we have found an effect where none exist. This is considered the more serious error. Our tolerance for Type I error is usually 0.05 or lower.

Type II error, $\beta$, is the probability of failing to reject the null hypothesis when it is false. This is thinking there is no effect when in fact there is. Our tolerance for Type II error is usually 0.20 or lower. Type II error is 1 - Power. If we desire a power of 0.90, then we implicitly specify a Type II error tolerance of 0.10.

The pwr package provides a generic `plot` function that allows us to see how power changes as we change our sample size. If you have the ggplot2 package installed, it will create a plot using `ggplot`. Otherwise base R graphics are used.

```{r fig.height=5, fig.width=5}
p.out <- pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50),
                    sig.level = 0.05,
                    power = 0.80,
                    alternative = "greater")
plot(p.out)
```

What is the power of our test if we flip the coin 40 times and lower our Type I error tolerance to 0.01? Notice we leave out the `power` argument, add `n = 40`, and change `sig.level = 0.01`:

```{r}
pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50),
           sig.level = 0.01,
           n = 40,
           alternative = "greater")
```

The power of our test is about 84%.

We specified `alternative = "greater"` since we assumed the coin was loaded for more heads (not less). This is a stronger assumption than assuming that the coin is simply unfair in one way or another. In practice, sample size and power calculations will usually make the more conservative "two-sided" assumption. In fact this is the default for pwr functions with an `alternative` argument. If we wish to assume a "two-sided" alternative, we can simply leave it out of the function. Notice how our power estimate drops below 80% when we do this.
```{r}
pwr.p.test(h = ES.h(p1 = 0.75, p2 = 0.50),
           sig.level = 0.01,
           n = 40)
```

What if we assume the "loaded" effect is smaller? Maybe the coin lands heads 65% of the time. How many flips do we need to perform to detect this smaller effect at the 0.05 level with 80% power and the more conservative two-sided alternative?

```{r}
pwr.p.test(h = ES.h(p1 = 0.65, p2 = 0.50),
           sig.level = 0.05,
           power = 0.80)
```

About 85 coin flips. Detecting smaller effects requires larger sample sizes.

## More on effect size

Cohen describes effect size as "the degree to which the null hypothesis is false." In our coin flipping example, this is the difference between 75% and 50%. We could say the effect was 25% but recall we had to transform the absolute difference in proportions to another quantity using the `ES.h` function. This is a crucial part of using the pwr package correctly: _You must provide an effect size on the expected scale._ Doing otherwise will produce wrong sample size and power calculations.

When in doubt, we can use _Conventional Effect Sizes_. These are pre-determined effect sizes for "small", "medium", and "large" effects. The `cohen.ES` function returns a conventional effect size for a given test and size. For example, the medium effect size for the correlation test is 0.3:

```{r}
cohen.ES(test = "r", size = "medium")
```

For convenience, here are all conventional effect sizes for all tests in the pwr package:

Test                        | `small` | `medium` | `large`
----------------------------|---------|----------|--------
tests for proportions (`p`) | 0.2     | 0.5      | 0.8
tests for means (`t`)       | 0.2     | 0.5      | 0.8
chi-square tests (`chisq`)  | 0.1     | 0.3      | 0.5
correlation test (`r`)      | 0.1     | 0.3      | 0.5
anova (`anov`)              | 0.1     | 0.25     | 0.4
general linear model (`f2`) | 0.02    | 0.15     | 0.35

It is worth noting that pwr functions can take vectors for effect size and n arguments. This allows us to make many power calculations at once, either for multiple effect sizes or multiple sample sizes.
For example, let's see how power changes for our coin flipping experiment for the three conventional effect sizes of 0.2, 0.5, and 0.8, assuming a sample size of 20.

```{r}
pwr.p.test(h = c(0.2, 0.5, 0.8), n = 20, sig.level = 0.05)
```

As we demonstrated with the `plot` function above, we can save our results. This produces a list object from which we can extract quantities for further manipulation. For example, we can calculate power for sample sizes ranging from 10 to 100 in steps of 10, with an assumed "medium" effect of 0.5, and output to a data frame with some formatting:

```{r}
n <- seq(10, 100, 10)
p.out <- pwr.p.test(h = 0.5, n = n, sig.level = 0.05)
data.frame(n, power = sprintf("%.2f%%", p.out$power * 100))
```

We can also directly extract quantities with the `$` operator appended to the end of a pwr function call. For example,

```{r}
pwr.p.test(h = 0.5, n = n, sig.level = 0.05)$power
```

## More examples

### pwr.2p.test - two-sample test for proportions

Let's say we want to randomly sample male and female college undergraduate students and ask them if they consume alcohol at least once a week. Our null hypothesis is no difference in the proportion that answer yes. Our alternative hypothesis is that there is a difference. This is a two-sided alternative; one gender has a higher proportion but we don't know which. We would like to detect a difference as small as 5%. How many students do we need to sample in each group if we want 80% power and a significance level of 0.05?

If we think one group proportion is 55% and the other 50%:

```{r}
pwr.2p.test(h = ES.h(p1 = 0.55, p2 = 0.50),
            sig.level = 0.05,
            power = .80)
```

Notice the sample size is _per group_. We need to sample 1,565 males and 1,565 females to detect the 5% difference with 80% power.
If we think one group proportion is 10% and the other 5%:

```{r}
pwr.2p.test(h = ES.h(p1 = 0.10, p2 = 0.05),
            sig.level = 0.05,
            power = .80)
```

Even though the absolute difference between proportions is the same (5%), the optimum sample size is now 424 per group. 10% vs 5% is actually a bigger difference than 55% vs 50%. A heuristic approach for understanding why is to compare the ratios: 55/50 = 1.1 while 10/5 = 2.

The `ES.h` function performs an arcsine transformation on both proportions and returns the difference. By setting `p2` to 0, we can see the transformed value for `p1`. We can exploit this to help us visualize how the transformation creates larger effects for two proportions closer to 0 or 1. Below we plot transformed proportions versus untransformed proportions and then compare the distance between pairs of proportions on each axis.

```{r fig.height=5, fig.width=5}
addSegs <- function(p1, p2){
  tp1 <- ES.h(p1, 0); tp2 <- ES.h(p2, 0)
  segments(p1, 0, p1, tp1, col="blue"); segments(p2, 0, p2, tp2, col="blue")
  segments(0, tp1, p1, tp1, col="red"); segments(0, tp2, p2, tp2, col="red")
}
curve(expr = ES.h(p1 = x, p2 = 0), xlim = c(0,1),
      xlab = "proportion", ylab = "transformed proportion")
addSegs(p1 = 0.50, p2 = 0.55) # 50% vs 55%
addSegs(p1 = 0.05, p2 = 0.10) # 5% vs 10%
```

The differences on the x-axis between the two pairs of proportions are the same (0.05), but the difference is larger for 5% vs 10% on the y-axis. The `ES.h` function returns the distance between the red lines.

Base R has a function called `power.prop.test` that allows us to use the raw proportions in the function without a need for a separate effect size function.

```{r}
power.prop.test(p1 = 0.55, p2 = 0.50, sig.level = 0.05, power = .80)
```

Notice the results are slightly different. It calculates effect size differently.
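If you want to see exactly what `ES.h` computes, it is the difference of the arcsine-transformed proportions, $h = 2 \arcsin(\sqrt{p_1}) - 2 \arcsin(\sqrt{p_2})$. We can verify this by hand:

```{r}
# ES.h is the difference of arcsine-transformed proportions
2 * asin(sqrt(0.10)) - 2 * asin(sqrt(0.05))  # by hand
ES.h(p1 = 0.10, p2 = 0.05)                   # same value
```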
If we don't have any preconceived estimates of proportions or don't feel comfortable making estimates, we can use conventional effect sizes of 0.2 (small), 0.5 (medium), or 0.8 (large). The sample size per group needed to detect a "small" effect with 80% power and 0.05 significance is about 393:

```{r}
pwr.2p.test(h = 0.2, sig.level = 0.05, power = .80)
```

### pwr.2p2n.test - two-sample test for proportions, unequal sample sizes

Let's return to our undergraduate survey of alcohol consumption. It turns out we were able to survey 543 males and 675 females. The power of our test if we're interested in being able to detect a "small" effect size with 0.05 significance is about 93%.

```{r}
cohen.ES(test = "p", size = "small")
pwr.2p2n.test(h = 0.2, n1 = 543, n2 = 675, sig.level = 0.05)
```

Let's say we previously surveyed 763 female undergraduates and found that _p_% said they consumed alcohol once a week. We would like to survey some males and see if a significantly different proportion respond yes. How many do we need to sample to detect a small effect size (0.2) in either direction with 80% power and a significance level of 0.05?

```{r}
pwr.2p2n.test(h = 0.2, n1 = 763, power = 0.8, sig.level = 0.05)
```

About 265.

### pwr.t.test - one-sample and two-sample t tests for means

We're interested to know if there is a difference in the mean price of what male and female students pay at a library coffee shop. Let's say we randomly observe 30 male and 30 female students check out from the coffee shop and calculate the mean purchase price for each gender. We'll test for a difference in means using a two-sample t-test. How powerful is this experiment if we want to detect a "medium" effect in either direction with a significance level of 0.05?

```{r}
cohen.ES(test = "t", size = "medium")
pwr.t.test(n = 30, d = 0.5, sig.level = 0.05)
```

Only 48%. Not very powerful. How many students should we observe for a test with 80% power?
```{r}
pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05)
```

About 64 per group.

Let's say we want to be able to detect a difference of at least 75 cents in the mean purchase price. We need to convert that to an effect size using the following formula:

$$d = \frac{m_{1} - m_{2}}{\sigma}$$

where $m_{1}$ and $m_{2}$ are the means of each group, respectively, and $\sigma$ is the common standard deviation of the two groups. Again, the label _d_ is due to Cohen (1988).

We have $m_{1} - m_{2} =$ 0.75. We need to make a guess at the population standard deviation. If we have absolutely no idea, one rule of thumb is to take the difference between the maximum and minimum values and divide by 4. Let's say the maximum purchase is $10 and the minimum purchase is $1. Our estimated standard deviation is (10 - 1)/4 = 2.25. Therefore our effect size is 0.75/2.25 $\approx$ 0.333.

```{r}
d <- 0.75/2.25
pwr.t.test(d = d, power = 0.80, sig.level = 0.05)
```

For a desired power of 80%, Type I error tolerance of 0.05, and a hypothesized effect size of 0.333, we should sample at least 143 per group.

Performing the same analysis with the base R function `power.t.test` is a little easier. The difference $m_{1} - m_{2} =$ 0.75 is entered in the `delta` argument and the estimated $\sigma$ = 2.25 is entered in the `sd` argument:

```{r}
power.t.test(delta = 0.75, sd = 2.25, sig.level = 0.05, power = 0.8)
```

To calculate power and sample size for one-sample t-tests, we need to set the `type` argument to `"one.sample"`. By default it is set to `"two.sample"`.

For example, we think the average purchase price at the Library coffee shop is over $3 per student. Our null is $3 or less; our alternative is greater than $3. We can use a one-sample t-test to investigate this hunch. If the true average purchase price is $3.50, we would like to have 90% power to declare the estimated average purchase price is greater than $3.
How many transactions do we need to observe assuming a significance level of 0.05? Let's say the maximum purchase price is $10 and the minimum is $1. So our guess at a standard deviation is 9/4 = 2.25.

```{r}
d <- 0.50/2.25
pwr.t.test(d = d, sig.level = 0.05, power = 0.90,
           alternative = "greater", type = "one.sample")
```

We should plan on observing at least 175 transactions.

To use the `power.t.test` function, set `type = "one.sample"` and `alternative = "one.sided"`:

```{r}
power.t.test(delta = 0.50, sd = 2.25, power = 0.90, sig.level = 0.05,
             alternative = "one.sided", type = "one.sample")
```

"Paired" t-tests are basically the same as one-sample t-tests, except our one sample is usually differences in pairs. The following example should make this clear.

(_From Hogg & Tanis, exercise 6.5-12_) 24 high school boys are put on an ultra-heavy rope-jumping program. Does this decrease their 40-yard dash time (i.e., make them faster)? We'll measure their 40-yard dash time in seconds before the program and after. We'll use a paired t-test to see if the difference in times is greater than 0 (before - after).

Assume the standard deviation of the differences will be about 0.25 seconds. How powerful is the test to detect a difference of about 0.08 seconds with 0.05 significance? Notice we set `type = "paired"`:

```{r}
pwr.t.test(n = 24, d = 0.08 / 0.25, type = "paired", alternative = "greater")
```

Only 45%. Not all that powerful. How many high school boys should we sample for 80% power?

```{r}
pwr.t.test(d = 0.08 / 0.25, power = 0.8, type = "paired", alternative = "greater")
```

About 62.

For paired t-tests we sometimes estimate a standard deviation for _within_ pairs instead of for the difference in pairs. In our example, this would mean an estimated standard deviation for each boy's 40-yard dash times. When dealing with this type of estimated standard deviation we need to multiply it by $\sqrt{2}$ in the `pwr.t.test` function.
Let's say we estimate the standard deviation of each boy's 40-yard dash time to be about 0.10 seconds. The sample size needed to detect a difference of 0.08 seconds is now calculated as follows:

```{r}
pwr.t.test(d = 0.08 / (0.1 * sqrt(2)), power = 0.8,
           type = "paired", alternative = "greater")
```

We need to sample at least 21 students.

### pwr.t2n.test - two-sample t test for means, unequal sample sizes

Find power for a two-sample t-test with 28 in one group and 35 in the other group and a medium effect size. (`sig.level` defaults to 0.05.)

```{r}
pwr.t2n.test(n1 = 28, n2 = 35, d = 0.5)
```

### pwr.chisq.test - Goodness of fit test

(_From Cohen, example 7.1_) A market researcher is seeking to determine preference among 4 package designs. He arranges to have a panel of 100 consumers rate their favorite package design. He wants to perform a chi-square goodness of fit test against the null of equal preference (25% for each design) with a significance level of 0.05. What's the power of the test if 3/8 of the population actually prefers one of the designs and the remaining 5/8 are split over the other 3 designs?

We use the `ES.w1` function to calculate effect size. To do so, we need to create vectors of null and alternative proportions:

```{r}
null <- rep(0.25, 4)
alt <- c(3/8, rep((5/8)/3, 3))
ES.w1(null, alt)
```

To calculate power, specify effect size (`w`), sample size (`N`), and degrees of freedom, which is the number of categories minus 1 (`df` = 4 - 1).

```{r}
pwr.chisq.test(w = ES.w1(null, alt), N = 100, df = (4-1), sig.level = 0.05)
```

If our estimated effect size is correct, we only have about a 67% chance of finding it (i.e., rejecting the null hypothesis of equal preference). How many subjects do we need to achieve 80% power?

```{r}
pwr.chisq.test(w = ES.w1(null, alt), df = (4-1), power = 0.8, sig.level = 0.05)
```

If our alternative hypothesis is correct then we need to survey at least 131 people to detect it with 80% power.
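The `ES.w1` effect size is simply Cohen's _w_ computed from the two vectors of proportions, $w = \sqrt{\sum_{i} (p_{1i} - p_{0i})^2 / p_{0i}}$, where $p_{0i}$ are the null proportions and $p_{1i}$ the alternative proportions. A quick check by hand:

```{r}
null <- rep(0.25, 4)
alt <- c(3/8, rep((5/8)/3, 3))
sqrt(sum((alt - null)^2 / null))  # same value as ES.w1(null, alt)
```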
### pwr.chisq.test - test of association

We want to see if there's an association between gender and flossing teeth among college students. We randomly sample 100 students (male and female) and ask whether or not they floss daily. We want to carry out a chi-square test of association to determine if there's an association between these two variables. We set our significance level to 0.01.

To determine effect size we need to propose an alternative hypothesis, which in this case is a table of proportions. We propose the following:

gender | Floss | No Floss
-------|-------|---------
Male   | 0.1   | 0.4
Female | 0.2   | 0.3

We use the `ES.w2` function to calculate effect size for chi-square tests of association.

```{r}
prob <- matrix(c(0.1, 0.2, 0.4, 0.3), ncol = 2,
               dimnames = list(c("M","F"), c("Floss","No Floss")))
prob
```

This says we sample even proportions of males and females, but believe 10% more females floss. Now use the matrix to calculate effect size:

```{r}
ES.w2(prob)
```

We also need degrees of freedom: `df` = (2 - 1) * (2 - 1) = 1.

And now to calculate power:

```{r}
pwr.chisq.test(w = ES.w2(prob), N = 100, df = 1, sig.level = 0.01)
```

At only 35% this is not a very powerful experiment. How many students should we survey if we wish to achieve 90% power?

```{r}
pwr.chisq.test(w = ES.w2(prob), power = 0.9, df = 1, sig.level = 0.01)
```

About 313.

If you don't suspect association in either direction, or you don't feel like building a matrix in R, you can try a conventional effect size. For example, how many students should we sample to detect a small effect?

```{r}
cohen.ES(test = "chisq", size = "small")
pwr.chisq.test(w = 0.1, power = 0.9, df = 1, sig.level = 0.01)
```

1,488 students. Perhaps more than we thought we might need.

We could consider reframing the question as a two-sample proportion test. What sample size do we need to detect a "small" effect in gender on the proportion of students who floss with 90% power and a significance level of 0.01?
```{r}
pwr.2p.test(h = 0.2, sig.level = 0.01, power = 0.9)
```

About 744 per group. Notice that 744 $\times$ 2 = 1,488, the sample size returned previously by `pwr.chisq.test`. In fact the test statistic for a two-sample proportion test and chi-square test of association are one and the same.

### pwr.r.test - correlation test

(_From Hogg & Tanis, exercise 8.9-12_) A graduate student is investigating the effectiveness of a fitness program. She wants to see if there is a correlation between the weight of a participant at the beginning of the program and the participant's weight change after 6 months. She suspects there is a "small" positive linear relationship between these two quantities. She will measure this relationship with correlation, _r_, and conduct a correlation test to determine if the estimated correlation is statistically greater than 0.

How many subjects does she need to sample to detect this small positive (i.e., _r_ > 0) relationship with 80% power and 0.01 significance level?

There is nothing tricky about the effect size argument, `r`. It is simply the hypothesized correlation. It can take values ranging from -1 to 1.

```{r}
cohen.ES(test = "r", size = "small")
pwr.r.test(r = 0.1, sig.level = 0.01, power = 0.8, alternative = "greater")
```

She needs to observe about 1,000 students. The default is a two-sided test. We specify `alternative = "greater"` since we believe there is a small positive effect. If she just wants to detect a small effect in either direction (positive or negative correlation), we use the default setting of "two.sided", which we can do by removing the `alternative` argument from the function.

```{r}
pwr.r.test(r = 0.1, sig.level = 0.01, power = 0.8)
```

Now she needs to observe 1163 students. Detecting small effects requires large sample sizes.
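The method label in the output mentions an arctanh (Fisher _z_) transformation. A standard back-of-the-envelope approximation based on that transformation is $n \approx \left( \frac{z_{1-\alpha/2} + z_{\text{power}}}{\text{arctanh}(r)} \right)^2 + 3$. This rough formula is not the exact calculation `pwr.r.test` performs, but for the two-sided case above it lands very close:

```{r}
# Fisher z approximation to the required n for detecting r = 0.1
# with a two-sided test, alpha = 0.01, power = 0.80
((qnorm(1 - 0.01/2) + qnorm(0.8)) / atanh(0.1))^2 + 3  # about 1163
```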
### pwr.anova.test - balanced one-way analysis of variance tests

(_From Hogg & Tanis, exercise 8.7-11_) The driver of a diesel-powered car decides to test the quality of three types of fuel sold in his area based on the miles per gallon (mpg) his car gets on each fuel. He will use a balanced one-way ANOVA to test the null that the mean mpg is the same for each fuel versus the alternative that the means are different. ("balanced" means equal sample size in each group; "one-way" means one grouping variable.) How many times does he need to try each fuel to have 90% power to detect a "medium" effect with a significance of 0.01?

We use `cohen.ES` to learn the "medium" effect value is 0.25. We put that in the `f` argument of `pwr.anova.test`. We also need to specify the number of groups using the `k` argument.

```{r}
cohen.ES(test = "anov", size = "medium")
pwr.anova.test(k = 3, f = 0.25, sig.level = 0.01, power = 0.9)
```

He would need to measure mpg 95 times for each type of fuel. His experiment may take a while to complete.

The effect size `f` is calculated as follows:

$$f = \frac{\sigma_{means}}{\sigma_{pop'n}}$$

where $\sigma_{means}$ is the standard deviation of the _k_ means and $\sigma_{pop'n}$ is the common standard deviation of the _k_ groups. These two quantities are also known as the _between-group_ and _within-group_ standard deviations. If our driver suspects the between-group standard deviation is 5 mpg and the within-group standard deviation is 3 mpg, `f` = 5/3.

```{r}
pwr.anova.test(k = 3, f = 5/3, sig.level = 0.01, power = 0.9)
```

In this case he only needs to try each fuel 4 times. Clearly the hypothesized effect size has important consequences in estimating the optimum sample size.

We can also use the `power.anova.test` function that comes with base R. It requires between-group and within-group _variances_.
To get the same result as `pwr.anova.test`, we need to square the standard deviations to get variances and multiply the between-group variance by $\frac{k}{k-1}$. This is because the effect size formula for the ANOVA test assumes the between-group variance has a denominator of _k_ instead of _k - 1_.

```{r}
power.anova.test(groups = 3, within.var = 3^2, between.var = 5^2 * (3/2),
                 sig.level = 0.01, power = 0.90)
```

### pwr.f2.test - test for the general linear model

(_From Kutner, et al, exercise 8.43_)

A director of admissions at a university wants to determine how accurately students' grade-point averages (gpa) at the end of their first year can be predicted or explained by SAT scores and high school class rank. A common approach to answering this kind of question is to model gpa as a function of SAT score and class rank. Or to put it another way, we can perform a multiple regression with gpa as the dependent variable and SAT score and class rank as independent variables.

The null hypothesis is that none of the independent variables explain any of the variability in gpa. This would mean their regression coefficients are statistically indistinguishable from 0. The alternative is that at least one of the coefficients is not 0. This is tested with an F test. We can estimate power and sample size for this test using the `pwr.f2.test` function.

The F test has numerator and denominator degrees of freedom. The numerator degrees of freedom, `u`, is the number of coefficients in your model (minus the intercept). In our example, `u = 2`. The denominator degrees of freedom, `v`, is the number of error degrees of freedom: $v = n - u - 1$. This implies $n = v + u + 1$.

The effect size, `f2`, is $R^{2}/(1 - R^{2})$, where $R^{2}$ is the coefficient of determination, aka the "proportion of variance explained". To determine the effect size, you hypothesize the proportion of variance your model explains, or the $R^{2}$.
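Under the hood, power for this F test comes from the noncentral F distribution, which base R provides through the `ncp` argument of `pf` and `qf`. A sketch of the calculation (the values of `u`, `v` and `f2` below are illustrative, and the noncentrality parameter $\lambda = f2(u + v + 1)$ is an assumption based on Cohen's formulation, not necessarily the exact internals of `pwr.f2.test`):

```{r}
## Power of the GLM F test via the noncentral F distribution
## (u, v and f2 are illustrative; lambda = f2 * (u + v + 1) is the
## assumed Cohen-style noncentrality parameter)
u  <- 2                  # numerator df (number of predictors)
v  <- 50                 # denominator df, so n = v + u + 1 = 53
f2 <- 0.15               # hypothesized effect size
lambda <- f2 * (u + v + 1)
crit   <- qf(0.95, u, v) # critical value at sig.level = 0.05
power  <- 1 - pf(crit, u, v, ncp = lambda)
power
```

Power is the probability that the F statistic, drawn from the noncentral distribution implied by the effect size, exceeds the critical value computed under the null.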
For example, if I think my model explains 45% of the variance in my dependent variable, the effect size is 0.45/(1 - 0.45) $\approx$ 0.81.

Returning to our example, let's say the director of admissions hypothesizes his model explains about 30% of the variability in gpa. How large a sample does he need to detect this effect with 80% power at a 0.001 significance level?

```{r}
pwr.f2.test(u = 2, f2 = 0.3/(1 - 0.3), sig.level = 0.001, power = 0.8)
```

Recall $n = v + u + 1$. Therefore he needs 50 + 2 + 1 = 53 student records.

What is the power of the test with 40 subjects and a significance level of 0.01? Recall $v = n - u - 1$.

```{r}
pwr.f2.test(u = 2, v = 40 - 2 - 1, f2 = 0.3/(1 - 0.3), sig.level = 0.01)
```

Power is about 84%.

## References and Further Reading

Cohen, J. (1988). _Statistical Power Analysis for the Behavioral Sciences (2nd ed.)_. LEA.

Dalgaard, P. (2002). _Introductory Statistics with R_. Springer. (Ch. 2)

Hogg, R. and Tanis, E. (2006). _Probability and Statistical Inference (7th ed.)_. Pearson. (Ch. 9)

Kabacoff, R. (2011). _R in Action_. Manning. (Ch. 10)

Kutner, et al. (2005). _Applied Linear Statistical Models_. McGraw-Hill. (Ch. 16)

Ryan, T. (2013). _Sample Size Determination and Power_. Wiley.

The [CRAN Task View for Clinical Trial Design, Monitoring, and Analysis](https://CRAN.R-project.org/view=ClinicalTrials) lists various R packages that also perform sample size and power calculations.
pwr/MD50000644000176200001440000000372213246622211011375 0ustar liggesusers8e978034867eca1e9780cd13d5f89399 *DESCRIPTION 6a2b948c4abdc27fa91f02422d949df4 *NAMESPACE 725ec88a93a679e8280027abbe1e869c *NEWS b8a87f4ddac74d10af98b71c93eef5cb *R/ES.h.R 20e4fa58c12b787eba3fdfbbdec1934e *R/ES.w1.R 8d1b3ba98a6a3e6040a18ab99375c8ae *R/ES.w2.R 8f43391c71c7d3c42056cad38f12ed5a *R/cohen.ES.R 10f8ff8e86941758853581839d67fd6f *R/plot.power.htest.R c961372f59b87aceea32bb7dc4c8c161 *R/pwr.2p.test.R d0bfd188876d9b9beb55fc4ca866e0e3 *R/pwr.2p2n.test.R 69a9d679367323caadeb2143c73aec2c *R/pwr.anova.test.R 0de6e54aca992b5f22d8084aff3273e6 *R/pwr.chisq.test.R a335e21ced325019af343f108819c60f *R/pwr.f2.test.R 7695371612d70d6c0acf690d86f3fd06 *R/pwr.norm.test.R 71897031b5623c9e8270b599b3295669 *R/pwr.p.test.R cdb84f30cfe480e53f5592bd21181f9a *R/pwr.r.test.R b4e1a7beca3bc74e6e474e1037dac0b3 *R/pwr.t.test.R 4a0adecbb08b52e916fb0730d4b6827f *R/pwr.t2n.test.R b8f3f220fe5bfe36b56290ecdf7f881b *build/vignette.rds d21e35adb0a60115bb62cab48ddafee5 *inst/doc/pwr-vignette.R 9b37e2917689703af87e670bbfdfd4ce *inst/doc/pwr-vignette.Rmd 3bf8bcdc9862f3c65cbd86196a39b518 *inst/doc/pwr-vignette.html eca9274a0fb0feba8000a076e5796800 *man/ES.h.Rd 800c5e5691e698138998a86b4c410dee *man/ES.w1.Rd 16994846d5cf682688fd5617d6193185 *man/ES.w2.Rd c060199d3c12481ab4bf243844a09068 *man/cohen.ES.Rd 0f1fefe2773be679b6a4504c8411184c *man/plot.power.htest.Rd b446e02123e8899d23edf0ff2e3c24fb *man/pwr-package.Rd 6573055b854cd28d915583603bd38f65 *man/pwr.2p.test.Rd 8199bee63fdd444a299277f0cc1a7ac6 *man/pwr.2p2n.test.Rd 96d5df94bdb949e8a225da03771be265 *man/pwr.anova.test.Rd e7a9b99a266050e243c112a49e5ea80c *man/pwr.chisq.test.Rd c832b0a0afc7f80c850375d00d7b719b *man/pwr.f2.test.Rd b60c87f94df02422fd38b524a94ffa88 *man/pwr.norm.test.Rd 3b1079dbdc8100942f4a09e45f668ccf *man/pwr.p.test.Rd dc4ff3e34dd80d8d59cf5b314a6b6e3a *man/pwr.r.test.Rd 892acb23edb6da8b1f9193e762e03548 *man/pwr.t.test.Rd b13d9039aa43414595657cf2611a6bac 
*man/pwr.t2n.test.Rd 9b37e2917689703af87e670bbfdfd4ce *vignettes/pwr-vignette.Rmd pwr/build/0000755000176200001440000000000013246571173012172 5ustar liggesuserspwr/build/vignette.rds0000644000176200001440000000034313246571173014531 0ustar liggesuserspwr/DESCRIPTION0000644000176200001440000000223113246622211012567 0ustar liggesusersPackage: pwr Version: 1.2-2 Date: 2018-03-03 Title: Basic Functions for Power Analysis Authors@R: c(person("Stephane", "Champely", role=c("aut")), person("Claus", "Ekstrom", role="ctb"), person("Peter", "Dalgaard", role="ctb"), person("Jeffrey", "Gill", role="ctb"), person("Stephan", "Weibelzahl", role="ctb"), person("Aditya", "Anandkumar", role="ctb"), person("Clay", "Ford", role="ctb"), person("Robert", "Volcic", role="ctb"), person("Helios", "De Rosario", role="cre", email="helios.derosario@gmail.com")) Description: Power analysis functions along the lines of Cohen (1988). Imports: stats, graphics Suggests: ggplot2, scales, knitr, rmarkdown License: GPL (>= 3) URL: https://github.com/heliosdrm/pwr VignetteBuilder: knitr RoxygenNote: 6.0.1 NeedsCompilation: no Packaged: 2018-03-03 19:07:39 UTC; meliana Author: Stephane Champely [aut], Claus Ekstrom [ctb], Peter Dalgaard [ctb], Jeffrey Gill [ctb], Stephan Weibelzahl [ctb], Aditya Anandkumar [ctb], Clay Ford [ctb], Robert Volcic [ctb], Helios De Rosario [cre] Maintainer: Helios De Rosario <helios.derosario@gmail.com> Repository: CRAN Date/Publication: 2018-03-03 22:41:13 UTC pwr/man/0000755000176200001440000000000013246571173011646 5ustar liggesuserspwr/man/ES.w1.Rd0000644000176200001440000000142513046210416012760 0ustar liggesusers\name{ES.w1} \alias{ES.w1} \title{Effect size calculation in the chi-squared test for goodness of fit} \description{ Compute effect size w for two sets of k probabilities P0 (null hypothesis) and P1 (alternative hypothesis) } \usage{ ES.w1(P0, P1) } \arguments{ \item{P0}{First set of k probabilities (null hypothesis)} \item{P1}{Second set of k probabilities (alternative
hypothesis)} } \value{ The corresponding effect size w } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane CHAMPELY} \seealso{pwr.chisq.test} \examples{ ## Exercise 7.1 p. 249 from Cohen P0<-rep(1/4,4) P1<-c(0.375,rep((1-0.375)/3,3)) ES.w1(P0,P1) pwr.chisq.test(w=ES.w1(P0,P1),N=100,df=(4-1)) } \keyword{htest} pwr/man/pwr.norm.test.Rd0000644000176200001440000000440513046210416014664 0ustar liggesusers\name{pwr.norm.test} \alias{pwr.norm.test} \title{Power calculations for the mean of a normal distribution (known variance)} \description{ Compute power of test or determine parameters to obtain target power (same as power.anova.test).} \usage{ pwr.norm.test(d = NULL, n = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided","less","greater"))} \arguments{ \item{d}{Effect size d=mu-mu0} \item{n}{Number of observations} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} \item{alternative}{a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"}} \details{ Exactly one of the parameters 'd','n','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. 
} \examples{ ## Power at mu=105 for H0:mu=100 vs. H1:mu>100 (sigma=15) 20 obs. (alpha=0.05) sigma<-15 c<-100 mu<-105 d<-(mu-c)/sigma pwr.norm.test(d=d,n=20,sig.level=0.05,alternative="greater") ## Sample size of the test for power=0.80 pwr.norm.test(d=d,power=0.8,sig.level=0.05,alternative="greater") ## Power function of the same test mu<-seq(95,125,l=100) d<-(mu-c)/sigma plot(d,pwr.norm.test(d=d,n=20,sig.level=0.05,alternative="greater")$power, type="l",ylim=c(0,1)) abline(h=0.05) abline(h=0.80) ## Power function for the two-sided alternative plot(d,pwr.norm.test(d=d,n=20,sig.level=0.05,alternative="two.sided")$power, type="l",ylim=c(0,1)) abline(h=0.05) abline(h=0.80) } \keyword{htest}pwr/man/pwr-package.Rd0000644000176200001440000000430313246552675014344 0ustar liggesusers\name{pwr-package} \alias{pwr-package} \alias{pwr} \docType{package} \title{ Basic Functions for Power Analysis pwr } \description{ Power calculations along the lines of Cohen (1988) using in particular the same notations for effect sizes. Examples from the book are given. 
} \details{ \tabular{ll}{ Package: \tab pwr\cr Type: \tab Package\cr Version: \tab 1.2-2\cr Date: \tab 2018-03-03\cr License: \tab GPL (>= 3) \cr } This package contains functions for basic power calculations using effect sizes and notations from Cohen (1988) : pwr.p.test: test for one proportion (ES=h) pwr.2p.test: test for two proportions (ES=h) pwr.2p2n.test: test for two proportions (ES=h, unequal sample sizes) pwr.t.test: one sample and two samples (equal sizes) t tests for means (ES=d) pwr.t2n.test: two samples (different sizes) t test for means (ES=d) pwr.anova.test: test for one-way balanced anova (ES=f) pwr.r.test: correlation test (ES=r) pwr.chisq.test: chi-squared test (ES=w) pwr.f2.test: test for the general linear model (ES=f2) ES.h: computing effect size h for proportions tests ES.w1: computing effect size w for the goodness of fit chi-squared test ES.w2: computing effect size w for the association chi-squared test cohen.ES: computing effect sizes for all the previous tests corresponding to conventional effect sizes (small, medium, large) } \author{ Stephane Champely, based on previous works by Claus Ekstrom and Peter Dalgaard, with contributions of Jeffrey Gill, Stephan Weibelzahl, Clay Ford, Aditya Anandkumar and Robert Volcic. Maintainer: Helios De Rosario-Martinez } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \keyword{ package } \keyword{htest} \seealso{power.t.test,power.prop.test,power.anova.test} \examples{ ## Exercise 8.1 P. 357 from Cohen (1988) pwr.anova.test(f=0.28,k=4,n=20,sig.level=0.05) ## Exercise 6.1 p. 198 from Cohen (1988) pwr.2p.test(h=0.3,n=80,sig.level=0.05,alternative="greater") ## Exercise 7.3 p. 251 pwr.chisq.test(w=0.346,df=(2-1)*(3-1),N=140,sig.level=0.01) ## Exercise 6.5 p. 
203 from Cohen (1988) pwr.p.test(h=0.2,n=60,sig.level=0.05,alternative="two.sided") } pwr/man/pwr.anova.test.Rd0000644000176200001440000000326513244306132015021 0ustar liggesusers\name{pwr.anova.test} \alias{pwr.anova.test} \title{Power calculations for balanced one-way analysis of variance tests} \description{ Compute power of test or determine parameters to obtain target power (same as power.anova.test).} \usage{ pwr.anova.test(k = NULL, n = NULL, f = NULL, sig.level = 0.05, power = NULL) } %- maybe also 'usage' for other objects documented here. \arguments{ \item{k}{Number of groups} \item{n}{Number of observations (per group)} \item{f}{Effect size} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} } \details{ Exactly one of the parameters 'k','n','f','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. } \seealso{power.anova.test} \examples{ ## Exercise 8.1 P. 357 from Cohen (1988) pwr.anova.test(f=0.28,k=4,n=20,sig.level=0.05) ## Exercise 8.10 p. 
391 pwr.anova.test(f=0.28,k=4,power=0.80,sig.level=0.05) } \keyword{htest} pwr/man/cohen.ES.Rd0000644000176200001440000000165013046210416013525 0ustar liggesusers\name{cohen.ES} \alias{cohen.ES} \title{Conventional effects size} \description{ Give the conventional effect size (small, medium, large) for the tests available in this package } \usage{ cohen.ES(test = c("p", "t", "r", "anov", "chisq", "f2"), size = c("small", "medium", "large")) } \arguments{ \item{test}{The statistical test of interest} \item{size}{The ES : small, medium of large? } } \value{ The corresponding effect size } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane CHAMPELY} \examples{ ## medium effect size for the correlation test cohen.ES(test="r", size="medium") ## sample size for a medium size effect in the two-sided correlation test ## using the conventional power of 0.80 pwr.r.test(r=cohen.ES(test="r",size="medium")$effect.size, power=0.80, sig.level=0.05, alternative="two.sided") } \keyword{htest}pwr/man/pwr.r.test.Rd0000644000176200001440000000461013246561533014163 0ustar liggesusers\name{pwr.r.test} \alias{pwr.r.test} \title{Power calculations for correlation test} \description{ Compute power of test or determine parameters to obtain target power (same as power.anova.test).} \usage{ pwr.r.test(n = NULL, r = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less","greater"))} %- maybe also 'usage' for other objects documented here. 
\arguments{ \item{n}{Number of observations} \item{r}{Linear correlation coefficient} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} \item{alternative}{a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"}} \details{ These calculations use the Z' transformation of correlation coefficient : Z'=arctanh(r)+r/(2*(n-1)) and a bias correction is applied. Note that contrary to Cohen (1988) p.546, where zp' = arctanh(rp) + rp/(2*(n-1)) and zc' = arctanh(rc) + rc/(2*(n-1)), we only use here zp' = arctanh(rp) + rp/(2*(n-1)) and zc' = arctanh(rc). Exactly one of the parameters 'r','n','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test). The modified bias correction is contributed by Jeffrey Gill.} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. } \examples{ ## Exercise 3.1 p. 96 from Cohen (1988) pwr.r.test(r=0.3,n=50,sig.level=0.05,alternative="two.sided") pwr.r.test(r=0.3,n=50,sig.level=0.05,alternative="greater") ## Exercise 3.4 p. 
208 pwr.r.test(r=0.3,power=0.80,sig.level=0.05,alternative="two.sided") pwr.r.test(r=0.5,power=0.80,sig.level=0.05,alternative="two.sided") pwr.r.test(r=0.1,power=0.80,sig.level=0.05,alternative="two.sided") } \keyword{htest} pwr/man/ES.h.Rd0000644000176200001440000000130613046210416012660 0ustar liggesusers\name{ES.h} \alias{ES.h} \title{Effect size calculation for proportions} \description{ Compute effect size h for two proportions } \usage{ ES.h(p1, p2) } \arguments{ \item{p1}{First proportion} \item{p2}{Second proportion} } \details{ The effect size is 2*asin(sqrt(p1))-2*asin(sqrt(p2)) } \value{ The corresponding effect size } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane CHAMPELY} \seealso{pwr.p.test, pwr.2p.test, pwr.2p2n.test, power.prop.test} \examples{ ## Exercise 6.5 p. 203 from Cohen h<-ES.h(0.5,0.4) h pwr.p.test(h=h,n=60,sig.level=0.05,alternative="two.sided") } \keyword{htest} pwr/man/pwr.2p2n.test.Rd0000644000176200001440000000365713046210416014502 0ustar liggesusers\name{pwr.2p2n.test} \alias{pwr.2p2n.test} \title{Power calculation for two proportions (different sample sizes)} \description{ Compute power of test, or determine parameters to obtain target power. } \usage{ pwr.2p2n.test(h = NULL, n1 = NULL, n2 = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less","greater")) } \arguments{ \item{h}{Effect size} \item{n1}{Number of observations in the first sample} \item{n2}{Number of observations in the second sample} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} \item{alternative}{a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"} } \details{ Exactly one of the parameters 'h','n1', 'n2', 'power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others.
Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it.} \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. } \seealso{ES.h, pwr.2p.test, power.prop.test} \examples{ ## Exercise 6.3 P. 200 from Cohen (1988) pwr.2p2n.test(h=0.30,n1=80,n2=245,sig.level=0.05,alternative="greater") ## Exercise 6.7 p. 207 from Cohen (1988) pwr.2p2n.test(h=0.20,n1=1600,power=0.9,sig.level=0.01,alternative="two.sided") } \keyword{htest} pwr/man/pwr.t2n.test.Rd0000644000176200001440000000352113046210416014412 0ustar liggesusers\name{pwr.t2n.test} \alias{pwr.t2n.test} \title{Power calculations for two samples (different sizes) t-tests of means } \description{ Compute power of tests or determine parameters to obtain target power (similar to as power.t.test).} \usage{ pwr.t2n.test(n1 = NULL, n2= NULL, d = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided", "less","greater"))} %- maybe also 'usage' for other objects documented here. 
\arguments{ \item{n1}{Number of observations in the first sample} \item{n2}{Number of observations in the second sample} \item{d}{Effect size} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} \item{alternative}{a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"} } \details{ Exactly one of the parameters 'd','n1','n2','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. } \examples{ ## Exercise 2.3 p. 437 from Cohen (1988) pwr.t2n.test(d=0.6,n1=90,n2=60,alternative="greater") } \keyword{htest}pwr/man/ES.w2.Rd0000644000176200001440000000142213046210416012756 0ustar liggesusers\name{ES.w2} \alias{ES.w2} \title{Effect size calculation in the chi-squared test for association} \description{ Compute effect size w for a two-way probability table corresponding to the alternative hypothesis in the chi-squared test of association in two-way contingency tables } \usage{ ES.w2(P) } \arguments{ \item{P}{A two-way probability table (alternative hypothesis)} } \value{ The corresponding effect size w } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). 
Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane CHAMPELY} \seealso{pwr.chisq.test} \examples{ prob<-matrix(c(0.225,0.125,0.125,0.125,0.16,0.16,0.04,0.04),nrow=2,byrow=TRUE) prob ES.w2(prob) pwr.chisq.test(w=ES.w2(prob),df=(2-1)*(4-1),N=200) } \keyword{htest} pwr/man/pwr.f2.test.Rd0000644000176200001440000000301413046210416014213 0ustar liggesusers\name{pwr.f2.test} \alias{pwr.f2.test} \title{Power calculations for the general linear model} \description{ Compute power of test or determine parameters to obtain target power (same as power.anova.test).} \usage{ pwr.f2.test(u = NULL, v = NULL, f2 = NULL, sig.level = 0.05, power = NULL) } \arguments{ \item{u}{degrees of freedom for numerator} \item{v}{degrees of freedom for denominator} \item{f2}{effect size} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} } \details{ Exactly one of the parameters 'u','v','f2','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. } \examples{ ## Exercise 9.1 P. 424 from Cohen (1988) pwr.f2.test(u=5,v=89,f2=0.1/(1-0.1),sig.level=0.05) } \keyword{htest}pwr/man/plot.power.htest.Rd0000644000176200001440000000323413064577156015373 0ustar liggesusers\name{plot.power.htest} \alias{plot.power.htest} \title{Plot diagram of sample size vs.
test power} \description{Plot a diagram to illustrate the relationship of sample size and test power for a given set of parameters.} \usage{ \method{plot}{power.htest}(x, \dots) } %- maybe also 'usage' for other objects documented here. \arguments{ \item{x}{object of class power.htest usually created by one of the power calculation functions, e.g., pwr.t.test()} \item{\dots}{Arguments to be passed to \code{ggplot} including xlab and ylab} } \details{ Power calculations for the following tests are supported: t-test (pwr.t.test(), pwr.t2n.test()), chi-squared test (pwr.chisq.test()), one-way ANOVA (pwr.anova.test()), standard normal distribution (pwr.norm.test()), Pearson correlation (pwr.r.test()), proportions (pwr.p.test(), pwr.2p.test(), pwr.2p2n.test()) } \value{ These functions are invoked for their side effect of drawing on the active graphics device. } \note{ By default it attempts to use the plotting tools of \href{https://cran.r-project.org/package=ggplot2/}{ggplot2} and \href{https://cran.r-project.org/package=scales}{scales}. If they are not installed, it will use the basic R plotting tools.
} \author{Stephan Weibelzahl } \seealso{\code{\link{pwr.t.test}}, \code{\link{pwr.p.test}}, \code{\link{pwr.2p.test}}, \code{\link{pwr.2p2n.test}}, \code{\link{pwr.r.test}}, \code{\link{pwr.chisq.test}}, \code{\link{pwr.anova.test}}, \code{\link{pwr.t2n.test}} } \examples{ ## Two-sample t-test p.t.two <- pwr.t.test(d=0.3, power=0.8, type="two.sample", alternative="two.sided") plot(p.t.two) plot(p.t.two, xlab="sample size per group") } \keyword{htest} pwr/man/pwr.chisq.test.Rd0000644000176200001440000000332713046210416015022 0ustar liggesusers\name{pwr.chisq.test} \alias{pwr.chisq.test} \title{power calculations for chi-squared tests} \description{ Compute power of test or determine parameters to obtain target power (same as power.anova.test).} \usage{ pwr.chisq.test(w = NULL, N = NULL, df = NULL, sig.level = 0.05, power = NULL) } \arguments{ \item{w}{Effect size} \item{N}{Total number of observations} \item{df}{degree of freedom (depends on the chosen test)} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} } \details{ Exactly one of the parameters 'w','N','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. } \seealso{ES.w1,ES.w2} \examples{ ## Exercise 7.1 P. 
249 from Cohen (1988) pwr.chisq.test(w=0.289,df=(4-1),N=100,sig.level=0.05) ## Exercise 7.3 p. 251 pwr.chisq.test(w=0.346,df=(2-1)*(3-1),N=140,sig.level=0.01) ## Exercise 7.8 p. 270 pwr.chisq.test(w=0.1,df=(5-1)*(6-1),power=0.80,sig.level=0.05) } \keyword{htest}pwr/man/pwr.2p.test.Rd0000644000176200001440000000335513046210416014235 0ustar liggesusers\name{pwr.2p.test} \alias{pwr.2p.test} \title{Power calculation for two proportions (same sample sizes)} \description{ Compute power of test, or determine parameters to obtain target power (similar to power.prop.test). } \usage{ pwr.2p.test(h = NULL, n = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided","less","greater")) } \arguments{ \item{h}{Effect size} \item{n}{Number of observations (per sample)} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} \item{alternative}{a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"} } \details{ Exactly one of the parameters 'h','n', 'power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. } \seealso{ES.h, pwr.2p2n.test, power.prop.test} \examples{ ## Exercise 6.1 p. 
198 from Cohen (1988) pwr.2p.test(h=0.3,n=80,sig.level=0.05,alternative="greater") } \keyword{htest} pwr/man/pwr.p.test.Rd0000644000176200001440000000364313046210416014153 0ustar liggesusers\name{pwr.p.test} \alias{pwr.p.test} \title{Power calculations for proportion tests (one sample)} \description{ Compute power of test or determine parameters to obtain target power (same as power.anova.test).} \usage{ pwr.p.test(h = NULL, n = NULL, sig.level = 0.05, power = NULL, alternative = c("two.sided","less","greater")) } %- maybe also 'usage' for other objects documented here. \arguments{ \item{h}{Effect size} \item{n}{Number of observations} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} \item{alternative}{a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"}} \details{ These calculations use arcsine transformation of the proportion (see Cohen (1988)) Exactly one of the parameters 'h','n','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. } \seealso{ES.h} \examples{ ## Exercise 6.5 p. 203 from Cohen h<-ES.h(0.5,0.4) h pwr.p.test(h=h,n=60,sig.level=0.05,alternative="two.sided") ## Exercise 6.8 p. 
208 pwr.p.test(h=0.2,power=0.95,sig.level=0.05,alternative="two.sided") } \keyword{htest}pwr/man/pwr.t.test.Rd0000644000176200001440000000476613244306314014171 0ustar liggesusers\name{pwr.t.test} \alias{pwr.t.test} \title{Power calculations for t-tests of means (one sample, two samples and paired samples)} \description{ Compute power of tests or determine parameters to obtain target power (similar to power.t.test).} \usage{ pwr.t.test(n = NULL, d = NULL, sig.level = 0.05, power = NULL, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "less", "greater")) } %- maybe also 'usage' for other objects documented here. \arguments{ \item{n}{Number of observations (per sample)} \item{d}{Effect size (Cohen's d) - difference between the means divided by the pooled standard deviation} \item{sig.level}{Significance level (Type I error probability)} \item{power}{Power of test (1 minus Type II error probability)} \item{type}{Type of t test : one- two- or paired-samples} \item{alternative}{a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"} } \details{ Exactly one of the parameters 'd','n','power' and 'sig.level' must be passed as NULL, and that parameter is determined from the others. Notice that the last one has non-NULL default so NULL must be explicitly passed if you want to compute it. } \value{ Object of class '"power.htest"', a list of the arguments (including the computed one) augmented with 'method' and 'note' elements. } \references{Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.} \author{Stephane Champely but this is a mere copy of Peter Dalgaard work (power.t.test)} \note{ 'uniroot' is used to solve power equation for unknowns, so you may see errors from it, notably about inability to bracket the root when invalid arguments are given. 
} \seealso{power.prop.test} \examples{ ## One sample (power) ## Exercise 2.5 p. 47 from Cohen (1988) pwr.t.test(d=0.2,n=60,sig.level=0.10,type="one.sample",alternative="two.sided") ## Paired samples (power) ## Exercise p. 50 from Cohen (1988) d<-8/(16*sqrt(2*(1-0.6))) pwr.t.test(d=d,n=40,sig.level=0.05,type="paired",alternative="two.sided") ## Two independent samples (power) ## Exercise 2.1 p. 40 from Cohen (1988) d<-2/2.8 pwr.t.test(d=d,n=30,sig.level=0.05,type="two.sample",alternative="two.sided") ## Two independent samples (sample size) ## Exercise 2.10 p. 59 pwr.t.test(d=0.3,power=0.75,sig.level=0.05,type="two.sample",alternative="greater") } \keyword{htest}