statcheck/ 0000755 0001762 0000144 00000000000 14363505732 012233 5 ustar ligges users statcheck/NAMESPACE 0000644 0001762 0000144 00000001705 14350273527 013455 0 ustar ligges users # Generated by roxygen2: do not edit by hand
S3method(identify,statcheck)
S3method(plot,statcheck)
S3method(summary,statcheck)
export(checkHTML)
export(checkHTMLdir)
export(checkPDF)
export(checkPDFdir)
export(checkdir)
export(statcheck)
export(statcheckReport)
importFrom(ggplot2,aes)
importFrom(ggplot2,annotate)
importFrom(ggplot2,element_blank)
importFrom(ggplot2,element_line)
importFrom(ggplot2,facet_grid)
importFrom(ggplot2,geom_abline)
importFrom(ggplot2,geom_hline)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,geom_vline)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,scale_color_manual)
importFrom(ggplot2,scale_x_continuous)
importFrom(ggplot2,scale_y_continuous)
importFrom(ggplot2,theme)
importFrom(ggplot2,theme_bw)
importFrom(graphics,abline)
importFrom(graphics,legend)
importFrom(graphics,par)
importFrom(graphics,plot.default)
importFrom(graphics,points)
importFrom(graphics,text)
importFrom(rlang,.data)
statcheck/README.md 0000644 0001762 0000144 00000006656 14362533741 013527 0 ustar ligges users
# statcheck
## What is statcheck?
`statcheck` is a “spellchecker” for statistics. It checks whether your
*p*-values match their accompanying test statistic and degrees of
freedom.
`statcheck` searches for null-hypothesis significance test (NHST) results in APA
style (e.g., *t*(28) = 2.2, *p* \< .05). It recalculates the *p*-value
using the reported test statistic and degrees of freedom. If the
reported and computed p-values don’t match, `statcheck` will flag the
result as an error.

## What can I use statcheck for?
`statcheck` is mainly useful for:
1. **Self-checks**: you can use `statcheck` to make sure your
manuscript doesn’t contain copy-paste errors or other
inconsistencies before you submit it to a journal.
2. **Peer review**: editors and reviewers can use `statcheck` to check
submitted manuscripts for statistical inconsistencies. They can ask
authors for a correction or clarification before publishing a
manuscript.
3. **Research**: `statcheck` can be used to automatically extract
statistical test results from articles that can then be analyzed.
You can for instance investigate whether you can predict statistical
inconsistencies (see e.g., [Nuijten et al.,
2017](https://doi.org/10.1525/collabra.102)), or use it to analyze
p-value distributions (see e.g., [Hartgerink et al.,
2016](https://peerj.com/articles/1935/)).
## How does statcheck work?
The algorithm behind `statcheck` consists of four basic steps:
1. **Convert** PDF and HTML articles to plain text files.
2. **Search** the text for instances of NHST results. Specifically,
`statcheck` can recognize *t*-tests, *F*-tests, correlations,
*z*-tests, $\chi^2$ -tests, and Q-tests (from meta-analyses) if they
are reported completely (test statistic, degrees of freedom, and
*p*-value) and in APA style.
3. **Recompute** the *p*-value using the reported test statistic and
degrees of freedom.
4. **Compare** the reported and recomputed *p*-value. If the reported
*p*-value does not match the computed one, the result is marked as
an *inconsistency* (`Error` in the output). If the reported
*p*-value is significant and the computed is not, or vice versa, the
result is marked as a *gross inconsistency* (`DecisionError` in the
output).
`statcheck` takes into account correct rounding of the test statistic,
and has the option to take into account one-tailed testing. See the
[manual](http://rpubs.com/michelenuijten/statcheckmanual) for details.
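
The recompute-and-compare steps can be illustrated in base R for the example result *t*(28) = 2.2, *p* \< .05. This is a minimal sketch, not `statcheck`'s actual implementation: the variable names are made up for illustration, and the rounding check shown here is just one simple way to allow for a rounded test statistic.

```r
# Reported result in APA style: t(28) = 2.2, p < .05
t_value <- 2.2
df <- 28
alpha <- 0.05

# Recompute the two-tailed p-value from the test statistic and df
p_computed <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

# A reported 2.2 may stand for any value between 2.15 and 2.25 before
# rounding, so a range of p-values is compatible with the reported statistic
# (assumption: this is an illustrative rounding rule, not statcheck's own code)
p_upper <- 2 * pt(2.15, df, lower.tail = FALSE)
p_lower <- 2 * pt(2.25, df, lower.tail = FALSE)

# The reported "p < .05" is consistent if at least one p-value in the
# compatible range satisfies the reported inequality
consistent <- p_lower < alpha

print(c(p_lower = p_lower, p_computed = p_computed, p_upper = p_upper))
print(consistent)
```

A result would count as a gross inconsistency in this sketch only if the reported and recomputed *p*-values fell on opposite sides of `alpha`.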
## Installation and use
For detailed information about installing and using `statcheck`, see the
[manual on RPubs](http://rpubs.com/michelenuijten/statcheckmanual).
[statcheck.io](http://statcheck.io/) is a web-based interface for
statcheck.
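
As a quick orientation, the sketch below runs `statcheck()` on a string of text and summarizes the result, using only functions exported by the package. It assumes the package is already installed from CRAN; the example text is invented for illustration, and the exact output columns may differ between package versions.

```r
# One-time installation from CRAN (uncomment to run):
# install.packages("statcheck")

# Hypothetical example text containing three APA-style results
txt <- "This test is consistent t(28) = 0.2, p = .84, but this one is
inconsistent: F(2, 28) = 4.2, p = .01. This final test is even a
gross/decision inconsistency: z = 1.23, p = .03"

# Only run the check if the package is available
if (requireNamespace("statcheck", quietly = TRUE)) {
  library(statcheck)

  # Extract the NHST results and recompute their p-values
  result <- statcheck(txt)
  print(result)

  # Counts of extracted p-values, errors, and decision errors per source
  print(summary(result))
}
```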
statcheck/man/ 0000755 0001762 0000144 00000000000 14362536723 013011 5 ustar ligges users statcheck/man/identify.statcheck.Rd 0000644 0001762 0000144 00000002526 14350273527 017065 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/identify.statcheck.R
\name{identify.statcheck}
\alias{identify.statcheck}
\title{Identify specific points in a statcheck plot.}
\usage{
\method{identify}{statcheck}(x, alpha = 0.05, ...)
}
\arguments{
\item{x}{A statcheck object. See \code{\link{statcheck}}.}
\item{alpha}{assumed level of significance in the scanned texts. Defaults to
.05.}
\item{...}{arguments to be passed to methods, such as graphical parameters
(see \code{\link{par}}).}
}
\description{
With this function you can simply point and click on the datapoints in the
plot to see the corresponding statcheck details, such as the paper from which
the data came and the exact statistical results.
}
\examples{
\dontrun{
# First we need a statcheck object
# Here, we create one by running statcheck on some raw text
txt <- "This test is consistent t(28) = 0.2, p = .84, but this one is
inconsistent: F(2, 28) = 4.2, p = .01. This final test is even a
gross/decision inconsistency: z = 1.23, p = .03"
result <- statcheck(txt)
# Now, we can run identify.statcheck(), or shorter, simply identify():
identify(result)
# Further instructions:
# click on one or multiple points of interest
# press Esc
# a dataframe with information on the selected points will appear
}
}
statcheck/man/summary.statcheck.Rd 0000644 0001762 0000144 00000001670 14350273527 016746 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/summary.statcheck.R
\name{summary.statcheck}
\alias{summary.statcheck}
\title{Summary method for statcheck}
\usage{
\method{summary}{statcheck}(object, ...)
}
\arguments{
\item{object}{a \code{statcheck} object.}
\item{...}{additional arguments affecting the summary produced.}
}
\value{
A data frame containing for each source of statistics:
\describe{
\item{source}{Name of the file/origin from which the statistics are
extracted}
\item{nr_p_values}{The number of extracted reported p values per article}
\item{nr_errors}{The number of errors per article}
\item{nr_decision_errors}{The number of decision errors per article}
}
}
\description{
Gives the summaries for a \code{statcheck} object.
}
\examples{
txt <- "blablabla the effect was very significant (t(100)=1, p < 0.001)"
stat <- statcheck(txt)
summary(stat)
}
statcheck/man/statcheckReport.Rd 0000644 0001762 0000144 00000003653 14350273527 016451 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/statcheckReport.R
\name{statcheckReport}
\alias{statcheckReport}
\title{Generate HTML report for statcheck output}
\usage{
statcheckReport(statcheckOutput, outputFileName, outputDir)
}
\arguments{
\item{statcheckOutput}{statcheck output of one of the following functions:
\code{\link{statcheck}}, \code{\link{checkPDFdir}}, \code{\link{checkPDF}},
\code{\link{checkHTMLdir}}, \code{\link{checkHTML}}, or
\code{\link{checkdir}}.}
\item{outputFileName}{String specifying the file name under which you want to
save the generated HTML report. The extension ".html" is automatically added,
so doesn't need to be specified in this argument.}
\item{outputDir}{String specifying the directory in which you want to save
the generated HTML report.}
}
\value{
An HTML report, saved in the directory specified in the argument
"outputDir".
}
\description{
This function uses R Markdown to generate a nicely formatted HTML report of
\code{\link{statcheck}} output.
}
\details{
This function temporarily saves the inserted \code{statcheck} output as an
.RData file in the "output" folder in the statcheck package directory. This
file is then called by the .Rmd template that is saved in the folder "rmd",
also in the statcheck package directory. After the HTML report is generated,
the .RData file is removed again.
}
\examples{
\dontrun{
# first generate statcheck output, for instance by using the statcheck()
# function
txt <- "blablabla the effect was very significant (t(100)=1, p < 0.001)"
stat <- statcheck(txt)
# next, use this output to generate a nice HTML report of the results
statcheckReport(stat, outputFileName="statcheckHTMLReport",
outputDir="C:/mydocuments/results")
# you can now find your HTML report in the folder
# "C:/mydocuments/results" under the name "statcheckHTMLReport.html".
}
}
statcheck/man/checkfiles.Rd 0000644 0001762 0000144 00000001340 14350273527 015373 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/checkHTML.R, R/checkPDF.R, R/doc-checkfiles.R
\name{checkHTML}
\alias{checkHTML}
\alias{checkPDF}
\alias{checkfiles}
\title{Extract statistics from PDF/HTML articles and recalculate p-values}
\usage{
checkHTML(files, ...)
checkPDF(files, ...)
}
\arguments{
\item{files}{Vector of strings containing file paths to the PDF or HTML files to check.}
\item{...}{Arguments sent to \code{statcheck}.}
}
\value{
A statcheck data frame with the extracted statistics. See
\code{\link{statcheck}} for details.
}
\description{
These functions search for NHST results in PDF and/or HTML articles and send
the extracted statistics to \code{statcheck}.
}
statcheck/man/statcheck-package.Rd 0000644 0001762 0000144 00000012622 14362536723 016645 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/statcheck-package.R
\docType{package}
\name{statcheck-package}
\alias{statcheck-package}
\alias{_PACKAGE}
\alias{{statcheck}-package}
\title{statcheck: Extract statistics from articles and recompute p-values}
\description{
The package \code{statcheck} can extract Null Hypothesis Significance Test
(NHST) results from articles (or plain text) and recompute p-values to check
whether a reported NHST result is internally consistent or not.
}
\details{
\code{statcheck} can be used for multiple purposes, including:
\itemize{
\item \strong{Self-checks}: you can use statcheck to make sure your
manuscript doesn't contain copy-paste errors or other inconsistencies
before you submit it to a journal.
\item \strong{Peer review}: editors and reviewers can use statcheck to
check submitted manuscripts for statistical inconsistencies. They can ask
authors for a correction or clarification before publishing a manuscript.
\item \strong{Research}: statcheck can be used to automatically extract
statistical test results from articles that can then be analyzed. You can
for instance investigate whether you can predict statistical
inconsistencies (see e.g., Nuijten et al., 2017),
or use it to analyze p-value distributions (see e.g.,
Hartgerink et al., 2016).
}
}
\section{Using statcheck on a string of text}{
The most basic usage of \code{statcheck} is to directly extract NHST results
and check for inconsistencies in a string of text. See
\code{\link{statcheck}} for details and an example of how to do this.
}
\section{Using statcheck on an article}{
Another option is to run \code{statcheck} on an article (PDF or HTML). This
is a useful option if you want to check for inconsistencies in a single
article (e.g., as a final check before you submit it). Depending on whether
you want to check an article in HTML or PDF, you can use
\code{\link{checkHTML}} or \code{\link{checkPDF}}, respectively. Note: it is
recommended to check articles in HTML, as converting PDF files to plain text
can introduce conversion errors.
}
\section{Using statcheck on a folder of articles}{
Finally, it is possible to run \code{statcheck} on an entire folder of
articles. This is often useful for meta-research. To do so, you can use
\code{\link{checkPDFdir}} to check all PDF articles in a folder,
\code{\link{checkHTMLdir}} to check all HTML articles in a folder, and
\code{\link{checkdir}} to check both PDF and HTML articles in a folder.
}
\section{Accuracy of the algorithm in detecting inconsistencies}{
It is important to note that \code{statcheck} is not perfect. Its performance
in detecting NHST results depends on the type-setting and reporting style of
an article and can vary widely. However, \code{statcheck} performs well in
classifying the retrieved statistics in different consistency categories. We
found that statcheck’s sensitivity (true positive rate) and specificity (true
negative rate) were high: between 85.3% and 100%, and between 96.0% and 100%,
respectively, depending on the assumptions and settings. The overall accuracy
of statcheck ranged from 96.2% to 99.9%. More details on the validity study
can be found in \href{https://psyarxiv.com/tcxaj/}{Nuijten et al., 2017}.
}
\section{Manual}{
Details on what statcheck can and cannot do, and how to install the package
and the necessary program Xpdf can be found in the
\href{https://rpubs.com/michelenuijten/statcheckmanual}{online manual}.
}
\section{Web app}{
\code{statcheck} is also available as a free, online web app at
\url{http://statcheck.io}.
}
\references{
Hartgerink, C. H. J., Van Aert, R. C. M., Nuijten, M. B., Wicherts, J. M., &
Van Assen, M. A. L. M. (2016). Distributions of p-values smaller than .05 in
psychology: What is going on? \emph{PeerJ}, \emph{4}, e1935.
doi: 10.7717/peerj.1935
Nuijten, M. B., Borghuis, J., Veldkamp, C. L. S., Dominguez-Alvarez, L., Van
Assen, M. A. L. M., & Wicherts, J. M. (2017). Journal data sharing policies
and statistical reporting inconsistencies in psychology.
\emph{Collabra: Psychology}, \emph{3}(1), 1-22. doi: 10.1525/collabra.102.
Nuijten, M. B., Van Assen, M. A. L. M., Hartgerink, C. H. J., Epskamp, S., &
Wicherts, J. M. (2017). The validity of the tool "statcheck" in discovering
statistical reporting inconsistencies. \emph{Preprint retrieved from
https://psyarxiv.com/tcxaj/.}
}
\author{
\strong{Maintainer}: Michele B. Nuijten \email{m.b.nuijten@uvt.nl} (\href{https://orcid.org/0000-0002-1468-8585}{ORCID})
Authors:
\itemize{
\item Sacha Epskamp \email{mail@sachaepskamp.com} (\href{https://orcid.org/0000-0003-4884-8118}{ORCID})
}
Other contributors:
\itemize{
\item Willem Sleegers (\href{https://orcid.org/0000-0001-9058-3817}{ORCID}) [contributor]
\item Sean Rife (\href{https://orcid.org/0000-0002-6748-0841}{ORCID}) [contributor]
\item John Sakaluk (\href{https://orcid.org/0000-0002-2515-9822}{ORCID}) [contributor]
\item Paul van der Laken (\href{https://orcid.org/0000-0002-0404-9114}{ORCID}) [contributor]
\item Chris Hartgerink (\href{https://orcid.org/0000-0003-1050-6809}{ORCID}) [contributor]
\item Steve Haroz (\href{https://orcid.org/0000-0002-2725-9173}{ORCID}) [contributor]
}
}
\keyword{internal}
statcheck/man/plot.statcheck.Rd 0000644 0001762 0000144 00000004002 14350273527 016217 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/plot.statcheck.R
\name{plot.statcheck}
\alias{plot.statcheck}
\title{Plot method for statcheck}
\usage{
\method{plot}{statcheck}(x, alpha = 0.05, APAstyle = TRUE, group = NULL, ...)
}
\arguments{
\item{x}{A statcheck object. See \code{\link{statcheck}}.}
\item{alpha}{assumed level of significance in the scanned texts. Defaults to
.05.}
\item{APAstyle}{If TRUE, prints plot in APA style.}
\item{group}{Indicate grouping variable to facet plot. Only works when
\code{APAstyle==TRUE}}
\item{...}{arguments to be passed to methods, such as graphical parameters
(see \code{\link{par}}).}
}
\description{
Function for plotting of \code{statcheck} objects. Reported p values are
plotted against recalculated p values, which allows the user to easily spot
if articles contain miscalculations of statistical results.
}
\details{
If APAstyle = FALSE, inconsistencies between the reported and the recalculated p value are indicated with an orange dot. Recalculations of the p value that render a previously non-significant result (p >= .05) as significant (p < .05), and vice versa, are considered decision errors, and are indicated with a red dot. Exactly reported p values (i.e. p = ..., as opposed to p < ... or p > ...) are indicated with a diamond.
}
\section{Acknowledgements}{
Many thanks to John Sakaluk who adapted the plot code to create graphs in
APA style.
}
\examples{
# First we need a statcheck object
# Here, we create one by running statcheck on some raw text
txt <- "This test is consistent t(28) = 0.2, p = .84, but this one is
inconsistent: F(2, 28) = 4.2, p = .01. This final test is even a
gross/decision inconsistency: z = 1.23, p = .03"
result <- statcheck(txt)
# We can then plot the statcheck object 'result' by simply calling plot() on
# "result". R will know what kind of plot to make, because "result" is of
# class "statcheck"
plot(result)
}
\seealso{
\code{\link{statcheck}}
}
statcheck/man/figures/ 0000755 0001762 0000144 00000000000 14362533741 014452 5 ustar ligges users statcheck/man/figures/overview_functions.pdf 0000644 0001762 0000144 00001011020 14362533741 021076 0 ustar ligges users