readxl/ 0000755 0001762 0000144 00000000000 14451634551 011541 5 ustar ligges users
readxl/NAMESPACE 0000644 0001762 0000144 00000000751 14402521714 012753 0 ustar ligges users
# Generated by roxygen2: do not edit by hand
export(anchored)
export(cell_cols)
export(cell_limits)
export(cell_rows)
export(excel_format)
export(excel_sheets)
export(format_from_ext)
export(format_from_signature)
export(read_excel)
export(read_xls)
export(read_xlsx)
export(readxl_example)
export(readxl_progress)
importFrom(cellranger,anchored)
importFrom(cellranger,cell_cols)
importFrom(cellranger,cell_limits)
importFrom(cellranger,cell_rows)
useDynLib(readxl, .registration = TRUE)
readxl/LICENSE.note 0000644 0001762 0000144 00000003033 13406366260 013507 0 ustar ligges users
# libxls
libxls -- A multiplatform, C/C++ library for parsing Excel(TM) files.
Copyright 2004 Komarov Valery
Copyright 2006 Christophe Leitienne
Copyright 2008-2017 David Hoerl
Copyright 2013 Bob Colbert
Copyright 2013-2018 Evan Miller
The included libxls code is licensed under the BSD 2-clause license:
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ''AS IS''
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
readxl/LICENSE 0000644 0001762 0000144 00000000054 14370571753 012551 0 ustar ligges users
YEAR: 2023
COPYRIGHT HOLDER: readxl authors
readxl/README.md 0000644 0001762 0000144 00000022763 14451326710 013025 0 ustar ligges users
# readxl
## Overview
The readxl package makes it easy to get data out of Excel and into R.
Compared to many of the existing packages (e.g. gdata, xlsx,
xlsReadWrite) readxl has no external dependencies, so it’s easy to
install and use on all operating systems. It is designed to work with
*tabular* data.
readxl supports both the legacy `.xls` format and the modern xml-based
`.xlsx` format. The [libxls](https://github.com/libxls/libxls) C library
is used to support `.xls`, which abstracts away many of the complexities
of the underlying binary format. To parse `.xlsx`, we use the
[RapidXML](https://rapidxml.sourceforge.net/) C++ library.
## Installation
The easiest way to install the latest released version from CRAN is to
install the whole tidyverse.
``` r
install.packages("tidyverse")
```
NOTE: you will still need to load readxl explicitly, because it is not a
core tidyverse package loaded via `library(tidyverse)`.
Alternatively, install just readxl from CRAN:
``` r
install.packages("readxl")
```
Or install the development version from GitHub:
``` r
#install.packages("pak")
pak::pak("tidyverse/readxl")
```
## Cheatsheet
You can see how to read data with readxl in the **data import
cheatsheet**, which also covers similar functionality in the related
packages readr and googlesheets4.
## Usage
``` r
library(readxl)
```
readxl includes several example files, which we use throughout the
documentation. Use the helper `readxl_example()` with no arguments to
list them or call it with an example filename to get the path.
``` r
readxl_example()
#> [1] "clippy.xls" "clippy.xlsx" "datasets.xls" "datasets.xlsx"
#> [5] "deaths.xls" "deaths.xlsx" "geometry.xls" "geometry.xlsx"
#> [9] "type-me.xls" "type-me.xlsx"
readxl_example("clippy.xls")
#> [1] "/private/tmp/RtmpM1GkLC/temp_libpatha8e46f7f62bf/readxl/extdata/clippy.xls"
```
`read_excel()` reads both xls and xlsx files and detects the format from
the extension.
``` r
xlsx_example <- readxl_example("datasets.xlsx")
read_excel(xlsx_example)
#> # A tibble: 150 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> # ℹ 147 more rows
xls_example <- readxl_example("datasets.xls")
read_excel(xls_example)
#> # A tibble: 150 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> # ℹ 147 more rows
```
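The reference documentation for `excel_format()` adds that, when the extension is absent or not recognized, the format can be guessed from the file signature instead. The following is a minimal sketch of that fallback, assuming a temporary copy of an example file with no extension (`no_ext`, `iris_guess`, and `iris_xlsx` are illustrative names only):
``` r
library(readxl)

xlsx_path <- readxl_example("datasets.xlsx")

# Copy the example to a temporary path that has no file extension.
no_ext <- tempfile()
file.copy(xlsx_path, no_ext)

# The extension alone is not informative here ...
format_from_ext(no_ext)

# ... but the file signature ("magic number") still identifies the format.
format_from_signature(no_ext)

# excel_format() combines both checks; read_excel() relies on it.
excel_format(no_ext)
iris_guess <- read_excel(no_ext)

# If you already know the format, call the format-specific reader directly.
iris_xlsx <- read_xlsx(no_ext)
```
Calling `read_xls()` or `read_xlsx()` directly is the documented way to prevent any guessing when you know the format better than the file name suggests.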
List the sheet names with `excel_sheets()`.
``` r
excel_sheets(xlsx_example)
#> [1] "iris" "mtcars" "chickwts" "quakes"
```
Specify a worksheet by name or number.
``` r
read_excel(xlsx_example, sheet = "chickwts")
#> # A tibble: 71 × 2
#> weight feed
#>
#> 1 179 horsebean
#> 2 160 horsebean
#> 3 136 horsebean
#> # ℹ 68 more rows
read_excel(xls_example, sheet = 4)
#> # A tibble: 1,000 × 5
#> lat long depth mag stations
#>
#> 1 -20.4 182. 562 4.8 41
#> 2 -20.6 181. 650 4.2 15
#> 3 -26 184. 42 5.4 43
#> # ℹ 997 more rows
```
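To read every worksheet in a workbook, one option is to iterate over the names returned by `excel_sheets()`. This is a sketch in base R rather than the only way to do it; `all_sheets` is an illustrative name:
``` r
library(readxl)

path <- readxl_example("datasets.xlsx")
sheets <- excel_sheets(path)

# Read each sheet by name and keep the results in a named list.
all_sheets <- lapply(sheets, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheets

# Number of rows read from each sheet.
vapply(all_sheets, nrow, integer(1))
```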
There are various ways to control which cells are read. You can even
specify the sheet here, if providing an Excel-style cell range.
``` r
read_excel(xlsx_example, n_max = 3)
#> # A tibble: 3 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
read_excel(xlsx_example, range = "C1:E4")
#> # A tibble: 3 × 3
#> Petal.Length Petal.Width Species
#>
#> 1 1.4 0.2 setosa
#> 2 1.4 0.2 setosa
#> 3 1.3 0.2 setosa
read_excel(xlsx_example, range = cell_rows(1:4))
#> # A tibble: 3 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
read_excel(xlsx_example, range = cell_cols("B:D"))
#> # A tibble: 150 × 3
#> Sepal.Width Petal.Length Petal.Width
#>
#> 1 3.5 1.4 0.2
#> 2 3 1.4 0.2
#> 3 3.2 1.3 0.2
#> # ℹ 147 more rows
read_excel(xlsx_example, range = "mtcars!B1:D5")
#> # A tibble: 4 × 3
#> cyl disp hp
#>
#> 1 6 160 110
#> 2 6 160 110
#> 3 4 108 93
#> # ℹ 1 more row
```
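The `read_excel()` documentation states that `range` takes precedence over `skip`, `n_max`, and `sheet`. The sketch below illustrates that with `skip`, and also shows `anchored()`, which readxl re-exports from cellranger, as another way to describe a rectangle. The object names are illustrative:
``` r
library(readxl)

path <- readxl_example("datasets.xlsx")

# `skip` is ignored once `range` is supplied, so both calls read C1:E4.
by_range <- read_excel(path, range = "C1:E4")
with_skip <- read_excel(path, range = "C1:E4", skip = 10)
dim(by_range)
dim(with_skip)

# anchored() describes a rectangle by its upper-left cell and its dimensions:
# here 4 rows (1 header row + 3 data rows) by 2 columns, starting at A1.
corner <- read_excel(path, range = anchored("A1", dim = c(4, 2)))
dim(corner)
```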
If `NA`s are represented by something other than blank cells, set the
`na` argument.
``` r
read_excel(xlsx_example, na = "setosa")
#> # A tibble: 150 × 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>
#> 1 5.1 3.5 1.4 0.2 <NA>
#> 2 4.9 3 1.4 0.2 <NA>
#> 3 4.7 3.2 1.3 0.2 <NA>
#> # ℹ 147 more rows
```
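Because `na` is documented as a character vector, several strings can be treated as missing at once. A small sketch, using two species names purely as stand-ins for real missing-value markers:
``` r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Treat both "setosa" and "virginica" as missing values.
df_na <- read_excel(path, na = c("setosa", "virginica"))

# Count the cells in Species that were converted to NA.
sum(is.na(df_na$Species))
```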
If you are new to the tidyverse conventions for data import, you may
want to consult the [data import
chapter](https://r4ds.had.co.nz/data-import.html) in R for Data Science.
readxl will become increasingly consistent with other packages, such as
[readr](https://readr.tidyverse.org/).
## Articles
Broad topics are explained in [these
articles](https://readxl.tidyverse.org/articles/index.html):
- [Cell and Column
Types](https://readxl.tidyverse.org/articles/cell-and-column-types.html)
- [Sheet
Geometry](https://readxl.tidyverse.org/articles/sheet-geometry.html):
how to specify which cells to read
- [readxl
Workflows](https://readxl.tidyverse.org/articles/articles/readxl-workflows.html):
Iterating over multiple tabs or worksheets, stashing a csv snapshot
We also have some focused articles that address specific aggravations
presented by the world’s spreadsheets:
- [Column
Names](https://readxl.tidyverse.org/articles/articles/column-names.html)
- [Multiple Header
Rows](https://readxl.tidyverse.org/articles/articles/multiple-header-rows.html)
## Features
- No external dependency on, e.g., Java or Perl.
- Re-encodes non-ASCII characters to UTF-8.
- Loads datetimes into POSIXct columns. Both Windows (1900) and
Mac (1904) date specifications are processed correctly.
- Discovers the minimal data rectangle and returns that, by default.
User can exert more control with `range`, `skip`, and `n_max`.
- Column names and types are determined from the data in the sheet, by
default. User can also supply via `col_names` and `col_types` and
control name repair via `.name_repair`.
- Returns a
[tibble](https://tibble.tidyverse.org/reference/tibble.html), i.e. a
  data frame with an additional `tbl_df` class. Among other things, this
  provides nicer printing.
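As a sketch of the column controls listed above, the call below skips the header row in the sheet and supplies both names and types explicitly. The replacement column names are made up for illustration:
``` r
library(readxl)

path <- readxl_example("datasets.xlsx")

df <- read_excel(
  path,
  skip = 1, # skip the sheet's own header row
  col_names = c(
    "sepal_length", "sepal_width", "petal_length", "petal_width", "species"
  ),
  col_types = c("numeric", "numeric", "numeric", "numeric", "text")
)

names(df)
vapply(df, class, character(1))
```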
## Other relevant packages
Here are some other packages with functionality that is complementary to
readxl and that also avoid a Java dependency.
**Writing Excel files**: The example files `datasets.xlsx` and
`datasets.xls` were created with the help of
[openxlsx](https://CRAN.R-project.org/package=openxlsx) (and Excel).
openxlsx provides “a high level interface to writing, styling and
editing worksheets”.
``` r
l <- list(iris = iris, mtcars = mtcars, chickwts = chickwts, quakes = quakes)
openxlsx::write.xlsx(l, file = "inst/extdata/datasets.xlsx")
```
[writexl](https://cran.r-project.org/package=writexl) is a new option in
this space, first released on CRAN in August 2017. It’s a portable and
lightweight way to export a data frame to xlsx, based on
[libxlsxwriter](https://github.com/jmcnamara/libxlsxwriter). It is much
more minimalistic than openxlsx, but on simple examples, appears to be
about twice as fast and to write smaller files.
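A round trip through writexl and readxl might look like the sketch below. It assumes writexl is installed and does nothing otherwise; `out` and `round_trip` are illustrative names:
``` r
library(readxl)

if (requireNamespace("writexl", quietly = TRUE)) {
  # Write a data frame to a temporary xlsx file ...
  out <- tempfile(fileext = ".xlsx")
  writexl::write_xlsx(iris, path = out)

  # ... and read it back in with readxl.
  round_trip <- read_excel(out)
  dim(round_trip)
}
```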
**Non-tabular data and formatting**:
[tidyxl](https://cran.r-project.org/package=tidyxl) is focused on
importing awkward and non-tabular data from Excel. It also “exposes cell
content, position and formatting in a tidy structure for further
manipulation”.
readxl/man/ 0000755 0001762 0000144 00000000000 14370571753 012320 5 ustar ligges users
readxl/man/readxl_example.Rd 0000644 0001762 0000144 00000000762 13070523305 015570 0 ustar ligges users
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/example.R
\name{readxl_example}
\alias{readxl_example}
\title{Get path to readxl example}
\usage{
readxl_example(path = NULL)
}
\arguments{
\item{path}{Name of file. If \code{NULL}, the example files will be listed.}
}
\description{
readxl comes bundled with some example files in its \code{inst/extdata}
directory. This function makes them easy to access.
}
\examples{
readxl_example()
readxl_example("datasets.xlsx")
}
readxl/man/excel_format.Rd 0000644 0001762 0000144 00000003070 14216663505 015253 0 ustar ligges users
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/excel-format.R
\name{excel_format}
\alias{excel_format}
\alias{format_from_ext}
\alias{format_from_signature}
\title{Determine file format}
\usage{
excel_format(path, guess = TRUE)
format_from_ext(path)
format_from_signature(path)
}
\arguments{
\item{path}{Path to the xls/xlsx file.}
\item{guess}{Logical. If the file extension is absent or not recognized, this
controls whether we attempt to guess format based on the file signature or
"magic number".}
}
\value{
Character vector with values \code{"xlsx"}, \code{"xls"}, or \code{NA}.
}
\description{
Determine if files are xls or xlsx (or from the xlsx family).
\code{excel_format(guess = TRUE)} is used by \code{read_excel()} to
determine format. It draws on logic from two lower level functions:
\itemize{
\item \code{format_from_ext()} attempts to determine format from the file extension.
\item \code{format_from_signature()} consults the \href{https://en.wikipedia.org/wiki/List_of_file_signatures}{file signature} or "magic
number".
}
File extensions associated with xlsx vs. xls:
\itemize{
\item xlsx: \code{.xlsx}, \code{.xlsm}, \code{.xltx}, \code{.xltm}
\item xls: \code{.xls}
}
File signatures (in hexadecimal) for xlsx vs. xls:
\itemize{
\item xlsx: First 4 bytes are \verb{50 4B 03 04}
\item xls: First 8 bytes are \verb{D0 CF 11 E0 A1 B1 1A E1}
}
}
\examples{
files <- c(
"a.xlsx",
"b.xls",
"c.png",
file.path(R.home("doc"), "html", "logo.jpg"),
readxl_example("clippy.xlsx"),
readxl_example("deaths.xls")
)
excel_format(files)
}
readxl/man/read_excel.Rd 0000644 0001762 0000144 00000013606 14216663505 014704 0 ustar ligges users
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_excel.R
\name{read_excel}
\alias{read_excel}
\alias{read_xls}
\alias{read_xlsx}
\title{Read xls and xlsx files}
\usage{
read_excel(
path,
sheet = NULL,
range = NULL,
col_names = TRUE,
col_types = NULL,
na = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
progress = readxl_progress(),
.name_repair = "unique"
)
read_xls(
path,
sheet = NULL,
range = NULL,
col_names = TRUE,
col_types = NULL,
na = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
progress = readxl_progress(),
.name_repair = "unique"
)
read_xlsx(
path,
sheet = NULL,
range = NULL,
col_names = TRUE,
col_types = NULL,
na = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
progress = readxl_progress(),
.name_repair = "unique"
)
}
\arguments{
\item{path}{Path to the xls/xlsx file.}
\item{sheet}{Sheet to read. Either a string (the name of a sheet), or an
integer (the position of the sheet). Ignored if the sheet is specified via
\code{range}. If neither argument specifies the sheet, defaults to the first
sheet.}
\item{range}{A cell range to read from, as described in \link{cell-specification}.
Includes typical Excel ranges like "B3:D87", possibly including the sheet
name like "Budget!B2:G14", and more. Interpreted strictly, even if the
range forces the inclusion of leading or trailing empty rows or columns.
Takes precedence over \code{skip}, \code{n_max} and \code{sheet}.}
\item{col_names}{\code{TRUE} to use the first row as column names, \code{FALSE} to get
default names, or a character vector giving a name for each column. If user
provides \code{col_types} as a vector, \code{col_names} can have one entry per
column, i.e. have the same length as \code{col_types}, or one entry per
unskipped column.}
\item{col_types}{Either \code{NULL} to guess all from the spreadsheet or a
character vector containing one entry per column from these options:
"skip", "guess", "logical", "numeric", "date", "text" or "list". If exactly
one \code{col_type} is specified, it will be recycled. The content of a cell in
a skipped column is never read and that column will not appear in the data
frame output. A list cell loads a column as a list of length 1 vectors,
which are typed using the type guessing logic from \code{col_types = NULL}, but
on a cell-by-cell basis.}
\item{na}{Character vector of strings to interpret as missing values. By
default, readxl treats blank cells as missing data.}
\item{trim_ws}{Should leading and trailing whitespace be trimmed?}
\item{skip}{Minimum number of rows to skip before reading anything, be it
column names or data. Leading empty rows are automatically skipped, so this
is a lower bound. Ignored if \code{range} is given.}
\item{n_max}{Maximum number of data rows to read. Trailing empty rows are
automatically skipped, so this is an upper bound on the number of rows in
the returned tibble. Ignored if \code{range} is given.}
\item{guess_max}{Maximum number of data rows to use for guessing column
types.}
\item{progress}{Display a progress spinner? By default, the spinner appears
only in an interactive session, outside the context of knitting a document,
and when the call is likely to run for several seconds or more. See
\code{\link[=readxl_progress]{readxl_progress()}} for more details.}
\item{.name_repair}{Handling of column names. Passed along to
\code{\link[tibble:as_tibble]{tibble::as_tibble()}}. readxl's default is \code{.name_repair = "unique"}, which
ensures column names are not empty and are unique.}
}
\value{
A \link[tibble:tibble-package]{tibble}
}
\description{
Read xls and xlsx files
\code{read_excel()} calls \code{\link[=excel_format]{excel_format()}} to determine if \code{path} is xls or xlsx,
based on the file extension and the file itself, in that order. Use
\code{read_xls()} and \code{read_xlsx()} directly if you know better and want to
prevent such guessing.
}
\examples{
datasets <- readxl_example("datasets.xlsx")
read_excel(datasets)
# Specify sheet either by position or by name
read_excel(datasets, 2)
read_excel(datasets, "mtcars")
# Skip rows and use default column names
read_excel(datasets, skip = 148, col_names = FALSE)
# Recycle a single column type
read_excel(datasets, col_types = "text")
# Specify some col_types and guess others
read_excel(datasets, col_types = c("text", "guess", "numeric", "guess", "guess"))
# Accommodate a column with disparate types via col_types = "list"
df <- read_excel(readxl_example("clippy.xlsx"), col_types = c("text", "list"))
df
df$value
sapply(df$value, class)
# Limit the number of data rows read
read_excel(datasets, n_max = 3)
# Read from an Excel range using A1 or R1C1 notation
read_excel(datasets, range = "C1:E7")
read_excel(datasets, range = "R1C2:R2C5")
# Specify the sheet as part of the range
read_excel(datasets, range = "mtcars!B1:D5")
# Read only specific rows or columns
read_excel(datasets, range = cell_rows(102:151), col_names = FALSE)
read_excel(datasets, range = cell_cols("B:D"))
# Get a preview of column names
names(read_excel(readxl_example("datasets.xlsx"), n_max = 0))
# exploit full .name_repair flexibility from tibble
# "universal" names are unique and syntactic
read_excel(
readxl_example("deaths.xlsx"),
range = "arts!A5:F15",
.name_repair = "universal"
)
# specify name repair as a built-in function
read_excel(readxl_example("clippy.xlsx"), .name_repair = toupper)
# specify name repair as a custom function
my_custom_name_repair <- function(nms) tolower(gsub("[.]", "_", nms))
read_excel(
readxl_example("datasets.xlsx"),
.name_repair = my_custom_name_repair
)
# specify name repair as an anonymous function
read_excel(
readxl_example("datasets.xlsx"),
sheet = "chickwts",
.name_repair = ~ substr(.x, start = 1, stop = 3)
)
}
\seealso{
\link{cell-specification} for more details on targeting cells with the
\code{range} argument
}
readxl/man/figures/ 0000755 0001762 0000144 00000000000 14402521667 013757 5 ustar ligges users
readxl/man/figures/logo.png 0000644 0001762 0000144 00000056010 14402522031 015411 0 ustar ligges users