dbplyr/inst/doc/sql-translation.R

## ----setup, include = FALSE----------------------------------------------
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)

## ---- message = FALSE-----------------------------------------------------
library(dbplyr)
library(dplyr)

## ------------------------------------------------------------------------
# In SQLite variable names are escaped by double quotes:
translate_sql(x)
# And strings are escaped by single quotes
translate_sql("x")

## ------------------------------------------------------------------------
translate_sql(x == 1 && (y < 2 || z > 3))
translate_sql(x ^ 2 < 10)
translate_sql(x %% 2 == 10)

## ------------------------------------------------------------------------
translate_sql(substr(x, 5, 10))
translate_sql(log(x, 10))

## ------------------------------------------------------------------------
translate_sql(1)
translate_sql(1L)

## ------------------------------------------------------------------------
translate_sql(if (x > 5) "big" else "small")

## ---- error = TRUE--------------------------------------------------------
translate_sql(mean(x, na.rm = TRUE))
translate_sql(mean(x, trim = 0.1))

## ------------------------------------------------------------------------
translate_sql(glob(x, y))
translate_sql(x %like% "ab%")

## ------------------------------------------------------------------------
knitr::include_graphics("windows.png", dpi = 200)

## ------------------------------------------------------------------------
translate_sql(mean(G))
translate_sql(rank(G))
translate_sql(ntile(G, 2))
translate_sql(lag(G))

## ------------------------------------------------------------------------
translate_sql(cummean(G), vars_order = "year")
translate_sql(rank(), vars_group = "ID")

## ---- eval = FALSE--------------------------------------------------------
# mutate(players,
#   min_rank(yearID),
#   order_by(yearID, cumsum(G)),
#   lead(G, order_by = yearID)
# )

## ------------------------------------------------------------------------
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
flights <- copy_to(con, nycflights13::flights)
airports <- copy_to(con, nycflights13::airports)

## ------------------------------------------------------------------------
flights %>%
  select(contains("delay")) %>%
  show_query()

flights %>%
  select(distance, air_time) %>%
  mutate(speed = distance / (air_time / 60)) %>%
  show_query()

## ------------------------------------------------------------------------
flights %>%
  filter(month == 1, day == 1) %>%
  show_query()

## ------------------------------------------------------------------------
flights %>%
  arrange(carrier, desc(arr_delay)) %>%
  show_query()

## ------------------------------------------------------------------------
flights %>%
  group_by(month, day) %>%
  summarise(delay = mean(dep_delay)) %>%
  show_query()

dbplyr/inst/doc/dbplyr.Rmd

---
title: "Introduction to dbplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to dbplyr}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 6L, tibble.print_max = 6L, digits = 3)
```

As well as working with local in-memory data stored in data frames, dplyr also works with remote on-disk data stored in databases. This is particularly useful in two scenarios:

* Your data is already in a database.

* You have so much data that it does not all fit into memory simultaneously and you need to use some external storage engine.

(If your data fits in memory there is no advantage to putting it in a database: it will only be slower and more frustrating.)

This vignette focusses on the first scenario because it's the most common. If you're using R to do data analysis inside a company, most of the data you need probably already lives in a database (it's just a matter of figuring out which one!). However, you will learn how to load data into a local database in order to demonstrate dplyr's database tools. At the end, I'll also give you a few pointers if you do need to set up your own database.

## Getting started

To use databases with dplyr you need to first install dbplyr:

```{r, eval = FALSE}
install.packages("dbplyr")
```

You'll also need to install a DBI backend package. The DBI package provides a common interface that allows dplyr to work with many different databases using the same code. DBI is automatically installed with dbplyr, but you need to install a specific backend for the database that you want to connect to.

Five commonly used backends are:

* [RMySQL](https://github.com/rstats-db/RMySQL#readme) connects to MySQL and MariaDB.

* [RPostgreSQL](https://CRAN.R-project.org/package=RPostgreSQL) connects to Postgres and Redshift.

* [RSQLite](https://github.com/rstats-db/RSQLite) embeds a SQLite database.

* [odbc](https://github.com/rstats-db/odbc#odbc) connects to many commercial databases via the open database connectivity protocol.

* [bigrquery](https://github.com/rstats-db/bigrquery) connects to Google's BigQuery.

If the database you need to connect to is not listed here, you'll need to do some investigation (i.e. googling) yourself.

In this vignette, we're going to use the RSQLite backend, which is automatically installed when you install dbplyr. SQLite is a great way to get started with databases because it's completely embedded inside an R package. Unlike most other systems, you don't need to set up a separate database server. SQLite is great for demos, but it is also surprisingly powerful, and with a little practice you can use it to easily work with many gigabytes of data.

## Connecting to the database

To work with a database in dplyr, you must first connect to it, using `DBI::dbConnect()`. We're not going to go into the details of the DBI package here, but it's the foundation upon which dbplyr is built. You'll need to learn more about it if you need to do things to the database that are beyond the scope of dplyr.

```{r setup, message = FALSE}
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
```

The arguments to `DBI::dbConnect()` vary from database to database, but the first argument is always the database backend. It's `RSQLite::SQLite()` for RSQLite, `RMySQL::MySQL()` for RMySQL, `RPostgreSQL::PostgreSQL()` for RPostgreSQL, `odbc::odbc()` for odbc, and `bigrquery::bigquery()` for BigQuery. SQLite only needs one other argument: the path to the database. Here we use the special string `":memory:"` which causes SQLite to make a temporary in-memory database.

Most existing databases don't live in a file, but instead live on another server. That means that, in real life, your code will look more like this:

```{r, eval = FALSE}
con <- DBI::dbConnect(RMySQL::MySQL(),
  host = "database.rstudio.com",
  user = "hadley",
  password = rstudioapi::askForPassword("Database password")
)
```

(If you're not using RStudio, you'll need some other way to securely retrieve your password. You should never record it in your analysis scripts or type it into the console. [Securing Credentials](https://db.rstudio.com/best-practices/managing-credentials) provides some best practices.)

Our temporary database has no data in it, so we'll start by copying over `nycflights13::flights` using the convenient `copy_to()` function. This is a quick and dirty way of getting data into a database and is useful primarily for demos and other small jobs.

```{r}
copy_to(con, nycflights13::flights, "flights",
  temporary = FALSE,
  indexes = list(
    c("year", "month", "day"),
    "carrier",
    "tailnum",
    "dest"
  )
)
```

As you can see, the `copy_to()` operation has an additional argument that allows you to supply indexes for the table. Here we set up indexes that will allow us to quickly process the data by day, carrier, plane, and destination. Creating the right indices is key to good database performance, but is unfortunately beyond the scope of this article.

Now that we've copied the data, we can use `tbl()` to take a reference to it:

```{r}
flights_db <- tbl(con, "flights")
```

When you print it out, you'll notice that it mostly looks like a regular tibble:

```{r}
flights_db
```

The main difference is that you can see that it's a remote source in a SQLite database.

## Generating queries

To interact with a database you usually use SQL, the Structured Query Language. SQL is over 40 years old, and is used by pretty much every database in existence. The goal of dbplyr is to automatically generate SQL for you so that you're not forced to use it. However, SQL is a very large language and dbplyr doesn't do everything. It focusses on `SELECT` statements, the SQL you write most often as an analyst.

Most of the time you don't need to know anything about SQL, and you can continue to use the dplyr verbs that you're already familiar with:

```{r}
flights_db %>% select(year:day, dep_delay, arr_delay)

flights_db %>% filter(dep_delay > 240)

flights_db %>%
  group_by(dest) %>%
  summarise(delay = mean(dep_time))
```

However, in the long run, I highly recommend you at least learn the basics of SQL. It's a valuable skill for any data scientist, and it will help you debug things if you run into problems with dplyr's automatic translation. If you're completely new to SQL you might start with this [Codecademy tutorial](https://www.codecademy.com/learn/learn-sql). If you have some familiarity with SQL and you'd like to learn more, I found [how indexes work in SQLite](http://www.sqlite.org/queryplanner.html) and [10 easy steps to a complete understanding of SQL](http://blog.jooq.org/2016/03/17/10-easy-steps-to-a-complete-understanding-of-sql) to be particularly helpful.

The most important difference between ordinary data frames and remote database queries is that your R code is translated into SQL and executed in the database on the remote server, not in R on your local machine. When working with databases, dplyr tries to be as lazy as possible:

* It never pulls data into R unless you explicitly ask for it.

* It delays doing any work until the last possible moment: it collects together everything you want to do and then sends it to the database in one step.

For example, take the following code:

```{r}
tailnum_delay_db <- flights_db %>%
  group_by(tailnum) %>%
  summarise(
    delay = mean(arr_delay),
    n = n()
  ) %>%
  arrange(desc(delay)) %>%
  filter(n > 100)
```

Surprisingly, this sequence of operations never touches the database. It's not until you ask for the data (e.g. by printing `tailnum_delay_db`) that dplyr generates the SQL and requests the results from the database. Even then it tries to do as little work as possible and only pulls down a few rows.

```{r}
tailnum_delay_db
```

Behind the scenes, dplyr is translating your R code into SQL. You can see the SQL it's generating with `show_query()`:

```{r}
tailnum_delay_db %>% show_query()
```

If you're familiar with SQL, this probably isn't exactly what you'd write by hand, but it does the job. You can learn more about the SQL translation in `vignette("sql-translation")`.

Typically, you'll iterate a few times before you figure out what data you need from the database. Once you've figured it out, use `collect()` to pull all the data down into a local tibble:

```{r}
tailnum_delay <- tailnum_delay_db %>% collect()
tailnum_delay
```

`collect()` requires that the database do some work, so it may take a long time to complete. Otherwise, dplyr tries to prevent you from accidentally performing expensive query operations:

* Because there's generally no way to determine how many rows a query will return unless you actually run it, `nrow()` is always `NA`.

* Because you can't find the last few rows without executing the whole query, you can't use `tail()`.

```{r, error = TRUE}
nrow(tailnum_delay_db)

tail(tailnum_delay_db)
```

You can also ask the database how it plans to execute the query with `explain()`. The output is database dependent, and can be esoteric, but learning a bit about it can be very useful because it helps you understand if the database can execute the query efficiently, or if you need to create new indices.

## Creating your own database

If you don't already have a database, here's some advice from my experiences setting up and running all of them. SQLite is by far the easiest to get started with, but the lack of window functions makes it limited for data analysis. PostgreSQL is not too much harder to use and has a wide range of built-in functions. In my opinion, you shouldn't bother with MySQL/MariaDB: it's a pain to set up, the documentation is subpar, and it's less featureful than Postgres. Google BigQuery might be a good fit if you have very large data, or if you're willing to pay (a small amount of) money to someone who'll look after your database.

All of these databases follow a client-server model - one computer that connects to the database and another that runs it (the two may be one and the same, but usually aren't). Getting one of these databases up and running is beyond the scope of this article, but there are plenty of tutorials available on the web.

### MySQL/MariaDB

In terms of functionality, MySQL lies somewhere between SQLite and PostgreSQL. It provides a wider range of [built-in functions](http://dev.mysql.com/doc/refman/5.0/en/functions.html), but it does not support window functions (so you can't do grouped mutates and filters).

### PostgreSQL

PostgreSQL is a considerably more powerful database than SQLite. It has:

* a much wider range of [built-in functions](http://www.postgresql.org/docs/current/static/functions.html), and

* support for [window functions](http://www.postgresql.org/docs/current/static/tutorial-window.html), which allow grouped subsets and mutates to work.

### BigQuery

BigQuery is a hosted database server provided by Google. To connect, you need to provide your `project`, `dataset` and optionally a project for `billing` (if billing for `project` isn't enabled).

It provides a similar set of functions to Postgres and is designed specifically for analytic workflows. Because it's a hosted solution, there's no setup involved, but if you have a lot of data, getting it to Google can be an ordeal (especially because upload support from R is not great currently). (If you have lots of data, you can [ship hard drives]()!)

dbplyr/inst/doc/dbplyr.R

## ---- include = FALSE-----------------------------------------------------
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 6L, tibble.print_max = 6L, digits = 3)

## ---- eval = FALSE--------------------------------------------------------
# install.packages("dbplyr")

## ----setup, message = FALSE-----------------------------------------------
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

## ---- eval = FALSE--------------------------------------------------------
# con <- DBI::dbConnect(RMySQL::MySQL(),
#   host = "database.rstudio.com",
#   user = "hadley",
#   password = rstudioapi::askForPassword("Database password")
# )

## ------------------------------------------------------------------------
copy_to(con, nycflights13::flights, "flights",
  temporary = FALSE,
  indexes = list(
    c("year", "month", "day"),
    "carrier",
    "tailnum",
    "dest"
  )
)

## ------------------------------------------------------------------------
flights_db <- tbl(con, "flights")

## ------------------------------------------------------------------------
flights_db

## ------------------------------------------------------------------------
flights_db %>% select(year:day, dep_delay, arr_delay)

flights_db %>% filter(dep_delay > 240)

flights_db %>%
  group_by(dest) %>%
  summarise(delay = mean(dep_time))

## ------------------------------------------------------------------------
tailnum_delay_db <- flights_db %>%
  group_by(tailnum) %>%
  summarise(
    delay = mean(arr_delay),
    n = n()
  ) %>%
  arrange(desc(delay)) %>%
  filter(n > 100)

## ------------------------------------------------------------------------
tailnum_delay_db

## ------------------------------------------------------------------------
tailnum_delay_db %>% show_query()

## ------------------------------------------------------------------------
tailnum_delay <- tailnum_delay_db %>% collect()
tailnum_delay

## ---- error = TRUE--------------------------------------------------------
nrow(tailnum_delay_db)

tail(tailnum_delay_db)

dbplyr/inst/doc/sql-translation.Rmd

---
title: "SQL translation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SQL translation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

There are two components to dplyr's SQL translation system:

* translation of vector expressions like `x * y + 10`

* translation of whole verbs like `mutate()` or `summarise()`

To explore them, you'll need to load both dbplyr and dplyr:

```{r, message = FALSE}
library(dbplyr)
library(dplyr)
```

## Vectors

Most filtering, mutating or summarising operations only perform simple mathematical operations. These operations are very similar between R and SQL, so they're easy to translate. To see what's happening yourself, you can use `translate_sql()`. The basic techniques that underlie the implementation of `translate_sql()` are described in ["Advanced R"](http://adv-r.had.co.nz/dsl.html). `translate_sql()` is built on top of R's parsing engine and has been carefully designed to generate correct SQL. It also protects you against SQL injection attacks by correctly escaping the strings and variable names needed by the database that you're connecting to.

The following examples work through some of the basic differences between R and SQL.

* `"` and `'` mean different things

    ```{r}
    # In SQLite variable names are escaped by double quotes:
    translate_sql(x)
    # And strings are escaped by single quotes
    translate_sql("x")
    ```

* Many functions have slightly different names

    ```{r}
    translate_sql(x == 1 && (y < 2 || z > 3))
    translate_sql(x ^ 2 < 10)
    translate_sql(x %% 2 == 10)
    ```

* And some functions have different argument orders:

    ```{r}
    translate_sql(substr(x, 5, 10))
    translate_sql(log(x, 10))
    ```

* R and SQL have different defaults for integers and reals. In R, 1 is a real, and 1L is an integer. In SQL, 1 is an integer, and 1.0 is a real

    ```{r}
    translate_sql(1)
    translate_sql(1L)
    ```

* If statements are translated into a case statement:

    ```{r}
    translate_sql(if (x > 5) "big" else "small")
    ```

### Known functions

dplyr knows how to convert the following R functions to SQL:

* basic math operators: `+`, `-`, `*`, `/`, `%%`, `^`
* math functions: `abs`, `acos`, `acosh`, `asin`, `asinh`, `atan`, `atan2`, `atanh`, `ceiling`, `cos`, `cosh`, `cot`, `coth`, `exp`, `floor`, `log`, `log10`, `round`, `sign`, `sin`, `sinh`, `sqrt`, `tan`, `tanh`
* logical comparisons: `<`, `<=`, `!=`, `>=`, `>`, `==`, `%in%`
* boolean operations: `&`, `&&`, `|`, `||`, `!`, `xor`
* basic aggregations: `mean`, `sum`, `min`, `max`, `sd`, `var`
* string functions: `tolower`, `toupper`, `trimws`, `nchar`, `substr`
* coerce types: `as.numeric`, `as.integer`, `as.character`

Perfect translation is not possible because databases don't have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean rather than what is done. In fact, even for functions that exist both in databases and R, you shouldn't expect results to be identical; database programmers have different priorities than R core programmers. For example, in R, in order to get a higher level of numerical accuracy, `mean()` loops through the data twice. R's `mean()` also provides a `trim` option for computing trimmed means; this is something that databases do not provide. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. This means the essence of simple calls like `mean(x)` will be translated accurately, but more complicated calls like `mean(x, trim = 0.5, na.rm = TRUE)` will raise an error:

```{r, error = TRUE}
translate_sql(mean(x, na.rm = TRUE))
translate_sql(mean(x, trim = 0.1))
```

`translate_sql()` takes an optional `con` parameter. If not supplied, this causes dplyr to generate (approximately) SQL-92 compliant SQL. If supplied, dplyr uses `sql_translate_env()` to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see `vignette("new-backend")` for more details.

### Unknown functions

Any function that dplyr doesn't know how to convert is left as is. This means that database functions that are not covered by dplyr can be used directly via `translate_sql()`. Here are a couple of examples that will work with [SQLite](http://www.sqlite.org/lang_corefunc.html):

```{r}
translate_sql(glob(x, y))
translate_sql(x %like% "ab%")
```

### Window functions

Things get a little trickier with window functions, because SQL's window functions are considerably more expressive than the specific variants provided by base R or dplyr. They have the form `[expression] OVER ([partition clause] [order clause] [frame_clause])`:

* The __expression__ is a combination of variable names and window functions. Support for window functions varies from database to database, but most support the ranking functions, `lead`, `lag`, `nth`, `first`, `last`, `count`, `min`, `max`, `sum`, `avg` and `stddev`.

* The __partition clause__ specifies how the window function is broken down over groups. It plays an analogous role to `GROUP BY` for aggregate functions, and `group_by()` in dplyr. It is possible for different window functions to be partitioned into different groups, but not all databases support it, and neither does dplyr.

* The __order clause__ controls the ordering (when it makes a difference). This is important for the ranking functions since it specifies which variables to rank by, but it's also needed for cumulative functions and lead. Whenever you're thinking about before and after in SQL, you must always tell it which variable defines the order. If the order clause is missing when needed, some databases fail with an error message while others return non-deterministic results.

* The __frame clause__ defines which rows, or __frame__, are passed to the window function, describing which rows (relative to the current row) should be included. The frame clause provides two offsets which determine the start and end of the frame. There are three special values: -Inf means to include all preceding rows (in SQL, "unbounded preceding"), 0 means the current row ("current row"), and Inf means all following rows ("unbounded following"). The complete set of options is comprehensive, but fairly confusing, and is summarised visually below.

    ```{r}
    knitr::include_graphics("windows.png", dpi = 200)
    ```

    Of the many possible specifications, there are only three that are commonly used. They select between aggregation variants:

    * Recycled: `BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

    * Cumulative: `BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`

    * Rolling: `BETWEEN 2 PRECEDING AND 2 FOLLOWING`

    dplyr generates the frame clause based on whether you're using a recycled aggregate or a cumulative aggregate.

To see how individual window functions are translated to SQL, we can again use `translate_sql()`:

```{r}
translate_sql(mean(G))
translate_sql(rank(G))
translate_sql(ntile(G, 2))
translate_sql(lag(G))
```

If the tbl has been grouped or arranged previously in the pipeline, then dplyr will use that information to set the "partition by" and "order by" clauses. For interactive exploration, you can achieve the same effect by setting the `vars_group` and `vars_order` arguments to `translate_sql()`:

```{r}
translate_sql(cummean(G), vars_order = "year")
translate_sql(rank(), vars_group = "ID")
```

There are some challenges when translating window functions between R and SQL, because dplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you're using:

* For ranking functions, the ordering variable is the first argument: `rank(x)`, `ntile(y, 2)`. If omitted or `NULL`, will use the default ordering associated with the tbl (as set by `arrange()`).

* Accumulating aggregates only take a single argument (the vector to aggregate). To control ordering, use `order_by()`.

* Aggregates implemented in dplyr (`lead`, `lag`, `nth_value`, `first_value`, `last_value`) have an `order_by` argument. Supply it to override the default ordering.

The three options are illustrated in the snippet below:

```{r, eval = FALSE}
mutate(players,
  min_rank(yearID),
  order_by(yearID, cumsum(G)),
  lead(G, order_by = yearID)
)
```

Currently there is no way to order by multiple variables, except by setting the default ordering with `arrange()`. This will be added in a future release.

## Whole tables

All dplyr verbs generate a `SELECT` statement. To demonstrate, we'll make a temporary database with a couple of tables:

```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
flights <- copy_to(con, nycflights13::flights)
airports <- copy_to(con, nycflights13::airports)
```

### Single table verbs

* `select()` and `mutate()` modify the `SELECT` clause:

    ```{r}
    flights %>%
      select(contains("delay")) %>%
      show_query()

    flights %>%
      select(distance, air_time) %>%
      mutate(speed = distance / (air_time / 60)) %>%
      show_query()
    ```

    (As you can see here, the generated SQL isn't always as minimal as you might generate by hand.)

* `filter()` generates a `WHERE` clause:

    ```{r}
    flights %>%
      filter(month == 1, day == 1) %>%
      show_query()
    ```

* `arrange()` generates an `ORDER BY` clause:

    ```{r}
    flights %>%
      arrange(carrier, desc(arr_delay)) %>%
      show_query()
    ```

* `summarise()` and `group_by()` work together to generate a `GROUP BY` clause:

    ```{r}
    flights %>%
      group_by(month, day) %>%
      summarise(delay = mean(dep_delay)) %>%
      show_query()
    ```

### Dual table verbs

| R                 | SQL
|-------------------|------------------------------------------------------------
| `inner_join()`    | `SELECT * FROM x JOIN y ON x.a = y.a`
| `left_join()`     | `SELECT * FROM x LEFT JOIN y ON x.a = y.a`
| `right_join()`    | `SELECT * FROM x RIGHT JOIN y ON x.a = y.a`
| `full_join()`     | `SELECT * FROM x FULL JOIN y ON x.a = y.a`
| `semi_join()`     | `SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a)`
| `anti_join()`     | `SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)`
| `intersect(x, y)` | `SELECT * FROM x INTERSECT SELECT * FROM y`
| `union(x, y)`     | `SELECT * FROM x UNION SELECT * FROM y`
| `setdiff(x, y)`   | `SELECT * FROM x EXCEPT SELECT * FROM y`

`x` and `y` don't have to be tables in the same database. If you specify `copy = TRUE`, dplyr will copy the `y` table into the same location as the `x` variable. This is useful if you've downloaded a summarised dataset and determined a subset of interest that you now want the full data for. You can use `semi_join(x, y, copy = TRUE)` to upload the indices of interest to a temporary table in the same database as `x`, and then perform an efficient semi join in the database.

If you're working with large data, it may also be helpful to set `auto_index = TRUE`. That will automatically add an index on the join variables to the temporary table.

### Behind the scenes

The verb level SQL translation is implemented on top of `tbl_lazy`, which basically tracks the operations you perform in a pipeline (see `lazy-ops.R`). Turning that into a SQL query takes place in three steps:

* `sql_build()` recurses over the lazy op data structure building up query objects (`select_query()`, `join_query()`, `set_op_query()` etc) that represent the different subtypes of `SELECT` queries that we might generate.

* `sql_optimise()` takes a pass over these SQL objects, looking for potential optimisations. Currently this only involves removing subqueries where possible.

* `sql_render()` calls an SQL generation function (`sql_select()`, `sql_join()`, `sql_subquery()`, `sql_semijoin()` etc) to produce the actual SQL. Each of these functions is a generic, taking the connection as an argument, so that the details can be customised for different databases.

dbplyr/inst/doc/dbplyr.html

Introduction to dbplyr

As well as working with local in-memory data stored in data frames, dplyr also works with remote on-disk data stored in databases. This is particularly useful in two scenarios:

  • Your data is already in a database.

  • You have so much data that it does not all fit into memory simultaneously and you need to use some external storage engine.

(If your data fits in memory there is no advantage to putting it in a database: it will only be slower and more frustrating.)

This vignette focusses on the first scenario because it’s the most common. If you’re using R to do data analysis inside a company, most of the data you need probably already lives in a database (it’s just a matter of figuring out which one!). However, you will learn how to load data into a local database in order to demonstrate dplyr’s database tools. At the end, I’ll also give you a few pointers if you do need to set up your own database.

Getting started

To use databases with dplyr you need to first install dbplyr:

install.packages("dbplyr")

You’ll also need to install a DBI backend package. The DBI package provides a common interface that allows dplyr to work with many different databases using the same code. DBI is automatically installed with dbplyr, but you need to install a specific backend for the database that you want to connect to.

Five commonly used backends are:

  • RMySQL connects to MySQL and MariaDB.

  • RPostgreSQL connects to Postgres and Redshift.

  • RSQLite embeds a SQLite database.

  • odbc connects to many commercial databases via the open database connectivity protocol.

  • bigrquery connects to Google’s BigQuery.

If the database you need to connect to is not listed here, you’ll need to do some investigation (i.e. googling) yourself.

In this vignette, we’re going to use the RSQLite backend, which is automatically installed when you install dbplyr. SQLite is a great way to get started with databases because it’s completely embedded inside an R package. Unlike most other systems, you don’t need to set up a separate database server. SQLite is great for demos, but it is also surprisingly powerful, and with a little practice you can use it to easily work with many gigabytes of data.

Connecting to the database

To work with a database in dplyr, you must first connect to it, using DBI::dbConnect(). We’re not going to go into the details of the DBI package here, but it’s the foundation upon which dbplyr is built. You’ll need to learn more about it if you need to do things to the database that are beyond the scope of dplyr.

library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

The arguments to DBI::dbConnect() vary from database to database, but the first argument is always the database backend. It’s RSQLite::SQLite() for RSQLite, RMySQL::MySQL() for RMySQL, RPostgreSQL::PostgreSQL() for RPostgreSQL, odbc::odbc() for odbc, and bigrquery::bigquery() for BigQuery. SQLite only needs one other argument: the path to the database. Here we use the special string ":memory:" which causes SQLite to make a temporary in-memory database.

Most existing databases don’t live in a file, but instead live on another server. That means that, in real life, your code will look more like this:

con <- DBI::dbConnect(RMySQL::MySQL(),
host = "database.rstudio.com",
user = "hadley",
password = rstudioapi::askForPassword("Database password")
)

(If you’re not using RStudio, you’ll need some other way to securely retrieve your password. You should never record it in your analysis scripts or type it into the console. Securing Credentials provides some best practices.)

Our temporary database has no data in it, so we’ll start by copying over nycflights13::flights using the convenient copy_to() function. This is a quick and dirty way of getting data into a database and is useful primarily for demos and other small jobs.

copy_to(con, nycflights13::flights, "flights",
temporary = FALSE,
indexes = list(
c("year", "month", "day"),
"carrier",
"tailnum",
"dest"
)
)

As you can see, the copy_to() operation has an additional argument that allows you to supply indexes for the table. Here we set up indexes that will allow us to quickly process the data by day, carrier, plane, and destination. Creating the right indices is key to good database performance, but is unfortunately beyond the scope of this article.

Now that we’ve copied the data, we can use tbl() to take a reference to it:

flights_db <- tbl(con, "flights")

When you print it out, you’ll notice that it mostly looks like a regular tibble:

flights_db
#> # Source: table<flights> [?? x 19]
#> # Database: sqlite 3.19.3 []
#> year month day dep_t… sche… dep_… arr_… sche… arr_… carr… flig… tail…
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr>
#> 1 2013 1 1 517 515 2.00 830 819 11.0 UA 1545 N142…
#> 2 2013 1 1 533 529 4.00 850 830 20.0 UA 1714 N242…
#> 3 2013 1 1 542 540 2.00 923 850 33.0 AA 1141 N619…
#> 4 2013 1 1 544 545 -1.00 1004 1022 -18.0 B6 725 N804…
#> 5 2013 1 1 554 600 -6.00 812 837 -25.0 DL 461 N668…
#> 6 2013 1 1 554 558 -4.00 740 728 12.0 UA 1696 N394…
#> # ... with more rows, and 7 more variables: origin <chr>, dest <chr>,
#> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour
#> # <dbl>

The main difference is that you can see that it’s a remote source in a SQLite database.

Generating queries

To interact with a database you usually use SQL, the Structured Query Language. SQL is over 40 years old, and is used by pretty much every database in existence. The goal of dbplyr is to automatically generate SQL for you so that you’re not forced to use it. However, SQL is a very large language and dbplyr doesn’t do everything. It focusses on SELECT statements, the SQL you write most often as an analyst.

Most of the time you don’t need to know anything about SQL, and you can continue to use the dplyr verbs that you’re already familiar with:

flights_db %>% select(year:day, dep_delay, arr_delay)
#> # Source: lazy query [?? x 5]
#> # Database: sqlite 3.19.3 []
#> year month day dep_delay arr_delay
#> <int> <int> <int> <dbl> <dbl>
#> 1 2013 1 1 2.00 11.0
#> 2 2013 1 1 4.00 20.0
#> 3 2013 1 1 2.00 33.0
#> 4 2013 1 1 -1.00 -18.0
#> 5 2013 1 1 -6.00 -25.0
#> 6 2013 1 1 -4.00 12.0
#> # ... with more rows
flights_db %>% filter(dep_delay > 240)
#> # Source: lazy query [?? x 19]
#> # Database: sqlite 3.19.3 []
#> year month day dep_t… sche… dep_… arr_… sche… arr_… carr… flig… tail…
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr>
#> 1 2013 1 1 848 1835 853 1001 1950 851 MQ 3944 N942…
#> 2 2013 1 1 1815 1325 290 2120 1542 338 EV 4417 N171…
#> 3 2013 1 1 1842 1422 260 1958 1535 263 EV 4633 N181…
#> 4 2013 1 1 2115 1700 255 2330 1920 250 9E 3347 N924…
#> 5 2013 1 1 2205 1720 285 46 2040 246 AA 1999 N5DN…
#> 6 2013 1 1 2343 1724 379 314 1938 456 EV 4321 N211…
#> # ... with more rows, and 7 more variables: origin <chr>, dest <chr>,
#> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour
#> # <dbl>
flights_db %>%
group_by(dest) %>%
summarise(delay = mean(dep_time))
#> Warning: Missing values are always removed in SQL.
#> Use `AVG(x, na.rm = TRUE)` to silence this warning
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.19.3 []
#> dest delay
#> <chr> <dbl>
#> 1 ABQ 2006
#> 2 ACK 1033
#> 3 ALB 1627
#> 4 ANC 1635
#> 5 ATL 1293
#> 6 AUS 1521
#> # ... with more rows

However, in the long run, I highly recommend you at least learn the basics of SQL. It’s a valuable skill for any data scientist, and it will help you debug things if you run into problems with dplyr’s automatic translation. If you’re completely new to SQL you might start with this Codecademy tutorial. If you have some familiarity with SQL and you’d like to learn more, I found how indexes work in SQLite and 10 easy steps to a complete understanding of SQL to be particularly helpful.

The most important difference between ordinary data frames and remote database queries is that your R code is translated into SQL and executed in the database on the remote server, not in R on your local machine. When working with databases, dplyr tries to be as lazy as possible:

  • It never pulls data into R unless you explicitly ask for it.

  • It delays doing any work until the last possible moment: it collects together everything you want to do and then sends it to the database in one step.

For example, take the following code:

tailnum_delay_db <- flights_db %>%
group_by(tailnum) %>%
summarise(
delay = mean(arr_delay),
n = n()
) %>%
arrange(desc(delay)) %>%
filter(n > 100)

Surprisingly, this sequence of operations never touches the database. It’s not until you ask for the data (e.g. by printing tailnum_delay_db) that dplyr generates the SQL and requests the results from the database. Even then it tries to do as little work as possible and only pulls down a few rows.

tailnum_delay_db
#> Warning: Missing values are always removed in SQL.
#> Use `AVG(x, na.rm = TRUE)` to silence this warning
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.19.3 []
#> # Ordered by: desc(delay)
#> tailnum delay n
#> <chr> <dbl> <int>
#> 1 N11119 30.3 148
#> 2 N16919 29.9 251
#> 3 N14998 27.9 230
#> 4 N15910 27.6 280
#> 5 N13123 26.0 121
#> 6 N11192 25.9 154
#> # ... with more rows

Behind the scenes, dplyr is translating your R code into SQL. You can see the SQL it’s generating with show_query():

tailnum_delay_db %>% show_query()
#> Warning: Missing values are always removed in SQL.
#> Use `AVG(x, na.rm = TRUE)` to silence this warning
#> <SQL>
#> SELECT *
#> FROM (SELECT *
#> FROM (SELECT `tailnum`, AVG(`arr_delay`) AS `delay`, COUNT() AS `n`
#> FROM `flights`
#> GROUP BY `tailnum`)
#> ORDER BY `delay` DESC)
#> WHERE (`n` > 100.0)

If you’re familiar with SQL, this probably isn’t exactly what you’d write by hand, but it does the job. You can learn more about the SQL translation in vignette("sql-translation").

Typically, you’ll iterate a few times before you figure out what data you need from the database. Once you’ve figured it out, use collect() to pull all the data down into a local tibble:

tailnum_delay <- tailnum_delay_db %>% collect()
#> Warning: Missing values are always removed in SQL.
#> Use `AVG(x, na.rm = TRUE)` to silence this warning
tailnum_delay
#> # A tibble: 1,201 x 3
#> tailnum delay n
#> <chr> <dbl> <int>
#> 1 N11119 30.3 148
#> 2 N16919 29.9 251
#> 3 N14998 27.9 230
#> 4 N15910 27.6 280
#> 5 N13123 26.0 121
#> 6 N11192 25.9 154
#> # ... with 1,195 more rows

collect() requires that the database do some work, so it may take a long time to complete. Otherwise, dplyr tries to prevent you from accidentally performing expensive query operations:

  • Because there’s generally no way to determine how many rows a query will return unless you actually run it, nrow() is always NA.

  • Because you can’t find the last few rows without executing the whole query, you can’t use tail().

nrow(tailnum_delay_db)
#> [1] NA
tail(tailnum_delay_db)
#> Error: tail() is not supported by sql sources
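If you just want a quick look at a result without running the whole query, one common pattern (a minimal sketch using the table from above) is to limit the query in the database and collect only those rows:

flights_db %>% head(10) %>% collect()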

You can also ask the database how it plans to execute the query with explain(). The output is database dependent, and can be esoteric, but learning a bit about it can be very useful because it helps you understand if the database can execute the query efficiently, or if you need to create new indices.
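For example, on the SQLite connection used here you might run the following (a sketch; the plan text varies by database and by the indexes you created):

tailnum_delay_db %>% explain()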

Creating your own database

If you don’t already have a database, here’s some advice from my experiences setting up and running all of them. SQLite is by far the easiest to get started with, but the lack of window functions makes it limited for data analysis. PostgreSQL is not too much harder to use and has a wide range of built-in functions. In my opinion, you shouldn’t bother with MySQL/MariaDB: it’s a pain to set up, the documentation is subpar, and it’s less featureful than Postgres. Google BigQuery might be a good fit if you have very large data, or if you’re willing to pay (a small amount of) money to someone who’ll look after your database.

All of these databases follow a client-server model - one computer that connects to the database and another that runs it (the two may be one and the same, but usually aren’t). Getting one of these databases up and running is beyond the scope of this article, but there are plenty of tutorials available on the web.

MySQL/MariaDB

In terms of functionality, MySQL lies somewhere between SQLite and PostgreSQL. It provides a wider range of built-in functions, but it does not support window functions (so you can’t do grouped mutates and filters).

PostgreSQL

PostgreSQL is a considerably more powerful database than SQLite. It has:

  • a much wider range of built-in functions, and

  • support for window functions, which allow grouped subsets and mutates to work.

BigQuery

BigQuery is a hosted database server provided by Google. To connect, you need to provide your project, dataset and optionally a project for billing (if billing for project isn’t enabled).
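A connection sketch, assuming the bigrquery backend mentioned earlier ("my-project" and "my_dataset" are placeholders for your own project and dataset):

con <- DBI::dbConnect(bigrquery::bigquery(),
  project = "my-project",   # placeholder: your Google Cloud project id
  dataset = "my_dataset",   # placeholder: the dataset to work in
  billing = "my-project"    # the project billed for the queries
)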

It provides a similar set of functions to Postgres and is designed specifically for analytic workflows. Because it’s a hosted solution, there’s no setup involved, but if you have a lot of data, getting it to Google can be an ordeal (especially because upload support from R is not great currently). (If you have lots of data, you can ship hard drives!)

dbplyr/inst/doc/sql-translation.html0000644000176200001440000032632213221502520017311 0ustar liggesusers SQL translation

SQL translation

There are two components to dplyr’s SQL translation system:

  • translation of vector expressions like x * y + 10

  • translation of whole verbs like mutate() or summarise()

To explore them, you’ll need to load both dbplyr and dplyr:

library(dbplyr)
library(dplyr)

Vectors

Most filtering, mutating or summarising operations only perform simple mathematical operations. These operations are very similar between R and SQL, so they’re easy to translate. To see what’s happening yourself, you can use translate_sql(). The basic techniques that underlie the implementation of translate_sql() are described in “Advanced R”. translate_sql() is built on top of R’s parsing engine and has been carefully designed to generate correct SQL. It also protects you against SQL injection attacks by correctly escaping the strings and variable names needed by the database that you’re connecting to.
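For instance, a string value containing a quote is escaped rather than spliced into the query verbatim (a quick check; output omitted):

translate_sql("O'Brien")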

The following examples work through some of the basic differences between R and SQL.

  • " and ' mean different things. In SQLite, variable names are escaped by double quotes and strings by single quotes:

    translate_sql(x)
    translate_sql("x")

  • Many functions have slightly different names:

    translate_sql(x == 1 && (y < 2 || z > 3))
    translate_sql(x ^ 2 < 10)
    translate_sql(x %% 2 == 10)

  • And some functions have different argument orders:

    translate_sql(substr(x, 5, 10))
    translate_sql(log(x, 10))

  • R and SQL have different defaults for integers and reals. In R, 1 is a real and 1L is an integer; in SQL, 1 is an integer and 1.0 is a real:

    translate_sql(1)
    translate_sql(1L)

  • If statements are translated into a case statement:

    translate_sql(if (x > 5) "big" else "small")

Known functions

dplyr knows how to convert the following R functions to SQL:

  • basic math operators: +, -, *, /, %%, ^
  • math functions: abs, acos, acosh, asin, asinh, atan, atan2, atanh, ceiling, cos, cosh, cot, coth, exp, floor, log, log10, round, sign, sin, sinh, sqrt, tan, tanh
  • logical comparisons: <, <=, !=, >=, >, ==, %in%
  • boolean operations: &, &&, |, ||, !, xor
  • basic aggregations: mean, sum, min, max, sd, var
  • string functions: tolower, toupper, trimws, nchar, substr
  • coerce types: as.numeric, as.integer, as.character

Perfect translation is not possible because databases don’t have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean rather than what is done. In fact, even for functions that exist both in databases and R, you shouldn’t expect results to be identical; database programmers have different priorities than R core programmers. For example, in R, in order to get a higher level of numerical accuracy, mean() loops through the data twice. R’s mean() also provides a trim option for computing trimmed means; this is something that databases do not provide. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. This means the essence of simple calls like mean(x) will be translated accurately, but more complicated calls like mean(x, trim = 0.5, na.rm = TRUE) will raise an error:

translate_sql(mean(x, na.rm = TRUE))
#> <SQL> avg("x") OVER ()
translate_sql(mean(x, trim = 0.1))
#> Error in mean(x, trim = 0.1): unused argument (trim = 0.1)

translate_sql() takes an optional con parameter. If not supplied, this causes dplyr to generate (approximately) SQL-92 compliant SQL. If supplied, dplyr uses sql_translate_env() to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see vignette("new-backend") for more details.

Unknown functions

Any function that dplyr doesn’t know how to convert is left as is. This means that database functions that are not covered by dplyr can be used directly via translate_sql(). Here are a couple of examples that will work with SQLite:

translate_sql(glob(x, y))
#> <SQL> GLOB("x", "y")
translate_sql(x %like% "ab%")
#> <SQL> "x" LIKE 'ab%'

Window functions

Things get a little trickier with window functions, because SQL’s window functions are considerably more expressive than the specific variants provided by base R or dplyr. They have the form [expression] OVER ([partition clause] [order clause] [frame_clause]):

  • The expression is a combination of variable names and window functions. Support for window functions varies from database to database, but most support the ranking functions, lead, lag, nth, first, last, count, min, max, sum, avg and stddev.

  • The partition clause specifies how the window function is broken down over groups. It plays an analogous role to GROUP BY for aggregate functions, and group_by() in dplyr. It is possible for different window functions to be partitioned into different groups, but not all databases support it, and neither does dplyr.

  • The order clause controls the ordering (when it makes a difference). This is important for the ranking functions since it specifies which variables to rank by, but it’s also needed for cumulative functions and lead. Whenever you’re thinking about before and after in SQL, you must always tell it which variable defines the order. If the order clause is missing when needed, some databases fail with an error message while others return non-deterministic results.

  • The frame clause defines which rows, or frame, are passed to the window function, describing which rows (relative to the current row) should be included. The frame clause provides two offsets which determine the start and end of the frame. There are three special values: -Inf means to include all preceding rows (in SQL, “unbounded preceding”), 0 means the current row (“current row”), and Inf means all following rows (“unbounded following”). The complete set of options is comprehensive, but fairly confusing, and is summarised visually below.

    knitr::include_graphics("windows.png", dpi = 200)

    Of the many possible specifications, there are only three that are commonly used. They select between aggregation variants:

    • Recycled: BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

    • Cumulative: BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

    • Rolling: BETWEEN 2 PRECEDING AND 2 FOLLOWING

    dplyr generates the frame clause based on whether you’re using a recycled aggregate or a cumulative aggregate.
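
A quick way to see the two frames dplyr picks (a minimal sketch; output omitted): a plain aggregate gets the whole-partition frame, while an ordered cumulative aggregate gets a frame that runs up to the current row:

translate_sql(sum(G, na.rm = TRUE))
translate_sql(cumsum(G), vars_order = "year")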

To see how individual window functions are translated to SQL, we can again use translate_sql():

translate_sql(mean(G))
#> Warning: Missing values are always removed in SQL.
#> Use `avg(x, na.rm = TRUE)` to silence this warning
#> <SQL> avg("G") OVER ()
translate_sql(rank(G))
#> <SQL> rank() OVER (ORDER BY "G")
translate_sql(ntile(G, 2))
#> <SQL> NTILE(2) OVER (ORDER BY "G")
translate_sql(lag(G))
#> <SQL> LAG("G", 1, NULL) OVER ()

If the tbl has been grouped or arranged previously in the pipeline, then dplyr will use that information to set the “partition by” and “order by” clauses. For interactive exploration, you can achieve the same effect by setting the vars_group and vars_order arguments to translate_sql():

translate_sql(cummean(G), vars_order = "year")
#> <SQL> mean("G") OVER (ORDER BY "year" ROWS UNBOUNDED PRECEDING)
translate_sql(rank(), vars_group = "ID")
#> <SQL> rank() OVER (PARTITION BY "ID")

There are some challenges when translating window functions between R and SQL, because dplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you’re using:

  • For ranking functions, the ordering variable is the first argument: rank(x), ntile(y, 2). If omitted or NULL, will use the default ordering associated with the tbl (as set by arrange()).

  • Accumulating aggregates only take a single argument (the vector to aggregate). To control ordering, use order_by().

  • Aggregates implemented in dplyr (lead, lag, nth_value, first_value, last_value) have an order_by argument. Supply it to override the default ordering.

The three options are illustrated in the snippet below:

mutate(players,
min_rank(yearID),
order_by(yearID, cumsum(G)),
lead(G, order_by = yearID)
)

Currently there is no way to order by multiple variables, except by setting the default ordering with arrange(). This will be added in a future release.

Whole tables

All dplyr verbs generate a SELECT statement. To demonstrate, we’ll make a temporary database with a couple of tables:

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
flights <- copy_to(con, nycflights13::flights)
airports <- copy_to(con, nycflights13::airports)

Single table verbs

  • select() and mutate() modify the SELECT clause:

    flights %>%
    select(contains("delay")) %>%
    show_query()
    #> <SQL>
    #> SELECT `dep_delay`, `arr_delay`
    #> FROM `nycflights13::flights`
    flights %>%
    select(distance, air_time) %>%
    mutate(speed = distance / (air_time / 60)) %>%
    show_query()
    #> <SQL>
    #> SELECT `distance`, `air_time`, `distance` / (`air_time` / 60.0) AS `speed`
    #> FROM (SELECT `distance`, `air_time`
    #> FROM `nycflights13::flights`)

    (As you can see here, the generated SQL isn’t always as minimal as you might generate by hand.)

  • filter() generates a WHERE clause:

    flights %>%
    filter(month == 1, day == 1) %>%
    show_query()
    #> <SQL>
    #> SELECT *
    #> FROM `nycflights13::flights`
    #> WHERE ((`month` = 1.0) AND (`day` = 1.0))
  • arrange() generates an ORDER BY clause:

    flights %>%
    arrange(carrier, desc(arr_delay)) %>%
    show_query()
    #> <SQL>
    #> SELECT *
    #> FROM `nycflights13::flights`
    #> ORDER BY `carrier`, `arr_delay` DESC
  • summarise() and group_by() work together to generate a GROUP BY clause:

    flights %>%
    group_by(month, day) %>%
    summarise(delay = mean(dep_delay)) %>%
    show_query()
    #> Warning: Missing values are always removed in SQL.
    #> Use `AVG(x, na.rm = TRUE)` to silence this warning
    #> <SQL>
    #> SELECT `month`, `day`, AVG(`dep_delay`) AS `delay`
    #> FROM `nycflights13::flights`
    #> GROUP BY `month`, `day`

Dual table verbs

  R                    SQL
  -------------------  ----------------------------------------------------------------------
  inner_join()         SELECT * FROM x JOIN y ON x.a = y.a
  left_join()          SELECT * FROM x LEFT JOIN y ON x.a = y.a
  right_join()         SELECT * FROM x RIGHT JOIN y ON x.a = y.a
  full_join()          SELECT * FROM x FULL JOIN y ON x.a = y.a
  semi_join()          SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
  anti_join()          SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
  intersect(x, y)      SELECT * FROM x INTERSECT SELECT * FROM y
  union(x, y)          SELECT * FROM x UNION SELECT * FROM y
  setdiff(x, y)        SELECT * FROM x EXCEPT SELECT * FROM y

x and y don’t have to be tables in the same database. If you specify copy = TRUE, dplyr will copy the y table into the same location as the x variable. This is useful if you’ve downloaded a summarised dataset and determined a subset of interest that you now want the full data for. You can use semi_join(x, y, copy = TRUE) to upload the indices of interest to a temporary table in the same database as x, and then perform an efficient semi join in the database.

If you’re working with large data, it may also be helpful to set auto_index = TRUE. That will automatically add an index on the join variables to the temporary table.
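Putting those two options together, a minimal sketch (assuming the flights table from above; interesting is a hypothetical local lookup table):

interesting <- data.frame(dest = c("JFK", "LAX"))  # small local table to upload
flights %>% semi_join(interesting, by = "dest", copy = TRUE, auto_index = TRUE)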

Behind the scenes

The verb level SQL translation is implemented on top of tbl_lazy, which basically tracks the operations you perform in a pipeline (see lazy-ops.R). Turning that into a SQL query takes place in three steps:

  • sql_build() recurses over the lazy op data structure building up query objects (select_query(), join_query(), set_op_query() etc) that represent the different subtypes of SELECT queries that we might generate.

  • sql_optimise() takes a pass over these SQL objects, looking for potential optimisations. Currently this only involves removing subqueries where possible.

  • sql_render() calls an SQL generation function (sql_select(), sql_join(), sql_subquery(), sql_semijoin() etc) to produce the actual SQL. Each of these functions is a generic, taking the connection as an argument, so that the details can be customised for different databases.
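
To see these pieces in action, a minimal sketch (it assumes lazy_frame(), which dbplyr’s own tests use to simulate a remote table; output omitted):

lf <- lazy_frame(x = 1, y = 2)
qry <- lf %>% filter(x > 5) %>% sql_build()   # step 1: build the query object
qry <- sql_optimise(qry)                      # step 2: look for optimisations
sql_render(qry)                               # step 3: render the SQL string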

dbplyr/inst/doc/new-backend.R

## ---- echo = FALSE, message = FALSE----------------------------------------
knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)

## ----setup, message = FALSE------------------------------------------------
library(dplyr)
library(DBI)

## ------------------------------------------------------------------------
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)
tbl(con, "mtcars")

## ------------------------------------------------------------------------
#' @export
db_desc.PostgreSQLConnection <- function(x) {
  info <- dbGetInfo(x)
  host <- if (info$host == "") "localhost" else info$host

  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}

dbplyr/inst/doc/new-backend.Rmd

---
title: "Adding a new DBI backend"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Adding a new DBI backend}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

This document describes how to add a new SQL backend to dbplyr. To begin:

* Ensure that you have a DBI compliant database backend. If not, you'll need to first create it by following the instructions in `vignette("backend", package = "DBI")`.

* You'll need a working knowledge of S3. Make sure that you're [familiar with the basics](http://adv-r.had.co.nz/OO-essentials.html#s3) before you start.

This document is still a work in progress, but it will hopefully get you started. I'd also strongly recommend reading the bundled source code for [SQLite](https://github.com/tidyverse/dbplyr/blob/master/R/db-sqlite.r), [MySQL](https://github.com/tidyverse/dbplyr/blob/master/R/db-mysql.r), and [PostgreSQL](https://github.com/tidyverse/dbplyr/blob/master/R/db-postgres.r).

## First steps

For interactive exploration, attach dplyr and DBI. If you're creating a package, you'll need to import dplyr and DBI.

```{r setup, message = FALSE}
library(dplyr)
library(DBI)
```

Check that you can create a tbl from a connection, like:

```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)
tbl(con, "mtcars")
```

If you can't, this likely indicates some problem with the DBI methods. Use [DBItest](https://github.com/rstats-db/DBItest) to narrow down the problem.

Now is a good time to implement a method for `db_desc()`. This should briefly describe the connection, typically formatting the information returned from `dbGetInfo()`. This is what dbplyr does for Postgres connections:

```{r}
#' @export
db_desc.PostgreSQLConnection <- function(x) {
  info <- dbGetInfo(x)
  host <- if (info$host == "") "localhost" else info$host

  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}
```

## Copying, computing, collecting and collapsing

Next, check that `copy_to()`, `collapse()`, `compute()`, and `collect()` work.

* If `copy_to()` fails, it's likely you need a method for `db_write_table()`, `db_create_indexes()` or `db_analyze()`.

* If `collapse()` fails, your database has a non-standard way of constructing subqueries. Add a method for `sql_subquery()`.

* If `compute()` fails, your database has a non-standard way of saving queries in temporary tables. Add a method for `db_save_query()`.

## SQL translation

Make sure you've read `vignette("sql-translation")` so you have the lay of the land.

### Verbs

Check that SQL translation for the key verbs works:

* `summarise()`, `mutate()`, `filter()` etc: powered by `sql_select()`
* `left_join()`, `inner_join()`: powered by `sql_join()`
* `semi_join()`, `anti_join()`: powered by `sql_semi_join()`
* `union()`, `intersect()`, `setdiff()`: powered by `sql_set_op()`

### Vectors

Finally, you may have to provide custom R -> SQL translation at the vector level by providing a method for `src_translate_env()`. This function should return an object created by `sql_variant()`. See existing methods for examples.

dbplyr/inst/doc/new-backend.html

Adding a new DBI backend

This document describes how to add a new SQL backend to dbplyr. To begin:

  • Ensure that you have a DBI-compliant database backend. If not, you’ll need to first create it by following the instructions in `vignette("backend", package = "DBI")`.
  • You’ll need a working knowledge of S3. Make sure that you’re familiar with the basics before you start.

This document is still a work in progress, but it will hopefully get you started. I’d also strongly recommend reading the bundled source code for SQLite, MySQL, and PostgreSQL.

First steps

For interactive exploration, attach dplyr and DBI. If you’re creating a package, you’ll need to import dplyr and DBI.

library(dplyr)
library(DBI)

Check that you can create a tbl from a connection, like:

con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)
tbl(con, "mtcars")
#> # Source: table<mtcars> [?? x 11]
#> # Database: sqlite 3.19.3 []
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21.0 6.00 160 110 3.90 2.62 16.5 0 1.00 4.00 4.00
#> 2 21.0 6.00 160 110 3.90 2.88 17.0 0 1.00 4.00 4.00
#> 3 22.8 4.00 108 93.0 3.85 2.32 18.6 1.00 1.00 4.00 1.00
#> 4 21.4 6.00 258 110 3.08 3.22 19.4 1.00 0 3.00 1.00
#> # ... with more rows

If you can’t, this likely indicates some problem with the DBI methods. Use DBItest to narrow down the problem.
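
As a minimal sketch of driving DBItest, here using RSQLite purely as a stand-in for your own driver: make_context() and test_all() are DBItest’s usual entry points, but treat the exact arguments below as an assumption and check the DBItest documentation for your version.

DBItest::make_context(RSQLite::SQLite(), list(dbname = ":memory:"))  # register the driver under test
DBItest::test_all()                                                  # run the full DBI compliance suite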

Now is a good time to implement a method for db_desc(). This should briefly describe the connection, typically formatting the information returned from dbGetInfo(). This is what dbplyr does for Postgres connections:

#' @export
db_desc.PostgreSQLConnection <- function(x) {
info <- dbGetInfo(x)
host <- if (info$host == "") "localhost" else info$host
paste0("postgres ", info$serverVersion, " [", info$user, "@",
host, ":", info$port, "/", info$dbname, "]")
}

Copying, computing, collecting and collapsing

Next, check that copy_to(), collapse(), compute(), and collect() work; a quick end-to-end check is sketched after this list:

  • If `copy_to()` fails, it’s likely you need a method for `db_write_table()`, `db_create_indexes()` or `db_analyze()`.
  • If `collapse()` fails, your database has a non-standard way of constructing subqueries. Add a method for `sql_subquery()`.
  • If `compute()` fails, your database has a non-standard way of saving queries in temporary tables. Add a method for `db_save_query()`.
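
A minimal smoke test, assuming the SQLite connection con created above; mtcars_copy is just an illustrative table name:

db_mtcars <- copy_to(con, mtcars, "mtcars_copy")        # round-trips a local data frame
db_mtcars %>% mutate(kpl = mpg * 0.425) %>% collapse()  # folds the operations into a single subquery
db_mtcars %>% compute()                                 # materialises the query in a temporary table
db_mtcars %>% collect()                                 # pulls the results back into R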

SQL translation

Make sure you’ve read vignette("sql-translation") so you have the lay of the land.

Verbs

Check that SQL translation for the key verbs works (an example check is sketched after this list):

  • summarise(), mutate(), filter() etc: powered by sql_select()
  • left_join(), inner_join(): powered by sql_join()
  • semi_join(), anti_join(): powered by sql_semi_join()
  • union(), intersect(), setdiff(): powered by sql_set_op()

Vectors

Finally, you may have to provide custom R -> SQL translation at the vector level by providing a method for src_translate_env(). This function should return an object created by sql_variant(). See existing methods for examples.
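
A minimal, hypothetical sketch of such a method: MyConnection stands in for your connection class, the LOG10 override is purely illustrative, and sql_translator(), sql_prefix(), base_scalar, base_agg, and base_win are the dbplyr helpers the built-in backends build their variants from.

#' @export
src_translate_env.MyConnection <- function(x) {
  sql_variant(
    sql_translator(.parent = base_scalar,
      # assumption: the database exposes a LOG10() scalar function
      log10 = sql_prefix("LOG10", 1)
    ),
    base_agg,
    base_win
  )
}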

dbplyr/tests/0000755000176200001440000000000013221502521012701 5ustar liggesusersdbplyr/tests/testthat.R0000644000176200001440000000007013173736536014706 0ustar liggesuserslibrary(testthat) library(dbplyr) test_check("dbplyr") dbplyr/tests/testthat/0000755000176200001440000000000013223156042014546 5ustar liggesusersdbplyr/tests/testthat/test-pull.R0000644000176200001440000000065713075231072016634 0ustar liggesuserscontext("pull") test_that("default extracts last var from remote source", { mf <- memdb_frame(x = 1:10, y = 1:10) expect_equal(pull(mf), 1:10) }) test_that("can extract by name, or positive/negative position", { x <- 1:10 mf <- memdb_frame(x, y = runif(10)) expect_equal(pull(mf, x), x) expect_equal(pull(mf, 1L), x) expect_equal(pull(mf, 1), x) expect_equal(pull(mf, -2), x) expect_equal(pull(mf, -2L), x) }) dbplyr/tests/testthat/test-sql-query.R0000644000176200001440000000033113221455430017606 0ustar liggesuserscontext("test-sql-query.R") test_that("add_suffixes works if no suffix requested", { expect_equal(add_suffixes(c("x", "x"), "y", ""), c("x", "x")) expect_equal(add_suffixes(c("x", "y"), "y", ""), c("x", "y")) }) dbplyr/tests/testthat/test-sql-optimise.R0000644000176200001440000000174313176707444020320 0ustar liggesuserscontext("SQL: optimise") test_that("optimisation is turned on by default", { lf <- lazy_frame(x = 1, y = 2) %>% arrange(x) %>% head(5) qry <- lf %>% sql_build() expect_equal(qry$from, ident("df")) }) # Optimisations ----------------------------------------------------------- test_that("group by then limit is collapsed", { lf <- memdb_frame(x = 1:10, y = 2) %>% group_by(x) %>% summarise(y = sum(y, na.rm = TRUE)) %>% head(1) qry <- lf %>% sql_build() expect_equal(qry$limit, 1L) expect_equal(qry$group_by, sql('"x"')) # And check that it returns the correct value expect_equal(collect(lf), tibble(x = 1L, y = 2)) }) test_that("filter and rename are correctly composed", { lf <- memdb_frame(x = 1, y = 2) %>% filter(x == 1) %>% select(x = y) qry <- lf %>% sql_build() expect_equal(qry$select, ident(x = "y")) expect_equal(qry$where, sql('"x" = 1.0')) # It surprises me that this SQL works! 
expect_equal(collect(lf), tibble(x = 2)) }) dbplyr/tests/testthat/explain-sqlite.txt0000644000176200001440000000000013221464723020242 0ustar liggesusersdbplyr/tests/testthat/test-translate-sql-window.r0000644000176200001440000000035713173457716022032 0ustar liggesuserscontext("test-translate-sql-window.r") test_that("aggregation functions warn if na.rm = FALSE", { sql_mean <- win_aggregate("mean") expect_warning(sql_mean("x"), "Missing values") expect_warning(sql_mean("x", na.rm = TRUE), NA) }) dbplyr/tests/testthat/test-group-by.r0000644000176200001440000000220213066503201017444 0ustar liggesuserscontext("group_by") test_that("group_by with add = TRUE adds groups", { mf <- memdb_frame(x = 1:3, y = 1:3) gf1 <- mf %>% group_by(x, y) gf2 <- mf %>% group_by(x) %>% group_by(y, add = TRUE) expect_equal(group_vars(gf1), c("x", "y")) expect_equal(group_vars(gf2), c("x", "y")) }) test_that("collect, collapse and compute preserve grouping", { g <- memdb_frame(x = 1:3, y = 1:3) %>% group_by(x, y) expect_equal(group_vars(compute(g)), c("x", "y")) expect_equal(group_vars(collapse(g)), c("x", "y")) expect_equal(group_vars(collect(g)), c("x", "y")) }) test_that("joins preserve grouping", { g <- memdb_frame(x = 1:3, y = 1:3) %>% group_by(x) expect_equal(group_vars(inner_join(g, g, by = c("x", "y"))), "x") expect_equal(group_vars(left_join(g, g, by = c("x", "y"))), "x") expect_equal(group_vars(semi_join(g, g, by = c("x", "y"))), "x") expect_equal(group_vars(anti_join(g, g, by = c("x", "y"))), "x") }) test_that("group_by can perform mutate", { mf <- memdb_frame(x = 3:1, y = 1:3) out <- mf %>% group_by(z = x + y) %>% summarise(n = n()) %>% collect() expect_equal(out, tibble(z = 4L, n = 3L)) }) dbplyr/tests/testthat/test-arrange.r0000644000176200001440000000034113066532316017332 0ustar liggesuserscontext("arrange") test_that("two arranges equivalent to one", { mf <- memdb_frame(x = c(2, 2, 1), y = c(1, -1, 1)) mf1 <- mf %>% arrange(x, y) mf2 <- mf %>% arrange(y) %>% arrange(x) expect_equal_tbl(mf1, mf2) }) dbplyr/tests/testthat/test-output.R0000644000176200001440000000243013177053111017206 0ustar liggesuserscontext("output") test_that("ungrouped output", { if (packageVersion("tibble") < "1.0-10") skip("need tibble 1.0-10 or later for this test") mtcars_mem <- src_memdb() %>% copy_to(mtcars, name = "mtcars-output-test", overwrite = TRUE) iris_mem <- src_memdb() %>% copy_to(iris, name = "iris-output-test", overwrite = TRUE) withr::with_options( list(digits = 4, width = 80), with_mock( `dbplyr::sqlite_version` = function() "x.y.z", expect_equal( mtcars_mem %>% tbl_sum(), c( Source = "table [?? x 11]", Database = "sqlite x.y.z [:memory:]" ) ), expect_equal( mtcars_mem %>% group_by(cyl, gear) %>% tbl_sum(), c( Source = "table [?? x 11]", Database = "sqlite x.y.z [:memory:]", Groups = "cyl, gear" ) ), expect_equal( iris_mem %>% group_by(Species) %>% arrange(Sepal.Length, Sepal.Width) %>% tbl_sum(), c( Source = "table [?? 
x 5]", Database = "sqlite x.y.z [:memory:]", Groups = "Species", "Ordered by" = "Sepal.Length, Sepal.Width" ) ) ) ) }) dbplyr/tests/testthat/test-translate.r0000644000176200001440000001104013174404170017702 0ustar liggesuserscontext("translate") test_that("dplyr.strict_sql = TRUE prevents auto conversion", { old <- options(dplyr.strict_sql = TRUE) on.exit(options(old)) expect_equal(translate_sql(1 + 2), sql("1.0 + 2.0")) expect_error(translate_sql(blah(x)), "could not find function") }) test_that("Wrong number of arguments raises error", { expect_error(translate_sql(mean(1, 2, na.rm = TRUE), window = FALSE), "unused argument") }) test_that("between translated to special form (#503)", { out <- translate_sql(between(x, 1, 2)) expect_equal(out, sql('"x" BETWEEN 1.0 AND 2.0')) }) test_that("is.na and is.null are equivalent", { # Needs to be wrapped in parens to ensure correct precedence expect_equal(translate_sql(is.na(x)), sql('(("x") IS NULL)')) expect_equal(translate_sql(is.null(x)), sql('(("x") IS NULL)')) expect_equal(translate_sql(x + is.na(x)), sql('"x" + (("x") IS NULL)')) expect_equal(translate_sql(!is.na(x)), sql('NOT((("x") IS NULL))')) }) test_that("if translation adds parens", { expect_equal( translate_sql(if (x) y), sql('CASE WHEN ("x") THEN ("y") END') ) expect_equal( translate_sql(if (x) y else z), sql('CASE WHEN ("x") THEN ("y") WHEN NOT("x") THEN ("z") END') ) }) test_that("if and ifelse use correctly named arguments",{ exp <- translate_sql(if (x) 1 else 2) expect_equal(translate_sql(ifelse(test = x, yes = 1, no = 2)), exp) expect_equal(translate_sql(if_else(condition = x, true = 1, false = 2)), exp) }) test_that("all forms of if translated to case statement", { expected <- sql('CASE WHEN ("x") THEN (1) WHEN NOT("x") THEN (2) END') expect_equal(translate_sql(if (x) 1L else 2L), expected) expect_equal(translate_sql(ifelse(x, 1L, 2L)), expected) expect_equal(translate_sql(if_else(x, 1L, 2L)), expected) }) test_that("pmin and pmax become min and max", { expect_equal(translate_sql(pmin(x, y)), sql('MIN("x", "y")')) expect_equal(translate_sql(pmax(x, y)), sql('MAX("x", "y")')) }) test_that("%in% translation parenthesises when needed", { expect_equal(translate_sql(x %in% 1L), sql('"x" IN (1)')) expect_equal(translate_sql(x %in% c(1L)), sql('"x" IN (1)')) expect_equal(translate_sql(x %in% 1:2), sql('"x" IN (1, 2)')) expect_equal(translate_sql(x %in% y), sql('"x" IN "y"')) }) test_that("n_distinct can take multiple values", { expect_equal( translate_sql(n_distinct(x), window = FALSE), sql('COUNT(DISTINCT "x")') ) expect_equal( translate_sql(n_distinct(x, y), window = FALSE), sql('COUNT(DISTINCT "x", "y")') ) }) test_that("na_if is translated to NULL_IF", { expect_equal(translate_sql(na_if(x, 0L)), sql('NULL_IF("x", 0)')) }) test_that("connection affects quoting character", { dbTest <- src_sql("test", con = simulate_test()) testTable <- tbl_sql("test", src = dbTest, from = ident("table1")) out <- select(testTable, field1) expect_match(sql_render(out), "^SELECT `field1`\nFROM `table1`$") }) test_that("magrittr pipe is translated", { expect_identical(translate_sql(1 %>% is.na()), translate_sql(is.na(1))) }) # string functions -------------------------------------------------------- test_that("different arguments of substr are corrected", { expect_equal(translate_sql(substr(x, 3, 4)), sql('substr("x", 3, 2)')) expect_equal(translate_sql(substr(x, 3, 3)), sql('substr("x", 3, 1)')) expect_equal(translate_sql(substr(x, 3, 2)), sql('substr("x", 3, 0)')) expect_equal(translate_sql(substr(x, 
3, 1)), sql('substr("x", 3, 0)')) }) # stringr ------------------------------------------- test_that("str_length() translates correctly ", { expect_equivalent( translate_sql( str_length(field_name)), sql("LENGTH(\"field_name\")")) }) test_that("str_to_upper() translates correctly ", { expect_equivalent( translate_sql( str_to_upper(field_name)), sql("UPPER(\"field_name\")")) }) test_that("str_to_lower() translates correctly ", { expect_equivalent( translate_sql( str_to_lower(field_name)), sql("LOWER(\"field_name\")")) }) test_that("str_replace_all() translates correctly ", { expect_equivalent( translate_sql( str_replace_all(field_name, "pattern", "replacement")), sql("REPLACE(\"field_name\", 'pattern', 'replacement')")) }) test_that("str_detect() translates correctly ", { expect_equivalent( translate_sql( str_detect(field_name, "pattern")), sql("INSTR('pattern', \"field_name\") > 0")) }) test_that("str_trim() translates correctly ", { expect_equivalent( translate_sql( str_trim(field_name, "both")), sql("LTRIM(RTRIM(\"field_name\"))")) }) dbplyr/tests/testthat/test-translate-hive.R0000644000176200001440000000055313175643172020612 0ustar liggesuserscontext("translate-Hive") test_that("custom scalar & string functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_hive()) } expect_equal(trans(cot(x)), sql("1.0 / TAN(`x`)")) expect_equal(trans(str_replace_all(x, "old", "new")), sql("REGEXP_REPLACE(`x`, 'old', 'new')")) }) dbplyr/tests/testthat/test-collect.R0000644000176200001440000000122413053073557017304 0ustar liggesuserscontext("collect") test_that("collect equivalent to as.data.frame/as_tibble", { mf <- memdb_frame(letters) expect_equal(as.data.frame(mf), data.frame(letters, stringsAsFactors = FALSE)) expect_equal(tibble::as_tibble(mf), tibble::tibble(letters)) expect_equal(collect(mf), tibble::tibble(letters)) }) test_that("explicit collection returns all data", { n <- 1e5 + 10 # previous default was 1e5 big <- memdb_frame(x = seq_len(n)) nrow1 <- big %>% as.data.frame() %>% nrow() nrow2 <- big %>% tibble::as_tibble() %>% nrow() nrow3 <- big %>% collect() %>% nrow() expect_equal(nrow1, n) expect_equal(nrow2, n) expect_equal(nrow3, n) }) dbplyr/tests/testthat/test-sql-escape.r0000644000176200001440000000665513221470364017763 0ustar liggesuserscontext("SQL: escaping") # Identifiers ------------------------------------------------------------------ ei <- function(...) 
unclass(escape(ident(c(...)))) test_that("identifiers are doubled quoted", { expect_equal(ei("x"), '"x"') }) test_that("identifiers are comma separated", { expect_equal(ei("x", "y"), '"x", "y"') }) test_that("identifier names become AS", { expect_equal(ei(x = "y"), '"y" AS "x"') }) # Zero-length inputs ------------------------------------------------------ test_that("zero length inputs yield zero length output when not collapsed", { expect_equal(sql_vector(sql(), collapse = NULL), sql()) expect_equal(sql_vector(ident(), collapse = NULL), sql()) expect_equal(sql_vector(ident_q(), collapse = NULL), sql()) }) test_that("zero length inputs yield length-1 output when collapsed", { expect_equal(sql_vector(sql(), parens = FALSE, collapse = ""), sql("")) expect_equal(sql_vector(sql(), parens = TRUE, collapse = ""), sql("()")) expect_equal(sql_vector(ident(), parens = FALSE, collapse = ""), sql("")) expect_equal(sql_vector(ident(), parens = TRUE, collapse = ""), sql("()")) expect_equal(sql_vector(ident_q(), parens = FALSE, collapse = ""), sql("")) expect_equal(sql_vector(ident_q(), parens = TRUE, collapse = ""), sql("()")) }) # Special values ---------------------------------------------------------------- test_that("missing vaues become null", { expect_equal(escape(NA), sql("NULL")) expect_equal(escape(NA_real_), sql("NULL")) expect_equal(escape(NA_integer_), sql("NULL")) expect_equal(escape(NA_character_), sql("NULL")) }) test_that("-Inf and Inf are expanded and quoted", { expect_equal(escape(Inf), sql("'Infinity'")) expect_equal(escape(-Inf), sql("'-Infinity'")) }) test_that("logical is SQL-99 compatible (by default)", { expect_equal(escape(TRUE), sql("TRUE")) expect_equal(escape(FALSE), sql("FALSE")) expect_equal(escape(NA), sql("NULL")) }) test_that("can escape integer64 values", { skip_if_not_installed("bit64") expect_equal(escape(bit64::as.integer64(NA)), sql("NULL")) expect_equal(escape(bit64::as.integer64("123456789123456789")), sql("123456789123456789")) }) # Times ------------------------------------------------------------------- test_that("times are converted to ISO 8601", { x <- ISOdatetime(2000, 1, 2, 3, 4, 5, tz = "US/Central") expect_equal(escape(x), sql("'2000-01-02T09:04:05Z'")) }) # Name aliasing ----------------------------------------------------------- test_that("queries generated by select() don't alias unnecessarily", { lf_build <- lazy_frame(x = 1) %>% select(x) %>% sql_build() lf_render <- sql_render(lf_build, con = simulate_dbi()) expect_equal(lf_render, sql("SELECT \"x\"\nFROM \"df\"")) }) test_that("names_to_as() doesn't alias when ident name and value are identical", { x <- ident(name = "name") y <- sql_escape_ident(con = simulate_dbi(), x = x) expect_equal(names_to_as(y, names2(x), con = simulate_dbi()), "\"name\"") }) test_that("names_to_as() doesn't alias when ident name is missing", { x <- ident("*") y <- sql_escape_ident(con = simulate_dbi(), x = x) expect_equal(names_to_as(y, names2(x), con = simulate_dbi()), "\"*\"") }) test_that("names_to_as() aliases when ident name and value are different", { x <- ident(new_name = "name") y <- sql_escape_ident(con = simulate_dbi(), x = x) expect_equal(names_to_as(y, names2(x), con = simulate_dbi()), "\"name\" AS \"new_name\"") }) dbplyr/tests/testthat/test-translate-sql-paste.R0000644000176200001440000000057313176642775021603 0ustar liggesuserscontext("test-translate-sql-paste.R") test_that("basic infix operation", { sql_paste <- sql_paste_infix("", "&&", function(x) sql_expr(cast(UQ(x) %as% text))) x <- ident("x") y <- 
ident("y") expect_equal(sql_paste(x), sql('CAST("x" AS text)')) expect_equal(sql_paste(x, y), sql('"x" && "y"')) expect_equal(sql_paste(x, y, sep = " "), sql('"x" && \' \' && "y"')) }) dbplyr/tests/testthat/test-translate-window.R0000644000176200001440000000277613173460342021171 0ustar liggesuserscontext("translate-window") test_that("window functions without group have empty over", { expect_equal(translate_sql(n()), sql("COUNT(*) OVER ()")) expect_equal(translate_sql(sum(x, na.rm = TRUE)), sql('sum("x") OVER ()')) }) test_that("aggregating window functions ignore order_by", { expect_equal( translate_sql(n(), vars_order = "x"), sql("COUNT(*) OVER ()") ) expect_equal( translate_sql(sum(x, na.rm = TRUE), vars_order = "x"), sql('sum("x") OVER ()') ) }) test_that("order_by overrides default ordering", { expect_equal( translate_sql(order_by(y, cumsum(x)), vars_order = "x"), sql('sum("x") OVER (ORDER BY "y" ROWS UNBOUNDED PRECEDING)') ) }) test_that("cumulative windows warn if no order", { expect_warning(translate_sql(cumsum(x)), "does not have explicit order") expect_warning(translate_sql(cumsum(x), vars_order = "x"), NA) }) test_that("ntile always casts to integer", { expect_equal( translate_sql(ntile(x, 10.5)), sql('NTILE(10) OVER (ORDER BY "x")') ) }) test_that("first, last, and nth translated to _value", { expect_equal(translate_sql(first(x)), sql('first_value("x") OVER ()')) expect_equal(translate_sql(last(x)), sql('last_value("x") OVER ()')) expect_equal(translate_sql(nth(x, 1)), sql('nth_value("x", 1) OVER ()')) }) test_that("can override frame of recycled functions", { expect_equal( translate_sql(sum(x, na.rm = TRUE), vars_frame = c(-1, 0), vars_order = "y"), sql('sum("x") OVER (ORDER BY "y" ROWS 1 PRECEDING)') ) }) dbplyr/tests/testthat/test-select.r0000644000176200001440000000237213173732453017203 0ustar liggesuserscontext("select") df <- as.data.frame(as.list(setNames(1:26, letters))) tbls <- test_load(df) test_that("two selects equivalent to one", { mf <- memdb_frame(a = 1, b = 1, c = 1, d = 2) out <- mf %>% select(a:c) %>% select(b:c) %>% collect() expect_named(out, c("b", "c")) }) test_that("select operates on mutated vars", { mf <- memdb_frame(x = c(1, 2, 3), y = c(3, 2, 1)) df1 <- mf %>% mutate(x, z = x + y) %>% select(z) %>% collect() df2 <- mf %>% collect() %>% mutate(x, z = x + y) %>% select(z) expect_equal_tbl(df1, df2) }) test_that("select renames variables (#317)", { mf <- memdb_frame(x = 1, y = 2) expect_equal_tbl(mf %>% select(A = x), tibble(A = 1)) }) test_that("rename renames variables", { mf <- memdb_frame(x = 1, y = 2) expect_equal_tbl(mf %>% rename(A = x), tibble(A = 1, y = 2)) }) test_that("can rename multiple vars", { mf <- memdb_frame(a = 1, b = 2) exp <- tibble(c = 1, d = 2) expect_equal_tbl(mf %>% rename(c = a, d = b), exp) expect_equal_tbl(mf %>% group_by(a) %>% rename(c = a, d = b), exp) }) test_that("select preserves grouping vars", { mf <- memdb_frame(a = 1, b = 2) %>% group_by(b) out <- mf %>% select(a) %>% collect() expect_named(out, c("b", "a")) }) dbplyr/tests/testthat/test-translate-impala.R0000644000176200001440000000045213175643172021120 0ustar liggesuserscontext("translate-Impala") test_that("custom scalar functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_impala()) } expect_equal(trans(as.Date(x)), sql("CAST(`x` AS VARCHAR(10))")) expect_equal(trans(ceiling(x)), sql("CEIL(`x`)")) }) dbplyr/tests/testthat/test-translate-sql-helpers.r0000644000176200001440000000036013173456405022151 0ustar 
liggesuserscontext("test-translate-sql-helpers.r") test_that("aggregation functions warn if na.rm = FALSE", { sql_mean <- sql_aggregate("mean") expect_warning(sql_mean("x"), "Missing values") expect_warning(sql_mean("x", na.rm = TRUE), NA) }) dbplyr/tests/testthat/test-sql-expr.R0000644000176200001440000000072513174460454017437 0ustar liggesuserscontext("test-sql-expr.R") test_that("atomic vectors are escaped", { expect_equal(sql_expr(2), sql("2.0")) expect_equal(sql_expr("x"), sql("'x'")) }) test_that("user infix functions have % stripped", { expect_equal(sql_expr(x %like% y), sql("x LIKE y")) }) test_that("string function names are not quoted", { f <- "foo" expect_equal(sql_expr(UQ(f)()), sql("FOO()")) }) test_that("correct number of parens", { expect_equal(sql_expr((1L)), sql("(1)")) }) dbplyr/tests/testthat/test-translate-vectorised.R0000644000176200001440000000064713123003501022005 0ustar liggesuserscontext("translate-vectorised") test_that("case_when converted to CASE WHEN", { expect_equal( translate_sql(case_when(x > 1L ~ "a")), sql('CASE\nWHEN ("x" > 1) THEN (\'a\')\nEND') ) }) test_that("even inside mutate", { out <- lazy_frame(x = 1:5) %>% mutate(y = case_when(x > 1L ~ "a")) %>% sql_build() expect_equal( out$select[[2]], 'CASE\nWHEN ("x" > 1) THEN (\'a\')\nEND AS "y"' ) }) dbplyr/tests/testthat/test-translate-sqlite.R0000644000176200001440000000053413176643431021156 0ustar liggesuserscontext("translate-sqlite") test_that("vectorised translations", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_sqlite(), window = FALSE) } expect_equal(trans(nullif(x, 0L)), sql('NULLIF(`x`, 0)')) expect_equal(trans(paste(x, y)), sql("`x` || ' ' || `y`")) expect_equal(trans(paste0(x, y)), sql("`x` || `y`")) }) dbplyr/tests/testthat/test-ident.R0000644000176200001440000000075713221463257016771 0ustar liggesuserscontext("ident") test_that("zero length inputs return correct clases", { expect_s3_class(ident(), "ident") expect_s3_class(ident_q(), "ident_q") }) test_that("ident quotes and ident_q doesn't", { x1 <- ident("x") x2 <- ident_q('"x"') expect_equal(escape(x1), sql('"x"')) expect_equal(escape(x2), sql('"x"')) }) test_that("ident are left unchanged when coerced to sql", { x1 <- ident("x") x2 <- ident_q('"x"') expect_equal(as.sql(x1), x1) expect_equal(as.sql(x2), x2) }) dbplyr/tests/testthat/test-translate-mssql.r0000644000176200001440000001122113221275427021044 0ustar liggesuserscontext("translate-MSSQL") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_mssql()) } expect_equal(trans(as.numeric(x)), sql("CAST(`x` AS NUMERIC)")) expect_equal(trans(as.double(x)), sql("CAST(`x` AS NUMERIC)")) expect_equal(trans(as.character(x)), sql("CAST(`x` AS VARCHAR(MAX))")) expect_equal(trans(log(x)), sql("LOG(`x`)")) expect_equal(trans(nchar(x)), sql("LEN(`x`)")) expect_equal(trans(atan2(x)), sql("ATN2(`x`)")) expect_equal(trans(ceiling(x)), sql("CEILING(`x`)")) expect_equal(trans(ceil(x)), sql("CEILING(`x`)")) expect_equal(trans(substr(x, 1, 2)), sql("SUBSTRING(`x`, 1.0, 2.0)")) expect_equal(trans(trimws(x)), sql("LTRIM(RTRIM(`x`))")) expect_error(trans(paste(x)), sql("not available")) }) test_that("custom stringr functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_mssql()) } expect_equal(trans(str_length(x)), sql("LEN(`x`)")) expect_equal(trans(str_locate(x, "find")), sql("CHARINDEX('find', `x`)")) expect_equal(trans(str_detect(x, "find")), sql("CHARINDEX('find', `x`) > 0")) }) 
test_that("custom aggregators translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_mssql()) } expect_equal(trans(sd(x, na.rm = TRUE)), sql("STDEV(`x`)")) expect_equal(trans(var(x, na.rm = TRUE)), sql("VAR(`x`)")) expect_error(trans(cor(x)), "not available") expect_error(trans(cov(x)), "not available") }) test_that("custom window functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = TRUE, con = simulate_mssql()) } expect_equal(trans(sd(x, na.rm = TRUE)), sql("STDEV(`x`) OVER ()")) expect_equal(trans(var(x, na.rm = TRUE)), sql("VAR(`x`) OVER ()")) expect_error(trans(cor(x)), "not supported") expect_error(trans(cov(x)), "not supported") }) test_that("filter and mutate translate is.na correctly", { mf <- lazy_frame(x = 1, src = simulate_mssql()) expect_equal( mf %>% head() %>% show_query(), sql("SELECT TOP 6 *\nFROM `df`") ) expect_equal( mf %>% mutate(z = is.na(x)) %>% show_query(), sql("SELECT `x`, CONVERT(BIT, IIF(`x` IS NULL, 1, 0)) AS `z`\nFROM `df`") ) expect_equal( mf %>% mutate(z = !is.na(x)) %>% show_query(), sql("SELECT `x`, ~(CONVERT(BIT, IIF(`x` IS NULL, 1, 0))) AS `z`\nFROM `df`") ) expect_equal( mf %>% filter(is.na(x)) %>% show_query(), sql("SELECT *\nFROM `df`\nWHERE (((`x`) IS NULL))") ) expect_equal( mf %>% mutate(x = x == 1) %>% show_query(), sql("SELECT CONVERT(BIT, IIF(`x` = 1.0, 1.0, 0.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = x != 1) %>% show_query(), sql("SELECT CONVERT(BIT, IIF(`x` != 1.0, 1.0, 0.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = x > 1) %>% show_query(), sql("SELECT CONVERT(BIT, IIF(`x` > 1.0, 1.0, 0.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = x >= 1) %>% show_query(), sql("SELECT CONVERT(BIT, IIF(`x` >= 1.0, 1.0, 0.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = !(x == 1)) %>% show_query(), sql("SELECT ~((CONVERT(BIT, IIF(`x` = 1.0, 1.0, 0.0)))) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = !(x != 1)) %>% show_query(), sql("SELECT ~((CONVERT(BIT, IIF(`x` != 1.0, 1.0, 0.0)))) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = !(x > 1)) %>% show_query(), sql("SELECT ~((CONVERT(BIT, IIF(`x` > 1.0, 1.0, 0.0)))) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = !(x >= 1)) %>% show_query(), sql("SELECT ~((CONVERT(BIT, IIF(`x` >= 1.0, 1.0, 0.0)))) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = x > 4 & x < 5) %>% show_query(), sql("SELECT CONVERT(BIT, IIF(`x` > 4.0, 1.0, 0.0)) & CONVERT(BIT, IIF(`x` < 5.0, 1.0, 0.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% filter(x > 4 & x < 5) %>% show_query(), sql("SELECT *\nFROM `df`\nWHERE (`x` > 4.0 AND `x` < 5.0)") ) expect_equal( mf %>% mutate(x = x > 4 | x < 5) %>% show_query(), sql("SELECT CONVERT(BIT, IIF(`x` > 4.0, 1.0, 0.0)) | CONVERT(BIT, IIF(`x` < 5.0, 1.0, 0.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% filter(x > 4 | x < 5) %>% show_query(), sql("SELECT *\nFROM `df`\nWHERE (`x` > 4.0 OR `x` < 5.0)") ) expect_equal( mf %>% mutate(x = ifelse(x == 0, 0 ,1)) %>% show_query(), sql("SELECT CASE WHEN ((CONVERT(BIT, IIF(`x` = 0.0, 1.0, 0.0))) = 'TRUE') THEN (0.0) WHEN ((CONVERT(BIT, IIF(`x` = 0.0, 1.0, 0.0))) = 'FALSE') THEN (1.0) END AS `x`\nFROM `df`") ) }) dbplyr/tests/testthat/test-partial_eval.R0000644000176200001440000000100213066543301020305 0ustar liggesuserscontext("partial_eval") test_that("subsetting always evaluated locally", { x <- list(a = 1, b = 1) y <- c(2, 1) correct <- quote(`_var` == 1) expect_equal(partial_eval(quote(`_var` == x$a)), 
correct) expect_equal(partial_eval(quote(`_var` == x[[2]])), correct) expect_equal(partial_eval(quote(`_var` == y[2])), correct) }) test_that("namespace operators always evaluated locally", { expect_equal(partial_eval(quote(base::sum(1, 2))), 3) expect_equal(partial_eval(quote(base:::sum(1, 2))), 3) }) dbplyr/tests/testthat/test-translate-math.R0000644000176200001440000000360513174465455020616 0ustar liggesuserscontext("translate-math") test_that("basic arithmetic is correct", { expect_equal(translate_sql(1 + 2), sql("1.0 + 2.0")) expect_equal(translate_sql(2 * 4), sql("2.0 * 4.0")) expect_equal(translate_sql(5 ^ 2), sql("POWER(5.0, 2.0)")) expect_equal(translate_sql(100L %% 3L), sql("100 % 3")) }) test_that("small numbers aren't converted to 0", { expect_equal(translate_sql(1e-9), sql("1e-09")) }) # minus ------------------------------------------------------------------- test_that("unary minus flips sign of number", { expect_equal(translate_sql(-10L), sql("-10")) expect_equal(translate_sql(x == -10), sql('"x" = -10.0')) expect_equal(translate_sql(x %in% c(-1L, 0L)), sql('"x" IN (-1, 0)')) }) test_that("unary minus wraps non-numeric expressions", { expect_equal(translate_sql(-(1L + 2L)), sql("-(1 + 2)")) expect_equal(translate_sql(-mean(x, na.rm = TRUE), window = FALSE), sql('-AVG("x")')) }) test_that("binary minus subtracts", { expect_equal(translate_sql(1L - 10L), sql("1 - 10")) }) # log --------------------------------------------------------------------- test_that("log base comes first", { expect_equal(translate_sql(log(x, 10)), sql('LOG(10.0, "x")')) }) test_that("log becomes ln", { expect_equal(translate_sql(log(x)), sql('LN("x")')) }) test_that("sqlite mimics two argument log", { translate_sqlite <- function(...) { translate_sql(..., con = src_memdb()$con) } expect_equal(translate_sqlite(log(x)), sql('LOG(`x`)')) expect_equal(translate_sqlite(log(x, 10)), sql('LOG(`x`) / LOG(10.0)')) }) test_that("postgres mimics two argument log", { translate_postgres <- function(...) 
{ translate_sql(..., con = simulate_postgres()) } expect_equal(translate_postgres(log(x)), sql('LN(\"x\")')) expect_equal(translate_postgres(log(x, 10)), sql('LOG(\"x\") / LOG(10.0)')) expect_equal(translate_postgres(log(x, 10L)), sql('LOG(\"x\") / LOG(10)')) }) dbplyr/tests/testthat/test-schema.R0000644000176200001440000000223113173736576017130 0ustar liggesuserscontext("schema") # Create database with a schema sqlite_con_with_aux <- function() { tmp <- tempfile() con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") DBI::dbExecute(con, paste0("ATTACH '", tmp, "' AS aux")) con } test_that("can refer to default schema explicitly", { con <- sqlite_con_with_aux() on.exit(DBI::dbDisconnect(con)) DBI::dbExecute(con, "CREATE TABLE t1 (x)") expect_equal(tbl_vars(tbl(con, "t1")), "x") expect_equal(tbl_vars(tbl(con, in_schema("main", "t1"))), "x") }) test_that("can distinguish 'schema.table' from 'schema'.'table'", { con <- sqlite_con_with_aux() on.exit(DBI::dbDisconnect(con)) DBI::dbExecute(con, "CREATE TABLE aux.t1 (x, y, z)") DBI::dbExecute(con, "CREATE TABLE 'aux.t1' (a, b, c)") expect_equal(tbl_vars(tbl(con, in_schema("aux", "t1"))), c("x", "y", "z")) expect_equal(tbl_vars(tbl(con, ident("aux.t1"))), c("a", "b", "c")) }) test_that("can create a new table in non-default schema", { con <- sqlite_con_with_aux() on.exit(DBI::dbDisconnect(con)) aux_mtcars <- copy_to(con, mtcars, in_schema("aux", "mtcars"), temporary = FALSE) expect_equal(tbl_vars(aux_mtcars), tbl_vars(mtcars)) }) dbplyr/tests/testthat/test-do.R0000644000176200001440000000237013066535257016270 0ustar liggesuserscontext("do") test_that("ungrouped data collected first", { out <- memdb_frame(x = 1:2) %>% do(head(.)) expect_equal(out, tibble(x = 1:2)) }) test_that("named argument become list columns", { mf <- memdb_frame( g = rep(1:3, 1:3), x = 1:6 ) %>% group_by(g) out <- mf %>% do(nrow = nrow(.), ncol = ncol(.)) expect_equal(out$nrow, list(1, 2, 3)) expect_equal(out$ncol, list(2, 2, 2)) }) test_that("unnamed results bound together by row", { mf <- memdb_frame( g = c(1, 1, 2, 2), x = c(3, 9, 4, 9) ) %>% group_by(g) first <- mf %>% do(head(., 1)) expect_equal_tbl(first, tibble(g = c(1, 2), x = c(3, 4))) }) test_that("Results respect select", { mf <- memdb_frame( g = c(1, 1, 2, 2), x = c(3, 9, 4, 9), y = 1:4, z = 4:1 ) %>% group_by(g) out <- mf %>% select(x) %>% do(ncol = ncol(.)) expect_equal(out$g, c(1, 2)) expect_equal(out$ncol, list(2L, 2L)) }) test_that("results independent of chunk_size", { mf <- memdb_frame( g = rep(1:3, 1:3), x = 1:6 ) %>% group_by(g) nrows <- function(group, n) { unlist(do(group, nrow = nrow(.), .chunk_size = n)$nrow) } expect_equal(nrows(mf, 1), c(1, 2, 3)) expect_equal(nrows(mf, 2), c(1, 2, 3)) expect_equal(nrows(mf, 10), c(1, 2, 3)) }) dbplyr/tests/testthat/test-translate-literals.R0000644000176200001440000000105613070721623021465 0ustar liggesuserscontext("translate-literals") test_that("default logical translation is SQL 99", { expect_equal(translate_sql(FALSE), sql("FALSE")) expect_equal(translate_sql(TRUE), sql("TRUE")) expect_equal(translate_sql(NA), sql("NULL")) }) test_that("SQLite logical translation is to integers", { translate_sql_sqlite <- function(x) { translate_sql(!! 
enquo(x), con = simulate_sqlite()) } expect_equal(translate_sql_sqlite(FALSE), sql("0")) expect_equal(translate_sql_sqlite(TRUE), sql("1")) expect_equal(translate_sql_sqlite(NA), sql("NULL")) }) dbplyr/tests/testthat/test-translate-oracle.R0000644000176200001440000000153313175643172021123 0ustar liggesuserscontext("translate-Oracle") test_that("custom scalar functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_oracle()) } expect_equal(trans(as.character(x)), sql("CAST(`x` AS VARCHAR(255))")) expect_equal(trans(as.double(x)), sql("CAST(`x` AS NUMBER)")) }) test_that("queries translate correctly", { mf <- lazy_frame(x = 1, src = simulate_oracle()) expect_match( mf %>% head() %>% sql_render(simulate_oracle()), sql("^SELECT [*] FROM [(]SELECT [*]\nFROM [(]`df`[)] [)] `[^`]*` WHERE ROWNUM [<][=] 6") ) expect_match( mf %>% group_by(x) %>% tally %>% ungroup() %>% tally() %>% sql_render(simulate_oracle()), sql("^SELECT COUNT[(][*][)] AS `nn`\nFROM [(]SELECT `x`, COUNT[(][*][)] AS `n`\nFROM [(]`df`[)] \nGROUP BY `x`[)] `[^`]*`$") ) }) dbplyr/tests/testthat/test-translate-odbc.R0000644000176200001440000000510713175643172020566 0ustar liggesuserscontext("translate-odbc") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_odbc("OdbcConnection")) } expect_equal(trans(as.numeric(x)), sql("CAST(`x` AS DOUBLE)")) expect_equal(trans(as.double(x)), sql("CAST(`x` AS DOUBLE)")) expect_equal(trans(as.integer(x)), sql("CAST(`x` AS INT)")) expect_equal(trans(as.logical(x)), sql("CAST(`x` AS BOOLEAN)")) expect_equal(trans(as.character(x)), sql("CAST(`x` AS STRING)")) expect_equal(trans(as.Date(x)), sql("CAST(`x` AS DATE)")) expect_equal(trans(paste0(x, y)), sql("CONCAT(`x`, `y`)")) expect_equal(trans(cosh(x)), sql("(EXP(`x`) + EXP(-(`x`))) / 2")) expect_equal(trans(sinh(x)), sql("(EXP(`x`) - EXP(-(`x`))) / 2")) expect_equal(trans(tanh(x)), sql("((EXP(`x`) - EXP(-(`x`))) / 2) / ((EXP(`x`) + EXP(-(`x`))) / 2)")) expect_equal(trans(round(10.1)), sql("ROUND(10.1, 0)")) expect_equal(trans(round(10.1, digits = 1)), sql("ROUND(10.1, 1)")) expect_equal(trans(coth(x)), sql("((EXP(`x`) + EXP(-(`x`))) / 2) / ((EXP(`x`) - EXP(-(`x`))) / 2)")) expect_equal(trans(paste(x, y)), sql("CONCAT_WS(' ', `x`,`y`)")) expect_equal(trans(paste(x, y, sep = ",")), sql("CONCAT_WS(',', `x`,`y`)")) }) test_that("custom aggregators translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_odbc("OdbcConnection")) } expect_equal(trans(sd(x)), sql("STDDEV_SAMP(`x`)")) expect_equal(trans(count()), sql("COUNT(*)")) expect_equal(trans(n()), sql("COUNT(*)")) expect_equal(trans(n_distinct(x)), sql("COUNT(DISTINCT `x`)")) }) test_that("custom window functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = TRUE, con = simulate_odbc("OdbcConnection")) } expect_equal(trans(sd(x, na.rm = TRUE)), sql("STDDEV_SAMP(`x`) OVER ()")) expect_equal(trans(count()), sql("COUNT(*) OVER ()")) expect_equal(trans(n()), sql("COUNT(*) OVER ()")) expect_equal(trans(n_distinct(x)), sql("COUNT(DISTINCT `x`) OVER ()")) }) test_that("queries translate correctly", { mf <- lazy_frame(x = 1, src = simulate_odbc("OdbcConnection")) expect_equal( mf %>% tally() %>% show_query(), sql("SELECT COUNT(*) AS `n`\nFROM `df`") ) expect_equal( mf %>% summarise(count = n()) %>% show_query(), sql("SELECT COUNT(*) AS `count`\nFROM `df`") ) }) 
dbplyr/tests/testthat/test-mutate.r0000644000176200001440000000521413221275427017216 0ustar liggesuserscontext("mutate") test_that("mutate computed before summarise", { mf <- memdb_frame(x = c(1, 2, 3), y = c(9, 8, 7)) out <- mutate(mf, z = x + y) %>% summarise(sum_z = sum(z, na.rm = TRUE)) %>% collect() expect_equal(out$sum_z, 30) }) test_that("two mutates equivalent to one", { mf <- memdb_frame(x = c(1, 5, 9), y = c(3, 12, 11)) df1 <- mf %>% mutate(x2 = x * 2, y4 = y * 4) %>% collect() df2 <- mf %>% collect() %>% mutate(x2 = x * 2, y4 = y * 4) expect_equal_tbl(df1, df2) }) test_that("can refer to fresly created values", { out1 <- memdb_frame(x1 = 1) %>% mutate(x2 = x1 + 1, x3 = x2 + 1, x4 = x3 + 1) %>% collect() expect_equal(out1, tibble(x1 = 1, x2 = 2, x3 = 3, x4 = 4)) out2 <- memdb_frame(x = 1) %>% mutate(x = x + 1, x = x + 1, x = x + 1) %>% collect() expect_equal(out2, tibble(x = 4)) }) test_that("queries are not nested unnecessarily", { # Should only be one query deep sql <- memdb_frame(x = 1) %>% mutate(y = x + 1, a = y + 1, b = y + 1) %>% sql_build() expect_s3_class(sql$from, "select_query") expect_s3_class(sql$from$from, "ident") }) test_that("maintains order of existing columns (#3216, #3223)", { lazy <- lazy_frame(x = 1, y = 2) %>% mutate(z = 3, y = 4, y = 5) expect_equal(op_vars(lazy), c("x", "y", "z")) }) test_that("supports overwriting variables (#3222)", { df <- memdb_frame(x = 1, y = 2) %>% mutate(y = 4, y = 5) %>% collect() expect_equal(df, tibble(x = 1, y = 5)) df <- memdb_frame(x = 1, y = 2) %>% mutate(y = 4, y = y + 1) %>% collect() expect_equal(df, tibble(x = 1, y = 5)) df <- memdb_frame(x = 1, y = 2) %>% mutate(y = 4, y = x + 4) %>% collect() expect_equal(df, tibble(x = 1, y = 5)) }) # SQL generation ----------------------------------------------------------- test_that("mutate calls windowed versions of sql functions", { dfs <- test_frame_windowed(x = 1:4, g = rep(c(1, 2), each = 2)) out <- map(dfs, . %>% group_by(g) %>% mutate(r = as.numeric(row_number(x)))) expect_equal(out$df$r, c(1, 2, 1, 2)) expect_equal_tbls(out) }) test_that("recycled aggregates generate window function", { dfs <- test_frame_windowed(x = 1:4, g = rep(c(1, 2), each = 2)) out <- map(dfs, . %>% group_by(g) %>% mutate(r = x > mean(x, na.rm = TRUE))) expect_equal(out$df$r, c(FALSE, TRUE, FALSE, TRUE)) expect_equal_tbls(out) }) test_that("cumulative aggregates generate window function", { dfs <- test_frame_windowed(x = 1:4, g = rep(c(1, 2), each = 2)) out <- map(dfs, . 
%>% group_by(g) %>% arrange(x) %>% mutate(r = as.numeric(cumsum(x))) ) expect_equal(out$df$r, c(1, 3, 3, 7)) expect_equal_tbls(out) }) dbplyr/tests/testthat/test-win_over.R0000644000176200001440000000145713070720726017513 0ustar liggesuserscontext("win_over") test_that("over() only requires first argument", { expect_equal(win_over("X"), sql("'X' OVER ()")) }) test_that("multiple group by or order values don't have parens", { expect_equal( win_over(ident("x"), order = c("x", "y")), sql('"x" OVER (ORDER BY "x", "y")') ) expect_equal( win_over(ident("x"), partition = c("x", "y")), sql('"x" OVER (PARTITION BY "x", "y")') ) }) test_that("connection affects quoting window function fields", { old <- set_current_con(simulate_test()) on.exit(set_current_con(old)) expect_equal(win_over(ident("x")), sql("`x` OVER ()")) expect_equal( win_over(ident("x"), partition = "x"), sql("`x` OVER (PARTITION BY `x`)") ) expect_equal( win_over(ident("x"), order = "x"), sql("`x` OVER (ORDER BY `x`)") ) }) dbplyr/tests/testthat/test-sql-render.R0000644000176200001440000001003213053067477017734 0ustar liggesuserscontext("SQL: render") # These test the full SQL rendering pipeline by running very simple examples # against a live SQLite database. # Single table ------------------------------------------------------------ test_that("rendering table wraps in SELECT *", { out <- memdb_frame(x = 1) expect_match(out %>% sql_render, "^SELECT [*]\nFROM `[^`]*`$") expect_equal(out %>% collect, data_frame(x = 1)) }) test_that("quoting for rendering mutated grouped table", { out <- memdb_frame(x = 1, y = 2) %>% mutate(y = x) expect_match(out %>% sql_render, "^SELECT `x`, `x` AS `y`\nFROM `[^`]*`$") expect_equal(out %>% collect, data_frame(x = 1, y = 1)) }) test_that("quoting for rendering ordered grouped table", { out <- memdb_frame(x = 1, y = 2) %>% group_by(x) %>% arrange(y) expect_match(out %>% sql_render, "^SELECT [*]\nFROM `[^`]*`\nORDER BY `y`$") expect_equal(out %>% collect, data_frame(x = 1, y = 2)) }) test_that("quoting for rendering summarized grouped table", { out <- memdb_frame(x = 1) %>% group_by(x) %>% summarize(n = n()) expect_match(out %>% sql_render, "^SELECT `x`, COUNT[(][)] AS `n`\nFROM `[^`]*`\nGROUP BY `x`$") expect_equal(out %>% collect, data_frame(x = 1, n = 1L)) }) # Single table verbs ------------------------------------------------------ test_that("select quotes correctly", { out <- memdb_frame(x = 1, y = 1) %>% select(x) %>% collect() expect_equal(out, data_frame(x = 1)) }) test_that("select can rename", { out <- memdb_frame(x = 1, y = 2) %>% select(y = x) %>% collect() expect_equal(out, data_frame(y = 1)) }) test_that("distinct adds DISTINCT suffix", { out <- memdb_frame(x = c(1, 1)) %>% distinct() expect_match(out %>% sql_render(), "SELECT DISTINCT") expect_equal(out %>% collect(), data_frame(x = 1)) }) test_that("distinct over columns uses GROUP BY", { out <- memdb_frame(x = c(1, 2), y = c(1, 1)) %>% distinct(y) expect_match(out %>% sql_render(), "SELECT `y`.*GROUP BY `y`") expect_equal(out %>% collect(), data_frame(y = 1)) }) test_that("head limits rows returned", { out <- memdb_frame(x = 1:100) %>% head(10) %>% collect() expect_equal(nrow(out), 10) }) test_that("head accepts fractional input", { out <- memdb_frame(x = 1:100) %>% head(10.5) %>% collect() expect_equal(nrow(out), 10) }) test_that("head renders to integer fractional input", { out <- memdb_frame(x = 1:100) %>% head(10.5) %>% sql_render() expect_match(out, "LIMIT 10$") }) test_that("head works with huge whole numbers", { out <- 
memdb_frame(x = 1:100) %>% head(1e10) %>% collect() expect_equal(out, data_frame(x = 1:100)) }) test_that("mutate overwrites previous variables", { df <- memdb_frame(x = 1:5) %>% mutate(x = x + 1) %>% mutate(x = x + 1) %>% collect() expect_equal(names(df), "x") expect_equal(df$x, 1:5 + 2) }) test_that("sequence of operations work", { out <- memdb_frame(x = c(1, 2, 3, 4)) %>% select(y = x) %>% mutate(z = 2 * y) %>% filter(z == 2) %>% collect() expect_equal(out, data_frame(y = 1, z = 2)) }) test_that("compute creates correct column names", { out <- memdb_frame(x = 1) %>% group_by(x) %>% summarize(n = n()) %>% compute() %>% collect() expect_equal(out, data_frame(x = 1, n = 1L)) }) # Joins make valid sql ---------------------------------------------------- test_that("join generates correct sql", { lf1 <- memdb_frame(x = 1, y = 2) lf2 <- memdb_frame(x = 1, z = 3) out <- lf1 %>% inner_join(lf2, by = "x") %>% collect() expect_equal(out, data.frame(x = 1, y = 2, z = 3)) }) test_that("semi join generates correct sql", { lf1 <- memdb_frame(x = c(1, 2), y = c(2, 3)) lf2 <- memdb_frame(x = 1) lf3 <- inner_join(lf1, lf2, by = "x") expect_equal(op_vars(lf3), c("x", "y")) out <- collect(lf3) expect_equal(out, data.frame(x = 1, y = 2)) }) test_that("set ops generates correct sql", { lf1 <- memdb_frame(x = 1) lf2 <- memdb_frame(x = c(1, 2)) out <- lf1 %>% union(lf2) %>% collect() expect_equal(out, data.frame(x = c(1, 2))) }) dbplyr/tests/testthat/test-translate-MySQL.R0000644000176200001440000000067113173734270020623 0ustar liggesuserscontext("translate-MySQL") test_that("use CHAR type for as.character", { expect_equivalent( translate_sql(as.character(x), con = simulate_mysql()), sql("CAST(`x` AS CHAR)") ) }) test_that("logicals converted to integer correctly", { skip_if_no_db("mysql") df1 <- data.frame(x = c(TRUE, FALSE, NA)) df2 <- src_test("mysql") %>% copy_to(df1, random_table_name()) %>% collect() expect_identical(df2$x, c(1L, 0L, NA)) }) dbplyr/tests/testthat/test-distinct.R0000644000176200001440000000141413066534450017477 0ustar liggesuserscontext("distinct") df <- tibble( x = c(1, 1, 1, 1), y = c(1, 1, 2, 2), z = c(1, 2, 1, 2) ) dfs <- test_load(df) test_that("distinct equivalent to local unique when keep_all is TRUE", { dfs %>% map(. %>% distinct()) %>% expect_equal_tbls(unique(df)) }) test_that("distinct for single column equivalent to local unique (#1937)", { dfs %>% map(. %>% distinct(x, .keep_all = FALSE)) %>% expect_equal_tbls(unique(df["x"])) dfs %>% map(. %>% distinct(y, .keep_all = FALSE)) %>% expect_equal_tbls(unique(df["y"])) }) test_that("distinct throws error if column is specified and .keep_all is TRUE", { mf <- memdb_frame(x = 1:10) expect_error( mf %>% distinct(x, .keep_all = TRUE) %>% collect(), "specified columns.*[.]keep_all" ) }) dbplyr/tests/testthat/test-colwise.R0000644000176200001440000000060413066751304017322 0ustar liggesuserscontext("colwise") test_that("tbl_dbi support colwise variants", { mf <- memdb_frame(x = 1:5, y = factor(letters[1:5])) exp <- mf %>% collect() %>% mutate(y = as.character(y)) expect_message( mf1 <- mutate_if(mf, is.factor, as.character), "on the first 100 rows" ) expect_equal_tbl(mf1, exp) mf2 <- mutate_at(mf, "y", as.character) expect_equal_tbl(mf2, exp) }) dbplyr/tests/testthat/test-escape.R0000644000176200001440000000114213070722231017103 0ustar liggesuserscontext("escape") # See test-sql-escape for SQL generation tests. 
These tests that the # generate SQL actually works across the three main open source backends test_that("multiplication works", { dfs <- test_frame(x = c(1, NA)) %>% map(. %>% mutate(y = coalesce(x > 0, TRUE))) out <- dfs %>% map(collect) # SQLite treats as integers expect_identical(out$sqlite$y, c(1L, 1L)) # MySQL converts to doubles if (!is.null(out$mysql)) expect_identical(out$mysql$y, c(1, 1)) # PostgresSQL keeps as logical if (!is.null(out$postgres)) expect_identical(out$postgres$y, c(TRUE, TRUE)) }) dbplyr/tests/testthat/test-compute.R0000644000176200001440000000171213174434301017325 0ustar liggesuserscontext("compute") test_that("compute doesn't change representation", { mf1 <- memdb_frame(x = 5:1, y = 1:5, z = "a") expect_equal_tbl(mf1, mf1 %>% compute) expect_equal_tbl(mf1, mf1 %>% compute %>% compute) mf2 <- mf1 %>% mutate(z = x + y) expect_equal_tbl(mf2, mf2 %>% compute) }) test_that("compute can create indexes", { mfs <- test_frame(x = 5:1, y = 1:5, z = 10) mfs %>% map(. %>% compute(indexes = c("x", "y"))) %>% expect_equal_tbls() mfs %>% map(. %>% compute(indexes = list("x", "y", c("x", "y")))) %>% expect_equal_tbls() mfs %>% map(. %>% compute(indexes = "x", unique_indexes = "y")) %>% expect_equal_tbls() mfs %>% map(. %>% compute(unique_indexes = list(c("x", "z"), c("y", "z")))) %>% expect_equal_tbls() }) test_that("unique index fails if values are duplicated", { mfs <- test_frame(x = 5:1, y = "a", ignore = "df") map(mfs, function(.) expect_error(compute(., unique_indexes = "y"))) }) dbplyr/tests/testthat/test-explain.R0000644000176200001440000000024713067742005017317 0ustar liggesuserscontext("explain") test_that("basic pipeline is correct", { mf <- memdb_frame(x = 1:5) %>% filter(x > 5) expect_output_file(explain(mf), "explain-sqlite.txt") }) dbplyr/tests/testthat/test-sets.R0000644000176200001440000000356313174434200016633 0ustar liggesuserscontext("sets") test_that("column order is matched", { df1 <- memdb_frame(x = 1, y = 2) df2 <- memdb_frame(y = 1, x = 2) out <- collect(dplyr::union(df1, df2)) expect_equal(out, tibble(x = c(1, 2), y = c(2, 1))) }) test_that("missing columns filled with NULL", { df1 <- memdb_frame(x = 1) df2 <- memdb_frame(y = 1) out <- collect(dplyr::union(df1, df2)) expect_equal(out, tibble(x = c(1, NA), y = c(NA, 1))) }) # SQL generation ---------------------------------------------------------- test_that("union and union all work for all backends", { df <- tibble(x = 1:10, y = x %% 2) tbls_full <- test_load(df) tbls_filter <- test_load(filter(df, y == 0)) tbls_full %>% map2(tbls_filter, union) %>% expect_equal_tbls() tbls_full %>% map2(tbls_filter, union_all) %>% expect_equal_tbls() }) test_that("intersect and setdiff work for supported backends", { df <- tibble(x = 1:10, y = x %% 2) # MySQL doesn't support EXCEPT or INTERSECT tbls_full <- test_load(df, ignore = c("mysql", "MariaDB")) tbls_filter <- test_load(filter(df, y == 0), ignore = c("mysql", "MariaDB")) tbls_full %>% map2(tbls_filter, intersect) %>% expect_equal_tbls() tbls_full %>% map2(tbls_filter, setdiff) %>% expect_equal_tbls() }) test_that("SQLite warns if set op attempted when tbl has LIMIT", { mf <- memdb_frame(x = 1:2) m1 <- head(mf, 1) expect_error(dplyr::union(mf, m1), "does not support") expect_error(dplyr::union(m1, mf), "does not support") }) test_that("other backends can combine with a limit", { df <- tibble(x = 1:2) # sqlite only allows limit at top level tbls_full <- test_load(df, ignore = "sqlite") tbls_head <- lapply(test_load(df, ignore = "sqlite"), head, n = 1) tbls_full 
%>% map2(tbls_head, union) %>% expect_equal_tbls() tbls_full %>% map2(tbls_head, union_all) %>% expect_equal_tbls() }) dbplyr/tests/testthat/test-sql-build.R0000644000176200001440000001057713174404170017557 0ustar liggesuserscontext("SQL: build") test_that("base source of lazy frame is always 'df'", { out <- lazy_frame(x = 1, y = 5) %>% sql_build() expect_equal(out, ident("df")) }) test_that("connection affects SQL generation", { lf <- lazy_frame(x = 1, y = 5) %>% summarise(n = n()) out1 <- lf %>% sql_build() out2 <- lf %>% sql_build(con = simulate_postgres()) expect_equal(out1$select, sql('COUNT() AS "n"')) expect_equal(out2$select, sql('COUNT(*) AS \"n\"')) }) # select and rename ------------------------------------------------------- test_that("select picks variables", { out <- lazy_frame(x1 = 1, x2 = 1, x3 = 2) %>% select(x1:x2) %>% sql_build() expect_equal(out$select, ident("x1" = "x1", "x2" = "x2")) }) test_that("select renames variables", { out <- lazy_frame(x1 = 1, x2 = 1, x3 = 2) %>% select(y = x1, z = x2) %>% sql_build() expect_equal(out$select, ident("y" = "x1", "z" = "x2")) }) test_that("select can refer to variables in local env", { vars <- c("x", "y") out <- lazy_frame(x = 1, y = 1) %>% select(one_of(vars)) %>% sql_build() expect_equal(out$select, ident("x" = "x", "y" = "y")) }) test_that("rename preserves existing vars", { out <- lazy_frame(x = 1, y = 1) %>% rename(z = y) %>% sql_build() expect_equal(out$select, ident("x" = "x", "z" = "y")) }) # arrange ----------------------------------------------------------------- test_that("arrange generates order_by", { out <- lazy_frame(x = 1, y = 1) %>% arrange(x) %>% sql_build() expect_equal(out$order_by, sql('"x"')) }) test_that("arrange converts desc", { out <- lazy_frame(x = 1, y = 1) %>% arrange(desc(x)) %>% sql_build() expect_equal(out$order_by, sql('"x" DESC')) }) test_that("grouped arrange doesn't order by groups", { out <- lazy_frame(x = 1, y = 1) %>% group_by(x) %>% arrange(y) %>% sql_build() expect_equal(out$order_by, sql('"y"')) }) # summarise --------------------------------------------------------------- test_that("summarise generates group_by and select", { out <- lazy_frame(g = 1) %>% group_by(g) %>% summarise(n = n()) %>% sql_build() expect_equal(out$group_by, sql('"g"')) expect_equal(out$select, sql('"g"', 'COUNT() AS "n"')) }) # filter ------------------------------------------------------------------ test_that("filter generates simple expressions", { out <- lazy_frame(x = 1) %>% filter(x > 1L) %>% sql_build() expect_equal(out$where, sql('"x" > 1')) }) # mutate ------------------------------------------------------------------ test_that("mutate generates simple expressions", { out <- lazy_frame(x = 1) %>% mutate(y = x + 1L) %>% sql_build() expect_equal(out$select, sql('"x"', '"x" + 1 AS "y"')) }) # ungroup by -------------------------------------------------------------- test_that("ungroup drops PARTITION BY", { out <- lazy_frame(x = 1) %>% group_by(x) %>% ungroup() %>% mutate(x = rank(x)) %>% sql_build() expect_equal(out$select, sql('rank() OVER (ORDER BY "x") AS "x"')) }) # distinct ---------------------------------------------------------------- test_that("distinct sets flagged", { out1 <- lazy_frame(x = 1) %>% select() %>% sql_build() expect_false(out1$distinct) out2 <- lazy_frame(x = 1) %>% distinct() %>% sql_build() expect_true(out2$distinct) }) # head -------------------------------------------------------------------- test_that("head limits rows", { out <- lazy_frame(x = 1:100) %>% head(10) %>% sql_build() 
expect_equal(out$limit, 10) }) # joins ------------------------------------------------------------------- test_that("join captures both tables", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) out <- inner_join(lf1, lf2) %>% sql_build() expect_s3_class(out, "join_query") expect_equal(op_vars(out$x), c("x", "y")) expect_equal(op_vars(out$y), c("x", "z")) expect_equal(out$type, "inner") }) test_that("semi join captures both tables", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) out <- semi_join(lf1, lf2) %>% sql_build() expect_equal(op_vars(out$x), c("x", "y")) expect_equal(op_vars(out$y), c("x", "z")) expect_equal(out$anti, FALSE) }) test_that("set ops captures both tables", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) out <- union(lf1, lf2) %>% sql_build() expect_equal(out$type, "UNION") }) dbplyr/tests/testthat/test-summarise.r0000644000176200001440000000074113173465370017730 0ustar liggesuserscontext("Summarise") test_that("summarise peels off a single layer of grouping", { mf1 <- memdb_frame(x = 1, y = 1, z = 2) %>% group_by(x, y) mf2 <- mf1 %>% summarise(n = n()) expect_equal(group_vars(mf2), "x") mf3 <- mf2 %>% summarise(n = n()) expect_equal(group_vars(mf3), character()) }) test_that("summarise performs partial evaluation", { mf1 <- memdb_frame(x = 1) val <- 1 mf2 <- mf1 %>% summarise(y = x == val) %>% collect() expect_equal(mf2$y, 1) }) dbplyr/tests/testthat/test-lazy-ops.R0000644000176200001440000001032013173721605017427 0ustar liggesuserscontext("lazy-ops") # op_vars ----------------------------------------------------------------- test_that("select reduces variables", { out <- mtcars %>% tbl_lazy() %>% select(mpg:disp) expect_equal(op_vars(out), c("mpg", "cyl", "disp")) }) test_that("rename preserves existing", { out <- data_frame(x = 1, y = 2) %>% tbl_lazy() %>% rename(z = y) expect_equal(op_vars(out), c("x", "z")) }) test_that("mutate adds new", { out <- data_frame(x = 1) %>% tbl_lazy() %>% mutate(y = x + 1, z = y + 1) expect_equal(op_vars(out), c("x", "y", "z")) }) test_that("summarise replaces existing", { out <- data_frame(x = 1, y = 2) %>% tbl_lazy() %>% summarise(z = 1) expect_equal(op_vars(out), "z") }) test_that("summarised and mutated vars are always named", { mf <- dbplyr::memdb_frame(a = 1) out1 <- mf %>% summarise(1) %>% op_vars() expect_equal(out1, "1") out2 <- mf %>% mutate(1) %>% op_vars() expect_equal(out2, c("a", "1")) }) test_that("distinct has complicated rules", { out <- lazy_frame(x = 1, y = 2) %>% distinct() expect_equal(op_vars(out), c("x", "y")) out <- lazy_frame(x = 1, y = 2) %>% distinct(x, .keep_all = TRUE) expect_equal(op_vars(out), c("x", "y")) out <- lazy_frame(x = 1, y = 2, z = 3) %>% distinct(x, y) expect_equal(op_vars(out), c("x", "y")) out <- lazy_frame(x = 1, y = 2, z = 3) %>% group_by(x) %>% distinct(y) expect_equal(op_vars(out), c("x", "y")) }) test_that("grouped summary keeps groups", { out <- data_frame(g = 1, x = 1) %>% tbl_lazy() %>% group_by(g) %>% summarise(y = 1) expect_equal(op_vars(out), c("g", "y")) }) test_that("joins get vars from both left and right", { out <- left_join( lazy_frame(x = 1, y = 1), lazy_frame(x = 2, z = 2), by = "x" ) expect_equal(op_vars(out), c("x", "y", "z")) }) test_that("semi joins get vars from left", { out <- semi_join( lazy_frame(x = 1, y = 1), lazy_frame(x = 2, z = 2), by = "x" ) expect_equal(op_vars(out), c("x", "y")) }) # op_grps ----------------------------------------------------------------- test_that("group_by overrides existing groups", { 
df <- data_frame(g1 = 1, g2 = 2, x = 3) %>% tbl_lazy() out1 <- df %>% group_by(g1) expect_equal(op_grps(out1), "g1") out2 <- out1 %>% group_by(g2) expect_equal(op_grps(out2), "g2") }) test_that("group_by increases grouping if add = TRUE", { df <- data_frame(g1 = 1, g2 = 2, x = 3) %>% tbl_lazy() out <- df %>% group_by(g1) %>% group_by(g2, add = TRUE) expect_equal(op_grps(out), c("g1", "g2")) }) test_that("rename renames grouping vars", { df <- lazy_frame(a = 1, b = 2) %>% group_by(a) %>% rename(c = a) expect_equal(op_grps(df), "c") }) test_that("summarise drops one grouping level", { df <- data_frame(g1 = 1, g2 = 2, x = 3) %>% tbl_lazy() %>% group_by(g1, g2) out1 <- df %>% summarise(y = 1) out2 <- out1 %>% summarise(y = 2) expect_equal(op_grps(out1), "g1") expect_equal(op_grps(out2), character()) }) test_that("ungroup drops all groups", { out1 <- lazy_frame(g1 = 1, g2 = 2) %>% group_by(g1, g2) %>% ungroup() out2 <- lazy_frame(g1 = 1, g2 = 2) %>% group_by(g1, g2) %>% ungroup() %>% rename(g3 = g1) expect_equal(op_grps(out1), character()) expect_equal(op_grps(out2), character()) }) # op_sort ----------------------------------------------------------------- test_that("unsorted gives NULL", { out <- lazy_frame(x = 1:3, y = 3:1) expect_equal(op_sort(out), NULL) }) test_that("arranges captures DESC", { out <- lazy_frame(x = 1:3, y = 3:1) %>% arrange(desc(x)) expect_equal(op_sort(out), list(~desc(x))) }) test_that("multiple arranges combine", { out <- lazy_frame(x = 1:3, y = 3:1) %>% arrange(x) %>% arrange(y) out <- arrange(arrange(lazy_frame(x = 1:3, y = 3:1), x), y) expect_equal(op_sort(out), list(~x, ~y)) }) test_that("preserved across compute and collapse", { df1 <- memdb_frame(x = sample(10)) %>% arrange(x) df2 <- compute(df1) expect_equal(op_sort(df2), list(~x)) df3 <- collapse(df1) expect_equal(op_sort(df3), list(~x)) }) # head -------------------------------------------------------------------- test_that("two heads are equivalent to one", { out <- lazy_frame(x = 1:10) %>% head(3) %>% head(5) expect_equal(out$ops$args$n, 3) }) dbplyr/tests/testthat/helper-output.R0000644000176200001440000000026312732751461017521 0ustar liggesusersoutput_file <- function(filename) file.path("output", filename) expect_output_file_rel <- function(x, filename) { expect_output_file(x, output_file(filename), update = TRUE) } dbplyr/tests/testthat/test-joins.R0000644000176200001440000000632413174424334017004 0ustar liggesuserscontext("joins") df1 <- memdb_frame(x = 1:5, y = 1:5) df2 <- memdb_frame(a = 5:1, b = 1:5) df3 <- memdb_frame(x = 1:5, z = 1:5) df4 <- memdb_frame(a = 5:1, z = 5:1) test_that("named by join by different x and y vars", { j1 <- collect(inner_join(df1, df2, c("x" = "a"))) expect_equal(names(j1), c("x", "y", "b")) expect_equal(nrow(j1), 5) j2 <- collect(inner_join(df1, df2, c("x" = "a", "y" = "b"))) expect_equal(names(j2), c("x", "y")) expect_equal(nrow(j2), 1) }) test_that("named by join by same z vars", { j1 <- collect(inner_join(df3, df4, c("z" = "z"))) expect_equal(nrow(j1), 5) expect_equal(names(j1), c("x", "z", "a")) }) test_that("join with both same and different vars", { j1 <- collect(left_join(df1, df3, by = c("y" = "z", "x"))) expect_equal(nrow(j1), 5) expect_equal(names(j1), c("x", "y")) }) test_that("inner join doesn't result in duplicated columns ", { expect_equal(colnames(inner_join(df1, df1)), c("x", "y")) }) test_that("self-joins allowed with named by", { fam <- memdb_frame(id = 1:5, parent = c(NA, 1, 2, 2, 4)) j1 <- fam %>% left_join(fam, by = c("parent" = "id")) j2 <- fam %>% 
inner_join(fam, by = c("parent" = "id")) expect_equal(op_vars(j1), c("id", "parent.x", "parent.y")) expect_equal(op_vars(j2), c("id", "parent.x", "parent.y")) expect_equal(nrow(collect(j1)), 5) expect_equal(nrow(collect(j2)), 4) j3 <- collect(semi_join(fam, fam, by = c("parent" = "id"))) j4 <- collect(anti_join(fam, fam, by = c("parent" = "id"))) expect_equal(j3, filter(collect(fam), !is.na(parent))) expect_equal(j4, filter(collect(fam), is.na(parent))) }) test_that("suffix modifies duplicated variable names", { fam <- memdb_frame(id = 1:5, parent = c(NA, 1, 2, 2, 4)) j1 <- collect(inner_join(fam, fam, by = c("parent" = "id"), suffix = c("1", "2"))) j2 <- collect(left_join(fam, fam, by = c("parent" = "id"), suffix = c("1", "2"))) expect_named(j1, c("id", "parent1", "parent2")) expect_named(j2, c("id", "parent1", "parent2")) }) test_that("join variables always disambiguated (#2823)", { # Even if the new variable conflicts with an existing variable df1 <- dbplyr::memdb_frame(a = 1, b.x = 1, b = 1) df2 <- dbplyr::memdb_frame(a = 1, b = 1) both <- collect(dplyr::left_join(df1, df2, by = "a")) expect_named(both, c("a", "b.x", "b.x.x", "b.y")) }) test_that("join functions error on column not found for SQL sources #1928", { # Rely on dplyr to test precise code expect_error( left_join(memdb_frame(x = 1:5), memdb_frame(y = 1:5), by = "x"), "missing|(not found)" ) expect_error( left_join(memdb_frame(x = 1:5), memdb_frame(y = 1:5), by = "y"), "missing|(not found)" ) expect_error( left_join(memdb_frame(x = 1:5), memdb_frame(y = 1:5)), "[Nn]o common variables" ) }) # All sources ------------------------------------------------------------- test_that("sql generated correctly for all sources", { x <- test_frame(a = letters[1:7], c = 2:8) y <- test_frame(a = letters[1:4], b = c(1, 2, 3, NA)) xy <- map2(x, y, left_join) expect_equal_tbls(xy) }) test_that("full join is promoted to cross join for no overlapping variables", { result <- df1 %>% full_join(df2, by = character()) %>% collect() expect_equal(nrow(result), 25) }) dbplyr/tests/testthat/helper-src.R0000644000176200001440000000154313174635560016754 0ustar liggesusersif (test_srcs$length() == 0) { test_register_src("df", dplyr::src_df(env = new.env(parent = emptyenv()))) test_register_con("sqlite", RSQLite::SQLite(), ":memory:") if (identical(Sys.getenv("TRAVIS"), "true")) { test_register_con("postgres", RPostgreSQL::PostgreSQL(), dbname = "test", user = "travis", password = "" ) } else { test_register_con("mysql", RMySQL::MySQL(), dbname = "test", host = "localhost", user = Sys.getenv("USER") ) test_register_con("MariaDB", RMariaDB::MariaDB(), dbname = "test", host = "localhost", user = Sys.getenv("USER") ) test_register_con("postgres", RPostgreSQL::PostgreSQL(), dbname = "test", host = "localhost", user = "" ) } } skip_if_no_db <- function(db) { if (!test_srcs$has(db)) skip(paste0("No ", db)) } dbplyr/tests/testthat/test-translate-teradata.r0000644000176200001440000000340413221275427021476 0ustar liggesuserscontext("translate-teradata") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_teradata()) } expect_equal(trans(x != y), sql("`x` <> `y`")) expect_equal(trans(as.numeric(x)), sql("CAST(`x` AS NUMERIC)")) expect_equal(trans(as.double(x)), sql("CAST(`x` AS NUMERIC)")) expect_equal(trans(as.character(x)), sql("CAST(`x` AS VARCHAR(MAX))")) expect_equal(trans(log(x)), sql("LN(`x`)")) expect_equal(trans(cot(x)), sql("1 / TAN(`x`)")) expect_equal(trans(nchar(x)), 
sql("CHARACTER_LENGTH(`x`)")) expect_equal(trans(ceil(x)), sql("CEILING(`x`)")) expect_equal(trans(ceiling(x)), sql("CEILING(`x`)")) expect_equal(trans(atan2(x, y)), sql("ATAN2(`y`,`x`)")) expect_equal(trans(substr(x, 1, 2)), sql("SUBSTR(`x`, 1.0, 2.0)")) expect_error(trans(paste(x)), sql("not supported")) }) test_that("custom aggregators translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_teradata()) } expect_equal(trans(var(x)), sql("VAR_SAMP(`x`)")) expect_error(trans(cor(x)), "not available") expect_error(trans(cov(x)), "not available") }) test_that("custom window functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = TRUE, con = simulate_teradata()) } expect_equal(trans(var(x, na.rm = TRUE)), sql("VAR_SAMP(`x`) OVER ()")) expect_error(trans(cor(x)), "not supported") expect_error(trans(cov(x)), "not supported") }) test_that("filter and mutate translate is.na correctly", { mf <- lazy_frame(x = 1, src = simulate_teradata()) expect_equal( mf %>% head() %>% show_query(), sql("SELECT TOP 6 *\nFROM `df`") ) }) dbplyr/tests/testthat/test-group-size.R0000644000176200001440000000265613174434123017767 0ustar liggesuserscontext("Group sizes") # Data for the first three test_that groups below df <- data.frame(x = rep(1:3, each = 10), y = rep(1:6, each = 5)) # MariaDB returns bit64 instead of int, which makes testing hard tbls <- test_load(df, ignore = "MariaDB") test_that("ungrouped data has 1 group, with group size = nrow()", { for (tbl in tbls) { expect_equal(n_groups(tbl), 1L) expect_equal(group_size(tbl), 30) } }) test_that("rowwise data has one group for each group", { rw <- rowwise(df) expect_equal(n_groups(rw), 30) expect_equal(group_size(rw), rep(1, 30)) }) test_that("group_size correct for grouped data", { for (tbl in tbls) { grp <- group_by(tbl, x) expect_equal(n_groups(grp), 3L) expect_equal(group_size(grp), rep(10, 3)) } }) # For following tests, add an extra level that's not present in data df$x = factor(df$x, levels = 1:4) tbls <- test_load(df, ignore = "MariaDB") test_that("n_groups drops zero-length groups", { for (tbl in tbls) { grp <- group_by(tbl, x) expect_equal(n_groups(grp), 3, info = class(tbl)[1]) } }) test_that("summarise drops zero-length groups", { for (tbl in tbls) { res <- tbl %>% group_by(x) %>% summarise(n = n(), mn = mean(y, na.rm = TRUE)) %>% collect expect_equal(nrow(res), 3, info = class(tbl)[1]) expect_equal(tail(res$n, n = 1), 10, info = class(tbl)[1]) expect_false(is.nan(tail(res$mn, n = 1)), info = class(tbl)[1]) } }) dbplyr/tests/testthat/test-copy_to.R0000644000176200001440000000135313067213706017333 0ustar liggesuserscontext("copy_to") test_that("can round trip basic data frame", { df <- test_frame(x = c(1, 10, 9, NA), y = letters[1:4]) expect_equal_tbls(df) }) test_that("NAs in character fields handled by db sources (#2256)", { df <- test_frame( x = c("a", "aa", NA), y = c(NA, "b", "bb"), z = c("cc", NA, "c") ) expect_equal_tbls(df) }) test_that("src_sql allows you to overwrite", { name <- random_table_name() copy_to(src_memdb(), tibble(x = 1), name = name) # Can't check for specific error messages because will vary expect_error(copy_to(src_memdb(), tibble(x = 1), name = name)) df2 <- tibble(x = 2) copy_to(src_memdb(), df2, name = name, overwrite = TRUE) expect_equal(collect(tbl(src_memdb(), name)), df2) }) dbplyr/tests/testthat/test-tbl-sql.r0000644000176200001440000000467113173736625017313 0ustar liggesuserscontext("tbl_sql") test_that("can generate sql tbls with 
raw sql", { mf1 <- memdb_frame(x = 1:3, y = 3:1) mf2 <- tbl(mf1$src, build_sql("SELECT * FROM ", mf1$ops$x)) expect_equal(collect(mf1), collect(mf2)) }) test_that("tbl_sql() works with string argument", { name <- unclass(random_table_name()) df <- memdb_frame(a = 1, .name = name) expect_equal(collect(tbl_sql("sqlite", df$src, name)), collect(df)) }) test_that("memdb_frame() returns visible output", { expect_true(withVisible(memdb_frame(a = 1))$visible) }) test_that("head/print respects n" ,{ df2 <- memdb_frame(x = 1:5) out <- df2 %>% head(n = Inf) %>% collect() expect_equal(nrow(out), 5) expect_output(print(df2, n = Inf)) out <- df2 %>% head(n = 1) %>% collect() expect_equal(nrow(out), 1) out <- df2 %>% head(n = 0) %>% collect() expect_equal(nrow(out), 0) expect_error( df2 %>% head(n = -1) %>% collect(), "not greater than or equal to 0" ) }) test_that("db_write_table calls dbQuoteIdentifier on table name" ,{ idents <- character() setClass("DummyDBIConnection", representation("DBIConnection")) setMethod("dbQuoteIdentifier", c("DummyDBIConnection", "character"), function(conn, x, ...) { idents <<- c(idents, x) } ) setMethod("dbWriteTable", c("DummyDBIConnection", "character", "ANY"), function(conn, name, value, ...) {TRUE} ) dummy_con <- new("DummyDBIConnection") db_write_table(dummy_con, "somecrazytablename", NA, NA) expect_true("somecrazytablename" %in% idents) }) test_that("same_src distinguishes srcs", { src1 <- src_sqlite(":memory:", create = TRUE) src2 <- src_sqlite(":memory:", create = TRUE) expect_true(same_src(src1, src1)) expect_false(same_src(src1, src2)) db1 <- copy_to(src1, iris, 'data1', temporary = FALSE) db2 <- copy_to(src2, iris, 'data2', temporary = FALSE) expect_true(same_src(db1, db1)) expect_false(same_src(db1, db2)) expect_false(same_src(db1, mtcars)) }) test_that("can copy to from remote sources", { df <- data.frame(x = 1:10) con1 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") on.exit(DBI::dbDisconnect(con1)) df_1 <- copy_to(con1, df, "df1") # Create from tbl in same database df_2 <- copy_to(con1, df_1, "df2") expect_equal(collect(df_2), df) # Create from tbl in another data con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") on.exit(DBI::dbDisconnect(con2)) df_3 <- copy_to(con2, df_1, "df3") expect_equal(collect(df_3), df) }) dbplyr/tests/testthat/test-translate-access.r0000644000176200001440000000510613176657724021170 0ustar liggesuserscontext("translate-ACCESS") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_odbc_access()) } # Conversion expect_equal(trans(as.numeric(x)), sql("CDBL(`x`)")) expect_equal(trans(as.double(x)), sql("CDBL(`x`)")) expect_equal(trans(as.integer(x)), sql("INT(`x`)")) expect_equal(trans(as.logical(x)), sql("CBOOL(`x`)")) expect_equal(trans(as.character(x)), sql("CSTR(`x`)")) expect_equal(trans(as.Date(x)), sql("CDATE(`x`)")) # Math expect_equal(trans(exp(x)), sql("EXP(`x`)")) expect_equal(trans(log(x)), sql("LOG(`x`)")) expect_equal(trans(log10(x)), sql("LOG(`x`) / LOG(10)")) expect_equal(trans(sqrt(x)), sql("SQR(`x`)")) expect_equal(trans(sign(x)), sql("SGN(`x`)")) expect_equal(trans(floor(x)), sql("INT(`x`)")) expect_equal(trans(ceiling(x)), sql("INT(`x` + 0.9999999999)")) expect_equal(trans(ceil(x)), sql("INT(`x` + 0.9999999999)")) # String expect_equal(trans(nchar(x)), sql("LEN(`x`)")) expect_equal(trans(tolower(x)), sql("LCASE(`x`)")) expect_equal(trans(toupper(x)), sql("UCASE(`x`)")) expect_equal(trans(substr(x, 1, 2)), sql("RIGHT(LEFT(`x`, 2.0), 2.0)")) 
expect_equal(trans(paste(x)), sql("CSTR(`x`)")) expect_equal(trans(trimws(x)), sql("TRIM(`x`)")) expect_equal(trans(is.null(x)), sql("ISNULL(`x`)")) expect_equal(trans(is.na(x)), sql("ISNULL(`x`)")) expect_equal(trans(coalesce(x, y)), sql("IIF(ISNULL(`x`), `y`, `x`)")) expect_equal(trans(pmin(x, y)), sql("IIF(`x` <= `y`, `x`, `y`)")) expect_equal(trans(pmax(x, y)), sql("IIF(`x` <= `y`, `y`, `x`)")) expect_equal(trans(Sys.Date()), sql("DATE()")) # Special paste() tests expect_equal(trans(paste(x, y, sep = "+")), sql("`x` & '+' & `y`")) expect_equal(trans(paste0(x, y)), sql("`x` & `y`")) expect_error(trans(paste(x, collapse = "-")),"`collapse` not supported") }) test_that("custom aggregators translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_odbc_access()) } expect_equal(trans(sd(x)), sql("STDEV(`x`)")) expect_equal(trans(var(x)), sql("VAR(`x`)")) expect_error(trans(cor(x)), "not available") expect_error(trans(cov(x)), "not available") expect_error(trans(n_distinct(x)), "not available") }) test_that("queries translate correctly", { mf <- lazy_frame(x = 1, src = simulate_odbc_access()) expect_equal( mf %>% head() %>% show_query(), sql("SELECT TOP 6 *\nFROM `df`") ) }) dbplyr/tests/testthat/test-joins-consistent.R0000644000176200001440000001347513174434207021200 0ustar liggesuserscontext("SQL: consistent join results") test_that("consistent result of left join on key column with same name in both tables", { test_l_j_by_x <- function(tbl_left, tbl_right) { left_join(tbl_left, tbl_right, by = "x") %>% arrange(x, y, z) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_l_j_by_x) }) test_that("consistent result of inner join on key column with same name in both tables", { test_i_j_by_x <- function(tbl_left, tbl_right) { inner_join(tbl_left, tbl_right, by = "x") %>% arrange(x, y, z) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_i_j_by_x) }) test_that("consistent result of right join on key column with same name in both tables", { test_r_j_by_x <- function(tbl_left, tbl_right) { right_join(tbl_left, tbl_right, by = "x") %>% arrange(x, y, z) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), z = 1L:4L) # SQLite does not support right joins tbls_left <- test_load(tbl_left, ignore = c("sqlite")) tbls_right <- test_load(tbl_right, ignore = c("sqlite")) compare_tbls2(tbls_left, tbls_right, op = test_r_j_by_x) }) test_that("consistent result of full join on key column with same name in both tables", { test_f_j_by_x <- function(tbl_left, tbl_right) { full_join(tbl_left, tbl_right, by = "x") %>% arrange(x, y, z) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), z = 1L:4L) # SQLite and MySQL do not support full joins tbls_left <- test_load(tbl_left, ignore = c("sqlite", "mysql", "MariaDB")) tbls_right <- test_load(tbl_right, ignore = c("sqlite", "mysql", "MariaDB")) compare_tbls2(tbls_left, tbls_right, op = test_f_j_by_x) }) test_that("consistent result of left join on key column with different names", { test_l_j_by_xl_xr <- function(tbl_left, tbl_right) { left_join(tbl_left, tbl_right, by = c("xl" = "xr")) %>% arrange(xl, y, z) } tbl_left <- tibble(xl = 1L:4L, y = 1L:4L) tbl_right <- 
tibble(xr = c(1L:3L, 5L), z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_l_j_by_xl_xr) }) test_that("consistent result of inner join on key column with different names", { test_i_j_by_xl_xr <- function(tbl_left, tbl_right) { inner_join(tbl_left, tbl_right, by = c("xl" = "xr")) %>% arrange(xl, y, z) } tbl_left <- tibble(xl = 1L:4L, y = 1L:4L) tbl_right <- tibble(xr = c(1L:3L, 5L), z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_i_j_by_xl_xr) }) test_that("consistent result of right join on key column with different names", { test_r_j_by_xl_xr <- function(tbl_left, tbl_right) { right_join(tbl_left, tbl_right, by = c("xl" = "xr")) %>% arrange(xl, y, z) } tbl_left <- tibble(xl = 1L:4L, y = 1L:4L) tbl_right <- tibble(xr = c(1L:3L, 5L), z = 1L:4L) # SQLite does not support right joins tbls_left <- test_load(tbl_left, ignore = c("sqlite")) tbls_right <- test_load(tbl_right, ignore = c("sqlite")) compare_tbls2(tbls_left, tbls_right, op = test_r_j_by_xl_xr) }) test_that("consistent result of full join on key column with different names", { test_f_j_by_xl_xr <- function(tbl_left, tbl_right) { full_join(tbl_left, tbl_right, by = c("xl" = "xr")) %>% arrange(xl, y, z) } tbl_left <- tibble(xl = 1L:4L, y = 1L:4L) tbl_right <- tibble(xr = c(1L:3L, 5L), z = 1L:4L) # SQLite and MySQL do not support full joins tbls_left <- test_load(tbl_left, ignore = c("sqlite", "mysql", "MariaDB")) tbls_right <- test_load(tbl_right, ignore = c("sqlite", "mysql", "MariaDB")) compare_tbls2(tbls_left, tbls_right, op = test_f_j_by_xl_xr) }) test_that("consistent result of left natural join", { test_l_j <- function(tbl_left, tbl_right) { left_join(tbl_left, tbl_right) %>% arrange(x, y, z, w) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L, w = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), y = 1L:4L, z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_l_j) }) test_that("consistent result of inner natural join", { test_i_j <- function(tbl_left, tbl_right) { inner_join(tbl_left, tbl_right) %>% arrange(x, y, z, w) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L, w = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), y = 1L:4L, z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_i_j) }) test_that("consistent result of right natural join", { test_r_j <- function(tbl_left, tbl_right) { right_join(tbl_left, tbl_right) %>% arrange(x, y, z, w) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L, w = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), y = 1L:4L, z = 1L:4L) # SQLite does not support right joins tbls_left <- test_load(tbl_left, ignore = c("sqlite")) tbls_right <- test_load(tbl_right, ignore = c("sqlite")) compare_tbls2(tbls_left, tbls_right, op = test_r_j) }) test_that("consistent result of full natural join", { test_f_j <- function(tbl_left, tbl_right) { full_join(tbl_left, tbl_right) %>% arrange(x, y, z, w) } tbl_left <- data_frame(x = 1L:4L, y = 1L:4L, w = 1L:4L) tbl_right <- data_frame(x = c(1L:3L, 5L), y = 1L:4L, z = 1L:4L) # SQLite and MySQL do not support full joins tbls_left <- test_load(tbl_left, ignore = c("sqlite", "mysql", "MariaDB")) tbls_right <- test_load(tbl_right, ignore = c("sqlite", "mysql", "MariaDB")) compare_tbls2(tbls_left, tbls_right, op = test_f_j) }) 
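# Note on the ignore lists used throughout this file: SQLite supports neither
# RIGHT JOIN nor FULL JOIN, and MySQL/MariaDB lack FULL JOIN, so those backends
# are skipped for the corresponding tests. A hedged sketch of a helper that
# could centralise this (the `join_ignore` name is hypothetical, not part of
# dbplyr):
# join_ignore <- function(type) {
#   switch(type,
#     right = "sqlite",
#     full  = c("sqlite", "mysql", "MariaDB"),
#     character()
#   )
# }
# tbls_left <- test_load(tbl_left, ignore = join_ignore("full"))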
dbplyr/tests/testthat/test-translate-postgresql.R0000644000176200001440000000321413221275427022053 0ustar liggesuserscontext("translate-postgresql") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_odbc_postgresql()) } expect_equal(trans(log10(x)), sql("LOG(`x`)")) expect_equal(trans(log(x)), sql("LN(`x`)")) expect_equal(trans(log(x, 2)), sql("LOG(`x`) / LOG(2.0)")) expect_equal(trans(cot(x)), sql("1 / TAN(`x`)")) expect_equal(trans(round(x, digits = 1.1)), sql("ROUND((`x`) :: numeric, 1)")) expect_equal(trans(grepl("exp", x)), sql("(`x`) ~ ('exp')")) expect_equal(trans(grepl("exp", x, TRUE)), sql("(`x`) ~* ('exp')")) }) test_that("custom stringr functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_odbc_postgresql()) } expect_equal(trans(str_locate(x, y)), sql("STRPOS(`x`, `y`)")) expect_equal(trans(str_detect(x, y)), sql("STRPOS(`x`, `y`) > 0")) }) test_that("two variable aggregates are translated correctly", { trans <- function(x, window) { translate_sql(!!enquo(x), window = window, con = simulate_odbc_postgresql()) } expect_equal(trans(cor(x, y), window = FALSE), sql("CORR(`x`, `y`)")) expect_equal(trans(cor(x, y), window = TRUE), sql("CORR(`x`, `y`) OVER ()")) }) test_that("pasting translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_odbc_postgresql()) } expect_equal(trans(paste(x, y)), sql("CONCAT_WS(' ', `x`, `y`)")) expect_equal(trans(paste0(x, y)), sql("CONCAT_WS('', `x`, `y`)")) expect_error(trans(paste0(x, collapse = "")), "`collapse` not supported") }) dbplyr/tests/testthat/test-filter.r0000644000176200001440000000356613173457755017230 0ustar liggesuserscontext("filter") test_that("filter captures local variables", { mf <- memdb_frame(x = 1:5, y = 5:1) z <- 3 df1 <- mf %>% filter(x > z) %>% collect() df2 <- mf %>% collect() %>% filter(x > z) expect_equal_tbl(df1, df2) }) test_that("two filters equivalent to one", { mf <- memdb_frame(x = 1:5, y = 5:1) df1 <- mf %>% filter(x > 3) %>% filter(y < 3) df2 <- mf %>% filter(x > 3, y < 3) expect_equal_tbl(df1, df2) }) test_that("each argument gets implicit parens", { mf <- memdb_frame( v1 = c("a", "b", "a", "b"), v2 = c("b", "a", "a", "b"), v3 = c("a", "b", "c", "d") ) mf1 <- mf %>% filter((v1 == "a" | v2 == "a") & v3 == "a") mf2 <- mf %>% filter(v1 == "a" | v2 == "a", v3 == "a") expect_equal_tbl(mf1, mf2) }) # SQL generation -------------------------------------------------------- test_that("basic filter works across all backends", { dfs <- test_frame(x = 1:5, y = 5:1) dfs %>% map(. %>% filter(x > 3)) %>% expect_equal_tbls() }) test_that("filter calls windowed versions of sql functions", { dfs <- test_frame_windowed( x = 1:10, g = rep(c(1, 2), each = 5) ) dfs %>% map(. %>% group_by(g) %>% filter(row_number(x) < 3)) %>% expect_equal_tbls(tibble(g = c(1, 1, 2, 2), x = c(1L, 2L, 6L, 7L))) }) test_that("recycled aggregates generate window function", { dfs <- test_frame_windowed( x = 1:10, g = rep(c(1, 2), each = 5) ) dfs %>% map(. %>% group_by(g) %>% filter(x > mean(x, na.rm = TRUE))) %>% expect_equal_tbls(tibble(g = c(1, 1, 2, 2), x = c(4L, 5L, 9L, 10L))) }) test_that("cumulative aggregates generate window function", { dfs <- test_frame_windowed( x = c(1:3, 2:4), g = rep(c(1, 2), each = 3) ) dfs %>% map(. 
%>% group_by(g) %>% arrange(x) %>% filter(cumsum(x) > 3) ) %>% expect_equal_tbls(tibble(g = c(1, 2, 2), x = c(3L, 3L, 4L))) }) dbplyr/tests/testthat/test-remote.R0000644000176200001440000000103413174406056017147 0ustar liggesuserscontext("test-remote.R") test_that("remote_name returns null for computed tables", { mf <- memdb_frame(x = 5, .name = "refxiudlph") expect_equal(remote_name(mf), ident("refxiudlph")) mf2 <- mf %>% filter(x == 3) expect_equal(remote_name(mf2), NULL) }) test_that("can retrieve query, src and con metadata", { mf <- memdb_frame(x = 5) expect_s4_class(remote_con(mf), "DBIConnection") expect_s3_class(remote_src(mf), "src_sql") expect_s3_class(remote_query(mf), "sql") expect_type(remote_query_plan(mf), "character") }) dbplyr/NAMESPACE0000644000176200001440000002417513221470501012771 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method(anti_join,tbl_lazy) S3method(arrange,tbl_lazy) S3method(arrange_,tbl_lazy) S3method(as.data.frame,tbl_sql) S3method(as.sql,character) S3method(as.sql,ident) S3method(as.sql,sql) S3method(auto_copy,tbl_sql) S3method(c,ident) S3method(c,sql) S3method(collapse,tbl_sql) S3method(collect,tbl_sql) S3method(compute,tbl_sql) S3method(copy_to,src_sql) S3method(db_analyze,"Microsoft SQL Server") S3method(db_analyze,ACCESS) S3method(db_analyze,DBIConnection) S3method(db_analyze,Hive) S3method(db_analyze,Impala) S3method(db_analyze,MariaDBConnection) S3method(db_analyze,MySQLConnection) S3method(db_analyze,OraConnection) S3method(db_analyze,Oracle) S3method(db_analyze,Teradata) S3method(db_begin,DBIConnection) S3method(db_begin,MySQLConnection) S3method(db_begin,PostgreSQLConnection) S3method(db_collect,DBIConnection) S3method(db_commit,DBIConnection) S3method(db_commit,MySQLConnection) S3method(db_compute,DBIConnection) S3method(db_copy_to,DBIConnection) S3method(db_create_index,DBIConnection) S3method(db_create_index,MariaDBConnection) S3method(db_create_index,MySQLConnection) S3method(db_create_indexes,DBIConnection) S3method(db_create_table,DBIConnection) S3method(db_data_type,DBIConnection) S3method(db_data_type,MySQLConnection) S3method(db_desc,DBIConnection) S3method(db_desc,MariaDBConnection) S3method(db_desc,MySQLConnection) S3method(db_desc,OdbcConnection) S3method(db_desc,PostgreSQL) S3method(db_desc,PostgreSQLConnection) S3method(db_desc,PqConnection) S3method(db_desc,SQLiteConnection) S3method(db_drop_table,DBIConnection) S3method(db_drop_table,OdbcConnection) S3method(db_drop_table,OraConnection) S3method(db_explain,DBIConnection) S3method(db_explain,PostgreSQL) S3method(db_explain,PostgreSQLConnection) S3method(db_explain,PqConnection) S3method(db_has_table,DBIConnection) S3method(db_has_table,MariaDBConnection) S3method(db_has_table,MySQLConnection) S3method(db_has_table,PostgreSQLConnection) S3method(db_insert_into,DBIConnection) S3method(db_list_tables,DBIConnection) S3method(db_query_fields,DBIConnection) S3method(db_query_fields,PostgreSQLConnection) S3method(db_query_rows,DBIConnection) S3method(db_rollback,DBIConnection) S3method(db_rollback,MySQLConnection) S3method(db_save_query,"Microsoft SQL Server") S3method(db_save_query,DBIConnection) S3method(db_sql_render,DBIConnection) S3method(db_write_table,"Microsoft SQL Server") S3method(db_write_table,DBIConnection) S3method(db_write_table,MySQLConnection) S3method(db_write_table,PostgreSQLConnection) S3method(dim,tbl_sql) S3method(dimnames,tbl_sql) S3method(distinct,tbl_lazy) S3method(distinct_,tbl_lazy) S3method(do,tbl_sql) S3method(do_,tbl_sql) 
S3method(escape,"NULL") S3method(escape,Date) S3method(escape,POSIXt) S3method(escape,character) S3method(escape,double) S3method(escape,factor) S3method(escape,ident) S3method(escape,ident_q) S3method(escape,integer) S3method(escape,integer64) S3method(escape,list) S3method(escape,logical) S3method(escape,sql) S3method(explain,tbl_sql) S3method(filter_,tbl_lazy) S3method(format,ident) S3method(format,sql) S3method(format,src_sql) S3method(full_join,tbl_lazy) S3method(group_by,tbl_lazy) S3method(group_by_,tbl_lazy) S3method(group_size,tbl_sql) S3method(group_vars,tbl_lazy) S3method(groups,tbl_lazy) S3method(head,tbl_lazy) S3method(inner_join,tbl_lazy) S3method(left_join,tbl_lazy) S3method(mutate,tbl_lazy) S3method(mutate_,tbl_lazy) S3method(n_groups,tbl_sql) S3method(names,sql_variant) S3method(op_desc,op) S3method(op_desc,op_arrange) S3method(op_desc,op_group_by) S3method(op_frame,op_base) S3method(op_frame,op_double) S3method(op_frame,op_frame) S3method(op_frame,op_single) S3method(op_frame,tbl_lazy) S3method(op_grps,op_base) S3method(op_grps,op_double) S3method(op_grps,op_group_by) S3method(op_grps,op_rename) S3method(op_grps,op_single) S3method(op_grps,op_summarise) S3method(op_grps,op_ungroup) S3method(op_grps,tbl_lazy) S3method(op_sort,op_arrange) S3method(op_sort,op_base) S3method(op_sort,op_double) S3method(op_sort,op_order) S3method(op_sort,op_single) S3method(op_sort,op_summarise) S3method(op_sort,tbl_lazy) S3method(op_vars,op_base) S3method(op_vars,op_distinct) S3method(op_vars,op_join) S3method(op_vars,op_mutate) S3method(op_vars,op_rename) S3method(op_vars,op_select) S3method(op_vars,op_semi_join) S3method(op_vars,op_set_op) S3method(op_vars,op_single) S3method(op_vars,op_summarise) S3method(op_vars,tbl_lazy) S3method(print,ident) S3method(print,join_query) S3method(print,op_base_local) S3method(print,op_base_remote) S3method(print,op_single) S3method(print,select_query) S3method(print,semi_join_query) S3method(print,set_op_query) S3method(print,sql) S3method(print,sql_variant) S3method(pull,tbl_sql) S3method(rename,tbl_lazy) S3method(rename_,tbl_lazy) S3method(right_join,tbl_lazy) S3method(same_src,src_sql) S3method(same_src,tbl_lazy) S3method(same_src,tbl_sql) S3method(select,tbl_lazy) S3method(select_,tbl_lazy) S3method(semi_join,tbl_lazy) S3method(show_query,tbl_lazy) S3method(show_query,tbl_sql) S3method(sql_build,op_arrange) S3method(sql_build,op_base_local) S3method(sql_build,op_base_remote) S3method(sql_build,op_distinct) S3method(sql_build,op_filter) S3method(sql_build,op_frame) S3method(sql_build,op_group_by) S3method(sql_build,op_head) S3method(sql_build,op_join) S3method(sql_build,op_mutate) S3method(sql_build,op_order) S3method(sql_build,op_rename) S3method(sql_build,op_select) S3method(sql_build,op_semi_join) S3method(sql_build,op_set_op) S3method(sql_build,op_summarise) S3method(sql_build,op_ungroup) S3method(sql_build,tbl_lazy) S3method(sql_escape_ident,"NULL") S3method(sql_escape_ident,DBIConnection) S3method(sql_escape_ident,MySQLConnection) S3method(sql_escape_ident,SQLiteConnection) S3method(sql_escape_logical,"NULL") S3method(sql_escape_logical,DBIConnection) S3method(sql_escape_logical,SQLiteConnection) S3method(sql_escape_string,"NULL") S3method(sql_escape_string,DBIConnection) S3method(sql_join,DBIConnection) S3method(sql_join,MySQLConnection) S3method(sql_optimise,ident) S3method(sql_optimise,query) S3method(sql_optimise,select_query) S3method(sql_optimise,sql) S3method(sql_render,ident) S3method(sql_render,join_query) S3method(sql_render,op) 
S3method(sql_render,select_query) S3method(sql_render,semi_join_query) S3method(sql_render,set_op_query) S3method(sql_render,sql) S3method(sql_render,tbl_lazy) S3method(sql_render,tbl_sql) S3method(sql_select,"Microsoft SQL Server") S3method(sql_select,ACCESS) S3method(sql_select,DBIConnection) S3method(sql_select,OraConnection) S3method(sql_select,Oracle) S3method(sql_select,Teradata) S3method(sql_semi_join,DBIConnection) S3method(sql_set_op,SQLiteConnection) S3method(sql_set_op,default) S3method(sql_subquery,DBIConnection) S3method(sql_subquery,DBITestConnection) S3method(sql_subquery,OraConnection) S3method(sql_subquery,Oracle) S3method(sql_subquery,SQLiteConnection) S3method(sql_translate_env,"Microsoft SQL Server") S3method(sql_translate_env,"NULL") S3method(sql_translate_env,ACCESS) S3method(sql_translate_env,DBIConnection) S3method(sql_translate_env,Hive) S3method(sql_translate_env,Impala) S3method(sql_translate_env,MariaDBConnection) S3method(sql_translate_env,MySQLConnection) S3method(sql_translate_env,OdbcConnection) S3method(sql_translate_env,OraConnection) S3method(sql_translate_env,Oracle) S3method(sql_translate_env,PostgreSQL) S3method(sql_translate_env,PostgreSQLConnection) S3method(sql_translate_env,PqConnection) S3method(sql_translate_env,Redshift) S3method(sql_translate_env,SQLiteConnection) S3method(sql_translate_env,Teradata) S3method(src_tbls,src_sql) S3method(summarise,tbl_lazy) S3method(summarise_,tbl_lazy) S3method(tail,tbl_sql) S3method(tbl,src_dbi) S3method(tbl_sum,tbl_sql) S3method(tbl_vars,tbl_lazy) S3method(ungroup,tbl_lazy) S3method(union_all,tbl_lazy) S3method(unique,sql) export(add_op_single) export(as.sql) export(base_agg) export(base_no_win) export(base_odbc_agg) export(base_odbc_scalar) export(base_odbc_win) export(base_scalar) export(base_win) export(build_sql) export(copy_lahman) export(copy_nycflights13) export(db_collect) export(db_compute) export(db_copy_to) export(db_sql_render) export(escape) export(has_lahman) export(has_nycflights13) export(ident) export(ident_q) export(in_schema) export(is.ident) export(is.sql) export(join_query) export(lahman_df) export(lahman_mysql) export(lahman_postgres) export(lahman_sqlite) export(lahman_srcs) export(lazy_frame) export(memdb_frame) export(named_commas) export(nycflights13_postgres) export(nycflights13_sqlite) export(op_base) export(op_double) export(op_frame) export(op_grps) export(op_single) export(op_sort) export(op_vars) export(partial_eval) export(remote_con) export(remote_name) export(remote_query) export(remote_query_plan) export(remote_src) export(select_query) export(semi_join_query) export(set_op_query) export(simulate_dbi) export(simulate_hive) export(simulate_impala) export(simulate_mssql) export(simulate_mysql) export(simulate_odbc) export(simulate_odbc_access) export(simulate_odbc_postgresql) export(simulate_oracle) export(simulate_postgres) export(simulate_sqlite) export(simulate_teradata) export(sql) export(sql_aggregate) export(sql_aggregate_2) export(sql_build) export(sql_cast) export(sql_cot) export(sql_escape_logical) export(sql_expr) export(sql_infix) export(sql_log) export(sql_not_supported) export(sql_optimise) export(sql_paste) export(sql_paste_infix) export(sql_prefix) export(sql_quote) export(sql_render) export(sql_translator) export(sql_variant) export(sql_vector) export(src_dbi) export(src_memdb) export(src_sql) export(src_test) export(tbl_lazy) export(tbl_sql) export(test_frame) export(test_load) export(test_register_con) export(test_register_src) export(translate_sql) 
export(translate_sql_)
export(win_absent)
export(win_aggregate)
export(win_aggregate_2)
export(win_cumulative)
export(win_current_frame)
export(win_current_group)
export(win_current_order)
export(win_over)
export(win_rank)
export(win_recycled)
export(window_frame)
export(window_order)
import(DBI)
import(dplyr)
import(rlang)
importFrom(R6,R6Class)
importFrom(assertthat,assert_that)
importFrom(assertthat,is.flag)
importFrom(glue,glue)
importFrom(methods,setOldClass)
importFrom(stats,setNames)
importFrom(stats,update)
importFrom(utils,head)
importFrom(utils,tail)
dbplyr/NEWS.md0000644000176200001440000003553413221502250012646 0ustar liggesusers# dbplyr 1.2.0

## New top-level translations

* New translations for
  * MS Access (#2946) (@DavisVaughan)
  * Oracle, via odbc or ROracle (#2928, #2732, @edgararuiz)
  * Teradata.
  * Redshift.

* dbplyr now supplies appropriate translations for the RMariaDB and
  RPostgres packages (#3154). We generally recommend using these packages
  in favour of the older RMySQL and RPostgreSQL packages as they are fully
  DBI compliant and tested with DBItest.

## New features

* `copy_to()` can now "copy" tbl_sql in the same src, providing another
  way to cache a query into a temporary table (#3064). You can also
  `copy_to` tbl_sqls from another source, and `copy_to()` will automatically
  collect then copy.

* Initial support for stringr functions: `str_length()`, `str_to_upper()`,
  `str_to_lower()`, `str_replace_all()`, `str_detect()`, `str_trim()`.
  Regular expression support varies from database to database, but most
  simple regular expressions should be ok.

## Tools for developers

* `db_compute()` gains an `analyze` argument to match `db_copy_to()`.

* New `remote_name()`, `remote_con()`, `remote_src()`, `remote_query()` and
  `remote_query_plan()` provide a standard API for getting metadata about a
  remote tbl (#3130, #2923, #2824).

* New `sql_expr()` is a more convenient building block for low-level SQL
  translation (#3169).

* New `sql_aggregate()` and `win_aggregate()` for generating SQL and windowed
  SQL functions for aggregates. These take one argument, `x`, and warn if
  `na.rm` is not `TRUE` (#3155). `win_recycled()` is equivalent to
  `win_aggregate()` and has been soft-deprecated.

* `db_write_table()` now needs to return the table name.

## Minor improvements and bug fixes

* Multiple `head()` calls in a row now collapse to a single call. This avoids
  a printing problem with MS SQL (#3084).

* `escape()` now works with integer64 values from the bit64 package (#3230).

* `if`, `ifelse()`, and `if_else()` now correctly scope the false condition
  so that it only applies to non-NULL conditions (#3157).

* `ident()` and `ident_q()` handle 0-length inputs better, and should be
  easier to use with S3 (#3212).

* `in_schema()` should now work in more places, particularly in `copy_to()`
  (#3013, @baileych).

* SQL generation for joins no longer gets stuck in an endless loop if you
  request an empty suffix (#3220).

* `mutate()` has better logic for splitting a single mutate into multiple
  subqueries (#3095).

* Improved `paste()` and `paste0()` support in MySQL, PostgreSQL (#3168),
  and RSQLite (#3176). MySQL and PostgreSQL gain support for `str_flatten()`
  which behaves like `paste(x, collapse = "-")` (but for technical reasons
  can't be implemented as a straightforward translation of `paste()`).

* `same_src.tbl_sql()` now performs a correct comparison instead of always
  returning `TRUE`. This means that `copy = TRUE` once again allows you to
  perform cross-database joins (#3002).
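  A minimal sketch of what this re-enables (assuming `con1` and `con2` are
  two separate DBI connections; the table and key names are illustrative):

  ```R
  df1 <- tbl(con1, "table1")
  df2 <- tbl(con2, "table2")

  # copy = TRUE copies df2 into con1's database before joining
  left_join(df1, df2, by = "id", copy = TRUE)
  ```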
* `select()` queries no longer alias column names unnecessarily
  (#2968, @DavisVaughan).

* `select()` and `rename()` are now powered by tidyselect,
  fixing a few renaming bugs (#3132, #2943, #2860).

* `summarise()` once again performs partial evaluation before database
  submission (#3148).

* `test_src()` makes it easier to access a single test source.

## Database specific improvements

* MS SQL
  * Better support for temporary tables (@Hong-Revo)
  * Different translations for filter/mutate contexts for: `NULL` evaluation
    (`is.na()`, `is.null()`), logical operators (`!`, `&`, `&&`, `|`, `||`),
    and comparison operators (`==`, `!=`, `<`, `>`, `>=`, `<=`)

* MySQL: `copy_to()` (via `db_write_table()`) correctly translates
  logical variables to integers (#3151).

* odbc: improved `n()` translation in windowed context.

* SQLite: improved `na_if` translation (@cwarden)

* PostgreSQL: translation for `grepl()` added (@zozlak)

# dbplyr 1.1.0

## New features

* `full_join()` over non-overlapping columns `by = character()` translated to
  `CROSS JOIN` (#2924).

* `case_when()` now translates to SQL "CASE WHEN" (#2894)

* `x %in% c(1)` now generates the same SQL as `x %in% 1` (#2898).

* New `window_order()` and `window_frame()` give you finer control over
  the window functions that dplyr creates (#2874, #2593).

* Added SQL translations for Oracle (@edgararuiz).

## Minor improvements and bug fixes

* `head(tbl, 0)` is now supported (#2863).

* `select()`ing zero columns gives a more informative error message (#2863).

* Variables created in a join are now disambiguated against other variables
  in the same table, not just variables in the other table (#2823).

* PostgreSQL gains a better translation for `round()` (#60).

* Added custom `db_analyze_table()` for MS SQL, Oracle, Hive and Impala
  (@edgararuiz)

* Added support for `sd()` for aggregate and window functions (#2887)
  (@edgararuiz)

* You can now use the magrittr pipe within expressions,
  e.g. `mutate(mtcars, cyl %>% as.character())`.

* If a translation was supplied for a summarise function, but not for the
  equivalent windowed variant, the expression would be translated to `NULL`
  with a warning. Now `sql_variant()` checks that all aggregate functions
  have matching window functions so that correct translations or clean
  errors will be generated (#2887)

# dbplyr 1.0.0

## New features

* `tbl()` and `copy_to()` now work directly with DBI connections (#2423,
  #2576), so there is no longer a need to generate a dplyr src.

  ```R
  library(dplyr)

  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  copy_to(con, mtcars)

  mtcars2 <- tbl(con, "mtcars")
  mtcars2
  ```

* `glimpse()` now works with remote tables (#2665)

* dplyr has gained a basic SQL optimiser, which collapses certain nested
  SELECT queries into a single query (#1979). This will improve query
  execution performance for databases with less sophisticated query
  optimisers, and fixes certain problems with ordering and limits in
  subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this
  optimisation.

* `compute()` and `collapse()` now preserve the "ordering" of rows. This
  only affects the computation of window functions, as the rest of SQL does
  not care about row order (#2281).

* `copy_to()` gains an `overwrite` argument which allows you to overwrite
  an existing table. Use with care! (#2296)

* New `in_schema()` function makes it easy to refer to tables in a schema:
  `in_schema("my_schema_name", "my_table_name")`.
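  A minimal sketch of how this composes with `tbl()` (assuming `con` is an
  existing DBI connection to a database that actually has such a schema):

  ```R
  library(dplyr)
  tbl(con, in_schema("my_schema_name", "my_table_name"))
  ```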
## Deprecated and defunct

* `query()` is no longer exported. It hasn't been useful for a while so
  this shouldn't break any code.

## Verb-level SQL generation

* Partial evaluation occurs immediately when you execute a verb (like
  `filter()` or `mutate()`) rather than happening when the query is executed
  (#2370).

* `mutate.tbl_sql()` will now generate as many subqueries as necessary so
  that you can refer to variables that you just created (like in mutate
  with regular dataframes) (#2481, #2483).

* SQL joins have been improved:

  * SQL joins always use the `ON ...` syntax, avoiding `USING ...` even for
    natural joins. Improved handling of tables with columns of the same name
    (#1997, @javierluraschi). They now generate SQL more similar to what
    you'd write by hand, eliminating a layer or two of subqueries (#2333)

  * [API] They now follow the same rules for including duplicated key
    variables that the data frame methods do, namely that key variables are
    only kept from `x`, and never from `y` (#2410)

  * [API] The `sql_join()` generic now gains a `vars` argument which lists
    the variables taken from the left and right sides of the join. If you
    have a custom `sql_join()` method, you'll need to update how your code
    generates joins, following the template in `sql_join.generic()`.

  * `full_join()` throws a clear error when you attempt to use it with a
    MySQL backend (#2045)

  * `right_join()` and `full_join()` now return results consistent with
    local data frame sources when there are records in the right table with
    no match in the left table. `right_join()` returns values of `by` columns
    from the right table. `full_join()` returns coalesced values of `by`
    columns from the left and right tables (#2578, @ianmcook)

* `group_by()` can now perform an inline mutate for database backends (#2422).

* The SQL generation for set operations (`intersect()`, `setdiff()`,
  `union()`, and `union_all()`) has been considerably improved.

  By default, the component SELECTs are surrounded with parentheses, except
  on SQLite. The SQLite backend will now throw an error if you attempt a set
  operation on a query that contains a LIMIT, as that is not supported in
  SQLite (#2270).

  All set operations match column names across inputs, filling in
  non-matching variables with NULL (#2556).

* `rename()` and `group_by()` now combine correctly (#1962)

* `tbl_lazy()` and `lazy_tbl()` have been exported. These help you test
  generated SQL without an active database connection.

* `ungroup()` correctly resets grouping variables (#2704).

## Vector-level SQL generation

* New `as.sql()` safely coerces an input to SQL.

* More translators for `as.character()`, `as.integer()` and `as.double()`
  (#2775).

* New `ident_q()` makes it possible to specify identifiers that do not
  need to be quoted.

* Translation of inline scalars:

  * Logical values are now translated differently depending on the backend.
    The default is to use "true" and "false" which is the SQL-99 standard,
    but not widely supported. SQLite translates to "0" and "1" (#2052).

  * `Inf` and `-Inf` are correctly escaped

  * Better test for whether or not a double is similar to an integer and
    hence needs a trailing 0.0 added (#2004).

  * Quoting defaults to `DBI::dbEscapeString()` and
    `DBI::dbQuoteIdentifier()` respectively.

* `::` and `:::` are handled correctly (#2321)

* `x %in% 1` is now correctly translated to `x IN (1)` (#511).

* `ifelse()` and `if_else()` use correct argument names in SQL translation
  (#2225).

* `ident()` now returns an object with class `c("ident", "character")`.
  It no longer contains "sql" to indicate that this is not already escaped.

* `is.na()` and `is.null()` gain extra parens in SQL translation to preserve
  correct precedence (#2302).

* [API] `log(x, b)` is now correctly translated to the SQL `log(b, x)`
  (#2288). SQLite does not support the 2-argument log function so it is
  translated to `log(x) / log(b)`.

* `nth(x, i)` is now correctly translated to `nth_value(x, i)`.

* `n_distinct()` now accepts multiple variables (#2148).

* [API] `substr()` is now translated to SQL, correcting for the difference
  in the third argument. In R, it's the position of the last character, in
  SQL it's the length of the string (#2536).

* `win_over()` escapes expressions using current database rules.

## Backends

* `copy_to()` now uses `db_write_table()` instead of `db_create_table()` and
  `db_insert_into()`. `db_write_table.DBIConnection()` uses `dbWriteTable()`.

* New `db_copy_to()`, `db_compute()` and `db_collect()` allow backends to
  override the entire database process behind `copy_to()`, `compute()` and
  `collect()`. `db_sql_render()` allows additional control over the SQL
  rendering process.

* All generics whose behaviour can vary from database to database now
  provide a DBIConnection method. That means that you can easily scan the
  NAMESPACE to see the extension points.

* `sql_escape_logical()` allows you to control the translation of literal
  logicals (#2614).

* `src_desc()` has been replaced by `db_desc()` and now dispatches on the
  connection, eliminating the last method that required dispatch on the
  class of the src.

* `win_over()`, `win_rank()`, `win_recycled()`, `win_cumulative()`,
  `win_current_group()` and `win_current_order()` are now exported. This
  should make it easier to provide customised SQL for window functions
  (#2051, #2126).

* SQL translation for Microsoft SQL Server (@edgararuiz)

* SQL translation for Apache Hive (@edgararuiz)

* SQL translation for Apache Impala (@edgararuiz)

## Minor bug fixes and improvements

* `collect()` once again defaults to returning all rows in the data (#1968).
  This makes it behave the same as `as.data.frame()` and `as_tibble()`.

* `collect()` only regroups by variables present in the data (#2156)

* `collect()` will automatically LIMIT the result to `n`, the number of rows
  requested. This will provide the query planner with more information that
  it may be able to use to improve execution time (#2083).

* `common_by()` gets a better error message for unexpected inputs (#2091)

* `copy_to()` no longer checks that the table doesn't exist before creation,
  instead preferring to fall back on the database for error messages. This
  should reduce both false positives and false negatives (#1470)

* `copy_to()` now succeeds for MySQL if a character column contains `NA`
  (#1975, #2256, #2263, #2381, @demorenoc, @eduardgrebe).

* `copy_to()` now returns its output invisibly (since you're often just
  calling for the side-effect).

* `distinct()` reports improved variable information for SQL backends. This
  means that it is more likely to work in the middle of a pipeline (#2359).

* Ungrouped `do()` on database backends now collects all data locally first
  (#2392).

* Call `dbFetch()` instead of the deprecated `fetch()` (#2134).
  Use `DBI::dbExecute()` for non-query SQL commands (#1912)

* `explain()` and `show_query()` now invisibly return the first argument,
  making them easier to use inside a pipeline.

* `print.tbl_sql()` displays ordering (#2287) and prints table name, if known.
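  A minimal sketch of the effect (the exact header wording shown in the
  comment is illustrative, not a verbatim transcript):

  ```R
  library(dplyr)
  mf <- memdb_frame(x = c(2, 1, 3), .name = "mytbl")
  mf %>% arrange(x) %>% print()
  # the printed header now includes the table name and the ordering (x)
  ```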
* `print(df, n = Inf)` and `head(df, n = Inf)` now work with remote tables (#2580). * `db_desc()` and `sql_translate_env()` get defaults for DBIConnection. * Formatting now works by overriding the `tbl_sum()` generic instead of `print()`. This means that the output is more consistent with tibble, and that `format()` is now supported also for SQL sources (tidyverse/dbplyr#14). ## Lazy ops * [API] The signature of `op_base` has changed to `op_base(x, vars, class)` * [API] `translate_sql()` and `partial_eval()` have been refined: * `translate_sql()` no longer takes a vars argument; instead call `partial_eval()` yourself. * Because it no longer needs the environment `translate_sql()_` now works with a list of dots, rather than a `lazy_dots`. * `partial_eval()` now takes a character vector of variable names rather than a tbl. * This leads to a simplification of the `op` data structure: dots is now a list of expressions rather than a `lazy_dots`. * [API] `op_vars()` now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known). dbplyr/R/0000755000176200001440000000000013221502521011740 5ustar liggesusersdbplyr/R/explain.r0000644000176200001440000000061413174406110013570 0ustar liggesusers#' @export show_query.tbl_sql <- function(x, ...) { message("\n", remote_query(x)) invisible(x) } #' @export show_query.tbl_lazy <- function(x, ...) { qry <- sql_build(x, con = x$src, ...) sql_render(qry, con = x$src, ...) } #' @export explain.tbl_sql <- function(x, ...) { force(x) show_query(x) message("\n") message("\n", remote_query_plan(x)) invisible(x) } dbplyr/R/window.R0000644000176200001440000000260313123013174013375 0ustar liggesusers#' Override window order and frame #' #' @param .data A remote tibble #' @param ... Name-value pairs of expressions. #' @param from,to Bounds of the frame. #' @export #' @examples #' library(dplyr) #' df <- lazy_frame(g = rep(1:2, each = 5), y = runif(10), z = 1:10) #' #' df %>% #' window_order(y) %>% #' mutate(z = cumsum(y)) %>% #' sql_build() #' #' df %>% #' group_by(g) %>% #' window_frame(-3, 0) %>% #' window_order(z) %>% #' mutate(z = sum(x)) %>% #' sql_build() #' @export window_order <- function(.data, ...) { dots <- quos(...) dots <- partial_eval(dots, vars = op_vars(.data)) names(dots) <- NULL add_op_order(.data, dots) } #' @export #' @rdname window_order window_frame <- function(.data, from = -Inf, to = Inf) { stopifnot(is.numeric(from), length(from) == 1) stopifnot(is.numeric(to), length(to) == 1) add_op_single("frame", .data, args = list(range = c(from, to))) } #' @export sql_build.op_frame <- function(op, con, ...) { sql_build(op$x, con, ...) } #' @export #' @rdname lazy_ops op_frame <- function(op) UseMethod("op_frame") #' @export op_frame.tbl_lazy <- function(op) { op_frame(op$ops) } #' @export op_frame.op_base <- function(op) { NULL } #' @export op_frame.op_single <- function(op) { op_frame(op$x) } #' @export op_frame.op_double <- function(op) { op_frame(op$x) } #' @export op_frame.op_frame <- function(op) { op$args$range } dbplyr/R/test-frame.R0000644000176200001440000000421613174433047014152 0ustar liggesusers#' Infrastructure for testing dplyr #' #' Register testing sources, then use `test_load()` to load an existing #' data frame into each source. To create a new table in each source, #' use `test_frame()`. 
#' #' @keywords internal #' @examples #' \dontrun{ #' test_register_src("df", src_df(env = new.env())) #' test_register_src("sqlite", src_sqlite(":memory:", create = TRUE)) #' #' test_frame(x = 1:3, y = 3:1) #' test_load(mtcars) #' } #' @name testing NULL #' @export #' @rdname testing test_register_src <- function(name, src) { message("Registering testing src: ", name, " ", appendLF = FALSE) tryCatch( { test_srcs$add(name, src) message("OK") }, error = function(e) message("\n* ", conditionMessage(e)) ) } #' @export #' @rdname testing test_register_con <- function(name, ...) { test_register_src(name, src_dbi(DBI::dbConnect(...), auto_disconnect = TRUE)) } #' @export #' @rdname testing src_test <- function(name) { srcs <- test_srcs$get() if (!name %in% names(srcs)) { stop("Couldn't find test src ", name, call. = FALSE) } srcs[[name]] } #' @export #' @rdname testing test_load <- function(df, name = random_table_name(), srcs = test_srcs$get(), ignore = character()) { stopifnot(is.data.frame(df)) stopifnot(is.character(ignore)) srcs <- srcs[setdiff(names(srcs), ignore)] lapply(srcs, copy_to, df, name = name) } #' @export #' @rdname testing test_frame <- function(..., srcs = test_srcs$get(), ignore = character()) { df <- data_frame(...) test_load(df, srcs = srcs, ignore = ignore) } test_frame_windowed <- function(...) { # SQLite and MySQL don't support window functions test_frame(..., ignore = c("sqlite", "mysql", "MariaDB")) } # Manage cache of testing srcs test_srcs <- local({ e <- new.env(parent = emptyenv()) e$srcs <- list() list( get = function() e$srcs, has = function(x) x %in% names(e$srcs), add = function(name, src) { stopifnot(is.src(src)) e$srcs[[name]] <- src }, set = function(...) { old <- e$srcs e$srcs <- list(...) invisible(old) }, length = function() { length(e$srcs) } ) }) dbplyr/R/sql-optimise.R0000644000176200001440000000270013176710165014526 0ustar liggesusers#' @export #' @rdname sql_build sql_optimise <- function(x, con = NULL, ...) { UseMethod("sql_optimise") } #' @export sql_optimise.sql <- function(x, con = NULL, ...) { # Can't optimise raw SQL x } #' @export sql_optimise.ident <- function(x, con = NULL, ...) { x } #' @export sql_optimise.query <- function(x, con = NULL, ...) { # Default to no optimisation x } #' @export sql_optimise.select_query <- function(x, con = NULL, ...) 
{ if (!inherits(x$from, "select_query")) { return(x) } from <- sql_optimise(x$from) # If all outer clauses are executed after the inner clauses, we # can drop them down a level outer <- select_query_clauses(x) inner <- select_query_clauses(from) if (length(outer) == 0 || length(inner) == 0) return(x) if (min(outer) > max(inner)) { from[as.character(outer)] <- x[as.character(outer)] from } else { x } } # List clauses used by a query, in the order they are executed # https://sqlbolt.com/lesson/select_queries_order_of_execution # List clauses used by a query, in the order they are executed in select_query_clauses <- function(x) { present <- c( where = length(x$where) > 0, group_by = length(x$group_by) > 0, having = length(x$having) > 0, select = !identical(x$select, sql("*")), distinct = x$distinct, order_by = length(x$order_by) > 0, limit = !is.null(x$limit) ) ordered(names(present)[present], levels = names(present)) } dbplyr/R/translate-sql-odbc.R0000644000176200001440000000511213176710165015577 0ustar liggesusers#' @export #' @rdname sql_variant #' @format NULL base_odbc_scalar <- sql_translator(.parent = base_scalar, as.numeric = sql_cast("DOUBLE"), as.double = sql_cast("DOUBLE"), as.integer = sql_cast("INT"), as.logical = sql_cast("BOOLEAN"), as.character = sql_cast("STRING"), as.Date = sql_cast("DATE"), paste0 = sql_prefix("CONCAT"), # cosh, sinh, coth and tanh calculations are based on this article # https://en.wikipedia.org/wiki/Hyperbolic_function cosh = function(x) build_sql("(EXP(", x, ") + EXP(-(", x,"))) / 2"), sinh = function(x) build_sql("(EXP(", x, ") - EXP(-(", x,"))) / 2"), tanh = function(x){ build_sql( "((EXP(", x, ") - EXP(-(", x,"))) / 2) / ((EXP(", x, ") + EXP(-(", x,"))) / 2)" )}, round = function(x, digits = 0L){ build_sql( "ROUND(", x, ", ", as.integer(digits),")" )}, coth = function(x){ build_sql( "((EXP(", x, ") + EXP(-(", x,"))) / 2) / ((EXP(", x, ") - EXP(-(", x,"))) / 2)" )}, paste = function(..., sep = " "){ build_sql( "CONCAT_WS(",sep, ", ",escape(c(...), parens = "", collapse = ","),")" )} ) #' @export #' @rdname sql_variant #' @format NULL base_odbc_agg <- sql_translator( .parent = base_agg, n = function() sql("COUNT(*)"), count = function() sql("COUNT(*)"), sd = sql_prefix("STDDEV_SAMP") ) #' @export #' @rdname sql_variant #' @format NULL base_odbc_win <- sql_translator(.parent = base_win, sd = win_aggregate("STDDEV_SAMP"), count = function() { win_over(sql("COUNT(*)"), win_current_group()) } ) #' @export db_desc.OdbcConnection <- function(x) { info <- DBI::dbGetInfo(x) host <- if (info$servername == "") "localhost" else info$servername port <- if (info$port == "") "" else paste0(":", info$port) paste0( info$dbms.name, " ", info$db.version, "[", info$username, "@", host, port, "/", info$dbname , "]") } #' @export sql_translate_env.OdbcConnection <- function(con) { sql_variant( base_odbc_scalar, base_odbc_agg, base_odbc_win ) } #' @export db_drop_table.OdbcConnection <- function(con, table, force = FALSE, ...) 
{ sql <- build_sql( "DROP TABLE ", if (force) sql("IF EXISTS "), sql(table), con = con ) DBI::dbExecute(con, sql) } dbplyr/R/sql-query.R0000644000176200001440000001302013221455756014042 0ustar liggesusers # select_query ------------------------------------------------------------ #' @export #' @rdname sql_build select_query <- function(from, select = sql("*"), where = character(), group_by = character(), having = character(), order_by = character(), limit = NULL, distinct = FALSE) { stopifnot(is.character(select)) stopifnot(is.character(where)) stopifnot(is.character(group_by)) stopifnot(is.character(having)) stopifnot(is.character(order_by)) stopifnot(is.null(limit) || (is.numeric(limit) && length(limit) == 1L)) stopifnot(is.logical(distinct), length(distinct) == 1L) structure( list( from = from, select = select, where = where, group_by = group_by, having = having, order_by = order_by, distinct = distinct, limit = limit ), class = c("select_query", "query") ) } #' @export print.select_query <- function(x, ...) { cat( "\n", sep = "" ) cat("From: ", gsub("\n", " ", sql_render(x$from, root = FALSE)), "\n", sep = "") if (length(x$select)) cat("Select: ", named_commas(x$select), "\n", sep = "") if (length(x$where)) cat("Where: ", named_commas(x$where), "\n", sep = "") if (length(x$group_by)) cat("Group by: ", named_commas(x$group_by), "\n", sep = "") if (length(x$order_by)) cat("Order by: ", named_commas(x$order_by), "\n", sep = "") if (length(x$having)) cat("Having: ", named_commas(x$having), "\n", sep = "") if (length(x$limit)) cat("Limit: ", x$limit, "\n", sep = "") } #' @export #' @rdname sql_build join_query <- function(x, y, vars, type = "inner", by = NULL, suffix = c(".x", ".y")) { structure( list( x = x, y = y, vars = vars, type = type, by = by ), class = c("join_query", "query") ) } # Returns NULL if variables don't need to be renamed join_vars <- function(x_names, y_names, type, by, suffix = c(".x", ".y")) { # Remove join keys from y y_names <- setdiff(y_names, by$y) # Add suffix where needed suffix <- check_suffix(suffix) x_new <- add_suffixes(x_names, y_names, suffix$x) y_new <- add_suffixes(y_names, x_names, suffix$y) # In left and inner joins, return key values only from x # In right joins, return key values only from y # In full joins, return key values by coalescing values from x and y x_x <- x_names x_y <- by$y[match(x_names, by$x)] x_y[type == "left" | type == "inner"] <- NA x_x[type == "right" & !is.na(x_y)] <- NA y_x <- rep_len(NA, length(y_names)) y_y <- y_names # Return a list with 3 parallel vectors # At each position, values in the 3 vectors represent # alias - name of column in join result # x - name of column from left table or NA if only from right table # y - name of column from right table or NA if only from left table list(alias = c(x_new, y_new), x = c(x_x, y_x), y = c(x_y, y_y)) } add_suffixes <- function(x, y, suffix) { if (identical(suffix, "")) { return(x) } out <- chr_along(x) for (i in seq_along(x)) { nm <- x[[i]] while (nm %in% y || nm %in% out) { nm <- paste0(nm, suffix) } out[[i]] <- nm } out } semi_join_vars <- function(x_names, y_names) { all_names <- set_names(union(x_names, y_names)) x_new <- all_names x_new[!all_names %in% x_names] <- NA y_new <- all_names y_new[!all_names %in% y_names] <- NA list(x = x_new, y = y_new) } get_join_xy_names <- function(by, uniques) { xy_by <- by$x[by$x == by$y] x_names <- uniques$x x_rename <- names(x_names) %in% xy_by names(x_names)[!x_rename] <- "" y_names <- uniques$y y_remove <- names(y_names) %in% xy_by y_names 
<- unname(y_names[!y_remove]) c(x_names, y_names) } #' @export print.join_query <- function(x, ...) { cat("\n", sep = "") cat("By: ", paste0(x$by$x, "-", x$by$y, collapse = ", "), "\n", sep = "") cat(named_rule("X"), "\n", sep = "") print(x$x$ops) cat(named_rule("Y"), "\n", sep = "") print(x$y$ops) } #' @export #' @rdname sql_build semi_join_query <- function(x, y, anti = FALSE, by = NULL) { structure( list( x = x, y = y, anti = anti, by = by ), class = c("semi_join_query", "query") ) } #' @export print.semi_join_query <- function(x, ...) { cat( "\n", sep = "" ) cat("By: ", paste0(x$by$x, "-", x$by$y, collapse = ", "), "\n", sep = "") cat(named_rule("X"), "\n", sep = "") print(x$x$ops) cat(named_rule("Y"), "\n", sep = "") print(x$y$ops) } #' @export #' @rdname sql_build set_op_query <- function(x, y, type = type) { structure( list( x = x, y = y, type = type ), class = c("set_op_query", "query") ) } #' @export print.set_op_query <- function(x, ...) { cat("\n", sep = "") cat(named_rule("X"), "\n", sep = "") print(x$x$ops) cat(named_rule("Y"), "\n", sep = "") print(x$y$ops) } check_suffix <- function(x) { if (!is.character(x) || length(x) != 2) { stop("`suffix` must be a character vector of length 2.", call. = FALSE) } list(x = x[1], y = x[2]) } common_by_from_vector <- function(by) { by <- by[!duplicated(by)] by_x <- names(by) %||% by by_y <- unname(by) # If x partially named, assume unnamed are the same in both tables by_x[by_x == ""] <- by_y[by_x == ""] list(x = by_x, y = by_y) } dbplyr/R/memdb.R0000644000176200001440000000140313066545775013176 0ustar liggesusers#' Create a database table in temporary in-memory database. #' #' `memdb_frame()` works like [tibble::tibble()], but instead of creating a new #' data frame in R, it creates a table in [src_memdb()]. #' #' @inheritParams tibble::data_frame #' @param .name Name of table in database: defaults to a random name that's #' unlikely to conflict with an existing table. 
#' @export
#' @examples
#' library(dplyr)
#' df <- memdb_frame(x = runif(100), y = runif(100))
#' df %>% arrange(x)
#' df %>% arrange(x) %>% show_query()
memdb_frame <- function(..., .name = random_table_name()) {
  x <- copy_to(src_memdb(), data_frame(...), name = .name)
  x
}

#' @rdname memdb_frame
#' @export
src_memdb <- function() {
  cache_computation("src_memdb", src_sqlite(":memory:", TRUE))
}
dbplyr/R/translate-sql-base.r0000644000176200001440000002415713176663723015653 0ustar liggesusers#' @include translate-sql-window.r
#' @include translate-sql-helpers.r
#' @include sql-escape.r
NULL

sql_if <- function(cond, if_true, if_false = NULL) {
  build_sql(
    "CASE WHEN (", cond, ")",
    " THEN (", if_true, ")",
    if (!is.null(if_false)) build_sql(" WHEN NOT(", cond, ") THEN (", if_false, ")"),
    " END"
  )
}

#' @export
#' @rdname sql_variant
#' @format NULL
base_scalar <- sql_translator(
  `+` = sql_infix("+"),
  `*` = sql_infix("*"),
  `/` = sql_infix("/"),
  `%%` = sql_infix("%"),
  `^` = sql_prefix("power", 2),
  `-` = function(x, y = NULL) {
    if (is.null(y)) {
      if (is.numeric(x)) {
        -x
      } else {
        build_sql(sql("-"), x)
      }
    } else {
      build_sql(x, sql(" - "), y)
    }
  },

  `!=` = sql_infix("!="),
  `==` = sql_infix("="),
  `<` = sql_infix("<"),
  `<=` = sql_infix("<="),
  `>` = sql_infix(">"),
  `>=` = sql_infix(">="),

  `%in%` = function(x, table) {
    if (is.sql(table) || length(table) > 1) {
      build_sql(x, " IN ", table)
    } else {
      build_sql(x, " IN (", table, ")")
    }
  },

  `!` = sql_prefix("not"),
  `&` = sql_infix("and"),
  `&&` = sql_infix("and"),
  `|` = sql_infix("or"),
  `||` = sql_infix("or"),
  # Parenthesise the OR: AND binds tighter in SQL, so without the parens
  # this expression would reduce to a plain OR instead of an exclusive-or
  xor = function(x, y) {
    sql(sprintf("(%1$s OR %2$s) AND NOT (%1$s AND %2$s)", escape(x), escape(y)))
  },

  abs = sql_prefix("abs", 1),
  acos = sql_prefix("acos", 1),
  acosh = sql_prefix("acosh", 1),
  asin = sql_prefix("asin", 1),
  asinh = sql_prefix("asinh", 1),
  atan = sql_prefix("atan", 1),
  atan2 = sql_prefix("atan2", 2),
  atanh = sql_prefix("atanh", 1),
  ceil = sql_prefix("ceil", 1),
  ceiling = sql_prefix("ceil", 1),
  cos = sql_prefix("cos", 1),
  cosh = sql_prefix("cosh", 1),
  cot = sql_prefix("cot", 1),
  coth = sql_prefix("coth", 1),
  exp = sql_prefix("exp", 1),
  floor = sql_prefix("floor", 1),
  log = function(x, base = exp(1)) {
    if (isTRUE(all.equal(base, exp(1)))) {
      sql_expr(ln(!!x))
    } else {
      sql_expr(log(!!base, !!x))
    }
  },
  log10 = sql_prefix("log10", 1),
  round = sql_prefix("round", 2),
  sign = sql_prefix("sign", 1),
  sin = sql_prefix("sin", 1),
  sinh = sql_prefix("sinh", 1),
  sqrt = sql_prefix("sqrt", 1),
  tan = sql_prefix("tan", 1),
  tanh = sql_prefix("tanh", 1),

  tolower = sql_prefix("lower", 1),
  toupper = sql_prefix("upper", 1),
  trimws = sql_prefix("trim", 1),
  nchar = sql_prefix("length", 1),
  substr = function(x, start, stop) {
    start <- as.integer(start)
    length <- pmax(as.integer(stop) - start + 1L, 0L)
    build_sql(sql("substr"), list(x, start, length))
  },

  `if` = sql_if,
  if_else = function(condition, true, false) sql_if(condition, true, false),
  ifelse = function(test, yes, no) sql_if(test, yes, no),
  case_when = function(...) sql_case_when(...),

  sql = function(...) sql(...),
  `(` = function(x) {
    build_sql("(", x, ")")
  },
  `{` = function(x) {
    build_sql("(", x, ")")
  },
  desc = function(x) {
    build_sql(x, sql(" DESC"))
  },

  is.null = function(x) sql_expr(((!!x) %is% NULL)),
  is.na = function(x) sql_expr(((!!x) %is% NULL)),
  # The ANSI SQL function is NULLIF (not NULL_IF)
  na_if = sql_prefix("NULLIF", 2),
  coalesce = sql_prefix("coalesce"),

  as.numeric = sql_cast("NUMERIC"),
  as.double = sql_cast("NUMERIC"),
  as.integer = sql_cast("INTEGER"),
  as.character = sql_cast("TEXT"),

  c = function(...)
c(...), `:` = function(from, to) from:to, between = function(x, left, right) { build_sql(x, " BETWEEN ", left, " AND ", right) }, pmin = sql_prefix("min"), pmax = sql_prefix("max"), `%>%` = `%>%`, # stringr functions # SQL Syntax reference links: # MySQL https://dev.mysql.com/doc/refman/5.7/en/string-functions.html # Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions # Impala: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_string_functions.html # PostgreSQL: https://www.postgresql.org/docs/9.1/static/functions-string.html # MS SQL: https://docs.microsoft.com/en-us/sql/t-sql/functions/string-functions-transact-sql # Oracle: https://docs.oracle.com/middleware/bidv1221/desktop/BIDVD/GUID-BBA975C7-B2C5-4C94-A007-28775680F6A5.htm#BILUG685 str_length = sql_prefix("LENGTH"), str_to_upper = sql_prefix("UPPER"), str_to_lower = sql_prefix("LOWER"), str_replace_all = function(string, pattern, replacement){ build_sql( "REPLACE(", string, ", ", pattern, ", ", replacement, ")" )}, str_detect = function(string, pattern){ build_sql( "INSTR(", pattern, ", ", string, ") > 0" )}, str_trim = function(string, side = "both"){ build_sql( sql(ifelse(side == "both" | side == "left", "LTRIM(", "(")), sql(ifelse(side == "both" | side == "right", "RTRIM(", "(")), string ,"))" )} ) base_symbols <- sql_translator( pi = sql("PI()"), `*` = sql("*"), `NULL` = sql("NULL") ) #' @export #' @rdname sql_variant #' @format NULL base_agg <- sql_translator( # SQL-92 aggregates # http://db.apache.org/derby/docs/10.7/ref/rrefsqlj33923.html n = function() sql("COUNT()"), mean = sql_aggregate("avg"), var = sql_aggregate("variance"), sum = sql_aggregate("sum"), min = sql_aggregate("min"), max = sql_aggregate("max"), n_distinct = function(...) { vars <- sql_vector(list(...), parens = FALSE, collapse = ", ") build_sql("COUNT(DISTINCT ", vars, ")") } ) #' @export #' @rdname sql_variant #' @format NULL base_win <- sql_translator( # rank functions have a single order argument that overrides the default row_number = win_rank("row_number"), min_rank = win_rank("rank"), rank = win_rank("rank"), dense_rank = win_rank("dense_rank"), percent_rank = win_rank("percent_rank"), cume_dist = win_rank("cume_dist"), ntile = function(order_by, n) { win_over( build_sql("NTILE", list(as.integer(n))), win_current_group(), order_by %||% win_current_order() ) }, # Variants that take more arguments first = function(x, order_by = NULL) { win_over( build_sql("first_value", list(x)), win_current_group(), order_by %||% win_current_order() ) }, last = function(x, order_by = NULL) { win_over( build_sql("last_value", list(x)), win_current_group(), order_by %||% win_current_order() ) }, nth = function(x, n, order_by = NULL) { win_over( build_sql("nth_value", list(x, as.integer(n))), win_current_group(), order_by %||% win_current_order() ) }, lead = function(x, n = 1L, default = NA, order_by = NULL) { win_over( build_sql("LEAD", list(x, n, default)), win_current_group(), order_by %||% win_current_order() ) }, lag = function(x, n = 1L, default = NA, order_by = NULL) { win_over( build_sql("LAG", list(x, as.integer(n), default)), win_current_group(), order_by %||% win_current_order() ) }, # Recycled aggregate fuctions take single argument, don't need order and # include entire partition in frame. 
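  # For example (an illustrative sketch, assuming the default ANSI
  # translator): translate_sql(mean(x), vars_group = "g") renders
  # avg("x") OVER (PARTITION BY "g"), so the aggregate value is
  # recycled across every row of its partition.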
mean = win_aggregate("avg"), var = win_aggregate("variance"), sum = win_aggregate("sum"), min = win_aggregate("min"), max = win_aggregate("max"), n = function() { win_over(sql("COUNT(*)"), win_current_group()) }, n_distinct = function(...) { vars <- sql_vector(list(...), parens = FALSE, collapse = ", ") win_over(build_sql("COUNT(DISTINCT ", vars, ")"), win_current_group()) }, # Cumulative function are like recycled aggregates except that R names # have cum prefix, order_by is inherited and frame goes from -Inf to 0. cummean = win_cumulative("mean"), cumsum = win_cumulative("sum"), cummin = win_cumulative("min"), cummax = win_cumulative("max"), # Manually override other parameters -------------------------------------- order_by = function(order_by, expr) { old <- set_win_current_order(order_by) on.exit(set_win_current_order(old)) expr } ) #' @export #' @rdname sql_variant #' @format NULL base_no_win <- sql_translator( row_number = win_absent("row_number"), min_rank = win_absent("rank"), rank = win_absent("rank"), dense_rank = win_absent("dense_rank"), percent_rank = win_absent("percent_rank"), cume_dist = win_absent("cume_dist"), ntile = win_absent("ntile"), mean = win_absent("avg"), sd = win_absent("sd"), var = win_absent("var"), cov = win_absent("cov"), cor = win_absent("cor"), sum = win_absent("sum"), min = win_absent("min"), max = win_absent("max"), n = win_absent("n"), n_distinct = win_absent("n_distinct"), cummean = win_absent("mean"), cumsum = win_absent("sum"), cummin = win_absent("min"), cummax = win_absent("max"), nth = win_absent("nth_value"), first = win_absent("first_value"), last = win_absent("last_value"), lead = win_absent("lead"), lag = win_absent("lag"), order_by = win_absent("order_by"), str_flatten = win_absent("str_flatten"), count = win_absent("count") ) # case_when --------------------------------------------------------------- sql_case_when <- function(...) { # TODO: switch to dplyr::case_when_prepare when available formulas <- dots_list(...) n <- length(formulas) if (n == 0) { abort("No cases provided") } query <- vector("list", n) value <- vector("list", n) for (i in seq_len(n)) { f <- formulas[[i]] env <- environment(f) query[[i]] <- escape(eval_bare(f[[2]], env), con = sql_current_con()) value[[i]] <- escape(eval_bare(f[[3]], env), con = sql_current_con()) } clauses <- purrr::map2_chr(query, value, ~ paste0("WHEN (", .x, ") THEN (", .y, ")")) sql(paste0( "CASE\n", paste0(clauses, collapse = "\n"), "\nEND" )) } dbplyr/R/db-compute.R0000644000176200001440000000572613176710165014152 0ustar liggesusers#' More db generics #' #' These are new, so not included in dplyr for backward compatibility #' purposes. #' #' @keywords internal #' @export db_copy_to <- function(con, table, values, overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) { UseMethod("db_copy_to") } #' @export db_copy_to.DBIConnection <- function(con, table, values, overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) 
{ types <- types %||% db_data_type(con, values) names(types) <- names(values) db_begin(con) tryCatch({ if (overwrite) { db_drop_table(con, table, force = TRUE) } table <- db_write_table(con, table, types = types, values = values, temporary = temporary) db_create_indexes(con, table, unique_indexes, unique = TRUE) db_create_indexes(con, table, indexes, unique = FALSE) if (analyze) db_analyze(con, table) db_commit(con) }, error = function(err) { db_rollback(con) stop(err) }) table } #' @export #' @rdname db_copy_to db_compute <- function(con, table, sql, temporary = TRUE, unique_indexes = list(), indexes = list(), analyze = TRUE, ...) { UseMethod("db_compute") } #' @export db_compute.DBIConnection <- function(con, table, sql, temporary = TRUE, unique_indexes = list(), indexes = list(), analyze = TRUE, ...) { if (!is.list(indexes)) { indexes <- as.list(indexes) } if (!is.list(unique_indexes)) { unique_indexes <- as.list(unique_indexes) } table <- db_save_query(con, sql, table, temporary = temporary) db_create_indexes(con, table, unique_indexes, unique = TRUE) db_create_indexes(con, table, indexes, unique = FALSE) if (analyze) db_analyze(con, table) table } #' @export #' @rdname db_copy_to db_collect <- function(con, sql, n = -1, warn_incomplete = TRUE, ...) { UseMethod("db_collect") } #' @export db_collect.DBIConnection <- function(con, sql, n = -1, warn_incomplete = TRUE, ...) { res <- dbSendQuery(con, sql) tryCatch({ out <- dbFetch(res, n = n) if (warn_incomplete) { res_warn_incomplete(res, "n = Inf") } }, finally = { dbClearResult(res) }) out } #' @rdname db_copy_to #' @export db_sql_render <- function(con, sql, ...) { UseMethod("db_sql_render") } #' @export db_sql_render.DBIConnection <- function(con, sql, ...) { qry <- sql_build(sql, con = con, ...) sql_render(qry, con = con, ...) } dbplyr/R/db-odbc-redshift.R0000644000176200001440000000015113124463506015173 0ustar liggesusers#' @export sql_translate_env.Redshift <- function(con) { sql_translate_env.PostgreSQLConnection(con) } dbplyr/R/translate-sql.r0000644000176200001440000001632513174454174014745 0ustar liggesusers#' Translate an expression to sql. #' #' @section Base translation: #' The base translator, `base_sql`, #' provides custom mappings for `!` (to NOT), `&&` and `&` to #' `AND`, `||` and `|` to `OR`, `^` to `POWER`, #' \code{\%>\%} to \code{\%}, `ceiling` to `CEIL`, `mean` to #' `AVG`, `var` to `VARIANCE`, `tolower` to `LOWER`, #' `toupper` to `UPPER` and `nchar` to `LENGTH`. #' #' `c()` and `:` keep their usual R behaviour so you can easily create #' vectors that are passed to sql. #' #' All other functions will be preserved as is. R's infix functions #' (e.g. \code{\%like\%}) will be converted to their SQL equivalents #' (e.g. `LIKE`). You can use this to access SQL string concatenation: #' `||` is mapped to `OR`, but \code{\%||\%} is mapped to `||`. #' To suppress this behaviour, and force errors immediately when dplyr doesn't #' know how to translate a function it encounters, using set the #' `dplyr.strict_sql` option to `TRUE`. #' #' You can also use [sql()] to insert a raw sql string. #' #' @section SQLite translation: #' The SQLite variant currently only adds one additional function: a mapping #' from `sd()` to the SQL aggregation function `STDEV`. #' #' @param ...,dots Expressions to translate. `translate_sql()` #' automatically quotes them for you. `translate_sql_()` expects #' a list of already quoted objects. #' @param con An optional database connection to control the details of #' the translation. 
The default, `NULL`, generates ANSI SQL. #' @param vars Deprecated. Now call [partial_eval()] directly. #' @param vars_group,vars_order,vars_frame Parameters used in the `OVER` #' expression of windowed functions. #' @param window Use `FALSE` to suppress generation of the `OVER` #' statement used for window functions. This is necessary when generating #' SQL for a grouped summary. #' @param context Use to carry information for special translation cases. For example, MS SQL needs a different conversion for is.na() in WHERE vs. SELECT clauses. Expects a list. #' @export #' @examples #' # Regular maths is translated in a very straightforward way #' translate_sql(x + 1) #' translate_sql(sin(x) + tan(y)) #' #' # Note that all variable names are escaped #' translate_sql(like == "x") #' # In ANSI SQL: "" quotes variable _names_, '' quotes strings #' #' # Logical operators are converted to their sql equivalents #' translate_sql(x < 5 & !(y >= 5)) #' # xor() doesn't have a direct SQL equivalent #' translate_sql(xor(x, y)) #' #' # If is translated into case when #' translate_sql(if (x > 5) "big" else "small") #' #' # Infix functions are passed onto SQL with % removed #' translate_sql(first %like% "Had%") #' translate_sql(first %is% NULL) #' translate_sql(first %in% c("John", "Roger", "Robert")) #' #' # And be careful if you really want integers #' translate_sql(x == 1) #' translate_sql(x == 1L) #' #' # If you have an already quoted object, use translate_sql_: #' x <- quote(y + 1 / sin(t)) #' translate_sql_(list(x)) #' #' # Windowed translation -------------------------------------------- #' # Known window functions automatically get OVER() #' translate_sql(mpg > mean(mpg)) #' #' # Suppress this with window = FALSE #' translate_sql(mpg > mean(mpg), window = FALSE) #' #' # vars_group controls partition: #' translate_sql(mpg > mean(mpg), vars_group = "cyl") #' #' # and vars_order controls ordering for those functions that need it #' translate_sql(cumsum(mpg)) #' translate_sql(cumsum(mpg), vars_order = "mpg") translate_sql <- function(..., con = NULL, vars = character(), vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE) { if (!missing(vars)) { abort("`vars` is deprecated. 
Please use partial_eval() directly.") } translate_sql_( quos(...), con = con, vars_group = vars_group, vars_order = vars_order, vars_frame = vars_frame, window = window ) } #' @export #' @rdname translate_sql translate_sql_ <- function(dots, con = NULL, vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE, context = list()) { if (length(dots) == 0) { return(sql()) } stopifnot(is.list(dots)) if (!any(have_name(dots))) { names(dots) <- NULL } old_con <- set_current_con(con) on.exit(set_current_con(old_con), add = TRUE) if (length(context) > 0) { old_context <- set_current_context(context) on.exit(set_current_context(context), add = TRUE) } if (window) { old_group <- set_win_current_group(vars_group) on.exit(set_win_current_group(old_group), add = TRUE) old_order <- set_win_current_order(vars_order) on.exit(set_win_current_order(old_order), add = TRUE) old_frame <- set_win_current_frame(vars_frame) on.exit(set_win_current_frame(old_frame), add = TRUE) } variant <- sql_translate_env(con) pieces <- map(dots, function(x) { if (is_atomic(get_expr(x))) { escape(get_expr(x), con = con) } else { overscope <- sql_overscope(x, variant, con, window = window) on.exit(overscope_clean(overscope)) escape(overscope_eval_next(overscope, x)) } }) sql(unlist(pieces)) } sql_overscope <- function(expr, variant, con, window = FALSE, strict = getOption("dplyr.strict_sql", FALSE)) { stopifnot(is.sql_variant(variant)) # Default for unknown functions if (!strict) { unknown <- setdiff(all_calls(expr), names(variant)) top_env <- ceply(unknown, default_op, parent = empty_env()) } else { top_env <- child_env(NULL) } # Known R -> SQL functions special_calls <- copy_env(variant$scalar, parent = top_env) if (!window) { special_calls2 <- copy_env(variant$aggregate, parent = special_calls) } else { special_calls2 <- copy_env(variant$window, parent = special_calls) } # Existing symbols in expression names <- all_names(expr) name_env <- ceply( names, function(x) escape(ident(x), con = con), parent = special_calls2 ) # Known sql expressions symbol_env <- env_clone(base_symbols, parent = name_env) new_overscope(symbol_env, top_env) } is_infix_base <- function(x) { x %in% c("::", "$", "@", "^", "*", "/", "+", "-", ">", ">=", "<", "<=", "==", "!=", "!", "&", "&&", "|", "||", "~", "<-", "<<-") } is_infix_user <- function(x) { grepl("^%.*%$", x) } default_op <- function(x) { assert_that(is_string(x)) if (is_infix_base(x)) { sql_infix(x) } else if (is_infix_user(x)) { x <- substr(x, 2, nchar(x) - 1) sql_infix(x) } else { sql_prefix(x) } } all_calls <- function(x) { if (!is.call(x)) return(NULL) fname <- as.character(x[[1]]) unique(c(fname, unlist(lapply(x[-1], all_calls), use.names = FALSE))) } all_names <- function(x) { if (is.name(x)) return(as.character(x)) if (!is.call(x)) return(NULL) unique(unlist(lapply(x[-1], all_names), use.names = FALSE)) } # character vector -> environment ceply <- function(x, f, ..., parent = parent.frame()) { if (length(x) == 0) return(new.env(parent = parent)) l <- lapply(x, f, ...) names(l) <- x list2env(l, parent = parent) } dbplyr/R/tbl-lazy.R0000644000176200001440000001601513221456356013642 0ustar liggesusers#' Create a local lazy tibble #' #' These functions are useful for testing SQL generation without having to #' have an active database connection. 
#' #' @keywords internal #' @export #' @examples #' library(dplyr) #' df <- data.frame(x = 1, y = 2) #' #' df_sqlite <- tbl_lazy(df, src = simulate_sqlite()) #' df_sqlite %>% summarise(x = sd(x)) %>% show_query() tbl_lazy <- function(df, src = NULL) { make_tbl("lazy", ops = op_base_local(df), src = src) } setOldClass(c("tbl_lazy", "tbl")) #' @export #' @rdname tbl_lazy lazy_frame <- function(..., src = NULL) { tbl_lazy(tibble(...), src = src) } #' @export same_src.tbl_lazy <- function(x, y) { inherits(y, "tbl_lazy") } #' @export tbl_vars.tbl_lazy <- function(x) { op_vars(x$ops) } #' @export groups.tbl_lazy <- function(x) { lapply(group_vars(x), as.name) } #' @export group_vars.tbl_lazy <- function(x) { op_grps(x$ops) } render_lazy <- function(x, ...) { cat("Source: lazy\n") cat("Vars : ", commas(op_vars(x$ops)), "\n", sep = "") cat("Groups: ", commas(op_grps(x$ops)), "\n", sep = "") cat("\n") print(x$ops) } # Single table methods ---------------------------------------------------- # registered onLoad filter.tbl_lazy <- function(.data, ...) { dots <- quos(...) dots <- partial_eval(dots, vars = op_vars(.data)) add_op_single("filter", .data, dots = dots) } #' @export filter_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) dots <- partial_eval(dots, vars = op_vars(.data)) add_op_single("filter", .data, dots = dots) } #' @export arrange.tbl_lazy <- function(.data, ...) { dots <- quos(...) dots <- partial_eval(dots, vars = op_vars(.data)) names(dots) <- NULL add_op_single("arrange", .data, dots = dots) } #' @export arrange_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) arrange(.data, !!! dots) } #' @export select.tbl_lazy <- function(.data, ...) { dots <- quos(...) add_op_single("select", .data, dots = dots) } #' @export select_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) add_op_single("select", .data, dots = dots) } #' @export rename.tbl_lazy <- function(.data, ...) { dots <- quos(...) dots <- partial_eval(dots, vars = op_vars(.data)) add_op_single("rename", .data, dots = dots) } #' @export rename_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) dots <- partial_eval(dots, vars = op_vars(.data)) add_op_single("rename", .data, dots = dots) } #' @export summarise.tbl_lazy <- function(.data, ...) { dots <- quos(..., .named = TRUE) dots <- partial_eval(dots, vars = op_vars(.data)) add_op_single("summarise", .data, dots = dots) } #' @export summarise_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) dots <- partial_eval(dots, vars = op_vars(.data)) add_op_single("summarise", .data, dots = dots) } #' @export mutate.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- quos(..., .named = TRUE) dots <- partial_eval(dots, vars = op_vars(.data)) # For each expression, check if it uses any newly created variables. 
  # If so, nest the mutate()
  new_vars <- character()
  init <- 0L
  for (i in seq_along(dots)) {
    cur_var <- names(dots)[[i]]
    used_vars <- all_names(get_expr(dots[[i]]))

    if (any(used_vars %in% new_vars)) {
      .data <- add_op_single("mutate", .data, dots = dots[new_vars])
      new_vars <- cur_var
      init <- i
    } else {
      new_vars <- c(new_vars, cur_var)
    }
  }

  if (init != 0L) {
    dots <- dots[-seq2(1L, init - 1)]
  }
  add_op_single("mutate", .data, dots = dots)
}

#' @export
mutate_.tbl_lazy <- function(.data, ..., .dots = list()) {
  dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...)
  dots <- partial_eval(dots, vars = op_vars(.data))
  add_op_single("mutate", .data, dots = dots)
}

#' @export
group_by.tbl_lazy <- function(.data, ..., add = FALSE) {
  dots <- quos(...)
  dots <- partial_eval(dots, vars = op_vars(.data))

  if (length(dots) == 0) {
    return(.data)
  }

  groups <- group_by_prepare(.data, .dots = dots, add = add)
  names <- map_chr(groups$groups, as_string)

  add_op_single("group_by",
    groups$data,
    dots = set_names(groups$groups, names),
    args = list(add = FALSE)
  )
}

#' @export
group_by_.tbl_lazy <- function(.data, ..., .dots = list(), add = FALSE) {
  dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...)
  group_by(.data, !!! dots, add = add)
}

#' @export
head.tbl_lazy <- function(x, n = 6L, ...) {
  if (inherits(x$ops, "op_head")) {
    x$ops$args$n <- min(x$ops$args$n, n)
  } else {
    # `dots = dots` previously passed the internal dots() helper by accident;
    # a head op carries no expressions, so leave the default empty list
    x$ops <- op_single("head", x = x$ops, args = list(n = n))
  }
  x
}

#' @export
ungroup.tbl_lazy <- function(x, ...) {
  add_op_single("ungroup", x)
}

#' @export
distinct.tbl_lazy <- function(.data, ..., .keep_all = FALSE) {
  dots <- quos(..., .named = TRUE)
  dots <- partial_eval(dots, vars = op_vars(.data))
  add_op_single("distinct", .data, dots = dots, args = list(.keep_all = .keep_all))
}

#' @export
distinct_.tbl_lazy <- function(.data, ..., .dots = list(), .keep_all = FALSE) {
  dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...)
  distinct(.data, !!! dots, .keep_all = .keep_all)
}

# Dual table verbs ------------------------------------------------------------

add_op_join <- function(x, y, type, by = NULL, copy = FALSE,
                        suffix = c(".x", ".y"), auto_index = FALSE, ...) {
  if (identical(type, "full") && identical(by, character())) {
    type <- "cross"
    by <- list(x = character(0), y = character(0))
  } else {
    by <- common_by(by, x, y)
  }

  y <- auto_copy(
    x, y,
    copy = copy,
    indexes = if (auto_index) list(by$y)
  )
  vars <- join_vars(op_vars(x), op_vars(y), type = type, by = by, suffix = suffix)

  x$ops <- op_double("join", x, y, args = list(
    vars = vars,
    type = type,
    by = by,
    suffix = suffix
  ))
  x
}

add_op_semi_join <- function(x, y, anti = FALSE, by = NULL, copy = FALSE,
                             auto_index = FALSE, ...) {
  by <- common_by(by, x, y)
  y <- auto_copy(
    x, y, copy,
    indexes = if (auto_index) list(by$y)
  )

  x$ops <- op_double("semi_join", x, y, args = list(
    anti = anti,
    by = by
  ))
  x
}

add_op_set_op <- function(x, y, type, copy = FALSE, ...) {
  y <- auto_copy(x, y, copy)

  if (inherits(x$src$con, "SQLiteConnection")) {
    # LIMIT is only part of the compound-select-statement, not the select-core
    #
    # https://www.sqlite.org/syntax/compound-select-stmt.html
    # https://www.sqlite.org/syntax/select-core.html
    if (inherits(x$ops, "op_head") || inherits(y$ops, "op_head")) {
      stop("SQLite does not support set operations on LIMITs", call. = FALSE)
    }
  }

  x$ops <- op_double("set_op", x, y, args = list(type = type))
  x
}

# Currently the dual-table verbs are defined on tbl_sql, because their
# definitions are a bit too tightly connected to SQL.
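
# A minimal sketch of how the ops above compose (illustrative only; wrapped
# in `if (FALSE)` so nothing runs at package load, and assumes dplyr is
# attached):
if (FALSE) {
  lf <- lazy_frame(g = c("a", "b"), x = 1:2)
  # `z` reuses the freshly created `y`, so mutate() nests two ops:
  out <- lf %>% mutate(y = x + 1, z = y * 2)
  out$ops      # op tree: local data frame -> mutate() -> mutate()
  op_vars(out) # "g" "x" "y" "z"
}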
dbplyr/R/db-odbc-impala.R0000644000176200001440000000111313174404170014622 0ustar liggesusers#' @export sql_translate_env.Impala <- function(con) { sql_variant( scalar = sql_translator(.parent = base_odbc_scalar, as.Date = sql_cast("VARCHAR(10)"), ceiling = sql_prefix("CEIL") ) , base_odbc_agg, base_odbc_win ) } #' @export db_analyze.Impala <- function(con, table, ...) { # Using COMPUTE STATS instead of ANALYZE as recommended in this article # https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_compute_stats.html sql <- build_sql("COMPUTE STATS ", as.sql(table), con = con) DBI::dbExecute(con, sql) } dbplyr/R/db-sqlite.r0000644000176200001440000000277313221461571014031 0ustar liggesusers#' @export db_desc.SQLiteConnection <- function(x) { paste0("sqlite ", sqlite_version(), " [", x@dbname, "]") } sqlite_version <- function() { if (utils::packageVersion("RSQLite") > 1) { RSQLite::rsqliteVersion()[[2]] } else { DBI::dbGetInfo(RSQLite::SQLite())$clientVersion } } # SQL methods ------------------------------------------------------------- #' @export sql_translate_env.SQLiteConnection <- function(con) { sql_variant( sql_translator(.parent = base_scalar, log = function(x, base = exp(1)) { if (base != exp(1)) { sql_expr(log(!!x) / log(!!base)) } else { sql_expr(log(!!x)) } }, na_if = sql_prefix("NULLIF", 2), paste = sql_paste_infix(" ", "||", function(x) sql_expr(cast(UQ(x) %as% text))), paste0 = sql_paste_infix("", "||", function(x) sql_expr(cast(UQ(x) %as% text))) ), sql_translator(.parent = base_agg, sd = sql_aggregate("stdev") ), base_no_win ) } #' @export sql_escape_ident.SQLiteConnection <- function(con, x) { sql_quote(x, "`") } #' @export sql_escape_logical.SQLiteConnection <- function(con, x){ y <- as.character(as.integer(x)) y[is.na(x)] <- "NULL" y } #' @export sql_subquery.SQLiteConnection <- function(con, from, name = unique_name(), ...) { if (is.ident(from)) { setNames(from, name) } else { if (is.null(name)) { build_sql("(", from, ")", con = con) } else { build_sql("(", from, ") AS ", ident(name), con = con) } } } dbplyr/R/db-roracle.R0000644000176200001440000000221513176710322014106 0ustar liggesusers#' @export sql_translate_env.OraConnection <- function(con) { sql_translate_env.Oracle(con) } #' @export sql_select.OraConnection <- function(con, select, from, where = NULL, group_by = NULL, having = NULL, order_by = NULL, limit = NULL, distinct = FALSE, ...) { sql_select.Oracle(con, select, from, where = where, group_by = group_by, having = having, order_by = order_by, limit = limit, distinct = distinct, ...) } #' @export db_analyze.OraConnection <- function(con, table, ...) { db_analyze.Oracle(con = con, table = table, ...) } #' @export sql_subquery.OraConnection <- function(con, from, name = unique_name(), ...) { sql_subquery.Oracle(con = con, from = from, name = name, ...) } #' @export db_drop_table.OraConnection <- function(con, table, force = FALSE, ...) { db_drop_table.OdbcConnection(con = con, table = table, force = force, ...) 
} dbplyr/R/dbplyr.R0000644000176200001440000000043213102125357013364 0ustar liggesusers#' @importFrom assertthat assert_that #' @importFrom assertthat is.flag #' @importFrom stats setNames update #' @importFrom utils head tail #' @importFrom glue glue #' @importFrom methods setOldClass #' @import dplyr #' @import rlang #' @import DBI #' @keywords internal "_PACKAGE" dbplyr/R/translate-sql-paste.R0000644000176200001440000000173513176642465016022 0ustar liggesusers#' @export #' @rdname sql_variant sql_paste <- function(default_sep, f = "CONCAT_WS") { f <- toupper(f) function(..., sep = default_sep, collapse = NULL){ check_collapse(collapse) sql_expr(UQ(f)(!!sep, !!!list(...))) } } #' @export #' @rdname sql_variant sql_paste_infix <- function(default_sep, op, cast) { force(default_sep) op <- as.symbol(paste0("%", op, "%")) force(cast) function(..., sep = default_sep, collapse = NULL){ check_collapse(collapse) args <- list(...) if (length(args) == 1) { return(cast(args[[1]])) } if (sep == "") { infix <- function(x, y) sql_expr(UQ(op)(!!x, !!y)) } else { infix <- function(x, y) sql_expr(UQ(op)(UQ(op)(!!x, !!sep), !!y)) } reduce(args, infix) } } check_collapse <- function(collapse) { if (is.null(collapse)) return() stop( "`collapse` not supported in DB translation of `paste()`.\n", "Please use str_flatten() instead", call. = FALSE ) } dbplyr/R/translate-sql-clause.r0000644000176200001440000000253413122763264016210 0ustar liggesusers sql_clause_generic <- function(clause, fields, con){ if (length(fields) > 0L) { assert_that(is.character(fields)) build_sql( sql(clause), " ", escape(fields, collapse = ", ", con = con) ) } } sql_clause_select <- function(select, con, distinct = FALSE){ assert_that(is.character(select)) if (is_empty(select)) { abort("Query contains no columns") } build_sql( "SELECT ", if (distinct) sql("DISTINCT "), escape(select, collapse = ", ", con = con) ) } sql_clause_where <- function(where, con){ if (length(where) > 0L) { assert_that(is.character(where)) where_paren <- escape(where, parens = TRUE, con = con) build_sql("WHERE ", sql_vector(where_paren, collapse = " AND ")) } } sql_clause_limit <- function(limit, con){ if (!is.null(limit) && !identical(limit, Inf)) { assert_that(is.numeric(limit), length(limit) == 1L, limit >= 0) build_sql( "LIMIT ", sql(format(trunc(limit), scientific = FALSE)), con = con ) } } sql_clause_from <- function(from, con) sql_clause_generic("FROM", from, con) sql_clause_group_by <- function(group_by, con) sql_clause_generic("GROUP BY", group_by, con) sql_clause_having <- function(having, con) sql_clause_generic("HAVING", having, con) sql_clause_order_by <- function(order_by, con) sql_clause_generic("ORDER BY", order_by, con) dbplyr/R/data-lahman.r0000644000176200001440000000550313066524352014312 0ustar liggesusers#' Cache and retrieve an `src_sqlite` of the Lahman baseball database. #' #' This creates an interesting database using data from the Lahman baseball #' data source, provided by Sean Lahman at #' \url{http://www.seanlahman.com/baseball-archive/statistics/}, and #' made easily available in R through the \pkg{Lahman} package by #' Michael Friendly, Dennis Murphy and Martin Monkman. See the documentation #' for that package for documentation of the inidividual tables. #' #' @param ... Other arguments passed to `src` on first #' load. For mysql and postgresql, the defaults assume you have a local #' server with `lahman` database already created. #' For `lahman_srcs()`, character vector of names giving srcs to generate. 
#' @param quiet if `TRUE`, suppress messages about databases failing to #' connect. #' @param type src type. #' @keywords internal #' @examples #' # Connect to a local sqlite database, if already created #' \donttest{ #' if (has_lahman("sqlite")) { #' lahman_sqlite() #' batting <- tbl(lahman_sqlite(), "Batting") #' batting #' } #' #' # Connect to a local postgres database with lahman database, if available #' if (has_lahman("postgres")) { #' lahman_postgres() #' batting <- tbl(lahman_postgres(), "Batting") #' } #' } #' @name lahman NULL #' @export #' @rdname lahman lahman_sqlite <- function(path = NULL) { path <- db_location(path, "lahman.sqlite") copy_lahman(src_sqlite(path = path, create = TRUE)) } #' @export #' @rdname lahman lahman_postgres <- function(dbname = "lahman", host = "localhost", ...) { src <- src_postgres(dbname, host = host, ...) copy_lahman(src) } #' @export #' @rdname lahman lahman_mysql <- function(dbname = "lahman", ...) { src <- src_mysql(dbname, ...) copy_lahman(src) } #' @export #' @rdname lahman lahman_df <- function() { src_df("Lahman") } #' @rdname lahman #' @export copy_lahman <- function(src, ...) { # Create missing tables tables <- setdiff(lahman_tables(), src_tbls(src)) for (table in tables) { df <- getExportedValue("Lahman", table) message("Creating table: ", table) ids <- as.list(names(df)[grepl("ID$", names(df))]) copy_to(src, df, table, indexes = ids, temporary = FALSE) } src } # Get list of all non-label data frames in package lahman_tables <- function() { tables <- utils::data(package = "Lahman")$results[, 3] tables[!grepl("Labels", tables)] } #' @rdname lahman #' @export has_lahman <- function(type, ...) { if (!requireNamespace("Lahman", quietly = TRUE)) return(FALSE) if (missing(type)) return(TRUE) succeeds(lahman(type, ...), quiet = FALSE) } #' @rdname lahman #' @export lahman_srcs <- function(..., quiet = NULL) { load_srcs(lahman, c(...), quiet = quiet) } lahman <- function(type, ...) { if (missing(type)) { src_df("Lahman") } else { f <- match.fun(paste0("lahman_", type)) f(...) } } dbplyr/R/utils-format.r0000644000176200001440000000137013066546453014575 0ustar liggesusers wrap <- function(..., indent = 0) { x <- paste0(..., collapse = "") wrapped <- strwrap( x, indent = indent, exdent = indent + 2, width = getOption("width") ) paste0(wrapped, collapse = "\n") } rule <- function(pad = "-", gap = 2L) { paste0(rep(pad, getOption("width") - gap), collapse = "") } named_rule <- function(..., pad = "-") { if (nargs() == 0) { title <- "" } else { title <- paste0(...) } paste0(title, " ", rule(pad = pad, gap = nchar(title) - 1)) } # function for the thousand separator, # returns "," unless it's used for the decimal point, in which case returns "." 'big_mark' <- function(x, ...) { mark <- if (identical(getOption("OutDec"), ",")) "." else "," formatC(x, big.mark = mark, ...) } dbplyr/R/sql-escape.r0000644000176200001440000001420213221470274014170 0ustar liggesusers#' Escape/quote a string. #' #' @param x An object to escape. Existing sql vectors will be left as is, #' character vectors are escaped with single quotes, numeric vectors have #' trailing `.0` added if they're whole numbers, identifiers are #' escaped with double quotes. #' @param parens,collapse Controls behaviour when multiple values are supplied. #' `parens` should be a logical flag, or if `NA`, will wrap in #' parens if length > 1. 
#' #' Default behaviour: lists are always wrapped in parens and separated by #' commas, identifiers are separated by commas and never wrapped, #' atomic vectors are separated by spaces and wrapped in parens if needed. #' @param con Database connection. If not specified, uses SQL 92 conventions. #' @rdname escape #' @export #' @examples #' # Doubles vs. integers #' escape(1:5) #' escape(c(1, 5.4)) #' #' # String vs known sql vs. sql identifier #' escape("X") #' escape(sql("X")) #' escape(ident("X")) #' #' # Escaping is idempotent #' escape("X") #' escape(escape("X")) #' escape(escape(escape("X"))) escape <- function(x, parens = NA, collapse = " ", con = NULL) { UseMethod("escape") } #' @export escape.ident <- function(x, parens = FALSE, collapse = ", ", con = NULL) { y <- sql_escape_ident(con, x) sql_vector(names_to_as(y, names2(x), con = con), parens, collapse) } #' @export escape.ident_q <- function(x, parens = FALSE, collapse = ", ", con = NULL) { sql_vector(names_to_as(x, names2(x), con = con), parens, collapse) } #' @export escape.logical <- function(x, parens = NA, collapse = ", ", con = NULL) { sql_vector(sql_escape_logical(con, x), parens, collapse, con = con) } #' @export escape.factor <- function(x, parens = NA, collapse = ", ", con = NULL) { x <- as.character(x) escape.character(x, parens = parens, collapse = collapse, con = con) } #' @export escape.Date <- function(x, parens = NA, collapse = ", ", con = NULL) { x <- as.character(x) escape.character(x, parens = parens, collapse = collapse, con = con) } #' @export escape.POSIXt <- function(x, parens = NA, collapse = ", ", con = NULL) { x <- strftime(x, "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC") escape.character(x, parens = parens, collapse = collapse, con = con) } #' @export escape.character <- function(x, parens = NA, collapse = ", ", con = NULL) { sql_vector(sql_escape_string(con, x), parens, collapse, con = con) } #' @export escape.double <- function(x, parens = NA, collapse = ", ", con = NULL) { out <- ifelse(is.wholenumber(x), sprintf("%.1f", x), as.character(x)) # Special values out[is.na(x)] <- "NULL" inf <- is.infinite(x) out[inf & x > 0] <- "'Infinity'" out[inf & x < 0] <- "'-Infinity'" sql_vector(out, parens, collapse) } #' @export escape.integer <- function(x, parens = NA, collapse = ", ", con = NULL) { x[is.na(x)] <- "NULL" sql_vector(x, parens, collapse) } #' @export escape.integer64 <- function(x, parens = NA, collapse = ", ", con = NULL) { x <- as.character(x) x[is.na(x)] <- "NULL" sql_vector(x, parens, collapse) } #' @export escape.NULL <- function(x, parens = NA, collapse = " ", con = NULL) { sql("NULL") } #' @export escape.sql <- function(x, parens = NULL, collapse = NULL, con = NULL) { sql_vector(x, isTRUE(parens), collapse, con = con) } #' @export escape.list <- function(x, parens = TRUE, collapse = ", ", con = NULL) { pieces <- vapply(x, escape, character(1), con = con) sql_vector(pieces, parens, collapse) } #' @export #' @rdname escape sql_vector <- function(x, parens = NA, collapse = " ", con = NULL) { if (length(x) == 0) { if (!is.null(collapse)) { return(if (isTRUE(parens)) sql("()") else sql("")) } else { return(sql()) } } if (is.na(parens)) { parens <- length(x) > 1L } x <- names_to_as(x, con = con) x <- paste(x, collapse = collapse) if (parens) x <- paste0("(", x, ")") sql(x) } names_to_as <- function(x, names = names2(x), con = NULL) { if (length(x) == 0) { return(character()) } names_esc <- sql_escape_ident(con, names) as <- ifelse(names == "" | names_esc == x, "", paste0(" AS ", names_esc)) paste0(x, as) } 
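
# A quick sketch of how the helpers above compose (illustrative only; the
# printed output assumes the default SQL-92 double-quote conventions that
# apply when `con` is NULL):
if (FALSE) {
  escape(ident(c(avg_delay = "arr_delay")))
  #> <SQL> "arr_delay" AS "avg_delay"
  sql_vector(c("x", "y"), parens = TRUE, collapse = ", ")
  #> <SQL> (x, y)
}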
#' Build a SQL string.
#'
#' This is a convenience function that should prevent sql injection attacks
#' (which in the context of dplyr are most likely to be accidental not
#' deliberate) by automatically escaping all expressions in the input, while
#' treating bare strings as sql. This is unlikely to prevent any serious
#' attack, but should make it unlikely that you produce invalid sql.
#'
#' @param ... input to convert to SQL. Use [sql()] to preserve
#'   user input as is (dangerous), and [ident()] to label user
#'   input as sql identifiers (safe)
#' @param .env the environment in which to evaluate the arguments. Should not
#'   be needed in typical use.
#' @param con database connection; used to select correct quoting characters.
#' @export
#' @examples
#' build_sql("SELECT * FROM TABLE")
#' x <- "TABLE"
#' build_sql("SELECT * FROM ", x)
#' build_sql("SELECT * FROM ", ident(x))
#' build_sql("SELECT * FROM ", sql(x))
#'
#' # http://xkcd.com/327/
#' name <- "Robert'); DROP TABLE Students;--"
#' build_sql("INSERT INTO Students (Name) VALUES (", name, ")")
build_sql <- function(..., .env = parent.frame(), con = sql_current_con()) {
  escape_expr <- function(x) {
    # If it's a string, leave it as is
    if (is.character(x)) return(x)

    val <- eval_bare(x, .env)
    # Skip nulls, so you can use if statements like in paste
    if (is.null(val)) return("")

    escape(val, con = con)
  }

  pieces <- vapply(dots(...), escape_expr, character(1))
  sql(paste0(pieces, collapse = ""))
}

#' Helper function for quoting sql elements.
#'
#' If the quote character is present in the string, it will be doubled.
#' `NA`s will be replaced with NULL.
#'
#' @param x Character vector to escape.
#' @param quote Single quoting character.
#' @export
#' @keywords internal
#' @examples
#' sql_quote("abc", "'")
#' sql_quote("I've had a good day", "'")
#' sql_quote(c("abc", NA), "'")
sql_quote <- function(x, quote) {
  if (length(x) == 0) {
    return(x)
  }

  y <- gsub(quote, paste0(quote, quote), x, fixed = TRUE)
  y <- paste0(quote, y, quote)
  y[is.na(x)] <- "NULL"
  names(y) <- names(x)

  y
}
dbplyr/R/do.r0000644000176200001440000001075313067256154012543 0ustar liggesusers#' Perform arbitrary computation on remote backend
#'
#' @inheritParams dplyr::do
#' @param .chunk_size The size of each chunk to pull into R. If this number is
#'   too big, the process will be slow because R has to allocate and free a lot
#'   of memory. If it's too small, it will be slow, because of the overhead of
#'   talking to the database.
#' @export
do.tbl_sql <- function(.data, ..., .chunk_size = 1e4L) {
  groups_sym <- groups(.data)
  if (length(groups_sym) == 0) {
    .data <- collect(.data)
    return(do(.data, ...))
  }

  args <- quos(...)
  named <- named_args(args)

  # Create data frame of labels
  labels <- .data %>%
    select(!!! groups_sym) %>%
    summarise() %>%
    collect()

  con <- .data$src$con
  n <- nrow(labels)
  m <- length(args)

  out <- replicate(m, vector("list", n), simplify = FALSE)
  names(out) <- names(args)
  p <- progress_estimated(n * m, min_time = 2)

  # Create ungrouped data frame suitable for chunked retrieval
  query <- Query$new(con, db_sql_render(con, ungroup(.data)), op_vars(.data))

  # When retrieving in pages, there's no guarantee we'll get a complete group.
  # So we always assume the last group in the chunk is incomplete, and leave
  # it for the next. If the group size is larger than the chunk size, it may
  # take a couple of iterations to get the entire group, but that should
  # be an unusual situation.
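  # E.g. with .chunk_size = 4 and rows grouped a a a b | b b c ...: the first
  # chunk emits group `a` and carries the trailing `b` rows forward, so the
  # `b` group is only processed once the next chunk completes it (an
  # illustrative sketch of the invariant described above).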
  last_group <- NULL
  i <- 0
  # Assumes `chunk` to be ordered with group columns first
  gvars <- seq_along(groups_sym)

  # Create the dynamic scope for tidy evaluation
  env <- child_env(NULL)
  overscope <- new_overscope(env)
  on.exit(overscope_clean(overscope))

  query$fetch_paged(.chunk_size, function(chunk) {
    if (!is_null(last_group)) {
      chunk <- rbind(last_group, chunk)
    }

    # Create an id for each group
    grouped <- chunk %>% group_by(!!! syms(names(chunk)[gvars]))
    index <- attr(grouped, "indices") # zero indexed

    n <- length(index)

    last_group <<- chunk[index[[length(index)]] + 1L, , drop = FALSE]

    for (j in seq_len(n - 1)) {
      cur_chunk <- chunk[index[[j]] + 1L, , drop = FALSE]

      # Update pronouns within the overscope
      env$. <- env$.data <- cur_chunk

      for (k in seq_len(m)) {
        out[[k]][i + j] <<- list(overscope_eval_next(overscope, args[[k]]))
        p$tick()$print()
      }
    }
    i <<- i + (n - 1)
  })

  # Process last group
  if (!is_null(last_group)) {
    env$. <- env$.data <- last_group

    for (k in seq_len(m)) {
      out[[k]][i + 1] <- list(overscope_eval_next(overscope, args[[k]]))
      p$tick()$print()
    }
  }

  if (!named) {
    label_output_dataframe(labels, out, groups(.data))
  } else {
    label_output_list(labels, out, groups(.data))
  }
}

#' @export
do_.tbl_sql <- function(.data, ..., .dots = list(), .chunk_size = 1e4L) {
  dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...)
  do(.data, !!! dots, .chunk_size = .chunk_size)
}

# Helper functions -------------------------------------------------------------

label_output_dataframe <- function(labels, out, groups) {
  data_frame <- vapply(out[[1]], is.data.frame, logical(1))
  if (any(!data_frame)) {
    stop(
      "Results are not data frames at positions: ",
      paste(which(!data_frame), collapse = ", "),
      call. = FALSE
    )
  }

  rows <- vapply(out[[1]], nrow, numeric(1))
  out <- bind_rows(out[[1]])

  if (!is.null(labels)) {
    # Remove any common columns from labels
    labels <- labels[setdiff(names(labels), names(out))]

    # Repeat each row to match data
    labels <- labels[rep(seq_len(nrow(labels)), rows), , drop = FALSE]
    rownames(labels) <- NULL

    grouped_df(bind_cols(labels, out), groups)
  } else {
    rowwise(out)
  }
}

label_output_list <- function(labels, out, groups) {
  if (!is.null(labels)) {
    labels[names(out)] <- out
    rowwise(labels)
  } else {
    class(out) <- "data.frame"
    attr(out, "row.names") <- .set_row_names(length(out[[1]]))
    rowwise(out)
  }
}

named_args <- function(args) {
  # Arguments must either be all named or all unnamed.
  named <- sum(names2(args) != "")

  if (!(named == 0 || named == length(args))) {
    stop(
      "Arguments to do() must either be all named or all unnamed",
      call. = FALSE
    )
  }
  if (named == 0 && length(args) > 1) {
    stop("Can only supply single unnamed argument to do()", call. = FALSE)
  }

  # Check for old syntax
  if (named == 1 && names(args) == ".f") {
    stop(
      "do syntax changed in dplyr 0.2. Please see documentation for details",
      call. = FALSE
    )
  }

  named != 0
}
dbplyr/R/data-nycflights13.r0000644000176200001440000000405513047150410015356 0ustar liggesusers#' Database versions of the nycflights13 data
#'
#' These functions cache the data from the `nycflights13` package in
#' a local database, for use in examples and vignettes. Indexes are created
#' to make joining tables on natural keys efficient.
#' #' @keywords internal #' @name nycflights13 NULL #' @export #' @rdname nycflights13 #' @param path location of sqlite database file nycflights13_sqlite <- function(path = NULL) { cache_computation("nycflights_sqlite", { path <- db_location(path, "nycflights13.sqlite") message("Caching nycflights db at ", path) src <- src_sqlite(path, create = TRUE) copy_nycflights13(src) }) } #' @export #' @rdname nycflights13 #' @param dbname,... Arguments passed on to [src_postgres()] nycflights13_postgres <- function(dbname = "nycflights13", ...) { cache_computation("nycflights_postgres", { message("Caching nycflights db in postgresql db ", dbname) copy_nycflights13(src_postgres(dbname, ...)) }) } #' @rdname nycflights13 #' @export has_nycflights13 <- function(type = c("sqlite", "postgresql"), ...) { if (!requireNamespace("nycflights13", quietly = TRUE)) return(FALSE) type <- match.arg(type) succeeds(switch( type, sqlite = nycflights13_sqlite(...), quiet = TRUE, postgres = nycflights13_postgres(...), quiet = TRUE )) } #' @export #' @rdname nycflights13 copy_nycflights13 <- function(src, ...) { all <- utils::data(package = "nycflights13")$results[, 3] unique_index <- list( airlines = list("carrier"), planes = list("tailnum") ) index <- list( airports = list("faa"), flights = list(c("year", "month", "day"), "carrier", "tailnum", "origin", "dest"), weather = list(c("year", "month", "day"), "origin") ) tables <- setdiff(all, src_tbls(src)) # Create missing tables for (table in tables) { df <- getExportedValue("nycflights13", table) message("Creating table: ", table) copy_to( src, df, table, unique_indexes = unique_index[[table]], indexes = index[[table]], temporary = FALSE ) } src } dbplyr/R/db-mysql.r0000644000176200001440000001013313176710165013667 0ustar liggesusers#' @export db_desc.MySQLConnection <- function(x) { info <- dbGetInfo(x) paste0( "mysql ", info$serverVersion, " [", info$user, "@", info$host, ":", info$port, "/", info$dbname, "]" ) } #' @export db_desc.MariaDBConnection <- db_desc.MySQLConnection #' @export sql_translate_env.MySQLConnection <- function(con) { sql_variant( sql_translator(.parent = base_scalar, as.character = sql_cast("CHAR"), paste = sql_paste(" "), paste0 = sql_paste("") ), sql_translator(.parent = base_agg, n = function() sql("COUNT(*)"), sd = sql_aggregate("stddev_samp"), var = sql_aggregate("var_samp"), str_flatten = function(x, collapse) { sql_expr(group_concat(!!x %separator% !!collapse)) } ), base_no_win ) } #' @export sql_translate_env.MariaDBConnection <- sql_translate_env.MySQLConnection # DBI methods ------------------------------------------------------------------ #' @export db_has_table.MySQLConnection <- function(con, table, ...) { # MySQL has no way to list temporary tables, so we always NA to # skip any local checks and rely on the database to throw informative errors NA } #' @export db_has_table.MariaDBConnection <- db_has_table.MySQLConnection #' @export db_data_type.MySQLConnection <- function(con, fields, ...) { char_type <- function(x) { n <- max(nchar(as.character(x), "bytes"), 0L, na.rm = TRUE) if (n <= 65535) { paste0("varchar(", n, ")") } else { "mediumtext" } } data_type <- function(x) { switch( class(x)[1], logical = "boolean", integer = "integer", numeric = "double", factor = char_type(x), character = char_type(x), Date = "date", POSIXct = "datetime", stop("Unknown class ", paste(class(x), collapse = "/"), call. = FALSE) ) } vapply(fields, data_type, character(1)) } #' @export db_begin.MySQLConnection <- function(con, ...) 
{ dbExecute(con, "START TRANSACTION") } #' @export db_commit.MySQLConnection <- function(con, ...) { dbExecute(con, "COMMIT") } #' @export db_rollback.MySQLConnection <- function(con, ...) { dbExecute(con, "ROLLBACK") } #' @export db_write_table.MySQLConnection <- function(con, table, types, values, temporary = TRUE, ...) { db_create_table(con, table, types, temporary = temporary) values <- purrr::modify_if(values, is.logical, as.integer) values <- purrr::modify_if(values, is.factor, as.character) values <- purrr::modify_if(values, is.character, encodeString, na.encode = FALSE) tmp <- tempfile(fileext = ".csv") utils::write.table(values, tmp, sep = "\t", quote = FALSE, qmethod = "escape", na = "\\N", row.names = FALSE, col.names = FALSE) sql <- build_sql("LOAD DATA LOCAL INFILE ", encodeString(tmp), " INTO TABLE ", as.sql(table), con = con) dbExecute(con, sql) table } #' @export db_create_index.MySQLConnection <- function(con, table, columns, name = NULL, unique = FALSE, ...) { name <- name %||% paste0(c(table, columns), collapse = "_") fields <- escape(ident(columns), parens = TRUE, con = con) index <- build_sql( "ADD ", if (unique) sql("UNIQUE "), "INDEX ", ident(name), " ", fields, con = con ) sql <- build_sql("ALTER TABLE ", as.sql(table), "\n", index, con = con) dbExecute(con, sql) } #' @export db_create_index.MariaDBConnection <- db_create_index.MySQLConnection #' @export db_analyze.MySQLConnection <- function(con, table, ...) { sql <- build_sql("ANALYZE TABLE", as.sql(table), con = con) dbExecute(con, sql) } #' @export db_analyze.MariaDBConnection <- db_analyze.MySQLConnection # SQL methods ------------------------------------------------------------- #' @export sql_escape_ident.MySQLConnection <- function(con, x) { sql_quote(x, "`") } #' @export sql_join.MySQLConnection <- function(con, x, y, vars, type = "inner", by = NULL, ...) { if (identical(type, "full")) { stop("MySQL does not support full joins", call. = FALSE) } NextMethod() } globalVariables(c("%separator%", "group_concat")) dbplyr/R/remote.R0000644000176200001440000000237113174423753013401 0ustar liggesusers#' Metadata about a remote table #' #' `remote_name()` gives the name remote table, or `NULL` if it's a query. #' `remote_query()` gives the text of the query, and `remote_query_plan()` #' the query plan (as computed by the remote database). `remote_src()` and #' `remote_con()` give the dplyr source and DBI connection respectively. #' #' @param x Remote table, currently must be a [tbl_sql]. #' @return The value, or `NULL` if not remote table, or not applicable. #' For example, computed queries do not have a "name" #' @export #' @examples #' mf <- memdb_frame(x = 1:5, y = 5:1, .name = "blorp") #' remote_name(mf) #' remote_src(mf) #' remote_con(mf) #' remote_query(mf) #' #' mf2 <- dplyr::filter(mf, x > 3) #' remote_name(mf2) #' remote_src(mf2) #' remote_con(mf2) #' remote_query(mf2) remote_name <- function(x) { if (!inherits(x$ops, "op_base")) return() x$ops$x } #' @export #' @rdname remote_name remote_src <- function(x) { x$src } #' @export #' @rdname remote_name remote_con <- function(x) { x$src$con } #' @export #' @rdname remote_name remote_query <- function(x) { db_sql_render(remote_con(x), x) } #' @export #' @rdname remote_name remote_query_plan <- function(x) { db_explain(remote_con(x), db_sql_render(remote_con(x), x$ops)) } dbplyr/R/lazy-ops.R0000644000176200001440000001403413173732522013657 0ustar liggesusers#' Lazy operations #' #' This set of S3 classes describe the action of dplyr verbs. 
These are #' currently used for SQL sources to separate the description of operations #' in R from their computation in SQL. This API is very new so is likely #' to evolve in the future. #' #' `op_vars()` and `op_grps()` compute the variables and groups from #' a sequence of lazy operations. `op_sort()` tracks the order of the #' data for use in window functions. #' #' @keywords internal #' @name lazy_ops NULL # Base constructors ------------------------------------------------------- #' @export #' @rdname lazy_ops op_base <- function(x, vars, class = character()) { stopifnot(is.character(vars)) structure( list( x = x, vars = vars ), class = c(paste0("op_base_", class), "op_base", "op") ) } op_base_remote <- function(x, vars) { stopifnot(is.sql(x) || is.ident(x)) op_base(x, vars, class = "remote") } #' @export print.op_base_remote <- function(x, ...) { if (inherits(x$x, "ident")) { cat("From: ", x$x, "\n", sep = "") } else { cat("From: \n") } cat("\n", sep = "") } op_base_local <- function(df) { op_base(df, names(df), class = "local") } #' @export print.op_base_local <- function(x, ...) { cat(" ", dim_desc(x$x), "\n", sep = "") } # Operators --------------------------------------------------------------- #' @export #' @rdname lazy_ops op_single <- function(name, x, dots = list(), args = list()) { structure( list( name = name, x = x, dots = dots, args = args ), class = c(paste0("op_", name), "op_single", "op") ) } #' @export #' @rdname lazy_ops add_op_single <- function(name, .data, dots = list(), args = list()) { .data$ops <- op_single(name, x = .data$ops, dots = dots, args = args) .data } #' @export print.op_single <- function(x, ...) { print(x$x) cat("-> ", x$name, "()\n", sep = "") for (dot in x$dots) { cat(" - ", deparse_trunc(dot), "\n", sep = "") } } #' @export #' @rdname lazy_ops op_double <- function(name, x, y, args = list()) { structure( list( name = name, x = x, y = y, args = args ), class = c(paste0("op_", name), "op_double", "op") ) } # op_grps ----------------------------------------------------------------- #' @export #' @rdname lazy_ops op_grps <- function(op) UseMethod("op_grps") #' @export op_grps.op_base <- function(op) character() #' @export op_grps.op_group_by <- function(op) { if (isTRUE(op$args$add)) { union(op_grps(op$x), names(op$dots)) } else { names(op$dots) } } #' @export op_grps.op_ungroup <- function(op) { character() } #' @export op_grps.op_summarise <- function(op) { grps <- op_grps(op$x) if (length(grps) == 1) { character() } else { grps[-length(grps)] } } #' @export op_grps.op_rename <- function(op) { names(tidyselect::vars_rename(op_grps(op$x), !!! op$dots, .strict = FALSE)) } #' @export op_grps.op_single <- function(op) { op_grps(op$x) } #' @export op_grps.op_double <- function(op) { op_grps(op$x) } #' @export op_grps.tbl_lazy <- function(op) { op_grps(op$ops) } # op_vars ----------------------------------------------------------------- #' @export #' @rdname lazy_ops op_vars <- function(op) UseMethod("op_vars") #' @export op_vars.op_base <- function(op) { op$vars } #' @export op_vars.op_select <- function(op) { names(tidyselect::vars_select(op_vars(op$x), !!! op$dots, .include = op_grps(op$x))) } #' @export op_vars.op_rename <- function(op) { names(rename_vars(op_vars(op$x), !!! 
op$dots)) } #' @export op_vars.op_summarise <- function(op) { c(op_grps(op$x), names(op$dots)) } #' @export op_vars.op_distinct <- function(op) { if (length(op$dots) == 0 || op$args$.keep_all) { op_vars(op$x) } else { c(op_grps(op$x), names(op$dots)) } } #' @export op_vars.op_mutate <- function(op) { unique(c(op_vars(op$x), names(op$dots))) } #' @export op_vars.op_single <- function(op) { op_vars(op$x) } #' @export op_vars.op_join <- function(op) { op$args$vars$alias } #' @export op_vars.op_semi_join <- function(op) { op_vars(op$x) } #' @export op_vars.op_set_op <- function(op) { union(op_vars(op$x), op_vars(op$y)) } #' @export op_vars.tbl_lazy <- function(op) { op_vars(op$ops) } # op_sort ----------------------------------------------------------------- # This is only used to determine the order for window functions # so it purposely ignores grouping. Returns a list of expressions. #' @export #' @rdname lazy_ops op_sort <- function(op) UseMethod("op_sort") #' @export op_sort.op_base <- function(op) NULL #' @export op_sort.op_summarise <- function(op) NULL #' @export op_sort.op_arrange <- function(op) { c(op_sort(op$x), op$dots) } #' @export op_sort.op_single <- function(op) { op_sort(op$x) } #' @export op_sort.op_double <- function(op) { op_sort(op$x) } #' @export op_sort.tbl_lazy <- function(op) { op_sort(op$ops) } # We want to preserve this ordering (for window functions) without # imposing an additional arrange, so we have a special op_order add_op_order <- function(.data, dots = list()) { if (length(dots) == 0) { return(.data) } .data$ops <- op_single("order", x = .data$ops, dots = dots) .data } #' @export op_sort.op_order <- function(op) { c(op_sort(op$x), op$dots) } #' @export sql_build.op_order <- function(op, con, ...) { sql_build(op$x, con, ...) } # Description ------------------------------------------------------------- tbl_desc <- function(x) { paste0( op_desc(x$ops), " [", op_rows(x$ops), " x ", big_mark(op_cols(x$ops)), "]" ) } op_rows <- function(op) "??" op_cols <- function(op) length(op_vars(op)) op_desc <- function(op) UseMethod("op_desc") op_desc.op_base_remote <- function(op) { if (is.ident(op$x)) { paste0("table<", op$x, ">") } else { "SQL" } } #' @export op_desc.op_group_by <- function(x, ...) { op_desc(x$x, ...) } #' @export op_desc.op_arrange <- function(x, ...) { op_desc(x$x, ...) } #' @export op_desc.op <- function(x, ..., con = con) { "lazy query" } dbplyr/R/query.r0000644000176200001440000000217513066550561013313 0ustar liggesusers#' @importFrom R6 R6Class NULL Query <- R6::R6Class("Query", private = list( .nrow = NULL, .vars = NULL ), public = list( con = NULL, sql = NULL, initialize = function(con, sql, vars) { self$con <- con self$sql <- sql private$.vars <- vars }, print = function(...) { cat(" ", self$sql, "\n", sep = "") print(self$con) }, fetch = function(n = -1L) { res <- dbSendQuery(self$con, self$sql) on.exit(dbClearResult(res)) out <- dbFetch(res, n) res_warn_incomplete(res) out }, fetch_paged = function(chunk_size = 1e4, callback) { qry <- dbSendQuery(self$con, self$sql) on.exit(dbClearResult(qry)) while (!dbHasCompleted(qry)) { chunk <- dbFetch(qry, chunk_size) callback(chunk) } invisible(TRUE) }, vars = function() { private$.vars }, nrow = function() { if (!is.null(private$.nrow)) return(private$.nrow) private$.nrow <- db_query_rows(self$con, self$sql) private$.nrow }, ncol = function() { length(self$vars()) } ) ) dbplyr/R/utils.r0000644000176200001440000000344513221457712013304 0ustar liggesusersdots <- function(...) 
{
  eval_bare(substitute(alist(...)))
}

deparse_trunc <- function(x, width = getOption("width")) {
  text <- deparse(x, width.cutoff = width)
  if (length(text) == 1 && nchar(text) < width) return(text)

  paste0(substr(text[1], 1, width - 3), "...")
}

all_apply <- function(xs, f) {
  for (x in xs) {
    if (!f(x)) return(FALSE)
  }
  TRUE
}

any_apply <- function(xs, f) {
  for (x in xs) {
    if (f(x)) return(TRUE)
  }
  FALSE
}

drop_last <- function(x) {
  if (length(x) <= 1L) return(NULL)
  x[-length(x)]
}

is.wholenumber <- function(x) {
  trunc(x) == x
}

deparse_all <- function(x) {
  x <- map_if(x, is_formula, f_rhs)
  map_chr(x, expr_text, width = 500L)
}

deparse_names <- function(x) {
  x <- map_if(x, is_formula, f_rhs)
  map_chr(x, deparse)
}

#' Provides a comma-separated string out of the parameters
#' @export
#' @keywords internal
#' @param ... Arguments to be constructed into the string
named_commas <- function(...) {
  x <- c(...)
  if (is_null(names(x))) {
    paste0(x, collapse = ", ")
  } else {
    paste0(names(x), " = ", x, collapse = ", ")
  }
}

commas <- function(...) paste0(..., collapse = ", ")

in_travis <- function() identical(Sys.getenv("TRAVIS"), "true")

named <- function(...) {
  x <- c(...)

  missing_names <- names2(x) == ""
  names(x)[missing_names] <- x[missing_names]

  x
}

unique_name <- local({
  i <- 0

  function() {
    i <<- i + 1
    paste0("zzz", i)
  }
})

succeeds <- function(x, quiet = FALSE) {
  tryCatch(
    {
      x
      TRUE
    },
    error = function(e) {
      if (!quiet) message("Error: ", e$message)
      FALSE
    }
  )
}

c_character <- function(...) {
  x <- c(...)
  if (length(x) == 0) {
    return(character())
  }

  if (!is.character(x)) {
    stop("Character input expected", call. = FALSE)
  }

  x
}
dbplyr/R/src-sql.r0000644000176200001440000000154713173470051013526 0ustar liggesusers#' Create a "sql src" object
#'
#' Deprecated: please use [src_dbi] instead.
#'
#' @keywords internal
#' @export
#' @param subclass name of subclass. "src_sql" is an abstract base class, so you
#'   must supply this value. `src_` is automatically prepended to the
#'   class name
#' @param con the connection object
#' @param ... fields used by object
src_sql <- function(subclass, con, ...) {
  subclass <- paste0("src_", subclass)
  structure(list(con = con, ...), class = c(subclass, "src_sql", "src"))
}

#' @export
same_src.src_sql <- function(x, y) {
  if (!inherits(y, "src_sql")) return(FALSE)
  identical(x$con, y$con)
}

#' @export
src_tbls.src_sql <- function(x, ...) {
  db_list_tables(x$con)
}

#' @export
format.src_sql <- function(x, ...) {
  paste0(
    "src: ", db_desc(x$con), "\n",
    wrap("tbls: ", paste0(sort(src_tbls(x)), collapse = ", "))
  )
}
dbplyr/R/partial-eval.r0000644000176200001440000000710513123002645014512 0ustar liggesusers#' Partially evaluate an expression.
#'
#' This function partially evaluates an expression, using information from
#' the tbl to determine whether names refer to local expressions
#' or remote variables. This simplifies SQL translation because expressions
#' don't need to carry around their environment - all relevant information
#' is incorporated into the expression.
#'
#' @section Symbol substitution:
#'
#' `partial_eval()` needs to guess if you're referring to a variable on the
#' server (remote), or in the current environment (local). It's not possible to
#' do this 100% perfectly. `partial_eval()` uses the following heuristic:
#'
#' \itemize{
#'   \item If the tbl variables are known, and the symbol matches a tbl
#'     variable, then remote.
#'   \item If the symbol is defined locally, local.
#'   \item Otherwise, remote.
#' } #' #' @param call an unevaluated expression, as produced by [quote()] #' @param vars character vector of variable names. #' @param env environment in which to search for local values #' @export #' @keywords internal #' @examples #' vars <- c("year", "id") #' partial_eval(quote(year > 1980), vars = vars) #' #' ids <- c("ansonca01", "forceda01", "mathebo01") #' partial_eval(quote(id %in% ids), vars = vars) #' #' # You can use local to disambiguate between local and remote #' # variables: otherwise remote is always preferred #' year <- 1980 #' partial_eval(quote(year > year), vars = vars) #' partial_eval(quote(year > local(year)), vars = vars) #' #' # Functions are always assumed to be remote. Use local to force evaluation #' # in R. #' f <- function(x) x + 1 #' partial_eval(quote(year > f(1980)), vars = vars) #' partial_eval(quote(year > local(f(1980))), vars = vars) #' #' # For testing you can also use it with the tbl omitted #' partial_eval(quote(1 + 2 * 3)) #' x <- 1 #' partial_eval(quote(x ^ y)) partial_eval <- function(call, vars = character(), env = caller_env()) { switch_type(call, "NULL" = NULL, symbol = sym_partial_eval(call, vars, env), language = lang_partial_eval(call, vars, env), logical = , integer = , double = , complex = , string = , character = call, formula = { # This approach may be ill-founded: might be better to have a separate # function for partially evaluating a list of quos/lazy_dots f_rhs(call) <- partial_eval(f_rhs(call), vars, f_env(call)) if (length(call) == 3) { f_lhs(call) <- partial_eval(f_lhs(call), vars, f_env(call)) } call }, list = { if (inherits(call, "lazy_dots")) { call <- dplyr:::compat_lazy_dots(call, env) } map(call, partial_eval, vars = vars, env = env) }, abort(glue("Unknown input type: ", class(call))) ) } sym_partial_eval <- function(sym, vars, env) { name <- as_string(sym) if (name %in% vars) { sym } else if (env_has(env, name, inherit = TRUE)) { eval_bare(sym, env) } else { sym } } lang_partial_eval <- function(call, vars, env) { switch_lang(call, # Evaluate locally if complex CAR inlined = , namespaced = , recursive = eval_bare(call, env), named = { # Process call arguments recursively, unless user has manually called # remote/local name <- as_string(node_car(call)) if (name == "local") { eval_bare(call[[2]], env) } else if (name %in% c("$", "[[", "[")) { # Subsetting is always done locally eval_bare(call, env) } else if (name == "remote") { call[[2]] } else { call[-1] <- lapply(call[-1], partial_eval, vars = vars, env = env) call } } ) } dbplyr/R/sql-render.R0000644000176200001440000000421213076151446014154 0ustar liggesusers#' @export #' @rdname sql_build sql_render <- function(query, con = NULL, ...) { UseMethod("sql_render") } #' @export sql_render.tbl_lazy <- function(query, con = query$con, ...) { # only used for testing qry <- sql_build(query$ops, con = con, ...) sql_render(qry, con = con, ...) } #' @export sql_render.tbl_sql <- function(query, con = query$src$con, ...) { # only used for testing qry <- sql_build(query$ops, con = con, ...) sql_render(qry, con = con, ...) } #' @export sql_render.op <- function(query, con = NULL, ...) { sql_render(sql_build(query, ...), con = con, ...) 
} #' @export sql_render.select_query <- function(query, con = NULL, ..., root = FALSE) { from <- sql_subquery(con, sql_render(query$from, con, ..., root = root), name = NULL) sql_select( con, query$select, from, where = query$where, group_by = query$group_by, having = query$having, order_by = query$order_by, limit = query$limit, distinct = query$distinct, ... ) } #' @export sql_render.ident <- function(query, con = NULL, ..., root = TRUE) { if (root) { sql_select(con, sql("*"), query) } else { query } } #' @export sql_render.sql <- function(query, con = NULL, ...) { query } #' @export sql_render.join_query <- function(query, con = NULL, ..., root = FALSE) { from_x <- sql_subquery(con, sql_render(query$x, con, ..., root = root), name = "TBL_LEFT") from_y <- sql_subquery(con, sql_render(query$y, con, ..., root = root), name = "TBL_RIGHT") sql_join(con, from_x, from_y, vars = query$vars, type = query$type, by = query$by) } #' @export sql_render.semi_join_query <- function(query, con = NULL, ..., root = FALSE) { from_x <- sql_subquery(con, sql_render(query$x, con, ..., root = root), name = "TBL_LEFT") from_y <- sql_subquery(con, sql_render(query$y, con, ..., root = root), name = "TBL_RIGHT") sql_semi_join(con, from_x, from_y, anti = query$anti, by = query$by) } #' @export sql_render.set_op_query <- function(query, con = NULL, ..., root = FALSE) { from_x <- sql_render(query$x, con, ..., root = TRUE) from_y <- sql_render(query$y, con, ..., root = TRUE) sql_set_op(con, from_x, from_y, method = query$type) } dbplyr/R/dbi-s3.r0000644000176200001440000001053113176710165013222 0ustar liggesusers#' @export db_desc.DBIConnection <- function(x) { class(x)[[1]] } #' @export db_list_tables.DBIConnection <- function(con) dbListTables(con) #' @export db_has_table.DBIConnection <- function(con, table) dbExistsTable(con, table) #' @export db_data_type.DBIConnection <- function(con, fields) { vapply(fields, dbDataType, dbObj = con, FUN.VALUE = character(1)) } #' @export db_save_query.DBIConnection <- function(con, sql, name, temporary = TRUE, ...) { tt_sql <- build_sql( "CREATE ", if (temporary) sql("TEMPORARY "), "TABLE ", as.sql(name), " AS ", sql, con = con ) dbExecute(con, tt_sql) name } #' @export db_begin.DBIConnection <- function(con, ...) { dbBegin(con) } #' @export db_commit.DBIConnection <- function(con, ...) dbCommit(con) #' @export db_rollback.DBIConnection <- function(con, ...) dbRollback(con) #' @export db_write_table.DBIConnection <- function(con, table, types, values, temporary = TRUE, ...) { dbWriteTable( con, name = dbi_quote(as.sql(table), con), value = values, field.types = types, temporary = temporary, row.names = FALSE ) table } #' @export db_create_table.DBIConnection <- function(con, table, types, temporary = TRUE, ...) { assert_that(is_string(table), is.character(types)) field_names <- escape(ident(names(types)), collapse = NULL, con = con) fields <- sql_vector( paste0(field_names, " ", types), parens = TRUE, collapse = ", ", con = con ) sql <- build_sql( "CREATE ", if (temporary) sql("TEMPORARY "), "TABLE ", as.sql(table), " ", fields, con = con ) dbExecute(con, sql) } #' @export db_insert_into.DBIConnection <- function(con, table, values, ...) { dbWriteTable(con, table, values, append = TRUE, row.names = FALSE) } #' @export db_create_indexes.DBIConnection <- function(con, table, indexes = NULL, unique = FALSE, ...) { if (is.null(indexes)) return() assert_that(is.list(indexes)) for (index in indexes) { db_create_index(con, table, index, unique = unique, ...) 
} } #' @export db_create_index.DBIConnection <- function(con, table, columns, name = NULL, unique = FALSE, ...) { assert_that(is_string(table), is.character(columns)) name <- name %||% paste0(c(table, columns), collapse = "_") fields <- escape(ident(columns), parens = TRUE, con = con) sql <- build_sql( "CREATE ", if (unique) sql("UNIQUE "), "INDEX ", as.sql(name), " ON ", as.sql(table), " ", fields, con = con) dbExecute(con, sql) } #' @export db_drop_table.DBIConnection <- function(con, table, force = FALSE, ...) { sql <- build_sql( "DROP TABLE ", if (force) sql("IF EXISTS "), as.sql(table), con = con ) dbExecute(con, sql) } #' @export db_analyze.DBIConnection <- function(con, table, ...) { sql <- build_sql("ANALYZE ", as.sql(table), con = con) dbExecute(con, sql) } #' @export db_explain.DBIConnection <- function(con, sql, ...) { exsql <- build_sql("EXPLAIN ", sql, con = con) expl <- dbGetQuery(con, exsql) out <- utils::capture.output(print(expl)) paste(out, collapse = "\n") } #' @export db_query_fields.DBIConnection <- function(con, sql, ...) { sql <- sql_select(con, sql("*"), sql_subquery(con, sql), where = sql("0 = 1")) qry <- dbSendQuery(con, sql) on.exit(dbClearResult(qry)) res <- dbFetch(qry, 0) names(res) } #' @export db_query_rows.DBIConnection <- function(con, sql, ...) { from <- sql_subquery(con, sql, "master") rows <- build_sql("SELECT count(*) FROM ", from, con = con) as.integer(dbGetQuery(con, rows)[[1]]) } # Utility functions ------------------------------------------------------------ random_table_name <- function(n = 10) { paste0(sample(letters, n, replace = TRUE), collapse = "") } res_warn_incomplete <- function(res, hint = "n = -1") { if (dbHasCompleted(res)) return() rows <- big_mark(dbGetRowCount(res)) warning("Only first ", rows, " results retrieved. Use ", hint, " to retrieve all.", call. = FALSE) } dbi_quote <- function(x, con) UseMethod("dbi_quote") dbi_quote.ident_q <- function(x, con) DBI::SQL(x) dbi_quote.ident <- function(x, con) DBI::dbQuoteIdentifier(con, x) dbi_quote.character <- function(x, con) DBI::dbQuoteString(con, x) dbi_quote.sql <- function(x, con) DBI::SQL(x) dbplyr/R/db-odbc-teradata.R0000644000176200001440000000623713221275427015164 0ustar liggesusers#' @export sql_select.Teradata <- function(con, select, from, where = NULL, group_by = NULL, having = NULL, order_by = NULL, limit = NULL, distinct = FALSE, ...) 
{ out <- vector("list", 7) names(out) <- c("select", "from", "where", "group_by", "having", "order_by","limit") assert_that(is.character(select), length(select) > 0L) out$select <- build_sql( "SELECT ", if (distinct) sql("DISTINCT "), # Teradata uses the TOP statement instead of LIMIT which is what SQL92 uses # TOP is expected after DISTINCT and not at the end of the query # e.g: SELECT TOP 100 * FROM my_table if (!is.null(limit) && !identical(limit, Inf)) { assert_that(is.numeric(limit), length(limit) == 1L, limit > 0) build_sql(" TOP ", as.integer(limit), " ")}, escape(select, collapse = ", ", con = con) ) out$from <- sql_clause_from(from, con) out$where <- sql_clause_where(where, con) out$group_by <- sql_clause_group_by(group_by, con) out$having <- sql_clause_having(having, con) out$order_by <- sql_clause_order_by(order_by, con) escape(unname(compact(out)), collapse = "\n", parens = FALSE, con = con) } #' @export sql_translate_env.Teradata <- function(con) { sql_variant( sql_translator(.parent = base_odbc_scalar, `!=` = sql_infix("<>"), as.numeric = sql_cast("NUMERIC"), as.double = sql_cast("NUMERIC"), as.character = sql_cast("VARCHAR(MAX)"), log10 = sql_prefix("LOG"), log = sql_log(), cot = sql_cot(), nchar = sql_prefix("CHARACTER_LENGTH"), ceil = sql_prefix("CEILING"), ceiling = sql_prefix("CEILING"), atan2 = function(x, y){ build_sql( "ATAN2(", y, ",", x, ")" )}, substr = function(x, start, stop){ len <- stop - start + 1 build_sql( "SUBSTR(", x, ", ", start, ", ", len, ")" )}, paste = function(...) { stop("PASTE() is not supported in this SQL variant, try PASTE0() instead", call. = FALSE) } ), sql_translator(.parent = base_odbc_agg, cor = sql_not_supported("cor()"), cov = sql_not_supported("cov()"), var = sql_prefix("VAR_SAMP") ), sql_translator(.parent = base_odbc_win, cor = win_absent("cor"), cov = win_absent("cov"), var = win_recycled("VAR_SAMP") ) )} #' @export db_analyze.Teradata <- function(con, table, ...) { # Using COLLECT STATISTICS instead of ANALYZE as recommended in this article # https://www.tutorialspoint.com/teradata/teradata_statistics.htm sql <- build_sql( "COLLECT STATISTICS ", ident(table) , con = con ) DBI::dbExecute(con, sql) } dbplyr/R/sql-expr.R0000644000176200001440000000256713174465555013676 0ustar liggesusers#' Generate SQL from R expressions #' #' Low-level building block for generating SQL from R expressions. #' Strings are escaped; names become bare SQL identifiers. User infix #' functions have `\\%` stripped. 
#'
#' @param x A quasiquoted expression
#' @inheritParams translate_sql
#' @export
#' @examples
#' sql_expr(f(x + 1))
#'
#' sql_expr(f("x", "y"))
#' sql_expr(f(x, y))
#'
#' sql_expr(cast("x" %as% DECIMAL))
#' sql_expr(round(x) %::% numeric)
sql_expr <- function(x, con = sql_current_con()) {
  x <- enexpr(x)
  x <- replace_expr(x, con = con)
  sql(x)
}

replace_expr <- function(x, con) {
  if (is.atomic(x)) {
    escape(x, con = con)
  } else if (is.name(x)) {
    as.character(x)
  # } else if (is.call(x) && identical(x[[1]], quote(I))) {
  #   escape(ident(as.character(x[[2]])))
  } else if (is.call(x)) {
    fun <- toupper(as.character(x[[1]]))
    args <- lapply(x[-1], replace_expr, con = con)

    if (is_infix_base(fun)) {
      if (length(args) == 1) {
        paste0(fun, args[[1]])
      } else {
        paste0(args[[1]], " ", fun, " ", args[[2]])
      }
    } else if (is_infix_user(fun)) {
      fun <- substr(fun, 2, nchar(fun) - 1)
      paste0(args[[1]], " ", fun, " ", args[[2]])
    } else if (fun == "(") {
      paste0("(", paste0(args, collapse = ", "), ")")
    } else {
      paste0(fun, "(", paste0(args, collapse = ", "), ")")
    }
  } else {
    x
  }
}
dbplyr/R/ident.R0000644000176200001440000000234413221465646013205 0ustar liggesusers#' @include utils.r
NULL

#' Flag a character vector as SQL identifiers
#'
#' `ident()` takes unquoted strings and flags them as identifiers.
#' `ident_q()` assumes its input has already been quoted, and ensures
#' it does not get quoted again. This is currently used only
#' for `schema.table`.
#'
#' @param ... A character vector, or name-value pairs
#' @param x An object
#' @export
#' @examples
#' # SQL92 quotes strings with '
#' escape("x")
#'
#' # And identifiers with "
#' ident("x")
#' escape(ident("x"))
#'
#' # You can supply multiple inputs
#' ident(a = "x", b = "y")
#' ident_q(a = "x", b = "y")
ident <- function(...) {
  x <- c_character(...)
  structure(x, class = c("ident", "character"))
}

setOldClass(c("ident", "character"), ident())

#' @export
#' @rdname ident
ident_q <- function(...) {
  x <- c_character(...)
  structure(x, class = c("ident_q", "ident", "character"))
}

setOldClass(c("ident_q", "ident", "character"), ident_q())

#' @export
print.ident <- function(x, ...) cat(format(x, ...), sep = "\n")

#' @export
format.ident <- function(x, ...)
{
  if (length(x) == 0) {
    paste0("<IDENT> [empty]")
  } else {
    paste0("<IDENT> ", x)
  }
}

#' @rdname ident
#' @export
is.ident <- function(x) inherits(x, "ident")
dbplyr/R/db-postgres.r0000644000176200001440000001074713221451550014372 0ustar liggesusers#' @export
db_desc.PostgreSQLConnection <- function(x) {
  info <- dbGetInfo(x)
  host <- if (info$host == "") "localhost" else info$host

  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}
#' @export
db_desc.PostgreSQL <- db_desc.PostgreSQLConnection
#' @export
db_desc.PqConnection <- db_desc.PostgreSQLConnection

#' @export
sql_translate_env.PostgreSQLConnection <- function(con) {
  sql_variant(
    sql_translator(.parent = base_scalar,
      log10 = function(x) sql_expr(log(!!x)),
      log = sql_log(),
      cot = sql_cot(),
      round = function(x, digits = 0L) {
        digits <- as.integer(digits)
        sql_expr(round((!!x) %::% numeric, !!digits))
      },
      grepl = function(pattern, x, ignore.case = FALSE, perl = FALSE,
                       fixed = FALSE, useBytes = FALSE) {
        # https://www.postgresql.org/docs/current/static/functions-matching.html#FUNCTIONS-POSIX-TABLE
        if (any(c(perl, fixed, useBytes))) {
          stop("perl, fixed and useBytes parameters are unsupported")
        }
        if (ignore.case) {
          sql_expr((!!x) %~*% (!!pattern))
        } else {
          sql_expr((!!x) %~% (!!pattern))
        }
      },
      paste = sql_paste(" "),
      paste0 = sql_paste(""),

      # stringr functions
      str_locate = function(string, pattern) {
        sql_expr(strpos(!!string, !!pattern))
      },
      str_detect = function(string, pattern){
        sql_expr(strpos(!!string, !!pattern) > 0L)
      }
    ),
    sql_translator(.parent = base_agg,
      n = function() sql("COUNT(*)"),
      cor = sql_aggregate_2("corr"),
      cov = sql_aggregate_2("covar_samp"),
      sd = sql_aggregate("stddev_samp"),
      var = sql_aggregate("var_samp"),
      all = sql_aggregate("bool_and"),
      any = sql_aggregate("bool_or"),
      str_flatten = function(x, collapse) sql_expr(string_agg(!!x, !!collapse))
    ),
    sql_translator(.parent = base_win,
      n = function() {
        win_over(sql("COUNT(*)"), partition = win_current_group())
      },
      cor = win_aggregate_2("corr"),
      cov = win_aggregate_2("covar_samp"),
      sd = win_aggregate("stddev_samp"),
      var = win_aggregate("var_samp"),
      all = win_aggregate("bool_and"),
      any = win_aggregate("bool_or"),
      str_flatten = function(x, collapse) {
        win_over(
          sql_expr(string_agg(!!x, !!collapse)),
          partition = win_current_group(),
          order = win_current_order()
        )
      }
    )
  )
}

#' @export
sql_translate_env.PostgreSQL <- sql_translate_env.PostgreSQLConnection
#' @export
sql_translate_env.PqConnection <- sql_translate_env.PostgreSQLConnection

# DBI methods ------------------------------------------------------------------

# Doesn't return TRUE for temporary tables
#' @export
db_has_table.PostgreSQLConnection <- function(con, table, ...) {
  table %in% db_list_tables(con)
}

#' @export
db_begin.PostgreSQLConnection <- function(con, ...) {
  dbExecute(con, "BEGIN TRANSACTION")
}

#' @export
db_write_table.PostgreSQLConnection <- function(con, table, types, values,
                                                temporary = TRUE, ...) {
  db_create_table(con, table, types, temporary = temporary)

  if (nrow(values) == 0) return(NULL)

  cols <- lapply(values, escape, collapse = NULL, parens = FALSE, con = con)
  col_mat <- matrix(unlist(cols, use.names = FALSE), nrow = nrow(values))

  rows <- apply(col_mat, 1, paste0, collapse = ", ")
  values <- paste0("(", rows, ")", collapse = "\n, ")

  sql <- build_sql("INSERT INTO ", as.sql(table), " VALUES ", sql(values))
  dbExecute(con, sql)

  table
}

#' @export
db_query_fields.PostgreSQLConnection <- function(con, sql, ...)
{ fields <- build_sql( "SELECT * FROM ", sql_subquery(con, sql), " WHERE 0=1", con = con ) qry <- dbSendQuery(con, fields) on.exit(dbClearResult(qry)) dbGetInfo(qry)$fieldDescription[[1]]$name } # http://www.postgresql.org/docs/9.3/static/sql-explain.html #' @export db_explain.PostgreSQLConnection <- function(con, sql, format = "text", ...) { format <- match.arg(format, c("text", "json", "yaml", "xml")) exsql <- build_sql( "EXPLAIN ", if (!is.null(format)) build_sql("(FORMAT ", sql(format), ") "), sql ) expl <- dbGetQuery(con, exsql) paste(expl[[1]], collapse = "\n") } #' @export db_explain.PostgreSQL <- db_explain.PostgreSQLConnection #' @export db_explain.PqConnection <- db_explain.PostgreSQLConnection globalVariables(c("strpos", "%::%", "string_agg", "%~*%", "%~%")) dbplyr/R/compat-purrr.R0000644000176200001440000000725013066545575014551 0ustar liggesusers# nocov start # This file serves as a reference for compatibility functions for # purrr. They are not drop-in replacements but allow a similar style # of programming. This is useful in cases where purrr is too heavy a # package to depend on. Please find the most recent version in rlang's # repository. map <- function(.x, .f, ...) { lapply(.x, .f, ...) } map_mold <- function(.x, .f, .mold, ...) { out <- vapply(.x, .f, .mold, ..., USE.NAMES = FALSE) rlang::set_names(out, names(.x)) } map_lgl <- function(.x, .f, ...) { map_mold(.x, .f, logical(1), ...) } map_int <- function(.x, .f, ...) { map_mold(.x, .f, integer(1), ...) } map_dbl <- function(.x, .f, ...) { map_mold(.x, .f, double(1), ...) } map_chr <- function(.x, .f, ...) { map_mold(.x, .f, character(1), ...) } map_cpl <- function(.x, .f, ...) { map_mold(.x, .f, complex(1), ...) } pluck <- function(.x, .f) { map(.x, `[[`, .f) } pluck_lgl <- function(.x, .f) { map_lgl(.x, `[[`, .f) } pluck_int <- function(.x, .f) { map_int(.x, `[[`, .f) } pluck_dbl <- function(.x, .f) { map_dbl(.x, `[[`, .f) } pluck_chr <- function(.x, .f) { map_chr(.x, `[[`, .f) } pluck_cpl <- function(.x, .f) { map_cpl(.x, `[[`, .f) } map2 <- function(.x, .y, .f, ...) { Map(.f, .x, .y, ...) } map2_lgl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "logical") } map2_int <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "integer") } map2_dbl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "double") } map2_chr <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "character") } map2_cpl <- function(.x, .y, .f, ...) { as.vector(map2(.x, .y, .f, ...), "complex") } args_recycle <- function(args) { lengths <- map_int(args, length) n <- max(lengths) stopifnot(all(lengths == 1L | lengths == n)) to_recycle <- lengths == 1L args[to_recycle] <- map(args[to_recycle], function(x) rep.int(x, n)) args } pmap <- function(.l, .f, ...) { args <- args_recycle(.l) do.call("mapply", c( FUN = list(quote(.f)), args, MoreArgs = quote(list(...)), SIMPLIFY = FALSE, USE.NAMES = FALSE )) } probe <- function(.x, .p, ...) { if (is_logical(.p)) { stopifnot(length(.p) == length(.x)) .p } else { map_lgl(.x, .p, ...) } } keep <- function(.x, .f, ...) { .x[probe(.x, .f, ...)] } discard <- function(.x, .p, ...) { sel <- probe(.x, .p, ...) .x[is.na(sel) | !sel] } map_if <- function(.x, .p, .f, ...) { matches <- probe(.x, .p) .x[matches] <- map(.x[matches], .f, ...) 
.x } flatten <- function(.x) { unlist(.x, FALSE, FALSE) } compact <- function(.x) { Filter(length, .x) } transpose <- function(.l) { inner_names <- names(.l[[1]]) if (is.null(inner_names)) { fields <- seq_along(.l[[1]]) } else { fields <- set_names(inner_names) } map(fields, function(i) { map(.l, .subset2, i) }) } every <- function(.x, .p, ...) { for (i in seq_along(.x)) { if (!rlang::is_true(.p(.x[[i]], ...))) return(FALSE) } TRUE } some <- function(.x, .p, ...) { for (i in seq_along(.x)) { if (rlang::is_true(.p(.x[[i]], ...))) return(TRUE) } FALSE } negate <- function(.p) { function(...) !.p(...) } reduce <- function(.x, .f, ..., .init) { f <- function(x, y) .f(x, y, ...) Reduce(f, .x, init = .init) } reduce_right <- function(.x, .f, ..., .init) { f <- function(x, y) .f(y, x, ...) Reduce(f, .x, init = .init, right = TRUE) } accumulate <- function(.x, .f, ..., .init) { f <- function(x, y) .f(x, y, ...) Reduce(f, .x, init = .init, accumulate = TRUE) } accumulate_right <- function(.x, .f, ..., .init) { f <- function(x, y) .f(y, x, ...) Reduce(f, .x, init = .init, right = TRUE, accumulate = TRUE) } # nocov end dbplyr/R/simulate.r0000644000176200001440000000374613174442662014000 0ustar liggesuserssimulate_test <- function() { structure(list(), class = c("DBITestConnection", "DBIConnection")) } db_query_fields.DBITestConnection <- function(con, sql, ...) { c("field1") } sql_escape_ident.DBITestConnection <- function(con, x) { sql_quote(x, "`") } sql_escape_string.DBITestConnection <- function(con, x) { sql_quote(x, "'") } #' @export sql_subquery.DBITestConnection <- function(con, from, name = unique_name(), ...) { if (is.ident(from)) { setNames(from, name) } else { build_sql("(", from, ") ", ident(name %||% random_table_name()), con = con) } } # DBI connections -------------------------------------------------------------- #' @export #' @rdname tbl_lazy simulate_dbi <- function() { structure( list(), class = "DBIConnection" ) } #' @export #' @rdname tbl_lazy simulate_sqlite <- function() { structure( list(), class = c("SQLiteConnection", "DBIConnection") ) } #' @export #' @rdname tbl_lazy simulate_postgres <- function() { structure( list(), class = c("PostgreSQLConnection", "DBIConnection") ) } #' @export #' @rdname tbl_lazy simulate_mysql <- function() { structure( list(), class = c("MySQLConnection", "DBIConnection") ) } #' @export #' @rdname tbl_lazy simulate_odbc <- function(type = NULL) { structure( list(), class = c(type, "DBITestConnection", "DBIConnection") ) } #' @export #' @rdname tbl_lazy simulate_impala <- function() simulate_odbc("Impala") #' @export #' @rdname tbl_lazy simulate_mssql <- function() simulate_odbc("Microsoft SQL Server") #' @export #' @rdname tbl_lazy simulate_oracle <- function() simulate_odbc("Oracle") #' @export #' @rdname tbl_lazy simulate_hive <- function() simulate_odbc("Hive") #' @export #' @rdname tbl_lazy simulate_odbc_postgresql <- function() simulate_odbc("PostgreSQL") #' @export #' @rdname tbl_lazy simulate_teradata <- function() simulate_odbc("Teradata") #' @export #' @rdname tbl_lazy simulate_odbc_access <- function() simulate_odbc("ACCESS") dbplyr/R/sql.R0000644000176200001440000000236313221465765012710 0ustar liggesusers#' SQL escaping. #' #' These functions are critical when writing functions that translate R #' functions to sql functions. Typically a conversion function should escape #' all its inputs and return an sql object. #' #' @param ... Character vectors that will be combined into a single SQL #' expression. #' @export sql <- function(...) 
{ x <- c_character(...) structure(x, class = c("sql", "character")) } # See setOldClass definition in zzz.R #' @export c.sql <- function(..., drop_null = FALSE, con = NULL) { input <- list(...) if (drop_null) input <- compact(input) out <- unlist(lapply(input, escape, collapse = NULL, con = con)) sql(out) } #' @export c.ident <- c.sql #' @export unique.sql <- function(x, ...) { sql(NextMethod()) } #' @rdname sql #' @export is.sql <- function(x) inherits(x, "sql") #' @export print.sql <- function(x, ...) cat(format(x, ...), sep = "\n") #' @export format.sql <- function(x, ...) { if (length(x) == 0) { paste0(" [empty]") } else { paste0(" ", x) } } #' @rdname sql #' @export #' @param x Object to coerce as.sql <- function(x) UseMethod("as.sql") #' @export as.sql.ident <- function(x) x #' @export as.sql.sql <- function(x) x #' @export as.sql.character <- function(x) ident(x) dbplyr/R/sql-build.R0000644000176200001440000001326013221275427013775 0ustar liggesusers#' Build and render SQL from a sequence of lazy operations #' #' `sql_build()` creates a `select_query` S3 object, that is rendered #' to a SQL string by `sql_render()`. The output from `sql_build()` is #' designed to be easy to test, as it's database agnostic, and has #' a hierarchical structure. #' #' `sql_build()` is generic over the lazy operations, \link{lazy_ops}, #' and generates an S3 object that represents the query. `sql_render()` #' takes a query object and then calls a function that is generic #' over the database. For example, `sql_build.op_mutate()` generates #' a `select_query`, and `sql_render.select_query()` calls #' `sql_select()`, which has different methods for different databases. #' The default methods should generate ANSI 92 SQL where possible, so you #' backends only need to override the methods if the backend is not ANSI #' compliant. #' #' @export #' @keywords internal #' @param op A sequence of lazy operations #' @param con A database connection. The default `NULL` uses a set of #' rules that should be very similar to ANSI 92, and allows for testing #' without an active database connection. #' @param ... Other arguments passed on to the methods. Not currently used. sql_build <- function(op, con = NULL, ...) { UseMethod("sql_build") } #' @export sql_build.tbl_lazy <- function(op, con = NULL, ...) { # only used for testing qry <- sql_build(op$ops, con = con, ...) sql_optimise(qry, con = con, ...) } # Base ops -------------------------------------------------------- #' @export sql_build.op_base_remote <- function(op, con, ...) { op$x } #' @export sql_build.op_base_local <- function(op, con, ...) { ident("df") } # Single table ops -------------------------------------------------------- #' @export sql_build.op_select <- function(op, con, ...) { vars <- tidyselect::vars_select(op_vars(op$x), !!! op$dots, .include = op_grps(op$x)) select_query(sql_build(op$x, con), ident(vars)) } #' @export sql_build.op_rename <- function(op, con, ...) { vars <- tidyselect::vars_rename(op_vars(op$x), !!! op$dots) select_query(sql_build(op$x, con), ident(vars)) } #' @export sql_build.op_arrange <- function(op, con, ...) { order_vars <- translate_sql_(op$dots, con, context = list(clause = "ORDER")) group_vars <- c.sql(ident(op_grps(op$x)), con = con) select_query(sql_build(op$x, con), order_by = order_vars) } #' @export sql_build.op_summarise <- function(op, con, ...) 
{ select_vars <- translate_sql_(op$dots, con, window = FALSE, context = list(clause = "SELECT")) group_vars <- c.sql(ident(op_grps(op$x)), con = con) select_query( sql_build(op$x, con), select = c.sql(group_vars, select_vars, con = con), group_by = group_vars ) } #' @export sql_build.op_mutate <- function(op, con, ...) { vars <- op_vars(op$x) new_vars <- translate_sql_( op$dots, con, vars_group = op_grps(op), vars_order = translate_sql_(op_sort(op), con, context = list(clause = "ORDER")), vars_frame = op_frame(op), context = list(clause = "SELECT") ) select_query( sql_build(op$x, con), select = overwrite_vars(vars, new_vars, con) ) } overwrite_vars <- function(vars, new_vars, con) { all_names <- unique(c(vars, names(new_vars))) new_idx <- match(names(new_vars), all_names) all_vars <- c.sql(ident(all_names), con = con) all_vars[new_idx] <- c.sql(new_vars, con = con) all_vars } #' @export sql_build.op_head <- function(op, con, ...) { select_query(sql_build(op$x, con), limit = op$args$n) } #' @export sql_build.op_group_by <- function(op, con, ...) { sql_build(op$x, con, ...) } #' @export sql_build.op_ungroup <- function(op, con, ...) { sql_build(op$x, con, ...) } #' @export sql_build.op_filter <- function(op, con, ...) { vars <- op_vars(op$x) if (!uses_window_fun(op$dots, con)) { where_sql <- translate_sql_(op$dots, con, context = list(clause = "WHERE")) select_query( sql_build(op$x, con), where = where_sql ) } else { # Do partial evaluation, then extract out window functions where <- translate_window_where_all(op$dots, ls(sql_translate_env(con)$window)) # Convert where$expr back to a lazy dots object, and then # create mutate operation mutated <- sql_build(op_single("mutate", op$x, dots = where$comp), con) where_sql <- translate_sql_(where$expr, con = con, context = list(clause = "WHERE")) select_query(mutated, select = ident(vars), where = where_sql) } } #' @export sql_build.op_distinct <- function(op, con, ...) { if (length(op$dots) == 0) { select_query( sql_build(op$x, con), distinct = TRUE ) } else { if (op$args$.keep_all) { stop( "Can't calculate distinct only on specified columns with SQL unless .keep_all is FALSE", call. = FALSE ) } group_vars <- c.sql(ident(op_vars(op)), con = con) select_query( sql_build(op$x, con), select = group_vars, group_by = group_vars ) } } # Dual table ops -------------------------------------------------------- #' @export sql_build.op_join <- function(op, con, ...) { join_query( op$x, op$y, vars = op$args$vars, type = op$args$type, by = op$args$by, suffix = op$args$suffix ) } #' @export sql_build.op_semi_join <- function(op, con, ...) { semi_join_query(op$x, op$y, anti = op$args$anti, by = op$args$by) } #' @export sql_build.op_set_op <- function(op, con, ...) 
{ x_vars <- op_vars(op$x) y_vars <- op_vars(op$y) if (!identical(x_vars, y_vars)) { vars <- semi_join_vars(x_vars, y_vars) x <- select_query(sql_build(op$x, con), ident(vars$x)) y <- select_query(sql_build(op$y, con), ident(vars$y)) } else { x <- op$x y <- op$y } set_op_query(x, y, type = op$args$type) } dbplyr/R/cache.r0000644000176200001440000000312412616436421013202 0ustar liggesusers# Environment for caching connections etc cache <- new.env(parent = emptyenv()) is_cached <- function(name) exists(name, envir = cache) set_cache <- function(name, value) { # message("Setting ", name, " in cache") assign(name, value, envir = cache) value } get_cache <- function(name) { # message("Getting ", name, " from cache") get(name, envir = cache) } cache_computation <- function(name, computation) { if (is_cached(name)) { get_cache(name) } else { res <- force(computation) set_cache(name, res) res } } load_srcs <- function(f, src_names, quiet = NULL) { if (is.null(quiet)) { quiet <- !identical(Sys.getenv("NOT_CRAN"), "true") } srcs <- lapply(src_names, function(x) { out <- NULL try(out <- f(x), silent = TRUE) if (is.null(out) && !quiet) { message("Could not instantiate ", x, " src") } out }) compact(setNames(srcs, src_names)) } db_location <- function(path = NULL, filename) { if (!is.null(path)) { # Check that path is a directory and is writeable if (!file.exists(path) || !file.info(path)$isdir) { stop(path, " is not a directory", call. = FALSE) } if (!is_writeable(path)) stop("Can not write to ", path, call. = FALSE) return(file.path(path, filename)) } pkg <- file.path(system.file("db", package = "dplyr")) if (is_writeable(pkg)) return(file.path(pkg, filename)) tmp <- tempdir() if (is_writeable(tmp)) return(file.path(tmp, filename)) stop("Could not find writeable location to cache db", call. = FALSE) } is_writeable <- function(x) { unname(file.access(x, 2) == 0) } dbplyr/R/testthat.r0000644000176200001440000000224013066532772014003 0ustar liggesusers expect_equal_tbl <- function(object, expected, ..., info = NULL, label = NULL, expected.label = NULL) { lab_act <- label %||% expr_label(substitute(object)) lab_exp <- expected.label %||% expr_label(substitute(expected)) ok <- dplyr::all_equal(collect(object), collect(expected), ...) msg <- glue(" {lab_act} not equal to {lab_exp}. {paste(ok, collapse = '\n')} ") testthat::expect(isTRUE(ok), msg, info = info) } expect_equal_tbls <- function(results, ref = NULL, ...) { stopifnot(is.list(results)) if (!is_named(results)) { result_name <- expr_name(substitute(results)) names(results) <- paste0(result_name, "_", seq_along(results)) } # If ref is NULL, use the first result if (is.null(ref)) { if (length(results) < 2) { testthat::skip("Need at least two srcs to compare") } ref <- results[[1]] ref_name <- names(results)[[1]] rest <- results[-1] } else { rest <- results ref_name <- "`ref`" } for (i in seq_along(rest)) { expect_equal_tbl( rest[[i]], ref, ..., label = names(rest)[[i]], expected.label = ref_name ) } invisible(TRUE) } dbplyr/R/schema.R0000644000176200001440000000127613173736450013351 0ustar liggesusers#' Refer to a table in a schema #' #' @param schema,table Names of schema and table. 
#' @export
#' @examples
#' in_schema("my_schema", "my_table")
#'
#' # Example using schemas with SQLite
#' con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
#' src <- src_dbi(con, auto_disconnect = TRUE)
#'
#' # Add auxiliary schema
#' tmp <- tempfile()
#' DBI::dbExecute(con, paste0("ATTACH '", tmp, "' AS aux"))
#'
#' library(dplyr, warn.conflicts = FALSE)
#' copy_to(con, iris, "df", temporary = FALSE)
#' copy_to(con, mtcars, in_schema("aux", "df"), temporary = FALSE)
#'
#' con %>% tbl("df")
#' con %>% tbl(in_schema("aux", "df"))
in_schema <- function(schema, table) {
  ident_q(paste0(schema, ".", table))
}
dbplyr/R/sql-generic.R0000644000176200001440000001252513174424334014315 0ustar liggesusers#' More SQL generics
#'
#' These are new, so not included in dplyr for backward compatibility
#' purposes.
#'
#' @keywords internal
#' @export
sql_escape_logical <- function(con, x) {
  UseMethod("sql_escape_logical")
}

# DBIConnection methods -----------------------------------------------------------------

#' @export
sql_select.DBIConnection <- function(con, select, from, where = NULL,
                                     group_by = NULL, having = NULL,
                                     order_by = NULL, limit = NULL,
                                     distinct = FALSE, ...) {
  out <- vector("list", 7)
  names(out) <- c("select", "from", "where", "group_by", "having", "order_by", "limit")

  out$select <- sql_clause_select(select, con, distinct)
  out$from <- sql_clause_from(from, con)
  out$where <- sql_clause_where(where, con)
  out$group_by <- sql_clause_group_by(group_by, con)
  out$having <- sql_clause_having(having, con)
  out$order_by <- sql_clause_order_by(order_by, con)
  out$limit <- sql_clause_limit(limit, con)

  escape(unname(compact(out)), collapse = "\n", parens = FALSE, con = con)
}

#' @export
sql_subquery.DBIConnection <- function(con, from, name = unique_name(), ...) {
  if (is.ident(from)) {
    setNames(from, name)
  } else {
    build_sql("(", from, ") ", ident(name %||% random_table_name()), con = con)
  }
}

#' @export
sql_join.DBIConnection <- function(con, x, y, vars, type = "inner", by = NULL, ...) {
  JOIN <- switch(
    type,
    left = sql("LEFT JOIN"),
    inner = sql("INNER JOIN"),
    right = sql("RIGHT JOIN"),
    full = sql("FULL JOIN"),
    cross = sql("CROSS JOIN"),
    stop("Unknown join type: ", type, call. = FALSE)
  )

  select <- sql_join_vars(con, vars)
  on <- sql_join_tbls(con, by)

  # Wrap with SELECT since callers assume a valid query is returned
  build_sql(
    "SELECT ", select, "\n",
    "  FROM ", x, "\n",
    "  ", JOIN, " ", y, "\n",
    if (!is.null(on)) build_sql("  ON ", on, "\n") else NULL,
    con = con
  )
}

sql_join_vars <- function(con, vars) {
  sql_vector(
    mapply(
      FUN = sql_join_var,
      alias = vars$alias,
      x = vars$x,
      y = vars$y,
      MoreArgs = list(con = con),
      SIMPLIFY = FALSE,
      USE.NAMES = TRUE
    ),
    parens = FALSE,
    collapse = ", ",
    con = con
  )
}

sql_join_var <- function(con, alias, x, y) {
  if (!is.na(x) & !is.na(y)) {
    sql_coalesce(
      sql_table_prefix(con, x, table = "TBL_LEFT"),
      sql_table_prefix(con, y, table = "TBL_RIGHT")
    )
  } else if (!is.na(x)) {
    sql_table_prefix(con, x, table = "TBL_LEFT")
  } else if (!is.na(y)) {
    sql_table_prefix(con, y, table = "TBL_RIGHT")
  } else {
    stop("No source for join column ", alias, call. = FALSE)
  }
}

sql_join_tbls <- function(con, by) {
  on <- NULL
  if (length(by$x) + length(by$y) > 0) {
    on <- sql_vector(
      paste0(
        sql_table_prefix(con, by$x, "TBL_LEFT"),
        " = ",
        sql_table_prefix(con, by$y, "TBL_RIGHT")
      ),
      collapse = " AND ",
      parens = TRUE
    )
  }
  on
}

sql_coalesce <- function(...)
{ vars <- sql_vector(list(...), parens = FALSE, collapse = ", ") build_sql("coalesce(", vars, ")") } sql_as <- function(con, var, alias = names(var), table = NULL) { if (length(var) == 0) return(ident()) var <- sql_table_prefix(con, var, table = table) alias <- sql_escape_ident(con, alias) sql(paste0(var, " AS ", alias)) } sql_table_prefix <- function(con, var, table = NULL) { var <- sql_escape_ident(con, var) if (!is.null(table)) { table <- sql_escape_ident(con, table) sql(paste0(table, ".", var)) } else { var } } #' @export sql_semi_join.DBIConnection <- function(con, x, y, anti = FALSE, by = NULL, ...) { # X and Y are subqueries named TBL_LEFT and TBL_RIGHT left <- escape(ident("TBL_LEFT"), con = con) right <- escape(ident("TBL_RIGHT"), con = con) on <- sql_vector( paste0( left, ".", sql_escape_ident(con, by$x), " = ", right, ".", sql_escape_ident(con, by$y) ), collapse = " AND ", parens = TRUE, con = con ) build_sql( "SELECT * FROM ", x, "\n\n", "WHERE ", if (anti) sql("NOT "), "EXISTS (\n", " SELECT 1 FROM ", y, "\n", " WHERE ", on, "\n", ")", con = con ) } #' @export sql_set_op.default <- function(con, x, y, method) { build_sql( "(", x, ")", "\n", sql(method), "\n", "(", y, ")" ) } #' @export sql_set_op.SQLiteConnection <- function(con, x, y, method) { # SQLite does not allow parentheses build_sql( x, "\n", sql(method), "\n", y ) } #' @export sql_escape_string.DBIConnection <- function(con, x) { dbQuoteString(con, x) } #' @export sql_escape_string.NULL <- function(con, x) { sql_quote(x, "'") } #' @export sql_escape_ident.DBIConnection <- function(con, x) { dbQuoteIdentifier(con, x) } #' @export sql_escape_ident.NULL <- function(con, x) { sql_quote(x, '"') } #' @export sql_escape_logical.DBIConnection <- function(con, x) { y <- as.character(x) y[is.na(x)] <- "NULL" y } #' @export sql_escape_logical.NULL <- sql_escape_logical.DBIConnection #' @export sql_translate_env.NULL <- function(con) { sql_variant( base_scalar, base_agg, base_win ) } #' @export sql_translate_env.DBIConnection <- function(con) { sql_variant( base_scalar, base_agg, base_win ) } dbplyr/R/translate-sql-helpers.r0000644000176200001440000001372713176660356016413 0ustar liggesusers#' Create an sql translator #' #' When creating a package that maps to a new SQL based src, you'll often #' want to provide some additional mappings from common R commands to the #' commands that your tbl provides. These three functions make that #' easy. #' #' @section Helper functions: #' #' `sql_infix()` and `sql_prefix()` create default SQL infix and prefix #' functions given the name of the SQL function. They don't perform any input #' checking, but do correctly escape their input, and are useful for #' quickly providing default wrappers for a new SQL variant. #' #' @keywords internal #' @seealso [win_over()] for helper functions for window functions. #' @param scalar,aggregate,window The three families of functions than an #' SQL variant can supply. #' @param ...,.funs named functions, used to add custom converters from standard #' R functions to sql functions. Specify individually in `...`, or #' provide a list of `.funs` #' @param .parent the sql variant that this variant should inherit from. #' Defaults to `base_agg` which provides a standard set of #' mappings for the most common operators and functions. #' @param f the name of the sql function as a string #' @param n for `sql_infix()`, an optional number of arguments to expect. #' Will signal error if not correct. 
#' @seealso [sql()] for an example of a more customised sql #' conversion function. #' @export #' @examples #' # An example of adding some mappings for the statistical functions that #' # postgresql provides: http://bit.ly/K5EdTn #' #' postgres_agg <- sql_translator(.parent = base_agg, #' cor = sql_aggregate_2("corr"), #' cov = sql_aggregate_2("covar_samp"), #' sd = sql_aggregate("stddev_samp"), #' var = sql_aggregate("var_samp") #' ) #' postgres_var <- sql_variant( #' base_scalar, #' postgres_agg, #' base_no_win #' ) #' #' # Next we have to simulate a connection that uses this variant #' con <- structure( #' list(), #' class = c("TestCon", "DBITestConnection", "DBIConnection") #' ) #' sql_translate_env.TestCon <- function(x) postgres_var #' #' translate_sql(cor(x, y), con = con, window = FALSE) #' translate_sql(sd(income / years), con = con, window = FALSE) #' #' # Any functions not explicitly listed in the converter will be translated #' # to sql as is, so you don't need to convert all functions. #' translate_sql(regr_intercept(y, x), con = con) sql_variant <- function(scalar = sql_translator(), aggregate = sql_translator(), window = sql_translator()) { stopifnot(is.environment(scalar)) stopifnot(is.environment(aggregate)) stopifnot(is.environment(window)) # Need to check that every function in aggregate also occurs in window missing <- setdiff(ls(aggregate), ls(window)) if (length(missing) > 0) { warn(paste0( "Translator is missing window variants of the following aggregate functions:\n", paste0("* ", missing, "\n", collapse = "") )) } structure( list(scalar = scalar, aggregate = aggregate, window = window), class = "sql_variant" ) } is.sql_variant <- function(x) inherits(x, "sql_variant") #' @export print.sql_variant <- function(x, ...) { wrap_ls <- function(x, ...) { vars <- sort(ls(envir = x)) wrapped <- strwrap(paste0(vars, collapse = ", "), ...) if (identical(wrapped, "")) return() paste0(wrapped, "\n", collapse = "") } cat("\n") cat(wrap_ls( x$scalar, prefix = "scalar: " )) cat(wrap_ls( x$aggregate, prefix = "aggregate: " )) cat(wrap_ls( x$window, prefix = "window: " )) } #' @export names.sql_variant <- function(x) { c(ls(envir = x$scalar), ls(envir = x$aggregate), ls(envir = x$window)) } #' @export #' @rdname sql_variant sql_translator <- function(..., .funs = list(), .parent = new.env(parent = emptyenv())) { funs <- c(list(...), .funs) if (length(funs) == 0) return(.parent) list2env(funs, copy_env(.parent)) } copy_env <- function(from, to = NULL, parent = parent.env(from)) { list2env(as.list(from), envir = to, parent = parent) } #' @rdname sql_variant #' @export sql_infix <- function(f) { assert_that(is_string(f)) f <- toupper(f) function(x, y) { build_sql(x, " ", sql(f), " ", y) } } #' @rdname sql_variant #' @export sql_prefix <- function(f, n = NULL) { assert_that(is_string(f)) f <- toupper(f) function(...) { args <- list(...) if (!is.null(n) && length(args) != n) { stop( "Invalid number of args to SQL ", f, ". Expecting ", n, call. = FALSE ) } if (any(names2(args) != "")) { warning("Named arguments ignored for SQL ", f, call. 
= FALSE)
    }
    build_sql(sql(f), args)
  }
}

#' @rdname sql_variant
#' @export
sql_aggregate <- function(f) {
  assert_that(is_string(f))

  f <- toupper(f)
  function(x, na.rm = FALSE) {
    check_na_rm(f, na.rm)
    build_sql(sql(f), list(x))
  }
}

#' @rdname sql_variant
#' @export
sql_aggregate_2 <- function(f) {
  assert_that(is_string(f))

  f <- toupper(f)
  function(x, y) {
    build_sql(sql(f), list(x, y))
  }
}

check_na_rm <- function(f, na.rm) {
  if (identical(na.rm, TRUE)) {
    return()
  }

  warning(
    "Missing values are always removed in SQL.\n",
    "Use `", f, "(x, na.rm = TRUE)` to silence this warning",
    call. = FALSE
  )
}

#' @rdname sql_variant
#' @export
sql_not_supported <- function(f) {
  assert_that(is_string(f))

  f <- toupper(f)
  function(...) {
    stop(f, " is not available in this SQL variant", call. = FALSE)
  }
}

#' @rdname sql_variant
#' @export
sql_cast <- function(type) {
  type <- sql(type)
  function(x) {
    sql_expr(cast(UQ(x) %as% UQ(type)))
  }
}

#' @rdname sql_variant
#' @export
sql_log <- function() {
  function(x, base = exp(1)){
    if (isTRUE(all.equal(base, exp(1)))) {
      sql_expr(ln(!!x))
    } else {
      sql_expr(log(!!x) / log(!!base))
    }
  }
}

#' @rdname sql_variant
#' @export
sql_cot <- function(){
  function(x){
    sql_expr(1L / tan(!!x))
  }
}

globalVariables(c("%as%", "cast", "ln"))
dbplyr/R/translate-sql-window.r0000644000176200001440000001716013221463067016242 0ustar liggesusers#' Generate SQL expression for window functions
#'
#' `win_over()` makes it easy to generate the window function specification.
#' `win_absent()`, `win_rank()`, `win_aggregate()`, and `win_cumulative()`
#' provide helpers for constructing common types of window functions.
#' `win_current_group()` and `win_current_order()` allow you to access
#' the grouping and order context set up by [group_by()] and [arrange()].
#'
#' @param expr The window expression
#' @param partition Variables to partition over
#' @param order Variables to order by
#' @param frame A numeric vector of length two defining the frame.
#' @param f The name of an sql function as a string
#' @export
#' @keywords internal
#' @examples
#' win_over(sql("avg(x)"))
#' win_over(sql("avg(x)"), "y")
#' win_over(sql("avg(x)"), order = "y")
#' win_over(sql("avg(x)"), order = c("x", "y"))
#' win_over(sql("avg(x)"), frame = c(-Inf, 0), order = "y")
win_over <- function(expr, partition = NULL, order = NULL, frame = NULL) {
  if (length(partition) > 0) {
    partition <- as.sql(partition)

    partition <- build_sql(
      "PARTITION BY ",
      sql_vector(
        escape(partition, con = sql_current_con()),
        collapse = ", ",
        parens = FALSE
      )
    )
  }

  if (length(order) > 0) {
    order <- as.sql(order)

    order <- build_sql(
      "ORDER BY ",
      sql_vector(
        escape(order, con = sql_current_con()),
        collapse = ", ",
        parens = FALSE
      )
    )
  }

  if (length(frame) > 0) {
    if (length(order) == 0) {
      warning(
        "Windowed expression '", expr, "' does not have explicit order.\n",
        "Please use arrange() or window_order() to make it deterministic.",
        call. = FALSE
      )
    }

    if (is.numeric(frame)) frame <- rows(frame[1], frame[2])
    frame <- build_sql("ROWS ", frame)
  }

  over <- sql_vector(compact(list(partition, order, frame)), parens = TRUE)
  sql <- build_sql(expr, " OVER ", over)

  sql
}

rows <- function(from = -Inf, to = 0) {
  if (from >= to) stop("from must be less than to", call.
= FALSE) dir <- function(x) if (x < 0) "PRECEDING" else "FOLLOWING" val <- function(x) if (is.finite(x)) as.integer(abs(x)) else "UNBOUNDED" bound <- function(x) { if (x == 0) return("CURRENT ROW") paste(val(x), dir(x)) } if (to == 0) { sql(bound(from)) } else { sql(paste0("BETWEEN ", bound(from), " AND ", bound(to))) } } #' @rdname win_over #' @export win_rank <- function(f) { force(f) function(order = NULL) { win_over( build_sql(sql(f), list()), partition = win_current_group(), order = order %||% win_current_order(), frame = win_current_frame() ) } } #' @rdname win_over #' @export win_aggregate <- function(f) { force(f) function(x, na.rm = FALSE) { check_na_rm(f, na.rm) frame <- win_current_frame() win_over( build_sql(sql(f), list(x)), partition = win_current_group(), order = if (!is.null(frame)) win_current_order(), frame = frame ) } } #' @rdname win_over #' @export win_aggregate_2 <- function(f) { f <- toupper(f) function(x, y) { frame <- win_current_frame() win_over( build_sql(sql(f), list(x, y)), partition = win_current_group(), order = if (!is.null(frame)) win_current_order(), frame = frame ) } } #' @rdname win_over #' @usage NULL #' @export win_recycled <- win_aggregate #' @rdname win_over #' @export win_cumulative <- function(f) { force(f) function(x) { win_over( build_sql(sql(f), list(x)), partition = win_current_group(), order = win_current_order(), frame = c(-Inf, 0) ) } } #' @rdname win_over #' @export win_absent <- function(f) { force(f) function(...) { stop( "Window function `", f, "()` is not supported by this database", call. = FALSE ) } } # API to set default partitioning etc ------------------------------------- # Use a global variable to communicate state of partitioning between # tbl and sql translator. This isn't the most amazing design, but it keeps # things loosely coupled and is easy to understand. 
sql_context <- new.env(parent = emptyenv()) sql_context$group_by <- NULL sql_context$order_by <- NULL sql_context$con <- NULL # Used to carry additional information needed for special cases sql_context$context <- "" set_current_con <- function(con) { old <- sql_context$con sql_context$con <- con invisible(old) } set_win_current_group <- function(vars) { stopifnot(is.null(vars) || is.character(vars)) old <- sql_context$group_by sql_context$group_by <- vars invisible(old) } set_win_current_order <- function(vars) { stopifnot(is.null(vars) || is.character(vars)) old <- sql_context$order_by sql_context$order_by <- vars invisible(old) } set_win_current_frame <- function(frame) { stopifnot(is.null(frame) || is.numeric(frame)) old <- sql_context$frame sql_context$frame <- frame invisible(old) } #' @export #' @rdname win_over win_current_group <- function() sql_context$group_by #' @export #' @rdname win_over win_current_order <- function() sql_context$order_by #' @export #' @rdname win_over win_current_frame <- function() sql_context$frame # Not exported, because you shouldn't need it sql_current_con <- function() sql_context$con # Functions to manage information for special cases set_current_context <- function(context) { old <- sql_context$context sql_context$context <- context invisible(old) } sql_current_context <- function() sql_context$context sql_current_select <- function() sql_context$context %in% c("SELECT", "ORDER") # Where translation ------------------------------------------------------- uses_window_fun <- function(x, con) { if (is.null(x)) return(FALSE) if (is.list(x)) { calls <- unlist(lapply(x, all_calls)) } else { calls <- all_calls(x) } win_f <- ls(envir = sql_translate_env(con)$window) any(calls %in% win_f) } common_window_funs <- function() { ls(sql_translate_env(NULL)$window) } #' @noRd #' @examples #' translate_window_where(quote(1)) #' translate_window_where(quote(x)) #' translate_window_where(quote(x == 1)) #' translate_window_where(quote(x == 1 && y == 2)) #' translate_window_where(quote(n() > 10)) #' translate_window_where(quote(rank() > cumsum(AB))) translate_window_where <- function(expr, window_funs = common_window_funs()) { switch_type(expr, formula = translate_window_where(f_rhs(expr), window_funs), logical = , integer = , double = , complex = , character = , string = , symbol = window_where(expr, list()), language = { if (lang_name(expr) %in% window_funs) { name <- unique_name() window_where(sym(name), set_names(list(expr), name)) } else { args <- map(expr[-1], translate_window_where, window_funs = window_funs) expr <- lang(node_car(expr), splice(map(args, "[[", "expr"))) window_where( expr = expr, comp = unlist(map(args, "[[", "comp"), recursive = FALSE) ) } }, abort(glue("Unknown type: ", typeof(expr))) ) } #' @noRd #' @examples #' translate_window_where_all(list(quote(x == 1), quote(n() > 2))) #' translate_window_where_all(list(quote(cumsum(x) == 10), quote(n() > 2))) translate_window_where_all <- function(x, window_funs = common_window_funs()) { out <- lapply(x, translate_window_where, window_funs = window_funs) list( expr = unlist(lapply(out, "[[", "expr"), recursive = FALSE), comp = unlist(lapply(out, "[[", "comp"), recursive = FALSE) ) } window_where <- function(expr, comp) { stopifnot(is.call(expr) || is.name(expr) || is.atomic(expr)) stopifnot(is.list(comp)) list( expr = expr, comp = comp ) } dbplyr/R/src_dbi.R0000644000176200001440000001071613221456346013512 0ustar liggesusers#' dplyr backend for any DBI-compatible database #' #' @description #' 
`src_dbi()` is a general dplyr backend that connects to any
#' DBI driver. `src_memdb()` connects to a temporary in-memory SQLite
#' database that's useful for testing and experimenting.
#'
#' You can generate a `tbl()` directly from the DBI connection, or
#' go via `src_dbi()`.
#'
#' @details
#' All data manipulation verbs on SQL tbls are lazy: they will not actually
#' run the query or retrieve the data unless you ask for it: they all return
#' a new `tbl_dbi` object. Use [compute()] to run the query and save the
#' results in a temporary table in the database, or use [collect()] to retrieve
#' the results to R. You can see the query with [show_query()].
#'
#' For best performance, the database should have an index on the variables
#' that you are grouping by. Use [explain()] to check that the database is using
#' the indexes that you expect.
#'
#' There is one exception: [do()] is not lazy since it must pull the data
#' into R.
#'
#' @param con An object that inherits from [DBI::DBIConnection-class],
#'   typically generated by [DBI::dbConnect]
#' @param auto_disconnect Should the connection be automatically closed when
#'   the src is deleted? Set to `TRUE` if you initialize the connection
#'   in the call to `src_dbi()`. Pass `NA` to auto-disconnect but print a message
#'   when this happens.
#' @return An S3 object with class `src_dbi`, `src_sql`, `src`.
#' @export
#' @examples
#' # Basic connection using DBI -------------------------------------------
#' library(dplyr)
#'
#' con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
#' src <- src_dbi(con, auto_disconnect = TRUE)
#'
#' # Add some data
#' copy_to(src, mtcars)
#' src
#' DBI::dbListTables(con)
#'
#' # To retrieve a single table from a source, use `tbl()`
#' src %>% tbl("mtcars")
#'
#' # You can also pass raw SQL if you want a more sophisticated query
#' src %>% tbl(sql("SELECT * FROM mtcars WHERE cyl == 8"))
#'
#' # Alternatively, you can use the `src_sqlite()` helper
#' src2 <- src_sqlite(":memory:", create = TRUE)
#'
#' # If you just want a temporary in-memory database, use src_memdb()
#' src3 <- src_memdb()
#'
#' # To show off the full features of dplyr's database integration,
#' # we'll use the Lahman database. lahman_sqlite() takes care of
#' # creating the database.
#'
#' if (has_lahman("sqlite")) {
#' lahman_p <- lahman_sqlite()
#' batting <- lahman_p %>% tbl("Batting")
#' batting
#'
#' # Basic data manipulation verbs work in the same way as with a tibble
#' batting %>% filter(yearID > 2005, G > 130)
#' batting %>% select(playerID:lgID)
#' batting %>% arrange(playerID, desc(yearID))
#' batting %>% summarise(G = mean(G), n = n())
#'
#' # There are a few exceptions. For example, databases give integer results
#' # when dividing one integer by another. Multiply by 1.0 to fix the problem
#' batting %>%
#'   select(playerID:lgID, AB, R, G) %>%
#'   mutate(
#'     R_per_game1 = R / G,
#'     R_per_game2 = R * 1.0 / G
#'   )
#'
#' # All operations are lazy: they don't do anything until you request the
#' # data, either by `print()`ing it (which shows the first ten rows),
#' # or by `collect()`ing the results locally.
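#' # (Illustrative: the first timing below returns almost immediately
#' # because only a lazy query object is created; the second pays the
#' # cost of actually running the query and downloading the rows.)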
#' system.time(recent <- filter(batting, yearID > 2010))
#' system.time(collect(recent))
#'
#' # You can see the query that dplyr creates with show_query()
#' batting %>%
#'   filter(G > 0) %>%
#'   group_by(playerID) %>%
#'   summarise(n = n()) %>%
#'   show_query()
#' }
src_dbi <- function(con, auto_disconnect = FALSE) {
  # stopifnot(is(con, "DBIConnection"))

  if (is_false(auto_disconnect)) {
    disco <- NULL
  } else {
    disco <- db_disconnector(con, quiet = is_true(auto_disconnect))
  }

  structure(
    list(
      con = con,
      disco = disco
    ),
    class = c("src_dbi", "src_sql", "src")
  )
}
setOldClass(c("src_dbi", "src_sql", "src"))

# Methods -----------------------------------------------------------------

#' @export
#' @aliases tbl_dbi
#' @rdname src_dbi
#' @param src Either a `src_dbi` or `DBIConnection`
#' @param from Either a string (giving a table name) or literal [sql()].
#' @param ... Needed for compatibility with generic; currently ignored.
tbl.src_dbi <- function(src, from, ...) {
  tbl_sql("dbi", src = src, from = from)
}

# Creates an environment that disconnects the database when it's GC'd
db_disconnector <- function(con, quiet = FALSE) {
  reg.finalizer(environment(), function(...) {
    if (!quiet) {
      message("Auto-disconnecting ", class(con)[[1]])
    }
    dbDisconnect(con)
  })
  environment()
}
dbplyr/R/tbl-sql.r0000644000176200001440000003001113177052675013520 0ustar liggesusers#' Create an SQL tbl (abstract)
#'
#' Generally, you should no longer need to provide a custom `tbl()`
#' method; you can rely on the default `tbl.DBIConnection` method.
#'
#' @keywords internal
#' @export
#' @param subclass name of subclass
#' @param ... needed for agreement with generic. Not otherwise used.
#' @param vars If known, the names of the variables in the tbl. This is
#'   relatively expensive to determine automatically, so is cached throughout
#'   dplyr. However, you should usually be able to leave this blank and it
#'   will be determined from the context.
tbl_sql <- function(subclass, src, from, ..., vars = NULL) {
  # If not literal sql, must be a table identifier
  from <- as.sql(from)

  vars <- db_query_fields(src$con, from)
  ops <- op_base_remote(from, vars)

  make_tbl(c(subclass, "sql", "lazy"), src = src, ops = ops)
}

#' @export
same_src.tbl_sql <- function(x, y) {
  if (!inherits(y, "tbl_sql")) return(FALSE)
  same_src(x$src, y$src)
}

# Grouping methods -------------------------------------------------------------

#' @export
group_size.tbl_sql <- function(x) {
  df <- x %>%
    summarise(n = n()) %>%
    collect()
  df$n
}

#' @export
n_groups.tbl_sql <- function(x) {
  if (length(groups(x)) == 0) return(1L)

  df <- x %>%
    summarise() %>%
    ungroup() %>%
    summarise(n = n()) %>%
    collect()
  df$n
}

# Standard data frame methods --------------------------------------------------

#' @export
as.data.frame.tbl_sql <- function(x, row.names = NULL, optional = NULL,
                                  ..., n = Inf) {
  as.data.frame(collect(x, n = n))
}

#' @export
tbl_sum.tbl_sql <- function(x) {
  grps <- op_grps(x$ops)
  sort <- op_sort(x$ops)
  c(
    "Source" = tbl_desc(x),
    "Database" = db_desc(x$src$con),
    if (length(grps) > 0) c("Groups" = commas(grps)),
    if (length(sort) > 0) c("Ordered by" = commas(deparse_all(sort)))
  )
}

#' @export
pull.tbl_sql <- function(.data, var = -1) {
  expr <- enquo(var)
  var <- dplyr:::find_var(expr, tbl_vars(.data))

  .data <- select(.data, !! sym(var))
  .data <- collect(.data)
  .data[[1]]
}

#' @export
dimnames.tbl_sql <- function(x) {
  list(NULL, op_vars(x$ops))
}

#' @export
dim.tbl_sql <- function(x) {
  c(NA, length(op_vars(x$ops)))
}

#' @export
tail.tbl_sql <- function(x, n = 6L, ...)
{ stop("tail() is not supported by sql sources", call. = FALSE) } # Joins ------------------------------------------------------------------------ #' Join sql tbls. #' #' See [join] for a description of the general purpose of the #' functions. #' #' @section Implementation notes: #' #' Semi-joins are implemented using `WHERE EXISTS`, and anti-joins with #' `WHERE NOT EXISTS`. Support for semi-joins is somewhat partial: you #' can only create semi joins where the `x` and `y` columns are #' compared with `=` not with more general operators. #' #' @inheritParams dplyr::join #' @param copy If `x` and `y` are not from the same data source, #' and `copy` is `TRUE`, then `y` will be copied into a #' temporary table in same database as `x`. `*_join()` will automatically #' run `ANALYZE` on the created table in the hope that this will make #' you queries as efficient as possible by giving more data to the query #' planner. #' #' This allows you to join tables across srcs, but it's potentially expensive #' operation so you must opt into it. #' @param auto_index if `copy` is `TRUE`, automatically create #' indices for the variables in `by`. This may speed up the join if #' there are matching indexes in `x`. #' @examples #' \dontrun{ #' library(dplyr) #' if (has_lahman("sqlite")) { #' #' # Left joins ---------------------------------------------------------------- #' lahman_s <- lahman_sqlite() #' batting <- tbl(lahman_s, "Batting") #' team_info <- select(tbl(lahman_s, "Teams"), yearID, lgID, teamID, G, R:H) #' #' # Combine player and whole team statistics #' first_stint <- select(filter(batting, stint == 1), playerID:H) #' both <- left_join(first_stint, team_info, type = "inner", by = c("yearID", "teamID", "lgID")) #' head(both) #' explain(both) #' #' # Join with a local data frame #' grid <- expand.grid( #' teamID = c("WAS", "ATL", "PHI", "NYA"), #' yearID = 2010:2012) #' top4a <- left_join(batting, grid, copy = TRUE) #' explain(top4a) #' #' # Indices don't really help here because there's no matching index on #' # batting #' top4b <- left_join(batting, grid, copy = TRUE, auto_index = TRUE) #' explain(top4b) #' #' # Semi-joins ---------------------------------------------------------------- #' #' people <- tbl(lahman_s, "Master") #' #' # All people in half of fame #' hof <- tbl(lahman_s, "HallOfFame") #' semi_join(people, hof) #' #' # All people not in the hall of fame #' anti_join(people, hof) #' #' # Find all managers #' manager <- tbl(lahman_s, "Managers") #' semi_join(people, manager) #' #' # Find all managers in hall of fame #' famous_manager <- semi_join(semi_join(people, manager), hof) #' famous_manager #' explain(famous_manager) #' #' # Anti-joins ---------------------------------------------------------------- #' #' # batters without person covariates #' anti_join(batting, people) #' } #' } #' @name join.tbl_sql NULL #' @rdname join.tbl_sql #' @export inner_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ...) { add_op_join( x, y, "inner", by = by, copy = copy, suffix = suffix, auto_index = auto_index, ... ) } #' @rdname join.tbl_sql #' @export left_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ...) { add_op_join( x, y, "left", by = by, copy = copy, suffix = suffix, auto_index = auto_index, ... ) } #' @rdname join.tbl_sql #' @export right_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ...) 
{
  add_op_join(
    x, y,
    "right",
    by = by,
    copy = copy,
    suffix = suffix,
    auto_index = auto_index,
    ...
  )
}

#' @rdname join.tbl_sql
#' @export
full_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE,
                               suffix = c(".x", ".y"),
                               auto_index = FALSE, ...) {
  add_op_join(
    x, y,
    "full",
    by = by,
    copy = copy,
    suffix = suffix,
    auto_index = auto_index,
    ...
  )
}

#' @rdname join.tbl_sql
#' @export
semi_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE,
                               auto_index = FALSE, ...) {
  add_op_semi_join(
    x, y,
    anti = FALSE,
    by = by,
    copy = copy,
    auto_index = auto_index,
    ...
  )
}

#' @rdname join.tbl_sql
#' @export
anti_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE,
                               auto_index = FALSE, ...) {
  add_op_semi_join(
    x, y,
    anti = TRUE,
    by = by,
    copy = copy,
    auto_index = auto_index,
    ...
  )
}

# Set operations ---------------------------------------------------------------

# registered onLoad
intersect.tbl_lazy <- function(x, y, copy = FALSE, ...) {
  add_op_set_op(x, y, "INTERSECT", copy = copy, ...)
}
# registered onLoad
union.tbl_lazy <- function(x, y, copy = FALSE, ...) {
  add_op_set_op(x, y, "UNION", copy = copy, ...)
}

#' @export
union_all.tbl_lazy <- function(x, y, copy = FALSE, ...) {
  add_op_set_op(x, y, "UNION ALL", copy = copy, ...)
}

# registered onLoad
setdiff.tbl_lazy <- function(x, y, copy = FALSE, ...) {
  add_op_set_op(x, y, "EXCEPT", copy = copy, ...)
}

# Copying ----------------------------------------------------------------------

#' @export
auto_copy.tbl_sql <- function(x, y, copy = FALSE, ...) {
  copy_to(x$src, as.data.frame(y), random_table_name(), ...)
}

#' Copy a local data frame to a DBI backend.
#'
#' This [copy_to()] method works for all DBI sources. It is useful for
#' copying small amounts of data to a database for examples, experiments,
#' and joins. By default, it creates temporary tables which are typically
#' only visible to the current connection to the database.
#'
#' @export
#' @param df A local data frame, a `tbl_sql` from the same source, or a `tbl_sql`
#'   from another source. If from another source, all data must transition
#'   through R in one pass, so it is only suitable for transferring small
#'   amounts of data.
#' @param types a character vector giving variable types to use for the columns.
#'   See \url{http://www.sqlite.org/datatype3.html} for available types.
#' @param temporary if `TRUE`, will create a temporary table that is
#'   local to this connection and will be automatically deleted when the
#'   connection expires
#' @param unique_indexes a list of character vectors. Each element of the list
#'   will create a new unique index over the specified column(s). Duplicate rows
#'   will result in failure.
#' @param indexes a list of character vectors. Each element of the list
#'   will create a new index.
#' @param analyze if `TRUE` (the default), will automatically ANALYZE the
#'   new table so that the query optimiser has useful information.
#' @inheritParams dplyr::copy_to
#' @return A [tbl()] object (invisibly).
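#'   The returned tbl is only a reference: the copied data lives in a table
#'   in the destination database, not in R.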
#' @examples #' library(dplyr) #' set.seed(1014) #' #' mtcars$model <- rownames(mtcars) #' mtcars2 <- src_memdb() %>% #' copy_to(mtcars, indexes = list("model"), overwrite = TRUE) #' mtcars2 %>% filter(model == "Hornet 4 Drive") #' #' cyl8 <- mtcars2 %>% filter(cyl == 8) #' cyl8_cached <- copy_to(src_memdb(), cyl8) #' #' # copy_to is called automatically if you set copy = TRUE #' # in the join functions #' df <- tibble(cyl = c(6, 8)) #' mtcars2 %>% semi_join(df, copy = TRUE) copy_to.src_sql <- function(dest, df, name = deparse(substitute(df)), overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) { assert_that(is_string(name), is.flag(temporary)) if (!is.data.frame(df) && !inherits(df, "tbl_sql")) { stop("`df` must be a local dataframe or a remote tbl_sql", call. = FALSE) } if (inherits(df, "tbl_sql") && same_src(df$src, dest)) { out <- compute(df, name = name, temporary = temporary, unique_indexes = unique_indexes, indexes = indexes, analyze = analyze, ... ) } else { df <- collect(df) class(df) <- "data.frame" # avoid S4 dispatch problem in dbSendPreparedQuery name <- db_copy_to(dest$con, name, df, overwrite = overwrite, types = types, temporary = temporary, unique_indexes = unique_indexes, indexes = indexes, analyze = analyze, ... ) out <- tbl(dest, name) } invisible(out) } #' @export collapse.tbl_sql <- function(x, vars = NULL, ...) { sql <- db_sql_render(x$src$con, x) tbl(x$src, sql) %>% group_by(!!! syms(op_grps(x))) %>% add_op_order(op_sort(x)) } #' @export compute.tbl_sql <- function(x, name = random_table_name(), temporary = TRUE, unique_indexes = list(), indexes = list(), analyze = TRUE, ...) { vars <- op_vars(x) assert_that(all(unlist(indexes) %in% vars)) assert_that(all(unlist(unique_indexes) %in% vars)) x_aliased <- select(x, !!! syms(vars)) # avoids problems with SQLite quoting (#1754) sql <- db_sql_render(x$src$con, x_aliased$ops) name <- db_compute(x$src$con, name, sql, temporary = temporary, unique_indexes = unique_indexes, indexes = indexes, analyze = analyze, ... ) tbl(x$src, name) %>% group_by(!!! syms(op_grps(x))) %>% add_op_order(op_sort(x)) } #' @export collect.tbl_sql <- function(x, ..., n = Inf, warn_incomplete = TRUE) { assert_that(length(n) == 1, n > 0L) if (n == Inf) { n <- -1 } else { # Gives the query planner information that it might be able to take # advantage of x <- head(x, n) } sql <- db_sql_render(x$src$con, x) out <- db_collect(x$src$con, sql, n = n, warn_incomplete = warn_incomplete) grouped_df(out, intersect(op_grps(x), names(out))) } dbplyr/R/zzz.R0000644000176200001440000000223213221465704012732 0ustar liggesusers.onLoad <- function(...) { register_s3_method("dplyr", "union", "tbl_lazy") register_s3_method("dplyr", "intersect", "tbl_lazy") register_s3_method("dplyr", "setdiff", "tbl_lazy") register_s3_method("dplyr", "filter", "tbl_lazy") # These are also currently defined in dplyr, and we want to avoid a warning # about double duplication. 
Conditional can be removed after update to # dplyr if (!methods::isClass("sql")) { setOldClass(c("sql", "character"), sql()) } } register_s3_method <- function(pkg, generic, class, fun = NULL) { stopifnot(is.character(pkg), length(pkg) == 1) stopifnot(is.character(generic), length(generic) == 1) stopifnot(is.character(class), length(class) == 1) if (is.null(fun)) { fun <- get(paste0(generic, ".", class), envir = parent.frame()) } else { stopifnot(is.function(fun)) } if (pkg %in% loadedNamespaces()) { registerS3method(generic, class, fun, envir = asNamespace(pkg)) } # Always register hook in case package is later unloaded & reloaded setHook( packageEvent(pkg, "onLoad"), function(...) { registerS3method(generic, class, fun, envir = asNamespace(pkg)) } ) } dbplyr/R/db-odbc-hive.R0000644000176200001440000000140013174465254014323 0ustar liggesusers#' @export sql_translate_env.Hive <- function(con) { sql_variant( sql_translator(.parent = base_odbc_scalar, var = sql_prefix("VARIANCE"), cot = function(x){ sql_expr(1 / tan(!!x)) }, str_replace_all = function(string, pattern, replacement) { sql_expr(regexp_replace(!!string, !!pattern, !!replacement)) } ), base_odbc_agg, base_odbc_win ) } #' @export db_analyze.Hive <- function(con, table, ...) { # Using ANALYZE TABLE instead of ANALYZE as recommended in this article: https://cwiki.apache.org/confluence/display/Hive/StatsDev sql <- build_sql( "ANALYZE TABLE ", as.sql(table), " COMPUTE STATISTICS" , con = con) DBI::dbExecute(con, sql) } globalVariables("regexp_replace") dbplyr/R/db-odbc-access.R0000644000176200001440000001433113176660402014632 0ustar liggesusers# sql_ generics -------------------------------------------- #' @export sql_select.ACCESS <- function(con, select, from, where = NULL, group_by = NULL, having = NULL, order_by = NULL, limit = NULL, distinct = FALSE, ...) { out <- vector("list", 7) names(out) <- c("select", "from", "where", "group_by","having", "order_by", "limit") assert_that(is.character(select), length(select) > 0L) out$select <- build_sql( "SELECT ", if (distinct) sql("DISTINCT "), # Access uses the TOP statement instead of LIMIT which is what SQL92 uses # TOP is expected after DISTINCT and not at the end of the query # e.g: SELECT TOP 100 * FROM my_table if (!is.null(limit) && !identical(limit, Inf)) { assert_that(is.numeric(limit), length(limit) == 1L, limit > 0) build_sql("TOP ", as.integer(limit), " ")}, escape(select, collapse = ", ", con = con) ) out$from <- sql_clause_from(from, con) out$where <- sql_clause_where(where, con) out$group_by <- sql_clause_group_by(group_by, con) out$having <- sql_clause_having(having, con) out$order_by <- sql_clause_order_by(order_by, con) escape(unname(compact(out)), collapse = "\n", parens = FALSE, con = con) } #' @export sql_translate_env.ACCESS <- function(con) { sql_variant( sql_translator(.parent = base_odbc_scalar, # Much of this translation comes from: https://www.techonthenet.com/access/functions/ # Conversion as.numeric = sql_prefix("CDBL"), as.double = sql_prefix("CDBL"), # as.integer() always rounds down. CInt does not, but Int does as.integer = sql_prefix("INT"), as.logical = sql_prefix("CBOOL"), as.character = sql_prefix("CSTR"), as.Date = sql_prefix("CDATE"), # Math exp = sql_prefix("EXP"), log = sql_prefix("LOG"), log10 = function(x) { sql_expr(log(!!x) / log(10L)) }, sqrt = sql_prefix("SQR"), sign = sql_prefix("SGN"), floor = sql_prefix("INT"), # Nearly add 1, then drop off the decimal. 
This results in the equivalent to ceiling() ceiling = function(x) { sql_expr(int(UQ(x) + .9999999999)) }, ceil = function(x) { sql_expr(int(UQ(x) + .9999999999)) }, # There is no POWER function in Access. It uses ^ instead `^` = function(x, y) { sql_expr(UQ(x) ^ UQ(y)) }, # Strings nchar = sql_prefix("LEN"), tolower = sql_prefix("LCASE"), toupper = sql_prefix("UCASE"), # Pull `left` chars from the left, then `right` chars from the right to replicate substr substr = function(x, start, stop){ right <- stop - start + 1 left <- stop sql_expr(right(left(!!x, !!left), !!right)) }, trimws = sql_prefix("TRIM"), # No support for CONCAT in Access paste = sql_paste_infix(" ", "&", function(x) sql_expr(CStr(!!x))), paste0 = sql_paste_infix("", "&", function(x) sql_expr(CStr(!!x))), # Logic # Access always returns -1 for True and 0 for False is.null = sql_prefix("ISNULL"), is.na = sql_prefix("ISNULL"), # IIF() is like ifelse() ifelse = function(test, yes, no){ sql_expr(iif(!!test, !!yes, !!no)) }, # Coalesce doesn't exist in Access. # NZ() only works while in Access, not with the Access driver # IIF(ISNULL()) is the best way to construct this coalesce = function(x, y) { sql_expr(iif(isnull(!!x), !!y, !!x)) }, # pmin/pmax for 2 columns pmin = function(x, y) { sql_expr(iif(UQ(x) <= UQ(y), !!x, !!y)) }, pmax = function(x, y) { sql_expr(iif(UQ(x) <= UQ(y), !!y, !!x)) }, # Dates Sys.Date = sql_prefix("DATE") ), sql_translator(.parent = base_odbc_agg, mean = sql_prefix("AVG"), sd = sql_prefix("STDEV"), var = sql_prefix("VAR"), max = sql_prefix("MAX"), min = sql_prefix("MIN"), # Access does not have functions for cor and cov cor = sql_not_supported("cor()"), cov = sql_not_supported("cov()"), # Count # Count(Distinct *) does not work in Access # This would work but we don't know the table name when translating: # SELECT Count(*) FROM (SELECT DISTINCT * FROM table_name) AS T n_distinct = sql_not_supported("n_distinct") ), # Window functions not supported in Access sql_translator(.parent = base_no_win) )} # db_ generics ----------------------------------- #' @export db_analyze.ACCESS <- function(con, table, ...) { # Do nothing. Access doesn't support an analyze / update statistics function } # Util ------------------------------------------- sql_escape_logical.ACCESS <- function(con, x) { # Access uses a convention of -1 as True and 0 as False y <- ifelse(x, -1, 0) y[is.na(x)] <- "NULL" y } globalVariables(c("CStr", "iif", "isnull", "text")) dbplyr/R/db-odbc-oracle.R0000644000176200001440000000460113221453315014627 0ustar liggesusers#' @export sql_select.Oracle<- function(con, select, from, where = NULL, group_by = NULL, having = NULL, order_by = NULL, limit = NULL, distinct = FALSE, ...) 
{
  out <- vector("list", 7)
  names(out) <- c("select", "from", "where", "group_by", "having", "order_by", "limit")

  out$select   <- sql_clause_select(select, con, distinct)
  out$from     <- sql_clause_from(from, con)
  out$where    <- sql_clause_where(where, con)
  out$group_by <- sql_clause_group_by(group_by, con)
  out$having   <- sql_clause_having(having, con)
  out$order_by <- sql_clause_order_by(order_by, con)

  # Process the limit via ROWNUM in a WHERE clause; this method is
  # backwards- and forwards-compatible: https://oracle-base.com/articles/misc/top-n-queries
  if (!is.null(limit) && !identical(limit, Inf)) {
    out <- escape(unname(compact(out)), collapse = "\n", parens = FALSE, con = con)
    assertthat::assert_that(is.numeric(limit), length(limit) == 1L, limit > 0)
    out <- build_sql(
      "SELECT * FROM ", sql_subquery(con, out), " WHERE ROWNUM <= ", limit,
      con = con)
  } else {
    escape(unname(compact(out)), collapse = "\n", parens = FALSE, con = con)
  }
}

#' @export
sql_translate_env.Oracle <- function(con) {
  sql_variant(
    sql_translator(.parent = base_odbc_scalar,
      # Data type conversions are mostly based on this article
      # https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements001.htm
      as.character = sql_cast("VARCHAR(255)"),
      as.numeric = sql_cast("NUMBER"),
      as.double = sql_cast("NUMBER")
    ),
    base_odbc_agg,
    base_odbc_win
  )
}

#' @export
db_analyze.Oracle <- function(con, table, ...) {
  # https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_4005.htm
  sql <- dbplyr::build_sql(
    "ANALYZE TABLE ", as.sql(table), " COMPUTE STATISTICS",
    con = con)
  DBI::dbExecute(con, sql)
}

#' @export
sql_subquery.Oracle <- function(con, from, name = unique_name(), ...) {
  # Table aliases in Oracle should not have an "AS": https://www.techonthenet.com/oracle/alias.php
  if (is.ident(from)) {
    build_sql("(", from, ") ", if (!is.null(name)) ident(name), con = con)
  } else {
    build_sql("(", from, ") ", ident(name %||% random_table_name()), con = con)
  }
}
dbplyr/R/db-odbc-mssql.R0000644000176200001440000001743013221275427014533 0ustar liggesusers#' @export
`sql_select.Microsoft SQL Server` <- function(con, select, from, where = NULL,
                                              group_by = NULL, having = NULL,
                                              order_by = NULL, limit = NULL,
                                              distinct = FALSE, ...)
{
  out <- vector("list", 7)
  names(out) <- c("select", "from", "where", "group_by", "having", "order_by", "limit")

  assert_that(is.character(select), length(select) > 0L)
  out$select <- build_sql(
    "SELECT ",
    if (distinct) sql("DISTINCT "),
    # MS SQL uses the TOP statement instead of LIMIT which is what SQL92 uses
    # TOP is expected after DISTINCT and not at the end of the query
    # e.g: SELECT TOP 100 * FROM my_table
    if (!is.null(limit) && !identical(limit, Inf)) {
      assert_that(is.numeric(limit), length(limit) == 1L, limit > 0)
      build_sql(" TOP ", as.integer(limit), " ")
    },
    escape(select, collapse = ", ", con = con)
  )
  out$from     <- sql_clause_from(from, con)
  out$where    <- sql_clause_where(where, con)
  out$group_by <- sql_clause_group_by(group_by, con)
  out$having   <- sql_clause_having(having, con)
  out$order_by <- sql_clause_order_by(order_by, con)

  escape(unname(compact(out)), collapse = "\n", parens = FALSE, con = con)
}

#' @export
`sql_translate_env.Microsoft SQL Server` <- function(con) {
  sql_variant(
    sql_translator(.parent = base_odbc_scalar,
      `!` = mssql_not_sql_prefix(),
      `!=` = mssql_logical_infix("!="),
      `==` = mssql_logical_infix("="),
      `<` = mssql_logical_infix("<"),
      `<=` = mssql_logical_infix("<="),
      `>` = mssql_logical_infix(">"),
      `>=` = mssql_logical_infix(">="),
      `&` = mssql_generic_infix("&", "AND"),
      `&&` = mssql_generic_infix("&", "AND"),
      `|` = mssql_generic_infix("|", "OR"),
      `||` = mssql_generic_infix("|", "OR"),
      `if` = mssql_sql_if,
      if_else = function(condition, true, false) mssql_sql_if(condition, true, false),
      ifelse = function(test, yes, no) mssql_sql_if(test, yes, no),
      as.numeric = sql_cast("NUMERIC"),
      as.double = sql_cast("NUMERIC"),
      as.character = sql_cast("VARCHAR(MAX)"),
      log = sql_prefix("LOG"),
      nchar = sql_prefix("LEN"),
      atan2 = sql_prefix("ATN2"),
      ceil = sql_prefix("CEILING"),
      ceiling = sql_prefix("CEILING"),
      substr = function(x, start, stop) {
        len <- stop - start + 1
        build_sql("SUBSTRING(", x, ", ", start, ", ", len, ")")
      },
      is.null = function(x) mssql_is_null(x, sql_current_context()),
      is.na = function(x) mssql_is_null(x, sql_current_context()),
      # TRIM is not supported on MS SQL versions under 2017
      # https://docs.microsoft.com/en-us/sql/t-sql/functions/trim-transact-sql
      # The best workaround is to nest LTRIM and RTRIM.
      trimws = function(x) {
        build_sql("LTRIM(RTRIM(", x, "))")
      },
      # MSSQL supports CONCAT_WS in the CTP version of 2016
      paste = sql_not_supported("paste()"),
      # stringr functions
      str_length = sql_prefix("LEN"),
      str_locate = function(string, pattern) {
        build_sql("CHARINDEX(", pattern, ", ", string, ")")
      },
      str_detect = function(string, pattern) {
        build_sql("CHARINDEX(", pattern, ", ", string, ") > 0")
      }
    ),
    sql_translator(.parent = base_odbc_agg,
      sd = sql_aggregate("STDEV"),
      var = sql_aggregate("VAR"),
      # MSSQL does not have functions for: cor and cov
      cor = sql_not_supported("cor()"),
      cov = sql_not_supported("cov()")
    ),
    sql_translator(.parent = base_odbc_win,
      sd = win_aggregate("STDEV"),
      var = win_aggregate("VAR"),
      # MSSQL does not have functions for: cor and cov
      cor = win_absent("cor"),
      cov = win_absent("cov")
    )
)}

#' @export
`db_analyze.Microsoft SQL Server` <- function(con, table, ...)
{
  # Using UPDATE STATISTICS instead of ANALYZE as recommended in this article
  # https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql
  sql <- build_sql("UPDATE STATISTICS ", as.sql(table), con = con)
  DBI::dbExecute(con, sql)
}

mssql_temp_name <- function(name, temporary) {
  # Check that the name is prefixed with '##' if temporary
  if (temporary && substr(name, 1, 1) != "#") {
    name <- paste0("##", name)
    message("Created a temporary table named: ", name)
  }
  name
}

#' @export
`db_save_query.Microsoft SQL Server` <- function(con, sql, name, temporary = TRUE, ...) {
  name <- mssql_temp_name(name, temporary)
  tt_sql <- build_sql("select * into ", as.sql(name), " from (", sql, ") ", as.sql(name), con = con)
  dbExecute(con, tt_sql)
  name
}

#' @export
`db_write_table.Microsoft SQL Server` <- function(con, table, types, values, temporary = TRUE, ...) {
  table <- mssql_temp_name(table, temporary)

  dbWriteTable(
    con,
    name = table,
    types = types,
    value = values,
    temporary = FALSE,
    row.names = FALSE
  )
  table
}

# `IS NULL` returns a boolean expression, so you can't use it in a result set;
# the approach using casting returns a bit, so you can use it in a result set,
# but not in WHERE.
# Microsoft documentation: The result of a comparison operator has the Boolean data type.
# This has three values: TRUE, FALSE, and UNKNOWN. Expressions that return a Boolean data type are
# known as Boolean expressions. Unlike other SQL Server data types, a Boolean data type cannot
# be specified as the data type of a table column or variable, and cannot be returned in a result set.
# https://docs.microsoft.com/en-us/sql/t-sql/language-elements/comparison-operators-transact-sql
mssql_is_null <- function(x, context) {
  if (context$clause %in% c("SELECT", "ORDER")) {
    sql_expr(convert(BIT, iif(UQ(x) %is% NULL, 1L, 0L)))
  } else {
    sql_expr(((!!x) %is% NULL))
  }
}

mssql_not_sql_prefix <- function() {
  function(...) {
    args <- list(...)
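    # (Context-dependent translation: in a SELECT or ORDER BY clause MSSQL is
    # operating on BIT values, which take the bitwise operator `~`; in a WHERE
    # clause it expects a genuine boolean expression, which takes logical NOT.)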
if (sql_current_select()) { build_sql(sql("~"), args) } else { build_sql(sql("NOT"), args) } } } mssql_logical_infix <- function(f) { assert_that(is_string(f)) f <- toupper(f) function(x, y) { condition <- build_sql(x, " ", sql(f), " ", y) if (sql_current_select()) { sql_expr(convert(BIT, iif(!!condition, 1, 0))) } else { condition } } } mssql_generic_infix <- function(if_select, if_filter) { assert_that(is_string(if_select)) assert_that(is_string(if_filter)) if_select <- toupper(if_select) if_filter <- toupper(if_filter) function(x, y) { if (sql_current_select()) { f <- if_select } else { f <- if_filter } build_sql(x, " ", sql(f), " ", y) } } mssql_sql_if <- function(cond, if_true, if_false = NULL) { build_sql( "CASE", " WHEN ((", cond, ") = 'TRUE')", " THEN (", if_true, ")", if (!is.null(if_false)){ build_sql(" WHEN ((", cond, ") = 'FALSE')", " THEN (", if_false, ")") } else { build_sql(" ELSE ('')") }, " END" ) } globalVariables(c("BIT", "%is%", "convert", "iif")) dbplyr/vignettes/0000755000176200001440000000000013221502521013547 5ustar liggesusersdbplyr/vignettes/dbplyr.Rmd0000644000176200001440000002660213221275427015531 0ustar liggesusers--- title: "Introduction to dbplyr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to dbplyr} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") options(tibble.print_min = 6L, tibble.print_max = 6L, digits = 3) ``` As well as working with local in-memory data stored in data frames, dplyr also works with remote on-disk data stored in databases. This is particularly useful in two scenarios: * Your data is already in a database. * You have so much data that it does not all fit into memory simultaneously and you need to use some external storage engine. (If your data fits in memory there is no advantage to putting it in a database: it will only be slower and more frustrating.) This vignette focusses on the first scenario because it's the most common. If you're using R to do data analysis inside a company, most of the data you need probably already lives in a database (it's just a matter of figuring out which one!). However, you will learn how to load data in to a local database in order to demonstrate dplyr's database tools. At the end, I'll also give you a few pointers if you do need to set up your own database. ## Getting started To use databases with dplyr you need to first install dbplyr: ```{r, eval = FALSE} install.packages("dbplyr") ``` You'll also need to install a DBI backend package. The DBI package provides a common interface that allows dplyr to work with many different databases using the same code. DBI is automatically installed with dbplyr, but you need to install a specific backend for the database that you want to connect to. Five commonly used backends are: * [RMySQL](https://github.com/rstats-db/RMySQL#readme) connects to MySQL and MariaDB * [RPostgreSQL](https://CRAN.R-project.org/package=RPostgreSQL) connects to Postgres and Redshift. * [RSQLite](https://github.com/rstats-db/RSQLite) embeds a SQLite database. * [odbc](https://github.com/rstats-db/odbc#odbc) connects to many commercial databases via the open database connectivity protocol. * [bigrquery](https://github.com/rstats-db/bigrquery) connects to Google's BigQuery. If the database you need to connect to is not listed here, you'll need to do some investigation (i.e. googling) yourself. 
In this vignette, we're going to use the RSQLite backend which is automatically installed when you install dbplyr. SQLite is a great way to get started with databases because it's completely embedded inside an R package. Unlike most other systems, you don't need to setup a separate database server. SQLite is great for demos, but is surprisingly powerful, and with a little practice you can use it to easily work with many gigabytes of data. ## Connecting to the database To work with a database in dplyr, you must first connect to it, using `DBI::dbConnect()`. We're not going to go into the details of the DBI package here, but it's the foundation upon which dbplyr is built. You'll need to learn more about if you need to do things to the database that are beyond the scope of dplyr. ```{r setup, message = FALSE} library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:") ``` The arguments to `DBI::dbConnect()` vary from database to database, but the first argument is always the database backend. It's `RSQLite::SQLite()` for RSQLite, `RMySQL::MySQL()` for RMySQL, `RPostgreSQL::PostgreSQL()` for RPostgreSQL, `odbc::odbc()` for odbc, and `bigrquery::bigquery()` for BigQuery. SQLite only needs one other argument: the path to the database. Here we use the special string `":memory:"` which causes SQLite to make a temporary in-memory database. Most existing databases don't live in a file, but instead live on another server. That means in real-life that your code will look more like this: ```{r, eval = FALSE} con <- DBI::dbConnect(RMySQL::MySQL(), host = "database.rstudio.com", user = "hadley", password = rstudioapi::askForPassword("Database password") ) ``` (If you're not using RStudio, you'll need some other way to securely retrieve your password. You should never record it in your analysis scripts or type it into the console. [Securing Credentials](https://db.rstudio.com/best-practices/managing-credentials) provides some best practices.) Our temporary database has no data in it, so we'll start by copying over `nycflights13::flights` using the convenient `copy_to()` function. This is a quick and dirty way of getting data into a database and is useful primarily for demos and other small jobs. ```{r} copy_to(con, nycflights13::flights, "flights", temporary = FALSE, indexes = list( c("year", "month", "day"), "carrier", "tailnum", "dest" ) ) ``` As you can see, the `copy_to()` operation has an additional argument that allows you to supply indexes for the table. Here we set up indexes that will allow us to quickly process the data by day, carrier, plane, and destination. Creating the right indices is key to good database performance, but is unfortunately beyond the scope of this article. Now that we've copied the data, we can use `tbl()` to take a reference to it: ```{r} flights_db <- tbl(con, "flights") ``` When you print it out, you'll notice that it mostly looks like a regular tibble: ```{r} flights_db ``` The main difference is that you can see that it's a remote source in a SQLite database. ## Generating queries To interact with a database you usually use SQL, the Structured Query Language. SQL is over 40 years old, and is used by pretty much every database in existence. The goal of dbplyr is to automatically generate SQL for you so that you're not forced to use it. However, SQL is a very large language and dbplyr doesn't do everything. It focusses on `SELECT` statements, the SQL you write most often as an analyst. 
Most of the time you don't need to know anything about SQL, and you can continue to use the dplyr verbs that you're already familiar with:

```{r}
flights_db %>% select(year:day, dep_delay, arr_delay)

flights_db %>% filter(dep_delay > 240)

flights_db %>% 
  group_by(dest) %>%
  summarise(delay = mean(dep_time))
```

However, in the long run, I highly recommend you at least learn the basics of SQL. It's a valuable skill for any data scientist, and it will help you debug if you run into problems with dplyr's automatic translation. If you're completely new to SQL you might start with this [Codecademy tutorial](https://www.codecademy.com/learn/learn-sql). If you have some familiarity with SQL and you'd like to learn more, I found [how indexes work in SQLite](http://www.sqlite.org/queryplanner.html) and [10 easy steps to a complete understanding of SQL](http://blog.jooq.org/2016/03/17/10-easy-steps-to-a-complete-understanding-of-sql) to be particularly helpful.

The most important difference between ordinary data frames and remote database queries is that your R code is translated into SQL and executed in the database on the remote server, not in R on your local machine. When working with databases, dplyr tries to be as lazy as possible:

* It never pulls data into R unless you explicitly ask for it.

* It delays doing any work until the last possible moment: it collects together
  everything you want to do and then sends it to the database in one step.

For example, take the following code:

```{r}
tailnum_delay_db <- flights_db %>% 
  group_by(tailnum) %>%
  summarise(
    delay = mean(arr_delay),
    n = n()
  ) %>% 
  arrange(desc(delay)) %>%
  filter(n > 100)
```

Surprisingly, this sequence of operations never touches the database. It's not until you ask for the data (e.g. by printing `tailnum_delay_db`) that dplyr generates the SQL and requests the results from the database. Even then it tries to do as little work as possible and only pulls down a few rows.

```{r}
tailnum_delay_db
```

Behind the scenes, dplyr is translating your R code into SQL. You can see the SQL it's generating with `show_query()`:

```{r}
tailnum_delay_db %>% show_query()
```

If you're familiar with SQL, this probably isn't exactly what you'd write by hand, but it does the job. You can learn more about the SQL translation in `vignette("sql-translation")`.

Typically, you'll iterate a few times before you figure out what data you need from the database. Once you've figured it out, use `collect()` to pull all the data down into a local tibble:

```{r}
tailnum_delay <- tailnum_delay_db %>% collect()
tailnum_delay
```

`collect()` requires that the database does some work, so it may take a long time to complete. Otherwise, dplyr tries to prevent you from accidentally performing expensive query operations:

* Because there's generally no way to determine how many rows a query will
  return unless you actually run it, `nrow()` is always `NA`.

* Because you can't find the last few rows without executing the whole
  query, you can't use `tail()`.

```{r, error = TRUE}
nrow(tailnum_delay_db)

tail(tailnum_delay_db)
```

You can also ask the database how it plans to execute the query with `explain()`. The output is database dependent, and can be esoteric, but learning a bit about it can be very useful because it helps you understand if the database can execute the query efficiently, or if you need to create new indices.

## Creating your own database

If you don't already have a database, here's some advice from my experiences setting up and running all of them.
SQLite is by far the easiest to get started with, but the lack of window functions makes it limited for data analysis. PostgreSQL is not too much harder to use and has a wide range of built-in functions. In my opinion, you shouldn't bother with MySQL/MariaDB: it's a pain to set up, the documentation is subpar, and it's less featureful than Postgres. Google BigQuery might be a good fit if you have very large data, or if you're willing to pay (a small amount of) money to someone who'll look after your database.

All of these databases follow a client-server model: a computer that connects to the database and a computer that runs the database (the two may be one and the same, but usually aren't). Getting one of these databases up and running is beyond the scope of this article, but there are plenty of tutorials available on the web.

### MySQL/MariaDB

In terms of functionality, MySQL lies somewhere between SQLite and PostgreSQL. It provides a wider range of [built-in functions](http://dev.mysql.com/doc/refman/5.0/en/functions.html), but it does not support window functions (so you can't do grouped mutates and filters).

### PostgreSQL

PostgreSQL is a considerably more powerful database than SQLite. It has:

* a much wider range of [built-in functions](http://www.postgresql.org/docs/current/static/functions.html), and

* support for [window functions](http://www.postgresql.org/docs/current/static/tutorial-window.html), which allow grouped subsets and mutates to work.

### BigQuery

BigQuery is a hosted database server provided by Google. To connect, you need to provide your `project`, `dataset` and optionally a project for `billing` (if billing for `project` isn't enabled). It provides a similar set of functions to Postgres and is designed specifically for analytic workflows. Because it's a hosted solution, there's no setup involved, but if you have a lot of data, getting it to Google can be an ordeal (especially because upload support from R is not great currently). (If you have lots of data, you can ship hard drives!)
dbplyr/vignettes/notes/0000755000176200001440000000000013050655353014713 5ustar liggesusersdbplyr/vignettes/notes/postgres-setup.Rmd0000644000176200001440000000104113050655353020357 0ustar liggesusers
# Setting up PostgreSQL

## Install

First install PostgreSQL, create a data directory, and create a default database.
5cM-x.F $  HW^;r6r.+ Q /0z릛nQyG.e0L?ؚѐ;[y~xZ|17x#3<*x} 32xL\Y eea%$  t ֬4lm}AEiW|G*ގ< (mGC%\ "=Ӄ9묳:_J:KQ$  wgHo,2X'm =حʐ଩暫EZF Q.$  4G@]?sK@$  H@@s<9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn H@$  H@h9~斀$  H@$ Kgn ~ۯ|f$  H@@ d$P/_|1#O9 7\i$  H@~߿ u6zT+%Unl覒ݒ1V]4# H@V~`eQVUFܧ._o&ĕSdgJ76Jgp{InӠFZ$  H _ځ$ H@$  H@p ͓$  H@$ԥ]6O$  H@@Pvxi$  H@$ .'.y$  H@:;H$  H@$  t9uiw͓$  H@$ԥA' H@$  H Km$  H@$ '.< H@$  H@]N@]l$  H@$  t8uiwI@$  H@r.`' H@$  H K;4O$  H@@Pvy< H@$ >{~ߕJ@]V. H@H.va:N4N$P| VX0X`FU4S{ ?Ґ䫯 5X3Ovi'u] 5oN_3^#.-]i$  |)\s8,ə!ڒQZTQoyC38;~wqdž&E.o~Z0o_}g5E@]Zroz{z7@#h~Qr_ܤo;Jl ?Xgf},0s9;6M@k ,MIԳ:K.a3%dI~/2)ƀ9s衇7|/b-6#vl^yܞ6Zo">Gu5{7fmxww\V?)*W_=3N8aS׌,L{rǮYW}fIn`!}MlLP|}#K/MH*a[ ZPuQNvORDԪ:c6ld}ٯHS[oumF/4L[_3M@S{#0IОa?+kkowyE4L>FmcQu_,W/L`O+N#Y֥N<Ĺ YMM>*^5?R?O6SN9%MoU`O>BD Dr1"vi#\,E$黙go$#7@k&dd0l͈J9Cu]1O$YwuoO.j6`Ұ10>MdT_HϑN(!N\s12\<> EHblQ ?1Yg\ HCTZhG[!ZhUF܎ΐ[fmQ{,(S%5aVSGq[]+zVKQ uYve#&nQn1?Flq Jկ" qۊx -o*9OmR1C %r0'BjF6lBjS;FxDiq_~aes1B @RJ>uY*$4& R Ȁg1b/BdRbǦIU(GEFRˢ.G i" R*1ǁ65YKXIc Qu&ѥ bVMx>,%\K^d|kysDԥ$:*3nΚiL !^ x:묃'=5j5並Xy83i 0d 7,OptMD7#MBA"M $A|?"b@p  +Bi>(6ʢbVHҴcrerl&Fʹ4ȂM@WcOhns7δ -b 9@RF@Qge`3'gyɴT8;@0{,Tf3UGɬى .aӟS6a4|c (|RBqJo 1[lñM6avFfe68yiuSW|c^vr 2 O:$˧ /g %Ҁ:kI ތS4(')br)Y ԿLR 'ok%H{{7>w!v<0BGe5,Bc1-wl5f#S'n.)ꩧJmo Y~V]d kې;Z( W:~Ield-1˂_:e6yV В.~Lpj:є,er""ID8`. ;fuu_52 rdSE%,!KsJ b|LXj &\$;!ӟP3ihg9153+Ĺt xB -R $ЙԙiJM‚3ϯQ&k"{S+Z6br\ԿLEކs R ~7elGa?ir1xO8b3'[bc.M/2QwIV)e bɥYHb8eqshApBSNA4(/PE|`Mo͢"i5K[滽E%~}#EP1Y┗j @[>uY32nİE~ba'S!QG467 cOaXӤjcXR.(XYN_IagKNW3&9:+VP+ :$ryr @'PvB/hCNXvm#0`AWv9WVq [6'b^/cxX?-z0wr˝wމ.E36Q%8%WT6$ ' Ra<, -2&X ϖ_?=o0w^5z96xcc=ڒs~8#̤L84 US&khYSVVmLK @ ȕ]]d,dNAS0gnt):_uH⦰*wZrNq&^K[W$WTYRQK≉O8>ٳ\ / f+0WB1,brټ%^"0WB6=o.kdKh:A6{uwŗ P2s(4Mj_S"}Fýy%_,̺XzW\p-ٺrp.wy0_]rOɖpg` ҪQ/=i.׿is}mPݨ14iaR f/eZ8]Q< I)\%vY]֯e 8) glxmȰژxngV5^~,"c$ͣMΕbd9#KQO|{)̘Tr@kSz{z@'PvB/hCftFl /\3Mgk n3҃kXSgw`EL5 z'dͤᾐ-^<" AqǣVb.b"nHNNeCxĄ"I-˳LM"4<(K8ZѥD!J+.q,wBŨh҆{F_IVYs,M*RV5sNœPhPbǢ7e,e)K⋑CM%]㴯~Nü$vrj88겗-#e h[E'dFqG[G+3c!-["c$+\)`v||CID_܎ҪTlߢ|;c7A?skد^I` KUw?~Gy Go*?E뭷^4+NӁ QN)p/8iDDR7㗿%~lv3Ys(&v{55N:Y] (c\EC;C(RktZĆT`O XVs8~;ZW4cرw"%M9[K0;;z~[R˲Yq0kUFX9e8x=uYQ:Jjvo#;#*?-Ј爢3yPMZ;MpzhFC\PFɁ}$  H@$ ԥ+ H@$  H@C@]:0E$  H@jPb$  H@$  up H@$  H@M@]Z$  H@$00ԥZ$  H@$  H6uim.J@$  H@P gk$  H@$ ԥ+ H@$  H@C@]:0E$  H@jPb$  H@$  up H@$  H@M@]Z$  H@$00ԥZ$  H@$  H6uim.J@$  H@P gk$  H@$ ԥ+ H@$  H@C@]:0E$  H@jPb$  H@$  up H@$  H@M@]Z$  H@$00ԥZ$  H@$  H6uim.J@$  H@P gk$  H@$ ԥ+ H@$  H@C@]:0E$  H@jPb$  H@$  up H@$  H@M@]Z$  H@$00ԥZ$  H@$  H6uim.J@$  H@P gk$  H@$ ԥ+ H@$  H@C@]:0E$  H@jPb$  H@$  up H@$  H@M@]Z$  H@$00ԥZ$  H@^|?pM H@ $04\ve;A44k+G?g EL?8")=G}ꩧ>Sk5\3N\u]E5O?M뮻n_3^G[! H@}#-C\s8֛Ƶ-2Jw񨣎j{_lh8W?O?4ok>M/ tợB$0YTr_{w7ߜjzKx +x9 L7t=ܨ9&i$0ԥIۺ4/o۸a&hE]t'd(a}ꫯb37 -<6kAW^9h#[&J{?~r ;;,STz뫯zg&pU55Y lǮYW~/`!}MwLP|;#x ;ET/@M*}YH4-˯4L+9]Ź H@FV=(oy6:nv~馛^vtfs]w?׿~_w͝IR:q7餓N1c9&+xț8`iEr7RK]y)k&pm&;y]iRz'͙"WYev>K/MH*a[ ~}ه('6~'"jUWc1fm6>W]uU$N[ou6sjDKDѡb-I|& )j~G$h0 ~ԕ|F|{ XL3Jay{1k꥗ fF~495l]*O{tPzfүja0 S3YCZk-./gO}uYV\pA0B?`ְY{ C 12eKq+Ը{Dd?(mi>$X}2z}рL'yϻ4&e\{ӿ/\2my*Y32nkW^9 /21 M7ݔY087z@_% 'Pi}mhSTihs;P&J#~ 7d R;j_e92fPRs[uVK Yve#/ (F8F SHܵ%1H.%w\^}q2EL6"^|ٕʞ@'Jկ" qx _J%go9Vn)c!S\98%ٰ FV#b4d$11Tc6P];1R8f)E1}TH 0IAiL@'38cih@2r1B_PR=܃QQxc@F /XSO=EL]r5ui/fd~i4*rp/__IN8ᄴj{{whS. 
,hZ19(UV)laxrir3Ѭ(2[;BZ' )#}TH 7)Nհipov2`G7Y(n6shlv JX: ^j3@, ~헳K H,/)2`E,gIU<6!l TXp- !BMpƻ]>rA&ꩧJ-_=21C&ʍ5|S 8 s9gj?a:]$CIJP6Ee_2<+6fJW_?&[{X5BKYRҿ}l9n'lӥչӃ.2,-E?Ζjj.|lduxRi?s5WֶX1, ~bvJD 6sCj+h} j:XjV85)%M0z͇E>kf. qu$1m]wdI$G"VQȋ$ʚEQE,,hCCVk&^w{^VSeFF4ٺ"ACK9 ]ڧ.|]}{[ǂX3Ll| reƤ|rKuůe Pg5, t2ui'goC@,^l[SrzVah<^c-v6f]ȸd]-rwK=c?XHQ 7k_~b!mGnu91a|CՑj62X oXiJ&[/rNۄ2 wY*96xcc=vB[qNo`$ۃ9a+"ۆ{@R~\ܻrRWtѭIN&h&6, tuiGut.*i)ޞ6Ve2 t2{|_pobeY X .e1ɺR_Ne.,CxH38fxf?lȖVzϬL9}bDq5oSU3p{zaR]GD):_r%$hEp.c;9Y]˼s)[{FEg}wqŻujl7:U[Y|$._0 $ $.~Ѫ.!0,l wo<`&c`A].mk_Xx G?QT$ßɣV8y4ʁdDSa}W皒&9\Bƙ'&CF$ZCbC,DF$G"q0XGcG+Q5"6Di_ѥl A!/QqM3NP8x)eP9UQ'aNB4rrrq>MwYXMKi_LJ;98at\u-DXr~Z32|(yTvYi6gy&X<|TLx H K٪?kUie/4-͹+` %5;Umeo:묓e @zhهZy(ͶI<}Q gHgj# gl6MC.Tt62 PVJ(848㎑&+f$7!·=p2L>sIt\F3p&,XII+ uԩŶd򢟳rUx& YRLo痒?mTBuqk+[^d`0$48mL@Xj H,2m5Q%6iq^Y>ۜx@"52ڜeU%5;KkռUM( \]x8LQ<%I/%U\"6]CK+]S^)B'дٿ@Koδ1Nbn: f3y~mLjiH@iAV>eO1F 35|DZj^-h|0;dw.d@P#쎪 Vo']  emN81f' H<-M]FguzE9~U>ɶns$Э<[{vu~-dپ :L'sv&kLgy@k5IO;g1t]ve/ة;PX t2ksFC\Vv%) Ӄ 2I:1XQ:Kjv4b%58d)5KYv_9khC$0t/:}mK%  H@@{9:9uyqe]K?{p $  H`cP`%=|~i-$  H@$PF26K@$  H@K[" H@$  Hԥe5m$  H@$=ԥӗD$  H@@ Kk, H@$  H{K/m$  H@$ 2P״Y$  H@@PvO_ H@$  H@e$.-ci$  H@$ !.ힾ%$  H@H@]Z^f H@$  H@C@]=}iK$  H@$  $  H@{ҖH@$  H@(#ui{M%  H@$  tui-$  H@$PF2{6Q@IDAT6K@$  H@K[" H@$  Hԥe5m$  H@$=ԥӗD$  H@@ Kk, H@$  H{K/m$  H@$ 2P״Y$  H@@PvO_ H@$  H@e$.-ci$  H@$ !.ힾ%$  H@H@]Z^f H@$  H@C@]=}iK$  H@$  $  H@{ҖH@$  H@(#ui{M%  H@$  tui-$  H@$PF26K@$  H@K[" H@$  Hԥe5m$  H@$=ԥӗD$  H@@ Kk, H@$  H{K/m$  H@>SK@hui;Z$  t4.la=蠃:1woxm*VX!|k,hiM{'S;#hW_]x2<4L vi'u]\\s͂y]^P:$  H`(o5\387i*>O“.NgosF[Rs7GuT4gqwN?o8~Z0o_^K@Rǀ$  H>~l/|}T+>ժs97HռkmٛoY{wdgy"8judwq&|7x}2Fa[lŊDXsexWp{6h59ꨣ8∽%h2{7) wqYY(n .O>IV-LzKSE PLz{CݰFyj&pSZoe5 &~G|Wx{%_4H?S<_|O52R8o6 7\&G-u>70a*I@O[c(Boh;뮻'x׿f 4EJ0Mr.B4e'pqM:SL1Řc 䖳"X:ӢZ}_j5XcM6odMv'J+=1O>93E*讴G]z饩 CE;Cz+Է4o=|RĮO>$UZ 2,S)30i1L<'p9 L%X4:Cf-"j~/fi1Fe/R‰'8s>oDg뮻b#<&}9`bf`[o agjruNpO< W\H3̽KGKIC??[eۤ+i",rie߭f毶j1w(FB>(,QL2$ڠ# n@~צBp_r!vYAnhu]wJ.?=쳩R@`VUת^Nif~f폟w5K1{fW7%1mq饗Ì6mƝ񪫮J ) ٧HkA#l$ ]zk4DM4#"G?]tQİK,D\^q2 [k%$=K?җCC(g?ȋ#ЄЮ/~ |' b qِw-(@:$iB($nmEQz)a' 6PT[. Jw}w>(Zkm馨fZB__z%;xh;۔ _:2@ifxNb%* BTLm k6'U"? eY8',>&1e'ٕj&$:ȋϟlfF [a H iXȔ}+f ;Lr^$ 炠L8' $yۍ;`&Ff \1x縉la[)8zꩄY9$P xE7[1sS܆Ip|%a,Q%Ux,믿~*=ְ Ş]2rvT/dmf8fմ __rR$kGO W]uM)>{*O8ɾUǀI<>IgKf9M}C\Hvfd?q|.23+7;~ƤˌB -SORLA -뾲?NyW%ja7u,+;X({w5X6[G!ˈPAOSRyp1~뮻;qV*bDp"橪5Kn&zrѽh0U)>96oįJ DJfs; ̵ׁͤzdK#RBs&RV:l EqR_Nz.̧HХ|\Bb r,(`5W[p_Ӱzx H` K5ɱ=;gZQK 4@ֆƽ2>+l±k4yr-wy'x%yHŽqz8&WZ$ Y7f^V?5V@D7!`"{pWR&b$k".=-e h$oƜeE8OR*Leשdι^!mO;E(= AARn6 ѥUGV(alP v++s)dKƬ3ޞ\doxz{x t> ތ[pCt1dkq)ee[u/>R|ηxH38fxzd?lȖ0l?gGS}+EF5*~gaYlÎ\ Mz8_H.LRK\Er^uZ6қlN*N^aw 'U_G.FE hy>3o˹uUL侺c$ A'𿯼A7E$P%ON ˎ +-xo)J<)P(@$er ` p1v؁BI3uK$Йԥ/ZU0^;2U Yčl4 G"h0t#J5(W^y%>vQ..YDJ8EoPKĝ4r2CFJ4*>ė"6$c "45E~vm]ѓ%ܣ )^93DiRHʤ("z#Pt u..VD?G-wྣsy?LqS$w c=H%<u'4, k<Ҟ"-Ӭ "&ѥUU]Ǚ)#ND:$0aUUjU.4Ja3?Ã=0xgeӝ{EpnY9ʄ\>`o{mTgʟ)y+sTYʇ^O$P&#%  H -ǛG^#+l(#FLY/=#P|֫[ٻ?Rҿfe*O?Z%P#VF=' H@@xD$ !AGiU{zk,s >X=bu{#WfוG,؏s=*OQi{? I@$?0$ H@C2Vwi_y<鎿LRF/~؛JJp_h;爛xt$  B@] %  H |IK2pni1kei*KX&K5lxx')|nzsB1Uf2gx<4 H@hY旀$ x䞞͟NPʟ,>YezϞ"lC~]_4I]YoRYjʮWrTYpE$ uS{뭞uPN3ǵk*e0:x?*gq&x*J76JgpQIn~,ahpjыT産~!\?2蕯ُJǓW.X$  H@腀^- H@CrtE:ߘ=x9+{ost%+ H@}$.#0K@vjWܬĕE&2Zs$- H@ 1  H@@ ;lgT<3Ssl$  4".mD%  H@}%"gk&K@,ٮ$  H@:#A#$  H@$  Y!6\$  H@@GPvD7h$  H@$ !K@]:dކK@$  H@Ҏ$  H@$0d Klp H@$  H@A@]ݠ$  H@,uz. H@$  H#K;4B$  H@%.]o%  H@$  tuiGtFH@$  H@ԥCm$  H@$ .n H@$  H@Ctv $  H@$ԥ ! H@$  H`Pٮ$  H@:#A#$  H@$  Y!6\$  H@@GPvD7h$  H@$ !K@]:dކK@$  H@Ҏ$  H@$0d Klp H@$  H@A@]ݠ$  H@,uz. 
H`x9昻{";<]>î ў7|ƺKڼno~͖$А!"H@jx饗N8oe=8ӥ }G}ߖOz^޸袋|We3\{% vPJ@@︱oRD_>|G?ss9g_3MMJM,VCC|7x}ꫯ0M7b-6䓑Ffo/~MswW(R6~uGfaEu .O>IV>`GuQ{Kblrl_Ǥ4 K#ϖ ':2Űyqeoz+3:L 0ث1$O{c9f_3fmtAHud dpBfwW`$ԥGZ'p737~_6^{5$ TOtM_|q9D#C AH^z &Hq7̻?8K@g"iyE5EG}gAu](s%` ^n! Ut-1TS1cUmn믿>c!K>S,첽;rM?X_>m%4:ؚ tI3<3 }z]OC'?&1uKo 0PI9묳2b \{9_c5gTcrry~3&#49&iUW] #j%Ri\N^{~QJ t/bSSC3S! 3j4  .SkK~eEiJ̭vm / Ah+ 6zs=ߎJ Hё??B.[H@¡+Ȏ~ w_u$xW18c1By}}?ڀo>1 D6 B+{r)YCj^s5QH+}J+U'H@mع{, fFr{ai!/Jz뭇[uU)*ղnf\"Q(ÿ[mU$(Rn6G3/nA= `vHuYCl)en&(Lʇb{a"pR Dgf *@fY;YEgxQ4A$҃c=6 ]w]J#'9C_.Zz=.D"20XIaX"_sAi$g"fI'Zas=УR'oa$>l_xI٧`΀$0zh@p;J =,ovpL t qQelrsX}Չa M5.[na~(`\?½2K!X>26D-+~- (=96 El\C02SWCGh*H/4;S$[rv pDuYwq@ E:\xh8##<\|% <@R2bK,ѰEO=AbR`j@* aށb\X܌))H4 BmH$T|!R J2:wtdaWPJKSqAiވUØa /+E7"ۘdB{lnVJf@ǧ/*$JqL:Nm(Jrӽ9uco JLmPV*'-l+< ǫ%#I;<6[ֆ (0C 5`$Jy #Zeq\,FHs?ZLTd~D)(F4!;kSd{R锆İh&2m*"rxh|5\)GH QNn.R*[`p9 a3BB5MW9S:Nd dϹclڞabi+lS,LےӮt )M(7ȕD6ĉ,| Nrfr4OT HK;n TCbRG`&y"eb3ރomM!-8!G䰜\.áZC!{xlХ|%b“TG 8ˉ*6O2ds$4$A\ o5\3K+k\yz)Y)UC2[`6̲e6޹&˅9e8⍷PEMpq3yDI.&{%e6Muyr62@lã C؇SO\󽔀tp[{n \pA_UtZ"2V qˎnIfecR5K9!J9,292.XgH'leL. Y7f^֣Rr.V85P_ʡS59)0nm`v+Rcu٘:ТޘHY O)&qDSQGT+\t_T&I8(Vp : kLF5k[_uJ< ~tл@Zk!RlaM-d@^>y8B)(tҬdž,cu"e{Pž8KDA|P դ8W1U[!kOkÜy-IH4`6>ѫ|sn.Q,e)5 5qtͪJ6v?{Wgs%2яGޜ.mD)48'jYˍﴎ4g[ dlS!wɅ6cdf#Zj)NDv̦˂fTfV(NK֋S $ N .^ІB8nPJ;!6P0q[YPbx.jǤbA)DZQ"Syb7$--c'7I" Ibr^Q †T.Ō6di_[ot*d+x4΢㩒<+ ז[n?<98׿u(1=PpqS, _*,f8lJ!\PP}Sc0}ʌ `UCձi9 9azx[O-ŋy~-agӰvt)|b'} sth H` Ku5E!~u}蚪uhsZoItL9/q7q|1Itq*ofy$ڕ_|Gh[4$`AKANpj( ҭsҥIi.">qpZC)b@RFqjLq+T>ژ#8խ<0Gݒj ħlSN9%{WlX9X|'tUڼb6h#OQ`!V!= #"C{BCo2CK<2 CP`S9qF!Y5MJF :asLƩ LLb;F겗HY0vDKQf #g0[a H` K 4xl:Wմ4imn) @֖[yh8MkX(ih*TAĠZy,5D U@8-5VKͥ(4.ė"6d lwEǖ0֢-3#6pCryHYTm+zH JӃ^[TUzUWQPS} P&E@lʒc:GnYGeE21O5*qC E7d"f`+TJdg'6?D'23¸#b0s44zunoujL2ь=^Pb7fcK` fs@d,#*唑F)l km`E[A/9PYKgp_R[0xm\`,ׇ(e $7\r ;B"W0O,ja-M&@V|cC9 H&ER]|XsJ%4|ua%dEJ@=#;3S! K1N2soEnbb ؅]݅ݭ(?[QNVD P{>qۻۇϽx=Ngvo>󝙥YFKo {)6=,{ݒtHTK-RZLLmccaHrկ'&}.-mNܒZn[.#"&d/mʴzloQXB={dV+UAbu$EI 0|Ja'~v3igo,%OJŭ?r5֨(ewۣG78;X_駟:~gWmSnLjt>yp aa^x{)=CwMk0ֻ@ ]>Qhȑ#Q_mwDs=vXΔ^xB^s5,H( vYg!ۦfFfڰ=\j7묳r)={|gs㎷zkb4s2,,p K^" c1U_#,gqF,RUӘTۼ?Ν;wԩTI wRZa0|~2.K., (v9;?f]lb|1b_#bN^Aԝbg̾ڵkR~wr%' 4_ql{/UnСC=4<Ka/ddV/~S҇u ܪk0}E@@VWEh{l-`n|_{5F/t԰bĹm3SO=tM2$mD.ugFj[nz[m_/bt^z)W_}uW_!k3藳cp')/Y2fv#X>ɋtlQ|I!WXau]LI>Td9` V[ Pҍ7H ȳg}&AIlǷ~ۧON'tnfoV s:)D?ClOL bl馛"YM4." .ڡ`PFPDKcy睙ar6E]h_}O!@b v&J-sύ:RTe0OɈ^i}Gh]v\rrɒ@s.mvQD r-Gu6d9F?N^OĹRb .(n5x9KSF g!?ڷ&0^k;<+K,a A\hŽS/,~C܀}{EmK l$:hf ޽{6—-ciVFHSvn喖H/* t2Yf h%4;έO<*SD*szc݀W^Yqbf|6|sU$ޘ(s][xw3S,1 ]wE}EX#M33N+!JlzvRg}'O"nY,FHRƸS.@_^*G{>3h7x,$;%av[aSO+lvi'. aZヒOE0 or"q>DZg|P{ʔ}/8α<{֭ 4́uVb ` fR)8['V/ZO>AӚ&tХ컻_"5\‰5Ԣ8gy,_R Dgf?CXgBE.4mmE0}"9~諒I 'G> +gm65޼%܌%7f:Hd!1 .Շ]c]O <&V]!"tQJ( %9IdpF֨Y1 _on=`goA( dȘO(<ћZ-"KuT6h<Q HVUQQ BءG#̘caܶk4i_YcB"؋@.):ؤ sR2)2!7N|FvӘ3{)R[\laұLl=E4SmP\*gb2M 4#kNKD,KE+ɗ]v_,PRr]O?ٚ,V }hfҥ:*|̘Q9Yła,5ֱT0WJ d0oov1TSYPZ\Ju|OY/tK~me}!#L+DXueSS!kY1[4}.H}Rkt O?43?d>K,&K!s<d$§S;N.Eݩ*/7RYI߲Pds2ɿ97[%+"% ]ڴM@3hsFl@0[ə ȫ,XSYx\6 B~{8P&l 'L9 O6"f5bf13Zv7grᰈ<Ѣv`嗗s$'c%H`Jd.3O,twpڰ7ڑE><N{ffX`l\Κ}fvvɫle^)nCce[o27o%.a:p`ףם|>ҩxk`S-n'$}r2OF4ћJUG" MH@6aH"Жxo;)ST✊<<;F]vم|b^{!W7p:Qp cǴpN1 P=8/%:g`vcz #ʃpN|vIݏ2)?ko?,Ձwr qۿ'xӃ85ǡ-:'\2u}\ ٰtHCen@tM{y)q7 *h,Dgn]Ե=dc^ǎ-K#-4| `. 0 x4މO=|rLQK)/4%F)[{#'*X =SUFوf(fn駟#?L AS_Z^z!0NB01/A%5یɷcaN"A@M@"hWd/Wsq_eWo-4(+\(\vAvE0Tˏ9v9rvS ÙFp@2|ZAe~l@`({5?xaB*sj2hKwv" " " % DEKi jlJj@@/D%!" " " " " "P5Ҫ)@ HtiQD@D@D@D@D@DKkQITM@jt(" " " " " "Pҥ5$D@D@D@D@D@D@& ]Z5:E@T" " " " " " U."Ԁti *  HVNE@D@D@D@D@D@j@@@KF" " " " " " 5 ]ZJBD@D@D@D@D@DjҥUSD.D%!" 
" " " " "P5Ҫ)@ HtiQD@D@D@D@D@DKkQITM@jt(" " " " " "Pҥ5$D@D@D@D@D@D@& ]Z5:E@T" " " " " " U."Ԁti *  HVNE@D@D@D@D@D@j@@@KF" " " " " " 5 ]ZJBD@D@D@D@D@DjҥUSD.D%!" " " " " "P5Ҫ)@ HtiQD@D@D@D@D@DKkQITM@jt(" " " " " "Pҥ5$D@D@D@D@D@D@& ]Z5:E(*>~Z&(Ak? ʒR_~⭺)*:Ig$Ran&tҗ^zyK\d'w^:tiZKeh*~^SO5UVѣG7,*2x/%t(X,~믿WmOM3?O[E.mR&w}^xW_ꫯUT# ߿;Y{U:?py-|ME47clY7~_ٷXv:7ig["QJX4+M @١g,@Ʈ_5{z+/,mB/8rg֬!C~ƒWx].3F_~姜rJN[uDv}@XkСxEvFruwlXov77GeI&dfXr%C].馛+o&_DnVZiR}٭]}Y:,6lC P)Fp nbs=P>_|PAR L3t)l^c kqE(O>Fm}xn!vxN4D+cy[;{9ox^~B,j}}Ѯ /N;^M;Ђ{Ns=: / kau]G c["A{gh2PO>9&쯸 ~>њ0!uPl榛n{s6_ x D'aK,̒D}Oh;SO=59zUӸtz7.م|>h,RevQ@ptZl+Ҿ{dzÚkI[u-9眳6喧H%X;Ɋ+8rrғ7xc{dQBO*B-^zi4>W(|B -T_EaX2E"n{;7HCNα6H͑ ~9Ǥ^@0e73O>䌭?nEd[gug_~̀SLR#: OnVLɡȈ2&İ$OU駟O<1@:[n%-QGuG"x:[BEmVȀZ^KAhC~m<1,[MXve1QBJNz(. Cc`fCط~2G4eC(38Hq> |W򥕱3sg[ղw!љ4 {/w })w&m3Gt!h&6 L)?_M#M~>.R}(4@Rn ~=?Ҿ{">0?.6bTLWE@jKqꫯ<6Cg0زXƑX0!Ր . *Q f<6ctr/Y\<V5FcUC<󜏊U'O,Qq([%["',e3>sI'XGC]`+zg%Apr!XsGP RKS0g64qȴ 2G=[~ lHx8̒^{S*d,.Za IQ2t &Lh1V lvuW%ziiR)=xh9u3<^YV@VM!duLvaf&:eO2QŒu$.! I>PUi R@P\n"JjA(hBFX>cɸ3il"Yqa 2իW/nL.3PRM8芶4^AndUW] LBL&/k,'t&Sye'ˆԐ5/JIe|fM FK;F~ԓZ 6C2`ڏI ˦A4\B0` D[7[ XS˙KJk { ,w\j1Uacy'QP> ̷j+ŕ>XjÍ,Ж|8DyR--k"vzkWW_DzY àZf`dOIh>̛axl|@2/J IBs-HTHِX,͓B,a ͝4(iA|DV3G&4 u;aȅ0|FFO&JEkz[_w8zLti"`y+ߵ|g[0#|9tJ8&VXq'27:'UF̅=cVzZF9K`"PC gkN:d\J /OhNBЙKg`XH⇿y(@+ og#">biaɨ)0np1ؖH S*cPf@L5!X 1)!l!F"%LJYt1~KH 5Fޣ]111g amwgYGX_䁽~3f3% j̘Qs=r-:{y\3 8 `b$ݱfckj ]l%**X sbږכo):2~hPн٩  6 VH#bڅpga0Q{ BV ޤCMY-T_R:?hQB#͙:%. R~nF1֐]0 Pi!4w>ɫrx(v5 >42<d)<2 Y]Tg .s&`" 9 Dc!'Ͱ/ ca[`br# ѥ|eH -<<V衃,,ehdmمeDBQp#ХԔA0_mAr$%% ɒp50l#@l\NU$"TMbDH;r DH_$'QV7ʃdz<)(f:\}>5s֝[+Y@n0dF6˿mFQ/Beݩ|-sfԱl|.#z ХLyKmʃHP2ie`of!'P/]h -)Jm u0r ϰ`j[*},0Y8 t @ `7,0o hn,fR /1ca H|4QFlLbаf2~ .y4BiV 4IxL.dI( klcu,FѴ[S7V(fX&Z_r0[&bb5hUVY%7YTRGbIfOgԦl¸٦oi8gD}HҌna0\1EAM&yj>.IUc3^6[$\9> )dp-6֚jԼ+ZQl'=cYؔyxO,-*{tJlJݗ.<)+UٯfZM4aW pن,v녺4{/&=f.qM̊][5]:ܼRT~b<(Yت*>xFJ?@s(M6ل8—s"w_zGڰMl e$ < z%?~_1EeD k8*:#]PDn5415nFl8ϔ=&@R`Ⱥ (b,釗j{ϓ0Rn$AZL~ԩ5җ4@u)v|xU4zG7 ƓzkXQ륺]{ 88m৑!H%yԏBm1<85/OJ =xa;IBM1 iʓI5L >e9DjK}+)ü?x϶ZWK2g>f%Z`9j>\y- F1]|i0h+~y1!s.e2m ɫ)%K(R p[ /y`+B1p LsURc8AљA (W mݖX ~: I |DZ>@Ylo99؂ o O 9-A3R0, Y)-oj% ɚ~fݢ'q-?#}O,k@bwni$4L'HaGx&A|Ӡ)%h}Ёth8nrn tiVԛK$%>a^UkrV2L1ḡ14u_ 7 W`kNS4V֐ץ5̨I7 5 d4" J&!"<굎T $c &T(KE=>,cLbRz" " cVX#' 9 s~qјA5FR_ m0U=TVr@{%f˯keU/D /dO&g;Dl N<ĝw޹=USuhmK}mٙ:>odeCD@D@D@w\cr8Ӌ?X[[8Nm5BI^zr@slW!DE(" RNˎ" " " " " " "PCҥ5D@D@D@D@D@D@*& ]Z12E!TR" " " " " " ."@ 6oKNq7‰@ HU|B{桬m(̓Et #ҥ1U%ʳnEJ]i0oXD@D@D}.m_ڈTDݎkw3GEwqqَ0|]x{wv_7Q7SwT_ n(q?|^~& gݼFrw\%J#E7^%qwl+O=>x;J"qs`$ks}wROD@D@ I@ͦBԀ^FtUB]Gizy$8ʳO?FnF8{oCn‰㞼? 3n"7t9^&f:hyD;N&2뵄[רTvEK y>wn%/?zrs]wa>" " "PX [r\D@D@ZA`{ݠ]T؉Ur #߹OoY곑<ן#-znԏ "&γpQl.iε{ȔUvQr/=9]`[ŧ_F'ƛ gu;ঙ!"ouG÷Ã&;G#" " $ ]ZvSE@D@ZIko"JIQz,Aѓ$ ٫%/>vKR>C?N7s˭9F;En|,Ŕ}th׻>;D{=ҺDu<kۛ/o⮵y$C:O" " "PLZ[vSE@D@ZC݇o֋&dinZ1wHp?b-|]b76҉'s3Wi!מogr3FKp1^%2xz-u#*'2SO2ON}סCx=z7i)bR=)x/Yvc3iϕ׏! #)?z?hE"og\X5tr|Ѩd1Hˇ@moxZ/-hId9mVfsZ~{6-l7ZD@D5f=:vhwۯtSO!%Z+݈.5I= Ԟ / qpp}M-~ !ۻNzDдq:[.egΊeN;q8e8zө_.-lө" " UH}Or7 pKts}sq%an&KzEVP^Dz]ePtY#{4]߭[;P>ўsAm^sNtN/1>R#Gb_+v9w.ne]W" " "PXF+jQãeQ]vEbO4TtaÆ12ܹsi(\(\ ZwAqM? 6ѯl9?r|'ݧs[2o:ڻd8hIZE-sfogE0! U &׍}5rJ\ۿ׻Trt39#" " 텀񶗖T=D@D@D@D@D@DKn*ҥ%U(&bJ-" " " " " 텀ti{iICD@D@D@D@D@I@R@{! ]^ZRb.-f" " " " " "^HT=D@D@D@D@D@DKn*ҥ%U(&bJ-" " " " " 텀ti{iICD@D@D@D@D@I@R@{! 
]^ZRb.-f" " " " " "^HT=D@D@D@D@D@DKn*ҥ%U(&bJ-" " " " " 텀ti{iICD@D@D@D@D@I@R@ߟ|ߨ G@tkrUXD@D`РAc5/Uhr MꪫV]Ax3K5Ķ7d' CjIUz{vl4}.;ONofkFF;Oyᇛg ^tE??[hPfC0MeIK,q}p gwm޽;Y#/r:sI'1z~R٦'s:%Y 5O X9c9he_5hJ h\*߿?{}ESL1EϞ=wy_|UQ~_YO?=8CYfW~+z` p'-@vc$x< tBv!f|M}gG}Dve;]uUܼ:uɶ{ >nrdQ8& SQRLM=#tv~暊7s*hyuT4݀kEF lTbch6cc9/Օwy'?8뮋[k΀PB~dRƸ{ld8X7 дLw`w٣G7q/ڦ\Q[{إ:[.| fKE+ TBn_Jsy3*=2o>hվ2}g_I<-_fYHX,<}|ag 5IRfxS9딜Fr)TfZ,EEB3 '.ͨ :!j=f dL?,CҤ#0KtQǏ%s ZJ; Ak1FT-LzDZ-.MyRȃ%KNFs~/V$ᓡK`-eǴ8|U)q6DŽk-ϫ9ZYeIHi}G_ERHNyjQp:LGFؘ`{y駙y-)R/" "P,ٺׅW9]tQ[-_cE g)sύP Qdj*U}r(>pqBW^y aߏ?a"&f0Gx^fV%|a _hX Faa0_c[ Zs aD)K5mbXcO-鈕IЈaM7-TT1F*S݌C e򄎌F>E3{T&B}ŇY)l[IDATØ;7puuIfQe uVׇq6<(:5!*y&wKmcz7I`7R?X=bֹ1qSq5<%T0K '1Б䐘0pFŧ:+#_Ⱦeiix2e{EXT1b̑d@d@(׌fMeqj.-f5LypnMLa֜( θ7V 2ߒ?R=-dkҘ$J-pjj,X>dґ o0f蓗bW̓lׇUhAdcI!Iħ`ZU -klWY#&VÈec?;O e$'e\m~gfSV;yV2v&uiF%ޕ@2AAA+P?ϾYR90 O.l'DƢ[|J37G\~Ssɚς6FvO߲+> Y:50hO8ÞTˇg]7M21r5zQhb;| 8s.™#qv.N[ S:t`6"l+CZV/AřJ ?rKbG, * p(ԥ<$<)P8Sf\fV*|٦'"axJRcM81I,R+ov~@x$[l!2$/UWd諸'$+[~}$'}T9l0Yws RrN==9mLGjNƺ*J,ב1!Y0(}# tK.! $x^*,d:>waBf/eF0`YJyGvx^GdyL[eľIfXn퟿j)kBe+\-vH>t酗!Z9ʛ9k`a!83Z6oOlUYllS˞f\D~<Ѣt  cN8Zѥx)eQ_V!2 Q"( 'G:ad[xQfIx >7~(GTgAq#4٣{wp=]iG!ɩQ3DfS;bI,gfB3/Ju~:#MaKHFR~~QG4&өxVPbH3G*Ǣ_mnX*7\x>JN78oRatsFW.{э1&8+~rV`6422&9ZÀ` ľ/EEgR0x8yi2 1Du)K)u:GrL(Au0sm,y%} (xa5ʜCI Hp}pF,hK@L>c;JRJ޽ӧ_R 34ZэqecdӤi,HU˅M',H 3Y")[{#KIXfdznZMhL~X{ĴFvJ0Ϝ lN?0YdPlARlLp[<D .JҒ/-I%Im(e#R$mm'VMXo9VS_G{i,XGWMW9ͷ6>eƇJZ5q,_ń)XL]b"̮@leKϮrKˌ^F,R<:,zĆa^ĺrgl[CƊ #ЁnwN3#1$kIfhBƊ2ǀ+YZ[–f;3%/hK#3,9F!25qd]N4"jX0F`ΧX ") c!%WvbT$$ۯ.1&!" ٌnA#ASB"181ՇXbQnj[!ɀu)a,e L!eaRMOD&51z cII,Xyaf}(sʙ,O AӰJ%u~"$">XYGua?NɋU +k}D+" 4FeY Y=2Ry4NZiCXKc@a֫pe.ag$Uau)hK5/DA\||*Mh$B4<5.o~2λy+=*;#]`mGYF,5g=᫆WJKS=Y5d9,-@exӔCD@DM{]vL"I1@X 9c5* #&{'v2,d,5 l&g ;MꝻ h].6ERS6QCD@N8*' F}>Y% M,a&'Yk̤s1l+e_ ‰l6x^[ %3l/g S,"@SǴɢŮSDC6'&֙UY֗}H 644Ÿx (ƬORٵ4ƝM-jK*" " " MHEMXz !*-ZoJ8wFJ??cGPH|Op{ +ti[Oe(r&0oaNw '" " ' ]ZTE@D@kC< ܄\W8(8f_Zp*@ u[hi{ܙGx1wn'nE&iotVXmÞpӁn9Zzc>݅'pSMVحN!neZBnUo 4wL{kM hD+l1f9]eC7m6u^-?, ibwZvsrg]%q'" " cKǀFVE@D@}-.h-ܣϹ9N~z{^7um/ݤ])O#vrn7 n\ܹFε6w>c;Y*DI/=z7GR^u[{Q ZaHsA5}]xquO \QwnT^~mC$FSH)-zteD&+u}0@};=k"{oFM9u=tgdD]h)ʳT9ϭaa)rݧ" " "~ H߶UD@D@2Kߦ:O7ڭ[cwr"5;nKѶҮSEڍopv>(:h#jjDf󏖈WK(A@ͬJOӉc]"gdivAݨۯtSObb^ه![V:wt]׹?pwt|FX~J'>xSKX"" "0.YU@}tZYC{Yޛܲӻeso|(FnV?%LGExoYt9n.ݙ7Dkz- FtS ^ǹq wq^Σx[?1@ᢣUzhS׮][L;]D*s:߰aFauܹ{ ȴYoF-v;Xj m@bDEN?4q'$])Mtr'R2g'ԇCD@D@_:洵j*" "I`KN0a;rhoLYL$" " 픀ti;mXUKD@DN0ruAɊF@tLkqWD@DuNaզԈti@*1@cHUM[JRD@D@D@D@D@D 7ܨPD@D@D@D@D@DKUI&PN 4Tۯnqǧgs: 7 W`kNV4VɊ@0z&,$_O:ufiKkkS^" " " " y H%p" " " " " "  [M=*MKR8z.U)" " " " " "ti^R '" " " " " "PDQ۷(cxIENDB`dbplyr/vignettes/notes/0000755000176200001440000000000013050655353014713 5ustar liggesusersdbplyr/vignettes/notes/postgres-setup.Rmd0000644000176200001440000000104113050655353020357 0ustar liggesusers # Setting up Postgresql ## Install First install postgresql, create a data directory, and create a default database. 
```
brew install postgresql
export PGDATA=~/db/postgres-9.5 # set this globally somewhere
initdb -E utf8
createdb
createdb lahman
createdb nycflights13
```

## Start

```
pg_ctl start
```

## Connect

```{r, eval = FALSE}
install.packages("RPostgreSQL")
library(DBI)

con <- dbConnect(RPostgreSQL::PostgreSQL(), dbname = "hadley")
dbListTables(con)
```
dbplyr/vignettes/notes/mysql-setup.Rmd0000644000176200001440000000134013050656072017657 0ustar liggesusers
# Setting up MariaDB

## Install

```
brew install mariadb
mysql_install_db --verbose --user=hadley --basedir=/usr/local \
  --datadir=/Users/hadley/db/mariadb --tmpdir=/tmp
mysqld --datadir='/Users/hadley/db/mysql'
mysql -u root -e "CREATE DATABASE lahman;"
mysql -u root -e "CREATE DATABASE nycflights13;"
mysql -u root -e "CREATE DATABASE test;"
```

## Start

```
mysqld --datadir='/Users/hadley/db/mysql'
```

## Connect

```{r, eval = FALSE}
install.packages("RMySQL")
library(RMySQL)

drv <- dbDriver("MySQL")
con <- dbConnect(drv, dbname = "lahman", username = "root", password = "")
dbListTables(con)
```

# Shut down

```
mysqladmin shutdown -u root -p
```
dbplyr/vignettes/sql-translation.Rmd0000644000176200001440000003023013102722665017357 0ustar liggesusers
---
title: "SQL translation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SQL translation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

There are two components to dplyr's SQL translation system:

* translation of vector expressions like `x * y + 10`

* translation of whole verbs like `mutate()` or `summarise()`

To explore them, you'll need to load both dbplyr and dplyr:

```{r, message = FALSE}
library(dbplyr)
library(dplyr)
```

## Vectors

Most filtering, mutating or summarising operations only perform simple mathematical operations. These operations are very similar between R and SQL, so they're easy to translate. To see what's happening yourself, you can use `translate_sql()`. The basic techniques that underlie the implementation of `translate_sql()` are described in ["Advanced R"](http://adv-r.had.co.nz/dsl.html). `translate_sql()` is built on top of R's parsing engine and has been carefully designed to generate correct SQL. It also protects you against SQL injection attacks by correctly escaping the strings and variable names needed by the database that you're connecting to.

The following examples work through some of the basic differences between R and SQL.

* `"` and `'` mean different things

    ```{r}
    # In SQLite variable names are escaped by double quotes:
    translate_sql(x)
    # And strings are escaped by single quotes
    translate_sql("x")
    ```

* Many functions have slightly different names

    ```{r}
    translate_sql(x == 1 && (y < 2 || z > 3))
    translate_sql(x ^ 2 < 10)
    translate_sql(x %% 2 == 10)
    ```

* And some functions have different argument orders:

    ```{r}
    translate_sql(substr(x, 5, 10))
    translate_sql(log(x, 10))
    ```
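* `%in%` becomes SQL's `IN` operator (an added sketch for illustration; the exact formatting of the generated SQL may differ):

    ```{r, eval = FALSE}
    translate_sql(x %in% c(1L, 5L, 9L))
    ```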
* R and SQL have different defaults for integers and reals. In R, 1 is a real, and 1L is an integer. In SQL, 1 is an integer, and 1.0 is a real

    ```{r}
    translate_sql(1)
    translate_sql(1L)
    ```

* If statements are translated into a case statement:

    ```{r}
    translate_sql(if (x > 5) "big" else "small")
    ```

### Known functions

dplyr knows how to convert the following R functions to SQL:

* basic math operators: `+`, `-`, `*`, `/`, `%%`, `^`
* math functions: `abs`, `acos`, `acosh`, `asin`, `asinh`, `atan`, `atan2`, `atanh`, `ceiling`, `cos`, `cosh`, `cot`, `coth`, `exp`, `floor`, `log`, `log10`, `round`, `sign`, `sin`, `sinh`, `sqrt`, `tan`, `tanh`
* logical comparisons: `<`, `<=`, `!=`, `>=`, `>`, `==`, `%in%`
* boolean operations: `&`, `&&`, `|`, `||`, `!`, `xor`
* basic aggregations: `mean`, `sum`, `min`, `max`, `sd`, `var`
* string functions: `tolower`, `toupper`, `trimws`, `nchar`, `substr`
* coerce types: `as.numeric`, `as.integer`, `as.character`

Perfect translation is not possible because databases don't have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean rather than what is done. In fact, even for functions that exist both in databases and R, you shouldn't expect results to be identical; database programmers have different priorities than R core programmers. For example, in R in order to get a higher level of numerical accuracy, `mean()` loops through the data twice. R's `mean()` also provides a `trim` option for computing trimmed means; this is something that databases do not provide. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. This means the essence of simple calls like `mean(x)` will be translated accurately, but more complicated calls like `mean(x, trim = 0.5, na.rm = TRUE)` will raise an error:

```{r, error = TRUE}
translate_sql(mean(x, na.rm = TRUE))
translate_sql(mean(x, trim = 0.1))
```

`translate_sql()` takes an optional `con` parameter. If not supplied, this causes dplyr to generate (approximately) SQL-92 compliant SQL. If supplied, dplyr uses `sql_translate_env()` to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see `vignette("new-backend")` for more details.

### Unknown functions

Any function that dplyr doesn't know how to convert is left as is. This means that database functions that are not covered by dplyr can be used directly via `translate_sql()`. Here are a couple of examples that will work with [SQLite](http://www.sqlite.org/lang_corefunc.html):

```{r}
translate_sql(glob(x, y))
translate_sql(x %like% "ab%")
```
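The same pass-through applies to database functions that take extra constant arguments. As an added sketch (using SQLite's `datetime()`; the output shape may vary by version), string arguments are escaped as usual:

```{r, eval = FALSE}
translate_sql(datetime(x, "unixepoch"))
```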
### Window functions

Things get a little trickier with window functions, because SQL's window functions are considerably more expressive than the specific variants provided by base R or dplyr. They have the form `[expression] OVER ([partition clause] [order clause] [frame clause])`:

* The __expression__ is a combination of variable names and window functions. Support for window functions varies from database to database, but most support the ranking functions, `lead`, `lag`, `nth`, `first`, `last`, `count`, `min`, `max`, `sum`, `avg` and `stddev`.

* The __partition clause__ specifies how the window function is broken down over groups. It plays an analogous role to `GROUP BY` for aggregate functions, and `group_by()` in dplyr. It is possible for different window functions to be partitioned into different groups, but not all databases support it, and neither does dplyr.

* The __order clause__ controls the ordering (when it makes a difference). This is important for the ranking functions since it specifies which variables to rank by, but it's also needed for cumulative functions and lead. Whenever you're thinking about before and after in SQL, you must always tell it which variable defines the order. If the order clause is missing when needed, some databases fail with an error message while others return non-deterministic results.

* The __frame clause__ defines which rows, or __frame__, are passed to the window function, describing which rows (relative to the current row) should be included. The frame clause provides two offsets which determine the start and end of the frame. There are three special values: -Inf means to include all preceding rows (in SQL, "unbounded preceding"), 0 means the current row ("current row"), and Inf means all following rows ("unbounded following"). The complete set of options is comprehensive, but fairly confusing, and is summarised visually below.

    ```{r}
    knitr::include_graphics("windows.png", dpi = 200)
    ```

    Of the many possible specifications, there are only three that are commonly used. They select between aggregation variants:

    * Recycled: `BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

    * Cumulative: `BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`

    * Rolling: `BETWEEN 2 PRECEDING AND 2 FOLLOWING`

    dplyr generates the frame clause based on whether you're using a recycled aggregate or a cumulative aggregate.

To see how individual window functions are translated to SQL, we can again use `translate_sql()`:

```{r}
translate_sql(mean(G))
translate_sql(rank(G))
translate_sql(ntile(G, 2))
translate_sql(lag(G))
```

If the tbl has been grouped or arranged previously in the pipeline, then dplyr will use that information to set the "partition by" and "order by" clauses. For interactive exploration, you can achieve the same effect by setting the `vars_group` and `vars_order` arguments to `translate_sql()`:

```{r}
translate_sql(cummean(G), vars_order = "year")
translate_sql(rank(), vars_group = "ID")
```

There are some challenges when translating window functions between R and SQL, because dplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you're using:

* For ranking functions, the ordering variable is the first argument: `rank(x)`, `ntile(y, 2)`. If omitted or `NULL`, will use the default ordering associated with the tbl (as set by `arrange()`).

* Accumulating aggregates only take a single argument (the vector to aggregate). To control ordering, use `order_by()`.

* Aggregates implemented in dplyr (`lead`, `lag`, `nth_value`, `first_value`, `last_value`) have an `order_by` argument. Supply it to override the default ordering.

The three options are illustrated in the snippet below:

```{r, eval = FALSE}
mutate(players,
  min_rank(yearID),
  order_by(yearID, cumsum(G)),
  lead(G, order_by = yearID)
)
```

Currently there is no way to order by multiple variables, except by setting the default ordering with `arrange()`. This will be added in a future release.
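For example, here is a sketch of that workaround (not run; it assumes a remote `players` table like the one in the snippet above, with `yearID`, `stint`, and `G` columns):

```{r, eval = FALSE}
# arrange() sets the default ordering, which cumsum()'s window clause inherits
players %>%
  arrange(yearID, stint) %>%
  mutate(career_G = cumsum(G))
```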
## Whole tables

All dplyr verbs generate a `SELECT` statement. To demonstrate, we'll make a temporary database with a couple of tables:

```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
flights <- copy_to(con, nycflights13::flights)
airports <- copy_to(con, nycflights13::airports)
```

### Single table verbs

* `select()` and `mutate()` modify the `SELECT` clause:

    ```{r}
    flights %>%
      select(contains("delay")) %>%
      show_query()

    flights %>%
      select(distance, air_time) %>%
      mutate(speed = distance / (air_time / 60)) %>%
      show_query()
    ```

    (As you can see here, the generated SQL isn't always as minimal as you might generate by hand.)

* `filter()` generates a `WHERE` clause:

    ```{r}
    flights %>%
      filter(month == 1, day == 1) %>%
      show_query()
    ```

* `arrange()` generates an `ORDER BY` clause:

    ```{r}
    flights %>%
      arrange(carrier, desc(arr_delay)) %>%
      show_query()
    ```

* `summarise()` and `group_by()` work together to generate a `GROUP BY` clause:

    ```{r}
    flights %>%
      group_by(month, day) %>%
      summarise(delay = mean(dep_delay)) %>%
      show_query()
    ```

### Dual table verbs

| R                | SQL
|------------------|------------------------------------------------------------
| `inner_join()`   | `SELECT * FROM x JOIN y ON x.a = y.a`
| `left_join()`    | `SELECT * FROM x LEFT JOIN y ON x.a = y.a`
| `right_join()`   | `SELECT * FROM x RIGHT JOIN y ON x.a = y.a`
| `full_join()`    | `SELECT * FROM x FULL JOIN y ON x.a = y.a`
| `semi_join()`    | `SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a)`
| `anti_join()`    | `SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)`
| `intersect(x, y)`| `SELECT * FROM x INTERSECT SELECT * FROM y`
| `union(x, y)`    | `SELECT * FROM x UNION SELECT * FROM y`
| `setdiff(x, y)`  | `SELECT * FROM x EXCEPT SELECT * FROM y`
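To see one of these translations in action, here is an added sketch that reuses the `flights` and `airports` tables created above (the exact SQL emitted may vary):

```{r, eval = FALSE}
flights %>%
  left_join(airports, by = c("dest" = "faa")) %>%
  show_query()
```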
`x` and `y` don't have to be tables in the same database. If you specify `copy = TRUE`, dplyr will copy the `y` table into the same location as the `x` variable. This is useful if you've downloaded a summarised dataset and determined a subset of interest that you now want the full data for. You can use `semi_join(x, y, copy = TRUE)` to upload the indices of interest to a temporary table in the same database as `x`, and then perform an efficient semi join in the database.

If you're working with large data, it may also be helpful to set `auto_index = TRUE`. That will automatically add an index on the join variables to the temporary table.

### Behind the scenes

The verb level SQL translation is implemented on top of `tbl_lazy`, which basically tracks the operations you perform in a pipeline (see `lazy-ops.R`). Turning that into a SQL query takes place in three steps:

* `sql_build()` recurses over the lazy op data structure building up query objects (`select_query()`, `join_query()`, `set_op_query()` etc) that represent the different subtypes of `SELECT` queries that we might generate.

* `sql_optimise()` takes a pass over these SQL objects, looking for potential optimisations. Currently this only involves removing subqueries where possible.

* `sql_render()` calls an SQL generation function (`sql_select()`, `sql_join()`, `sql_subquery()`, `sql_semi_join()` etc) to produce the actual SQL. Each of these functions is a generic, taking the connection as an argument, so that the details can be customised for different databases.
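As a quick illustration of the first step (an added sketch; the printed structure of the query object depends on the dbplyr version), you can inspect the intermediate query object directly:

```{r, eval = FALSE}
flights %>%
  filter(month == 1) %>%
  sql_build()
```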
dbplyr/vignettes/windows.graffle0000644000176200001440000000573412250116636016603 0ustar liggesusers
[binary data omitted]
dbplyr/vignettes/new-backend.Rmd0000644000176200001440000000644413173721605016415 0ustar liggesusers
---
title: "Adding a new DBI backend"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Adding a new DBI backend}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

This document describes how to add a new SQL backend to dbplyr. To begin:

* Ensure that you have a DBI compliant database backend. If not, you'll need to first create it by following the instructions in `vignette("backend", package = "DBI")`.

* You'll need a working knowledge of S3. Make sure that you're [familiar with the basics](http://adv-r.had.co.nz/OO-essentials.html#s3) before you start.

This document is still a work in progress, but it will hopefully get you started. I'd also strongly recommend reading the bundled source code for [SQLite](https://github.com/tidyverse/dbplyr/blob/master/R/db-sqlite.r), [MySQL](https://github.com/tidyverse/dbplyr/blob/master/R/db-mysql.r), and [PostgreSQL](https://github.com/tidyverse/dbplyr/blob/master/R/db-postgres.r).

## First steps

For interactive exploration, attach dplyr and DBI. If you're creating a package, you'll need to import dplyr and DBI.

```{r setup, message = FALSE}
library(dplyr)
library(DBI)
```

Check that you can create a tbl from a connection, like:

```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

tbl(con, "mtcars")
```

If you can't, this likely indicates some problem with the DBI methods. Use [DBItest](https://github.com/rstats-db/DBItest) to narrow down the problem.

Now is a good time to implement a method for `db_desc()`. This should briefly describe the connection, typically formatting the information returned from `dbGetInfo()`. This is what dbplyr does for Postgres connections:

```{r}
#' @export
db_desc.PostgreSQLConnection <- function(x) {
  info <- dbGetInfo(x)
  host <- if (info$host == "") "localhost" else info$host

  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}
```

## Copying, computing, collecting and collapsing

Next, check that `copy_to()`, `collapse()`, `compute()`, and `collect()` work.

* If `copy_to()` fails, it's likely you need a method for `db_write_table()`, `db_create_indexes()` or `db_analyze()`.

* If `collapse()` fails, your database has a non-standard way of constructing subqueries. Add a method for `sql_subquery()`.

* If `compute()` fails, your database has a non-standard way of saving queries in temporary tables. Add a method for `db_save_query()` (a sketch follows).
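For reference, here is a minimal sketch of what such a method can look like (added for illustration: `MyConnection` is a hypothetical class, the signature follows dplyr's `db_save_query()` generic, and the exact `CREATE TABLE` syntax will vary by database):

```{r, eval = FALSE}
#' @export
db_save_query.MyConnection <- function(con, sql, name, temporary = TRUE, ...) {
  # Sketch only: wrap the rendered query in CREATE [TEMPORARY] TABLE ... AS
  tt_sql <- dbplyr::build_sql(
    "CREATE ", if (temporary) dbplyr::sql("TEMPORARY "), "TABLE ",
    dbplyr::ident(name), " AS ", sql,
    con = con
  )
  DBI::dbExecute(con, tt_sql)
  name
}
```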
## SQL translation

Make sure you've read `vignette("sql-translation")` so you have the lay of the land.

### Verbs

Check that SQL translation for the key verbs works:

* `summarise()`, `mutate()`, `filter()` etc: powered by `sql_select()`
* `left_join()`, `inner_join()`: powered by `sql_join()`
* `semi_join()`, `anti_join()`: powered by `sql_semi_join()`
* `union()`, `intersect()`, `setdiff()`: powered by `sql_set_op()`

### Vectors

Finally, you may have to provide custom R -> SQL translation at the vector level by providing a method for `sql_translate_env()`. This function should return an object created by `sql_variant()`. See existing methods for examples.
dbplyr/README.md0000755000176200001440000000433113142433304013026 0ustar liggesusers
dbplyr
=======================================================

[![Build Status](https://travis-ci.org/tidyverse/dbplyr.svg?branch=master)](https://travis-ci.org/tidyverse/dbplyr) [![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/dbplyr)](http://cran.r-project.org/package=dbplyr) [![Coverage Status](https://img.shields.io/codecov/c/github/tidyverse/dbplyr/master.svg)](https://codecov.io/github/tidyverse/dbplyr?branch=master)

Overview
--------

dbplyr is the database backend for dplyr. If you are using dplyr to connect to databases, you generally will not need to use any functions from dbplyr, but you will need to make sure it's installed.

Issues
------

If you find any bugs, please file in [dplyr](https://github.com/tidyverse/dplyr/issues).

Installation
------------

``` r
# You can install the released version from CRAN
install.packages("dbplyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dbplyr")
```

Usage
-----

``` r
library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)

mtcars2 <- tbl(con, "mtcars")
mtcars2
#> # Source:   table<mtcars> [?? x 11]
#> # Database: sqlite 3.19.3 [:memory:]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
#>  2  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
#>  3  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
#>  4  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
#>  5  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
#>  6  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
#>  7  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
#>  8  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
#>  9  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
#> 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
#> # ...
with more rows ``` dbplyr/MD50000644000176200001440000002166113223156042012062 0ustar liggesusers124526607aed2572aa06059d3484b4de *DESCRIPTION 4ac396cdf32c44b3d07b7d6c9fd388d0 *LICENSE a3420bdfb3c11e38ebec9c772983c214 *NAMESPACE 228a514cc8e817470859e6caf4d69ea8 *NEWS.md 1f566eaecc587981286785466196bb10 *R/cache.r 664f0300770a2fa30a2176ea8f651368 *R/compat-purrr.R 89c9a0446a641b4ceb92c4e8e4e9acbc *R/data-lahman.r c8f63ee7909a9d8293007975bf46b7c7 *R/data-nycflights13.r 13a90eed48a0bb675457ebaf9a9d597d *R/db-compute.R f3912df8f51d665e9d620693026b044d *R/db-mysql.r b1f6687043cb010f2b27cd01bcf626cc *R/db-odbc-access.R 699dc0d4e6f3d054f38da1fdc7302a8b *R/db-odbc-hive.R bdbe8d943ad268d8969829c719c0a94e *R/db-odbc-impala.R 6618f4bc15c57f9b23933387dc513756 *R/db-odbc-mssql.R fe05a2dc2ba8f60e4a0664a3414794f4 *R/db-odbc-oracle.R abe6eb1f29dc35bb4976483cabd8be32 *R/db-odbc-redshift.R cd29cafcbc84ee63b056f0f141ddeae8 *R/db-odbc-teradata.R 4d0e3fe886432ffbd5ef2a760a627409 *R/db-postgres.r 451c93d4fb1bc49f75198a8b0313c6a5 *R/db-roracle.R 1903e24521ef094f244cc664442621a6 *R/db-sqlite.r 819f51d77b2b123dae45b4e35ccbcda5 *R/dbi-s3.r 35ed306f23bbb6fef95ae03c63258ca7 *R/dbplyr.R 2656b160c8cb0ca589004f7130ac7794 *R/do.r f111ff10b7843f16642e02029189332d *R/explain.r 6a0d999c025a330aefd2637f00af1c06 *R/ident.R 6f9cf85647ca03759f7f3bcf6cb0cc0b *R/lazy-ops.R 0c6e57dfaa43ce3203f77712c8e58caa *R/memdb.R 367452e7fdfe97a4fcc42b7271547081 *R/partial-eval.r 06e4b01d70e0bd760469dfff7a495d11 *R/query.r 32ec9e74991cc0599606e444086664e0 *R/remote.R 30f680d3d986096f8665aabaf90f888a *R/schema.R 03d2f1e2c02b60a2d2de622bd79e683a *R/simulate.r 69c9fb3259f50361392302cbe543433f *R/sql-build.R 60851349f86eccf7139339ebb6f3f7b9 *R/sql-escape.r b86f6a4a555d4409141f91041160a8f6 *R/sql-expr.R c66b87835ed32118e263585f54995685 *R/sql-generic.R 02a21b050f0d3f3825ab0136d48cb91e *R/sql-optimise.R 64583b01b452fa64282652a5d4d2e8bb *R/sql-query.R ad1d5f2d3489a66afb9cb9e692cfeef0 *R/sql-render.R 74e15d348b82b03627663278e458db89 *R/sql.R 24cbc6b42eac264ba6149b4f039ed122 *R/src-sql.r b382862dfa42057e269412569f0136a4 *R/src_dbi.R 3cd551921372f69cee4aeba14009d113 *R/tbl-lazy.R 8565fe0c7be39f21146f47b56a9bda5f *R/tbl-sql.r e446442199bac3c0b072e4846141cbe7 *R/test-frame.R af7a909243a02c5914f1cde70c991772 *R/testthat.r 39bb5225f2eff0a97785ed21802d0bbe *R/translate-sql-base.r 524a43a93bda38e9f7f61d9957778104 *R/translate-sql-clause.r 3ffc6b6ccabe5b7b170a945a0dca8c9c *R/translate-sql-helpers.r 81749aaab952162407d4d6bf5b903e97 *R/translate-sql-odbc.R 2324b0406c3ac25b587e15f9af04db0e *R/translate-sql-paste.R e9a2c33ce0713aaca94f2e205d022e67 *R/translate-sql-window.r 28b29f3acb966ef65a2d7328bab60c82 *R/translate-sql.r 29db7f24063b2fe14b566a59a15d1481 *R/utils-format.r b5290be2dbe90769dc56d5c95cef31f7 *R/utils.r 012854569fce7cddf8b0cf0c3314d5fe *R/window.R 47040ee865beb255a363c79c0bbcfbd4 *R/zzz.R ade198baa32223764a95aab0f622c0f0 *README.md f9ac858cef0a4dd67c5dc165499b6d0a *build/vignette.rds 0b8938d562d916ee751aaffeb371e32d *inst/doc/dbplyr.R 830cf012d65df479e01294fd3052e1f3 *inst/doc/dbplyr.Rmd 71388b7f98b99aa325042f064a73e704 *inst/doc/dbplyr.html 1c2e34b8105de8e6491299ca6297863a *inst/doc/new-backend.R 3287d66d52e13e5185f69766c0332733 *inst/doc/new-backend.Rmd f78942c411dd29cc20dd5e2a1e132a2c *inst/doc/new-backend.html 6164bad57db0da78751f32018fe355f2 *inst/doc/sql-translation.R 87d6b4a178814e5268950f515707b666 *inst/doc/sql-translation.Rmd cb7fea32489193f76b46222514e1ccf8 *inst/doc/sql-translation.html 3566c52b9c44196915a2f43dcbb1e757 *man/build_sql.Rd 
f1870a4469be2e6f78345136d41e2ef6 *man/copy_to.src_sql.Rd bf20b917054e14a24f6501d44b96797f *man/db_copy_to.Rd b11b43b40a5628a67f22b86eaefb31a3 *man/dbplyr-package.Rd d540b6ed747636a8f77c950a3a69f818 *man/do.tbl_sql.Rd 54e2b8083828cfcbd5c5f3d4d8f61278 *man/escape.Rd f624141eaf9866aff28d18cce3eefabf *man/figures/logo.png 00db753f245ca74e340091c6a5310eae *man/ident.Rd c6661194f65d8b5a83dcc0beed69cd14 *man/in_schema.Rd 04f2f5f1c2825459066709b7b3de26cc *man/join.tbl_sql.Rd 2bcc4ed126fefa63bad41955c6c0df47 *man/lahman.Rd 2b3fd5ef0c28b46dc42b510736f91cd9 *man/lazy_ops.Rd b0e44849d1ba102ec2a4c74fdfdf976f *man/memdb_frame.Rd 943f27c2c7adbdfe701213a7e7156970 *man/named_commas.Rd bd82e47ba15e98e4b9be818aa6df1192 *man/nycflights13.Rd dfbba35dec971948cab49bf296a07047 *man/partial_eval.Rd 3cd83bd0677f9b1d02f7600ec286a76f *man/remote_name.Rd e052fba850d59135142a82fbb746038a *man/sql.Rd 6ee31b69eabfcafcfed5485ab88a5e76 *man/sql_build.Rd 5a7d68f1a9bd6004354be1c1fc9e3437 *man/sql_escape_logical.Rd f6a835b296d6c2592376598c9e88f846 *man/sql_expr.Rd 6b89544baeb1c91b6e2bd3f5815b95d0 *man/sql_quote.Rd 9e3f44e152d4ddb50f7093534c2a2204 *man/sql_variant.Rd e0357a56a7c7066f3e1d9bfd3f51d69e *man/src_dbi.Rd ce65ea64aafccb5e7297d19f15413812 *man/src_sql.Rd 46b3da407b984ec92cd208909231ee24 *man/tbl_lazy.Rd 5461739939836c47d7fda64286774b00 *man/tbl_sql.Rd 344610d57a2762c58e394c3e70520b8c *man/testing.Rd 9c0ce9ceabd7d0c9960a401e77de69d3 *man/translate_sql.Rd 6e63270617f3b3c462b3a757bc4f8c3e *man/win_over.Rd dd6d5e079d5a3a748907684a2f5fd218 *man/window_order.Rd c1a44682d2c53dba690e0222461853ae *tests/testthat.R d41d8cd98f00b204e9800998ecf8427e *tests/testthat/explain-sqlite.txt 37a29834fe43519f14036fa4ed93f4bd *tests/testthat/helper-output.R a412ba356a32fbdeed5c19534e42ee45 *tests/testthat/helper-src.R 29c4e4d944cb48214990bcc5ea77912d *tests/testthat/test-arrange.r 7c51af824f93bf5093877463821e38f0 *tests/testthat/test-collect.R de11ad360fef9dd1df4c6cd26d7adca3 *tests/testthat/test-colwise.R 31419984cbeb9d1594e5a4689c5449b4 *tests/testthat/test-compute.R a41a65729aa8217658728149c9d9eab8 *tests/testthat/test-copy_to.R 49e16b30c290929b05629d4d9af59bbd *tests/testthat/test-distinct.R ee7774b41ee80b60d776ffb02cf13915 *tests/testthat/test-do.R 5c31ee1a5f7e6f244960a94d2d7aca85 *tests/testthat/test-escape.R 4d5ca1557c9d71825352a7661bd7e797 *tests/testthat/test-explain.R 33e660a16bf9612b085fce4ec53ad5e0 *tests/testthat/test-filter.r fd108b8b9d78fdf8d61db80130751410 *tests/testthat/test-group-by.r f9d8ef0c275b42ebd9595f0f1861d952 *tests/testthat/test-group-size.R 5c0dff0fae40be14cc4bf420615390ed *tests/testthat/test-ident.R 0f16491ebe909c8c1929c749e993d249 *tests/testthat/test-joins-consistent.R c30430f26a97e5d993e862bc6fa4e812 *tests/testthat/test-joins.R b60aee7a53fe9be33efeefe00c7f1d11 *tests/testthat/test-lazy-ops.R f3d9a20d15549dcf5ac821fd06d29a6b *tests/testthat/test-mutate.r d51a07a11708e1a1bc312210907c0763 *tests/testthat/test-output.R 880b82e738e400646b2c862ad85eaf10 *tests/testthat/test-partial_eval.R f31456d9e191bd735b670eae99b97b7b *tests/testthat/test-pull.R 4573b53bb30c4bfc6929ce170c7eaef0 *tests/testthat/test-remote.R 8090ca2fd971e3f3dc7ec7170369250e *tests/testthat/test-schema.R 5b6ed7e452c6ebc2ddb78f5c7dfce686 *tests/testthat/test-select.r 6d1a99a1ab2f7999d6522c4469290cf1 *tests/testthat/test-sets.R 10f6a6518cb98ba517ab87a13b1c3cd7 *tests/testthat/test-sql-build.R d0ef775810356ef6ff988060467cfd44 *tests/testthat/test-sql-escape.r 0e54f7e89311f0bf3536732969dc27e2 *tests/testthat/test-sql-expr.R 
44ac96ad52d1a78b11d158ba41707298 *tests/testthat/test-sql-optimise.R eb26e7e3c20b49bcd42d487970fcbcfd *tests/testthat/test-sql-query.R b6f96ad681f1f9d1234865e2d4ed5a20 *tests/testthat/test-sql-render.R 8fabf538bcc61506443fe2e149b84222 *tests/testthat/test-summarise.r 3a54f7df5c1a0d3ce98fd5de7c74dac4 *tests/testthat/test-tbl-sql.r 4a20be37204b2f5387165d199bb4401c *tests/testthat/test-translate-MySQL.R ced08a27b8b39f69b447b61904fda374 *tests/testthat/test-translate-access.r 81b8b0abfa66dc4cef75e4b22efe0d2e *tests/testthat/test-translate-hive.R b7e0cb2bb0bb1c3df6dd5b29093890f9 *tests/testthat/test-translate-impala.R 60ce328d2d946887eaff5dc6a29f57f4 *tests/testthat/test-translate-literals.R dae5b4b705ec96583ec2bf73689a613f *tests/testthat/test-translate-math.R 0d61cf34cd508ce2c1bc287b70ac519a *tests/testthat/test-translate-mssql.r 153dccb236a6f1675d5baf26a1a9951e *tests/testthat/test-translate-odbc.R ee0bb2a2c7f710d0cb2c34e8c8d3af5a *tests/testthat/test-translate-oracle.R 7b2433c66d4bd123e410f0056f2b2faa *tests/testthat/test-translate-postgresql.R 69f46d6835885b2132743dfdead5700c *tests/testthat/test-translate-sql-helpers.r 93ead8a6961f0bb0a47393f868adde83 *tests/testthat/test-translate-sql-paste.R 09eea3ac77c19150c5a3cbba8d907c7e *tests/testthat/test-translate-sql-window.r 56eb6fb898617a00c383fd617303cfd0 *tests/testthat/test-translate-sqlite.R 69be1e706722025d167d3b5393e012ab *tests/testthat/test-translate-teradata.r c2fdcd4f7471d372240994bed5c1fdba *tests/testthat/test-translate-vectorised.R bb13a28fb31b3207922e9529cb56c7f0 *tests/testthat/test-translate-window.R 70f6ea17721675b7be49ffd726b9d53d *tests/testthat/test-translate.r 3e522c7dc08afee2f2e698dea2b35930 *tests/testthat/test-win_over.R 830cf012d65df479e01294fd3052e1f3 *vignettes/dbplyr.Rmd 3287d66d52e13e5185f69766c0332733 *vignettes/new-backend.Rmd d821503b6e947231aa16d7ab925338f3 *vignettes/notes/mysql-setup.Rmd dd971bdd0cf9faf04337c9aa1fe7936f *vignettes/notes/postgres-setup.Rmd 87d6b4a178814e5268950f515707b666 *vignettes/sql-translation.Rmd 83cdde894e0c44ffda5a9dbae3c80092 *vignettes/windows.graffle a21bea5bdf33d4e34a9dc472f7227de7 *vignettes/windows.png
dbplyr/build/0000755000176200001440000000000013221502520012635 5ustar liggesusers
dbplyr/build/vignette.rds0000644000176200001440000000043413221502520015175 0ustar liggesusers
[binary RDS data omitted]
dbplyr/DESCRIPTION [opening fields garbled in extraction]
Depends: R (>= 3.2)
Imports: assertthat (>= 0.2.0), DBI (>= 0.7), dplyr (>= 0.7.4), glue (>= 1.2.0), methods, purrr (>= 0.2.4), R6 (>= 2.2.2), rlang (>= 0.1.6), tibble (>= 1.4.1), tidyselect (>= 0.2.2), utils
Suggests: bit64, covr, knitr, Lahman (>= 5.0.0), nycflights13 (>= 0.2.2), rmarkdown, RMariaDB (>= 1.0.2), RMySQL (>= 0.10.11), RPostgreSQL (>= 0.4.1), RSQLite (>= 2.0), testthat (>= 2.0.0)
VignetteBuilder: knitr
LazyData: yes
RoxygenNote: 6.0.1
Collate: 'cache.r' 'compat-purrr.R' 'data-lahman.r' 'data-nycflights13.r' 'db-compute.R' 'db-mysql.r' 'db-odbc-access.R' 'db-odbc-hive.R' 'db-odbc-impala.R' 'db-odbc-mssql.R' 'db-odbc-oracle.R' 'db-odbc-redshift.R' 'db-odbc-teradata.R' 'db-postgres.r' 'db-roracle.R' 'db-sqlite.r' 'dbi-s3.r' 'dbplyr.R' 'do.r' 'explain.r' 'utils.r' 'ident.R' 'lazy-ops.R' 'memdb.R' 'partial-eval.r' 'query.r' 'remote.R' 'schema.R' 'simulate.r' 'sql-build.R' 'sql-escape.r' 'sql-expr.R' 'sql-generic.R' 'sql-optimise.R' 'sql-query.R' 'sql-render.R' 'sql.R' 'src-sql.r' 'src_dbi.R' 'tbl-lazy.R' 'tbl-sql.r' 'test-frame.R' 'testthat.r' 'translate-sql-helpers.r' 'translate-sql-window.r' 'translate-sql-base.r' 'translate-sql-clause.r'
  'translate-sql-odbc.R' 'translate-sql-paste.R' 'translate-sql.r' 'utils-format.r' 'window.R' 'zzz.R'
NeedsCompilation: no
Packaged: 2017-12-29 18:11:29 UTC; hadley
Author: Hadley Wickham [aut, cre], Edgar Ruiz [aut], RStudio [cph, fnd]
Maintainer: Hadley Wickham <hadley@rstudio.com>
Repository: CRAN
Date/Publication: 2018-01-03 13:35:30 UTC
dbplyr/man/0000755000176200001440000000000013221502521012312 5ustar liggesusers
dbplyr/man/copy_to.src_sql.Rd0000644000176200001440000000462013173702011015726 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tbl-sql.r
\name{copy_to.src_sql}
\alias{copy_to.src_sql}
\title{Copy a local data frame to a DBI backend.}
\usage{
\method{copy_to}{src_sql}(dest, df, name = deparse(substitute(df)),
  overwrite = FALSE, types = NULL, temporary = TRUE,
  unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...)
}
\arguments{
\item{dest}{remote data source}

\item{df}{A local data frame, a \code{tbl_sql} from same source, or a \code{tbl_sql} from another source. If from another source, all data must transition through R in one pass, so it is only suitable for transferring small amounts of data.}

\item{name}{name for new remote table.}

\item{overwrite}{If \code{TRUE}, will overwrite an existing table with name \code{name}. If \code{FALSE}, will throw an error if \code{name} already exists.}

\item{types}{a character vector giving variable types to use for the columns. See \url{http://www.sqlite.org/datatype3.html} for available types.}

\item{temporary}{if \code{TRUE}, will create a temporary table that is local to this connection and will be automatically deleted when the connection expires}

\item{unique_indexes}{a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure.}

\item{indexes}{a list of character vectors. Each element of the list will create a new index.}

\item{analyze}{if \code{TRUE} (the default), will automatically ANALYZE the new table so that the query optimiser has useful information.}

\item{...}{other parameters passed to methods.}
}
\value{
A \code{\link[=tbl]{tbl()}} object (invisibly).
}
\description{
This \code{\link[=copy_to]{copy_to()}} method works for all DBI sources. It is useful for copying small amounts of data to a database for examples, experiments, and joins. By default, it creates temporary tables which are typically only visible to the current connection to the database.
}
\examples{
library(dplyr)

set.seed(1014)

mtcars$model <- rownames(mtcars)
mtcars2 <- src_memdb() \%>\%
  copy_to(mtcars, indexes = list("model"), overwrite = TRUE)

mtcars2 \%>\% filter(model == "Hornet 4 Drive")

cyl8 <- mtcars2 \%>\% filter(cyl == 8)
cyl8_cached <- copy_to(src_memdb(), cyl8)

# copy_to is called automatically if you set copy = TRUE
# in the join functions
df <- tibble(cyl = c(6, 8))
mtcars2 \%>\% semi_join(df, copy = TRUE)
}
dbplyr/man/figures/0000755000176200001440000000000013070761017013767 5ustar liggesusers
dbplyr/man/figures/logo.png0000644000176200001440000003543513070761016015442 0ustar liggesusers
[binary PNG data omitted]
dbplyr/man/src_dbi.Rd0000644000176200001440000000752013173736733014222 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/src_dbi.R
\name{src_dbi}
\alias{src_dbi}
\alias{tbl.src_dbi}
\alias{tbl_dbi}
\title{dplyr backend for any DBI-compatible database}
\usage{
src_dbi(con, auto_disconnect = FALSE)

\method{tbl}{src_dbi}(src, from, ...)
}
\arguments{
\item{con}{An object that inherits from \link[DBI:DBIConnection-class]{DBI::DBIConnection}, typically generated by \link[DBI:dbConnect]{DBI::dbConnect}}

\item{auto_disconnect}{Should the connection be automatically closed when the src is deleted? Set to \code{TRUE} if you initialize the connection in the call to \code{src_dbi()}. Pass \code{NA} to auto-disconnect but print a message when this happens.}

\item{src}{Either a \code{src_dbi} or \code{DBIConnection}}

\item{from}{Either a string (giving a table name) or literal \code{\link[=sql]{sql()}}.}

\item{...}{Needed for compatibility with generic; currently ignored.}
}
\value{
An S3 object with class \code{src_dbi}, \code{src_sql}, \code{src}.
}
\description{
\code{src_dbi()} is a general dplyr backend that connects to any DBI driver. \code{src_memdb()} connects to a temporary in-memory SQLite database that's useful for testing and experimenting.

You can generate a \code{tbl()} directly from the DBI connection, or go via \code{src_dbi()}.
}
\details{
All data manipulation on SQL tbls is lazy: the verbs will not actually run the query or retrieve the data unless you ask for it; they all return a new \code{tbl_dbi} object.
Use \code{\link[=compute]{compute()}} to run the query and save the results in a temporary table in the database, or use \code{\link[=collect]{collect()}} to retrieve the results to R. You can see the query with \code{\link[=show_query]{show_query()}}.

For best performance, the database should have an index on the variables that you are grouping by. Use \code{\link[=explain]{explain()}} to check that the database is using the indexes that you expect.

There is one exception: \code{\link[=do]{do()}} is not lazy since it must pull the data into R.
}
\examples{
# Basic connection using DBI -------------------------------------------
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
src <- src_dbi(con, auto_disconnect = TRUE)

# Add some data
copy_to(src, mtcars)
src
DBI::dbListTables(con)

# To retrieve a single table from a source, use `tbl()`
src \%>\% tbl("mtcars")

# You can also pass raw SQL if you want a more sophisticated query
src \%>\% tbl(sql("SELECT * FROM mtcars WHERE cyl == 8"))

# Alternatively, you can use the `src_sqlite()` helper
src2 <- src_sqlite(":memory:", create = TRUE)

# If you just want a temporary in-memory database, use src_memdb()
src3 <- src_memdb()

# To show off the full features of dplyr's database integration,
# we'll use the Lahman database. lahman_sqlite() takes care of
# creating the database.
if (has_lahman("sqlite")) {
  lahman_p <- lahman_sqlite()
  batting <- lahman_p \%>\% tbl("Batting")
  batting

  # Basic data manipulation verbs work in the same way as with a tibble
  batting \%>\% filter(yearID > 2005, G > 130)
  batting \%>\% select(playerID:lgID)
  batting \%>\% arrange(playerID, desc(yearID))
  batting \%>\% summarise(G = mean(G), n = n())

  # There are a few exceptions. For example, databases give integer results
  # when dividing one integer by another. Multiply by 1 to fix the problem
  batting \%>\%
    select(playerID:lgID, AB, R, G) \%>\%
    mutate(
      R_per_game1 = R / G,
      R_per_game2 = R * 1.0 / G
    )

  # All operations are lazy: they don't do anything until you request the
  # data, either by `print()`ing it (which shows the first ten rows),
  # or by `collect()`ing the results locally.
  system.time(recent <- filter(batting, yearID > 2010))
  system.time(collect(recent))

  # You can see the query that dplyr creates with show_query()
  batting \%>\%
    filter(G > 0) \%>\%
    group_by(playerID) \%>\%
    summarise(n = n()) \%>\%
    show_query()
}
}
dbplyr/man/testing.Rd0000644000176200001440000000160213173734276014275 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/test-frame.R
\name{testing}
\alias{testing}
\alias{test_register_src}
\alias{test_register_con}
\alias{src_test}
\alias{test_load}
\alias{test_frame}
\title{Infrastructure for testing dplyr}
\usage{
test_register_src(name, src)

test_register_con(name, ...)

src_test(name)

test_load(df, name = random_table_name(), srcs = test_srcs$get(), ignore = character())

test_frame(..., srcs = test_srcs$get(), ignore = character())
}
\description{
Register testing sources, then use \code{test_load()} to load an existing data frame into each source. To create a new table in each source, use \code{test_frame()}.
}
\examples{
\dontrun{
test_register_src("df", src_df(env = new.env()))
test_register_src("sqlite", src_sqlite(":memory:", create = TRUE))

test_frame(x = 1:3, y = 3:1)
test_load(mtcars)
}
}
\keyword{internal}
dbplyr/man/nycflights13.Rd0000644000176200001440000000153513047150410015125 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data-nycflights13.r
\name{nycflights13}
\alias{nycflights13}
\alias{nycflights13_sqlite}
\alias{nycflights13_postgres}
\alias{has_nycflights13}
\alias{copy_nycflights13}
\title{Database versions of the nycflights13 data}
\usage{
nycflights13_sqlite(path = NULL)

nycflights13_postgres(dbname = "nycflights13", ...)

has_nycflights13(type = c("sqlite", "postgresql"), ...)

copy_nycflights13(src, ...)
}
\arguments{
\item{path}{location of sqlite database file}

\item{dbname, ...}{Arguments passed on to \code{\link[=src_postgres]{src_postgres()}}}
}
\description{
These functions cache the data from the \code{nycflights13} database in a local database, for use in examples and vignettes. Indexes are created to make joining tables on natural keys efficient.
}
\keyword{internal}
dbplyr/man/sql_variant.Rd0000644000176200001440000000655613176660113015141 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/translate-sql-helpers.r, R/translate-sql-base.r, R/translate-sql-odbc.R, R/translate-sql-paste.R
\docType{data}
\name{sql_variant}
\alias{sql_variant}
\alias{sql_translator}
\alias{sql_infix}
\alias{sql_prefix}
\alias{sql_aggregate}
\alias{sql_aggregate_2}
\alias{sql_not_supported}
\alias{sql_cast}
\alias{sql_log}
\alias{sql_cot}
\alias{base_scalar}
\alias{base_agg}
\alias{base_win}
\alias{base_no_win}
\alias{base_odbc_scalar}
\alias{base_odbc_agg}
\alias{base_odbc_win}
\alias{sql_paste}
\alias{sql_paste_infix}
\title{Create an sql translator}
\usage{
sql_variant(scalar = sql_translator(), aggregate = sql_translator(), window = sql_translator())

sql_translator(..., .funs = list(), .parent = new.env(parent = emptyenv()))

sql_infix(f)

sql_prefix(f, n = NULL)

sql_aggregate(f)

sql_aggregate_2(f)

sql_not_supported(f)

sql_cast(type)

sql_log()

sql_cot()

base_scalar

base_agg

base_win

base_no_win

base_odbc_scalar

base_odbc_agg

base_odbc_win

sql_paste(default_sep, f = "CONCAT_WS")

sql_paste_infix(default_sep, op, cast)
}
\arguments{
\item{scalar, aggregate, window}{The three families of functions that an SQL variant can supply.}

\item{..., .funs}{named functions, used to add custom converters from standard R functions to sql functions. Specify individually in \code{...}, or provide a list of \code{.funs}}

\item{.parent}{the sql variant that this variant should inherit from. Defaults to \code{base_agg} which provides a standard set of mappings for the most common operators and functions.}

\item{f}{the name of the sql function as a string}

\item{n}{for \code{sql_infix()}, an optional number of arguments to expect. Will signal error if not correct.}
}
\description{
When creating a package that maps to a new SQL based src, you'll often want to provide some additional mappings from common R commands to the commands that your tbl provides. These three functions make that easy.
}
\section{Helper functions}{

\code{sql_infix()} and \code{sql_prefix()} create default SQL infix and prefix functions given the name of the SQL function. They don't perform any input checking, but do correctly escape their input, and are useful for quickly providing default wrappers for a new SQL variant.
}
\examples{
# An example of adding some mappings for the statistical functions that
# postgresql provides: http://bit.ly/K5EdTn

postgres_agg <- sql_translator(.parent = base_agg,
  cor = sql_aggregate_2("corr"),
  cov = sql_aggregate_2("covar_samp"),
  sd = sql_aggregate("stddev_samp"),
  var = sql_aggregate("var_samp")
)
postgres_var <- sql_variant(
  base_scalar,
  postgres_agg,
  base_no_win
)

# Next we have to simulate a connection that uses this variant
con <- structure(
  list(),
  class = c("TestCon", "DBITestConnection", "DBIConnection")
)
sql_translate_env.TestCon <- function(x) postgres_var

translate_sql(cor(x, y), con = con, window = FALSE)
translate_sql(sd(income / years), con = con, window = FALSE)

# Any functions not explicitly listed in the converter will be translated
# to sql as is, so you don't need to convert all functions.
translate_sql(regr_intercept(y, x), con = con)
}
\seealso{
\code{\link[=win_over]{win_over()}} for helper functions for window functions.

\code{\link[=sql]{sql()}} for an example of a more customised sql conversion function.
}
\keyword{datasets}
\keyword{internal}
dbplyr/man/named_commas.Rd0000644000176200001440000000057013036450604015236 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils.r
\name{named_commas}
\alias{named_commas}
\title{Provides a comma-separated string out of the parameters}
\usage{
named_commas(...)
}
\arguments{
\item{...}{Arguments to be constructed into the string}
}
\description{
Provides a comma-separated string out of the parameters
}
\keyword{internal}
dbplyr/man/memdb_frame.Rd0000644000176200001440000000177713221275422015056 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/memdb.R
\name{memdb_frame}
\alias{memdb_frame}
\alias{src_memdb}
\title{Create a database table in temporary in-memory database.}
\usage{
memdb_frame(..., .name = random_table_name())

src_memdb()
}
\arguments{
\item{...}{A set of name-value pairs. Arguments are evaluated sequentially, so you can refer to previously created variables. These arguments are processed with \code{\link[rlang:quos]{rlang::quos()}} and support unquote via \code{!!} and unquote-splice via \code{!!!}.}

\item{.name}{Name of table in database: defaults to a random name that's unlikely to conflict with an existing table.}
}
\description{
\code{memdb_frame()} works like \code{\link[tibble:tibble]{tibble::tibble()}}, but instead of creating a new data frame in R, it creates a table in \code{\link[=src_memdb]{src_memdb()}}.
}
\examples{
library(dplyr)
df <- memdb_frame(x = runif(100), y = runif(100))
df \%>\% arrange(x)

df \%>\% arrange(x) \%>\% show_query()
}
dbplyr/man/tbl_sql.Rd0000644000176200001440000000134613067247711014254 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tbl-sql.r
\name{tbl_sql}
\alias{tbl_sql}
\title{Create an SQL tbl (abstract)}
\usage{
tbl_sql(subclass, src, from, ..., vars = NULL)
}
\arguments{
\item{subclass}{name of subclass}

\item{...}{needed for agreement with generic. Not otherwise used.}

\item{vars}{If known, the names of the variables in the tbl. This is relatively expensive to determine automatically, so is cached throughout dplyr. However, you should usually be able to leave this blank and it will be determined from the context.}
}
\description{
Generally, you should no longer need to provide a custom \code{tbl()} method; you can use the default \code{tbl.DBIConnection} method.
}
\keyword{internal}
dbplyr/man/sql_escape_logical.Rd0000644000176200001440000000050013102115360016406 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sql-generic.R
\name{sql_escape_logical}
\alias{sql_escape_logical}
\title{More SQL generics}
\usage{
sql_escape_logical(con, x)
}
\description{
These are new, so not included in dplyr for backward compatibility purposes.
}
\keyword{internal}
dbplyr/man/do.tbl_sql.Rd0000644000176200001440000000147313102150366014653 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/do.r
\name{do.tbl_sql}
\alias{do.tbl_sql}
\title{Perform arbitrary computation on remote backend}
\usage{
\method{do}{tbl_sql}(.data, ..., .chunk_size = 10000L)
}
\arguments{
\item{.data}{a tbl}

\item{...}{Expressions to apply to each group. If named, results will be stored in a new column. If unnamed, should return a data frame. You can use \code{.} to refer to the current group. You cannot mix named and unnamed arguments.}

\item{.chunk_size}{The size of each chunk to pull into R. If this number is too big, the process will be slow because R has to allocate and free a lot of memory. If it's too small, it will be slow, because of the overhead of talking to the database.}
}
\description{
Perform arbitrary computation on remote backend
}
dbplyr/man/partial_eval.Rd0000644000176200001440000000370113062337773015263 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/partial-eval.r
\name{partial_eval}
\alias{partial_eval}
\title{Partially evaluate an expression.}
\usage{
partial_eval(call, vars = character(), env = caller_env())
}
\arguments{
\item{call}{an unevaluated expression, as produced by \code{\link[=quote]{quote()}}}

\item{vars}{character vector of variable names.}

\item{env}{environment in which to search for local values}
}
\description{
This function partially evaluates an expression, using information from the tbl to determine whether names refer to local expressions or remote variables. This simplifies SQL translation because expressions don't need to carry around their environment - all relevant information is incorporated into the expression.
}
\section{Symbol substitution}{

\code{partial_eval()} needs to guess if you're referring to a variable on the server (remote), or in the current environment (local). It's not possible to do this 100\% perfectly. \code{partial_eval()} uses the following heuristic:

\itemize{
\item If the tbl variables are known, and the symbol matches a tbl variable, then remote.
\item If the symbol is defined locally, local.
\item Otherwise, remote.
}
}

\examples{
vars <- c("year", "id")
partial_eval(quote(year > 1980), vars = vars)

ids <- c("ansonca01", "forceda01", "mathebo01")
partial_eval(quote(id \%in\% ids), vars = vars)

# You can use local to disambiguate between local and remote
# variables: otherwise remote is always preferred
year <- 1980
partial_eval(quote(year > year), vars = vars)
partial_eval(quote(year > local(year)), vars = vars)
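# An added sketch: a symbol that is neither a tbl variable nor defined
# locally is assumed to be remote
partial_eval(quote(foo + 1), vars = vars)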
# Functions are always assumed to be remote. Use local to force evaluation
# in R.
f <- function(x) x + 1
partial_eval(quote(year > f(1980)), vars = vars)
partial_eval(quote(year > local(f(1980))), vars = vars)

# For testing you can also use it with the tbl omitted
partial_eval(quote(1 + 2 * 3))
x <- 1
partial_eval(quote(x ^ y))
}
\keyword{internal}
dbplyr/man/escape.Rd0000644000176200001440000000237413102116020014040 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sql-escape.r
\name{escape}
\alias{escape}
\alias{sql_vector}
\title{Escape/quote a string.}
\usage{
escape(x, parens = NA, collapse = " ", con = NULL)

sql_vector(x, parens = NA, collapse = " ", con = NULL)
}
\arguments{
\item{x}{An object to escape. Existing sql vectors will be left as is, character vectors are escaped with single quotes, numeric vectors have trailing \code{.0} added if they're whole numbers, identifiers are escaped with double quotes.}

\item{parens, collapse}{Controls behaviour when multiple values are supplied. \code{parens} should be a logical flag, or if \code{NA}, will wrap in parens if length > 1. Default behaviour: lists are always wrapped in parens and separated by commas, identifiers are separated by commas and never wrapped, atomic vectors are separated by spaces and wrapped in parens if needed.}

\item{con}{Database connection. If not specified, uses SQL 92 conventions.}
}
\description{
Escape/quote a string.
}
\examples{
# Doubles vs. integers
escape(1:5)
escape(c(1, 5.4))

# String vs known sql vs. sql identifier
escape("X")
escape(sql("X"))
escape(ident("X"))

# Escaping is idempotent
escape("X")
escape(escape("X"))
escape(escape(escape("X")))
}
dbplyr/man/build_sql.Rd0000644000176200001440000000236413070722420014570 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sql-escape.r
\name{build_sql}
\alias{build_sql}
\title{Build a SQL string.}
\usage{
build_sql(..., .env = parent.frame(), con = sql_current_con())
}
\arguments{
\item{...}{input to convert to SQL. Use \code{\link[=sql]{sql()}} to preserve user input as is (dangerous), and \code{\link[=ident]{ident()}} to label user input as sql identifiers (safe)}

\item{.env}{the environment in which to evaluate the arguments. Should not be needed in typical use.}

\item{con}{database connection; used to select correct quoting characters.}
}
\description{
This is a convenience function that should prevent sql injection attacks (which in the context of dplyr are most likely to be accidental not deliberate) by automatically escaping all expressions in the input, while treating bare strings as sql. This is unlikely to prevent any serious attack, but should make it unlikely that you produce invalid sql.
}
\examples{
build_sql("SELECT * FROM TABLE")
x <- "TABLE"
build_sql("SELECT * FROM ", x)
build_sql("SELECT * FROM ", ident(x))
build_sql("SELECT * FROM ", sql(x))

# http://xkcd.com/327/
name <- "Robert'); DROP TABLE Students;--"
build_sql("INSERT INTO Students (Name) VALUES (", name, ")")
}
dbplyr/man/sql_quote.Rd0000644000176200001440000000104013036450604014620 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sql-escape.r
\name{sql_quote}
\alias{sql_quote}
\title{Helper function for quoting sql elements.}
\usage{
sql_quote(x, quote)
}
\arguments{
\item{x}{Character vector to escape.}

\item{quote}{Single quoting character.}
}
\description{
If the quote character is present in the string, it will be doubled. \code{NA}s will be replaced with NULL.
}
\examples{
sql_quote("abc", "'")
sql_quote("I've had a good day", "'")
sql_quote(c("abc", NA), "'")
}
\keyword{internal}
dbplyr/man/sql_expr.Rd0000644000176200001440000000131713174465635014455 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sql-expr.R
\name{sql_expr}
\alias{sql_expr}
\title{Generate SQL from R expressions}
\usage{
sql_expr(x, con = sql_current_con())
}
\arguments{
\item{x}{A quasiquoted expression}

\item{con}{An optional database connection to control the details of the translation. The default, \code{NULL}, generates ANSI SQL.}
}
\description{
Low-level building block for generating SQL from R expressions. Strings are escaped; names become bare SQL identifiers. User infix functions have \code{\\\%} stripped.
}
\examples{
sql_expr(f(x + 1))
sql_expr(f("x", "y"))
sql_expr(f(x, y))

sql_expr(cast("x" \%as\% DECIMAL))
sql_expr(round(x) \%::\% numeric)
}
dbplyr/man/dbplyr-package.Rd0000644000176200001440000000155513176640704015503 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dbplyr.R
\docType{package}
\name{dbplyr-package}
\alias{dbplyr}
\alias{dbplyr-package}
\title{dbplyr: A 'dplyr' Back End for Databases}
\description{
A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.
}
\seealso{
Useful links:
\itemize{
\item \url{https://github.com/tidyverse/dbplyr}
\item Report bugs at \url{https://github.com/tidyverse/dplyr/issues}
}
}
\author{
\strong{Maintainer}: Hadley Wickham \email{hadley@rstudio.com}

Authors:
\itemize{
\item Edgar Ruiz
}

Other contributors:
\itemize{
\item RStudio [copyright holder, funder]
}
}
\keyword{internal}
dbplyr/man/win_over.Rd0000644000176200001440000000277313174441461014446 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/translate-sql-window.r
\name{win_over}
\alias{win_over}
\alias{win_rank}
\alias{win_aggregate}
\alias{win_aggregate_2}
\alias{win_recycled}
\alias{win_cumulative}
\alias{win_absent}
\alias{win_current_group}
\alias{win_current_order}
\alias{win_current_frame}
\title{Generate SQL expression for window functions}
\usage{
win_over(expr, partition = NULL, order = NULL, frame = NULL)

win_rank(f)

win_aggregate(f)

win_aggregate_2(f)

win_cumulative(f)

win_absent(f)

win_current_group()

win_current_order()

win_current_frame()
}
\arguments{
\item{expr}{The window expression}

\item{order}{Variables to order by}

\item{frame}{A numeric vector of length two defining the frame.}

\item{f}{The name of an sql function as a string}

\item{partition}{Variables to partition over}
}
\description{
\code{win_over()} makes it easy to generate the window function specification. \code{win_absent()}, \code{win_rank()}, \code{win_aggregate()}, and \code{win_cumulative()} provide helpers for constructing common types of window functions. \code{win_current_group()} and \code{win_current_order()} allow you to access the grouping and order context set up by \code{\link[=group_by]{group_by()}} and \code{\link[=arrange]{arrange()}}.
} \examples{ win_over(sql("avg(x)")) win_over(sql("avg(x)"), "y") win_over(sql("avg(x)"), order = "y") win_over(sql("avg(x)"), order = c("x", "y")) win_over(sql("avg(x)"), frame = c(-Inf, 0), order = "y") } \keyword{internal} dbplyr/man/window_order.Rd0000644000176200001440000000134413123013250015302 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/window.R \name{window_order} \alias{window_order} \alias{window_frame} \title{Override window order and frame} \usage{ window_order(.data, ...) window_frame(.data, from = -Inf, to = Inf) } \arguments{ \item{.data}{A remote tibble} \item{...}{Name-value pairs of expressions.} \item{from, to}{Bounds of the frame.} } \description{ Override window order and frame } \examples{ library(dplyr) df <- lazy_frame(g = rep(1:2, each = 5), y = runif(10), z = 1:10) df \%>\% window_order(y) \%>\% mutate(z = cumsum(y)) \%>\% sql_build() df \%>\% group_by(g) \%>\% window_frame(-3, 0) \%>\% window_order(z) \%>\% mutate(z = sum(x)) \%>\% sql_build() } dbplyr/man/in_schema.Rd0000644000176200001440000000142513173736733014555 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/schema.R \name{in_schema} \alias{in_schema} \title{Refer to a table in a schema} \usage{ in_schema(schema, table) } \arguments{ \item{schema, table}{Names of schema and table.} } \description{ Refer to a table in a schema } \examples{ in_schema("my_schema", "my_table") # Example using schemas with SQLite con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") src <- src_dbi(con, auto_disconnect = TRUE) # Add auxilary schema tmp <- tempfile() DBI::dbExecute(con, paste0("ATTACH '", tmp, "' AS aux")) library(dplyr, warn.conflicts = FALSE) copy_to(con, iris, "df", temporary = FALSE) copy_to(con, mtcars, in_schema("aux", "df"), temporary = FALSE) con \%>\% tbl("df") con \%>\% tbl(in_schema("aux", "df")) } dbplyr/man/join.tbl_sql.Rd0000644000176200001440000001027413221446732015216 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/tbl-sql.r \name{join.tbl_sql} \alias{join.tbl_sql} \alias{inner_join.tbl_lazy} \alias{left_join.tbl_lazy} \alias{right_join.tbl_lazy} \alias{full_join.tbl_lazy} \alias{semi_join.tbl_lazy} \alias{anti_join.tbl_lazy} \title{Join sql tbls.} \usage{ \method{inner_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ...) \method{left_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ...) \method{right_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ...) \method{full_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ...) \method{semi_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE, auto_index = FALSE, ...) \method{anti_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE, auto_index = FALSE, ...) } \arguments{ \item{x}{tbls to join} \item{y}{tbls to join} \item{by}{a character vector of variables to join by. If \code{NULL}, the default, \code{*_join()} will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right (to suppress the message, simply explicitly list the variables that you want to join). To join by different variables on x and y use a named vector. 
For example, \code{by = c("a" = "b")} will match \code{x.a} to \code{y.b}.} \item{copy}{If \code{x} and \code{y} are not from the same data source, and \code{copy} is \code{TRUE}, then \code{y} will be copied into a temporary table in same database as \code{x}. \code{*_join()} will automatically run \code{ANALYZE} on the created table in the hope that this will make your queries as efficient as possible by giving more data to the query planner. This allows you to join tables across srcs, but it's a potentially expensive operation so you must opt into it.} \item{suffix}{If there are non-joined duplicate variables in \code{x} and \code{y}, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.} \item{auto_index}{if \code{copy} is \code{TRUE}, automatically create indices for the variables in \code{by}. This may speed up the join if there are matching indexes in \code{x}.} \item{...}{other parameters passed onto methods} } \description{ See \link{join} for a description of the general purpose of the functions. } \section{Implementation notes}{ Semi-joins are implemented using \code{WHERE EXISTS}, and anti-joins with \code{WHERE NOT EXISTS}. Support for semi-joins is somewhat partial: you can only create semi joins where the \code{x} and \code{y} columns are compared with \code{=} not with more general operators. } \examples{ \dontrun{ library(dplyr) if (has_lahman("sqlite")) { # Left joins ---------------------------------------------------------------- lahman_s <- lahman_sqlite() batting <- tbl(lahman_s, "Batting") team_info <- select(tbl(lahman_s, "Teams"), yearID, lgID, teamID, G, R:H) # Combine player and whole team statistics first_stint <- select(filter(batting, stint == 1), playerID:H) both <- left_join(first_stint, team_info, type = "inner", by = c("yearID", "teamID", "lgID")) head(both) explain(both) # Join with a local data frame grid <- expand.grid( teamID = c("WAS", "ATL", "PHI", "NYA"), yearID = 2010:2012) top4a <- left_join(batting, grid, copy = TRUE) explain(top4a) # Indices don't really help here because there's no matching index on # batting top4b <- left_join(batting, grid, copy = TRUE, auto_index = TRUE) explain(top4b) # Semi-joins ---------------------------------------------------------------- people <- tbl(lahman_s, "Master") # All people in the hall of fame hof <- tbl(lahman_s, "HallOfFame") semi_join(people, hof) # All people not in the hall of fame anti_join(people, hof) # Find all managers manager <- tbl(lahman_s, "Managers") semi_join(people, manager) # Find all managers in hall of fame famous_manager <- semi_join(semi_join(people, manager), hof) famous_manager explain(famous_manager) # Anti-joins ---------------------------------------------------------------- # batters without person covariates anti_join(batting, people) } } } dbplyr/man/translate_sql.Rd0000644000176200001440000000761613173721605015463 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/translate-sql.r \name{translate_sql} \alias{translate_sql} \alias{translate_sql_} \title{Translate an expression to sql.} \usage{ translate_sql(..., con = NULL, vars = character(), vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE) translate_sql_(dots, con = NULL, vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE, context = list()) } \arguments{ \item{..., dots}{Expressions to translate. \code{translate_sql()} automatically quotes them for you.
\code{translate_sql_()} expects a list of already quoted objects.} \item{con}{An optional database connection to control the details of the translation. The default, \code{NULL}, generates ANSI SQL.} \item{vars}{Deprecated. Now call \code{\link[=partial_eval]{partial_eval()}} directly.} \item{vars_group, vars_order, vars_frame}{Parameters used in the \code{OVER} expression of windowed functions.} \item{window}{Use \code{FALSE} to suppress generation of the \code{OVER} statement used for window functions. This is necessary when generating SQL for a grouped summary.} \item{context}{Use to carry information for special translation cases. For example, MS SQL needs a different conversion for is.na() in WHERE vs. SELECT clauses. Expects a list.} } \description{ Translate an expression to sql. } \section{Base translation}{ The base translator, \code{base_sql}, provides custom mappings for \code{!} (to NOT), \code{&&} and \code{&} to \code{AND}, \code{||} and \code{|} to \code{OR}, \code{^} to \code{POWER}, \code{\%\%} to \code{\%}, \code{ceiling} to \code{CEIL}, \code{mean} to \code{AVG}, \code{var} to \code{VARIANCE}, \code{tolower} to \code{LOWER}, \code{toupper} to \code{UPPER} and \code{nchar} to \code{LENGTH}. \code{c()} and \code{:} keep their usual R behaviour so you can easily create vectors that are passed to sql. All other functions will be preserved as is. R's infix functions (e.g. \code{\%like\%}) will be converted to their SQL equivalents (e.g. \code{LIKE}). You can use this to access SQL string concatenation: \code{||} is mapped to \code{OR}, but \code{\%||\%} is mapped to \code{||}. To suppress this behaviour, and force errors immediately when dplyr doesn't know how to translate a function it encounters, set the \code{dplyr.strict_sql} option to \code{TRUE}. You can also use \code{\link[=sql]{sql()}} to insert a raw sql string. } \section{SQLite translation}{ The SQLite variant currently only adds one additional function: a mapping from \code{sd()} to the SQL aggregation function \code{STDEV}.
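For example (a sketch, assuming the simulated connection returned by \code{simulate_sqlite()} can be supplied as \code{con}), \code{translate_sql(sd(x), con = simulate_sqlite(), window = FALSE)} should translate to roughly \code{STDEV("x")}.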
} \examples{ # Regular maths is translated in a very straightforward way translate_sql(x + 1) translate_sql(sin(x) + tan(y)) # Note that all variable names are escaped translate_sql(like == "x") # In ANSI SQL: "" quotes variable _names_, '' quotes strings # Logical operators are converted to their sql equivalents translate_sql(x < 5 & !(y >= 5)) # xor() doesn't have a direct SQL equivalent translate_sql(xor(x, y)) # if is translated into CASE WHEN translate_sql(if (x > 5) "big" else "small") # Infix functions are passed onto SQL with \% removed translate_sql(first \%like\% "Had\%") translate_sql(first \%is\% NULL) translate_sql(first \%in\% c("John", "Roger", "Robert")) # And be careful if you really want integers translate_sql(x == 1) translate_sql(x == 1L) # If you have an already quoted object, use translate_sql_: x <- quote(y + 1 / sin(t)) translate_sql_(list(x)) # Windowed translation -------------------------------------------- # Known window functions automatically get OVER() translate_sql(mpg > mean(mpg)) # Suppress this with window = FALSE translate_sql(mpg > mean(mpg), window = FALSE) # vars_group controls partition: translate_sql(mpg > mean(mpg), vars_group = "cyl") # and vars_order controls ordering for those functions that need it translate_sql(cumsum(mpg)) translate_sql(cumsum(mpg), vars_order = "mpg") } dbplyr/man/lahman.Rd0000644000176200001440000000327513066524540014062 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/data-lahman.r \name{lahman} \alias{lahman} \alias{lahman_sqlite} \alias{lahman_postgres} \alias{lahman_mysql} \alias{lahman_df} \alias{copy_lahman} \alias{has_lahman} \alias{lahman_srcs} \title{Cache and retrieve an \code{src_sqlite} of the Lahman baseball database.} \usage{ lahman_sqlite(path = NULL) lahman_postgres(dbname = "lahman", host = "localhost", ...) lahman_mysql(dbname = "lahman", ...) lahman_df() copy_lahman(src, ...) has_lahman(type, ...) lahman_srcs(..., quiet = NULL) } \arguments{ \item{...}{Other arguments passed to \code{src} on first load. For mysql and postgresql, the defaults assume you have a local server with \code{lahman} database already created. For \code{lahman_srcs()}, character vector of names giving srcs to generate.} \item{type}{src type.} \item{quiet}{if \code{TRUE}, suppress messages about databases failing to connect.} } \description{ This creates an interesting database using data from the Lahman baseball data source, provided by Sean Lahman at \url{http://www.seanlahman.com/baseball-archive/statistics/}, and made easily available in R through the \pkg{Lahman} package by Michael Friendly, Dennis Murphy and Martin Monkman. See the documentation for that package for documentation of the individual tables.
} \examples{ # Connect to a local sqlite database, if already created \donttest{ if (has_lahman("sqlite")) { lahman_sqlite() batting <- tbl(lahman_sqlite(), "Batting") batting } # Connect to a local postgres database with the lahman database, if available if (has_lahman("postgres")) { lahman_postgres() batting <- tbl(lahman_postgres(), "Batting") } } } \keyword{internal} dbplyr/man/sql_build.Rd0000644000176200001440000000367113053143332014572 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/sql-build.R, R/sql-optimise.R, R/sql-query.R, % R/sql-render.R \name{sql_build} \alias{sql_build} \alias{sql_optimise} \alias{select_query} \alias{join_query} \alias{semi_join_query} \alias{set_op_query} \alias{sql_render} \title{Build and render SQL from a sequence of lazy operations} \usage{ sql_build(op, con = NULL, ...) sql_optimise(x, con = NULL, ...) select_query(from, select = sql("*"), where = character(), group_by = character(), having = character(), order_by = character(), limit = NULL, distinct = FALSE) join_query(x, y, vars, type = "inner", by = NULL, suffix = c(".x", ".y")) semi_join_query(x, y, anti = FALSE, by = NULL) set_op_query(x, y, type = type) sql_render(query, con = NULL, ...) } \arguments{ \item{op}{A sequence of lazy operations} \item{con}{A database connection. The default \code{NULL} uses a set of rules that should be very similar to ANSI 92, and allows for testing without an active database connection.} \item{...}{Other arguments passed on to the methods. Not currently used.} } \description{ \code{sql_build()} creates a \code{select_query} S3 object, that is rendered to a SQL string by \code{sql_render()}. The output from \code{sql_build()} is designed to be easy to test, as it's database agnostic, and has a hierarchical structure. } \details{ \code{sql_build()} is generic over the lazy operations, \link{lazy_ops}, and generates an S3 object that represents the query. \code{sql_render()} takes a query object and then calls a function that is generic over the database. For example, \code{sql_build.op_mutate()} generates a \code{select_query}, and \code{sql_render.select_query()} calls \code{sql_select()}, which has different methods for different databases. The default methods should generate ANSI 92 SQL where possible, so backends only need to override the methods if the backend is not ANSI compliant. } \keyword{internal} dbplyr/man/src_sql.Rd0000644000176200001440000000100513052655655014266 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/src-sql.r \name{src_sql} \alias{src_sql} \title{Create a "sql src" object} \usage{ src_sql(subclass, con, ...) } \arguments{ \item{subclass}{name of subclass. "src_sql" is an abstract base class, so you must supply this value. \code{src_} is automatically prepended to the class name} \item{con}{the connection object} \item{...}{fields used by object} } \description{ Deprecated: please use \link{src_dbi} instead. } \keyword{internal} dbplyr/man/ident.Rd0000644000176200001440000000140313174404170013712 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/ident.R \name{ident} \alias{ident} \alias{ident_q} \alias{is.ident} \title{Flag a character vector as SQL identifiers} \usage{ ident(...) ident_q(...) is.ident(x) } \arguments{ \item{...}{A character vector, or name-value pairs} \item{x}{An object} } \description{ \code{ident()} takes unquoted strings and flags them as identifiers.
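For example, \code{escape(ident("x"))} should render with identifier (double) quotes, whereas \code{escape("x")} renders as a single-quoted string; see the examples below.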
\code{ident_q()} assumes its input has already been quoted, and ensures it does not get quoted again. This is currently used only for \code{schema.table}. } \examples{ # SQL92 quotes strings with ' escape("x") # And identifiers with " ident("x") escape(ident("x")) # You can supply multiple inputs ident(a = "x", b = "y") ident_q(a = "x", b = "y") } dbplyr/man/lazy_ops.Rd0000644000176200001440000000176213123013012014440 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/lazy-ops.R, R/window.R \name{lazy_ops} \alias{lazy_ops} \alias{op_base} \alias{op_single} \alias{add_op_single} \alias{op_double} \alias{op_grps} \alias{op_vars} \alias{op_sort} \alias{op_frame} \title{Lazy operations} \usage{ op_base(x, vars, class = character()) op_single(name, x, dots = list(), args = list()) add_op_single(name, .data, dots = list(), args = list()) op_double(name, x, y, args = list()) op_grps(op) op_vars(op) op_sort(op) op_frame(op) } \description{ This set of S3 classes describe the action of dplyr verbs. These are currently used for SQL sources to separate the description of operations in R from their computation in SQL. This API is very new so is likely to evolve in the future. } \details{ \code{op_vars()} and \code{op_grps()} compute the variables and groups from a sequence of lazy operations. \code{op_sort()} tracks the order of the data for use in window functions. } \keyword{internal} dbplyr/man/db_copy_to.Rd0000644000176200001440000000125313173662660014734 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/db-compute.R \name{db_copy_to} \alias{db_copy_to} \alias{db_compute} \alias{db_collect} \alias{db_sql_render} \title{More db generics} \usage{ db_copy_to(con, table, values, overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) db_compute(con, table, sql, temporary = TRUE, unique_indexes = list(), indexes = list(), analyze = TRUE, ...) db_collect(con, sql, n = -1, warn_incomplete = TRUE, ...) db_sql_render(con, sql, ...) } \description{ These generics are new, and so are not included in dplyr for backward-compatibility reasons. } \keyword{internal} dbplyr/man/sql.Rd0000644000176200001440000000101413102116020013375 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/sql.R \name{sql} \alias{sql} \alias{is.sql} \alias{as.sql} \title{SQL escaping.} \usage{ sql(...) is.sql(x) as.sql(x) } \arguments{ \item{...}{Character vectors that will be combined into a single SQL expression.} \item{x}{Object to coerce} } \description{ These functions are critical when writing functions that translate R functions to sql functions. Typically a conversion function should escape all its inputs and return an sql object.
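As a minimal sketch of that pattern (a hypothetical helper, not part of the package; \code{CONCAT} support varies by database): \preformatted{# Escape each input, then wrap the assembled string in sql() so it
# is passed through unchanged if it is escaped again
sql_paste2 <- function(x, y) {
  sql(paste0("CONCAT(", escape(x), ", ", escape(y), ")"))
}} Because the result is wrapped in \code{sql()}, later escaping leaves it as is.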
} dbplyr/man/tbl_lazy.Rd0000644000176200001440000000213113174404170014426 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/simulate.r, R/tbl-lazy.R \name{simulate_dbi} \alias{simulate_dbi} \alias{simulate_sqlite} \alias{simulate_postgres} \alias{simulate_mysql} \alias{simulate_odbc} \alias{simulate_impala} \alias{simulate_mssql} \alias{simulate_oracle} \alias{simulate_hive} \alias{simulate_odbc_postgresql} \alias{simulate_teradata} \alias{simulate_odbc_access} \alias{tbl_lazy} \alias{lazy_frame} \title{Create a local lazy tibble} \usage{ simulate_dbi() simulate_sqlite() simulate_postgres() simulate_mysql() simulate_odbc(type = NULL) simulate_impala() simulate_mssql() simulate_oracle() simulate_hive() simulate_odbc_postgresql() simulate_teradata() simulate_odbc_access() tbl_lazy(df, src = NULL) lazy_frame(..., src = NULL) } \description{ These functions are useful for testing SQL generation without having to have an active database connection. } \examples{ library(dplyr) df <- data.frame(x = 1, y = 2) df_sqlite <- tbl_lazy(df, src = simulate_sqlite()) df_sqlite \%>\% summarise(x = sd(x)) \%>\% show_query() } \keyword{internal} dbplyr/man/remote_name.Rd0000644000176200001440000000212113174423766015104 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/remote.R \name{remote_name} \alias{remote_name} \alias{remote_src} \alias{remote_con} \alias{remote_query} \alias{remote_query_plan} \title{Metadata about a remote table} \usage{ remote_name(x) remote_src(x) remote_con(x) remote_query(x) remote_query_plan(x) } \arguments{ \item{x}{Remote table, currently must be a \link{tbl_sql}.} } \value{ The value, or \code{NULL} if it is not a remote table or the value is not applicable. For example, computed queries do not have a "name". } \description{ \code{remote_name()} gives the name of the remote table, or \code{NULL} if it is a query. \code{remote_query()} gives the text of the query, and \code{remote_query_plan()} the query plan (as computed by the remote database). \code{remote_src()} and \code{remote_con()} give the dplyr source and DBI connection respectively. } \examples{ mf <- memdb_frame(x = 1:5, y = 5:1, .name = "blorp") remote_name(mf) remote_src(mf) remote_con(mf) remote_query(mf) mf2 <- dplyr::filter(mf, x > 3) remote_name(mf2) remote_src(mf2) remote_con(mf2) remote_query(mf2) } dbplyr/LICENSE0000644000176200001440000000005213066544374012564 0ustar liggesusersYEAR: 2013-2017 COPYRIGHT HOLDER: RStudio