---
title: "Reprexes for dbplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{reprex}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

If you're reporting a bug in dbplyr, it is much easier for me to help you if you can supply a [reprex](https://reprex.tidyverse.org) that I can run on my computer. Creating reprexes for dbplyr is particularly challenging because you are probably using a database that you can't share with me. Fortunately, in many cases you can still demonstrate the problem even if I don't have the complete dataset, or even access to the database system that you're using.

This vignette outlines three approaches for creating reprexes that will work anywhere:

* Use `memdb_frame()`/`tbl_memdb()` to easily create datasets that live in an in-memory SQLite database.

* Use `lazy_frame()`/`tbl_lazy()` to simulate SQL generation of dplyr pipelines.

* Use `translate_sql()` to simulate SQL generation of individual column expressions.

```{r setup, message = FALSE}
library(dplyr)
library(dbplyr)
```

## Using `memdb_frame()`

The first place to start is with SQLite. SQLite is particularly appealing because it's completely embedded inside an R package, so it doesn't have any external dependencies. SQLite is designed to be small and simple, so it can't demonstrate all problems, but it's easy to try out and a great place to start.
You can easily create a SQLite in-memory database table using `memdb_frame()`:

```{r}
mf <- memdb_frame(g = c(1, 1, 2, 2, 2), x = 1:5, y = 5:1)
mf

mf %>% 
  group_by(g) %>% 
  summarise_all(mean, na.rm = TRUE)
```

Reprexes are easiest to understand if you create very small custom data, but if you do want to use an existing data frame you can use `tbl_memdb()`:

```{r}
mtcars_db <- tbl_memdb(mtcars)
mtcars_db %>% 
  count(cyl) %>% 
  show_query()
```

## Translating verbs

Many problems with dbplyr come down to incorrect SQL generation. Fortunately, it's possible to generate SQL without a database using `lazy_frame()` and `tbl_lazy()`. Both take a `con` argument that takes a database "simulator" like `simulate_postgres()`, `simulate_sqlite()`, etc.

```{r}
x <- c("abc", "def", "ghif")

lazy_frame(x = x, con = simulate_postgres()) %>% 
  head(5) %>% 
  show_query()

lazy_frame(x = x, con = simulate_mssql()) %>% 
  head(5) %>% 
  show_query()
```

If you isolate the problem to incorrect SQL generation, it would be very helpful if you could also suggest more appropriate SQL.

## Translating individual expressions

In some cases, you might be able to track the problem down to incorrect translation for a single column expression. In that case, you can make your reprex even simpler with `translate_sql()`:

```{r}
translate_sql(substr(x, 1, 2), con = simulate_postgres())
translate_sql(substr(x, 1, 2), con = simulate_sqlite())
```

---
title: "Introduction to dbplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to dbplyr}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 6L, tibble.print_max = 6L, digits = 3)
```

As well as working with local in-memory data stored in data frames, dplyr also works with remote on-disk data stored in databases.
This is particularly useful in two scenarios:

* Your data is already in a database.

* You have so much data that it does not all fit into memory simultaneously and you need to use some external storage engine.

(If your data fits in memory, there is no advantage to putting it in a database: it will only be slower and more frustrating.)

This vignette focuses on the first scenario because it's the most common. If you're using R to do data analysis inside a company, most of the data you need probably already lives in a database (it's just a matter of figuring out which one!). However, you will learn how to load data into a local database in order to demonstrate dplyr's database tools. At the end, I'll also give you a few pointers if you do need to set up your own database.

## Getting started

To use databases with dplyr you need to first install dbplyr:

```{r, eval = FALSE}
install.packages("dbplyr")
```

You'll also need to install a DBI backend package. The DBI package provides a common interface that allows dplyr to work with many different databases using the same code. DBI is automatically installed with dbplyr, but you need to install a specific backend for the database that you want to connect to.

Five commonly used backends are:

* [RMariaDB](https://CRAN.R-project.org/package=RMariaDB) connects to MySQL and MariaDB.

* [RPostgres](https://CRAN.R-project.org/package=RPostgres) connects to Postgres and Redshift.

* [RSQLite](https://github.com/rstats-db/RSQLite) embeds a SQLite database.

* [odbc](https://github.com/rstats-db/odbc#odbc) connects to many commercial databases via the open database connectivity protocol.

* [bigrquery](https://github.com/rstats-db/bigrquery) connects to Google's BigQuery.

If the database you need to connect to is not listed here, you'll need to do some investigation (i.e. googling) yourself.

In this vignette, we're going to use the RSQLite backend, which is automatically installed when you install dbplyr.
SQLite is a great way to get started with databases because it's completely embedded inside an R package. Unlike most other systems, you don't need to set up a separate database server. SQLite is great for demos, but it is also surprisingly powerful, and with a little practice you can use it to easily work with many gigabytes of data.

## Connecting to the database

To work with a database in dplyr, you must first connect to it, using `DBI::dbConnect()`. We're not going to go into the details of the DBI package here, but it's the foundation upon which dbplyr is built. You'll need to learn more about it if you need to do things to the database that are beyond the scope of dplyr.

```{r setup, message = FALSE}
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
```

The arguments to `DBI::dbConnect()` vary from database to database, but the first argument is always the database backend. It's `RSQLite::SQLite()` for RSQLite, `RMariaDB::MariaDB()` for RMariaDB, `RPostgres::Postgres()` for RPostgres, `odbc::odbc()` for odbc, and `bigrquery::bigquery()` for BigQuery. SQLite only needs one other argument: the path to the database. Here we use the special string `":memory:"`, which causes SQLite to make a temporary in-memory database.

Most existing databases don't live in a file, but instead live on another server. That means that in real life your code will look more like this:

```{r, eval = FALSE}
con <- DBI::dbConnect(RMariaDB::MariaDB(), 
  host = "database.rstudio.com",
  user = "hadley",
  password = rstudioapi::askForPassword("Database password")
)
```

(If you're not using RStudio, you'll need some other way to securely retrieve your password. You should never record it in your analysis scripts or type it into the console. [Securing Credentials](https://db.rstudio.com/best-practices/managing-credentials) provides some best practices.)
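One portable option outside RStudio is to keep the password in an environment variable. This is a sketch, not the only approach; the variable name `DB_PASSWORD` is just an illustration:

```{r, eval = FALSE}
# Set DB_PASSWORD outside the script (e.g. in ~/.Renviron),
# so it never appears in the code itself.
con <- DBI::dbConnect(RMariaDB::MariaDB(),
  host = "database.rstudio.com",
  user = "hadley",
  password = Sys.getenv("DB_PASSWORD")
)
```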
Our temporary database has no data in it, so we'll start by copying over `nycflights13::flights` using the convenient `copy_to()` function. This is a quick and dirty way of getting data into a database and is useful primarily for demos and other small jobs.

```{r}
copy_to(con, nycflights13::flights, "flights",
  temporary = FALSE, 
  indexes = list(
    c("year", "month", "day"), 
    "carrier", 
    "tailnum",
    "dest"
  )
)
```

As you can see, the `copy_to()` operation has an additional argument that allows you to supply indexes for the table. Here we set up indexes that will allow us to quickly process the data by day, carrier, plane, and destination. Creating the right indices is key to good database performance, but is unfortunately beyond the scope of this article.

Now that we've copied the data, we can use `tbl()` to take a reference to it:

```{r}
flights_db <- tbl(con, "flights")
```

When you print it out, you'll notice that it mostly looks like a regular tibble:

```{r}
flights_db
```

The main difference is that you can see that it's a remote source in a SQLite database.

## Generating queries

To interact with a database you usually use SQL, the Structured Query Language. SQL is over 40 years old, and is used by pretty much every database in existence. The goal of dbplyr is to automatically generate SQL for you so that you're not forced to use it. However, SQL is a very large language and dbplyr doesn't do everything. It focusses on `SELECT` statements, the SQL you write most often as an analyst.

Most of the time you don't need to know anything about SQL, and you can continue to use the dplyr verbs that you're already familiar with:

```{r}
flights_db %>% select(year:day, dep_delay, arr_delay)

flights_db %>% filter(dep_delay > 240)

flights_db %>% 
  group_by(dest) %>%
  summarise(delay = mean(dep_time))
```

However, in the long run, I highly recommend you at least learn the basics of SQL.
It's a valuable skill for any data scientist, and it will help you debug things if you run into problems with dplyr's automatic translation. If you're completely new to SQL you might start with this [Codecademy tutorial](https://www.codecademy.com/learn/learn-sql). If you have some familiarity with SQL and you'd like to learn more, I found [how indexes work in SQLite](http://www.sqlite.org/queryplanner.html) and [10 easy steps to a complete understanding of SQL](http://blog.jooq.org/2016/03/17/10-easy-steps-to-a-complete-understanding-of-sql) to be particularly helpful.

The most important difference between ordinary data frames and remote database queries is that your R code is translated into SQL and executed in the database on the remote server, not in R on your local machine. When working with databases, dplyr tries to be as lazy as possible:

* It never pulls data into R unless you explicitly ask for it.

* It delays doing any work until the last possible moment: it collects together everything you want to do and then sends it to the database in one step.

For example, take the following code:

```{r}
tailnum_delay_db <- flights_db %>% 
  group_by(tailnum) %>%
  summarise(
    delay = mean(arr_delay),
    n = n()
  ) %>% 
  arrange(desc(delay)) %>%
  filter(n > 100)
```

Surprisingly, this sequence of operations never touches the database. It's not until you ask for the data (e.g. by printing `tailnum_delay_db`) that dplyr generates the SQL and requests the results from the database. Even then it tries to do as little work as possible and only pulls down a few rows.

```{r}
tailnum_delay_db
```

Behind the scenes, dplyr is translating your R code into SQL. You can see the SQL it's generating with `show_query()`:

```{r}
tailnum_delay_db %>% show_query()
```

If you're familiar with SQL, this probably isn't exactly what you'd write by hand, but it does the job. You can learn more about the SQL translation in `vignette("translation-verb")` and `vignette("translation-function")`.
Typically, you'll iterate a few times before you figure out what data you need from the database. Once you've figured it out, use `collect()` to pull all the data down into a local tibble:

```{r}
tailnum_delay <- tailnum_delay_db %>% collect()
tailnum_delay
```

`collect()` requires that the database do some work, so it may take a long time to complete. Otherwise, dplyr tries to prevent you from accidentally performing expensive query operations:

* Because there's generally no way to determine how many rows a query will return unless you actually run it, `nrow()` is always `NA`.

* Because you can't find the last few rows without executing the whole query, you can't use `tail()`.

```{r, error = TRUE}
nrow(tailnum_delay_db)

tail(tailnum_delay_db)
```

You can also ask the database how it plans to execute the query with `explain()`. The output is database dependent, and can be esoteric, but learning a bit about it can be very useful because it helps you understand whether the database can execute the query efficiently, or whether you need to create new indices.

## Creating your own database

If you don't already have a database, here's some advice from my experience setting up and running all of them. SQLite is by far the easiest to get started with. PostgreSQL is not too much harder to use and has a wide range of built-in functions. In my opinion, you shouldn't bother with MySQL/MariaDB: it's a pain to set up, the documentation is subpar, and it's less featureful than Postgres. Google BigQuery might be a good fit if you have very large data, or if you're willing to pay (a small amount of) money to someone who'll look after your database.

All of these databases follow a client-server model: one computer (the client) connects to the database, and another computer (the server) runs it (the two may be one and the same, but usually aren't). Getting one of these databases up and running is beyond the scope of this article, but there are plenty of tutorials available on the web.
### MySQL/MariaDB

In terms of functionality, MySQL lies somewhere between SQLite and PostgreSQL. It provides a wider range of [built-in functions](http://dev.mysql.com/doc/refman/5.0/en/functions.html) than SQLite, and it gained support for window functions in 2018.

### PostgreSQL

PostgreSQL is a considerably more powerful database than SQLite. It has a much wider range of [built-in functions](http://www.postgresql.org/docs/current/static/functions.html), and is generally a more featureful database.

### BigQuery

BigQuery is a hosted database server provided by Google. To connect, you need to provide your `project`, `dataset` and optionally a project for `billing` (if billing for `project` isn't enabled). It provides a similar set of functions to Postgres and is designed specifically for analytic workflows. Because it's a hosted solution, there's no setup involved, but if you have a lot of data, getting it to Google can be an ordeal (especially because upload support from R is not great currently). (If you have lots of data, you can [ship hard drives]()!)
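A BigQuery connection looks like the sketch below; the `project` and `dataset` values are placeholders that you'd replace with your own identifiers:

```{r, eval = FALSE}
# "my-project" and "my-dataset" are illustrative placeholders,
# not real identifiers.
con <- DBI::dbConnect(bigrquery::bigquery(),
  project = "my-project",
  dataset = "my-dataset",
  billing = "my-project"
)
```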
---
title: "Verb translation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Verb translation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

There are two parts to dbplyr SQL translation: translating dplyr verbs, and translating expressions within those verbs. This vignette describes how entire verbs are translated; `vignette("translate-function")` describes how individual expressions within those verbs are translated.

All dplyr verbs generate a `SELECT` statement. To demonstrate, we'll make a temporary database with a couple of tables:

```{r, message = FALSE}
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
flights <- copy_to(con, nycflights13::flights)
airports <- copy_to(con, nycflights13::airports)
```

## Single table verbs

* `select()` and `mutate()` modify the `SELECT` clause:

    ```{r}
    flights %>%
      select(contains("delay")) %>%
      show_query()

    flights %>%
      select(distance, air_time) %>%  
      mutate(speed = distance / (air_time / 60)) %>%
      show_query()
    ```

    (As you can see here, the generated SQL isn't always as minimal as you might generate by hand.)
* `filter()` generates a `WHERE` clause:

    ```{r}
    flights %>% 
      filter(month == 1, day == 1) %>%
      show_query()
    ```

* `arrange()` generates an `ORDER BY` clause:

    ```{r}
    flights %>% 
      arrange(carrier, desc(arr_delay)) %>%
      show_query()
    ```

* `summarise()` and `group_by()` work together to generate a `GROUP BY` clause:

    ```{r}
    flights %>%
      group_by(month, day) %>%
      summarise(delay = mean(dep_delay)) %>%
      show_query()
    ```

## Dual table verbs

| R                  | SQL
|--------------------|------------------------------------------------------------------
| `inner_join()`     | `SELECT * FROM x JOIN y ON x.a = y.a`
| `left_join()`      | `SELECT * FROM x LEFT JOIN y ON x.a = y.a`
| `right_join()`     | `SELECT * FROM x RIGHT JOIN y ON x.a = y.a`
| `full_join()`      | `SELECT * FROM x FULL JOIN y ON x.a = y.a`
| `semi_join()`      | `SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a)`
| `anti_join()`      | `SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)`
| `intersect(x, y)`  | `SELECT * FROM x INTERSECT SELECT * FROM y`
| `union(x, y)`      | `SELECT * FROM x UNION SELECT * FROM y`
| `setdiff(x, y)`    | `SELECT * FROM x EXCEPT SELECT * FROM y`

`x` and `y` don't have to be tables in the same database. If you specify `copy = TRUE`, dplyr will copy the `y` table into the same location as the `x` variable. This is useful if you've downloaded a summarised dataset and determined a subset of interest that you now want the full data for. You can use `semi_join(x, y, copy = TRUE)` to upload the indices of interest to a temporary table in the same database as `x`, and then perform an efficient semi join in the database.

If you're working with large data, it may also be helpful to set `auto_index = TRUE`. That will automatically add an index on the join variables to the temporary table.

## Behind the scenes

The verb-level SQL translation is implemented on top of `tbl_lazy`, which basically tracks the operations you perform in a pipeline (see `lazy-ops.R`).
Turning that into a SQL query takes place in three steps:

* `sql_build()` recurses over the lazy op data structure, building up query objects (`select_query()`, `join_query()`, `set_op_query()` etc) that represent the different subtypes of `SELECT` queries that we might generate.

* `sql_optimise()` takes a pass over these SQL objects, looking for potential optimisations. Currently this only involves removing subqueries where possible.

* `sql_render()` calls an SQL generation function (`sql_select()`, `sql_join()`, `sql_subquery()`, `sql_semijoin()` etc) to produce the actual SQL. Each of these functions is a generic, taking the connection as an argument, so that the details can be customised for different databases.

---
title: "Function translation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Function translation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

There are two parts to dbplyr SQL translation: translating dplyr verbs, and translating expressions within those verbs. This vignette describes how individual expressions (function calls) are translated; `vignette("translate-verb")` describes how entire verbs are translated.

```{r, message = FALSE}
library(dbplyr)
library(dplyr)
```

`dbplyr::translate_sql()` powers translation of individual function calls, and I'll use it extensively in this vignette to show what's happening. You shouldn't need to use it in ordinary code, as dbplyr takes care of the translation automatically.

```{r}
translate_sql((x + y) / 2)
```

`translate_sql()` takes an optional `con` parameter. If not supplied, this causes dplyr to generate (approximately) SQL-92 compliant SQL.
If supplied, dplyr uses `sql_translate_env()` to look up a custom environment, which makes it possible for different databases to generate slightly different SQL: see `vignette("new-backend")` for more details. You can use the various simulate helpers to see the translations used by different backends:

```{r}
translate_sql(x ^ 2L)
translate_sql(x ^ 2L, con = simulate_sqlite())
translate_sql(x ^ 2L, con = simulate_access())
```

Perfect translation is not possible because databases don't have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean, rather than precisely what is done. In fact, even for functions that exist both in databases and in R, you shouldn't expect results to be identical; database programmers have different priorities than R core programmers. For example, in R, in order to get a higher level of numerical accuracy, `mean()` loops through the data twice. R's `mean()` also provides a `trim` option for computing trimmed means; this is something that databases do not provide.

If you're interested in how `translate_sql()` is implemented, the basic techniques that underlie it are described in ["Advanced R"](http://adv-r.hadley.nz/translation.html).

## Basic differences

The following examples work through some of the basic differences between R and SQL.

* `"` and `'` mean different things:

    ```{r}
    # In SQLite variable names are escaped by double quotes:
    translate_sql(x)
    # And strings are escaped by single quotes
    translate_sql("x")
    ```

* Some functions have different argument orders:

    ```{r}
    translate_sql(substr(x, 5, 10))
    translate_sql(log(x, 10))
    ```

* R and SQL have different defaults for integers and reals. In R, 1 is a real, and 1L is an integer.
  In SQL, 1 is an integer, and 1.0 is a real:

    ```{r}
    translate_sql(1)
    translate_sql(1L)
    ```

## Known functions

### Mathematics

* basic math operators: `+`, `-`, `*`, `/`, `^`
* trigonometry: `acos()`, `asin()`, `atan()`, `atan2()`, `cos()`, `cot()`, `tan()`, `sin()`
* hyperbolic: `cosh()`, `coth()`, `sinh()`, `tanh()`
* logarithmic: `log()`, `log10()`, `exp()`
* misc: `abs()`, `ceiling()`, `sqrt()`, `sign()`, `round()`

### Modulo arithmetic

dbplyr translates `%%` and `%/%` to their SQL equivalents, but note that they are not precisely the same: most databases use truncated division, where the result of the modulo operator takes the sign of the dividend, whereas R uses the mathematically preferred floored division, where the result takes the sign of the divisor.

```{r}
df <- tibble(
  x = c(10L, 10L, -10L, -10L),
  y = c(3L, -3L, 3L, -3L)
)
mf <- src_memdb() %>% copy_to(df, overwrite = TRUE)

df %>% mutate(x %% y, x %/% y)
mf %>% mutate(x %% y, x %/% y)
```

### Logical comparisons

* logical comparisons: `<`, `<=`, `!=`, `>=`, `>`, `==`, `%in%`
* boolean operations: `&`, `&&`, `|`, `||`, `!`, `xor()`

### Aggregation

All databases provide translations for the basic aggregations: `mean()`, `sum()`, `min()`, `max()`, `sd()`, `var()`. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. The aggregation functions warn you about this important difference:

```{r}
translate_sql(mean(x))
translate_sql(mean(x, na.rm = TRUE))
```

Note that, by default, `translate_sql()` assumes that the call is inside a `mutate()` or `filter()` and generates a window translation.
If you want to see the equivalent `summarise()`/aggregation translation, use `window = FALSE`:

```{r}
translate_sql(mean(x, na.rm = TRUE), window = FALSE)
```

### Conditional evaluation

`if` and `switch()` are translated to `CASE WHEN`:

```{r}
translate_sql(if (x > 5) "big" else "small")
translate_sql(switch(x, a = 1L, b = 2L, 3L))
```

### String manipulation

* string functions: `tolower`, `toupper`, `trimws`, `nchar`, `substr`
* coerce types: `as.numeric`, `as.integer`, `as.character`

### Date/time

## Unknown functions

Any function that dplyr doesn't know how to convert is left as is. This means that database functions that are not covered by dplyr can be used directly via `translate_sql()`. Here are a couple of examples that will work with [SQLite](http://www.sqlite.org/lang_corefunc.html):

```{r}
translate_sql(glob(x, y))
translate_sql(x %like% "ab%")
```

See `vignette("sql")` for more details.

## Window functions

Things get a little trickier with window functions, because SQL's window functions are considerably more expressive than the specific variants provided by base R or dplyr. They have the form `[expression] OVER ([partition clause] [order clause] [frame_clause])`:

* The __expression__ is a combination of variable names and window functions. Support for window functions varies from database to database, but most support the ranking functions, `lead`, `lag`, `nth`, `first`, `last`, `count`, `min`, `max`, `sum`, `avg` and `stddev`.

* The __partition clause__ specifies how the window function is broken down over groups. It plays an analogous role to `GROUP BY` for aggregate functions, and `group_by()` in dplyr. It is possible for different window functions to be partitioned into different groups, but not all databases support it, and neither does dplyr.

* The __order clause__ controls the ordering (when it makes a difference).
  This is important for the ranking functions since it specifies which variables to rank by, but it's also needed for cumulative functions and `lead`. Whenever you're thinking about before and after in SQL, you must always tell it which variable defines the order. If the order clause is missing when needed, some databases fail with an error message while others return non-deterministic results.

* The __frame clause__ defines which rows, or __frame__, are passed to the window function, describing which rows (relative to the current row) should be included. The frame clause provides two offsets which determine the start and end of the frame. There are three special values: -Inf means to include all preceding rows (in SQL, "unbounded preceding"), 0 means the current row ("current row"), and Inf means all following rows ("unbounded following"). The complete set of options is comprehensive, but fairly confusing, and is summarised visually below.

```{r echo = FALSE, out.width = "100%"}
knitr::include_graphics("windows.png", dpi = 300)
```

Of the many possible specifications, there are only three that are commonly used. They select between aggregation variants:

* Recycled: `BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`
* Cumulative: `BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`
* Rolling: `BETWEEN 2 PRECEDING AND 2 FOLLOWING`

dplyr generates the frame clause based on whether you're using a recycled aggregate or a cumulative aggregate.

To see how individual window functions are translated to SQL, we can again use `translate_sql()`:

```{r}
translate_sql(mean(G))
translate_sql(rank(G))
translate_sql(ntile(G, 2))
translate_sql(lag(G))
```

If the tbl has been grouped or arranged previously in the pipeline, then dplyr will use that information to set the "partition by" and "order by" clauses.
For interactive exploration, you can achieve the same effect by setting the `vars_group` and `vars_order` arguments to `translate_sql()`:

```{r}
translate_sql(cummean(G), vars_order = "year")
translate_sql(rank(), vars_group = "ID")
```

There are some challenges when translating window functions between R and SQL, because dplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause, depending on which window function you're using:

* For ranking functions, the ordering variable is the first argument: `rank(x)`, `ntile(y, 2)`. If omitted or `NULL`, will use the default ordering associated with the tbl (as set by `arrange()`).

* Accumulating aggregates only take a single argument (the vector to aggregate). To control ordering, use `order_by()`.

* Aggregates implemented in dplyr (`lead`, `lag`, `nth_value`, `first_value`, `last_value`) have an `order_by` argument. Supply it to override the default ordering.

The three options are illustrated in the snippet below:

```{r, eval = FALSE}
mutate(players,
  min_rank(yearID),
  order_by(yearID, cumsum(G)),
  lead(G, order_by = yearID)
)
```

Currently there is no way to order by multiple variables, except by setting the default ordering with `arrange()`. This will be added in a future release.
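A sketch of that `arrange()` workaround, using `lazy_frame()` so no database is needed (the column names here are purely illustrative):

```{r}
# arrange() sets the default ordering, which the cumulative
# aggregate then picks up as its ORDER BY clause.
lazy_frame(g = 1, a = 1, b = 2, x = 3, con = simulate_postgres()) %>%
  group_by(g) %>%
  arrange(a, b) %>%
  mutate(total = cumsum(x)) %>%
  show_query()
```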

Writing SQL with dbplyr

This vignette discusses why you might use dbplyr instead of writing SQL yourself, and what to do when dbplyr’s built-in translations can’t create the SQL that you need.

library(dplyr)
library(dbplyr)

mf <- memdb_frame(x = 1, y = 2)

Why use dbplyr?

One simple nicety of dbplyr is that it will automatically generate subqueries if you want to use a freshly created variable in mutate():
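The code chunk for this example was lost when the HTML was flattened to text; restored from the vignette source:

```r
library(dplyr)
library(dbplyr)

mf <- memdb_frame(x = 1, y = 2)

# `b` refers to the freshly created `a`, so dbplyr wraps the
# first step in a subquery instead of failing
mf %>%
  mutate(
    a = y * x,
    b = a ^ 2,
  ) %>%
  show_query()
```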

In general, it’s much easier to work iteratively in dbplyr. You can easily give intermediate queries names, and reuse them in multiple places. Or if you have a common operation that you want to do to many queries, you can easily wrap it up in a function. It’s also easy to chain count() to the end of any query to check the results are about what you expect.

What happens when dbplyr fails?

dbplyr aims to translate the most common R functions to their SQL equivalents, allowing you to ignore the vagaries of the SQL dialect that you’re working with, so you can focus on the data analysis problem at hand. But different backends have different capabilities, and sometimes there are SQL functions that don’t have exact equivalents in R. In those cases, you’ll need to write SQL code directly. This section shows you how you can do so.

Prefix functions

Any function that dbplyr doesn’t know about will be left as is:

Because SQL functions are generally case insensitive, I recommend using upper case when you’re using SQL functions in R code. That makes it easier to spot that you’re doing something unusual:
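The accompanying code chunk, restored from the vignette source — the unknown function is passed through verbatim:

```r
library(dplyr)
library(dbplyr)

mf <- memdb_frame(x = 1, y = 2)

# FOOFIFY has no translation, so it is left as-is in the generated SQL
mf %>%
  mutate(z = FOOFIFY(x, y)) %>%
  show_query()
```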

Infix functions

As well as prefix functions (where the name of the function comes before the arguments), dbplyr also translates infix functions. That allows you to use expressions like LIKE which does a limited form of pattern matching:

Or use || for string concatenation (note that backends should translate paste() and paste0() for you):

Special forms

SQL functions tend to have a greater variety of syntax than R. That means there are a number of expressions that can’t be translated directly from R code. To insert these in your own queries, you can use literal SQL inside sql():
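The examples for this paragraph, restored from the vignette source:

```r
library(dplyr)
library(dbplyr)

mf <- memdb_frame(x = 1, y = 2)

# sql() injects literal SQL that has no direct R equivalent
mf %>%
  transmute(factorial = sql("x!")) %>%
  show_query()

mf %>%
  transmute(factorial = sql("CAST(x AS FLOAT)")) %>%
  show_query()
```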

Note that you can use sql() at any depth inside the expression:

dbplyr/inst/doc/sql.Rmd0000644000176200001440000000603513474056125014546 0ustar liggesusers--- title: "Writing SQL with dbplyr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Writing SQL with dbplyr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This vignette discusses why you might use dbplyr instead of writing SQL yourself, and what to do when dbplyr's built-in translations can't create the SQL that you need. ```{r setup, message = FALSE} library(dplyr) library(dbplyr) mf <- memdb_frame(x = 1, y = 2) ``` ## Why use dbplyr? One simple nicety of dplyr is that it will automatically generate subqueries if you want to use a freshly created variable in `mutate()`: ```{r} mf %>% mutate( a = y * x, b = a ^ 2, ) %>% show_query() ``` In general, it's much easier to work iteratively in dbplyr. You can easily give intermediate queries names, and reuse them in multiple places. Or if you have a common operation that you want to do to many queries, you can easily wrap it up in a function. It's also easy to chain `count()` to the end of any query to check the results are about what you expect. ## What happens when dbplyr fails? dbplyr aims to translate the most common R functions to their SQL equivalents, allowing you to ignore the vagaries of the SQL dialect that you're working with, so you can focus on the data analysis problem at hand. But different backends have different capabilities, and sometimes there are SQL functions that don't have exact equivalents in R. In those cases, you'll need to write SQL code directly. This section shows you how you can do so. ### Prefix functions Any function that dbplyr doesn't know about will be left as is: ```{r} mf %>% mutate(z = foofify(x, y)) %>% show_query() ``` Because SQL functions are general case insensitive, I recommend using upper case when you're using SQL functions in R code. 
That makes it easier to spot that you're doing something unusual: ```{r} mf %>% mutate(z = FOOFIFY(x, y)) %>% show_query() ``` ### Infix functions As well as prefix functions (where the name of the function comes before the arguments), dbplyr also translates infix functions. That allows you to use expressions like `LIKE` which does a limited form of pattern matching: ```{r} mf %>% filter(x %LIKE% "%foo%") %>% show_query() ``` Or use `||` for string concatenation (note that backends should translate `paste()` and `paste0()` for you): ```{r} mf %>% transmute(z = x %||% y) %>% show_query() ``` ### Special forms SQL functions tend to have a greater variety of syntax than R. That means there are a number of expressions that can't be translated directly from R code. To insert these in your own queries, you can use literal SQL inside `sql()`: ```{r} mf %>% transmute(factorial = sql("x!")) %>% show_query() mf %>% transmute(factorial = sql("CAST(x AS FLOAT)")) %>% show_query() ``` Note that you can use `sql()` at any depth inside the expression: ```{r} mf %>% filter(x == sql("ANY VALUES(1, 2, 3)")) %>% show_query() ``` dbplyr/inst/doc/reprex.R0000644000176200001440000000204413501765342014726 0ustar liggesusers## ---- include = FALSE---------------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ## ----setup, message = FALSE---------------------------------------------- library(dplyr) library(dbplyr) ## ------------------------------------------------------------------------ mf <- memdb_frame(g = c(1, 1, 2, 2, 2), x = 1:5, y = 5:1) mf mf %>% group_by(g) %>% summarise_all(mean, na.rm = TRUE) ## ------------------------------------------------------------------------ mtcars_db <- tbl_memdb(mtcars) mtcars_db %>% count(cyl) %>% show_query() ## ------------------------------------------------------------------------ x <- c("abc", "def", "ghif") lazy_frame(x = x, con = simulate_postgres()) %>% head(5) %>% show_query() lazy_frame(x = x, 
con = simulate_mssql()) %>% head(5) %>% show_query() ## ------------------------------------------------------------------------ translate_sql(substr(x, 1, 2), con = simulate_postgres()) translate_sql(substr(x, 1, 2), con = simulate_sqlite()) dbplyr/inst/doc/dbplyr.html0000644000176200001440000011027313501765341015463 0ustar liggesusers Introduction to dbplyr

Introduction to dbplyr

As well as working with local in-memory data stored in data frames, dplyr also works with remote on-disk data stored in databases. This is particularly useful in two scenarios:

(If your data fits in memory there is no advantage to putting it in a database: it will only be slower and more frustrating.)

This vignette focuses on the first scenario because it’s the most common. If you’re using R to do data analysis inside a company, most of the data you need probably already lives in a database (it’s just a matter of figuring out which one!). However, you will learn how to load data into a local database in order to demonstrate dplyr’s database tools. At the end, I’ll also give you a few pointers if you do need to set up your own database.

Getting started

To use databases with dplyr you need to first install dbplyr:

You’ll also need to install a DBI backend package. The DBI package provides a common interface that allows dplyr to work with many different databases using the same code. DBI is automatically installed with dbplyr, but you need to install a specific backend for the database that you want to connect to.

Five commonly used backends are:

If the database you need to connect to is not listed here, you’ll need to do some investigation (i.e. googling) yourself.

In this vignette, we’re going to use the RSQLite backend which is automatically installed when you install dbplyr. SQLite is a great way to get started with databases because it’s completely embedded inside an R package. Unlike most other systems, you don’t need to set up a separate database server. SQLite is great for demos, but is surprisingly powerful, and with a little practice you can use it to easily work with many gigabytes of data.

Connecting to the database

To work with a database in dplyr, you must first connect to it, using DBI::dbConnect(). We’re not going to go into the details of the DBI package here, but it’s the foundation upon which dbplyr is built. You’ll need to learn more about it if you need to do things to the database that are beyond the scope of dplyr.

The arguments to DBI::dbConnect() vary from database to database, but the first argument is always the database backend. It’s RSQLite::SQLite() for RSQLite, RMariaDB::MariaDB() for RMariaDB, RPostgres::Postgres() for RPostgres, odbc::odbc() for odbc, and bigrquery::bigquery() for BigQuery. SQLite only needs one other argument: the path to the database. Here we use the special string ":memory:" which causes SQLite to make a temporary in-memory database.
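A minimal sketch of that first connection (mirroring the vignette; RSQLite ships with dbplyr):

```r
library(dplyr)

# ":memory:" asks SQLite for a temporary in-memory database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
```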

Most existing databases don’t live in a file, but instead live on another server. That means in real-life that your code will look more like this:

(If you’re not using RStudio, you’ll need some other way to securely retrieve your password. You should never record it in your analysis scripts or type it into the console. Securing Credentials provides some best practices.)

Our temporary database has no data in it, so we’ll start by copying over nycflights13::flights using the convenient copy_to() function. This is a quick and dirty way of getting data into a database and is useful primarily for demos and other small jobs.
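The copy_to() call this paragraph describes, reconstructed from the vignette source (the index columns match the ones listed in the next paragraph):

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

copy_to(con, nycflights13::flights, "flights",
  temporary = FALSE,
  indexes = list(
    c("year", "month", "day"),
    "carrier",
    "tailnum",
    "dest"
  )
)
```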

As you can see, the copy_to() operation has an additional argument that allows you to supply indexes for the table. Here we set up indexes that will allow us to quickly process the data by day, carrier, plane, and destination. Creating the right indices is key to good database performance, but is unfortunately beyond the scope of this article.

Now that we’ve copied the data, we can use tbl() to take a reference to it:
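Taking that reference looks like this (assuming `con` from the connection step above):

```r
library(dplyr)

# a lazy reference: no data is pulled into R yet
flights_db <- tbl(con, "flights")
flights_db
```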

When you print it out, you’ll notice that it mostly looks like a regular tibble:

The main difference is that you can see that it’s a remote source in a SQLite database.

Generating queries

To interact with a database you usually use SQL, the Structured Query Language. SQL is over 40 years old, and is used by pretty much every database in existence. The goal of dbplyr is to automatically generate SQL for you so that you’re not forced to use it. However, SQL is a very large language and dbplyr doesn’t do everything. It focusses on SELECT statements, the SQL you write most often as an analyst.

Most of the time you don’t need to know anything about SQL, and you can continue to use the dplyr verbs that you’re already familiar with:

However, in the long-run, I highly recommend you at least learn the basics of SQL. It’s a valuable skill for any data scientist, and it will help you debug problems if you run into issues with dplyr’s automatic translation. If you’re completely new to SQL you might start with this Codecademy tutorial. If you have some familiarity with SQL and you’d like to learn more, I found how indexes work in SQLite and 10 easy steps to a complete understanding of SQL to be particularly helpful.

The most important difference between ordinary data frames and remote database queries is that your R code is translated into SQL and executed in the database on the remote server, not in R on your local machine. When working with databases, dplyr tries to be as lazy as possible:

For example, take the following code:

Surprisingly, this sequence of operations never touches the database. It’s not until you ask for the data (e.g. by printing tailnum_delay) that dplyr generates the SQL and requests the results from the database. Even then it tries to do as little work as possible and only pulls down a few rows.

Behind the scenes, dplyr is translating your R code into SQL. You can see the SQL it’s generating with show_query():

If you’re familiar with SQL, this probably isn’t exactly what you’d write by hand, but it does the job. You can learn more about the SQL translation in vignette("translation-verb") and vignette("translation-function").

Typically, you’ll iterate a few times before you figure out what data you need from the database. Once you’ve figured it out, use collect() to pull all the data down into a local tibble:

collect() requires that the database does some work, so it may take a long time to complete. Otherwise, dplyr tries to prevent you from accidentally performing expensive query operations:
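A minimal sketch of the collect step (assuming `tailnum_delay_db` from the pipeline above):

```r
library(dplyr)

# executes the accumulated query and returns a local tibble
tailnum_delay <- tailnum_delay_db %>% collect()
```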

You can also ask the database how it plans to execute the query with explain(). The output is database dependent, and can be esoteric, but learning a bit about it can be very useful because it helps you understand if the database can execute the query efficiently, or if you need to create new indices.

Creating your own database

If you don’t already have a database, here’s some advice from my experiences setting up and running all of them. SQLite is by far the easiest to get started with. PostgreSQL is not too much harder to use and has a wide range of built-in functions. In my opinion, you shouldn’t bother with MySQL/MariaDB: it’s a pain to set up, the documentation is subpar, and it’s less featureful than Postgres. Google BigQuery might be a good fit if you have very large data, or if you’re willing to pay (a small amount of) money to someone who’ll look after your database.

All of these databases follow a client-server model - a computer that connects to the database and the computer that is running the database (the two may be one and the same, but usually aren’t). Getting one of these databases up and running is beyond the scope of this article, but there are plenty of tutorials available on the web.

MySQL/MariaDB

In terms of functionality, MySQL lies somewhere between SQLite and PostgreSQL. It provides a wider range of built-in functions. It gained support for window functions in 2018.

PostgreSQL

PostgreSQL is a considerably more powerful database than SQLite. It has a much wider range of built-in functions, and is generally a more featureful database.

BigQuery

BigQuery is a hosted database server provided by Google. To connect, you need to provide your project, dataset and optionally a project for billing (if billing for the project isn’t enabled).

It provides a similar set of functions to Postgres and is designed specifically for analytic workflows. Because it’s a hosted solution, there’s no setup involved, but if you have a lot of data, getting it to Google can be an ordeal (especially because upload support from R is not great currently). (If you have lots of data, you can ship hard drives!)

dbplyr/inst/doc/translation-verb.R0000644000176200001440000000166313501765344016723 0ustar liggesusers## ---- message = FALSE---------------------------------------------------- library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") flights <- copy_to(con, nycflights13::flights) airports <- copy_to(con, nycflights13::airports) ## ------------------------------------------------------------------------ flights %>% select(contains("delay")) %>% show_query() flights %>% select(distance, air_time) %>% mutate(speed = distance / (air_time / 60)) %>% show_query() ## ------------------------------------------------------------------------ flights %>% filter(month == 1, day == 1) %>% show_query() ## ------------------------------------------------------------------------ flights %>% arrange(carrier, desc(arr_delay)) %>% show_query() ## ------------------------------------------------------------------------ flights %>% group_by(month, day) %>% summarise(delay = mean(dep_delay)) %>% show_query() dbplyr/inst/doc/reprex.html0000644000176200001440000004256413501765342015504 0ustar liggesusers Reprexes for dbplyr

Reprexes for dbplyr

If you’re reporting a bug in dbplyr, it is much easier for me to help you if you can supply a reprex that I can run on my computer. Creating reprexes for dbplyr is particularly challenging because you are probably using a database that you can’t share with me. Fortunately, in many cases you can still demonstrate the problem even if I don’t have the complete dataset, or even access to the database system that you’re using.

This vignette outlines three approaches for creating reprexes that will work anywhere:

library(dplyr)
library(dbplyr)

Using memdb_frame()

The first place to start is with SQLite. SQLite is particularly appealing because it’s completely embedded inside an R package, so it doesn’t have any external dependencies. SQLite is designed to be small and simple, so it can’t demonstrate all problems, but it’s easy to try out and a great place to start.

You can easily create a SQLite in-memory database table using memdb_frame():

Reprexes are easiest to understand if you create very small custom data, but if you do want to use an existing data frame you can use tbl_memdb():

Translating verbs

Many problems with dbplyr come down to incorrect SQL generation. Fortunately, it’s possible to generate SQL without a database using lazy_frame() and tbl_lazy(). Both take a con argument which takes a database “simulator” like simulate_postgres(), simulate_sqlite(), etc.
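The examples for this paragraph, restored from the vignette source:

```r
library(dplyr)
library(dbplyr)

x <- c("abc", "def", "ghif")

# same pipeline, two simulated backends
lazy_frame(x = x, con = simulate_postgres()) %>%
  head(5) %>%
  show_query()

lazy_frame(x = x, con = simulate_mssql()) %>%
  head(5) %>%
  show_query()
```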

If you isolate the problem to incorrect SQL generation, it would be very helpful if you could also suggest more appropriate SQL.

Translating individual expressions

In some cases, you might be able to track the problem down to incorrect translation for a single column expression. In that case, you can make your reprex even simpler with translate_sql():

dbplyr/inst/doc/sql.R0000644000176200001440000000245313501765342014224 0ustar liggesusers## ---- include = FALSE---------------------------------------------------- knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ## ----setup, message = FALSE---------------------------------------------- library(dplyr) library(dbplyr) mf <- memdb_frame(x = 1, y = 2) ## ------------------------------------------------------------------------ mf %>% mutate( a = y * x, b = a ^ 2, ) %>% show_query() ## ------------------------------------------------------------------------ mf %>% mutate(z = foofify(x, y)) %>% show_query() ## ------------------------------------------------------------------------ mf %>% mutate(z = FOOFIFY(x, y)) %>% show_query() ## ------------------------------------------------------------------------ mf %>% filter(x %LIKE% "%foo%") %>% show_query() ## ------------------------------------------------------------------------ mf %>% transmute(z = x %||% y) %>% show_query() ## ------------------------------------------------------------------------ mf %>% transmute(factorial = sql("x!")) %>% show_query() mf %>% transmute(factorial = sql("CAST(x AS FLOAT)")) %>% show_query() ## ------------------------------------------------------------------------ mf %>% filter(x == sql("ANY VALUES(1, 2, 3)")) %>% show_query() dbplyr/inst/doc/new-backend.R0000644000176200001440000000146113501765341015600 0ustar liggesusers## ---- echo = FALSE, message = FALSE-------------------------------------- knitr::opts_chunk$set(collapse = T, comment = "#>") options(tibble.print_min = 4L, tibble.print_max = 4L) ## ----setup, message = FALSE---------------------------------------------- library(dplyr) library(DBI) ## ------------------------------------------------------------------------ con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:") DBI::dbWriteTable(con, "mtcars", mtcars) tbl(con, "mtcars") ## ------------------------------------------------------------------------ #' 
@export db_desc.PostgreSQLConnection <- function(x) { info <- dbGetInfo(x) host <- if (info$host == "") "localhost" else info$host paste0("postgres ", info$serverVersion, " [", info$user, "@", host, ":", info$port, "/", info$dbname, "]") } dbplyr/inst/doc/translation-function.R0000644000176200001440000000475313501765342017613 0ustar liggesusers## ----setup, include = FALSE---------------------------------------------- knitr::opts_chunk$set(collapse = TRUE, comment = "#>") options(tibble.print_min = 4L, tibble.print_max = 4L) ## ---- message = FALSE---------------------------------------------------- library(dbplyr) library(dplyr) ## ------------------------------------------------------------------------ translate_sql((x + y) / 2) ## ------------------------------------------------------------------------ translate_sql(x ^ 2L) translate_sql(x ^ 2L, con = simulate_sqlite()) translate_sql(x ^ 2L, con = simulate_access()) ## ------------------------------------------------------------------------ # In SQLite variable names are escaped by double quotes: translate_sql(x) # And strings are escaped by single quotes translate_sql("x") ## ------------------------------------------------------------------------ translate_sql(substr(x, 5, 10)) translate_sql(log(x, 10)) ## ------------------------------------------------------------------------ translate_sql(1) translate_sql(1L) ## ------------------------------------------------------------------------ df <- tibble( x = c(10L, 10L, -10L, -10L), y = c(3L, -3L, 3L, -3L) ) mf <- src_memdb() %>% copy_to(df, overwrite = TRUE) df %>% mutate(x %% y, x %/% y) mf %>% mutate(x %% y, x %/% y) ## ------------------------------------------------------------------------ translate_sql(mean(x)) translate_sql(mean(x, na.rm = TRUE)) ## ------------------------------------------------------------------------ translate_sql(mean(x, na.rm = TRUE), window = FALSE) ## ------------------------------------------------------------------------ 
translate_sql(if (x > 5) "big" else "small") translate_sql(switch(x, a = 1L, b = 2L, 3L)) ## ------------------------------------------------------------------------ translate_sql(glob(x, y)) translate_sql(x %like% "ab%") ## ----echo = FALSE, out.width = "100%"------------------------------------ knitr::include_graphics("windows.png", dpi = 300) ## ------------------------------------------------------------------------ translate_sql(mean(G)) translate_sql(rank(G)) translate_sql(ntile(G, 2)) translate_sql(lag(G)) ## ------------------------------------------------------------------------ translate_sql(cummean(G), vars_order = "year") translate_sql(rank(), vars_group = "ID") ## ---- eval = FALSE------------------------------------------------------- # mutate(players, # min_rank(yearID), # order_by(yearID, cumsum(G)), # lead(G, order_by = yearID) # ) dbplyr/inst/doc/new-backend.Rmd0000644000176200001440000000646413457577000016134 0ustar liggesusers--- title: "Adding a new DBI backend" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Adding a new DBI backend} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r, echo = FALSE, message = FALSE} knitr::opts_chunk$set(collapse = T, comment = "#>") options(tibble.print_min = 4L, tibble.print_max = 4L) ``` This document describes how to add a new SQL backend to dbplyr. To begin: * Ensure that you have a DBI compliant database backend. If not, you'll need to first create it by following the instructions in `vignette("backend", package = "DBI")`. * You'll need a working knowledge of S3. Make sure that you're [familiar with the basics](http://adv-r.had.co.nz/OO-essentials.html#s3) before you start. This document is still a work in progress, but it will hopefully get you started. 
I'd also strongly recommend reading the bundled source code for
[SQLite](https://github.com/tidyverse/dbplyr/blob/master/R/backend-sqlite.R),
[MySQL](https://github.com/tidyverse/dbplyr/blob/master/R/backend-mysql.R), and
[PostgreSQL](https://github.com/tidyverse/dbplyr/blob/master/R/backend-postgres.R).

## First steps

For interactive exploration, attach dplyr and DBI. If you're creating a package, you'll need to import dplyr and DBI.

```{r setup, message = FALSE}
library(dplyr)
library(DBI)
```

Check that you can create a tbl from a connection, like:

```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

tbl(con, "mtcars")
```

If you can't, this likely indicates some problem with the DBI methods. Use [DBItest](https://github.com/rstats-db/DBItest) to narrow down the problem.

Now is a good time to implement a method for `db_desc()`. This should briefly describe the connection, typically formatting the information returned from `dbGetInfo()`. This is what dbplyr does for Postgres connections:

```{r}
#' @export
db_desc.PostgreSQLConnection <- function(x) {
  info <- dbGetInfo(x)
  host <- if (info$host == "") "localhost" else info$host

  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}
```

## Copying, computing, collecting and collapsing

Next, check that `copy_to()`, `collapse()`, `compute()`, and `collect()` work.

* If `copy_to()` fails, it's likely you need a method for `db_write_table()`, `db_create_indexes()` or `db_analyze()`.

* If `collapse()` fails, your database has a non-standard way of constructing subqueries. Add a method for `sql_subquery()`.

* If `compute()` fails, your database has a non-standard way of saving queries in temporary tables. Add a method for `db_save_query()`.

## SQL translation

Make sure you've read `vignette("translation-verb")` so you have the lay of the land.
### Verbs Check that SQL translation for the key verbs work: * `summarise()`, `mutate()`, `filter()` etc: powered by `sql_select()` * `left_join()`, `inner_join()`: powered by `sql_join()` * `semi_join()`, `anti_join()`: powered by `sql_semi_join()` * `union()`, `intersect()`, `setdiff()`: powered by `sql_set_op()` ### Vectors Finally, you may have to provide custom R -> SQL translation at the vector level by providing a method for `sql_translate_env()`. This function should return an object created by `sql_variant()`. See existing methods for examples. dbplyr/inst/doc/translation-function.html0000644000176200001440000031526513501765343020362 0ustar liggesusers Function translation

Function translation

There are two parts to dbplyr SQL translation: translating dplyr verbs, and translating expressions within those verbs. This vignette describes how individual expressions (function calls) are translated; vignette("translate-verb") describes how entire verbs are translated.

library(dbplyr)
library(dplyr)

dbplyr::translate_sql() powers translation of individual function calls, and I’ll use it extensively in this vignette to show what’s happening. You shouldn’t need to use it in ordinary code as dbplyr takes care of the translation automatically.

translate_sql((x + y) / 2)
#> <SQL> (`x` + `y`) / 2.0

translate_sql() takes an optional con parameter. If not supplied, this causes dplyr to generate (approximately) SQL-92 compliant SQL. If supplied, dplyr uses sql_translate_env() to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see vignette("new-backend") for more details. You can use the various simulate helpers to see the translations used by different backends:

translate_sql(x ^ 2L)
#> <SQL> POWER(`x`, 2)
translate_sql(x ^ 2L, con = simulate_sqlite())
#> <SQL> POWER(`x`, 2)
translate_sql(x ^ 2L, con = simulate_access())
#> <SQL> `x` ^ 2

Perfect translation is not possible because databases don’t have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean, rather than precisely what is done. In fact, even for functions that exist both in databases and R, you shouldn’t expect results to be identical; database programmers have different priorities than R core programmers. For example, in R in order to get a higher level of numerical accuracy, mean() loops through the data twice. R’s mean() also provides a trim option for computing trimmed means; this is something that databases do not provide.

If you’re interested in how translate_sql() is implemented, the basic techniques that underlie the implementation of translate_sql() are described in “Advanced R”.

Basic differences

The following examples work through some of the basic differences between R and SQL.

Known functions

Mathematics

  • basic math operators: +, -, *, /, ^
  • trigonometry: acos(), asin(), atan(), atan2(), cos(), cot(), tan(), sin()
  • hyperbolic: cosh(), coth(), sinh(), tanh()
  • logarithmic: log(), log10(), exp()
  • misc: abs(), ceiling(), sqrt(), sign(), round()

Modulo arithmetic

dbplyr translates %% and %/% to their SQL equivalents but note that they are not precisely the same: most databases use truncated division, where the modulo operator takes the sign of the dividend, whereas R uses the mathematically preferred floored division, with the modulo sign taking the sign of the divisor.
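A quick base R illustration of the floored behaviour (a truncating database would give results with the opposite sign for the mixed-sign cases):

```r
# R's %% result takes the sign of the divisor (floored division)
10 %% 3    # 1
-10 %% 3   # 2
10 %% -3   # -2
-10 %/% 3  # -4
```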

Logical comparisons

  • logical comparisons: <, <=, !=, >=, >, ==, %in%
  • boolean operations: &, &&, |, ||, !, xor()

Aggregation

All databases provide translations for the basic aggregations: mean(), sum(), min(), max(), sd(), var(). Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. The aggregation functions warn you about this important difference:

Note that, by default, translate_sql() assumes that the call is inside a mutate() or filter() and generates a window translation. If you want to see the equivalent summarise()/aggregation translation, use window = FALSE:
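The example from the vignette source:

```r
library(dbplyr)

# aggregate context: a plain AVG(), with no OVER () window
translate_sql(mean(x, na.rm = TRUE), window = FALSE)
```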

String manipulation

Date/time

  • string functions: tolower, toupper, trimws, nchar, substr
  • coerce types: as.numeric, as.integer, as.character

Unknown functions

Any function that dplyr doesn’t know how to convert is left as is. This means that database functions that are not covered by dplyr can be used directly via translate_sql(). Here are a couple of examples that will work with SQLite:
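The vignette's examples of pass-through translation, restored from the source:

```r
library(dbplyr)

# unknown prefix and infix functions are passed through untouched
translate_sql(glob(x, y))
translate_sql(x %like% "ab%")
```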

See vignette("sql") for more details.

Window functions

Things get a little trickier with window functions, because SQL’s window functions are considerably more expressive than the specific variants provided by base R or dplyr. They have the form [expression] OVER ([partition clause] [order clause] [frame_clause]):

To see how individual window functions are translated to SQL, we can again use translate_sql():

If the tbl has been grouped or arranged previously in the pipeline, then dplyr will use that information to set the “partition by” and “order by” clauses. For interactive exploration, you can achieve the same effect by setting the vars_group and vars_order arguments to translate_sql()

There are some challenges when translating window functions between R and SQL, because dplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you’re using:

The three options are illustrated in the snippet below:

Currently there is no way to order by multiple variables, except by setting the default ordering with arrange(). This will be added in a future release.

dbplyr/inst/doc/new-backend.html0000644000176200001440000003544513501765341016354 0ustar liggesusers Adding a new DBI backend

Adding a new DBI backend

This document describes how to add a new SQL backend to dbplyr. To begin:

This document is still a work in progress, but it will hopefully get you started. I’d also strongly recommend reading the bundled source code for SQLite, MySQL, and PostgreSQL.

First steps

For interactive exploration, attach dplyr and DBI. If you’re creating a package, you’ll need to import dplyr and DBI.

Check that you can create a tbl from a connection, like:

If you can’t, this likely indicates some problem with the DBI methods. Use DBItest to narrow down the problem.

Now is a good time to implement a method for db_desc(). This should briefly describe the connection, typically formatting the information returned from dbGetInfo(). This is what dbplyr does for Postgres connections:

Copying, computing, collecting and collapsing

Next, check that copy_to(), collapse(), compute(), and collect() work.
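A quick way to exercise all four, again with SQLite standing in for the new backend:

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
db <- copy_to(con, data.frame(x = 1:3, y = 3:1), "test")

db %>% filter(x > 1) %>% collapse()  # nests the query; nothing is computed
db %>% filter(x > 1) %>% compute()   # materialises the result in a temporary table
db %>% filter(x > 1) %>% collect()   # executes the query and returns a tibble
```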

SQL translation

Make sure you’ve read vignette("translation-verb") so you have the lay of the land.

Verbs

Check that SQL translation for the key verbs works:

  • summarise(), mutate(), filter() etc: powered by sql_select()
  • left_join(), inner_join(): powered by sql_join()
  • semi_join(), anti_join(): powered by sql_semi_join()
  • union(), intersect(), setdiff(): powered by sql_set_op()

Vectors

Finally, you may have to provide custom R -> SQL translation at the vector level by providing a method for sql_translate_env(). This function should return an object created by sql_variant(). See existing methods for examples.
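A minimal sketch of such a method, for a hypothetical connection class MyConnection (the customisations shown are illustrative, not from any real backend):

```r
sql_translate_env.MyConnection <- function(con) {
  sql_variant(
    # Scalar translations: inherit the defaults, override a few functions
    sql_translator(.parent = base_scalar,
      nchar = sql_prefix("LEN"),     # nchar(x)    -> LEN(`x`)
      paste = sql_paste(" ")         # paste(x, y) -> CONCAT_WS(' ', `x`, `y`)
    ),
    base_agg,  # aggregate translations (used by summarise())
    base_win   # window translations (used by grouped mutate()/filter())
  )
}
```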

dbplyr/tests/0000755000176200001440000000000013415745770012725 5ustar liggesusersdbplyr/tests/testthat.R0000644000176200001440000000007013415745770014705 0ustar liggesuserslibrary(testthat) library(dbplyr) test_check("dbplyr") dbplyr/tests/testthat/0000755000176200001440000000000013501770504014552 5ustar liggesusersdbplyr/tests/testthat/test-backend-odbc.R0000644000176200001440000000222213416412511020140 0ustar liggesuserscontext("test-backend-odbc.R") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_odbc()) } expect_equal(trans(as.numeric(x)), sql("CAST(`x` AS DOUBLE)")) expect_equal(trans(as.double(x)), sql("CAST(`x` AS DOUBLE)")) expect_equal(trans(as.integer(x)), sql("CAST(`x` AS INT)")) expect_equal(trans(as.character(x)), sql("CAST(`x` AS STRING)")) }) test_that("custom aggregators translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_odbc()) } expect_equal(trans(sd(x)), sql("STDDEV_SAMP(`x`)")) expect_equal(trans(count()), sql("COUNT(*)")) expect_equal(trans(n()), sql("COUNT(*)")) }) test_that("custom window functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = TRUE, con = simulate_odbc()) } expect_equal(trans(sd(x, na.rm = TRUE)), sql("STDDEV_SAMP(`x`) OVER ()")) expect_equal(trans(count()), sql("COUNT(*) OVER ()")) expect_equal(trans(n()), sql("COUNT(*) OVER ()")) }) dbplyr/tests/testthat/test-query-set-op.R0000644000176200001440000000103113426611300020211 0ustar liggesuserscontext("test-query-set-op") test_that("print method doesn't change unexpectedly", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) qry <- sql_build(union(lf1, lf2)) expect_known_output(print(qry), test_path("test-query-set-op-print.txt")) }) test_that("generated sql doesn't change unexpectedly", { lf <- lazy_frame(x = 1, y = 2) reg <- list( union = union(lf, lf), setdiff = setdiff(lf, lf), intersect = intersect(lf, 
lf) ) expect_known_output(print(reg), test_path("sql/setop.sql")) }) dbplyr/tests/testthat/test-verb-arrange.R0000644000176200001440000000353413426635320020234 0ustar liggesuserscontext("arrange") test_that("two arranges equivalent to one", { mf <- memdb_frame(x = c(2, 2, 1), y = c(1, -1, 1)) mf1 <- mf %>% arrange(x, y) mf2 <- mf %>% arrange(y) %>% arrange(x) expect_equal_tbl(mf1, mf2) }) # sql_render -------------------------------------------------------------- test_that("quoting for rendering ordered grouped table", { out <- memdb_frame(x = 1, y = 2) %>% group_by(x) %>% arrange(y) expect_match(out %>% sql_render, "^SELECT [*]\nFROM `[^`]*`\nORDER BY `y`$") expect_equal(out %>% collect, tibble(x = 1, y = 2)) }) # sql_build --------------------------------------------------------------- test_that("arrange generates order_by", { out <- lazy_frame(x = 1, y = 1) %>% arrange(x) %>% sql_build() expect_equal(out$order_by, sql('`x`')) }) test_that("arrange converts desc", { out <- lazy_frame(x = 1, y = 1) %>% arrange(desc(x)) %>% sql_build() expect_equal(out$order_by, sql('`x` DESC')) }) test_that("grouped arrange doesn't order by groups", { out <- lazy_frame(x = 1, y = 1) %>% group_by(x) %>% arrange(y) %>% sql_build() expect_equal(out$order_by, sql('`y`')) }) test_that("grouped arrange order by groups when .by_group is set to TRUE", { lf <- lazy_frame(x = 1, y = 1, con = simulate_dbi()) out <- lf %>% group_by(x) %>% arrange(y, .by_group = TRUE) %>% sql_build() expect_equal(out$order_by, sql(c('`x`','`y`'))) }) # ops --------------------------------------------------------------------- test_that("arranges captures DESC", { out <- lazy_frame(x = 1:3, y = 3:1) %>% arrange(desc(x)) expect_equal(op_sort(out), list(~desc(x))) }) test_that("multiple arranges combine", { out <- lazy_frame(x = 1:3, y = 3:1) %>% arrange(x) %>% arrange(y) out <- arrange(arrange(lazy_frame(x = 1:3, y = 3:1), x), y) expect_equal(op_sort(out), list(~x, ~y)) }) 
dbplyr/tests/testthat/test-utils.R0000644000176200001440000000160613416375022017016 0ustar liggesuserscontext("utils") test_that("deparse_trunc() expression to text", { expect_equal( deparse_trunc(expr(test)), "test" ) dt <- deparse_trunc( expr(!!paste0(rep("x", 200), collapse = "")) ) expect_equal( nchar(dt), getOption("width") ) }) test_that("Says 1.1 is not a whole number", { expect_false(is.wholenumber(1.1)) }) test_that("Succesful and not-sucessful commands are identified", { expect_true(succeeds("success")) expect_false(succeeds(x - 1)) }) test_that("Dots are collapsed into a single variable", { expect_equal( named_commas(x = 1, y = 2), "x = 1, y = 2" ) expect_equal( named_commas(1, 2), "1, 2" ) }) test_that("Correctly identifies the Travis flag", { expect_equal( in_travis(), Sys.getenv("TRAVIS") == "true" ) }) test_that("Returns error if no characters are passed", { expect_error(c_character(1, 2)) }) dbplyr/tests/testthat/test-verb-filter.R0000644000176200001440000000416513426561404020104 0ustar liggesuserscontext("filter") test_that("filter captures local variables", { mf <- memdb_frame(x = 1:5, y = 5:1) z <- 3 df1 <- mf %>% filter(x > z) %>% collect() df2 <- mf %>% collect() %>% filter(x > z) expect_equal_tbl(df1, df2) }) test_that("two filters equivalent to one", { mf <- memdb_frame(x = 1:5, y = 5:1) df1 <- mf %>% filter(x > 3) %>% filter(y < 3) df2 <- mf %>% filter(x > 3, y < 3) expect_equal_tbl(df1, df2) }) test_that("each argument gets implicit parens", { mf <- memdb_frame( v1 = c("a", "b", "a", "b"), v2 = c("b", "a", "a", "b"), v3 = c("a", "b", "c", "d") ) mf1 <- mf %>% filter((v1 == "a" | v2 == "a") & v3 == "a") mf2 <- mf %>% filter(v1 == "a" | v2 == "a", v3 == "a") expect_equal_tbl(mf1, mf2) }) # SQL generation -------------------------------------------------------- test_that("basic filter works across all backends", { dfs <- test_frame(x = 1:5, y = 5:1) dfs %>% lapply(. 
%>% filter(x > 3)) %>% expect_equal_tbls() }) test_that("filter calls windowed versions of sql functions", { dfs <- test_frame_windowed( x = 1:10, g = rep(c(1, 2), each = 5) ) dfs %>% lapply(. %>% group_by(g) %>% filter(row_number(x) < 3)) %>% expect_equal_tbls(tibble(g = c(1, 1, 2, 2), x = c(1L, 2L, 6L, 7L))) }) test_that("recycled aggregates generate window function", { dfs <- test_frame_windowed( x = 1:10, g = rep(c(1, 2), each = 5) ) dfs %>% lapply(. %>% group_by(g) %>% filter(x > mean(x, na.rm = TRUE))) %>% expect_equal_tbls(tibble(g = c(1, 1, 2, 2), x = c(4L, 5L, 9L, 10L))) }) test_that("cumulative aggregates generate window function", { dfs <- test_frame_windowed( x = c(1:3, 2:4), g = rep(c(1, 2), each = 3) ) dfs %>% lapply(. %>% group_by(g) %>% arrange(x) %>% filter(cumsum(x) > 3) ) %>% expect_equal_tbls(tibble(g = c(1, 2, 2), x = c(3L, 3L, 4L))) }) # sql_build --------------------------------------------------------------- test_that("filter generates simple expressions", { out <- lazy_frame(x = 1) %>% filter(x > 1L) %>% sql_build() expect_equal(out$where, sql('`x` > 1')) }) dbplyr/tests/testthat/test-translate-sql-window.R0000644000176200001440000000555013426334503021757 0ustar liggesuserscontext("test-translate-sql-window.r") test_that("aggregation functions warn once if na.rm = FALSE", { old <- set_current_con(simulate_dbi()) on.exit(set_current_con(old)) sql_mean <- win_aggregate("MEAN") expect_warning(sql_mean("x"), "Missing values") expect_warning(sql_mean("x"), NA) expect_warning(sql_mean("x", na.rm = TRUE), NA) }) test_that("window functions without group have empty over", { expect_equal(translate_sql(n()), sql("COUNT(*) OVER ()")) expect_equal(translate_sql(sum(x, na.rm = TRUE)), sql("SUM(`x`) OVER ()")) }) test_that("aggregating window functions ignore order_by", { expect_equal( translate_sql(n(), vars_order = "x"), sql("COUNT(*) OVER ()") ) expect_equal( translate_sql(sum(x, na.rm = TRUE), vars_order = "x"), sql("SUM(`x`) OVER ()") ) }) 
test_that("order_by overrides default ordering", { expect_equal( translate_sql(order_by(y, cumsum(x)), vars_order = "x"), sql("SUM(`x`) OVER (ORDER BY `y` ROWS UNBOUNDED PRECEDING)") ) expect_equal( translate_sql(order_by(y, cummean(x)), vars_order = "x"), sql("AVG(`x`) OVER (ORDER BY `y` ROWS UNBOUNDED PRECEDING)") ) expect_equal( translate_sql(order_by(y, cummin(x)), vars_order = "x"), sql("MIN(`x`) OVER (ORDER BY `y` ROWS UNBOUNDED PRECEDING)") ) expect_equal( translate_sql(order_by(y, cummax(x)), vars_order = "x"), sql("MAX(`x`) OVER (ORDER BY `y` ROWS UNBOUNDED PRECEDING)") ) }) test_that("cumulative windows warn if no order", { expect_warning(translate_sql(cumsum(x)), "does not have explicit order") expect_warning(translate_sql(cumsum(x), vars_order = "x"), NA) }) test_that("ntile always casts to integer", { expect_equal( translate_sql(ntile(x, 10.5)), sql("NTILE(10) OVER (ORDER BY `x`)") ) }) test_that("first, last, and nth translated to _value", { expect_equal(translate_sql(first(x)), sql("FIRST_VALUE(`x`) OVER ()")) expect_equal(translate_sql(last(x)), sql("LAST_VALUE(`x`) OVER ()")) expect_equal(translate_sql(nth(x, 1)), sql("NTH_VALUE(`x`, 1) OVER ()")) }) test_that("can override frame of recycled functions", { expect_equal( translate_sql(sum(x, na.rm = TRUE), vars_frame = c(-1, 0), vars_order = "y"), sql("SUM(`x`) OVER (ORDER BY `y` ROWS 1 PRECEDING)") ) }) # win_over ---------------------------------------------------------------- test_that("over() only requires first argument", { old <- set_current_con(simulate_dbi()) on.exit(set_current_con(old)) expect_equal(win_over("X"), sql("'X' OVER ()")) }) test_that("multiple group by or order values don't have parens", { old <- set_current_con(simulate_dbi()) on.exit(set_current_con(old)) expect_equal( win_over(ident("x"), order = c("x", "y")), sql("`x` OVER (ORDER BY `x`, `y`)") ) expect_equal( win_over(ident("x"), partition = c("x", "y")), sql("`x` OVER (PARTITION BY `x`, `y`)") ) }) 
dbplyr/tests/testthat/test-backend-oracle.R0000644000176200001440000000164613426147047020521 0ustar liggesuserscontext("test-backend-oracle.R") test_that("custom scalar functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_oracle()) } expect_equal(trans(as.character(x)), sql("CAST(`x` AS VARCHAR2(255))")) expect_equal(trans(as.integer64(x)), sql("CAST(`x` AS NUMBER(19))")) expect_equal(trans(as.double(x)), sql("CAST(`x` AS NUMBER)")) }) test_that("queries translate correctly", { mf <- lazy_frame(x = 1, con = simulate_oracle()) expect_match( mf %>% head() %>% sql_render(simulate_oracle()), sql("^SELECT [*] FROM [(]SELECT [*]\nFROM [(]`df`[)] [)] `[^`]*` WHERE ROWNUM [<][=] 6") ) }) test_that("paste and paste0 translate correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_oracle(), window = FALSE) } expect_equal(trans(paste(x, y)), sql("`x` || ' ' || `y`")) expect_equal(trans(paste0(x, y)), sql("`x` || `y`")) }) dbplyr/tests/testthat/test-translate-sql-quantile.R0000644000176200001440000000111413426635320022263 0ustar liggesuserscontext("test-translate-sql-quantile") test_that("quantile and median don't change without warning", { reg <- list( quantile = translate_sql(quantile(x, 0.75), window = FALSE), quantile_win = translate_sql(quantile(x, 0.75), vars_group = "g"), median = translate_sql(median(x), window = FALSE), median_win = translate_sql(median(x), vars_group = "g") ) expect_known_output(print(reg), test_path("sql/backend-quantile.sql")) }) test_that("checks for invalid probs", { expect_error(check_probs("a"), "numeric") expect_error(check_probs(1:3), "single value") }) dbplyr/tests/testthat/test-backend-mssql.R0000644000176200001440000001464013475552423020413 0ustar liggesuserscontext("test-backend-mssql.R") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_mssql()) } expect_equal(trans(as.logical(x)), sql("CAST(`x` AS BIT)")) 
expect_equal(trans(as.numeric(x)), sql("CAST(`x` AS NUMERIC)")) expect_equal(trans(as.double(x)), sql("CAST(`x` AS NUMERIC)")) expect_equal(trans(as.character(x)), sql("CAST(`x` AS VARCHAR(MAX))")) expect_equal(trans(log(x)), sql("LOG(`x`)")) expect_equal(trans(nchar(x)), sql("LEN(`x`)")) expect_equal(trans(atan2(x)), sql("ATN2(`x`)")) expect_equal(trans(ceiling(x)), sql("CEILING(`x`)")) expect_equal(trans(ceil(x)), sql("CEILING(`x`)")) expect_equal(trans(substr(x, 1, 2)), sql("SUBSTRING(`x`, 1, 2)")) expect_equal(trans(trimws(x)), sql("LTRIM(RTRIM(`x`))")) expect_equal(trans(paste(x, y)), sql("`x` + ' ' + `y`")) expect_error(trans(bitwShiftL(x, 2L)), sql("not available")) expect_error(trans(bitwShiftR(x, 2L)), sql("not available")) }) test_that("custom stringr functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_mssql()) } expect_equal(trans(str_length(x)), sql("LEN(`x`)")) }) test_that("custom aggregators translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_mssql()) } expect_equal(trans(sd(x, na.rm = TRUE)), sql("STDEV(`x`)")) expect_equal(trans(var(x, na.rm = TRUE)), sql("VAR(`x`)")) expect_error(trans(cor(x)), "not available") expect_error(trans(cov(x)), "not available") }) test_that("custom window functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = TRUE, con = simulate_mssql()) } expect_equal(trans(sd(x, na.rm = TRUE)), sql("STDEV(`x`) OVER ()")) expect_equal(trans(var(x, na.rm = TRUE)), sql("VAR(`x`) OVER ()")) expect_error(trans(cor(x)), "not supported") expect_error(trans(cov(x)), "not supported") }) test_that("filter and mutate translate is.na correctly", { mf <- lazy_frame(x = 1, con = simulate_mssql()) expect_equal( mf %>% head() %>% sql_render(), sql("SELECT TOP(6) *\nFROM `df`") ) expect_equal( mf %>% mutate(z = is.na(x)) %>% sql_render(), sql("SELECT `x`, CONVERT(BIT, IIF(`x` IS NULL, 1, 0)) AS `z`\nFROM `df`") ) 
expect_equal( mf %>% mutate(z = !is.na(x)) %>% sql_render(), sql("SELECT `x`, ~(CONVERT(BIT, IIF(`x` IS NULL, 1, 0))) AS `z`\nFROM `df`") ) expect_equal( mf %>% filter(is.na(x)) %>% sql_render(), sql("SELECT *\nFROM `df`\nWHERE (((`x`) IS NULL))") ) expect_equal( mf %>% mutate(x = x == 1) %>% sql_render(), sql("SELECT `x` = 1.0 AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = x != 1) %>% sql_render(), sql("SELECT `x` != 1.0 AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = x > 1) %>% sql_render(), sql("SELECT `x` > 1.0 AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = x >= 1) %>% sql_render(), sql("SELECT `x` >= 1.0 AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = !(x == 1)) %>% sql_render(), sql("SELECT ~((`x` = 1.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = !(x != 1)) %>% sql_render(), sql("SELECT ~((`x` != 1.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = !(x > 1)) %>% sql_render(), sql("SELECT ~((`x` > 1.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = !(x >= 1)) %>% sql_render(), sql("SELECT ~((`x` >= 1.0)) AS `x`\nFROM `df`") ) expect_equal( mf %>% mutate(x = x > 4 & x < 5) %>% sql_render(), sql("SELECT `x` > 4.0 & `x` < 5.0 AS `x`\nFROM `df`") ) expect_equal( mf %>% filter(x > 4 & x < 5) %>% sql_render(), sql("SELECT *\nFROM `df`\nWHERE (`x` > 4.0 AND `x` < 5.0)") ) expect_equal( mf %>% mutate(x = x > 4 | x < 5) %>% sql_render(), sql("SELECT `x` > 4.0 | `x` < 5.0 AS `x`\nFROM `df`") ) expect_equal( mf %>% filter(x > 4 | x < 5) %>% sql_render(), sql("SELECT *\nFROM `df`\nWHERE (`x` > 4.0 OR `x` < 5.0)") ) expect_equal( mf %>% mutate(x = ifelse(x == 0, 0 ,1)) %>% sql_render(), sql("SELECT CASE WHEN (`x` = 0.0) THEN (0.0) WHEN NOT(`x` = 0.0) THEN (1.0) END AS `x`\nFROM `df`") ) }) test_that("Special ifelse and case_when cases return the correct queries", { mf <- lazy_frame(x = 1, con = simulate_mssql()) expect_equal( mf %>% mutate(z = ifelse(x %in% c(1, 2), 0, 1)) %>% sql_render(), sql("SELECT `x`, CASE WHEN 
(`x` IN (1.0, 2.0)) THEN (0.0) WHEN NOT(`x` IN (1.0, 2.0)) THEN (1.0) END AS `z` FROM `df`") ) expect_equal( mf %>% mutate(z = case_when(is.na(x) ~ 1, !is.na(x) ~ 2, TRUE ~ 3)) %>% sql_render(), sql("SELECT `x`, CASE\nWHEN (((`x`) IS NULL)) THEN (1.0)\nWHEN (NOT(((`x`) IS NULL))) THEN (2.0)\nELSE (3.0)\nEND AS `z`\nFROM `df`") ) }) test_that("ORDER BY in subqueries uses TOP 100 PERCENT (#175)", { mf <- lazy_frame(x = 1:3, con = simulate_mssql()) expect_equal( mf %>% mutate(x = -x) %>% arrange(x) %>% mutate(x = -x) %>% sql_render(), sql("SELECT -`x` AS `x`\nFROM (SELECT TOP 100 PERCENT *\nFROM (SELECT TOP 100 PERCENT -`x` AS `x`\nFROM `df`) `dbplyr_001`\nORDER BY `x`) `dbplyr_002`") ) }) test_that("custom lubridate functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_mssql()) } expect_equal(trans(as_date(x)), sql("CAST(`x` AS DATE)")) expect_equal(trans(as_datetime(x)), sql("CAST(`x` AS DATETIME2)")) expect_equal(trans(today()), sql("CAST(SYSDATETIME() AS DATE)")) expect_equal(trans(year(x)), sql("DATEPART(YEAR, `x`)")) expect_equal(trans(day(x)), sql("DATEPART(DAY, `x`)")) expect_equal(trans(mday(x)), sql("DATEPART(DAY, `x`)")) expect_equal(trans(yday(x)), sql("DATEPART(DAYOFYEAR, `x`)")) expect_equal(trans(hour(x)), sql("DATEPART(HOUR, `x`)")) expect_equal(trans(minute(x)), sql("DATEPART(MINUTE, `x`)")) expect_equal(trans(second(x)), sql("DATEPART(SECOND, `x`)")) expect_equal(trans(month(x)), sql("DATEPART(MONTH, `x`)")) expect_equal(trans(month(x, label = TRUE, abbr = FALSE)), sql("DATENAME(MONTH, `x`)")) expect_error(trans(month(x, label = TRUE, abbr = TRUE))) expect_equal(trans(quarter(x)), sql("DATEPART(QUARTER, `x`)")) expect_equal(trans(quarter(x, with_year = TRUE)), sql("(DATENAME(YEAR, `x`) + '.'
+ DATENAME(QUARTER, `x`))")) expect_error(trans(quarter(x, fiscal_start = 5))) }) dbplyr/tests/testthat/test-backend-impala.R0000644000176200001440000000152113442161247020503 0ustar liggesuserscontext("test-backend-impala.R") test_that("custom scalar functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_impala()) } expect_equal(trans(as.Date(x)), sql("CAST(`x` AS VARCHAR(10))")) expect_equal(trans(ceiling(x)), sql("CEIL(`x`)")) }) test_that("custom bitwise operations translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_impala()) } expect_equal(trans(bitwNot(x)), sql("BITNOT(`x`)")) expect_equal(trans(bitwAnd(x, 128L)), sql("BITAND(`x`, 128)")) expect_equal(trans(bitwOr(x, 128L)), sql("BITOR(`x`, 128)")) expect_equal(trans(bitwXor(x, 128L)), sql("BITXOR(`x`, 128)")) expect_equal(trans(bitwShiftL(x, 2L)), sql("SHIFTLEFT(`x`, 2)")) expect_equal(trans(bitwShiftR(x, 2L)), sql("SHIFTRIGHT(`x`, 2)")) }) dbplyr/tests/testthat/test-sql-translator.txt0000644000176200001440000000003613500704060021246 0ustar liggesusers scalar: test dbplyr/tests/testthat/test-backend-mysql.R0000644000176200001440000000067613443245100020407 0ustar liggesuserscontext("test-backend-mysql.R") test_that("use CHAR type for as.character", { expect_equivalent( translate_sql(as.character(x), con = simulate_mysql()), sql("CAST(`x` AS CHAR)") ) }) test_that("logicals converted to integer correctly", { skip_if_no_db("mysql") df1 <- data.frame(x = c(TRUE, FALSE, NA)) df2 <- src_test("mysql") %>% copy_to(df1, unique_table_name()) %>% collect() expect_identical(df2$x, c(1L, 0L, NA)) }) dbplyr/tests/testthat/test-tbl-lazy.R0000644000176200001440000000144213426147047017417 0ustar liggesuserscontext("test-tbl-lazy.R") test_that("adds src class", { tb <- tbl_lazy(mtcars, con = simulate_sqlite()) expect_s3_class(tb, "tbl_SQLiteConnection") }) test_that("has print method", { expect_known_output( tbl_lazy(mtcars), 
test_path("test-tbl-lazy-print.txt"), print = TRUE ) }) test_that("support colwise variants", { mf <- memdb_frame(x = 1:5, y = factor(letters[1:5])) exp <- mf %>% collect() %>% mutate(y = as.character(y)) expect_message( mf1 <- mutate_if(mf, is.factor, as.character), "on the first 100 rows" ) expect_equal_tbl(mf1, exp) mf2 <- mutate_at(mf, "y", as.character) expect_equal_tbl(mf2, exp) }) test_that("base source of tbl_lazy is always 'df'", { out <- lazy_frame(x = 1, y = 5) %>% sql_build() expect_equal(out, ident("df")) }) dbplyr/tests/testthat/test-translate-sql-paste.R0000644000176200001440000000143613416413555021567 0ustar liggesuserscontext("test-translate-sql-paste.R") test_that("basic prefix operation", { old <- set_current_con(simulate_dbi()) on.exit(set_current_con(old)) paste <- sql_paste("") x <- ident("x") y <- ident("y") expect_equal(paste(x), sql("CONCAT_WS('', `x`)")) expect_equal(paste(x, y), sql("CONCAT_WS('', `x`, `y`)")) expect_equal(paste(x, y, sep = " "), sql("CONCAT_WS(' ', `x`, `y`)")) }) test_that("basic infix operation", { old <- set_current_con(simulate_dbi()) on.exit(set_current_con(old)) paste <- sql_paste_infix("", "&&", function(x) sql_expr(cast((!!x) %as% text))) x <- ident("x") y <- ident("y") expect_equal(paste(x), sql("CAST(`x` AS text)")) expect_equal(paste(x, y), sql("`x` && `y`")) expect_equal(paste(x, y, sep = " "), sql("`x` && ' ' && `y`")) }) dbplyr/tests/testthat/test-backend-access.R0000644000176200001440000000507313426147047020513 0ustar liggesuserscontext("test-backend-access.R") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_access()) } # Conversion expect_equal(trans(as.numeric(x)), sql("CDBL(`x`)")) expect_equal(trans(as.double(x)), sql("CDBL(`x`)")) expect_equal(trans(as.integer(x)), sql("INT(`x`)")) expect_equal(trans(as.logical(x)), sql("CBOOL(`x`)")) expect_equal(trans(as.character(x)), sql("CSTR(`x`)")) expect_equal(trans(as.Date(x)), 
sql("CDATE(`x`)")) # Math expect_equal(trans(exp(x)), sql("EXP(`x`)")) expect_equal(trans(log(x)), sql("LOG(`x`)")) expect_equal(trans(log10(x)), sql("LOG(`x`) / LOG(10)")) expect_equal(trans(sqrt(x)), sql("SQR(`x`)")) expect_equal(trans(sign(x)), sql("SGN(`x`)")) expect_equal(trans(floor(x)), sql("INT(`x`)")) expect_equal(trans(ceiling(x)), sql("INT(`x` + 0.9999999999)")) expect_equal(trans(ceil(x)), sql("INT(`x` + 0.9999999999)")) # String expect_equal(trans(nchar(x)), sql("LEN(`x`)")) expect_equal(trans(tolower(x)), sql("LCASE(`x`)")) expect_equal(trans(toupper(x)), sql("UCASE(`x`)")) expect_equal(trans(substr(x, 1, 2)), sql("RIGHT(LEFT(`x`, 2.0), 2.0)")) expect_equal(trans(paste(x)), sql("CSTR(`x`)")) expect_equal(trans(trimws(x)), sql("TRIM(`x`)")) expect_equal(trans(is.null(x)), sql("ISNULL(`x`)")) expect_equal(trans(is.na(x)), sql("ISNULL(`x`)")) expect_equal(trans(coalesce(x, y)), sql("IIF(ISNULL(`x`), `y`, `x`)")) expect_equal(trans(pmin(x, y)), sql("IIF(`x` <= `y`, `x`, `y`)")) expect_equal(trans(pmax(x, y)), sql("IIF(`x` <= `y`, `y`, `x`)")) expect_equal(trans(Sys.Date()), sql("DATE()")) # Special paste() tests expect_equal(trans(paste(x, y, sep = "+")), sql("`x` & '+' & `y`")) expect_equal(trans(paste0(x, y)), sql("`x` & `y`")) expect_error(trans(paste(x, collapse = "-")),"`collapse` not supported") }) test_that("custom aggregators translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_access()) } expect_equal(trans(sd(x)), sql("STDEV(`x`)")) expect_equal(trans(var(x)), sql("VAR(`x`)")) expect_error(trans(cor(x)), "not available") expect_error(trans(cov(x)), "not available") expect_error(trans(n_distinct(x)), "not available") }) test_that("queries translate correctly", { mf <- lazy_frame(x = 1, con = simulate_access()) expect_equal( mf %>% head() %>% sql_render(), sql("SELECT TOP 6 *\nFROM `df`") ) }) dbplyr/tests/testthat/test-query-set-op-print.txt0000644000176200001440000000022413500704057021771 
0ustar liggesusers X: From: df Select: `x`, `y`, NULL Y: From: df Select: `x`, NULL, `z` dbplyr/tests/testthat/test-verb-group_by.R0000644000176200001440000000440713475552746020461 0ustar liggesuserscontext("group_by") test_that("group_by with add = TRUE adds groups", { mf <- memdb_frame(x = 1:3, y = 1:3) gf1 <- mf %>% group_by(x, y) gf2 <- mf %>% group_by(x) %>% group_by(y, add = TRUE) expect_equal(group_vars(gf1), c("x", "y")) expect_equal(group_vars(gf2), c("x", "y")) }) test_that("collect, collapse and compute preserve grouping", { g <- memdb_frame(x = 1:3, y = 1:3) %>% group_by(x, y) expect_equal(group_vars(compute(g)), c("x", "y")) expect_equal(group_vars(collapse(g)), c("x", "y")) expect_equal(group_vars(collect(g)), c("x", "y")) }) test_that("joins preserve grouping", { g <- memdb_frame(x = 1:3, y = 1:3) %>% group_by(x) expect_equal(group_vars(inner_join(g, g, by = c("x", "y"))), "x") expect_equal(group_vars(left_join(g, g, by = c("x", "y"))), "x") expect_equal(group_vars(semi_join(g, g, by = c("x", "y"))), "x") expect_equal(group_vars(anti_join(g, g, by = c("x", "y"))), "x") }) test_that("group_by can perform mutate", { mf <- memdb_frame(x = 3:1, y = 1:3) out <- mf %>% group_by(z = x + y) %>% summarise(n = n()) %>% collect() expect_equal(out, tibble(z = 4L, n = 3L)) }) # sql_build --------------------------------------------------------------- test_that("ungroup drops PARTITION BY", { out <- lazy_frame(x = 1) %>% group_by(x) %>% ungroup() %>% mutate(x = rank(x)) %>% sql_build() expect_equal(out$select, sql(x = 'RANK() OVER (ORDER BY `x`)')) }) # ops --------------------------------------------------------------------- test_that("group_by overrides existing groups", { df <- tibble(g1 = 1, g2 = 2, x = 3) %>% tbl_lazy() out1 <- df %>% group_by(g1) expect_equal(op_grps(out1), "g1") out2 <- out1 %>% group_by(g2) expect_equal(op_grps(out2), "g2") }) test_that("group_by increases grouping if add = TRUE", { df <- tibble(g1 = 1, g2 = 2, x = 3) %>% tbl_lazy() out <- 
df %>% group_by(g1) %>% group_by(g2, add = TRUE) expect_equal(op_grps(out), c("g1", "g2")) }) test_that("ungroup drops all groups", { out1 <- lazy_frame(g1 = 1, g2 = 2) %>% group_by(g1, g2) %>% ungroup() out2 <- lazy_frame(g1 = 1, g2 = 2) %>% group_by(g1, g2) %>% ungroup() %>% rename(g3 = g1) expect_equal(op_grps(out1), character()) expect_equal(op_grps(out2), character()) }) dbplyr/tests/testthat/test-verb-joins.R0000644000176200001440000002760413442161247017743 0ustar liggesuserscontext("test-joins.R") df1 <- memdb_frame(x = 1:5, y = 1:5) df2 <- memdb_frame(a = 5:1, b = 1:5) df3 <- memdb_frame(x = 1:5, z = 1:5) df4 <- memdb_frame(a = 5:1, z = 5:1) test_that("named by join by different x and y vars", { j1 <- collect(inner_join(df1, df2, c("x" = "a"))) expect_equal(names(j1), c("x", "y", "b")) expect_equal(nrow(j1), 5) j2 <- collect(inner_join(df1, df2, c("x" = "a", "y" = "b"))) expect_equal(names(j2), c("x", "y")) expect_equal(nrow(j2), 1) }) test_that("named by join by same z vars", { j1 <- collect(inner_join(df3, df4, c("z" = "z"))) expect_equal(nrow(j1), 5) expect_equal(names(j1), c("x", "z", "a")) }) test_that("join with both same and different vars", { j1 <- collect(left_join(df1, df3, by = c("y" = "z", "x"))) expect_equal(nrow(j1), 5) expect_equal(names(j1), c("x", "y")) }) test_that("joining over arbitrary predicates", { j1 <- collect(left_join(df1, df2, sql_on = "LHS.x = RHS.b")) j2 <- collect(left_join(df1, df2, by = c("x" = "b"))) %>% mutate(b = x) expect_equal(j1, j2) j1 <- collect(left_join(df1, df3, sql_on = "LHS.x = RHS.z")) j2 <- collect(left_join(df1, df3, by = c("x" = "z"))) %>% mutate(z = x.x) expect_equal(j1, j2) j1 <- collect(left_join(df1, df3, sql_on = "LHS.x = RHS.x")) j2 <- collect(left_join(df1, df3, by = "x")) %>% mutate(x.y = x) %>% rename(x.x = x) expect_equal(j1, j2) }) test_that("inner join doesn't result in duplicated columns ", { expect_equal(colnames(inner_join(df1, df1)), c("x", "y")) }) test_that("self-joins allowed with named 
by", { fam <- memdb_frame(id = 1:5, parent = c(NA, 1, 2, 2, 4)) j1 <- fam %>% left_join(fam, by = c("parent" = "id")) j2 <- fam %>% inner_join(fam, by = c("parent" = "id")) expect_equal(op_vars(j1), c("id", "parent.x", "parent.y")) expect_equal(op_vars(j2), c("id", "parent.x", "parent.y")) expect_equal(nrow(collect(j1)), 5) expect_equal(nrow(collect(j2)), 4) j3 <- collect(semi_join(fam, fam, by = c("parent" = "id"))) j4 <- collect(anti_join(fam, fam, by = c("parent" = "id"))) expect_equal(j3, filter(collect(fam), !is.na(parent))) expect_equal(j4, filter(collect(fam), is.na(parent))) }) test_that("suffix modifies duplicated variable names", { fam <- memdb_frame(id = 1:5, parent = c(NA, 1, 2, 2, 4)) j1 <- collect(inner_join(fam, fam, by = c("parent" = "id"), suffix = c("1", "2"))) j2 <- collect(left_join(fam, fam, by = c("parent" = "id"), suffix = c("1", "2"))) expect_named(j1, c("id", "parent1", "parent2")) expect_named(j2, c("id", "parent1", "parent2")) }) test_that("join variables always disambiguated (#2823)", { # Even if the new variable conflicts with an existing variable df1 <- dbplyr::memdb_frame(a = 1, b.x = 1, b = 1) df2 <- dbplyr::memdb_frame(a = 1, b = 1) both <- collect(dplyr::left_join(df1, df2, by = "a")) expect_named(both, c("a", "b.x", "b.x.x", "b.y")) }) test_that("join functions error on column not found for SQL sources #1928", { # Rely on dplyr to test precise code expect_error( left_join(memdb_frame(x = 1:5), memdb_frame(y = 1:5), by = "x"), "missing|(not found)" ) expect_error( left_join(memdb_frame(x = 1:5), memdb_frame(y = 1:5), by = "y"), "missing|(not found)" ) expect_error( left_join(memdb_frame(x = 1:5), memdb_frame(y = 1:5)), "[Nn]o common variables" ) }) test_that("join generates correct sql", { lf1 <- memdb_frame(x = 1, y = 2) lf2 <- memdb_frame(x = 1, z = 3) out <- lf1 %>% inner_join(lf2, by = "x") %>% collect() expect_equal(out, data.frame(x = 1, y = 2, z = 3)) }) test_that("semi join generates correct sql", { lf1 <- memdb_frame(x = 
c(1, 2), y = c(2, 3)) lf2 <- memdb_frame(x = 1) lf3 <- inner_join(lf1, lf2, by = "x") expect_equal(op_vars(lf3), c("x", "y")) out <- collect(lf3) expect_equal(out, data.frame(x = 1, y = 2)) }) test_that("set ops generates correct sql", { lf1 <- memdb_frame(x = 1) lf2 <- memdb_frame(x = c(1, 2)) out <- lf1 %>% union(lf2) %>% collect() expect_equal(out, data.frame(x = c(1, 2))) }) # All sources ------------------------------------------------------------- test_that("sql generated correctly for all sources", { x <- test_frame(a = letters[1:7], c = 2:8) y <- test_frame(a = letters[1:4], b = c(1, 2, 3, NA)) xy <- purrr::map2(x, y, left_join) expect_equal_tbls(xy) }) test_that("full join is promoted to cross join for no overlapping variables", { result <- df1 %>% full_join(df2, by = character()) %>% collect() expect_equal(nrow(result), 25) }) # Consistency of results -------------------------------------------------- test_that("consistent result of left join on key column with same name in both tables", { test_l_j_by_x <- function(tbl_left, tbl_right) { left_join(tbl_left, tbl_right, by = "x") %>% arrange(x, y, z) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_l_j_by_x) }) test_that("consistent result of inner join on key column with same name in both tables", { test_i_j_by_x <- function(tbl_left, tbl_right) { inner_join(tbl_left, tbl_right, by = "x") %>% arrange(x, y, z) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_i_j_by_x) }) test_that("consistent result of right join on key column with same name in both tables", { test_r_j_by_x <- function(tbl_left, tbl_right) { right_join(tbl_left, tbl_right, by = "x") %>% arrange(x, y, z) } tbl_left <- tibble(x = 
1L:4L, y = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), z = 1L:4L) # SQLite does not support right joins tbls_left <- test_load(tbl_left, ignore = c("sqlite")) tbls_right <- test_load(tbl_right, ignore = c("sqlite")) compare_tbls2(tbls_left, tbls_right, op = test_r_j_by_x) }) test_that("consistent result of full join on key column with same name in both tables", { test_f_j_by_x <- function(tbl_left, tbl_right) { full_join(tbl_left, tbl_right, by = "x") %>% arrange(x, y, z) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), z = 1L:4L) # SQLite and MySQL do not support full joins tbls_left <- test_load(tbl_left, ignore = c("sqlite", "mysql", "MariaDB")) tbls_right <- test_load(tbl_right, ignore = c("sqlite", "mysql", "MariaDB")) compare_tbls2(tbls_left, tbls_right, op = test_f_j_by_x) }) test_that("consistent result of left join on key column with different names", { test_l_j_by_xl_xr <- function(tbl_left, tbl_right) { left_join(tbl_left, tbl_right, by = c("xl" = "xr")) %>% arrange(xl, y, z) } tbl_left <- tibble(xl = 1L:4L, y = 1L:4L) tbl_right <- tibble(xr = c(1L:3L, 5L), z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_l_j_by_xl_xr) }) test_that("consistent result of inner join on key column with different names", { test_i_j_by_xl_xr <- function(tbl_left, tbl_right) { inner_join(tbl_left, tbl_right, by = c("xl" = "xr")) %>% arrange(xl, y, z) } tbl_left <- tibble(xl = 1L:4L, y = 1L:4L) tbl_right <- tibble(xr = c(1L:3L, 5L), z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_i_j_by_xl_xr) }) test_that("consistent result of right join on key column with different names", { test_r_j_by_xl_xr <- function(tbl_left, tbl_right) { right_join(tbl_left, tbl_right, by = c("xl" = "xr")) %>% arrange(xl, y, z) } tbl_left <- tibble(xl = 1L:4L, y = 1L:4L) tbl_right <- tibble(xr = c(1L:3L, 5L), z = 
1L:4L) # SQLite does not support right joins tbls_left <- test_load(tbl_left, ignore = c("sqlite")) tbls_right <- test_load(tbl_right, ignore = c("sqlite")) compare_tbls2(tbls_left, tbls_right, op = test_r_j_by_xl_xr) }) test_that("consistent result of full join on key column with different names", { test_f_j_by_xl_xr <- function(tbl_left, tbl_right) { full_join(tbl_left, tbl_right, by = c("xl" = "xr")) %>% arrange(xl, y, z) } tbl_left <- tibble(xl = 1L:4L, y = 1L:4L) tbl_right <- tibble(xr = c(1L:3L, 5L), z = 1L:4L) # SQLite and MySQL do not support full joins tbls_left <- test_load(tbl_left, ignore = c("sqlite", "mysql", "MariaDB")) tbls_right <- test_load(tbl_right, ignore = c("sqlite", "mysql", "MariaDB")) compare_tbls2(tbls_left, tbls_right, op = test_f_j_by_xl_xr) }) test_that("consistent result of left natural join", { test_l_j <- function(tbl_left, tbl_right) { left_join(tbl_left, tbl_right) %>% arrange(x, y, z, w) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L, w = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), y = 1L:4L, z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_l_j) }) test_that("consistent result of inner natural join", { test_i_j <- function(tbl_left, tbl_right) { inner_join(tbl_left, tbl_right) %>% arrange(x, y, z, w) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L, w = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), y = 1L:4L, z = 1L:4L) tbls_left <- test_load(tbl_left) tbls_right <- test_load(tbl_right) compare_tbls2(tbls_left, tbls_right, op = test_i_j) }) test_that("consistent result of right natural join", { test_r_j <- function(tbl_left, tbl_right) { right_join(tbl_left, tbl_right) %>% arrange(x, y, z, w) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L, w = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), y = 1L:4L, z = 1L:4L) # SQLite does not support right joins tbls_left <- test_load(tbl_left, ignore = c("sqlite")) tbls_right <- test_load(tbl_right, ignore = c("sqlite")) 
compare_tbls2(tbls_left, tbls_right, op = test_r_j) }) test_that("consistent result of full natural join", { test_f_j <- function(tbl_left, tbl_right) { full_join(tbl_left, tbl_right) %>% arrange(x, y, z, w) } tbl_left <- tibble(x = 1L:4L, y = 1L:4L, w = 1L:4L) tbl_right <- tibble(x = c(1L:3L, 5L), y = 1L:4L, z = 1L:4L) # SQLite and MySQL do not support full joins tbls_left <- test_load(tbl_left, ignore = c("sqlite", "mysql", "MariaDB")) tbls_right <- test_load(tbl_right, ignore = c("sqlite", "mysql", "MariaDB")) compare_tbls2(tbls_left, tbls_right, op = test_f_j) }) # sql_build --------------------------------------------------------------- test_that("join captures both tables", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) out <- inner_join(lf1, lf2) %>% sql_build() expect_s3_class(out, "join_query") expect_equal(op_vars(out$x), c("x", "y")) expect_equal(op_vars(out$y), c("x", "z")) expect_equal(out$type, "inner") }) test_that("semi join captures both tables", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) out <- semi_join(lf1, lf2) %>% sql_build() expect_equal(op_vars(out$x), c("x", "y")) expect_equal(op_vars(out$y), c("x", "z")) expect_equal(out$anti, FALSE) }) test_that("set ops captures both tables", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) out <- union(lf1, lf2) %>% sql_build() expect_equal(out$type, "UNION") }) # ops --------------------------------------------------------------------- test_that("joins get vars from both left and right", { out <- left_join( lazy_frame(x = 1, y = 1), lazy_frame(x = 2, z = 2), by = "x" ) expect_equal(op_vars(out), c("x", "y", "z")) }) test_that("semi joins get vars from left", { out <- semi_join( lazy_frame(x = 1, y = 1), lazy_frame(x = 2, z = 2), by = "x" ) expect_equal(op_vars(out), c("x", "y")) }) # Helpers ----------------------------------------------------------------- test_that("add_suffixes works if no suffix requested", { expect_equal(add_suffixes(c("x", 
"x"), "y", ""), c("x", "x")) expect_equal(add_suffixes(c("x", "y"), "y", ""), c("x", "y")) }) dbplyr/tests/testthat/test-query-select-print.txt0000644000176200001440000000005413500704057022042 0ustar liggesusers From: df Select: * dbplyr/tests/testthat/test-sql-expr.R0000644000176200001440000000116013416112226017417 0ustar liggesuserscontext("test-sql-expr.R") test_that("atomic vectors are escaped", { con <- simulate_dbi() expect_equal(sql_expr(2, con = con), sql("2.0")) expect_equal(sql_expr("x", con = con), sql("'x'")) }) test_that("user infix functions have % stripped", { con <- simulate_dbi() expect_equal(sql_expr(x %like% y, con = con), sql("x LIKE y")) }) test_that("string function names are not quoted", { con <- simulate_dbi() f <- "foo" expect_equal(sql_expr((!!f)(), con = con), sql("FOO()")) }) test_that("correct number of parens", { con <- simulate_dbi() expect_equal(sql_expr((1L), con = con), sql("(1)")) }) dbplyr/tests/testthat/sql/0000755000176200001440000000000013501765351015355 5ustar liggesusersdbplyr/tests/testthat/sql/setop.sql0000644000176200001440000000030513500704057017221 0ustar liggesusers$union (SELECT * FROM `df`) UNION (SELECT * FROM `df`) $setdiff (SELECT * FROM `df`) EXCEPT (SELECT * FROM `df`) $intersect (SELECT * FROM `df`) INTERSECT (SELECT * FROM `df`) dbplyr/tests/testthat/sql/mutate-select-collapse.sql0000644000176200001440000000017613500704071022445 0ustar liggesusers$xy SELECT `x` * 2.0 AS `x`, `y` * 2.0 AS `y` FROM `df` $yx SELECT `y` * 2.0 AS `y`, `x` * 2.0 AS `x` FROM `df` dbplyr/tests/testthat/sql/select-mutate-collapse.sql0000644000176200001440000000010513500704071022435 0ustar liggesusers$a SELECT 1.0 AS `a` FROM `df` $x SELECT `x` FROM `df` dbplyr/tests/testthat/sql/backend-quantile.sql0000644000176200001440000000047613500704060021301 0ustar liggesusers$quantile PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY `x`) $quantile_win PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY `x`) OVER (PARTITION BY `g`) $median 
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY `x`) $median_win PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY `x`) OVER (PARTITION BY `g`) dbplyr/tests/testthat/sql/select-collapse.sql0000644000176200001440000000017413500704071021146 0ustar liggesusers$flip2 SELECT `x`, `y` FROM `df` $flip3 SELECT `y`, `x` FROM `df` $rename SELECT `x` AS `x2` FROM `df` dbplyr/tests/testthat/sql/mutate-subqueries.sql0000644000176200001440000000026713500704071021556 0ustar liggesusers$inplace SELECT `x` + 1.0 AS `x` FROM (SELECT `x` + 1.0 AS `x` FROM `df`) $increment SELECT `x`, `x1`, `x1` + 1.0 AS `x2` FROM (SELECT `x`, `x` + 1.0 AS `x1` FROM `df`) dbplyr/tests/testthat/sql/mutate-select.sql0000644000176200001440000000010513426605507020650 0ustar liggesusers$a SELECT 1.0 AS `a` FROM `df` $x SELECT `x` FROM `df` dbplyr/tests/testthat/sql/semi-join.sql0000644000176200001440000000045013500704057017762 0ustar liggesusers$semi SELECT * FROM `df` AS `LHS` WHERE EXISTS ( SELECT 1 FROM `df` AS `RHS` WHERE (`LHS`.`x` = `RHS`.`x` AND `LHS`.`y` = `RHS`.`y`) ) $anti SELECT * FROM `df` AS `LHS` WHERE NOT EXISTS ( SELECT 1 FROM `df` AS `RHS` WHERE (`LHS`.`x` = `RHS`.`x` AND `LHS`.`y` = `RHS`.`y`) ) dbplyr/tests/testthat/sql/join-on.sql0000644000176200001440000000117413500704057017445 0ustar liggesusers$inner SELECT `LHS`.`x` AS `x.x`, `LHS`.`y` AS `y`, `RHS`.`x` AS `x.y`, `RHS`.`z` AS `z` FROM `df` AS `LHS` INNER JOIN `df` AS `RHS` ON (LHS.y < RHS.z) $left SELECT `LHS`.`x` AS `x.x`, `LHS`.`y` AS `y`, `RHS`.`x` AS `x.y`, `RHS`.`z` AS `z` FROM `df` AS `LHS` LEFT JOIN `df` AS `RHS` ON (LHS.y < RHS.z) $right SELECT `LHS`.`x` AS `x.x`, `LHS`.`y` AS `y`, `RHS`.`x` AS `x.y`, `RHS`.`z` AS `z` FROM `df` AS `LHS` RIGHT JOIN `df` AS `RHS` ON (LHS.y < RHS.z) $full SELECT `LHS`.`x` AS `x.x`, `LHS`.`y` AS `y`, `RHS`.`x` AS `x.y`, `RHS`.`z` AS `z` FROM `df` AS `LHS` FULL JOIN `df` AS `RHS` ON (LHS.y < RHS.z) dbplyr/tests/testthat/sql/join.sql0000644000176200001440000000121613500704057017030 0ustar liggesusers$inner 
SELECT `LHS`.`x` AS `x`, `LHS`.`y` AS `y` FROM `df` AS `LHS` INNER JOIN `df` AS `RHS` ON (`LHS`.`x` = `RHS`.`x` AND `LHS`.`y` = `RHS`.`y`) $left SELECT `LHS`.`x` AS `x`, `LHS`.`y` AS `y` FROM `df` AS `LHS` LEFT JOIN `df` AS `RHS` ON (`LHS`.`x` = `RHS`.`x` AND `LHS`.`y` = `RHS`.`y`) $right SELECT `RHS`.`x` AS `x`, `RHS`.`y` AS `y` FROM `df` AS `LHS` RIGHT JOIN `df` AS `RHS` ON (`LHS`.`x` = `RHS`.`x` AND `LHS`.`y` = `RHS`.`y`) $full SELECT COALESCE(`LHS`.`x`, `RHS`.`x`) AS `x`, COALESCE(`LHS`.`y`, `RHS`.`y`) AS `y` FROM `df` AS `LHS` FULL JOIN `df` AS `RHS` ON (`LHS`.`x` = `RHS`.`x` AND `LHS`.`y` = `RHS`.`y`) dbplyr/tests/testthat/test-src_dbi.R0000644000176200001440000000035113415745770017271 0ustar liggesuserscontext("test-src_dbi.R") test_that("tbl and src classes include connection class", { mf <- memdb_frame(x = 1, y = 2) expect_true(inherits(mf, "tbl_SQLiteConnection")) expect_true(inherits(mf$src, "src_SQLiteConnection")) }) dbplyr/tests/testthat/test-translate-sql-conditional.R0000644000176200001440000000134113426424061022744 0ustar liggesuserscontext("translate-vectorised") test_that("case_when converted to CASE WHEN", { expect_equal( translate_sql(case_when(x > 1L ~ "a")), sql("CASE\nWHEN (`x` > 1) THEN ('a')\nEND") ) }) test_that("even inside mutate", { out <- lazy_frame(x = 1:5) %>% mutate(y = case_when(x > 1L ~ "a")) %>% sql_build() expect_equal( out$select[[2]], "CASE\nWHEN (`x` > 1) THEN ('a')\nEND" ) }) test_that("case_when translates correctly to ELSE when TRUE ~ is used 2", { out <- translate_sql( case_when( x == 1L ~ "yes", x == 0L ~ "no", TRUE ~ "undefined") ) expect_equal( out, sql("CASE\nWHEN (`x` = 1) THEN ('yes')\nWHEN (`x` = 0) THEN ('no')\nELSE ('undefined')\nEND") ) }) dbplyr/tests/testthat/test-ident.R0000644000176200001440000000104413416413012016745 0ustar liggesuserscontext("test-ident.R") test_that("zero length inputs return correct clases", { expect_s3_class(ident(), "ident") expect_s3_class(ident_q(), "ident_q") }) test_that("ident 
quotes and ident_q doesn't", { con <- simulate_dbi() x1 <- ident("x") x2 <- ident_q('"x"') expect_equal(escape(x1, con = con), sql('`x`')) expect_equal(escape(x2, con = con), sql('"x"')) }) test_that("ident are left unchanged when coerced to sql", { x1 <- ident("x") x2 <- ident_q('"x"') expect_equal(as.sql(x1), x1) expect_equal(as.sql(x2), x2) }) dbplyr/tests/testthat/test-backend-sqlite.R0000644000176200001440000000244613416411326020545 0ustar liggesuserscontext("test-backend-sqlite.R") test_that("logicals translated to integers", { expect_equal(escape(FALSE, con = simulate_sqlite()), sql("0")) expect_equal(escape(TRUE, con = simulate_sqlite()), sql("1")) expect_equal(escape(NA, con = simulate_sqlite()), sql("NULL")) }) test_that("vectorised translations", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_sqlite(), window = FALSE) } expect_equal(trans(paste(x, y)), sql("`x` || ' ' || `y`")) expect_equal(trans(paste0(x, y)), sql("`x` || `y`")) }) test_that("pmin and max become MIN and MAX", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_sqlite(), window = FALSE) } expect_equal(trans(pmin(x, y)), sql('MIN(`x`, `y`)')) expect_equal(trans(pmax(x, y)), sql('MAX(`x`, `y`)')) }) test_that("as.numeric()/as.double() get custom translation", { mf <- dbplyr::memdb_frame(x = 1L) out <- mf %>% mutate(x1 = as.numeric(x), x2 = as.double(x)) %>% collect() expect_type(out$x1, "double") expect_type(out$x2, "double") }) test_that("sqlite mimics two argument log", { translate_sqlite <- function(...) 
{ translate_sql(..., con = src_memdb()$con) } expect_equal(translate_sqlite(log(x)), sql('LOG(`x`)')) expect_equal(translate_sqlite(log(x, 10)), sql('LOG(`x`) / LOG(10.0)')) }) dbplyr/tests/testthat/test-query-join-print.txt0000644000176200001440000000007413500704057021524 0ustar liggesusers By: x-x X: df Y: df dbplyr/tests/testthat/test-verb-head.R0000644000176200001440000000164113426322700017506 0ustar liggesuserscontext("test-verb-head") test_that("head limits rows returned", { out <- memdb_frame(x = 1:100) %>% head(10) %>% collect() expect_equal(nrow(out), 10) }) test_that("head limits rows", { out <- lazy_frame(x = 1:100) %>% head(10) %>% sql_build() expect_equal(out$limit, 10) }) test_that("head works with huge whole numbers", { out <- memdb_frame(x = 1:100) %>% head(1e10) %>% collect() expect_equal(out, tibble(x = 1:100)) }) # sql_render -------------------------------------------------------------- test_that("head renders to integer fractional input", { out <- memdb_frame(x = 1:100) %>% head(10.5) %>% sql_render() expect_match(out, "LIMIT 10$") }) # ops --------------------------------------------------------------------- test_that("two heads are equivalent to one", { out <- lazy_frame(x = 1:10) %>% head(3) %>% head(5) expect_equal(out$ops$args$n, 3) }) dbplyr/tests/testthat/test-verb-select.R0000644000176200001440000000715413475552746020114 0ustar liggesuserscontext("select") df <- as.data.frame(as.list(setNames(1:26, letters))) tbls <- test_load(df) test_that("select quotes correctly", { out <- memdb_frame(x = 1, y = 1) %>% select(x) %>% collect() expect_equal(out, tibble(x = 1)) }) test_that("select can rename", { out <- memdb_frame(x = 1, y = 2) %>% select(y = x) %>% collect() expect_equal(out, tibble(y = 1)) }) test_that("two selects equivalent to one", { mf <- memdb_frame(a = 1, b = 1, c = 1, d = 2) out <- mf %>% select(a:c) %>% select(b:c) %>% collect() expect_named(out, c("b", "c")) }) test_that("select operates on mutated vars", { mf <- 
memdb_frame(x = c(1, 2, 3), y = c(3, 2, 1)) df1 <- mf %>% mutate(x, z = x + y) %>% select(z) %>% collect() df2 <- mf %>% collect() %>% mutate(x, z = x + y) %>% select(z) expect_equal_tbl(df1, df2) }) test_that("select renames variables (#317)", { mf <- memdb_frame(x = 1, y = 2) expect_equal_tbl(mf %>% select(A = x), tibble(A = 1)) }) test_that("rename renames variables", { mf <- memdb_frame(x = 1, y = 2) expect_equal_tbl(mf %>% rename(A = x), tibble(A = 1, y = 2)) }) test_that("can rename multiple vars", { mf <- memdb_frame(a = 1, b = 2) exp <- tibble(c = 1, d = 2) expect_equal_tbl(mf %>% rename(c = a, d = b), exp) expect_equal_tbl(mf %>% group_by(a) %>% rename(c = a, d = b), exp) }) test_that("select preserves grouping vars", { mf <- memdb_frame(a = 1, b = 2) %>% group_by(b) out <- mf %>% select(a) %>% collect() expect_named(out, c("b", "a")) }) # sql_render -------------------------------------------------------------- test_that("multiple selects are collapsed", { lf <- lazy_frame(x = 1, y = 2) reg <- list( flip2 = lf %>% select(2:1) %>% select(2:1), flip3 = lf %>% select(2:1) %>% select(2:1) %>% select(2:1), rename = lf %>% select(x1 = x) %>% select(x2 = x1) ) expect_known_output(print(reg), test_path("sql/select-collapse.sql")) }) test_that("mutate collapses over nested select", { lf <- lazy_frame(g = 0, x = 1, y = 2) reg <- list( a = lf %>% mutate(a = 1, b = 2) %>% select(a), x = lf %>% mutate(a = 1, b = 2) %>% select(x) ) expect_known_output(print(reg), test_path("sql/select-mutate-collapse.sql")) }) # sql_build ------------------------------------------------------------- test_that("select picks variables", { out <- lazy_frame(x1 = 1, x2 = 1, x3 = 2) %>% select(x1:x2) %>% sql_build() expect_equal(out$select, sql("x1" = "`x1`", "x2" = "`x2`")) }) test_that("select renames variables", { out <- lazy_frame(x1 = 1, x2 = 1, x3 = 2) %>% select(y = x1, z = x2) %>% sql_build() expect_equal(out$select, sql("y" = "`x1`", "z" = "`x2`")) }) test_that("select can refer to 
variables in local env", { vars <- c("x", "y") out <- lazy_frame(x = 1, y = 1) %>% select(one_of(vars)) %>% sql_build() expect_equal(out$select, sql("x" = "`x`", "y" = "`y`")) }) test_that("rename preserves existing vars", { out <- lazy_frame(x = 1, y = 1) %>% rename(z = y) %>% sql_build() expect_equal(out$select, sql("x" = "`x`", "z" = "`y`")) }) # ops --------------------------------------------------------------------- test_that("select reduces variables", { out <- mtcars %>% tbl_lazy() %>% select(mpg:disp) expect_equal(op_vars(out), c("mpg", "cyl", "disp")) }) test_that("rename preserves existing", { out <- tibble(x = 1, y = 2) %>% tbl_lazy() %>% rename(z = y) expect_equal(op_vars(out), c("x", "z")) }) test_that("rename renames grouping vars", { df <- lazy_frame(a = 1, b = 2) %>% group_by(a) %>% rename(c = a) expect_equal(op_grps(df), "c") }) dbplyr/tests/testthat/test-backend-teradata.R0000644000176200001440000000445213442161247021033 0ustar liggesuserscontext("test-backend-teradata.R") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_teradata()) } expect_equal(trans(x != y), sql("`x` <> `y`")) expect_equal(trans(as.numeric(x)), sql("CAST(`x` AS NUMERIC)")) expect_equal(trans(as.double(x)), sql("CAST(`x` AS NUMERIC)")) expect_equal(trans(as.character(x)), sql("CAST(`x` AS VARCHAR(MAX))")) expect_equal(trans(log(x)), sql("LN(`x`)")) expect_equal(trans(cot(x)), sql("1 / TAN(`x`)")) expect_equal(trans(nchar(x)), sql("CHARACTER_LENGTH(`x`)")) expect_equal(trans(ceil(x)), sql("CEILING(`x`)")) expect_equal(trans(ceiling(x)), sql("CEILING(`x`)")) expect_equal(trans(atan2(x, y)), sql("ATAN2(`y`, `x`)")) expect_equal(trans(substr(x, 1, 2)), sql("SUBSTR(`x`, 1.0, 2.0)")) expect_error(trans(paste(x)), "not supported") }) test_that("custom bitwise operations translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_teradata()) } expect_equal(trans(bitwNot(x)), 
sql("BITNOT(`x`)")) expect_equal(trans(bitwAnd(x, 128L)), sql("BITAND(`x`, 128)")) expect_equal(trans(bitwOr(x, 128L)), sql("BITOR(`x`, 128)")) expect_equal(trans(bitwXor(x, 128L)), sql("BITXOR(`x`, 128)")) expect_equal(trans(bitwShiftL(x, 2L)), sql("SHIFTLEFT(`x`, 2)")) expect_equal(trans(bitwShiftR(x, 2L)), sql("SHIFTRIGHT(`x`, 2)")) }) test_that("custom aggregators translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_teradata()) } expect_equal(trans(var(x)), sql("VAR_SAMP(`x`)")) expect_error(trans(cor(x)), "not available") expect_error(trans(cov(x)), "not available") }) test_that("custom window functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = TRUE, con = simulate_teradata()) } expect_equal(trans(var(x, na.rm = TRUE)), sql("VAR_SAMP(`x`) OVER ()")) expect_error(trans(cor(x)), "not supported") expect_error(trans(cov(x)), "not supported") }) test_that("filter and mutate translate is.na correctly", { mf <- lazy_frame(x = 1, con = simulate_teradata()) expect_equal( mf %>% head() %>% sql_render(), sql("SELECT TOP 6 *\nFROM `df`") ) }) dbplyr/tests/testthat/test-explain-sqlite.txt0000644000176200001440000000017213416125771021234 0ustar liggesusers SELECT * FROM `XYZ` WHERE (`x` > 5.0) id parent notused detail 1 2 0 0 SCAN TABLE XYZ dbplyr/tests/testthat/test-query-select.R0000644000176200001440000000342613455376041020307 0ustar liggesuserscontext("test-query-select.R") test_that("select_query() print method output is as expected", { mf <- select_query(lazy_frame(x = 1, con = simulate_dbi())) expect_known_output(mf, test_path("test-query-select-print.txt"), print = TRUE) }) test_that("queries generated by select() don't alias unnecessarily", { lf_build <- lazy_frame(x = 1) %>% select(x) %>% sql_build() lf_render <- sql_render(lf_build, con = simulate_dbi()) expect_equal(lf_render, sql("SELECT `x`\nFROM `df`")) }) # Optimisations 
----------------------------------------------------------- test_that("optimisation is turned on by default", { lf <- lazy_frame(x = 1, y = 2) %>% arrange(x) %>% head(5) qry <- lf %>% sql_build() expect_equal(qry$from, ident("df")) }) test_that("group by then limit is collapsed", { lf <- memdb_frame(x = 1:10, y = 2) %>% group_by(x) %>% summarise(y = sum(y, na.rm = TRUE)) %>% head(1) qry <- lf %>% sql_build() expect_equal(qry$limit, 1L) expect_equal(qry$group_by, sql('`x`')) # And check that it returns the correct value expect_equal(collect(lf), tibble(x = 1L, y = 2)) }) test_that("filter and rename are correctly composed", { lf <- memdb_frame(x = 1, y = 2) %>% filter(x == 1) %>% select(x = y) qry <- lf %>% sql_build() expect_equal(qry$select, sql(x = "`y`")) expect_equal(qry$where, sql('`x` = 1.0')) # It surprises me that this SQL works! expect_equal(collect(lf), tibble(x = 2)) }) test_that("trivial subqueries are collapsed", { lf <- memdb_frame(a = 1:3) %>% mutate(b = a + 1) %>% distinct() %>% arrange() qry <- lf %>% sql_build() expect_is(qry$from, "ident") expect_true(qry$distinct) # And check that it returns the correct value expect_equal(collect(lf), tibble(a = 1:3, b = a + 1.0)) }) dbplyr/tests/testthat/test-tbl-lazy-print.txt0000644000176200001440000000003113500704060021143 0ustar liggesusers SELECT * FROM `df` dbplyr/tests/testthat/test-verb-summarise.R0000644000176200001440000000422613475552746020637 0ustar liggesuserscontext("Summarise") test_that("summarise peels off a single layer of grouping", { mf1 <- memdb_frame(x = 1, y = 1, z = 2) %>% group_by(x, y) mf2 <- mf1 %>% summarise(n = n()) expect_equal(group_vars(mf2), "x") mf3 <- mf2 %>% summarise(n = n()) expect_equal(group_vars(mf3), character()) }) test_that("summarise performs partial evaluation", { mf1 <- memdb_frame(x = 1) val <- 1 mf2 <- mf1 %>% summarise(y = x == val) %>% collect() expect_equal(mf2$y, 1) }) test_that("can't refer to freshly created variables", { mf1 <- lazy_frame(x = 1) 
expect_error( summarise(mf1, y = sum(x), z = sum(y)), "refers to a variable" ) }) # sql-render -------------------------------------------------------------- test_that("quoting for rendering summarized grouped table", { out <- memdb_frame(x = 1) %>% group_by(x) %>% summarize(n = n()) expect_match(out %>% sql_render, "^SELECT `x`, COUNT[(][)] AS `n`\nFROM `[^`]*`\nGROUP BY `x`$") expect_equal(out %>% collect, tibble(x = 1, n = 1L)) }) # sql-build --------------------------------------------------------------- test_that("summarise generates group_by and select", { out <- lazy_frame(g = 1) %>% group_by(g) %>% summarise(n = n()) %>% sql_build() expect_equal(out$group_by, sql('`g`')) expect_equal(out$select, sql('`g`', 'COUNT() AS `n`')) }) # ops --------------------------------------------------------------------- test_that("summarise replaces existing", { out <- tibble(x = 1, y = 2) %>% tbl_lazy() %>% summarise(z = 1) expect_equal(op_vars(out), "z") }) test_that("summarised vars are always named", { mf <- dbplyr::memdb_frame(a = 1) out1 <- mf %>% summarise(1) %>% op_vars() expect_equal(out1, "1") }) test_that("grouped summary keeps groups", { out <- tibble(g = 1, x = 1) %>% tbl_lazy() %>% group_by(g) %>% summarise(y = 1) expect_equal(op_vars(out), c("g", "y")) }) test_that("summarise drops one grouping level", { df <- tibble(g1 = 1, g2 = 2, x = 3) %>% tbl_lazy() %>% group_by(g1, g2) out1 <- df %>% summarise(y = 1) out2 <- out1 %>% summarise(y = 2) expect_equal(op_grps(out1), "g1") expect_equal(op_grps(out2), character()) }) dbplyr/tests/testthat/test-translate-sql.R0000644000176200001440000001461513476016031020452 0ustar liggesuserscontext("translate") test_that("dplyr.strict_sql = TRUE prevents auto conversion", { old <- options(dplyr.strict_sql = TRUE) on.exit(options(old)) expect_equal(translate_sql(1 + 2), sql("1.0 + 2.0")) expect_error(translate_sql(blah(x)), "could not find function") }) test_that("Wrong number of arguments raises error", { 
expect_error(translate_sql(mean(1, 2, na.rm = TRUE), window = FALSE), "unused argument") }) test_that("between translated to special form (#503)", { out <- translate_sql(between(x, 1, 2)) expect_equal(out, sql("`x` BETWEEN 1.0 AND 2.0")) }) test_that("is.na and is.null are equivalent", { # Needs to be wrapped in parens to ensure correct precedence expect_equal(translate_sql(is.na(x)), sql("((`x`) IS NULL)")) expect_equal(translate_sql(is.null(x)), sql("((`x`) IS NULL)")) expect_equal(translate_sql(x + is.na(x)), sql("`x` + ((`x`) IS NULL)")) expect_equal(translate_sql(!is.na(x)), sql("NOT(((`x`) IS NULL))")) }) test_that("%in% translation parenthesises when needed", { expect_equal(translate_sql(x %in% 1L), sql("`x` IN (1)")) expect_equal(translate_sql(x %in% c(1L)), sql("`x` IN (1)")) expect_equal(translate_sql(x %in% 1:2), sql("`x` IN (1, 2)")) expect_equal(translate_sql(x %in% y), sql("`x` IN `y`")) }) test_that("%in% strips vector names", { expect_equal(translate_sql(x %in% c(a = 1L)), sql("`x` IN (1)")) expect_equal(translate_sql(x %in% !!c(a = 1L)), sql("`x` IN (1)")) }) test_that("%in% with empty vector", { expect_equal(translate_sql(x %in% !!integer()), sql("FALSE")) }) test_that("n_distinct(x) translated to COUNT(distinct, x)", { expect_equal( translate_sql(n_distinct(x), window = FALSE), sql("COUNT(DISTINCT `x`)") ) expect_equal( translate_sql(n_distinct(x), window = TRUE), sql("COUNT(DISTINCT `x`) OVER ()") ) expect_error(translate_sql(n_distinct(x, y), window = FALSE), "unused argument") }) test_that("na_if is translated to NULLIF (#211)", { expect_equal(translate_sql(na_if(x, 0L)), sql("NULLIF(`x`, 0)")) }) test_that("connection affects quoting character", { lf <- lazy_frame(field1 = 1, con = simulate_sqlite()) out <- select(lf, field1) expect_match(sql_render(out), "^SELECT `field1`\nFROM `df`$") }) test_that("magrittr pipe is translated", { expect_identical(translate_sql(1 %>% is.na()), translate_sql(is.na(1))) }) # casts 
------------------------------------------------------------------- test_that("casts as expected", { expect_equal(translate_sql(as.integer64(x)), sql("CAST(`x` AS BIGINT)")) expect_equal(translate_sql(as.logical(x)), sql("CAST(`x` AS BOOLEAN)")) expect_equal(translate_sql(as.Date(x)), sql("CAST(`x` AS DATE)")) }) # conditionals ------------------------------------------------------------ test_that("all forms of if translated to case statement", { expected <- sql("CASE WHEN (`x`) THEN (1) WHEN NOT(`x`) THEN (2) END") expect_equal(translate_sql(if (x) 1L else 2L), expected) expect_equal(translate_sql(ifelse(x, 1L, 2L)), expected) expect_equal(translate_sql(if_else(x, 1L, 2L)), expected) }) test_that("if translation adds parens", { expect_equal( translate_sql(if (x) y), sql("CASE WHEN (`x`) THEN (`y`) END") ) expect_equal( translate_sql(if (x) y else z), sql("CASE WHEN (`x`) THEN (`y`) WHEN NOT(`x`) THEN (`z`) END") ) }) test_that("if and ifelse use correctly named arguments", { exp <- translate_sql(if (x) 1 else 2) expect_equal(translate_sql(ifelse(test = x, yes = 1, no = 2)), exp) expect_equal(translate_sql(if_else(condition = x, true = 1, false = 2)), exp) }) test_that("switch translated to CASE WHEN", { expect_equal( translate_sql(switch(x, a = 1L)), sql("CASE `x` WHEN ('a') THEN (1) END") ) expect_equal( translate_sql(switch(x, a = 1L, 2L)), sql("CASE `x` WHEN ('a') THEN (1) ELSE (2) END") ) }) # numeric ----------------------------------------------------------------- test_that("hyperbolic functions use manual calculation", { expect_equal(translate_sql(cosh(x)), sql("(EXP(`x`) + EXP(-(`x`))) / 2")) expect_equal(translate_sql(sinh(x)), sql("(EXP(`x`) - EXP(-(`x`))) / 2")) expect_equal(translate_sql(tanh(x)), sql("(EXP(2 * (`x`)) - 1) / (EXP(2 * (`x`)) + 1)")) expect_equal(translate_sql(coth(x)), sql("(EXP(2 * (`x`)) + 1) / (EXP(2 * (`x`)) - 1)")) }) test_that("pmin and pmax use GREATEST and LEAST", { expect_equal(translate_sql(pmin(x, y)), sql("LEAST(`x`, 
`y`)")) expect_equal(translate_sql(pmax(x, y)), sql("GREATEST(`x`, `y`)")) }) test_that("round uses integer digits", { expect_equal(translate_sql(round(10.1)), sql("ROUND(10.1, 0)")) expect_equal(translate_sql(round(10.1, digits = 1)), sql("ROUND(10.1, 1)")) }) # string functions -------------------------------------------------------- test_that("different arguments of substr are corrected", { expect_equal(translate_sql(substr(x, 3, 4)), sql("SUBSTR(`x`, 3, 2)")) expect_equal(translate_sql(substr(x, 3, 3)), sql("SUBSTR(`x`, 3, 1)")) expect_equal(translate_sql(substr(x, 3, 2)), sql("SUBSTR(`x`, 3, 0)")) expect_equal(translate_sql(substr(x, 3, 1)), sql("SUBSTR(`x`, 3, 0)")) }) test_that("paste() translated to CONCAT_WS", { expect_equal(translate_sql(paste0(x, y)), sql("CONCAT_WS('', `x`, `y`)")) expect_equal(translate_sql(paste(x, y)), sql("CONCAT_WS(' ', `x`, `y`)")) expect_equal(translate_sql(paste(x, y, sep = ",")), sql("CONCAT_WS(',', `x`, `y`)")) }) # stringr ------------------------------------------- test_that("str_length() translates correctly ", { expect_equal(translate_sql(str_length(x)), sql("LENGTH(`x`)")) }) test_that("lower/upper translates correctly ", { expect_equal(translate_sql(str_to_upper(x)), sql("UPPER(`x`)")) expect_equal(translate_sql(str_to_lower(x)), sql("LOWER(`x`)")) }) test_that("str_trim() translates correctly ", { expect_equal( translate_sql(str_trim(x, "both")), sql("LTRIM(RTRIM(`x`))") ) }) # subsetting -------------------------------------------------------------- test_that("$ and [[ index into nested fields", { expect_equal(translate_sql(a$b), sql("`a`.`b`")) expect_equal(translate_sql(a[["b"]]), sql("`a`.`b`")) }) test_that("can only subset with strings", { expect_error(translate_sql(a[[1]]), "index with strings") expect_error(translate_sql(a[[x]]), "index with strings") }) test_that("[ treated as if it is logical subsetting", { expect_equal(translate_sql(y[x == 0L]), sql("CASE WHEN (`x` = 0) THEN (`y`) END")) }) 
dbplyr/tests/testthat/test-verb-pull.R0000644000176200001440000000071313426147047017571 0ustar liggesuserscontext("test-verb-pull") test_that("can extract default, by name, or positive/negative position", { x <- 1:10 y <- runif(10) mf <- memdb_frame(x = x, y = y) expect_equal(pull(mf), y) expect_equal(pull(mf, x), x) expect_equal(pull(mf, 1L), x) expect_equal(pull(mf, -1), y) }) test_that("extracts correct column from grouped tbl", { mf <- memdb_frame(id = "a", value = 42) gf <- mf %>% group_by(id) expect_equal(pull(gf, value), 42) }) dbplyr/tests/testthat/test-sql-variant.txt0000644000176200001440000000006513500704060020523 0ustar liggesusers scalar: + aggregate: + window: + dbplyr/tests/testthat/test-backend-postgres.R0000644000176200001440000000537113455376701021114 0ustar liggesuserscontext("test-backend-postgres.R") test_that("custom scalar translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_postgres()) } expect_equal(trans(bitwXor(x, 128L)), sql("`x` # 128")) expect_equal(trans(log10(x)), sql("LOG(`x`)")) expect_equal(trans(log(x)), sql("LN(`x`)")) expect_equal(trans(log(x, 2)), sql("LOG(`x`) / LOG(2.0)")) expect_equal(trans(cot(x)), sql("1 / TAN(`x`)")) expect_equal(trans(round(x, digits = 1.1)), sql("ROUND((`x`) :: numeric, 1)")) expect_equal(trans(grepl("exp", x)), sql("(`x`) ~ ('exp')")) expect_equal(trans(grepl("exp", x, TRUE)), sql("(`x`) ~* ('exp')")) expect_equal(trans(substr("test", 2, 3)), sql("SUBSTR('test', 2, 2)")) }) test_that("custom stringr functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_postgres()) } expect_equal(trans(str_replace_all(x, y, z)), sql("REGEXP_REPLACE(`x`, `y`, `z`)")) }) test_that("two variable aggregates are translated correctly", { trans <- function(x, window) { translate_sql(!!enquo(x), window = window, con = simulate_postgres()) } expect_equal(trans(cor(x, y), window = FALSE), sql("CORR(`x`, `y`)")) expect_equal(trans(cor(x, y), window 
= TRUE), sql("CORR(`x`, `y`) OVER ()")) }) test_that("pasting translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), window = FALSE, con = simulate_postgres()) } expect_equal(trans(paste(x, y)), sql("CONCAT_WS(' ', `x`, `y`)")) expect_equal(trans(paste0(x, y)), sql("CONCAT_WS('', `x`, `y`)")) expect_error(trans(paste0(x, collapse = "")), "`collapse` not supported") }) test_that("postgres mimics two argument log", { trans <- function(...) { translate_sql(..., con = simulate_postgres()) } expect_equal(trans(log(x)), sql('LN(`x`)')) expect_equal(trans(log(x, 10)), sql('LOG(`x`) / LOG(10.0)')) expect_equal(trans(log(x, 10L)), sql('LOG(`x`) / LOG(10)')) }) test_that("custom lubridate functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_postgres()) } expect_equal(trans(yday(x)), sql("EXTRACT(DOY FROM `x`)")) expect_equal(trans(quarter(x)), sql("EXTRACT(QUARTER FROM `x`)")) expect_equal(trans(quarter(x, with_year = TRUE)), sql("(EXTRACT(YEAR FROM `x`) || '.' 
|| EXTRACT(QUARTER FROM `x`))")) expect_error(trans(quarter(x, fiscal_start = 2))) }) test_that("postgres can explain (#272)", { skip_if_no_db("postgres") df1 <- data.frame(x = 1:3) expect_output(expect_error( src_test("postgres") %>% copy_to(df1, unique_table_name()) %>% mutate(y = x + 1) %>% explain(), NA )) }) dbplyr/tests/testthat/test-query-semi-join.R0000644000176200001440000000100613426611521020703 0ustar liggesuserscontext("test-query-semi-join") test_that("print method doesn't change unexpectedly", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) qry <- sql_build(semi_join(lf1, lf2)) expect_known_output(print(qry), test_path("test-query-semi-join-print.txt")) }) test_that("generated sql doesn't change unexpectedly", { lf <- lazy_frame(x = 1, y = 2) reg <- list( semi = semi_join(lf, lf), anti = anti_join(lf, lf) ) expect_known_output(print(reg), test_path("sql/semi-join.sql")) }) dbplyr/tests/testthat/test-query-semi-join-print.txt0000644000176200001440000000007213500704057022455 0ustar liggesusers By: x-x X: df Y: df dbplyr/tests/testthat/test-verb-set-ops.R0000644000176200001440000000360113416126630020200 0ustar liggesuserscontext("sets") test_that("column order is matched", { df1 <- memdb_frame(x = 1, y = 2) df2 <- memdb_frame(y = 1, x = 2) out <- collect(union(df1, df2)) expect_equal(out, tibble(x = c(1, 2), y = c(2, 1))) }) test_that("missing columns filled with NULL", { df1 <- memdb_frame(x = 1) df2 <- memdb_frame(y = 1) out <- collect(union(df1, df2)) expect_equal(out, tibble(x = c(1, NA), y = c(NA, 1))) }) # SQL generation ---------------------------------------------------------- test_that("union and union all work for all backends", { df <- tibble(x = 1:10, y = x %% 2) tbls_full <- test_load(df) tbls_filter <- test_load(filter(df, y == 0)) tbls_full %>% purrr::map2(tbls_filter, union) %>% expect_equal_tbls() tbls_full %>% purrr::map2(tbls_filter, union_all) %>% expect_equal_tbls() }) test_that("intersect and setdiff work for 
supported backends", { df <- tibble(x = 1:10, y = x %% 2) # MySQL doesn't support EXCEPT or INTERSECT tbls_full <- test_load(df, ignore = c("mysql", "MariaDB")) tbls_filter <- test_load(filter(df, y == 0), ignore = c("mysql", "MariaDB")) tbls_full %>% purrr::map2(tbls_filter, intersect) %>% expect_equal_tbls() tbls_full %>% purrr::map2(tbls_filter, setdiff) %>% expect_equal_tbls() }) test_that("SQLite warns if set op attempted when tbl has LIMIT", { mf <- memdb_frame(x = 1:2) m1 <- head(mf, 1) expect_error(union(mf, m1), "does not support") expect_error(union(m1, mf), "does not support") }) test_that("other backends can combine with a limit", { df <- tibble(x = 1:2) # sqlite only allows limit at top level tbls_full <- test_load(df, ignore = "sqlite") tbls_head <- lapply(test_load(df, ignore = "sqlite"), head, n = 1) tbls_full %>% purrr::map2(tbls_head, union) %>% expect_equal_tbls() tbls_full %>% purrr::map2(tbls_head, union_all) %>% expect_equal_tbls() }) dbplyr/tests/testthat/test-backend-hive.R0000644000176200001440000000116213455376040020177 0ustar liggesuserscontext("test-backend-hive.R") test_that("custom scalar & string functions translated correctly", { trans <- function(x) { translate_sql(!!enquo(x), con = simulate_hive()) } expect_equal(trans(bitwShiftL(x, 2L)), sql("SHIFTLEFT(`x`, 2)")) expect_equal(trans(bitwShiftR(x, 2L)), sql("SHIFTRIGHT(`x`, 2)")) expect_equal(trans(cot(x)), sql("1.0 / TAN(`x`)")) expect_equal(trans(str_replace_all(x, "old", "new")), sql("REGEXP_REPLACE(`x`, 'old', 'new')")) expect_equal(trans(median(x)), sql("PERCENTILE(`x`, 0.5) OVER ()")) }) dbplyr/tests/testthat/test-escape.R0000644000176200001440000000725213416413232017115 0ustar liggesuserscontext("test-escape.R") # Identifiers ------------------------------------------------------------------ ei <- function(...) 
unclass(escape(ident(c(...)), con = simulate_dbi())) test_that("identifiers get identifier quoting", { expect_equal(ei("x"), '`x`') }) test_that("identifiers are comma separated", { expect_equal(ei("x", "y"), '`x`, `y`') }) test_that("identifier names become AS", { expect_equal(ei(x = "y"), '`y` AS `x`') }) # Zero-length inputs ------------------------------------------------------ test_that("zero length inputs yield zero length output when not collapsed", { con <- simulate_dbi() expect_equal(sql_vector(sql(), collapse = NULL, con = con), sql()) expect_equal(sql_vector(ident(), collapse = NULL, con = con), sql()) expect_equal(sql_vector(ident_q(), collapse = NULL, con = con), sql()) }) test_that("zero length inputs yield length-1 output when collapsed", { con <- simulate_dbi() expect_equal(sql_vector(sql(), parens = FALSE, collapse = "", con = con), sql("")) expect_equal(sql_vector(sql(), parens = TRUE, collapse = "", con = con), sql("()")) expect_equal(sql_vector(ident(), parens = FALSE, collapse = "", con = con), sql("")) expect_equal(sql_vector(ident(), parens = TRUE, collapse = "", con = con), sql("()")) expect_equal(sql_vector(ident_q(), parens = FALSE, collapse = "", con = con), sql("")) expect_equal(sql_vector(ident_q(), parens = TRUE, collapse = "", con = con), sql("()")) }) # Numeric ------------------------------------------------------------------ test_that("missing values become null", { con <- simulate_dbi() expect_equal(escape(NA, con = con), sql("NULL")) expect_equal(escape(NA_real_, con = con), sql("NULL")) expect_equal(escape(NA_integer_, con = con), sql("NULL")) expect_equal(escape(NA_character_, con = con), sql("NULL")) }) test_that("-Inf and Inf are expanded and quoted", { con <- simulate_dbi() expect_equal(escape(Inf, con = con), sql("'Infinity'")) expect_equal(escape(-Inf, con = con), sql("'-Infinity'")) }) test_that("can escape integer64 values", { con <- simulate_dbi() skip_if_not_installed("bit64") expect_equal(
escape(bit64::as.integer64(NA), con = con), sql("NULL") ) expect_equal( escape(bit64::as.integer64("123456789123456789"), con = con), sql("123456789123456789") ) }) # Logical ----------------------------------------------------------------- test_that("logical is SQL-99 compatible (by default)", { con <- simulate_dbi() expect_equal(escape(TRUE, con = con), sql("TRUE")) expect_equal(escape(FALSE, con = con), sql("FALSE")) expect_equal(escape(NA, con = con), sql("NULL")) }) # Date-time --------------------------------------------------------------- test_that("date-times are converted to ISO 8601", { con <- simulate_dbi() x <- ISOdatetime(2000, 1, 2, 3, 4, 5, tz = "US/Central") expect_equal(escape(x, con = con), sql("'2000-01-02T09:04:05Z'")) }) # names_to_as() ----------------------------------------------------------- test_that("names_to_as() doesn't alias when ident name and value are identical", { x <- ident(name = "name") y <- sql_escape_ident(con = simulate_dbi(), x = x) expect_equal(names_to_as(y, names2(x), con = simulate_dbi()), "`name`") }) test_that("names_to_as() doesn't alias when ident name is missing", { x <- ident("*") y <- sql_escape_ident(con = simulate_dbi(), x = x) expect_equal(names_to_as(y, names2(x), con = simulate_dbi()), "`*`") }) test_that("names_to_as() aliases when ident name and value are different", { x <- ident(new_name = "name") y <- sql_escape_ident(con = simulate_dbi(), x = x) expect_equal(names_to_as(y, names2(x), con = simulate_dbi()), "`name` AS `new_name`") }) dbplyr/tests/testthat/test-sql-build.R0000644000176200001440000000033713426323035017550 0ustar liggesuserscontext("test-sql-build") test_that("rendering table wraps in SELECT *", { out <- memdb_frame(x = 1) expect_match(out %>% sql_render(), "^SELECT [*]\nFROM `[^`]*`$") expect_equal(out %>% collect(), tibble(x = 1)) }) dbplyr/tests/testthat/test-verb-mutate.R0000644000176200001440000001264013475555062020122 0ustar liggesuserscontext("mutate") test_that("mutate computed 
before summarise", { mf <- memdb_frame(x = c(1, 2, 3), y = c(9, 8, 7)) out <- mutate(mf, z = x + y) %>% summarise(sum_z = sum(z, na.rm = TRUE)) %>% collect() expect_equal(out$sum_z, 30) }) test_that("two mutates equivalent to one", { mf <- memdb_frame(x = c(1, 5, 9), y = c(3, 12, 11)) df1 <- mf %>% mutate(x2 = x * 2, y4 = y * 4) %>% collect() df2 <- mf %>% collect() %>% mutate(x2 = x * 2, y4 = y * 4) expect_equal_tbl(df1, df2) }) test_that("can refer to freshly created values", { out1 <- memdb_frame(x1 = 1) %>% mutate(x2 = x1 + 1, x3 = x2 + 1, x4 = x3 + 1) %>% collect() expect_equal(out1, tibble(x1 = 1, x2 = 2, x3 = 3, x4 = 4)) out2 <- memdb_frame(x = 1) %>% mutate(x = x + 1, x = x + 1, x = x + 1) %>% collect() expect_equal(out2, tibble(x = 4)) }) test_that("queries are not nested unnecessarily", { # Should only be one query deep sql <- memdb_frame(x = 1) %>% mutate(y = x + 1, a = y + 1, b = y + 1) %>% sql_build() expect_s3_class(sql$from, "select_query") expect_s3_class(sql$from$from, "ident") }) test_that("maintains order of existing columns (#3216, #3223)", { lazy <- lazy_frame(x = 1, y = 2) %>% mutate(z = 3, y = 4, y = 5) expect_equal(op_vars(lazy), c("x", "y", "z")) }) test_that("supports overwriting variables (#3222)", { df <- memdb_frame(x = 1, y = 2) %>% mutate(y = 4, y = 5) %>% collect() expect_equal(df, tibble(x = 1, y = 5)) df <- memdb_frame(x = 1, y = 2) %>% mutate(y = 4, y = y + 1) %>% collect() expect_equal(df, tibble(x = 1, y = 5)) df <- memdb_frame(x = 1, y = 2) %>% mutate(y = 4, y = x + 4) %>% collect() expect_equal(df, tibble(x = 1, y = 5)) }) # SQL generation ----------------------------------------------------------- test_that("mutate calls windowed versions of sql functions", { dfs <- test_frame_windowed(x = 1:4, g = rep(c(1, 2), each = 2)) out <- lapply(dfs, .
%>% group_by(g) %>% mutate(r = as.numeric(row_number(x)))) expect_equal(out$df$r, c(1, 2, 1, 2)) expect_equal_tbls(out) }) test_that("recycled aggregates generate window function", { dfs <- test_frame_windowed(x = as.numeric(1:4), g = rep(c(1, 2), each = 2)) out <- lapply(dfs, . %>% group_by(g) %>% mutate(r = x - mean(x, na.rm = TRUE))) expect_equal(out$df$r, c(-0.5, 0.5, -0.5, 0.5)) expect_equal_tbls(out) }) test_that("cumulative aggregates generate window function", { dfs <- test_frame_windowed(x = 1:4, g = rep(c(1, 2), each = 2)) out <- lapply(dfs, . %>% group_by(g) %>% arrange(x) %>% mutate(r = as.numeric(cumsum(x))) ) expect_equal(out$df$r, c(1, 3, 3, 7)) expect_equal_tbls(out) }) test_that("mutate overwrites previous variables", { df <- memdb_frame(x = 1:5) %>% mutate(x = x + 1) %>% mutate(x = x + 1) %>% collect() expect_equal(names(df), "x") expect_equal(df$x, 1:5 + 2) }) test_that("sequence of operations work", { out <- memdb_frame(x = c(1, 2, 3, 4)) %>% select(y = x) %>% mutate(z = 2 * y) %>% filter(z == 2) %>% collect() expect_equal(out, tibble(y = 1, z = 2)) }) # sql_render -------------------------------------------------------------- test_that("quoting for rendering mutated grouped table", { out <- memdb_frame(x = 1, y = 2) %>% mutate(y = x) expect_match(out %>% sql_render, "^SELECT `x`, `x` AS `y`\nFROM `[^`]*`$") expect_equal(out %>% collect, tibble(x = 1, y = 1)) }) test_that("mutate generates subqueries as needed", { lf <- lazy_frame(x = 1, con = simulate_sqlite()) reg <- list( inplace = lf %>% mutate(x = x + 1, x = x + 1), increment = lf %>% mutate(x1 = x + 1, x2 = x1 + 1) ) expect_known_output(print(reg), test_path("sql/mutate-subqueries.sql")) }) test_that("mutate collapses over nested select", { lf <- lazy_frame(g = 0, x = 1, y = 2) reg <- list( xy = lf %>% select(x:y) %>% mutate(x = x * 2, y = y * 2), yx = lf %>% select(y:x) %>% mutate(x = x * 2, y = y * 2) ) expect_known_output(print(reg), test_path("sql/mutate-select-collapse.sql")) }) # 
sql_build --------------------------------------------------------------- test_that("mutate generates simple expressions", { out <- lazy_frame(x = 1) %>% mutate(y = x + 1L) %>% sql_build() expect_equal(out$select, sql(x = '`x`', y = '`x` + 1')) }) test_that("mutate can drop variables with NULL", { out <- lazy_frame(x = 1, y = 1) %>% mutate(y = NULL) %>% sql_build() expect_named(out$select, "x") }) test_that("mutate_all generates correct sql", { out <- lazy_frame(x = 1, y = 1) %>% mutate_all(~ . + 1L) %>% sql_build() expect_equal(out$select, sql(x = '`x` + 1', y = '`y` + 1')) out <- lazy_frame(x = 1) %>% mutate_all(list(one = ~ . + 1L, two = ~ . + 2L)) %>% sql_build() expect_equal(out$select, sql(`x` = '`x`', one = '`x` + 1', two = '`x` + 2')) }) test_that("mutate_all scopes nested quosures correctly", { num <- 10L out <- lazy_frame(x = 1, y = 1) %>% mutate_all(~ . + num) %>% sql_build() expect_equal(out$select, sql(x = '`x` + 10', y = '`y` + 10')) }) # ops --------------------------------------------------------------------- test_that("mutate adds new", { out <- tibble(x = 1) %>% tbl_lazy() %>% mutate(y = x + 1, z = y + 1) expect_equal(op_vars(out), c("x", "y", "z")) }) test_that("mutated vars are always named", { mf <- dbplyr::memdb_frame(a = 1) out2 <- mf %>% mutate(1) %>% op_vars() expect_equal(out2, c("a", "1")) }) dbplyr/tests/testthat/test-tbl-sql.R0000644000176200001440000000703413501726003017227 0ustar liggesuserscontext("test-tbl_sql.R") test_that("tbl_sql() works with string argument", { name <- unclass(unique_table_name()) df <- memdb_frame(a = 1, .name = name) expect_equal(collect(tbl_sql("sqlite", df$src, name)), collect(df)) }) test_that("head/print respects n" ,{ df2 <- memdb_frame(x = 1:5) out <- df2 %>% head(n = Inf) %>% collect() expect_equal(nrow(out), 5) expect_output(print(df2, n = Inf)) out <- df2 %>% head(n = 1) %>% collect() expect_equal(nrow(out), 1) out <- df2 %>% head(n = 0) %>% collect() expect_equal(nrow(out), 0) expect_error( df2 %>% 
head(n = -1) %>% collect(), "not greater than or equal to 0" ) }) test_that("same_src distinguishes srcs", { src1 <- src_sqlite(":memory:", create = TRUE) src2 <- src_sqlite(":memory:", create = TRUE) expect_true(same_src(src1, src1)) expect_false(same_src(src1, src2)) db1 <- copy_to(src1, iris, 'data1', temporary = FALSE) db2 <- copy_to(src2, iris, 'data2', temporary = FALSE) expect_true(same_src(db1, db1)) expect_false(same_src(db1, db2)) expect_false(same_src(db1, mtcars)) }) # tbl --------------------------------------------------------------------- test_that("can generate sql tbls with raw sql", { mf1 <- memdb_frame(x = 1:3, y = 3:1) mf2 <- tbl(mf1$src, build_sql("SELECT * FROM ", mf1$ops$x, con = simulate_dbi())) expect_equal(collect(mf1), collect(mf2)) }) test_that("can refer to default schema explicitly", { con <- sqlite_con_with_aux() on.exit(DBI::dbDisconnect(con)) DBI::dbExecute(con, "CREATE TABLE t1 (x)") expect_equal(as.character(tbl_vars(tbl(con, "t1"))), "x") expect_equal(as.character(tbl_vars(tbl(con, in_schema("main", "t1")))), "x") }) test_that("can distinguish 'schema.table' from 'schema'.'table'", { con <- sqlite_con_with_aux() on.exit(DBI::dbDisconnect(con)) DBI::dbExecute(con, "CREATE TABLE aux.t1 (x, y, z)") DBI::dbExecute(con, "CREATE TABLE 'aux.t1' (a, b, c)") expect_equal(as.character(tbl_vars(tbl(con, in_schema("aux", "t1")))), c("x", "y", "z")) expect_equal(as.character(tbl_vars(tbl(con, ident("aux.t1")))), c("a", "b", "c")) }) # n_groups ---------------------------------------------------------------- # Data for the first three test_that groups below df <- data.frame(x = rep(1:3, each = 10), y = rep(1:6, each = 5)) # MariaDB returns bit64 instead of int, which makes testing hard tbls <- test_load(df, ignore = "MariaDB") test_that("ungrouped data has 1 group, with group size = nrow()", { for (tbl in tbls) { expect_equal(n_groups(tbl), 1L) expect_equal(group_size(tbl), 30) } }) test_that("rowwise data has one group for each row", { rw
<- rowwise(df) expect_equal(n_groups(rw), 30) expect_equal(group_size(rw), rep(1, 30)) }) test_that("group_size correct for grouped data", { for (tbl in tbls) { grp <- group_by(tbl, x) expect_equal(n_groups(grp), 3L) expect_equal(group_size(grp), rep(10, 3)) } }) # tbl_sum ------------------------------------------------------------------- test_that("ungrouped output", { mf <- memdb_frame(x = 1:5, y = 1:5, .name = "tbl_sum_test") out1 <- tbl_sum(mf) expect_named(out1, c("Source", "Database")) expect_equal(out1[["Source"]], "table [?? x 2]") expect_match(out1[["Database"]], "sqlite (.*) \\[:memory:\\]") out2 <- tbl_sum(mf %>% group_by(x, y)) expect_named(out2, c("Source", "Database", "Groups")) expect_equal(out2[["Groups"]], c("x, y")) out3 <- tbl_sum(mf %>% arrange(x)) expect_named(out3, c("Source", "Database", "Ordered by")) expect_equal(out3[["Ordered by"]], c("x")) }) dbplyr/tests/testthat/test-translate-sql-string.R0000644000176200001440000000213113442404754021752 0ustar liggesuserscontext("translate string helpers") test_that("sql_substr works as expected", { old <- set_current_con(simulate_dbi()) on.exit(set_current_con(old)) x <- ident("x") substr <- sql_substr("SUBSTR") expect_error(substr("test"), 'argument "start" is missing') expect_error(substr("test", 0), 'argument "stop" is missing') expect_equal(substr("test", 0, 1), sql("SUBSTR('test', 0, 2)")) expect_equal(substr("test", 3, 2), sql("SUBSTR('test', 3, 0)")) expect_equal(substr("test", 3, 3), sql("SUBSTR('test', 3, 1)")) }) test_that("sql_str_sub works as expected", { old <- set_current_con(simulate_dbi()) on.exit(set_current_con(old)) x <- ident("x") substr <- sql_str_sub("SUBSTR") expect_equal(substr(x), sql("SUBSTR(`x`, 1)")) expect_equal(substr(x, 1), sql("SUBSTR(`x`, 1)")) expect_equal(substr(x, -1), sql("SUBSTR(`x`, -1)")) expect_equal(substr(x, 2, 4), sql("SUBSTR(`x`, 2, 3)")) expect_equal(substr(x, 2, 2), sql("SUBSTR(`x`, 2, 1)")) expect_equal(substr(x, 1, -2), sql("SUBSTR(`x`, 1, LENGTH(`x`) 
- 1)")) expect_equal(substr(x, -3, -3), sql("SUBSTR(`x`, -3, 1)")) }) dbplyr/tests/testthat/test-backend-.R0000644000176200001440000000462613442161247017330 0ustar liggesuserscontext("translate-math") test_that("db_write_table calls dbQuoteIdentifier on table name" ,{ idents <- character() setClass("DummyDBIConnection", representation("DBIConnection")) setMethod("dbQuoteIdentifier", c("DummyDBIConnection", "character"), function(conn, x, ...) { idents <<- c(idents, x) } ) setMethod("dbWriteTable", c("DummyDBIConnection", "character", "ANY"), function(conn, name, value, ...) {TRUE} ) dummy_con <- new("DummyDBIConnection") db_write_table(dummy_con, "somecrazytablename", NA, NA) expect_true("somecrazytablename" %in% idents) }) # basic arithmetic -------------------------------------------------------- test_that("basic arithmetic is correct", { expect_equal(translate_sql(1 + 2), sql("1.0 + 2.0")) expect_equal(translate_sql(2 * 4), sql("2.0 * 4.0")) expect_equal(translate_sql(5 ^ 2), sql("POWER(5.0, 2.0)")) expect_equal(translate_sql(100L %% 3L), sql("100 % 3")) }) test_that("small numbers aren't converted to 0", { expect_equal(translate_sql(1e-9), sql("1e-09")) }) # minus ------------------------------------------------------------------- test_that("unary minus flips sign of number", { expect_equal(translate_sql(-10L), sql("-10")) expect_equal(translate_sql(x == -10), sql('`x` = -10.0')) expect_equal(translate_sql(x %in% c(-1L, 0L)), sql('`x` IN (-1, 0)')) }) test_that("unary minus wraps non-numeric expressions", { expect_equal(translate_sql(-(1L + 2L)), sql("-(1 + 2)")) expect_equal(translate_sql(-mean(x, na.rm = TRUE), window = FALSE), sql('-AVG(`x`)')) }) test_that("binary minus subtracts", { expect_equal(translate_sql(1L - 10L), sql("1 - 10")) }) # log --------------------------------------------------------------------- test_that("log base comes first", { expect_equal(translate_sql(log(x, 10)), sql('LOG(10.0, `x`)')) }) test_that("log becomes ln", { 
expect_equal(translate_sql(log(x)), sql('LN(`x`)')) }) # bitwise ----------------------------------------------------------------- test_that("bitwise operations", { expect_equal(translate_sql(bitwNot(x)), sql("~(`x`)")) expect_equal(translate_sql(bitwAnd(x, 128L)), sql("`x` & 128")) expect_equal(translate_sql(bitwOr(x, 128L)), sql("`x` | 128")) expect_equal(translate_sql(bitwXor(x, 128L)), sql("`x` ^ 128")) expect_equal(translate_sql(bitwShiftL(x, 2L)), sql("`x` << 2")) expect_equal(translate_sql(bitwShiftR(x, 2L)), sql("`x` >> 2")) }) dbplyr/tests/testthat/test-verb-distinct.R0000644000176200001440000000406313442450753020437 0ustar liggesuserscontext("distinct") df <- tibble( x = c(1, 1, 1, 1), y = c(1, 1, 2, 2), z = c(1, 2, 1, 2) ) dfs <- test_load(df) test_that("distinct equivalent to local unique when keep_all is TRUE", { dfs %>% lapply(. %>% distinct()) %>% expect_equal_tbls(unique(df)) }) test_that("distinct for single column equivalent to local unique (#1937)", { dfs %>% lapply(. %>% distinct(x, .keep_all = FALSE)) %>% expect_equal_tbls(unique(df["x"])) dfs %>% lapply(. 
%>% distinct(y, .keep_all = FALSE)) %>% expect_equal_tbls(unique(df["y"])) }) test_that("distinct throws error if column is specified and .keep_all is TRUE", { mf <- memdb_frame(x = 1:10) expect_error( mf %>% distinct(x, .keep_all = TRUE) %>% collect(), "specified columns.*[.]keep_all" ) }) # sql-render -------------------------------------------------------------- test_that("distinct adds DISTINCT suffix", { out <- memdb_frame(x = c(1, 1)) %>% distinct() expect_match(out %>% sql_render(), "SELECT DISTINCT") expect_equal(out %>% collect(), tibble(x = 1)) }) test_that("distinct can compute variables", { out <- memdb_frame(x = c(2, 1), y = c(1, 2)) %>% distinct(z = x + y) expect_equal(out %>% collect(), tibble(z = 3)) }) # sql_build --------------------------------------------------------------- test_that("distinct sets flagged", { out1 <- lazy_frame(x = 1) %>% select() %>% sql_build() expect_false(out1$distinct) out2 <- lazy_frame(x = 1) %>% distinct() %>% sql_build() expect_true(out2$distinct) }) # ops --------------------------------------------------------------------- test_that("distinct has complicated rules", { out <- lazy_frame(x = 1, y = 2) %>% distinct() expect_equal(op_vars(out), c("x", "y")) out <- lazy_frame(x = 1, y = 2, z = 3) %>% distinct(x, y) expect_equal(op_vars(out), c("x", "y")) out <- lazy_frame(x = 1, y = 2, z = 3) %>% distinct(a = x, b = y) expect_equal(op_vars(out), c("a", "b")) out <- lazy_frame(x = 1, y = 2, z = 3) %>% group_by(x) %>% distinct(y) expect_equal(op_vars(out), c("x", "y")) }) dbplyr/tests/testthat/test-verb-compute.R0000644000176200001440000000415713426323065020273 0ustar liggesuserscontext("test-verb-compute") test_that("collect equivalent to as.data.frame/as_tibble", { mf <- memdb_frame(letters) expect_equal(as.data.frame(mf), data.frame(letters, stringsAsFactors = FALSE)) expect_equal(tibble::as_tibble(mf), tibble::tibble(letters)) expect_equal(collect(mf), tibble::tibble(letters)) }) test_that("explicit collection returns 
all data", { n <- 1e5 + 10 # previous default was 1e5 big <- memdb_frame(x = seq_len(n)) nrow1 <- big %>% as.data.frame() %>% nrow() nrow2 <- big %>% tibble::as_tibble() %>% nrow() nrow3 <- big %>% collect() %>% nrow() expect_equal(nrow1, n) expect_equal(nrow2, n) expect_equal(nrow3, n) }) test_that("compute doesn't change representation", { mf1 <- memdb_frame(x = 5:1, y = 1:5, z = "a") expect_equal_tbl(mf1, mf1 %>% compute) expect_equal_tbl(mf1, mf1 %>% compute %>% compute) mf2 <- mf1 %>% mutate(z = x + y) expect_equal_tbl(mf2, mf2 %>% compute) }) test_that("compute can create indexes", { mfs <- test_frame(x = 5:1, y = 1:5, z = 10) mfs %>% lapply(. %>% compute(indexes = c("x", "y"))) %>% expect_equal_tbls() mfs %>% lapply(. %>% compute(indexes = list("x", "y", c("x", "y")))) %>% expect_equal_tbls() mfs %>% lapply(. %>% compute(indexes = "x", unique_indexes = "y")) %>% expect_equal_tbls() mfs %>% lapply(. %>% compute(unique_indexes = list(c("x", "z"), c("y", "z")))) %>% expect_equal_tbls() }) test_that("unique index fails if values are duplicated", { mfs <- test_frame(x = 5:1, y = "a", ignore = "df") lapply(mfs, function(.) 
expect_error(compute(., unique_indexes = "y"))) }) test_that("compute creates correct column names", { out <- memdb_frame(x = 1) %>% group_by(x) %>% summarize(n = n()) %>% compute() %>% collect() expect_equal(out, tibble(x = 1, n = 1L)) }) # ops --------------------------------------------------------------------- test_that("preserved across compute and collapse", { df1 <- memdb_frame(x = sample(10)) %>% arrange(x) df2 <- compute(df1) expect_equal(op_sort(df2), list(~x)) df3 <- collapse(df1) expect_equal(op_sort(df3), list(~x)) }) dbplyr/tests/testthat/helper-src.R0000644000176200001440000000203013442161033016727 0ustar liggesusersif (test_srcs$length() == 0) { test_register_src("df", dplyr::src_df(env = new.env(parent = emptyenv()))) test_register_con("sqlite", RSQLite::SQLite(), ":memory:") if (identical(Sys.getenv("TRAVIS"), "true")) { test_register_con("postgres", RPostgreSQL::PostgreSQL(), dbname = "test", user = "travis", password = "" ) } else { test_register_con("mysql", RMySQL::MySQL(), dbname = "test", host = "localhost", user = Sys.getenv("USER") ) test_register_con("MariaDB", RMariaDB::MariaDB(), dbname = "test", host = "localhost", user = Sys.getenv("USER") ) test_register_con("postgres", RPostgreSQL::PostgreSQL(), dbname = "test", host = "localhost", user = "" ) } } skip_if_no_db <- function(db) { if (!test_srcs$has(db)) skip(paste0("No ", db)) } sqlite_con_with_aux <- function() { tmp <- tempfile() con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") DBI::dbExecute(con, paste0("ATTACH '", tmp, "' AS aux")) con } dbplyr/tests/testthat/test-partial-eval.R0000644000176200001440000000210213475554530020237 0ustar liggesuserscontext("test-partial-eval.R") test_that("namespace operators always evaluated locally", { expect_equal(partial_eval(quote(base::sum(1, 2))), 3) expect_equal(partial_eval(quote(base:::sum(1, 2))), 3) }) test_that("namespaced calls to dplyr functions are stripped", { expect_equal(partial_eval(quote(dplyr::n())), expr(n())) }) 
test_that("use quosure environment for unevaluated formulas", { x <- 1 expect_equal(partial_eval(expr(~x)), ~1) }) test_that("can look up inlined function", { expect_equal( partial_eval(expr((!!mean)(x)), vars = "x"), expr(mean(x)) ) expect_equal( partial_eval(expr((!!as_function("mean"))(x)), vars = "x"), expr(mean(x)) ) }) test_that("respects tidy evaluation pronouns", { x <- "X" X <- "XX" expect_equal(partial_eval(expr(.data$x)), expr(x)) expect_equal(partial_eval(expr(.data[["x"]])), expr(x)) expect_equal(partial_eval(expr(.data[[x]])), expr(X)) expect_equal(partial_eval(expr(.env$x)), "X") expect_equal(partial_eval(expr(.env[["x"]])), "X") expect_equal(partial_eval(expr(.env[[x]])), "XX") }) dbplyr/tests/testthat/test-verb-do.R0000644000176200001440000000237713415750354017222 0ustar liggesuserscontext("test-do.R") test_that("ungrouped data collected first", { out <- memdb_frame(x = 1:2) %>% do(head(.)) expect_equal(out, tibble(x = 1:2)) }) test_that("named arguments become list columns", { mf <- memdb_frame( g = rep(1:3, 1:3), x = 1:6 ) %>% group_by(g) out <- mf %>% do(nrow = nrow(.), ncol = ncol(.)) expect_equal(out$nrow, list(1, 2, 3)) expect_equal(out$ncol, list(2, 2, 2)) }) test_that("unnamed results bound together by row", { mf <- memdb_frame( g = c(1, 1, 2, 2), x = c(3, 9, 4, 9) ) %>% group_by(g) first <- mf %>% do(head(., 1)) expect_equal_tbl(first, tibble(g = c(1, 2), x = c(3, 4))) }) test_that("Results respect select", { mf <- memdb_frame( g = c(1, 1, 2, 2), x = c(3, 9, 4, 9), y = 1:4, z = 4:1 ) %>% group_by(g) out <- mf %>% select(x) %>% do(ncol = ncol(.)) expect_equal(out$g, c(1, 2)) expect_equal(out$ncol, list(2L, 2L)) }) test_that("results independent of chunk_size", { mf <- memdb_frame( g = rep(1:3, 1:3), x = 1:6 ) %>% group_by(g) nrows <- function(group, n) { unlist(do(group, nrow = nrow(.), .chunk_size = n)$nrow) } expect_equal(nrows(mf, 1), c(1, 2, 3)) expect_equal(nrows(mf, 2), c(1, 2, 3)) expect_equal(nrows(mf, 10), c(1, 2, 3)) })
dbplyr/tests/testthat/test-translate-sql-helpers.R0000644000176200001440000000253413426147016022112 0ustar liggesuserscontext("test-translate-sql-helpers.r") old <- NULL setup(old <<- set_current_con(simulate_dbi())) teardown(set_current_con(old)) test_that("aggregation functions warn if na.rm = FALSE", { sql_mean <- sql_aggregate("MEAN") expect_warning(sql_mean("x"), "Missing values") expect_warning(sql_mean("x", na.rm = TRUE), NA) }) test_that("missing window functions create a warning", { sim_scalar <- sql_translator() sim_agg <- sql_translator(`+` = sql_infix("+")) sim_win <- sql_translator() expect_warning( sql_variant(sim_scalar, sim_agg, sim_win), "Translator is missing" ) }) test_that("missing aggregate functions filled in", { sim_scalar <- sql_translator() sim_agg <- sql_translator() sim_win <- sql_translator(mean = function() {}) trans <- sql_variant(sim_scalar, sim_agg, sim_win) expect_error(trans$aggregate$mean(), "only available in a window") }) test_that("output of print method for sql_variant is correct", { sim_trans <- sql_translator(`+` = sql_infix("+")) expect_known_output( sql_variant(sim_trans, sim_trans, sim_trans), test_path("test-sql-variant.txt"), print = TRUE ) }) test_that("win_rank() is accepted by the sql_translator", { expect_known_output( print(sql_variant( sql_translator( test = win_rank("test") ) )), test_path("test-sql-translator.txt")) }) dbplyr/tests/testthat/test-sql.R0000644000176200001440000000034113443122005016437 0ustar liggesuserscontext("test-sql") test_that("can concatenate sql vector without supplying connection", { expect_equal(c(sql("x")), sql("x")) expect_equal(c(sql("x"), "x"), sql("x", "'x'")) expect_equal(c(ident("x")), sql("`x`")) }) dbplyr/tests/testthat/test-verb-copy-to.R0000644000176200001440000000275513416400535020210 0ustar liggesuserscontext("test-copy-to") test_that("can copy to from remote sources", { df <- data.frame(x = 1:10) con1 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") 
on.exit(DBI::dbDisconnect(con1), add = TRUE) df_1 <- copy_to(con1, df, "df1") # Create from tbl in same database df_2 <- copy_to(con1, df_1, "df2") expect_equal(collect(df_2), df) # Create from tbl in another database con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") on.exit(DBI::dbDisconnect(con2), add = TRUE) df_3 <- copy_to(con2, df_1, "df3") expect_equal(collect(df_3), df) }) test_that("can round trip basic data frame", { df <- test_frame(x = c(1, 10, 9, NA), y = letters[1:4]) expect_equal_tbls(df) }) test_that("NAs in character fields handled by db sources (#2256)", { df <- test_frame( x = c("a", "aa", NA), y = c(NA, "b", "bb"), z = c("cc", NA, "c") ) expect_equal_tbls(df) }) test_that("only overwrite existing table if explicitly requested", { con <- DBI::dbConnect(RSQLite::SQLite()) on.exit(DBI::dbDisconnect(con)) DBI::dbWriteTable(con, "df", data.frame(x = 1:5)) expect_error(copy_to(con, data.frame(x = 1), name = "df"), "exists") expect_silent(copy_to(con, data.frame(x = 1), name = "df", overwrite = TRUE)) }) test_that("can create a new table in non-default schema", { con <- sqlite_con_with_aux() on.exit(DBI::dbDisconnect(con)) aux_mtcars <- copy_to(con, mtcars, in_schema("aux", "mtcars"), temporary = FALSE) expect_equal(tbl_vars(aux_mtcars), tbl_vars(mtcars)) }) dbplyr/tests/testthat/test-query-join.R0000644000176200001440000000176513442161247017763 0ustar liggesuserscontext("test-query-join") test_that("print method doesn't change unexpectedly", { lf1 <- lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 2) qry <- sql_build(left_join(lf1, lf2)) expect_known_output(print(qry), test_path("test-query-join-print.txt")) }) test_that("generated sql doesn't change unexpectedly", { lf <- lazy_frame(x = 1, y = 2) reg <- list( inner = inner_join(lf, lf), left = left_join(lf, lf), right = right_join(lf, lf), full = full_join(lf, lf) ) expect_known_output(print(reg), test_path("sql/join.sql")) }) test_that("sql_on query doesn't change unexpectedly", { lf1 <-
lazy_frame(x = 1, y = 2) lf2 <- lazy_frame(x = 1, z = 3) reg <- list( inner = inner_join(lf1, lf2, sql_on = "LHS.y < RHS.z"), left = left_join(lf1, lf2, sql_on = "LHS.y < RHS.z"), right = right_join(lf1, lf2, sql_on = "LHS.y < RHS.z"), full = full_join(lf1, lf2, sql_on = "LHS.y < RHS.z") ) expect_known_output(print(reg), test_path("sql/join-on.sql")) }) dbplyr/tests/testthat/test-remote.R0000644000176200001440000000103413415745770017156 0ustar liggesuserscontext("test-remote.R") test_that("remote_name returns null for computed tables", { mf <- memdb_frame(x = 5, .name = "refxiudlph") expect_equal(remote_name(mf), ident("refxiudlph")) mf2 <- mf %>% filter(x == 3) expect_equal(remote_name(mf2), NULL) }) test_that("can retrieve query, src and con metadata", { mf <- memdb_frame(x = 5) expect_s4_class(remote_con(mf), "DBIConnection") expect_s3_class(remote_src(mf), "src_sql") expect_s3_class(remote_query(mf), "sql") expect_type(remote_query_plan(mf), "character") }) dbplyr/NAMESPACE0000644000176200001440000002460213501726132012771 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method(anti_join,tbl_lazy) S3method(arrange,tbl_lazy) S3method(arrange_,tbl_lazy) S3method(as.data.frame,tbl_lazy) S3method(as.data.frame,tbl_sql) S3method(as.sql,character) S3method(as.sql,ident) S3method(as.sql,sql) S3method(auto_copy,tbl_sql) S3method(c,ident) S3method(c,sql) S3method(collapse,tbl_sql) S3method(collect,tbl_sql) S3method(compute,tbl_sql) S3method(copy_to,src_sql) S3method(db_analyze,"Microsoft SQL Server") S3method(db_analyze,ACCESS) S3method(db_analyze,DBIConnection) S3method(db_analyze,Hive) S3method(db_analyze,Impala) S3method(db_analyze,MariaDBConnection) S3method(db_analyze,MySQL) S3method(db_analyze,MySQLConnection) S3method(db_analyze,OraConnection) S3method(db_analyze,Oracle) S3method(db_analyze,Teradata) S3method(db_begin,DBIConnection) S3method(db_begin,MySQLConnection) S3method(db_begin,PostgreSQLConnection) S3method(db_collect,DBIConnection) 
S3method(db_commit,DBIConnection) S3method(db_commit,MySQLConnection) S3method(db_compute,DBIConnection) S3method(db_copy_to,"Microsoft SQL Server") S3method(db_copy_to,DBIConnection) S3method(db_create_index,DBIConnection) S3method(db_create_index,MariaDBConnection) S3method(db_create_index,MySQL) S3method(db_create_index,MySQLConnection) S3method(db_create_indexes,DBIConnection) S3method(db_create_table,DBIConnection) S3method(db_data_type,DBIConnection) S3method(db_data_type,MySQLConnection) S3method(db_desc,DBIConnection) S3method(db_desc,MariaDBConnection) S3method(db_desc,MySQL) S3method(db_desc,MySQLConnection) S3method(db_desc,OdbcConnection) S3method(db_desc,PostgreSQL) S3method(db_desc,PostgreSQLConnection) S3method(db_desc,PqConnection) S3method(db_desc,SQLiteConnection) S3method(db_drop_table,DBIConnection) S3method(db_drop_table,OraConnection) S3method(db_drop_table,Oracle) S3method(db_explain,DBIConnection) S3method(db_explain,Oracle) S3method(db_explain,PostgreSQL) S3method(db_explain,PostgreSQLConnection) S3method(db_explain,PqConnection) S3method(db_explain,SQLiteConnection) S3method(db_has_table,DBIConnection) S3method(db_has_table,MariaDBConnection) S3method(db_has_table,MySQL) S3method(db_has_table,MySQLConnection) S3method(db_has_table,PostgreSQLConnection) S3method(db_insert_into,DBIConnection) S3method(db_list_tables,DBIConnection) S3method(db_query_fields,DBIConnection) S3method(db_query_fields,PostgreSQLConnection) S3method(db_query_rows,DBIConnection) S3method(db_rollback,DBIConnection) S3method(db_rollback,MySQLConnection) S3method(db_save_query,"Microsoft SQL Server") S3method(db_save_query,DBIConnection) S3method(db_sql_render,DBIConnection) S3method(db_write_table,"Microsoft SQL Server") S3method(db_write_table,DBIConnection) S3method(db_write_table,MySQLConnection) S3method(db_write_table,PostgreSQLConnection) S3method(dim,tbl_lazy) S3method(dimnames,tbl_lazy) S3method(distinct,tbl_lazy) S3method(distinct_,tbl_lazy) 
S3method(do,tbl_sql) S3method(do_,tbl_sql) S3method(escape,"NULL") S3method(escape,Date) S3method(escape,POSIXt) S3method(escape,character) S3method(escape,data.frame) S3method(escape,double) S3method(escape,factor) S3method(escape,ident) S3method(escape,ident_q) S3method(escape,integer) S3method(escape,integer64) S3method(escape,list) S3method(escape,logical) S3method(escape,reactivevalues) S3method(escape,sql) S3method(explain,tbl_sql) S3method(filter_,tbl_lazy) S3method(format,ident) S3method(format,sql) S3method(format,src_sql) S3method(full_join,tbl_lazy) S3method(group_by,tbl_lazy) S3method(group_by_,tbl_lazy) S3method(group_size,tbl_sql) S3method(group_vars,tbl_lazy) S3method(groups,tbl_lazy) S3method(head,tbl_lazy) S3method(inner_join,tbl_lazy) S3method(left_join,tbl_lazy) S3method(mutate,tbl_lazy) S3method(mutate_,tbl_lazy) S3method(n_groups,tbl_sql) S3method(names,sql_variant) S3method(op_desc,op) S3method(op_desc,op_arrange) S3method(op_desc,op_base_remote) S3method(op_desc,op_group_by) S3method(op_frame,op_base) S3method(op_frame,op_double) S3method(op_frame,op_frame) S3method(op_frame,op_single) S3method(op_frame,tbl_lazy) S3method(op_grps,op_base) S3method(op_grps,op_double) S3method(op_grps,op_group_by) S3method(op_grps,op_select) S3method(op_grps,op_single) S3method(op_grps,op_summarise) S3method(op_grps,op_ungroup) S3method(op_grps,tbl_lazy) S3method(op_sort,op_arrange) S3method(op_sort,op_base) S3method(op_sort,op_double) S3method(op_sort,op_order) S3method(op_sort,op_single) S3method(op_sort,op_summarise) S3method(op_sort,tbl_lazy) S3method(op_vars,op_base) S3method(op_vars,op_distinct) S3method(op_vars,op_double) S3method(op_vars,op_join) S3method(op_vars,op_select) S3method(op_vars,op_semi_join) S3method(op_vars,op_set_op) S3method(op_vars,op_single) S3method(op_vars,op_summarise) S3method(op_vars,tbl_lazy) S3method(print,ident) S3method(print,join_query) S3method(print,op_base_local) S3method(print,op_base_remote) S3method(print,op_single) 
S3method(print,select_query) S3method(print,semi_join_query) S3method(print,set_op_query) S3method(print,sql) S3method(print,sql_variant) S3method(print,tbl_lazy) S3method(print,tbl_sql) S3method(pull,tbl_sql) S3method(rename,tbl_lazy) S3method(rename_,tbl_lazy) S3method(right_join,tbl_lazy) S3method(same_src,src_sql) S3method(same_src,tbl_lazy) S3method(same_src,tbl_sql) S3method(select,tbl_lazy) S3method(select_,tbl_lazy) S3method(semi_join,tbl_lazy) S3method(show_query,tbl_lazy) S3method(sql_build,ident) S3method(sql_build,op_arrange) S3method(sql_build,op_base_local) S3method(sql_build,op_base_remote) S3method(sql_build,op_distinct) S3method(sql_build,op_filter) S3method(sql_build,op_frame) S3method(sql_build,op_group_by) S3method(sql_build,op_head) S3method(sql_build,op_join) S3method(sql_build,op_order) S3method(sql_build,op_select) S3method(sql_build,op_semi_join) S3method(sql_build,op_set_op) S3method(sql_build,op_summarise) S3method(sql_build,op_ungroup) S3method(sql_build,tbl_lazy) S3method(sql_escape_ident,DBIConnection) S3method(sql_escape_ident,MySQLConnection) S3method(sql_escape_ident,SQLiteConnection) S3method(sql_escape_logical,ACCESS) S3method(sql_escape_logical,DBIConnection) S3method(sql_escape_logical,SQLiteConnection) S3method(sql_escape_string,DBIConnection) S3method(sql_join,DBIConnection) S3method(sql_join,MySQLConnection) S3method(sql_optimise,ident) S3method(sql_optimise,query) S3method(sql_optimise,select_query) S3method(sql_optimise,sql) S3method(sql_render,ident) S3method(sql_render,join_query) S3method(sql_render,op) S3method(sql_render,select_query) S3method(sql_render,semi_join_query) S3method(sql_render,set_op_query) S3method(sql_render,sql) S3method(sql_render,tbl_lazy) S3method(sql_select,"Microsoft SQL Server") S3method(sql_select,ACCESS) S3method(sql_select,DBIConnection) S3method(sql_select,OraConnection) S3method(sql_select,Oracle) S3method(sql_select,Teradata) S3method(sql_semi_join,DBIConnection) 
S3method(sql_set_op,SQLiteConnection) S3method(sql_set_op,default) S3method(sql_subquery,DBIConnection) S3method(sql_subquery,OraConnection) S3method(sql_subquery,Oracle) S3method(sql_subquery,SQLiteConnection) S3method(sql_translate_env,"Microsoft SQL Server") S3method(sql_translate_env,ACCESS) S3method(sql_translate_env,DBIConnection) S3method(sql_translate_env,Hive) S3method(sql_translate_env,Impala) S3method(sql_translate_env,MariaDBConnection) S3method(sql_translate_env,MySQL) S3method(sql_translate_env,MySQLConnection) S3method(sql_translate_env,OdbcConnection) S3method(sql_translate_env,OraConnection) S3method(sql_translate_env,Oracle) S3method(sql_translate_env,PostgreSQL) S3method(sql_translate_env,PostgreSQLConnection) S3method(sql_translate_env,PqConnection) S3method(sql_translate_env,Redshift) S3method(sql_translate_env,SQLiteConnection) S3method(sql_translate_env,Teradata) S3method(src_tbls,src_sql) S3method(summarise,tbl_lazy) S3method(summarise_,tbl_lazy) S3method(tail,tbl_sql) S3method(tbl,src_dbi) S3method(tbl_sum,tbl_sql) S3method(tbl_vars,tbl_lazy) S3method(transmute,tbl_lazy) S3method(ungroup,tbl_lazy) S3method(union_all,tbl_lazy) S3method(unique,sql) export(add_op_single) export(as.sql) export(base_agg) export(base_no_win) export(base_odbc_agg) export(base_odbc_scalar) export(base_odbc_win) export(base_scalar) export(base_win) export(build_sql) export(copy_lahman) export(copy_nycflights13) export(db_collect) export(db_compute) export(db_copy_to) export(db_sql_render) export(escape) export(escape_ansi) export(has_lahman) export(has_nycflights13) export(ident) export(ident_q) export(in_schema) export(is.ident) export(is.sql) export(join_query) export(lahman_df) export(lahman_mysql) export(lahman_postgres) export(lahman_sqlite) export(lahman_srcs) export(lazy_frame) export(memdb_frame) export(named_commas) export(nycflights13_postgres) export(nycflights13_sqlite) export(op_base) export(op_double) export(op_frame) export(op_grps) export(op_single) 
export(op_sort) export(op_vars) export(partial_eval) export(remote_con) export(remote_name) export(remote_query) export(remote_query_plan) export(remote_src) export(select_query) export(semi_join_query) export(set_op_query) export(simulate_access) export(simulate_dbi) export(simulate_hive) export(simulate_impala) export(simulate_mssql) export(simulate_mysql) export(simulate_odbc) export(simulate_oracle) export(simulate_postgres) export(simulate_sqlite) export(simulate_teradata) export(sql) export(sql_aggregate) export(sql_aggregate_2) export(sql_build) export(sql_call2) export(sql_cast) export(sql_cot) export(sql_escape_logical) export(sql_expr) export(sql_infix) export(sql_log) export(sql_not_supported) export(sql_optimise) export(sql_paste) export(sql_paste_infix) export(sql_prefix) export(sql_quote) export(sql_render) export(sql_str_sub) export(sql_substr) export(sql_translator) export(sql_variant) export(sql_vector) export(src_dbi) export(src_memdb) export(src_sql) export(src_test) export(tbl_lazy) export(tbl_memdb) export(tbl_sql) export(test_frame) export(test_load) export(test_register_con) export(test_register_src) export(translate_sql) export(translate_sql_) export(win_absent) export(win_aggregate) export(win_aggregate_2) export(win_cumulative) export(win_current_frame) export(win_current_group) export(win_current_order) export(win_over) export(win_rank) export(win_recycled) export(window_frame) export(window_order) import(DBI) import(dplyr) import(rlang) import(tibble) importFrom(R6,R6Class) importFrom(assertthat,assert_that) importFrom(assertthat,is.flag) importFrom(glue,glue) importFrom(methods,setOldClass) importFrom(stats,setNames) importFrom(stats,update) importFrom(utils,head) importFrom(utils,tail) dbplyr/NEWS.md0000644000176200001440000006172613501765271012657 0ustar liggesusers# dbplyr 1.4.2 * Fix bug when partially evaluating unquoting quosure containing a single symbol (#317) * Fixes for rlang and dplyr compatibility.
# dbplyr 1.4.1 Minor improvements to SQL generation * `x %in% y` strips names of `y` (#269). * Enhancements for scoped verbs (`mutate_all()`, `summarise_if()`, `filter_at()` etc) (#296, #306). * MS SQL uses `TOP 100 PERCENT` as a stop-gap to allow subqueries with `ORDER BY` (#277). * Window functions now translated correctly for Hive (#293, @cderv). # dbplyr 1.4.0 ## Breaking changes * ``Error: `con` must not be NULL``: If you see this error, it probably means that you have forgotten to pass `con` down to a dbplyr function. Previously, dbplyr defaulted to using `simulate_dbi()` which introduced subtle escaping bugs. (It's also possible I have forgotten to pass it somewhere that the dbplyr tests don't pick up, so if you can't figure it out, please let me know). * Subsetting (`[[`, `$`, and `[`) functions are no longer evaluated locally. This makes the translation more consistent and enables useful new idioms for modern databases (#200). ## New features * MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in MariaDB 10.2, MySQL 8.0, and SQLite 3.25 (#191). * Overall, dbplyr generates many fewer subqueries: * Joins and semi-joins no longer add an unneeded subquery (#236). This is facilitated by the new `bare_identifier_ok` argument to `sql_render()`; the previous argument was called `root` and confused me. * Many sequences of `select()`, `rename()`, `mutate()`, and `transmute()` can be collapsed into a single query, instead of always generating a subquery (#213). * New `vignette("sql")` describes some advantages of dbplyr over SQL (#205) and gives some advice about writing literal SQL inside of dplyr, when you need to (#196). * New `vignette("reprex")` gives some hints on creating reprexes that work anywhere (#117). This is supported by a new `tbl_memdb()` that matches the existing `tbl_lazy()`.
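For example, a complete reprex can now be built against an in-memory SQLite table, with no shared database required (a minimal sketch; the printed SQL may vary slightly between SQLite versions):

```R
library(dplyr)
library(dbplyr)

# Small custom data in an in-memory SQLite database
mf <- memdb_frame(g = c(1, 1, 2, 2), x = 1:4)

mf %>%
  group_by(g) %>%
  summarise(x = mean(x, na.rm = TRUE)) %>%
  show_query()

# Or wrap an existing data frame
mtcars_db <- tbl_memdb(mtcars)
```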
* All `..._join()` functions gain an `sql_on` argument that allows specifying arbitrary join predicates in SQL code (#146, @krlmlr). ## SQL translations * New translations for some lubridate functions: `today()`, `now()`, `year()`, `month()`, `day()`, `hour()`, `minute()`, `second()`, `quarter()`, `yday()` (@colearendt, @derekmorr). Also added new translation for `as.POSIXct()`. * New translations for stringr functions: `str_c()`, `str_sub()`, `str_length()`, `str_to_upper()`, `str_to_lower()`, and `str_to_title()` (@colearendt). Non-translated stringr functions throw a clear error. * New translations for bitwise operations: `bitwNot()`, `bitwAnd()`, `bitwOr()`, `bitwXor()`, `bitwShiftL()`, and `bitwShiftR()`. Unlike the base R functions, the translations do not coerce arguments to integers (@davidchall, #235). * New translation for `x[y]` to `CASE WHEN y THEN x END`. This enables `sum(a[b == 0])` to work as you expect from R (#202). `y` needs to be a logical expression; if not you will likely get a type error from your database. * New translations for `x$y` and `x[["y"]]` to `x.y`, enabling you to index into nested fields in databases that provide them (#158). * The `.data` and `.env` pronouns of tidy evaluation are correctly translated (#132). * New translation for `median()` and `quantile()`. Works for all ANSI compliant databases (SQL Server, Postgres, MariaDB, Teradata) and has custom translations for Hive. Thanks to @edavidaja for researching the SQL variants! (#169) * `na_if()` is correctly translated to `NULLIF()` (rather than `NULL_IF`) (#211). * `n_distinct()` translation throws an error when given more than one argument (#101, #133). * New default translations for `paste()`, `paste0()`, and the hyperbolic functions (these previously were only available for ODBC databases). * Corrected translations of `pmin()` and `pmax()` to `LEAST()` and `GREATEST()` for ANSI compliant databases (#118), to `MIN()` and `MAX()` for SQLite, and to an error for SQL server.
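As a quick check of the corrected `pmin()`/`pmax()` translations, the simulated connections can be used without a live database (a sketch; `simulate_postgres()` stands in here for any ANSI-compliant backend):

```R
library(dbplyr)

# ANSI-compliant backends translate to LEAST()/GREATEST()
translate_sql(pmin(x, y), con = simulate_postgres())
translate_sql(pmax(x, y), con = simulate_postgres())

# SQLite instead uses its variadic MIN()/MAX()
translate_sql(pmin(x, y), con = simulate_sqlite())
```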
* New translation for `switch()` to the simple form of `CASE WHEN` (#192). ### SQL simulation SQL simulation makes it possible to see the SQL that dbplyr generates, without having an active database connection, and is used for testing and generating reprexes. * SQL simulation has been overhauled. It now works reliably, is better documented, and always uses ANSI escaping (i.e. `` ` `` for field names and `'` for strings). * `tbl_lazy()` now actually puts a `dbplyr::src` in the `$src` field. This shouldn't affect any downstream code unless you were previously working around this weird difference between `tbl_lazy` and `tbl_sql` classes. It also includes the `src` class in its class, and when printed, shows the generated SQL (#111). ## Database specific improvements * MySQL/MariaDB * Translations also applied to connections via the odbc package (@colearendt, #238) * Basic support for regular expressions via `str_detect()` and `str_replace_all()` (@colearendt, #168). * Improved translation for `as.logical(x)` to `IF(x, TRUE, FALSE)`. * Oracle * New custom translation for `paste()` and `paste0()` (@cderv, #221) * Postgres * Basic support for regular expressions via `str_detect()` and `str_replace_all()` (@colearendt, #168). * SQLite * `explain()` translation now generates `EXPLAIN QUERY PLAN` which generates a higher-level, more human-friendly explanation. * SQL server * Improved translation for `as.logical(x)` to `CAST(x as BIT)` (#250). * Translates `paste()`, `paste0()`, and `str_c()` to `+`. * `copy_to()` method applies temporary table name transformation earlier so that you can now overwrite temporary tables (#258). * `db_write_table()` method uses correct argument name for passing along field types (#251). ## Minor improvements and bug fixes * Aggregation functions only warn once per session about the use of `na.rm = TRUE` (#216).
* Table names generated by `random_table_name()` have the prefix "dbplyr_", which makes it easier to find them programmatically (@mattle24, #111) * Functions that are only available in a windowed (`mutate()`) query now throw an error when called in an aggregate (`summarise()`) query (#129) * `arrange()` understands the `.by_group` argument, making it possible to sort by groups if desired. The default is `FALSE` (#115) * `distinct()` now handles computed variables like `distinct(df, y = x + y)` (#154). * `escape()`, `sql_expr()` and `build_sql()` no longer accept `con = NULL` as a shortcut for `con = simulate_dbi()`. This made it too easy to forget to pass `con` along, introducing extremely subtle escaping bugs. `win_over()` gains a `con` argument for the same reason. * New `escape_ansi()` always uses ANSI SQL 92 standard escaping (for use in examples and documentation). * `mutate(df, x = NULL)` drops `x` from the output, just like when working with local data frames (#194). * `partial_eval()` processes inlined functions (including rlang lambda functions). This makes dbplyr work with more forms of scoped verbs like `df %>% summarise_all(~ mean(.))`, `df %>% summarise_all(list(mean))` (#134). * `sql_aggregate()` now takes an optional argument `f_r` for passing to `check_na_rm()`. This allows the warning to show the R function name rather than the SQL function name (@sverchkov, #153). * `sql_infix()` gains a `pad` argument for the rare operator that doesn't need to be surrounded by spaces. * `sql_prefix()` no longer turns SQL functions into uppercase, allowing for correct translation of case-sensitive SQL functions (#181, @mtoto). * `summarise()` gives a clear error message if you refer to a variable created in that same `summarise()` (#114). * New `sql_call2()` which is to `rlang::call2()` as `sql_expr()` is to `rlang::expr()`. * `show_query()` and `explain()` use `cat()` rather than `message()`.
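For instance, the new `mutate(df, x = NULL)` behaviour can be checked against a lazy frame, with no database connection needed (a minimal sketch; the printed query should no longer select `x`):

```R
library(dplyr)
library(dbplyr)

lazy_frame(x = 1, y = 2, con = simulate_dbi()) %>%
  mutate(x = NULL) %>%
  show_query()
```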
* `union()`, `union_all()`, `setdiff()` and `intersect()` do a better job of matching columns across backends (#183). # dbplyr 1.3.0 * Now supports dplyr 0.8.0 (#190) and R 3.1.0 ## API changes * Calls of the form `dplyr::foo()` are now evaluated in the database, rather than locally (#197). * `vars` argument to `tbl_sql()` has been formally deprecated; it hasn't actually done anything for a while (#3254). * `src` and `tbl` objects now include a class generated from the class of the underlying connection object. This makes it possible for dplyr backends to implement different behaviour at the dplyr level, when needed. (#2293) ## SQL translation * `x %in% y` is now translated to `FALSE` if `y` is empty (@mgirlich, #160). * New `as.integer64(x)` translation to `CAST(x AS BIGINT)` (#3305) * `case_when()` now translates with an ELSE clause if a formula of the form `TRUE ~` is provided (@cderv, #112). * `cummean()` now generates `AVG()` not `MEAN()` (#157) * `str_detect()` now uses correct parameter order (#3397) * MS SQL * Cumulative summary functions now work (#157) * `ifelse()` uses `CASE WHEN` instead of `IIF`; this allows more complex operations, such as `%in%`, to work properly (#93) * Oracle * Custom `db_drop_table()` now only drops tables if they exist (#3306) * Custom `setdiff()` translation (#3493) * Custom `db_explain()` translation (#3471) * SQLite * Correct translation for `as.numeric()`/`as.double()` (@chris-park, #171). * Redshift * `substr()` translation improved (#3339) ## Minor improvements and bug fixes * `copy_to()` will only remove an existing table when `overwrite = TRUE` and the table already exists, eliminating a confusing "NOTICE" from PostgreSQL (#3197). * `partial_eval()` handles unevaluated formulas (#184). * `pull.tbl_sql()` now extracts correctly from grouped tables (#3562). * `sql_render.op()` now correctly forwards the `con` argument (@kevinykuo, #73).
# dbplyr 1.2.2 * R CMD check fixes # dbplyr 1.2.1 * Forward compatibility fixes for rlang 0.2.0 # dbplyr 1.2.0 ## New top-level translations * New translations for * MS Access (#2946) (@DavisVaughan) * Oracle, via odbc or ROracle (#2928, #2732, @edgararuiz) * Teradata. * Redshift. * dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest. ## New features * `copy_to()` can now "copy" tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can also `copy_to` tbl_sqls from another source, and `copy_to()` will automatically collect then copy. * Initial support for stringr functions: `str_length()`, `str_to_upper()`, `str_to_lower()`, `str_replace_all()`, `str_detect()`, `str_trim()`. Regular expression support varies from database to database, but most simple regular expressions should be ok. ## Tools for developers * `db_compute()` gains an `analyze` argument to match `db_copy_to()`. * New `remote_name()`, `remote_con()`, `remote_src()`, `remote_query()` and `remote_query_plan()` provide a standard API for getting metadata about a remote tbl (#3130, #2923, #2824). * New `sql_expr()` is a more convenient building block for low-level SQL translation (#3169). * New `sql_aggregate()` and `win_aggregate()` for generating SQL and windowed SQL functions for aggregates. These take one argument, `x`, and warn if `na.rm` is not `TRUE` (#3155). `win_recycled()` is equivalent to `win_aggregate()` and has been soft-deprecated. * `db_write_table()` now needs to return the table name. ## Minor improvements and bug fixes * Multiple `head()` calls in a row now collapse to a single call. This avoids a printing problem with MS SQL (#3084).
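The collapsed `head()` behaviour can be seen with a lazy frame (a sketch; the generated query should contain a single limit clause rather than nested subqueries):

```R
library(dplyr)
library(dbplyr)

lazy_frame(x = 1, con = simulate_dbi()) %>%
  head(10) %>%
  head(5) %>%
  show_query()
```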
* `escape()` now works with integer64 values from the bit64 package (#3230) * `if`, `ifelse()`, and `if_else()` now correctly scope the false condition so that it only applies to non-NULL conditions (#3157) * `ident()` and `ident_q()` handle 0-length inputs better, and should be easier to use with S3 (#3212) * `in_schema()` should now work in more places, particularly in `copy_to()` (#3013, @baileych) * SQL generation for joins no longer gets stuck in an endless loop if you request an empty suffix (#3220). * `mutate()` has better logic for splitting a single mutate into multiple subqueries (#3095). * Improved `paste()` and `paste0()` support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support for `str_flatten()` which behaves like `paste(x, collapse = "-")` (but for technical reasons can't be implemented as a straightforward translation of `paste()`). * `same_src.tbl_sql()` now performs correct comparison instead of always returning `TRUE`. This means that `copy = TRUE` once again allows you to perform cross-database joins (#3002). * `select()` queries no longer alias column names unnecessarily (#2968, @DavisVaughan). * `select()` and `rename()` are now powered by tidyselect, fixing a few renaming bugs (#3132, #2943, #2860). * `summarise()` once again performs partial evaluation before database submission (#3148). * `test_src()` makes it easier to access a single test source. ## Database specific improvements * MS SQL * Better support for temporary tables (@Hong-Revo) * Different translations for filter/mutate contexts for: `NULL` evaluation (`is.na()`, `is.null()`), logical operators (`!`, `&`, `&&`, `|`, `||`), and comparison operators (`==`, `!=`, `<`, `>`, `>=`, `<=`) * MySQL: `copy_to()` (via `db_write_table()`) correctly translates logical variables to integers (#3151). * odbc: improved `n()` translation in windowed context.
* SQLite: improved `na_if` translation (@cwarden) * PostgreSQL: translation for `grepl()` added (@zozlak) * Oracle: changed VARCHAR to VARCHAR2 datatype (@washcycle, #66) # dbplyr 1.1.0 ## New features * `full_join()` over non-overlapping columns `by = character()` translated to `CROSS JOIN` (#2924). * `case_when()` now translates to SQL "CASE WHEN" (#2894) * `x %in% c(1)` now generates the same SQL as `x %in% 1` (#2898). * New `window_order()` and `window_frame()` give you finer control over the window functions that dplyr creates (#2874, #2593). * Added SQL translations for Oracle (@edgararuiz). ## Minor improvements and bug fixes * `x %in% c(1)` now generates the same SQL as `x %in% 1` (#2898). * `head(tbl, 0)` is now supported (#2863). * `select()`ing zero columns gives a more informative error message (#2863). * Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823). * PostgreSQL gains a better translation for `round()` (#60). * Added custom `db_analyze_table()` for MS SQL, Oracle, Hive and Impala (@edgararuiz) * Added support for `sd()` for aggregate and window functions (#2887) (@edgararuiz) * You can now use the magrittr pipe within expressions, e.g. `mutate(mtcars, cyl %>% as.character())`. * If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to `NULL` with a warning. Now `sql_variant()` checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887) # dbplyr 1.0.0 ## New features * `tbl()` and `copy_to()` now work directly with DBI connections (#2423, #2576), so there is no longer a need to generate a dplyr src.
```R library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) mtcars2 <- tbl(con, "mtcars") mtcars2 ``` * `glimpse()` now works with remote tables (#2665) * dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation. * `compute()` and `collapse()` now preserve the "ordering" of rows. This only affects the computation of window functions, as the rest of SQL does not care about row order (#2281). * `copy_to()` gains an `overwrite` argument which allows you to overwrite an existing table. Use with care! (#2296) * New `in_schema()` function makes it easy to refer to tables in schema: `in_schema("my_schema_name", "my_table_name")`. ## Deprecated and defunct * `query()` is no longer exported. It hasn't been useful for a while so this shouldn't break any code. ## Verb-level SQL generation * Partial evaluation occurs immediately when you execute a verb (like `filter()` or `mutate()`) rather than happening when the query is executed (#2370). * `mutate.tbl_sql()` will now generate as many subqueries as necessary so that you can refer to variables that you just created (like in mutate with regular dataframes) (#2481, #2483). * SQL joins have been improved: * SQL joins always use the `ON ...` syntax, avoiding `USING ...` even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). 
They now generate SQL more similar to what you'd write by hand, eliminating a layer or two of subqueries (#2333) * [API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from `x`, and never from `y` (#2410) * [API] The `sql_join()` generic now gains a `vars` argument which lists the variables taken from the left and right sides of the join. If you have a custom `sql_join()` method, you'll need to update how your code generates joins, following the template in `sql_join.generic()`. * `full_join()` throws a clear error when you attempt to use it with a MySQL backend (#2045) * `right_join()` and `full_join()` now return results consistent with local data frame sources when there are records in the right table with no match in the left table. `right_join()` returns values of `by` columns from the right table. `full_join()` returns coalesced values of `by` columns from the left and right tables (#2578, @ianmcook) * `group_by()` can now perform an inline mutate for database backends (#2422). * The SQL generation for set operations (`intersect()`, `setdiff()`, `union()`, and `union_all()`) has been considerably improved. By default, the component SELECTs are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270). All set operations match column names across inputs, filling in non-matching variables with NULL (#2556). * `rename()` and `group_by()` now combine correctly (#1962) * `tbl_lazy()` and `lazy_tbl()` have been exported. These help you test generated SQL without an active database connection. * `ungroup()` correctly resets grouping variables (#2704). ## Vector-level SQL generation * New `as.sql()` safely coerces an input to SQL. * More translators for `as.character()`, `as.integer()` and `as.double()` (#2775).
* New `ident_q()` makes it possible to specify identifiers that do not need to be quoted. * Translation of inline scalars: * Logical values are now translated differently depending on the backend. The default is to use "true" and "false" which is the SQL-99 standard, but not widely supported. SQLite translates to "0" and "1" (#2052). * `Inf` and `-Inf` are correctly escaped * Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004). * Quoting defaults to `DBI::dbEscapeString()` and `DBI::dbQuoteIdentifier()` respectively. * `::` and `:::` are handled correctly (#2321) * `x %in% 1` is now correctly translated to `x IN (1)` (#511). * `ifelse()` and `if_else()` use correct argument names in SQL translation (#2225). * `ident()` now returns an object with class `c("ident", "character")`. It no longer contains "sql" to indicate that this is not already escaped. * `is.na()` and `is.null()` gain extra parens in SQL translation to preserve correct precedence (#2302). * [API] `log(x, b)` is now correctly translated to the SQL `log(b, x)` (#2288). SQLite does not support the 2-argument log function so it is translated to `log(x) / log(b)`. * `nth(x, i)` is now correctly translated to `nth_value(x, i)`. * `n_distinct()` now accepts multiple variables (#2148). * [API] `substr()` is now translated to SQL, correcting for the difference in the third argument. In R, it's the position of the last character, in SQL it's the length of the string (#2536). * `win_over()` escapes expressions using current database rules. ## Backends * `copy_to()` now uses `db_write_table()` instead of `db_create_table()` and `db_insert_into()`. `db_write_table.DBIConnection()` uses `dbWriteTable()`. * New `db_copy_to()`, `db_compute()` and `db_collect()` allow backends to override the entire database process behind `copy_to()`, `compute()` and `collect()`. `db_sql_render()` allows additional control over the SQL rendering process.
* All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points. * `sql_escape_logical()` allows you to control the translation of literal logicals (#2614). * `src_desc()` has been replaced by `db_desc()` and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src. * `win_over()`, `win_rank()`, `win_recycled()`, `win_cumulative()`, `win_current_group()` and `win_current_order()` are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126). * SQL translation for Microsoft SQL Server (@edgararuiz) * SQL translation for Apache Hive (@edgararuiz) * SQL translation for Apache Impala (@edgararuiz) ## Minor bug fixes and improvements * `collect()` once again defaults to returning all rows in the data (#1968). This makes it behave the same as `as.data.frame()` and `as_tibble()`. * `collect()` only regroups by variables present in the data (#2156) * `collect()` will automatically LIMIT the result to `n`, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083). * `common_by()` gets a better error message for unexpected inputs (#2091) * `copy_to()` no longer checks that the table doesn't exist before creation, instead preferring to fall back on the database for error messages. This should reduce both false positives and false negatives (#1470) * `copy_to()` now succeeds for MySQL if a character column contains `NA` (#1975, #2256, #2263, #2381, @demorenoc, @eduardgrebe). * `copy_to()` now returns its output invisibly (since you're often just calling it for the side-effect). * `distinct()` reports improved variable information for SQL backends. This means that it is more likely to work in the middle of a pipeline (#2359).
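A minimal illustration of the `collect()` changes, using the package's in-memory SQLite helper (a sketch; `memdb_frame()` is defined elsewhere in this package):

```r
library(dplyr)
library(dbplyr)

db <- memdb_frame(g = rep(1:2, 5), x = 1:10)

# collect() once again returns all rows by default...
collect(db)

# ...while an explicit n is also pushed down to the query as a LIMIT,
# giving the query planner a chance to shortcut execution
collect(db, n = 3)
```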
* Ungrouped `do()` on database backends now collects all data locally first (#2392). * Call `dbFetch()` instead of the deprecated `fetch()` (#2134). Use `DBI::dbExecute()` for non-query SQL commands (#1912) * `explain()` and `show_query()` now invisibly return the first argument, making them easier to use inside a pipeline. * `print.tbl_sql()` displays ordering (#2287) and prints table name, if known. * `print(df, n = Inf)` and `head(df, n = Inf)` now work with remote tables (#2580). * `db_desc()` and `sql_translate_env()` get defaults for DBIConnection. * Formatting now works by overriding the `tbl_sum()` generic instead of `print()`. This means that the output is more consistent with tibble, and that `format()` is now supported also for SQL sources (tidyverse/dbplyr#14). ## Lazy ops * [API] The signature of `op_base` has changed to `op_base(x, vars, class)` * [API] `translate_sql()` and `partial_eval()` have been refined: * `translate_sql()` no longer takes a vars argument; instead call `partial_eval()` yourself. * Because it no longer needs the environment, `translate_sql_()` now works with a list of dots, rather than a `lazy_dots`. * `partial_eval()` now takes a character vector of variable names rather than a tbl. * This leads to a simplification of the `op` data structure: dots is now a list of expressions rather than a `lazy_dots`. * [API] `op_vars()` now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known). dbplyr/R/0000755000176200001440000000000013501726004011745 5ustar liggesusersdbplyr/R/verb-select.R0000644000176200001440000000474013443122122014304 0ustar liggesusers# select and rename ----------------------------------------------------------- #' @export select.tbl_lazy <- function(.data, ...) { dots <- quos(...)
old_vars <- op_vars(.data$ops) new_vars <- tidyselect::vars_select(old_vars, !!!dots, .include = op_grps(.data$ops)) .data$ops <- op_select(.data$ops, syms(new_vars)) .data } #' @export rename.tbl_lazy <- function(.data, ...) { dots <- quos(...) old_vars <- op_vars(.data$ops) new_vars <- tidyselect::vars_rename(old_vars, !!!dots) .data$ops <- op_select(.data$ops, syms(new_vars)) .data } # op_select --------------------------------------------------------------- op_select <- function(x, vars) { if (inherits(x, "op_select")) { # Special optimisation when applied to pure projection() - this is # conservative and we could expand to any op_select() if combined with # the logic in nest_vars() prev_vars <- x$args$vars if (purrr::every(vars, is.symbol)) { # if current operation is pure projection # we can just subset the previous selection sel_vars <- purrr::map_chr(vars, as_string) vars <- set_names(prev_vars[sel_vars], names(sel_vars)) x <- x$x } else if (purrr::every(prev_vars, is.symbol)) { # if previous operation is pure projection sel_vars <- purrr::map_chr(prev_vars, as_string) if (all(names(sel_vars) == sel_vars)) { # and there's no renaming # we can just ignore the previous step x <- x$x } } } new_op_select(x, vars) } # SELECT in the SQL sense - powers select(), rename(), mutate(), and transmute() new_op_select <- function(x, vars) { stopifnot(inherits(x, "op")) stopifnot(is.list(vars)) op_single("select", x, dots = list(), args = list(vars = vars)) } #' @export op_vars.op_select <- function(op) { names(op$args$vars) } #' @export op_grps.op_select <- function(op) { # Find renamed variables symbols <- purrr::keep(op$args$vars, is_symbol) new2old <- purrr::map_chr(symbols, as_string) old2new <- set_names(names(new2old), new2old) grps <- op_grps(op$x) grps[grps %in% names(old2new)] <- old2new[grps] grps } #' @export sql_build.op_select <- function(op, con, ...) 
{ new_vars <- translate_sql_( op$args$vars, con, vars_group = op_grps(op), vars_order = translate_sql_(op_sort(op), con, context = list(clause = "ORDER")), vars_frame = op_frame(op), context = list(clause = "SELECT") ) select_query( sql_build(op$x, con), select = new_vars ) } dbplyr/R/test-frame.R0000644000176200001440000000431113443245100014134 0ustar liggesusers#' Infrastructure for testing dplyr #' #' Register testing sources, then use `test_load()` to load an existing #' data frame into each source. To create a new table in each source, #' use `test_frame()`. #' #' @keywords internal #' @examples #' \dontrun{ #' test_register_src("df", src_df(env = new.env())) #' test_register_src("sqlite", src_sqlite(":memory:", create = TRUE)) #' #' test_frame(x = 1:3, y = 3:1) #' test_load(mtcars) #' } #' @name testing NULL #' @export #' @rdname testing test_register_src <- function(name, src) { message("Registering testing src: ", name, " ", appendLF = FALSE) tryCatch( { test_srcs$add(name, src) message("OK") }, error = function(e) message("\n* ", conditionMessage(e)) ) } #' @export #' @rdname testing test_register_con <- function(name, ...) { test_register_src(name, src_dbi(DBI::dbConnect(...), auto_disconnect = TRUE)) } #' @export #' @rdname testing src_test <- function(name) { srcs <- test_srcs$get() if (!name %in% names(srcs)) { stop("Couldn't find test src ", name, call. = FALSE) } srcs[[name]] } #' @export #' @rdname testing test_load <- function(df, name = unique_table_name(), srcs = test_srcs$get(), ignore = character()) { stopifnot(is.data.frame(df)) stopifnot(is.character(ignore)) srcs <- srcs[setdiff(names(srcs), ignore)] lapply(srcs, copy_to, df, name = name) } #' @export #' @rdname testing test_frame <- function(..., srcs = test_srcs$get(), ignore = character()) { df <- tibble(...) test_load(df, srcs = srcs, ignore = ignore) } test_frame_windowed <- function(...) 
{ # SQLite and MySQL don't support window functions test_frame(..., ignore = c("sqlite", "mysql", "MariaDB")) } # Manage cache of testing srcs test_srcs <- local({ list( get = function() env_get(cache(), "srcs", list()), has = function(x) { srcs <- env_get(cache(), "srcs", list()) has_name(srcs, x) }, add = function(name, src) { srcs <- env_get(cache(), "srcs", list()) srcs[[name]] <- src env_poke(cache(), "srcs", srcs) }, set = function(...) { env_poke(cache(), "src", list(...)) }, length = function() { length(cache()$srcs) } ) }) dbplyr/R/verb-head.R0000644000176200001440000000070513417123671013736 0ustar liggesusers#' @export head.tbl_lazy <- function(x, n = 6L, ...) { if (inherits(x$ops, "op_head")) { x$ops$args$n <- min(x$ops$args$n, n) } else { x$ops <- op_single("head", x = x$ops, args = list(n = n)) } x } #' @export tail.tbl_sql <- function(x, n = 6L, ...) { stop("tail() is not supported by sql sources", call. = FALSE) } #' @export sql_build.op_head <- function(op, con, ...) { select_query(sql_build(op$x, con), limit = op$args$n) } dbplyr/R/utils-format.R0000644000176200001440000000120613426410503014515 0ustar liggesuserswrap <- function(..., indent = 0) { x <- paste0(..., collapse = "") wrapped <- strwrap( x, indent = indent, exdent = indent + 2, width = getOption("width") ) paste0(wrapped, collapse = "\n") } indent <- function(x) { x <- paste0(x, collapse = "\n") paste0(" ", gsub("\n", "\n ", x)) } indent_print <- function(x) { indent(utils::capture.output(print(x))) } # function for the thousand separator, # returns "," unless it's used for the decimal point, in which case returns "." 'big_mark' <- function(x, ...) { mark <- if (identical(getOption("OutDec"), ",")) "." else "," formatC(x, big.mark = mark, ...) 
} dbplyr/R/verb-do-query.R0000644000176200001440000000217513415745770014615 0ustar liggesusers#' @importFrom R6 R6Class NULL Query <- R6::R6Class("Query", private = list( .nrow = NULL, .vars = NULL ), public = list( con = NULL, sql = NULL, initialize = function(con, sql, vars) { self$con <- con self$sql <- sql private$.vars <- vars }, print = function(...) { cat(" ", self$sql, "\n", sep = "") print(self$con) }, fetch = function(n = -1L) { res <- dbSendQuery(self$con, self$sql) on.exit(dbClearResult(res)) out <- dbFetch(res, n) res_warn_incomplete(res) out }, fetch_paged = function(chunk_size = 1e4, callback) { qry <- dbSendQuery(self$con, self$sql) on.exit(dbClearResult(qry)) while (!dbHasCompleted(qry)) { chunk <- dbFetch(qry, chunk_size) callback(chunk) } invisible(TRUE) }, vars = function() { private$.vars }, nrow = function() { if (!is.null(private$.nrow)) return(private$.nrow) private$.nrow <- db_query_rows(self$con, self$sql) private$.nrow }, ncol = function() { length(self$vars()) } ) ) dbplyr/R/verb-distinct.R0000644000176200001440000000106013442450314014644 0ustar liggesusers#' @export distinct.tbl_lazy <- function(.data, ..., .keep_all = FALSE) { if (dots_n(...) > 0) { if (.keep_all) { stop( "Can only find distinct value of specified columns if .keep_all is FALSE", call. = FALSE ) } .data <- transmute(.data, ...) } add_op_single("distinct", .data, dots = list()) } #' @export op_vars.op_distinct <- function(op) { c(op_grps(op$x), op_vars(op$x)) } #' @export sql_build.op_distinct <- function(op, con, ...) 
{ select_query( sql_build(op$x, con), distinct = TRUE ) } dbplyr/R/utils.R0000644000176200001440000000256013474056125013244 0ustar liggesusersdeparse_trunc <- function(x, width = getOption("width")) { text <- deparse(x, width.cutoff = width) if (length(text) == 1 && nchar(text) < width) return(text) paste0(substr(text[1], 1, width - 3), "...") } is.wholenumber <- function(x) { trunc(x) == x } deparse_all <- function(x) { x <- purrr::map_if(x, is_formula, f_rhs) purrr::map_chr(x, expr_text, width = 500L) } #' Provides comma-separated string out of the parameters #' @export #' @keywords internal #' @param ... Arguments to be constructed into the string named_commas <- function(...) { x <- unlist(purrr::map(list2(...), as.character)) if (is_null(names(x))) { paste0(x, collapse = ", ") } else { paste0(names(x), " = ", x, collapse = ", ") } } commas <- function(...) paste0(..., collapse = ", ") in_travis <- function() identical(Sys.getenv("TRAVIS"), "true") unique_name <- local({ i <- 0 function() { i <<- i + 1 paste0("zzz", i) } }) succeeds <- function(x, quiet = FALSE) { tryCatch( { x TRUE }, error = function(e) { if (!quiet) message("Error: ", e$message) FALSE } ) } c_character <- function(...) { x <- c(...) if (length(x) == 0) { return(character()) } if (!is.character(x)) { stop("Character input expected", call. = FALSE) } x } cat_line <- function(...) cat(paste0(..., "\n"), sep = "") dbplyr/R/memdb.R0000644000176200001440000000177313443245100013162 0ustar liggesusers#' Create a database table in temporary in-memory database. #' #' `memdb_frame()` works like [tibble::tibble()], but instead of creating a new #' data frame in R, it creates a table in [src_memdb()]. #' #' @inheritParams tibble::tibble #' @param name,.name Name of table in database: defaults to a random name that's #' unlikely to conflict with an existing table. 
#' @param df Data frame to copy #' @export #' @examples #' library(dplyr) #' df <- memdb_frame(x = runif(100), y = runif(100)) #' df %>% arrange(x) #' df %>% arrange(x) %>% show_query() #' #' mtcars_db <- tbl_memdb(mtcars) #' mtcars_db %>% count(cyl) %>% show_query() memdb_frame <- function(..., .name = unique_table_name()) { x <- copy_to(src_memdb(), tibble(...), name = .name) x } #' @rdname memdb_frame #' @export tbl_memdb <- function(df, name = deparse(substitute(df))) { copy_to(src_memdb(), df, name = name) } #' @rdname memdb_frame #' @export src_memdb <- function() { cache_computation("src_memdb", src_sqlite(":memory:", TRUE)) } dbplyr/R/verb-joins.R0000644000176200001440000002132213501726003014150 0ustar liggesusers#' Join sql tbls. #' #' See [join] for a description of the general purpose of the #' functions. #' #' @section Implementation notes: #' #' Semi-joins are implemented using `WHERE EXISTS`, and anti-joins with #' `WHERE NOT EXISTS`. #' #' All joins use column equality by default. #' An arbitrary join predicate can be specified by passing #' an SQL expression to the `sql_on` argument. #' Use `LHS` and `RHS` to refer to the left-hand side or #' right-hand side table, respectively. #' #' @inheritParams dplyr::join #' @param copy If `x` and `y` are not from the same data source, #' and `copy` is `TRUE`, then `y` will be copied into a #' temporary table in the same database as `x`. `*_join()` will automatically #' run `ANALYZE` on the created table in the hope that this will make #' your queries as efficient as possible by giving more data to the query #' planner. #' #' This allows you to join tables across srcs, but it's a potentially expensive #' operation so you must opt into it. #' @param auto_index if `copy` is `TRUE`, automatically create #' indices for the variables in `by`. This may speed up the join if #' there are matching indexes in `x`. #' @param sql_on A custom join predicate as an SQL expression.
The SQL #' can refer to the `LHS` and `RHS` aliases to disambiguate #' column names. #' @examples #' \dontrun{ #' library(dplyr) #' if (has_lahman("sqlite")) { #' #' # Left joins ---------------------------------------------------------------- #' lahman_s <- lahman_sqlite() #' batting <- tbl(lahman_s, "Batting") #' team_info <- select(tbl(lahman_s, "Teams"), yearID, lgID, teamID, G, R:H) #' #' # Combine player and whole team statistics #' first_stint <- select(filter(batting, stint == 1), playerID:H) #' both <- left_join(first_stint, team_info, type = "inner", by = c("yearID", "teamID", "lgID")) #' head(both) #' explain(both) #' #' # Join with a local data frame #' grid <- expand.grid( #' teamID = c("WAS", "ATL", "PHI", "NYA"), #' yearID = 2010:2012) #' top4a <- left_join(batting, grid, copy = TRUE) #' explain(top4a) #' #' # Indices don't really help here because there's no matching index on #' # batting #' top4b <- left_join(batting, grid, copy = TRUE, auto_index = TRUE) #' explain(top4b) #' #' # Semi-joins ---------------------------------------------------------------- #' #' people <- tbl(lahman_s, "Master") #' #' # All people in the hall of fame #' hof <- tbl(lahman_s, "HallOfFame") #' semi_join(people, hof) #' #' # All people not in the hall of fame #' anti_join(people, hof) #' #' # Find all managers #' manager <- tbl(lahman_s, "Managers") #' semi_join(people, manager) #' #' # Find all managers in the hall of fame #' famous_manager <- semi_join(semi_join(people, manager), hof) #' famous_manager #' explain(famous_manager) #' #' # Anti-joins ---------------------------------------------------------------- #' #' # batters without person covariates #' anti_join(batting, people) #' #' # Arbitrary predicates ------------------------------------------------------ #' #' # Find all pairs of awards given to the same player #' # with at least 18 years between the awards: #' awards_players <- tbl(lahman_s, "AwardsPlayers") #' inner_join( #' awards_players, awards_players, #'
sql_on = paste0( #' "(LHS.playerID = RHS.playerID) AND ", #' "(LHS.yearID < RHS.yearID - 18)" #' ) #' ) #' } #' } #' @name join.tbl_sql NULL #' @rdname join.tbl_sql #' @export inner_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ..., sql_on = NULL) { add_op_join( x, y, "inner", by = by, sql_on = sql_on, copy = copy, suffix = suffix, auto_index = auto_index, ... ) } #' @rdname join.tbl_sql #' @export left_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ..., sql_on = NULL) { add_op_join( x, y, "left", by = by, sql_on = sql_on, copy = copy, suffix = suffix, auto_index = auto_index, ... ) } #' @rdname join.tbl_sql #' @export right_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ..., sql_on = NULL) { add_op_join( x, y, "right", by = by, sql_on = sql_on, copy = copy, suffix = suffix, auto_index = auto_index, ... ) } #' @rdname join.tbl_sql #' @export full_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ..., sql_on = NULL) { add_op_join( x, y, "full", by = by, sql_on = sql_on, copy = copy, suffix = suffix, auto_index = auto_index, ... ) } #' @rdname join.tbl_sql #' @export semi_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, auto_index = FALSE, ..., sql_on = NULL) { add_op_semi_join( x, y, anti = FALSE, by = by, sql_on = sql_on, copy = copy, auto_index = auto_index, ... ) } #' @rdname join.tbl_sql #' @export anti_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE, auto_index = FALSE, ..., sql_on = NULL) { add_op_semi_join( x, y, anti = TRUE, by = by, sql_on = sql_on, copy = copy, auto_index = auto_index, ... ) } add_op_join <- function(x, y, type, by = NULL, sql_on = NULL, copy = FALSE, suffix = c(".x", ".y"), auto_index = FALSE, ...) 
{ if (!is.null(sql_on)) { by <- list(x = character(0), y = character(0), on = sql(sql_on)) } else if (identical(type, "full") && identical(by, character())) { type <- "cross" by <- list(x = character(0), y = character(0)) } else { by <- common_by(by, x, y) } y <- auto_copy( x, y, copy = copy, indexes = if (auto_index) list(by$y) ) vars <- join_vars(op_vars(x), op_vars(y), type = type, by = by, suffix = suffix) x$ops <- op_double("join", x, y, args = list( vars = vars, type = type, by = by, suffix = suffix )) x } add_op_semi_join <- function(x, y, anti = FALSE, by = NULL, sql_on = NULL, copy = FALSE, auto_index = FALSE, ...) { if (!is.null(sql_on)) { by <- list(x = character(0), y = character(0), on = sql(sql_on)) } else { by <- common_by(by, x, y) } y <- auto_copy( x, y, copy, indexes = if (auto_index) list(by$y) ) x$ops <- op_double("semi_join", x, y, args = list( anti = anti, by = by )) x } join_vars <- function(x_names, y_names, type, by, suffix = c(".x", ".y")) { # Remove join keys from y y_names <- setdiff(y_names, by$y) # Add suffix where needed suffix <- check_suffix(suffix) x_new <- add_suffixes(x_names, y_names, suffix$x) y_new <- add_suffixes(y_names, x_names, suffix$y) # In left and inner joins, return key values only from x # In right joins, return key values only from y # In full joins, return key values by coalescing values from x and y x_x <- x_names x_y <- by$y[match(x_names, by$x)] x_y[type == "left" | type == "inner"] <- NA x_x[type == "right" & !is.na(x_y)] <- NA y_x <- rep_len(NA, length(y_names)) y_y <- y_names # Return a list with 3 parallel vectors # At each position, values in the 3 vectors represent # alias - name of column in join result # x - name of column from left table or NA if only from right table # y - name of column from right table or NA if only from left table list(alias = c(x_new, y_new), x = c(x_x, y_x), y = c(x_y, y_y)) } check_suffix <- function(x) { if (!is.character(x) || length(x) != 2) { stop("`suffix` must be a 
character vector of length 2.", call. = FALSE) } list(x = x[1], y = x[2]) } add_suffixes <- function(x, y, suffix) { if (identical(suffix, "")) { return(x) } out <- character(length(x)) for (i in seq_along(x)) { nm <- x[[i]] while (nm %in% y || nm %in% out) { nm <- paste0(nm, suffix) } out[[i]] <- nm } out } #' @export op_vars.op_join <- function(op) { op$args$vars$alias } #' @export op_vars.op_semi_join <- function(op) { op_vars(op$x) } #' @export sql_build.op_join <- function(op, con, ...) { join_query( op$x, op$y, vars = op$args$vars, type = op$args$type, by = op$args$by, suffix = op$args$suffix ) } #' @export sql_build.op_semi_join <- function(op, con, ...) { semi_join_query(op$x, op$y, anti = op$args$anti, by = op$args$by) } dbplyr/R/verb-pull.R0000644000176200001440000000035013417124460014002 0ustar liggesusers#' @export pull.tbl_sql <- function(.data, var = -1) { expr <- enquo(var) var <- dplyr:::find_var(expr, tbl_vars(.data)) .data <- ungroup(.data) .data <- select(.data, !! sym(var)) .data <- collect(.data) .data[[1]] } dbplyr/R/backend-impala.R0000644000176200001440000000155213442161247014731 0ustar liggesusers#' @export sql_translate_env.Impala <- function(con) { sql_variant( scalar = sql_translator(.parent = base_odbc_scalar, bitwNot = sql_prefix("BITNOT", 1), bitwAnd = sql_prefix("BITAND", 2), bitwOr = sql_prefix("BITOR", 2), bitwXor = sql_prefix("BITXOR", 2), bitwShiftL = sql_prefix("SHIFTLEFT", 2), bitwShiftR = sql_prefix("SHIFTRIGHT", 2), as.Date = sql_cast("VARCHAR(10)"), ceiling = sql_prefix("CEIL") ) , base_odbc_agg, base_odbc_win ) } #' @export db_analyze.Impala <- function(con, table, ...) 
{ # Using COMPUTE STATS instead of ANALYZE as recommended in this article # https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_compute_stats.html sql <- build_sql("COMPUTE STATS ", as.sql(table), con = con) DBI::dbExecute(con, sql) } dbplyr/R/explain.R0000644000176200001440000000043013426147047013537 0ustar liggesusers#' @export show_query.tbl_lazy <- function(x, ...) { cat_line("") cat_line(remote_query(x)) invisible(x) } #' @export explain.tbl_sql <- function(x, ...) { force(x) show_query(x) cat_line() cat_line("") cat_line(remote_query_plan(x)) invisible(x) } dbplyr/R/tbl-lazy.R0000644000176200001440000000636613474055330013647 0ustar liggesusers#' Create a local lazy tibble #' #' These functions are useful for testing SQL generation without having to #' have an active database connection. See [simulate_dbi()] for a list #' available database simulations. #' #' @keywords internal #' @export #' @examples #' library(dplyr) #' df <- data.frame(x = 1, y = 2) #' #' df_sqlite <- tbl_lazy(df, con = simulate_sqlite()) #' df_sqlite %>% summarise(x = sd(x, na.rm = TRUE)) %>% show_query() tbl_lazy <- function(df, con = simulate_dbi(), src = NULL) { if (!is.null(src)) { warn("`src` is deprecated; please use `con` instead") con <- src } subclass <- class(con)[[1]] make_tbl( purrr::compact(c(subclass, "lazy")), ops = op_base_local(df), src = src_dbi(con) ) } setOldClass(c("tbl_lazy", "tbl")) #' @export #' @rdname tbl_lazy lazy_frame <- function(..., con = simulate_dbi(), src = NULL) { tbl_lazy(tibble(...), con = con, src = src) } #' @export dimnames.tbl_lazy <- function(x) { list(NULL, op_vars(x$ops)) } #' @export dim.tbl_lazy <- function(x) { c(NA, length(op_vars(x$ops))) } #' @export print.tbl_lazy <- function(x, ...) { show_query(x) } #' @export as.data.frame.tbl_lazy <- function(x, row.names, optional, ...) { stop("Can not coerce `tbl_lazy` to data.frame", call. 
= FALSE) } #' @export same_src.tbl_lazy <- function(x, y) { inherits(y, "tbl_lazy") } #' @export tbl_vars.tbl_lazy <- function(x) { op_vars(x$ops) } #' @export groups.tbl_lazy <- function(x) { lapply(group_vars(x), as.name) } # Manually registered in zzz.R group_by_drop_default.tbl_lazy <- function(x) { TRUE } #' @export group_vars.tbl_lazy <- function(x) { op_grps(x$ops) } # lazyeval ---------------------------------------------------------------- # nocov start #' @export filter_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) filter(.data, !!!dots) } #' @export arrange_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) arrange(.data, !!!dots) } #' @export select_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) select(.data, !!!dots) } #' @export rename_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) rename(.data, !!!dots) } #' @export summarise_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) summarise(.data, !!!dots) } #' @export mutate_.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) mutate(.data, !!!dots) } #' @export group_by_.tbl_lazy <- function(.data, ..., .dots = list(), add = FALSE) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) group_by(.data, !!!dots, add = add) } #' @export distinct_.tbl_lazy <- function(.data, ..., .dots = list(), .keep_all = FALSE) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) distinct(.data, !!! dots, .keep_all = .keep_all) } #' @export do_.tbl_sql <- function(.data, ..., .dots = list(), .chunk_size = 1e4L) { dots <- dplyr:::compat_lazy_dots(.dots, caller_env(), ...) do(.data, !!! 
dots, .chunk_size = .chunk_size) } # nocov end dbplyr/R/tbl-sql.R0000644000176200001440000000402613426604550013457 0ustar liggesusers#' Create an SQL tbl (abstract) #' #' Generally, you should no longer need to provide a custom `tbl()` #' method as you can use the default `tbl.DBIConnection()` method. #' #' @keywords internal #' @export #' @param subclass name of subclass #' @param ... needed for agreement with generic. Not otherwise used. #' @param vars DEPRECATED tbl_sql <- function(subclass, src, from, ..., vars = NULL) { # If not literal sql, must be a table identifier from <- as.sql(from) if (!missing(vars)) { warning("`vars` argument is deprecated as it is no longer needed", call. = FALSE) } vars <- vars %||% db_query_fields(src$con, from) ops <- op_base_remote(from, vars) make_tbl(c(subclass, "sql", "lazy"), src = src, ops = ops) } #' @export same_src.tbl_sql <- function(x, y) { inherits(y, "tbl_sql") && same_src(x$src, y$src) } # Grouping methods ------------------------------------------------------------- #' @export group_size.tbl_sql <- function(x) { df <- x %>% summarise(n = n()) %>% collect() df$n } #' @export n_groups.tbl_sql <- function(x) { if (length(groups(x)) == 0) return(1L) df <- x %>% summarise() %>% ungroup() %>% summarise(n = n()) %>% collect() df$n } # Standard data frame methods -------------------------------------------------- #' @export print.tbl_sql <- function(x, ..., n = NULL, width = NULL, n_extra = NULL) { cat_line(format(x, ..., n = n, width = width, n_extra = n_extra)) invisible(x) } #' @export as.data.frame.tbl_sql <- function(x, row.names = NULL, optional = NULL, ..., n = Inf) { as.data.frame(collect(x, n = n)) } #' @export tbl_sum.tbl_sql <- function(x) { grps <- op_grps(x$ops) sort <- op_sort(x$ops) c( "Source" = tbl_desc(x), "Database" = db_desc(x$src$con), "Groups" = if (length(grps) > 0) commas(grps), "Ordered by" = if (length(sort) > 0) commas(deparse_all(sort)) ) } tbl_desc <- function(x) { paste0( op_desc(x$ops), " [",
op_rows(x$ops), " x ", big_mark(op_cols(x$ops)), "]" ) } dbplyr/R/verb-filter.R0000644000176200001440000000212513474055330014317 0ustar liggesusers# registered onLoad filter.tbl_lazy <- function(.data, ..., .preserve = FALSE) { if (!identical(.preserve, FALSE)) { stop("`.preserve` is not supported on database backends", call. = FALSE) } dots <- quos(...) dots <- partial_eval_dots(dots, vars = op_vars(.data)) add_op_single("filter", .data, dots = dots) } #' @export sql_build.op_filter <- function(op, con, ...) { vars <- op_vars(op$x) if (!uses_window_fun(op$dots, con)) { where_sql <- translate_sql_(op$dots, con, context = list(clause = "WHERE")) select_query( sql_build(op$x, con), where = where_sql ) } else { # Do partial evaluation, then extract out window functions where <- translate_window_where_all(op$dots, ls(sql_translate_env(con)$window)) # Convert where$expr back to a lazy dots object, and then # create mutate operation mutated <- sql_build(new_op_select(op$x, carry_over(vars, where$comp)), con = con) where_sql <- translate_sql_(where$expr, con = con, context = list(clause = "WHERE")) select_query(mutated, select = ident(vars), where = where_sql) } } dbplyr/R/src-sql.R0000644000176200001440000000154713415745770013502 0ustar liggesusers#' Create a "sql src" object #' #' Deprecated: please use [src_dbi] instead. #' #' @keywords internal #' @export #' @param subclass name of subclass. "src_sql" is an abstract base class, so you #' must supply this value. `src_` is automatically prepended to the #' class name #' @param con the connection object #' @param ... fields used by object src_sql <- function(subclass, con, ...) { subclass <- paste0("src_", subclass) structure(list(con = con, ...), class = c(subclass, "src_sql", "src")) } #' @export same_src.src_sql <- function(x, y) { if (!inherits(y, "src_sql")) return(FALSE) identical(x$con, y$con) } #' @export src_tbls.src_sql <- function(x, ...) { db_list_tables(x$con) } #' @export format.src_sql <- function(x, ...) 
{ paste0( "src: ", db_desc(x$con), "\n", wrap("tbls: ", paste0(sort(src_tbls(x)), collapse = ", ")) ) } dbplyr/R/verb-do.R0000644000176200001440000001062213501726003013426 0ustar liggesusers#' Perform arbitrary computation on remote backend #' #' @inheritParams dplyr::do #' @param .chunk_size The size of each chunk to pull into R. If this number is #' too big, the process will be slow because R has to allocate and free a lot #' of memory. If it's too small, it will be slow, because of the overhead of #' talking to the database. #' @export do.tbl_sql <- function(.data, ..., .chunk_size = 1e4L) { groups_sym <- groups(.data) if (length(groups_sym) == 0) { .data <- collect(.data) return(do(.data, ...)) } args <- quos(...) named <- named_args(args) # Create data frame of labels labels <- .data %>% select(!!! groups_sym) %>% summarise() %>% collect() con <- .data$src$con n <- nrow(labels) m <- length(args) out <- replicate(m, vector("list", n), simplify = FALSE) names(out) <- names(args) p <- progress_estimated(n * m, min_time = 2) # Create ungrouped data frame suitable for chunked retrieval query <- Query$new(con, db_sql_render(con, ungroup(.data)), op_vars(.data)) # When retrieving in pages, there's no guarantee we'll get a complete group. # So we always assume the last group in the chunk is incomplete, and leave # it for the next. If the group size is large than chunk size, it may # take a couple of iterations to get the entire group, but that should # be an unusual situation. last_group <- NULL i <- 0 # Assumes `chunk` to be ordered with group columns first gvars <- seq_along(groups_sym) # Initialise a data mask for tidy evaluation env <- env(empty_env()) mask <- new_data_mask(env) query$fetch_paged(.chunk_size, function(chunk) { if (!is_null(last_group)) { chunk <- rbind(last_group, chunk) } # Create an id for each group grouped <- chunk %>% group_by(!!! 
syms(names(chunk)[gvars])) if (utils::packageVersion("dplyr") < "0.7.9") { index <- attr(grouped, "indices") # convert from 0-index index <- lapply(index, `+`, 1L) } else { index <- dplyr::group_rows(grouped) } n <- length(index) last_group <<- chunk[index[[length(index)]], , drop = FALSE] for (j in seq_len(n - 1)) { cur_chunk <- chunk[index[[j]], , drop = FALSE] # Update pronouns within the data mask env$. <- cur_chunk env$.data <- cur_chunk for (k in seq_len(m)) { out[[k]][[i + j]] <<- eval_tidy(args[[k]], mask) p$tick()$print() } } i <<- i + (n - 1) }) # Process last group if (!is_null(last_group)) { env$. <- last_group last_group <- env$.data for (k in seq_len(m)) { out[[k]][[i + 1]] <- eval_tidy(args[[k]], mask) p$tick()$print() } } if (!named) { label_output_dataframe(labels, out, groups(.data)) } else { label_output_list(labels, out, groups(.data)) } } # Helper functions ------------------------------------------------------------- label_output_dataframe <- function(labels, out, groups) { data_frame <- vapply(out[[1]], is.data.frame, logical(1)) if (any(!data_frame)) { stop( "Results are not data frames at positions: ", paste(which(!data_frame), collapse = ", "), call. = FALSE ) } rows <- vapply(out[[1]], nrow, numeric(1)) out <- bind_rows(out[[1]]) if (!is.null(labels)) { # Remove any common columns from labels labels <- labels[setdiff(names(labels), names(out))] # Repeat each row to match data labels <- labels[rep(seq_len(nrow(labels)), rows), , drop = FALSE] rownames(labels) <- NULL grouped_df(bind_cols(labels, out), groups) } else { rowwise(out) } } label_output_list <- function(labels, out, groups) { if (!is.null(labels)) { labels[names(out)] <- out rowwise(labels) } else { class(out) <- "data.frame" attr(out, "row.names") <- .set_row_names(length(out[[1]])) rowwise(out) } } named_args <- function(args) { # Arguments must either be all named or all unnamed. 
named <- sum(names2(args) != "") if (!(named == 0 || named == length(args))) { stop( "Arguments to do() must either be all named or all unnamed", call. = FALSE ) } if (named == 0 && length(args) > 1) { stop("Can only supply single unnamed argument to do()", call. = FALSE) } # Check for old syntax if (named == 1 && names(args) == ".f") { stop( "do syntax changed in dplyr 0.2. Please see documentation for details", call. = FALSE ) } named != 0 } dbplyr/R/dbplyr.R0000644000176200001440000000045413415745770013406 0ustar liggesusers#' @importFrom assertthat assert_that #' @importFrom assertthat is.flag #' @importFrom stats setNames update #' @importFrom utils head tail #' @importFrom glue glue #' @importFrom methods setOldClass #' @import dplyr #' @import rlang #' @import DBI #' @import tibble #' @keywords internal "_PACKAGE" dbplyr/R/translate-sql-paste.R0000644000176200001440000000167213426147016016010 0ustar liggesusers#' @export #' @rdname sql_variant sql_paste <- function(default_sep, f = "CONCAT_WS") { function(..., sep = default_sep, collapse = NULL){ check_collapse(collapse) sql_call2(f, sep, ...) } } #' @export #' @rdname sql_variant sql_paste_infix <- function(default_sep, op, cast) { force(default_sep) op <- as.symbol(paste0("%", op, "%")) force(cast) function(..., sep = default_sep, collapse = NULL){ check_collapse(collapse) args <- list(...) if (length(args) == 1) { return(cast(args[[1]])) } if (sep == "") { infix <- function(x, y) sql_call2(op, x, y) } else { infix <- function(x, y) sql_call2(op, sql_call2(op, x, sep), y) } purrr::reduce(args, infix) } } check_collapse <- function(collapse) { if (is.null(collapse)) return() stop( "`collapse` not supported in DB translation of `paste()`.\n", "Please use str_flatten() instead", call. 
= FALSE ) } dbplyr/R/query-set-op.R0000644000176200001440000000216013426612360014446 0ustar liggesusers#' @export #' @rdname sql_build set_op_query <- function(x, y, type = type) { structure( list( x = x, y = y, type = type ), class = c("set_op_query", "query") ) } #' @export print.set_op_query <- function(x, ...) { cat_line("") cat_line("X:") cat_line(indent_print(sql_build(x$x))) cat_line("Y:") cat_line(indent_print(sql_build(x$y))) } #' @export sql_render.set_op_query <- function(query, con = NULL, ..., bare_identifier_ok = FALSE) { from_x <- sql_render(query$x, con, ..., bare_identifier_ok = FALSE) from_y <- sql_render(query$y, con, ..., bare_identifier_ok = FALSE) sql_set_op(con, from_x, from_y, method = query$type) } # SQL generation ---------------------------------------------------------- #' @export sql_set_op.default <- function(con, x, y, method) { build_sql( "(", x, ")", "\n", sql(method), "\n", "(", y, ")", con = con ) } #' @export sql_set_op.SQLiteConnection <- function(con, x, y, method) { # SQLite does not allow parentheses build_sql( x, "\n", sql(method), "\n", y, con = con ) } dbplyr/R/verb-window.R0000644000176200001440000000312713442212726014343 0ustar liggesusers#' Override window order and frame #' #' @param .data A remote tibble #' @param ... Name-value pairs of expressions. #' @param from,to Bounds of the frame. #' @export #' @examples #' library(dplyr) #' df <- lazy_frame(g = rep(1:2, each = 5), y = runif(10), z = 1:10) #' #' df %>% #' window_order(y) %>% #' mutate(z = cumsum(y)) %>% #' sql_build() #' #' df %>% #' group_by(g) %>% #' window_frame(-3, 0) %>% #' window_order(z) %>% #' mutate(z = sum(x)) %>% #' sql_build() #' @export window_order <- function(.data, ...) { dots <- quos(...) 
dots <- partial_eval_dots(dots, vars = op_vars(.data)) names(dots) <- NULL add_op_order(.data, dots) } # We want to preserve this ordering (for window functions) without # imposing an additional arrange, so we have a special op_order add_op_order <- function(.data, dots = list()) { if (length(dots) == 0) { return(.data) } .data$ops <- op_single("order", x = .data$ops, dots = dots) .data } #' @export op_sort.op_order <- function(op) { c(op_sort(op$x), op$dots) } #' @export sql_build.op_order <- function(op, con, ...) { sql_build(op$x, con, ...) } # Frame ------------------------------------------------------------------- #' @export #' @rdname window_order window_frame <- function(.data, from = -Inf, to = Inf) { stopifnot(is.numeric(from), length(from) == 1) stopifnot(is.numeric(to), length(to) == 1) add_op_single("frame", .data, args = list(range = c(from, to))) } #' @export op_frame.op_frame <- function(op) { op$args$range } #' @export sql_build.op_frame <- function(op, con, ...) { sql_build(op$x, con, ...) } dbplyr/R/query-semi-join.R0000644000176200001440000000306213426612330015130 0ustar liggesusers#' @export #' @rdname sql_build semi_join_query <- function(x, y, anti = FALSE, by = NULL) { structure( list( x = x, y = y, anti = anti, by = by ), class = c("semi_join_query", "query") ) } #' @export print.semi_join_query <- function(x, ...) 
{ cat_line("") cat_line("By:") cat_line(indent(paste0(x$by$x, "-", x$by$y))) cat_line("X:") cat_line(indent_print(sql_build(x$x))) cat_line("Y:") cat_line(indent_print(sql_build(x$y))) } #' @export sql_render.semi_join_query <- function(query, con = NULL, ..., bare_identifier_ok = FALSE) { from_x <- sql_subquery( con, sql_render(query$x, con, ..., bare_identifier_ok = TRUE), name = "LHS" ) from_y <- sql_subquery( con, sql_render(query$y, con, ..., bare_identifier_ok = TRUE), name = "RHS" ) sql_semi_join(con, from_x, from_y, anti = query$anti, by = query$by) } # SQL generation ---------------------------------------------------------- #' @export sql_semi_join.DBIConnection <- function(con, x, y, anti = FALSE, by = NULL, ...) { lhs <- escape(ident("LHS"), con = con) rhs <- escape(ident("RHS"), con = con) on <- sql_vector( paste0( lhs, ".", sql_escape_ident(con, by$x), " = ", rhs, ".", sql_escape_ident(con, by$y) ), collapse = " AND ", parens = TRUE, con = con ) build_sql( "SELECT * FROM ", x, "\n", "WHERE ", if (anti) sql("NOT "), "EXISTS (\n", " SELECT 1 FROM ", y, "\n", " WHERE ", on, "\n", ")", con = con ) } dbplyr/R/verb-copy-to.R0000644000176200001440000001063613455376041014436 0ustar liggesusers#' Copy a local data frame to a DBI backend. #' #' This [copy_to()] method works for all DBI sources. It is useful for #' copying small amounts of data to a database for examples, experiments, #' and joins. By default, it creates temporary tables which are typically #' only visible to the current connection to the database. #' #' @export #' @param df A local data frame, a `tbl_sql` from same source, or a `tbl_sql` #' from another source. If from another source, all data must transition #' through R in one pass, so it is only suitable for transferring small #' amounts of data. #' @param types a character vector giving variable types to use for the columns. #' See \url{http://www.sqlite.org/datatype3.html} for available types. 
#' @param temporary if `TRUE`, will create a temporary table that is #' local to this connection and will be automatically deleted when the #' connection expires #' @param unique_indexes a list of character vectors. Each element of the list #' will create a new unique index over the specified column(s). Duplicate rows #' will result in failure. #' @param indexes a list of character vectors. Each element of the list #' will create a new index. #' @param analyze if `TRUE` (the default), will automatically ANALYZE the #' new table so that the query optimiser has useful information. #' @inheritParams dplyr::copy_to #' @return A [tbl()] object (invisibly). #' @examples #' library(dplyr) #' set.seed(1014) #' #' mtcars$model <- rownames(mtcars) #' mtcars2 <- src_memdb() %>% #' copy_to(mtcars, indexes = list("model"), overwrite = TRUE) #' mtcars2 %>% filter(model == "Hornet 4 Drive") #' #' cyl8 <- mtcars2 %>% filter(cyl == 8) #' cyl8_cached <- copy_to(src_memdb(), cyl8) #' #' # copy_to is called automatically if you set copy = TRUE #' # in the join functions #' df <- tibble(cyl = c(6, 8)) #' mtcars2 %>% semi_join(df, copy = TRUE) copy_to.src_sql <- function(dest, df, name = deparse(substitute(df)), overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) { assert_that(is_string(name), is.flag(temporary)) if (!is.data.frame(df) && !inherits(df, "tbl_sql")) { stop("`df` must be a local dataframe or a remote tbl_sql", call. = FALSE) } if (inherits(df, "tbl_sql") && same_src(df$src, dest)) { out <- compute(df, name = name, temporary = temporary, unique_indexes = unique_indexes, indexes = indexes, analyze = analyze, ... ) } else { # avoid S4 dispatch problem in dbSendPreparedQuery df <- as.data.frame(collect(df)) name <- db_copy_to(dest$con, name, df, overwrite = overwrite, types = types, temporary = temporary, unique_indexes = unique_indexes, indexes = indexes, analyze = analyze, ... 
) out <- tbl(dest, name) } invisible(out) } #' @export auto_copy.tbl_sql <- function(x, y, copy = FALSE, ...) { copy_to(x$src, as.data.frame(y), unique_table_name(), ...) } #' More db generics #' #' These are new, so not included in dplyr for backward compatibility #' purposes. #' #' @keywords internal #' @export db_copy_to <- function(con, table, values, overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) { UseMethod("db_copy_to") } #' @export db_copy_to.DBIConnection <- function(con, table, values, overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) { types <- types %||% db_data_type(con, values) names(types) <- names(values) with_transaction(con, { # Only remove if it exists; returns NA for MySQL if (overwrite && !is_false(db_has_table(con, table))) { db_drop_table(con, table, force = TRUE) } table <- db_write_table(con, table, types = types, values = values, temporary = temporary) db_create_indexes(con, table, unique_indexes, unique = TRUE) db_create_indexes(con, table, indexes, unique = FALSE) if (analyze) db_analyze(con, table) }) table } # Don't use `tryCatch()` because it messes with the callstack with_transaction <- function(con, code) { db_begin(con) on.exit(db_rollback(con)) code on.exit() db_commit(con) } dbplyr/R/data-cache.R0000644000176200001440000000265113415745770014065 0ustar liggesuserscache <- function() { if (!is_attached("dbplyr_cache")) { get("attach")(new_environment(), name = "dbplyr_cache") } search_env("dbplyr_cache") } cache_computation <- function(name, computation) { cache <- cache() if (env_has(cache, name)) { env_get(cache, name) } else { res <- force(computation) env_poke(cache, name, res) res } } load_srcs <- function(f, src_names, quiet = NULL) { if (is.null(quiet)) { quiet <- !identical(Sys.getenv("NOT_CRAN"), "true") } srcs <- lapply(src_names, function(x) { out <- NULL try(out <- f(x), silent = TRUE) if 
(is.null(out) && !quiet) { message("Could not instantiate ", x, " src") } out }) purrr::compact(setNames(srcs, src_names)) } db_location <- function(path = NULL, filename) { if (!is.null(path)) { # Check that path is a directory and is writeable if (!file.exists(path) || !file.info(path)$isdir) { stop(path, " is not a directory", call. = FALSE) } if (!is_writeable(path)) stop("Can not write to ", path, call. = FALSE) return(file.path(path, filename)) } pkg <- file.path(system.file("db", package = "dplyr")) if (is_writeable(pkg)) return(file.path(pkg, filename)) tmp <- tempdir() if (is_writeable(tmp)) return(file.path(tmp, filename)) stop("Could not find writeable location to cache db", call. = FALSE) } is_writeable <- function(x) { unname(file.access(x, 2) == 0) } dbplyr/R/translate-sql-helpers.R #' Create an sql translator #' #' When creating a package that maps to a new SQL based src, you'll often #' want to provide some additional mappings from common R commands to the #' commands that your tbl provides. These three functions make that #' easy. #' #' @section Helper functions: #' #' `sql_infix()` and `sql_prefix()` create default SQL infix and prefix #' functions given the name of the SQL function. They don't perform any input #' checking, but do correctly escape their input, and are useful for #' quickly providing default wrappers for a new SQL variant. #' #' @keywords internal #' @seealso [win_over()] for helper functions for window functions. #' @param scalar,aggregate,window The three families of functions that an #' SQL variant can supply. #' @param ...,.funs named functions, used to add custom converters from standard #' R functions to sql functions. Specify individually in `...`, or #' provide a list of `.funs` #' @param .parent the sql variant that this variant should inherit from.
#' Defaults to `base_agg` which provides a standard set of #' mappings for the most common operators and functions. #' @param f the name of the sql function as a string #' @param f_r the name of the r function being translated as a string #' @param n for `sql_infix()`, an optional number of arguments to expect. #' Will signal error if not correct. #' @seealso [sql()] for an example of a more customised sql #' conversion function. #' @export #' @examples #' # An example of adding some mappings for the statistical functions that #' # postgresql provides: http://bit.ly/K5EdTn #' #' postgres_agg <- sql_translator(.parent = base_agg, #' cor = sql_aggregate_2("CORR"), #' cov = sql_aggregate_2("COVAR_SAMP"), #' sd = sql_aggregate("STDDEV_SAMP", "sd"), #' var = sql_aggregate("VAR_SAMP", "var") #' ) #' #' # Next we have to simulate a connection that uses this variant #' con <- simulate_dbi("TestCon") #' sql_translate_env.TestCon <- function(x) { #' sql_variant( #' base_scalar, #' postgres_agg, #' base_no_win #' ) #' } #' #' translate_sql(cor(x, y), con = con, window = FALSE) #' translate_sql(sd(income / years), con = con, window = FALSE) #' #' # Any functions not explicitly listed in the converter will be translated #' # to sql as is, so you don't need to convert all functions. 
#' translate_sql(regr_intercept(y, x), con = con) sql_variant <- function(scalar = sql_translator(), aggregate = sql_translator(), window = sql_translator()) { stopifnot(is.environment(scalar)) stopifnot(is.environment(aggregate)) stopifnot(is.environment(window)) # Need to check that every function in aggregate also occurs in window missing <- setdiff(ls(aggregate), ls(window)) if (length(missing) > 0) { warn(paste0( "Translator is missing window variants of the following aggregate functions:\n", paste0("* ", missing, "\n", collapse = "") )) } # And ensure that every window function is flagged in aggregate context missing <- setdiff(ls(window), ls(aggregate)) missing_funs <- lapply(missing, sql_aggregate_win) env_bind(aggregate, !!!set_names(missing_funs, missing)) structure( list(scalar = scalar, aggregate = aggregate, window = window), class = "sql_variant" ) } is.sql_variant <- function(x) inherits(x, "sql_variant") #' @export print.sql_variant <- function(x, ...) { wrap_ls <- function(x, ...) { vars <- sort(ls(envir = x)) wrapped <- strwrap(paste0(vars, collapse = ", "), ...) if (identical(wrapped, "")) return() paste0(wrapped, "\n", collapse = "") } cat("<sql_variant>\n") cat(wrap_ls( x$scalar, prefix = "scalar: " )) cat(wrap_ls( x$aggregate, prefix = "aggregate: " )) cat(wrap_ls( x$window, prefix = "window: " )) } #' @export names.sql_variant <- function(x) { c(ls(envir = x$scalar), ls(envir = x$aggregate), ls(envir = x$window)) } #' @export #' @rdname sql_variant sql_translator <- function(..., .funs = list(), .parent = new.env(parent = emptyenv())) { funs <- c(list2(...), .funs) if (length(funs) == 0) return(.parent) list2env(funs, copy_env(.parent)) } copy_env <- function(from, to = NULL, parent = parent.env(from)) { list2env(as.list(from), envir = to, parent = parent) } #' @rdname sql_variant #' @param pad If `TRUE`, the default, pad the infix operator with spaces.
#' @export sql_infix <- function(f, pad = TRUE) { assert_that(is_string(f)) if (pad) { function(x, y) { build_sql(x, " ", sql(f), " ", y) } } else { function(x, y) { build_sql(x, sql(f), y) } } } #' @rdname sql_variant #' @export sql_prefix <- function(f, n = NULL) { assert_that(is_string(f)) function(...) { args <- list(...) if (!is.null(n) && length(args) != n) { stop( "Invalid number of args to SQL ", f, ". Expecting ", n, call. = FALSE ) } if (any(names2(args) != "")) { warning("Named arguments ignored for SQL ", f, call. = FALSE) } build_sql(sql(f), args) } } #' @rdname sql_variant #' @export sql_aggregate <- function(f, f_r = f) { assert_that(is_string(f)) warned <- FALSE function(x, na.rm = FALSE) { warned <<- check_na_rm(f_r, na.rm, warned) build_sql(sql(f), list(x)) } } #' @rdname sql_variant #' @export sql_aggregate_2 <- function(f) { assert_that(is_string(f)) function(x, y) { build_sql(sql(f), list(x, y)) } } sql_aggregate_win <- function(f) { force(f) function(...) { stop( "`", f, "()` is only available in a windowed (`mutate()`) context", call. = FALSE ) } } check_na_rm <- function(f, na.rm, warned) { if (warned || identical(na.rm, TRUE)) { return(warned) } warning( "Missing values are always removed in SQL.\n", "Use `", f, "(x, na.rm = TRUE)` to silence this warning\n", "This warning is displayed only once per session.", call. = FALSE ) TRUE } #' @rdname sql_variant #' @export sql_not_supported <- function(f) { assert_that(is_string(f)) function(...) { stop(f, " is not available in this SQL variant", call. 
= FALSE) } } #' @rdname sql_variant #' @export sql_cast <- function(type) { type <- sql(type) function(x) { sql_expr(cast(!!x %as% !!type)) } } #' @rdname sql_variant #' @export sql_log <- function() { function(x, base = exp(1)){ if (isTRUE(all.equal(base, exp(1)))) { sql_expr(ln(!!x)) } else { sql_expr(log(!!x) / log(!!base)) } } } #' @rdname sql_variant #' @export sql_cot <- function(){ function(x){ sql_expr(1L / tan(!!x)) } } globalVariables(c("%as%", "cast", "ln")) dbplyr/R/verb-summarise.R0000644000176200001440000000251213442456705015045 0ustar liggesusers#' @export summarise.tbl_lazy <- function(.data, ...) { dots <- quos(..., .named = TRUE) dots <- partial_eval_dots(dots, vars = op_vars(.data)) check_summarise_vars(dots) add_op_single("summarise", .data, dots = dots) } # For each expression, check if it uses any newly created variables check_summarise_vars <- function(dots) { for (i in seq_along(dots)) { used_vars <- all_names(get_expr(dots[[i]])) cur_vars <- names(dots)[seq_len(i - 1)] if (any(used_vars %in% cur_vars)) { stop( "`", names(dots)[[i]], "` refers to a variable created earlier in this summarise().\n", "Do you need an extra mutate() step?", call. = FALSE ) } } } #' @export op_vars.op_summarise <- function(op) { c(op_grps(op$x), names(op$dots)) } #' @export op_grps.op_summarise <- function(op) { grps <- op_grps(op$x) if (length(grps) == 1) { character() } else { grps[-length(grps)] } } #' @export op_sort.op_summarise <- function(op) NULL #' @export sql_build.op_summarise <- function(op, con, ...) 
{ select_vars <- translate_sql_(op$dots, con, window = FALSE, context = list(clause = "SELECT")) group_vars <- c.sql(ident(op_grps(op$x)), con = con) select_query( sql_build(op$x, con), select = c.sql(group_vars, select_vars, con = con), group_by = group_vars ) } dbplyr/R/translate-sql-string.R # R prefers to specify start / stop or start / end # databases usually specify start / length # https://www.postgresql.org/docs/current/functions-string.html #' @export #' @rdname sql_variant sql_substr <- function(f = "SUBSTR") { function(x, start, stop) { start <- as.integer(start) length <- pmax(as.integer(stop) - start + 1L, 0L) sql_call2(f, x, start, length) } } # str_sub(x, start, end) - start and end can be negative # SUBSTR(string, start, length) - start can be negative #' @export #' @rdname sql_variant sql_str_sub <- function(subset_f = "SUBSTR", length_f = "LENGTH") { function(string, start = 1L, end = -1L) { stopifnot(length(start) == 1L, length(end) == 1L) start <- as.integer(start) end <- as.integer(end) if (end == -1L) { sql_call2(subset_f, string, start) } else if (end <= 0) { if (start < 0) { length <- pmax(- start + end + 1L, 0L) } else { length <- sql_expr(!!sql_call2(length_f, string) - !!(abs(end) - 1L)) } sql_call2(subset_f, string, start, length) } else if (end > 0) { length <- pmax(end - start + 1L, 0L) sql_call2(subset_f, string, start, length) } } } sql_str_trim <- function(string, side = c("both", "left", "right")) { side <- match.arg(side) switch(side, left = sql_expr(LTRIM(!!string)), right = sql_expr(RTRIM(!!string)), both = sql_expr(LTRIM(RTRIM(!!string))), ) } dbplyr/R/remote.R #' Metadata about a remote table #' #' `remote_name()` gives the name of the remote table, or `NULL` if it's a query.
#' `remote_query()` gives the text of the query, and `remote_query_plan()` #' the query plan (as computed by the remote database). `remote_src()` and #' `remote_con()` give the dplyr source and DBI connection respectively. #' #' @param x Remote table, currently must be a [tbl_sql]. #' @return The value, or `NULL` if not remote table, or not applicable. #' For example, computed queries do not have a "name" #' @export #' @examples #' mf <- memdb_frame(x = 1:5, y = 5:1, .name = "blorp") #' remote_name(mf) #' remote_src(mf) #' remote_con(mf) #' remote_query(mf) #' #' mf2 <- dplyr::filter(mf, x > 3) #' remote_name(mf2) #' remote_src(mf2) #' remote_con(mf2) #' remote_query(mf2) remote_name <- function(x) { if (!inherits(x$ops, "op_base")) return() x$ops$x } #' @export #' @rdname remote_name remote_src <- function(x) { x$src } #' @export #' @rdname remote_name remote_con <- function(x) { x$src$con } #' @export #' @rdname remote_name remote_query <- function(x) { db_sql_render(remote_con(x), x) } #' @export #' @rdname remote_name remote_query_plan <- function(x) { db_explain(remote_con(x), db_sql_render(remote_con(x), x$ops)) } dbplyr/R/escape.R0000644000176200001440000001552013501726004013333 0ustar liggesusers#' Escape/quote a string. #' #' `escape()` requires you to provide a database connection to control the #' details of escaping. `escape_ansi()` uses the SQL 92 ANSI standard. #' #' @param x An object to escape. Existing sql vectors will be left as is, #' character vectors are escaped with single quotes, numeric vectors have #' trailing `.0` added if they're whole numbers, identifiers are #' escaped with double quotes. #' @param parens,collapse Controls behaviour when multiple values are supplied. #' `parens` should be a logical flag, or if `NA`, will wrap in #' parens if length > 1. 
#' #' Default behaviour: lists are always wrapped in parens and separated by #' commas, identifiers are separated by commas and never wrapped, #' atomic vectors are separated by spaces and wrapped in parens if needed. #' @param con Database connection. #' @rdname escape #' @export #' @examples #' # Doubles vs. integers #' escape_ansi(1:5) #' escape_ansi(c(1, 5.4)) #' #' # String vs known sql vs. sql identifier #' escape_ansi("X") #' escape_ansi(sql("X")) #' escape_ansi(ident("X")) #' #' # Escaping is idempotent #' escape_ansi("X") #' escape_ansi(escape_ansi("X")) #' escape_ansi(escape_ansi(escape_ansi("X"))) escape <- function(x, parens = NA, collapse = " ", con = NULL) { if (is.null(con)) { stop("`con` must not be NULL", call. = FALSE) } UseMethod("escape") } #' @export #' @rdname escape escape_ansi <- function(x, parens = NA, collapse = "") { escape(x, parens = parens, collapse = collapse, con = simulate_dbi()) } #' @export escape.ident <- function(x, parens = FALSE, collapse = ", ", con = NULL) { y <- sql_escape_ident(con, x) sql_vector(names_to_as(y, names2(x), con = con), parens, collapse, con = con) } #' @export escape.ident_q <- function(x, parens = FALSE, collapse = ", ", con = NULL) { sql_vector(names_to_as(x, names2(x), con = con), parens, collapse, con = con) } #' @export escape.logical <- function(x, parens = NA, collapse = ", ", con = NULL) { sql_vector(sql_escape_logical(con, x), parens, collapse, con = con) } #' @export escape.factor <- function(x, parens = NA, collapse = ", ", con = NULL) { x <- as.character(x) escape.character(x, parens = parens, collapse = collapse, con = con) } #' @export escape.Date <- function(x, parens = NA, collapse = ", ", con = NULL) { x <- as.character(x) escape.character(x, parens = parens, collapse = collapse, con = con) } #' @export escape.POSIXt <- function(x, parens = NA, collapse = ", ", con = NULL) { x <- strftime(x, "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC") escape.character(x, parens = parens, collapse = collapse, con = 
con) } #' @export escape.character <- function(x, parens = NA, collapse = ", ", con = NULL) { sql_vector(sql_escape_string(con, x), parens, collapse, con = con) } #' @export escape.double <- function(x, parens = NA, collapse = ", ", con = NULL) { out <- ifelse(is.wholenumber(x), sprintf("%.1f", x), as.character(x)) # Special values out[is.na(x)] <- "NULL" inf <- is.infinite(x) out[inf & x > 0] <- "'Infinity'" out[inf & x < 0] <- "'-Infinity'" sql_vector(out, parens, collapse, con = con) } #' @export escape.integer <- function(x, parens = NA, collapse = ", ", con = NULL) { x[is.na(x)] <- "NULL" sql_vector(x, parens, collapse, con = con) } #' @export escape.integer64 <- function(x, parens = NA, collapse = ", ", con = NULL) { x <- as.character(x) x[is.na(x)] <- "NULL" sql_vector(x, parens, collapse, con = con) } #' @export escape.NULL <- function(x, parens = NA, collapse = " ", con = NULL) { sql("NULL") } #' @export escape.sql <- function(x, parens = NULL, collapse = NULL, con = NULL) { sql_vector(x, isTRUE(parens), collapse, con = con) } #' @export escape.list <- function(x, parens = TRUE, collapse = ", ", con = NULL) { pieces <- vapply(x, escape, character(1), con = con) sql_vector(pieces, parens, collapse, con = con) } #' @export escape.data.frame <- function(x, parens = TRUE, collapse = ", ", con = NULL) { message <- paste0( "Cannot embed a data frame in a SQL query.\n\n", "If you are seeing this error in code that used to work, the most likely ", "cause is a change dbplyr 1.4.0. 
Previously `df$x` or `df[[y]]` implied ", "that `df` was a local variable, but now you must make that explicit ", " with `!!` or `local()`, e.g., `!!df$x` or `local(df[[\"y\"]])`" ) abort(paste(strwrap(message), collapse = "\n")) } #' @export escape.reactivevalues <- function(x, parens = TRUE, collapse = ", ", con = NULL) { message <- paste0( "Cannot embed a reactiveValues() object in a SQL query.\n\n", "If you are seeing this error in code that used to work, the most likely ", "cause is a change in dbplyr 1.4.0. Previously `df$x` or `df[[y]]` implied ", "that `df` was a local variable, but now you must make that explicit ", " with `!!` or `local()`, e.g., `!!df$x` or `local(df[[\"y\"]])`" ) abort(paste(strwrap(message), collapse = "\n")) } #' @export #' @rdname escape sql_vector <- function(x, parens = NA, collapse = " ", con = NULL) { if (is.null(con)) { stop("`con` must not be NULL", call. = FALSE) } if (length(x) == 0) { if (!is.null(collapse)) { return(if (isTRUE(parens)) sql("()") else sql("")) } else { return(sql()) } } if (is.na(parens)) { parens <- length(x) > 1L } x <- names_to_as(x, con = con) x <- paste(x, collapse = collapse) if (parens) x <- paste0("(", x, ")") sql(x) } names_to_as <- function(x, names = names2(x), con = NULL) { if (length(x) == 0) { return(character()) } names_esc <- sql_escape_ident(con, names) as <- ifelse(names == "" | names_esc == x, "", paste0(" AS ", names_esc)) paste0(x, as) } #' Helper function for quoting sql elements. #' #' If the quote character is present in the string, it will be doubled. #' `NA`s will be replaced with NULL. #' #' @export #' @param x Character vector to escape. #' @param quote Single quoting character.
#' @export #' @keywords internal #' @examples #' sql_quote("abc", "'") #' sql_quote("I've had a good day", "'") #' sql_quote(c("abc", NA), "'") sql_quote <- function(x, quote) { if (length(x) == 0) { return(x) } y <- gsub(quote, paste0(quote, quote), x, fixed = TRUE) y <- paste0(quote, y, quote) y[is.na(x)] <- "NULL" names(y) <- names(x) y } #' More SQL generics #' #' These are new, so not included in dplyr for backward compatibility #' purposes. #' #' @keywords internal #' @export sql_escape_logical <- function(con, x) { UseMethod("sql_escape_logical") } # DBIConnection methods -------------------------------------------------------- #' @export sql_escape_string.DBIConnection <- function(con, x) { dbQuoteString(con, x) } #' @export sql_escape_ident.DBIConnection <- function(con, x) { dbQuoteIdentifier(con, x) } #' @export sql_escape_logical.DBIConnection <- function(con, x) { y <- as.character(x) y[is.na(x)] <- "NULL" y } dbplyr/R/lazy-ops.R0000644000176200001440000001050713417126327013661 0ustar liggesusers#' Lazy operations #' #' This set of S3 classes describe the action of dplyr verbs. These are #' currently used for SQL sources to separate the description of operations #' in R from their computation in SQL. This API is very new so is likely #' to evolve in the future. #' #' `op_vars()` and `op_grps()` compute the variables and groups from #' a sequence of lazy operations. `op_sort()` and `op_frame()` tracks the #' order and frame for use in window functions. 
#' #' @keywords internal #' @name lazy_ops NULL # Base constructors ------------------------------------------------------- #' @export #' @rdname lazy_ops op_base <- function(x, vars, class = character()) { stopifnot(is.character(vars)) structure( list( x = x, vars = vars ), class = c(paste0("op_base_", class), "op_base", "op") ) } op_base_local <- function(df) { op_base(df, names(df), class = "local") } op_base_remote <- function(x, vars) { stopifnot(is.sql(x) || is.ident(x)) op_base(x, vars, class = "remote") } #' @export print.op_base_remote <- function(x, ...) { if (inherits(x$x, "ident")) { cat("From: ", x$x, "\n", sep = "") } else { cat("From: \n") } cat("\n", sep = "") } #' @export print.op_base_local <- function(x, ...) { cat(" ", dim_desc(x$x), "\n", sep = "") } #' @export sql_build.op_base_remote <- function(op, con, ...) { op$x } #' @export sql_build.op_base_local <- function(op, con, ...) { ident("df") } # Operators --------------------------------------------------------------- #' @export #' @rdname lazy_ops op_single <- function(name, x, dots = list(), args = list()) { structure( list( name = name, x = x, dots = dots, args = args ), class = c(paste0("op_", name), "op_single", "op") ) } #' @export #' @rdname lazy_ops add_op_single <- function(name, .data, dots = list(), args = list()) { .data$ops <- op_single(name, x = .data$ops, dots = dots, args = args) .data } #' @export print.op_single <- function(x, ...) 
{ print(x$x) cat("-> ", x$name, "()\n", sep = "") for (dot in x$dots) { cat(" - ", deparse_trunc(dot), "\n", sep = "") } } #' @export #' @rdname lazy_ops op_double <- function(name, x, y, args = list()) { structure( list( name = name, x = x, y = y, args = args ), class = c(paste0("op_", name), "op_double", "op") ) } # op_grps ----------------------------------------------------------------- #' @export #' @rdname lazy_ops op_grps <- function(op) UseMethod("op_grps") #' @export op_grps.op_base <- function(op) character() #' @export op_grps.op_single <- function(op) op_grps(op$x) #' @export op_grps.op_double <- function(op) op_grps(op$x) #' @export op_grps.tbl_lazy <- function(op) op_grps(op$ops) # op_vars ----------------------------------------------------------------- #' @export #' @rdname lazy_ops op_vars <- function(op) UseMethod("op_vars") #' @export op_vars.op_base <- function(op) op$vars #' @export op_vars.op_single <- function(op) op_vars(op$x) #' @export op_vars.op_double <- function(op) stop("Not implemented", call. = FALSE) #' @export op_vars.tbl_lazy <- function(op) op_vars(op$ops) # op_sort ----------------------------------------------------------------- #' @export #' @rdname lazy_ops op_sort <- function(op) UseMethod("op_sort") #' @export op_sort.op_base <- function(op) NULL #' @export op_sort.op_single <- function(op) op_sort(op$x) #' @export op_sort.op_double <- function(op) op_sort(op$x) #' @export op_sort.tbl_lazy <- function(op) op_sort(op$ops) # op_frame ---------------------------------------------------------------- #' @export #' @rdname lazy_ops op_frame <- function(op) UseMethod("op_frame") #' @export op_frame.op_base <- function(op) NULL #' @export op_frame.op_single <- function(op) op_frame(op$x) #' @export op_frame.op_double <- function(op) op_frame(op$x) #' @export op_frame.tbl_lazy <- function(op) op_frame(op$ops) # Description ------------------------------------------------------------- op_rows <- function(op) "??" 
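The `op_vars()`/`op_grps()` generics above walk the op tree recursively. A minimal sketch of how the tree accumulates and what the generics report — a hypothetical illustration, not part of the package source, assuming dplyr and dbplyr are attached:

```r
# Hypothetical illustration -- not part of the package source.
library(dplyr)
library(dbplyr)

lf <- lazy_frame(g = c(1, 1, 2), x = c(10, 20, 30))
q <- lf %>% group_by(g) %>% summarise(m = mean(x, na.rm = TRUE))

# Each verb wraps the previous op, so the tree is:
# op_summarise -> op_group_by -> op_base_remote
op_vars(q$ops)  # "g" "m": the group columns plus the new summary column
op_grps(q$ops)  # character(0): summarise() peels off one grouping level
```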
op_cols <- function(op) length(op_vars(op)) op_desc <- function(op) UseMethod("op_desc") #' @export op_desc.op <- function(x, ..., con = con) { "lazy query" } #' @export op_desc.op_base_remote <- function(op) { if (is.ident(op$x)) { paste0("table<", op$x, ">") } else { "SQL" } } dbplyr/R/query.R0000644000176200001440000000014613415754020013241 0ustar liggesusers#' @export sql_optimise.query <- function(x, con = NULL, ...) { # Default to no optimisation x } dbplyr/R/translate-sql-quantile.R0000644000176200001440000000176413426635320016520 0ustar liggesuserssql_quantile <- function(f, style = c("infix", "ordered"), window = FALSE) { force(f) style <- match.arg(style) force(window) function(x, probs) { check_probs(probs) sql <- switch(style, infix = sql_call2(f, x, probs), ordered = build_sql( sql_call2(f, probs), " WITHIN GROUP (ORDER BY ", x, ")" ) ) if (window) { sql <- win_over(sql, partition = win_current_group(), frame = win_current_frame() ) } sql } } sql_median <- function(f, style = c("infix", "ordered"), window = FALSE) { quantile <- sql_quantile(f, style = style, window = window) function(x) { quantile(x, 0.5) } } check_probs <- function(probs) { if (!is.numeric(probs)) { stop("`probs` must be numeric", call. = FALSE) } if (length(probs) > 1) { stop("SQL translation only supports single value for `probs`.", call. = FALSE) } } dbplyr/R/partial-eval.R0000644000176200001440000001273213476543766014505 0ustar liggesusers#' Partially evaluate an expression. #' #' This function partially evaluates an expression, using information from #' the tbl to determine whether names refer to local expressions #' or remote variables. This simplifies SQL translation because expressions #' don't need to carry around their environment - all relevant information #' is incorporated into the expression. #' #' @section Symbol substitution: #' #' `partial_eval()` needs to guess if you're referring to a variable on the #' server (remote), or in the current environment (local). 
It's not possible to #' do this 100% perfectly. `partial_eval()` uses the following heuristic: #' #' \itemize{ #' \item If the tbl variables are known, and the symbol matches a tbl #' variable, then remote. #' \item If the symbol is defined locally, local. #' \item Otherwise, remote. #' } #' #' You can override the guesses using `local()` and `remote()` to force #' computation, or by using the `.data` and `.env` pronouns of tidy evaluation. #' #' @param call an unevaluated expression, as produced by [quote()] #' @param vars character vector of variable names. #' @param env environment in which to search for local values #' @export #' @keywords internal #' @examples #' vars <- c("year", "id") #' partial_eval(quote(year > 1980), vars = vars) #' #' ids <- c("ansonca01", "forceda01", "mathebo01") #' partial_eval(quote(id %in% ids), vars = vars) #' #' # cf. #' partial_eval(quote(id == .data$ids), vars = vars) #' #' # You can use local() or .env to disambiguate between local and remote #' # variables: otherwise remote is always preferred #' year <- 1980 #' partial_eval(quote(year > year), vars = vars) #' partial_eval(quote(year > local(year)), vars = vars) #' partial_eval(quote(year > .env$year), vars = vars) #' #' # Functions are always assumed to be remote. Use local to force evaluation #' # in R. 
#' f <- function(x) x + 1 #' partial_eval(quote(year > f(1980)), vars = vars) #' partial_eval(quote(year > local(f(1980))), vars = vars) #' #' # For testing you can also use it with the tbl omitted #' partial_eval(quote(1 + 2 * 3)) #' x <- 1 #' partial_eval(quote(x ^ y)) partial_eval <- function(call, vars = character(), env = caller_env()) { if (is_null(call)) { NULL } else if (is_atomic(call)) { call } else if (is_symbol(call)) { partial_eval_sym(call, vars, env) } else if (is_quosure(call)) { partial_eval(get_expr(call), vars, get_env(call)) } else if (is_call(call)) { partial_eval_call(call, vars, env) } else { abort(glue("Unknown input type: ", class(call))) } } partial_eval_dots <- function(dots, vars) { stopifnot(inherits(dots, "quosures")) lapply(dots, function(x) { new_quosure( partial_eval(get_expr(x), vars = vars, env = get_env(x)), get_env(x) ) }) } partial_eval_sym <- function(sym, vars, env) { name <- as_string(sym) if (name %in% vars) { sym } else if (env_has(env, name, inherit = TRUE)) { eval_bare(sym, env) } else { sym } } is_namespaced_dplyr_call <- function(call) { is_symbol(call[[1]], "::") && is_symbol(call[[2]], "dplyr") } is_tidy_pronoun <- function(call) { is_symbol(call[[1]], c("$", "[[")) && is_symbol(call[[2]], c(".data", ".env")) } partial_eval_call <- function(call, vars, env) { fun <- call[[1]] # Try to find the name of inlined functions if (inherits(fun, "inline_colwise_function")) { dot_var <- vars[[attr(call, "position")]] call <- replace_dot(attr(fun, "formula")[[2]], dot_var) env <- get_env(attr(fun, "formula")) } else if (is.function(fun)) { fun_name <- find_fun(fun) if (is.null(fun_name)) { # This probably won't work, but it seems like it's worth a shot. 
return(eval_bare(call, env)) } call[[1]] <- fun <- sym(fun_name) } # So are compound calls, EXCEPT dplyr::foo() if (is.call(fun)) { if (is_namespaced_dplyr_call(fun)) { call[[1]] <- fun[[3]] } else if (is_tidy_pronoun(fun)) { stop("Use local() or remote() to force evaluation of functions", call. = FALSE) } else { return(eval_bare(call, env)) } } # .data$, .data[[]], .env$, .env[[]] need special handling if (is_tidy_pronoun(call)) { if (is_symbol(call[[1]], "$")) { idx <- call[[3]] } else { idx <- as.name(eval_bare(call[[3]], env)) } if (is_symbol(call[[2]], ".data")) { idx } else { eval_bare(idx, env) } } else { # Process call arguments recursively, unless user has manually called # remote/local name <- as_string(call[[1]]) if (name == "local") { eval_bare(call[[2]], env) } else if (name == "remote") { call[[2]] } else { call[-1] <- lapply(call[-1], partial_eval, vars = vars, env = env) call } } } find_fun <- function(fun) { if (is_lambda(fun)) { body <- body(fun) if (!is_call(body)) { return(NULL) } fun_name <- body[[1]] if (!is_symbol(fun_name)) { return(NULL) } as.character(fun_name) } else if (is.function(fun)) { fun_name(fun) } } fun_name <- function(fun) { pkg_env <- env_parent(global_env()) known <- c(ls(base_agg), ls(base_scalar)) for (x in known) { if (!env_has(pkg_env, x, inherit = TRUE)) next fun_x <- env_get(pkg_env, x, inherit = TRUE) if (identical(fun, fun_x)) return(x) } NULL } replace_dot <- function(call, var) { if (is_symbol(call, ".")) { sym(var) } else if (is_call(call)) { call[] <- lapply(call, replace_dot, var = var) call } else { call } } dbplyr/R/translate-sql-window.R0000644000176200001440000001755313426633300016204 0ustar liggesusers#' Generate SQL expression for window functions #' #' `win_over()` makes it easy to generate the window function specification. #' `win_absent()`, `win_rank()`, `win_aggregate()`, and `win_cumulative()` #' provide helpers for constructing common types of window functions. 
#' `win_current_group()` and `win_current_order()` allow you to access
#' the grouping and order context set up by [group_by()] and [arrange()].
#'
#' @param expr The window expression
#' @param partition Variables to partition over
#' @param order Variables to order by
#' @param frame A numeric vector of length two defining the frame.
#' @param f The name of an SQL function as a string
#' @export
#' @keywords internal
#' @examples
#' con <- simulate_dbi()
#'
#' win_over(sql("avg(x)"), con = con)
#' win_over(sql("avg(x)"), "y", con = con)
#' win_over(sql("avg(x)"), order = "y", con = con)
#' win_over(sql("avg(x)"), order = c("x", "y"), con = con)
#' win_over(sql("avg(x)"), frame = c(-Inf, 0), order = "y", con = con)
win_over <- function(expr, partition = NULL, order = NULL, frame = NULL,
                     con = sql_current_con()) {
  if (length(partition) > 0) {
    partition <- as.sql(partition)

    partition <- build_sql(
      "PARTITION BY ",
      sql_vector(
        escape(partition, con = con),
        collapse = ", ",
        parens = FALSE,
        con = con
      ),
      con = con
    )
  }

  if (length(order) > 0) {
    order <- as.sql(order)

    order <- build_sql(
      "ORDER BY ",
      sql_vector(
        escape(order, con = con),
        collapse = ", ",
        parens = FALSE,
        con = con
      ),
      con = con
    )
  }

  if (length(frame) > 0) {
    if (length(order) == 0) {
      warning(
        "Windowed expression '", expr, "' does not have explicit order.\n",
        "Please use arrange() or window_order() to make deterministic.",
        call. = FALSE
      )
    }

    if (is.numeric(frame)) frame <- rows(frame[1], frame[2])
    frame <- build_sql("ROWS ", frame, con = con)
  }

  over <- sql_vector(purrr::compact(list(partition, order, frame)), parens = TRUE, con = con)
  sql <- build_sql(expr, " OVER ", over, con = con)

  sql
}

rows <- function(from = -Inf, to = 0) {
  if (from >= to) stop("from must be less than to", call.
= FALSE)

  dir <- function(x) if (x < 0) "PRECEDING" else "FOLLOWING"
  val <- function(x) if (is.finite(x)) as.integer(abs(x)) else "UNBOUNDED"
  bound <- function(x) {
    if (x == 0) return("CURRENT ROW")
    paste(val(x), dir(x))
  }

  if (to == 0) {
    sql(bound(from))
  } else {
    sql(paste0("BETWEEN ", bound(from), " AND ", bound(to)))
  }
}

#' @rdname win_over
#' @export
win_rank <- function(f) {
  force(f)
  function(order = NULL) {
    win_over(
      build_sql(sql(f), list()),
      partition = win_current_group(),
      order = order %||% win_current_order(),
      frame = win_current_frame()
    )
  }
}

#' @rdname win_over
#' @export
win_aggregate <- function(f) {
  force(f)
  warned <- FALSE
  function(x, na.rm = FALSE) {
    warned <<- check_na_rm(f, na.rm, warned)
    frame <- win_current_frame()

    win_over(
      build_sql(sql(f), list(x)),
      partition = win_current_group(),
      order = if (!is.null(frame)) win_current_order(),
      frame = frame
    )
  }
}

#' @rdname win_over
#' @export
win_aggregate_2 <- function(f) {
  function(x, y) {
    frame <- win_current_frame()

    win_over(
      build_sql(sql(f), list(x, y)),
      partition = win_current_group(),
      order = if (!is.null(frame)) win_current_order(),
      frame = frame
    )
  }
}

#' @rdname win_over
#' @usage NULL
#' @export
win_recycled <- win_aggregate

#' @rdname win_over
#' @export
win_cumulative <- function(f) {
  force(f)
  function(x, order = NULL) {
    win_over(
      build_sql(sql(f), list(x)),
      partition = win_current_group(),
      order = order %||% win_current_order(),
      frame = c(-Inf, 0)
    )
  }
}

#' @rdname win_over
#' @export
win_absent <- function(f) {
  force(f)

  function(...) {
    stop(
      "Window function `", f, "()` is not supported by this database",
      call. = FALSE
    )
  }
}

# API to set default partitioning etc -------------------------------------

# Use a global variable to communicate state of partitioning between
# tbl and sql translator. This isn't the most amazing design, but it keeps
# things loosely coupled and is easy to understand.
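# As an illustration (not run; assumes dbplyr is attached), translate_sql()
# reads this shared state when it builds window functions, and its
# vars_group/vars_order/vars_frame arguments populate the same context
# explicitly:
#
#   con <- simulate_dbi()
#   translate_sql(cumsum(x), vars_order = "y", con = con)
#   # yields a SUM(`x`) OVER (ORDER BY `y` ...) window expression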
sql_context <- new.env(parent = emptyenv()) sql_context$group_by <- NULL sql_context$order_by <- NULL sql_context$con <- NULL # Used to carry additional information needed for special cases sql_context$context <- "" set_current_con <- function(con) { old <- sql_context$con sql_context$con <- con invisible(old) } set_win_current_group <- function(vars) { stopifnot(is.null(vars) || is.character(vars)) old <- sql_context$group_by sql_context$group_by <- vars invisible(old) } set_win_current_order <- function(vars) { stopifnot(is.null(vars) || is.character(vars)) old <- sql_context$order_by sql_context$order_by <- vars invisible(old) } set_win_current_frame <- function(frame) { stopifnot(is.null(frame) || is.numeric(frame)) old <- sql_context$frame sql_context$frame <- frame invisible(old) } #' @export #' @rdname win_over win_current_group <- function() sql_context$group_by #' @export #' @rdname win_over win_current_order <- function() sql_context$order_by #' @export #' @rdname win_over win_current_frame <- function() sql_context$frame # Not exported, because you shouldn't need it sql_current_con <- function() { sql_context$con } # Functions to manage information for special cases set_current_context <- function(context) { old <- sql_context$context sql_context$context <- context invisible(old) } sql_current_context <- function() sql_context$context sql_current_select <- function() sql_context$context %in% c("SELECT", "ORDER") # Where translation ------------------------------------------------------- uses_window_fun <- function(x, con) { if (is.null(x)) return(FALSE) if (is.list(x)) { calls <- unlist(lapply(x, all_calls)) } else { calls <- all_calls(x) } win_f <- ls(envir = sql_translate_env(con)$window) any(calls %in% win_f) } common_window_funs <- function() { ls(sql_translate_env(NULL)$window) } #' @noRd #' @examples #' translate_window_where(quote(1)) #' translate_window_where(quote(x)) #' translate_window_where(quote(x == 1)) #' translate_window_where(quote(x == 
1 && y == 2)) #' translate_window_where(quote(n() > 10)) #' translate_window_where(quote(rank() > cumsum(AB))) translate_window_where <- function(expr, window_funs = common_window_funs()) { switch_type(expr, formula = translate_window_where(f_rhs(expr), window_funs), logical = , integer = , double = , complex = , character = , string = , symbol = window_where(expr, list()), language = { if (lang_name(expr) %in% window_funs) { name <- unique_name() window_where(sym(name), set_names(list(expr), name)) } else { args <- lapply(expr[-1], translate_window_where, window_funs = window_funs) expr <- lang(node_car(expr), splice(lapply(args, "[[", "expr"))) window_where( expr = expr, comp = unlist(lapply(args, "[[", "comp"), recursive = FALSE) ) } }, abort(glue("Unknown type: ", typeof(expr))) ) } #' @noRd #' @examples #' translate_window_where_all(list(quote(x == 1), quote(n() > 2))) #' translate_window_where_all(list(quote(cumsum(x) == 10), quote(n() > 2))) translate_window_where_all <- function(x, window_funs = common_window_funs()) { out <- lapply(x, translate_window_where, window_funs = window_funs) list( expr = unlist(lapply(out, "[[", "expr"), recursive = FALSE), comp = unlist(lapply(out, "[[", "comp"), recursive = FALSE) ) } window_where <- function(expr, comp) { stopifnot(is.call(expr) || is.name(expr) || is.atomic(expr)) stopifnot(is.list(comp)) list( expr = expr, comp = comp ) } dbplyr/R/backend-mssql.R0000644000176200001440000002411313475560423014630 0ustar liggesusers#' @export `sql_select.Microsoft SQL Server` <- function(con, select, from, where = NULL, group_by = NULL, having = NULL, order_by = NULL, limit = NULL, distinct = FALSE, ..., bare_identifier_ok = FALSE) { out <- vector("list", 7) names(out) <- c("select", "from", "where", "group_by", "having", "order_by","limit") assert_that(is.character(select), length(select) > 0L) out$select <- build_sql( "SELECT ", if (distinct) sql("DISTINCT "), if (!is.null(limit) && !identical(limit, Inf)) { # MS SQL uses the 
TOP statement instead of LIMIT which is what SQL92 uses # TOP is expected after DISTINCT and not at the end of the query # e.g: SELECT TOP 100 * FROM my_table assert_that(is.numeric(limit), length(limit) == 1L, limit > 0) build_sql("TOP(", as.integer(limit), ") ", con = con) } else if (!is.null(order_by) && bare_identifier_ok) { # Stop-gap measure so that a wider range of queries is supported (#276). # MS SQL doesn't allow ORDER BY in subqueries, # unless also TOP (or FOR XML) is specified. # Workaround: Use TOP 100 PERCENT # https://stackoverflow.com/a/985953/946850 sql("TOP 100 PERCENT ") }, escape(select, collapse = ", ", con = con), con = con ) out$from <- sql_clause_from(from, con) out$where <- sql_clause_where(where, con) out$group_by <- sql_clause_group_by(group_by, con) out$having <- sql_clause_having(having, con) out$order_by <- sql_clause_order_by(order_by, con) escape(unname(purrr::compact(out)), collapse = "\n", parens = FALSE, con = con) } #' @export `sql_translate_env.Microsoft SQL Server` <- function(con) { sql_variant( sql_translator(.parent = base_odbc_scalar, `!` = function(x) { if (sql_current_select()) { build_sql(sql("~"), list(x)) } else { sql_expr(NOT(!!x)) } }, `!=` = sql_infix("!="), `==` = sql_infix("="), `<` = sql_infix("<"), `<=` = sql_infix("<="), `>` = sql_infix(">"), `>=` = sql_infix(">="), `&` = mssql_generic_infix("&", "%AND%"), `&&` = mssql_generic_infix("&", "%AND%"), `|` = mssql_generic_infix("|", "%OR%"), `||` = mssql_generic_infix("|", "%OR%"), bitwShiftL = sql_not_supported("bitwShiftL"), bitwShiftR = sql_not_supported("bitwShiftR"), `if` = mssql_sql_if, if_else = function(condition, true, false) mssql_sql_if(condition, true, false), ifelse = function(test, yes, no) mssql_sql_if(test, yes, no), as.logical = sql_cast("BIT"), as.Date = sql_cast("DATE"), as.numeric = sql_cast("NUMERIC"), as.double = sql_cast("NUMERIC"), as.character = sql_cast("VARCHAR(MAX)"), log = sql_prefix("LOG"), atan2 = sql_prefix("ATN2"), ceil = 
sql_prefix("CEILING"), ceiling = sql_prefix("CEILING"), # https://dba.stackexchange.com/questions/187090 pmin = sql_not_supported("pmin()"), pmax = sql_not_supported("pmax()"), is.null = function(x) mssql_is_null(x, sql_current_context()), is.na = function(x) mssql_is_null(x, sql_current_context()), # string functions ------------------------------------------------ nchar = sql_prefix("LEN"), paste = sql_paste_infix(" ", "+", function(x) sql_expr(cast(!!x %as% text))), paste0 = sql_paste_infix("", "+", function(x) sql_expr(cast(!!x %as% text))), substr = sql_substr("SUBSTRING"), # stringr functions str_length = sql_prefix("LEN"), str_c = sql_paste_infix("", "+", function(x) sql_expr(cast(!!x %as% text))), # no built in function: https://stackoverflow.com/questions/230138 str_to_title = sql_not_supported("str_to_title()"), str_sub = sql_str_sub("SUBSTRING", "LEN"), # lubridate --------------------------------------------------------------- # https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Functions_and_expressions/Date_and_time_functions as_date = sql_cast("DATE"), # Using DATETIME2 as it complies with ANSI and ISO. 
# MS recommends DATETIME2 for new work: # https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-2017 as_datetime = sql_cast("DATETIME2"), today = function() sql_expr(CAST(SYSDATETIME() %AS% DATE)), # https://docs.microsoft.com/en-us/sql/t-sql/functions/datepart-transact-sql?view=sql-server-2017 year = function(x) sql_expr(DATEPART(YEAR, !!x)), day = function(x) sql_expr(DATEPART(DAY, !!x)), mday = function(x) sql_expr(DATEPART(DAY, !!x)), yday = function(x) sql_expr(DATEPART(DAYOFYEAR, !!x)), hour = function(x) sql_expr(DATEPART(HOUR, !!x)), minute = function(x) sql_expr(DATEPART(MINUTE, !!x)), second = function(x) sql_expr(DATEPART(SECOND, !!x)), month = function(x, label = FALSE, abbr = TRUE) { if (!label) { sql_expr(DATEPART(MONTH, !!x)) } else { if (!abbr) { sql_expr(DATENAME(MONTH, !!x)) } else { stop("`abbr` is not supported in SQL Server translation", call. = FALSE) } } }, quarter = function(x, with_year = FALSE, fiscal_start = 1) { if (fiscal_start != 1) { stop("`fiscal_start` is not supported in SQL Server translation. Must be 1.", call. = FALSE) } if (with_year) { sql_expr((DATENAME(YEAR, !!x) + '.' + DATENAME(QUARTER, !!x))) } else { sql_expr(DATEPART(QUARTER, !!x)) } }, ), sql_translator(.parent = base_odbc_agg, sd = sql_aggregate("STDEV", "sd"), var = sql_aggregate("VAR", "var"), # MSSQL does not have function for: cor and cov cor = sql_not_supported("cor()"), cov = sql_not_supported("cov()") ), sql_translator(.parent = base_odbc_win, sd = win_aggregate("STDEV"), var = win_aggregate("VAR"), # MSSQL does not have function for: cor and cov cor = win_absent("cor"), cov = win_absent("cov") ) )} #' @export `db_analyze.Microsoft SQL Server` <- function(con, table, ...) 
{ # Using UPDATE STATISTICS instead of ANALYZE as recommended in this article # https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql sql <- build_sql("UPDATE STATISTICS ", as.sql(table), con = con) DBI::dbExecute(con, sql) } # Temporary tables -------------------------------------------------------- # SQL server does not support CREATE TEMPORARY TABLE and instead prefixes # temporary table names with # mssql_temp_name <- function(name, temporary){ # check that name has prefixed '##' if temporary if (temporary && substr(name, 1, 1) != "#") { name <- paste0("##", name) message("Created a temporary table named: ", name) } name } #' @export `db_copy_to.Microsoft SQL Server` <- function(con, table, values, overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) { NextMethod( table = mssql_temp_name(table, temporary), types = types, values = values, temporary = FALSE ) } #' @export `db_save_query.Microsoft SQL Server` <- function(con, sql, name, temporary = TRUE, ...){ name <- mssql_temp_name(name, temporary) # Different syntax for MSSQL: https://stackoverflow.com/q/16683758/946850 tt_sql <- build_sql( "SELECT * ", "INTO ", as.sql(name), " ", "FROM (", sql, ") AS temp", con = con ) dbExecute(con, tt_sql) name } #' @export `db_write_table.Microsoft SQL Server` <- function(con, table, types, values, temporary = TRUE, ...) { NextMethod( table = mssql_temp_name(table, temporary), types = types, values = values, temporary = FALSE ) } # `IS NULL` returns a boolean expression, so you can't use it in a result set # the approach using casting return a bit, so you can use in a result set, but not in where. # Microsoft documentation: The result of a comparison operator has the Boolean data type. # This has three values: TRUE, FALSE, and UNKNOWN. Expressions that return a Boolean data type are # known as Boolean expressions. 
Unlike other SQL Server data types, a Boolean data type cannot # be specified as the data type of a table column or variable, and cannot be returned in a result set. # https://docs.microsoft.com/en-us/sql/t-sql/language-elements/comparison-operators-transact-sql mssql_is_null <- function(x, context = NULL) { needs_bit <- is.list(context) && !is.null(context$clause) && context$clause %in% c("SELECT", "ORDER") if (needs_bit) { sql_expr(convert(BIT, iif(!!x %is% NULL, 1L, 0L))) } else { sql_is_null(x) } } mssql_generic_infix <- function(if_select, if_filter) { force(if_select) force(if_filter) function(x, y) { if (sql_current_select()) { f <- if_select } else { f <- if_filter } sql_call2(f, x, y) } } mssql_sql_if <- function(cond, if_true, if_false = NULL) { old <- set_current_context(list(clause = "")) on.exit(set_current_context(old), add = TRUE) cond <- build_sql(cond) sql_if(cond, if_true, if_false) } globalVariables(c("BIT", "CAST", "%AS%", "%is%", "convert", "DATE", "DATENAME", "DATEPART", "iif", "NOT", "SUBSTRING", "LTRIM", "RTRIM", "CHARINDEX", "SYSDATETIME", "SECOND", "MINUTE", "HOUR", "DAY", "DAYOFWEEK", "DAYOFYEAR", "MONTH", "QUARTER", "YEAR")) dbplyr/R/backend-oracle.R0000644000176200001440000001067413443245100014730 0ustar liggesusers#' @export sql_select.Oracle<- function(con, select, from, where = NULL, group_by = NULL, having = NULL, order_by = NULL, limit = NULL, distinct = FALSE, ...) 
{
  out <- vector("list", 7)
  names(out) <- c("select", "from", "where", "group_by", "having", "order_by", "limit")

  out$select <- sql_clause_select(select, con, distinct)
  out$from <- sql_clause_from(from, con)
  out$where <- sql_clause_where(where, con)
  out$group_by <- sql_clause_group_by(group_by, con)
  out$having <- sql_clause_having(having, con)
  out$order_by <- sql_clause_order_by(order_by, con)

  # Processing limit via ROWNUM in a WHERE clause; this method
  # is backwards & forwards compatible: https://oracle-base.com/articles/misc/top-n-queries
  if (!is.null(limit) && !identical(limit, Inf)) {
    out <- escape(unname(purrr::compact(out)), collapse = "\n", parens = FALSE, con = con)
    assertthat::assert_that(is.numeric(limit), length(limit) == 1L, limit > 0)
    out <- build_sql(
      "SELECT * FROM ", sql_subquery(con, out), " WHERE ROWNUM <= ", limit,
      con = con
    )
  } else {
    escape(unname(purrr::compact(out)), collapse = "\n", parens = FALSE, con = con)
  }
}

#' @export
sql_translate_env.Oracle <- function(con) {
  sql_variant(
    sql_translator(.parent = base_odbc_scalar,
      # Data type conversions are mostly based on this article
      # https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements001.htm
      # https://stackoverflow.com/questions/1171196
      as.character = sql_cast("VARCHAR2(255)"),
      # bit64::as.integer64 can translate to BIGINT for some
      # vendors, which is equivalent to NUMBER(19) in Oracle
      # https://docs.oracle.com/cd/B19306_01/gateways.102/b14270/apa.htm
      as.integer64 = sql_cast("NUMBER(19)"),
      as.numeric = sql_cast("NUMBER"),
      as.double = sql_cast("NUMBER"),
      # https://docs.oracle.com/cd/B19306_01/server.102/b14200/operators003.htm#i997789
      paste = sql_paste_infix(" ", "||", function(x) sql_expr(cast(!!x %as% text))),
      paste0 = sql_paste_infix("", "||", function(x) sql_expr(cast(!!x %as% text))),
    ),
    base_odbc_agg,
    base_odbc_win
  )
}

#' @export
db_explain.Oracle <- function(con, sql, ...)
{ DBI::dbExecute(con, build_sql("EXPLAIN PLAN FOR ", sql, con = con)) expl <- DBI::dbGetQuery(con, "SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY())") out <- utils::capture.output(print(expl)) paste(out, collapse = "\n") } #' @export db_analyze.Oracle <- function(con, table, ...) { # https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_4005.htm sql <- dbplyr::build_sql( "ANALYZE TABLE ", as.sql(table), " COMPUTE STATISTICS", con = con ) DBI::dbExecute(con, sql) } #' @export sql_subquery.Oracle <- function(con, from, name = unique_name(), ...) { # Table aliases in Oracle should not have an "AS": https://www.techonthenet.com/oracle/alias.php if (is.ident(from)) { build_sql("(", from, ") ", if (!is.null(name)) ident(name), con = con) } else { build_sql("(", from, ") ", ident(name %||% unique_table_name()), con = con) } } #' @export db_drop_table.Oracle <- function(con, table, force = FALSE, ...) { if (force) { # https://stackoverflow.com/questions/1799128/oracle-if-table-exists sql <- build_sql( "BEGIN ", "EXECUTE IMMEDIATE 'DROP TABLE ", ident(table), "';", "EXCEPTION WHEN OTHERS THEN IF SQLCODE != -942 THEN RAISE; END IF; ", "END;", con = con ) } else { sql <- build_sql("DROP TABLE ", ident(table)) } DBI::dbExecute(con, sql) } # registered onLoad located in the zzz.R script setdiff.tbl_Oracle <- function(x, y, copy = FALSE, ...) { # Oracle uses MINUS instead of EXCEPT for this operation: # https://docs.oracle.com/cd/B19306_01/server.102/b14200/queries004.htm add_op_set_op(x, y, "MINUS", copy = copy, ...) 
} # roacle package ---------------------------------------------------------- #' @export sql_translate_env.OraConnection <- sql_translate_env.Oracle #' @export sql_select.OraConnection <- sql_select.Oracle #' @export db_analyze.OraConnection <- db_analyze.Oracle #' @export sql_subquery.OraConnection <- sql_subquery.Oracle #' @export db_drop_table.OraConnection <- db_drop_table.Oracle # registered onLoad located in the zzz.R script setdiff.OraConnection <- setdiff.tbl_Oracle dbplyr/R/sql-expr.R0000644000176200001440000000441213476015651013656 0ustar liggesusers#' Generate SQL from R expressions #' #' Low-level building block for generating SQL from R expressions. #' Strings are escaped; names become bare SQL identifiers. User infix #' functions have `%` stripped. #' #' Using `sql_expr()` in package will require use of [globalVariables()] #' to avoid `R CMD check` NOTES. This is a small amount of additional pain, #' which I think is worthwhile because it leads to more readable translation #' code. #' #' @param x A quasiquoted expression #' @param con Connection to use for escaping. Will be set automatically when #' called from a function translation. #' @param .fn Function name (as string, call, or symbol) #' @param ... Arguments to function #' @keywords internal #' @inheritParams translate_sql #' @export #' @examples #' con <- simulate_dbi() # not necessary when writing translations #' #' sql_expr(f(x + 1), con = con) #' sql_expr(f("x", "y"), con = con) #' sql_expr(f(x, y), con = con) #' #' x <- ident("x") #' sql_expr(f(!!x, y), con = con) #' #' sql_expr(cast("x" %as% DECIMAL), con = con) #' sql_expr(round(x) %::% numeric, con = con) #' #' sql_call2("+", quote(x), 1, con = con) #' sql_call2("+", "x", 1, con = con) sql_expr <- function(x, con = sql_current_con()) { x <- enexpr(x) x <- replace_expr(x, con = con) sql(x) } #' @export #' @rdname sql_expr sql_call2 <- function(.fn, ..., con = sql_current_con()) { fn <- call2(.fn, ...) 
fn <- replace_expr(fn, con = con) sql(fn) } replace_expr <- function(x, con) { if (is.atomic(x)) { as.character(escape(unname(x), con = con)) } else if (is.name(x)) { as.character(x) # } else if (is.call(x) && identical(x[[1]], quote(I))) { # escape(ident(as.character(x[[2]]))) } else if (is.call(x)) { fun <- toupper(as.character(x[[1]])) args <- lapply(x[-1], replace_expr, con = con) if (is_infix_base(fun)) { if (length(args) == 1) { paste0(fun, args[[1]]) } else { paste0(args[[1]], " ", fun, " ", args[[2]]) } } else if (is_infix_user(fun)) { fun <- substr(fun, 2, nchar(fun) - 1) paste0(args[[1]], " ", fun, " ", args[[2]]) } else if (fun == "(") { paste0("(", paste0(args, collapse = ", "), ")") } else { paste0(fun, "(", paste0(args, collapse = ", "), ")") } } else { x } } dbplyr/R/verb-mutate.R0000644000176200001440000000360613475557272014353 0ustar liggesusers# mutate ------------------------------------------------------------------ #' @export mutate.tbl_lazy <- function(.data, ..., .dots = list()) { dots <- quos(..., .named = TRUE) dots <- partial_eval_dots(dots, vars = op_vars(.data)) nest_vars(.data, dots, union(op_vars(.data), op_grps(.data))) } # transmute --------------------------------------------------------------- #' @export transmute.tbl_lazy <- function(.data, ...) { dots <- quos(..., .named = TRUE) dots <- partial_eval_dots(dots, vars = op_vars(.data)) nest_vars(.data, dots, character()) } # helpers ----------------------------------------------------------------- # TODO: refactor to remove `.data` argument and return a list of layers. nest_vars <- function(.data, dots, all_vars) { # For each expression, check if it uses any newly created variables. 
# If so, nest the mutate() new_vars <- character() init <- 0L for (i in seq_along(dots)) { cur_var <- names(dots)[[i]] used_vars <- all_names(get_expr(dots[[i]])) if (any(used_vars %in% new_vars)) { .data$ops <- op_select(.data$ops, carry_over(all_vars, dots[new_vars])) all_vars <- c(all_vars, setdiff(new_vars, all_vars)) new_vars <- cur_var init <- i } else { new_vars <- c(new_vars, cur_var) } } if (init != 0L) { dots <- dots[-seq2(1L, init - 1)] } .data$ops <- op_select(.data$ops, carry_over(all_vars, dots)) .data } # Combine a selection (passed through from subquery) # with new actions carry_over <- function(sel = character(), act = list()) { if (is.null(names(sel))) { names(sel) <- sel } sel <- syms(sel) # Keep last of duplicated acts act <- act[!duplicated(names(act), fromLast = TRUE)] # Preserve order of sel both <- intersect(names(sel), names(act)) sel[both] <- act[both] # Adding new variables at end new <- setdiff(names(act), names(sel)) c(sel, act[new]) } dbplyr/R/data-lahman.R0000644000176200001440000000550213474056125014252 0ustar liggesusers#' Cache and retrieve an `src_sqlite` of the Lahman baseball database. #' #' This creates an interesting database using data from the Lahman baseball #' data source, provided by Sean Lahman at #' \url{http://www.seanlahman.com/baseball-archive/statistics/}, and #' made easily available in R through the \pkg{Lahman} package by #' Michael Friendly, Dennis Murphy and Martin Monkman. See the documentation #' for that package for documentation of the individual tables. #' #' @param ... Other arguments passed to `src` on first #' load. For MySQL and PostgreSQL, the defaults assume you have a local #' server with `lahman` database already created. #' For `lahman_srcs()`, character vector of names giving srcs to generate. #' @param quiet if `TRUE`, suppress messages about databases failing to #' connect. #' @param type src type. 
#' @keywords internal #' @examples #' # Connect to a local sqlite database, if already created #' \donttest{ #' if (has_lahman("sqlite")) { #' lahman_sqlite() #' batting <- tbl(lahman_sqlite(), "Batting") #' batting #' } #' #' # Connect to a local postgres database with lahman database, if available #' if (has_lahman("postgres")) { #' lahman_postgres() #' batting <- tbl(lahman_postgres(), "Batting") #' } #' } #' @name lahman NULL #' @export #' @rdname lahman lahman_sqlite <- function(path = NULL) { path <- db_location(path, "lahman.sqlite") copy_lahman(src_sqlite(path = path, create = TRUE)) } #' @export #' @rdname lahman lahman_postgres <- function(dbname = "lahman", host = "localhost", ...) { src <- src_postgres(dbname, host = host, ...) copy_lahman(src) } #' @export #' @rdname lahman lahman_mysql <- function(dbname = "lahman", ...) { src <- src_mysql(dbname, ...) copy_lahman(src) } #' @export #' @rdname lahman lahman_df <- function() { src_df("Lahman") } #' @rdname lahman #' @export copy_lahman <- function(src, ...) { # Create missing tables tables <- setdiff(lahman_tables(), src_tbls(src)) for (table in tables) { df <- getExportedValue("Lahman", table) message("Creating table: ", table) ids <- as.list(names(df)[grepl("ID$", names(df))]) copy_to(src, df, table, indexes = ids, temporary = FALSE) } src } # Get list of all non-label data frames in package lahman_tables <- function() { tables <- utils::data(package = "Lahman")$results[, 3] tables[!grepl("Labels", tables)] } #' @rdname lahman #' @export has_lahman <- function(type, ...) { if (!requireNamespace("Lahman", quietly = TRUE)) return(FALSE) if (missing(type)) return(TRUE) succeeds(lahman(type, ...), quiet = FALSE) } #' @rdname lahman #' @export lahman_srcs <- function(..., quiet = NULL) { load_srcs(lahman, c(...), quiet = quiet) } lahman <- function(type, ...) { if (missing(type)) { src_df("Lahman") } else { f <- match.fun(paste0("lahman_", type)) f(...) 
  }
}
dbplyr/R/ident.R0000644000176200001440000000235613416116304013202 0ustar liggesusers#' @include utils.R
NULL

#' Flag a character vector as SQL identifiers
#'
#' `ident()` takes unquoted strings and flags them as identifiers.
#' `ident_q()` assumes its input has already been quoted, and ensures
#' it does not get quoted again. This is currently used only for
#' `schema.table`.
#'
#' @param ... A character vector, or name-value pairs
#' @param x An object
#' @export
#' @examples
#' # SQL92 quotes strings with '
#' escape_ansi("x")
#'
#' # And identifiers with "
#' ident("x")
#' escape_ansi(ident("x"))
#'
#' # You can supply multiple inputs
#' ident(a = "x", b = "y")
#' ident_q(a = "x", b = "y")
ident <- function(...) {
  x <- c_character(...)
  structure(x, class = c("ident", "character"))
}

setOldClass(c("ident", "character"), ident())

#' @export
#' @rdname ident
ident_q <- function(...) {
  x <- c_character(...)
  structure(x, class = c("ident_q", "ident", "character"))
}

setOldClass(c("ident_q", "ident", "character"), ident_q())

#' @export
print.ident <- function(x, ...) cat(format(x, ...), sep = "\n")

#' @export
format.ident <- function(x, ...) {
  if (length(x) == 0) {
    paste0("<IDENT> [empty]")
  } else {
    paste0("<IDENT> ", x)
  }
}

#' @rdname ident
#' @export
is.ident <- function(x) inherits(x, "ident")
dbplyr/R/backend-teradata.R0000644000176200001440000000715713455376040015265 0ustar liggesusers#' @export
sql_select.Teradata <- function(con, select, from, where = NULL,
                                group_by = NULL, having = NULL,
                                order_by = NULL, limit = NULL,
                                distinct = FALSE, ...)
{
  out <- vector("list", 7)
  names(out) <- c("select", "from", "where", "group_by", "having", "order_by", "limit")

  assert_that(is.character(select), length(select) > 0L)
  out$select <- build_sql(
    "SELECT ",
    if (distinct) sql("DISTINCT "),
    # Teradata uses the TOP statement instead of LIMIT which is what SQL92 uses
    # TOP is expected after DISTINCT and not at the end of the query
    # e.g.: SELECT TOP 100 * FROM my_table
    if (!is.null(limit) && !identical(limit, Inf)) {
      assert_that(is.numeric(limit), length(limit) == 1L, limit > 0)
      build_sql(" TOP ", as.integer(limit), " ", con = con)
    },
    escape(select, collapse = ", ", con = con),
    con = con
  )

  out$from <- sql_clause_from(from, con)
  out$where <- sql_clause_where(where, con)
  out$group_by <- sql_clause_group_by(group_by, con)
  out$having <- sql_clause_having(having, con)
  out$order_by <- sql_clause_order_by(order_by, con)

  escape(unname(purrr::compact(out)), collapse = "\n", parens = FALSE, con = con)
}

#' @export
sql_translate_env.Teradata <- function(con) {
  sql_variant(
    sql_translator(.parent = base_odbc_scalar,
      `!=` = sql_infix("<>"),
      bitwNot = sql_prefix("BITNOT", 1),
      bitwAnd = sql_prefix("BITAND", 2),
      bitwOr = sql_prefix("BITOR", 2),
      bitwXor = sql_prefix("BITXOR", 2),
      bitwShiftL = sql_prefix("SHIFTLEFT", 2),
      bitwShiftR = sql_prefix("SHIFTRIGHT", 2),
      as.numeric = sql_cast("NUMERIC"),
      as.double = sql_cast("NUMERIC"),
      as.character = sql_cast("VARCHAR(MAX)"),
      log10 = sql_prefix("LOG"),
      log = sql_log(),
      cot = sql_cot(),
      quantile = sql_quantile("APPROX_PERCENTILE"),
      median = sql_median("APPROX_PERCENTILE"),
      nchar = sql_prefix("CHARACTER_LENGTH"),
      ceil = sql_prefix("CEILING"),
      ceiling = sql_prefix("CEILING"),
      atan2 = function(x, y) {
        sql_expr(ATAN2(!!y, !!x))
      },
      substr = function(x, start, stop) {
        len <- stop - start + 1
        sql_expr(SUBSTR(!!x, !!start, !!len))
      },
      paste = function(...) {
        stop(
          "`paste()` is not supported in this SQL variant, try `paste0()` instead",
          call.
= FALSE
        )
      }
    ),
    sql_translator(.parent = base_odbc_agg,
      cor = sql_not_supported("cor()"),
      cov = sql_not_supported("cov()"),
      var = sql_prefix("VAR_SAMP")
    ),
    sql_translator(.parent = base_odbc_win,
      cor = win_absent("cor"),
      cov = win_absent("cov"),
      var = win_recycled("VAR_SAMP")
    )
  )
}

#' @export
db_analyze.Teradata <- function(con, table, ...) {
  # Using COLLECT STATISTICS instead of ANALYZE as recommended in this article
  # https://www.tutorialspoint.com/teradata/teradata_statistics.htm
  sql <- build_sql(
    "COLLECT STATISTICS ",
    ident(table),
    con = con
  )
  DBI::dbExecute(con, sql)
}

utils::globalVariables(c("ATAN2", "SUBSTR"))
dbplyr/R/verb-arrange.R0000644000176200001440000000300113474056125014456 0ustar liggesusers#' Arrange rows by variables in a remote database table
#'
#' Order rows of database tables by an expression involving their variables.
#'
#' @section Missing values:
#' Compared to its sorting behaviour on local data, the [arrange()] method for
#' most database tables sorts NA at the beginning unless wrapped with [desc()].
#' Users can override this behaviour by explicitly sorting on `is.na(x)`.
#'
#' @inheritParams dplyr::arrange
#' @return An object of the same class as `.data`.
#' @examples
#' library(dplyr)
#'
#' dbplyr::memdb_frame(a = c(3, 4, 1, 2)) %>%
#'   arrange(a)
#'
#' # NA sorted first
#' dbplyr::memdb_frame(a = c(3, 4, NA, 2)) %>%
#'   arrange(a)
#'
#' # override by sorting on is.na() first
#' dbplyr::memdb_frame(a = c(3, 4, NA, 2)) %>%
#'   arrange(is.na(a), a)
#'
#' @export
arrange.tbl_lazy <- function(.data, ..., .by_group = FALSE) {
  dots <- quos(...)
  dots <- partial_eval_dots(dots, vars = op_vars(.data))
  names(dots) <- NULL

  add_op_single(
    "arrange",
    .data,
    dots = dots,
    args = list(.by_group = .by_group)
  )
}

#' @export
op_sort.op_arrange <- function(op) {
  c(op_sort(op$x), op$dots)
}

#' @export
op_desc.op_arrange <- function(x, ...) {
  op_desc(x$x, ...)
}

#' @export
sql_build.op_arrange <- function(op, con, ...)
{
  order_vars <- translate_sql_(op$dots, con, context = list(clause = "ORDER"))
  if (op$args$.by_group) {
    order_vars <- c.sql(ident(op_grps(op$x)), order_vars, con = con)
  }

  select_query(
    sql_build(op$x, con),
    order_by = order_vars
  )
}
dbplyr/R/sql.R0000644000176200001440000000311313443121714012677 0ustar liggesusers#' SQL escaping.
#'
#' These functions are critical when writing functions that translate R
#' functions to sql functions. Typically a conversion function should escape
#' all its inputs and return an sql object.
#'
#' @param ... Character vectors that will be combined into a single SQL
#'   expression.
#' @export
sql <- function(...) {
  x <- c_character(...)
  structure(x, class = c("sql", "character"))
}

# See setOldClass definition in zzz.R

# c() is also called outside of the dbplyr context so must supply default
# connection - this seems like a design mistake, and probably an indication
# that within dbplyr c() should be replaced with a more specific function
#' @export
c.sql <- function(..., drop_null = FALSE, con = simulate_dbi()) {
  input <- list(...)
  if (drop_null) input <- purrr::compact(input)

  out <- unlist(lapply(input, escape, collapse = NULL, con = con))
  sql(out)
}

#' @export
c.ident <- c.sql

#' @export
unique.sql <- function(x, ...) {
  sql(NextMethod())
}

#' @rdname sql
#' @export
is.sql <- function(x) inherits(x, "sql")

#' @export
print.sql <- function(x, ...) cat(format(x, ...), sep = "\n")

#' @export
format.sql <- function(x, ...)
{
  if (length(x) == 0) {
    paste0("<SQL> [empty]")
  } else {
    if (!is.null(names(x))) {
      paste0("<SQL> ", x, " AS ", names(x))
    } else {
      paste0("<SQL> ", x)
    }
  }
}

#' @rdname sql
#' @export
#' @param x Object to coerce
as.sql <- function(x) UseMethod("as.sql")

#' @export
as.sql.ident <- function(x) x

#' @export
as.sql.sql <- function(x) x

#' @export
as.sql.character <- function(x) ident(x)
dbplyr/R/sql-build.R0000644000176200001440000000603113426613117013772 0ustar liggesusers#' Build and render SQL from a sequence of lazy operations
#'
#' `sql_build()` creates a `select_query` S3 object, that is rendered
#' to a SQL string by `sql_render()`. The output from `sql_build()` is
#' designed to be easy to test, as it's database agnostic, and has
#' a hierarchical structure. Outside of testing, however, you should
#' always call `sql_render()`.
#'
#' `sql_build()` is generic over the lazy operations, \link{lazy_ops},
#' and generates an S3 object that represents the query. `sql_render()`
#' takes a query object and then calls a function that is generic
#' over the database. For example, `sql_build.op_mutate()` generates
#' a `select_query`, and `sql_render.select_query()` calls
#' `sql_select()`, which has different methods for different databases.
#' The default methods should generate ANSI 92 SQL where possible, so
#' backends only need to override the methods if the backend is not ANSI
#' compliant.
#'
#' @export
#' @keywords internal
#' @param op A sequence of lazy operations
#' @param con A database connection. The default `NULL` uses a set of
#'   rules that should be very similar to ANSI 92, and allows for testing
#'   without an active database connection.
#' @param ... Other arguments passed on to the methods. Not currently used.
sql_build <- function(op, con = NULL, ...) {
  UseMethod("sql_build")
}

#' @export
sql_build.tbl_lazy <- function(op, con = op$src$con, ...) {
  # only used for testing
  qry <- sql_build(op$ops, con = con, ...)
  sql_optimise(qry, con = con, ...)
} #' @export sql_build.ident <- function(op, con = NULL, ...) { op } # Render ------------------------------------------------------------------ #' @export #' @rdname sql_build #' @param bare_identifier_ok Is it ok to return a bare table identifier. #' Set to `TRUE` when generating queries to be nested within other #' queries where a bare table name is ok. sql_render <- function(query, con = NULL, ..., bare_identifier_ok = FALSE) { UseMethod("sql_render") } #' @export sql_render.tbl_lazy <- function(query, con = query$src$con, ..., bare_identifier_ok = FALSE) { sql_render(query$ops, con = con, ..., bare_identifier_ok = bare_identifier_ok) } #' @export sql_render.op <- function(query, con = NULL, ..., bare_identifier_ok = FALSE) { qry <- sql_build(query, con = con, ...) qry <- sql_optimise(qry, con = con, ...) sql_render(qry, con = con, ..., bare_identifier_ok = bare_identifier_ok) } #' @export sql_render.sql <- function(query, con = NULL, ..., bare_identifier_ok = FALSE) { query } #' @export sql_render.ident <- function(query, con = NULL, ..., bare_identifier_ok = FALSE) { if (bare_identifier_ok) { query } else { sql_select(con, sql("*"), query) } } # Optimise ---------------------------------------------------------------- #' @export #' @rdname sql_build sql_optimise <- function(x, con = NULL, ...) { UseMethod("sql_optimise") } #' @export sql_optimise.sql <- function(x, con = NULL, ...) { # Can't optimise raw SQL x } #' @export sql_optimise.ident <- function(x, con = NULL, ...) { x } dbplyr/R/verb-compute.R0000644000176200001440000001000613443245100014473 0ustar liggesusers#' Force computation of query #' #' `collapse()` creates a subquery; `compute()` stores the results in a #' remote table; `collect()` downloads the results into the current #' R session. #' #' @export #' @param x A `tbl_sql` collapse.tbl_sql <- function(x, ...) { sql <- db_sql_render(x$src$con, x) tbl(x$src, sql) %>% group_by(!!! 
syms(op_grps(x))) %>%
    add_op_order(op_sort(x))
}

# compute -----------------------------------------------------------------

#' @rdname collapse.tbl_sql
#' @param name Table name in remote database.
#' @param temporary Should the table be temporary (`TRUE`, the default) or
#'   persistent (`FALSE`)?
#' @inheritParams copy_to.src_sql
#' @export
compute.tbl_sql <- function(x, name = unique_table_name(), temporary = TRUE,
                            unique_indexes = list(), indexes = list(),
                            analyze = TRUE, ...) {
  vars <- op_vars(x)
  assert_that(all(unlist(indexes) %in% vars))
  assert_that(all(unlist(unique_indexes) %in% vars))

  x_aliased <- select(x, !!! syms(vars)) # avoids problems with SQLite quoting (#1754)
  sql <- db_sql_render(x$src$con, x_aliased$ops)

  name <- db_compute(x$src$con, name, sql,
    temporary = temporary,
    unique_indexes = unique_indexes,
    indexes = indexes,
    analyze = analyze,
    ...
  )

  tbl(x$src, name) %>%
    group_by(!!! syms(op_grps(x))) %>%
    add_op_order(op_sort(x))
}

#' @export
#' @rdname db_copy_to
db_compute <- function(con, table, sql, temporary = TRUE,
                       unique_indexes = list(), indexes = list(),
                       analyze = TRUE, ...) {
  UseMethod("db_compute")
}

#' @export
db_compute.DBIConnection <- function(con, table, sql, temporary = TRUE,
                                     unique_indexes = list(), indexes = list(),
                                     analyze = TRUE, ...) {
  if (!is.list(indexes)) {
    indexes <- as.list(indexes)
  }
  if (!is.list(unique_indexes)) {
    unique_indexes <- as.list(unique_indexes)
  }

  table <- db_save_query(con, sql, table, temporary = temporary)
  db_create_indexes(con, table, unique_indexes, unique = TRUE)
  db_create_indexes(con, table, indexes, unique = FALSE)
  if (analyze) db_analyze(con, table)

  table
}

# collect -----------------------------------------------------------------

#' @rdname collapse.tbl_sql
#' @param n Number of rows to fetch. Defaults to `Inf`, meaning all rows.
#' @param warn_incomplete Warn if `n` is less than the number of result rows?
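#' @examples
#' # A minimal sketch, assuming an in-memory SQLite table created with
#' # memdb_frame() (requires the RSQLite package):
#' library(dplyr)
#' mf <- dbplyr::memdb_frame(x = 1:5, y = 5:1)
#'
#' # collapse() wraps the pipeline so far in a subquery
#' mf %>% filter(x > 2) %>% collapse()
#'
#' # collect() downloads the results; use `n` to cap the rows fetched
#' mf %>% filter(x > 2) %>% collect()
#' mf %>% collect(n = 3)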
#' @export collect.tbl_sql <- function(x, ..., n = Inf, warn_incomplete = TRUE) { assert_that(length(n) == 1, n > 0L) if (n == Inf) { n <- -1 } else { # Gives the query planner information that it might be able to take # advantage of x <- head(x, n) } sql <- db_sql_render(x$src$con, x) out <- db_collect(x$src$con, sql, n = n, warn_incomplete = warn_incomplete) grouped_df(out, intersect(op_grps(x), names(out))) } #' @export #' @rdname db_copy_to db_collect <- function(con, sql, n = -1, warn_incomplete = TRUE, ...) { UseMethod("db_collect") } #' @export db_collect.DBIConnection <- function(con, sql, n = -1, warn_incomplete = TRUE, ...) { res <- dbSendQuery(con, sql) tryCatch({ out <- dbFetch(res, n = n) if (warn_incomplete) { res_warn_incomplete(res, "n = Inf") } }, finally = { dbClearResult(res) }) out } # sql_render -------------------------------------------------------------- # Used by implyr #' @rdname db_copy_to #' @export db_sql_render <- function(con, sql, ...) { UseMethod("db_sql_render") } #' @export db_sql_render.DBIConnection <- function(con, sql, ...) { sql_render(sql, con = con, ...) } dbplyr/R/backend-sqlite.R0000644000176200001440000000366613426147016014777 0ustar liggesusers#' @export db_desc.SQLiteConnection <- function(x) { paste0("sqlite ", sqlite_version(), " [", x@dbname, "]") } #' @export db_explain.SQLiteConnection <- function(con, sql, ...) 
{
  exsql <- build_sql("EXPLAIN QUERY PLAN ", sql, con = con)
  expl <- dbGetQuery(con, exsql)
  out <- utils::capture.output(print(expl))

  paste(out, collapse = "\n")
}

sqlite_version <- function() {
  numeric_version(RSQLite::rsqliteVersion()[[2]])
}

# SQL methods -------------------------------------------------------------

#' @export
sql_translate_env.SQLiteConnection <- function(con) {
  sql_variant(
    sql_translator(.parent = base_scalar,
      as.numeric = sql_cast("REAL"),
      as.double = sql_cast("REAL"),
      log = function(x, base = exp(1)) {
        if (base != exp(1)) {
          sql_expr(log(!!x) / log(!!base))
        } else {
          sql_expr(log(!!x))
        }
      },
      paste = sql_paste_infix(" ", "||", function(x) sql_expr(cast(!!x %as% text))),
      paste0 = sql_paste_infix("", "||", function(x) sql_expr(cast(!!x %as% text))),
      # https://www.sqlite.org/lang_corefunc.html#maxoreunc
      pmin = sql_prefix("MIN"),
      pmax = sql_prefix("MAX")
    ),
    sql_translator(.parent = base_agg,
      sd = sql_aggregate("STDEV", "sd")
    ),
    if (sqlite_version() >= "3.25") {
      sql_translator(.parent = base_win,
        sd = win_aggregate("STDEV")
      )
    } else {
      base_no_win
    }
  )
}

#' @export
sql_escape_ident.SQLiteConnection <- function(con, x) {
  sql_quote(x, "`")
}

#' @export
sql_escape_logical.SQLiteConnection <- function(con, x) {
  y <- as.character(as.integer(x))
  y[is.na(x)] <- "NULL"
  y
}

#' @export
sql_subquery.SQLiteConnection <- function(con, from, name = unique_name(), ...) {
  if (is.ident(from)) {
    setNames(from, name)
  } else {
    if (is.null(name)) {
      build_sql("(", from, ")", con = con)
    } else {
      build_sql("(", from, ") AS ", ident(name), con = con)
    }
  }
}
dbplyr/R/simulate.R0000644000176200001440000000343013416415140013714 0ustar liggesusers#' Simulate database connections
#'
#' These functions generate S3 objects that have been designed to simulate
#' the action of a database connection, without actually having the database
#' available.
#' Obviously, this simulation can only be incomplete, but most
#' importantly it allows us to simulate SQL generation for any database without
#' actually connecting to it.
#'
#' Simulated SQL always quotes identifiers with `` `x` ``, and strings with
#' `'x'`.
#'
#' @keywords internal
#' @export
simulate_dbi <- function(class = character()) {
  structure(
    list(),
    class = c(class, "TestConnection", "DBIConnection")
  )
}

# Needed to work around fundamental hackiness of how I'm mingling
# S3 and S4 dispatch
sql_escape_ident.TestConnection <- function(con, x) {
  sql_quote(x, "`")
}

sql_escape_string.TestConnection <- function(con, x) {
  sql_quote(x, "'")
}

#' @export
#' @rdname simulate_dbi
simulate_access <- function() simulate_dbi("ACCESS")

#' @export
#' @rdname simulate_dbi
simulate_hive <- function() simulate_dbi("Hive")

#' @export
#' @rdname simulate_dbi
simulate_mysql <- function() simulate_dbi("MySQLConnection")

#' @export
#' @rdname simulate_dbi
simulate_impala <- function() simulate_dbi("Impala")

#' @export
#' @rdname simulate_dbi
simulate_mssql <- function() simulate_dbi("Microsoft SQL Server")

#' @export
#' @rdname simulate_dbi
simulate_odbc <- function() simulate_dbi("OdbcConnection")

#' @export
#' @rdname simulate_dbi
simulate_oracle <- function() simulate_dbi("Oracle")

#' @export
#' @rdname simulate_dbi
simulate_postgres <- function() simulate_dbi("PostgreSQLConnection")

#' @export
#' @rdname simulate_dbi
simulate_sqlite <- function() simulate_dbi("SQLiteConnection")

#' @export
#' @rdname simulate_dbi
simulate_teradata <- function() simulate_dbi("Teradata")
dbplyr/R/verb-group_by.R0000644000176200001440000000237613442404754014663 0ustar liggesusers# group_by ----------------------------------------------------------------

#' @export
group_by.tbl_lazy <- function(.data, ..., add = FALSE, .drop = TRUE) {
  dots <- quos(...)
  dots <- partial_eval_dots(dots, vars = op_vars(.data))

  if (!identical(.drop, TRUE)) {
    stop("`.drop` is not supported with database backends", call. = FALSE)
  }

  if (length(dots) == 0) {
    return(.data)
  }

  groups <- group_by_prepare(.data, .dots = dots, add = add)
  names <- purrr::map_chr(groups$groups, as_string)

  add_op_single("group_by",
    groups$data,
    dots = set_names(groups$groups, names),
    args = list(add = FALSE)
  )
}

#' @export
op_desc.op_group_by <- function(x, ...) {
  op_desc(x$x, ...)
}

#' @export
op_grps.op_group_by <- function(op) {
  if (isTRUE(op$args$add)) {
    union(op_grps(op$x), names(op$dots))
  } else {
    names(op$dots)
  }
}

#' @export
sql_build.op_group_by <- function(op, con, ...) {
  sql_build(op$x, con, ...)
}

# ungroup -----------------------------------------------------------------

#' @export
ungroup.tbl_lazy <- function(x, ...) {
  add_op_single("ungroup", x)
}

#' @export
op_grps.op_ungroup <- function(op) {
  character()
}

#' @export
sql_build.op_ungroup <- function(op, con, ...) {
  sql_build(op$x, con, ...)
}
dbplyr/R/schema.R0000644000176200001440000000127613415745770013345 0ustar liggesusers#' Refer to a table in a schema
#'
#' @param schema,table Names of schema and table.
#' @export
#' @examples
#' in_schema("my_schema", "my_table")
#'
#' # Example using schemas with SQLite
#' con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
#' src <- src_dbi(con, auto_disconnect = TRUE)
#'
#' # Add auxiliary schema
#' tmp <- tempfile()
#' DBI::dbExecute(con, paste0("ATTACH '", tmp, "' AS aux"))
#'
#' library(dplyr, warn.conflicts = FALSE)
#' copy_to(con, iris, "df", temporary = FALSE)
#' copy_to(con, mtcars, in_schema("aux", "df"), temporary = FALSE)
#'
#' con %>% tbl("df")
#' con %>% tbl(in_schema("aux", "df"))
in_schema <- function(schema, table) {
  ident_q(paste0(schema, ".", table))
}
dbplyr/R/build-sql.R0000644000176200001440000000357113474056125013775 0ustar liggesusers#' Build a SQL string.
#' #' This is a convenience function that should prevent sql injection attacks #' (which in the context of dplyr are most likely to be accidental not #' deliberate) by automatically escaping all expressions in the input, while #' treating bare strings as sql. This is unlikely to prevent any serious #' attack, but should make it unlikely that you produce invalid sql. #' #' This function should be used only when generating `SELECT` clauses, #' other high level queries, or for other syntax that has no R equivalent. #' For individual function translations, prefer [sql_expr()]. #' #' @param ... input to convert to SQL. Use [sql()] to preserve #' user input as is (dangerous), and [ident()] to label user #' input as sql identifiers (safe) #' @param .env the environment in which to evaluate the arguments. Should not #' be needed in typical use. #' @param con database connection; used to select correct quoting characters. #' @keywords internal #' @export #' @examples #' con <- simulate_dbi() #' build_sql("SELECT * FROM TABLE", con = con) #' x <- "TABLE" #' build_sql("SELECT * FROM ", x, con = con) #' build_sql("SELECT * FROM ", ident(x), con = con) #' build_sql("SELECT * FROM ", sql(x), con = con) #' #' # http://xkcd.com/327/ #' name <- "Robert'); DROP TABLE Students;--" #' build_sql("INSERT INTO Students (Name) VALUES (", name, ")", con = con) build_sql <- function(..., .env = parent.frame(), con = sql_current_con()) { if (is.null(con)) { stop("`con` must not be NULL", call. 
= FALSE) } escape_expr <- function(x, con) { # If it's a string, leave it as is if (is.character(x)) return(x) val <- eval_bare(x, .env) # Skip nulls, so you can use if statements like in paste if (is.null(val)) return("") escape(val, con = con) } pieces <- purrr::map_chr(enexprs(...), escape_expr, con = con) sql(paste0(pieces, collapse = "")) } dbplyr/R/backend-.R0000644000176200001440000004526713476016143013561 0ustar liggesusers#' @include translate-sql-conditional.R #' @include translate-sql-window.R #' @include translate-sql-helpers.R #' @include translate-sql-paste.R #' @include translate-sql-string.R #' @include translate-sql-quantile.R #' @include escape.R #' @include sql.R #' @include utils.R NULL #' @export sql_translate_env.DBIConnection <- function(con) { sql_variant( base_scalar, base_agg, base_win ) } #' @export sql_subquery.DBIConnection <- function(con, from, name = unique_name(), ...) { if (is.ident(from)) { setNames(from, name) } else { build_sql("(", from, ") ", ident(name %||% unique_table_name()), con = con) } } #' @export #' @rdname sql_variant #' @format NULL base_scalar <- sql_translator( `+` = sql_infix("+"), `*` = sql_infix("*"), `/` = sql_infix("/"), `%%` = sql_infix("%"), `^` = sql_prefix("POWER", 2), `-` = function(x, y = NULL) { if (is.null(y)) { if (is.numeric(x)) { -x } else { sql_expr(-!!x) } } else { sql_expr(!!x - !!y) } }, `$` = sql_infix(".", pad = FALSE), `[[` = function(x, i) { i <- enexpr(i) if (!is.character(i)) { stop("Can only index with strings", call. 
= FALSE) } build_sql(x, ".", ident(i)) }, `[` = function(x, i) { build_sql("CASE WHEN (", i, ") THEN (", x, ") END") }, `!=` = sql_infix("!="), `==` = sql_infix("="), `<` = sql_infix("<"), `<=` = sql_infix("<="), `>` = sql_infix(">"), `>=` = sql_infix(">="), `%in%` = function(x, table) { if (is.sql(table) || length(table) > 1) { sql_expr(!!x %in% !!table) } else if (length(table) == 0) { sql_expr(FALSE) } else { sql_expr(!!x %in% ((!!table))) } }, `!` = sql_prefix("NOT"), `&` = sql_infix("AND"), `&&` = sql_infix("AND"), `|` = sql_infix("OR"), `||` = sql_infix("OR"), xor = function(x, y) { sql_expr(!!x %OR% !!y %AND NOT% (!!x %AND% !!y)) }, # bitwise operators # SQL Syntax reference links: # Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ArithmeticOperators # Impala: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_bit_functions.html # PostgreSQL: https://www.postgresql.org/docs/7.4/functions-math.html # MS SQL: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/bitwise-operators-transact-sql?view=sql-server-2017 # MySQL https://dev.mysql.com/doc/refman/5.7/en/bit-functions.html # Oracle: https://docs.oracle.com/cd/E19253-01/817-6223/chp-typeopexpr-7/index.html # SQLite: https://www.tutorialspoint.com/sqlite/sqlite_bitwise_operators.htm # Teradata: https://docs.teradata.com/reader/1DcoER_KpnGTfgPinRAFUw/h3CS4MuKL1LCMQmnubeSRQ bitwNot = function(x) sql_expr(~ ((!!x))), bitwAnd = sql_infix("&"), bitwOr = sql_infix("|"), bitwXor = sql_infix("^"), bitwShiftL = sql_infix("<<"), bitwShiftR = sql_infix(">>"), abs = sql_prefix("ABS", 1), acos = sql_prefix("ACOS", 1), asin = sql_prefix("ASIN", 1), atan = sql_prefix("ATAN", 1), atan2 = sql_prefix("ATAN2", 2), ceil = sql_prefix("CEIL", 1), ceiling = sql_prefix("CEIL", 1), cos = sql_prefix("COS", 1), cot = sql_prefix("COT", 1), exp = sql_prefix("EXP", 1), floor = sql_prefix("FLOOR", 1), log = function(x, base = exp(1)) { if (isTRUE(all.equal(base, 
exp(1)))) { sql_expr(ln(!!x)) } else { sql_expr(log(!!base, !!x)) } }, log10 = sql_prefix("LOG10", 1), round = sql_prefix("ROUND", 2), sign = sql_prefix("SIGN", 1), sin = sql_prefix("SIN", 1), sqrt = sql_prefix("SQRT", 1), tan = sql_prefix("TAN", 1), # cosh, sinh, coth and tanh calculations are based on this article # https://en.wikipedia.org/wiki/Hyperbolic_function cosh = function(x) sql_expr((!!sql_exp(1, x) + !!sql_exp(-1, x)) / 2L), sinh = function(x) sql_expr((!!sql_exp(1, x) - !!sql_exp(-1, x)) / 2L), tanh = function(x) sql_expr((!!sql_exp(2, x) - 1L) / (!!sql_exp(2, x) + 1L)), coth = function(x) sql_expr((!!sql_exp(2, x) + 1L) / (!!sql_exp(2, x) - 1L)), round = function(x, digits = 0L) { sql_expr(ROUND(!!x, !!as.integer(digits))) }, `if` = sql_if, if_else = function(condition, true, false) sql_if(condition, true, false), ifelse = function(test, yes, no) sql_if(test, yes, no), switch = function(x, ...) sql_switch(x, ...), case_when = function(...) sql_case_when(...), sql = function(...) sql(...), `(` = function(x) { sql_expr(((!!x))) }, `{` = function(x) { sql_expr(((!!x))) }, desc = function(x) { build_sql(x, sql(" DESC")) }, is.null = sql_is_null, is.na = sql_is_null, na_if = sql_prefix("NULLIF", 2), coalesce = sql_prefix("COALESCE"), as.numeric = sql_cast("NUMERIC"), as.double = sql_cast("NUMERIC"), as.integer = sql_cast("INTEGER"), as.character = sql_cast("TEXT"), as.logical = sql_cast("BOOLEAN"), as.Date = sql_cast("DATE"), as.POSIXct = sql_cast("TIMESTAMP"), # MS SQL - https://docs.microsoft.com/en-us/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql # Hive - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-IntegralTypes(TINYINT,SMALLINT,INT/INTEGER,BIGINT) # Postgres - https://www.postgresql.org/docs/8.4/static/datatype-numeric.html # Impala - https://impala.apache.org/docs/build/html/topics/impala_bigint.html as.integer64 = sql_cast("BIGINT"), c = function(...) 
c(...), `:` = function(from, to) from:to, between = function(x, left, right) { sql_expr(!!x %BETWEEN% !!left %AND% !!right) }, pmin = sql_prefix("LEAST"), pmax = sql_prefix("GREATEST"), `%>%` = `%>%`, # lubridate --------------------------------------------------------------- # https://en.wikibooks.org/wiki/SQL_Dialects_Reference/Functions_and_expressions/Date_and_time_functions as_date = sql_cast("DATE"), as_datetime = sql_cast("TIMESTAMP"), today = function() sql_expr(CURRENT_DATE), now = function() sql_expr(CURRENT_TIMESTAMP), # https://modern-sql.com/feature/extract year = function(x) sql_expr(EXTRACT(year %from% !!x)), month = function(x) sql_expr(EXTRACT(month %from% !!x)), day = function(x) sql_expr(EXTRACT(day %from% !!x)), mday = function(x) sql_expr(EXTRACT(day %from% !!x)), yday = sql_not_supported("yday()"), qday = sql_not_supported("qday()"), wday = sql_not_supported("wday()"), hour = function(x) sql_expr(EXTRACT(hour %from% !!x)), minute = function(x) sql_expr(EXTRACT(minute %from% !!x)), second = function(x) sql_expr(EXTRACT(second %from% !!x)), # String functions ------------------------------------------------------ # SQL Syntax reference links: # MySQL https://dev.mysql.com/doc/refman/5.7/en/string-functions.html # Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions # Impala: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_string_functions.html # PostgreSQL: https://www.postgresql.org/docs/9.1/static/functions-string.html # MS SQL: https://docs.microsoft.com/en-us/sql/t-sql/functions/string-functions-transact-sql # Oracle: https://docs.oracle.com/database/121/SQLRF/functions002.htm#SQLRF51180 # base R nchar = sql_prefix("LENGTH", 1), tolower = sql_prefix("LOWER", 1), toupper = sql_prefix("UPPER", 1), trimws = function(x, which = "both") sql_str_trim(x, side = which), paste = sql_paste(" "), paste0 = sql_paste(""), substr = sql_substr("SUBSTR"), # stringr functions 
  str_length = sql_prefix("LENGTH", 1),
  str_to_lower = sql_prefix("LOWER", 1),
  str_to_upper = sql_prefix("UPPER", 1),
  str_to_title = sql_prefix("INITCAP", 1),
  str_trim = sql_str_trim,
  str_c = sql_paste(""),
  str_sub = sql_str_sub("SUBSTR"),
  str_conv = sql_not_supported("str_conv()"),
  str_count = sql_not_supported("str_count()"),
  str_detect = sql_not_supported("str_detect()"),
  str_dup = sql_not_supported("str_dup()"),
  str_extract = sql_not_supported("str_extract()"),
  str_extract_all = sql_not_supported("str_extract_all()"),
  str_flatten = sql_not_supported("str_flatten()"),
  str_glue = sql_not_supported("str_glue()"),
  str_glue_data = sql_not_supported("str_glue_data()"),
  str_interp = sql_not_supported("str_interp()"),
  str_locate = sql_not_supported("str_locate()"),
  str_locate_all = sql_not_supported("str_locate_all()"),
  str_match = sql_not_supported("str_match()"),
  str_match_all = sql_not_supported("str_match_all()"),
  str_order = sql_not_supported("str_order()"),
  str_pad = sql_not_supported("str_pad()"),
  str_remove = sql_not_supported("str_remove()"),
  str_remove_all = sql_not_supported("str_remove_all()"),
  str_replace = sql_not_supported("str_replace()"),
  str_replace_all = sql_not_supported("str_replace_all()"),
  str_replace_na = sql_not_supported("str_replace_na()"),
  str_sort = sql_not_supported("str_sort()"),
  str_split = sql_not_supported("str_split()"),
  str_split_fixed = sql_not_supported("str_split_fixed()"),
  str_squish = sql_not_supported("str_squish()"),
  str_subset = sql_not_supported("str_subset()"),
  str_trunc = sql_not_supported("str_trunc()"),
  str_view = sql_not_supported("str_view()"),
  str_view_all = sql_not_supported("str_view_all()"),
  str_which = sql_not_supported("str_which()"),
  str_wrap = sql_not_supported("str_wrap()")
)

base_symbols <- sql_translator(
  pi = sql("PI()"),
  `*` = sql("*"),
  `NULL` = sql("NULL")
)

sql_exp <- function(a, x) {
  a <- as.integer(a)

  if (identical(a, 1L)) {
    sql_expr(EXP(!!x))
  } else if (identical(a,
-1L)) {
    sql_expr(EXP(-((!!x))))
  } else {
    sql_expr(EXP(!!a * ((!!x))))
  }
}

#' @export
#' @rdname sql_variant
#' @format NULL
base_agg <- sql_translator(
  # SQL-92 aggregates
  # http://db.apache.org/derby/docs/10.7/ref/rrefsqlj33923.html
  n = function() sql("COUNT(*)"),
  mean = sql_aggregate("AVG", "mean"),
  var = sql_aggregate("VARIANCE", "var"),
  sum = sql_aggregate("SUM"),
  min = sql_aggregate("MIN"),
  max = sql_aggregate("MAX"),

  # Ordered set functions
  quantile = sql_quantile("PERCENTILE_CONT", "ordered"),
  median = sql_median("PERCENTILE_CONT", "ordered"),

  # first = sql_prefix("FIRST_VALUE", 1),
  # last = sql_prefix("LAST_VALUE", 1),
  # nth = sql_prefix("NTH_VALUE", 2),

  n_distinct = function(x) {
    build_sql("COUNT(DISTINCT ", x, ")")
  }
)

#' @export
#' @rdname sql_variant
#' @format NULL
base_win <- sql_translator(
  # rank functions have a single order argument that overrides the default
  row_number = win_rank("ROW_NUMBER"),
  min_rank = win_rank("RANK"),
  rank = win_rank("RANK"),
  dense_rank = win_rank("DENSE_RANK"),
  percent_rank = win_rank("PERCENT_RANK"),
  cume_dist = win_rank("CUME_DIST"),

  ntile = function(order_by, n) {
    win_over(
      sql_expr(NTILE(!!as.integer(n))),
      win_current_group(),
      order_by %||% win_current_order()
    )
  },

  # Variants that take more arguments
  first = function(x, order_by = NULL) {
    win_over(
      sql_expr(FIRST_VALUE(!!x)),
      win_current_group(),
      order_by %||% win_current_order()
    )
  },
  last = function(x, order_by = NULL) {
    win_over(
      sql_expr(LAST_VALUE(!!x)),
      win_current_group(),
      order_by %||% win_current_order()
    )
  },
  nth = function(x, n, order_by = NULL) {
    win_over(
      sql_expr(NTH_VALUE(!!x, !!as.integer(n))),
      win_current_group(),
      order_by %||% win_current_order()
    )
  },

  lead = function(x, n = 1L, default = NA, order_by = NULL) {
    win_over(
      sql_expr(LEAD(!!x, !!n, !!default)),
      win_current_group(),
      order_by %||% win_current_order()
    )
  },
  lag = function(x, n = 1L, default = NA, order_by = NULL) {
    win_over(
      sql_expr(LAG(!!x, !!as.integer(n), !!default)),
      win_current_group(),
order_by %||% win_current_order() ) }, # Recycled aggregate functions take a single argument, don't need order and # include entire partition in frame. mean = win_aggregate("AVG"), var = win_aggregate("VARIANCE"), sum = win_aggregate("SUM"), min = win_aggregate("MIN"), max = win_aggregate("MAX"), # Ordered set functions quantile = sql_quantile("PERCENTILE_CONT", "ordered", window = TRUE), median = sql_median("PERCENTILE_CONT", "ordered", window = TRUE), # Counts n = function() { win_over(sql("COUNT(*)"), win_current_group()) }, n_distinct = function(x) { win_over(build_sql("COUNT(DISTINCT ", x, ")"), win_current_group()) }, # Cumulative functions are like recycled aggregates except that R names # have cum prefix, order_by is inherited and frame goes from -Inf to 0. cummean = win_cumulative("AVG"), cumsum = win_cumulative("SUM"), cummin = win_cumulative("MIN"), cummax = win_cumulative("MAX"), # Manually override other parameters -------------------------------------- order_by = function(order_by, expr) { old <- set_win_current_order(order_by) on.exit(set_win_current_order(old)) expr } ) #' @export #' @rdname sql_variant #' @format NULL base_no_win <- sql_translator( row_number = win_absent("ROW_NUMBER"), min_rank = win_absent("RANK"), rank = win_absent("RANK"), dense_rank = win_absent("DENSE_RANK"), percent_rank = win_absent("PERCENT_RANK"), cume_dist = win_absent("CUME_DIST"), ntile = win_absent("NTILE"), mean = win_absent("AVG"), sd = win_absent("SD"), var = win_absent("VAR"), cov = win_absent("COV"), cor = win_absent("COR"), sum = win_absent("SUM"), min = win_absent("MIN"), max = win_absent("MAX"), median = win_absent("PERCENTILE_CONT"), quantile = win_absent("PERCENTILE_CONT"), n = win_absent("N"), n_distinct = win_absent("N_DISTINCT"), cummean = win_absent("MEAN"), cumsum = win_absent("SUM"), cummin = win_absent("MIN"), cummax = win_absent("MAX"), nth = win_absent("NTH_VALUE"), first = win_absent("FIRST_VALUE"), last = win_absent("LAST_VALUE"), lead = 
win_absent("LEAD"), lag = win_absent("LAG"), order_by = win_absent("ORDER_BY"), str_flatten = win_absent("STR_FLATTEN"), count = win_absent("COUNT") ) # db_ methods ------------------------------------------------------------- #' @export db_desc.DBIConnection <- function(x) { class(x)[[1]] } #' @export db_list_tables.DBIConnection <- function(con) dbListTables(con) #' @export db_has_table.DBIConnection <- function(con, table) dbExistsTable(con, table) #' @export db_data_type.DBIConnection <- function(con, fields) { vapply(fields, dbDataType, dbObj = con, FUN.VALUE = character(1)) } #' @export db_save_query.DBIConnection <- function(con, sql, name, temporary = TRUE, ...) { tt_sql <- build_sql( "CREATE ", if (temporary) sql("TEMPORARY "), "TABLE ", as.sql(name), " AS ", sql, con = con ) dbExecute(con, tt_sql) name } #' @export db_begin.DBIConnection <- function(con, ...) { dbBegin(con) } #' @export db_commit.DBIConnection <- function(con, ...) dbCommit(con) #' @export db_rollback.DBIConnection <- function(con, ...) dbRollback(con) #' @export db_write_table.DBIConnection <- function(con, table, types, values, temporary = TRUE, ...) { dbWriteTable( con, name = dbi_quote(as.sql(table), con), value = values, field.types = types, temporary = temporary, row.names = FALSE ) table } #' @export db_create_table.DBIConnection <- function(con, table, types, temporary = TRUE, ...) { assert_that(is_string(table), is.character(types)) field_names <- escape(ident(names(types)), collapse = NULL, con = con) fields <- sql_vector( paste0(field_names, " ", types), parens = TRUE, collapse = ", ", con = con ) sql <- build_sql( "CREATE ", if (temporary) sql("TEMPORARY "), "TABLE ", as.sql(table), " ", fields, con = con ) dbExecute(con, sql) } #' @export db_insert_into.DBIConnection <- function(con, table, values, ...) { dbWriteTable(con, table, values, append = TRUE, row.names = FALSE) } #' @export db_create_indexes.DBIConnection <- function(con, table, indexes = NULL, unique = FALSE, ...) 
{ if (is.null(indexes)) return() assert_that(is.list(indexes)) for (index in indexes) { db_create_index(con, table, index, unique = unique, ...) } } #' @export db_create_index.DBIConnection <- function(con, table, columns, name = NULL, unique = FALSE, ...) { assert_that(is_string(table), is.character(columns)) name <- name %||% paste0(c(table, columns), collapse = "_") fields <- escape(ident(columns), parens = TRUE, con = con) sql <- build_sql( "CREATE ", if (unique) sql("UNIQUE "), "INDEX ", as.sql(name), " ON ", as.sql(table), " ", fields, con = con) dbExecute(con, sql) } #' @export db_drop_table.DBIConnection <- function(con, table, force = FALSE, ...) { sql <- build_sql( "DROP TABLE ", if (force) sql("IF EXISTS "), as.sql(table), con = con ) dbExecute(con, sql) } #' @export db_analyze.DBIConnection <- function(con, table, ...) { sql <- build_sql("ANALYZE ", as.sql(table), con = con) dbExecute(con, sql) } #' @export db_explain.DBIConnection <- function(con, sql, ...) { exsql <- build_sql("EXPLAIN ", sql, con = con) expl <- dbGetQuery(con, exsql) out <- utils::capture.output(print(expl)) paste(out, collapse = "\n") } #' @export db_query_fields.DBIConnection <- function(con, sql, ...) { sql <- sql_select(con, sql("*"), sql_subquery(con, sql), where = sql("0 = 1")) qry <- dbSendQuery(con, sql) on.exit(dbClearResult(qry)) res <- dbFetch(qry, 0) names(res) } #' @export db_query_rows.DBIConnection <- function(con, sql, ...) { from <- sql_subquery(con, sql, "master") rows <- build_sql("SELECT COUNT(*) FROM ", from, con = con) as.integer(dbGetQuery(con, rows)[[1]]) } # Utility functions ------------------------------------------------------------ unique_table_name <- local({ i <- 0 function() { i <<- i + 1 sprintf("dbplyr_%03i", i) } }) res_warn_incomplete <- function(res, hint = "n = -1") { if (dbHasCompleted(res)) return() rows <- big_mark(dbGetRowCount(res)) warning("Only first ", rows, " results retrieved. Use ", hint, " to retrieve all.", call. 
= FALSE) } dbi_quote <- function(x, con) UseMethod("dbi_quote") dbi_quote.ident_q <- function(x, con) DBI::SQL(x) dbi_quote.ident <- function(x, con) DBI::dbQuoteIdentifier(con, x) dbi_quote.character <- function(x, con) DBI::dbQuoteString(con, x) dbi_quote.sql <- function(x, con) DBI::SQL(x) dbplyr/R/translate-sql-clause.R0000644000176200001440000000264413415756400016151 0ustar liggesusers sql_clause_generic <- function(clause, fields, con){ if (length(fields) > 0L) { assert_that(is.character(fields)) build_sql( sql(clause), " ", escape(fields, collapse = ", ", con = con), con = con ) } } sql_clause_select <- function(select, con, distinct = FALSE){ assert_that(is.character(select)) if (is_empty(select)) { abort("Query contains no columns") } build_sql( "SELECT ", if (distinct) sql("DISTINCT "), escape(select, collapse = ", ", con = con), con = con ) } sql_clause_where <- function(where, con){ if (length(where) > 0L) { assert_that(is.character(where)) where_paren <- escape(where, parens = TRUE, con = con) build_sql( "WHERE ", sql_vector(where_paren, collapse = " AND ", con = con), con = con ) } } sql_clause_limit <- function(limit, con){ if (!is.null(limit) && !identical(limit, Inf)) { assert_that(is.numeric(limit), length(limit) == 1L, limit >= 0) build_sql( "LIMIT ", sql(format(trunc(limit), scientific = FALSE)), con = con ) } } sql_clause_from <- function(from, con) sql_clause_generic("FROM", from, con) sql_clause_group_by <- function(group_by, con) sql_clause_generic("GROUP BY", group_by, con) sql_clause_having <- function(having, con) sql_clause_generic("HAVING", having, con) sql_clause_order_by <- function(order_by, con) sql_clause_generic("ORDER BY", order_by, con) dbplyr/R/backend-access.R0000644000176200001440000001437513417141014014726 0ustar liggesusers# sql_ generics -------------------------------------------- #' @export sql_select.ACCESS <- function(con, select, from, where = NULL, group_by = NULL, having = NULL, order_by = NULL, limit = NULL, 
distinct = FALSE, ...) { out <- vector("list", 7) names(out) <- c("select", "from", "where", "group_by","having", "order_by", "limit") assert_that(is.character(select), length(select) > 0L) out$select <- build_sql( "SELECT ", if (distinct) sql("DISTINCT "), # Access uses the TOP statement instead of LIMIT which is what SQL92 uses # TOP is expected after DISTINCT and not at the end of the query # e.g: SELECT TOP 100 * FROM my_table if (!is.null(limit) && !identical(limit, Inf)) { assert_that(is.numeric(limit), length(limit) == 1L, limit > 0) build_sql("TOP ", as.integer(limit), " ", con = con) }, escape(select, collapse = ", ", con = con), con = con ) out$from <- sql_clause_from(from, con) out$where <- sql_clause_where(where, con) out$group_by <- sql_clause_group_by(group_by, con) out$having <- sql_clause_having(having, con) out$order_by <- sql_clause_order_by(order_by, con) escape(unname(purrr::compact(out)), collapse = "\n", parens = FALSE, con = con) } #' @export sql_translate_env.ACCESS <- function(con) { sql_variant( sql_translator(.parent = base_odbc_scalar, # Much of this translation comes from: https://www.techonthenet.com/access/functions/ # Conversion as.numeric = sql_prefix("CDBL"), as.double = sql_prefix("CDBL"), # as.integer() always rounds down. CInt does not, but Int does as.integer = sql_prefix("INT"), as.logical = sql_prefix("CBOOL"), as.character = sql_prefix("CSTR"), as.Date = sql_prefix("CDATE"), # Math exp = sql_prefix("EXP"), log = sql_prefix("LOG"), log10 = function(x) { sql_expr(log(!!x) / log(10L)) }, sqrt = sql_prefix("SQR"), sign = sql_prefix("SGN"), floor = sql_prefix("INT"), # Nearly add 1, then drop off the decimal. This results in the equivalent to ceiling() ceiling = function(x) { sql_expr(int(!!x + .9999999999)) }, ceil = function(x) { sql_expr(int(!!x + .9999999999)) }, # There is no POWER function in Access. 
It uses ^ instead `^` = function(x, y) { sql_expr((!!x) ^ (!!y)) }, # Strings nchar = sql_prefix("LEN"), tolower = sql_prefix("LCASE"), toupper = sql_prefix("UCASE"), # Pull `left` chars from the left, then `right` chars from the right to replicate substr substr = function(x, start, stop){ right <- stop - start + 1 left <- stop sql_expr(right(left(!!x, !!left), !!right)) }, trimws = sql_prefix("TRIM"), # No support for CONCAT in Access paste = sql_paste_infix(" ", "&", function(x) sql_expr(CStr(!!x))), paste0 = sql_paste_infix("", "&", function(x) sql_expr(CStr(!!x))), # Logic # Access always returns -1 for True and 0 for False is.null = sql_prefix("ISNULL"), is.na = sql_prefix("ISNULL"), # IIF() is like ifelse() ifelse = function(test, yes, no){ sql_expr(iif(!!test, !!yes, !!no)) }, # Coalesce doesn't exist in Access. # NZ() only works while in Access, not with the Access driver # IIF(ISNULL()) is the best way to construct this coalesce = function(x, y) { sql_expr(iif(isnull(!!x), !!y, !!x)) }, # pmin/pmax for 2 columns pmin = function(x, y) { sql_expr(iif(!!x <= !!y, !!x, !!y)) }, pmax = function(x, y) { sql_expr(iif(!!x <= !!y, !!y, !!x)) }, # Dates Sys.Date = sql_prefix("DATE") ), sql_translator(.parent = base_odbc_agg, mean = sql_prefix("AVG"), sd = sql_prefix("STDEV"), var = sql_prefix("VAR"), max = sql_prefix("MAX"), min = sql_prefix("MIN"), # Access does not have functions for cor and cov cor = sql_not_supported("cor()"), cov = sql_not_supported("cov()"), # Count # Count(Distinct *) does not work in Access # This would work but we don't know the table name when translating: # SELECT Count(*) FROM (SELECT DISTINCT * FROM table_name) AS T n_distinct = sql_not_supported("n_distinct") ), # Window functions not supported in Access sql_translator(.parent = base_no_win) )} # db_ generics ----------------------------------- #' @export db_analyze.ACCESS <- function(con, table, ...) { # Do nothing. 
Access doesn't support an analyze / update statistics function } # Util ------------------------------------------- #' @export sql_escape_logical.ACCESS <- function(con, x) { # Access uses a convention of -1 as True and 0 as False y <- ifelse(x, -1, 0) y[is.na(x)] <- "NULL" y } globalVariables(c("CStr", "iif", "isnull", "text")) dbplyr/R/backend-hive.R0000644000176200001440000000222613474056125014423 0ustar liggesusers#' @export sql_translate_env.Hive <- function(con) { sql_variant( sql_translator(.parent = base_odbc_scalar, bitwShiftL = sql_prefix("SHIFTLEFT", 2), bitwShiftR = sql_prefix("SHIFTRIGHT", 2), cot = function(x){ sql_expr(1 / tan(!!x)) }, str_replace_all = function(string, pattern, replacement) { sql_expr(regexp_replace(!!string, !!pattern, !!replacement)) } ), sql_translator(.parent = base_odbc_agg, var = sql_prefix("VARIANCE"), quantile = sql_quantile("PERCENTILE"), median = sql_median("PERCENTILE") ), sql_translator(.parent = base_odbc_win, var = win_aggregate("VARIANCE"), quantile = sql_quantile("PERCENTILE", window = TRUE), median = sql_median("PERCENTILE", window = TRUE) ) ) } #' @export db_analyze.Hive <- function(con, table, ...) { # Using ANALYZE TABLE instead of ANALYZE as recommended in this article: https://cwiki.apache.org/confluence/display/Hive/StatsDev sql <- build_sql( "ANALYZE TABLE ", as.sql(table), " COMPUTE STATISTICS" , con = con) DBI::dbExecute(con, sql) } globalVariables("regexp_replace") dbplyr/R/src_dbi.R0000644000176200001440000001101213426145336013501 0ustar liggesusers#' dplyr backend for any DBI-compatible database #' #' @description #' `src_dbi()` is a general dplyr backend that connects to any #' DBI driver. `src_memdb()` connects to a temporary in-memory SQLite #' database, that's useful for testing and experimenting. #' #' You can generate a `tbl()` directly from the DBI connection, or #' go via `src_dbi()`. 
#' #' @details #' All data manipulation on SQL tbls are lazy: they will not actually #' run the query or retrieve the data unless you ask for it: they all return #' a new `tbl_dbi` object. Use [compute()] to run the query and save the #' results in a temporary in the database, or use [collect()] to retrieve the #' results to R. You can see the query with [show_query()]. #' #' For best performance, the database should have an index on the variables #' that you are grouping by. Use [explain()] to check that the database is using #' the indexes that you expect. #' #' There is one exception: [do()] is not lazy since it must pull the data #' into R. #' #' @param con An object that inherits from [DBI::DBIConnection-class], #' typically generated by [DBI::dbConnect] #' @param auto_disconnect Should the connection be automatically closed when #' the src is deleted? Set to `TRUE` if you initialize the connection #' the call to `src_dbi()`. Pass `NA` to auto-disconnect but print a message #' when this happens. #' @return An S3 object with class `src_dbi`, `src_sql`, `src`. #' @export #' @examples #' # Basic connection using DBI ------------------------------------------- #' library(dplyr) #' #' con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") #' src <- src_dbi(con, auto_disconnect = TRUE) #' #' # Add some data #' copy_to(src, mtcars) #' src #' DBI::dbListTables(con) #' #' # To retrieve a single table from a source, use `tbl()` #' src %>% tbl("mtcars") #' #' # You can also use pass raw SQL if you want a more sophisticated query #' src %>% tbl(sql("SELECT * FROM mtcars WHERE cyl = 8")) #' #' # Alternatively, you can use the `src_sqlite()` helper #' src2 <- src_sqlite(":memory:", create = TRUE) #' #' # If you just want a temporary in-memory database, use src_memdb() #' src3 <- src_memdb() #' #' # To show off the full features of dplyr's database integration, #' # we'll use the Lahman database. lahman_sqlite() takes care of #' # creating the database. 
#' #' if (has_lahman("sqlite")) { #' lahman_p <- lahman_sqlite() #' batting <- lahman_p %>% tbl("Batting") #' batting #' #' # Basic data manipulation verbs work in the same way as with a tibble #' batting %>% filter(yearID > 2005, G > 130) #' batting %>% select(playerID:lgID) #' batting %>% arrange(playerID, desc(yearID)) #' batting %>% summarise(G = mean(G), n = n()) #' #' # There are a few exceptions. For example, databases give integer results #' # when dividing one integer by another. Multiply by 1 to fix the problem #' batting %>% #' select(playerID:lgID, AB, R, G) %>% #' mutate( #' R_per_game1 = R / G, #' R_per_game2 = R * 1.0 / G #' ) #' #' # All operations are lazy: they don't do anything until you request the #' # data, either by `print()`ing it (which shows the first ten rows), #' # or by `collect()`ing the results locally. #' system.time(recent <- filter(batting, yearID > 2010)) #' system.time(collect(recent)) #' #' # You can see the query that dplyr creates with show_query() #' batting %>% #' filter(G > 0) %>% #' group_by(playerID) %>% #' summarise(n = n()) %>% #' show_query() #' } src_dbi <- function(con, auto_disconnect = FALSE) { # stopifnot(is(con, "DBIConnection")) if (is_false(auto_disconnect)) { disco <- NULL } else { disco <- db_disconnector(con, quiet = is_true(auto_disconnect)) } subclass <- paste0("src_", class(con)[[1]]) structure( list( con = con, disco = disco ), class = c(subclass, "src_dbi", "src_sql", "src") ) } setOldClass(c("src_dbi", "src_sql", "src")) #' @export #' @aliases tbl_dbi #' @rdname src_dbi #' @param src Either a `src_dbi` or `DBIConnection` #' @param from Either a string (giving a table name) or literal [sql()]. #' @param ... Needed for compatibility with generic; currently ignored. tbl.src_dbi <- function(src, from, ...) 
{ subclass <- class(src$con)[[1]] # prefix added by dplyr::make_tbl tbl_sql(c(subclass, "dbi"), src = src, from = from) } # Creates an environment that disconnects the database when it's GC'd db_disconnector <- function(con, quiet = FALSE) { reg.finalizer(environment(), function(...) { if (!quiet) { message("Auto-disconnecting ", class(con)[[1]]) } dbDisconnect(con) }) environment() } dbplyr/R/backend-mysql.R0000644000176200001440000001310413442404754014632 0ustar liggesusers#' @export db_desc.MySQLConnection <- function(x) { info <- dbGetInfo(x) paste0( "mysql ", info$serverVersion, " [", info$user, "@", info$host, ":", info$port, "/", info$dbname, "]" ) } #' @export db_desc.MariaDBConnection <- db_desc.MySQLConnection #' @export db_desc.MySQL <- db_desc.MySQLConnection #' @export sql_translate_env.MySQLConnection <- function(con) { sql_variant( sql_translator(.parent = base_scalar, as.logical = function(x) { sql_expr(IF(x, TRUE, FALSE)) }, as.character = sql_cast("CHAR"), # string functions ------------------------------------------------ paste = sql_paste(" "), paste0 = sql_paste(""), # stringr str_c = sql_paste(""), # https://dev.mysql.com/doc/refman/8.0/en/regexp.html # NB: case insensitive by default; could use REGEXP_LIKE for MySQL, # but available in MariaDB. 
A few more details at: # https://www.oreilly.com/library/view/mysql-cookbook/0596001452/ch04s11.html str_detect = sql_infix("REGEXP"), str_locate = function(string, pattern) { sql_expr(REGEXP_INSTR(!!string, !!pattern)) }, str_replace_all = function(string, pattern, replacement){ sql_expr(regexp_replace(!!string, !!pattern, !!replacement)) } ), sql_translator(.parent = base_agg, n = function() sql("COUNT(*)"), sd = sql_aggregate("STDDEV_SAMP", "sd"), var = sql_aggregate("VAR_SAMP", "var"), str_flatten = function(x, collapse) { sql_expr(group_concat(!!x %separator% !!collapse)) } ), sql_translator(.parent = base_win, n = function() { win_over(sql("COUNT(*)"), partition = win_current_group()) }, sd = win_aggregate("STDDEV_SAMP"), var = win_aggregate("VAR_SAMP"), # GROUP_CONCAT not currently available as window function # https://mariadb.com/kb/en/library/aggregate-functions-as-window-functions/ str_flatten = win_absent("str_flatten") ) ) } #' @export sql_translate_env.MariaDBConnection <- sql_translate_env.MySQLConnection #' @export sql_translate_env.MySQL <- sql_translate_env.MySQLConnection # DBI methods ------------------------------------------------------------------ #' @export db_has_table.MySQLConnection <- function(con, table, ...) { # MySQL has no way to list temporary tables, so we always NA to # skip any local checks and rely on the database to throw informative errors NA } #' @export db_has_table.MariaDBConnection <- db_has_table.MySQLConnection #' @export db_has_table.MySQL <- db_has_table.MySQLConnection #' @export db_data_type.MySQLConnection <- function(con, fields, ...) 
{ char_type <- function(x) { n <- max(nchar(as.character(x), "bytes"), 0L, na.rm = TRUE) if (n <= 65535) { paste0("varchar(", n, ")") } else { "mediumtext" } } data_type <- function(x) { switch( class(x)[1], logical = "boolean", integer = "integer", numeric = "double", factor = char_type(x), character = char_type(x), Date = "date", POSIXct = "datetime", stop("Unknown class ", paste(class(x), collapse = "/"), call. = FALSE) ) } vapply(fields, data_type, character(1)) } #' @export db_begin.MySQLConnection <- function(con, ...) { dbExecute(con, "START TRANSACTION") } #' @export db_commit.MySQLConnection <- function(con, ...) { dbExecute(con, "COMMIT") } #' @export db_rollback.MySQLConnection <- function(con, ...) { dbExecute(con, "ROLLBACK") } #' @export db_write_table.MySQLConnection <- function(con, table, types, values, temporary = TRUE, ...) { db_create_table(con, table, types, temporary = temporary) values <- purrr::modify_if(values, is.logical, as.integer) values <- purrr::modify_if(values, is.factor, as.character) values <- purrr::modify_if(values, is.character, encodeString, na.encode = FALSE) tmp <- tempfile(fileext = ".csv") utils::write.table(values, tmp, sep = "\t", quote = FALSE, qmethod = "escape", na = "\\N", row.names = FALSE, col.names = FALSE) sql <- build_sql("LOAD DATA LOCAL INFILE ", encodeString(tmp), " INTO TABLE ", as.sql(table), con = con) dbExecute(con, sql) table } #' @export db_create_index.MySQLConnection <- function(con, table, columns, name = NULL, unique = FALSE, ...) 
{ name <- name %||% paste0(c(table, columns), collapse = "_") fields <- escape(ident(columns), parens = TRUE, con = con) index <- build_sql( "ADD ", if (unique) sql("UNIQUE "), "INDEX ", ident(name), " ", fields, con = con ) sql <- build_sql("ALTER TABLE ", as.sql(table), "\n", index, con = con) dbExecute(con, sql) } #' @export db_create_index.MariaDBConnection <- db_create_index.MySQLConnection #' @export db_create_index.MySQL <- db_create_index.MySQLConnection #' @export db_analyze.MySQLConnection <- function(con, table, ...) { sql <- build_sql("ANALYZE TABLE", as.sql(table), con = con) dbExecute(con, sql) } #' @export db_analyze.MariaDBConnection <- db_analyze.MySQLConnection #' @export db_analyze.MySQL <- db_analyze.MySQLConnection # SQL methods ------------------------------------------------------------- #' @export sql_escape_ident.MySQLConnection <- function(con, x) { sql_quote(x, "`") } #' @export sql_join.MySQLConnection <- function(con, x, y, vars, type = "inner", by = NULL, ...) { if (identical(type, "full")) { stop("MySQL does not support full joins", call. = FALSE) } NextMethod() } globalVariables(c("%separator%", "group_concat", "IF", "REGEXP_INSTR")) dbplyr/R/testthat.R0000644000176200001440000000224013415745770013745 0ustar liggesusers expect_equal_tbl <- function(object, expected, ..., info = NULL, label = NULL, expected.label = NULL) { lab_act <- label %||% expr_label(substitute(object)) lab_exp <- expected.label %||% expr_label(substitute(expected)) ok <- dplyr::all_equal(collect(object), collect(expected), ...) msg <- glue(" {lab_act} not equal to {lab_exp}. {paste(ok, collapse = '\n')} ") testthat::expect(isTRUE(ok), msg, info = info) } expect_equal_tbls <- function(results, ref = NULL, ...) 
{ stopifnot(is.list(results)) if (!is_named(results)) { result_name <- expr_name(substitute(results)) names(results) <- paste0(result_name, "_", seq_along(results)) } # If ref is NULL, use the first result if (is.null(ref)) { if (length(results) < 2) { testthat::skip("Need at least two srcs to compare") } ref <- results[[1]] ref_name <- names(results)[[1]] rest <- results[-1] } else { rest <- results ref_name <- "`ref`" } for (i in seq_along(rest)) { expect_equal_tbl( rest[[i]], ref, ..., label = names(rest)[[i]], expected.label = ref_name ) } invisible(TRUE) } dbplyr/R/query-join.R0000644000176200001440000000606513442161247014207 0ustar liggesusers#' @export #' @rdname sql_build join_query <- function(x, y, vars, type = "inner", by = NULL, suffix = c(".x", ".y")) { structure( list( x = x, y = y, vars = vars, type = type, by = by ), class = c("join_query", "query") ) } #' @export print.join_query <- function(x, ...) { cat_line("") cat_line("By:") cat_line(indent(paste0(x$by$x, "-", x$by$y))) cat_line("X:") cat_line(indent_print(sql_build(x$x))) cat_line("Y:") cat_line(indent_print(sql_build(x$y))) } #' @export sql_render.join_query <- function(query, con = NULL, ..., bare_identifier_ok = FALSE) { from_x <- sql_subquery( con, sql_render(query$x, con, ..., bare_identifier_ok = TRUE), name = "LHS" ) from_y <- sql_subquery( con, sql_render(query$y, con, ..., bare_identifier_ok = TRUE), name = "RHS" ) sql_join(con, from_x, from_y, vars = query$vars, type = query$type, by = query$by) } # SQL generation ---------------------------------------------------------- #' @export sql_join.DBIConnection <- function(con, x, y, vars, type = "inner", by = NULL, ...) { JOIN <- switch( type, left = sql("LEFT JOIN"), inner = sql("INNER JOIN"), right = sql("RIGHT JOIN"), full = sql("FULL JOIN"), cross = sql("CROSS JOIN"), stop("Unknown join type:", type, call. 
= FALSE) ) select <- sql_join_vars(con, vars) on <- sql_join_tbls(con, by) # Wrap with SELECT since callers assume a valid query is returned build_sql( "SELECT ", select, "\n", "FROM ", x, "\n", JOIN, " ", y, "\n", if (!is.null(on)) build_sql("ON ", on, "\n", con = con) else NULL, con = con ) } sql_join_vars <- function(con, vars) { sql_vector( mapply( FUN = sql_join_var, alias = vars$alias, x = vars$x, y = vars$y, MoreArgs = list(con = con), SIMPLIFY = FALSE, USE.NAMES = TRUE ), parens = FALSE, collapse = ", ", con = con ) } sql_join_var <- function(con, alias, x, y) { if (!is.na(x) & !is.na(y)) { sql_expr( COALESCE( !!sql_table_prefix(con, x, table = "LHS"), !!sql_table_prefix(con, y, table = "RHS") ), con = con ) } else if (!is.na(x)) { sql_table_prefix(con, x, table = "LHS") } else if (!is.na(y)) { sql_table_prefix(con, y, table = "RHS") } else { stop("No source for join column ", alias, call. = FALSE) } } sql_join_tbls <- function(con, by) { on <- NULL if (length(by$x) + length(by$y) > 0) { on <- sql_vector( paste0( sql_table_prefix(con, by$x, "LHS"), " = ", sql_table_prefix(con, by$y, "RHS") ), collapse = " AND ", parens = TRUE, con = con ) } else if (length(by$on) > 0) { on <- build_sql("(", by$on, ")", con = con) } on } sql_table_prefix <- function(con, var, table = NULL) { var <- sql_escape_ident(con, var) if (!is.null(table)) { table <- sql_escape_ident(con, table) sql(paste0(table, ".", var)) } else { var } } utils::globalVariables("COALESCE") dbplyr/R/backend-postgres.R0000644000176200001440000001453613455376446015357 0ustar liggesusers#' @export db_desc.PostgreSQLConnection <- function(x) { info <- dbGetInfo(x) host <- if (info$host == "") "localhost" else info$host paste0("postgres ", info$serverVersion, " [", info$user, "@", host, ":", info$port, "/", info$dbname, "]") } #' @export db_desc.PostgreSQL <- db_desc.PostgreSQLConnection #' @export db_desc.PqConnection <- db_desc.PostgreSQLConnection postgres_grepl <- function(pattern, x, ignore.case = 
FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE) { # https://www.postgresql.org/docs/current/static/functions-matching.html#FUNCTIONS-POSIX-TABLE if (any(c(perl, fixed, useBytes))) { abort("`perl`, `fixed` and `useBytes` parameters are unsupported") } if (ignore.case) { sql_expr(((!!x)) %~*% ((!!pattern))) } else { sql_expr(((!!x)) %~% ((!!pattern))) } } postgres_round <- function(x, digits = 0L) { digits <- as.integer(digits) sql_expr(round(((!!x)) %::% numeric, !!digits)) } #' @export sql_translate_env.PostgreSQLConnection <- function(con) { sql_variant( sql_translator(.parent = base_scalar, bitwXor = sql_infix("#"), log10 = function(x) sql_expr(log(!!x)), log = sql_log(), cot = sql_cot(), round = postgres_round, grepl = postgres_grepl, paste = sql_paste(" "), paste0 = sql_paste(""), # stringr functions # https://www.postgresql.org/docs/9.1/functions-string.html # https://www.postgresql.org/docs/9.1/functions-matching.html#FUNCTIONS-POSIX-REGEXP str_c = sql_paste(""), str_locate = function(string, pattern) { sql_expr(strpos(!!string, !!pattern)) }, str_detect = function(string, pattern) { sql_expr(strpos(!!string, !!pattern) > 0L) }, str_replace_all = function(string, pattern, replacement){ sql_expr(regexp_replace(!!string, !!pattern, !!replacement)) }, # lubridate functions month = function(x, label = FALSE, abbr = TRUE) { if (!label) { sql_expr(EXTRACT(MONTH %FROM% !!x)) } else { if (abbr) { sql_expr(TO_CHAR(!!x, "Mon")) } else { sql_expr(TO_CHAR(!!x, "Month")) } } }, quarter = function(x, with_year = FALSE, fiscal_start = 1) { if (fiscal_start != 1) { stop("`fiscal_start` is not supported in PostgreSQL translation. Must be 1.", call. = FALSE) } if (with_year) { sql_expr((EXTRACT(YEAR %FROM% !!x) || '.' 
|| EXTRACT(QUARTER %FROM% !!x))) } else { sql_expr(EXTRACT(QUARTER %FROM% !!x)) } }, wday = function(x, label = FALSE, abbr = TRUE, week_start = NULL) { if (!label) { week_start <- week_start %||% getOption("lubridate.week.start", 7) offset <- as.integer(7 - week_start) sql_expr(EXTRACT("dow" %FROM% DATE(!!x) + !!offset) + 1) } else if (label && !abbr) { sql_expr(TO_CHAR(!!x, "Day")) } else if (label && abbr) { sql_expr(SUBSTR(TO_CHAR(!!x, "Day"), 1, 3)) } else { stop("Unrecognized arguments to `wday`", call. = FALSE) } }, yday = function(x) sql_expr(EXTRACT(DOY %FROM% !!x)) ), sql_translator(.parent = base_agg, n = function() sql("COUNT(*)"), cor = sql_aggregate_2("CORR"), cov = sql_aggregate_2("COVAR_SAMP"), sd = sql_aggregate("STDDEV_SAMP", "sd"), var = sql_aggregate("VAR_SAMP", "var"), all = sql_aggregate("BOOL_AND", "all"), any = sql_aggregate("BOOL_OR", "any"), str_flatten = function(x, collapse) sql_expr(string_agg(!!x, !!collapse)) ), sql_translator(.parent = base_win, n = function() { win_over(sql("COUNT(*)"), partition = win_current_group()) }, cor = win_aggregate_2("CORR"), cov = win_aggregate_2("COVAR_SAMP"), sd = win_aggregate("STDDEV_SAMP"), var = win_aggregate("VAR_SAMP"), all = win_aggregate("BOOL_AND"), any = win_aggregate("BOOL_OR"), str_flatten = function(x, collapse) { win_over( sql_expr(string_agg(!!x, !!collapse)), partition = win_current_group(), order = win_current_order() ) } ) ) } #' @export sql_translate_env.PostgreSQL <- sql_translate_env.PostgreSQLConnection #' @export sql_translate_env.PqConnection <- sql_translate_env.PostgreSQLConnection #' @export sql_translate_env.Redshift <- sql_translate_env.PostgreSQLConnection # DBI methods ------------------------------------------------------------------ # Doesn't return TRUE for temporary tables #' @export db_has_table.PostgreSQLConnection <- function(con, table, ...) { table %in% db_list_tables(con) } #' @export db_begin.PostgreSQLConnection <- function(con, ...) 
{ dbExecute(con, "BEGIN TRANSACTION") } #' @export db_write_table.PostgreSQLConnection <- function(con, table, types, values, temporary = TRUE, ...) { db_create_table(con, table, types, temporary = temporary) if (nrow(values) == 0) return(NULL) cols <- lapply(values, escape, collapse = NULL, parens = FALSE, con = con) col_mat <- matrix(unlist(cols, use.names = FALSE), nrow = nrow(values)) rows <- apply(col_mat, 1, paste0, collapse = ", ") values <- paste0("(", rows, ")", collapse = "\n, ") sql <- build_sql("INSERT INTO ", as.sql(table), " VALUES ", sql(values), con = con) dbExecute(con, sql) table } #' @export db_query_fields.PostgreSQLConnection <- function(con, sql, ...) { fields <- build_sql( "SELECT * FROM ", sql_subquery(con, sql), " WHERE 0=1", con = con ) qry <- dbSendQuery(con, fields) on.exit(dbClearResult(qry)) dbGetInfo(qry)$fieldDescription[[1]]$name } # http://www.postgresql.org/docs/9.3/static/sql-explain.html #' @export db_explain.PostgreSQLConnection <- function(con, sql, format = "text", ...) 
{ format <- match.arg(format, c("text", "json", "yaml", "xml")) exsql <- build_sql( "EXPLAIN ", if (!is.null(format)) sql(paste0("(FORMAT ", format, ") ")), sql, con = con ) expl <- dbGetQuery(con, exsql) paste(expl[[1]], collapse = "\n") } #' @export db_explain.PostgreSQL <- db_explain.PostgreSQLConnection #' @export db_explain.PqConnection <- db_explain.PostgreSQLConnection globalVariables(c("strpos", "%::%", "%FROM%", "DATE", "EXTRACT", "TO_CHAR", "string_agg", "%~*%", "%~%", "MONTH", "DOY")) dbplyr/R/backend-odbc.R0000644000176200001440000000255313416403620014372 0ustar liggesusers#' @include translate-sql-window.R #' @include translate-sql-helpers.R #' @include translate-sql-paste.R #' @include escape.R NULL #' @export sql_translate_env.OdbcConnection <- function(con) { sql_variant( base_odbc_scalar, base_odbc_agg, base_odbc_win ) } #' @export #' @rdname sql_variant #' @format NULL base_odbc_scalar <- sql_translator( .parent = base_scalar, as.numeric = sql_cast("DOUBLE"), as.double = sql_cast("DOUBLE"), as.integer = sql_cast("INT"), as.character = sql_cast("STRING") ) #' @export #' @rdname sql_variant #' @format NULL base_odbc_agg <- sql_translator( .parent = base_agg, n = function() sql("COUNT(*)"), count = function() sql("COUNT(*)"), sd = sql_prefix("STDDEV_SAMP") ) #' @export #' @rdname sql_variant #' @format NULL base_odbc_win <- sql_translator( .parent = base_win, sd = win_aggregate("STDDEV_SAMP"), n = function() win_over(sql("COUNT(*)"), win_current_group()), count = function() win_over(sql("COUNT(*)"), win_current_group()) ) #' @export db_desc.OdbcConnection <- function(x) { info <- DBI::dbGetInfo(x) host <- if (info$servername == "") "localhost" else info$servername port <- if (info$port == "") "" else paste0(":", info$port) paste0( info$dbms.name, " ", info$db.version, "[", info$username, "@", host, port, "/", info$dbname, "]" ) } utils::globalVariables("EXP") dbplyr/R/translate-sql-conditional.R0000644000176200001440000000334613416404441017174 0ustar 
liggesuserssql_if <- function(cond, if_true, if_false = NULL) {
  build_sql(
    "CASE WHEN (", cond, ")",
    " THEN (", if_true, ")",
    if (!is.null(if_false)) build_sql(" WHEN NOT(", cond, ") THEN (", if_false, ")"),
    " END"
  )
}

sql_case_when <- function(...) {
  # TODO: switch to dplyr::case_when_prepare when available

  formulas <- list2(...)
  n <- length(formulas)

  if (n == 0) {
    abort("No cases provided")
  }

  query <- vector("list", n)
  value <- vector("list", n)

  old <- sql_current_context()
  on.exit(set_current_context(old), add = TRUE)
  set_current_context(list(clause = ""))

  for (i in seq_len(n)) {
    f <- formulas[[i]]

    env <- environment(f)
    query[[i]] <- escape(eval_bare(f[[2]], env), con = sql_current_con())
    value[[i]] <- escape(eval_bare(f[[3]], env), con = sql_current_con())
  }

  clauses <- purrr::map2_chr(query, value, ~ paste0("WHEN (", .x, ") THEN (", .y, ")"))
  # if a formula like TRUE ~ "other" is at the end of a sequence, use ELSE statement
  if (query[[n]] == "TRUE") {
    clauses[[n]] <- paste0("ELSE (", value[[n]], ")")
  }

  sql(paste0(
    "CASE\n",
    paste0(clauses, collapse = "\n"),
    "\nEND"
  ))
}

sql_switch <- function(x, ...) {
  input <- list2(...)

  named <- names(input) != ""

  clauses <- purrr::map2_chr(names(input)[named], input[named], function(x, y) {
    build_sql("WHEN (", x, ") THEN (", y, ") ")
  })

  n_unnamed <- sum(!named)
  if (n_unnamed == 0) {
    # do nothing
  } else if (n_unnamed == 1) {
    clauses <- c(clauses, build_sql("ELSE ", input[!named], " "))
  } else {
    stop("Can only have one unnamed (ELSE) input", call. = FALSE)
  }

  build_sql("CASE ", x, " ", !!!clauses, "END")
}

sql_is_null <- function(x) {
  sql_expr((((!!x)) %is% NULL))
}
dbplyr/R/data-nycflights13.R0000644000176200001440000000405513474056125015332 0ustar liggesusers#' Database versions of the nycflights13 data
#'
#' These functions cache the data from the `nycflights13` database in
#' a local database, for use in examples and vignettes. Indexes are created
#' to make joining tables on natural keys efficient.
#' #' @keywords internal #' @name nycflights13 NULL #' @export #' @rdname nycflights13 #' @param path location of SQLite database file nycflights13_sqlite <- function(path = NULL) { cache_computation("nycflights_sqlite", { path <- db_location(path, "nycflights13.sqlite") message("Caching nycflights db at ", path) src <- src_sqlite(path, create = TRUE) copy_nycflights13(src) }) } #' @export #' @rdname nycflights13 #' @param dbname,... Arguments passed on to [src_postgres()] nycflights13_postgres <- function(dbname = "nycflights13", ...) { cache_computation("nycflights_postgres", { message("Caching nycflights db in postgresql db ", dbname) copy_nycflights13(src_postgres(dbname, ...)) }) } #' @rdname nycflights13 #' @export has_nycflights13 <- function(type = c("sqlite", "postgresql"), ...) { if (!requireNamespace("nycflights13", quietly = TRUE)) return(FALSE) type <- match.arg(type) succeeds(switch( type, sqlite = nycflights13_sqlite(...), quiet = TRUE, postgres = nycflights13_postgres(...), quiet = TRUE )) } #' @export #' @rdname nycflights13 copy_nycflights13 <- function(src, ...) { all <- utils::data(package = "nycflights13")$results[, 3] unique_index <- list( airlines = list("carrier"), planes = list("tailnum") ) index <- list( airports = list("faa"), flights = list(c("year", "month", "day"), "carrier", "tailnum", "origin", "dest"), weather = list(c("year", "month", "day"), "origin") ) tables <- setdiff(all, src_tbls(src)) # Create missing tables for (table in tables) { df <- getExportedValue("nycflights13", table) message("Creating table: ", table) copy_to( src, df, table, unique_indexes = unique_index[[table]], indexes = index[[table]], temporary = FALSE ) } src } dbplyr/R/verb-set-ops.R0000644000176200001440000000331113426576515014434 0ustar liggesusers# registered onLoad intersect.tbl_lazy <- function(x, y, copy = FALSE, ...) { add_op_set_op(x, y, "INTERSECT", copy = copy, ...) } # registered onLoad union.tbl_lazy <- function(x, y, copy = FALSE, ...) 
{ add_op_set_op(x, y, "UNION", copy = copy, ...) } #' @export union_all.tbl_lazy <- function(x, y, copy = FALSE, ...) { add_op_set_op(x, y, "UNION ALL", copy = copy, ...) } # registered onLoad setdiff.tbl_lazy <- function(x, y, copy = FALSE, ...) { add_op_set_op(x, y, "EXCEPT", copy = copy, ...) } add_op_set_op <- function(x, y, type, copy = FALSE, ...) { y <- auto_copy(x, y, copy) if (inherits(x$src$con, "SQLiteConnection")) { # LIMIT only part the compound-select-statement not the select-core. # # https://www.sqlite.org/syntax/compound-select-stmt.html # https://www.sqlite.org/syntax/select-core.html if (inherits(x$ops, "op_head") || inherits(y$ops, "op_head")) { stop("SQLite does not support set operations on LIMITs", call. = FALSE) } } # Ensure each has same variables vars <- union(op_vars(x), op_vars(y)) x <- fill_vars(x, vars) y <- fill_vars(y, vars) x$ops <- op_double("set_op", x, y, args = list(type = type)) x } fill_vars <- function(x, vars) { x_vars <- op_vars(x) if (identical(x_vars, vars)) { return(x) } new_vars <- lapply(set_names(vars), function(var) { if (var %in% x_vars) { sym(var) } else { NA } }) x$ops <- op_select(x$ops, new_vars) x } #' @export op_vars.op_set_op <- function(op) { union(op_vars(op$x), op_vars(op$y)) } #' @export sql_build.op_set_op <- function(op, con, ...) { # add_op_set_op() ensures that both have same variables set_op_query(op$x, op$y, type = op$args$type) } dbplyr/R/translate-sql.R0000644000176200001440000001656013501726003014671 0ustar liggesusers#' Translate an expression to sql. #' #' @section Base translation: #' The base translator, `base_sql`, #' provides custom mappings for `!` (to NOT), `&&` and `&` to #' `AND`, `||` and `|` to `OR`, `^` to `POWER`, #' \code{\%>\%} to \code{\%}, `ceiling` to `CEIL`, `mean` to #' `AVG`, `var` to `VARIANCE`, `tolower` to `LOWER`, #' `toupper` to `UPPER` and `nchar` to `LENGTH`. #' #' `c()` and `:` keep their usual R behaviour so you can easily create #' vectors that are passed to sql. 
#'
#' All other functions will be preserved as is. R's infix functions
#' (e.g. \code{\%like\%}) will be converted to their SQL equivalents
#' (e.g. `LIKE`). You can use this to access SQL string concatenation:
#' `||` is mapped to `OR`, but \code{\%||\%} is mapped to `||`.
#' To suppress this behaviour, and force errors immediately when dplyr doesn't
#' know how to translate a function it encounters, set the
#' `dplyr.strict_sql` option to `TRUE`.
#'
#' You can also use [sql()] to insert a raw sql string.
#'
#' @section SQLite translation:
#' The SQLite variant currently only adds one additional function: a mapping
#' from `sd()` to the SQL aggregation function `STDEV`.
#'
#' @param ...,dots Expressions to translate. `translate_sql()`
#'   automatically quotes them for you. `translate_sql_()` expects
#'   a list of already quoted objects.
#' @param con An optional database connection to control the details of
#'   the translation. The default, `NULL`, generates ANSI SQL.
#' @param vars Deprecated. Now call [partial_eval()] directly.
#' @param vars_group,vars_order,vars_frame Parameters used in the `OVER`
#'   expression of windowed functions.
#' @param window Use `FALSE` to suppress generation of the `OVER`
#'   statement used for window functions. This is necessary when generating
#'   SQL for a grouped summary.
#' @param context Use to carry information for special translation cases. For
#'   example, MS SQL needs a different conversion for is.na() in WHERE vs.
#'   SELECT clauses. Expects a list.
#' @export #' @examples #' # Regular maths is translated in a very straightforward way #' translate_sql(x + 1) #' translate_sql(sin(x) + tan(y)) #' #' # Note that all variable names are escaped #' translate_sql(like == "x") #' # In ANSI SQL: "" quotes variable _names_, '' quotes strings #' #' # Logical operators are converted to their sql equivalents #' translate_sql(x < 5 & !(y >= 5)) #' # xor() doesn't have a direct SQL equivalent #' translate_sql(xor(x, y)) #' #' # If is translated into case when #' translate_sql(if (x > 5) "big" else "small") #' #' # Infix functions are passed onto SQL with % removed #' translate_sql(first %like% "Had%") #' translate_sql(first %is% NA) #' translate_sql(first %in% c("John", "Roger", "Robert")) #' #' # And be careful if you really want integers #' translate_sql(x == 1) #' translate_sql(x == 1L) #' #' # If you have an already quoted object, use translate_sql_: #' x <- quote(y + 1 / sin(t)) #' translate_sql_(list(x), con = simulate_dbi()) #' #' # Windowed translation -------------------------------------------- #' # Known window functions automatically get OVER() #' translate_sql(mpg > mean(mpg)) #' #' # Suppress this with window = FALSE #' translate_sql(mpg > mean(mpg), window = FALSE) #' #' # vars_group controls partition: #' translate_sql(mpg > mean(mpg), vars_group = "cyl") #' #' # and vars_order controls ordering for those functions that need it #' translate_sql(cumsum(mpg)) #' translate_sql(cumsum(mpg), vars_order = "mpg") translate_sql <- function(..., con = simulate_dbi(), vars = character(), vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE) { if (!missing(vars)) { abort("`vars` is deprecated. 
Please use partial_eval() directly.") } translate_sql_( quos(...), con = con, vars_group = vars_group, vars_order = vars_order, vars_frame = vars_frame, window = window ) } #' @export #' @rdname translate_sql translate_sql_ <- function(dots, con = NULL, vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE, context = list()) { if (length(dots) == 0) { return(sql()) } stopifnot(is.list(dots)) if (!any(have_name(dots))) { names(dots) <- NULL } old_con <- set_current_con(con) on.exit(set_current_con(old_con), add = TRUE) if (length(context) > 0) { old_context <- set_current_context(context) on.exit(set_current_context(old_context), add = TRUE) } if (window) { old_group <- set_win_current_group(vars_group) on.exit(set_win_current_group(old_group), add = TRUE) old_order <- set_win_current_order(vars_order) on.exit(set_win_current_order(old_order), add = TRUE) old_frame <- set_win_current_frame(vars_frame) on.exit(set_win_current_frame(old_frame), add = TRUE) } variant <- sql_translate_env(con) pieces <- lapply(dots, function(x) { if (is_null(get_expr(x))) { NULL } else if (is_atomic(get_expr(x))) { escape(get_expr(x), con = con) } else { mask <- sql_data_mask(x, variant, con = con, window = window) escape(eval_tidy(x, mask), con = con) } }) sql(unlist(pieces)) } sql_data_mask <- function(expr, variant, con, window = FALSE, strict = getOption("dplyr.strict_sql", FALSE)) { stopifnot(is.sql_variant(variant)) # Default for unknown functions if (!strict) { unknown <- setdiff(all_calls(expr), names(variant)) top_env <- ceply(unknown, default_op, parent = empty_env()) } else { top_env <- child_env(NULL) } # Known R -> SQL functions special_calls <- copy_env(variant$scalar, parent = top_env) if (!window) { special_calls2 <- copy_env(variant$aggregate, parent = special_calls) } else { special_calls2 <- copy_env(variant$window, parent = special_calls) } # Existing symbols in expression names <- all_names(expr) idents <- lapply(names, ident) name_env <- 
ceply(idents, escape, con = con, parent = special_calls2) # Known sql expressions symbol_env <- env_clone(base_symbols, parent = name_env) new_data_mask(symbol_env, top_env) } is_infix_base <- function(x) { x %in% c("::", "$", "@", "^", "*", "/", "+", "-", ">", ">=", "<", "<=", "==", "!=", "!", "&", "&&", "|", "||", "~", "<-", "<<-") } is_infix_user <- function(x) { grepl("^%.*%$", x) } default_op <- function(x) { assert_that(is_string(x)) if (is_infix_base(x)) { sql_infix(x) } else if (is_infix_user(x)) { x <- substr(x, 2, nchar(x) - 1) sql_infix(x) } else { sql_prefix(x) } } all_calls <- function(x) { if (is_quosure(x)) return(all_calls(quo_get_expr(x))) if (!is.call(x)) return(NULL) fname <- as.character(x[[1]]) unique(c(fname, unlist(lapply(x[-1], all_calls), use.names = FALSE))) } all_names <- function(x) { if (is.name(x)) return(as.character(x)) if (is_quosure(x)) return(all_names(quo_get_expr(x))) if (!is.call(x)) return(NULL) unique(unlist(lapply(x[-1], all_names), use.names = FALSE)) } # character vector -> environment ceply <- function(x, f, ..., parent = parent.frame()) { if (length(x) == 0) return(new.env(parent = parent)) l <- lapply(x, f, ...) names(l) <- x list2env(l, parent = parent) } dbplyr/R/zzz.R0000644000176200001440000000264613442425712012743 0ustar liggesusers# nocov start .onLoad <- function(...) { register_s3_method("dplyr", "union", "tbl_lazy") register_s3_method("dplyr", "intersect", "tbl_lazy") register_s3_method("dplyr", "setdiff", "tbl_lazy") register_s3_method("dplyr", "setdiff", "tbl_Oracle") register_s3_method("dplyr", "setdiff", "OraConnection") register_s3_method("dplyr", "filter", "tbl_lazy") if (utils::packageVersion("dplyr") >= "0.8.0.9008") { register_s3_method("dplyr", "group_by_drop_default", "tbl_lazy") } # These are also currently defined in dplyr, and we want to avoid a warning # about double duplication. 
Conditional can be removed after update to # dplyr if (!methods::isClass("sql")) { setOldClass(c("sql", "character"), sql()) } } register_s3_method <- function(pkg, generic, class, fun = NULL) { stopifnot(is.character(pkg), length(pkg) == 1) stopifnot(is.character(generic), length(generic) == 1) stopifnot(is.character(class), length(class) == 1) if (is.null(fun)) { fun <- get(paste0(generic, ".", class), envir = parent.frame()) } else { stopifnot(is.function(fun)) } if (pkg %in% loadedNamespaces()) { registerS3method(generic, class, fun, envir = asNamespace(pkg)) } # Always register hook in case package is later unloaded & reloaded setHook( packageEvent(pkg, "onLoad"), function(...) { registerS3method(generic, class, fun, envir = asNamespace(pkg)) } ) } # nocov end dbplyr/R/query-select.R0000644000176200001440000001014613475552423014530 0ustar liggesusers#' @export #' @rdname sql_build select_query <- function(from, select = sql("*"), where = character(), group_by = character(), having = character(), order_by = character(), limit = NULL, distinct = FALSE) { stopifnot(is.character(select)) stopifnot(is.character(where)) stopifnot(is.character(group_by)) stopifnot(is.character(having)) stopifnot(is.character(order_by)) stopifnot(is.null(limit) || (is.numeric(limit) && length(limit) == 1L)) stopifnot(is.logical(distinct), length(distinct) == 1L) structure( list( from = from, select = select, where = where, group_by = group_by, having = having, order_by = order_by, distinct = distinct, limit = limit ), class = c("select_query", "query") ) } #' @export print.select_query <- function(x, ...) 
{
  cat(
    "<SQL SELECT",
    if (x$distinct) " DISTINCT",
    ">\n",
    sep = ""
  )

  cat_line("From:")
  cat_line(indent_print(sql_build(x$from)))

  if (length(x$select))   cat("Select:   ", named_commas(x$select), "\n", sep = "")
  if (length(x$where))    cat("Where:    ", named_commas(x$where), "\n", sep = "")
  if (length(x$group_by)) cat("Group by: ", named_commas(x$group_by), "\n", sep = "")
  if (length(x$order_by)) cat("Order by: ", named_commas(x$order_by), "\n", sep = "")
  if (length(x$having))   cat("Having:   ", named_commas(x$having), "\n", sep = "")
  if (length(x$limit))    cat("Limit:    ", x$limit, "\n", sep = "")
}

#' @export
sql_optimise.select_query <- function(x, con = NULL, ...) {
  if (!inherits(x$from, "select_query")) {
    return(x)
  }

  from <- sql_optimise(x$from)

  # If all outer clauses are executed after the inner clauses, we
  # can drop them down a level
  outer <- select_query_clauses(x)
  inner <- select_query_clauses(from)

  can_squash <- length(outer) == 0 || length(inner) == 0 || min(outer) > max(inner)

  if (can_squash) {
    from[as.character(outer)] <- x[as.character(outer)]
    from
  } else {
    x
  }
}

# List clauses used by a query, in the order they are executed in
# https://sqlbolt.com/lesson/select_queries_order_of_execution
select_query_clauses <- function(x) {
  present <- c(
    where    = length(x$where) > 0,
    group_by = length(x$group_by) > 0,
    having   = length(x$having) > 0,
    select   = !identical(x$select, sql("*")),
    distinct = x$distinct,
    order_by = length(x$order_by) > 0,
    limit    = !is.null(x$limit)
  )

  ordered(names(present)[present], levels = names(present))
}

#' @export
sql_render.select_query <- function(query, con, ..., bare_identifier_ok = FALSE) {
  from <- sql_subquery(con,
    sql_render(query$from, con, ..., bare_identifier_ok = TRUE),
    name = NULL
  )

  sql_select(
    con, query$select, from,
    where = query$where,
    group_by = query$group_by,
    having = query$having,
    order_by = query$order_by,
    limit = query$limit,
    distinct = query$distinct,
    ...,
    bare_identifier_ok = bare_identifier_ok
  )
}

# SQL generation ----------------------------------------------------------

#' @export
sql_select.DBIConnection <- function(con, select, from, where = NULL,
                                     group_by = NULL, having = NULL,
                                     order_by = NULL,
                                     limit = NULL,
                                     distinct = FALSE,
                                     ...) {
  out <- vector("list", 7)
  names(out) <- c("select", "from", "where", "group_by", "having", "order_by", "limit")

  out$select   <- sql_clause_select(select, con, distinct)
  out$from     <- sql_clause_from(from, con)
  out$where    <- sql_clause_where(where, con)
  out$group_by <- sql_clause_group_by(group_by, con)
  out$having   <- sql_clause_having(having, con)
  out$order_by <- sql_clause_order_by(order_by, con)
  out$limit    <- sql_clause_limit(limit, con)

  escape(unname(purrr::compact(out)), collapse = "\n", parens = FALSE, con = con)
}
dbplyr/vignettes/0000755000176200001440000000000013501765351013564 5ustar liggesusersdbplyr/vignettes/reprex.Rmd0000644000176200001440000000556613426327622015542 0ustar liggesusers---
title: "Reprexes for dbplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{reprex}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

If you're reporting a bug in dbplyr, it is much easier for me to help you if you can supply a [reprex](https://reprex.tidyverse.org) that I can run on my computer. Creating reprexes for dbplyr is particularly challenging because you are probably using a database that you can't share with me. Fortunately, in many cases you can still demonstrate the problem even if I don't have the complete dataset, or even access to the database system that you're using.

This vignette outlines three approaches for creating reprexes that will work anywhere:

* Use `memdb_frame()`/`tbl_memdb()` to easily create datasets that live in an in-memory SQLite database.

* Use `lazy_frame()`/`tbl_lazy()` to simulate SQL generation of dplyr pipelines.

* Use `translate_sql()` to simulate SQL generation of columnar expressions.
```{r setup, message = FALSE}
library(dplyr)
library(dbplyr)
```

## Using `memdb_frame()`

The first place to start is with SQLite. SQLite is particularly appealing because it's completely embedded inside an R package, so it doesn't have any external dependencies. SQLite is designed to be small and simple, so it can't demonstrate all problems, but it's easy to try out and a great place to start.

You can easily create a SQLite in-memory database table using `memdb_frame()`:

```{r}
mf <- memdb_frame(g = c(1, 1, 2, 2, 2), x = 1:5, y = 5:1)
mf

mf %>%
  group_by(g) %>%
  summarise_all(mean, na.rm = TRUE)
```

Reprexes are easiest to understand if you create very small custom data, but if you do want to use an existing data frame you can use `tbl_memdb()`:

```{r}
mtcars_db <- tbl_memdb(mtcars)
mtcars_db %>%
  count(cyl) %>%
  show_query()
```

## Translating verbs

Many problems with dbplyr come down to incorrect SQL generation. Fortunately, it's possible to generate SQL without a database using `lazy_frame()` and `tbl_lazy()`. Both take a `con` argument which takes a database "simulator" like `simulate_postgres()`, `simulate_sqlite()`, etc.

```{r}
x <- c("abc", "def", "ghif")

lazy_frame(x = x, con = simulate_postgres()) %>%
  head(5) %>%
  show_query()

lazy_frame(x = x, con = simulate_mssql()) %>%
  head(5) %>%
  show_query()
```

If you isolate the problem to incorrect SQL generation, it would be very helpful if you could also suggest more appropriate SQL.

## Translating individual expressions

In some cases, you might be able to track the problem down to incorrect translation for a single column expression.
In that case, you can make your reprex even simpler with `translate_sql()`:

```{r}
translate_sql(substr(x, 1, 2), con = simulate_postgres())
translate_sql(substr(x, 1, 2), con = simulate_sqlite())
```
dbplyr/vignettes/dbplyr.Rmd0000644000176200001440000002632013474056125015530 0ustar liggesusers---
title: "Introduction to dbplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to dbplyr}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 6L, tibble.print_max = 6L, digits = 3)
```

As well as working with local in-memory data stored in data frames, dplyr also works with remote on-disk data stored in databases. This is particularly useful in two scenarios:

* Your data is already in a database.

* You have so much data that it does not all fit into memory simultaneously and you need to use some external storage engine.

(If your data fits in memory there is no advantage to putting it in a database: it will only be slower and more frustrating.)

This vignette focuses on the first scenario because it's the most common. If you're using R to do data analysis inside a company, most of the data you need probably already lives in a database (it's just a matter of figuring out which one!). However, you will learn how to load data into a local database in order to demonstrate dplyr's database tools. At the end, I'll also give you a few pointers if you do need to set up your own database.

## Getting started

To use databases with dplyr you need to first install dbplyr:

```{r, eval = FALSE}
install.packages("dbplyr")
```

You'll also need to install a DBI backend package. The DBI package provides a common interface that allows dplyr to work with many different databases using the same code. DBI is automatically installed with dbplyr, but you need to install a specific backend for the database that you want to connect to.
Five commonly used backends are:

* [RMariaDB](https://CRAN.R-project.org/package=RMariaDB) connects to MySQL and MariaDB

* [RPostgres](https://CRAN.R-project.org/package=RPostgres) connects to Postgres and Redshift.

* [RSQLite](https://github.com/rstats-db/RSQLite) embeds a SQLite database.

* [odbc](https://github.com/rstats-db/odbc#odbc) connects to many commercial databases via the open database connectivity protocol.

* [bigrquery](https://github.com/rstats-db/bigrquery) connects to Google's BigQuery.

If the database you need to connect to is not listed here, you'll need to do some investigation (i.e. googling) yourself.

In this vignette, we're going to use the RSQLite backend, which is automatically installed when you install dbplyr. SQLite is a great way to get started with databases because it's completely embedded inside an R package. Unlike most other systems, you don't need to set up a separate database server. SQLite is great for demos, but is surprisingly powerful, and with a little practice you can use it to easily work with many gigabytes of data.

## Connecting to the database

To work with a database in dplyr, you must first connect to it, using `DBI::dbConnect()`. We're not going to go into the details of the DBI package here, but it's the foundation upon which dbplyr is built. You'll need to learn more about it if you need to do things to the database that are beyond the scope of dplyr.

```{r setup, message = FALSE}
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
```

The arguments to `DBI::dbConnect()` vary from database to database, but the first argument is always the database backend. It's `RSQLite::SQLite()` for RSQLite, `RMariaDB::MariaDB()` for RMariaDB, `RPostgres::Postgres()` for RPostgres, `odbc::odbc()` for odbc, and `bigrquery::bigquery()` for BigQuery. SQLite only needs one other argument: the path to the database.
Here we use the special string `":memory:"` which causes SQLite to make a temporary in-memory database.

Most existing databases don't live in a file, but instead live on another server. That means that, in real life, your code will look more like this:

```{r, eval = FALSE}
con <- DBI::dbConnect(RMariaDB::MariaDB(),
  host = "database.rstudio.com",
  user = "hadley",
  password = rstudioapi::askForPassword("Database password")
)
```

(If you're not using RStudio, you'll need some other way to securely retrieve your password. You should never record it in your analysis scripts or type it into the console. [Securing Credentials](https://db.rstudio.com/best-practices/managing-credentials) provides some best practices.)

Our temporary database has no data in it, so we'll start by copying over `nycflights13::flights` using the convenient `copy_to()` function. This is a quick and dirty way of getting data into a database and is useful primarily for demos and other small jobs.

```{r}
copy_to(con, nycflights13::flights, "flights",
  temporary = FALSE,
  indexes = list(
    c("year", "month", "day"),
    "carrier",
    "tailnum",
    "dest"
  )
)
```

As you can see, the `copy_to()` operation has an additional argument that allows you to supply indexes for the table. Here we set up indexes that will allow us to quickly process the data by day, carrier, plane, and destination. Creating the right indices is key to good database performance, but is unfortunately beyond the scope of this article.

Now that we've copied the data, we can use `tbl()` to take a reference to it:

```{r}
flights_db <- tbl(con, "flights")
```

When you print it out, you'll notice that it mostly looks like a regular tibble:

```{r}
flights_db
```

The main difference is that you can see that it's a remote source in a SQLite database.

## Generating queries

To interact with a database you usually use SQL, the Structured Query Language. SQL is over 40 years old, and is used by pretty much every database in existence.
The goal of dbplyr is to automatically generate SQL for you so that you're not forced to use it. However, SQL is a very large language and dbplyr doesn't do everything. It focusses on `SELECT` statements, the SQL you write most often as an analyst. Most of the time you don't need to know anything about SQL, and you can continue to use the dplyr verbs that you're already familiar with: ```{r} flights_db %>% select(year:day, dep_delay, arr_delay) flights_db %>% filter(dep_delay > 240) flights_db %>% group_by(dest) %>% summarise(delay = mean(dep_time)) ``` However, in the long-run, I highly recommend you at least learn the basics of SQL. It's a valuable skill for any data scientist, and it will help you debug problems if you run into problems with dplyr's automatic translation. If you're completely new to SQL you might start with this [codeacademy tutorial](https://www.codecademy.com/learn/learn-sql). If you have some familiarity with SQL and you'd like to learn more, I found [how indexes work in SQLite](http://www.sqlite.org/queryplanner.html) and [10 easy steps to a complete understanding of SQL](http://blog.jooq.org/2016/03/17/10-easy-steps-to-a-complete-understanding-of-sql) to be particularly helpful. The most important difference between ordinary data frames and remote database queries is that your R code is translated into SQL and executed in the database on the remote server, not in R on your local machine. When working with databases, dplyr tries to be as lazy as possible: * It never pulls data into R unless you explicitly ask for it. * It delays doing any work until the last possible moment: it collects together everything you want to do and then sends it to the database in one step. For example, take the following code: ```{r} tailnum_delay_db <- flights_db %>% group_by(tailnum) %>% summarise( delay = mean(arr_delay), n = n() ) %>% arrange(desc(delay)) %>% filter(n > 100) ``` Surprisingly, this sequence of operations never touches the database. 
It's not until you ask for the data (e.g. by printing `tailnum_delay_db`) that dplyr generates the SQL and requests the results from the database. Even then it tries to do as little work as possible and only pulls down a few rows.

```{r}
tailnum_delay_db
```

Behind the scenes, dplyr is translating your R code into SQL. You can see the SQL it's generating with `show_query()`:

```{r}
tailnum_delay_db %>% show_query()
```

If you're familiar with SQL, this probably isn't exactly what you'd write by hand, but it does the job. You can learn more about the SQL translation in `vignette("translation-verb")` and `vignette("translation-function")`.

Typically, you'll iterate a few times before you figure out what data you need from the database. Once you've figured it out, use `collect()` to pull all the data down into a local tibble:

```{r}
tailnum_delay <- tailnum_delay_db %>% collect()
tailnum_delay
```

`collect()` requires that the database does some work, so it may take a long time to complete. Otherwise, dplyr tries to prevent you from accidentally performing expensive query operations:

* Because there's generally no way to determine how many rows a query will return unless you actually run it, `nrow()` is always `NA`.

* Because you can't find the last few rows without executing the whole query, you can't use `tail()`.

```{r, error = TRUE}
nrow(tailnum_delay_db)

tail(tailnum_delay_db)
```

You can also ask the database how it plans to execute the query with `explain()`. The output is database dependent, and can be esoteric, but learning a bit about it can be very useful because it helps you understand if the database can execute the query efficiently, or if you need to create new indices.

## Creating your own database

If you don't already have a database, here's some advice from my experiences setting up and running all of them. SQLite is by far the easiest to get started with. PostgreSQL is not too much harder to use and has a wide range of built-in functions.
In my opinion, you shouldn't bother with MySQL/MariaDB: it's a pain to set up, the documentation is subpar, and it's less featureful than Postgres. Google BigQuery might be a good fit if you have very large data, or if you're willing to pay (a small amount of) money to someone who'll look after your database.

All of these databases follow a client-server model: one computer connects to the database (the client) and another computer runs the database (the server). The two may be one and the same, but usually aren't. Getting one of these databases up and running is beyond the scope of this article, but there are plenty of tutorials available on the web.

### MySQL/MariaDB

In terms of functionality, MySQL lies somewhere between SQLite and PostgreSQL. It provides a wider range of [built-in functions](http://dev.mysql.com/doc/refman/5.0/en/functions.html). It gained support for window functions in 2018.

### PostgreSQL

PostgreSQL is a considerably more powerful database than SQLite. It has a much wider range of [built-in functions](http://www.postgresql.org/docs/current/static/functions.html), and is generally a more featureful database.

### BigQuery

BigQuery is a hosted database server provided by Google. To connect, you need to provide your `project`, `dataset` and optionally a project for `billing` (if billing for `project` isn't enabled).

It provides a similar set of functions to Postgres and is designed specifically for analytic workflows. Because it's a hosted solution, there's no setup involved, but if you have a lot of data, getting it to Google can be an ordeal (especially because upload support from R is not great currently). (If you have lots of data, you can [ship hard drives]()!)
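To give a flavour of what connecting to a client-server database looks like, here is a minimal sketch using DBI. The driver package, database name, host, and credentials are assumptions; substitute the details for your own setup:

```{r, eval = FALSE}
library(DBI)

# Hypothetical connection details: adjust for your own server
con <- dbConnect(
  RPostgres::Postgres(),
  dbname = "mydb",
  host = "localhost",
  user = "myuser",
  password = rstudioapi::askForPassword("Database password")
)

flights_db <- tbl(con, "flights")
```

Once the connection exists, dbplyr works through it exactly as in the SQLite examples above.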
---
title: "Verb translation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Verb translation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

There are two parts to dbplyr SQL translation: translating dplyr verbs, and translating expressions within those verbs. This vignette describes how entire verbs are translated; `vignette("translate-function")` describes how individual expressions within those verbs are translated.

All dplyr verbs generate a `SELECT` statement. To demonstrate, we'll make a temporary database with a couple of tables:

```{r, message = FALSE}
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
flights <- copy_to(con, nycflights13::flights)
airports <- copy_to(con, nycflights13::airports)
```

## Single table verbs

* `select()` and `mutate()` modify the `SELECT` clause:

    ```{r}
    flights %>%
      select(contains("delay")) %>%
      show_query()

    flights %>%
      select(distance, air_time) %>%
      mutate(speed = distance / (air_time / 60)) %>%
      show_query()
    ```

    (As you can see here, the generated SQL isn't always as minimal as you might generate by hand.)
* `filter()` generates a `WHERE` clause:

    ```{r}
    flights %>%
      filter(month == 1, day == 1) %>%
      show_query()
    ```

* `arrange()` generates an `ORDER BY` clause:

    ```{r}
    flights %>%
      arrange(carrier, desc(arr_delay)) %>%
      show_query()
    ```

* `summarise()` and `group_by()` work together to generate a `GROUP BY` clause:

    ```{r}
    flights %>%
      group_by(month, day) %>%
      summarise(delay = mean(dep_delay)) %>%
      show_query()
    ```

## Dual table verbs

| R                | SQL
|------------------|------------------------------------------------------------
| `inner_join()`   | `SELECT * FROM x JOIN y ON x.a = y.a`
| `left_join()`    | `SELECT * FROM x LEFT JOIN y ON x.a = y.a`
| `right_join()`   | `SELECT * FROM x RIGHT JOIN y ON x.a = y.a`
| `full_join()`    | `SELECT * FROM x FULL JOIN y ON x.a = y.a`
| `semi_join()`    | `SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a)`
| `anti_join()`    | `SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)`
| `intersect(x, y)`| `SELECT * FROM x INTERSECT SELECT * FROM y`
| `union(x, y)`    | `SELECT * FROM x UNION SELECT * FROM y`
| `setdiff(x, y)`  | `SELECT * FROM x EXCEPT SELECT * FROM y`

`x` and `y` don't have to be tables in the same database. If you specify `copy = TRUE`, dplyr will copy the `y` table into the same location as the `x` variable. This is useful if you've downloaded a summarised dataset and determined a subset of interest that you now want the full data for. You can use `semi_join(x, y, copy = TRUE)` to upload the indices of interest to a temporary table in the same database as `x`, and then perform an efficient semi join in the database.

If you're working with large data, it may also be helpful to set `auto_index = TRUE`. That will automatically add an index on the join variables to the temporary table.

## Behind the scenes

The verb level SQL translation is implemented on top of `tbl_lazy`, which basically tracks the operations you perform in a pipeline (see `lazy-ops.R`).
Turning that into a SQL query takes place in three steps:

* `sql_build()` recurses over the lazy op data structure building up query objects (`select_query()`, `join_query()`, `set_op_query()` etc) that represent the different subtypes of `SELECT` queries that we might generate.

* `sql_optimise()` takes a pass over these SQL objects, looking for potential optimisations. Currently this only involves removing subqueries where possible.

* `sql_render()` calls an SQL generation function (`sql_select()`, `sql_join()`, `sql_subquery()`, `sql_semi_join()` etc) to produce the actual SQL. Each of these functions is a generic, taking the connection as an argument, so that the details can be customised for different databases.

---
title: "Function translation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Function translation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

There are two parts to dbplyr SQL translation: translating dplyr verbs, and translating expressions within those verbs. This vignette describes how individual expressions (function calls) are translated; `vignette("translate-verb")` describes how entire verbs are translated.

```{r, message = FALSE}
library(dbplyr)
library(dplyr)
```

`dbplyr::translate_sql()` powers translation of individual function calls, and I'll use it extensively in this vignette to show what's happening. You shouldn't need to use it in ordinary code as dbplyr takes care of the translation automatically.

```{r}
translate_sql((x + y) / 2)
```

`translate_sql()` takes an optional `con` parameter. If not supplied, this causes dplyr to generate (approximately) SQL-92 compliant SQL.
If supplied, dplyr uses `sql_translate_env()` to look up a custom environment which makes it possible for different databases to generate slightly different SQL: see `vignette("new-backend")` for more details. You can use the various simulate helpers to see the translations used by different backends:

```{r}
translate_sql(x ^ 2L)
translate_sql(x ^ 2L, con = simulate_sqlite())
translate_sql(x ^ 2L, con = simulate_access())
```

Perfect translation is not possible because databases don't have all the functions that R does. The goal of dplyr is to provide a semantic rather than a literal translation: what you mean, rather than precisely what is done. In fact, even for functions that exist both in databases and R, you shouldn't expect results to be identical; database programmers have different priorities than R core programmers. For example, in R, in order to get a higher level of numerical accuracy, `mean()` loops through the data twice. R's `mean()` also provides a `trim` option for computing trimmed means; this is something that databases do not provide.

If you're interested in how `translate_sql()` is implemented, the basic techniques that underlie its implementation are described in ["Advanced R"](http://adv-r.hadley.nz/translation.html).

## Basic differences

The following examples work through some of the basic differences between R and SQL.

* `"` and `'` mean different things:

    ```{r}
    # In SQLite variable names are escaped by double quotes:
    translate_sql(x)

    # And strings are escaped by single quotes
    translate_sql("x")
    ```

* And some functions have different argument orders:

    ```{r}
    translate_sql(substr(x, 5, 10))
    translate_sql(log(x, 10))
    ```

* R and SQL have different defaults for integers and reals. In R, 1 is a real, and 1L is an integer.
In SQL, 1 is an integer, and 1.0 is a real.

```{r}
translate_sql(1)
translate_sql(1L)
```

## Known functions

### Mathematics

* basic math operators: `+`, `-`, `*`, `/`, `^`
* trigonometry: `acos()`, `asin()`, `atan()`, `atan2()`, `cos()`, `cot()`, `tan()`, `sin()`
* hyperbolic: `cosh()`, `coth()`, `sinh()`, `tanh()`
* logarithmic: `log()`, `log10()`, `exp()`
* misc: `abs()`, `ceiling()`, `sqrt()`, `sign()`, `round()`

### Modulo arithmetic

dbplyr translates `%%` and `%/%` to their SQL equivalents, but note that they are not precisely the same: most databases use truncated division, where the result of the modulo operator takes the sign of the dividend, whereas R uses the mathematically preferred floored division, where the result takes the sign of the divisor.

```{r}
df <- tibble(
  x = c(10L, 10L, -10L, -10L),
  y = c(3L, -3L, 3L, -3L)
)
mf <- src_memdb() %>% copy_to(df, overwrite = TRUE)

df %>% mutate(x %% y, x %/% y)
mf %>% mutate(x %% y, x %/% y)
```

### Logical comparisons

* logical comparisons: `<`, `<=`, `!=`, `>=`, `>`, `==`, `%in%`
* boolean operations: `&`, `&&`, `|`, `||`, `!`, `xor()`

### Aggregation

All databases provide translation for the basic aggregations: `mean()`, `sum()`, `min()`, `max()`, `sd()`, `var()`. Databases automatically drop NULLs (their equivalent of missing values) whereas in R you have to ask nicely. The aggregation functions warn you about this important difference:

```{r}
translate_sql(mean(x))
translate_sql(mean(x, na.rm = TRUE))
```

Note that, by default, `translate_sql()` assumes that the call is inside a `mutate()` or `filter()` and generates a window translation.
If you want to see the equivalent `summarise()`/aggregation translation, use `window = FALSE`:

```{r}
translate_sql(mean(x, na.rm = TRUE), window = FALSE)
```

### Conditional evaluation

`if` and `switch()` are translated to `CASE WHEN`:

```{r}
translate_sql(if (x > 5) "big" else "small")
translate_sql(switch(x, a = 1L, b = 2L, 3L))
```

### String manipulation

* string functions: `tolower`, `toupper`, `trimws`, `nchar`, `substr`
* coerce types: `as.numeric`, `as.integer`, `as.character`

### Date/time

## Unknown functions

Any function that dplyr doesn't know how to convert is left as is. This means that database functions that are not covered by dplyr can be used directly via `translate_sql()`. Here are a couple of examples that will work with [SQLite](http://www.sqlite.org/lang_corefunc.html):

```{r}
translate_sql(glob(x, y))
translate_sql(x %like% "ab%")
```

See `vignette("sql")` for more details.

## Window functions

Things get a little trickier with window functions, because SQL's window functions are considerably more expressive than the specific variants provided by base R or dplyr. They have the form `[expression] OVER ([partition clause] [order clause] [frame_clause])`:

* The __expression__ is a combination of variable names and window functions. Support for window functions varies from database to database, but most support the ranking functions, `lead`, `lag`, `nth`, `first`, `last`, `count`, `min`, `max`, `sum`, `avg` and `stddev`.

* The __partition clause__ specifies how the window function is broken down over groups. It plays an analogous role to `GROUP BY` for aggregate functions, and `group_by()` in dplyr. It is possible for different window functions to be partitioned into different groups, but not all databases support it, and neither does dplyr.

* The __order clause__ controls the ordering (when it makes a difference).
This is important for the ranking functions since it specifies which variables to rank by, but it's also needed for cumulative functions and lead. Whenever you're thinking about before and after in SQL, you must always tell it which variable defines the order. If the order clause is missing when needed, some databases fail with an error message while others return non-deterministic results.

* The __frame clause__ defines which rows, or __frame__, are passed to the window function, describing which rows (relative to the current row) should be included. The frame clause provides two offsets which determine the start and end of the frame. There are three special values: -Inf means to include all preceding rows (in SQL, "unbounded preceding"), 0 means the current row ("current row"), and Inf means all following rows ("unbounded following"). The complete set of options is comprehensive, but fairly confusing, and is summarised visually below.

```{r echo = FALSE, out.width = "100%"}
knitr::include_graphics("windows.png", dpi = 300)
```

Of the many possible specifications, there are only three that are commonly used. They select between aggregation variants:

* Recycled: `BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`
* Cumulative: `BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`
* Rolling: `BETWEEN 2 PRECEDING AND 2 FOLLOWING`

dplyr generates the frame clause based on whether you're using a recycled aggregate or a cumulative aggregate.

To see how individual window functions are translated to SQL, we can again use `translate_sql()`:

```{r}
translate_sql(mean(G))
translate_sql(rank(G))
translate_sql(ntile(G, 2))
translate_sql(lag(G))
```

If the tbl has been grouped or arranged previously in the pipeline, then dplyr will use that information to set the "partition by" and "order by" clauses.
For interactive exploration, you can achieve the same effect by setting the `vars_group` and `vars_order` arguments to `translate_sql()`:

```{r}
translate_sql(cummean(G), vars_order = "year")
translate_sql(rank(), vars_group = "ID")
```

There are some challenges when translating window functions between R and SQL, because dplyr tries to keep the window functions as similar as possible to both the existing R analogues and to the SQL functions. This means that there are three ways to control the order clause depending on which window function you're using:

* For ranking functions, the ordering variable is the first argument: `rank(x)`, `ntile(y, 2)`. If omitted or `NULL`, will use the default ordering associated with the tbl (as set by `arrange()`).

* Accumulating aggregates only take a single argument (the vector to aggregate). To control ordering, use `order_by()`.

* Aggregates implemented in dplyr (`lead`, `lag`, `nth_value`, `first_value`, `last_value`) have an `order_by` argument. Supply it to override the default ordering.

The three options are illustrated in the snippet below:

```{r, eval = FALSE}
mutate(players,
  min_rank(yearID),
  order_by(yearID, cumsum(G)),
  lead(G, order_by = yearID)
)
```

Currently there is no way to order by multiple variables, except by setting the default ordering with `arrange()`. This will be added in a future release.
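As a quick sketch of the last bullet (not from the original text, and the exact SQL printed depends on your dbplyr version), `translate_sql()` can show how an explicit `order_by` argument feeds the `ORDER BY` part of the window clause:

```{r, eval = FALSE}
# Uses the tbl's default ordering (none here, so no ORDER BY)
translate_sql(lead(G))

# Explicit ordering just for this call
translate_sql(lead(G, order_by = year))
```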
---
title: "Writing SQL with dbplyr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Writing SQL with dbplyr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette discusses why you might use dbplyr instead of writing SQL yourself, and what to do when dbplyr's built-in translations can't create the SQL that you need.

```{r setup, message = FALSE}
library(dplyr)
library(dbplyr)

mf <- memdb_frame(x = 1, y = 2)
```

## Why use dbplyr?

One simple nicety of dplyr is that it will automatically generate subqueries if you want to use a freshly created variable in `mutate()`:

```{r}
mf %>% 
  mutate(
    a = y * x, 
    b = a ^ 2,
  ) %>% 
  show_query()
```

In general, it's much easier to work iteratively in dbplyr.
You can easily give intermediate queries names, and reuse them in multiple places. Or if you have a common operation that you want to do to many queries, you can easily wrap it up in a function. It's also easy to chain `count()` to the end of any query to check the results are about what you expect.

## What happens when dbplyr fails?

dbplyr aims to translate the most common R functions to their SQL equivalents, allowing you to ignore the vagaries of the SQL dialect that you're working with, so you can focus on the data analysis problem at hand. But different backends have different capabilities, and sometimes there are SQL functions that don't have exact equivalents in R. In those cases, you'll need to write SQL code directly. This section shows you how you can do so.

### Prefix functions

Any function that dbplyr doesn't know about will be left as is:

```{r}
mf %>%
  mutate(z = foofify(x, y)) %>%
  show_query()
```

Because SQL functions are generally case insensitive, I recommend using upper case when you're using SQL functions in R code. That makes it easier to spot that you're doing something unusual:

```{r}
mf %>%
  mutate(z = FOOFIFY(x, y)) %>%
  show_query()
```

### Infix functions

As well as prefix functions (where the name of the function comes before the arguments), dbplyr also translates infix functions. That allows you to use expressions like `LIKE`, which does a limited form of pattern matching:

```{r}
mf %>%
  filter(x %LIKE% "%foo%") %>%
  show_query()
```

Or use `||` for string concatenation (note that backends should translate `paste()` and `paste0()` for you):

```{r}
mf %>%
  transmute(z = x %||% y) %>%
  show_query()
```

### Special forms

SQL functions tend to have a greater variety of syntax than R. That means there are a number of expressions that can't be translated directly from R code.
To insert these in your own queries, you can use literal SQL inside `sql()`:

```{r}
mf %>%
  transmute(factorial = sql("x!")) %>%
  show_query()

mf %>%
  transmute(factorial = sql("CAST(x AS FLOAT)")) %>%
  show_query()
```

Note that you can use `sql()` at any depth inside the expression:

```{r}
mf %>%
  filter(x == sql("ANY VALUES(1, 2, 3)")) %>%
  show_query()
```

dbplyr/vignettes/notes/0000755000176200001440000000000013474056125014715 5ustar liggesusersdbplyr/vignettes/notes/_mysql-setup.Rmd0000644000176200001440000000134013415745770020027 0ustar liggesusers# Setting up MariaDB

## Install

```
brew install mariadb
mysql_install_db --verbose --user=hadley --basedir=/usr/local \
  --datadir=/Users/hadley/db/mariadb --tmpdir=/tmp
mysqld --datadir='/Users/hadley/db/mysql'

mysql -u root -e "CREATE DATABASE lahman;"
mysql -u root -e "CREATE DATABASE nycflights13;"
mysql -u root -e "CREATE DATABASE test;"
```

## Start

```
mysqld --datadir='/Users/hadley/db/mysql'
```

## Connect

```{r, eval = FALSE}
install.packages("RMySQL")

library(RMySQL)
drv <- dbDriver("MySQL")
con <- dbConnect(drv, dbname = "lahman", username = "root", password = "")
dbListTables(con)
```

## Shut down

```
mysqladmin shutdown -u root -p
```

dbplyr/vignettes/notes/_postgres-setup.Rmd0000644000176200001440000000104113474056125020520 0ustar liggesusers# Setting up PostgreSQL

## Install

First install PostgreSQL, create a data directory, and create a default database.
```
brew install postgresql
export PGDATA=~/db/postgres-9.5 # set this globally somewhere
initdb -E utf8
createdb
createdb lahman
createdb nycflights13
```

## Start

```
pg_ctl start
```

## Connect

```{r, eval = FALSE}
install.packages("RPostgreSQL")

library(DBI)
con <- dbConnect(RPostgreSQL::PostgreSQL(), dbname = "hadley")
dbListTables(con)
```

dbplyr/vignettes/windows.graffle0000644000176200001440000000573413415745770016616 0ustar liggesusers

dbplyr/vignettes/new-backend.Rmd0000644000176200001440000000646413457577000016412 0ustar liggesusers---
title: "Adding a new DBI backend"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Adding a new DBI backend}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = T, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
```

This document describes how to add a new SQL backend to dbplyr. To begin:

* Ensure that you have a DBI compliant database backend. If not, you'll need to first create it by following the instructions in `vignette("backend", package = "DBI")`.

* You'll need a working knowledge of S3. Make sure that you're [familiar with the basics](http://adv-r.had.co.nz/OO-essentials.html#s3) before you start.

This document is still a work in progress, but it will hopefully get you started. I'd also strongly recommend reading the bundled source code for [SQLite](https://github.com/tidyverse/dbplyr/blob/master/R/backend-sqlite.R), [MySQL](https://github.com/tidyverse/dbplyr/blob/master/R/backend-mysql.R), and [PostgreSQL](https://github.com/tidyverse/dbplyr/blob/master/R/backend-postgres.R).
## First steps

For interactive exploration, attach dplyr and DBI. If you're creating a package, you'll need to import dplyr and DBI.

```{r setup, message = FALSE}
library(dplyr)
library(DBI)
```

Check that you can create a tbl from a connection, like:

```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

tbl(con, "mtcars")
```

If you can't, this likely indicates some problem with the DBI methods. Use [DBItest](https://github.com/rstats-db/DBItest) to narrow down the problem.

Now is a good time to implement a method for `db_desc()`. This should briefly describe the connection, typically formatting the information returned from `dbGetInfo()`. This is what dbplyr does for Postgres connections:

```{r}
#' @export
db_desc.PostgreSQLConnection <- function(x) {
  info <- dbGetInfo(x)
  host <- if (info$host == "") "localhost" else info$host

  paste0("postgres ", info$serverVersion, " [", info$user, "@",
    host, ":", info$port, "/", info$dbname, "]")
}
```

## Copying, computing, collecting and collapsing

Next, check that `copy_to()`, `collapse()`, `compute()`, and `collect()` work.

* If `copy_to()` fails, it's likely you need a method for `db_write_table()`, `db_create_indexes()` or `db_analyze()`.

* If `collapse()` fails, your database has a non-standard way of constructing subqueries. Add a method for `sql_subquery()`.

* If `compute()` fails, your database has a non-standard way of saving queries in temporary tables. Add a method for `db_save_query()`.

## SQL translation

Make sure you've read `vignette("translation-verb")` so you have the lay of the land.
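A convenient way to inspect translations while you work through the checks below is to pair `lazy_frame()` with a connection simulator. This is a minimal sketch using the generic `simulate_dbi()` simulator; swap in your own backend's simulator once you have written one:

```{r}
library(dbplyr)

lf <- lazy_frame(g = c("a", "a", "b"), x = 1:3, con = simulate_dbi())

lf %>%
  group_by(g) %>%
  summarise(n = n()) %>%
  show_query()
```

Because no database is involved, this quickly isolates whether a problem lives in the SQL generation or in the DBI driver.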
### Verbs

Check that SQL translation for the key verbs works:

* `summarise()`, `mutate()`, `filter()` etc: powered by `sql_select()`
* `left_join()`, `inner_join()`: powered by `sql_join()`
* `semi_join()`, `anti_join()`: powered by `sql_semi_join()`
* `union()`, `intersect()`, `setdiff()`: powered by `sql_set_op()`

### Vectors

Finally, you may have to provide custom R -> SQL translation at the vector level by providing a method for `sql_translate_env()`. This function should return an object created by `sql_variant()`. See existing methods for examples.

dbplyr/README.md0000755000176200001440000000702213426363744013035 0ustar liggesusers

# dbplyr

[![Travis build status](https://travis-ci.org/tidyverse/dbplyr.svg?branch=master)](https://travis-ci.org/tidyverse/dbplyr)
[![CRAN status](https://www.r-pkg.org/badges/version/dbplyr)](https://cran.r-project.org/package=dbplyr)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/dbplyr/branch/master/graph/badge.svg)](https://codecov.io/gh/tidyverse/dbplyr?branch=master)

## Overview

dbplyr is the database backend for [dplyr](https://dplyr.tidyverse.org). It allows you to use remote database tables as if they are in-memory data frames by automatically converting dplyr code into SQL.

To learn more about why you might use dbplyr instead of writing SQL, see `vignette("sql")`. To learn more about the details of the SQL translation, see `vignette("translation-verb")` and `vignette("translation-function")`.

## Installation

``` r
# The easiest way to get dbplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dbplyr:
install.packages("dbplyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dbplyr")
```

## Usage

dbplyr is designed to work with database tables as if they were local data frames.
To demonstrate this I’ll first create an in-memory SQLite database and copy over a dataset:

``` r
library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
```

Note that you don’t actually need to load dbplyr with `library(dbplyr)`; dplyr automatically loads it for you when it sees you working with a database.

Database connections are coordinated by the DBI package; learn more in the DBI documentation.

Now you can retrieve a table using `tbl()` (see `?tbl_dbi` for more details). Printing it just retrieves the first few rows:

``` r
mtcars2 <- tbl(con, "mtcars")
mtcars2
#> # Source:   table<mtcars> [?? x 11]
#> # Database: sqlite 3.25.3 [:memory:]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with more rows
```

All dplyr calls are evaluated lazily, generating SQL that is only sent to the database when you request the data.
``` r
# lazily generates query
summary <- mtcars2 %>% 
  group_by(cyl) %>% 
  summarise(mpg = mean(mpg, na.rm = TRUE)) %>% 
  arrange(desc(mpg))

# see query
summary %>% show_query()
#> <SQL>
#> SELECT `cyl`, AVG(`mpg`) AS `mpg`
#> FROM `mtcars`
#> GROUP BY `cyl`
#> ORDER BY `mpg` DESC

# execute query and retrieve results
summary %>% collect()
#> # A tibble: 3 x 2
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     4  26.7
#> 2     6  19.7
#> 3     8  15.1
```
dbplyr/MD50000644000176200001440000002664213501770504012067 0ustar liggesusersdcbeefa83cb075cc64bb68678982e9bd *DESCRIPTION
4ac396cdf32c44b3d07b7d6c9fd388d0 *LICENSE
394732bdacac5fe5efe197d2c6a0ae43 *NAMESPACE
22e0bcbf1d25d9d219f9f531d13daa66 *NEWS.md
f36511f4b075460a88101b526ff1dd68 *R/backend-.R
d4c2936aa2eccf4e5ecb9f76ed3275e4 *R/backend-access.R
94d36b9737d1c15dcaa13739d867f0f2 *R/backend-hive.R
f07e371ee4ec2204e37db4ba5a130d55 *R/backend-impala.R
bed8153e0d5be857837bc85159922f7d *R/backend-mssql.R
5fe368d9b1cce56a06c98df29102be5b *R/backend-mysql.R
42cab1bfbf5f12c5a7a78e0104f998e6 *R/backend-odbc.R
026655a2120191f666f346a4e658be84 *R/backend-oracle.R
9eb6f6b0e820efd0869798e5a243dd9b *R/backend-postgres.R
09522685de1a734560c9cadcb398a509 *R/backend-sqlite.R
36c3680b9b0b01afcf8643226f638885 *R/backend-teradata.R
bd0dcd33db4b78d8c1e57433fabec232 *R/build-sql.R
bea8e68aac434159fe0df55514866cb7 *R/data-cache.R
e39bd321b28611ac1fbe579559e61df9 *R/data-lahman.R
76799c8d72351b5201c9a56f4d143a6e *R/data-nycflights13.R
206558b9a7408d36fb4740e2bbb0299f *R/dbplyr.R
8b96155bd8e80a2c3265edc17032105b *R/escape.R
674001667416552c1a812267730474ae *R/explain.R
7706bec0e1ecd6b88b4f516b454073b2 *R/ident.R
c6b362596ccb57afdc45fac8927ad16b *R/lazy-ops.R
a6d0a1cd208b4d42e25a9d7436db8978 *R/memdb.R
24affafa3666fc26f80a41315c808f3c *R/partial-eval.R
28576da7f9e03d7e666c01627a2324e0 *R/query-join.R
e294d26b377fd878ccc51718d33bdddf *R/query-select.R
57aad197f9fce9c07adc4c31121227d4 *R/query-semi-join.R
dd22fe9b6cda504c45b2ec5f51cde7a4 *R/query-set-op.R
5728dd0092cb05bbb985d63a8f92d220 
*R/query.R 32ec9e74991cc0599606e444086664e0 *R/remote.R 30f680d3d986096f8665aabaf90f888a *R/schema.R 3de6191e23b0340520f8878e3c62e30a *R/simulate.R 391681f677837d05d5d59e50caa0a336 *R/sql-build.R e54f163eb195e00135faadd1d015ae7c *R/sql-expr.R 3b330d5fbaa23fdc5e60777dc1ad53ab *R/sql.R 24cbc6b42eac264ba6149b4f039ed122 *R/src-sql.R 3fe69de62e541c28445c8ff69b03023d *R/src_dbi.R 23497a42c2cb8e7305c988f8476c8c79 *R/tbl-lazy.R 14458d8f2a69ff445688df24c7b69b04 *R/tbl-sql.R f3b6355fad06177520398ab143261fdf *R/test-frame.R af7a909243a02c5914f1cde70c991772 *R/testthat.R 9411c95cc1f9977954bf77ec84f3b0c9 *R/translate-sql-clause.R f1b70e4a3a0b6fb31d8d1230b2a6591b *R/translate-sql-conditional.R 3e79285384fbb2c250a17d5e6a76e721 *R/translate-sql-helpers.R be129d2f1a63b83dfd7e386ef10b71c3 *R/translate-sql-paste.R 04d3a1808fc3252fcab0d47547a87148 *R/translate-sql-quantile.R 6aab92198dc8944299c597aaecd8c8b4 *R/translate-sql-string.R 1ca67414a694180611f05047c6cb0299 *R/translate-sql-window.R af7f4e75c0a2be9bb3bc3088defb1c80 *R/translate-sql.R f40e978f9e7564b39cf781689b1e6a09 *R/utils-format.R d9f4880e95a8a892627a8f57cdec718d *R/utils.R 1c17e513f46e6606b54cd4eef34e2921 *R/verb-arrange.R 2ea799ed7ccec236d0ab12fc434f3452 *R/verb-compute.R 6255cb951fb462bc7fbad4154ecb8877 *R/verb-copy-to.R b774d761111d2346f607289bf915ea98 *R/verb-distinct.R 06e4b01d70e0bd760469dfff7a495d11 *R/verb-do-query.R 4142399f94271b32d59f2f8c0da00165 *R/verb-do.R 0e5b6530f123fb080c582b93febb1ae2 *R/verb-filter.R a6e3cd680ebd16f1f1b8af19078e884a *R/verb-group_by.R 8ad253e500126bd3a0a7562359c22adf *R/verb-head.R 2e6efe5d392d2d670401390026add57d *R/verb-joins.R 507b979d6d466ed804f96c9a7d54b47d *R/verb-mutate.R a91da6df597baf5898434946da9b2d9c *R/verb-pull.R bb8b835db1674e83c829b19992bcd9ae *R/verb-select.R baa7a0bde032e98810216eb2e1ba88b9 *R/verb-set-ops.R 1c3fd8e4a5e9b736acbc154eb25aa7dd *R/verb-summarise.R a4c5d6a771420be80b31fae9b74a0387 *R/verb-window.R 00766fd559f87b7a5846f0fb092f18b8 *R/zzz.R 
41836c781a73e657efee9bff9d306e7f *README.md 86b8cba257b3bbd29a518896d41f34ec *build/vignette.rds 226d49582833ed15e96c32c31f3569f2 *inst/doc/dbplyr.R bf5b76727c1f777d7e439ffb18cf836f *inst/doc/dbplyr.Rmd 125586ab3a98528f583cfe58bb16cc40 *inst/doc/dbplyr.html 1c2e34b8105de8e6491299ca6297863a *inst/doc/new-backend.R 17f5c9500ca3358769645a9fcf960acd *inst/doc/new-backend.Rmd f49ee74f9bfa61667c2bd7f87f097d04 *inst/doc/new-backend.html da22d54ef233099a3e5ed80de2e83093 *inst/doc/reprex.R 207aa5dee050c8f3af19775fdd198831 *inst/doc/reprex.Rmd af3a83c8ebf9dfbbdeca5e8def5d7190 *inst/doc/reprex.html 6339aedd6d505147710fb79c54772384 *inst/doc/sql.R c9bd277aab2bc659e4560e3e0bf48a2e *inst/doc/sql.Rmd 4d08c6716c3842d408697a1ff0acf7b8 *inst/doc/sql.html 371f8dfd372d956fed0ee70d6dddf853 *inst/doc/translation-function.R aeac3ebaced55b1fc02174bb68d08364 *inst/doc/translation-function.Rmd 68c31f80cea687542f0294198a8b517f *inst/doc/translation-function.html a5517474dac5fa4f800d98cea4769f32 *inst/doc/translation-verb.R 809d23dc7614390b9791010aab34bca5 *inst/doc/translation-verb.Rmd dd812ace13151953265056f9dc6e8d0b *inst/doc/translation-verb.html 75b52b72166502881831ea370f232aa3 *man/arrange.tbl_lazy.Rd 01dd69830eb85e0436331673d9453ab1 *man/build_sql.Rd e6537e2d0220170926b7711c9e7ca8d1 *man/collapse.tbl_sql.Rd 2fc7554e0fadf1e20f3b9a7f09b6358d *man/copy_to.src_sql.Rd 8d3ff25ac42eab57cc674acad5ec272c *man/db_copy_to.Rd d0c730cf79c8617f6c3349386352e073 *man/dbplyr-package.Rd 7d145db470cd2f3dbdfac3eda32eb354 *man/do.tbl_sql.Rd 631b43094d3d7c5825f8249e66869b68 *man/escape.Rd 07a131391baa568b5e0902cf435ead19 *man/figures/logo.png e8e28f9b7ff4392e2c9bffd736bbec41 *man/ident.Rd c6661194f65d8b5a83dcc0beed69cd14 *man/in_schema.Rd 3de4b5ece803e27d2a881a6e79080924 *man/join.tbl_sql.Rd d40e28b4f452a3537eb6e767dbb8fd37 *man/lahman.Rd 22606e77ad1a306a7cf0da4e5679db2c *man/lazy_ops.Rd a5c729c0191105659330552b4c7dc758 *man/memdb_frame.Rd 9e733c531fd12648436d76b356c3fbdf *man/named_commas.Rd 
8411c6a729623992f3b9581035a52d97 *man/nycflights13.Rd b465a3506a9595980f2cf8e43bcf6e77 *man/partial_eval.Rd 3cd83bd0677f9b1d02f7600ec286a76f *man/remote_name.Rd b1c7965a5db9ce4fa979c90d5c497462 *man/simulate_dbi.Rd e052fba850d59135142a82fbb746038a *man/sql.Rd 6f3708f958d122b319b44ab502076476 *man/sql_build.Rd 8e2b65071ecb6b3541f687d897c26d2f *man/sql_escape_logical.Rd 5cc626422a8a4a1bc05f27b56560f834 *man/sql_expr.Rd 5bf7cb049cf82b7f3bfdfba3afbf5553 *man/sql_quote.Rd 4e68526e9e93a01c8aec8fa080843770 *man/sql_variant.Rd 03c17e417f557ae3ec119e608f776f7b *man/src_dbi.Rd 0dda78a128e30875facea39c77552f22 *man/src_sql.Rd 30efa7d7c019dfcd91b78de885c5321d *man/tbl_lazy.Rd d50a2378b77e8fb3530dd77a339a61e2 *man/tbl_sql.Rd be514892df0c50f58a68962bad005945 *man/testing.Rd d872ffc392eee7040ae6417b714db86b *man/translate_sql.Rd cc01a1bf6e5466c2b7aadd127d59ef9c *man/win_over.Rd 478fc57c1d25fd53137cdc71fe81d5b4 *man/window_order.Rd c1a44682d2c53dba690e0222461853ae *tests/testthat.R 1cb26674b06f09bb38f7b27ee1223b7f *tests/testthat/helper-src.R 35f93f371541c003accc95e2b27b77aa *tests/testthat/sql/backend-quantile.sql 181b48cb2fa82cdda63eaa762482edcb *tests/testthat/sql/join-on.sql 6be4d34fdb6cd8b264f79aa524eb9972 *tests/testthat/sql/join.sql 2c1c1a36dfe79063c2a4e6755a680948 *tests/testthat/sql/mutate-select-collapse.sql a46bf017976364ef8332c01e744661d8 *tests/testthat/sql/mutate-select.sql db7054428924d8c4ca117503a0ea844c *tests/testthat/sql/mutate-subqueries.sql 2a2be3731d114c41089d0fb5a9becc47 *tests/testthat/sql/select-collapse.sql a46bf017976364ef8332c01e744661d8 *tests/testthat/sql/select-mutate-collapse.sql 9ad793f0827dc99657e643d60d1974b8 *tests/testthat/sql/semi-join.sql 65713c8c683c233e4495fc8484e6d006 *tests/testthat/sql/setop.sql 7e9cebced2ae0d15425af5eb9b2707ad *tests/testthat/test-backend-.R f4d5c8e43d5e150865520dfcc7ddb03a *tests/testthat/test-backend-access.R 595d4abfb1f8f4c4f31d7893b170fe33 *tests/testthat/test-backend-hive.R 4d2bc1e37c9d613c06ce89b37b41adfa 
*tests/testthat/test-backend-impala.R 08d1437baa9e5b97093d0c6b462b0ee9 *tests/testthat/test-backend-mssql.R 7dd729be48a581edcf620b1f1cd1e376 *tests/testthat/test-backend-mysql.R f27928b56e2cf732941064948278fcb5 *tests/testthat/test-backend-odbc.R ecfb5b7a73efe1ca2fa7022f59cba0ed *tests/testthat/test-backend-oracle.R d87797adc0a731e69358f5454a60632b *tests/testthat/test-backend-postgres.R 60a05b051b26a032e57b0781eac99e4f *tests/testthat/test-backend-sqlite.R 103e550c9f74bd262f240f9d32320c84 *tests/testthat/test-backend-teradata.R 28e7e25e1ea073c61cecd81084d01b51 *tests/testthat/test-escape.R 8dff97a9a888bb27e2eab1cdf4fb5411 *tests/testthat/test-explain-sqlite.txt e08a4ba6c33140f248409004d25a0ec8 *tests/testthat/test-ident.R d11e24e37211b77027e17bd149f52bda *tests/testthat/test-partial-eval.R 6309b7466f5934cbe8e298129dc04181 *tests/testthat/test-query-join-print.txt 552aa363dde69288af28560515c466fa *tests/testthat/test-query-join.R 766c8cc16254fb549590331a07d13b03 *tests/testthat/test-query-select-print.txt de0ace1f373fb439a6fe5ba46b6934b6 *tests/testthat/test-query-select.R 97cbabf8cc013ccf1ed9aeec63630e6f *tests/testthat/test-query-semi-join-print.txt cfbb6c1a74020e1eef239ddc0daa63a6 *tests/testthat/test-query-semi-join.R 1d02ca2bd36d600f0410ab09f2ddff5f *tests/testthat/test-query-set-op-print.txt c35eed8406fe84ba48fe054fc5430c8e *tests/testthat/test-query-set-op.R 4573b53bb30c4bfc6929ce170c7eaef0 *tests/testthat/test-remote.R 3365d949df0a76eccc9a7c383e57220f *tests/testthat/test-sql-build.R 11c342b715ced625a6da207b1f98ae54 *tests/testthat/test-sql-expr.R a0ac57eacbf4b67f592806f859e53e64 *tests/testthat/test-sql-translator.txt 3f3eb42aba921f62430591295dc5010c *tests/testthat/test-sql-variant.txt 731e7d20eac9c711ae9bc1c80a6ae29f *tests/testthat/test-sql.R f3cefdc1fb2717e3a34496d4e127aef4 *tests/testthat/test-src_dbi.R 5bae9c96444659499dde7a2f02568a53 *tests/testthat/test-tbl-lazy-print.txt 526be9e14e292bb2346480093b58e65a *tests/testthat/test-tbl-lazy.R 
28afde44670ef6dbf32bf626c9d29f4d *tests/testthat/test-tbl-sql.R 767290bcb13c3d28749ddf28b59f117a *tests/testthat/test-translate-sql-conditional.R 9fc7cd2be8eadd0ab5d99ac3fc3cc9a1 *tests/testthat/test-translate-sql-helpers.R 6153a968d54d17e6ddcf9276cb6ed50c *tests/testthat/test-translate-sql-paste.R f717954cd1f87a67caaaa4ce6520cc50 *tests/testthat/test-translate-sql-quantile.R 33f6997fe48b83f5ad9b62c3ec9d624b *tests/testthat/test-translate-sql-string.R a41e9df4b5d5eb91b62cbe97a76d4fd0 *tests/testthat/test-translate-sql-window.R 5ba89c8ce8ab6a188de44c017c75bf8c *tests/testthat/test-translate-sql.R b4e20d73b884c29f12c39bef8ec7d196 *tests/testthat/test-utils.R 18e406271eb47c7474da34d46ba9f999 *tests/testthat/test-verb-arrange.R fe358d47dd58c9aee1e68ad55c199361 *tests/testthat/test-verb-compute.R 99a92c0960d4df9b44d0cd97689c241f *tests/testthat/test-verb-copy-to.R 367913d81a45220cb464bdd3ad65d6e1 *tests/testthat/test-verb-distinct.R 79302b90d7d0a679c581179e7e347b20 *tests/testthat/test-verb-do.R 6cee7bffb715725ebb9771a5f485f87d *tests/testthat/test-verb-filter.R b50da71da438e59d7b923eca3410efc6 *tests/testthat/test-verb-group_by.R fb9cb271af411ae0ef87ca5b6bc69f0d *tests/testthat/test-verb-head.R 61ffc852fe73ee7c4a99ce2006a267d4 *tests/testthat/test-verb-joins.R 81b6837945d8fd3d3806c703e65f9467 *tests/testthat/test-verb-mutate.R e05ba76f5a4cf3677474299ee0772233 *tests/testthat/test-verb-pull.R b58a8f77f2145fba590e1b7b86342452 *tests/testthat/test-verb-select.R ca49cbe2fc3c720eb49bd25c147c8a5e *tests/testthat/test-verb-set-ops.R 61810f9bd59d39cb54b80a99370ee3df *tests/testthat/test-verb-summarise.R bf5b76727c1f777d7e439ffb18cf836f *vignettes/dbplyr.Rmd 17f5c9500ca3358769645a9fcf960acd *vignettes/new-backend.Rmd d821503b6e947231aa16d7ab925338f3 *vignettes/notes/_mysql-setup.Rmd 06e54db2684b7e21b6c91a56650bb988 *vignettes/notes/_postgres-setup.Rmd 207aa5dee050c8f3af19775fdd198831 *vignettes/reprex.Rmd c9bd277aab2bc659e4560e3e0bf48a2e *vignettes/sql.Rmd 
aeac3ebaced55b1fc02174bb68d08364 *vignettes/translation-function.Rmd
809d23dc7614390b9791010aab34bca5 *vignettes/translation-verb.Rmd
83cdde894e0c44ffda5a9dbae3c80092 *vignettes/windows.graffle
a21bea5bdf33d4e34a9dc472f7227de7 *vignettes/windows.png
dbplyr/build/0000755000176200001440000000000013501765344012655 5ustar liggesusersdbplyr/build/vignette.rds0000644000176200001440000000054213501765344015215 0ustar liggesusersdbplyr/DESCRIPTION0000644000176200001440000000544213501770504013263 0ustar liggesusersType: Package
Package: dbplyr
Title: A 'dplyr' Back End for Databases
Version: 1.4.2
Authors@R: 
    c(person(given = "Hadley", family = "Wickham", role = c("aut", "cre"),
             email = "hadley@rstudio.com"),
      person(given = "Edgar", family = "Ruiz", role = "aut"),
      person(given = "RStudio", role = c("cph", "fnd")))
Description: A 'dplyr' back end for databases that allows you to
    work with remote database tables as if they are in-memory data
    frames.  Basic features work with any database that has a 'DBI'
    back end; more advanced features require 'SQL' translation to be
    provided by the package author.
License: MIT + file LICENSE URL: https://dbplyr.tidyverse.org/, https://github.com/tidyverse/dbplyr BugReports: https://github.com/tidyverse/dbplyr/issues Depends: R (>= 3.1) Imports: assertthat (>= 0.2.0), DBI (>= 1.0.0), dplyr (>= 0.8.0), glue (>= 1.2.0), methods, purrr (>= 0.2.5), R6 (>= 2.2.2), rlang (>= 0.2.0), tibble (>= 1.4.2), tidyselect (>= 0.2.4), utils Suggests: bit64, covr, knitr, Lahman, nycflights13, RMariaDB (>= 1.0.2), rmarkdown, RMySQL (>= 0.10.11), RPostgreSQL (>= 0.4.1), RSQLite (>= 2.1.0), testthat (>= 2.0.0), withr (>= 2.1.2) VignetteBuilder: knitr Encoding: UTF-8 Language: en-gb LazyData: yes RoxygenNote: 6.1.1 Collate: 'utils.R' 'sql.R' 'escape.R' 'translate-sql-quantile.R' 'translate-sql-string.R' 'translate-sql-paste.R' 'translate-sql-helpers.R' 'translate-sql-window.R' 'translate-sql-conditional.R' 'backend-.R' 'backend-access.R' 'backend-hive.R' 'backend-impala.R' 'backend-mssql.R' 'backend-mysql.R' 'backend-odbc.R' 'backend-oracle.R' 'backend-postgres.R' 'backend-sqlite.R' 'backend-teradata.R' 'build-sql.R' 'data-cache.R' 'data-lahman.R' 'data-nycflights13.R' 'dbplyr.R' 'explain.R' 'ident.R' 'lazy-ops.R' 'memdb.R' 'partial-eval.R' 'query-join.R' 'query-select.R' 'query-semi-join.R' 'query-set-op.R' 'query.R' 'remote.R' 'schema.R' 'simulate.R' 'sql-build.R' 'sql-expr.R' 'src-sql.R' 'src_dbi.R' 'tbl-lazy.R' 'tbl-sql.R' 'test-frame.R' 'testthat.R' 'translate-sql-clause.R' 'translate-sql.R' 'utils-format.R' 'verb-arrange.R' 'verb-compute.R' 'verb-copy-to.R' 'verb-distinct.R' 'verb-do-query.R' 'verb-do.R' 'verb-filter.R' 'verb-group_by.R' 'verb-head.R' 'verb-joins.R' 'verb-mutate.R' 'verb-pull.R' 'verb-select.R' 'verb-set-ops.R' 'verb-summarise.R' 'verb-window.R' 'zzz.R' NeedsCompilation: no Packaged: 2019-06-17 19:32:57 UTC; hadley Author: Hadley Wickham [aut, cre], Edgar Ruiz [aut], RStudio [cph, fnd] Maintainer: Hadley Wickham Repository: CRAN Date/Publication: 2019-06-17 20:00:04 UTC 
dbplyr/man/0000755000176200001440000000000013474056125012330 5ustar liggesusersdbplyr/man/copy_to.src_sql.Rd0000644000176200001440000000462513416420631015740 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/verb-copy-to.R \name{copy_to.src_sql} \alias{copy_to.src_sql} \title{Copy a local data frame to a DBI backend.} \usage{ \method{copy_to}{src_sql}(dest, df, name = deparse(substitute(df)), overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) } \arguments{ \item{dest}{remote data source} \item{df}{A local data frame, a \code{tbl_sql} from same source, or a \code{tbl_sql} from another source. If from another source, all data must transition through R in one pass, so it is only suitable for transferring small amounts of data.} \item{name}{name for new remote table.} \item{overwrite}{If \code{TRUE}, will overwrite an existing table with name \code{name}. If \code{FALSE}, will throw an error if \code{name} already exists.} \item{types}{a character vector giving variable types to use for the columns. See \url{http://www.sqlite.org/datatype3.html} for available types.} \item{temporary}{if \code{TRUE}, will create a temporary table that is local to this connection and will be automatically deleted when the connection expires} \item{unique_indexes}{a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure.} \item{indexes}{a list of character vectors. Each element of the list will create a new index.} \item{analyze}{if \code{TRUE} (the default), will automatically ANALYZE the new table so that the query optimiser has useful information.} \item{...}{other parameters passed to methods.} } \value{ A \code{\link[=tbl]{tbl()}} object (invisibly). } \description{ This \code{\link[=copy_to]{copy_to()}} method works for all DBI sources. 
It is useful for copying small amounts of data to a database
for examples, experiments, and joins. By default, it creates
temporary tables which are typically only visible to the current
connection to the database.
}
\examples{
library(dplyr)
set.seed(1014)

mtcars$model <- rownames(mtcars)
mtcars2 <- src_memdb() \%>\%
  copy_to(mtcars, indexes = list("model"), overwrite = TRUE)

mtcars2 \%>\%
  filter(model == "Hornet 4 Drive")

cyl8 <- mtcars2 \%>\% filter(cyl == 8)
cyl8_cached <- copy_to(src_memdb(), cyl8)

# copy_to is called automatically if you set copy = TRUE
# in the join functions
df <- tibble(cyl = c(6, 8))
mtcars2 \%>\% semi_join(df, copy = TRUE)
}
dbplyr/man/figures/0000755000176200001440000000000013415745770013772 5ustar liggesusersdbplyr/man/figures/logo.png0000644000176200001440000005260413415745770015447 0ustar liggesusers
dbplyr/man/src_dbi.Rd0000644000176200001440000000751713415745770014240 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/src_dbi.R
\name{src_dbi}
\alias{src_dbi}
\alias{tbl.src_dbi}
\alias{tbl_dbi}
\title{dplyr backend for any DBI-compatible database}
\usage{
src_dbi(con, auto_disconnect = FALSE)

\method{tbl}{src_dbi}(src, from, ...)
}
\arguments{
\item{con}{An object that inherits from \link[DBI:DBIConnection-class]{DBI::DBIConnection},
typically generated by \link[DBI:dbConnect]{DBI::dbConnect}}

\item{auto_disconnect}{Should the connection be automatically closed when
the src is deleted?
Set to \code{TRUE} if you initialize the connection in the call to
\code{src_dbi()}. Pass \code{NA} to auto-disconnect but print a message when
this happens.}

\item{src}{Either a \code{src_dbi} or \code{DBIConnection}}

\item{from}{Either a string (giving a table name) or literal \code{\link[=sql]{sql()}}.}

\item{...}{Needed for compatibility with generic; currently ignored.}
}
\value{
An S3 object with class \code{src_dbi}, \code{src_sql}, \code{src}.
}
\description{
\code{src_dbi()} is a general dplyr backend that connects to any DBI driver.
\code{src_memdb()} connects to a temporary in-memory SQLite database, which is
useful for testing and experimenting.

You can generate a \code{tbl()} directly from the DBI connection, or
go via \code{src_dbi()}.
}
\details{
All data manipulation on SQL tbls is lazy: it will not actually run the
query or retrieve the data unless you ask for it; each verb returns a new
\code{tbl_dbi} object. Use \code{\link[=compute]{compute()}} to run the query and save the
results in a temporary table in the database, or use \code{\link[=collect]{collect()}} to
retrieve the results to R. You can see the query with \code{\link[=show_query]{show_query()}}.

For best performance, the database should have an index on the variables
that you are grouping by. Use \code{\link[=explain]{explain()}} to check that the database is
using the indexes that you expect.

There is one exception: \code{\link[=do]{do()}} is not lazy since it must pull the data
into R.
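This laziness is easy to see for yourself: building up a pipeline returns
instantly, and only \code{compute()} or \code{collect()} touches the database.
A minimal sketch (assuming the RSQLite package is installed):

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
src <- src_dbi(con, auto_disconnect = TRUE)
copy_to(src, mtcars)

# Nothing runs against the database yet: this only records the operations
lazy <- summarise(
  filter(tbl(src, "mtcars"), cyl == 8),
  mpg = mean(mpg, na.rm = TRUE)
)
show_query(lazy)       # inspect the SQL that would be generated

tmp <- compute(lazy)   # materialise the result in a temporary database table
local <- collect(lazy) # pull the result back into an ordinary data frame
```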
}
\examples{
# Basic connection using DBI -------------------------------------------
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
src <- src_dbi(con, auto_disconnect = TRUE)

# Add some data
copy_to(src, mtcars)
src
DBI::dbListTables(con)

# To retrieve a single table from a source, use `tbl()`
src \%>\% tbl("mtcars")

# You can also pass raw SQL if you want a more sophisticated query
src \%>\% tbl(sql("SELECT * FROM mtcars WHERE cyl = 8"))

# Alternatively, you can use the `src_sqlite()` helper
src2 <- src_sqlite(":memory:", create = TRUE)

# If you just want a temporary in-memory database, use src_memdb()
src3 <- src_memdb()

# To show off the full features of dplyr's database integration,
# we'll use the Lahman database. lahman_sqlite() takes care of
# creating the database.
if (has_lahman("sqlite")) {
lahman_p <- lahman_sqlite()
batting <- lahman_p \%>\% tbl("Batting")
batting

# Basic data manipulation verbs work in the same way as with a tibble
batting \%>\% filter(yearID > 2005, G > 130)
batting \%>\% select(playerID:lgID)
batting \%>\% arrange(playerID, desc(yearID))
batting \%>\% summarise(G = mean(G), n = n())

# There are a few exceptions. For example, databases give integer results
# when dividing one integer by another. Multiply by 1.0 to fix the problem
batting \%>\%
  select(playerID:lgID, AB, R, G) \%>\%
  mutate(
    R_per_game1 = R / G,
    R_per_game2 = R * 1.0 / G
  )

# All operations are lazy: they don't do anything until you request the
# data, either by `print()`ing it (which shows the first ten rows),
# or by `collect()`ing the results locally.
system.time(recent <- filter(batting, yearID > 2010)) system.time(collect(recent)) # You can see the query that dplyr creates with show_query() batting \%>\% filter(G > 0) \%>\% group_by(playerID) \%>\% summarise(n = n()) \%>\% show_query() } } dbplyr/man/testing.Rd0000644000176200001440000000160213443245336014273 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/test-frame.R \name{testing} \alias{testing} \alias{test_register_src} \alias{test_register_con} \alias{src_test} \alias{test_load} \alias{test_frame} \title{Infrastructure for testing dplyr} \usage{ test_register_src(name, src) test_register_con(name, ...) src_test(name) test_load(df, name = unique_table_name(), srcs = test_srcs$get(), ignore = character()) test_frame(..., srcs = test_srcs$get(), ignore = character()) } \description{ Register testing sources, then use \code{test_load()} to load an existing data frame into each source. To create a new table in each source, use \code{test_frame()}. } \examples{ \dontrun{ test_register_src("df", src_df(env = new.env())) test_register_src("sqlite", src_sqlite(":memory:", create = TRUE)) test_frame(x = 1:3, y = 3:1) test_load(mtcars) } } \keyword{internal} dbplyr/man/nycflights13.Rd0000644000176200001440000000153513474056125015141 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/data-nycflights13.R \name{nycflights13} \alias{nycflights13} \alias{nycflights13_sqlite} \alias{nycflights13_postgres} \alias{has_nycflights13} \alias{copy_nycflights13} \title{Database versions of the nycflights13 data} \usage{ nycflights13_sqlite(path = NULL) nycflights13_postgres(dbname = "nycflights13", ...) has_nycflights13(type = c("sqlite", "postgresql"), ...) copy_nycflights13(src, ...) 
}
\arguments{
\item{path}{location of SQLite database file}

\item{dbname, ...}{Arguments passed on to \code{\link[=src_postgres]{src_postgres()}}}
}
\description{
These functions cache the data from the \code{nycflights13} database in a
local database, for use in examples and vignettes. Indexes are created to
make joining tables on natural keys efficient.
}
\keyword{internal}
dbplyr/man/sql_variant.Rd0000644000176200001440000000713713442404754015142 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/translate-sql-string.R,
% R/translate-sql-paste.R, R/translate-sql-helpers.R, R/backend-.R,
% R/backend-odbc.R
\docType{data}
\name{sql_substr}
\alias{sql_substr}
\alias{sql_str_sub}
\alias{sql_paste}
\alias{sql_paste_infix}
\alias{sql_variant}
\alias{sql_translator}
\alias{sql_infix}
\alias{sql_prefix}
\alias{sql_aggregate}
\alias{sql_aggregate_2}
\alias{sql_not_supported}
\alias{sql_cast}
\alias{sql_log}
\alias{sql_cot}
\alias{base_scalar}
\alias{base_agg}
\alias{base_win}
\alias{base_no_win}
\alias{base_odbc_scalar}
\alias{base_odbc_agg}
\alias{base_odbc_win}
\title{Create an sql translator}
\usage{
sql_substr(f = "SUBSTR")

sql_str_sub(subset_f = "SUBSTR", length_f = "LENGTH")

sql_paste(default_sep, f = "CONCAT_WS")

sql_paste_infix(default_sep, op, cast)

sql_variant(scalar = sql_translator(), aggregate = sql_translator(),
  window = sql_translator())

sql_translator(..., .funs = list(), .parent = new.env(parent = emptyenv()))

sql_infix(f, pad = TRUE)

sql_prefix(f, n = NULL)

sql_aggregate(f, f_r = f)

sql_aggregate_2(f)

sql_not_supported(f)

sql_cast(type)

sql_log()

sql_cot()

base_scalar

base_agg

base_win

base_no_win

base_odbc_scalar

base_odbc_agg

base_odbc_win
}
\arguments{
\item{f}{the name of the sql function as a string}

\item{scalar, aggregate, window}{The three families of functions that an
SQL variant can supply.}

\item{..., .funs}{named functions, used to add custom converters from
standard R functions to sql
functions. Specify individually in \code{...}, or provide a list of
\code{.funs}}

\item{.parent}{the sql variant that this variant should inherit from.
Defaults to \code{base_agg} which provides a standard set of mappings for
the most common operators and functions.}

\item{pad}{If \code{TRUE}, the default, pad the infix operator with spaces.}

\item{n}{for \code{sql_prefix()}, an optional number of arguments to expect.
Will signal an error if not correct.}

\item{f_r}{the name of the R function being translated as a string}
}
\description{
When creating a package that maps to a new SQL-based src, you'll often
want to provide some additional mappings from common R commands to the
commands that your tbl provides. These three functions make that easy.
}
\section{Helper functions}{

\code{sql_infix()} and \code{sql_prefix()} create default SQL infix and prefix
functions given the name of the SQL function. They don't perform any input
checking, but do correctly escape their input, and are useful for quickly
providing default wrappers for a new SQL variant.
}

\examples{
# An example of adding some mappings for the statistical functions that
# postgresql provides: http://bit.ly/K5EdTn
postgres_agg <- sql_translator(.parent = base_agg,
  cor = sql_aggregate_2("CORR"),
  cov = sql_aggregate_2("COVAR_SAMP"),
  sd = sql_aggregate("STDDEV_SAMP", "sd"),
  var = sql_aggregate("VAR_SAMP", "var")
)

# Next we have to simulate a connection that uses this variant
con <- simulate_dbi("TestCon")
sql_translate_env.TestCon <- function(x) {
  sql_variant(
    base_scalar,
    postgres_agg,
    base_no_win
  )
}

translate_sql(cor(x, y), con = con, window = FALSE)
translate_sql(sd(income / years), con = con, window = FALSE)

# Any functions not explicitly listed in the converter will be translated
# to sql as is, so you don't need to convert all functions.
translate_sql(regr_intercept(y, x), con = con)
}
\seealso{
\code{\link[=win_over]{win_over()}} for helper functions for window functions.
\code{\link[=sql]{sql()}} for an example of a more customised sql conversion function.
}
\keyword{datasets}
\keyword{internal}
dbplyr/man/arrange.tbl_lazy.Rd0000644000176200001440000000273013474056125016057 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/verb-arrange.R
\name{arrange.tbl_lazy}
\alias{arrange.tbl_lazy}
\title{Arrange rows by variables in a remote database table}
\usage{
\method{arrange}{tbl_lazy}(.data, ..., .by_group = FALSE)
}
\arguments{
\item{.data}{A tbl. All main verbs are S3 generics and provide methods
for \code{\link[=tbl_df]{tbl_df()}}, \code{\link[dtplyr:tbl_dt]{dtplyr::tbl_dt()}} and \code{\link[dbplyr:tbl_dbi]{dbplyr::tbl_dbi()}}.}

\item{...}{Comma-separated list of unquoted variable names, or expressions
involving variable names. Use \code{\link[=desc]{desc()}} to sort a variable in descending
order.}

\item{.by_group}{If \code{TRUE}, will sort first by grouping variable. Applies to
grouped data frames only.}
}
\value{
An object of the same class as \code{.data}.
}
\description{
Order rows of database tables by an expression involving their variables.
}
\section{Missing values}{

Compared to its sorting behaviour on local data, the \code{\link[=arrange]{arrange()}} method
for most database tables sorts \code{NA} at the beginning unless wrapped with
\code{\link[=desc]{desc()}}. Users can override this behaviour by explicitly sorting on
\code{is.na(x)}.
} \examples{ library(dplyr) dbplyr::memdb_frame(a = c(3, 4, 1, 2)) \%>\% arrange(a) # NA sorted first dbplyr::memdb_frame(a = c(3, 4, NA, 2)) \%>\% arrange(a) # override by sorting on is.na() first dbplyr::memdb_frame(a = c(3, 4, NA, 2)) \%>\% arrange(is.na(a), a) } dbplyr/man/named_commas.Rd0000644000176200001440000000057013474056125015244 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/utils.R \name{named_commas} \alias{named_commas} \title{Provides comma-separated string out of the parameters} \usage{ named_commas(...) } \arguments{ \item{...}{Arguments to be constructed into the string} } \description{ Provides comma-separated string out of the parameters } \keyword{internal} dbplyr/man/memdb_frame.Rd0000644000176200001440000000236613443245336015064 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/memdb.R \name{memdb_frame} \alias{memdb_frame} \alias{tbl_memdb} \alias{src_memdb} \title{Create a database table in temporary in-memory database.} \usage{ memdb_frame(..., .name = unique_table_name()) tbl_memdb(df, name = deparse(substitute(df))) src_memdb() } \arguments{ \item{...}{A set of name-value pairs. Arguments are evaluated sequentially, so you can refer to previously created elements. These arguments are processed with \code{\link[rlang:quos]{rlang::quos()}} and support unquote via \code{\link{!!}} and unquote-splice via \code{\link{!!!}}. Use \code{:=} to create columns that start with a dot.} \item{df}{Data frame to copy} \item{name, .name}{Name of table in database: defaults to a random name that's unlikely to conflict with an existing table.} } \description{ \code{memdb_frame()} works like \code{\link[tibble:tibble]{tibble::tibble()}}, but instead of creating a new data frame in R, it creates a table in \code{\link[=src_memdb]{src_memdb()}}. 
}
\examples{
library(dplyr)

df <- memdb_frame(x = runif(100), y = runif(100))
df \%>\% arrange(x)
df \%>\% arrange(x) \%>\% show_query()

mtcars_db <- tbl_memdb(mtcars)
mtcars_db \%>\% count(cyl) \%>\% show_query()
}
dbplyr/man/tbl_sql.Rd0000644000176200001440000000100413415746254014256 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tbl-sql.R
\name{tbl_sql}
\alias{tbl_sql}
\title{Create an SQL tbl (abstract)}
\usage{
tbl_sql(subclass, src, from, ..., vars = NULL)
}
\arguments{
\item{subclass}{name of subclass}

\item{...}{needed for agreement with generic. Not otherwise used.}

\item{vars}{DEPRECATED}
}
\description{
Generally, you should no longer need to provide a custom \code{tbl()} method;
you can use the default \code{tbl.DBIConnect} method.
}
\keyword{internal}
dbplyr/man/sql_escape_logical.Rd0000644000176200001440000000047313416416232016427 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/escape.R
\name{sql_escape_logical}
\alias{sql_escape_logical}
\title{More SQL generics}
\usage{
sql_escape_logical(con, x)
}
\description{
These are new, so not included in dplyr for backward compatibility
purposes.
}
\keyword{internal}
dbplyr/man/do.tbl_sql.Rd0000644000176200001440000000150013416420631014650 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/verb-do.R
\name{do.tbl_sql}
\alias{do.tbl_sql}
\title{Perform arbitrary computation on remote backend}
\usage{
\method{do}{tbl_sql}(.data, ..., .chunk_size = 10000L)
}
\arguments{
\item{.data}{a tbl}

\item{...}{Expressions to apply to each group. If named, results will be
stored in a new column. If unnamed, should return a data frame. You can
use \code{.} to refer to the current group. You can not mix named and
unnamed arguments.}

\item{.chunk_size}{The size of each chunk to pull into R.
If this number is too big, the process will be slow because R has to allocate and free a lot of memory. If it's too small, it will be slow, because of the overhead of talking to the database.} } \description{ Perform arbitrary computation on remote backend } dbplyr/man/partial_eval.Rd0000644000176200001440000000433613474056125015270 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/partial-eval.R \name{partial_eval} \alias{partial_eval} \title{Partially evaluate an expression.} \usage{ partial_eval(call, vars = character(), env = caller_env()) } \arguments{ \item{call}{an unevaluated expression, as produced by \code{\link[=quote]{quote()}}} \item{vars}{character vector of variable names.} \item{env}{environment in which to search for local values} } \description{ This function partially evaluates an expression, using information from the tbl to determine whether names refer to local expressions or remote variables. This simplifies SQL translation because expressions don't need to carry around their environment - all relevant information is incorporated into the expression. } \section{Symbol substitution}{ \code{partial_eval()} needs to guess if you're referring to a variable on the server (remote), or in the current environment (local). It's not possible to do this 100% perfectly. \code{partial_eval()} uses the following heuristic: \itemize{ \item If the tbl variables are known, and the symbol matches a tbl variable, then remote. \item If the symbol is defined locally, local. \item Otherwise, remote. } You can override the guesses using \code{local()} and \code{remote()} to force computation, or by using the \code{.data} and \code{.env} pronouns of tidy evaluation. } \examples{ vars <- c("year", "id") partial_eval(quote(year > 1980), vars = vars) ids <- c("ansonca01", "forceda01", "mathebo01") partial_eval(quote(id \%in\% ids), vars = vars) # cf. 
partial_eval(quote(id == .data$ids), vars = vars) # You can use local() or .env to disambiguate between local and remote # variables: otherwise remote is always preferred year <- 1980 partial_eval(quote(year > year), vars = vars) partial_eval(quote(year > local(year)), vars = vars) partial_eval(quote(year > .env$year), vars = vars) # Functions are always assumed to be remote. Use local to force evaluation # in R. f <- function(x) x + 1 partial_eval(quote(year > f(1980)), vars = vars) partial_eval(quote(year > local(f(1980))), vars = vars) # For testing you can also use it with the tbl omitted partial_eval(quote(1 + 2 * 3)) x <- 1 partial_eval(quote(x ^ y)) } \keyword{internal} dbplyr/man/escape.Rd0000644000176200001440000000270313416377202014057 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/escape.R \name{escape} \alias{escape} \alias{escape_ansi} \alias{sql_vector} \title{Escape/quote a string.} \usage{ escape(x, parens = NA, collapse = " ", con = NULL) escape_ansi(x, parens = NA, collapse = "") sql_vector(x, parens = NA, collapse = " ", con = NULL) } \arguments{ \item{x}{An object to escape. Existing sql vectors will be left as is, character vectors are escaped with single quotes, numeric vectors have trailing \code{.0} added if they're whole numbers, identifiers are escaped with double quotes.} \item{parens, collapse}{Controls behaviour when multiple values are supplied. \code{parens} should be a logical flag, or if \code{NA}, will wrap in parens if length > 1. Default behaviour: lists are always wrapped in parens and separated by commas, identifiers are separated by commas and never wrapped, atomic vectors are separated by spaces and wrapped in parens if needed.} \item{con}{Database connection.} } \description{ \code{escape()} requires you to provide a database connection to control the details of escaping. \code{escape_ansi()} uses the SQL 92 ANSI standard. } \examples{ # Doubles vs. 
integers
escape_ansi(1:5)
escape_ansi(c(1, 5.4))

# String vs. known sql vs. sql identifier
escape_ansi("X")
escape_ansi(sql("X"))
escape_ansi(ident("X"))

# Escaping is idempotent
escape_ansi("X")
escape_ansi(escape_ansi("X"))
escape_ansi(escape_ansi(escape_ansi("X")))
}
dbplyr/man/simulate_dbi.Rd0000644000176200001440000000210313416415143015253 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/simulate.R
\name{simulate_dbi}
\alias{simulate_dbi}
\alias{simulate_access}
\alias{simulate_hive}
\alias{simulate_mysql}
\alias{simulate_impala}
\alias{simulate_mssql}
\alias{simulate_odbc}
\alias{simulate_oracle}
\alias{simulate_postgres}
\alias{simulate_sqlite}
\alias{simulate_teradata}
\title{Simulate database connections}
\usage{
simulate_dbi(class = character())

simulate_access()

simulate_hive()

simulate_mysql()

simulate_impala()

simulate_mssql()

simulate_odbc()

simulate_oracle()

simulate_postgres()

simulate_sqlite()

simulate_teradata()
}
\description{
These functions generate S3 objects that have been designed to simulate
the action of a database connection, without actually having the database
available. Obviously, this simulation can only be incomplete, but most
importantly it allows us to simulate SQL generation for any database
without actually connecting to it.
}
\details{
Simulated SQL always quotes identifiers with \code{`x`}, and strings with
\code{'x'}.
}
\keyword{internal}
dbplyr/man/build_sql.Rd0000644000176200001440000000310313474056125014572 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/build-sql.R
\name{build_sql}
\alias{build_sql}
\title{Build a SQL string.}
\usage{
build_sql(..., .env = parent.frame(), con = sql_current_con())
}
\arguments{
\item{...}{input to convert to SQL.
Use \code{\link[=sql]{sql()}} to preserve user input as is (dangerous), and \code{\link[=ident]{ident()}} to label user input as sql identifiers (safe)} \item{.env}{the environment in which to evaluate the arguments. Should not be needed in typical use.} \item{con}{database connection; used to select correct quoting characters.} } \description{ This is a convenience function that should prevent sql injection attacks (which in the context of dplyr are most likely to be accidental not deliberate) by automatically escaping all expressions in the input, while treating bare strings as sql. This is unlikely to prevent any serious attack, but should make it unlikely that you produce invalid sql. } \details{ This function should be used only when generating \code{SELECT} clauses, other high level queries, or for other syntax that has no R equivalent. For individual function translations, prefer \code{\link[=sql_expr]{sql_expr()}}. } \examples{ con <- simulate_dbi() build_sql("SELECT * FROM TABLE", con = con) x <- "TABLE" build_sql("SELECT * FROM ", x, con = con) build_sql("SELECT * FROM ", ident(x), con = con) build_sql("SELECT * FROM ", sql(x), con = con) # http://xkcd.com/327/ name <- "Robert'); DROP TABLE Students;--" build_sql("INSERT INTO Students (Name) VALUES (", name, ")", con = con) } \keyword{internal} dbplyr/man/sql_quote.Rd0000644000176200001440000000103413416416356014633 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/escape.R \name{sql_quote} \alias{sql_quote} \title{Helper function for quoting sql elements.} \usage{ sql_quote(x, quote) } \arguments{ \item{x}{Character vector to escape.} \item{quote}{Single quoting character.} } \description{ If the quote character is present in the string, it will be doubled. \code{NA}s will be replaced with NULL. 
}
\examples{
sql_quote("abc", "'")
sql_quote("I've had a good day", "'")
sql_quote(c("abc", NA), "'")
}
\keyword{internal}
dbplyr/man/sql_expr.Rd0000644000176200001440000000255113416116560014453 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sql-expr.R
\name{sql_expr}
\alias{sql_expr}
\alias{sql_call2}
\title{Generate SQL from R expressions}
\usage{
sql_expr(x, con = sql_current_con())

sql_call2(.fn, ..., con = sql_current_con())
}
\arguments{
\item{x}{A quasiquoted expression}

\item{con}{Connection to use for escaping. Will be set automatically when
called from a function translation.}

\item{.fn}{Function name (as string, call, or symbol)}

\item{...}{Arguments to function}
}
\description{
Low-level building block for generating SQL from R expressions.
Strings are escaped; names become bare SQL identifiers. User infix
functions have \code{\%} stripped.
}
\details{
Using \code{sql_expr()} in a package will require use of
\code{\link[=globalVariables]{globalVariables()}} to avoid \code{R CMD check} NOTES. This is a small
amount of additional pain, which I think is worthwhile because it leads to
more readable translation code.
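For instance, a backend package whose translations use the bare names
\code{x} and \code{y} inside \code{sql_expr()} (illustrative names, not part of any
real backend) might silence the resulting NOTE like this:

```r
# In a source file of the backend package, e.g. R/zzz.R:
# these are bare SQL identifiers used inside sql_expr() translations,
# not R objects, so R CMD check would otherwise flag them as undefined
utils::globalVariables(c("x", "y"))
```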
}
\examples{
con <- simulate_dbi() # not necessary when writing translations

sql_expr(f(x + 1), con = con)
sql_expr(f("x", "y"), con = con)
sql_expr(f(x, y), con = con)

x <- ident("x")
sql_expr(f(!!x, y), con = con)

sql_expr(cast("x" \%as\% DECIMAL), con = con)
sql_expr(round(x) \%::\% numeric, con = con)

sql_call2("+", quote(x), 1, con = con)
sql_call2("+", "x", 1, con = con)
}
\keyword{internal}
dbplyr/man/dbplyr-package.Rd0000644000176200001440000000174013476014054015503 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dbplyr.R
\docType{package}
\name{dbplyr-package}
\alias{dbplyr}
\alias{dbplyr-package}
\title{dbplyr: A 'dplyr' Back End for Databases}
\description{
\if{html}{\figure{logo.png}{options: align='right'}}

A 'dplyr' back end for databases that allows you to work with remote
database tables as if they are in-memory data frames. Basic features work
with any database that has a 'DBI' back end; more advanced features require
'SQL' translation to be provided by the package author.
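In practice a session looks something like this sketch (assuming the
RSQLite package is installed):

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)          # load a local data frame into the database

mtcars2 <- tbl(con, "mtcars") # a remote table that behaves like a tibble

# The grouped summary is translated to SQL and runs in the database;
# collect() brings the result back as a local tibble
collect(summarise(group_by(mtcars2, cyl), mpg = mean(mpg, na.rm = TRUE)))
```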
}
\seealso{
Useful links:
\itemize{
  \item \url{https://dbplyr.tidyverse.org/}
  \item \url{https://github.com/tidyverse/dbplyr}
  \item Report bugs at \url{https://github.com/tidyverse/dbplyr/issues}
}

}
\author{
\strong{Maintainer}: Hadley Wickham \email{hadley@rstudio.com}

Authors:
\itemize{
  \item Edgar Ruiz
}

Other contributors:
\itemize{
  \item RStudio [copyright holder, funder]
}

}
\keyword{internal}
dbplyr/man/win_over.Rd0000644000176200001440000000314413416117512014443 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/translate-sql-window.R
\name{win_over}
\alias{win_over}
\alias{win_rank}
\alias{win_aggregate}
\alias{win_aggregate_2}
\alias{win_recycled}
\alias{win_cumulative}
\alias{win_absent}
\alias{win_current_group}
\alias{win_current_order}
\alias{win_current_frame}
\title{Generate SQL expression for window functions}
\usage{
win_over(expr, partition = NULL, order = NULL, frame = NULL,
  con = sql_current_con())

win_rank(f)

win_aggregate(f)

win_aggregate_2(f)

win_cumulative(f)

win_absent(f)

win_current_group()

win_current_order()

win_current_frame()
}
\arguments{
\item{expr}{The window expression}

\item{partition}{Variables to partition over}

\item{order}{Variables to order by}

\item{frame}{A numeric vector of length two defining the frame.}

\item{f}{The name of an sql function as a string}
}
\description{
\code{win_over()} makes it easy to generate the window function specification.
\code{win_absent()}, \code{win_rank()}, \code{win_aggregate()}, and \code{win_cumulative()}
provide helpers for constructing common types of window functions.
\code{win_current_group()} and \code{win_current_order()} allow you to access
the grouping and order context set up by \code{\link[=group_by]{group_by()}} and
\code{\link[=arrange]{arrange()}}.
}
\examples{
con <- simulate_dbi()

win_over(sql("avg(x)"), con = con)
win_over(sql("avg(x)"), "y", con = con)
win_over(sql("avg(x)"), order = "y", con = con)
win_over(sql("avg(x)"), order = c("x", "y"), con = con)
win_over(sql("avg(x)"), frame = c(-Inf, 0), order = "y", con = con)
}
\keyword{internal}
dbplyr/man/window_order.Rd0000644000176200001440000000135113417117054015315 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/verb-window.R
\name{window_order}
\alias{window_order}
\alias{window_frame}
\title{Override window order and frame}
\usage{
window_order(.data, ...)

window_frame(.data, from = -Inf, to = Inf)
}
\arguments{
\item{.data}{A remote tibble}

\item{...}{Name-value pairs of expressions.}

\item{from, to}{Bounds of the frame.}
}
\description{
Override window order and frame
}
\examples{
library(dplyr)

df <- lazy_frame(g = rep(1:2, each = 5), y = runif(10), z = 1:10)

df \%>\%
  window_order(y) \%>\%
  mutate(z = cumsum(y)) \%>\%
  sql_build()

df \%>\%
  group_by(g) \%>\%
  window_frame(-3, 0) \%>\%
  window_order(z) \%>\%
  mutate(z = sum(y)) \%>\%
  sql_build()
}
dbplyr/man/in_schema.Rd0000644000176200001440000000142513415745770014553 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/schema.R
\name{in_schema}
\alias{in_schema}
\title{Refer to a table in a schema}
\usage{
in_schema(schema, table)
}
\arguments{
\item{schema, table}{Names of schema and table.}
}
\description{
Refer to a table in a schema
}
\examples{
in_schema("my_schema", "my_table")

# Example using schemas with SQLite
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
src <- src_dbi(con, auto_disconnect = TRUE)

# Add auxiliary schema
tmp <- tempfile()
DBI::dbExecute(con, paste0("ATTACH '", tmp, "' AS aux"))

library(dplyr, warn.conflicts = FALSE)
copy_to(con, iris, "df", temporary = FALSE)
copy_to(con, mtcars, in_schema("aux", "df"), temporary = FALSE)

con \%>\% tbl("df")
con \%>\% tbl(in_schema("aux", "df"))
}
dbplyr/man/join.tbl_sql.Rd0000644000176200001440000001174213442161247015217 0ustar liggesusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/verb-joins.R
\name{join.tbl_sql}
\alias{join.tbl_sql}
\alias{inner_join.tbl_lazy}
\alias{left_join.tbl_lazy}
\alias{right_join.tbl_lazy}
\alias{full_join.tbl_lazy}
\alias{semi_join.tbl_lazy}
\alias{anti_join.tbl_lazy}
\title{Join sql tbls.}
\usage{
\method{inner_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), auto_index = FALSE, ..., sql_on = NULL)

\method{left_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), auto_index = FALSE, ..., sql_on = NULL)

\method{right_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), auto_index = FALSE, ..., sql_on = NULL)

\method{full_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), auto_index = FALSE, ..., sql_on = NULL)

\method{semi_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE,
  auto_index = FALSE, ..., sql_on = NULL)

\method{anti_join}{tbl_lazy}(x, y, by = NULL, copy = FALSE,
  auto_index = FALSE, ..., sql_on = NULL)
}
\arguments{
\item{x}{tbls to join}

\item{y}{tbls to join}

\item{by}{a character vector of variables to join by.  If \code{NULL}, the
default, \code{*_join()} will do a natural join, using all variables with
common names across the two tables. A message lists the variables so
that you can check they're right (to suppress the message, simply
explicitly list the variables that you want to join).

To join by different variables on x and y use a named vector.
For example, \code{by = c("a" = "b")} will match \code{x.a} to \code{y.b}.}

\item{copy}{If \code{x} and \code{y} are not from the same data source, and
\code{copy} is \code{TRUE}, then \code{y} will be copied into a temporary table in
the same database as \code{x}.
\code{*_join()} will automatically run \code{ANALYZE} on the created table in the hope that this will make your queries as efficient as possible by giving more data to the query planner. This allows you to join tables across srcs, but it's a potentially expensive operation so you must opt into it.} \item{suffix}{If there are non-joined duplicate variables in \code{x} and \code{y}, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.} \item{auto_index}{if \code{copy} is \code{TRUE}, automatically create indices for the variables in \code{by}. This may speed up the join if there are matching indexes in \code{x}.} \item{...}{other parameters passed onto methods, for instance, \code{na_matches} to control how \code{NA} values are matched. See \link{join.tbl_df} for more.} \item{sql_on}{A custom join predicate as an SQL expression. The SQL can refer to the \code{LHS} and \code{RHS} aliases to disambiguate column names.} } \description{ See \link{join} for a description of the general purpose of the functions. } \section{Implementation notes}{ Semi-joins are implemented using \code{WHERE EXISTS}, and anti-joins with \code{WHERE NOT EXISTS}. All joins use column equality by default. An arbitrary join predicate can be specified by passing an SQL expression to the \code{sql_on} argument. Use \code{LHS} and \code{RHS} to refer to the left-hand side or right-hand side table, respectively.
} \examples{ \dontrun{ library(dplyr) if (has_lahman("sqlite")) { # Left joins ---------------------------------------------------------------- lahman_s <- lahman_sqlite() batting <- tbl(lahman_s, "Batting") team_info <- select(tbl(lahman_s, "Teams"), yearID, lgID, teamID, G, R:H) # Combine player and whole team statistics first_stint <- select(filter(batting, stint == 1), playerID:H) both <- left_join(first_stint, team_info, type = "inner", by = c("yearID", "teamID", "lgID")) head(both) explain(both) # Join with a local data frame grid <- expand.grid( teamID = c("WAS", "ATL", "PHI", "NYA"), yearID = 2010:2012) top4a <- left_join(batting, grid, copy = TRUE) explain(top4a) # Indices don't really help here because there's no matching index on # batting top4b <- left_join(batting, grid, copy = TRUE, auto_index = TRUE) explain(top4b) # Semi-joins ---------------------------------------------------------------- people <- tbl(lahman_s, "Master") # All people in the hall of fame hof <- tbl(lahman_s, "HallOfFame") semi_join(people, hof) # All people not in the hall of fame anti_join(people, hof) # Find all managers manager <- tbl(lahman_s, "Managers") semi_join(people, manager) # Find all managers in the hall of fame famous_manager <- semi_join(semi_join(people, manager), hof) famous_manager explain(famous_manager) # Anti-joins ---------------------------------------------------------------- # batters without person covariates anti_join(batting, people) # Arbitrary predicates ------------------------------------------------------ # Find all pairs of awards given to the same player # with at least 18 years between the awards: awards_players <- tbl(lahman_s, "AwardsPlayers") inner_join( awards_players, awards_players, sql_on = paste0( "(LHS.playerID = RHS.playerID) AND ", "(LHS.yearID < RHS.yearID - 18)" ) ) } } } dbplyr/man/translate_sql.Rd0000644000176200001440000000766013416117721015500 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in
R/translate-sql.R \name{translate_sql} \alias{translate_sql} \alias{translate_sql_} \title{Translate an expression to sql.} \usage{ translate_sql(..., con = simulate_dbi(), vars = character(), vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE) translate_sql_(dots, con = NULL, vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE, context = list()) } \arguments{ \item{..., dots}{Expressions to translate. \code{translate_sql()} automatically quotes them for you. \code{translate_sql_()} expects a list of already quoted objects.} \item{con}{An optional database connection to control the details of the translation. The default, \code{NULL}, generates ANSI SQL.} \item{vars}{Deprecated. Now call \code{\link[=partial_eval]{partial_eval()}} directly.} \item{vars_group, vars_order, vars_frame}{Parameters used in the \code{OVER} expression of windowed functions.} \item{window}{Use \code{FALSE} to suppress generation of the \code{OVER} statement used for window functions. This is necessary when generating SQL for a grouped summary.} \item{context}{Use to carry information for special translation cases. For example, MS SQL needs a different conversion for is.na() in WHERE vs. SELECT clauses. Expects a list.} } \description{ Translate an expression to sql. } \section{Base translation}{ The base translator, \code{base_sql}, provides custom mappings for \code{!} (to NOT), \code{&&} and \code{&} to \code{AND}, \code{||} and \code{|} to \code{OR}, \code{^} to \code{POWER}, \code{\%\%} to \code{\%}, \code{ceiling} to \code{CEIL}, \code{mean} to \code{AVG}, \code{var} to \code{VARIANCE}, \code{tolower} to \code{LOWER}, \code{toupper} to \code{UPPER} and \code{nchar} to \code{LENGTH}. \code{c()} and \code{:} keep their usual R behaviour so you can easily create vectors that are passed to sql. All other functions will be preserved as is. R's infix functions (e.g. \code{\%like\%}) will be converted to their SQL equivalents (e.g. \code{LIKE}).
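The mappings listed above can be checked directly, since translate_sql() works against a simulated connection. A short sketch (not part of the original page; it uses only functions shown elsewhere in this file):

```r
library(dbplyr)

con <- simulate_dbi()

translate_sql(!x, con = con)          # ! is translated to NOT
translate_sql(x ^ 2L, con = con)      # ^ is translated to POWER()
translate_sql(tolower(x), con = con)  # tolower() becomes LOWER()

# mean() becomes AVG(); window = FALSE suppresses the OVER() clause
translate_sql(mean(x), window = FALSE, con = con)
```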
You can use this idiom to access SQL string concatenation: \code{||} is mapped to \code{OR}, but \code{\%||\%} is mapped to \code{||}. To suppress this behaviour, and force errors immediately when dplyr doesn't know how to translate a function it encounters, set the \code{dplyr.strict_sql} option to \code{TRUE}. You can also use \code{\link[=sql]{sql()}} to insert a raw sql string. } \section{SQLite translation}{ The SQLite variant currently only adds one additional function: a mapping from \code{sd()} to the SQL aggregation function \code{STDEV}. } \examples{ # Regular maths is translated in a very straightforward way translate_sql(x + 1) translate_sql(sin(x) + tan(y)) # Note that all variable names are escaped translate_sql(like == "x") # In ANSI SQL: "" quotes variable _names_, '' quotes strings # Logical operators are converted to their sql equivalents translate_sql(x < 5 & !(y >= 5)) # xor() doesn't have a direct SQL equivalent translate_sql(xor(x, y)) # If is translated into case when translate_sql(if (x > 5) "big" else "small") # Infix functions are passed onto SQL with \% removed translate_sql(first \%like\% "Had\%") translate_sql(first \%is\% NA) translate_sql(first \%in\% c("John", "Roger", "Robert")) # And be careful if you really want integers translate_sql(x == 1) translate_sql(x == 1L) # If you have an already quoted object, use translate_sql_: x <- quote(y + 1 / sin(t)) translate_sql_(list(x), con = simulate_dbi()) # Windowed translation -------------------------------------------- # Known window functions automatically get OVER() translate_sql(mpg > mean(mpg)) # Suppress this with window = FALSE translate_sql(mpg > mean(mpg), window = FALSE) # vars_group controls partition: translate_sql(mpg > mean(mpg), vars_group = "cyl") # and vars_order controls ordering for those functions that need it translate_sql(cumsum(mpg)) translate_sql(cumsum(mpg), vars_order = "mpg") } dbplyr/man/lahman.Rd0000644000176200001440000000327413474056125014063 0ustar
liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/data-lahman.R \name{lahman} \alias{lahman} \alias{lahman_sqlite} \alias{lahman_postgres} \alias{lahman_mysql} \alias{lahman_df} \alias{copy_lahman} \alias{has_lahman} \alias{lahman_srcs} \title{Cache and retrieve an \code{src_sqlite} of the Lahman baseball database.} \usage{ lahman_sqlite(path = NULL) lahman_postgres(dbname = "lahman", host = "localhost", ...) lahman_mysql(dbname = "lahman", ...) lahman_df() copy_lahman(src, ...) has_lahman(type, ...) lahman_srcs(..., quiet = NULL) } \arguments{ \item{...}{Other arguments passed to \code{src} on first load. For MySQL and PostgreSQL, the defaults assume you have a local server with \code{lahman} database already created. For \code{lahman_srcs()}, character vector of names giving srcs to generate.} \item{type}{src type.} \item{quiet}{if \code{TRUE}, suppress messages about databases failing to connect.} } \description{ This creates an interesting database using data from the Lahman baseball data source, provided by Sean Lahman at \url{http://www.seanlahman.com/baseball-archive/statistics/}, and made easily available in R through the \pkg{Lahman} package by Michael Friendly, Dennis Murphy and Martin Monkman. See the documentation for that package for documentation of the individual tables. 
} \examples{ # Connect to a local sqlite database, if already created \donttest{ if (has_lahman("sqlite")) { lahman_sqlite() batting <- tbl(lahman_sqlite(), "Batting") batting } # Connect to a local postgres database with lahman database, if available if (has_lahman("postgres")) { lahman_postgres() batting <- tbl(lahman_postgres(), "Batting") } } } \keyword{internal} dbplyr/man/sql_build.Rd0000644000176200001440000000435513426613253014602 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/query-join.R, R/query-select.R, % R/query-semi-join.R, R/query-set-op.R, R/sql-build.R \name{join_query} \alias{join_query} \alias{select_query} \alias{semi_join_query} \alias{set_op_query} \alias{sql_build} \alias{sql_render} \alias{sql_optimise} \title{Build and render SQL from a sequence of lazy operations} \usage{ join_query(x, y, vars, type = "inner", by = NULL, suffix = c(".x", ".y")) select_query(from, select = sql("*"), where = character(), group_by = character(), having = character(), order_by = character(), limit = NULL, distinct = FALSE) semi_join_query(x, y, anti = FALSE, by = NULL) set_op_query(x, y, type = type) sql_build(op, con = NULL, ...) sql_render(query, con = NULL, ..., bare_identifier_ok = FALSE) sql_optimise(x, con = NULL, ...) } \arguments{ \item{op}{A sequence of lazy operations} \item{con}{A database connection. The default \code{NULL} uses a set of rules that should be very similar to ANSI 92, and allows for testing without an active database connection.} \item{...}{Other arguments passed on to the methods. Not currently used.} \item{bare_identifier_ok}{Is it ok to return a bare table identifier. Set to \code{TRUE} when generating queries to be nested within other queries where a bare table name is ok.} } \description{ \code{sql_build()} creates a \code{select_query} S3 object, that is rendered to a SQL string by \code{sql_render()}. 
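The build-then-render split described here can be sketched without a live database, using lazy_frame() (documented elsewhere in this package) and invented toy columns:

```r
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

lf <- lazy_frame(x = 1:3, y = 4:6)

# Step 1: sql_build() produces a database-agnostic query object
qry <- lf %>% filter(x > 1L) %>% sql_build()

# Step 2: sql_render() turns that object into a SQL string for a
# particular (here simulated) connection
sql_render(qry, con = simulate_dbi())
```

Because step 1 is backend-independent, tests can assert on the structure of the query object without ever connecting to a database.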
The output from \code{sql_build()} is designed to be easy to test, as it's database agnostic, and has a hierarchical structure. Outside of testing, however, you should always call \code{sql_render()}. } \details{ \code{sql_build()} is generic over the lazy operations, \link{lazy_ops}, and generates an S3 object that represents the query. \code{sql_render()} takes a query object and then calls a function that is generic over the database. For example, \code{sql_build.op_mutate()} generates a \code{select_query}, and \code{sql_render.select_query()} calls \code{sql_select()}, which has different methods for different databases. The default methods should generate ANSI 92 SQL where possible, so backends only need to override the methods if the backend is not ANSI compliant. } \keyword{internal} dbplyr/man/src_sql.Rd0000644000176200001440000000100513415746254014265 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/src-sql.R \name{src_sql} \alias{src_sql} \title{Create a "sql src" object} \usage{ src_sql(subclass, con, ...) } \arguments{ \item{subclass}{name of subclass. "src_sql" is an abstract base class, so you must supply this value. \code{src_} is automatically prepended to the class name} \item{con}{the connection object} \item{...}{fields used by object} } \description{ Deprecated: please use \link{src_dbi} instead. } \keyword{internal} dbplyr/man/ident.Rd0000644000176200001440000000141513416116406013716 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/ident.R \name{ident} \alias{ident} \alias{ident_q} \alias{is.ident} \title{Flag a character vector as SQL identifiers} \usage{ ident(...) ident_q(...) is.ident(x) } \arguments{ \item{...}{A character vector, or name-value pairs} \item{x}{An object} } \description{ \code{ident()} takes unquoted strings and flags them as identifiers.
\code{ident_q()} assumes its input has already been quoted, and ensures it does not get quoted again. This is currently used only for \code{schema.table}. } \examples{ # SQL92 quotes strings with ' escape_ansi("x") # And identifiers with " ident("x") escape_ansi(ident("x")) # You can supply multiple inputs ident(a = "x", b = "y") ident_q(a = "x", b = "y") } dbplyr/man/lazy_ops.Rd0000644000176200001440000000177213417126252014462 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/lazy-ops.R \name{lazy_ops} \alias{lazy_ops} \alias{op_base} \alias{op_single} \alias{add_op_single} \alias{op_double} \alias{op_grps} \alias{op_vars} \alias{op_sort} \alias{op_frame} \title{Lazy operations} \usage{ op_base(x, vars, class = character()) op_single(name, x, dots = list(), args = list()) add_op_single(name, .data, dots = list(), args = list()) op_double(name, x, y, args = list()) op_grps(op) op_vars(op) op_sort(op) op_frame(op) } \description{ This set of S3 classes describes the action of dplyr verbs. These are currently used for SQL sources to separate the description of operations in R from their computation in SQL. This API is very new so is likely to evolve in the future. } \details{ \code{op_vars()} and \code{op_grps()} compute the variables and groups from a sequence of lazy operations. \code{op_sort()} and \code{op_frame()} track the order and frame for use in window functions. } \keyword{internal} dbplyr/man/db_copy_to.Rd0000644000176200001440000000127713417123652014744 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/verb-compute.R, R/verb-copy-to.R \name{db_compute} \alias{db_compute} \alias{db_collect} \alias{db_sql_render} \alias{db_copy_to} \title{More db generics} \usage{ db_compute(con, table, sql, temporary = TRUE, unique_indexes = list(), indexes = list(), analyze = TRUE, ...) db_collect(con, sql, n = -1, warn_incomplete = TRUE, ...) db_sql_render(con, sql, ...)
db_copy_to(con, table, values, overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ...) } \description{ These are new, so not included in dplyr for backward compatibility purposes. } \keyword{internal} dbplyr/man/sql.Rd0000644000176200001440000000101413415745770013420 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/sql.R \name{sql} \alias{sql} \alias{is.sql} \alias{as.sql} \title{SQL escaping.} \usage{ sql(...) is.sql(x) as.sql(x) } \arguments{ \item{...}{Character vectors that will be combined into a single SQL expression.} \item{x}{Object to coerce} } \description{ These functions are critical when writing functions that translate R functions to sql functions. Typically a conversion function should escape all its inputs and return an sql object. } dbplyr/man/tbl_lazy.Rd0000644000176200001440000000127313426147047014443 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/tbl-lazy.R \name{tbl_lazy} \alias{tbl_lazy} \alias{lazy_frame} \title{Create a local lazy tibble} \usage{ tbl_lazy(df, con = simulate_dbi(), src = NULL) lazy_frame(..., con = simulate_dbi(), src = NULL) } \description{ These functions are useful for testing SQL generation without having to have an active database connection. See \code{\link[=simulate_dbi]{simulate_dbi()}} for a list of available database simulations.
} \examples{ library(dplyr) df <- data.frame(x = 1, y = 2) df_sqlite <- tbl_lazy(df, con = simulate_sqlite()) df_sqlite \%>\% summarise(x = sd(x, na.rm = TRUE)) \%>\% show_query() } \keyword{internal} dbplyr/man/collapse.tbl_sql.Rd0000644000176200001440000000270213443245336016061 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/verb-compute.R \name{collapse.tbl_sql} \alias{collapse.tbl_sql} \alias{compute.tbl_sql} \alias{collect.tbl_sql} \title{Force computation of query} \usage{ \method{collapse}{tbl_sql}(x, ...) \method{compute}{tbl_sql}(x, name = unique_table_name(), temporary = TRUE, unique_indexes = list(), indexes = list(), analyze = TRUE, ...) \method{collect}{tbl_sql}(x, ..., n = Inf, warn_incomplete = TRUE) } \arguments{ \item{x}{A \code{tbl_sql}} \item{...}{other parameters passed to methods.} \item{name}{Table name in remote database.} \item{temporary}{Should the table be temporary (\code{TRUE}, the default) or persistent (\code{FALSE})?} \item{unique_indexes}{a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure.} \item{indexes}{a list of character vectors. Each element of the list will create a new index.} \item{analyze}{if \code{TRUE} (the default), will automatically ANALYZE the new table so that the query optimiser has useful information.} \item{n}{Number of rows to fetch. Defaults to \code{Inf}, meaning all rows.} \item{warn_incomplete}{Warn if \code{n} is less than the number of result rows?} } \description{ \code{collapse()} creates a subquery; \code{compute()} stores the results in a remote table; \code{collect()} downloads the results into the current R session.
} dbplyr/man/remote_name.Rd0000644000176200001440000000212113415745770015114 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/remote.R \name{remote_name} \alias{remote_name} \alias{remote_src} \alias{remote_con} \alias{remote_query} \alias{remote_query_plan} \title{Metadata about a remote table} \usage{ remote_name(x) remote_src(x) remote_con(x) remote_query(x) remote_query_plan(x) } \arguments{ \item{x}{Remote table, currently must be a \link{tbl_sql}.} } \value{ The value, or \code{NULL} if not a remote table, or not applicable. For example, computed queries do not have a "name". } \description{ \code{remote_name()} gives the name of the remote table, or \code{NULL} if it's a query. \code{remote_query()} gives the text of the query, and \code{remote_query_plan()} the query plan (as computed by the remote database). \code{remote_src()} and \code{remote_con()} give the dplyr source and DBI connection respectively. } \examples{ mf <- memdb_frame(x = 1:5, y = 5:1, .name = "blorp") remote_name(mf) remote_src(mf) remote_con(mf) remote_query(mf) mf2 <- dplyr::filter(mf, x > 3) remote_name(mf2) remote_src(mf2) remote_con(mf2) remote_query(mf2) } dbplyr/LICENSE0000644000176200001440000000005213415745770012565 0ustar liggesusersYEAR: 2013-2017 COPYRIGHT HOLDER: RStudio