wikitaxa/inst/doc/wikitaxa_vignette.Rmd

---
title: "Introduction to the wikitaxa package"
author: "Scott Chamberlain"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to the wikitaxa package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r echo=FALSE}
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  warning = FALSE,
  message = FALSE
)
```

`wikitaxa` - Taxonomy data from Wikipedia

The goal of `wikitaxa` is to allow search and taxonomic data retrieval from across many Wikimedia sites, including Wikipedia, Wikicommons, and Wikispecies.

There are lower level and higher level parts to the package API:

### Low level API

The low level API is meant for power users and gives you more control, but requires more knowledge.

* `wt_wiki_page()`
* `wt_wiki_page_parse()`
* `wt_wiki_url_build()`
* `wt_wiki_url_parse()`
* `wt_wikispecies_parse()`
* `wt_wikicommons_parse()`
* `wt_wikipedia_parse()`

### High level API

The high level API is meant to be easier and faster to use.

* `wt_data()`
* `wt_data_id()`
* `wt_wikispecies()`
* `wt_wikicommons()`
* `wt_wikipedia()`

Search functions:

* `wt_wikicommons_search()`
* `wt_wikispecies_search()`
* `wt_wikipedia_search()`

## Installation

CRAN version

```{r eval=FALSE}
install.packages("wikitaxa")
```

Dev version

```{r eval=FALSE}
devtools::install_github("ropensci/wikitaxa")
```

```{r}
library("wikitaxa")
```

## wiki data

```{r eval=FALSE}
wt_data("Poa annua")
```

Get a Wikidata ID

```{r}
wt_data_id("Mimulus foliatus")
```

## wikipedia

lower level

```{r}
pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica")
res <- wt_wiki_page_parse(pg)
res$iwlinks
```

higher level

```{r}
res <- wt_wikipedia("Malus domestica")
res$common_names
res$classification
```

choose a wikipedia language

```{r eval=FALSE}
# French
wt_wikipedia(name = "Malus domestica", wiki = "fr")
# Slovak
wt_wikipedia(name = "Malus domestica", wiki = "sk")
# Vietnamese
wt_wikipedia(name = "Malus domestica", wiki = "vi")
```

search

```{r}
wt_wikipedia_search(query = "Pinus")
```

search supports languages

```{r eval=FALSE}
wt_wikipedia_search(query = "Pinus", wiki = "fr")
```

## wikicommons

lower level

```{r}
pg <- wt_wiki_page("https://commons.wikimedia.org/wiki/Abelmoschus")
res <- wt_wikicommons_parse(pg)
res$common_names[1:3]
```

higher level

```{r}
res <- wt_wikicommons("Abelmoschus")
res$classification
res$common_names
```

search

```{r}
wt_wikicommons_search(query = "Pinus")
```

## wikispecies

lower level

```{r}
pg <- wt_wiki_page("https://species.wikimedia.org/wiki/Malus_domestica")
res <- wt_wikispecies_parse(pg, types = "common_names")
res$common_names[1:3]
```

higher level

```{r}
res <- wt_wikispecies("Malus domestica")
res$classification
res$common_names
```

search

```{r}
wt_wikispecies_search(query = "Pinus")
```

wikitaxa/inst/doc/wikitaxa_vignette.html

Introduction to the wikitaxa package

Scott Chamberlain

2017-12-20

wikitaxa - Taxonomy data from Wikipedia

The goal of wikitaxa is to allow search and taxonomic data retrieval from across many Wikimedia sites, including Wikipedia, Wikicommons, and Wikispecies.

There are lower level and higher level parts to the package API:

Low level API

The low level API is meant for power users and gives you more control, but requires more knowledge:

wt_wiki_page()
wt_wiki_page_parse()
wt_wiki_url_build()
wt_wiki_url_parse()
wt_wikispecies_parse()
wt_wikicommons_parse()
wt_wikipedia_parse()

High level API

The high level API is meant to be easier and faster to use:

wt_data()
wt_data_id()
wt_wikispecies()
wt_wikicommons()
wt_wikipedia()

Search functions:

wt_wikicommons_search()
wt_wikispecies_search()
wt_wikipedia_search()

Installation

CRAN version

install.packages("wikitaxa")

Dev version

devtools::install_github("ropensci/wikitaxa")
library("wikitaxa")

wiki data

wt_data("Poa annua")

Get a Wikidata ID

wt_data_id("Mimulus foliatus")
#> [1] "Q6495130"
#> attr(,"class")
#> [1] "wiki_id"
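The returned `wiki_id` can be passed straight back into `wt_data()` to fetch the full Wikidata record, a round trip the package's own examples point out. A minimal sketch (requires a network connection):

```r
library("wikitaxa")

# look up the Wikidata ID for a name, then fetch the record for that ID
id <- wt_data_id("Mimulus foliatus")
dat <- wt_data(id)
dat$labels
```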

wikipedia

lower level

pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica")
res <- wt_wiki_page_parse(pg)
res$iwlinks
#> [1] "https://en.wiktionary.org/wiki/apple"                                  
#> [2] "https://commons.wikimedia.org/wiki/Special:Search/Apple"               
#> [3] "https://en.wikiquote.org/wiki/Apples"                                  
#> [4] "https://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britannica/Apple"
#> [5] "https://en.wikibooks.org/wiki/Apples"                                  
#> [6] "https://species.wikimedia.org/wiki/Malus_domestica"                    
#> [7] "https://commons.wikimedia.org/wiki/Category:Apple_cultivars"

higher level

res <- wt_wikipedia("Malus domestica")
res$common_names
#> # A tibble: 1 x 2
#>    name language
#>   <chr>    <chr>
#> 1 Apple       en
res$classification
#> # A tibble: 3 x 2
#>         rank         name
#>        <chr>        <chr>
#> 1 plainlinks             
#> 2    species    M. pumila
#> 3   binomial Malus pumila
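Infobox scraping can leave artifact rows such as the empty `plainlinks` entry above; a base-R sketch for dropping rows whose `name` is empty, using the `res` object from the previous chunk:

```r
cl <- res$classification
# keep only rows that actually carry a name
cl[nzchar(cl$name), ]
```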

choose a wikipedia language

# French
wt_wikipedia(name = "Malus domestica", wiki = "fr")
# Slovak
wt_wikipedia(name = "Malus domestica", wiki = "sk")
# Vietnamese
wt_wikipedia(name = "Malus domestica", wiki = "vi")

search

wt_wikipedia_search(query = "Pinus")
#> $batchcomplete
#> [1] ""
#> 
#> $continue
#> $continue$sroffset
#> [1] 10
#> 
#> $continue$continue
#> [1] "-||"
#> 
#> 
#> $query
#> $query$searchinfo
#> $query$searchinfo$totalhits
#> [1] 2912
#> 
#> 
#> $query$search
#> # A tibble: 10 x 7
#>       ns                 title  pageid  size wordcount
#>  * <int>                 <chr>   <int> <int>     <int>
#>  1     0                  Pine   39389 21808      2460
#>  2     0 List of Pinus species  448990 14070       984
#>  3     0        Pinus longaeva  649634 12794      1424
#>  4     0       Pinus ponderosa  532941 29851      2644
#>  5     0            Pinus mugo  438946 10733       808
#>  6     0      Bristlecone pine  215931 16321      1679
#>  7     0           Pinus nigra  438963 11476      1352
#>  8     0      Pinus thunbergii 1522846  4679       438
#>  9     0        Pinus contorta  507717 22621      2321
#> 10     0       Pinus sabiniana  427209 13352      1262
#> # ... with 2 more variables: snippet <chr>, timestamp <chr>
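`wt_wikipedia_search()` also accepts `limit` and `offset` (both validated as numeric by the package), so you can page through the hits reported above. A sketch (requires a network connection):

```r
# first page of 5 hits, then the next 5
p1 <- wt_wikipedia_search(query = "Pinus", limit = 5)
p2 <- wt_wikipedia_search(query = "Pinus", limit = 5, offset = 5)
p2$query$search$title
```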

search supports languages

wt_wikipedia_search(query = "Pinus", wiki = "fr")

wikicommons

lower level

pg <- wt_wiki_page("https://commons.wikimedia.org/wiki/Abelmoschus")
res <- wt_wikicommons_parse(pg)
res$common_names[1:3]
#> [[1]]
#> [[1]]$name
#> [1] "okra"
#> 
#> [[1]]$language
#> [1] "en"
#> 
#> 
#> [[2]]
#> [[2]]$name
#> [1] "مسكي"
#> 
#> [[2]]$language
#> [1] "ar"
#> 
#> 
#> [[3]]
#> [[3]]$name
#> [1] "Abelmoş"
#> 
#> [[3]]$language
#> [1] "az"

higher level

res <- wt_wikicommons("Abelmoschus")
res$classification
#> # A tibble: 15 x 2
#>          rank           name
#>         <chr>          <chr>
#>  1     Domain      Eukaryota
#>  2   unranked Archaeplastida
#>  3     Regnum        Plantae
#>  4     Cladus    angiosperms
#>  5     Cladus       eudicots
#>  6     Cladus  core eudicots
#>  7     Cladus    superrosids
#>  8     Cladus         rosids
#>  9     Cladus    eurosids II
#> 10       Ordo       Malvales
#> 11    Familia      Malvaceae
#> 12 Subfamilia     Malvoideae
#> 13     Tribus      Hibisceae
#> 14      Genus    Abelmoschus
#> 15  Authority  Medik. (1787)
res$common_names
#> # A tibble: 19 x 2
#>                name language
#>               <chr>    <chr>
#>  1             okra       en
#>  2             مسكي       ar
#>  3          Abelmoş       az
#>  4        Ibiškovec       cs
#>  5     Bisameibisch       de
#>  6            Okrat       fi
#>  7        Abelmosco       gl
#>  8        Abelmošus       hr
#>  9           Ybiškė       lt
#> 10   "അബെ\u0d7dമോസ്കസ്"       ml
#> 11         Абельмош      mrj
#> 12 Abelmoskusslekta       nn
#> 13          Piżmian       pl
#> 14         Абельмош       ru
#> 15             موري       sd
#> 16      Okrasläktet       sv
#> 17         Абельмош      udm
#> 18    Chi Vông vang       vi
#> 19           黄葵属       zh

search

wt_wikicommons_search(query = "Pinus")
#> $batchcomplete
#> [1] ""
#> 
#> $continue
#> $continue$sroffset
#> [1] 10
#> 
#> $continue$continue
#> [1] "-||"
#> 
#> 
#> $query
#> $query$searchinfo
#> $query$searchinfo$totalhits
#> [1] 261
#> 
#> 
#> $query$search
#> # A tibble: 10 x 7
#>       ns                                    title   pageid  size wordcount
#>  * <int>                                    <chr>    <int> <int>     <int>
#>  1     0                                    Pinus    82071  4154       320
#>  2     0                       Pinus × schwerinii 11923249   634        67
#>  3     0                              Pinus nigra    64703  7775       501
#>  4     0                             Spinus pinus   703299  1560       242
#>  5     0                            Pinus cooperi  8853401   564        64
#>  6     0 Pinus distribution maps of North America 29464212 25971        92
#>  7     0                           Pinus herrerae 29975479   206        28
#>  8     0                       Pinus tabuliformis   235899  1739       138
#>  9     0                          Pinus maximinoi 20376092   485        60
#> 10     0                      Pinus pseudostrobus  9972866   756        83
#> # ... with 2 more variables: snippet <chr>, timestamp <chr>
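Search results can seed the higher level functions, e.g. fetching the full Wikicommons data for the first few titles; this pattern comes from the package's own examples (requires a network connection):

```r
res <- wt_wikicommons_search(query = "Pinus")
# fetch the full page data for the first three matching titles
out <- lapply(res$query$search$title[1:3], wt_wikicommons)
# how many classification rows each page yielded
vapply(out, function(w) NROW(w$classification), integer(1))
```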

wikispecies

lower level

pg <- wt_wiki_page("https://species.wikimedia.org/wiki/Malus_domestica")
res <- wt_wikispecies_parse(pg, types = "common_names")
res$common_names[1:3]
#> [[1]]
#> [[1]]$name
#> [1] "Ябълка"
#> 
#> [[1]]$language
#> [1] "български"
#> 
#> 
#> [[2]]
#> [[2]]$name
#> [1] "Poma, pomera"
#> 
#> [[2]]$language
#> [1] "català"
#> 
#> 
#> [[3]]
#> [[3]]$name
#> [1] "Apfel"
#> 
#> [[3]]$language
#> [1] "Deutsch"

higher level

res <- wt_wikispecies("Malus domestica")
res$classification
#> # A tibble: 8 x 2
#>          rank          name
#>         <chr>         <chr>
#> 1 Superregnum     Eukaryota
#> 2      Regnum       Plantae
#> 3      Cladus   Angiosperms
#> 4      Cladus      Eudicots
#> 5      Cladus Core eudicots
#> 6      Cladus        Rosids
#> 7      Cladus    Eurosids I
#> 8        Ordo       Rosales
res$common_names
#> # A tibble: 19 x 2
#>               name   language
#>              <chr>      <chr>
#>  1          Ябълка  български
#>  2    Poma, pomera     català
#>  3           Apfel    Deutsch
#>  4     Aed-õunapuu      eesti
#>  5           Μηλιά   Ελληνικά
#>  6           Apple    English
#>  7         Manzano    español
#>  8           Pomme   français
#>  9           Melâr     furlan
#> 10        사과나무     한국어
#> 11          ‘Āpala    Hawaiʻi
#> 12            Melo   italiano
#> 13           Aapel Nordfriisk
#> 14  Maçã, Macieira  português
#> 15 Яблоня домашняя    русский
#> 16   Tarhaomenapuu      suomi
#> 17            Elma     Türkçe
#> 18  Яблуня домашня українська
#> 19          Pomaro     vèneto

search

wt_wikispecies_search(query = "Pinus")
#> $batchcomplete
#> [1] ""
#> 
#> $continue
#> $continue$sroffset
#> [1] 10
#> 
#> $continue$continue
#> [1] "-||"
#> 
#> 
#> $query
#> $query$searchinfo
#> $query$searchinfo$totalhits
#> [1] 400
#> 
#> 
#> $query$search
#> # A tibble: 10 x 7
#>       ns                    title pageid  size wordcount
#>  * <int>                    <chr>  <int> <int>     <int>
#>  1     0                    Pinus  17362  1570       282
#>  2     0 Pinus nigra subsp. nigra 327138  1412       127
#>  3     0        Pinus subg. Pinus 300923   318        27
#>  4     0             Pinus clausa  45047  1520       210
#>  5     0        Pinus sect. Pinus 300935   623        68
#>  6     0           Pinus resinosa  45082  1195       165
#>  7     0         Pinus gordoniana 260795   594        61
#>  8     0     Pinus subsect. Pinus 300938   718        94
#>  9     0         Pinus thunbergii  73542   999       140
#> 10     0          Pinus sabiniana  45084   644        80
#> # ... with 2 more variables: snippet <chr>, timestamp <chr>
wikitaxa/inst/doc/wikitaxa_vignette.R0000644000177700017770000000476213216602015021025 0ustar herbrandtherbrandt## ----echo=FALSE---------------------------------------------------------- knitr::opts_chunk$set( comment = "#>", collapse = TRUE, warning = FALSE, message = FALSE ) ## ----eval=FALSE---------------------------------------------------------- # install.packages("wikitaxa") ## ----eval=FALSE---------------------------------------------------------- # devtools::install_github("ropensci/wikitaxa") ## ------------------------------------------------------------------------ library("wikitaxa") ## ----eval=FALSE---------------------------------------------------------- # wt_data("Poa annua") ## ------------------------------------------------------------------------ wt_data_id("Mimulus foliatus") ## ------------------------------------------------------------------------ pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica") res <- wt_wiki_page_parse(pg) res$iwlinks ## ------------------------------------------------------------------------ res <- wt_wikipedia("Malus domestica") res$common_names res$classification ## ----eval=FALSE---------------------------------------------------------- # # French # wt_wikipedia(name = "Malus domestica", wiki = "fr") # # Slovak # wt_wikipedia(name = "Malus domestica", wiki = "sk") # # Vietnamese # wt_wikipedia(name = "Malus domestica", wiki = "vi") ## ------------------------------------------------------------------------ wt_wikipedia_search(query = "Pinus") ## ----eval=FALSE---------------------------------------------------------- # wt_wikipedia_search(query = "Pinus", wiki = "fr") ## ------------------------------------------------------------------------ pg <- wt_wiki_page("https://commons.wikimedia.org/wiki/Abelmoschus") res <- wt_wikicommons_parse(pg) res$common_names[1:3] ## ------------------------------------------------------------------------ res <- wt_wikicommons("Abelmoschus") res$classification 
res$common_names ## ------------------------------------------------------------------------ wt_wikicommons_search(query = "Pinus") ## ------------------------------------------------------------------------ pg <- wt_wiki_page("https://species.wikimedia.org/wiki/Malus_domestica") res <- wt_wikispecies_parse(pg, types = "common_names") res$common_names[1:3] ## ------------------------------------------------------------------------ res <- wt_wikispecies("Malus domestica") res$classification res$common_names ## ------------------------------------------------------------------------ wt_wikispecies_search(query = "Pinus") wikitaxa/tests/0000755000177700017770000000000013216602015014563 5ustar herbrandtherbrandtwikitaxa/tests/testthat/0000755000177700017770000000000013216602015016423 5ustar herbrandtherbrandtwikitaxa/tests/testthat/test-wikispecies.R0000644000177700017770000000550713145576531022067 0ustar herbrandtherbrandtcontext("wt_wikispecies") test_that("wt_wikispecies returns non-empty results", { skip_on_cran() aa <- wt_wikispecies(name = "Malus domestica") expect_is(aa, "list") expect_named(aa, c('langlinks', 'externallinks', 'common_names', 'classification')) expect_is(aa$langlinks, "data.frame") expect_is(aa$externallinks, "character") expect_is(aa$common_names, "data.frame") expect_named(aa$common_names, c('name', 'language')) expect_is(aa$classification, "data.frame") expect_named(aa$classification, c('rank', 'name')) bb <- wt_wikispecies(name = "Poa annua") expect_is(bb, "list") expect_named(bb, c('langlinks', 'externallinks', 'common_names', 'classification')) expect_is(bb$langlinks, "data.frame") expect_is(bb$externallinks, "character") expect_is(bb$common_names, "data.frame") expect_named(bb$common_names, c('name', 'language')) expect_is(bb$classification, "data.frame") expect_named(bb$classification, c('rank', 'name')) }) test_that("wt_wikispecies fails well", { skip_on_cran() expect_error(wt_wikispecies(), "argument \"name\" is missing") 
  expect_error(wt_wikispecies(5), "name must be of class character")
  # "name" must be length 1
  expect_error(
    wt_wikispecies(c("Pinus", "asdfadsf")),
    "length\\(name\\) == 1 is not TRUE"
  )
  # "utf8" must be logical
  expect_error(
    wt_wikispecies("Pinus", "asdf"),
    "utf8 must be of class logical"
  )
})

context("wt_wikispecies_parse")

test_that("wt_wikispecies_parse returns non-empty results", {
  skip_on_cran()

  url <- "https://species.wikimedia.org/wiki/Malus_domestica"
  pg <- wt_wiki_page(url)
  types <- c("common_names")
  result <- wt_wikispecies_parse(pg, types = types)
  expect_is(result, "list")
  for (fieldname in types) {
    expect_is(result[fieldname], "list")
    expect_gt(length(result[fieldname]), 0)
  }
})

context("wt_wikispecies_search")

test_that("wt_wikispecies_search works", {
  skip_on_cran()

  aa <- wt_wikispecies_search(query = "Pinus")

  expect_is(aa, "list")
  expect_is(aa$continue, "list")
  expect_is(aa$query, "list")
  expect_is(aa$query$searchinfo, "list")
  expect_is(aa$query$search, "data.frame")
  expect_named(aa$query$search, c('ns', 'title', 'pageid', 'size',
    'wordcount', 'snippet', 'timestamp'))

  # no results when not found
  expect_equal(NROW(wt_wikispecies_search("asdfadfaadfadfs")$query$search), 0)
})

test_that("wt_wikispecies_search fails well", {
  skip_on_cran()

  expect_error(
    wt_wikispecies_search(),
    "argument \"query\" is missing"
  )
  expect_error(
    wt_wikispecies_search("Pinus", limit = "adf"),
    "limit must be of class integer, numeric"
  )
  expect_error(
    wt_wikispecies_search("Pinus", offset = "adf"),
    "offset must be of class integer, numeric"
  )
})

wikitaxa/tests/testthat/test-wikicommons.R

context("wt_wikicommons")

test_that("wt_wikicommons returns non-empty results", {
  skip_on_cran()

  aa <- wt_wikicommons(name = "Malus domestica")

  expect_is(aa, "list")
  expect_named(aa, c('langlinks', 'externallinks', 'common_names',
    'classification'))
  expect_is(aa$langlinks, "data.frame")
  expect_is(aa$externallinks, "character")
  expect_is(aa$common_names, "data.frame")
  expect_named(aa$common_names, c('name', 'language'))
  expect_is(aa$classification, "data.frame")
  expect_named(aa$classification, c('rank', 'name'))

  bb <- wt_wikicommons(name = "Poa annua")

  expect_is(bb, "list")
  expect_named(bb, c('langlinks', 'externallinks', 'common_names',
    'classification'))
  expect_is(bb$langlinks, "data.frame")
  expect_is(bb$externallinks, "character")
  expect_is(bb$common_names, "data.frame")
  expect_named(bb$common_names, c('name', 'language'))
  expect_is(bb$classification, "data.frame")
  expect_named(bb$classification, c('rank', 'name'))
})

test_that("wt_wikicommons fails well", {
  skip_on_cran()

  expect_error(wt_wikicommons(), "argument \"name\" is missing")
  expect_error(wt_wikicommons(5), "name must be of class character")
  # "name" must be length 1
  expect_error(
    wt_wikicommons(c("Pinus", "asdfadsf")),
    "length\\(name\\) == 1 is not TRUE"
  )
  # "utf8" must be logical
  expect_error(
    wt_wikicommons("Pinus", "asdf"),
    "utf8 must be of class logical"
  )

  # when no page is found, returns list()
  expect_equal(
    wt_wikicommons("Category:Ursus"),
    list()
  )
})

context("wt_wikicommons_parse")

test_that("wt_wikicommons_parse returns non-empty results", {
  skip_on_cran()

  url <- "https://commons.wikimedia.org/wiki/Malus_domestica"
  pg <- wt_wiki_page(url)
  types <- c("common_names")
  result <- wt_wikicommons_parse(pg, types = types)
  expect_is(result, "list")
  for (fieldname in types) {
    expect_is(result[fieldname], "list")
    expect_gt(length(result[fieldname]), 0)
  }
})

context("wt_wikicommons_search")

test_that("wt_wikicommons_search works", {
  skip_on_cran()

  aa <- wt_wikicommons_search(query = "Pinus")

  expect_is(aa, "list")
  expect_is(aa$continue, "list")
  expect_is(aa$query, "list")
  expect_is(aa$query$searchinfo, "list")
  expect_is(aa$query$search, "data.frame")
  expect_named(aa$query$search, c('ns', 'title', 'pageid', 'size',
    'wordcount', 'snippet', 'timestamp'))

  # no results when not found
  expect_equal(NROW(wt_wikicommons_search("asdfadfaadfadfs")$query$search), 0)
})

test_that("wt_wikicommons_search fails well", {
  skip_on_cran()

  expect_error(
    wt_wikicommons_search(),
    "argument \"query\" is missing"
  )
  expect_error(
    wt_wikicommons_search("Pinus", limit = "adf"),
    "limit must be of class integer, numeric"
  )
  expect_error(
    wt_wikicommons_search("Pinus", offset = "adf"),
    "offset must be of class integer, numeric"
  )
})

wikitaxa/tests/testthat/test-wikipedia.R

context("wt_wikipedia")

test_that("wt_wikipedia returns non-empty results", {
  skip_on_cran()

  aa <- wt_wikipedia(name = "Malus domestica")

  expect_is(aa, "list")
  expect_named(aa, c('langlinks', 'externallinks', 'common_names',
    'classification', 'synonyms'))
  expect_is(aa$langlinks, "data.frame")
  expect_is(aa$externallinks, "character")
  expect_is(aa$common_names, "data.frame")
  expect_named(aa$common_names, c('name', 'language'))
  expect_is(aa$classification, "data.frame")
  expect_named(aa$classification, c('rank', 'name'))

  bb <- wt_wikipedia(name = "Poa annua")

  expect_is(bb, "list")
  expect_named(bb, c('langlinks', 'externallinks', 'common_names',
    'classification', 'synonyms'))
  expect_is(bb$langlinks, "data.frame")
  expect_is(bb$externallinks, "character")
  expect_is(bb$common_names, "data.frame")
  expect_is(bb$classification, "data.frame")
  expect_named(bb$classification, c('rank', 'name'))
})

test_that("wt_wikipedia fails well", {
  skip_on_cran()

  expect_error(wt_wikipedia(), "argument \"name\" is missing")
  expect_error(wt_wikipedia(5), "name must be of class character")
  # "name" must be length 1
  expect_error(
    wt_wikipedia(c("Pinus", "asdfadsf")),
    "length\\(name\\) == 1 is not TRUE"
  )
  # "wiki" must be character
  expect_error(
    wt_wikipedia("Pinus", 5),
    "wiki must be of class character"
  )
  # "utf8" must be logical
  expect_error(
    wt_wikipedia("Pinus", utf8 = "asdf"),
    "utf8 must be of class logical"
  )
})

context("wt_wikipedia_parse")
test_that("wt_wikipedia_parse returns non-empty results", {
  skip_on_cran()

  url <- "https://species.wikimedia.org/wiki/Malus_domestica"
  pg <- wt_wiki_page(url)
  types <- c("common_names")
  result <- wt_wikipedia_parse(pg, types = types)
  expect_is(result, "list")
  for (fieldname in types) {
    expect_is(result[fieldname], "list")
    expect_gt(length(result[fieldname]), 0)
  }
})

context("wt_wikipedia_search")

test_that("wt_wikipedia_search works", {
  skip_on_cran()

  aa <- wt_wikipedia_search(query = "Pinus")

  expect_is(aa, "list")
  expect_is(aa$continue, "list")
  expect_is(aa$query, "list")
  expect_is(aa$query$searchinfo, "list")
  expect_is(aa$query$search, "data.frame")
  expect_named(aa$query$search, c('ns', 'title', 'pageid', 'size',
    'wordcount', 'snippet', 'timestamp'))

  # no results when not found
  expect_equal(NROW(wt_wikipedia_search("asdfadfaadfadfs")$query$search), 0)
})

test_that("wt_wikipedia_search fails well", {
  skip_on_cran()

  expect_error(
    wt_wikipedia_search(),
    "argument \"query\" is missing"
  )
  expect_error(
    wt_wikipedia_search("Pinus", limit = "adf"),
    "limit must be of class integer, numeric"
  )
  expect_error(
    wt_wikipedia_search("Pinus", offset = "adf"),
    "offset must be of class integer, numeric"
  )
})

wikitaxa/tests/testthat/test-wt_data.R

context("wt_data")

test_that("wt_data returns the correct class", {
  skip_on_cran()

  prop <- "P846"
  aa <- wt_data("Mimulus foliatus", property = prop)

  expect_is(aa, "list")
  expect_is(aa$labels, "data.frame")
  expect_is(aa$descriptions, "data.frame")
  expect_is(aa$aliases, "data.frame")
  expect_is(aa$sitelinks, "data.frame")
  expect_is(aa$claims, "data.frame")
  expect_equal(aa$claims$property, prop)
})

test_that("wt_data fails well", {
  expect_error(wt_data(), "argument \"x\" is missing, with no default")
})

wikitaxa/tests/testthat/test-wt_wiki_url_build.R

context("wt_wiki_url_build")

test_that("wt_wiki_url_build correctly builds static page url", {
  skip_on_cran()

  url <- "https://en.wikipedia.org/wiki/Malus_domestica"
  result <- wt_wiki_url_build("en", "wikipedia", "Malus domestica")
  expect_equal(result, url)
})

test_that("wt_wiki_url_build correctly builds API page url", {
  skip_on_cran()

  url <- gsub("\n|\\s+", "",
    "https://en.wikipedia.org/w/api.php?page=
     Malus_domestica&action=parse&redirects=TRUE&format=json&
     utf8=TRUE&prop=text")
  result <- wt_wiki_url_build("en", "wikipedia", "Malus domestica",
    api = TRUE, action = "parse", redirects = TRUE, format = "json",
    utf8 = TRUE, prop = "text")
  expect_equal(result, url)
})

wikitaxa/tests/testthat/test-wt_wiki_url_parse.R

context("wt_wiki_url_parse")

test_that("wt_wiki_url_parse correctly parses static page url", {
  skip_on_cran()

  url <- "https://en.wikipedia.org/wiki/Malus_domestica"
  result <- wt_wiki_url_parse(url)
  expect_is(result, "list")
  expect_equal(result$wiki, "en")
  expect_equal(result$type, "wikipedia")
  expect_equal(result$page, "Malus_domestica")
})

test_that("wt_wiki_url_parse correctly parses API page url", {
  skip_on_cran()

  url <- "https://en.wikipedia.org/w/api.php?page=Malus_domestica"
  result <- wt_wiki_url_parse(url)
  expect_is(result, "list")
  expect_equal(result$wiki, "en")
  expect_equal(result$type, "wikipedia")
  expect_equal(result$page, "Malus_domestica")
})

wikitaxa/tests/testthat/test-wt_wiki_page.R

context("wt_wiki_page/wt_wiki_page_parse")

test_that("wt_wiki_page returns a response object", {
  skip_on_cran()

  url <- "https://en.wikipedia.org/wiki/Malus_domestica"
  result <- wt_wiki_page(url)
  expect_is(result, "HttpResponse")
})

test_that("wt_wiki_page_parse returns non-empty results", {
  skip_on_cran()

  url <- "https://en.wikipedia.org/wiki/Malus_domestica"
  pg <- wt_wiki_page(url)
  types <- c("langlinks", "iwlinks", "externallinks")
  result <- wt_wiki_page_parse(pg, types = types)
  expect_is(result, "list")
  for (fieldname in types) {
    expect_is(result[fieldname], "list")
    expect_gt(length(result[fieldname]), 0)
  }
})

test_that("wt_wiki_page_parse returns non-empty results", {
  skip_on_cran()

  url <- "https://en.wikipedia.org/wiki/Malus_domestica"
  pg <- wt_wiki_page(url)
  types <- c("common_names")
  result <- wt_wiki_page_parse(pg, types = types)
  expect_is(result, "list")
  for (fieldname in types) {
    expect_is(result[fieldname], "list")
    expect_gt(length(result[fieldname]), 0)
  }
})

wikitaxa/tests/test-all.R

library(testthat)
library(wikitaxa)

test_check("wikitaxa")

wikitaxa/NAMESPACE

# Generated by roxygen2: do not edit by hand

S3method(wt_data,default)
S3method(wt_data,wiki_id)
export(wt_data)
export(wt_data_id)
export(wt_wiki_page)
export(wt_wiki_page_parse)
export(wt_wiki_url_build)
export(wt_wiki_url_parse)
export(wt_wikicommons)
export(wt_wikicommons_parse)
export(wt_wikicommons_search)
export(wt_wikipedia)
export(wt_wikipedia_parse)
export(wt_wikipedia_search)
export(wt_wikispecies)
export(wt_wikispecies_parse)
export(wt_wikispecies_search)

wikitaxa/NEWS.md

wikitaxa 0.2.0
==============

### BUG FIXES

* `wt_wikicommons()` fails better now when a page does not exist, and is now consistent with the rest of the package (#14)
* `wt_wikicommons()` fixed - classification objects were not working correctly as the data used is a hot mess - tried to improve parsing of that text (#13)
* `wt_data()` fix - was failing due to, I think, a change in the internal pkg `WikidataR` (#12)

wikitaxa 0.1.4
==============

### NEW FEATURES

* `wt_wikipedia()` and `wt_wikipedia_search()` gain parameter `wiki` to give the wiki language, which defaults to `en` (#9)

### MINOR IMPROVEMENTS

* move some examples to dontrun (#11)

wikitaxa 0.1.0
==============

### NEW FEATURES

* Released to CRAN

wikitaxa/data/wikipedias.rda
(binary R data file; contents omitted)

wikitaxa/R/wiki.R

#' Wikidata taxonomy data
#'
#' @export
#' @param x (character) a taxonomic name
#' @param property (character) a property id, e.g., P486
#' @param ... curl options passed on to [httr::GET()]
#' @param language (character) two letter language code
#' @param limit (integer) records to return. Default: 10
#' @return `wt_data` searches Wikidata, and returns a list with elements:
#' \itemize{
#'  \item labels - data.frame with columns: language, value
#'  \item descriptions - data.frame with columns: language, value
#'  \item aliases - data.frame with columns: language, value
#'  \item sitelinks - data.frame with columns: site, title
#'  \item claims - data.frame with columns: claims, property_value,
#'  property_description, value (comma separated values in string)
#' }
#'
#' `wt_data_id` gets the Wikidata ID for the searched term, and
#' returns the ID as character
#'
#' @details Note that `wt_data` can take a while to run since when fetching
#' claims it has to do so one at a time for each claim
#'
#' You can search things other than taxonomic names with `wt_data` if you
#' like
#' @examples \dontrun{
#' # search by taxon name
#' # wt_data("Mimulus alsinoides")
#'
#' # choose which properties to return
#' wt_data(x = "Mimulus foliatus", property = c("P846", "P815"))
#'
#' # get a taxonomic identifier
#' wt_data_id("Mimulus foliatus")
#' # the id can be passed directly to wt_data()
#' # wt_data(wt_data_id("Mimulus foliatus"))
#' }
wt_data <- function(x, property = NULL, ...) {
  UseMethod("wt_data")
}

#' @export
wt_data.wiki_id <- function(x, property = NULL, ...) {
  data_wiki(x, property = property, ...)
}

#' @export
wt_data.default <- function(x, property = NULL, ...) {
  x <- WikidataR::find_item(search_term = x, ...)
  if (length(x) == 0) stop("no results found", call. = FALSE)
  data_wiki(x[[1]]$id, property = property, ...)
}

#' @export
#' @rdname wt_data
wt_data_id <- function(x, language = "en", limit = 10, ...) {
  x <- WikidataR::find_item(search_term = x, language = language,
    limit = limit, ...)
  x <- if (length(x) == 0) NA else x[[1]]$id
  structure(x, class = "wiki_id")
}

data_wiki <- function(x, property = NULL, ...) {
  xx <- WikidataR::get_item(x, ...)
  if (is.null(property)) {
    claims <- create_claims(xx[[1]]$claims)
  } else {
    cl <- Filter(function(x) x$mainsnak$property %in% property,
      xx[[1]]$claims)
    if (length(cl) == 0) stop("No matching properties", call. = FALSE)
    claims <- create_claims(cl)
  }
  list(
    labels = dt_df(xx[[1]]$labels),
    descriptions = dt_df(xx[[1]]$descriptions),
    aliases = dt_df(xx[[1]]$aliases),
    sitelinks = dt_df(lapply(xx[[1]]$sitelinks, function(x)
      x[names(x) %in% c('site', 'title')])),
    claims = dt_df(claims)
  )
}

fetch_property <- function(x) {
  tmp <- WikidataR::get_property(x)
  list(
    property_value = tmp[[1]]$labels$en$value,
    property_description = tmp[[1]]$descriptions$en$value
  )
}

create_claims <- function(x) {
  lapply(x, function(z) {
    ff <- c(
      property = paste0(unique(z$mainsnak$property), collapse = ","),
      fetch_property(unique(z$mainsnak$property)),
      value = {
        if (inherits(z$mainsnak$datavalue$value, "data.frame")) {
          paste0(z$mainsnak$datavalue$value$`numeric-id`, collapse = ",")
        } else {
          paste0(z$mainsnak$datavalue$value, collapse = ",")
        }
      }
    )
    ff[vapply(ff, is.null, logical(1))] <- NA
    ff
  })
}

wikitaxa/R/wikicommons.R

#' WikiCommons
#'
#' @export
#' @template args
#' @family Wikicommons functions
#' @return `wt_wikicommons` returns a list, with slots:
#' \itemize{
#'  \item langlinks - language page links
#'  \item externallinks - external links
#'  \item common_names - a data.frame with `name` and `language` columns
#'  \item classification - a data.frame with `rank` and `name` columns
#' }
#'
#' `wt_wikicommons_parse` returns a list
#'
#' `wt_wikicommons_search` returns a list with slots for `continue` and
#' `query`, where `query` holds the results, with `query$search` slot with
#' the search results
#' @references for help on search
#' @examples \dontrun{
#' # high level
#' wt_wikicommons(name = "Malus domestica")
#' wt_wikicommons(name = "Pinus contorta")
#' wt_wikicommons(name = "Ursus americanus")
#' wt_wikicommons(name = "Balaenoptera musculus")
#'
#' wt_wikicommons(name = "Category:Poeae")
#' wt_wikicommons(name = "Category:Pinaceae")
#'
#' # low level
#' pg <- wt_wiki_page("https://commons.wikimedia.org/wiki/Malus_domestica")
#' wt_wikicommons_parse(pg)
#'
#' # search wikicommons
#' wt_wikicommons_search(query = "Pinus")
#'
#' ## use search results to dig into pages
#' res <- wt_wikicommons_search(query = "Pinus")
#' lapply(res$query$search$title[1:3], wt_wikicommons)
#' }
wt_wikicommons <- function(name, utf8 = TRUE, ...) {
  assert(name, "character")
  stopifnot(length(name) == 1)
  prop <- c("langlinks", "externallinks", "common_names", "classification")
  res <- wt_wiki_url_build(
    wiki = "commons", type = "wikimedia", page = name,
    utf8 = utf8, prop = prop)
  pg <- wt_wiki_page(res, ...)
  wt_wikicommons_parse(pg, prop, tidy = TRUE)
}

#' @export
#' @rdname wt_wikicommons
wt_wikicommons_parse <- function(page, types = c("langlinks", "iwlinks",
                                                 "externallinks",
                                                 "common_names",
                                                 "classification"),
                                 tidy = FALSE) {
  result <- wt_wiki_page_parse(page, types = types, tidy = tidy)
  json <- jsonlite::fromJSON(rawToChar(page$content), simplifyVector = FALSE)
  # if output is NULL
  if (is.null(json$parse)) {
    return(result)
  }
  # if page not found
  txt <- xml2::read_html(json$parse$text[[1]])
  html <- tryCatch(
    xml2::xml_find_all(txt,
      "//div[contains(., \"Domain\") or contains(., \"Phylum\")]")[[2]],
    error = function(e) e)
  if (inherits(html, "error")) return(list())

  ## Common names
  if ("common_names" %in% types) {
    vernacular_html <- xml2::xml_find_all(txt,
      xpath = "//bdi[@class='vernacular']")
    # XML formats:
    #   name
    #   name
    ## Name formats:
    #   name1 / name2
    #   name1, name2
    #   name (category)
    cnms <- lapply(vernacular_html, function(x) {
      attributes <- xml2::xml_attrs(x)
      language <- attributes[["lang"]]
      name <- trimws(gsub("[ ]*\\(.*\\)", "", xml2::xml_text(x)))
      list(
        name = name,
        language = language
      )
    })
    result$common_names <- if (tidy) atbl(dt_df(cnms)) else cnms
  }

  ## classification
  if ("classification" %in% types) {
    html <- tryCatch(
      xml2::xml_find_all(txt,
        "//div[contains(., \"Domain\") or contains(., \"Phylum\")]")[[2]],
      error = function(e) e)
    # labels
    labels <- c(
      gsub(":", "", strex(xml2::xml_text(html), "[A-Za-z]+\\)?:")[[1]]),
      "Authority")
    labels <- gsub("\\(|\\)", "", labels)
    labels <- labels[-1]
    # values
    values <- xml2::xml_text(
      xml2::xml_find_all(
        if (inherits(html, "xml_nodes")) html[[2]] else html, ".//b"))
    values <- gsub("^:\\s+|^.+:\\s?", "", values)
    values <- values[-1]
    clz <- mapply(list, rank = labels, name = values,
      SIMPLIFY = FALSE, USE.NAMES = FALSE)
    result$classification <- if (tidy) atbl(dt_df(clz)) else clz
  }

  return(result)
}

#' @export
#' @rdname wt_wikicommons
wt_wikicommons_search <- function(query, limit = 10, offset = 0,
                                  utf8 = TRUE, ...)
{ tmp <- g_et(search_base("commons"), sh(query, limit, offset, utf8), ...) tmp$query$search <- atbl(tmp$query$search) return(tmp) } wikitaxa/R/wikipages.R0000644000177700017770000001421313103204320015721 0ustar herbrandtherbrandt# MediaWiki (general) ---------------- #' Parse MediaWiki Page URL #' #' Parse a MediaWiki page url into its component parts (wiki name, wiki type, #' and page title). Supports both static page urls and their equivalent API #' calls. #' #' @export #' @param url (character) MediaWiki page url. #' @family MediaWiki functions #' @return a list with elements: #' \itemize{ #' \item wiki - wiki language #' \item type - wikipedia type #' \item page - page name #' } #' @examples #' wt_wiki_url_parse(url="https://en.wikipedia.org/wiki/Malus_domestica") #' wt_wiki_url_parse("https://en.wikipedia.org/w/api.php?page=Malus_domestica") wt_wiki_url_parse <- function(url) { url <- curl::curl_unescape(url) if (grepl("/w/api.php?", url)) { matches <- match_( url, "//([^\\.]+).([^\\.]+).[^/]*/w/api\\.php\\?.*page=([^&]+).*$") } else { matches <- match_(url, "//([^\\.]+).([^\\.]+).[^/]*/wiki/([^\\?]+)") } return(list( wiki = matches[2], type = matches[3], page = matches[4] )) } #' Build MediaWiki Page URL #' #' Builds a MediaWiki page url from its component parts (wiki name, wiki type, #' and page title). Supports both static page urls and their equivalent API #' calls. #' #' @export #' @param wiki (character | list) Either the wiki name or a list with #' `$wiki`, `$type`, and `$page` (the output of [wt_wiki_url_parse()]). #' @param type (character) Wiki type. #' @param page (character) Wiki page title. #' @param api (boolean) Whether to return an API call or a static page url #' (default). If `FALSE`, all following (API-only) arguments are ignored. #' @param action (character) See #' for supported actions. This function currently only supports "parse". #' @param redirects (boolean) If the requested page is set to a redirect, #' resolve it. 
#' @param format (character) See #' for supported output formats. #' @param utf8 (boolean) If `TRUE`, encodes most (but not all) non-ASCII #' characters as UTF-8 instead of replacing them with hexadecimal escape #' sequences. #' @param prop (character) Properties to retrieve, either as a character vector #' or pipe-delimited string. See #' for #' supported properties. #' @family MediaWiki functions #' @return a URL (character) #' @examples #' wt_wiki_url_build(wiki = "en", type = "wikipedia", page = "Malus domestica") #' wt_wiki_url_build( #' wt_wiki_url_parse("https://en.wikipedia.org/wiki/Malus_domestica")) #' wt_wiki_url_build("en", "wikipedia", "Malus domestica", api = TRUE) wt_wiki_url_build <- function(wiki, type = NULL, page = NULL, api = FALSE, action = "parse", redirects = TRUE, format = "json", utf8 = TRUE, prop = c("text", "langlinks", "categories", "links", "templates", "images", "externallinks", "sections", "revid", "displaytitle", "iwlinks", "properties")) { assert(utf8, "logical") if (is.null(type) && is.null(page)) { type <- wiki$type page <- wiki$page wiki <- wiki$wiki } page <- gsub(" ", "_", page) if (api) { base_url <- paste0("https://", wiki, ".", type, ".org/w/api.php") # To ensure it is removed if (!utf8) utf8 <- "" prop <- paste(prop, collapse = "|") query <- c(page = page, mget(c("action", "redirects", "format", "utf8", "prop"))) query <- query[vapply(query, "!=", logical(1), "")] url <- crul::url_build(base_url, query = query) return(url) } else { return(paste0("https://", wiki, ".", type, ".org/wiki/", page)) } } #' Get MediaWiki Page from API #' #' Supports both static page urls and their equivalent API calls. #' #' @export #' @param url (character) MediaWiki page url. #' @param ... Arguments passed to [wt_wiki_url_build()] if `url` #' is a static page url. 
#' @family MediaWiki functions #' @return an `HttpResponse` response object from \pkg{crul} #' @details If the URL given is for a human readable html page, #' we convert it to equivalent API call - if URL is already an API call, #' we just use that. #' @examples \dontrun{ #' wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica") #' } wt_wiki_page <- function(url, ...) { stopifnot(inherits(url, "character")) if (!grepl("/w/api.php?", url)) { url <- wt_wiki_url_build(wt_wiki_url_parse(url), api = TRUE) } cli <- crul::HttpClient$new(url = url) res <- cli$get(...) res$raise_for_status() return(res) } #' Parse MediaWiki Page #' #' Parses common properties from the result of a MediaWiki API page call. #' #' @export #' @param page ([crul::HttpResponse]) Result of [wt_wiki_page()] #' @param types (character) List of properties to parse. #' @param tidy (logical). tidy output to data.frames when possible. #' Default: `FALSE` #' @family MediaWiki functions #' @return a list #' @details Available properties currently not parsed: #' title, displaytitle, pageid, revid, redirects, text, categories, #' links, templates, images, sections, properties, ... 
#' @examples \dontrun{ #' pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica") #' wt_wiki_page_parse(pg) #' } wt_wiki_page_parse <- function(page, types = c("langlinks", "iwlinks", "externallinks"), tidy = FALSE) { stopifnot(inherits(page, "HttpResponse")) result <- list() json <- jsonlite::fromJSON(rawToChar(page$content), tidy) if (is.null(json$parse)) { return(result) } ## Links to equivalent page in other languages if ("langlinks" %in% types) { result$langlinks <- if (tidy) { atbl(json$parse$langlinks) } else { vapply(json$parse$langlinks, "[[", "", "url") } } ## Other wiki links if ("iwlinks" %in% types) { result$iwlinks <- if (tidy) { atbl(json$parse$iwlinks$url) } else { vapply(json$parse$iwlinks, "[[", "", "url") } } ## Links to external resources if ("externallinks" %in% types) { result$externallinks <- json$parse$externallinks } ## Return return(result) } wikitaxa/R/wikipedia.R0000644000177700017770000001173113103204212015706 0ustar herbrandtherbrandt#' Wikipedia #' #' @export #' @template args #' @param wiki (character) wiki language. default: en. See [wikipedias] for #' language codes. 
#' @family Wikipedia functions
#' @return `wt_wikipedia` returns a list, with slots:
#' \itemize{
#'  \item langlinks - language page links
#'  \item externallinks - external links
#'  \item common_names - a data.frame with `name` and `language` columns
#'  \item classification - a data.frame with `rank` and `name` columns
#'  \item synonyms - a character vector with taxonomic names
#' }
#'
#' `wt_wikipedia_parse` returns a list with the same slots, determined by
#' the `types` parameter
#'
#' `wt_wikipedia_search` returns a list with slots for `continue` and
#' `query`, where `query` holds the results, with `query$search` slot with
#' the search results
#' @references for help on search
#' @examples \dontrun{
#' # high level
#' wt_wikipedia(name = "Malus domestica")
#' wt_wikipedia(name = "Malus domestica", wiki = "fr")
#' wt_wikipedia(name = "Malus domestica", wiki = "da")
#'
#' # low level
#' pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica")
#' wt_wikipedia_parse(pg)
#' wt_wikipedia_parse(pg, tidy = TRUE)
#'
#' # search wikipedia
#' wt_wikipedia_search(query = "Pinus")
#' wt_wikipedia_search(query = "Pinus", wiki = "fr")
#' wt_wikipedia_search(query = "Pinus", wiki = "br")
#'
#' ## curl options
#' # wt_wikipedia_search(query = "Pinus", verbose = TRUE)
#'
#' ## use search results to dig into pages
#' res <- wt_wikipedia_search(query = "Pinus")
#' lapply(res$query$search$title[1:3], wt_wikipedia)
#' }
wt_wikipedia <- function(name, wiki = "en", utf8 = TRUE, ...) {
  assert(name, "character")
  assert(wiki, "character")
  stopifnot(length(name) == 1)
  prop <- c("langlinks", "externallinks", "common_names",
    "classification", "synonyms")
  res <- wt_wiki_url_build(
    wiki = wiki, type = "wikipedia", page = name,
    utf8 = utf8, prop = prop)
  pg <- wt_wiki_page(res, ...)
wt_wikipedia_parse(page = pg, types = prop, tidy = TRUE) } #' @export #' @rdname wt_wikipedia wt_wikipedia_parse <- function(page, types = c("langlinks", "iwlinks", "externallinks", "common_names", "classification"), tidy = FALSE) { result <- wt_wiki_page_parse(page, types = types, tidy = tidy) json <- jsonlite::fromJSON(rawToChar(page$content), simplifyVector = TRUE) if (is.null(json$parse)) { return(result) } ## Common names if ("common_names" %in% types) { xml <- xml2::read_html(json$parse$text[[1]]) names_xml <- list( regular_bolds = xml2::xml_find_all( xml, xpath = "/html/body/p[count(preceding::div[contains(@id, 'toc') or contains(@class, 'toc')]) = 0 and count(preceding::h1) = 0 and count(preceding::h2) = 0 and count(preceding::h3) = 0]//b[not(parent::*[self::i]) and not(i)]"), #nolint regular_biotabox_header = xml2::xml_find_all( xml, xpath = "(//table[contains(@class, 'infobox biota') or contains(@class, 'infobox_v2 biota')]//th)[1]/b[not(parent::*[self::i]) and not(i)]") #nolint ) # NOTE: Often unreliable. 
regular_title <- stats::na.omit( match_(json$parse$displaytitle, "^([^<]*)$")[2]) common_names <- unique(c(unlist(lapply(names_xml, xml2::xml_text)), regular_title)) language <- match_(page$url, 'http[s]*://([^\\.]*)\\.')[2] cnms <- lapply(common_names, function(name) { list(name = name, language = language) }) result$common_names <- if (tidy) atbl(dt_df(cnms)) else cnms } ## classification if ("classification" %in% types) { txt <- xml2::read_html(json$parse$text[[1]]) html <- xml2::xml_find_all(txt, "//table[@class=\"infobox biota\"]//span") labels <- xml2::xml_attr(html, "class") labels <- gsub("^\\s+|\\s$|\\(|\\)", "", labels) values <- gsub("^\\s+|\\s$", "", xml2::xml_text(html)) clz <- mapply(list, rank = labels, name = values, SIMPLIFY = FALSE, USE.NAMES = FALSE) result$classification <- if (tidy) atbl(dt_df(clz)) else clz } ## synonyms if ("synonyms" %in% types) { syns <- list() txt <- xml2::read_html(json$parse$text[[1]]) html <- xml2::xml_find_all(txt, "//table[@class=\"infobox biota\"]//td") syn_node <- xml2::xml_find_first(html, "//th/a[contains(text(), \"Synonyms\")]") if (length(stats::na.omit(xml2::xml_text(syn_node))) > 0) { syn <- strsplit(xml2::xml_text(html[length(html)]), "\n")[[1]] syns <- syn[nzchar(syn)] } result$synonyms <- syns } return(result) } #' @export #' @rdname wt_wikipedia wt_wikipedia_search <- function(query, wiki = "en", limit = 10, offset = 0, utf8 = TRUE, ...) { assert(wiki, "character") tmp <- g_et(search_base(wiki, "wikipedia"), sh(query, limit, offset, utf8), ...) 
tmp$query$search <- atbl(tmp$query$search)
  return(tmp)
}
wikitaxa/R/wikitaxa-package.R0000644000177700017770000000105713071750101017161 0ustar herbrandtherbrandt#' Taxonomic Information from Wikipedia
#'
#' @name wikitaxa-package
#' @aliases wikitaxa
#' @docType package
#' @author Scott Chamberlain \email{myrmecocystus@@gmail.com}
#' @author Ethan Welty
#' @keywords package
NULL

#' List of Wikipedias
#'
#' data.frame of 295 rows, with 3 columns:
#' \itemize{
#'  \item language - language
#'  \item language_local - language in local name
#'  \item wiki - language code for the wiki
#' }
#'
#' From <https://meta.wikimedia.org/wiki/List_of_Wikipedias>
#'
#' @name wikipedias
#' @docType data
#' @keywords data
NULL
wikitaxa/R/wikispecies.R0000644000177700017770000001006313145635425016300 0ustar herbrandtherbrandt#' WikiSpecies
#'
#' @export
#' @template args
#' @family Wikispecies functions
#' @return `wt_wikispecies` returns a list, with slots:
#' \itemize{
#'  \item langlinks - language page links
#'  \item externallinks - external links
#'  \item common_names - a data.frame with `name` and `language` columns
#'  \item classification - a data.frame with `rank` and `name` columns
#' }
#'
#' `wt_wikispecies_parse` returns a list
#'
#' `wt_wikispecies_search` returns a list with slots for `continue` and
#' `query`, where `query` holds the results, with `query$search` slot with
#' the search results
#' @references for help on search
#' @examples \dontrun{
#' # high level
#' wt_wikispecies(name = "Malus domestica")
#' wt_wikispecies(name = "Pinus contorta")
#' wt_wikispecies(name = "Ursus americanus")
#' wt_wikispecies(name = "Balaenoptera musculus")
#'
#' # low level
#' pg <- wt_wiki_page("https://species.wikimedia.org/wiki/Abelmoschus")
#' wt_wikispecies_parse(pg)
#'
#' # search wikispecies
#' wt_wikispecies_search(query = "pine tree")
#'
#' ## use search results to dig into pages
#' res <- wt_wikispecies_search(query = "pine tree")
#' lapply(res$query$search$title[1:3], wt_wikispecies)
#' }
wt_wikispecies <- function(name, utf8 =
TRUE, ...) { assert(name, "character") stopifnot(length(name) == 1) prop <- c("langlinks", "externallinks", "common_names", "classification") res <- wt_wiki_url_build( wiki = "species", type = "wikimedia", page = name, utf8 = utf8, prop = prop) pg <- wt_wiki_page(res, ...) wt_wikispecies_parse(pg, prop, tidy = TRUE) } #' @export #' @rdname wt_wikispecies wt_wikispecies_parse <- function(page, types = c("langlinks", "iwlinks", "externallinks", "common_names", "classification"), tidy = FALSE) { result <- wt_wiki_page_parse(page, types = types, tidy = tidy) json <- jsonlite::fromJSON(rawToChar(page$content), simplifyVector = FALSE) if (is.null(json$parse)) { return(result) } ## Common names if ("common_names" %in% types) { xml <- xml2::read_html(json$parse$text[[1]]) # XML formats: # language: [name|name] # Name formats: # name1, name2 vernacular_html <- xml2::xml_find_all( xml, "(//h2/span[contains(@id, 'Vernacular')]/parent::*/following-sibling::div)[1]" #nolint ) languages_html <- xml2::xml_find_all(vernacular_html, xpath = "b") languages <- gsub("\\s*:\\s*", "", unlist(lapply(languages_html, xml2::xml_text))) names_html <- xml2::xml_find_all( vernacular_html, "b[not(following-sibling::*[1][self::a])]/following-sibling::text()[1] | b/following-sibling::*[1][self::a]/text()") #nolint common_names <- gsub("^\\s*", "", unlist(lapply(names_html, xml2::xml_text))) cnms <- mapply(list, name = common_names, language = languages, SIMPLIFY = FALSE, USE.NAMES = FALSE) result$common_names <- if (tidy) atbl(dt_df(cnms)) else cnms } ## classification if ("classification" %in% types) { txt <- xml2::read_html(json$parse$text[[1]]) html <- xml2::xml_text( xml2::xml_find_first(txt, "//table[contains(@class, \"wikitable\")]//p")) html <- strsplit(html, "\n")[[1]] labels <- vapply(html, function(z) strsplit(z, ":")[[1]][1], "", USE.NAMES = FALSE) values <- vapply(html, function(z) strsplit(z, ":")[[1]][2], "", USE.NAMES = FALSE) values <- gsub("^\\s+|\\s+$", "", values) clz <- 
mapply(list, rank = labels, name = values, SIMPLIFY = FALSE, USE.NAMES = FALSE) result$classification <- if (tidy) atbl(dt_df(clz)) else clz } return(result) } #' @export #' @rdname wt_wikispecies wt_wikispecies_search <- function(query, limit = 10, offset = 0, utf8 = TRUE, ...) { tmp <- g_et(search_base("species"), sh(query, limit, offset, utf8), ...) tmp$query$search <- atbl(tmp$query$search) return(tmp) } wikitaxa/R/globals.R0000644000177700017770000000011513071776340015401 0ustar herbrandtherbrandtif (getRversion() >= "2.15.1") { utils::globalVariables(c('wikipedias')) } wikitaxa/R/zzz.R0000644000177700017770000000242513216573427014623 0ustar herbrandtherbrandttc <- function(l) Filter(Negate(is.null), l) dt_df <- function(x) { (ffff <- data.table::setDF(data.table::rbindlist(x, fill = TRUE, use.names = TRUE))) } search_base <- function(x, y = "wikimedia") { sprintf("https://%s.%s.org/w/api.php", x, y) } atbl <- function(x) tibble::as_tibble(x) g_et <- function(url, args = list(), ...) { cli <- crul::HttpClient$new(url = url) res <- cli$get(query = args, ...) res$raise_for_status() jsonlite::fromJSON(res$parse("UTF-8")) } assert <- function(x, y) { if (!is.null(x)) { if (!class(x) %in% y) { stop(deparse(substitute(x)), " must be of class ", paste0(y, collapse = ", "), call. 
= FALSE) } } } sh <- function(query, limit, offset, utf8) { assert(limit, c("integer", "numeric")) assert(offset, c("integer", "numeric")) assert(utf8, "logical") tc(list( action = "query", list = "search", srsearch = query, utf8 = if (utf8) "" else NULL, format = "json", srprop = "size|wordcount|timestamp|snippet", srlimit = limit, sroffset = offset )) } match_ <- function(string, pattern) { pos <- regexec(pattern, string) regmatches(string, pos)[[1]] } strex <- function(string, pattern) { regmatches(string, gregexpr(pattern, string)) } wikitaxa/vignettes/0000755000177700017770000000000013216602015015431 5ustar herbrandtherbrandtwikitaxa/vignettes/wikitaxa_vignette.Rmd0000644000177700017770000000547413071751010021634 0ustar herbrandtherbrandt--- title: "Introduction to the wikitaxa package" author: "Scott Chamberlain" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to the wikitaxa package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r echo=FALSE} knitr::opts_chunk$set( comment = "#>", collapse = TRUE, warning = FALSE, message = FALSE ) ``` `wikitaxa` - Taxonomy data from Wikipedia The goal of `wikitaxa` is to allow search and taxonomic data retrieval from across many Wikimedia sites, including: Wikipedia, Wikicommons, and Wikispecies. There are lower level and higher level parts to the package API: ### Low level API The low level API is meant for power users and gives you more control, but requires more knowledge. * `wt_wiki_page()` * `wt_wiki_page_parse()` * `wt_wiki_url_build()` * `wt_wiki_url_parse()` * `wt_wikispecies_parse()` * `wt_wikicommons_parse()` * `wt_wikipedia_parse()` ### High level API The high level API is meant to be easier and faster to use. 
* `wt_data()` * `wt_data_id()` * `wt_wikispecies()` * `wt_wikicommons()` * `wt_wikipedia()` Search functions: * `wt_wikicommons_search()` * `wt_wikispecies_search()` * `wt_wikipedia_search()` ## Installation CRAN version ```{r eval=FALSE} install.packages("wikitaxa") ``` Dev version ```{r eval=FALSE} devtools::install_github("ropensci/wikitaxa") ``` ```{r} library("wikitaxa") ``` ## wiki data ```{r eval=FALSE} wt_data("Poa annua") ``` Get a Wikidata ID ```{r} wt_data_id("Mimulus foliatus") ``` ## wikipedia lower level ```{r} pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica") res <- wt_wiki_page_parse(pg) res$iwlinks ``` higher level ```{r} res <- wt_wikipedia("Malus domestica") res$common_names res$classification ``` choose a wikipedia language ```{r eval=FALSE} # French wt_wikipedia(name = "Malus domestica", wiki = "fr") # Slovak wt_wikipedia(name = "Malus domestica", wiki = "sk") # Vietnamese wt_wikipedia(name = "Malus domestica", wiki = "vi") ``` search ```{r} wt_wikipedia_search(query = "Pinus") ``` search supports languages ```{r eval=FALSE} wt_wikipedia_search(query = "Pinus", wiki = "fr") ``` ## wikicommons lower level ```{r} pg <- wt_wiki_page("https://commons.wikimedia.org/wiki/Abelmoschus") res <- wt_wikicommons_parse(pg) res$common_names[1:3] ``` higher level ```{r} res <- wt_wikicommons("Abelmoschus") res$classification res$common_names ``` search ```{r} wt_wikicommons_search(query = "Pinus") ``` ## wikispecies lower level ```{r} pg <- wt_wiki_page("https://species.wikimedia.org/wiki/Malus_domestica") res <- wt_wikispecies_parse(pg, types = "common_names") res$common_names[1:3] ``` higher level ```{r} res <- wt_wikispecies("Malus domestica") res$classification res$common_names ``` search ```{r} wt_wikispecies_search(query = "Pinus") ``` wikitaxa/README.md0000644000177700017770000001563613216574127014727 0ustar herbrandtherbrandtwikitaxa ======== [![Project Status: WIP - Initial development is in progress, but there has not yet been a 
stable, usable release suitable for the public.](http://www.repostatus.org/badges/latest/wip.svg)](http://www.repostatus.org/#wip) [![Build Status](https://api.travis-ci.org/ropensci/wikitaxa.svg?branch=master)](https://travis-ci.org/ropensci/wikitaxa) [![codecov](https://codecov.io/gh/ropensci/wikitaxa/branch/master/graph/badge.svg)](https://codecov.io/gh/ropensci/wikitaxa) [![rstudio mirror downloads](https://cranlogs.r-pkg.org/badges/wikitaxa)](https://github.com/metacran/cranlogs.app) [![cran version](https://www.r-pkg.org/badges/version/wikitaxa)](https://cran.r-project.org/package=wikitaxa) `wikitaxa` - taxonomy data from Wikipedia/Wikidata/Wikispecies ### Low level API The low level API is meant for power users and gives you more control, but requires more knowledge. * `wt_wiki_page()` * `wt_wiki_page_parse()` * `wt_wiki_url_build()` * `wt_wiki_url_parse()` * `wt_wikispecies_parse()` * `wt_wikicommons_parse()` * `wt_wikipedia_parse()` ### High level API The high level API is meant to be easier and faster to use. 
* `wt_data()` * `wt_data_id()` * `wt_wikispecies()` * `wt_wikicommons()` * `wt_wikipedia()` Search functions: * `wt_wikicommons_search()` * `wt_wikispecies_search()` * `wt_wikipedia_search()` ## Installation CRAN version ```r install.packages("wikitaxa") ``` Dev version ```r install.packages("devtools") devtools::install_github("ropensci/wikitaxa") ``` ```r library('wikitaxa') ``` ## wiki data ```r wt_data("Poa annua") ``` Get a Wikidata ID ```r wt_data_id("Mimulus foliatus") #> [1] "Q6495130" #> attr(,"class") #> [1] "wiki_id" ``` ## wikipedia lower level ```r pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica") res <- wt_wiki_page_parse(pg) res$iwlinks #> [1] "https://en.wiktionary.org/wiki/apple" #> [2] "https://commons.wikimedia.org/wiki/Special:Search/Apple" #> [3] "https://en.wikiquote.org/wiki/Apples" #> [4] "https://en.wikisource.org/wiki/1911_Encyclop%C3%A6dia_Britannica/Apple" #> [5] "https://en.wikibooks.org/wiki/Apples" #> [6] "https://species.wikimedia.org/wiki/Malus_domestica" #> [7] "https://commons.wikimedia.org/wiki/Category:Apple_cultivars" ``` higher level ```r res <- wt_wikipedia("Malus domestica") res$common_names #> # A tibble: 1 x 2 #> name language #> #> 1 Apple en res$classification #> # A tibble: 3 x 2 #> rank name #> #> 1 plainlinks #> 2 species M. 
pumila #> 3 binomial Malus pumila ``` choose a wikipedia language ```r # French wt_wikipedia(name = "Malus domestica", wiki = "fr") # Slovak wt_wikipedia(name = "Malus domestica", wiki = "sk") # Vietnamese wt_wikipedia(name = "Malus domestica", wiki = "vi") ``` ## wikicommons lower level ```r pg <- wt_wiki_page("https://commons.wikimedia.org/wiki/Abelmoschus") res <- wt_wikicommons_parse(pg) res$common_names[1:3] #> [[1]] #> [[1]]$name #> [1] "okra" #> #> [[1]]$language #> [1] "en" #> #> #> [[2]] #> [[2]]$name #> [1] "مسكي" #> #> [[2]]$language #> [1] "ar" #> #> #> [[3]] #> [[3]]$name #> [1] "Abelmoş" #> #> [[3]]$language #> [1] "az" ``` higher level ```r res <- wt_wikicommons("Abelmoschus") res$classification #> # A tibble: 15 x 2 #> rank name #> #> 1 Domain Eukaryota #> 2 unranked Archaeplastida #> 3 Regnum Plantae #> 4 Cladus angiosperms #> 5 Cladus eudicots #> 6 Cladus core eudicots #> 7 Cladus superrosids #> 8 Cladus rosids #> 9 Cladus eurosids II #> 10 Ordo Malvales #> 11 Familia Malvaceae #> 12 Subfamilia Malvoideae #> 13 Tribus Hibisceae #> 14 Genus Abelmoschus #> 15 Authority Medik. 
(1787) res$common_names #> # A tibble: 19 x 2 #> name language #> #> 1 okra en #> 2 مسكي ar #> 3 Abelmoş az #> 4 Ibiškovec cs #> 5 Bisameibisch de #> 6 Okrat fi #> 7 Abelmosco gl #> 8 Abelmošus hr #> 9 Ybiškė lt #> 10 "അബെ\u0d7dമോസ്കസ്" ml #> 11 Абельмош mrj #> 12 Abelmoskusslekta nn #> 13 Piżmian pl #> 14 Абельмош ru #> 15 موري sd #> 16 Okrasläktet sv #> 17 Абельмош udm #> 18 Chi Vông vang vi #> 19 黄葵属 zh ``` ## wikispecies lower level ```r pg <- wt_wiki_page("https://species.wikimedia.org/wiki/Malus_domestica") res <- wt_wikispecies_parse(pg, types = "common_names") res$common_names[1:3] #> [[1]] #> [[1]]$name #> [1] "Ябълка" #> #> [[1]]$language #> [1] "български" #> #> #> [[2]] #> [[2]]$name #> [1] "Poma, pomera" #> #> [[2]]$language #> [1] "català" #> #> #> [[3]] #> [[3]]$name #> [1] "Apfel" #> #> [[3]]$language #> [1] "Deutsch" ``` higher level ```r res <- wt_wikispecies("Malus domestica") res$classification #> # A tibble: 8 x 2 #> rank name #> #> 1 Superregnum Eukaryota #> 2 Regnum Plantae #> 3 Cladus Angiosperms #> 4 Cladus Eudicots #> 5 Cladus Core eudicots #> 6 Cladus Rosids #> 7 Cladus Eurosids I #> 8 Ordo Rosales res$common_names #> # A tibble: 19 x 2 #> name language #> #> 1 Ябълка български #> 2 Poma, pomera català #> 3 Apfel Deutsch #> 4 Aed-õunapuu eesti #> 5 Μηλιά Ελληνικά #> 6 Apple English #> 7 Manzano español #> 8 Pomme français #> 9 Melâr furlan #> 10 사과나무 한국어 #> 11 ‘Āpala Hawaiʻi #> 12 Melo italiano #> 13 Aapel Nordfriisk #> 14 Maçã, Macieira português #> 15 Яблоня домашняя русский #> 16 Tarhaomenapuu suomi #> 17 Elma Türkçe #> 18 Яблуня домашня українська #> 19 Pomaro vèneto ``` ## Contributors * [Ethan Welty](https://github.com/ezwelty) * [Scott Chamberlain](https://github.com/sckott) ## Meta * Please [report any issues or bugs](https://github.com/ropensci/wikitaxa/issues). 
* License: MIT * Get citation information for `wikitaxa` in R doing `citation(package = 'wikitaxa')` * Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms. [![ropensci](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org) wikitaxa/MD50000644000177700017770000000401213216753634013744 0ustar herbrandtherbrandt462a98450412c1a45c5697e0073a8dc8 *DESCRIPTION c5af52351472a750055a760a8924ce71 *LICENSE 913e5d7d676aaab98f66aa238239936a *NAMESPACE 9f18cb1fb7de8a9e55cf256df1ed2912 *NEWS.md 742ccfe2d41233878115e1c41a293f53 *R/globals.R 96c6a672e8158a79108b1a18a69173e4 *R/wiki.R dade6665c461b4e115db6b8f65f03e1f *R/wikicommons.R ced60ef6ab6afa5d44605b0a1b4536ac *R/wikipages.R 6202228c1fa33e2ae7562dc96d79d382 *R/wikipedia.R c9c6a6612f15af4ff245a9996083b000 *R/wikispecies.R 04d38ca008da155845ee56063b54d35a *R/wikitaxa-package.R 63ae6c0c7b00a1dfc5c380d5513c3e93 *R/zzz.R 31e800cb9cdd30d1ff33642a1178b9fd *README.md 523203e0ec346dcd93b21348c558cbb8 *build/vignette.rds ed7a9871999234c7e411153f36dc5530 *data/wikipedias.rda 1b3cadb5ad75e550ad6c53d201061c5e *inst/doc/wikitaxa_vignette.R 6a0319642b0b7bcd0a58a15068eae741 *inst/doc/wikitaxa_vignette.Rmd 33e6918063e75779f806bda84c134398 *inst/doc/wikitaxa_vignette.html 95a5800c2e07653bd1e555a5b105911a *man/wikipedias.Rd c52da248a29e5d39bc26f8437f5a3947 *man/wikitaxa-package.Rd 723900cc9e822db562a2254b6c278e11 *man/wt_data.Rd 45863fa8f408821364438c642173e05c *man/wt_wiki_page.Rd 23c68e21a98be07d2d1937d17e51b29e *man/wt_wiki_page_parse.Rd 7b48c4aac9de999437c327d17d55fc6d *man/wt_wiki_url_build.Rd 95cec5153587ef7765b041eafccabc0b *man/wt_wiki_url_parse.Rd 34b5c0e06c959df8d9dd210a32b46579 *man/wt_wikicommons.Rd 2e1f6d59f7767984a202b1e3cbbe69bc *man/wt_wikipedia.Rd f0033ed79f6bab8017ef399e40cf388b *man/wt_wikispecies.Rd f8a030c37b64a043072d27be6aa286d1 *tests/test-all.R 31a277de5b738a8274c572d285ed8585 
*tests/testthat/test-wikicommons.R
e6dc4c1c07708834e3b2a5d419437ac8 *tests/testthat/test-wikipedia.R
578739dd1e6fbd80d7fbd216f1e2392a *tests/testthat/test-wikispecies.R
4930b09d3f1c6cc9fcca4da14999e053 *tests/testthat/test-wt_data.R
0f6d914ba4d5a8cce72733421fb3e667 *tests/testthat/test-wt_wiki_page.R
27a0b128a66d503af9209461eba1a936 *tests/testthat/test-wt_wiki_url_build.R
e93890d0d28f9fc47b5ffa19beae9b95 *tests/testthat/test-wt_wiki_url_parse.R
6a0319642b0b7bcd0a58a15068eae741 *vignettes/wikitaxa_vignette.Rmd
wikitaxa/build/0000755000177700017770000000000013216602015014520 5ustar herbrandtherbrandtwikitaxa/build/vignette.rds0000644000177700017770000000034613216602015017062 0ustar herbrandtherbrandtwikitaxa/DESCRIPTION0000644000177700017770000000211313216753634015142 0ustar herbrandtherbrandtPackage: wikitaxa
Title: Taxonomic Information from 'Wikipedia'
Description: 'Taxonomic' information from 'Wikipedia', 'Wikicommons',
    'Wikispecies', and 'Wikidata'. Functions included for getting taxonomic
    information from each of the sources just listed, as well as performing
    taxonomic search.
Version: 0.2.0
License: MIT + file LICENSE
URL: https://github.com/ropensci/wikitaxa
BugReports: https://github.com/ropensci/wikitaxa/issues
Authors@R: c(
    person("Scott", "Chamberlain", role = c("aut", "cre"),
    email = "myrmecocystus+r@gmail.com"),
    person("Ethan", "Welty", role = "aut")
    )
LazyLoad: yes
LazyData: yes
Encoding: UTF-8
VignetteBuilder: knitr
Depends: R(>= 3.2.1)
Imports: WikidataR, data.table, curl, crul (>= 0.3.4), tibble, jsonlite,
    xml2
Suggests: roxygen2 (>= 6.0.1), testthat, knitr, rmarkdown
RoxygenNote: 6.0.1
NeedsCompilation: no
Packaged: 2017-12-21 00:45:01 UTC; sckott
Author: Scott Chamberlain [aut, cre], Ethan Welty [aut]
Maintainer: Scott Chamberlain
Repository: CRAN
Date/Publication: 2017-12-21 15:47:40 UTC
wikitaxa/man/0000755000177700017770000000000013216602015014174 5ustar herbrandtherbrandtwikitaxa/man/wikipedias.Rd0000644000177700017770000000067513071774412016632 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikitaxa-package.R
\docType{data}
\name{wikipedias}
\alias{wikipedias}
\title{List of Wikipedias}
\description{
data.frame of 295 rows, with 3 columns:
\itemize{
\item language - language
\item language_local - language in local name
\item wiki - language code for the wiki
}
}
\details{
From \url{https://meta.wikimedia.org/wiki/List_of_Wikipedias}
}
\keyword{data}
wikitaxa/man/wt_data.Rd0000644000177700017770000000321113107627373016120 0ustar herbrandtherbrandt% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wiki.R
\name{wt_data}
\alias{wt_data}
\alias{wt_data_id}
\title{Wikidata taxonomy data}
\usage{
wt_data(x, property = NULL, ...)

wt_data_id(x, language = "en", limit = 10, ...)
}
\arguments{
\item{x}{(character) a taxonomic name}

\item{property}{(character) a property id, e.g., P486}

\item{...}{curl options passed on to \code{\link[httr:GET]{httr::GET()}}}

\item{language}{(character) two letter language code}

\item{limit}{(integer) records to return. Default: 10}
}
\value{
\code{wt_data} searches Wikidata, and returns a list with elements:
\itemize{
  \item labels - data.frame with columns: language, value
  \item descriptions - data.frame with columns: language, value
  \item aliases - data.frame with columns: language, value
  \item sitelinks - data.frame with columns: site, title
  \item claims - data.frame with columns: claims, property_value,
    property_description, value (comma separated values in string)
}

\code{wt_data_id} gets the Wikidata ID for the searched term, and returns
the ID as character
}
\description{
Wikidata taxonomy data
}
\details{
Note that \code{wt_data} can take a while to run since when fetching claims
it has to do so one at a time for each claim.

You can search things other than taxonomic names with \code{wt_data} if you
like.
}
\examples{
\dontrun{
# search by taxon name
# wt_data("Mimulus alsinoides")

# choose which properties to return
wt_data(x="Mimulus foliatus", property = c("P846", "P815"))

# get a taxonomic identifier
wt_data_id("Mimulus foliatus")
# the id can be passed directly to wt_data()
# wt_data(wt_data_id("Mimulus foliatus"))
}
}

wikitaxa/man/wt_wiki_page.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikipages.R
\name{wt_wiki_page}
\alias{wt_wiki_page}
\title{Get MediaWiki Page from API}
\usage{
wt_wiki_page(url, ...)
}
\arguments{
\item{url}{(character) MediaWiki page url.}

\item{...}{Arguments passed to \code{\link[=wt_wiki_url_build]{wt_wiki_url_build()}}
if \code{url} is a static page url.}
}
\value{
an \code{HttpResponse} response object from \pkg{crul}
}
\description{
Supports both static page urls and their equivalent API calls.
}
\details{
If the URL given is for a human readable html page, we convert it to the
equivalent API call - if the URL is already an API call, we just use that.
}
\examples{
\dontrun{
wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica")
}
}
\seealso{
Other MediaWiki functions: \code{\link{wt_wiki_page_parse}},
\code{\link{wt_wiki_url_build}}, \code{\link{wt_wiki_url_parse}}
}

wikitaxa/man/wt_wikispecies.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikispecies.R
\name{wt_wikispecies}
\alias{wt_wikispecies}
\alias{wt_wikispecies_parse}
\alias{wt_wikispecies_search}
\title{WikiSpecies}
\usage{
wt_wikispecies(name, utf8 = TRUE, ...)

wt_wikispecies_parse(page, types = c("langlinks", "iwlinks",
  "externallinks", "common_names", "classification"), tidy = FALSE)

wt_wikispecies_search(query, limit = 10, offset = 0, utf8 = TRUE, ...)
}
\arguments{
\item{name}{(character) Wiki name - as a page title, must be length 1}

\item{utf8}{(logical) If `TRUE`, encodes most (but not all) non-ASCII
characters as UTF-8 instead of replacing them with hexadecimal escape
sequences. Default: `TRUE`}

\item{...}{curl options, passed on to [httr::GET()]}

\item{page}{([httr::response()]) Result of [wt_wiki_page()]}

\item{types}{(character) List of properties to parse}

\item{tidy}{(logical). tidy output to data.frame's if possible.
Default: `FALSE`}

\item{query}{(character) query terms}

\item{limit}{(integer) number of results to return. Default: 10}

\item{offset}{(integer) record to start at.
Default: 0}
}
\value{
\code{wt_wikispecies} returns a list, with slots:
\itemize{
  \item langlinks - language page links
  \item externallinks - external links
  \item common_names - a data.frame with \code{name} and \code{language} columns
  \item classification - a data.frame with \code{rank} and \code{name} columns
}

\code{wt_wikispecies_parse} returns a list

\code{wt_wikispecies_search} returns a list with slots for \code{continue} and
\code{query}, where \code{query} holds the results, with \code{query$search}
slot with the search results
}
\description{
WikiSpecies
}
\examples{
\dontrun{
# high level
wt_wikispecies(name = "Malus domestica")
wt_wikispecies(name = "Pinus contorta")
wt_wikispecies(name = "Ursus americanus")
wt_wikispecies(name = "Balaenoptera musculus")

# low level
pg <- wt_wiki_page("https://species.wikimedia.org/wiki/Abelmoschus")
wt_wikispecies_parse(pg)

# search wikispecies
wt_wikispecies_search(query = "pine tree")

## use search results to dig into pages
res <- wt_wikispecies_search(query = "pine tree")
lapply(res$query$search$title[1:3], wt_wikispecies)
}
}
\references{
\url{https://www.mediawiki.org/wiki/API:Search} for help on search
}

wikitaxa/man/wt_wikicommons.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikicommons.R
\name{wt_wikicommons}
\alias{wt_wikicommons}
\alias{wt_wikicommons_parse}
\alias{wt_wikicommons_search}
\title{WikiCommons}
\usage{
wt_wikicommons(name, utf8 = TRUE, ...)

wt_wikicommons_parse(page, types = c("langlinks", "iwlinks",
  "externallinks", "common_names", "classification"), tidy = FALSE)

wt_wikicommons_search(query, limit = 10, offset = 0, utf8 = TRUE, ...)
}
\arguments{
\item{name}{(character) Wiki name - as a page title, must be length 1}

\item{utf8}{(logical) If `TRUE`, encodes most (but not all) non-ASCII
characters as UTF-8 instead of replacing them with hexadecimal escape
sequences.
Default: `TRUE`}

\item{...}{curl options, passed on to [httr::GET()]}

\item{page}{([httr::response()]) Result of [wt_wiki_page()]}

\item{types}{(character) List of properties to parse}

\item{tidy}{(logical). tidy output to data.frame's if possible.
Default: `FALSE`}

\item{query}{(character) query terms}

\item{limit}{(integer) number of results to return. Default: 10}

\item{offset}{(integer) record to start at. Default: 0}
}
\value{
\code{wt_wikicommons} returns a list, with slots:
\itemize{
  \item langlinks - language page links
  \item externallinks - external links
  \item common_names - a data.frame with \code{name} and \code{language} columns
  \item classification - a data.frame with \code{rank} and \code{name} columns
}

\code{wt_wikicommons_parse} returns a list

\code{wt_wikicommons_search} returns a list with slots for \code{continue} and
\code{query}, where \code{query} holds the results, with \code{query$search}
slot with the search results
}
\description{
WikiCommons
}
\examples{
\dontrun{
# high level
wt_wikicommons(name = "Malus domestica")
wt_wikicommons(name = "Pinus contorta")
wt_wikicommons(name = "Ursus americanus")
wt_wikicommons(name = "Balaenoptera musculus")
wt_wikicommons(name = "Category:Poeae")
wt_wikicommons(name = "Category:Pinaceae")

# low level
pg <- wt_wiki_page("https://commons.wikimedia.org/wiki/Malus_domestica")
wt_wikicommons_parse(pg)

# search wikicommons
wt_wikicommons_search(query = "Pinus")

## use search results to dig into pages
res <- wt_wikicommons_search(query = "Pinus")
lapply(res$query$search$title[1:3], wt_wikicommons)
}
}
\references{
\url{https://www.mediawiki.org/wiki/API:Search} for help on search
}

wikitaxa/man/wt_wiki_page_parse.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikipages.R
\name{wt_wiki_page_parse}
\alias{wt_wiki_page_parse}
\title{Parse MediaWiki Page}
\usage{
wt_wiki_page_parse(page, types =
c("langlinks", "iwlinks", "externallinks"), tidy = FALSE)
}
\arguments{
\item{page}{(\link[crul:HttpResponse]{crul::HttpResponse}) Result of
\code{\link[=wt_wiki_page]{wt_wiki_page()}}}

\item{types}{(character) List of properties to parse.}

\item{tidy}{(logical). tidy output to data.frames when possible.
Default: \code{FALSE}}
}
\value{
a list
}
\description{
Parses common properties from the result of a MediaWiki API page call.
}
\details{
Available properties currently not parsed: title, displaytitle, pageid,
revid, redirects, text, categories, links, templates, images, sections,
properties, ...
}
\examples{
\dontrun{
pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica")
wt_wiki_page_parse(pg)
}
}
\seealso{
Other MediaWiki functions: \code{\link{wt_wiki_page}},
\code{\link{wt_wiki_url_build}}, \code{\link{wt_wiki_url_parse}}
}

wikitaxa/man/wt_wikipedia.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikipedia.R
\name{wt_wikipedia}
\alias{wt_wikipedia}
\alias{wt_wikipedia_parse}
\alias{wt_wikipedia_search}
\title{Wikipedia}
\usage{
wt_wikipedia(name, wiki = "en", utf8 = TRUE, ...)

wt_wikipedia_parse(page, types = c("langlinks", "iwlinks",
  "externallinks", "common_names", "classification"), tidy = FALSE)

wt_wikipedia_search(query, wiki = "en", limit = 10, offset = 0,
  utf8 = TRUE, ...)
}
\arguments{
\item{name}{(character) Wiki name - as a page title, must be length 1}

\item{wiki}{(character) wiki language. default: en. See \link{wikipedias}
for language codes.}

\item{utf8}{(logical) If `TRUE`, encodes most (but not all) non-ASCII
characters as UTF-8 instead of replacing them with hexadecimal escape
sequences. Default: `TRUE`}

\item{...}{curl options, passed on to [httr::GET()]}

\item{page}{([httr::response()]) Result of [wt_wiki_page()]}

\item{types}{(character) List of properties to parse}

\item{tidy}{(logical).
tidy output to data.frame's if possible. Default: `FALSE`}

\item{query}{(character) query terms}

\item{limit}{(integer) number of results to return. Default: 10}

\item{offset}{(integer) record to start at. Default: 0}
}
\value{
\code{wt_wikipedia} returns a list, with slots:
\itemize{
  \item langlinks - language page links
  \item externallinks - external links
  \item common_names - a data.frame with \code{name} and \code{language} columns
  \item classification - a data.frame with \code{rank} and \code{name} columns
  \item synonyms - a character vector with taxonomic names
}

\code{wt_wikipedia_parse} returns a list with the same slots, determined by
the \code{types} parameter

\code{wt_wikipedia_search} returns a list with slots for \code{continue} and
\code{query}, where \code{query} holds the results, with \code{query$search}
slot with the search results
}
\description{
Wikipedia
}
\examples{
\dontrun{
# high level
wt_wikipedia(name = "Malus domestica")
wt_wikipedia(name = "Malus domestica", wiki = "fr")
wt_wikipedia(name = "Malus domestica", wiki = "da")

# low level
pg <- wt_wiki_page("https://en.wikipedia.org/wiki/Malus_domestica")
wt_wikipedia_parse(pg)
wt_wikipedia_parse(pg, tidy = TRUE)

# search wikipedia
wt_wikipedia_search(query = "Pinus")
wt_wikipedia_search(query = "Pinus", wiki = "fr")
wt_wikipedia_search(query = "Pinus", wiki = "br")

## curl options
# wt_wikipedia_search(query = "Pinus", verbose = TRUE)

## use search results to dig into pages
res <- wt_wikipedia_search(query = "Pinus")
lapply(res$query$search$title[1:3], wt_wikipedia)
}
}
\references{
\url{https://www.mediawiki.org/wiki/API:Search} for help on search
}

wikitaxa/man/wt_wiki_url_parse.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikipages.R
\name{wt_wiki_url_parse}
\alias{wt_wiki_url_parse}
\title{Parse MediaWiki Page URL}
\usage{
wt_wiki_url_parse(url)
}
\arguments{
\item{url}{(character)
MediaWiki page url.}
}
\value{
a list with elements:
\itemize{
  \item wiki - wiki language
  \item type - wikipedia type
  \item page - page name
}
}
\description{
Parse a MediaWiki page url into its component parts (wiki name, wiki type,
and page title). Supports both static page urls and their equivalent API
calls.
}
\examples{
wt_wiki_url_parse(url="https://en.wikipedia.org/wiki/Malus_domestica")
wt_wiki_url_parse("https://en.wikipedia.org/w/api.php?page=Malus_domestica")
}
\seealso{
Other MediaWiki functions: \code{\link{wt_wiki_page_parse}},
\code{\link{wt_wiki_page}}, \code{\link{wt_wiki_url_build}}
}

wikitaxa/man/wt_wiki_url_build.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikipages.R
\name{wt_wiki_url_build}
\alias{wt_wiki_url_build}
\title{Build MediaWiki Page URL}
\usage{
wt_wiki_url_build(wiki, type = NULL, page = NULL, api = FALSE,
  action = "parse", redirects = TRUE, format = "json", utf8 = TRUE,
  prop = c("text", "langlinks", "categories", "links", "templates",
  "images", "externallinks", "sections", "revid", "displaytitle",
  "iwlinks", "properties"))
}
\arguments{
\item{wiki}{(character | list) Either the wiki name or a list with
\code{$wiki}, \code{$type}, and \code{$page} (the output of
\code{\link[=wt_wiki_url_parse]{wt_wiki_url_parse()}}).}

\item{type}{(character) Wiki type.}

\item{page}{(character) Wiki page title.}

\item{api}{(boolean) Whether to return an API call or a static page url
(default). If \code{FALSE}, all following (API-only) arguments are ignored.}

\item{action}{(character) See \url{https://en.wikipedia.org/w/api.php} for
supported actions.
This function currently only supports "parse".}

\item{redirects}{(boolean) If the requested page is set to a redirect,
resolve it.}

\item{format}{(character) See \url{https://en.wikipedia.org/w/api.php} for
supported output formats.}

\item{utf8}{(boolean) If \code{TRUE}, encodes most (but not all) non-ASCII
characters as UTF-8 instead of replacing them with hexadecimal escape
sequences.}

\item{prop}{(character) Properties to retrieve, either as a character
vector or pipe-delimited string. See
\url{https://en.wikipedia.org/w/api.php?action=help&modules=parse} for
supported properties.}
}
\value{
a URL (character)
}
\description{
Builds a MediaWiki page url from its component parts (wiki name, wiki type,
and page title). Supports both static page urls and their equivalent API
calls.
}
\examples{
wt_wiki_url_build(wiki = "en", type = "wikipedia", page = "Malus domestica")
wt_wiki_url_build(
  wt_wiki_url_parse("https://en.wikipedia.org/wiki/Malus_domestica"))
wt_wiki_url_build("en", "wikipedia", "Malus domestica", api = TRUE)
}
\seealso{
Other MediaWiki functions: \code{\link{wt_wiki_page_parse}},
\code{\link{wt_wiki_page}}, \code{\link{wt_wiki_url_parse}}
}

wikitaxa/man/wikitaxa-package.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wikitaxa-package.R
\docType{package}
\name{wikitaxa-package}
\alias{wikitaxa-package}
\alias{wikitaxa}
\title{Taxonomic Information from Wikipedia}
\description{
Taxonomic Information from Wikipedia
}
\author{
Scott Chamberlain \email{myrmecocystus@gmail.com}

Ethan Welty
}
\keyword{package}

wikitaxa/LICENSE

YEAR: 2017
COPYRIGHT HOLDER: Scott Chamberlain
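The low-level MediaWiki functions documented above (`wt_wiki_url_build()`, `wt_wiki_page()`, `wt_wiki_page_parse()`) compose into a build-fetch-parse pipeline. A minimal sketch of that workflow, using only functions and arguments from the manual pages above; it needs network access, the page title is just an illustration, and the exact result contents depend on the live wiki page:

```r
library(wikitaxa)

# build an API call URL for an English Wikipedia page
url <- wt_wiki_url_build(wiki = "en", type = "wikipedia",
                         page = "Malus domestica", api = TRUE)

# fetch the page; returns a crul::HttpResponse object
pg <- wt_wiki_page(url)

# parse selected properties from the response, tidied to data.frames
res <- wt_wiki_page_parse(pg, types = c("langlinks", "externallinks"),
                          tidy = TRUE)
str(res, max.level = 1)
```

Passing a static page URL (e.g. `https://en.wikipedia.org/wiki/Malus_domestica`) straight to `wt_wiki_page()` works too, since it converts human-readable URLs to the equivalent API call internally.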