readr/inst/extdata/ (example data files bundled with the package):

  challenge.csv      A 2,000-row CSV with columns x and y. Rows 1-1000 contain an
                     integer x with y = NA; rows 1001-2000 contain a double x
                     (e.g. 0.23837975086644292) with a date y (e.g. 2015-01-16).
                     Used in the vignette below to illustrate column type guessing.

  fwf-sample.txt     A three-row fixed-width sample:
                       John Smith          WA        418-Y11-4111
                       Mary Hartford       CA        319-Z19-4341
                       Evan Nolan           IL        219-532-c301

  mtcars.csv         The mtcars dataset as CSV (32 rows; columns mpg, cyl, disp,
                     hp, drat, wt, qsec, vs, am, gear, carb), beginning:
                       "mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
                       21,6,160,110,3.9,2.62,16.46,0,1,4,4

  mtcars.csv.bz2     bzip2-compressed copy of mtcars.csv (binary contents omitted).

  mtcars.csv.zip     zip-compressed copy of mtcars.csv (binary contents omitted).

  massey-rating.txt  A whitespace-separated table of college football ratings
                     (columns UCC PAY LAZ KPK RT COF BIH DII ENG ACU Rank Team Conf;
                     ten rows, e.g. "1 1 1 1 1 1 1 1 1 1 1 Ohio St B10").

  example.log        Two Apache-style web server log lines:
                       172.21.13.45 - Microsoft\JohnDoe [08/Apr/2001:17:39:04 -0800] "GET /scripts/iisadmin/ism.dll?http/serv HTTP/1.0" 200 3401
                       127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

  epa78.txt          A fixed-width extract of 1978 EPA vehicle data (make, model,
                     and mileage fields; rows for ALFA ROMEO, AMC, ASTON MARTIN,
                     AUDI, AVANTI, ...).

readr/inst/doc/readr.html (rendered introductory vignette; its text content follows):

Introduction to readr

The key problem that readr solves is parsing a flat file into a tibble. Parsing is the process of taking a text file and turning it into a rectangular tibble where each column is the appropriate part. Parsing takes place in three basic stages:

  1. The flat file is parsed into a rectangular matrix of strings.

  2. The type of each column is determined.

  3. Each column of strings is parsed into a vector of a more specific type.

It’s easiest to learn how this works in the opposite order. Below, you’ll learn how the:

  1. Vector parsers turn a character vector into a more specific type.

  2. Column specification describes the type of each column and the strategy readr uses to guess types so you don’t need to supply them all.

  3. Rectangular parsers turn a flat file into a matrix of rows and columns.

Each parse_*() is coupled with a col_*() function, which will be used in the process of parsing a complete tibble.
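
For example (a minimal sketch, using an inline string purely as illustrative data), parse_integer() on a character vector corresponds to col_integer() in a column specification:

parse_integer(c("1", "2", "3"))
#> [1] 1 2 3
read_csv("x\n1\n2\n3", col_types = cols(x = col_integer()))
#> # A tibble: 3 × 1
#>       x
#>   <int>
#> 1     1
#> 2     2
#> 3     3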

Vector parsers

It’s easiest to learn the vector parsers using the parse_ functions. These all take a character vector and some options. They return a new vector the same length as the old, along with an attribute describing any problems.
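
For example (a minimal sketch; warning output abbreviated), a value that can’t be parsed becomes NA and the details are stored in the problems attribute:

x <- parse_integer(c("1", "2", "abc"))
#> Warning: 1 parsing failure.
is.na(x)
#> [1] FALSE FALSE  TRUE
problems(x)  # a one-row tibble describing the failure in row 3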

Atomic vectors

parse_logical(), parse_integer(), parse_double(), and parse_character() are straightforward parsers that produce the corresponding atomic vector.

parse_integer(c("1", "2", "3"))
#> [1] 1 2 3
parse_double(c("1.56", "2.34", "3.56"))
#> [1] 1.56 2.34 3.56
parse_logical(c("true", "false"))
#> [1]  TRUE FALSE

By default, readr expects . as the decimal mark and , as the grouping mark. You can override this default using locale(), as described in vignette("locales").
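For example (a minimal sketch), numbers written with a decimal comma parse correctly once the locale is overridden:

parse_double("1,23", locale = locale(decimal_mark = ","))
#> [1] 1.23
parse_number("1.234,56", locale = locale(decimal_mark = ",", grouping_mark = "."))
#> [1] 1234.56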

Flexible numeric parser

parse_integer() and parse_double() are strict: the input string must be a single number with no leading or trailing characters. parse_number() is more flexible: it ignores non-numeric prefixes and suffixes, and knows how to deal with grouping marks. This makes it suitable for reading currencies and percentages:

parse_number(c("0%", "10%", "150%"))
#> [1]   0  10 150
parse_number(c("$1,234.5", "$12.45"))
#> [1] 1234.50   12.45

Date/times

readr supports three types of date/time data:

  • dates: number of days since 1970-01-01.
  • times: number of seconds since midnight.
  • datetimes: number of seconds since midnight 1970-01-01.
parse_datetime("2010-10-01 21:45")
#> [1] "2010-10-01 21:45:00 UTC"
parse_date("2010-10-01")
#> [1] "2010-10-01"
parse_time("1:00pm")
#> 13:00:00

Each function takes a format argument which describes the format of the string. If not specified, it uses a default value:

  • parse_datetime() recognises ISO8601 datetimes.

  • parse_date() uses the date_format specified by the locale(). The default value is %AD which uses an automatic date parser that recognises dates of the format Y-m-d or Y/m/d.

  • parse_time() uses the time_format specified by the locale(). The default value is %At which uses an automatic time parser that recognises times of the form H:M optionally followed by seconds and am/pm.
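
These defaults can also be changed through locale() instead of supplying a format with every call. For example (a minimal sketch):

parse_date("14/02/2010", locale = locale(date_format = "%d/%m/%Y"))
#> [1] "2010-02-14"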

In most cases, you will need to supply a format, as documented in parse_datetime():

parse_datetime("1 January, 2010", "%d %B, %Y")
#> [1] "2010-01-01 UTC"
parse_datetime("02/02/15", "%m/%d/%y")
#> [1] "2015-02-02 UTC"

Factors

When reading a column that has a known set of values, you can read directly into a factor. parse_factor() will generate a warning if a value is not in the supplied levels.

parse_factor(c("a", "b", "a"), levels = c("a", "b", "c"))
#> [1] a b a
#> Levels: a b c
parse_factor(c("a", "b", "d"), levels = c("a", "b", "c"))
#> Warning: 1 parsing failure.
#> row col           expected actual
#>   3  -- value in level set      d
#> [1] a    b    <NA>
#> attr(,"problems")
#> # A tibble: 1 × 4
#>     row   col           expected actual
#>   <int> <int>              <chr>  <chr>
#> 1     3    NA value in level set      d
#> Levels: a b c

Column specification

It would be tedious if you had to specify the type of every column when reading a file. Instead, readr uses some heuristics to guess the type of each column. You can access these results yourself using guess_parser():

guess_parser(c("a", "b", "c"))
#> [1] "character"
guess_parser(c("1", "2", "3"))
#> [1] "integer"
guess_parser(c("1,000", "2,000", "3,000"))
#> [1] "number"
guess_parser(c("2001/10/10"))
#> [1] "date"

The guessing policies are described in the documentation for the individual functions. Guesses are fairly strict. For example, we don’t guess that currencies are numbers, even though we can parse them:

guess_parser("$1,234")
#> [1] "character"
parse_number("1,234")
#> [1] 1234

There are two parsers that will never be guessed: col_skip() and col_factor(). You will always need to supply these explicitly.
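
For example (a minimal sketch using the bundled mtcars.csv), both can be supplied through col_types:

df <- read_csv(
  readr_example("mtcars.csv"),
  col_types = cols(
    .default = col_double(),
    cyl = col_factor(levels = c("4", "6", "8")),
    vs = col_skip()
  )
)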

You can see the specification that readr would generate for a given file by using spec_csv(), spec_tsv(), and so on:

x <- spec_csv(readr_example("challenge.csv"))
#> Parsed with column specification:
#> cols(
#>   x = col_integer(),
#>   y = col_character()
#> )

For bigger files, you can often make the specification simpler by changing the default column type using cols_condense():

mtcars_spec <- spec_csv(readr_example("mtcars.csv"))
#> Parsed with column specification:
#> cols(
#>   mpg = col_double(),
#>   cyl = col_integer(),
#>   disp = col_double(),
#>   hp = col_integer(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_integer(),
#>   am = col_integer(),
#>   gear = col_integer(),
#>   carb = col_integer()
#> )
mtcars_spec
#> cols(
#>   mpg = col_double(),
#>   cyl = col_integer(),
#>   disp = col_double(),
#>   hp = col_integer(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_integer(),
#>   am = col_integer(),
#>   gear = col_integer(),
#>   carb = col_integer()
#> )

cols_condense(mtcars_spec)
#> cols(
#>   .default = col_integer(),
#>   mpg = col_double(),
#>   disp = col_double(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double()
#> )

By default readr only looks at the first 1000 rows. This keeps file parsing speedy, but can generate incorrect guesses. For example, in challenge.csv the column types change in row 1001, so readr guesses the wrong types. One way to resolve the problem is to increase the number of rows:

x <- spec_csv(readr_example("challenge.csv"), guess_max = 1001)
#> Parsed with column specification:
#> cols(
#>   x = col_double(),
#>   y = col_date(format = "")
#> )

Another way is to manually specify the col_type, as described below.

Rectangular parsers

readr comes with five parsers for rectangular file formats:

  • read_csv() and read_csv2() for csv files
  • read_tsv() for tab-separated files
  • read_fwf() for fixed-width files
  • read_log() for web log files

Each of these functions first calls spec_xxx() (as described above), and then parses the file according to that column specification:

df1 <- read_csv(readr_example("challenge.csv"))
#> Parsed with column specification:
#> cols(
#>   x = col_integer(),
#>   y = col_character()
#> )
#> Warning: 1000 parsing failures.
#>  row col               expected             actual                                                                                                                 file
#> 1001   x no trailing characters .23837975086644292 '/private/var/folders/dt/r5s12t392tb5sk181j3gs4zw0000gn/T/RtmpufyP0S/Rinst133b1133d212a/readr/extdata/challenge.csv'
#> 1002   x no trailing characters .41167997173033655 '/private/var/folders/dt/r5s12t392tb5sk181j3gs4zw0000gn/T/RtmpufyP0S/Rinst133b1133d212a/readr/extdata/challenge.csv'
#> 1003   x no trailing characters .7460716762579978  '/private/var/folders/dt/r5s12t392tb5sk181j3gs4zw0000gn/T/RtmpufyP0S/Rinst133b1133d212a/readr/extdata/challenge.csv'
#> 1004   x no trailing characters .723450553836301   '/private/var/folders/dt/r5s12t392tb5sk181j3gs4zw0000gn/T/RtmpufyP0S/Rinst133b1133d212a/readr/extdata/challenge.csv'
#> 1005   x no trailing characters .614524137461558   '/private/var/folders/dt/r5s12t392tb5sk181j3gs4zw0000gn/T/RtmpufyP0S/Rinst133b1133d212a/readr/extdata/challenge.csv'
#> .... ... ...................... .................. ....................................................................................................................
#> See problems(...) for more details.

The rectangular parsing functions almost always succeed; they’ll only fail if the format is severely messed up. Instead of failing when individual values can’t be parsed, readr generates a data frame of problems. The first few will be printed out, and you can access them all with problems():

problems(df1)
#> # A tibble: 1,000 × 5
#>      row   col               expected             actual
#>    <int> <chr>                  <chr>              <chr>
#> 1   1001     x no trailing characters .23837975086644292
#> 2   1002     x no trailing characters .41167997173033655
#> 3   1003     x no trailing characters  .7460716762579978
#> 4   1004     x no trailing characters   .723450553836301
#> 5   1005     x no trailing characters   .614524137461558
#> 6   1006     x no trailing characters   .473980569280684
#> 7   1007     x no trailing characters  .5784610391128808
#> 8   1008     x no trailing characters  .2415937229525298
#> 9   1009     x no trailing characters .11437866208143532
#> 10  1010     x no trailing characters  .2983446326106787
#> # ... with 990 more rows, and 1 more variables: file <chr>

You’ve already seen one way of handling bad guesses: increasing the number of rows used to guess the type of each column.

df2 <- read_csv(readr_example("challenge.csv"), guess_max = 1001)
#> Parsed with column specification:
#> cols(
#>   x = col_double(),
#>   y = col_date(format = "")
#> )

Another approach is to manually supply the column specification.

Overriding the defaults

In the previous examples, you may have noticed that readr printed the column specification that it used to parse the file:

#> Parsed with column specification:
#> cols(
#>   x = col_integer(),
#>   y = col_character()
#> )

You can also access it after the fact using spec():

spec(df1)
#> cols(
#>   x = col_integer(),
#>   y = col_character()
#> )
spec(df2)
#> cols(
#>   x = col_double(),
#>   y = col_date(format = "")
#> )

(This also allows you to access the full column specification if you’re reading a very wide file. By default, readr will only print the specification of the first 20 columns.)

If you want to manually specify the column types, you can start by copying and pasting this code, and then tweaking it to fix the parsing problems.

df3 <- read_csv(
  readr_example("challenge.csv"), 
  col_types = cols(
    x = col_double(),
    y = col_date(format = "")
  )
)

In general, it’s good practice to supply an explicit column specification. It is more work, but it ensures that you get warnings if the data changes in unexpected ways. To be really strict, you can use stop_for_problems(df3). This will throw an error if there are any parsing problems, forcing you to fix those problems before proceeding with the analysis.
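
For example (a minimal sketch reusing df1 and df3 from above):

stop_for_problems(df3)  # silent: df3 parsed without problems
# stop_for_problems(df1) would throw an error, since df1 had 1000 parsing failures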

Output

The output of all these functions is a tibble. Note that characters are never automatically converted to factors (i.e. no more stringsAsFactors = FALSE) and column names are left as is, not munged into valid R identifiers (i.e. there is no check.names = TRUE). Row names are never set.
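
For example (a minimal sketch with inline data), a column name that isn’t a valid R identifier is preserved exactly:

df <- read_csv("my var,1st\n1,2", col_types = cols(.default = col_integer()))
names(df)
#> [1] "my var" "1st"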

Attributes store the column specification (spec()) and any parsing problems (problems()).

readr/inst/doc/readr.Rmd0000644000175100001440000001760313106315444014660 0ustar hornikusers--- title: "Introduction to readr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to readr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(readr) knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` The key problem that readr solves is __parsing__ a flat file into a tibble. Parsing is the process of taking a text file and turning it into a rectangular tibble where each column is the appropriate part. Parsing takes place in three basic stages: 1. The flat file is parsed into a rectangular matrix of strings. 1. The type of each column is determined. 1. Each column of strings is parsed into a vector of a more specific type. It's easiest to learn how this works in the opposite order Below, you'll learn how the: 1. __Vector parsers__ turn a character vector in to a more specific type. 1. __Column specification__ describes the type of each column and the strategy readr uses to guess types so you don't need to supply them all. 1. __Rectangular parsers__ turn a flat file into a matrix of rows and columns. Each `parse_*()` is coupled with a `col_*()` function, which will be used in the process of parsing a complete tibble. ## Vector parsers It's easiest to learn the vector parses using `parse_` functions. These all take a character vector and some options. They return a new vector the same length as the old, along with an attribute describing any problems. ### Atomic vectors `parse_logical()`, `parse_integer()`, `parse_double()`, and `parse_character()` are straightforward parsers that produce the corresponding atomic vector. ```{r} parse_integer(c("1", "2", "3")) parse_double(c("1.56", "2.34", "3.56")) parse_logical(c("true", "false")) ``` By default, readr expects `.` as the decimal mark and `,` as the grouping mark. You can override this default using `locale()`, as described in `vignette("locales")`. ### Flexible numeric parser `parse_integer()` and `parse_double()` are strict: the input string must be a single number with no leading or trailing characters. `parse_number()` is more flexible: it ignores non-numeric prefixes and suffixes, and knows how to deal with grouping marks. This makes it suitable for reading currencies and percentages: ```{r} parse_number(c("0%", "10%", "150%")) parse_number(c("$1,234.5", "$12.45")) ``` ### Date/times readr supports three types of date/time data: * dates: number of days since 1970-01-01. * times: number of seconds since midnight. * datetimes: number of seconds since midnight 1970-01-01. ```{r} parse_datetime("2010-10-01 21:45") parse_date("2010-10-01") parse_time("1:00pm") ``` Each function takes a `format` argument which describes the format of the string. If not specified, it uses a default value: * `parse_datetime()` recognises [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) datetimes. * `parse_date()` uses the `date_format` specified by the `locale()`. The default value is `%AD` which uses an automatic date parser that recognises dates of the format `Y-m-d` or `Y/m/d`. * `parse_time()` uses the `time_format` specified by the `locale()`. The default value is `%At` which uses an automatic time parser that recognises times of the form `H:M` optionally followed by seconds and am/pm. 
In most cases, you will need to supply a `format`, as documented in `parse_datetime()`: ```{r} parse_datetime("1 January, 2010", "%d %B, %Y") parse_datetime("02/02/15", "%m/%d/%y") ``` ### Factors When reading a column that has a known set of values, you can read directly into a factor. `parse_factor()` will generate a warning if a value is not in the supplied levels. ```{r} parse_factor(c("a", "b", "a"), levels = c("a", "b", "c")) parse_factor(c("a", "b", "d"), levels = c("a", "b", "c")) ``` ## Column specification It would be tedious if you had to specify the type of every column when reading a file. Instead, readr uses some heuristics to guess the type of each column. You can access these results yourself using `guess_parser()`: ```{r} guess_parser(c("a", "b", "c")) guess_parser(c("1", "2", "3")) guess_parser(c("1,000", "2,000", "3,000")) guess_parser(c("2001/10/10")) ``` The guessing policies are described in the documentation for the individual functions. Guesses are fairly strict. For example, we don't guess that currencies are numbers, even though we can parse them: ```{r} guess_parser("$1,234") parse_number("1,234") ``` There are two parsers that will never be guessed: `col_skip()` and `col_factor()`. You will always need to supply these explicitly. You can see the specification that readr would generate for a given file by using `spec_csv()`, `spec_tsv()` and so on: ```{r} x <- spec_csv(readr_example("challenge.csv")) ``` For bigger files, you can often make the specification simpler by changing the default column type using `cols_condense()`: ```{r} mtcars_spec <- spec_csv(readr_example("mtcars.csv")) mtcars_spec cols_condense(mtcars_spec) ``` By default, readr only looks at the first 1000 rows. This keeps file parsing speedy, but can generate incorrect guesses. For example, in `challenge.csv` the column types change in row 1001, so readr guesses the wrong types. One way to resolve the problem is to increase the number of rows: ```{r} x <- spec_csv(readr_example("challenge.csv"), guess_max = 1001) ``` Another way is to manually specify the `col_types`, as described below. ## Rectangular parsers readr comes with five parsers for rectangular file formats: * `read_csv()` and `read_csv2()` for csv files * `read_tsv()` for tab separated files * `read_fwf()` for fixed-width files * `read_log()` for web log files Each of these functions first calls `spec_xxx()` (as described above), and then parses the file according to that column specification: ```{r} df1 <- read_csv(readr_example("challenge.csv")) ``` The rectangular parsing functions almost always succeed; they'll only fail if the format is severely messed up. Instead, readr will generate a data frame of problems. The first few will be printed out, and you can access them all with `problems()`: ```{r} problems(df1) ``` You've already seen one way of handling bad guesses: increasing the number of rows used to guess the type of each column. ```{r} df2 <- read_csv(readr_example("challenge.csv"), guess_max = 1001) ``` Another approach is to manually supply the column specification. ### Overriding the defaults In the previous examples, you may have noticed that readr printed the column specification that it used to parse the file: ```{r} #> Parsed with column specification: #> cols( #> x = col_integer(), #> y = col_character() #> ) ``` You can also access it after the fact using `spec()`: ```{r} spec(df1) spec(df2) ``` (This also allows you to access the full column specification if you're reading a very wide file. 
By default, readr will only print the specification of the first 20 columns.) If you want to manually specify the column types, you can start by copying and pasting this code, and then tweaking it to fix the parsing problems. ```{r} df3 <- read_csv( readr_example("challenge.csv"), col_types = cols( x = col_double(), y = col_date(format = "") ) ) ``` In general, it's good practice to supply an explicit column specification. It is more work, but it ensures that you get warnings if the data changes in unexpected ways. To be really strict, you can use `stop_for_problems(df3)`. This will throw an error if there are any parsing problems, forcing you to fix those problems before proceeding with the analysis. ### Output The output of all these functions is a tibble. Note that characters are never automatically converted to factors (i.e. no more `stringsAsFactors = FALSE`) and column names are left as is, not munged into valid R identifiers (i.e. there is no `check.names = TRUE`). Row names are never set. Attributes store the column specification (`spec()`) and any parsing problems (`problems()`). readr/inst/doc/locales.html0000644000175100001440000006525013106621352015426 0ustar hornikusers Locales

Locales

The goal of readr’s locales is to encapsulate common options that vary between languages and localities. This includes:

  • The names of months and days, used when parsing dates.
  • The default time zone, used when parsing datetimes.
  • The character encoding, used when reading non-ASCII strings.
  • Default date format, used when guessing column types.
  • The decimal and grouping marks, used when reading numbers.

(Strictly speaking these are not locales in the usual technical sense of the word because they also contain information about time zones and encoding.)

To create a new locale, you use the locale() function:

locale()
#> <locale>
#> Numbers:  123,456.78
#> Formats:  %AD / %AT
#> Timezone: UTC
#> Encoding: UTF-8
#> <date_names>
#> Days:   Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed),
#>         Thursday (Thu), Friday (Fri), Saturday (Sat)
#> Months: January (Jan), February (Feb), March (Mar), April (Apr), May
#>         (May), June (Jun), July (Jul), August (Aug), September
#>         (Sep), October (Oct), November (Nov), December (Dec)
#> AM/PM:  AM/PM

The rest of this vignette will explain what each of the options does.

All of the parsing functions in readr take a locale argument. You’ll most often use it with read_csv(), read_fwf() or read_table(). Readr is designed to work the same way across systems, so the default locale is English-centric, like R. If you’re not in an English-speaking country, this makes initial import a little harder, because you have to override the defaults. But the payoff is big: you can share your code and know that it will work on any other system. Base R takes a different philosophy. It uses system defaults, so typical data import is a little easier, but sharing code is harder.

Rather than demonstrating the use of locales with read_csv() and friends, in this vignette I’m going to use the parse_*() functions. These work with a character vector instead of a file on disk, so they’re easier to use in examples. They’re also useful in their own right if you need to do custom parsing. See type_convert() if you need to apply multiple parsers to a data frame.
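
As a minimal sketch of that last point (the data frame here is invented for illustration), type_convert() re-parses the character columns of an existing data frame, and it accepts the same locale argument:

df <- data.frame(x = c("1,23", "4,56"), stringsAsFactors = FALSE)
# x should come back as a double column (1.23, 4.56)
type_convert(df, locale = locale(decimal_mark = ","))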

Dates and times

Names of months and days

The first argument to locale() is date_names, and it controls what values are used for month and day names. The easiest way to specify it is with an ISO 639 language code:

locale("ko") # Korean
#> <locale>
#> Numbers:  123,456.78
#> Formats:  %AD / %AT
#> Timezone: UTC
#> Encoding: UTF-8
#> <date_names>
#> Days:   일요일 (일), 월요일 (월), 화요일 (화), 수요일 (수), 목요일 (목),
#>         금요일 (금), 토요일 (토)
#> Months: 1월, 2월, 3월, 4월, 5월, 6월, 7월, 8월, 9월, 10월, 11월, 12월
#> AM/PM:  오전/오후
locale("fr") # French
#> <locale>
#> Numbers:  123,456.78
#> Formats:  %AD / %AT
#> Timezone: UTC
#> Encoding: UTF-8
#> <date_names>
#> Days:   dimanche (dim.), lundi (lun.), mardi (mar.), mercredi (mer.),
#>         jeudi (jeu.), vendredi (ven.), samedi (sam.)
#> Months: janvier (janv.), février (févr.), mars (mars), avril (avr.), mai
#>         (mai), juin (juin), juillet (juil.), août (août),
#>         septembre (sept.), octobre (oct.), novembre (nov.),
#>         décembre (déc.)
#> AM/PM:  AM/PM

If you don’t already know the code for your language, Wikipedia has a good list. Currently readr has 185 languages available. You can list them all with date_names_langs().
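
For example (an added illustration):

length(date_names_langs())
#> [1] 185
head(date_names_langs())
# the first few ISO 639 codes, in alphabetical order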

Specifying a locale allows you to parse dates in other languages:

parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
#> [1] "2015-01-01"
parse_date("14 oct. 1979", "%d %b %Y", locale = locale("fr"))
#> [1] "1979-10-14"

For many languages, it’s common to find that diacritics have been stripped so they can be stored as ASCII. You can tell the locale that with the asciify option:

parse_date("1 août 2015", "%d %B %Y", locale = locale("fr"))
#> [1] "2015-08-01"
parse_date("1 aout 2015", "%d %B %Y", locale = locale("fr", asciify = TRUE))
#> [1] "2015-08-01"

Note that the quality of the translations is variable, especially for the rarer languages. If you discover that they’re not quite right for your data, you can create your own with date_names(). The following example creates a locale with Māori date names:

maori <- locale(date_names(
  day = c("Rātapu", "Rāhina", "Rātū", "Rāapa", "Rāpare", "Rāmere", "Rāhoroi"),
  mon = c("Kohi-tātea", "Hui-tanguru", "Poutū-te-rangi", "Paenga-whāwhā",
    "Haratua", "Pipiri", "Hōngongoi", "Here-turi-kōkā", "Mahuru",
    "Whiringa-ā-nuku", "Whiringa-ā-rangi", "Hakihea")
))
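
A quick check that the new locale works (an added illustration; Kohi-tātea is the first month in the list above, so it maps to January):

parse_date("1 Kohi-tātea 2015", "%d %B %Y", locale = maori)
#> [1] "2015-01-01"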

Timezones

Unless otherwise specified, readr assumes that times are in UTC, Coordinated Universal Time (this is a successor to GMT and for almost all intents is identical). UTC is most suitable for data because it doesn’t have daylight savings - this avoids a whole class of potential problems. If your data isn’t already in UTC, you’ll need to supply a tz in the locale:

parse_datetime("2001-10-10 20:10")
#> [1] "2001-10-10 20:10:00 UTC"
parse_datetime("2001-10-10 20:10", locale = locale(tz = "Pacific/Auckland"))
#> [1] "2001-10-10 20:10:00 NZDT"
parse_datetime("2001-10-10 20:10", locale = locale(tz = "Europe/Dublin"))
#> [1] "2001-10-10 20:10:00 IST"

You can see a complete list of time zones with OlsonNames().

If you’re American, note that “EST” is a Canadian time zone that does not have DST. It’s not Eastern Standard Time! Instead use:

  • PST/PDT = “US/Pacific”
  • CST/CDT = “US/Central”
  • MST/MDT = “US/Mountain”
  • EST/EDT = “US/Eastern”

(Note that there are more specific time zones for smaller areas that don’t follow the same rules. For example, “US/Arizona”, which mostly follows mountain time, but doesn’t have daylight savings. If you’re dealing with historical data, you might need an even more specific zone like “America/North_Dakota/New_Salem” - that will get you the most accurate time zones.)

Note that these are only used as defaults. If individual times have timezones and you’re using “%Z” (as name, e.g. “America/Chicago”) or “%z” (as offset from UTC, e.g. “+0800”), they’ll override the defaults. There’s currently no good way to parse times that use US abbreviations.
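
For example (a small added sketch), an offset embedded in an ISO8601 string wins over the default zone, and the parsed instant is then displayed in that default zone:

parse_datetime("2001-10-10 20:10+0800")
#> [1] "2001-10-10 12:10:00 UTC"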

Note that once you have the date in R, changing the time zone just changes its printed representation - it still represents the same instants of time. If you’ve loaded non-UTC data, and want to display it as UTC, try this snippet of code:

is_datetime <- sapply(df, inherits, "POSIXct")
df[is_datetime] <- lapply(df[is_datetime], function(x) {
  attr(x, "tzone") <- "UTC"
  x
})

Default formats

Locales also provide default date and time formats. The time format isn’t currently used for anything, but the date format is used when guessing column types. The default date format is %Y-%m-%d because that’s unambiguous:

str(parse_guess("2010-10-10"))
#>  Date[1:1], format: "2010-10-10"

If you’re an American, you might want to use your illogical date system:

str(parse_guess("01/02/2013"))
#>  chr "01/02/2013"
str(parse_guess("01/02/2013", locale = locale(date_format = "%d/%m/%Y")))
#>  Date[1:1], format: "2013-02-01"

Character

All readr functions yield strings encoded in UTF-8. This encoding is the most likely to give good results in the widest variety of settings. By default, readr assumes that your input is also in UTF-8. This is less likely to be the case, especially when you’re working with older datasets.

The following code illustrates the problems with encodings:

library(stringi)
x <- "Émigré cause célèbre déjà vu.\n"
y <- stri_conv(x, "UTF-8", "latin1")

# These strings look like they're identical:
x
#> [1] "Émigré cause célèbre déjà vu.\n"
y
#> [1] "Émigré cause célèbre déjà vu.\n"
identical(x, y)
#> [1] TRUE

# But they have different encodings:
Encoding(x)
#> [1] "unknown"
Encoding(y)
#> [1] "latin1"

# That means while they print the same, their raw (binary)
# representation is actually quite different:
charToRaw(x)
#>  [1] c3 89 6d 69 67 72 c3 a9 20 63 61 75 73 65 20 63 c3 a9 6c c3 a8 62 72
#> [24] 65 20 64 c3 a9 6a c3 a0 20 76 75 2e 0a
charToRaw(y)
#>  [1] c9 6d 69 67 72 e9 20 63 61 75 73 65 20 63 e9 6c e8 62 72 65 20 64 e9
#> [24] 6a e0 20 76 75 2e 0a

# readr expects strings to be encoded as UTF-8. If they're
# not, you'll get weird characters
parse_character(x)
#> [1] "Émigré cause célèbre déjà vu.\n"
parse_character(y)
#> [1] "\xc9migr\xe9 cause c\xe9l\xe8bre d\xe9j\xe0 vu.\n"

# If you know the encoding, supply it:
parse_character(y, locale = locale(encoding = "latin1"))
#> [1] "Émigré cause célèbre déjà vu.\n"

If you don’t know what encoding the file uses, try guess_encoding(). It’s not 100% perfect (as it’s fundamentally a heuristic), but should at least get you pointed in the right direction:

guess_encoding(x)
#> # A tibble: 3 × 2
#>       encoding confidence
#>          <chr>      <dbl>
#> 1        UTF-8       1.00
#> 2 windows-1250       0.34
#> 3 windows-1252       0.26
guess_encoding(y)
#> # A tibble: 2 × 2
#>     encoding confidence
#>        <chr>      <dbl>
#> 1 ISO-8859-2        0.4
#> 2 ISO-8859-1        0.3

# Note that the first guess produces a valid string, but isn't correct:
parse_character(y, locale = locale(encoding = "ISO-8859-2"))
#> [1] "Émigré cause célčbre déjŕ vu.\n"
# But ISO-8859-1 is another name for latin1
parse_character(y, locale = locale(encoding = "ISO-8859-1"))
#> [1] "Émigré cause célèbre déjà vu.\n"

Numbers

Some countries use the decimal point, while others use the decimal comma. The decimal_mark option controls which readr uses when parsing doubles:

parse_double("1,23", locale = locale(decimal_mark = ","))
#> [1] 1.23

Additionally, when writing out big numbers, you might have 1,000,000, 1.000.000, 1 000 000, or 1'000'000. The grouping mark is ignored by the more flexible number parser:

parse_number("$1,234.56")
#> [1] 1234.56
parse_number("$1.234,56", 
  locale = locale(decimal_mark = ",", grouping_mark = ".")
)
#> [1] 1234.56

# readr is smart enough to guess that if you're using , for decimals then
# you're probably using . for grouping:
parse_number("$1.234,56", locale = locale(decimal_mark = ","))
#> [1] 1234.56
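
The same locale machinery applies when reading whole files. As a closing sketch (added here), read_csv2() assumes ; as the field separator and , as the decimal mark:

read_csv2("x;y\n1,5;2,7")
# a one-row tibble, with both columns parsed as doubles (1.5 and 2.7)
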
readr/inst/doc/readr.R0000644000175100001440000000505413106621353014333 0ustar hornikusers## ---- include = FALSE---------------------------------------------------- library(readr) knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ## ------------------------------------------------------------------------ parse_integer(c("1", "2", "3")) parse_double(c("1.56", "2.34", "3.56")) parse_logical(c("true", "false")) ## ------------------------------------------------------------------------ parse_number(c("0%", "10%", "150%")) parse_number(c("$1,234.5", "$12.45")) ## ------------------------------------------------------------------------ parse_datetime("2010-10-01 21:45") parse_date("2010-10-01") parse_time("1:00pm") ## ------------------------------------------------------------------------ parse_datetime("1 January, 2010", "%d %B, %Y") parse_datetime("02/02/15", "%m/%d/%y") ## ------------------------------------------------------------------------ parse_factor(c("a", "b", "a"), levels = c("a", "b", "c")) parse_factor(c("a", "b", "d"), levels = c("a", "b", "c")) ## ------------------------------------------------------------------------ guess_parser(c("a", "b", "c")) guess_parser(c("1", "2", "3")) guess_parser(c("1,000", "2,000", "3,000")) guess_parser(c("2001/10/10")) ## ------------------------------------------------------------------------ guess_parser("$1,234") parse_number("1,234") ## ------------------------------------------------------------------------ x <- spec_csv(readr_example("challenge.csv")) ## ------------------------------------------------------------------------ mtcars_spec <- spec_csv(readr_example("mtcars.csv")) mtcars_spec cols_condense(mtcars_spec) ## ------------------------------------------------------------------------ x <- spec_csv(readr_example("challenge.csv"), guess_max = 1001) ## ------------------------------------------------------------------------ df1 <- read_csv(readr_example("challenge.csv")) ## ------------------------------------------------------------------------ problems(df1) ## ------------------------------------------------------------------------ df2 <- read_csv(readr_example("challenge.csv"), guess_max = 1001) ## ------------------------------------------------------------------------ #> Parsed with column specification: #> cols( #> x = col_integer(), #> y = col_character() #> ) ## ------------------------------------------------------------------------ spec(df1) spec(df2) ## ------------------------------------------------------------------------ df3 <- read_csv( readr_example("challenge.csv"), col_types = cols( x = col_double(), y = col_date(format = "") ) ) readr/inst/doc/locales.Rmd0000644000175100001440000002001713106315444015176 0ustar hornikusers--- title: "Locales" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Locales} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(readr) knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` The goal of readr's locales is to encapsulate common options that vary between languages and localities. This includes: * The names of months and days, used when parsing dates. * The default time zone, used when parsing datetimes. * The character encoding, used when reading non-ASCII strings. * Default date format, used when guessing column types. * The decimal and grouping marks, used when reading numbers. 
(Strictly speaking these are not locales in the usual technical sense of the word because they also contain information about time zones and encoding.) To create a new locale, you use the `locale()` function: ```{r} locale() ``` The rest of this vignette will explain what each of the options does. All of the parsing functions in readr take a `locale` argument. You'll most often use it with `read_csv()`, `read_fwf()` or `read_table()`. Readr is designed to work the same way across systems, so the default locale is English-centric, like R. If you're not in an English-speaking country, this makes initial import a little harder, because you have to override the defaults. But the payoff is big: you can share your code and know that it will work on any other system. Base R takes a different philosophy. It uses system defaults, so typical data import is a little easier, but sharing code is harder. Rather than demonstrating the use of locales with `read_csv()` and friends, in this vignette I'm going to use the `parse_*()` functions. These work with a character vector instead of a file on disk, so they're easier to use in examples. They're also useful in their own right if you need to do custom parsing. See `type_convert()` if you need to apply multiple parsers to a data frame. ## Dates and times ### Names of months and days The first argument to `locale()` is `date_names`, and it controls what values are used for month and day names. The easiest way to specify it is with an ISO 639 language code: ```{r} locale("ko") # Korean locale("fr") # French ``` If you don't already know the code for your language, [Wikipedia](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) has a good list. Currently readr has `r length(date_names_langs())` languages available. You can list them all with `date_names_langs()`. Specifying a locale allows you to parse dates in other languages: ```{r} parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr")) parse_date("14 oct. 1979", "%d %b %Y", locale = locale("fr")) ``` For many languages, it's common to find that diacritics have been stripped so they can be stored as ASCII. You can tell the locale that with the `asciify` option: ```{r} parse_date("1 août 2015", "%d %B %Y", locale = locale("fr")) parse_date("1 aout 2015", "%d %B %Y", locale = locale("fr", asciify = TRUE)) ``` Note that the quality of the translations is variable, especially for the rarer languages. If you discover that they're not quite right for your data, you can create your own with `date_names()`. The following example creates a locale with Māori date names: ```{r} maori <- locale(date_names( day = c("Rātapu", "Rāhina", "Rātū", "Rāapa", "Rāpare", "Rāmere", "Rāhoroi"), mon = c("Kohi-tātea", "Hui-tanguru", "Poutū-te-rangi", "Paenga-whāwhā", "Haratua", "Pipiri", "Hōngongoi", "Here-turi-kōkā", "Mahuru", "Whiringa-ā-nuku", "Whiringa-ā-rangi", "Hakihea") )) ``` ### Timezones Unless otherwise specified, readr assumes that times are in UTC, Coordinated Universal Time (this is a successor to GMT and for almost all intents is identical). UTC is most suitable for data because it doesn't have daylight savings - this avoids a whole class of potential problems. If your data isn't already in UTC, you'll need to supply a `tz` in the locale: ```{r} parse_datetime("2001-10-10 20:10") parse_datetime("2001-10-10 20:10", locale = locale(tz = "Pacific/Auckland")) parse_datetime("2001-10-10 20:10", locale = locale(tz = "Europe/Dublin")) ``` You can see a complete list of time zones with `OlsonNames()`. 
If you're American, note that "EST" is a Canadian time zone that does not have DST. It's not Eastern Standard Time! Instead use: * PST/PDT = "US/Pacific" * CST/CDT = "US/Central" * MST/MDT = "US/Mountain" * EST/EDT = "US/Eastern" (Note that there are more specific time zones for smaller areas that don't follow the same rules. For example, "US/Arizona", which mostly follows mountain time, but doesn't have daylight savings. If you're dealing with historical data, you might need an even more specific zone like "America/North_Dakota/New_Salem" - that will get you the most accurate time zones.) Note that these are only used as defaults. If individual times have timezones and you're using "%Z" (as name, e.g. "America/Chicago") or "%z" (as offset from UTC, e.g. "+0800"), they'll override the defaults. There's currently no good way to parse times that use US abbreviations. Note that once you have the date in R, changing the time zone just changes its printed representation - it still represents the same instants of time. If you've loaded non-UTC data, and want to display it as UTC, try this snippet of code: ```{r, eval = FALSE} is_datetime <- sapply(df, inherits, "POSIXct") df[is_datetime] <- lapply(df[is_datetime], function(x) { attr(x, "tzone") <- "UTC" x }) ``` ### Default formats Locales also provide default date and time formats. The time format isn't currently used for anything, but the date format is used when guessing column types. The default date format is `%Y-%m-%d` because that's unambiguous: ```{r} str(parse_guess("2010-10-10")) ``` If you're an American, you might want to use your illogical date system: ```{r} str(parse_guess("01/02/2013")) str(parse_guess("01/02/2013", locale = locale(date_format = "%d/%m/%Y"))) ``` ## Character All readr functions yield strings encoded in UTF-8. This encoding is the most likely to give good results in the widest variety of settings. By default, readr assumes that your input is also in UTF-8. This is less likely to be the case, especially when you're working with older datasets. The following code illustrates the problems with encodings: ```{r} library(stringi) x <- "Émigré cause célèbre déjà vu.\n" y <- stri_conv(x, "UTF-8", "latin1") # These strings look like they're identical: x y identical(x, y) # But they have different encodings: Encoding(x) Encoding(y) # That means while they print the same, their raw (binary) # representation is actually quite different: charToRaw(x) charToRaw(y) # readr expects strings to be encoded as UTF-8. If they're # not, you'll get weird characters parse_character(x) parse_character(y) # If you know the encoding, supply it: parse_character(y, locale = locale(encoding = "latin1")) ``` If you don't know what encoding the file uses, try `guess_encoding()`. It's not 100% perfect (as it's fundamentally a heuristic), but should at least get you pointed in the right direction: ```{r} guess_encoding(x) guess_encoding(y) # Note that the first guess produces a valid string, but isn't correct: parse_character(y, locale = locale(encoding = "ISO-8859-2")) # But ISO-8859-1 is another name for latin1 parse_character(y, locale = locale(encoding = "ISO-8859-1")) ``` ## Numbers Some countries use the decimal point, while others use the decimal comma. The `decimal_mark` option controls which readr uses when parsing doubles: ```{r} parse_double("1,23", locale = locale(decimal_mark = ",")) ``` Additionally, when writing out big numbers, you might have `1,000,000`, `1.000.000`, `1 000 000`, or `1'000'000`. 
The grouping mark is ignored by the more flexible number parser: ```{r} parse_number("$1,234.56") parse_number("$1.234,56", locale = locale(decimal_mark = ",", grouping_mark = ".") ) # readr is smart enough to guess that if you're using , for decimals then # you're probably using . for grouping: parse_number("$1.234,56", locale = locale(decimal_mark = ",")) ``` readr/inst/doc/locales.R0000644000175100001440000000662613106621352014665 0ustar hornikusers## ---- include = FALSE---------------------------------------------------- library(readr) knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ## ------------------------------------------------------------------------ locale() ## ------------------------------------------------------------------------ locale("ko") # Korean locale("fr") # French ## ------------------------------------------------------------------------ parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr")) parse_date("14 oct. 1979", "%d %b %Y", locale = locale("fr")) ## ------------------------------------------------------------------------ parse_date("1 août 2015", "%d %B %Y", locale = locale("fr")) parse_date("1 aout 2015", "%d %B %Y", locale = locale("fr", asciify = TRUE)) ## ------------------------------------------------------------------------ maori <- locale(date_names( day = c("Rātapu", "Rāhina", "Rātū", "Rāapa", "Rāpare", "Rāmere", "Rāhoroi"), mon = c("Kohi-tātea", "Hui-tanguru", "Poutū-te-rangi", "Paenga-whāwhā", "Haratua", "Pipiri", "Hōngongoi", "Here-turi-kōkā", "Mahuru", "Whiringa-ā-nuku", "Whiringa-ā-rangi", "Hakihea") )) ## ------------------------------------------------------------------------ parse_datetime("2001-10-10 20:10") parse_datetime("2001-10-10 20:10", locale = locale(tz = "Pacific/Auckland")) parse_datetime("2001-10-10 20:10", locale = locale(tz = "Europe/Dublin")) ## ---- eval = FALSE------------------------------------------------------- # is_datetime <- sapply(df, inherits, "POSIXct") # df[is_datetime] <- lapply(df[is_datetime], function(x) { # attr(x, "tzone") <- "UTC" # x # }) ## ------------------------------------------------------------------------ str(parse_guess("2010-10-10")) ## ------------------------------------------------------------------------ str(parse_guess("01/02/2013")) str(parse_guess("01/02/2013", locale = locale(date_format = "%d/%m/%Y"))) ## ------------------------------------------------------------------------ library(stringi) x <- "Émigré cause célèbre déjà vu.\n" y <- stri_conv(x, "UTF-8", "latin1") # These strings look like they're identical: x y identical(x, y) # But they have different encodings: Encoding(x) Encoding(y) # That means while they print the same, their raw (binary) # representation is actually quite different: charToRaw(x) charToRaw(y) # readr expects strings to be encoded as UTF-8. 
If they're # not, you'll get weird characters parse_character(x) parse_character(y) # If you know the encoding, supply it: parse_character(y, locale = locale(encoding = "latin1")) ## ------------------------------------------------------------------------ guess_encoding(x) guess_encoding(y) # Note that the first guess produces a valid string, but isn't correct: parse_character(y, locale = locale(encoding = "ISO-8859-2")) # But ISO-8859-1 is another name for latin1 parse_character(y, locale = locale(encoding = "ISO-8859-1")) ## ------------------------------------------------------------------------ parse_double("1,23", locale = locale(decimal_mark = ",")) ## ------------------------------------------------------------------------ parse_number("$1,234.56") parse_number("$1.234,56", locale = locale(decimal_mark = ",", grouping_mark = ".") ) # readr is smart enough to guess that if you're using , for decimals then # you're probably using . for grouping: parse_number("$1.234,56", locale = locale(decimal_mark = ",")) readr/tests/0000755000175100001440000000000013057262333012533 5ustar hornikusersreadr/tests/testthat.R0000644000175100001440000000006613057262333014520 0ustar hornikuserslibrary(testthat) library(readr) test_check("readr") readr/tests/testthat/0000755000175100001440000000000013106646435014377 5ustar hornikusersreadr/tests/testthat/test-encoding.R0000644000175100001440000000056213106315444017261 0ustar hornikuserscontext("Encoding") test_that("guess_encoding() works", { x <- guess_encoding(readr_example("mtcars.csv")) expect_is(x, "tbl_df") expect_equal(as.character(x$encoding), "ASCII") expect_equal(x$confidence, 1) x <- guess_encoding("a\n\u00b5\u00b5") expect_is(x, "tbl_df") expect_equal(as.character(x$encoding), "UTF-8") expect_equal(x$confidence, 0.8) }) readr/tests/testthat/eol-lf.csv0000644000175100001440000000003213057262333016261 0ustar hornikusers"x","y" 1,"a" 2,"b" 3,"c" readr/tests/testthat/test-parsing-datetime.R0000644000175100001440000001743713106315672020744 0ustar hornikuserscontext("Parsing, datetime") test_that("utctime is equivalent to R conversion", { year <- seq(0, 4000) mon <- rep(3, length(year)) day <- rep(1, length(year)) zero <- rep(0, length(year)) expect_equal( ISOdatetime(year, mon, day, zero, zero, zero, tz = "UTC"), utctime(year, mon, day, zero, zero, zero, zero) ) }) # Parsing ---------------------------------------------------------------------- r_parse <- function(x, fmt) as.POSIXct(strptime(x, fmt, tz = "UTC")) test_that("%d, %m and %y", { target <- utctime(2010, 2, 3, 0, 0, 0, 0) expect_equal(parse_datetime("10-02-03", "%y-%m-%d"), target) expect_equal(parse_datetime("10-03-02", "%y-%d-%m"), target) expect_equal(parse_datetime("03/02/10", "%d/%m/%y"), target) expect_equal(parse_datetime("02/03/10", "%m/%d/%y"), target) }) test_that("Compound formats work", { target <- utctime(2010, 2, 3, 0, 0, 0, 0) expect_equal(parse_datetime("02/03/10", "%D"), target) expect_equal(parse_datetime("2010-02-03", "%F"), target) expect_equal(parse_datetime("10/02/03", "%x"), target) }) test_that("%y matches R behaviour", { expect_equal( parse_datetime("01-01-69", "%d-%m-%y"), r_parse("01-01-69", "%d-%m-%y") ) expect_equal( parse_datetime("01-01-68", "%d-%m-%y"), r_parse("01-01-68", "%d-%m-%y") ) }) test_that("%e allows leading space", { expect_equal(parse_datetime("201010 1", "%Y%m%e"), utctime(2010, 10, 1, 0, 0, 0, 0)) }) test_that("%OS captures partial seconds", { x <- parse_datetime("2001-01-01 00:00:01.125", "%Y-%m-%d %H:%M:%OS") expect_equal(as.POSIXlt(x)$sec, 
1.125) x <- parse_datetime("2001-01-01 00:00:01.333", "%Y-%m-%d %H:%M:%OS") expect_equal(as.POSIXlt(x)$sec, 1.333, tol = 1e-6) }) test_that("%y requries 4 digits", { expect_warning(parse_date("003-01-01", "%Y-%m-%d"), "parsing failure") expect_warning(parse_date("03-01-01", "%Y-%m-%d"), "parsing failure") expect_warning(parse_date("00003-01-01", "%Y-%m-%d"), "parsing failure") }) test_that("invalid dates return NA", { expect_warning(x <- parse_datetime("2010-02-30", "%Y-%m-%d")) expect_true(is.na(x)) }) test_that("failed parsing returns NA", { expect_warning({ x <- parse_datetime(c("2010-02-ab", "2010-02", "2010/02/01"), "%Y-%m-%d") }) expect_equal(is.na(x), c(TRUE, TRUE, TRUE)) expect_equal(n_problems(x), 3) }) test_that("invalid specs returns NA", { expect_warning(x <- parse_datetime("2010-02-20", "%Y-%m-%m")) expect_equal(is.na(x), TRUE) expect_equal(n_problems(x), 1) }) test_that("ISO8601 partial dates are not parsed", { expect_equal(n_problems(parse_datetime("20")), 1) expect_equal(n_problems(parse_datetime("2001")), 1) expect_equal(n_problems(parse_datetime("2001-01")), 1) }) test_that("Year only gets parsed", { expect_equal(parse_datetime("2010", "%Y"), ISOdate(2010, 1, 1, 0, tz = "UTC")) expect_equal(parse_datetime("2010-06", "%Y-%m"),ISOdate(2010, 6, 1, 0, tz = "UTC")) }) test_that("%p detects AM/PM", { am <- parse_datetime(c("2015-01-01 01:00 AM", "2015-01-01 01:00 am"), "%F %I:%M %p") pm <- parse_datetime(c("2015-01-01 01:00 PM", "2015-01-01 01:00 pm"), "%F %I:%M %p") expect_equal(pm, am + 12 * 3600) expect_equal(parse_datetime("12/31/1991 12:01 AM", "%m/%d/%Y %I:%M %p"), POSIXct(694137660, "UTC")) expect_equal(parse_datetime("12/31/1991 12:01 PM", "%m/%d/%Y %I:%M %p"), POSIXct(694180860, "UTC")) expect_equal(parse_datetime("12/31/1991 1:01 AM", "%m/%d/%Y %I:%M %p"), POSIXct(694141260, "UTC")) expect_warning(x <- parse_datetime(c("12/31/1991 00:01 PM", "12/31/1991 13:01 PM"), "%m/%d/%Y %I:%M %p")) expect_equal(n_problems(x), 2) }) test_that("%b and %B are case insensitve", { ref <- parse_date("2001-01-01") expect_equal(parse_date("2001 JAN 01", "%Y %b %d"), ref) expect_equal(parse_date("2001 JANUARY 01", "%Y %B %d"), ref) }) test_that("%. requires a value", { ref <- parse_date("2001-01-01") expect_equal(parse_date("2001?01?01", "%Y%.%m%.%d"), ref) expect_warning( out <- parse_date("20010101", "%Y%.%m%.%d") ) expect_equal(n_problems(out), 1) }) test_that("%Z detects named time zones", { ref <- POSIXct(1285912800, "America/Chicago") ct <- locale(tz = "America/Chicago") expect_equal(parse_datetime("2010-10-01 01:00", locale = ct), ref) expect_equal( parse_datetime("2010-10-01 01:00 America/Chicago", "%Y-%m-%d %H:%M %Z", locale = ct), ref ) }) test_that("parse_date returns a double like as.Date()", { ref <- parse_date("2001-01-01") expect_type(parse_datetime("2001-01-01"), "double") }) test_that("parses NA/empty correctly", { expect_equal(parse_datetime(""), POSIXct(NA_real_)) expect_equal(parse_date(""), as.Date(NA)) expect_equal(parse_datetime("NA"), POSIXct(NA_real_)) expect_equal(parse_date("NA"), as.Date(NA)) expect_equal(parse_datetime("TeSt", na = "TeSt"), POSIXct(NA_real_)) expect_equal(parse_date("TeSt", na = "TeSt"), as.Date(NA)) }) # Locales ----------------------------------------------------------------- test_that("locale affects months", { jan1 <- as.Date("2010-01-01") fr <- locale("fr") expect_equal(parse_date("1 janv. 
2010", "%d %b %Y", locale = fr), jan1) expect_equal(parse_date("1 janvier 2010", "%d %B %Y", locale = fr), jan1) }) test_that("locale affects am/pm", { a <- parse_time("1:30 PM", "%H:%M %p") b <- parse_time("오후 1시 30분", "%p %H시 %M분", locale = locale("ko")) expect_equal(a, b) }) test_that("locale affects both guessing and parsing", { out <- parse_guess("01/02/2013", locale = locale(date_format = "%m/%d/%Y")) expect_equal(out, as.Date("2013-01-02")) }) test_that("text re-encoded before strings are parsed", { skip_on_cran() # need to figure out why this fails x <- "1 f\u00e9vrier 2010" y <- iconv(x, to = "ISO-8859-1") feb01 <- as.Date(ISOdate(2010, 02, 01)) expect_equal( parse_date(x, "%d %B %Y", locale = locale("fr")), feb01 ) expect_equal( parse_date(y, "%d %B %Y", locale = locale("fr", encoding = "ISO-8859-1")), feb01 ) }) # Time zones ------------------------------------------------------------------ test_that("same times with different offsets parsed as same time", { # From http://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC same_time <- paste("2010-02-03", c("18:30Z", "22:30+04", "1130-0700", "15:00-03:30")) parsed <- parse_datetime(same_time) expect_equal(parsed, rep(utctime(2010, 2, 3, 18, 30, 0, 0), 4)) }) test_that("offsets can cross date boundaries", { expect_equal( parse_datetime("2015-01-31T2000-0500"), parse_datetime("2015-02-01T0100Z") ) }) test_that("unambiguous times with and without daylight savings", { skip_on_cran() # need to figure out why this fails melb <- locale(tz = "Australia/Melbourne") # Melbourne had daylight savings in 2015 that ended the morning of 2015-04-05 expect_equal( parse_datetime(c("2015-04-04 12:00:00", "2015-04-06 12:00:00"), locale = melb), POSIXct(c(1428109200, 1428285600), "Australia/Melbourne") ) # Japan didn't have daylight savings in 2015 ja <- locale(tz = "Japan") expect_equal( parse_datetime(c("2015-04-04 12:00:00", "2015-04-06 12:00:00"), locale = ja), POSIXct(c(1428116400, 1428289200), "Japan") ) }) # Guessing --------------------------------------------------------------------- test_that("DDDD-DD not parsed as date (i.e. 
doesn't trigger partial date match)", { expect_equal(guess_parser(c("1989-90", "1990-91")), "character") }) test_that("leading zeros don't get parsed as date without explicit separator", { expect_equal(guess_parser("00010203"), "character") expect_equal(guess_parser("0001-02-03"), "date") }) test_that("must have either two - or none", { expect_equal(guess_parser("2000-10-10"), "date") expect_equal(guess_parser("2000-1010"), "character") expect_equal(guess_parser("200010-10"), "character") expect_equal(guess_parser("20001010"), "integer") }) readr/tests/testthat/test-parsing-count-fields.R0000644000175100001440000000074513057262333021536 0ustar hornikuserscontext("Parsing, count_fields") test_that("counts correct number of fields based on supplied tokenizer", { string <- "1,a,NA\n2,b,NA\n" res_csv <- count_fields(string, tokenizer_csv()) res_tsv <- count_fields(string, tokenizer_tsv()) expect_equal(res_csv, c(3, 3)) expect_equal(res_tsv, c(1, 1)) }) test_that("maximum lines counted is respected", { string <- "1,a,NA\n2,b,NA\n" res_csv <- count_fields(string, tokenizer_csv(), n_max = 1) expect_equal(res_csv, 3) }) readr/tests/testthat/test-locale.R0000644000175100001440000000105313057262333016731 0ustar hornikuserscontext("locale") test_that("setting decimal mark overrides grouping mark", { expect_equal(locale(decimal_mark = ".")$grouping_mark, ",") expect_equal(locale(decimal_mark = ",")$grouping_mark, ".") }) test_that("setting grouping mark overrides decimal mark", { expect_equal(locale(grouping_mark = ".")$decimal_mark, ",") expect_equal(locale(grouping_mark = ",")$decimal_mark, ".") }) test_that("grouping and decimal marks must be different", { expect_error( locale(grouping_mark = ".", decimal_mark = "."), "must be different" ) }) readr/tests/testthat/enc-iso-8859-1.txt0000644000175100001440000000001713057262333017240 0ustar hornikusersfranais lve readr/tests/testthat/test-parsing-time.R0000644000175100001440000000240313106315444020066 0ustar hornikuserscontext("Parsing, time") test_that("default format captures cases", { late_night <- hms::hms(seconds = 22 * 3600 + 20 * 60) expect_equal(parse_time("22:20"), late_night) expect_equal(parse_time("10:20 pm"), late_night) expect_equal(parse_time("22:20:05"), hms::as.hms(late_night + 5)) expect_equal(parse_time("10:20:05 pm"), hms::as.hms(late_night + 5)) }) test_that("twelve o'clock is parsed properly", { morning <- hms::hms(seconds = 0 * 3600 + 1 * 60) midday <- hms::hms(seconds = 12 * 3600 + 1 * 60) expect_equal(parse_time("12:01 AM"), morning) expect_equal(parse_time("12:01 PM"), midday) expect_equal(parse_time("12:01"), midday) }) test_that("accepts single digit hour", { early_morn <- hms::hms(seconds = 1 * 3600 + 20 * 60) expect_equal(parse_time("1:20 am"), early_morn) }) test_that("parses NA/empty correctly", { out <- parse_time(c("NA", "")) exp <- hms::hms(seconds = c(NA_real_, NA_real_)) expect_equal(out, exp) expect_equal(parse_time("TeSt", na = "TeSt"), hms::hms(seconds = NA_real_)) }) test_that("times are guessed as expected", { expect_equal(guess_parser("12:01"), "time") expect_equal( guess_parser("12:01:01"), "time") expect_equal( guess_parser(c("04:00:00", "04:30:00", "14:00:22")), "time") }) readr/tests/testthat/eol-cr.csv0000644000175100001440000000003213057262333016264 0ustar hornikusers"x","y" 1,"a" 2,"b" 3,"c" readr/tests/testthat/test-write-lines.R0000644000175100001440000000571613106315444017743 0ustar hornikuserscontext("write_lines") test_that("write_lines uses UTF-8 encoding", { tmp <- tempfile() on.exit(unlink(tmp)) 
write_lines(c("fran\u00e7ais", "\u00e9l\u00e8ve"), tmp) x <- read_lines(tmp, locale = locale(encoding = "UTF-8"), progress = FALSE) expect_equal(x, c("fran\u00e7ais", "\u00e9l\u00e8ve")) }) test_that("write_lines writes an empty file if given a empty character vector", { tmp <- tempfile() on.exit(unlink(tmp)) write_lines(character(), tmp) expect_true(empty_file(tmp)) }) test_that("write_lines respects the NA argument", { tmp <- tempfile() tmp2 <- tempfile() on.exit(unlink(c(tmp, tmp2))) write_lines(c("first", NA_character_, "last"), tmp) expect_equal(read_lines(tmp), c("first", "NA", "last")) write_lines(c("first", NA_character_, "last"), tmp2, na = "test") expect_equal(read_lines(tmp2), c("first", "test", "last")) }) test_that("write_lines can append to a file", { tmp <- tempfile() on.exit(unlink(tmp)) write_lines(c("first", "last"), tmp) write_lines(c("first", "last"), tmp, append = TRUE) expect_equal(read_lines(tmp), c("first", "last", "first", "last")) }) test_that("write_lines accepts a list of raws", { x <- lapply(seq_along(1:10), function(x) charToRaw(paste0(collapse = "", sample(letters, size = sample(0:22, 1))))) tmp <- tempfile() on.exit(unlink(tmp)) write_lines(x, tmp) expect_equal(read_lines(tmp), vapply(x, rawToChar, character(1))) }) # write_file ------------------------------------------------------------------ test_that("write_file round trips", { tmp <- tempfile() on.exit(unlink(tmp)) x <- "foo\nbar" write_file(x, tmp) expect_equal(read_file(tmp), x) }) test_that("write_file round trips with an empty vector", { tmp <- tempfile() on.exit(unlink(tmp)) x <- "" write_file(x, tmp) expect_equal(read_file(tmp), x) }) test_that("write_file errors if given a character vector of length != 1", { tmp <- tempfile() expect_error(write_file(character(), tmp)) expect_error(write_file(c("foo", "bar"), tmp)) }) test_that("write_file with raw round trips", { tmp <- tempfile() on.exit(unlink(tmp)) x <- charToRaw("foo\nbar") write_file(x, tmp) expect_equal(read_file_raw(tmp), x) }) test_that("write_file with raw round trips with an empty vector", { tmp <- tempfile() on.exit(unlink(tmp)) x <- raw() write_file(x, tmp) expect_equal(read_file_raw(tmp), x) }) test_that("write_lines can write to compressed files", { mt <- read_lines(readr_example("mtcars.csv.bz2")) filename <- file.path(tempdir(), "mtcars.csv.bz2") on.exit(unlink(filename)) write_lines(mt, filename) expect_true(is_bz2_file(filename)) expect_equal(mt, read_lines(filename)) }) test_that("write_file can write to compressed files", { mt <- read_file(readr_example("mtcars.csv.bz2")) filename <- file.path(tempdir(), "mtcars.csv.bz2") on.exit(unlink(filename)) write_file(mt, filename) expect_true(is_bz2_file(filename)) expect_equal(mt, read_file(filename)) }) readr/tests/testthat/test-parsing-logical.R0000644000175100001440000000147213106315444020547 0ustar hornikuserscontext("Parsing, logical") test_that("TRUE and FALSE parsed", { expect_equal(parse_logical(c("TRUE", "FALSE")), c(TRUE, FALSE)) }) test_that("true and false parsed", { expect_equal(parse_logical(c("true", "false")), c(TRUE, FALSE)) }) test_that("True and False parsed", { expect_equal(parse_logical(c("True", "False")), c(TRUE, FALSE)) }) test_that("T and F parsed", { expect_equal(parse_logical(c("T", "F")), c(TRUE, FALSE)) }) test_that("t and f parsed", { expect_equal(parse_logical(c("t", "f")), c(TRUE, FALSE)) }) test_that("1 and 0 parsed", { expect_equal(parse_logical(c("1", "0")), c(TRUE, FALSE)) }) test_that("other values generate warnings", { expect_warning(out <- 
parse_logical(c("A", "AB", "ABCD", "ABCDE", "NA"))) expect_equivalent(out, rep(NA, 5)) expect_equal(n_problems(out), 4) }) readr/tests/testthat/null-file0000644000175100001440000000002213057262333016177 0ustar hornikusersa,b,c 1,2, 3,4,5 readr/tests/testthat/basic-df.csv0000644000175100001440000000046013057262333016560 0ustar hornikusersa,b,c,d TRUE,7,0.181526642525569,"m" TRUE,2,0.833227441413328,"z" TRUE,8,0.926790483295918,"r" FALSE,10,0.375270307529718,"s" TRUE,6,0.420266286935657,"g" TRUE,3,0.435449987649918,"h" TRUE,5,0.0210941969417036,"w" FALSE,9,0.0915570755023509,"u" FALSE,1,0.756106866057962,"l" FALSE,4,0.353530979715288,NA readr/tests/testthat/sample_text.txt0000644000175100001440000000000713057262333017456 0ustar hornikusersabc 123readr/tests/testthat/empty-file0000644000175100001440000000000013053660504016355 0ustar hornikusersreadr/tests/testthat/test-read-csv.R0000644000175100001440000002127713106315672017210 0ustar hornikuserscontext("read_csv") test_that("read_csv col imputation, col_name detection and NA detection works", { test_data <- read_csv("basic-df.csv", col_types = NULL, col_names = TRUE, progress = FALSE) expect_equal(unname(unlist(lapply(test_data, class))), c("logical", "integer", "numeric", "character")) expect_equal(names(test_data), c("a", "b", "c", "d")) expect_equal(sum(is.na(test_data$d)), 1) test_data2 <- read_csv("basic-df.csv", col_types = list(a = "l", b = "i", c = "d", d = "c"), col_names = TRUE, progress = FALSE) expect_identical(test_data, test_data2) }) test_that("read_csv's 'NA' option genuinely changes the NA values", { expect_equal(read_csv("a\nz", na = "z", progress = FALSE)$a, NA_character_) }) test_that("read_csv's 'NA' option works with multiple NA values", { expect_equal(read_csv("a\nNA\n\nmiss\n13", na = c("13", "miss"), progress = FALSE)$a, c("NA", NA, NA)) }) test_that('passing character() to read_csv\'s "NA" option reads "" correctly', { expect_equal(read_csv("a\nfoo\n\n", na = character(), progress = FALSE)$a, "foo") }) test_that("passing \"\" to read_csv's 'NA' option reads \"\" correctly", { expect_equal(read_csv("a,b\nfoo,bar\nfoo,\n", na = "", progress = FALSE)$b, c("bar", NA)) }) test_that("changing read_csv's 'quote' argument works correctly", { test_data <- read_csv("basic-df.csv", col_types = NULL, col_names = TRUE, progress = FALSE) test_data_singlequote <- read_csv("basic-df-singlequote.csv", quote="'") expect_identical(test_data, test_data_singlequote) }) test_that("read_csv's 'skip' option allows for skipping'", { test_data <- read_csv("basic-df.csv", skip = 1, progress = FALSE) expect_equal(nrow(test_data), 9) }) test_that("read_csv's 'skip' option allows for skipping when no header row is present'", { test_data <- read_csv("basic-df.csv", skip = 1, col_names = FALSE, progress = FALSE) expect_equal(nrow(test_data), 10) }) test_that("read_csv's 'n_max' allows for a maximum number of records and does not corrupt any", { test_data <- read_csv("basic-df.csv", n_max = 7, progress = FALSE) expect_equal(nrow(test_data), 7) expect_equal(sum(is.na(test_data)), 0) }) test_that("n_max also affects column guessing", { df <- read_csv(n_max = 1, 'x,y,z 1,2,3 1,2,3,4' , progress = FALSE) expect_equal(dim(df), c(1, 3)) }) test_that("can read more than 100 columns", { set.seed(2015-3-13) x <- as.data.frame(matrix(rbinom(300, 2, .5), nrow = 2)) y <- format_csv(x) expect_equal(ncol(read_csv(y, progress = FALSE)), 150) }) test_that("encoding affects text and headers", { x <- read_csv("enc-iso-8859-1.txt", locale = locale(encoding = 
"ISO-8859-1"), progress = FALSE) expect_identical(names(x), "fran\u00e7ais") expect_identical(x[[1]], "\u00e9l\u00e8ve") }) test_that("nuls are dropped with a warning", { expect_warning(x <- read_csv("raw.csv", progress = FALSE)) expect_equal(n_problems(x), 1) expect_equal(x$abc, "ab") }) # Column warnings --------------------------------------------------------- test_that("warnings based on number of columns (not output columns)", { out1 <- read_csv("1,2,3\n4,5,6", "z", "__i", progress = FALSE) out2 <- read_csv("1,2,3\n4,5,6", FALSE, cols_only(X3 = "i"), progress = FALSE) expect_equal(n_problems(out1), 0) expect_equal(n_problems(out2), 0) }) test_that("missing last field generates warning", { expect_warning(out <- read_csv("a,b\n2", progress = FALSE)) expect_equal(problems(out)$expected, "2 columns") }) test_that("missing lines are skipped without warning", { # first expect_silent(out <- read_csv("a,b\n\n\n1,2", progress = FALSE)) # middle expect_silent(out <- read_csv("a,b\n1,2\n\n\n2,3\n", progress = FALSE)) # last (trailing \n is ignored) expect_silent(out <- read_csv("a,b\n1,2\n\n\n", progress = FALSE)) }) test_that("warning lines are correct after skipping", { expect_warning(out1 <- read_csv("v1,v2\n\n1,2", col_types = "i", progress = FALSE)) expect_warning(out2 <- read_csv("v1,v2\n#foo\n1,2", col_types = "i", comment = "#", progress = FALSE)) expect_equal(problems(out1)$row, 1) expect_equal(problems(out2)$row, 1) expect_warning(out3 <- read_csv("v1,v2\n\n1,2\n\n3,4", col_types = "i", progress = FALSE)) expect_warning(out4 <- read_csv("v1,v2\n#foo\n1,2\n#bar\n3,4", col_types = "i", comment = "#", progress = FALSE)) expect_equal(problems(out3)$row, c(1, 2)) expect_equal(problems(out4)$row, c(1, 2)) }) test_that("extra columns generates warnings", { expect_warning(out1 <- read_csv("a,b\n1,2,3\n", progress = FALSE)) expect_warning(out2 <- read_csv("a,b\n1,2,3", col_types = "ii", progress = FALSE)) expect_warning(out3 <- read_csv("1,2,3\n", c("a", "b"), progress = FALSE)) expect_warning(out4 <- read_csv("1,2,3\n", c("a", "b"), "ii", progress = FALSE)) expect_equal(problems(out1)$expected, "2 columns") expect_equal(problems(out2)$expected, "2 columns") expect_equal(problems(out3)$expected, "2 columns") expect_equal(problems(out4)$expected, "2 columns") }) test_that("too few or extra col_types generates warnings", { expect_warning(out1 <- read_csv("v1,v2\n1,2", col_types = "i", progress = FALSE)) expect_equal(problems(out1)$expected, "1 columns") expect_equal( problems(out1)$actual, "2 columns") expect_warning(out2 <- read_csv("v1,v2\n1,2", col_types = "iii", progress = FALSE)) expect_equal(ncol(out2), 2) }) # read_csv2 --------------------------------------------------------------- test_that("decimal mark automatically set to ,", { expect_message( x <- read_csv2("x\n1,23", progress = FALSE), if (default_locale()$decimal_mark == ".") "decimal .*grouping .*mark" else NA) expect_equal(x[[1]], 1.23) }) # Zero rows --------------------------------------------------------------- test_that("header only df gets character columns", { x <- read_csv("a,b\n", progress = FALSE) expect_equal(dim(x), c(0, 2)) expect_equal(class(x$a), "character") expect_equal(class(x$b), "character") }) test_that("n_max 0 gives zero row data frame", { x <- read_csv("a,b\n1,2", n_max = 0, progress = FALSE) expect_equal(dim(x), c(0, 2)) expect_equal(class(x$a), "character") expect_equal(class(x$b), "character") }) test_that("empty file with col_names and col_types creates correct columns", { x <- 
read_csv(datasource_string("", 0), c("a", "b"), "ii", progress = FALSE) expect_equal(dim(x), c(0, 2)) expect_equal(class(x$a), "integer") expect_equal(class(x$b), "integer") }) # Comments ---------------------------------------------------------------- test_that("comments are ignored regardless of where they appear", { out1 <- read_csv('x\n1#comment',comment = "#", progress = FALSE) out2 <- read_csv('x\n1#comment\n#comment', comment = "#", progress = FALSE) out3 <- read_csv('x\n"1"#comment', comment = "#", progress = FALSE) expect_equal(out1$x, 1) expect_equal(out2$x, 1) expect_equal(out3$x, 1) expect_warning(out4 <- read_csv('x,y\n1,#comment', comment = "#", progress = FALSE)) expect_equal(out4$y, NA_character_) expect_warning(out5 <- read_csv("x1,x2,x3\nA2,B2,C2\nA3#,B2,C2\nA4,A5,A6", comment = "#", progress = FALSE)) expect_warning(out6 <- read_csv("x1,x2,x3\nA2,B2,C2\nA3,#B2,C2\nA4,A5,A6", comment = "#", progress = FALSE)) expect_warning(out7 <- read_csv("x1,x2,x3\nA2,B2,C2\nA3,#B2,C2\n#comment\nA4,A5,A6", comment = "#", progress = FALSE)) chk <- tibble::data_frame( x1 = c("A2", "A3", "A4"), x2 = c("B2", NA_character_, "A5"), x3 = c("C2", NA_character_, "A6")) expect_true(all.equal(chk, out5)) expect_true(all.equal(chk, out6)) expect_true(all.equal(chk, out7)) }) test_that("escaped/quoted comments are ignored", { out1 <- read_delim('x\n\\#', comment = "#", delim = ",", escape_backslash = TRUE, escape_double = FALSE, progress = FALSE) out2 <- read_csv('x\n"#"', comment = "#", progress = FALSE) expect_equal(out1$x, "#") expect_equal(out2$x, "#") }) test_that("leading comments are ignored", { out <- read_csv("#a\n#b\nx\n1", comment = "#", progress = FALSE) expect_equal(ncol(out), 1) expect_equal(out$x, 1L) }) test_that("skip respects comments", { read_x <- function(...) 
{ read_csv("#a\nb\nc", col_names = FALSE, ..., progress = FALSE)[[1]] } expect_equal(read_x(), c("#a", "b", "c")) expect_equal(read_x(skip = 1), c("b", "c")) expect_equal(read_x(comment = "#"), c("b", "c")) expect_equal(read_x(comment = "#", skip = 1), c("c")) }) test_that("read_csv returns an empty data.frame on an empty file", { expect_true(all.equal(read_csv("empty-file", progress = FALSE), tibble::data_frame())) }) test_that("read_delim errors on length 0 delimiter (557)", { expect_error(read_delim("a b\n1 2\n", delim = ""), "`delim` must be at least one character, use `read_table\\(\\)` for whitespace delimited input\\.") }) readr/tests/testthat/test-read-file.R0000644000175100001440000000473513106315444017331 0ustar hornikuserscontext("read_file") # df <- dplyr::data_frame(français = "élève") # write.csv(df, # "tests/testthat/enc-iso-8859-1.txt", # fileEncoding = "ISO-8859-1", # row.names = FALSE, # quote = FALSE) test_that("read_file respects encoding", { x <- read_file("enc-iso-8859-1.txt", locale(encoding = "ISO-8859-1")) expect_equal(substr(x, 5, 5), "\u00e7") }) sample_text_str <- "abc\n123" # contents of sample_text.txt eol_cr_text <- "x y\n1 a\n2 b\n3 c\n" # contents of eol_cr.txt test_that("read_file works with a local text file passed as character", { expect_equal(read_file("sample_text.txt"), sample_text_str) }) test_that("read_file works with a local text file, skipping one line", { expect_equal( read_file(datasource("sample_text.txt", skip = 1)), paste(tail(strsplit(sample_text_str,"\n")[[1]], -1), collapse = "\n") ) }) test_that("read_file works with a character datasource", { expect_equal(read_file(sample_text_str), sample_text_str) }) test_that("read_file works with a connection to a local file", { con <- file("sample_text.txt", "rb") on.exit(close(con), add = TRUE) expect_equal(read_file(con), sample_text_str) }) test_that("read_file works with a raw datasource", { expect_equal(read_file(charToRaw(sample_text_str)), sample_text_str) }) test_that("read_file works with compressed files", { expect_equal(read_file("eol-cr.txt.gz"), eol_cr_text) expect_equal(read_file("eol-cr.txt.bz2"), eol_cr_text) expect_equal(read_file("eol-cr.txt.xz"), eol_cr_text) expect_equal(read_file("eol-cr.txt.zip"), eol_cr_text) }) test_that("read_file works via https", { url <- "https://raw.githubusercontent.com/tidyverse/readr/master/tests/testthat/eol-cr.txt" expect_equal(read_file(url), eol_cr_text) }) test_that("read_file works via https on gz file", { url <- "https://raw.githubusercontent.com/tidyverse/readr/master/tests/testthat/eol-cr.txt.gz" expect_equal(read_file(url), eol_cr_text) }) test_that("read_file returns \"\" on an empty file", { expect_equal(read_file("empty-file"), "") }) # read_file_raw --------------------------------------------------------------- test_that("read_file_raw works with a local text file", { expect_equal(read_file_raw("sample_text.txt"), charToRaw("abc\n123")) }) test_that("read_file_raw works with a character datasource", { expect_equal(read_file_raw("abc\n123"), charToRaw("abc\n123")) }) test_that("read_file_raw returns raw() on an empty file", { expect_equal(read_file_raw("empty-file"), raw()) }) readr/tests/testthat/test-write-delim.R0000644000175100001440000000632713106315444017722 0ustar hornikuserscontext("write_delim") test_that("strings are only quoted if needed", { x <- c("a", ',') csv <- format_delim(data.frame(x), delim = ",",col_names = FALSE) expect_equal(csv, 'a\n\",\"\n') ssv <- format_delim(data.frame(x), delim = " ",col_names = FALSE) 
expect_equal(ssv, 'a\n,\n') }) test_that("a literal NA is quoted", { expect_equal(format_csv(data.frame(x = "NA")), "x\n\"NA\"\n") }) test_that("na argument modifies how missing values are written", { df <- data.frame(x = c(NA, "x", ".")) expect_equal(format_csv(df, na = "."), "x\n.\nx\n\".\"\n") }) test_that("read_delim/csv/tsv and write_delim round trip special chars", { x <- c("a", '"', ",", "\n","at\t") output <- data.frame(x) input <- read_delim(format_delim(output, delim = " "), delim = " ", progress = FALSE) input_csv <- read_csv(format_delim(output, delim = ","), progress = FALSE) input_tsv <- read_tsv(format_delim(output, delim = "\t"), progress = FALSE) expect_equal(input$x, x) expect_equal(input_csv$x, x) expect_equal(input_tsv$x, x) }) test_that("special floating point values translated to text", { df <- data.frame(x = c(NaN, NA, Inf, -Inf)) expect_equal(format_csv(df), "x\nNaN\nNA\nInf\n-Inf\n") }) test_that("logical values give long names", { df <- data.frame(x = c(NA, FALSE, TRUE)) expect_equal(format_csv(df), "x\nNA\nFALSE\nTRUE\n") }) test_that("roundtrip preserves floating point numbers", { input <- data.frame(x = runif(100)) output <- read_delim(format_delim(input, delim = " "), delim = " ", progress = FALSE) expect_equal(input$x, output$x) }) test_that("roundtrip preserves dates and datetimes", { x <- as.Date("2010-01-01") + 1:10 y <- as.POSIXct(x) attr(y, "tzone") <- "UTC" input <- data.frame(x, y) output <- read_delim(format_delim(input, delim = " "), delim = " ", progress = FALSE) expect_equal(output$x, x) expect_equal(output$y, y) }) test_that("fails to create file in non-existent directory", { expect_warning(expect_error(write_csv(mtcars, file.path(tempdir(), "/x/y")), "cannot open the connection"), "No such file or directory") }) test_that("write_excel_csv includes a byte order mark", { tmp <- tempfile() on.exit(unlink(tmp)) write_excel_csv(mtcars, tmp) output <- readBin(tmp, "raw", file.info(tmp)$size) # BOM is there expect_equal(output[1:3], charToRaw("\xEF\xBB\xBF")) # Rest of file also there expect_equal(output[4:6], charToRaw("mpg")) }) test_that("does not write a trailing .0 for whole number doubles", { expect_equal(format_tsv(tibble::data_frame(x = 1)), "x\n1\n") expect_equal(format_tsv(tibble::data_frame(x = 0)), "x\n0\n") expect_equal(format_tsv(tibble::data_frame(x = -1)), "x\n-1\n") expect_equal(format_tsv(tibble::data_frame(x = 999)), "x\n999\n") expect_equal(format_tsv(tibble::data_frame(x = -999)), "x\n-999\n") expect_equal(format_tsv(tibble::data_frame(x = 123456789)), "x\n123456789\n") expect_equal(format_tsv(tibble::data_frame(x = -123456789)), "x\n-123456789\n") }) test_that("write_csv can write to compressed files", { mt <- read_csv(readr_example("mtcars.csv.bz2")) filename <- file.path(tempdir(), "mtcars.csv.bz2") on.exit(unlink(filename)) write_csv(mt, filename) expect_true(is_bz2_file(filename)) expect_equal(mt, read_csv(filename)) }) readr/tests/testthat/test-parsing-character.R0000644000175100001440000000422113106315444021064 0ustar hornikuserscontext("Parsing, character") test_that("ws dropped by default", { df <- read_csv("x\n a \n b\n", progress = FALSE) expect_equal(df$x, c("a", "b")) }) test_that("trim_ws = FALSE keeps ws", { df <- read_csv("x\n a\nb \n", trim_ws = FALSE, progress = FALSE) expect_equal(df$x, c(" a", "b ")) }) # Encoding ---------------------------------------------------------------- test_that("locale encoding affects parsing", { x <- c("août", "élève", "ça va") # expect_equal(Encoding(x), rep("UTF-8", 3)) y <- iconv(x, "UTF-8", "latin1") # 
expect_equal(Encoding(x), rep("latin1", 3)) fr <- locale("fr", encoding = "latin1") z <- parse_character(y, locale = fr) # expect_equal(Encoding(z), rep("UTF-8", 3)) # identical coerces encodings to match, so need to compare raw values as_raw <- function(x) lapply(x, charToRaw) expect_identical(as_raw(x), as_raw(z)) }) test_that("Unicode Byte order marks are stripped from output", { # UTF-8 expect_equal( charToRaw(read_lines( as.raw(c(0xef, 0xbb, 0xbf, # BOM 0x41, # A 0x0A # newline )))), as.raw(0x41)) # UTF-16 Big Endian expect_equal( charToRaw(read_lines( as.raw(c(0xfe, 0xff, # BOM 0x41, # A 0x0A # newline )))), as.raw(0x41)) # UTF-16 Little Endian expect_equal( charToRaw(read_lines( as.raw(c(0xff, 0xfe, # BOM 0x41, # A 0x0A # newline )))), as.raw(0x41)) # UTF-32 Big Endian expect_equal( charToRaw(read_lines( as.raw(c(0x00, 0x00, 0xfe, 0xff, # BOM 0x41, # A 0x0A # newline )))), as.raw(0x41)) # UTF-32 Little Endian expect_equal( charToRaw(read_lines( as.raw(c(0xff, 0xfe, 0x00, 0x00, # BOM 0x41, # A 0x0A # newline )))), as.raw(0x41)) # Vectors shorter than the BOM are handled safely expect_equal(charToRaw(read_lines( as.raw(c(0xef, 0xbb)))), as.raw(c(0xef, 0xbb))) expect_equal(charToRaw(read_lines( as.raw(c(0xfe)))), as.raw(c(0xfe))) expect_equal(charToRaw(read_lines( as.raw(c(0xff)))), as.raw(c(0xff))) }) readr/tests/testthat/eol-crlf.txt0000644000175100001440000000002013057262333016630 0ustar hornikusersx y 1 a 2 b 3 c readr/tests/testthat/eol-crlf.csv0000644000175100001440000000003613057262333016612 0ustar hornikusers"x","y" 1,"a" 2,"b" 3,"c" readr/tests/testthat/test-read-fwf.R0000644000175100001440000001371513106315750017172 0ustar hornikuserscontext("read_fwf") test_that("trailing spaces omitted", { spec <- fwf_empty("fwf-trailing.txt") expect_equal(spec$begin, c(0, 4)) expect_equal(spec$end, c(3, NA)) df <- read_fwf("fwf-trailing.txt", spec, progress = FALSE) expect_equal(df$X1, df$X2) }) test_that("skipping column doesn't pad col_names", { x <- "1 2 3\n4 5 6" out1 <- read_fwf(x, fwf_empty(x), col_types = 'd-d') expect_named(out1, c("X1", "X3")) names <- c("a", "b", "c") out2 <- read_fwf(x, fwf_empty(x, col_names = names), col_types = 'd-d') expect_named(out2, c("a", "c")) }) test_that("fwf_empty can skip comments", { x <- "COMMENT\n1 2 3\n4 5 6" out1 <- read_fwf(x, fwf_empty(x, comment = "COMMENT"), comment = "COMMENT") expect_equal(dim(out1), c(2, 3)) }) test_that("passing \"\" to read_fwf's 'na' option", { expect_equal(read_fwf('foobar\nfoo ', fwf_widths(c(3, 3)), na = "", progress = FALSE)[[2]], c("bar", NA)) }) test_that("ragged last column expanded with NA", { x <- read_fwf("1a\n2ab\n3abc", fwf_widths(c(1, NA)), progress = FALSE) expect_equal(x$X2, c("a", "ab", "abc")) expect_equal(n_problems(x), 0) }) test_that("ragged last column shrunk with warning", { expect_warning(x <- read_fwf("1a\n2ab\n3abc", fwf_widths(c(1, 3)), progress = FALSE)) expect_equal(x$X2, c("a", "ab", "abc")) expect_equal(n_problems(x), 2) }) test_that("read all columns with positions, non ragged", { col_pos <- fwf_positions(c(1,3,6),c(2,5,6)) x <- read_fwf('12345A\n67890BBBBBBBBB\n54321C',col_positions = col_pos, progress = FALSE) expect_equal(x$X3, c("A", "B", "C")) expect_equal(n_problems(x), 0) }) test_that("read subset columns with positions", { col_pos <- fwf_positions(c(1,3),c(2,5)) x <- read_fwf('12345A\n67890BBBBBBBBB\n54321C',col_positions = col_pos, progress = FALSE) expect_equal(x$X1, c(12, 67, 54)) expect_equal(x$X2, c(345, 890, 321)) expect_equal(n_problems(x), 0) }) test_that("read columns 
with positions, ragged", { col_pos <- fwf_positions(c(1,3,6),c(2,5,NA)) x <- read_fwf('12345A\n67890BBBBBBBBB\n54321C',col_positions = col_pos, progress = FALSE) expect_equal(x$X1, c(12, 67, 54)) expect_equal(x$X2, c(345, 890, 321)) expect_equal(x$X3, c('A', 'BBBBBBBBB', 'C')) expect_equal(n_problems(x), 0) }) test_that("read columns with width, ragged", { col_pos <- fwf_widths(c(2,3,NA)) x <- read_fwf('12345A\n67890BBBBBBBBB\n54321C',col_positions = col_pos, progress = FALSE) expect_equal(x$X1, c(12, 67, 54)) expect_equal(x$X2, c(345, 890, 321)) expect_equal(x$X3, c('A', 'BBBBBBBBB', 'C')) expect_equal(n_problems(x), 0) }) test_that("read_fwf returns an empty data.frame on an empty file", { expect_true(all.equal(read_fwf("empty-file", progress = FALSE), tibble::data_frame())) }) test_that("check for line breaks in between widths", { txt1 <- paste( "1 1", "2", "1 1 ", sep = "\n" ) expect_warning(out1 <- read_fwf(txt1, fwf_empty(txt1))) expect_equal(n_problems(out1), 2) txt2 <- paste( " 1 1", " 2", " 1 1 ", sep = "\n" ) expect_warning(out2 <- read_fwf(txt2, fwf_empty(txt2))) expect_equal(n_problems(out2), 2) exp <- tibble::tibble(X1 = c(1L, 2L, 1L), X2 = c(1L, NA, 1L)) expect_true(all.equal(out1, exp)) expect_true(all.equal(out2, exp)) }) test_that("ignore commented lines anywhere in file", { col_pos <- fwf_positions(c(1,3,6),c(2,5,6)) x1 <- read_fwf('COMMENT\n12345A\n67890BBBBBBBBB\n54321C',col_positions = col_pos, comment = "COMMENT", progress = FALSE) x2 <- read_fwf('12345A\n67890BBBBBBBBB\nCOMMENT\n54321C',col_positions = col_pos, comment = "COMMENT", progress = FALSE) x3 <- read_fwf('12345A\n67890BBBBBBBBB\n54321C\nCOMMENT',col_positions = col_pos, comment = "COMMENT", progress = FALSE) x4 <- read_fwf('COMMENT\n12345A\nCOMMENT\n67890BBBBBBBBB\n54321C\nCOMMENT',col_positions = col_pos, comment = "COMMENT", progress = FALSE) expect_identical(x1, x2) expect_identical(x1, x3) expect_identical(x1, x4) expect_equal(x1$X3, c("A", "B", "C")) expect_equal(n_problems(x1), 0) }) test_that("error on empty spec (#511, #519)", { txt = "foo\n" pos = fwf_positions(start = numeric(0), end = numeric(0)) expect_error(read_fwf(txt, pos), "Zero-length.*specifications not supported") }) test_that("error on overlapping spec (#534)", { expect_error( read_fwf("2015a\n2016b", fwf_positions(c(1, 3, 5), c(4, 4, 5))), "Overlap.*" ) }) # fwf_cols test_that("fwf_cols produces correct fwf_positions object with elements of length 2", { expected <- fwf_positions(c(1L, 9L, 4L), c(2L, 12L, 6L), c("a", "b", "d")) expect_equivalent(fwf_cols(a = c(1, 2), b = c(9, 12), d = c(4, 6)), expected) }) test_that("fwf_cols produces correct fwf_positions object with elements of length 1", { expected <- fwf_widths(c(2L, 4L, 3L), c("a", "b", "c")) expect_equivalent(fwf_cols(a = 2, b = 4, c = 3), expected) }) test_that("fwf_cols throws error when arguments are not length 1 or 2", { expect_error(fwf_cols(a = 1:3, b = 4:5)) expect_error(fwf_cols(a = c(), b = 4:5)) }) test_that("fwf_cols works with unnamed columns", { expect_equivalent( fwf_cols(c(1, 2), c(9, 12), c(4, 6)), fwf_positions(c(1L, 9L, 4L), c(2L, 12L, 6L), c("X1", "X2", "X3")) ) expect_equivalent( fwf_cols(a = c(1, 2), c(9, 12), c(4, 6)), fwf_positions(c(1L, 9L, 4L), c(2L, 12L, 6L), c("a", "X2", "X3")) ) }) # read_table ------------------------------------------------------------------- test_that("read_table silently reads ragged last column", { x <- read_table("foo bar\n1 2\n3 4\n5 6\n", progress = FALSE) expect_equal(x$foo, c(1, 3, 5)) }) test_that("read_table skips all 
comment lines", { x <- read_table("foo bar\n1 2\n3 4\n5 6\n", progress = FALSE) y <- read_table("#comment1\n#comment2\nfoo bar\n1 2\n3 4\n5 6\n", progress = FALSE, comment = "#") expect_equal(x, y) }) test_that("read_table can read from a pipe (552)", { x <- read_table(pipe("echo a b c && echo 1 2 3 && echo 4 5 6"), progress = FALSE) expect_equal(x$a, c(1, 4)) }) readr/tests/testthat/basic-df-singlequote.csv0000644000175100001440000000046013106315444021112 0ustar hornikusersa,b,c,d TRUE,7,0.181526642525569,'m' TRUE,2,0.833227441413328,'z' TRUE,8,0.926790483295918,'r' FALSE,10,0.375270307529718,'s' TRUE,6,0.420266286935657,'g' TRUE,3,0.435449987649918,'h' TRUE,5,0.0210941969417036,'w' FALSE,9,0.0915570755023509,'u' FALSE,1,0.756106866057962,'l' FALSE,4,0.353530979715288,NA readr/tests/testthat/test-parsing-numeric.R0000644000175100001440000000717013106315672020603 0ustar hornikuserscontext("Parsing, numeric") es_MX <- locale("es", decimal_mark = ",") test_that("non-numeric integer/double matches fail", { expect_equal(n_problems(parse_double("d")), 1) expect_equal(n_problems(parse_integer("d")), 1) }) test_that("partial integer/double matches fail", { expect_equal(n_problems(parse_double("3d")), 1) expect_equal(n_problems(parse_integer("3d")), 1) }) test_that("parse functions converts NAs", { expect_equal(parse_double(c("1.5", "NA")), c(1.5, NA)) }) test_that("leading/trailing ws ignored when parsing", { expect_equal(parse_double(c(" 1.5", "1.5", "1.5 ")), rep(1.5, 3)) expect_equal(read_csv("x\n 1.5\n1.5\n1.5 \n", progress = FALSE)$x, rep(1.5, 3)) }) test_that("lone - or decimal marks are not numbers", { expect_equal(guess_parser("-"), "character") expect_equal(guess_parser("."), "character") expect_equal(guess_parser(",", locale = es_MX), "character") expect_equal(n_problems(parse_number(c(".", "-"))), 2) }) test_that("Numbers with trailing characters are parsed as characters", { expect_equal(guess_parser("13T"), "character") expect_equal(guess_parser(c("13T", "13T", "10N")), "character") }) test_that("problems() returns the full failed string if parsing fails (548)", { probs <- problems(read_tsv("x\n1\nx", na = "", col_types = "n")) expect_equal(probs$row, 2) expect_equal(probs$expected, "a number") expect_equal(probs$actual, "x") }) # Leading zeros ----------------------------------------------------------- test_that("leading zeros are not numbers", { expect_equal(guess_parser("0"), "integer") expect_equal(guess_parser("0."), "double") expect_equal(guess_parser("0001"), "character") }) # Flexible number parsing ------------------------------------------------- test_that("col_number only takes first number", { expect_equal(parse_number("XYZ 123,000 BLAH 456"), 123000) }) test_that("col_number helps with currency", { expect_equal(parse_number("$1,000,000.00"), 1e6) expect_equal(parse_number("$1.000.000,00", locale = es_MX), 1e6) }) test_that("invalid numbers don't parse", { expect_warning(x <- parse_number(c("..", "--", "3.3.3", "4-1"))) expect_equal(n_problems(x), 2) expect_equal(c(x), c(NA, NA, 3.3, 4.0)) }) test_that("number not guess if leading/trailing", { expect_equal(guess_parser("X1"), "character") expect_equal(parse_number("X1"), 1) expect_equal(guess_parser("1X"), "character") expect_equal(parse_number("1X"), 1) }) # Decimal comma ----------------------------------------------------------- test_that("parse_vector passes along decimal_mark", { expect_equal(parse_double("1,5", locale = es_MX), 1.5) }) test_that("type_convert passes along decimal_mark", { df <- data.frame(x = "1,5", 
stringsAsFactors = FALSE) out <- type_convert(df, locale = es_MX) expect_equal(out$x, 1.5) }) test_that("read_tsv passes on decimal_mark", { out <- read_tsv("x\n1,5", locale = es_MX, progress = FALSE) expect_equal(out$x, 1.5) }) # Negative numbers ----------------------------------------------------------- test_that("negative numbers return negative values", { expect_equal(parse_number("-2"), -2) expect_equal(parse_number("-100,000.00"), -100000) }) # Large numbers ------------------------------------------------------------- test_that("large numbers are parsed properly", { expect_equal(parse_double("100000000000000000000"), 1e20) expect_equal(parse_double("1267650600228229401496703205376"), 1.267650600228229401496703205376e+30) expect_equal(parse_double("100000000000000000000", locale = es_MX), 1e20) expect_equal(parse_double("1267650600228229401496703205376", locale = es_MX), 1.267650600228229401496703205376e+30) }) readr/tests/testthat/fwf-trailing.txt0000644000175100001440000000002013057262333017515 0ustar hornikusers123 123 123 123 readr/tests/testthat/eol-cr.txt.gz0000644000175100001440000000005713057262333016736 0ustar hornikusersWUeol-cr.txtP2TH2RH2VHkxygreadr/tests/testthat/helper.R0000644000175100001440000000112013106315444015764 0ustar hornikusers# Provide helper overriding tibble::all.equal.tbl_df as it requires dplyr # https://github.com/tidyverse/readr/pull/577 # Using this helper allows us to avoid Suggesting dplyr all.equal.tbl_df <- function(target, current, ..., check.attributes = FALSE) { all.equal.list(target, current, ..., check.attributes = check.attributes) } is_bz2_file <- function(x) { # Magic number for bz2 is "BZh" in ASCII # https://en.wikipedia.org/wiki/Bzip2#File_format identical(charToRaw("BZh"), readBin(x, n = 3, what = "raw")) } encoded <- function(x, encoding) { Encoding(x) <- encoding x } readr/tests/testthat/test-read-table.R0000644000175100001440000000251413106315444017472 0ustar hornikuserscontext("read_table") # read_table ------------------------------------------------------------------- test_that("read_table silently reads ragged last column", { x <- read_table("foo bar\n1 2\n3 4\n5 6\n", progress = FALSE) expect_equal(x$foo, c(1, 3, 5)) }) test_that("read_table skips all comment lines", { x <- read_table("foo bar\n1 2\n3 4\n5 6\n", progress = FALSE) y <- read_table("#comment1\n#comment2\nfoo bar\n1 2\n3 4\n5 6\n", progress = FALSE, comment = "#") expect_equal(x, y) }) test_that("read_table can read from a pipe (552)", { x <- read_table(pipe("echo a b c && echo 1 2 3 && echo 4 5 6"), progress = FALSE) expect_equal(x$a, c(1, 4)) }) # read_table2 ------------------------------------------------------------------- test_that("read_table2 silently reads ragged columns", { x <- read_table2("foo bar\n1 2\n3 4\n5 6\n", progress = FALSE) expect_equal(x$foo, c(1, 3, 5)) }) test_that("read_table2 skips all comment lines", { x <- read_table2("foo bar\n1 2\n3 4\n5 6\n", progress = FALSE) y <- read_table2("#comment1\n#comment2\nfoo bar\n1 2\n3 4\n5 6\n", progress = FALSE, comment = "#") expect_equal(x, y) }) test_that("read_table2 can read from a pipe (552)", { x <- read_table2(pipe("echo a b c && echo 1 2 3 && echo 4 5 6")) expect_equal(x$a, c(1, 4)) }) readr/tests/testthat/test-col-spec.R0000644000175100001440000001573713106315672017215 0ustar hornikuserscontext("col_spec") test_that("supplied col names must match non-skipped col types", { out <- col_spec_standardise(col_types = "c_c", col_names = c("a", "c")) expect_equal(names(out[[1]]), c("a", "", "c")) }) 
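# Illustrative aside, not part of the original tests (assumes readr is
# attached): in a compact col_types string each character describes one
# column -- e.g. "c" = character, "i" = integer, "d" = double -- and "_"
# (or "-") drops a column entirely, so "c_c" keeps columns 1 and 3:
# read_csv("a,b,c\nx,1,y\n", col_types = "c_c")
# #> a tibble with character columns `a` and `c`; `b` is skipped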
test_that("supplied col names matches to non-skipped col types", { out <- col_spec_standardise("a,b,c\n1,2,3", col_types = "i_i") expect_equal(names(out[[1]]), c("a", "b", "c")) }) test_that("guess col names matches all col types", { out <- col_spec_standardise("a,b,c\n", col_types = "i_i") expect_equal(names(out[[1]]), c("a", "b", "c")) expect_equal(out[[1]][[2]], col_skip()) }) test_that("col_names expanded to col_types with dummy names", { expect_warning( out <- col_spec_standardise("1,2,3,4\n", c("a", "b"), "iiii"), "Insufficient `col_names`" ) expect_equal(names(out[[1]]), c("a", "b", "X3", "X4")) }) test_that("col_names expanded to match col_types, with skipping", { expect_warning( out <- col_spec_standardise(col_types = "c_c", col_names = "a"), "Insufficient `col_names`" ) expect_equal(names(out[[1]]), c("a", "", "X2")) }) test_that("col_types expanded to col_names by guessing", { expect_warning( out <- col_spec_standardise("1,2,3\n", c("a", "b", "c"), "ii"), "Insufficient `col_types`" ) expect_equal(names(out[[1]]), c("a", "b", "c")) expect_equal(out[[1]][[3]], col_integer()) }) test_that("defaults expanded to match names", { out <- col_spec_standardise("a,b,c\n1,2,3", col_types = cols(.default = "c")) expect_equal(out[[1]], list( a = col_character(), b = col_character(), c = col_character() )) }) test_that("col_spec_standardise works properly with 1 row inputs and no header columns (#333)", { expect_is(col_spec_standardise("1\n", col_names = FALSE)[[1]]$X1, "collector_integer") }) test_that("warns about duplicated names", { expect_warning(col_spec_standardise("a,a\n1,2"), "Duplicated column names") expect_warning(col_spec_standardise("X2,\n1,2"), "Duplicated column names") expect_warning( col_spec_standardise("1,2\n1,2", col_names = c("X", "X")), "Duplicated column names" ) }) test_that("warn about missing col names and fill in", { expect_warning(col_spec_standardise(",\n1,2"), "Missing column names") expect_warning( col_spec_standardise("1,2\n1,2", col_names = c("X", NA)), "Missing column names" ) }) # Printing ---------------------------------------------------------------- regex_escape <- function(x) { chars <- c("*", ".", "?", "^", "+", "$", "|", "(", ")", "[", "]", "{", "}", "\\") gsub(paste0("([\\", paste0(collapse = "\\", chars), "])"), "\\\\\\1", x, perl = TRUE) } test_that("print(col_spec) with guess_parser", { out <- col_spec_standardise("a,b,c\n1,2,3") expect_output(print(out), regex_escape( "cols( a = col_integer(), b = col_integer(), c = col_integer() )")) }) test_that("print(col_spec) with collector_skip", { out <- cols_only(a = col_integer(), c = col_integer()) expect_output(print(out), regex_escape( "cols_only( a = col_integer(), c = col_integer() )")) }) test_that("print(col_spec) with truncated output", { out <- col_spec_standardise("a,b,c\n1,2,3", col_types = cols(.default = "c")) expect_output(print(out, n = 2, condense = FALSE), regex_escape( "cols( .default = col_character(), a = col_character(), b = col_character() # ... 
with 1 more columns )")) }) test_that("spec object attached to read data", { test_data <- read_csv("basic-df.csv", col_types = NULL, col_names = TRUE, progress = FALSE) expect_equal(spec(test_data), cols( a = col_logical(), b = col_integer(), c = col_double(), d = col_character())) }) test_that("print(col_spec) works with dates", { out <- col_spec_standardise("a,b,c\n", col_types = cols(a = col_date(format = "%Y-%m-%d"), b = col_date(), c = col_date())) expect_output(print(out), regex_escape( "cols( a = col_date(format = \"%Y-%m-%d\"), b = col_date(format = \"\"), c = col_date(format = \"\") )")) }) test_that("print(col_spec) with unnamed columns", { out <- col_spec_standardise(col_types = "c_c", col_names = c("a", "c")) expect_output(print(out), regex_escape( "cols( a = col_character(), col_skip(), c = col_character() )")) }) test_that("print(cols_only()) prints properly", { out <- cols_only( a = col_character(), c = col_integer()) expect_output(print(out), regex_escape( "cols_only( a = col_character(), c = col_integer() )")) }) test_that("print(col_spec) with n == 0 prints nothing", { out <- col_spec_standardise("a,b,c\n1,2,3") expect_silent(print(out, n = 0)) }) test_that("print(col_spec, condense = TRUE) condenses the spec", { out <- col_spec_standardise("a,b,c,d\n1,2,3,a") expect_output(print(cols_condense(out)), regex_escape( "cols( .default = col_integer(), d = col_character() )")) out <- col_spec_standardise("a,b,c,d\n1,2,3,4") expect_output(print(cols_condense(out)), regex_escape( "cols( .default = col_integer() )")) }) test_that("print(col_spec) with no columns specified", { out <- cols() expect_output(print(out), regex_escape("cols()")) out <- cols(.default = col_character()) expect_output(print(out), regex_escape( "cols( .default = col_character() )")) }) test_that("print(col_spec) and condense edge cases", { out <- cols(a = col_integer(), b = col_integer(), c = col_double()) expect_equal(format(out, n = 1, condense = TRUE), "cols( .default = col_integer(), c = col_double() ) ") }) test_that("non-syntatic names are escaped", { x <- read_csv("a b,_c,1,a`b\n1,2,3,4") expect_equal(format(spec(x)), "cols( `a b` = col_integer(), `_c` = col_integer(), `1` = col_integer(), `a\\`b` = col_integer() ) ") }) test_that("long expressions are wrapped (597)", { expect_equal(format(cols(a = col_factor(levels = c("apple", "pear", "banana", "peach", "apricot", "orange", "plum"), ordered = TRUE))), 'cols( a = col_factor(levels = c("apple", "pear", "banana", "peach", "apricot", "orange", "plum" ), ordered = TRUE, include_na = FALSE) ) ') }) test_that("guess_types errors on invalid inputs", { expect_error(col_spec_standardise("a,b,c\n", guess_max = NA), "`guess_max` must be a positive integer") expect_error(col_spec_standardise("a,b,c\n", guess_max = -1), "`guess_max` must be a positive integer") expect_warning(col_spec_standardise("a,b,c\n", guess_max = Inf), "`guess_max` is a very large value") }) test_that("check_guess_max errors on invalid inputs", { expect_error(check_guess_max(NULL), "`guess_max` must be a positive integer") expect_error(check_guess_max("test"), "`guess_max` must be a positive integer") expect_error(check_guess_max(letters), "`guess_max` must be a positive integer") expect_error(check_guess_max(1:2), "`guess_max` must be a positive integer") expect_error(check_guess_max(NA), "`guess_max` must be a positive integer") expect_error(check_guess_max(-1), "`guess_max` must be a positive integer") expect_warning(check_guess_max(Inf), "`guess_max` is a very large value") }) 
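# Illustrative aside, not part of the original tests (assumes readr is
# attached): `guess_max` is the number of rows inspected when guessing
# column types, which is why the checks above insist on a positive
# integer. A column whose first values are all missing is mis-guessed
# until guess_max reaches the first real value:
# x <- paste0("x\n", strrep("NA\n", 1000), "1.5\n")
# spec(read_csv(x))                    # first 1000 values are NA -> character
# spec(read_csv(x, guess_max = 1001))  # sees "1.5" -> col_double()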
readr/tests/testthat/eol-cr.txt.bz20000644000175100001440000000006613057262333017013 0ustar hornikusersBZh91AY&SY_̛Y@88` 10!b>+j\H  vreadr/tests/testthat/test-read-chunked.R0000644000175100001440000000636413106315444020033 0ustar hornikuserscontext("read-chunked") test_that("read_lines_chunked", { file <- readr_example("mtcars.csv") num_rows <- length(readLines(file)) get_sizes <- function(data, pos) sizes[[length(sizes) + 1]] <<- length(data) # Full file in one chunk sizes <- list() read_lines_chunked(file, get_sizes) expect_equal(num_rows, sizes[[1]]) # Each line separately sizes <- list() read_lines_chunked(file, get_sizes, chunk_size = 1) expect_true(all(sizes == 1)) expect_equal(num_rows, length(sizes)) # In chunks of 5 sizes <- list() read_lines_chunked(file, get_sizes, chunk_size = 5) expect_true(all(sizes[1:6] == 5)) expect_true(all(sizes[[7]] == 3)) # Halting early get_sizes_stop <- function(data, pos) { sizes[[length(sizes) + 1]] <<- length(data) if (pos >= 5) { return(FALSE) } } sizes <- list() read_lines_chunked(file, get_sizes_stop, chunk_size = 5) expect_true(length(sizes) == 2) expect_true(all(sizes[1:2] == 5)) }) test_that("read_delim_chunked", { file <- readr_example("mtcars.csv") unchunked <- read_csv(file) get_dims <- function(data, pos) dims[[length(dims) + 1]] <<- dim(data) # Full file in one chunk dims <- list() read_csv_chunked(file, get_dims) expect_equal(dim(unchunked), dims[[1]]) # Each line separately dims <- list() read_csv_chunked(file, get_dims, chunk_size = 1) expect_true(all(vapply(dims[1:6], identical, logical(1), c(1L, 11L)))) expect_equal(nrow(unchunked), length(dims)) # In chunks of 5 dims <- list() read_csv_chunked(file, get_dims, chunk_size = 5) expect_true(all(vapply(dims[1:6], identical, logical(1), c(5L, 11L)))) expect_true(identical(dims[[7]], c(2L, 11L))) # Halting early get_dims_stop <- function(data, pos) { dims[[length(dims) + 1]] <<- dim(data) if (pos >= 5) { return(FALSE) } } dims <- list() read_csv_chunked(file, get_dims_stop, chunk_size = 5) expect_true(length(dims) == 2) expect_true(all(vapply(dims[1:2], identical, logical(1), c(5L, 11L)))) }) test_that("DataFrameCallback works as intended", { f <- readr_example("mtcars.csv") out0 <- subset(read_csv(f), gear == 3) fun3 <- DataFrameCallback$new(function(x, pos) subset(x, gear == 3)) out1 <- read_csv_chunked(f, fun3) # Need to set guess_max higher than 1 to guess correct column types out2 <- read_csv_chunked(f, fun3, chunk_size = 1, guess_max = 10) out3 <- read_csv_chunked(f, fun3, chunk_size = 10) expect_true(all.equal(out0, out1)) expect_true(all.equal(out0, out2)) expect_true(all.equal(out0, out3)) # No matching rows out0 <- subset(read_csv(f), gear == 5) fun5 <- DataFrameCallback$new(function(x, pos) subset(x, gear == 5)) out1 <- read_csv_chunked(f, fun5) # Need to set guess_max higher than 1 to guess correct column types out2 <- read_csv_chunked(f, fun5, chunk_size = 1, guess_max = 10) out3 <- read_csv_chunked(f, fun5, chunk_size = 10) expect_true(all.equal(out0, out1)) expect_true(all.equal(out0, out2)) expect_true(all.equal(out0, out3)) }) test_that("ListCallback works as intended", { f <- readr_example("mtcars.csv") out0 <- read_csv(f) fun <- ListCallback$new(function(x, pos) x[["mpg"]]) out1 <- read_csv_chunked(f, fun, chunk_size = 10) expect_equal(out0[["mpg"]], unlist(out1)) }) readr/tests/testthat/eol-cr.txt.xz0000644000175100001440000000011013057262333016745 0ustar hornikusers7zXZִF!t/x y 1 a 2 b 3 c 4xT(( 
l`}YZreadr/tests/testthat/test-read-lines.R0000644000175100001440000000336413106315444017521 0ustar hornikuserscontext("read_lines") test_that("read_lines respects encoding", { x <- read_lines("enc-iso-8859-1.txt", locale = locale(encoding = "ISO-8859-1"), progress = FALSE) expect_equal(x, c("fran\u00e7ais", "\u00e9l\u00e8ve")) }) test_that("read_lines returns an empty character vector on an empty file", { expect_equal(read_lines("empty-file", progress = FALSE), character()) }) test_that("read_lines handles embedded nuls", { expect_equal(read_lines("null-file", progress = FALSE), c("a,b,c", "1,2,", "3,4,5")) }) test_that("read_lines uses na argument", { expect_equal(read_lines("sample_text.txt", na = "abc", progress = FALSE), c(NA_character_, "123")) expect_equal(read_lines("sample_text.txt", na = "123", progress = FALSE), c("abc", NA_character_)) expect_equal(read_lines("sample_text.txt", na = c("abc", "123"), progress = FALSE), c(NA_character_, NA_character_)) }) test_that("blank lines are passed unchanged", { tmp <- tempfile() on.exit(unlink(tmp)) x <- c("abc", "", "123") write_lines(path = tmp, x) expect_equal(read_lines(tmp), x) expect_equal(read_lines(tmp, na = ""), c("abc", NA_character_, "123")) }) test_that("allocation works as expected", { tmp <- tempfile(fileext = ".gz") on.exit(unlink(tmp)) x <- rep(paste(rep("a", 2 ^ 10), collapse = ''), 2 ^ 11) writeLines(x, tmp) expect_equal(length(read_lines(tmp)), 2^11) }) # These tests are slow so are commented out #test_that("long vectors are supported", { #tmp <- tempfile(fileext = ".gz") #on.exit(unlink(tmp)) #x <- rep(paste(rep("a", 2 ^ 16), collapse = ''), 2 ^ 15) #con <- gzfile(tmp, open = "w", compression = 0) #writeLines(x, con) #close(con) #expect_equal(length(read_lines(tmp)), 2^15) #expect_equal(length(read_lines_raw(tmp)), 2^15) #}) readr/tests/testthat/test-parsing-factors.R0000644000175100001440000000373413106315444020601 0ustar hornikuserscontext("Parsing, factors") test_that("strings mapped to levels", { x <- parse_factor(c("a", "b"), levels = c("a", "b")) expect_equal(x, factor(c("a", "b"))) }) test_that("can generate ordered factor", { x <- parse_factor(c("a", "b"), levels = c("a", "b"), ordered = TRUE) expect_equal(x, ordered(c("a", "b"))) }) test_that("warning if value not in levels", { expect_warning(x <- parse_factor(c("a", "b", "c"), levels = c("a", "b"))) expect_equal(n_problems(x), 1) expect_equal(is.na(x), c(FALSE, FALSE, TRUE)) }) test_that("NAs silently passed along", { x <- parse_factor(c("a", "b", "NA"), levels = c("a", "b"), include_na = FALSE) expect_equal(n_problems(x), 0) expect_equal(x, factor(c("a", "b", NA))) }) test_that("levels = NULL (497)", { x <- parse_factor(c("a", "b", "c", "b"), levels = NULL) expect_equal(n_problems(x), 0) expect_equal(x, factor(c("a", "b", "c", "b"))) }) test_that("NAs included in levels if desired", { x <- parse_factor(c("NA", "b", "a"), levels = c("a", "b", NA)) expect_equal(x, factor(c(NA, "b", "a"), levels = c("a", "b", NA), exclude = NULL)) x <- parse_factor(c("NA", "b", "a"), levels = c("a", "b"), include_na = TRUE) expect_equal(x, factor(c(NA, "b", "a"), levels = c("a", "b", NA), exclude = NULL)) x <- parse_factor(c("NA", "b", "a"), levels = c("a", "b"), include_na = FALSE) expect_equal(x, factor(c(NA, "b", "a"))) x <- parse_factor(c("NA", "b", "a"), levels = NULL, include_na = FALSE) expect_equal(x, factor(c("NA", "b", "a"), levels = c("b", "a"))) x <- parse_factor(c("NA", "b", "a"), levels = NULL, include_na = TRUE) expect_equal(x, factor(c("NA", "b", "a"), levels = 
c(NA, "b", "a"), exclude = NULL)) }) test_that("Factors handle encodings properly (#615)", { x <- read_csv(encoded("test\nA\n\xC4\n", "latin1"), col_types = cols(col_factor(c("A", "\uC4"))), locale = locale(encoding = "latin1"), progress = FALSE) expect_is(x$test, "factor") expect_equal(x$test, factor(c("A", "\uC4"))) }) readr/tests/testthat/raw.csv0000644000175100001440000000002113057262333015672 0ustar hornikusersabc,def abc,def readr/tests/testthat/eol-cr.txt0000644000175100001440000000002013057262333016305 0ustar hornikusersx y 1 a 2 b 3 c readr/tests/testthat/test-tokenizer-delim.R0000644000175100001440000000524513106315444020600 0ustar hornikuserscontext("TokenizerDelim") # Tests tokenizing and unescaping parse_b <- function(x, ...) { tok <- tokenizer_delim(",", escape_double = FALSE, escape_backslash = TRUE, ...) tokenize(datasource_string(x, 0), tok) } parse_d <- function(x, ...) { tok <- tokenizer_delim(",", escape_double = TRUE, escape_backslash = FALSE, ...) tokenize(datasource_string(x, 0), tok) } test_that("simple sequence parsed correctly", { expect_equal(parse_d('1,2,3'), list(c("1", "2", "3"))) }) test_that("newlines are not tokenised", { expect_equal(parse_d('1\n2'), list("1", "2")) }) test_that("quotes in strings are dropped", { expect_equal(parse_d('"abc",abc'), list(c("abc", "abc"))) expect_equal(parse_b('"abc",abc'), list(c("abc", "abc"))) expect_equal(parse_b("'abc',abc", quote = "'"), list(c("abc", "abc"))) expect_equal(parse_d("'abc',abc", quote = "'"), list(c("abc", "abc"))) }) test_that("problems if unterminated string", { p1 <- problems(parse_d('1,2,"3')) p2 <- problems(parse_b('1,2,"3')) expect_equal(p1$col, 3) expect_equal(p2$col, 3) expect_equal(p1$expected, "closing quote at end of file") expect_equal(p2$expected, "closing quote at end of file") }) test_that("problem if unterminated escape", { p <- problems(parse_b('1\\')) expect_equal(p$row, 1) expect_equal(p$col, 1) }) test_that("empty fields become empty strings", { expect_equal(parse_d(',\n,'), list(c("[EMPTY]", "[EMPTY]"), c("[EMPTY]", "[EMPTY]"))) expect_equal(parse_d(',\n,\n'), list(c("[EMPTY]", "[EMPTY]"), c("[EMPTY]", "[EMPTY]"))) expect_equal(parse_d('""'), list("[EMPTY]")) }) test_that("bare NA becomes missing value", { expect_equal(parse_b('NA,"NA"', quoted_na = FALSE), list(c("[MISSING]", "NA"))) expect_equal(parse_d('NA,"NA"', quoted_na = FALSE), list(c("[MISSING]", "NA"))) }) test_that("quoted NA also becomes missing value", { expect_equal(parse_b('NA,"NA"', quoted_na = TRUE), list(c("[MISSING]", "[MISSING]"))) expect_equal(parse_d('NA,"NA"', quoted_na = TRUE), list(c("[MISSING]", "[MISSING]"))) }) test_that("empty string become missing values", { expect_equal(parse_b('NA,""', na = ""), list(c("NA", "[MISSING]"))) }) test_that("NA with spaces becomes missing value", { expect_equal(parse_b(' NA '), list(c("[MISSING]"))) }) test_that("string can be ended by new line", { expect_equal(parse_d('123,"a"\n'), list(c("123", "a"))) }) test_that("can escape delimeter with backslash", { expect_equal(parse_b('1\\,2'), list("1,2")) }) test_that("doubled quote becomes single quote (with d-escaping)", { expect_equal(parse_d('""""'), list('"')) }) test_that("escaped quoted doesn't terminate string (with b-escaping)", { expect_equal(parse_b('"\\""'), list('"')) }) readr/tests/testthat/test-type-convert.R0000644000175100001440000000103113057262333020125 0ustar hornikuserscontext("type_convert") test_that("missing values removed before guessing col type", { df1 <- data.frame(x = c("NA", "10"), stringsAsFactors = 
FALSE) df2 <- type_convert(df1) expect_equal(df2$x, c(NA, 10L)) }) test_that("requires data.frame input", { not_df <- matrix(letters[1:4], nrow = 2) expect_error(type_convert(not_df), "is.data.frame") }) test_that("character specifications of col_types not allowed", { expect_error(type_convert(mtcars, col_types = "dididddiiii"), "must be `NULL` or a `cols` specification") }) readr/tests/testthat/test-collectors.R0000644000175100001440000000242513106315672017647 0ustar hornikuserscontext("Collectors") test_that("guess for empty strings is character", { expect_equal(guess_parser(c("", "")), "character") }) test_that("guess for missing vector is character", { expect_equal(guess_parser(NA_character_), "character") }) test_that("empty + NA ignored when determining type", { expect_equal(guess_parser(c("1", "")), "integer") expect_equal(guess_parser(c("1", NA)), "integer") }) test_that("guess decimal commas with correct locale", { expect_equal(guess_parser("1,300"), "number") expect_equal(guess_parser("1,300", locale(decimal_mark = ",")), "double") }) # Numbers ----------------------------------------------------------------- test_that("only accept numbers with grouping mark", { expect_equal(guess_parser("1,300"), "number") expect_equal(guess_parser("1,300.00"), "number") }) # Concise collectors specification ---------------------------------------- test_that("_ or - skips column", { out1 <- read_csv("x,y\n1,2\n3,4", col_types = "-i", progress = FALSE) out2 <- read_csv("x,y\n1,2\n3,4", col_types = "_i", progress = FALSE) expect_equal(names(out1), "y") expect_equal(names(out2), "y") }) test_that("? guesses column type", { out1 <- read_csv("x,y\n1,2\n3,4", col_types = "?i", progress = FALSE) expect_equal(out1$x, c(1L, 3L)) }) readr/tests/testthat/test-eol.R0000644000175100001440000000333413106315444016252 0ustar hornikuserscontext("EOL") if (FALSE) { df <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = FALSE) write.csv(df, "tests/testthat/eol-lf.csv", row.names = FALSE, eol = "\n") write.csv(df, "tests/testthat/eol-cr.csv", row.names = FALSE, eol = "\r") write.csv(df, "tests/testthat/eol-crlf.csv", row.names = FALSE, eol = "\r\n") write.fwf <- function(x, path, ...) 
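# descriptive note (added comment): this fixture helper forwards nothing
# to write.table, so the eol values supplied at the call sites below only
# take effect if the dots are passed on, i.e.
# write.table(x, path, row.names = FALSE, quote = FALSE, ...)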
{ write.table(x, path, row.names = FALSE, quote = FALSE) } write.fwf(df, "tests/testthat/eol-lf.txt", row.names = FALSE, eol = "\n") write.fwf(df, "tests/testthat/eol-cr.txt", row.names = FALSE, eol = "\r") write.fwf(df, "tests/testthat/eol-crlf.txt", row.names = FALSE, eol = "\r\n") } test_that("read_csv standardises line breaks", { expect_equal(read_csv("eol-lf.csv", progress = FALSE)$y, letters[1:3]) expect_equal(read_csv("eol-cr.csv", progress = FALSE)$y, letters[1:3]) expect_equal(read_csv("eol-crlf.csv", progress = FALSE)$y, letters[1:3]) }) test_that("read_lines standardises line breaks", { lf <- read_lines("eol-lf.csv", progress = FALSE) expect_equal(read_lines("eol-cr.csv", progress = FALSE), lf) expect_equal(read_lines("eol-crlf.csv", progress = FALSE), lf) }) test_that("read_fwf/read_table standardises line breaks", { expect_equal(read_table("eol-lf.txt", progress = FALSE)$y, letters[1:3]) expect_equal(read_table("eol-cr.txt", progress = FALSE)$y, letters[1:3]) expect_equal(read_table("eol-crlf.txt", progress = FALSE)$y, letters[1:3]) }) test_that("read_table2 standardises line breaks", { expect_equal(read_table2("eol-lf.txt", progress = FALSE)$y, letters[0:3]) expect_equal(read_table2("eol-cr.txt", progress = FALSE)$y, letters[1:3]) expect_equal(read_table2("eol-crlf.txt", progress = FALSE)$y, letters[1:3]) }) readr/tests/testthat/eol-lf.txt0000644000175100001440000000002013057262333016302 0ustar hornikusersx y 1 a 2 b 3 c readr/tests/testthat/test-parsing.R0000644000175100001440000000022213057262333017132 0ustar hornikuserscontext("Parsing") test_that("trimmed before NA detection", { expect_equal(parse_logical(c(" TRUE ", "FALSE", " NA ")), c(TRUE, FALSE, NA)) }) readr/tests/testthat/eol-cr.txt.zip0000644000175100001440000000027213057262333017117 0ustar hornikusersPK o%Gkxyg eol-cr.txtUT WUnUux x y 1 a 2 b 3 c PK o%Gkxyg eol-cr.txtUTWUux PKPTreadr/tests/testthat/test-problems.R0000644000175100001440000000330713106315444017316 0ustar hornikuserscontext("problems") test_that("stop_for_problems throws error", { expect_warning(x <- parse_integer("1.234")) expect_error(stop_for_problems(x), "1 parsing failure") }) test_that("skipping columns gives incorrect problem column (#573)", { delim.skip0 <- problems(read_csv("aa,bb,cc\n", col_names = F, col_types = "dcc")) delim.skip1 <- problems(read_csv("aa,bb,cc\n", col_names = F, col_types = "_dc")) delim.skip2 <- problems(read_csv("aa,bb,cc\n", col_names = F, col_types = "--d")) expect_equal(delim.skip0$col, "X1") expect_equal(delim.skip1$col, "X2") expect_equal(delim.skip2$col, "X3") delim.sk0.2 <- problems(read_tsv("aa\tbb\tcc\n", col_names = F, col_types = "dcd")) delim.sk1.2 <- problems(read_tsv("aa\tbb\tcc\n", col_names = F, col_types = "_dd")) expect_equal(delim.sk0.2$col, c("X1", "X3")) expect_equal(delim.sk1.2$col, c("X2", "X3")) fwf.pos <- fwf_widths(c(2, 2, 2)) fwf.skip0 <- problems(read_fwf("aabbcc\n", fwf.pos, col_types = "dcc")) fwf.skip1 <- problems(read_fwf("aabbcc\n", fwf.pos, col_types = "_dc")) fwf.skip2 <- problems(read_fwf("aabbcc\n", fwf.pos, col_types = "--d")) fwf.sk0.2 <- problems(read_fwf("aabbcc\n", fwf.pos, col_types = "dcd")) fwf.sk1.2 <- problems(read_fwf("aabbcc\n", fwf.pos, col_types = "d-d")) expect_equal(fwf.skip0$col, "X1") expect_equal(fwf.skip1$col, "X2") expect_equal(fwf.skip2$col, "X3") expect_equal(fwf.sk0.2$col, c("X1", "X3")) expect_equal(fwf.sk1.2$col, c("X1", "X3")) }) test_that("problems returns the filename (#581)", { files <- problems(read_csv(readr_example("mtcars.csv"), col_types = 
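# mpg is a whole number in only 4 of mtcars' 32 rows, so forcing it to
# integer below yields 28 parsing problems, each tagged with the quoted
# source path: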
cols(mpg = col_integer())))$file expect_equal(length(files), 28L) expect_equal("mtcars.csv'", basename(files)[[1L]]) }) readr/src/0000755000175100001440000000000013106621354012155 5ustar hornikusersreadr/src/SourceFile.h0000644000175100001440000000207113106621354014366 0ustar hornikusers#ifndef FASTREAD_SOURCEFILE_H_ #define FASTREAD_SOURCEFILE_H_ #include <Rcpp.h> #include "boost.h" #include "Source.h" class SourceFile : public Source { boost::interprocess::file_mapping fm_; boost::interprocess::mapped_region mr_; const char* begin_; const char* end_; public: SourceFile(const std::string& path, int skip = 0, const std::string& comment = "") { try { fm_ = boost::interprocess::file_mapping(path.c_str(), boost::interprocess::read_only); mr_ = boost::interprocess::mapped_region(fm_, boost::interprocess::read_only); } catch(boost::interprocess::interprocess_exception& e) { Rcpp::stop("Cannot read file %s: %s", path, e.what()); } begin_ = static_cast<const char*>(mr_.get_address()); end_ = begin_ + mr_.get_size(); // Skip byte order mark, if needed begin_ = skipBom(begin_, end_); // Skip lines, if needed begin_ = skipLines(begin_, end_, skip, comment); } const char* begin() { return begin_; } const char* end() { return end_; } }; #endif readr/src/parse.cpp0000644000175100001440000000675313106621354014006 0ustar hornikusers#include <Rcpp.h> using namespace Rcpp; #include "Collector.h" #include "LocaleInfo.h" #include "Source.h" #include "Tokenizer.h" #include "TokenizerLine.h" #include "Warnings.h" // [[Rcpp::export]] IntegerVector dim_tokens_(List sourceSpec, List tokenizerSpec) { SourcePtr source = Source::create(sourceSpec); TokenizerPtr tokenizer = Tokenizer::create(tokenizerSpec); tokenizer->tokenize(source->begin(), source->end()); int rows = -1, cols = -1; for (Token t = tokenizer->nextToken(); t.type() != TOKEN_EOF; t = tokenizer->nextToken()) { rows = t.row(); if ((int) t.col() > cols) cols = t.col(); } return IntegerVector::create(rows + 1, cols + 1); } // [[Rcpp::export]] std::vector<int> count_fields_(List sourceSpec, List tokenizerSpec, int n_max) { SourcePtr source = Source::create(sourceSpec); TokenizerPtr tokenizer = Tokenizer::create(tokenizerSpec); tokenizer->tokenize(source->begin(), source->end()); std::vector<int> fields; for (Token t = tokenizer->nextToken(); t.type() != TOKEN_EOF; t = tokenizer->nextToken()) { if (n_max > 0 && t.row() >= (size_t) n_max) break; if (t.row() >= fields.size()) { fields.resize(t.row() + 1); } fields[t.row()] = t.col() + 1; } return fields; } // [[Rcpp::export]] RObject guess_header_(List sourceSpec, List tokenizerSpec, List locale_) { Warnings warnings; LocaleInfo locale(locale_); SourcePtr source = Source::create(sourceSpec); TokenizerPtr tokenizer = Tokenizer::create(tokenizerSpec); tokenizer->tokenize(source->begin(), source->end()); tokenizer->setWarnings(&warnings); CollectorCharacter out(&locale.encoder_); out.setWarnings(&warnings); for (Token t = tokenizer->nextToken(); t.type() != TOKEN_EOF; t = tokenizer->nextToken()) { if (t.row() > (size_t) 0) // only read one row break; if (t.col() >= (size_t) out.size()) { out.resize(t.col() + 1); } if (t.type() == TOKEN_STRING) { out.setValue(t.col(), t); } } return out.vector(); } // [[Rcpp::export]] RObject tokenize_(List sourceSpec, List tokenizerSpec, int n_max) { Warnings warnings; SourcePtr source = Source::create(sourceSpec); TokenizerPtr tokenizer = Tokenizer::create(tokenizerSpec); tokenizer->tokenize(source->begin(), source->end()); tokenizer->setWarnings(&warnings); std::vector<std::vector<std::string> > rows; for (Token t = tokenizer->nextToken(); 
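// tokenize_ streams tokens in row-major order and materialises them into
// a ragged list of string rows; the resize calls below grow the outer and
// inner vectors lazily as new row and column indices appear.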
t.type() != TOKEN_EOF; t = tokenizer->nextToken()) { if (n_max > 0 && t.row() >= (size_t) n_max) break; if (t.row() >= rows.size()) { rows.resize(t.row() + 1); } std::vector<std::string>& row = rows[t.row()]; if (t.col() >= row.size()) row.resize(t.col() + 1); row[t.col()] = t.asString(); } RObject out = wrap(rows); return warnings.addAsAttribute(out); } // [[Rcpp::export]] SEXP parse_vector_(CharacterVector x, List collectorSpec, List locale_, const std::vector<std::string>& na) { Warnings warnings; int n = x.size(); LocaleInfo locale(locale_); boost::shared_ptr<Collector> col = Collector::create(collectorSpec, &locale); col->setWarnings(&warnings); col->resize(n); for (int i = 0; i < n; ++i) { Token t; if (x[i] == NA_STRING) { t = Token(TOKEN_MISSING, i, -1); } else { SEXP string = x[i]; t = Token(CHAR(string), CHAR(string) + Rf_length(string), i, -1, false); t.trim(); t.flagNA(na); } col->setValue(i, t); } return warnings.addAsAttribute(col->vector()); } readr/src/read.cpp0000644000175100001440000001116713106621354013602 0ustar hornikusers#include <Rcpp.h> using namespace Rcpp; #include "LocaleInfo.h" #include "Source.h" #include "Tokenizer.h" #include "TokenizerLine.h" #include "Collector.h" #include "Progress.h" #include "Warnings.h" #include "Reader.h" // [[Rcpp::export]] CharacterVector read_file_(List sourceSpec, List locale_) { SourcePtr source = Source::create(sourceSpec); LocaleInfo locale(locale_); return CharacterVector::create( locale.encoder_.makeSEXP(source->begin(), source->end()) ); } // [[Rcpp::export]] RawVector read_file_raw_(List sourceSpec) { SourcePtr source = Source::create(sourceSpec); RawVector res(source->end() - source->begin()); std::copy(source->begin(), source->end(), res.begin()); return res; } // [[Rcpp::export]] CharacterVector read_lines_(List sourceSpec, List locale_, std::vector<std::string> na, int n_max = -1, bool progress = true) { LocaleInfo locale(locale_); Reader r( Source::create(sourceSpec), TokenizerPtr(new TokenizerLine(na)), CollectorPtr(new CollectorCharacter(&locale.encoder_)), progress); return r.readToVector(n_max); } Function R6method(Environment env, const std::string& method) { return as<Function>(env[method]); } bool isTrue(SEXP x) { if (!(TYPEOF(x) == LGLSXP && Rf_length(x) == 1)) { stop("`continue()` must return a length 1 logical vector"); } return LOGICAL(x)[0] == TRUE; } // [[Rcpp::export]] void read_lines_chunked_(List sourceSpec, List locale_, std::vector<std::string> na, int chunkSize, Environment callback, bool progress = true) { LocaleInfo locale(locale_); Reader r( Source::create(sourceSpec), TokenizerPtr(new TokenizerLine(na)), CollectorPtr(new CollectorCharacter(&locale.encoder_)), progress); CharacterVector out; int pos = 1; while (isTrue(R6method(callback, "continue")())) { CharacterVector out = r.readToVector(chunkSize); if (out.size() == 0) { return; } R6method(callback, "receive")(out, pos); pos += out.size(); } return; } // [[Rcpp::export]] List read_lines_raw_(List sourceSpec, int n_max = -1, bool progress = false) { Reader r( Source::create(sourceSpec), TokenizerPtr(new TokenizerLine()), CollectorPtr(new CollectorRaw()), progress); return r.readToVector(n_max); } typedef std::vector<CollectorPtr>::iterator CollectorItr; // [[Rcpp::export]] RObject read_tokens_(List sourceSpec, List tokenizerSpec, ListOf<List> colSpecs, CharacterVector colNames, List locale_, int n_max = -1, bool progress = true) { LocaleInfo l(locale_); Reader r( Source::create(sourceSpec), Tokenizer::create(tokenizerSpec), collectorsCreate(colSpecs, &l), progress, colNames); return r.readToDataFrame(n_max); } // [[Rcpp::export]] void 
read_tokens_chunked_(List sourceSpec, Environment callback, int chunkSize, List tokenizerSpec, ListOf<List> colSpecs, CharacterVector colNames, List locale_, bool progress = true) { LocaleInfo l(locale_); Reader r( Source::create(sourceSpec), Tokenizer::create(tokenizerSpec), collectorsCreate(colSpecs, &l), progress, colNames); int pos = 1; while (isTrue(R6method(callback, "continue")())) { DataFrame out = r.readToDataFrame(chunkSize); if (out.nrows() == 0) { return; } R6method(callback, "receive")(out, pos); pos += out.nrows(); } return; } // [[Rcpp::export]] std::vector<std::string> guess_types_(List sourceSpec, List tokenizerSpec, Rcpp::List locale_, int n = 100) { Warnings warnings; SourcePtr source = Source::create(sourceSpec); TokenizerPtr tokenizer = Tokenizer::create(tokenizerSpec); tokenizer->tokenize(source->begin(), source->end()); tokenizer->setWarnings(&warnings); // silence warnings LocaleInfo locale(locale_); std::vector<CollectorPtr> collectors; for (Token t = tokenizer->nextToken(); t.type() != TOKEN_EOF; t = tokenizer->nextToken()) { if (t.row() >= (size_t) n) break; // Add new collectors, if needed if (t.col() >= collectors.size()) { int p = t.col() - collectors.size() + 1; for (int j = 0; j < p; ++j) { CollectorPtr col = CollectorPtr(new CollectorCharacter(&locale.encoder_)); col->setWarnings(&warnings); col->resize(n); collectors.push_back(col); } } collectors[t.col()]->setValue(t.row(), t); } std::vector<std::string> out; for (size_t j = 0; j < collectors.size(); ++j) { CharacterVector col = as<CharacterVector>(collectors[j]->vector()); out.push_back(collectorGuess(col, locale_)); } return out; } readr/src/TokenizerDelim.cpp0000644000175100001440000002207013106621354015607 0ustar hornikusers#include <Rcpp.h> using namespace Rcpp; #include "TokenizerDelim.h" TokenizerDelim::TokenizerDelim(char delim, char quote, std::vector<std::string> NA, std::string comment, bool trimWS, bool escapeBackslash, bool escapeDouble, bool quotedNA): delim_(delim), quote_(quote), NA_(NA), comment_(comment), hasComment_(comment.size() > 0), trimWS_(trimWS), escapeBackslash_(escapeBackslash), escapeDouble_(escapeDouble), quotedNA_(quotedNA), hasEmptyNA_(false), moreTokens_(false) { for (size_t i = 0; i < NA_.size(); ++i) { if (NA_[i] == "") { hasEmptyNA_ = true; break; } } } void TokenizerDelim::tokenize(SourceIterator begin, SourceIterator end) { cur_ = begin; begin_ = begin; end_ = end; row_ = 0; col_ = 0; state_ = STATE_DELIM; moreTokens_ = true; } std::pair<double, size_t> TokenizerDelim::progress() { size_t bytes = cur_ - begin_; return std::make_pair(bytes / (double) (end_ - begin_), bytes); } Token TokenizerDelim::nextToken() { // Capture current position int row = row_, col = col_; if (!moreTokens_) return Token(TOKEN_EOF, row, col); SourceIterator token_begin = cur_; bool hasEscapeD = false, hasEscapeB = false, hasNull = false; while (cur_ != end_) { // Increments cur on destruct, ensuring that we always move on to the // next character Advance advance(&cur_); if (*cur_ == '\0') hasNull = true; if ((end_ - cur_) % 131072 == 0) Rcpp::checkUserInterrupt(); switch(state_) { case STATE_DELIM: if (*cur_ == '\r' || *cur_ == '\n') { if (col_ == 0) { advanceForLF(&cur_, end_); token_begin = cur_ + 1; break; } newRecord(); return emptyToken(row, col); } else if (isComment(cur_)) { state_ = STATE_COMMENT; } else if (*cur_ == delim_) { newField(); return emptyToken(row, col); } else if (*cur_ == quote_) { state_ = STATE_STRING; } else if (escapeBackslash_ && *cur_ == '\\') { state_ = STATE_ESCAPE_F; } else { state_ = STATE_FIELD; } break; case STATE_FIELD: if (*cur_ == '\r' || *cur_ == '\n') 
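// A bare CR, a bare LF, or a CRLF pair all terminate the record here;
// advanceForLF() consumes the LF half of a CRLF so it is not seen twice.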
{ newRecord(); return fieldToken(token_begin, advanceForLF(&cur_, end_), hasEscapeB, hasNull, row, col); } else if (isComment(cur_)) { newField(); state_ = STATE_COMMENT; return fieldToken(token_begin, cur_, hasEscapeB, hasNull, row, col); } else if (escapeBackslash_ && *cur_ == '\\') { state_ = STATE_ESCAPE_F; } else if (*cur_ == delim_) { newField(); return fieldToken(token_begin, cur_, hasEscapeB, hasNull, row, col); } break; case STATE_ESCAPE_F: hasEscapeB = true; state_ = STATE_FIELD; break; case STATE_QUOTE: if (*cur_ == quote_) { hasEscapeD = true; state_ = STATE_STRING; } else if (*cur_ == '\r' || *cur_ == '\n') { newRecord(); return stringToken(token_begin + 1, advanceForLF(&cur_, end_) - 1, hasEscapeB, hasEscapeD, hasNull, row, col); } else if (isComment(cur_)) { state_ = STATE_COMMENT; return stringToken(token_begin + 1, cur_ - 1, hasEscapeB, hasEscapeD, hasNull, row, col); } else if (*cur_ == delim_) { newField(); return stringToken(token_begin + 1, cur_ - 1, hasEscapeB, hasEscapeD, hasNull, row, col); } else { warn(row, col, "delimiter or quote", std::string(cur_, cur_ + 1)); state_ = STATE_STRING; } break; case STATE_STRING: if (*cur_ == quote_) { if (escapeDouble_) { state_ = STATE_QUOTE; } else { state_ = STATE_STRING_END; } } else if (escapeBackslash_ && *cur_ == '\\') { state_ = STATE_ESCAPE_S; } break; case STATE_STRING_END: if (*cur_ == '\r' || *cur_ == '\n') { newRecord(); return stringToken(token_begin + 1, advanceForLF(&cur_, end_) - 1, hasEscapeB, hasEscapeD, hasNull, row, col); } else if (isComment(cur_)) { state_ = STATE_COMMENT; return stringToken(token_begin + 1, cur_ - 1, hasEscapeB, hasEscapeD, hasNull, row, col); } else if (*cur_ == delim_) { newField(); return stringToken(token_begin + 1, cur_ - 1, hasEscapeB, hasEscapeD, hasNull, row, col); } else { state_ = STATE_FIELD; } break; case STATE_ESCAPE_S: hasEscapeB = true; state_ = STATE_STRING; break; case STATE_COMMENT: if (*cur_ == '\r' || *cur_ == '\n') { // If we have read at least one record on the current row go to the // next row; otherwise just ignore the line. if (col_ > 0) { row_++; row++; col_ = 0; } col = 0; advanceForLF(&cur_, end_); token_begin = cur_ + 1; state_ = STATE_DELIM; } break; } } // Reached end of Source: cur_ == end_ moreTokens_ = false; switch (state_) { case STATE_DELIM: if (col_ == 0) { return Token(TOKEN_EOF, row, col); } else { return emptyToken(row, col); } case STATE_STRING_END: case STATE_QUOTE: return stringToken(token_begin + 1, end_ - 1, hasEscapeB, hasEscapeD, hasNull, row, col); case STATE_STRING: warn(row, col, "closing quote at end of file"); return stringToken(token_begin + 1, end_, hasEscapeB, hasEscapeD, hasNull, row, col); case STATE_ESCAPE_S: case STATE_ESCAPE_F: warn(row, col, "closing escape at end of file"); return stringToken(token_begin, end_ - 1, hasEscapeB, hasEscapeD, hasNull, row, col); case STATE_FIELD: return fieldToken(token_begin, end_, hasEscapeB, hasNull, row, col); case STATE_COMMENT: return Token(TOKEN_EOF, row, col); } return Token(TOKEN_EOF, row, col); } bool TokenizerDelim::isComment(const char* cur) const { if (!hasComment_) return false; boost::iterator_range<const char*> haystack(cur, end_); return boost::starts_with(haystack, comment_); } void TokenizerDelim::newField() { col_++; state_ = STATE_DELIM; } void TokenizerDelim::newRecord() { row_++; col_ = 0; state_ = STATE_DELIM; } Token TokenizerDelim::emptyToken(int row, int col) { return Token(hasEmptyNA_ ? 
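// an empty field is only promoted to a missing value when "" was included
// among the user-supplied NA strings (tracked by hasEmptyNA_ above)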
TOKEN_MISSING : TOKEN_EMPTY, row, col); } Token TokenizerDelim::fieldToken(SourceIterator begin, SourceIterator end, bool hasEscapeB, bool hasNull, int row, int col) { Token t(begin, end, row, col, hasNull, (hasEscapeB) ? this : NULL); if (trimWS_) t.trim(); t.flagNA(NA_); return t; } Token TokenizerDelim::stringToken(SourceIterator begin, SourceIterator end, bool hasEscapeB, bool hasEscapeD, bool hasNull, int row, int col) { Token t(begin, end, row, col, hasNull, (hasEscapeD || hasEscapeB) ? this : NULL); if (trimWS_) t.trim(); if (quotedNA_) t.flagNA(NA_); return t; } void TokenizerDelim::unescape(SourceIterator begin, SourceIterator end, boost::container::string* pOut) { if (escapeDouble_ && !escapeBackslash_) { unescapeDouble(begin, end, pOut); } else if (escapeBackslash_ && !escapeDouble_) { unescapeBackslash(begin, end, pOut); } else if (escapeBackslash_ && escapeDouble_) { Rcpp::stop("Backslash & double escapes not supported at this time"); } } void TokenizerDelim::unescapeDouble(SourceIterator begin, SourceIterator end, boost::container::string* pOut) { pOut->reserve(end - begin); bool inEscape = false; for (SourceIterator cur = begin; cur != end; ++cur) { if (*cur == quote_) { if (inEscape) { pOut->push_back(*cur); inEscape = false; } else { inEscape = true; } } else { pOut->push_back(*cur); } } } void TokenizerDelim::unescapeBackslash(SourceIterator begin, SourceIterator end, boost::container::string* pOut) { pOut->reserve(end - begin); bool inEscape = false; for (SourceIterator cur = begin; cur != end; ++cur) { if (inEscape) { switch(*cur) { case '\'': pOut->push_back('\''); break; case '"': pOut->push_back('"'); break; case '\\': pOut->push_back('\\'); break; case 'a': pOut->push_back('\a'); break; case 'b': pOut->push_back('\b'); break; case 'f': pOut->push_back('\f'); break; case 'n': pOut->push_back('\n'); break; case 'r': pOut->push_back('\r'); break; case 't': pOut->push_back('\t'); break; case 'v': pOut->push_back('\v'); break; default: if (*cur == delim_ || *cur == quote_ || isComment(cur)) { pOut->push_back(*cur); } else { pOut->push_back('\\'); pOut->push_back(*cur); warn(row_, col_, "standard escape", "\\" + std::string(cur, 1)); } break; } inEscape = false; } else { if (*cur == '\\') { inEscape = true; } else { pOut->push_back(*cur); } } } } readr/src/boost.h0000644000175100001440000000056713106621354013464 0ustar hornikusers#ifndef FASTREAD_BOOST_H_ #define FASTREAD_BOOST_H_ #pragma GCC system_header #include #include #include #include #include #include #include #endif readr/src/type_convert.cpp0000644000175100001440000000146613106621354015411 0ustar hornikusers#include using namespace Rcpp; #include "Collector.h" #include "LocaleInfo.h" #include "Token.h" // [[Rcpp::export]] RObject type_convert_col(CharacterVector x, List spec, List locale_, int col, const std::vector& na, bool trim_ws) { LocaleInfo locale(locale_); CollectorPtr collector = Collector::create(spec, &locale); collector->resize(x.size()); for (int i = 0; i < x.size(); ++i) { SEXP string = x[i]; Token t; if (string == NA_STRING) { t = Token(TOKEN_MISSING, i - 1, col - 1); } else { const char* begin = CHAR(string); t = Token(begin, begin + Rf_length(string), i - 1, col - 1, false); if (trim_ws) t.trim(); t.flagNA(na); } collector->setValue(i, t); } return collector->vector(); } readr/src/grisu3.c0000644000175100001440000003236013106621354013541 0ustar hornikusers/* Copyright Jukka Jylänki Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with 
the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */ /* Modifcations to dtoa_grisu3() referenced mikkelfj: are under the following * Copyright (c) 2016 Mikkel F. Jørgensen, dvide.com * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. http://www.apache.org/licenses/LICENSE-2.0 */ /* This file is part of an implementation of the "grisu3" double to string conversion algorithm described in the research paper "Printing Floating-Point Numbers Quickly And Accurately with Integers" by Florian Loitsch, available at http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf */ #include // uint64_t etc. #include // assert #include // ceil #include // sprintf #include #ifdef _MSC_VER #pragma warning(disable : 4204) // nonstandard extension used : non-constant aggregate initializer #endif #define D64_SIGN 0x8000000000000000ULL #define D64_EXP_MASK 0x7FF0000000000000ULL #define D64_FRACT_MASK 0x000FFFFFFFFFFFFFULL #define D64_IMPLICIT_ONE 0x0010000000000000ULL #define D64_EXP_POS 52 #define D64_EXP_BIAS 1075 #define DIYFP_FRACT_SIZE 64 #define D_1_LOG2_10 0.30102999566398114 // 1 / lg(10) #define MIN_TARGET_EXP -60 #define MASK32 0xFFFFFFFFULL #define CAST_U64(d) (*(uint64_t*)&d) #define MIN(x,y) ((x) <= (y) ? (x) : (y)) #define MAX(x,y) ((x) >= (y) ? 
(x) : (y)) #define MIN_CACHED_EXP -348 #define CACHED_EXP_STEP 8 typedef struct diy_fp { uint64_t f; int e; } diy_fp; typedef struct power { uint64_t fract; int16_t b_exp, d_exp; } power; static const power pow_cache[] = { { 0xfa8fd5a0081c0288ULL, -1220, -348 }, { 0xbaaee17fa23ebf76ULL, -1193, -340 }, { 0x8b16fb203055ac76ULL, -1166, -332 }, { 0xcf42894a5dce35eaULL, -1140, -324 }, { 0x9a6bb0aa55653b2dULL, -1113, -316 }, { 0xe61acf033d1a45dfULL, -1087, -308 }, { 0xab70fe17c79ac6caULL, -1060, -300 }, { 0xff77b1fcbebcdc4fULL, -1034, -292 }, { 0xbe5691ef416bd60cULL, -1007, -284 }, { 0x8dd01fad907ffc3cULL, -980, -276 }, { 0xd3515c2831559a83ULL, -954, -268 }, { 0x9d71ac8fada6c9b5ULL, -927, -260 }, { 0xea9c227723ee8bcbULL, -901, -252 }, { 0xaecc49914078536dULL, -874, -244 }, { 0x823c12795db6ce57ULL, -847, -236 }, { 0xc21094364dfb5637ULL, -821, -228 }, { 0x9096ea6f3848984fULL, -794, -220 }, { 0xd77485cb25823ac7ULL, -768, -212 }, { 0xa086cfcd97bf97f4ULL, -741, -204 }, { 0xef340a98172aace5ULL, -715, -196 }, { 0xb23867fb2a35b28eULL, -688, -188 }, { 0x84c8d4dfd2c63f3bULL, -661, -180 }, { 0xc5dd44271ad3cdbaULL, -635, -172 }, { 0x936b9fcebb25c996ULL, -608, -164 }, { 0xdbac6c247d62a584ULL, -582, -156 }, { 0xa3ab66580d5fdaf6ULL, -555, -148 }, { 0xf3e2f893dec3f126ULL, -529, -140 }, { 0xb5b5ada8aaff80b8ULL, -502, -132 }, { 0x87625f056c7c4a8bULL, -475, -124 }, { 0xc9bcff6034c13053ULL, -449, -116 }, { 0x964e858c91ba2655ULL, -422, -108 }, { 0xdff9772470297ebdULL, -396, -100 }, { 0xa6dfbd9fb8e5b88fULL, -369, -92 }, { 0xf8a95fcf88747d94ULL, -343, -84 }, { 0xb94470938fa89bcfULL, -316, -76 }, { 0x8a08f0f8bf0f156bULL, -289, -68 }, { 0xcdb02555653131b6ULL, -263, -60 }, { 0x993fe2c6d07b7facULL, -236, -52 }, { 0xe45c10c42a2b3b06ULL, -210, -44 }, { 0xaa242499697392d3ULL, -183, -36 }, { 0xfd87b5f28300ca0eULL, -157, -28 }, { 0xbce5086492111aebULL, -130, -20 }, { 0x8cbccc096f5088ccULL, -103, -12 }, { 0xd1b71758e219652cULL, -77, -4 }, { 0x9c40000000000000ULL, -50, 4 }, { 0xe8d4a51000000000ULL, -24, 12 }, { 0xad78ebc5ac620000ULL, 3, 20 }, { 0x813f3978f8940984ULL, 30, 28 }, { 0xc097ce7bc90715b3ULL, 56, 36 }, { 0x8f7e32ce7bea5c70ULL, 83, 44 }, { 0xd5d238a4abe98068ULL, 109, 52 }, { 0x9f4f2726179a2245ULL, 136, 60 }, { 0xed63a231d4c4fb27ULL, 162, 68 }, { 0xb0de65388cc8ada8ULL, 189, 76 }, { 0x83c7088e1aab65dbULL, 216, 84 }, { 0xc45d1df942711d9aULL, 242, 92 }, { 0x924d692ca61be758ULL, 269, 100 }, { 0xda01ee641a708deaULL, 295, 108 }, { 0xa26da3999aef774aULL, 322, 116 }, { 0xf209787bb47d6b85ULL, 348, 124 }, { 0xb454e4a179dd1877ULL, 375, 132 }, { 0x865b86925b9bc5c2ULL, 402, 140 }, { 0xc83553c5c8965d3dULL, 428, 148 }, { 0x952ab45cfa97a0b3ULL, 455, 156 }, { 0xde469fbd99a05fe3ULL, 481, 164 }, { 0xa59bc234db398c25ULL, 508, 172 }, { 0xf6c69a72a3989f5cULL, 534, 180 }, { 0xb7dcbf5354e9beceULL, 561, 188 }, { 0x88fcf317f22241e2ULL, 588, 196 }, { 0xcc20ce9bd35c78a5ULL, 614, 204 }, { 0x98165af37b2153dfULL, 641, 212 }, { 0xe2a0b5dc971f303aULL, 667, 220 }, { 0xa8d9d1535ce3b396ULL, 694, 228 }, { 0xfb9b7cd9a4a7443cULL, 720, 236 }, { 0xbb764c4ca7a44410ULL, 747, 244 }, { 0x8bab8eefb6409c1aULL, 774, 252 }, { 0xd01fef10a657842cULL, 800, 260 }, { 0x9b10a4e5e9913129ULL, 827, 268 }, { 0xe7109bfba19c0c9dULL, 853, 276 }, { 0xac2820d9623bf429ULL, 880, 284 }, { 0x80444b5e7aa7cf85ULL, 907, 292 }, { 0xbf21e44003acdd2dULL, 933, 300 }, { 0x8e679c2f5e44ff8fULL, 960, 308 }, { 0xd433179d9c8cb841ULL, 986, 316 }, { 0x9e19db92b4e31ba9ULL, 1013, 324 }, { 0xeb96bf6ebadf77d9ULL, 1039, 332 }, { 0xaf87023b9bf0ee6bULL, 1066, 340 } }; static int cached_pow(int exp, diy_fp *p) 
{ int k = (int)ceil((exp+DIYFP_FRACT_SIZE-1) * D_1_LOG2_10); int i = (k-MIN_CACHED_EXP-1) / CACHED_EXP_STEP + 1; p->f = pow_cache[i].fract; p->e = pow_cache[i].b_exp; return pow_cache[i].d_exp; } static diy_fp minus(diy_fp x, diy_fp y) { diy_fp d; d.f = x.f - y.f; d.e = x.e; assert(x.e == y.e && x.f >= y.f); return d; } static diy_fp multiply(diy_fp x, diy_fp y) { uint64_t a, b, c, d, ac, bc, ad, bd, tmp; diy_fp r; a = x.f >> 32; b = x.f & MASK32; c = y.f >> 32; d = y.f & MASK32; ac = a*c; bc = b*c; ad = a*d; bd = b*d; tmp = (bd >> 32) + (ad & MASK32) + (bc & MASK32); tmp += 1U << 31; // round r.f = ac + (ad >> 32) + (bc >> 32) + (tmp >> 32); r.e = x.e + y.e + 64; return r; } static diy_fp normalize_diy_fp(diy_fp n) { assert(n.f != 0); while(!(n.f & 0xFFC0000000000000ULL)) { n.f <<= 10; n.e -= 10; } while(!(n.f & D64_SIGN)) { n.f <<= 1; --n.e; } return n; } static diy_fp double2diy_fp(double d) { diy_fp fp; uint64_t u64 = CAST_U64(d); if (!(u64 & D64_EXP_MASK)) { fp.f = u64 & D64_FRACT_MASK; fp.e = 1 - D64_EXP_BIAS; } else { fp.f = (u64 & D64_FRACT_MASK) + D64_IMPLICIT_ONE; fp.e = (int)((u64 & D64_EXP_MASK) >> D64_EXP_POS) - D64_EXP_BIAS; } return fp; } // pow10_cache[i] = 10^(i-1) static const unsigned int pow10_cache[] = { 0, 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000 }; static int largest_pow10(uint32_t n, int n_bits, uint32_t *power) { int guess = ((n_bits + 1) * 1233 >> 12) + 1/*skip first entry*/; if (n < pow10_cache[guess]) --guess; // We don't have any guarantees that 2^n_bits <= n. *power = pow10_cache[guess]; return guess; } static int round_weed(char *buffer, int len, uint64_t wp_W, uint64_t delta, uint64_t rest, uint64_t ten_kappa, uint64_t ulp) { uint64_t wp_Wup = wp_W - ulp; uint64_t wp_Wdown = wp_W + ulp; while(rest < wp_Wup && delta - rest >= ten_kappa && (rest + ten_kappa < wp_Wup || wp_Wup - rest >= rest + ten_kappa - wp_Wup)) { --buffer[len-1]; rest += ten_kappa; } if (rest < wp_Wdown && delta - rest >= ten_kappa && (rest + ten_kappa < wp_Wdown || wp_Wdown - rest > rest + ten_kappa - wp_Wdown)) return 0; return 2*ulp <= rest && rest <= delta - 4*ulp; } static int digit_gen(diy_fp low, diy_fp w, diy_fp high, char *buffer, int *length, int *kappa) { uint64_t unit = 1; diy_fp too_low = { low.f - unit, low.e }; diy_fp too_high = { high.f + unit, high.e }; diy_fp unsafe_interval = minus(too_high, too_low); diy_fp one = { 1ULL << -w.e, w.e }; uint32_t p1 = (uint32_t)(too_high.f >> -one.e); uint64_t p2 = too_high.f & (one.f - 1); uint32_t div; *kappa = largest_pow10(p1, DIYFP_FRACT_SIZE + one.e, &div); *length = 0; while(*kappa > 0) { uint64_t rest; int digit = p1 / div; buffer[*length] = (char)('0' + digit); ++*length; p1 %= div; --*kappa; rest = ((uint64_t)p1 << -one.e) + p2; if (rest < unsafe_interval.f) return round_weed(buffer, *length, minus(too_high, w).f, unsafe_interval.f, rest, (uint64_t)div << -one.e, unit); div /= 10; } for(;;) { int digit; p2 *= 10; unit *= 10; unsafe_interval.f *= 10; // Integer division by one. digit = (int)(p2 >> -one.e); buffer[*length] = (char)('0' + digit); ++*length; p2 &= one.f - 1; // Modulo by one. 
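    /* Editorial note: each pass of this loop emits one more fractional digit;
       p2, unit and unsafe_interval are all scaled by 10, and generation stops
       once the remainder p2 falls inside the safe rounding interval. */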
--*kappa; if (p2 < unsafe_interval.f) return round_weed(buffer, *length, minus(too_high, w).f * unit, unsafe_interval.f, p2, one.f, unit); } } static int grisu3(double v, char *buffer, int *length, int *d_exp) { int mk, kappa, success; diy_fp dfp = double2diy_fp(v); diy_fp w = normalize_diy_fp(dfp); // normalize boundaries diy_fp t = { (dfp.f << 1) + 1, dfp.e - 1 }; diy_fp b_plus = normalize_diy_fp(t); diy_fp b_minus; diy_fp c_mk; // Cached power of ten: 10^-k uint64_t u64 = CAST_U64(v); assert(v > 0 && v <= 1.7976931348623157e308); // Grisu only handles strictly positive finite numbers. if (!(u64 & D64_FRACT_MASK) && (u64 & D64_EXP_MASK) != 0) { b_minus.f = (dfp.f << 2) - 1; b_minus.e = dfp.e - 2;} // lower boundary is closer? else { b_minus.f = (dfp.f << 1) - 1; b_minus.e = dfp.e - 1; } b_minus.f = b_minus.f << (b_minus.e - b_plus.e); b_minus.e = b_plus.e; mk = cached_pow(MIN_TARGET_EXP - DIYFP_FRACT_SIZE - w.e, &c_mk); w = multiply(w, c_mk); b_minus = multiply(b_minus, c_mk); b_plus = multiply(b_plus, c_mk); success = digit_gen(b_minus, w, b_plus, buffer, length, &kappa); *d_exp = kappa - mk; return success; } static int i_to_str(int val, char *str) { int len, i; char *s; char *begin = str; if (val < 0) { *str++ = '-'; val = -val; } s = str; for(;;) { int ni = val / 10; int digit = val - ni*10; *s++ = (char)('0' + digit); if (ni == 0) break; val = ni; } *s = '\0'; len = (int)(s - str); for(i = 0; i < len/2; ++i) { char ch = str[i]; str[i] = str[len-1-i]; str[len-1-i] = ch; } return (int)(s - begin); } int dtoa_grisu3(double v, char *dst) { int d_exp, len, success, decimals, i; uint64_t u64 = CAST_U64(v); char *s2 = dst; assert(dst); // Prehandle NaNs if ((u64 << 1) > 0xFFE0000000000000ULL) return sprintf(dst, "NaN(%08X%08X)", (uint32_t)(u64 >> 32), (uint32_t)u64); // Prehandle negative values. if ((u64 & D64_SIGN) != 0) { *s2++ = '-'; v = -v; u64 ^= D64_SIGN; } // Prehandle zero. if (!u64) { *s2++ = '0'; *s2 = '\0'; return (int)(s2 - dst); } // Prehandle infinity. if (u64 == D64_EXP_MASK) { *s2++ = 'i'; *s2++ = 'n'; *s2++ = 'f'; *s2 = '\0'; return (int)(s2 - dst); } success = grisu3(v, s2, &len, &d_exp); // If grisu3 was not able to convert the number to a string, then use old sprintf (suboptimal). if (!success) return sprintf(s2, "%.17g", v) + (int)(s2 - dst); // handle whole numbers if (d_exp >= 0 && d_exp <= 2) { while(d_exp-- > 0) s2[len++] = '0'; s2[len] = '\0'; return (int)(s2+len-dst); } // We now have an integer string of form "151324135" and a base-10 exponent for that number. // Next, decide the best presentation for that string by whether to use a decimal point, or the scientific exponent notation 'e'. // We don't pick the absolute shortest representation, but pick a balance between readability and shortness, e.g. // 1.545056189557677e-308 could be represented in a shorter form // 1545056189557677e-323 but that would be somewhat unreadable. decimals = MIN(-d_exp, MAX(1, len-1)); // mikkelfj: // fix zero prefix .1 => 0.1, important for JSON export. // prefer unscientific notation at same length: // -1.2345e-4 over -1.00012345, // -1.0012345 over -1.2345e-3 if (d_exp < 0 && (len + d_exp) > -3 && len <= -d_exp) { // mikkelfj: fix zero prefix .1 => 0.1, and short exponents 1.3e-2 => 0.013. memmove(s2 + 2 - d_exp - len, s2, len); s2[0] = '0'; s2[1] = '.'; for (i = 2; i < 2-d_exp-len; ++i) s2[i] = '0'; len += i; } else if (d_exp < 0 && len > 1) // Add decimal point? 
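  /* Editorial example: digits "12345" with d_exp == -3 take this branch;
     decimals == MIN(3, MAX(1, 4)) == 3, the last three digits shift right one
     slot, and '.' lands at index len - decimals, yielding "12.345" with the
     exponent consumed (d_exp becomes 0, so no trailing "e" part). */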
{
    for(i = 0; i < decimals; ++i) s2[len-i] = s2[len-i-1];
    s2[len++ - decimals] = '.';
    d_exp += decimals;
    // Need scientific notation as well?
    if (d_exp != 0) { s2[len++] = 'e'; len += i_to_str(d_exp, s2+len); }
  }
  // Add scientific notation?
  else if (d_exp < 0 || d_exp > 2) { s2[len++] = 'e'; len += i_to_str(d_exp, s2+len); }
  // Add zeroes instead of scientific notation?
  s2[len] = '\0'; // grisu3 doesn't null terminate, so ensure termination.
  return (int)(s2+len-dst);
}
readr/src/tzfile.h0000644000175100001440000001255213106621354013630 0ustar hornikusers#ifndef TZFILE_H
#define TZFILE_H

/*
** This file is in the public domain, so clarified as of
** 1996-06-05 by Arthur David Olson.
*/

/*
** This header is for use ONLY with the time conversion code.
** There is no guarantee that it will remain unchanged,
** or that it will remain at all.
** Do NOT copy it to any system include directory.
** Thank you!
*/

/*
** Information about time zone files.
*/

#ifndef TZDIR
#define TZDIR "/usr/local/etc/zoneinfo" /* Time zone object file directory */
#endif /* !defined TZDIR */

#ifndef TZDEFAULT
#define TZDEFAULT "UTC" // needs to be a valid timezone, PR#16503
#endif /* !defined TZDEFAULT */

/* We don't ship posixrules, which is usually a link to a USA timezone.
   So choose one instead. */
#ifndef TZDEFRULES
#define TZDEFRULES "America/New_York"
#endif /* !defined TZDEFRULES */

/*
** Each file begins with. . .
*/

#define TZ_MAGIC "TZif"

struct tzhead {
    char tzh_magic[4];      /* TZ_MAGIC */
    char tzh_version[1];    /* '\0' or '2' or '3' as of 2013 */
    char tzh_reserved[15];  /* reserved--must be zero */
    char tzh_ttisgmtcnt[4]; /* coded number of trans. time flags */
    char tzh_ttisstdcnt[4]; /* coded number of trans. time flags */
    char tzh_leapcnt[4];    /* coded number of leap seconds */
    char tzh_timecnt[4];    /* coded number of transition times */
    char tzh_typecnt[4];    /* coded number of local time types */
    char tzh_charcnt[4];    /* coded number of abbr. chars */
};

/*
** . . .followed by. . .
**
** tzh_timecnt (char [4])s       coded transition times a la time(2)
** tzh_timecnt (unsigned char)s  types of local time starting at above
** tzh_typecnt repetitions of
**   one (char [4])              coded UT offset in seconds
**   one (unsigned char)         used to set tm_isdst
**   one (unsigned char)         that's an abbreviation list index
** tzh_charcnt (char)s           '\0'-terminated zone abbreviations
** tzh_leapcnt repetitions of
**   one (char [4])              coded leap second transition times
**   one (char [4])              total correction after above
** tzh_ttisstdcnt (char)s        indexed by type; if TRUE, transition
**                               time is standard time, if FALSE,
**                               transition time is wall clock time
**                               if absent, transition times are
**                               assumed to be wall clock time
** tzh_ttisgmtcnt (char)s        indexed by type; if TRUE, transition
**                               time is UT, if FALSE,
**                               transition time is local time
**                               if absent, transition times are
**                               assumed to be local time
*/

/*
** If tzh_version is '2' or greater, the above is followed by a second instance
** of tzhead and a second instance of the data in which each coded transition
** time uses 8 rather than 4 chars,
** then a POSIX-TZ-environment-variable-style string for use in handling
** instants after the last transition time stored in the file
** (with nothing between the newlines if there is no POSIX representation for
** such instants).
**
** If tz_version is '3' or greater, the above is extended as follows.
** First, the POSIX TZ string's hour offset may range from -167
** through 167 as compared to the POSIX-required 0 through 24.
** Second, its DST start time may be January 1 at 00:00 and its stop ** time December 31 at 24:00 plus the difference between DST and ** standard time, indicating DST all year. */ /* ** In the current implementation, "tzset()" refuses to deal with files that ** exceed any of the limits below. */ #ifndef TZ_MAX_TIMES #define TZ_MAX_TIMES 1200 #endif /* !defined TZ_MAX_TIMES */ #ifndef TZ_MAX_TYPES #ifndef NOSOLAR #define TZ_MAX_TYPES 256 /* Limited by what (unsigned char)'s can hold */ #endif /* !defined NOSOLAR */ #ifdef NOSOLAR /* ** Must be at least 14 for Europe/Riga as of Jan 12 1995, ** as noted by Earl Chew. */ #define TZ_MAX_TYPES 20 /* Maximum number of local time types */ #endif /* !defined NOSOLAR */ #endif /* !defined TZ_MAX_TYPES */ // increased from 50, http://mm.icann.org/pipermail/tz/2015-August/022623.html #ifndef TZ_MAX_CHARS #define TZ_MAX_CHARS 100 /* Maximum number of abbreviation characters */ /* (limited by what unsigned chars can hold) */ #endif /* !defined TZ_MAX_CHARS */ #ifndef TZ_MAX_LEAPS #define TZ_MAX_LEAPS 50 /* Maximum number of leap second corrections */ #endif /* !defined TZ_MAX_LEAPS */ #define SECSPERMIN 60 #define MINSPERHOUR 60 #define HOURSPERDAY 24 #define DAYSPERWEEK 7 #define DAYSPERNYEAR 365 #define DAYSPERLYEAR 366 #define SECSPERHOUR (SECSPERMIN * MINSPERHOUR) #define SECSPERDAY ((int_fast32_t) SECSPERHOUR * HOURSPERDAY) #define MONSPERYEAR 12 #define TM_SUNDAY 0 #define TM_MONDAY 1 #define TM_TUESDAY 2 #define TM_WEDNESDAY 3 #define TM_THURSDAY 4 #define TM_FRIDAY 5 #define TM_SATURDAY 6 #define TM_JANUARY 0 #define TM_FEBRUARY 1 #define TM_MARCH 2 #define TM_APRIL 3 #define TM_MAY 4 #define TM_JUNE 5 #define TM_JULY 6 #define TM_AUGUST 7 #define TM_SEPTEMBER 8 #define TM_OCTOBER 9 #define TM_NOVEMBER 10 #define TM_DECEMBER 11 #define TM_YEAR_BASE 1900 #define EPOCH_YEAR 1970 #define EPOCH_WDAY TM_THURSDAY #define isleap(y) (((y) % 4) == 0 && (((y) % 100) != 0 || ((y) % 400) == 0)) /* ** Since everything in isleap is modulo 400 (or a factor of 400), we know that ** isleap(y) == isleap(y % 400) ** and so ** isleap(a + b) == isleap((a + b) % 400) ** or ** isleap(a + b) == isleap(a % 400 + b % 400) ** This is true even if % means modulo rather than Fortran remainder ** (which is allowed by C89 but not C99). ** We use this to avoid addition overflow problems. 
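** For example, isleap_sum(2000, 100) reduces to isleap(0 + 100) == isleap(100),
** which is false, matching isleap(2100) without ever forming the sum 2100.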
*/ #define isleap_sum(a, b) isleap((a) % 400 + (b) % 400) #endif /* !defined TZFILE_H */ readr/src/write.cpp0000644000175100001440000000241013106621354014010 0ustar hornikusers#include using namespace Rcpp; #include #include #include "write_connection.h" #include // stream // [[Rcpp::export]] void write_lines_(const CharacterVector &lines, RObject connection, const std::string& na) { boost::iostreams::stream output(connection); for (CharacterVector::const_iterator i = lines.begin(); i != lines.end(); ++i) { if (CharacterVector::is_na(*i)) { output << na << '\n'; } else { output << Rf_translateCharUTF8(*i) << '\n'; } } return; } // [[Rcpp::export]] void write_lines_raw_(List x, RObject connection) { boost::iostreams::stream output(connection); for (int i = 0;i < x.length();++i) { RawVector y = x.at(i); output.write(reinterpret_cast(&y[0]), y.size() * sizeof(y[0])); output << '\n'; } return; } // [[Rcpp::export]] void write_file_(std::string x, RObject connection) { boost::iostreams::stream out(connection); out << x; return; } // [[Rcpp::export]] void write_file_raw_(RawVector x, RObject connection) { boost::iostreams::stream output(connection); output.write(reinterpret_cast(&x[0]), x.size() * sizeof(x[0])); return; } readr/src/Source.cpp0000644000175100001440000000143613106621354014125 0ustar hornikusers#include using namespace Rcpp; #include "Source.h" #include "SourceFile.h" #include "SourceString.h" #include "SourceRaw.h" SourcePtr Source::create(List spec) { std::string subclass(as(spec.attr("class"))[0]); int skip = as(spec["skip"]); std::string comment = as(spec["comment"]); if (subclass == "source_raw") { return SourcePtr(new SourceRaw(as(spec[0]), skip, comment)); } else if (subclass == "source_string") { return SourcePtr(new SourceString(as(spec[0]), skip, comment)); } else if (subclass == "source_file") { std::string path(as(spec[0])[0]); return SourcePtr(new SourceFile(path, skip, comment)); } Rcpp::stop("Unknown source type"); return SourcePtr(); } readr/src/Reader.h0000644000175100001440000000252013106621354013527 0ustar hornikusers#include #include "Collector.h" #include "Source.h" #include "Progress.h" using namespace Rcpp; class Reader { public: Reader(SourcePtr source, TokenizerPtr tokenizer, std::vector collectors, bool progress = true, CharacterVector colNames = CharacterVector()); Reader(SourcePtr source, TokenizerPtr tokenizer, CollectorPtr collector, bool progress = true, CharacterVector colNames = CharacterVector()); RObject readToDataFrame(int lines = -1); template T readToVector(int lines) { read(lines); T out = as(collectors_[0]->vector()); collectorsClear(); return out; } template RObject readToVectorWithWarnings(int lines) { read(lines); return warnings_.addAsAttribute(as(collectors_[0]->vector())); } private: Warnings warnings_; SourcePtr source_; TokenizerPtr tokenizer_; std::vector collectors_; bool progress_; Progress progressBar_; std::vector keptColumns_; CharacterVector outNames_; bool begun_; Token t_; const static int progressStep_ = 10000; void init(CharacterVector colNames); int read(int lines = -1); void checkColumns(int i, int j, int n); void collectorsResize(int n); void collectorsClear(); }; readr/src/Source.h0000644000175100001440000000463513106621354013576 0ustar hornikusers#ifndef FASTREAD_SOURCE_H_ #define FASTREAD_SOURCE_H_ #include #include "boost.h" class Source; typedef boost::shared_ptr SourcePtr; class Source { public: virtual ~Source() {} virtual const char* begin() = 0; virtual const char* end() = 0; static const char* skipLines(const 
char* begin, const char* end, int n, const std::string& comment = "") { bool hasComment = comment != ""; bool isComment = false, lineStart = true; const char* cur = begin; while(n > 0 && cur != end) { if (lineStart) { isComment = hasComment && inComment(cur, end, comment); lineStart = false; } if (*cur == '\r') { if (cur + 1 != end && *(cur + 1) == '\n') { cur++; } if (!isComment) n--; lineStart = true; } else if (*cur == '\n') { if (!isComment) n--; lineStart = true; } cur++; } return cur; } static const char* skipBom(const char* begin, const char* end) { /* Unicode Byte Order Marks https://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding 00 00 FE FF: UTF-32BE FF FE 00 00: UTF-32LE FE FF: UTF-16BE FF FE: UTF-16LE EF BB BF: UTF-8 */ switch(begin[0]) { // UTF-32BE case '\x00': if (end - begin >= 4 && begin[1] == '\x00' && begin[2] == '\xFE' && begin[3] == '\xFF') { return begin + 4; } break; // UTF-8 case '\xEF': if (end - begin >= 3 && begin[1] == '\xBB' && begin[2] == '\xBF') { return begin + 3; } break; // UTF-16BE case '\xfe': if (end - begin >= 2 && begin[1] == '\xff') { return begin + 2; } break; case '\xff': if (end - begin >= 2 && begin[1] == '\xfe') { // UTF-32 LE if (end - begin >= 4 && begin[2] == '\x00' && begin[3] == '\x00') { return begin + 4; } // UTF-16 LE return begin + 2; } break; } return begin; } static SourcePtr create(Rcpp::List spec); private: static bool inComment(const char* cur, const char* end, const std::string& comment) { boost::iterator_range haystack(cur, end); return boost::starts_with(haystack, comment); } }; #endif readr/src/TokenizerWs.cpp0000644000175100001440000000415513106621354015152 0ustar hornikusers#include using namespace Rcpp; #include "Source.h" #include "Tokenizer.h" #include "TokenizerFwf.h" #include "utils.h" // TokenizerWs -------------------------------------------------------------------- #include #include "TokenizerWs.h" #include TokenizerWs::TokenizerWs(std::vector NA, std::string comment) : NA_(NA), comment_(comment), moreTokens_(false), hasComment_(comment.size() > 0) { } void TokenizerWs::tokenize(SourceIterator begin, SourceIterator end) { cur_ = begin; curLine_ = begin; begin_ = begin; end_ = end; row_ = 0; col_ = 0; moreTokens_ = true; } std::pair TokenizerWs::progress() { size_t bytes = cur_ - begin_; return std::make_pair(bytes / (double) (end_ - begin_), bytes); } Token TokenizerWs::nextToken() { if (cur_ == end_) return Token(TOKEN_EOF, 0, 0); // Check for comments only at start of line while(cur_ != end_ && col_ == 0 && isComment(cur_)) { // Skip rest of line while(cur_ != end_ && *cur_ != '\n' && *cur_ != '\r') { ++cur_; } advanceForLF(&cur_, end_); if (cur_ != end_) { ++cur_; } curLine_ = cur_; } // Find start of field SourceIterator fieldBegin = cur_; while(fieldBegin != end_ && isblank(*fieldBegin)) { ++fieldBegin; } SourceIterator fieldEnd = fieldBegin; while(fieldEnd != end_ && !isspace(*fieldEnd)) { ++fieldEnd; } bool hasNull = *fieldEnd == '\0'; Token t = fieldToken(fieldBegin, fieldEnd, hasNull); cur_ = fieldEnd; ++col_; if (cur_ != end_ && (*cur_ == '\r' || *cur_ == '\n')) { advanceForLF(&cur_, end_); ++cur_; row_++; col_ = 0; } return t; } Token TokenizerWs::fieldToken(SourceIterator begin, SourceIterator end, bool hasNull) { if (begin == end) return Token(TOKEN_MISSING, row_, col_); Token t = Token(begin, end, row_, col_, hasNull); t.trim(); t.flagNA(NA_); return t; } bool TokenizerWs::isComment(const char* cur) const { if (!hasComment_) return false; boost::iterator_range 
haystack(cur, end_); return boost::starts_with(haystack, comment_); } readr/src/Collector.h0000644000175100001440000001511113106621354014253 0ustar hornikusers#ifndef FASTREAD_COLLECTOR_H_ #define FASTREAD_COLLECTOR_H_ #include #include #include "Iconv.h" #include "LocaleInfo.h" #include "Token.h" #include "Warnings.h" #include "DateTime.h" #include "DateTimeParser.h" class Collector; typedef boost::shared_ptr CollectorPtr; class Collector { protected: Rcpp::RObject column_; Warnings* pWarnings_; int n_; public: Collector(SEXP column, Warnings* pWarnings = NULL): column_(column), pWarnings_(pWarnings), n_(0) { } virtual ~Collector() {}; virtual void setValue(int i, const Token& t) =0; virtual Rcpp::RObject vector() { return column_; }; virtual bool skip() { return false; } int size() { return n_; } void resize(int n) { if (n == n_) return; n_ = n; column_ = Rf_lengthgets(column_, n); } void clear() { resize(0); } void setWarnings(Warnings* pWarnings) { pWarnings_ = pWarnings; } inline void warn(int row, int col, std::string expected, std::string actual) { if (pWarnings_ == NULL) { Rcpp::warning( "[%i, %i]: expected %s, but got '%s'", row + 1, col + 1, expected, actual); return; } pWarnings_->addWarning(row, col, expected, actual); } inline void warn(int row, int col, std::string expected, SourceIterators actual) { warn(row, col, expected, std::string(actual.first, actual.second)); } static CollectorPtr create(Rcpp::List spec, LocaleInfo* pLocale); }; // Character ------------------------------------------------------------------- class CollectorCharacter : public Collector { Iconv* pEncoder_; public: CollectorCharacter(Iconv* pEncoder): Collector(Rcpp::CharacterVector()), pEncoder_(pEncoder) {} void setValue(int i, const Token& t); void setValue(int i, const std::string& s); }; // Date ------------------------------------------------------------------------ class CollectorDate : public Collector { std::string format_; DateTimeParser parser_; public: CollectorDate(LocaleInfo* pLocale, const std::string& format): Collector(Rcpp::NumericVector()), format_(format), parser_(pLocale) { } void setValue(int i, const Token& t); Rcpp::RObject vector() { column_.attr("class") = "Date"; return column_; }; }; // Date time ------------------------------------------------------------------- class CollectorDateTime : public Collector { std::string format_; DateTimeParser parser_; std::string tz_; public: CollectorDateTime(LocaleInfo* pLocale, const std::string& format): Collector(Rcpp::NumericVector()), format_(format), parser_(pLocale), tz_(pLocale->tz_) { } void setValue(int i, const Token& t); Rcpp::RObject vector() { column_.attr("class") = Rcpp::CharacterVector::create("POSIXct", "POSIXt"); column_.attr("tzone") = tz_; return column_; }; }; class CollectorDouble : public Collector { char decimalMark_; public: CollectorDouble(char decimalMark): Collector(Rcpp::NumericVector()), decimalMark_(decimalMark) {} void setValue(int i, const Token& t); }; class CollectorFactor : public Collector { Iconv* pEncoder_; std::vector levels_; std::map levelset_; bool ordered_, implicitLevels_, includeNa_; boost::container::string buffer_; void insert(int i, Rcpp::String str, const Token& t); public: CollectorFactor(Iconv* pEncoder, Rcpp::Nullable levels, bool ordered, bool includeNa): Collector(Rcpp::IntegerVector()), pEncoder_(pEncoder), ordered_(ordered), includeNa_(includeNa) { implicitLevels_ = levels.isNull(); if (!implicitLevels_) { Rcpp::CharacterVector lvls = Rcpp::CharacterVector(levels); int n = 
lvls.size(); for (int i = 0; i < n; ++i) { Rcpp::String std_level; if (STRING_ELT(lvls, i) != NA_STRING) { const char* level = Rf_translateCharUTF8(STRING_ELT(lvls, i)); std_level = level; } else { std_level = NA_STRING; } levels_.push_back(std_level); levelset_.insert(std::make_pair(std_level, i)); } } } void setValue(int i, const Token& t); Rcpp::RObject vector() { if (ordered_) { column_.attr("class") = Rcpp::CharacterVector::create("ordered", "factor"); } else { column_.attr("class") = "factor"; } int n = levels_.size(); Rcpp::CharacterVector levels = Rcpp::CharacterVector(n); for (int i = 0; i < n; ++i) { levels[i] = levels_[i]; } column_.attr("levels") = levels; return column_; }; }; class CollectorInteger : public Collector { public: CollectorInteger(): Collector(Rcpp::IntegerVector()) {} void setValue(int i, const Token& t); }; class CollectorLogical : public Collector { public: CollectorLogical(): Collector(Rcpp::LogicalVector()) {} void setValue(int i, const Token& t); }; class CollectorNumeric : public Collector { char decimalMark_, groupingMark_; public: CollectorNumeric(char decimalMark, char groupingMark): Collector(Rcpp::NumericVector()), decimalMark_(decimalMark), groupingMark_(groupingMark) {} void setValue(int i, const Token& t); bool isNum(char c); }; // Time --------------------------------------------------------------------- class CollectorTime : public Collector { std::string format_; DateTimeParser parser_; public: CollectorTime(LocaleInfo* pLocale, const std::string& format): Collector(Rcpp::NumericVector()), format_(format), parser_(pLocale) { } void setValue(int i, const Token& t); Rcpp::RObject vector() { column_.attr("class") = Rcpp::CharacterVector::create("hms", "difftime"); column_.attr("units") = "secs"; return column_; }; }; // Skip --------------------------------------------------------------------- class CollectorSkip : public Collector { public: CollectorSkip() : Collector(R_NilValue) {} void setValue(int i, const Token& t) {} bool skip() { return true; } }; // Raw ------------------------------------------------------------------------- class CollectorRaw : public Collector { public: CollectorRaw() : Collector(Rcpp::List()) {} void setValue(int i, const Token& t); }; // Helpers --------------------------------------------------------------------- std::vector collectorsCreate(Rcpp::ListOf specs, LocaleInfo* pLocale); void collectorsResize(std::vector& collectors, int n); void collectorsClear(std::vector& collectors); std::string collectorGuess(Rcpp::CharacterVector input, Rcpp::List locale_); #endif readr/src/SourceRaw.h0000644000175100001440000000125513106621354014243 0ustar hornikusers#ifndef FASTREAD_SOURCERAW_H_ #define FASTREAD_SOURCERAW_H_ #include #include "Source.h" class SourceRaw : public Source { Rcpp::RawVector x_; // Make sure it doesn't get GC'd const char* begin_; const char* end_; public: SourceRaw(Rcpp::RawVector x, int skip = 0, const std::string& comment = ""): x_(x) { begin_ = (const char*) RAW(x); end_ = (const char*) RAW(x) + Rf_xlength(x); // Skip byte order mark, if needed begin_ = skipBom(begin_, end_); // Skip lines, if needed begin_ = skipLines(begin_, end_, skip, comment); } const char* begin() { return begin_; } const char* end() { return end_; } }; #endif readr/src/TokenizerLine.h0000644000175100001440000000305113106621354015107 0ustar hornikusers#ifndef FASTREAD_TOKENIZERLINE_H_ #define FASTREAD_TOKENIZERLINE_H_ #include #include "Token.h" #include "Tokenizer.h" #include "utils.h" class TokenizerLine : public Tokenizer { 
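  // (Editorial note) Emits one token per input line, treating CR, LF and CRLF
  // as line endings, flagging NA strings and recording embedded nulls.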
SourceIterator begin_, cur_, end_; std::vector NA_; bool moreTokens_; int line_; public: TokenizerLine(std::vector NA): NA_(NA), moreTokens_(false) {} TokenizerLine(): moreTokens_(false) {} void tokenize(SourceIterator begin, SourceIterator end) { begin_ = begin; cur_ = begin; end_ = end; line_ = 0; moreTokens_ = true; } std::pair progress() { size_t bytes = cur_ - begin_; return std::make_pair(bytes / (double) (end_ - begin_), bytes); } Token nextToken() { SourceIterator token_begin = cur_; bool hasNull = false; if (!moreTokens_) return Token(TOKEN_EOF, line_, 0); while (cur_ != end_) { Advance advance(&cur_); if (*cur_ == '\0') hasNull = true; if ((line_ + 1) % 500000 == 0) Rcpp::checkUserInterrupt(); switch(*cur_) { case '\r': case '\n': { Token t = Token(token_begin, advanceForLF(&cur_, end_), line_++, 0, hasNull); t.flagNA(NA_); return t; } default: break; } } // Reached end of Source: cur_ == end_ moreTokens_ = false; if (token_begin == end_) { return Token(TOKEN_EOF, line_++, 0); } else { Token t = Token(token_begin, end_, line_++, 0, hasNull); t.flagNA(NA_); return t; } } }; #endif readr/src/localtime.h0000644000175100001440000000052013106621354014274 0ustar hornikusers#ifdef __cplusplus extern "C" { #endif struct Rtm { int tm_sec; int tm_min; int tm_hour; int tm_mday; int tm_mon; int tm_year; int tm_wday; int tm_yday; int tm_isdst; long tm_gmtoff; const char *tm_zone; }; typedef struct Rtm stm; time_t my_mktime(stm* const tmp, const char* name); #ifdef __cplusplus } #endif readr/src/LocaleInfo.h0000644000175100001440000000060213106621354014337 0ustar hornikusers#ifndef FASTREAD_LOCALINFO #define FASTREAD_LOCALINFO #include "Iconv.h" class LocaleInfo { public: // LC_TIME std::vector mon_, monAb_, day_, dayAb_, amPm_; std::string dateFormat_, timeFormat_; // LC_NUMERIC char decimalMark_, groupingMark_; // LC_MISC std::string tz_; std::string encoding_; Iconv encoder_; LocaleInfo(Rcpp::List); }; #endif readr/src/Tokenizer.h0000644000175100001440000000325713106621354014307 0ustar hornikusers#ifndef FASTREAD_TOKENIZER_H_ #define FASTREAD_TOKENIZER_H_ #include #include "boost.h" #include "Warnings.h" class Token; typedef const char* SourceIterator; typedef std::pair SourceIterators; typedef void (*UnescapeFun)(SourceIterator, SourceIterator, boost::container::string*); class Tokenizer; typedef boost::shared_ptr TokenizerPtr; class Tokenizer { Warnings* pWarnings_; public: Tokenizer(): pWarnings_(NULL) {} virtual ~Tokenizer() {} virtual void tokenize(SourceIterator begin, SourceIterator end) = 0; virtual Token nextToken() = 0; // Percentage & bytes virtual std::pair progress() = 0; virtual void unescape(SourceIterator begin, SourceIterator end, boost::container::string* pOut) { pOut->reserve(end - begin); for (SourceIterator cur = begin; cur != end; ++cur) pOut->push_back(*cur); } void setWarnings(Warnings* pWarnings) { pWarnings_ = pWarnings; } inline void warn(int row, int col, const std::string& expected, const std::string& actual = "") { if (pWarnings_ == NULL) { Rcpp::warning("[%i, %i]: expected %s", row + 1, col + 1, expected); return; } pWarnings_->addWarning(row, col, expected, actual); } static TokenizerPtr create(Rcpp::List spec); }; // ----------------------------------------------------------------------------- // Helper class for parsers - ensures iterator always advanced no matter // how loop is exited class Advance : boost::noncopyable { SourceIterator* pIter_; public: Advance(SourceIterator* pIter): pIter_(pIter) {} ~Advance() { (*pIter_)++; } }; #endif 
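/* Editorial sketch (not part of the readr sources): how the Source and
 * Tokenizer interfaces above fit together. The spec lists stand for the
 * R-side specifications readr normally builds; the function name
 * countTokens is hypothetical.
 *
 *   #include <Rcpp.h>
 *   #include "Source.h"
 *   #include "Tokenizer.h"
 *   #include "Token.h"
 *
 *   int countTokens(Rcpp::List sourceSpec, Rcpp::List tokenizerSpec) {
 *     SourcePtr source = Source::create(sourceSpec);       // file, string or raw source
 *     TokenizerPtr tokenizer = Tokenizer::create(tokenizerSpec);
 *     tokenizer->tokenize(source->begin(), source->end());
 *     int n = 0;
 *     while (tokenizer->nextToken().type() != TOKEN_EOF)   // drain until end-of-input token
 *       ++n;
 *     return n;
 *   }
 */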
readr/src/SourceString.h0000644000175100001440000000124713106621354014761 0ustar hornikusers#ifndef FASTREAD_SOURCESTRING_H_ #define FASTREAD_SOURCESTRING_H_ #include #include "Source.h" class SourceString : public Source { Rcpp::RObject string_; const char* begin_; const char* end_; public: SourceString(Rcpp::CharacterVector x, int skip = 0, const std::string& comment = "") { string_ = x[0]; begin_ = CHAR(string_); end_ = begin_ + Rf_xlength(string_); // Skip byte order mark, if needed begin_ = skipBom(begin_, end_); // Skip lines, if needed begin_ = skipLines(begin_, end_, skip, comment); } const char* begin() { return begin_; } const char* end() { return end_; } }; #endif readr/src/Collector.cpp0000644000175100001440000002346113106621354014615 0ustar hornikusers#include using namespace Rcpp; #include "Collector.h" #include "LocaleInfo.h" #include "QiParsers.h" CollectorPtr Collector::create(List spec, LocaleInfo* pLocale) { std::string subclass(as(spec.attr("class"))[0]); if (subclass == "collector_skip") return CollectorPtr(new CollectorSkip()); if (subclass == "collector_logical") return CollectorPtr(new CollectorLogical()); if (subclass == "collector_integer") return CollectorPtr(new CollectorInteger()); if (subclass == "collector_double") { return CollectorPtr(new CollectorDouble(pLocale->decimalMark_)); } if (subclass == "collector_number") return CollectorPtr(new CollectorNumeric(pLocale->decimalMark_, pLocale->groupingMark_)); if (subclass == "collector_character") return CollectorPtr(new CollectorCharacter(&pLocale->encoder_)); if (subclass == "collector_date") { SEXP format_ = spec["format"]; std::string format = (Rf_isNull(format_)) ? pLocale->dateFormat_ : as(format_); return CollectorPtr(new CollectorDate(pLocale, format)); } if (subclass == "collector_datetime") { std::string format = as(spec["format"]); return CollectorPtr(new CollectorDateTime(pLocale, format)); } if (subclass == "collector_time") { std::string format = as(spec["format"]); return CollectorPtr(new CollectorTime(pLocale, format)); } if (subclass == "collector_factor") { Nullable levels = as< Nullable >(spec["levels"]); bool ordered = as(spec["ordered"]); bool includeNa = as(spec["include_na"]); return CollectorPtr(new CollectorFactor(&pLocale->encoder_, levels, ordered, includeNa)); } Rcpp::stop("Unsupported column type"); return CollectorPtr(new CollectorSkip()); } std::vector collectorsCreate(ListOf specs, LocaleInfo* pLocale) { std::vector collectors; for (int j = 0; j < specs.size(); ++j) { CollectorPtr col = Collector::create(specs[j], pLocale); collectors.push_back(col); } return collectors; } // Implementations ------------------------------------------------------------ void CollectorCharacter::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators string = t.getString(&buffer); if (t.hasNull()) warn(t.row(), t.col(), "", "embedded null"); SET_STRING_ELT(column_, i, pEncoder_->makeSEXP(string.first, string.second, t.hasNull())); break; }; case TOKEN_MISSING: SET_STRING_ELT(column_, i, NA_STRING); break; case TOKEN_EMPTY: SET_STRING_ELT(column_, i, Rf_mkCharCE("", CE_UTF8)); break; case TOKEN_EOF: Rcpp::stop("Invalid token"); } } void CollectorCharacter::setValue(int i, const std::string& s) { SET_STRING_ELT(column_, i, Rf_mkCharCE(s.c_str(), CE_UTF8)); } void CollectorDate::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators string = t.getString(&buffer); 
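      // (Editorial note) An empty format_ falls back to the locale's date
      // format via parseLocaleDate(); otherwise the explicit format is used.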
std::string std_string(string.first, string.second); parser_.setDate(std_string.c_str()); bool res = (format_ == "") ? parser_.parseLocaleDate() : parser_.parse(format_); if (!res) { warn(t.row(), t.col(), "date like " + format_, std_string); REAL(column_)[i] = NA_REAL; return; } DateTime dt = parser_.makeDate(); if (!dt.validDate()) { warn(t.row(), t.col(), "valid date", std_string); REAL(column_)[i] = NA_REAL; return; } REAL(column_)[i] = dt.date(); return; } case TOKEN_MISSING: case TOKEN_EMPTY: REAL(column_)[i] = NA_REAL; return; case TOKEN_EOF: Rcpp::stop("Invalid token"); } } void CollectorDateTime::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators string = t.getString(&buffer); std::string std_string(string.first, string.second); parser_.setDate(std_string.c_str()); bool res = (format_ == "") ? parser_.parseISO8601() : parser_.parse(format_); if (!res) { warn(t.row(), t.col(), "date like " + format_, std_string); REAL(column_)[i] = NA_REAL; return; } DateTime dt = parser_.makeDateTime(); if (!dt.validDateTime()) { warn(t.row(), t.col(), "valid date", std_string); REAL(column_)[i] = NA_REAL; return; } REAL(column_)[i] = dt.datetime(); return; } case TOKEN_MISSING: case TOKEN_EMPTY: REAL(column_)[i] = NA_REAL; return; case TOKEN_EOF: Rcpp::stop("Invalid token"); } return; } void CollectorDouble::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators str = t.getString(&buffer); bool ok = parseDouble(decimalMark_, str.first, str.second, REAL(column_)[i]); if (!ok) { REAL(column_)[i] = NA_REAL; warn(t.row(), t.col(), "a double", str); return; } if (str.first != str.second) { REAL(column_)[i] = NA_REAL; warn(t.row(), t.col(), "no trailing characters", str); return; } return; } case TOKEN_MISSING: case TOKEN_EMPTY: REAL(column_)[i] = NA_REAL; break; case TOKEN_EOF: Rcpp::stop("Invalid token"); } } void CollectorFactor::insert(int i, Rcpp::String str, const Token& t) { std::map::iterator it = levelset_.find(str); if (it == levelset_.end()) { if (implicitLevels_ || (includeNa_ && str == NA_STRING)) { int n = levelset_.size(); levelset_.insert(std::make_pair(str, n)); levels_.push_back(str); INTEGER(column_)[i] = n + 1; } else { warn(t.row(), t.col(), "value in level set", str); INTEGER(column_)[i] = NA_INTEGER; } } else { INTEGER(column_)[i] = it->second + 1; } } void CollectorFactor::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators string = t.getString(&buffer); Rcpp::String std_string = pEncoder_->makeSEXP(string.first, string.second, t.hasNull()); insert(i, std_string, t); return; }; case TOKEN_MISSING: case TOKEN_EMPTY: if (includeNa_) { insert(i, NA_STRING, t); } else { INTEGER(column_)[i] = NA_INTEGER; } return; case TOKEN_EOF: Rcpp::stop("Invalid token"); } } void CollectorInteger::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators str = t.getString(&buffer); bool ok = parseInt(str.first, str.second, INTEGER(column_)[i]); if (!ok) { INTEGER(column_)[i] = NA_INTEGER; warn(t.row(), t.col(), "an integer", str); return; } if (str.first != str.second) { warn(t.row(), t.col(), "no trailing characters", str); INTEGER(column_)[i] = NA_INTEGER; return; } return; }; case TOKEN_MISSING: case TOKEN_EMPTY: INTEGER(column_)[i] = NA_INTEGER; break; case TOKEN_EOF: Rcpp::stop("Invalid token"); } } void 
CollectorLogical::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators string = t.getString(&buffer); int size = string.second - string.first; switch(size) { case 1: if (*string.first == 'T' || *string.first == 't' || *string.first == '1') { LOGICAL(column_)[i] = 1; return; } if (*string.first == 'F' || *string.first == 'f' || *string.first == '0') { LOGICAL(column_)[i] = 0; return; } break; case 4: if (strncasecmp(string.first, "true", 4) == 0) { LOGICAL(column_)[i] = 1; return; } break; case 5: if (strncasecmp(string.first, "false", 5) == 0) { LOGICAL(column_)[i] = 0; return; } break; default: break; } warn(t.row(), t.col(), "1/0/T/F/TRUE/FALSE", string); LOGICAL(column_)[i] = NA_LOGICAL; return; }; case TOKEN_MISSING: case TOKEN_EMPTY: LOGICAL(column_)[i] = NA_LOGICAL; return; break; case TOKEN_EOF: Rcpp::stop("Invalid token"); } } void CollectorNumeric::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators str = t.getString(&buffer); bool ok = parseNumber(decimalMark_, groupingMark_, str.first, str.second, REAL(column_)[i]); if (!ok) { REAL(column_)[i] = NA_REAL; warn(t.row(), t.col(), "a number", str); return; } break; } case TOKEN_MISSING: case TOKEN_EMPTY: REAL(column_)[i] = NA_REAL; break; case TOKEN_EOF: Rcpp::stop("Invalid token"); } } void CollectorTime::setValue(int i, const Token& t) { switch(t.type()) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators string = t.getString(&buffer); std::string std_string(string.first, string.second); parser_.setDate(std_string.c_str()); bool res = (format_ == "") ? parser_.parseLocaleTime() : parser_.parse(format_); if (!res) { warn(t.row(), t.col(), "time like " + format_, std_string); REAL(column_)[i] = NA_REAL; return; } DateTime dt = parser_.makeTime(); if (!dt.validTime()) { warn(t.row(), t.col(), "valid date", std_string); REAL(column_)[i] = NA_REAL; return; } REAL(column_)[i] = dt.time(); return; } case TOKEN_MISSING: case TOKEN_EMPTY: REAL(column_)[i] = NA_REAL; return; case TOKEN_EOF: Rcpp::stop("Invalid token"); } } void CollectorRaw::setValue(int i, const Token& t) { if (t.type() == TOKEN_EOF) { Rcpp::stop("Invalid token"); } SET_VECTOR_ELT(column_, i, t.asRaw()); return; } readr/src/TokenizerFwf.cpp0000644000175100001440000001572413106621354015307 0ustar hornikusers#include using namespace Rcpp; #include "Source.h" #include "Tokenizer.h" #include "TokenizerFwf.h" #include "utils.h" struct skip_t { SourceIterator begin; int lines; }; skip_t skip_comments(SourceIterator begin, SourceIterator end, std::string comment = "") { skip_t out; if (comment.length() == 0) { out.begin = begin; out.lines = 0; return out; } SourceIterator cur = begin; int skip = 0; boost::iterator_range haystack(cur, end); while(boost::starts_with(haystack, comment)) { //Rcpp::Rcout << boost::starts_with(haystack, comment); // Skip rest of line while(cur != end && *cur != '\n' && *cur != '\r') { ++cur; } advanceForLF(&cur, end); ++cur; haystack = boost::iterator_range(cur, end); ++skip; } out.begin = cur; out.lines = skip; return out; } std::vector emptyCols_(SourceIterator begin, SourceIterator end, size_t n = 100, std::string comment = "") { std::vector is_white; size_t row = 0, col = 0; for (SourceIterator cur = begin; cur != end; ++cur) { if (row > n) break; switch(*cur) { case '\n': case '\r': advanceForLF(&cur, end); col = 0; row++; break; case ' ': col++; break; default: // Make sure there's enough 
room if (col >= is_white.size()) is_white.resize(col + 1, true); is_white[col] = false; col++; } } return is_white; } // [[Rcpp::export]] List whitespaceColumns(List sourceSpec, int n = 100, std::string comment = "") { SourcePtr source = Source::create(sourceSpec); skip_t s = skip_comments(source->begin(), source->end(), comment); std::vector empty = emptyCols_(s.begin, source->end(), n); std::vector begin, end; bool in_col = false; for (size_t i = 0; i < empty.size(); ++i) { if (in_col && empty[i]) { end.push_back(i); in_col = false; } else if (!in_col && !empty[i]) { begin.push_back(i); in_col = true; } } if (in_col) end.push_back(empty.size()); return List::create( _["begin"] = begin, _["end"] = end, _["skip"] = s.lines ); } // TokenizerFwf --------------------------------------------------------------- #include #include "TokenizerFwf.h" TokenizerFwf::TokenizerFwf(const std::vector& beginOffset, const std::vector& endOffset, std::vector NA, std::string comment): beginOffset_(beginOffset), endOffset_(endOffset), NA_(NA), cols_(beginOffset.size()), comment_(comment), moreTokens_(false), hasComment_(comment.size() > 0) { if (beginOffset_.size() != endOffset_.size()) Rcpp::stop("Begin (%i) and end (%i) specifications must have equal length", beginOffset_.size(), endOffset_.size()); if (beginOffset_.size() == 0) Rcpp::stop("Zero-length begin and end specifications not supported"); // File is assumed to be ragged (last column can have variable width) // when the last element of endOffset_ is NA isRagged_ = endOffset_[endOffset_.size() - 1L] == NA_INTEGER; max_ = 0; for (int j = 0; j < (cols_ - isRagged_); ++j) { if (endOffset_[j] <= beginOffset_[j]) Rcpp::stop("Begin offset (%i) must be smaller than end offset (%i)", beginOffset_[j], endOffset_[j]); if (beginOffset_[j] < max_) { Rcpp::stop( "Overlapping specification not supported. 
" "Begin offset (%i) must be greater than or equal to previous end offset (%i)", beginOffset_[j], max_); } if (endOffset_[j] > max_) { max_ = endOffset_[j]; } } } void TokenizerFwf::tokenize(SourceIterator begin, SourceIterator end) { cur_ = begin; curLine_ = begin; begin_ = begin; end_ = end; row_ = 0; col_ = 0; moreTokens_ = true; } std::pair TokenizerFwf::progress() { size_t bytes = cur_ - begin_; return std::make_pair(bytes / (double) (end_ - begin_), bytes); } Token TokenizerFwf::nextToken() { if (!moreTokens_) return Token(TOKEN_EOF, 0, 0); // Check for comments only at start of line while(cur_ != end_ && col_ == 0 && isComment(cur_)) { // Skip rest of line while(cur_ != end_ && *cur_ != '\n' && *cur_ != '\r') { ++cur_; } advanceForLF(&cur_, end_); if (cur_ != end_) { ++cur_; } curLine_ = cur_; } // Find start of field SourceIterator fieldBegin = cur_; findBeginning: int skip = beginOffset_[col_] - (cur_ - curLine_); for (int i = 0; i < skip; ++i) { if (fieldBegin == end_) break; if (*fieldBegin == '\n' || *fieldBegin == '\r') { warn(row_, col_, tfm::format("%i chars between fields", skip), tfm::format("%i chars until end of line", i) ); row_++; col_ = 0; advanceForLF(&fieldBegin, end_); if (fieldBegin != end_) fieldBegin++; cur_ = curLine_ = fieldBegin; goto findBeginning; } fieldBegin++; } if (fieldBegin == end_) { // need to warn here if col != 0/cols - 1 moreTokens_ = false; return Token(TOKEN_EOF, 0, 0); } // Find end of field SourceIterator fieldEnd = fieldBegin; bool lastCol = (col_ == cols_ - 1), tooShort = false, hasNull = false; if (lastCol && isRagged_) { // Last column is ragged, so read until end of line (ignoring width) while(fieldEnd != end_ && *fieldEnd != '\r' && *fieldEnd != '\n') { if (*fieldEnd == '\0') hasNull = true; fieldEnd++; } } else { int width = endOffset_[col_] - beginOffset_[col_]; // Find the end of the field, stopping for newlines for(int i = 0; i < width; ++i) { if (fieldEnd == end_ || *fieldEnd == '\n' || *fieldEnd == '\r') { warn(row_, col_, tfm::format("%i chars", width), tfm::format("%i", i)); tooShort = true; break; } if (*fieldEnd == '\0') hasNull = true; fieldEnd++; } } Token t = fieldToken(fieldBegin, fieldEnd, hasNull); if (lastCol || tooShort) { row_++; col_ = 0; if (!(tooShort || isRagged_)) { // Proceed to the end of the line when you are possibly not there. // This is needed in case the last column in the file is not being read. 
while(fieldEnd != end_ && *fieldEnd != '\r' && *fieldEnd != '\n') { if (*fieldEnd == '\0') hasNull = true; fieldEnd++; } } curLine_ = fieldEnd; advanceForLF(&curLine_, end_); if (curLine_ != end_) curLine_++; cur_ = curLine_; } else { col_++; cur_ = fieldEnd; } return t; } Token TokenizerFwf::fieldToken(SourceIterator begin, SourceIterator end, bool hasNull) { if (begin == end) return Token(TOKEN_MISSING, row_, col_); Token t = Token(begin, end, row_, col_, hasNull); t.trim(); t.flagNA(NA_); return t; } bool TokenizerFwf::isComment(const char* cur) const { if (!hasComment_) return false; boost::iterator_range haystack(cur, end_); return boost::starts_with(haystack, comment_); } readr/src/LocaleInfo.cpp0000644000175100001440000000157113106621354014700 0ustar hornikusers#include #include "LocaleInfo.h" using namespace Rcpp; LocaleInfo::LocaleInfo(List x): encoding_(as(x["encoding"])), encoder_(Iconv(encoding_)) { std::string klass = x.attr("class"); if (klass != "locale") stop("Invalid input: must be of class locale"); List date_names = as(x["date_names"]); mon_ = as >(date_names["mon"]); monAb_ = as >(date_names["mon_ab"]); day_ = as >(date_names["day"]); dayAb_ = as >(date_names["day_ab"]); amPm_ = as >(date_names["am_pm"]); decimalMark_ = as(x["decimal_mark"]); groupingMark_ = as(x["grouping_mark"]); dateFormat_ = as(x["date_format"]); timeFormat_ = as(x["time_format"]); tz_ = as(x["tz"]); } readr/src/Progress.h0000644000175100001440000000341213106621354014132 0ustar hornikusers#ifndef FASTREAD_PROGRESS_H_ #define FASTREAD_PROGRESS_H_ #include #include #include inline int now() { return clock() / CLOCKS_PER_SEC; } inline std::string clearLine(int width = 50) { return "\r" + std::string(' ', width) + "\r"; } inline std::string showTime(int x) { if (x < 60) { return tfm::format("%i s", x); } else if (x < 60 * 60) { return tfm::format("%i m", x / 60); } else { return tfm::format("%i h", x / (60 * 60)); } } class Progress { int timeMin_, timeInit_, timeStop_, width_; bool show_, stopped_; public: Progress(int min = 5, int width = Rf_GetOptionWidth()): timeMin_(min), timeInit_(now()), timeStop_(now()), width_(width), show_(false), stopped_(false) { } void stop() { timeStop_ = now(); stopped_ = true; } void show(std::pair progress) { double prop = progress.first, size = progress.second / (1024 * 1024); double est = (now() - timeInit_) / prop; if (!show_) { if (est > timeMin_) { show_ = true; } else { return; } } std::stringstream labelStream; tfm::format(labelStream, " %3d%%", (int) (prop * 100)); if (size > 0) { tfm::format(labelStream, " %4.0f MB", size); } std::string label = labelStream.str(); int barSize = width_ - label.size() - 2; if (barSize < 0) { return; } int nbars = prop * barSize; int nspaces = (1 - prop) * barSize; std::string bars(nbars, '='), spaces(nspaces, ' '); Rcpp::Rcout << '\r' << '|' << bars << spaces << '|' << label; } ~Progress() { try { if (!show_) return; if (!stopped_) timeStop_ = now(); Rcpp::Rcout << "\n"; } catch (...) {} } }; #endif readr/src/localtime.c0000644000175100001440000016025113106621354014277 0ustar hornikusers/* * R : A Computer Language for Statistical Data Analysis * Modifications copyright (C) 2007-2015 The R Core Team * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, a copy is available at
 * https://www.R-project.org/Licenses/
 */

/* The original version of this file stated

   ** This file is in the public domain, so clarified as of
   ** 1996-06-05 by Arthur David Olson.

   The modified version is copyrighted. Modifications include:
   setting EOVERFLOW
   where to find the zi database
   Mingw-w64 changes
   removing ATTRIBUTE_PURE, conditional parts for e.g. ALL_STATE
   use of 'unknown' isdst
   use of 64-bit time_t irrespective of platform.
   use of tm_zone and tm_gmtoff on all platforms.

   Additional modifications made by Hadley Wickham, (c) RStudio:
   * provide tzset_name() to avoid use of env vars
   * eliminate code unrelated to mktime
*/

#include
#include <limits.h> /* for CHAR_BIT et al. */
#include
#include
#include
#include

#ifndef EOVERFLOW
# define EOVERFLOW 79
#endif

#include
#include
#include <fcntl.h> // for open + modes
#include
#include "localtime.h"

#ifndef _WIN32
# include <unistd.h> // for access, read, close
#endif

#ifndef TRUE
#define TRUE 1
#endif /* !defined TRUE */

#ifndef FALSE
#define FALSE 0
#endif /* !defined FALSE */

/* merged from private.h */
#ifndef TYPE_BIT
#define TYPE_BIT(type) (sizeof (type) * CHAR_BIT)
#endif /* !defined TYPE_BIT */

#ifndef TYPE_SIGNED
#define TYPE_SIGNED(type) (((type) -1) < 0)
#endif /* !defined TYPE_SIGNED */

#define TWOS_COMPLEMENT(t) ((t) ~ (t) 0 < 0)

#define GRANDPARENTED "Local time zone must be set--see zic manual page"
#define YEARSPERREPEAT 400 /* years before a Gregorian repeat */
#define AVGSECSPERYEAR 31556952L
#define SECSPERREPEAT ((int_fast64_t) YEARSPERREPEAT * (int_fast64_t) AVGSECSPERYEAR)
#define SECSPERREPEAT_BITS 34 /* ceil(log2(SECSPERREPEAT)) */
#define is_digit(c) ((unsigned)(c) - '0' <= 9)
#define INITIALIZE(x) (x = 0)

/* Max and min values of the integer type T, of which only the bottom
   B bits are used, and where the highest-order used bit is considered
   to be a sign bit if T is signed. */
#define MAXVAL(t, b) \
  ((t) (((t) 1 << ((b) - 1 - TYPE_SIGNED(t))) \
        - 1 + ((t) 1 << ((b) - 1 - TYPE_SIGNED(t)))))
#define MINVAL(t, b) \
  ((t) (TYPE_SIGNED(t) ? - TWOS_COMPLEMENT(t) - MAXVAL(t, b) : 0))

/* The minimum and maximum finite time values. This assumes no padding. */
static time_t const time_t_min = MINVAL(time_t, TYPE_BIT(time_t));
static time_t const time_t_max = MAXVAL(time_t, TYPE_BIT(time_t));

#include "tzfile.h"

#ifndef TZ_ABBR_MAX_LEN
#define TZ_ABBR_MAX_LEN 16
#endif /* !defined TZ_ABBR_MAX_LEN */

#ifndef TZ_ABBR_CHAR_SET
#define TZ_ABBR_CHAR_SET \
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 :+-._"
#endif /* !defined TZ_ABBR_CHAR_SET */

#ifndef TZ_ABBR_ERR_CHAR
#define TZ_ABBR_ERR_CHAR '_'
#endif /* !defined TZ_ABBR_ERR_CHAR */

/*
** SunOS 4.1.1 headers lack O_BINARY.
*/

#ifdef O_BINARY
#define OPEN_MODE (O_RDONLY | O_BINARY)
#endif /* defined O_BINARY */
#ifndef O_BINARY
#define OPEN_MODE O_RDONLY
#endif /* !defined O_BINARY */

static const char gmt[] = "GMT";

/*
** The DST rules to use if TZ has no rules and we can't load TZDEFRULES.
** We default to US rules as of 1999-08-17.
** POSIX 1003.1 section 8.1.1 says that the default DST rules are
** implementation dependent; for historical reasons, US rules are a
** common default.
*/ #ifndef TZDEFRULESTRING #define TZDEFRULESTRING ",M4.1.0,M10.5.0" #endif /* !defined TZDEFDST */ struct ttinfo { /* time type information */ int_fast32_t tt_gmtoff; /* UT offset in seconds */ int tt_isdst; /* used to set tm_isdst */ int tt_abbrind; /* abbreviation list index */ int tt_ttisstd; /* TRUE if transition is std time */ int tt_ttisgmt; /* TRUE if transition is UT */ }; struct lsinfo { /* leap second information */ time_t ls_trans; /* transition time */ int_fast64_t ls_corr; /* correction to apply */ }; #define BIGGEST(a, b) (((a) > (b)) ? (a) : (b)) #ifdef TZNAME_MAX #define MY_TZNAME_MAX TZNAME_MAX #endif /* defined TZNAME_MAX */ #ifndef TZNAME_MAX #define MY_TZNAME_MAX 255 #endif /* !defined TZNAME_MAX */ struct state { int leapcnt; int timecnt; int typecnt; int charcnt; int goback; int goahead; time_t ats[TZ_MAX_TIMES]; unsigned char types[TZ_MAX_TIMES]; struct ttinfo ttis[TZ_MAX_TYPES]; char chars[BIGGEST(BIGGEST(TZ_MAX_CHARS + 1, sizeof gmt), (2 * (MY_TZNAME_MAX + 1)))]; struct lsinfo lsis[TZ_MAX_LEAPS]; int defaulttype; /* for early times or if no transitions */ }; struct rule { int r_type; /* type of rule--see below */ int r_day; /* day number of rule */ int r_week; /* week number of rule */ int r_mon; /* month number of rule */ int_fast32_t r_time; /* transition time of rule */ }; #define JULIAN_DAY 0 /* Jn - Julian day */ #define DAY_OF_YEAR 1 /* n - day of year */ #define MONTH_NTH_DAY_OF_WEEK 2 /* Mm.n.d - month, week, day of week */ /* ** Prototypes for static functions. */ static int_fast32_t detzcode(const char * codep); static int_fast64_t detzcode64(const char * codep); static int differ_by_repeat(time_t t1, time_t t0); static const char * getzname(const char * strp); static const char * getqzname(const char * strp, const int delim); static const char * getnum(const char * strp, int * nump, int min, int max); static const char * getsecs(const char * strp, int_fast32_t * secsp); static const char * getoffset(const char * strp, int_fast32_t * offsetp); static const char * getrule(const char * strp, struct rule * rulep); static void gmtload(struct state * sp); static stm * localsub(const time_t * timep, int_fast32_t offset, stm * tmp); static int increment_overflow(int * number, int delta); static int leaps_thru_end_of(int y); static int increment_overflow32(int_fast32_t * number, int delta); static int increment_overflow_time(time_t *t, int_fast32_t delta); static int normalize_overflow32(int_fast32_t * tensptr, int * unitsptr, int base); static int normalize_overflow(int * tensptr, int * unitsptr, int base); static time_t time1(stm * tmp, stm * (*funcp)(const time_t *, int_fast32_t, stm *), int_fast32_t offset); static time_t time2(stm *tmp, stm * (*funcp)(const time_t *, int_fast32_t, stm*), int_fast32_t offset, int * okayp); static time_t time2sub(stm *tmp, stm * (*funcp)(const time_t *, int_fast32_t, stm*), int_fast32_t offset, int * okayp, int do_norm_secs); static stm * timesub(const time_t * timep, int_fast32_t offset, const struct state * sp, stm * tmp); static int tmcomp(const stm * atmp, const stm * btmp); static int_fast32_t transtime(int year, const struct rule * rulep, int_fast32_t offset); static int typesequiv(const struct state * sp, int a, int b); static int tzload(const char * name, struct state * sp, int doextend); static int tzparse(const char * name, struct state * sp, int lastditch); static int tzdir(char* buf); static struct state lclmem; static struct state gmtmem; #define lclptr (&lclmem) #define gmtptr (&gmtmem) #ifndef TZ_STRLEN_MAX 
#define TZ_STRLEN_MAX 255 #endif /* !defined TZ_STRLEN_MAX */ static char lcl_TZname[TZ_STRLEN_MAX + 1]; static int lcl_is_set; /* ** Section 4.12.3 of X3.159-1989 requires that ** Except for the strftime function, these functions [asctime, ** ctime, gmtime, localtime] return values in one of two static ** objects: a broken-down time structure and an array of char. ** Thanks to Paul Eggert for noting this. */ #define TWOS_COMPLEMENT(t) ((t) ~ (t) 0 < 0) static int_fast32_t detzcode(const char *const codep) { register int_fast32_t result; register int i; int_fast32_t one = 1; int_fast32_t halfmaxval = one << (32 - 2); int_fast32_t maxval = halfmaxval - 1 + halfmaxval; int_fast32_t minval = -1 - maxval; result = codep[0] & 0x7f; for (i = 1; i < 4; ++i) result = (result << 8) | (codep[i] & 0xff); if (codep[0] & 0x80) { /* Do two's-complement negation even on non-two's-complement machines. If the result would be minval - 1, return minval. */ result -= !TWOS_COMPLEMENT(int_fast32_t) && result != 0; result += minval; } return result; } static int_fast64_t detzcode64(const char *const codep) { register uint_fast64_t result; register int i; int_fast64_t one = 1; int_fast64_t halfmaxval = one << (64 - 2); int_fast64_t maxval = halfmaxval - 1 + halfmaxval; int_fast64_t minval = -TWOS_COMPLEMENT(int_fast64_t) - maxval; result = codep[0] & 0x7f; for (i = 1; i < 8; ++i) result = (result << 8) | (codep[i] & 0xff); if (codep[0] & 0x80) { /* Do two's-complement negation even on non-two's-complement machines. If the result would be minval - 1, return minval. */ result -= !TWOS_COMPLEMENT(int_fast64_t) && result != 0; result += minval; } return result; } static int differ_by_repeat(const time_t t1, const time_t t0) { if (TYPE_BIT(time_t) - TYPE_SIGNED(time_t) < SECSPERREPEAT_BITS) return 0; /* R change */ return (int_fast64_t)t1 - (int_fast64_t)t0 == SECSPERREPEAT; } extern void Rf_warning(const char *, ...); extern void Rf_error(const char *, ...); static int tzload(const char * name, struct state * const sp, const int doextend) { const char * p; int i; int fid; ssize_t nread; typedef union { struct tzhead tzhead; char buf[2 * sizeof(struct tzhead) + 2 * sizeof *sp + 4 * TZ_MAX_TIMES]; } u_t; u_t u; u_t * const up = &u; sp->goback = sp->goahead = FALSE; /* if (name == NULL && (name = TZDEFAULT) == NULL) return -1; */ if (name == NULL) { name = TZDEFAULT; } { int doaccess; /* ** Section 4.9.1 of the C standard says that ** "FILENAME_MAX expands to an integral constant expression ** that is the size needed for an array of char large enough ** to hold the longest file name string that the implementation ** guarantees can be opened." */ char fullname[FILENAME_MAX + 1]; const char *sname = name; if (name[0] == ':') ++name; doaccess = name[0] == '/'; if (!doaccess) { char buf[1000]; if (tzdir(buf) != 0) { return -1; } p = buf; if ((strlen(p) + strlen(name) + 1) >= sizeof fullname) return -1; (void) strcpy(fullname, p); (void) strcat(fullname, "/"); (void) strcat(fullname, name); /* ** Set doaccess if '.' (as in "../") shows up in name. 
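** (Added note: a '.' may indicate a relative component such as "../", so
** the access(name, R_OK) readability check below is applied to such names
** as well as to absolute paths.)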
*/ if (strchr(name, '.') != NULL) doaccess = TRUE; name = fullname; } if (doaccess && access(name, R_OK) != 0) { Rf_warning("unknown timezone '%s'", sname); return -1; } if ((fid = open(name, OPEN_MODE)) == -1) { Rf_warning("unknown timezone '%s'", sname); return -1; } } nread = read(fid, up->buf, sizeof up->buf); if (close(fid) < 0 || nread <= 0) return -1; for (int stored = 4; stored <= 8; stored *= 2) { int ttisstdcnt, ttisgmtcnt, timecnt; ttisstdcnt = (int) detzcode(up->tzhead.tzh_ttisstdcnt); ttisgmtcnt = (int) detzcode(up->tzhead.tzh_ttisgmtcnt); sp->leapcnt = (int) detzcode(up->tzhead.tzh_leapcnt); sp->timecnt = (int) detzcode(up->tzhead.tzh_timecnt); sp->typecnt = (int) detzcode(up->tzhead.tzh_typecnt); sp->charcnt = (int) detzcode(up->tzhead.tzh_charcnt); p = up->tzhead.tzh_charcnt + sizeof up->tzhead.tzh_charcnt; if (sp->leapcnt < 0 || sp->leapcnt > TZ_MAX_LEAPS || sp->typecnt <= 0 || sp->typecnt > TZ_MAX_TYPES || sp->timecnt < 0 || sp->timecnt > TZ_MAX_TIMES || sp->charcnt < 0 || sp->charcnt > TZ_MAX_CHARS || (ttisstdcnt != sp->typecnt && ttisstdcnt != 0) || (ttisgmtcnt != sp->typecnt && ttisgmtcnt != 0)) return -1; if (nread - (p - up->buf) < sp->timecnt * stored + /* ats */ sp->timecnt + /* types */ sp->typecnt * 6 + /* ttinfos */ sp->charcnt + /* chars */ sp->leapcnt * (stored + 4) + /* lsinfos */ ttisstdcnt + /* ttisstds */ ttisgmtcnt) /* ttisgmts */ return -1; timecnt = 0; for (int i = 0; i < sp->timecnt; ++i) { int_fast64_t at = stored == 4 ? detzcode(p) : detzcode64(p); sp->types[i] = ((TYPE_SIGNED(time_t) ? time_t_min <= at : 0 <= at) && at <= time_t_max); if (sp->types[i]) { if (i && !timecnt && at != time_t_min) { /* ** Keep the earlier record, but tweak ** it so that it starts with the ** minimum time_t value. */ sp->types[i - 1] = 1; sp->ats[timecnt++] = time_t_min; } sp->ats[timecnt++] = at; } p += stored; } timecnt = 0; for (int i = 0; i < sp->timecnt; ++i) { unsigned char typ = *p++; if (sp->typecnt <= typ) return -1; if (sp->types[i]) sp->types[timecnt++] = typ; } sp->timecnt = timecnt; for (int i = 0; i < sp->typecnt; ++i) { struct ttinfo * ttisp; ttisp = &sp->ttis[i]; ttisp->tt_gmtoff = detzcode(p); p += 4; ttisp->tt_isdst = (unsigned char) *p++; if (ttisp->tt_isdst != 0 && ttisp->tt_isdst != 1) return -1; ttisp->tt_abbrind = (unsigned char) *p++; if (ttisp->tt_abbrind < 0 || ttisp->tt_abbrind > sp->charcnt) return -1; } for (i = 0; i < sp->charcnt; ++i) sp->chars[i] = *p++; sp->chars[i] = '\0'; /* ensure '\0' at end */ for (int i = 0; i < sp->leapcnt; ++i) { struct lsinfo * lsisp; lsisp = &sp->lsis[i]; lsisp->ls_trans = (stored == 4) ? detzcode(p) : detzcode64(p); p += stored; lsisp->ls_corr = detzcode(p); p += 4; } for (int i = 0; i < sp->typecnt; ++i) { struct ttinfo * ttisp; ttisp = &sp->ttis[i]; if (ttisstdcnt == 0) ttisp->tt_ttisstd = FALSE; else { ttisp->tt_ttisstd = *p++; if (ttisp->tt_ttisstd != TRUE && ttisp->tt_ttisstd != FALSE) return -1; } } for (int i = 0; i < sp->typecnt; ++i) { struct ttinfo * ttisp; ttisp = &sp->ttis[i]; if (ttisgmtcnt == 0) ttisp->tt_ttisgmt = FALSE; else { ttisp->tt_ttisgmt = *p++; if (ttisp->tt_ttisgmt != TRUE && ttisp->tt_ttisgmt != FALSE) return -1; } } /* ** If this is an old file, we're done. */ if (up->tzhead.tzh_version[0] == '\0') break; nread -= p - up->buf; for (int i = 0; i < nread; ++i) up->buf[i] = p[i]; /* ** If this is a signed narrow time_t system, we're done. 
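** (Added note: version-2+ tzfile data is read in two passes of the loop
** above — stored == 4 for the 32-bit section, then stored == 8 for the
** 64-bit section, which overwrites the first pass.  On a signed time_t
** narrower than 64 bits the 64-bit data could not be represented anyway,
** so the 32-bit results are kept and the loop stops here.)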
*/ if (TYPE_SIGNED(time_t) && stored >= (int) sizeof(time_t)) break; } if (doextend && nread > 2 && up->buf[0] == '\n' && up->buf[nread - 1] == '\n' && sp->typecnt + 2 <= TZ_MAX_TYPES) { struct state ts; int result; up->buf[nread - 1] = '\0'; result = tzparse(&up->buf[1], &ts, FALSE); if (result == 0 && ts.typecnt == 2 && sp->charcnt + ts.charcnt <= TZ_MAX_CHARS) { for (int i = 0; i < 2; ++i) ts.ttis[i].tt_abbrind += sp->charcnt; for (int i = 0; i < ts.charcnt; ++i) sp->chars[sp->charcnt++] = ts.chars[i]; i = 0; while (i < ts.timecnt && ts.ats[i] <= sp->ats[sp->timecnt - 1]) ++i; while (i < ts.timecnt && sp->timecnt < TZ_MAX_TIMES) { sp->ats[sp->timecnt] = ts.ats[i]; sp->types[sp->timecnt] = (unsigned char)(sp->typecnt + ts.types[i]); ++sp->timecnt; ++i; } sp->ttis[sp->typecnt++] = ts.ttis[0]; sp->ttis[sp->typecnt++] = ts.ttis[1]; } } if (sp->timecnt > 1) { for (int i = 1; i < sp->timecnt; ++i) if (typesequiv(sp, sp->types[i], sp->types[0]) && differ_by_repeat(sp->ats[i], sp->ats[0])) { sp->goback = TRUE; break; } for (int i = sp->timecnt - 2; i >= 0; --i) if (typesequiv(sp, sp->types[sp->timecnt - 1], sp->types[i]) && differ_by_repeat(sp->ats[sp->timecnt - 1], sp->ats[i])) { sp->goahead = TRUE; break; } } /* ** If type 0 is is unused in transitions, ** it's the type to use for early times. */ for (i = 0; i < sp->typecnt; ++i) if (sp->types[i] == 0) break; i = (i >= sp->typecnt) ? 0 : -1; /* ** Absent the above, ** if there are transition times ** and the first transition is to a daylight time ** find the standard type less than and closest to ** the type of the first transition. */ if (i < 0 && sp->timecnt > 0 && sp->ttis[sp->types[0]].tt_isdst) { i = sp->types[0]; while (--i >= 0) if (!sp->ttis[i].tt_isdst) break; } /* ** If no result yet, find the first standard type. ** If there is none, punt to type zero. */ if (i < 0) { i = 0; while (sp->ttis[i].tt_isdst) if (++i >= sp->typecnt) { i = 0; break; } } sp->defaulttype = i; return 0; } static int typesequiv(const struct state * const sp, const int a, const int b) { int result; if (sp == NULL || a < 0 || a >= sp->typecnt || b < 0 || b >= sp->typecnt) result = FALSE; else { const struct ttinfo * ap = &sp->ttis[a]; const struct ttinfo * bp = &sp->ttis[b]; result = ap->tt_gmtoff == bp->tt_gmtoff && ap->tt_isdst == bp->tt_isdst && ap->tt_ttisstd == bp->tt_ttisstd && ap->tt_ttisgmt == bp->tt_ttisgmt && strcmp(&sp->chars[ap->tt_abbrind], &sp->chars[bp->tt_abbrind]) == 0; } return result; } static const int mon_lengths[2][MONSPERYEAR] = { { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }, { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 } }; static const int year_lengths[2] = { DAYSPERNYEAR, DAYSPERLYEAR }; /* ** Given a pointer into a time zone string, scan until a character that is not ** a valid character in a zone name is found. Return a pointer to that ** character. */ static const char * getzname(const char * strp) { char c; while ((c = *strp) != '\0' && !is_digit(c) && c != ',' && c != '-' && c != '+') ++strp; return strp; } /* ** Given a pointer into an extended time zone string, scan until the ending ** delimiter of the zone name is located. Return a pointer to the delimiter. ** ** As with getzname above, the legal character set is actually quite ** restricted, with other characters producing undefined results. ** We don't do any checking here; checking is done later in common-case code. 
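** (Added example: given the extended form TZ="<+0530>-5:30", tzparse calls
** getqzname(name, '>') so that the quoted abbreviation "+0530" — which
** contains characters getzname would reject — runs up to the '>' delimiter.)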
*/ static const char * getqzname(const char *strp, const int delim) { int c; while ((c = *strp) != '\0' && c != delim) ++strp; return strp; } /* ** Given a pointer into a time zone string, extract a number from that string. ** Check that the number is within a specified range; if it is not, return ** NULL. ** Otherwise, return a pointer to the first character not part of the number. */ static const char * getnum(const char * strp, int * const nump, const int min, const int max) { char c; int num; if (strp == NULL || !is_digit(c = *strp)) return NULL; num = 0; do { num = num * 10 + (c - '0'); if (num > max) return NULL; /* illegal value */ c = *++strp; } while (is_digit(c)); if (num < min) return NULL; /* illegal value */ *nump = num; return strp; } /* ** Given a pointer into a time zone string, extract a number of seconds, ** in hh[:mm[:ss]] form, from the string. ** If any error occurs, return NULL. ** Otherwise, return a pointer to the first character not part of the number ** of seconds. */ static const char * getsecs(const char *strp, int_fast32_t *const secsp) { int num; /* ** 'HOURSPERDAY * DAYSPERWEEK - 1' allows quasi-Posix rules like ** "M10.4.6/26", which does not conform to Posix, ** but which specifies the equivalent of ** "02:00 on the first Sunday on or after 23 Oct". */ strp = getnum(strp, &num, 0, HOURSPERDAY * DAYSPERWEEK - 1); if (strp == NULL) return NULL; *secsp = num * (int_fast32_t) SECSPERHOUR; if (*strp == ':') { ++strp; strp = getnum(strp, &num, 0, MINSPERHOUR - 1); if (strp == NULL) return NULL; *secsp += num * SECSPERMIN; if (*strp == ':') { ++strp; /* 'SECSPERMIN' allows for leap seconds. */ strp = getnum(strp, &num, 0, SECSPERMIN); if (strp == NULL) return NULL; *secsp += num; } } return strp; } /* ** Given a pointer into a time zone string, extract an offset, in ** [+-]hh[:mm[:ss]] form, from the string. ** If any error occurs, return NULL. ** Otherwise, return a pointer to the first character not part of the time. */ static const char * getoffset(const char *strp, int_fast32_t *const offsetp) { int neg = 0; if (*strp == '-') { neg = 1; ++strp; } else if (*strp == '+') ++strp; strp = getsecs(strp, offsetp); if (strp == NULL) return NULL; /* illegal time */ if (neg) *offsetp = -*offsetp; return strp; } /* ** Given a pointer into a time zone string, extract a rule in the form ** date[/time]. See POSIX section 8 for the format of "date" and "time". ** If a valid rule is not found, return NULL. ** Otherwise, return a pointer to the first character not part of the rule. */ static const char * getrule(const char * strp, struct rule * const rulep) { if (*strp == 'J') { /* ** Julian day. */ rulep->r_type = JULIAN_DAY; ++strp; strp = getnum(strp, &rulep->r_day, 1, DAYSPERNYEAR); } else if (*strp == 'M') { /* ** Month, week, day. */ rulep->r_type = MONTH_NTH_DAY_OF_WEEK; ++strp; strp = getnum(strp, &rulep->r_mon, 1, MONSPERYEAR); if (strp == NULL) return NULL; if (*strp++ != '.') return NULL; strp = getnum(strp, &rulep->r_week, 1, 5); if (strp == NULL) return NULL; if (*strp++ != '.') return NULL; strp = getnum(strp, &rulep->r_day, 0, DAYSPERWEEK - 1); } else if (is_digit(*strp)) { /* ** Day of year. */ rulep->r_type = DAY_OF_YEAR; strp = getnum(strp, &rulep->r_day, 0, DAYSPERLYEAR - 1); } else return NULL; /* invalid format */ if (strp == NULL) return NULL; if (*strp == '/') { /* ** Time specified. 
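** (Added example: the "/2:30" in "M3.2.0/2:30", parsed by getoffset below;
** when no "/time" suffix is given the transition defaults to 02:00:00.)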
*/ ++strp; strp = getoffset(strp, &rulep->r_time); } else rulep->r_time = 2 * SECSPERHOUR; /* default = 2:00:00 */ return strp; } /* ** Given a year, a rule, and the offset from UT at the time that rule takes ** effect, calculate the year-relative time that rule takes effect. */ static int_fast32_t transtime(const int year, const struct rule *const rulep, const int_fast32_t offset) { int leapyear; int_fast32_t value; int d, m1, yy0, yy1, yy2, dow; INITIALIZE(value); leapyear = isleap(year); switch (rulep->r_type) { case JULIAN_DAY: /* ** Jn - Julian day, 1 == January 1, 60 == March 1 even in leap ** years. ** In non-leap years, or if the day number is 59 or less, just ** add SECSPERDAY times the day number-1 to the time of ** January 1, midnight, to get the day. */ value = (rulep->r_day - 1) * SECSPERDAY; if (leapyear && rulep->r_day >= 60) value += SECSPERDAY; break; case DAY_OF_YEAR: /* ** n - day of year. ** Just add SECSPERDAY times the day number to the time of ** January 1, midnight, to get the day. */ value = rulep->r_day * SECSPERDAY; break; case MONTH_NTH_DAY_OF_WEEK: /* ** Mm.n.d - nth "dth day" of month m. */ /* ** Use Zeller's Congruence to get day-of-week of first day of ** month. */ m1 = (rulep->r_mon + 9) % 12 + 1; yy0 = (rulep->r_mon <= 2) ? (year - 1) : year; yy1 = yy0 / 100; yy2 = yy0 % 100; dow = ((26 * m1 - 2) / 10 + 1 + yy2 + yy2 / 4 + yy1 / 4 - 2 * yy1) % 7; if (dow < 0) dow += DAYSPERWEEK; /* ** "dow" is the day-of-week of the first day of the month. Get ** the day-of-month (zero-origin) of the first "dow" day of the ** month. */ d = rulep->r_day - dow; if (d < 0) d += DAYSPERWEEK; for (int i = 1; i < rulep->r_week; ++i) { if (d + DAYSPERWEEK >= mon_lengths[leapyear][rulep->r_mon - 1]) break; d += DAYSPERWEEK; } /* ** "d" is the day-of-month (zero-origin) of the day we want. */ value = d * SECSPERDAY; for (int i = 0; i < rulep->r_mon - 1; ++i) value += mon_lengths[leapyear][i] * SECSPERDAY; break; } /* ** "value" is the year-relative time of 00:00:00 UT on the day in ** question. To get the year-relative time of the specified local ** time on that day, add the transition time and the current offset ** from UT. */ return value + rulep->r_time + offset; } /* ** Given a POSIX section 8-style TZ string, fill in the rule tables as ** appropriate. 
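** (Added illustration, not part of the original sources:)
** A typical value is TZ="EST5EDT,M3.2.0,M11.1.0": standard abbreviation
** "EST" five hours west of UT, DST abbreviation "EDT" with the default
** offset of one hour less, transitioning on the second Sunday of March
** and the first Sunday of November at 02:00 local time.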
*/ static int tzparse(const char * name, struct state * const sp, const int lastditch) { const char * stdname; const char * dstname; size_t stdlen; size_t dstlen; int_fast32_t stdoffset; int_fast32_t dstoffset; char * cp; int load_result; static struct ttinfo zttinfo; INITIALIZE(dstname); stdname = name; if (lastditch) { stdlen = strlen(name); /* length of standard zone name */ name += stdlen; if (stdlen >= sizeof sp->chars) stdlen = (sizeof sp->chars) - 1; stdoffset = 0; } else { if (*name == '<') { name++; stdname = name; name = getqzname(name, '>'); if (*name != '>') return (-1); stdlen = name - stdname; name++; } else { name = getzname(name); stdlen = name - stdname; } if (*name == '\0') return -1; name = getoffset(name, &stdoffset); if (name == NULL) return -1; } load_result = tzload(TZDEFRULES, sp, FALSE); if (load_result != 0) sp->leapcnt = 0; /* so, we're off a little */ if (*name != '\0') { if (*name == '<') { dstname = ++name; name = getqzname(name, '>'); if (*name != '>') return -1; dstlen = name - dstname; name++; } else { dstname = name; name = getzname(name); dstlen = name - dstname; /* length of DST zone name */ } if (*name != '\0' && *name != ',' && *name != ';') { name = getoffset(name, &dstoffset); if (name == NULL) return -1; } else dstoffset = stdoffset - SECSPERHOUR; if (*name == '\0' && load_result != 0) name = TZDEFRULESTRING; if (*name == ',' || *name == ';') { struct rule start; struct rule end; int year; int yearlim; int timecnt; time_t janfirst; ++name; if ((name = getrule(name, &start)) == NULL) return -1; if (*name++ != ',') return -1; if ((name = getrule(name, &end)) == NULL) return -1; if (*name != '\0') return -1; sp->typecnt = 2; /* standard time and DST */ /* ** Two transitions per year, from EPOCH_YEAR forward. */ sp->ttis[0] = sp->ttis[1] = zttinfo; sp->ttis[0].tt_gmtoff = -dstoffset; sp->ttis[0].tt_isdst = 1; sp->ttis[0].tt_abbrind = (int)(stdlen + 1); sp->ttis[1].tt_gmtoff = -stdoffset; sp->ttis[1].tt_isdst = 0; sp->ttis[1].tt_abbrind = 0; timecnt = 0; janfirst = 0; yearlim = EPOCH_YEAR + YEARSPERREPEAT; for (year = EPOCH_YEAR; year < yearlim; year++) { int_fast32_t starttime = transtime(year, &start, stdoffset), endtime = transtime(year, &end, dstoffset); int_fast32_t yearsecs = (year_lengths[isleap(year)] * SECSPERDAY); int reversed = endtime < starttime; if (reversed) { int_fast32_t swap = starttime; starttime = endtime; endtime = swap; } if (reversed || (starttime < endtime && (endtime - starttime < (yearsecs + (stdoffset - dstoffset))))) { if (TZ_MAX_TIMES - 2 < timecnt) break; yearlim = year + YEARSPERREPEAT + 1; sp->ats[timecnt] = janfirst; if (increment_overflow_time (&sp->ats[timecnt], starttime)) break; sp->types[timecnt++] = (unsigned char) reversed; sp->ats[timecnt] = janfirst; if (increment_overflow_time (&sp->ats[timecnt], endtime)) break; sp->types[timecnt++] = !reversed; } if (increment_overflow_time(&janfirst, yearsecs)) break; } sp->timecnt = timecnt; if (!timecnt) sp->typecnt = 1; /* Perpetual DST. */ } else { int_fast32_t theirstdoffset, theirdstoffset, theiroffset; int isdst; if (*name != '\0') return -1; /* ** Initial values of theirstdoffset and theirdstoffset. 
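** (Added note: the two loops below take them from the first standard-time
** type and the first DST type actually referenced by the loaded
** transitions, defaulting to 0 when no such type occurs.)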
*/ theirstdoffset = 0; for (int i = 0; i < sp->timecnt; ++i) { int j = sp->types[i]; if (!sp->ttis[j].tt_isdst) { theirstdoffset = -sp->ttis[j].tt_gmtoff; break; } } theirdstoffset = 0; for (int i = 0; i < sp->timecnt; ++i) { int j = sp->types[i]; if (sp->ttis[j].tt_isdst) { theirdstoffset = -sp->ttis[j].tt_gmtoff; break; } } /* ** Initially we're assumed to be in standard time. */ isdst = FALSE; theiroffset = theirstdoffset; /* ** Now juggle transition times and types ** tracking offsets as you do. */ for (int i = 0; i < sp->timecnt; ++i) { int j = sp->types[i]; sp->types[i] = (unsigned char)sp->ttis[j].tt_isdst; if (sp->ttis[j].tt_ttisgmt) { /* No adjustment to transition time */ } else { /* ** If summer time is in effect, and the ** transition time was not specified as ** standard time, add the summer time ** offset to the transition time; ** otherwise, add the standard time ** offset to the transition time. */ /* ** Transitions from DST to DDST ** will effectively disappear since ** POSIX provides for only one DST ** offset. */ if (isdst && !sp->ttis[j].tt_ttisstd) { sp->ats[i] += dstoffset - theirdstoffset; } else { sp->ats[i] += stdoffset - theirstdoffset; } } theiroffset = -sp->ttis[j].tt_gmtoff; if (sp->ttis[j].tt_isdst) theirdstoffset = theiroffset; else theirstdoffset = theiroffset; } /* ** Finally, fill in ttis. */ sp->ttis[0] = sp->ttis[1] = zttinfo; sp->ttis[0].tt_gmtoff = -stdoffset; sp->ttis[0].tt_isdst = FALSE; sp->ttis[0].tt_abbrind = 0; sp->ttis[1].tt_gmtoff = -dstoffset; sp->ttis[1].tt_isdst = TRUE; sp->ttis[1].tt_abbrind = (int)(stdlen + 1); sp->typecnt = 2; } } else { dstlen = 0; sp->typecnt = 1; /* only standard time */ sp->timecnt = 0; sp->ttis[0] = zttinfo; sp->ttis[0].tt_gmtoff = -stdoffset; sp->ttis[0].tt_isdst = 0; sp->ttis[0].tt_abbrind = 0; } sp->charcnt = (int)(stdlen + 1); if (dstlen != 0) sp->charcnt += dstlen + 1; if ((size_t) sp->charcnt > sizeof sp->chars) return -1; cp = sp->chars; (void) strncpy(cp, stdname, stdlen); cp += stdlen; *cp++ = '\0'; if (dstlen != 0) { (void) strncpy(cp, dstname, dstlen); *(cp + dstlen) = '\0'; } return 0; } static void gmtload(struct state * const sp) { if (tzload(gmt, sp, TRUE) != 0) (void) tzparse(gmt, sp, TRUE); } void R_tzsetwall(void) { if (lcl_is_set < 0) return; lcl_is_set = -1; if (tzload((char *) NULL, lclptr, TRUE) != 0) gmtload(lclptr); } void tzset_name(const char * name) { if (name == NULL) { R_tzsetwall(); return; } if (lcl_is_set > 0 && strcmp(lcl_TZname, name) == 0) return; lcl_is_set = strlen(name) < sizeof lcl_TZname; if (lcl_is_set) (void) strcpy(lcl_TZname, name); if (*name == '\0') { /* ** User wants it fast rather than right. */ lclptr->leapcnt = 0; /* so, we're off a little */ lclptr->timecnt = 0; lclptr->typecnt = 0; lclptr->charcnt = 0; lclptr->goback = lclptr->goahead = FALSE; lclptr->ttis[0].tt_isdst = 0; lclptr->ttis[0].tt_gmtoff = 0; lclptr->ttis[0].tt_abbrind = 0; (void) strcpy(lclptr->chars, gmt); } else { int ok = tzload(name, lclptr, TRUE); if (ok != 0) { Rf_warning("Failed to load tz %s: falling back to %s", name, gmt); if (name[0] == ':' || tzparse(name, lclptr, FALSE) != 0) (void) gmtload(lclptr); } } } void tzset(void) { tzset_name(getenv("TZ")); } /* ** The easy way to behave "as if no library function calls" localtime ** is to not call it--so we drop its guts into "localsub", which can be ** freely called. (And no, the PANS doesn't require the above behavior-- ** but it *is* desirable.) ** ** The unused offset argument is for the benefit of mktime variants. 
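** (Added note: for a t outside the cached transition table, the goback/
** goahead branch below shifts t by a whole number of 400-year Gregorian
** cycles — SECSPERREPEAT seconds per YEARSPERREPEAT years — recurses on
** the shifted value, then patches tm_year back by the same year count.)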
*/ /*ARGSUSED*/ static stm * localsub(const time_t *const timep, const int_fast32_t offset, stm *const tmp) { struct state * sp; const struct ttinfo * ttisp; int i; stm * result; const time_t t = *timep; sp = lclptr; if ((sp->goback && t < sp->ats[0]) || (sp->goahead && t > sp->ats[sp->timecnt - 1])) { time_t newt = t; time_t seconds; time_t years; if (t < sp->ats[0]) seconds = sp->ats[0] - t; else seconds = t - sp->ats[sp->timecnt - 1]; --seconds; years = (seconds / SECSPERREPEAT + 1) * YEARSPERREPEAT; seconds = years * AVGSECSPERYEAR; if (t < sp->ats[0]) newt += seconds; else newt -= seconds; if (newt < sp->ats[0] || newt > sp->ats[sp->timecnt - 1]) return NULL; /* "cannot happen" */ result = localsub(&newt, offset, tmp); if (result == tmp) { time_t newy; newy = tmp->tm_year; if (t < sp->ats[0]) newy -= years; else newy += years; tmp->tm_year = (int)newy; if (tmp->tm_year != newy) return NULL; } return result; } if (sp->timecnt == 0 || t < sp->ats[0]) { i = sp->defaulttype; } else { int lo = 1; int hi = sp->timecnt; while (lo < hi) { int mid = (lo + hi) >> 1; if (t < sp->ats[mid]) hi = mid; else lo = mid + 1; } i = (int) sp->types[lo - 1]; } ttisp = &sp->ttis[i]; /* ** To get (wrong) behavior that's compatible with System V Release 2.0 ** you'd replace the statement below with ** t += ttisp->tt_gmtoff; ** timesub(&t, 0L, sp, tmp); */ result = timesub(&t, ttisp->tt_gmtoff, sp, tmp); tmp->tm_isdst = ttisp->tt_isdst; //#ifdef HAVE_TM_ZONE tmp->tm_zone = &sp->chars[ttisp->tt_abbrind]; //#endif return result; } /* ** Return the number of leap years through the end of the given year ** where, to make the math easy, the answer for year zero is defined as zero. */ static int leaps_thru_end_of(const int y) { return (y >= 0) ? (y / 4 - y / 100 + y / 400) : -(leaps_thru_end_of(-(y + 1)) + 1); } static stm * timesub(const time_t *const timep, const int_fast32_t offset, const struct state *const sp, stm *const tmp) { const struct lsinfo * lp; time_t tdays; int idays; /* unsigned would be so 2003 */ int_fast64_t rem; int y; const int * ip; int_fast64_t corr; int hit; int i; corr = 0; hit = 0; i = sp->leapcnt; while (--i >= 0) { lp = &sp->lsis[i]; if (*timep >= lp->ls_trans) { if (*timep == lp->ls_trans) { hit = ((i == 0 && lp->ls_corr > 0) || lp->ls_corr > sp->lsis[i - 1].ls_corr); if (hit) while (i > 0 && sp->lsis[i].ls_trans == sp->lsis[i - 1].ls_trans + 1 && sp->lsis[i].ls_corr == sp->lsis[i - 1].ls_corr + 1) { ++hit; --i; } } corr = lp->ls_corr; break; } } y = EPOCH_YEAR; tdays = *timep / SECSPERDAY; rem = *timep - tdays * SECSPERDAY; while (tdays < 0 || tdays >= year_lengths[isleap(y)]) { int newy; time_t tdelta; int idelta; int leapdays; tdelta = tdays / DAYSPERLYEAR; if (! ((! TYPE_SIGNED(time_t) || INT_MIN <= tdelta) && tdelta <= INT_MAX)) return NULL; idelta = (int)tdelta; if (idelta == 0) idelta = (tdays < 0) ? -1 : 1; newy = y; if (increment_overflow(&newy, idelta)) return NULL; leapdays = leaps_thru_end_of(newy - 1) - leaps_thru_end_of(y - 1); tdays -= ((time_t) newy - y) * DAYSPERNYEAR; tdays -= leapdays; y = newy; } { int_fast32_t seconds; seconds = (int_fast32_t)(tdays * SECSPERDAY); tdays = seconds / SECSPERDAY; rem += seconds - tdays * SECSPERDAY; } /* ** Given the range, we can now fearlessly cast... 
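** (Added note: the loops above have reduced tdays to a non-negative value
** below a single year's length, so the cast to int below cannot overflow.)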
*/ idays = (int)tdays; rem += offset - corr; while (rem < 0) { rem += SECSPERDAY; --idays; } while (rem >= SECSPERDAY) { rem -= SECSPERDAY; ++idays; } while (idays < 0) { if (increment_overflow(&y, -1)) return NULL; idays += year_lengths[isleap(y)]; } while (idays >= year_lengths[isleap(y)]) { idays -= year_lengths[isleap(y)]; if (increment_overflow(&y, 1)) return NULL; } tmp->tm_year = y; if (increment_overflow(&tmp->tm_year, -TM_YEAR_BASE)) return NULL; tmp->tm_yday = idays; /* ** The "extra" mods below avoid overflow problems. */ tmp->tm_wday = EPOCH_WDAY + ((y - EPOCH_YEAR) % DAYSPERWEEK) * (DAYSPERNYEAR % DAYSPERWEEK) + leaps_thru_end_of(y - 1) - leaps_thru_end_of(EPOCH_YEAR - 1) + idays; tmp->tm_wday %= DAYSPERWEEK; if (tmp->tm_wday < 0) tmp->tm_wday += DAYSPERWEEK; tmp->tm_hour = (int) (rem / SECSPERHOUR); rem %= SECSPERHOUR; tmp->tm_min = (int) (rem / SECSPERMIN); /* ** A positive leap second requires a special ** representation. This uses "... ??:59:60" et seq. */ tmp->tm_sec = (int) (rem % SECSPERMIN) + hit; ip = mon_lengths[isleap(y)]; for (tmp->tm_mon = 0; idays >= ip[tmp->tm_mon]; ++(tmp->tm_mon)) idays -= ip[tmp->tm_mon]; tmp->tm_mday = (int) (idays + 1); tmp->tm_isdst = 0; //#ifdef HAVE_TM_GMTOFF tmp->tm_gmtoff = offset; //#endif return tmp; } /* ** Adapted from code provided by Robert Elz, who writes: ** The "best" way to do mktime I think is based on an idea of Bob ** Kridle's (so its said...) from a long time ago. ** It does a binary search of the time_t space. Since time_t's are ** just 32 bits, its a max of 32 iterations (even at 64 bits it ** would still be very reasonable). */ #ifndef WRONG #define WRONG (-1) #endif /* !defined WRONG */ /* ** Normalize logic courtesy Paul Eggert. */ static int increment_overflow(int *const ip, int j) { int const i = *ip; /* ** If i >= 0 there can only be overflow if i + j > INT_MAX ** or if j > INT_MAX - i; given i >= 0, INT_MAX - i cannot overflow. ** If i < 0 there can only be overflow if i + j < INT_MIN ** or if j < INT_MIN - i; given i < 0, INT_MIN - i cannot overflow. */ if ((i >= 0) ? (j > INT_MAX - i) : (j < INT_MIN - i)) return TRUE; *ip += j; return FALSE; } static int increment_overflow32(int_fast32_t *const lp, int const m) { int_fast32_t const l = *lp; if ((l >= 0) ? (m > INT_FAST32_MAX - l) : (m < INT_FAST32_MIN - l)) return TRUE; *lp += m; return FALSE; } static int increment_overflow_time(time_t *tp, int_fast32_t j) { /* ** This is like ** 'if (! (time_t_min <= *tp + j && *tp + j <= time_t_max)) ...', ** except that it does the right thing even if *tp + j would overflow. */ if (! (j < 0 ? (TYPE_SIGNED(time_t) ? time_t_min - j <= *tp : -1 - j < *tp) : *tp <= time_t_max - j)) return TRUE; *tp += j; return FALSE; } static int normalize_overflow(int * const tensptr, int * const unitsptr, const int base) { int tensdelta; tensdelta = (*unitsptr >= 0) ? (*unitsptr / base) : (-1 - (-1 - *unitsptr) / base); *unitsptr -= tensdelta * base; return increment_overflow(tensptr, tensdelta); } static int normalize_overflow32(int_fast32_t *const tensptr, int *const unitsptr, const int base) { int tensdelta; tensdelta = (*unitsptr >= 0) ? (*unitsptr / base) : (-1 - (-1 - *unitsptr) / base); *unitsptr -= tensdelta * base; return increment_overflow32(tensptr, tensdelta); } static int tmcomp(const stm * const atmp, const stm * const btmp) { int result; if (atmp->tm_year != btmp->tm_year) return atmp->tm_year < btmp->tm_year ? 
-1 : 1; if ((result = (atmp->tm_mon - btmp->tm_mon)) == 0 && (result = (atmp->tm_mday - btmp->tm_mday)) == 0 && (result = (atmp->tm_hour - btmp->tm_hour)) == 0 && (result = (atmp->tm_min - btmp->tm_min)) == 0) result = atmp->tm_sec - btmp->tm_sec; return result; } static time_t time2sub(stm *const tmp, stm *(*const funcp)(const time_t *, int_fast32_t, stm *), const int_fast32_t offset, int *const okayp, const int do_norm_secs) { const struct state * sp; int dir; int i; int saved_seconds; int_fast32_t li; time_t lo, hi; int_fast32_t y; time_t newt, t; stm yourtm = *tmp, mytm; *okayp = FALSE; if (do_norm_secs) { if (normalize_overflow(&yourtm.tm_min, &yourtm.tm_sec, SECSPERMIN)) { errno = EOVERFLOW; return WRONG; } } if (normalize_overflow(&yourtm.tm_hour, &yourtm.tm_min, MINSPERHOUR)) { errno = EOVERFLOW; return WRONG; } if (normalize_overflow(&yourtm.tm_mday, &yourtm.tm_hour, HOURSPERDAY)) { errno = EOVERFLOW; return WRONG; } y = yourtm.tm_year; if (normalize_overflow32(&y, &yourtm.tm_mon, MONSPERYEAR)) { errno = EOVERFLOW; return WRONG; } /* ** Turn y into an actual year number for now. ** It is converted back to an offset from TM_YEAR_BASE later. */ if (increment_overflow32(&y, TM_YEAR_BASE)) { errno = EOVERFLOW; return WRONG; } while (yourtm.tm_mday <= 0) { if (increment_overflow32(&y, -1)) { errno = EOVERFLOW; return WRONG; } li = y + (1 < yourtm.tm_mon); yourtm.tm_mday += year_lengths[isleap(li)]; } while (yourtm.tm_mday > DAYSPERLYEAR) { li = y + (1 < yourtm.tm_mon); yourtm.tm_mday -= year_lengths[isleap(li)]; if (increment_overflow32(&y, 1)) { errno = EOVERFLOW; return WRONG; } } for ( ; ; ) { i = mon_lengths[isleap(y)][yourtm.tm_mon]; if (yourtm.tm_mday <= i) break; yourtm.tm_mday -= i; if (++yourtm.tm_mon >= MONSPERYEAR) { yourtm.tm_mon = 0; if (increment_overflow32(&y, 1)) { errno = EOVERFLOW; return WRONG; } } } if (increment_overflow32(&y, -TM_YEAR_BASE)) { errno = EOVERFLOW; return WRONG; } yourtm.tm_year = y; if (yourtm.tm_year != y) { errno = EOVERFLOW; return WRONG; } if (yourtm.tm_sec >= 0 && yourtm.tm_sec < SECSPERMIN) saved_seconds = 0; else if (y + TM_YEAR_BASE < EPOCH_YEAR) { /* ** We can't set tm_sec to 0, because that might push the ** time below the minimum representable time. ** Set tm_sec to 59 instead. ** This assumes that the minimum representable time is ** not in the same minute that a leap second was deleted from, ** which is a safer assumption than using 58 would be. */ if (increment_overflow(&yourtm.tm_sec, 1 - SECSPERMIN)) { errno = EOVERFLOW; return WRONG; } saved_seconds = yourtm.tm_sec; yourtm.tm_sec = SECSPERMIN - 1; } else { saved_seconds = yourtm.tm_sec; yourtm.tm_sec = 0; } /* ** Do a binary search (this works whatever time_t's type is). */ lo = time_t_min; hi = time_t_max; for ( ; ; ) { t = lo / 2 + hi / 2; if (t < lo) t = lo; else if (t > hi) t = hi; if ((*funcp)(&t, offset, &mytm) == NULL) { /* ** Assume that t is too extreme to be represented in ** a struct tm; arrange things so that it is less ** extreme on the next pass. */ dir = (t > 0) ? 1 : -1; } else dir = tmcomp(&mytm, &yourtm); if (dir != 0) { if (t == lo) { if (t == time_t_max) { errno = EOVERFLOW; return WRONG; } ++t; ++lo; } else if (t == hi) { if (t == time_t_min) { errno = EOVERFLOW; return WRONG; } --t; --hi; } if (lo > hi) { errno = EOVERFLOW; return WRONG; } if (dir > 0) hi = t; else lo = t; continue; } if (yourtm.tm_isdst < 0 || mytm.tm_isdst == yourtm.tm_isdst) break; /* ** Right time, wrong type. ** Hunt for right time, right type. 
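** (Added example: the caller asked for tm_isdst == 1 but the binary search
** landed on a standard-time type; the loops below retry t shifted by the
** offset difference between a matching pair of types.)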
** It's okay to guess wrong since the guess ** gets checked. */ sp = (const struct state *) ((funcp == localsub) ? lclptr : gmtptr); for (int i = sp->typecnt - 1; i >= 0; --i) { if (sp->ttis[i].tt_isdst != yourtm.tm_isdst) continue; for (int j = sp->typecnt - 1; j >= 0; --j) { if (sp->ttis[j].tt_isdst == yourtm.tm_isdst) continue; newt = t + sp->ttis[j].tt_gmtoff - sp->ttis[i].tt_gmtoff; if ((*funcp)(&newt, offset, &mytm) == NULL) continue; if (tmcomp(&mytm, &yourtm) != 0) continue; if (mytm.tm_isdst != yourtm.tm_isdst) continue; /* ** We have a match. */ t = newt; goto label; } } errno = EOVERFLOW; return WRONG; } label: newt = t + saved_seconds; if ((newt < t) != (saved_seconds < 0)) { errno = EOVERFLOW; return WRONG; } t = newt; if ((*funcp)(&t, offset, tmp)) *okayp = TRUE; return t; } static time_t time2(stm * const tmp, stm * (*const funcp)(const time_t *, int_fast32_t, stm *), const int_fast32_t offset, int *const okayp) { time_t t; /* ** First try without normalization of seconds ** (in case tm_sec contains a value associated with a leap second). ** If that fails, try with normalization of seconds. */ t = time2sub(tmp, funcp, offset, okayp, FALSE); return *okayp ? t : time2sub(tmp, funcp, offset, okayp, TRUE); } static time_t time1(stm *const tmp, stm *(*const funcp) (const time_t *, int_fast32_t, stm *), const int_fast32_t offset) { time_t t; const struct state *sp; int seen[TZ_MAX_TYPES]; int types[TZ_MAX_TYPES]; int okay; if (tmp == NULL) { errno = EINVAL; return WRONG; } if (tmp->tm_isdst > 1) tmp->tm_isdst = 1; t = time2(tmp, funcp, offset, &okay); if (okay || tmp->tm_isdst < 0) return t; /* R change. This appears to be required by POSIX (it says the setting is used 'initially') and is documented for Solaris. Try unknown DST setting, if it was set. */ if (tmp->tm_isdst >= 0) { tmp->tm_isdst = -1; errno = 0; // previous attempt will have set it t = time2(tmp, funcp, offset, &okay); if (okay) return t; } /* ** We're supposed to assume that somebody took a time of one type ** and did some math on it that yielded a "struct tm" that's bad. ** We try to divine the type they started from and adjust to the ** type they need. */ sp = (const struct state *) ((funcp == localsub) ? 
lclptr : gmtptr); for (int i = 0; i < sp->typecnt; ++i) seen[i] = FALSE; int nseen = 0; for (int i = sp->timecnt - 1; i >= 0; --i) if (!seen[sp->types[i]]) { seen[sp->types[i]] = TRUE; types[nseen++] = sp->types[i]; } for (int sameind = 0; sameind < nseen; ++sameind) { int samei = types[sameind]; if (sp->ttis[samei].tt_isdst != tmp->tm_isdst) continue; for (int otherind = 0; otherind < nseen; ++otherind) { int otheri = types[otherind]; if (sp->ttis[otheri].tt_isdst == tmp->tm_isdst) continue; tmp->tm_sec += sp->ttis[otheri].tt_gmtoff - sp->ttis[samei].tt_gmtoff; tmp->tm_isdst = !tmp->tm_isdst; t = time2(tmp, funcp, offset, &okay); if (okay) return t; tmp->tm_sec -= sp->ttis[otheri].tt_gmtoff - sp->ttis[samei].tt_gmtoff; tmp->tm_isdst = !tmp->tm_isdst; } } errno = EOVERFLOW; return WRONG; } time_t my_mktime(stm* const tmp, const char* name) { tzset_name(name); return time1(tmp, localsub, 0L); } // Returns non-zero if the file is a directory (like S_ISDIR) int is_dir (const char* path) { struct stat sb; if (stat(path, &sb) == -1) return 0; return S_ISDIR(sb.st_mode); } static int tzdir(char* buf) { const char* p = getenv("TZDIR"); if (p != NULL && is_dir(p) != 0) { strncpy(buf, p, 1000); return 0; } p = getenv("R_SHARE_DIR"); if (p != NULL) { snprintf(buf, 1000, "%s/zoneinfo", p); if (is_dir(buf) != 0) { return 0; } } snprintf(buf, 1000, "%s/share/zoneinfo", getenv("R_HOME")); if (is_dir(buf) != 0) { return 0; } // Common linux location strncpy(buf, "/usr/share/zoneinfo/", 1000); if (is_dir(buf) != 0) { return 0; } // Common solaris location strncpy(buf, "/usr/share/lib/zoneinfo/", 1000); if (is_dir(buf) != 0) { return 0; } return -1; } readr/src/write_delim.cpp0000644000175100001440000000751013106621354015170 0ustar hornikusers#include using namespace Rcpp; #include #include "grisu3.h" #include "write_connection.h" #include // stream // Defined later to make copyright clearer template void stream_delim(Stream& output, const RObject& x, int i, char delim, const std::string& na); template void stream_delim_row(Stream& output, const Rcpp::List& x, int i, char delim, const std::string& na) { int p = Rf_length(x); for (int j = 0; j < p - 1; ++j) { stream_delim(output, x.at(j), i, delim, na); output << delim; } stream_delim(output, x.at(p - 1), i, delim, na); output << '\n'; } bool needs_quote(const char* string, char delim, const std::string& na) { if (string == na) return true; for (const char* cur = string; *cur != '\0'; ++cur) { if (*cur == '\n' || *cur == '\r' || *cur == '"' || *cur == delim) return true; } return false; } template void stream_delim(Stream& output, const char* string, char delim, const std::string& na) { bool quotes = needs_quote(string, delim, na); if (quotes) output << '"'; for (const char* cur = string; *cur != '\0'; ++cur) { switch (*cur) { case '"': output << "\"\""; break; default: output << *cur; } } if (quotes) output << '"'; } template void stream_delim(Stream& output, const List& df, char delim, const std::string& na, bool col_names = true, bool bom = false) { int p = Rf_length(df); if (p == 0) return; if (bom) { output << "\xEF\xBB\xBF"; } if (col_names) { CharacterVector names = as(df.attr("names")); for (int j = 0; j < p; ++j) { stream_delim(output, names, j, delim, na); if (j != p - 1) output << delim; } output << '\n'; } RObject first_col = df[0]; int n = Rf_length(first_col); for (int i = 0; i < n; ++i) { stream_delim_row(output, df, i, delim, na); } } // [[Rcpp::export]] std::string stream_delim_(const List& df, RObject connection, char delim, const 
std::string& na, bool col_names = true, bool bom = false) { if (connection == R_NilValue) { std::ostringstream output; stream_delim(output, df, delim, na, col_names, bom); return output.str(); } else { boost::iostreams::stream output(connection); stream_delim(output, df, delim, na, col_names, bom); } return ""; } // ============================================================================= // Derived from EncodeElementS in RPostgreSQL // Written by: tomoakin@kenroku.kanazawa-u.ac.jp // License: GPL-2 template void stream_delim(Stream& output, const RObject& x, int i, char delim, const std::string& na) { switch (TYPEOF(x)) { case LGLSXP: { int value = LOGICAL(x)[i]; if (value == TRUE) { output << "TRUE"; } else if (value == FALSE) { output << "FALSE"; } else { output << na; } break; } case INTSXP: { int value = INTEGER(x)[i]; if (value == NA_INTEGER) { output << na; } else { output << value; } break; } case REALSXP:{ double value = REAL(x)[i]; if (!R_FINITE(value)) { if (ISNA(value)) { output << na; } else if (ISNAN(value)) { output << "NaN"; } else if (value > 0) { output << "Inf"; } else { output << "-Inf"; } } else { char str[32]; int len = dtoa_grisu3(value, str); output.write(str, len); } break; } case STRSXP: { if (STRING_ELT(x, i) == NA_STRING) { output << na; } else { stream_delim(output, Rf_translateCharUTF8(STRING_ELT(x, i)), delim, na); } break; } default: Rcpp::stop("Don't know how to handle vector of type %s.", Rf_type2char(TYPEOF(x))); } } readr/src/Iconv.cpp0000644000175100001440000000433013106621354013737 0ustar hornikusers#include using namespace Rcpp; #include "Iconv.h" Iconv::Iconv(const std::string& from, const std::string& to) { if (from == "UTF-8") { cd_ = NULL; } else { cd_ = Riconv_open(to.c_str(), from.c_str()); if (cd_ == (void*) -1) { if (errno == EINVAL) { stop("Can't convert from %s to %s", from, to); } else { stop("Iconv initialisation failed"); } } // Allocate space in buffer buffer_.resize(1024); } } Iconv::~Iconv() { if (cd_ != NULL) { Riconv_close(cd_); cd_ = NULL; } } size_t Iconv::convert(const char* start, const char* end) { size_t n = end - start; // Ensure buffer is big enough: one input byte can never generate // more than 4 output bytes size_t max_size = n * 4; if (buffer_.size() < max_size) buffer_.resize(max_size); char* outbuf = &buffer_[0]; size_t inbytesleft = n, outbytesleft = max_size; size_t res = Riconv(cd_, &start, &inbytesleft, &outbuf, &outbytesleft); if (res == (size_t) -1) { switch(errno) { case EILSEQ: stop("Invalid multibyte sequence"); case EINVAL: stop("Incomplete multibyte sequence"); case E2BIG: stop("Iconv buffer too small"); default: stop("Iconv failed to convert for unknown reason"); } } return max_size - outbytesleft; } int my_strnlen (const char *s, int maxlen){ for(int n = 0; n < maxlen; ++n) { if(s[n] == '\0') return n; } return maxlen; } #if defined(__sun) #define readr_strnlen my_strnlen #else #define readr_strnlen strnlen #endif // To be safe, we need to check for nulls - this also needs to emit // a warning, but this behaviour is better than crashing SEXP safeMakeChar(const char* start, size_t n, bool hasNull) { int m = hasNull ? 
readr_strnlen(start, n) : n; return Rf_mkCharLenCE(start, m, CE_UTF8); } SEXP Iconv::makeSEXP(const char* start, const char* end, bool hasNull) { if (cd_ == NULL) return safeMakeChar(start, end - start, hasNull); int n = convert(start, end); return safeMakeChar(&buffer_[0], n, hasNull); } std::string Iconv::makeString(const char* start, const char* end) { if (cd_ == NULL) return std::string(start, end); int n = convert(start, end); return std::string(&buffer_[0], n); } readr/src/utils.h0000644000175100001440000000052413106621354013467 0ustar hornikusers#ifndef FASTREAD_UTILS_H_ #define FASTREAD_UTILS_H_ // Advances iterator if the next character is a LF. // Returns iterator to end of line. template inline Iter advanceForLF(Iter* pBegin, Iter end) { Iter cur = *pBegin; if (*cur == '\r' && (cur + 1 != end) && *(cur + 1) == '\n') (*pBegin)++; return cur; } #endif readr/src/Tokenizer.cpp0000644000175100001440000000347613106621354014645 0ustar hornikusers#include using namespace Rcpp; #include "Tokenizer.h" #include "TokenizerDelim.h" #include "TokenizerFwf.h" #include "TokenizerWs.h" #include "TokenizerLine.h" #include "TokenizerLog.h" TokenizerPtr Tokenizer::create(List spec) { std::string subclass(as(spec.attr("class"))[0]); if (subclass == "tokenizer_delim") { char delim = as(spec["delim"]); char quote = as(spec["quote"]); std::vector na = as >(spec["na"]); std::string comment = as(spec["comment"]); bool trimWs = as(spec["trim_ws"]); bool escapeDouble = as(spec["escape_double"]); bool escapeBackslash = as(spec["escape_backslash"]); bool quotedNA = as(spec["quoted_na"]); return TokenizerPtr(new TokenizerDelim(delim, quote, na, comment, trimWs, escapeBackslash, escapeDouble, quotedNA) ); } else if (subclass == "tokenizer_fwf") { std::vector begin = as >(spec["begin"]), end = as >(spec["end"]); std::vector na = as >(spec["na"]); std::string comment = as(spec["comment"]); return TokenizerPtr(new TokenizerFwf(begin, end, na, comment)); } else if (subclass == "tokenizer_line") { std::vector na = as >(spec["na"]); return TokenizerPtr(new TokenizerLine(na)); } else if (subclass == "tokenizer_log") { return TokenizerPtr(new TokenizerLog()); } else if (subclass == "tokenizer_ws") { std::vector na = as >(spec["na"]); std::string comment = as(spec["comment"]); return TokenizerPtr(new TokenizerWs(na, comment)); } Rcpp::stop("Unknown tokenizer type"); return TokenizerPtr(); } readr/src/TokenizerFwf.h0000644000175100001440000000157413106621354014752 0ustar hornikusers#ifndef FASTREAD_TOKENIZERFWF_H_ #define FASTREAD_TOKENIZERFWF_H_ #include #include "Token.h" #include "Tokenizer.h" #include "utils.h" class TokenizerFwf : public Tokenizer { std::vector beginOffset_, endOffset_; std::vector NA_; SourceIterator begin_, cur_, curLine_, end_; int row_, col_, cols_, max_; std::string comment_; bool moreTokens_, isRagged_, hasComment_; public: TokenizerFwf(const std::vector& beginOffset, const std::vector& endOffset, std::vector NA = std::vector(1, "NA"), std::string comment = ""); void tokenize(SourceIterator begin, SourceIterator end); std::pair progress(); Token nextToken(); private: Token fieldToken(SourceIterator begin, SourceIterator end, bool hasNull); bool isComment(const char* cur) const; }; #endif readr/src/connection.cpp0000644000175100001440000000166213106621354015025 0ustar hornikusers#include using namespace Rcpp; // Wrapper around R's read_bin function RawVector read_bin(RObject con, int bytes = 64 * 1024) { Rcpp::Environment baseEnv = Rcpp::Environment::base_env(); Rcpp::Function readBin = 
baseEnv["readBin"]; RawVector out = Rcpp::as(readBin(con, "raw", bytes)); return out; } // Read data from a connection in chunks and then combine into a single // raw vector. // // [[Rcpp::export]] RawVector read_connection_(RObject con, int chunk_size = 64 * 1024) { std::vector chunks; RawVector chunk; while((chunk = read_bin(con, chunk_size)).size() > 0) chunks.push_back(chunk); size_t size = 0; for (size_t i = 0; i < chunks.size(); ++i) size += chunks[i].size(); RawVector out(size); size_t pos = 0; for (size_t i = 0; i < chunks.size(); ++i) { memcpy(RAW(out) + pos, RAW(chunks[i]), chunks[i].size()); pos += chunks[i].size(); } return out; } readr/src/Iconv.h0000644000175100001440000000075713106621354013415 0ustar hornikusers#ifndef READ_ICONV_H_ #define READ_ICONV_H_ #include "R_ext/Riconv.h" #include class Iconv { void* cd_; std::string buffer_; public: Iconv(const std::string& from, const std::string& to = "UTF-8"); virtual ~Iconv(); SEXP makeSEXP(const char* start, const char* end, bool hasNull = true); std::string makeString(const char* start, const char* end); private: // Returns number of characters in buffer size_t convert(const char* start, const char* end); }; #endif readr/src/Reader.cpp0000644000175100001440000000745213106621354014073 0ustar hornikusers#include "Reader.h" Reader::Reader(SourcePtr source, TokenizerPtr tokenizer, std::vector collectors, bool progress, CharacterVector colNames) : source_(source), tokenizer_(tokenizer), collectors_(collectors), progress_(progress), begun_(false) { init(colNames); } Reader::Reader(SourcePtr source, TokenizerPtr tokenizer, CollectorPtr collector, bool progress, CharacterVector colNames) : source_(source), tokenizer_(tokenizer), progress_(progress), begun_(false) { collectors_.push_back(collector); init(colNames); } void Reader::init(CharacterVector colNames) { tokenizer_->tokenize(source_->begin(), source_->end()); tokenizer_->setWarnings(&warnings_); // Work out which output columns we are keeping and set warnings for each collector size_t p = collectors_.size(); for (size_t j = 0; j < p; ++j) { if (!collectors_[j]->skip()) { keptColumns_.push_back(j); collectors_[j]->setWarnings(&warnings_); } } if (colNames.size() > 0) { outNames_ = CharacterVector(keptColumns_.size()); int i = 0; for (std::vector::const_iterator it = keptColumns_.begin(); it != keptColumns_.end(); ++it) { outNames_[i++] = colNames[*it]; } } } RObject Reader::readToDataFrame(int lines) { read(lines); // Save individual columns into a data frame List out(outNames_.size()); int j = 0; for (std::vector::const_iterator it = keptColumns_.begin(); it != keptColumns_.end(); ++it) { out[j++] = collectors_[*it]->vector(); } out.attr("names") = outNames_; out = warnings_.addAsAttribute(out); collectorsClear(); warnings_.clear(); static Function as_tibble("as_tibble", Environment::namespace_env("tibble")); return as_tibble(out); } int Reader::read(int lines) { if (t_.type() == TOKEN_EOF) { return(-1); } int n = (lines < 0) ? 
1000 : lines; collectorsResize(n); int last_row = -1, last_col = -1, cells = 0; int first_row; if (!begun_) { t_ = tokenizer_->nextToken(); begun_ = true; first_row = 0; } else { first_row = t_.row(); } while (t_.type() != TOKEN_EOF) { if (progress_ && (++cells) % progressStep_ == 0) { progressBar_.show(tokenizer_->progress()); } if (t_.col() == 0 && static_cast(t_.row()) != first_row) { checkColumns(last_row, last_col, collectors_.size()); } if (lines >= 0 && static_cast(t_.row()) - first_row >= lines) { break; } if (static_cast(t_.row()) - first_row >= n) { // Estimate rows in full dataset and resize collectors n = ((t_.row() - first_row) / tokenizer_->progress().first) * 1.1; collectorsResize(n); } // only set value if within the expected number of columns if (t_.col() < collectors_.size()) { collectors_[t_.col()]->setValue(t_.row() - first_row, t_); } last_row = t_.row(); last_col = t_.col(); t_ = tokenizer_->nextToken(); } if (last_row != -1) { checkColumns(last_row, last_col, collectors_.size()); } if (progress_) { progressBar_.show(tokenizer_->progress()); } progressBar_.stop(); // Resize the collectors to the final size (if it is not already at that // size) if (last_row == -1) { collectorsResize(0); } else if ((last_row - first_row) < (n - 1)) { collectorsResize((last_row - first_row) + 1); } return last_row - first_row; } void Reader::checkColumns(int i, int j, int n) { if (j + 1 == n) return; warnings_.addWarning(i, -1, tfm::format("%i columns", n), tfm::format("%i columns", j + 1) ); } void Reader::collectorsResize(int n) { for (size_t j = 0; j < collectors_.size(); ++j) { collectors_[j]->resize(n); } } void Reader::collectorsClear() { for (size_t j = 0; j < collectors_.size(); ++j) { collectors_[j]->clear(); } } readr/src/grisu3.h0000644000175100001440000000334313106621354013545 0ustar hornikusers#ifndef FASTREAD_GRISU3_H_ #define FASTREAD_GRISU3_H_ /* Copyright Jukka Jylänki Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */ /* This file is part of an implementation of the "grisu3" double to string conversion algorithm described in the research paper "Printing Floating-Point Numbers Quickly And Accurately with Integers" by Florian Loitsch, available at http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf */ extern "C" { /// Converts the given double-precision floating point number to a string representation. /** For most inputs, this string representation is the shortest such, which deserialized again, returns the same bit representation of the double. @param v The number to convert. @param dst [out] The double-precision floating point number will be written here as a null-terminated string. The conversion algorithm will write at most 25 bytes to this buffer. (null terminator is included in this count). The dst pointer may not be null. @return the number of characters written to dst, excluding the null terminator (which is always written) is returned here. 
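    Illustrative usage (added here, not part of the original header):
        char buf[25];                    // 25 bytes always suffice
        int len = dtoa_grisu3(0.1, buf); // buf holds "0.1", len == 3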
*/ int dtoa_grisu3(double v, char *dst); } #ifdef __cplusplus #include std::string dtoa_grisu3_string(double v); #endif #endif readr/src/DateTimeParser.h0000644000175100001440000002710313106621354015202 0ustar hornikusers#ifndef FASTREAD_DATE_TIME_PARSER_H_ #define FASTREAD_DATE_TIME_PARSER_H_ #include #include "boost.h" #include "DateTime.h" #include "LocaleInfo.h" #include "QiParsers.h" // Parsing --------------------------------------------------------------------- class DateTimeParser { int year_, mon_, day_, hour_, min_, sec_; double psec_; int amPm_; bool compactDate_; // used for guessing int tzOffsetHours_, tzOffsetMinutes_; std::string tz_; LocaleInfo* pLocale_; std::string tzDefault_; const char* dateItr_; const char* dateEnd_; public: DateTimeParser(LocaleInfo* pLocale): pLocale_(pLocale), tzDefault_(pLocale->tz_), dateItr_(NULL), dateEnd_(NULL) { reset(); } // Parse ISO8601 date time. In benchmarks this only seems ~30% faster than // parsing with a format string so it doesn't seem necessary to add individual // parsers for other common formats. bool parseISO8601(bool partial = true) { // Date: YYYY-MM-DD, YYYYMMDD if (!consumeInteger(4, &year_)) return false; if (consumeThisChar('-')) compactDate_ = false; if (!consumeInteger1(2, &mon_)) return false; if (!compactDate_ && !consumeThisChar('-')) return false; if (!consumeInteger1(2, &day_)) return false; if (isComplete()) return true; // Spec requires T, but common to use space instead char next; if (!consumeChar(&next)) return false; if (next != 'T' && next != ' ') return false; // hh:mm:ss.sss, hh:mm:ss, hh:mm, hh // hhmmss.sss, hhmmss, hhmm if (!consumeInteger(2, &hour_)) return false; consumeThisChar(':'); consumeInteger(2, &min_); consumeThisChar(':'); consumeSeconds(&sec_, &psec_); if (isComplete()) return true; // Has a timezone tz_ = "UTC"; if (!consumeTzOffset(&tzOffsetHours_, &tzOffsetMinutes_)) return false; return isComplete(); } bool parseLocaleTime() { return parse(pLocale_->timeFormat_); } bool parseLocaleDate() { return parse(pLocale_->dateFormat_); } // A flexible time parser for the most common formats bool parseTime() { if (!consumeInteger(2, &hour_, false)) return false; if (!consumeThisChar(':')) return false; if (!consumeInteger(2, &min_)) return false; consumeThisChar(':'); consumeSeconds(&sec_, NULL); consumeWhiteSpace(); consumeString(pLocale_->amPm_, &amPm_); consumeWhiteSpace(); return isComplete(); } bool parseDate() { // Date: YYYY-MM-DD, YYYY/MM/DD if (!consumeInteger(4, &year_)) return false; if (!consumeThisChar('-') && !consumeThisChar('/')) return false; if (!consumeInteger1(2, &mon_)) return false; if (!consumeThisChar('-') && !consumeThisChar('/')) return false; if (!consumeInteger1(2, &day_)) return false; return isComplete(); } bool isComplete() { return dateItr_ == dateEnd_; } void setDate(const char* date) { reset(); dateItr_ = date; dateEnd_ = date + strlen(date); } bool parse(const std::string& format) { consumeWhiteSpace(); // always consume leading whitespace std::string::const_iterator formatItr, formatEnd = format.end(); for (formatItr = format.begin(); formatItr != formatEnd; ++formatItr) { // Whitespace in format matches 0 or more whitespace in date if (std::isspace(*formatItr)) { consumeWhiteSpace(); continue; } // Any other characters must much exactly. 
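// (Added example: with format "%Y-%m-%d" and input "2010-01-31", the two
// '-' characters are consumed by this literal-match branch, while %Y, %m
// and %d are dispatched through the switch below.)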
if (*formatItr != '%') { if (!consumeThisChar(*formatItr)) return false; continue; } if (formatItr + 1 == formatEnd) Rcpp::stop("Invalid format: trailing %"); formatItr++; switch(*formatItr) { case 'Y': // year with century if (!consumeInteger(4, &year_)) return false; break; case 'y': // year without century if (!consumeInteger(2, &year_)) return false; year_ += (year_ < 69) ? 2000 : 1900; break; case 'm': // month if (!consumeInteger1(2, &mon_, false)) return false; break; case 'b': // abbreviated month name if (!consumeString(pLocale_->monAb_, &mon_)) return false; break; case 'B': // month name if (!consumeString(pLocale_->mon_, &mon_)) return false; break; case 'd': // day if (!consumeInteger1(2, &day_, false)) return false; break; case 'e': // day with optional leading space if (!consumeInteger1WithSpace(2, &day_)) return false; break; case 'H': // hour if (!consumeInteger(2, &hour_, false)) return false; break; case 'I': // hour if (!consumeInteger(2, &hour_, false)) return false; if (hour_ < 1 || hour_ > 12) { return false; } hour_ %= 12; break; case 'M': // minute if (!consumeInteger(2, &min_)) return false; break; case 'S': // seconds (integer) if (!consumeSeconds(&sec_, NULL)) return false; break; case 'O': // seconds (double) if (formatItr + 1 == formatEnd || *(formatItr + 1) != 'S') Rcpp::stop("Invalid format: %%O must be followed by %%S"); formatItr++; if (!consumeSeconds(&sec_, &psec_)) return false; break; case 'p': // AM/PM if (!consumeString(pLocale_->amPm_, &amPm_)) return false; break; case 'z': // time zone specification tz_ = "UTC"; if (!consumeTzOffset(&tzOffsetHours_, &tzOffsetMinutes_)) return false; break; case 'Z': // time zone name if (!consumeTzName(&tz_)) return false; break; // Extensions case '.': if (!consumeNonDigit()) return false; break; case '+': if (!consumeNonDigits()) return false; break; case '*': consumeNonDigits(); break; case 'A': // auto date / time if (formatItr + 1 == formatEnd) Rcpp::stop("Invalid format: %%A must be followed by another letter"); formatItr++; switch(*formatItr) { case 'D': if (!parseDate()) return false; break; case 'T': if (!parseTime()) return false; break; default: Rcpp::stop("Invalid %%A auto parser"); } break; // Compound formats case 'D': parse("%m/%d/%y"); break; case 'F': parse("%Y-%m-%d"); break; case 'R': parse("%H:%M"); break; case 'X': case 'T': parse("%H:%M:%S"); break; case 'x': parse("%y/%m/%d"); break; default: Rcpp::stop("Unsupported format %%%s", *formatItr); } } consumeWhiteSpace(); // always consume trailing whitespace return isComplete(); } DateTime makeDateTime() { DateTime dt(year_, mon_, day_, hour(), min_, sec_, psec_, tz_); if (tz_ == "UTC") dt.setOffset(-tzOffsetHours_ * 3600 - tzOffsetMinutes_ * 60); return dt; } DateTime makeDate() { DateTime dt(year_, mon_, day_, 0, 0, 0, 0, "UTC"); return dt; } DateTime makeTime() { DateTime dt(0, 0, 0, hour(), min_, sec_, psec_, "UTC"); return dt; } bool compactDate() { return compactDate_; } int year() { return year_; } private: int hour() { if (hour_ == 12) { // 12 AM if (amPm_ == 0) { return hour_ - 12; } // 12 PM return hour_; } // Rest of PM if (amPm_ == 1) { return hour_ + 12; } // 24 hour time return hour_; } inline bool consumeSeconds(int* pSec, double* pPartialSec) { double sec; if (!consumeDouble(&sec)) return false; *pSec = (int) sec; if (pPartialSec != NULL) *pPartialSec = sec - *pSec; return true; } inline bool consumeString(const std::vector& haystack, int* pOut) { // haystack is always in UTF-8 std::string needleUTF8 = 
pLocale_->encoder_.makeString(dateItr_, dateEnd_); for(size_t i = 0; i < haystack.size(); ++i) { if (boost::istarts_with(needleUTF8, haystack[i])) { *pOut = i; dateItr_ += haystack[i].size(); return true; } } return false; } inline bool consumeInteger(int n, int* pOut, bool exact = true) { if (dateItr_ == dateEnd_ || *dateItr_ == '-' || *dateItr_ == '+') return false; const char* start = dateItr_; const char* end = std::min(dateItr_ + n, dateEnd_); bool ok = parseInt(dateItr_, end, *pOut); return ok && (!exact || (dateItr_ - start) == n); } // Integer indexed from 1 (i.e. month and date) inline bool consumeInteger1(int n, int* pOut, bool exact = true) { if (!consumeInteger(n, pOut, exact)) return false; (*pOut)--; return true; } // Integer indexed from 1 with optional space inline bool consumeInteger1WithSpace(int n, int* pOut) { if (consumeThisChar(' ')) n--; return consumeInteger1(n, pOut); } inline bool consumeDouble(double* pOut) { if (dateItr_ == dateEnd_ || *dateItr_ == '-' || *dateItr_ == '+') return false; return parseDouble(pLocale_->decimalMark_, dateItr_, dateEnd_, *pOut); } inline bool consumeWhiteSpace() { while (dateItr_ != dateEnd_ && std::isspace(*dateItr_)) dateItr_++; return true; } inline bool consumeNonDigit() { if (dateItr_ == dateEnd_ || std::isdigit(*dateItr_)) return false; dateItr_++; return true; } inline bool consumeNonDigits() { if (!consumeNonDigit()) return false; while (dateItr_ != dateEnd_ && !std::isdigit(*dateItr_)) dateItr_++; return true; } inline bool consumeChar(char* pOut) { if (dateItr_ == dateEnd_) return false; *pOut = *dateItr_++; return true; } inline bool consumeThisChar(char needed) { if (dateItr_ == dateEnd_ || *dateItr_ != needed) return false; dateItr_++; return true; } inline bool consumeAMPM(bool* pIsPM) { if (dateItr_ == dateEnd_) return false; if (consumeThisChar('A') || consumeThisChar('a')) { *pIsPM = false; } else if (consumeThisChar('P') || consumeThisChar('p')) { *pIsPM = true; } else { return false; } if (!(consumeThisChar('M') || consumeThisChar('m'))) return false; return true; } // ISO8601 style // Z // ±hh:mm // ±hhmm // ±hh inline bool consumeTzOffset(int* pHours, int* pMinutes) { if (consumeThisChar('Z')) return true; // Optional +/- (required for ISO8601 but we'll let it slide) int mult = 1; if (*dateItr_ == '+' || *dateItr_ == '-') { mult = (*dateItr_ == '-') ? 
-1 : 1; dateItr_++; } // Required hours if (!consumeInteger(2, pHours)) return false; // Optional colon and minutes consumeThisChar(':'); consumeInteger(2, pMinutes); *pHours *= mult; *pMinutes *= mult; return true; } inline bool consumeTzName(std::string* pOut) { const char* tzStart = dateItr_; while (dateItr_ != dateEnd_ && !std::isspace(*dateItr_)) dateItr_++; pOut->assign(tzStart, dateItr_); return tzStart != dateItr_; } void reset() { year_ = -1; mon_ = 0; day_ = 0; hour_ = 0; min_ = 0; sec_ = 0; psec_ = 0; amPm_ = -1; compactDate_ = true; tzOffsetHours_ = 0; tzOffsetMinutes_ = 0; tz_ = tzDefault_; } }; #endif readr/src/QiParsers.h0000644000175100001440000000560513106621354014245 0ustar hornikusers#ifndef FASTREAD_QI_PARSERS #define FASTREAD_QI_PARSERS #include "boost.h" struct DecimalCommaPolicy : public boost::spirit::qi::real_policies<long double> { template <typename Iterator> static bool parse_dot(Iterator& first, Iterator const& last) { if (first == last || *first != ',') return false; ++first; return true; } }; template <typename Iterator, typename Attr> inline bool parseDouble(const char decimalMark, Iterator& first, Iterator& last, Attr& res) { if (decimalMark == '.') { return boost::spirit::qi::parse(first, last, boost::spirit::qi::long_double, res); } else if (decimalMark == ',') { return boost::spirit::qi::parse(first, last, boost::spirit::qi::real_parser<long double, DecimalCommaPolicy>(), res); } else { return false; } } enum NumberState { STATE_INIT, STATE_LHS, STATE_RHS, STATE_FIN }; // First and last are updated to point to first/last successfully parsed // character template <typename Iterator, typename Attr> inline bool parseNumber(char decimalMark, char groupingMark, Iterator& first, Iterator& last, Attr& res) { Iterator cur = first; // Advance to the first character that could start a number for(; cur != last; ++cur) { if (*cur == '-' || *cur == decimalMark || (*cur >= '0' && *cur <= '9')) break; } if (cur == last) { return false; } else { // Move first to start of number first = cur; } double sum = 0, denom = 1; NumberState state = STATE_INIT; bool seenNumber = false; double sign = 1.0; for(; cur != last; ++cur) { if (state == STATE_FIN) break; switch(state) { case STATE_INIT: if (*cur == '-') { state = STATE_LHS; sign = -1.0; } else if (*cur == decimalMark) { state = STATE_RHS; } else if (*cur >= '0' && *cur <= '9') { seenNumber = true; state = STATE_LHS; sum = *cur - '0'; } else { goto end; } break; case STATE_LHS: if (*cur == groupingMark) { // do nothing } else if (*cur == decimalMark) { state = STATE_RHS; } else if (*cur >= '0' && *cur <= '9') { seenNumber = true; sum *= 10; sum += *cur - '0'; } else { goto end; } break; case STATE_RHS: if (*cur == groupingMark) { // do nothing } else if (*cur >= '0' && *cur <= '9') { seenNumber = true; denom *= 10; sum += (*cur - '0') / denom; } else { goto end; } break; case STATE_FIN: goto end; } } end: // Set last to point to final character used last = cur; res = sign * sum; return seenNumber; } template <typename Iterator, typename Attr> inline bool parseInt(Iterator& first, Iterator& last, Attr& res) { return boost::spirit::qi::parse(first, last, boost::spirit::qi::int_, res); } #endif readr/src/write_connection.h0000644000175100001440000000116513106621354015702 0ustar hornikusers#ifndef READR_WRITE_CONNECTION_H_ #define READR_WRITE_CONNECTION_H_ #include <ios> // streamsize #include <boost/iostreams/categories.hpp> // sink_tag #include <Rinternals.h> typedef struct Rconn * Rconnection; Rconnection get_connection(SEXP con); // http://www.boost.org/doc/libs/1_63_0/libs/iostreams/doc/tutorial/container_sink.html namespace io = boost::iostreams; class connection_sink { private: Rconnection con_; public: typedef char char_type; typedef io::sink_tag category;
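// Note (editorial gloss on the two typedefs above): char_type plus the
// sink_tag category make this class model the boost::iostreams Sink
// concept, so it can be wrapped in io::stream<connection_sink> and written
// to like a std::ostream; boost then flushes buffered chunks through
// write() below (see the container_sink tutorial linked above).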
connection_sink(SEXP con); std::streamsize write(const char* s, std::streamsize n); }; #endif readr/src/write_connection.cpp0000644000175100001440000000206713106621354016237 0ustar hornikusers#include "write_connection.h" #define class class_name #define private private_ptr #include <R_ext/Connections.h> #undef class #undef private #if R_CONNECTIONS_VERSION != 1 #error "Missing or unsupported connection API in R" #endif #if defined(R_VERSION) && R_VERSION >= R_Version(3, 3, 0) Rconnection get_connection(SEXP con) { return R_GetConnection(con); } # else extern "C" { extern Rconnection getConnection(int); } Rconnection get_connection(SEXP con) { if (!Rf_inherits(con, "connection")) Rcpp::stop("invalid connection"); return getConnection(Rf_asInteger(con)); } #endif // http://www.boost.org/doc/libs/1_63_0/libs/iostreams/doc/tutorial/container_sink.html // namespace io = boost::iostreams; connection_sink::connection_sink(SEXP con) { con_ = get_connection(con); } std::streamsize connection_sink::write(const char* s, std::streamsize n) { size_t write_size; if ((write_size = R_WriteConnection(con_, (void *) s, n)) != static_cast<size_t>(n)) { Rcpp::stop("write failed, expected %i, got %i", n, write_size); } return write_size; } readr/src/Token.h0000644000175100001440000000542413106621354013413 0ustar hornikusers#ifndef FASTREAD_TOKEN_H_ #define FASTREAD_TOKEN_H_ #include <string> #include <boost/container/string.hpp> #include "Source.h" #include "Iconv.h" #include "Tokenizer.h" enum TokenType { TOKEN_STRING, // a sequence of characters TOKEN_MISSING, // a missing value TOKEN_EMPTY, // an empty value TOKEN_EOF // end of file }; class Token { TokenType type_; SourceIterator begin_, end_; size_t row_, col_; bool hasNull_; Tokenizer* pTokenizer_; public: Token(): type_(TOKEN_EMPTY), row_(0), col_(0) {} Token(TokenType type, int row, int col): type_(type), row_(row), col_(col) {} Token(SourceIterator begin, SourceIterator end, int row, int col, bool hasNull, Tokenizer* pTokenizer = NULL): type_(TOKEN_STRING), begin_(begin), end_(end), row_(row), col_(col), hasNull_(hasNull), pTokenizer_(pTokenizer) { if (begin_ == end_) type_ = TOKEN_EMPTY; } std::string asString() const { switch(type_) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators string = getString(&buffer); return std::string(string.first, string.second); } case TOKEN_MISSING: return "[MISSING]"; case TOKEN_EMPTY: return "[EMPTY]"; case TOKEN_EOF: return "[EOF]"; } return ""; } SEXP asRaw() const { int n = (type_ == TOKEN_STRING) ?
end_ - begin_ : 0; Rcpp::RawVector out(n); if (n > 0) memcpy(RAW(out), begin_, n); return out; } SEXP asSEXP(Iconv* pEncoder) const { switch(type_) { case TOKEN_STRING: { boost::container::string buffer; SourceIterators string = getString(&buffer); return pEncoder->makeSEXP(string.first, string.second, hasNull_); } default: return NA_STRING; } } TokenType type() const { return type_; } SourceIterators getString(boost::container::string *pOut) const { if (pTokenizer_ == NULL) return std::make_pair(begin_, end_); pTokenizer_->unescape(begin_, end_, pOut); return std::make_pair(pOut->data(), pOut->data() + pOut->size()); } size_t row() const { return row_; } size_t col() const { return col_; } bool hasNull() const { return hasNull_; } Token& trim() { while (begin_ != end_ && *begin_ == ' ') begin_++; while (end_ != begin_ && *(end_ - 1) == ' ') end_--; if (begin_ == end_) type_ = TOKEN_EMPTY; return *this; } Token& flagNA(const std::vector<std::string>& NA) { std::vector<std::string>::const_iterator it; for (it = NA.begin(); it != NA.end(); ++it) { if ((size_t) (end_ - begin_) != it->size()) continue; if (strncmp(begin_, it->data(), it->size()) == 0) { type_ = TOKEN_MISSING; break; } } return *this; } }; #endif readr/src/TokenizerDelim.h0000644000175100001440000000324013106621354015252 0ustar hornikusers#ifndef FASTREAD_TOKENIZEDELIM_H_ #define FASTREAD_TOKENIZEDELIM_H_ #include <boost/container/string.hpp> #include "Token.h" #include "Tokenizer.h" #include "utils.h" enum DelimState { STATE_DELIM, STATE_FIELD, STATE_STRING, STATE_QUOTE, STATE_ESCAPE_S, STATE_ESCAPE_F, STATE_STRING_END, STATE_COMMENT }; class TokenizerDelim : public Tokenizer { char delim_, quote_; std::vector<std::string> NA_; std::string comment_; bool hasComment_, trimWS_, escapeBackslash_, escapeDouble_, quotedNA_, hasEmptyNA_; SourceIterator begin_, cur_, end_; DelimState state_; int row_, col_; bool moreTokens_; public: TokenizerDelim(char delim = ',', char quote = '"', std::vector<std::string> NA = std::vector<std::string>(1, "NA"), std::string comment = "", bool trimWS = true, bool escapeBackslash = false, bool escapeDouble = true, bool quotedNA = true); void tokenize(SourceIterator begin, SourceIterator end); std::pair<double, size_t> progress(); Token nextToken(); void unescape(SourceIterator begin, SourceIterator end, boost::container::string* pOut); private: bool isComment(const char* cur) const; void newField(); void newRecord(); Token emptyToken(int row, int col); Token fieldToken(SourceIterator begin, SourceIterator end, bool hasEscapeB, bool hasNull, int row, int col); Token stringToken(SourceIterator begin, SourceIterator end, bool hasEscapeB, bool hasEscapeD, bool hasNull, int row, int col); void unescapeBackslash(SourceIterator begin, SourceIterator end, boost::container::string* pOut); void unescapeDouble(SourceIterator begin, SourceIterator end, boost::container::string* pOut); }; #endif readr/src/Warnings.h0000644000175100001440000000241613106621354014121 0ustar hornikusers#ifndef READ_WARNINGS_H_ #define READ_WARNINGS_H_ class Warnings { std::vector<int> row_, col_; std::vector<std::string> expected_, actual_; public: Warnings() { } // row and col should be zero-indexed. addWarning converts into one-indexed void addWarning(int row, int col, const std::string& expected, const std::string& actual) { row_.push_back(row == -1 ? NA_INTEGER : row + 1); col_.push_back(col == -1 ?
NA_INTEGER : col + 1); expected_.push_back(expected); actual_.push_back(actual); } Rcpp::RObject addAsAttribute(Rcpp::RObject x) { if (size() == 0) return x; x.attr("problems") = asDataFrame(); return x; } size_t size() { return row_.size(); } void clear() { row_.clear(); col_.clear(); expected_.clear(); actual_.clear(); } Rcpp::List asDataFrame() { Rcpp::List out = Rcpp::List::create( Rcpp::_["row"] = Rcpp::wrap(row_), Rcpp::_["col"] = Rcpp::wrap(col_), Rcpp::_["expected"] = Rcpp::wrap(expected_), Rcpp::_["actual"] = Rcpp::wrap(actual_) ); out.attr("class") = Rcpp::CharacterVector::create("tbl_df", "tbl", "data.frame"); out.attr("row.names") = Rcpp::IntegerVector::create(NA_INTEGER, -size()); return out; } }; #endif readr/src/DateTime.h0000644000175100001440000001270213106621354014024 0ustar hornikusers#ifndef READR_DATE_TIME_H_ #define READR_DATE_TIME_H_ #include #include #include "localtime.h" // Much of this code is adapted from R's src/main/datetime.c. // Author: The R Core Team. // License: GPL >= 2 static const int month_length[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}; static const int month_start[12] = {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334}; // Leap days occur in a 400 year cycle: this records the cumulative number // of leap days in per cycle. Generated with: // is_leap <- function(y) (y %% 4) == 0 & ((y %% 100) != 0 | (y %% 400) == 0) // cumsum(is_leap(0:399)) static const int leap_days[400] = {0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 39, 39, 39, 40, 40, 40, 40, 41, 41, 41, 41, 42, 42, 42, 42, 43, 43, 43, 43, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 49, 49, 50, 50, 50, 50, 51, 51, 51, 51, 52, 52, 52, 52, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 55, 55, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 61, 61, 61, 61, 62, 62, 62, 62, 63, 63, 63, 63, 64, 64, 64, 64, 65, 65, 65, 65, 66, 66, 66, 66, 67, 67, 67, 67, 68, 68, 68, 68, 69, 69, 69, 69, 70, 70, 70, 70, 71, 71, 71, 71, 72, 72, 72, 72, 73, 73, 73, 73, 73, 73, 73, 73, 74, 74, 74, 74, 75, 75, 75, 75, 76, 76, 76, 76, 77, 77, 77, 77, 78, 78, 78, 78, 79, 79, 79, 79, 80, 80, 80, 80, 81, 81, 81, 81, 82, 82, 82, 82, 83, 83, 83, 83, 84, 84, 84, 84, 85, 85, 85, 85, 86, 86, 86, 86, 87, 87, 87, 87, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 90, 91, 91, 91, 91, 92, 92, 92, 92, 93, 93, 93, 93, 94, 94, 94, 94, 95, 95, 95, 95, 96, 96, 96, 96, 97, 97, 97}; static const int cycle_days = 400 * 365 + 97; inline int is_leap(unsigned y) { return (y % 4) == 0 && ((y % 100) != 0 || (y % 400) == 0); } class DateTime { int year_, mon_, day_, hour_, min_, sec_, offset_; double psec_; std::string tz_; public: DateTime(int year, int mon, int day, int hour = 0, int min = 0, int sec = 0, double psec = 0, const std::string& tz = ""): year_(year), mon_(mon), day_(day), hour_(hour), min_(min), sec_(sec), offset_(0), psec_(psec), tz_(tz) { } // Used to add time zone offsets which can only be 
easily applied once // we've converted into seconds since epoch. void setOffset(int offset) { offset_ = offset; } // Is this a valid date time? bool validDateTime() const { return validDate() && validTime(); } bool validDate() const { if (year_ < 0) return false; if (mon_ < 0 || mon_ > 11) return false; if (day_ < 0 || day_ >= days_in_month()) return false; return true; } bool validTime() const { if (sec_ < 0 || sec_ > 60) return false; if (min_ < 0 || min_ > 59) return false; if (hour_ < 0 || hour_ > 23) return false; return true; } double datetime() const { return (tz_ == "UTC") ? utctime() : localtime(); } int date() const { return utcdate(); } double time() const { return psec_ + sec_ + (min_ * 60) + (hour_ * 3600); } private: // Number of seconds since 1970-01-01T00:00:00Z. // Compared to usual implementations this returns a double, and supports // a wider range of dates. Invalid dates have undefined behaviour. double utctime() const { return utcdate() * 86400.0 + time() + offset_; } // Find number of days since 1970-01-01. // Invalid dates have undefined behaviour. int utcdate() const { if (!validDate()) return NA_REAL; // Number of days since start of year int day = month_start[mon_] + day_; if (mon_ > 1 && is_leap(year_)) day++; // Number of days since 0000-01-01 // Leap years come in 400 year cycles so determine which cycle we're // in, and what position we're in within that cycle. int ly_cycle = year_ / 400; int ly_offset = year_ - (ly_cycle * 400); if (ly_offset < 0) { ly_offset += 400; ly_cycle--; } day += ly_cycle * cycle_days + ly_offset * 365 + leap_days[ly_offset]; // Convert to number of days since 1970-01-01 day -= 719528; return day; } double localtime() const { if (!validDateTime()) return NA_REAL; struct Rtm tm; tm.tm_year = year_ - 1900; tm.tm_mon = mon_; tm.tm_mday = day_ + 1; tm.tm_hour = hour_; tm.tm_min = min_; tm.tm_sec = sec_; // The Daylight Saving Time flag (tm_isdst) is greater than zero if Daylight // Saving Time is in effect, zero if Daylight Saving Time is not in effect, // and less than zero if the information is not available.
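// Passing -1 below lets my_mktime() decide whether daylight saving time
// applies to this wall-clock time in tz_; hard-coding 0 or 1 instead could
// shift times that fall inside a DST transition by an hour.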
tm.tm_isdst = -1; time_t time = my_mktime(&tm, tz_.c_str()); return time + psec_ + offset_; } inline int days_in_month() const { return month_length[mon_] + (mon_ == 1 && is_leap(year_)); } inline int days_in_year() const { return 365 + is_leap(year_); } }; #endif readr/src/Makevars.win0000644000175100001440000000002213106621354014447 0ustar hornikusersPKG_LIBS=-lRiconv readr/src/init.c0000644000175100001440000000033713106621354013267 0ustar hornikusers#include <R.h> #include <Rinternals.h> #include <stdlib.h> // for NULL #include <R_ext/Rdynload.h> void R_init_readr(DllInfo* info) { R_registerRoutines(info, NULL, NULL, NULL, NULL); R_useDynamicSymbols(info, TRUE); } readr/src/datetime.cpp0000644000175100001440000000140713106621354014457 0ustar hornikusers#include <Rcpp.h> using namespace Rcpp; #include "DateTime.h" // [[Rcpp::export]] NumericVector utctime(IntegerVector year, IntegerVector month, IntegerVector day, IntegerVector hour, IntegerVector min, IntegerVector sec, NumericVector psec) { int n = year.size(); if (month.size() != n || day.size() != n || hour.size() != n || min.size() != n || sec.size() != n || psec.size() != n) { Rcpp::stop("All inputs must be same length"); } NumericVector out = NumericVector(n); for (int i = 0; i < n; ++i) { DateTime dt(year[i], month[i] - 1, day[i] - 1, hour[i], min[i], sec[i], psec[i], "UTC"); out[i] = dt.datetime(); } out.attr("class") = CharacterVector::create("POSIXct", "POSIXt"); out.attr("tzone") = "UTC"; return out; } readr/src/RcppExports.cpp0000644000175100001440000003656713106621354015163 0ustar hornikusers// Generated by using Rcpp::compileAttributes() -> do not edit by hand // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 #include <Rcpp.h> using namespace Rcpp; // collectorGuess std::string collectorGuess(CharacterVector input, List locale_); RcppExport SEXP readr_collectorGuess(SEXP inputSEXP, SEXP locale_SEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< CharacterVector >::type input(inputSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); rcpp_result_gen = Rcpp::wrap(collectorGuess(input, locale_)); return rcpp_result_gen; END_RCPP } // read_connection_ RawVector read_connection_(RObject con, int chunk_size); RcppExport SEXP readr_read_connection_(SEXP conSEXP, SEXP chunk_sizeSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< RObject >::type con(conSEXP); Rcpp::traits::input_parameter< int >::type chunk_size(chunk_sizeSEXP); rcpp_result_gen = Rcpp::wrap(read_connection_(con, chunk_size)); return rcpp_result_gen; END_RCPP } // utctime NumericVector utctime(IntegerVector year, IntegerVector month, IntegerVector day, IntegerVector hour, IntegerVector min, IntegerVector sec, NumericVector psec); RcppExport SEXP readr_utctime(SEXP yearSEXP, SEXP monthSEXP, SEXP daySEXP, SEXP hourSEXP, SEXP minSEXP, SEXP secSEXP, SEXP psecSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerVector >::type year(yearSEXP); Rcpp::traits::input_parameter< IntegerVector >::type month(monthSEXP); Rcpp::traits::input_parameter< IntegerVector >::type day(daySEXP); Rcpp::traits::input_parameter< IntegerVector >::type hour(hourSEXP); Rcpp::traits::input_parameter< IntegerVector >::type min(minSEXP); Rcpp::traits::input_parameter< IntegerVector >::type sec(secSEXP); Rcpp::traits::input_parameter< NumericVector >::type psec(psecSEXP); rcpp_result_gen = Rcpp::wrap(utctime(year, month, day, hour, min, sec, psec));
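// Rcpp::wrap() converts the C++ result (here a NumericVector) back into a
// SEXP for R, mirroring the input_parameter<>::type conversions above that
// coerce each incoming SEXP to its C++ type.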
return rcpp_result_gen; END_RCPP } // dim_tokens_ IntegerVector dim_tokens_(List sourceSpec, List tokenizerSpec); RcppExport SEXP readr_dim_tokens_(SEXP sourceSpecSEXP, SEXP tokenizerSpecSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type tokenizerSpec(tokenizerSpecSEXP); rcpp_result_gen = Rcpp::wrap(dim_tokens_(sourceSpec, tokenizerSpec)); return rcpp_result_gen; END_RCPP } // count_fields_ std::vector count_fields_(List sourceSpec, List tokenizerSpec, int n_max); RcppExport SEXP readr_count_fields_(SEXP sourceSpecSEXP, SEXP tokenizerSpecSEXP, SEXP n_maxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type tokenizerSpec(tokenizerSpecSEXP); Rcpp::traits::input_parameter< int >::type n_max(n_maxSEXP); rcpp_result_gen = Rcpp::wrap(count_fields_(sourceSpec, tokenizerSpec, n_max)); return rcpp_result_gen; END_RCPP } // guess_header_ RObject guess_header_(List sourceSpec, List tokenizerSpec, List locale_); RcppExport SEXP readr_guess_header_(SEXP sourceSpecSEXP, SEXP tokenizerSpecSEXP, SEXP locale_SEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type tokenizerSpec(tokenizerSpecSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); rcpp_result_gen = Rcpp::wrap(guess_header_(sourceSpec, tokenizerSpec, locale_)); return rcpp_result_gen; END_RCPP } // tokenize_ RObject tokenize_(List sourceSpec, List tokenizerSpec, int n_max); RcppExport SEXP readr_tokenize_(SEXP sourceSpecSEXP, SEXP tokenizerSpecSEXP, SEXP n_maxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type tokenizerSpec(tokenizerSpecSEXP); Rcpp::traits::input_parameter< int >::type n_max(n_maxSEXP); rcpp_result_gen = Rcpp::wrap(tokenize_(sourceSpec, tokenizerSpec, n_max)); return rcpp_result_gen; END_RCPP } // parse_vector_ SEXP parse_vector_(CharacterVector x, List collectorSpec, List locale_, const std::vector& na); RcppExport SEXP readr_parse_vector_(SEXP xSEXP, SEXP collectorSpecSEXP, SEXP locale_SEXP, SEXP naSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< CharacterVector >::type x(xSEXP); Rcpp::traits::input_parameter< List >::type collectorSpec(collectorSpecSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); Rcpp::traits::input_parameter< const std::vector& >::type na(naSEXP); rcpp_result_gen = Rcpp::wrap(parse_vector_(x, collectorSpec, locale_, na)); return rcpp_result_gen; END_RCPP } // read_file_ CharacterVector read_file_(List sourceSpec, List locale_); RcppExport SEXP readr_read_file_(SEXP sourceSpecSEXP, SEXP locale_SEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); rcpp_result_gen = Rcpp::wrap(read_file_(sourceSpec, locale_)); return rcpp_result_gen; END_RCPP } // read_file_raw_ RawVector read_file_raw_(List sourceSpec); RcppExport SEXP readr_read_file_raw_(SEXP sourceSpecSEXP) { 
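// BEGIN_RCPP/END_RCPP expand to a try/catch block that translates C++
// exceptions (including those raised via Rcpp::stop()) into regular R
// errors, so failures do not unwind across the R API boundary.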
BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); rcpp_result_gen = Rcpp::wrap(read_file_raw_(sourceSpec)); return rcpp_result_gen; END_RCPP } // read_lines_ CharacterVector read_lines_(List sourceSpec, List locale_, std::vector na, int n_max, bool progress); RcppExport SEXP readr_read_lines_(SEXP sourceSpecSEXP, SEXP locale_SEXP, SEXP naSEXP, SEXP n_maxSEXP, SEXP progressSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); Rcpp::traits::input_parameter< std::vector >::type na(naSEXP); Rcpp::traits::input_parameter< int >::type n_max(n_maxSEXP); Rcpp::traits::input_parameter< bool >::type progress(progressSEXP); rcpp_result_gen = Rcpp::wrap(read_lines_(sourceSpec, locale_, na, n_max, progress)); return rcpp_result_gen; END_RCPP } // read_lines_chunked_ void read_lines_chunked_(List sourceSpec, List locale_, std::vector na, int chunkSize, Environment callback, bool progress); RcppExport SEXP readr_read_lines_chunked_(SEXP sourceSpecSEXP, SEXP locale_SEXP, SEXP naSEXP, SEXP chunkSizeSEXP, SEXP callbackSEXP, SEXP progressSEXP) { BEGIN_RCPP Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); Rcpp::traits::input_parameter< std::vector >::type na(naSEXP); Rcpp::traits::input_parameter< int >::type chunkSize(chunkSizeSEXP); Rcpp::traits::input_parameter< Environment >::type callback(callbackSEXP); Rcpp::traits::input_parameter< bool >::type progress(progressSEXP); read_lines_chunked_(sourceSpec, locale_, na, chunkSize, callback, progress); return R_NilValue; END_RCPP } // read_lines_raw_ List read_lines_raw_(List sourceSpec, int n_max, bool progress); RcppExport SEXP readr_read_lines_raw_(SEXP sourceSpecSEXP, SEXP n_maxSEXP, SEXP progressSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< int >::type n_max(n_maxSEXP); Rcpp::traits::input_parameter< bool >::type progress(progressSEXP); rcpp_result_gen = Rcpp::wrap(read_lines_raw_(sourceSpec, n_max, progress)); return rcpp_result_gen; END_RCPP } // read_tokens_ RObject read_tokens_(List sourceSpec, List tokenizerSpec, ListOf colSpecs, CharacterVector colNames, List locale_, int n_max, bool progress); RcppExport SEXP readr_read_tokens_(SEXP sourceSpecSEXP, SEXP tokenizerSpecSEXP, SEXP colSpecsSEXP, SEXP colNamesSEXP, SEXP locale_SEXP, SEXP n_maxSEXP, SEXP progressSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type tokenizerSpec(tokenizerSpecSEXP); Rcpp::traits::input_parameter< ListOf >::type colSpecs(colSpecsSEXP); Rcpp::traits::input_parameter< CharacterVector >::type colNames(colNamesSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); Rcpp::traits::input_parameter< int >::type n_max(n_maxSEXP); Rcpp::traits::input_parameter< bool >::type progress(progressSEXP); rcpp_result_gen = Rcpp::wrap(read_tokens_(sourceSpec, tokenizerSpec, colSpecs, colNames, locale_, n_max, progress)); return rcpp_result_gen; END_RCPP } // read_tokens_chunked_ void read_tokens_chunked_(List 
sourceSpec, Environment callback, int chunkSize, List tokenizerSpec, ListOf colSpecs, CharacterVector colNames, List locale_, bool progress); RcppExport SEXP readr_read_tokens_chunked_(SEXP sourceSpecSEXP, SEXP callbackSEXP, SEXP chunkSizeSEXP, SEXP tokenizerSpecSEXP, SEXP colSpecsSEXP, SEXP colNamesSEXP, SEXP locale_SEXP, SEXP progressSEXP) { BEGIN_RCPP Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< Environment >::type callback(callbackSEXP); Rcpp::traits::input_parameter< int >::type chunkSize(chunkSizeSEXP); Rcpp::traits::input_parameter< List >::type tokenizerSpec(tokenizerSpecSEXP); Rcpp::traits::input_parameter< ListOf >::type colSpecs(colSpecsSEXP); Rcpp::traits::input_parameter< CharacterVector >::type colNames(colNamesSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); Rcpp::traits::input_parameter< bool >::type progress(progressSEXP); read_tokens_chunked_(sourceSpec, callback, chunkSize, tokenizerSpec, colSpecs, colNames, locale_, progress); return R_NilValue; END_RCPP } // guess_types_ std::vector guess_types_(List sourceSpec, List tokenizerSpec, Rcpp::List locale_, int n); RcppExport SEXP readr_guess_types_(SEXP sourceSpecSEXP, SEXP tokenizerSpecSEXP, SEXP locale_SEXP, SEXP nSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< List >::type tokenizerSpec(tokenizerSpecSEXP); Rcpp::traits::input_parameter< Rcpp::List >::type locale_(locale_SEXP); Rcpp::traits::input_parameter< int >::type n(nSEXP); rcpp_result_gen = Rcpp::wrap(guess_types_(sourceSpec, tokenizerSpec, locale_, n)); return rcpp_result_gen; END_RCPP } // whitespaceColumns List whitespaceColumns(List sourceSpec, int n, std::string comment); RcppExport SEXP readr_whitespaceColumns(SEXP sourceSpecSEXP, SEXP nSEXP, SEXP commentSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type sourceSpec(sourceSpecSEXP); Rcpp::traits::input_parameter< int >::type n(nSEXP); Rcpp::traits::input_parameter< std::string >::type comment(commentSEXP); rcpp_result_gen = Rcpp::wrap(whitespaceColumns(sourceSpec, n, comment)); return rcpp_result_gen; END_RCPP } // type_convert_col RObject type_convert_col(CharacterVector x, List spec, List locale_, int col, const std::vector& na, bool trim_ws); RcppExport SEXP readr_type_convert_col(SEXP xSEXP, SEXP specSEXP, SEXP locale_SEXP, SEXP colSEXP, SEXP naSEXP, SEXP trim_wsSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< CharacterVector >::type x(xSEXP); Rcpp::traits::input_parameter< List >::type spec(specSEXP); Rcpp::traits::input_parameter< List >::type locale_(locale_SEXP); Rcpp::traits::input_parameter< int >::type col(colSEXP); Rcpp::traits::input_parameter< const std::vector& >::type na(naSEXP); Rcpp::traits::input_parameter< bool >::type trim_ws(trim_wsSEXP); rcpp_result_gen = Rcpp::wrap(type_convert_col(x, spec, locale_, col, na, trim_ws)); return rcpp_result_gen; END_RCPP } // stream_delim_ std::string stream_delim_(const List& df, RObject connection, char delim, const std::string& na, bool col_names, bool bom); RcppExport SEXP readr_stream_delim_(SEXP dfSEXP, SEXP connectionSEXP, SEXP delimSEXP, SEXP naSEXP, SEXP col_namesSEXP, SEXP bomSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope 
rcpp_rngScope_gen; Rcpp::traits::input_parameter< const List& >::type df(dfSEXP); Rcpp::traits::input_parameter< RObject >::type connection(connectionSEXP); Rcpp::traits::input_parameter< char >::type delim(delimSEXP); Rcpp::traits::input_parameter< const std::string& >::type na(naSEXP); Rcpp::traits::input_parameter< bool >::type col_names(col_namesSEXP); Rcpp::traits::input_parameter< bool >::type bom(bomSEXP); rcpp_result_gen = Rcpp::wrap(stream_delim_(df, connection, delim, na, col_names, bom)); return rcpp_result_gen; END_RCPP } // write_lines_ void write_lines_(const CharacterVector& lines, RObject connection, const std::string& na); RcppExport SEXP readr_write_lines_(SEXP linesSEXP, SEXP connectionSEXP, SEXP naSEXP) { BEGIN_RCPP Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const CharacterVector& >::type lines(linesSEXP); Rcpp::traits::input_parameter< RObject >::type connection(connectionSEXP); Rcpp::traits::input_parameter< const std::string& >::type na(naSEXP); write_lines_(lines, connection, na); return R_NilValue; END_RCPP } // write_lines_raw_ void write_lines_raw_(List x, RObject connection); RcppExport SEXP readr_write_lines_raw_(SEXP xSEXP, SEXP connectionSEXP) { BEGIN_RCPP Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type x(xSEXP); Rcpp::traits::input_parameter< RObject >::type connection(connectionSEXP); write_lines_raw_(x, connection); return R_NilValue; END_RCPP } // write_file_ void write_file_(std::string x, RObject connection); RcppExport SEXP readr_write_file_(SEXP xSEXP, SEXP connectionSEXP) { BEGIN_RCPP Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< std::string >::type x(xSEXP); Rcpp::traits::input_parameter< RObject >::type connection(connectionSEXP); write_file_(x, connection); return R_NilValue; END_RCPP } // write_file_raw_ void write_file_raw_(RawVector x, RObject connection); RcppExport SEXP readr_write_file_raw_(SEXP xSEXP, SEXP connectionSEXP) { BEGIN_RCPP Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< RawVector >::type x(xSEXP); Rcpp::traits::input_parameter< RObject >::type connection(connectionSEXP); write_file_raw_(x, connection); return R_NilValue; END_RCPP } readr/src/CollectorGuess.cpp0000644000175100001440000000626213106621354015624 0ustar hornikusers#include using namespace Rcpp; #include "DateTime.h" #include "DateTimeParser.h" #include "LocaleInfo.h" #include "QiParsers.h" typedef bool (*canParseFun)(const std::string&, LocaleInfo* pLocale); bool canParse(CharacterVector x, const canParseFun& canParse, LocaleInfo* pLocale) { for (int i = 0; i < x.size(); ++i) { if (x[i] == NA_STRING) continue; if (x[i].size() == 0) continue; if (!canParse(std::string(x[i]), pLocale)) return false; } return true; } bool allMissing(CharacterVector x) { for (int i = 0; i < x.size(); ++i) { if (x[i] != NA_STRING && x[i].size() > 0) return false; } return true; } bool isLogical(const std::string& x, LocaleInfo* pLocale) { return x == "T" || x == "F" || x == "TRUE" || x == "FALSE"; } bool isInteger(const std::string& x, LocaleInfo* pLocale) { if (x[0] == '0' && x.size() > 1) return false; int res = 0; std::string::const_iterator begin = x.begin(), end = x.end(); return parseInt(begin, end, res) && begin == end; } bool isNumber(const std::string& x, LocaleInfo* pLocale) { // Leading zero not followed by decimal mark if (x[0] == '0' && x.size() > 1 && x[1] != pLocale->decimalMark_) return false; double res = 0; std::string::const_iterator begin = x.begin(), end = x.end(); bool ok = 
parseNumber(pLocale->decimalMark_, pLocale->groupingMark_, begin, end, res); return ok && begin == x.begin() && end == x.end(); } bool isDouble(const std::string& x, LocaleInfo* pLocale) { // Leading zero not followed by decimal mark if (x[0] == '0' && x.size() > 1 && x[1] != pLocale->decimalMark_) return false; double res = 0; std::string::const_iterator begin = x.begin(), end = x.end(); return parseDouble(pLocale->decimalMark_, begin, end, res) && begin == end; } bool isTime(const std::string& x, LocaleInfo* pLocale) { DateTimeParser parser(pLocale); parser.setDate(x.c_str()); return parser.parseLocaleTime(); } bool isDate(const std::string& x, LocaleInfo* pLocale) { DateTimeParser parser(pLocale); parser.setDate(x.c_str()); return parser.parseLocaleDate(); } static bool isDateTime(const std::string& x, LocaleInfo* pLocale) { DateTimeParser parser(pLocale); parser.setDate(x.c_str()); bool ok = parser.parseISO8601(); if (!ok) return false; if (!parser.compactDate()) return true; // Values like 00014567 are unlikely to be dates, so don't guess return parser.year() > 999; } // [[Rcpp::export]] std::string collectorGuess(CharacterVector input, List locale_) { LocaleInfo locale(locale_); if (input.size() == 0 || allMissing(input)) return "character"; // Work from strictest to most flexible if (canParse(input, isLogical, &locale)) return "logical"; if (canParse(input, isInteger, &locale)) return "integer"; if (canParse(input, isDouble, &locale)) return "double"; if (canParse(input, isNumber, &locale)) return "number"; if (canParse(input, isTime, &locale)) return "time"; if (canParse(input, isDate, &locale)) return "date"; if (canParse(input, isDateTime, &locale)) return "datetime"; // Otherwise can always parse as a character return "character"; } readr/src/TokenizerWs.h0000644000175100001440000000132613106621354014614 0ustar hornikusers#ifndef READR_TOKENIZERWS_H_ #define READR_TOKENIZERWS_H_ #include <Rcpp.h> #include "Token.h" #include "Tokenizer.h" #include "utils.h" class TokenizerWs : public Tokenizer { std::vector<std::string> NA_; SourceIterator begin_, cur_, curLine_, end_; int row_, col_; std::string comment_; bool moreTokens_, hasComment_; public: TokenizerWs(std::vector<std::string> NA = std::vector<std::string>(1, "NA"), std::string comment = ""); void tokenize(SourceIterator begin, SourceIterator end); std::pair<double, size_t> progress(); Token nextToken(); private: Token fieldToken(SourceIterator begin, SourceIterator end, bool hasNull); bool isComment(const char* cur) const; }; #endif readr/src/TokenizerLog.h0000644000175100001440000000750713106621354014753 0ustar hornikusers#ifndef FASTREAD_TOKENIZER_LOG_H_ #define FASTREAD_TOKENIZER_LOG_H_ #include <Rcpp.h> #include "Token.h" #include "Tokenizer.h" #include "utils.h" enum LogState { LOG_DELIM, LOG_FIELD, LOG_STRING, LOG_ESCAPE, LOG_QUOTE, LOG_DATE }; class TokenizerLog : public Tokenizer { SourceIterator begin_, cur_, end_; LogState state_; int row_, col_; bool moreTokens_; public: TokenizerLog() { } void tokenize(SourceIterator begin, SourceIterator end) { cur_ = begin; begin_ = begin; end_ = end; row_ = 0; col_ = 0; state_ = LOG_DELIM; moreTokens_ = true; } std::pair<double, size_t> progress() { size_t bytes = cur_ - begin_; return std::make_pair(bytes / (double) (end_ - begin_), bytes); } Token nextToken() { // Capture current position int row = row_, col = col_; if (!moreTokens_) return Token(TOKEN_EOF, row, col); SourceIterator token_begin = cur_; while (cur_ != end_) { Advance advance(&cur_); if ((row_ + 1) % 100000 == 0 || (col_ + 1) % 100000 == 0) Rcpp::checkUserInterrupt(); switch(state_) { case
LOG_DELIM: if (*cur_ == '\r' || *cur_ == '\n') { newRecord(); advanceForLF(&cur_, end_); return Token(TOKEN_EMPTY, row, col); } else if (*cur_ == ' ') { newField(); return Token(TOKEN_EMPTY, row, col); } else if (*cur_ == '"') { state_ = LOG_STRING; } else if (*cur_ == '[') { state_ = LOG_DATE; } else { state_ = LOG_FIELD; } break; case LOG_FIELD: if (*cur_ == '\r' || *cur_ == '\n') { newRecord(); return fieldToken(token_begin, advanceForLF(&cur_, end_), row, col); } else if (*cur_ == ' ') { newField(); return fieldToken(token_begin, cur_, row, col); } break; case LOG_QUOTE: if (*cur_ == ' ') { newField(); return fieldToken(token_begin + 1, cur_ - 1, row, col); } else if (*cur_ == '\r' || *cur_ == '\n') { newRecord(); return fieldToken(token_begin + 1, advanceForLF(&cur_, end_) - 1, row, col); } else { state_ = LOG_STRING; } break; case LOG_STRING: if (*cur_ == '"') { state_ = LOG_QUOTE; } else if (*cur_ == '\\') { state_ = LOG_ESCAPE; } break; case LOG_ESCAPE: state_ = LOG_STRING; break; case LOG_DATE: if (*cur_ == ']') { newField(); if (cur_ + 1 != end_) cur_++; return fieldToken(token_begin + 1, cur_ - 1, row, col); } break; } } // Reached end of Source: cur_ == end_ moreTokens_ = false; switch (state_) { case LOG_DELIM: if (col_ == 0) { return Token(TOKEN_EOF, row, col); } else { return Token(TOKEN_EMPTY, row, col); } case LOG_QUOTE: return fieldToken(token_begin + 1, end_ - 1, row, col); case LOG_STRING: return fieldToken(token_begin + 1, end_, row, col); case LOG_ESCAPE: warn(row, col, "closing escape at end of file"); return fieldToken(token_begin + 1, end_, row, col); case LOG_DATE: warn(row, col, "closing ] at end of file"); return fieldToken(token_begin + 1, end_, row, col); case LOG_FIELD: return fieldToken(token_begin, end_, row, col); } return Token(TOKEN_EOF, row, col); } private: void newField() { col_++; state_ = LOG_DELIM; } void newRecord() { row_++; col_ = 0; state_ = LOG_DELIM; } Token fieldToken(SourceIterator begin, SourceIterator end, int row, int col) { return Token(begin, end, row, col, false).flagNA(std::vector(1, "-")); } }; #endif readr/NAMESPACE0000644000175100001440000000446113106315444012612 0ustar hornikusers# Generated by roxygen2: do not edit by hand S3method(as.col_spec,"NULL") S3method(as.col_spec,character) S3method(as.col_spec,col_spec) S3method(as.col_spec,default) S3method(as.col_spec,list) S3method(format,col_spec) S3method(output_column,POSIXt) S3method(output_column,default) S3method(output_column,double) S3method(print,col_spec) S3method(print,collector) S3method(print,date_names) S3method(print,locale) export(ChunkCallback) export(DataFrameCallback) export(ListCallback) export(SideEffectChunkCallback) export(col_character) export(col_date) export(col_datetime) export(col_double) export(col_factor) export(col_guess) export(col_integer) export(col_logical) export(col_number) export(col_skip) export(col_time) export(cols) export(cols_condense) export(cols_only) export(count_fields) export(datasource) export(date_names) export(date_names_lang) export(date_names_langs) export(default_locale) export(format_csv) export(format_delim) export(format_tsv) export(fwf_cols) export(fwf_empty) export(fwf_positions) export(fwf_widths) export(guess_encoding) export(guess_parser) export(locale) export(output_column) export(parse_character) export(parse_date) export(parse_datetime) export(parse_double) export(parse_factor) export(parse_guess) export(parse_integer) export(parse_logical) export(parse_number) export(parse_time) export(parse_vector) export(problems) 
export(read_csv) export(read_csv2) export(read_csv2_chunked) export(read_csv_chunked) export(read_delim) export(read_delim_chunked) export(read_file) export(read_file_raw) export(read_fwf) export(read_lines) export(read_lines_chunked) export(read_lines_raw) export(read_log) export(read_rds) export(read_table) export(read_table2) export(read_tsv) export(read_tsv_chunked) export(readr_example) export(spec) export(spec_csv) export(spec_csv2) export(spec_delim) export(spec_table) export(spec_tsv) export(stop_for_problems) export(tokenize) export(tokenizer_csv) export(tokenizer_delim) export(tokenizer_fwf) export(tokenizer_line) export(tokenizer_log) export(tokenizer_tsv) export(tokenizer_ws) export(type_convert) export(write_csv) export(write_delim) export(write_excel_csv) export(write_file) export(write_lines) export(write_rds) export(write_tsv) importClassesFrom(Rcpp,"C++Object") importFrom(R6,R6Class) importFrom(hms,hms) importFrom(tibble,tibble) useDynLib(readr) readr/NEWS.md0000644000175100001440000004557113106615757012502 0ustar hornikusers# readr 1.1.1 * Point release for test compatibility with tibble v1.3.1. * Fixed undefined behavior in localtime.c when using `locale(tz = "")` after loading a timezone due to incomplete reinitialization of the global locale. # readr 1.1.0 ## New features ### Parser improvements * `parse_factor()` gains an `include_na` argument, to include `NA` in the factor levels (#541). * `parse_factor()` can now accept `levels = NULL`, which allows one to generate factor levels based on the data (like stringsAsFactors = TRUE) (#497). * `parse_numeric()` now returns the full string if it contains no numbers (#548). * `parse_time()` now correctly handles 12 AM/PM (#579). * `problems()` now returns the file path in addition to the location of the error in the file (#581). * `read_csv2()` gives a message if it updates the default locale (#443, @krlmlr). * `read_delim()` now signals an error if given an empty delimiter (#557). * `write_*()` functions writing whole number doubles no longer append a trailing `.0` (#526). ### Whitespace / fixed width improvements * `fwf_cols()` allows for specifying the `col_positions` argument of `read_fwf()` with named arguments of either column positions or widths (#616, @jrnold). * `fwf_empty()` gains an `n` argument to control how many lines are read for whitespace to determine column structure (#518, @Yeedle). * `read_fwf()` gives an error message if specifications have overlapping columns (#534, @gergness). * `read_table()` can now handle `pipe()` connections (#552). * `read_table()` can now handle files with many lines of leading comments (#563). * `read_table2()` which allows any number of whitespace characters as delimiters, a more exact replacement for `utils::read.table()` (#608). ## Writing to connections * `write_*()` functions now support writing to binary connections. In addition, output filenames with `.gz`, `.bz2` or `.xz` will automatically open the appropriate connection and write the compressed file (#348). * `write_lines()` now accepts a list of raw vectors (#542). ## Miscellaneous features * `col_euro_double()`, `parse_euro_double()`, `col_numeric()`, and `parse_numeric()` have been removed. * `guess_encoding()` returns a tibble, and works better with lists of raw vectors (as returned by `read_lines_raw()`). * `ListCallback` R6 class to provide a more flexible return type for callback functions (#568, @mmuurr) * `tibble::as.tibble()` is now used to construct tibbles (#538).
* `read_csv`, `read_csv2`, and `read_tsv` gain a `quote` argument (#631, @noamross) ## Bugfixes * `parse_factor()` now converts data to UTF-8 based on the supplied locale (#615). * `read_*()` functions with the `guess_max` argument now throw errors on inappropriate inputs (#588). * `read_*_chunked()` functions now properly end the stream if `FALSE` is returned from the callback. * `read_delim()` and `read_fwf()` when columns are skipped using `col_types` now report the correct column name (#573, @cb4ds). * `spec()` declarations that are long now print properly (#597). * `read_table()` does not print `spec` when `col_types` is not `NULL` (#630, @jrnold). * `guess_encoding()` now returns a tibble for all ASCII input as well (#641). # readr 1.0.0 ## Column guessing The process by which readr guesses the types of columns has received a substantial overhaul to make it easier to fix problems when the initial guesses aren't correct, and to make it easier to generate reproducible code. Now column specifications are printed by default when you read from a file: ```R challenge <- read_csv(readr_example("challenge.csv")) #> Parsed with column specification: #> cols( #> x = col_integer(), #> y = col_character() #> ) ``` And you can extract those values after the fact with `spec()`: ```R spec(challenge) #> cols( #> x = col_integer(), #> y = col_character() #> ) ``` This makes it easier to quickly identify parsing problems and fix them (#314). If the column specification is long, the new `cols_condense()` is used to condense the spec by identifying the most common type and setting it as the default. This is particularly useful when only a handful of columns have a different type (#466). You can also generate an initial specification without parsing the file using `spec_csv()`, `spec_tsv()`, etc. Once you have figured out the correct column types for a file, it's often useful to make the parsing strict. You can do this either by copying and pasting the printed output, or for very long specs, saving the spec to disk with `write_rds()`. In production scripts, combine this with `stop_for_problems()` (#465): if the input data changes form, you'll fail fast with an error. You can now also adjust the number of rows that readr uses to guess the column types with `guess_max`: ```R challenge <- read_csv(readr_example("challenge.csv"), guess_max = 1500) #> Parsed with column specification: #> cols( #> x = col_double(), #> y = col_date(format = "") #> ) ``` You can now access the guessing algorithm from R. `guess_parser()` will tell you which parser readr will select for a character vector (#377). We've made a number of fixes to the guessing algorithm: * New example `extdata/challenge.csv` which is carefully created to cause problems with the default column type guessing heuristics. * Blank lines and lines with only comments are now skipped automatically without warning (#381, #321). * Single '-' or '.' are now parsed as characters, not numbers (#297). * Numbers followed by a single trailing character are parsed as character, not numbers (#316). * We now guess at times using the `time_format` specified in the `locale()`. We have made a number of improvements to the reification of the `col_types`, `col_names` and the actual data: * If `col_types` is too long, it is subsetted correctly (#372, @jennybc). * If `col_names` is too short, the added names are numbered correctly (#374, @jennybc). * Missing column names are now given a default name (`X2`, `X7` etc) (#318).
Duplicated column names are now deduplicated. Both changes generate a warning; to suppress it supply an explicit `col_names` (setting `skip = 1` if there's an existing ill-formed header). * `col_types()` accepts a named list as input (#401). ## Column parsing The date time parsers recognise three new format strings: * `%I` for 12 hour time format (#340). * `%AD` and `%AT` are "automatic" date and time parsers. They are both slightly less flexible than previous defaults. The automatic date parser requires a four digit year, and only accepts `-` and `/` as separators (#442). The flexible time parser now requires colons between hours and minutes and optional seconds (#424). `%y` and `%Y` are now strict and require 2 or 4 characters respectively. Date and time parsing functions received a number of small enhancements: * `parse_time()` returns `hms` objects rather than a custom `time` class (#409). It now correctly parses missing values (#398). * `parse_date()` returns a numeric vector (instead of an integer vector) (#357). * `parse_date()`, `parse_time()` and `parse_datetime()` gain an `na` argument to match all other parsers (#413). * If the format argument is omitted in `parse_date()` or `parse_time()`, date and time formats specified in the locale will be used. These now default to `%AD` and `%AT` respectively. * You can now parse partial dates with `parse_date()` and `parse_datetime()`, e.g. `parse_date("2001", "%Y")` returns `2001-01-01`. `parse_number()` is slightly more flexible - it now parses numbers up to the first ill-formed character. For example `parse_number("-3-")` and `parse_number("...3...")` now return -3 and 3 respectively. We also fixed a major bug where parsing negative numbers yielded positive values (#308). `parse_logical()` now accepts `0`, `1` as well as lowercase `t`, `f`, `true`, `false`. ## New readers and writers * `read_file_raw()` reads a complete file into a single raw vector (#451). * `read_*()` functions gain a `quoted_na` argument to control whether missing values within quotes are treated as missing values or as strings (#295). * `write_excel_csv()` can be used to write a csv file with a UTF-8 BOM at the start, which forces Excel to read it as UTF-8 encoded (#375). * `write_lines()` writes a character vector to a file (#302). * `write_file()` to write a single character or raw vector to a file (#474). * Experimental support for chunked reading and writing (`read_*_chunked()`) functions. The API is unstable and subject to change in the future (#427). ## Minor features and bug fixes * Printing double values now uses an [implementation](https://github.com/juj/MathGeoLib/blob/master/src/Math/grisu3.c) of the [grisu3 algorithm](http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf) which speeds up writing of large numeric data frames by ~10X. (#432) '.0' is appended to whole number doubles, to ensure they will be read as doubles as well. (#483) * readr imports tibble so that you get consistent `tbl_df` behaviour (#317, #385). * New example `extdata/challenge.csv` which is carefully created to cause problems with the default column type guessing heuristics. * `default_locale()` now sets the default locale in `readr.default_locale` rather than regenerating it for each call. (#416). * `locale()` now automatically sets decimal mark if you set the grouping mark. It throws an error if you accidentally set decimal and grouping marks to the same character (#450).
* All `read_*()` can read into long vectors, substantially increasing the number of rows you can read (#309). * All `read_*()` functions return empty objects rather than signaling an error when run on an empty file (#356, #441). * `read_delim()` gains a `trim_ws` argument (#312, @noamross) * `read_fwf()` received a number of improvements: * `read_fwf()` can now reliably read only a partial set of columns (#322, #353, #469) * `fwf_widths()` accepts negative column widths for compatibility with the `widths` argument in `read.fwf()` (#380, @leeper). * You can now read fixed width files with ragged final columns, by setting the final end position in `fwf_positions()` or final width in `fwf_widths()` to `NA` (#353, @ghaarsma). `fwf_empty()` does this automatically. * `read_fwf()` and `fwf_empty()` can now skip commented lines by setting a `comment` argument (#334). * `read_lines()` ignores embedded nulls in strings (#338) and gains a `na` argument (#479). * `readr_example()` makes it easy to access example files bundled with readr. * `type_convert()` now accepts only `NULL` or a `cols` specification for `col_types` (#369). * `write_delim()` and `write_csv()` now invisibly return the input data frame (as documented, #363). * Doubles are parsed with `boost::spirit::qi::long_double` to work around a bug in the spirit library when parsing large numbers (#412). * Fix bug when detecting column types for single row files without headers (#333). # readr 0.2.2 * Fix bug when checking empty values for missingness (caused valgrind issue and random crashes). # readr 0.2.1 * Fixes so that readr works on Solaris. # readr 0.2.0 ## Internationalisation readr now has a strategy for dealing with settings that vary from place to place: locales. The default locale is still US-centric (because R itself is), but you can now easily override the default timezone, decimal separator, grouping mark, day & month names, date format, and encoding. This has led to a number of changes: * `read_csv()`, `read_tsv()`, `read_fwf()`, `read_table()`, `read_lines()`, `read_file()`, `type_convert()`, `parse_vector()` all gain a `locale` argument. * `locale()` controls all the input settings that vary from place-to-place. * `col_euro_double()` and `parse_euro_double()` have been deprecated. Use the `decimal_mark` parameter to `locale()` instead. * The default encoding is now UTF-8. To load files that are not in UTF-8, set the `encoding` parameter of the `locale()` (#40). New `guess_encoding()` function uses stringi to help you figure out the encoding of a file. * `parse_datetime()` and `parse_date()` with `%B` and `%b` use the month names (full and abbreviated) defined in the locale (#242). They also inherit the tz from the locale, rather than using an explicit `tz` parameter. See `vignette("locales")` for more details. ## File parsing improvements * `cols()` lets you pick the default column type for columns not otherwise explicitly named (#148). You can refer to parsers either with their full name (e.g. `col_character()`) or their one letter abbreviation (e.g. `c`). * `cols_only()` allows you to load only named columns. You can also choose to override the default column type in `cols()` (#72). * `read_fwf()` is now much more careful with new lines. If a line is too short, you'll get a warning instead of a silent mistake (#166, #254). Additionally, the last column can now be ragged: the width of the last field is silently extended until it hits the next line break (#146).
This appears to be a common feature of "fixed" width files in the wild. * In `read_csv()`, `read_tsv()`, `read_delim()` etc: * `comment` argument allows you to ignore comments (#68). * `trim_ws` argument controls whether leading and trailing whitespace is removed. It defaults to `TRUE` (#137). * Specifying the wrong number of column names, or having rows with an unexpected number of columns, generates a warning, rather than an error (#189). * Multiple NA values can be specified by passing a character vector to `na` (#125). The default has been changed to `na = c("", "NA")`. Specifying `na = ""` now works as expected with character columns (#114). ## Column parsing improvements Readr gains `vignette("column-types")` which describes how the defaults work and how to override them (#122). * `parse_character()` gains better support for embedded nulls: any characters after the first null are dropped with a warning (#202). * `parse_integer()` and `parse_double()` no longer silently ignore trailing letters after the number (#221). * New `parse_time()` and `col_time()` allow you to parse times (hours, minutes, seconds) into number of seconds since midnight. If the format is omitted, it uses a flexible parser that looks for hours, then optional colon, then minutes, then optional colon, then optional seconds, then optional am/pm (#249). * `parse_date()` and `parse_datetime()`: * `parse_datetime()` no longer incorrectly reads partial dates (e.g. 19, 1900, 1900-01) (#136). These triggered common false positives and after re-reading the ISO8601 spec, I believe they actually refer to periods of time, and should not be translated into a specific instant (#228). * Compound formats "%D", "%F", "%R", "%X", "%T", "%x" are now parsed correctly, instead of using the ISO8601 parser (#178, @kmillar). * "%." now requires a non-digit. New "%+" skips one or more non-digits. * You can now use `%p` to refer to AM/PM (and am/pm) (#126). * `%b` and `%B` formats (month and abbreviated month name) ignore case when matching (#219). * Local (non-UTC) times with and without daylight savings are now parsed correctly (#120, @andres-s). * `parse_number()` is a somewhat flexible numeric parser designed to read currencies and percentages. It only reads the first number from a string (using the grouping mark defined by the locale). * `parse_numeric()` has been deprecated because the name is confusing - it's a flexible number parser, not a parser of "numerics", as R collectively calls doubles and integers. Use `parse_number()` instead. As well as improvements to the parser, I've also made a number of tweaks to the heuristics that readr uses to guess column types: * New `parse_guess()` and `col_guess()` to explicitly guess column type. * Bumped up row inspection for column type guessing from 100 to 1000. * The heuristics for guessing `col_integer()` and `col_double()` are stricter. Numbers with leading zeros now default to being parsed as text, rather than as integers/doubles (#266). * A column is guessed as `col_number()` only if it parses as a regular number when you ignore the grouping marks. ## Minor improvements and bug fixes * Now use R's platform independent `iconv` wrapper, thanks to BDR (#149). * Pathological zero row inputs (due to empty input, `skip` or `n_max`) now return zero row data frames (#119). * When guessing field types, and there's no information to go on, use character instead of logical (#124, #128). * Concise `col_types` specification now understands `?` (guess) and `-` (skip) (#188).
* `count_fields()` starts counting from 1, not 0 (#200).

* `format_csv()` and `format_delim()` make it easy to render a csv or
  delimited file into a string.

* `fwf_empty()` now works correctly when `col_names` is supplied (#186, #222).

* `parse_*()` gains a `na` argument that allows you to specify which values
  should be converted to missing.

* `problems()` now reports column names rather than column numbers (#143).
  Whenever there is a problem, the first five problems are printed out in a
  warning message, so you can more easily see what's wrong.

* `read_*()` throws a warning instead of an error if `col_types` specifies a
  non-existent column (#145, @alyst).

* `read_*()` can read from a remote gz compressed file (#163).

* `read_delim()` defaults to `escape_backslash = FALSE` and
  `escape_double = TRUE` for consistency. `n_max` also affects the number of
  rows read to guess the column types (#224).

* `read_lines()` gains a progress bar. It now also correctly checks for
  interrupts every 500,000 lines so you can interrupt long running jobs. It
  also correctly estimates the number of lines in the file, considerably
  speeding up the reading of large files (60s -> 15s for a 1.5 Gb file).

* `read_lines_raw()` allows you to read a file into a list of raw vectors,
  one element for each line.

* `type_convert()` gains `NA` and `trim_ws` arguments, and removes missing
  values before determining column types.

* `write_csv()`, `write_delim()`, and `write_rds()` all invisibly return
  their input so you can use them in a pipe (#290).

* `write_delim()` generalises `write_csv()` to write any delimited format
  (#135). `write_tsv()` is a helpful wrapper for tab separated files.

  * Quotes are only used when they're needed (#116): when the string contains
    a quote, the delimiter, a new line or NA.

  * Double vectors are saved using the same amount of precision as
    `as.character()` (#117).

  * New `na` argument that specifies how missing values should be written
    (#187).

  * POSIXt vectors are saved in an ISO8601-compatible format (#134).

  * No longer fails silently if it can't open the target for writing
    (#193, #172).

* `write_rds()` and `read_rds()` wrap around `readRDS()` and `saveRDS()`,
  defaulting to no compression (#140, @nicolasCoutin).
readr/R/0000755000175100001440000000000013106621354011567 5ustar hornikusersreadr/R/write.R0000644000175100001440000001177313106315444013055 0ustar hornikusers#' Write a data frame to a delimited file
#'
#' This is about twice as fast as [write.csv()], and never
#' writes row names. `output_column()` is a generic method used to coerce
#' columns to suitable output.
#'
#' @section Output:
#' Factors are coerced to character. Doubles are formatted using the grisu3
#' algorithm. POSIXct's are formatted as ISO8601.
#'
#' All columns are encoded as UTF-8. `write_excel_csv()` also includes a
#' \href{https://en.wikipedia.org/wiki/Byte_order_mark}{UTF-8 Byte order mark}
#' which indicates to Excel the csv is UTF-8 encoded.
#'
#' Values are only quoted if needed: if they contain a comma, quote or newline.
#'
#' @param x A data frame to write to disk
#' @param path Path or connection to write to.
#' @param append If `FALSE`, will overwrite existing file. If `TRUE`,
#'   will append to existing file. In both cases, if file does not exist a new
#'   file is created.
#' @param col_names Write columns names at the top of the file?
#' @param delim Delimiter used to separate values. Defaults to `" "`. Must be
#'   a single character.
#' @param na String used for missing values. Defaults to NA. Missing values
#'   will never be quoted; strings with the same value as `na` will
#'   always be quoted.
#' @return `write_*()` returns the input `x` invisibly.
#' @references Florian Loitsch, Printing Floating-Point Numbers Quickly and
#'   Accurately with Integers, PLDI '10,
#'   \url{http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf}
#' @export
#' @examples
#' tmp <- tempfile()
#' write_csv(mtcars, tmp)
#' head(read_csv(tmp))
#'
#' # format_* is useful for testing and reprexes
#' cat(format_csv(head(mtcars)))
#' cat(format_tsv(head(mtcars)))
#' cat(format_delim(head(mtcars), ";"))
#'
#' df <- data.frame(x = c(1, 2, NA))
#' format_csv(df, na = ".")
#'
#' # Quotes are automatically added as needed
#' df <- data.frame(x = c("a", '"', ",", "\n"))
#' cat(format_csv(df))
#'
#' # An output connection will be automatically created for output filenames
#' # with appropriate extensions.
#' dir <- tempdir()
#' write_tsv(mtcars, file.path(dir, "mtcars.tsv.gz"))
#' write_tsv(mtcars, file.path(dir, "mtcars.tsv.bz2"))
#' write_tsv(mtcars, file.path(dir, "mtcars.tsv.xz"))
write_delim <- function(x, path, delim = " ", na = "NA", append = FALSE,
                        col_names = !append) {
  stopifnot(is.data.frame(x))

  x_out <- lapply(x, output_column)
  stream_delim(x_out, path, delim, col_names = col_names, append = append,
    na = na)

  invisible(x)
}

#' @rdname write_delim
#' @export
write_csv <- function(x, path, na = "NA", append = FALSE, col_names = !append) {
  write_delim(x, path, delim = ",", na = na, append = append,
    col_names = col_names)
}

#' @rdname write_delim
#' @export
write_excel_csv <- function(x, path, na = "NA", append = FALSE,
                            col_names = !append) {
  stopifnot(is.data.frame(x))

  x_out <- lapply(x, output_column)
  stream_delim(x_out, path, ",", col_names = col_names, append = append,
    na = na, bom = TRUE)

  invisible(x)
}

#' @rdname write_delim
#' @export
write_tsv <- function(x, path, na = "NA", append = FALSE, col_names = !append) {
  write_delim(x, path, delim = '\t', na = na, append = append,
    col_names = col_names)
}

#' Convert a data frame to a delimited string
#'
#' These functions are equivalent to [write_csv()] etc., but instead
#' of writing to disk, they return a string.
#'
#' @return A string.
#' @inherit write_delim
#' @export
format_delim <- function(x, delim, na = "NA", append = FALSE,
                         col_names = !append) {
  stopifnot(is.data.frame(x))
  x <- lapply(x, output_column)
  stream_delim(x, NULL, delim, col_names = col_names, append = append, na = na)
}

#' @export
#' @rdname format_delim
format_csv <- function(x, na = "NA", append = FALSE, col_names = !append) {
  format_delim(x, delim = ",", na = na, append = append, col_names = col_names)
}

#' @export
#' @rdname format_delim
format_tsv <- function(x, na = "NA", append = FALSE, col_names = !append) {
  format_delim(x, delim = "\t", na = na, append = append, col_names = col_names)
}

#' Preprocess column for output
#'
#' This is a generic function that is applied to each column before it is
#' saved to disk. It provides a hook for S3 classes that need special
#' handling.
#'
#' @keywords internal
#' @param x A vector
#' @export
#' @examples
#' # Most columns are left as is, but POSIXct are
#' # converted to ISO8601.
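#' # (A sketch of the extension point: a package could provide a method for
#' # its own class, e.g. `output_column.myclass <- function(x) format(x)`;
#' # `myclass` is hypothetical, not a readr class.)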
#' x <- parse_datetime("2016-01-01")
#' str(output_column(x))
output_column <- function(x) {
  UseMethod("output_column")
}

#' @export
output_column.default <- function(x) {
  if (!is.object(x)) return(x)
  as.character(x)
}

#' @export
output_column.double <- function(x) {
  x
}

#' @export
output_column.POSIXt <- function(x) {
  format(x, "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
}

stream_delim <- function(df, path, append = FALSE, ...) {
  path <- standardise_path(path, input = FALSE)

  if (inherits(path, "connection") && !isOpen(path)) {
    on.exit(close(path), add = TRUE)
    if (isTRUE(append)) {
      open(path, "ab")
    } else {
      open(path, "wb")
    }
  }

  stream_delim_(df, path, ...)
}
readr/R/locale.R0000644000175100001440000001036613106315444013157 0ustar hornikusers#' Create locales
#'
#' A locale object tries to capture all the defaults that can vary between
#' countries. You set the locale once, and the details are automatically
#' passed down to the column parsers. The defaults have been chosen to
#' match R (i.e. US English) as closely as possible. See
#' `vignette("locales")` for more details.
#'
#' @param date_names Character representations of day and month names. Either
#'   the language code as string (passed on to [date_names_lang()])
#'   or an object created by [date_names()].
#' @param date_format,time_format Default date and time formats.
#' @param decimal_mark,grouping_mark Symbols used to indicate the decimal
#'   place, and to chunk larger numbers. Decimal mark can only be `,` or
#'   `.`.
#' @param tz Default tz. This is used both for input (if the time zone isn't
#'   present in individual strings), and for output (to control the default
#'   display). The default is to use "UTC", a time zone that does not use
#'   daylight savings time (DST) and hence is typically most useful for data.
#'   The absence of time zones makes it approximately 50x faster to generate
#'   UTC times than any other time zone.
#'
#'   Use `""` to use the system default time zone, but beware that this
#'   will not be reproducible across systems.
#'
#'   For a complete list of possible time zones, see \code{\link{OlsonNames}()}.
#'   Americans, note that "EST" is a Canadian time zone that does not have
#'   DST. It is \emph{not} Eastern Standard Time. It's better to use
#'   "US/Eastern", "US/Central" etc.
#' @param encoding Default encoding. This only affects how the file is
#'   read - readr always converts the output to UTF-8.
#' @param asciify Should diacritics be stripped from date names and converted
#'   to ASCII? This is useful if you're dealing with ASCII data where the
#'   correct spellings have been lost. Requires the \pkg{stringi} package.
#' @export
#' @examples
#' locale()
#' locale("fr")
#'
#' # South American locale
#' locale("es", decimal_mark = ",")
locale <- function(date_names = "en",
                   date_format = "%AD", time_format = "%AT",
                   decimal_mark = ".", grouping_mark = ",",
                   tz = "UTC", encoding = "UTF-8",
                   asciify = FALSE) {
  if (is.character(date_names)) {
    date_names <- date_names_lang(date_names)
  }
  stopifnot(is.date_names(date_names))
  if (asciify) {
    date_names[] <- lapply(date_names, stringi::stri_trans_general,
      id = "latin-ascii")
  }

  if (missing(grouping_mark) && !missing(decimal_mark)) {
    grouping_mark <- if (decimal_mark == ".") "," else "."
  } else if (missing(decimal_mark) && !missing(grouping_mark)) {
    decimal_mark <- if (grouping_mark == ".") "," else "."
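    # If only one of decimal_mark/grouping_mark is supplied, the other is
    # defaulted to the complementary character, so the two can never collide.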
  }
  stopifnot(decimal_mark %in% c(".", ","))
  stopifnot(is.character(grouping_mark), length(grouping_mark) == 1)
  if (decimal_mark == grouping_mark) {
    stop("`decimal_mark` and `grouping_mark` must be different", call. = FALSE)
  }

  check_tz(tz)
  check_encoding(encoding)

  structure(
    list(
      date_names = date_names,
      date_format = date_format,
      time_format = time_format,
      decimal_mark = decimal_mark,
      grouping_mark = grouping_mark,
      tz = tz,
      encoding = encoding
    ),
    class = "locale"
  )
}

is.locale <- function(x) inherits(x, "locale")

#' @export
print.locale <- function(x, ...) {
  cat("<locale>\n")
  cat("Numbers:  ", prettyNum(123456.78, big.mark = x$grouping_mark,
    decimal.mark = x$decimal_mark, digits = 8), "\n", sep = "")
  cat("Formats:  ", x$date_format, " / ", x$time_format, "\n", sep = "")
  cat("Timezone: ", x$tz, "\n", sep = "")
  cat("Encoding: ", x$encoding, "\n", sep = "")
  print(x$date_names)
}

#' @export
#' @rdname locale
default_locale <- function() {
  loc <- getOption("readr.default_locale")
  if (is.null(loc)) {
    loc <- locale()
    options("readr.default_locale" = loc)
  }

  loc
}

check_tz <- function(x) {
  stopifnot(is.character(x), length(x) == 1)

  if (identical(x, "")) return(TRUE)
  if (x %in% OlsonNames()) return(TRUE)

  stop("Unknown TZ ", x, call. = FALSE)
}

check_encoding <- function(x) {
  stopifnot(is.character(x), length(x) == 1)

  if (tolower(x) %in% tolower(iconvlist())) return(TRUE)

  stop("Unknown encoding ", x, call. = FALSE)
}
readr/R/count_fields.R0000644000175100001440000000115513106315444014372 0ustar hornikusers#' Count the number of fields in each line of a file
#'
#' This is useful for diagnosing problems with functions that fail
#' to parse correctly.
#'
#' @inheritParams datasource
#' @param tokenizer A tokenizer that specifies how to break the `file`
#'   up into fields, e.g., [tokenizer_csv()],
#'   [tokenizer_fwf()]
#' @param n_max Optionally, maximum number of rows to count fields for.
#' @export
#' @examples
#' count_fields(readr_example("mtcars.csv"), tokenizer_csv())
count_fields <- function(file, tokenizer, skip = 0, n_max = -1L) {
  ds <- datasource(file, skip = skip)
  count_fields_(ds, tokenizer, n_max)
}
readr/R/utils.R0000644000175100001440000000117013106315444013051 0ustar hornikusers# Silence R CMD check note
#' @importFrom tibble tibble
NULL

isFALSE <- function(x) identical(x, FALSE)

is.connection <- function(x) inherits(x, "connection")

`%||%` <- function(a, b) if (is.null(a)) b else a

is_syntactic <- function(x) make.names(x) == x

show_progress <- function() {
  isTRUE(getOption("readr.show_progress")) && # user disables progress bar
    interactive() && # an interactive session
    is.null(getOption("knitr.in.progress")) # Not actively knitting a document
}

deparse2 <- function(expr, ..., sep = "\n") {
  paste(deparse(expr, ...), collapse = sep)
}

is_integerish <- function(x) {
  floor(x) == x
}
readr/R/date-symbols.R0000644000175100001440000000460213106315444014317 0ustar hornikusers#' Create or retrieve date names
#'
#' When parsing dates, you often need to know how days of the week and
#' months are represented as text. This pair of functions allows you to
#' either create your own, or retrieve from a standard list. The standard
#' list is derived from ICU (\url{http://site.icu-project.org}) via the
#' stringi package.
#'
#' @param mon,mon_ab Full and abbreviated month names.
#' @param day,day_ab Full and abbreviated week day names. Starts with Sunday.
#' @param am_pm Names used for AM and PM.
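#' @details
#' Names can also be built by hand; a minimal English-only sketch using base
#' R's `month.name` and `month.abb` constants (an illustration of the
#' expected shapes, not a readr-specific API):
#' \preformatted{
#' date_names(
#'   mon = month.name, mon_ab = month.abb,
#'   day = c("Sunday", "Monday", "Tuesday", "Wednesday",
#'           "Thursday", "Friday", "Saturday")
#' )
#' }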
#' @export
#' @examples
#' date_names_lang("en")
#' date_names_lang("ko")
#' date_names_lang("fr")
date_names <- function(mon, mon_ab = mon, day, day_ab = day,
                       am_pm = c("AM", "PM")) {
  stopifnot(is.character(mon), length(mon) == 12)
  stopifnot(is.character(mon_ab), length(mon_ab) == 12)
  stopifnot(is.character(day), length(day) == 7)
  stopifnot(is.character(day_ab), length(day_ab) == 7)

  structure(
    list(
      mon = enc2utf8(mon),
      mon_ab = enc2utf8(mon_ab),
      day = enc2utf8(day),
      day_ab = enc2utf8(day_ab),
      am_pm = enc2utf8(am_pm)
    ),
    class = "date_names"
  )
}

#' @export
#' @rdname date_names
#' @param language A BCP 47 locale, made up of a language and a region,
#'   e.g. `"en_US"` for American English. See `date_names_locales()`
#'   for a complete list of available locales.
date_names_lang <- function(language) {
  stopifnot(is.character(language), length(language) == 1)

  symbols <- date_symbols[[language]]
  if (is.null(symbols)) {
    stop("Unknown language '", language, "'", call. = FALSE)
  }

  symbols
}

#' @export
#' @rdname date_names
date_names_langs <- function() {
  names(date_symbols)
}

#' @export
print.date_names <- function(x, ...) {
  cat("<date_names>\n")

  if (identical(x$day, x$day_ab)) {
    day <- paste0(x$day, collapse = ", ")
  } else {
    day <- paste0(x$day, " (", x$day_ab, ")", collapse = ", ")
  }

  if (identical(x$mon, x$mon_ab)) {
    mon <- paste0(x$mon, collapse = ", ")
  } else {
    mon <- paste0(x$mon, " (", x$mon_ab, ")", collapse = ", ")
  }
  am_pm <- paste0(x$am_pm, collapse = "/")

  cat_wrap("Days:   ", day)
  cat_wrap("Months: ", mon)
  cat_wrap("AM/PM:  ", am_pm)
}

is.date_names <- function(x) inherits(x, "date_names")

cat_wrap <- function(header, body) {
  body <- strwrap(body, exdent = nchar(header))
  cat(header, paste(body, collapse = "\n"), "\n", sep = "")
}
readr/R/read_delim.R0000644000175100001440000002436413106315672014013 0ustar hornikusers#' @useDynLib readr
#' @importClassesFrom Rcpp "C++Object"
NULL

#' Read a delimited file (including csv & tsv) into a tibble
#'
#' `read_csv()` and `read_tsv()` are special cases of the general
#' `read_delim()`. They're useful for reading the most common types of
#' flat file data, comma separated values and tab separated values,
#' respectively. `read_csv2()` uses `;` for separators, instead of
#' `,`. This is common in European countries which use `,` as the
#' decimal separator.
#' @inheritParams datasource
#' @inheritParams tokenizer_delim
#' @param col_names Either `TRUE`, `FALSE` or a character vector
#'   of column names.
#'
#'   If `TRUE`, the first row of the input will be used as the column
#'   names, and will not be included in the data frame. If `FALSE`, column
#'   names will be generated automatically: X1, X2, X3 etc.
#'
#'   If `col_names` is a character vector, the values will be used as the
#'   names of the columns, and the first row of the input will be read into
#'   the first row of the output data frame.
#'
#'   Missing (`NA`) column names will generate a warning, and be filled
#'   in with dummy names `X1`, `X2` etc. Duplicate column names
#'   will generate a warning and be made unique with a numeric prefix.
#' @param col_types One of `NULL`, a [cols()] specification, or
#'   a string. See `vignette("column-types")` for more details.
#'
#'   If `NULL`, all column types will be imputed from the first 1000 rows
#'   of the input. This is convenient (and fast), but not robust. If the
#'   imputation fails, you'll need to supply the correct types yourself.
#'
#'   If a column specification created by [cols()] is supplied, it must
#'   contain one column specification for each column.
#'   If you only want to read a subset of the columns, use [cols_only()].
#'
#'   Alternatively, you can use a compact string representation where each
#'   character represents one column:
#'   c = character, i = integer, n = number, d = double,
#'   l = logical, D = date, T = date time, t = time, ? = guess, or
#'   `_`/`-` to skip the column.
#' @param locale The locale controls defaults that vary from place to place.
#'   The default locale is US-centric (like R), but you can use
#'   [locale()] to create your own locale that controls things like
#'   the default time zone, encoding, decimal mark, big mark, and day/month
#'   names.
#' @param n_max Maximum number of records to read.
#' @param guess_max Maximum number of records to use for guessing column types.
#' @param progress Display a progress bar? By default it will only display
#'   in an interactive session and not while knitting a document. The display
#'   is updated every 50,000 values and will only display if estimated reading
#'   time is 5 seconds or more. The automatic progress bar can be disabled by
#'   setting option \code{readr.show_progress} to \code{FALSE}.
#' @return A data frame. If there are parsing problems, a warning tells you
#'   how many, and you can retrieve the details with \code{\link{problems}()}.
#' @export
#' @examples
#' # Input sources -------------------------------------------------------------
#' # Read from a path
#' read_csv(readr_example("mtcars.csv"))
#' read_csv(readr_example("mtcars.csv.zip"))
#' read_csv(readr_example("mtcars.csv.bz2"))
#' read_csv("https://github.com/tidyverse/readr/raw/master/inst/extdata/mtcars.csv")
#'
#' # Or directly from a string (must contain a newline)
#' read_csv("x,y\n1,2\n3,4")
#'
#' # Column types --------------------------------------------------------------
#' # By default, readr guesses the column types, looking at the first 1000 rows.
#' # You can override with a compact specification:
#' read_csv("x,y\n1,2\n3,4", col_types = "dc")
#'
#' # Or with a list of column types:
#' read_csv("x,y\n1,2\n3,4", col_types = list(col_double(), col_character()))
#'
#' # If there are parsing problems, you get a warning, and can extract
#' # more details with problems()
#' y <- read_csv("x\n1\n2\nb", col_types = list(col_double()))
#' y
#' problems(y)
#'
#' # File types ----------------------------------------------------------------
#' read_csv("a,b\n1.0,2.0")
#' read_csv2("a;b\n1,0;2,0")
#' read_tsv("a\tb\n1.0\t2.0")
#' read_delim("a|b\n1.0|2.0", delim = "|")
read_delim <- function(file, delim, quote = '"',
                       escape_backslash = FALSE, escape_double = TRUE,
                       col_names = TRUE, col_types = NULL,
                       locale = default_locale(),
                       na = c("", "NA"), quoted_na = TRUE,
                       comment = "", trim_ws = FALSE,
                       skip = 0, n_max = Inf, guess_max = min(1000, n_max),
                       progress = show_progress()) {
  if (!nzchar(delim)) {
    stop("`delim` must be at least one character, ",
      "use `read_table()` for whitespace delimited input.", call. = FALSE)
  }
  tokenizer <- tokenizer_delim(delim, quote = quote,
    escape_backslash = escape_backslash, escape_double = escape_double,
    na = na, quoted_na = quoted_na, comment = comment, trim_ws = trim_ws)
  read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,
    locale = locale, skip = skip, comment = comment, n_max = n_max,
    guess_max = guess_max, progress = progress)
}

#' @rdname read_delim
#' @export
read_csv <- function(file, col_names = TRUE, col_types = NULL,
                     locale = default_locale(), na = c("", "NA"),
                     quoted_na = TRUE, quote = "\"", comment = "",
                     trim_ws = TRUE, skip = 0, n_max = Inf,
                     guess_max = min(1000, n_max),
                     progress = show_progress()) {
  tokenizer <- tokenizer_csv(na = na, quoted_na = quoted_na, quote = quote,
    comment = comment, trim_ws = trim_ws)
  read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,
    locale = locale, skip = skip, comment = comment, n_max = n_max,
    guess_max = guess_max, progress = progress)
}

#' @rdname read_delim
#' @export
read_csv2 <- function(file, col_names = TRUE, col_types = NULL,
                      locale = default_locale(), na = c("", "NA"),
                      quoted_na = TRUE, quote = "\"", comment = "",
                      trim_ws = TRUE, skip = 0, n_max = Inf,
                      guess_max = min(1000, n_max),
                      progress = show_progress()) {
  if (locale$decimal_mark == ".") {
    message("Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.")
    locale$decimal_mark <- ","
    locale$grouping_mark <- "."
  }
  tokenizer <- tokenizer_delim(delim = ";", na = na, quoted_na = quoted_na,
    quote = quote, comment = comment, trim_ws = trim_ws)
  read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,
    locale = locale, skip = skip, comment = comment, n_max = n_max,
    guess_max = guess_max, progress = progress)
}

#' @rdname read_delim
#' @export
read_tsv <- function(file, col_names = TRUE, col_types = NULL,
                     locale = default_locale(), na = c("", "NA"),
                     quoted_na = TRUE, quote = "\"", comment = "",
                     trim_ws = TRUE, skip = 0, n_max = Inf,
                     guess_max = min(1000, n_max),
                     progress = show_progress()) {
  tokenizer <- tokenizer_tsv(na = na, quoted_na = quoted_na, quote = quote,
    comment = comment, trim_ws = trim_ws)
  read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,
    locale = locale, skip = skip, comment = comment, n_max = n_max,
    guess_max = guess_max, progress = progress)
}

# Helper functions for reading from delimited files ----------------------------

read_tokens <- function(data, tokenizer, col_specs, col_names, locale_, n_max,
                        progress) {
  if (n_max == Inf) {
    n_max <- -1
  }
  read_tokens_(data, tokenizer, col_specs, col_names, locale_, n_max, progress)
}

read_delimited <- function(file, tokenizer, col_names = TRUE, col_types = NULL,
                           locale = default_locale(), skip = 0, comment = "",
                           n_max = Inf, guess_max = min(1000, n_max),
                           progress = show_progress()) {
  name <- source_name(file)
  # If connection needed, read once.
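  # (Connections cannot be rewound, so the whole stream is buffered into
  # memory up front; the same buffered data is then used twice below, once
  # to guess the column spec and once for the actual parse.)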
  file <- standardise_path(file)
  if (is.connection(file)) {
    data <- read_connection(file)
  } else {
    if (empty_file(file)) {
      return(tibble::data_frame())
    }
    data <- file
  }

  spec <- col_spec_standardise(
    data, skip = skip, comment = comment, guess_max = guess_max,
    col_names = col_names, col_types = col_types, tokenizer = tokenizer,
    locale = locale)

  ds <- datasource(data, skip = skip + isTRUE(col_names), comment = comment)

  if (is.null(col_types) && !inherits(ds, "source_string")) {
    show_cols_spec(spec)
  }

  out <- read_tokens(ds, tokenizer, spec$cols, names(spec$cols),
    locale_ = locale, n_max = n_max, progress = progress)

  out <- name_problems(out, names(spec$cols), name)
  attr(out, "spec") <- spec
  warn_problems(out)
}

generate_spec_fun <- function(x) {
  formals(x)$n_max <- 0
  formals(x)$guess_max <- 1000

  args <- formals(x)

  body(x) <- call("attr",
    as.call(c(substitute(x),
      stats::setNames(lapply(names(args), as.symbol), names(args)))),
    "spec")

  formals(x) <- args

  x
}

#' Generate a column specification
#'
#' When printed, only the first 20 columns are displayed by default. To
#' modify this, set `options(readr.num_columns)` (a value of 0 turns off
#' printing).
#'
#' @return The `col_spec` generated for the file.
#' @inheritParams read_delim
#' @export
#' @examples
#' # Input sources -------------------------------------------------------------
#' # Retrieve specs from a path
#' spec_csv(system.file("extdata/mtcars.csv", package = "readr"))
#' spec_csv(system.file("extdata/mtcars.csv.zip", package = "readr"))
#'
#' # Or directly from a string (must contain a newline)
#' spec_csv("x,y\n1,2\n3,4")
#'
#' # Column types --------------------------------------------------------------
#' # By default, readr guesses the column types, looking at the first 1000 rows.
#' # You can specify the number of rows used with guess_max.
#' spec_csv(system.file("extdata/mtcars.csv", package = "readr"), guess_max = 20)
spec_delim <- generate_spec_fun(read_delim)

#' @rdname spec_delim
#' @export
spec_csv <- generate_spec_fun(read_csv)

#' @rdname spec_delim
#' @export
spec_csv2 <- generate_spec_fun(read_csv2)

#' @rdname spec_delim
#' @export
spec_tsv <- generate_spec_fun(read_tsv)
readr/R/readr.R0000644000175100001440000000012213106315444013002 0ustar hornikusers#' @keywords internal
#' @importFrom hms hms
#' @importFrom R6 R6Class
"_PACKAGE"
readr/R/rds.R0000644000175100001440000000231113106315444012503 0ustar hornikusers#' Read/write RDS files.
#'
#' Consistent wrapper around [saveRDS()] and [readRDS()].
#' `write_rds()` does not compress by default as space is generally cheaper
#' than time.
#'
#' @param path Path to read from/write to.
#' @keywords internal
#' @export
#' @examples
#' temp <- tempfile()
#' write_rds(mtcars, temp)
#' read_rds(temp)
#'
#' \dontrun{
#' write_rds(mtcars, "compressed_mtc.rds", "xz", compression = 9L)
#' }
read_rds <- function(path) {
  readRDS(path)
}

#' @param x R object to serialise.
#' @param compress Compression method to use: "none", "gz", "bz2", or "xz".
#' @param ... Additional arguments to connection function. For example, control
#'   the space-time trade-off of different compression methods with
#'   `compression`. See [connections()] for more details.
#' @return `write_rds()` returns `x`, invisibly.
#' @rdname read_rds
#' @export
write_rds <- function(x, path, compress = c("none", "gz", "bz2", "xz"), ...)
{ compress <- match.arg(compress) con <- switch(compress, none = file(path, ...), gz = gzfile(path, ...), bz2 = bzfile(path, ...), xz = xzfile(path, ...)) on.exit(close(con), add = TRUE) saveRDS(x, con) invisible(x) } readr/R/collectors.R0000644000175100001440000003077613106315444014100 0ustar hornikuserscollector <- function(type, ...) { structure(list(...), class = c(paste0("collector_", type), "collector")) } is.collector <- function(x) inherits(x, "collector") #' @export print.collector <- function(x, ...) { cat("<", class(x)[1], ">\n", sep = "") } collector_find <- function(name) { if (is.na(name)) { return(col_character()) } get(paste0("col_", name), envir = asNamespace("readr"))() } #' Parse a character vector. #' #' @param x Character vector of elements to parse. #' @param collector Column specification. #' @inheritParams read_delim #' @inheritParams tokenizer_delim #' @keywords internal #' @export #' @examples #' x <- c("1", "2", "3", "NA") #' parse_vector(x, col_integer()) #' parse_vector(x, col_double()) parse_vector <- function(x, collector, na = c("", "NA"), locale = default_locale()) { if (is.character(collector)) { collector <- collector_find(collector) } warn_problems(parse_vector_(x, collector, na = na, locale_ = locale)) } #' Parse logicals, integers, and reals #' #' Use `parse_*()` if you have a character vector you want to parse. Use #' `col_*()` in conjunction with a `read_*()` function to parse the #' values as they're read in. #' #' @name parse_atomic #' @aliases NULL #' @param x Character vector of values to parse. #' @inheritParams tokenizer_delim #' @inheritParams read_delim #' @family parsers #' @examples #' parse_integer(c("1", "2", "3")) #' parse_double(c("1", "2", "3.123")) #' parse_number("$1,123,456.00") #' #' # Use locale to override default decimal and grouping marks #' es_MX <- locale("es", decimal_mark = ",") #' parse_number("$1.123.456,00", locale = es_MX) #' #' # Invalid values are replaced with missing values with a warning. #' x <- c("1", "2", "3", "-") #' parse_double(x) #' # Or flag values as missing #' parse_double(x, na = "-") NULL #' @rdname parse_atomic #' @export parse_logical <- function(x, na = c("", "NA"), locale = default_locale()) { parse_vector(x, col_logical(), na = na, locale = locale) } #' @rdname parse_atomic #' @export parse_integer <- function(x, na = c("", "NA"), locale = default_locale()) { parse_vector(x, col_integer(), na = na, locale = locale) } #' @rdname parse_atomic #' @export parse_double <- function(x, na = c("", "NA"), locale = default_locale()) { parse_vector(x, col_double(), na = na, locale = locale) } #' @rdname parse_atomic #' @export parse_character <- function(x, na = c("", "NA"), locale = default_locale()) { parse_vector(x, col_character(), na = na, locale = locale) } #' @rdname parse_atomic #' @export col_logical <- function() { collector("logical") } #' @rdname parse_atomic #' @export col_integer <- function() { collector("integer") } #' @rdname parse_atomic #' @export col_double <- function() { collector("double") } #' @rdname parse_atomic #' @export col_character <- function() { collector("character") } #' Skip a column #' #' Use this function to ignore a column when reading in a file. #' To skip all columns not otherwise specified, use \code{\link{cols_only}()}. #' #' @family parsers #' @export col_skip <- function() { collector("skip") } #' Parse numbers, flexibly #' #' This drops any non-numeric characters before or after the first number. 
#' The grouping mark specified by the locale is ignored inside the number.
#'
#' @inheritParams parse_atomic
#' @inheritParams tokenizer_delim
#' @inheritParams read_delim
#' @family parsers
#' @export
#' @examples
#' parse_number("$1000")
#' parse_number("1,234,567.78")
parse_number <- function(x, na = c("", "NA"), locale = default_locale()) {
  parse_vector(x, col_number(), na = na, locale = locale)
}

#' @rdname parse_number
#' @export
col_number <- function() {
  collector("number")
}

#' Parse using the "best" type
#'
#' `parse_guess()` returns the parsed vector; `guess_parser()`
#' returns the name of the parser. These functions use a number of heuristics
#' to determine which type of vector is "best". Generally they try to err on
#' the side of safety, as it's straightforward to override the parsing choice
#' if needed.
#'
#' @inheritParams parse_atomic
#' @inheritParams tokenizer_delim
#' @inheritParams read_delim
#' @family parsers
#' @export
#' @examples
#' # Logical vectors
#' parse_guess(c("FALSE", "TRUE", "F", "T"))
#'
#' # Integers and doubles
#' parse_guess(c("1","2","3"))
#' parse_guess(c("1.6","2.6","3.4"))
#'
#' # Numbers containing grouping mark
#' guess_parser("1,234,566")
#' parse_guess("1,234,566")
#'
#' # ISO 8601 date times
#' guess_parser(c("2010-10-10"))
#' parse_guess(c("2010-10-10"))
parse_guess <- function(x, na = c("", "NA"), locale = default_locale()) {
  parse_vector(x, guess_parser(x, locale), na = na, locale = locale)
}

#' @rdname parse_guess
#' @export
col_guess <- function() {
  collector("guess")
}

#' @rdname parse_guess
#' @export
guess_parser <- function(x, locale = default_locale()) {
  stopifnot(is.locale(locale))

  collectorGuess(x, locale)
}

#' Parse factors
#'
#' `parse_factor()` is similar to [factor()], but will generate
#' warnings if elements of `x` are not found in `levels`.
#'
#' @param levels Character vector providing set of allowed levels. If `NULL`,
#'   will generate levels based on the unique values of `x`, ordered by order
#'   of appearance in `x`.
#' @param ordered Is it an ordered factor?
#' @param include_na If `NA` are present, include them as an explicit factor
#'   level?
#' @inheritParams parse_atomic
#' @inheritParams tokenizer_delim
#' @inheritParams read_delim
#' @family parsers
#' @export
#' @examples
#' parse_factor(c("a", "b"), letters)
#'
#' x <- c("cat", "dog", "caw")
#' levels <- c("cat", "dog", "cow")
#'
#' # Base R factor() silently converts unknown levels to NA
#' x1 <- factor(x, levels)
#'
#' # parse_factor generates a warning & problems
#' x2 <- parse_factor(x, levels)
#'
#' # Using an argument of `NULL` will generate levels based on values of `x`
#' x2 <- parse_factor(x, levels = NULL)
parse_factor <- function(x, levels, ordered = FALSE, na = c("", "NA"),
                         locale = default_locale(), include_na = TRUE) {
  parse_vector(x, col_factor(levels, ordered, include_na), na = na, locale = locale)
}

#' @rdname parse_factor
#' @export
col_factor <- function(levels, ordered = FALSE, include_na = FALSE) {
  collector("factor", levels = levels, ordered = ordered, include_na = include_na)
}

# More complex ------------------------------------------------------------

#' Parse date/times
#'
#' @section Format specification:
#' `readr` uses a format specification similar to [strptime()].
#' There are three types of element:
#'
#' \enumerate{
#'   \item Date components are specified with "\%" followed by a letter.
#'   For example "\%Y" matches a 4 digit year, "\%m" matches a 2 digit
#'   month and "\%d" matches a 2 digit day. Month and day default to `1`
#'   (i.e. Jan 1st) if not present, for example if only a year is given.
#'   \item Whitespace is any sequence of zero or more whitespace characters.
#'   \item Any other character is matched exactly.
#' }
#'
#' `parse_datetime()` recognises the following format specifications:
#' \itemize{
#'   \item Year: "\%Y" (4 digits). "\%y" (2 digits); 00-69 -> 2000-2069,
#'   70-99 -> 1970-1999.
#'   \item Month: "\%m" (2 digits), "\%b" (abbreviated name in current
#'   locale), "\%B" (full name in current locale).
#'   \item Day: "\%d" (2 digits), "\%e" (optional leading space)
#'   \item Hour: "\%H" or "\%I", use I (and not H) with AM/PM.
#'   \item Minutes: "\%M"
#'   \item Seconds: "\%S" (integer seconds), "\%OS" (partial seconds)
#'   \item Time zone: "\%Z" (as name, e.g. "America/Chicago"), "\%z" (as
#'   offset from UTC, e.g. "+0800")
#'   \item AM/PM indicator: "\%p".
#'   \item Non-digits: "\%." skips one non-digit character,
#'   "\%+" skips one or more non-digit characters,
#'   "\%*" skips any number of non-digit characters.
#'   \item Automatic parsers: "\%AD" parses with a flexible YMD parser,
#'   "\%AT" parses with a flexible HMS parser.
#'   \item Shortcuts: "\%D" = "\%m/\%d/\%y", "\%F" = "\%Y-\%m-\%d",
#'   "\%R" = "\%H:\%M", "\%T" = "\%H:\%M:\%S", "\%x" = "\%y/\%m/\%d".
#' }
#'
#' @section ISO8601 support:
#'
#' Currently, readr does not support all of ISO8601. Missing features:
#'
#' \itemize{
#'   \item Week & weekday specifications, e.g. "2013-W05", "2013-W05-10"
#'   \item Ordinal dates, e.g. "2013-095".
#'   \item Using commas instead of a period for decimal separator
#' }
#'
#' The parser is also a little laxer than ISO8601:
#'
#' \itemize{
#'   \item Dates and times can be separated with a space, not just T.
#'   \item Mostly correct specifications like "2009-05-19 14:" and
#'   "200912-01" work.
#' }
#'
#' @param x A character vector of dates to parse.
#' @param format A format specification, as described below. If set to "",
#'   date times are parsed as ISO8601, and dates and times use the date and
#'   time formats specified in the [locale()].
#'
#'   Unlike [strptime()], the format specification must match
#'   the complete string.
#' @inheritParams read_delim
#' @inheritParams tokenizer_delim
#' @return A [POSIXct()] vector with `tzone` attribute set to
#'   `tz`. Elements that could not be parsed (or did not generate valid
#'   dates) will be set to `NA`, and a warning message will inform
#'   you of the total number of failures.
#' @family parsers
#' @export
#' @examples
#' # Format strings --------------------------------------------------------
#' parse_datetime("01/02/2010", "%d/%m/%Y")
#' parse_datetime("01/02/2010", "%m/%d/%Y")
#' # Handle any separator
#' parse_datetime("01/02/2010", "%m%.%d%.%Y")
#'
#' # Dates look the same, but internally they use the number of days since
#' # 1970-01-01 instead of the number of seconds. This avoids a whole lot
#' # of troubles related to time zones, so use if you can.
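#' # (To see the two internal representations -- days vs. seconds since
#' # 1970-01-01 -- unclass the parsed results:)
#' unclass(parse_date("2010-01-01"))
#' unclass(parse_datetime("2010-01-01"))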
#' parse_date("01/02/2010", "%d/%m/%Y")
#' parse_date("01/02/2010", "%m/%d/%Y")
#'
#' # You can parse timezones from strings (as listed in OlsonNames())
#' parse_datetime("2010/01/01 12:00 US/Central", "%Y/%m/%d %H:%M %Z")
#' # Or from offsets
#' parse_datetime("2010/01/01 12:00 -0600", "%Y/%m/%d %H:%M %z")
#'
#' # Use the locale parameter to control the default time zone
#' # (but note UTC is considerably faster than other options)
#' parse_datetime("2010/01/01 12:00", "%Y/%m/%d %H:%M",
#'   locale = locale(tz = "US/Central"))
#' parse_datetime("2010/01/01 12:00", "%Y/%m/%d %H:%M",
#'   locale = locale(tz = "US/Eastern"))
#'
#' # Unlike strptime, the format specification must match the complete
#' # string (ignoring leading and trailing whitespace). This avoids common
#' # errors:
#' strptime("01/02/2010", "%d/%m/%y")
#' parse_datetime("01/02/2010", "%d/%m/%y")
#'
#' # Failures ---------------------------------------------------------------
#' parse_datetime("01/01/2010", "%d/%m/%Y")
#' parse_datetime(c("01/ab/2010", "32/01/2010"), "%d/%m/%Y")
#'
#' # Locales ----------------------------------------------------------------
#' # By default, readr expects English date/times, but that's easy to change
#' parse_datetime("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
#' parse_datetime("1 enero 2015", "%d %B %Y", locale = locale("es"))
#'
#' # ISO8601 ----------------------------------------------------------------
#' # With separators
#' parse_datetime("1979-10-14")
#' parse_datetime("1979-10-14T10")
#' parse_datetime("1979-10-14T10:11")
#' parse_datetime("1979-10-14T10:11:12")
#' parse_datetime("1979-10-14T10:11:12.12345")
#'
#' # Without separators
#' parse_datetime("19791014")
#' parse_datetime("19791014T101112")
#'
#' # Time zones
#' us_central <- locale(tz = "US/Central")
#' parse_datetime("1979-10-14T1010", locale = us_central)
#' parse_datetime("1979-10-14T1010-0500", locale = us_central)
#' parse_datetime("1979-10-14T1010Z", locale = us_central)
#' # Your current time zone
#' parse_datetime("1979-10-14T1010", locale = locale(tz = ""))
parse_datetime <- function(x, format = "", na = c("", "NA"),
                           locale = default_locale()) {
  parse_vector(x, col_datetime(format), na = na, locale = locale)
}

#' @rdname parse_datetime
#' @export
parse_date <- function(x, format = "", na = c("", "NA"),
                       locale = default_locale()) {
  parse_vector(x, col_date(format), na = na, locale = locale)
}

#' @rdname parse_datetime
#' @export
parse_time <- function(x, format = "", na = c("", "NA"),
                       locale = default_locale()) {
  parse_vector(x, col_time(format), na = na, locale = locale)
}

#' @rdname parse_datetime
#' @export
col_datetime <- function(format = "") {
  collector("datetime", format = format)
}

#' @rdname parse_datetime
#' @export
col_date <- function(format = "") {
  collector("date", format = format)
}

#' @rdname parse_datetime
#' @export
col_time <- function(format = "") {
  collector("time", format = format)
}
readr/R/example.R0000644000175100001440000000056613106315672013357 0ustar hornikusers#' Get path to readr example
#'
#' readr comes bundled with a number of sample files in its `inst/extdata`
#' directory.
#' This function makes them easy to access.
#'
#' @param path Name of file
#' @export
#' @keywords internal
#' @examples
#' readr_example("challenge.csv")
readr_example <- function(path) {
  system.file("extdata", path, package = "readr", mustWork = TRUE)
}
readr/R/POSIXct.R0000644000175100001440000000014213057262333013143 0ustar hornikusersPOSIXct <- function(x, tz = "UTC") {
  structure(x, class = c("POSIXct", "POSIXt"), tzone = tz)
}
readr/R/read_lines_chunked.R0000644000175100001440000000112713106315444015521 0ustar hornikusers#' Read lines from a file or string by chunk.
#'
#' @inheritParams datasource
#' @inheritParams read_delim_chunked
#' @keywords internal
#' @family chunked
#' @export
read_lines_chunked <- function(file, callback, chunk_size = 10000, skip = 0,
                               locale = default_locale(), na = character(),
                               progress = show_progress()) {
  if (empty_file(file)) {
    return(character())
  }
  ds <- datasource(file, skip = skip)
  callback <- as_chunk_callback(callback)
  on.exit(callback$finally(), add = TRUE)

  read_lines_chunked_(ds, locale, na, chunk_size, callback, progress)

  return(callback$result())
}
readr/R/read_table.R0000644000175100001440000000565513106315444014003 0ustar hornikusers#' Read whitespace-separated columns into a tibble
#'
#' @description
#' `read_table()` and `read_table2()` are designed to read the type of textual
#' data where each column is separated by one (or more) columns of space.
#'
#' `read_table2()` is like [read.table()]: it allows any number of whitespace
#' characters between columns, and the lines can be of different lengths.
#'
#' `read_table()` is more strict: each line must be the same length,
#' and each field is in the same position in every line. It first finds empty
#' columns and then parses like a fixed width file.
#'
#' `spec_table()` and `spec_table2()` return
#' the column specifications rather than a data frame.
#'
#' @seealso [read_fwf()] to read fixed width files where each column
#'   is not separated by whitespace. `read_fwf()` is also useful for reading
#'   tabular data with non-standard formatting.
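#' @details
#' As a rough sketch of the difference: given the lines `"1 2"` and
#' `"3   44"`, `read_table2()` tolerates the uneven run of spaces in the
#' second line, while `read_table()` expects every field to start at the
#' same position on every line.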
#' @inheritParams datasource #' @inheritParams tokenizer_fwf #' @inheritParams read_delim #' @export #' @examples #' # One corner from http://www.masseyratings.com/cf/compare.htm #' massey <- readr_example("massey-rating.txt") #' cat(read_file(massey)) #' read_table(massey) #' #' # Sample of 1978 fuel economy data from #' # http://www.fueleconomy.gov/feg/epadata/78data.zip #' epa <- readr_example("epa78.txt") #' cat(read_file(epa)) #' read_table(epa, col_names = FALSE) read_table <- function(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = "NA", skip = 0, n_max = Inf, guess_max = min(n_max, 1000), progress = show_progress(), comment = "") { ds <- datasource(file, skip = skip) columns <- fwf_empty(ds, skip = skip, n = guess_max, comment = comment) skip <- skip + columns$skip tokenizer <- tokenizer_fwf(columns$begin, columns$end, na = na, comment = comment) spec <- col_spec_standardise( file = ds, skip = skip, guess_max = guess_max, col_names = col_names, col_types = col_types, locale = locale, tokenizer = tokenizer ) ds <- datasource(file = ds, skip = skip + isTRUE(col_names)) if (is.null(col_types) && !inherits(ds, "source_string")) { show_cols_spec(spec) } res <- read_tokens(ds, tokenizer, spec$cols, names(spec$cols), locale_ = locale, n_max = n_max, progress = progress) attr(res, "spec") <- spec res } #' @rdname read_table #' @export read_table2 <- function(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = "NA", skip = 0, n_max = Inf, guess_max = min(n_max, 1000), progress = show_progress(), comment = "") { tokenizer <- tokenizer_ws(na = na, comment = comment) read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, locale = locale, skip = skip, comment = comment, n_max = n_max, guess_max = guess_max, progress = progress) } #' @rdname spec_delim #' @export spec_table <- generate_spec_fun(read_table) readr/R/encoding.R0000644000175100001440000000256713106315444013512 0ustar hornikusers#' Guess encoding of file #' #' Uses [stringi::stri_enc_detect()]: see the documentation there #' for caveats. #' #' @rdname encoding #' @param file A character string specifying an input as specified in #' [datasource()], a raw vector, or a list of raw vectors. #' @inheritParams datasource #' @inheritParams read_lines #' @param threshold Only report guesses above this threshold of certainty. #' @return A tibble #' @export #' @examples #' guess_encoding(readr_example("mtcars.csv")) #' guess_encoding(read_lines_raw(readr_example("mtcars.csv"))) #' guess_encoding(read_file_raw(readr_example("mtcars.csv"))) #' #' guess_encoding("a\n\u00b5\u00b5") guess_encoding <- function(file, n_max = 1e4, threshold = 0.20) { if (!requireNamespace("stringi", quietly = TRUE)) { stop("stringi package required for encoding operations", call. = FALSE) } if (is.character(file)) { lines <- unlist(read_lines_raw(file, n_max = n_max)) } else if (is.raw(file)) { lines <- file } else if (is.list(file)) { lines <- unlist(file) } else { stop("Unknown input to `file`", call. 
= FALSE)
  }

  if (stringi::stri_enc_isascii(lines)) {
    return(tibble::tibble(encoding = "ASCII", confidence = 1))
  }

  guess <- stringi::stri_enc_detect(lines)
  df <- tibble::as_tibble(guess[[1]])
  names(df) <- tolower(names(df))
  df[df$confidence > threshold, c("encoding", "confidence")]
}
readr/R/sysdata.rda0000644000175100001440000007256213057262333013746 0ustar hornikusers
stop_for_problems <- function(x) {
  n <- n_problems(x)
  if (n == 0) return(invisible(x))

  stop(n, " parsing failure", if (n > 1) "s", call. = FALSE)
}

probs <- function(x) {
  attr(suppressWarnings(x), "problems")
}

n_problems <- function(x) {
  probs <- probs(x)
  if (is.null(probs)) 0 else nrow(probs)
}

problem_rows <- function(x) {
  if (n_problems(x) == 0) return(x[0, , drop = FALSE])

  probs <- problems(x)
  x[unique(probs$row), , drop = FALSE]
}

warn_problems <- function(x) {
  n <- n_problems(x)
  if (n == 0) return(x)

  probs <- attr(x, "problems")
  many_problems <- nrow(probs) > 5

  probs_f <- format(utils::head(probs, 5), justify = "left")
  probs_f[probs_f == "NA"] <- "--"

  probs_f <- rbind(names(probs), probs_f)
  probs_f <- lapply(probs_f, format, justify = "right")

  if (many_problems) {
    width <- vapply(probs_f, function(x) max(nchar(x)), integer(1))
    dots <- vapply(width, function(i) paste(rep(".", i), collapse = ""),
      FUN.VALUE = character(1))

    probs_f <- Map(c, probs_f, dots)
  }

  probs_f <- do.call(paste, c(probs_f, list(sep = " ", collapse = "\n")))

  warning(n, " parsing failure", if (n > 1) "s", ".\n",
    probs_f, "\n",
    if (many_problems) "See problems(...) for more details.\n",
    call. = FALSE, immediate. = TRUE, noBreaks. = TRUE)

  x
}
name_problems <- function(x, all_colnames, name = "input") {
  if (n_problems(x) == 0) return(x)

  problems <- problems(x)
  problems$file <- name
  problems$col <- all_colnames[problems$col]
  attr(x, "problems") <- problems

  x
}
readr/R/type_convert.R0000644000175100001440000000461513106315672014434 0ustar hornikusers#' Re-convert character columns in existing data frame
#'
#' This is useful if you need to do some manual munging - you can read the
#' columns in as character, clean them up with (e.g.) regular expressions and
#' then let readr take another stab at parsing it. The name is a homage to
#' the base \code{\link[utils]{type.convert}()}.
#'
#' @param df A data frame.
#' @param col_types One of `NULL`, a [cols()] specification, or
#'   a string. See `vignette("column-types")` for more details.
#'
#'   If `NULL`, all column types will be imputed from the first 1000 rows
#'   of the input. This is convenient (and fast), but not robust. If the
#'   imputation fails, you'll need to supply the correct types yourself.
#'
#'   If a column specification created by [cols()] is supplied, it must
#'   contain one column specification for each column. If you only want to
#'   read a subset of the columns, use [cols_only()].
#'
#'   Unlike other functions, `type_convert()` does not allow character
#'   specifications of `col_types`.
#' @inheritParams tokenizer_delim
#' @inheritParams read_delim
#' @export
#' @examples
#' df <- data.frame(
#'   x = as.character(runif(10)),
#'   y = as.character(sample(10)),
#'   stringsAsFactors = FALSE
#' )
#' str(df)
#' str(type_convert(df))
#'
#' df <- data.frame(x = c("NA", "10"), stringsAsFactors = FALSE)
#' str(type_convert(df))
#'
#' # Type convert can be used to infer types from an entire dataset
#' type_convert(
#'   read_csv(readr_example("mtcars.csv"),
#'     col_types = cols(.default = col_character())))
type_convert <- function(df, col_types = NULL, na = c("", "NA"), trim_ws = TRUE,
                         locale = default_locale()) {
  stopifnot(is.data.frame(df))

  is_character <- vapply(df, is.character, logical(1))
  char_cols <- df[is_character]

  guesses <- lapply(char_cols, function(x) {
    x[x %in% na] <- NA
    guess_parser(x, locale)
  })

  if (is.character(col_types)) {
    stop("`col_types` must be `NULL` or a `cols` specification for `type_convert()`.", call.
= FALSE) } specs <- col_spec_standardise( col_types = col_types, col_names = names(char_cols), guessed_types = guesses ) if (is.null(col_types)) { show_cols_spec(specs) } df[is_character] <- lapply(seq_along(char_cols), function(i) { type_convert_col(char_cols[[i]], specs$cols[[i]], which(is_character)[i], locale_ = locale, na = na, trim_ws = trim_ws) }) df } readr/R/read_delim_chunked.R0000644000175100001440000000604313106315444015503 0ustar hornikusers# Generates the chunked definition from the read_* definition generate_chunked_fun <- function(x) { args <- formals(x) # Remove n_max argument args <- args[names(args) != "n_max"] # Change guess_max default to use chunk_size args$guess_max[[3]] <- quote(chunk_size) args <- append(args, alist(callback =, chunk_size = 10000), 1) b <- as.list(body(x)) # Change read_delimited to read_delimited_chunked b[[length(b)]][[1]] <- quote(read_delimited_chunked) call_args <- as.list(b[[length(b)]]) # Remove the n_max argument call_args <- call_args[!names(call_args) == "n_max"] # add the callback and chunk_size arguments b[[length(b)]] <- as.call(append(call_args, alist(callback = callback, chunk_size = chunk_size), 2)) body(x) <- as.call(b) formals(x) <- args x } # Generates the modified read_delimited function generate_read_delimited_chunked <- function(x) { args <- formals(x) args <- args[names(args) != "n_max"] args <- append(args, alist(callback =, chunk_size = 10000), 1) # Change guess_max default to use chunk_size args$guess_max[[3]] <- quote(chunk_size) b <- as.list(body(x)) for (i in seq_along(b)) { if (is.call(b[[i]]) && identical(b[[i]][[1]], as.symbol("<-")) && is.call(b[[i]][[3]]) && identical(b[[i]][[3]][[1]], quote(read_tokens))) { # Change read_tokens() to read_tokens_chunked b[[i]][[3]][[1]] <- quote(read_tokens_chunked) chunked_call <- as.list(b[[i]][[3]]) # Remove the n_max argument chunked_call <- chunked_call[!names(chunked_call) == "n_max"] # Add the callback and chunk_size arguments b[[i]] <- as.call(append(chunked_call, alist(callback = callback, chunk_size = chunk_size), 2)) # Remove additional calls b <- b[-seq(i + 1, length(b))] body(x) <- as.call(b) formals(x) <- args return(x) } } x } read_tokens_chunked <- function(data, callback, chunk_size, tokenizer, col_specs, col_names, locale_, progress) { callback <- as_chunk_callback(callback) on.exit(callback$finally(), add = TRUE) read_tokens_chunked_(data, callback, chunk_size, tokenizer, col_specs, col_names, locale_, progress) return(callback$result()) } utils::globalVariables(c("callback", "chunk_size")) read_delimited_chunked <- generate_read_delimited_chunked(read_delimited) #' Read a delimited file by chunks #' #' @inheritParams read_delim #' @param callback A callback function to call on each chunk #' @param chunk_size The number of rows to include in each chunk #' @keywords internal #' @family chunked #' @export #' @examples #' # Cars with 3 gears #' f <- function(x, pos) subset(x, gear == 3) #' read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5) read_delim_chunked <- generate_chunked_fun(read_delim) #' @rdname read_delim_chunked #' @export read_csv_chunked <- generate_chunked_fun(read_csv) #' @rdname read_delim_chunked #' @export read_csv2_chunked <- generate_chunked_fun(read_csv2) #' @rdname read_delim_chunked #' @export read_tsv_chunked <- generate_chunked_fun(read_tsv) readr/R/file.R0000644000175100001440000000274113106315444012635 0ustar hornikusers#' Read/write a complete file #' #' `read_file()` reads a complete file into a 
single object: either a #' character vector of length one, or a raw vector. `write_file()` takes a #' single string, or a raw vector, and writes it exactly as is. Raw vectors #' are useful when dealing with binary data, or if you have text data with #' unknown encoding. #' #' @inheritParams datasource #' @inheritParams read_delim #' @return #' `read_file`: A length 1 character vector. #' `read_lines_raw`: A raw vector. #' @export #' @examples #' read_file(file.path(R.home("doc"), "AUTHORS")) #' read_file_raw(file.path(R.home("doc"), "AUTHORS")) #' #' tmp <- tempfile() #' #' x <- format_csv(mtcars[1:6, ]) #' write_file(x, tmp) #' identical(x, read_file(tmp)) #' #' read_lines(x) read_file <- function(file, locale = default_locale()) { if (empty_file(file)) { return("") } ds <- datasource(file) read_file_(ds, locale) } #' @export #' @rdname read_file read_file_raw <- function(file) { if (empty_file(file)) { return(raw()) } ds <- datasource(file) read_file_raw_(ds) } #' @inherit write_lines #' @rdname read_file #' @export write_file <- function(x, path, append = FALSE) { path <- standardise_path(path, input = FALSE) if (!isOpen(path)) { on.exit(close(path), add = TRUE) if (isTRUE(append)) { open(path, "ab") } else { open(path, "wb") } } if (is.raw(x)) { write_file_raw_(x, path) } else { write_file_(x, path) } invisible(x) } readr/R/RcppExports.R0000644000175100001440000000725113106615427014214 0ustar hornikusers# Generated by using Rcpp::compileAttributes() -> do not edit by hand # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 collectorGuess <- function(input, locale_) { .Call('readr_collectorGuess', PACKAGE = 'readr', input, locale_) } read_connection_ <- function(con, chunk_size = 64 * 1024L) { .Call('readr_read_connection_', PACKAGE = 'readr', con, chunk_size) } utctime <- function(year, month, day, hour, min, sec, psec) { .Call('readr_utctime', PACKAGE = 'readr', year, month, day, hour, min, sec, psec) } dim_tokens_ <- function(sourceSpec, tokenizerSpec) { .Call('readr_dim_tokens_', PACKAGE = 'readr', sourceSpec, tokenizerSpec) } count_fields_ <- function(sourceSpec, tokenizerSpec, n_max) { .Call('readr_count_fields_', PACKAGE = 'readr', sourceSpec, tokenizerSpec, n_max) } guess_header_ <- function(sourceSpec, tokenizerSpec, locale_) { .Call('readr_guess_header_', PACKAGE = 'readr', sourceSpec, tokenizerSpec, locale_) } tokenize_ <- function(sourceSpec, tokenizerSpec, n_max) { .Call('readr_tokenize_', PACKAGE = 'readr', sourceSpec, tokenizerSpec, n_max) } parse_vector_ <- function(x, collectorSpec, locale_, na) { .Call('readr_parse_vector_', PACKAGE = 'readr', x, collectorSpec, locale_, na) } read_file_ <- function(sourceSpec, locale_) { .Call('readr_read_file_', PACKAGE = 'readr', sourceSpec, locale_) } read_file_raw_ <- function(sourceSpec) { .Call('readr_read_file_raw_', PACKAGE = 'readr', sourceSpec) } read_lines_ <- function(sourceSpec, locale_, na, n_max = -1L, progress = TRUE) { .Call('readr_read_lines_', PACKAGE = 'readr', sourceSpec, locale_, na, n_max, progress) } read_lines_chunked_ <- function(sourceSpec, locale_, na, chunkSize, callback, progress = TRUE) { invisible(.Call('readr_read_lines_chunked_', PACKAGE = 'readr', sourceSpec, locale_, na, chunkSize, callback, progress)) } read_lines_raw_ <- function(sourceSpec, n_max = -1L, progress = FALSE) { .Call('readr_read_lines_raw_', PACKAGE = 'readr', sourceSpec, n_max, progress) } read_tokens_ <- function(sourceSpec, tokenizerSpec, colSpecs, colNames, locale_, n_max = -1L, progress = TRUE) { .Call('readr_read_tokens_', 
PACKAGE = 'readr', sourceSpec, tokenizerSpec, colSpecs, colNames, locale_, n_max, progress) } read_tokens_chunked_ <- function(sourceSpec, callback, chunkSize, tokenizerSpec, colSpecs, colNames, locale_, progress = TRUE) { invisible(.Call('readr_read_tokens_chunked_', PACKAGE = 'readr', sourceSpec, callback, chunkSize, tokenizerSpec, colSpecs, colNames, locale_, progress)) } guess_types_ <- function(sourceSpec, tokenizerSpec, locale_, n = 100L) { .Call('readr_guess_types_', PACKAGE = 'readr', sourceSpec, tokenizerSpec, locale_, n) } whitespaceColumns <- function(sourceSpec, n = 100L, comment = "") { .Call('readr_whitespaceColumns', PACKAGE = 'readr', sourceSpec, n, comment) } type_convert_col <- function(x, spec, locale_, col, na, trim_ws) { .Call('readr_type_convert_col', PACKAGE = 'readr', x, spec, locale_, col, na, trim_ws) } stream_delim_ <- function(df, connection, delim, na, col_names = TRUE, bom = FALSE) { .Call('readr_stream_delim_', PACKAGE = 'readr', df, connection, delim, na, col_names, bom) } write_lines_ <- function(lines, connection, na) { invisible(.Call('readr_write_lines_', PACKAGE = 'readr', lines, connection, na)) } write_lines_raw_ <- function(x, connection) { invisible(.Call('readr_write_lines_raw_', PACKAGE = 'readr', x, connection)) } write_file_ <- function(x, connection) { invisible(.Call('readr_write_file_', PACKAGE = 'readr', x, connection)) } write_file_raw_ <- function(x, connection) { invisible(.Call('readr_write_file_raw_', PACKAGE = 'readr', x, connection)) } readr/R/lines.R0000644000175100001440000000422313106315444013025 0ustar hornikusers#' Read/write lines to/from a file #' #' `read_lines()` reads up to `n_max` lines from a file. New lines are #' not included in the output. `read_lines_raw()` produces a list of raw #' vectors, and is useful for handling data with unknown encoding. #' `write_lines()` takes a character vector or list of raw vectors, appending a #' new line after each entry. #' #' @inheritParams datasource #' @inheritParams read_delim #' @param n_max Number of lines to read. If `n_max` is -1, all lines in #' file will be read. #' @return `read_lines()`: A character vector with one element for each line. #' `read_lines_raw()`: A list containing a raw vector for each line. #' @export #' @examples #' read_lines(file.path(R.home("doc"), "AUTHORS"), n_max = 10) #' read_lines_raw(file.path(R.home("doc"), "AUTHORS"), n_max = 10) #' #' tmp <- tempfile() #' #' write_lines(rownames(mtcars), tmp) #' read_lines(tmp) #' read_file(tmp) # note trailing \n #' #' write_lines(airquality$Ozone, tmp, na = "-1") #' read_lines(tmp) read_lines <- function(file, skip = 0, n_max = -1L, locale = default_locale(), na = character(), progress = show_progress()) { if (empty_file(file)) { return(character()) } ds <- datasource(file, skip = skip) read_lines_(ds, locale_ = locale, na = na, n_max = n_max, progress = progress) } #' @export #' @rdname read_lines read_lines_raw <- function(file, skip = 0, n_max = -1L, progress = show_progress()) { if (empty_file(file)) { return(list()) } ds <- datasource(file, skip = skip) read_lines_raw_(ds, n_max = n_max, progress = progress) } #' @inheritParams write_delim #' @return `write_lines()` returns `x`, invisibly. 
#' @export #' @rdname read_lines write_lines <- function(x, path, na = "NA", append = FALSE) { is_raw <- is.list(x) && inherits(x[[1]], "raw") if (!is_raw) { x <- as.character(x) } path <- standardise_path(path, input = FALSE) if (!isOpen(path)) { on.exit(close(path), add = TRUE) open(path, if (isTRUE(append)) "ab" else "wb") } if (is_raw) { write_lines_raw_(x, path) } else { write_lines_(x, path, na) } invisible(x) } readr/R/col_types.R0000644000175100001440000002634713106315444013727 0ustar hornikusers#' Create column specification #' #' @param ... Either column objects created by `col_*()`, or their #' abbreviated character names. If you're only overriding a few columns, #' it's best to refer to columns by name. If not named, the column types #' must match the column names exactly. #' @param .default Any named columns not explicitly overridden in `...` #' will be read with this column type. #' @export #' @examples #' cols(a = col_integer()) #' cols_only(a = col_integer()) #' #' # You can also use the standard abreviations #' cols(a = "i") #' cols(a = "i", b = "d", c = "_") cols <- function(..., .default = col_guess()) { col_types <- list(...) is_character <- vapply(col_types, is.character, logical(1)) col_types[is_character] <- lapply(col_types[is_character], col_concise) if (is.character(.default)) { .default <- col_concise(.default) } col_spec(col_types, .default) } #' @export #' @rdname cols cols_only <- function(...) { cols(..., .default = col_skip()) } # col_spec ---------------------------------------------------------------- col_spec <- function(col_types, default = col_guess()) { stopifnot(is.list(col_types)) stopifnot(is.collector(default)) is_collector <- vapply(col_types, is.collector, logical(1)) if (any(!is_collector)) { stop("Some `col_types` are not S3 collector objects: ", paste(which(!is_collector), collapse = ", "), call. = FALSE) } structure( list( cols = col_types, default = default ), class = "col_spec" ) } is.col_spec <- function(x) inherits(x, "col_spec") as.col_spec <- function(x) UseMethod("as.col_spec") #' @export as.col_spec.character <- function(x) { letters <- strsplit(x, "")[[1]] col_spec(lapply(letters, col_concise), col_guess()) } #' @export as.col_spec.NULL <- function(x) { col_spec(list()) } #' @export as.col_spec.list <- function(x) { do.call(cols, x) } #' @export as.col_spec.col_spec <- function(x) x #' @export as.col_spec.default <- function(x) { stop("`col_types` must be NULL, a list or a string", call. = FALSE) } #' @export print.col_spec <- function(x, n = Inf, condense = NULL, ...) { cat(format.col_spec(x, n = n, condense = condense, ...)) invisible(x) } #' @description #' `cols_condense()` takes a spec object and condenses its definition by setting #' the default column type to the most frequent type and only listing columns #' with a different type. #' @rdname spec #' @export cols_condense <- function(x) { types <- vapply(x$cols, function(xx) class(xx)[[1]], character(1)) counts <- table(types) most_common <- names(counts)[counts == max(counts)][[1]] x$default <- x$cols[types == most_common][[1]] x$cols <- x$cols[types != most_common] x } #' @export format.col_spec <- function(x, n = Inf, condense = NULL, ...) 
{ if (n == 0) { return("") } # condense if cols >= n condense <- condense %||% (length(x$cols) >= n) if (isTRUE(condense)) { x <- cols_condense(x) } # truncate to minumum of n or length cols <- x$cols[seq_len(min(length(x$cols), n))] default <- NULL if (inherits(x$default, "collector_guess")) { fun_type <- "cols" } else if (inherits(x$default, "collector_skip")) { fun_type <- "cols_only" } else { fun_type <- "cols" type <- sub("^collector_", "", class(x$default)[[1]]) default <- paste0(".default = col_", type, "()") } cols_args <- c(default, vapply(seq_along(cols), function(i) { col_funs <- sub("^collector_", "col_", class(cols[[i]])[[1]]) args <- vapply(cols[[i]], deparse2, character(1), sep = "\n ") args <- paste(names(args), args, sep = " = ", collapse = ", ") col_names <- names(cols)[[i]] # Need to handle unnamed columns and columns with non-syntactic names named <- col_names != "" non_syntactic <- !is_syntactic(col_names) & named col_names[non_syntactic] <- paste0("`", gsub("`", "\\\\`", col_names[non_syntactic]), "`") out <- paste0(col_names, " = ", col_funs, "(", args, ")") out[!named] <- paste0(col_funs, "(", args, ")") out }, character(1) ) ) if (length(x$cols) == 0 && length(cols_args) == 0) { return(paste0(fun_type, "()\n")) } out <- paste0(fun_type, "(\n ", paste(collapse = ",\n ", cols_args)) if (length(x$cols) > n) { out <- paste0(out, "\n # ... with ", length(x$cols) - n, " more columns") } out <- paste0(out, "\n)\n") out } # Used in read_delim(), read_fwf() and type_convert() show_cols_spec <- function(spec, n = getOption("readr.num_columns", 20)) { if (n > 0) { message("Parsed with column specification:\n", format(spec, n = n, condense = NULL), appendLF = FALSE) if (length(spec$cols) >= n) { message("See spec(...) for full column specifications.") } } } #' Examine the column specifications for a data frame #' #' `spec()` extracts the full column specification from a tibble #' created by readr. #' #' @param x The data frame object to extract from #' @return A col_spec object. #' @export #' @examples #' df <- read_csv(readr_example("mtcars.csv")) #' s <- spec(df) #' s #' #' cols_condense(s) spec <- function(x) { stopifnot(inherits(x, "tbl_df")) attr(x, "spec") } col_concise <- function(x) { switch(x, "_" = , "-" = col_skip(), "?" = col_guess(), c = col_character(), D = col_date(), d = col_double(), i = col_integer(), l = col_logical(), n = col_number(), T = col_datetime(), t = col_time(), stop("Unknown shortcut: ", x, call. = FALSE) ) } col_spec_standardise <- function(file, col_names = TRUE, col_types = NULL, guessed_types = NULL, comment = "", skip = 0, guess_max = 1000, tokenizer = tokenizer_csv(), locale = default_locale(), drop_skipped_names = FALSE) { # Figure out the column names ----------------------------------------------- if (is.logical(col_names) && length(col_names) == 1) { ds_header <- datasource(file, skip = skip, comment = comment) if (col_names) { col_names <- guess_header(ds_header, tokenizer, locale) skip <- skip + 1 } else { n <- length(guess_header(ds_header, tokenizer, locale)) col_names <- paste0("X", seq_len(n)) } guessed_names <- TRUE } else if (is.character(col_names)) { guessed_names <- FALSE } else { stop("`col_names` must be TRUE, FALSE or a character vector", call. 
= FALSE) } missing_names <- is.na(col_names) if (any(missing_names)) { new_names <- paste0("X", seq_along(col_names)[missing_names]) col_names[missing_names] <- new_names warning( "Missing column names filled in: ", paste0( encodeString(new_names, quote = "'"), " [", which(missing_names), "]", collapse = ", " ), call. = FALSE ) } if (anyDuplicated(col_names)) { dups <- duplicated(col_names) old_names <- col_names col_names <- make.unique(col_names, sep = "_") warning( "Duplicated column names deduplicated: ", paste0( encodeString(old_names[dups], quote = "'"), " => ", encodeString(col_names[dups], quote = "'"), " [", which(dups), "]", collapse = ", " ), call. = FALSE ) } # Figure out column types ---------------------------------------------------- spec <- as.col_spec(col_types) type_names <- names(spec$cols) if (length(spec$cols) == 0) { # no types specified so use defaults spec$cols <- rep(list(spec$default), length(col_names)) names(spec$cols) <- col_names } else if (is.null(type_names) && guessed_names) { # unnamed types & names guessed from header: match exactly if (length(spec$cols) != length(col_names)) { warning("Unnamed `col_types` should have the same length as `col_names`. ", "Using smaller of the two.", call. = FALSE) n <- min(length(col_names), length(spec$cols)) spec$cols <- spec$cols[seq_len(n)] col_names <- col_names[seq_len(n)] } names(spec$cols) <- col_names } else if (is.null(type_names) && !guessed_names) { # unnamed types & names supplied: match non-skipped columns skipped <- vapply(spec$cols, inherits, "collector_skip", FUN.VALUE = logical(1)) # Needed for read_fwf() because width generator functions have name for # every column, even those that are skipped. Not need for read_delim() if (drop_skipped_names) { col_names <- col_names[!skipped] } n_read <- sum(!skipped) n_names <- length(col_names) n_new <- abs(n_names - n_read) if (n_read < n_names) { warning("Insufficient `col_types`. Guessing ", n_new, " columns.", call. = FALSE) spec$cols <- c(spec$cols, list(rep(col_guess(), n_new))) } else if (n_read > n_names) { warning("Insufficient `col_names`. Adding ", n_new, " names.", call. = FALSE) col_names2 <- rep("", length(spec$cols)) col_names2[!skipped] <- c(col_names, paste0("X", seq_len(n_new) + n_names)) col_names <- col_names2 } else { col_names2 <- rep("", length(spec$cols)) col_names2[!skipped] <- col_names col_names <- col_names2 } names(spec$cols) <- col_names } else { # names types bad_types <- !(type_names %in% col_names) if (any(bad_types)) { warning("The following named parsers don't match the column names: ", paste0(type_names[bad_types], collapse = ", "), call. 
= FALSE) spec$cols <- spec$cols[!bad_types] type_names <- type_names[!bad_types] } default_types <- !(col_names %in% type_names) if (any(default_types)) { defaults <- rep(list(spec$default), sum(default_types)) names(defaults) <- col_names[default_types] spec$cols[names(defaults)] <- defaults } spec$cols <- spec$cols[col_names] } # Guess any types that need to be guessed ------------------------------------ is_guess <- vapply(spec$cols, function(x) inherits(x, "collector_guess"), logical(1)) if (any(is_guess)) { if (is.null(guessed_types)) { ds <- datasource(file, skip = skip, comment = comment) guessed_types <- guess_types(ds, tokenizer, locale, guess_max = guess_max) } # Need to be careful here: there might be more guesses than types/names guesses <- guessed_types[seq_along(spec$cols)][is_guess] spec$cols[is_guess] <- lapply(guesses, collector_find) } spec } check_guess_max <- function(guess_max, max_limit = .Machine$integer.max %/% 100) { if (length(guess_max) != 1 || !is.numeric(guess_max) || !is_integerish(guess_max) || is.na(guess_max) || guess_max < 0) { stop("`guess_max` must be a positive integer", call. = FALSE) } if (guess_max > max_limit) { warning("`guess_max` is a very large value, setting to `", max_limit, "` to avoid exhausting memory", call. = FALSE) guess_max <- max_limit } guess_max } guess_types <- function(datasource, tokenizer, locale, guess_max = 1000, max_limit = .Machine$integer.max %/% 100) { guess_max <- check_guess_max(guess_max, max_limit) guess_types_(datasource, tokenizer, locale, n = guess_max) } guess_header <- function(datasource, tokenizer, locale = default_locale()) { guess_header_(datasource, tokenizer, locale) } readr/R/read_fwf.R0000644000175100001440000001167513106315444013501 0ustar hornikusers #' Read a fixed width file into a tibble #' #' A fixed width file can be a very compact representation of numeric data. #' It's also very fast to parse, because every field is in the same place in #' every line. Unfortunately, it's painful to parse because you need to #' describe the length of every field. Readr aims to make it as easy as possible #' by providing a number of different ways to describe the field structure. #' #' @seealso [read_table()] to read fixed width files where each #' column is separated by whitespace. #' @inheritParams datasource #' @inheritParams tokenizer_fwf #' @inheritParams read_delim #' @param col_positions Column positions, as created by [fwf_empty()], #' [fwf_widths()] or [fwf_positions()]. To read in only selected fields, #' use [fwf_positions()]. If the width of the last column is variable (a #' ragged fwf file), supply the last end position as NA. #' @export #' @examples #' fwf_sample <- readr_example("fwf-sample.txt") #' cat(read_lines(fwf_sample)) #' #' # You can specify column positions in several ways: #' # 1. Guess based on position of empty columns #' read_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last", "state", "ssn"))) #' # 2. A vector of field widths #' read_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn"))) #' # 3. Paired vectors of start and end positions #' read_fwf(fwf_sample, fwf_positions(c(1, 30), c(10, 42), c("name", "ssn"))) #' # 4. Named arguments with start and end positions #' read_fwf(fwf_sample, fwf_cols(name = c(1, 10), ssn = c(30, 42))) #' # 5. 
Named arguments with column widths #' read_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12)) read_fwf <- function(file, col_positions, col_types = NULL, locale = default_locale(), na = c("", "NA"), comment = "", skip = 0, n_max = Inf, guess_max = min(n_max, 1000), progress = show_progress()) { ds <- datasource(file, skip = skip) if (inherits(ds, "source_file") && empty_file(file)) { return(tibble::tibble()) } tokenizer <- tokenizer_fwf(col_positions$begin, col_positions$end, na = na, comment = comment) spec <- col_spec_standardise( file, skip = skip, guess_max = guess_max, tokenizer = tokenizer, locale = locale, col_names = col_positions$col_names, col_types = col_types, drop_skipped_names = TRUE ) if (is.null(col_types) && !inherits(ds, "source_string")) { show_cols_spec(spec) } out <- read_tokens(ds, tokenizer, spec$cols, names(spec$cols), locale_ = locale, n_max = if (n_max == Inf) -1 else n_max, progress = progress) out <- name_problems(out, names(spec$cols), source_name(file)) attr(out, "spec") <- spec warn_problems(out) } #' @rdname read_fwf #' @export #' @param n Number of lines the tokenizer will read to determine file structure. By default #' it is set to 100. fwf_empty <- function(file, skip = 0, col_names = NULL, comment = "", n = 100L) { ds <- datasource(file, skip = skip) out <- whitespaceColumns(ds, comment = comment, n = n) out$end[length(out$end)] <- NA col_names <- fwf_col_names(col_names, length(out$begin)) out$col_names <- col_names out } #' @rdname read_fwf #' @export #' @param widths Width of each field. Use NA as the width of the last field when #' reading a ragged fwf file. #' @param col_names Either NULL, or a character vector of column names. fwf_widths <- function(widths, col_names = NULL) { pos <- cumsum(c(1L, abs(widths))) fwf_positions(pos[-length(pos)], pos[-1] - 1L, col_names) } #' @rdname read_fwf #' @export #' @param start,end Starting and ending (inclusive) positions of each field. #' Use NA as the last end position when reading a ragged fwf file. fwf_positions <- function(start, end = NULL, col_names = NULL) { stopifnot(length(start) == length(end)) col_names <- fwf_col_names(col_names, length(start)) tibble( begin = start - 1L, end = end, # -1 to change to 0 offset, +1 to be exclusive, col_names = col_names ) } #' @rdname read_fwf #' @export #' @param ... If the first element is a data frame, #' then it must have all numeric columns and either one or two rows. #' The column names are the variable names, and the column values are the #' variable widths (one row), or the variable start and end #' positions (two rows). #' Otherwise, the elements of `...` are used to construct a data frame #' with one or two rows as above. fwf_cols <- function(...) { x <- lapply(list(...), as.integer) names(x) <- fwf_col_names(names(x), length(x)) x <- tibble::as_tibble(x) if (nrow(x) == 2) { fwf_positions(as.integer(x[1, ]), as.integer(x[2, ]), names(x)) } else if (nrow(x) == 1) { fwf_widths(as.integer(x[1, ]), names(x)) } else { stop("All variables must have either one (width) or two (start, end) values.", call. = FALSE) } } fwf_col_names <- function(nm, n) { nm <- nm %||% rep("", n) nm_empty <- (nm == "") nm[nm_empty] <- paste0("X", seq_len(n))[nm_empty] nm } readr/R/source.R0000644000175100001440000001150013106315444013210 0ustar hornikusers#' Create a source object. #' #' @param file Either a path to a file, a connection, or literal data #' (either a single string or a raw vector). #' #' Files ending in `.gz`, `.bz2`, `.xz`, or `.zip` will #' be automatically uncompressed.
Files starting with `http://`, #' `https://`, `ftp://`, or `ftps://` will be automatically #' downloaded. Remote gz files can also be automatically downloaded and #' decompressed. #' #' Literal data is most useful for examples and tests. It must contain at #' least one new line to be recognised as data (instead of a path). #' @param skip Number of lines to skip before reading data. #' @keywords internal #' @export #' @examples #' # Literal csv #' datasource("a,b,c\n1,2,3") #' datasource(charToRaw("a,b,c\n1,2,3")) #' #' # Strings #' datasource(readr_example("mtcars.csv")) #' datasource(readr_example("mtcars.csv.bz2")) #' datasource(readr_example("mtcars.csv.zip")) #' \dontrun{ #' datasource("https://github.com/tidyverse/readr/raw/master/inst/extdata/mtcars.csv") #' } #' #' # Connection #' con <- rawConnection(charToRaw("abc\n123")) #' datasource(con) #' close(con) datasource <- function(file, skip = 0, comment = "") { if (inherits(file, "source")) { # If `skip` and `comment` arguments are expliictly passed, we want to use # those even if `file` is already a source if (!missing(skip)) { file$skip <- skip } if (!missing(comment)) { file$comment <- comment } file } else if (is.connection(file)) { datasource_connection(file, skip, comment) } else if (is.raw(file)) { datasource_raw(file, skip, comment) } else if (is.character(file)) { if (grepl("\n", file)) { datasource_string(file, skip, comment) } else { file <- standardise_path(file) if (is.connection(file)) { datasource_connection(file, skip, comment) } else { datasource_file(file, skip, comment) } } } else { stop("`file` must be a string, raw vector or a connection.", call. = FALSE) } } # Constructors ----------------------------------------------------------------- new_datasource <- function(type, x, skip, comment = "", ...) { structure(list(x, skip = skip, comment = comment, ...), class = c(paste0("source_", type), "source")) } datasource_string <- function(text, skip, comment = "") { new_datasource("string", text, skip = skip, comment = comment) } datasource_file <- function(path, skip, comment = "") { path <- check_path(path) new_datasource("file", path, skip = skip, comment = comment) } datasource_connection <- function(path, skip, comment = "") { datasource_raw(read_connection(path), skip, comment = comment) } datasource_raw <- function(text, skip, comment) { new_datasource("raw", text, skip = skip, comment = comment) } # Helpers ---------------------------------------------------------------------- read_connection <- function(con) { stopifnot(is.connection(con)) if (!isOpen(con)) { open(con, "rb") on.exit(close(con), add = TRUE) } read_connection_(con) } standardise_path <- function(path, input = TRUE) { if (!is.character(path)) return(path) if (grepl("\n", path)) return(path) if (is_url(path)) { if (requireNamespace("curl", quietly = TRUE)) { con <- curl::curl(path) } else { message("`curl` package not installed, falling back to using `url()`") con <- url(path) } if (identical(tools::file_ext(path), "gz")) { return(gzcon(con)) } else { return(con) } } if (isTRUE(input)) { path <- check_path(path) } switch(tools::file_ext(path), gz = gzfile(path, ""), bz2 = bzfile(path, ""), xz = xzfile(path, ""), zip = zipfile(path, ""), # Use a file connection for output if (!isTRUE(input)) { file(path, "") } else { path }) } source_name <- function(x) { if (is.connection(x)) { "" } else if (is.raw(x)) { "" } else if (is.character(x)) { if (grepl("\n", x)) { "literal data" } else { paste0("'", x, "'") } } else { "???" 
} } is_url <- function(path) { grepl("^(http|ftp)s?://", path) } check_path <- function(path) { if (file.exists(path)) return(normalizePath(path, "/", mustWork = FALSE)) stop("'", path, "' does not exist", if (!is_absolute_path(path)) paste0(" in current working directory ('", getwd(), "')"), ".", call. = FALSE ) } is_absolute_path <- function(path) { grepl("^(/|[A-Za-z]:|\\\\|~)", path) } zipfile <- function(path, open = "r") { files <- utils::unzip(path, list = TRUE) file <- files$Name[[1]] if (nrow(files) > 1) { message("Multiple files in zip: reading '", file, "'") } unz(path, file, open = open) } empty_file <- function(x) { is.character(x) && file.exists(x) && file.info(x, extra_cols = FALSE)$size == 0 } readr/R/callback.R0000644000175100001440000000723113106315444013451 0ustar hornikusersas_chunk_callback <- function(x) UseMethod("as_chunk_callback") as_chunk_callback.function <- function(x) { SideEffectChunkCallback$new(x) } as_chunk_callback.R6ClassGenerator <- function(x) { as_chunk_callback(x$new()) } as_chunk_callback.ChunkCallback <- function(x) { x } #' Callback classes #' #' These classes are used to define callback behaviors. #' #' \describe{ #' \item{ChunkCallback}{Callback interface definition, all callback functions should inherit from this class.} #' \item{SideEffectChunkCallback}{Callback function that is used only for side effects, no results are returned.} #' \item{DataFrameCallback}{Callback function that combines each result together at the end.} #' } #' @usage NULL #' @format NULL #' @name callback #' @keywords internal #' @family chunked #' @examples #' ## If given a regular function it is converted to a SideEffectChunkCallback #' #' # view structure of each chunk #' read_lines_chunked(readr_example("mtcars.csv"), str, chunk_size = 5) #' #' # Print starting line of each chunk #' f <- function(x, pos) print(pos) #' read_lines_chunked(readr_example("mtcars.csv"), SideEffectChunkCallback$new(f), chunk_size = 5) #' #' # If combined results are desired you can use the DataFrameCallback #' #' # Cars with 3 gears #' f <- function(x, pos) subset(x, gear == 3) #' read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5) #' #' # The ListCallback can be used for more flexible output #' f <- function(x, pos) x$mpg[x$hp > 100] #' read_csv_chunked(readr_example("mtcars.csv"), ListCallback$new(f), chunk_size = 5) #' @export ChunkCallback <- R6::R6Class("ChunkCallback", private = list( callback = NULL ), public = list( initialize = function(callback) NULL, receive = function(data, index) NULL, continue = function() TRUE, result = function() NULL, finally = function() NULL ) ) #' @usage NULL #' @format NULL #' @rdname callback #' @export SideEffectChunkCallback <- R6::R6Class("SideEffectChunkCallback", inherit = ChunkCallback, private = list( cancel = FALSE ), public = list( initialize = function(callback) { check_callback_fun(callback) private$callback <- callback }, receive = function(data, index) { result <- private$callback(data, index) private$cancel <- identical(result, FALSE) }, continue = function() { !private$cancel } ) ) #' @usage NULL #' @format NULL #' @rdname callback #' @export DataFrameCallback <- R6::R6Class("DataFrameCallback", inherit = ChunkCallback, private = list( results = list() ), public = list( initialize = function(callback) { private$callback <- callback }, receive = function(data, index) { result <- private$callback(data, index) private$results <- c(private$results, list(result)) }, result = function() { do.call(`rbind`, 
private$results) }, finally = function() { private$results <- list() } ) ) #' @usage NULL #' @format NULL #' @rdname callback #' @export ListCallback <- R6::R6Class("ListCallback", inherit = ChunkCallback, private = list( results = list() ), public = list( initialize = function(callback) { private$callback <- callback }, receive = function(data, index) { result <- private$callback(data, index) private$results <- c(private$results, list(result)) }, result = function() { private$results }, finally = function() { private$results <- list() } ) ) check_callback_fun <- function(callback) { n_args <- length(formals(callback)) if (n_args < 2) { stop("`callback` must have two or more arguments", call. = FALSE) } } readr/R/read_log.R0000644000175100001440000000123413106315444013466 0ustar hornikusers#' Read common/combined log file into a tibble #' #' This is a fairly standard format for log files - it uses both quotes #' and square brackets for quoting, and there may be literal quotes embedded #' in a quoted string. The dash, "-", is used for missing values. #' #' @inheritParams read_delim #' @export #' @examples #' read_log(readr_example("example.log")) read_log <- function(file, col_names = FALSE, col_types = NULL, skip = 0, n_max = Inf, progress = show_progress()) { tokenizer <- tokenizer_log() read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, skip = skip, n_max = n_max, progress = progress) } readr/R/zzz.R0000644000175100001440000000046413106315444012553 0ustar hornikusers.onLoad <- function(libname, pkgname) { opt <- options() opt_readr <- list( readr.show_progress = TRUE ) to_set <- !(names(opt_readr) %in% names(opt)) if(any(to_set)) options(opt_readr[to_set]) invisible() } release_questions <- function() { c( "Have checked with the IDE team?" ) } readr/R/tokenizer.R0000644000175100001440000001013413106315444013723 0ustar hornikusers#' Tokenize a file/string. #' #' Turns input into a character vector. Usually the tokenization is done purely #' in C++, and never exposed to R (because that requires a copy). This function #' is useful for testing, or when a file doesn't parse correctly and you want #' to see the underlying tokens. #' #' @inheritParams datasource #' @param tokenizer A tokenizer specification. #' @param skip Number of lines to skip before reading data. #' @param n_max Optionally, maximum number of rows to tokenize. #' @keywords internal #' @export #' @examples #' tokenize("1,2\n3,4,5\n\n6") #' #' # Only tokenize first two lines #' tokenize("1,2\n3,4,5\n\n6", n = 2) tokenize <- function(file, tokenizer = tokenizer_csv(), skip = 0, n_max = -1L) { ds <- datasource(file, skip = skip) tokenize_(ds, tokenizer, n_max) } #' Tokenizers. #' #' Explicitly create tokenizer objects. Usually you will not call these #' function, but will instead use one of the use friendly wrappers like #' [read_csv()]. #' #' @keywords internal #' @name Tokenizers #' @examples #' tokenizer_csv() NULL #' @export #' @rdname Tokenizers #' @param comment A string used to identify comments. Any text after the #' comment characters will be silently ignored. #' @param na Character vector of strings to use for missing values. Set this #' option to `character()` to indicate no missing values. #' @param quoted_na Should missing values inside quotes be treated as missing #' values (the default) or strings. #' @param delim Single character used to separate fields within a record. #' @param quote Single character used to quote strings. 
#' @param trim_ws Should leading and trailing whitespace be trimmed from #' each field before parsing it? #' @param escape_double Does the file escape quotes by doubling them? #' i.e. If this option is `TRUE`, the value `""""` represents #' a single quote, `\"`. #' @param escape_backslash Does the file use backslashes to escape special #' characters? This is more general than `escape_double` as backslashes #' can be used to escape the delimiter character, the quote character, or #' to add special characters like `\\n`. tokenizer_delim <- function(delim, quote = '"', na = "NA", quoted_na = TRUE, comment = "", trim_ws = TRUE, escape_double = TRUE, escape_backslash = FALSE) { structure( list( delim = delim, quote = quote, na = na, quoted_na = quoted_na, comment = comment, trim_ws = trim_ws, escape_double = escape_double, escape_backslash = escape_backslash ), class = "tokenizer_delim" ) } #' @export #' @rdname Tokenizers tokenizer_csv <- function(na = "NA", quoted_na = TRUE, quote = "\"", comment = "", trim_ws = TRUE) { tokenizer_delim( delim = ",", na = na, quoted_na = quoted_na, quote = quote, comment = comment, trim_ws = trim_ws, escape_double = TRUE, escape_backslash = FALSE ) } #' @export #' @rdname Tokenizers tokenizer_tsv <- function(na = "NA", quoted_na = TRUE, quote = "\"", comment = "", trim_ws = TRUE) { tokenizer_delim( delim = "\t", na = na, quoted_na = quoted_na, quote = quote, comment = comment, trim_ws = trim_ws, escape_double = TRUE, escape_backslash = FALSE ) } #' @export #' @rdname Tokenizers tokenizer_line <- function(na = character()) { structure(list(na = na), class = "tokenizer_line") } #' @export #' @rdname Tokenizers tokenizer_log <- function() { structure(list(), class = "tokenizer_log") } #' @export #' @rdname Tokenizers #' @param begin,end Begin and end offsets for each field. These are C++ #' offsets so the first column is column zero, and the ranges are #' [begin, end) (i.e. inclusive-exclusive). tokenizer_fwf <- function(begin, end, na = "NA", comment = "") { structure(list(begin = begin, end = end, na = na, comment = comment), class = "tokenizer_fwf") } #' @export #' @rdname Tokenizers tokenizer_ws <- function(na = "NA", comment = "") { structure(list(na = na, comment = comment), class = "tokenizer_ws") } readr/vignettes/0000755000175100001440000000000013106621354013376 5ustar hornikusersreadr/vignettes/readr.Rmd0000644000175100001440000001760313106315444015146 0ustar hornikusers--- title: "Introduction to readr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to readr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(readr) knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` The key problem that readr solves is __parsing__ a flat file into a tibble. Parsing is the process of taking a text file and turning it into a rectangular tibble where each column is the appropriate part. Parsing takes place in three basic stages: 1. The flat file is parsed into a rectangular matrix of strings. 1. The type of each column is determined. 1. Each column of strings is parsed into a vector of a more specific type. It's easiest to learn how this works in the opposite order. Below, you'll learn how the: 1. __Vector parsers__ turn a character vector into a more specific type. 1. __Column specification__ describes the type of each column and the strategy readr uses to guess types so you don't need to supply them all. 1. __Rectangular parsers__ turn a flat file into a matrix of rows and columns.
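To make these stages concrete, here is a minimal sketch (`readr_example("mtcars.csv")` ships with the package; the character vector is invented for illustration):

```{r}
# The last stage on its own: parse a character vector into a more specific type
parse_double(c("1.5", "2.25"))

# All three stages at once: a flat file becomes a typed tibble
read_csv(readr_example("mtcars.csv"))
```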
Each `parse_*()` is coupled with a `col_*()` function, which will be used in the process of parsing a complete tibble. ## Vector parsers It's easiest to learn the vector parsers using the `parse_` functions. These all take a character vector and some options. They return a new vector the same length as the old, along with an attribute describing any problems. ### Atomic vectors `parse_logical()`, `parse_integer()`, `parse_double()`, and `parse_character()` are straightforward parsers that produce the corresponding atomic vector. ```{r} parse_integer(c("1", "2", "3")) parse_double(c("1.56", "2.34", "3.56")) parse_logical(c("true", "false")) ``` By default, readr expects `.` as the decimal mark and `,` as the grouping mark. You can override this default using `locale()`, as described in `vignette("locales")`. ### Flexible numeric parser `parse_integer()` and `parse_double()` are strict: the input string must be a single number with no leading or trailing characters. `parse_number()` is more flexible: it ignores non-numeric prefixes and suffixes, and knows how to deal with grouping marks. This makes it suitable for reading currencies and percentages: ```{r} parse_number(c("0%", "10%", "150%")) parse_number(c("$1,234.5", "$12.45")) ``` ### Date/times readr supports three types of date/time data: * dates: number of days since 1970-01-01. * times: number of seconds since midnight. * datetimes: number of seconds since midnight 1970-01-01. ```{r} parse_datetime("2010-10-01 21:45") parse_date("2010-10-01") parse_time("1:00pm") ``` Each function takes a `format` argument which describes the format of the string. If not specified, it uses a default value: * `parse_datetime()` recognises [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) datetimes. * `parse_date()` uses the `date_format` specified by the `locale()`. The default value is `%AD` which uses an automatic date parser that recognises dates of the format `Y-m-d` or `Y/m/d`. * `parse_time()` uses the `time_format` specified by the `locale()`. The default value is `%At` which uses an automatic time parser that recognises times of the form `H:M` optionally followed by seconds and am/pm. In most cases, you will need to supply a `format`, as documented in `parse_datetime()`: ```{r} parse_datetime("1 January, 2010", "%d %B, %Y") parse_datetime("02/02/15", "%m/%d/%y") ``` ### Factors When reading a column that has a known set of values, you can read directly into a factor. `parse_factor()` will generate a warning if a value is not in the supplied levels. ```{r} parse_factor(c("a", "b", "a"), levels = c("a", "b", "c")) parse_factor(c("a", "b", "d"), levels = c("a", "b", "c")) ``` ## Column specification It would be tedious if you had to specify the type of every column when reading a file. Instead, readr uses some heuristics to guess the type of each column. You can access these results yourself using `guess_parser()`: ```{r} guess_parser(c("a", "b", "c")) guess_parser(c("1", "2", "3")) guess_parser(c("1,000", "2,000", "3,000")) guess_parser(c("2001/10/10")) ``` The guessing policies are described in the documentation for the individual functions. Guesses are fairly strict. For example, we don't guess that currencies are numbers, even though we can parse them: ```{r} guess_parser("$1,234") parse_number("$1,234") ``` There are two parsers that will never be guessed: `col_skip()` and `col_factor()`. You will always need to supply these explicitly.
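For example, here is a small sketch of supplying both explicitly via `cols()` (the inline string is made-up data, used only for illustration):

```{r}
read_csv("x,y\na,1\nb,2", col_types = cols(
  x = col_factor(levels = c("a", "b")),  # never guessed: must be explicit
  y = col_skip()                         # drop this column entirely
))
```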
You can see the specification that readr would generate for a given file by using `spec_csv()`, `spec_tsv()` and so on: ```{r} x <- spec_csv(readr_example("challenge.csv")) ``` For bigger files, you can often make the specification simpler by changing the default column type using `cols_condense()`: ```{r} mtcars_spec <- spec_csv(readr_example("mtcars.csv")) mtcars_spec cols_condense(mtcars_spec) ``` By default, readr only looks at the first 1000 rows. This keeps file parsing speedy, but can generate incorrect guesses. For example, in `challenge.csv` the column types change in row 1001, so readr guesses the wrong types. One way to resolve the problem is to increase the number of rows: ```{r} x <- spec_csv(readr_example("challenge.csv"), guess_max = 1001) ``` Another way is to manually specify the `col_types`, as described below. ## Rectangular parsers readr comes with five parsers for rectangular file formats: * `read_csv()` and `read_csv2()` for csv files * `read_tsv()` for tab separated files * `read_fwf()` for fixed-width files * `read_log()` for web log files Each of these functions first calls `spec_xxx()` (as described above), and then parses the file according to that column specification: ```{r} df1 <- read_csv(readr_example("challenge.csv")) ``` The rectangular parsing functions almost always succeed; they'll only fail if the format is severely messed up. Instead, readr will generate a data frame of problems. The first few will be printed out, and you can access them all with `problems()`: ```{r} problems(df1) ``` You've already seen one way of handling bad guesses: increasing the number of rows used to guess the type of each column. ```{r} df2 <- read_csv(readr_example("challenge.csv"), guess_max = 1001) ``` Another approach is to manually supply the column specification. ### Overriding the defaults In the previous examples, you may have noticed that readr printed the column specification that it used to parse the file: ```{r} #> Parsed with column specification: #> cols( #> x = col_integer(), #> y = col_character() #> ) ``` You can also access it after the fact using `spec()`: ```{r} spec(df1) spec(df2) ``` (This also allows you to access the full column specification if you're reading a very wide file. By default, readr will only print the specification of the first 20 columns.) If you want to manually specify the column types, you can start by copying and pasting this code, and then tweaking it to fix the parsing problems. ```{r} df3 <- read_csv( readr_example("challenge.csv"), col_types = cols( x = col_double(), y = col_date(format = "") ) ) ``` In general, it's good practice to supply an explicit column specification. It is more work, but it ensures that you get warnings if the data changes in unexpected ways. To be really strict, you can use `stop_for_problems(df3)`. This will throw an error if there are any parsing problems, forcing you to fix those problems before proceeding with the analysis. ### Output The output of all these functions is a tibble. Note that characters are never automatically converted to factors (i.e. no more `stringsAsFactors = FALSE`) and column names are left as is, not munged into valid R identifiers (i.e. there is no `check.names = TRUE`). Row names are never set. Attributes store the column specification (`spec()`) and any parsing problems (`problems()`).
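As a final illustrative check (not part of the original workflow), you can confirm these properties on `df3` from above:

```{r}
class(df3)     # c("tbl_df", "tbl", "data.frame")
spec(df3)      # the column specification used
problems(df3)  # empty when parsing succeeded
```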
readr/vignettes/releases/0000755000175100001440000000000013106315444015201 5ustar hornikusersreadr/vignettes/releases/readr-0.2.0.Rmd0000644000175100001440000002667013106315444017410 0ustar hornikusers--- title: "readr 0.2.0" --- ```{r, include = FALSE} knitr::opts_chunk$set(comment = "#>", collapse = T) library(readr) library(dplyr) ``` readr 0.2.0 is now available on CRAN. readr makes it easy to read many types of tabular data, including csv, tsv and fixed width. Compared to base equivalents like `read.csv()`, readr is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn't munge the column names. This is a big release, so below I describe the new features divided into four main categories: * Improved support for international data. * Column parsing improvements. * File parsing improvements, including support for comments. * Improved writers. There were too many minor improvements and bug fixes to describe in detail here. See the [release notes](https://github.com/hadley/readr/releases/tag/v0.2.0) for a complete list. ## Internationalisation readr now has a strategy for dealing with settings that vary across languages and localities: __locales__. A locale, created with `locale()`, includes: * The names of months and days, used when parsing dates. * The default time zone, used when parsing datetimes. * The character encoding, used when reading non-ASCII strings. * Default date format, used when guessing column types. * The decimal and grouping marks, used when reading numbers. I'll cover the most important of these parameters below. For more details, see `vignette("locales")`. To override the default US-centric locale, you pass a custom locale to `read_csv()`, `read_tsv()`, or `read_fwf()`. Rather than showing those functions here, I'll use the `parse_*()` functions because they work with character vectors instead of files, but are otherwise identical. ### Names of months and days The first argument to `locale()` is `date_names` which controls what values are used for month and day names. The easiest way to specify them is with an ISO 639 language code: ```{r} locale("ko") # Korean locale("fr") # French ``` This allows you to parse dates in other languages: ```{r} parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr")) parse_date("14 oct. 1979", "%d %b %Y", locale = locale("fr")) ``` ### Timezones readr assumes that times are in [Coordinated Universal Time](https://en.wikipedia.org/wiki/Coordinated_Universal_Time), aka UTC. UTC is the best timezone for data because it doesn't have daylight saving time. If your data isn't already in UTC, you'll need to supply a `tz` in the locale: ```{r} parse_datetime("2001-10-10 20:10") parse_datetime("2001-10-10 20:10", locale = locale(tz = "Pacific/Auckland")) parse_datetime("2001-10-10 20:10", locale = locale(tz = "Europe/Dublin")) ``` List all available time zones with `OlsonNames()`. If you're American, note that "EST" is not Eastern Standard Time -- it's a Canadian time zone that doesn't have DST! Instead of relying on ambiguous abbreviations, use: * PST/PDT = "US/Pacific" * CST/CDT = "US/Central" * MST/MDT = "US/Mountain" * EST/EDT = "US/Eastern" ### Default formats Locales also provide default date and time formats. The time format isn't currently used for anything, but the date format is used when guessing column types.
The default date format is `%Y-%m-%d` because that's unambiguous: ```{r} str(parse_guess("2010-10-10")) ``` If you're an American, you might want to use your illogical date system: ```{r} str(parse_guess("01/02/2013")) str(parse_guess("01/02/2013", locale = locale(date_format = "%d/%m/%Y"))) ``` ### Character encoding All readr functions yield strings encoded in UTF-8. This encoding is the most likely to give good results in the widest variety of settings. By default, readr assumes that your input is also in UTF-8, which is less likely to be the case, especially when you're working with older datasets. To parse a dataset that's not in UTF-8, you need to supply an `encoding`. The following code creates a string encoded with latin1 (aka ISO-8859-1), and shows how it's different from the string encoded as UTF-8, and how to parse it with readr: ```{r} x <- "Émigré cause célèbre déjà vu.\n" y <- stringi::stri_conv(x, "UTF-8", "Latin1") # These strings look like they're identical: x y identical(x, y) # But they have different encodings: Encoding(x) Encoding(y) # That means while they print the same, their raw (binary) # representation is actually rather different: charToRaw(x) charToRaw(y) # readr expects strings to be encoded as UTF-8. If they're # not, you'll get weird characters parse_character(x) parse_character(y) # If you know the encoding, supply it: parse_character(y, locale = locale(encoding = "latin1")) ``` If you don't know what encoding the file uses, try `guess_encoding()`. It's not 100% perfect (as it's fundamentally a heuristic), but should at least get you pointed in the right direction: ```{r} guess_encoding(y) # Note that the first guess produces a valid string, # but isn't correct: parse_character(y, locale = locale(encoding = "ISO-8859-2")) # But ISO-8859-1 is another name for latin1 parse_character(y, locale = locale(encoding = "ISO-8859-1")) ``` ### Numbers Some countries use the decimal point, while others use the decimal comma. The `decimal_mark` option controls which readr uses when parsing doubles: ```{r} parse_double("1,23", locale = locale(decimal_mark = ",")) ``` The `grouping_mark` option describes which character is used to space groups of digits. Do you write `1,000,000`, `1.000.000`, `1 000 000`, or `1'000'000`? Specifying the grouping mark allows `parse_number()` to parse large numbers as they're commonly written: ```{r} parse_number("1,234.56") # readr is smart enough to guess that if you're using , for # decimals then you're probably using . for grouping: parse_number("1.234,56", locale = locale(decimal_mark = ",")) ``` ## Column parsing improvements One of the most useful parts of readr is the column parsers: the tools that turn character input into usefully typed data frame columns. This process is now described more fully in a new vignette: `vignette("column-types")`. By default, column types are guessed by looking at the data. I've made a number of tweaks to make it more likely that your code will load correctly the first time: * readr now looks at the first 1000 rows (instead of just the first 100) when guessing column types: this only takes a fraction more time, but should hopefully yield better guesses for more inputs. * `col_date()` and `col_datetime()` no longer recognise partial dates like 19, 1900, 1900-01. These triggered many false positives and after re-reading the ISO8601 spec, I believe they actually refer to periods of time, so should not be parsed into a specific instant. * `col_integer()` no longer recognises values starting with zeros (e.g.
0001) as these are often used as identifiers. * `col_number()` will automatically recognise numbers containing the grouping mark (see below for more details). But you can override these defaults with the `col_types` argument. In this version, `col_types` gains some much needed flexibility: * New `cols()` function takes care of assembling the list of column types, and with its `.default` argument, allows you to control the default column type: ```{r} read_csv("x,y\n1,2", col_types = cols(.default = "c")) ``` You can refer to parsers with their full name (e.g. `col_character()`) or their one letter abbreviation (e.g. `c`). The default value of `.default` is "?": guess the type of the column from the data. * `cols_only()` allows you to load only the specified columns: ```{r} read_csv("a,b,c\n1,2,3", col_types = cols_only("b" = "?")) ``` Many of the individual parsers have also been improved: * `col_integer()` and `col_double()` no longer silently ignore trailing characters after the number. * New `col_number()`/`parse_number()` replace the old `col_numeric()`/ `parse_numeric()`. This parser is less flexible, so it's less likely to silently ignore bad input. It's designed specifically to read currencies and percentages. It only reads the first number from a string, ignoring the grouping mark defined by the locale: ```{r} parse_number("1,234,566") parse_number("$1,234") parse_number("27%") ``` * New `parse_time()` and `col_time()` allow you to parse times. They have an optional `format` argument that uses the same components as `parse_datetime()`. If `format` is omitted, they use a flexible parser that looks for hours, then an optional colon, then minutes, then an optional colon, then optional seconds, then optional am/pm. ```{r} parse_time(c("1:45 PM", "1345", "13:45:00")) ``` `parse_time()` returns the number of seconds since midnight as an integer with class "time". readr includes a basic print method. * `parse_date()`/`col_date()` and `parse_datetime()`/`col_datetime()` gain two new format strings: "%+" skips one or more non-digits, and `%p` reads in AM/PM (and am/pm). ## File parsing improvements `read_csv()`, `read_tsv()`, and `read_delim()` gain extra arguments that allow you to parse more files: * Multiple NA values can be specified by passing a character vector to `na`. The default has been changed to `na = c("", "NA")`. ```{r} read_csv("a,b\n.,NA\n1,3", na = c(".", "NA")) ``` * New `comment` argument allows you to ignore all text after a string: ```{r} read_csv( "#This is a comment #This is another comment a,b 1,10 2,20", comment = "#") ``` * `trim_ws` argument controls whether leading and trailing whitespace is removed. It defaults to `TRUE`. ```{r} read_csv("a,b\n 1, 2") read_csv("a,b\n 1, 2", trim_ws = FALSE) ``` Specifying the wrong number of column names, or having rows with an unexpected number of columns, now gives a warning, rather than an error: ```{r} read_csv("a,b,c\n1,2\n1,2,3,4") ``` Note that the warning message now also shows you the first five problems. I hope this will often allow you to iterate immediately, rather than having to look at the full `problems()`. ## Writers Despite the name, readr also provides some tools for writing data frames to disk. In this version there are three output functions: * `write_csv()` and `write_tsv()` write comma and tab delimited files, and `write_delim()` writes with a user-specified delimiter.
* `write_rds()` and `read_rds()` wrap around `saveRDS()` and `readRDS()`, defaulting to no compression, because you're usually more interested in saving time (expensive) than disk space (cheap). All these functions invisibly return their output so you can use them as part of a pipeline: ```R my_df %>% some_manipulation() %>% write_csv("interim-a.csv") %>% some_more_manipulation() %>% write_csv("interim-b.csv") %>% even_more_manipulation() %>% write_csv("final.csv") ``` You can now control how missing values are written with the `na` argument, and the quoting algorithm has been further refined to only add quotes when needed: when the string contains a quote, the delimiter, a new line or the same text as the missing value. Output for doubles now uses the same precision as R, and POSIXt vectors are saved in an ISO8601 compatible format. For testing, you can use `format_csv()`, `format_tsv()`, and `format_delim()` to write csv to a string: ```{r} mtcars %>% head(4) %>% format_csv() %>% cat() ``` This is particularly useful for generating [reprexes](https://github.com/jennybc/reprex). readr/vignettes/releases/readr-1.0.0.Rmd0000644000175100001440000001236513106315444017403 0ustar hornikusers--- title: "readr 1.0.0" --- ```{r setup, include = FALSE} knitr::opts_chunk$set( comment = "#>", collapse = TRUE ) library(readr) ``` readr 1.0.0 is now available on CRAN. readr makes it easy to read many types of rectangular data, including csv, tsv and fixed width files. Compared to base equivalents like `read.csv()`, readr is much faster and gives more convenient output: it never converts strings to factors, can parse date/times, and it doesn't munge the column names. Install the latest version with: ```{r, eval = FALSE} install.packages("readr") ``` Releasing a version 1.0.0 was a deliberate choice to reflect the maturity and stability of readr, thanks largely to work by Jim Hester. readr is by no means perfect, but I don't expect any major changes to the API in the future. In this version we: * Used a better strategy for guessing column types. * Improved the default date and time parsers. * Provided a full set of lower-level file and line readers and writers. * Fixed many bugs. ## Column guessing The process by which readr guesses the types of columns has received a substantial overhaul to make it easier to fix problems when the initial guesses aren't correct, and to make it easier to generate reproducible code. Now column specifications are printed by default when you read from a file: ```{r} mtcars2 <- read_csv(readr_example("mtcars.csv")) ``` The thought is that once you've figured out the correct column types for a file, you should make the parsing strict. You can do this either by copying and pasting the printed column specification or by saving the spec to disk: ```{r} # Once you've figured out the correct types mtcars_spec <- write_rds(spec(mtcars2), "mtcars2-spec.rds") # Every subsequent load mtcars2 <- read_csv( readr_example("mtcars.csv"), col_types = read_rds("mtcars2-spec.rds") ) # In production, you might want to throw an error if there # are any parsing problems.
stop_for_problems(mtcars2)
```

You can now also adjust the number of rows that readr uses to guess the column types with `guess_max`:

```{r}
challenge <- read_csv(readr_example("challenge.csv"))
challenge <- read_csv(readr_example("challenge.csv"), guess_max = 1500)
```

(If you want to suppress the printed specification, just provide the dummy spec `col_types = cols()`.)

You can now access the guessing algorithm from R: `guess_parser()` will tell you which parser readr will select.

```{r}
guess_parser("1,234")

# Were previously guessed as numbers
guess_parser(c(".", "-"))
guess_parser(c("10W", "20N"))

# Now uses the default time format
guess_parser("10:30")
```

## Date-time parsing improvements

The date time parsers recognise three new format strings:

* `%I` for 12 hour time format:

```{r}
library(hms)
parse_time("1 pm", "%I %p")
```

Note that `parse_time()` returns `hms` from the [hms](https://github.com/rstats-db/hms) package, rather than a custom `time` class.

* `%AD` and `%AT` are "automatic" date and time parsers. They are both slightly less flexible than previous defaults. The automatic date parser requires a four digit year, and only accepts `-` and `/` as separators. The flexible time parser now requires colons between hours and minutes, with seconds optional.

```{r}
parse_date("2010-01-01", "%AD")
parse_time("15:01", "%AT")
```

If the format argument is omitted in `parse_date()` or `parse_time()`, the default date and time formats specified in the locale will be used. These now default to `%AD` and `%AT` respectively. You may want to override these in your standard `locale()` if the conventions are different where you live.

## Low-level readers and writers

readr now contains a full set of efficient lower-level readers:

* `read_file()` reads a file into a length-1 character vector; `read_file_raw()` reads a file into a single raw vector.

* `read_lines()` reads a file into a character vector with one entry per line; `read_lines_raw()` reads into a list of raw vectors with one entry per line.

These are paired with `write_lines()` and `write_file()` to efficiently write character and raw vectors back to disk (there's a short round-trip sketch at the end of this post).

## Other changes

* `read_fwf()` was overhauled to reliably read only a partial set of columns, to read files with ragged final columns (by setting the final position/width to `NA`), and to skip comments (with the `comment` argument); see the sketch at the end of this post.

* readr contains an experimental API for reading a file in chunks, e.g. `read_csv_chunked()` and `read_lines_chunked()`. These allow you to work with files that are bigger than memory. We haven't yet finalised the API so please use with care, and send us your feedback (an illustrative sketch appears at the end of this post).

* There are many other bug fixes and minor improvements. You can see a complete list in the [release notes](https://github.com/hadley/readr/releases/tag/v1.0.0).

A big thanks goes to all the community members who contributed to this release: @[antoine-lizee](https://github.com/antoine-lizee), @[fpinter](https://github.com/fpinter), @[ghaarsma](https://github.com/ghaarsma), @[jennybc](https://github.com/jennybc), @[jeroenooms](https://github.com/jeroenooms), @[leeper](https://github.com/leeper), @[LluisRamon](https://github.com/LluisRamon), @[noamross](https://github.com/noamross), and @[tvedebrink](https://github.com/tvedebrink).
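As a postscript, here are a few small sketches of the features described above. First, the lower-level readers and writers: a minimal round trip through a temporary file, with placeholder strings standing in for real data.

```{r}
# Write a character vector to disk (one element per line), then
# read it back both line-by-line and as a single string.
tmp <- tempfile()
write_lines(c("alpha", "beta", "gamma"), tmp)
read_lines(tmp)
read_file(tmp)
```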
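Next, the overhauled `read_fwf()`, using the fixed-width sample bundled with readr. The column widths and names here are assumptions chosen to match that sample file; note the final width of `NA`, which lets the last column run ragged:

```{r}
# Three columns; the last width is NA, so it extends to the end of each line.
read_fwf(
  readr_example("fwf-sample.txt"),
  fwf_widths(c(20, 10, NA), c("name", "state", "ssn"))
)
```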
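Finally, an illustrative sketch of the experimental chunked API. Since this API isn't finalised, treat the details as provisional: `DataFrameCallback` row-binds whatever the callback function returns for each chunk into a single data frame.

```{r}
# Read mtcars.csv ten rows at a time, keeping only the
# fuel-efficient cars from each chunk.
read_csv_chunked(
  readr_example("mtcars.csv"),
  DataFrameCallback$new(function(chunk, pos) subset(chunk, mpg > 30)),
  chunk_size = 10
)
```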
readr/vignettes/releases/readr-0.1.0.Rmd0000644000175100001440000001745413106315444017407 0ustar hornikusers
---
title: "readr 0.1.0"
---

```{r, echo = FALSE}
knitr::opts_chunk$set(comment = "#>", collapse = T)
```

I'm pleased to announce that readr is now available on CRAN. Readr makes it easy to read many types of tabular data:

* Delimited files with `read_delim()`, `read_csv()`, `read_tsv()`, and `read_csv2()`.
* Fixed width files with `read_fwf()` and `read_table()`.
* Web log files with `read_log()`.

You can install it by running:

```{r, eval = FALSE}
install.packages("readr")
```

Compared to the equivalent base functions, readr functions are around 10x faster. They're also easier to use because they're more consistent, they produce data frames that are easier to use (no more `stringsAsFactors = FALSE`!), they have a more flexible column specification, and any parsing problems are recorded in a data frame. Each of these features is described in more detail below.

## Input

All readr functions work the same way. There are four important arguments:

* `file` gives the file to read; a URL or local path. A local path can point to a zipped, bzipped, xzipped, or gzipped file - it'll be automatically uncompressed in memory before reading. You can also pass in a connection or a raw vector.

  For small examples, you can also supply literal data: if `file` contains a new line, then the data will be read directly from the string. Thanks to [data.table](https://github.com/Rdatatable/data.table) for this great idea!

```{r}
library(readr)
read_csv("x,y\n1,2\n3,4")
```

* `col_names`: describes the column names (equivalent to `header` in base R). It has three possible values:

    * `TRUE` will use the first row of data as column names.
    * `FALSE` will number the columns sequentially.
    * A character vector to use as column names.

* `col_types`: overrides the default column types (equivalent to `colClasses` in base R). More on that below.

* `progress`: By default, readr will display a progress bar if the estimated loading time is greater than 5 seconds. Use `progress = FALSE` to suppress the progress indicator.

## Output

The output has been designed to make your life easier:

* Characters are never automatically converted to factors (i.e. no more `stringsAsFactors = FALSE`!).
* Column names are left as is, not munged into valid R identifiers (i.e. there is no `check.names = TRUE`). Use backticks to refer to variables with unusual names, e.g. `` df$`Income ($000)` ``.
* The output has class `c("tbl_df", "tbl", "data.frame")` so if you also use [dplyr](http://blog.rstudio.org/2015/01/09/dplyr-0-4-0/) you'll get an enhanced print method (i.e. you'll see just the first ten rows, not the first 10,000!).
* Row names are never set.

## Column types

Readr heuristically inspects the first 100 rows to guess the type of each column. This is not perfect, but it's fast and it's a reasonable start. Readr can automatically detect these column types:

* `col_logical()` [l], contains only `T`, `F`, `TRUE` or `FALSE`.
* `col_integer()` [i], integers.
* `col_double()` [d], doubles.
* `col_euro_double()` [e], "Euro" doubles that use `,` as the decimal separator.
* `col_date()` [D]: Y-m-d dates.
* `col_datetime()` [T]: ISO8601 date times.
* `col_character()` [c], everything else.

You can manually specify other column types:

* `col_skip()` [_], don't import this column.
* `col_date(format)` and `col_datetime(format, tz)`, dates or date times parsed with a given format string.
Dates and times are rather complex, so they're described in more detail in the next section.

* `col_numeric()` [n], a sloppy numeric parser that ignores everything apart from 0-9, `-` and `.` (this is useful for parsing currency data).
* `col_factor(levels, ordered)`, parse a fixed set of known values into an (optionally ordered) factor.

There are two ways to override the default choices with the `col_types` argument:

* Use a compact string: `"dc__d"`. Each letter corresponds to a column so this specification means: read first column as double, second as character, skip the next two and read the last column as a double. (There's no way to use this form with column types that need parameters.)

* With a (named) list of col objects:

```R
read_csv("iris.csv", col_types = list(
  Sepal.Length = col_double(),
  Sepal.Width = col_double(),
  Petal.Length = col_double(),
  Petal.Width = col_double(),
  Species = col_factor(c("setosa", "versicolor", "virginica"))
))
```

Any omitted columns will be parsed automatically, so the previous call is equivalent to:

```R
read_csv("iris.csv", col_types = list(
  Species = col_factor(c("setosa", "versicolor", "virginica"))
))
```

### Dates and times

One of the most helpful features of readr is its ability to import dates and date times. It can automatically recognise the following formats:

* Dates in year-month-day form: `2001-10-20` or `2010/10/15` (or any non-numeric separator). It can't automatically recognise dates in m/d/y or d/m/y format because they're ambiguous: is `02/01/2015` the 2nd of January or the 1st of February?

* Date times in [ISO8601](http://en.wikipedia.org/wiki/ISO_8601) form: e.g. `2001-02-03 04:05:06.07 -0800`, `20010203 040506`, `20010203` etc. I don't support every possible variant yet, so please let me know if it doesn't work for your data (more details in `?parse_datetime`).

If your dates are in another format, don't despair. You can use `col_date()` and `col_datetime()` to explicitly specify a format string. Readr implements its own `strptime()` equivalent which supports the following format strings:

* Year: `\%Y` (4 digits). `\%y` (2 digits); 00-69 -> 2000-2069, 70-99 -> 1970-1999.
* Month: `\%m` (2 digits), `\%b` (abbreviated name in current locale), `\%B` (full name in current locale).
* Day: `\%d` (2 digits), `\%e` (optional leading space)
* Hour: `\%H`
* Minutes: `\%M`
* Seconds: `\%S` (integer seconds), `\%OS` (partial seconds)
* Time zone: `\%Z` (as name, e.g. `America/Chicago`), `\%z` (as offset from UTC, e.g. `+0800`)
* Non-digits: `\%.` skips one non-digit character, `\%*` skips any number of non-digit characters.
* Shortcuts: `\%D` = `\%m/\%d/\%y`, `\%F` = `\%Y-\%m-\%d`, `\%R` = `\%H:\%M`, `\%T` = `\%H:\%M:\%S`, `\%x` = `\%y/\%m/\%d`.

To practice parsing date times without having to load the file each time, you can use `parse_datetime()` and `parse_date()`:

```{r}
parse_date("2015-10-10")
parse_datetime("2015-10-10 15:14")

parse_date("02/01/2015", "%m/%d/%Y")
parse_date("02/01/2015", "%d/%m/%Y")
```

## Problems

If there are any problems parsing the file, the `read_` function will throw a warning telling you how many problems there are. You can then use the `problems()` function to access a data frame that gives information about each problem:

```{r}
csv <- "x,y
1,a
b,2
"

df <- read_csv(csv, col_types = "ii")
problems(df)
df
```

## Helper functions

Readr also provides a handful of other useful functions:

* `read_lines()` works the same way as `readLines()`, but is a lot faster.
* `read_file()` reads a complete file into a string.
* `type_convert()` attempts to coerce all character columns to their appropriate type. This is useful if you need to do some manual munging (e.g. with regular expressions) to turn strings into numbers. It uses the same rules as the `read_*` functions.

* `write_csv()` writes a data frame out to a csv file. It's quite a bit faster than `write.csv()` and it never writes row.names. It also escapes `"` embedded in strings in a way that `read_csv()` can read.

## Development

Readr is still under very active development. If you have problems loading a dataset, please try the [development version](https://github.com/hadley/readr), and if that doesn't work, [file an issue](https://github.com/hadley/readr/issues).

readr/vignettes/locales.Rmd0000644000175100001440000002001713106315444015464 0ustar hornikusers
---
title: "Locales"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Locales}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
library(readr)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

The goal of readr's locales is to encapsulate common options that vary between languages and localities. This includes:

* The names of months and days, used when parsing dates.
* The default time zone, used when parsing datetimes.
* The character encoding, used when reading non-ASCII strings.
* Default date format, used when guessing column types.
* The decimal and grouping marks, used when reading numbers.

(Strictly speaking these are not locales in the usual technical sense of the word because they also contain information about time zones and encoding.)

To create a new locale, you use the `locale()` function:

```{r}
locale()
```

The rest of this vignette will explain what each of the options do.

All of the parsing functions in readr take a `locale` argument. You'll most often use it with `read_csv()`, `read_fwf()` or `read_table()`. Readr is designed to work the same way across systems, so the default locale is English-centric like R. If you're not in an English-speaking country, this makes initial import a little harder, because you have to override the defaults. But the payoff is big: you can share your code and know that it will work on any other system. Base R takes a different philosophy. It uses system defaults, so typical data import is a little easier, but sharing code is harder.

Rather than demonstrating the use of locales with `read_csv()` and fields, in this vignette I'm going to use the `parse_*()` functions. These work with a character vector instead of a file on disk, so they're easier to use in examples. They're also useful in their own right if you need to do custom parsing. See `type_convert()` if you need to apply multiple parsers to a data frame.

## Dates and times

### Names of months and days

The first argument to `locale()` is `date_names`, and it controls what values are used for month and day names. The easiest way to specify it is with an ISO 639 language code:

```{r}
locale("ko") # Korean
locale("fr") # French
```

If you don't already know the code for your language, [Wikipedia](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) has a good list. Currently readr has `r length(date_names_langs())` languages available. You can list them all with `date_names_langs()`.

Specifying a locale allows you to parse dates in other languages:

```{r}
parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
parse_date("14 oct. 1979", "%d %b %Y", locale = locale("fr"))
```
1979", "%d %b %Y", locale = locale("fr")) ``` For many languages, it's common to find that diacritics have been stripped so they can be stored as ASCII. You can tell the locale that with the `asciify` option: ```{r} parse_date("1 août 2015", "%d %B %Y", locale = locale("fr")) parse_date("1 aout 2015", "%d %B %Y", locale = locale("fr", asciify = TRUE)) ``` Note that the quality of the translations is variable, especially for the rarer languages. If you discover that they're not quite right for your data, you can create your own with `date_names()`. The following example creates a locale with Māori date names: ```{r} maori <- locale(date_names( day = c("Rātapu", "Rāhina", "Rātū", "Rāapa", "Rāpare", "Rāmere", "Rāhoroi"), mon = c("Kohi-tātea", "Hui-tanguru", "Poutū-te-rangi", "Paenga-whāwhā", "Haratua", "Pipiri", "Hōngongoi", "Here-turi-kōkā", "Mahuru", "Whiringa-ā-nuku", "Whiringa-ā-rangi", "Hakihea") )) ``` ### Timezones Unless otherwise specified, readr assumes that times are in UTC, the Universal Coordinated Time (this is a successor to GMT and for almost all intents is identical). UTC is most suitable for data because it doesn't have daylight savings - this avoids a whole class of potential problems. If your data isn't already in UTC, you'll need to supply a `tz` in the locale: ```{r} parse_datetime("2001-10-10 20:10") parse_datetime("2001-10-10 20:10", locale = locale(tz = "Pacific/Auckland")) parse_datetime("2001-10-10 20:10", locale = locale(tz = "Europe/Dublin")) ``` You can see a complete list of time zones with `OlsonNames()`. If you're American, note that "EST" is a Canadian time zone that does not have DST. It's not Eastern Standard Time! Instead use: * PST/PDT = "US/Pacific" * CST/CDT = "US/Central" * MST/MDT = "US/Mountain" * EST/EDT = "US/Eastern" (Note that there are more specific time zones for smaller areas that don't follow the same rules. For example, "US/Arizona", which follows mostly follows mountain time, but doesn't have daylight savings. If you're dealing with historical data, you might need an even more specific zone like "America/North_Dakota/New_Salem" - that will get you the most accurate time zones.) Note that these are only used as defaults. If individual times have timezones and you're using "%Z" (as name, e.g. "America/Chicago") or "%z" (as offset from UTC, e.g. "+0800"), they'll override the defaults. There's currently no good way to parse times that use US abbreviations. Note that once you have the date in R, changing the time zone just changes its printed representation - it still represents the same instants of time. If you've loaded non-UTC data, and want to display it as UTC, try this snippet of code: ```{r, eval = FALSE} is_datetime <- sapply(df, inherits, "POSIXct") df[is_datetime] <- lapply(df[is_datetime], function(x) { attr(x, "tzone") <- "UTC" x }) ``` ### Default formats Locales also provide default date and time formats. The time format isn't currently used for anything, but the date format is used when guessing column types. The default date format is `%Y-%m-%d` because that's unambiguous: ```{r} str(parse_guess("2010-10-10")) ``` If you're an American, you might want you use your illogical date sytem:: ```{r} str(parse_guess("01/02/2013")) str(parse_guess("01/02/2013", locale = locale(date_format = "%d/%m/%Y"))) ``` ## Character All readr functions yield strings encoded in UTF-8. This encoding is the most likely to give good results in the widest variety of settings. By default, readr assumes that your input is also in UTF-8. 
This is less likely to be the case, especially when you're working with older datasets. The following code illustrates the problems with encodings:

```{r}
library(stringi)
x <- "Émigré cause célèbre déjà vu.\n"
y <- stri_conv(x, "UTF-8", "latin1")

# These strings look like they're identical:
x
y
identical(x, y)

# But they have different encodings:
Encoding(x)
Encoding(y)

# That means while they print the same, their raw (binary)
# representation is actually quite different:
charToRaw(x)
charToRaw(y)

# readr expects strings to be encoded as UTF-8. If they're
# not, you'll get weird characters
parse_character(x)
parse_character(y)

# If you know the encoding, supply it:
parse_character(y, locale = locale(encoding = "latin1"))
```

If you don't know what encoding the file uses, try `guess_encoding()`. It's not 100% perfect (as it's fundamentally a heuristic), but should at least get you pointed in the right direction:

```{r}
guess_encoding(x)
guess_encoding(y)

# Note that the first guess produces a valid string, but isn't correct:
parse_character(y, locale = locale(encoding = "ISO-8859-2"))
# But ISO-8859-1 is another name for latin1
parse_character(y, locale = locale(encoding = "ISO-8859-1"))
```

## Numbers

Some countries use the decimal point, while others use the decimal comma. The `decimal_mark` option controls which readr uses when parsing doubles:

```{r}
parse_double("1,23", locale = locale(decimal_mark = ","))
```

Additionally, when writing out big numbers, you might have `1,000,000`, `1.000.000`, `1 000 000`, or `1'000'000`. The grouping mark is ignored by the more flexible number parser:

```{r}
parse_number("$1,234.56")
parse_number("$1.234,56",
  locale = locale(decimal_mark = ",", grouping_mark = ".")
)

# readr is smart enough to guess that if you're using , for decimals then
# you're probably using . for grouping:
parse_number("$1.234,56", locale = locale(decimal_mark = ","))
```

readr/README.md0000644000175100001440000001257513106315672012660 0ustar hornikusers
readr
================================================

[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/readr)](http://cran.r-project.org/package=readr) [![Build Status](https://travis-ci.org/tidyverse/readr.svg?branch=master)](https://travis-ci.org/tidyverse/readr) [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/tidyverse/readr?branch=master&svg=true)](https://ci.appveyor.com/project/tidyverse/readr) [![Coverage Status](http://codecov.io/github/tidyverse/readr/coverage.svg?branch=master)](http://codecov.io/tidyverse/readr?branch=master)

Overview
--------

The goal of readr is to provide a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes. If you are new to readr, the best place to start is the [data import chapter](http://r4ds.had.co.nz/data-import.html) in R for data science.

Installation
------------

``` r
# The easiest way to get readr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just readr:
install.packages("readr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/readr")
```

Usage
-----

readr is part of the core tidyverse, so load it with:

``` r
library(tidyverse)
```

To accurately read a rectangular dataset with readr you combine two pieces: a function that parses the overall file, and a column specification.
The column specification describes how each column should be converted from a character vector to the most appropriate data type, and in most cases it's not necessary because readr will guess it for you automatically.

readr supports seven file formats with seven `read_` functions:

-   `read_csv()`: comma separated (CSV) files
-   `read_csv2()`: semicolon separated files (common in countries where `,` is used as the decimal place)
-   `read_tsv()`: tab separated files
-   `read_delim()`: general delimited files
-   `read_fwf()`: fixed width files
-   `read_table()`: tabular files where columns are separated by white-space.
-   `read_log()`: web log files

In many cases, these functions will just work: you supply the path to a file and you get a tibble back. The following example loads a sample file bundled with readr:

``` r
mtcars <- read_csv(readr_example("mtcars.csv"))
#> Parsed with column specification:
#> cols(
#>   mpg = col_double(),
#>   cyl = col_integer(),
#>   disp = col_double(),
#>   hp = col_integer(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_integer(),
#>   am = col_integer(),
#>   gear = col_integer(),
#>   carb = col_integer()
#> )
```

Note that readr prints the column specification. This is useful because it allows you to check that the columns have been read in as you expect, and if they haven't, you can easily copy and paste into a new call:

``` r
mtcars <- read_csv(readr_example("mtcars.csv"), col_types = cols(
  mpg = col_double(),
  cyl = col_integer(),
  disp = col_double(),
  hp = col_integer(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_integer(),
  am = col_integer(),
  gear = col_integer(),
  carb = col_integer()
))
```

`vignette("column-types")` gives more detail on how readr guesses the column types, how you can override the defaults, and provides some useful tools for debugging parsing problems.

Alternatives
------------

There are two main alternatives to readr: base R and data.table's `fread()`. The most important differences are discussed below.

### Base R

Compared to the corresponding base functions, readr functions:

-   Use a consistent naming scheme for the parameters (e.g. `col_names` and `col_types` not `header` and `colClasses`).
-   Are much faster (up to 10x).
-   Leave strings as is by default, and automatically parse common date/time formats.
-   Have a helpful progress bar if loading is going to take a while.
-   All functions work exactly the same way regardless of the current locale. To override the US-centric defaults, use `locale()`.

### data.table and `fread()`

[data.table](https://github.com/Rdatatable/data.table) has a function similar to `read_csv()` called `fread()`. Compared to `fread()`, readr functions:

-   Are slower (currently ~1.2-2x slower). If you want absolutely the best performance, use `data.table::fread()`.
-   Use a slightly more sophisticated parser, recognising both doubled (`""""`) and backslash escapes (`"\""`), and can produce factors and date/times directly.
-   Force you to supply all parameters, where `fread()` saves you work by automatically guessing the delimiter, whether or not the file has a header, and how many lines to skip.
-   Are built on a different underlying infrastructure. Readr functions are designed to be quite general, which makes it easier to add support for new rectangular data formats. `fread()` is designed to be as fast as possible.

Acknowledgements
----------------

Thanks to:

-   [Joe Cheng](https://github.com/jcheng5) for showing me the beauty of deterministic finite automata for parsing, and for teaching me why I should write a tokenizer.
- [JJ Allaire](https://github.com/jjallaire) for helping me come up with a design that makes very few copies, and is easy to extend. - [Dirk Eddelbuettel](http://dirk.eddelbuettel.com) for coming up with the name! readr/MD50000644000175100001440000002327413106646435011715 0ustar hornikusersad0327685ef91a8824a4cebe7533ca74 *DESCRIPTION b234ee4d69f5fce4486a80fdaf4a4263 *LICENSE 08f59e5ac10cf0a1c6b2cc7b06cf9dbe *NAMESPACE 043504c811b7618ffaf5d233709dea88 *NEWS.md 4d8cd06e6accb7e97c45d9f1ec38b95d *R/POSIXct.R 5e41f8f462f6444aaeed792a4ede2b0f *R/RcppExports.R f2ae3ddf5dc1d26691fd70fb562db4a3 *R/callback.R 89491f1ac86b414a6e0f9a1878cb47db *R/col_types.R c9e1d8161ea958923e5ffaec7fa6124c *R/collectors.R e16cd98a9bd36fc70d2cb9178fd1c64c *R/count_fields.R e30c324f7416be4019d132e655f16a80 *R/date-symbols.R 48ae89f3562f4ad46358974e056eb81a *R/encoding.R 10667de8e9333a5aa7277b4b58f64e03 *R/example.R 5a5a39853aa020e6b06d508c042c1180 *R/file.R c0b6f3ce6be0904dcc55d6852608956d *R/lines.R b42a89b36d080f175f2aec002c523bbc *R/locale.R 6700527a5a87fde1eb98fd0852d0bceb *R/problems.R 4d39d040c294a156b65dc5d8d8c74adc *R/rds.R 21e90f4ccc0cb8a53d1f0e9b1a167f65 *R/read_delim.R 82ce3f7e1e1d44d7fc07f488b524af00 *R/read_delim_chunked.R 69184e8acdef9b8c788f139da1102586 *R/read_fwf.R 015c59fff5cbfa73aa0ab83050169a5b *R/read_lines_chunked.R 7372b171067416c00a1774ab61cdba5a *R/read_log.R 1005e88c6282de241e367ebcff945927 *R/read_table.R ae7224d8d38af47d23dbf21d6341ed44 *R/readr.R 36f3d485338a0890ced9134814b63842 *R/source.R 7485ed94645922a8f68273c9d2532890 *R/sysdata.rda e91c858c226ee7f60940866011d6c2b1 *R/tokenizer.R 4561b18a80d407b83f305d5c646f1a4a *R/type_convert.R 838d8693cdfdbf2b877ddb0ed670e6c4 *R/utils.R 30df0c2090280a05d7db8664f64f35ed *R/write.R 8b4e0830fbce4a15cd7ed3f5a709e985 *R/zzz.R ea8b3265d7b8e48dcf2be2d88d4b03b9 *README.md d1c3e45b5d85160716ee0e2496dc37bc *build/vignette.rds 8e492ecc4dd7b6a3efc9693795da606b *inst/doc/locales.R 9d5a871ab532c238a9ed9cb7b6e8b656 *inst/doc/locales.Rmd 6f36ac0ba6b6cc8f6deac9220642a415 *inst/doc/locales.html 44ef2e88bc8af75c262c857c0cc0886e *inst/doc/readr.R c59f342de72e2d1fdfb6c62628801815 *inst/doc/readr.Rmd 3677bf54b11bd324df0a0cb5b646393d *inst/doc/readr.html b05a668d9a4de93b5dd397cfde7905b1 *inst/extdata/challenge.csv fa584bf0652806be23f04c9938ec0ec8 *inst/extdata/epa78.txt 9dc92f35a3293d75ce989e1e694a57c7 *inst/extdata/example.log 891bca40aeba031848809c3e587b20d7 *inst/extdata/fwf-sample.txt 0e5e0f32575cc33876c3db780f327708 *inst/extdata/massey-rating.txt 5143f7b8ed70e91698d432d721c11a63 *inst/extdata/mtcars.csv 99100423693851c707ccdb228723ac83 *inst/extdata/mtcars.csv.bz2 d347f11bcaccca8806927c7a211a9640 *inst/extdata/mtcars.csv.zip 3b452b776d69f7fc547b137d73ef93b7 *man/Tokenizers.Rd 9f88500e35c54887912c0f8795299517 *man/callback.Rd 08bbe9163efa893bbbd90240e79ec4f7 *man/col_skip.Rd f300ee3903669f8445f25a5eb1a1efd8 *man/cols.Rd 128f1bbbd67caa93815c05987bafa8a4 *man/count_fields.Rd 74b7b542cb539ab33639790d7bc21b42 *man/datasource.Rd 66a90485ea28e913b4311932710bd554 *man/date_names.Rd c0c73fc6fc44077a31deb4c009f9b0e4 *man/encoding.Rd 257cd25ab6ac8437ffd528b47a34eedb *man/format_delim.Rd 83a1a644eb92fa45bb3671a9110b73eb *man/locale.Rd 9f4021a316a9ea3a0290fa03cc3fc53c *man/output_column.Rd 1179c735e4c54c3f1028d28231c6c332 *man/parse_atomic.Rd 2e5aba59289e9d9d7ac412d38ad119e3 *man/parse_datetime.Rd 5d77005e11ca0aeebd46f696fbfb5ac0 *man/parse_factor.Rd 2d9b6fe081e075d393ac2523a9eecf94 *man/parse_guess.Rd 1b572cca6749efeb58c232eb80f8b0f6 *man/parse_number.Rd 
e9b38e7ec3006ccc33c7293525c71c77 *man/parse_vector.Rd ee745636272636ed7c2ad6c4a06cc7b8 *man/problems.Rd 68044e1133f1599cbbd208e3bfa7e850 *man/read_delim.Rd 370e2c76cd2c81b9508d1bcb7677ad80 *man/read_delim_chunked.Rd f445cf44fe1299d5f4dc0e03a9f8f179 *man/read_file.Rd c3159dae2e65b6ca74af6ef2d58b0d04 *man/read_fwf.Rd 83c7da85e2a0abd54a6682a8eb74944e *man/read_lines.Rd df0fee3cfca3b7d44bfe25c143ba3eae *man/read_lines_chunked.Rd b4f117edec79644f6aefaed22d271cb8 *man/read_log.Rd 788f77fd851ffde336b7f7ba35417ad1 *man/read_rds.Rd bc0ccfc0e9539d36af596612d051908c *man/read_table.Rd 349b1e3f4ea47b01bc55c069a76c191b *man/readr-package.Rd 5d5a49990f4b8547f1dc1bfdd5ec39a4 *man/readr_example.Rd 3d051b2d5f299722be8c266542b9e88c *man/spec.Rd 800761ee5c8a71f596a6439881f9f87f *man/spec_delim.Rd 21cb902861c4518f30421f2cb613a497 *man/tokenize.Rd 88cd11d54976d893e9f2eb0382b33175 *man/type_convert.Rd e7a42ad67e30119e0a4ddb37fd8eff25 *man/write_delim.Rd 3ba3b2656e60856b929ba1046e7285b9 *src/Collector.cpp d5f64e5718b359b2b729c7c93c8e2bc3 *src/Collector.h bcbde355f49679373578e77ffbe54754 *src/CollectorGuess.cpp b240fe86acd4580f6e8bbc09cebe5bee *src/DateTime.h a753218d4164f0fce0e7439b0330b0fb *src/DateTimeParser.h b7edc2ff177cd82125ff43caf63c71f7 *src/Iconv.cpp f479885778600c5503db5ecdfa54ca0d *src/Iconv.h a74578a9b68c235d95f6509a7eb8f9ad *src/LocaleInfo.cpp d7a9446da671b6ef37721623cb89bc4e *src/LocaleInfo.h ea5c667e6402d3cc2c14d6ff9753b910 *src/Makevars.win 3b9a27bf03da6a2047b0f92934c06e50 *src/Progress.h ac6fc10e6245780b1d7bb9386da9e56a *src/QiParsers.h bc4f2f916cf19454ac162f8754e555b7 *src/RcppExports.cpp d55deab9f1c2c9b92586ad75eff0bc66 *src/Reader.cpp 1853eeeb49104347644ae4557c155f25 *src/Reader.h cfca8264d163024ba7c5b31110311838 *src/Source.cpp 0161c10273064a5a8c08eb172e2161f8 *src/Source.h c29ef6c761fb0c8c55f4d07c6b81e980 *src/SourceFile.h 3590b08461286ce99d3e67c5f76b35df *src/SourceRaw.h 6257cdeec9c2b1bdf2b6a285f79cd75d *src/SourceString.h aa7ce6d8c30625c11ec1e0aa3b3b1190 *src/Token.h 2e23a989109dc18ecc8ab7a1e52a6509 *src/Tokenizer.cpp d87374701e3aa1498fb5fef4f57bb3b8 *src/Tokenizer.h 52256d4fb87b78beb149781b2bf6d972 *src/TokenizerDelim.cpp 195d020f46c77577cfe7a7646b240ddc *src/TokenizerDelim.h 66d4f697eb43f7ba569ddc71712abcb5 *src/TokenizerFwf.cpp a42a131873d4b1fd34eb8bc93354edb0 *src/TokenizerFwf.h 3ac103427a97d28bce2fd4e8112eb69c *src/TokenizerLine.h de8b1e6f015fb58e454f44dd86a0d490 *src/TokenizerLog.h d72435866a4d766ef0ad1f732c2715f3 *src/TokenizerWs.cpp 84e9aec990cd9fa4ecf8d0288c45a4c0 *src/TokenizerWs.h 051b79564be7ae37dae965ef4528cd70 *src/Warnings.h e7851f437085853620a9b7bdc2c89dea *src/boost.h be2c92c3d2a0fb1ded1b2bc32a737fa4 *src/connection.cpp ddaa76b8a82daabcb374f62c03a5b74a *src/datetime.cpp d6239d17c5eb7d3393e34e882a3d960a *src/grisu3.c d763bbf07076d54cf56b950534f343ab *src/grisu3.h 08d6f7641ff5b9d6a964096e07cb3ea5 *src/init.c 6d35fc9bb10a5fac8baecdaf862fb7e4 *src/localtime.c fa6a0141e4e6563c551ba4c5b1f2497d *src/localtime.h e51f15029e7b831364b2dec23deafcfa *src/parse.cpp 3995bee83a3fecc69d1914b5928ffd4a *src/read.cpp e9e43bb954168c4b9c4a8116f3d4576e *src/type_convert.cpp e9f09c7e1def0cbc8caf715c2fa28f5c *src/tzfile.h 2cbcee2ab202a4038fe255b8acab5952 *src/utils.h 3b87e7c6fd17045c4691c3804debe060 *src/write.cpp 10d5b787f45dcbc1cc8f872ff956ee60 *src/write_connection.cpp 3e46f73b1816e04e576d40e37faf9071 *src/write_connection.h 3a463f20dd6aa51be165b655c0475441 *src/write_delim.cpp ac6e89c9cad51c62f8f6afeeda497df7 *tests/testthat.R 2686557b47e277b9177c830ce874eb22 
*tests/testthat/basic-df-singlequote.csv d3d05c4f078dc2bf4c3640dd6a36db7b *tests/testthat/basic-df.csv d41d8cd98f00b204e9800998ecf8427e *tests/testthat/empty-file 7c924d6682b55bb601d4e5b428123709 *tests/testthat/enc-iso-8859-1.txt a06a26f43f86d0d2badd0c1c8c43ebf4 *tests/testthat/eol-cr.csv e55dde023260053db920dacbb2648d68 *tests/testthat/eol-cr.txt 87ad70e2779bf2fe683df5922e4a76a9 *tests/testthat/eol-cr.txt.bz2 bdb17292feb64034e5eb2924d5862801 *tests/testthat/eol-cr.txt.gz 4681b3bd5b571d733e085743fd59397d *tests/testthat/eol-cr.txt.xz d5b4be352f40c106430d43c5e861152d *tests/testthat/eol-cr.txt.zip 403913f0469f686e762c722326f8859b *tests/testthat/eol-crlf.csv e55dde023260053db920dacbb2648d68 *tests/testthat/eol-crlf.txt 920aabc4d3eabf4f3709c8aefcddff55 *tests/testthat/eol-lf.csv e55dde023260053db920dacbb2648d68 *tests/testthat/eol-lf.txt 2b4d8a640b79cf108e795e2a81a9cb4b *tests/testthat/fwf-trailing.txt bed1a49448a208359f8c8ba5a2acf208 *tests/testthat/helper.R 2e5a6ac9fca4e989ef83a3a090da9099 *tests/testthat/null-file ea427f49d3ef99f68cc3902c7d317a87 *tests/testthat/raw.csv 0add241c7230a0eec1d1d516b0c52264 *tests/testthat/sample_text.txt c99264a1a2e84493dfb2d96ff6475e61 *tests/testthat/test-col-spec.R f785d9dab79cb6ef69900928a6a814dc *tests/testthat/test-collectors.R 0e1e4c5a5dac6c994b10de00d3373196 *tests/testthat/test-encoding.R 49f45122bf5405b102b2ba0e3ce24387 *tests/testthat/test-eol.R 821a2c544a118cd8c2a4f53585d52524 *tests/testthat/test-locale.R 049e5e35f76152fa14c1b6245c0937d6 *tests/testthat/test-parsing-character.R e4ef48e7f58543fec7b351c7001355df *tests/testthat/test-parsing-count-fields.R 2a6d71e3f5985bd168327f315c10fe57 *tests/testthat/test-parsing-datetime.R 163e752c7380b8fed6a986a8ca26da74 *tests/testthat/test-parsing-factors.R 4c36f5576a2a15dc63f28ef22bc362de *tests/testthat/test-parsing-logical.R bb3862626a6c81947e9a2c1042ec6b10 *tests/testthat/test-parsing-numeric.R 66f42b34b630da477b510bc7cd7d56af *tests/testthat/test-parsing-time.R e413a7173eba27678274de75d0426735 *tests/testthat/test-parsing.R 75cbb003d6413c95b9690e822d1ecb55 *tests/testthat/test-problems.R c8da43bf60ea4b791e4e0523d0d94876 *tests/testthat/test-read-chunked.R dab1fc000c82b8cd45175988089c3a6a *tests/testthat/test-read-csv.R 72fedc218eadd7cd6bce486b47dc8f2d *tests/testthat/test-read-file.R 2debfb814c995ff659d76928b76f8b02 *tests/testthat/test-read-fwf.R d0d0d867482a456b7b8fda67faff1a75 *tests/testthat/test-read-lines.R eafc6a6b3bb3ffd8e87307ff647e1bd1 *tests/testthat/test-read-table.R 8def70b30d29c2c609f1342d74354df0 *tests/testthat/test-tokenizer-delim.R 7b7ba5e5028c7af8b1c7cf74b5ae6780 *tests/testthat/test-type-convert.R 35cda508fcf0a5c88241eed8be0d065a *tests/testthat/test-write-delim.R 5b45dcf3aadd82e1d9a8e99c655327aa *tests/testthat/test-write-lines.R be927e3b50837f942eff2f0ec19e5f2a *tools/logo.png 9d5a871ab532c238a9ed9cb7b6e8b656 *vignettes/locales.Rmd c59f342de72e2d1fdfb6c62628801815 *vignettes/readr.Rmd 4da6128da9f2fe51ab3f77f0a5f2e1ed *vignettes/releases/readr-0.1.0.Rmd 66e1ce4e0234d811dd11267c6325161d *vignettes/releases/readr-0.2.0.Rmd 7f1a25f4f55231db9fb8731744431c4e *vignettes/releases/readr-1.0.0.Rmd readr/build/0000755000175100001440000000000013106621354012465 5ustar hornikusersreadr/build/vignette.rds0000644000175100001440000000034113106621354015022 0ustar hornikusersu 0gڟYA >Q]t;"an1]O&)ts~9n!BhAX ˜pV%Z#T0.O֖JW )b-c7ٸ]r~!c5ܸ zsA<;f.H DžAVpBN3z=#{RA=^l9@h۶:9#CJ4I j΅readr/DESCRIPTION0000644000175100001440000000316413106646435013107 0ustar hornikusersPackage: readr Version: 1.1.1 Title: Read 
Rectangular Text Data
Description: The goal of 'readr' is to provide a fast and friendly way to read
    rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly
    parse many types of data found in the wild, while still cleanly failing when
    data unexpectedly changes.
Authors@R: c(
    person("Hadley", "Wickham", , "hadley@rstudio.com", "aut"),
    person("Jim", "Hester", , "james.hester@rstudio.com", c("aut", "cre")),
    person("Romain", "Francois", role = "aut"),
    person("R Core Team", role = "ctb", comment = "Date time code adapted from R"),
    person("RStudio", role = c("cph", "fnd")),
    person("Jukka", "Jylänki", role = c("ctb", "cph"), comment = "grisu3 implementation"),
    person("Mikkel", "Jørgensen", role = c("ctb", "cph"), comment = "grisu3 implementation"))
Encoding: UTF-8
Depends: R (>= 3.0.2)
LinkingTo: Rcpp, BH
Imports: Rcpp (>= 0.12.0.5), tibble, hms, R6
Suggests: curl, testthat, knitr, rmarkdown, stringi, covr
License: GPL (>= 2) | file LICENSE
BugReports: https://github.com/tidyverse/readr/issues
URL: http://readr.tidyverse.org, https://github.com/tidyverse/readr
VignetteBuilder: knitr
RoxygenNote: 6.0.1
NeedsCompilation: yes
Packaged: 2017-05-16 16:03:56 UTC; jhester
Author: Hadley Wickham [aut], Jim Hester [aut, cre], Romain Francois [aut], R Core Team [ctb] (Date time code adapted from R), RStudio [cph, fnd], Jukka Jylänki [ctb, cph] (grisu3 implementation), Mikkel Jørgensen [ctb, cph] (grisu3 implementation)
Maintainer: Jim Hester <james.hester@rstudio.com>
Repository: CRAN
Date/Publication: 2017-05-16 19:03:57 UTC
readr/man/0000755000175100001440000000000013106315672012144 5ustar hornikusersreadr/man/problems.Rd0000644000175100001440000000213413106315444014253 0ustar hornikusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/problems.R
\name{problems}
\alias{problems}
\alias{stop_for_problems}
\title{Retrieve parsing problems}
\usage{
problems(x)

stop_for_problems(x)
}
\arguments{
\item{x}{A data frame (from \code{read_*()}) or a vector (from \code{parse_*()}).}
}
\value{
A data frame with one row for each problem and four columns:
\item{row,col}{Row and column of problem}
\item{expected}{What readr expected to find}
\item{actual}{What it actually got}
}
\description{
Readr functions will only throw an error if parsing fails in an unrecoverable way. However, there are lots of potential problems that you might want to know about - these are stored in the \code{problems} attribute of the output, which you can easily access with this function.

\code{stop_for_problems()} will throw an error if there are any parsing problems: this is useful for automated scripts where you want to throw an error as soon as you encounter a problem.
}
\examples{
x <- parse_integer(c("1X", "blah", "3"))
problems(x)

y <- parse_integer(c("1", "2", "3"))
problems(y)
}
readr/man/locale.Rd0000644000175100001440000000444313106315444013674 0ustar hornikusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/locale.R
\name{locale}
\alias{locale}
\alias{default_locale}
\title{Create locales}
\usage{
locale(date_names = "en", date_format = "\%AD", time_format = "\%AT",
  decimal_mark = ".", grouping_mark = ",", tz = "UTC",
  encoding = "UTF-8", asciify = FALSE)

default_locale()
}
\arguments{
\item{date_names}{Character representations of day and month names.
Either the language code as a string (passed on to \code{\link[=date_names_lang]{date_names_lang()}}) or an object created by \code{\link[=date_names]{date_names()}}.}

\item{date_format, time_format}{Default date and time formats.}

\item{decimal_mark, grouping_mark}{Symbols used to indicate the decimal place, and to chunk larger numbers. Decimal mark can only be \code{,} or \code{.}.}

\item{tz}{Default tz. This is used both for input (if the time zone isn't present in individual strings), and for output (to control the default display). The default is to use "UTC", a time zone that does not use daylight savings time (DST) and hence is typically most useful for data. The absence of time zones makes it approximately 50x faster to generate UTC times than any other time zone.

Use \code{""} to use the system default time zone, but beware that this will not be reproducible across systems.

For a complete list of possible time zones, see \code{\link{OlsonNames}()}. Americans, note that "EST" is a Canadian time zone that does not have DST. It is \emph{not} Eastern Standard Time. It's better to use "US/Eastern", "US/Central" etc.}

\item{encoding}{Default encoding. This only affects how the file is read - readr always converts the output to UTF-8.}

\item{asciify}{Should diacritics be stripped from date names and converted to ASCII? This is useful if you're dealing with ASCII data where the correct spellings have been lost. Requires the \pkg{stringi} package.}
}
\description{
A locale object tries to capture all the defaults that can vary between countries. You set the locale once, and the details are automatically passed on down to the column parsers. The defaults have been chosen to match R (i.e. US English) as closely as possible. See \code{vignette("locales")} for more details.
}
\examples{
locale()
locale("fr")

# South American locale
locale("es", decimal_mark = ",")
}
readr/man/parse_datetime.Rd0000644000175100001440000001426013106315444015421 0ustar hornikusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/collectors.R
\name{parse_datetime}
\alias{parse_datetime}
\alias{parse_date}
\alias{parse_time}
\alias{col_datetime}
\alias{col_date}
\alias{col_time}
\title{Parse date/times}
\usage{
parse_datetime(x, format = "", na = c("", "NA"), locale = default_locale())

parse_date(x, format = "", na = c("", "NA"), locale = default_locale())

parse_time(x, format = "", na = c("", "NA"), locale = default_locale())

col_datetime(format = "")

col_date(format = "")

col_time(format = "")
}
\arguments{
\item{x}{A character vector of dates to parse.}

\item{format}{A format specification, as described below. If set to "", date times are parsed as ISO8601, dates and times use the date and time formats specified in the \code{\link[=locale]{locale()}}. Unlike \code{\link[=strptime]{strptime()}}, the format specification must match the complete string.}

\item{na}{Character vector of strings to use for missing values. Set this option to \code{character()} to indicate no missing values.}

\item{locale}{The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use \code{\link[=locale]{locale()}} to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.}
}
\value{
A \code{\link[=POSIXct]{POSIXct()}} vector with \code{tzone} attribute set to \code{tz}.
Elements that could not be parsed (or did not generate valid dates) will be set to \code{NA}, and a warning message will inform you of the total number of failures.
}
\description{
Parse date/times
}
\section{Format specification}{

\code{readr} uses a format specification similar to \code{\link[=strptime]{strptime()}}. There are three types of element:
\enumerate{
\item Date components are specified with "\%" followed by a letter. For example "\%Y" matches a 4 digit year, "\%m" matches a 2 digit month and "\%d" matches a 2 digit day. Month and day default to \code{1} (i.e. Jan 1st) if not present, for example if only a year is given.
\item Whitespace is any sequence of zero or more whitespace characters.
\item Any other character is matched exactly.
}

\code{parse_datetime()} recognises the following format specifications:
\itemize{
\item Year: "\%Y" (4 digits). "\%y" (2 digits); 00-69 -> 2000-2069, 70-99 -> 1970-1999.
\item Month: "\%m" (2 digits), "\%b" (abbreviated name in current locale), "\%B" (full name in current locale).
\item Day: "\%d" (2 digits), "\%e" (optional leading space)
\item Hour: "\%H" or "\%I", use I (and not H) with AM/PM.
\item Minutes: "\%M"
\item Seconds: "\%S" (integer seconds), "\%OS" (partial seconds)
\item Time zone: "\%Z" (as name, e.g. "America/Chicago"), "\%z" (as offset from UTC, e.g. "+0800")
\item AM/PM indicator: "\%p".
\item Non-digits: "\%." skips one non-digit character, "\%+" skips one or more non-digit characters, "\%*" skips any number of non-digit characters.
\item Automatic parsers: "\%AD" parses with a flexible YMD parser, "\%AT" parses with a flexible HMS parser.
\item Shortcuts: "\%D" = "\%m/\%d/\%y", "\%F" = "\%Y-\%m-\%d", "\%R" = "\%H:\%M", "\%T" = "\%H:\%M:\%S", "\%x" = "\%y/\%m/\%d".
}
}

\section{ISO8601 support}{

Currently, readr does not support all of ISO8601. Missing features:
\itemize{
\item Week & weekday specifications, e.g. "2013-W05", "2013-W05-10"
\item Ordinal dates, e.g. "2013-095".
\item Using commas instead of a period for the decimal separator
}

The parser is also a little laxer than ISO8601:
\itemize{
\item Dates and times can be separated with a space, not just T.
\item Mostly correct specifications like "2009-05-19 14:" and "200912-01" work.
}
}

\examples{
# Format strings --------------------------------------------------------
parse_datetime("01/02/2010", "\%d/\%m/\%Y")
parse_datetime("01/02/2010", "\%m/\%d/\%Y")
# Handle any separator
parse_datetime("01/02/2010", "\%m\%.\%d\%.\%Y")

# Dates look the same, but internally they use the number of days since
# 1970-01-01 instead of the number of seconds. This avoids a whole lot
# of troubles related to time zones, so use if you can.
parse_date("01/02/2010", "\%d/\%m/\%Y")
parse_date("01/02/2010", "\%m/\%d/\%Y")

# You can parse timezones from strings (as listed in OlsonNames())
parse_datetime("2010/01/01 12:00 US/Central", "\%Y/\%m/\%d \%H:\%M \%Z")
# Or from offsets
parse_datetime("2010/01/01 12:00 -0600", "\%Y/\%m/\%d \%H:\%M \%z")

# Use the locale parameter to control the default time zone
# (but note UTC is considerably faster than other options)
parse_datetime("2010/01/01 12:00", "\%Y/\%m/\%d \%H:\%M",
  locale = locale(tz = "US/Central"))
parse_datetime("2010/01/01 12:00", "\%Y/\%m/\%d \%H:\%M",
  locale = locale(tz = "US/Eastern"))

# Unlike strptime, the format specification must match the complete
# string (ignoring leading and trailing whitespace). This avoids common
# errors:
strptime("01/02/2010", "\%d/\%m/\%y")
parse_datetime("01/02/2010", "\%d/\%m/\%y")

# Failures -------------------------------------------------------------
parse_datetime("01/01/2010", "\%d/\%m/\%Y")
parse_datetime(c("01/ab/2010", "32/01/2010"), "\%d/\%m/\%Y")

# Locales --------------------------------------------------------------
# By default, readr expects English date/times, but that's easy to change:
parse_datetime("1 janvier 2015", "\%d \%B \%Y", locale = locale("fr"))
parse_datetime("1 enero 2015", "\%d \%B \%Y", locale = locale("es"))

# ISO8601 --------------------------------------------------------------
# With separators
parse_datetime("1979-10-14")
parse_datetime("1979-10-14T10")
parse_datetime("1979-10-14T10:11")
parse_datetime("1979-10-14T10:11:12")
parse_datetime("1979-10-14T10:11:12.12345")

# Without separators
parse_datetime("19791014")
parse_datetime("19791014T101112")

# Time zones
us_central <- locale(tz = "US/Central")
parse_datetime("1979-10-14T1010", locale = us_central)
parse_datetime("1979-10-14T1010-0500", locale = us_central)
parse_datetime("1979-10-14T1010Z", locale = us_central)
# Your current time zone
parse_datetime("1979-10-14T1010", locale = locale(tz = ""))
}
\seealso{
Other parsers: \code{\link{col_skip}}, \code{\link{parse_factor}}, \code{\link{parse_guess}}, \code{\link{parse_logical}}, \code{\link{parse_number}}
}
readr/man/read_lines.Rd0000644000175100001440000000576313106315444014546 0ustar hornikusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lines.R
\name{read_lines}
\alias{read_lines}
\alias{read_lines_raw}
\alias{write_lines}
\title{Read/write lines to/from a file}
\usage{
read_lines(file, skip = 0, n_max = -1L, locale = default_locale(),
  na = character(), progress = show_progress())

read_lines_raw(file, skip = 0, n_max = -1L, progress = show_progress())

write_lines(x, path, na = "NA", append = FALSE)
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be automatically uncompressed. Files starting with \code{http://}, \code{https://}, \code{ftp://}, or \code{ftps://} will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path).}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Number of lines to read. If \code{n_max} is -1, all lines in file will be read.}

\item{locale}{The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use \code{\link[=locale]{locale()}} to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.}

\item{na}{Character vector of strings to use for missing values. Set this option to \code{character()} to indicate no missing values.}

\item{progress}{Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The display is updated every 50,000 values and will only display if estimated reading time is 5 seconds or more.
The automatic progress bar can be disabled by setting option \code{readr.show_progress} to \code{FALSE}.}

\item{x}{A character vector or list of raw vectors to write to disk.}

\item{path}{Path or connection to write to.}

\item{append}{If \code{FALSE}, will overwrite existing file. If \code{TRUE}, will append to existing file. In both cases, if file does not exist a new file is created.}
}
\value{
\code{read_lines()}: A character vector with one element for each line.

\code{read_lines_raw()}: A list containing a raw vector for each line.

\code{write_lines()} returns \code{x}, invisibly.
}
\description{
\code{read_lines()} reads up to \code{n_max} lines from a file. New lines are not included in the output. \code{read_lines_raw()} produces a list of raw vectors, and is useful for handling data with unknown encoding. \code{write_lines()} takes a character vector or list of raw vectors, appending a new line after each entry.
}
\examples{
read_lines(file.path(R.home("doc"), "AUTHORS"), n_max = 10)
read_lines_raw(file.path(R.home("doc"), "AUTHORS"), n_max = 10)

tmp <- tempfile()

write_lines(rownames(mtcars), tmp)
read_lines(tmp)
read_file(tmp) # note trailing \\n

write_lines(airquality$Ozone, tmp, na = "-1")
read_lines(tmp)
}
readr/man/format_delim.Rd0000644000175100001440000000332713106315444015077 0ustar hornikusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/write.R
\name{format_delim}
\alias{format_delim}
\alias{format_csv}
\alias{format_tsv}
\title{Convert a data frame to a delimited string}
\usage{
format_delim(x, delim, na = "NA", append = FALSE, col_names = !append)

format_csv(x, na = "NA", append = FALSE, col_names = !append)

format_tsv(x, na = "NA", append = FALSE, col_names = !append)
}
\arguments{
\item{x}{A data frame to write to disk}

\item{delim}{Delimiter used to separate values. Defaults to \code{" "}. Must be a single character.}

\item{na}{String used for missing values. Defaults to NA. Missing values will never be quoted; strings with the same value as \code{na} will always be quoted.}

\item{append}{If \code{FALSE}, will overwrite existing file. If \code{TRUE}, will append to existing file. In both cases, if file does not exist a new file is created.}

\item{col_names}{Write columns names at the top of the file?}
}
\value{
A string.
}
\description{
These functions are equivalent to \code{\link[=write_csv]{write_csv()}} etc., but instead of writing to disk, they return a string.
}
\section{Output}{

Factors are coerced to character. Doubles are formatted using the grisu3 algorithm. POSIXct's are formatted as ISO8601.

All columns are encoded as UTF-8. \code{write_excel_csv()} also includes a \href{https://en.wikipedia.org/wiki/Byte_order_mark}{UTF-8 Byte order mark} which indicates to Excel the csv is UTF-8 encoded.

Values are only quoted if needed: if they contain a comma, quote or newline.
}

\references{
Florian Loitsch, Printing Floating-Point Numbers Quickly and Accurately with Integers, PLDI '10, \url{http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf}
}
readr/man/spec.Rd0000644000175100001440000000130213106315444013360 0ustar hornikusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/col_types.R
\name{cols_condense}
\alias{cols_condense}
\alias{spec}
\title{Examine the column specifications for a data frame}
\usage{
cols_condense(x)

spec(x)
}
\arguments{
\item{x}{The data frame object to extract from}
}
\value{
A col_spec object.
} \description{ \code{cols_condense()} takes a spec object and condenses its definition by setting the default column type to the most frequent type and only listing columns with a different type. \code{spec()} extracts the full column specification from a tibble created by readr. } \examples{ df <- read_csv(readr_example("mtcars.csv")) s <- spec(df) s cols_condense(s) } readr/man/spec_delim.Rd0000644000175100001440000001414513106315672014544 0ustar hornikusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/read_delim.R, R/read_table.R \name{spec_delim} \alias{spec_delim} \alias{spec_csv} \alias{spec_csv2} \alias{spec_tsv} \alias{spec_table} \title{Generate a column specification} \usage{ spec_delim(file, delim, quote = "\\"", escape_backslash = FALSE, escape_double = TRUE, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"), quoted_na = TRUE, comment = "", trim_ws = FALSE, skip = 0, n_max = 0, guess_max = 1000, progress = show_progress()) spec_csv(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"), quoted_na = TRUE, quote = "\\"", comment = "", trim_ws = TRUE, skip = 0, n_max = 0, guess_max = 1000, progress = show_progress()) spec_csv2(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"), quoted_na = TRUE, quote = "\\"", comment = "", trim_ws = TRUE, skip = 0, n_max = 0, guess_max = 1000, progress = show_progress()) spec_tsv(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"), quoted_na = TRUE, quote = "\\"", comment = "", trim_ws = TRUE, skip = 0, n_max = 0, guess_max = 1000, progress = show_progress()) spec_table(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = "NA", skip = 0, n_max = 0, guess_max = 1000, progress = show_progress(), comment = "") } \arguments{ \item{file}{Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be automatically uncompressed. Files starting with \code{http://}, \code{https://}, \code{ftp://}, or \code{ftps://} will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed. Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path).} \item{delim}{Single character used to separate fields within a record.} \item{quote}{Single character used to quote strings.} \item{escape_backslash}{Does the file use backslashes to escape special characters? This is more general than \code{escape_double} as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \code{\\n}.} \item{escape_double}{Does the file escape quotes by doubling them? i.e. If this option is \code{TRUE}, the value \code{""""} represents a single quote, \code{\"}.} \item{col_names}{Either \code{TRUE}, \code{FALSE} or a character vector of column names. If \code{TRUE}, the first row of the input will be used as the column names, and will not be included in the data frame. If \code{FALSE}, column names will be generated automatically: X1, X2, X3 etc. If \code{col_names} is a character vector, the values will be used as the names of the columns, and the first row of the input will be read into the first row of the output data frame. 
Missing (\code{NA}) column names will generate a warning, and be filled in with dummy names \code{X1}, \code{X2} etc. Duplicate column names will generate a warning and be made unique with a numeric prefix.}

\item{col_types}{One of \code{NULL}, a \code{\link[=cols]{cols()}} specification, or a string. See \code{vignette("column-types")} for more details.

If \code{NULL}, all column types will be imputed from the first 1000 rows of the input. This is convenient (and fast), but not robust. If the imputation fails, you'll need to supply the correct types yourself.

If a column specification created by \code{\link[=cols]{cols()}}, it must contain one column specification for each column. If you only want to read a subset of the columns, use \code{\link[=cols_only]{cols_only()}}.

Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, D = date, T = date time, t = time, ? = guess, or \code{_}/\code{-} to skip the column.}

\item{locale}{The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use \code{\link[=locale]{locale()}} to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.}

\item{na}{Character vector of strings to use for missing values. Set this option to \code{character()} to indicate no missing values.}

\item{quoted_na}{Should missing values inside quotes be treated as missing values (the default) or strings.}

\item{comment}{A string used to identify comments. Any text after the comment characters will be silently ignored.}

\item{trim_ws}{Should leading and trailing whitespace be trimmed from each field before parsing it?}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Maximum number of records to read.}

\item{guess_max}{Maximum number of records to use for guessing column types.}

\item{progress}{Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The display is updated every 50,000 values and will only display if estimated reading time is 5 seconds or more. The automatic progress bar can be disabled by setting option \code{readr.show_progress} to \code{FALSE}.}
}
\value{
The \code{col_spec} generated for the file.
}
\description{
When printed, only the first 20 columns are printed by default. To override this, set \code{options(readr.num_columns)} (a value of 0 turns off printing).
}
\examples{
# Input sources -------------------------------------------------------------
# Retrieve specs from a path
spec_csv(system.file("extdata/mtcars.csv", package = "readr"))
spec_csv(system.file("extdata/mtcars.csv.zip", package = "readr"))

# Or directly from a string (must contain a newline)
spec_csv("x,y\\n1,2\\n3,4")

# Column types --------------------------------------------------------------
# By default, readr guesses the column types, looking at the first 1000 rows.
# You can specify the number of rows used with guess_max.
spec_csv(system.file("extdata/mtcars.csv", package = "readr"), guess_max = 20)
}
readr/man/read_lines_chunked.Rd0000644000175100001440000000377313106315444016250 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_lines_chunked.R
\name{read_lines_chunked}
\alias{read_lines_chunked}
\title{Read lines from a file or string by chunk.}
\usage{
read_lines_chunked(file, callback, chunk_size = 10000, skip = 0,
  locale = default_locale(), na = character(),
  progress = show_progress())
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{callback}{A callback function to call on each chunk.}

\item{chunk_size}{The number of rows to include in each chunk.}

\item{skip}{Number of lines to skip before reading data.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{progress}{Display a progress bar? By default it will only display in
an interactive session and not while knitting a document. The display is
updated every 50,000 values and will only display if estimated reading
time is 5 seconds or more. The automatic progress bar can be disabled by
setting option \code{readr.show_progress} to \code{FALSE}.}
}
\description{
Read lines from a file or string by chunk.
}
\seealso{
Other chunked: \code{\link{callback}},
\code{\link{read_delim_chunked}}
}
\keyword{internal}
readr/man/datasource.Rd0000644000175100001440000000242113106315444014561 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/source.R
\name{datasource}
\alias{datasource}
\title{Create a source object.}
\usage{
datasource(file, skip = 0, comment = "")
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{skip}{Number of lines to skip before reading data.}

\item{comment}{A string used to identify comments. Any text after the
comment characters will be silently ignored.}
}
\description{
Create a source object.
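For instance, \code{comment} behaves as in the \code{read_*()} functions
(a minimal sketch; the literal input below is an invented toy example):
\preformatted{
datasource("# a comment\\na,b,c\\n1,2,3", comment = "#")
}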
}
\examples{
# Literal csv
datasource("a,b,c\\n1,2,3")
datasource(charToRaw("a,b,c\\n1,2,3"))

# File paths
datasource(readr_example("mtcars.csv"))
datasource(readr_example("mtcars.csv.bz2"))
datasource(readr_example("mtcars.csv.zip"))
\dontrun{
datasource("https://github.com/tidyverse/readr/raw/master/inst/extdata/mtcars.csv")
}

# Connection
con <- rawConnection(charToRaw("abc\\n123"))
datasource(con)
close(con)
}
\keyword{internal}
readr/man/parse_guess.Rd0000644000175100001440000000315413106315444014753 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/collectors.R
\name{parse_guess}
\alias{parse_guess}
\alias{col_guess}
\alias{guess_parser}
\title{Parse using the "best" type}
\usage{
parse_guess(x, na = c("", "NA"), locale = default_locale())

col_guess()

guess_parser(x, locale = default_locale())
}
\arguments{
\item{x}{Character vector of values to parse.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}
}
\description{
\code{parse_guess()} returns the parsed vector; \code{guess_parser()}
returns the name of the parser. These functions use a number of heuristics
to determine which type of vector is "best". Generally they try to err on
the side of safety, as it's straightforward to override the parsing choice
if needed.
}
\examples{
# Logical vectors
parse_guess(c("FALSE", "TRUE", "F", "T"))

# Integers and doubles
parse_guess(c("1","2","3"))
parse_guess(c("1.6","2.6","3.4"))

# Numbers containing grouping mark
guess_parser("1,234,566")
parse_guess("1,234,566")

# ISO 8601 date times
guess_parser(c("2010-10-10"))
parse_guess(c("2010-10-10"))
}
\seealso{
Other parsers: \code{\link{col_skip}},
\code{\link{parse_datetime}}, \code{\link{parse_factor}},
\code{\link{parse_logical}}, \code{\link{parse_number}}
}
readr/man/parse_factor.Rd0000644000175100001440000000351713106315444015106 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/collectors.R
\name{parse_factor}
\alias{parse_factor}
\alias{col_factor}
\title{Parse factors}
\usage{
parse_factor(x, levels, ordered = FALSE, na = c("", "NA"),
  locale = default_locale(), include_na = TRUE)

col_factor(levels, ordered = FALSE, include_na = FALSE)
}
\arguments{
\item{x}{Character vector of values to parse.}

\item{levels}{Character vector providing the set of allowed levels. If
\code{NULL}, will generate levels based on the unique values of \code{x},
ordered by order of appearance in \code{x}.}

\item{ordered}{Is it an ordered factor?}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}

\item{include_na}{If \code{NA} are present, include \code{NA} as an
explicit factor level?}
}
\description{
\code{parse_factor} is similar to \code{\link[=factor]{factor()}}, but will
generate warnings if elements of \code{x} are not found in \code{levels}.
}
\examples{
parse_factor(c("a", "b"), letters)

x <- c("cat", "dog", "caw")
levels <- c("cat", "dog", "cow")

# Base R factor() silently converts unknown levels to NA
x1 <- factor(x, levels)

# parse_factor generates a warning & problems
x2 <- parse_factor(x, levels)

# Using an argument of `NULL` will generate levels based on values of `x`
x2 <- parse_factor(x, levels = NULL)
}
\seealso{
Other parsers: \code{\link{col_skip}},
\code{\link{parse_datetime}}, \code{\link{parse_guess}},
\code{\link{parse_logical}}, \code{\link{parse_number}}
}
readr/man/Tokenizers.Rd0000644000175100001440000000427013106315444014570 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tokenizer.R
\name{Tokenizers}
\alias{Tokenizers}
\alias{tokenizer_delim}
\alias{tokenizer_csv}
\alias{tokenizer_tsv}
\alias{tokenizer_line}
\alias{tokenizer_log}
\alias{tokenizer_fwf}
\alias{tokenizer_ws}
\title{Tokenizers.}
\usage{
tokenizer_delim(delim, quote = "\\"", na = "NA", quoted_na = TRUE,
  comment = "", trim_ws = TRUE, escape_double = TRUE,
  escape_backslash = FALSE)

tokenizer_csv(na = "NA", quoted_na = TRUE, quote = "\\"", comment = "",
  trim_ws = TRUE)

tokenizer_tsv(na = "NA", quoted_na = TRUE, quote = "\\"", comment = "",
  trim_ws = TRUE)

tokenizer_line(na = character())

tokenizer_log()

tokenizer_fwf(begin, end, na = "NA", comment = "")

tokenizer_ws(na = "NA", comment = "")
}
\arguments{
\item{delim}{Single character used to separate fields within a record.}

\item{quote}{Single character used to quote strings.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{quoted_na}{Should missing values inside quotes be treated as missing
values (the default) or strings?}

\item{comment}{A string used to identify comments. Any text after the
comment characters will be silently ignored.}

\item{trim_ws}{Should leading and trailing whitespace be trimmed from each
field before parsing it?}

\item{escape_double}{Does the file escape quotes by doubling them?
i.e. If this option is \code{TRUE}, the value \code{""""} represents a
single quote, \code{\"}.}

\item{escape_backslash}{Does the file use backslashes to escape special
characters? This is more general than \code{escape_double} as backslashes
can be used to escape the delimiter character, the quote character, or to
add special characters like \code{\\n}.}

\item{begin, end}{Begin and end offsets for each file. These are C++
offsets so the first column is column zero, and the ranges are
[begin, end) (i.e. inclusive-exclusive).}
}
\description{
Explicitly create tokenizer objects. Usually you will not call these
functions, but will instead use one of the user friendly wrappers like
\code{\link[=read_csv]{read_csv()}}.
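For example, a tokenizer object can be handed to low-level helpers such as
\code{\link[=tokenize]{tokenize()}} or
\code{\link[=count_fields]{count_fields()}} (a minimal sketch over an
invented toy string):
\preformatted{
tokenize("a;b\\n1;2", tokenizer = tokenizer_delim(";"))
}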
}
\examples{
tokenizer_csv()
}
\keyword{internal}
readr/man/read_delim.Rd0000644000175100001440000001541313106315672014524 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_delim.R
\name{read_delim}
\alias{read_delim}
\alias{read_csv}
\alias{read_csv2}
\alias{read_tsv}
\title{Read a delimited file (including csv & tsv) into a tibble}
\usage{
read_delim(file, delim, quote = "\\"", escape_backslash = FALSE,
  escape_double = TRUE, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  comment = "", trim_ws = FALSE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max), progress = show_progress())

read_csv(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  quote = "\\"", comment = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max), progress = show_progress())

read_csv2(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  quote = "\\"", comment = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max), progress = show_progress())

read_tsv(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  quote = "\\"", comment = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max), progress = show_progress())
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{delim}{Single character used to separate fields within a record.}

\item{quote}{Single character used to quote strings.}

\item{escape_backslash}{Does the file use backslashes to escape special
characters? This is more general than \code{escape_double} as backslashes
can be used to escape the delimiter character, the quote character, or to
add special characters like \code{\\n}.}

\item{escape_double}{Does the file escape quotes by doubling them?
i.e. If this option is \code{TRUE}, the value \code{""""} represents a
single quote, \code{\"}.}

\item{col_names}{Either \code{TRUE}, \code{FALSE} or a character vector of
column names.

If \code{TRUE}, the first row of the input will be used as the column
names, and will not be included in the data frame. If \code{FALSE}, column
names will be generated automatically: X1, X2, X3 etc.

If \code{col_names} is a character vector, the values will be used as the
names of the columns, and the first row of the input will be read into the
first row of the output data frame.

Missing (\code{NA}) column names will generate a warning, and be filled in
with dummy names \code{X1}, \code{X2} etc. Duplicate column names will
generate a warning and be made unique with a numeric prefix.}

\item{col_types}{One of \code{NULL}, a \code{\link[=cols]{cols()}}
specification, or a string. See \code{vignette("column-types")} for more
details.

If \code{NULL}, all column types will be imputed from the first 1000 rows
of the input. This is convenient (and fast), but not robust.
If the imputation fails, you'll need to supply the correct types yourself.

If a column specification created by \code{\link[=cols]{cols()}} is
supplied, it must contain one column specification for each column. If you
only want to read a subset of the columns, use
\code{\link[=cols_only]{cols_only()}}.

Alternatively, you can use a compact string representation where each
character represents one column: c = character, i = integer, n = number,
d = double, l = logical, D = date, T = date time, t = time, ? = guess, or
\code{_}/\code{-} to skip the column.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{quoted_na}{Should missing values inside quotes be treated as missing
values (the default) or strings?}

\item{comment}{A string used to identify comments. Any text after the
comment characters will be silently ignored.}

\item{trim_ws}{Should leading and trailing whitespace be trimmed from each
field before parsing it?}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Maximum number of records to read.}

\item{guess_max}{Maximum number of records to use for guessing column
types.}

\item{progress}{Display a progress bar? By default it will only display in
an interactive session and not while knitting a document. The display is
updated every 50,000 values and will only display if estimated reading
time is 5 seconds or more. The automatic progress bar can be disabled by
setting option \code{readr.show_progress} to \code{FALSE}.}
}
\value{
A data frame. If there are parsing problems, a warning tells you how many,
and you can retrieve the details with \code{\link{problems}()}.
}
\description{
\code{read_csv()} and \code{read_tsv()} are special cases of the general
\code{read_delim()}. They're useful for reading the most common types of
flat file data, comma separated values and tab separated values,
respectively. \code{read_csv2()} uses \code{;} for separators, instead of
\code{,}. This is common in European countries which use \code{,} as the
decimal separator.
}
\examples{
# Input sources -------------------------------------------------------------
# Read from a path
read_csv(readr_example("mtcars.csv"))
read_csv(readr_example("mtcars.csv.zip"))
read_csv(readr_example("mtcars.csv.bz2"))
read_csv("https://github.com/tidyverse/readr/raw/master/inst/extdata/mtcars.csv")

# Or directly from a string (must contain a newline)
read_csv("x,y\\n1,2\\n3,4")

# Column types --------------------------------------------------------------
# By default, readr guesses the column types, looking at the first 1000 rows.
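# When the first rows are not representative, guessing can fail; raising
# guess_max is one remedy. A sketch using the bundled challenge.csv, whose
# interesting values only appear after row 1000:
read_csv(readr_example("challenge.csv"), guess_max = 1001)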
# You can override with a compact specification:
read_csv("x,y\\n1,2\\n3,4", col_types = "dc")

# Or with a list of column types:
read_csv("x,y\\n1,2\\n3,4", col_types = list(col_double(), col_character()))

# If there are parsing problems, you get a warning, and can extract
# more details with problems()
y <- read_csv("x\\n1\\n2\\nb", col_types = list(col_double()))
y
problems(y)

# File types ----------------------------------------------------------------
read_csv("a,b\\n1.0,2.0")
read_csv2("a;b\\n1,0;2,0")
read_tsv("a\\tb\\n1.0\\t2.0")
read_delim("a|b\\n1.0|2.0", delim = "|")
}
readr/man/cols.Rd0000644000175100001440000000144413106315444013373 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/col_types.R
\name{cols}
\alias{cols}
\alias{cols_only}
\title{Create column specification}
\usage{
cols(..., .default = col_guess())

cols_only(...)
}
\arguments{
\item{...}{Either column objects created by \code{col_*()}, or their
abbreviated character names. If you're only overriding a few columns, it's
best to refer to columns by name. If not named, the column types must be
supplied in the same order as the columns in the data.}

\item{.default}{Any named columns not explicitly overridden in \code{...}
will be read with this column type.}
}
\description{
Create column specification
}
\examples{
cols(a = col_integer())
cols_only(a = col_integer())

# You can also use the standard abbreviations
cols(a = "i")
cols(a = "i", b = "d", c = "_")
}
readr/man/read_log.Rd0000644000175100001440000000604013106315672014207 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_log.R
\name{read_log}
\alias{read_log}
\title{Read common/combined log file into a tibble}
\usage{
read_log(file, col_names = FALSE, col_types = NULL, skip = 0,
  n_max = Inf, progress = show_progress())
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{col_names}{Either \code{TRUE}, \code{FALSE} or a character vector of
column names.

If \code{TRUE}, the first row of the input will be used as the column
names, and will not be included in the data frame. If \code{FALSE}, column
names will be generated automatically: X1, X2, X3 etc.

If \code{col_names} is a character vector, the values will be used as the
names of the columns, and the first row of the input will be read into the
first row of the output data frame.

Missing (\code{NA}) column names will generate a warning, and be filled in
with dummy names \code{X1}, \code{X2} etc. Duplicate column names will
generate a warning and be made unique with a numeric prefix.}

\item{col_types}{One of \code{NULL}, a \code{\link[=cols]{cols()}}
specification, or a string. See \code{vignette("column-types")} for more
details.

If \code{NULL}, all column types will be imputed from the first 1000 rows
of the input. This is convenient (and fast), but not robust. If the
imputation fails, you'll need to supply the correct types yourself.
If a column specification created by \code{\link[=cols]{cols()}} is
supplied, it must contain one column specification for each column. If you
only want to read a subset of the columns, use
\code{\link[=cols_only]{cols_only()}}.

Alternatively, you can use a compact string representation where each
character represents one column: c = character, i = integer, n = number,
d = double, l = logical, D = date, T = date time, t = time, ? = guess, or
\code{_}/\code{-} to skip the column.}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Maximum number of records to read.}

\item{progress}{Display a progress bar? By default it will only display in
an interactive session and not while knitting a document. The display is
updated every 50,000 values and will only display if estimated reading
time is 5 seconds or more. The automatic progress bar can be disabled by
setting option \code{readr.show_progress} to \code{FALSE}.}
}
\description{
This is a fairly standard format for log files - it uses both quotes and
square brackets for quoting, and there may be literal quotes embedded in a
quoted string. The dash, "-", is used for missing values.
}
\examples{
read_log(readr_example("example.log"))
}
readr/man/write_delim.Rd0000644000175100001440000000515113106315444014736 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/write.R
\name{write_delim}
\alias{write_delim}
\alias{write_csv}
\alias{write_excel_csv}
\alias{write_tsv}
\title{Write a data frame to a delimited file}
\usage{
write_delim(x, path, delim = " ", na = "NA", append = FALSE,
  col_names = !append)

write_csv(x, path, na = "NA", append = FALSE, col_names = !append)

write_excel_csv(x, path, na = "NA", append = FALSE, col_names = !append)

write_tsv(x, path, na = "NA", append = FALSE, col_names = !append)
}
\arguments{
\item{x}{A data frame to write to disk}

\item{path}{Path or connection to write to.}

\item{delim}{Delimiter used to separate values. Defaults to \code{" "}.
Must be a single character.}

\item{na}{String used for missing values. Defaults to NA. Missing values
will never be quoted; strings with the same value as \code{na} will always
be quoted.}

\item{append}{If \code{FALSE}, will overwrite existing file. If
\code{TRUE}, will append to existing file. In both cases, if file does not
exist a new file is created.}

\item{col_names}{Write column names at the top of the file?}
}
\value{
\code{write_*()} returns the input \code{x} invisibly.
}
\description{
This is about twice as fast as \code{\link[=write.csv]{write.csv()}}, and
never writes row names. \code{output_column()} is a generic method used to
coerce columns to suitable output.
}
\section{Output}{

Factors are coerced to character. Doubles are formatted using the grisu3
algorithm. POSIXct values are formatted as ISO 8601. All columns are
encoded as UTF-8. \code{write_excel_csv()} also includes a
\href{https://en.wikipedia.org/wiki/Byte_order_mark}{UTF-8 Byte order mark}
which indicates to Excel that the csv is UTF-8 encoded.

Values are only quoted if needed: if they contain a comma, quote or
newline.
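As a small sketch of the quoting rule (the data frame below is an invented
toy example):
\preformatted{
# only the value containing a comma is quoted:
format_csv(data.frame(x = c("plain", "has,comma")))
}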
}
\examples{
tmp <- tempfile()
write_csv(mtcars, tmp)
head(read_csv(tmp))

# format_* is useful for testing and reprexes
cat(format_csv(head(mtcars)))
cat(format_tsv(head(mtcars)))
cat(format_delim(head(mtcars), ";"))

df <- data.frame(x = c(1, 2, NA))
format_csv(df, na = ".")

# Quotes are automatically added as needed
df <- data.frame(x = c("a", '"', ",", "\\n"))
cat(format_csv(df))

# An output connection will be automatically created for output filenames
# with appropriate extensions.
dir <- tempdir()
write_tsv(mtcars, file.path(dir, "mtcars.tsv.gz"))
write_tsv(mtcars, file.path(dir, "mtcars.tsv.bz2"))
write_tsv(mtcars, file.path(dir, "mtcars.tsv.xz"))
}
\references{
Florian Loitsch, Printing Floating-Point Numbers Quickly and Accurately
with Integers, PLDI '10,
\url{http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf}
}
readr/man/read_file.Rd0000644000175100001440000000402213106315444014340 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/file.R
\name{read_file}
\alias{read_file}
\alias{read_file_raw}
\alias{write_file}
\title{Read/write a complete file}
\usage{
read_file(file, locale = default_locale())

read_file_raw(file)

write_file(x, path, append = FALSE)
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}

\item{x}{A single string, or a raw vector, to write to disk.}

\item{path}{Path or connection to write to.}

\item{append}{If \code{FALSE}, will overwrite existing file. If
\code{TRUE}, will append to existing file. In both cases, if file does not
exist a new file is created.}
}
\value{
\code{read_file}: A length 1 character vector.

\code{read_file_raw}: A raw vector.
}
\description{
\code{read_file()} reads a complete file into a single object: either a
character vector of length one, or a raw vector. \code{write_file()} takes
a single string, or a raw vector, and writes it exactly as is. Raw vectors
are useful when dealing with binary data, or if you have text data with
unknown encoding.
}
\examples{
read_file(file.path(R.home("doc"), "AUTHORS"))
read_file_raw(file.path(R.home("doc"), "AUTHORS"))

tmp <- tempfile()

x <- format_csv(mtcars[1:6, ])
write_file(x, tmp)
identical(x, read_file(tmp))

read_lines(x)
}
readr/man/encoding.Rd0000644000175100001440000000163213106315444014220 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/encoding.R
\name{guess_encoding}
\alias{guess_encoding}
\title{Guess encoding of file}
\usage{
guess_encoding(file, n_max = 10000, threshold = 0.2)
}
\arguments{
\item{file}{A character string specifying an input as specified in
\code{\link[=datasource]{datasource()}}, a raw vector, or a list of raw
vectors.}

\item{n_max}{Number of lines to read.
If \code{n_max} is -1, all lines in the file will be read.}

\item{threshold}{Only report guesses above this threshold of certainty.}
}
\value{
A tibble
}
\description{
Uses \code{\link[stringi:stri_enc_detect]{stringi::stri_enc_detect()}}:
see the documentation there for caveats.
}
\examples{
guess_encoding(readr_example("mtcars.csv"))
guess_encoding(read_lines_raw(readr_example("mtcars.csv")))
guess_encoding(read_file_raw(readr_example("mtcars.csv")))

guess_encoding("a\\n\\u00b5\\u00b5")
}
readr/man/read_rds.Rd0000644000175100001440000000207313106315444014215 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rds.R
\name{read_rds}
\alias{read_rds}
\alias{write_rds}
\title{Read/write RDS files.}
\usage{
read_rds(path)

write_rds(x, path, compress = c("none", "gz", "bz2", "xz"), ...)
}
\arguments{
\item{path}{Path to read from/write to.}

\item{x}{R object to serialise.}

\item{compress}{Compression method to use: "none", "gz", "bz2", or "xz".}

\item{...}{Additional arguments to connection function. For example,
control the space-time trade-off of different compression methods with
\code{compression}. See \code{\link[=connections]{connections()}} for more
details.}
}
\value{
\code{write_rds()} returns \code{x}, invisibly.
}
\description{
Consistent wrapper around \code{\link[=saveRDS]{saveRDS()}} and
\code{\link[=readRDS]{readRDS()}}. \code{write_rds()} does not compress by
default as space is generally cheaper than time.
}
\examples{
temp <- tempfile()
write_rds(mtcars, temp)
read_rds(temp)
\dontrun{
write_rds(mtcars, "compressed_mtc.rds", "xz", compression = 9L)
}
}
\keyword{internal}
readr/man/read_table.Rd0000644000175100001440000001155713106315672014526 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_table.R
\name{read_table}
\alias{read_table}
\alias{read_table2}
\title{Read whitespace-separated columns into a tibble}
\usage{
read_table(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = "NA", skip = 0, n_max = Inf,
  guess_max = min(n_max, 1000), progress = show_progress(),
  comment = "")

read_table2(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = "NA", skip = 0, n_max = Inf,
  guess_max = min(n_max, 1000), progress = show_progress(),
  comment = "")
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{col_names}{Either \code{TRUE}, \code{FALSE} or a character vector of
column names.

If \code{TRUE}, the first row of the input will be used as the column
names, and will not be included in the data frame. If \code{FALSE}, column
names will be generated automatically: X1, X2, X3 etc.

If \code{col_names} is a character vector, the values will be used as the
names of the columns, and the first row of the input will be read into the
first row of the output data frame.

Missing (\code{NA}) column names will generate a warning, and be filled in
with dummy names \code{X1}, \code{X2} etc.
Duplicate column names will generate a warning and be made unique with a
numeric prefix.}

\item{col_types}{One of \code{NULL}, a \code{\link[=cols]{cols()}}
specification, or a string. See \code{vignette("column-types")} for more
details.

If \code{NULL}, all column types will be imputed from the first 1000 rows
of the input. This is convenient (and fast), but not robust. If the
imputation fails, you'll need to supply the correct types yourself.

If a column specification created by \code{\link[=cols]{cols()}} is
supplied, it must contain one column specification for each column. If you
only want to read a subset of the columns, use
\code{\link[=cols_only]{cols_only()}}.

Alternatively, you can use a compact string representation where each
character represents one column: c = character, i = integer, n = number,
d = double, l = logical, D = date, T = date time, t = time, ? = guess, or
\code{_}/\code{-} to skip the column.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Maximum number of records to read.}

\item{guess_max}{Maximum number of records to use for guessing column
types.}

\item{progress}{Display a progress bar? By default it will only display in
an interactive session and not while knitting a document. The display is
updated every 50,000 values and will only display if estimated reading
time is 5 seconds or more. The automatic progress bar can be disabled by
setting option \code{readr.show_progress} to \code{FALSE}.}

\item{comment}{A string used to identify comments. Any text after the
comment characters will be silently ignored.}
}
\description{
\code{read_table()} and \code{read_table2()} are designed to read the type
of textual data where each column is separated by one (or more) columns of
space.

\code{read_table2()} is like \code{\link[=read.table]{read.table()}}: it
allows any number of whitespace characters between columns, and the lines
can be of different lengths.

\code{read_table()} is more strict: each line must be the same length, and
each field is in the same position in every line. It first finds empty
columns and then parses like a fixed width file.

\code{spec_table()} and \code{spec_table2()} return the column
specifications rather than a data frame.
}
\examples{
# One corner from http://www.masseyratings.com/cf/compare.htm
massey <- readr_example("massey-rating.txt")
cat(read_file(massey))
read_table(massey)

# Sample of 1978 fuel economy data from
# http://www.fueleconomy.gov/feg/epadata/78data.zip
epa <- readr_example("epa78.txt")
cat(read_file(epa))
read_table(epa, col_names = FALSE)
}
\seealso{
\code{\link[=read_fwf]{read_fwf()}} to read fixed width files where each
column is not separated by whitespace. \code{read_fwf()} is also useful
for reading tabular data with non-standard formatting.
}
readr/man/col_skip.Rd0000644000175100001440000000075413106315444014241 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/collectors.R
\name{col_skip}
\alias{col_skip}
\title{Skip a column}
\usage{
col_skip()
}
\description{
Use this function to ignore a column when reading in a file.
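For example (a minimal sketch over invented inline data):
\preformatted{
read_csv("x,y\\n1,a", col_types = cols(x = col_skip()))
}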
To skip all columns not otherwise specified, use \code{\link{cols_only}()}.
}
\seealso{
Other parsers: \code{\link{parse_datetime}},
\code{\link{parse_factor}}, \code{\link{parse_guess}},
\code{\link{parse_logical}}, \code{\link{parse_number}}
}
readr/man/output_column.Rd0000644000175100001440000000104213106315444015342 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/write.R
\name{output_column}
\alias{output_column}
\title{Preprocess column for output}
\usage{
output_column(x)
}
\arguments{
\item{x}{A vector}
}
\description{
This is a generic function that is applied to each column before it is
saved to disk. It provides a hook for S3 classes that need special
handling.
}
\examples{
# Most columns are left as is, but POSIXct are
# converted to ISO8601.
x <- parse_datetime("2016-01-01")
str(output_column(x))
}
\keyword{internal}
readr/man/callback.Rd0000644000175100001440000000302613106315444014165 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callback.R
\docType{data}
\name{callback}
\alias{callback}
\alias{ChunkCallback}
\alias{SideEffectChunkCallback}
\alias{DataFrameCallback}
\alias{ListCallback}
\title{Callback classes}
\description{
These classes are used to define callback behaviors.
}
\details{
\describe{
\item{ChunkCallback}{Callback interface definition, all callback functions
should inherit from this class.}
\item{SideEffectChunkCallback}{Callback function that is used only for side
effects, no results are returned.}
\item{DataFrameCallback}{Callback function that combines each result
together at the end.}
\item{ListCallback}{Callback function that returns the result of each
chunk in a list, allowing flexible output.}
}
}
\examples{
## If given a regular function it is converted to a SideEffectChunkCallback

# view structure of each chunk
read_lines_chunked(readr_example("mtcars.csv"), str, chunk_size = 5)

# Print starting line of each chunk
f <- function(x, pos) print(pos)
read_lines_chunked(readr_example("mtcars.csv"),
  SideEffectChunkCallback$new(f), chunk_size = 5)

# If combined results are desired you can use the DataFrameCallback

# Cars with 3 gears
f <- function(x, pos) subset(x, gear == 3)
read_csv_chunked(readr_example("mtcars.csv"),
  DataFrameCallback$new(f), chunk_size = 5)

# The ListCallback can be used for more flexible output
f <- function(x, pos) x$mpg[x$hp > 100]
read_csv_chunked(readr_example("mtcars.csv"),
  ListCallback$new(f), chunk_size = 5)
}
\seealso{
Other chunked: \code{\link{read_delim_chunked}},
\code{\link{read_lines_chunked}}
}
\keyword{datasets}
\keyword{internal}
readr/man/count_fields.Rd0000644000175100001440000000243013106315444015105 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/count_fields.R
\name{count_fields}
\alias{count_fields}
\title{Count the number of fields in each line of a file}
\usage{
count_fields(file, tokenizer, skip = 0, n_max = -1L)
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests.
It must contain at least one new line to be recognised as data (instead of
a path).}

\item{tokenizer}{A tokenizer that specifies how to break the \code{file}
up into fields, e.g., \code{\link[=tokenizer_csv]{tokenizer_csv()}},
\code{\link[=tokenizer_fwf]{tokenizer_fwf()}}}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Optionally, maximum number of rows to count fields for.}
}
\description{
This is useful for diagnosing problems with functions that fail to parse
correctly.
}
\examples{
count_fields(readr_example("mtcars.csv"), tokenizer_csv())
}
readr/man/parse_atomic.Rd0000644000175100001440000000360513106315444015102 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/collectors.R
\name{parse_atomic}
\alias{parse_logical}
\alias{parse_integer}
\alias{parse_double}
\alias{parse_character}
\alias{col_logical}
\alias{col_integer}
\alias{col_double}
\alias{col_character}
\title{Parse logicals, integers, and reals}
\usage{
parse_logical(x, na = c("", "NA"), locale = default_locale())

parse_integer(x, na = c("", "NA"), locale = default_locale())

parse_double(x, na = c("", "NA"), locale = default_locale())

parse_character(x, na = c("", "NA"), locale = default_locale())

col_logical()

col_integer()

col_double()

col_character()
}
\arguments{
\item{x}{Character vector of values to parse.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}
}
\description{
Use \code{parse_*()} if you have a character vector you want to parse. Use
\code{col_*()} in conjunction with a \code{read_*()} function to parse the
values as they're read in.
}
\examples{
parse_integer(c("1", "2", "3"))
parse_double(c("1", "2", "3.123"))
parse_number("$1,123,456.00")

# Use locale to override default decimal and grouping marks
es_MX <- locale("es", decimal_mark = ",")
parse_number("$1.123.456,00", locale = es_MX)

# Invalid values are replaced with missing values with a warning.
x <- c("1", "2", "3", "-")
parse_double(x)
# Or flag values as missing
parse_double(x, na = "-")
}
\seealso{
Other parsers: \code{\link{col_skip}},
\code{\link{parse_datetime}}, \code{\link{parse_factor}},
\code{\link{parse_guess}}, \code{\link{parse_number}}
}
readr/man/date_names.Rd0000644000175100001440000000213013106315444014530 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/date-symbols.R
\name{date_names}
\alias{date_names}
\alias{date_names_lang}
\alias{date_names_langs}
\title{Create or retrieve date names}
\usage{
date_names(mon, mon_ab = mon, day, day_ab = day, am_pm = c("AM", "PM"))

date_names_lang(language)

date_names_langs()
}
\arguments{
\item{mon, mon_ab}{Full and abbreviated month names.}

\item{day, day_ab}{Full and abbreviated week day names. Starts with
Sunday.}

\item{am_pm}{Names used for AM and PM.}

\item{language}{A BCP 47 locale, made up of a language and a region, e.g.
\code{"en_US"} for American English. See \code{date_names_langs()} for a
complete list of available locales.}
}
\description{
When parsing dates, you often need to know how the days of the week and
months are represented as text.
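For instance, month names differ across languages, which matters when
parsing dates (a sketch; the format string is an invented example):
\preformatted{
parse_date("1 janvier 2015", "\%d \%B \%Y", locale = locale("fr"))
}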
This pair of functions allows you to either create your own, or retrieve
from a standard list. The standard list is derived from ICU
(\url{http://site.icu-project.org}) via the stringi package.
}
\examples{
date_names_lang("en")
date_names_lang("ko")
date_names_lang("fr")
}
readr/man/parse_vector.Rd0000644000175100001440000000166413106315444015133 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/collectors.R
\name{parse_vector}
\alias{parse_vector}
\title{Parse a character vector.}
\usage{
parse_vector(x, collector, na = c("", "NA"), locale = default_locale())
}
\arguments{
\item{x}{Character vector of elements to parse.}

\item{collector}{Column specification.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}
}
\description{
Parse a character vector.
}
\examples{
x <- c("1", "2", "3", "NA")
parse_vector(x, col_integer())
parse_vector(x, col_double())
}
\keyword{internal}
readr/man/tokenize.Rd0000644000175100001440000000252613106315444014265 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tokenizer.R
\name{tokenize}
\alias{tokenize}
\title{Tokenize a file/string.}
\usage{
tokenize(file, tokenizer = tokenizer_csv(), skip = 0, n_max = -1L)
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{tokenizer}{A tokenizer specification.}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Optionally, maximum number of rows to tokenize.}
}
\description{
Turns input into a character vector. Usually the tokenization is done
purely in C++, and never exposed to R (because that requires a copy). This
function is useful for testing, or when a file doesn't parse correctly and
you want to see the underlying tokens.
}
\examples{
tokenize("1,2\\n3,4,5\\n\\n6")

# Only tokenize first two lines
tokenize("1,2\\n3,4,5\\n\\n6", n_max = 2)
}
\keyword{internal}
readr/man/readr_example.Rd0000644000175100001440000000067113106315672015247 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/example.R
\name{readr_example}
\alias{readr_example}
\title{Get path to readr example}
\usage{
readr_example(path)
}
\arguments{
\item{path}{Name of file.}
}
\description{
readr comes bundled with a number of sample files in its
\code{inst/extdata} directory.
This function makes them easy to access.
}
\examples{
readr_example("challenge.csv")
}
\keyword{internal}
readr/man/parse_number.Rd0000644000175100001440000000220213106315444015110 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/collectors.R
\name{parse_number}
\alias{parse_number}
\alias{col_number}
\title{Parse numbers, flexibly}
\usage{
parse_number(x, na = c("", "NA"), locale = default_locale())

col_number()
}
\arguments{
\item{x}{Character vector of values to parse.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}
}
\description{
This drops any non-numeric characters before or after the first number.
The grouping mark specified by the locale is ignored inside the number.
}
\examples{
parse_number("$1000")
parse_number("1,234,567.78")
}
\seealso{
Other parsers: \code{\link{col_skip}},
\code{\link{parse_datetime}}, \code{\link{parse_factor}},
\code{\link{parse_guess}}, \code{\link{parse_logical}}
}
readr/man/type_convert.Rd0000644000175100001440000000432213106315672015155 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/type_convert.R
\name{type_convert}
\alias{type_convert}
\title{Re-convert character columns in existing data frame}
\usage{
type_convert(df, col_types = NULL, na = c("", "NA"), trim_ws = TRUE,
  locale = default_locale())
}
\arguments{
\item{df}{A data frame.}

\item{col_types}{One of \code{NULL} or a \code{\link[=cols]{cols()}}
specification. See \code{vignette("column-types")} for more details.

If \code{NULL}, all column types will be imputed from the first 1000 rows
of the input. This is convenient (and fast), but not robust. If the
imputation fails, you'll need to supply the correct types yourself.

If a column specification created by \code{\link[=cols]{cols()}} is
supplied, it must contain one column specification for each column. If you
only want to read a subset of the columns, use
\code{\link[=cols_only]{cols_only()}}.

Unlike other functions, \code{type_convert()} does not allow character
specifications of \code{col_types}.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{trim_ws}{Should leading and trailing whitespace be trimmed from each
field before parsing it?}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}
}
\description{
This is useful if you need to do some manual munging - you can read the
columns in as character, clean them up with (e.g.) regular expressions,
and then let readr take another stab at parsing them. The name is a homage
to the base \code{\link[utils]{type.convert}()}.
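As a minimal sketch (the regular-expression clean-up step below is an
invented illustration):
\preformatted{
df <- data.frame(x = c("$1", "$2"), stringsAsFactors = FALSE)
df$x <- gsub("$", "", df$x, fixed = TRUE)
type_convert(df)
}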
}
\examples{
df <- data.frame(
  x = as.character(runif(10)),
  y = as.character(sample(10)),
  stringsAsFactors = FALSE
)
str(df)
str(type_convert(df))

df <- data.frame(x = c("NA", "10"), stringsAsFactors = FALSE)
str(type_convert(df))

# type_convert() can be used to infer types from an entire dataset
type_convert(
  read_csv(readr_example("mtcars.csv"),
    col_types = cols(.default = col_character())))
}
readr/man/read_delim_chunked.Rd0000644000175100001440000001321613106315672016224 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_delim_chunked.R
\name{read_delim_chunked}
\alias{read_delim_chunked}
\alias{read_csv_chunked}
\alias{read_csv2_chunked}
\alias{read_tsv_chunked}
\title{Read a delimited file by chunks}
\usage{
read_delim_chunked(file, callback, chunk_size = 10000, delim,
  quote = "\\"", escape_backslash = FALSE, escape_double = TRUE,
  col_names = TRUE, col_types = NULL, locale = default_locale(),
  na = c("", "NA"), quoted_na = TRUE, comment = "", trim_ws = FALSE,
  skip = 0, guess_max = min(1000, chunk_size),
  progress = show_progress())

read_csv_chunked(file, callback, chunk_size = 10000, col_names = TRUE,
  col_types = NULL, locale = default_locale(), na = c("", "NA"),
  quoted_na = TRUE, quote = "\\"", comment = "", trim_ws = TRUE,
  skip = 0, guess_max = min(1000, chunk_size),
  progress = show_progress())

read_csv2_chunked(file, callback, chunk_size = 10000, col_names = TRUE,
  col_types = NULL, locale = default_locale(), na = c("", "NA"),
  quoted_na = TRUE, quote = "\\"", comment = "", trim_ws = TRUE,
  skip = 0, guess_max = min(1000, chunk_size),
  progress = show_progress())

read_tsv_chunked(file, callback, chunk_size = 10000, col_names = TRUE,
  col_types = NULL, locale = default_locale(), na = c("", "NA"),
  quoted_na = TRUE, quote = "\\"", comment = "", trim_ws = TRUE,
  skip = 0, guess_max = min(1000, chunk_size),
  progress = show_progress())
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{callback}{A callback function to call on each chunk.}

\item{chunk_size}{The number of rows to include in each chunk.}

\item{delim}{Single character used to separate fields within a record.}

\item{quote}{Single character used to quote strings.}

\item{escape_backslash}{Does the file use backslashes to escape special
characters? This is more general than \code{escape_double} as backslashes
can be used to escape the delimiter character, the quote character, or to
add special characters like \code{\\n}.}

\item{escape_double}{Does the file escape quotes by doubling them?
i.e. If this option is \code{TRUE}, the value \code{""""} represents a
single quote, \code{\"}.}

\item{col_names}{Either \code{TRUE}, \code{FALSE} or a character vector of
column names.

If \code{TRUE}, the first row of the input will be used as the column
names, and will not be included in the data frame. If \code{FALSE}, column
names will be generated automatically: X1, X2, X3 etc.
If \code{col_names} is a character vector, the values will be used as the
names of the columns, and the first row of the input will be read into the
first row of the output data frame.

Missing (\code{NA}) column names will generate a warning, and be filled in
with dummy names \code{X1}, \code{X2} etc. Duplicate column names will
generate a warning and be made unique with a numeric prefix.}

\item{col_types}{One of \code{NULL}, a \code{\link[=cols]{cols()}}
specification, or a string. See \code{vignette("column-types")} for more
details.

If \code{NULL}, all column types will be imputed from the first 1000 rows
of the input. This is convenient (and fast), but not robust. If the
imputation fails, you'll need to supply the correct types yourself.

If a column specification created by \code{\link[=cols]{cols()}} is
supplied, it must contain one column specification for each column. If you
only want to read a subset of the columns, use
\code{\link[=cols_only]{cols_only()}}.

Alternatively, you can use a compact string representation where each
character represents one column: c = character, i = integer, n = number,
d = double, l = logical, D = date, T = date time, t = time, ? = guess, or
\code{_}/\code{-} to skip the column.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{quoted_na}{Should missing values inside quotes be treated as missing
values (the default) or strings?}

\item{comment}{A string used to identify comments. Any text after the
comment characters will be silently ignored.}

\item{trim_ws}{Should leading and trailing whitespace be trimmed from each
field before parsing it?}

\item{skip}{Number of lines to skip before reading data.}

\item{guess_max}{Maximum number of records to use for guessing column
types.}

\item{progress}{Display a progress bar? By default it will only display in
an interactive session and not while knitting a document. The display is
updated every 50,000 values and will only display if estimated reading
time is 5 seconds or more. The automatic progress bar can be disabled by
setting option \code{readr.show_progress} to \code{FALSE}.}
}
\description{
Read a delimited file by chunks
}
\examples{
# Cars with 3 gears
f <- function(x, pos) subset(x, gear == 3)
read_csv_chunked(readr_example("mtcars.csv"),
  DataFrameCallback$new(f), chunk_size = 5)
}
\seealso{
Other chunked: \code{\link{callback}},
\code{\link{read_lines_chunked}}
}
\keyword{internal}
readr/man/read_fwf.Rd0000644000175100001440000001255313106315672014216 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_fwf.R
\name{read_fwf}
\alias{read_fwf}
\alias{fwf_empty}
\alias{fwf_widths}
\alias{fwf_positions}
\alias{fwf_cols}
\title{Read a fixed width file into a tibble}
\usage{
read_fwf(file, col_positions, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), comment = "",
  skip = 0, n_max = Inf, guess_max = min(n_max, 1000),
  progress = show_progress())

fwf_empty(file, skip = 0, col_names = NULL, comment = "", n = 100L)

fwf_widths(widths, col_names = NULL)

fwf_positions(start, end = NULL, col_names = NULL)

fwf_cols(...)
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data (either
a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be
automatically uncompressed. Files starting with \code{http://},
\code{https://}, \code{ftp://}, or \code{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path).}

\item{col_positions}{Column positions, as created by
\code{\link[=fwf_empty]{fwf_empty()}},
\code{\link[=fwf_widths]{fwf_widths()}} or
\code{\link[=fwf_positions]{fwf_positions()}}. To read in only selected
fields, use \code{\link[=fwf_positions]{fwf_positions()}}. If the width of
the last column is variable (a ragged fwf file), supply the last end
position as NA.}

\item{col_types}{One of \code{NULL}, a \code{\link[=cols]{cols()}}
specification, or a string. See \code{vignette("column-types")} for more
details.

If \code{NULL}, all column types will be imputed from the first 1000 rows
of the input. This is convenient (and fast), but not robust. If the
imputation fails, you'll need to supply the correct types yourself.

If a column specification created by \code{\link[=cols]{cols()}} is
supplied, it must contain one column specification for each column. If you
only want to read a subset of the columns, use
\code{\link[=cols_only]{cols_only()}}.

Alternatively, you can use a compact string representation where each
character represents one column: c = character, i = integer, n = number,
d = double, l = logical, D = date, T = date time, t = time, ? = guess, or
\code{_}/\code{-} to skip the column.}

\item{locale}{The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
\code{\link[=locale]{locale()}} to create your own locale that controls
things like the default time zone, encoding, decimal mark, big mark, and
day/month names.}

\item{na}{Character vector of strings to use for missing values. Set this
option to \code{character()} to indicate no missing values.}

\item{comment}{A string used to identify comments. Any text after the
comment characters will be silently ignored.}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Maximum number of records to read.}

\item{guess_max}{Maximum number of records to use for guessing column
types.}

\item{progress}{Display a progress bar? By default it will only display in
an interactive session and not while knitting a document. The display is
updated every 50,000 values and will only display if estimated reading
time is 5 seconds or more. The automatic progress bar can be disabled by
setting option \code{readr.show_progress} to \code{FALSE}.}

\item{col_names}{Either NULL, or a character vector of column names.}

\item{n}{Number of lines the tokenizer will read to determine file
structure. By default it is set to 100.}

\item{widths}{Width of each field. Use NA as width of last field when
reading a ragged fwf file.}

\item{start, end}{Starting and ending (inclusive) positions of each field.
Use NA as last end field when reading a ragged fwf file.}

\item{...}{If the first element is a data frame, then it must have all
numeric columns and either one or two rows. The column names are the
variable names, and the column values are the variable widths (if there is
one row) or the variable start and end positions (if there are two rows).
Otherwise, the elements of \code{...} are used to construct a data frame
with one or two rows as above.}
}
\description{
A fixed width file can be a very compact representation of numeric data.
It's also very fast to parse, because every field is in the same place in
every line. Unfortunately, it's painful to parse because you need to
describe the length of every field. Readr aims to make it as easy as
possible by providing a number of different ways to describe the field
structure.
}
\examples{
fwf_sample <- readr_example("fwf-sample.txt")
cat(read_lines(fwf_sample))

# You can specify column positions in several ways:
# 1. Guess based on position of empty columns
read_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last",
  "state", "ssn")))
# 2. A vector of field widths
read_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn")))
# 3. Paired vectors of start and end positions
read_fwf(fwf_sample, fwf_positions(c(1, 30), c(10, 42), c("name", "ssn")))
# 4. Named arguments with start and end positions
read_fwf(fwf_sample, fwf_cols(name = c(1, 10), ssn = c(30, 42)))
# 5. Named arguments with column widths
read_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12))
}
\seealso{
\code{\link[=read_table]{read_table()}} to read fixed width files where
each column is separated by whitespace.
}
readr/man/readr-package.Rd0000644000175100001440000000221013106315444015113 0ustar hornikusers% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/readr.R
\docType{package}
\name{readr-package}
\alias{readr}
\alias{readr-package}
\title{readr: Read Rectangular Text Data}
\description{
The goal of 'readr' is to provide a fast and friendly way to read
rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly
parse many types of data found in the wild, while still cleanly failing
when data unexpectedly changes.
}
\seealso{
Useful links:
\itemize{
  \item \url{http://readr.tidyverse.org}
  \item \url{https://github.com/tidyverse/readr}
  \item Report bugs at \url{https://github.com/tidyverse/readr/issues}
}
}
\author{
\strong{Maintainer}: Jim Hester \email{james.hester@rstudio.com}

Authors:
\itemize{
  \item Hadley Wickham \email{hadley@rstudio.com}
  \item Romain Francois
}

Other contributors:
\itemize{
  \item R Core Team (Date time code adapted from R) [contributor]
  \item RStudio [copyright holder, funder]
  \item Jukka Jylänki (grisu3 implementation) [contributor, copyright holder]
  \item Mikkel Jørgensen (grisu3 implementation) [contributor, copyright holder]
}
}
\keyword{internal}
readr/tools/0000755000175100001440000000000013106315444012526 5ustar hornikusersreadr/tools/logo.png0000644000175100001440000003513713106315444014205 0ustar hornikusers[binary PNG image data (readr package logo) omitted]
2sJ"Q:8lfoMC>'uS&|t7}o{LZ_``K.6AQ_*g@b[m ؐDl瑢̶f1m,HԢNZZa,aI3 am_wKN!E^.b|'X΢.=L&gM(H "wW'3U]lۚ?C>nIFcиrc9yiE_kPN00/XEYFA3̶s_?ul rKCGň@<?GJgzQ71r66/rZB` /n̞i5}o vKrVkBdp ƴ'ȍV1xB))y12u@dꄽ=P[D=1\0%1ra1́]ްgjIۦI4=nL͞ p:oo6-8Um[/c|,7]1hm^eFu-^"8N<"M+p(} E1?Wq f!$Æml,x$ d?n~wg30!ô/Ǜf~7^?k (xvc-Fk7tfy3ExR)5N@I GtKjXo(ƲpKRU9r6n<ӄ/* V(nL9n4˼8nQځj_{,+nD YA[tǖZY)Y'M9-dU,)P?lnIoiXKVd,!d>cqHcXӽeHHp7^͋L{BzS>*k~zӥc6p =;ȻuW(^%=IF0ꖄeP[}R.ed=[ŗ݈E @3f'tdC?\fy&9axF)G(, ^,Vn}:@E}Phr%Rxd,?3U+SPP0Œ!g/ÆT>#Fh3z~lZњ-@'uKڜj ,vL7^z5?pz?t8#n@ XC$ADsőP44G k?U=HPxH&d\ .{s,c\288{&Gv0 QXlt˙kpqŒV;bgZگh=tũmԳ7Nn~V[^6^U.D)@#JRLib Ofp|K@v9eҧ%`V1fl[增C |nh/׍co.{4*Ė|vaW@,*vԌ0@.۷F;pniV?{; ݗNP2DvRr 2)'& ^IʑmDFc1%9/D':Ћ4L`҆Ϳ CZn<0T3x-෿y<,0[r,#+KY9C:$QG\S$Ld, lQ?0CEe<$9R&m].}XΒ'|P%0D9ڧ=٣F9F$:%t ; X lNWGȹշ7,N(|h:\%`‚(*X۶[=(ɘ2>Y : yWwׯX?_}^"!Yň#dIbGM1l+YlaCmOˉJg>o߃.,g0Ax〤@p1R4vct lA+FL't0AAsc`@0SF{7}:#:.//I'&Jt@}43e`|NrKZb SNZͷV (ߑu!`˖/?>:P+1o֜ q. ylgχhYJ#¼/yqg(i iL܊"{*l @DrEJN],V~\p?8JLͳ`͑iٲ"y/繁Qpgy# NCr#TfY 0جdLoVsqCMヿF]>`<no&K#G/6H>U Gs|-C0-!F-N`r) ==Q-Csݭvɹg9 N~b~(KqwR;H&" &Pn1Tp(y}W{u~CsKd^J2ܣLĂÝÇC221pL`g-.KcW " ŅGY׉f3g[pO9ǟu5 E`׺#tkuV,+%oYIȣ=\45;X;KjdEH?F>w;:/(XE`+zu8lFo_NK (M:@ A3q M*ܥaَlgV8߭{w]xr"Nֻ>돸d5byv/@l; (08'PƎ`T& (3U̎c RLŷ_"kEԥcm *` Nն8&92Xd \H|X$?<;qF1X J%G$?<>D?̅ =>?Q^~ACHGmgYru2*r >Iϙǰc] h+aUONZ>N ?S b"OH3Q/.c@\5|(a7#H^8ViVeϣҤ˵ʣr+>M5X1a0GHf k{9?8eH~>.c7shj*2v UْVeX+;)L,;%؈8J:qDbzPCEavBa  SX,w#NnhԂ'$ `AP,6dIx8V(vl@.9ҍlSf`v&Gt & |ʻSw4/K{$1.B~Wn\Tۅ,\,iQҍ{$aCCG aؽ`J3?I 35?TN)Tlj{$%ExN<́1X4\纩L3?Zr" g`@ ڎ~S)g"nlG_B9(0htwg\.+OY(qQR&ϿouLS="9YNLrKcS;e`ɒn]HpS BS\f@ WU/{@`. ,}N9+ PjEkWv͗ʦF0 7F4"l KXFl <|fE| e~v}<ߕR>/;#F]9t*k5 7Ao?#KN͑C,PEVJ2n>ӯ}Iu?VK; pw3R'|3P |R&4l" /87O69-}/\p_|[6JrKcl(‚hl#g.ϓ?ud&UH+Gs.ΕVy ~>,Γ\MW.[I!^ rŊ}ܦ2>u7^%}/'+;:dŹ_*E !nRW - TBN4&'U硫NÞklg > 2# QͪcR:(+[d*inˌqȆ^ܵWn>˯}K>q_,/!.O=7 Ҡ|3_'Dmg/9_~pk¶yVp諯Ew3r`s(ϯk. M\SL%=ޑ_}0VڰNA3 0[F@ t* Q|؄?۾RNYPZ_ .gDwv<)-yd͕/=*oD LK:XLHYrS'zd˻+#bY<+߸[.W\PPj=1!&Ayg#OlCϋuK?@zO{QI ` B{An}&]%G;9{bt{W? EI4cBNd[\P\Oߠ,?+WZ;=ȋUwvl[yX(fr޹땝wttɦ߽sR%$- =.O.YXR0T%>G~{neK/8SVSC(iFFFDF%qrGɂt{?>%x;s+:vJGExMaOD%C N$%_-,/#Kky:)uMH]iԠ0JlyKolYQo-!=S1ȨHPX̙!s2e/ P Zuuv(n$Q!taO"e.p hWW/>⠁r m 7IRY@& ,+.T1F$;OեG3M'` hנw,Mo!C|Rx$X9;P=w>5IcYӷ?)}+ |>o[o.\$?~;Jذ$k[%5t|,oG֊飂3F :<7'sxۮj ψldԴo=D=ݲOIJJٳO.nQ=_@b#`P,+X,B4 p`ECk+OB[ZI/Ȃֻ@4tJa~,+^4[ZdvzT~R^e\xjdEߦmN̒ҁ{lXR%Kcc3pfO[8JdiL^ QŅviyΟAGSD: sPvٯAt r˖;Of(0ݠn5.]N='I1o)ҹSN_Y)Q*3( V$~g utNN,}Y;vdfLX̮"5Te8s]؂XynCx/e|m'徟hzCvEǯX&~*ygt%Hr"s 2MW_T*^$w5W],-?}j@wWjZ=o.O4<?ZaVkYltIWW&^#AفNyfȼ֍+ t?y3Q y|`뇘xD>^$+W,áEay T@Yc?(ϒ,_Y"8$SRUD(;3i\pr! wBIc4R5s`ӌLJeWQaxϴ8op>2׮>OeB‚l͇ۤ쵪]U5O`ݞj.)P$]VV)ߺr_"#x%ZqV[$*9ڵ˂ye,ӥwTw^~sg8&+@#ͬ4Is*M3Mp*&b@6ț0(lB k昸K҅F񿡥KvT6s;M5K^GMoa$ܹZ~(ՐK,HXΚMJR 8m,y)*wTUAQw''Knj4}Uۡ] Uplk4Ӓ(fMhq3 ppohY9v"M„D2W[rr ٷJb97̛˿=.9'H4(v'#=pH3;_R=Z@M+<Rc? u%\VEXs;dC^d4BG=Cv({{ $RȺBPɦiEH~VU^' 31b >~#i;F;\}(qCpf y U>hO zFQW>cub1_q=Zmj8ϸ [{2yXdvjHv>R;:sعp.ka0/?YFX%m̘5]s5@cMȰAؿ gs _xO&α*?fڅKPm FhOX%S=[9/yњyxlXt -B8:v!Li}0E[p.ُ =+@O^pBHA@( W,^!wA@QXvR2·y۝LDNU{!IGY`^-ÜoPPXڞfA meϢVm~ 멡 au'.:'PXGgx+08>c܎XKSp'L}ր<+&Q] !Cz"]ej15&v.'VM > l_?X\Q8Y'ȼx qv8V/~l}{hgU[-qa o)\z"Z;z[. 6~&ֱ!/W-y`N}-|rV٧ol--D%sd*::"δw!밬P. 
readr/LICENSE0000644000175100001440000004325413057262333012406 0ustar hornikusers GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. 
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. 
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. 
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
<one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. <signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.