haven/0000755000176200001440000000000014102416217011351 5ustar liggesusershaven/NAMESPACE0000644000176200001440000000731314101767560012605 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method("names<-",haven_labelled) S3method(as.character,haven_labelled) S3method(as_factor,data.frame) S3method(as_factor,haven_labelled) S3method(as_factor,labelled) S3method(format,haven_labelled) S3method(format,pillar_shaft_haven_labelled_chr) S3method(format,pillar_shaft_haven_labelled_num) S3method(is.na,haven_labelled_spss) S3method(levels,haven_labelled) S3method(median,haven_labelled) S3method(obj_print_footer,haven_labelled) S3method(obj_print_footer,haven_labelled_spss) S3method(obj_print_header,haven_labelled) S3method(quantile,haven_labelled) S3method(summary,haven_labelled) S3method(vec_arith,haven_labelled) S3method(vec_arith.haven_labelled,default) S3method(vec_arith.haven_labelled,haven_labelled) S3method(vec_arith.haven_labelled,numeric) S3method(vec_arith.numeric,haven_labelled) S3method(vec_cast,character.haven_labelled) S3method(vec_cast,character.haven_labelled_spss) S3method(vec_cast,double.haven_labelled) S3method(vec_cast,double.haven_labelled_spss) S3method(vec_cast,haven_labelled.character) S3method(vec_cast,haven_labelled.double) S3method(vec_cast,haven_labelled.haven_labelled) S3method(vec_cast,haven_labelled.haven_labelled_spss) S3method(vec_cast,haven_labelled.integer) S3method(vec_cast,haven_labelled_spss.character) S3method(vec_cast,haven_labelled_spss.double) S3method(vec_cast,haven_labelled_spss.haven_labelled) S3method(vec_cast,haven_labelled_spss.haven_labelled_spss) S3method(vec_cast,haven_labelled_spss.integer) S3method(vec_cast,integer.haven_labelled) S3method(vec_cast,integer.haven_labelled_spss) S3method(vec_math,haven_labelled) S3method(vec_ptype2,character.haven_labelled) S3method(vec_ptype2,character.haven_labelled_spss) S3method(vec_ptype2,double.haven_labelled) S3method(vec_ptype2,double.haven_labelled_spss) 
S3method(vec_ptype2,haven_labelled.character) S3method(vec_ptype2,haven_labelled.double) S3method(vec_ptype2,haven_labelled.haven_labelled) S3method(vec_ptype2,haven_labelled.haven_labelled_spss) S3method(vec_ptype2,haven_labelled.integer) S3method(vec_ptype2,haven_labelled_spss.character) S3method(vec_ptype2,haven_labelled_spss.double) S3method(vec_ptype2,haven_labelled_spss.haven_labelled) S3method(vec_ptype2,haven_labelled_spss.haven_labelled_spss) S3method(vec_ptype2,haven_labelled_spss.integer) S3method(vec_ptype2,integer.haven_labelled) S3method(vec_ptype2,integer.haven_labelled_spss) S3method(vec_ptype_abbr,haven_labelled) S3method(vec_ptype_full,haven_labelled) S3method(vec_ptype_full,haven_labelled_spss) S3method(zap_formats,data.frame) S3method(zap_formats,default) S3method(zap_label,data.frame) S3method(zap_label,default) S3method(zap_labels,data.frame) S3method(zap_labels,default) S3method(zap_labels,haven_labelled) S3method(zap_labels,haven_labelled_spss) S3method(zap_missing,data.frame) S3method(zap_missing,default) S3method(zap_missing,haven_labelled) S3method(zap_missing,haven_labelled_spss) S3method(zap_widths,data.frame) S3method(zap_widths,default) export(as_factor) export(format_tagged_na) export(is.labelled) export(is_tagged_na) export(labelled) export(labelled_spss) export(na_tag) export(print_labels) export(print_tagged_na) export(read_dta) export(read_por) export(read_sas) export(read_sav) export(read_spss) export(read_stata) export(read_xpt) export(tagged_na) export(vec_arith.haven_labelled) export(write_dta) export(write_sas) export(write_sav) export(write_xpt) export(zap_empty) export(zap_formats) export(zap_label) export(zap_labels) export(zap_missing) export(zap_widths) import(rlang) import(vctrs) importFrom(forcats,as_factor) importFrom(hms,hms) importFrom(methods,setOldClass) importFrom(stats,median) importFrom(stats,quantile) importFrom(tibble,tibble) useDynLib(haven, .registration = TRUE) 
haven/LICENSE0000644000176200001440000000011314033646021012352 0ustar liggesusersYEAR: 2013-2019 COPYRIGHT HOLDER: Hadley Wickham; RStudio; and Evan Miller haven/README.md0000644000176200001440000000545314101767533012650 0ustar liggesusers # haven [![CRAN status](https://www.r-pkg.org/badges/version/haven)](https://cran.r-project.org/package=haven) [![R build status](https://github.com/tidyverse/haven/workflows/R-CMD-check/badge.svg)](https://github.com/tidyverse/haven/actions) [![Codecov test coverage](https://codecov.io/gh/tidyverse/haven/branch/master/graph/badge.svg)](https://codecov.io/gh/tidyverse/haven?branch=master) ## Overview Haven enables R to read and write various data formats used by other statistical packages by wrapping the fantastic [ReadStat](https://github.com/WizardMac/ReadStat) C library written by [Evan Miller](https://www.evanmiller.org). Haven is part of the [tidyverse](https://www.tidyverse.org/). Currently it supports: - **SAS**: `read_sas()` reads `.sas7bdat` + `.sas7bcat` files and `read_xpt()` reads SAS transport files (version 5 and version 8). - **SPSS**: `read_sav()` reads `.sav` files and `read_por()` reads the older `.por` files. `write_sav()` writes `.sav` files. - **Stata**: `read_dta()` reads `.dta` files (up to version 15). `write_dta()` writes `.dta` files (versions 8-15). The output objects: - Are [tibbles](https://github.com/tidyverse/tibble), which have a better print method for very long and very wide files. - Translate value labels into a new `labelled()` class, which preserves the original semantics and can easily be coerced to factors with `as_factor()`. Special missing values are preserved. See `vignette("semantics")` for more details. - Dates and times are converted to R date/time classes. Character vectors are not converted to factors. 
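
As a rough illustration of the `labelled()` workflow described above, the sketch below builds a small labelled vector by hand and then either coerces it with `as_factor()` or strips the labels with `zap_labels()`. The values and label names are made up for the example; in practice a vector like this would come from one of the `read_*()` functions.

``` r
library(haven)

# A hand-built labelled vector; the codes and labels are hypothetical.
x <- labelled(c(1, 2, 1, 9), c(Yes = 1, No = 2, Refused = 9))

# Coerce to a factor: labelled values become levels, ordered by value.
f <- as_factor(x)
levels(f)

# Or drop the value labels and keep the underlying numeric codes.
codes <- zap_labels(x)
codes
```

Which route to take depends on whether the labels or the raw codes matter for the analysis at hand.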
## Installation

``` r
# The easiest way to get haven is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just haven:
install.packages("haven")
```

## Usage

``` r
library(haven)

# SAS
read_sas("mtcars.sas7bdat")
write_sas(mtcars, "mtcars.sas7bdat")

# SPSS
read_sav("mtcars.sav")
write_sav(mtcars, "mtcars.sav")

# Stata
read_dta("mtcars.dta")
write_dta(mtcars, "mtcars.dta")
```

## Related work

- [foreign](https://cran.r-project.org/package=foreign) reads from SAS XPORT, SPSS, and Stata (up to version 12) files.
- [readstata13](https://cran.r-project.org/package=readstata13) reads from and writes to all Stata file format versions.
- [sas7bdat](https://cran.r-project.org/package=sas7bdat) reads from SAS7BDAT files.

## Code of Conduct

Please note that the haven project is released with a [Contributor Code of Conduct](https://haven.tidyverse.org/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
haven/man/0000755000176200001440000000000014101766013012125 5ustar liggesusers
haven/man/vec_arith.haven_labelled.Rd0000644000176200001440000000045714033646021017312 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/labelled.R
\name{vec_arith.haven_labelled}
\alias{vec_arith.haven_labelled}
\title{Internal vctrs methods}
\usage{
\method{vec_arith}{haven_labelled}(op, x, y, ...)
}
\description{
Internal vctrs methods
}
\keyword{internal}
haven/man/tagged_na.Rd0000644000176200001440000000311314033646021014323 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tagged_na.R
\name{tagged_na}
\alias{tagged_na}
\alias{na_tag}
\alias{is_tagged_na}
\alias{format_tagged_na}
\alias{print_tagged_na}
\title{"Tagged" missing values}
\usage{
tagged_na(...)

na_tag(x)

is_tagged_na(x, tag = NULL)

format_tagged_na(x, digits = getOption("digits"))

print_tagged_na(x, digits = getOption("digits"))
}
\arguments{
\item{...}{Vectors containing a single character. The letter will be used to
"tag" the missing value.}

\item{x}{A numeric vector}

\item{tag}{If not \code{NULL}, will only return true if the tag has this value.}

\item{digits}{Number of digits to use in string representation}
}
\description{
"Tagged" missing values work exactly like regular R missing values except
that they store one additional byte of information: a tag, which is usually
a letter ("a" to "z"). When loading a SAS or Stata file, the tagged
missing values always use lower case tags.
}
\details{
\code{format_tagged_na()} and \code{print_tagged_na()} format tagged
NAs as NA(a), NA(b), etc.
}
\examples{
x <- c(1:5, tagged_na("a"), tagged_na("z"), NA)

# Tagged NAs work identically to regular NAs
x
is.na(x)

# To see that they're special, you need to use na_tag(),
# is_tagged_na(), or print_tagged_na():
is_tagged_na(x)
na_tag(x)
print_tagged_na(x)

# You can test for specific tagged NAs with the second argument
is_tagged_na(x, "a")

# Because the support for tagged NAs is somewhat tacked on to R,
# the left-most NA will tend to be preserved in arithmetic operations.
na_tag(tagged_na("a") + tagged_na("z")) } haven/man/read_spss.Rd0000644000176200001440000001002614034330047014375 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/haven-spss.R \name{read_spss} \alias{read_spss} \alias{read_sav} \alias{read_por} \alias{write_sav} \title{Read and write SPSS files} \usage{ read_sav( file, encoding = NULL, user_na = FALSE, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique" ) read_por( file, user_na = FALSE, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique" ) write_sav(data, path, compress = FALSE) read_spss( file, user_na = FALSE, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique" ) } \arguments{ \item{file}{Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will be automatically uncompressed. Files starting with \verb{http://}, \verb{https://}, \verb{ftp://}, or \verb{ftps://} will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed. Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path) or be a vector of greater than length 1. Using a value of \code{\link[readr:clipboard]{clipboard()}} will read from the system clipboard.} \item{encoding}{The character encoding used for the file. The default, \code{NULL}, use the encoding specified in the file, but sometimes this value is incorrect and it is useful to be able to override it.} \item{user_na}{If \code{TRUE} variables with user defined missing will be read into \code{\link[=labelled_spss]{labelled_spss()}} objects. If \code{FALSE}, the default, user-defined missings will be converted to \code{NA}.} \item{col_select}{One or more selection expressions, like in \code{\link[dplyr:select]{dplyr::select()}}. 
Use \code{c()} or \code{list()} to use more than one expression. See
\code{?dplyr::select} for details on available selection options. Only the
specified columns will be read from \code{file}.}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Maximum number of lines to read.}

\item{.name_repair}{Treatment of problematic column names:
\itemize{
\item \code{"minimal"}: No name repair or checks, beyond basic existence,
\item \code{"unique"}: (default value), make sure names are unique and not empty,
\item \code{"check_unique"}: no name repair, but check they are
\code{unique},
\item \code{"universal"}: Make the names \code{unique} and syntactic
\item a function: apply custom name repair (e.g., \code{.name_repair = make.names}
for names in the style of base R).
\item A purrr-style anonymous function, see \code{\link[rlang:as_function]{rlang::as_function()}}
}

This argument is passed on as \code{repair} to \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}}.
See there for more details on these terms and the strategies used
to enforce them.}

\item{data}{Data frame to write.}

\item{path}{Path to a file where the data will be written.}

\item{compress}{If \code{TRUE}, will compress the file, resulting in a \code{.zsav}
file. Otherwise the \code{.sav} file will be bytecode compressed.}
}
\value{
A tibble, a data frame variant with nice defaults.

Variable labels are stored in the "label" attribute of each variable.
It is not printed on the console, but the RStudio viewer will show it.

\code{write_sav()} returns the input \code{data} invisibly.
}
\description{
\code{read_sav()} reads both \code{.sav} and \code{.zsav} files;
\code{write_sav()} creates \code{.zsav} files when \code{compress = TRUE}.
\code{read_por()} reads \code{.por} files.
\code{read_spss()} uses either \code{read_por()} or \code{read_sav()} based on the
file extension.
}
\details{
Currently haven can read and write logical, integer, numeric, character
and factors.
See \code{\link[=labelled_spss]{labelled_spss()}} for how labelled variables in SPSS are handled in R. } \examples{ path <- system.file("examples", "iris.sav", package = "haven") read_sav(path) tmp <- tempfile(fileext = ".sav") write_sav(mtcars, tmp) read_sav(tmp) } haven/man/labelled.Rd0000644000176200001440000000354714033646021014171 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/labelled.R \name{labelled} \alias{labelled} \alias{is.labelled} \title{Create a labelled vector.} \usage{ labelled(x = double(), labels = NULL, label = NULL) is.labelled(x) } \arguments{ \item{x}{A vector to label. Must be either numeric (integer or double) or character.} \item{labels}{A named vector or \code{NULL}. The vector should be the same type as \code{x}. Unlike factors, labels don't need to be exhaustive: only a fraction of the values might be labelled.} \item{label}{A short, human-readable description of the vector.} } \description{ A labelled vector is a common data structure in other statistical environments, allowing you to assign text labels to specific values. This class makes it possible to import such labelled vectors in to R without loss of fidelity. This class provides few methods, as I expect you'll coerce to a standard R class (e.g. a \code{\link[=factor]{factor()}}) soon after importing. } \examples{ s1 <- labelled(c("M", "M", "F"), c(Male = "M", Female = "F")) s2 <- labelled(c(1, 1, 2), c(Male = 1, Female = 2)) s3 <- labelled(c(1, 1, 2), c(Male = 1, Female = 2), label="Assigned sex at birth") # Unfortunately it's not possible to make as.factor work for labelled objects # so instead use as_factor. This works for all types of labelled vectors. 
as_factor(s1) as_factor(s1, levels = "values") as_factor(s2) # Other statistical software supports multiple types of missing values s3 <- labelled(c("M", "M", "F", "X", "N/A"), c(Male = "M", Female = "F", Refused = "X", "Not applicable" = "N/A") ) s3 as_factor(s3) # Often when you have a partially labelled numeric vector, labelled values # are special types of missing. Use zap_labels to replace labels with missing # values x <- labelled(c(1, 2, 1, 2, 10, 9), c(Unknown = 9, Refused = 10)) zap_labels(x) } haven/man/zap_widths.Rd0000644000176200001440000000120414033646021014565 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/zap_widths.R \name{zap_widths} \alias{zap_widths} \title{Remove display width attributes} \usage{ zap_widths(x) } \arguments{ \item{x}{A vector or data frame.} } \description{ To provide some mild support for round-tripping variables between SPSS and R, haven stores display widths in an attribute: \code{display_width}. If this causes problems for your code, you can get rid of them with \code{zap_widths}. } \seealso{ Other zappers: \code{\link{zap_empty}()}, \code{\link{zap_formats}()}, \code{\link{zap_labels}()}, \code{\link{zap_label}()} } \concept{zappers} haven/man/print_labels.Rd0000644000176200001440000000122014033646021015065 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/labelled.R \name{print_labels} \alias{print_labels} \title{Print the labels of a labelled vector} \usage{ print_labels(x, name = NULL) } \arguments{ \item{x}{A labelled vector} \item{name}{The name of the vector (optional)} } \description{ This is a convenience function, useful to explore the variables of a newly imported dataset. 
} \examples{ s1 <- labelled(c("M", "M", "F"), c(Male = "M", Female = "F")) s2 <- labelled(c(1, 1, 2), c(Male = 1, Female = 2)) labelled_df <- tibble::tibble(s1, s2) for (var in names(labelled_df)) { print_labels(labelled_df[[var]], var) } } haven/man/labelled_spss.Rd0000644000176200001440000000263414033646021015235 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/labelled_spss.R \name{labelled_spss} \alias{labelled_spss} \title{Labelled vectors for SPSS} \usage{ labelled_spss( x = double(), labels = NULL, na_values = NULL, na_range = NULL, label = NULL ) } \arguments{ \item{x}{A vector to label. Must be either numeric (integer or double) or character.} \item{labels}{A named vector or \code{NULL}. The vector should be the same type as \code{x}. Unlike factors, labels don't need to be exhaustive: only a fraction of the values might be labelled.} \item{na_values}{A vector of values that should also be considered as missing.} \item{na_range}{A numeric vector of length two giving the (inclusive) extents of the range. Use \code{-Inf} and \code{Inf} if you want the range to be open ended.} \item{label}{A short, human-readable description of the vector.} } \description{ This class is only used when \code{user_na = TRUE} in \code{\link[=read_sav]{read_sav()}}. It is similar to the \code{\link[=labelled]{labelled()}} class but it also models SPSS's user-defined missings, which can be up to three distinct values, or for numeric vectors a range. 
} \examples{ x1 <- labelled_spss(1:10, c(Good = 1, Bad = 8), na_values = c(9, 10)) is.na(x1) x2 <- labelled_spss(1:10, c(Good = 1, Bad = 8), na_range = c(9, Inf), label = "Quality rating") is.na(x2) # Print data and metadata x2 } haven/man/zap_formats.Rd0000644000176200001440000000126214033646021014742 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/zap_formats.R \name{zap_formats} \alias{zap_formats} \title{Remove format attributes} \usage{ zap_formats(x) } \arguments{ \item{x}{A vector or data frame.} } \description{ To provide some mild support for round-tripping variables between Stata/SPSS and R, haven stores variable formats in an attribute: \code{format.stata}, \code{format.spss}, or \code{format.sas}. If this causes problems for your code, you can get rid of them with \code{zap_formats}. } \seealso{ Other zappers: \code{\link{zap_empty}()}, \code{\link{zap_labels}()}, \code{\link{zap_label}()}, \code{\link{zap_widths}()} } \concept{zappers} haven/man/zap_empty.Rd0000644000176200001440000000110314101766333014424 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/zap_empty.R \name{zap_empty} \alias{zap_empty} \title{Convert empty strings into missing values} \usage{ zap_empty(x) } \arguments{ \item{x}{A character vector} } \value{ A character vector with empty strings replaced by missing values. 
}
\description{
Convert empty strings into missing values
}
\examples{
x <- c("a", "", "c")
zap_empty(x)
}
\seealso{
Other zappers:
\code{\link{zap_formats}()},
\code{\link{zap_labels}()},
\code{\link{zap_label}()},
\code{\link{zap_widths}()}
}
\concept{zappers}
haven/man/figures/0000755000176200001440000000000014033646021013571 5ustar liggesusers
haven/man/figures/logo.png0000644000176200001440000004635214033646021015251 0ustar liggesusers
[binary PNG image data omitted]
} \seealso{ Useful links: \itemize{ \item \url{https://haven.tidyverse.org} \item \url{https://github.com/tidyverse/haven} \item \url{https://github.com/WizardMac/ReadStat} \item Report bugs at \url{https://github.com/tidyverse/haven/issues} } } \author{ \strong{Maintainer}: Hadley Wickham \email{hadley@rstudio.com} Authors: \itemize{ \item Evan Miller (Author of included ReadStat code) [copyright holder] } Other contributors: \itemize{ \item RStudio [copyright holder, funder] } } \keyword{internal} haven/man/zap_labels.Rd0000644000176200001440000000173114101766135014537 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/zap_labels.R \name{zap_labels} \alias{zap_labels} \title{Zap value labels} \usage{ zap_labels(x) } \arguments{ \item{x}{A vector or data frame} } \description{ Removes value labels, leaving unlabelled vectors as is. Use this if you want to simply drop all \code{labels} from a data frame. Zapping labels from \code{\link[=labelled_spss]{labelled_spss()}} also removes user-defined missing values, replacing with standard \code{NA}s. } \examples{ x1 <- labelled(1:5, c(good = 1, bad = 5)) x1 zap_labels(x1) x2 <- labelled_spss(c(1:4, 9), c(good = 1, bad = 5), na_values = 9) x2 zap_labels(x2) # zap_labels also works with data frames df <- tibble::tibble(x1, x2) df zap_labels(df) } \seealso{ \code{\link[=zap_label]{zap_label()}} to remove variable labels. 

Other zappers:
\code{\link{zap_empty}()},
\code{\link{zap_formats}()},
\code{\link{zap_label}()},
\code{\link{zap_widths}()}
}
\concept{zappers}
haven/man/as_factor.Rd0000644000176200001440000000413214034334622014357 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/as_factor.R
\name{as_factor}
\alias{as_factor}
\alias{as_factor.data.frame}
\alias{as_factor.haven_labelled}
\alias{as_factor.labelled}
\title{Convert input to a factor.}
\usage{
\method{as_factor}{data.frame}(x, ..., only_labelled = TRUE)

\method{as_factor}{haven_labelled}(
  x,
  levels = c("default", "labels", "values", "both"),
  ordered = FALSE,
  ...
)

\method{as_factor}{labelled}(
  x,
  levels = c("default", "labels", "values", "both"),
  ordered = FALSE,
  ...
)
}
\arguments{
\item{x}{Object to coerce to a factor.}

\item{...}{Other arguments passed down to method.}

\item{only_labelled}{Only apply to labelled columns?}

\item{levels}{How to create the levels of the generated factor:
\itemize{
\item "default": uses labels where available, otherwise the values.
Labels are sorted by value.
\item "both": like "default", but pastes together the level and value
\item "labels": use only the labels; unlabelled values become \code{NA}
\item "values": use only the values
}}

\item{ordered}{If \code{TRUE} create an ordered (ordinal) factor, if
\code{FALSE} (the default) create a regular (nominal) factor.}
}
\description{
The base function \code{as.factor()} is not a generic, but this variant
is. Methods are provided for factors, character vectors, labelled
vectors, and data frames. By default, when applied to a data frame,
it only affects \link{labelled} columns.
}
\details{
Includes methods for both class \code{haven_labelled} and \code{labelled}
for backward compatibility.
}
\examples{
x <- labelled(sample(5, 10, replace = TRUE), c(Bad = 1, Good = 5))

# Default method uses labels where available
as_factor(x)
# You can also extract just the labels
as_factor(x, levels = "labels")
# Or just the values
as_factor(x, levels = "values")
# Or combine value and label
as_factor(x, levels = "both")

# as_factor() will preserve SPSS missing values from values and ranges
y <- labelled_spss(1:10, na_values = c(2, 4), na_range = c(8, 10))
as_factor(y)
# use zap_missing() first to convert to NAs
zap_missing(y)
as_factor(zap_missing(y))
}
haven/man/read_dta.Rd0000644000176200001440000001013714035041246014161 0ustar liggesusers
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/haven-stata.R
\name{read_dta}
\alias{read_dta}
\alias{read_stata}
\alias{write_dta}
\title{Read and write Stata DTA files}
\usage{
read_dta(
  file,
  encoding = NULL,
  col_select = NULL,
  skip = 0,
  n_max = Inf,
  .name_repair = "unique"
)

read_stata(
  file,
  encoding = NULL,
  col_select = NULL,
  skip = 0,
  n_max = Inf,
  .name_repair = "unique"
)

write_dta(data, path, version = 14, label = attr(data, "label"))
}
\arguments{
\item{file}{Either a path to a file, a connection, or literal data
(either a single string or a raw vector).

Files ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} will
be automatically uncompressed. Files starting with \verb{http://},
\verb{https://}, \verb{ftp://}, or \verb{ftps://} will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.

Literal data is most useful for examples and tests. It must contain at
least one new line to be recognised as data (instead of a path) or be a
vector of greater than length 1.

Using a value of \code{\link[readr:clipboard]{clipboard()}} will read from the system clipboard.}

\item{encoding}{The character encoding used for the file. Generally,
only needed for Stata 13 files and earlier.
See the Character encoding section for details.}

\item{col_select}{One or more selection expressions, like in
\code{\link[dplyr:select]{dplyr::select()}}. Use \code{c()} or \code{list()} to use more than one expression.
See \code{?dplyr::select} for details on available selection options. Only the
specified columns will be read from \code{file}.}

\item{skip}{Number of lines to skip before reading data.}

\item{n_max}{Maximum number of lines to read.}

\item{.name_repair}{Treatment of problematic column names:
\itemize{
\item \code{"minimal"}: No name repair or checks, beyond basic existence,
\item \code{"unique"}: (default value), make sure names are unique and not empty,
\item \code{"check_unique"}: no name repair, but check they are
\code{unique},
\item \code{"universal"}: Make the names \code{unique} and syntactic
\item a function: apply custom name repair (e.g., \code{.name_repair = make.names}
for names in the style of base R).
\item A purrr-style anonymous function, see \code{\link[rlang:as_function]{rlang::as_function()}}
}

This argument is passed on as \code{repair} to \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}}.
See there for more details on these terms and the strategies used
to enforce them.}

\item{data}{Data frame to write.}

\item{path}{Path to a file where the data will be written.}

\item{version}{File version to use. Supports versions 8-15.}

\item{label}{Dataset label to use, or \code{NULL}. Defaults to the value stored in
the "label" attribute of \code{data}. Must be <= 80 characters.}
}
\value{
A tibble, a data frame variant with nice defaults.

Variable labels are stored in the "label" attribute of each variable.
It is not printed on the console, but the RStudio viewer will show it.

If a dataset label is defined in Stata, it will be stored in the "label"
attribute of the tibble.

\code{write_dta()} returns the input \code{data} invisibly.
}
\description{
Currently haven can read and write logical, integer, numeric, character
and factors.
See \code{\link[=labelled]{labelled()}} for how labelled variables in Stata are handled in R. } \section{Character encoding}{ Prior to Stata 14, files did not declare a text encoding, and the default encoding differed across platforms. If \code{encoding = NULL}, haven assumes the encoding is windows-1252, the text encoding used by Stata on Windows. Unfortunately, Stata on Mac and Linux uses a different default encoding, "latin1". If you encounter an error such as "Unable to convert string to the requested encoding", try \code{encoding = "latin1"}. For Stata 14 and later, you should not need to manually specify an \code{encoding} value unless the value was incorrectly recorded in the source file. } \examples{ path <- system.file("examples", "iris.dta", package = "haven") read_dta(path) tmp <- tempfile(fileext = ".dta") write_dta(mtcars, tmp) read_dta(tmp) read_stata(tmp) } haven/man/read_sas.Rd0000644000176200001440000000530014034330015014165 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/haven-sas.R \name{read_sas} \alias{read_sas} \alias{write_sas} \title{Read and write SAS files} \usage{ read_sas( data_file, catalog_file = NULL, encoding = NULL, catalog_encoding = encoding, col_select = NULL, skip = 0L, n_max = Inf, cols_only = "DEPRECATED", .name_repair = "unique" ) write_sas(data, path) } \arguments{ \item{data_file, catalog_file}{Path to data and catalog files. The files are processed with \code{\link[readr:datasource]{readr::datasource()}}.} \item{encoding, catalog_encoding}{The character encoding used for the \code{data_file} and \code{catalog_file}, respectively. A value of \code{NULL} uses the encoding specified in the file; use this argument to override it if it is incorrect.} \item{col_select}{One or more selection expressions, like in \code{\link[dplyr:select]{dplyr::select()}}. Use \code{c()} or \code{list()} to use more than one expression. 
See \code{?dplyr::select} for details on available selection options. Only the specified columns will be read from \code{data_file}.} \item{skip}{Number of rows to skip before reading data.} \item{n_max}{Maximum number of rows to read.} \item{cols_only}{\strong{Deprecated}: Use \code{col_select} instead.} \item{.name_repair}{Treatment of problematic column names: \itemize{ \item \code{"minimal"}: No name repair or checks, beyond basic existence, \item \code{"unique"}: (default value), make sure names are unique and not empty, \item \code{"check_unique"}: no name repair, but check they are \code{unique}, \item \code{"universal"}: Make the names \code{unique} and syntactic \item a function: apply custom name repair (e.g., \code{.name_repair = make.names} for names in the style of base R). \item A purrr-style anonymous function, see \code{\link[rlang:as_function]{rlang::as_function()}} } This argument is passed on as \code{repair} to \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}}. See there for more details on these terms and the strategies used to enforce them.} \item{data}{Data frame to write.} \item{path}{Path to a file where the data will be written.} } \value{ A tibble, a data frame variant with nice defaults. Variable labels are stored in the "label" attribute of each variable. It is not printed on the console, but the RStudio viewer will show it. \code{write_sas()} returns the input \code{data} invisibly. } \description{ \code{read_sas()} supports both sas7bdat files and the accompanying sas7bcat files that SAS uses to record value labels. \code{write_sas()} is currently experimental and only works for limited datasets. 
} \examples{ path <- system.file("examples", "iris.sas7bdat", package = "haven") read_sas(path) } haven/man/zap_label.Rd0000644000176200001440000000142314101766013014345 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/zap_label.R \name{zap_label} \alias{zap_label} \title{Zap variable labels} \usage{ zap_label(x) } \arguments{ \item{x}{A vector or data frame} } \description{ Removes the variable label, leaving unlabelled vectors as is. } \examples{ x1 <- labelled(1:5, c(good = 1, bad = 5), label = "rating") x1 zap_label(x1) x2 <- labelled_spss(c(1:4, 9), label = "score", na_values = 9) x2 zap_label(x2) # zap_label also works with data frames df <- tibble::tibble(x1, x2) str(df) str(zap_label(df)) } \seealso{ \code{\link[=zap_labels]{zap_labels()}} to remove value labels. Other zappers: \code{\link{zap_empty}()}, \code{\link{zap_formats}()}, \code{\link{zap_labels}()}, \code{\link{zap_widths}()} } \concept{zappers} haven/DESCRIPTION0000644000176200001440000000271314102416217013062 0ustar liggesusersPackage: haven Title: Import and Export 'SPSS', 'Stata' and 'SAS' Files Version: 2.4.3 Authors@R: c(person(given = "Hadley", family = "Wickham", role = c("aut", "cre"), email = "hadley@rstudio.com"), person(given = "Evan", family = "Miller", role = c("aut", "cph"), comment = "Author of included ReadStat code"), person(given = "RStudio", role = c("cph", "fnd"))) Description: Import foreign statistical formats into R via the embedded 'ReadStat' C library, <https://github.com/WizardMac/ReadStat>. 
License: MIT + file LICENSE URL: https://haven.tidyverse.org, https://github.com/tidyverse/haven, https://github.com/WizardMac/ReadStat BugReports: https://github.com/tidyverse/haven/issues Depends: R (>= 3.2) Imports: forcats (>= 0.2.0), hms, methods, readr (>= 0.1.0), rlang (>= 0.4.0), tibble, tidyselect, vctrs (>= 0.3.0) Suggests: cli, covr, crayon, fs, knitr, pillar (>= 1.4.0), rmarkdown, testthat (>= 3.0.0) LinkingTo: cpp11 VignetteBuilder: knitr Config/testthat/edition: 3 Encoding: UTF-8 RoxygenNote: 7.1.1 SystemRequirements: GNU make, C++11, zlib NeedsCompilation: yes Packaged: 2021-08-03 21:27:47 UTC; hadley Author: Hadley Wickham [aut, cre], Evan Miller [aut, cph] (Author of included ReadStat code), RStudio [cph, fnd] Maintainer: Hadley Wickham <hadley@rstudio.com> Repository: CRAN Date/Publication: 2021-08-04 04:50:23 UTC haven/build/0000755000176200001440000000000014102332322012442 5ustar liggesusershaven/build/vignette.rds0000644000176200001440000000034414102332322015002 0ustar liggesusershaven/tests/0000755000176200001440000000000014102332323012506 5ustar liggesusershaven/tests/testthat/0000755000176200001440000000000014102416217014353 5ustar liggesusershaven/tests/testthat/test-haven-stata.R0000644000176200001440000001535414101006665017677 0ustar liggesusers# read_stata -------------------------------------------------------------- test_that("stata data types read into expected types (#45)", { df <- read_stata(test_path("stata/types.dta")) types <- vapply(df, typeof, character(1)) expect_equal(types, c( vfloat = "double", vdouble = "double", vlong = "double", vint = "double", vbyte = "double", vstr = "character", vdate = "double", vdatetime = "double" )) }) test_that("Stata %td (date) and %tc (datetime) read into expected classes", { df <- read_stata(test_path("stata/types.dta")) expect_s3_class(df$vdate, "Date") expect_s3_class(df$vdatetime, "POSIXct") }) test_that("Old %d 
format read into Date class", { df <- zap_formats(read_stata(test_path("stata/datetime-d.dta"))) expect_equal(df$date, as.Date("2015-11-02")) }) test_that("tagged double missings are read correctly", { x <- read_dta(test_path("stata/tagged-na-double.dta"))$x expect_equal(na_tag(x), c(rep(NA, 5), "a", "h", "z")) labels <- attr(x, "labels") expect_equal(na_tag(labels), c("a", "z")) }) test_that("tagged integer missings are read correctly", { x <- read_dta(test_path("stata/tagged-na-int.dta"))$x expect_equal(na_tag(x), c(rep(NA, 5), "a", "h", "z")) labels <- attr(x, "labels") expect_equal(na_tag(labels), c("a", "z")) }) test_that("file label and notes stored as attributes", { df <- read_dta(test_path("stata/notes.dta")) expect_equal(attr(df, "label"), "This is a test dataset.") expect_length(attr(df, "notes"), 2) }) test_that("only selected columns are read", { out <- read_dta(test_path("stata/notes.dta"), col_select = "id") expect_named(out, "id") }) test_that("using skip returns correct number of rows", { rows_after_skipping <- function(n) { nrow(read_dta(test_path("stata/notes.dta"), skip = n)) } n <- rows_after_skipping(0) expect_equal(rows_after_skipping(1), n - 1) expect_equal(rows_after_skipping(n - 1), 1) expect_equal(rows_after_skipping(n + 0), 0) expect_equal(rows_after_skipping(n + 1), 0) }) test_that("can limit the number of rows to read", { rows_with_limit <- function(n) { nrow(read_dta(test_path("stata/notes.dta"), n_max = n)) } n <- rows_with_limit(Inf) expect_equal(rows_with_limit(0), 0) expect_equal(rows_with_limit(1), 1) expect_equal(rows_with_limit(n), n) expect_equal(rows_with_limit(n + 1), n) # alternatives for unlimited rows expect_equal(rows_with_limit(NA), n) expect_equal(rows_with_limit(-1), n) }) # write_dta --------------------------------------------------------------- test_that("can roundtrip basic types", { x <- runif(10) expect_equal(roundtrip_var(x, "dta"), x) expect_equal(roundtrip_var(1:10, "dta"), 1:10) 
expect_equal(roundtrip_var(c(TRUE, FALSE), "dta"), c(1, 0)) expect_equal(roundtrip_var(letters, "dta"), letters) }) test_that("can roundtrip missing values (as much as possible)", { expect_equal(roundtrip_var(NA, "dta"), NA_integer_) expect_equal(roundtrip_var(NA_real_, "dta"), NA_real_) expect_equal(roundtrip_var(NA_integer_, "dta"), NA_integer_) expect_equal(roundtrip_var(NA_character_, "dta"), "") }) test_that("can roundtrip date times", { x1 <- c(as.Date("2010-01-01"), NA) expect_equal(roundtrip_var(x1, "dta"), x1) # converted to same time in UTC x2 <- as.POSIXct("2010-01-01 09:00", tz = "Pacific/Auckland") expect_equal( roundtrip_var(x2, "dta"), as.POSIXct("2010-01-01 09:00", tz = "UTC") ) }) test_that("can roundtrip tagged NAs", { x <- c(1, 2, tagged_na('a', 'b'), NA) expect_equal(roundtrip_var(x, "dta"), x) tags <- tagged_na('a', 'b') y <- labelled( c(1, 2, 1, tags[1], tags[2]), c("ABC" = tags[1], "DEF" = tags[2]) ) expect_equal(roundtrip_var(y, "dta"), y) }) test_that("infinity gets converted to NA", { expect_equal(roundtrip_var(c(Inf, 0, -Inf), "dta"), c(NA, 0, NA)) }) test_that("factors become labelleds", { f <- factor(c("a", "b"), levels = letters[1:3]) rt <- roundtrip_var(f, "dta") expect_s3_class(rt, "haven_labelled") expect_equal(as.vector(rt), 1:2) expect_equal(attr(rt, "labels"), c(a = 1, b = 2, c = 3)) }) test_that("labels are preserved", { x <- 1:10 attr(x, "label") <- "abc" expect_equal(attr(roundtrip_var(x, "dta"), "label"), "abc") }) test_that("labelleds are round tripped", { int <- labelled(c(1L, 2L), c(a = 1L, b = 3L)) num <- labelled(c(1, 2), c(a = 1, b = 3)) chr <- labelled(c("a", "b"), c(a = "b", b = "a")) expect_equal(roundtrip_var(num, "dta"), num) # FIXME! 
# expect_equal(roundtrip_var(chr, "dta"), chr) }) test_that("can write labelled with NULL labels", { int <- labelled(c(1L, 2L), NULL) num <- labelled(c(1, 2), NULL) chr <- labelled(c("a", "b"), NULL) expect_equal(roundtrip_var(int, "dta"), c(1L, 2L)) expect_equal(roundtrip_var(num, "dta"), c(1L, 2L)) expect_equal(roundtrip_var(chr, "dta"), c("a", "b")) }) test_that("factors become labelleds", { f <- factor(c("a", "b"), levels = letters[1:3]) rt <- roundtrip_var(f, "dta") expect_s3_class(rt, "haven_labelled") expect_equal(as.vector(rt), 1:2) expect_equal(attr(rt, "labels"), c(a = 1, b = 2, c = 3)) }) test_that("labels are converted to utf-8", { labels_utf8 <- c("\u00e9\u00e8", "\u00e0", "\u00ef") labels_latin1 <- iconv(labels_utf8, "utf-8", "latin1") v_utf8 <- labelled(3:1, setNames(1:3, labels_utf8)) v_latin1 <- labelled(3:1, setNames(1:3, labels_latin1)) expect_equal(names(attr(roundtrip_var(v_utf8, "dta"), "labels")), labels_utf8) expect_equal(names(attr(roundtrip_var(v_latin1, "dta"), "labels")), labels_utf8) }) test_that("supports stata version 15", { df <- tibble(x = factor(letters), y = runif(26)) path <- tempfile() write_dta(df, path, version = 15) df2 <- read_dta(path) df2$x <- as_factor(df2$x) df2$y <- zap_formats(df2$y) expect_equal(df2, df) }) test_that("can roundtrip file labels", { df <- tibble(x = 1) expect_null(attr(roundtrip_dta(df), "label")) expect_equal(attr(roundtrip_dta(df, label = "abcd"), "label"), "abcd") attr(df, "label") <- "abc" expect_equal(attr(roundtrip_dta(df), "label"), "abc") expect_equal(attr(roundtrip_dta(df, label = "abcd"), "label"), "abcd") expect_null(attr(roundtrip_dta(df, label = NULL), "label")) }) test_that("invalid files generate informative errors", { expect_snapshot(error = TRUE,{ long <- paste(rep("a", 100), collapse = "") write_dta(data.frame(x = 1), tempfile(), label = long) df <- data.frame(1) names(df) <- "x y" write_dta(df, tempfile(), version = 13) names(df) <- long write_dta(df, tempfile(), version = 13) 
write_dta(df, tempfile(), version = 14) }) }) test_that("can't write non-integer labels (#401)", { expect_snapshot(error = TRUE, { df <- data.frame(x = labelled(c(1, 2.5, 3), c("b" = 1.5))) write_dta(df, tempfile()) }) }) haven/tests/testthat/test-zap_widths.R0000644000176200001440000000052714033646044017641 0ustar liggesuserstest_that("can zap width attribute from vector", { x <- structure(1:5, display_width = 10) y <- zap_widths(x) expect_null(attributes(y)) }) test_that("can zap width attribute from vector in data frame", { x <- structure(1:5, display_width = 10) df <- data.frame(x = x) out <- zap_widths(df) expect_null(attributes(out$x)) }) haven/tests/testthat/test-labelled_spss.R0000644000176200001440000002427314101006665020300 0ustar liggesuserstest_that("constructor checks na_value", { expect_incompatible_type(labelled_spss(1:10, na_values = "a")) expect_snapshot(error = TRUE, { labelled_spss(1:10, na_values = "a") labelled_spss(1:10, na_values = NA_integer_) }) }) test_that("constructor checks na_range", { expect_snapshot(error = TRUE,{ labelled_spss(1:10, na_range = "a") labelled_spss(1:10, na_range = 1:3) labelled_spss(1:10, na_range = c(2, NA)) labelled_spss(1:10, na_range = c(2, 1)) }) }) test_that("printed output is stable", { x <- labelled_spss( 1:5, c("Good" = 1, "Bad" = 5), na_values = c(1, 2), na_range = c(3, Inf) ) expect_snapshot(x) }) test_that("subsetting preserves attributes", { x <- labelled_spss( 1:5, c("Good" = 1, "Bad" = 5), na_values = c(1, 2), na_range = c(3, Inf), label = "Rating" ) expect_identical(x, x[]) }) test_that("labels must be unique", { expect_error( labelled_spss(1, c(female = 1, male = 1), na_values = 9), "must be unique") }) # is.na ------------------------------------------------------------------- test_that("values in na_range flagged as missing", { x <- labelled_spss(1:5, c("a" = 1), na_range = c(1, 3)) expect_equal(is.na(x), c(TRUE, TRUE, TRUE, FALSE, FALSE)) }) test_that("values in na_values flagged as missing", { x 
<- labelled_spss(1:5, c("a" = 1), na_values = c(1, 3, 5)) expect_equal(is.na(x), c(TRUE, FALSE, TRUE, FALSE, TRUE)) }) # Types ------------------------------------------------------------------- test_that("combining preserves class", { expect_s3_class(vec_c(labelled_spss(), labelled_spss()), "haven_labelled_spss") expect_s3_class(vec_c(labelled_spss(), labelled_spss(na_values = 1)), "haven_labelled") expect_s3_class(vec_c(labelled_spss(na_values = 1), labelled_spss(na_values = 1)), "haven_labelled_spss") }) test_that("combining is symmetrical w.r.t. data types", { expect_incompatible_type(vec_c(labelled_spss(character()), labelled_spss())) expect_incompatible_type(vec_c(labelled_spss(), labelled_spss(character()))) expect_identical( vec_c(labelled_spss(integer()), labelled_spss()), vec_c(labelled_spss(), labelled_spss(integer())) ) expect_identical( vec_c(labelled_spss(), double()), vec_c(double(), labelled_spss()) ) expect_identical( vec_c(labelled_spss(), integer()), vec_c(integer(), labelled_spss()) ) expect_identical( vec_c(labelled_spss(), labelled()), vec_c(labelled(), labelled_spss()) ) }) test_that("can cast labelled_spss to atomic vectors", { x_int <- labelled_spss(1:2) x_dbl <- labelled_spss(c(1, 2)) x_chr <- labelled_spss(c("a", "b")) expect_identical(vec_cast(x_int, integer()), 1:2) expect_identical(vec_cast(x_int, double()), c(1, 2)) expect_error(vec_cast(x_int, character()), class = "vctrs_error_incompatible_type") expect_identical(vec_cast(x_dbl, integer()), 1:2) expect_identical(vec_cast(x_dbl, double()), c(1, 2)) expect_error(vec_cast(x_dbl, character()), class = "vctrs_error_incompatible_type") expect_error(vec_cast(x_chr, integer()), class = "vctrs_error_incompatible_type") expect_error(vec_cast(x_chr, double()), class = "vctrs_error_incompatible_type") expect_identical(vec_cast(x_chr, character()), c("a", "b")) }) test_that("can cast atomic vectors to labelled_spss", { x_int <- labelled_spss(1:2) x_dbl <- labelled_spss(c(1, 2)) x_chr <- 
labelled_spss(c("a", "b")) expect_identical(vec_cast(1:3, x_int), labelled_spss(1:3)) expect_identical(vec_cast(1:3, x_dbl), labelled_spss(c(1, 2, 3))) expect_error(vec_cast(1:3, x_chr), class = "vctrs_error_incompatible_type") expect_identical(vec_cast(c(0, 1), x_int), labelled_spss(0:1)) expect_identical(vec_cast(c(0, 1), x_dbl), labelled_spss(c(0, 1))) expect_error(vec_cast(c(0, 1), x_chr), class = "vctrs_error_incompatible_type") expect_error(vec_cast("a", x_int), class = "vctrs_error_incompatible_type") expect_error(vec_cast("a", x_dbl), class = "vctrs_error_incompatible_type") expect_identical(vec_cast("a", x_chr), labelled_spss("a")) }) test_that("combining preserves label sets", { expect_equal( vec_c( labelled_spss(1, labels = c(Good = 1, Bad = 5)), labelled_spss(5, labels = c(Good = 1, Bad = 5)), ), labelled_spss(c(1, 5), labels = c(Good = 1, Bad = 5)) ) }) test_that("combining preserves user missing", { expect_equal( vec_c( labelled_spss(1, na_values = c(1, 5)), labelled_spss(5, na_values = c(1, 5)), ), labelled_spss(c(1, 5), na_values = c(1, 5)) ) expect_equal( vec_c( labelled_spss(1, na_range = c(1, 5)), labelled_spss(5, na_range = c(1, 5)), ), labelled_spss(c(1, 5), na_range = c(1, 5)) ) }) test_that("can combine names", { x <- labelled_spss(c(x = 1L)) expect_named(vec_c(x, x), c("x", "x")) expect_named(vec_c(x, c(y = 1L)), c("x", "y")) }) test_that("take labels from LHS", { expect_equal( vec_c( labelled_spss(1, labels = c(Good = 1, Bad = 5)), labelled_spss(5, labels = c(Bad = 1, Good = 5)), ), labelled_spss(c(1, 5), labels = c(Good = 1, Bad = 5)) ) expect_equal( vec_c( labelled_spss(1, labels = c(Good = 1)), labelled_spss(5, labels = c(Bad = 1)), ), labelled_spss(c(1, 5), labels = c(Good = 1)) ) }) test_that("strip user missing if different", { expect_equal( vec_c( labelled_spss(na_values = 1), labelled_spss(na_values = 5), ), labelled() ) expect_equal( vec_c( labelled_spss(na_range = c(1, 5)), labelled_spss(na_range = c(2, 4)), ), labelled() ) 
expect_equal( vec_c( labelled_spss(na_range = c(1, 5)), labelled_spss(na_values = 5), ), labelled() ) }) test_that("combining picks label from the left", { expect_equal( attr(vec_c( labelled_spss(label = "left"), labelled_spss(label = "right"), ), "label", exact = TRUE), "left" ) }) test_that("combining with bare vectors results in a labelled_spss()", { expect_identical(vec_c(labelled_spss(), 1.1), labelled_spss(1.1)) expect_identical(vec_c(labelled_spss(integer()), 1.1), labelled_spss(1.1)) expect_equal( vec_c(labelled_spss(labels = c(Good = 1, Bad = 5)), 1, 3, 5), labelled_spss(vec_c(1, 3, 5), labels = c(Good = 1, Bad = 5)) ) }) test_that("casting to labelled_spss throws lossy cast if not safe", { expect_incompatible_type(vec_cast("a", labelled_spss())) expect_incompatible_type(vec_cast("a", labelled_spss(integer()))) expect_error(vec_cast(1.1, labelled_spss(integer())), class = "vctrs_error_cast_lossy") }) test_that("casting to a superset of labels works", { expect_equal( vec_cast( labelled_spss(c(1, 5), c(Good = 1)), labelled_spss(labels = c(Good = 1, Bad = 5)) ), labelled_spss(c(1, 5), labels = c(Good = 1, Bad = 5)) ) }) test_that("casting to a subset of labels works iff labels were unused", { expect_equal( vec_cast( labelled_spss(1, c(Good = 1, Bad = 5)), labelled_spss(labels = c(Good = 1)) ), labelled_spss(1, labels = c(Good = 1)) ) expect_lossy_cast(vec_cast( labelled_spss(c(1, 5), c(Good = 1, Bad = 5)), labelled_spss(labels = c(Good = 1)) )) }) test_that("casting away labels throws lossy cast", { expect_lossy_cast(vec_cast( labelled_spss(1, c(Good = 1)), labelled_spss(labels = c(Bad = 5)) )) }) test_that("casting to a superset of user missing works", { expect_equal( vec_cast( labelled_spss(c(1, 5), na_values = 1), labelled_spss(na_values = c(1, 5)) ), labelled_spss(c(1, 5), na_values = c(1, 5)) ) expect_equal( vec_cast( labelled_spss(c(1, 5), na_values = 1), labelled_spss(na_range = c(1, 5)) ), labelled_spss(c(1, 5), na_range = c(1, 5)) ) expect_equal( 
vec_cast( labelled_spss(c(1, 5), na_range = c(2, 4)), labelled_spss(na_range = c(1, 5)) ), labelled_spss(c(1, 5), na_range = c(1, 5)) ) }) test_that("casting to a subset of user missing works iff values were unused", { expect_equal( vec_cast( labelled_spss(1, na_values = c(1, 5)), labelled_spss(na_values = 1) ), labelled_spss(1, na_values = 1) ) expect_lossy_cast(vec_cast( labelled_spss(c(1, 5), na_values = c(1, 5)), labelled_spss(na_values = 1) )) expect_equal( vec_cast( labelled_spss(1, na_range = c(1, 5)), labelled_spss(na_range = c(1, 3)) ), labelled_spss(1, na_range = c(1, 3)) ) expect_lossy_cast(vec_cast( labelled_spss(c(1, 5), na_range = c(1, 5)), labelled_spss(na_range = c(1, 3)) )) expect_equal( vec_cast( labelled_spss(1, na_range = c(1, 5)), labelled_spss(na_values = 1) ), labelled_spss(1, na_values = 1) ) expect_lossy_cast(vec_cast( labelled_spss(c(1, 5), na_range = c(1, 5)), labelled_spss(na_values = 1) )) expect_equal( vec_cast( labelled_spss(1, na_values = c(1, 5)), labelled_spss(na_range = c(1, 3)) ), labelled_spss(1, na_range = c(1, 3)) ) expect_lossy_cast(vec_cast( labelled_spss(c(1, 5), na_values = c(1, 5)), labelled_spss(na_range = c(1, 3)) )) }) test_that("casting away user missing throws lossy cast", { expect_lossy_cast(vec_cast( labelled_spss(1, na_values = 1), labelled_spss(na_values = 5) )) expect_lossy_cast(vec_cast( labelled_spss(1, na_range = c(1, 3)), labelled_spss(na_range = c(5, 7)) )) expect_lossy_cast(vec_cast( labelled_spss(1, na_range = c(1, 3)), labelled_spss(na_values = 5) )) expect_lossy_cast(vec_cast( labelled_spss(1, na_values = 1), labelled_spss(na_range = c(5, 7)) )) }) test_that("casting to regular labelled ignores missing values", { expect_equal( vec_cast( labelled_spss(1, na_values = c(1, 5)), labelled() ), labelled(1) ) }) test_that("casting away tagged na values throws lossy cast", { expect_lossy_cast(vec_cast( labelled_spss(tagged_na("a")), labelled_spss(integer()) )) expect_incompatible_type(vec_cast( 
labelled_spss(tagged_na("a")), labelled_spss(character()) )) }) test_that("won't cast labelled_spss numeric to character", { expect_incompatible_type(vec_cast(labelled_spss(), character())) expect_incompatible_type(vec_cast(labelled_spss(integer()), character())) }) haven/tests/testthat/test-tagged_na.R0000644000176200001440000000247714033646044017404 0ustar liggesuserstest_that("tagged_na is NA (but not NaN)", { x <- tagged_na("a") expect_true(is.na(x)) expect_false(is.nan(x)) }) # tag_na ------------------------------------------------------------------ test_that("can extract value of tagged na", { expect_equal(na_tag(tagged_na(letters)), letters) }) test_that("tag of system NA is NA", { expect_equal(na_tag(NA_real_), NA_character_) }) test_that("tag of non-NA is NA", { expect_equal(na_tag(1), NA_character_) }) # is_tagged_na ------------------------------------------------------------ test_that("regular NA isn't tagged", { expect_false(is_tagged_na(NA_real_)) }) test_that("non-missing isn't tagged", { expect_false(is_tagged_na(1)) }) test_that("tagged values are tagged", { x <- tagged_na(c("a", "z")) expect_equal(is_tagged_na(x), c(TRUE, TRUE)) }) test_that("values are checked if required", { x <- tagged_na(c("a", "z")) expect_equal(is_tagged_na(x, "a"), c(TRUE, FALSE)) }) # character output ----------------------------------------------------------- test_that("format_tagged_na displays tagged NA's specially", { x <- c(1, tagged_na("a"), NA) expect_equal(format_tagged_na(x), c( " 1", "NA(a)", " NA" )) }) test_that("print_tagged_na is stable", { x <- c(1:100, tagged_na(letters), NA) expect_snapshot(print_tagged_na(x)) }) haven/tests/testthat/helper-roundtrip.R0000644000176200001440000000171214033646021020003 0ustar liggesusers roundtrip_sav <- function(x, ...) { tmp <- tempfile() on.exit(unlink(tmp)) write_sav(x, tmp, ...) zap_formats(read_sav(tmp)) } roundtrip_dta <- function(x, ...) { tmp <- tempfile() on.exit(unlink(tmp)) write_dta(x, tmp, ...) 
zap_formats(read_dta(tmp)) } roundtrip_sas <- function(x, ...) { tmp <- tempfile() on.exit(unlink(tmp)) write_sas(x, tmp, ...) zap_formats(read_sas(tmp)) } roundtrip_xpt <- function(x, ...) { tmp <- tempfile() on.exit(unlink(tmp)) write_xpt(x, tmp, ...) zap_formats(read_xpt(tmp)) } roundtrip_var <- function(x, type = "sav", ...) { df <- tibble::tibble(x = x) # Forces xpt files to be correct length even when ending with # empty character strings if (type == "xpt") { df$y <- seq_along(x) } switch(type, sav = roundtrip_sav(df, ...)$x, dta = roundtrip_dta(df, ...)$x, sas = roundtrip_sas(df, ...)$x, xpt = roundtrip_xpt(df, ...)$x, stop("Unsupported type") ) } haven/tests/testthat/spss/0000755000176200001440000000000014034332152015342 5ustar liggesusershaven/tests/testthat/spss/umlauts.sav0000644000176200001440000000106714033646021017555 0ustar liggesusershaven/tests/testthat/spss/variable-label.sav0000644000176200001440000000076014033646021020724 0ustar liggesusershaven/tests/testthat/spss/labelled-str.sav0000755000176200001440000000101514033646021020431 0ustar liggesusershaven/tests/testthat/spss/labelled-num.sav0000755000176200001440000000077314033646021020432 0ustar liggesusers
haven/tests/testthat/spss/datetime.sav0000644000176200001440000000123214033646021017651 0ustar liggesusershaven/tests/testthat/spss/labelled-num-na.sav0000755000176200001440000000102714033646021021017 0ustar liggesusershaven/tests/testthat/test-haven-sas.R0000644000176200001440000001641114101006665017344 0ustar liggesusers# read_sas ---------------------------------------------------------------- test_that("variable label stored as attributes", { df <- read_sas(test_path("sas/hadley.sas7bdat")) expect_equal(attr(df$gender, "label"), NULL) expect_equal(attr(df$q1, "label"), "The instructor was well prepared") }) test_that("value labels parsed from bcat file", { df <- read_sas(test_path("sas/hadley.sas7bdat"), test_path("sas/formats.sas7bcat")) expect_s3_class(df$gender, "haven_labelled") expect_equal(attr(df$gender, "labels"), c(Female = "f", Male = "m")) expect_equal(attr(df$workshop, "labels"), c(R = 1, SAS = 2)) }) test_that("value labels read in as same type as vector", { df <- read_sas(test_path("sas/hadley.sas7bdat"), test_path("sas/formats.sas7bcat")) expect_equal(typeof(df$gender), typeof(attr(df$gender, "labels"))) expect_equal(typeof(df$workshop), typeof(attr(df$workshop, "labels"))) }) test_that("date times are converted into corresponding R types", { df <- read_sas(test_path("sas/datetime.sas7bdat")) expect_equal(df$VAR1[1], ISOdatetime(2015, 02, 02, 14, 42, 12, "UTC")) expect_equal(df$VAR2[1], as.Date("2015-02-02")) expect_equal(df$VAR3[1], as.Date("2015-02-02")) expect_equal(df$VAR4[1], as.Date("2015-02-02")) 
expect_equal(df$VAR5[1], hms::hms(52932)) }) test_that("tagged missings are read correctly", { x <- read_sas(test_path("sas/tagged-na.sas7bdat"), test_path("sas/tagged-na.sas7bcat"))$x expect_equal(na_tag(x), c(rep(NA, 5), "a", "h", "z")) labels <- attr(x, "labels") expect_equal(na_tag(labels), c("a", "z")) }) test_that("default name repair can be overridden", { df <- data.frame(1:3, 1:3) colnames(df) <- c("id", "id") path <- tempfile() write_sas(df, path) expect_message(read_sas(path), "id...1") expect_message(read_sas(path, .name_repair = "minimal"), NA) }) test_that("connections are read", { file_conn <- file(test_path("sas/hadley.sas7bdat")) expect_identical(read_sas(file_conn), read_sas(test_path("sas/hadley.sas7bdat"))) }) test_that("zip files are read", { expect_identical( read_sas(test_path("sas/hadley.zip")), read_sas(test_path("sas/hadley.sas7bdat")) ) }) # Row skipping ------------------------------------------------------------ test_that("using skip returns correct number of rows", { rows_after_skipping <- function(n) { nrow(read_sas(test_path("sas/hadley.sas7bdat"), skip = n)) } n <- rows_after_skipping(0) expect_equal(rows_after_skipping(1), n - 1) expect_equal(rows_after_skipping(n - 1), 1) expect_equal(rows_after_skipping(n + 0), 0) expect_equal(rows_after_skipping(n + 1), 0) }) # Row limiting ------------------------------------------------------------ test_that("can limit the number of rows to read", { rows_with_limit <- function(n) { nrow(read_sas(test_path("sas/hadley.sas7bdat"), n_max = n)) } n <- rows_with_limit(Inf) expect_equal(rows_with_limit(0), 0) expect_equal(rows_with_limit(1), 1) expect_equal(rows_with_limit(n), n) expect_equal(rows_with_limit(n + 1), n) # alternatives for unlimited rows expect_equal(rows_with_limit(NA), n) expect_equal(rows_with_limit(-1), n) }) test_that("throws informative error on bad row limit", { rows_with_limit <- function(n) { nrow(read_sas(test_path("sas/hadley.sas7bdat"), n_max = n)) } expect_error(rows_with_limit(1:5), 
"must have length 1") expect_error(rows_with_limit("foo"), "must be numeric") }) # Column selection -------------------------------------------------------- test_that("can select columns to read, with tidyselect semantics", { with_col_select <- function(x) { read_sas(test_path("sas/hadley.sas7bdat"), col_select = {{ x }}) } full_data <- with_col_select(NULL) n_col <- ncol(full_data) expect_equal(with_col_select("id"), full_data[, "id"]) expect_equal(with_col_select(id), full_data[, "id"]) expect_equal(with_col_select(2:3), full_data[, 2:3]) expect_equal(with_col_select(tidyselect::last_col()), full_data[, n_col]) }) test_that("throws error on empty column selection", { with_col_select <- function(x) { read_sas(test_path("sas/hadley.sas7bdat"), col_select = {{ x }}) } expect_error(with_col_select(character()), "Can't find") expect_error(with_col_select(tidyselect::starts_with("x")), "Can't find") }) test_that("can select columns when a catalog file is present", { expect_named( read_sas( test_path("sas/hadley.sas7bdat"), test_path("sas/formats.sas7bcat"), col_select = "workshop" ), "workshop" ) }) test_that("using cols_only warns about deprecation, but works", { expect_warning( out <- read_sas(test_path("sas/hadley.sas7bdat"), cols_only = "id"), "is deprecated" ) expect_named(out, "id") }) # write_sas --------------------------------------------------------------- test_that("can roundtrip basic types", { x <- runif(10) expect_equal(roundtrip_var(x, "sas"), x) expect_equal(roundtrip_var(1:10, "sas"), 1:10) expect_equal(roundtrip_var(c(TRUE, FALSE), "sas"), c(1, 0)) expect_equal(roundtrip_var(letters, "sas"), letters) }) test_that("can roundtrip missing values (as much as possible)", { expect_equal(roundtrip_var(NA, "sas"), NA_integer_) expect_equal(roundtrip_var(NA_real_, "sas"), NA_real_) expect_equal(roundtrip_var(NA_integer_, "sas"), NA_integer_) expect_equal(roundtrip_var(NA_character_, "sas"), "") }) test_that("can write labelled with NULL labels", { int <- 
labelled(c(1L, 2L), NULL) num <- labelled(c(1, 2), NULL) chr <- labelled(c("a", "b"), NULL) expect_equal(roundtrip_var(int, "sas"), c(1L, 2L)) expect_equal(roundtrip_var(num, "sas"), c(1, 2)) expect_equal(roundtrip_var(chr, "sas"), c("a", "b")) }) test_that("can roundtrip date times", { x1 <- c(as.Date("2010-01-01"), NA) x2 <- as.POSIXct(x1) attr(x2, "tzone") <- "UTC" expect_equal(roundtrip_var(x1, "sas"), x1) expect_equal(roundtrip_var(x2, "sas"), x2) }) test_that("can roundtrip format attribute", { df <- data.frame(x = structure(1:5, format.sas = "xyz")) path <- tempfile() write_sas(df, path) out <- read_sas(path) expect_equal(df$x, out$x) }) test_that("infinity gets converted to NA", { expect_equal(roundtrip_var(c(Inf, 0, -Inf), "sas"), c(NA, 0, NA)) }) # read_xpt ---------------------------------------------------------------- test_that("can read date/times", { x <- as.Date("2018-01-01") df <- data.frame(date = x, datetime = as.POSIXct(x)) path <- tempfile() write_xpt(df, path) res <- read_xpt(path) expect_s3_class(res$date, "Date") expect_s3_class(res$datetime, "POSIXct") }) # write_xpt --------------------------------------------------------------- test_that("can roundtrip basic types", { x <- runif(10) expect_equal(roundtrip_var(x, "xpt"), x) expect_equal(roundtrip_var(1:10, "xpt"), 1:10) expect_equal(roundtrip_var(c(TRUE, FALSE), "xpt"), c(1, 0)) expect_equal(roundtrip_var(letters, "xpt"), letters) }) test_that("can roundtrip missing values (as much as possible)", { expect_equal(roundtrip_var(NA, "xpt"), NA_integer_) expect_equal(roundtrip_var(NA_real_, "xpt"), NA_real_) expect_equal(roundtrip_var(NA_integer_, "xpt"), NA_integer_) expect_equal(roundtrip_var(NA_character_, "xpt"), "") }) test_that("invalid files generate informative errors", { expect_snapshot(error = TRUE, { write_xpt(mtcars, file.path(tempdir(), " temp.xpt")) }) }) haven/tests/testthat/_snaps/0000755000176200001440000000000014034334015015635 5ustar 
liggesusershaven/tests/testthat/_snaps/labelled-pillar.md0000644000176200001440000000574114036036502021215 0ustar liggesusers# nice display in tibbles Code int <- labelled(1:5, labels = c(good = 1L, bad = 5L)) dbl <- labelled(1:5 / 10, labels = c(good = 0.1, bad = 0.5)) chr <- labelled(letters[1:5], labels = c(good = "a", bad = "e")) tibble(int, dbl, chr) Output # A tibble: 5 x 3 int dbl chr 1 1 [good] 0.1 [good] a [good] 2 2 0.2 b 3 3 0.3 c 4 4 0.4 d 5 5 [bad] 0.5 [bad] e [bad] # pillar Code x <- labelled(1:11, c(Good = 1, Bad = 8)) tibble::tibble(x) Output # A tibble: 11 x 1 x 1 1 [Good] 2 2 3 3 4 4 5 5 6 6 7 7 8 8 [Bad] 9 9 10 10 11 11 --- Code x <- labelled(c(rep(c(1.22352, 1000, -345), each = 3), 35, 35), c(One = 1.22352, Two = 35, Threeeee = 1000)) tibble::tibble(x) Output # A tibble: 11 x 1 x 1 1.22 [One] 2 1.22 [One] 3 1.22 [One] 4 1000 [Threeeee] 5 1000 [Threeeee] 6 1000 [Threeeee] 7 -345 8 -345 9 -345 10 35 [Two] 11 35 [Two] --- Code x <- labelled(c(rep("A", 3), rep("B", 3), rep("XXXXXX", 4), NA), c(Apple = "A", Banana = "B", Mystery = "XXXXXX")) tibble::tibble(x) Output # A tibble: 11 x 1 x 1 A [Apple] 2 A [Apple] 3 A [Apple] 4 B [Banana] 5 B [Banana] 6 B [Banana] 7 XXXXXX [Mystery] 8 XXXXXX [Mystery] 9 XXXXXX [Mystery] 10 XXXXXX [Mystery] 11 --- Code x <- labelled(c(1:8, tagged_na("a"), tagged_na("b"), NA), c(Good = 1, Bad = 8, Refused = tagged_na("b"))) tibble::tibble(x) Output # A tibble: 11 x 1 x 1 1 [Good] 2 2 3 3 4 4 5 5 6 6 7 7 8 8 [Bad] 9 NA(a) 10 NA(b) [Refused] 11 NA --- Code x <- labelled_spss(c(1:10, NA), c(Good = 1, Bad = 8, Refused = 10), c(9, 10)) tibble::tibble(x) Output # A tibble: 11 x 1 x 1 1 [Good] 2 2 3 3 4 4 5 5 6 6 7 7 8 8 [Bad] 9 9 (NA) 10 10 (NA) [Refused] 11 NA haven/tests/testthat/_snaps/labelled_spss.md0000644000176200001440000000220614036036502020775 0ustar liggesusers# constructor checks na_value Code labelled_spss(1:10, na_values = "a") Error Can't convert `na_values` to match type of `x` . 
Code labelled_spss(1:10, na_values = NA_integer_) Error `na_values` can not contain missing values. # constructor checks na_range Code labelled_spss(1:10, na_range = "a") Error `na_range` must be a vector of length two the same type as `x`. Code labelled_spss(1:10, na_range = 1:3) Error `na_range` must be a vector of length two the same type as `x`. Code labelled_spss(1:10, na_range = c(2, NA)) Error `na_range` can not contain missing values. Code labelled_spss(1:10, na_range = c(2, 1)) Error `na_range` must be in ascending order. # printed output is stable Code x Output [5]> [1] 1 2 3 4 5 Missing values: 1, 2 Missing range: [3, Inf] Labels: value label 1 Good 5 Bad haven/tests/testthat/_snaps/haven-spss.md0000644000176200001440000000040714036036501020250 0ustar liggesusers# complain about long factor labels Code x <- paste(rep("a", 200), collapse = "") df <- data.frame(x = factor(x)) write_sav(df, tempfile()) Error SPSS only supports levels with <= 120 characters Problems: `x` haven/tests/testthat/_snaps/haven-stata.md0000644000176200001440000000227014036036501020374 0ustar liggesusers# invalid files generate informative errors Code long <- paste(rep("a", 100), collapse = "") write_dta(data.frame(x = 1), tempfile(), label = long) Error Stata data labels must be 80 characters or fewer Code df <- data.frame(1) names(df) <- "x y" write_dta(df, tempfile(), version = 13) Error The following variable names are not valid Stata variables: `x y` Code names(df) <- long write_dta(df, tempfile(), version = 13) Error The following variable names are not valid Stata variables: `aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa` Code write_dta(df, tempfile(), version = 14) Error The following variable names are not valid Stata variables: `aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa` # can't write non-integer labels (#401) Code df <- data.frame(x = labelled(c(1, 2.5, 3), 
c(b = 1.5))) write_dta(df, tempfile()) Error Stata only supports labelling with integers. Problems: `x` haven/tests/testthat/_snaps/labelled.md0000644000176200001440000000050514036036560017731 0ustar liggesusers# printed output is stable Code x Output [9]> [1] 1 2 3 4 5 NA NA(x) NA(y) NA(z) Labels: value label 1 Good 5 Bad NA(x) Not Applicable NA(y) Refused to answer haven/tests/testthat/_snaps/haven-sas.md0000644000176200001440000000032614036036501020046 0ustar liggesusers# invalid files generate informative errors Code write_xpt(mtcars, file.path(tempdir(), " temp.xpt")) Error Failed to create file: A provided name contains an illegal character. haven/tests/testthat/_snaps/tagged_na.md0000644000176200001440000000171114036036502020072 0ustar liggesusers# print_tagged_na is stable Code print_tagged_na(x) Output [1] 1 2 3 4 5 6 7 8 9 10 11 12 [13] 13 14 15 16 17 18 19 20 21 22 23 24 [25] 25 26 27 28 29 30 31 32 33 34 35 36 [37] 37 38 39 40 41 42 43 44 45 46 47 48 [49] 49 50 51 52 53 54 55 56 57 58 59 60 [61] 61 62 63 64 65 66 67 68 69 70 71 72 [73] 73 74 75 76 77 78 79 80 81 82 83 84 [85] 85 86 87 88 89 90 91 92 93 94 95 96 [97] 97 98 99 100 NA(a) NA(b) NA(c) NA(d) NA(e) NA(f) NA(g) NA(h) [109] NA(i) NA(j) NA(k) NA(l) NA(m) NA(n) NA(o) NA(p) NA(q) NA(r) NA(s) NA(t) [121] NA(u) NA(v) NA(w) NA(x) NA(y) NA(z) NA haven/tests/testthat/stata/0000755000176200001440000000000014102332323015462 5ustar liggesusershaven/tests/testthat/stata/tagged-na-double.dta0000755000176200001440000000372114033646021021267 0ustar liggesusers
118LSF 7 Jun 2016 18:03
<7Cx000001%9.0gtestlabelQwP @%@B @QZ@8B B118%AZ@]8B @ @ @1QZ@QZ@ _dta_lang_listx"Qw8L @x" x @Z@Cdefault _dta_lang_cx"Qw8L @x" x @Z@Cdefault?@@@@%testlabel @ @QZ@B apple.zebra
haven/tests/testthat/stata/datetime-d.dta0000644000176200001440000000047714033646021020207 0ustar liggesuserss`%`$r$~r$BB 2 Nov 2015 16:07date%d0g''wwwwhwTFhaven/tests/testthat/stata/tagged-na-int.dta0000755000176200001440000000366114033646021020612 0ustar liggesusers
118LSF 7 Jun 2016 11:09
<7Cx000001%9.0gtestlabelA?(B @t @ @x_ QZ@8B B118%AZ@x_ 8B @ @ @x_ QZ@QZ@ _dta_lang_cstQwHJ @HJx @Z@W Շdefault _dta_lang_listQwHJ @HJBZ@W Շdefault%testlabel @x_  @x_ QZ@B apple.zebra
haven/tests/testthat/stata/types.dta0000644000176200001440000000427614033646021017337 0ustar liggesusers
117LSF 1 Dec 2015 03:48
Eb$S! vfloatvdoublevlong01vint001vbyte01vstrvdatevdatetime%9.0g%9.0g%9.0g%9.0g%9.0g%13s%td0g%tcgwQuuBU@QtvB0T@QuuBU@QtvB0T@QuuBU@QtvB0T@QuuBU@QtvB0T@QuuBU@QtvB0T@QuuBU@QtvB0T@QuuBU@QtvB0T@QuuBU@QtvB0T@H@Q @2lloB@u~yB@@Hello, World!O
haven/tests/testthat/stata/notes.dta0000644000176200001440000000224614033646021017316 0ustar liggesuserssThis is a test dataset.VX2П!xZП,";29 Jul 2016 14:34idfaceHeight") expect_error(quantile(x_chr), "labelled") expect_equal(summary(x_chr), summary(letters[1:3])) }) # types ------------------------------------------------------------------- test_that("combining is symmetrical w.r.t. data types", { expect_incompatible_type(vec_c(labelled(character()), labelled())) expect_incompatible_type(vec_c(labelled(), labelled(character()))) expect_identical( vec_c(labelled(integer()), labelled()), vec_c(labelled(), labelled(integer())) ) expect_identical( vec_c(labelled(), double()), vec_c(double(), labelled()) ) expect_identical( vec_c(labelled(), integer()), vec_c(integer(), labelled()) ) }) test_that("can cast labelled to atomic vectors", { x_int <- labelled(1:2) x_dbl <- labelled(c(1, 2)) x_chr <- labelled(c("a", "b")) expect_identical(vec_cast(x_int, integer()), 1:2) expect_identical(vec_cast(x_int, double()), c(1, 2)) expect_error(vec_cast(x_int, character()), class = "vctrs_error_incompatible_type") expect_identical(vec_cast(x_dbl, integer()), 1:2) expect_identical(vec_cast(x_dbl, double()), c(1, 2)) expect_error(vec_cast(x_dbl, character()), class = "vctrs_error_incompatible_type") expect_error(vec_cast(x_chr, integer()), class = "vctrs_error_incompatible_type") expect_error(vec_cast(x_chr, double()), class = "vctrs_error_incompatible_type") expect_identical(vec_cast(x_chr, character()), c("a", "b")) }) test_that("can cast atomic vectors to labelled", { x_int <- labelled(1:2) x_dbl <- labelled(c(1, 2)) x_chr <- labelled(c("a", "b")) expect_identical(vec_cast(1:3, x_int), labelled(1:3)) expect_identical(vec_cast(1:3, x_dbl), labelled(c(1, 2, 3))) expect_error(vec_cast(1:3, x_chr), class = "vctrs_error_incompatible_type") expect_identical(vec_cast(c(0, 1), x_int), labelled(0:1)) expect_identical(vec_cast(c(0, 1), x_dbl), labelled(c(0, 1))) expect_error(vec_cast(c(0, 1), 
x_chr), class = "vctrs_error_incompatible_type") expect_error(vec_cast("a", x_int), class = "vctrs_error_incompatible_type") expect_error(vec_cast("a", x_dbl), class = "vctrs_error_incompatible_type") expect_identical(vec_cast("a", x_chr), labelled("a")) }) test_that("combining preserves label sets", { expect_equal( vec_c( labelled(1, labels = c(Good = 1, Bad = 5)), labelled(5, labels = c(Good = 1, Bad = 5)), ), labelled(c(1, 5), labels = c(Good = 1, Bad = 5)) ) }) test_that("can combine names", { x <- labelled(c(x = 1L)) expect_named(vec_c(x, x), c("x", "x")) expect_named(vec_c(x, c(y = 1L)), c("x", "y")) }) test_that("take labels from LHS", { expect_equal( vec_c( labelled(1, labels = c(Good = 1, Bad = 5)), labelled(5, labels = c(Bad = 1, Good = 5)), ), labelled(c(1, 5), labels = c(Good = 1, Bad = 5)) ) expect_equal( vec_c( labelled(1, labels = c(Good = 1)), labelled(5, labels = c(Bad = 1)), ), labelled(c(1, 5), labels = c(Good = 1)) ) }) test_that("combining picks label from the left", { expect_equal( attr(vec_c( labelled(label = "left"), labelled(label = "right"), ), "label", exact = TRUE), "left" ) }) test_that("combining with bare vectors results in a labelled()", { expect_identical(vec_c(labelled(), 1.1), labelled(1.1)) expect_identical(vec_c(labelled(integer()), 1.1), labelled(1.1)) expect_equal( vec_c(labelled(labels = c(Good = 1, Bad = 5)), 1, 3, 5), labelled(vec_c(1, 3, 5), labels = c(Good = 1, Bad = 5)) ) }) test_that("casting to labelled throws lossy cast if not safe", { expect_incompatible_type(vec_cast("a", labelled())) expect_incompatible_type(vec_cast("a", labelled(integer()))) expect_error(vec_cast(1.1, labelled(integer())), class = "vctrs_error_cast_lossy") }) test_that("casting to a superset of labels works", { expect_equal( vec_cast( labelled(c(1, 5), c(Good = 1)), labelled(labels = c(Good = 1, Bad = 5)) ), labelled(c(1, 5), labels = c(Good = 1, Bad = 5)) ) }) test_that("casting to a subset of labels works iff labels were unused", { 
expect_equal( vec_cast( labelled(1, c(Good = 1, Bad = 5)), labelled(labels = c(Good = 1)) ), labelled(1, labels = c(Good = 1)) ) expect_lossy_cast(vec_cast( labelled(c(1, 5), c(Good = 1, Bad = 5)), labelled(labels = c(Good = 1)) )) }) test_that("casting away labels throws lossy cast", { expect_lossy_cast(vec_cast( labelled(1, c(Good = 1)), labelled(labels = c(Bad = 5)) )) }) test_that("casting away tagged na values throws lossy cast", { expect_lossy_cast(vec_cast( labelled(tagged_na("a")), labelled(integer()) )) expect_incompatible_type(vec_cast( labelled(tagged_na("a")), labelled(character()) )) }) test_that("won't cast labelled numeric to character", { expect_incompatible_type(vec_cast(labelled(), character())) expect_incompatible_type(vec_cast(labelled(integer()), character())) }) # methods ----------------------------------------------------------------- test_that("printed output is stable", { x <- labelled( c(1:5, NA, tagged_na("x", "y", "z")), c( Good = 1, Bad = 5, "Not Applicable" = tagged_na("x"), "Refused to answer" = tagged_na("y") ) ) expect_snapshot(x) }) test_that("given correct name in data frame", { x <- labelled(1:3, c(a = 1)) expect_named(data.frame(x), "x") expect_named(data.frame(y = x), "y") }) test_that("can convert to factor with using labels with labelled na's", { x <- labelled(c(1:2, tagged_na("a")), c(a = 1, c = tagged_na("a"))) expect_equal(as_factor(x, "labels"), factor(c("a", NA, "c"))) }) haven/tests/testthat/test-labelled-pillar.R0000644000176200001440000000211614034330341020475 0ustar liggesuserstest_that("nice display in tibbles", { expect_snapshot({ int <- labelled(1:5, labels = c(good = 1L, bad = 5L)) dbl <- labelled(1:5 / 10, labels = c(good = 0.1, bad = 0.5)) chr <- labelled(letters[1:5], labels = c(good = "a", bad = "e")) tibble(int, dbl, chr) }) }) test_that("pillar", { expect_snapshot({ x <- labelled(1:11, c(Good = 1, Bad = 8)) tibble::tibble(x) }) expect_snapshot({ x <- labelled( c(rep(c(1.22352, 1000, -345), each = 3), 35, 
35), c(One = 1.22352, Two = 35, Threeeee = 1000) ) tibble::tibble(x) }) expect_snapshot({ x <- labelled( c(rep("A", 3), rep("B", 3), rep("XXXXXX", 4), NA), c(Apple = "A", Banana = "B", Mystery = "XXXXXX") ) tibble::tibble(x) }) expect_snapshot({ x <- labelled( c(1:8, tagged_na("a"), tagged_na("b"), NA), c(Good = 1, "Bad" = 8, Refused = tagged_na("b")) ) tibble::tibble(x) }) expect_snapshot({ x <- labelled_spss( c(1:10, NA), c(Good = 1, Bad = 8, Refused = 10), c(9, 10) ) tibble::tibble(x) }) }) haven/tests/testthat/helpers-types.R0000644000176200001440000000031114033646021017276 0ustar liggesusersexpect_lossy_cast <- function(...) { expect_error(class = "vctrs_error_cast_lossy", ...) } expect_incompatible_type <- function(...) { expect_error(class = "vctrs_error_incompatible_type", ...) } haven/tests/testthat/test-as_factor.R0000644000176200001440000000736514034330420017422 0ustar liggesusers# Base types -------------------------------------------------------------- test_that("variable label is kept when converting characters to factors (#178)", { s1 <- structure(letters, "label" = "letters") expect_identical(attr(as_factor(s1), "label"), "letters") }) # Labelled values --------------------------------------------------------- test_that("all labels (implicit missing values) are preserved when levels is 'default' or 'both' (#172)", { s1 <- labelled(rep(1, 3), c("A" = 1, "B" = 2, "C" = 3)) exp <- factor(rep("A", 3), levels = c("A", "B", "C")) expect_equal(as_factor(s1), exp) exp <- factor(rep("[1] A", 3), levels = c("[1] A", "[2] B", "[3] C")) expect_equal(as_factor(s1, levels = "both"), exp) }) test_that("all labels (existing and missing) are sorted by values (#172)", { s1 <- labelled(c(1, 4), c("Agree" = 1, "Neutral" = 2, "Disagree" = 3, "Don't know" = 5)) exp <- factor(c("Agree", "4"), levels = c("Agree", "Neutral", "Disagree", "4", "Don't know")) expect_equal(as_factor(s1), exp) }) test_that("all values are preserved", { s1 <- labelled(1:3, c("A" = 2)) exp <- 
factor(c("1", "A", "3"), levels = c("1", "A", "3")) expect_equal(as_factor(s1), exp) }) test_that("character labelled converts to factor", { s1 <- labelled(c("M", "M", "F"), c(Male = "M", Female = "F")) exp <- factor(c("Male", "Male", "Female"), levels = c("Female", "Male")) expect_equal(as_factor(s1), exp) }) test_that("converts tagged NAs", { s1 <- labelled(c(1:2, tagged_na("a", "b")), c("Apple" = tagged_na("a"))) exp <- factor(c("1", "2", "Apple", NA)) expect_equal(as_factor(s1), exp) }) # Both test_that("both combines values and levels", { s1 <- labelled(2:1, c("A" = 1)) exp <- factor(c("2", "[1] A"), levels = c("[1] A", "2")) expect_equal(as_factor(s1, "both"), exp) }) # Values test_that("values preserves order if possible", { s1 <- labelled(c("M", "M", "F"), c(Male = "M", Female = "F")) exp <- factor(c("M", "M", "F"), levels = c("M", "F")) expect_equal(as_factor(s1, "values"), exp) }) test_that("otherwise falls back to alphabetical", { s1 <- labelled(c("M", "M", "F", "G"), c(Male = "M", Female = "F")) exp <- factor(c("M", "M", "F", "G"), levels = c("F", "G", "M")) expect_equal(as_factor(s1, "values"), exp) }) # Labels test_that("labels preserves all label values", { var <- labelled(1L, c(female = 1L, male = 2L)) expect_equal(as_factor(var, "labels"), factor("female", levels = c("female", "male"))) }) test_that("order of labels doesn't matter", { var <- labelled(1L, c(female = 2L, male = 1L)) expect_equal(as_factor(var, "labels"), factor("male", levels = c("female", "male"))) }) test_that("as_factor labels works with non-unique labels", { s1 <- labelled(1:2, c("label" = 1, "label" = 2)) exp <- factor(c("label", "label"), levels = "label") expect_equal(as_factor(s1, "labels"), exp) }) # Variable label test_that("variable label is kept when converting labelled to factor (#178)", { s1 <- labelled(1:3, c("A" = 2)) attr(s1, "label") <- "labelled" expect_identical(attr(as_factor(s1), "label"), "labelled") }) # data frames 
------------------------------------------------------------- test_that("... passed along", { df <- data.frame(x = labelled(2:1, c("A" = 1))) out <- as_factor(df, "both") expect_equal(levels(out$x), c("[1] A", "2")) }) # replace_with ------------------------------------------------------------ test_that("updates numeric values", { x <- 1:5 expect_equal(replace_with(x, -1, 5), x) expect_equal(replace_with(x, 1, 5), c(5, 2:5)) expect_equal(replace_with(x, 5, 1), c(1:4, 1)) expect_equal(replace_with(x, 1:5, rep(1, 5)), rep(1, 5)) }) test_that("updates tagged NAs", { x <- c(tagged_na("a"), 1:3) expect_equal(replace_with(x, tagged_na("a"), 0), 0:3) }) haven/tests/testthat/test-zap_missing.R0000644000176200001440000000161714034334516020010 0ustar liggesuserstest_that("strips na tags", { x1 <- labelled(tagged_na("a", "b"), c(a = tagged_na("a"), b = 1)) x2 <- zap_missing(x1) expect_equal(na_tag(x2), c(NA_character_, NA)) expect_equal(attr(x2, "labels"), c(b = 1)) }) test_that("converts user-defined missings", { x1 <- labelled_spss(c(1, 2, 99), c(missing = 99), na_values = 99) x2 <- zap_missing(x1) expect_s3_class(x2, "haven_labelled") expect_equal(as.integer(x2), c(1, 2, NA)) x3 <- labelled_spss(1:10, na_values = c(2, 4), na_range = c(8, 10)) x4 <- zap_missing(x3) expect_s3_class(x4, "haven_labelled") expect_equal(as.integer(x4), c(1, NA, 3, NA, 5, 6, 7, NA, NA, NA)) }) test_that("converts data frame", { x1 <- labelled(tagged_na("a", "b"), c(a = tagged_na("a"), b = 1)) df1 <- tibble::tibble(x1 = x1, x2 = 2:1) df2 <- zap_missing(df1) expect_equal(na_tag(df2$x1), c(NA_character_, NA)) expect_equal(df1$x2, df2$x2) }) haven/tests/testthat/test-zap-empty.R0000644000176200001440000000017114033646044017406 0ustar liggesuserstest_that("empty strings replaced with missing", { x <- c("", "a", NA) expect_equal(zap_empty(x), c(NA, "a", NA)) }) haven/tests/testthat/test-haven-spss.R0000644000176200001440000002127714101006665017554 0ustar liggesusers# read_spss
--------------------------------------------------------------- test_that("variable label stored as attributes", { df <- read_spss(test_path("spss/variable-label.sav")) expect_equal(attr(df$sex, "label"), "Gender") }) test_that("value labels stored as labelled class", { num <- zap_formats(read_spss(test_path("spss/labelled-num.sav"))) str <- zap_formats(read_spss(test_path("spss/labelled-str.sav"))) expect_equal(num[[1]], labelled(1, c("This is one" = 1))) expect_equal(str[[1]], labelled(c("M", "F"), c(Female = "F", Male = "M"))) }) test_that("value labels read in as same type as vector", { df <- read_spss(test_path("spss/variable-label.sav")) num <- read_spss(test_path("spss/labelled-num.sav")) str <- read_spss(test_path("spss/labelled-str.sav")) expect_equal(typeof(df$sex), typeof(attr(df$sex, "labels"))) expect_equal(typeof(num[[1]]), typeof(attr(num[[1]], "labels"))) expect_equal(typeof(str[[1]]), typeof(attr(str[[1]], "labels"))) }) test_that("non-ASCII labels converted to utf-8", { x <- read_spss(test_path("spss/umlauts.sav"))[[1]] expect_equal(attr(x, "label"), "This is an \u00e4-umlaut") expect_equal(names(attr(x, "labels"))[1], "the \u00e4 umlaut") }) test_that("datetime variables converted to the correct class", { df <- read_spss(test_path("spss/datetime.sav")) expect_true(inherits(df$date, "Date")) expect_true(inherits(df$date.posix, "POSIXct")) expect_true(inherits(df$time, "hms")) }) test_that("datetime values correctly imported (offset)", { df <- read_spss(test_path("spss/datetime.sav")) expect_equal(df$date[1], as.Date("2014-09-22")) expect_equal(df$date.posix[2], as.POSIXct("2014-09-23 15:59:20", tz = "UTC")) expect_equal(as.integer(df$time[1]), 43870) }) test_that("formats roundtrip", { df <- tibble::tibble( a = structure(c(1, 1, 2), format.spss = "F1.0"), b = structure(4:6, format.spss = "F2.1"), c = structure(7:9, format.spss = "N2"), d = structure(c("Text", "Text", ""), format.spss = "A100") ) tmp <- tempfile() on.exit(unlink(tmp))
write_sav(df, tmp) df2 <- read_sav(tmp) expect_equal(df$a, df2$a) expect_equal(df$b, df2$b) expect_equal(df$c, df2$c) expect_equal(df$d, df2$d) }) test_that("widths roundtrip", { df <- tibble::tibble( a = structure(c(1, 1, 2), display_width = 10), b = structure(4:6, display_width = 11), c = structure(7:9, display_width = 12), d = structure(c("Text", "Text", ""), display_width = 10) ) tmp <- tempfile() on.exit(unlink(tmp)) write_sav(df, tmp) df2 <- read_sav(tmp) expect_equal(df$a, zap_formats(df2$a)) expect_equal(df$b, zap_formats(df2$b)) expect_equal(df$c, zap_formats(df2$c)) expect_equal(df$d, zap_formats(df2$d)) }) test_that("only selected columns are read", { out <- read_spss(test_path("spss/datetime.sav"), col_select = "date") expect_named(out, "date") }) # Row skipping/limiting -------------------------------------------------------- test_that("using skip returns correct number of rows", { rows_after_skipping <- function(n) { nrow(read_spss(test_path("spss/datetime.sav"), skip = n)) } n <- rows_after_skipping(0) expect_equal(rows_after_skipping(1), n - 1) expect_equal(rows_after_skipping(n - 1), 1) expect_equal(rows_after_skipping(n + 0), 0) expect_equal(rows_after_skipping(n + 1), 0) }) test_that("can limit the number of rows to read", { rows_with_limit <- function(n) { nrow(read_spss(test_path("spss/datetime.sav"), n_max = n)) } n <- rows_with_limit(Inf) expect_equal(rows_with_limit(0), 0) expect_equal(rows_with_limit(1), 1) expect_equal(rows_with_limit(n), n) expect_equal(rows_with_limit(n + 1), n) # alternatives for unlimited rows expect_equal(rows_with_limit(NA), n) expect_equal(rows_with_limit(-1), n) }) # User-defined missings --------------------------------------------------- test_that("user-defined missing values read as missing by default", { num <- read_spss(test_path("spss/labelled-num-na.sav"))[[1]] expect_equal(vec_data(num)[[2]], NA_real_) }) test_that("user-defined missing values can be preserved", { num <- 
read_spss(test_path("spss/labelled-num-na.sav"), user_na = TRUE)[[1]] expect_s3_class(num, "haven_labelled_spss") expect_equal(vec_data(num)[[2]], 9) expect_equal(attr(num, "na_values"), 9) expect_equal(attr(num, "na_range"), NULL) num }) test_that("system missings read as NA", { df <- tibble::tibble(x = c(1, NA)) out <- roundtrip_sav(df) expect_identical(df$x, c(1, NA)) }) # write_sav --------------------------------------------------------------- test_that("can roundtrip basic types", { x <- runif(10) expect_equal(roundtrip_var(x, "sav"), x) expect_equal(roundtrip_var(1:10, "sav"), 1:10) expect_equal(roundtrip_var(c(TRUE, FALSE), "sav"), c(1, 0)) expect_equal(roundtrip_var(letters, "sav"), letters) }) test_that("can roundtrip missing values (as much as possible)", { expect_equal(roundtrip_var(NA, "sav"), NA_integer_) expect_equal(roundtrip_var(NA_real_, "sav"), NA_real_) expect_equal(roundtrip_var(NA_integer_, "sav"), NA_integer_) expect_equal(roundtrip_var(NA_character_, "sav"), "") }) test_that("can roundtrip date times", { x1 <- c(as.Date("2010-01-01"), NA) x2 <- as.POSIXct(x1) attr(x2, "tzone") <- "UTC" expect_equal(roundtrip_var(x1, "sav"), x1) expect_equal(roundtrip_var(x2, "sav"), x2) }) test_that("can roundtrip times", { x <- hms::hms(c(1, NA, 86400)) expect_equal(roundtrip_var(x, "sav"), x) }) test_that("infinity gets converted to NA", { expect_equal(roundtrip_var(c(Inf, 0, -Inf), "sav"), c(NA, 0, NA)) }) test_that("factors become labelleds", { f <- factor(c("a", "b"), levels = letters[1:3]) rt <- roundtrip_var(f, "sav") expect_s3_class(rt, "haven_labelled") expect_equal(as.vector(rt), 1:2) expect_equal(attr(rt, "labels"), c(a = 1, b = 2, c = 3)) }) test_that("labels are preserved", { x <- 1:10 attr(x, "label") <- "abc" expect_equal(attr(roundtrip_var(x, "sav"), "label"), "abc") }) test_that("labelleds are round tripped", { int <- labelled(c(1L, 2L), c(a = 1L, b = 3L)) num <- labelled(c(1, 2), c(a = 1, b = 3)) chr <- labelled(c("a", "b"), c(a = "b", b = 
"a")) expect_equal(roundtrip_var(num, "sav"), num) expect_equal(roundtrip_var(chr, "sav"), chr) }) test_that("spss labelleds are round tripped", { df <- tibble( x = labelled_spss( c(1, 2, 1, 9), labels = c(no = 1, yes = 2, unknown = 9), na_values = 9, na_range = c(80, 90) ) ) path <- tempfile() write_sav(df, path) df2 <- read_sav(path) expect_s3_class(df2$x, "haven_labelled") expect_equal(as.double(df2$x), c(1, 2, 1, NA)) df3 <- read_sav(path, user_na = TRUE) expect_s3_class(df3$x, "haven_labelled_spss") expect_equal(attr(df3$x, "na_values"), attr(df$x, "na_values")) expect_equal(attr(df3$x, "na_range"), attr(df$x, "na_range")) }) test_that("spss string labelleds are round tripped", { df <- tibble( x = labelled_spss( c("1", "2", "3", "99"), labels = c(one = "1"), na_values = "99", na_range = c("2", "3") ) ) path <- tempfile() write_sav(df, path) df2 <- read_sav(path) expect_s3_class(df2$x, "haven_labelled") expect_equal(as.character(df2$x), c("1", NA, NA, NA)) df3 <- read_sav(path, user_na = TRUE) expect_s3_class(df3$x, "haven_labelled_spss") expect_equal(attr(df3$x, "na_values"), attr(df$x, "na_values")) expect_equal(attr(df3$x, "na_range"), attr(df$x, "na_range")) }) test_that("factors become labelleds", { f <- factor(c("a", "b"), levels = letters[1:3]) rt <- roundtrip_var(f, "sav") expect_s3_class(rt, "haven_labelled") expect_equal(as.vector(rt), 1:2) expect_equal(attr(rt, "labels"), c(a = 1, b = 2, c = 3)) }) test_that("labels are converted to utf-8", { labels_utf8 <- c("\u00e9\u00e8", "\u00e0", "\u00ef") labels_latin1 <- iconv(labels_utf8, "utf-8", "latin1") v_utf8 <- labelled(3:1, setNames(1:3, labels_utf8)) v_latin1 <- labelled(3:1, setNames(1:3, labels_latin1)) expect_equal(names(attr(roundtrip_var(v_utf8, "sav"), "labels")), labels_utf8) expect_equal(names(attr(roundtrip_var(v_latin1, "sav"), "labels")), labels_utf8) }) test_that("complain about long factor labels", { expect_snapshot(error = TRUE, { x <- paste(rep("a", 200), collapse = "") df <- 
data.frame(x = factor(x)) write_sav(df, tempfile()) }) }) # max_level_lengths ------------------------------------------------------- test_that("works with NA levels", { x <- factor(c("a", "abc", NA), exclude = NULL) expect_equal(max_level_length(x), 3) }) test_that("works with empty factors", { x <- factor(character(), levels = character()) expect_equal(max_level_length(x), 0) x <- factor(character(), levels = c(NA_character_)) expect_equal(max_level_length(x), 0) }) haven/tests/testthat/test-zap_labels.R0000644000176200001440000000126014033646044017574 0ustar liggesuserstest_that("zap_labels strips labelled attributes", { var <- labelled(c(1L, 98L, 99L), c(not_answered = 98L, not_applicable = 99L)) exp <- c(1L, 98L, 99L) expect_equal(zap_labels(var), exp) }) test_that("zap_labels returns variables not of class('labelled') unmodified", { var <- c(1L, 98L, 99L) expect_equal(zap_labels(var), var) }) test_that("zap_labels is applied to every column in data frame", { df <- tibble::tibble(x = 1:10, y = labelled(10:1, c("good" = 1))) expect_equal(zap_labels(df)$y, 10:1) }) test_that("replaces user-defined missings for spss", { x <- labelled_spss(1:5, c(a = 1), na_values = c(2, 4)) expect_equal(zap_labels(x), c(1, NA, 3, NA, 5)) }) haven/tests/testthat/sas/0000755000176200001440000000000014034332133015137 5ustar liggesusershaven/tests/testthat/sas/formats.sas7bcat0000644000176200001440000004200014033646021020242 0ustar liggesuserscϽ 1""332""3323#3SAS FILEFORMATS CATALOG x2AOMd3A9.0401M1X64_8PROx^^^xx2AxHSYSRESR PGBITMAP`FORMATC GENDER8FORMAT WORKSHOPxKXLCH(XLSRshy2A/Ld3A  XLSRLd3ALd3AxO0001000100060000 xF  XLSRVMd3AMd3AxO0010001000030000 xF  XLSRM$AfN$A  < IOM Cache Service XLSR 8N$AfN$A   < Base A. M. for Catalog's XLSR M$AfN$A   < Clipboard Access Method XLSR 8N$AfN$A   < Communication Ports XLSR M$AfN$A   < Base A.M. for URL XLSR 8N$AfN$A   < Dynamic Data Exchange XLSR8N$AfN$A  < Base A. M. 
for Disk files XLSR 8N$AfN$A   < Drive Map access method XLSR 8N$AfN$A   < Base A. M. for Dummy files XLSR 8N$AfN$A   < Base A. M. for EMAIL XLSR 8N$AfN$A   <  FTP A. M. xjx`Kx$GENDER Ld3A ( @ f  m  Female MalexWORKSHOPVMd3A(-q=`0ѿ0hRSAS  haven/tests/testthat/sas/hadley.sas7bdat0000644000176200001440000040000014033646021020034 0ustar liggesusers`Ͻ 1""332""3323#3>SAS FILEHADLEY DATA h%3Ah%3A9.0401M1X64_8PROh%$J$J$JCkh%3ABk  0LHh <44p4<4444????@?f @@@?@?f @?@@@@f @@@?@ @?@@@@m @@@@@@m @?@@@@m @@@@@@m ~  ~ ~1$~1$$p L @00( \0 (@$(8Hl DATASTEPidworkshopWORKSHOPgender$GENDERq1The instructor was well preparedq2The instructor communicated wellq3The course material was helpfulq4Overall, I found the workhsop useful 0"8(nBk $}haven/tests/testthat/sas/hadley.zip0000644000176200001440000000226314033646021017137 0ustar liggesusersPK)UF{ hadley.sas7bdatUT TPWux kUIj"M  jD&HIM4hJPAАlM&'+zՃJA^HBv, ZNիZ ó!t3c#coDmtdz$ɚ>_GXҕwb}!{W^z譓gMKg/n]Ο~8H.g7-cqCVygZV֭HxUOxϧygJ_ a(]\O\͝<]>{&m=f}]whk:ֹRdQC_Oc]}l~gߥަ=qOSߑ]q}vx0 7CQ[ Ł,poU*{6d_нK{VC} ^zCǾ{mW*p$?#P?ˡUlWZWBh z7si ՘%t{syˏLQ_h L|)4܎׺GMFGG.̅pa>乗'ϾS\+8ع6|1ZX^(on 3хb3ډ;jfW6fg6sZfy- 3bRX abyTz2:_\6+z1)ݜ;{lW;|YεsmIC\ӹt.ydz.s7w{nIJGCӝ -MjV{^jI9uy;jIiJ4vܵ{7g'Y͏qxL_{wYP~/PK)UF{ hadley.sas7bdatUTTux PKUHhaven/tests/testthat/sas/tagged-na.sas7bdat0000644000176200001440000040000014033646021020415 0ustar liggesusers`Ͻ 1""332""3323#3>SAS FILEAPPLE DATA *ŞA*ŞA 9.0401M2X64_8PRO*EEEh*ŞAh  08p <4<?@@@@($,> DATASTEPxXFMT,  0"h  haven/tests/testthat/sas/tagged-na.sas7bcat0000644000176200001440000004200014033646021020416 0ustar liggesuserscϽ 1""332""3323#3SAS FILEXFMT CATALOG }?m/A}?m/A 9.0401M2X64_8PRO}?m/1qJ1qJ1qJж~}?m/AѶ~|SYSRESR PGBITMAP`FORMAT XFMT8Ҷ~`XLCH (XLSR$n/A$n/A  XLSR?5n/A?5n/AxO0010001000050000 F  XLSR8N$AfN$A  < Base A. M. for external files XLSRM$AfN$A  < IOM Cache Service XLSR 8N$AfN$A   < Base A. M. for Catalog's XLSR M$AfN$A   < Clipboard Access Method XLSR 8N$AfN$A   < Communication Ports XLSR M$AfN$A   < Base A.M. 
haven/tests/testthat/sas/datetime.sas7bdat0000644000176200001440000001200014033646021020361 0ustar liggesusers
haven/tests/testthat/test-zap_label.R0000644000176200001440000000212214033646044017407 0ustar liggesusers
test_that("zap_label strips label but doesn't change other attributes", {
  var <- labelled(c(1L, 98L, 99L), c(not_answered = 98L, not_applicable = 99L), label = "foo")
  var_nolabel <- labelled(c(1L, 98L, 99L), c(not_answered = 98L, not_applicable = 99L))
  expect_identical(zap_label(var), var_nolabel)
})

test_that("zap_label returns variables not of class('labelled') unmodified", {
  var <- c(1L, 98L, 99L)
  expect_equal(zap_label(var), var)
})

test_that("zap_label is correctly applied to every column in data frame", {
  y1_label <- labelled(10:1, c("good" = 1), label="foo")
  y1_nolabel <- labelled(10:1, c("good" = 1))
  y2_label <- labelled(1:10, c("bad" = 2), label="bar")
  y2_nolabel <- labelled(1:10, c("bad" = 2))
  df <- tibble::tibble(x = 1:10, y1=y1_label, y2=y2_label)
  df_zapped <- zap_label(df)
  expect_equal(ncol(df_zapped), ncol(df))
  expect_identical(df_zapped$y1, y1_nolabel)
  expect_identical(df_zapped$y2, y2_nolabel)
})

test_that("zap_label strips attribute from any vector", {
  x <- structure(1:5, label = "a")
  expect_equal(attr(zap_label(x), "label"), NULL)
})
haven/tests/testthat.R0000644000176200001440000000006614033646021014501 0ustar liggesusers
library(testthat)
library(haven)

test_check("haven")
haven/src/0000755000176200001440000000000014102332323012133 5ustar liggesusers
haven/src/haven_types.cpp0000644000176200001440000001131014033646021015166 0ustar liggesusers
#include <cmath>
#include "haven_types.h"
#include <cpp11/protect.hpp>

FileVendor extVendor(FileExt ext) {
  switch (ext) {
  case HAVEN_DTA:
    return HAVEN_STATA;
  case HAVEN_SAV:
  case HAVEN_POR:
    return HAVEN_SPSS;
  case HAVEN_SAS7BDAT:
  case HAVEN_SAS7BCAT:
  case HAVEN_XPT:
    return HAVEN_SAS;
  default:
    cpp11::stop("Unknown file extension");
  }
  /* not actually reached */
  return HAVEN_SAS;
}

std::string formatAttribute(FileVendor vendor) {
  switch (vendor) {
  case HAVEN_STATA: return "format.stata";
  case HAVEN_SPSS:  return "format.spss";
  case HAVEN_SAS:   return "format.sas";
  }
  return "";
}

bool hasPrefix(std::string x, std::string prefix) {
  return x.compare(0,
prefix.size(), prefix) == 0;
}

VarType numType(SEXP x) {
  if (Rf_inherits(x, "Date")) {
    return HAVEN_DATE;
  } else if (Rf_inherits(x, "POSIXct")) {
    return HAVEN_DATETIME;
  } else if (Rf_inherits(x, "hms")) {
    return HAVEN_TIME;
  } else {
    return HAVEN_DEFAULT;
  }
}

VarType numType(FileVendor vendor, const char* var_format) {
  if (var_format == NULL)
    return HAVEN_DEFAULT;

  std::string format(var_format);

  switch(vendor) {
  case HAVEN_SAS:
    // http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000589916.htm
    if (hasPrefix(format,"DATETIME")) return HAVEN_DATETIME;
    else if (hasPrefix(format,"IS8601DT")) return HAVEN_DATETIME;
    else if (hasPrefix(format,"E8601DT")) return HAVEN_DATETIME;
    else if (hasPrefix(format,"B8601DT")) return HAVEN_DATETIME;
    else if (hasPrefix(format,"IS8601DA")) return HAVEN_DATE;
    else if (hasPrefix(format,"E8601DA")) return HAVEN_DATE;
    else if (hasPrefix(format,"B8601DA")) return HAVEN_DATE;
    else if (hasPrefix(format,"WEEKDATE")) return HAVEN_DATE;
    else if (hasPrefix(format,"MMDDYY")) return HAVEN_DATE;
    else if (hasPrefix(format,"DDMMYY")) return HAVEN_DATE;
    else if (hasPrefix(format,"YYMMDD")) return HAVEN_DATE;
    else if (hasPrefix(format,"DATE")) return HAVEN_DATE;
    else if (hasPrefix(format,"TIME")) return HAVEN_TIME;
    else if (hasPrefix(format,"HHMM")) return HAVEN_TIME;
    else if (hasPrefix(format,"IS8601TM")) return HAVEN_TIME;
    else if (hasPrefix(format,"E8601TM")) return HAVEN_TIME;
    else if (hasPrefix(format,"B8601TM")) return HAVEN_TIME;
    else return HAVEN_DEFAULT;
  case HAVEN_SPSS:
    // http://www-01.ibm.com/support/knowledgecenter/?lang=en#!/SSLVMB_20.0.0/com.ibm.spss.statistics.help/syn_date_and_time_date_time_formats.htm
    if (hasPrefix(format, "DATETIME")) return HAVEN_DATETIME;
    else if (hasPrefix(format, "DATE")) return HAVEN_DATE;
    else if (hasPrefix(format, "ADATE")) return HAVEN_DATE;
    else if (hasPrefix(format, "EDATE")) return HAVEN_DATE;
    else if (hasPrefix(format, "JDATE")) return HAVEN_DATE;
    else if (hasPrefix(format, "SDATE")) return HAVEN_DATE;
    else if (hasPrefix(format, "TIME")) return HAVEN_TIME;
    else if (hasPrefix(format, "DTIME")) return HAVEN_TIME;
    else return HAVEN_DEFAULT;
  case HAVEN_STATA:
    if (hasPrefix(format, "%tC")) return HAVEN_DATETIME;
    else if (hasPrefix(format, "%tc")) return HAVEN_DATETIME;
    else if (hasPrefix(format, "%td")) return HAVEN_DATE;
    else if (hasPrefix(format, "%d")) return HAVEN_DATE;
    else return HAVEN_DEFAULT;
  }

  return HAVEN_DEFAULT;
}

// Value conversion -----------------------------------------------------------

// Days between each vendor's date origin and the R/Unix origin (1970-01-01)
int daysOffset(FileVendor vendor) {
  switch(vendor) {
  case HAVEN_SAS:   return 3653;   // 1960-01-01
  case HAVEN_STATA: return 3653;   // 1960-01-01
  case HAVEN_SPSS:  return 141428; // 1582-10-14
  }
  return 0;
}

double adjustDatetimeToR(FileVendor vendor, VarType var, double value) {
  if (std::isnan(value))
    return value;

  double offset = daysOffset(vendor);

  switch(var) {
  case HAVEN_DATETIME:
    if (vendor == HAVEN_STATA) // stored in milliseconds
      value /= 1000;
    return value - offset * 86400;
  case HAVEN_DATE:
    if (vendor == HAVEN_SPSS) // stored in seconds
      value /= 86400;
    return value - offset;
  default:
    return value;
  }
}

double adjustDatetimeFromR(FileVendor vendor, SEXP col, double value) {
  if (std::isnan(value))
    return value;

  double offset = daysOffset(vendor);

  switch(numType(col)) {
  case HAVEN_DATETIME:
    value += offset * 86400;
    if (vendor == HAVEN_STATA) // stored in milliseconds
      value *= 1000;
    return value;
  case HAVEN_DATE:
    value += offset;
    if (vendor == HAVEN_SPSS) // stored in seconds
      value *= 86400;
    return value;
  default:
    return value;
  }
}
haven/src/tagged_na.c0000644000176200001440000000635514033646021014227 0ustar liggesusers
#define R_NO_REMAP
#include <R.h>
#include <Rinternals.h>
#include <stdbool.h>

// Scalar operators -------------------------------------------------------

// IEEE 754 defines binary64 as
// * 1 bit : sign
// * 11 bits: exponent
// * 52 bits: significand
//
// R stores the value "1954" in the last 32 bits: this payload marks
// the value as an NA, not a regular NaN.
//
// (Note that this discussion, like most discussions of FP on the web, assumes
// a big-endian architecture; on a little-endian architecture the sign bit is
// the last bit.)

typedef union {
  double value;           // 8 bytes
  char byte[8];           // 8 * 1 bytes
} ieee_double;

#ifdef WORDS_BIGENDIAN
// First two bytes are sign & exponent
// Last four bytes are 1954
const int TAG_BYTE = 3;
#else
const int TAG_BYTE = 4;
#endif

double make_tagged_na(char x) {
  ieee_double y;

  y.value = NA_REAL;
  y.byte[TAG_BYTE] = x;

  return y.value;
}

char tagged_na_value(double x) {
  ieee_double y;
  y.value = x;

  return y.byte[TAG_BYTE];
}

char first_char(SEXP x) {
  if (TYPEOF(x) != CHARSXP)
    return '\0';

  if (x == NA_STRING)
    return '\0';

  return CHAR(x)[0];
}

// Vectorised wrappers -----------------------------------------------------

SEXP tagged_na_(SEXP x) {
  if (TYPEOF(x) != STRSXP)
    Rf_errorcall(R_NilValue, "`x` must be a character vector");

  int n = Rf_length(x);
  SEXP out = PROTECT(Rf_allocVector(REALSXP, n));

  for (int i = 0; i < n; ++i) {
    char xi = first_char(STRING_ELT(x, i));
    REAL(out)[i] = make_tagged_na(xi);
  }

  UNPROTECT(1);
  return out;
}

SEXP na_tag_(SEXP x) {
  if (TYPEOF(x) != REALSXP)
    Rf_errorcall(R_NilValue, "`x` must be a double vector");

  int n = Rf_length(x);
  SEXP out = PROTECT(Rf_allocVector(STRSXP, n));

  for (int i = 0; i < n; ++i) {
    double xi = REAL(x)[i];

    if (!isnan(xi)) {
      SET_STRING_ELT(out, i, NA_STRING);
    } else {
      char tag = tagged_na_value(xi);
      if (tag == '\0') {
        SET_STRING_ELT(out, i, NA_STRING);
      } else {
        SET_STRING_ELT(out, i, Rf_mkCharLenCE(&tag, 1, CE_UTF8));
      }
    }
  }

  UNPROTECT(1);
  return out;
}

SEXP falses(int n) {
  SEXP out = PROTECT(Rf_allocVector(LGLSXP, n));
  for (int i = 0; i < n; ++i)
    LOGICAL(out)[i] = 0;

  UNPROTECT(1);
  return out;
}

SEXP is_tagged_na_(SEXP x, SEXP tag_) {
  if (TYPEOF(x) != REALSXP) {
    return falses(Rf_length(x));
  }

  bool has_tag;
  char check_tag;
  if (TYPEOF(tag_) == NILSXP) {
    has_tag = false;
    check_tag = '\0';
  } else if (TYPEOF(tag_) == STRSXP) {
    if (Rf_length(tag_) != 1)
      Rf_errorcall(R_NilValue, "`tag` must be a character vector of length 1");
    has_tag = true;
    check_tag = first_char(STRING_ELT(tag_, 0));
  } else {
    Rf_errorcall(R_NilValue, "`tag` must be NULL or a character vector");
  }

  int n = Rf_length(x);
  SEXP out = PROTECT(Rf_allocVector(LGLSXP, n));

  for (int i = 0; i < n; ++i) {
    double xi = REAL(x)[i];

    if (!isnan(xi)) {
      LOGICAL(out)[i] = false;
    } else {
      char tag = tagged_na_value(xi);
      if (tag == '\0') {
        LOGICAL(out)[i] = false;
      } else {
        if (has_tag) {
          LOGICAL(out)[i] = tag == check_tag;
        } else {
          LOGICAL(out)[i] = true;
        }
      }
    }
  }

  UNPROTECT(1);
  return out;
}
haven/src/DfWriter.cpp0000644000176200001440000003243414035041246014401 0ustar liggesusers
#include "readstat.h"
#include "haven_types.h"
#include "tagged_na.h"

#include "cpp11/doubles.hpp"
#include "cpp11/strings.hpp"
#include "cpp11/integers.hpp"
#include "cpp11/sexp.hpp"
#include "cpp11/list.hpp"

ssize_t data_writer(const void *data, size_t len, void *ctx);

inline const char* string_utf8(SEXP x, int i) {
  return Rf_translateCharUTF8(STRING_ELT(x, i));
}

inline const bool string_is_missing(SEXP x, int i) {
  return STRING_ELT(x, i) == NA_STRING;
}

inline readstat_measure_e measureType(SEXP x) {
  if (Rf_inherits(x, "ordered")) {
    return READSTAT_MEASURE_ORDINAL;
  } else if (Rf_inherits(x, "factor")) {
    return READSTAT_MEASURE_NOMINAL;
  } else {
    switch(TYPEOF(x)) {
    case INTSXP:
    case REALSXP:
      return READSTAT_MEASURE_SCALE;
    case LGLSXP:
    case STRSXP:
      return READSTAT_MEASURE_NOMINAL;
    default:
      return READSTAT_MEASURE_UNKNOWN;
    }
  }
}

inline int displayWidth(cpp11::sexp x) {
  cpp11::sexp display_width_obj(x.attr("display_width"));
  switch(TYPEOF(display_width_obj)) {
  case INTSXP:
    return INTEGER(display_width_obj)[0];
  case REALSXP:
    return REAL(display_width_obj)[0];
  }
  return 0;
}

class Writer {
  FileExt ext_;
  FileVendor vendor_;

  cpp11::list x_;
  readstat_writer_t* writer_;
  FILE* pOut_;

public:
  Writer(FileExt ext, cpp11::list x, cpp11::strings pathEnc): ext_(ext), vendor_(extVendor(ext)), x_(x) {
std::string path(Rf_translateChar(pathEnc[0])); pOut_ = fopen(path.c_str(), "wb"); if (pOut_ == NULL) cpp11::stop("Failed to open '%s' for writing", path.c_str()); writer_ = readstat_writer_init(); checkStatus(readstat_set_data_writer(writer_, data_writer)); } ~Writer() { try { fclose(pOut_); readstat_writer_free(writer_); } catch (...) {}; } void setCompression(readstat_compress_t version) { readstat_writer_set_compression(writer_, version); } void setVersion(int version) { readstat_writer_set_file_format_version(writer_, version); } void setName(const std::string& name) { readstat_writer_set_table_name(writer_, name.c_str()); } void setFileLabel(cpp11::sexp label) { if (label == R_NilValue) return; readstat_writer_set_file_label(writer_, string_utf8(label, 0)); } void write() { int p = x_.size(); if (p == 0) return; int n = Rf_length(x_[0]); readstat_error_t status; switch(ext_) { case HAVEN_SAV: status = readstat_begin_writing_sav(writer_, this, n); break; case HAVEN_DTA: status = readstat_begin_writing_dta(writer_, this, n); break; case HAVEN_SAS7BDAT: status = readstat_begin_writing_sas7bdat(writer_, this, n); break; case HAVEN_XPT: status = readstat_begin_writing_xport(writer_, this, n); break; case HAVEN_POR: case HAVEN_SAS7BCAT: status = READSTAT_OK; // not used break; } if (status) { cpp11::stop("Failed to create file: %s.", readstat_error_message(status)); } status = readstat_validate_metadata(writer_); if (status) { cpp11::stop("Failed to write metadata: %s.", readstat_error_message(status)); } cpp11::strings names(x_.attr("names")); // Define variables for (int j = 0; j < p; ++j) { cpp11::sexp col = x_[j]; VarType type = numType(col); const char* name = string_utf8(names, j); const char* format = var_format(col, type); switch(TYPEOF(col)) { case LGLSXP: status = defineVariable(cpp11::integers(cpp11::safe[Rf_coerceVector](col, INTSXP)), name, format); break; case INTSXP: status = defineVariable(cpp11::integers(col), name, format); break; case REALSXP: 
status = defineVariable(cpp11::doubles(col), name, format); break; case STRSXP: status = defineVariable(cpp11::strings(col), name, format); break; default: cpp11::stop("Columns of type %s not supported yet", Rf_type2char(TYPEOF(col))); } if (status) { cpp11::stop("Failed to create column `%s`: %s.", name, readstat_error_message(status)); } } // Write data for (int i = 0; i < n; ++i) { checkStatus(readstat_begin_row(writer_)); for (int j = 0; j < p; ++j) { cpp11::sexp col(x_[j]); readstat_variable_t* var = readstat_get_variable(writer_, j); switch (TYPEOF(col)) { case LGLSXP: { int val = LOGICAL(col)[i]; status = insertValue(var, val, val == NA_LOGICAL); break; } case INTSXP: { int val = INTEGER(col)[i]; status = insertValue(var, (int) adjustDatetimeFromR(vendor_, col, val), val == NA_INTEGER); break; } case REALSXP: { double val = REAL(col)[i]; status = insertValue(var, adjustDatetimeFromR(vendor_, col, val), !R_finite(val)); break; } case STRSXP: { status = insertValue(var, string_utf8(col, i), string_is_missing(col, i)); break; } default: status = READSTAT_OK; break; } if (status) { cpp11::stop("Failed to insert value [%i, %i]: %s.", i + 1, j + 1, readstat_error_message(status)); } } checkStatus(readstat_end_row(writer_)); } checkStatus(readstat_end_writing(writer_)); } // Define variables ---------------------------------------------------------- const char* var_label(cpp11::sexp x) { cpp11::sexp label(x.attr("label")); if (label == R_NilValue) return NULL; return string_utf8(label, 0); } const char* var_format(cpp11::sexp x, VarType varType) { // Use attribute, if present cpp11::sexp format(x.attr(formatAttribute(vendor_).c_str())); if (format != R_NilValue) return string_utf8(format, 0); switch(varType) { case HAVEN_DEFAULT: return NULL; case HAVEN_DATETIME: switch(vendor_) { case HAVEN_SAS: return "DATETIME"; case HAVEN_SPSS: return "DATETIME"; case HAVEN_STATA: return "%tc"; } case HAVEN_DATE: switch(vendor_) { case HAVEN_SAS: return "DATE"; case HAVEN_SPSS: 
return "DATE"; case HAVEN_STATA: return "%td"; } case HAVEN_TIME: switch(vendor_) { case HAVEN_SAS: return "TIME"; case HAVEN_SPSS: return "TIME"; case HAVEN_STATA: return NULL; // Stata doesn't have a pure time type } } return NULL; } readstat_error_t defineVariable(cpp11::integers x, const char* name, const char* format = NULL) { readstat_label_set_t* labelSet = NULL; if (Rf_inherits(x, "factor")) { labelSet = readstat_add_label_set(writer_, READSTAT_TYPE_INT32, name); cpp11::strings levels(x.attr("levels")); for (int i = 0; i < levels.size(); ++i) readstat_label_int32_value(labelSet, i + 1, string_utf8(levels, i)); } else if (Rf_inherits(x, "haven_labelled") && TYPEOF(x.attr("labels")) != NILSXP) { labelSet = readstat_add_label_set(writer_, READSTAT_TYPE_INT32, name); cpp11::integers values(x.attr("labels")); cpp11::strings labels(values.attr("names")); for (int i = 0; i < values.size(); ++i) readstat_label_int32_value(labelSet, values[i], string_utf8(labels, i)); } readstat_variable_t* var = readstat_add_variable(writer_, name, READSTAT_TYPE_INT32, 0); readstat_variable_set_format(var, format); readstat_variable_set_label(var, var_label(x)); readstat_variable_set_label_set(var, labelSet); readstat_variable_set_measure(var, measureType(x)); readstat_variable_set_display_width(var, displayWidth(x)); return readstat_validate_variable(writer_, var); } readstat_error_t defineVariable(cpp11::doubles x, const char* name, const char* format = NULL) { readstat_label_set_t* labelSet = NULL; if (Rf_inherits(x, "haven_labelled") && TYPEOF(x.attr("labels")) != NILSXP) { labelSet = readstat_add_label_set(writer_, READSTAT_TYPE_DOUBLE, name); cpp11::doubles values(x.attr("labels")); cpp11::strings labels(values.attr("names")); for (int i = 0; i < values.size(); ++i) { char tag = tagged_na_value(values[i]); if (!std::isnan(values[i]) || tag == '\0') { readstat_label_double_value(labelSet, values[i], string_utf8(labels, i)); } else { readstat_label_tagged_value(labelSet, tag, 
string_utf8(labels, i)); } } } readstat_variable_t* var = readstat_add_variable(writer_, name, READSTAT_TYPE_DOUBLE, 0); readstat_variable_set_format(var, format); readstat_variable_set_label(var, var_label(x)); readstat_variable_set_label_set(var, labelSet); readstat_variable_set_measure(var, measureType(x)); readstat_variable_set_display_width(var, displayWidth(x)); if (Rf_inherits(x, "haven_labelled_spss")) { SEXP na_range = x.attr("na_range"); if (TYPEOF(na_range) == REALSXP && Rf_length(na_range) == 2) { readstat_variable_add_missing_double_range(var, REAL(na_range)[0], REAL(na_range)[1]); } SEXP na_values = x.attr("na_values"); if (TYPEOF(na_values) == REALSXP) { int n = Rf_length(na_values); for (int i = 0; i < n; ++i) { readstat_variable_add_missing_double_value(var, REAL(na_values)[i]); } } } return readstat_validate_variable(writer_, var); } readstat_error_t defineVariable(cpp11::strings x, const char* name, const char* format = NULL) { readstat_label_set_t* labelSet = NULL; if (Rf_inherits(x, "haven_labelled") && TYPEOF(x.attr("labels")) != NILSXP) { labelSet = readstat_add_label_set(writer_, READSTAT_TYPE_STRING, name); cpp11::strings values(x.attr("labels")); cpp11::strings labels(values.attr("names")); for (int i = 0; i < values.size(); ++i) readstat_label_string_value(labelSet, string_utf8(values, i), string_utf8(labels, i)); } int max_length = 0; for (int i = 0; i < x.size(); ++i) { int length = strlen(string_utf8(x, i)); if (length > max_length) max_length = length; } readstat_variable_t* var = readstat_add_variable(writer_, name, READSTAT_TYPE_STRING, max_length); readstat_variable_set_format(var, format); readstat_variable_set_label(var, var_label(x)); readstat_variable_set_label_set(var, labelSet); readstat_variable_set_measure(var, measureType(x)); readstat_variable_set_display_width(var, displayWidth(x)); if (Rf_inherits(x, "haven_labelled_spss")) { SEXP na_range = x.attr("na_range"); if (Rf_length(na_range) == 2) { if (TYPEOF(na_range) == 
STRSXP) { readstat_variable_add_missing_string_range(var, R_CHAR(STRING_ELT(na_range, 0)), R_CHAR(STRING_ELT(na_range, 1)) ); } } SEXP na_values = x.attr("na_values"); int n = Rf_length(na_values); if (TYPEOF(na_values) == STRSXP) { for (int i = 0; i < n; ++i) { readstat_variable_add_missing_string_value(var, R_CHAR(STRING_ELT(na_values, i))); } } } return readstat_validate_variable(writer_, var); } // Value helper ------------------------------------------------------------- readstat_error_t insertValue(readstat_variable_t* var, int val, bool is_missing) { if (is_missing) { return readstat_insert_missing_value(writer_, var); } else { return readstat_insert_int32_value(writer_, var, val); } } readstat_error_t insertValue(readstat_variable_t* var, double val, bool is_missing) { if (is_missing) { char tag = tagged_na_value(val); if (tag == '\0') { return readstat_insert_missing_value(writer_, var); } else { return readstat_insert_tagged_missing_value(writer_, var, tag); } } else { return readstat_insert_double_value(writer_, var, val); } } readstat_error_t insertValue(readstat_variable_t* var, const char* val, bool is_missing) { if (is_missing) { return readstat_insert_missing_value(writer_, var); } else { return readstat_insert_string_value(writer_, var, val); } } // Misc ---------------------------------------------------------------------- void checkStatus(readstat_error_t err) { if (err == 0) return; cpp11::stop("Writing failure: %s.", readstat_error_message(err)); } ssize_t write(const void *data, size_t len) { return fwrite(data, sizeof(char), len, pOut_); } }; ssize_t data_writer(const void *data, size_t len, void *ctx) { return ((Writer*) ctx)->write(data, len); } [[cpp11::register]] void write_sav_(cpp11::list data, cpp11::strings path, bool compress) { Writer writer(HAVEN_SAV, data, path); if (compress) writer.setCompression(READSTAT_COMPRESS_BINARY); else writer.setCompression(READSTAT_COMPRESS_ROWS); writer.write(); } [[cpp11::register]] void 
write_dta_(cpp11::list data, cpp11::strings path, int version, cpp11::sexp label) { Writer writer(HAVEN_DTA, data, path); writer.setVersion(version); writer.setFileLabel(label); writer.write(); } [[cpp11::register]] void write_sas_(cpp11::list data, cpp11::strings path) { Writer(HAVEN_SAS7BDAT, data, path).write(); } [[cpp11::register]] void write_xpt_(cpp11::list data, cpp11::strings path, int version, std::string name) { Writer writer(HAVEN_XPT, data, path); writer.setVersion(version); writer.setName(name); writer.write(); } haven/src/readstat/0000755000176200001440000000000014102332323013742 5ustar liggesusershaven/src/readstat/CKHashTable.h0000644000176200001440000000246014101007206016164 0ustar liggesusers// CKHashTable - A simple hash table // Copyright 2010-2020 Evan Miller (see LICENSE) #include #include typedef struct ck_hash_entry_s { off_t key_offset; size_t key_length; const void *value; } ck_hash_entry_t; typedef struct ck_hash_table_s { size_t capacity; size_t count; ck_hash_entry_t *entries; char *keys; size_t keys_used; size_t keys_capacity; } ck_hash_table_t; int ck_str_hash_insert(const char *key, const void *value, ck_hash_table_t *table); const void *ck_str_hash_lookup(const char *key, ck_hash_table_t *table); int ck_str_n_hash_insert(const char *key, size_t keylen, const void *value, ck_hash_table_t *table); const void *ck_str_n_hash_lookup(const char *key, size_t keylen, ck_hash_table_t *table); int ck_float_hash_insert(float key, const void *value, ck_hash_table_t *table); const void *ck_float_hash_lookup(float key, ck_hash_table_t *table); int ck_double_hash_insert(double key, const void *value, ck_hash_table_t *table); const void *ck_double_hash_lookup(double key, ck_hash_table_t *table); ck_hash_table_t *ck_hash_table_init(size_t num_entries, size_t mean_key_length); void ck_hash_table_wipe(ck_hash_table_t *table); int ck_hash_table_grow(ck_hash_table_t *table); void ck_hash_table_free(ck_hash_table_t *table); 
haven/src/readstat/readstat_convert.h0000644000176200001440000000016314101007206017460 0ustar liggesusers

readstat_error_t readstat_convert(char *dst, size_t dst_len, const char *src, size_t src_len, iconv_t converter);
haven/src/readstat/readstat_value.c0000644000176200001440000001270314101765776017141 0ustar liggesusers

#include "readstat.h"

readstat_type_class_t readstat_type_class(readstat_type_t type) {
    if (type == READSTAT_TYPE_STRING || type == READSTAT_TYPE_STRING_REF)
        return READSTAT_TYPE_CLASS_STRING;

    return READSTAT_TYPE_CLASS_NUMERIC;
}

readstat_type_t readstat_value_type(readstat_value_t value) {
    return value.type;
}

readstat_type_class_t readstat_value_type_class(readstat_value_t value) {
    return readstat_type_class(value.type);
}

char readstat_value_tag(readstat_value_t value) {
    return value.tag;
}

int readstat_value_is_missing(readstat_value_t value, readstat_variable_t *variable) {
    if (value.is_system_missing || value.is_tagged_missing)
        return 1;

    if (variable)
        return readstat_value_is_defined_missing(value, variable);

    return 0;
}

int readstat_value_is_system_missing(readstat_value_t value) {
    return (value.is_system_missing);
}

int readstat_value_is_tagged_missing(readstat_value_t value) {
    return (value.is_tagged_missing);
}

static int readstat_double_is_defined_missing(double fp_value, readstat_variable_t *variable) {
    int count = readstat_variable_get_missing_ranges_count(variable);
    int i;
    for (i=0; i<count; i++) {
        /* Each missing range is a [lo, hi] pair of values stored on the variable */
        double lo = readstat_double_value(readstat_variable_get_missing_range_lo(variable, i));
        double hi = readstat_double_value(readstat_variable_get_missing_range_hi(variable, i));
        if (fp_value >= lo && fp_value <= hi) {
            return 1;
        }
    }
    return 0;
}

static int readstat_string_is_defined_missing(const char *string, readstat_variable_t *variable) {
    if (string == NULL)
        return 0;

    int count = readstat_variable_get_missing_ranges_count(variable);
    int i;
    for (i=0; i<count; i++) {
        /* String ranges compare lexicographically against [lo, hi] */
        const char *lo = readstat_string_value(readstat_variable_get_missing_range_lo(variable, i));
        const char *hi = readstat_string_value(readstat_variable_get_missing_range_hi(variable, i));
        if (lo && hi && strcmp(string, lo) >= 0 && strcmp(string, hi) <= 0) {
            return 1;
        }
    }
    return 0;
}

int readstat_value_is_defined_missing(readstat_value_t value, readstat_variable_t *variable) {
    if (readstat_value_type_class(value) != readstat_variable_get_type_class(variable))
        return 0;

    if (readstat_value_type_class(value) ==
READSTAT_TYPE_CLASS_STRING) return readstat_string_is_defined_missing(readstat_string_value(value), variable); if (readstat_value_type_class(value) == READSTAT_TYPE_CLASS_NUMERIC) return readstat_double_is_defined_missing(readstat_double_value(value), variable); return 0; } char readstat_int8_value(readstat_value_t value) { if (readstat_value_is_system_missing(value)) return 0; if (value.type == READSTAT_TYPE_DOUBLE) return (char)value.v.double_value; if (value.type == READSTAT_TYPE_FLOAT) return (char)value.v.float_value; if (value.type == READSTAT_TYPE_INT32) return (char)value.v.i32_value; if (value.type == READSTAT_TYPE_INT16) return (char)value.v.i16_value; if (value.type == READSTAT_TYPE_INT8) return value.v.i8_value; return 0; } int16_t readstat_int16_value(readstat_value_t value) { if (readstat_value_is_system_missing(value)) return 0; if (value.type == READSTAT_TYPE_DOUBLE) return (int16_t)value.v.double_value; if (value.type == READSTAT_TYPE_FLOAT) return (int16_t)value.v.float_value; if (value.type == READSTAT_TYPE_INT32) return (int16_t)value.v.i32_value; if (value.type == READSTAT_TYPE_INT16) return value.v.i16_value; if (value.type == READSTAT_TYPE_INT8) return value.v.i8_value; return 0; } int32_t readstat_int32_value(readstat_value_t value) { if (readstat_value_is_system_missing(value)) return 0; if (value.type == READSTAT_TYPE_DOUBLE) return (int32_t)value.v.double_value; if (value.type == READSTAT_TYPE_FLOAT) return (int32_t)value.v.float_value; if (value.type == READSTAT_TYPE_INT32) return value.v.i32_value; if (value.type == READSTAT_TYPE_INT16) return value.v.i16_value; if (value.type == READSTAT_TYPE_INT8) return value.v.i8_value; return 0; } float readstat_float_value(readstat_value_t value) { if (readstat_value_is_system_missing(value)) return NAN; if (value.type == READSTAT_TYPE_DOUBLE) return (float)value.v.double_value; if (value.type == READSTAT_TYPE_FLOAT) return value.v.float_value; if (value.type == READSTAT_TYPE_INT32) return 
value.v.i32_value; if (value.type == READSTAT_TYPE_INT16) return value.v.i16_value; if (value.type == READSTAT_TYPE_INT8) return value.v.i8_value; return value.v.float_value; } double readstat_double_value(readstat_value_t value) { if (readstat_value_is_system_missing(value)) return NAN; if (value.type == READSTAT_TYPE_DOUBLE) return value.v.double_value; if (value.type == READSTAT_TYPE_FLOAT) return value.v.float_value; if (value.type == READSTAT_TYPE_INT32) return value.v.i32_value; if (value.type == READSTAT_TYPE_INT16) return value.v.i16_value; if (value.type == READSTAT_TYPE_INT8) return value.v.i8_value; return NAN; } const char *readstat_string_value(readstat_value_t value) { if (readstat_value_type(value) == READSTAT_TYPE_STRING) return value.v.string_value; return NULL; } haven/src/readstat/readstat_malloc.h0000644000176200001440000000020614101007206017245 0ustar liggesusers void *readstat_malloc(size_t size); void *readstat_calloc(size_t count, size_t size); void *readstat_realloc(void *ptr, size_t len); haven/src/readstat/LICENSE0000644000176200001440000000210314101007206014741 0ustar liggesusersCopyright (c) 2013-2016 Evan Miller (except where otherwise noted) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. haven/src/readstat/readstat_bits.c0000644000176200001440000000316514101007206016741 0ustar liggesusers// // readstat_bits.c - Bit-twiddling utility functions // #include #include #include #include "readstat_bits.h" int machine_is_little_endian() { int test_byte_order = 1; return ((char *)&test_byte_order)[0]; } char ones_to_twos_complement1(char num) { return num < 0 ? num+1 : num; } int16_t ones_to_twos_complement2(int16_t num) { return num < 0 ? num+1 : num; } int32_t ones_to_twos_complement4(int32_t num) { return num < 0 ? num+1 : num; } char twos_to_ones_complement1(char num) { return num < 0 ? num-1 : num; } int16_t twos_to_ones_complement2(int16_t num) { return num < 0 ? num-1 : num; } int32_t twos_to_ones_complement4(int32_t num) { return num < 0 ? 
num-1 : num; } uint16_t byteswap2(uint16_t num) { return ((num & 0xFF00) >> 8) | ((num & 0x00FF) << 8); } uint32_t byteswap4(uint32_t num) { num = ((num & 0xFFFF0000) >> 16) | ((num & 0x0000FFFF) << 16); return ((num & 0xFF00FF00) >> 8) | ((num & 0x00FF00FF) << 8); } uint64_t byteswap8(uint64_t num) { num = ((num & 0xFFFFFFFF00000000) >> 32) | ((num & 0x00000000FFFFFFFF) << 32); num = ((num & 0xFFFF0000FFFF0000) >> 16) | ((num & 0x0000FFFF0000FFFF) << 16); return ((num & 0xFF00FF00FF00FF00) >> 8) | ((num & 0x00FF00FF00FF00FF) << 8); } float byteswap_float(float num) { uint32_t answer = 0; memcpy(&answer, &num, 4); answer = byteswap4(answer); memcpy(&num, &answer, 4); return num; } double byteswap_double(double num) { uint64_t answer = 0; memcpy(&answer, &num, 8); answer = byteswap8(answer); memcpy(&num, &answer, 8); return num; } haven/src/readstat/readstat_io_unistd.h0000644000176200001440000000103114101007206017770 0ustar liggesusers typedef struct unistd_io_ctx_s { int fd; } unistd_io_ctx_t; int unistd_open_handler(const char *path, void *io_ctx); int unistd_close_handler(void *io_ctx); readstat_off_t unistd_seek_handler(readstat_off_t offset, readstat_io_flags_t whence, void *io_ctx); ssize_t unistd_read_handler(void *buf, size_t nbytes, void *io_ctx); readstat_error_t unistd_update_handler(long file_size, readstat_progress_handler progress_handler, void *user_ctx, void *io_ctx); readstat_error_t unistd_io_init(readstat_parser_t *parser); haven/src/readstat/readstat_iconv.h0000644000176200001440000000054114101007233017116 0ustar liggesusers#include /* ICONV_CONST defined by autotools; so we hack this in manually */ #if defined(_WIN32) || defined(__sun) #define ICONV_CONST const #else #define ICONV_CONST #endif typedef ICONV_CONST char ** readstat_iconv_inbuf_t; typedef struct readstat_charset_entry_s { int code; char name[32]; } readstat_charset_entry_t; haven/src/readstat/txt/0000755000176200001440000000000014101765776014606 5ustar 
liggesusershaven/src/readstat/txt/readstat_schema.h0000644000176200001440000000015514101007206020060 0ustar liggesusersreadstat_schema_entry_t *readstat_schema_find_or_create_entry(readstat_schema_t *dct, const char *var_name); haven/src/readstat/txt/readstat_copy.c0000644000176200001440000000230014101007206017557 0ustar liggesusers#include #include #include void readstat_copy(char *buf, size_t buf_len, const char *str_start, size_t str_len) { size_t this_len = str_len; if (this_len >= buf_len) { this_len = buf_len - 1; } memcpy(buf, str_start, this_len); buf[this_len] = '\0'; } void readstat_copy_lower(char *buf, size_t buf_len, const char *str_start, size_t str_len) { int i; readstat_copy(buf, buf_len, str_start, str_len); for (i=0; i= buf_len) { this_len = buf_len - 1; } size_t i=0; size_t j=0; int slash = 0; for (i=0; i #include #include "../readstat.h" #include "../readstat_strings.h" #include "readstat_schema.h" #include "readstat_copy.h" #include "commands_util.h" %%{ machine spss_commands; write data noerror nofinal; }%% readstat_schema_t *readstat_parse_spss_commands(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError) { if (parser->io->open(filepath, parser->io->io_ctx) == -1) { if (outError) *outError = READSTAT_ERROR_OPEN; return NULL; } readstat_schema_t *schema = NULL; unsigned char *bytes = NULL; readstat_error_t error = READSTAT_OK; ssize_t len = parser->io->seek(0, READSTAT_SEEK_END, parser->io->io_ctx); if (len == -1) { error = READSTAT_ERROR_SEEK; goto cleanup; } parser->io->seek(0, READSTAT_SEEK_SET, parser->io->io_ctx); bytes = malloc(len); parser->io->read(bytes, len, parser->io->io_ctx); unsigned char *p = bytes; unsigned char *pe = bytes + len; unsigned char *eof = pe; unsigned char *str_start = NULL; size_t str_len = 0; int cs; int i; int line_no = 0; uint64_t first_integer = 0, integer = 0; double double_value = NAN; unsigned char *line_start = p; char varname[32]; char argname[32]; char 
string_value[32]; char buf[1024]; char var_list[1024][32]; long var_col = 0; long var_row = 0; long var_len = 0; long var_count = 0; readstat_type_t var_type = READSTAT_TYPE_DOUBLE; label_type_t label_type = LABEL_TYPE_DOUBLE; int labelset_count = 0; if ((schema = calloc(1, sizeof(readstat_schema_t))) == NULL) { error = READSTAT_ERROR_MALLOC; goto cleanup; } schema->rows_per_observation = 1; %%{ action start_integer { integer = 0; } action incr_integer { integer = 10 * integer + (fc - '0'); } action copy_pos { var_col = integer - 1; var_len = 1; } action set_len { var_len = integer - var_col; } action copy_quoted_buf { readstat_copy_quoted(buf, sizeof(buf), (char *)str_start, str_len); } action copy_quoted_string { readstat_copy_quoted(string_value, sizeof(string_value), (char *)str_start, str_len); } action copy_string { readstat_copy(string_value, sizeof(string_value), (char *)str_start, str_len); } action copy_varname { readstat_copy(varname, sizeof(varname), (char *)str_start, str_len); } action copy_argname { readstat_copy(argname, sizeof(argname), (char *)str_start, str_len); } action handle_var { readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); entry->variable.type = var_type; entry->row = var_row; entry->col = var_col; entry->len = var_len; } action handle_var_label { readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); readstat_copy(entry->variable.label, sizeof(entry->variable.label), buf, sizeof(buf)); } action reset_variable_list { var_count = 0; } action add_variable_to_list { if (var_count < sizeof(var_list)/sizeof(var_list[0])) { memcpy(var_list[var_count++], varname, sizeof(varname)); } } action handle_get_data_arg { if (strcasecmp(argname, "FIRSTCASE") == 0) { schema->first_line = integer; } if (strcasecmp(argname, "DELIMITERS") == 0) { schema->field_delimiter = buf[0]; } } action handle_labelset { char labelset_name[256]; snprintf(labelset_name, sizeof(labelset_name), 
"labels%d", labelset_count++); for (i=0; ilabelset, sizeof(entry->labelset), labelset_name, sizeof(labelset_name)); } } action handle_value_label { char labelset_name[256]; snprintf(labelset_name, sizeof(labelset_name), "labels%d", labelset_count); error = submit_value_label(parser, labelset_name, label_type, first_integer, integer, double_value, string_value, buf, user_ctx); if (error != READSTAT_OK) goto cleanup; } single_quoted_string = "'" ( [^']* ) >{ str_start = fpc; } %{ str_len = fpc - str_start; } "'"; double_quoted_string = "\"" ( [^"]* ) >{ str_start = fpc; } %{ str_len = fpc - str_start; } "\""; quoted_string = ( single_quoted_string | double_quoted_string ) %copy_quoted_buf; newline = ( "\n" | "\r\n" ) %{ line_no++; line_start = p; }; identifier = ( [A-Za-z] [_A-Za-z0-9]* ) >{ str_start = fpc; } %{ str_len = fpc - str_start; }; integer = [0-9]+ >start_integer $incr_integer; double_value = "-"? integer ("." integer)?; whitespace = [ \t] | newline; pos = ( integer %copy_pos ("-" whitespace* integer %set_len)? ); multiline_comment = "/*" ( any* - ( any* "*/" any* ) ) "*/"; comment = "*" ( any* - ( any* "." whitespace* newline any* ) ) "." whitespace* newline | multiline_comment | "COMMENT " [^\.]* "."; var = identifier %copy_varname; width = (whitespace+ "(A"i integer? ")" %{ var_type = READSTAT_TYPE_STRING; } | whitespace+ "(" integer ")" )?; type = ( "A"i integer %{ var_type = READSTAT_TYPE_STRING; } | "F"i integer %{ var_type = READSTAT_TYPE_DOUBLE; } ("." integer)? | "DATE"i integer %{ var_type = READSTAT_TYPE_DOUBLE; } | "ADATE"i integer %{ var_type = READSTAT_TYPE_STRING; } ); slash_arg = "/" identifier %copy_argname whitespace* "=" whitespace* (identifier | quoted_string | integer); slash_args = slash_arg ( whitespace* slash_arg)*; select_cmd = "SELECT"i whitespace+ "IF"i whitespace+ (whitespace | identifier | "-"? 
integer | "(" | ")" | quoted_string )+ "."; file_handle_cmd = "FILE"i whitespace+ "HANDLE"i whitespace+ identifier whitespace+ slash_args whitespace* "."; save_cmd = "SAVE"i whitespace+ ( "OUTFILE"i | "DICTIONARY"i ) whitespace* "="? whitespace* quoted_string "/"? whitespace* ("/" whitespace* "COMPRESSED" whitespace*)? "."; data_list_arg = ( "RECORD"i ("S"i)? whitespace* "=" whitespace* integer | "FILE"i whitespace* "=" whitespace* (quoted_string | identifier) | "TABLE"i | "FIXED"i ); data_list_args = data_list_arg (whitespace+ data_list_arg)*; data_list_cmd = "DATA"i whitespace+ "LIST"i whitespace+ data_list_args whitespace* ( "/" ( integer %{ var_row = integer - 1; } )? whitespace+ var whitespace+ pos %{ var_type = READSTAT_TYPE_DOUBLE; } width (whitespace+ var >handle_var whitespace+ pos %{ var_type = READSTAT_TYPE_DOUBLE; } width )+ whitespace* )+ "." %handle_var; get_data_variable = var whitespace+ type %handle_var; get_data_variable_list = get_data_variable ( whitespace+ get_data_variable )*; get_data_arg = ( slash_arg %handle_get_data_arg | "/VARIABLES"i whitespace* "=" whitespace* get_data_variable_list ); get_data_args = get_data_arg (whitespace* get_data_arg)*; get_data_cmd = "GET"i whitespace+ "DATA"i whitespace+ get_data_args whitespace* "."; get_file_cmd = "GET"i whitespace+ "FILE"i whitespace* ("=" whitespace*)? quoted_string whitespace* "."; dataset_cmd_arg = "WINDOW"i whitespace* "=" whitespace* identifier; dataset_cmd_args = dataset_cmd_arg (whitespace+ dataset_cmd_arg)*; dataset_cmd = "DATASET"i whitespace+ "NAME"i whitespace+ (identifier | quoted_string) (whitespace+ dataset_cmd_args)? "."; format_string = "F" integer "." integer; format_spec = identifier (whitespace+ identifier)* whitespace+ "(" format_string ")"; formats_cmd = "FORMATS"i whitespace+ format_spec (whitespace+ "/" whitespace+ format_spec)* whitespace* "."; variable_labels_cmd = "VARIABLE"i whitespace+ "LABEL"i ("S"i)? 
(whitespace+ var whitespace+ quoted_string %handle_var_label (whitespace* "/")? )+ whitespace* "."; variable_list = var %reset_variable_list %add_variable_to_list (whitespace+ var %add_variable_to_list)*; missing_value_label = "." %{ label_type = -1; } whitespace+ quoted_string %handle_value_label; missing_values_item = var whitespace+ "(" quoted_string ")"; missing_values_list = missing_values_item (whitespace+ missing_values_item)*; value_label = ( "-" integer %{ label_type = LABEL_TYPE_DOUBLE; double_value = -(double)integer; } | integer %{ label_type = LABEL_TYPE_DOUBLE; double_value = integer; } | integer whitespace+ "-" whitespace+ %{ first_integer = integer; } integer %{ label_type = LABEL_TYPE_RANGE; } | quoted_string %{ label_type = LABEL_TYPE_STRING; } %copy_quoted_string ) whitespace+ quoted_string %handle_value_label; variable_value_labels = variable_list whitespace+ ( value_label | missing_value_label ) (whitespace+ value_label)* whitespace* %handle_labelset; variable_level = variable_list whitespace+ ( "(SCALE)"i | "(NOMINAL)"i | "(ORDINAL)"i ); variable_level_subcmd = "VARIABLE"i whitespace+ "LEVEL"i whitespace+ variable_level ( whitespace* "/" whitespace* variable_level )*; value_labels_cmd = "VALUE"i whitespace+ "LABELS"i whitespace+ ("/" whitespace*)? variable_value_labels ( "/" whitespace* variable_value_labels )* ( "/" whitespace* ( variable_level_subcmd whitespace* )? )? 
"."; missing_values_cmd = "MISSING"i whitespace+ "VALUES"i whitespace+ missing_values_list whitespace* "."; recode_cmd = "RECODE"i whitespace+ identifier whitespace+ "(" double_value (whitespace+ double_value)* "=" whitespace* "SYSMIS" whitespace* ")" whitespace* "."; execute_cmd = "EXECUTE"i whitespace* "."; list_cmd = "LIST"i whitespace* "."; display_cmd = "DISPLAY"i (whitespace+ identifier)* whitespace* "."; input_program_cmd = "INPUT"i whitespace+ "PROGRAM"i whitespace* "."; end_input_program_cmd = "END"i whitespace+ input_program_cmd; set_cmd = "SET"i whitespace+ identifier whitespace* "=" whitespace* (identifier | integer | quoted_string) whitespace* "."; command = file_handle_cmd | data_list_cmd | get_data_cmd | get_file_cmd | dataset_cmd | display_cmd | formats_cmd | missing_values_cmd | variable_labels_cmd | value_labels_cmd | recode_cmd | select_cmd | save_cmd | list_cmd | input_program_cmd | end_input_program_cmd | set_cmd | execute_cmd; main := ( whitespace | comment | command )*; write init; write exec; }%% /* suppress warnings */ (void)spss_commands_en_main; if (cs < %%{ write first_final; }%%) { char error_buf[1024]; if (p == pe) { snprintf(error_buf, sizeof(error_buf), "Error parsing SPSS command file (end-of-file unexpectedly reached)"); } else { snprintf(error_buf, sizeof(error_buf), "Error parsing SPSS command file around line #%d, col #%ld (%c)", line_no + 1, (long)(p - line_start + 1), *p); } if (parser->handlers.error) { parser->handlers.error(error_buf, user_ctx); } error = READSTAT_ERROR_PARSE; goto cleanup; } error = submit_columns(parser, schema, user_ctx); cleanup: parser->io->close(parser->io->io_ctx); free(bytes); if (error != READSTAT_OK) { if (outError) *outError = error; readstat_schema_free(schema); schema = NULL; } return schema; } haven/src/readstat/txt/readstat_copy.h0000644000176200001440000000042014101007206017565 0ustar liggesusers void readstat_copy(char *buf, size_t buf_len, const char *str_start, size_t str_len); void 
readstat_copy_lower(char *buf, size_t buf_len, const char *str_start, size_t str_len); void readstat_copy_quoted(char *buf, size_t buf_len, const char *str_start, size_t str_len); haven/src/readstat/txt/commands_util.h0000644000176200001440000000076314101007206017574 0ustar liggesusers typedef enum { LABEL_TYPE_NAN = -1, LABEL_TYPE_DOUBLE, LABEL_TYPE_STRING, LABEL_TYPE_RANGE, LABEL_TYPE_OTHER } label_type_t; readstat_error_t submit_columns(readstat_parser_t *parser, readstat_schema_t *dct, void *user_ctx); readstat_error_t submit_value_label(readstat_parser_t *parser, const char *labelset, label_type_t label_type, int64_t first_integer, int64_t last_integer, double double_value, const char *string_value, const char *buf, void *user_ctx); haven/src/readstat/txt/readstat_stata_dictionary_read.rl0000644000176200001440000001627514101007206023354 0ustar liggesusers#include #include "../readstat.h" #include "readstat_schema.h" #include "readstat_copy.h" %%{ machine stata_dictionary; write data noerror nofinal; }%% readstat_schema_t *readstat_parse_stata_dictionary(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError) { if (parser->io->open(filepath, parser->io->io_ctx) == -1) { if (outError) *outError = READSTAT_ERROR_OPEN; return NULL; } readstat_schema_t *schema = NULL; unsigned char *bytes = NULL; int cb_return_value = READSTAT_HANDLER_OK; int total_entry_count = 0; int partial_entry_count = 0; readstat_error_t error = READSTAT_OK; ssize_t len = parser->io->seek(0, READSTAT_SEEK_END, parser->io->io_ctx); if (len == -1) { error = READSTAT_ERROR_SEEK; goto cleanup; } parser->io->seek(0, READSTAT_SEEK_SET, parser->io->io_ctx); bytes = malloc(len); parser->io->read(bytes, len, parser->io->io_ctx); unsigned char *p = bytes; unsigned char *pe = bytes + len; unsigned char *str_start = NULL; size_t str_len = 0; int cs; // u_char *eof = pe; int integer = 0; int current_row = 0; int current_col = 0; int line_no = 0; unsigned char 
*line_start = p; readstat_schema_entry_t current_entry; if ((schema = calloc(1, sizeof(readstat_schema_t))) == NULL) { error = READSTAT_ERROR_MALLOC; goto cleanup; } schema->rows_per_observation = 1; %%{ action start_integer { integer = 0; } action incr_integer { integer = 10 * integer + (fc - '0'); } action start_entry { memset(&current_entry, 0, sizeof(readstat_schema_entry_t)); current_entry.decimal_separator = '.'; current_entry.variable.type = READSTAT_TYPE_DOUBLE; current_entry.variable.index = total_entry_count; } action end_entry { current_entry.row = current_row; current_entry.col = current_col; current_col += current_entry.len; cb_return_value = READSTAT_HANDLER_OK; if (parser->handlers.variable) { current_entry.variable.index_after_skipping = partial_entry_count; cb_return_value = parser->handlers.variable(total_entry_count, &current_entry.variable, NULL, user_ctx); if (cb_return_value == READSTAT_HANDLER_ABORT) { error = READSTAT_ERROR_USER_ABORT; goto cleanup; } } if (cb_return_value == READSTAT_HANDLER_SKIP_VARIABLE) { current_entry.skip = 1; } else { partial_entry_count++; } schema->entries = realloc(schema->entries, sizeof(readstat_schema_entry_t) * (schema->entry_count+1)); memcpy(&schema->entries[schema->entry_count++], &current_entry, sizeof(readstat_schema_entry_t)); total_entry_count++; } action copy_filename { readstat_copy(schema->filename, sizeof(schema->filename), (char *)str_start, str_len); } action copy_varname { readstat_copy(current_entry.variable.name, sizeof(current_entry.variable.name), (char *)str_start, str_len); } action copy_varlabel { readstat_copy(current_entry.variable.label, sizeof(current_entry.variable.label), (char *)str_start, str_len); } quoted_string = "\"" ( [^"]* ) >{ str_start = fpc; } %{ str_len = fpc - str_start; } "\""; unquoted_string = [A-Za-z0-9_/\\\.\-]+ >{ str_start = fpc; } %{ str_len = fpc - str_start; }; identifier = ( [A-Za-z] [_\.A-Za-z0-9]* ) >{ str_start = fpc; } %{ str_len = fpc - str_start; }; newline = ( "\n" | 
"\r\n" ) %{ line_no++; line_start = p; }; spacetab = [ \t]; whitespace = spacetab | newline; filename = ( quoted_string | unquoted_string ) %copy_filename; integer = [0-9]+ >start_integer $incr_integer; lines_marker = "_lines(" spacetab* integer spacetab* ")" %{ schema->rows_per_observation = integer; }; line_marker = "_line(" spacetab* integer spacetab* ")" %{ current_row = integer - 1; }; column_marker = "_column(" spacetab* integer spacetab* ")" %{ current_col = integer - 1; }; newline_marker = "_newline" %{ current_row++; } ( "(" spacetab* integer spacetab* ")" %{ current_row += (integer - 1); } )?; skip_marker = "_skip(" spacetab* integer spacetab* ")" %{ current_col += (integer - 1) }; lrecl_marker = "_lrecl(" spacetab* integer spacetab* ")" %{ schema->cols_per_observation = integer; }; firstlineoffile_marker = "_firstlineoffile(" spacetab* integer spacetab* ")" %{ schema->first_line = integer - 1; }; marker = lrecl_marker | firstlineoffile_marker | lines_marker | line_marker | column_marker | newline_marker; type = "byte" %{ current_entry.variable.type = READSTAT_TYPE_INT8; } | "int" %{ current_entry.variable.type = READSTAT_TYPE_INT16; } | "long" %{ current_entry.variable.type = READSTAT_TYPE_INT32; } | "float" %{ current_entry.variable.type = READSTAT_TYPE_FLOAT; } | "double" %{ current_entry.variable.type = READSTAT_TYPE_DOUBLE; } | "str" integer %{ current_entry.variable.type = READSTAT_TYPE_STRING; current_entry.variable.storage_width = integer; }; varname = identifier %copy_varname; varlabel = quoted_string %copy_varlabel; format = "%" integer %{ current_entry.len = integer; } ( "s" | "S" | ( ( ( "." | "," %{ current_entry.decimal_separator = ','; } ) integer )? ( "f" | "g" | "e" ) ) ); entry = ( ( type spacetab+ )? varname ( spacetab+ format )? ( spacetab+ varlabel )? 
spacetab* newline ) >start_entry %end_entry; comment = "*" [^\r\n]* newline | "/*" ( any* - ( any* "*/" any* ) ) "*/"; contents = ( whitespace* ( marker | entry | comment ) )* whitespace*; main := comment* ("infile" whitespace+)? "dictionary" whitespace+ ( "using" whitespace+ filename whitespace+ )? "{" contents "}" any*; write init; write exec; }%% /* suppress warnings */ (void)stata_dictionary_en_main; if (cs < %%{ write first_final; }%%) { char error_buf[1024]; if (p == pe) { snprintf(error_buf, sizeof(error_buf), "Error parsing .dct file (end-of-file unexpectedly reached)"); } else { snprintf(error_buf, sizeof(error_buf), "Error parsing .dct file around line #%d, col #%ld (%c)", line_no + 1, (long)(p - line_start + 1), *p); } if (parser->handlers.error) { parser->handlers.error(error_buf, user_ctx); } error = READSTAT_ERROR_PARSE; goto cleanup; } cleanup: parser->io->close(parser->io->io_ctx); free(bytes); if (error != READSTAT_OK) { if (outError) *outError = error; readstat_schema_free(schema); schema = NULL; } return schema; } haven/src/readstat/txt/readstat_txt_read.c0000644000176200001440000002031414101007206020424 0ustar liggesusers#include #include #include #include #include "../readstat.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" #include "readstat_schema.h" #if defined _MSC_VER #define restrict __restrict #endif typedef struct txt_ctx_s { int rows; iconv_t converter; readstat_schema_t *schema; } txt_ctx_t; static readstat_error_t handle_value(readstat_parser_t *parser, iconv_t converter, int obs_index, readstat_schema_entry_t *entry, char *bytes, size_t len, void *ctx) { readstat_error_t error = READSTAT_OK; char *converted_value = malloc(4*len+1); readstat_variable_t *variable = &entry->variable; readstat_value_t value = { .type = variable->type }; if (readstat_type_class(variable->type) == READSTAT_TYPE_CLASS_STRING) { error = readstat_convert(converted_value, 4 * len + 1, bytes, len, converter); if (error != READSTAT_OK) goto 
cleanup; value.v.string_value = converted_value; } else { char *endptr = NULL; if (variable->type == READSTAT_TYPE_DOUBLE) { value.v.double_value = strtod(bytes, &endptr); } else if (variable->type == READSTAT_TYPE_FLOAT) { value.v.float_value = strtof(bytes, &endptr); } else { value.v.i32_value = strtol(bytes, &endptr, 10); value.type = READSTAT_TYPE_INT32; } value.is_system_missing = (endptr == bytes); } if (parser->handlers.value(obs_index, variable, value, ctx) == READSTAT_HANDLER_ABORT) { error = READSTAT_ERROR_USER_ABORT; } cleanup: free(converted_value); return error; } static ssize_t txt_getdelim(char ** restrict linep, size_t * restrict linecapp, int delimiter, readstat_io_t *io) { char *value_buffer = *linep; size_t value_buffer_len = *linecapp; ssize_t i = 0; ssize_t bytes_read = 0; while ((bytes_read = io->read(&value_buffer[i], 1, io->io_ctx)) == 1 && value_buffer[i++] != delimiter) { if (i == value_buffer_len) { value_buffer = realloc(value_buffer, value_buffer_len *= 2); } } *linep = value_buffer; *linecapp = value_buffer_len; if (bytes_read == -1) return -1; return i; } static readstat_error_t txt_parse_delimited(readstat_parser_t *parser, txt_ctx_t *ctx, void *user_ctx) { size_t value_buffer_len = 4096; char *value_buffer = malloc(value_buffer_len); readstat_schema_t *schema = ctx->schema; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = parser->io; int k=0; while (1) { for (int j=0; j<schema->entry_count; j++) { readstat_schema_entry_t *entry = &schema->entries[j]; int delimiter = (j == schema->entry_count-1) ? 
'\n' : schema->field_delimiter; ssize_t chars_read = txt_getdelim(&value_buffer, &value_buffer_len, delimiter, io); if (chars_read == 0) goto cleanup; if (chars_read == -1) { retval = READSTAT_ERROR_READ; goto cleanup; } if (parser->handlers.value && !entry->skip) { chars_read--; /* delimiter */ if (chars_read > 0 && value_buffer[chars_read-1] == '\r') { chars_read--; /* CRLF */ } value_buffer[chars_read] = '\0'; retval = handle_value(parser, ctx->converter, k, entry, value_buffer, chars_read, user_ctx); if (retval != READSTAT_OK) goto cleanup; } } if (++k == parser->row_limit) break; } cleanup: ctx->rows = k; if (value_buffer) free(value_buffer); return retval; } static readstat_error_t txt_parse_fixed_width(readstat_parser_t *parser, txt_ctx_t *ctx, void *user_ctx, const size_t *line_lens, char *line_buffer) { char value_buffer[4096]; readstat_schema_t *schema = ctx->schema; readstat_io_t *io = parser->io; readstat_error_t retval = READSTAT_OK; int k=0; while (1) { int j=0; for (int i=0; i<schema->rows_per_observation; i++) { ssize_t bytes_read = io->read(line_buffer, line_lens[i], io->io_ctx); if (bytes_read == 0) goto cleanup; if (bytes_read < line_lens[i]) { retval = READSTAT_ERROR_READ; goto cleanup; } for (; j<schema->entry_count && schema->entries[j].row == i; j++) { readstat_schema_entry_t *entry = &schema->entries[j]; size_t field_len = schema->entries[j].len; size_t field_offset = schema->entries[j].col; if (field_len < sizeof(value_buffer) && parser->handlers.value && !entry->skip) { memcpy(value_buffer, &line_buffer[field_offset], field_len); value_buffer[field_len] = '\0'; retval = handle_value(parser, ctx->converter, k, entry, value_buffer, field_len, user_ctx); if (retval != READSTAT_OK) { goto cleanup; } } } if (schema->cols_per_observation == 0) { char throwaway = '\0'; while (io->read(&throwaway, 1, io->io_ctx) == 1 && throwaway != '\n'); } } if (++k == parser->row_limit) break; } cleanup: ctx->rows = k; return retval; } readstat_error_t 
readstat_parse_txt(readstat_parser_t *parser, const char *filename, readstat_schema_t *schema, void *user_ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = parser->io; int i; size_t *line_lens = NULL; size_t line_buffer_len = 0; char *line_buffer = NULL; txt_ctx_t ctx = { .schema = schema }; if (parser->output_encoding && parser->input_encoding) { ctx.converter = iconv_open(parser->output_encoding, parser->input_encoding); if (ctx.converter == (iconv_t)-1) { ctx.converter = NULL; retval = READSTAT_ERROR_UNSUPPORTED_CHARSET; goto cleanup; } } if (io->open(filename, io->io_ctx) == -1) { retval = READSTAT_ERROR_OPEN; goto cleanup; } if ((line_lens = malloc(schema->rows_per_observation * sizeof(size_t))) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } for (i=0; i<schema->rows_per_observation; i++) { line_lens[i] = schema->cols_per_observation; } for (i=0; i<schema->entry_count; i++) { readstat_schema_entry_t *entry = &schema->entries[i]; if (line_lens[entry->row] < entry->col + entry->len) line_lens[entry->row] = entry->col + entry->len; } for (i=0; i<schema->rows_per_observation; i++) { if (line_buffer_len < line_lens[i]) line_buffer_len = line_lens[i]; } line_buffer_len += 2; /* CRLF */ if ((line_buffer = malloc(line_buffer_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (schema->first_line > 1) { int throwaway_lines = schema->first_line - 1; char throwaway_char = '\0'; while (throwaway_lines--) { while (io->read(&throwaway_char, 1, io->io_ctx) == 1 && throwaway_char != '\n'); } } if (schema->field_delimiter) { retval = txt_parse_delimited(parser, &ctx, user_ctx); } else { retval = txt_parse_fixed_width(parser, &ctx, user_ctx, line_lens, line_buffer); } if (retval != READSTAT_OK) goto cleanup; if (parser->handlers.metadata) { readstat_metadata_t metadata = { .row_count = ctx.rows, .var_count = schema->entry_count }; int cb_retval = parser->handlers.metadata(&metadata, user_ctx); if (cb_retval == READSTAT_HANDLER_ABORT) retval = 
READSTAT_ERROR_USER_ABORT; } cleanup: io->close(io->io_ctx); if (line_buffer) free(line_buffer); if (line_lens) free(line_lens); if (ctx.converter) iconv_close(ctx.converter); return retval; } haven/src/readstat/txt/readstat_spss_commands_read.c0000644000176200001440000022141414101765776022511 0ustar liggesusers#line 1 "src/txt/readstat_spss_commands_read.rl" #include #include #include "../readstat.h" #include "../readstat_strings.h" #include "readstat_schema.h" #include "readstat_copy.h" #include "commands_util.h" #line 14 "src/txt/readstat_spss_commands_read.c" static const signed char _spss_commands_actions[] = { 0, 1, 1, 1, 2, 1, 4, 1, 8, 1, 12, 1, 13, 1, 15, 1, 16, 1, 17, 1, 18, 1, 19, 1, 20, 1, 21, 1, 22, 1, 27, 1, 30, 1, 31, 1, 32, 1, 34, 2, 0, 1, 2, 1, 0, 2, 2, 28, 2, 2, 29, 2, 3, 28, 2, 3, 29, 2, 4, 9, 2, 4, 12, 2, 4, 14, 2, 4, 20, 2, 8, 20, 2, 15, 16, 2, 17, 18, 2, 19, 13, 2, 19, 20, 2, 21, 6, 2, 21, 7, 2, 21, 12, 2, 21, 20, 2, 23, 8, 2, 24, 8, 2, 25, 8, 2, 26, 8, 3, 4, 0, 1, 3, 4, 14, 13, 3, 4, 35, 5, 3, 19, 0, 1, 3, 19, 8, 20, 3, 21, 0, 1, 3, 21, 1, 0, 3, 21, 6, 11, 3, 21, 12, 6, 3, 33, 0, 1, 4, 19, 33, 0, 1, 4, 21, 6, 10, 11, 4, 21, 6, 11, 10, 0 }; static const short _spss_commands_key_offsets[] = { 0, 0, 1, 2, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 25, 27, 29, 35, 41, 47, 48, 50, 52, 54, 58, 68, 78, 79, 81, 85, 87, 92, 97, 98, 108, 118, 119, 120, 121, 126, 137, 148, 149, 155, 163, 171, 172, 183, 189, 195, 196, 203, 212, 221, 222, 226, 229, 233, 241, 249, 250, 261, 267, 273, 274, 283, 294, 305, 306, 310, 313, 319, 329, 339, 340, 344, 346, 350, 356, 362, 363, 365, 367, 369, 371, 375, 381, 387, 388, 390, 392, 394, 396, 398, 400, 405, 410, 411, 413, 415, 417, 419, 421, 425, 427, 429, 433, 439, 445, 446, 448, 450, 452, 454, 456, 460, 468, 476, 477, 488, 493, 498, 499, 503, 515, 520, 525, 526, 538, 550, 551, 552, 553, 559, 565, 571, 572, 573, 574, 582, 595, 597, 599, 601, 603, 605, 609, 617, 625, 626, 637, 646, 655, 656, 657, 659, 662, 664, 667, 
672, 678, 684, 685, 687, 689, 693, 701, 709, 710, 712, 714, 716, 720, 725, 730, 731, 737, 749, 754, 759, 760, 772, 784, 785, 786, 787, 793, 799, 805, 806, 807, 808, 816, 829, 843, 857, 871, 885, 899, 913, 927, 941, 953, 958, 963, 964, 976, 988, 989, 1002, 1014, 1026, 1027, 1031, 1039, 1049, 1059, 1060, 1071, 1081, 1091, 1092, 1094, 1096, 1098, 1100, 1108, 1110, 1118, 1126, 1128, 1130, 1132, 1134, 1136, 1138, 1140, 1142, 1144, 1148, 1154, 1160, 1161, 1163, 1165, 1167, 1169, 1171, 1175, 1183, 1191, 1192, 1203, 1208, 1213, 1214, 1216, 1217, 1218, 1219, 1224, 1233, 1242, 1243, 1244, 1245, 1247, 1249, 1251, 1253, 1255, 1259, 1267, 1275, 1276, 1287, 1292, 1297, 1298, 1301, 1303, 1311, 1318, 1325, 1326, 1328, 1335, 1340, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1356, 1361, 1362, 1366, 1368, 1370, 1374, 1382, 1390, 1391, 1393, 1395, 1397, 1399, 1401, 1403, 1405, 1407, 1409, 1416, 1423, 1424, 1425, 1426, 1432, 1438, 1444, 1445, 1450, 1455, 1456, 1457, 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1472, 1479, 1480, 1481, 1482, 1488, 1494, 1495, 1497, 1499, 1501, 1503, 1505, 1507, 1511, 1513, 1515, 1517, 1521, 1527, 1533, 1534, 1536, 1540, 1555, 1571, 1587, 1588, 1589, 1590, 1606, 1607, 1608, 1610, 1626, 1643, 1660, 1675, 1676, 1680, 1688, 1696, 1697, 1709, 1714, 1719, 1720, 1732, 1744, 1745, 1746, 1747, 1752, 1753, 1754, 1761, 1773, 1775, 1779, 1781, 1783, 1787, 1793, 1799, 1800, 1802, 1804, 1806, 1808, 1810, 1814, 1823, 1832, 1833, 1841, 1849, 1850, 1861, 1875, 1889, 1890, 1891, 1892, 1896, 1902, 1908, 1909, 1910, 1911, 1917, 1928, 1939, 1940, 1941, 1942, 1944, 1950, 1961, 1972, 1973, 1986, 1999, 2012, 2025, 2038, 2051, 2064, 2075, 2091, 2107, 2108, 2112, 2118, 2125, 2132, 2133, 2134, 2135, 2139, 2145, 2151, 2152, 2158, 2169, 2182, 2195, 2208, 2221, 2232, 2246, 2260, 2261, 2272, 2287, 2302, 2303, 2309, 2311, 2313, 2315, 2317, 2319, 2321, 2322, 2328, 2334, 2335, 2343, 2351, 2352, 2363, 2372, 2381, 2382, 2393, 2395, 2397, 2399, 2401, 2403, 2405, 2416, 2418, 2420, 2422, 
2424, 2426, 2430, 2436, 2442, 2443, 2445, 2447, 2449, 2451, 2457, 2465, 2473, 2474, 2485, 2491, 2497, 2498, 2499, 2500, 2506, 2516, 2526, 2527, 2532, 2541, 2550, 2551, 2552, 2553, 2557, 2559, 2561, 2563, 2565, 2573, 2575, 2577, 2579, 2586, 2593, 2594, 2600, 2606, 2607, 2610, 2616, 2619, 2625, 2631, 2632, 2640, 2643, 2647, 2650, 2656, 2662, 2663, 2669, 2675, 2677, 2679, 2681, 2683, 2685, 2692, 2697, 2702, 2703, 2709, 2715, 2716, 2723, 2725, 2727, 2729, 2731, 2736, 2737, 2738, 2750, 2752, 2754, 2756, 2758, 2762, 2768, 2774, 2775, 2777, 2779, 2781, 2785, 2795, 2805, 2806, 2807, 2808, 2813, 2819, 2825, 2826, 2828, 2830, 2832, 2834, 2836, 2841, 2846, 2847, 2855, 2863, 2864, 2876, 2877, 2878, 2880, 2882, 2884, 2886, 2888, 2893, 2902, 2911, 2912, 2924, 2951, 2978, 3005, 0 }; static const char _spss_commands_trans_keys[] = { 10, 46, 9, 10, 13, 32, 46, 10, 46, 42, 42, 42, 47, 79, 77, 77, 69, 78, 84, 32, 46, 65, 73, 97, 105, 84, 116, 65, 97, 9, 10, 13, 32, 83, 115, 9, 10, 13, 32, 76, 108, 9, 10, 13, 32, 76, 108, 10, 73, 105, 83, 115, 84, 116, 9, 10, 13, 32, 9, 10, 13, 32, 70, 82, 84, 102, 114, 116, 9, 10, 13, 32, 70, 82, 84, 102, 114, 116, 10, 73, 105, 76, 88, 108, 120, 69, 101, 9, 10, 13, 32, 61, 9, 10, 13, 32, 61, 10, 9, 10, 13, 32, 34, 39, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 47, 9, 10, 13, 32, 47, 70, 82, 84, 102, 114, 116, 9, 10, 13, 32, 47, 70, 82, 84, 102, 114, 116, 10, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 48, 57, 10, 9, 10, 13, 32, 45, 48, 57, 9, 10, 13, 32, 40, 65, 90, 97, 122, 9, 10, 13, 32, 40, 65, 90, 97, 122, 10, 65, 97, 48, 57, 41, 48, 57, 9, 10, 13, 32, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 48, 57, 10, 9, 10, 13, 32, 45, 46, 47, 48, 57, 9, 10, 13, 32, 40, 46, 47, 65, 90, 
97, 122, 9, 10, 13, 32, 40, 46, 47, 65, 90, 97, 122, 10, 65, 97, 48, 57, 41, 48, 57, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 65, 90, 97, 122, 9, 10, 13, 32, 46, 47, 65, 90, 97, 122, 10, 78, 88, 110, 120, 68, 100, 9, 10, 13, 32, 9, 10, 13, 32, 73, 105, 9, 10, 13, 32, 73, 105, 10, 78, 110, 80, 112, 85, 117, 84, 116, 9, 10, 13, 32, 9, 10, 13, 32, 80, 112, 9, 10, 13, 32, 80, 112, 10, 82, 114, 79, 111, 71, 103, 82, 114, 65, 97, 77, 109, 9, 10, 13, 32, 46, 9, 10, 13, 32, 46, 10, 69, 101, 67, 99, 85, 117, 84, 116, 69, 101, 73, 79, 105, 111, 76, 108, 69, 101, 9, 10, 13, 32, 9, 10, 13, 32, 72, 104, 9, 10, 13, 32, 72, 104, 10, 65, 97, 78, 110, 68, 100, 76, 108, 69, 101, 9, 10, 13, 32, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 9, 10, 13, 32, 47, 10, 65, 90, 97, 122, 9, 10, 13, 32, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 9, 10, 13, 32, 61, 10, 9, 10, 13, 32, 34, 39, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 10, 39, 39, 9, 10, 13, 32, 46, 47, 48, 57, 9, 10, 13, 32, 46, 47, 95, 48, 57, 65, 90, 97, 122, 82, 114, 77, 109, 65, 97, 84, 116, 83, 115, 9, 10, 13, 32, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 40, 65, 90, 97, 122, 9, 10, 13, 32, 40, 65, 90, 97, 122, 10, 70, 48, 57, 46, 48, 57, 48, 57, 41, 48, 57, 9, 10, 13, 32, 46, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 10, 69, 101, 84, 116, 9, 10, 13, 32, 9, 10, 13, 32, 68, 70, 100, 102, 9, 10, 13, 32, 68, 70, 100, 102, 10, 65, 97, 84, 116, 65, 97, 9, 10, 13, 32, 9, 10, 13, 32, 47, 9, 10, 13, 32, 47, 10, 86, 118, 65, 90, 97, 122, 9, 10, 13, 32, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 9, 10, 13, 32, 61, 10, 9, 10, 13, 32, 34, 39, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 
13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 10, 39, 39, 9, 10, 13, 32, 46, 47, 48, 57, 9, 10, 13, 32, 46, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 65, 95, 97, 48, 57, 66, 90, 98, 122, 9, 10, 13, 32, 61, 82, 95, 114, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 73, 95, 105, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 65, 95, 97, 48, 57, 66, 90, 98, 122, 9, 10, 13, 32, 61, 66, 95, 98, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 76, 95, 108, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 69, 95, 101, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 83, 95, 115, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 9, 10, 13, 32, 61, 10, 9, 10, 13, 32, 34, 39, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 46, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 47, 65, 68, 70, 97, 100, 102, 9, 10, 13, 32, 46, 47, 65, 68, 70, 97, 100, 102, 10, 68, 100, 48, 57, 9, 10, 13, 32, 46, 47, 48, 57, 9, 10, 13, 32, 46, 47, 65, 90, 97, 122, 9, 10, 13, 32, 46, 47, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 65, 68, 70, 97, 100, 102, 9, 10, 13, 32, 65, 68, 70, 97, 100, 102, 10, 65, 97, 84, 116, 69, 101, 48, 57, 9, 10, 13, 32, 46, 47, 48, 57, 48, 57, 9, 10, 13, 32, 46, 47, 48, 57, 9, 10, 13, 32, 46, 47, 48, 57, 73, 105, 83, 115, 84, 116, 73, 105, 83, 115, 83, 115, 73, 105, 78, 110, 71, 103, 9, 10, 13, 32, 9, 10, 13, 32, 86, 118, 9, 10, 13, 32, 86, 118, 10, 65, 97, 76, 108, 85, 117, 69, 101, 83, 115, 9, 10, 13, 32, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 40, 9, 10, 13, 32, 40, 10, 34, 39, 34, 34, 41, 9, 10, 13, 32, 46, 9, 10, 13, 32, 46, 65, 90, 97, 122, 9, 10, 13, 32, 46, 65, 90, 97, 122, 10, 39, 39, 69, 101, 67, 99, 79, 111, 68, 100, 69, 101, 9, 10, 13, 32, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 
97, 122, 9, 10, 13, 32, 40, 9, 10, 13, 32, 40, 10, 45, 48, 57, 48, 57, 9, 10, 13, 32, 46, 61, 48, 57, 9, 10, 13, 32, 45, 48, 57, 9, 10, 13, 32, 45, 48, 57, 10, 48, 57, 9, 10, 13, 32, 61, 48, 57, 9, 10, 13, 32, 83, 9, 10, 13, 32, 83, 10, 89, 83, 77, 73, 83, 9, 10, 13, 32, 41, 9, 10, 13, 32, 41, 10, 65, 69, 97, 101, 86, 118, 69, 101, 9, 10, 13, 32, 9, 10, 13, 32, 68, 79, 100, 111, 9, 10, 13, 32, 68, 79, 100, 111, 10, 73, 105, 67, 99, 84, 116, 73, 105, 79, 111, 78, 110, 65, 97, 82, 114, 89, 121, 9, 10, 13, 32, 34, 39, 61, 9, 10, 13, 32, 34, 39, 61, 10, 34, 34, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 10, 9, 10, 13, 32, 67, 9, 10, 13, 32, 67, 10, 79, 77, 80, 82, 69, 83, 83, 69, 68, 9, 10, 13, 32, 46, 47, 67, 9, 10, 13, 32, 46, 47, 67, 10, 39, 39, 9, 10, 13, 32, 34, 39, 9, 10, 13, 32, 34, 39, 10, 85, 117, 84, 116, 70, 102, 73, 105, 76, 108, 69, 101, 76, 84, 108, 116, 69, 101, 67, 99, 84, 116, 9, 10, 13, 32, 9, 10, 13, 32, 73, 105, 9, 10, 13, 32, 73, 105, 10, 70, 102, 9, 10, 13, 32, 9, 10, 13, 32, 34, 39, 45, 40, 41, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 40, 41, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 40, 41, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 34, 39, 45, 46, 40, 41, 48, 57, 65, 90, 97, 122, 39, 39, 48, 57, 9, 10, 13, 32, 34, 39, 45, 46, 40, 41, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 95, 40, 41, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 95, 40, 41, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 40, 41, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 61, 9, 10, 13, 32, 61, 10, 9, 10, 13, 32, 34, 39, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 46, 39, 39, 9, 10, 13, 32, 46, 48, 57, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 65, 97, 76, 82, 108, 114, 85, 117, 69, 101, 9, 10, 
13, 32, 9, 10, 13, 32, 76, 108, 9, 10, 13, 32, 76, 108, 10, 65, 97, 66, 98, 69, 101, 76, 108, 83, 115, 9, 10, 13, 32, 9, 10, 13, 32, 47, 65, 90, 97, 122, 9, 10, 13, 32, 47, 65, 90, 97, 122, 10, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 9, 10, 13, 32, 34, 39, 9, 10, 13, 32, 34, 39, 10, 34, 34, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 34, 39, 45, 46, 47, 48, 57, 9, 10, 13, 32, 34, 39, 45, 46, 47, 48, 57, 10, 39, 39, 48, 57, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 46, 86, 118, 65, 90, 97, 122, 9, 10, 13, 32, 46, 86, 118, 65, 90, 97, 122, 10, 9, 10, 13, 32, 65, 95, 97, 48, 57, 66, 90, 98, 122, 9, 10, 13, 32, 82, 95, 114, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 73, 95, 105, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 65, 95, 97, 48, 57, 66, 90, 98, 122, 9, 10, 13, 32, 66, 95, 98, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 76, 95, 108, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 69, 95, 101, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 76, 108, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 76, 108, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 34, 39, 45, 9, 10, 13, 32, 34, 39, 45, 10, 39, 39, 9, 10, 13, 32, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 48, 57, 10, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 69, 95, 101, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 86, 95, 118, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 69, 95, 101, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 76, 95, 108, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 46, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 40, 45, 46, 48, 57, 65, 90, 
97, 122, 9, 10, 13, 32, 34, 39, 40, 45, 46, 48, 57, 65, 90, 97, 122, 10, 78, 79, 83, 110, 111, 115, 79, 111, 77, 109, 73, 105, 78, 110, 65, 97, 76, 108, 41, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 10, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 40, 65, 90, 97, 122, 9, 10, 13, 32, 40, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 82, 114, 68, 100, 67, 99, 65, 97, 76, 108, 69, 101, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 73, 105, 65, 97, 66, 98, 76, 108, 69, 101, 9, 10, 13, 32, 9, 10, 13, 32, 76, 108, 9, 10, 13, 32, 76, 108, 10, 65, 97, 66, 98, 69, 101, 76, 108, 9, 10, 13, 32, 83, 115, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 9, 10, 13, 32, 34, 39, 10, 34, 34, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 65, 90, 97, 122, 9, 10, 13, 32, 46, 47, 65, 90, 97, 122, 10, 9, 10, 13, 32, 46, 9, 10, 13, 32, 46, 65, 90, 97, 122, 9, 10, 13, 32, 46, 65, 90, 97, 122, 10, 39, 39, 9, 10, 13, 32, 65, 97, 84, 116, 69, 101, 48, 57, 9, 10, 13, 32, 46, 47, 48, 57, 73, 105, 76, 108, 69, 101, 9, 10, 13, 32, 34, 39, 61, 9, 10, 13, 32, 34, 39, 61, 10, 9, 10, 13, 32, 34, 39, 9, 10, 13, 32, 34, 39, 10, 41, 48, 57, 9, 10, 13, 32, 46, 47, 41, 48, 57, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 48, 57, 10, 9, 10, 13, 32, 46, 47, 48, 57, 41, 48, 57, 9, 10, 13, 32, 41, 48, 57, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 48, 57, 10, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 48, 57, 69, 101, 67, 99, 79, 111, 82, 114, 68, 100, 9, 10, 13, 32, 61, 83, 115, 9, 10, 13, 32, 61, 9, 10, 13, 32, 61, 10, 9, 10, 13, 32, 48, 57, 9, 10, 13, 32, 48, 57, 10, 9, 10, 13, 32, 47, 48, 57, 65, 97, 66, 98, 76, 108, 69, 101, 9, 10, 13, 32, 47, 39, 39, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 69, 101, 68, 100, 69, 101, 84, 116, 9, 10, 13, 32, 9, 10, 13, 32, 78, 110, 9, 10, 13, 32, 78, 110, 10, 65, 97, 77, 109, 69, 101, 9, 10, 
13, 32, 9, 10, 13, 32, 34, 39, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 46, 9, 10, 13, 32, 87, 119, 9, 10, 13, 32, 87, 119, 10, 73, 105, 78, 110, 68, 100, 79, 111, 87, 119, 9, 10, 13, 32, 61, 9, 10, 13, 32, 61, 10, 9, 10, 13, 32, 65, 90, 97, 122, 9, 10, 13, 32, 65, 90, 97, 122, 10, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 39, 39, 83, 115, 80, 112, 76, 108, 65, 97, 89, 121, 9, 10, 13, 32, 46, 9, 10, 13, 32, 46, 65, 90, 97, 122, 9, 10, 13, 32, 46, 65, 90, 97, 122, 10, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 42, 47, 67, 68, 69, 70, 71, 73, 76, 77, 82, 83, 86, 100, 101, 102, 103, 105, 108, 109, 114, 115, 118, 9, 10, 13, 32, 42, 47, 67, 68, 69, 70, 71, 73, 76, 77, 82, 83, 86, 100, 101, 102, 103, 105, 108, 109, 114, 115, 118, 9, 10, 13, 32, 42, 47, 67, 68, 69, 70, 71, 73, 76, 77, 82, 83, 86, 100, 101, 102, 103, 105, 108, 109, 114, 115, 118, 9, 10, 13, 32, 42, 47, 67, 68, 69, 70, 71, 73, 76, 77, 82, 83, 86, 100, 101, 102, 103, 105, 108, 109, 114, 115, 118, 48, 57, 0 }; static const signed char _spss_commands_single_lengths[] = { 0, 1, 1, 5, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 4, 2, 2, 6, 6, 6, 1, 2, 2, 2, 4, 10, 10, 1, 2, 4, 2, 5, 5, 1, 6, 6, 1, 1, 1, 5, 11, 11, 1, 4, 4, 4, 1, 5, 4, 4, 1, 5, 5, 5, 1, 2, 1, 4, 4, 4, 1, 5, 4, 4, 1, 7, 7, 7, 1, 2, 1, 6, 6, 6, 1, 4, 2, 4, 6, 6, 1, 2, 2, 2, 2, 4, 6, 6, 1, 2, 2, 2, 2, 2, 2, 5, 5, 1, 2, 2, 2, 2, 2, 4, 2, 2, 4, 6, 6, 1, 2, 2, 2, 2, 2, 4, 4, 4, 1, 5, 5, 5, 1, 0, 6, 5, 5, 1, 6, 6, 1, 1, 1, 6, 6, 6, 1, 1, 1, 6, 7, 2, 2, 2, 2, 2, 4, 4, 4, 1, 5, 5, 5, 1, 1, 0, 1, 0, 1, 5, 6, 6, 1, 2, 2, 4, 8, 8, 1, 2, 2, 2, 4, 5, 5, 1, 2, 6, 5, 5, 1, 6, 6, 1, 1, 1, 6, 6, 6, 1, 1, 1, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 6, 5, 5, 1, 6, 6, 1, 7, 12, 12, 1, 2, 6, 6, 6, 1, 5, 10, 10, 1, 2, 2, 2, 0, 6, 0, 6, 6, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 6, 6, 1, 2, 2, 2, 2, 2, 4, 4, 4, 1, 5, 5, 5, 1, 2, 1, 1, 1, 5, 5, 5, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 4, 1, 5, 5, 5, 1, 1, 0, 6, 5, 5, 1, 0, 5, 5, 5, 1, 1, 
1, 1, 1, 1, 5, 5, 1, 4, 2, 2, 4, 8, 8, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 7, 1, 1, 1, 6, 6, 6, 1, 5, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 7, 1, 1, 1, 6, 6, 1, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 4, 6, 6, 1, 2, 4, 7, 8, 8, 1, 1, 1, 8, 1, 1, 0, 8, 9, 9, 7, 1, 4, 4, 4, 1, 6, 5, 5, 1, 6, 6, 1, 1, 1, 5, 1, 1, 5, 6, 2, 4, 2, 2, 4, 6, 6, 1, 2, 2, 2, 2, 2, 4, 5, 5, 1, 4, 4, 1, 5, 8, 8, 1, 1, 1, 4, 6, 6, 1, 1, 1, 6, 9, 9, 1, 1, 1, 0, 4, 7, 7, 1, 7, 7, 7, 7, 7, 7, 7, 5, 10, 10, 1, 4, 4, 7, 7, 1, 1, 1, 4, 4, 4, 1, 4, 5, 7, 7, 7, 7, 5, 8, 8, 1, 5, 9, 9, 1, 6, 2, 2, 2, 2, 2, 2, 1, 6, 6, 1, 4, 4, 1, 5, 5, 5, 1, 5, 2, 2, 2, 2, 2, 2, 5, 2, 2, 2, 2, 2, 4, 6, 6, 1, 2, 2, 2, 2, 6, 4, 4, 1, 5, 6, 6, 1, 1, 1, 6, 6, 6, 1, 5, 5, 5, 1, 1, 1, 4, 2, 2, 2, 0, 6, 2, 2, 2, 7, 7, 1, 6, 6, 1, 1, 6, 1, 4, 4, 1, 6, 1, 4, 1, 4, 4, 1, 4, 4, 2, 2, 2, 2, 2, 7, 5, 5, 1, 4, 4, 1, 5, 2, 2, 2, 2, 5, 1, 1, 6, 2, 2, 2, 2, 4, 6, 6, 1, 2, 2, 2, 4, 6, 6, 1, 1, 1, 5, 6, 6, 1, 2, 2, 2, 2, 2, 5, 5, 1, 4, 4, 1, 6, 1, 1, 2, 2, 2, 2, 2, 5, 5, 5, 1, 6, 27, 27, 27, 27, 0 }; static const signed char _spss_commands_range_lengths[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 3, 1, 1, 0, 1, 2, 2, 0, 1, 1, 0, 2, 2, 0, 3, 1, 1, 0, 1, 2, 2, 0, 1, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 2, 3, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 2, 2, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 3, 3, 0, 3, 0, 0, 0, 1, 1, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 0, 0, 0, 4, 0, 0, 1, 4, 4, 4, 4, 0, 0, 2, 2, 0, 3, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 2, 0, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 2, 2, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 1, 0 }; static const short _spss_commands_index_offsets[] = { 0, 0, 2, 4, 10, 13, 15, 17, 20, 22, 24, 26, 28, 30, 32, 34, 36, 41, 44, 47, 54, 61, 68, 70, 73, 76, 79, 84, 95, 106, 108, 111, 116, 119, 125, 131, 133, 142, 151, 153, 155, 157, 163, 175, 187, 189, 195, 202, 209, 211, 220, 226, 232, 234, 241, 249, 257, 259, 263, 266, 271, 278, 285, 287, 296, 302, 308, 310, 319, 329, 339, 341, 345, 348, 355, 364, 373, 375, 380, 383, 388, 395, 402, 404, 407, 410, 413, 416, 421, 428, 435, 437, 440, 443, 446, 449, 452, 455, 461, 467, 469, 472, 475, 478, 481, 484, 489, 492, 495, 500, 507, 514, 516, 519, 522, 525, 528, 531, 536, 543, 550, 552, 561, 567, 573, 575, 578, 588, 594, 600, 602, 612, 622, 624, 626, 628, 635, 642, 649, 651, 653, 655, 663, 674, 677, 680, 683, 686, 689, 694, 701, 708, 710, 719, 727, 735, 737, 739, 741, 744, 746, 749, 755, 762, 769, 771, 774, 777, 782, 791, 800, 802, 805, 808, 811, 816, 822, 828, 830, 835, 845, 851, 857, 859, 869, 879, 881, 883, 885, 892, 899, 906, 908, 910, 912, 920, 931, 943, 
955, 967, 979, 991, 1003, 1015, 1027, 1037, 1043, 1049, 1051, 1061, 1071, 1073, 1084, 1097, 1110, 1112, 1116, 1124, 1133, 1142, 1144, 1153, 1164, 1175, 1177, 1180, 1183, 1186, 1188, 1196, 1198, 1206, 1214, 1217, 1220, 1223, 1226, 1229, 1232, 1235, 1238, 1241, 1246, 1253, 1260, 1262, 1265, 1268, 1271, 1274, 1277, 1282, 1289, 1296, 1298, 1307, 1313, 1319, 1321, 1324, 1326, 1328, 1330, 1336, 1344, 1352, 1354, 1356, 1358, 1361, 1364, 1367, 1370, 1373, 1378, 1385, 1392, 1394, 1403, 1409, 1415, 1417, 1420, 1422, 1430, 1437, 1444, 1446, 1448, 1455, 1461, 1467, 1469, 1471, 1473, 1475, 1477, 1479, 1485, 1491, 1493, 1498, 1501, 1504, 1509, 1518, 1527, 1529, 1532, 1535, 1538, 1541, 1544, 1547, 1550, 1553, 1556, 1564, 1572, 1574, 1576, 1578, 1585, 1592, 1599, 1601, 1607, 1613, 1615, 1617, 1619, 1621, 1623, 1625, 1627, 1629, 1631, 1633, 1641, 1649, 1651, 1653, 1655, 1662, 1669, 1671, 1674, 1677, 1680, 1683, 1686, 1689, 1694, 1697, 1700, 1703, 1708, 1715, 1722, 1724, 1727, 1732, 1744, 1757, 1770, 1772, 1774, 1776, 1789, 1791, 1793, 1795, 1808, 1822, 1836, 1848, 1850, 1855, 1862, 1869, 1871, 1881, 1887, 1893, 1895, 1905, 1915, 1917, 1919, 1921, 1927, 1929, 1931, 1938, 1948, 1951, 1956, 1959, 1962, 1967, 1974, 1981, 1983, 1986, 1989, 1992, 1995, 1998, 2003, 2011, 2019, 2021, 2028, 2035, 2037, 2046, 2058, 2070, 2072, 2074, 2076, 2081, 2088, 2095, 2097, 2099, 2101, 2108, 2119, 2130, 2132, 2134, 2136, 2138, 2144, 2154, 2164, 2166, 2177, 2188, 2199, 2210, 2221, 2232, 2243, 2252, 2266, 2280, 2282, 2287, 2293, 2301, 2309, 2311, 2313, 2315, 2320, 2326, 2332, 2334, 2340, 2349, 2360, 2371, 2382, 2393, 2402, 2414, 2426, 2428, 2437, 2450, 2463, 2465, 2472, 2475, 2478, 2481, 2484, 2487, 2490, 2492, 2499, 2506, 2508, 2515, 2522, 2524, 2533, 2541, 2549, 2551, 2560, 2563, 2566, 2569, 2572, 2575, 2578, 2587, 2590, 2593, 2596, 2599, 2602, 2607, 2614, 2621, 2623, 2626, 2629, 2632, 2635, 2642, 2649, 2656, 2658, 2667, 2674, 2681, 2683, 2685, 2687, 2694, 2703, 2712, 2714, 2720, 2728, 2736, 2738, 2740, 
2742, 2747, 2750, 2753, 2756, 2758, 2766, 2769, 2772, 2775, 2783, 2791, 2793, 2800, 2807, 2809, 2812, 2819, 2822, 2828, 2834, 2836, 2844, 2847, 2852, 2855, 2861, 2867, 2869, 2875, 2881, 2884, 2887, 2890, 2893, 2896, 2904, 2910, 2916, 2918, 2924, 2930, 2932, 2939, 2942, 2945, 2948, 2951, 2957, 2959, 2961, 2971, 2974, 2977, 2980, 2983, 2988, 2995, 3002, 3004, 3007, 3010, 3013, 3018, 3027, 3036, 3038, 3040, 3042, 3048, 3055, 3062, 3064, 3067, 3070, 3073, 3076, 3079, 3085, 3091, 3093, 3100, 3107, 3109, 3119, 3121, 3123, 3126, 3129, 3132, 3135, 3138, 3144, 3152, 3160, 3162, 3172, 3200, 3228, 3256, 0 }; static const short _spss_commands_cond_targs[] = { 629, 0, 3, 2, 3, 629, 4, 3, 3, 2, 629, 3, 2, 6, 0, 7, 6, 7, 628, 6, 9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 628, 15, 17, 618, 17, 618, 0, 18, 18, 0, 19, 19, 0, 20, 21, 22, 20, 585, 585, 0, 20, 21, 22, 20, 23, 23, 0, 20, 21, 22, 20, 23, 23, 0, 21, 0, 24, 24, 0, 25, 25, 0, 26, 26, 0, 27, 28, 29, 27, 0, 27, 28, 29, 27, 30, 562, 575, 30, 562, 575, 0, 27, 28, 29, 27, 30, 562, 575, 30, 562, 575, 0, 28, 0, 31, 31, 0, 32, 583, 32, 583, 0, 33, 33, 0, 33, 34, 35, 33, 36, 0, 33, 34, 35, 33, 36, 0, 34, 0, 36, 37, 38, 36, 39, 580, 582, 582, 0, 36, 37, 38, 36, 39, 580, 582, 582, 0, 37, 0, 41, 40, 41, 40, 42, 43, 44, 42, 45, 0, 42, 43, 44, 42, 45, 30, 562, 575, 30, 562, 575, 0, 42, 43, 44, 42, 45, 30, 562, 575, 30, 562, 575, 0, 43, 0, 46, 47, 48, 46, 561, 0, 46, 47, 48, 46, 49, 49, 0, 46, 47, 48, 46, 49, 49, 0, 47, 0, 50, 51, 52, 50, 49, 49, 49, 49, 0, 50, 51, 52, 50, 53, 0, 50, 51, 52, 50, 53, 0, 51, 0, 54, 55, 56, 54, 557, 53, 0, 54, 55, 56, 54, 57, 63, 63, 0, 54, 55, 56, 54, 57, 63, 63, 0, 55, 0, 554, 554, 58, 0, 59, 58, 0, 60, 61, 62, 60, 0, 60, 61, 62, 60, 63, 63, 0, 60, 61, 62, 60, 63, 63, 0, 61, 0, 64, 65, 66, 64, 63, 63, 63, 63, 0, 64, 65, 66, 64, 67, 0, 64, 65, 66, 64, 67, 0, 65, 0, 68, 69, 70, 68, 550, 630, 45, 67, 0, 68, 69, 70, 68, 71, 630, 45, 63, 63, 0, 68, 69, 70, 68, 71, 630, 45, 63, 63, 0, 69, 0, 547, 547, 72, 
0, 73, 72, 0, 74, 75, 76, 74, 630, 45, 0, 74, 75, 76, 74, 630, 45, 63, 63, 0, 74, 75, 76, 74, 630, 45, 63, 63, 0, 75, 0, 78, 100, 78, 100, 0, 79, 79, 0, 80, 81, 82, 80, 0, 80, 81, 82, 80, 83, 83, 0, 80, 81, 82, 80, 83, 83, 0, 81, 0, 84, 84, 0, 85, 85, 0, 86, 86, 0, 87, 87, 0, 88, 89, 90, 88, 0, 88, 89, 90, 88, 91, 91, 0, 88, 89, 90, 88, 91, 91, 0, 89, 0, 92, 92, 0, 93, 93, 0, 94, 94, 0, 95, 95, 0, 96, 96, 0, 97, 97, 0, 97, 98, 99, 97, 628, 0, 97, 98, 99, 97, 628, 0, 98, 0, 101, 101, 0, 102, 102, 0, 103, 103, 0, 104, 104, 0, 97, 97, 0, 106, 143, 106, 143, 0, 107, 107, 0, 108, 108, 0, 109, 110, 111, 109, 0, 109, 110, 111, 109, 112, 112, 0, 109, 110, 111, 109, 112, 112, 0, 110, 0, 113, 113, 0, 114, 114, 0, 115, 115, 0, 116, 116, 0, 117, 117, 0, 118, 119, 120, 118, 0, 118, 119, 120, 118, 121, 121, 0, 118, 119, 120, 118, 121, 121, 0, 119, 0, 122, 123, 124, 122, 121, 121, 121, 121, 0, 122, 123, 124, 122, 125, 0, 122, 123, 124, 122, 125, 0, 123, 0, 126, 126, 0, 127, 128, 129, 127, 130, 126, 126, 126, 126, 0, 127, 128, 129, 127, 130, 0, 127, 128, 129, 127, 130, 0, 128, 0, 130, 131, 132, 130, 133, 139, 141, 142, 142, 0, 130, 131, 132, 130, 133, 139, 141, 142, 142, 0, 131, 0, 135, 134, 135, 134, 136, 137, 138, 136, 628, 125, 0, 136, 137, 138, 136, 628, 125, 0, 136, 137, 138, 136, 628, 125, 0, 137, 0, 135, 140, 135, 140, 136, 137, 138, 136, 628, 125, 141, 0, 136, 137, 138, 136, 628, 125, 142, 142, 142, 142, 0, 144, 144, 0, 145, 145, 0, 146, 146, 0, 147, 147, 0, 148, 148, 0, 149, 150, 151, 149, 0, 149, 150, 151, 149, 152, 152, 0, 149, 150, 151, 149, 152, 152, 0, 150, 0, 153, 154, 155, 153, 152, 152, 152, 152, 0, 153, 154, 155, 153, 156, 152, 152, 0, 153, 154, 155, 153, 156, 152, 152, 0, 154, 0, 157, 0, 158, 0, 159, 158, 0, 160, 0, 161, 160, 0, 162, 163, 164, 162, 628, 0, 162, 163, 164, 162, 628, 148, 0, 162, 163, 164, 162, 628, 148, 0, 163, 0, 166, 166, 0, 167, 167, 0, 168, 169, 170, 168, 0, 168, 169, 170, 168, 171, 538, 171, 538, 0, 168, 169, 170, 168, 171, 538, 171, 538, 0, 
169, 0, 172, 172, 0, 173, 173, 0, 174, 174, 0, 175, 176, 177, 175, 0, 175, 176, 177, 175, 178, 0, 175, 176, 177, 175, 178, 0, 176, 0, 196, 196, 179, 179, 0, 180, 181, 182, 180, 183, 179, 179, 179, 179, 0, 180, 181, 182, 180, 183, 0, 180, 181, 182, 180, 183, 0, 181, 0, 183, 184, 185, 183, 186, 192, 194, 195, 195, 0, 183, 184, 185, 183, 186, 192, 194, 195, 195, 0, 184, 0, 188, 187, 188, 187, 189, 190, 191, 189, 628, 178, 0, 189, 190, 191, 189, 628, 178, 0, 189, 190, 191, 189, 628, 178, 0, 190, 0, 188, 193, 188, 193, 189, 190, 191, 189, 628, 178, 194, 0, 189, 190, 191, 189, 628, 178, 195, 195, 195, 195, 0, 180, 181, 182, 180, 183, 197, 179, 197, 179, 179, 179, 0, 180, 181, 182, 180, 183, 198, 179, 198, 179, 179, 179, 0, 180, 181, 182, 180, 183, 199, 179, 199, 179, 179, 179, 0, 180, 181, 182, 180, 183, 200, 179, 200, 179, 179, 179, 0, 180, 181, 182, 180, 183, 201, 179, 201, 179, 179, 179, 0, 180, 181, 182, 180, 183, 202, 179, 202, 179, 179, 179, 0, 180, 181, 182, 180, 183, 203, 179, 203, 179, 179, 179, 0, 180, 181, 182, 180, 183, 204, 179, 204, 179, 179, 179, 0, 205, 206, 207, 205, 208, 179, 179, 179, 179, 0, 205, 206, 207, 205, 208, 0, 205, 206, 207, 205, 208, 0, 206, 0, 208, 209, 210, 208, 186, 192, 194, 211, 211, 0, 208, 209, 210, 208, 186, 192, 194, 211, 211, 0, 209, 0, 212, 213, 214, 212, 628, 178, 211, 211, 211, 211, 0, 212, 213, 214, 212, 628, 178, 215, 224, 229, 215, 224, 229, 0, 212, 213, 214, 212, 628, 178, 215, 224, 229, 215, 224, 229, 0, 213, 0, 533, 533, 216, 0, 217, 218, 219, 217, 628, 178, 216, 0, 217, 218, 219, 217, 628, 178, 220, 220, 0, 217, 218, 219, 217, 628, 178, 220, 220, 0, 218, 0, 221, 222, 223, 221, 220, 220, 220, 220, 0, 221, 222, 223, 221, 215, 224, 229, 215, 224, 229, 0, 221, 222, 223, 221, 215, 224, 229, 215, 224, 229, 0, 222, 0, 225, 225, 0, 226, 226, 0, 227, 227, 0, 228, 0, 217, 218, 219, 217, 628, 178, 228, 0, 230, 0, 217, 218, 219, 217, 631, 178, 230, 0, 217, 218, 219, 217, 628, 178, 231, 0, 233, 233, 0, 234, 234, 0, 97, 97, 0, 236, 
236, 0, 237, 237, 0, 238, 238, 0, 239, 239, 0, 240, 240, 0, 241, 241, 0, 242, 243, 244, 242, 0, 242, 243, 244, 242, 245, 245, 0, 242, 243, 244, 242, 245, 245, 0, 243, 0, 246, 246, 0, 247, 247, 0, 248, 248, 0, 249, 249, 0, 250, 250, 0, 251, 252, 253, 251, 0, 251, 252, 253, 251, 254, 254, 0, 251, 252, 253, 251, 254, 254, 0, 252, 0, 255, 256, 257, 255, 254, 254, 254, 254, 0, 255, 256, 257, 255, 258, 0, 255, 256, 257, 255, 258, 0, 256, 0, 259, 266, 0, 261, 260, 261, 260, 262, 0, 263, 264, 265, 263, 628, 0, 263, 264, 265, 263, 628, 254, 254, 0, 263, 264, 265, 263, 628, 254, 254, 0, 264, 0, 261, 267, 261, 267, 269, 269, 0, 270, 270, 0, 271, 271, 0, 272, 272, 0, 273, 273, 0, 274, 275, 276, 274, 0, 274, 275, 276, 274, 277, 277, 0, 274, 275, 276, 274, 277, 277, 0, 275, 0, 278, 279, 280, 278, 277, 277, 277, 277, 0, 278, 279, 280, 278, 281, 0, 278, 279, 280, 278, 281, 0, 279, 0, 282, 283, 0, 283, 0, 284, 285, 286, 284, 287, 289, 283, 0, 284, 285, 286, 284, 282, 283, 0, 284, 285, 286, 284, 282, 283, 0, 285, 0, 288, 0, 284, 285, 286, 284, 289, 288, 0, 289, 290, 291, 289, 292, 0, 289, 290, 291, 289, 292, 0, 290, 0, 293, 0, 294, 0, 295, 0, 296, 0, 297, 0, 297, 298, 299, 297, 97, 0, 297, 298, 299, 297, 97, 0, 298, 0, 301, 351, 301, 351, 0, 302, 302, 0, 303, 303, 0, 304, 305, 306, 304, 0, 304, 305, 306, 304, 307, 345, 307, 345, 0, 304, 305, 306, 304, 307, 345, 307, 345, 0, 305, 0, 308, 308, 0, 309, 309, 0, 310, 310, 0, 311, 311, 0, 312, 312, 0, 313, 313, 0, 314, 314, 0, 315, 315, 0, 316, 316, 0, 316, 317, 318, 316, 319, 340, 342, 0, 316, 317, 318, 316, 319, 340, 342, 0, 317, 0, 321, 320, 321, 320, 322, 323, 324, 322, 628, 337, 0, 322, 323, 324, 322, 628, 325, 0, 322, 323, 324, 322, 628, 325, 0, 323, 0, 325, 326, 327, 325, 328, 0, 325, 326, 327, 325, 328, 0, 326, 0, 329, 0, 330, 0, 331, 0, 332, 0, 333, 0, 334, 0, 335, 0, 336, 0, 97, 0, 337, 338, 339, 337, 628, 325, 328, 0, 337, 338, 339, 337, 628, 325, 328, 0, 338, 0, 321, 341, 321, 341, 342, 343, 344, 342, 319, 340, 0, 342, 343, 
344, 342, 319, 340, 0, 343, 0, 346, 346, 0, 347, 347, 0, 348, 348, 0, 349, 349, 0, 350, 350, 0, 316, 316, 0, 352, 376, 352, 376, 0, 353, 353, 0, 354, 354, 0, 355, 355, 0, 356, 357, 358, 356, 0, 356, 357, 358, 356, 359, 359, 0, 356, 357, 358, 356, 359, 359, 0, 357, 0, 360, 360, 0, 361, 374, 375, 361, 0, 362, 363, 364, 362, 365, 368, 370, 362, 371, 372, 372, 0, 362, 363, 364, 362, 365, 368, 370, 628, 362, 371, 372, 372, 0, 362, 363, 364, 362, 365, 368, 370, 628, 362, 371, 372, 372, 0, 363, 0, 367, 366, 367, 366, 362, 363, 364, 362, 365, 368, 370, 628, 362, 371, 372, 372, 0, 367, 369, 367, 369, 371, 0, 362, 363, 364, 362, 365, 368, 370, 628, 362, 371, 372, 372, 0, 362, 363, 364, 362, 365, 368, 370, 628, 372, 362, 373, 372, 372, 0, 362, 363, 364, 362, 365, 368, 370, 628, 372, 362, 373, 372, 372, 0, 362, 363, 364, 362, 365, 368, 370, 362, 371, 372, 372, 0, 374, 0, 377, 378, 379, 377, 0, 377, 378, 379, 377, 380, 380, 0, 377, 378, 379, 377, 380, 380, 0, 378, 0, 381, 382, 383, 381, 384, 380, 380, 380, 380, 0, 381, 382, 383, 381, 384, 0, 381, 382, 383, 381, 384, 0, 382, 0, 384, 385, 386, 384, 387, 390, 392, 393, 393, 0, 384, 385, 386, 384, 387, 390, 392, 393, 393, 0, 385, 0, 389, 388, 389, 388, 97, 98, 99, 97, 628, 0, 389, 391, 389, 391, 97, 98, 99, 97, 628, 392, 0, 97, 98, 99, 97, 628, 393, 393, 393, 393, 0, 395, 395, 0, 396, 499, 396, 499, 0, 397, 397, 0, 398, 398, 0, 399, 400, 401, 399, 0, 399, 400, 401, 399, 402, 402, 0, 399, 400, 401, 399, 402, 402, 0, 400, 0, 403, 403, 0, 404, 404, 0, 405, 405, 0, 406, 406, 0, 407, 407, 0, 408, 409, 410, 408, 0, 408, 409, 410, 408, 411, 414, 414, 0, 408, 409, 410, 408, 411, 414, 414, 0, 409, 0, 411, 412, 413, 411, 414, 414, 0, 411, 412, 413, 411, 414, 414, 0, 412, 0, 415, 416, 417, 415, 414, 414, 414, 414, 0, 415, 416, 417, 415, 418, 430, 432, 448, 449, 460, 460, 0, 415, 416, 417, 415, 418, 430, 432, 448, 449, 460, 460, 0, 416, 0, 420, 419, 420, 419, 421, 422, 423, 421, 0, 421, 422, 423, 421, 424, 453, 0, 421, 422, 423, 421, 424, 453, 
0, 422, 0, 426, 425, 426, 425, 427, 428, 429, 427, 628, 434, 0, 427, 428, 429, 427, 418, 430, 432, 628, 434, 449, 0, 427, 428, 429, 427, 418, 430, 432, 628, 434, 449, 0, 428, 0, 420, 431, 420, 431, 433, 0, 421, 422, 423, 421, 433, 0, 434, 435, 436, 434, 628, 437, 437, 414, 414, 0, 434, 435, 436, 434, 628, 437, 437, 414, 414, 0, 435, 0, 415, 416, 417, 415, 438, 414, 438, 414, 414, 414, 0, 415, 416, 417, 415, 439, 414, 439, 414, 414, 414, 0, 415, 416, 417, 415, 440, 414, 440, 414, 414, 414, 0, 415, 416, 417, 415, 441, 414, 441, 414, 414, 414, 0, 415, 416, 417, 415, 442, 414, 442, 414, 414, 414, 0, 415, 416, 417, 415, 443, 414, 443, 414, 414, 414, 0, 415, 416, 417, 415, 444, 414, 444, 414, 414, 414, 0, 445, 446, 447, 445, 414, 414, 414, 414, 0, 445, 446, 447, 445, 418, 430, 432, 448, 461, 461, 449, 460, 460, 0, 445, 446, 447, 445, 418, 430, 432, 448, 461, 461, 449, 460, 460, 0, 446, 0, 421, 422, 423, 421, 0, 450, 451, 452, 450, 449, 0, 450, 451, 452, 450, 424, 453, 455, 0, 450, 451, 452, 450, 424, 453, 455, 0, 451, 0, 426, 454, 426, 454, 456, 457, 458, 456, 0, 456, 457, 458, 456, 459, 0, 456, 457, 458, 456, 459, 0, 457, 0, 421, 422, 423, 421, 459, 0, 415, 416, 417, 415, 460, 460, 460, 460, 0, 415, 416, 417, 415, 462, 460, 462, 460, 460, 460, 0, 415, 416, 417, 415, 463, 460, 463, 460, 460, 460, 0, 415, 416, 417, 415, 464, 460, 464, 460, 460, 460, 0, 415, 416, 417, 415, 465, 460, 465, 460, 460, 460, 0, 466, 467, 468, 466, 460, 460, 460, 460, 0, 466, 467, 468, 466, 418, 430, 432, 448, 449, 469, 469, 0, 466, 467, 468, 466, 418, 430, 432, 448, 449, 469, 469, 0, 467, 0, 470, 471, 472, 470, 469, 469, 469, 469, 0, 470, 471, 472, 470, 418, 430, 473, 432, 448, 449, 498, 498, 0, 470, 471, 472, 470, 418, 430, 473, 432, 448, 449, 498, 498, 0, 471, 0, 474, 492, 494, 474, 492, 494, 0, 475, 475, 0, 476, 476, 0, 477, 477, 0, 478, 478, 0, 479, 479, 0, 480, 480, 0, 481, 0, 481, 482, 483, 481, 628, 484, 0, 481, 482, 483, 481, 628, 484, 0, 482, 0, 484, 485, 486, 484, 487, 487, 0, 484, 
485, 486, 484, 487, 487, 0, 485, 0, 488, 489, 490, 488, 487, 487, 487, 487, 0, 488, 489, 490, 488, 473, 491, 491, 0, 488, 489, 490, 488, 473, 491, 491, 0, 489, 0, 488, 489, 490, 488, 491, 491, 491, 491, 0, 493, 493, 0, 476, 476, 0, 495, 495, 0, 496, 496, 0, 497, 497, 0, 480, 480, 0, 470, 471, 472, 470, 498, 498, 498, 498, 0, 500, 500, 0, 501, 501, 0, 502, 502, 0, 503, 503, 0, 504, 504, 0, 505, 506, 507, 505, 0, 505, 506, 507, 505, 508, 508, 0, 505, 506, 507, 505, 508, 508, 0, 506, 0, 509, 509, 0, 510, 510, 0, 511, 511, 0, 512, 512, 0, 513, 514, 515, 513, 532, 532, 0, 513, 514, 515, 513, 516, 516, 0, 513, 514, 515, 513, 516, 516, 0, 514, 0, 517, 518, 519, 517, 516, 516, 516, 516, 0, 517, 518, 519, 517, 520, 530, 0, 517, 518, 519, 517, 520, 530, 0, 518, 0, 522, 521, 522, 521, 523, 524, 525, 523, 628, 526, 0, 523, 524, 525, 523, 628, 526, 516, 516, 0, 523, 524, 525, 523, 628, 526, 516, 516, 0, 524, 0, 527, 528, 529, 527, 628, 0, 527, 528, 529, 527, 628, 516, 516, 0, 527, 528, 529, 527, 628, 516, 516, 0, 528, 0, 522, 531, 522, 531, 513, 514, 515, 513, 0, 534, 534, 0, 535, 535, 0, 536, 536, 0, 537, 0, 217, 218, 219, 217, 628, 178, 537, 0, 539, 539, 0, 540, 540, 0, 541, 541, 0, 541, 542, 543, 541, 387, 390, 544, 0, 541, 542, 543, 541, 387, 390, 544, 0, 542, 0, 544, 545, 546, 544, 387, 390, 0, 544, 545, 546, 544, 387, 390, 0, 545, 0, 548, 549, 0, 74, 75, 76, 74, 630, 45, 0, 548, 549, 0, 550, 551, 552, 550, 553, 0, 550, 551, 552, 550, 553, 0, 551, 0, 68, 69, 70, 68, 630, 45, 553, 0, 555, 556, 0, 60, 61, 62, 60, 0, 555, 556, 0, 557, 558, 559, 557, 560, 0, 557, 558, 559, 557, 560, 0, 558, 0, 54, 55, 56, 54, 560, 0, 46, 47, 48, 46, 561, 0, 563, 563, 0, 564, 564, 0, 565, 565, 0, 566, 566, 0, 567, 567, 0, 568, 569, 570, 568, 571, 568, 568, 0, 568, 569, 570, 568, 571, 0, 568, 569, 570, 568, 571, 0, 569, 0, 571, 572, 573, 571, 574, 0, 571, 572, 573, 571, 574, 0, 572, 0, 42, 43, 44, 42, 45, 574, 0, 576, 576, 0, 577, 577, 0, 578, 578, 0, 579, 579, 0, 42, 43, 44, 42, 45, 0, 41, 581, 
41, 581, 42, 43, 44, 42, 45, 582, 582, 582, 582, 0, 584, 584, 0, 579, 579, 0, 586, 586, 0, 587, 587, 0, 588, 589, 590, 588, 0, 588, 589, 590, 588, 591, 591, 0, 588, 589, 590, 588, 591, 591, 0, 589, 0, 592, 592, 0, 593, 593, 0, 594, 594, 0, 595, 596, 597, 595, 0, 595, 596, 597, 595, 598, 616, 615, 615, 0, 595, 596, 597, 595, 598, 616, 615, 615, 0, 596, 0, 600, 599, 600, 599, 601, 602, 603, 601, 628, 0, 601, 602, 603, 601, 604, 604, 0, 601, 602, 603, 601, 604, 604, 0, 602, 0, 605, 605, 0, 606, 606, 0, 607, 607, 0, 608, 608, 0, 609, 609, 0, 609, 610, 611, 609, 612, 0, 609, 610, 611, 609, 612, 0, 610, 0, 612, 613, 614, 612, 615, 615, 0, 612, 613, 614, 612, 615, 615, 0, 613, 0, 601, 602, 603, 601, 628, 615, 615, 615, 615, 0, 600, 617, 600, 617, 619, 619, 0, 620, 620, 0, 621, 621, 0, 622, 622, 0, 623, 623, 0, 624, 625, 626, 624, 628, 0, 624, 625, 626, 624, 628, 627, 627, 0, 624, 625, 626, 624, 628, 627, 627, 0, 625, 0, 624, 625, 626, 624, 628, 627, 627, 627, 627, 0, 628, 629, 1, 628, 2, 5, 8, 16, 77, 105, 165, 83, 232, 235, 268, 300, 394, 16, 77, 105, 165, 83, 232, 235, 268, 300, 394, 0, 628, 629, 1, 628, 2, 5, 8, 16, 77, 105, 165, 83, 232, 235, 268, 300, 394, 16, 77, 105, 165, 83, 232, 235, 268, 300, 394, 0, 628, 629, 1, 628, 2, 5, 8, 16, 77, 105, 165, 83, 232, 235, 268, 300, 394, 16, 77, 105, 165, 83, 232, 235, 268, 300, 394, 0, 628, 629, 1, 628, 2, 5, 8, 16, 77, 105, 165, 83, 232, 235, 268, 300, 394, 16, 77, 105, 165, 83, 232, 235, 268, 300, 394, 231, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 
526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 0 }; static const short _spss_commands_cond_actions[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 75, 17, 19, 0, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 39, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 84, 84, 84, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 21, 21, 21, 21, 120, 0, 0, 0, 45, 45, 45, 45, 3, 1, 0, 0, 0, 0, 0, 0, 69, 69, 0, 21, 21, 21, 21, 21, 124, 124, 0, 0, 0, 0, 0, 39, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 69, 0, 21, 21, 21, 21, 124, 124, 0, 0, 0, 84, 84, 84, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 21, 21, 21, 21, 120, 0, 0, 0, 48, 48, 48, 48, 3, 48, 48, 1, 0, 0, 0, 0, 0, 0, 0, 0, 69, 69, 0, 21, 21, 21, 21, 21, 21, 21, 124, 124, 0, 0, 0, 0, 0, 39, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 69, 69, 0, 21, 21, 21, 21, 21, 21, 124, 124, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 23, 23, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 75, 17, 19, 0, 5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 72, 13, 15, 0, 0, 0, 0, 0, 0, 0, 1, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 0, 0, 39, 0, 0, 1, 0, 39, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 23, 23, 23, 23, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 75, 17, 19, 0, 60, 60, 60, 60, 60, 60, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 72, 13, 15, 0, 9, 9, 9, 9, 9, 9, 1, 0, 90, 90, 90, 90, 90, 90, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 
0, 0, 0, 0, 0, 0, 0, 87, 87, 87, 87, 87, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 140, 140, 140, 140, 90, 90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 39, 0, 96, 96, 96, 96, 96, 96, 1, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 84, 84, 84, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 102, 102, 102, 102, 102, 102, 1, 0, 39, 0, 99, 99, 99, 99, 99, 99, 1, 0, 7, 7, 7, 7, 7, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 84, 84, 84, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 75, 17, 19, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 72, 13, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 39, 0, 39, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 39, 0, 21, 21, 21, 21, 21, 120, 0, 0, 0, 39, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 75, 17, 19, 0, 5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 72, 13, 15, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 75, 17, 19, 0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 108, 66, 66, 0, 72, 13, 15, 0, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 23, 23, 0, 25, 25, 25, 25, 25, 25, 25, 25, 0, 25, 128, 93, 93, 0, 25, 25, 25, 25, 25, 25, 25, 25, 0, 25, 132, 93, 93, 0, 21, 21, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 75, 17, 19, 0, 5, 5, 5, 5, 5, 0, 72, 13, 15, 0, 0, 0, 0, 0, 0, 1, 0, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 75, 17, 19, 0, 116, 116, 116, 116, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 75, 17, 19, 0, 63, 63, 63, 63, 112, 112, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 39, 0, 21, 21, 21, 21, 21, 21, 21, 78, 78, 120, 0, 0, 0, 72, 13, 15, 0, 39, 0, 33, 33, 33, 33, 1, 0, 0, 0, 0, 0, 0, 23, 23, 23, 23, 0, 21, 21, 21, 21, 21, 81, 81, 81, 81, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 
0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 21, 21, 81, 81, 120, 81, 81, 0, 0, 0, 31, 31, 31, 31, 0, 35, 35, 35, 35, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 72, 13, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 144, 0, 21, 21, 21, 21, 148, 0, 0, 0, 37, 37, 37, 37, 1, 0, 136, 136, 136, 136, 0, 0, 0, 0, 0, 136, 136, 136, 136, 0, 0, 0, 0, 0, 0, 0, 136, 136, 136, 136, 0, 0, 0, 0, 0, 0, 0, 136, 136, 136, 136, 0, 0, 0, 0, 0, 0, 0, 136, 136, 136, 136, 0, 0, 0, 0, 0, 0, 0, 136, 136, 136, 136, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 158, 158, 158, 158, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 23, 23, 0, 21, 21, 21, 21, 21, 21, 21, 21, 21, 120, 81, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 153, 153, 153, 153, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 136, 136, 136, 136, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 136, 136, 136, 136, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 84, 84, 84, 84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 75, 17, 19, 0, 57, 57, 57, 57, 57, 57, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 72, 13, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 105, 105, 105, 105, 105, 105, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 39, 0, 27, 27, 27, 27, 27, 27, 0, 0, 1, 0, 0, 0, 0, 0, 39, 0, 21, 21, 21, 21, 120, 0, 0, 0, 54, 54, 54, 54, 54, 54, 1, 0, 0, 39, 0, 27, 27, 27, 27, 0, 0, 1, 0, 0, 0, 0, 0, 39, 0, 21, 21, 21, 21, 120, 0, 0, 0, 51, 51, 51, 51, 1, 0, 29, 29, 29, 29, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 39, 0, 21, 21, 21, 21, 120, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 72, 13, 15, 0, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 75, 17, 19, 0, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 81, 81, 0, 0, 0, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 72, 13, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 21, 21, 21, 21, 21, 81, 81, 0, 0, 0, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 7, 0, 0 }; static const short _spss_commands_eof_trans[] = { 3286, 3287, 3288, 3289, 3290, 3291, 3292, 3293, 3294, 3295, 3296, 3297, 3298, 3299, 3300, 3301, 3302, 3303, 3304, 3305, 3306, 3307, 3308, 3309, 3310, 3311, 3312, 3313, 3314, 3315, 3316, 3317, 3318, 3319, 3320, 3321, 3322, 3323, 3324, 3325, 3326, 3327, 3328, 3329, 3330, 3331, 3332, 3333, 3334, 3335, 3336, 3337, 3338, 3339, 
3340, 3341, 3342, 3343, 3344, 3345, 3346, 3347, 3348, 3349, 3350, 3351, 3352, 3353, 3354, 3355, 3356, 3357, 3358, 3359, 3360, 3361, 3362, 3363, 3364, 3365, 3366, 3367, 3368, 3369, 3370, 3371, 3372, 3373, 3374, 3375, 3376, 3377, 3378, 3379, 3380, 3381, 3382, 3383, 3384, 3385, 3386, 3387, 3388, 3389, 3390, 3391, 3392, 3393, 3394, 3395, 3396, 3397, 3398, 3399, 3400, 3401, 3402, 3403, 3404, 3405, 3406, 3407, 3408, 3409, 3410, 3411, 3412, 3413, 3414, 3415, 3416, 3417, 3418, 3419, 3420, 3421, 3422, 3423, 3424, 3425, 3426, 3427, 3428, 3429, 3430, 3431, 3432, 3433, 3434, 3435, 3436, 3437, 3438, 3439, 3440, 3441, 3442, 3443, 3444, 3445, 3446, 3447, 3448, 3449, 3450, 3451, 3452, 3453, 3454, 3455, 3456, 3457, 3458, 3459, 3460, 3461, 3462, 3463, 3464, 3465, 3466, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486, 3487, 3488, 3489, 3490, 3491, 3492, 3493, 3494, 3495, 3496, 3497, 3498, 3499, 3500, 3501, 3502, 3503, 3504, 3505, 3506, 3507, 3508, 3509, 3510, 3511, 3512, 3513, 3514, 3515, 3516, 3517, 3518, 3519, 3520, 3521, 3522, 3523, 3524, 3525, 3526, 3527, 3528, 3529, 3530, 3531, 3532, 3533, 3534, 3535, 3536, 3537, 3538, 3539, 3540, 3541, 3542, 3543, 3544, 3545, 3546, 3547, 3548, 3549, 3550, 3551, 3552, 3553, 3554, 3555, 3556, 3557, 3558, 3559, 3560, 3561, 3562, 3563, 3564, 3565, 3566, 3567, 3568, 3569, 3570, 3571, 3572, 3573, 3574, 3575, 3576, 3577, 3578, 3579, 3580, 3581, 3582, 3583, 3584, 3585, 3586, 3587, 3588, 3589, 3590, 3591, 3592, 3593, 3594, 3595, 3596, 3597, 3598, 3599, 3600, 3601, 3602, 3603, 3604, 3605, 3606, 3607, 3608, 3609, 3610, 3611, 3612, 3613, 3614, 3615, 3616, 3617, 3618, 3619, 3620, 3621, 3622, 3623, 3624, 3625, 3626, 3627, 3628, 3629, 3630, 3631, 3632, 3633, 3634, 3635, 3636, 3637, 3638, 3639, 3640, 3641, 3642, 3643, 3644, 3645, 3646, 3647, 3648, 3649, 3650, 3651, 3652, 3653, 3654, 3655, 3656, 3657, 3658, 3659, 3660, 3661, 3662, 3663, 3664, 3665, 3666, 3667, 3668, 3669, 3670, 3671, 3672, 
3673, 3674, 3675, 3676, 3677, 3678, 3679, 3680, 3681, 3682, 3683, 3684, 3685, 3686, 3687, 3688, 3689, 3690, 3691, 3692, 3693, 3694, 3695, 3696, 3697, 3698, 3699, 3700, 3701, 3702, 3703, 3704, 3705, 3706, 3707, 3708, 3709, 3710, 3711, 3712, 3713, 3714, 3715, 3716, 3717, 3718, 3719, 3720, 3721, 3722, 3723, 3724, 3725, 3726, 3727, 3728, 3729, 3730, 3731, 3732, 3733, 3734, 3735, 3736, 3737, 3738, 3739, 3740, 3741, 3742, 3743, 3744, 3745, 3746, 3747, 3748, 3749, 3750, 3751, 3752, 3753, 3754, 3755, 3756, 3757, 3758, 3759, 3760, 3761, 3762, 3763, 3764, 3765, 3766, 3767, 3768, 3769, 3770, 3771, 3772, 3773, 3774, 3775, 3776, 3777, 3778, 3779, 3780, 3781, 3782, 3783, 3784, 3785, 3786, 3787, 3788, 3789, 3790, 3791, 3792, 3793, 3794, 3795, 3796, 3797, 3798, 3799, 3800, 3801, 3802, 3803, 3804, 3805, 3806, 3807, 3808, 3809, 3810, 3811, 3812, 3813, 3814, 3815, 3816, 3817, 3818, 3819, 3820, 3821, 3822, 3823, 3824, 3825, 3826, 3827, 3828, 3829, 3830, 3831, 3832, 3833, 3834, 3835, 3836, 3837, 3838, 3839, 3840, 3841, 3842, 3843, 3844, 3845, 3846, 3847, 3848, 3849, 3850, 3851, 3852, 3853, 3854, 3855, 3856, 3857, 3858, 3859, 3860, 3861, 3862, 3863, 3864, 3865, 3866, 3867, 3868, 3869, 3870, 3871, 3872, 3873, 3874, 3875, 3876, 3877, 3878, 3879, 3880, 3881, 3882, 3883, 3884, 3885, 3886, 3887, 3888, 3889, 3890, 3891, 3892, 3893, 3894, 3895, 3896, 3897, 3898, 3899, 3900, 3901, 3902, 3903, 3904, 3905, 3906, 3907, 3908, 3909, 3910, 3911, 3912, 3913, 3914, 3915, 3916, 3917, 0 }; static const int spss_commands_start = 628; static const int spss_commands_en_main = 628; #line 14 "src/txt/readstat_spss_commands_read.rl" readstat_schema_t *readstat_parse_spss_commands(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError) { if (parser->io->open(filepath, parser->io->io_ctx) == -1) { if (outError) *outError = READSTAT_ERROR_OPEN; return NULL; } readstat_schema_t *schema = NULL; unsigned char *bytes = NULL; readstat_error_t error = READSTAT_OK; ssize_t len = 
parser->io->seek(0, READSTAT_SEEK_END, parser->io->io_ctx); if (len == -1) { error = READSTAT_ERROR_SEEK; goto cleanup; } parser->io->seek(0, READSTAT_SEEK_SET, parser->io->io_ctx); bytes = malloc(len); parser->io->read(bytes, len, parser->io->io_ctx); unsigned char *p = bytes; unsigned char *pe = bytes + len; unsigned char *eof = pe; unsigned char *str_start = NULL; size_t str_len = 0; int cs; int i; int line_no = 0; uint64_t first_integer = 0, integer = 0; double double_value = NAN; unsigned char *line_start = p; char varname[32]; char argname[32]; char string_value[32]; char buf[1024]; char var_list[1024][32]; long var_col = 0; long var_row = 0; long var_len = 0; long var_count = 0; readstat_type_t var_type = READSTAT_TYPE_DOUBLE; label_type_t label_type = LABEL_TYPE_DOUBLE; int labelset_count = 0; if ((schema = calloc(1, sizeof(readstat_schema_t))) == NULL) { error = READSTAT_ERROR_MALLOC; goto cleanup; } schema->rows_per_observation = 1; #line 1893 "src/txt/readstat_spss_commands_read.c" { cs = (int)spss_commands_start; } #line 1898 "src/txt/readstat_spss_commands_read.c" { int _klen; unsigned int _trans = 0; const char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe && p != eof ) goto _out; if ( p == eof ) { if ( _spss_commands_eof_trans[cs] > 0 ) { _trans = (unsigned int)_spss_commands_eof_trans[cs] - 1; } } else { _keys = ( _spss_commands_trans_keys + (_spss_commands_key_offsets[cs])); _trans = (unsigned int)_spss_commands_index_offsets[cs]; _klen = (int)_spss_commands_single_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + _klen - 1; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = 
(int)_spss_commands_range_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + (_klen<<1) - 2; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} } cs = (int)_spss_commands_cond_targs[_trans]; if ( _spss_commands_cond_actions[_trans] != 0 ) { _acts = ( _spss_commands_actions + (_spss_commands_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 78 "src/txt/readstat_spss_commands_read.rl" integer = 0; } #line 1983 "src/txt/readstat_spss_commands_read.c" break; } case 1: { { #line 82 "src/txt/readstat_spss_commands_read.rl" integer = 10 * integer + ((( (*( p)))) - '0'); } #line 1994 "src/txt/readstat_spss_commands_read.c" break; } case 2: { { #line 86 "src/txt/readstat_spss_commands_read.rl" var_col = integer - 1; var_len = 1; } #line 2006 "src/txt/readstat_spss_commands_read.c" break; } case 3: { { #line 91 "src/txt/readstat_spss_commands_read.rl" var_len = integer - var_col; } #line 2017 "src/txt/readstat_spss_commands_read.c" break; } case 4: { { #line 95 "src/txt/readstat_spss_commands_read.rl" readstat_copy_quoted(buf, sizeof(buf), (char *)str_start, str_len); } #line 2028 "src/txt/readstat_spss_commands_read.c" break; } case 5: { { #line 99 "src/txt/readstat_spss_commands_read.rl" readstat_copy_quoted(string_value, sizeof(string_value), (char *)str_start, str_len); } #line 2039 "src/txt/readstat_spss_commands_read.c" break; } case 6: { { #line 107 "src/txt/readstat_spss_commands_read.rl" readstat_copy(varname, sizeof(varname), (char *)str_start, str_len); } #line 2050 "src/txt/readstat_spss_commands_read.c" break; } case 7: { { #line 111 
"src/txt/readstat_spss_commands_read.rl" readstat_copy(argname, sizeof(argname), (char *)str_start, str_len); } #line 2061 "src/txt/readstat_spss_commands_read.c" break; } case 8: { { #line 115 "src/txt/readstat_spss_commands_read.rl" readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); entry->variable.type = var_type; entry->row = var_row; entry->col = var_col; entry->len = var_len; } #line 2076 "src/txt/readstat_spss_commands_read.c" break; } case 9: { { #line 123 "src/txt/readstat_spss_commands_read.rl" readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); readstat_copy(entry->variable.label, sizeof(entry->variable.label), buf, sizeof(buf)); } #line 2088 "src/txt/readstat_spss_commands_read.c" break; } case 10: { { #line 128 "src/txt/readstat_spss_commands_read.rl" var_count = 0; } #line 2099 "src/txt/readstat_spss_commands_read.c" break; } case 11: { { #line 132 "src/txt/readstat_spss_commands_read.rl" if (var_count < sizeof(var_list)/sizeof(var_list[0])) { memcpy(var_list[var_count++], varname, sizeof(varname)); } } #line 2112 "src/txt/readstat_spss_commands_read.c" break; } case 12: { { #line 138 "src/txt/readstat_spss_commands_read.rl" if (strcasecmp(argname, "FIRSTCASE") == 0) { schema->first_line = integer; } if (strcasecmp(argname, "DELIMITERS") == 0) { schema->field_delimiter = buf[0]; } } #line 2128 "src/txt/readstat_spss_commands_read.c" break; } case 13: { { #line 147 "src/txt/readstat_spss_commands_read.rl" char labelset_name[256]; snprintf(labelset_name, sizeof(labelset_name), "labels%d", labelset_count++); for (i=0; i<var_count; i++) { readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, var_list[i]); readstat_copy(entry->labelset, sizeof(entry->labelset), labelset_name, sizeof(labelset_name)); } } #line 2144 "src/txt/readstat_spss_commands_read.c" break; } case 14: { { #line 156 "src/txt/readstat_spss_commands_read.rl" char labelset_name[256]; snprintf(labelset_name, sizeof(labelset_name), "labels%d", labelset_count); error = submit_value_label(parser, labelset_name, label_type, 
first_integer, integer, double_value, string_value, buf, user_ctx); if (error != READSTAT_OK) goto cleanup; } #line 2160 "src/txt/readstat_spss_commands_read.c" break; } case 15: { { #line 165 "src/txt/readstat_spss_commands_read.rl" str_start = p; } #line 2169 "src/txt/readstat_spss_commands_read.c" break; } case 16: { { #line 165 "src/txt/readstat_spss_commands_read.rl" str_len = p - str_start; } #line 2178 "src/txt/readstat_spss_commands_read.c" break; } case 17: { { #line 167 "src/txt/readstat_spss_commands_read.rl" str_start = p; } #line 2187 "src/txt/readstat_spss_commands_read.c" break; } case 18: { { #line 167 "src/txt/readstat_spss_commands_read.rl" str_len = p - str_start; } #line 2196 "src/txt/readstat_spss_commands_read.c" break; } case 19: { { #line 171 "src/txt/readstat_spss_commands_read.rl" line_no++; line_start = p; } #line 2205 "src/txt/readstat_spss_commands_read.c" break; } case 20: { { #line 173 "src/txt/readstat_spss_commands_read.rl" str_start = p; } #line 2214 "src/txt/readstat_spss_commands_read.c" break; } case 21: { { #line 173 "src/txt/readstat_spss_commands_read.rl" str_len = p - str_start; } #line 2223 "src/txt/readstat_spss_commands_read.c" break; } case 22: { { #line 191 "src/txt/readstat_spss_commands_read.rl" var_type = READSTAT_TYPE_STRING; } #line 2232 "src/txt/readstat_spss_commands_read.c" break; } case 23: { { #line 194 "src/txt/readstat_spss_commands_read.rl" var_type = READSTAT_TYPE_STRING; } #line 2241 "src/txt/readstat_spss_commands_read.c" break; } case 24: { { #line 195 "src/txt/readstat_spss_commands_read.rl" var_type = READSTAT_TYPE_DOUBLE; } #line 2250 "src/txt/readstat_spss_commands_read.c" break; } case 25: { { #line 196 "src/txt/readstat_spss_commands_read.rl" var_type = READSTAT_TYPE_DOUBLE; } #line 2259 "src/txt/readstat_spss_commands_read.c" break; } case 26: { { #line 197 "src/txt/readstat_spss_commands_read.rl" var_type = READSTAT_TYPE_STRING; } #line 2268 "src/txt/readstat_spss_commands_read.c" break; } case 
27: { { #line 218 "src/txt/readstat_spss_commands_read.rl" var_row = integer - 1; } #line 2277 "src/txt/readstat_spss_commands_read.c" break; } case 28: { { #line 219 "src/txt/readstat_spss_commands_read.rl" var_type = READSTAT_TYPE_DOUBLE; } #line 2286 "src/txt/readstat_spss_commands_read.c" break; } case 29: { { #line 220 "src/txt/readstat_spss_commands_read.rl" var_type = READSTAT_TYPE_DOUBLE; } #line 2295 "src/txt/readstat_spss_commands_read.c" break; } case 30: { { #line 253 "src/txt/readstat_spss_commands_read.rl" label_type = -1; } #line 2304 "src/txt/readstat_spss_commands_read.c" break; } case 31: { { #line 259 "src/txt/readstat_spss_commands_read.rl" label_type = LABEL_TYPE_DOUBLE; double_value = -(double)integer; } #line 2313 "src/txt/readstat_spss_commands_read.c" break; } case 32: { { #line 260 "src/txt/readstat_spss_commands_read.rl" label_type = LABEL_TYPE_DOUBLE; double_value = integer; } #line 2322 "src/txt/readstat_spss_commands_read.c" break; } case 33: { { #line 261 "src/txt/readstat_spss_commands_read.rl" first_integer = integer; } #line 2331 "src/txt/readstat_spss_commands_read.c" break; } case 34: { { #line 261 "src/txt/readstat_spss_commands_read.rl" label_type = LABEL_TYPE_RANGE; } #line 2340 "src/txt/readstat_spss_commands_read.c" break; } case 35: { { #line 262 "src/txt/readstat_spss_commands_read.rl" label_type = LABEL_TYPE_STRING; } #line 2349 "src/txt/readstat_spss_commands_read.c" break; } } _nacts -= 1; _acts += 1; } } if ( p == eof ) { if ( cs >= 628 ) goto _out; } else { if ( cs != 0 ) { p += 1; goto _resume; } } _out: {} } #line 312 "src/txt/readstat_spss_commands_read.rl" /* suppress warnings */ (void)spss_commands_en_main; if (cs < #line 2380 "src/txt/readstat_spss_commands_read.c" 628 #line 317 "src/txt/readstat_spss_commands_read.rl" ) { char error_buf[1024]; if (p == pe) { snprintf(error_buf, sizeof(error_buf), "Error parsing SPSS command file (end-of-file unexpectedly reached)"); } else { snprintf(error_buf, 
sizeof(error_buf), "Error parsing SPSS command file around line #%d, col #%ld (%c)", line_no + 1, (long)(p - line_start + 1), *p); } if (parser->handlers.error) { parser->handlers.error(error_buf, user_ctx); } error = READSTAT_ERROR_PARSE; goto cleanup; } error = submit_columns(parser, schema, user_ctx); cleanup: parser->io->close(parser->io->io_ctx); free(bytes); if (error != READSTAT_OK) { if (outError) *outError = error; readstat_schema_free(schema); schema = NULL; } return schema; } haven/src/readstat/txt/readstat_sas_commands_read.rl0000644000176200001440000003466114101007206022461 0ustar liggesusers#include <stdlib.h> #include "../readstat.h" #include "../readstat_strings.h" #include "readstat_schema.h" #include "readstat_copy.h" #include "commands_util.h" %%{ machine sas_commands; write data noerror nofinal; }%% readstat_schema_t *readstat_parse_sas_commands(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError) { if (parser->io->open(filepath, parser->io->io_ctx) == -1) { if (outError) *outError = READSTAT_ERROR_OPEN; return NULL; } readstat_schema_t *schema = NULL; unsigned char *bytes = NULL; readstat_error_t error = READSTAT_OK; ssize_t len = parser->io->seek(0, READSTAT_SEEK_END, parser->io->io_ctx); if (len == -1) { error = READSTAT_ERROR_SEEK; goto cleanup; } parser->io->seek(0, READSTAT_SEEK_SET, parser->io->io_ctx); bytes = malloc(len); parser->io->read(bytes, len, parser->io->io_ctx); unsigned char *p = bytes; unsigned char *pe = bytes + len; unsigned char *eof = pe; unsigned char *str_start = NULL; size_t str_len = 0; int cs; double double_value = NAN; uint64_t first_integer = 0; uint64_t integer = 0; int line_no = 0; unsigned char *line_start = p; char varname[32]; char argname[32]; char labelset[32]; char string_value[32]; char buf[1024]; readstat_type_t var_type = READSTAT_TYPE_DOUBLE; label_type_t label_type = LABEL_TYPE_DOUBLE; int var_row = 0, var_col = 0; int var_len = 0; if ((schema = calloc(1, 
sizeof(readstat_schema_t))) == NULL) { error = READSTAT_ERROR_MALLOC; goto cleanup; } schema->rows_per_observation = 1; %%{ action start_integer { integer = 0; } action incr_integer { integer = 10 * integer + (fc - '0'); } action incr_hex_integer { int value = 0; if (fc >= '0' && fc <= '9') { value = fc - '0'; } else if (fc >= 'A' && fc <= 'F') { value = fc - 'A' + 10; } else if (fc >= 'a' && fc <= 'f') { value = fc - 'a' + 10; } integer = 16 * integer + value; } action copy_pos { var_col = integer - 1; var_len = 1; } action set_len { var_len = integer - var_col; } action set_str { var_type = READSTAT_TYPE_STRING; } action set_dbl { var_type = READSTAT_TYPE_DOUBLE; } action copy_buf { readstat_copy(buf, sizeof(buf), (char *)str_start, str_len); } action copy_labelset { readstat_copy(labelset, sizeof(labelset), (char *)str_start, str_len); } action copy_string { readstat_copy(string_value, sizeof(string_value), (char *)str_start, str_len); } action copy_argname { readstat_copy(argname, sizeof(argname), (char *)str_start, str_len); } action copy_varname { readstat_copy_lower(varname, sizeof(varname), (char *)str_start, str_len); } action handle_arg { if (strcasecmp(argname, "firstobs") == 0) { schema->first_line = integer; } if (strcasecmp(argname, "dlm") == 0) { schema->field_delimiter = integer ? 
integer : buf[0]; } } action handle_var { readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); entry->variable.type = var_type; entry->row = var_row; entry->col = var_col; entry->len = var_len; } action handle_var_len { readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); entry->len = var_len; } action handle_var_label { readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); readstat_copy(entry->variable.label, sizeof(entry->variable.label), buf, sizeof(buf)); } action handle_var_labelset { readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); readstat_copy(entry->labelset, sizeof(entry->labelset), labelset, sizeof(labelset)); } action handle_value_label { error = submit_value_label(parser, labelset, label_type, first_integer, integer, double_value, string_value, buf, user_ctx); if (error != READSTAT_OK) goto cleanup; } single_quoted_string = "'" ( [^']* ) >{ str_start = fpc; } %{ str_len = fpc - str_start; } "'"; double_quoted_string = "\"" ( [^"]* ) >{ str_start = fpc; } %{ str_len = fpc - str_start; } "\""; unquoted_string = [A-Za-z] [_A-Za-z0-9\.]*; quoted_string = ( single_quoted_string | double_quoted_string ) %copy_buf; hex_string = "'" ( [0-9A-Fa-f]+ ) >start_integer $incr_hex_integer "'x"; newline = ( "\n" | "\r\n" ) %{ line_no++; line_start = p; }; missing_value = "." [A-Z]?; identifier = ( [$_A-Za-z] [_A-Za-z0-9]* ) >{ str_start = fpc; } %{ str_len = fpc - str_start; }; identifier_eval = "&"? 
identifier "."?; integer = [0-9]+ >start_integer $incr_integer; true_whitespace = [ \t] | newline; multiline_comment = "/*" ( any* - ( any* "*/" any* ) ) "*/"; comment = "*" ( any* - ( any* ";" true_whitespace* newline any* ) ) ";" true_whitespace* newline | multiline_comment; whitespace = true_whitespace | multiline_comment; var = identifier %copy_varname; labelset = identifier %copy_labelset; arg = identifier %copy_argname (whitespace* "=" whitespace* (identifier_eval | quoted_string | hex_string | integer) >start_integer %handle_arg)?; args = arg ( whitespace+ arg)*; options_cmd = "OPTIONS"i whitespace+ args whitespace* ";"; let_macro = "%LET"i whitespace+ identifier whitespace* "=" whitespace* (unquoted_string | quoted_string) whitespace* ";"; libname_cmd = "LIBNAME"i whitespace+ identifier whitespace+ ( quoted_string (whitespace+ args)? | "CLEAR"i | "_ALL_"i whitespace* "CLEAR"i | "LIST"i | "_ALL_"i whitespace* "LIST"i ) whitespace* ";"; footnote_cmd = "FOOTNOTE"i whitespace+ quoted_string whitespace* ";"; empty_cmd = ";"; value_label = ( "-" integer %{ label_type = LABEL_TYPE_DOUBLE; double_value = -(double)integer; } | integer %{ label_type = LABEL_TYPE_DOUBLE; double_value = integer; } | integer whitespace+ "-" whitespace+ %{ first_integer = integer; } integer %{ label_type = LABEL_TYPE_RANGE; } | unquoted_string %{ label_type = LABEL_TYPE_STRING; } %copy_string | quoted_string %{ label_type = LABEL_TYPE_STRING; } %copy_string | "other" %{ label_type = LABEL_TYPE_OTHER; } ) whitespace* "=" whitespace* quoted_string %handle_value_label; var_len = ("$" whitespace* integer %set_str | integer %set_dbl) %{ var_len = integer; }; value_cmd = "VALUE"i whitespace+ labelset whitespace+ ("(" args ")" whitespace*)? value_label (whitespace+ value_label)* whitespace* ";"; proc_format_cmd = "PROC"i whitespace+ "FORMAT"i whitespace* ( args whitespace* )? ";" ( whitespace | empty_cmd | value_cmd )+; filename_cmd = "FILENAME"i (whitespace+ args)? 
whitespace+ quoted_string whitespace* ";"; if_statement = "IF"i ( whitespace | identifier | "-"? integer | "(" | ")" | ".")+ ";"; data_cmd = "DATA"i (whitespace+ identifier_eval | unquoted_string | quoted_string )+ whitespace* ";"; missing_cmd = "MISSING"i whitespace+ identifier whitespace* ";"; # lrecl_option = "LRECL"i whitespace* "=" whitespace* integer %handle_info; infile_cmd = "INFILE"i (whitespace+ quoted_string)? (whitespace* args)? whitespace* ";"; length_spec = var whitespace+ var_len %handle_var_len; length_cmd = "LENGTH"i whitespace+ length_spec (whitespace+ length_spec)* whitespace* ";"; label_spec = var whitespace* "=" whitespace* quoted_string %handle_var_label; label_cmd = "LABEL"i whitespace+ label_spec (whitespace+ label_spec)* whitespace* ";"; date_separator = [SN]; date_format = ( "MMDDYY" integer | "DATE" | "DATE9" | "DATETIME" | "DAY" | "DDMMYY" date_separator? integer | "DOWNAME" | "JULDAY" | "JULIAN" | "MMDDYY" date_separator? integer | "MMYY" date_separator? | "MONNAME" | "MONTH" | "MONYY" | "PDFJULG" | "WEEKDATE" | "WEEKDAY" | "WORDDATE" | "WORDDATX" | "QTR" | "QTRR" | "TIME" | "TIMEAMPM" | "TOD" | "YEAR" | "YYMMDD" | "YYMM" date_separator? | "YYQ" date_separator? | "YYQR" date_separator? ); format_lbl_spec = labelset "." %handle_var_labelset; format_dbl_spec = integer "." integer?; format_date_spec = date_format "." 
integer?; var_format_spec = var whitespace+ ( format_lbl_spec | format_dbl_spec | format_date_spec ); format_cmd = "FORMAT"i whitespace+ var_format_spec (whitespace+ var_format_spec)* whitespace* ";"; var_attribute = ( "LENGTH"i whitespace* "=" whitespace* var_len %handle_var_len | "LABEL"i whitespace* "=" whitespace* quoted_string %handle_var_label | "FORMAT"i whitespace* "=" whitespace* format_dbl_spec ); var_attributes = var_attribute (whitespace+ var_attribute)*; attrib_spec = var whitespace+ var_attributes %handle_var; attrib_cmd = "ATTRIB"i whitespace+ attrib_spec (whitespace+ attrib_spec)* whitespace* ";"; input_format_spec = ("$CHAR" integer %set_str | identifier %set_dbl); input_int_spec = var whitespace+ integer %copy_pos "-" integer %set_len %set_dbl %handle_var; input_dbl_spec = "@" integer %copy_pos whitespace+ var whitespace+ (var_len | input_format_spec) "." %handle_var integer?; input_txt_spec = var whitespace+ "$" whitespace+ integer %copy_pos "-" integer %set_len %set_str %handle_var; row_spec = "#" integer %{ var_row = integer - 1; }; input_spec = (input_int_spec | input_dbl_spec | input_txt_spec | row_spec | var); input_cmd = "INPUT"i whitespace+ %{ var_row = 0; } input_spec (whitespace+ input_spec)* whitespace* ";"; invalue_missing_spec = single_quoted_string whitespace* "=" whitespace* missing_value; invalue_format_spec = format_dbl_spec | format_date_spec; invalue_other_spec = "OTHER" whitespace* "=" whitespace* "(|" invalue_format_spec "|)"; invalue_spec = invalue_missing_spec | invalue_other_spec; invalue_cmd = "INVALUE"i whitespace+ identifier whitespace+ invalue_spec (whitespace+ invalue_spec)* whitespace* ";"; proc_print_cmd = "PROC"i whitespace+ "PRINT"i (whitespace+ args) (whitespace+ "(" args ")")? 
whitespace* ";"; proc_contents_cmd = "PROC"i whitespace+ "CONTENTS"i (whitespace+ args) whitespace* ";"; run_cmd = "RUN"i whitespace* ";"; command = options_cmd | let_macro | libname_cmd | footnote_cmd | value_cmd | proc_format_cmd | filename_cmd | attrib_cmd | data_cmd | if_statement | missing_cmd | infile_cmd | format_cmd | label_cmd | length_cmd | input_cmd | invalue_cmd | proc_print_cmd | proc_contents_cmd | run_cmd; main := ( true_whitespace | comment | command )*; write init; write exec; }%% /* suppress warnings */ (void)sas_commands_en_main; if (cs < %%{ write first_final; }%%) { char error_buf[1024]; if (p == pe) { snprintf(error_buf, sizeof(error_buf), "Error parsing SAS command file (end-of-file unexpectedly reached)"); } else { snprintf(error_buf, sizeof(error_buf), "Error parsing SAS command file around line #%d, col #%ld (%c)", line_no + 1, (long)(p - line_start + 1), *p); } if (parser->handlers.error) { parser->handlers.error(error_buf, user_ctx); } error = READSTAT_ERROR_PARSE; goto cleanup; } error = submit_columns(parser, schema, user_ctx); cleanup: parser->io->close(parser->io->io_ctx); free(bytes); if (error != READSTAT_OK) { if (outError) *outError = error; readstat_schema_free(schema); schema = NULL; } return schema; } haven/src/readstat/txt/readstat_schema.c0000644000176200001440000000211314101007206020047 0ustar liggesusers#include #include "../readstat.h" #include "readstat_schema.h" #include "readstat_copy.h" void readstat_schema_free(readstat_schema_t *schema) { if (schema) { free(schema->entries); free(schema); } } readstat_schema_entry_t *readstat_schema_find_or_create_entry(readstat_schema_t *dct, const char *var_name) { readstat_schema_entry_t *entry = NULL; int i; /* linear search. 
this is inefficient (O(n) per lookup), but simple */ for (i=0; i<dct->entry_count; i++) { if (strcmp(dct->entries[i].variable.name, var_name) == 0) { entry = &dct->entries[i]; break; } } if (!entry) { dct->entries = realloc(dct->entries, sizeof(readstat_schema_entry_t) * (dct->entry_count + 1)); entry = &dct->entries[dct->entry_count]; memset(entry, 0, sizeof(readstat_schema_entry_t)); readstat_copy(entry->variable.name, sizeof(entry->variable.name), var_name, strlen(var_name)); entry->decimal_separator = '.'; entry->variable.index = dct->entry_count++; } return entry; } haven/src/readstat/txt/commands_util.c0000644000176200001440000000535514101007206017571 0ustar liggesusers#include #include "../readstat.h" #include "readstat_schema.h" #include "commands_util.h" readstat_error_t submit_value_label(readstat_parser_t *parser, const char *labelset, label_type_t label_type, int64_t first_integer, int64_t last_integer, double double_value, const char *string_value, const char *buf, void *user_ctx) { if (!parser->handlers.value_label) return READSTAT_OK; int cb_retval = READSTAT_HANDLER_OK; if (label_type == LABEL_TYPE_RANGE) { int64_t i; for (i=first_integer; i<=last_integer; i++) { readstat_value_t value = { .type = READSTAT_TYPE_DOUBLE, .v = { .double_value = i } }; cb_retval = parser->handlers.value_label(labelset, value, buf, user_ctx); if (cb_retval != READSTAT_HANDLER_OK) goto cleanup; } } else if (label_type != LABEL_TYPE_OTHER) { readstat_value_t value = { { 0 } }; if (label_type == LABEL_TYPE_DOUBLE) { value.type = READSTAT_TYPE_DOUBLE; value.v.double_value = double_value; } else if (label_type == LABEL_TYPE_STRING) { value.type = READSTAT_TYPE_STRING; value.v.string_value = string_value; } else if (label_type == LABEL_TYPE_NAN) { value.type = READSTAT_TYPE_DOUBLE; value.v.double_value = NAN; } cb_retval = parser->handlers.value_label(labelset, value, buf, user_ctx); } cleanup: return (cb_retval == READSTAT_HANDLER_OK) ? 
READSTAT_OK : READSTAT_ERROR_USER_ABORT; } readstat_error_t submit_columns(readstat_parser_t *parser, readstat_schema_t *dct, void *user_ctx) { int i; int partial_entry_count = 0; for (i=0; i<dct->entry_count; i++) { readstat_schema_entry_t *entry = &dct->entries[i]; if (dct->rows_per_observation < entry->row + 1) { dct->rows_per_observation = entry->row + 1; } } if (!parser->handlers.variable) return READSTAT_OK; for (i=0; i<dct->entry_count; i++) { readstat_schema_entry_t *entry = &dct->entries[i]; entry->variable.index = i; entry->variable.index_after_skipping = partial_entry_count; if (entry->variable.type == READSTAT_TYPE_STRING) entry->variable.storage_width = entry->len; int cb_retval = parser->handlers.variable(i, &entry->variable, entry->labelset[0] ? entry->labelset : NULL, user_ctx); if (cb_retval == READSTAT_HANDLER_SKIP_VARIABLE) { entry->skip = 1; } else if (cb_retval == READSTAT_HANDLER_ABORT) { return READSTAT_ERROR_USER_ABORT; } else { partial_entry_count++; } } return READSTAT_OK; } haven/src/readstat/txt/readstat_stata_dictionary_read.c0000644000176200001440000006376314101765776023214 0ustar liggesusers#line 1 "src/txt/readstat_stata_dictionary_read.rl" #include #include "../readstat.h" #include "readstat_schema.h" #include "readstat_copy.h" #line 11 "src/txt/readstat_stata_dictionary_read.c" static const signed char _stata_dictionary_actions[] = { 0, 1, 1, 1, 4, 1, 6, 1, 7, 1, 8, 1, 9, 1, 11, 1, 13, 1, 14, 1, 15, 1, 16, 1, 17, 1, 18, 1, 19, 1, 20, 1, 27, 2, 0, 1, 2, 2, 11, 2, 7, 8, 2, 10, 4, 2, 12, 5, 2, 13, 3, 2, 13, 9, 3, 13, 2, 11, 3, 14, 2, 11, 3, 15, 2, 11, 3, 16, 2, 11, 3, 17, 2, 11, 3, 18, 2, 11, 3, 19, 2, 11, 3, 20, 2, 11, 3, 21, 12, 5, 3, 22, 12, 5, 3, 23, 12, 5, 3, 24, 12, 5, 3, 25, 12, 5, 3, 26, 12, 5, 3, 28, 0, 1, 4, 13, 3, 2, 11, 0 }; static const short _stata_dictionary_key_offsets[] = { 0, 0, 4, 6, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 27, 33, 39, 40, 41, 42, 43, 44, 48, 61, 74, 75, 76, 77, 81, 86, 91, 92, 110, 128, 129, 131, 
132, 133, 135, 147, 153, 171, 175, 176, 177, 178, 179, 180, 181, 185, 190, 193, 211, 224, 225, 238, 251, 263, 273, 274, 275, 279, 283, 285, 293, 295, 299, 303, 308, 310, 323, 336, 349, 362, 375, 387, 400, 413, 426, 439, 451, 464, 477, 489, 502, 515, 528, 540, 553, 566, 578, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 609, 614, 617, 635, 637, 638, 639, 641, 645, 650, 653, 671, 672, 676, 681, 684, 702, 703, 704, 705, 706, 710, 715, 718, 736, 737, 738, 739, 740, 741, 742, 761, 765, 770, 773, 791, 803, 804, 805, 806, 807, 808, 812, 817, 822, 823, 824, 0 }; static const char _stata_dictionary_trans_keys[] = { 42, 47, 100, 105, 10, 13, 42, 47, 100, 105, 42, 42, 42, 47, 105, 99, 116, 105, 111, 110, 97, 114, 121, 9, 10, 13, 32, 9, 10, 13, 32, 117, 123, 9, 10, 13, 32, 117, 123, 10, 115, 105, 110, 103, 9, 10, 13, 32, 9, 10, 13, 32, 34, 92, 95, 45, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 92, 95, 45, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 9, 10, 13, 32, 123, 9, 10, 13, 32, 123, 10, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 10, 10, 13, 42, 42, 42, 47, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 37, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 99, 102, 108, 110, 111, 108, 117, 109, 110, 40, 9, 32, 48, 57, 9, 32, 41, 48, 57, 9, 32, 41, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 121, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 46, 95, 116, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 101, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 37, 65, 90, 97, 122, 34, 34, 9, 10, 13, 32, 9, 10, 13, 32, 48, 57, 44, 46, 83, 115, 48, 57, 101, 103, 48, 57, 48, 57, 101, 103, 9, 10, 13, 32, 9, 10, 13, 32, 34, 48, 57, 9, 10, 13, 32, 46, 95, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 
32, 46, 95, 117, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 98, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 108, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 101, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 108, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 97, 48, 57, 65, 90, 98, 122, 9, 10, 13, 32, 46, 95, 116, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 110, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 116, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 110, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 103, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 116, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 114, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 46, 95, 48, 57, 65, 90, 97, 122, 105, 114, 115, 116, 108, 105, 110, 101, 111, 102, 102, 105, 108, 101, 40, 9, 32, 48, 57, 9, 32, 41, 48, 57, 9, 32, 41, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 105, 114, 110, 101, 40, 115, 9, 32, 48, 57, 9, 32, 41, 48, 57, 9, 32, 41, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 40, 9, 32, 48, 57, 9, 32, 41, 48, 57, 9, 32, 41, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 101, 99, 108, 40, 9, 32, 48, 57, 9, 32, 41, 48, 57, 9, 32, 41, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 101, 119, 108, 105, 110, 101, 9, 10, 13, 32, 40, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 9, 32, 48, 57, 9, 32, 41, 48, 57, 9, 32, 41, 9, 10, 13, 32, 42, 47, 95, 98, 100, 102, 105, 108, 115, 125, 65, 90, 97, 122, 9, 10, 13, 32, 92, 95, 45, 57, 65, 90, 97, 122, 110, 102, 105, 108, 101, 9, 
10, 13, 32, 9, 10, 13, 32, 100, 9, 10, 13, 32, 100, 10, 10, 0 }; static const signed char _stata_dictionary_single_lengths[] = { 0, 4, 2, 4, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 6, 6, 1, 1, 1, 1, 1, 4, 7, 7, 1, 1, 1, 4, 5, 5, 1, 14, 14, 1, 2, 1, 1, 2, 6, 6, 14, 4, 1, 1, 1, 1, 1, 1, 2, 3, 3, 14, 7, 1, 7, 7, 6, 6, 1, 1, 4, 4, 0, 4, 0, 0, 4, 5, 0, 7, 7, 7, 7, 7, 6, 7, 7, 7, 7, 6, 7, 7, 6, 7, 7, 7, 6, 7, 7, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 3, 14, 2, 1, 1, 2, 2, 3, 3, 14, 1, 2, 3, 3, 14, 1, 1, 1, 1, 2, 3, 3, 14, 1, 1, 1, 1, 1, 1, 15, 2, 3, 3, 14, 6, 1, 1, 1, 1, 1, 4, 5, 5, 1, 1, 0, 0 }; static const signed char _stata_dictionary_range_lengths[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 3, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 2, 3, 0, 3, 3, 3, 2, 0, 0, 0, 0, 1, 2, 1, 2, 0, 0, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 2, 0, 0, 0, 0, 1, 1, 0, 2, 0, 1, 1, 0, 2, 0, 0, 0, 0, 1, 1, 0, 2, 0, 0, 0, 0, 0, 0, 2, 1, 1, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; static const short _stata_dictionary_index_offsets[] = { 0, 0, 5, 8, 13, 15, 17, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 43, 50, 57, 59, 61, 63, 65, 67, 72, 83, 94, 96, 98, 100, 105, 111, 117, 119, 136, 153, 155, 158, 160, 162, 165, 175, 182, 199, 204, 206, 208, 210, 212, 214, 216, 220, 225, 229, 246, 257, 259, 270, 281, 291, 300, 302, 304, 309, 314, 316, 323, 325, 328, 333, 339, 341, 352, 363, 374, 385, 396, 406, 417, 428, 439, 450, 460, 471, 482, 492, 503, 514, 525, 535, 546, 557, 567, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 611, 616, 620, 637, 640, 642, 644, 647, 651, 656, 660, 677, 679, 683, 688, 692, 709, 711, 713, 715, 717, 721, 726, 730, 747, 749, 751, 753, 755, 757, 759, 777, 781, 786, 790, 807, 817, 819, 821, 823, 825, 827, 832, 838, 844, 846, 848, 0 }; static const short 
_stata_dictionary_cond_targs[] = { 2, 4, 7, 146, 0, 3, 155, 2, 2, 4, 7, 146, 0, 5, 0, 6, 5, 6, 1, 5, 8, 0, 9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 18, 19, 17, 0, 17, 18, 19, 17, 20, 34, 0, 17, 18, 19, 17, 20, 34, 0, 18, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 26, 27, 25, 0, 25, 26, 27, 25, 28, 145, 145, 145, 145, 145, 0, 25, 26, 27, 25, 28, 145, 145, 145, 145, 145, 0, 26, 0, 30, 29, 30, 29, 31, 32, 33, 31, 0, 31, 32, 33, 31, 34, 0, 31, 32, 33, 31, 34, 0, 32, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 35, 0, 35, 36, 37, 39, 0, 40, 39, 40, 34, 39, 42, 43, 56, 42, 41, 41, 41, 41, 41, 0, 42, 43, 56, 42, 61, 65, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 45, 94, 113, 134, 0, 46, 0, 47, 0, 48, 0, 49, 0, 50, 0, 51, 0, 51, 51, 52, 0, 53, 53, 54, 52, 0, 53, 53, 54, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 42, 43, 56, 42, 41, 41, 57, 41, 41, 41, 0, 43, 0, 42, 43, 56, 42, 41, 41, 58, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 59, 41, 41, 41, 0, 60, 43, 56, 60, 41, 41, 41, 41, 41, 0, 60, 43, 56, 60, 61, 65, 41, 41, 0, 63, 62, 63, 62, 64, 43, 56, 64, 0, 64, 43, 56, 64, 0, 66, 0, 67, 71, 69, 69, 66, 69, 0, 68, 0, 68, 69, 0, 70, 43, 56, 70, 0, 70, 43, 56, 70, 61, 0, 68, 0, 42, 43, 56, 42, 41, 41, 73, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 74, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 75, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 76, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 77, 41, 41, 41, 0, 60, 43, 56, 60, 41, 41, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 79, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 80, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 81, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 82, 41, 41, 41, 0, 60, 43, 56, 60, 41, 41, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 84, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 85, 41, 41, 41, 0, 60, 43, 56, 60, 41, 41, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 87, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 
88, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 89, 41, 41, 41, 0, 60, 43, 56, 60, 41, 41, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 91, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 92, 41, 41, 41, 0, 42, 43, 56, 42, 41, 41, 93, 41, 41, 0, 60, 43, 56, 60, 41, 41, 93, 41, 41, 0, 95, 0, 96, 0, 97, 0, 98, 0, 99, 0, 100, 0, 101, 0, 102, 0, 103, 0, 104, 0, 105, 0, 106, 0, 107, 0, 108, 0, 109, 0, 109, 109, 110, 0, 111, 111, 112, 110, 0, 111, 111, 112, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 114, 126, 0, 115, 0, 116, 0, 117, 121, 0, 117, 117, 118, 0, 119, 119, 120, 118, 0, 119, 119, 120, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 122, 0, 122, 122, 123, 0, 124, 124, 125, 123, 0, 124, 124, 125, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 127, 0, 128, 0, 129, 0, 130, 0, 130, 130, 131, 0, 132, 132, 133, 131, 0, 132, 132, 133, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 135, 0, 136, 0, 137, 0, 138, 0, 139, 0, 140, 0, 34, 35, 36, 34, 141, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 141, 141, 142, 0, 143, 143, 144, 142, 0, 143, 143, 144, 0, 34, 35, 36, 34, 37, 38, 44, 55, 72, 78, 83, 86, 90, 156, 41, 41, 0, 31, 32, 33, 31, 145, 145, 145, 145, 145, 0, 147, 0, 148, 0, 149, 0, 150, 0, 151, 0, 152, 153, 154, 152, 0, 152, 153, 154, 152, 7, 0, 152, 153, 154, 152, 7, 0, 153, 0, 3, 0, 156, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 
142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 0 }; static const signed char _stata_dictionary_cond_actions[] = { 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 15, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 15, 15, 15, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 0, 15, 15, 15, 15, 15, 51, 51, 51, 51, 51, 0, 0, 0, 39, 7, 9, 0, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 15, 15, 15, 15, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 36, 36, 36, 36, 36, 0, 36, 36, 0, 15, 15, 15, 15, 15, 15, 15, 54, 54, 54, 54, 54, 54, 15, 54, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 48, 48, 48, 48, 48, 48, 114, 114, 114, 114, 114, 114, 48, 114, 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 21, 21, 21, 21, 21, 21, 21, 66, 66, 66, 66, 66, 66, 21, 66, 66, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 86, 45, 45, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 13, 0, 39, 7, 9, 0, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 33, 0, 31, 31, 31, 31, 1, 31, 0, 110, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 102, 45, 45, 102, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 98, 45, 45, 98, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 90, 45, 45, 90, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 94, 45, 45, 94, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 
45, 0, 0, 0, 0, 0, 0, 0, 45, 45, 45, 45, 0, 0, 33, 0, 0, 0, 106, 45, 45, 106, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 29, 29, 29, 29, 29, 29, 29, 82, 82, 82, 82, 82, 82, 29, 82, 82, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 19, 19, 19, 19, 19, 19, 19, 62, 62, 62, 62, 62, 62, 19, 62, 62, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 17, 17, 17, 17, 17, 17, 17, 58, 58, 58, 58, 58, 58, 17, 58, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 27, 27, 27, 27, 27, 27, 27, 78, 78, 78, 78, 78, 78, 27, 78, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 23, 23, 23, 23, 23, 23, 70, 70, 70, 70, 70, 70, 23, 70, 70, 0, 0, 0, 33, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 74, 74, 74, 74, 74, 74, 25, 74, 74, 0, 42, 42, 42, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 15, 15, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; static const int stata_dictionary_start = 1; static const int stata_dictionary_en_main = 1; #line 11 "src/txt/readstat_stata_dictionary_read.rl" readstat_schema_t *readstat_parse_stata_dictionary(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError) { if (parser->io->open(filepath, parser->io->io_ctx) == -1) { if (outError) *outError = READSTAT_ERROR_OPEN; return NULL; } readstat_schema_t *schema = NULL; unsigned char *bytes = NULL; int cb_return_value = 
READSTAT_HANDLER_OK; int total_entry_count = 0; int partial_entry_count = 0; readstat_error_t error = READSTAT_OK; ssize_t len = parser->io->seek(0, READSTAT_SEEK_END, parser->io->io_ctx); if (len == -1) { error = READSTAT_ERROR_SEEK; goto cleanup; } parser->io->seek(0, READSTAT_SEEK_SET, parser->io->io_ctx); bytes = malloc(len); parser->io->read(bytes, len, parser->io->io_ctx); unsigned char *p = bytes; unsigned char *pe = bytes + len; unsigned char *str_start = NULL; size_t str_len = 0; int cs; // u_char *eof = pe; int integer = 0; int current_row = 0; int current_col = 0; int line_no = 0; unsigned char *line_start = p; readstat_schema_entry_t current_entry; if ((schema = calloc(1, sizeof(readstat_schema_t))) == NULL) { error = READSTAT_ERROR_MALLOC; goto cleanup; } schema->rows_per_observation = 1; #line 545 "src/txt/readstat_stata_dictionary_read.c" { cs = (int)stata_dictionary_start; } #line 550 "src/txt/readstat_stata_dictionary_read.c" { int _klen; unsigned int _trans = 0; const char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe ) goto _out; _keys = ( _stata_dictionary_trans_keys + (_stata_dictionary_key_offsets[cs])); _trans = (unsigned int)_stata_dictionary_index_offsets[cs]; _klen = (int)_stata_dictionary_single_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + _klen - 1; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_stata_dictionary_range_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + (_klen<<1) - 2; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( 
p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} cs = (int)_stata_dictionary_cond_targs[_trans]; if ( _stata_dictionary_cond_actions[_trans] != 0 ) { _acts = ( _stata_dictionary_actions + (_stata_dictionary_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 63 "src/txt/readstat_stata_dictionary_read.rl" integer = 0; } #line 628 "src/txt/readstat_stata_dictionary_read.c" break; } case 1: { { #line 67 "src/txt/readstat_stata_dictionary_read.rl" integer = 10 * integer + ((( (*( p)))) - '0'); } #line 639 "src/txt/readstat_stata_dictionary_read.c" break; } case 2: { { #line 71 "src/txt/readstat_stata_dictionary_read.rl" memset(&current_entry, 0, sizeof(readstat_schema_entry_t)); current_entry.decimal_separator = '.'; current_entry.variable.type = READSTAT_TYPE_DOUBLE; current_entry.variable.index = total_entry_count; } #line 653 "src/txt/readstat_stata_dictionary_read.c" break; } case 3: { { #line 78 "src/txt/readstat_stata_dictionary_read.rl" current_entry.row = current_row; current_entry.col = current_col; current_col += current_entry.len; cb_return_value = READSTAT_HANDLER_OK; if (parser->handlers.variable) { current_entry.variable.index_after_skipping = partial_entry_count; cb_return_value = parser->handlers.variable(total_entry_count, &current_entry.variable, NULL, user_ctx); if (cb_return_value == READSTAT_HANDLER_ABORT) { error = READSTAT_ERROR_USER_ABORT; goto cleanup; } } if (cb_return_value == READSTAT_HANDLER_SKIP_VARIABLE) { current_entry.skip = 1; } else { partial_entry_count++; } schema->entries = realloc(schema->entries, sizeof(readstat_schema_entry_t) * (schema->entry_count+1)); memcpy(&schema->entries[schema->entry_count++], &current_entry, sizeof(readstat_schema_entry_t)); total_entry_count++; } #line 683 "src/txt/readstat_stata_dictionary_read.c" break; } case 
4: { { #line 101 "src/txt/readstat_stata_dictionary_read.rl" readstat_copy(schema->filename, sizeof(schema->filename), (char *)str_start, str_len); } #line 694 "src/txt/readstat_stata_dictionary_read.c" break; } case 5: { { #line 105 "src/txt/readstat_stata_dictionary_read.rl" readstat_copy(current_entry.variable.name, sizeof(current_entry.variable.name), (char *)str_start, str_len); } #line 706 "src/txt/readstat_stata_dictionary_read.c" break; } case 6: { { #line 110 "src/txt/readstat_stata_dictionary_read.rl" readstat_copy(current_entry.variable.label, sizeof(current_entry.variable.label), (char *)str_start, str_len); } #line 718 "src/txt/readstat_stata_dictionary_read.c" break; } case 7: { { #line 115 "src/txt/readstat_stata_dictionary_read.rl" str_start = p; } #line 727 "src/txt/readstat_stata_dictionary_read.c" break; } case 8: { { #line 115 "src/txt/readstat_stata_dictionary_read.rl" str_len = p - str_start; } #line 736 "src/txt/readstat_stata_dictionary_read.c" break; } case 9: { { #line 117 "src/txt/readstat_stata_dictionary_read.rl" str_start = p; } #line 745 "src/txt/readstat_stata_dictionary_read.c" break; } case 10: { { #line 117 "src/txt/readstat_stata_dictionary_read.rl" str_len = p - str_start; } #line 754 "src/txt/readstat_stata_dictionary_read.c" break; } case 11: { { #line 119 "src/txt/readstat_stata_dictionary_read.rl" str_start = p; } #line 763 "src/txt/readstat_stata_dictionary_read.c" break; } case 12: { { #line 119 "src/txt/readstat_stata_dictionary_read.rl" str_len = p - str_start; } #line 772 "src/txt/readstat_stata_dictionary_read.c" break; } case 13: { { #line 121 "src/txt/readstat_stata_dictionary_read.rl" line_no++; line_start = p; } #line 781 "src/txt/readstat_stata_dictionary_read.c" break; } case 14: { { #line 131 "src/txt/readstat_stata_dictionary_read.rl" schema->rows_per_observation = integer; } #line 790 "src/txt/readstat_stata_dictionary_read.c" break; } case 15: { { #line 133 "src/txt/readstat_stata_dictionary_read.rl" 
current_row = integer - 1; } #line 799 "src/txt/readstat_stata_dictionary_read.c" break; } case 16: { { #line 135 "src/txt/readstat_stata_dictionary_read.rl" current_col = integer - 1; } #line 808 "src/txt/readstat_stata_dictionary_read.c" break; } case 17: { { #line 137 "src/txt/readstat_stata_dictionary_read.rl" current_row++; } #line 817 "src/txt/readstat_stata_dictionary_read.c" break; } case 18: { { #line 137 "src/txt/readstat_stata_dictionary_read.rl" current_row += (integer - 1); } #line 826 "src/txt/readstat_stata_dictionary_read.c" break; } case 19: { { #line 141 "src/txt/readstat_stata_dictionary_read.rl" schema->cols_per_observation = integer; } #line 835 "src/txt/readstat_stata_dictionary_read.c" break; } case 20: { { #line 143 "src/txt/readstat_stata_dictionary_read.rl" schema->first_line = integer - 1; } #line 844 "src/txt/readstat_stata_dictionary_read.c" break; } case 21: { { #line 147 "src/txt/readstat_stata_dictionary_read.rl" current_entry.variable.type = READSTAT_TYPE_INT8; } #line 853 "src/txt/readstat_stata_dictionary_read.c" break; } case 22: { { #line 148 "src/txt/readstat_stata_dictionary_read.rl" current_entry.variable.type = READSTAT_TYPE_INT16; } #line 862 "src/txt/readstat_stata_dictionary_read.c" break; } case 23: { { #line 149 "src/txt/readstat_stata_dictionary_read.rl" current_entry.variable.type = READSTAT_TYPE_INT32; } #line 871 "src/txt/readstat_stata_dictionary_read.c" break; } case 24: { { #line 150 "src/txt/readstat_stata_dictionary_read.rl" current_entry.variable.type = READSTAT_TYPE_FLOAT; } #line 880 "src/txt/readstat_stata_dictionary_read.c" break; } case 25: { { #line 151 "src/txt/readstat_stata_dictionary_read.rl" current_entry.variable.type = READSTAT_TYPE_DOUBLE; } #line 889 "src/txt/readstat_stata_dictionary_read.c" break; } case 26: { { #line 152 "src/txt/readstat_stata_dictionary_read.rl" current_entry.variable.type = READSTAT_TYPE_STRING; current_entry.variable.storage_width = integer; } #line 899 
"src/txt/readstat_stata_dictionary_read.c" break; } case 27: { { #line 159 "src/txt/readstat_stata_dictionary_read.rl" current_entry.len = integer; } #line 908 "src/txt/readstat_stata_dictionary_read.c" break; } case 28: { { #line 160 "src/txt/readstat_stata_dictionary_read.rl" current_entry.decimal_separator = ','; } #line 917 "src/txt/readstat_stata_dictionary_read.c" break; } } _nacts -= 1; _acts += 1; } } if ( cs != 0 ) { p += 1; goto _resume; } _out: {} } #line 174 "src/txt/readstat_stata_dictionary_read.rl" /* suppress warnings */ (void)stata_dictionary_en_main; if (cs < #line 942 "src/txt/readstat_stata_dictionary_read.c" 156 #line 179 "src/txt/readstat_stata_dictionary_read.rl" ) { char error_buf[1024]; if (p == pe) { snprintf(error_buf, sizeof(error_buf), "Error parsing .dct file (end-of-file unexpectedly reached)"); } else { snprintf(error_buf, sizeof(error_buf), "Error parsing .dct file around line #%d, col #%ld (%c)", line_no + 1, (long)(p - line_start + 1), *p); } if (parser->handlers.error) { parser->handlers.error(error_buf, user_ctx); } error = READSTAT_ERROR_PARSE; goto cleanup; } cleanup: parser->io->close(parser->io->io_ctx); free(bytes); if (error != READSTAT_OK) { if (outError) *outError = error; readstat_schema_free(schema); schema = NULL; } return schema; } haven/src/readstat/txt/readstat_sas_commands_read.c0000644000176200001440000035514714101765776022322 0ustar liggesusers#line 1 "src/txt/readstat_sas_commands_read.rl" #include <stdlib.h> #include "../readstat.h" #include "../readstat_strings.h" #include "readstat_schema.h" #include "readstat_copy.h" #include "commands_util.h" #line 13 "src/txt/readstat_sas_commands_read.c" static const signed char _sas_commands_actions[] = { 0, 1, 0, 1, 1, 1, 2, 1, 3, 1, 7, 1, 12, 1, 13, 1, 16, 1, 18, 1, 19, 1, 20, 1, 21, 1, 22, 1, 23, 1, 24, 1, 25, 1, 26, 1, 28, 1, 33, 1, 34, 2, 0, 1, 2, 0, 23, 2, 1, 0, 2, 5, 32, 2, 6, 32, 2, 7, 12, 2, 7, 15, 2, 7, 17, 2, 7, 23, 2, 18, 19, 2, 20, 21, 2, 22, 0, 2, 22, 23, 2, 22, 34, 
2, 24, 6, 2, 24, 8, 2, 24, 10, 2, 24, 11, 2, 24, 12, 2, 24, 23, 2, 29, 9, 2, 34, 23, 3, 4, 6, 13, 3, 5, 24, 6, 3, 5, 32, 14, 3, 6, 32, 14, 3, 7, 15, 13, 3, 7, 30, 9, 3, 13, 0, 1, 3, 18, 0, 2, 3, 22, 0, 1, 3, 22, 0, 23, 3, 22, 34, 23, 3, 24, 0, 1, 3, 24, 1, 0, 3, 27, 0, 1, 3, 29, 9, 31, 4, 4, 5, 13, 6, 4, 4, 6, 13, 5, 4, 5, 32, 14, 13, 4, 5, 32, 24, 6, 4, 6, 32, 14, 13, 4, 22, 27, 0, 1, 0 }; static const short _sas_commands_key_offsets[] = { 0, 0, 1, 3, 5, 7, 12, 23, 34, 35, 48, 54, 60, 61, 62, 63, 65, 76, 87, 88, 89, 90, 96, 102, 108, 109, 110, 111, 113, 114, 115, 116, 117, 119, 132, 133, 134, 136, 137, 142, 144, 145, 146, 148, 150, 152, 154, 156, 158, 163, 174, 185, 186, 198, 207, 216, 217, 218, 219, 221, 223, 225, 227, 229, 231, 237, 243, 244, 245, 246, 248, 255, 262, 263, 264, 265, 267, 270, 278, 294, 310, 311, 312, 313, 315, 329, 343, 357, 371, 385, 398, 408, 418, 419, 420, 421, 423, 427, 429, 431, 433, 439, 445, 446, 447, 448, 450, 457, 464, 465, 466, 467, 473, 474, 475, 476, 477, 479, 481, 483, 485, 487, 493, 499, 500, 501, 502, 504, 512, 520, 521, 528, 535, 536, 537, 538, 540, 548, 549, 550, 552, 560, 576, 590, 604, 618, 631, 641, 651, 652, 653, 654, 656, 670, 684, 698, 712, 725, 735, 745, 746, 747, 748, 750, 758, 759, 760, 762, 764, 766, 768, 779, 791, 803, 804, 820, 833, 846, 847, 853, 854, 855, 857, 858, 859, 871, 872, 873, 888, 900, 916, 917, 918, 920, 924, 926, 928, 930, 932, 934, 936, 941, 954, 967, 968, 981, 995, 1009, 1010, 1011, 1012, 1014, 1030, 1046, 1047, 1048, 1049, 1054, 1055, 1056, 1058, 1071, 1076, 1082, 1089, 1090, 1097, 1103, 1104, 1105, 1107, 1114, 1118, 1120, 1122, 1124, 1126, 1128, 1133, 1140, 1147, 1148, 1149, 1150, 1152, 1154, 1156, 1158, 1163, 1174, 1185, 1186, 1198, 1219, 1240, 1241, 1249, 1255, 1267, 1279, 1280, 1281, 1282, 1284, 1285, 1286, 1288, 1291, 1299, 1307, 1318, 1328, 1337, 1347, 1355, 1363, 1372, 1381, 1390, 1399, 1408, 1417, 1426, 1436, 1444, 1452, 1461, 1470, 1479, 1488, 1497, 1507, 1516, 1525, 1534, 1543, 1553, 1563, 
1572, 1581, 1591, 1600, 1611, 1620, 1629, 1638, 1647, 1656, 1665, 1674, 1683, 1692, 1701, 1711, 1720, 1729, 1738, 1747, 1756, 1765, 1774, 1784, 1793, 1802, 1811, 1820, 1830, 1839, 1848, 1857, 1866, 1875, 1885, 1895, 1904, 1913, 1923, 1932, 1943, 1954, 1955, 1956, 1958, 1962, 1979, 1997, 2015, 2016, 2034, 2036, 2054, 2055, 2056, 2058, 2076, 2082, 2084, 2086, 2088, 2100, 2114, 2128, 2129, 2130, 2131, 2143, 2155, 2167, 2168, 2182, 2195, 2208, 2209, 2210, 2211, 2213, 2229, 2245, 2246, 2247, 2248, 2254, 2255, 2256, 2258, 2272, 2278, 2284, 2291, 2292, 2299, 2306, 2307, 2308, 2310, 2318, 2319, 2320, 2321, 2322, 2324, 2326, 2328, 2333, 2346, 2359, 2360, 2362, 2370, 2384, 2398, 2399, 2412, 2428, 2444, 2445, 2458, 2474, 2490, 2491, 2492, 2493, 2495, 2498, 2500, 2508, 2509, 2510, 2512, 2514, 2521, 2532, 2543, 2544, 2556, 2569, 2582, 2583, 2597, 2604, 2611, 2612, 2613, 2614, 2616, 2619, 2627, 2635, 2643, 2651, 2660, 2669, 2678, 2686, 2694, 2695, 2696, 2698, 2701, 2702, 2703, 2705, 2706, 2707, 2709, 2712, 2714, 2722, 2735, 2751, 2767, 2768, 2781, 2797, 2813, 2814, 2815, 2816, 2818, 2821, 2823, 2831, 2832, 2833, 2835, 2836, 2837, 2839, 2841, 2843, 2845, 2847, 2852, 2863, 2874, 2875, 2887, 2894, 2901, 2902, 2903, 2904, 2910, 2916, 2917, 2918, 2919, 2921, 2927, 2933, 2934, 2942, 2950, 2958, 2959, 2960, 2961, 2963, 2964, 2965, 2966, 2967, 2973, 2979, 2980, 2981, 2982, 2984, 2990, 2996, 2997, 2998, 3008, 3011, 3014, 3017, 3018, 3024, 3027, 3029, 3030, 3033, 3034, 3035, 3036, 3037, 3038, 3039, 3040, 3041, 3045, 3047, 3048, 3049, 3050, 3051, 3052, 3054, 3055, 3056, 3057, 3058, 3060, 3062, 3063, 3064, 3067, 3068, 3071, 3072, 3073, 3074, 3075, 3076, 3077, 3078, 3079, 3080, 3082, 3084, 3085, 3086, 3088, 3089, 3090, 3091, 3092, 3094, 3095, 3096, 3097, 3098, 3100, 3101, 3102, 3103, 3104, 3105, 3107, 3109, 3110, 3111, 3113, 3114, 3118, 3122, 3123, 3124, 3126, 3127, 3128, 3130, 3131, 3132, 3134, 3135, 3136, 3138, 3144, 3146, 3148, 3150, 3155, 3166, 3177, 3178, 3191, 3197, 3203, 3204, 3205, 
3206, 3208, 3215, 3222, 3223, 3224, 3225, 3231, 3243, 3255, 3256, 3257, 3258, 3260, 3261, 3262, 3263, 3264, 3266, 3267, 3268, 3270, 3272, 3274, 3276, 3278, 3283, 3294, 3305, 3306, 3318, 3326, 3334, 3335, 3342, 3349, 3350, 3351, 3352, 3354, 3362, 3374, 3386, 3387, 3388, 3389, 3391, 3392, 3393, 3395, 3403, 3404, 3405, 3407, 3409, 3411, 3413, 3415, 3417, 3422, 3433, 3444, 3445, 3457, 3469, 3481, 3482, 3483, 3484, 3490, 3491, 3492, 3493, 3494, 3496, 3498, 3500, 3502, 3504, 3506, 3508, 3510, 3512, 3514, 3516, 3517, 3526, 3535, 3536, 3537, 3538, 3540, 3541, 3542, 3544, 3546, 3548, 3550, 3552, 3554, 3556, 3561, 3572, 3583, 3584, 3597, 3598, 3599, 3601, 3603, 3605, 3607, 3609, 3611, 3613, 3618, 3629, 3640, 3641, 3642, 3643, 3645, 3647, 3649, 3651, 3656, 3667, 3678, 3679, 3680, 3681, 3683, 3685, 3687, 3689, 3691, 3693, 3695, 3697, 3699, 3701, 3703, 3705, 3717, 3729, 3730, 3744, 3757, 3770, 3771, 3772, 3773, 3775, 3783, 3784, 3785, 3786, 3788, 3790, 3792, 3794, 3796, 3798, 3800, 3805, 3816, 3827, 3828, 3840, 3856, 3872, 3873, 3874, 3875, 3881, 3887, 3893, 3894, 3895, 3896, 3898, 3905, 3912, 3913, 3914, 3915, 3921, 3937, 3953, 3954, 3955, 3956, 3958, 3966, 3967, 3968, 3970, 3978, 3985, 3992, 3993, 3998, 4005, 4012, 4013, 4014, 4015, 4017, 4025, 4026, 4027, 4029, 4042, 4056, 4070, 4084, 4098, 4111, 4112, 4113, 4114, 4115, 4117, 4123, 4137, 4149, 4161, 4162, 4163, 4164, 4166, 4182, 4198, 4199, 4200, 4201, 4207, 4218, 4229, 4230, 4231, 4232, 4234, 4249, 4264, 4265, 4266, 4267, 4269, 4283, 4289, 4295, 4302, 4303, 4310, 4317, 4318, 4319, 4321, 4329, 4330, 4331, 4333, 4334, 4335, 4337, 4353, 4369, 4370, 4371, 4372, 4378, 4379, 4380, 4382, 4396, 4402, 4408, 4415, 4416, 4423, 4430, 4431, 4432, 4434, 4442, 4444, 4446, 4448, 4450, 4455, 4466, 4477, 4478, 4492, 4506, 4520, 4521, 4527, 4541, 4553, 4565, 4566, 4567, 4568, 4570, 4586, 4602, 4603, 4604, 4605, 4611, 4622, 4633, 4634, 4635, 4636, 4638, 4652, 4658, 4664, 4671, 4672, 4679, 4686, 4687, 4688, 4690, 4698, 4699, 4700, 4702, 4718, 
4734, 4735, 4736, 4737, 4743, 4756, 4769, 4770, 4771, 4772, 4774, 4788, 4794, 4800, 4807, 4808, 4815, 4822, 4823, 4824, 4826, 4834, 4835, 4836, 4838, 4840, 4842, 4844, 4846, 4851, 4862, 4873, 4874, 4886, 4902, 4918, 4919, 4920, 4921, 4927, 4933, 4939, 4940, 4941, 4942, 4944, 4951, 4958, 4959, 4960, 4961, 4967, 4983, 4999, 5000, 5001, 5002, 5004, 5012, 5013, 5014, 5016, 5024, 5031, 5038, 5039, 5044, 5051, 5058, 5059, 5060, 5061, 5063, 5071, 5072, 5073, 5075, 5088, 5102, 5116, 5130, 5144, 5157, 5158, 5159, 5160, 5161, 5163, 5169, 5183, 5195, 5207, 5208, 5209, 5210, 5212, 5228, 5244, 5245, 5246, 5247, 5253, 5264, 5275, 5276, 5277, 5278, 5280, 5295, 5310, 5311, 5312, 5313, 5315, 5329, 5335, 5341, 5348, 5349, 5356, 5363, 5364, 5365, 5367, 5375, 5376, 5377, 5379, 5380, 5381, 5383, 5410, 5437, 5465, 0 }; static const char _sas_commands_trans_keys[] = { 10, 76, 108, 69, 101, 84, 116, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 39, 47, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 47, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 47, 59, 9, 10, 13, 32, 47, 59, 9, 10, 13, 32, 47, 59, 10, 42, 42, 42, 47, 39, 39, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 95, 46, 57, 65, 90, 97, 122, 42, 42, 42, 47, 59, 9, 10, 13, 32, 59, 10, 59, 42, 42, 42, 47, 84, 116, 84, 116, 82, 114, 73, 105, 66, 98, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 70, 76, 102, 108, 9, 10, 13, 32, 47, 70, 76, 102, 108, 10, 42, 42, 42, 47, 79, 111, 82, 114, 77, 109, 65, 97, 84, 116, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 48, 57, 9, 10, 13, 32, 47, 48, 57, 10, 42, 42, 42, 47, 46, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 9, 10, 13, 32, 36, 47, 59, 
70, 76, 95, 102, 108, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 70, 76, 95, 102, 108, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 79, 95, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 82, 95, 114, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 77, 95, 109, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 65, 95, 97, 48, 57, 66, 90, 98, 122, 9, 10, 13, 32, 47, 84, 95, 116, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 70, 76, 102, 108, 9, 10, 13, 32, 47, 61, 70, 76, 102, 108, 10, 42, 42, 42, 47, 65, 69, 97, 101, 66, 98, 69, 101, 76, 108, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 39, 47, 9, 10, 13, 32, 34, 39, 47, 10, 34, 34, 9, 10, 13, 32, 47, 59, 39, 39, 42, 42, 42, 47, 78, 110, 71, 103, 84, 116, 72, 104, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 36, 47, 48, 57, 9, 10, 13, 32, 36, 47, 48, 57, 10, 9, 10, 13, 32, 47, 48, 57, 9, 10, 13, 32, 47, 48, 57, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 48, 57, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 48, 57, 9, 10, 13, 32, 47, 65, 69, 95, 97, 101, 48, 57, 66, 90, 98, 122, 9, 10, 13, 32, 47, 66, 95, 98, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 69, 95, 101, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 76, 95, 108, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 70, 76, 102, 108, 9, 10, 13, 32, 47, 61, 70, 76, 102, 108, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 78, 95, 110, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 71, 95, 103, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 84, 95, 116, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 72, 95, 104, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 70, 76, 102, 108, 9, 10, 13, 32, 47, 61, 70, 76, 102, 108, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 48, 57, 42, 42, 42, 47, 65, 97, 84, 116, 65, 97, 9, 10, 13, 32, 34, 39, 47, 65, 90, 97, 122, 9, 10, 
13, 32, 36, 38, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 38, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 34, 39, 46, 47, 59, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 38, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 38, 47, 59, 95, 65, 90, 97, 122, 10, 36, 95, 65, 90, 97, 122, 42, 42, 42, 47, 34, 34, 9, 10, 13, 32, 34, 39, 47, 59, 65, 90, 97, 122, 39, 39, 9, 10, 13, 32, 34, 39, 47, 59, 95, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 47, 59, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 46, 47, 59, 95, 48, 57, 65, 90, 97, 122, 42, 42, 42, 47, 73, 79, 105, 111, 76, 108, 69, 101, 78, 110, 65, 97, 77, 109, 69, 101, 9, 10, 13, 32, 47, 9, 10, 13, 32, 34, 36, 39, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 39, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 39, 47, 61, 95, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 39, 47, 61, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 47, 42, 42, 42, 47, 9, 10, 13, 32, 46, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 36, 95, 65, 90, 97, 122, 39, 48, 57, 65, 70, 97, 102, 39, 39, 48, 57, 65, 70, 97, 102, 9, 10, 13, 32, 47, 120, 42, 42, 42, 47, 9, 10, 13, 32, 47, 48, 57, 79, 82, 111, 114, 84, 116, 78, 110, 79, 111, 84, 116, 69, 101, 9, 10, 13, 32, 47, 9, 10, 13, 32, 34, 39, 47, 9, 10, 13, 32, 34, 39, 47, 10, 42, 42, 42, 47, 77, 109, 65, 97, 84, 116, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 68, 74, 77, 80, 81, 84, 87, 89, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 68, 74, 77, 80, 81, 84, 87, 89, 95, 48, 57, 65, 90, 97, 122, 10, 46, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 59, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 
122, 10, 42, 42, 42, 47, 42, 42, 42, 47, 46, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 46, 65, 68, 79, 95, 48, 57, 66, 90, 97, 122, 46, 84, 89, 95, 48, 57, 65, 90, 97, 122, 46, 69, 95, 48, 57, 65, 90, 97, 122, 46, 57, 84, 95, 48, 56, 65, 90, 97, 122, 9, 10, 13, 32, 47, 59, 48, 57, 46, 95, 48, 57, 65, 90, 97, 122, 46, 73, 95, 48, 57, 65, 90, 97, 122, 46, 77, 95, 48, 57, 65, 90, 97, 122, 46, 69, 95, 48, 57, 65, 90, 97, 122, 46, 77, 95, 48, 57, 65, 90, 97, 122, 46, 77, 95, 48, 57, 65, 90, 97, 122, 46, 89, 95, 48, 57, 65, 90, 97, 122, 46, 89, 95, 48, 57, 65, 90, 97, 122, 46, 78, 83, 95, 48, 57, 65, 90, 97, 122, 46, 95, 48, 57, 65, 90, 97, 122, 46, 95, 48, 57, 65, 90, 97, 122, 46, 87, 95, 48, 57, 65, 90, 97, 122, 46, 78, 95, 48, 57, 65, 90, 97, 122, 46, 65, 95, 48, 57, 66, 90, 97, 122, 46, 85, 95, 48, 57, 65, 90, 97, 122, 46, 76, 95, 48, 57, 65, 90, 97, 122, 46, 68, 73, 95, 48, 57, 65, 90, 97, 122, 46, 65, 95, 48, 57, 66, 90, 97, 122, 46, 89, 95, 48, 57, 65, 90, 97, 122, 46, 65, 95, 48, 57, 66, 90, 97, 122, 46, 78, 95, 48, 57, 65, 90, 97, 122, 46, 77, 79, 95, 48, 57, 65, 90, 97, 122, 46, 68, 89, 95, 48, 57, 65, 90, 97, 122, 46, 68, 95, 48, 57, 65, 90, 97, 122, 46, 89, 95, 48, 57, 65, 90, 97, 122, 46, 78, 83, 95, 48, 57, 65, 90, 97, 122, 46, 78, 95, 48, 57, 65, 90, 97, 122, 46, 78, 84, 89, 95, 48, 57, 65, 90, 97, 122, 46, 72, 95, 48, 57, 65, 90, 97, 122, 46, 68, 95, 48, 57, 65, 90, 97, 122, 46, 70, 95, 48, 57, 65, 90, 97, 122, 46, 74, 95, 48, 57, 65, 90, 97, 122, 46, 85, 95, 48, 57, 65, 90, 97, 122, 46, 76, 95, 48, 57, 65, 90, 97, 122, 46, 71, 95, 48, 57, 65, 90, 97, 122, 46, 84, 95, 48, 57, 65, 90, 97, 122, 46, 82, 95, 48, 57, 65, 90, 97, 122, 46, 82, 95, 48, 57, 65, 90, 97, 122, 46, 73, 79, 95, 48, 57, 65, 90, 97, 122, 46, 77, 95, 48, 57, 65, 90, 97, 122, 46, 69, 95, 48, 57, 65, 90, 97, 122, 46, 65, 95, 48, 57, 66, 90, 97, 122, 46, 77, 95, 48, 57, 65, 90, 97, 122, 46, 80, 95, 48, 57, 65, 90, 97, 122, 46, 77, 95, 48, 57, 65, 90, 97, 122, 46, 
68, 95, 48, 57, 65, 90, 97, 122, 46, 69, 79, 95, 48, 57, 65, 90, 97, 122, 46, 69, 95, 48, 57, 65, 90, 97, 122, 46, 75, 95, 48, 57, 65, 90, 97, 122, 46, 68, 95, 48, 57, 65, 90, 97, 122, 46, 65, 95, 48, 57, 66, 90, 97, 122, 46, 84, 89, 95, 48, 57, 65, 90, 97, 122, 46, 82, 95, 48, 57, 65, 90, 97, 122, 46, 68, 95, 48, 57, 65, 90, 97, 122, 46, 68, 95, 48, 57, 65, 90, 97, 122, 46, 65, 95, 48, 57, 66, 90, 97, 122, 46, 84, 95, 48, 57, 65, 90, 97, 122, 46, 69, 88, 95, 48, 57, 65, 90, 97, 122, 46, 69, 89, 95, 48, 57, 65, 90, 97, 122, 46, 65, 95, 48, 57, 66, 90, 97, 122, 46, 82, 95, 48, 57, 65, 90, 97, 122, 46, 77, 81, 95, 48, 57, 65, 90, 97, 122, 46, 77, 95, 48, 57, 65, 90, 97, 122, 46, 68, 78, 83, 95, 48, 57, 65, 90, 97, 122, 46, 78, 82, 83, 95, 48, 57, 65, 90, 97, 122, 42, 42, 42, 47, 70, 78, 102, 110, 9, 10, 13, 32, 36, 45, 46, 47, 95, 40, 41, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 45, 46, 47, 59, 95, 40, 41, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 45, 46, 47, 59, 95, 40, 41, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 36, 45, 46, 47, 59, 95, 40, 41, 48, 57, 65, 90, 97, 122, 48, 57, 9, 10, 13, 32, 36, 45, 46, 47, 59, 95, 40, 41, 48, 57, 65, 90, 97, 122, 42, 42, 42, 47, 9, 10, 13, 32, 36, 45, 46, 47, 59, 95, 40, 41, 48, 57, 65, 90, 97, 122, 70, 80, 86, 102, 112, 118, 73, 105, 76, 108, 69, 101, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 39, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 39, 47, 59, 95, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 59, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 61, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 61, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 47, 59, 42, 42, 42, 47, 9, 
10, 13, 32, 46, 47, 59, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 59, 36, 95, 65, 90, 97, 122, 39, 48, 57, 65, 70, 97, 102, 39, 39, 48, 57, 65, 70, 97, 102, 9, 10, 13, 32, 47, 59, 120, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 48, 57, 39, 39, 42, 42, 42, 47, 85, 117, 84, 116, 9, 10, 13, 32, 47, 9, 10, 13, 32, 35, 36, 47, 64, 95, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 64, 95, 65, 90, 97, 122, 10, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 59, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 59, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 48, 57, 65, 90, 97, 122, 10, 42, 42, 42, 47, 45, 48, 57, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 42, 42, 42, 47, 48, 57, 9, 10, 13, 32, 47, 48, 57, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 46, 47, 67, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 48, 57, 9, 10, 13, 32, 47, 48, 57, 10, 42, 42, 42, 47, 46, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 46, 95, 48, 57, 65, 90, 97, 122, 46, 95, 48, 57, 65, 90, 97, 122, 46, 72, 95, 48, 57, 65, 90, 97, 122, 46, 65, 95, 48, 57, 66, 90, 97, 122, 46, 82, 95, 48, 57, 65, 90, 97, 122, 46, 95, 48, 57, 65, 90, 97, 122, 46, 95, 48, 57, 65, 90, 97, 122, 42, 42, 42, 47, 46, 48, 57, 42, 42, 42, 47, 42, 42, 42, 47, 45, 48, 57, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 9, 10, 13, 32, 47, 59, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 
59, 64, 95, 48, 57, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 59, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 35, 36, 47, 59, 64, 95, 48, 57, 65, 90, 97, 122, 10, 42, 42, 42, 47, 45, 48, 57, 48, 57, 9, 10, 13, 32, 47, 59, 48, 57, 42, 42, 42, 47, 42, 42, 42, 47, 65, 97, 76, 108, 85, 117, 69, 101, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 39, 47, 79, 9, 10, 13, 32, 39, 47, 79, 10, 39, 39, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 46, 47, 9, 10, 13, 32, 46, 47, 10, 9, 10, 13, 32, 47, 59, 65, 90, 9, 10, 13, 32, 39, 47, 59, 79, 9, 10, 13, 32, 39, 47, 59, 79, 10, 42, 42, 42, 47, 84, 72, 69, 82, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 40, 47, 9, 10, 13, 32, 40, 47, 10, 124, 68, 74, 77, 80, 81, 84, 87, 89, 48, 57, 46, 48, 57, 124, 48, 57, 124, 48, 57, 41, 9, 10, 13, 32, 47, 59, 65, 68, 79, 84, 89, 69, 46, 57, 84, 46, 73, 77, 69, 77, 77, 89, 89, 78, 83, 48, 57, 48, 57, 87, 78, 65, 85, 76, 68, 73, 65, 89, 65, 78, 77, 79, 68, 89, 68, 89, 46, 78, 83, 78, 78, 84, 89, 72, 68, 70, 74, 85, 76, 71, 84, 82, 46, 82, 73, 79, 77, 69, 46, 65, 77, 80, 77, 68, 69, 79, 69, 75, 68, 65, 84, 89, 82, 68, 68, 65, 84, 69, 88, 69, 89, 65, 82, 77, 81, 77, 46, 68, 78, 83, 46, 78, 82, 83, 42, 42, 42, 47, 42, 42, 42, 47, 42, 42, 42, 47, 42, 42, 42, 47, 65, 69, 73, 97, 101, 105, 66, 98, 69, 101, 76, 108, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 39, 47, 9, 10, 13, 32, 34, 39, 47, 10, 34, 34, 9, 10, 13, 32, 47, 59, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 
39, 39, 42, 42, 42, 47, 42, 42, 42, 47, 78, 110, 71, 103, 84, 116, 72, 104, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 48, 57, 9, 10, 13, 32, 36, 47, 48, 57, 10, 9, 10, 13, 32, 47, 48, 57, 9, 10, 13, 32, 47, 48, 57, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 48, 57, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 48, 57, 42, 42, 42, 47, 66, 98, 78, 110, 65, 97, 77, 109, 69, 101, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 47, 67, 76, 95, 99, 108, 9, 10, 13, 32, 34, 39, 47, 67, 76, 95, 99, 108, 10, 34, 34, 9, 10, 13, 32, 47, 59, 39, 39, 42, 42, 42, 47, 76, 108, 69, 101, 65, 97, 82, 114, 73, 105, 83, 115, 84, 116, 65, 97, 76, 108, 76, 108, 95, 9, 10, 13, 32, 47, 67, 76, 99, 108, 9, 10, 13, 32, 47, 67, 76, 99, 108, 10, 42, 42, 42, 47, 42, 42, 42, 47, 73, 105, 83, 115, 83, 115, 73, 105, 78, 110, 71, 103, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 59, 95, 48, 57, 65, 90, 97, 122, 42, 42, 42, 47, 80, 112, 84, 116, 73, 105, 79, 111, 78, 110, 83, 115, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 82, 114, 79, 111, 67, 99, 9, 10, 13, 32, 47, 9, 10, 13, 32, 47, 67, 70, 80, 99, 102, 112, 9, 10, 13, 32, 47, 67, 70, 80, 99, 102, 112, 10, 42, 42, 42, 47, 79, 111, 78, 110, 84, 116, 69, 101, 78, 110, 84, 116, 79, 111, 82, 114, 77, 109, 65, 97, 84, 116, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 59, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 61, 95, 
65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 59, 61, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 86, 118, 10, 42, 42, 42, 47, 85, 117, 78, 110, 65, 97, 76, 108, 85, 117, 69, 101, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 40, 45, 47, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 40, 45, 47, 111, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 39, 47, 9, 10, 13, 32, 34, 39, 47, 10, 34, 34, 9, 10, 13, 32, 47, 59, 9, 10, 13, 32, 34, 39, 45, 47, 59, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 47, 59, 111, 48, 57, 65, 90, 97, 122, 10, 39, 39, 48, 57, 9, 10, 13, 32, 47, 61, 48, 57, 42, 42, 42, 47, 9, 10, 13, 32, 47, 61, 48, 57, 9, 10, 13, 32, 45, 47, 61, 9, 10, 13, 32, 45, 47, 61, 10, 9, 10, 13, 32, 47, 9, 10, 13, 32, 47, 48, 57, 9, 10, 13, 32, 47, 48, 57, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 61, 48, 57, 42, 42, 42, 47, 9, 10, 13, 32, 47, 61, 95, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 116, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 104, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 101, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 114, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 46, 57, 65, 90, 97, 122, 39, 39, 42, 42, 42, 47, 36, 95, 65, 90, 97, 122, 9, 10, 13, 32, 41, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 61, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 61, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 41, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 39, 45, 47, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 
34, 39, 45, 47, 111, 48, 57, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 41, 46, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 41, 47, 36, 95, 65, 90, 97, 122, 39, 48, 57, 65, 70, 97, 102, 39, 39, 48, 57, 65, 70, 97, 102, 9, 10, 13, 32, 41, 47, 120, 42, 42, 42, 47, 9, 10, 13, 32, 41, 47, 48, 57, 42, 42, 42, 47, 42, 42, 42, 47, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 47, 59, 42, 42, 42, 47, 9, 10, 13, 32, 46, 47, 59, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 59, 36, 95, 65, 90, 97, 122, 39, 48, 57, 65, 70, 97, 102, 39, 39, 48, 57, 65, 70, 97, 102, 9, 10, 13, 32, 47, 59, 120, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 48, 57, 82, 114, 73, 105, 78, 110, 84, 116, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 59, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 40, 47, 59, 61, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 40, 47, 59, 61, 95, 65, 90, 97, 122, 10, 36, 95, 65, 90, 97, 122, 9, 10, 13, 32, 41, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 61, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 61, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 41, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 41, 46, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 41, 47, 36, 95, 65, 90, 97, 122, 39, 48, 57, 65, 70, 97, 102, 39, 39, 48, 57, 65, 70, 97, 102, 9, 10, 13, 32, 41, 47, 120, 42, 42, 42, 47, 9, 10, 13, 32, 41, 47, 48, 57, 42, 42, 42, 47, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 47, 59, 9, 10, 13, 32, 36, 40, 47, 59, 
95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 40, 47, 59, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 46, 47, 59, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 59, 36, 95, 65, 90, 97, 122, 39, 48, 57, 65, 70, 97, 102, 39, 39, 48, 57, 65, 70, 97, 102, 9, 10, 13, 32, 47, 59, 120, 42, 42, 42, 47, 9, 10, 13, 32, 47, 59, 48, 57, 42, 42, 42, 47, 65, 97, 76, 108, 85, 117, 69, 101, 9, 10, 13, 32, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 9, 10, 13, 32, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 40, 45, 47, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 40, 45, 47, 111, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 9, 10, 13, 32, 47, 61, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 39, 47, 9, 10, 13, 32, 34, 39, 47, 10, 34, 34, 9, 10, 13, 32, 47, 59, 9, 10, 13, 32, 34, 39, 45, 47, 59, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 47, 59, 111, 48, 57, 65, 90, 97, 122, 10, 39, 39, 48, 57, 9, 10, 13, 32, 47, 61, 48, 57, 42, 42, 42, 47, 9, 10, 13, 32, 47, 61, 48, 57, 9, 10, 13, 32, 45, 47, 61, 9, 10, 13, 32, 45, 47, 61, 10, 9, 10, 13, 32, 47, 9, 10, 13, 32, 47, 48, 57, 9, 10, 13, 32, 47, 48, 57, 10, 42, 42, 42, 47, 9, 10, 13, 32, 47, 61, 48, 57, 42, 42, 42, 47, 9, 10, 13, 32, 47, 61, 95, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 116, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 104, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 101, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 114, 46, 57, 65, 90, 97, 122, 9, 10, 13, 32, 47, 61, 95, 46, 57, 65, 90, 97, 122, 39, 39, 42, 42, 42, 47, 36, 95, 65, 90, 97, 122, 9, 10, 13, 32, 41, 47, 61, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 61, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 61, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 36, 38, 39, 47, 95, 48, 57, 65, 90, 97, 122, 10, 34, 34, 9, 10, 13, 32, 
41, 47, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 9, 10, 13, 32, 36, 47, 95, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 34, 39, 45, 47, 111, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 34, 39, 45, 47, 111, 48, 57, 65, 90, 97, 122, 10, 42, 42, 42, 47, 9, 10, 13, 32, 41, 46, 47, 95, 48, 57, 65, 90, 97, 122, 9, 10, 13, 32, 41, 47, 36, 95, 65, 90, 97, 122, 39, 48, 57, 65, 70, 97, 102, 39, 39, 48, 57, 65, 70, 97, 102, 9, 10, 13, 32, 41, 47, 120, 42, 42, 42, 47, 9, 10, 13, 32, 41, 47, 48, 57, 42, 42, 42, 47, 42, 42, 42, 47, 9, 10, 13, 32, 37, 42, 47, 65, 68, 70, 73, 76, 77, 79, 80, 82, 86, 97, 100, 102, 105, 108, 109, 111, 112, 114, 118, 9, 10, 13, 32, 37, 42, 47, 65, 68, 70, 73, 76, 77, 79, 80, 82, 86, 97, 100, 102, 105, 108, 109, 111, 112, 114, 118, 9, 10, 13, 32, 37, 42, 47, 59, 65, 68, 70, 73, 76, 77, 79, 80, 82, 86, 97, 100, 102, 105, 108, 109, 111, 112, 114, 118, 9, 10, 13, 32, 37, 42, 47, 59, 65, 68, 70, 73, 76, 77, 79, 80, 82, 86, 97, 100, 102, 105, 108, 109, 111, 112, 114, 118, 0 }; static const signed char _sas_commands_single_lengths[] = { 0, 1, 2, 2, 2, 5, 7, 7, 1, 7, 6, 6, 1, 1, 1, 2, 7, 7, 1, 1, 1, 6, 6, 6, 1, 1, 1, 2, 1, 1, 1, 1, 2, 7, 1, 1, 2, 1, 5, 2, 1, 1, 2, 2, 2, 2, 2, 2, 5, 7, 7, 1, 6, 9, 9, 1, 1, 1, 2, 2, 2, 2, 2, 2, 6, 6, 1, 1, 1, 2, 5, 5, 1, 1, 1, 2, 1, 6, 12, 12, 1, 1, 1, 2, 8, 8, 8, 8, 8, 7, 10, 10, 1, 1, 1, 2, 4, 2, 2, 2, 6, 6, 1, 1, 1, 2, 7, 7, 1, 1, 1, 6, 1, 1, 1, 1, 2, 2, 2, 2, 2, 6, 6, 1, 1, 1, 2, 6, 6, 1, 5, 5, 1, 1, 1, 2, 6, 1, 1, 2, 6, 10, 8, 8, 8, 7, 10, 10, 1, 1, 1, 2, 8, 8, 8, 8, 7, 10, 10, 1, 1, 1, 2, 6, 1, 1, 2, 2, 2, 2, 7, 8, 8, 1, 10, 9, 9, 1, 2, 1, 1, 2, 1, 1, 8, 1, 1, 9, 8, 10, 1, 1, 2, 4, 2, 2, 2, 2, 2, 2, 5, 9, 9, 1, 7, 10, 10, 1, 1, 1, 2, 10, 10, 1, 1, 1, 5, 1, 1, 2, 7, 5, 2, 1, 1, 1, 6, 1, 1, 2, 5, 4, 2, 2, 2, 2, 2, 5, 7, 7, 1, 1, 1, 2, 2, 2, 2, 5, 7, 7, 1, 6, 15, 15, 1, 2, 6, 8, 8, 1, 1, 1, 2, 1, 1, 2, 1, 6, 6, 5, 4, 3, 4, 6, 2, 3, 3, 3, 3, 3, 3, 3, 4, 2, 2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 4, 4, 3, 3, 4, 3, 5, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 4, 4, 3, 3, 4, 3, 5, 5, 1, 1, 2, 4, 9, 10, 10, 1, 10, 0, 10, 1, 1, 2, 10, 6, 2, 2, 2, 8, 10, 10, 1, 1, 1, 8, 8, 8, 1, 8, 9, 9, 1, 1, 1, 2, 10, 10, 1, 1, 1, 6, 1, 1, 2, 8, 6, 2, 1, 1, 1, 7, 1, 1, 2, 6, 1, 1, 1, 1, 2, 2, 2, 5, 9, 9, 1, 0, 6, 10, 10, 1, 7, 10, 10, 1, 7, 10, 10, 1, 1, 1, 2, 1, 0, 6, 1, 1, 2, 0, 5, 7, 7, 1, 6, 7, 7, 1, 8, 5, 5, 1, 1, 1, 2, 1, 6, 6, 2, 2, 3, 3, 3, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 0, 6, 7, 10, 10, 1, 7, 10, 10, 1, 1, 1, 2, 1, 0, 6, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 5, 7, 7, 1, 6, 7, 7, 1, 1, 1, 6, 6, 1, 1, 1, 2, 6, 6, 1, 6, 8, 8, 1, 1, 1, 2, 1, 1, 1, 1, 6, 6, 1, 1, 1, 2, 6, 6, 1, 1, 8, 1, 1, 1, 1, 6, 3, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 4, 4, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 6, 2, 2, 2, 5, 7, 7, 1, 7, 6, 6, 1, 1, 1, 2, 7, 7, 1, 1, 1, 6, 8, 8, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 5, 7, 7, 1, 6, 6, 6, 1, 5, 5, 1, 1, 1, 2, 6, 8, 8, 1, 1, 1, 2, 1, 1, 2, 6, 1, 1, 2, 2, 2, 2, 2, 2, 5, 7, 7, 1, 6, 12, 12, 1, 1, 1, 6, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 9, 9, 1, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 5, 7, 7, 1, 7, 1, 1, 2, 2, 2, 2, 2, 2, 2, 5, 7, 7, 1, 1, 1, 2, 2, 2, 2, 5, 11, 11, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 1, 8, 9, 9, 1, 1, 1, 2, 8, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 5, 7, 7, 1, 6, 10, 10, 1, 1, 1, 6, 6, 6, 1, 1, 1, 2, 7, 7, 1, 1, 1, 6, 10, 10, 1, 1, 1, 0, 6, 1, 1, 2, 6, 7, 7, 1, 5, 5, 5, 1, 1, 1, 2, 6, 1, 1, 2, 7, 8, 8, 8, 8, 7, 1, 1, 1, 1, 2, 2, 8, 8, 8, 1, 1, 1, 2, 10, 10, 1, 1, 1, 6, 7, 7, 1, 1, 1, 2, 9, 9, 1, 1, 1, 2, 8, 6, 2, 1, 1, 1, 7, 1, 1, 2, 6, 1, 1, 2, 1, 1, 2, 10, 10, 1, 1, 1, 6, 1, 1, 2, 8, 6, 2, 1, 1, 1, 7, 1, 1, 2, 6, 2, 2, 2, 2, 5, 7, 7, 1, 8, 10, 10, 1, 2, 8, 8, 8, 1, 1, 1, 2, 10, 10, 1, 1, 1, 6, 7, 7, 1, 1, 1, 2, 8, 6, 2, 1, 1, 1, 7, 1, 1, 2, 6, 1, 
1, 2, 10, 10, 1, 1, 1, 6, 9, 9, 1, 1, 1, 2, 8, 6, 2, 1, 1, 1, 7, 1, 1, 2, 6, 1, 1, 2, 2, 2, 2, 2, 5, 7, 7, 1, 6, 10, 10, 1, 1, 1, 6, 6, 6, 1, 1, 1, 2, 7, 7, 1, 1, 1, 6, 10, 10, 1, 1, 1, 0, 6, 1, 1, 2, 6, 7, 7, 1, 5, 5, 5, 1, 1, 1, 2, 6, 1, 1, 2, 7, 8, 8, 8, 8, 7, 1, 1, 1, 1, 2, 2, 8, 8, 8, 1, 1, 1, 2, 10, 10, 1, 1, 1, 6, 7, 7, 1, 1, 1, 2, 9, 9, 1, 1, 1, 2, 8, 6, 2, 1, 1, 1, 7, 1, 1, 2, 6, 1, 1, 2, 1, 1, 2, 27, 27, 28, 28, 0 }; static const signed char _sas_commands_range_lengths[] = { 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 2, 2, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 3, 2, 2, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 3, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 2, 2, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 2, 3, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 3, 3, 0, 3, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 4, 4, 4, 0, 4, 1, 4, 0, 0, 0, 4, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 2, 2, 2, 0, 3, 2, 2, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 2, 3, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 1, 1, 2, 2, 0, 3, 3, 3, 0, 3, 3, 3, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 2, 2, 0, 3, 3, 3, 0, 3, 1, 1, 0, 0, 0, 0, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 3, 3, 3, 0, 3, 3, 3, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 2, 3, 2, 2, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 3, 0, 2, 3, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 2, 3, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 2, 0, 3, 2, 2, 0, 2, 3, 2, 2, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 3, 0, 2, 3, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 3, 3, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 3, 0, 2, 3, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 2, 3, 2, 2, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 3, 0, 2, 3, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; static const short _sas_commands_index_offsets[] = { 0, 0, 2, 5, 8, 11, 17, 27, 37, 39, 50, 57, 64, 66, 68, 70, 73, 83, 93, 95, 97, 99, 106, 113, 120, 122, 124, 126, 129, 131, 133, 135, 137, 140, 151, 153, 
155, 158, 160, 166, 169, 171, 173, 176, 179, 182, 185, 188, 191, 197, 207, 217, 219, 229, 239, 249, 251, 253, 255, 258, 261, 264, 267, 270, 273, 280, 287, 289, 291, 293, 296, 303, 310, 312, 314, 316, 319, 322, 330, 345, 360, 362, 364, 366, 369, 381, 393, 405, 417, 429, 440, 451, 462, 464, 466, 468, 471, 476, 479, 482, 485, 492, 499, 501, 503, 505, 508, 516, 524, 526, 528, 530, 537, 539, 541, 543, 545, 548, 551, 554, 557, 560, 567, 574, 576, 578, 580, 583, 591, 599, 601, 608, 615, 617, 619, 621, 624, 632, 634, 636, 639, 647, 661, 673, 685, 697, 708, 719, 730, 732, 734, 736, 739, 751, 763, 775, 787, 798, 809, 820, 822, 824, 826, 829, 837, 839, 841, 844, 847, 850, 853, 863, 874, 885, 887, 901, 913, 925, 927, 932, 934, 936, 939, 941, 943, 954, 956, 958, 971, 982, 996, 998, 1000, 1003, 1008, 1011, 1014, 1017, 1020, 1023, 1026, 1032, 1044, 1056, 1058, 1069, 1082, 1095, 1097, 1099, 1101, 1104, 1118, 1132, 1134, 1136, 1138, 1144, 1146, 1148, 1151, 1162, 1168, 1173, 1178, 1180, 1185, 1192, 1194, 1196, 1199, 1206, 1211, 1214, 1217, 1220, 1223, 1226, 1232, 1240, 1248, 1250, 1252, 1254, 1257, 1260, 1263, 1266, 1272, 1282, 1292, 1294, 1304, 1323, 1342, 1344, 1350, 1357, 1368, 1379, 1381, 1383, 1385, 1388, 1390, 1392, 1395, 1398, 1406, 1414, 1423, 1431, 1438, 1446, 1454, 1460, 1467, 1474, 1481, 1488, 1495, 1502, 1509, 1517, 1523, 1529, 1536, 1543, 1550, 1557, 1564, 1572, 1579, 1586, 1593, 1600, 1608, 1616, 1623, 1630, 1638, 1645, 1654, 1661, 1668, 1675, 1682, 1689, 1696, 1703, 1710, 1717, 1724, 1732, 1739, 1746, 1753, 1760, 1767, 1774, 1781, 1789, 1796, 1803, 1810, 1817, 1825, 1832, 1839, 1846, 1853, 1860, 1868, 1876, 1883, 1890, 1898, 1905, 1914, 1923, 1925, 1927, 1930, 1935, 1949, 1964, 1979, 1981, 1996, 1998, 2013, 2015, 2017, 2020, 2035, 2042, 2045, 2048, 2051, 2062, 2075, 2088, 2090, 2092, 2094, 2105, 2116, 2127, 2129, 2141, 2153, 2165, 2167, 2169, 2171, 2174, 2188, 2202, 2204, 2206, 2208, 2215, 2217, 2219, 2222, 2234, 2241, 2246, 2251, 2253, 2258, 2266, 2268, 2270, 2273, 
2281, 2283, 2285, 2287, 2289, 2292, 2295, 2298, 2304, 2316, 2328, 2330, 2332, 2340, 2353, 2366, 2368, 2379, 2393, 2407, 2409, 2420, 2434, 2448, 2450, 2452, 2454, 2457, 2460, 2462, 2470, 2472, 2474, 2477, 2479, 2486, 2496, 2506, 2508, 2518, 2529, 2540, 2542, 2554, 2561, 2568, 2570, 2572, 2574, 2577, 2580, 2588, 2596, 2602, 2608, 2615, 2622, 2629, 2635, 2641, 2643, 2645, 2648, 2651, 2653, 2655, 2658, 2660, 2662, 2665, 2668, 2670, 2678, 2689, 2703, 2717, 2719, 2730, 2744, 2758, 2760, 2762, 2764, 2767, 2770, 2772, 2780, 2782, 2784, 2787, 2789, 2791, 2794, 2797, 2800, 2803, 2806, 2812, 2822, 2832, 2834, 2844, 2852, 2860, 2862, 2864, 2866, 2873, 2880, 2882, 2884, 2886, 2889, 2896, 2903, 2905, 2913, 2922, 2931, 2933, 2935, 2937, 2940, 2942, 2944, 2946, 2948, 2955, 2962, 2964, 2966, 2968, 2971, 2978, 2985, 2987, 2989, 2999, 3002, 3005, 3008, 3010, 3017, 3021, 3024, 3026, 3030, 3032, 3034, 3036, 3038, 3040, 3042, 3044, 3046, 3050, 3052, 3054, 3056, 3058, 3060, 3062, 3065, 3067, 3069, 3071, 3073, 3076, 3079, 3081, 3083, 3087, 3089, 3093, 3095, 3097, 3099, 3101, 3103, 3105, 3107, 3109, 3111, 3114, 3117, 3119, 3121, 3124, 3126, 3128, 3130, 3132, 3135, 3137, 3139, 3141, 3143, 3146, 3148, 3150, 3152, 3154, 3156, 3159, 3162, 3164, 3166, 3169, 3171, 3176, 3181, 3183, 3185, 3188, 3190, 3192, 3195, 3197, 3199, 3202, 3204, 3206, 3209, 3216, 3219, 3222, 3225, 3231, 3241, 3251, 3253, 3264, 3271, 3278, 3280, 3282, 3284, 3287, 3295, 3303, 3305, 3307, 3309, 3316, 3327, 3338, 3340, 3342, 3344, 3347, 3349, 3351, 3353, 3355, 3358, 3360, 3362, 3365, 3368, 3371, 3374, 3377, 3383, 3393, 3403, 3405, 3415, 3423, 3431, 3433, 3440, 3447, 3449, 3451, 3453, 3456, 3464, 3475, 3486, 3488, 3490, 3492, 3495, 3497, 3499, 3502, 3510, 3512, 3514, 3517, 3520, 3523, 3526, 3529, 3532, 3538, 3548, 3558, 3560, 3570, 3583, 3596, 3598, 3600, 3602, 3609, 3611, 3613, 3615, 3617, 3620, 3623, 3626, 3629, 3632, 3635, 3638, 3641, 3644, 3647, 3650, 3652, 3662, 3672, 3674, 3676, 3678, 3681, 3683, 3685, 3688, 3691, 3694, 
3697, 3700, 3703, 3706, 3712, 3722, 3732, 3734, 3745, 3747, 3749, 3752, 3755, 3758, 3761, 3764, 3767, 3770, 3776, 3786, 3796, 3798, 3800, 3802, 3805, 3808, 3811, 3814, 3820, 3832, 3844, 3846, 3848, 3850, 3853, 3856, 3859, 3862, 3865, 3868, 3871, 3874, 3877, 3880, 3883, 3886, 3897, 3908, 3910, 3922, 3934, 3946, 3948, 3950, 3952, 3955, 3964, 3966, 3968, 3970, 3973, 3976, 3979, 3982, 3985, 3988, 3991, 3997, 4007, 4017, 4019, 4029, 4043, 4057, 4059, 4061, 4063, 4070, 4077, 4084, 4086, 4088, 4090, 4093, 4101, 4109, 4111, 4113, 4115, 4122, 4136, 4150, 4152, 4154, 4156, 4158, 4166, 4168, 4170, 4173, 4181, 4189, 4197, 4199, 4205, 4212, 4219, 4221, 4223, 4225, 4228, 4236, 4238, 4240, 4243, 4254, 4266, 4278, 4290, 4302, 4313, 4315, 4317, 4319, 4321, 4324, 4329, 4341, 4352, 4363, 4365, 4367, 4369, 4372, 4386, 4400, 4402, 4404, 4406, 4413, 4423, 4433, 4435, 4437, 4439, 4442, 4455, 4468, 4470, 4472, 4474, 4477, 4489, 4496, 4501, 4506, 4508, 4513, 4521, 4523, 4525, 4528, 4536, 4538, 4540, 4543, 4545, 4547, 4550, 4564, 4578, 4580, 4582, 4584, 4591, 4593, 4595, 4598, 4610, 4617, 4622, 4627, 4629, 4634, 4642, 4644, 4646, 4649, 4657, 4660, 4663, 4666, 4669, 4675, 4685, 4695, 4697, 4709, 4722, 4735, 4737, 4742, 4754, 4765, 4776, 4778, 4780, 4782, 4785, 4799, 4813, 4815, 4817, 4819, 4826, 4836, 4846, 4848, 4850, 4852, 4855, 4867, 4874, 4879, 4884, 4886, 4891, 4899, 4901, 4903, 4906, 4914, 4916, 4918, 4921, 4935, 4949, 4951, 4953, 4955, 4962, 4974, 4986, 4988, 4990, 4992, 4995, 5007, 5014, 5019, 5024, 5026, 5031, 5039, 5041, 5043, 5046, 5054, 5056, 5058, 5061, 5064, 5067, 5070, 5073, 5079, 5089, 5099, 5101, 5111, 5125, 5139, 5141, 5143, 5145, 5152, 5159, 5166, 5168, 5170, 5172, 5175, 5183, 5191, 5193, 5195, 5197, 5204, 5218, 5232, 5234, 5236, 5238, 5240, 5248, 5250, 5252, 5255, 5263, 5271, 5279, 5281, 5287, 5294, 5301, 5303, 5305, 5307, 5310, 5318, 5320, 5322, 5325, 5336, 5348, 5360, 5372, 5384, 5395, 5397, 5399, 5401, 5403, 5406, 5411, 5423, 5434, 5445, 5447, 5449, 5451, 5454, 5468, 
5482, 5484, 5486, 5488, 5495, 5505, 5515, 5517, 5519, 5521, 5524, 5537, 5550, 5552, 5554, 5556, 5559, 5571, 5578, 5583, 5588, 5590, 5595, 5603, 5605, 5607, 5610, 5618, 5620, 5622, 5625, 5627, 5629, 5632, 5660, 5688, 5717, 0 }; static const short _sas_commands_cond_targs[] = { 1095, 0, 3, 3, 0, 4, 4, 0, 5, 5, 0, 6, 7, 8, 6, 34, 0, 6, 7, 8, 6, 9, 34, 9, 9, 9, 0, 6, 7, 8, 6, 9, 34, 9, 9, 9, 0, 7, 0, 10, 11, 12, 10, 13, 16, 9, 9, 9, 9, 0, 10, 11, 12, 10, 13, 16, 0, 10, 11, 12, 10, 13, 16, 0, 11, 0, 14, 0, 15, 14, 15, 10, 14, 16, 17, 18, 16, 19, 28, 30, 33, 33, 0, 16, 17, 18, 16, 19, 28, 30, 33, 33, 0, 17, 0, 21, 20, 21, 20, 22, 23, 24, 22, 25, 1094, 0, 22, 23, 24, 22, 25, 1094, 0, 22, 23, 24, 22, 25, 1094, 0, 23, 0, 26, 0, 27, 26, 27, 22, 26, 21, 29, 21, 29, 31, 0, 32, 31, 32, 16, 31, 22, 23, 24, 22, 25, 1094, 33, 33, 33, 33, 0, 35, 0, 36, 35, 36, 6, 35, 38, 37, 38, 1095, 39, 38, 38, 37, 1095, 38, 37, 41, 0, 42, 41, 42, 1094, 41, 44, 44, 0, 45, 45, 0, 46, 46, 0, 47, 47, 0, 48, 48, 0, 49, 50, 51, 49, 164, 0, 49, 50, 51, 49, 52, 164, 52, 52, 52, 0, 49, 50, 51, 49, 52, 164, 52, 52, 52, 0, 50, 0, 53, 54, 55, 53, 56, 52, 52, 52, 52, 0, 53, 54, 55, 53, 56, 59, 96, 59, 96, 0, 53, 54, 55, 53, 56, 59, 96, 59, 96, 0, 54, 0, 57, 0, 58, 57, 58, 53, 57, 60, 60, 0, 61, 61, 0, 62, 62, 0, 63, 63, 0, 64, 64, 0, 64, 65, 66, 64, 67, 70, 0, 64, 65, 66, 64, 67, 70, 0, 65, 0, 68, 0, 69, 68, 69, 64, 68, 70, 71, 72, 70, 73, 76, 0, 70, 71, 72, 70, 73, 76, 0, 71, 0, 74, 0, 75, 74, 75, 70, 74, 77, 76, 0, 78, 79, 80, 78, 81, 1094, 163, 0, 78, 79, 80, 78, 52, 81, 1094, 84, 141, 52, 84, 141, 52, 52, 0, 78, 79, 80, 78, 52, 81, 1094, 84, 141, 52, 84, 141, 52, 52, 0, 79, 0, 82, 0, 83, 82, 83, 78, 82, 53, 54, 55, 53, 56, 85, 52, 85, 52, 52, 52, 0, 53, 54, 55, 53, 56, 86, 52, 86, 52, 52, 52, 0, 53, 54, 55, 53, 56, 87, 52, 87, 52, 52, 52, 0, 53, 54, 55, 53, 56, 88, 52, 88, 52, 52, 52, 0, 53, 54, 55, 53, 56, 89, 52, 89, 52, 52, 52, 0, 90, 91, 92, 90, 93, 70, 52, 52, 52, 52, 0, 90, 91, 92, 90, 93, 70, 59, 
96, 59, 96, 0, 90, 91, 92, 90, 93, 70, 59, 96, 59, 96, 0, 91, 0, 94, 0, 95, 94, 95, 90, 94, 97, 117, 97, 117, 0, 98, 98, 0, 99, 99, 0, 100, 100, 0, 100, 101, 102, 100, 103, 106, 0, 100, 101, 102, 100, 103, 106, 0, 101, 0, 104, 0, 105, 104, 105, 100, 104, 106, 107, 108, 106, 109, 112, 114, 0, 106, 107, 108, 106, 109, 112, 114, 0, 107, 0, 111, 110, 111, 110, 78, 79, 80, 78, 81, 1094, 0, 111, 113, 111, 113, 115, 0, 116, 115, 116, 106, 115, 118, 118, 0, 119, 119, 0, 120, 120, 0, 121, 121, 0, 121, 122, 123, 121, 124, 127, 0, 121, 122, 123, 121, 124, 127, 0, 122, 0, 125, 0, 126, 125, 126, 121, 125, 127, 128, 129, 127, 130, 137, 140, 0, 127, 128, 129, 127, 130, 137, 140, 0, 128, 0, 130, 131, 132, 130, 133, 136, 0, 130, 131, 132, 130, 133, 136, 0, 131, 0, 134, 0, 135, 134, 135, 130, 134, 78, 79, 80, 78, 81, 1094, 136, 0, 138, 0, 139, 138, 139, 127, 138, 78, 79, 80, 78, 81, 1094, 140, 0, 53, 54, 55, 53, 56, 142, 152, 52, 142, 152, 52, 52, 52, 0, 53, 54, 55, 53, 56, 143, 52, 143, 52, 52, 52, 0, 53, 54, 55, 53, 56, 144, 52, 144, 52, 52, 52, 0, 53, 54, 55, 53, 56, 145, 52, 145, 52, 52, 52, 0, 146, 147, 148, 146, 149, 106, 52, 52, 52, 52, 0, 146, 147, 148, 146, 149, 106, 59, 96, 59, 96, 0, 146, 147, 148, 146, 149, 106, 59, 96, 59, 96, 0, 147, 0, 150, 0, 151, 150, 151, 146, 150, 53, 54, 55, 53, 56, 153, 52, 153, 52, 52, 52, 0, 53, 54, 55, 53, 56, 154, 52, 154, 52, 52, 52, 0, 53, 54, 55, 53, 56, 155, 52, 155, 52, 52, 52, 0, 53, 54, 55, 53, 56, 156, 52, 156, 52, 52, 52, 0, 157, 158, 159, 157, 160, 127, 52, 52, 52, 52, 0, 157, 158, 159, 157, 160, 127, 59, 96, 59, 96, 0, 157, 158, 159, 157, 160, 127, 59, 96, 59, 96, 0, 158, 0, 161, 0, 162, 161, 162, 157, 161, 78, 79, 80, 78, 81, 1094, 163, 0, 165, 0, 166, 165, 166, 49, 165, 168, 168, 0, 169, 169, 0, 170, 170, 0, 171, 172, 173, 171, 182, 185, 190, 187, 187, 0, 171, 172, 173, 171, 174, 178, 190, 174, 174, 174, 0, 171, 172, 173, 171, 174, 178, 190, 174, 174, 174, 0, 172, 0, 175, 176, 177, 175, 182, 185, 188, 179, 1094, 174, 174, 189, 
189, 0, 175, 176, 177, 175, 174, 178, 179, 1094, 174, 174, 174, 0, 175, 176, 177, 175, 174, 178, 179, 1094, 174, 174, 174, 0, 176, 0, 174, 174, 174, 174, 0, 180, 0, 181, 180, 181, 175, 180, 184, 183, 184, 183, 175, 176, 177, 175, 182, 185, 179, 1094, 187, 187, 0, 184, 186, 184, 186, 175, 176, 177, 175, 182, 185, 179, 1094, 187, 187, 187, 187, 0, 175, 176, 177, 175, 182, 185, 179, 1094, 187, 187, 0, 175, 176, 177, 175, 182, 185, 187, 179, 1094, 189, 189, 189, 189, 0, 191, 0, 192, 191, 192, 171, 191, 194, 231, 194, 231, 0, 195, 195, 0, 196, 196, 0, 197, 197, 0, 198, 198, 0, 199, 199, 0, 200, 200, 0, 201, 202, 203, 201, 217, 0, 201, 202, 203, 201, 19, 204, 28, 217, 204, 204, 204, 0, 201, 202, 203, 201, 19, 204, 28, 217, 204, 204, 204, 0, 202, 0, 205, 206, 207, 205, 208, 211, 204, 204, 204, 204, 0, 205, 206, 207, 205, 19, 204, 28, 208, 211, 204, 204, 204, 0, 205, 206, 207, 205, 19, 204, 28, 208, 211, 204, 204, 204, 0, 206, 0, 209, 0, 210, 209, 210, 205, 209, 211, 212, 213, 211, 214, 220, 222, 223, 227, 220, 230, 220, 220, 0, 211, 212, 213, 211, 214, 220, 222, 223, 227, 220, 230, 220, 220, 0, 212, 0, 216, 215, 216, 215, 201, 202, 203, 201, 217, 0, 218, 0, 219, 218, 219, 201, 218, 201, 202, 203, 201, 221, 217, 220, 220, 220, 220, 0, 201, 202, 203, 201, 217, 0, 220, 220, 220, 220, 0, 216, 225, 225, 225, 224, 216, 224, 226, 225, 225, 225, 224, 201, 202, 203, 201, 217, 221, 0, 228, 0, 229, 228, 229, 211, 228, 201, 202, 203, 201, 217, 230, 0, 232, 244, 232, 244, 0, 233, 233, 0, 234, 234, 0, 235, 235, 0, 236, 236, 0, 237, 237, 0, 238, 239, 240, 238, 241, 0, 238, 239, 240, 238, 19, 28, 241, 0, 238, 239, 240, 238, 19, 28, 241, 0, 239, 0, 242, 0, 243, 242, 243, 238, 242, 245, 245, 0, 246, 246, 0, 247, 247, 0, 248, 249, 250, 248, 339, 0, 248, 249, 250, 248, 251, 339, 251, 251, 251, 0, 248, 249, 250, 248, 251, 339, 251, 251, 251, 0, 249, 0, 252, 253, 254, 252, 263, 251, 251, 251, 251, 0, 252, 253, 254, 252, 255, 263, 269, 288, 295, 303, 309, 312, 320, 332, 255, 266, 255, 255, 0, 
252, 253, 254, 252, 255, 263, 269, 288, 295, 303, 309, 312, 320, 332, 255, 266, 255, 255, 0, 253, 0, 256, 255, 255, 255, 255, 0, 257, 258, 259, 257, 260, 1094, 0, 257, 258, 259, 257, 251, 260, 1094, 251, 251, 251, 0, 257, 258, 259, 257, 251, 260, 1094, 251, 251, 251, 0, 258, 0, 261, 0, 262, 261, 262, 257, 261, 264, 0, 265, 264, 265, 252, 264, 267, 266, 0, 257, 258, 259, 257, 260, 1094, 268, 0, 257, 258, 259, 257, 260, 1094, 268, 0, 256, 270, 278, 285, 255, 255, 255, 255, 0, 256, 271, 274, 255, 255, 255, 255, 0, 256, 272, 255, 255, 255, 255, 0, 273, 274, 275, 255, 255, 255, 255, 0, 257, 258, 259, 257, 260, 1094, 268, 0, 273, 255, 255, 255, 255, 0, 256, 276, 255, 255, 255, 255, 0, 256, 277, 255, 255, 255, 255, 0, 256, 274, 255, 255, 255, 255, 0, 256, 279, 255, 255, 255, 255, 0, 256, 280, 255, 255, 255, 255, 0, 256, 281, 255, 255, 255, 255, 0, 256, 282, 255, 255, 255, 255, 0, 256, 284, 284, 255, 283, 255, 255, 0, 273, 255, 283, 255, 255, 0, 256, 255, 283, 255, 255, 0, 256, 286, 255, 255, 255, 255, 0, 256, 287, 255, 255, 255, 255, 0, 256, 276, 255, 255, 255, 255, 0, 256, 289, 255, 255, 255, 255, 0, 256, 290, 255, 255, 255, 255, 0, 256, 291, 293, 255, 255, 255, 255, 0, 256, 292, 255, 255, 255, 255, 0, 256, 274, 255, 255, 255, 255, 0, 256, 294, 255, 255, 255, 255, 0, 256, 274, 255, 255, 255, 255, 0, 256, 296, 300, 255, 255, 255, 255, 0, 256, 297, 298, 255, 255, 255, 255, 0, 256, 280, 255, 255, 255, 255, 0, 256, 299, 255, 255, 255, 255, 0, 273, 274, 274, 255, 255, 255, 255, 0, 256, 301, 255, 255, 255, 255, 0, 256, 287, 302, 292, 255, 255, 255, 255, 0, 256, 274, 255, 255, 255, 255, 0, 256, 304, 255, 255, 255, 255, 0, 256, 305, 255, 255, 255, 255, 0, 256, 306, 255, 255, 255, 255, 0, 256, 307, 255, 255, 255, 255, 0, 256, 308, 255, 255, 255, 255, 0, 256, 274, 255, 255, 255, 255, 0, 256, 310, 255, 255, 255, 255, 0, 256, 311, 255, 255, 255, 255, 0, 273, 274, 255, 255, 255, 255, 0, 256, 313, 319, 255, 255, 255, 255, 0, 256, 314, 255, 255, 255, 255, 0, 256, 315, 255, 255, 255, 
255, 0, 273, 316, 255, 255, 255, 255, 0, 256, 317, 255, 255, 255, 255, 0, 256, 318, 255, 255, 255, 255, 0, 256, 274, 255, 255, 255, 255, 0, 256, 274, 255, 255, 255, 255, 0, 256, 321, 326, 255, 255, 255, 255, 0, 256, 322, 255, 255, 255, 255, 0, 256, 323, 255, 255, 255, 255, 0, 256, 324, 255, 255, 255, 255, 0, 256, 325, 255, 255, 255, 255, 0, 256, 277, 274, 255, 255, 255, 255, 0, 256, 327, 255, 255, 255, 255, 0, 256, 328, 255, 255, 255, 255, 0, 256, 329, 255, 255, 255, 255, 0, 256, 330, 255, 255, 255, 255, 0, 256, 331, 255, 255, 255, 255, 0, 256, 274, 274, 255, 255, 255, 255, 0, 256, 333, 335, 255, 255, 255, 255, 0, 256, 334, 255, 255, 255, 255, 0, 256, 274, 255, 255, 255, 255, 0, 256, 336, 338, 255, 255, 255, 255, 0, 256, 337, 255, 255, 255, 255, 0, 273, 319, 274, 274, 255, 255, 255, 255, 0, 273, 274, 299, 274, 255, 255, 255, 255, 0, 340, 0, 341, 340, 341, 248, 340, 343, 354, 343, 354, 0, 344, 345, 346, 344, 347, 348, 344, 350, 347, 344, 349, 347, 347, 0, 344, 345, 346, 344, 347, 348, 344, 350, 1094, 347, 344, 349, 347, 347, 0, 344, 345, 346, 344, 347, 348, 344, 350, 1094, 347, 344, 349, 347, 347, 0, 345, 0, 344, 345, 346, 344, 347, 348, 344, 350, 1094, 347, 344, 353, 347, 347, 0, 349, 0, 344, 345, 346, 344, 347, 348, 344, 350, 1094, 347, 344, 349, 347, 347, 0, 351, 0, 352, 351, 352, 344, 351, 344, 345, 346, 344, 347, 348, 344, 350, 1094, 347, 344, 353, 347, 347, 0, 355, 400, 487, 355, 400, 487, 0, 356, 356, 0, 357, 357, 0, 358, 358, 0, 359, 360, 361, 359, 368, 397, 1094, 368, 368, 368, 0, 359, 360, 361, 359, 362, 368, 395, 397, 1094, 368, 368, 368, 0, 359, 360, 361, 359, 362, 368, 395, 397, 1094, 368, 368, 368, 0, 360, 0, 364, 363, 364, 363, 365, 366, 367, 365, 368, 381, 1094, 368, 368, 368, 0, 365, 366, 367, 365, 368, 381, 1094, 368, 368, 368, 0, 365, 366, 367, 365, 368, 381, 1094, 368, 368, 368, 0, 366, 0, 369, 370, 371, 369, 372, 1094, 375, 368, 368, 368, 368, 0, 369, 370, 371, 369, 368, 372, 1094, 375, 368, 368, 368, 0, 369, 370, 371, 369, 368, 372, 1094, 375, 
368, 368, 368, 0, 370, 0, 373, 0, 374, 373, 374, 369, 373, 375, 376, 377, 375, 378, 384, 386, 387, 391, 384, 394, 384, 384, 0, 375, 376, 377, 375, 378, 384, 386, 387, 391, 384, 394, 384, 384, 0, 376, 0, 380, 379, 380, 379, 365, 366, 367, 365, 381, 1094, 0, 382, 0, 383, 382, 383, 365, 382, 365, 366, 367, 365, 385, 381, 1094, 384, 384, 384, 384, 0, 365, 366, 367, 365, 381, 1094, 0, 384, 384, 384, 384, 0, 380, 389, 389, 389, 388, 380, 388, 390, 389, 389, 389, 388, 365, 366, 367, 365, 381, 1094, 385, 0, 392, 0, 393, 392, 393, 375, 392, 365, 366, 367, 365, 381, 1094, 394, 0, 364, 396, 364, 396, 398, 0, 399, 398, 399, 359, 398, 401, 401, 0, 402, 402, 0, 403, 404, 405, 403, 484, 0, 403, 404, 405, 403, 406, 467, 484, 428, 467, 467, 467, 0, 403, 404, 405, 403, 406, 467, 484, 428, 467, 467, 467, 0, 404, 0, 407, 0, 408, 409, 410, 408, 425, 1094, 407, 0, 408, 409, 410, 408, 406, 411, 425, 1094, 428, 411, 411, 411, 0, 408, 409, 410, 408, 406, 411, 425, 1094, 428, 411, 411, 411, 0, 409, 0, 412, 413, 414, 412, 461, 1094, 411, 411, 411, 411, 0, 412, 413, 414, 412, 406, 415, 461, 1094, 428, 411, 464, 411, 411, 0, 412, 413, 414, 412, 406, 415, 461, 1094, 428, 411, 464, 411, 411, 0, 413, 0, 416, 417, 418, 416, 419, 1094, 411, 411, 411, 411, 0, 416, 417, 418, 416, 406, 415, 419, 1094, 428, 411, 422, 411, 411, 0, 416, 417, 418, 416, 406, 415, 419, 1094, 428, 411, 422, 411, 411, 0, 417, 0, 420, 0, 421, 420, 421, 416, 420, 423, 422, 0, 424, 0, 408, 409, 410, 408, 425, 1094, 424, 0, 426, 0, 427, 426, 427, 408, 426, 429, 0, 430, 431, 432, 430, 458, 429, 0, 430, 431, 432, 430, 433, 458, 433, 433, 433, 0, 430, 431, 432, 430, 433, 458, 433, 433, 433, 0, 431, 0, 434, 435, 436, 434, 454, 433, 433, 433, 433, 0, 434, 435, 436, 434, 437, 454, 448, 457, 448, 448, 0, 434, 435, 436, 434, 437, 454, 448, 457, 448, 448, 0, 435, 0, 438, 439, 440, 438, 445, 441, 449, 448, 447, 448, 448, 0, 438, 439, 440, 438, 441, 444, 0, 438, 439, 440, 438, 441, 444, 0, 439, 0, 442, 0, 443, 442, 443, 438, 442, 445, 444, 
0, 408, 409, 410, 408, 425, 1094, 446, 0, 408, 409, 410, 408, 425, 1094, 446, 0, 445, 448, 447, 448, 448, 0, 445, 448, 448, 448, 448, 0, 445, 450, 448, 448, 448, 448, 0, 445, 451, 448, 448, 448, 448, 0, 445, 452, 448, 448, 448, 448, 0, 445, 448, 453, 448, 448, 0, 445, 448, 453, 448, 448, 0, 455, 0, 456, 455, 456, 434, 455, 445, 457, 0, 459, 0, 460, 459, 460, 430, 459, 462, 0, 463, 462, 463, 412, 462, 465, 464, 0, 466, 0, 408, 409, 410, 408, 425, 1094, 466, 0, 468, 469, 470, 468, 481, 1094, 467, 467, 467, 467, 0, 468, 469, 470, 468, 406, 471, 481, 1094, 428, 411, 464, 411, 411, 0, 468, 469, 470, 468, 406, 471, 481, 1094, 428, 411, 464, 411, 411, 0, 469, 0, 472, 473, 474, 472, 475, 1094, 411, 411, 411, 411, 0, 472, 473, 474, 472, 406, 415, 475, 1094, 428, 411, 478, 411, 411, 0, 472, 473, 474, 472, 406, 415, 475, 1094, 428, 411, 478, 411, 411, 0, 473, 0, 476, 0, 477, 476, 477, 472, 476, 479, 478, 0, 480, 0, 408, 409, 410, 408, 425, 1094, 480, 0, 482, 0, 483, 482, 483, 468, 482, 485, 0, 486, 485, 486, 403, 485, 488, 488, 0, 489, 489, 0, 490, 490, 0, 491, 491, 0, 492, 493, 494, 492, 614, 0, 492, 493, 494, 492, 495, 614, 495, 495, 495, 0, 492, 493, 494, 492, 495, 614, 495, 495, 495, 0, 493, 0, 496, 497, 498, 496, 611, 495, 495, 495, 495, 0, 496, 497, 498, 496, 499, 611, 517, 0, 496, 497, 498, 496, 499, 611, 517, 0, 497, 0, 501, 500, 501, 500, 501, 502, 503, 501, 504, 507, 0, 501, 502, 503, 501, 504, 507, 0, 502, 0, 505, 0, 506, 505, 506, 501, 505, 507, 508, 509, 507, 510, 608, 0, 507, 508, 509, 507, 510, 608, 0, 508, 0, 511, 512, 513, 511, 514, 1094, 536, 0, 511, 512, 513, 511, 499, 514, 1094, 517, 0, 511, 512, 513, 511, 499, 514, 1094, 517, 0, 512, 0, 515, 0, 516, 515, 516, 511, 515, 518, 0, 519, 0, 520, 0, 521, 0, 521, 522, 523, 521, 524, 527, 0, 521, 522, 523, 521, 524, 527, 0, 522, 0, 525, 0, 526, 525, 526, 521, 525, 527, 528, 529, 527, 530, 605, 0, 527, 528, 529, 527, 530, 605, 0, 528, 0, 531, 0, 537, 554, 561, 569, 575, 578, 586, 598, 532, 0, 533, 532, 0, 535, 534, 
0, 535, 534, 0, 536, 0, 511, 512, 513, 511, 514, 1094, 0, 538, 545, 551, 0, 539, 541, 0, 540, 0, 533, 541, 542, 0, 533, 0, 543, 0, 544, 0, 541, 0, 546, 0, 547, 0, 548, 0, 549, 0, 550, 550, 532, 0, 532, 0, 552, 0, 553, 0, 543, 0, 555, 0, 556, 0, 557, 559, 0, 558, 0, 541, 0, 560, 0, 541, 0, 562, 566, 0, 563, 564, 0, 547, 0, 565, 0, 533, 541, 541, 0, 567, 0, 553, 568, 558, 0, 541, 0, 570, 0, 571, 0, 572, 0, 573, 0, 574, 0, 541, 0, 576, 0, 577, 0, 533, 541, 0, 579, 585, 0, 580, 0, 581, 0, 533, 582, 0, 583, 0, 584, 0, 541, 0, 541, 0, 587, 592, 0, 588, 0, 589, 0, 590, 0, 591, 0, 544, 541, 0, 593, 0, 594, 0, 595, 0, 596, 0, 597, 0, 541, 541, 0, 599, 601, 0, 600, 0, 541, 0, 602, 604, 0, 603, 0, 533, 585, 541, 541, 0, 533, 541, 565, 541, 0, 606, 0, 607, 606, 607, 527, 606, 609, 0, 610, 609, 610, 507, 609, 612, 0, 613, 612, 613, 496, 612, 615, 0, 616, 615, 616, 492, 615, 618, 652, 684, 618, 652, 684, 0, 619, 619, 0, 620, 620, 0, 621, 621, 0, 622, 623, 624, 622, 649, 0, 622, 623, 624, 622, 625, 649, 625, 625, 625, 0, 622, 623, 624, 622, 625, 649, 625, 625, 625, 0, 623, 0, 626, 627, 628, 626, 629, 632, 625, 625, 625, 625, 0, 626, 627, 628, 626, 629, 632, 0, 626, 627, 628, 626, 629, 632, 0, 627, 0, 630, 0, 631, 630, 631, 626, 630, 632, 633, 634, 632, 635, 644, 646, 0, 632, 633, 634, 632, 635, 644, 646, 0, 633, 0, 637, 636, 637, 636, 638, 639, 640, 638, 641, 1094, 0, 638, 639, 640, 638, 625, 641, 1094, 625, 625, 625, 0, 638, 639, 640, 638, 625, 641, 1094, 625, 625, 625, 0, 639, 0, 642, 0, 643, 642, 643, 638, 642, 637, 645, 637, 645, 647, 0, 648, 647, 648, 632, 647, 650, 0, 651, 650, 651, 622, 650, 653, 653, 0, 654, 654, 0, 655, 655, 0, 656, 656, 0, 657, 658, 659, 657, 681, 0, 657, 658, 659, 657, 660, 681, 660, 660, 660, 0, 657, 658, 659, 657, 660, 681, 660, 660, 660, 0, 658, 0, 661, 662, 663, 661, 677, 660, 660, 660, 660, 0, 661, 662, 663, 661, 664, 677, 680, 0, 661, 662, 663, 661, 664, 677, 680, 0, 662, 0, 664, 665, 666, 664, 667, 670, 0, 664, 665, 666, 664, 667, 670, 0, 665, 
0, 668, 0, 669, 668, 669, 664, 668, 671, 672, 673, 671, 674, 1094, 670, 0, 671, 672, 673, 671, 660, 674, 1094, 660, 660, 660, 0, 671, 672, 673, 671, 660, 674, 1094, 660, 660, 660, 0, 672, 0, 675, 0, 676, 675, 676, 671, 675, 678, 0, 679, 678, 679, 661, 678, 671, 672, 673, 671, 674, 1094, 680, 0, 682, 0, 683, 682, 683, 657, 682, 685, 685, 0, 686, 686, 0, 687, 687, 0, 688, 688, 0, 689, 689, 0, 690, 691, 692, 690, 722, 0, 690, 691, 692, 690, 693, 722, 693, 693, 693, 0, 690, 691, 692, 690, 693, 722, 693, 693, 693, 0, 691, 0, 694, 695, 696, 694, 702, 693, 693, 693, 693, 0, 694, 695, 696, 694, 697, 700, 702, 705, 709, 712, 705, 709, 0, 694, 695, 696, 694, 697, 700, 702, 705, 709, 712, 705, 709, 0, 695, 0, 699, 698, 699, 698, 365, 366, 367, 365, 381, 1094, 0, 699, 701, 699, 701, 703, 0, 704, 703, 704, 694, 703, 706, 706, 0, 707, 707, 0, 708, 708, 0, 22, 22, 0, 710, 710, 0, 711, 711, 0, 22, 22, 0, 713, 713, 0, 714, 714, 0, 715, 715, 0, 716, 0, 716, 717, 718, 716, 719, 705, 709, 705, 709, 0, 716, 717, 718, 716, 719, 705, 709, 705, 709, 0, 717, 0, 720, 0, 721, 720, 721, 716, 720, 723, 0, 724, 723, 724, 690, 723, 726, 726, 0, 727, 727, 0, 728, 728, 0, 729, 729, 0, 730, 730, 0, 731, 731, 0, 732, 733, 734, 732, 736, 0, 732, 733, 734, 732, 735, 736, 735, 735, 735, 0, 732, 733, 734, 732, 735, 736, 735, 735, 735, 0, 733, 0, 22, 23, 24, 22, 25, 1094, 735, 735, 735, 735, 0, 737, 0, 738, 737, 738, 732, 737, 740, 740, 0, 741, 741, 0, 742, 742, 0, 743, 743, 0, 744, 744, 0, 745, 745, 0, 746, 747, 748, 746, 749, 0, 746, 747, 748, 746, 368, 749, 368, 368, 368, 0, 746, 747, 748, 746, 368, 749, 368, 368, 368, 0, 747, 0, 750, 0, 751, 750, 751, 746, 750, 753, 753, 0, 754, 754, 0, 755, 755, 0, 756, 757, 758, 756, 759, 0, 756, 757, 758, 756, 759, 762, 768, 916, 762, 768, 916, 0, 756, 757, 758, 756, 759, 762, 768, 916, 762, 768, 916, 0, 757, 0, 760, 0, 761, 760, 761, 756, 760, 763, 763, 0, 764, 764, 0, 765, 765, 0, 766, 766, 0, 767, 767, 0, 744, 744, 0, 769, 769, 0, 770, 770, 0, 771, 771, 0, 772, 
772, 0, 773, 773, 0, 773, 774, 775, 773, 776, 902, 783, 776, 776, 776, 0, 773, 774, 775, 773, 776, 902, 783, 776, 776, 776, 0, 774, 0, 777, 778, 779, 777, 780, 783, 896, 776, 776, 776, 776, 0, 777, 778, 779, 777, 776, 780, 783, 896, 776, 776, 776, 0, 777, 778, 779, 777, 776, 780, 783, 896, 776, 776, 776, 0, 778, 0, 781, 0, 782, 781, 782, 777, 781, 1096, 1097, 784, 1096, 785, 1096, 790, 790, 0, 1097, 0, 786, 0, 787, 786, 787, 1096, 786, 789, 789, 0, 22, 22, 0, 791, 791, 0, 792, 792, 0, 793, 793, 0, 794, 794, 0, 795, 796, 797, 795, 893, 0, 795, 796, 797, 795, 798, 893, 798, 798, 798, 0, 795, 796, 797, 795, 798, 893, 798, 798, 798, 0, 796, 0, 799, 800, 801, 799, 890, 798, 798, 798, 798, 0, 799, 800, 801, 799, 802, 820, 853, 822, 890, 843, 827, 842, 842, 0, 799, 800, 801, 799, 802, 820, 853, 822, 890, 843, 827, 842, 842, 0, 800, 0, 804, 803, 804, 803, 805, 806, 807, 805, 808, 811, 0, 805, 806, 807, 805, 808, 811, 0, 805, 806, 807, 805, 808, 811, 0, 806, 0, 809, 0, 810, 809, 810, 805, 809, 811, 812, 813, 811, 814, 848, 850, 0, 811, 812, 813, 811, 814, 848, 850, 0, 812, 0, 816, 815, 816, 815, 817, 818, 819, 817, 824, 1096, 0, 817, 818, 819, 817, 802, 820, 822, 824, 1096, 843, 827, 842, 842, 0, 817, 818, 819, 817, 802, 820, 822, 824, 1096, 843, 827, 842, 842, 0, 818, 0, 804, 821, 804, 821, 823, 0, 805, 806, 807, 805, 808, 811, 823, 0, 825, 0, 826, 825, 826, 817, 825, 828, 829, 830, 828, 839, 811, 827, 0, 828, 829, 830, 828, 831, 839, 811, 0, 828, 829, 830, 828, 831, 839, 811, 0, 829, 0, 832, 833, 834, 832, 835, 0, 832, 833, 834, 832, 835, 838, 0, 832, 833, 834, 832, 835, 838, 0, 833, 0, 836, 0, 837, 836, 837, 832, 836, 805, 806, 807, 805, 808, 811, 838, 0, 840, 0, 841, 840, 841, 828, 840, 805, 806, 807, 805, 808, 811, 842, 842, 842, 842, 0, 805, 806, 807, 805, 808, 811, 842, 844, 842, 842, 842, 0, 805, 806, 807, 805, 808, 811, 842, 845, 842, 842, 842, 0, 805, 806, 807, 805, 808, 811, 842, 846, 842, 842, 842, 0, 805, 806, 807, 805, 808, 811, 842, 847, 842, 842, 842, 0, 
805, 806, 807, 805, 808, 811, 842, 842, 842, 842, 0, 816, 849, 816, 849, 851, 0, 852, 851, 852, 811, 851, 854, 854, 854, 854, 0, 855, 856, 857, 855, 873, 858, 861, 854, 854, 854, 854, 0, 855, 856, 857, 855, 854, 858, 861, 854, 854, 854, 0, 855, 856, 857, 855, 854, 858, 861, 854, 854, 854, 0, 856, 0, 859, 0, 860, 859, 860, 855, 859, 861, 862, 863, 861, 864, 879, 881, 882, 886, 879, 889, 879, 879, 0, 861, 862, 863, 861, 864, 879, 881, 882, 886, 879, 889, 879, 879, 0, 862, 0, 866, 865, 866, 865, 867, 868, 869, 867, 873, 870, 0, 867, 868, 869, 867, 854, 870, 854, 854, 854, 0, 867, 868, 869, 867, 854, 870, 854, 854, 854, 0, 868, 0, 871, 0, 872, 871, 872, 867, 871, 873, 874, 875, 873, 802, 820, 822, 876, 843, 827, 842, 842, 0, 873, 874, 875, 873, 802, 820, 822, 876, 843, 827, 842, 842, 0, 874, 0, 877, 0, 878, 877, 878, 873, 877, 867, 868, 869, 867, 873, 880, 870, 879, 879, 879, 879, 0, 867, 868, 869, 867, 873, 870, 0, 879, 879, 879, 879, 0, 866, 884, 884, 884, 883, 866, 883, 885, 884, 884, 884, 883, 867, 868, 869, 867, 873, 870, 880, 0, 887, 0, 888, 887, 888, 861, 887, 867, 868, 869, 867, 873, 870, 889, 0, 891, 0, 892, 891, 892, 799, 891, 894, 0, 895, 894, 895, 795, 894, 896, 897, 898, 896, 899, 905, 907, 908, 912, 905, 915, 905, 905, 0, 896, 897, 898, 896, 899, 905, 907, 908, 912, 905, 915, 905, 905, 0, 897, 0, 901, 900, 901, 900, 773, 774, 775, 773, 902, 783, 0, 903, 0, 904, 903, 904, 773, 903, 773, 774, 775, 773, 906, 902, 783, 905, 905, 905, 905, 0, 773, 774, 775, 773, 902, 783, 0, 905, 905, 905, 905, 0, 901, 910, 910, 910, 909, 901, 909, 911, 910, 910, 910, 909, 773, 774, 775, 773, 902, 783, 906, 0, 913, 0, 914, 913, 914, 896, 913, 773, 774, 775, 773, 902, 783, 915, 0, 917, 917, 0, 918, 918, 0, 919, 919, 0, 920, 920, 0, 921, 922, 923, 921, 985, 0, 921, 922, 923, 921, 924, 985, 924, 924, 924, 0, 921, 922, 923, 921, 924, 985, 924, 924, 924, 0, 922, 0, 925, 926, 927, 925, 959, 1094, 962, 924, 924, 924, 924, 0, 925, 926, 927, 925, 924, 928, 959, 1094, 962, 924, 924, 
924, 0, 925, 926, 927, 925, 924, 928, 959, 1094, 962, 924, 924, 924, 0, 926, 0, 929, 929, 929, 929, 0, 930, 931, 932, 930, 22, 933, 936, 929, 929, 929, 929, 0, 930, 931, 932, 930, 929, 933, 936, 929, 929, 929, 0, 930, 931, 932, 930, 929, 933, 936, 929, 929, 929, 0, 931, 0, 934, 0, 935, 934, 935, 930, 934, 936, 937, 938, 936, 939, 948, 950, 951, 955, 948, 958, 948, 948, 0, 936, 937, 938, 936, 939, 948, 950, 951, 955, 948, 958, 948, 948, 0, 937, 0, 941, 940, 941, 940, 942, 943, 944, 942, 22, 945, 0, 942, 943, 944, 942, 929, 945, 929, 929, 929, 0, 942, 943, 944, 942, 929, 945, 929, 929, 929, 0, 943, 0, 946, 0, 947, 946, 947, 942, 946, 942, 943, 944, 942, 22, 949, 945, 948, 948, 948, 948, 0, 942, 943, 944, 942, 22, 945, 0, 948, 948, 948, 948, 0, 941, 953, 953, 953, 952, 941, 952, 954, 953, 953, 953, 952, 942, 943, 944, 942, 22, 945, 949, 0, 956, 0, 957, 956, 957, 936, 956, 942, 943, 944, 942, 22, 945, 958, 0, 960, 0, 961, 960, 961, 925, 960, 962, 963, 964, 962, 965, 974, 976, 977, 981, 974, 984, 974, 974, 0, 962, 963, 964, 962, 965, 974, 976, 977, 981, 974, 984, 974, 974, 0, 963, 0, 967, 966, 967, 966, 968, 969, 970, 968, 971, 1094, 0, 968, 969, 970, 968, 924, 928, 971, 1094, 924, 924, 924, 0, 968, 969, 970, 968, 924, 928, 971, 1094, 924, 924, 924, 0, 969, 0, 972, 0, 973, 972, 973, 968, 972, 968, 969, 970, 968, 975, 971, 1094, 974, 974, 974, 974, 0, 968, 969, 970, 968, 971, 1094, 0, 974, 974, 974, 974, 0, 967, 979, 979, 979, 978, 967, 978, 980, 979, 979, 979, 978, 968, 969, 970, 968, 971, 1094, 975, 0, 982, 0, 983, 982, 983, 962, 982, 968, 969, 970, 968, 971, 1094, 984, 0, 986, 0, 987, 986, 987, 921, 986, 989, 989, 0, 990, 990, 0, 991, 991, 0, 992, 992, 0, 993, 994, 995, 993, 1091, 0, 993, 994, 995, 993, 996, 1091, 996, 996, 996, 0, 993, 994, 995, 993, 996, 1091, 996, 996, 996, 0, 994, 0, 997, 998, 999, 997, 1088, 996, 996, 996, 996, 0, 997, 998, 999, 997, 1000, 1018, 1051, 1020, 1088, 1041, 1025, 1040, 1040, 0, 997, 998, 999, 997, 1000, 1018, 1051, 1020, 1088, 1041, 
1025, 1040, 1040, 0, 998, 0, 1002, 1001, 1002, 1001, 1003, 1004, 1005, 1003, 1006, 1009, 0, 1003, 1004, 1005, 1003, 1006, 1009, 0, 1003, 1004, 1005, 1003, 1006, 1009, 0, 1004, 0, 1007, 0, 1008, 1007, 1008, 1003, 1007, 1009, 1010, 1011, 1009, 1012, 1046, 1048, 0, 1009, 1010, 1011, 1009, 1012, 1046, 1048, 0, 1010, 0, 1014, 1013, 1014, 1013, 1015, 1016, 1017, 1015, 1022, 1094, 0, 1015, 1016, 1017, 1015, 1000, 1018, 1020, 1022, 1094, 1041, 1025, 1040, 1040, 0, 1015, 1016, 1017, 1015, 1000, 1018, 1020, 1022, 1094, 1041, 1025, 1040, 1040, 0, 1016, 0, 1002, 1019, 1002, 1019, 1021, 0, 1003, 1004, 1005, 1003, 1006, 1009, 1021, 0, 1023, 0, 1024, 1023, 1024, 1015, 1023, 1026, 1027, 1028, 1026, 1037, 1009, 1025, 0, 1026, 1027, 1028, 1026, 1029, 1037, 1009, 0, 1026, 1027, 1028, 1026, 1029, 1037, 1009, 0, 1027, 0, 1030, 1031, 1032, 1030, 1033, 0, 1030, 1031, 1032, 1030, 1033, 1036, 0, 1030, 1031, 1032, 1030, 1033, 1036, 0, 1031, 0, 1034, 0, 1035, 1034, 1035, 1030, 1034, 1003, 1004, 1005, 1003, 1006, 1009, 1036, 0, 1038, 0, 1039, 1038, 1039, 1026, 1038, 1003, 1004, 1005, 1003, 1006, 1009, 1040, 1040, 1040, 1040, 0, 1003, 1004, 1005, 1003, 1006, 1009, 1040, 1042, 1040, 1040, 1040, 0, 1003, 1004, 1005, 1003, 1006, 1009, 1040, 1043, 1040, 1040, 1040, 0, 1003, 1004, 1005, 1003, 1006, 1009, 1040, 1044, 1040, 1040, 1040, 0, 1003, 1004, 1005, 1003, 1006, 1009, 1040, 1045, 1040, 1040, 1040, 0, 1003, 1004, 1005, 1003, 1006, 1009, 1040, 1040, 1040, 1040, 0, 1014, 1047, 1014, 1047, 1049, 0, 1050, 1049, 1050, 1009, 1049, 1052, 1052, 1052, 1052, 0, 1053, 1054, 1055, 1053, 1071, 1056, 1059, 1052, 1052, 1052, 1052, 0, 1053, 1054, 1055, 1053, 1052, 1056, 1059, 1052, 1052, 1052, 0, 1053, 1054, 1055, 1053, 1052, 1056, 1059, 1052, 1052, 1052, 0, 1054, 0, 1057, 0, 1058, 1057, 1058, 1053, 1057, 1059, 1060, 1061, 1059, 1062, 1077, 1079, 1080, 1084, 1077, 1087, 1077, 1077, 0, 1059, 1060, 1061, 1059, 1062, 1077, 1079, 1080, 1084, 1077, 1087, 1077, 1077, 0, 1060, 0, 1064, 1063, 1064, 1063, 1065, 1066, 
1067, 1065, 1071, 1068, 0, 1065, 1066, 1067, 1065, 1052, 1068, 1052, 1052, 1052, 0, 1065, 1066, 1067, 1065, 1052, 1068, 1052, 1052, 1052, 0, 1066, 0, 1069, 0, 1070, 1069, 1070, 1065, 1069, 1071, 1072, 1073, 1071, 1000, 1018, 1020, 1074, 1041, 1025, 1040, 1040, 0, 1071, 1072, 1073, 1071, 1000, 1018, 1020, 1074, 1041, 1025, 1040, 1040, 0, 1072, 0, 1075, 0, 1076, 1075, 1076, 1071, 1075, 1065, 1066, 1067, 1065, 1071, 1078, 1068, 1077, 1077, 1077, 1077, 0, 1065, 1066, 1067, 1065, 1071, 1068, 0, 1077, 1077, 1077, 1077, 0, 1064, 1082, 1082, 1082, 1081, 1064, 1081, 1083, 1082, 1082, 1082, 1081, 1065, 1066, 1067, 1065, 1071, 1068, 1078, 0, 1085, 0, 1086, 1085, 1086, 1059, 1085, 1065, 1066, 1067, 1065, 1071, 1068, 1087, 0, 1089, 0, 1090, 1089, 1090, 997, 1089, 1092, 0, 1093, 1092, 1093, 993, 1092, 1094, 1095, 1, 1094, 2, 37, 40, 43, 167, 193, 342, 617, 725, 739, 752, 788, 988, 43, 167, 193, 342, 617, 725, 739, 752, 788, 988, 0, 1094, 1095, 1, 1094, 2, 37, 40, 43, 167, 193, 342, 617, 725, 739, 752, 788, 988, 43, 167, 193, 342, 617, 725, 739, 752, 788, 988, 0, 1096, 1097, 784, 1096, 2, 37, 785, 1096, 43, 167, 193, 342, 617, 725, 739, 752, 788, 790, 43, 167, 193, 342, 617, 725, 739, 752, 788, 790, 0, 1096, 1097, 784, 1096, 2, 37, 785, 1096, 43, 167, 193, 342, 617, 725, 739, 752, 788, 790, 43, 167, 193, 342, 617, 725, 739, 752, 788, 790, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 
152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 
552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 
952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 0 }; static const short _sas_commands_cond_actions[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 29, 29, 29, 29, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 71, 21, 23, 0, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 25, 25, 25, 25, 25, 139, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 13, 13, 13, 13, 13, 13, 41, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 
27, 27, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 77, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 71, 21, 23, 0, 123, 123, 123, 123, 123, 123, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 25, 25, 25, 25, 25, 25, 139, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 25, 25, 25, 25, 25, 139, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 177, 177, 177, 177, 177, 177, 3, 0, 0, 0, 0, 0, 0, 0, 0, 187, 187, 187, 187, 187, 187, 3, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 13, 13, 13, 13, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 0, 29, 29, 29, 29, 29, 29, 29, 29, 29, 0, 0, 29, 29, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 25, 77, 
77, 77, 0, 0, 0, 27, 27, 27, 27, 0, 0, 0, 0, 0, 0, 0, 0, 71, 21, 23, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 29, 29, 29, 29, 29, 29, 29, 29, 29, 0, 0, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 0, 89, 89, 89, 89, 89, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 25, 77, 25, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 44, 1, 1, 0, 44, 41, 44, 44, 0, 25, 25, 25, 25, 74, 143, 74, 74, 25, 143, 139, 143, 143, 0, 0, 0, 71, 21, 23, 0, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 95, 95, 95, 95, 29, 95, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 0, 27, 27, 27, 27, 0, 68, 135, 135, 135, 17, 19, 0, 19, 5, 5, 5, 0, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 27, 27, 27, 27, 27, 27, 41, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 77, 77, 77, 77, 77, 77, 139, 77, 77, 0, 0, 0, 86, 0, 0, 0, 0, 0, 15, 15, 15, 15, 15, 15, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 0, 0, 0, 0, 3, 0, 86, 0, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 15, 15, 15, 15, 15, 15, 41, 0, 86, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 
41, 0, 0, 0, 86, 0, 3, 0, 0, 0, 86, 0, 41, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 0, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 0, 41, 27, 27, 0, 0, 0, 0, 0, 27, 0, 0, 0, 0, 27, 0, 41, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 25, 25, 77, 25, 139, 77, 77, 0, 0, 0, 29, 29, 29, 29, 98, 29, 29, 29, 29, 98, 29, 151, 98, 98, 0, 41, 0, 0, 0, 0, 0, 27, 0, 0, 0, 0, 27, 0, 47, 27, 27, 0, 0, 0, 0, 0, 0, 0, 0, 29, 29, 29, 29, 98, 29, 29, 29, 29, 98, 29, 155, 98, 98, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 25, 77, 25, 25, 25, 77, 77, 77, 0, 0, 0, 71, 21, 23, 0, 9, 9, 9, 9, 65, 9, 9, 65, 65, 65, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 
0, 0, 0, 89, 89, 89, 89, 89, 89, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 44, 1, 1, 0, 44, 41, 44, 44, 0, 25, 25, 25, 25, 74, 143, 74, 74, 25, 143, 139, 143, 143, 0, 0, 0, 71, 21, 23, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 95, 95, 95, 95, 29, 95, 95, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 0, 27, 27, 27, 27, 0, 68, 135, 135, 135, 17, 19, 0, 19, 5, 5, 5, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 3, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 104, 0, 39, 104, 104, 104, 0, 25, 25, 25, 25, 80, 147, 25, 80, 147, 147, 147, 0, 0, 0, 41, 0, 37, 37, 37, 37, 37, 37, 3, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 25, 77, 25, 25, 25, 77, 77, 77, 0, 0, 0, 92, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 41, 27, 27, 0, 25, 25, 25, 25, 25, 77, 25, 25, 25, 77, 139, 77, 77, 0, 0, 0, 92, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 41, 27, 27, 0, 25, 25, 25, 25, 25, 77, 25, 25, 25, 77, 139, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 3, 0, 41, 0, 172, 172, 172, 172, 172, 172, 3, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 7, 7, 7, 7, 7, 3, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 41, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 139, 77, 77, 0, 0, 0, 0, 0, 0, 0, 83, 0, 0, 0, 41, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 25, 25, 25, 25, 25, 139, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 50, 3, 0, 13, 13, 13, 13, 13, 13, 131, 0, 0, 0, 0, 0, 0, 0, 3, 0, 182, 0, 3, 0, 0, 0, 83, 0, 0, 0, 0, 0, 83, 0, 0, 0, 0, 0, 0, 83, 0, 0, 0, 0, 0, 0, 83, 0, 0, 0, 0, 0, 0, 83, 0, 41, 0, 0, 0, 111, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 53, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 3, 0, 41, 0, 107, 107, 107, 107, 107, 107, 3, 0, 92, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 
0, 0, 0, 27, 41, 27, 27, 0, 25, 25, 25, 25, 25, 77, 25, 25, 25, 77, 139, 77, 77, 0, 0, 0, 92, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 41, 27, 27, 0, 25, 25, 25, 25, 25, 77, 25, 25, 25, 77, 139, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 3, 0, 41, 0, 167, 167, 167, 167, 167, 167, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 29, 29, 29, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 3, 0, 0, 41, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 92, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 71, 21, 23, 0, 
59, 59, 59, 59, 59, 59, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 92, 92, 92, 92, 92, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 25, 25, 25, 25, 25, 25, 139, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 25, 25, 25, 25, 25, 139, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 115, 115, 115, 115, 115, 115, 3, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 119, 119, 119, 119, 119, 119, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 29, 29, 29, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 71, 21, 23, 0, 9, 9, 9, 9, 9, 9, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 29, 29, 29, 29, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 
0, 89, 89, 89, 89, 89, 89, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 86, 86, 86, 86, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 139, 25, 25, 0, 0, 0, 71, 21, 23, 0, 127, 127, 127, 127, 127, 127, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 71, 21, 23, 0, 62, 62, 62, 62, 62, 62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 139, 25, 25, 0, 0, 0, 68, 17, 19, 0, 41, 0, 31, 31, 31, 31, 31, 31, 3, 0, 0, 0, 0, 0, 0, 0, 0, 33, 33, 33, 33, 33, 33, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 159, 0, 25, 25, 25, 25, 25, 192, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 35, 35, 35, 35, 35, 3, 0, 0, 0, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 0, 163, 163, 163, 163, 163, 163, 0, 0, 0, 0, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 27, 27, 27, 27, 0, 89, 89, 89, 89, 89, 89, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 44, 1, 1, 0, 44, 41, 44, 44, 0, 25, 25, 25, 25, 74, 143, 74, 74, 25, 143, 139, 143, 143, 0, 0, 0, 71, 21, 23, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 139, 25, 25, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 95, 95, 95, 95, 95, 29, 95, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 0, 27, 27, 27, 27, 0, 68, 135, 135, 135, 17, 19, 0, 19, 5, 5, 5, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 44, 1, 1, 0, 44, 41, 44, 44, 0, 25, 25, 25, 25, 74, 143, 74, 74, 25, 143, 139, 143, 143, 0, 0, 0, 71, 21, 23, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 95, 95, 95, 95, 29, 95, 95, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 0, 27, 27, 27, 27, 0, 68, 135, 135, 135, 17, 19, 0, 19, 5, 5, 5, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 89, 89, 89, 89, 89, 89, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 25, 25, 77, 77, 77, 0, 0, 0, 27, 27, 27, 27, 0, 89, 89, 89, 89, 89, 89, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 44, 1, 1, 0, 44, 41, 44, 44, 0, 25, 25, 25, 25, 74, 143, 74, 74, 25, 143, 139, 143, 143, 0, 0, 0, 71, 21, 23, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 95, 95, 95, 95, 29, 95, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 0, 27, 27, 27, 27, 0, 68, 135, 135, 135, 17, 19, 0, 19, 5, 5, 5, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 44, 1, 1, 0, 44, 41, 44, 44, 0, 25, 25, 25, 25, 74, 143, 74, 74, 25, 143, 139, 143, 143, 0, 0, 0, 71, 21, 23, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 27, 0, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 95, 95, 95, 29, 95, 95, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 0, 27, 27, 27, 27, 0, 68, 135, 135, 135, 17, 19, 0, 19, 5, 5, 5, 0, 
56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 86, 86, 86, 86, 86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 139, 25, 25, 0, 0, 0, 71, 21, 23, 0, 127, 127, 127, 127, 127, 127, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 71, 21, 23, 0, 62, 62, 62, 62, 62, 62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 139, 25, 25, 0, 0, 0, 68, 17, 19, 0, 41, 0, 31, 31, 31, 31, 31, 31, 3, 0, 0, 0, 0, 0, 0, 0, 0, 33, 33, 33, 33, 33, 33, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 159, 0, 25, 25, 25, 25, 25, 192, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 35, 35, 35, 35, 35, 3, 0, 0, 0, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 0, 101, 101, 101, 101, 101, 101, 0, 0, 0, 0, 0, 0, 163, 163, 163, 163, 163, 163, 0, 0, 0, 0, 0, 68, 17, 19, 0, 0, 0, 0, 0, 0, 0, 0, 27, 27, 27, 27, 0, 89, 89, 89, 89, 89, 89, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 44, 1, 1, 0, 44, 41, 44, 44, 0, 25, 25, 25, 25, 74, 143, 74, 74, 25, 143, 139, 143, 143, 0, 0, 0, 71, 21, 23, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 27, 0, 27, 27, 27, 0, 25, 25, 25, 25, 77, 25, 77, 77, 77, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 41, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 139, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 95, 95, 95, 95, 29, 95, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 0, 27, 27, 27, 27, 0, 68, 135, 135, 135, 17, 19, 0, 
19, 5, 5, 5, 0, 56, 56, 56, 56, 56, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 11, 11, 11, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 0, 25, 0 }; static const short _sas_commands_eof_trans[] = { 5747, 5748, 5749, 5750, 5751, 5752, 5753, 5754, 5755, 5756, 5757, 5758, 5759, 5760, 5761, 5762, 5763, 5764, 5765, 
5766, 5767, 5768, 5769, 5770, 5771, 5772, 5773, 5774, 5775, 5776, 5777, 5778, 5779, 5780, 5781, 5782, 5783, 5784, 5785, 5786, 5787, 5788, 5789, 5790, 5791, 5792, 5793, 5794, 5795, 5796, 5797, 5798, 5799, 5800, 5801, 5802, 5803, 5804, 5805, 5806, 5807, 5808, 5809, 5810, 5811, 5812, 5813, 5814, 5815, 5816, 5817, 5818, 5819, 5820, 5821, 5822, 5823, 5824, 5825, 5826, 5827, 5828, 5829, 5830, 5831, 5832, 5833, 5834, 5835, 5836, 5837, 5838, 5839, 5840, 5841, 5842, 5843, 5844, 5845, 5846, 5847, 5848, 5849, 5850, 5851, 5852, 5853, 5854, 5855, 5856, 5857, 5858, 5859, 5860, 5861, 5862, 5863, 5864, 5865, 5866, 5867, 5868, 5869, 5870, 5871, 5872, 5873, 5874, 5875, 5876, 5877, 5878, 5879, 5880, 5881, 5882, 5883, 5884, 5885, 5886, 5887, 5888, 5889, 5890, 5891, 5892, 5893, 5894, 5895, 5896, 5897, 5898, 5899, 5900, 5901, 5902, 5903, 5904, 5905, 5906, 5907, 5908, 5909, 5910, 5911, 5912, 5913, 5914, 5915, 5916, 5917, 5918, 5919, 5920, 5921, 5922, 5923, 5924, 5925, 5926, 5927, 5928, 5929, 5930, 5931, 5932, 5933, 5934, 5935, 5936, 5937, 5938, 5939, 5940, 5941, 5942, 5943, 5944, 5945, 5946, 5947, 5948, 5949, 5950, 5951, 5952, 5953, 5954, 5955, 5956, 5957, 5958, 5959, 5960, 5961, 5962, 5963, 5964, 5965, 5966, 5967, 5968, 5969, 5970, 5971, 5972, 5973, 5974, 5975, 5976, 5977, 5978, 5979, 5980, 5981, 5982, 5983, 5984, 5985, 5986, 5987, 5988, 5989, 5990, 5991, 5992, 5993, 5994, 5995, 5996, 5997, 5998, 5999, 6000, 6001, 6002, 6003, 6004, 6005, 6006, 6007, 6008, 6009, 6010, 6011, 6012, 6013, 6014, 6015, 6016, 6017, 6018, 6019, 6020, 6021, 6022, 6023, 6024, 6025, 6026, 6027, 6028, 6029, 6030, 6031, 6032, 6033, 6034, 6035, 6036, 6037, 6038, 6039, 6040, 6041, 6042, 6043, 6044, 6045, 6046, 6047, 6048, 6049, 6050, 6051, 6052, 6053, 6054, 6055, 6056, 6057, 6058, 6059, 6060, 6061, 6062, 6063, 6064, 6065, 6066, 6067, 6068, 6069, 6070, 6071, 6072, 6073, 6074, 6075, 6076, 6077, 6078, 6079, 6080, 6081, 6082, 6083, 6084, 6085, 6086, 6087, 6088, 6089, 6090, 6091, 6092, 6093, 6094, 6095, 6096, 6097, 6098, 
6099, 6100, 6101, 6102, 6103, 6104, 6105, 6106, 6107, 6108, 6109, 6110, 6111, 6112, 6113, 6114, 6115, 6116, 6117, 6118, 6119, 6120, 6121, 6122, 6123, 6124, 6125, 6126, 6127, 6128, 6129, 6130, 6131, 6132, 6133, 6134, 6135, 6136, 6137, 6138, 6139, 6140, 6141, 6142, 6143, 6144, 6145, 6146, 6147, 6148, 6149, 6150, 6151, 6152, 6153, 6154, 6155, 6156, 6157, 6158, 6159, 6160, 6161, 6162, 6163, 6164, 6165, 6166, 6167, 6168, 6169, 6170, 6171, 6172, 6173, 6174, 6175, 6176, 6177, 6178, 6179, 6180, 6181, 6182, 6183, 6184, 6185, 6186, 6187, 6188, 6189, 6190, 6191, 6192, 6193, 6194, 6195, 6196, 6197, 6198, 6199, 6200, 6201, 6202, 6203, 6204, 6205, 6206, 6207, 6208, 6209, 6210, 6211, 6212, 6213, 6214, 6215, 6216, 6217, 6218, 6219, 6220, 6221, 6222, 6223, 6224, 6225, 6226, 6227, 6228, 6229, 6230, 6231, 6232, 6233, 6234, 6235, 6236, 6237, 6238, 6239, 6240, 6241, 6242, 6243, 6244, 6245, 6246, 6247, 6248, 6249, 6250, 6251, 6252, 6253, 6254, 6255, 6256, 6257, 6258, 6259, 6260, 6261, 6262, 6263, 6264, 6265, 6266, 6267, 6268, 6269, 6270, 6271, 6272, 6273, 6274, 6275, 6276, 6277, 6278, 6279, 6280, 6281, 6282, 6283, 6284, 6285, 6286, 6287, 6288, 6289, 6290, 6291, 6292, 6293, 6294, 6295, 6296, 6297, 6298, 6299, 6300, 6301, 6302, 6303, 6304, 6305, 6306, 6307, 6308, 6309, 6310, 6311, 6312, 6313, 6314, 6315, 6316, 6317, 6318, 6319, 6320, 6321, 6322, 6323, 6324, 6325, 6326, 6327, 6328, 6329, 6330, 6331, 6332, 6333, 6334, 6335, 6336, 6337, 6338, 6339, 6340, 6341, 6342, 6343, 6344, 6345, 6346, 6347, 6348, 6349, 6350, 6351, 6352, 6353, 6354, 6355, 6356, 6357, 6358, 6359, 6360, 6361, 6362, 6363, 6364, 6365, 6366, 6367, 6368, 6369, 6370, 6371, 6372, 6373, 6374, 6375, 6376, 6377, 6378, 6379, 6380, 6381, 6382, 6383, 6384, 6385, 6386, 6387, 6388, 6389, 6390, 6391, 6392, 6393, 6394, 6395, 6396, 6397, 6398, 6399, 6400, 6401, 6402, 6403, 6404, 6405, 6406, 6407, 6408, 6409, 6410, 6411, 6412, 6413, 6414, 6415, 6416, 6417, 6418, 6419, 6420, 6421, 6422, 6423, 6424, 6425, 6426, 6427, 6428, 6429, 6430, 6431, 
6432, 6433, 6434, 6435, 6436, 6437, 6438, 6439, 6440, 6441, 6442, 6443, 6444, 6445, 6446, 6447, 6448, 6449, 6450, 6451, 6452, 6453, 6454, 6455, 6456, 6457, 6458, 6459, 6460, 6461, 6462, 6463, 6464, 6465, 6466, 6467, 6468, 6469, 6470, 6471, 6472, 6473, 6474, 6475, 6476, 6477, 6478, 6479, 6480, 6481, 6482, 6483, 6484, 6485, 6486, 6487, 6488, 6489, 6490, 6491, 6492, 6493, 6494, 6495, 6496, 6497, 6498, 6499, 6500, 6501, 6502, 6503, 6504, 6505, 6506, 6507, 6508, 6509, 6510, 6511, 6512, 6513, 6514, 6515, 6516, 6517, 6518, 6519, 6520, 6521, 6522, 6523, 6524, 6525, 6526, 6527, 6528, 6529, 6530, 6531, 6532, 6533, 6534, 6535, 6536, 6537, 6538, 6539, 6540, 6541, 6542, 6543, 6544, 6545, 6546, 6547, 6548, 6549, 6550, 6551, 6552, 6553, 6554, 6555, 6556, 6557, 6558, 6559, 6560, 6561, 6562, 6563, 6564, 6565, 6566, 6567, 6568, 6569, 6570, 6571, 6572, 6573, 6574, 6575, 6576, 6577, 6578, 6579, 6580, 6581, 6582, 6583, 6584, 6585, 6586, 6587, 6588, 6589, 6590, 6591, 6592, 6593, 6594, 6595, 6596, 6597, 6598, 6599, 6600, 6601, 6602, 6603, 6604, 6605, 6606, 6607, 6608, 6609, 6610, 6611, 6612, 6613, 6614, 6615, 6616, 6617, 6618, 6619, 6620, 6621, 6622, 6623, 6624, 6625, 6626, 6627, 6628, 6629, 6630, 6631, 6632, 6633, 6634, 6635, 6636, 6637, 6638, 6639, 6640, 6641, 6642, 6643, 6644, 6645, 6646, 6647, 6648, 6649, 6650, 6651, 6652, 6653, 6654, 6655, 6656, 6657, 6658, 6659, 6660, 6661, 6662, 6663, 6664, 6665, 6666, 6667, 6668, 6669, 6670, 6671, 6672, 6673, 6674, 6675, 6676, 6677, 6678, 6679, 6680, 6681, 6682, 6683, 6684, 6685, 6686, 6687, 6688, 6689, 6690, 6691, 6692, 6693, 6694, 6695, 6696, 6697, 6698, 6699, 6700, 6701, 6702, 6703, 6704, 6705, 6706, 6707, 6708, 6709, 6710, 6711, 6712, 6713, 6714, 6715, 6716, 6717, 6718, 6719, 6720, 6721, 6722, 6723, 6724, 6725, 6726, 6727, 6728, 6729, 6730, 6731, 6732, 6733, 6734, 6735, 6736, 6737, 6738, 6739, 6740, 6741, 6742, 6743, 6744, 6745, 6746, 6747, 6748, 6749, 6750, 6751, 6752, 6753, 6754, 6755, 6756, 6757, 6758, 6759, 6760, 6761, 6762, 6763, 6764, 
6765, 6766, 6767, 6768, 6769, 6770, 6771, 6772, 6773, 6774, 6775, 6776, 6777, 6778, 6779, 6780, 6781, 6782, 6783, 6784, 6785, 6786, 6787, 6788, 6789, 6790, 6791, 6792, 6793, 6794, 6795, 6796, 6797, 6798, 6799, 6800, 6801, 6802, 6803, 6804, 6805, 6806, 6807, 6808, 6809, 6810, 6811, 6812, 6813, 6814, 6815, 6816, 6817, 6818, 6819, 6820, 6821, 6822, 6823, 6824, 6825, 6826, 6827, 6828, 6829, 6830, 6831, 6832, 6833, 6834, 6835, 6836, 6837, 6838, 6839, 6840, 6841, 6842, 6843, 6844, 0 }; static const int sas_commands_start = 1094; static const int sas_commands_en_main = 1094; #line 13 "src/txt/readstat_sas_commands_read.rl" readstat_schema_t *readstat_parse_sas_commands(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError) { if (parser->io->open(filepath, parser->io->io_ctx) == -1) { if (outError) *outError = READSTAT_ERROR_OPEN; return NULL; } readstat_schema_t *schema = NULL; unsigned char *bytes = NULL; readstat_error_t error = READSTAT_OK; ssize_t len = parser->io->seek(0, READSTAT_SEEK_END, parser->io->io_ctx); if (len == -1) { error = READSTAT_ERROR_SEEK; goto cleanup; } parser->io->seek(0, READSTAT_SEEK_SET, parser->io->io_ctx); bytes = malloc(len); parser->io->read(bytes, len, parser->io->io_ctx); unsigned char *p = bytes; unsigned char *pe = bytes + len; unsigned char *eof = pe; unsigned char *str_start = NULL; size_t str_len = 0; int cs; double double_value = NAN; uint64_t first_integer = 0; uint64_t integer = 0; int line_no = 0; unsigned char *line_start = p; char varname[32]; char argname[32]; char labelset[32]; char string_value[32]; char buf[1024]; readstat_type_t var_type = READSTAT_TYPE_DOUBLE; label_type_t label_type = LABEL_TYPE_DOUBLE; int var_row = 0, var_col = 0; int var_len = 0; if ((schema = calloc(1, sizeof(readstat_schema_t))) == NULL) { error = READSTAT_ERROR_MALLOC; goto cleanup; } schema->rows_per_observation = 1; #line 3220 "src/txt/readstat_sas_commands_read.c" { cs = (int)sas_commands_start; } #line 3225 
"src/txt/readstat_sas_commands_read.c" { int _klen; unsigned int _trans = 0; const char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe && p != eof ) goto _out; if ( p == eof ) { if ( _sas_commands_eof_trans[cs] > 0 ) { _trans = (unsigned int)_sas_commands_eof_trans[cs] - 1; } } else { _keys = ( _sas_commands_trans_keys + (_sas_commands_key_offsets[cs])); _trans = (unsigned int)_sas_commands_index_offsets[cs]; _klen = (int)_sas_commands_single_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + _klen - 1; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_sas_commands_range_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + (_klen<<1) - 2; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} } cs = (int)_sas_commands_cond_targs[_trans]; if ( _sas_commands_cond_actions[_trans] != 0 ) { _acts = ( _sas_commands_actions + (_sas_commands_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 72 "src/txt/readstat_sas_commands_read.rl" integer = 0; } #line 3310 "src/txt/readstat_sas_commands_read.c" break; } case 1: { { #line 76 "src/txt/readstat_sas_commands_read.rl" integer = 10 * integer + ((( (*( p)))) - '0'); } #line 3321 "src/txt/readstat_sas_commands_read.c" break; } case 2: { { #line 80 
"src/txt/readstat_sas_commands_read.rl" int value = 0; if ((( (*( p)))) >= '0' && (( (*( p)))) <= '9') { value = (( (*( p)))) - '0'; } else if ((( (*( p)))) >= 'A' && (( (*( p)))) <= 'F') { value = (( (*( p)))) - 'A' + 10; } else if ((( (*( p)))) >= 'a' && (( (*( p)))) <= 'f') { value = (( (*( p)))) - 'a' + 10; } integer = 16 * integer + value; } #line 3340 "src/txt/readstat_sas_commands_read.c" break; } case 3: { { #line 92 "src/txt/readstat_sas_commands_read.rl" var_col = integer - 1; var_len = 1; } #line 3352 "src/txt/readstat_sas_commands_read.c" break; } case 4: { { #line 97 "src/txt/readstat_sas_commands_read.rl" var_len = integer - var_col; } #line 3363 "src/txt/readstat_sas_commands_read.c" break; } case 5: { { #line 101 "src/txt/readstat_sas_commands_read.rl" var_type = READSTAT_TYPE_STRING; } #line 3374 "src/txt/readstat_sas_commands_read.c" break; } case 6: { { #line 105 "src/txt/readstat_sas_commands_read.rl" var_type = READSTAT_TYPE_DOUBLE; } #line 3385 "src/txt/readstat_sas_commands_read.c" break; } case 7: { { #line 109 "src/txt/readstat_sas_commands_read.rl" readstat_copy(buf, sizeof(buf), (char *)str_start, str_len); } #line 3396 "src/txt/readstat_sas_commands_read.c" break; } case 8: { { #line 113 "src/txt/readstat_sas_commands_read.rl" readstat_copy(labelset, sizeof(labelset), (char *)str_start, str_len); } #line 3407 "src/txt/readstat_sas_commands_read.c" break; } case 9: { { #line 117 "src/txt/readstat_sas_commands_read.rl" readstat_copy(string_value, sizeof(string_value), (char *)str_start, str_len); } #line 3418 "src/txt/readstat_sas_commands_read.c" break; } case 10: { { #line 121 "src/txt/readstat_sas_commands_read.rl" readstat_copy(argname, sizeof(argname), (char *)str_start, str_len); } #line 3429 "src/txt/readstat_sas_commands_read.c" break; } case 11: { { #line 125 "src/txt/readstat_sas_commands_read.rl" readstat_copy_lower(varname, sizeof(varname), (char *)str_start, str_len); } #line 3440 "src/txt/readstat_sas_commands_read.c" break; 
} case 12: { { #line 129 "src/txt/readstat_sas_commands_read.rl" if (strcasecmp(argname, "firstobs") == 0) { schema->first_line = integer; } if (strcasecmp(argname, "dlm") == 0) { schema->field_delimiter = integer ? integer : buf[0]; } } #line 3456 "src/txt/readstat_sas_commands_read.c" break; } case 13: { { #line 138 "src/txt/readstat_sas_commands_read.rl" readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); entry->variable.type = var_type; entry->row = var_row; entry->col = var_col; entry->len = var_len; } #line 3471 "src/txt/readstat_sas_commands_read.c" break; } case 14: { { #line 146 "src/txt/readstat_sas_commands_read.rl" readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); entry->len = var_len; } #line 3483 "src/txt/readstat_sas_commands_read.c" break; } case 15: { { #line 151 "src/txt/readstat_sas_commands_read.rl" readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); readstat_copy(entry->variable.label, sizeof(entry->variable.label), buf, sizeof(buf)); } #line 3495 "src/txt/readstat_sas_commands_read.c" break; } case 16: { { #line 156 "src/txt/readstat_sas_commands_read.rl" readstat_schema_entry_t *entry = readstat_schema_find_or_create_entry(schema, varname); readstat_copy(entry->labelset, sizeof(entry->labelset), labelset, sizeof(labelset)); } #line 3507 "src/txt/readstat_sas_commands_read.c" break; } case 17: { { #line 161 "src/txt/readstat_sas_commands_read.rl" error = submit_value_label(parser, labelset, label_type, first_integer, integer, double_value, string_value, buf, user_ctx); if (error != READSTAT_OK) goto cleanup; } #line 3521 "src/txt/readstat_sas_commands_read.c" break; } case 18: { { #line 168 "src/txt/readstat_sas_commands_read.rl" str_start = p; } #line 3530 "src/txt/readstat_sas_commands_read.c" break; } case 19: { { #line 168 "src/txt/readstat_sas_commands_read.rl" str_len = p - str_start; } #line 3539 
"src/txt/readstat_sas_commands_read.c" break; } case 20: { { #line 170 "src/txt/readstat_sas_commands_read.rl" str_start = p; } #line 3548 "src/txt/readstat_sas_commands_read.c" break; } case 21: { { #line 170 "src/txt/readstat_sas_commands_read.rl" str_len = p - str_start; } #line 3557 "src/txt/readstat_sas_commands_read.c" break; } case 22: { { #line 178 "src/txt/readstat_sas_commands_read.rl" line_no++; line_start = p; } #line 3566 "src/txt/readstat_sas_commands_read.c" break; } case 23: { { #line 182 "src/txt/readstat_sas_commands_read.rl" str_start = p; } #line 3575 "src/txt/readstat_sas_commands_read.c" break; } case 24: { { #line 182 "src/txt/readstat_sas_commands_read.rl" str_len = p - str_start; } #line 3584 "src/txt/readstat_sas_commands_read.c" break; } case 25: { { #line 221 "src/txt/readstat_sas_commands_read.rl" label_type = LABEL_TYPE_DOUBLE; double_value = -(double)integer; } #line 3593 "src/txt/readstat_sas_commands_read.c" break; } case 26: { { #line 222 "src/txt/readstat_sas_commands_read.rl" label_type = LABEL_TYPE_DOUBLE; double_value = integer; } #line 3602 "src/txt/readstat_sas_commands_read.c" break; } case 27: { { #line 223 "src/txt/readstat_sas_commands_read.rl" first_integer = integer; } #line 3611 "src/txt/readstat_sas_commands_read.c" break; } case 28: { { #line 223 "src/txt/readstat_sas_commands_read.rl" label_type = LABEL_TYPE_RANGE; } #line 3620 "src/txt/readstat_sas_commands_read.c" break; } case 29: { { #line 224 "src/txt/readstat_sas_commands_read.rl" label_type = LABEL_TYPE_STRING; } #line 3629 "src/txt/readstat_sas_commands_read.c" break; } case 30: { { #line 225 "src/txt/readstat_sas_commands_read.rl" label_type = LABEL_TYPE_STRING; } #line 3638 "src/txt/readstat_sas_commands_read.c" break; } case 31: { { #line 226 "src/txt/readstat_sas_commands_read.rl" label_type = LABEL_TYPE_OTHER; } #line 3647 "src/txt/readstat_sas_commands_read.c" break; } case 32: { { #line 229 "src/txt/readstat_sas_commands_read.rl" var_len = integer; } 
#line 3656 "src/txt/readstat_sas_commands_read.c" break; } case 33: { { #line 328 "src/txt/readstat_sas_commands_read.rl" var_row = integer - 1; } #line 3665 "src/txt/readstat_sas_commands_read.c" break; } case 34: { { #line 332 "src/txt/readstat_sas_commands_read.rl" var_row = 0; } #line 3674 "src/txt/readstat_sas_commands_read.c" break; } } _nacts -= 1; _acts += 1; } } if ( p == eof ) { if ( cs >= 1094 ) goto _out; } else { if ( cs != 0 ) { p += 1; goto _resume; } } _out: {} } #line 378 "src/txt/readstat_sas_commands_read.rl" /* suppress warnings */ (void)sas_commands_en_main; if (cs < #line 3705 "src/txt/readstat_sas_commands_read.c" 1094 #line 383 "src/txt/readstat_sas_commands_read.rl" ) { char error_buf[1024]; if (p == pe) { snprintf(error_buf, sizeof(error_buf), "Error parsing SAS command file (end-of-file unexpectedly reached)"); } else { snprintf(error_buf, sizeof(error_buf), "Error parsing SAS command file around line #%d, col #%ld (%c)", line_no + 1, (long)(p - line_start + 1), *p); } if (parser->handlers.error) { parser->handlers.error(error_buf, user_ctx); } error = READSTAT_ERROR_PARSE; goto cleanup; } error = submit_columns(parser, schema, user_ctx); cleanup: parser->io->close(parser->io->io_ctx); free(bytes); if (error != READSTAT_OK) { if (outError) *outError = error; readstat_schema_free(schema); schema = NULL; } return schema; } haven/src/readstat/readstat_malloc.c0000644000176200001440000000156214101007206017246 0ustar liggesusers#include #define MAX_MALLOC_SIZE 0xFFF000 /* ~16 MB. Needs to be at least 0x3FF000, i.e. the default ~4MB block size used * in compressed SPSS (ZSAV) files. The purpose here is to prevent massive * allocations in the event of a malformed file or a bug in the library. 
*/ void *readstat_malloc(size_t len) { if (len > MAX_MALLOC_SIZE || len == 0) { return NULL; } return malloc(len); } void *readstat_calloc(size_t count, size_t size) { if (count > MAX_MALLOC_SIZE || size > MAX_MALLOC_SIZE || count * size > MAX_MALLOC_SIZE) { return NULL; } if (count == 0 || size == 0) { return NULL; } return calloc(count, size); } void *readstat_realloc(void *ptr, size_t len) { if (len > MAX_MALLOC_SIZE || len == 0) { if (ptr) free(ptr); return NULL; } return realloc(ptr, len); } haven/src/readstat/spss/0000755000176200001440000000000014102332323014732 5ustar liggesusershaven/src/readstat/spss/readstat_spss.h0000644000176200001440000000761714101007206017773 0ustar liggesusers #define SPSS_FORMAT_TYPE_A 1 #define SPSS_FORMAT_TYPE_AHEX 2 #define SPSS_FORMAT_TYPE_COMMA 3 #define SPSS_FORMAT_TYPE_DOLLAR 4 #define SPSS_FORMAT_TYPE_F 5 #define SPSS_FORMAT_TYPE_IB 6 #define SPSS_FORMAT_TYPE_PIBHEX 7 #define SPSS_FORMAT_TYPE_P 8 #define SPSS_FORMAT_TYPE_PIB 9 #define SPSS_FORMAT_TYPE_PK 10 #define SPSS_FORMAT_TYPE_RB 11 #define SPSS_FORMAT_TYPE_RBHEX 12 #define SPSS_FORMAT_TYPE_Z 15 #define SPSS_FORMAT_TYPE_N 16 #define SPSS_FORMAT_TYPE_E 17 #define SPSS_FORMAT_TYPE_DATE 20 #define SPSS_FORMAT_TYPE_TIME 21 #define SPSS_FORMAT_TYPE_DATETIME 22 #define SPSS_FORMAT_TYPE_ADATE 23 #define SPSS_FORMAT_TYPE_JDATE 24 #define SPSS_FORMAT_TYPE_DTIME 25 #define SPSS_FORMAT_TYPE_WKDAY 26 #define SPSS_FORMAT_TYPE_MONTH 27 #define SPSS_FORMAT_TYPE_MOYR 28 #define SPSS_FORMAT_TYPE_QYR 29 #define SPSS_FORMAT_TYPE_WKYR 30 #define SPSS_FORMAT_TYPE_PCT 31 #define SPSS_FORMAT_TYPE_DOT 32 #define SPSS_FORMAT_TYPE_CCA 33 #define SPSS_FORMAT_TYPE_CCB 34 #define SPSS_FORMAT_TYPE_CCC 35 #define SPSS_FORMAT_TYPE_CCD 36 #define SPSS_FORMAT_TYPE_CCE 37 #define SPSS_FORMAT_TYPE_EDATE 38 #define SPSS_FORMAT_TYPE_SDATE 39 #define SPSS_FORMAT_TYPE_MTIME 40 #define SPSS_FORMAT_TYPE_YMDHMS 41 #define SPSS_DOC_LINE_SIZE 80 #define SAV_HIGHEST_DOUBLE 0x7FEFFFFFFFFFFFFFUL #define 
SAV_MISSING_DOUBLE 0xFFEFFFFFFFFFFFFFUL #define SAV_LOWEST_DOUBLE 0xFFEFFFFFFFFFFFFEUL #define SAV_MEASURE_UNKNOWN 0 #define SAV_MEASURE_NOMINAL 1 #define SAV_MEASURE_ORDINAL 2 #define SAV_MEASURE_SCALE 3 #define SAV_ALIGNMENT_LEFT 0 #define SAV_ALIGNMENT_RIGHT 1 #define SAV_ALIGNMENT_CENTER 2 #include typedef struct spss_format_s { int type; int width; int decimal_places; } spss_format_t; // The reason some fields are stored unconverted is that some versions of SPSS // store truncated UTF-8 in the fields, and also use the truncated strings for // internal logic (such as matching names). If we convert them too early, the // last character of a truncated string will be dropped, and some of the column // information won't be found (e.g. in the key=value long variable record). typedef struct spss_varinfo_s { readstat_type_t type; int labels_index; int index; int offset; int width; unsigned int string_length; spss_format_t print_format; spss_format_t write_format; int n_segments; int n_missing_values; int missing_range; double missing_double_values[3]; char missing_string_values[3][8*4+1]; // stored UTF-8 char name[8+1]; // stored UNCONVERTED char longname[64+1]; // stored UNCONVERTED char *label; // stored UTF-8 readstat_measure_t measure; readstat_alignment_t alignment; int display_width; } spss_varinfo_t; int spss_format(char *buffer, size_t len, spss_format_t *format); int spss_varinfo_compare(const void *elem1, const void *elem2); void spss_varinfo_free(spss_varinfo_t *info); readstat_missingness_t spss_missingness_for_info(spss_varinfo_t *info); readstat_variable_t *spss_init_variable_for_info(spss_varinfo_t *info, int index_after_skipping, iconv_t converter); uint64_t spss_64bit_value(readstat_value_t value); uint32_t spss_measure_from_readstat_measure(readstat_measure_t measure); readstat_measure_t spss_measure_to_readstat_measure(uint32_t sav_measure); uint32_t spss_alignment_from_readstat_alignment(readstat_alignment_t alignment); readstat_alignment_t 
spss_alignment_to_readstat_alignment(uint32_t sav_alignment); readstat_error_t spss_format_for_variable(readstat_variable_t *r_variable, spss_format_t *spss_format); haven/src/readstat/spss/readstat_zsav_compress.c0000644000176200001440000000651314101007206021666 0ustar liggesusers #include #include #include "readstat_zsav_compress.h" zsav_ctx_t *zsav_ctx_init(size_t max_row_len, int64_t offset) { zsav_ctx_t *ctx = calloc(1, sizeof(zsav_ctx_t)); ctx->buffer = malloc(max_row_len); ctx->blocks_capacity = 10; ctx->blocks = calloc(ctx->blocks_capacity, sizeof(zsav_block_t *)); ctx->uncompressed_block_size = 0x3FF000; ctx->zheader_ofs = offset; ctx->compression_level = Z_DEFAULT_COMPRESSION; return ctx; } void zsav_ctx_free(zsav_ctx_t *ctx) { int i; for (i=0; i<ctx->blocks_count; i++) { zsav_block_t *block = ctx->blocks[i]; deflateEnd(&block->stream); free(block->compressed_data); free(block); } free(ctx->blocks); free(ctx->buffer); free(ctx); } zsav_block_t *zsav_add_block(zsav_ctx_t *ctx) { zsav_block_t *block = NULL; if (ctx->blocks_count == ctx->blocks_capacity) { ctx->blocks = realloc(ctx->blocks, (ctx->blocks_capacity *= 2 ) * sizeof(zsav_block_t *)); } block = calloc(1, sizeof(zsav_block_t)); ctx->blocks[ctx->blocks_count++] = block; deflateInit(&block->stream, ctx->compression_level); block->compressed_data_capacity = deflateBound(&block->stream, ctx->uncompressed_block_size); block->compressed_data = malloc(block->compressed_data_capacity); return block; } zsav_block_t *zsav_current_block(zsav_ctx_t *ctx) { if (ctx->blocks_count == 0) return NULL; return ctx->blocks[ctx->blocks_count-1]; } int zsav_compress_row(void *input, size_t input_len, int finish, zsav_ctx_t *ctx) { off_t row_off = 0; unsigned char *row_buffer = input; size_t row_len = input_len; zsav_block_t *block = zsav_current_block(ctx); int deflate_status = Z_OK; if (block == NULL) { block = zsav_add_block(ctx); } block->stream.next_in = row_buffer; block->stream.avail_in = row_len; 
block->stream.next_out = &block->compressed_data[block->compressed_size]; block->stream.avail_out = block->compressed_data_capacity - block->compressed_size; /* If the row won't fit into this block, keep writing and flushing * until the remainder fits. */ while (row_len - row_off > ctx->uncompressed_block_size - block->uncompressed_size) { block->stream.avail_in = ctx->uncompressed_block_size - block->uncompressed_size; row_off += ctx->uncompressed_block_size - block->uncompressed_size; if ((deflate_status = deflate(&block->stream, Z_FINISH)) != Z_STREAM_END) { goto cleanup; } block->compressed_size = block->compressed_data_capacity - block->stream.avail_out; block->uncompressed_size = ctx->uncompressed_block_size - block->stream.avail_in; block = zsav_add_block(ctx); block->stream.next_in = &row_buffer[row_off]; block->stream.avail_in = row_len - row_off; block->stream.next_out = block->compressed_data; block->stream.avail_out = block->compressed_data_capacity; } /* Now the rest of the row will fit in the block */ deflate_status = deflate(&block->stream, finish ? 
Z_FINISH : Z_NO_FLUSH); block->compressed_size = block->compressed_data_capacity - block->stream.avail_out; block->uncompressed_size += (row_len - row_off) - block->stream.avail_in; cleanup: return deflate_status; } haven/src/readstat/spss/readstat_sav_parse.c0000644000176200001440000005323014101765776021000 0ustar liggesusers#line 1 "src/spss/readstat_sav_parse.rl" #include #include #include "../readstat.h" #include "../readstat_malloc.h" #include "../readstat_strings.h" #include "readstat_sav.h" #include "readstat_sav_parse.h" #line 21 "src/spss/readstat_sav_parse.rl" typedef struct varlookup { char name[8*4+1]; int index; } varlookup_t; static int compare_key_varlookup(const void *elem1, const void *elem2) { const char *key = (const char *)elem1; const varlookup_t *v = (const varlookup_t *)elem2; return strcasecmp(key, v->name); } static int compare_varlookups(const void *elem1, const void *elem2) { const varlookup_t *v1 = (const varlookup_t *)elem1; const varlookup_t *v2 = (const varlookup_t *)elem2; return strcasecmp(v1->name, v2->name); } static int count_vars(sav_ctx_t *ctx) { int i; spss_varinfo_t *last_info = NULL; int var_count = 0; for (i=0; i<ctx->var_index; i++) { spss_varinfo_t *info = ctx->varinfo[i]; if (last_info == NULL || strcmp(info->name, last_info->name) != 0) { var_count++; } last_info = info; } return var_count; } static varlookup_t *build_lookup_table(int var_count, sav_ctx_t *ctx) { varlookup_t *table = readstat_malloc(var_count * sizeof(varlookup_t)); int offset = 0; int i; spss_varinfo_t *last_info = NULL; for (i=0; i<ctx->var_index; i++) { spss_varinfo_t *info = ctx->varinfo[i]; if (last_info == NULL || strcmp(info->name, last_info->name) != 0) { varlookup_t *entry = &table[offset++]; memcpy(entry->name, info->name, sizeof(info->name)); entry->index = info->index; } last_info = info; } qsort(table, var_count, sizeof(varlookup_t), &compare_varlookups); return table; } #line 68 "src/spss/readstat_sav_parse.c" static const signed char 
_sav_long_variable_parse_actions[] = { 0, 1, 1, 1, 5, 2, 2, 0, 3, 6, 4, 3, 0 }; static const short _sav_long_variable_parse_key_offsets[] = { 0, 0, 5, 19, 33, 47, 61, 75, 89, 103, 104, 108, 113, 118, 123, 128, 133, 138, 143, 148, 153, 158, 163, 168, 173, 178, 183, 188, 193, 198, 203, 208, 213, 218, 223, 228, 233, 238, 243, 248, 253, 258, 263, 268, 273, 278, 283, 288, 293, 298, 303, 308, 313, 318, 323, 328, 333, 338, 343, 348, 353, 358, 363, 368, 373, 378, 383, 388, 393, 398, 403, 408, 413, 418, 423, 428, 0 }; static const unsigned char _sav_long_variable_parse_trans_keys[] = { 255u, 0u, 63u, 91u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 61u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 255u, 0u, 63u, 91u, 127u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 
127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 127u, 255u, 0u, 31u, 9u, 0u }; static const signed char _sav_long_variable_parse_single_lengths[] = { 0, 1, 4, 4, 4, 4, 4, 4, 4, 1, 2, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 0 }; static const signed char _sav_long_variable_parse_range_lengths[] = { 0, 2, 5, 5, 5, 5, 5, 5, 5, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0 }; static const short _sav_long_variable_parse_index_offsets[] = { 0, 0, 4, 14, 24, 34, 44, 54, 64, 74, 76, 80, 85, 89, 94, 99, 104, 109, 114, 119, 124, 129, 134, 139, 144, 149, 154, 159, 164, 169, 174, 179, 184, 189, 194, 199, 204, 209, 214, 219, 224, 229, 234, 239, 244, 249, 254, 259, 264, 269, 274, 279, 284, 289, 294, 299, 304, 309, 314, 319, 324, 329, 334, 339, 344, 349, 354, 359, 364, 369, 374, 379, 384, 389, 394, 399, 0 }; static const signed char _sav_long_variable_parse_cond_targs[] = { 0, 0, 0, 2, 0, 10, 0, 0, 0, 0, 0, 0, 0, 3, 0, 10, 0, 0, 0, 0, 0, 0, 0, 4, 0, 10, 0, 0, 0, 0, 0, 0, 0, 5, 0, 10, 0, 0, 0, 0, 0, 0, 0, 6, 0, 
10, 0, 0, 0, 0, 0, 0, 0, 7, 0, 10, 0, 0, 0, 0, 0, 0, 0, 8, 0, 10, 0, 0, 0, 0, 0, 0, 0, 9, 10, 0, 0, 0, 0, 11, 12, 0, 0, 0, 13, 0, 0, 0, 2, 12, 0, 0, 0, 14, 12, 0, 0, 0, 15, 12, 0, 0, 0, 16, 12, 0, 0, 0, 17, 12, 0, 0, 0, 18, 12, 0, 0, 0, 19, 12, 0, 0, 0, 20, 12, 0, 0, 0, 21, 12, 0, 0, 0, 22, 12, 0, 0, 0, 23, 12, 0, 0, 0, 24, 12, 0, 0, 0, 25, 12, 0, 0, 0, 26, 12, 0, 0, 0, 27, 12, 0, 0, 0, 28, 12, 0, 0, 0, 29, 12, 0, 0, 0, 30, 12, 0, 0, 0, 31, 12, 0, 0, 0, 32, 12, 0, 0, 0, 33, 12, 0, 0, 0, 34, 12, 0, 0, 0, 35, 12, 0, 0, 0, 36, 12, 0, 0, 0, 37, 12, 0, 0, 0, 38, 12, 0, 0, 0, 39, 12, 0, 0, 0, 40, 12, 0, 0, 0, 41, 12, 0, 0, 0, 42, 12, 0, 0, 0, 43, 12, 0, 0, 0, 44, 12, 0, 0, 0, 45, 12, 0, 0, 0, 46, 12, 0, 0, 0, 47, 12, 0, 0, 0, 48, 12, 0, 0, 0, 49, 12, 0, 0, 0, 50, 12, 0, 0, 0, 51, 12, 0, 0, 0, 52, 12, 0, 0, 0, 53, 12, 0, 0, 0, 54, 12, 0, 0, 0, 55, 12, 0, 0, 0, 56, 12, 0, 0, 0, 57, 12, 0, 0, 0, 58, 12, 0, 0, 0, 59, 12, 0, 0, 0, 60, 12, 0, 0, 0, 61, 12, 0, 0, 0, 62, 12, 0, 0, 0, 63, 12, 0, 0, 0, 64, 12, 0, 0, 0, 65, 12, 0, 0, 0, 66, 12, 0, 0, 0, 67, 12, 0, 0, 0, 68, 12, 0, 0, 0, 69, 12, 0, 0, 0, 70, 12, 0, 0, 0, 71, 12, 0, 0, 0, 72, 12, 0, 0, 0, 73, 12, 0, 0, 0, 74, 12, 0, 0, 0, 75, 12, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 0 }; static const signed char _sav_long_variable_parse_cond_actions[] = { 0, 0, 0, 1, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 3, 8, 0, 0, 0, 0, 0, 0, 0, 1, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 
8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 0 }; static const short _sav_long_variable_parse_eof_trans[] = { 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 0 }; static const int sav_long_variable_parse_start = 1; static const int sav_long_variable_parse_en_main = 1; #line 79 "src/spss/readstat_sav_parse.rl" readstat_error_t sav_parse_long_variable_names_record(void *data, int count, sav_ctx_t *ctx) { unsigned char *c_data = (unsigned char *)data; int var_count = count_vars(ctx); readstat_error_t retval = READSTAT_OK; char temp_key[8+1]; char temp_val[64+1]; unsigned char *str_start = NULL; size_t str_len = 0; char error_buf[8192]; unsigned char *p = c_data; unsigned char *pe = c_data + count; 
varlookup_t *table = build_lookup_table(var_count, ctx); unsigned char *eof = pe; int cs; #line 351 "src/spss/readstat_sav_parse.c" { cs = (int)sav_long_variable_parse_start; } #line 356 "src/spss/readstat_sav_parse.c" { int _klen; unsigned int _trans = 0; const unsigned char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe && p != eof ) goto _out; if ( p == eof ) { if ( _sav_long_variable_parse_eof_trans[cs] > 0 ) { _trans = (unsigned int)_sav_long_variable_parse_eof_trans[cs] - 1; } } else { _keys = ( _sav_long_variable_parse_trans_keys + (_sav_long_variable_parse_key_offsets[cs])); _trans = (unsigned int)_sav_long_variable_parse_index_offsets[cs]; _klen = (int)_sav_long_variable_parse_single_lengths[cs]; if ( _klen > 0 ) { const unsigned char *_lower = _keys; const unsigned char *_upper = _keys + _klen - 1; const unsigned char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_sav_long_variable_parse_range_lengths[cs]; if ( _klen > 0 ) { const unsigned char *_lower = _keys; const unsigned char *_upper = _keys + (_klen<<1) - 2; const unsigned char *_mid; while ( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} } cs = (int)_sav_long_variable_parse_cond_targs[_trans]; if ( _sav_long_variable_parse_cond_actions[_trans] != 0 ) { _acts = ( _sav_long_variable_parse_actions + (_sav_long_variable_parse_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) 
{ case 0: { { #line 13 "src/spss/readstat_sav_parse.rl" memcpy(temp_key, str_start, str_len); temp_key[str_len] = '\0'; } #line 442 "src/spss/readstat_sav_parse.c" break; } case 1: { { #line 20 "src/spss/readstat_sav_parse.rl" str_start = p; } #line 451 "src/spss/readstat_sav_parse.c" break; } case 2: { { #line 20 "src/spss/readstat_sav_parse.rl" str_len = p - str_start; } #line 460 "src/spss/readstat_sav_parse.c" break; } case 3: { { #line 102 "src/spss/readstat_sav_parse.rl" varlookup_t *found = bsearch(temp_key, table, var_count, sizeof(varlookup_t), &compare_key_varlookup); if (found) { spss_varinfo_t *info = ctx->varinfo[found->index]; memcpy(info->longname, temp_val, str_len); info->longname[str_len] = '\0'; } else if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Failed to find %s", temp_key); ctx->handle.error(error_buf, ctx->user_ctx); } } #line 479 "src/spss/readstat_sav_parse.c" break; } case 4: { { #line 114 "src/spss/readstat_sav_parse.rl" memcpy(temp_val, str_start, str_len); temp_val[str_len] = '\0'; } #line 491 "src/spss/readstat_sav_parse.c" break; } case 5: { { #line 119 "src/spss/readstat_sav_parse.rl" str_start = p; } #line 500 "src/spss/readstat_sav_parse.c" break; } case 6: { { #line 119 "src/spss/readstat_sav_parse.rl" str_len = p - str_start; } #line 509 "src/spss/readstat_sav_parse.c" break; } } _nacts -= 1; _acts += 1; } } if ( p == eof ) { if ( cs >= 11 ) goto _out; } else { if ( cs != 0 ) { p += 1; goto _resume; } } _out: {} } #line 127 "src/spss/readstat_sav_parse.rl" if (cs < #line 537 "src/spss/readstat_sav_parse.c" 11 #line 129 "src/spss/readstat_sav_parse.rl" || p != pe) { if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Error parsing string \"%.*s\" around byte #%ld/%d, character %c", count, (char *)data, (long)(p - c_data), count, *p); ctx->handle.error(error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_PARSE; } if (table) free(table); /* suppress warning */ (void)sav_long_variable_parse_en_main; 
return retval; } #line 560 "src/spss/readstat_sav_parse.c" static const signed char _sav_very_long_string_parse_actions[] = { 0, 1, 1, 1, 3, 1, 4, 2, 2, 0, 2, 5, 4, 0 }; static const signed char _sav_very_long_string_parse_key_offsets[] = { 0, 0, 5, 19, 33, 47, 61, 75, 89, 103, 104, 106, 109, 111, 0 }; static const unsigned char _sav_very_long_string_parse_trans_keys[] = { 255u, 0u, 63u, 91u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 47u, 61u, 96u, 255u, 0u, 34u, 37u, 45u, 58u, 63u, 91u, 94u, 123u, 127u, 61u, 48u, 57u, 0u, 48u, 57u, 0u, 9u, 255u, 0u, 63u, 91u, 127u, 0u }; static const signed char _sav_very_long_string_parse_single_lengths[] = { 0, 1, 4, 4, 4, 4, 4, 4, 4, 1, 0, 1, 2, 1, 0 }; static const signed char _sav_very_long_string_parse_range_lengths[] = { 0, 2, 5, 5, 5, 5, 5, 5, 5, 0, 1, 1, 0, 2, 0 }; static const signed char _sav_very_long_string_parse_index_offsets[] = { 0, 0, 4, 14, 24, 34, 44, 54, 64, 74, 76, 78, 81, 84, 0 }; static const signed char _sav_very_long_string_parse_cond_targs[] = { 0, 0, 0, 2, 0, 10, 0, 0, 0, 0, 0, 0, 0, 3, 0, 10, 0, 0, 0, 0, 0, 0, 0, 4, 0, 10, 0, 0, 0, 0, 0, 0, 0, 5, 0, 10, 0, 0, 0, 0, 0, 0, 0, 6, 0, 10, 0, 0, 0, 0, 0, 0, 0, 7, 0, 10, 0, 0, 0, 0, 0, 0, 0, 8, 0, 10, 0, 0, 0, 0, 0, 0, 0, 9, 10, 0, 11, 0, 12, 11, 0, 12, 13, 0, 0, 0, 0, 2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 0 }; static const signed char _sav_very_long_string_parse_cond_actions[] = { 0, 0, 0, 1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 10, 0, 3, 5, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; static const int sav_very_long_string_parse_start = 1; static const int sav_very_long_string_parse_en_main = 1; #line 153 "src/spss/readstat_sav_parse.rl" readstat_error_t sav_parse_very_long_string_record(void *data, int count, sav_ctx_t *ctx) { unsigned char *c_data = (unsigned char *)data; int var_count = count_vars(ctx); readstat_error_t retval = READSTAT_OK; char temp_key[8*4+1]; unsigned int temp_val = 0; unsigned char *str_start = NULL; size_t str_len = 0; size_t error_buf_len = 1024 + count; char *error_buf = NULL; unsigned char *p = c_data; unsigned char *pe = c_data + count; varlookup_t *table = NULL; int cs; error_buf = readstat_malloc(error_buf_len); table = build_lookup_table(var_count, ctx); #line 666 "src/spss/readstat_sav_parse.c" { cs = (int)sav_very_long_string_parse_start; } #line 671 "src/spss/readstat_sav_parse.c" { int _klen; unsigned int _trans = 0; const unsigned char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe ) goto _out; _keys = ( _sav_very_long_string_parse_trans_keys + (_sav_very_long_string_parse_key_offsets[cs])); _trans = (unsigned int)_sav_very_long_string_parse_index_offsets[cs]; _klen = (int)_sav_very_long_string_parse_single_lengths[cs]; if ( _klen > 0 ) { const unsigned char *_lower = _keys; const unsigned char *_upper = _keys + _klen - 1; const unsigned char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_sav_very_long_string_parse_range_lengths[cs]; if ( _klen > 0 ) { const unsigned char *_lower = _keys; const unsigned char *_upper = _keys + (_klen<<1) - 2; const unsigned char *_mid; while 
( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} cs = (int)_sav_very_long_string_parse_cond_targs[_trans]; if ( _sav_very_long_string_parse_cond_actions[_trans] != 0 ) { _acts = ( _sav_very_long_string_parse_actions + (_sav_very_long_string_parse_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 13 "src/spss/readstat_sav_parse.rl" memcpy(temp_key, str_start, str_len); temp_key[str_len] = '\0'; } #line 750 "src/spss/readstat_sav_parse.c" break; } case 1: { { #line 20 "src/spss/readstat_sav_parse.rl" str_start = p; } #line 759 "src/spss/readstat_sav_parse.c" break; } case 2: { { #line 20 "src/spss/readstat_sav_parse.rl" str_len = p - str_start; } #line 768 "src/spss/readstat_sav_parse.c" break; } case 3: { { #line 177 "src/spss/readstat_sav_parse.rl" varlookup_t *found = bsearch(temp_key, table, var_count, sizeof(varlookup_t), &compare_key_varlookup); if (found) { ctx->varinfo[found->index]->string_length = temp_val; ctx->varinfo[found->index]->write_format.width = temp_val; ctx->varinfo[found->index]->print_format.width = temp_val; } } #line 784 "src/spss/readstat_sav_parse.c" break; } case 4: { { #line 186 "src/spss/readstat_sav_parse.rl" if ((( (*( p)))) != '\0') { unsigned char digit = (( (*( p)))) - '0'; if (temp_val <= (UINT_MAX - digit) / 10) { temp_val = 10 * temp_val + digit; } else { {p += 1; goto _out; } } } } #line 802 "src/spss/readstat_sav_parse.c" break; } case 5: { { #line 197 "src/spss/readstat_sav_parse.rl" temp_val = 0; } #line 811 "src/spss/readstat_sav_parse.c" break; } } _nacts -= 1; _acts += 1; } } if ( cs != 0 ) { p += 1; goto _resume; } _out: {} } #line 205 "src/spss/readstat_sav_parse.rl" if (cs < #line 
833 "src/spss/readstat_sav_parse.c" 12 #line 207 "src/spss/readstat_sav_parse.rl" || p != pe) { if (ctx->handle.error) { snprintf(error_buf, error_buf_len, "Parsed %ld of %ld bytes. Remaining bytes: %.*s", (long)(p - c_data), (long)(pe - c_data), (int)(pe - p), p); ctx->handle.error(error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_PARSE; } if (table) free(table); if (error_buf) free(error_buf); /* suppress warning */ (void)sav_very_long_string_parse_en_main; return retval; } haven/src/readstat/spss/readstat_zsav_write.h0000644000176200001440000000021014101007206021156 0ustar liggesusers readstat_error_t zsav_write_compressed_row(void *writer_ctx, void *row, size_t len); readstat_error_t zsav_end_data(void *writer_ctx); haven/src/readstat/spss/readstat_por_parse.c0000644000176200001440000001604014101765776021005 0ustar liggesusers#line 1 "src/spss/readstat_por_parse.rl" #include #include "../readstat.h" #include "readstat_por_parse.h" #line 9 "src/spss/readstat_por_parse.c" static const signed char _por_field_parse_actions[] = { 0, 1, 0, 1, 1, 1, 5, 1, 8, 1, 9, 1, 10, 2, 2, 0, 2, 3, 1, 2, 5, 10, 2, 7, 10, 3, 4, 2, 0, 3, 6, 2, 0, 0 }; static const signed char _por_field_parse_key_offsets[] = { 0, 0, 8, 9, 14, 18, 23, 31, 35, 40, 44, 48, 55, 0 }; static const char _por_field_parse_trans_keys[] = { 32, 42, 45, 46, 48, 57, 65, 84, 46, 46, 48, 57, 65, 84, 48, 57, 65, 84, 47, 48, 57, 65, 84, 43, 45, 46, 47, 48, 57, 65, 84, 48, 57, 65, 84, 47, 48, 57, 65, 84, 48, 57, 65, 84, 48, 57, 65, 84, 43, 45, 47, 48, 57, 65, 84, 0 }; static const signed char _por_field_parse_single_lengths[] = { 0, 4, 1, 1, 0, 1, 4, 0, 1, 0, 0, 3, 0, 0 }; static const signed char _por_field_parse_range_lengths[] = { 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0 }; static const signed char _por_field_parse_index_offsets[] = { 0, 0, 7, 9, 13, 16, 20, 27, 30, 34, 37, 40, 46, 0 }; static const signed char _por_field_parse_cond_targs[] = { 1, 2, 3, 4, 6, 6, 0, 12, 0, 4, 6, 6, 0, 5, 5, 0, 12, 5, 5, 0, 7, 9, 
10, 12, 6, 6, 0, 8, 8, 0, 12, 8, 8, 0, 8, 8, 0, 11, 11, 0, 7, 9, 12, 11, 11, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 0 }; static const signed char _por_field_parse_cond_actions[] = { 0, 9, 0, 0, 13, 13, 0, 11, 0, 7, 25, 25, 0, 16, 16, 0, 11, 3, 3, 0, 5, 5, 5, 19, 1, 1, 0, 13, 13, 0, 22, 1, 1, 0, 29, 29, 0, 16, 16, 0, 0, 0, 11, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; static const int por_field_parse_start = 1; static const int por_field_parse_en_main = 1; #line 9 "src/spss/readstat_por_parse.rl" ssize_t readstat_por_parse_double(const char *data, size_t len, double *result, readstat_error_handler error_cb, void *user_ctx) { ssize_t retval = 0; double val = 0.0; double denom = 30.0; double temp_frac = 0.0; double num = 0.0; double exp = 0.0; double temp_val = 0.0; const unsigned char *p = (const unsigned char *)data; const unsigned char *pe = p + len; int cs; int is_negative = 0, exp_is_negative = 0; int success = 0; #line 97 "src/spss/readstat_por_parse.c" { cs = (int)por_field_parse_start; } #line 102 "src/spss/readstat_por_parse.c" { int _klen; unsigned int _trans = 0; const char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe ) goto _out; _keys = ( _por_field_parse_trans_keys + (_por_field_parse_key_offsets[cs])); _trans = (unsigned int)_por_field_parse_index_offsets[cs]; _klen = (int)_por_field_parse_single_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + _klen - 1; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_por_field_parse_range_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + (_klen<<1) - 2; const char *_mid; while ( 1 ) { if ( _upper 
< _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} cs = (int)_por_field_parse_cond_targs[_trans]; if ( _por_field_parse_cond_actions[_trans] != 0 ) { _acts = ( _por_field_parse_actions + (_por_field_parse_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 30 "src/spss/readstat_por_parse.rl" if ((( (*( p)))) >= '0' && (( (*( p)))) <= '9') { temp_val = 30 * temp_val + ((( (*( p)))) - '0'); } else if ((( (*( p)))) >= 'A' && (( (*( p)))) <= 'T') { temp_val = 30 * temp_val + (10 + (( (*( p)))) - 'A'); } } #line 184 "src/spss/readstat_por_parse.c" break; } case 1: { { #line 38 "src/spss/readstat_por_parse.rl" if ((( (*( p)))) >= '0' && (( (*( p)))) <= '9') { temp_frac += ((( (*( p)))) - '0') / denom; } else if ((( (*( p)))) >= 'A' && (( (*( p)))) <= 'T') { temp_frac += (10 + (( (*( p)))) - 'A') / denom; } denom *= 30.0; } #line 200 "src/spss/readstat_por_parse.c" break; } case 2: { { #line 47 "src/spss/readstat_por_parse.rl" temp_val = 0; } #line 209 "src/spss/readstat_por_parse.c" break; } case 3: { { #line 49 "src/spss/readstat_por_parse.rl" temp_frac = 0.0; } #line 218 "src/spss/readstat_por_parse.c" break; } case 4: { { #line 53 "src/spss/readstat_por_parse.rl" is_negative = 1; } #line 227 "src/spss/readstat_por_parse.c" break; } case 5: { { #line 53 "src/spss/readstat_por_parse.rl" num = temp_val; } #line 236 "src/spss/readstat_por_parse.c" break; } case 6: { { #line 54 "src/spss/readstat_por_parse.rl" exp_is_negative = 1; } #line 245 "src/spss/readstat_por_parse.c" break; } case 7: { { #line 54 "src/spss/readstat_por_parse.rl" exp = temp_val; } #line 254 "src/spss/readstat_por_parse.c" break; } case 8: { { #line 56 "src/spss/readstat_por_parse.rl" 
is_negative = 1; } #line 263 "src/spss/readstat_por_parse.c" break; } case 9: { { #line 58 "src/spss/readstat_por_parse.rl" val = NAN; } #line 272 "src/spss/readstat_por_parse.c" break; } case 10: { { #line 60 "src/spss/readstat_por_parse.rl" success = 1; {p += 1; goto _out; } } #line 281 "src/spss/readstat_por_parse.c" break; } } _nacts -= 1; _acts += 1; } } if ( cs != 0 ) { p += 1; goto _resume; } _out: {} } #line 64 "src/spss/readstat_por_parse.rl" if (!isnan(val)) { val = 1.0 * num + temp_frac; if (exp_is_negative) exp *= -1; if (exp) { val *= pow(30.0, exp); } if (is_negative) val *= -1; } if (!success) { retval = -1; if (error_cb) { char error_buf[1024]; snprintf(error_buf, sizeof(error_buf), "Read bytes: %ld String: %.*s Ending state: %d", (long)(p - (const unsigned char *)data), (int)len, data, cs); error_cb(error_buf, user_ctx); } } if (retval == 0) { if (result) *result = val; retval = (p - (const unsigned char *)data); } /* suppress warning */ (void)por_field_parse_en_main; return retval; } haven/src/readstat/spss/readstat_sav_parse.h0000644000176200001440000000032114101007206020747 0ustar liggesusers// // sav_parse.h // readstat_error_t sav_parse_long_variable_names_record(void *data, int count, sav_ctx_t *ctx); readstat_error_t sav_parse_very_long_string_record(void *data, int count, sav_ctx_t *ctx); haven/src/readstat/spss/readstat_sav_parse.rl0000644000176200001440000001474614101007206021155 0ustar liggesusers#include #include #include "../readstat.h" #include "../readstat_malloc.h" #include "../readstat_strings.h" #include "readstat_sav.h" #include "readstat_sav_parse.h" %%{ machine key_defs; action copy_key { memcpy(temp_key, str_start, str_len); temp_key[str_len] = '\0'; } non_ascii_byte = (0x80 .. 0xFE); # multi-byte sequence might be incomplete key = ( ( non_ascii_byte | [A-Z@] ) ( non_ascii_byte | [A-Za-z0-9@#$_\.] 
){0,7} ) >{ str_start = fpc; } %{ str_len = fpc - str_start; }; }%% typedef struct varlookup { char name[8*4+1]; int index; } varlookup_t; static int compare_key_varlookup(const void *elem1, const void *elem2) { const char *key = (const char *)elem1; const varlookup_t *v = (const varlookup_t *)elem2; return strcasecmp(key, v->name); } static int compare_varlookups(const void *elem1, const void *elem2) { const varlookup_t *v1 = (const varlookup_t *)elem1; const varlookup_t *v2 = (const varlookup_t *)elem2; return strcasecmp(v1->name, v2->name); } static int count_vars(sav_ctx_t *ctx) { int i; spss_varinfo_t *last_info = NULL; int var_count = 0; for (i=0; i < ctx->var_index; i++) { spss_varinfo_t *info = ctx->varinfo[i]; if (last_info == NULL || strcmp(info->name, last_info->name) != 0) { var_count++; } last_info = info; } return var_count; } static varlookup_t *build_lookup_table(int var_count, sav_ctx_t *ctx) { varlookup_t *table = readstat_malloc(var_count * sizeof(varlookup_t)); int offset = 0; int i; spss_varinfo_t *last_info = NULL; for (i=0; i < ctx->var_index; i++) { spss_varinfo_t *info = ctx->varinfo[i]; if (last_info == NULL || strcmp(info->name, last_info->name) != 0) { varlookup_t *entry = &table[offset++]; memcpy(entry->name, info->name, sizeof(info->name)); entry->index = info->index; } last_info = info; } qsort(table, var_count, sizeof(varlookup_t), &compare_varlookups); return table; } %%{ machine sav_long_variable_parse; include key_defs; write data nofinal noerror; alphtype unsigned char; }%% readstat_error_t sav_parse_long_variable_names_record(void *data, int count, sav_ctx_t *ctx) { unsigned char *c_data = (unsigned char *)data; int var_count = count_vars(ctx); readstat_error_t retval = READSTAT_OK; char temp_key[8+1]; char temp_val[64+1]; unsigned char *str_start = NULL; size_t str_len = 0; char error_buf[8192]; unsigned char *p = c_data; unsigned char *pe = c_data + count; varlookup_t *table = build_lookup_table(var_count, ctx); unsigned char *eof = pe; int 
cs; %%{ action set_long_name { varlookup_t *found = bsearch(temp_key, table, var_count, sizeof(varlookup_t), &compare_key_varlookup); if (found) { spss_varinfo_t *info = ctx->varinfo[found->index]; memcpy(info->longname, temp_val, str_len); info->longname[str_len] = '\0'; } else if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Failed to find %s", temp_key); ctx->handle.error(error_buf, ctx->user_ctx); } } action copy_value { memcpy(temp_val, str_start, str_len); temp_val[str_len] = '\0'; } value = ( non_ascii_byte | print ){1,64} >{ str_start = fpc; } %{ str_len = fpc - str_start; }; keyval = ( key %copy_key "=" value %copy_value ) %set_long_name; main := keyval ("\t" keyval)* "\t"?; write init; write exec; }%% if (cs < %%{ write first_final; }%%|| p != pe) { if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Error parsing string \"%.*s\" around byte #%ld/%d, character %c", count, (char *)data, (long)(p - c_data), count, *p); ctx->handle.error(error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_PARSE; } if (table) free(table); /* suppress warning */ (void)sav_long_variable_parse_en_main; return retval; } %%{ machine sav_very_long_string_parse; include key_defs; write data nofinal noerror; alphtype unsigned char; }%% readstat_error_t sav_parse_very_long_string_record(void *data, int count, sav_ctx_t *ctx) { unsigned char *c_data = (unsigned char *)data; int var_count = count_vars(ctx); readstat_error_t retval = READSTAT_OK; char temp_key[8*4+1]; unsigned int temp_val = 0; unsigned char *str_start = NULL; size_t str_len = 0; size_t error_buf_len = 1024 + count; char *error_buf = NULL; unsigned char *p = c_data; unsigned char *pe = c_data + count; varlookup_t *table = NULL; int cs; error_buf = readstat_malloc(error_buf_len); table = build_lookup_table(var_count, ctx); %%{ action set_width { varlookup_t *found = bsearch(temp_key, table, var_count, sizeof(varlookup_t), &compare_key_varlookup); if (found) { 
ctx->varinfo[found->index]->string_length = temp_val; ctx->varinfo[found->index]->write_format.width = temp_val; ctx->varinfo[found->index]->print_format.width = temp_val; } } action incr_val { if (fc != '\0') { unsigned char digit = fc - '0'; if (temp_val <= (UINT_MAX - digit) / 10) { temp_val = 10 * temp_val + digit; } else { fbreak; } } } value = [0-9]+ >{ temp_val = 0; } $incr_val; keyval = ( key %copy_key "=" value ) %set_width; main := keyval ("\0"+ "\t" keyval)* "\0"+ "\t"?; write init; write exec; }%% if (cs < %%{ write first_final; }%% || p != pe) { if (ctx->handle.error) { snprintf(error_buf, error_buf_len, "Parsed %ld of %ld bytes. Remaining bytes: %.*s", (long)(p - c_data), (long)(pe - c_data), (int)(pe - p), p); ctx->handle.error(error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_PARSE; } if (table) free(table); if (error_buf) free(error_buf); /* suppress warning */ (void)sav_very_long_string_parse_en_main; return retval; } haven/src/readstat/spss/readstat_sav.h0000644000176200001440000000771214101007206017570 0ustar liggesusers// // readstat_sav.h // #include "readstat_spss.h" #pragma pack(push, 1) // SAV files typedef struct sav_file_header_record_s { char rec_type[4]; char prod_name[60]; int32_t layout_code; int32_t nominal_case_size; int32_t compression; int32_t weight_index; int32_t ncases; double bias; /* TODO is this portable? 
*/ char creation_date[9]; char creation_time[8]; char file_label[64]; char padding[3]; } sav_file_header_record_t; typedef struct sav_variable_record_s { int32_t type; int32_t has_var_label; int32_t n_missing_values; int32_t print; int32_t write; char name[8]; } sav_variable_record_t; typedef struct sav_info_record_header_s { int32_t rec_type; int32_t subtype; int32_t size; int32_t count; } sav_info_record_t; typedef struct sav_machine_integer_info_record_s { int32_t version_major; int32_t version_minor; int32_t version_revision; int32_t machine_code; int32_t floating_point_rep; int32_t compression_code; int32_t endianness; int32_t character_code; } sav_machine_integer_info_record_t; typedef struct sav_machine_floating_point_info_record_s { uint64_t sysmis; uint64_t highest; uint64_t lowest; } sav_machine_floating_point_info_record_t; typedef struct sav_dictionary_termination_record_s { int32_t rec_type; int32_t filler; } sav_dictionary_termination_record_t; #pragma pack(pop) typedef struct sav_ctx_s { readstat_callbacks_t handle; size_t file_size; readstat_io_t *io; void *user_ctx; spss_varinfo_t **varinfo; size_t varinfo_capacity; readstat_variable_t **variables; const char *input_encoding; const char *output_encoding; char file_label[4*64+1]; time_t timestamp; uint32_t *variable_display_values; size_t variable_display_values_count; iconv_t converter; int var_index; int var_offset; int var_count; int record_count; int row_limit; int row_offset; int current_row; int value_labels_count; int fweight_index; char *raw_string; size_t raw_string_len; char *utf8_string; size_t utf8_string_len; uint64_t missing_double; uint64_t lowest_double; uint64_t highest_double; double bias; int format_version; readstat_compress_t compression; readstat_endian_t endianness; unsigned int bswap:1; } sav_ctx_t; #define SAV_RECORD_TYPE_VARIABLE 2 #define SAV_RECORD_TYPE_VALUE_LABEL 3 #define SAV_RECORD_TYPE_VALUE_LABEL_VARIABLES 4 #define SAV_RECORD_TYPE_DOCUMENT 6 #define 
SAV_RECORD_TYPE_HAS_DATA 7 #define SAV_RECORD_TYPE_DICT_TERMINATION 999 #define SAV_RECORD_SUBTYPE_INTEGER_INFO 3 #define SAV_RECORD_SUBTYPE_FP_INFO 4 #define SAV_RECORD_SUBTYPE_PRODUCT_INFO 10 #define SAV_RECORD_SUBTYPE_VAR_DISPLAY 11 #define SAV_RECORD_SUBTYPE_LONG_VAR_NAME 13 #define SAV_RECORD_SUBTYPE_VERY_LONG_STR 14 #define SAV_RECORD_SUBTYPE_NUMBER_OF_CASES 16 #define SAV_RECORD_SUBTYPE_DATA_FILE_ATTRS 17 #define SAV_RECORD_SUBTYPE_VARIABLE_ATTRS 18 #define SAV_RECORD_SUBTYPE_CHAR_ENCODING 20 #define SAV_RECORD_SUBTYPE_LONG_STRING_VALUE_LABELS 21 #define SAV_RECORD_SUBTYPE_LONG_STRING_MISSING_VALUES 22 #define SAV_FLOATING_POINT_REP_IEEE 1 #define SAV_FLOATING_POINT_REP_IBM 2 #define SAV_FLOATING_POINT_REP_VAX 3 #define SAV_ENDIANNESS_BIG 1 #define SAV_ENDIANNESS_LITTLE 2 #define SAV_EIGHT_SPACES " " sav_ctx_t *sav_ctx_init(sav_file_header_record_t *header, readstat_io_t *io); void sav_ctx_free(sav_ctx_t *ctx); haven/src/readstat/spss/readstat_por_read.c0000644000176200001440000007152714101007206020572 0ustar liggesusers// // readstat_por.c // #include #include #include #include #include #include #include #include #include "../readstat.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" #include "../readstat_malloc.h" #include "../CKHashTable.h" #include "readstat_por_parse.h" #include "readstat_spss.h" #include "readstat_por.h" #define POR_LINE_LENGTH 80 #define POR_LABEL_NAME_PREFIX "labels" #define POR_FORMAT_SHIFT 82 #define MAX_FORMAT_TYPE (POR_FORMAT_SHIFT+SPSS_FORMAT_TYPE_YMDHMS) #define MAX_FORMAT_WIDTH 20000 #define MAX_FORMAT_DECIMALS 100 #define MAX_STRING_LENGTH 20000 #define MAX_VARS 1000000 #define MAX_WIDTH 1000000 #define MAX_LINES 1000000 #define MAX_STRINGS 1000000 #define MAX_LABELS 1000000 static ssize_t read_bytes(por_ctx_t *ctx, void *dst, size_t len); static readstat_error_t read_string(por_ctx_t *ctx, char *data, size_t len); static readstat_error_t por_update_progress(por_ctx_t *ctx) { readstat_io_t *io = ctx->io; 
return io->update(ctx->file_size, ctx->handle.progress, ctx->user_ctx, io->io_ctx); } static ssize_t read_bytes(por_ctx_t *ctx, void *dst, size_t len) { char *dst_pos = (char *)dst; readstat_io_t *io = ctx->io; char byte; while (dst_pos < (char *)dst + len) { if (ctx->num_spaces) { *dst_pos++ = ctx->space; ctx->num_spaces--; continue; } ssize_t bytes_read = io->read(&byte, 1, io->io_ctx); if (bytes_read == 0) { break; } if (bytes_read == -1) { return -1; } if (byte == '\r' || byte == '\n') { if (byte == '\r') { bytes_read = io->read(&byte, 1, io->io_ctx); if (bytes_read == 0 || bytes_read == -1 || byte != '\n') return -1; } ctx->num_spaces = POR_LINE_LENGTH - ctx->pos; ctx->pos = 0; continue; } else if (ctx->pos == POR_LINE_LENGTH) { return -1; } *dst_pos++ = byte; ctx->pos++; } return (int)(dst_pos - (char *)dst); } static uint16_t read_tag(por_ctx_t *ctx) { unsigned char tag; if (read_bytes(ctx, &tag, 1) != 1) { return -1; } return ctx->byte2unicode[tag]; } static readstat_error_t read_double_with_peek(por_ctx_t *ctx, double *out_double, unsigned char peek) { readstat_error_t retval = READSTAT_OK; double value = NAN; unsigned char buffer[100]; char utf8_buffer[300]; char error_buf[1024]; int64_t len = 0; ssize_t bytes_read = 0; buffer[0] = peek; bytes_read = read_bytes(ctx, &buffer[1], 1); if (bytes_read != 1) return READSTAT_ERROR_PARSE; if (ctx->byte2unicode[buffer[0]] == '*' && ctx->byte2unicode[buffer[1]] == '.') { if (out_double) *out_double = NAN; return READSTAT_OK; } int64_t i=2; while (i < sizeof(buffer) && ctx->byte2unicode[buffer[i-1]] != '/') { bytes_read = read_bytes(ctx, &buffer[i], 1); if (bytes_read != 1) return READSTAT_ERROR_PARSE; i++; } if (i == sizeof(buffer)) { return READSTAT_ERROR_PARSE; } len = por_utf8_encode(buffer, i, utf8_buffer, sizeof(utf8_buffer), ctx->byte2unicode); if (len == -1) { if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Error converting double string (length=%" PRId64 "): %.*s", i, (int)i, buffer); ctx->handle.error(error_buf, 
ctx->user_ctx); } retval = READSTAT_ERROR_CONVERT; goto cleanup; } bytes_read = readstat_por_parse_double(utf8_buffer, len, &value, ctx->handle.error, ctx->user_ctx); if (bytes_read == -1) { if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Error parsing double string (length=%" PRId64 "): %.*s [%s]", len, (int)len, utf8_buffer, buffer); ctx->handle.error(error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_PARSE; goto cleanup; } cleanup: if (out_double) *out_double = value; return retval; } static readstat_error_t read_double(por_ctx_t *ctx, double *out_double) { unsigned char peek; size_t bytes_read = read_bytes(ctx, &peek, 1); if (bytes_read != 1) return READSTAT_ERROR_PARSE; return read_double_with_peek(ctx, out_double, peek); } static readstat_error_t read_integer_in_range(por_ctx_t *ctx, int min, int max, int *out_integer) { double dval = NAN; readstat_error_t retval = read_double(ctx, &dval); if (retval != READSTAT_OK) return retval; if (isnan(dval) || dval < min || dval > max) return READSTAT_ERROR_PARSE; if (out_integer) *out_integer = (int)dval; return READSTAT_OK; } static readstat_error_t maybe_read_double(por_ctx_t *ctx, double *out_double, int *out_finished) { unsigned char peek; size_t bytes_read = read_bytes(ctx, &peek, 1); if (bytes_read != 1) return READSTAT_ERROR_PARSE; if (ctx->byte2unicode[peek] == 'Z') { if (out_double) *out_double = NAN; if (out_finished) *out_finished = 1; return READSTAT_OK; } if (out_finished) *out_finished = 0; return read_double_with_peek(ctx, out_double, peek); } static readstat_error_t maybe_read_string(por_ctx_t *ctx, char *data, size_t len, int *out_finished) { readstat_error_t retval = READSTAT_OK; double value; int finished = 0; char error_buf[1024]; size_t string_length = 0; retval = maybe_read_double(ctx, &value, &finished); if (retval != READSTAT_OK || finished) { if (out_finished) *out_finished = finished; return retval; } if (value < 0 || value > MAX_STRING_LENGTH || isnan(value)) { retval = 
READSTAT_ERROR_PARSE; goto cleanup; } string_length = (size_t)value; if (string_length > ctx->string_buffer_len) { ctx->string_buffer_len = string_length; ctx->string_buffer = realloc(ctx->string_buffer, ctx->string_buffer_len); memset(ctx->string_buffer, 0, ctx->string_buffer_len); } if (read_bytes(ctx, ctx->string_buffer, string_length) == -1) { retval = READSTAT_ERROR_READ; goto cleanup; } size_t bytes_encoded = por_utf8_encode(ctx->string_buffer, string_length, data, len - 1, ctx->byte2unicode); if (bytes_encoded == -1) { if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Error converting string: %.*s", (int)string_length, ctx->string_buffer); ctx->handle.error(error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_CONVERT; goto cleanup; } data[bytes_encoded] = '\0'; if (out_finished) *out_finished = 0; cleanup: return retval; } static readstat_error_t read_string(por_ctx_t *ctx, char *data, size_t len) { int finished = 0; readstat_error_t retval = maybe_read_string(ctx, data, len, &finished); if (retval == READSTAT_OK && finished) { return READSTAT_ERROR_PARSE; } return retval; } static readstat_error_t read_variable_count_record(por_ctx_t *ctx) { int value; readstat_error_t retval = READSTAT_OK; if (ctx->var_count) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if ((retval = read_integer_in_range(ctx, 0, MAX_VARS, &value)) != READSTAT_OK) { goto cleanup; } ctx->var_count = value; ctx->variables = readstat_calloc(ctx->var_count, sizeof(readstat_variable_t *)); ctx->varinfo = readstat_calloc(ctx->var_count, sizeof(spss_varinfo_t)); if (ctx->variables == NULL || ctx->varinfo == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (ctx->handle.metadata) { readstat_metadata_t metadata = { .row_count = -1, .var_count = ctx->var_count, .creation_time = ctx->timestamp, .modified_time = ctx->timestamp, .file_format_version = ctx->version, .file_label = ctx->file_label }; if (ctx->handle.metadata(&metadata, ctx->user_ctx) != READSTAT_HANDLER_OK) { 
retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } cleanup: return retval; } static readstat_error_t read_precision_record(por_ctx_t *ctx) { int precision = 0; readstat_error_t error = read_integer_in_range(ctx, 0, 100, &precision); if (error == READSTAT_OK) ctx->base30_precision = precision; return error; } static readstat_error_t read_case_weight_record(por_ctx_t *ctx) { return read_string(ctx, ctx->fweight_name, sizeof(ctx->fweight_name)); } static readstat_error_t read_variable_record(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; int value; int i; spss_varinfo_t *varinfo = NULL; spss_format_t *formats[2]; ctx->var_offset++; if (ctx->var_offset == ctx->var_count) { retval = READSTAT_ERROR_PARSE; goto cleanup; } varinfo = &ctx->varinfo[ctx->var_offset]; formats[0] = &varinfo->print_format; formats[1] = &varinfo->write_format; varinfo->labels_index = -1; if ((retval = read_integer_in_range(ctx, 0, MAX_WIDTH, &value)) != READSTAT_OK) { goto cleanup; } varinfo->width = value; if (varinfo->width == 0) { varinfo->type = READSTAT_TYPE_DOUBLE; } else { varinfo->type = READSTAT_TYPE_STRING; } if ((retval = read_string(ctx, varinfo->name, sizeof(varinfo->name))) != READSTAT_OK) { goto cleanup; } ck_str_hash_insert(varinfo->name, varinfo, ctx->var_dict); for (i=0; i POR_FORMAT_SHIFT) { // Some files in the wild have their format types shifted by 82 for date/time values // I have no idea why, but see test files linked from: // https://github.com/WizardMac/ReadStat/issues/158 format->type = value - POR_FORMAT_SHIFT; } else { format->type = value; } if ((retval = read_integer_in_range(ctx, 0, MAX_FORMAT_WIDTH, &value)) != READSTAT_OK) { goto cleanup; } format->width = value; if ((retval = read_integer_in_range(ctx, 0, MAX_FORMAT_DECIMALS, &value)) != READSTAT_OK) { goto cleanup; } format->decimal_places = value; } cleanup: return retval; } static readstat_error_t read_missing_value_record(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; spss_varinfo_t 
*varinfo = NULL; if (ctx->var_offset < 0 || ctx->var_offset >= ctx->var_count) { retval = READSTAT_ERROR_PARSE; goto cleanup; } varinfo = &ctx->varinfo[ctx->var_offset]; if (varinfo->type == READSTAT_TYPE_DOUBLE) { if ((retval = read_double(ctx, &varinfo->missing_double_values[varinfo->n_missing_values])) != READSTAT_OK) { goto cleanup; } } else { if ((retval = read_string(ctx, varinfo->missing_string_values[varinfo->n_missing_values], sizeof(varinfo->missing_string_values[varinfo->n_missing_values]))) != READSTAT_OK) { goto cleanup; } } if (varinfo->n_missing_values > 2) { retval = READSTAT_ERROR_PARSE; goto cleanup; } varinfo->n_missing_values++; cleanup: return retval; } static readstat_error_t read_missing_value_range_record(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; spss_varinfo_t *varinfo = NULL; if (ctx->var_offset < 0 || ctx->var_offset == ctx->var_count) { retval = READSTAT_ERROR_PARSE; goto cleanup; } varinfo = &ctx->varinfo[ctx->var_offset]; varinfo->missing_range = 1; varinfo->n_missing_values = 2; if (varinfo->type == READSTAT_TYPE_DOUBLE) { if ((retval = read_double(ctx, &varinfo->missing_double_values[0])) != READSTAT_OK) { goto cleanup; } if ((retval = read_double(ctx, &varinfo->missing_double_values[1])) != READSTAT_OK) { goto cleanup; } } else { if ((retval = read_string(ctx, varinfo->missing_string_values[0], sizeof(varinfo->missing_string_values[0]))) != READSTAT_OK) { goto cleanup; } if ((retval = read_string(ctx, varinfo->missing_string_values[1], sizeof(varinfo->missing_string_values[1]))) != READSTAT_OK) { goto cleanup; } } cleanup: return retval; } static readstat_error_t read_missing_value_lo_range_record(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; spss_varinfo_t *varinfo = NULL; if (ctx->var_offset < 0 || ctx->var_offset == ctx->var_count) { retval = READSTAT_ERROR_PARSE; goto cleanup; } varinfo = &ctx->varinfo[ctx->var_offset]; varinfo->missing_range = 1; varinfo->n_missing_values = 2; if (varinfo->type == 
READSTAT_TYPE_DOUBLE) { varinfo->missing_double_values[0] = -HUGE_VAL; if ((retval = read_double(ctx, &varinfo->missing_double_values[1])) != READSTAT_OK) { goto cleanup; } } else { varinfo->missing_string_values[0][0] = '\0'; if ((retval = read_string(ctx, varinfo->missing_string_values[1], sizeof(varinfo->missing_string_values[1]))) != READSTAT_OK) { goto cleanup; } } cleanup: return retval; } static readstat_error_t read_missing_value_hi_range_record(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; spss_varinfo_t *varinfo = NULL; if (ctx->var_offset < 0 || ctx->var_offset == ctx->var_count) { retval = READSTAT_ERROR_PARSE; goto cleanup; } varinfo = &ctx->varinfo[ctx->var_offset]; varinfo->missing_range = 1; varinfo->n_missing_values = 2; if (varinfo->type == READSTAT_TYPE_DOUBLE) { if ((retval = read_double(ctx, &varinfo->missing_double_values[0])) != READSTAT_OK) { goto cleanup; } varinfo->missing_double_values[1] = HUGE_VAL; } else { if ((retval = read_string(ctx, varinfo->missing_string_values[0], sizeof(varinfo->missing_string_values[0]))) != READSTAT_OK) { goto cleanup; } varinfo->missing_string_values[1][0] = '\0'; } cleanup: return retval; } static readstat_error_t read_document_record(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; char string[256]; int i; int line_count = 0; if ((retval = read_integer_in_range(ctx, 0, MAX_LINES, &line_count)) != READSTAT_OK) { goto cleanup; } for (i=0; ihandle.note) { if (ctx->handle.note(i, string, ctx->user_ctx) != READSTAT_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } } cleanup: return retval; } static readstat_error_t read_variable_label_record(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; char string[256]; spss_varinfo_t *varinfo = NULL; if (ctx->var_offset < 0 || ctx->var_offset == ctx->var_count) { retval = READSTAT_ERROR_PARSE; goto cleanup; } varinfo = &ctx->varinfo[ctx->var_offset]; if ((retval = read_string(ctx, string, sizeof(string))) != READSTAT_OK) { goto cleanup; 
} varinfo->label = realloc(varinfo->label, 4*strlen(string) + 1); retval = readstat_convert(varinfo->label, 4*strlen(string) + 1, string, strlen(string), ctx->converter); cleanup: return retval; } static readstat_error_t read_value_label_record(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; double dval; int i; char string[256]; int count = 0, label_count = 0; char label_name_buf[256]; char label_buf[256]; snprintf(label_name_buf, sizeof(label_name_buf), POR_LABEL_NAME_PREFIX "%d", ctx->labels_offset); readstat_type_t value_type = READSTAT_TYPE_DOUBLE; if ((retval = read_integer_in_range(ctx, 0, MAX_STRINGS, &count)) != READSTAT_OK) { goto cleanup; } for (i=0; ivar_dict); if (info) { value_type = info->type; info->labels_index = ctx->labels_offset; } } if ((retval = read_integer_in_range(ctx, 0, MAX_LABELS, &label_count)) != READSTAT_OK) { goto cleanup; } for (i=0; ihandle.value_label) { if (ctx->handle.value_label(label_name_buf, value, label_buf, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } } ctx->labels_offset++; cleanup: return retval; } static readstat_error_t read_por_file_data(por_ctx_t *ctx) { int i; char input_string[256]; char output_string[4*256+1]; char error_buf[1024]; readstat_error_t rs_retval = READSTAT_OK; if (ctx->var_count == 0) return READSTAT_OK; while (1) { int finished = 0; for (i=0; i<ctx->var_count; i++) { spss_varinfo_t *info = &ctx->varinfo[i]; readstat_value_t value = { .type = info->type }; if (info->type == READSTAT_TYPE_STRING) { rs_retval = maybe_read_string(ctx, input_string, sizeof(input_string), &finished); if (rs_retval != READSTAT_OK) { if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Error in %s (row=%d)", info->name, ctx->obs_count+1); ctx->handle.error(error_buf, ctx->user_ctx); } goto cleanup; } else if (finished) { if (i != 0) rs_retval = READSTAT_ERROR_PARSE; goto cleanup; } rs_retval = readstat_convert(output_string, sizeof(output_string), input_string, 
strlen(input_string), ctx->converter); if (rs_retval != READSTAT_OK) { goto cleanup; } value.v.string_value = output_string; } else if (info->type == READSTAT_TYPE_DOUBLE) { rs_retval = maybe_read_double(ctx, &value.v.double_value, &finished); if (rs_retval != READSTAT_OK) { if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Error in %s (row=%d)", info->name, ctx->obs_count+1); ctx->handle.error(error_buf, ctx->user_ctx); } goto cleanup; } else if (finished) { if (i != 0) rs_retval = READSTAT_ERROR_PARSE; goto cleanup; } value.is_system_missing = isnan(value.v.double_value); } if (ctx->handle.value && !ctx->variables[i]->skip && !ctx->row_offset) { if (ctx->handle.value(ctx->obs_count, ctx->variables[i], value, ctx->user_ctx) != READSTAT_HANDLER_OK) { rs_retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } } if (ctx->row_offset) { ctx->row_offset--; } else { ctx->obs_count++; } rs_retval = por_update_progress(ctx); if (rs_retval != READSTAT_OK) break; if (ctx->row_limit > 0 && ctx->obs_count == ctx->row_limit) break; } cleanup: return rs_retval; } readstat_error_t read_version_and_timestamp(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; char string[256]; struct tm timestamp = { .tm_isdst = -1 }; unsigned char version; if (read_bytes(ctx, &version, sizeof(version)) != sizeof(version)) { retval = READSTAT_ERROR_READ; goto cleanup; } if ((retval = read_string(ctx, string, sizeof(string))) != READSTAT_OK) { /* creation date */ goto cleanup; } if (sscanf(string, "%04d%02d%02d", &timestamp.tm_year, &timestamp.tm_mon, &timestamp.tm_mday) != 3) { retval = READSTAT_ERROR_BAD_TIMESTAMP_STRING; goto cleanup; } if ((retval = read_string(ctx, string, sizeof(string))) != READSTAT_OK) { /* creation time */ goto cleanup; } if (sscanf(string, "%02d%02d%02d", &timestamp.tm_hour, &timestamp.tm_min, &timestamp.tm_sec) != 3) { retval = READSTAT_ERROR_BAD_TIMESTAMP_STRING; goto cleanup; } timestamp.tm_year -= 1900; timestamp.tm_mon--; ctx->timestamp = mktime(&timestamp); ctx->version = 
ctx->byte2unicode[version] - 'A'; cleanup: return retval; } readstat_error_t handle_variables(por_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; int i; int index_after_skipping = 0; for (i=0; i<ctx->var_count; i++) { char label_name_buf[256]; spss_varinfo_t *info = &ctx->varinfo[i]; info->index = i; ctx->variables[i] = spss_init_variable_for_info(info, index_after_skipping, ctx->converter); snprintf(label_name_buf, sizeof(label_name_buf), POR_LABEL_NAME_PREFIX "%d", info->labels_index); int cb_retval = READSTAT_HANDLER_OK; if (ctx->handle.variable) { cb_retval = ctx->handle.variable(i, ctx->variables[i], info->labels_index == -1 ? NULL : label_name_buf, ctx->user_ctx); } if (cb_retval == READSTAT_HANDLER_ABORT) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } if (cb_retval == READSTAT_HANDLER_SKIP_VARIABLE) { ctx->variables[i]->skip = 1; } else { index_after_skipping++; } } if (ctx->handle.fweight && ctx->fweight_name[0]) { for (i=0; i<ctx->var_count; i++) { spss_varinfo_t *info = &ctx->varinfo[i]; if (strcmp(info->name, ctx->fweight_name) == 0) { if (ctx->handle.fweight(ctx->variables[i], ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } break; } } } cleanup: return retval; } readstat_error_t readstat_parse_por(readstat_parser_t *parser, const char *path, void *user_ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = parser->io; unsigned char reverse_lookup[256]; char vanity[5][40]; char error_buf[1024]; por_ctx_t *ctx = por_ctx_init(); ctx->handle = parser->handlers; ctx->user_ctx = user_ctx; ctx->io = io; ctx->row_limit = parser->row_limit; if (parser->row_offset > 0) ctx->row_offset = parser->row_offset; if (parser->output_encoding) { if (strcmp(parser->output_encoding, "UTF-8") != 0) ctx->converter = iconv_open(parser->output_encoding, "UTF-8"); if (ctx->converter == (iconv_t)-1) { ctx->converter = NULL; retval = READSTAT_ERROR_UNSUPPORTED_CHARSET; goto cleanup; } } if (io->open(path, io->io_ctx) == -1) { 
retval = READSTAT_ERROR_OPEN; goto cleanup; } if ((ctx->file_size = io->seek(0, READSTAT_SEEK_END, io->io_ctx)) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->seek(0, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (read_bytes(ctx, vanity, sizeof(vanity)) != sizeof(vanity)) { retval = READSTAT_ERROR_READ; goto cleanup; } retval = readstat_convert(ctx->file_label, sizeof(ctx->file_label), vanity[1] + 20, 20, NULL); if (retval != READSTAT_OK) goto cleanup; if (read_bytes(ctx, reverse_lookup, sizeof(reverse_lookup)) != sizeof(reverse_lookup)) { retval = READSTAT_ERROR_READ; goto cleanup; } ctx->space = reverse_lookup[126]; int i; for (i=0; i<256; i++) { if (por_ascii_lookup[i]) { ctx->byte2unicode[reverse_lookup[i]] = por_ascii_lookup[i]; } else if (por_unicode_lookup[i]) { ctx->byte2unicode[reverse_lookup[i]] = por_unicode_lookup[i]; } } ctx->byte2unicode[reverse_lookup[64]] = por_unicode_lookup[64]; unsigned char check[8]; char tr_check[8]; if (read_bytes(ctx, check, sizeof(check)) != sizeof(check)) { retval = READSTAT_ERROR_READ; goto cleanup; } ssize_t encoded_len; if ((encoded_len = por_utf8_encode(check, sizeof(check), tr_check, sizeof(tr_check), ctx->byte2unicode)) == -1) { if (ctx->handle.error) { snprintf(error_buf, sizeof(error_buf), "Error converting check string: %.*s", (int)sizeof(check), check); ctx->handle.error(error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_CONVERT; goto cleanup; } if (strncmp("SPSSPORT", tr_check, encoded_len) != 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } ctx->var_offset = -1; char string[256]; retval = read_version_and_timestamp(ctx); if (retval != READSTAT_OK) goto cleanup; while (1) { uint16_t tr_tag = read_tag(ctx); switch (tr_tag) { case '1': /* product ID */ case '2': /* author ID */ case '3': /* sub-product ID */ retval = read_string(ctx, string, sizeof(string)); break; case '4': /* variable count */ retval = read_variable_count_record(ctx); break; case 
'5': /* precision */ retval = read_precision_record(ctx); break; case '6': /* case weight */ retval = read_case_weight_record(ctx); break; case '7': /* variable */ retval = read_variable_record(ctx); break; case '8': /* missing value */ retval = read_missing_value_record(ctx); break; case 'B': /* missing value range */ retval = read_missing_value_range_record(ctx); break; case '9': /* LO THRU x */ retval = read_missing_value_lo_range_record(ctx); break; case 'A': /* x THRU HI */ retval = read_missing_value_hi_range_record(ctx); break; case 'C': /* variable label */ retval = read_variable_label_record(ctx); break; case 'D': /* value label */ retval = read_value_label_record(ctx); break; case 'E': /* document record */ retval = read_document_record(ctx); break; case 'F': /* file data */ if (ctx->var_offset != ctx->var_count - 1) { retval = READSTAT_ERROR_COLUMN_COUNT_MISMATCH; goto cleanup; } retval = handle_variables(ctx); if (retval != READSTAT_OK) goto cleanup; if (ctx->handle.value) { retval = read_por_file_data(ctx); } goto cleanup; default: retval = READSTAT_ERROR_PARSE; goto cleanup; } if (retval != READSTAT_OK) break; } cleanup: io->close(io->io_ctx); por_ctx_free(ctx); return retval; } haven/src/readstat/spss/readstat_zsav_write.c0000644000176200001440000001134214101007206021161 0ustar liggesusers#include #include #include #include "../readstat.h" #include "../readstat_writer.h" #include "readstat_sav_compress.h" #include "readstat_zsav_compress.h" #include "readstat_zsav_write.h" readstat_error_t zsav_write_compressed_row(void *writer_ctx, void *row, size_t len) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; zsav_ctx_t *zctx = writer->module_ctx; /* Kind of frustrating that SPSS does double compression. If they just * z-compressed the uncompressed data, we could calculate the block count * in advance and write out the file in a streaming manner. As things stand * we have to build up the file in memory until we know the final block * count. 
A possible streaming solution would be to declare the number of * blocks required to hold the maximum possible row-compressed size and * then fill out the end with no-op zero bytes (that get z-compressed very * small). */ size_t row_len = sav_compress_row(zctx->buffer, row, len, writer); int deflate_status = zsav_compress_row(zctx->buffer, row_len, writer->current_row + 1 == writer->row_count, zctx); if (deflate_status != Z_OK && deflate_status != Z_STREAM_END) return READSTAT_ERROR_WRITE; return READSTAT_OK; } static readstat_error_t zsav_write_data_header(readstat_writer_t *writer, zsav_ctx_t *zctx) { readstat_error_t retval = READSTAT_OK; uint64_t zheader_ofs = zctx->zheader_ofs; uint64_t ztrailer_ofs = zheader_ofs + 24; uint64_t ztrailer_len = 24 + zctx->blocks_count * 24; int i; for (i=0; i<zctx->blocks_count; i++) { zsav_block_t *block = zctx->blocks[i]; ztrailer_ofs += block->compressed_size; } if ((retval = readstat_write_bytes(writer, &zheader_ofs, sizeof(uint64_t))) != READSTAT_OK) goto cleanup; if ((retval = readstat_write_bytes(writer, &ztrailer_ofs, sizeof(uint64_t))) != READSTAT_OK) goto cleanup; if ((retval = readstat_write_bytes(writer, &ztrailer_len, sizeof(uint64_t))) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t zsav_write_data_blocks(readstat_writer_t *writer, zsav_ctx_t *zctx) { readstat_error_t retval = READSTAT_OK; int i; for (i=0; i<zctx->blocks_count; i++) { zsav_block_t *block = zctx->blocks[i]; if ((retval = readstat_write_bytes(writer, block->compressed_data, block->compressed_size)) != READSTAT_OK) goto cleanup; } cleanup: return retval; } static readstat_error_t zsav_write_data_trailer(readstat_writer_t *writer, zsav_ctx_t *zctx) { readstat_error_t retval = READSTAT_OK; int64_t bias = -100; int64_t zero = 0; int32_t block_size = zctx->uncompressed_block_size; int32_t n_blocks = zctx->blocks_count; if ((retval = readstat_write_bytes(writer, &bias, sizeof(int64_t))) != READSTAT_OK) goto cleanup; if ((retval = 
readstat_write_bytes(writer, &zero, sizeof(int64_t))) != READSTAT_OK) goto cleanup; if ((retval = readstat_write_bytes(writer, &block_size, sizeof(int32_t))) != READSTAT_OK) goto cleanup; if ((retval = readstat_write_bytes(writer, &n_blocks, sizeof(int32_t))) != READSTAT_OK) goto cleanup; int i; int64_t uncompressed_ofs = zctx->zheader_ofs; int64_t compressed_ofs = zctx->zheader_ofs + 24; for (i=0; i<zctx->blocks_count; i++) { zsav_block_t *block = zctx->blocks[i]; int32_t uncompressed_size = block->uncompressed_size; int32_t compressed_size = block->compressed_size; if ((retval = readstat_write_bytes(writer, &uncompressed_ofs, sizeof(int64_t))) != READSTAT_OK) goto cleanup; if ((retval = readstat_write_bytes(writer, &compressed_ofs, sizeof(int64_t))) != READSTAT_OK) goto cleanup; if ((retval = readstat_write_bytes(writer, &uncompressed_size, sizeof(int32_t))) != READSTAT_OK) goto cleanup; if ((retval = readstat_write_bytes(writer, &compressed_size, sizeof(int32_t))) != READSTAT_OK) goto cleanup; uncompressed_ofs += uncompressed_size; compressed_ofs += compressed_size; } cleanup: return retval; } readstat_error_t zsav_end_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; zsav_ctx_t *zctx = writer->module_ctx; readstat_error_t retval = READSTAT_OK; retval = zsav_write_data_header(writer, zctx); if (retval != READSTAT_OK) goto cleanup; retval = zsav_write_data_blocks(writer, zctx); if (retval != READSTAT_OK) goto cleanup; retval = zsav_write_data_trailer(writer, zctx); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } haven/src/readstat/spss/readstat_sav_parse_timestamp.c0000644000176200001440000003655514101765776023076 0ustar liggesusers#line 1 "src/spss/readstat_sav_parse_timestamp.rl" #include #include "../readstat.h" #include "../readstat_iconv.h" #include "readstat_sav.h" #include "readstat_sav_parse_timestamp.h" #line 12 "src/spss/readstat_sav_parse_timestamp.c" static const signed char _sav_time_parse_actions[] = { 
0, 1, 0, 1, 2, 1, 3, 1, 4, 1, 5, 2, 1, 0, 0 }; static const signed char _sav_time_parse_key_offsets[] = { 0, 0, 3, 5, 6, 9, 11, 12, 15, 17, 19, 21, 23, 0 }; static const char _sav_time_parse_trans_keys[] = { 32, 48, 57, 48, 57, 58, 32, 48, 57, 48, 57, 58, 32, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 0 }; static const signed char _sav_time_parse_single_lengths[] = { 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0 }; static const signed char _sav_time_parse_range_lengths[] = { 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0 }; static const signed char _sav_time_parse_index_offsets[] = { 0, 0, 3, 5, 7, 10, 12, 14, 17, 19, 21, 23, 25, 0 }; static const signed char _sav_time_parse_cond_targs[] = { 2, 11, 0, 3, 0, 4, 0, 5, 10, 0, 6, 0, 7, 0, 8, 9, 0, 12, 0, 12, 0, 6, 0, 3, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 0 }; static const signed char _sav_time_parse_cond_actions[] = { 0, 3, 0, 11, 0, 5, 0, 0, 3, 0, 11, 0, 7, 0, 0, 3, 0, 11, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0 }; static const signed char _sav_time_parse_eof_trans[] = { 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 0 }; static const int sav_time_parse_start = 1; static const int sav_time_parse_en_main = 1; #line 12 "src/spss/readstat_sav_parse_timestamp.rl" readstat_error_t sav_parse_time(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_cb, void *user_ctx) { readstat_error_t retval = READSTAT_OK; char error_buf[8192]; const char *p = data; const char *pe = p + len; const char *eof = pe; int cs; int temp_val = 0; #line 83 "src/spss/readstat_sav_parse_timestamp.c" { cs = (int)sav_time_parse_start; } #line 88 "src/spss/readstat_sav_parse_timestamp.c" { int _klen; unsigned int _trans = 0; const char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe && p != eof ) goto _out; if ( p == eof ) { if ( _sav_time_parse_eof_trans[cs] > 0 ) { _trans = (unsigned int)_sav_time_parse_eof_trans[cs] - 1; } } else { _keys = ( 
_sav_time_parse_trans_keys + (_sav_time_parse_key_offsets[cs])); _trans = (unsigned int)_sav_time_parse_index_offsets[cs]; _klen = (int)_sav_time_parse_single_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + _klen - 1; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_sav_time_parse_range_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + (_klen<<1) - 2; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} } cs = (int)_sav_time_parse_cond_targs[_trans]; if ( _sav_time_parse_cond_actions[_trans] != 0 ) { _acts = ( _sav_time_parse_actions + (_sav_time_parse_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 24 "src/spss/readstat_sav_parse_timestamp.rl" temp_val = 10 * temp_val + ((( (*( p)))) - '0'); } #line 173 "src/spss/readstat_sav_parse_timestamp.c" break; } case 1: { { #line 28 "src/spss/readstat_sav_parse_timestamp.rl" temp_val = 0; } #line 182 "src/spss/readstat_sav_parse_timestamp.c" break; } case 2: { { #line 28 "src/spss/readstat_sav_parse_timestamp.rl" temp_val = (( (*( p)))) - '0'; } #line 191 "src/spss/readstat_sav_parse_timestamp.c" break; } case 3: { { #line 30 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_hour = temp_val; } #line 200 "src/spss/readstat_sav_parse_timestamp.c" break; } case 4: { { #line 32 
"src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_min = temp_val; } #line 209 "src/spss/readstat_sav_parse_timestamp.c" break; } case 5: { { #line 34 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_sec = temp_val; } #line 218 "src/spss/readstat_sav_parse_timestamp.c" break; } } _nacts -= 1; _acts += 1; } } if ( p == eof ) { if ( cs >= 12 ) goto _out; } else { if ( cs != 0 ) { p += 1; goto _resume; } } _out: {} } #line 40 "src/spss/readstat_sav_parse_timestamp.rl" if (cs < #line 246 "src/spss/readstat_sav_parse_timestamp.c" 12 #line 42 "src/spss/readstat_sav_parse_timestamp.rl" || p != pe) { if (error_cb) { snprintf(error_buf, sizeof(error_buf), "Invalid time string (length=%d): %.*s", (int)len, (int)len, data); error_cb(error_buf, user_ctx); } retval = READSTAT_ERROR_BAD_TIMESTAMP_STRING; } (void)sav_time_parse_en_main; return retval; } #line 264 "src/spss/readstat_sav_parse_timestamp.c" static const signed char _sav_date_parse_actions[] = { 0, 1, 0, 1, 1, 1, 3, 1, 4, 1, 5, 1, 6, 1, 7, 1, 8, 1, 9, 1, 10, 1, 11, 1, 12, 1, 13, 1, 14, 1, 15, 2, 2, 0, 0 }; static const signed char _sav_date_parse_key_offsets[] = { 0, 0, 3, 6, 8, 16, 20, 21, 23, 26, 29, 30, 32, 33, 34, 36, 37, 39, 40, 42, 43, 45, 46, 50, 51, 53, 55, 57, 59, 60, 62, 64, 66, 68, 70, 72, 74, 75, 77, 78, 80, 81, 83, 84, 86, 87, 89, 90, 0 }; static const char _sav_date_parse_trans_keys[] = { 32, 48, 57, 32, 48, 57, 32, 45, 65, 68, 70, 74, 77, 78, 79, 83, 80, 85, 112, 117, 82, 32, 45, 32, 48, 57, 32, 48, 57, 71, 32, 45, 114, 103, 69, 101, 67, 32, 45, 99, 69, 101, 66, 32, 45, 98, 65, 85, 97, 117, 78, 32, 45, 76, 78, 32, 45, 32, 45, 110, 108, 110, 65, 97, 82, 89, 32, 45, 32, 45, 114, 121, 79, 111, 86, 32, 45, 118, 67, 99, 84, 32, 45, 116, 69, 101, 80, 32, 45, 112, 0 }; static const signed char _sav_date_parse_single_lengths[] = { 0, 1, 1, 2, 8, 4, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 4, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 0, 0 }; static const signed 
char _sav_date_parse_range_lengths[] = { 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; static const short _sav_date_parse_index_offsets[] = { 0, 0, 3, 6, 9, 18, 23, 25, 28, 31, 34, 36, 39, 41, 43, 46, 48, 51, 53, 56, 58, 61, 63, 68, 70, 73, 76, 79, 82, 84, 87, 90, 93, 96, 99, 102, 105, 107, 110, 112, 115, 117, 120, 122, 125, 127, 130, 132, 0 }; static const signed char _sav_date_parse_cond_targs[] = { 2, 2, 0, 3, 3, 0, 4, 4, 0, 5, 14, 18, 22, 30, 35, 39, 43, 0, 6, 10, 12, 13, 0, 7, 0, 8, 8, 0, 9, 9, 0, 47, 47, 0, 11, 0, 8, 8, 0, 7, 0, 11, 0, 15, 17, 0, 16, 0, 8, 8, 0, 16, 0, 19, 21, 0, 20, 0, 8, 8, 0, 20, 0, 23, 25, 28, 29, 0, 24, 0, 8, 8, 0, 26, 27, 0, 8, 8, 0, 8, 8, 0, 24, 0, 26, 27, 0, 31, 34, 0, 32, 33, 0, 8, 8, 0, 8, 8, 0, 32, 33, 0, 36, 38, 0, 37, 0, 8, 8, 0, 37, 0, 40, 42, 0, 41, 0, 8, 8, 0, 41, 0, 44, 46, 0, 45, 0, 8, 8, 0, 45, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 0 }; static const signed char _sav_date_parse_cond_actions[] = { 31, 31, 0, 1, 1, 0, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 13, 0, 31, 31, 0, 1, 1, 0, 0, 0, 21, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 29, 29, 0, 0, 0, 0, 0, 0, 0, 0, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 0, 0, 0, 0, 19, 19, 0, 17, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 0, 15, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 27, 0, 0, 0, 0, 0, 0, 0, 0, 25, 25, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0 }; static const short _sav_date_parse_eof_trans[] = { 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 
168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 0 }; static const int sav_date_parse_start = 1; static const int sav_date_parse_en_main = 1; #line 59 "src/spss/readstat_sav_parse_timestamp.rl" readstat_error_t sav_parse_date(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_cb, void *user_ctx) { readstat_error_t retval = READSTAT_OK; char error_buf[8192]; const char *p = data; const char *pe = p + len; const char *eof = pe; int cs; int temp_val = 0; #line 408 "src/spss/readstat_sav_parse_timestamp.c" { cs = (int)sav_date_parse_start; } #line 413 "src/spss/readstat_sav_parse_timestamp.c" { int _klen; unsigned int _trans = 0; const char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe && p != eof ) goto _out; if ( p == eof ) { if ( _sav_date_parse_eof_trans[cs] > 0 ) { _trans = (unsigned int)_sav_date_parse_eof_trans[cs] - 1; } } else { _keys = ( _sav_date_parse_trans_keys + (_sav_date_parse_key_offsets[cs])); _trans = (unsigned int)_sav_date_parse_index_offsets[cs]; _klen = (int)_sav_date_parse_single_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + _klen - 1; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_sav_date_parse_range_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + (_klen<<1) - 2; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} } cs 
= (int)_sav_date_parse_cond_targs[_trans]; if ( _sav_date_parse_cond_actions[_trans] != 0 ) { _acts = ( _sav_date_parse_actions + (_sav_date_parse_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 71 "src/spss/readstat_sav_parse_timestamp.rl" char digit = ((( (*( p)))) - '0'); if (digit >= 0 && digit <= 9) { temp_val = 10 * temp_val + digit; } } #line 501 "src/spss/readstat_sav_parse_timestamp.c" break; } case 1: { { #line 78 "src/spss/readstat_sav_parse_timestamp.rl" if (temp_val < 70) { timestamp->tm_year = 100 + temp_val; } else { timestamp->tm_year = temp_val; } } #line 516 "src/spss/readstat_sav_parse_timestamp.c" break; } case 2: { { #line 87 "src/spss/readstat_sav_parse_timestamp.rl" temp_val = 0; } #line 525 "src/spss/readstat_sav_parse_timestamp.c" break; } case 3: { { #line 89 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mday = temp_val; } #line 534 "src/spss/readstat_sav_parse_timestamp.c" break; } case 4: { { #line 94 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 0; } #line 543 "src/spss/readstat_sav_parse_timestamp.c" break; } case 5: { { #line 95 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 1; } #line 552 "src/spss/readstat_sav_parse_timestamp.c" break; } case 6: { { #line 96 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 2; } #line 561 "src/spss/readstat_sav_parse_timestamp.c" break; } case 7: { { #line 97 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 3; } #line 570 "src/spss/readstat_sav_parse_timestamp.c" break; } case 8: { { #line 98 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 4; } #line 579 "src/spss/readstat_sav_parse_timestamp.c" break; } case 9: { { #line 99 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 5; } #line 588 "src/spss/readstat_sav_parse_timestamp.c" break; } case 10: { { #line 100 "src/spss/readstat_sav_parse_timestamp.rl" 
timestamp->tm_mon = 6; } #line 597 "src/spss/readstat_sav_parse_timestamp.c" break; } case 11: { { #line 101 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 7; } #line 606 "src/spss/readstat_sav_parse_timestamp.c" break; } case 12: { { #line 102 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 8; } #line 615 "src/spss/readstat_sav_parse_timestamp.c" break; } case 13: { { #line 103 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 9; } #line 624 "src/spss/readstat_sav_parse_timestamp.c" break; } case 14: { { #line 104 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 10; } #line 633 "src/spss/readstat_sav_parse_timestamp.c" break; } case 15: { { #line 105 "src/spss/readstat_sav_parse_timestamp.rl" timestamp->tm_mon = 11; } #line 642 "src/spss/readstat_sav_parse_timestamp.c" break; } } _nacts -= 1; _acts += 1; } } if ( p == eof ) { if ( cs >= 47 ) goto _out; } else { if ( cs != 0 ) { p += 1; goto _resume; } } _out: {} } #line 112 "src/spss/readstat_sav_parse_timestamp.rl" if (cs < #line 670 "src/spss/readstat_sav_parse_timestamp.c" 47 #line 114 "src/spss/readstat_sav_parse_timestamp.rl" || p != pe) { if (error_cb) { snprintf(error_buf, sizeof(error_buf), "Invalid date string (length=%d): %.*s", (int)len, (int)len, data); error_cb(error_buf, user_ctx); } retval = READSTAT_ERROR_BAD_TIMESTAMP_STRING; } (void)sav_date_parse_en_main; return retval; } haven/src/readstat/spss/readstat_por_parse.h0000644000176200001440000000025314101007206020762 0ustar liggesusers// // readstat_por_parse.h // ssize_t readstat_por_parse_double(const char *data, size_t len, double *result, readstat_error_handler error_cb, void *user_ctx); haven/src/readstat/spss/readstat_sav_compress.h0000644000176200001440000000141014101007206021470 0ustar liggesusersenum sav_row_stream_status { SAV_ROW_STREAM_NEED_DATA, SAV_ROW_STREAM_HAVE_DATA, SAV_ROW_STREAM_FINISHED_ROW, SAV_ROW_STREAM_FINISHED_ALL }; struct sav_row_stream_s { const unsigned char 
*next_in; size_t avail_in; unsigned char *next_out; size_t avail_out; uint64_t missing_value; double bias; unsigned char chunk[8]; int i; int bswap; enum sav_row_stream_status status; }; size_t sav_compressed_row_bound(size_t uncompressed_length); size_t sav_compress_row(void *output_row, void *input_row, size_t input_len, readstat_writer_t *writer); void sav_decompress_row(struct sav_row_stream_s *state); haven/src/readstat/spss/readstat_por_write.c0000644000176200001440000006461114101007206021005 0ustar liggesusers #include #include #include #include #include #include "../readstat.h" #include "../CKHashTable.h" #include "../readstat_writer.h" #include "readstat_spss.h" #include "readstat_por.h" #define POR_BASE30_PRECISION 50 typedef struct por_write_ctx_s { unsigned char *unicode2byte; size_t unicode2byte_len; } por_write_ctx_t; static inline char por_encode_base30_digit(uint64_t digit) { if (digit < 10) return '0' + digit; return 'A' + (digit - 10); } static int por_write_base30_integer(char *string, size_t string_len, uint64_t integer) { int start = 0; int end = 0; int offset = 0; while (integer) { string[offset++] = por_encode_base30_digit(integer % 30); integer /= 30; } end = offset; offset--; while (offset > start) { char tmp = string[start]; string[start] = string[offset]; string[offset] = tmp; offset--; start++; } return end; } static readstat_error_t por_finish(readstat_writer_t *writer) { return readstat_write_line_padding(writer, 'Z', 80, "\r\n"); } static readstat_error_t por_write_bytes(readstat_writer_t *writer, const void *bytes, size_t len) { return readstat_write_bytes_as_lines(writer, bytes, len, 80, "\r\n"); } static readstat_error_t por_write_string_n(readstat_writer_t *writer, por_write_ctx_t *ctx, const char *string, size_t input_len) { char error_buf[1024]; readstat_error_t retval = READSTAT_OK; char *por_string = malloc(input_len); ssize_t output_len = por_utf8_decode(string, input_len, por_string, input_len, ctx->unicode2byte, 
ctx->unicode2byte_len); if (output_len == -1) { if (writer->error_handler) { snprintf(error_buf, sizeof(error_buf), "Error converting string (length=%" PRId64 "): %.*s", (int64_t)input_len, (int)input_len, string); writer->error_handler(error_buf, writer->user_ctx); } retval = READSTAT_ERROR_CONVERT; goto cleanup; } retval = por_write_bytes(writer, por_string, output_len); cleanup: if (por_string) free(por_string); return retval; } static readstat_error_t por_write_tag(readstat_writer_t *writer, por_write_ctx_t *ctx, char tag) { char string[2]; string[0] = tag; string[1] = '\0'; return por_write_string_n(writer, ctx, string, 1); } static ssize_t por_write_double_to_buffer(char *string, size_t buffer_len, double value, long precision) { int offset = 0; if (isnan(value)) { string[offset++] = '*'; string[offset++] = '.'; } else if (isinf(value)) { if (value < 0.0) { string[offset++] = '-'; } string[offset++] = '1'; string[offset++] = '+'; string[offset++] = 'T'; string[offset++] = 'T'; string[offset++] = '/'; } else { long integers_printed = 0; double integer_part; double fraction = modf(fabs(value), &integer_part); int64_t integer = integer_part; int64_t exponent = 0; if (value < 0.0) { string[offset++] = '-'; } if (integer == 0) { string[offset++] = '0'; } else { while (fraction == 0 && integer != 0 && (integer % 30) == 0) { integer /= 30; exponent++; } integers_printed = por_write_base30_integer(&string[offset], buffer_len - offset, integer); offset += integers_printed; } /* should use exponents for efficiency, but this works */ if (fraction) { string[offset++] = '.'; } while (fraction && integers_printed < precision) { fraction = modf(fraction * 30, &integer_part); integer = integer_part; if (integer < 0) { return -1; } else { string[offset++] = por_encode_base30_digit(integer); } integers_printed++; } if (exponent) { string[offset++] = '+'; offset += por_write_base30_integer(&string[offset], buffer_len - offset, exponent); } string[offset++] = '/'; } 
string[offset] = '\0'; return offset; } static readstat_error_t por_write_double(readstat_writer_t *writer, por_write_ctx_t *ctx, double value) { char error_buf[1024]; char string[256]; ssize_t bytes_written = por_write_double_to_buffer(string, sizeof(string), value, POR_BASE30_PRECISION); if (bytes_written == -1) { if (writer->error_handler) { snprintf(error_buf, sizeof(error_buf), "Unable to encode number: %lf", value); writer->error_handler(error_buf, writer->user_ctx); } return READSTAT_ERROR_WRITE; } return por_write_string_n(writer, ctx, string, bytes_written); } static readstat_error_t por_write_string_field_n(readstat_writer_t *writer, por_write_ctx_t *ctx, const char *string, size_t len) { readstat_error_t error = por_write_double(writer, ctx, len); if (error != READSTAT_OK) return error; return por_write_string_n(writer, ctx, string, len); } static readstat_error_t por_write_string_field(readstat_writer_t *writer, por_write_ctx_t *ctx, const char *string) { return por_write_string_field_n(writer, ctx, string, strlen(string)); } static por_write_ctx_t *por_write_ctx_init() { por_write_ctx_t *ctx = calloc(1, sizeof(por_write_ctx_t)); uint16_t max_unicode = 0; int i; for (i=0; i<sizeof(por_unicode_lookup)/sizeof(por_unicode_lookup[0]); i++) { if (por_unicode_lookup[i] > max_unicode) max_unicode = por_unicode_lookup[i]; } ctx->unicode2byte = malloc(max_unicode+1); ctx->unicode2byte_len = max_unicode+1; for (i=0; i<sizeof(por_unicode_lookup)/sizeof(por_unicode_lookup[0]); i++) { if (por_unicode_lookup[i]) { ctx->unicode2byte[por_unicode_lookup[i]] = por_ascii_lookup[i]; } if (por_ascii_lookup[i]) { ctx->unicode2byte[por_ascii_lookup[i]] = por_ascii_lookup[i]; } } return ctx; } static void por_write_ctx_free(por_write_ctx_t *ctx) { if (ctx->unicode2byte) free(ctx->unicode2byte); free(ctx); } static readstat_error_t por_emit_header(readstat_writer_t *writer, por_write_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; size_t file_label_len = strlen(writer->file_label); char vanity[5][40]; memset(vanity, '0', sizeof(vanity)); memcpy(vanity[1], "ASCII SPSS PORT FILE", 20); strncpy(vanity[1] + 20, writer->file_label, 20); if (file_label_len < 20) 
memset(vanity[1] + 20 + file_label_len, ' ', 20 - file_label_len); por_write_bytes(writer, vanity, sizeof(vanity)); char lookup[256]; int i; memset(lookup, '0', sizeof(lookup)); for (i=0; itimestamp); if (!timestamp) { retval = READSTAT_ERROR_BAD_TIMESTAMP_VALUE; goto cleanup; } if ((retval = por_write_tag(writer, ctx, 'A')) != READSTAT_OK) goto cleanup; char date[9]; snprintf(date, sizeof(date), "%04d%02d%02d", (unsigned int)(timestamp->tm_year + 1900) % 10000, (unsigned int)(timestamp->tm_mon + 1) % 100, (unsigned int)(timestamp->tm_mday) % 100); if ((retval = por_write_string_field(writer, ctx, date)) != READSTAT_OK) goto cleanup; char time[7]; snprintf(time, sizeof(time), "%02d%02d%02d", (unsigned int)timestamp->tm_hour % 100, (unsigned int)timestamp->tm_min % 100, (unsigned int)timestamp->tm_sec % 100); if ((retval = por_write_string_field(writer, ctx, time)) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t por_emit_identification_records(readstat_writer_t *writer, por_write_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; if ((retval = por_write_tag(writer, ctx, '1')) != READSTAT_OK) goto cleanup; if ((retval = por_write_string_field(writer, ctx, READSTAT_PRODUCT_NAME)) != READSTAT_OK) goto cleanup; if ((retval = por_write_tag(writer, ctx, '3')) != READSTAT_OK) goto cleanup; if ((retval = por_write_string_field(writer, ctx, READSTAT_PRODUCT_URL)) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t por_emit_variable_count_record(readstat_writer_t *writer, por_write_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; if ((retval = por_write_tag(writer, ctx, '4')) != READSTAT_OK) goto cleanup; if ((retval = por_write_double(writer, ctx, writer->variables_count)) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t por_emit_precision_record(readstat_writer_t *writer, por_write_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; if ((retval = por_write_tag(writer, ctx, 
'5')) != READSTAT_OK) goto cleanup; if ((retval = por_write_double(writer, ctx, POR_BASE30_PRECISION)) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t por_emit_case_weight_variable_record(readstat_writer_t *writer, por_write_ctx_t *ctx) { if (!writer->fweight_variable) return READSTAT_OK; readstat_error_t retval = READSTAT_OK; if ((retval = por_write_tag(writer, ctx, '6')) != READSTAT_OK) goto cleanup; if ((retval = por_write_string_field(writer, ctx, readstat_variable_get_name(writer->fweight_variable))) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t por_emit_format(readstat_writer_t *writer, por_write_ctx_t *ctx, spss_format_t *format) { readstat_error_t error = READSTAT_OK; if ((error = por_write_double(writer, ctx, format->type)) != READSTAT_OK) goto cleanup; if ((error = por_write_double(writer, ctx, format->width)) != READSTAT_OK) goto cleanup; if ((error = por_write_double(writer, ctx, format->decimal_places)) != READSTAT_OK) goto cleanup; cleanup: return error; } static readstat_error_t validate_variable_name(const char *name) { size_t len = strlen(name); if (len < 1 || len > 8) return READSTAT_ERROR_NAME_IS_TOO_LONG; int i; for (i=0; name[i]; i++) { if (name[i] >= 'A' && name[i] <= 'Z') continue; if (name[i] >= '0' && name[i] <= '9') continue; if (name[i] == '@' || name[i] == '#' || name[i] == '$') continue; if (name[i] == '_' || name[i] == '.') continue; return READSTAT_ERROR_NAME_CONTAINS_ILLEGAL_CHARACTER; } if (!(name[0] >= 'A' && name[0] <= 'Z') && name[0] != '@') return READSTAT_ERROR_NAME_BEGINS_WITH_ILLEGAL_CHARACTER; return READSTAT_OK; } static readstat_error_t por_emit_variable_label_record(readstat_writer_t *writer, por_write_ctx_t *ctx, readstat_variable_t *r_variable) { const char *label = readstat_variable_get_label(r_variable); readstat_error_t retval = READSTAT_OK; if (!label) return READSTAT_OK; if ((retval = por_write_tag(writer, ctx, 'C')) != READSTAT_OK) goto cleanup; if 
((retval = por_write_string_field(writer, ctx, label)) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t por_emit_missing_string_values_records(readstat_writer_t *writer, por_write_ctx_t *ctx, readstat_variable_t *r_variable) { readstat_error_t retval = READSTAT_OK; int n_missing_values = 0; int n_missing_ranges = readstat_variable_get_missing_ranges_count(r_variable); /* ranges */ int j; for (j=0; j 3) retval = READSTAT_ERROR_TOO_MANY_MISSING_VALUE_DEFINITIONS; cleanup: return retval; } static readstat_error_t por_emit_missing_double_values_records(readstat_writer_t *writer, por_write_ctx_t *ctx, readstat_variable_t *r_variable) { readstat_error_t retval = READSTAT_OK; int n_missing_values = 0; int n_missing_ranges = readstat_variable_get_missing_ranges_count(r_variable); /* ranges */ int j; for (j=0; j 3) retval = READSTAT_ERROR_TOO_MANY_MISSING_VALUE_DEFINITIONS; cleanup: return retval; } static readstat_error_t por_emit_missing_values_records(readstat_writer_t *writer, por_write_ctx_t *ctx, readstat_variable_t *r_variable) { if (r_variable->type == READSTAT_TYPE_DOUBLE) { return por_emit_missing_double_values_records(writer, ctx, r_variable); } return por_emit_missing_string_values_records(writer, ctx, r_variable); } static readstat_error_t por_emit_variable_records(readstat_writer_t *writer, por_write_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; int i; for (i=0; ivariables_count; i++) { readstat_variable_t *r_variable = readstat_get_variable(writer, i); const char *variable_name = readstat_variable_get_name(r_variable); spss_format_t print_format; if ((retval = por_write_tag(writer, ctx, '7')) != READSTAT_OK) break; retval = por_write_double(writer, ctx, (r_variable->type == READSTAT_TYPE_STRING) ? 
r_variable->user_width : 0); if (retval != READSTAT_OK) break; if ((retval = por_write_string_field(writer, ctx, variable_name)) != READSTAT_OK) break; if ((retval = spss_format_for_variable(r_variable, &print_format)) != READSTAT_OK) break; if ((retval = por_emit_format(writer, ctx, &print_format)) != READSTAT_OK) break; if ((retval = por_emit_format(writer, ctx, &print_format)) != READSTAT_OK) break; if ((retval = por_emit_missing_values_records(writer, ctx, r_variable)) != READSTAT_OK) break; if ((retval = por_emit_variable_label_record(writer, ctx, r_variable)) != READSTAT_OK) break; } return retval; } static readstat_error_t por_emit_value_label_records(readstat_writer_t *writer, por_write_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; int i, j; for (i=0; i<writer->label_sets_count; i++) { readstat_label_set_t *r_label_set = readstat_get_label_set(writer, i); readstat_type_t user_type = r_label_set->type; if (r_label_set->value_labels_count == 0 || r_label_set->variables_count == 0) continue; if ((retval = por_write_tag(writer, ctx, 'D')) != READSTAT_OK) goto cleanup; if ((retval = por_write_double(writer, ctx, r_label_set->variables_count)) != READSTAT_OK) goto cleanup; for (j=0; j<r_label_set->variables_count; j++) { readstat_variable_t *r_variable = readstat_get_label_set_variable(r_label_set, j); if ((retval = por_write_string_field(writer, ctx, readstat_variable_get_name(r_variable))) != READSTAT_OK) goto cleanup; } if ((retval = por_write_double(writer, ctx, r_label_set->value_labels_count)) != READSTAT_OK) goto cleanup; for (j=0; j<r_label_set->value_labels_count; j++) { readstat_value_label_t *r_value_label = readstat_get_value_label(r_label_set, j); if (user_type == READSTAT_TYPE_STRING) { retval = por_write_string_field_n(writer, ctx, r_value_label->string_key, r_value_label->string_key_len); } else if (user_type == READSTAT_TYPE_DOUBLE) { retval = por_write_double(writer, ctx, r_value_label->double_key); } else if (user_type == READSTAT_TYPE_INT32) { retval = 
por_write_double(writer, ctx, r_value_label->int32_key); } if (retval != READSTAT_OK) goto cleanup; if ((retval = por_write_string_field_n(writer, ctx, r_value_label->label, r_value_label->label_len)) != READSTAT_OK) goto cleanup; } } cleanup: return retval; } static readstat_error_t por_emit_document_record(readstat_writer_t *writer, por_write_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; if ((retval = por_write_tag(writer, ctx, 'E')) != READSTAT_OK) goto cleanup; if ((retval = por_write_double(writer, ctx, writer->notes_count)) != READSTAT_OK) goto cleanup; int i; for (i=0; i<writer->notes_count; i++) { size_t len = strlen(writer->notes[i]); if (len > SPSS_DOC_LINE_SIZE) { retval = READSTAT_ERROR_NOTE_IS_TOO_LONG; goto cleanup; } if ((retval = por_write_string_field_n(writer, ctx, writer->notes[i], len)) != READSTAT_OK) goto cleanup; } cleanup: return retval; } static readstat_error_t por_emit_data_tag(readstat_writer_t *writer, por_write_ctx_t *ctx) { return por_write_tag(writer, ctx, 'F'); } static readstat_error_t por_begin_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; por_write_ctx_t *ctx = por_write_ctx_init(); readstat_error_t retval = READSTAT_OK; if ((retval = por_emit_header(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval = por_emit_version_and_timestamp(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval = por_emit_identification_records(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval = por_emit_variable_count_record(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval = por_emit_precision_record(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval = por_emit_case_weight_variable_record(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval = por_emit_variable_records(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval = por_emit_value_label_records(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval = por_emit_document_record(writer, ctx)) != READSTAT_OK) goto cleanup; if ((retval 
= por_emit_data_tag(writer, ctx)) != READSTAT_OK) goto cleanup; cleanup: if (retval != READSTAT_OK) { por_write_ctx_free(ctx); } else { writer->module_ctx = ctx; } return retval; } static readstat_error_t por_end_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; readstat_error_t error = READSTAT_OK; if ((error = por_write_tag(writer, writer->module_ctx, 'Z')) != READSTAT_OK) goto cleanup; if ((error = por_finish(writer)) != READSTAT_OK) goto cleanup; cleanup: por_write_ctx_free(writer->module_ctx); return error; } static size_t por_variable_width(readstat_type_t type, size_t user_width) { if (type == READSTAT_TYPE_STRING) { return POR_BASE30_PRECISION + 4 + user_width; } return POR_BASE30_PRECISION + 4; // minus sign + period + plus/minus + slash } static readstat_error_t por_variable_ok(const readstat_variable_t *variable) { return validate_variable_name(readstat_variable_get_name(variable)); } static readstat_error_t por_write_double_value(void *row, const readstat_variable_t *var, double value) { if (por_write_double_to_buffer(row, POR_BASE30_PRECISION + 4, value, POR_BASE30_PRECISION) == -1) { return READSTAT_ERROR_WRITE; } return READSTAT_OK; } static readstat_error_t por_write_int8_value(void *row, const readstat_variable_t *var, int8_t value) { return por_write_double_value(row, var, value); } static readstat_error_t por_write_int16_value(void *row, const readstat_variable_t *var, int16_t value) { return por_write_double_value(row, var, value); } static readstat_error_t por_write_int32_value(void *row, const readstat_variable_t *var, int32_t value) { return por_write_double_value(row, var, value); } static readstat_error_t por_write_float_value(void *row, const readstat_variable_t *var, float value) { return por_write_double_value(row, var, value); } static readstat_error_t por_write_missing_number(void *row, const readstat_variable_t *var) { return por_write_double_value(row, var, NAN); } static readstat_error_t 
por_write_missing_string(void *row, const readstat_variable_t *var) { return por_write_double_value(row, var, 0); } static readstat_error_t por_write_string_value(void *row, const readstat_variable_t *var, const char *string) { size_t len = strlen(string); if (len == 0) { string = " "; len = 1; } size_t storage_width = readstat_variable_get_storage_width(var); if (len > storage_width) { len = storage_width; } ssize_t bytes_written = por_write_double_to_buffer(row, POR_BASE30_PRECISION + 4, len, POR_BASE30_PRECISION); if (bytes_written == -1) { return READSTAT_ERROR_WRITE; } strncpy(((char *)row) + bytes_written, string, len); return READSTAT_OK; } static readstat_error_t por_write_row(void *writer_ctx, void *row, size_t row_len) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; char *row_chars = (char *)row; int offset = 0, output = 0; for (offset=0; offsetmodule_ctx, row_chars, output); } static readstat_error_t por_metadata_ok(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; if (writer->compression != READSTAT_COMPRESS_NONE) return READSTAT_ERROR_UNSUPPORTED_COMPRESSION; return READSTAT_OK; } readstat_error_t readstat_begin_writing_por(readstat_writer_t *writer, void *user_ctx, long row_count) { writer->callbacks.metadata_ok = &por_metadata_ok; writer->callbacks.variable_width = &por_variable_width; writer->callbacks.variable_ok = &por_variable_ok; writer->callbacks.write_int8 = &por_write_int8_value; writer->callbacks.write_int16 = &por_write_int16_value; writer->callbacks.write_int32 = &por_write_int32_value; writer->callbacks.write_float = &por_write_float_value; writer->callbacks.write_double = &por_write_double_value; writer->callbacks.write_string = &por_write_string_value; writer->callbacks.write_missing_string = &por_write_missing_string; writer->callbacks.write_missing_number = &por_write_missing_number; writer->callbacks.begin_data = &por_begin_data; writer->callbacks.write_row = &por_write_row; 
writer->callbacks.end_data = &por_end_data; return readstat_begin_writing_file(writer, user_ctx, row_count); } haven/src/readstat/spss/readstat_zsav_read.c0000644000176200001440000001452314101007206020746 0ustar liggesusers#include #include #include "../readstat.h" #include "../readstat_bits.h" #include "../readstat_iconv.h" #include "../readstat_malloc.h" #include "readstat_sav.h" #include "readstat_sav_compress.h" struct zheader { uint64_t zheader_ofs; uint64_t ztrailer_ofs; uint64_t ztrailer_len; }; struct ztrailer { int64_t bias; int64_t zero; int32_t block_size; int32_t n_blocks; }; struct ztrailer_entry { int64_t uncompressed_ofs; int64_t compressed_ofs; int32_t uncompressed_size; int32_t compressed_size; }; readstat_error_t zsav_read_compressed_data(sav_ctx_t *ctx, readstat_error_t (*row_handler)(unsigned char *, size_t, sav_ctx_t *)) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; readstat_off_t data_offset = 0; size_t uncompressed_row_len = ctx->var_offset * 8; readstat_off_t uncompressed_offset = 0; unsigned char *uncompressed_row = NULL; uLongf uncompressed_block_len = 0; unsigned char *compressed_block = NULL, *uncompressed_block = NULL; struct sav_row_stream_s state = { .missing_value = ctx->missing_double, .bias = ctx->bias, .bswap = ctx->bswap }; struct zheader zheader; struct ztrailer ztrailer; struct ztrailer_entry *ztrailer_entries = NULL; int n_blocks = 0; int block_i = 0; int i; if (io->read(&zheader, sizeof(struct zheader), io->io_ctx) < sizeof(struct zheader)) { retval = READSTAT_ERROR_READ; goto cleanup; } zheader.zheader_ofs = ctx->bswap ? byteswap8(zheader.zheader_ofs) : zheader.zheader_ofs; zheader.ztrailer_ofs = ctx->bswap ? byteswap8(zheader.ztrailer_ofs) : zheader.ztrailer_ofs; zheader.ztrailer_len = ctx->bswap ? 
byteswap8(zheader.ztrailer_len) : zheader.ztrailer_len; if (zheader.zheader_ofs != io->seek(0, READSTAT_SEEK_CUR, io->io_ctx) - sizeof(struct zheader)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } n_blocks = (zheader.ztrailer_len - 24) / 24; if (io->seek(zheader.ztrailer_ofs, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->read(&ztrailer, sizeof(struct ztrailer), io->io_ctx) < sizeof(struct ztrailer)) { retval = READSTAT_ERROR_READ; goto cleanup; } ztrailer.bias = ctx->bswap ? byteswap8(ztrailer.bias) : ztrailer.bias; ztrailer.zero = ctx->bswap ? byteswap8(ztrailer.zero) : ztrailer.zero; ztrailer.block_size = ctx->bswap ? byteswap4(ztrailer.block_size) : ztrailer.block_size; ztrailer.n_blocks = ctx->bswap ? byteswap4(ztrailer.n_blocks) : ztrailer.n_blocks; if (n_blocks != ztrailer.n_blocks) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (n_blocks && (ztrailer_entries = readstat_malloc(n_blocks * sizeof(struct ztrailer_entry))) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (io->read(ztrailer_entries, n_blocks * sizeof(struct ztrailer_entry), io->io_ctx) < n_blocks * sizeof(struct ztrailer_entry)) { retval = READSTAT_ERROR_READ; goto cleanup; } for (i=0; i<n_blocks; i++) { struct ztrailer_entry *entry = &ztrailer_entries[i]; entry->uncompressed_ofs = ctx->bswap ? byteswap8(entry->uncompressed_ofs) : entry->uncompressed_ofs; entry->compressed_ofs = ctx->bswap ? byteswap8(entry->compressed_ofs) : entry->compressed_ofs; entry->uncompressed_size = ctx->bswap ? byteswap4(entry->uncompressed_size) : entry->uncompressed_size; entry->compressed_size = ctx->bswap ? 
byteswap4(entry->compressed_size) : entry->compressed_size; } if (uncompressed_row_len && (uncompressed_row = readstat_malloc(uncompressed_row_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } while (1) { if (block_i == n_blocks) goto cleanup; struct ztrailer_entry *entry = &ztrailer_entries[block_i]; if (io->seek(entry->compressed_ofs, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if ((compressed_block = readstat_realloc(compressed_block, entry->compressed_size)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (io->read(compressed_block, entry->compressed_size, io->io_ctx) != entry->compressed_size) { retval = READSTAT_ERROR_READ; goto cleanup; } uncompressed_block_len = entry->uncompressed_size; if ((uncompressed_block = readstat_realloc(uncompressed_block, uncompressed_block_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } int status = uncompress(uncompressed_block, &uncompressed_block_len, compressed_block, entry->compressed_size); if (status != Z_OK || uncompressed_block_len != entry->uncompressed_size) { retval = READSTAT_ERROR_PARSE; goto cleanup; } block_i++; state.status = SAV_ROW_STREAM_HAVE_DATA; data_offset = 0; while (state.status != SAV_ROW_STREAM_NEED_DATA) { state.next_in = &uncompressed_block[data_offset]; state.avail_in = uncompressed_block_len - data_offset; state.next_out = &uncompressed_row[uncompressed_offset]; state.avail_out = uncompressed_row_len - uncompressed_offset; sav_decompress_row(&state); uncompressed_offset = uncompressed_row_len - state.avail_out; data_offset = uncompressed_block_len - state.avail_in; if (state.status == SAV_ROW_STREAM_FINISHED_ROW) { retval = row_handler(uncompressed_row, uncompressed_row_len, ctx); if (retval != READSTAT_OK) goto cleanup; uncompressed_offset = 0; } if (state.status == SAV_ROW_STREAM_FINISHED_ALL) goto cleanup; if (ctx->row_limit > 0 && ctx->current_row == ctx->row_limit) goto cleanup; } } cleanup: if 
(uncompressed_row) free(uncompressed_row); if (ztrailer_entries) free(ztrailer_entries); if (compressed_block) free(compressed_block); if (uncompressed_block) free(uncompressed_block); return retval; } haven/src/readstat/spss/readstat_por_parse.rl0000644000176200001440000000515714101007206021160 0ustar liggesusers#include #include "../readstat.h" #include "readstat_por_parse.h" %%{ machine por_field_parse; write data nofinal noerror; }%% ssize_t readstat_por_parse_double(const char *data, size_t len, double *result, readstat_error_handler error_cb, void *user_ctx) { ssize_t retval = 0; double val = 0.0; double denom = 30.0; double temp_frac = 0.0; double num = 0.0; double exp = 0.0; double temp_val = 0.0; const unsigned char *p = (const unsigned char *)data; const unsigned char *pe = p + len; int cs; int is_negative = 0, exp_is_negative = 0; int success = 0; %%{ action incr_val { if (fc >= '0' && fc <= '9') { temp_val = 30 * temp_val + (fc - '0'); } else if (fc >= 'A' && fc <= 'T') { temp_val = 30 * temp_val + (10 + fc - 'A'); } } action incr_frac { if (fc >= '0' && fc <= '9') { temp_frac += (fc - '0') / denom; } else if (fc >= 'A' && fc <= 'T') { temp_frac += (10 + fc - 'A') / denom; } denom *= 30.0; } value = [0-9A-T]+ >{ temp_val = 0; } $incr_val; frac_value = [0-9A-T]+ >{ temp_frac = 0.0; } $incr_frac; fraction = "." frac_value; nonmissing_value = (("-" %{ is_negative = 1; })? value %{ num = temp_val; } fraction? ( ("+" | "-" %{ exp_is_negative = 1; }) value %{ exp = temp_val; })?) "/"; nonmissing_fraction = ("-" %{ is_negative = 1; })? fraction "/"; missing_value = "*." 
>{ val = NAN; }; main := " "* (missing_value | nonmissing_value | nonmissing_fraction ) @{ success = 1; fbreak; }; write init; write exec; }%% if (!isnan(val)) { val = 1.0 * num + temp_frac; if (exp_is_negative) exp *= -1; if (exp) { val *= pow(30.0, exp); } if (is_negative) val *= -1; } if (!success) { retval = -1; if (error_cb) { char error_buf[1024]; snprintf(error_buf, sizeof(error_buf), "Read bytes: %ld String: %.*s Ending state: %d", (long)(p - (const unsigned char *)data), (int)len, data, cs); error_cb(error_buf, user_ctx); } } if (retval == 0) { if (result) *result = val; retval = (p - (const unsigned char *)data); } /* suppress warning */ (void)por_field_parse_en_main; return retval; } haven/src/readstat/spss/readstat_spss_parse.rl0000644000176200001440000000755214101007206021351 0ustar liggesusers #include #include "../readstat.h" #include "readstat_spss.h" #include "readstat_spss_parse.h" %%{ machine spss_format_parser; write data nofinal noerror; }%% // For minimum width information see // https://www.ibm.com/support/knowledgecenter/SSLVMB_sub/statistics_reference_project_ddita/spss/base/syn_date_and_time_date_time_formats.html readstat_error_t spss_parse_format(const char *data, int count, spss_format_t *fmt) { unsigned char *p = (unsigned char *)data; unsigned char *pe = (unsigned char *)data + count; unsigned char *eof = pe; int cs; unsigned int integer = 0; %%{ action start_integer { integer = 0; } action incr_integer { integer = 10 * integer + (fc - '0'); } action save_width { fmt->width = integer; } action save_precision { fmt->decimal_places = integer; } type = ("A"i %{ fmt->type = SPSS_FORMAT_TYPE_A; } | "AHEX"i %{ fmt->type = SPSS_FORMAT_TYPE_AHEX; } | "COMMA"i %{ fmt->type = SPSS_FORMAT_TYPE_COMMA; } | "DOLLAR"i %{ fmt->type = SPSS_FORMAT_TYPE_DOLLAR; } | "F"i %{ fmt->type = SPSS_FORMAT_TYPE_F; } | "IB"i %{ fmt->type = SPSS_FORMAT_TYPE_IB; } | "PIBHEX"i %{ fmt->type = SPSS_FORMAT_TYPE_PIBHEX; } | "P"i %{ fmt->type = SPSS_FORMAT_TYPE_P; } | 
"PIB"i %{ fmt->type = SPSS_FORMAT_TYPE_PIB; } | "PK"i %{ fmt->type = SPSS_FORMAT_TYPE_PK; } | "RB"i %{ fmt->type = SPSS_FORMAT_TYPE_RB; } | "RBHEX"i %{ fmt->type = SPSS_FORMAT_TYPE_RBHEX; } | "Z"i %{ fmt->type = SPSS_FORMAT_TYPE_Z; } | "N"i %{ fmt->type = SPSS_FORMAT_TYPE_N; } | "E"i %{ fmt->type = SPSS_FORMAT_TYPE_E; } | "DATE"i %{ fmt->type = SPSS_FORMAT_TYPE_DATE; fmt->width = 11; } | "TIME"i %{ fmt->type = SPSS_FORMAT_TYPE_TIME; } | "DATETIME"i %{ fmt->type = SPSS_FORMAT_TYPE_DATETIME; fmt->width = 20; } | "YMDHMS"i %{ fmt->type = SPSS_FORMAT_TYPE_YMDHMS; fmt->width = 19; } | "ADATE"i %{ fmt->type = SPSS_FORMAT_TYPE_ADATE; fmt->width = 10; } | "JDATE"i %{ fmt->type = SPSS_FORMAT_TYPE_JDATE; } | "DTIME"i %{ fmt->type = SPSS_FORMAT_TYPE_DTIME; fmt->width = 23; } | "MTIME"i %{ fmt->type = SPSS_FORMAT_TYPE_MTIME; } | "WKDAY"i %{ fmt->type = SPSS_FORMAT_TYPE_WKDAY; } | "MONTH"i %{ fmt->type = SPSS_FORMAT_TYPE_MONTH; } | "MOYR"i %{ fmt->type = SPSS_FORMAT_TYPE_MOYR; } | "QYR"i %{ fmt->type = SPSS_FORMAT_TYPE_QYR; } | "WKYR"i %{ fmt->type = SPSS_FORMAT_TYPE_WKYR; fmt->width = 10; } | "PCT"i %{ fmt->type = SPSS_FORMAT_TYPE_PCT; } | "DOT"i %{ fmt->type = SPSS_FORMAT_TYPE_DOT; } | "CCA"i %{ fmt->type = SPSS_FORMAT_TYPE_CCA; } | "CCB"i %{ fmt->type = SPSS_FORMAT_TYPE_CCB; } | "CCC"i %{ fmt->type = SPSS_FORMAT_TYPE_CCC; } | "CCD"i %{ fmt->type = SPSS_FORMAT_TYPE_CCD; } | "CCE"i %{ fmt->type = SPSS_FORMAT_TYPE_CCE; } | "EDATE"i %{ fmt->type = SPSS_FORMAT_TYPE_EDATE; fmt->width = 10; } | "SDATE"i %{ fmt->type = SPSS_FORMAT_TYPE_SDATE; fmt->width = 10; } ); integer = [0-9]+ >start_integer $incr_integer; width = integer %save_width; precision = integer %save_precision; main := type (width ("." precision)? 
)?; write init; write exec; }%% /* suppress warning */ (void)spss_format_parser_en_main; if (cs < %%{ write first_final; }%% || p != eof) { return READSTAT_ERROR_PARSE; } return READSTAT_OK; } haven/src/readstat/spss/readstat_zsav_compress.h0000644000176200001440000000144414101007206021671 0ustar liggesusers typedef struct zsav_block_s { int32_t uncompressed_size; int32_t compressed_size; z_stream stream; unsigned char *compressed_data; size_t compressed_data_capacity; } zsav_block_t; typedef struct zsav_ctx_s { void *buffer; zsav_block_t **blocks; int blocks_count; int blocks_capacity; int64_t uncompressed_block_size; int64_t zheader_ofs; int compression_level; } zsav_ctx_t; zsav_ctx_t *zsav_ctx_init(size_t max_row_len, int64_t offset); void zsav_ctx_free(zsav_ctx_t *ctx); zsav_block_t *zsav_add_block(zsav_ctx_t *ctx); zsav_block_t *zsav_current_block(zsav_ctx_t *ctx); int zsav_compress_row(void *input, size_t input_len, int finish, zsav_ctx_t *zctx); haven/src/readstat/spss/readstat_spss_parse.c0000644000176200001440000004606214101765776021204 0ustar liggesusers#line 1 "src/spss/readstat_spss_parse.rl" #include #include "../readstat.h" #include "readstat_spss.h" #include "readstat_spss_parse.h" #line 11 "src/spss/readstat_spss_parse.c" static const signed char _spss_format_parser_actions[] = { 0, 1, 1, 1, 2, 1, 3, 1, 4, 1, 5, 1, 6, 1, 7, 1, 8, 1, 9, 1, 10, 1, 11, 1, 12, 1, 13, 1, 14, 1, 15, 1, 16, 1, 17, 1, 18, 1, 19, 1, 20, 1, 21, 1, 22, 1, 23, 1, 24, 1, 25, 1, 26, 1, 27, 1, 28, 1, 29, 1, 30, 1, 31, 1, 32, 1, 33, 1, 34, 1, 35, 1, 36, 1, 37, 1, 38, 1, 39, 1, 40, 2, 0, 1, 3, 4, 0, 1, 3, 5, 0, 1, 3, 6, 0, 1, 3, 7, 0, 1, 3, 8, 0, 1, 3, 9, 0, 1, 3, 10, 0, 1, 3, 11, 0, 1, 3, 12, 0, 1, 3, 13, 0, 1, 3, 14, 0, 1, 3, 15, 0, 1, 3, 16, 0, 1, 3, 17, 0, 1, 3, 18, 0, 1, 3, 19, 0, 1, 3, 20, 0, 1, 3, 21, 0, 1, 3, 22, 0, 1, 3, 23, 0, 1, 3, 24, 0, 1, 3, 25, 0, 1, 3, 26, 0, 1, 3, 27, 0, 1, 3, 28, 0, 1, 3, 29, 0, 1, 3, 30, 0, 1, 3, 31, 0, 1, 3, 32, 0, 1, 3, 33, 0, 1, 3, 34, 0, 1, 3, 
35, 0, 1, 3, 36, 0, 1, 3, 37, 0, 1, 3, 38, 0, 1, 3, 39, 0, 1, 3, 40, 0, 1, 0 }; static const short _spss_format_parser_key_offsets[] = { 0, 0, 34, 36, 38, 40, 42, 44, 46, 50, 60, 62, 64, 66, 72, 74, 76, 78, 80, 82, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 118, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 172, 174, 176, 178, 180, 182, 184, 186, 188, 194, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 219, 221, 223, 225, 227, 231, 233, 235, 237, 239, 241, 243, 245, 247, 255, 257, 261, 263, 265, 267, 271, 273, 275, 277, 279, 281, 283, 0 }; static const char _spss_format_parser_trans_keys[] = { 65, 67, 68, 69, 70, 73, 74, 77, 78, 80, 81, 82, 83, 84, 87, 89, 90, 97, 99, 100, 101, 102, 105, 106, 109, 110, 112, 113, 114, 115, 116, 119, 121, 122, 48, 57, 65, 97, 84, 116, 69, 101, 69, 101, 88, 120, 67, 79, 99, 111, 65, 66, 67, 68, 69, 97, 98, 99, 100, 101, 77, 109, 77, 109, 65, 97, 65, 79, 84, 97, 111, 116, 84, 116, 69, 101, 73, 105, 77, 109, 69, 101, 76, 84, 108, 116, 76, 108, 65, 97, 82, 114, 73, 105, 77, 109, 69, 101, 65, 97, 84, 116, 69, 101, 66, 98, 68, 100, 65, 97, 84, 116, 69, 101, 79, 84, 111, 116, 78, 89, 110, 121, 84, 116, 72, 104, 82, 114, 73, 105, 77, 109, 69, 101, 84, 116, 66, 98, 69, 101, 88, 120, 89, 121, 82, 114, 66, 98, 69, 101, 88, 120, 68, 100, 65, 97, 84, 116, 69, 101, 73, 105, 77, 109, 69, 101, 75, 107, 68, 89, 100, 121, 65, 97, 89, 121, 82, 114, 77, 109, 68, 100, 72, 104, 77, 109, 83, 115, 68, 72, 100, 104, 48, 57, 46, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 84, 116, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 68, 100, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 67, 73, 75, 99, 105, 107, 48, 57, 48, 57, 72, 104, 48, 57, 48, 57, 48, 57, 48, 57, 72, 104, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 48, 57, 0 }; static const signed char _spss_format_parser_single_lengths[] = { 
0, 34, 0, 2, 2, 2, 2, 2, 4, 10, 2, 2, 2, 6, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0 }; static const signed char _spss_format_parser_range_lengths[] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 }; static const short _spss_format_parser_index_offsets[] = { 0, 0, 35, 37, 40, 43, 46, 49, 52, 57, 68, 71, 74, 77, 84, 87, 90, 93, 96, 99, 104, 107, 110, 113, 116, 119, 122, 125, 128, 131, 134, 137, 140, 143, 146, 151, 156, 159, 162, 165, 168, 171, 174, 177, 180, 183, 186, 189, 192, 195, 198, 201, 204, 207, 210, 213, 216, 219, 222, 225, 230, 233, 236, 239, 242, 245, 248, 251, 254, 260, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 285, 287, 289, 291, 293, 297, 299, 301, 303, 305, 307, 309, 311, 313, 321, 323, 327, 329, 331, 333, 337, 339, 341, 343, 345, 347, 349, 0 }; static const signed char _spss_format_parser_cond_targs[] = { 68, 8, 13, 84, 86, 29, 30, 34, 92, 93, 46, 48, 51, 55, 58, 63, 106, 68, 8, 13, 84, 86, 29, 30, 34, 92, 93, 46, 48, 51, 55, 58, 63, 106, 0, 70, 0, 4, 4, 0, 5, 5, 0, 71, 71, 0, 7, 7, 0, 72, 72, 0, 9, 10, 9, 10, 0, 73, 74, 75, 76, 77, 73, 74, 75, 76, 77, 0, 11, 11, 0, 12, 12, 0, 78, 78, 0, 14, 19, 23, 14, 19, 23, 0, 15, 15, 0, 79, 79, 0, 17, 17, 0, 18, 18, 0, 80, 80, 0, 20, 82, 20, 82, 0, 21, 21, 0, 22, 22, 0, 81, 81, 0, 24, 24, 0, 25, 25, 0, 83, 83, 0, 27, 27, 0, 28, 28, 0, 85, 85, 0, 87, 87, 0, 31, 31, 0, 32, 32, 0, 33, 33, 0, 88, 88, 0, 35, 39, 35, 39, 0, 36, 38, 36, 38, 0, 37, 37, 0, 89, 89, 0, 90, 90, 0, 40, 40, 0, 41, 41, 0, 91, 
91, 0, 94, 94, 0, 95, 95, 0, 45, 45, 0, 96, 96, 0, 47, 47, 0, 98, 98, 0, 99, 99, 0, 50, 50, 0, 100, 100, 0, 52, 52, 0, 53, 53, 0, 54, 54, 0, 101, 101, 0, 56, 56, 0, 57, 57, 0, 102, 102, 0, 59, 59, 0, 60, 62, 60, 62, 0, 61, 61, 0, 103, 103, 0, 104, 104, 0, 64, 64, 0, 65, 65, 0, 66, 66, 0, 67, 67, 0, 105, 105, 0, 3, 6, 3, 6, 69, 0, 2, 69, 0, 70, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 16, 16, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 26, 26, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 42, 43, 97, 42, 43, 97, 69, 0, 69, 0, 44, 44, 69, 0, 69, 0, 69, 0, 69, 0, 49, 49, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 69, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 0 }; static const short _spss_format_parser_cond_actions[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 84, 0, 3, 1, 0, 1, 0, 160, 0, 88, 0, 204, 0, 208, 0, 212, 0, 216, 0, 220, 0, 
92, 0, 0, 0, 144, 0, 152, 0, 96, 0, 200, 0, 168, 0, 0, 0, 140, 0, 224, 0, 100, 0, 104, 0, 164, 0, 180, 0, 184, 0, 172, 0, 136, 0, 0, 0, 0, 0, 0, 0, 112, 0, 196, 0, 0, 0, 116, 0, 108, 0, 120, 0, 188, 0, 0, 0, 124, 0, 128, 0, 228, 0, 148, 0, 176, 0, 192, 0, 156, 0, 132, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 3, 5, 45, 9, 67, 69, 71, 73, 75, 11, 37, 41, 13, 65, 49, 35, 77, 15, 17, 47, 55, 57, 51, 33, 21, 63, 23, 19, 25, 59, 27, 29, 79, 39, 53, 61, 43, 31, 0 }; static const short _spss_format_parser_eof_trans[] = { 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 0 }; static const int spss_format_parser_start = 1; static const int spss_format_parser_en_main = 1; #line 11 "src/spss/readstat_spss_parse.rl" // For minimum width information see // https://www.ibm.com/support/knowledgecenter/SSLVMB_sub/statistics_reference_project_ddita/spss/base/syn_date_and_time_date_time_formats.html readstat_error_t spss_parse_format(const char *data, int count, spss_format_t *fmt) { unsigned char *p = (unsigned char *)data; unsigned char *pe = (unsigned char *)data + count; unsigned char *eof = pe; int cs; unsigned int integer = 0; #line 310 "src/spss/readstat_spss_parse.c" { cs = (int)spss_format_parser_start; } #line 315 "src/spss/readstat_spss_parse.c" { int _klen; unsigned int _trans = 0; const char * _keys; const signed char * _acts; 
unsigned int _nacts; _resume: {} if ( p == pe && p != eof ) goto _out; if ( p == eof ) { if ( _spss_format_parser_eof_trans[cs] > 0 ) { _trans = (unsigned int)_spss_format_parser_eof_trans[cs] - 1; } } else { _keys = ( _spss_format_parser_trans_keys + (_spss_format_parser_key_offsets[cs])); _trans = (unsigned int)_spss_format_parser_index_offsets[cs]; _klen = (int)_spss_format_parser_single_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + _klen - 1; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_spss_format_parser_range_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + (_klen<<1) - 2; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} } cs = (int)_spss_format_parser_cond_targs[_trans]; if ( _spss_format_parser_cond_actions[_trans] != 0 ) { _acts = ( _spss_format_parser_actions + (_spss_format_parser_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 24 "src/spss/readstat_spss_parse.rl" integer = 0; } #line 400 "src/spss/readstat_spss_parse.c" break; } case 1: { { #line 28 "src/spss/readstat_spss_parse.rl" integer = 10 * integer + ((( (*( p)))) - '0'); } #line 411 "src/spss/readstat_spss_parse.c" break; } case 2: { { #line 32 "src/spss/readstat_spss_parse.rl" fmt->width = integer; } #line 422 "src/spss/readstat_spss_parse.c" break; } case 3: { { 
#line 36 "src/spss/readstat_spss_parse.rl" fmt->decimal_places = integer; } #line 433 "src/spss/readstat_spss_parse.c" break; } case 4: { { #line 40 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_A; } #line 442 "src/spss/readstat_spss_parse.c" break; } case 5: { { #line 41 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_AHEX; } #line 451 "src/spss/readstat_spss_parse.c" break; } case 6: { { #line 42 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_COMMA; } #line 460 "src/spss/readstat_spss_parse.c" break; } case 7: { { #line 43 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_DOLLAR; } #line 469 "src/spss/readstat_spss_parse.c" break; } case 8: { { #line 44 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_F; } #line 478 "src/spss/readstat_spss_parse.c" break; } case 9: { { #line 45 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_IB; } #line 487 "src/spss/readstat_spss_parse.c" break; } case 10: { { #line 46 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_PIBHEX; } #line 496 "src/spss/readstat_spss_parse.c" break; } case 11: { { #line 47 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_P; } #line 505 "src/spss/readstat_spss_parse.c" break; } case 12: { { #line 48 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_PIB; } #line 514 "src/spss/readstat_spss_parse.c" break; } case 13: { { #line 49 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_PK; } #line 523 "src/spss/readstat_spss_parse.c" break; } case 14: { { #line 50 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_RB; } #line 532 "src/spss/readstat_spss_parse.c" break; } case 15: { { #line 51 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_RBHEX; } #line 541 "src/spss/readstat_spss_parse.c" break; } case 16: { { #line 52 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_Z; } #line 550 "src/spss/readstat_spss_parse.c" break; } case 
17: { { #line 53 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_N; } #line 559 "src/spss/readstat_spss_parse.c" break; } case 18: { { #line 54 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_E; } #line 568 "src/spss/readstat_spss_parse.c" break; } case 19: { { #line 55 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_DATE; fmt->width = 11; } #line 577 "src/spss/readstat_spss_parse.c" break; } case 20: { { #line 56 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_TIME; } #line 586 "src/spss/readstat_spss_parse.c" break; } case 21: { { #line 57 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_DATETIME; fmt->width = 20; } #line 595 "src/spss/readstat_spss_parse.c" break; } case 22: { { #line 58 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_YMDHMS; fmt->width = 19; } #line 604 "src/spss/readstat_spss_parse.c" break; } case 23: { { #line 59 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_ADATE; fmt->width = 10; } #line 613 "src/spss/readstat_spss_parse.c" break; } case 24: { { #line 60 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_JDATE; } #line 622 "src/spss/readstat_spss_parse.c" break; } case 25: { { #line 61 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_DTIME; fmt->width = 23; } #line 631 "src/spss/readstat_spss_parse.c" break; } case 26: { { #line 62 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_MTIME; } #line 640 "src/spss/readstat_spss_parse.c" break; } case 27: { { #line 63 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_WKDAY; } #line 649 "src/spss/readstat_spss_parse.c" break; } case 28: { { #line 64 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_MONTH; } #line 658 "src/spss/readstat_spss_parse.c" break; } case 29: { { #line 65 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_MOYR; } #line 667 "src/spss/readstat_spss_parse.c" break; } case 30: { { #line 66 
"src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_QYR; } #line 676 "src/spss/readstat_spss_parse.c" break; } case 31: { { #line 67 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_WKYR; fmt->width = 10; } #line 685 "src/spss/readstat_spss_parse.c" break; } case 32: { { #line 68 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_PCT; } #line 694 "src/spss/readstat_spss_parse.c" break; } case 33: { { #line 69 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_DOT; } #line 703 "src/spss/readstat_spss_parse.c" break; } case 34: { { #line 70 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_CCA; } #line 712 "src/spss/readstat_spss_parse.c" break; } case 35: { { #line 71 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_CCB; } #line 721 "src/spss/readstat_spss_parse.c" break; } case 36: { { #line 72 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_CCC; } #line 730 "src/spss/readstat_spss_parse.c" break; } case 37: { { #line 73 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_CCD; } #line 739 "src/spss/readstat_spss_parse.c" break; } case 38: { { #line 74 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_CCE; } #line 748 "src/spss/readstat_spss_parse.c" break; } case 39: { { #line 75 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_EDATE; fmt->width = 10; } #line 757 "src/spss/readstat_spss_parse.c" break; } case 40: { { #line 76 "src/spss/readstat_spss_parse.rl" fmt->type = SPSS_FORMAT_TYPE_SDATE; fmt->width = 10; } #line 766 "src/spss/readstat_spss_parse.c" break; } } _nacts -= 1; _acts += 1; } } if ( p == eof ) { if ( cs >= 68 ) goto _out; } else { if ( cs != 0 ) { p += 1; goto _resume; } } _out: {} } #line 89 "src/spss/readstat_spss_parse.rl" /* suppress warning */ (void)spss_format_parser_en_main; if (cs < #line 797 "src/spss/readstat_spss_parse.c" 68 #line 94 "src/spss/readstat_spss_parse.rl" || p != eof) { return READSTAT_ERROR_PARSE; 
} return READSTAT_OK; } haven/src/readstat/spss/readstat_por.c0000644000176200001440000001302614101007206017565 0ustar liggesusers#include #include #include "../readstat.h" #include "../CKHashTable.h" #include "../readstat_convert.h" #include "readstat_spss.h" #include "readstat_por.h" int8_t por_ascii_lookup[256] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ', '.', '<', '(', '+', '|', '&', '[', ']', '!', '$', '*', ')', ';', '^', '-', '/', '|', ',', '%', '_', '>', '?', '`', ':', '#', '@', '\'', '=', '"', 0, 0, 0, 0, 0, 0, '~', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '{', '}', '\\', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; uint16_t por_unicode_lookup[256] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ', '.', '<', '(', '+', '|', '&', '[', ']', '!', '$', '*', ')', ';', '^', '-', '/', 0x00A3, ',', '%', '_', '>', '?', 0x2018, ':', 0x00A6, '@', 0x2019, '=', '"', 0x2264, 
0x25A1, 0x00B1, 0x25A0, 0x00B0, 0x2020, '~', 0x2013, 0x2514, 0x250C, 0x2265, 0x2070, 0x2071, 0x00B2, 0x00B3, 0x2074, 0x2075, 0x2076, 0x2077, 0x2078, 0x2079, 0x2518, 0x2510, 0x2260, 0x2014, 0x207D, 0x207E, 0x2E38, '{', '}', '\\', 0x00A2, 0x2022, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; por_ctx_t *por_ctx_init() { por_ctx_t *ctx = calloc(1, sizeof(por_ctx_t)); ctx->space = ' '; ctx->base30_precision = 20; ctx->var_dict = ck_hash_table_init(1024, 8); return ctx; } void por_ctx_free(por_ctx_t *ctx) { if (ctx->string_buffer) free(ctx->string_buffer); if (ctx->varinfo) { int i; for (i=0; ivar_count; i++) { if (ctx->varinfo[i].label) free(ctx->varinfo[i].label); } free(ctx->varinfo); } if (ctx->variables) { int i; for (i=0; ivar_count; i++) { if (ctx->variables[i]) free(ctx->variables[i]); } free(ctx->variables); } if (ctx->var_dict) ck_hash_table_free(ctx->var_dict); if (ctx->converter) iconv_close(ctx->converter); free(ctx); } ssize_t por_utf8_encode(const unsigned char *input, size_t input_len, char *output, size_t output_len, uint16_t lookup[256]) { int offset = 0; int i; for (i=0; i output_len) return offset; output[offset++] = codepoint; } else { if (codepoint <= 0x07FF) { if (offset + 2 > output_len) return offset; } else /* if (codepoint <= 0xFFFF) */{ if (offset + 3 > output_len) return offset; } /* TODO - For some reason that replacement character isn't recognized * by some systems, so be prepared to insert an ASCII space instead */ int printed = sprintf(output + offset, "%lc", codepoint); if (printed > 0) { offset += printed; } else { output[offset++] = ' '; } } } return offset; } ssize_t por_utf8_decode( const char *input, size_t input_len, char *output, size_t output_len, uint8_t *lookup, size_t lookup_len) { int offset = 0; wchar_t codepoint = 0; while (1) { int char_len = 0; if (offset + 1 > 
output_len) return offset; unsigned char val = *input; if (val >= 0x20 && val < 0x7F) { if (!lookup[val]) return -1; output[offset++] = lookup[val]; input++; } else { int conversions = sscanf(input, "%lc%n", &codepoint, &char_len); if (conversions == 0 || codepoint >= lookup_len || lookup[codepoint] == 0) { return -1; } output[offset++] = lookup[codepoint]; input += char_len; } } return offset; } haven/src/readstat/spss/readstat_sav.c0000644000176200001440000000507014101007206017556 0ustar liggesusers// // sav.c // #include #include #include #include #include #include #include #include #include "../readstat.h" #include "../readstat_bits.h" #include "../readstat_iconv.h" #include "../readstat_malloc.h" #include "readstat_sav.h" #define SAV_VARINFO_INITIAL_CAPACITY 512 sav_ctx_t *sav_ctx_init(sav_file_header_record_t *header, readstat_io_t *io) { sav_ctx_t *ctx = readstat_calloc(1, sizeof(sav_ctx_t)); if (ctx == NULL) { return NULL; } if (memcmp(&header->rec_type, "$FL2", 4) == 0) { ctx->format_version = 2; } else if (memcmp(&header->rec_type, "$FL3", 4) == 0) { ctx->format_version = 3; } else { sav_ctx_free(ctx); return NULL; } ctx->bswap = !(header->layout_code == 2 || header->layout_code == 3); ctx->endianness = (machine_is_little_endian() ^ ctx->bswap) ? READSTAT_ENDIAN_LITTLE : READSTAT_ENDIAN_BIG; if (header->compression == 1 || byteswap4(header->compression) == 1) { ctx->compression = READSTAT_COMPRESS_ROWS; } else if (header->compression == 2 || byteswap4(header->compression) == 2) { ctx->compression = READSTAT_COMPRESS_BINARY; } ctx->record_count = ctx->bswap ? byteswap4(header->ncases) : header->ncases; ctx->fweight_index = ctx->bswap ? byteswap4(header->weight_index) : header->weight_index; ctx->missing_double = SAV_MISSING_DOUBLE; ctx->lowest_double = SAV_LOWEST_DOUBLE; ctx->highest_double = SAV_HIGHEST_DOUBLE; ctx->bias = ctx->bswap ? 
byteswap_double(header->bias) : header->bias; ctx->varinfo_capacity = SAV_VARINFO_INITIAL_CAPACITY; if ((ctx->varinfo = readstat_calloc(ctx->varinfo_capacity, sizeof(spss_varinfo_t *))) == NULL) { sav_ctx_free(ctx); return NULL; } ctx->io = io; return ctx; } void sav_ctx_free(sav_ctx_t *ctx) { if (ctx->varinfo) { int i; for (i=0; i<ctx->var_index; i++) { spss_varinfo_free(ctx->varinfo[i]); } free(ctx->varinfo); } if (ctx->variables) { int i; for (i=0; i<ctx->var_count; i++) { if (ctx->variables[i]) free(ctx->variables[i]); } free(ctx->variables); } if (ctx->raw_string) free(ctx->raw_string); if (ctx->utf8_string) free(ctx->utf8_string); if (ctx->converter) iconv_close(ctx->converter); if (ctx->variable_display_values) { free(ctx->variable_display_values); } free(ctx); } haven/src/readstat/spss/readstat_sav_parse_timestamp.h0000644000176200001440000000043514101007206023040 0ustar liggesusers readstat_error_t sav_parse_time(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_cb, void *user_ctx); readstat_error_t sav_parse_date(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_cb, void *user_ctx); haven/src/readstat/spss/readstat_zsav_read.h0000644000176200001440000000021414101007206020743 0ustar liggesusers readstat_error_t zsav_read_compressed_data(sav_ctx_t *ctx, readstat_error_t (*row_handler)(unsigned char *, size_t, sav_ctx_t *)); haven/src/readstat/spss/readstat_sav_write.c0000644000176200001440000014515314101007206020777 0ustar liggesusers #include #include #include #include #include #include #include #include #include #include "../readstat.h" #include "../readstat_iconv.h" #include "../readstat_bits.h" #include "../readstat_malloc.h" #include "../readstat_writer.h" #include "../CKHashTable.h" #include "readstat_sav.h" #include "readstat_sav_compress.h" #include "readstat_spss_parse.h" #if HAVE_ZLIB #include #include "readstat_zsav_compress.h" #include "readstat_zsav_write.h" #endif #define MAX_STRING_SIZE 255 
#define MAX_LABEL_SIZE 256 #define MAX_VALUE_LABEL_SIZE 120 typedef struct sav_varnames_s { char shortname[9]; char stem[6]; } sav_varnames_t; static long readstat_label_set_number_short_variables(readstat_label_set_t *r_label_set) { long count = 0; int j; for (j=0; j<r_label_set->variables_count; j++) { readstat_variable_t *r_variable = readstat_get_label_set_variable(r_label_set, j); if (r_variable->storage_width <= 8) { count++; } } return count; } static int readstat_label_set_needs_short_value_labels_record(readstat_label_set_t *r_label_set) { return readstat_label_set_number_short_variables(r_label_set) > 0; } static int readstat_label_set_needs_long_value_labels_record(readstat_label_set_t *r_label_set) { return readstat_label_set_number_short_variables(r_label_set) < r_label_set->variables_count; } static int32_t sav_encode_format(spss_format_t *spss_format) { uint8_t width = spss_format->width > 0xff ? 0xff : spss_format->width; return ((spss_format->type << 16) | (width << 8) | spss_format->decimal_places); } static readstat_error_t sav_encode_base_variable_format(readstat_variable_t *r_variable, int32_t *out_code) { spss_format_t spss_format; readstat_error_t retval = spss_format_for_variable(r_variable, &spss_format); if (retval == READSTAT_OK && out_code) *out_code = sav_encode_format(&spss_format); return retval; } static readstat_error_t sav_encode_ghost_variable_format(readstat_variable_t *r_variable, size_t user_width, int32_t *out_code) { spss_format_t spss_format; readstat_error_t retval = spss_format_for_variable(r_variable, &spss_format); spss_format.width = user_width; if (retval == READSTAT_OK && out_code) *out_code = sav_encode_format(&spss_format); return retval; } static size_t sav_format_variable_name(char *output, size_t output_len, sav_varnames_t *varnames) { snprintf(output, output_len, "%s", varnames->shortname); return strlen(output); } static size_t sav_format_ghost_variable_name(char *output, size_t output_len, sav_varnames_t *varnames, unsigned 
int segment) { snprintf(output, output_len, "%s", varnames->stem); size_t len = strlen(output); int letter = segment % 36; if (letter < 10) { output[len++] = '0' + letter; } else { output[len++] = 'A' + (letter - 10); } return len; } static int sav_variable_segments(readstat_type_t type, size_t user_width) { if (type == READSTAT_TYPE_STRING && user_width > MAX_STRING_SIZE) { return (user_width + 251) / 252; } return 1; } static readstat_error_t sav_emit_header(readstat_writer_t *writer) { sav_file_header_record_t header = { { 0 } }; readstat_error_t retval = READSTAT_OK; time_t now = writer->timestamp; struct tm *time_s = localtime(&now); /* There are portability issues with strftime so hack something up */ char months[][4] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" }; char creation_date[sizeof(header.creation_date)+1] = { 0 }; char creation_time[sizeof(header.creation_time)+1] = { 0 }; if (!time_s) { retval = READSTAT_ERROR_BAD_TIMESTAMP_VALUE; goto cleanup; } memcpy(header.rec_type, "$FL2", sizeof("$FL2")-1); if (writer->compression == READSTAT_COMPRESS_BINARY) { header.rec_type[3] = '3'; } memset(header.prod_name, ' ', sizeof(header.prod_name)); memcpy(header.prod_name, "@(#) SPSS DATA FILE - " READSTAT_PRODUCT_URL, sizeof("@(#) SPSS DATA FILE - " READSTAT_PRODUCT_URL)-1); header.layout_code = 2; header.nominal_case_size = writer->row_len / 8; if (writer->compression == READSTAT_COMPRESS_ROWS) { header.compression = 1; } else if (writer->compression == READSTAT_COMPRESS_BINARY) { header.compression = 2; } if (writer->fweight_variable) { int32_t dictionary_index = 1 + writer->fweight_variable->offset / 8; header.weight_index = dictionary_index; } else { header.weight_index = 0; } header.ncases = writer->row_count; header.bias = 100.0; snprintf(creation_date, sizeof(creation_date), "%02d %3.3s %02d", (unsigned int)time_s->tm_mday % 100, months[time_s->tm_mon], (unsigned int)time_s->tm_year % 100); 
memcpy(header.creation_date, creation_date, sizeof(header.creation_date)); snprintf(creation_time, sizeof(creation_time), "%02d:%02d:%02d", (unsigned int)time_s->tm_hour % 100, (unsigned int)time_s->tm_min % 100, (unsigned int)time_s->tm_sec % 100); memcpy(header.creation_time, creation_time, sizeof(header.creation_time)); memset(header.file_label, ' ', sizeof(header.file_label)); size_t file_label_len = strlen(writer->file_label); if (file_label_len > sizeof(header.file_label)) file_label_len = sizeof(header.file_label); if (writer->file_label[0]) memcpy(header.file_label, writer->file_label, file_label_len); retval = readstat_write_bytes(writer, &header, sizeof(header)); cleanup: return retval; } static readstat_error_t sav_emit_variable_label(readstat_writer_t *writer, readstat_variable_t *r_variable) { readstat_error_t retval = READSTAT_OK; const char *title_data = r_variable->label; size_t title_data_len = strlen(title_data); if (title_data_len > 0) { char padded_label[MAX_LABEL_SIZE]; uint32_t label_len = title_data_len; if (label_len > sizeof(padded_label)) label_len = sizeof(padded_label); retval = readstat_write_bytes(writer, &label_len, sizeof(label_len)); if (retval != READSTAT_OK) goto cleanup; strncpy(padded_label, title_data, (label_len + 3) / 4 * 4); retval = readstat_write_bytes(writer, padded_label, (label_len + 3) / 4 * 4); if (retval != READSTAT_OK) goto cleanup; } cleanup: return retval; } static int sav_n_missing_double_values(readstat_variable_t *r_variable) { int n_missing_ranges = readstat_variable_get_missing_ranges_count(r_variable); int n_missing_values = n_missing_ranges; int has_missing_range = 0; int j; for (j=0; jtype == READSTAT_TYPE_DOUBLE) { n_missing_values = sav_n_missing_double_values(r_variable); } else if (readstat_variable_get_storage_width(r_variable) <= 8) { n_missing_values = sav_n_missing_string_values(r_variable); } if (abs(n_missing_values) > 3) { return READSTAT_ERROR_TOO_MANY_MISSING_VALUE_DEFINITIONS; } if 
(out_n_missing_values) *out_n_missing_values = n_missing_values; return READSTAT_OK; } static readstat_error_t sav_emit_variable_missing_string_values(readstat_writer_t *writer, readstat_variable_t *r_variable) { readstat_error_t retval = READSTAT_OK; int n_missing_values = 0; int n_missing_ranges = readstat_variable_get_missing_ranges_count(r_variable); /* ranges */ int j; for (j=0; jtype == READSTAT_TYPE_DOUBLE) { return sav_emit_variable_missing_double_values(writer, r_variable); } else if (readstat_variable_get_storage_width(r_variable) <= 8) { return sav_emit_variable_missing_string_values(writer, r_variable); } return READSTAT_OK; } static readstat_error_t sav_emit_blank_variable_records(readstat_writer_t *writer, int extra_fields) { readstat_error_t retval = READSTAT_OK; int32_t rec_type = SAV_RECORD_TYPE_VARIABLE; sav_variable_record_t variable; while (extra_fields--) { retval = readstat_write_bytes(writer, &rec_type, sizeof(rec_type)); if (retval != READSTAT_OK) goto cleanup; memset(&variable, '\0', sizeof(variable)); memset(variable.name, ' ', sizeof(variable.name)); variable.type = -1; variable.print = variable.write = 0x011d01; retval = readstat_write_bytes(writer, &variable, sizeof(variable)); if (retval != READSTAT_OK) goto cleanup; } cleanup: return retval; } static readstat_error_t sav_emit_base_variable_record(readstat_writer_t *writer, readstat_variable_t *r_variable, sav_varnames_t *varnames) { readstat_error_t retval = READSTAT_OK; int32_t rec_type = SAV_RECORD_TYPE_VARIABLE; char name_data[9]; size_t name_data_len = sav_format_variable_name(name_data, sizeof(name_data), varnames); retval = readstat_write_bytes(writer, &rec_type, sizeof(rec_type)); if (retval != READSTAT_OK) goto cleanup; sav_variable_record_t variable = {0}; if (r_variable->type == READSTAT_TYPE_STRING) { variable.type = r_variable->user_width > MAX_STRING_SIZE ? 
MAX_STRING_SIZE : r_variable->user_width; } variable.has_var_label = (r_variable->label[0] != '\0'); retval = sav_n_missing_values(&variable.n_missing_values, r_variable); if (retval != READSTAT_OK) goto cleanup; retval = sav_encode_base_variable_format(r_variable, &variable.print); if (retval != READSTAT_OK) goto cleanup; variable.write = variable.print; memset(variable.name, ' ', sizeof(variable.name)); if (name_data_len > 0 && name_data_len <= sizeof(variable.name)) memcpy(variable.name, name_data, name_data_len); retval = readstat_write_bytes(writer, &variable, sizeof(variable)); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_variable_label(writer, r_variable); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_variable_missing_values(writer, r_variable); if (retval != READSTAT_OK) goto cleanup; int extra_fields = r_variable->storage_width / 8 - 1; if (extra_fields > 31) extra_fields = 31; retval = sav_emit_blank_variable_records(writer, extra_fields); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t sav_emit_ghost_variable_record(readstat_writer_t *writer, readstat_variable_t *r_variable, sav_varnames_t *varnames, int segment, size_t user_width) { readstat_error_t retval = READSTAT_OK; int32_t rec_type = SAV_RECORD_TYPE_VARIABLE; sav_variable_record_t variable = { 0 }; char name_data[9]; size_t name_len = sav_format_ghost_variable_name(name_data, sizeof(name_data), varnames, segment); retval = readstat_write_bytes(writer, &rec_type, sizeof(rec_type)); if (retval != READSTAT_OK) goto cleanup; variable.type = user_width; retval = sav_encode_ghost_variable_format(r_variable, user_width, &variable.print); if (retval != READSTAT_OK) goto cleanup; variable.write = variable.print; memset(variable.name, ' ', sizeof(variable.name)); if (name_len > 0 && name_len <= sizeof(variable.name)) memcpy(variable.name, name_data, name_len); retval = readstat_write_bytes(writer, &variable, sizeof(variable)); if (retval 
!= READSTAT_OK) goto cleanup; int extra_fields = (user_width + 7) / 8 - 1; if (extra_fields > 31) extra_fields = 31; retval = sav_emit_blank_variable_records(writer, extra_fields); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t sav_emit_full_variable_record(readstat_writer_t *writer, readstat_variable_t *r_variable, sav_varnames_t *varnames) { readstat_error_t retval = READSTAT_OK; retval = sav_emit_base_variable_record(writer, r_variable, varnames); if (retval != READSTAT_OK) goto cleanup; int n_segments = sav_variable_segments(r_variable->type, r_variable->user_width); int i; for (i=1; iuser_width - (n_segments - 1) * 252); } retval = sav_emit_ghost_variable_record(writer, r_variable, varnames, i, storage_size); if (retval != READSTAT_OK) goto cleanup; } cleanup: return retval; } static readstat_error_t sav_emit_variable_records(readstat_writer_t *writer, sav_varnames_t *varnames) { readstat_error_t retval = READSTAT_OK; int i; for (i=0; ivariables_count; i++) { readstat_variable_t *r_variable = readstat_get_variable(writer, i); retval = sav_emit_full_variable_record(writer, r_variable, &varnames[i]); if (retval != READSTAT_OK) goto cleanup; } cleanup: return retval; } static readstat_error_t sav_emit_value_label_records(readstat_writer_t *writer) { readstat_error_t retval = READSTAT_OK; int i, j; for (i=0; ilabel_sets_count; i++) { readstat_label_set_t *r_label_set = readstat_get_label_set(writer, i); if (!readstat_label_set_needs_short_value_labels_record(r_label_set)) continue; readstat_type_t user_type = r_label_set->type; int32_t label_count = r_label_set->value_labels_count; int32_t rec_type = 0; if (label_count) { rec_type = SAV_RECORD_TYPE_VALUE_LABEL; retval = readstat_write_bytes(writer, &rec_type, sizeof(rec_type)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &label_count, sizeof(label_count)); if (retval != READSTAT_OK) goto cleanup; for (j=0; jstring_key_len; if (key_len > 
sizeof(value)) key_len = sizeof(value); memset(value, ' ', sizeof(value)); memcpy(value, r_value_label->string_key, key_len); } else if (user_type == READSTAT_TYPE_DOUBLE) { double num_val = r_value_label->double_key; memcpy(value, &num_val, sizeof(double)); } else if (user_type == READSTAT_TYPE_INT32) { double num_val = r_value_label->int32_key; memcpy(value, &num_val, sizeof(double)); } retval = readstat_write_bytes(writer, value, sizeof(value)); const char *label_data = r_value_label->label; uint8_t label_len = MAX_VALUE_LABEL_SIZE; if (label_len > r_value_label->label_len) label_len = r_value_label->label_len; retval = readstat_write_bytes(writer, &label_len, sizeof(label_len)); if (retval != READSTAT_OK) goto cleanup; char label[MAX_VALUE_LABEL_SIZE+8]; memset(label, ' ', sizeof(label)); memcpy(label, label_data, label_len); retval = readstat_write_bytes(writer, label, (label_len + sizeof(label_len) + 7) / 8 * 8 - sizeof(label_len)); if (retval != READSTAT_OK) goto cleanup; } rec_type = SAV_RECORD_TYPE_VALUE_LABEL_VARIABLES; int32_t var_count = readstat_label_set_number_short_variables(r_label_set); retval = readstat_write_bytes(writer, &rec_type, sizeof(rec_type)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &var_count, sizeof(var_count)); if (retval != READSTAT_OK) goto cleanup; for (j=0; jvariables_count; j++) { readstat_variable_t *r_variable = readstat_get_label_set_variable(r_label_set, j); if (r_variable->storage_width > 8) continue; int32_t dictionary_index = 1 + r_variable->offset / 8; retval = readstat_write_bytes(writer, &dictionary_index, sizeof(dictionary_index)); if (retval != READSTAT_OK) goto cleanup; } } } cleanup: return retval; } static readstat_error_t sav_emit_document_record(readstat_writer_t *writer) { readstat_error_t retval = READSTAT_OK; int32_t rec_type = SAV_RECORD_TYPE_DOCUMENT; int32_t n_lines = writer->notes_count; if (n_lines == 0) goto cleanup; retval = readstat_write_bytes(writer, &rec_type, 
sizeof(rec_type));
    if (retval != READSTAT_OK)
        goto cleanup;

    retval = readstat_write_bytes(writer, &n_lines, sizeof(n_lines));
    if (retval != READSTAT_OK)
        goto cleanup;

    int i;
    for (i=0; i<writer->notes_count; i++) {
        size_t len = strlen(writer->notes[i]);
        if (len > SPSS_DOC_LINE_SIZE) {
            retval = READSTAT_ERROR_NOTE_IS_TOO_LONG;
            goto cleanup;
        }
        retval = readstat_write_bytes(writer, writer->notes[i], len);
        if (retval != READSTAT_OK)
            goto cleanup;

        retval = readstat_write_spaces(writer, SPSS_DOC_LINE_SIZE - len);
        if (retval != READSTAT_OK)
            goto cleanup;
    }

cleanup:
    return retval;
}

static readstat_error_t sav_emit_integer_info_record(readstat_writer_t *writer) {
    readstat_error_t retval = READSTAT_OK;

    sav_info_record_t info_header = {
        .rec_type = SAV_RECORD_TYPE_HAS_DATA,
        .subtype = SAV_RECORD_SUBTYPE_INTEGER_INFO,
        .size = 4,
        .count = 8
    };
    sav_machine_integer_info_record_t machine_info = {
        .version_major = 20,
        .version_minor = 0,
        .version_revision = 0,
        .machine_code = -1,
        .floating_point_rep = SAV_FLOATING_POINT_REP_IEEE,
        .compression_code = 1,
        .endianness = machine_is_little_endian() ?
            SAV_ENDIANNESS_LITTLE : SAV_ENDIANNESS_BIG,
        .character_code = 65001 // UTF-8
    };

    retval = readstat_write_bytes(writer, &info_header, sizeof(info_header));
    if (retval != READSTAT_OK)
        goto cleanup;

    retval = readstat_write_bytes(writer, &machine_info, sizeof(machine_info));
    if (retval != READSTAT_OK)
        goto cleanup;

cleanup:
    return retval;
}

static readstat_error_t sav_emit_floating_point_info_record(readstat_writer_t *writer) {
    readstat_error_t retval = READSTAT_OK;

    sav_info_record_t info_header = {
        .rec_type = SAV_RECORD_TYPE_HAS_DATA,
        .subtype = SAV_RECORD_SUBTYPE_FP_INFO,
        .size = 8,
        .count = 3
    };

    retval = readstat_write_bytes(writer, &info_header, sizeof(info_header));
    if (retval != READSTAT_OK)
        goto cleanup;

    sav_machine_floating_point_info_record_t fp_info = {0};

    fp_info.sysmis = SAV_MISSING_DOUBLE;
    fp_info.highest = SAV_HIGHEST_DOUBLE;
    fp_info.lowest = SAV_LOWEST_DOUBLE;

    retval = readstat_write_bytes(writer, &fp_info, sizeof(fp_info));
    if (retval != READSTAT_OK)
        goto cleanup;

cleanup:
    return retval;
}

static readstat_error_t sav_emit_variable_display_record(readstat_writer_t *writer) {
    readstat_error_t retval = READSTAT_OK;
    int i;

    sav_info_record_t info_header = {
        .rec_type = SAV_RECORD_TYPE_HAS_DATA,
        .subtype = SAV_RECORD_SUBTYPE_VAR_DISPLAY,
        .size = sizeof(int32_t)
    };

    int total_segments = 0;
    for (i=0; i<writer->variables_count; i++) {
        readstat_variable_t *r_variable = readstat_get_variable(writer, i);
        total_segments += sav_variable_segments(r_variable->type, r_variable->user_width);
    }

    info_header.count = 3 * total_segments;

    retval = readstat_write_bytes(writer, &info_header, sizeof(info_header));
    if (retval != READSTAT_OK)
        goto cleanup;

    for (i=0; i<writer->variables_count; i++) {
        readstat_variable_t *r_variable = readstat_get_variable(writer, i);
        readstat_measure_t measure = readstat_variable_get_measure(r_variable);
        int32_t sav_measure = spss_measure_from_readstat_measure(measure);

        int32_t sav_display_width = readstat_variable_get_display_width(r_variable);
        if (sav_display_width
                <= 0)
            sav_display_width = 8;

        readstat_alignment_t alignment = readstat_variable_get_alignment(r_variable);
        int32_t sav_alignment = spss_alignment_from_readstat_alignment(alignment);

        int n_segments = sav_variable_segments(r_variable->type, r_variable->user_width);

        while (n_segments--) {
            retval = readstat_write_bytes(writer, &sav_measure, sizeof(int32_t));
            if (retval != READSTAT_OK)
                goto cleanup;

            retval = readstat_write_bytes(writer, &sav_display_width, sizeof(int32_t));
            if (retval != READSTAT_OK)
                goto cleanup;

            retval = readstat_write_bytes(writer, &sav_alignment, sizeof(int32_t));
            if (retval != READSTAT_OK)
                goto cleanup;
        }
    }

cleanup:
    return retval;
}

static readstat_error_t sav_emit_long_var_name_record(readstat_writer_t *writer, sav_varnames_t *varnames) {
    readstat_error_t retval = READSTAT_OK;
    int i;
    sav_info_record_t info_header = {
        .rec_type = SAV_RECORD_TYPE_HAS_DATA,
        .subtype = SAV_RECORD_SUBTYPE_LONG_VAR_NAME,
        .size = 1,
        .count = 0
    };

    for (i=0; i<writer->variables_count; i++) {
        readstat_variable_t *r_variable = readstat_get_variable(writer, i);
        char name_data[9];
        size_t name_data_len = sav_format_variable_name(name_data, sizeof(name_data), &varnames[i]);

        const char *title_data = r_variable->name;
        size_t title_data_len = strlen(title_data);
        if (title_data_len > 0 && name_data_len > 0) {
            if (title_data_len > 64)
                title_data_len = 64;

            info_header.count += name_data_len;
            info_header.count += sizeof("=")-1;
            info_header.count += title_data_len;
            info_header.count += sizeof("\x09")-1;
        }
    }

    if (info_header.count > 0) {
        info_header.count--; /* no trailing 0x09 */

        retval = readstat_write_bytes(writer, &info_header, sizeof(info_header));
        if (retval != READSTAT_OK)
            goto cleanup;

        int is_first = 1;

        for (i=0; i<writer->variables_count; i++) {
            readstat_variable_t *r_variable = readstat_get_variable(writer, i);
            char name_data[9];
            sav_format_variable_name(name_data, sizeof(name_data), &varnames[i]);

            const char *title_data = r_variable->name;
            size_t title_data_len = strlen(title_data);

            char
                kv_separator = '=';
            char tuple_separator = 0x09;

            if (title_data_len > 0) {
                if (title_data_len > 64)
                    title_data_len = 64;

                if (!is_first) {
                    retval = readstat_write_bytes(writer, &tuple_separator, sizeof(tuple_separator));
                    if (retval != READSTAT_OK)
                        goto cleanup;
                }

                retval = readstat_write_string(writer, name_data);
                if (retval != READSTAT_OK)
                    goto cleanup;

                retval = readstat_write_bytes(writer, &kv_separator, sizeof(kv_separator));
                if (retval != READSTAT_OK)
                    goto cleanup;

                retval = readstat_write_bytes(writer, title_data, title_data_len);
                if (retval != READSTAT_OK)
                    goto cleanup;

                is_first = 0;
            }
        }
    }

cleanup:
    return retval;
}

static readstat_error_t sav_emit_very_long_string_record(readstat_writer_t *writer, sav_varnames_t *varnames) {
    readstat_error_t retval = READSTAT_OK;
    int i;
    char tuple_separator[2] = { 0x00, 0x09 };
    sav_info_record_t info_header = {
        .rec_type = SAV_RECORD_TYPE_HAS_DATA,
        .subtype = SAV_RECORD_SUBTYPE_VERY_LONG_STR,
        .size = 1,
        .count = 0
    };

    for (i=0; i<writer->variables_count; i++) {
        readstat_variable_t *r_variable = readstat_get_variable(writer, i);
        if (r_variable->user_width <= MAX_STRING_SIZE)
            continue;

        char name_data[9];
        sav_format_variable_name(name_data, sizeof(name_data), &varnames[i]);

        char kv_data[8+1+5+1];
        snprintf(kv_data, sizeof(kv_data), "%.8s=%d",
                name_data, (unsigned int)r_variable->user_width % 100000);

        info_header.count += strlen(kv_data) + sizeof(tuple_separator);
    }

    if (info_header.count == 0)
        return READSTAT_OK;

    retval = readstat_write_bytes(writer, &info_header, sizeof(info_header));
    if (retval != READSTAT_OK)
        goto cleanup;

    for (i=0; i<writer->variables_count; i++) {
        readstat_variable_t *r_variable = readstat_get_variable(writer, i);
        if (r_variable->user_width <= MAX_STRING_SIZE)
            continue;

        char name_data[9];
        sav_format_variable_name(name_data, sizeof(name_data), &varnames[i]);

        char kv_data[8+1+5+1];
        snprintf(kv_data, sizeof(kv_data), "%.8s=%d",
                name_data, (unsigned int)r_variable->user_width % 100000);

        retval = readstat_write_string(writer, kv_data);
if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, tuple_separator, sizeof(tuple_separator)); if (retval != READSTAT_OK) goto cleanup; } cleanup: return retval; } static readstat_error_t sav_emit_long_string_value_labels_record(readstat_writer_t *writer) { readstat_error_t retval = READSTAT_OK; int i, j, k; char *space_buffer = NULL; sav_info_record_t info_header = { .rec_type = SAV_RECORD_TYPE_HAS_DATA, .subtype = SAV_RECORD_SUBTYPE_LONG_STRING_VALUE_LABELS, .size = 1, .count = 0 }; for (i=0; ilabel_sets_count; i++) { readstat_label_set_t *r_label_set = readstat_get_label_set(writer, i); if (!readstat_label_set_needs_long_value_labels_record(r_label_set)) continue; int32_t label_count = r_label_set->value_labels_count; int32_t var_count = r_label_set->variables_count; for (k=0; kname); int32_t storage_width = readstat_variable_get_storage_width(r_variable); if (storage_width <= 8) continue; info_header.count += sizeof(int32_t); // name length info_header.count += name_len; info_header.count += sizeof(int32_t); // variable width info_header.count += sizeof(int32_t); // label count for (j=0; jlabel_len; if (label_len > MAX_VALUE_LABEL_SIZE) label_len = MAX_VALUE_LABEL_SIZE; info_header.count += sizeof(int32_t); // value length info_header.count += storage_width; info_header.count += sizeof(int32_t); // label length info_header.count += label_len; } } } if (info_header.count == 0) goto cleanup; retval = readstat_write_bytes(writer, &info_header, sizeof(info_header)); if (retval != READSTAT_OK) goto cleanup; for (i=0; ilabel_sets_count; i++) { readstat_label_set_t *r_label_set = readstat_get_label_set(writer, i); if (!readstat_label_set_needs_long_value_labels_record(r_label_set)) continue; int32_t label_count = r_label_set->value_labels_count; int32_t var_count = r_label_set->variables_count; for (k=0; kname); int32_t storage_width = readstat_variable_get_storage_width(r_variable); if (storage_width <= 8) continue; space_buffer = 
realloc(space_buffer, storage_width); memset(space_buffer, ' ', storage_width); retval = readstat_write_bytes(writer, &name_len, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, r_variable->name, name_len); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &storage_width, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &label_count, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; for (j=0; jstring_key_len; int32_t label_len = r_value_label->label_len; if (label_len > MAX_VALUE_LABEL_SIZE) label_len = MAX_VALUE_LABEL_SIZE; retval = readstat_write_bytes(writer, &storage_width, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, r_value_label->string_key, value_len); if (retval != READSTAT_OK) goto cleanup; if (value_len < storage_width) { retval = readstat_write_bytes(writer, space_buffer, storage_width - value_len); if (retval != READSTAT_OK) goto cleanup; } retval = readstat_write_bytes(writer, &label_len, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, r_value_label->label, label_len); if (retval != READSTAT_OK) goto cleanup; } } } cleanup: if (space_buffer) free(space_buffer); return retval; } static readstat_error_t sav_emit_long_string_missing_values_record(readstat_writer_t *writer) { readstat_error_t retval = READSTAT_OK; int j, k; sav_info_record_t info_header = { .rec_type = SAV_RECORD_TYPE_HAS_DATA, .subtype = SAV_RECORD_SUBTYPE_LONG_STRING_MISSING_VALUES, .size = 1, .count = 0 }; int32_t var_count = writer->variables_count; for (k=0; kname); int32_t storage_width = readstat_variable_get_storage_width(r_variable); if (storage_width <= 8) continue; int n_missing_values = 0; for (j=0; jname); int8_t n_missing_values = 0; int32_t storage_width = readstat_variable_get_storage_width(r_variable); if (storage_width <= 8) continue; for 
(j=0; jname, name_len); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &n_missing_values, sizeof(int8_t)); if (retval != READSTAT_OK) goto cleanup; uint32_t value_len = 8; retval = readstat_write_bytes(writer, &value_len, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; for (j=0; jrow_count; retval = readstat_write_bytes(writer, &info_header, sizeof(sav_info_record_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &one, sizeof(uint64_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &ncases, sizeof(uint64_t)); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t sav_emit_termination_record(readstat_writer_t *writer) { sav_dictionary_termination_record_t termination_record = { .rec_type = SAV_RECORD_TYPE_DICT_TERMINATION }; return readstat_write_bytes(writer, &termination_record, sizeof(termination_record)); } static readstat_error_t sav_write_int8(void *row, const readstat_variable_t *var, int8_t value) { double dval = value; memcpy(row, &dval, sizeof(double)); return READSTAT_OK; } static readstat_error_t sav_write_int16(void *row, const readstat_variable_t *var, int16_t value) { double dval = value; memcpy(row, &dval, sizeof(double)); return READSTAT_OK; } static readstat_error_t sav_write_int32(void *row, const readstat_variable_t *var, int32_t value) { double dval = value; memcpy(row, &dval, sizeof(double)); return READSTAT_OK; } static readstat_error_t sav_write_float(void *row, const readstat_variable_t *var, float value) { double dval = value; memcpy(row, &dval, sizeof(double)); return READSTAT_OK; } static readstat_error_t sav_write_double(void *row, const readstat_variable_t *var, double value) { double dval = value; memcpy(row, &dval, sizeof(double)); return READSTAT_OK; } static readstat_error_t sav_write_string(void *row, const readstat_variable_t *var, const char *value) { memset(row, ' ', 
var->storage_width); if (value != NULL && value[0] != '\0') { size_t value_len = strlen(value); off_t row_offset = 0; off_t val_offset = 0; unsigned char *row_bytes = (unsigned char *)row; if (value_len > var->storage_width) return READSTAT_ERROR_STRING_VALUE_IS_TOO_LONG; while (value_len - val_offset > 255) { memcpy(&row_bytes[row_offset], &value[val_offset], 255); row_offset += 256; val_offset += 255; } memcpy(&row_bytes[row_offset], &value[val_offset], value_len - val_offset); } return READSTAT_OK; } static readstat_error_t sav_write_missing_string(void *row, const readstat_variable_t *var) { memset(row, ' ', var->storage_width); return READSTAT_OK; } static readstat_error_t sav_write_missing_number(void *row, const readstat_variable_t *var) { uint64_t missing_val = SAV_MISSING_DOUBLE; memcpy(row, &missing_val, sizeof(uint64_t)); return READSTAT_OK; } static size_t sav_variable_width(readstat_type_t type, size_t user_width) { if (type == READSTAT_TYPE_STRING) { if (user_width > MAX_STRING_SIZE) { size_t n_segments = sav_variable_segments(type, user_width); size_t last_segment_width = ((user_width - (n_segments - 1) * 252) + 7)/8*8; return (n_segments-1)*256 + last_segment_width; } if (user_width == 0) { return 8; } return (user_width + 7) / 8 * 8; } return 8; } static readstat_error_t sav_validate_name_chars(const char *name, int unicode) { /* TODO check Unicode class */ int j; for (j=0; name[j]; j++) { if (name[j] == ' ') return READSTAT_ERROR_NAME_CONTAINS_ILLEGAL_CHARACTER; if ((name[j] > 0 || !unicode) && name[j] != '@' && name[j] != '.' 
                && name[j] != '_' && name[j] != '$' && name[j] != '#' &&
                !(name[j] >= 'a' && name[j] <= 'z') &&
                !(name[j] >= 'A' && name[j] <= 'Z') &&
                !(name[j] >= '0' && name[j] <= '9')) {
            return READSTAT_ERROR_NAME_CONTAINS_ILLEGAL_CHARACTER;
        }
    }
    char first_char = name[0];
    if ((first_char > 0 || !unicode) && first_char != '@' &&
            !(first_char >= 'a' && first_char <= 'z') &&
            !(first_char >= 'A' && first_char <= 'Z')) {
        return READSTAT_ERROR_NAME_BEGINS_WITH_ILLEGAL_CHARACTER;
    }
    return READSTAT_OK;
}

static readstat_error_t sav_validate_name_unreserved(const char *name) {
    /* Reserved words; the comparison is case-sensitive (upper case only) */
    if (strcmp(name, "ALL") == 0 || strcmp(name, "AND") == 0 ||
            strcmp(name, "BY") == 0 || strcmp(name, "EQ") == 0 ||
            strcmp(name, "GE") == 0 || strcmp(name, "GT") == 0 ||
            strcmp(name, "LE") == 0 || strcmp(name, "LT") == 0 ||
            strcmp(name, "NE") == 0 || strcmp(name, "NOT") == 0 ||
            strcmp(name, "OR") == 0 || strcmp(name, "TO") == 0 ||
            strcmp(name, "WITH") == 0)
        return READSTAT_ERROR_NAME_IS_RESERVED_WORD;

    return READSTAT_OK;
}

static readstat_error_t sav_validate_name_length(size_t name_len) {
    if (name_len > 64)
        return READSTAT_ERROR_NAME_IS_TOO_LONG;
    if (name_len == 0)
        return READSTAT_ERROR_NAME_IS_ZERO_LENGTH;
    return READSTAT_OK;
}

static readstat_error_t sav_variable_ok(const readstat_variable_t *variable) {
    readstat_error_t error = READSTAT_OK;

    error = sav_validate_name_length(strlen(variable->name));
    if (error != READSTAT_OK)
        return error;

    error = sav_validate_name_unreserved(variable->name);
    if (error != READSTAT_OK)
        return error;

    return sav_validate_name_chars(variable->name, 1);
}

static sav_varnames_t *sav_varnames_init(readstat_writer_t *writer) {
    sav_varnames_t *varnames = calloc(writer->variables_count, sizeof(sav_varnames_t));

    ck_hash_table_t *table = ck_hash_table_init(writer->variables_count, 8);
    int i, k;
    for (i=0; i<writer->variables_count; i++) {
        readstat_variable_t *r_variable = readstat_get_variable(writer, i);
        const char *name = r_variable->name;
        char *shortname =
varnames[i].shortname; char *stem = varnames[i].stem; snprintf(shortname, sizeof(varnames[0].shortname), "%.8s", name); for (k=0; shortname[k]; k++) { // upcase shortname[k] = toupper(shortname[k]); } if (ck_str_hash_lookup(shortname, table)) { snprintf(shortname, sizeof(varnames[0].shortname), "V%d_A", i+1); } ck_str_hash_insert(shortname, r_variable, table); if (r_variable->user_width <= MAX_STRING_SIZE) continue; snprintf(stem, sizeof(varnames[0].stem), "%.5s", shortname); // conflict resolution? } ck_hash_table_free(table); return varnames; } static readstat_error_t sav_begin_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; readstat_error_t retval = READSTAT_OK; if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; sav_varnames_t *varnames = sav_varnames_init(writer); retval = sav_emit_header(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_variable_records(writer, varnames); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_value_label_records(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_document_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_integer_info_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_floating_point_info_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_variable_display_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_long_var_name_record(writer, varnames); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_very_long_string_record(writer, varnames); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_long_string_value_labels_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_long_string_missing_values_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_number_of_cases_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = sav_emit_termination_record(writer); if 
(retval != READSTAT_OK) goto cleanup; cleanup: free(varnames); if (retval == READSTAT_OK) { size_t row_bound = sav_compressed_row_bound(writer->row_len); if (writer->compression == READSTAT_COMPRESS_ROWS) { writer->module_ctx = readstat_malloc(row_bound); #if HAVE_ZLIB } else if (writer->compression == READSTAT_COMPRESS_BINARY) { writer->module_ctx = zsav_ctx_init(row_bound, writer->bytes_written); #endif } } return retval; } static readstat_error_t sav_write_compressed_row(void *writer_ctx, void *row, size_t len) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; unsigned char *output = writer->module_ctx; size_t output_offset = sav_compress_row(output, row, len, writer); return readstat_write_bytes(writer, output, output_offset); } static readstat_error_t sav_metadata_ok(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; if (writer->version == 2 && writer->compression == READSTAT_COMPRESS_BINARY) return READSTAT_ERROR_UNSUPPORTED_COMPRESSION; if (writer->version != 2 && writer->version != 3) return READSTAT_ERROR_UNSUPPORTED_FILE_FORMAT_VERSION; return READSTAT_OK; } readstat_error_t readstat_begin_writing_sav(readstat_writer_t *writer, void *user_ctx, long row_count) { writer->callbacks.metadata_ok = &sav_metadata_ok; writer->callbacks.variable_width = &sav_variable_width; writer->callbacks.variable_ok = &sav_variable_ok; writer->callbacks.write_int8 = &sav_write_int8; writer->callbacks.write_int16 = &sav_write_int16; writer->callbacks.write_int32 = &sav_write_int32; writer->callbacks.write_float = &sav_write_float; writer->callbacks.write_double = &sav_write_double; writer->callbacks.write_string = &sav_write_string; writer->callbacks.write_missing_string = &sav_write_missing_string; writer->callbacks.write_missing_number = &sav_write_missing_number; writer->callbacks.begin_data = &sav_begin_data; if (writer->version == 3) { writer->compression = READSTAT_COMPRESS_BINARY; } else if (writer->version == 0) { 
writer->version = (writer->compression == READSTAT_COMPRESS_BINARY) ? 3 : 2; } if (writer->compression == READSTAT_COMPRESS_ROWS) { writer->callbacks.write_row = &sav_write_compressed_row; writer->callbacks.module_ctx_free = &free; #if HAVE_ZLIB } else if (writer->compression == READSTAT_COMPRESS_BINARY) { writer->callbacks.write_row = &zsav_write_compressed_row; writer->callbacks.end_data = &zsav_end_data; writer->callbacks.module_ctx_free = (readstat_module_ctx_free_callback)&zsav_ctx_free; #endif } else if (writer->compression == READSTAT_COMPRESS_NONE) { /* void */ } else { return READSTAT_ERROR_UNSUPPORTED_COMPRESSION; } return readstat_begin_writing_file(writer, user_ctx, row_count); } haven/src/readstat/spss/readstat_spss_parse.h0000644000176200001440000000012614101007206021151 0ustar liggesusers readstat_error_t spss_parse_format(const char *data, int count, spss_format_t *fmt); haven/src/readstat/spss/readstat_sav_compress.c0000644000176200001440000001132314101007206021467 0ustar liggesusers#include #include #include "../readstat.h" #include "../readstat_bits.h" #include "../readstat_iconv.h" #include "readstat_sav.h" #include "readstat_sav_compress.h" size_t sav_compressed_row_bound(size_t uncompressed_length) { return uncompressed_length + (uncompressed_length/8 + 8)/8*8; } size_t sav_compress_row(void *output_row, void *input_row, size_t input_len, readstat_writer_t *writer) { unsigned char *output = output_row; unsigned char *input = input_row; off_t input_offset = 0; off_t output_offset = 8; off_t control_offset = 0; int i; memset(&output[control_offset], 0, 8); for (i=0; ivariables_count; i++) { readstat_variable_t *variable = readstat_get_variable(writer, i); if (variable->type == READSTAT_TYPE_STRING) { size_t width = variable->storage_width; while (width > 0) { if (memcmp(&input[input_offset], SAV_EIGHT_SPACES, 8) == 0) { output[control_offset++] = 254; } else { output[control_offset++] = 253; memcpy(&output[output_offset], &input[input_offset], 
8); output_offset += 8; } if (control_offset % 8 == 0) { control_offset = output_offset; memset(&output[control_offset], 0, 8); output_offset += 8; } input_offset += 8; width -= 8; } } else { uint64_t int_value; memcpy(&int_value, &input[input_offset], 8); if (int_value == SAV_MISSING_DOUBLE) { output[control_offset++] = 255; } else { double fp_value; memcpy(&fp_value, &input[input_offset], 8); if (fp_value > -100 && fp_value < 152 && (int)fp_value == fp_value) { output[control_offset++] = (int)fp_value + 100; } else { output[control_offset++] = 253; memcpy(&output[output_offset], &input[input_offset], 8); output_offset += 8; } } if (control_offset % 8 == 0) { control_offset = output_offset; memset(&output[control_offset], 0, 8); output_offset += 8; } input_offset += 8; } } if (writer->current_row + 1 == writer->row_count) output[control_offset] = 252; return output_offset; } void sav_decompress_row(struct sav_row_stream_s *state) { double fp_value; uint64_t missing_value = state->bswap ? byteswap8(state->missing_value) : state->missing_value; int i = 8 - state->i; while (1) { if (i == 8) { if (state->avail_in < 8) { state->status = SAV_ROW_STREAM_NEED_DATA; goto done; } memcpy(state->chunk, state->next_in, 8); state->next_in += 8; state->avail_in -= 8; i = 0; } while (i<8) { switch (state->chunk[i]) { case 0: break; case 252: state->status = SAV_ROW_STREAM_FINISHED_ALL; goto done; case 253: if (state->avail_in < 8) { state->status = SAV_ROW_STREAM_NEED_DATA; goto done; } memcpy(state->next_out, state->next_in, 8); state->next_out += 8; state->avail_out -= 8; state->next_in += 8; state->avail_in -= 8; break; case 254: memset(state->next_out, ' ', 8); state->next_out += 8; state->avail_out -= 8; break; case 255: memcpy(state->next_out, &missing_value, sizeof(uint64_t)); state->next_out += 8; state->avail_out -= 8; break; default: fp_value = state->chunk[i] - state->bias; fp_value = state->bswap ? 
byteswap_double(fp_value) : fp_value; memcpy(state->next_out, &fp_value, sizeof(double)); state->next_out += 8; state->avail_out -= 8; break; } i++; if (state->avail_out < 8) { state->status = SAV_ROW_STREAM_FINISHED_ROW; goto done; } } } done: state->i = 8 - i; } haven/src/readstat/spss/readstat_sav_parse_timestamp.rl0000644000176200001440000000712014101007206023224 0ustar liggesusers #include #include "../readstat.h" #include "../readstat_iconv.h" #include "readstat_sav.h" #include "readstat_sav_parse_timestamp.h" %%{ machine sav_time_parse; write data nofinal noerror; }%% readstat_error_t sav_parse_time(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_cb, void *user_ctx) { readstat_error_t retval = READSTAT_OK; char error_buf[8192]; const char *p = data; const char *pe = p + len; const char *eof = pe; int cs; int temp_val = 0; %%{ action incr_val { temp_val = 10 * temp_val + (fc - '0'); } integer2 = ( " " %{ temp_val = 0; } | [0-9] ${ temp_val = fc - '0'; } ) [0-9] $incr_val; hour = integer2 %{ timestamp->tm_hour = temp_val; }; minute = integer2 %{ timestamp->tm_min = temp_val; }; second = integer2 %{ timestamp->tm_sec = temp_val; }; main := hour ":" minute ":" second; write init; write exec; }%% if (cs < %%{ write first_final; }%%|| p != pe) { if (error_cb) { snprintf(error_buf, sizeof(error_buf), "Invalid time string (length=%d): %.*s", (int)len, (int)len, data); error_cb(error_buf, user_ctx); } retval = READSTAT_ERROR_BAD_TIMESTAMP_STRING; } (void)sav_time_parse_en_main; return retval; } %%{ machine sav_date_parse; write data nofinal noerror; }%% readstat_error_t sav_parse_date(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_cb, void *user_ctx) { readstat_error_t retval = READSTAT_OK; char error_buf[8192]; const char *p = data; const char *pe = p + len; const char *eof = pe; int cs; int temp_val = 0; %%{ action incr_val { char digit = (fc - '0'); if (digit >= 0 && digit <= 9) { temp_val = 10 * 
temp_val + digit; } } action save_year { if (temp_val < 70) { timestamp->tm_year = 100 + temp_val; } else { timestamp->tm_year = temp_val; } } # some files in the wild use space padding instead of 0 padding integer2 = [0-9 ]{2} >{ temp_val = 0; } $incr_val; day = integer2 %{ timestamp->tm_mday = temp_val; }; year = integer2 %save_year; month = ("Jan" | "JAN") %{ timestamp->tm_mon = 0; } | ("Feb" | "FEB") %{ timestamp->tm_mon = 1; } | ("Mar" | "MAR") %{ timestamp->tm_mon = 2; } | ("Apr" | "APR") %{ timestamp->tm_mon = 3; } | ("May" | "MAY") %{ timestamp->tm_mon = 4; } | ("Jun" | "JUN") %{ timestamp->tm_mon = 5; } | ("Jul" | "JUL") %{ timestamp->tm_mon = 6; } | ("Aug" | "AUG") %{ timestamp->tm_mon = 7; } | ("Sep" | "SEP") %{ timestamp->tm_mon = 8; } | ("Oct" | "OCT") %{ timestamp->tm_mon = 9; } | ("Nov" | "NOV") %{ timestamp->tm_mon = 10; } | ("Dec" | "DEC") %{ timestamp->tm_mon = 11; }; # somebody is outputting dash separators main := day [ \-] month [ \-] year; write init; write exec; }%% if (cs < %%{ write first_final; }%%|| p != pe) { if (error_cb) { snprintf(error_buf, sizeof(error_buf), "Invalid date string (length=%d): %.*s", (int)len, (int)len, data); error_cb(error_buf, user_ctx); } retval = READSTAT_ERROR_BAD_TIMESTAMP_STRING; } (void)sav_date_parse_en_main; return retval; } haven/src/readstat/spss/readstat_sav_read.c0000644000176200001440000016212714101765776020607 0ustar liggesusers #include #include #include #include #include #include #include #include #include #include "../readstat.h" #include "../readstat_bits.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" #include "../readstat_malloc.h" #include "readstat_sav.h" #include "readstat_sav_compress.h" #include "readstat_sav_parse.h" #include "readstat_sav_parse_timestamp.h" #if HAVE_ZLIB #include "readstat_zsav_read.h" #endif #define DATA_BUFFER_SIZE 65536 #define VERY_LONG_STRING_MAX_LENGTH INT_MAX /* Others defined in table below */ /* See 
http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx */ static readstat_charset_entry_t _charset_table[] = { { .code = 1, .name = "EBCDIC-US" }, { .code = 2, .name = "WINDOWS-1252" }, /* supposed to be ASCII, but some files are miscoded */ { .code = 3, .name = "WINDOWS-1252" }, { .code = 4, .name = "DEC-KANJI" }, { .code = 437, .name = "CP437" }, { .code = 708, .name = "ASMO-708" }, { .code = 737, .name = "CP737" }, { .code = 775, .name = "CP775" }, { .code = 850, .name = "CP850" }, { .code = 852, .name = "CP852" }, { .code = 855, .name = "CP855" }, { .code = 857, .name = "CP857" }, { .code = 858, .name = "CP858" }, { .code = 860, .name = "CP860" }, { .code = 861, .name = "CP861" }, { .code = 862, .name = "CP862" }, { .code = 863, .name = "CP863" }, { .code = 864, .name = "CP864" }, { .code = 865, .name = "CP865" }, { .code = 866, .name = "CP866" }, { .code = 869, .name = "CP869" }, { .code = 874, .name = "CP874" }, { .code = 932, .name = "SHIFT-JIS" }, { .code = 936, .name = "ISO-IR-58" }, { .code = 949, .name = "ISO-IR-149" }, { .code = 950, .name = "BIG-5" }, { .code = 1200, .name = "UTF-16LE" }, { .code = 1201, .name = "UTF-16BE" }, { .code = 1250, .name = "WINDOWS-1250" }, { .code = 1251, .name = "WINDOWS-1251" }, { .code = 1252, .name = "WINDOWS-1252" }, { .code = 1253, .name = "WINDOWS-1253" }, { .code = 1254, .name = "WINDOWS-1254" }, { .code = 1255, .name = "WINDOWS-1255" }, { .code = 1256, .name = "WINDOWS-1256" }, { .code = 1257, .name = "WINDOWS-1257" }, { .code = 1258, .name = "WINDOWS-1258" }, { .code = 1361, .name = "CP1361" }, { .code = 10000, .name = "MACROMAN" }, { .code = 10004, .name = "MACARABIC" }, { .code = 10005, .name = "MACHEBREW" }, { .code = 10006, .name = "MACGREEK" }, { .code = 10007, .name = "MACCYRILLIC" }, { .code = 10010, .name = "MACROMANIA" }, { .code = 10017, .name = "MACUKRAINE" }, { .code = 10021, .name = "MACTHAI" }, { .code = 10029, .name = "MACCENTRALEUROPE" }, { .code = 10079, .name = "MACICELAND" }, { .code = 
10081, .name = "MACTURKISH" }, { .code = 10082, .name = "MACCROATIAN" }, { .code = 12000, .name = "UTF-32LE" }, { .code = 12001, .name = "UTF-32BE" }, { .code = 20127, .name = "US-ASCII" }, { .code = 20866, .name = "KOI8-R" }, { .code = 20932, .name = "EUC-JP" }, { .code = 21866, .name = "KOI8-U" }, { .code = 28591, .name = "ISO-8859-1" }, { .code = 28592, .name = "ISO-8859-2" }, { .code = 28593, .name = "ISO-8859-3" }, { .code = 28594, .name = "ISO-8859-4" }, { .code = 28595, .name = "ISO-8859-5" }, { .code = 28596, .name = "ISO-8859-6" }, { .code = 28597, .name = "ISO-8859-7" }, { .code = 28598, .name = "ISO-8859-8" }, { .code = 28599, .name = "ISO-8859-9" }, { .code = 28603, .name = "ISO-8859-13" }, { .code = 28605, .name = "ISO-8859-15" }, { .code = 50220, .name = "ISO-2022-JP" }, { .code = 50221, .name = "ISO-2022-JP" }, // same as above? { .code = 50222, .name = "ISO-2022-JP" }, // same as above? { .code = 50225, .name = "ISO-2022-KR" }, { .code = 50229, .name = "ISO-2022-CN" }, { .code = 51932, .name = "EUC-JP" }, { .code = 51936, .name = "GBK" }, { .code = 51949, .name = "EUC-KR" }, { .code = 52936, .name = "HZ-GB-2312" }, { .code = 54936, .name = "GB18030" }, { .code = 65000, .name = "UTF-7" }, { .code = 65001, .name = "UTF-8" } }; #define SAV_LABEL_NAME_PREFIX "labels" typedef struct value_label_s { char raw_value[8]; char utf8_string_value[8*4+1]; readstat_value_t final_value; char *label; } value_label_t; static readstat_error_t sav_update_progress(sav_ctx_t *ctx); static readstat_error_t sav_read_data(sav_ctx_t *ctx); static readstat_error_t sav_read_compressed_data(sav_ctx_t *ctx, readstat_error_t (*row_handler)(unsigned char *, size_t, sav_ctx_t *)); static readstat_error_t sav_read_uncompressed_data(sav_ctx_t *ctx, readstat_error_t (*row_handler)(unsigned char *, size_t, sav_ctx_t *)); static readstat_error_t sav_skip_variable_record(sav_ctx_t *ctx); static readstat_error_t sav_read_variable_record(sav_ctx_t *ctx); static readstat_error_t 
sav_skip_document_record(sav_ctx_t *ctx); static readstat_error_t sav_read_document_record(sav_ctx_t *ctx); static readstat_error_t sav_skip_value_label_record(sav_ctx_t *ctx); static readstat_error_t sav_read_value_label_record(sav_ctx_t *ctx); static readstat_error_t sav_read_dictionary_termination_record(sav_ctx_t *ctx); static readstat_error_t sav_parse_machine_floating_point_record(const void *data, size_t size, size_t count, sav_ctx_t *ctx); static readstat_error_t sav_store_variable_display_parameter_record(const void *data, size_t size, size_t count, sav_ctx_t *ctx); static readstat_error_t sav_parse_variable_display_parameter_record(sav_ctx_t *ctx); static readstat_error_t sav_parse_machine_integer_info_record(const void *data, size_t data_len, sav_ctx_t *ctx); static readstat_error_t sav_parse_long_string_value_labels_record(const void *data, size_t size, size_t count, sav_ctx_t *ctx); static readstat_error_t sav_parse_long_string_missing_values_record(const void *data, size_t size, size_t count, sav_ctx_t *ctx); static void sav_tag_missing_double(readstat_value_t *value, sav_ctx_t *ctx) { double fp_value = value->v.double_value; uint64_t long_value = 0; memcpy(&long_value, &fp_value, 8); if (long_value == ctx->missing_double) value->is_system_missing = 1; if (long_value == ctx->lowest_double) value->is_system_missing = 1; if (long_value == ctx->highest_double) value->is_system_missing = 1; if (isnan(fp_value)) value->is_system_missing = 1; } static readstat_error_t sav_update_progress(sav_ctx_t *ctx) { readstat_io_t *io = ctx->io; return io->update(ctx->file_size, ctx->handle.progress, ctx->user_ctx, io->io_ctx); } static readstat_error_t sav_skip_variable_record(sav_ctx_t *ctx) { sav_variable_record_t variable; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; if (io->read(&variable, sizeof(sav_variable_record_t), io->io_ctx) < sizeof(sav_variable_record_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (variable.has_var_label) { 
uint32_t label_len; if (io->read(&label_len, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } label_len = ctx->bswap ? byteswap4(label_len) : label_len; uint32_t label_capacity = (label_len + 3) / 4 * 4; if (io->seek(label_capacity, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } } if (variable.n_missing_values) { int n_missing_values = ctx->bswap ? byteswap4(variable.n_missing_values) : variable.n_missing_values; if (io->seek(abs(n_missing_values) * sizeof(double), READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } } cleanup: return retval; } static readstat_error_t sav_read_variable_label(spss_varinfo_t *info, sav_ctx_t *ctx) { readstat_io_t *io = ctx->io; readstat_error_t retval = READSTAT_OK; uint32_t label_len, label_capacity; size_t out_label_len; char *label_buf = NULL; if (io->read(&label_len, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } label_len = ctx->bswap ? 
byteswap4(label_len) : label_len; if (label_len == 0) goto cleanup; label_capacity = (label_len + 3) / 4 * 4; if ((label_buf = readstat_malloc(label_capacity)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } out_label_len = (size_t)label_len*4+1; if ((info->label = readstat_malloc(out_label_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (io->read(label_buf, label_capacity, io->io_ctx) < label_capacity) { retval = READSTAT_ERROR_READ; goto cleanup; } retval = readstat_convert(info->label, out_label_len, label_buf, label_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; cleanup: if (label_buf) free(label_buf); if (retval != READSTAT_OK) { if (info->label) { free(info->label); info->label = NULL; } } return retval; } static readstat_error_t sav_read_variable_missing_double_values(spss_varinfo_t *info, sav_ctx_t *ctx) { readstat_io_t *io = ctx->io; int i; readstat_error_t retval = READSTAT_OK; if (io->read(info->missing_double_values, info->n_missing_values * sizeof(double), io->io_ctx) < info->n_missing_values * sizeof(double)) { retval = READSTAT_ERROR_READ; goto cleanup; } for (i=0; i < info->n_missing_values; i++) { if (ctx->bswap) { info->missing_double_values[i] = byteswap_double(info->missing_double_values[i]); } uint64_t long_value = 0; memcpy(&long_value, &info->missing_double_values[i], 8); if (long_value == ctx->missing_double) info->missing_double_values[i] = NAN; if (long_value == ctx->lowest_double) info->missing_double_values[i] = -HUGE_VAL; if (long_value == ctx->highest_double) info->missing_double_values[i] = HUGE_VAL; } cleanup: return retval; } static readstat_error_t sav_read_variable_missing_string_values(spss_varinfo_t *info, sav_ctx_t *ctx) { readstat_io_t *io = ctx->io; int i; readstat_error_t retval = READSTAT_OK; for (i=0; i < info->n_missing_values; i++) { char missing_value[8]; if (io->read(missing_value, sizeof(missing_value), io->io_ctx) < sizeof(missing_value)) { retval = READSTAT_ERROR_READ; goto cleanup; } 
retval = readstat_convert(info->missing_string_values[i], sizeof(info->missing_string_values[0]), missing_value, sizeof(missing_value), ctx->converter); if (retval != READSTAT_OK) goto cleanup; } cleanup: return retval; } static readstat_error_t sav_read_variable_missing_values(spss_varinfo_t *info, sav_ctx_t *ctx) { if (info->n_missing_values > 3 || info->n_missing_values < -3) { return READSTAT_ERROR_PARSE; } if (info->n_missing_values < 0) { info->missing_range = 1; info->n_missing_values = abs(info->n_missing_values); } else { info->missing_range = 0; } if (info->type == READSTAT_TYPE_DOUBLE) { return sav_read_variable_missing_double_values(info, ctx); } return sav_read_variable_missing_string_values(info, ctx); } static readstat_error_t sav_read_variable_record(sav_ctx_t *ctx) { readstat_io_t *io = ctx->io; sav_variable_record_t variable = { 0 }; spss_varinfo_t *info = NULL; readstat_error_t retval = READSTAT_OK; if (ctx->var_index == ctx->varinfo_capacity) { if ((ctx->varinfo = readstat_realloc(ctx->varinfo, (ctx->varinfo_capacity *= 2) * sizeof(spss_varinfo_t *))) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } } if (io->read(&variable, sizeof(sav_variable_record_t), io->io_ctx) < sizeof(sav_variable_record_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } variable.print = ctx->bswap ? byteswap4(variable.print) : variable.print; variable.write = ctx->bswap ? byteswap4(variable.write) : variable.write; int32_t type = ctx->bswap ? 
byteswap4(variable.type) : variable.type; if (type < 0) { if (ctx->var_index == 0) { return READSTAT_ERROR_PARSE; } ctx->var_offset++; ctx->varinfo[ctx->var_index-1]->width++; return 0; } if ((info = readstat_calloc(1, sizeof(spss_varinfo_t))) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } info->width = 1; info->n_segments = 1; info->index = ctx->var_index; info->offset = ctx->var_offset; info->labels_index = -1; retval = readstat_convert(info->name, sizeof(info->name), variable.name, sizeof(variable.name), NULL); if (retval != READSTAT_OK) goto cleanup; retval = readstat_convert(info->longname, sizeof(info->longname), variable.name, sizeof(variable.name), NULL); if (retval != READSTAT_OK) goto cleanup; info->print_format.decimal_places = (variable.print & 0x000000FF); info->print_format.width = (variable.print & 0x0000FF00) >> 8; info->print_format.type = (variable.print & 0x00FF0000) >> 16; info->write_format.decimal_places = (variable.write & 0x000000FF); info->write_format.width = (variable.write & 0x0000FF00) >> 8; info->write_format.type = (variable.write & 0x00FF0000) >> 16; if (type > 0 || info->print_format.type == SPSS_FORMAT_TYPE_A || info->write_format.type == SPSS_FORMAT_TYPE_A) { info->type = READSTAT_TYPE_STRING; } else { info->type = READSTAT_TYPE_DOUBLE; } if (variable.has_var_label) { if ((retval = sav_read_variable_label(info, ctx)) != READSTAT_OK) { goto cleanup; } } if (variable.n_missing_values) { info->n_missing_values = ctx->bswap ? 
byteswap4(variable.n_missing_values) : variable.n_missing_values; if ((retval = sav_read_variable_missing_values(info, ctx)) != READSTAT_OK) { goto cleanup; } } ctx->varinfo[ctx->var_index] = info; ctx->var_index++; ctx->var_offset++; cleanup: if (retval != READSTAT_OK) { spss_varinfo_free(info); } return retval; } static readstat_error_t sav_skip_value_label_record(sav_ctx_t *ctx) { uint32_t label_count; uint32_t rec_type; uint32_t var_count; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; if (io->read(&label_count, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) label_count = byteswap4(label_count); int i; for (i=0; i < label_count; i++) { unsigned char unpadded_len = 0; size_t padded_len = 0; if (io->seek(8, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->read(&unpadded_len, 1, io->io_ctx) < 1) { retval = READSTAT_ERROR_READ; goto cleanup; } padded_len = (unpadded_len + 8) / 8 * 8 - 1; if (io->seek(padded_len, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } } if (io->read(&rec_type, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) rec_type = byteswap4(rec_type); if (rec_type != 4) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (io->read(&var_count, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) var_count = byteswap4(var_count); if (io->seek(var_count * sizeof(uint32_t), READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } cleanup: return retval; } static readstat_error_t sav_submit_value_labels(value_label_t *value_labels, int32_t label_count, readstat_type_t value_type, sav_ctx_t *ctx) { char label_name_buf[256]; readstat_error_t retval = READSTAT_OK; int32_t i; snprintf(label_name_buf, sizeof(label_name_buf), SAV_LABEL_NAME_PREFIX "%d", ctx->value_labels_count); for (i=0; i < label_count; i++) { value_label_t *vlabel = &value_labels[i]; if (ctx->handle.value_label(label_name_buf, 
vlabel->final_value, vlabel->label, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } cleanup: return retval; } static readstat_error_t sav_read_value_label_record(sav_ctx_t *ctx) { uint32_t label_count; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; uint32_t *vars = NULL; uint32_t var_count; int32_t rec_type; readstat_type_t value_type = READSTAT_TYPE_STRING; char label_buf[256]; value_label_t *value_labels = NULL; if (io->read(&label_count, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) label_count = byteswap4(label_count); if (label_count && (value_labels = readstat_calloc(label_count, sizeof(value_label_t))) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } int i; for (i=0; i < label_count; i++) { value_label_t *vlabel = &value_labels[i]; unsigned char unpadded_label_len = 0; size_t padded_label_len = 0, utf8_label_len = 0; if (io->read(vlabel->raw_value, 8, io->io_ctx) < 8) { retval = READSTAT_ERROR_READ; goto cleanup; } if (io->read(&unpadded_label_len, 1, io->io_ctx) < 1) { retval = READSTAT_ERROR_READ; goto cleanup; } padded_label_len = (unpadded_label_len + 8) / 8 * 8 - 1; if (io->read(label_buf, padded_label_len, io->io_ctx) < padded_label_len) { retval = READSTAT_ERROR_READ; goto cleanup; } utf8_label_len = padded_label_len*4+1; if ((vlabel->label = readstat_malloc(utf8_label_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } retval = readstat_convert(vlabel->label, utf8_label_len, label_buf, padded_label_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; } if (io->read(&rec_type, sizeof(int32_t), io->io_ctx) < sizeof(int32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) rec_type = byteswap4(rec_type); if (rec_type != 4) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (io->read(&var_count, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) var_count = byteswap4(var_count); if (var_count && (vars = readstat_malloc(var_count * sizeof(uint32_t))) == NULL) { retval 
= READSTAT_ERROR_MALLOC; goto cleanup; } if (io->read(vars, var_count * sizeof(uint32_t), io->io_ctx) < var_count * sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } for (i=0; i < var_count; i++) { uint32_t var_offset = vars[i]; if (ctx->bswap) var_offset = byteswap4(var_offset); var_offset--; // Why subtract 1????
spss_varinfo_t **var = bsearch(&var_offset, ctx->varinfo, ctx->var_index, sizeof(spss_varinfo_t *), &spss_varinfo_compare); if (var) { (*var)->labels_index = ctx->value_labels_count; value_type = (*var)->type; } } for (i=0; i < label_count; i++) { value_label_t *vlabel = &value_labels[i]; double val_d = 0.0; vlabel->final_value.type = value_type; if (value_type == READSTAT_TYPE_DOUBLE) { memcpy(&val_d, vlabel->raw_value, 8); if (ctx->bswap) val_d = byteswap_double(val_d); vlabel->final_value.v.double_value = val_d; sav_tag_missing_double(&vlabel->final_value, ctx); } else { retval = readstat_convert(vlabel->utf8_string_value, sizeof(vlabel->utf8_string_value), vlabel->raw_value, 8, ctx->converter); if (retval != READSTAT_OK) break; vlabel->final_value.v.string_value = vlabel->utf8_string_value; } } if (ctx->handle.value_label) { sav_submit_value_labels(value_labels, label_count, value_type, ctx); } ctx->value_labels_count++; cleanup: if (vars) free(vars); if (value_labels) { for (i=0; i < label_count; i++) { value_label_t *vlabel = &value_labels[i]; if (vlabel->label) free(vlabel->label); } free(value_labels); } return retval; } static readstat_error_t sav_skip_document_record(sav_ctx_t *ctx) { uint32_t n_lines; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; if (io->read(&n_lines, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) n_lines = byteswap4(n_lines); if (io->seek(n_lines * SPSS_DOC_LINE_SIZE, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } cleanup: return retval; } static readstat_error_t sav_read_document_record(sav_ctx_t *ctx) { if (!ctx->handle.note) return sav_skip_document_record(ctx); uint32_t n_lines; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; if (io->read(&n_lines, sizeof(uint32_t), io->io_ctx) < 
sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) n_lines = byteswap4(n_lines); char raw_buffer[SPSS_DOC_LINE_SIZE]; char utf8_buffer[4*SPSS_DOC_LINE_SIZE+1]; int i; for (i=0; i < n_lines; i++) { if (io->read(raw_buffer, SPSS_DOC_LINE_SIZE, io->io_ctx) < SPSS_DOC_LINE_SIZE) { retval = READSTAT_ERROR_READ; goto cleanup; } retval = readstat_convert(utf8_buffer, sizeof(utf8_buffer), raw_buffer, sizeof(raw_buffer), ctx->converter); if (retval != READSTAT_OK) goto cleanup; if (ctx->handle.note(i, utf8_buffer, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } cleanup: return retval; } static readstat_error_t sav_read_dictionary_termination_record(sav_ctx_t *ctx) { int32_t filler; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; if (io->read(&filler, sizeof(int32_t), io->io_ctx) < sizeof(int32_t)) { retval = READSTAT_ERROR_READ; } return retval; } static readstat_error_t sav_process_row(unsigned char *buffer, size_t buffer_len, sav_ctx_t *ctx) { if (ctx->row_offset) { ctx->row_offset--; return READSTAT_OK; } readstat_error_t retval = READSTAT_OK; double fp_value; int offset = 0; readstat_off_t data_offset = 0; size_t raw_str_used = 0; int segment_offset = 0; int var_index = 0, col = 0; int raw_str_is_utf8 = ctx->input_encoding && !strcmp(ctx->input_encoding, "UTF-8"); while (data_offset < buffer_len && col < ctx->var_index && var_index < ctx->var_index) { spss_varinfo_t *col_info = ctx->varinfo[col]; spss_varinfo_t *var_info = ctx->varinfo[var_index]; readstat_value_t value = { .type = var_info->type }; if (offset > 31) { retval = READSTAT_ERROR_PARSE; goto done; } if (var_info->type == READSTAT_TYPE_STRING) { if (raw_str_used + 8 <= ctx->raw_string_len) { if (raw_str_is_utf8) { /* Skip null bytes, see https://github.com/tidyverse/haven/issues/560 */ char c; for (int i=0; i<8; i++) if ((c = buffer[data_offset+i])) ctx->raw_string[raw_str_used++] = c; } else { memcpy(ctx->raw_string + raw_str_used, 
&buffer[data_offset], 8); raw_str_used += 8; } } if (++offset == col_info->width) { if (++segment_offset < var_info->n_segments) { raw_str_used--; } offset = 0; col++; } if (segment_offset == var_info->n_segments) { if (!ctx->variables[var_info->index]->skip) { retval = readstat_convert(ctx->utf8_string, ctx->utf8_string_len, ctx->raw_string, raw_str_used, ctx->converter); if (retval != READSTAT_OK) goto done; value.v.string_value = ctx->utf8_string; if (ctx->handle.value(ctx->current_row, ctx->variables[var_info->index], value, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto done; } } raw_str_used = 0; segment_offset = 0; var_index += var_info->n_segments; } } else if (var_info->type == READSTAT_TYPE_DOUBLE) { if (!ctx->variables[var_info->index]->skip) { memcpy(&fp_value, &buffer[data_offset], 8); if (ctx->bswap) { fp_value = byteswap_double(fp_value); } value.v.double_value = fp_value; sav_tag_missing_double(&value, ctx); if (ctx->handle.value(ctx->current_row, ctx->variables[var_info->index], value, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto done; } } var_index += var_info->n_segments; col++; } data_offset += 8; } ctx->current_row++; done: return retval; } static readstat_error_t sav_read_data(sav_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; size_t longest_string = 256; int i; for (i=0; i < ctx->var_index;) { spss_varinfo_t *info = ctx->varinfo[i]; if (info->string_length > longest_string) { longest_string = info->string_length; } i += info->n_segments; } ctx->raw_string_len = longest_string + sizeof(SAV_EIGHT_SPACES)-2; ctx->raw_string = readstat_malloc(ctx->raw_string_len); ctx->utf8_string_len = 4*longest_string+1 + sizeof(SAV_EIGHT_SPACES)-2; ctx->utf8_string = readstat_malloc(ctx->utf8_string_len); if (ctx->raw_string == NULL || ctx->utf8_string == NULL) { retval = READSTAT_ERROR_MALLOC; goto done; } if (ctx->compression == READSTAT_COMPRESS_ROWS) { retval = 
sav_read_compressed_data(ctx, &sav_process_row); } else if (ctx->compression == READSTAT_COMPRESS_BINARY) { #if HAVE_ZLIB retval = zsav_read_compressed_data(ctx, &sav_process_row); #else retval = READSTAT_ERROR_UNSUPPORTED_COMPRESSION; #endif } else { retval = sav_read_uncompressed_data(ctx, &sav_process_row); } if (retval != READSTAT_OK) goto done; if (ctx->record_count >= 0 && ctx->current_row != ctx->row_limit) { retval = READSTAT_ERROR_ROW_COUNT_MISMATCH; } done: return retval; } static readstat_error_t sav_read_uncompressed_data(sav_ctx_t *ctx, readstat_error_t (*row_handler)(unsigned char *, size_t, sav_ctx_t *)) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; unsigned char *buffer = NULL; size_t bytes_read = 0; size_t buffer_len = ctx->var_offset * 8; buffer = readstat_malloc(buffer_len); if (ctx->row_offset) { if (io->seek(buffer_len * ctx->row_offset, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto done; } ctx->row_offset = 0; } while (ctx->row_limit == -1 || ctx->current_row < ctx->row_limit) { retval = sav_update_progress(ctx); if (retval != READSTAT_OK) goto done; if ((bytes_read = io->read(buffer, buffer_len, io->io_ctx)) != buffer_len) goto done; retval = row_handler(buffer, buffer_len, ctx); if (retval != READSTAT_OK) goto done; } done: if (buffer) free(buffer); return retval; } static readstat_error_t sav_read_compressed_data(sav_ctx_t *ctx, readstat_error_t (*row_handler)(unsigned char *, size_t, sav_ctx_t *)) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; readstat_off_t data_offset = 0; unsigned char buffer[DATA_BUFFER_SIZE]; int buffer_used = 0; size_t uncompressed_row_len = ctx->var_offset * 8; readstat_off_t uncompressed_offset = 0; unsigned char *uncompressed_row = NULL; struct sav_row_stream_s state = { .missing_value = ctx->missing_double, .bias = ctx->bias, .bswap = ctx->bswap }; if (uncompressed_row_len && (uncompressed_row = readstat_malloc(uncompressed_row_len)) == 
NULL) { retval = READSTAT_ERROR_MALLOC; goto done; } while (1) { retval = sav_update_progress(ctx); if (retval != READSTAT_OK) goto done; buffer_used = io->read(buffer, sizeof(buffer), io->io_ctx); if (buffer_used == -1 || buffer_used == 0 || (buffer_used % 8) != 0) goto done; state.status = SAV_ROW_STREAM_HAVE_DATA; data_offset = 0; while (state.status != SAV_ROW_STREAM_NEED_DATA) { state.next_in = &buffer[data_offset]; state.avail_in = buffer_used - data_offset; state.next_out = &uncompressed_row[uncompressed_offset]; state.avail_out = uncompressed_row_len - uncompressed_offset; sav_decompress_row(&state); uncompressed_offset = uncompressed_row_len - state.avail_out; data_offset = buffer_used - state.avail_in; if (state.status == SAV_ROW_STREAM_FINISHED_ROW) { retval = row_handler(uncompressed_row, uncompressed_row_len, ctx); if (retval != READSTAT_OK) goto done; uncompressed_offset = 0; } if (state.status == SAV_ROW_STREAM_FINISHED_ALL) goto done; if (ctx->row_limit > 0 && ctx->current_row == ctx->row_limit) goto done; } } done: if (uncompressed_row) free(uncompressed_row); return retval; } static readstat_error_t sav_parse_machine_integer_info_record(const void *data, size_t data_len, sav_ctx_t *ctx) { if (data_len != 32) return READSTAT_ERROR_PARSE; const char *src_charset = NULL; const char *dst_charset = ctx->output_encoding; sav_machine_integer_info_record_t record; memcpy(&record, data, data_len); if (ctx->bswap) { record.character_code = byteswap4(record.character_code); } if (ctx->input_encoding) { src_charset = ctx->input_encoding; } else { int i; for (i=0; i < sizeof(_charset_table)/sizeof(_charset_table[0]); i++) { if (record.character_code == _charset_table[i].code) { src_charset = _charset_table[i].name; break; } } if (src_charset == NULL) { if (ctx->handle.error) { char error_buf[1024]; snprintf(error_buf, sizeof(error_buf), "Unsupported character set: %d\n", record.character_code); ctx->handle.error(error_buf, ctx->user_ctx); } return READSTAT_ERROR_UNSUPPORTED_CHARSET; } ctx->input_encoding = src_charset; } if (src_charset && dst_charset) {
// You might be tempted to skip the charset conversion when src_charset
// and dst_charset are the 
// same. However, some versions of SPSS insert
// illegally truncated strings (e.g. the last character is three bytes
// but the field only has room for two bytes). So to prevent the client
// from receiving an invalid byte sequence, we ram everything through
// our iconv machinery.
iconv_t converter = iconv_open(dst_charset, src_charset); if (converter == (iconv_t)-1) { return READSTAT_ERROR_UNSUPPORTED_CHARSET; } if (ctx->converter) { iconv_close(ctx->converter); } ctx->converter = converter; } return READSTAT_OK; } static readstat_error_t sav_parse_machine_floating_point_record(const void *data, size_t size, size_t count, sav_ctx_t *ctx) { if (size != 8 || count != 3) return READSTAT_ERROR_PARSE; sav_machine_floating_point_info_record_t fp_info; memcpy(&fp_info, data, sizeof(sav_machine_floating_point_info_record_t)); ctx->missing_double = ctx->bswap ? byteswap8(fp_info.sysmis) : fp_info.sysmis; ctx->highest_double = ctx->bswap ? byteswap8(fp_info.highest) : fp_info.highest; ctx->lowest_double = ctx->bswap ? byteswap8(fp_info.lowest) : fp_info.lowest; return READSTAT_OK; } /* We don't yet know how many real variables there are, so store the values in the record * and make sense of them later. */ static readstat_error_t sav_store_variable_display_parameter_record(const void *data, size_t size, size_t count, sav_ctx_t *ctx) { if (size != 4) return READSTAT_ERROR_PARSE; const uint32_t *data_ptr = data; int i; ctx->variable_display_values = readstat_realloc(ctx->variable_display_values, count * sizeof(uint32_t)); if (count > 0 && ctx->variable_display_values == NULL) return READSTAT_ERROR_MALLOC; ctx->variable_display_values_count = count; for (i=0; i < count; i++) { ctx->variable_display_values[i] = ctx->bswap ? 
byteswap4(data_ptr[i]) : data_ptr[i]; } return READSTAT_OK; } static readstat_error_t sav_parse_variable_display_parameter_record(sav_ctx_t *ctx) { if (!ctx->variable_display_values) return READSTAT_OK; int i; long count = ctx->variable_display_values_count; if (count != 2 * ctx->var_index && count != 3 * ctx->var_index) { return READSTAT_ERROR_PARSE; } int has_display_width = ctx->var_index > 0 && (count / ctx->var_index == 3); int offset = 0; for (i=0; i < ctx->var_index;) { spss_varinfo_t *info = ctx->varinfo[i]; offset = (2 + has_display_width)*i; info->measure = spss_measure_to_readstat_measure(ctx->variable_display_values[offset++]); if (has_display_width) { info->display_width = ctx->variable_display_values[offset++]; } info->alignment = spss_alignment_to_readstat_alignment(ctx->variable_display_values[offset++]); i += info->n_segments; } return READSTAT_OK; } static readstat_error_t sav_read_pascal_string(char *buf, size_t buf_len, const char **inout_data_ptr, size_t data_ptr_len, sav_ctx_t *ctx) { const char *data_ptr = *inout_data_ptr; const char *data_end = data_ptr + data_ptr_len; readstat_error_t retval = READSTAT_OK; uint32_t var_name_len = 0; if (data_ptr + sizeof(uint32_t) > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } memcpy(&var_name_len, data_ptr, sizeof(uint32_t)); if (ctx->bswap) var_name_len = byteswap4(var_name_len); data_ptr += sizeof(uint32_t); if (data_ptr + var_name_len > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } retval = readstat_convert(buf, buf_len, data_ptr, var_name_len, NULL); if (retval != READSTAT_OK) goto cleanup; data_ptr += var_name_len; cleanup: *inout_data_ptr = data_ptr; return retval; } static readstat_error_t sav_parse_long_string_value_labels_record(const void *data, size_t size, size_t count, sav_ctx_t *ctx) { if (!ctx->handle.value_label) return READSTAT_OK; if (size != 1) return READSTAT_ERROR_PARSE; readstat_error_t retval = READSTAT_OK; uint32_t label_count = 0; uint32_t i = 0; const char 
*data_ptr = data; const char *data_end = data_ptr + count; char var_name_buf[256+1]; // unconverted
char label_name_buf[256]; char *value_buffer = NULL; char *label_buffer = NULL; while (data_ptr < data_end) { memset(label_name_buf, '\0', sizeof(label_name_buf)); retval = sav_read_pascal_string(var_name_buf, sizeof(var_name_buf), &data_ptr, data_end - data_ptr, ctx); if (retval != READSTAT_OK) goto cleanup; for (i=0; i < ctx->var_index;) { spss_varinfo_t *info = ctx->varinfo[i]; if (strcmp(var_name_buf, info->longname) == 0) { info->labels_index = ctx->value_labels_count++; snprintf(label_name_buf, sizeof(label_name_buf), SAV_LABEL_NAME_PREFIX "%d", info->labels_index); break; } i += info->n_segments; } if (label_name_buf[0] == '\0') { retval = READSTAT_ERROR_PARSE; goto cleanup; } data_ptr += sizeof(uint32_t); if (data_ptr + sizeof(uint32_t) > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } memcpy(&label_count, data_ptr, sizeof(uint32_t)); if (ctx->bswap) label_count = byteswap4(label_count); data_ptr += sizeof(uint32_t); for (i=0; i < label_count; i++) { uint32_t value_len = 0, label_len = 0; uint32_t value_buffer_len = 0, label_buffer_len = 0; if (data_ptr + sizeof(uint32_t) > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } memcpy(&value_len, data_ptr, sizeof(uint32_t)); if (ctx->bswap) value_len = byteswap4(value_len); data_ptr += sizeof(uint32_t); value_buffer_len = value_len*4+1; value_buffer = readstat_realloc(value_buffer, value_buffer_len); if (value_buffer == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (data_ptr + value_len > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } retval = readstat_convert(value_buffer, value_buffer_len, data_ptr, value_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; data_ptr += value_len; if (data_ptr + sizeof(uint32_t) > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } memcpy(&label_len, data_ptr, sizeof(uint32_t)); if (ctx->bswap) label_len = byteswap4(label_len); data_ptr += sizeof(uint32_t); label_buffer_len = label_len*4+1; label_buffer = readstat_realloc(label_buffer, label_buffer_len); if (label_buffer 
== NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (data_ptr + label_len > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } retval = readstat_convert(label_buffer, label_buffer_len, data_ptr, label_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; data_ptr += label_len; readstat_value_t value = { .type = READSTAT_TYPE_STRING }; value.v.string_value = value_buffer; if (ctx->handle.value_label(label_name_buf, value, label_buffer, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } } if (data_ptr != data_end) { retval = READSTAT_ERROR_PARSE; } cleanup: if (value_buffer) free(value_buffer); if (label_buffer) free(label_buffer); return retval; } static readstat_error_t sav_parse_long_string_missing_values_record(const void *data, size_t size, size_t count, sav_ctx_t *ctx) { if (size != 1) return READSTAT_ERROR_PARSE; readstat_error_t retval = READSTAT_OK; uint32_t i = 0, j = 0; const char *data_ptr = data; const char *data_end = data_ptr + count; char var_name_buf[256+1]; while (data_ptr < data_end) { retval = sav_read_pascal_string(var_name_buf, sizeof(var_name_buf), &data_ptr, data_end - data_ptr, ctx); if (retval != READSTAT_OK) goto cleanup; if (data_ptr == data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } char n_missing_values = *data_ptr++; if (n_missing_values < 1 || n_missing_values > 3) { retval = READSTAT_ERROR_PARSE; goto cleanup; } for (i=0; i < ctx->var_index;) { spss_varinfo_t *info = ctx->varinfo[i]; if (strcmp(var_name_buf, info->longname) == 0) { info->n_missing_values = n_missing_values; uint32_t var_name_len = 0; if (data_ptr + sizeof(uint32_t) > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } memcpy(&var_name_len, data_ptr, sizeof(uint32_t)); if (ctx->bswap) var_name_len = byteswap4(var_name_len); data_ptr += sizeof(uint32_t); for (j=0; j < n_missing_values; j++) { if (data_ptr + var_name_len > data_end) { retval = READSTAT_ERROR_PARSE; goto cleanup; } retval = readstat_convert(info->missing_string_values[j], 
sizeof(info->missing_string_values[0]), data_ptr, var_name_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; data_ptr += var_name_len; } break; } i += info->n_segments; } if (i == ctx->var_index) { retval = READSTAT_ERROR_PARSE; goto cleanup; } } if (data_ptr != data_end) { retval = READSTAT_ERROR_PARSE; } cleanup: return retval; } static readstat_error_t sav_parse_records_pass1(sav_ctx_t *ctx) { char data_buf[4096]; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; while (1) { uint32_t rec_type; uint32_t extra_info[3]; size_t data_len = 0; int i; int done = 0; if (io->read(&rec_type, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) { rec_type = byteswap4(rec_type); } switch (rec_type) { case SAV_RECORD_TYPE_VARIABLE: retval = sav_skip_variable_record(ctx); if (retval != READSTAT_OK) goto cleanup; break; case SAV_RECORD_TYPE_VALUE_LABEL: retval = sav_skip_value_label_record(ctx); if (retval != READSTAT_OK) goto cleanup; break; case SAV_RECORD_TYPE_DOCUMENT: retval = sav_skip_document_record(ctx); if (retval != READSTAT_OK) goto cleanup; break; case SAV_RECORD_TYPE_DICT_TERMINATION: done = 1; break; case SAV_RECORD_TYPE_HAS_DATA: if (io->read(extra_info, sizeof(extra_info), io->io_ctx) < sizeof(extra_info)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) { for (i=0; i<3; i++) extra_info[i] = byteswap4(extra_info[i]); } uint32_t subtype = extra_info[0]; size_t size = extra_info[1]; size_t count = extra_info[2]; data_len = size * count; if (subtype == SAV_RECORD_SUBTYPE_INTEGER_INFO) { if (data_len > sizeof(data_buf)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (io->read(data_buf, data_len, io->io_ctx) < data_len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } retval = sav_parse_machine_integer_info_record(data_buf, data_len, ctx); if (retval != READSTAT_OK) goto cleanup; } else { if (io->seek(data_len, READSTAT_SEEK_CUR, io->io_ctx) == -1) { 
retval = READSTAT_ERROR_SEEK; goto cleanup; } } break; default: retval = READSTAT_ERROR_PARSE; goto cleanup; break; } if (done) break; } cleanup: return retval; } static readstat_error_t sav_parse_records_pass2(sav_ctx_t *ctx) { void *data_buf = NULL; size_t data_buf_capacity = 4096; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; if ((data_buf = readstat_malloc(data_buf_capacity)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } while (1) { uint32_t rec_type; uint32_t extra_info[3]; size_t data_len = 0; int i; int done = 0; if (io->read(&rec_type, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) { rec_type = byteswap4(rec_type); } switch (rec_type) { case SAV_RECORD_TYPE_VARIABLE: if ((retval = sav_read_variable_record(ctx)) != READSTAT_OK) goto cleanup; break; case SAV_RECORD_TYPE_VALUE_LABEL: if ((retval = sav_read_value_label_record(ctx)) != READSTAT_OK) goto cleanup; break; case SAV_RECORD_TYPE_DOCUMENT: if ((retval = sav_read_document_record(ctx)) != READSTAT_OK) goto cleanup; break; case SAV_RECORD_TYPE_DICT_TERMINATION: if ((retval = sav_read_dictionary_termination_record(ctx)) != READSTAT_OK) goto cleanup; done = 1; break; case SAV_RECORD_TYPE_HAS_DATA: if (io->read(extra_info, sizeof(extra_info), io->io_ctx) < sizeof(extra_info)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->bswap) { for (i=0; i<3; i++) extra_info[i] = byteswap4(extra_info[i]); } uint32_t subtype = extra_info[0]; size_t size = extra_info[1]; size_t count = extra_info[2]; data_len = size * count; if (data_buf_capacity < data_len) { if ((data_buf = readstat_realloc(data_buf, data_buf_capacity = data_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } } if (data_len == 0 || io->read(data_buf, data_len, io->io_ctx) < data_len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } switch (subtype) { case SAV_RECORD_SUBTYPE_INTEGER_INFO: /* parsed in pass 1 */ break; case 
SAV_RECORD_SUBTYPE_FP_INFO: retval = sav_parse_machine_floating_point_record(data_buf, size, count, ctx); if (retval != READSTAT_OK) goto cleanup; break; case SAV_RECORD_SUBTYPE_VAR_DISPLAY: retval = sav_store_variable_display_parameter_record(data_buf, size, count, ctx); if (retval != READSTAT_OK) goto cleanup; break; case SAV_RECORD_SUBTYPE_LONG_VAR_NAME: retval = sav_parse_long_variable_names_record(data_buf, count, ctx); if (retval != READSTAT_OK) goto cleanup; break; case SAV_RECORD_SUBTYPE_VERY_LONG_STR: retval = sav_parse_very_long_string_record(data_buf, count, ctx); if (retval != READSTAT_OK) goto cleanup; break; case SAV_RECORD_SUBTYPE_LONG_STRING_VALUE_LABELS: retval = sav_parse_long_string_value_labels_record(data_buf, size, count, ctx); if (retval != READSTAT_OK) goto cleanup; break; case SAV_RECORD_SUBTYPE_LONG_STRING_MISSING_VALUES: retval = sav_parse_long_string_missing_values_record(data_buf, size, count, ctx); if (retval != READSTAT_OK) goto cleanup; break; default: /* misc. 
info */ break; } break; default: retval = READSTAT_ERROR_PARSE; goto cleanup; break; } if (done) break; } cleanup: if (data_buf) free(data_buf); return retval; } static readstat_error_t sav_set_n_segments_and_var_count(sav_ctx_t *ctx) { int i; ctx->var_count = 0; for (i=0; i<ctx->var_index;) { spss_varinfo_t *info = ctx->varinfo[i]; if (info->string_length > VERY_LONG_STRING_MAX_LENGTH) return READSTAT_ERROR_PARSE; if (info->string_length) { info->n_segments = (info->string_length + 251) / 252; } info->index = ctx->var_count++; i += info->n_segments; } ctx->variables = readstat_calloc(ctx->var_count, sizeof(readstat_variable_t *)); return READSTAT_OK; } static readstat_error_t sav_handle_variables(sav_ctx_t *ctx) { int i; int index_after_skipping = 0; readstat_error_t retval = READSTAT_OK; if (!ctx->handle.variable) return retval; for (i=0; i<ctx->var_index;) { char label_name_buf[256]; spss_varinfo_t *info = ctx->varinfo[i]; ctx->variables[info->index] = spss_init_variable_for_info(info, index_after_skipping, ctx->converter); snprintf(label_name_buf, sizeof(label_name_buf), SAV_LABEL_NAME_PREFIX "%d", info->labels_index); int cb_retval = ctx->handle.variable(info->index, ctx->variables[info->index], info->labels_index == -1 ? 
NULL : label_name_buf, ctx->user_ctx); if (cb_retval == READSTAT_HANDLER_ABORT) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } if (cb_retval == READSTAT_HANDLER_SKIP_VARIABLE) { ctx->variables[info->index]->skip = 1; } else { index_after_skipping++; } i += info->n_segments; } cleanup: return retval; } static readstat_error_t sav_handle_fweight(sav_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; int i; if (ctx->handle.fweight && ctx->fweight_index >= 0) { for (i=0; i<ctx->var_index;) { spss_varinfo_t *info = ctx->varinfo[i]; if (info->offset == ctx->fweight_index - 1) { if (ctx->handle.fweight(ctx->variables[info->index], ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } break; } i += info->n_segments; } } cleanup: return retval; } readstat_error_t sav_parse_timestamp(sav_ctx_t *ctx, sav_file_header_record_t *header) { readstat_error_t retval = READSTAT_OK; struct tm timestamp = { .tm_isdst = -1 }; if ((retval = sav_parse_time(header->creation_time, sizeof(header->creation_time), &timestamp, ctx->handle.error, ctx->user_ctx)) != READSTAT_OK) goto cleanup; if ((retval = sav_parse_date(header->creation_date, sizeof(header->creation_date), &timestamp, ctx->handle.error, ctx->user_ctx)) != READSTAT_OK) goto cleanup; ctx->timestamp = mktime(&timestamp); cleanup: return retval; } readstat_error_t readstat_parse_sav(readstat_parser_t *parser, const char *path, void *user_ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = parser->io; sav_file_header_record_t header; sav_ctx_t *ctx = NULL; size_t file_size = 0; if (io->open(path, io->io_ctx) == -1) { return READSTAT_ERROR_OPEN; } file_size = io->seek(0, READSTAT_SEEK_END, io->io_ctx); if (file_size == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->seek(0, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->read(&header, sizeof(sav_file_header_record_t), io->io_ctx) < sizeof(sav_file_header_record_t)) { retval = 
READSTAT_ERROR_READ; goto cleanup; } ctx = sav_ctx_init(&header, io); if (ctx == NULL) { retval = READSTAT_ERROR_PARSE; goto cleanup; } ctx->handle = parser->handlers; ctx->input_encoding = parser->input_encoding; ctx->output_encoding = parser->output_encoding; ctx->user_ctx = user_ctx; ctx->file_size = file_size; if (parser->row_offset > 0) ctx->row_offset = parser->row_offset; if (ctx->record_count >= 0) { int record_count_after_skipping = ctx->record_count - ctx->row_offset; if (record_count_after_skipping < 0) { record_count_after_skipping = 0; ctx->row_offset = ctx->record_count; } ctx->row_limit = record_count_after_skipping; if (parser->row_limit > 0 && parser->row_limit < record_count_after_skipping) ctx->row_limit = parser->row_limit; } else if (parser->row_limit > 0) { ctx->row_limit = parser->row_limit; } if ((retval = sav_parse_timestamp(ctx, &header)) != READSTAT_OK) goto cleanup; if ((retval = sav_parse_records_pass1(ctx)) != READSTAT_OK) goto cleanup; if (io->seek(sizeof(sav_file_header_record_t), READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if ((retval = sav_update_progress(ctx)) != READSTAT_OK) goto cleanup; if ((retval = sav_parse_records_pass2(ctx)) != READSTAT_OK) goto cleanup; if ((retval = sav_set_n_segments_and_var_count(ctx)) != READSTAT_OK) goto cleanup; if (ctx->var_count == 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (ctx->handle.metadata) { readstat_metadata_t metadata = { .row_count = ctx->record_count < 0 ? 
-1 : ctx->row_limit, .var_count = ctx->var_count, .file_encoding = ctx->input_encoding, .file_format_version = ctx->format_version, .creation_time = ctx->timestamp, .modified_time = ctx->timestamp, .compression = ctx->compression, .endianness = ctx->endianness }; if ((retval = readstat_convert(ctx->file_label, sizeof(ctx->file_label), header.file_label, sizeof(header.file_label), ctx->converter)) != READSTAT_OK) goto cleanup; metadata.file_label = ctx->file_label; if (ctx->handle.metadata(&metadata, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } if ((retval = sav_parse_variable_display_parameter_record(ctx)) != READSTAT_OK) goto cleanup; if ((retval = sav_handle_variables(ctx)) != READSTAT_OK) goto cleanup; if ((retval = sav_handle_fweight(ctx)) != READSTAT_OK) goto cleanup; if (ctx->handle.value) { retval = sav_read_data(ctx); } cleanup: io->close(io->io_ctx); if (ctx) sav_ctx_free(ctx); return retval; } haven/src/readstat/spss/readstat_spss.c0000644000176200001440000002151314101007206017755 0ustar liggesusers #include #include #include "../readstat.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" #include "readstat_spss.h" #include "readstat_spss_parse.h" static char spss_type_strings[][16] = { [SPSS_FORMAT_TYPE_A] = "A", [SPSS_FORMAT_TYPE_AHEX] = "AHEX", [SPSS_FORMAT_TYPE_COMMA] = "COMMA", [SPSS_FORMAT_TYPE_DOLLAR] = "DOLLAR", [SPSS_FORMAT_TYPE_F] = "F", [SPSS_FORMAT_TYPE_IB] = "IB", [SPSS_FORMAT_TYPE_PIBHEX] = "PIBHEX", [SPSS_FORMAT_TYPE_P] = "P", [SPSS_FORMAT_TYPE_PIB] = "PIB", [SPSS_FORMAT_TYPE_PK] = "PK", [SPSS_FORMAT_TYPE_RB] = "RB", [SPSS_FORMAT_TYPE_RBHEX] = "RBHEX", [SPSS_FORMAT_TYPE_Z] = "Z", [SPSS_FORMAT_TYPE_N] = "N", [SPSS_FORMAT_TYPE_E] = "E", [SPSS_FORMAT_TYPE_DATE] = "DATE", [SPSS_FORMAT_TYPE_TIME] = "TIME", [SPSS_FORMAT_TYPE_DATETIME] = "DATETIME", [SPSS_FORMAT_TYPE_ADATE] = "ADATE", [SPSS_FORMAT_TYPE_JDATE] = "JDATE", [SPSS_FORMAT_TYPE_DTIME] = "DTIME", [SPSS_FORMAT_TYPE_WKDAY] = 
"WKDAY", [SPSS_FORMAT_TYPE_MONTH] = "MONTH", [SPSS_FORMAT_TYPE_MOYR] = "MOYR", [SPSS_FORMAT_TYPE_QYR] = "QYR", [SPSS_FORMAT_TYPE_WKYR] = "WKYR", [SPSS_FORMAT_TYPE_PCT] = "PCT", [SPSS_FORMAT_TYPE_DOT] = "DOT", [SPSS_FORMAT_TYPE_CCA] = "CCA", [SPSS_FORMAT_TYPE_CCB] = "CCB", [SPSS_FORMAT_TYPE_CCC] = "CCC", [SPSS_FORMAT_TYPE_CCD] = "CCD", [SPSS_FORMAT_TYPE_CCE] = "CCE", [SPSS_FORMAT_TYPE_EDATE] = "EDATE", [SPSS_FORMAT_TYPE_SDATE] = "SDATE", [SPSS_FORMAT_TYPE_MTIME] = "MTIME", [SPSS_FORMAT_TYPE_YMDHMS] = "YMDHMS", }; int spss_format(char *buffer, size_t len, spss_format_t *format) { if (format->type < 0 || format->type >= sizeof(spss_type_strings)/sizeof(spss_type_strings[0]) || spss_type_strings[format->type][0] == '\0') { return 0; } char *string = spss_type_strings[format->type]; if (format->decimal_places || format->type == SPSS_FORMAT_TYPE_F) { snprintf(buffer, len, "%s%d.%d", string, format->width, format->decimal_places); } else if (format->width) { snprintf(buffer, len, "%s%d", string, format->width); } else { snprintf(buffer, len, "%s", string); } return 1; } int spss_varinfo_compare(const void *elem1, const void *elem2) { int offset = *(int *)elem1; const spss_varinfo_t *v = *(const spss_varinfo_t **)elem2; if (offset < v->offset) return -1; return (offset > v->offset); } void spss_varinfo_free(spss_varinfo_t *info) { if (info) { if (info->label) free(info->label); free(info); } } uint64_t spss_64bit_value(readstat_value_t value) { double dval = readstat_double_value(value); uint64_t special_val; memcpy(&special_val, &dval, sizeof(double)); if (isinf(dval)) { if (dval < 0.0) { special_val = SAV_LOWEST_DOUBLE; } else { special_val = SAV_HIGHEST_DOUBLE; } } else if (isnan(dval)) { special_val = SAV_MISSING_DOUBLE; } return special_val; } static readstat_value_t spss_boxed_double_value(double fp_value) { readstat_value_t value = { .type = READSTAT_TYPE_DOUBLE, .v = { .double_value = fp_value }, .is_system_missing = isnan(fp_value) }; return value; } static 
readstat_value_t spss_boxed_string_value(const char *string) { readstat_value_t value = { .type = READSTAT_TYPE_STRING, .v = { .string_value = string } }; return value; } static readstat_value_t spss_boxed_missing_value(spss_varinfo_t *info, int i) { if (info->type == READSTAT_TYPE_DOUBLE) { return spss_boxed_double_value(info->missing_double_values[i]); } return spss_boxed_string_value(info->missing_string_values[i]); } readstat_missingness_t spss_missingness_for_info(spss_varinfo_t *info) { readstat_missingness_t missingness; memset(&missingness, '\0', sizeof(readstat_missingness_t)); if (info->missing_range) { missingness.missing_ranges_count++; missingness.missing_ranges[0] = spss_boxed_missing_value(info, 0); missingness.missing_ranges[1] = spss_boxed_missing_value(info, 1); if (info->n_missing_values == 3) { missingness.missing_ranges_count++; missingness.missing_ranges[2] = missingness.missing_ranges[3] = spss_boxed_missing_value(info, 2); } } else if (info->n_missing_values > 0) { missingness.missing_ranges_count = info->n_missing_values; int i=0; for (i=0; i<info->n_missing_values; i++) { missingness.missing_ranges[2*i] = missingness.missing_ranges[2*i+1] = spss_boxed_missing_value(info, i); } } return missingness; } readstat_variable_t *spss_init_variable_for_info(spss_varinfo_t *info, int index_after_skipping, iconv_t converter) { readstat_variable_t *variable = calloc(1, sizeof(readstat_variable_t)); variable->index = info->index; variable->index_after_skipping = index_after_skipping; variable->type = info->type; if (info->string_length) { variable->storage_width = info->string_length; } else { variable->storage_width = 8 * info->width; } if (info->longname[0]) { readstat_convert(variable->name, sizeof(variable->name), info->longname, sizeof(info->longname), converter); } else { readstat_convert(variable->name, sizeof(variable->name), info->name, sizeof(info->name), converter); } if (info->label) { snprintf(variable->label, sizeof(variable->label), "%s", 
info->label); } spss_format(variable->format, sizeof(variable->format), &info->print_format); variable->missingness = spss_missingness_for_info(info); variable->measure = info->measure; if (info->display_width) { variable->display_width = info->display_width; } else { variable->display_width = info->print_format.width; } return variable; } uint32_t spss_measure_from_readstat_measure(readstat_measure_t measure) { uint32_t sav_measure = SAV_MEASURE_UNKNOWN; if (measure == READSTAT_MEASURE_NOMINAL) { sav_measure = SAV_MEASURE_NOMINAL; } else if (measure == READSTAT_MEASURE_ORDINAL) { sav_measure = SAV_MEASURE_ORDINAL; } else if (measure == READSTAT_MEASURE_SCALE) { sav_measure = SAV_MEASURE_SCALE; } return sav_measure; } readstat_measure_t spss_measure_to_readstat_measure(uint32_t sav_measure) { if (sav_measure == SAV_MEASURE_NOMINAL) return READSTAT_MEASURE_NOMINAL; if (sav_measure == SAV_MEASURE_ORDINAL) return READSTAT_MEASURE_ORDINAL; if (sav_measure == SAV_MEASURE_SCALE) return READSTAT_MEASURE_SCALE; return READSTAT_MEASURE_UNKNOWN; } uint32_t spss_alignment_from_readstat_alignment(readstat_alignment_t alignment) { uint32_t sav_alignment = 0; if (alignment == READSTAT_ALIGNMENT_LEFT) { sav_alignment = SAV_ALIGNMENT_LEFT; } else if (alignment == READSTAT_ALIGNMENT_CENTER) { sav_alignment = SAV_ALIGNMENT_CENTER; } else if (alignment == READSTAT_ALIGNMENT_RIGHT) { sav_alignment = SAV_ALIGNMENT_RIGHT; } return sav_alignment; } readstat_alignment_t spss_alignment_to_readstat_alignment(uint32_t sav_alignment) { if (sav_alignment == SAV_ALIGNMENT_LEFT) return READSTAT_ALIGNMENT_LEFT; if (sav_alignment == SAV_ALIGNMENT_CENTER) return READSTAT_ALIGNMENT_CENTER; if (sav_alignment == SAV_ALIGNMENT_RIGHT) return READSTAT_ALIGNMENT_RIGHT; return READSTAT_ALIGNMENT_UNKNOWN; } readstat_error_t spss_format_for_variable(readstat_variable_t *r_variable, spss_format_t *spss_format) { readstat_error_t retval = READSTAT_OK; memset(spss_format, 0, sizeof(spss_format_t)); if 
(r_variable->type == READSTAT_TYPE_STRING) { spss_format->type = SPSS_FORMAT_TYPE_A; if (r_variable->display_width) { spss_format->width = r_variable->display_width; } else if (r_variable->user_width) { spss_format->width = r_variable->user_width; } else { spss_format->width = r_variable->storage_width; } } else { spss_format->type = SPSS_FORMAT_TYPE_F; if (r_variable->display_width) { spss_format->width = r_variable->display_width; } else { spss_format->width = 8; } if (r_variable->type == READSTAT_TYPE_DOUBLE || r_variable->type == READSTAT_TYPE_FLOAT) { spss_format->decimal_places = 2; } } if (r_variable->format[0]) { spss_format->decimal_places = 0; const char *fmt = r_variable->format; if (spss_parse_format(fmt, strlen(fmt), spss_format) != READSTAT_OK) { retval = READSTAT_ERROR_BAD_FORMAT_STRING; goto cleanup; } } cleanup: return retval; } haven/src/readstat/spss/readstat_por.h0000644000176200001440000000240314101007206017567 0ustar liggesusers extern int8_t por_ascii_lookup[256]; extern uint16_t por_unicode_lookup[256]; typedef struct por_ctx_s { readstat_callbacks_t handle; size_t file_size; void *user_ctx; int pos; readstat_io_t *io; char space; long num_spaces; time_t timestamp; long version; char fweight_name[9]; char file_label[21]; uint16_t byte2unicode[256]; size_t base30_precision; iconv_t converter; unsigned char *string_buffer; size_t string_buffer_len; int labels_offset; int obs_count; int var_count; int var_offset; int row_limit; int row_offset; readstat_variable_t **variables; spss_varinfo_t *varinfo; ck_hash_table_t *var_dict; } por_ctx_t; por_ctx_t *por_ctx_init(); void por_ctx_free(por_ctx_t *ctx); ssize_t por_utf8_encode(const unsigned char *input, size_t input_len, char *output, size_t output_len, uint16_t lookup[256]); ssize_t por_utf8_decode( const char *input, size_t input_len, char *output, size_t output_len, uint8_t *lookup, size_t lookup_len); haven/src/readstat/readstat_convert.c0000644000176200001440000000245614101007206017462 
0ustar liggesusers #include #include "readstat.h" #include "readstat_iconv.h" #include "readstat_convert.h" readstat_error_t readstat_convert(char *dst, size_t dst_len, const char *src, size_t src_len, iconv_t converter) { /* strip off spaces from the input because the programs use ASCII space * padding even with non-ASCII encoding. */ while (src_len && src[src_len-1] == ' ') { src_len--; } if (dst_len == 0) { return READSTAT_ERROR_CONVERT_LONG_STRING; } else if (converter) { size_t dst_left = dst_len - 1; char *dst_end = dst; size_t status = iconv(converter, (readstat_iconv_inbuf_t)&src, &src_len, &dst_end, &dst_left); if (status == (size_t)-1) { if (errno == E2BIG) { return READSTAT_ERROR_CONVERT_LONG_STRING; } else if (errno == EILSEQ) { return READSTAT_ERROR_CONVERT_BAD_STRING; } else if (errno != EINVAL) { /* EINVAL indicates improper truncation; accept it */ return READSTAT_ERROR_CONVERT; } } dst[dst_len - dst_left - 1] = '\0'; } else if (src_len + 1 > dst_len) { return READSTAT_ERROR_CONVERT_LONG_STRING; } else { memcpy(dst, src, src_len); dst[src_len] = '\0'; } return READSTAT_OK; } haven/src/readstat/readstat_parser.c0000644000176200001440000000754114101007206017276 0ustar liggesusers #include #include "readstat.h" #include "readstat_io_unistd.h" readstat_parser_t *readstat_parser_init() { readstat_parser_t *parser = calloc(1, sizeof(readstat_parser_t)); parser->io = calloc(1, sizeof(readstat_io_t)); if (unistd_io_init(parser) != READSTAT_OK) { readstat_parser_free(parser); return NULL; } parser->output_encoding = "UTF-8"; return parser; } void readstat_parser_free(readstat_parser_t *parser) { if (parser) { if (parser->io) { readstat_set_io_ctx(parser, NULL); free(parser->io); } free(parser); } } readstat_error_t readstat_set_metadata_handler(readstat_parser_t *parser, readstat_metadata_handler metadata_handler) { parser->handlers.metadata = metadata_handler; return READSTAT_OK; } readstat_error_t readstat_set_note_handler(readstat_parser_t *parser, 
readstat_note_handler note_handler) { parser->handlers.note = note_handler; return READSTAT_OK; } readstat_error_t readstat_set_variable_handler(readstat_parser_t *parser, readstat_variable_handler variable_handler) { parser->handlers.variable = variable_handler; return READSTAT_OK; } readstat_error_t readstat_set_value_handler(readstat_parser_t *parser, readstat_value_handler value_handler) { parser->handlers.value = value_handler; return READSTAT_OK; } readstat_error_t readstat_set_value_label_handler(readstat_parser_t *parser, readstat_value_label_handler label_handler) { parser->handlers.value_label = label_handler; return READSTAT_OK; } readstat_error_t readstat_set_error_handler(readstat_parser_t *parser, readstat_error_handler error_handler) { parser->handlers.error = error_handler; return READSTAT_OK; } readstat_error_t readstat_set_progress_handler(readstat_parser_t *parser, readstat_progress_handler progress_handler) { parser->handlers.progress = progress_handler; return READSTAT_OK; } readstat_error_t readstat_set_fweight_handler(readstat_parser_t *parser, readstat_fweight_handler fweight_handler) { parser->handlers.fweight = fweight_handler; return READSTAT_OK; } readstat_error_t readstat_set_open_handler(readstat_parser_t *parser, readstat_open_handler open_handler) { parser->io->open = open_handler; return READSTAT_OK; } readstat_error_t readstat_set_close_handler(readstat_parser_t *parser, readstat_close_handler close_handler) { parser->io->close = close_handler; return READSTAT_OK; } readstat_error_t readstat_set_seek_handler(readstat_parser_t *parser, readstat_seek_handler seek_handler) { parser->io->seek = seek_handler; return READSTAT_OK; } readstat_error_t readstat_set_read_handler(readstat_parser_t *parser, readstat_read_handler read_handler) { parser->io->read = read_handler; return READSTAT_OK; } readstat_error_t readstat_set_update_handler(readstat_parser_t *parser, readstat_update_handler update_handler) { parser->io->update = 
update_handler; return READSTAT_OK; } readstat_error_t readstat_set_io_ctx(readstat_parser_t *parser, void *io_ctx) { if (parser->io->io_ctx_needs_free) { free(parser->io->io_ctx); } parser->io->io_ctx = io_ctx; parser->io->io_ctx_needs_free = 0; return READSTAT_OK; } readstat_error_t readstat_set_file_character_encoding(readstat_parser_t *parser, const char *encoding) { parser->input_encoding = encoding; return READSTAT_OK; } readstat_error_t readstat_set_handler_character_encoding(readstat_parser_t *parser, const char *encoding) { parser->output_encoding = encoding; return READSTAT_OK; } readstat_error_t readstat_set_row_limit(readstat_parser_t *parser, long row_limit) { parser->row_limit = row_limit; return READSTAT_OK; } readstat_error_t readstat_set_row_offset(readstat_parser_t *parser, long row_offset) { parser->row_offset = row_offset; return READSTAT_OK; } haven/src/readstat/readstat_variable.c0000644000176200001440000001067514101007206017571 0ustar liggesusers #include #include "readstat.h" static readstat_value_t make_blank_value(); static readstat_value_t make_double_value(double dval); static readstat_value_t make_blank_value() { readstat_value_t value = { .is_system_missing = 1, .v = { .double_value = NAN }, .type = READSTAT_TYPE_DOUBLE }; return value; } static readstat_value_t make_double_value(double dval) { readstat_value_t value = { .v = { .double_value = dval }, .type = READSTAT_TYPE_DOUBLE }; return value; } static readstat_value_t make_string_value(const char *string) { readstat_value_t value = { .v = { .string_value = string }, .type = READSTAT_TYPE_STRING }; return value; } const char *readstat_variable_get_name(const readstat_variable_t *variable) { if (variable->name[0]) return variable->name; return NULL; } const char *readstat_variable_get_label(const readstat_variable_t *variable) { if (variable->label[0]) return variable->label; return NULL; } const char *readstat_variable_get_format(const readstat_variable_t *variable) { if 
(variable->format[0]) return variable->format; return NULL; } readstat_type_t readstat_variable_get_type(const readstat_variable_t *variable) { return variable->type; } readstat_type_class_t readstat_variable_get_type_class(const readstat_variable_t *variable) { return readstat_type_class(variable->type); } int readstat_variable_get_index(const readstat_variable_t *variable) { return variable->index; } int readstat_variable_get_index_after_skipping(const readstat_variable_t *variable) { return variable->index_after_skipping; } size_t readstat_variable_get_storage_width(const readstat_variable_t *variable) { return variable->storage_width; } readstat_measure_t readstat_variable_get_measure(const readstat_variable_t *variable) { return variable->measure; } readstat_alignment_t readstat_variable_get_alignment(const readstat_variable_t *variable) { return variable->alignment; } int readstat_variable_get_display_width(const readstat_variable_t *variable) { return variable->display_width; } int readstat_variable_get_missing_ranges_count(const readstat_variable_t *variable) { return variable->missingness.missing_ranges_count; } readstat_value_t readstat_variable_get_missing_range_lo(const readstat_variable_t *variable, int i) { if (i < variable->missingness.missing_ranges_count && 2*i+1 < sizeof(variable->missingness.missing_ranges)/sizeof(variable->missingness.missing_ranges[0])) { return variable->missingness.missing_ranges[2*i]; } return make_blank_value(); } readstat_value_t readstat_variable_get_missing_range_hi(const readstat_variable_t *variable, int i) { if (i < variable->missingness.missing_ranges_count && 2*i+1 < sizeof(variable->missingness.missing_ranges)/sizeof(variable->missingness.missing_ranges[0])) { return variable->missingness.missing_ranges[2*i+1]; } return make_blank_value(); } static readstat_error_t readstat_variable_add_missing_value_range(readstat_variable_t *variable, readstat_value_t lo, readstat_value_t hi) { int i = 
readstat_variable_get_missing_ranges_count(variable); if (2*i < sizeof(variable->missingness.missing_ranges)/sizeof(variable->missingness.missing_ranges[0])) { variable->missingness.missing_ranges[2*i] = lo; variable->missingness.missing_ranges[2*i+1] = hi; variable->missingness.missing_ranges_count++; return READSTAT_OK; } return READSTAT_ERROR_TOO_MANY_MISSING_VALUE_DEFINITIONS; } readstat_error_t readstat_variable_add_missing_double_range(readstat_variable_t *variable, double lo, double hi) { return readstat_variable_add_missing_value_range(variable, make_double_value(lo), make_double_value(hi)); } readstat_error_t readstat_variable_add_missing_double_value(readstat_variable_t *variable, double value) { return readstat_variable_add_missing_value_range(variable, make_double_value(value), make_double_value(value)); } readstat_error_t readstat_variable_add_missing_string_range(readstat_variable_t *variable, const char *lo, const char *hi) { return readstat_variable_add_missing_value_range(variable, make_string_value(lo), make_string_value(hi)); } readstat_error_t readstat_variable_add_missing_string_value(readstat_variable_t *variable, const char *value) { return readstat_variable_add_missing_value_range(variable, make_string_value(value), make_string_value(value)); } haven/src/readstat/readstat_io_unistd.c0000644000176200001440000000704414101765776020024 0ustar liggesusers #include #include #include #if defined _WIN32 # include # include #endif #if !defined(_MSC_VER) # include #else #define open _open #define read _read #define close _close #endif #if defined _WIN32 || defined __CYGWIN__ #define UNISTD_OPEN_OPTIONS O_RDONLY | O_BINARY #elif defined _AIX #define UNISTD_OPEN_OPTIONS O_RDONLY | O_LARGEFILE #else #define UNISTD_OPEN_OPTIONS O_RDONLY #endif #if defined _WIN32 #define lseek _lseeki64 #elif defined _AIX #define lseek lseek64 #endif #include "readstat.h" #include "readstat_io_unistd.h" int open_with_unicode(const char *path, int options) { #if defined 
_WIN32 const int buffer_size = MultiByteToWideChar(CP_UTF8, 0, path, -1, NULL, 0); if(buffer_size <= 0) return -1; wchar_t* wpath = malloc((buffer_size + 1) * sizeof(wchar_t)); const int res = MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, buffer_size); wpath[buffer_size] = 0; if(res <= 0) { free(wpath); return -1; } int fd = _wopen(wpath, options); free(wpath); return fd; #else return open(path, options); #endif } int unistd_open_handler(const char *path, void *io_ctx) { int fd = open_with_unicode(path, UNISTD_OPEN_OPTIONS); ((unistd_io_ctx_t*) io_ctx)->fd = fd; return fd; } int unistd_close_handler(void *io_ctx) { int fd = ((unistd_io_ctx_t*) io_ctx)->fd; if (fd != -1) return close(fd); else return 0; } readstat_off_t unistd_seek_handler(readstat_off_t offset, readstat_io_flags_t whence, void *io_ctx) { int flag = 0; switch(whence) { case READSTAT_SEEK_SET: flag = SEEK_SET; break; case READSTAT_SEEK_CUR: flag = SEEK_CUR; break; case READSTAT_SEEK_END: flag = SEEK_END; break; default: return -1; } int fd = ((unistd_io_ctx_t*) io_ctx)->fd; return lseek(fd, offset, flag); } ssize_t unistd_read_handler(void *buf, size_t nbyte, void *io_ctx) { int fd = ((unistd_io_ctx_t*) io_ctx)->fd; ssize_t out = read(fd, buf, nbyte); return out; } readstat_error_t unistd_update_handler(long file_size, readstat_progress_handler progress_handler, void *user_ctx, void *io_ctx) { if (!progress_handler) return READSTAT_OK; int fd = ((unistd_io_ctx_t*) io_ctx)->fd; readstat_off_t current_offset = lseek(fd, 0, SEEK_CUR); if (current_offset == -1) return READSTAT_ERROR_SEEK; if (progress_handler(1.0 * current_offset / file_size, user_ctx)) return READSTAT_ERROR_USER_ABORT; return READSTAT_OK; } readstat_error_t unistd_io_init(readstat_parser_t *parser) { readstat_error_t retval = READSTAT_OK; unistd_io_ctx_t *io_ctx = NULL; if ((retval = readstat_set_open_handler(parser, unistd_open_handler)) != READSTAT_OK) return retval; if ((retval = readstat_set_close_handler(parser, 
unistd_close_handler)) != READSTAT_OK) return retval; if ((retval = readstat_set_seek_handler(parser, unistd_seek_handler)) != READSTAT_OK) return retval; if ((retval = readstat_set_read_handler(parser, unistd_read_handler)) != READSTAT_OK) return retval; if ((readstat_set_update_handler(parser, unistd_update_handler)) != READSTAT_OK) return retval; io_ctx = calloc(1, sizeof(unistd_io_ctx_t)); io_ctx->fd = -1; retval = readstat_set_io_ctx(parser, (void*) io_ctx); parser->io->io_ctx_needs_free = 1; return retval; } haven/src/readstat/readstat_strings.h0000644000176200001440000000017314101007206017472 0ustar liggesusers#if defined(_MSC_VER) # define strncasecmp _strnicmp # define strcasecmp _stricmp #else # include #endif haven/src/readstat/NEWS0000644000176200001440000001073414101765776014473 0ustar liggesusersNew in 1.1.6: * Migrate to GitHub Actions * Regenerate parsers with Ragel 7 and update build script * SAS7BDAT reader: Improved large file support on Windows #226 * SAV reader: Skip null bytes in UTF-8 data https://github.com/tidyverse/haven/issues/560 * SAV reader: Fix hang (oss-fuzz/23485) * DTA reader: Disallow str0 type * DTA reader: Fix encoding error when garbage values are present beyond the end of a string * Command file readers: Fix integer overflow (oss-fuzz/15778) * `extract_metadata`: Implement duration support #223 (thanks to @basgys) * Support for SAS files created with SAS Visual Forecaster #232 * Report format widths for date/time SAS formats #233 * Document the meaning of a -1 return value from `readstat_get_row_count` #234 * Fix SAS file creation / modification times on Windows #238 #240 New in 1.1.5: * Support for building with MSVC #214 (thanks to @zebrys and @jonathon-love) * CLI tools: Support non-ASCII file paths on Windows #200 #216 (thanks to @zebrys) * DTA reader: Ignore bad timestamps * DTA writer: Fix memory leak * DTA writer: Improved support for empty value labels #219 * POR reader: Improved support for date/time formats #160 * 
SAS7BDAT reader: Added support for reading the dataset label #180 #213 (thanks to @reikoch)
* SAS7BDAT reader: Improved detection of compressed files
* SAS7BDAT reader: Improved bounds checking OSS-Fuzz/28312
* SAS7BDAT reader: Support for more character encodings
* SAV reader: Tolerate illegal lowercase variable names #217
* SAV reader: Better support for non-UTF-8 variable names
* SAV reader: Fix format widths for very long strings https://github.com/Roche/pyreadstat/issues/77
* SAV reader: Fix undefined behavior with negative row counts OSS-Fuzz/23423

New in 1.1.4:
* SAS7BDAT reader: Add support for binary-compressed files #21
* XPT v8 writer: Improve compatibility with SAS #207 (thanks to @reikoch)
* XPT reader: Fix reading of long variable names #208 (thanks to @reikoch)
* SAS readers: Support for more character encodings
* SAV reader: Clients sometimes received truncated UTF-8 strings
* SPSS writers: Improve compatibility with PSPP with DATETIME fields #211
* All formats: Improved support for setting / getting the `display_width` #210

New in 1.1.3:
* Fix warnings when compiling with GCC 10 #202
* SAS RLE compressor: Fixes for large files #201
* SAV reader: Improved support for UTF-8 column names #206
* SAV reader: Return a better error message if the magic number doesn't match
* SAV reader: Support for dash-separated timestamps

New in 1.1.2:
* DTA reader: support for Spanish-locale timestamps
* SAS reader: support for "any" encoding tidyverse/haven#482
* CLI tool: Allow uppercase filename extensions
* Improved support for reading SPSS and SAS command files
* Improved support for reading POR files with format widths >100
* Improved support for reading SAV files containing space-padded timestamps #197
* Improved support for writing SAV files with a large number of variables #199
* Improved support for reading SAS7BDAT files created by Stat/Transfer #198
* Fix several integer overflows and undefined values #192 #193 #194 #195 #196

New in 1.1.1:
* Support row 
limits in the plain-text parsers
* SAV reader: Allow spaces in timestamp strings
* README: Fix Windows / pacman instructions #189
* Fix errors opening files in Stata 15 (tidyverse/haven#461)

New in 1.1:
* New function: readstat_set_row_offset (#185). Thanks to @mikmart
* Fix segfault when localtime fails on Windows
* Fix implicit float conversion warning (oss-fuzz/16372)
* New error code: READSTAT_ERROR_BAD_TIMESTAMP_VALUE
* Renamed error code: READSTAT_ERROR_BAD_TIMESTAMP => READSTAT_ERROR_BAD_TIMESTAMP_STRING

New in 1.0.2:
* Compilation: Fix -Wstringop-truncation warnings on GCC 8.2 and later (#151)
* SPSS command parser: Fix signed integer overflow (oss-fuzz/15049)
* POR parser: Use doubles internally to prevent integer overflows with very large exponents (#182)

New in 1.0.1:
* SAV writer: Validate variable names
* Fix a buffer overflow reading SPSS commands (oss-fuzz/15050)
* New error code READSTAT_ERROR_NAME_IS_ZERO_LENGTH when a blank variable name is provided
* New fuzzing dictionary files in fuzz/dict for parsing plain-text file formats
* Move corpus files from corpus to fuzz/corpus
haven/src/readstat/readstat_writer.h0000644000176200001440000000256614101007206017325 0ustar liggesusers
#define READSTAT_PRODUCT_NAME       "ReadStat"
#define READSTAT_PRODUCT_URL        "https://github.com/WizardMac/ReadStat"

readstat_error_t readstat_begin_writing_file(readstat_writer_t *writer, void *user_ctx, long row_count);

readstat_error_t readstat_write_bytes(readstat_writer_t *writer, const void *bytes, size_t len);
readstat_error_t readstat_write_bytes_as_lines(readstat_writer_t *writer,
        const void *bytes, size_t len, size_t line_len, const char *line_sep);
readstat_error_t readstat_write_line_padding(readstat_writer_t *writer, char pad,
        size_t line_len, const char *line_sep);
readstat_error_t readstat_write_zeros(readstat_writer_t *writer, size_t len);
readstat_error_t readstat_write_spaces(readstat_writer_t *writer, size_t len);
readstat_error_t 
readstat_write_string(readstat_writer_t *writer, const char *bytes); readstat_error_t readstat_write_space_padded_string(readstat_writer_t *writer, const char *string, size_t max_len); readstat_value_label_t *readstat_get_value_label(readstat_label_set_t *label_set, int index); readstat_label_set_t *readstat_get_label_set(readstat_writer_t *writer, int index); readstat_variable_t *readstat_get_label_set_variable(readstat_label_set_t *label_set, int index); void readstat_sort_label_set(readstat_label_set_t *label_set, int (*compare)(const readstat_value_label_t *, const readstat_value_label_t *)); haven/src/readstat/stata/0000755000176200001440000000000014102332323015056 5ustar liggesusershaven/src/readstat/stata/readstat_dta.h0000644000176200001440000001316114101007206017666 0ustar liggesusers#pragma pack(push, 1) // DTA files typedef struct dta_header_s { unsigned char ds_format; unsigned char byteorder; unsigned char filetype; unsigned char unused; uint16_t nvar; uint32_t nobs; } dta_header_t; typedef struct dta_header64_s { unsigned char ds_format; unsigned char byteorder; unsigned char filetype; unsigned char unused; uint32_t nvar; uint64_t nobs; } dta_header64_t; typedef struct dta_117_strl_header_s { uint32_t v; uint32_t o; unsigned char type; int32_t len; } dta_117_strl_header_t; typedef struct dta_118_strl_header_s { uint32_t v; uint64_t o; unsigned char type; int32_t len; } dta_118_strl_header_t; #pragma pack(pop) typedef struct dta_strl_s { uint16_t v; uint64_t o; unsigned char type; size_t len; char data[1]; // Flexible array; use [1] for C++98 compatibility } dta_strl_t; typedef struct dta_ctx_s { char *data_label; size_t data_label_len; size_t data_label_len_len; time_t timestamp; size_t timestamp_len; char typlist_version; size_t typlist_entry_len; uint16_t *typlist; size_t typlist_len; char *varlist; size_t varlist_len; int16_t *srtlist; size_t srtlist_len; char *fmtlist; size_t fmtlist_len; char *lbllist; size_t lbllist_len; char *variable_labels; 
size_t variable_labels_len; size_t variable_name_len; size_t fmtlist_entry_len; size_t lbllist_entry_len; size_t variable_labels_entry_len; size_t expansion_len_len; size_t ch_metadata_len; size_t value_label_table_len_len; size_t value_label_table_labname_len; size_t value_label_table_padding_len; size_t strl_v_len; size_t strl_o_len; int64_t data_offset; int64_t strls_offset; int64_t value_labels_offset; int ds_format; int nvar; int64_t nobs; size_t record_len; int64_t row_limit; int64_t row_offset; int64_t current_row; unsigned int bswap:1; unsigned int machine_is_twos_complement:1; unsigned int file_is_xmlish:1; unsigned int supports_tagged_missing:1; int8_t max_int8; int16_t max_int16; int32_t max_int32; int32_t max_float; int64_t max_double; dta_strl_t **strls; size_t strls_count; size_t strls_capacity; readstat_variable_t **variables; readstat_endian_t endianness; iconv_t converter; readstat_callbacks_t handle; size_t file_size; void *user_ctx; readstat_io_t *io; int initialized; char error_buf[256]; } dta_ctx_t; #define DTA_HILO 0x01 #define DTA_LOHI 0x02 #define DTA_OLD_MAX_INT8 0x7e #define DTA_OLD_MAX_INT16 0x7ffe #define DTA_OLD_MAX_INT32 0x7ffffffe #define DTA_OLD_MAX_FLOAT 0x7effffff // +1.7e38f #define DTA_OLD_MAX_DOUBLE 0x7fdfffffffffffffL // +8.9e307 #define DTA_OLD_MISSING_INT8 0x7F #define DTA_OLD_MISSING_INT16 0x7FFF #define DTA_OLD_MISSING_INT32 0x7FFFFFFF #define DTA_OLD_MISSING_FLOAT 0x7F000000 #define DTA_OLD_MISSING_DOUBLE 0x7FE0000000000000L #define DTA_113_MAX_INT8 0x64 #define DTA_113_MAX_INT16 0x7fe4 #define DTA_113_MAX_INT32 0x7fffffe4 #define DTA_113_MAX_FLOAT 0x7effffff // +1.7e38f #define DTA_113_MAX_DOUBLE 0x7fdfffffffffffffL // +8.9e307 #define DTA_113_MISSING_INT8 0x65 #define DTA_113_MISSING_INT16 0x7FE5 #define DTA_113_MISSING_INT32 0x7FFFFFE5 #define DTA_113_MISSING_FLOAT 0x7F000000 #define DTA_113_MISSING_DOUBLE 0x7FE0000000000000L #define DTA_113_MISSING_INT8_A (DTA_113_MISSING_INT8+1) #define DTA_113_MISSING_INT16_A 
(DTA_113_MISSING_INT16+1) #define DTA_113_MISSING_INT32_A (DTA_113_MISSING_INT32+1) #define DTA_113_MISSING_FLOAT_A (DTA_113_MISSING_FLOAT+0x0800) #define DTA_113_MISSING_DOUBLE_A (DTA_113_MISSING_DOUBLE+0x010000000000) #define DTA_GSO_TYPE_BINARY 0x81 #define DTA_GSO_TYPE_ASCII 0x82 #define DTA_117_TYPE_CODE_INT8 0xFFFA #define DTA_117_TYPE_CODE_INT16 0xFFF9 #define DTA_117_TYPE_CODE_INT32 0xFFF8 #define DTA_117_TYPE_CODE_FLOAT 0xFFF7 #define DTA_117_TYPE_CODE_DOUBLE 0xFFF6 #define DTA_117_TYPE_CODE_STRL 0x8000 #define DTA_111_TYPE_CODE_INT8 0xFB #define DTA_111_TYPE_CODE_INT16 0xFC #define DTA_111_TYPE_CODE_INT32 0xFD #define DTA_111_TYPE_CODE_FLOAT 0xFE #define DTA_111_TYPE_CODE_DOUBLE 0xFF #define DTA_OLD_TYPE_CODE_INT8 'b' #define DTA_OLD_TYPE_CODE_INT16 'i' #define DTA_OLD_TYPE_CODE_INT32 'l' #define DTA_OLD_TYPE_CODE_FLOAT 'f' #define DTA_OLD_TYPE_CODE_DOUBLE 'd' dta_ctx_t *dta_ctx_alloc(readstat_io_t *io); readstat_error_t dta_ctx_init(dta_ctx_t *ctx, uint32_t nvar, uint64_t nobs, unsigned char byteorder, unsigned char ds_format, const char *input_encoding, const char *output_encoding); void dta_ctx_free(dta_ctx_t *ctx); readstat_error_t dta_type_info(uint16_t typecode, dta_ctx_t *ctx, size_t *max_len, readstat_type_t *out_type); haven/src/readstat/stata/readstat_dta_write.c0000644000176200001440000014163214101007206021100 0ustar liggesusers #include #include #include #include #include #include #include "../readstat.h" #include "../readstat_bits.h" #include "../readstat_iconv.h" #include "../readstat_writer.h" #include "readstat_dta.h" #define DTA_DEFAULT_DISPLAY_WIDTH_BYTE 8 #define DTA_DEFAULT_DISPLAY_WIDTH_INT16 8 #define DTA_DEFAULT_DISPLAY_WIDTH_INT32 12 #define DTA_DEFAULT_DISPLAY_WIDTH_FLOAT 9 #define DTA_DEFAULT_DISPLAY_WIDTH_DOUBLE 10 #define DTA_DEFAULT_DISPLAY_WIDTH_STRING 9 #define DTA_FILE_VERSION_MIN 104 #define DTA_FILE_VERSION_MAX 119 #define DTA_FILE_VERSION_DEFAULT 118 #define DTA_OLD_MAX_WIDTH 128 #define DTA_111_MAX_WIDTH 244 #define 
DTA_117_MAX_WIDTH 2045 #define DTA_OLD_MAX_NAME_LEN 9 #define DTA_110_MAX_NAME_LEN 33 #define DTA_118_MAX_NAME_LEN 129 static readstat_error_t dta_113_write_missing_numeric(void *row, const readstat_variable_t *var); static readstat_error_t dta_write_tag(readstat_writer_t *writer, dta_ctx_t *ctx, const char *tag) { if (!ctx->file_is_xmlish) return READSTAT_OK; return readstat_write_string(writer, tag); } static readstat_error_t dta_write_chunk(readstat_writer_t *writer, dta_ctx_t *ctx, const char *start_tag, const void *bytes, size_t len, const char *end_tag) { readstat_error_t error = READSTAT_OK; if ((error = dta_write_tag(writer, ctx, start_tag)) != READSTAT_OK) goto cleanup; if ((error = readstat_write_bytes(writer, bytes, len)) != READSTAT_OK) goto cleanup; if ((error = dta_write_tag(writer, ctx, end_tag)) != READSTAT_OK) goto cleanup; cleanup: return error; } static readstat_error_t dta_emit_header_data_label(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; char *data_label = NULL; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; cleanup: if (data_label) free(data_label); return error; } static readstat_error_t dta_emit_header_time_stamp(readstat_writer_t *writer, dta_ctx_t *ctx) { if (!ctx->timestamp_len) return READSTAT_OK; readstat_error_t error = READSTAT_OK; time_t now = writer->timestamp; struct tm *time_s = localtime(&now); char *timestamp = calloc(1, ctx->timestamp_len); /* There are locale/portability issues with strftime so hack something up */ char months[][4] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" }; if (!time_s) { error = READSTAT_ERROR_BAD_TIMESTAMP_VALUE; goto cleanup; } if (!timestamp) { error = READSTAT_ERROR_MALLOC; goto cleanup; } uint8_t actual_timestamp_len = snprintf(timestamp, ctx->timestamp_len, "%02d %3s %04d %02d:%02d", time_s->tm_mday, months[time_s->tm_mon], time_s->tm_year + 1900, time_s->tm_hour, time_s->tm_min); if 
(actual_timestamp_len == 0) { error = READSTAT_ERROR_WRITE; goto cleanup; } if (ctx->file_is_xmlish) { if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; if ((error = readstat_write_bytes(writer, &actual_timestamp_len, sizeof(uint8_t))) != READSTAT_OK) goto cleanup; if ((error = readstat_write_bytes(writer, timestamp, actual_timestamp_len)) != READSTAT_OK) goto cleanup; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; } else { error = readstat_write_bytes(writer, timestamp, ctx->timestamp_len); } cleanup: free(timestamp); return error; } static readstat_error_t dta_111_typecode_for_variable(readstat_variable_t *r_variable, uint16_t *out_typecode) { readstat_error_t retval = READSTAT_OK; size_t max_len = r_variable->storage_width; uint16_t typecode = 0; switch (r_variable->type) { case READSTAT_TYPE_INT8: typecode = DTA_111_TYPE_CODE_INT8; break; case READSTAT_TYPE_INT16: typecode = DTA_111_TYPE_CODE_INT16; break; case READSTAT_TYPE_INT32: typecode = DTA_111_TYPE_CODE_INT32; break; case READSTAT_TYPE_FLOAT: typecode = DTA_111_TYPE_CODE_FLOAT; break; case READSTAT_TYPE_DOUBLE: typecode = DTA_111_TYPE_CODE_DOUBLE; break; case READSTAT_TYPE_STRING: typecode = max_len; break; case READSTAT_TYPE_STRING_REF: retval = READSTAT_ERROR_STRING_REFS_NOT_SUPPORTED; break; } if (out_typecode && retval == READSTAT_OK) *out_typecode = typecode; return retval; } static readstat_error_t dta_117_typecode_for_variable(readstat_variable_t *r_variable, uint16_t *out_typecode) { readstat_error_t retval = READSTAT_OK; size_t max_len = r_variable->storage_width; uint16_t typecode = 0; switch (r_variable->type) { case READSTAT_TYPE_INT8: typecode = DTA_117_TYPE_CODE_INT8; break; case READSTAT_TYPE_INT16: typecode = DTA_117_TYPE_CODE_INT16; break; case READSTAT_TYPE_INT32: typecode = DTA_117_TYPE_CODE_INT32; break; case READSTAT_TYPE_FLOAT: typecode = DTA_117_TYPE_CODE_FLOAT; break; case READSTAT_TYPE_DOUBLE: typecode = 
DTA_117_TYPE_CODE_DOUBLE; break; case READSTAT_TYPE_STRING: typecode = max_len; break; case READSTAT_TYPE_STRING_REF: typecode = DTA_117_TYPE_CODE_STRL; break; } if (out_typecode) *out_typecode = typecode; return retval; } static readstat_error_t dta_old_typecode_for_variable(readstat_variable_t *r_variable, uint16_t *out_typecode) { readstat_error_t retval = READSTAT_OK; size_t max_len = r_variable->storage_width; uint16_t typecode = 0; switch (r_variable->type) { case READSTAT_TYPE_INT8: typecode = DTA_OLD_TYPE_CODE_INT8; break; case READSTAT_TYPE_INT16: typecode = DTA_OLD_TYPE_CODE_INT16; break; case READSTAT_TYPE_INT32: typecode = DTA_OLD_TYPE_CODE_INT32; break; case READSTAT_TYPE_FLOAT: typecode = DTA_OLD_TYPE_CODE_FLOAT; break; case READSTAT_TYPE_DOUBLE: typecode = DTA_OLD_TYPE_CODE_DOUBLE; break; case READSTAT_TYPE_STRING: typecode = max_len + 0x7F; break; case READSTAT_TYPE_STRING_REF: retval = READSTAT_ERROR_STRING_REFS_NOT_SUPPORTED; break; } if (out_typecode && retval == READSTAT_OK) *out_typecode = typecode; return retval; } static readstat_error_t dta_typecode_for_variable(readstat_variable_t *r_variable, int typlist_version, uint16_t *typecode) { if (typlist_version == 111) { return dta_111_typecode_for_variable(r_variable, typecode); } if (typlist_version == 117) { return dta_117_typecode_for_variable(r_variable, typecode); } return dta_old_typecode_for_variable(r_variable, typecode); } static readstat_error_t dta_emit_typlist(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; int i; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; for (i=0; invar; i++) { readstat_variable_t *r_variable = readstat_get_variable(writer, i); uint16_t typecode = 0; error = dta_typecode_for_variable(r_variable, ctx->typlist_version, &typecode); if (error != READSTAT_OK) goto cleanup; ctx->typlist[i] = typecode; } for (i=0; invar; i++) { if (ctx->typlist_entry_len == 1) { uint8_t byte = ctx->typlist[i]; error = 
readstat_write_bytes(writer, &byte, sizeof(uint8_t)); } else if (ctx->typlist_entry_len == 2) { uint16_t val = ctx->typlist[i]; error = readstat_write_bytes(writer, &val, sizeof(uint16_t)); } if (error != READSTAT_OK) goto cleanup; } if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; cleanup: return error; } static readstat_error_t dta_validate_name_chars(const char *name, int unicode) { /* TODO check Unicode class */ int j; for (j=0; name[j]; j++) { if ((name[j] > 0 || !unicode) && name[j] != '_' && !(name[j] >= 'a' && name[j] <= 'z') && !(name[j] >= 'A' && name[j] <= 'Z') && !(name[j] >= '0' && name[j] <= '9')) { return READSTAT_ERROR_NAME_CONTAINS_ILLEGAL_CHARACTER; } } char first_char = name[0]; if ((first_char > 0 || !unicode) && first_char != '_' && !(first_char >= 'a' && first_char <= 'z') && !(first_char >= 'A' && first_char <= 'Z')) { return READSTAT_ERROR_NAME_BEGINS_WITH_ILLEGAL_CHARACTER; } return READSTAT_OK; } static readstat_error_t dta_validate_name_unreserved(const char *name) { if (strcmp(name, "_all") == 0 || strcmp(name, "_b") == 0 || strcmp(name, "byte") == 0 || strcmp(name, "_coef") == 0 || strcmp(name, "_cons") == 0 || strcmp(name, "double") == 0 || strcmp(name, "float") == 0 || strcmp(name, "if") == 0 || strcmp(name, "in") == 0 || strcmp(name, "int") == 0 || strcmp(name, "long") == 0 || strcmp(name, "_n") == 0 || strcmp(name, "_N") == 0 || strcmp(name, "_pi") == 0 || strcmp(name, "_pred") == 0 || strcmp(name, "_rc") == 0 || strcmp(name, "_skip") == 0 || strcmp(name, "strL") == 0 || strcmp(name, "using") == 0 || strcmp(name, "with") == 0) { return READSTAT_ERROR_NAME_IS_RESERVED_WORD; } int len; if (sscanf(name, "str%d", &len) == 1) return READSTAT_ERROR_NAME_IS_RESERVED_WORD; return READSTAT_OK; } static readstat_error_t dta_validate_name(const char *name, int unicode, size_t max_len) { readstat_error_t error = READSTAT_OK; if (strlen(name) > max_len) return READSTAT_ERROR_NAME_IS_TOO_LONG; if (strlen(name) == 0) 
return READSTAT_ERROR_NAME_IS_ZERO_LENGTH; if ((error = dta_validate_name_chars(name, unicode)) != READSTAT_OK) return error; return dta_validate_name_unreserved(name); } static readstat_error_t dta_old_variable_ok(const readstat_variable_t *variable) { return dta_validate_name(readstat_variable_get_name(variable), 0, DTA_OLD_MAX_NAME_LEN); } static readstat_error_t dta_110_variable_ok(const readstat_variable_t *variable) { return dta_validate_name(readstat_variable_get_name(variable), 0, DTA_110_MAX_NAME_LEN); } static readstat_error_t dta_118_variable_ok(const readstat_variable_t *variable) { return dta_validate_name(readstat_variable_get_name(variable), 1, DTA_118_MAX_NAME_LEN); } static readstat_error_t dta_emit_varlist(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; int i; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; for (i=0; invar; i++) { readstat_variable_t *r_variable = readstat_get_variable(writer, i); strncpy(&ctx->varlist[ctx->variable_name_len*i], r_variable->name, ctx->variable_name_len); } if ((error = readstat_write_bytes(writer, ctx->varlist, ctx->varlist_len)) != READSTAT_OK) goto cleanup; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; cleanup: return error; } static readstat_error_t dta_emit_srtlist(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; memset(ctx->srtlist, '\0', ctx->srtlist_len); if ((error = readstat_write_bytes(writer, ctx->srtlist, ctx->srtlist_len)) != READSTAT_OK) goto cleanup; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; cleanup: return error; } static readstat_error_t dta_emit_fmtlist(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; int i; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; for (i=0; invar; i++) { readstat_variable_t 
*r_variable = readstat_get_variable(writer, i); if (r_variable->format[0]) { strncpy(&ctx->fmtlist[ctx->fmtlist_entry_len*i], r_variable->format, ctx->fmtlist_entry_len); } else { char format_letter = 'g'; int display_width = r_variable->display_width; if (readstat_type_class(r_variable->type) == READSTAT_TYPE_CLASS_STRING) { format_letter = 's'; } if (!display_width) { if (r_variable->type == READSTAT_TYPE_INT8) { display_width = DTA_DEFAULT_DISPLAY_WIDTH_BYTE; } else if (r_variable->type == READSTAT_TYPE_INT16) { display_width = DTA_DEFAULT_DISPLAY_WIDTH_INT16; } else if (r_variable->type == READSTAT_TYPE_INT32) { display_width = DTA_DEFAULT_DISPLAY_WIDTH_INT32; } else if (r_variable->type == READSTAT_TYPE_FLOAT) { display_width = DTA_DEFAULT_DISPLAY_WIDTH_FLOAT; } else if (r_variable->type == READSTAT_TYPE_DOUBLE) { display_width = DTA_DEFAULT_DISPLAY_WIDTH_DOUBLE; } else { display_width = DTA_DEFAULT_DISPLAY_WIDTH_STRING; } } char format[64]; if (format_letter == 'g') { sprintf(format, "%%%s%d.0g", r_variable->alignment == READSTAT_ALIGNMENT_LEFT ? "-" : "", display_width); } else { sprintf(format, "%%%s%ds", r_variable->alignment == READSTAT_ALIGNMENT_LEFT ? 
"-" : "", display_width); } strncpy(&ctx->fmtlist[ctx->fmtlist_entry_len*i], format, ctx->fmtlist_entry_len); } } if ((error = readstat_write_bytes(writer, ctx->fmtlist, ctx->fmtlist_len)) != READSTAT_OK) goto cleanup; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; cleanup: return error; } static readstat_error_t dta_emit_lbllist(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; int i; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; for (i=0; invar; i++) { readstat_variable_t *r_variable = readstat_get_variable(writer, i); if (r_variable->label_set) { strncpy(&ctx->lbllist[ctx->lbllist_entry_len*i], r_variable->label_set->name, ctx->lbllist_entry_len); } else { memset(&ctx->lbllist[ctx->lbllist_entry_len*i], '\0', ctx->lbllist_entry_len); } } if ((error = readstat_write_bytes(writer, ctx->lbllist, ctx->lbllist_len)) != READSTAT_OK) goto cleanup; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; cleanup: return error; } static readstat_error_t dta_emit_descriptors(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; error = dta_emit_typlist(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_varlist(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_srtlist(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_fmtlist(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_lbllist(writer, ctx); if (error != READSTAT_OK) goto cleanup; cleanup: return error; } static readstat_error_t dta_emit_variable_labels(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; int i; if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK) goto cleanup; for (i=0; invar; i++) { readstat_variable_t *r_variable = readstat_get_variable(writer, i); strncpy(&ctx->variable_labels[ctx->variable_labels_entry_len*i], r_variable->label, 
ctx->variable_labels_entry_len);
    }

    if ((error = readstat_write_bytes(writer, ctx->variable_labels, ctx->variable_labels_len)) != READSTAT_OK)
        goto cleanup;

    if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK)
        goto cleanup;

cleanup:
    return error;
}

static readstat_error_t dta_emit_characteristics(readstat_writer_t *writer, dta_ctx_t *ctx) {
    readstat_error_t error = READSTAT_OK;
    int i;
    char *buffer = NULL;

    if (ctx->expansion_len_len == 0)
        return READSTAT_OK;

    if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK)
        return error;

    buffer = malloc(ctx->ch_metadata_len);
    /* Fail cleanly instead of writing through a NULL buffer below. */
    if (buffer == NULL) {
        error = READSTAT_ERROR_MALLOC;
        goto cleanup;
    }

    /* Each note is emitted as one characteristic: "_dta", "noteN", then the text. */
    for (i=0; i<writer->notes_count; i++) {
        if (ctx->file_is_xmlish) {
            error = dta_write_tag(writer, ctx, "");
        } else {
            char data_type = 1;
            error = readstat_write_bytes(writer, &data_type, 1);
        }
        if (error != READSTAT_OK)
            goto cleanup;

        /* Record length: two fixed-width metadata fields plus the NUL-terminated note. */
        size_t len = strlen(writer->notes[i]);
        if (ctx->expansion_len_len == 2) {
            int16_t len16 = 2*ctx->ch_metadata_len + len + 1;
            error = readstat_write_bytes(writer, &len16, sizeof(len16));
        } else if (ctx->expansion_len_len == 4) {
            int32_t len32 = 2*ctx->ch_metadata_len + len + 1;
            error = readstat_write_bytes(writer, &len32, sizeof(len32));
        }
        if (error != READSTAT_OK)
            goto cleanup;

        strncpy(buffer, "_dta", ctx->ch_metadata_len);
        error = readstat_write_bytes(writer, buffer, ctx->ch_metadata_len);
        if (error != READSTAT_OK)
            goto cleanup;

        snprintf(buffer, ctx->ch_metadata_len, "note%d", i+1);
        error = readstat_write_bytes(writer, buffer, ctx->ch_metadata_len);
        if (error != READSTAT_OK)
            goto cleanup;

        error = readstat_write_bytes(writer, writer->notes[i], len + 1);
        if (error != READSTAT_OK)
            goto cleanup;

        if ((error = dta_write_tag(writer, ctx, "")) != READSTAT_OK)
            goto cleanup;
    }

    if (ctx->file_is_xmlish) {
        error = dta_write_tag(writer, ctx, "");
    } else {
        error = readstat_write_zeros(writer, 1 + ctx->expansion_len_len);
    }
    if (error != READSTAT_OK)
        goto cleanup;

cleanup:
    free(buffer);
    return error;
}

static readstat_error_t dta_117_emit_strl_header(readstat_writer_t *writer, 
readstat_string_ref_t *ref) { dta_117_strl_header_t header = { .v = ref->first_v, .o = ref->first_o, .type = DTA_GSO_TYPE_ASCII, .len = ref->len }; return readstat_write_bytes(writer, &header, sizeof(dta_117_strl_header_t)); } static readstat_error_t dta_118_emit_strl_header(readstat_writer_t *writer, readstat_string_ref_t *ref) { dta_118_strl_header_t header = { .v = ref->first_v, .o = ref->first_o, .type = DTA_GSO_TYPE_ASCII, .len = ref->len }; return readstat_write_bytes(writer, &header, sizeof(dta_118_strl_header_t)); } static readstat_error_t dta_emit_strls(readstat_writer_t *writer, dta_ctx_t *ctx) { if (!ctx->file_is_xmlish) return READSTAT_OK; readstat_error_t retval = READSTAT_OK; retval = readstat_write_string(writer, ""); if (retval != READSTAT_OK) goto cleanup; int i; for (i=0; istring_refs_count; i++) { readstat_string_ref_t *ref = writer->string_refs[i]; retval = readstat_write_string(writer, "GSO"); if (retval != READSTAT_OK) goto cleanup; if (ctx->strl_o_len > 4) { retval = dta_118_emit_strl_header(writer, ref); } else { retval = dta_117_emit_strl_header(writer, ref); } if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &ref->data[0], ref->len); if (retval != READSTAT_OK) goto cleanup; } retval = readstat_write_string(writer, ""); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t dta_old_emit_value_labels(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; int i, j; char labname[12+2]; char *label_buffer = NULL; for (i=0; ilabel_sets_count; i++) { readstat_label_set_t *r_label_set = readstat_get_label_set(writer, i); int32_t max_value = 0; for (j=0; jvalue_labels_count; j++) { readstat_value_label_t *value_label = readstat_get_value_label(r_label_set, j); if (value_label->tag) { retval = READSTAT_ERROR_TAGGED_VALUES_NOT_SUPPORTED; goto cleanup; } if (value_label->int32_key < 0 || value_label->int32_key > 1024) { retval = 
READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; goto cleanup; } if (value_label->int32_key > max_value) { max_value = value_label->int32_key; } } int16_t table_len = 8*(max_value + 1); retval = readstat_write_bytes(writer, &table_len, sizeof(int16_t)); if (retval != READSTAT_OK) goto cleanup; memset(labname, 0, sizeof(labname)); strncpy(labname, r_label_set->name, ctx->value_label_table_labname_len); retval = readstat_write_bytes(writer, labname, ctx->value_label_table_labname_len + ctx->value_label_table_padding_len); if (retval != READSTAT_OK) goto cleanup; label_buffer = realloc(label_buffer, table_len); memset(label_buffer, 0, table_len); for (j=0; jvalue_labels_count; j++) { readstat_value_label_t *value_label = readstat_get_value_label(r_label_set, j); size_t len = value_label->label_len; if (len > 8) len = 8; memcpy(&label_buffer[8*value_label->int32_key], value_label->label, len); } retval = readstat_write_bytes(writer, label_buffer, table_len); if (retval != READSTAT_OK) goto cleanup; } cleanup: if (label_buffer) free(label_buffer); return retval; } static int dta_compare_value_labels(const readstat_value_label_t *vl1, const readstat_value_label_t *vl2) { if (vl1->tag) { if (vl2->tag) { return vl1->tag - vl2->tag; } return 1; } if (vl2->tag) { return -1; } return vl1->int32_key - vl2->int32_key; } static readstat_error_t dta_emit_value_labels(readstat_writer_t *writer, dta_ctx_t *ctx) { if (ctx->value_label_table_len_len == 2) return dta_old_emit_value_labels(writer, ctx); readstat_error_t retval = READSTAT_OK; int i, j; int32_t *off = NULL; int32_t *val = NULL; char *txt = NULL; char *labname = calloc(1, ctx->value_label_table_labname_len + ctx->value_label_table_padding_len); retval = dta_write_tag(writer, ctx, ""); if (retval != READSTAT_OK) goto cleanup; for (i=0; ilabel_sets_count; i++) { readstat_label_set_t *r_label_set = readstat_get_label_set(writer, i); int32_t n = r_label_set->value_labels_count; int32_t txtlen = 0; for (j=0; jlabel_len + 1; } 
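        /* Worked example of the table length computed just below. This is an
         * illustration with made-up labels, not data from any real file.
         * A label set with n = 2 labels, "No" and "Yes", has
         *     txtlen = (2 + 1) + (3 + 1) = 7
         * bytes of NUL-terminated text, so
         *     table_len = 8 + 8*n + txtlen = 8 + 16 + 7 = 31
         * bytes: two int32 fields (n and txtlen), n int32 offsets,
         * n int32 values, and then the label text itself. */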
retval = dta_write_tag(writer, ctx, ""); if (retval != READSTAT_OK) goto cleanup; int32_t table_len = 8 + 8*n + txtlen; retval = readstat_write_bytes(writer, &table_len, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; strncpy(labname, r_label_set->name, ctx->value_label_table_labname_len); retval = readstat_write_bytes(writer, labname, ctx->value_label_table_labname_len + ctx->value_label_table_padding_len); if (retval != READSTAT_OK) goto cleanup; if (txtlen == 0) { retval = readstat_write_bytes(writer, &txtlen, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &txtlen, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; retval = dta_write_tag(writer, ctx, ""); if (retval != READSTAT_OK) goto cleanup; continue; } off = realloc(off, 4*n); val = realloc(val, 4*n); txt = realloc(txt, txtlen); readstat_off_t offset = 0; readstat_sort_label_set(r_label_set, &dta_compare_value_labels); for (j=0; jlabel; size_t label_data_len = value_label->label_len; off[j] = offset; if (value_label->tag) { if (writer->version < 113) { retval = READSTAT_ERROR_TAGGED_VALUES_NOT_SUPPORTED; goto cleanup; } val[j] = DTA_113_MISSING_INT32_A + (value_label->tag - 'a'); } else { val[j] = value_label->int32_key; } memcpy(txt + offset, label, label_data_len); offset += label_data_len; txt[offset++] = '\0'; } retval = readstat_write_bytes(writer, &n, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &txtlen, sizeof(int32_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, off, 4*n); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, val, 4*n); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, txt, txtlen); if (retval != READSTAT_OK) goto cleanup; retval = dta_write_tag(writer, ctx, ""); if (retval != READSTAT_OK) goto cleanup; } retval = dta_write_tag(writer, ctx, ""); if (retval != READSTAT_OK) 
goto cleanup; cleanup: if (off) free(off); if (val) free(val); if (txt) free(txt); if (labname) free(labname); return retval; } static size_t dta_numeric_variable_width(readstat_type_t type, size_t user_width) { size_t len = 0; if (type == READSTAT_TYPE_DOUBLE) { len = 8; } else if (type == READSTAT_TYPE_FLOAT) { len = 4; } else if (type == READSTAT_TYPE_INT32) { len = 4; } else if (type == READSTAT_TYPE_INT16) { len = 2; } else if (type == READSTAT_TYPE_INT8) { len = 1; } return len; } static size_t dta_111_variable_width(readstat_type_t type, size_t user_width) { if (type == READSTAT_TYPE_STRING) { if (user_width > DTA_111_MAX_WIDTH || user_width == 0) user_width = DTA_111_MAX_WIDTH; return user_width; } return dta_numeric_variable_width(type, user_width); } static size_t dta_117_variable_width(readstat_type_t type, size_t user_width) { if (type == READSTAT_TYPE_STRING) { if (user_width > DTA_117_MAX_WIDTH || user_width == 0) user_width = DTA_117_MAX_WIDTH; return user_width; } if (type == READSTAT_TYPE_STRING_REF) return 8; return dta_numeric_variable_width(type, user_width); } static size_t dta_old_variable_width(readstat_type_t type, size_t user_width) { if (type == READSTAT_TYPE_STRING) { if (user_width > DTA_OLD_MAX_WIDTH || user_width == 0) user_width = DTA_OLD_MAX_WIDTH; return user_width; } return dta_numeric_variable_width(type, user_width); }
static readstat_error_t dta_emit_xmlish_header(readstat_writer_t *writer, dta_ctx_t *ctx) { readstat_error_t error = READSTAT_OK; if ((error = dta_write_tag(writer, ctx, "<stata_dta>")) != READSTAT_OK) goto cleanup; if ((error = dta_write_tag(writer, ctx, "<header>")) != READSTAT_OK) goto cleanup; char release[128]; snprintf(release, sizeof(release), "<release>%ld</release>", writer->version); if ((error = readstat_write_string(writer, release)) != READSTAT_OK) goto cleanup; error = dta_write_chunk(writer, ctx, "<byteorder>", machine_is_little_endian() ? "LSF" : "MSF", sizeof("MSF")-1, "</byteorder>"); if (error != READSTAT_OK) goto cleanup; if (writer->version >= 119) { uint32_t nvar = writer->variables_count; error = dta_write_chunk(writer, ctx, "<K>", &nvar, sizeof(uint32_t), "</K>"); if (error != READSTAT_OK) goto cleanup; } else { uint16_t nvar = writer->variables_count; error = dta_write_chunk(writer, ctx, "<K>", &nvar, sizeof(uint16_t), "</K>"); if (error != READSTAT_OK) goto cleanup; } if (writer->version >= 118) { uint64_t nobs = writer->row_count; error = dta_write_chunk(writer, ctx, "<N>", &nobs, sizeof(uint64_t), "</N>"); if (error != READSTAT_OK) goto cleanup; } else { uint32_t nobs = writer->row_count; error = dta_write_chunk(writer, ctx, "<N>", &nobs, sizeof(uint32_t), "</N>"); if (error != READSTAT_OK) goto cleanup; } error = dta_emit_header_data_label(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_header_time_stamp(writer, ctx); if (error != READSTAT_OK) goto cleanup; if ((error = dta_write_tag(writer, ctx, "</header>")) != READSTAT_OK) goto cleanup; cleanup: return error; }
static readstat_error_t dta_emit_header(readstat_writer_t *writer, dta_ctx_t *ctx) { if (ctx->file_is_xmlish) { return dta_emit_xmlish_header(writer, ctx); } readstat_error_t error = READSTAT_OK; dta_header_t header = {0}; header.ds_format = writer->version; header.byteorder = machine_is_little_endian() ? DTA_LOHI : DTA_HILO; header.filetype = 0x01; header.unused = 0x00; header.nvar = writer->variables_count; header.nobs = writer->row_count; if (writer->variables_count > 32767) { error = READSTAT_ERROR_TOO_MANY_COLUMNS; goto cleanup; } if ((error = readstat_write_bytes(writer, &header, sizeof(dta_header_t))) != READSTAT_OK) goto cleanup; if ((error = dta_emit_header_data_label(writer, ctx)) != READSTAT_OK) goto cleanup; if ((error = dta_emit_header_time_stamp(writer, ctx)) != READSTAT_OK) goto cleanup; cleanup: /* Propagate the failure instead of unconditionally reporting success. */ return error; } static size_t dta_measure_tag(dta_ctx_t *ctx, const char *tag) { if (!ctx->file_is_xmlish) return 0; return strlen(tag); } static size_t dta_measure_map(dta_ctx_t *ctx) { return (dta_measure_tag(ctx, "<map>") + 14 * sizeof(uint64_t) + dta_measure_tag(ctx, "</map>")); } static size_t dta_measure_typlist(dta_ctx_t *ctx) { return (dta_measure_tag(ctx, "<variable_types>") + ctx->typlist_entry_len * ctx->nvar + dta_measure_tag(ctx, "</variable_types>")); } static size_t dta_measure_varlist(dta_ctx_t *ctx) { return (dta_measure_tag(ctx, "<varnames>") + ctx->varlist_len + dta_measure_tag(ctx, "</varnames>")); } static size_t dta_measure_srtlist(dta_ctx_t *ctx) { return (dta_measure_tag(ctx, "<sortlist>") + ctx->srtlist_len + dta_measure_tag(ctx, "</sortlist>")); } static size_t dta_measure_fmtlist(dta_ctx_t *ctx) { return (dta_measure_tag(ctx, "<formats>") + ctx->fmtlist_len + dta_measure_tag(ctx, "</formats>")); } static size_t dta_measure_lbllist(dta_ctx_t *ctx) { return (dta_measure_tag(ctx, "<value_label_names>") + ctx->lbllist_len + dta_measure_tag(ctx, "</value_label_names>")); } static size_t dta_measure_variable_labels(dta_ctx_t *ctx) { return (dta_measure_tag(ctx, "<variable_labels>") + ctx->variable_labels_len + dta_measure_tag(ctx,
"</variable_labels>")); } static size_t dta_measure_characteristics(readstat_writer_t *writer, dta_ctx_t *ctx) { size_t characteristics_len = 0; int i; for (i=0; i<writer->notes_count; i++) { size_t ch_len = dta_measure_tag(ctx, "<ch>") + ctx->expansion_len_len + 2 * ctx->ch_metadata_len + strlen(writer->notes[i]) + 1 + dta_measure_tag(ctx, "</ch>"); characteristics_len += ch_len; } return (dta_measure_tag(ctx, "<characteristics>") + characteristics_len + dta_measure_tag(ctx, "</characteristics>")); } static size_t dta_measure_data(readstat_writer_t *writer, dta_ctx_t *ctx) { int i; for (i=0; i<ctx->nvar; i++) { size_t max_len = 0; readstat_variable_t *r_variable = readstat_get_variable(writer, i); uint16_t typecode = 0; dta_typecode_for_variable(r_variable, ctx->typlist_version, &typecode); if (dta_type_info(typecode, ctx, &max_len, NULL) == READSTAT_OK) ctx->record_len += max_len; } return (dta_measure_tag(ctx, "<data>") + ctx->record_len * ctx->nobs + dta_measure_tag(ctx, "</data>")); } static size_t dta_measure_strls(readstat_writer_t *writer, dta_ctx_t *ctx) { int i; size_t strls_len = 0; for (i=0; i<writer->string_refs_count; i++) { readstat_string_ref_t *ref = writer->string_refs[i]; if (ctx->strl_o_len > 4) { strls_len += 20 + ref->len; } else { strls_len += 16 + ref->len; } } return (dta_measure_tag(ctx, "<strls>") + strls_len + dta_measure_tag(ctx, "</strls>")); } static size_t dta_measure_value_labels(readstat_writer_t *writer, dta_ctx_t *ctx) { size_t len = dta_measure_tag(ctx, "<value_labels>"); int i, j; for (i=0; i<writer->label_sets_count; i++) { readstat_label_set_t *r_label_set = readstat_get_label_set(writer, i); int32_t n = r_label_set->value_labels_count; int32_t txtlen = 0; for (j=0; j<n; j++) { readstat_value_label_t *value_label = readstat_get_value_label(r_label_set, j); txtlen += value_label->label_len + 1; } len += dta_measure_tag(ctx, "<lbl>"); len += sizeof(int32_t); len += ctx->value_label_table_labname_len; len += ctx->value_label_table_padding_len; len += 8 + 8*n + txtlen; len += dta_measure_tag(ctx, "</lbl>"); } len += dta_measure_tag(ctx, "</value_labels>"); return len; } static readstat_error_t dta_emit_map(readstat_writer_t *writer, dta_ctx_t *ctx) { if (!ctx->file_is_xmlish) return READSTAT_OK;
uint64_t map[14]; map[0] = 0; /* <stata_dta> */ map[1] = writer->bytes_written; /* <map> */ map[2] = map[1] + dta_measure_map(ctx); /* <variable_types> */ map[3] = map[2] + dta_measure_typlist(ctx); /* <varnames> */ map[4] = map[3] + dta_measure_varlist(ctx); /* <sortlist> */ map[5] = map[4] + dta_measure_srtlist(ctx); /* <formats> */ map[6] = map[5] + dta_measure_fmtlist(ctx); /* <value_label_names> */ map[7] = map[6] + dta_measure_lbllist(ctx); /* <variable_labels> */ map[8] = map[7] + dta_measure_variable_labels(ctx); /* <characteristics> */ map[9] = map[8] + dta_measure_characteristics(writer, ctx); /* <data> */ map[10]= map[9] + dta_measure_data(writer, ctx); /* <strls> */ map[11]= map[10]+ dta_measure_strls(writer, ctx); /* <value_labels> */ map[12]= map[11]+ dta_measure_value_labels(writer, ctx); /* </stata_dta> */ map[13]= map[12]+ dta_measure_tag(ctx, "</stata_dta>"); /* end of file */
return dta_write_chunk(writer, ctx, "<map>", map, sizeof(map), "</map>"); } static readstat_error_t dta_begin_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; readstat_error_t error = READSTAT_OK; if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; dta_ctx_t *ctx = dta_ctx_alloc(NULL); error = dta_ctx_init(ctx, writer->variables_count, writer->row_count, machine_is_little_endian() ? DTA_LOHI : DTA_HILO, writer->version, NULL, NULL); if (error != READSTAT_OK) goto cleanup; error = dta_emit_header(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_map(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_descriptors(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_variable_labels(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_characteristics(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_write_tag(writer, ctx, "<data>"); if (error != READSTAT_OK) goto cleanup; cleanup: if (error != READSTAT_OK) { dta_ctx_free(ctx); } else { writer->module_ctx = ctx; } return error; } static readstat_error_t dta_write_raw_int8(void *row, int8_t value) { memcpy(row, &value, sizeof(char)); return READSTAT_OK; } static readstat_error_t dta_write_raw_int16(void *row, int16_t value) { memcpy(row, &value, sizeof(int16_t)); return READSTAT_OK; } static readstat_error_t dta_write_raw_int32(void *row, int32_t value) { memcpy(row, &value, sizeof(int32_t)); return READSTAT_OK; } static readstat_error_t dta_write_raw_int64(void *row, int64_t value) { memcpy(row, &value, sizeof(int64_t)); return READSTAT_OK; } static readstat_error_t dta_write_raw_float(void *row, float value) { memcpy(row, &value, sizeof(float)); return READSTAT_OK; } static readstat_error_t dta_write_raw_double(void *row, double value) { memcpy(row, &value, sizeof(double)); return READSTAT_OK; } static readstat_error_t dta_113_write_int8(void *row, const readstat_variable_t *var,
int8_t value) { if (value > DTA_113_MAX_INT8) { return READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; } return dta_write_raw_int8(row, value); } static readstat_error_t dta_old_write_int8(void *row, const readstat_variable_t *var, int8_t value) { if (value > DTA_OLD_MAX_INT8) { return READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; } return dta_write_raw_int8(row, value); } static readstat_error_t dta_113_write_int16(void *row, const readstat_variable_t *var, int16_t value) { if (value > DTA_113_MAX_INT16) { return READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; } return dta_write_raw_int16(row, value); } static readstat_error_t dta_old_write_int16(void *row, const readstat_variable_t *var, int16_t value) { if (value > DTA_OLD_MAX_INT16) { return READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; } return dta_write_raw_int16(row, value); } static readstat_error_t dta_113_write_int32(void *row, const readstat_variable_t *var, int32_t value) { if (value > DTA_113_MAX_INT32) { return READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; } return dta_write_raw_int32(row, value); } static readstat_error_t dta_old_write_int32(void *row, const readstat_variable_t *var, int32_t value) { if (value > DTA_OLD_MAX_INT32) { return READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; } return dta_write_raw_int32(row, value); } static readstat_error_t dta_write_float(void *row, const readstat_variable_t *var, float value) { int32_t max_flt_i32 = DTA_113_MAX_FLOAT; float max_flt; memcpy(&max_flt, &max_flt_i32, sizeof(float)); if (value > max_flt) { return READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; } else if (isnan(value)) { return dta_113_write_missing_numeric(row, var); } return dta_write_raw_float(row, value); } static readstat_error_t dta_write_double(void *row, const readstat_variable_t *var, double value) { int64_t max_dbl_i64 = DTA_113_MAX_DOUBLE; double max_dbl; memcpy(&max_dbl, &max_dbl_i64, sizeof(double)); if (value > max_dbl) { return READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE; } else if 
(isnan(value)) { return dta_113_write_missing_numeric(row, var); } return dta_write_raw_double(row, value); } static readstat_error_t dta_write_string(void *row, const readstat_variable_t *var, const char *value) { size_t max_len = var->storage_width; if (value == NULL || value[0] == '\0') { memset(row, '\0', max_len); } else { size_t value_len = strlen(value); if (value_len > max_len) return READSTAT_ERROR_STRING_VALUE_IS_TOO_LONG; strncpy((char *)row, value, max_len); } return READSTAT_OK; } static readstat_error_t dta_118_write_string_ref(void *row, const readstat_variable_t *var, readstat_string_ref_t *ref) { if (ref == NULL) return READSTAT_ERROR_STRING_REF_IS_REQUIRED; int16_t v = ref->first_v; int64_t o = ref->first_o; char *row_bytes = (char *)row; memcpy(&row_bytes[0], &v, sizeof(int16_t)); if (!machine_is_little_endian()) { o <<= 16; } memcpy(&row_bytes[2], &o, 6); return READSTAT_OK; } static readstat_error_t dta_117_write_string_ref(void *row, const readstat_variable_t *var, readstat_string_ref_t *ref) { if (ref == NULL) return READSTAT_ERROR_STRING_REF_IS_REQUIRED; int32_t v = ref->first_v; int32_t o = ref->first_o; char *row_bytes = (char *)row; memcpy(&row_bytes[0], &v, sizeof(int32_t)); memcpy(&row_bytes[4], &o, sizeof(int32_t)); return READSTAT_OK; } static readstat_error_t dta_113_write_missing_numeric(void *row, const readstat_variable_t *var) { readstat_error_t retval = READSTAT_OK; if (var->type == READSTAT_TYPE_INT8) { retval = dta_write_raw_int8(row, DTA_113_MISSING_INT8); } else if (var->type == READSTAT_TYPE_INT16) { retval = dta_write_raw_int16(row, DTA_113_MISSING_INT16); } else if (var->type == READSTAT_TYPE_INT32) { retval = dta_write_raw_int32(row, DTA_113_MISSING_INT32); } else if (var->type == READSTAT_TYPE_FLOAT) { retval = dta_write_raw_int32(row, DTA_113_MISSING_FLOAT); } else if (var->type == READSTAT_TYPE_DOUBLE) { retval = dta_write_raw_int64(row, DTA_113_MISSING_DOUBLE); } return retval; } static readstat_error_t 
dta_old_write_missing_numeric(void *row, const readstat_variable_t *var) { readstat_error_t retval = READSTAT_OK; if (var->type == READSTAT_TYPE_INT8) { retval = dta_write_raw_int8(row, DTA_OLD_MISSING_INT8); } else if (var->type == READSTAT_TYPE_INT16) { retval = dta_write_raw_int16(row, DTA_OLD_MISSING_INT16); } else if (var->type == READSTAT_TYPE_INT32) { retval = dta_write_raw_int32(row, DTA_OLD_MISSING_INT32); } else if (var->type == READSTAT_TYPE_FLOAT) { retval = dta_write_raw_int32(row, DTA_OLD_MISSING_FLOAT); } else if (var->type == READSTAT_TYPE_DOUBLE) { retval = dta_write_raw_int64(row, DTA_OLD_MISSING_DOUBLE); } return retval; } static readstat_error_t dta_write_missing_string(void *row, const readstat_variable_t *var) { return dta_write_string(row, var, NULL); } static readstat_error_t dta_113_write_missing_tagged(void *row, const readstat_variable_t *var, char tag) { readstat_error_t retval = READSTAT_OK; if (tag < 'a' || tag > 'z') return READSTAT_ERROR_TAGGED_VALUE_IS_OUT_OF_RANGE; if (var->type == READSTAT_TYPE_INT8) { retval = dta_write_raw_int8(row, DTA_113_MISSING_INT8_A + (tag - 'a')); } else if (var->type == READSTAT_TYPE_INT16) { retval = dta_write_raw_int16(row, DTA_113_MISSING_INT16_A + (tag - 'a')); } else if (var->type == READSTAT_TYPE_INT32) { retval = dta_write_raw_int32(row, DTA_113_MISSING_INT32_A + (tag - 'a')); } else if (var->type == READSTAT_TYPE_FLOAT) { retval = dta_write_raw_int32(row, DTA_113_MISSING_FLOAT_A + ((tag - 'a') << 11)); } else if (var->type == READSTAT_TYPE_DOUBLE) { retval = dta_write_raw_int64(row, DTA_113_MISSING_DOUBLE_A + ((int64_t)(tag - 'a') << 40)); } else { retval = READSTAT_ERROR_TAGGED_VALUES_NOT_SUPPORTED; } return retval; } static readstat_error_t dta_end_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; dta_ctx_t *ctx = writer->module_ctx; readstat_error_t error = READSTAT_OK; if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; error = 
dta_write_tag(writer, ctx, "</data>"); if (error != READSTAT_OK) goto cleanup; error = dta_emit_strls(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_emit_value_labels(writer, ctx); if (error != READSTAT_OK) goto cleanup; error = dta_write_tag(writer, ctx, "</stata_dta>"); if (error != READSTAT_OK) goto cleanup; cleanup: return error; } static void dta_module_ctx_free(void *module_ctx) { dta_ctx_free(module_ctx); } readstat_error_t dta_metadata_ok(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; if (writer->compression != READSTAT_COMPRESS_NONE) return READSTAT_ERROR_UNSUPPORTED_COMPRESSION; if (writer->version > DTA_FILE_VERSION_MAX || writer->version < DTA_FILE_VERSION_MIN) return READSTAT_ERROR_UNSUPPORTED_FILE_FORMAT_VERSION; return READSTAT_OK; } readstat_error_t readstat_begin_writing_dta(readstat_writer_t *writer, void *user_ctx, long row_count) { if (writer->version == 0) writer->version = DTA_FILE_VERSION_DEFAULT; writer->callbacks.metadata_ok = &dta_metadata_ok; if (writer->version >= 117) { writer->callbacks.variable_width = &dta_117_variable_width; } else if (writer->version >= 111) { writer->callbacks.variable_width = &dta_111_variable_width; } else { writer->callbacks.variable_width = &dta_old_variable_width; } if (writer->version >= 118) { writer->callbacks.variable_ok = &dta_118_variable_ok; } else if (writer->version >= 110) { writer->callbacks.variable_ok = &dta_110_variable_ok; } else { writer->callbacks.variable_ok = &dta_old_variable_ok; } if (writer->version >= 118) { writer->callbacks.write_string_ref = &dta_118_write_string_ref; } else if (writer->version == 117) { writer->callbacks.write_string_ref = &dta_117_write_string_ref; } if (writer->version >= 113) { writer->callbacks.write_int8 = &dta_113_write_int8; writer->callbacks.write_int16 = &dta_113_write_int16; writer->callbacks.write_int32 = &dta_113_write_int32; writer->callbacks.write_missing_number = &dta_113_write_missing_numeric;
writer->callbacks.write_missing_tagged = &dta_113_write_missing_tagged; } else { writer->callbacks.write_int8 = &dta_old_write_int8; writer->callbacks.write_int16 = &dta_old_write_int16; writer->callbacks.write_int32 = &dta_old_write_int32; writer->callbacks.write_missing_number = &dta_old_write_missing_numeric; } writer->callbacks.write_float = &dta_write_float; writer->callbacks.write_double = &dta_write_double; writer->callbacks.write_string = &dta_write_string; writer->callbacks.write_missing_string = &dta_write_missing_string; writer->callbacks.begin_data = &dta_begin_data; writer->callbacks.end_data = &dta_end_data; writer->callbacks.module_ctx_free = &dta_module_ctx_free; return readstat_begin_writing_file(writer, user_ctx, row_count); } haven/src/readstat/stata/readstat_dta_read.c0000644000176200001440000012104614102306445020666 0ustar liggesusers#include #include #include #include #include #include #include #if !defined(_POSIX_VERSION) || _POSIX_VERSION < 200809L size_t strnlen(const char* s, size_t maxlen) { const char* end; end = memchr(s, '\0', maxlen); if (end == NULL) return maxlen; return end - s; } #endif #include "../readstat.h" #include "../readstat_bits.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" #include "../readstat_malloc.h" #include "readstat_dta.h" #include "readstat_dta_parse_timestamp.h" #define MAX_VALUE_LABEL_LEN 32000 static readstat_error_t dta_update_progress(dta_ctx_t *ctx); static readstat_error_t dta_read_descriptors(dta_ctx_t *ctx); static readstat_error_t dta_read_tag(dta_ctx_t *ctx, const char *tag); static readstat_error_t dta_read_expansion_fields(dta_ctx_t *ctx); static readstat_error_t dta_update_progress(dta_ctx_t *ctx) { double progress = 0.0; if (ctx->row_limit > 0) progress = 1.0 * ctx->current_row / ctx->row_limit; if (ctx->handle.progress && ctx->handle.progress(progress, ctx->user_ctx) != READSTAT_HANDLER_OK) return READSTAT_ERROR_USER_ABORT; return READSTAT_OK; } static readstat_variable_t 
*dta_init_variable(dta_ctx_t *ctx, int i, int index_after_skipping, readstat_type_t type, size_t max_len) { readstat_variable_t *variable = calloc(1, sizeof(readstat_variable_t)); variable->type = type; variable->index = i; variable->index_after_skipping = index_after_skipping; variable->storage_width = max_len; readstat_convert(variable->name, sizeof(variable->name), &ctx->varlist[ctx->variable_name_len*i], strnlen(&ctx->varlist[ctx->variable_name_len*i], ctx->variable_name_len), ctx->converter); if (ctx->variable_labels[ctx->variable_labels_entry_len*i]) { readstat_convert(variable->label, sizeof(variable->label), &ctx->variable_labels[ctx->variable_labels_entry_len*i], strnlen(&ctx->variable_labels[ctx->variable_labels_entry_len*i], ctx->variable_labels_entry_len), ctx->converter); } if (ctx->fmtlist[ctx->fmtlist_entry_len*i]) { readstat_convert(variable->format, sizeof(variable->format), &ctx->fmtlist[ctx->fmtlist_entry_len*i], strnlen(&ctx->fmtlist[ctx->fmtlist_entry_len*i], ctx->fmtlist_entry_len), ctx->converter); if (variable->format[0] == '%') { if (variable->format[1] == '-') { variable->alignment = READSTAT_ALIGNMENT_LEFT; } else if (variable->format[1] == '~') { variable->alignment = READSTAT_ALIGNMENT_CENTER; } else { variable->alignment = READSTAT_ALIGNMENT_RIGHT; } } int display_width; if (sscanf(variable->format, "%%%ds", &display_width) == 1 || sscanf(variable->format, "%%-%ds", &display_width) == 1) { variable->display_width = display_width; } } return variable; } static readstat_error_t dta_read_chunk( dta_ctx_t *ctx, const char *start_tag, void *dst, size_t dst_len, const char *end_tag) { char *dst_buffer = (char *)dst; readstat_io_t *io = ctx->io; readstat_error_t retval = READSTAT_OK; if ((retval = dta_read_tag(ctx, start_tag)) != READSTAT_OK) goto cleanup; if (io->read(dst_buffer, dst_len, io->io_ctx) != dst_len) { retval = READSTAT_ERROR_READ; goto cleanup; } if ((retval = dta_read_tag(ctx, end_tag)) != READSTAT_OK) goto cleanup; cleanup: 
return retval; } static readstat_error_t dta_read_map(dta_ctx_t *ctx) { if (!ctx->file_is_xmlish) return READSTAT_OK; readstat_error_t retval = READSTAT_OK; uint64_t map_buffer[14]; if ((retval = dta_read_chunk(ctx, "<map>", map_buffer, sizeof(map_buffer), "</map>")) != READSTAT_OK) { goto cleanup; } ctx->data_offset = ctx->bswap ? byteswap8(map_buffer[9]) : map_buffer[9]; ctx->strls_offset = ctx->bswap ? byteswap8(map_buffer[10]) : map_buffer[10]; ctx->value_labels_offset = ctx->bswap ? byteswap8(map_buffer[11]) : map_buffer[11]; cleanup: return retval; } static readstat_error_t dta_read_descriptors(dta_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; size_t buffer_len = ctx->nvar * ctx->typlist_entry_len; unsigned char *buffer = NULL; int i; if (ctx->nvar && (buffer = readstat_malloc(buffer_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if ((retval = dta_read_chunk(ctx, "<variable_types>", buffer, buffer_len, "</variable_types>")) != READSTAT_OK) goto cleanup; if (ctx->typlist_entry_len == 1) { for (i=0; i<ctx->nvar; i++) { ctx->typlist[i] = buffer[i]; } } else if (ctx->typlist_entry_len == 2) { memcpy(ctx->typlist, buffer, buffer_len); if (ctx->bswap) { for (i=0; i<ctx->nvar; i++) { ctx->typlist[i] = byteswap2(ctx->typlist[i]); } } } if ((retval = dta_read_chunk(ctx, "<varnames>", ctx->varlist, ctx->varlist_len, "</varnames>")) != READSTAT_OK) goto cleanup; if ((retval = dta_read_chunk(ctx, "<sortlist>", ctx->srtlist, ctx->srtlist_len, "</sortlist>")) != READSTAT_OK) goto cleanup; if ((retval = dta_read_chunk(ctx, "<formats>", ctx->fmtlist, ctx->fmtlist_len, "</formats>")) != READSTAT_OK) goto cleanup; if ((retval = dta_read_chunk(ctx, "<value_label_names>", ctx->lbllist, ctx->lbllist_len, "</value_label_names>")) != READSTAT_OK) goto cleanup; if ((retval = dta_read_chunk(ctx, "<variable_labels>", ctx->variable_labels, ctx->variable_labels_len, "</variable_labels>")) != READSTAT_OK) goto cleanup; cleanup: if (buffer) free(buffer); return retval; } static readstat_error_t dta_read_expansion_fields(dta_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; char *buffer = NULL; if (ctx->expansion_len_len
== 0) return READSTAT_OK; if (ctx->file_is_xmlish && !ctx->handle.note) { if (io->seek(ctx->data_offset, READSTAT_SEEK_SET, io->io_ctx) == -1) { if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "Failed to seek to data section (offset=%" PRId64 ")", ctx->data_offset); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } return READSTAT_ERROR_SEEK; } return READSTAT_OK; } retval = dta_read_tag(ctx, "<characteristics>"); if (retval != READSTAT_OK) goto cleanup; while (1) { size_t len; char data_type; if (ctx->file_is_xmlish) { char start[4]; if (io->read(start, sizeof(start), io->io_ctx) != sizeof(start)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (memcmp(start, "</ch", sizeof(start)) == 0) { retval = dta_read_tag(ctx, "aracteristics>"); if (retval != READSTAT_OK) goto cleanup; break; } else if (memcmp(start, "<ch>", sizeof(start)) != 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } data_type = 1; } else { if (io->read(&data_type, 1, io->io_ctx) != 1) { retval = READSTAT_ERROR_READ; goto cleanup; } } if (ctx->expansion_len_len == 2) { uint16_t len16; if (io->read(&len16, sizeof(uint16_t), io->io_ctx) != sizeof(uint16_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } len = ctx->bswap ? byteswap2(len16) : len16; } else { uint32_t len32; if (io->read(&len32, sizeof(uint32_t), io->io_ctx) != sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } len = ctx->bswap ?
byteswap4(len32) : len32; } if (data_type == 0 && len == 0) break; if (data_type != 1 || len > (1<<20)) { retval = READSTAT_ERROR_NOTE_IS_TOO_LONG; goto cleanup; } if (ctx->handle.note && len >= 2 * ctx->ch_metadata_len) { if ((buffer = readstat_realloc(buffer, len + 1)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } buffer[len] = '\0'; if (io->read(buffer, len, io->io_ctx) != len) { retval = READSTAT_ERROR_READ; goto cleanup; } int index = 0; if (strncmp(&buffer[0], "_dta", 4) == 0 && sscanf(&buffer[ctx->ch_metadata_len], "note%d", &index) == 1) { if (ctx->handle.note(index, &buffer[2*ctx->ch_metadata_len], ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } } else { if (io->seek(len, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } } retval = dta_read_tag(ctx, "</ch>"); if (retval != READSTAT_OK) goto cleanup; } cleanup: if (buffer) free(buffer); return retval; } static readstat_error_t dta_read_tag(dta_ctx_t *ctx, const char *tag) { readstat_error_t retval = READSTAT_OK; if (ctx->initialized && !ctx->file_is_xmlish) return retval; char buffer[256]; size_t len = strlen(tag); if (ctx->io->read(buffer, len, ctx->io->io_ctx) != len) { retval = READSTAT_ERROR_READ; goto cleanup; } if (strncmp(buffer, tag, len) != 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } cleanup: return retval; } static int dta_compare_strls(const void *elem1, const void *elem2) { const dta_strl_t *key = (const dta_strl_t *)elem1; const dta_strl_t *target = *(const dta_strl_t **)elem2; if (key->o == target->o) return key->v - target->v; return key->o - target->o; } static dta_strl_t dta_interpret_strl_vo_bytes(dta_ctx_t *ctx, const unsigned char *vo_bytes) { dta_strl_t strl = {0}; if (ctx->strl_v_len == 2) { if (ctx->endianness == READSTAT_ENDIAN_BIG) { strl.v = (vo_bytes[0] << 8) + vo_bytes[1]; strl.o = (((uint64_t)vo_bytes[2] << 40) + ((uint64_t)vo_bytes[3] << 32) + ((uint64_t)vo_bytes[4] << 24) +
(vo_bytes[5] << 16) + (vo_bytes[6] << 8) + vo_bytes[7]); } else { strl.v = vo_bytes[0] + (vo_bytes[1] << 8); strl.o = (vo_bytes[2] + (vo_bytes[3] << 8) + (vo_bytes[4] << 16) + ((uint64_t)vo_bytes[5] << 24) + ((uint64_t)vo_bytes[6] << 32) + ((uint64_t)vo_bytes[7] << 40)); } } else if (ctx->strl_v_len == 4) { uint32_t v, o; memcpy(&v, &vo_bytes[0], sizeof(uint32_t)); memcpy(&o, &vo_bytes[4], sizeof(uint32_t)); strl.v = ctx->bswap ? byteswap4(v) : v; strl.o = ctx->bswap ? byteswap4(o) : o; } return strl; } static readstat_error_t dta_117_read_strl(dta_ctx_t *ctx, dta_strl_t *strl) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; dta_117_strl_header_t header; if (io->read(&header, sizeof(header), io->io_ctx) != sizeof(dta_117_strl_header_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } strl->v = ctx->bswap ? byteswap4(header.v) : header.v; strl->o = ctx->bswap ? byteswap4(header.o) : header.o; strl->type = header.type; strl->len = ctx->bswap ? byteswap4(header.len) : header.len; cleanup: return retval; } static readstat_error_t dta_118_read_strl(dta_ctx_t *ctx, dta_strl_t *strl) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; dta_118_strl_header_t header; if (io->read(&header, sizeof(header), io->io_ctx) != sizeof(dta_118_strl_header_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } strl->v = ctx->bswap ? byteswap4(header.v) : header.v; strl->o = ctx->bswap ? byteswap8(header.o) : header.o; strl->type = header.type; strl->len = ctx->bswap ? 
byteswap4(header.len) : header.len; cleanup: return retval; } static readstat_error_t dta_read_strl(dta_ctx_t *ctx, dta_strl_t *strl) { if (ctx->strl_o_len > 4) { return dta_118_read_strl(ctx, strl); } return dta_117_read_strl(ctx, strl); } static readstat_error_t dta_read_strls(dta_ctx_t *ctx) { if (!ctx->file_is_xmlish) return READSTAT_OK; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; if (io->seek(ctx->strls_offset, READSTAT_SEEK_SET, io->io_ctx) == -1) { if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "Failed to seek to strls section (offset=%" PRId64 ")", ctx->strls_offset); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_SEEK; goto cleanup; } retval = dta_read_tag(ctx, "<strls>"); if (retval != READSTAT_OK) goto cleanup; ctx->strls_capacity = 100; ctx->strls = readstat_malloc(ctx->strls_capacity * sizeof(dta_strl_t *)); while (1) { char tag[3]; if (io->read(tag, sizeof(tag), io->io_ctx) != sizeof(tag)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (memcmp(tag, "GSO", sizeof(tag)) == 0) { dta_strl_t strl; retval = dta_read_strl(ctx, &strl); if (retval != READSTAT_OK) goto cleanup; if (strl.type != DTA_GSO_TYPE_ASCII) continue; if (ctx->strls_count == ctx->strls_capacity) { ctx->strls_capacity *= 2; if ((ctx->strls = readstat_realloc(ctx->strls, sizeof(dta_strl_t *) * ctx->strls_capacity)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } } dta_strl_t *strl_ptr = readstat_malloc(sizeof(dta_strl_t) + strl.len); if (strl_ptr == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } memcpy(strl_ptr, &strl, sizeof(dta_strl_t)); ctx->strls[ctx->strls_count++] = strl_ptr; if (io->read(&strl_ptr->data[0], strl_ptr->len, io->io_ctx) != strl_ptr->len) { retval = READSTAT_ERROR_READ; goto cleanup; } } else if (memcmp(tag, "</s", sizeof(tag)) == 0) { retval = dta_read_tag(ctx, "trls>"); if (retval != READSTAT_OK) goto cleanup; break; } else { retval = READSTAT_ERROR_PARSE; goto cleanup; } } cleanup: return retval; } static readstat_value_t
dta_interpret_int8_bytes(dta_ctx_t *ctx, const void *buf) { readstat_value_t value = { .type = READSTAT_TYPE_INT8 }; int8_t byte = 0; memcpy(&byte, buf, sizeof(int8_t)); if (ctx->machine_is_twos_complement) { byte = ones_to_twos_complement1(byte); } if (byte > ctx->max_int8) { if (ctx->supports_tagged_missing && byte > DTA_113_MISSING_INT8) { value.tag = 'a' + (byte - DTA_113_MISSING_INT8_A); value.is_tagged_missing = 1; } else { value.is_system_missing = 1; } } value.v.i8_value = byte; return value; } static readstat_value_t dta_interpret_int16_bytes(dta_ctx_t *ctx, const void *buf) { readstat_value_t value = { .type = READSTAT_TYPE_INT16 }; int16_t num = 0; memcpy(&num, buf, sizeof(int16_t)); if (ctx->bswap) { num = byteswap2(num); } if (ctx->machine_is_twos_complement) { num = ones_to_twos_complement2(num); } if (num > ctx->max_int16) { if (ctx->supports_tagged_missing && num > DTA_113_MISSING_INT16) { value.tag = 'a' + (num - DTA_113_MISSING_INT16_A); value.is_tagged_missing = 1; } else { value.is_system_missing = 1; } } value.v.i16_value = num; return value; } static readstat_value_t dta_interpret_int32_bytes(dta_ctx_t *ctx, const void *buf) { readstat_value_t value = { .type = READSTAT_TYPE_INT32 }; int32_t num = 0; memcpy(&num, buf, sizeof(int32_t)); if (ctx->bswap) { num = byteswap4(num); } if (ctx->machine_is_twos_complement) { num = ones_to_twos_complement4(num); } if (num > ctx->max_int32) { if (ctx->supports_tagged_missing && num > DTA_113_MISSING_INT32) { value.tag = 'a' + (num - DTA_113_MISSING_INT32_A); value.is_tagged_missing = 1; } else { value.is_system_missing = 1; } } value.v.i32_value = num; return value; } static readstat_value_t dta_interpret_float_bytes(dta_ctx_t *ctx, const void *buf) { readstat_value_t value = { .type = READSTAT_TYPE_FLOAT }; float f_num = NAN; int32_t num = 0; memcpy(&num, buf, sizeof(int32_t)); if (ctx->bswap) { num = byteswap4(num); } if (num > ctx->max_float) { if (ctx->supports_tagged_missing && num > 
DTA_113_MISSING_FLOAT) { value.tag = 'a' + ((num - DTA_113_MISSING_FLOAT_A) >> 11); value.is_tagged_missing = 1; } else { value.is_system_missing = 1; } } else { memcpy(&f_num, &num, sizeof(int32_t)); } value.v.float_value = f_num; return value; } static readstat_value_t dta_interpret_double_bytes(dta_ctx_t *ctx, const void *buf) { readstat_value_t value = { .type = READSTAT_TYPE_DOUBLE }; double d_num = NAN; int64_t num = 0; memcpy(&num, buf, sizeof(int64_t)); if (ctx->bswap) { num = byteswap8(num); } if (num > ctx->max_double) { if (ctx->supports_tagged_missing && num > DTA_113_MISSING_DOUBLE) { value.tag = 'a' + ((num - DTA_113_MISSING_DOUBLE_A) >> 40); value.is_tagged_missing = 1; } else { value.is_system_missing = 1; } } else { memcpy(&d_num, &num, sizeof(int64_t)); } value.v.double_value = d_num; return value; } static readstat_error_t dta_handle_row(const unsigned char *buf, dta_ctx_t *ctx) { char str_buf[2048]; int j; readstat_off_t offset = 0; readstat_error_t retval = READSTAT_OK; for (j=0; j<ctx->nvar; j++) { size_t max_len; readstat_value_t value = { { 0 } }; retval = dta_type_info(ctx->typlist[j], ctx, &max_len, &value.type); if (retval != READSTAT_OK) goto cleanup; if (ctx->variables[j]->skip) { offset += max_len; continue; } if (offset + max_len > ctx->record_len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (value.type == READSTAT_TYPE_STRING) { if (max_len == 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } size_t str_len = strnlen((const char *)&buf[offset], max_len); retval = readstat_convert(str_buf, sizeof(str_buf), (const char *)&buf[offset], str_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; value.v.string_value = str_buf; } else if (value.type == READSTAT_TYPE_STRING_REF) { dta_strl_t key = dta_interpret_strl_vo_bytes(ctx, &buf[offset]); dta_strl_t **found = bsearch(&key, ctx->strls, ctx->strls_count, sizeof(dta_strl_t *), &dta_compare_strls); if (found) { value.v.string_value = (*found)->data; } value.type =
READSTAT_TYPE_STRING; } else if (value.type == READSTAT_TYPE_INT8) { value = dta_interpret_int8_bytes(ctx, &buf[offset]); } else if (value.type == READSTAT_TYPE_INT16) { value = dta_interpret_int16_bytes(ctx, &buf[offset]); } else if (value.type == READSTAT_TYPE_INT32) { value = dta_interpret_int32_bytes(ctx, &buf[offset]); } else if (value.type == READSTAT_TYPE_FLOAT) { value = dta_interpret_float_bytes(ctx, &buf[offset]); } else if (value.type == READSTAT_TYPE_DOUBLE) { value = dta_interpret_double_bytes(ctx, &buf[offset]); } if (ctx->handle.value(ctx->current_row, ctx->variables[j], value, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } offset += max_len; } cleanup: return retval; } static readstat_error_t dta_handle_rows(dta_ctx_t *ctx) { readstat_io_t *io = ctx->io; unsigned char *buf = NULL; int i; readstat_error_t retval = READSTAT_OK; if (ctx->record_len && (buf = readstat_malloc(ctx->record_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (ctx->row_offset) { if (io->seek(ctx->record_len * ctx->row_offset, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } } for (i=0; irow_limit; i++) { if (io->read(buf, ctx->record_len, io->io_ctx) != ctx->record_len) { retval = READSTAT_ERROR_READ; goto cleanup; } if ((retval = dta_handle_row(buf, ctx)) != READSTAT_OK) { goto cleanup; } ctx->current_row++; if ((retval = dta_update_progress(ctx)) != READSTAT_OK) { goto cleanup; } } if (ctx->row_limit < ctx->nobs - ctx->row_offset) { if (io->seek(ctx->record_len * (ctx->nobs - ctx->row_offset - ctx->row_limit), READSTAT_SEEK_CUR, io->io_ctx) == -1) retval = READSTAT_ERROR_SEEK; } cleanup: if (buf) free(buf); return retval; } static readstat_error_t dta_read_data(dta_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; if (!ctx->handle.value) { return READSTAT_OK; } if (io->seek(ctx->data_offset, READSTAT_SEEK_SET, io->io_ctx) == -1) { if 
(ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "Failed to seek to data section (offset=%" PRId64 ")", ctx->data_offset); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_SEEK; goto cleanup; } if ((retval = dta_read_tag(ctx, "")) != READSTAT_OK) goto cleanup; if ((retval = dta_update_progress(ctx)) != READSTAT_OK) goto cleanup; if ((retval = dta_handle_rows(ctx)) != READSTAT_OK) goto cleanup; if ((retval = dta_read_tag(ctx, "")) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t dta_read_header(dta_ctx_t *ctx, dta_header_t *header) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; int bswap = 0; if (io->read(header, sizeof(dta_header_t), io->io_ctx) != sizeof(dta_header_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } bswap = (header->byteorder == DTA_LOHI) ^ machine_is_little_endian(); header->nvar = bswap ? byteswap2(header->nvar) : header->nvar; header->nobs = bswap ? byteswap4(header->nobs) : header->nobs; cleanup: return retval; } static readstat_error_t dta_read_xmlish_header(dta_ctx_t *ctx, dta_header64_t *header) { readstat_error_t retval = READSTAT_OK; if ((retval = dta_read_tag(ctx, "")) != READSTAT_OK) { goto cleanup; } if ((retval = dta_read_tag(ctx, "
")) != READSTAT_OK) { goto cleanup; } char ds_format[3]; if ((retval = dta_read_chunk(ctx, "", ds_format, sizeof(ds_format), "")) != READSTAT_OK) { goto cleanup; } header->ds_format = 100 * (ds_format[0] - '0') + 10 * (ds_format[1] - '0') + (ds_format[2] - '0'); char byteorder[3]; int byteswap = 0; if ((retval = dta_read_chunk(ctx, "", byteorder, sizeof(byteorder), "")) != READSTAT_OK) { goto cleanup; } if (strncmp(byteorder, "MSF", 3) == 0) { header->byteorder = DTA_HILO; } else if (strncmp(byteorder, "LSF", 3) == 0) { header->byteorder = DTA_LOHI; } else { retval = READSTAT_ERROR_PARSE; goto cleanup; } byteswap = (header->byteorder == DTA_LOHI) ^ machine_is_little_endian(); if (header->ds_format >= 119) { uint32_t nvar; if ((retval = dta_read_chunk(ctx, "", &nvar, sizeof(uint32_t), "")) != READSTAT_OK) { goto cleanup; } header->nvar = byteswap ? byteswap4(nvar) : nvar; } else { uint16_t nvar; if ((retval = dta_read_chunk(ctx, "", &nvar, sizeof(uint16_t), "")) != READSTAT_OK) { goto cleanup; } header->nvar = byteswap ? byteswap2(nvar) : nvar; } if (header->ds_format >= 118) { uint64_t nobs; if ((retval = dta_read_chunk(ctx, "", &nobs, sizeof(uint64_t), "")) != READSTAT_OK) { goto cleanup; } header->nobs = byteswap ? byteswap8(nobs) : nobs; } else { uint32_t nobs; if ((retval = dta_read_chunk(ctx, "", &nobs, sizeof(uint32_t), "")) != READSTAT_OK) { goto cleanup; } header->nobs = byteswap ? 
byteswap4(nobs) : nobs; } cleanup: return retval; } static readstat_error_t dta_read_label_and_timestamp(dta_ctx_t *ctx) { readstat_io_t *io = ctx->io; readstat_error_t retval = READSTAT_OK; char *data_label_buffer = NULL; char *timestamp_buffer = NULL; uint16_t label_len = 0; unsigned char timestamp_len = 0; char last_data_label_char = 0; struct tm timestamp = { .tm_isdst = -1 }; if (ctx->file_is_xmlish) { if ((retval = dta_read_tag(ctx, "")) != READSTAT_OK) { goto cleanup; } if ((retval = dta_read_tag(ctx, "")) != READSTAT_OK) { goto cleanup; } if (io->read(&timestamp_len, 1, io->io_ctx) != 1) { retval = READSTAT_ERROR_READ; goto cleanup; } } else { timestamp_len = ctx->timestamp_len; } if (timestamp_len) { timestamp_buffer = readstat_malloc(timestamp_len); if (io->read(timestamp_buffer, timestamp_len, io->io_ctx) != timestamp_len) { retval = READSTAT_ERROR_READ; goto cleanup; } if (!ctx->file_is_xmlish) timestamp_len--; if (timestamp_buffer[0]) { if (timestamp_buffer[timestamp_len-1] == '\0' && last_data_label_char != '\0') { /* Stupid hack for miswritten files with off-by-one timestamp, DTA 114 era? 
*/ memmove(timestamp_buffer+1, timestamp_buffer, timestamp_len-1); timestamp_buffer[0] = last_data_label_char; } if (dta_parse_timestamp(timestamp_buffer, timestamp_len, &timestamp, ctx->handle.error, ctx->user_ctx) == READSTAT_OK) { ctx->timestamp = mktime(&timestamp); } } } if ((retval = dta_read_tag(ctx, "")) != READSTAT_OK) { goto cleanup; } cleanup: if (data_label_buffer) free(data_label_buffer); if (timestamp_buffer) free(timestamp_buffer); return retval; } static readstat_error_t dta_handle_variables(dta_ctx_t *ctx) { if (!ctx->handle.variable) return READSTAT_OK; readstat_error_t retval = READSTAT_OK; int i; int index_after_skipping = 0; for (i=0; i<ctx->nvar; i++) { size_t max_len; readstat_type_t type; retval = dta_type_info(ctx->typlist[i], ctx, &max_len, &type); if (retval != READSTAT_OK) goto cleanup; if (type == READSTAT_TYPE_STRING) max_len++; /* might append NULL */ if (type == READSTAT_TYPE_STRING_REF) { type = READSTAT_TYPE_STRING; max_len = 0; } ctx->variables[i] = dta_init_variable(ctx, i, index_after_skipping, type, max_len); const char *value_labels = NULL; if (ctx->lbllist[ctx->lbllist_entry_len*i]) value_labels = &ctx->lbllist[ctx->lbllist_entry_len*i]; int cb_retval = ctx->handle.variable(i, ctx->variables[i], value_labels, ctx->user_ctx); if (cb_retval == READSTAT_HANDLER_ABORT) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } if (cb_retval == READSTAT_HANDLER_SKIP_VARIABLE) { ctx->variables[i]->skip = 1; } else { index_after_skipping++; } } cleanup: return retval; } static readstat_error_t dta_handle_value_labels(dta_ctx_t *ctx) { readstat_io_t *io = ctx->io; readstat_error_t retval = READSTAT_OK; char *table_buffer = NULL; char *utf8_buffer = NULL; if (io->seek(ctx->value_labels_offset, READSTAT_SEEK_SET, io->io_ctx) == -1) { if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "Failed to seek to value labels section (offset=%" PRId64 ")", ctx->value_labels_offset); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } 
retval = 
READSTAT_ERROR_SEEK; goto cleanup; } if ((retval = dta_read_tag(ctx, "")) != READSTAT_OK) { goto cleanup; } if (!ctx->handle.value_label) { return READSTAT_OK; } while (1) { size_t len = 0; char labname[129]; uint32_t i = 0, n = 0; if (ctx->value_label_table_len_len == 2) { int16_t table_header_len; if (io->read(&table_header_len, sizeof(int16_t), io->io_ctx) < sizeof(int16_t)) break; len = table_header_len; if (ctx->bswap) len = byteswap2(table_header_len); n = len / 8; } else { if (dta_read_tag(ctx, "") != READSTAT_OK) { break; } int32_t table_header_len; if (io->read(&table_header_len, sizeof(int32_t), io->io_ctx) < sizeof(int32_t)) break; len = table_header_len; if (ctx->bswap) len = byteswap4(table_header_len); } if (io->read(labname, ctx->value_label_table_labname_len, io->io_ctx) < ctx->value_label_table_labname_len) break; if (io->seek(ctx->value_label_table_padding_len, READSTAT_SEEK_CUR, io->io_ctx) == -1) break; if ((table_buffer = readstat_realloc(table_buffer, len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (io->read(table_buffer, len, io->io_ctx) < len) { break; } if (ctx->value_label_table_len_len == 2) { for (i=0; iconverter); if (retval != READSTAT_OK) goto cleanup; if (label_buf[0] && ctx->handle.value_label(labname, value, label_buf, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } } else if (len >= 8) { if ((retval = dta_read_tag(ctx, "")) != READSTAT_OK) { goto cleanup; } n = *(uint32_t *)table_buffer; uint32_t txtlen = *((uint32_t *)table_buffer+1); if (ctx->bswap) { n = byteswap4(n); txtlen = byteswap4(txtlen); } if (txtlen > len - 8 || n > (len - 8 - txtlen) / 8) { break; } uint32_t *off = (uint32_t *)table_buffer+2; uint32_t *val = (uint32_t *)table_buffer+2+n; char *txt = &table_buffer[8LL*n+8]; size_t utf8_buffer_len = 4*txtlen+1; if (txtlen > MAX_VALUE_LABEL_LEN+1) utf8_buffer_len = 4*MAX_VALUE_LABEL_LEN+1; utf8_buffer = realloc(utf8_buffer, utf8_buffer_len); /* Much 
bigger than we need but whatever */ if (utf8_buffer == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (ctx->bswap) { for (i=0; i= txtlen) { retval = READSTAT_ERROR_PARSE; goto cleanup; } readstat_value_t value = dta_interpret_int32_bytes(ctx, &val[i]); size_t max_label_len = txtlen - off[i]; if (max_label_len > MAX_VALUE_LABEL_LEN) max_label_len = MAX_VALUE_LABEL_LEN; size_t label_len = strnlen(&txt[off[i]], max_label_len); retval = readstat_convert(utf8_buffer, utf8_buffer_len, &txt[off[i]], label_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; if (ctx->handle.value_label(labname, value, utf8_buffer, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } } } cleanup: if (table_buffer) free(table_buffer); if (utf8_buffer) free(utf8_buffer); return retval; } readstat_error_t readstat_parse_dta(readstat_parser_t *parser, const char *path, void *user_ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = parser->io; int i; dta_ctx_t *ctx; size_t file_size = 0; ctx = dta_ctx_alloc(io); if (io->open(path, io->io_ctx) == -1) { retval = READSTAT_ERROR_OPEN; goto cleanup; } char magic[4]; if (io->read(magic, 4, io->io_ctx) != 4) { retval = READSTAT_ERROR_READ; goto cleanup; } file_size = io->seek(0, READSTAT_SEEK_END, io->io_ctx); if (file_size == -1) { if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "Failed to seek to end of file"); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->seek(0, READSTAT_SEEK_SET, io->io_ctx) == -1) { if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "Failed to seek to start of file"); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } retval = READSTAT_ERROR_SEEK; goto cleanup; } if (strncmp(magic, "input_encoding, parser->output_encoding); } else { dta_header_t header; if ((retval = dta_read_header(ctx, &header)) != READSTAT_OK) { goto cleanup; } retval = 
dta_ctx_init(ctx, header.nvar, header.nobs, header.byteorder, header.ds_format, parser->input_encoding, parser->output_encoding); } if (retval != READSTAT_OK) { goto cleanup; } ctx->user_ctx = user_ctx; ctx->file_size = file_size; ctx->handle = parser->handlers; if (parser->row_offset > 0) ctx->row_offset = parser->row_offset; int64_t nobs_after_skipping = ctx->nobs - ctx->row_offset; if (nobs_after_skipping < 0) { nobs_after_skipping = 0; ctx->row_offset = ctx->nobs; } ctx->row_limit = nobs_after_skipping; if (parser->row_limit > 0 && parser->row_limit < nobs_after_skipping) ctx->row_limit = parser->row_limit; retval = dta_update_progress(ctx); if (retval != READSTAT_OK) goto cleanup; if ((retval = dta_read_label_and_timestamp(ctx)) != READSTAT_OK) goto cleanup; if ((retval = dta_read_tag(ctx, "
")) != READSTAT_OK) { goto cleanup; } if (ctx->handle.metadata) { readstat_metadata_t metadata = { .row_count = ctx->row_limit, .var_count = ctx->nvar, .file_label = ctx->data_label, .creation_time = ctx->timestamp, .modified_time = ctx->timestamp, .file_format_version = ctx->ds_format, .is64bit = ctx->ds_format >= 118, .endianness = ctx->endianness }; if (ctx->handle.metadata(&metadata, user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } if ((retval = dta_read_map(ctx)) != READSTAT_OK) { retval = READSTAT_ERROR_READ; goto cleanup; } if ((retval = dta_read_descriptors(ctx)) != READSTAT_OK) { goto cleanup; } for (i=0; invar; i++) { size_t max_len; if ((retval = dta_type_info(ctx->typlist[i], ctx, &max_len, NULL)) != READSTAT_OK) goto cleanup; ctx->record_len += max_len; } if ((ctx->nvar > 0 || ctx->nobs > 0) && ctx->record_len == 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if ((retval = dta_handle_variables(ctx)) != READSTAT_OK) goto cleanup; if ((retval = dta_read_expansion_fields(ctx)) != READSTAT_OK) goto cleanup; if (!ctx->file_is_xmlish) { ctx->data_offset = io->seek(0, READSTAT_SEEK_CUR, io->io_ctx); if (ctx->data_offset == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } ctx->value_labels_offset = ctx->data_offset + ctx->record_len * ctx->nobs; } if ((retval = dta_read_strls(ctx)) != READSTAT_OK) goto cleanup; if ((retval = dta_read_data(ctx)) != READSTAT_OK) goto cleanup; if ((retval = dta_handle_value_labels(ctx)) != READSTAT_OK) goto cleanup; cleanup: io->close(io->io_ctx); if (ctx) dta_ctx_free(ctx); return retval; } haven/src/readstat/stata/readstat_dta.c0000644000176200001440000002326614101007206017670 0ustar liggesusers#include #include #include #include #include #include "../readstat.h" #include "../readstat_iconv.h" #include "../readstat_malloc.h" #include "../readstat_bits.h" #include "readstat_dta.h" #define DTA_MIN_VERSION 104 #define DTA_MAX_VERSION 119 dta_ctx_t *dta_ctx_alloc(readstat_io_t *io) { 
dta_ctx_t *ctx = calloc(1, sizeof(dta_ctx_t)); if (ctx == NULL) { return NULL; } ctx->io = io; ctx->initialized = 0; return ctx; } readstat_error_t dta_ctx_init(dta_ctx_t *ctx, uint32_t nvar, uint64_t nobs, unsigned char byteorder, unsigned char ds_format, const char *input_encoding, const char *output_encoding) { readstat_error_t retval = READSTAT_OK; int machine_byteorder = DTA_HILO; if (ds_format < DTA_MIN_VERSION || ds_format > DTA_MAX_VERSION) return READSTAT_ERROR_UNSUPPORTED_FILE_FORMAT_VERSION; if (machine_is_little_endian()) { machine_byteorder = DTA_LOHI; } ctx->bswap = (byteorder != machine_byteorder); ctx->ds_format = ds_format; ctx->endianness = byteorder == DTA_LOHI ? READSTAT_ENDIAN_LITTLE : READSTAT_ENDIAN_BIG; ctx->nvar = nvar; ctx->nobs = nobs; if (ctx->nvar) { if ((ctx->variables = readstat_calloc(ctx->nvar, sizeof(readstat_variable_t *))) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } } ctx->machine_is_twos_complement = READSTAT_MACHINE_IS_TWOS_COMPLEMENT; if (ds_format < 105) { ctx->fmtlist_entry_len = 7; } else if (ds_format < 114) { ctx->fmtlist_entry_len = 12; } else if (ds_format < 118) { ctx->fmtlist_entry_len = 49; } else { ctx->fmtlist_entry_len = 57; } if (ds_format >= 117) { ctx->typlist_version = 117; } else if (ds_format >= 111) { ctx->typlist_version = 111; } else { ctx->typlist_version = 0; } if (ds_format >= 118) { ctx->data_label_len_len = 2; ctx->strl_v_len = 2; ctx->strl_o_len = 6; } else if (ds_format >= 117) { ctx->data_label_len_len = 1; ctx->strl_v_len = 4; ctx->strl_o_len = 4; } if (ds_format < 105) { ctx->expansion_len_len = 0; } else if (ds_format < 110) { ctx->expansion_len_len = 2; } else { ctx->expansion_len_len = 4; } if (ds_format < 110) { ctx->lbllist_entry_len = 9; ctx->variable_name_len = 9; ctx->ch_metadata_len = 9; } else if (ds_format < 118) { ctx->lbllist_entry_len = 33; ctx->variable_name_len = 33; ctx->ch_metadata_len = 33; } else { ctx->lbllist_entry_len = 129; ctx->variable_name_len = 129; 
ctx->ch_metadata_len = 129; } if (ds_format < 108) { ctx->variable_labels_entry_len = 32; ctx->data_label_len = 32; } else if (ds_format < 118) { ctx->variable_labels_entry_len = 81; ctx->data_label_len = 81; } else { ctx->variable_labels_entry_len = 321; ctx->data_label_len = 321; } if (ds_format < 105) { ctx->timestamp_len = 0; ctx->value_label_table_len_len = 2; ctx->value_label_table_labname_len = 12; ctx->value_label_table_padding_len = 2; } else { ctx->timestamp_len = 18; ctx->value_label_table_len_len = 4; if (ds_format < 118) { ctx->value_label_table_labname_len = 33; } else { ctx->value_label_table_labname_len = 129; } ctx->value_label_table_padding_len = 3; } if (ds_format < 117) { ctx->typlist_entry_len = 1; ctx->file_is_xmlish = 0; } else { ctx->typlist_entry_len = 2; ctx->file_is_xmlish = 1; } if (ds_format < 113) { ctx->max_int8 = DTA_OLD_MAX_INT8; ctx->max_int16 = DTA_OLD_MAX_INT16; ctx->max_int32 = DTA_OLD_MAX_INT32; ctx->max_float = DTA_OLD_MAX_FLOAT; ctx->max_double = DTA_OLD_MAX_DOUBLE; } else { ctx->max_int8 = DTA_113_MAX_INT8; ctx->max_int16 = DTA_113_MAX_INT16; ctx->max_int32 = DTA_113_MAX_INT32; ctx->max_float = DTA_113_MAX_FLOAT; ctx->max_double = DTA_113_MAX_DOUBLE; ctx->supports_tagged_missing = 1; } if (output_encoding) { if (input_encoding) { ctx->converter = iconv_open(output_encoding, input_encoding); } else if (ds_format < 118) { ctx->converter = iconv_open(output_encoding, "WINDOWS-1252"); } else if (strcmp(output_encoding, "UTF-8") != 0) { ctx->converter = iconv_open(output_encoding, "UTF-8"); } if (ctx->converter == (iconv_t)-1) { ctx->converter = NULL; retval = READSTAT_ERROR_UNSUPPORTED_CHARSET; goto cleanup; } } if (ds_format < 119) { ctx->srtlist_len = (ctx->nvar + 1) * sizeof(int16_t); } else { ctx->srtlist_len = (ctx->nvar + 1) * sizeof(int32_t); } if ((ctx->srtlist = readstat_malloc(ctx->srtlist_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (ctx->nvar > 0) { ctx->typlist_len = ctx->nvar * 
sizeof(uint16_t); ctx->varlist_len = ctx->variable_name_len * ctx->nvar * sizeof(char); ctx->fmtlist_len = ctx->fmtlist_entry_len * ctx->nvar * sizeof(char); ctx->lbllist_len = ctx->lbllist_entry_len * ctx->nvar * sizeof(char); ctx->variable_labels_len = ctx->variable_labels_entry_len * ctx->nvar * sizeof(char); if ((ctx->typlist = readstat_malloc(ctx->typlist_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if ((ctx->varlist = readstat_malloc(ctx->varlist_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if ((ctx->fmtlist = readstat_malloc(ctx->fmtlist_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if ((ctx->lbllist = readstat_malloc(ctx->lbllist_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if ((ctx->variable_labels = readstat_malloc(ctx->variable_labels_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } } ctx->initialized = 1; cleanup: return retval; } void dta_ctx_free(dta_ctx_t *ctx) { if (ctx->typlist) free(ctx->typlist); if (ctx->varlist) free(ctx->varlist); if (ctx->srtlist) free(ctx->srtlist); if (ctx->fmtlist) free(ctx->fmtlist); if (ctx->lbllist) free(ctx->lbllist); if (ctx->variable_labels) free(ctx->variable_labels); if (ctx->converter) iconv_close(ctx->converter); if (ctx->data_label) free(ctx->data_label); if (ctx->variables) { int i; for (i=0; i<ctx->nvar; i++) { if (ctx->variables[i]) free(ctx->variables[i]); } free(ctx->variables); } if (ctx->strls) { int i; for (i=0; i<ctx->strls_count; i++) { free(ctx->strls[i]); } free(ctx->strls); } free(ctx); } readstat_error_t dta_type_info(uint16_t typecode, dta_ctx_t *ctx, size_t *max_len, readstat_type_t *out_type) { readstat_error_t retval = READSTAT_OK; size_t len = 0; readstat_type_t type = READSTAT_TYPE_STRING; if (ctx->typlist_version == 111) { switch (typecode) { case DTA_111_TYPE_CODE_INT8: len = 1; type = READSTAT_TYPE_INT8; break; case DTA_111_TYPE_CODE_INT16: len = 2; type = READSTAT_TYPE_INT16; break; case 
DTA_111_TYPE_CODE_INT32: len = 4; type = READSTAT_TYPE_INT32; break; case DTA_111_TYPE_CODE_FLOAT: len = 4; type = READSTAT_TYPE_FLOAT; break; case DTA_111_TYPE_CODE_DOUBLE: len = 8; type = READSTAT_TYPE_DOUBLE; break; default: len = typecode; type = READSTAT_TYPE_STRING; break; } } else if (ctx->typlist_version == 117) { switch (typecode) { case DTA_117_TYPE_CODE_INT8: len = 1; type = READSTAT_TYPE_INT8; break; case DTA_117_TYPE_CODE_INT16: len = 2; type = READSTAT_TYPE_INT16; break; case DTA_117_TYPE_CODE_INT32: len = 4; type = READSTAT_TYPE_INT32; break; case DTA_117_TYPE_CODE_FLOAT: len = 4; type = READSTAT_TYPE_FLOAT; break; case DTA_117_TYPE_CODE_DOUBLE: len = 8; type = READSTAT_TYPE_DOUBLE; break; case DTA_117_TYPE_CODE_STRL: len = 8; type = READSTAT_TYPE_STRING_REF; break; default: len = typecode; type = READSTAT_TYPE_STRING; break; } } else if (typecode < 0x7F) { switch (typecode) { case DTA_OLD_TYPE_CODE_INT8: len = 1; type = READSTAT_TYPE_INT8; break; case DTA_OLD_TYPE_CODE_INT16: len = 2; type = READSTAT_TYPE_INT16; break; case DTA_OLD_TYPE_CODE_INT32: len = 4; type = READSTAT_TYPE_INT32; break; case DTA_OLD_TYPE_CODE_FLOAT: len = 4; type = READSTAT_TYPE_FLOAT; break; case DTA_OLD_TYPE_CODE_DOUBLE: len = 8; type = READSTAT_TYPE_DOUBLE; break; default: retval = READSTAT_ERROR_PARSE; break; } } else { len = typecode - 0x7F; type = READSTAT_TYPE_STRING; } if (max_len) *max_len = len; if (out_type) *out_type = type; return retval; } haven/src/readstat/stata/readstat_dta_parse_timestamp.h0000644000176200001440000000023114101007206023135 0ustar liggesusers readstat_error_t dta_parse_timestamp(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_handler, void *user_ctx); haven/src/readstat/stata/readstat_dta_parse_timestamp.c0000644000176200001440000002460314101765776023170 0ustar liggesusers#line 1 "src/stata/readstat_dta_parse_timestamp.rl" #include #include "../readstat.h" #include "readstat_dta_parse_timestamp.h" #line 9 
"src/stata/readstat_dta_parse_timestamp.c" static const signed char _dta_timestamp_parse_actions[] = { 0, 1, 0, 1, 2, 1, 3, 1, 4, 1, 5, 1, 6, 1, 7, 1, 8, 1, 9, 1, 10, 1, 11, 1, 12, 1, 13, 1, 14, 1, 15, 1, 16, 1, 17, 2, 1, 0, 0 }; static const signed char _dta_timestamp_parse_key_offsets[] = { 0, 0, 3, 5, 8, 26, 34, 36, 37, 39, 42, 45, 48, 50, 52, 53, 55, 59, 63, 64, 66, 68, 70, 71, 73, 75, 76, 80, 82, 86, 87, 88, 90, 96, 97, 98, 100, 102, 103, 107, 109, 110, 112, 114, 115, 0 }; static const char _dta_timestamp_parse_trans_keys[] = { 32, 48, 57, 48, 57, 32, 48, 57, 65, 68, 69, 70, 74, 77, 78, 79, 83, 97, 100, 101, 102, 106, 109, 110, 111, 115, 66, 71, 80, 85, 98, 103, 112, 117, 82, 114, 32, 48, 57, 32, 48, 57, 32, 48, 57, 58, 48, 57, 48, 57, 79, 111, 32, 71, 103, 69, 73, 101, 105, 67, 90, 99, 122, 32, 67, 99, 78, 110, 69, 101, 32, 69, 101, 66, 98, 32, 65, 85, 97, 117, 78, 110, 76, 78, 108, 110, 32, 32, 65, 97, 73, 82, 89, 105, 114, 121, 32, 32, 79, 111, 86, 118, 32, 67, 75, 99, 107, 84, 116, 32, 69, 101, 80, 112, 32, 48, 57, 0 }; static const signed char _dta_timestamp_parse_single_lengths[] = { 0, 1, 0, 1, 18, 8, 2, 1, 0, 1, 1, 1, 0, 2, 1, 2, 4, 4, 1, 2, 2, 2, 1, 2, 2, 1, 4, 2, 4, 1, 1, 2, 6, 1, 1, 2, 2, 1, 4, 2, 1, 2, 2, 1, 0, 0 }; static const signed char _dta_timestamp_parse_range_lengths[] = { 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 }; static const short _dta_timestamp_parse_index_offsets[] = { 0, 0, 3, 5, 8, 27, 36, 39, 41, 43, 46, 49, 52, 54, 57, 59, 62, 67, 72, 74, 77, 80, 83, 85, 88, 91, 93, 98, 101, 106, 108, 110, 113, 120, 122, 124, 127, 130, 132, 137, 140, 142, 145, 148, 150, 0 }; static const signed char _dta_timestamp_parse_cond_targs[] = { 2, 3, 0, 3, 0, 4, 3, 0, 5, 16, 20, 23, 26, 31, 35, 38, 41, 5, 16, 20, 23, 26, 31, 35, 38, 41, 0, 6, 13, 6, 15, 6, 13, 6, 15, 0, 7, 7, 0, 8, 0, 9, 0, 10, 9, 0, 10, 11, 0, 12, 11, 0, 44, 0, 14, 14, 0, 8, 0, 14, 14, 0, 17, 
19, 17, 19, 0, 18, 18, 18, 18, 0, 8, 0, 18, 18, 0, 21, 21, 0, 22, 22, 0, 8, 0, 24, 24, 0, 25, 25, 0, 8, 0, 27, 28, 27, 28, 0, 22, 22, 0, 29, 30, 29, 30, 0, 8, 0, 8, 0, 32, 32, 0, 33, 34, 33, 33, 34, 33, 0, 8, 0, 8, 0, 36, 36, 0, 37, 37, 0, 8, 0, 39, 39, 39, 39, 0, 40, 40, 0, 8, 0, 42, 42, 0, 43, 43, 0, 8, 0, 44, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 0 }; static const signed char _dta_timestamp_parse_cond_actions[] = { 0, 35, 0, 35, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 0, 35, 0, 29, 1, 0, 0, 35, 0, 31, 1, 0, 35, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 9, 0, 0, 0, 0, 0, 0, 0, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 21, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 0 }; static const short _dta_timestamp_parse_eof_trans[] = { 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 0 }; static const int dta_timestamp_parse_start = 1; static const int dta_timestamp_parse_en_main = 1; #line 9 "src/stata/readstat_dta_parse_timestamp.rl" readstat_error_t dta_parse_timestamp(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_handler, void *user_ctx) { readstat_error_t retval = READSTAT_OK; const char *p = data; const char *pe = p + len; const char *eof = pe; int cs; unsigned int temp_val = 0; #line 154 "src/stata/readstat_dta_parse_timestamp.c" { cs = (int)dta_timestamp_parse_start; } #line 159 
"src/stata/readstat_dta_parse_timestamp.c" { int _klen; unsigned int _trans = 0; const char * _keys; const signed char * _acts; unsigned int _nacts; _resume: {} if ( p == pe && p != eof ) goto _out; if ( p == eof ) { if ( _dta_timestamp_parse_eof_trans[cs] > 0 ) { _trans = (unsigned int)_dta_timestamp_parse_eof_trans[cs] - 1; } } else { _keys = ( _dta_timestamp_parse_trans_keys + (_dta_timestamp_parse_key_offsets[cs])); _trans = (unsigned int)_dta_timestamp_parse_index_offsets[cs]; _klen = (int)_dta_timestamp_parse_single_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + _klen - 1; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _keys += _klen; _trans += (unsigned int)_klen; break; } _mid = _lower + ((_upper-_lower) >> 1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 1; else if ( ( (*( p))) > (*( _mid)) ) _lower = _mid + 1; else { _trans += (unsigned int)(_mid - _keys); goto _match; } } } _klen = (int)_dta_timestamp_parse_range_lengths[cs]; if ( _klen > 0 ) { const char *_lower = _keys; const char *_upper = _keys + (_klen<<1) - 2; const char *_mid; while ( 1 ) { if ( _upper < _lower ) { _trans += (unsigned int)_klen; break; } _mid = _lower + (((_upper-_lower) >> 1) & ~1); if ( ( (*( p))) < (*( _mid)) ) _upper = _mid - 2; else if ( ( (*( p))) > (*( _mid + 1)) ) _lower = _mid + 2; else { _trans += (unsigned int)((_mid - _keys)>>1); break; } } } _match: {} } cs = (int)_dta_timestamp_parse_cond_targs[_trans]; if ( _dta_timestamp_parse_cond_actions[_trans] != 0 ) { _acts = ( _dta_timestamp_parse_actions + (_dta_timestamp_parse_cond_actions[_trans])); _nacts = (unsigned int)(*( _acts)); _acts += 1; while ( _nacts > 0 ) { switch ( (*( _acts)) ) { case 0: { { #line 20 "src/stata/readstat_dta_parse_timestamp.rl" temp_val = 10 * temp_val + ((( (*( p)))) - '0'); } #line 244 "src/stata/readstat_dta_parse_timestamp.c" break; } case 1: { { #line 24 "src/stata/readstat_dta_parse_timestamp.rl" temp_val = 0; } #line 253 
"src/stata/readstat_dta_parse_timestamp.c" break; } case 2: { { #line 26 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mday = temp_val; } #line 262 "src/stata/readstat_dta_parse_timestamp.c" break; } case 3: { { #line 29 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 0; } #line 271 "src/stata/readstat_dta_parse_timestamp.c" break; } case 4: { { #line 30 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 1; } #line 280 "src/stata/readstat_dta_parse_timestamp.c" break; } case 5: { { #line 31 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 2; } #line 289 "src/stata/readstat_dta_parse_timestamp.c" break; } case 6: { { #line 32 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 3; } #line 298 "src/stata/readstat_dta_parse_timestamp.c" break; } case 7: { { #line 33 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 4; } #line 307 "src/stata/readstat_dta_parse_timestamp.c" break; } case 8: { { #line 34 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 5; } #line 316 "src/stata/readstat_dta_parse_timestamp.c" break; } case 9: { { #line 35 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 6; } #line 325 "src/stata/readstat_dta_parse_timestamp.c" break; } case 10: { { #line 36 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 7; } #line 334 "src/stata/readstat_dta_parse_timestamp.c" break; } case 11: { { #line 37 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 8; } #line 343 "src/stata/readstat_dta_parse_timestamp.c" break; } case 12: { { #line 38 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 9; } #line 352 "src/stata/readstat_dta_parse_timestamp.c" break; } case 13: { { #line 39 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 10; } #line 361 "src/stata/readstat_dta_parse_timestamp.c" break; } case 14: { { #line 40 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_mon = 11; } #line 370 
"src/stata/readstat_dta_parse_timestamp.c" break; } case 15: { { #line 42 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_year = temp_val - 1900; } #line 379 "src/stata/readstat_dta_parse_timestamp.c" break; } case 16: { { #line 44 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_hour = temp_val; } #line 388 "src/stata/readstat_dta_parse_timestamp.c" break; } case 17: { { #line 46 "src/stata/readstat_dta_parse_timestamp.rl" timestamp->tm_min = temp_val; } #line 397 "src/stata/readstat_dta_parse_timestamp.c" break; } } _nacts -= 1; _acts += 1; } } if ( p == eof ) { if ( cs >= 44 ) goto _out; } else { if ( cs != 0 ) { p += 1; goto _resume; } } _out: {} } #line 52 "src/stata/readstat_dta_parse_timestamp.rl" if (cs < #line 425 "src/stata/readstat_dta_parse_timestamp.c" 44 #line 54 "src/stata/readstat_dta_parse_timestamp.rl" || p != pe) { char error_buf[1024]; if (error_handler) { snprintf(error_buf, sizeof(error_buf), "Invalid timestamp string (length=%d): %.*s", (int)len, (int)len, data); error_handler(error_buf, user_ctx); } retval = READSTAT_ERROR_BAD_TIMESTAMP_STRING; } (void)dta_timestamp_parse_en_main; return retval; } haven/src/readstat/stata/readstat_dta_parse_timestamp.rl0000644000176200001440000000417014101007206023331 0ustar liggesusers #include #include "../readstat.h" #include "readstat_dta_parse_timestamp.h" %%{ machine dta_timestamp_parse; write data nofinal noerror; }%% readstat_error_t dta_parse_timestamp(const char *data, size_t len, struct tm *timestamp, readstat_error_handler error_handler, void *user_ctx) { readstat_error_t retval = READSTAT_OK; const char *p = data; const char *pe = p + len; const char *eof = pe; int cs; unsigned int temp_val = 0; %%{ action incr_val { temp_val = 10 * temp_val + (fc - '0'); } integer = [0-9]+ >{ temp_val = 0; } $incr_val; day = integer %{ timestamp->tm_mday = temp_val; }; month = # with some German and Spanish variants thrown in ("Jan"i | "Ene"i) %{ timestamp->tm_mon = 0; } | ("Feb"i) %{ 
timestamp->tm_mon = 1; } | ("Mar"i) %{ timestamp->tm_mon = 2; } | ("Apr"i | "Abr"i) %{ timestamp->tm_mon = 3; } | ("May"i | "Mai"i) %{ timestamp->tm_mon = 4; } | ("Jun"i) %{ timestamp->tm_mon = 5; } | ("Jul"i) %{ timestamp->tm_mon = 6; } | ("Aug"i | "Ago"i) %{ timestamp->tm_mon = 7; } | ("Sep"i) %{ timestamp->tm_mon = 8; } | ("Oct"i | "Okt"i) %{ timestamp->tm_mon = 9; } | ("Nov"i) %{ timestamp->tm_mon = 10; } | ("Dec"i | "Dez"i | "Dic"i) %{ timestamp->tm_mon = 11; }; year = integer %{ timestamp->tm_year = temp_val - 1900; }; hour = integer %{ timestamp->tm_hour = temp_val; }; minute = integer %{ timestamp->tm_min = temp_val; }; main := " "? day " " month " " year " "+ hour ":" minute; write init; write exec; }%% if (cs < %%{ write first_final; }%%|| p != pe) { char error_buf[1024]; if (error_handler) { snprintf(error_buf, sizeof(error_buf), "Invalid timestamp string (length=%d): %.*s", (int)len, (int)len, data); error_handler(error_buf, user_ctx); } retval = READSTAT_ERROR_BAD_TIMESTAMP_STRING; } (void)dta_timestamp_parse_en_main; return retval; } haven/src/readstat/readstat_writer.c0000644000176200001440000006066514101765776017353 0ustar liggesusers #include #include #include "readstat.h" #include "readstat_writer.h" #define VARIABLES_INITIAL_CAPACITY 50 #define LABEL_SETS_INITIAL_CAPACITY 50 #define NOTES_INITIAL_CAPACITY 50 #define VALUE_LABELS_INITIAL_CAPACITY 10 #define STRING_REFS_INITIAL_CAPACITY 100 #define LABEL_SET_VARIABLES_INITIAL_CAPACITY 2 static readstat_error_t readstat_write_row_default_callback(void *writer_ctx, void *bytes, size_t len) { return readstat_write_bytes((readstat_writer_t *)writer_ctx, bytes, len); } static int readstat_compare_string_refs(const void *elem1, const void *elem2) { readstat_string_ref_t *ref1 = *(readstat_string_ref_t **)elem1; readstat_string_ref_t *ref2 = *(readstat_string_ref_t **)elem2; if (ref1->first_o == ref2->first_o) return ref1->first_v - ref2->first_v; return ref1->first_o - ref2->first_o; } 
readstat_string_ref_t *readstat_string_ref_init(const char *string) { size_t len = strlen(string) + 1; readstat_string_ref_t *ref = calloc(1, sizeof(readstat_string_ref_t) + len); ref->first_o = -1; ref->first_v = -1; ref->len = len; memcpy(&ref->data[0], string, len); return ref; } readstat_writer_t *readstat_writer_init() { readstat_writer_t *writer = calloc(1, sizeof(readstat_writer_t)); writer->variables = calloc(VARIABLES_INITIAL_CAPACITY, sizeof(readstat_variable_t *)); writer->variables_capacity = VARIABLES_INITIAL_CAPACITY; writer->label_sets = calloc(LABEL_SETS_INITIAL_CAPACITY, sizeof(readstat_label_set_t *)); writer->label_sets_capacity = LABEL_SETS_INITIAL_CAPACITY; writer->notes = calloc(NOTES_INITIAL_CAPACITY, sizeof(char *)); writer->notes_capacity = NOTES_INITIAL_CAPACITY; writer->string_refs = calloc(STRING_REFS_INITIAL_CAPACITY, sizeof(readstat_string_ref_t *)); writer->string_refs_capacity = STRING_REFS_INITIAL_CAPACITY; writer->timestamp = time(NULL); writer->is_64bit = 1; writer->callbacks.write_row = &readstat_write_row_default_callback; return writer; } static void readstat_variable_free(readstat_variable_t *variable) { free(variable); } static void readstat_label_set_free(readstat_label_set_t *label_set) { int i; for (i=0; i<label_set->value_labels_count; i++) { readstat_value_label_t *value_label = readstat_get_value_label(label_set, i); if (value_label->label) free(value_label->label); if (value_label->string_key) free(value_label->string_key); } free(label_set->value_labels); free(label_set->variables); free(label_set); } static void readstat_copy_label(readstat_value_label_t *value_label, const char *label) { if (label && strlen(label)) { value_label->label_len = strlen(label); value_label->label = malloc(value_label->label_len); memcpy(value_label->label, label, value_label->label_len); } } static readstat_value_label_t *readstat_add_value_label(readstat_label_set_t *label_set, const char *label) { if (label_set->value_labels_count == 
label_set->value_labels_capacity) { label_set->value_labels_capacity *= 2; label_set->value_labels = realloc(label_set->value_labels, label_set->value_labels_capacity * sizeof(readstat_value_label_t)); } readstat_value_label_t *new_value_label = &label_set->value_labels[label_set->value_labels_count++]; memset(new_value_label, 0, sizeof(readstat_value_label_t)); readstat_copy_label(new_value_label, label); return new_value_label; } readstat_error_t readstat_validate_variable(readstat_writer_t *writer, const readstat_variable_t *variable) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (writer->callbacks.variable_ok) return writer->callbacks.variable_ok(variable); return READSTAT_OK; } readstat_error_t readstat_validate_metadata(readstat_writer_t *writer) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (writer->callbacks.metadata_ok) return writer->callbacks.metadata_ok(writer); return READSTAT_OK; } static readstat_error_t readstat_begin_writing_data(readstat_writer_t *writer) { readstat_error_t retval = READSTAT_OK; size_t row_len = 0; int i; retval = readstat_validate_metadata(writer); if (retval != READSTAT_OK) goto cleanup; for (i=0; i<writer->variables_count; i++) { readstat_variable_t *variable = readstat_get_variable(writer, i); variable->storage_width = writer->callbacks.variable_width(variable->type, variable->user_width); variable->offset = row_len; row_len += variable->storage_width; } if (writer->callbacks.variable_ok) { for (i=0; i<writer->variables_count; i++) { retval = readstat_validate_variable(writer, readstat_get_variable(writer, i)); if (retval != READSTAT_OK) goto cleanup; } } writer->row_len = row_len; writer->row = malloc(writer->row_len); if (writer->callbacks.begin_data) { retval = writer->callbacks.begin_data(writer); } cleanup: return retval; } void readstat_writer_free(readstat_writer_t *writer) { int i; if (writer) { if (writer->callbacks.module_ctx_free && writer->module_ctx) { 
writer->callbacks.module_ctx_free(writer->module_ctx); } if (writer->variables) { for (i=0; i<writer->variables_count; i++) { readstat_variable_free(writer->variables[i]); } free(writer->variables); } if (writer->label_sets) { for (i=0; i<writer->label_sets_count; i++) { readstat_label_set_free(writer->label_sets[i]); } free(writer->label_sets); } if (writer->notes) { for (i=0; i<writer->notes_count; i++) { free(writer->notes[i]); } free(writer->notes); } if (writer->string_refs) { for (i=0; i<writer->string_refs_count; i++) { free(writer->string_refs[i]); } free(writer->string_refs); } if (writer->row) { free(writer->row); } free(writer); } } readstat_error_t readstat_set_data_writer(readstat_writer_t *writer, readstat_data_writer data_writer) { writer->data_writer = data_writer; return READSTAT_OK; } readstat_error_t readstat_write_bytes(readstat_writer_t *writer, const void *bytes, size_t len) { size_t bytes_written = writer->data_writer(bytes, len, writer->user_ctx); if (bytes_written < len) { return READSTAT_ERROR_WRITE; } writer->bytes_written += bytes_written; return READSTAT_OK; } readstat_error_t readstat_write_bytes_as_lines(readstat_writer_t *writer, const void *bytes, size_t len, size_t line_len, const char *line_sep) { size_t line_sep_len = strlen(line_sep); readstat_error_t retval = READSTAT_OK; size_t bytes_written = 0; while (bytes_written < len) { ssize_t bytes_left_in_line = line_len - (writer->bytes_written % (line_len + line_sep_len)); if (len - bytes_written < bytes_left_in_line) { retval = readstat_write_bytes(writer, ((const char *)bytes) + bytes_written, len - bytes_written); bytes_written = len; } else { retval = readstat_write_bytes(writer, ((const char *)bytes) + bytes_written, bytes_left_in_line); bytes_written += bytes_left_in_line; } if (retval != READSTAT_OK) break; if (writer->bytes_written % (line_len + line_sep_len) == line_len) { if ((retval = readstat_write_bytes(writer, line_sep, line_sep_len)) != READSTAT_OK) break; } } return retval; } readstat_error_t 
readstat_write_line_padding(readstat_writer_t *writer, char pad, size_t line_len, const char *line_sep) { size_t line_sep_len = strlen(line_sep); if (writer->bytes_written % (line_len + line_sep_len) == 0) return READSTAT_OK; readstat_error_t error = READSTAT_OK; ssize_t bytes_left_in_line = line_len - (writer->bytes_written % (line_len + line_sep_len)); char *bytes = malloc(bytes_left_in_line); memset(bytes, pad, bytes_left_in_line); if ((error = readstat_write_bytes(writer, bytes, bytes_left_in_line)) != READSTAT_OK) goto cleanup; if ((error = readstat_write_bytes(writer, line_sep, line_sep_len)) != READSTAT_OK) goto cleanup; cleanup: if (bytes) free(bytes); return READSTAT_OK; } readstat_error_t readstat_write_string(readstat_writer_t *writer, const char *bytes) { return readstat_write_bytes(writer, bytes, strlen(bytes)); } static readstat_error_t readstat_write_repeated_byte(readstat_writer_t *writer, char byte, size_t len) { if (len == 0) return READSTAT_OK; char *zeros = malloc(len); memset(zeros, byte, len); readstat_error_t error = readstat_write_bytes(writer, zeros, len); free(zeros); return error; } readstat_error_t readstat_write_zeros(readstat_writer_t *writer, size_t len) { return readstat_write_repeated_byte(writer, '\0', len); } readstat_error_t readstat_write_spaces(readstat_writer_t *writer, size_t len) { return readstat_write_repeated_byte(writer, ' ', len); } readstat_error_t readstat_write_space_padded_string(readstat_writer_t *writer, const char *string, size_t max_len) { readstat_error_t retval = READSTAT_OK; if (string == NULL || string[0] == '\0') return readstat_write_spaces(writer, max_len); size_t len = strlen(string); if (len > max_len) len = max_len; if ((retval = readstat_write_bytes(writer, string, len)) != READSTAT_OK) return retval; return readstat_write_spaces(writer, max_len - len); } readstat_label_set_t *readstat_add_label_set(readstat_writer_t *writer, readstat_type_t type, const char *name) { if (writer->label_sets_count == 
writer->label_sets_capacity) { writer->label_sets_capacity *= 2; writer->label_sets = realloc(writer->label_sets, writer->label_sets_capacity * sizeof(readstat_label_set_t *)); } readstat_label_set_t *new_label_set = calloc(1, sizeof(readstat_label_set_t)); writer->label_sets[writer->label_sets_count++] = new_label_set; new_label_set->type = type; snprintf(new_label_set->name, sizeof(new_label_set->name), "%s", name); new_label_set->value_labels = calloc(VALUE_LABELS_INITIAL_CAPACITY, sizeof(readstat_value_label_t)); new_label_set->value_labels_capacity = VALUE_LABELS_INITIAL_CAPACITY; new_label_set->variables = calloc(LABEL_SET_VARIABLES_INITIAL_CAPACITY, sizeof(readstat_variable_t *)); new_label_set->variables_capacity = LABEL_SET_VARIABLES_INITIAL_CAPACITY; return new_label_set; } readstat_label_set_t *readstat_get_label_set(readstat_writer_t *writer, int index) { if (index < writer->label_sets_count) { return writer->label_sets[index]; } return NULL; } void readstat_sort_label_set(readstat_label_set_t *label_set, int (*compare)(const readstat_value_label_t *, const readstat_value_label_t *)) { qsort(label_set->value_labels, label_set->value_labels_count, sizeof(readstat_value_label_t), (int (*)(const void *, const void *))compare); } readstat_value_label_t *readstat_get_value_label(readstat_label_set_t *label_set, int index) { if (index < label_set->value_labels_count) { return &label_set->value_labels[index]; } return NULL; } readstat_variable_t *readstat_get_label_set_variable(readstat_label_set_t *label_set, int index) { if (index < label_set->variables_count) { return ((readstat_variable_t **)label_set->variables)[index]; } return NULL; } void readstat_label_double_value(readstat_label_set_t *label_set, double value, const char *label) { readstat_value_label_t *new_value_label = readstat_add_value_label(label_set, label); new_value_label->double_key = value; new_value_label->int32_key = value; } void readstat_label_int32_value(readstat_label_set_t 
*label_set, int32_t value, const char *label) { readstat_value_label_t *new_value_label = readstat_add_value_label(label_set, label); new_value_label->double_key = value; new_value_label->int32_key = value; } void readstat_label_string_value(readstat_label_set_t *label_set, const char *value, const char *label) { readstat_value_label_t *new_value_label = readstat_add_value_label(label_set, label); if (value && strlen(value)) { new_value_label->string_key_len = strlen(value); new_value_label->string_key = malloc(new_value_label->string_key_len); memcpy(new_value_label->string_key, value, new_value_label->string_key_len); } } void readstat_label_tagged_value(readstat_label_set_t *label_set, char tag, const char *label) { readstat_value_label_t *new_value_label = readstat_add_value_label(label_set, label); new_value_label->tag = tag; } readstat_variable_t *readstat_add_variable(readstat_writer_t *writer, const char *name, readstat_type_t type, size_t width) { if (writer->variables_count == writer->variables_capacity) { writer->variables_capacity *= 2; writer->variables = realloc(writer->variables, writer->variables_capacity * sizeof(readstat_variable_t *)); } readstat_variable_t *new_variable = calloc(1, sizeof(readstat_variable_t)); new_variable->index = writer->variables_count++; writer->variables[new_variable->index] = new_variable; new_variable->user_width = width; new_variable->type = type; if (readstat_variable_get_type_class(new_variable) == READSTAT_TYPE_CLASS_STRING) { new_variable->alignment = READSTAT_ALIGNMENT_LEFT; } else { new_variable->alignment = READSTAT_ALIGNMENT_RIGHT; } new_variable->measure = READSTAT_MEASURE_UNKNOWN; if (name) { snprintf(new_variable->name, sizeof(new_variable->name), "%s", name); } return new_variable; } static void readstat_append_string_ref(readstat_writer_t *writer, readstat_string_ref_t *ref) { if (writer->string_refs_count == writer->string_refs_capacity) { writer->string_refs_capacity *= 2; writer->string_refs = 
realloc(writer->string_refs, writer->string_refs_capacity * sizeof(readstat_string_ref_t *)); } writer->string_refs[writer->string_refs_count++] = ref; } readstat_string_ref_t *readstat_add_string_ref(readstat_writer_t *writer, const char *string) { readstat_string_ref_t *ref = readstat_string_ref_init(string); readstat_append_string_ref(writer, ref); return ref; } void readstat_add_note(readstat_writer_t *writer, const char *note) { if (writer->notes_count == writer->notes_capacity) { writer->notes_capacity *= 2; writer->notes = realloc(writer->notes, writer->notes_capacity * sizeof(const char *)); } char *note_copy = malloc(strlen(note) + 1); strcpy(note_copy, note); writer->notes[writer->notes_count++] = note_copy; } void readstat_variable_set_label(readstat_variable_t *variable, const char *label) { if (label) { snprintf(variable->label, sizeof(variable->label), "%s", label); } else { memset(variable->label, '\0', sizeof(variable->label)); } } void readstat_variable_set_format(readstat_variable_t *variable, const char *format) { if (format) { snprintf(variable->format, sizeof(variable->format), "%s", format); } else { memset(variable->format, '\0', sizeof(variable->format)); } } void readstat_variable_set_measure(readstat_variable_t *variable, readstat_measure_t measure) { variable->measure = measure; } void readstat_variable_set_alignment(readstat_variable_t *variable, readstat_alignment_t alignment) { variable->alignment = alignment; } void readstat_variable_set_display_width(readstat_variable_t *variable, int display_width) { variable->display_width = display_width; } void readstat_variable_set_label_set(readstat_variable_t *variable, readstat_label_set_t *label_set) { variable->label_set = label_set; if (label_set) { if (label_set->variables_count == label_set->variables_capacity) { label_set->variables_capacity *= 2; label_set->variables = realloc(label_set->variables, label_set->variables_capacity * sizeof(readstat_variable_t *)); } ((readstat_variable_t 
**)label_set->variables)[label_set->variables_count++] = variable; } } readstat_variable_t *readstat_get_variable(readstat_writer_t *writer, int index) { if (index < writer->variables_count) { return writer->variables[index]; } return NULL; } readstat_string_ref_t *readstat_get_string_ref(readstat_writer_t *writer, int index) { if (index < writer->string_refs_count) { return writer->string_refs[index]; } return NULL; } readstat_error_t readstat_writer_set_file_label(readstat_writer_t *writer, const char *file_label) { snprintf(writer->file_label, sizeof(writer->file_label), "%s", file_label); return READSTAT_OK; } readstat_error_t readstat_writer_set_file_timestamp(readstat_writer_t *writer, time_t timestamp) { writer->timestamp = timestamp; return READSTAT_OK; } readstat_error_t readstat_writer_set_table_name(readstat_writer_t *writer, const char *table_name) { snprintf(writer->table_name, sizeof(writer->table_name), "%s", table_name); return READSTAT_OK; } readstat_error_t readstat_writer_set_fweight_variable(readstat_writer_t *writer, const readstat_variable_t *variable) { if (readstat_variable_get_type_class(variable) == READSTAT_TYPE_CLASS_STRING) return READSTAT_ERROR_BAD_FREQUENCY_WEIGHT; writer->fweight_variable = variable; return READSTAT_OK; } readstat_error_t readstat_writer_set_file_format_version(readstat_writer_t *writer, uint8_t version) { writer->version = version; return READSTAT_OK; } readstat_error_t readstat_writer_set_file_format_is_64bit(readstat_writer_t *writer, int is_64bit) { writer->is_64bit = is_64bit; return READSTAT_OK; } readstat_error_t readstat_writer_set_compression(readstat_writer_t *writer, readstat_compress_t compression) { writer->compression = compression; return READSTAT_OK; } readstat_error_t readstat_writer_set_error_handler(readstat_writer_t *writer, readstat_error_handler error_handler) { writer->error_handler = error_handler; return READSTAT_OK; } readstat_error_t readstat_begin_writing_file(readstat_writer_t *writer, 
void *user_ctx, long row_count) { writer->row_count = row_count; writer->user_ctx = user_ctx; writer->initialized = 1; return readstat_validate_metadata(writer); } readstat_error_t readstat_begin_row(readstat_writer_t *writer) { readstat_error_t retval = READSTAT_OK; if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (writer->current_row == 0) retval = readstat_begin_writing_data(writer); memset(writer->row, '\0', writer->row_len); return retval; } // Then call one of these for each variable readstat_error_t readstat_insert_int8_value(readstat_writer_t *writer, const readstat_variable_t *variable, int8_t value) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (variable->type != READSTAT_TYPE_INT8) return READSTAT_ERROR_VALUE_TYPE_MISMATCH; return writer->callbacks.write_int8(&writer->row[variable->offset], variable, value); } readstat_error_t readstat_insert_int16_value(readstat_writer_t *writer, const readstat_variable_t *variable, int16_t value) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (variable->type != READSTAT_TYPE_INT16) return READSTAT_ERROR_VALUE_TYPE_MISMATCH; return writer->callbacks.write_int16(&writer->row[variable->offset], variable, value); } readstat_error_t readstat_insert_int32_value(readstat_writer_t *writer, const readstat_variable_t *variable, int32_t value) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (variable->type != READSTAT_TYPE_INT32) return READSTAT_ERROR_VALUE_TYPE_MISMATCH; return writer->callbacks.write_int32(&writer->row[variable->offset], variable, value); } readstat_error_t readstat_insert_float_value(readstat_writer_t *writer, const readstat_variable_t *variable, float value) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (variable->type != READSTAT_TYPE_FLOAT) return READSTAT_ERROR_VALUE_TYPE_MISMATCH; return writer->callbacks.write_float(&writer->row[variable->offset], variable, 
value); } readstat_error_t readstat_insert_double_value(readstat_writer_t *writer, const readstat_variable_t *variable, double value) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (variable->type != READSTAT_TYPE_DOUBLE) return READSTAT_ERROR_VALUE_TYPE_MISMATCH; return writer->callbacks.write_double(&writer->row[variable->offset], variable, value); } readstat_error_t readstat_insert_string_value(readstat_writer_t *writer, const readstat_variable_t *variable, const char *value) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (variable->type != READSTAT_TYPE_STRING) return READSTAT_ERROR_VALUE_TYPE_MISMATCH; return writer->callbacks.write_string(&writer->row[variable->offset], variable, value); } readstat_error_t readstat_insert_string_ref(readstat_writer_t *writer, const readstat_variable_t *variable, readstat_string_ref_t *ref) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (variable->type != READSTAT_TYPE_STRING_REF) return READSTAT_ERROR_VALUE_TYPE_MISMATCH; if (!writer->callbacks.write_string_ref) return READSTAT_ERROR_STRING_REFS_NOT_SUPPORTED; if (ref && ref->first_o == -1 && ref->first_v == -1) { ref->first_o = writer->current_row; ref->first_v = variable->index; } return writer->callbacks.write_string_ref(&writer->row[variable->offset], variable, ref); } readstat_error_t readstat_insert_missing_value(readstat_writer_t *writer, const readstat_variable_t *variable) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (variable->type == READSTAT_TYPE_STRING) { return writer->callbacks.write_missing_string(&writer->row[variable->offset], variable); } if (variable->type == READSTAT_TYPE_STRING_REF) { return readstat_insert_string_ref(writer, variable, NULL); } return writer->callbacks.write_missing_number(&writer->row[variable->offset], variable); } readstat_error_t readstat_insert_tagged_missing_value(readstat_writer_t *writer, const 
readstat_variable_t *variable, char tag) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (!writer->callbacks.write_missing_tagged) { /* Write out a missing number but return an error */ writer->callbacks.write_missing_number(&writer->row[variable->offset], variable); return READSTAT_ERROR_TAGGED_VALUES_NOT_SUPPORTED; } return writer->callbacks.write_missing_tagged(&writer->row[variable->offset], variable, tag); } readstat_error_t readstat_end_row(readstat_writer_t *writer) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; readstat_error_t error = writer->callbacks.write_row(writer, writer->row, writer->row_len); if (error == READSTAT_OK) writer->current_row++; return error; } readstat_error_t readstat_end_writing(readstat_writer_t *writer) { if (!writer->initialized) return READSTAT_ERROR_WRITER_NOT_INITIALIZED; if (writer->current_row != writer->row_count) return READSTAT_ERROR_ROW_COUNT_MISMATCH; if (writer->row_count == 0) { readstat_error_t retval = readstat_begin_writing_data(writer); if (retval != READSTAT_OK) return retval; } /* Sort if out of order */ int i; for (i=1; i<writer->string_refs_count; i++) { if (readstat_compare_string_refs(&writer->string_refs[i-1], &writer->string_refs[i]) > 0) { qsort(writer->string_refs, writer->string_refs_count, sizeof(readstat_string_ref_t *), &readstat_compare_string_refs); break; } } if (!writer->callbacks.end_data) return READSTAT_OK; return writer->callbacks.end_data(writer); } haven/src/readstat/readstat_bits.h0000644000176200001440000000106014101007206016736 0ustar liggesusers// // readstat_bit.h - Bit-twiddling utility functions // #define READSTAT_MACHINE_IS_TWOS_COMPLEMENT ((char)0xFF == (char)-1) #undef READSTAT_MACHINE_IS_TWOS_COMPLEMENT #define READSTAT_MACHINE_IS_TWOS_COMPLEMENT 0 int machine_is_little_endian(); char ones_to_twos_complement1(char num); int16_t ones_to_twos_complement2(int16_t num); int32_t ones_to_twos_complement4(int32_t num); uint16_t 
byteswap2(uint16_t num); uint32_t byteswap4(uint32_t num); uint64_t byteswap8(uint64_t num); float byteswap_float(float num); double byteswap_double(double num); haven/src/readstat/CKHashTable.c0000644000176200001440000002132214101007206016155 0ustar liggesusers// CKHashTable - A simple hash table // Copyright 2010-2020 Evan Miller (see LICENSE) #include "CKHashTable.h" /* SipHash reference C implementation Copyright (c) 2012 Jean-Philippe Aumasson Copyright (c) 2012 Daniel J. Bernstein To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty. You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see . */ #include #include #include typedef uint64_t u64; typedef uint32_t u32; typedef uint8_t u8; #define ROTL(x,b) (u64)( ((x) << (b)) | ( (x) >> (64 - (b))) ) #define U32TO8_LE(p, v) \ (p)[0] = (u8)((v) ); (p)[1] = (u8)((v) >> 8); \ (p)[2] = (u8)((v) >> 16); (p)[3] = (u8)((v) >> 24); #define U64TO8_LE(p, v) \ U32TO8_LE((p), (u32)((v) )); \ U32TO8_LE((p) + 4, (u32)((v) >> 32)); #define U8TO64_LE(p) \ (((u64)((p)[0]) ) | \ ((u64)((p)[1]) << 8) | \ ((u64)((p)[2]) << 16) | \ ((u64)((p)[3]) << 24) | \ ((u64)((p)[4]) << 32) | \ ((u64)((p)[5]) << 40) | \ ((u64)((p)[6]) << 48) | \ ((u64)((p)[7]) << 56)) #define SIPROUND \ do { \ v0 += v1; v1=ROTL(v1,13); v1 ^= v0; v0=ROTL(v0,32); \ v2 += v3; v3=ROTL(v3,16); v3 ^= v2; \ v0 += v3; v3=ROTL(v3,21); v3 ^= v0; \ v2 += v1; v1=ROTL(v1,17); v1 ^= v2; v2=ROTL(v2,32); \ } while(0) /* SipHash-1-2 */ static int siphash( unsigned char *out, const unsigned char *in, unsigned long long inlen, const unsigned char *k ) { /* "somepseudorandomlygeneratedbytes" */ u64 v0 = 0x736f6d6570736575ULL; u64 v1 = 0x646f72616e646f6dULL; u64 v2 = 0x6c7967656e657261ULL; u64 v3 = 0x7465646279746573ULL; u64 b; u64 k0 = U8TO64_LE( k ); u64 k1 = U8TO64_LE( k + 8 ); 
u64 m; const u8 *end = in + inlen - ( inlen % sizeof( u64 ) ); const int left = inlen & 7; b = ( ( u64 )inlen ) << 56; v3 ^= k1; v2 ^= k0; v1 ^= k1; v0 ^= k0; for ( ; in != end; in += 8 ) { m = U8TO64_LE( in ); v3 ^= m; SIPROUND; v0 ^= m; } switch( left ) { case 7: b |= ( ( u64 )in[ 6] ) << 48; case 6: b |= ( ( u64 )in[ 5] ) << 40; case 5: b |= ( ( u64 )in[ 4] ) << 32; case 4: b |= ( ( u64 )in[ 3] ) << 24; case 3: b |= ( ( u64 )in[ 2] ) << 16; case 2: b |= ( ( u64 )in[ 1] ) << 8; case 1: b |= ( ( u64 )in[ 0] ); break; case 0: break; } v3 ^= b; SIPROUND; v0 ^= b; v2 ^= 0xff; SIPROUND; SIPROUND; b = v0 ^ v1 ^ v2 ^ v3; U64TO8_LE( out, b ); return 0; } static uint64_t ck_hash_str(const char *str, size_t keylen) { uint64_t hash; unsigned char k[16] = { 0 }; siphash((unsigned char *)&hash, (const unsigned char *)str, keylen, k); return hash; } const void *ck_float_hash_lookup(float key, ck_hash_table_t *table) { return ck_str_n_hash_lookup((const char *)&key, sizeof(float), table); } int ck_float_hash_insert(float key, const void *value, ck_hash_table_t *table) { return ck_str_n_hash_insert((const char *)&key, sizeof(float), value, table); } const void *ck_double_hash_lookup(double key, ck_hash_table_t *table) { return ck_str_n_hash_lookup((const char *)&key, sizeof(double), table); } int ck_double_hash_insert(double key, const void *value, ck_hash_table_t *table) { return ck_str_n_hash_insert((const char *)&key, sizeof(double), value, table); } const void *ck_str_hash_lookup(const char *key, ck_hash_table_t *table) { size_t keylen = strlen(key); return ck_str_n_hash_lookup(key, keylen, table); } const void *ck_str_n_hash_lookup(const char *key, size_t keylen, ck_hash_table_t *table) { if (table->count == 0) return NULL; if (keylen == 0) return NULL; uint64_t hash_key = ck_hash_str(key, keylen); hash_key %= table->capacity; uint64_t end = hash_key; do { char *this_key = &table->keys[table->entries[hash_key].key_offset]; size_t this_keylen = 
table->entries[hash_key].key_length; if (this_keylen == 0) return NULL; if (this_keylen == keylen && memcmp(this_key, key, keylen) == 0) { return table->entries[hash_key].value; } hash_key++; hash_key %= table->capacity; } while (hash_key != end); return NULL; } int ck_str_hash_insert(const char *key, const void *value, ck_hash_table_t *table) { size_t keylen = strlen(key); return ck_str_n_hash_insert(key, keylen, value, table); } static int ck_hash_insert_nocopy(off_t key_offset, size_t keylen, uint64_t hash_key, const void *value, ck_hash_table_t *table) { if (table->capacity == 0) return 0; hash_key %= table->capacity; uint64_t end = (hash_key + table->capacity - 1) % table->capacity; while (hash_key != end) { ck_hash_entry_t *entry = &table->entries[hash_key]; if (table->entries[hash_key].key_length == 0) { table->count++; entry->key_offset = key_offset; entry->key_length = keylen; entry->value = value; return 1; } else if (entry->key_length == keylen && entry->key_offset == key_offset) { entry->value = value; return 1; } hash_key++; hash_key %= table->capacity; } return 0; } int ck_str_n_hash_insert(const char *key, size_t keylen, const void *value, ck_hash_table_t *table) { if (table->capacity == 0) return 0; if (keylen == 0) return 0; if (table->count >= 0.75 * table->capacity) { if (ck_hash_table_grow(table) == -1) { return 0; } } uint64_t hash_key = ck_hash_str(key, keylen); hash_key %= table->capacity; uint64_t end = hash_key; do { ck_hash_entry_t *entry = &table->entries[hash_key]; char *this_key = &table->keys[entry->key_offset]; if (entry->key_length == 0) { table->count++; while (table->keys_used + keylen > table->keys_capacity) { table->keys_capacity *= 2; table->keys = realloc(table->keys, table->keys_capacity); } memcpy(table->keys + table->keys_used, key, keylen); entry->key_offset = table->keys_used; entry->key_length = keylen; table->keys_used += keylen; entry->value = value; return 1; } else if (entry->key_length == keylen && memcmp(this_key, 
key, keylen) == 0) { table->entries[hash_key].value = value; return 1; } hash_key++; hash_key %= table->capacity; } while (hash_key != end); return 0; } ck_hash_table_t *ck_hash_table_init(size_t num_entries, size_t mean_key_length) { ck_hash_table_t *table; if ((table = malloc(sizeof(ck_hash_table_t))) == NULL) return NULL; if ((table->keys = malloc(num_entries * mean_key_length)) == NULL) { free(table); return NULL; } table->keys_capacity = num_entries * mean_key_length; num_entries *= 2; if ((table->entries = malloc(num_entries * sizeof(ck_hash_entry_t))) == NULL) { free(table->keys); free(table); return NULL; } table->capacity = num_entries; ck_hash_table_wipe(table); return table; } void ck_hash_table_free(ck_hash_table_t *table) { free(table->entries); if (table->keys) free(table->keys); free(table); } void ck_hash_table_wipe(ck_hash_table_t *table) { table->keys_used = 0; table->count = 0; memset(table->entries, 0, table->capacity * sizeof(ck_hash_entry_t)); } int ck_hash_table_grow(ck_hash_table_t *table) { ck_hash_entry_t *old_entries = table->entries; uint64_t old_capacity = table->capacity; uint64_t new_capacity = 2 * table->capacity; if ((table->entries = calloc(new_capacity, sizeof(ck_hash_entry_t))) == NULL) { return -1; } table->capacity = new_capacity; table->count = 0; for (int i=0; i<old_capacity; i++) { if (old_entries[i].key_length != 0) { char *this_key = &table->keys[old_entries[i].key_offset]; uint64_t hash_key = ck_hash_str(this_key, old_entries[i].key_length); if (!ck_hash_insert_nocopy(old_entries[i].key_offset, old_entries[i].key_length, hash_key, old_entries[i].value, table)) return -1; } } free(old_entries); return 0; } haven/src/readstat/readstat_metadata.c0000644000176200001440000000224614101007206017557 0ustar liggesusers#include "readstat.h" int readstat_get_row_count(readstat_metadata_t *metadata) { return metadata->row_count; } int readstat_get_var_count(readstat_metadata_t *metadata) { return metadata->var_count; } time_t readstat_get_creation_time(readstat_metadata_t *metadata) { return metadata->creation_time; 
} time_t readstat_get_modified_time(readstat_metadata_t *metadata) { return metadata->modified_time; } int readstat_get_file_format_version(readstat_metadata_t *metadata) { return metadata->file_format_version; } int readstat_get_file_format_is_64bit(readstat_metadata_t *metadata) { return metadata->is64bit; } readstat_compress_t readstat_get_compression(readstat_metadata_t *metadata) { return metadata->compression; } readstat_endian_t readstat_get_endianness(readstat_metadata_t *metadata) { return metadata->endianness; } const char *readstat_get_file_label(readstat_metadata_t *metadata) { return metadata->file_label; } const char *readstat_get_file_encoding(readstat_metadata_t *metadata) { return metadata->file_encoding; } const char *readstat_get_table_name(readstat_metadata_t *metadata) { return metadata->table_name; } haven/src/readstat/sas/0000755000176200001440000000000014102332323014530 5ustar liggesusershaven/src/readstat/sas/readstat_xport.c0000644000176200001440000000147414101007206017743 0ustar liggesusers#include #include "readstat_xport.h" #include "../readstat_bits.h" char _xport_months[12][4] = { "JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC" }; void xport_namestr_bswap(xport_namestr_t *namestr) { if (!machine_is_little_endian()) return; namestr->ntype = byteswap2(namestr->ntype); namestr->nhfun = byteswap2(namestr->nhfun); namestr->nlng = byteswap2(namestr->nlng); namestr->nvar0 = byteswap2(namestr->nvar0); namestr->nfl = byteswap2(namestr->nfl); namestr->nfd = byteswap2(namestr->nfd); namestr->nfj = byteswap2(namestr->nfj); namestr->nifl = byteswap2(namestr->nifl); namestr->nifd = byteswap2(namestr->nifd); namestr->npos = byteswap4(namestr->npos); namestr->labeln = byteswap2(namestr->labeln); } haven/src/readstat/sas/readstat_sas7bcat_write.c0000644000176200001440000001461514101007206021511 0ustar liggesusers #include #include #include #include #include "../readstat.h" #include "../readstat_writer.h" #include 
"readstat_sas.h" #include "readstat_sas_rle.h" typedef struct sas7bcat_block_s { size_t len; char data[1]; // Flexible array; use [1] for C++-98 compatibility } sas7bcat_block_t; static sas7bcat_block_t *sas7bcat_block_for_label_set(readstat_label_set_t *r_label_set) { size_t len = 0; size_t name_len = strlen(r_label_set->name); int j; char name[32]; len += 106; if (name_len > 8) { len += 32; // long name if (name_len > 32) { name_len = 32; } } memcpy(&name[0], r_label_set->name, name_len); for (j=0; j<r_label_set->value_labels_count; j++) { readstat_value_label_t *value_label = readstat_get_value_label(r_label_set, j); len += 30; // Value: 14-byte header + 16-byte padded value len += 8 + 2 + value_label->label_len + 1; } sas7bcat_block_t *block = calloc(1, sizeof(sas7bcat_block_t) + len); block->len = len; off_t begin = 106; int32_t count = r_label_set->value_labels_count; memcpy(&block->data[38], &count, sizeof(int32_t)); memcpy(&block->data[42], &count, sizeof(int32_t)); if (name_len > 8) { block->data[2] = (char)0x80; memcpy(&block->data[8], name, 8); memset(&block->data[106], ' ', 32); memcpy(&block->data[106], name, name_len); begin += 32; } else { memset(&block->data[8], ' ', 8); memcpy(&block->data[8], name, name_len); } char *lbp1 = &block->data[begin]; char *lbp2 = &block->data[begin+r_label_set->value_labels_count*30]; for (j=0; j<r_label_set->value_labels_count; j++) { readstat_value_label_t *value_label = readstat_get_value_label(r_label_set, j); lbp1[2] = 24; // size - 6 int32_t index = j; memcpy(&lbp1[10], &index, sizeof(int32_t)); if (r_label_set->type == READSTAT_TYPE_STRING) { size_t string_len = value_label->string_key_len; if (string_len > 16) string_len = 16; memset(&lbp1[14], ' ', 16); memcpy(&lbp1[14], value_label->string_key, string_len); } else { uint64_t big_endian_value; double double_value = -1.0 * value_label->double_key; memcpy(&big_endian_value, &double_value, sizeof(double)); if (machine_is_little_endian()) { big_endian_value = byteswap8(big_endian_value); } 
memcpy(&lbp1[22], &big_endian_value, sizeof(uint64_t)); } int16_t label_len = value_label->label_len; memcpy(&lbp2[8], &label_len, sizeof(int16_t)); memcpy(&lbp2[10], value_label->label, label_len); lbp1 += 30; lbp2 += 8 + 2 + value_label->label_len + 1; } return block; } static readstat_error_t sas7bcat_emit_header(readstat_writer_t *writer, sas_header_info_t *hinfo) { sas_header_start_t header_start = { .a2 = hinfo->u64 ? SAS_ALIGNMENT_OFFSET_4 : SAS_ALIGNMENT_OFFSET_0, .a1 = SAS_ALIGNMENT_OFFSET_0, .endian = machine_is_little_endian() ? SAS_ENDIAN_LITTLE : SAS_ENDIAN_BIG, .file_format = SAS_FILE_FORMAT_UNIX, .encoding = 20, /* UTF-8 */ .file_type = "SAS FILE", .file_info = "CATALOG " }; memcpy(&header_start.magic, sas7bcat_magic_number, sizeof(header_start.magic)); return sas_write_header(writer, hinfo, header_start); } static readstat_error_t sas7bcat_begin_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; readstat_error_t retval = READSTAT_OK; int i; sas_header_info_t *hinfo = sas_header_info_init(writer, 0); sas7bcat_block_t **blocks = malloc(writer->label_sets_count * sizeof(sas7bcat_block_t)); char *page = malloc(hinfo->page_size); for (i=0; i<writer->label_sets_count; i++) { blocks[i] = sas7bcat_block_for_label_set(writer->label_sets[i]); } hinfo->page_count = 4; // Header retval = sas7bcat_emit_header(writer, hinfo); if (retval != READSTAT_OK) goto cleanup; // Page 0 retval = readstat_write_zeros(writer, hinfo->page_size); if (retval != READSTAT_OK) goto cleanup; memset(page, '\0', hinfo->page_size); // Page 1 char *xlsr = &page[856]; int16_t block_idx, block_off; block_idx = 4; block_off = 16; for (i=0; i<writer->label_sets_count; i++) { if (xlsr + 212 > page + hinfo->page_size) break; memcpy(&xlsr[0], "XLSR", 4); memcpy(&xlsr[4], &block_idx, sizeof(int16_t)); memcpy(&xlsr[8], &block_off, sizeof(int16_t)); xlsr[50] = 'O'; block_off += blocks[i]->len; xlsr += 212; } retval = readstat_write_bytes(writer, page, hinfo->page_size); if 
(retval != READSTAT_OK) goto cleanup; // Page 2 retval = readstat_write_zeros(writer, hinfo->page_size); if (retval != READSTAT_OK) goto cleanup; // Page 3 memset(page, '\0', hinfo->page_size); char block_header[16]; block_off = 16; for (i=0; i<writer->label_sets_count; i++) { if (block_off + sizeof(block_header) + blocks[i]->len > hinfo->page_size) break; memset(block_header, '\0', sizeof(block_header)); int32_t next_page = 0; int16_t next_off = 0; int16_t block_len = blocks[i]->len; memcpy(&block_header[0], &next_page, sizeof(int32_t)); memcpy(&block_header[4], &next_off, sizeof(int16_t)); memcpy(&block_header[6], &block_len, sizeof(int16_t)); memcpy(&page[block_off], block_header, sizeof(block_header)); block_off += sizeof(block_header); memcpy(&page[block_off], blocks[i]->data, blocks[i]->len); block_off += blocks[i]->len; } retval = readstat_write_bytes(writer, page, hinfo->page_size); if (retval != READSTAT_OK) goto cleanup; cleanup: for (i=0; i<writer->label_sets_count; i++) { free(blocks[i]); } free(blocks); free(hinfo); free(page); return retval; } readstat_error_t readstat_begin_writing_sas7bcat(readstat_writer_t *writer, void *user_ctx) { if (writer->version == 0) writer->version = SAS_DEFAULT_FILE_VERSION; writer->callbacks.begin_data = &sas7bcat_begin_data; return readstat_begin_writing_file(writer, user_ctx, 0); } haven/src/readstat/sas/readstat_sas.h0000644000176200001440000000773114101007206017364 0ustar liggesusers #include "../readstat.h" #include "../readstat_bits.h" #pragma pack(push, 1) typedef struct sas_header_start_s { unsigned char magic[32]; unsigned char a2; unsigned char mystery1[2]; unsigned char a1; unsigned char mystery2[1]; unsigned char endian; unsigned char mystery3[1]; char file_format; unsigned char mystery4[30]; unsigned char encoding; unsigned char mystery5[13]; char file_type[8]; char table_name[32]; unsigned char mystery6[32]; char file_info[8]; } sas_header_start_t; typedef struct sas_header_end_s { char release[8]; char host[16]; char 
version[16]; char os_vendor[16]; char os_name[16]; char extra[48]; } sas_header_end_t; #pragma pack(pop) typedef struct sas_header_info_s { int little_endian; int u64; int vendor; int major_version; int minor_version; int revision; int pad1; int64_t page_size; int64_t page_header_size; int64_t subheader_pointer_size; int64_t page_count; int64_t header_size; time_t creation_time; time_t modification_time; char table_name[32]; char file_label[256]; char *encoding; } sas_header_info_t; enum { READSTAT_VENDOR_STAT_TRANSFER, READSTAT_VENDOR_SAS }; typedef struct sas_text_ref_s { uint16_t index; uint16_t offset; uint16_t length; } sas_text_ref_t; #define SAS_ENDIAN_BIG 0x00 #define SAS_ENDIAN_LITTLE 0x01 #define SAS_FILE_FORMAT_UNIX '1' #define SAS_FILE_FORMAT_WINDOWS '2' #define SAS_ALIGNMENT_OFFSET_0 0x22 #define SAS_ALIGNMENT_OFFSET_4 0x33 #define SAS_COLUMN_TYPE_NUM 0x01 #define SAS_COLUMN_TYPE_CHR 0x02 #define SAS_SUBHEADER_SIGNATURE_ROW_SIZE 0xF7F7F7F7 #define SAS_SUBHEADER_SIGNATURE_COLUMN_SIZE 0xF6F6F6F6 #define SAS_SUBHEADER_SIGNATURE_COUNTS 0xFFFFFC00 #define SAS_SUBHEADER_SIGNATURE_COLUMN_FORMAT 0xFFFFFBFE #define SAS_SUBHEADER_SIGNATURE_COLUMN_MASK 0xFFFFFFF8 /* Seen in the wild: FA (unknown), F8 (locale?) 
*/ #define SAS_SUBHEADER_SIGNATURE_COLUMN_ATTRS 0xFFFFFFFC #define SAS_SUBHEADER_SIGNATURE_COLUMN_TEXT 0xFFFFFFFD #define SAS_SUBHEADER_SIGNATURE_COLUMN_LIST 0xFFFFFFFE #define SAS_SUBHEADER_SIGNATURE_COLUMN_NAME 0xFFFFFFFF #define SAS_PAGE_TYPE_META 0x0000 #define SAS_PAGE_TYPE_DATA 0x0100 #define SAS_PAGE_TYPE_MIX 0x0200 #define SAS_PAGE_TYPE_AMD 0x0400 #define SAS_PAGE_TYPE_MASK 0x0F00 #define SAS_PAGE_TYPE_META2 0x4000 #define SAS_PAGE_TYPE_COMP 0x9000 #define SAS_SUBHEADER_POINTER_SIZE_32BIT 12 #define SAS_SUBHEADER_POINTER_SIZE_64BIT 24 #define SAS_PAGE_HEADER_SIZE_32BIT 24 #define SAS_PAGE_HEADER_SIZE_64BIT 40 #define SAS_COMPRESSION_NONE 0x00 #define SAS_COMPRESSION_TRUNC 0x01 #define SAS_COMPRESSION_ROW 0x04 #define SAS_COMPRESSION_SIGNATURE_RLE "SASYZCRL" #define SAS_COMPRESSION_SIGNATURE_RDC "SASYZCR2" #define SAS_DEFAULT_FILE_VERSION 9 extern unsigned char sas7bdat_magic_number[32]; extern unsigned char sas7bcat_magic_number[32]; uint64_t sas_read8(const char *data, int bswap); uint32_t sas_read4(const char *data, int bswap); uint16_t sas_read2(const char *data, int bswap); readstat_error_t sas_read_header(readstat_io_t *io, sas_header_info_t *ctx, readstat_error_handler error_handler, void *user_ctx); size_t sas_subheader_remainder(size_t len, size_t signature_len); sas_header_info_t *sas_header_info_init(readstat_writer_t *writer, int is_64bit); readstat_error_t sas_write_header(readstat_writer_t *writer, sas_header_info_t *hinfo, sas_header_start_t header_start); readstat_error_t sas_fill_page(readstat_writer_t *writer, sas_header_info_t *hinfo); readstat_error_t sas_validate_variable(const readstat_variable_t *variable); readstat_error_t sas_validate_name(const char *name, size_t max_len); readstat_error_t sas_validate_tag(char tag); void sas_assign_tag(readstat_value_t *value, uint8_t tag); haven/src/readstat/sas/ieee.c0000644000176200001440000003433714101007206015613 0ustar liggesusers#include #include #include "ieee.h" #include 
"../readstat_bits.h" /* These routines are modified versions of those found in SAS publication TS-140, * "RECORD LAYOUT OF A SAS VERSION 5 OR 6 DATA SET IN SAS TRANSPORT (XPORT) FORMAT" * https://support.sas.com/techsup/technote/ts140.pdf * * Modifications include using stdint.h and supporting infinite IEEE values. */ static void xpt2ieee(unsigned char *xport, unsigned char *ieee); static void ieee2xpt(unsigned char *ieee, unsigned char *xport); #ifndef FLOATREP #define FLOATREP get_native() int get_native(); #endif void memreverse(void *intp_void, int l) { if (!machine_is_little_endian()) return; int i,j; char save; char *intp = (char *)intp_void; j = l/2; for (i=0;i=0;i--) { temp[7-i] = from[i]; } from = temp; fromtype = CN_TYPE_IEEEB; /* Break intentionally omitted. */ case CN_TYPE_IEEEB : /* Break intentionally omitted. */ case CN_TYPE_XPORT : break; default: return(-1); } if (totype == CN_TYPE_NATIVE) { totype = FLOATREP; } switch(totype) { case CN_TYPE_XPORT : case CN_TYPE_IEEEB : case CN_TYPE_IEEEL : break; default: return(-2); } if (fromtype == totype) { memcpy(to,from,8); return(0); } switch(fromtype) { case CN_TYPE_IEEEB : if (totype == CN_TYPE_XPORT) ieee2xpt(from,to); else memcpy(to,from,8); break; case CN_TYPE_XPORT : xpt2ieee(from,to); break; } if (totype == CN_TYPE_IEEEL) { memcpy(temp,to,8); for (i=7;i>=0;i--) { to[7-i] = temp[i]; } } return(0); } int get_native() { static unsigned char float_reps[][8] = { {0x41,0x10,0x00,0x00,0x00,0x00,0x00,0x00}, {0x3f,0xf0,0x00,0x00,0x00,0x00,0x00,0x00}, {0x00,0x00,0x00,0x00,0x00,0x00,0xf0,0x3f} }; static double one = 1.00; int i,j; j = sizeof(float_reps)/8; for (i=0;i>= shift; ieee2 = (xport2 >> shift) | ((xport1 & 0x00000007) << (29 + (3 - shift))); } /* clear the 1 bit to the left of the binary point */ ieee1 &= 0xffefffff; /* set the exponent of the ieee number to be the actual */ /* exponent plus the shift count + 1023. Or this into the */ /* first half of the ieee number. 
The ibm exponent is excess */ /* 64 but is adjusted by 65 since during conversion to ibm */ /* format the exponent is incremented by 1 and the fraction */ /* bits left 4 positions to the right of the radix point. */ ieee1 |= (((((int32_t)(*temp & 0x7f) - 65) * 4) + shift + 1023) << 20) | (xport1 & 0x80000000); doret: memreverse(&ieee1,sizeof(uint32_t)); memcpy(ieee,&ieee1,sizeof(uint32_t)); memreverse(&ieee2,sizeof(uint32_t)); memcpy(ieee+4,&ieee2,sizeof(uint32_t)); return; } /*-------------------------------------------------------------*/ /* Name: ieee2xpt */ /* Purpose: converts IEEE to transport */ /* Usage: rc = ieee2xpt(to_ieee,p_data); */ /* Notes: this routine is an adaptation of the wzctdbl routine */ /* from the Apollo. */ /*-------------------------------------------------------------*/ void ieee2xpt(unsigned char *ieee, unsigned char *xport) { register int shift; unsigned char misschar; int ieee_exp; uint32_t xport1,xport2; uint32_t ieee1 = 0; uint32_t ieee2 = 0; char ieee8[8]; memcpy(ieee8,ieee,8); /*------get 2 longs for shifting------------------------------*/ memcpy(&ieee1,ieee8,sizeof(uint32_t)); memreverse(&ieee1,sizeof(uint32_t)); memcpy(&ieee2,ieee8+4,sizeof(uint32_t)); memreverse(&ieee2,sizeof(uint32_t)); memset(xport,0,8); /*-----if IEEE value is missing (1st 2 bytes are FFFF)-----*/ if (*ieee8 == (char)0xff && ieee8[1] == (char)0xff) { misschar = ~ieee8[2]; *xport = (misschar == 0xD2) ? 0x6D : misschar; return; } /**************************************************************/ /* Translate IEEE floating point number into IBM format float */ /* */ /* IEEE format: */ /* */ /* 6 5 0 */ /* 3 1 0 */ /* */ /* SEEEEEEEEEEEMMMM ........ MMMM */ /* */ /* Sign bit, 11 bit exponent, 52 fraction. Exponent is excess */ /* 1023. The fraction is multiplied by a power of 2 of the */ /* actual exponent. 
Normalized floating point numbers are */ /* represented with the binary point immediately to the left */ /* of the fraction with an implied "1" to the left of the */ /* binary point. */ /* */ /* IBM format: */ /* */ /* 6 5 0 */ /* 3 5 0 */ /* */ /* SEEEEEEEMMMM ......... MMMM */ /* */ /* Sign bit, 7 bit exponent, 56 bit fraction. Exponent is */ /* excess 64. The fraction is multiplied by a power of 16 of */ /* of the actual exponent. Normalized floating point numbers */ /* are presented with the radix point immediately to the left */ /* of the high order hex fraction digit. */ /* */ /* How do you translate from local to IBM format? */ /* */ /* The ieee format gives you a number that has a power of 2 */ /* exponent and a fraction of the form "1.". */ /* The first step is to get that "1" bit back into the */ /* fraction. Right shift it down 1 position, set the high */ /* order bit and reduce the binary exponent by 1. Now we have */ /* a fraction that looks like ".1" and it's */ /* ready to be shoved into ibm format. The ibm fraction has 4 */ /* more bits than the ieee, the ieee fraction must therefore */ /* be shifted left 4 positions before moving it in. We must */ /* also correct the fraction bits to account for the loss of 2*/ /* bits when converting from a binary exponent to a hex one */ /* (>> 2). We must shift the fraction left for 0, 1, 2, or 3 */ /* positions to maintain the proper magnitude. Doing */ /* conversion this way would tend to lose bits in the fraction*/ /* which is not desirable or necessary if we cheat a bit. */ /* First of all, we know that we are going to have to shift */ /* the ieee fraction left 4 places to put it in the right */ /* position; we won't do that, we'll just leave it where it is*/ /* and increment the ibm exponent by one, this will have the */ /* same effect and we won't have to do any shifting. Now, */ /* since we have 4 bits in front of the fraction to work with,*/ /* we won't lose any bits. 
We set the bit to the left of the */ /* fraction which is the implicit "1" in the ieee fraction. We*/ /* then adjust the fraction to account for the loss of bits */ /* when going to a hex exponent. This adjustment will never */ /* involve shifting by more than 3 positions so no bits are */ /* lost. */ /* Get ieee number less the exponent into the first half of */ /* the ibm number */ xport1 = ieee1 & 0x000fffff; /* get the second half of the number into the second half of */ /* the ibm number and see if both halves are 0. If so, ibm is */ /* also 0 and we just return */ if ((!(xport2 = ieee2)) && !ieee1) { ieee_exp = 0; goto doret; } /* get the actual exponent value out of the ieee number. The */ /* ibm fraction is a power of 16 and the ieee fraction a power*/ /* of 2 (16 ** n == 2 ** 4n). Save the low order 2 bits since */ /* they will get lost when we divide the exponent by 4 (right */ /* shift by 2) and we will have to shift the fraction by the */ /* appropriate number of bits to keep the proper magnitude. */ shift = (int) (ieee_exp = (int)(((ieee1 >> 16) & 0x7ff0) >> 4) - 1023) & 3; /* the ieee format has an implied "1" immediately to the left */ /* of the binary point. Show it in here. */ xport1 |= 0x00100000; if (shift) { /* set the first half of the ibm number by shifting it left */ /* the appropriate number of bits and oring in the bits */ /* from the lower half that would have been shifted in (if */ /* we could shift a double). The shift count can never */ /* exceed 3, so all we care about are the high order 3 */ /* bits. We don't want sign extension so make sure it's an */ /* unsigned char. We'll shift either 5, 6, or 7 places to */ /* keep 3, 2, or 1 bits. After that, shift the second half */ /* of the number the right number of places. We always get */ /* zero fill on left shifts. */ xport1 = (xport1 << shift) | ((unsigned char) (((ieee2 >> 24) & 0xE0) >> (5 + (3 - shift)))); xport2 <<= shift; } /* Now set the ibm exponent and the sign of the fraction. 
The */ /* power of 2 ieee exponent must be divided by 4 and made */ /* excess 64 (we add 65 here because of the position of the */ /* fraction bits, essentially 4 positions lower than they */ /* should be so we increment the ibm exponent). */ xport1 |= (((ieee_exp >>2) + 65) | ((ieee1 >> 24) & 0x80)) << 24; /* If the ieee exponent is greater than 248 or less than -260, */ /* then it cannot fit in the ibm exponent field. Send back the */ /* appropriate flag. */ doret: if (ieee_exp < -260) { memset(xport,0x00,8); } else if (ieee_exp > 248) { memset(xport+1,0xFF,7); *xport = 0x7F | ((ieee1 >> 24) & 0x80); } else { memreverse(&xport1,sizeof(uint32_t)); memcpy(xport,&xport1,sizeof(uint32_t)); memreverse(&xport2,sizeof(uint32_t)); memcpy(xport+4,&xport2,sizeof(uint32_t)); } return; } haven/src/readstat/sas/readstat_sas7bcat_read.c0000644000176200001440000004145514101007206021274 0ustar liggesusers#include #include #include #include #include #include "readstat_sas.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" #include "../readstat_malloc.h" #define SAS_CATALOG_FIRST_INDEX_PAGE 1 #define SAS_CATALOG_USELESS_PAGES 3 typedef struct sas7bcat_ctx_s { readstat_metadata_handler metadata_handler; readstat_value_label_handler value_label_handler; void *user_ctx; readstat_io_t *io; int u64; int pad1; int bswap; int64_t xlsr_size; int64_t xlsr_offset; int64_t xlsr_O_offset; int64_t page_count; int64_t page_size; int64_t header_size; uint64_t *block_pointers; int block_pointers_used; int block_pointers_capacity; const char *input_encoding; const char *output_encoding; iconv_t converter; } sas7bcat_ctx_t; static void sas7bcat_ctx_free(sas7bcat_ctx_t *ctx) { if (ctx->converter) iconv_close(ctx->converter); if (ctx->block_pointers) free(ctx->block_pointers); free(ctx); } static readstat_error_t sas7bcat_parse_value_labels(const char *value_start, size_t value_labels_len, int label_count_used, int label_count_capacity, const char *name, sas7bcat_ctx_t *ctx) { 
readstat_error_t retval = READSTAT_OK; int i; const char *lbp1 = value_start; uint32_t *value_offset = readstat_calloc(label_count_used, sizeof(uint32_t)); /* Doubles appear to be stored as big-endian, always */ int bswap_doubles = machine_is_little_endian(); int is_string = (name[0] == '$'); char *label = NULL; if (value_offset == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } /* Pass 1 -- find out the offset of the labels */ for (i=0; i value_labels_len || lbp1[2] < 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (ipad1+4] - value_start > value_labels_len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } uint32_t label_pos = sas_read4(&lbp1[10+ctx->pad1], ctx->bswap); if (label_pos >= label_count_used) { retval = READSTAT_ERROR_PARSE; goto cleanup; } value_offset[label_pos] = lbp1 - value_start; } lbp1 += 6 + lbp1[2]; } const char *lbp2 = lbp1; /* Pass 2 -- parse pairs of values & labels */ for (i=0; i value_labels_len || &lbp2[10] - value_start > value_labels_len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } readstat_value_t value = { .type = is_string ? 
READSTAT_TYPE_STRING : READSTAT_TYPE_DOUBLE }; char string_val[4*16+1]; if (is_string) { size_t value_entry_len = 6 + lbp1[2]; retval = readstat_convert(string_val, sizeof(string_val), &lbp1[value_entry_len-16], 16, ctx->converter); if (retval != READSTAT_OK) goto cleanup; value.v.string_value = string_val; } else { uint64_t val = sas_read8(&lbp1[22], bswap_doubles); double dval = NAN; if ((val | 0xFF0000000000) == 0xFFFFFFFFFFFF) { sas_assign_tag(&value, (val >> 40)); } else { memcpy(&dval, &val, 8); dval *= -1.0; } value.v.double_value = dval; } size_t label_len = sas_read2(&lbp2[8], ctx->bswap); if (&lbp2[10] + label_len - value_start > value_labels_len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (ctx->value_label_handler) { label = realloc(label, 4 * label_len + 1); retval = readstat_convert(label, 4 * label_len + 1, &lbp2[10], label_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; if (ctx->value_label_handler(name, value, label, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } lbp2 += 8 + 2 + label_len + 1; } cleanup: free(label); free(value_offset); return retval; } static readstat_error_t sas7bcat_parse_block(const char *data, size_t data_size, sas7bcat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; size_t pad = 0; int label_count_capacity = 0; int label_count_used = 0; int payload_offset = 106; char name[4*32+1]; if (data_size < payload_offset) goto cleanup; pad = (data[2] & 0x08) ? 
4 : 0; // might be 0x10, not sure if (ctx->u64) { label_count_capacity = sas_read4(&data[42+pad], ctx->bswap); label_count_used = sas_read4(&data[50+pad], ctx->bswap); payload_offset += 32; } else { label_count_capacity = sas_read4(&data[38+pad], ctx->bswap); label_count_used = sas_read4(&data[42+pad], ctx->bswap); } if ((retval = readstat_convert(name, sizeof(name), &data[8], 8, ctx->converter)) != READSTAT_OK) goto cleanup; if (pad) { pad += 16; } if ((data[2] & 0x80) && !ctx->u64) { // has long name if (data_size < payload_offset + pad + 32) goto cleanup; retval = readstat_convert(name, sizeof(name), &data[payload_offset+pad], 32, ctx->converter); if (retval != READSTAT_OK) goto cleanup; pad += 32; } if (data_size < payload_offset + pad) goto cleanup; if ((retval = sas7bcat_parse_value_labels(&data[payload_offset+pad], data_size - payload_offset - pad, label_count_used, label_count_capacity, name, ctx)) != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t sas7bcat_augment_index(const char *index, size_t len, sas7bcat_ctx_t *ctx) { const char *xlsr = index; readstat_error_t retval = READSTAT_OK; while (xlsr + ctx->xlsr_size <= index + len) { if (memcmp(xlsr, "XLSR", 4) != 0) // some block pointers seem to have 8 bytes of extra padding xlsr += 8; if (memcmp(xlsr, "XLSR", 4) != 0) break; if (xlsr[ctx->xlsr_O_offset] == 'O') { uint32_t page = 0, pos = 0; if (ctx->u64) { page = sas_read4(&xlsr[8], ctx->bswap); pos = sas_read4(&xlsr[16], ctx->bswap); } else { page = sas_read2(&xlsr[4], ctx->bswap); pos = sas_read2(&xlsr[8], ctx->bswap); } ctx->block_pointers[ctx->block_pointers_used++] = ((uint64_t)page << 32) + pos; } if (ctx->block_pointers_used == ctx->block_pointers_capacity) { ctx->block_pointers = readstat_realloc(ctx->block_pointers, (ctx->block_pointers_capacity *= 2) * sizeof(uint64_t)); if (ctx->block_pointers == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } } xlsr += ctx->xlsr_size; } cleanup: return retval; } static int 
compare_block_pointers(const void *elem1, const void *elem2) { uint64_t v1 = *(const uint64_t *)elem1; uint64_t v2 = *(const uint64_t *)elem2; /* Compare explicitly: returning v1 - v2 would truncate a 64-bit difference to int */ return (v1 > v2) - (v1 < v2); } static void sas7bcat_sort_index(sas7bcat_ctx_t *ctx) { if (ctx->block_pointers_used == 0) return; int i; for (i=1; i<ctx->block_pointers_used; i++) { if (ctx->block_pointers[i] < ctx->block_pointers[i-1]) { qsort(ctx->block_pointers, ctx->block_pointers_used, sizeof(uint64_t), &compare_block_pointers); break; } } } static void sas7bcat_uniq_index(sas7bcat_ctx_t *ctx) { if (ctx->block_pointers_used == 0) return; int i; int out_i = 1; for (i=1; i<ctx->block_pointers_used; i++) { if (ctx->block_pointers[i] != ctx->block_pointers[i-1]) { if (out_i != i) { ctx->block_pointers[out_i] = ctx->block_pointers[i]; } out_i++; } } ctx->block_pointers_used = out_i; } static int sas7bcat_block_size(int start_page, int start_page_pos, sas7bcat_ctx_t *ctx, readstat_error_t *outError) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; int next_page = start_page; int next_page_pos = start_page_pos; int link_count = 0; int buffer_len = 0; int chain_link_len = 0; char chain_link[32]; int chain_link_header_len = 16; if (ctx->u64) { chain_link_header_len = 32; } // calculate buffer size needed while (next_page > 0 && next_page_pos > 0 && next_page <= ctx->page_count && link_count++ < ctx->page_count) { if (io->seek(ctx->header_size+(next_page-1)*ctx->page_size+next_page_pos, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->read(chain_link, chain_link_header_len, io->io_ctx) < chain_link_header_len) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->u64) { next_page = sas_read4(&chain_link[0], ctx->bswap); next_page_pos = sas_read2(&chain_link[8], ctx->bswap); chain_link_len = sas_read2(&chain_link[10], ctx->bswap); } else { next_page = sas_read4(&chain_link[0], ctx->bswap); next_page_pos = sas_read2(&chain_link[4], ctx->bswap); chain_link_len = sas_read2(&chain_link[6], 
ctx->bswap); } buffer_len += chain_link_len; } cleanup: if (outError) *outError = retval; return retval == READSTAT_OK ? buffer_len : -1; } static readstat_error_t sas7bcat_read_block(char *buffer, size_t buffer_len, int start_page, int start_page_pos, sas7bcat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; int next_page = start_page; int next_page_pos = start_page_pos; int link_count = 0; int chain_link_len = 0; int buffer_offset = 0; char chain_link[32]; int chain_link_header_len = 16; if (ctx->u64) { chain_link_header_len = 32; } while (next_page > 0 && next_page_pos > 0 && next_page <= ctx->page_count && link_count++ < ctx->page_count) { if (io->seek(ctx->header_size+(next_page-1)*ctx->page_size+next_page_pos, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->read(chain_link, chain_link_header_len, io->io_ctx) < chain_link_header_len) { retval = READSTAT_ERROR_READ; goto cleanup; } if (ctx->u64) { next_page = sas_read4(&chain_link[0], ctx->bswap); next_page_pos = sas_read2(&chain_link[8], ctx->bswap); chain_link_len = sas_read2(&chain_link[10], ctx->bswap); } else { next_page = sas_read4(&chain_link[0], ctx->bswap); next_page_pos = sas_read2(&chain_link[4], ctx->bswap); chain_link_len = sas_read2(&chain_link[6], ctx->bswap); } if (buffer_offset + chain_link_len > buffer_len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (io->read(buffer + buffer_offset, chain_link_len, io->io_ctx) < chain_link_len) { retval = READSTAT_ERROR_READ; goto cleanup; } buffer_offset += chain_link_len; } cleanup: return retval; } readstat_error_t readstat_parse_sas7bcat(readstat_parser_t *parser, const char *path, void *user_ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = parser->io; int64_t i; char *page = NULL; char *buffer = NULL; sas7bcat_ctx_t *ctx = calloc(1, sizeof(sas7bcat_ctx_t)); sas_header_info_t *hinfo = calloc(1, sizeof(sas_header_info_t)); ctx->block_pointers = 
malloc((ctx->block_pointers_capacity = 200) * sizeof(uint64_t)); ctx->value_label_handler = parser->handlers.value_label; ctx->metadata_handler = parser->handlers.metadata; ctx->input_encoding = parser->input_encoding; ctx->output_encoding = parser->output_encoding; ctx->user_ctx = user_ctx; ctx->io = io; if (io->open(path, io->io_ctx) == -1) { retval = READSTAT_ERROR_OPEN; goto cleanup; } if ((retval = sas_read_header(io, hinfo, parser->handlers.error, user_ctx)) != READSTAT_OK) { goto cleanup; } ctx->u64 = hinfo->u64; ctx->pad1 = hinfo->pad1; ctx->bswap = machine_is_little_endian() ^ hinfo->little_endian; ctx->header_size = hinfo->header_size; ctx->page_count = hinfo->page_count; ctx->page_size = hinfo->page_size; if (ctx->input_encoding == NULL) { ctx->input_encoding = hinfo->encoding; } ctx->xlsr_size = 212 + ctx->pad1; ctx->xlsr_offset = 856 + 2 * ctx->pad1; ctx->xlsr_O_offset = 50 + ctx->pad1; if (ctx->u64) { ctx->xlsr_offset += 144; ctx->xlsr_size += 72; ctx->xlsr_O_offset += 24; } if (ctx->input_encoding && ctx->output_encoding && strcmp(ctx->input_encoding, ctx->output_encoding) != 0) { iconv_t converter = iconv_open(ctx->output_encoding, ctx->input_encoding); if (converter == (iconv_t)-1) { retval = READSTAT_ERROR_UNSUPPORTED_CHARSET; goto cleanup; } ctx->converter = converter; } if (ctx->metadata_handler) { char table_name[4*32+1]; readstat_metadata_t metadata = { .file_encoding = ctx->input_encoding, /* orig encoding? */ .modified_time = hinfo->modification_time, .creation_time = hinfo->creation_time, .file_format_version = hinfo->major_version, .endianness = hinfo->little_endian ? 
READSTAT_ENDIAN_LITTLE : READSTAT_ENDIAN_BIG, .is64bit = ctx->u64 }; retval = readstat_convert(table_name, sizeof(table_name), hinfo->table_name, sizeof(hinfo->table_name), ctx->converter); if (retval != READSTAT_OK) goto cleanup; metadata.table_name = table_name; if (ctx->metadata_handler(&metadata, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } if ((page = readstat_malloc(ctx->page_size)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (io->seek(ctx->header_size+SAS_CATALOG_FIRST_INDEX_PAGE*ctx->page_size, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->read(page, ctx->page_size, io->io_ctx) < ctx->page_size) { retval = READSTAT_ERROR_READ; goto cleanup; } retval = sas7bcat_augment_index(&page[ctx->xlsr_offset], ctx->page_size - ctx->xlsr_offset, ctx); if (retval != READSTAT_OK) goto cleanup; // Pass 1 -- find the XLSR entries for (i=SAS_CATALOG_USELESS_PAGES; i<ctx->page_count; i++) { if (io->seek(ctx->header_size+i*ctx->page_size, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->read(page, ctx->page_size, io->io_ctx) < ctx->page_size) { retval = READSTAT_ERROR_READ; goto cleanup; } if (memcmp(&page[16], "XLSR", sizeof("XLSR")-1) == 0) { retval = sas7bcat_augment_index(&page[16], ctx->page_size - 16, ctx); if (retval != READSTAT_OK) goto cleanup; } } sas7bcat_sort_index(ctx); sas7bcat_uniq_index(ctx); // Pass 2 -- look up the individual block pointers for (i=0; i<ctx->block_pointers_used; i++) { int start_page = ctx->block_pointers[i] >> 32; int start_page_pos = (ctx->block_pointers[i]) & 0xFFFF; int buffer_len = sas7bcat_block_size(start_page, start_page_pos, ctx, &retval); if (buffer_len == -1) { goto cleanup; } else if (buffer_len == 0) { continue; } if ((buffer = readstat_realloc(buffer, buffer_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if ((retval = sas7bcat_read_block(buffer, buffer_len, 
start_page, start_page_pos, ctx)) != READSTAT_OK) goto cleanup; if ((retval = sas7bcat_parse_block(buffer, buffer_len, ctx)) != READSTAT_OK) goto cleanup; } cleanup: io->close(io->io_ctx); if (page) free(page); if (buffer) free(buffer); if (ctx) sas7bcat_ctx_free(ctx); if (hinfo) free(hinfo); return retval; } haven/src/readstat/sas/readstat_xport.h0000644000176200001440000000155614101007206017751 0ustar liggesusers typedef struct xport_header_record_s { char name[9]; int num1; int num2; int num3; int num4; int num5; int num6; } xport_header_record_t; extern char _xport_months[12][4]; #pragma pack(push, 1) typedef struct xport_namestr_s { uint16_t ntype; uint16_t nhfun; uint16_t nlng; uint16_t nvar0; char nname[8]; char nlabel[40]; char nform[8]; uint16_t nfl; uint16_t nfd; uint16_t nfj; char nfill[2]; char niform[8]; uint16_t nifl; uint16_t nifd; uint32_t npos; char longname[32]; uint16_t labeln; char rest[18]; } xport_namestr_t; #pragma pack(pop) #define XPORT_MIN_DOUBLE_SIZE 3 #define XPORT_MAX_DOUBLE_SIZE 8 void xport_namestr_bswap(xport_namestr_t *namestr); haven/src/readstat/sas/readstat_sas_rle.h0000644000176200001440000000055614101007206020224 0ustar liggesusers ssize_t sas_rle_decompress(void *output_buf, size_t output_len, const void *input_buf, size_t input_len); ssize_t sas_rle_compress(void *output_buf, size_t output_len, const void *input_buf, size_t input_len); ssize_t sas_rle_decompressed_len(const void *input_buf, size_t input_len); ssize_t sas_rle_compressed_len(const void *bytes, size_t len); haven/src/readstat/sas/readstat_sas7bdat_read.c0000644000176200001440000013251314101765776021320 0ustar liggesusers #include #include #include #include #include #include #include "readstat_sas.h" #include "readstat_sas_rle.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" #include "../readstat_malloc.h" typedef struct col_info_s { sas_text_ref_t name_ref; sas_text_ref_t format_ref; sas_text_ref_t label_ref; int index; uint64_t offset; 
uint32_t width; int type; int format_len; } col_info_t; typedef struct subheader_pointer_s { uint64_t offset; uint64_t len; unsigned char compression; unsigned char is_compressed_data; } subheader_pointer_t; typedef struct sas7bdat_ctx_s { readstat_callbacks_t handle; int64_t file_size; int little_endian; int u64; int vendor; void *user_ctx; readstat_io_t *io; int bswap; int did_submit_columns; uint32_t row_length; uint32_t page_row_count; uint32_t parsed_row_count; uint32_t column_count; uint32_t row_limit; uint32_t row_offset; uint64_t header_size; uint64_t page_count; uint64_t page_size; char *page; char *row; uint64_t page_header_size; uint64_t subheader_signature_size; uint64_t subheader_pointer_size; int text_blob_count; size_t *text_blob_lengths; char **text_blobs; int col_names_count; int col_attrs_count; int col_formats_count; size_t max_col_width; char *scratch_buffer; size_t scratch_buffer_len; int col_info_count; col_info_t *col_info; readstat_variable_t **variables; const char *input_encoding; const char *output_encoding; iconv_t converter; time_t ctime; time_t mtime; int version; char table_name[4*32+1]; char file_label[4*256+1]; char error_buf[2048]; unsigned int rdc_compression:1; } sas7bdat_ctx_t; static void sas7bdat_ctx_free(sas7bdat_ctx_t *ctx) { int i; if (ctx->text_blobs) { for (i=0; i<ctx->text_blob_count; i++) { free(ctx->text_blobs[i]); } free(ctx->text_blobs); free(ctx->text_blob_lengths); } if (ctx->variables) { for (i=0; i<ctx->column_count; i++) { if (ctx->variables[i]) free(ctx->variables[i]); } free(ctx->variables); } if (ctx->col_info) free(ctx->col_info); if (ctx->scratch_buffer) free(ctx->scratch_buffer); if (ctx->page) free(ctx->page); if (ctx->row) free(ctx->row); if (ctx->converter) iconv_close(ctx->converter); free(ctx); } static readstat_error_t sas7bdat_update_progress(sas7bdat_ctx_t *ctx) { readstat_io_t *io = ctx->io; return io->update(ctx->file_size, ctx->handle.progress, ctx->user_ctx, io->io_ctx); } static sas_text_ref_t 
sas7bdat_parse_text_ref(const char *data, sas7bdat_ctx_t *ctx) { sas_text_ref_t ref; ref.index = sas_read2(&data[0], ctx->bswap); ref.offset = sas_read2(&data[2], ctx->bswap); ref.length = sas_read2(&data[4], ctx->bswap); return ref; } static readstat_error_t sas7bdat_copy_text_ref(char *out_buffer, size_t out_buffer_len, sas_text_ref_t text_ref, sas7bdat_ctx_t *ctx) { if (text_ref.index >= ctx->text_blob_count) return READSTAT_ERROR_PARSE; if (text_ref.length == 0) { out_buffer[0] = '\0'; return READSTAT_OK; } char *blob = ctx->text_blobs[text_ref.index]; if (text_ref.offset + text_ref.length > ctx->text_blob_lengths[text_ref.index]) return READSTAT_ERROR_PARSE; return readstat_convert(out_buffer, out_buffer_len, &blob[text_ref.offset], text_ref.length, ctx->converter); } static readstat_error_t sas7bdat_parse_column_text_subheader(const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; size_t signature_len = ctx->subheader_signature_size; uint16_t remainder = sas_read2(&subheader[signature_len], ctx->bswap); char *blob = NULL; if (remainder != sas_subheader_remainder(len, signature_len)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } ctx->text_blob_count++; ctx->text_blobs = readstat_realloc(ctx->text_blobs, ctx->text_blob_count * sizeof(char *)); ctx->text_blob_lengths = readstat_realloc(ctx->text_blob_lengths, ctx->text_blob_count * sizeof(ctx->text_blob_lengths[0])); if (ctx->text_blobs == NULL || ctx->text_blob_lengths == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if ((blob = readstat_malloc(len-signature_len)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } memcpy(blob, subheader+signature_len, len-signature_len); ctx->text_blob_lengths[ctx->text_blob_count-1] = len-signature_len; ctx->text_blobs[ctx->text_blob_count-1] = blob; cleanup: return retval; } static readstat_error_t sas7bdat_realloc_col_info(sas7bdat_ctx_t *ctx, size_t count) { if (ctx->col_info_count < count) { size_t old_count = 
ctx->col_info_count; ctx->col_info_count = count; ctx->col_info = readstat_realloc(ctx->col_info, ctx->col_info_count * sizeof(col_info_t)); if (ctx->col_info == NULL) { return READSTAT_ERROR_MALLOC; } memset(ctx->col_info + old_count, 0, (count - old_count) * sizeof(col_info_t)); } return READSTAT_OK; } static readstat_error_t sas7bdat_parse_column_size_subheader(const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { uint64_t col_count; readstat_error_t retval = READSTAT_OK; if (ctx->column_count || ctx->did_submit_columns) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (len < (ctx->u64 ? 16 : 8)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (ctx->u64) { col_count = sas_read8(&subheader[8], ctx->bswap); } else { col_count = sas_read4(&subheader[4], ctx->bswap); } ctx->column_count = col_count; retval = sas7bdat_realloc_col_info(ctx, ctx->column_count); cleanup: return retval; } static readstat_error_t sas7bdat_parse_row_size_subheader(const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; uint64_t total_row_count; uint64_t row_length, page_row_count; if (len < (ctx->u64 ? 
250: 190)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (ctx->u64) { row_length = sas_read8(&subheader[40], ctx->bswap); total_row_count = sas_read8(&subheader[48], ctx->bswap); page_row_count = sas_read8(&subheader[120], ctx->bswap); } else { row_length = sas_read4(&subheader[20], ctx->bswap); total_row_count = sas_read4(&subheader[24], ctx->bswap); page_row_count = sas_read4(&subheader[60], ctx->bswap); } sas_text_ref_t file_label_ref = sas7bdat_parse_text_ref(&subheader[len-130], ctx); if (file_label_ref.length) { if ((retval = sas7bdat_copy_text_ref(ctx->file_label, sizeof(ctx->file_label), file_label_ref, ctx)) != READSTAT_OK) { goto cleanup; } } sas_text_ref_t compression_ref = sas7bdat_parse_text_ref(&subheader[len-118], ctx); if (compression_ref.length) { char compression[9]; if ((retval = sas7bdat_copy_text_ref(compression, sizeof(compression), compression_ref, ctx)) != READSTAT_OK) { goto cleanup; } ctx->rdc_compression = (memcmp(compression, SAS_COMPRESSION_SIGNATURE_RDC, 8) == 0); } ctx->row_length = row_length; ctx->row = readstat_realloc(ctx->row, ctx->row_length); if (ctx->row == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } ctx->page_row_count = page_row_count; uint64_t total_row_count_after_skipping = total_row_count; if (total_row_count > ctx->row_offset) { total_row_count_after_skipping -= ctx->row_offset; } else { total_row_count_after_skipping = 0; ctx->row_offset = total_row_count; } if (ctx->row_limit == 0 || total_row_count_after_skipping < ctx->row_limit) ctx->row_limit = total_row_count_after_skipping; cleanup: return retval; } static readstat_error_t sas7bdat_parse_column_name_subheader(const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; size_t signature_len = ctx->subheader_signature_size; int cmax = ctx->u64 ? 
(len-28)/8 : (len-20)/8; int i; const char *cnp = &subheader[signature_len+8]; uint16_t remainder = sas_read2(&subheader[signature_len], ctx->bswap); if (remainder != sas_subheader_remainder(len, signature_len)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } ctx->col_names_count += cmax; if ((retval = sas7bdat_realloc_col_info(ctx, ctx->col_names_count)) != READSTAT_OK) goto cleanup; for (i=ctx->col_names_count-cmax; i<ctx->col_names_count; i++) { ctx->col_info[i].name_ref = sas7bdat_parse_text_ref(cnp, ctx); cnp += 8; } cleanup: return retval; } static readstat_error_t sas7bdat_parse_column_attributes_subheader(const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; size_t signature_len = ctx->subheader_signature_size; int cmax = ctx->u64 ? (len-28)/16 : (len-20)/12; int i; const char *cap = &subheader[signature_len+8]; uint16_t remainder = sas_read2(&subheader[signature_len], ctx->bswap); if (remainder != sas_subheader_remainder(len, signature_len)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } ctx->col_attrs_count += cmax; if ((retval = sas7bdat_realloc_col_info(ctx, ctx->col_attrs_count)) != READSTAT_OK) goto cleanup; for (i=ctx->col_attrs_count-cmax; i<ctx->col_attrs_count; i++) { if (ctx->u64) { ctx->col_info[i].offset = sas_read8(&cap[0], ctx->bswap); } else { ctx->col_info[i].offset = sas_read4(&cap[0], ctx->bswap); } readstat_off_t off=4; if (ctx->u64) off=8; ctx->col_info[i].width = sas_read4(&cap[off], ctx->bswap); if (ctx->col_info[i].width > ctx->max_col_width) ctx->max_col_width = ctx->col_info[i].width; if (cap[off+6] == SAS_COLUMN_TYPE_NUM) { ctx->col_info[i].type = READSTAT_TYPE_DOUBLE; } else if (cap[off+6] == SAS_COLUMN_TYPE_CHR) { ctx->col_info[i].type = READSTAT_TYPE_STRING; } else { retval = READSTAT_ERROR_PARSE; goto cleanup; } ctx->col_info[i].index = i; cap += off+8; } cleanup: return retval; } static readstat_error_t sas7bdat_parse_column_format_subheader(const char *subheader, size_t len, sas7bdat_ctx_t 
*ctx) { readstat_error_t retval = READSTAT_OK; if (len < (ctx->u64 ? 58 : 46)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } ctx->col_formats_count++; if ((retval = sas7bdat_realloc_col_info(ctx, ctx->col_formats_count)) != READSTAT_OK) goto cleanup; if (ctx->u64) ctx->col_info[ctx->col_formats_count-1].format_len = sas_read2(&subheader[24], ctx->bswap); ctx->col_info[ctx->col_formats_count-1].format_ref = sas7bdat_parse_text_ref( ctx->u64 ? &subheader[46] : &subheader[34], ctx); ctx->col_info[ctx->col_formats_count-1].label_ref = sas7bdat_parse_text_ref( ctx->u64 ? &subheader[52] : &subheader[40], ctx); cleanup: return retval; } static readstat_error_t sas7bdat_handle_data_value(readstat_variable_t *variable, col_info_t *col_info, const char *col_data, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; int cb_retval = 0; readstat_value_t value; memset(&value, 0, sizeof(readstat_value_t)); value.type = col_info->type; if (col_info->type == READSTAT_TYPE_STRING) { retval = readstat_convert(ctx->scratch_buffer, ctx->scratch_buffer_len, col_data, col_info->width, ctx->converter); if (retval != READSTAT_OK) { if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Error converting string (row=%u, col=%u) to specified encoding: %.*s", ctx->parsed_row_count+1, col_info->index+1, col_info->width, col_data); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } value.v.string_value = ctx->scratch_buffer; } else if (col_info->type == READSTAT_TYPE_DOUBLE) { uint64_t val = 0; double dval = NAN; if (ctx->little_endian) { int k; for (k=0; k<col_info->width; k++) { val = (val << 8) | (unsigned char)col_data[col_info->width-1-k]; } } else { int k; for (k=0; k<col_info->width; k++) { val = (val << 8) | (unsigned char)col_data[k]; } } val <<= (8-col_info->width)*8; memcpy(&dval, &val, 8); if (isnan(dval)) { value.v.double_value = NAN; sas_assign_tag(&value, ~((val >> 40) & 0xFF)); } else { value.v.double_value = dval; } } cb_retval = 
ctx->handle.value(ctx->parsed_row_count, variable, value, ctx->user_ctx); if (cb_retval != READSTAT_HANDLER_OK) retval = READSTAT_ERROR_USER_ABORT; cleanup: return retval; } static readstat_error_t sas7bdat_parse_single_row(const char *data, sas7bdat_ctx_t *ctx) { if (ctx->parsed_row_count == ctx->row_limit) return READSTAT_OK; if (ctx->row_offset) { ctx->row_offset--; return READSTAT_OK; } readstat_error_t retval = READSTAT_OK; int j; if (ctx->handle.value) { ctx->scratch_buffer_len = 4*ctx->max_col_width+1; ctx->scratch_buffer = readstat_realloc(ctx->scratch_buffer, ctx->scratch_buffer_len); if (ctx->scratch_buffer == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } for (j=0; j<ctx->column_count; j++) { col_info_t *col_info = &ctx->col_info[j]; readstat_variable_t *variable = ctx->variables[j]; if (variable->skip) continue; if (col_info->offset > ctx->row_length || col_info->offset + col_info->width > ctx->row_length) { retval = READSTAT_ERROR_PARSE; goto cleanup; } retval = sas7bdat_handle_data_value(variable, col_info, &data[col_info->offset], ctx); if (retval != READSTAT_OK) { goto cleanup; } } } ctx->parsed_row_count++; cleanup: return retval; } static readstat_error_t sas7bdat_parse_rows(const char *data, size_t len, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; int i; size_t row_offset=0; for (i=0; i<ctx->page_row_count && ctx->parsed_row_count < ctx->row_limit; i++) { if (row_offset + ctx->row_length > len) { retval = READSTAT_ERROR_ROW_WIDTH_MISMATCH; goto cleanup; } if ((retval = sas7bdat_parse_single_row(&data[row_offset], ctx)) != READSTAT_OK) goto cleanup; row_offset += ctx->row_length; } cleanup: return retval; } static readstat_error_t sas7bdat_parse_subheader_rdc(const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; const unsigned char *input = (const unsigned char *)subheader; char *buffer = malloc(ctx->row_length); char *output = buffer; /* bail out before writing through a NULL buffer; free(NULL) at cleanup is a no-op */ if (buffer == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } while (input + 2 <= (const unsigned char *)subheader + 
len) { int i; unsigned short prefix = (input[0] << 8) + input[1]; input += 2; for (i=0; i<16; i++) { if ((prefix & (1 << (15 - i))) == 0) { if (input + 1 > (const unsigned char *)subheader + len) { break; } if (output + 1 > buffer + ctx->row_length) { retval = READSTAT_ERROR_ROW_WIDTH_MISMATCH; goto cleanup; } *output++ = *input++; continue; } if (input + 2 > (const unsigned char *)subheader + len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } unsigned char marker_byte = *input++; unsigned char next_byte = *input++; size_t insert_len = 0, copy_len = 0; unsigned char insert_byte = 0x00; size_t back_offset = 0; if (marker_byte <= 0x0F) { insert_len = 3 + marker_byte; insert_byte = next_byte; } else if ((marker_byte >> 4) == 1) { if (input + 1 > (const unsigned char *)subheader + len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } insert_len = 19 + (marker_byte & 0x0F) + next_byte * 16; insert_byte = *input++; } else if ((marker_byte >> 4) == 2) { if (input + 1 > (const unsigned char *)subheader + len) { retval = READSTAT_ERROR_PARSE; goto cleanup; } copy_len = 16 + (*input++); back_offset = 3 + (marker_byte & 0x0F) + next_byte * 16; } else { copy_len = (marker_byte >> 4); back_offset = 3 + (marker_byte & 0x0F) + next_byte * 16; } if (insert_len) { if (output + insert_len > buffer + ctx->row_length) { retval = READSTAT_ERROR_ROW_WIDTH_MISMATCH; goto cleanup; } memset(output, insert_byte, insert_len); output += insert_len; } else if (copy_len) { if (output - buffer < back_offset || copy_len > back_offset) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (output + copy_len > buffer + ctx->row_length) { retval = READSTAT_ERROR_ROW_WIDTH_MISMATCH; goto cleanup; } memcpy(output, output - back_offset, copy_len); output += copy_len; } } } if (output - buffer != ctx->row_length) { retval = READSTAT_ERROR_ROW_WIDTH_MISMATCH; goto cleanup; } retval = sas7bdat_parse_single_row(buffer, ctx); cleanup: free(buffer); return retval; } static readstat_error_t 
sas7bdat_parse_subheader_rle(const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { if (ctx->row_limit == ctx->parsed_row_count) return READSTAT_OK; readstat_error_t retval = READSTAT_OK; ssize_t bytes_decompressed = 0; bytes_decompressed = sas_rle_decompress(ctx->row, ctx->row_length, subheader, len); if (bytes_decompressed != ctx->row_length) { retval = READSTAT_ERROR_ROW_WIDTH_MISMATCH; if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Row #%d decompressed to %ld bytes (expected %d bytes)", ctx->parsed_row_count, (long)(bytes_decompressed), ctx->row_length); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } retval = sas7bdat_parse_single_row(ctx->row, ctx); cleanup: return retval; } static readstat_error_t sas7bdat_parse_subheader_compressed(const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { if (ctx->rdc_compression) return sas7bdat_parse_subheader_rdc(subheader, len, ctx); return sas7bdat_parse_subheader_rle(subheader, len, ctx); } static readstat_error_t sas7bdat_parse_subheader(uint32_t signature, const char *subheader, size_t len, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; if (len < 2 + ctx->subheader_signature_size) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (signature == SAS_SUBHEADER_SIGNATURE_ROW_SIZE) { retval = sas7bdat_parse_row_size_subheader(subheader, len, ctx); } else if (signature == SAS_SUBHEADER_SIGNATURE_COLUMN_SIZE) { retval = sas7bdat_parse_column_size_subheader(subheader, len, ctx); } else if (signature == SAS_SUBHEADER_SIGNATURE_COUNTS) { /* void */ } else if (signature == SAS_SUBHEADER_SIGNATURE_COLUMN_TEXT) { retval = sas7bdat_parse_column_text_subheader(subheader, len, ctx); } else if (signature == SAS_SUBHEADER_SIGNATURE_COLUMN_NAME) { retval = sas7bdat_parse_column_name_subheader(subheader, len, ctx); } else if (signature == SAS_SUBHEADER_SIGNATURE_COLUMN_ATTRS) { retval = sas7bdat_parse_column_attributes_subheader(subheader, len, ctx); } else 
if (signature == SAS_SUBHEADER_SIGNATURE_COLUMN_FORMAT) { retval = sas7bdat_parse_column_format_subheader(subheader, len, ctx); } else if (signature == SAS_SUBHEADER_SIGNATURE_COLUMN_LIST) { /* void */ } else if ((signature & SAS_SUBHEADER_SIGNATURE_COLUMN_MASK) == SAS_SUBHEADER_SIGNATURE_COLUMN_MASK) { /* void */ } else { retval = READSTAT_ERROR_PARSE; } cleanup: return retval; } static readstat_error_t sas7bdat_validate_column(col_info_t *col_info) { if (col_info->type == READSTAT_TYPE_DOUBLE) { if (col_info->width > 8 || col_info->width < 3) { return READSTAT_ERROR_PARSE; } } if (col_info->type == READSTAT_TYPE_STRING) { if (col_info->width > INT16_MAX || col_info->width == 0) { return READSTAT_ERROR_PARSE; } } return READSTAT_OK; } static readstat_variable_t *sas7bdat_init_variable(sas7bdat_ctx_t *ctx, int i, int index_after_skipping, readstat_error_t *out_retval) { readstat_error_t retval = READSTAT_OK; readstat_variable_t *variable = readstat_calloc(1, sizeof(readstat_variable_t)); /* allocation can fail; report it instead of dereferencing NULL */ if (variable == NULL) { if (out_retval) *out_retval = READSTAT_ERROR_MALLOC; return NULL; } variable->index = i; variable->index_after_skipping = index_after_skipping; variable->type = ctx->col_info[i].type; variable->storage_width = ctx->col_info[i].width; if ((retval = sas7bdat_validate_column(&ctx->col_info[i])) != READSTAT_OK) { goto cleanup; } if ((retval = sas7bdat_copy_text_ref(variable->name, sizeof(variable->name), ctx->col_info[i].name_ref, ctx)) != READSTAT_OK) { goto cleanup; } if ((retval = sas7bdat_copy_text_ref(variable->format, sizeof(variable->format), ctx->col_info[i].format_ref, ctx)) != READSTAT_OK) { goto cleanup; } size_t len = strlen(variable->format); if (len && ctx->col_info[i].format_len) { snprintf(variable->format + len, sizeof(variable->format) - len, "%d", ctx->col_info[i].format_len); } if ((retval = sas7bdat_copy_text_ref(variable->label, sizeof(variable->label), ctx->col_info[i].label_ref, ctx)) != READSTAT_OK) { goto cleanup; } cleanup: if (retval != READSTAT_OK) { if (out_retval) *out_retval = retval; if (retval == 
READSTAT_ERROR_CONVERT_BAD_STRING) { if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Error converting variable #%d info to specified encoding: %s %s (%s)", i, variable->name, variable->format, variable->label); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } } /* free only after the error message has read the variable's fields */ free(variable); return NULL; } return variable; } static readstat_error_t sas7bdat_submit_columns(sas7bdat_ctx_t *ctx, int compressed) { readstat_error_t retval = READSTAT_OK; if (ctx->handle.metadata) { readstat_metadata_t metadata = { .row_count = ctx->row_limit, .var_count = ctx->column_count, .table_name = ctx->table_name, .file_label = ctx->file_label, .file_encoding = ctx->input_encoding, /* orig encoding? */ .creation_time = ctx->ctime, .modified_time = ctx->mtime, .file_format_version = ctx->version, .compression = READSTAT_COMPRESS_NONE, .endianness = ctx->little_endian ? READSTAT_ENDIAN_LITTLE : READSTAT_ENDIAN_BIG, .is64bit = ctx->u64 }; if (compressed) { if (ctx->rdc_compression) { metadata.compression = READSTAT_COMPRESS_BINARY; } else { metadata.compression = READSTAT_COMPRESS_ROWS; } } if (ctx->handle.metadata(&metadata, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } if (ctx->column_count == 0) goto cleanup; if ((ctx->variables = readstat_calloc(ctx->column_count, sizeof(readstat_variable_t *))) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } int i; int index_after_skipping = 0; for (i=0; i<ctx->column_count; i++) { ctx->variables[i] = sas7bdat_init_variable(ctx, i, index_after_skipping, &retval); if (ctx->variables[i] == NULL) break; int cb_retval = READSTAT_HANDLER_OK; if (ctx->handle.variable) { cb_retval = ctx->handle.variable(i, ctx->variables[i], ctx->variables[i]->format, ctx->user_ctx); } if (cb_retval == READSTAT_HANDLER_ABORT) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } if (cb_retval == READSTAT_HANDLER_SKIP_VARIABLE) { ctx->variables[i]->skip = 1; } else { index_after_skipping++; } } cleanup: 
return retval; } static readstat_error_t sas7bdat_submit_columns_if_needed(sas7bdat_ctx_t *ctx, int compressed) { readstat_error_t retval = READSTAT_OK; if (!ctx->did_submit_columns) { if ((retval = sas7bdat_submit_columns(ctx, compressed)) != READSTAT_OK) { goto cleanup; } ctx->did_submit_columns = 1; } cleanup: return retval; } static int sas7bdat_signature_is_recognized(uint32_t signature) { return (signature == SAS_SUBHEADER_SIGNATURE_ROW_SIZE || signature == SAS_SUBHEADER_SIGNATURE_COLUMN_SIZE || signature == SAS_SUBHEADER_SIGNATURE_COUNTS || signature == SAS_SUBHEADER_SIGNATURE_COLUMN_FORMAT || (signature & SAS_SUBHEADER_SIGNATURE_COLUMN_MASK) == SAS_SUBHEADER_SIGNATURE_COLUMN_MASK); } static readstat_error_t sas7bdat_parse_subheader_pointer(const char *shp, size_t shp_size, subheader_pointer_t *info, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; if (ctx->u64) { if (shp_size <= 17) { retval = READSTAT_ERROR_PARSE; goto cleanup; } info->offset = sas_read8(&shp[0], ctx->bswap); info->len = sas_read8(&shp[8], ctx->bswap); info->compression = shp[16]; info->is_compressed_data = shp[17]; } else { if (shp_size <= 9) { retval = READSTAT_ERROR_PARSE; goto cleanup; } info->offset = sas_read4(&shp[0], ctx->bswap); info->len = sas_read4(&shp[4], ctx->bswap); info->compression = shp[8]; info->is_compressed_data = shp[9]; } cleanup: return retval; } static readstat_error_t sas7bdat_validate_subheader_pointer(subheader_pointer_t *shp_info, size_t page_size, uint16_t subheader_count, sas7bdat_ctx_t *ctx) { if (shp_info->offset > page_size) return READSTAT_ERROR_PARSE; if (shp_info->len > page_size) return READSTAT_ERROR_PARSE; if (shp_info->offset + shp_info->len > page_size) return READSTAT_ERROR_PARSE; if (shp_info->offset < ctx->page_header_size + subheader_count*ctx->subheader_pointer_size) return READSTAT_ERROR_PARSE; if (shp_info->compression == SAS_COMPRESSION_NONE) { if (shp_info->len < ctx->subheader_signature_size) return READSTAT_ERROR_PARSE; if 
(shp_info->offset + ctx->subheader_signature_size > page_size) return READSTAT_ERROR_PARSE; } return READSTAT_OK; } /* First, extract column text */ static readstat_error_t sas7bdat_parse_page_pass1(const char *page, size_t page_size, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; uint16_t subheader_count = sas_read2(&page[ctx->page_header_size-4], ctx->bswap); int i; const char *shp = &page[ctx->page_header_size]; int lshp = ctx->subheader_pointer_size; if (ctx->page_header_size + subheader_count*lshp > page_size) { retval = READSTAT_ERROR_PARSE; goto cleanup; } for (i=0; i<subheader_count; i++) { subheader_pointer_t shp_info = { 0 }; uint32_t signature = 0; size_t signature_len = ctx->subheader_signature_size; if ((retval = sas7bdat_parse_subheader_pointer(shp, page + page_size - shp, &shp_info, ctx)) != READSTAT_OK) { goto cleanup; } if (shp_info.len > 0 && shp_info.compression != SAS_COMPRESSION_TRUNC) { if ((retval = sas7bdat_validate_subheader_pointer(&shp_info, page_size, subheader_count, ctx)) != READSTAT_OK) { goto cleanup; } if (shp_info.compression == SAS_COMPRESSION_NONE) { signature = sas_read4(page + shp_info.offset, ctx->bswap); if (!ctx->little_endian && signature == -1 && signature_len == 8) { signature = sas_read4(page + shp_info.offset + 4, ctx->bswap); } if (signature == SAS_SUBHEADER_SIGNATURE_COLUMN_TEXT) { if ((retval = sas7bdat_parse_subheader(signature, page + shp_info.offset, shp_info.len, ctx)) != READSTAT_OK) { goto cleanup; } } } else if (shp_info.compression == SAS_COMPRESSION_ROW) { /* void */ } else { retval = READSTAT_ERROR_UNSUPPORTED_COMPRESSION; goto cleanup; } } shp += lshp; } cleanup: return retval; } static readstat_error_t sas7bdat_parse_page_pass2(const char *page, size_t page_size, sas7bdat_ctx_t *ctx) { uint16_t page_type; readstat_error_t retval = READSTAT_OK; page_type = sas_read2(&page[ctx->page_header_size-8], ctx->bswap); const char *data = NULL; if ((page_type & SAS_PAGE_TYPE_MASK) == SAS_PAGE_TYPE_DATA) { ctx->page_row_count = sas_read2(&page[ctx->page_header_size-6], ctx->bswap); data = 
&page[ctx->page_header_size]; } else if (!(page_type & SAS_PAGE_TYPE_COMP)) { uint16_t subheader_count = sas_read2(&page[ctx->page_header_size-4], ctx->bswap); int i; const char *shp = &page[ctx->page_header_size]; int lshp = ctx->subheader_pointer_size; if (ctx->page_header_size + subheader_count*lshp > page_size) { retval = READSTAT_ERROR_PARSE; goto cleanup; } for (i=0; i<subheader_count; i++) { subheader_pointer_t shp_info = { 0 }; uint32_t signature = 0; if ((retval = sas7bdat_parse_subheader_pointer(shp, page + page_size - shp, &shp_info, ctx)) != READSTAT_OK) { goto cleanup; } if (shp_info.len > 0 && shp_info.compression != SAS_COMPRESSION_TRUNC) { if ((retval = sas7bdat_validate_subheader_pointer(&shp_info, page_size, subheader_count, ctx)) != READSTAT_OK) { goto cleanup; } if (shp_info.compression == SAS_COMPRESSION_NONE) { signature = sas_read4(page + shp_info.offset, ctx->bswap); if (!ctx->little_endian && signature == -1 && ctx->u64) { signature = sas_read4(page + shp_info.offset + 4, ctx->bswap); } if (shp_info.is_compressed_data && !sas7bdat_signature_is_recognized(signature)) { if (shp_info.len != ctx->row_length) { retval = READSTAT_ERROR_ROW_WIDTH_MISMATCH; goto cleanup; } if ((retval = sas7bdat_submit_columns_if_needed(ctx, 1)) != READSTAT_OK) { goto cleanup; } if ((retval = sas7bdat_parse_single_row(page + shp_info.offset, ctx)) != READSTAT_OK) { goto cleanup; } } else { if (signature != SAS_SUBHEADER_SIGNATURE_COLUMN_TEXT) { if ((retval = sas7bdat_parse_subheader(signature, page + shp_info.offset, shp_info.len, ctx)) != READSTAT_OK) { goto cleanup; } } } } else if (shp_info.compression == SAS_COMPRESSION_ROW) { if ((retval = sas7bdat_submit_columns_if_needed(ctx, 1)) != READSTAT_OK) { goto cleanup; } if ((retval = sas7bdat_parse_subheader_compressed(page + shp_info.offset, shp_info.len, ctx)) != READSTAT_OK) { goto cleanup; } } else { retval = READSTAT_ERROR_UNSUPPORTED_COMPRESSION; goto cleanup; } } shp += lshp; } if ((page_type & SAS_PAGE_TYPE_MASK) == SAS_PAGE_TYPE_MIX) { /* HACK - this is supposed to obey 8-byte boundaries but * some files created by Stat/Transfer don't. 
So verify that the * padding is { 0, 0, 0, 0 } or { ' ', ' ', ' ', ' ' } (or that * the file is not from Stat/Transfer) before skipping it */ if ((shp-page)%8 == 4 && shp + 4 <= page + page_size && (*(uint32_t *)shp == 0x00000000 || *(uint32_t *)shp == 0x20202020 || ctx->vendor != READSTAT_VENDOR_STAT_TRANSFER)) { data = shp + 4; } else { data = shp; } } } if (data) { if ((retval = sas7bdat_submit_columns_if_needed(ctx, 0)) != READSTAT_OK) { goto cleanup; } if (ctx->handle.value) { retval = sas7bdat_parse_rows(data, page + page_size - data, ctx); } } cleanup: return retval; } static readstat_error_t sas7bdat_parse_meta_pages_pass1(sas7bdat_ctx_t *ctx, int64_t *outLastExaminedPage) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; int64_t i; /* look for META and MIX pages at beginning... */ for (i=0; i<ctx->page_count; i++) { if (io->seek(ctx->header_size + i*ctx->page_size, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Failed to seek to position %" PRId64 " (= %" PRId64 " + %" PRId64 "*%" PRId64 ")", ctx->header_size + i*ctx->page_size, ctx->header_size, i, ctx->page_size); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } readstat_off_t off = 0; if (ctx->u64) off = 16; size_t head_len = off + 16 + 2; size_t tail_len = ctx->page_size - head_len; if (io->read(ctx->page, head_len, io->io_ctx) < head_len) { retval = READSTAT_ERROR_READ; goto cleanup; } uint16_t page_type = sas_read2(&ctx->page[off+16], ctx->bswap); if ((page_type & SAS_PAGE_TYPE_MASK) == SAS_PAGE_TYPE_DATA) break; if ((page_type & SAS_PAGE_TYPE_COMP)) continue; if (io->read(ctx->page + head_len, tail_len, io->io_ctx) < tail_len) { retval = READSTAT_ERROR_READ; goto cleanup; } if ((retval = sas7bdat_parse_page_pass1(ctx->page, ctx->page_size, ctx)) != READSTAT_OK) { if (ctx->handle.error && retval != READSTAT_ERROR_USER_ABORT) { int64_t pos = io->seek(0, 
READSTAT_SEEK_CUR, io->io_ctx); snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Error parsing page %" PRId64 ", bytes %" PRId64 "-%" PRId64, i, pos - ctx->page_size, pos-1); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } } cleanup: if (outLastExaminedPage) *outLastExaminedPage = i; return retval; } static readstat_error_t sas7bdat_parse_amd_pages_pass1(int64_t last_examined_page_pass1, sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; uint64_t i; uint64_t amd_page_count = 0; /* ...then AMD pages at the end */ for (i=ctx->page_count-1; i>last_examined_page_pass1; i--) { if (io->seek(ctx->header_size + i*ctx->page_size, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Failed to seek to position %" PRId64 " (= %" PRId64 " + %" PRId64 "*%" PRId64 ")", ctx->header_size + i*ctx->page_size, ctx->header_size, i, ctx->page_size); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } readstat_off_t off = 0; if (ctx->u64) off = 16; size_t head_len = off + 16 + 2; size_t tail_len = ctx->page_size - head_len; if (io->read(ctx->page, head_len, io->io_ctx) < head_len) { retval = READSTAT_ERROR_READ; goto cleanup; } uint16_t page_type = sas_read2(&ctx->page[off+16], ctx->bswap); if ((page_type & SAS_PAGE_TYPE_MASK) == SAS_PAGE_TYPE_DATA) { /* Usually AMD pages are at the end but sometimes data pages appear after them */ if (amd_page_count > 0) break; continue; } if ((page_type & SAS_PAGE_TYPE_COMP)) continue; if (io->read(ctx->page + head_len, tail_len, io->io_ctx) < tail_len) { retval = READSTAT_ERROR_READ; goto cleanup; } if ((retval = sas7bdat_parse_page_pass1(ctx->page, ctx->page_size, ctx)) != READSTAT_OK) { if (ctx->handle.error && retval != READSTAT_ERROR_USER_ABORT) { int64_t pos = io->seek(0, READSTAT_SEEK_CUR, io->io_ctx); snprintf(ctx->error_buf, sizeof(ctx->error_buf), 
"ReadStat: Error parsing page %" PRId64 ", bytes %" PRId64 "-%" PRId64, i, pos - ctx->page_size, pos-1); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } amd_page_count++; } cleanup: return retval; } static readstat_error_t sas7bdat_parse_all_pages_pass2(sas7bdat_ctx_t *ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = ctx->io; int64_t i; for (i=0; i<ctx->page_count; i++) { if ((retval = sas7bdat_update_progress(ctx)) != READSTAT_OK) { goto cleanup; } if (io->read(ctx->page, ctx->page_size, io->io_ctx) < ctx->page_size) { retval = READSTAT_ERROR_READ; goto cleanup; } if ((retval = sas7bdat_parse_page_pass2(ctx->page, ctx->page_size, ctx)) != READSTAT_OK) { if (ctx->handle.error && retval != READSTAT_ERROR_USER_ABORT) { int64_t pos = io->seek(0, READSTAT_SEEK_CUR, io->io_ctx); snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Error parsing page %" PRId64 ", bytes %" PRId64 "-%" PRId64, i, pos - ctx->page_size, pos-1); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } if (ctx->parsed_row_count == ctx->row_limit) break; } cleanup: return retval; } readstat_error_t readstat_parse_sas7bdat(readstat_parser_t *parser, const char *path, void *user_ctx) { int64_t last_examined_page_pass1 = 0; readstat_error_t retval = READSTAT_OK; readstat_io_t *io = parser->io; sas7bdat_ctx_t *ctx = calloc(1, sizeof(sas7bdat_ctx_t)); sas_header_info_t *hinfo = calloc(1, sizeof(sas_header_info_t)); ctx->handle = parser->handlers; ctx->input_encoding = parser->input_encoding; ctx->output_encoding = parser->output_encoding; ctx->user_ctx = user_ctx; ctx->io = parser->io; ctx->row_limit = parser->row_limit; if (parser->row_offset > 0) ctx->row_offset = parser->row_offset; if (io->open(path, io->io_ctx) == -1) { retval = READSTAT_ERROR_OPEN; goto cleanup; } if ((ctx->file_size = io->seek(0, READSTAT_SEEK_END, io->io_ctx)) == -1) { retval = READSTAT_ERROR_SEEK; if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), 
"ReadStat: Failed to seek to end of file"); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } if (io->seek(0, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Failed to seek to beginning of file"); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } if ((retval = sas_read_header(io, hinfo, ctx->handle.error, user_ctx)) != READSTAT_OK) { goto cleanup; } ctx->u64 = hinfo->u64; ctx->little_endian = hinfo->little_endian; ctx->vendor = hinfo->vendor; ctx->bswap = machine_is_little_endian() ^ hinfo->little_endian; ctx->header_size = hinfo->header_size; ctx->page_count = hinfo->page_count; ctx->page_size = hinfo->page_size; ctx->page_header_size = hinfo->page_header_size; ctx->subheader_pointer_size = hinfo->subheader_pointer_size; ctx->subheader_signature_size = ctx->u64 ? 8 : 4; ctx->ctime = hinfo->creation_time; ctx->mtime = hinfo->modification_time; ctx->version = hinfo->major_version; if (ctx->input_encoding == NULL) { ctx->input_encoding = hinfo->encoding; } if ((ctx->page = readstat_malloc(ctx->page_size)) == NULL) { retval = READSTAT_ERROR_MALLOC; goto cleanup; } if (ctx->input_encoding && ctx->output_encoding && strcmp(ctx->input_encoding, ctx->output_encoding) != 0) { iconv_t converter = iconv_open(ctx->output_encoding, ctx->input_encoding); if (converter == (iconv_t)-1) { retval = READSTAT_ERROR_UNSUPPORTED_CHARSET; goto cleanup; } ctx->converter = converter; } if ((retval = readstat_convert(ctx->table_name, sizeof(ctx->table_name), hinfo->table_name, sizeof(hinfo->table_name), ctx->converter)) != READSTAT_OK) { goto cleanup; } if ((retval = sas7bdat_parse_meta_pages_pass1(ctx, &last_examined_page_pass1)) != READSTAT_OK) { goto cleanup; } if ((retval = sas7bdat_parse_amd_pages_pass1(last_examined_page_pass1, ctx)) != READSTAT_OK) { goto cleanup; } if (io->seek(ctx->header_size, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval 
= READSTAT_ERROR_SEEK; if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Failed to seek to position %" PRId64, ctx->header_size); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } if ((retval = sas7bdat_parse_all_pages_pass2(ctx)) != READSTAT_OK) { goto cleanup; } if ((retval = sas7bdat_submit_columns_if_needed(ctx, 0)) != READSTAT_OK) { goto cleanup; } if (ctx->handle.value && ctx->parsed_row_count != ctx->row_limit) { retval = READSTAT_ERROR_ROW_COUNT_MISMATCH; if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: Expected %d rows in file, found %d", ctx->row_limit, ctx->parsed_row_count); ctx->handle.error(ctx->error_buf, ctx->user_ctx); } goto cleanup; } if ((retval = sas7bdat_update_progress(ctx)) != READSTAT_OK) { goto cleanup; } cleanup: io->close(io->io_ctx); if (retval == READSTAT_ERROR_OPEN || retval == READSTAT_ERROR_READ || retval == READSTAT_ERROR_SEEK) { if (ctx->handle.error) { snprintf(ctx->error_buf, sizeof(ctx->error_buf), "ReadStat: %s (retval = %d): %s (errno = %d)", readstat_error_message(retval), retval, strerror(errno), errno); ctx->handle.error(ctx->error_buf, user_ctx); } } if (ctx) sas7bdat_ctx_free(ctx); if (hinfo) free(hinfo); return retval; } haven/src/readstat/sas/readstat_xport_write.c0000644000176200001440000004354014101007206021155 0ustar liggesusers #include #include #include #include "../readstat.h" #include "../readstat_writer.h" #include "readstat_sas.h" #include "readstat_xport.h" #include "ieee.h" #define XPORT_DEFAULT_VERISON 8 #define RECORD_LEN 80 #if defined _MSC_VER #define restrict __restrict #endif static void copypad(char * restrict dst, size_t dst_len, const char * restrict src) { char *dst_end = dst + dst_len; while (dst < dst_end && *src) *dst++ = *src++; while (dst < dst_end) *dst++ = ' '; } static readstat_error_t xport_write_bytes(readstat_writer_t *writer, const void *bytes, size_t len) { return 
readstat_write_bytes_as_lines(writer, bytes, len, RECORD_LEN, ""); } static readstat_error_t xport_finish_record(readstat_writer_t *writer) { return readstat_write_line_padding(writer, ' ', RECORD_LEN, ""); } static readstat_error_t xport_write_record(readstat_writer_t *writer, const char *record) { size_t len = strlen(record); readstat_error_t retval = READSTAT_OK; retval = xport_write_bytes(writer, record, len); if (retval != READSTAT_OK) goto cleanup; retval = xport_finish_record(writer); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t xport_write_header_record_v8(readstat_writer_t *writer, xport_header_record_t *xrecord) { char record[RECORD_LEN+1]; snprintf(record, sizeof(record), "HEADER RECORD*******%-8sHEADER RECORD!!!!!!!%-30d", xrecord->name, xrecord->num1); return xport_write_record(writer, record); } static readstat_error_t xport_write_header_record(readstat_writer_t *writer, xport_header_record_t *xrecord) { char record[RECORD_LEN+1]; snprintf(record, sizeof(record), "HEADER RECORD*******%-8sHEADER RECORD!!!!!!!" 
"%05d%05d%05d" "%05d%05d%05d", xrecord->name, xrecord->num1, xrecord->num2, xrecord->num3, xrecord->num4, xrecord->num5, xrecord->num6); return xport_write_record(writer, record); } static size_t xport_variable_width(readstat_type_t type, size_t user_width) { if (type == READSTAT_TYPE_STRING) return user_width; if (user_width >= XPORT_MAX_DOUBLE_SIZE || user_width == 0) return XPORT_MAX_DOUBLE_SIZE; if (user_width <= XPORT_MIN_DOUBLE_SIZE) return XPORT_MIN_DOUBLE_SIZE; return user_width; } static readstat_error_t xport_write_variables(readstat_writer_t *writer) { readstat_error_t retval = READSTAT_OK; int i; long offset = 0; int num_long_labels = 0; int any_has_long_format = 0; for (i=0; ivariables_count; i++) { int needs_long_record = 0; readstat_variable_t *variable = readstat_get_variable(writer, i); size_t width = xport_variable_width(variable->type, variable->user_width); xport_namestr_t namestr = { .nvar0 = i+1, .nlng = width, .npos = offset, .niform = " ", .nform = " " }; if (readstat_variable_get_type_class(variable) == READSTAT_TYPE_CLASS_STRING) { namestr.ntype = SAS_COLUMN_TYPE_CHR; } else { namestr.ntype = SAS_COLUMN_TYPE_NUM; } copypad(namestr.nname, sizeof(namestr.nname), variable->name); copypad(namestr.nlabel, sizeof(namestr.nlabel), variable->label); if (variable->format[0]) { int decimals = 0; int width = 0; char name[24]; sscanf(variable->format, "%s%d.%d", name, &width, &decimals); copypad(namestr.nform, sizeof(namestr.nform), name); namestr.nfl = width; namestr.nfd = decimals; copypad(namestr.niform, sizeof(namestr.niform), name); namestr.nifl = width; namestr.nifd = decimals; if (strlen(name) > 8) { any_has_long_format = 1; needs_long_record = 1; } } else if (variable->display_width) { namestr.nfl = variable->display_width; } namestr.nfj = (variable->alignment == READSTAT_ALIGNMENT_RIGHT); if (writer->version == 8) { copypad(namestr.longname, sizeof(namestr.longname), variable->name); size_t label_len = strlen(variable->label); if (label_len > 
40) { needs_long_record = 1; } namestr.labeln = label_len; } if (needs_long_record) { num_long_labels++; } offset += width; xport_namestr_bswap(&namestr); retval = xport_write_bytes(writer, &namestr, sizeof(xport_namestr_t)); if (retval != READSTAT_OK) goto cleanup; } retval = xport_finish_record(writer); if (retval != READSTAT_OK) goto cleanup; if (writer->version == 8 && num_long_labels) { xport_header_record_t header = { .name = "LABELV8", .num1 = num_long_labels }; if (any_has_long_format) { strcpy(header.name, "LABELV9"); } retval = xport_write_header_record_v8(writer, &header); if (retval != READSTAT_OK) goto cleanup; for (i=0; i<writer->variables_count; i++) { readstat_variable_t *variable = readstat_get_variable(writer, i); size_t label_len = strlen(variable->label); size_t name_len = strlen(variable->name); int has_long_label = 0; int has_long_format = 0; int format_len = 0; char format_name[24]; memset(format_name, 0, sizeof(format_name)); has_long_label = (label_len > 40); if (variable->format[0]) { int decimals = 2; int width = 8; int matches = sscanf(variable->format, "%s%d.%d", format_name, &width, &decimals); if (matches < 1) { retval = READSTAT_ERROR_BAD_FORMAT_STRING; goto cleanup; } format_len = strlen(format_name); if (format_len > 8) { has_long_format = 1; } } if (has_long_format) { uint16_t labeldef[5] = { i+1, name_len, format_len, format_len, label_len }; if (machine_is_little_endian()) { labeldef[0] = byteswap2(labeldef[0]); labeldef[1] = byteswap2(labeldef[1]); labeldef[2] = byteswap2(labeldef[2]); labeldef[3] = byteswap2(labeldef[3]); labeldef[4] = byteswap2(labeldef[4]); } retval = readstat_write_bytes(writer, labeldef, sizeof(labeldef)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_string(writer, variable->name); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_string(writer, format_name); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_string(writer, format_name); if (retval != READSTAT_OK) 
goto cleanup; retval = readstat_write_string(writer, variable->label); if (retval != READSTAT_OK) goto cleanup; } else if (has_long_label) { uint16_t labeldef[3] = { i+1, name_len, label_len }; if (machine_is_little_endian()) { labeldef[0] = byteswap2(labeldef[0]); labeldef[1] = byteswap2(labeldef[1]); labeldef[2] = byteswap2(labeldef[2]); } retval = readstat_write_bytes(writer, labeldef, sizeof(labeldef)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_string(writer, variable->name); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_string(writer, variable->label); if (retval != READSTAT_OK) goto cleanup; } } retval = xport_finish_record(writer); if (retval != READSTAT_OK) goto cleanup; } cleanup: return retval; } static readstat_error_t xport_write_first_header_record(readstat_writer_t *writer) { xport_header_record_t xrecord = { .name = "LIBRARY" }; if (writer->version == 8) { strcpy(xrecord.name, "LIBV8"); } return xport_write_header_record(writer, &xrecord); } static readstat_error_t xport_write_first_real_header_record(readstat_writer_t *writer, const char *timestamp) { char real_record[RECORD_LEN+1]; snprintf(real_record, sizeof(real_record), "%-8.8s" "%-8.8s" "%-8.8s" "%-8.8s" "%-8.8s" "%-24.24s" "%16.16s", "SAS", "SAS", "SASLIB", "6.06", "bsd4.2", "", timestamp); return xport_write_record(writer, real_record); } static readstat_error_t xport_write_member_header_record(readstat_writer_t *writer) { xport_header_record_t xrecord = { .name = "MEMBER", .num4 = 160, .num6 = 140 }; if (writer->version == 8) { strcpy(xrecord.name, "MEMBV8"); } return xport_write_header_record(writer, &xrecord); } static readstat_error_t xport_write_descriptor_header_record(readstat_writer_t *writer) { xport_header_record_t xrecord = { .name = "DSCRPTR" }; if (writer->version == 8) { strcpy(xrecord.name, "DSCPTV8"); } return xport_write_header_record(writer, &xrecord); } static readstat_error_t xport_write_member_record_v8(readstat_writer_t *writer, 
char *timestamp) { readstat_error_t retval = READSTAT_OK; char member_header[RECORD_LEN+1]; char *ds_name = "DATASET"; if (writer->table_name[0]) ds_name = writer->table_name; snprintf(member_header, sizeof(member_header), "%-8.8s" "%-32.32s" "%-8.8s" "%-8.8s" "%-8.8s" "%16.16s", "SAS", ds_name, "SASDATA", "6.06", "bsd4.2", timestamp); retval = xport_write_record(writer, member_header); return retval; } static readstat_error_t xport_write_member_record(readstat_writer_t *writer, char *timestamp) { if (writer->version == 8) return xport_write_member_record_v8(writer, timestamp); readstat_error_t retval = READSTAT_OK; char member_header[RECORD_LEN+1]; char *ds_name = "DATASET"; if (writer->table_name[0]) ds_name = writer->table_name; snprintf(member_header, sizeof(member_header), "%-8.8s" "%-8.8s" "%-8.8s" "%-8.8s" "%-8.8s" "%-24.24s" "%16.16s", "SAS", ds_name, "SASDATA", "6.06", "bsd4.2", "", timestamp); retval = xport_write_record(writer, member_header); return retval; } static readstat_error_t xport_write_file_label_record(readstat_writer_t *writer, char *timestamp) { char member_header[RECORD_LEN+1]; snprintf(member_header, sizeof(member_header), "%16.16s" "%16.16s" "%-40.40s" "%-8.8s", timestamp, "", writer->file_label, "" /* dstype? 
*/); return xport_write_record(writer, member_header); } static readstat_error_t xport_write_namestr_header_record(readstat_writer_t *writer) { xport_header_record_t xrecord = { .name = "NAMESTR", .num2 = writer->variables_count }; if (writer->version == 8) { strcpy(xrecord.name, "NAMSTV8"); } return xport_write_header_record(writer, &xrecord); } static readstat_error_t xport_write_obs_header_record(readstat_writer_t *writer) { xport_header_record_t xrecord = { .name = "OBS" }; if (writer->version == 8) { strcpy(xrecord.name, "OBSV8"); } return xport_write_header_record(writer, &xrecord); } static readstat_error_t xport_format_timestamp(char *output, size_t output_len, time_t timestamp) { struct tm *ts = localtime(&timestamp); if (!ts) return READSTAT_ERROR_BAD_TIMESTAMP_VALUE; snprintf(output, output_len, "%02d%3.3s%02d:%02d:%02d:%02d", (unsigned int)ts->tm_mday % 100, _xport_months[ts->tm_mon], (unsigned int)ts->tm_year % 100, (unsigned int)ts->tm_hour % 100, (unsigned int)ts->tm_min % 100, (unsigned int)ts->tm_sec % 100 ); return READSTAT_OK; } static readstat_error_t xport_begin_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; readstat_error_t retval = READSTAT_OK; char timestamp[17]; retval = xport_format_timestamp(timestamp, sizeof(timestamp), writer->timestamp); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_first_header_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_first_real_header_record(writer, timestamp); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_record(writer, timestamp); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_member_header_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_descriptor_header_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_member_record(writer, timestamp); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_file_label_record(writer, timestamp); if 
(retval != READSTAT_OK) goto cleanup; retval = xport_write_namestr_header_record(writer); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_variables(writer); if (retval != READSTAT_OK) goto cleanup; retval = xport_write_obs_header_record(writer); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t xport_end_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; readstat_error_t retval = READSTAT_OK; retval = xport_finish_record(writer); return retval; } static readstat_error_t xport_write_row(void *writer_ctx, void *row, size_t row_len) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; return xport_write_bytes(writer, row, row_len); } static readstat_error_t xport_write_double(void *row, const readstat_variable_t *var, double value) { char full_value[8]; int rc = cnxptiee(&value, CN_TYPE_NATIVE, full_value, CN_TYPE_XPORT); if (rc) return READSTAT_ERROR_CONVERT; memcpy(row, full_value, var->storage_width); return READSTAT_OK; } static readstat_error_t xport_write_float(void *row, const readstat_variable_t *var, float value) { return xport_write_double(row, var, value); } static readstat_error_t xport_write_int32(void *row, const readstat_variable_t *var, int32_t value) { return xport_write_double(row, var, value); } static readstat_error_t xport_write_int16(void *row, const readstat_variable_t *var, int16_t value) { return xport_write_double(row, var, value); } static readstat_error_t xport_write_int8(void *row, const readstat_variable_t *var, int8_t value) { return xport_write_double(row, var, value); } static readstat_error_t xport_write_string(void *row, const readstat_variable_t *var, const char *string) { memset(row, ' ', var->storage_width); if (string != NULL && string[0]) { size_t value_len = strlen(string); if (value_len > var->storage_width) return READSTAT_ERROR_STRING_VALUE_IS_TOO_LONG; memcpy(row, string, value_len); } return READSTAT_OK; } static 
readstat_error_t xport_write_missing_numeric(void *row, const readstat_variable_t *var) { char *row_bytes = (char *)row; row_bytes[0] = 0x2e; return READSTAT_OK; } static readstat_error_t xport_write_missing_string(void *row, const readstat_variable_t *var) { return xport_write_string(row, var, NULL); } static readstat_error_t xport_write_missing_tagged(void *row, const readstat_variable_t *var, char tag) { char *row_bytes = (char *)row; readstat_error_t error = sas_validate_tag(tag); if (error == READSTAT_OK) { row_bytes[0] = tag; } return error; } static readstat_error_t xport_metadata_ok(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; if (writer->version != 5 && writer->version != 8) return READSTAT_ERROR_UNSUPPORTED_FILE_FORMAT_VERSION; if (writer->table_name[0]) { if (writer->version == 8) { return sas_validate_name(writer->table_name, 32); } if (writer->version == 5) { return sas_validate_name(writer->table_name, 8); } } return READSTAT_OK; } readstat_error_t readstat_begin_writing_xport(readstat_writer_t *writer, void *user_ctx, long row_count) { if (writer->version == 0) writer->version = XPORT_DEFAULT_VERISON; writer->callbacks.metadata_ok = &xport_metadata_ok; writer->callbacks.write_int8 = &xport_write_int8; writer->callbacks.write_int16 = &xport_write_int16; writer->callbacks.write_int32 = &xport_write_int32; writer->callbacks.write_float = &xport_write_float; writer->callbacks.write_double = &xport_write_double; writer->callbacks.write_string = &xport_write_string; writer->callbacks.write_missing_string = &xport_write_missing_string; writer->callbacks.write_missing_number = &xport_write_missing_numeric; writer->callbacks.write_missing_tagged = &xport_write_missing_tagged; writer->callbacks.variable_width = &xport_variable_width; writer->callbacks.variable_ok = &sas_validate_variable; writer->callbacks.begin_data = &xport_begin_data; writer->callbacks.end_data = &xport_end_data; writer->callbacks.write_row = 
&xport_write_row; return readstat_begin_writing_file(writer, user_ctx, row_count); } haven/src/readstat/sas/readstat_sas_rle.c0000644000176200001440000002315414101007206020216 0ustar liggesusers #include #include #include #if defined(_MSC_VER) #include typedef SSIZE_T ssize_t; #endif #include "readstat_sas_rle.h" #define SAS_RLE_COMMAND_COPY64 0 #define SAS_RLE_COMMAND_INSERT_BYTE18 4 #define SAS_RLE_COMMAND_INSERT_AT17 5 #define SAS_RLE_COMMAND_INSERT_BLANK17 6 #define SAS_RLE_COMMAND_INSERT_ZERO17 7 #define SAS_RLE_COMMAND_COPY1 8 #define SAS_RLE_COMMAND_COPY17 9 #define SAS_RLE_COMMAND_COPY33 10 #define SAS_RLE_COMMAND_COPY49 11 #define SAS_RLE_COMMAND_INSERT_BYTE3 12 #define SAS_RLE_COMMAND_INSERT_AT2 13 #define SAS_RLE_COMMAND_INSERT_BLANK2 14 #define SAS_RLE_COMMAND_INSERT_ZERO2 15 #define MAX_INSERT_RUN 4112 // 4095 + 17 #define MAX_COPY_RUN 4159 // 4095 + 64 static size_t command_lengths[16] = { [SAS_RLE_COMMAND_COPY64] = 1, [SAS_RLE_COMMAND_INSERT_BYTE18] = 2, [SAS_RLE_COMMAND_INSERT_AT17] = 1, [SAS_RLE_COMMAND_INSERT_BLANK17] = 1, [SAS_RLE_COMMAND_INSERT_ZERO17] = 1, [SAS_RLE_COMMAND_INSERT_BYTE3] = 1 }; ssize_t sas_rle_decompressed_len(const void *input_buf, size_t input_len) { return sas_rle_decompress(NULL, 0, input_buf, input_len); } ssize_t sas_rle_decompress(void *output_buf, size_t output_len, const void *input_buf, size_t input_len) { unsigned char *buffer = (unsigned char *)output_buf; unsigned char *output = buffer; size_t output_written = 0; const unsigned char *input = (const unsigned char *)input_buf; while (input < (const unsigned char *)input_buf + input_len) { unsigned char control = *input++; unsigned char command = (control & 0xF0) >> 4; unsigned char length = (control & 0x0F); int copy_len = 0; int insert_len = 0; unsigned char insert_byte = '\0'; if (input + command_lengths[command] > (const unsigned char *)input_buf + input_len) { return -1; } switch (command) { case SAS_RLE_COMMAND_COPY64: copy_len = (*input++) + 64 + length * 256; 
break; case SAS_RLE_COMMAND_INSERT_BYTE18: insert_len = (*input++) + 18 + length * 256; insert_byte = *input++; break; case SAS_RLE_COMMAND_INSERT_AT17: insert_len = (*input++) + 17 + length * 256; insert_byte = '@'; break; case SAS_RLE_COMMAND_INSERT_BLANK17: insert_len = (*input++) + 17 + length * 256; insert_byte = ' '; break; case SAS_RLE_COMMAND_INSERT_ZERO17: insert_len = (*input++) + 17 + length * 256; insert_byte = '\0'; break; case SAS_RLE_COMMAND_COPY1: copy_len = length + 1; break; case SAS_RLE_COMMAND_COPY17: copy_len = length + 17; break; case SAS_RLE_COMMAND_COPY33: copy_len = length + 33; break; case SAS_RLE_COMMAND_COPY49: copy_len = length + 49; break; case SAS_RLE_COMMAND_INSERT_BYTE3: insert_byte = *input++; insert_len = length + 3; break; case SAS_RLE_COMMAND_INSERT_AT2: insert_byte = '@'; insert_len = length + 2; break; case SAS_RLE_COMMAND_INSERT_BLANK2: insert_byte = ' '; insert_len = length + 2; break; case SAS_RLE_COMMAND_INSERT_ZERO2: insert_byte = '\0'; insert_len = length + 2; break; default: /* error out here? 
*/ break; } if (copy_len) { if (output_written + copy_len > output_len) { return -1; } if (input + copy_len > (const unsigned char *)input_buf + input_len) { return -1; } if (output) { memcpy(&output[output_written], input, copy_len); } input += copy_len; output_written += copy_len; } if (insert_len) { if (output_written + insert_len > output_len) { return -1; } if (output) { memset(&output[output_written], insert_byte, insert_len); } output_written += insert_len; } } return output_written; } static size_t sas_rle_measure_copy_run(size_t copy_run) { size_t len = 0; while (copy_run >= MAX_COPY_RUN) { len += 2 + MAX_COPY_RUN; copy_run -= MAX_COPY_RUN; } return len + (copy_run > 64) + (copy_run > 0) + copy_run; } static size_t sas_rle_copy_run(unsigned char *output_buf, size_t offset, const unsigned char *copy, size_t copy_run) { unsigned char *out = output_buf + offset; if (output_buf == NULL) return sas_rle_measure_copy_run(copy_run); while (copy_run >= MAX_COPY_RUN) { *out++ = (SAS_RLE_COMMAND_COPY64 << 4) + 0x0F; *out++ = 0xFF; memcpy(out, copy, MAX_COPY_RUN); out += MAX_COPY_RUN; copy += MAX_COPY_RUN; copy_run -= MAX_COPY_RUN; } if (copy_run > 64) { int length = (copy_run - 64) / 256; unsigned char rem = (copy_run - 64) % 256; *out++ = (SAS_RLE_COMMAND_COPY64 << 4) + (length & 0x0F); *out++ = rem; } else if (copy_run >= 49) { *out++ = (SAS_RLE_COMMAND_COPY49 << 4) + (copy_run - 49); } else if (copy_run >= 33) { *out++ = (SAS_RLE_COMMAND_COPY33 << 4) + (copy_run - 33); } else if (copy_run >= 17) { *out++ = (SAS_RLE_COMMAND_COPY17 << 4) + (copy_run - 17); } else if (copy_run >= 1) { *out++ = (SAS_RLE_COMMAND_COPY1 << 4) + (copy_run - 1); } memcpy(out, copy, copy_run); out += copy_run; return out - (output_buf + offset); } static int sas_rle_is_special_byte(unsigned char last_byte) { return (last_byte == '@' || last_byte == ' ' || last_byte == '\0'); } static size_t sas_rle_measure_insert_run(unsigned char last_byte, size_t insert_run) { if 
(sas_rle_is_special_byte(last_byte)) return insert_run > 17 ? 2 : 1; return insert_run > 18 ? 3 : 2; } static size_t sas_rle_insert_run(unsigned char *output_buf, size_t offset, unsigned char last_byte, size_t insert_run) { unsigned char *out = output_buf + offset; if (output_buf == NULL) return sas_rle_measure_insert_run(last_byte, insert_run); if (sas_rle_is_special_byte(last_byte)) { if (insert_run > 17) { int length = (insert_run - 17) / 256; unsigned char rem = (insert_run - 17) % 256; if (last_byte == '@') { *out++ = (SAS_RLE_COMMAND_INSERT_AT17 << 4) + (length & 0x0F); } else if (last_byte == ' ') { *out++ = (SAS_RLE_COMMAND_INSERT_BLANK17 << 4) + (length & 0x0F); } else if (last_byte == '\0') { *out++ = (SAS_RLE_COMMAND_INSERT_ZERO17 << 4) + (length & 0x0F); } *out++ = rem; } else if (insert_run >= 2) { if (last_byte == '@') { *out++ = (SAS_RLE_COMMAND_INSERT_AT2 << 4) + (insert_run - 2); } else if (last_byte == ' ') { *out++ = (SAS_RLE_COMMAND_INSERT_BLANK2 << 4) + (insert_run - 2); } else if (last_byte == '\0') { *out++ = (SAS_RLE_COMMAND_INSERT_ZERO2 << 4) + (insert_run - 2); } } } else if (insert_run > 18) { int length = (insert_run - 18) / 256; unsigned char rem = (insert_run - 18) % 256; *out++ = (SAS_RLE_COMMAND_INSERT_BYTE18 << 4) + (length & 0x0F); *out++ = rem; *out++ = last_byte; } else if (insert_run >= 3) { *out++ = (SAS_RLE_COMMAND_INSERT_BYTE3 << 4) + (insert_run - 3); *out++ = last_byte; } return out - (output_buf + offset); } static int sas_rle_is_insert_run(unsigned char last_byte, size_t insert_run) { if (sas_rle_is_special_byte(last_byte)) return (insert_run > 1); return (insert_run > 2); } ssize_t sas_rle_compressed_len(const void *bytes, size_t len) { return sas_rle_compress(NULL, 0, bytes, len); } ssize_t sas_rle_compress(void *output_buf, size_t output_len, const void *input_buf, size_t input_len) { /* TODO bounds check */ const unsigned char *p = (const unsigned char *)input_buf; const unsigned char *pe = p + input_len; const 
unsigned char *copy = p; unsigned char *out = (unsigned char *)output_buf; size_t insert_run = 0; size_t copy_run = 0; size_t out_written = 0; unsigned char last_byte = 0; while (p < pe) { unsigned char c = *p; if (insert_run == 0) { insert_run = 1; } else if (c == last_byte && insert_run < MAX_INSERT_RUN) { insert_run++; } else { if (sas_rle_is_insert_run(last_byte, insert_run)) { out_written += sas_rle_copy_run(out, out_written, copy, copy_run); out_written += sas_rle_insert_run(out, out_written, last_byte, insert_run); copy_run = 0; copy = p; } else { copy_run += insert_run; } insert_run = 1; } last_byte = c; p++; } if (sas_rle_is_insert_run(last_byte, insert_run)) { out_written += sas_rle_copy_run(out, out_written, copy, copy_run); out_written += sas_rle_insert_run(out, out_written, last_byte, insert_run); } else { out_written += sas_rle_copy_run(out, out_written, copy, copy_run + insert_run); } return out_written; } haven/src/readstat/sas/ieee.h0000644000176200001440000000026214101007206015606 0ustar liggesusers#define CN_TYPE_NATIVE 0 #define CN_TYPE_XPORT 1 #define CN_TYPE_IEEEB 2 #define CN_TYPE_IEEEL 3 int cnxptiee(const void *from_bytes, int fromtype, void *to_bytes, int totype); haven/src/readstat/sas/readstat_sas.c0000644000176200001440000004351414101765776017405 0ustar liggesusers #include #include #include #include #include #include #include #include #include "readstat_sas.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" #include "../readstat_writer.h" #define SAS_FILE_HEADER_SIZE_32BIT 1024 #define SAS_FILE_HEADER_SIZE_64BIT 8192 #define SAS_DEFAULT_PAGE_SIZE 4096 #define SAS_DEFAULT_STRING_ENCODING "WINDOWS-1252" unsigned char sas7bdat_magic_number[32] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc2, 0xea, 0x81, 0x60, 0xb3, 0x14, 0x11, 0xcf, 0xbd, 0x92, 0x08, 0x00, 0x09, 0xc7, 0x31, 0x8c, 0x18, 0x1f, 0x10, 0x11 }; unsigned char sas7bcat_magic_number[32] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0xc2, 0xea, 0x81, 0x63, 0xb3, 0x14, 0x11, 0xcf, 0xbd, 0x92, 0x08, 0x00, 0x09, 0xc7, 0x31, 0x8c, 0x18, 0x1f, 0x10, 0x11 }; /* This table is cobbled together from extant files and: * https://support.sas.com/documentation/cdl/en/nlsref/61893/HTML/default/viewer.htm#a002607278.htm * https://support.sas.com/documentation/onlinedoc/dfdmstudio/2.6/dmpdmsug/Content/dfU_Encodings_SAS.html * * Discrepancies from the official documentation are noted with a comment. It * appears that in some instances SAS software uses a newer encoding than * what's listed in the docs. In these cases the encoding used by ReadStat * represents the author's best guess. */ static readstat_charset_entry_t _charset_table[] = { { .code = 0, .name = SAS_DEFAULT_STRING_ENCODING }, { .code = 20, .name = "UTF-8" }, { .code = 28, .name = "US-ASCII" }, { .code = 29, .name = "ISO-8859-1" }, { .code = 30, .name = "ISO-8859-2" }, { .code = 31, .name = "ISO-8859-3" }, { .code = 32, .name = "ISO-8859-4" }, { .code = 33, .name = "ISO-8859-5" }, { .code = 34, .name = "ISO-8859-6" }, { .code = 35, .name = "ISO-8859-7" }, { .code = 36, .name = "ISO-8859-8" }, { .code = 37, .name = "ISO-8859-9" }, { .code = 39, .name = "ISO-8859-11" }, { .code = 40, .name = "ISO-8859-15" }, { .code = 41, .name = "CP437" }, { .code = 42, .name = "CP850" }, { .code = 43, .name = "CP852" }, { .code = 44, .name = "CP857" }, { .code = 45, .name = "CP858" }, { .code = 46, .name = "CP862" }, { .code = 47, .name = "CP864" }, { .code = 48, .name = "CP865" }, { .code = 49, .name = "CP866" }, { .code = 50, .name = "CP869" }, { .code = 51, .name = "CP874" }, { .code = 52, .name = "CP921" }, { .code = 53, .name = "CP922" }, { .code = 54, .name = "CP1129" }, { .code = 55, .name = "CP720" }, { .code = 56, .name = "CP737" }, { .code = 57, .name = "CP775" }, { .code = 58, .name = "CP860" }, { .code = 59, .name = "CP863" }, { .code = 60, .name = "WINDOWS-1250" }, { .code = 61, .name = "WINDOWS-1251" }, { .code = 
62, .name = "WINDOWS-1252" }, { .code = 63, .name = "WINDOWS-1253" }, { .code = 64, .name = "WINDOWS-1254" }, { .code = 65, .name = "WINDOWS-1255" }, { .code = 66, .name = "WINDOWS-1256" }, { .code = 67, .name = "WINDOWS-1257" }, { .code = 68, .name = "WINDOWS-1258" }, { .code = 69, .name = "MACROMAN" }, { .code = 70, .name = "MACARABIC" }, { .code = 71, .name = "MACHEBREW" }, { .code = 72, .name = "MACGREEK" }, { .code = 73, .name = "MACTHAI" }, { .code = 75, .name = "MACTURKISH" }, { .code = 76, .name = "MACUKRAINE" }, { .code = 118, .name = "CP950" }, { .code = 119, .name = "EUC-TW" }, { .code = 123, .name = "BIG-5" }, { .code = 125, .name = "GB18030" }, // "euc-cn" in SAS { .code = 126, .name = "WINDOWS-936" }, // "zwin" { .code = 128, .name = "CP1381" }, // "zpce" { .code = 134, .name = "EUC-JP" }, { .code = 136, .name = "CP949" }, { .code = 137, .name = "CP942" }, { .code = 138, .name = "CP932" }, // "shift-jis" in SAS { .code = 140, .name = "EUC-KR" }, { .code = 141, .name = "CP949" }, // "kpce" { .code = 142, .name = "CP949" }, // "kwin" { .code = 163, .name = "MACICELAND" }, { .code = 167, .name = "ISO-2022-JP" }, { .code = 168, .name = "ISO-2022-KR" }, { .code = 169, .name = "ISO-2022-CN" }, { .code = 172, .name = "ISO-2022-CN-EXT" }, { .code = 204, .name = SAS_DEFAULT_STRING_ENCODING }, // "any" in SAS { .code = 205, .name = "GB18030" }, { .code = 227, .name = "ISO-8859-14" }, { .code = 242, .name = "ISO-8859-13" }, { .code = 245, .name = "MACCROATIAN" }, { .code = 246, .name = "MACCYRILLIC" }, { .code = 247, .name = "MACROMANIA" }, { .code = 248, .name = "SHIFT_JISX0213" }, }; static time_t sas_epoch() { return - 3653 * 86400; // seconds between 01-01-1960 and 01-01-1970 } static time_t sas_convert_time(double time, time_t epoch) { time += epoch; if (isnan(time)) return 0; if (time > (double)LONG_MAX) return LONG_MAX; if (time < (double)LONG_MIN) return LONG_MIN; return time; } uint64_t sas_read8(const char *data, int bswap) { uint64_t tmp; memcpy(&tmp, 
data, 8); return bswap ? byteswap8(tmp) : tmp; } uint32_t sas_read4(const char *data, int bswap) { uint32_t tmp; memcpy(&tmp, data, 4); return bswap ? byteswap4(tmp) : tmp; } uint16_t sas_read2(const char *data, int bswap) { uint16_t tmp; memcpy(&tmp, data, 2); return bswap ? byteswap2(tmp) : tmp; } size_t sas_subheader_remainder(size_t len, size_t signature_len) { return len - (4+2*signature_len); } readstat_error_t sas_read_header(readstat_io_t *io, sas_header_info_t *hinfo, readstat_error_handler error_handler, void *user_ctx) { sas_header_start_t header_start; sas_header_end_t header_end; int retval = READSTAT_OK; char error_buf[1024]; time_t epoch = sas_epoch(); if (io->read(&header_start, sizeof(sas_header_start_t), io->io_ctx) < sizeof(sas_header_start_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (memcmp(header_start.magic, sas7bdat_magic_number, sizeof(sas7bdat_magic_number)) != 0 && memcmp(header_start.magic, sas7bcat_magic_number, sizeof(sas7bcat_magic_number)) != 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (header_start.a1 == SAS_ALIGNMENT_OFFSET_4) { hinfo->pad1 = 4; } if (header_start.a2 == SAS_ALIGNMENT_OFFSET_4) { hinfo->u64 = 1; } int bswap = 0; if (header_start.endian == SAS_ENDIAN_BIG) { bswap = machine_is_little_endian(); hinfo->little_endian = 0; } else if (header_start.endian == SAS_ENDIAN_LITTLE) { bswap = !machine_is_little_endian(); hinfo->little_endian = 1; } else { retval = READSTAT_ERROR_PARSE; goto cleanup; } int i; for (i=0; i<sizeof(_charset_table)/sizeof(_charset_table[0]); i++) { if (header_start.encoding == _charset_table[i].code) { hinfo->encoding = _charset_table[i].name; break; } } if (hinfo->encoding == NULL) { if (error_handler) { snprintf(error_buf, sizeof(error_buf), "Unsupported character set code: %d", header_start.encoding); error_handler(error_buf, user_ctx); } retval = READSTAT_ERROR_UNSUPPORTED_CHARSET; goto cleanup; } memcpy(hinfo->table_name, header_start.table_name, sizeof(header_start.table_name)); if (io->seek(hinfo->pad1, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } 
double creation_time, modification_time; if (io->read(&creation_time, sizeof(double), io->io_ctx) < sizeof(double)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (bswap) creation_time = byteswap_double(creation_time); if (io->read(&modification_time, sizeof(double), io->io_ctx) < sizeof(double)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (bswap) modification_time = byteswap_double(modification_time); hinfo->creation_time = sas_convert_time(creation_time, epoch); hinfo->modification_time = sas_convert_time(modification_time, epoch); if (io->seek(16, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } uint32_t header_size, page_size; if (io->read(&header_size, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } if (io->read(&page_size, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } hinfo->header_size = bswap ? byteswap4(header_size) : header_size; hinfo->page_size = bswap ? byteswap4(page_size) : page_size; if (hinfo->header_size < 1024 || hinfo->page_size < 1024) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (hinfo->header_size > (1<<24) || hinfo->page_size > (1<<24)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (hinfo->u64) { hinfo->page_header_size = SAS_PAGE_HEADER_SIZE_64BIT; hinfo->subheader_pointer_size = SAS_SUBHEADER_POINTER_SIZE_64BIT; } else { hinfo->page_header_size = SAS_PAGE_HEADER_SIZE_32BIT; hinfo->subheader_pointer_size = SAS_SUBHEADER_POINTER_SIZE_32BIT; } if (hinfo->u64) { uint64_t page_count; if (io->read(&page_count, sizeof(uint64_t), io->io_ctx) < sizeof(uint64_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } hinfo->page_count = bswap ? byteswap8(page_count) : page_count; } else { uint32_t page_count; if (io->read(&page_count, sizeof(uint32_t), io->io_ctx) < sizeof(uint32_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } hinfo->page_count = bswap ? 
byteswap4(page_count) : page_count; } if (hinfo->page_count > (1<<24)) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (io->seek(8, READSTAT_SEEK_CUR, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; if (error_handler) { snprintf(error_buf, sizeof(error_buf), "ReadStat: Failed to seek forward by %d", 8); error_handler(error_buf, user_ctx); } goto cleanup; } if (io->read(&header_end, sizeof(sas_header_end_t), io->io_ctx) < sizeof(sas_header_end_t)) { retval = READSTAT_ERROR_READ; goto cleanup; } char major; int minor, revision; if (sscanf(header_end.release, "%c.%04dM%1d", &major, &minor, &revision) != 3) { retval = READSTAT_ERROR_PARSE; goto cleanup; } if (major >= '1' && major <= '9') { hinfo->major_version = major - '0'; } else if (major == 'V') { // It appears that SAS Visual Forecaster reports the major version as "V" // Treat it as version 9 for all intents and purposes hinfo->major_version = 9; } else { retval = READSTAT_ERROR_PARSE; goto cleanup; } hinfo->minor_version = minor; hinfo->revision = revision; if ((major == '8' || major == '9') && minor == 0 && revision == 0) { /* A bit of a hack, but most SAS installations are running a minor update */ hinfo->vendor = READSTAT_VENDOR_STAT_TRANSFER; } else { hinfo->vendor = READSTAT_VENDOR_SAS; } if (io->seek(hinfo->header_size, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; if (error_handler) { snprintf(error_buf, sizeof(error_buf), "ReadStat: Failed to seek to position %" PRId64, hinfo->header_size); error_handler(error_buf, user_ctx); } goto cleanup; } cleanup: return retval; } readstat_error_t sas_write_header(readstat_writer_t *writer, sas_header_info_t *hinfo, sas_header_start_t header_start) { readstat_error_t retval = READSTAT_OK; time_t epoch = sas_epoch(); memset(header_start.table_name, ' ', sizeof(header_start.table_name)); size_t table_name_len = strlen(writer->table_name); if (table_name_len > sizeof(header_start.table_name)) table_name_len = 
sizeof(header_start.table_name); if (table_name_len) { memcpy(header_start.table_name, writer->table_name, table_name_len); } else { memcpy(header_start.table_name, "DATASET", sizeof("DATASET")-1); } retval = readstat_write_bytes(writer, &header_start, sizeof(sas_header_start_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_zeros(writer, hinfo->pad1); if (retval != READSTAT_OK) goto cleanup; double creation_time = hinfo->creation_time - epoch; retval = readstat_write_bytes(writer, &creation_time, sizeof(double)); if (retval != READSTAT_OK) goto cleanup; double modification_time = hinfo->modification_time - epoch; retval = readstat_write_bytes(writer, &modification_time, sizeof(double)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_zeros(writer, 16); if (retval != READSTAT_OK) goto cleanup; uint32_t header_size = hinfo->header_size; uint32_t page_size = hinfo->page_size; retval = readstat_write_bytes(writer, &header_size, sizeof(uint32_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_bytes(writer, &page_size, sizeof(uint32_t)); if (retval != READSTAT_OK) goto cleanup; if (hinfo->u64) { uint64_t page_count = hinfo->page_count; retval = readstat_write_bytes(writer, &page_count, sizeof(uint64_t)); } else { uint32_t page_count = hinfo->page_count; retval = readstat_write_bytes(writer, &page_count, sizeof(uint32_t)); } if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_zeros(writer, 8); if (retval != READSTAT_OK) goto cleanup; sas_header_end_t header_end = { .host = "9.0401M6Linux" }; char release[sizeof(header_end.release)+1] = { 0 }; snprintf(release, sizeof(release), "%1d.%04dM0", (unsigned int)writer->version % 10, 101); memcpy(header_end.release, release, sizeof(header_end.release)); retval = readstat_write_bytes(writer, &header_end, sizeof(sas_header_end_t)); if (retval != READSTAT_OK) goto cleanup; retval = readstat_write_zeros(writer, hinfo->header_size-writer->bytes_written); if (retval 
!= READSTAT_OK) goto cleanup; cleanup: return retval; } sas_header_info_t *sas_header_info_init(readstat_writer_t *writer, int is_64bit) { sas_header_info_t *hinfo = calloc(1, sizeof(sas_header_info_t)); hinfo->creation_time = writer->timestamp; hinfo->modification_time = writer->timestamp; hinfo->page_size = SAS_DEFAULT_PAGE_SIZE; hinfo->u64 = !!is_64bit; if (hinfo->u64) { hinfo->header_size = SAS_FILE_HEADER_SIZE_64BIT; hinfo->page_header_size = SAS_PAGE_HEADER_SIZE_64BIT; hinfo->subheader_pointer_size = SAS_SUBHEADER_POINTER_SIZE_64BIT; } else { hinfo->header_size = SAS_FILE_HEADER_SIZE_32BIT; hinfo->page_header_size = SAS_PAGE_HEADER_SIZE_32BIT; hinfo->subheader_pointer_size = SAS_SUBHEADER_POINTER_SIZE_32BIT; } return hinfo; } readstat_error_t sas_fill_page(readstat_writer_t *writer, sas_header_info_t *hinfo) { if ((writer->bytes_written - hinfo->header_size) % hinfo->page_size) { size_t num_zeros = (hinfo->page_size - (writer->bytes_written - hinfo->header_size) % hinfo->page_size); return readstat_write_zeros(writer, num_zeros); } return READSTAT_OK; } readstat_error_t sas_validate_name(const char *name, size_t max_len) { int j; for (j=0; name[j]; j++) { if (name[j] != '_' && !(name[j] >= 'a' && name[j] <= 'z') && !(name[j] >= 'A' && name[j] <= 'Z') && !(name[j] >= '0' && name[j] <= '9')) { return READSTAT_ERROR_NAME_CONTAINS_ILLEGAL_CHARACTER; } } char first_char = name[0]; if (!first_char) return READSTAT_ERROR_NAME_IS_ZERO_LENGTH; if (first_char != '_' && !(first_char >= 'a' && first_char <= 'z') && !(first_char >= 'A' && first_char <= 'Z')) { return READSTAT_ERROR_NAME_BEGINS_WITH_ILLEGAL_CHARACTER; } if (strcmp(name, "_N_") == 0 || strcmp(name, "_ERROR_") == 0 || strcmp(name, "_NUMERIC_") == 0 || strcmp(name, "_CHARACTER_") == 0 || strcmp(name, "_ALL_") == 0) { return READSTAT_ERROR_NAME_IS_RESERVED_WORD; } if (strlen(name) > max_len) return READSTAT_ERROR_NAME_IS_TOO_LONG; return READSTAT_OK; } readstat_error_t sas_validate_variable(const 
readstat_variable_t *variable) { return sas_validate_name(readstat_variable_get_name(variable), 32); } readstat_error_t sas_validate_tag(char tag) { if (tag == '_' || (tag >= 'A' && tag <= 'Z')) return READSTAT_OK; return READSTAT_ERROR_TAGGED_VALUE_IS_OUT_OF_RANGE; } void sas_assign_tag(readstat_value_t *value, uint8_t tag) { /* We accommodate two tag schemes. In the first, the tag is an ASCII code * given by uint8_t tag above. System missing is represented by an ASCII * period. In the second scheme, (tag-2) is an offset from 'A', except when * tag == 0, in which case it represents an underscore, or tag == 1, in * which case it represents system-missing. */ if (tag == 0) { tag = '_'; } else if (tag >= 2 && tag < 28) { tag = 'A' + (tag - 2); } if (sas_validate_tag(tag) == READSTAT_OK) { value->tag = tag; value->is_tagged_missing = 1; } else { value->tag = 0; value->is_system_missing = 1; } } haven/src/readstat/sas/readstat_sas7bdat_write.c0000644000176200001440000007471514101007206021521 0ustar liggesusers #include #include #include #include #include "../readstat.h" #include "../readstat_writer.h" #include "readstat_sas.h" #include "readstat_sas_rle.h" typedef struct sas7bdat_subheader_s { uint32_t signature; char *data; size_t len; int is_row_data; int is_row_data_compressed; } sas7bdat_subheader_t; typedef struct sas7bdat_subheader_array_s { int64_t count; int64_t capacity; sas7bdat_subheader_t **subheaders; } sas7bdat_subheader_array_t; typedef struct sas7bdat_column_text_s { char *data; size_t capacity; size_t used; int64_t index; } sas7bdat_column_text_t; typedef struct sas7bdat_column_text_array_s { int64_t count; sas7bdat_column_text_t **column_texts; } sas7bdat_column_text_array_t; typedef struct sas7bdat_write_ctx_s { sas_header_info_t *hinfo; sas7bdat_subheader_array_t *sarray; } sas7bdat_write_ctx_t; static size_t sas7bdat_variable_width(readstat_type_t type, size_t user_width); static int32_t sas7bdat_count_meta_pages(readstat_writer_t *writer) { 
sas7bdat_write_ctx_t *ctx = (sas7bdat_write_ctx_t *)writer->module_ctx; sas_header_info_t *hinfo = ctx->hinfo; sas7bdat_subheader_array_t *sarray = ctx->sarray; int i; int pages = 1; size_t bytes_left = hinfo->page_size - hinfo->page_header_size; size_t shp_ptr_size = hinfo->subheader_pointer_size; for (i=sarray->count-1; i>=0; i--) { sas7bdat_subheader_t *subheader = sarray->subheaders[i]; if (subheader->len + shp_ptr_size > bytes_left) { bytes_left = hinfo->page_size - hinfo->page_header_size; pages++; } bytes_left -= (subheader->len + shp_ptr_size); } return pages; } static size_t sas7bdat_row_length(readstat_writer_t *writer) { int i; size_t len = 0; for (i=0; i<writer->variables_count; i++) { readstat_variable_t *variable = readstat_get_variable(writer, i); len += sas7bdat_variable_width(readstat_variable_get_type(variable), readstat_variable_get_storage_width(variable)); } return len; } static int32_t sas7bdat_rows_per_page(readstat_writer_t *writer, sas_header_info_t *hinfo) { size_t row_length = sas7bdat_row_length(writer); return (hinfo->page_size - hinfo->page_header_size) / row_length; } static int32_t sas7bdat_count_data_pages(readstat_writer_t *writer, sas_header_info_t *hinfo) { if (writer->compression == READSTAT_COMPRESS_ROWS) return 0; int32_t rows_per_page = sas7bdat_rows_per_page(writer, hinfo); return (writer->row_count + (rows_per_page - 1)) / rows_per_page; } static sas7bdat_column_text_t *sas7bdat_column_text_init(int64_t index, size_t len) { sas7bdat_column_text_t *column_text = calloc(1, sizeof(sas7bdat_column_text_t)); column_text->data = malloc(len); column_text->capacity = len; column_text->index = index; return column_text; } static void sas7bdat_column_text_free(sas7bdat_column_text_t *column_text) { free(column_text->data); free(column_text); } static void sas7bdat_column_text_array_free(sas7bdat_column_text_array_t *column_text_array) { int i; for (i=0; i<column_text_array->count; i++) { sas7bdat_column_text_free(column_text_array->column_texts[i]); } 
free(column_text_array->column_texts); free(column_text_array); } static sas_text_ref_t sas7bdat_make_text_ref(sas7bdat_column_text_array_t *column_text_array, const char *string) { size_t len = strlen(string); size_t padded_len = (len + 3) / 4 * 4; sas7bdat_column_text_t *column_text = column_text_array->column_texts[ column_text_array->count-1]; if (column_text->used + padded_len > column_text->capacity) { column_text_array->count++; column_text_array->column_texts = realloc(column_text_array->column_texts, sizeof(sas7bdat_column_text_t *) * column_text_array->count); column_text = sas7bdat_column_text_init(column_text_array->count-1, column_text->capacity); column_text_array->column_texts[column_text_array->count-1] = column_text; } sas_text_ref_t text_ref = { .index = column_text->index, .offset = column_text->used + 28, .length = len }; strncpy(&column_text->data[column_text->used], string, padded_len); column_text->used += padded_len; return text_ref; } static readstat_error_t sas7bdat_emit_header(readstat_writer_t *writer, sas_header_info_t *hinfo) { sas_header_start_t header_start = { .a2 = hinfo->u64 ? SAS_ALIGNMENT_OFFSET_4 : SAS_ALIGNMENT_OFFSET_0, .a1 = SAS_ALIGNMENT_OFFSET_0, .endian = machine_is_little_endian() ? 
SAS_ENDIAN_LITTLE : SAS_ENDIAN_BIG, .file_format = SAS_FILE_FORMAT_UNIX, .encoding = 20, /* UTF-8 */ .file_type = "SAS FILE", .file_info = "DATA " }; memcpy(&header_start.magic, sas7bdat_magic_number, sizeof(header_start.magic)); return sas_write_header(writer, hinfo, header_start); } static sas7bdat_subheader_t *sas7bdat_subheader_init(uint32_t signature, size_t len) { sas7bdat_subheader_t *subheader = calloc(1, sizeof(sas7bdat_subheader_t)); subheader->signature = signature; subheader->len = len; subheader->data = calloc(1, len); return subheader; } static sas7bdat_subheader_t *sas7bdat_row_size_subheader_init(readstat_writer_t *writer, sas_header_info_t *hinfo, sas7bdat_column_text_array_t *column_text_array) { sas7bdat_subheader_t *subheader = sas7bdat_subheader_init( SAS_SUBHEADER_SIGNATURE_ROW_SIZE, hinfo->u64 ? 808 : 480); if (hinfo->u64) { int64_t row_length = sas7bdat_row_length(writer); int64_t row_count = writer->row_count; int64_t ncfl1 = writer->variables_count; int64_t page_size = hinfo->page_size; memcpy(&subheader->data[40], &row_length, sizeof(int64_t)); memcpy(&subheader->data[48], &row_count, sizeof(int64_t)); memcpy(&subheader->data[72], &ncfl1, sizeof(int64_t)); memcpy(&subheader->data[104], &page_size, sizeof(int64_t)); memset(&subheader->data[128], 0xFF, 16); } else { int32_t row_length = sas7bdat_row_length(writer); int32_t row_count = writer->row_count; int32_t ncfl1 = writer->variables_count; int32_t page_size = hinfo->page_size; memcpy(&subheader->data[20], &row_length, sizeof(int32_t)); memcpy(&subheader->data[24], &row_count, sizeof(int32_t)); memcpy(&subheader->data[36], &ncfl1, sizeof(int32_t)); memcpy(&subheader->data[52], &page_size, sizeof(int32_t)); memset(&subheader->data[64], 0xFF, 8); } sas_text_ref_t text_ref = { 0 }; if (writer->file_label[0]) { text_ref = sas7bdat_make_text_ref(column_text_array, writer->file_label); memcpy(&subheader->data[subheader->len-130], &text_ref, sizeof(sas_text_ref_t)); } if (writer->compression == 
READSTAT_COMPRESS_ROWS) { text_ref = sas7bdat_make_text_ref(column_text_array, SAS_COMPRESSION_SIGNATURE_RLE); memcpy(&subheader->data[subheader->len-118], &text_ref, sizeof(sas_text_ref_t)); } return subheader; } static sas7bdat_subheader_t *sas7bdat_col_size_subheader_init(readstat_writer_t *writer, sas_header_info_t *hinfo) { sas7bdat_subheader_t *subheader = sas7bdat_subheader_init( SAS_SUBHEADER_SIGNATURE_COLUMN_SIZE, hinfo->u64 ? 24 : 12); if (hinfo->u64) { int64_t col_count = writer->variables_count; memcpy(&subheader->data[8], &col_count, sizeof(int64_t)); } else { int32_t col_count = writer->variables_count; memcpy(&subheader->data[4], &col_count, sizeof(int32_t)); } return subheader; } static size_t sas7bdat_col_name_subheader_length(readstat_writer_t *writer, sas_header_info_t *hinfo) { return (hinfo->u64 ? 28+8*writer->variables_count : 20+8*writer->variables_count); } static sas7bdat_subheader_t *sas7bdat_col_name_subheader_init(readstat_writer_t *writer, sas_header_info_t *hinfo, sas7bdat_column_text_array_t *column_text_array) { size_t len = sas7bdat_col_name_subheader_length(writer, hinfo); size_t signature_len = hinfo->u64 ? 8 : 4; uint16_t remainder = sas_subheader_remainder(len, signature_len); sas7bdat_subheader_t *subheader = sas7bdat_subheader_init( SAS_SUBHEADER_SIGNATURE_COLUMN_NAME, len); memcpy(&subheader->data[signature_len], &remainder, sizeof(uint16_t)); int i; char *ptrs = &subheader->data[signature_len+8]; for (i=0; i<writer->variables_count; i++) { readstat_variable_t *variable = readstat_get_variable(writer, i); const char *name = readstat_variable_get_name(variable); sas_text_ref_t text_ref = sas7bdat_make_text_ref(column_text_array, name); memcpy(ptrs, &text_ref, sizeof(sas_text_ref_t)); ptrs += 8; } return subheader; } static size_t sas7bdat_col_attrs_subheader_length(readstat_writer_t *writer, sas_header_info_t *hinfo) { return (hinfo->u64 ? 
28+16*writer->variables_count : 20+12*writer->variables_count); } static sas7bdat_subheader_t *sas7bdat_col_attrs_subheader_init(readstat_writer_t *writer, sas_header_info_t *hinfo) { size_t len = sas7bdat_col_attrs_subheader_length(writer, hinfo); size_t signature_len = hinfo->u64 ? 8 : 4; uint16_t remainder = sas_subheader_remainder(len, signature_len); sas7bdat_subheader_t *subheader = sas7bdat_subheader_init( SAS_SUBHEADER_SIGNATURE_COLUMN_ATTRS, len); memcpy(&subheader->data[signature_len], &remainder, sizeof(uint16_t)); char *ptrs = &subheader->data[signature_len+8]; uint64_t offset = 0; int i; for (i=0; i<writer->variables_count; i++) { readstat_variable_t *variable = readstat_get_variable(writer, i); const char *name = readstat_variable_get_name(variable); readstat_type_t type = readstat_variable_get_type(variable); uint16_t name_length_flag = strlen(name) <= 8 ? 4 : 2048; uint32_t width = 0; if (hinfo->u64) { memcpy(&ptrs[0], &offset, sizeof(uint64_t)); ptrs += sizeof(uint64_t); } else { uint32_t offset32 = offset; memcpy(&ptrs[0], &offset32, sizeof(uint32_t)); ptrs += sizeof(uint32_t); } if (type == READSTAT_TYPE_STRING) { ptrs[6] = SAS_COLUMN_TYPE_CHR; width = readstat_variable_get_storage_width(variable); } else { ptrs[6] = SAS_COLUMN_TYPE_NUM; width = 8; } memcpy(&ptrs[0], &width, sizeof(uint32_t)); memcpy(&ptrs[4], &name_length_flag, sizeof(uint16_t)); offset += width; ptrs += 8; } return subheader; } static sas7bdat_subheader_t *sas7bdat_col_format_subheader_init(readstat_variable_t *variable, sas_header_info_t *hinfo, sas7bdat_column_text_array_t *column_text_array) { sas7bdat_subheader_t *subheader = sas7bdat_subheader_init( SAS_SUBHEADER_SIGNATURE_COLUMN_FORMAT, hinfo->u64 ? 64 : 52); const char *format = readstat_variable_get_format(variable); const char *label = readstat_variable_get_label(variable); off_t format_offset = hinfo->u64 ? 46 : 34; off_t label_offset = hinfo->u64 ? 
52 : 40; if (format) { sas_text_ref_t text_ref = sas7bdat_make_text_ref(column_text_array, format); memcpy(&subheader->data[format_offset+0], &text_ref.index, sizeof(uint16_t)); memcpy(&subheader->data[format_offset+2], &text_ref.offset, sizeof(uint16_t)); memcpy(&subheader->data[format_offset+4], &text_ref.length, sizeof(uint16_t)); } if (label) { sas_text_ref_t text_ref = sas7bdat_make_text_ref(column_text_array, label); memcpy(&subheader->data[label_offset+0], &text_ref.index, sizeof(uint16_t)); memcpy(&subheader->data[label_offset+2], &text_ref.offset, sizeof(uint16_t)); memcpy(&subheader->data[label_offset+4], &text_ref.length, sizeof(uint16_t)); } return subheader; } static size_t sas7bdat_col_text_subheader_length(sas_header_info_t *hinfo, sas7bdat_column_text_t *column_text) { size_t signature_len = hinfo->u64 ? 8 : 4; size_t text_len = column_text ? column_text->used : 0; return signature_len + 28 + text_len; } static sas7bdat_subheader_t *sas7bdat_col_text_subheader_init(readstat_writer_t *writer, sas_header_info_t *hinfo, sas7bdat_column_text_t *column_text) { size_t signature_len = hinfo->u64 ? 
8 : 4; size_t len = sas7bdat_col_text_subheader_length(hinfo, column_text); sas7bdat_subheader_t *subheader = sas7bdat_subheader_init( SAS_SUBHEADER_SIGNATURE_COLUMN_TEXT, len); uint16_t used = sas_subheader_remainder(len, signature_len); memcpy(&subheader->data[signature_len], &used, sizeof(uint16_t)); memset(&subheader->data[signature_len+12], ' ', 8); memcpy(&subheader->data[signature_len+28], column_text->data, column_text->used); return subheader; } static sas7bdat_subheader_array_t *sas7bdat_subheader_array_init(readstat_writer_t *writer, sas_header_info_t *hinfo) { sas7bdat_column_text_array_t *column_text_array = calloc(1, sizeof(sas7bdat_column_text_array_t)); column_text_array->count = 1; column_text_array->column_texts = malloc(sizeof(sas7bdat_column_text_t *)); column_text_array->column_texts[0] = sas7bdat_column_text_init(0, hinfo->page_size - hinfo->page_header_size - hinfo->subheader_pointer_size - sas7bdat_col_text_subheader_length(hinfo, NULL)); sas7bdat_subheader_array_t *sarray = calloc(1, sizeof(sas7bdat_subheader_array_t)); sarray->count = 4+writer->variables_count; sarray->subheaders = calloc(sarray->count, sizeof(sas7bdat_subheader_t *)); long idx = 0; int i; sas7bdat_subheader_t *col_name_subheader = NULL; sas7bdat_subheader_t *col_attrs_subheader = NULL; sas7bdat_subheader_t **col_format_subheaders = NULL; col_name_subheader = sas7bdat_col_name_subheader_init(writer, hinfo, column_text_array); col_attrs_subheader = sas7bdat_col_attrs_subheader_init(writer, hinfo); sarray->subheaders[idx++] = sas7bdat_row_size_subheader_init(writer, hinfo, column_text_array); sarray->subheaders[idx++] = sas7bdat_col_size_subheader_init(writer, hinfo); col_format_subheaders = calloc(writer->variables_count, sizeof(sas7bdat_subheader_t *)); for (i=0; i<writer->variables_count; i++) { readstat_variable_t *variable = readstat_get_variable(writer, i); col_format_subheaders[i] = sas7bdat_col_format_subheader_init(variable, hinfo, column_text_array); } sarray->count += 
column_text_array->count; sarray->subheaders = realloc(sarray->subheaders, sarray->count * sizeof(sas7bdat_subheader_t *)); for (i=0; i<column_text_array->count; i++) { sarray->subheaders[idx++] = sas7bdat_col_text_subheader_init(writer, hinfo, column_text_array->column_texts[i]); } sas7bdat_column_text_array_free(column_text_array); sarray->subheaders[idx++] = col_name_subheader; sarray->subheaders[idx++] = col_attrs_subheader; for (i=0; i<writer->variables_count; i++) { sarray->subheaders[idx++] = col_format_subheaders[i]; } free(col_format_subheaders); sarray->capacity = sarray->count; if (writer->compression == READSTAT_COMPRESS_ROWS) { sarray->capacity = (sarray->count + writer->row_count); sarray->subheaders = realloc(sarray->subheaders, sarray->capacity * sizeof(sas7bdat_subheader_t *)); } return sarray; } static void sas7bdat_subheader_free(sas7bdat_subheader_t *subheader) { if (!subheader) return; if (subheader->data) free(subheader->data); free(subheader); } static void sas7bdat_subheader_array_free(sas7bdat_subheader_array_t *sarray) { int i; for (i=0; i<sarray->count; i++) { sas7bdat_subheader_free(sarray->subheaders[i]); } free(sarray->subheaders); free(sarray); } static int sas7bdat_subheader_type(uint32_t signature) { return (signature == SAS_SUBHEADER_SIGNATURE_COLUMN_TEXT || signature == SAS_SUBHEADER_SIGNATURE_COLUMN_NAME || signature == SAS_SUBHEADER_SIGNATURE_COLUMN_ATTRS || signature == SAS_SUBHEADER_SIGNATURE_COLUMN_LIST); } static readstat_error_t sas7bdat_emit_meta_pages(readstat_writer_t *writer) { sas7bdat_write_ctx_t *ctx = (sas7bdat_write_ctx_t *)writer->module_ctx; sas_header_info_t *hinfo = ctx->hinfo; sas7bdat_subheader_array_t *sarray = ctx->sarray; readstat_error_t retval = READSTAT_OK; int16_t page_type = SAS_PAGE_TYPE_META; char *page = malloc(hinfo->page_size); int64_t shp_written = 0; while (sarray->count > shp_written) { memset(page, 0, hinfo->page_size); int16_t shp_count = 0; size_t shp_data_offset = hinfo->page_size; size_t shp_ptr_offset = 
hinfo->page_header_size; size_t shp_ptr_size = hinfo->subheader_pointer_size; memcpy(&page[hinfo->page_header_size-8], &page_type, sizeof(int16_t)); if (sarray->subheaders[shp_written]->len + shp_ptr_size > shp_data_offset - shp_ptr_offset) { retval = READSTAT_ERROR_ROW_IS_TOO_WIDE_FOR_PAGE; goto cleanup; } while (sarray->count > shp_written && sarray->subheaders[shp_written]->len + shp_ptr_size <= shp_data_offset - shp_ptr_offset) { sas7bdat_subheader_t *subheader = sarray->subheaders[shp_written]; uint32_t signature32 = subheader->signature; /* copy ptr */ if (hinfo->u64) { uint64_t offset = shp_data_offset - subheader->len; uint64_t len = subheader->len; memcpy(&page[shp_ptr_offset], &offset, sizeof(uint64_t)); memcpy(&page[shp_ptr_offset+8], &len, sizeof(uint64_t)); if (subheader->is_row_data) { if (subheader->is_row_data_compressed) { page[shp_ptr_offset+16] = SAS_COMPRESSION_ROW; } else { page[shp_ptr_offset+16] = SAS_COMPRESSION_NONE; } page[shp_ptr_offset+17] = 1; } else { page[shp_ptr_offset+17] = sas7bdat_subheader_type(subheader->signature); if (signature32 >= 0xFF000000) { int64_t signature64 = (int32_t)signature32; memcpy(&subheader->data[0], &signature64, sizeof(int64_t)); } else { memcpy(&subheader->data[0], &signature32, sizeof(int32_t)); } } } else { uint32_t offset = shp_data_offset - subheader->len; uint32_t len = subheader->len; memcpy(&page[shp_ptr_offset], &offset, sizeof(uint32_t)); memcpy(&page[shp_ptr_offset+4], &len, sizeof(uint32_t)); if (subheader->is_row_data) { if (subheader->is_row_data_compressed) { page[shp_ptr_offset+8] = SAS_COMPRESSION_ROW; } else { page[shp_ptr_offset+8] = SAS_COMPRESSION_NONE; } page[shp_ptr_offset+9] = 1; } else { page[shp_ptr_offset+9] = sas7bdat_subheader_type(subheader->signature); memcpy(&subheader->data[0], &signature32, sizeof(int32_t)); } } shp_ptr_offset += shp_ptr_size; /* copy data */ shp_data_offset -= subheader->len; memcpy(&page[shp_data_offset], subheader->data, subheader->len); shp_written++; 
shp_count++; } if (hinfo->u64) { memcpy(&page[34], &shp_count, sizeof(int16_t)); memcpy(&page[36], &shp_count, sizeof(int16_t)); } else { memcpy(&page[18], &shp_count, sizeof(int16_t)); memcpy(&page[20], &shp_count, sizeof(int16_t)); } retval = readstat_write_bytes(writer, page, hinfo->page_size); if (retval != READSTAT_OK) goto cleanup; } cleanup: free(page); return retval; } static int sas7bdat_page_is_too_small(readstat_writer_t *writer, sas_header_info_t *hinfo, size_t row_length) { size_t page_length = hinfo->page_size - hinfo->page_header_size; if (writer->compression == READSTAT_COMPRESS_NONE && page_length < row_length) return 1; if (writer->compression == READSTAT_COMPRESS_ROWS && page_length < row_length + hinfo->subheader_pointer_size) return 1; if (page_length < sas7bdat_col_name_subheader_length(writer, hinfo) + hinfo->subheader_pointer_size) return 1; if (page_length < sas7bdat_col_attrs_subheader_length(writer, hinfo) + hinfo->subheader_pointer_size) return 1; return 0; } static sas7bdat_write_ctx_t *sas7bdat_write_ctx_init(readstat_writer_t *writer) { sas7bdat_write_ctx_t *ctx = calloc(1, sizeof(sas7bdat_write_ctx_t)); sas_header_info_t *hinfo = sas_header_info_init(writer, writer->is_64bit); size_t row_length = sas7bdat_row_length(writer); while (sas7bdat_page_is_too_small(writer, hinfo, row_length)) { hinfo->page_size <<= 1; } ctx->hinfo = hinfo; ctx->sarray = sas7bdat_subheader_array_init(writer, hinfo); return ctx; } static void sas7bdat_write_ctx_free(sas7bdat_write_ctx_t *ctx) { free(ctx->hinfo); sas7bdat_subheader_array_free(ctx->sarray); free(ctx); } static readstat_error_t sas7bdat_emit_header_and_meta_pages(readstat_writer_t *writer) { sas7bdat_write_ctx_t *ctx = (sas7bdat_write_ctx_t *)writer->module_ctx; readstat_error_t retval = READSTAT_OK; if (sas7bdat_row_length(writer) == 0) { retval = READSTAT_ERROR_TOO_FEW_COLUMNS; goto cleanup; } if (writer->compression == READSTAT_COMPRESS_NONE && sas7bdat_rows_per_page(writer, ctx->hinfo) == 0) 
{ retval = READSTAT_ERROR_ROW_IS_TOO_WIDE_FOR_PAGE; goto cleanup; } ctx->hinfo->page_count = sas7bdat_count_meta_pages(writer) + sas7bdat_count_data_pages(writer, ctx->hinfo); retval = sas7bdat_emit_header(writer, ctx->hinfo); if (retval != READSTAT_OK) goto cleanup; retval = sas7bdat_emit_meta_pages(writer); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t sas7bdat_begin_data(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; readstat_error_t retval = READSTAT_OK; writer->module_ctx = sas7bdat_write_ctx_init(writer); if (writer->compression == READSTAT_COMPRESS_NONE) { retval = sas7bdat_emit_header_and_meta_pages(writer); if (retval != READSTAT_OK) goto cleanup; } cleanup: if (retval != READSTAT_OK) { if (writer->module_ctx) { sas7bdat_write_ctx_free(writer->module_ctx); writer->module_ctx = NULL; } } return retval; } static readstat_error_t sas7bdat_end_data(void *writer_ctx) { readstat_error_t retval = READSTAT_OK; readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; sas7bdat_write_ctx_t *ctx = (sas7bdat_write_ctx_t *)writer->module_ctx; if (writer->compression == READSTAT_COMPRESS_ROWS) { retval = sas7bdat_emit_header_and_meta_pages(writer); } else { retval = sas_fill_page(writer, ctx->hinfo); } return retval; } static void sas7bdat_module_ctx_free(void *module_ctx) { sas7bdat_write_ctx_free(module_ctx); } static readstat_error_t sas7bdat_write_double(void *row, const readstat_variable_t *var, double value) { memcpy(row, &value, sizeof(double)); return READSTAT_OK; } static readstat_error_t sas7bdat_write_float(void *row, const readstat_variable_t *var, float value) { return sas7bdat_write_double(row, var, value); } static readstat_error_t sas7bdat_write_int32(void *row, const readstat_variable_t *var, int32_t value) { return sas7bdat_write_double(row, var, value); } static readstat_error_t sas7bdat_write_int16(void *row, const readstat_variable_t *var, int16_t value) { return 
sas7bdat_write_double(row, var, value); } static readstat_error_t sas7bdat_write_int8(void *row, const readstat_variable_t *var, int8_t value) { return sas7bdat_write_double(row, var, value); } static readstat_error_t sas7bdat_write_missing_tagged_raw(void *row, const readstat_variable_t *var, char tag) { union { double dval; char chars[8]; } nan_value; nan_value.dval = NAN; nan_value.chars[5] = ~tag; return sas7bdat_write_double(row, var, nan_value.dval); } static readstat_error_t sas7bdat_write_missing_tagged(void *row, const readstat_variable_t *var, char tag) { readstat_error_t error = sas_validate_tag(tag); if (error == READSTAT_OK) return sas7bdat_write_missing_tagged_raw(row, var, tag); return error; } static readstat_error_t sas7bdat_write_missing_numeric(void *row, const readstat_variable_t *var) { return sas7bdat_write_missing_tagged_raw(row, var, '.'); } static readstat_error_t sas7bdat_write_string(void *row, const readstat_variable_t *var, const char *value) { size_t max_len = readstat_variable_get_storage_width(var); if (value == NULL || value[0] == '\0') { memset(row, '\0', max_len); } else { size_t value_len = strlen(value); if (value_len > max_len) return READSTAT_ERROR_STRING_VALUE_IS_TOO_LONG; strncpy((char *)row, value, max_len); } return READSTAT_OK; } static readstat_error_t sas7bdat_write_missing_string(void *row, const readstat_variable_t *var) { return sas7bdat_write_string(row, var, NULL); } static size_t sas7bdat_variable_width(readstat_type_t type, size_t user_width) { if (type == READSTAT_TYPE_STRING) { return user_width; } return 8; } static readstat_error_t sas7bdat_write_row_uncompressed(readstat_writer_t *writer, sas7bdat_write_ctx_t *ctx, void *bytes, size_t len) { readstat_error_t retval = READSTAT_OK; sas_header_info_t *hinfo = ctx->hinfo; int32_t rows_per_page = sas7bdat_rows_per_page(writer, hinfo); if (writer->current_row % rows_per_page == 0) { retval = sas_fill_page(writer, ctx->hinfo); if (retval != READSTAT_OK) goto 
cleanup; int16_t page_type = SAS_PAGE_TYPE_DATA; int16_t page_row_count = (writer->row_count - writer->current_row < rows_per_page ? writer->row_count - writer->current_row : rows_per_page); char *header = calloc(hinfo->page_header_size, 1); memcpy(&header[hinfo->page_header_size-6], &page_row_count, sizeof(int16_t)); memcpy(&header[hinfo->page_header_size-8], &page_type, sizeof(int16_t)); retval = readstat_write_bytes(writer, header, hinfo->page_header_size); free(header); if (retval != READSTAT_OK) goto cleanup; } retval = readstat_write_bytes(writer, bytes, len); cleanup: return retval; } /* We don't actually write compressed data out at this point; the file header * requires a page count, so instead we collect the compressed subheaders in * memory and write the entire file at the end, once the page count can be * determined. */ static readstat_error_t sas7bdat_write_row_compressed(readstat_writer_t *writer, sas7bdat_write_ctx_t *ctx, void *bytes, size_t len) { readstat_error_t retval = READSTAT_OK; size_t compressed_len = sas_rle_compressed_len(bytes, len); sas7bdat_subheader_t *subheader = NULL; if (compressed_len < len) { subheader = sas7bdat_subheader_init(0, compressed_len); subheader->is_row_data = 1; subheader->is_row_data_compressed = 1; size_t actual_len = sas_rle_compress(subheader->data, subheader->len, bytes, len); if (actual_len != compressed_len) { retval = READSTAT_ERROR_ROW_WIDTH_MISMATCH; goto cleanup; } } else { subheader = sas7bdat_subheader_init(0, len); subheader->is_row_data = 1; memcpy(subheader->data, bytes, len); } ctx->sarray->subheaders[ctx->sarray->count++] = subheader; cleanup: if (retval != READSTAT_OK) sas7bdat_subheader_free(subheader); return retval; } static readstat_error_t sas7bdat_write_row(void *writer_ctx, void *bytes, size_t len) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; sas7bdat_write_ctx_t *ctx = (sas7bdat_write_ctx_t *)writer->module_ctx; readstat_error_t retval = READSTAT_OK; if 
(writer->compression == READSTAT_COMPRESS_NONE) { retval = sas7bdat_write_row_uncompressed(writer, ctx, bytes, len); } else if (writer->compression == READSTAT_COMPRESS_ROWS) { retval = sas7bdat_write_row_compressed(writer, ctx, bytes, len); } return retval; } static readstat_error_t sas7bdat_metadata_ok(void *writer_ctx) { readstat_writer_t *writer = (readstat_writer_t *)writer_ctx; if (writer->compression != READSTAT_COMPRESS_NONE && writer->compression != READSTAT_COMPRESS_ROWS) return READSTAT_ERROR_UNSUPPORTED_COMPRESSION; return READSTAT_OK; } readstat_error_t readstat_begin_writing_sas7bdat(readstat_writer_t *writer, void *user_ctx, long row_count) { if (writer->version == 0) writer->version = SAS_DEFAULT_FILE_VERSION; writer->callbacks.metadata_ok = &sas7bdat_metadata_ok; writer->callbacks.write_int8 = &sas7bdat_write_int8; writer->callbacks.write_int16 = &sas7bdat_write_int16; writer->callbacks.write_int32 = &sas7bdat_write_int32; writer->callbacks.write_float = &sas7bdat_write_float; writer->callbacks.write_double = &sas7bdat_write_double; writer->callbacks.write_string = &sas7bdat_write_string; writer->callbacks.write_missing_string = &sas7bdat_write_missing_string; writer->callbacks.write_missing_number = &sas7bdat_write_missing_numeric; writer->callbacks.write_missing_tagged = &sas7bdat_write_missing_tagged; writer->callbacks.variable_width = &sas7bdat_variable_width; writer->callbacks.variable_ok = &sas_validate_variable; writer->callbacks.begin_data = &sas7bdat_begin_data; writer->callbacks.end_data = &sas7bdat_end_data; writer->callbacks.module_ctx_free = &sas7bdat_module_ctx_free; writer->callbacks.write_row = &sas7bdat_write_row; return readstat_begin_writing_file(writer, user_ctx, row_count); } haven/src/readstat/sas/readstat_xport_read.c0000644000176200001440000005520114101007206020733 0ustar liggesusers#include #include #include #include #include #include #include "../readstat.h" #include "../readstat_iconv.h" #include "../readstat_convert.h" 
#include "../readstat_malloc.h"

#include "readstat_sas.h"
#include "readstat_xport.h"
#include "ieee.h"

#define LINE_LEN 80

typedef struct xport_ctx_s {
    readstat_callbacks_t handle;
    size_t file_size;
    void *user_ctx;
    const char *input_encoding;
    const char *output_encoding;
    iconv_t converter;
    readstat_io_t *io;
    time_t timestamp;
    int obs_count;
    int var_count;
    int row_limit;
    int row_offset;
    size_t row_length;
    int parsed_row_count;
    char file_label[256*4+1];
    char table_name[32*4+1];
    readstat_variable_t **variables;
    int version;
} xport_ctx_t;

static readstat_error_t xport_update_progress(xport_ctx_t *ctx) {
    readstat_io_t *io = ctx->io;
    return io->update(ctx->file_size, ctx->handle.progress, ctx->user_ctx, io->io_ctx);
}

static xport_ctx_t *xport_ctx_init() {
    xport_ctx_t *ctx = calloc(1, sizeof(xport_ctx_t));
    return ctx;
}

static void xport_ctx_free(xport_ctx_t *ctx) {
    if (ctx->variables) {
        int i;
        for (i=0; i<ctx->var_count; i++) {
            if (ctx->variables[i])
                free(ctx->variables[i]);
        }
        free(ctx->variables);
    }
    if (ctx->converter) {
        iconv_close(ctx->converter);
    }
    free(ctx);
}

static ssize_t read_bytes(xport_ctx_t *ctx, void *dst, size_t dst_len) {
    readstat_io_t *io = (readstat_io_t *)ctx->io;
    return io->read(dst, dst_len, io->io_ctx);
}

static readstat_error_t xport_skip_record(xport_ctx_t *ctx) {
    readstat_io_t *io = (readstat_io_t *)ctx->io;
    if (io->seek(LINE_LEN, READSTAT_SEEK_CUR, io->io_ctx) == -1)
        return READSTAT_ERROR_SEEK;
    return READSTAT_OK;
}

static readstat_error_t xport_skip_rest_of_record(xport_ctx_t *ctx) {
    readstat_io_t *io = (readstat_io_t *)ctx->io;
    off_t pos = io->seek(0, READSTAT_SEEK_CUR, io->io_ctx);
    if (pos == -1)
        return READSTAT_ERROR_SEEK;
    if (pos % LINE_LEN) {
        if (io->seek(LINE_LEN - (pos % LINE_LEN), READSTAT_SEEK_CUR, io->io_ctx) == -1)
            return READSTAT_ERROR_SEEK;
    }
    return READSTAT_OK;
}

static readstat_error_t xport_read_record(xport_ctx_t *ctx, char *record) {
    ssize_t bytes_read = read_bytes(ctx, record, LINE_LEN);
    if (bytes_read < LINE_LEN)
        return
READSTAT_ERROR_READ; record[LINE_LEN] = '\0'; return READSTAT_OK; } static readstat_error_t xport_read_header_record(xport_ctx_t *ctx, xport_header_record_t *xrecord) { char line[LINE_LEN+1]; readstat_error_t retval = READSTAT_OK; retval = xport_read_record(ctx, line); if (retval != READSTAT_OK) return retval; memset(xrecord, 0, sizeof(xport_header_record_t)); int matches = sscanf(line, "HEADER RECORD*******%8s HEADER RECORD!!!!!!!" "%05d%05d%05d" "%05d%05d%05d", xrecord->name, &xrecord->num1, &xrecord->num2, &xrecord->num3, &xrecord->num4, &xrecord->num5, &xrecord->num6); if (matches < 2) { return READSTAT_ERROR_PARSE; } return READSTAT_OK; } static readstat_error_t xport_expect_header_record(xport_ctx_t *ctx, const char *v5_name, const char *v8_name) { readstat_error_t retval = READSTAT_OK; xport_header_record_t xrecord; retval = xport_read_header_record(ctx, &xrecord); if (retval != READSTAT_OK) goto cleanup; if (ctx->version == 5 && strcmp(xrecord.name, v5_name) != 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } else if (ctx->version == 8 && strcmp(xrecord.name, v8_name) != 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } cleanup: return retval; } static readstat_error_t xport_read_table_name_record(xport_ctx_t *ctx) { char line[LINE_LEN+1]; readstat_error_t retval = READSTAT_OK; retval = xport_read_record(ctx, line); if (retval != READSTAT_OK) goto cleanup; retval = readstat_convert(ctx->table_name, sizeof(ctx->table_name), &line[8], ctx->version == 5 ? 
8 : 32, ctx->converter); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t xport_read_file_label_record(xport_ctx_t *ctx) { char line[LINE_LEN+1]; readstat_error_t retval = READSTAT_OK; retval = xport_read_record(ctx, line); if (retval != READSTAT_OK) goto cleanup; retval = readstat_convert(ctx->file_label, sizeof(ctx->file_label), &line[32], 40, ctx->converter); if (retval != READSTAT_OK) goto cleanup; cleanup: return retval; } static readstat_error_t xport_read_library_record(xport_ctx_t *ctx) { xport_header_record_t xrecord; readstat_error_t retval = xport_read_header_record(ctx, &xrecord); if (retval != READSTAT_OK) goto cleanup; if (strcmp(xrecord.name, "LIBRARY") == 0) { ctx->version = 5; } else if (strcmp(xrecord.name, "LIBV8") == 0) { ctx->version = 8; } else { retval = READSTAT_ERROR_UNSUPPORTED_FILE_FORMAT_VERSION; goto cleanup; } cleanup: return retval; } static readstat_error_t xport_read_timestamp_record(xport_ctx_t *ctx) { char line[LINE_LEN+1]; readstat_error_t retval = READSTAT_OK; struct tm ts = { .tm_isdst = -1 }; char month[4]; int i; retval = xport_read_record(ctx, line); if (retval != READSTAT_OK) goto cleanup; sscanf(line, "%02d%3s%02d:%02d:%02d:%02d", &ts.tm_mday, month, &ts.tm_year, &ts.tm_hour, &ts.tm_min, &ts.tm_sec); for (i=0; itimestamp = mktime(&ts); cleanup: return retval; } static readstat_error_t xport_read_namestr_header_record(xport_ctx_t *ctx) { xport_header_record_t xrecord; readstat_error_t retval = READSTAT_OK; retval = xport_read_header_record(ctx, &xrecord); if (retval != READSTAT_OK) goto cleanup; if (ctx->version == 5 && strcmp(xrecord.name, "NAMESTR") != 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } else if (ctx->version == 8 && strcmp(xrecord.name, "NAMSTV8") != 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } ctx->var_count = xrecord.num2; ctx->variables = readstat_calloc(ctx->var_count, sizeof(readstat_variable_t *)); if (ctx->variables == NULL) { retval = 
READSTAT_ERROR_MALLOC; goto cleanup; } if (ctx->handle.metadata) { readstat_metadata_t metadata = { .row_count = -1, .var_count = ctx->var_count, .file_label = ctx->file_label, .table_name = ctx->table_name, .creation_time = ctx->timestamp, .modified_time = ctx->timestamp, .file_format_version = ctx->version }; if (ctx->handle.metadata(&metadata, ctx->user_ctx) != READSTAT_HANDLER_OK) { retval = READSTAT_ERROR_USER_ABORT; goto cleanup; } } cleanup: return retval; } static readstat_error_t xport_read_obs_header_record(xport_ctx_t *ctx) { return xport_expect_header_record(ctx, "OBS", "OBSV8"); } static readstat_error_t xport_construct_format(char *dst, size_t dst_len, const char *src, size_t src_len, int width, int decimals) { char *format = malloc(4 * src_len + 1); readstat_error_t retval = readstat_convert(format, 4 * src_len + 1, src, src_len, NULL); if (retval != READSTAT_OK) { free(format); return retval; } if (!format[0]) { *dst = '\0'; } else if (decimals) { snprintf(dst, dst_len, "%s%d.%d", format, width, decimals); } else if (width) { snprintf(dst, dst_len, "%s%d", format, width); } else { snprintf(dst, dst_len, "%s", format); } free(format); return retval; } static readstat_error_t xport_read_labels_v8(xport_ctx_t *ctx, int label_count) { readstat_error_t retval = READSTAT_OK; uint16_t labeldef[3]; char *name = NULL; char *label = NULL; int i; for (i=0; i ctx->var_count || index == 0) { retval = READSTAT_ERROR_PARSE; goto cleanup; } name = realloc(name, name_len + 1); label = realloc(label, label_len + 1); readstat_variable_t *variable = ctx->variables[index-1]; if (read_bytes(ctx, name, name_len) != name_len || read_bytes(ctx, label, label_len) != label_len) { retval = READSTAT_ERROR_READ; goto cleanup; } retval = readstat_convert(variable->name, sizeof(variable->name), name, name_len, ctx->converter); if (retval != READSTAT_OK) goto cleanup; retval = readstat_convert(variable->label, sizeof(variable->label), label, label_len, ctx->converter); if (retval 
!= READSTAT_OK)
            goto cleanup;
    }

    retval = xport_skip_rest_of_record(ctx);
    if (retval != READSTAT_OK)
        goto cleanup;

    retval = xport_read_obs_header_record(ctx);
    if (retval != READSTAT_OK)
        goto cleanup;

cleanup:
    free(name);
    free(label);
    return retval;
}

static readstat_error_t xport_read_labels_v9(xport_ctx_t *ctx, int label_count) {
    readstat_error_t retval = READSTAT_OK;
    uint16_t labeldef[5];
    int i;
    char *name = NULL;
    char *format = NULL;
    char *informat = NULL;
    char *label = NULL;
    for (i=0; i ctx->var_count || index == 0) {
            retval = READSTAT_ERROR_PARSE;
            goto cleanup;
        }

        name = realloc(name, name_len + 1);
        format = realloc(format, format_len + 1);
        informat = realloc(informat, informat_len + 1);
        label = realloc(label, label_len + 1);

        readstat_variable_t *variable = ctx->variables[index-1];

        if (read_bytes(ctx, name, name_len) != name_len ||
                read_bytes(ctx, format, format_len) != format_len ||
                read_bytes(ctx, informat, informat_len) != informat_len ||
                read_bytes(ctx, label, label_len) != label_len) {
            retval = READSTAT_ERROR_READ;
            goto cleanup;
        }

        retval = readstat_convert(variable->name, sizeof(variable->name),
                name, name_len, ctx->converter);
        if (retval != READSTAT_OK)
            goto cleanup;

        retval = readstat_convert(variable->label, sizeof(variable->label),
                label, label_len, ctx->converter);
        if (retval != READSTAT_OK)
            goto cleanup;

        retval = xport_construct_format(variable->format, sizeof(variable->format),
                format, format_len, variable->display_width, variable->decimals);
        if (retval != READSTAT_OK)
            goto cleanup;
    }

    retval = xport_skip_rest_of_record(ctx);
    if (retval != READSTAT_OK)
        goto cleanup;

    retval = xport_read_obs_header_record(ctx);
    if (retval != READSTAT_OK)
        goto cleanup;

cleanup:
    free(name);
    free(format);
    free(informat);
    free(label);
    return retval;
}

static readstat_error_t xport_read_variables(xport_ctx_t *ctx) {
    int i;
    readstat_error_t retval = READSTAT_OK;
    for (i=0; i<ctx->var_count; i++) {
        xport_namestr_t namestr;
        ssize_t bytes_read = read_bytes(ctx, &namestr,
                sizeof(xport_namestr_t));
        if (bytes_read < sizeof(xport_namestr_t)) {
            retval = READSTAT_ERROR_READ;
            goto cleanup;
        }

        xport_namestr_bswap(&namestr);

        readstat_variable_t *variable = calloc(1, sizeof(readstat_variable_t));

        variable->index = i;
        variable->type = namestr.ntype == SAS_COLUMN_TYPE_CHR ? READSTAT_TYPE_STRING : READSTAT_TYPE_DOUBLE;
        variable->storage_width = namestr.nlng;
        variable->display_width = namestr.nfl;
        variable->decimals = namestr.nfd;
        variable->alignment = namestr.nfj ? READSTAT_ALIGNMENT_RIGHT : READSTAT_ALIGNMENT_LEFT;

        if (ctx->version == 5) {
            retval = readstat_convert(variable->name, sizeof(variable->name),
                    namestr.nname, sizeof(namestr.nname), ctx->converter);
        } else {
            retval = readstat_convert(variable->name, sizeof(variable->name),
                    namestr.longname, sizeof(namestr.longname), ctx->converter);
        }
        if (retval != READSTAT_OK)
            goto cleanup;

        retval = readstat_convert(variable->label, sizeof(variable->label),
                namestr.nlabel, sizeof(namestr.nlabel), ctx->converter);
        if (retval != READSTAT_OK)
            goto cleanup;

        retval = xport_construct_format(variable->format, sizeof(variable->format),
                namestr.nform, sizeof(namestr.nform),
                variable->display_width, variable->decimals);
        if (retval != READSTAT_OK)
            goto cleanup;

        ctx->variables[i] = variable;
    }

    retval = xport_skip_rest_of_record(ctx);
    if (retval != READSTAT_OK)
        goto cleanup;

    if (ctx->version == 5) {
        retval = xport_read_obs_header_record(ctx);
        if (retval != READSTAT_OK)
            goto cleanup;
    } else {
        xport_header_record_t xrecord;
        retval = xport_read_header_record(ctx, &xrecord);
        if (retval != READSTAT_OK)
            goto cleanup;

        if (strcmp(xrecord.name, "OBSV8") == 0) {
            /* void */
        } else if (strcmp(xrecord.name, "LABELV8") == 0) {
            retval = xport_read_labels_v8(ctx, xrecord.num1);
        } else if (strcmp(xrecord.name, "LABELV9") == 0) {
            retval = xport_read_labels_v9(ctx, xrecord.num1);
        }
        if (retval != READSTAT_OK)
            goto cleanup;
    }

    ctx->row_length = 0;

    int index_after_skipping = 0;

    for (i=0; i<ctx->var_count; i++) {
        readstat_variable_t *variable
= ctx->variables[i];
        variable->index_after_skipping = index_after_skipping;

        int cb_retval = READSTAT_HANDLER_OK;
        if (ctx->handle.variable) {
            cb_retval = ctx->handle.variable(i, variable, variable->format, ctx->user_ctx);
        }
        if (cb_retval == READSTAT_HANDLER_ABORT) {
            retval = READSTAT_ERROR_USER_ABORT;
            goto cleanup;
        }
        if (cb_retval == READSTAT_HANDLER_SKIP_VARIABLE) {
            variable->skip = 1;
        } else {
            index_after_skipping++;
        }

        ctx->row_length += variable->storage_width;
    }

cleanup:
    return retval;
}

static readstat_error_t xport_process_row(xport_ctx_t *ctx, const char *row, size_t row_length) {
    readstat_error_t retval = READSTAT_OK;
    int i;
    off_t pos = 0;
    char *string = NULL;
    for (i=0; i<ctx->var_count; i++) {
        readstat_variable_t *variable = ctx->variables[i];
        readstat_value_t value = { .type = variable->type };

        if (variable->type == READSTAT_TYPE_STRING) {
            string = readstat_realloc(string, 4*variable->storage_width+1);
            if (string == NULL) {
                retval = READSTAT_ERROR_MALLOC;
                goto cleanup;
            }
            retval = readstat_convert(string, 4*variable->storage_width+1,
                    &row[pos], variable->storage_width, ctx->converter);
            if (retval != READSTAT_OK)
                goto cleanup;

            value.v.string_value = string;
        } else {
            double dval = NAN;
            if (variable->storage_width <= XPORT_MAX_DOUBLE_SIZE &&
                    variable->storage_width >= XPORT_MIN_DOUBLE_SIZE) {
                char full_value[8] = { 0 };
                if (memcmp(&full_value[1], &row[pos+1], variable->storage_width - 1) == 0 &&
                        (row[pos] == '.'
                         || sas_validate_tag(row[pos]) == READSTAT_OK)) {
                    if (row[pos] == '.') {
                        value.is_system_missing = 1;
                    } else {
                        value.tag = row[pos];
                        value.is_tagged_missing = 1;
                    }
                } else {
                    memcpy(full_value, &row[pos], variable->storage_width);
                    int rc = cnxptiee(full_value, CN_TYPE_XPORT, &dval, CN_TYPE_NATIVE);
                    if (rc != 0) {
                        retval = READSTAT_ERROR_CONVERT;
                        goto cleanup;
                    }
                }
            }

            value.v.double_value = dval;
        }
        pos += variable->storage_width;

        if (ctx->handle.value && !ctx->variables[i]->skip && !ctx->row_offset) {
            if (ctx->handle.value(ctx->parsed_row_count, variable, value, ctx->user_ctx) != READSTAT_HANDLER_OK) {
                retval = READSTAT_ERROR_USER_ABORT;
                goto cleanup;
            }
        }
    }
    if (ctx->row_offset) {
        ctx->row_offset--;
    } else {
        ctx->parsed_row_count++;
    }

cleanup:
    free(string);
    return retval;
}

static readstat_error_t xport_read_data(xport_ctx_t *ctx) {
    if (!ctx->row_length)
        return READSTAT_OK;

    if (!ctx->handle.value)
        return READSTAT_OK;

    readstat_error_t retval = READSTAT_OK;
    char *row = readstat_malloc(ctx->row_length);
    char *blank_row = readstat_malloc(ctx->row_length);
    int num_blank_rows = 0;

    if (row == NULL || blank_row == NULL) {
        retval = READSTAT_ERROR_MALLOC;
        goto cleanup;
    }

    memset(blank_row, ' ', ctx->row_length);
    while (1) {
        ssize_t bytes_read = read_bytes(ctx, row, ctx->row_length);
        if (bytes_read == -1) {
            retval = READSTAT_ERROR_READ;
            goto cleanup;
        } else if (bytes_read < ctx->row_length) {
            break;
        }

        off_t pos = 0;
        int row_is_blank = 1;
        for (pos=0; pos<ctx->row_length; pos++) {
            if (row[pos] != ' ') {
                row_is_blank = 0;
                break;
            }
        }
        if (row_is_blank) {
            num_blank_rows++;
            continue;
        }
        while (num_blank_rows) {
            retval = xport_process_row(ctx, blank_row, ctx->row_length);
            if (retval != READSTAT_OK)
                goto cleanup;

            if (ctx->row_limit > 0 && ctx->parsed_row_count == ctx->row_limit)
                goto cleanup;

            num_blank_rows--;
        }

        retval = xport_process_row(ctx, row, ctx->row_length);
        if (retval != READSTAT_OK)
            goto cleanup;

        retval = xport_update_progress(ctx);
        if (retval != READSTAT_OK)
            goto cleanup;

        if
(ctx->row_limit > 0 && ctx->parsed_row_count == ctx->row_limit) break; } cleanup: if (row) free(row); if (blank_row) free(blank_row); return retval; } readstat_error_t readstat_parse_xport(readstat_parser_t *parser, const char *path, void *user_ctx) { readstat_error_t retval = READSTAT_OK; readstat_io_t *io = parser->io; xport_ctx_t *ctx = xport_ctx_init(); ctx->handle = parser->handlers; ctx->input_encoding = parser->input_encoding; ctx->output_encoding = parser->output_encoding; ctx->user_ctx = user_ctx; ctx->io = io; ctx->row_limit = parser->row_limit; if (parser->row_offset > 0) ctx->row_offset = parser->row_offset; if (io->open(path, io->io_ctx) == -1) { retval = READSTAT_ERROR_OPEN; goto cleanup; } if ((ctx->file_size = io->seek(0, READSTAT_SEEK_END, io->io_ctx)) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (io->seek(0, READSTAT_SEEK_SET, io->io_ctx) == -1) { retval = READSTAT_ERROR_SEEK; goto cleanup; } if (ctx->input_encoding && ctx->output_encoding && strcmp(ctx->input_encoding, ctx->output_encoding) != 0) { iconv_t converter = iconv_open(ctx->output_encoding, ctx->input_encoding); if (converter == (iconv_t)-1) { retval = READSTAT_ERROR_UNSUPPORTED_CHARSET; goto cleanup; } ctx->converter = converter; } retval = xport_read_library_record(ctx); if (retval != READSTAT_OK) goto cleanup; retval = xport_skip_record(ctx); if (retval != READSTAT_OK) goto cleanup; retval = xport_read_timestamp_record(ctx); if (retval != READSTAT_OK) goto cleanup; retval = xport_expect_header_record(ctx, "MEMBER", "MEMBV8"); if (retval != READSTAT_OK) goto cleanup; retval = xport_expect_header_record(ctx, "DSCRPTR", "DSCPTV8"); if (retval != READSTAT_OK) goto cleanup; retval = xport_read_table_name_record(ctx); if (retval != READSTAT_OK) goto cleanup; retval = xport_read_file_label_record(ctx); if (retval != READSTAT_OK) goto cleanup; retval = xport_read_namestr_header_record(ctx); if (retval != READSTAT_OK) goto cleanup; retval = xport_read_variables(ctx); if (retval 
!= READSTAT_OK) goto cleanup; if (ctx->row_length) { retval = xport_read_data(ctx); if (retval != READSTAT_OK) goto cleanup; } cleanup: io->close(io->io_ctx); xport_ctx_free(ctx); return retval; } haven/src/readstat/readstat.h0000644000176200001440000007022714101765776015757 0ustar liggesusers// // readstat.h - API and internal data structures for ReadStat // // Copyright Evan Miller and ReadStat authors (see LICENSE) // #ifndef INCLUDE_READSTAT_H #define INCLUDE_READSTAT_H #ifdef __cplusplus extern "C" { #endif #include #include #include #include #include enum { READSTAT_HANDLER_OK, READSTAT_HANDLER_ABORT, READSTAT_HANDLER_SKIP_VARIABLE }; typedef enum readstat_type_e { READSTAT_TYPE_STRING, READSTAT_TYPE_INT8, READSTAT_TYPE_INT16, READSTAT_TYPE_INT32, READSTAT_TYPE_FLOAT, READSTAT_TYPE_DOUBLE, READSTAT_TYPE_STRING_REF } readstat_type_t; typedef enum readstat_type_class_e { READSTAT_TYPE_CLASS_STRING, READSTAT_TYPE_CLASS_NUMERIC } readstat_type_class_t; typedef enum readstat_measure_e { READSTAT_MEASURE_UNKNOWN, READSTAT_MEASURE_NOMINAL = 1, READSTAT_MEASURE_ORDINAL, READSTAT_MEASURE_SCALE } readstat_measure_t; typedef enum readstat_alignment_e { READSTAT_ALIGNMENT_UNKNOWN, READSTAT_ALIGNMENT_LEFT = 1, READSTAT_ALIGNMENT_CENTER, READSTAT_ALIGNMENT_RIGHT } readstat_alignment_t; typedef enum readstat_compress_e { READSTAT_COMPRESS_NONE, READSTAT_COMPRESS_ROWS, READSTAT_COMPRESS_BINARY } readstat_compress_t; typedef enum readstat_endian_e { READSTAT_ENDIAN_NONE, READSTAT_ENDIAN_LITTLE, READSTAT_ENDIAN_BIG } readstat_endian_t; typedef enum readstat_error_e { READSTAT_OK, READSTAT_ERROR_OPEN = 1, READSTAT_ERROR_READ, READSTAT_ERROR_MALLOC, READSTAT_ERROR_USER_ABORT, READSTAT_ERROR_PARSE, READSTAT_ERROR_UNSUPPORTED_COMPRESSION, READSTAT_ERROR_UNSUPPORTED_CHARSET, READSTAT_ERROR_COLUMN_COUNT_MISMATCH, READSTAT_ERROR_ROW_COUNT_MISMATCH, READSTAT_ERROR_ROW_WIDTH_MISMATCH, READSTAT_ERROR_BAD_FORMAT_STRING, READSTAT_ERROR_VALUE_TYPE_MISMATCH, READSTAT_ERROR_WRITE, 
READSTAT_ERROR_WRITER_NOT_INITIALIZED, READSTAT_ERROR_SEEK, READSTAT_ERROR_CONVERT, READSTAT_ERROR_CONVERT_BAD_STRING, READSTAT_ERROR_CONVERT_SHORT_STRING, READSTAT_ERROR_CONVERT_LONG_STRING, READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE, READSTAT_ERROR_TAGGED_VALUE_IS_OUT_OF_RANGE, READSTAT_ERROR_STRING_VALUE_IS_TOO_LONG, READSTAT_ERROR_TAGGED_VALUES_NOT_SUPPORTED, READSTAT_ERROR_UNSUPPORTED_FILE_FORMAT_VERSION, READSTAT_ERROR_NAME_BEGINS_WITH_ILLEGAL_CHARACTER, READSTAT_ERROR_NAME_CONTAINS_ILLEGAL_CHARACTER, READSTAT_ERROR_NAME_IS_RESERVED_WORD, READSTAT_ERROR_NAME_IS_TOO_LONG, READSTAT_ERROR_BAD_TIMESTAMP_STRING, READSTAT_ERROR_BAD_FREQUENCY_WEIGHT, READSTAT_ERROR_TOO_MANY_MISSING_VALUE_DEFINITIONS, READSTAT_ERROR_NOTE_IS_TOO_LONG, READSTAT_ERROR_STRING_REFS_NOT_SUPPORTED, READSTAT_ERROR_STRING_REF_IS_REQUIRED, READSTAT_ERROR_ROW_IS_TOO_WIDE_FOR_PAGE, READSTAT_ERROR_TOO_FEW_COLUMNS, READSTAT_ERROR_TOO_MANY_COLUMNS, READSTAT_ERROR_NAME_IS_ZERO_LENGTH, READSTAT_ERROR_BAD_TIMESTAMP_VALUE } readstat_error_t; const char *readstat_error_message(readstat_error_t error_code); typedef struct readstat_metadata_s { int64_t row_count; int64_t var_count; time_t creation_time; time_t modified_time; int64_t file_format_version; readstat_compress_t compression; readstat_endian_t endianness; const char *table_name; const char *file_label; const char *file_encoding; unsigned int is64bit:1; } readstat_metadata_t; /* If the row count is unknown (e.g. it's an XPORT or POR file, or an SAV * file created with non-conforming software), then readstat_get_row_count * returns -1. 
*/ int readstat_get_row_count(readstat_metadata_t *metadata); int readstat_get_var_count(readstat_metadata_t *metadata); time_t readstat_get_creation_time(readstat_metadata_t *metadata); time_t readstat_get_modified_time(readstat_metadata_t *metadata); int readstat_get_file_format_version(readstat_metadata_t *metadata); int readstat_get_file_format_is_64bit(readstat_metadata_t *metadata); readstat_compress_t readstat_get_compression(readstat_metadata_t *metadata); readstat_endian_t readstat_get_endianness(readstat_metadata_t *metadata); const char *readstat_get_table_name(readstat_metadata_t *metadata); const char *readstat_get_file_label(readstat_metadata_t *metadata); const char *readstat_get_file_encoding(readstat_metadata_t *metadata); typedef struct readstat_value_s { union { float float_value; double double_value; int8_t i8_value; int16_t i16_value; int32_t i32_value; const char *string_value; } v; readstat_type_t type; char tag; unsigned int is_system_missing:1; unsigned int is_tagged_missing:1; } readstat_value_t; /* Internal data structures */ typedef struct readstat_value_label_s { double double_key; int32_t int32_key; char tag; char *string_key; size_t string_key_len; char *label; size_t label_len; } readstat_value_label_t; typedef struct readstat_label_set_s { readstat_type_t type; char name[256]; readstat_value_label_t *value_labels; long value_labels_count; long value_labels_capacity; void *variables; long variables_count; long variables_capacity; } readstat_label_set_t; typedef struct readstat_missingness_s { readstat_value_t missing_ranges[32]; long missing_ranges_count; } readstat_missingness_t; typedef struct readstat_variable_s { readstat_type_t type; int index; char name[300]; char format[256]; char label[1024]; readstat_label_set_t *label_set; off_t offset; size_t storage_width; size_t user_width; readstat_missingness_t missingness; readstat_measure_t measure; readstat_alignment_t alignment; int display_width; int decimals; int skip; int 
index_after_skipping;
} readstat_variable_t;

typedef struct readstat_schema_entry_s {
    uint32_t row;
    uint32_t col;
    uint32_t len;
    int skip;
    readstat_variable_t variable;
    char labelset[32];
    char decimal_separator;
} readstat_schema_entry_t;

typedef struct readstat_schema_s {
    char filename[255];
    uint32_t rows_per_observation;
    uint32_t cols_per_observation;
    int first_line;
    int entry_count;
    char field_delimiter;
    readstat_schema_entry_t *entries;
} readstat_schema_t;

/* Value accessors */
readstat_type_t readstat_value_type(readstat_value_t value);
readstat_type_class_t readstat_value_type_class(readstat_value_t value);

/* Values can be missing in one of three ways:
 * 1. "System missing", delivered to value handlers as NaN. Occurs in all file
 *    types. The most common kind of missing value.
 * 2. Tagged missing, also delivered as NaN, but with a single-character tag
 *    accessible via readstat_value_tag(). The tag might be 'a', 'b', etc.,
 *    corresponding to Stata's .a, .b, etc. values. Occurs only in Stata and
 *    SAS files.
 * 3. Defined missing. The value is a real number but is to be treated as
 *    missing according to the variable's missingness rules (such as "value < 0 ||
 *    value == 999"). Occurs only in SPSS files. Access the rules via:
 *
 *       readstat_variable_get_missing_ranges_count()
 *       readstat_variable_get_missing_range_lo()
 *       readstat_variable_get_missing_range_hi()
 *
 * Note that "ranges" include individual values where lo == hi.
* * readstat_value_is_missing() is equivalent to: * * (readstat_value_is_system_missing() * || readstat_value_is_tagged_missing() * || readstat_value_is_defined_missing()) */ int readstat_value_is_missing(readstat_value_t value, readstat_variable_t *variable); int readstat_value_is_system_missing(readstat_value_t value); int readstat_value_is_tagged_missing(readstat_value_t value); int readstat_value_is_defined_missing(readstat_value_t value, readstat_variable_t *variable); char readstat_value_tag(readstat_value_t value); char readstat_int8_value(readstat_value_t value); int16_t readstat_int16_value(readstat_value_t value); int32_t readstat_int32_value(readstat_value_t value); float readstat_float_value(readstat_value_t value); double readstat_double_value(readstat_value_t value); const char *readstat_string_value(readstat_value_t value); readstat_type_class_t readstat_type_class(readstat_type_t type); /* Accessor methods for use inside variable handlers */ int readstat_variable_get_index(const readstat_variable_t *variable); int readstat_variable_get_index_after_skipping(const readstat_variable_t *variable); const char *readstat_variable_get_name(const readstat_variable_t *variable); const char *readstat_variable_get_label(const readstat_variable_t *variable); const char *readstat_variable_get_format(const readstat_variable_t *variable); readstat_type_t readstat_variable_get_type(const readstat_variable_t *variable); readstat_type_class_t readstat_variable_get_type_class(const readstat_variable_t *variable); size_t readstat_variable_get_storage_width(const readstat_variable_t *variable); int readstat_variable_get_display_width(const readstat_variable_t *variable); readstat_measure_t readstat_variable_get_measure(const readstat_variable_t *variable); readstat_alignment_t readstat_variable_get_alignment(const readstat_variable_t *variable); int readstat_variable_get_missing_ranges_count(const readstat_variable_t *variable); readstat_value_t 
readstat_variable_get_missing_range_lo(const readstat_variable_t *variable, int i); readstat_value_t readstat_variable_get_missing_range_hi(const readstat_variable_t *variable, int i); /* Callbacks should return 0 (aka READSTAT_HANDLER_OK) on success and 1 (aka READSTAT_HANDLER_ABORT) to abort. */ /* If the variable handler returns READSTAT_HANDLER_SKIP_VARIABLE, the value handler will not be called on * the associated variable. (Note that subsequent variables will retain their original index values.) */ typedef int (*readstat_metadata_handler)(readstat_metadata_t *metadata, void *ctx); typedef int (*readstat_note_handler)(int note_index, const char *note, void *ctx); typedef int (*readstat_variable_handler)(int index, readstat_variable_t *variable, const char *val_labels, void *ctx); typedef int (*readstat_fweight_handler)(readstat_variable_t *variable, void *ctx); typedef int (*readstat_value_handler)(int obs_index, readstat_variable_t *variable, readstat_value_t value, void *ctx); typedef int (*readstat_value_label_handler)(const char *val_labels, readstat_value_t value, const char *label, void *ctx); typedef void (*readstat_error_handler)(const char *error_message, void *ctx); typedef int (*readstat_progress_handler)(double progress, void *ctx); #if defined(_MSC_VER) #include typedef SSIZE_T ssize_t; typedef __int64 readstat_off_t; #elif defined _WIN32 || defined __CYGWIN__ typedef _off64_t readstat_off_t; #elif defined _AIX typedef off64_t readstat_off_t; #else typedef off_t readstat_off_t; #endif typedef enum readstat_io_flags_e { READSTAT_SEEK_SET, READSTAT_SEEK_CUR, READSTAT_SEEK_END } readstat_io_flags_t; typedef int (*readstat_open_handler)(const char *path, void *io_ctx); typedef int (*readstat_close_handler)(void *io_ctx); typedef readstat_off_t (*readstat_seek_handler)(readstat_off_t offset, readstat_io_flags_t whence, void *io_ctx); typedef ssize_t (*readstat_read_handler)(void *buf, size_t nbyte, void *io_ctx); typedef readstat_error_t 
(*readstat_update_handler)(long file_size, readstat_progress_handler progress_handler, void *user_ctx, void *io_ctx); typedef struct readstat_io_s { readstat_open_handler open; readstat_close_handler close; readstat_seek_handler seek; readstat_read_handler read; readstat_update_handler update; void *io_ctx; int io_ctx_needs_free; } readstat_io_t; typedef struct readstat_callbacks_s { readstat_metadata_handler metadata; readstat_note_handler note; readstat_variable_handler variable; readstat_fweight_handler fweight; readstat_value_handler value; readstat_value_label_handler value_label; readstat_error_handler error; readstat_progress_handler progress; } readstat_callbacks_t; typedef struct readstat_parser_s { readstat_callbacks_t handlers; readstat_io_t *io; const char *input_encoding; const char *output_encoding; long row_limit; long row_offset; } readstat_parser_t; readstat_parser_t *readstat_parser_init(void); void readstat_parser_free(readstat_parser_t *parser); void readstat_io_free(readstat_io_t *io); readstat_error_t readstat_set_metadata_handler(readstat_parser_t *parser, readstat_metadata_handler metadata_handler); readstat_error_t readstat_set_note_handler(readstat_parser_t *parser, readstat_note_handler note_handler); readstat_error_t readstat_set_variable_handler(readstat_parser_t *parser, readstat_variable_handler variable_handler); readstat_error_t readstat_set_fweight_handler(readstat_parser_t *parser, readstat_fweight_handler fweight_handler); readstat_error_t readstat_set_value_handler(readstat_parser_t *parser, readstat_value_handler value_handler); readstat_error_t readstat_set_value_label_handler(readstat_parser_t *parser, readstat_value_label_handler value_label_handler); readstat_error_t readstat_set_error_handler(readstat_parser_t *parser, readstat_error_handler error_handler); readstat_error_t readstat_set_progress_handler(readstat_parser_t *parser, readstat_progress_handler progress_handler); readstat_error_t 
readstat_set_open_handler(readstat_parser_t *parser, readstat_open_handler open_handler); readstat_error_t readstat_set_close_handler(readstat_parser_t *parser, readstat_close_handler close_handler); readstat_error_t readstat_set_seek_handler(readstat_parser_t *parser, readstat_seek_handler seek_handler); readstat_error_t readstat_set_read_handler(readstat_parser_t *parser, readstat_read_handler read_handler); readstat_error_t readstat_set_update_handler(readstat_parser_t *parser, readstat_update_handler update_handler); readstat_error_t readstat_set_io_ctx(readstat_parser_t *parser, void *io_ctx); // Usually inferred from the file, but sometimes a manual override is desirable. // In particular, pre-14 Stata uses the system encoding, which is usually Win 1252 // but could be anything. `encoding' should be an iconv-compatible name. readstat_error_t readstat_set_file_character_encoding(readstat_parser_t *parser, const char *encoding); // Defaults to UTF-8. Pass in NULL to disable transliteration. readstat_error_t readstat_set_handler_character_encoding(readstat_parser_t *parser, const char *encoding); readstat_error_t readstat_set_row_limit(readstat_parser_t *parser, long row_limit); readstat_error_t readstat_set_row_offset(readstat_parser_t *parser, long row_offset); /* Parse binary / portable files */ readstat_error_t readstat_parse_dta(readstat_parser_t *parser, const char *path, void *user_ctx); readstat_error_t readstat_parse_sav(readstat_parser_t *parser, const char *path, void *user_ctx); readstat_error_t readstat_parse_por(readstat_parser_t *parser, const char *path, void *user_ctx); readstat_error_t readstat_parse_sas7bdat(readstat_parser_t *parser, const char *path, void *user_ctx); readstat_error_t readstat_parse_sas7bcat(readstat_parser_t *parser, const char *path, void *user_ctx); readstat_error_t readstat_parse_xport(readstat_parser_t *parser, const char *path, void *user_ctx); /* Parse a schema file... 
*/ readstat_schema_t *readstat_parse_sas_commands(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError); readstat_schema_t *readstat_parse_spss_commands(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError); readstat_schema_t *readstat_parse_stata_dictionary(readstat_parser_t *parser, const char *filepath, void *user_ctx, readstat_error_t *outError); /* ... then pass the schema to the plain-text parser ... */ readstat_error_t readstat_parse_txt(readstat_parser_t *parser, const char *filename, readstat_schema_t *schema, void *user_ctx); /* ... and free the schema structure */ void readstat_schema_free(readstat_schema_t *schema); /* Internal module callbacks */ typedef struct readstat_string_ref_s { int64_t first_v; int64_t first_o; size_t len; char data[1]; // Flexible array; using [1] for C++98 compatibility } readstat_string_ref_t; typedef size_t (*readstat_variable_width_callback)(readstat_type_t type, size_t user_width); typedef readstat_error_t (*readstat_variable_ok_callback)(const readstat_variable_t *variable); typedef readstat_error_t (*readstat_write_int8_callback)(void *row_data, const readstat_variable_t *variable, int8_t value); typedef readstat_error_t (*readstat_write_int16_callback)(void *row_data, const readstat_variable_t *variable, int16_t value); typedef readstat_error_t (*readstat_write_int32_callback)(void *row_data, const readstat_variable_t *variable, int32_t value); typedef readstat_error_t (*readstat_write_float_callback)(void *row_data, const readstat_variable_t *variable, float value); typedef readstat_error_t (*readstat_write_double_callback)(void *row_data, const readstat_variable_t *variable, double value); typedef readstat_error_t (*readstat_write_string_callback)(void *row_data, const readstat_variable_t *variable, const char *value); typedef readstat_error_t (*readstat_write_string_ref_callback)(void *row_data, const readstat_variable_t *variable, 
readstat_string_ref_t *ref); typedef readstat_error_t (*readstat_write_missing_callback)(void *row_data, const readstat_variable_t *variable); typedef readstat_error_t (*readstat_write_tagged_callback)(void *row_data, const readstat_variable_t *variable, char tag); typedef readstat_error_t (*readstat_begin_data_callback)(void *writer); typedef readstat_error_t (*readstat_write_row_callback)(void *writer, void *row_data, size_t row_len); typedef readstat_error_t (*readstat_end_data_callback)(void *writer); typedef void (*readstat_module_ctx_free_callback)(void *module_ctx); typedef readstat_error_t (*readstat_metadata_ok_callback)(void *writer); typedef struct readstat_writer_callbacks_s { readstat_variable_width_callback variable_width; readstat_variable_ok_callback variable_ok; readstat_write_int8_callback write_int8; readstat_write_int16_callback write_int16; readstat_write_int32_callback write_int32; readstat_write_float_callback write_float; readstat_write_double_callback write_double; readstat_write_string_callback write_string; readstat_write_string_ref_callback write_string_ref; readstat_write_missing_callback write_missing_string; readstat_write_missing_callback write_missing_number; readstat_write_tagged_callback write_missing_tagged; readstat_begin_data_callback begin_data; readstat_write_row_callback write_row; readstat_end_data_callback end_data; readstat_module_ctx_free_callback module_ctx_free; readstat_metadata_ok_callback metadata_ok; } readstat_writer_callbacks_t; /* You'll need to define one of these to get going. 
Should return # bytes written, * or -1 on error, a la write(2) */ typedef ssize_t (*readstat_data_writer)(const void *data, size_t len, void *ctx); typedef struct readstat_writer_s { readstat_data_writer data_writer; size_t bytes_written; long version; int is_64bit; // SAS only readstat_compress_t compression; time_t timestamp; readstat_variable_t **variables; long variables_count; long variables_capacity; readstat_label_set_t **label_sets; long label_sets_count; long label_sets_capacity; char **notes; long notes_count; long notes_capacity; readstat_string_ref_t **string_refs; long string_refs_count; long string_refs_capacity; unsigned char *row; size_t row_len; int row_count; int current_row; char file_label[257]; char table_name[33]; const readstat_variable_t *fweight_variable; readstat_writer_callbacks_t callbacks; readstat_error_handler error_handler; void *module_ctx; void *user_ctx; int initialized; } readstat_writer_t; /* Writer API */ // First call this... readstat_writer_t *readstat_writer_init(void); // Then specify a function that will handle the output bytes... readstat_error_t readstat_set_data_writer(readstat_writer_t *writer, readstat_data_writer data_writer); // Next define your value labels, if any. Create as many named sets as you'd like. readstat_label_set_t *readstat_add_label_set(readstat_writer_t *writer, readstat_type_t type, const char *name); void readstat_label_double_value(readstat_label_set_t *label_set, double value, const char *label); void readstat_label_int32_value(readstat_label_set_t *label_set, int32_t value, const char *label); void readstat_label_string_value(readstat_label_set_t *label_set, const char *value, const char *label); void readstat_label_tagged_value(readstat_label_set_t *label_set, char tag, const char *label); // Now define your variables. 
Note that `storage_width' is used for: // * READSTAT_TYPE_STRING variables in all formats // * READSTAT_TYPE_DOUBLE variables, but only in the SAS XPORT format (valid values 3-8, defaults to 8) readstat_variable_t *readstat_add_variable(readstat_writer_t *writer, const char *name, readstat_type_t type, size_t storage_width); void readstat_variable_set_label(readstat_variable_t *variable, const char *label); void readstat_variable_set_format(readstat_variable_t *variable, const char *format); void readstat_variable_set_label_set(readstat_variable_t *variable, readstat_label_set_t *label_set); void readstat_variable_set_measure(readstat_variable_t *variable, readstat_measure_t measure); void readstat_variable_set_alignment(readstat_variable_t *variable, readstat_alignment_t alignment); void readstat_variable_set_display_width(readstat_variable_t *variable, int display_width); readstat_error_t readstat_variable_add_missing_double_value(readstat_variable_t *variable, double value); readstat_error_t readstat_variable_add_missing_double_range(readstat_variable_t *variable, double lo, double hi); readstat_error_t readstat_variable_add_missing_string_value(readstat_variable_t *variable, const char *value); readstat_error_t readstat_variable_add_missing_string_range(readstat_variable_t *variable, const char *lo, const char *hi); readstat_variable_t *readstat_get_variable(readstat_writer_t *writer, int index); // "Notes" appear in the file metadata. In SPSS these are stored as // lines in the Document Record; in Stata these are stored using // the "notes" feature. // // Note that the line length in SPSS is 80 characters; ReadStat will // produce a write error if a note is longer than this limit. void readstat_add_note(readstat_writer_t *writer, const char *note); // String refs are used for creating a READSTAT_TYPE_STRING_REF column, // which is only supported in Stata. String references can be shared // across columns, and inserted with readstat_insert_string_ref(). 
readstat_string_ref_t *readstat_add_string_ref(readstat_writer_t *writer, const char *string); readstat_string_ref_t *readstat_get_string_ref(readstat_writer_t *writer, int index); // Optional metadata readstat_error_t readstat_writer_set_file_label(readstat_writer_t *writer, const char *file_label); readstat_error_t readstat_writer_set_file_timestamp(readstat_writer_t *writer, time_t timestamp); readstat_error_t readstat_writer_set_fweight_variable(readstat_writer_t *writer, const readstat_variable_t *variable); readstat_error_t readstat_writer_set_file_format_version(readstat_writer_t *writer, uint8_t file_format_version); // e.g. 104-119 for DTA; 5 or 8 for SAS Transport. // SAV files support 2 or 3, where 3 is equivalent to setting // readstat_writer_set_compression(READSTAT_COMPRESS_BINARY) readstat_error_t readstat_writer_set_table_name(readstat_writer_t *writer, const char *table_name); // Only used in XPORT files at the moment (defaults to DATASET) readstat_error_t readstat_writer_set_file_format_is_64bit(readstat_writer_t *writer, int is_64bit); // applies only to SAS files; defaults to 1=true readstat_error_t readstat_writer_set_compression(readstat_writer_t *writer, readstat_compress_t compression); // READSTAT_COMPRESS_BINARY is supported only with SAV files (i.e. 
ZSAV files) // READSTAT_COMPRESS_ROWS is supported only with sas7bdat and SAV files // Optional error handler readstat_error_t readstat_writer_set_error_handler(readstat_writer_t *writer, readstat_error_handler error_handler); // Call one of these at any time before the first invocation of readstat_begin_row readstat_error_t readstat_begin_writing_dta(readstat_writer_t *writer, void *user_ctx, long row_count); readstat_error_t readstat_begin_writing_por(readstat_writer_t *writer, void *user_ctx, long row_count); readstat_error_t readstat_begin_writing_sas7bcat(readstat_writer_t *writer, void *user_ctx); readstat_error_t readstat_begin_writing_sas7bdat(readstat_writer_t *writer, void *user_ctx, long row_count); readstat_error_t readstat_begin_writing_sav(readstat_writer_t *writer, void *user_ctx, long row_count); readstat_error_t readstat_begin_writing_xport(readstat_writer_t *writer, void *user_ctx, long row_count); // Optional, file-specific validation routines, to be called AFTER readstat_begin_writing_XXX readstat_error_t readstat_validate_metadata(readstat_writer_t *writer); readstat_error_t readstat_validate_variable(readstat_writer_t *writer, const readstat_variable_t *variable); // Start a row of data (that is, a case or observation) readstat_error_t readstat_begin_row(readstat_writer_t *writer); // Then call one of these for each variable readstat_error_t readstat_insert_int8_value(readstat_writer_t *writer, const readstat_variable_t *variable, int8_t value); readstat_error_t readstat_insert_int16_value(readstat_writer_t *writer, const readstat_variable_t *variable, int16_t value); readstat_error_t readstat_insert_int32_value(readstat_writer_t *writer, const readstat_variable_t *variable, int32_t value); readstat_error_t readstat_insert_float_value(readstat_writer_t *writer, const readstat_variable_t *variable, float value); readstat_error_t readstat_insert_double_value(readstat_writer_t *writer, const readstat_variable_t *variable, double value); 
readstat_error_t readstat_insert_string_value(readstat_writer_t *writer, const readstat_variable_t *variable, const char *value); readstat_error_t readstat_insert_string_ref(readstat_writer_t *writer, const readstat_variable_t *variable, readstat_string_ref_t *ref); readstat_error_t readstat_insert_missing_value(readstat_writer_t *writer, const readstat_variable_t *variable); readstat_error_t readstat_insert_tagged_missing_value(readstat_writer_t *writer, const readstat_variable_t *variable, char tag); // Finally, close out the row readstat_error_t readstat_end_row(readstat_writer_t *writer); // Once you've written all the rows, clean up after yourself readstat_error_t readstat_end_writing(readstat_writer_t *writer); void readstat_writer_free(readstat_writer_t *writer); #ifdef __cplusplus } #endif #endif haven/src/readstat/readstat_error.c0000644000176200001440000001216514101007206017131 0ustar liggesusers #include "readstat.h" const char *readstat_error_message(readstat_error_t error_code) { if (error_code == READSTAT_OK) return NULL; if (error_code == READSTAT_ERROR_OPEN) return "Unable to open file"; if (error_code == READSTAT_ERROR_READ) return "Unable to read from file"; if (error_code == READSTAT_ERROR_MALLOC) return "Unable to allocate memory"; if (error_code == READSTAT_ERROR_USER_ABORT) return "The parsing was aborted (callback returned non-zero value)"; if (error_code == READSTAT_ERROR_PARSE) return "Invalid file, or file has unsupported features"; if (error_code == READSTAT_ERROR_UNSUPPORTED_COMPRESSION) return "File has unsupported compression scheme"; if (error_code == READSTAT_ERROR_UNSUPPORTED_CHARSET) return "File has an unsupported character set"; if (error_code == READSTAT_ERROR_COLUMN_COUNT_MISMATCH) return "File did not contain the expected number of columns"; if (error_code == READSTAT_ERROR_ROW_COUNT_MISMATCH) return "File did not contain the expected number of rows"; if (error_code == READSTAT_ERROR_ROW_WIDTH_MISMATCH) return "A row in the 
file was not the expected length";

    if (error_code == READSTAT_ERROR_BAD_FORMAT_STRING)
        return "A provided format string could not be understood";

    if (error_code == READSTAT_ERROR_VALUE_TYPE_MISMATCH)
        return "A provided value was incompatible with the variable's declared type";

    if (error_code == READSTAT_ERROR_WRITE)
        return "Unable to write data";

    if (error_code == READSTAT_ERROR_WRITER_NOT_INITIALIZED)
        return "The writer object was not properly initialized (call and check return value of readstat_begin_writing_XXX)";

    if (error_code == READSTAT_ERROR_SEEK)
        return "Unable to seek within file";

    if (error_code == READSTAT_ERROR_CONVERT)
        return "Unable to convert string to the requested encoding";

    if (error_code == READSTAT_ERROR_CONVERT_BAD_STRING)
        return "Unable to convert string to the requested encoding (invalid byte sequence)";

    if (error_code == READSTAT_ERROR_CONVERT_SHORT_STRING)
        return "Unable to convert string to the requested encoding (incomplete byte sequence)";

    if (error_code == READSTAT_ERROR_CONVERT_LONG_STRING)
        return "Unable to convert string to the requested encoding (output buffer too small)";

    if (error_code == READSTAT_ERROR_NUMERIC_VALUE_IS_OUT_OF_RANGE)
        return "A provided numeric value was outside the range of representable values in the specified file format";

    if (error_code == READSTAT_ERROR_TAGGED_VALUE_IS_OUT_OF_RANGE)
        return "A provided tag value was outside the range of allowed values in the specified file format";

    if (error_code == READSTAT_ERROR_STRING_VALUE_IS_TOO_LONG)
        return "A provided string value was longer than the available storage size of the specified column";

    if (error_code == READSTAT_ERROR_TAGGED_VALUES_NOT_SUPPORTED)
        return "The file format does not support character tags for missing values";

    if (error_code == READSTAT_ERROR_UNSUPPORTED_FILE_FORMAT_VERSION)
        return "This version of the file format is not supported";

    if (error_code == READSTAT_ERROR_NAME_BEGINS_WITH_ILLEGAL_CHARACTER)
        return "A provided name begins with an
illegal character"; if (error_code == READSTAT_ERROR_NAME_CONTAINS_ILLEGAL_CHARACTER) return "A provided name contains an illegal character"; if (error_code == READSTAT_ERROR_NAME_IS_RESERVED_WORD) return "A provided name is a reserved word"; if (error_code == READSTAT_ERROR_NAME_IS_TOO_LONG) return "A provided name is too long for the file format"; if (error_code == READSTAT_ERROR_NAME_IS_ZERO_LENGTH) return "A provided name is blank or empty"; if (error_code == READSTAT_ERROR_BAD_TIMESTAMP_STRING) return "The file's timestamp string is invalid"; if (error_code == READSTAT_ERROR_BAD_FREQUENCY_WEIGHT) return "The provided variable can't be used as a frequency weight"; if (error_code == READSTAT_ERROR_TOO_MANY_MISSING_VALUE_DEFINITIONS) return "The number of defined missing values exceeds the format limit"; if (error_code == READSTAT_ERROR_NOTE_IS_TOO_LONG) return "The provided note is too long for the file format"; if (error_code == READSTAT_ERROR_STRING_REFS_NOT_SUPPORTED) return "This version of the file format does not support string references"; if (error_code == READSTAT_ERROR_STRING_REF_IS_REQUIRED) return "The provided value was not a valid string reference"; if (error_code == READSTAT_ERROR_ROW_IS_TOO_WIDE_FOR_PAGE) return "A row of data will not fit into the file format"; if (error_code == READSTAT_ERROR_TOO_FEW_COLUMNS) return "One or more columns must be provided"; if (error_code == READSTAT_ERROR_TOO_MANY_COLUMNS) return "Too many columns for this file format version"; if (error_code == READSTAT_ERROR_BAD_TIMESTAMP_VALUE) return "The provided file timestamp is invalid"; return "Unknown error"; } haven/src/Makevars0000644000176200001440000000052614033646021013640 0ustar liggesusersCFILES = $(wildcard *.c readstat/*.c readstat/sas/*.c readstat/spss/*.c readstat/stata/*.c) CPPFILES = $(wildcard *.cpp) SOURCES = $(CFILES) $(CPPFILES) # This must be defined identically in Makevars.win OBJECTS = $(CFILES:.c=.o) $(CPPFILES:.cpp=.o) PKG_CFLAGS = -Ireadstat 
-DHAVE_ZLIB
PKG_CXXFLAGS = -Ireadstat -DHAVE_ZLIB
PKG_LIBS = -lz
haven/src/Makevars.win0000644000176200001440000000025514033646021014433 0ustar liggesusers
include Makevars
# This is also defined in Makevars, but somehow the definition from there is not used
OBJECTS = $(CFILES:.c=.o) $(CPPFILES:.cpp=.o)
PKG_LIBS=-lRiconv -lz
haven/src/DfReader.cpp0000644000176200001440000005644014101006665014332 0ustar liggesusers
#include <cstring>
#include <fstream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

#include "readstat.h"
#include "haven_types.h"
#include "tagged_na.h"

#include "cpp11/strings.hpp"
#include "cpp11/doubles.hpp"
#include "cpp11/integers.hpp"
#include "cpp11/r_string.hpp"
#include "cpp11/list.hpp"
#include "cpp11/raws.hpp"
#include "cpp11/sexp.hpp"
#include "cpp11/protect.hpp"
#include "cpp11/function.hpp"

double haven_double_value_udm(readstat_value_t value, readstat_variable_t* var, bool user_na) {
  if (readstat_value_is_tagged_missing(value)) {
    return make_tagged_na(tolower(readstat_value_tag(value)));
  } else if (!user_na && readstat_value_is_defined_missing(value, var)) {
    return NA_REAL;
  } else if (readstat_value_is_system_missing(value)) {
    return NA_REAL;
  } else {
    return readstat_double_value(value);
  }
}

double haven_double_value(readstat_value_t value) {
  if (readstat_value_is_tagged_missing(value)) {
    return make_tagged_na(tolower(readstat_value_tag(value)));
  } else {
    return readstat_double_value(value);
  }
}

// LabelSet -------------------------------------------------------------------

class LabelSet {
  std::vector<std::string> labels_;
  std::vector<std::string> values_s_;
  std::vector<int> values_i_;
  std::vector<double> values_d_;

public:
  LabelSet() {}

  void add(const char* value, std::string label) {
    if (values_i_.size() > 0 || values_d_.size() > 0)
      cpp11::stop("Can't add string to integer/double labelset");

    values_s_.push_back(value);
    labels_.push_back(label);
  }

  void add(int value, std::string label) {
    if (values_d_.size() > 0 || values_s_.size() > 0)
      cpp11::stop("Can't add integer to string/double labelset");
    values_i_.push_back(value);
    labels_.push_back(label);
  }

  void add(double value, std::string label) {
    if (values_i_.size() > 0 || values_s_.size() > 0)
      cpp11::stop("Can't add double to integer/string labelset");

    values_d_.push_back(value);
    labels_.push_back(label);
  }

  size_t size() const {
    return labels_.size();
  }

  cpp11::sexp labels() const {
    cpp11::sexp out;

    if (values_i_.size() > 0) {
      int n = values_i_.size();
      cpp11::writable::integers values(n);
      cpp11::writable::strings labels(n);

      for (int i = 0; i < n; ++i) {
        values[i] = values_i_[i];
        labels[i] = labels_[i].c_str();
      }

      values.attr("names") = labels;
      out = values;
    } else if (values_d_.size() > 0) {
      int n = values_d_.size();
      cpp11::writable::doubles values(n);
      cpp11::writable::strings labels(n);

      for (int i = 0; i < n; ++i) {
        values[i] = values_d_[i];
        labels[i] = labels_[i].c_str();
      }

      values.attr("names") = labels;
      out = values;
    } else {
      int n = values_s_.size();
      cpp11::writable::strings values(n), labels(n);

      for (int i = 0; i < n; ++i) {
        values[i] = values_s_[i].c_str();
        labels[i] = labels_[i].c_str();
      }

      values.attr("names") = labels;
      out = values;
    }
    return out;
  }
};

// DfReader ------------------------------------------------------------------

class DfReader {
  FileVendor vendor_;
  int nrows_, nrowsAlloc_;
  int ncols_;
  cpp11::writable::list output_;
  cpp11::writable::strings names_;
  bool user_na_;

  std::vector<std::string> val_labels_;
  std::map<std::string, LabelSet> label_sets_;
  std::vector<VarType> var_types_;
  std::vector<std::string> notes_;
  std::set<std::string> colsSkip_;

public:
  DfReader(FileExt ext, bool user_na = false)
      : vendor_(extVendor(ext)),
        nrows_(0),
        ncols_(0),
        output_(static_cast<R_xlen_t>(0)),
        user_na_(user_na) {
  }

  void skipCols(const std::vector<std::string>& cols) {
    std::set<std::string> cols_set(cols.begin(), cols.end());
    colsSkip_ = cols_set;
  }

  void setInfo(int obs_count, int var_count) {
    if (obs_count < 0) {
      // If unknown, start with 1e5, and use doubling strategy
      nrowsAlloc_ = 1e5;
      nrows_ = 0;
    } else {
      nrowsAlloc_ = nrows_ = obs_count;
    }

    if (var_count < 1) {
      return; // sas7bcat has var_count = 0
    }
ncols_ = var_count - colsSkip_.size(); output_.resize(ncols_); names_.resize(ncols_); val_labels_.resize(ncols_); var_types_.resize(ncols_); } void setMetadata(const char *file_label) { if (file_label != NULL && strcmp(file_label, "") != 0) { output_.attr("label") = file_label; } } void setNote(int note_index, const char *note) { if (note != NULL && strcmp(note, "") != 0) { notes_.push_back(note); } } int createVariable(int index, readstat_variable_t *variable, const char *val_labels) { const char* name = readstat_variable_get_name(variable); if (colsSkip_.count(name) > 0) { return READSTAT_HANDLER_SKIP_VARIABLE; } int var_index = readstat_variable_get_index_after_skipping(variable); names_[var_index] = name; switch(readstat_variable_get_type(variable)) { case READSTAT_TYPE_STRING_REF: case READSTAT_TYPE_STRING: output_[var_index] = cpp11::writable::strings(nrowsAlloc_); break; case READSTAT_TYPE_INT8: case READSTAT_TYPE_INT16: case READSTAT_TYPE_INT32: case READSTAT_TYPE_FLOAT: case READSTAT_TYPE_DOUBLE: output_[var_index] = cpp11::writable::doubles(nrowsAlloc_); break; } cpp11::sexp col(output_[var_index]); const char* var_label = readstat_variable_get_label(variable); if (var_label != NULL && strcmp(var_label, "") != 0) { col.attr("label") = var_label; } if (val_labels != NULL) val_labels_[var_index] = val_labels; const char* var_format = readstat_variable_get_format(variable); VarType var_type = numType(vendor_, var_format); // Rcout << name << ": " << var_format << " [" << var_type << "]\n"; var_types_[var_index] = var_type; switch(var_type) { case HAVEN_DATE: col.attr("class") = "Date"; break; case HAVEN_TIME: col.attr("class") = {"hms", "difftime"}; col.attr("units") = "secs"; break; case HAVEN_DATETIME: col.attr("class") = {"POSIXct", "POSIXt"}; col.attr("tzone") = "UTC"; break; default: break; } // User defined missing values int n_ranges = readstat_variable_get_missing_ranges_count(variable); if (user_na_ && n_ranges > 0) { 
      switch(readstat_variable_get_type(variable)) {
      case READSTAT_TYPE_STRING_REF:
      case READSTAT_TYPE_STRING:
      {
        cpp11::writable::strings na_values(R_xlen_t(0));
        cpp11::writable::strings na_range(2);
        bool has_range = false;

        for (int i = 0; i < n_ranges; ++i) {
          readstat_value_t lo_value = readstat_variable_get_missing_range_lo(variable, i),
                           hi_value = readstat_variable_get_missing_range_hi(variable, i);

          const char* lo = readstat_string_value(lo_value);
          const char* hi = readstat_string_value(hi_value);

          if (lo == hi) {
            // single value
            na_values.push_back(lo == NULL ? cpp11::r_string(NA_STRING) : cpp11::r_string(lo));
          } else {
            has_range = true; // Can only ever be one range
            na_range[0] = lo;
            na_range[1] = hi;
          }
        }

        if (na_values.size() > 0)
          col.attr("na_values") = na_values;
        if (has_range)
          col.attr("na_range") = na_range;

        col.attr("class") = {"haven_labelled_spss", "haven_labelled", "vctrs_vctr", "character"};
        break;
      }
      case READSTAT_TYPE_INT8:
      case READSTAT_TYPE_INT16:
      case READSTAT_TYPE_INT32:
      case READSTAT_TYPE_FLOAT:
      case READSTAT_TYPE_DOUBLE:
      {
        std::vector<double> na_values;
        cpp11::writable::doubles na_range(2);
        bool has_range = false;

        for (int i = 0; i < n_ranges; ++i) {
          readstat_value_t lo_value = readstat_variable_get_missing_range_lo(variable, i),
                           hi_value = readstat_variable_get_missing_range_hi(variable, i);

          double lo = readstat_double_value(lo_value),
                 hi = readstat_double_value(hi_value);

          if (lo == hi) {
            // Single value
            na_values.push_back(lo);
          } else {
            has_range = true; // Can only ever be one range
            na_range[0] = lo;
            na_range[1] = hi;
          }
        }

        if (na_values.size() > 0)
          col.attr("na_values") = na_values;
        if (has_range)
          col.attr("na_range") = na_range;

        col.attr("class") = {"haven_labelled_spss", "haven_labelled", "vctrs_vctr", "double"};
      }
      }
    }

    // Store original format as attribute
    if (var_format != NULL && strcmp(var_format, "") != 0) {
      col.attr(formatAttribute(vendor_)) = var_format;
    }

    // Store original display width as attribute if it differs from the default
    int display_width
= readstat_variable_get_display_width(variable);
    if (vendor_ == HAVEN_SPSS && display_width != 8) {
      col.attr("display_width") = Rf_ScalarInteger(display_width);
    }

    return READSTAT_HANDLER_OK;
  }

  void setValue(int obs_index, readstat_variable_t *variable, readstat_value_t value) {
    int var_index = readstat_variable_get_index_after_skipping(variable);
    VarType var_type = var_types_[var_index];

    if (obs_index >= nrowsAlloc_)
      resizeCols(nrowsAlloc_ * 2);
    if (obs_index >= nrows_)
      nrows_ = obs_index + 1;

    switch(value.type) {
    case READSTAT_TYPE_STRING_REF:
    case READSTAT_TYPE_STRING:
    {
      cpp11::writable::strings col(output_[var_index]);

      const char* str_value = readstat_string_value(value);

      if (readstat_value_is_tagged_missing(value)) {
        col[obs_index] = NA_STRING;
      } else if (!user_na_ && readstat_value_is_defined_missing(value, variable)) {
        col[obs_index] = NA_STRING;
      } else if (readstat_value_is_system_missing(value)) {
        col[obs_index] = NA_STRING;
      } else if (str_value == NULL) {
        col[obs_index] = cpp11::r_string("");
      } else {
        col[obs_index] = cpp11::r_string(str_value);
      }
      break;
    }
    case READSTAT_TYPE_INT8:
    case READSTAT_TYPE_INT16:
    case READSTAT_TYPE_INT32:
    case READSTAT_TYPE_FLOAT:
    case READSTAT_TYPE_DOUBLE:
    {
      cpp11::writable::doubles col(output_[var_index]);
      double val = haven_double_value_udm(value, variable, user_na_);
      col[obs_index] = adjustDatetimeToR(vendor_, var_type, val);
      break;
    }
    }
  }

  void setValueLabels(const char *val_labels, readstat_value_t value, const char *label) {
    LabelSet& label_set = label_sets_[val_labels];
    std::string label_s(label);

    switch(value.type) {
    case READSTAT_TYPE_STRING:
      // Encoded to utf-8 on output
      label_set.add(readstat_string_value(value), label_s);
      break;
    case READSTAT_TYPE_INT8:
    case READSTAT_TYPE_INT16:
    case READSTAT_TYPE_INT32:
    case READSTAT_TYPE_DOUBLE:
      label_set.add(haven_double_value(value), label_s);
      break;
    default:
      // value.type is an enum, so print it as an integer rather than with %s
      Rf_warning("Unsupported label type: %d", (int) value.type);
    }
  }

  bool hasLabel(int var_index) const {
    std::string label =
val_labels_[var_index]; if (label == "") return false; return label_sets_.count(label) > 0; } void resizeCols(int n) { // Rcout << "resizing to " << n << "\n"; nrowsAlloc_ = n; for (int i = 0; i < ncols_; ++i) { cpp11::sexp copy(Rf_lengthgets(output_[i], n)); Rf_copyMostAttrib(output_[i], copy); output_[i] = copy; } } void limitRows(long n) { if (nrows_ > n) { nrows_ = n; } } cpp11::list output(const std::string& name_repair) { if (nrows_ != nrowsAlloc_) resizeCols(nrows_); for (int i = 0; i < output_.size(); ++i) { cpp11::sexp col(output_[i]); if (hasLabel(i)) { if (Rf_getAttrib(col, R_ClassSymbol) == R_NilValue) { col.attr("class") = {"haven_labelled", "vctrs_vctr", Rf_type2char(TYPEOF(col))}; } col.attr("labels") = label_sets_[val_labels_[i]].labels(); } } int nNotes = notes_.size(); if (nNotes > 0) { cpp11::writable::strings notes(nNotes); for (int i = 0; i < nNotes; ++i) { notes[i] = notes_[i].c_str(); } output_.attr("notes") = notes_; } output_.attr("names") = names_; static cpp11::function as_tibble = cpp11::package("tibble")["as_tibble"]; using namespace cpp11::literals; return SEXP(as_tibble(output_, ".rows"_nm = nrows_, ".name_repair"_nm = name_repair)); } }; int dfreader_metadata(readstat_metadata_t *metadata, void *ctx) { ((DfReader*) ctx)->setInfo( readstat_get_row_count(metadata), readstat_get_var_count(metadata) ); ((DfReader*) ctx)->setMetadata(readstat_get_file_label(metadata)); return 0; } int dfreader_note(int note_index, const char *note, void *ctx) { ((DfReader*) ctx)->setNote(note_index, note); return 0; } int dfreader_variable(int index, readstat_variable_t *variable, const char *val_labels, void *ctx) { return ((DfReader*) ctx)->createVariable(index, variable, val_labels); } int dfreader_value(int obs_index, readstat_variable_t *variable, readstat_value_t value, void *ctx) { // Check for user interrupts every 10,000 rows or cols if ((obs_index + 1) % 10000 == 0 || (variable->index + 1) % 10000 == 0) cpp11::check_user_interrupt(); 
  ((DfReader*) ctx)->setValue(obs_index, variable, value);
  return 0;
}

int dfreader_value_label(const char *val_labels, readstat_value_t value, const char *label, void *ctx) {
  ((DfReader*) ctx)->setValueLabels(val_labels, value, label);
  return 0;
}

void print_error(const char* error_message, void* ctx) {
  Rprintf("%s\n", error_message);
}

// IO handling -----------------------------------------------------------

class DfReaderInput {
public:
  virtual ~DfReaderInput() {};

  virtual int open(void* io_ctx) = 0;
  virtual int close(void* io_ctx) = 0;
  virtual readstat_off_t seek(readstat_off_t offset, readstat_io_flags_t whence, void *io_ctx) = 0;
  virtual ssize_t read(void *buf, size_t nbyte, void *io_ctx) = 0;
  virtual std::string source() const = 0; // human readable description of input source

  std::string encoding;
};

template <typename Stream>
class DfReaderInputStream : public DfReaderInput {
protected:
  Stream file_;

public:
  readstat_off_t seek(readstat_off_t offset, readstat_io_flags_t whence, void *io_ctx) {
    std::ios_base::seekdir dir;
    switch(whence) {
    case READSTAT_SEEK_SET:
      dir = file_.beg;
      break;
    case READSTAT_SEEK_CUR:
      dir = file_.cur;
      break;
    case READSTAT_SEEK_END:
    default:
      dir = file_.end;
      break;
    }

    file_.seekg(offset, dir);
    return file_.tellg(); // returns -1 if failed
  }

  ssize_t read(void *buf, size_t nbyte, void *io_ctx) {
    file_.read((char*) buf, nbyte);
    return (file_.good() || file_.eof()) ? file_.gcount() : -1;
  }
};

class DfReaderInputFile : public DfReaderInputStream<std::ifstream> {
  std::string filename_;

public:
  DfReaderInputFile(cpp11::list spec, std::string encoding = "") {
    cpp11::strings path(spec[0]);
    filename_ = std::string(Rf_translateChar(path[0]));
    this->encoding = encoding;
  }

  std::string source() const {
    return filename_;
  }

  int open(void* io_ctx) {
    file_.open(filename_.c_str(), std::ifstream::binary);
    return file_.is_open() ? 0 : -1;
  }

  int close(void* io_ctx) {
    file_.close();
    return file_.is_open() ?
-1 : 0; } }; class DfReaderInputRaw : public DfReaderInputStream { public: DfReaderInputRaw(cpp11::list spec, std::string encoding = "") { cpp11::raws raw_data(spec[0]); std::string string_data((char*) RAW(raw_data), Rf_length(raw_data)); file_.str(string_data); this->encoding = encoding; } std::string source() const { return "file"; } int open(void* io_ctx) { return 0; } int close(void* io_ctx) { return 0; } }; int dfreader_open(const char* path, void *io_ctx) { return ((DfReaderInput*) io_ctx)->open(io_ctx); } int dfreader_close(void *io_ctx) { return ((DfReaderInput*) io_ctx)->close(io_ctx); } readstat_off_t dfreader_seek(readstat_off_t offset, readstat_io_flags_t whence, void* io_ctx) { return ((DfReaderInput*) io_ctx)->seek(offset, whence, io_ctx); } ssize_t dfreader_read(void* buf, size_t nbyte, void* io_ctx) { return ((DfReaderInput*) io_ctx)->read(buf, nbyte, io_ctx); } readstat_error_t dfreader_update(long file_size, readstat_progress_handler progress_handler, void *user_ctx, void *io_ctx) { return READSTAT_OK; } // Parser wrappers ------------------------------------------------------------- readstat_parser_t* haven_init_parser() { readstat_parser_t* parser = readstat_parser_init(); readstat_set_metadata_handler(parser, dfreader_metadata); readstat_set_note_handler(parser, dfreader_note); readstat_set_variable_handler(parser, dfreader_variable); readstat_set_value_handler(parser, dfreader_value); readstat_set_value_label_handler(parser, dfreader_value_label); readstat_set_error_handler(parser, print_error); return parser; } void haven_init_io(readstat_parser_t* parser, DfReaderInput& builder_input) { readstat_set_open_handler(parser, dfreader_open); readstat_set_close_handler(parser, dfreader_close); readstat_set_seek_handler(parser, dfreader_seek); readstat_set_read_handler(parser, dfreader_read); readstat_set_update_handler(parser, dfreader_update); readstat_set_io_ctx(parser, (void*) &builder_input); if (builder_input.encoding != "") { 
readstat_set_file_character_encoding(parser, builder_input.encoding.c_str()); } } void haven_set_row_limit(readstat_parser_t* parser, long n) { // readstat uses 0 to specify "all rows" but what we want is "minimal rows" readstat_set_row_limit(parser, n == 0 ? 1 : n); } template void haven_parse(readstat_parser_t* parser, DfReaderInput& builder_input, DfReader* builder) { haven_init_io(parser, builder_input); readstat_error_t result; switch(ext) { // the path for readstat_parse_* is baked into the io context (DfReaderInput) case HAVEN_SAS7BDAT: result = readstat_parse_sas7bdat(parser, "", builder); break; case HAVEN_SAS7BCAT: result = readstat_parse_sas7bcat(parser, "", builder); break; case HAVEN_XPT: result = readstat_parse_xport(parser, "", builder); break; case HAVEN_DTA: result = readstat_parse_dta(parser, "", builder); break; case HAVEN_SAV: result = readstat_parse_sav(parser, "", builder); break; case HAVEN_POR: result = readstat_parse_por(parser, "", builder); break; default: result = READSTAT_ERROR_PARSE; break; } if (result != READSTAT_OK) { std::string source = builder_input.source(); readstat_parser_free(parser); std::string msg(readstat_error_message(result)); cpp11::stop("Failed to parse %s: %s.", source.c_str(), msg.c_str()); } } template cpp11::list df_parse(cpp11::list spec, const std::vector& cols_skip, const long& n_max = -1, const long& rows_skip = 0, const std::string& encoding = "", const bool& user_na = false, const std::string& name_repair = "check_unique", cpp11::list catalog_spec = cpp11::writable::list(R_xlen_t(0)), const std::string& catalog_encoding = "" ) { DfReader builder(ext, user_na); builder.skipCols(cols_skip); readstat_parser_t* parser = haven_init_parser(); haven_set_row_limit(parser, n_max); readstat_set_row_offset(parser, rows_skip); if (ext == HAVEN_SAS7BDAT && catalog_spec.size() != 0) { InputClass cat_builder_input(catalog_spec, catalog_encoding); haven_parse(parser, cat_builder_input, &builder); } InputClass 
builder_input(spec, encoding); haven_parse(parser, builder_input, &builder); readstat_parser_free(parser); if (n_max >= 0) { builder.limitRows(n_max); // must enforce n_max = 0 } return builder.output(name_repair); } // # nocov start [[cpp11::register]] cpp11::list df_parse_sas_file(cpp11::list spec_b7dat, cpp11::list spec_b7cat, std::string encoding, std::string catalog_encoding, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec_b7dat, cols_skip, n_max, rows_skip, encoding, false, name_repair, spec_b7cat, catalog_encoding); } [[cpp11::register]] cpp11::list df_parse_sas_raw(cpp11::list spec_b7dat, cpp11::list spec_b7cat, std::string encoding, std::string catalog_encoding, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec_b7dat, cols_skip, n_max, rows_skip, encoding, false, name_repair, spec_b7cat, catalog_encoding); } [[cpp11::register]] cpp11::list df_parse_xpt_file(cpp11::list spec, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec, cols_skip, n_max, rows_skip, "", false, name_repair); } [[cpp11::register]] cpp11::list df_parse_xpt_raw(cpp11::list spec, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec, cols_skip, n_max, rows_skip, "", false, name_repair); } [[cpp11::register]] cpp11::list df_parse_dta_file(cpp11::list spec, std::string encoding, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec, cols_skip, n_max, rows_skip, encoding, false, name_repair); } [[cpp11::register]] cpp11::list df_parse_dta_raw(cpp11::list spec, std::string encoding, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec, cols_skip, n_max, rows_skip, encoding, false, name_repair); } [[cpp11::register]] cpp11::list df_parse_sav_file(cpp11::list spec, std::string encoding, bool user_na, std::vector 
cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec, cols_skip, n_max, rows_skip, encoding, user_na, name_repair); } [[cpp11::register]] cpp11::list df_parse_sav_raw(cpp11::list spec, std::string encoding, bool user_na, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec, cols_skip, n_max, rows_skip, encoding, user_na, name_repair); } [[cpp11::register]] cpp11::list df_parse_por_file(cpp11::list spec, std::string encoding, bool user_na, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec, cols_skip, n_max, rows_skip, encoding, user_na, name_repair); } [[cpp11::register]] cpp11::list df_parse_por_raw(cpp11::list spec, std::string encoding, bool user_na, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair) { return df_parse(spec, cols_skip, n_max, rows_skip, encoding, user_na, name_repair); } // # nocov end haven/src/haven_types.h0000644000176200001440000000163414033646021014643 0ustar liggesusers#ifndef __HAVEN_TYPES__ #define __HAVEN_TYPES__ #include #include #include #include #include #define CPP11_PARTIAL enum FileVendor { HAVEN_SPSS, HAVEN_STATA, HAVEN_SAS }; enum FileExt { HAVEN_SAV, HAVEN_POR, HAVEN_DTA, HAVEN_SAS7BDAT, HAVEN_SAS7BCAT, HAVEN_XPT }; FileVendor extVendor(FileExt ext); enum VarType { HAVEN_DEFAULT, HAVEN_DATE, HAVEN_TIME, HAVEN_DATETIME }; std::string formatAttribute(FileVendor vendor); bool hasPrefix(std::string x, std::string prefix); VarType numType(SEXP x); VarType numType(FileVendor vendor, const char* var_format); // Value conversion ----------------------------------------------------------- int daysOffset(FileVendor vendor); double adjustDatetimeToR(FileVendor vendor, VarType var, double value); double adjustDatetimeFromR(FileVendor vendor, SEXP col, double value); #endif haven/src/cpp11.cpp0000644000176200001440000002715614102302631013575 0ustar liggesusers// Generated by cpp11: do not 
edit by hand // clang-format off #include "haven_types.h" #include "cpp11/declarations.hpp" // DfReader.cpp cpp11::list df_parse_sas_file(cpp11::list spec_b7dat, cpp11::list spec_b7cat, std::string encoding, std::string catalog_encoding, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_sas_file(SEXP spec_b7dat, SEXP spec_b7cat, SEXP encoding, SEXP catalog_encoding, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_sas_file(cpp11::as_cpp>(spec_b7dat), cpp11::as_cpp>(spec_b7cat), cpp11::as_cpp>(encoding), cpp11::as_cpp>(catalog_encoding), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list df_parse_sas_raw(cpp11::list spec_b7dat, cpp11::list spec_b7cat, std::string encoding, std::string catalog_encoding, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_sas_raw(SEXP spec_b7dat, SEXP spec_b7cat, SEXP encoding, SEXP catalog_encoding, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_sas_raw(cpp11::as_cpp>(spec_b7dat), cpp11::as_cpp>(spec_b7cat), cpp11::as_cpp>(encoding), cpp11::as_cpp>(catalog_encoding), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list df_parse_xpt_file(cpp11::list spec, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_xpt_file(SEXP spec, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_xpt_file(cpp11::as_cpp>(spec), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list df_parse_xpt_raw(cpp11::list spec, std::vector 
cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_xpt_raw(SEXP spec, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_xpt_raw(cpp11::as_cpp>(spec), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list df_parse_dta_file(cpp11::list spec, std::string encoding, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_dta_file(SEXP spec, SEXP encoding, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_dta_file(cpp11::as_cpp>(spec), cpp11::as_cpp>(encoding), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list df_parse_dta_raw(cpp11::list spec, std::string encoding, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_dta_raw(SEXP spec, SEXP encoding, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_dta_raw(cpp11::as_cpp>(spec), cpp11::as_cpp>(encoding), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list df_parse_sav_file(cpp11::list spec, std::string encoding, bool user_na, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_sav_file(SEXP spec, SEXP encoding, SEXP user_na, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_sav_file(cpp11::as_cpp>(spec), cpp11::as_cpp>(encoding), cpp11::as_cpp>(user_na), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list 
df_parse_sav_raw(cpp11::list spec, std::string encoding, bool user_na, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_sav_raw(SEXP spec, SEXP encoding, SEXP user_na, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_sav_raw(cpp11::as_cpp>(spec), cpp11::as_cpp>(encoding), cpp11::as_cpp>(user_na), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list df_parse_por_file(cpp11::list spec, std::string encoding, bool user_na, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_por_file(SEXP spec, SEXP encoding, SEXP user_na, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_por_file(cpp11::as_cpp>(spec), cpp11::as_cpp>(encoding), cpp11::as_cpp>(user_na), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfReader.cpp cpp11::list df_parse_por_raw(cpp11::list spec, std::string encoding, bool user_na, std::vector cols_skip, long n_max, long rows_skip, std::string name_repair); extern "C" SEXP _haven_df_parse_por_raw(SEXP spec, SEXP encoding, SEXP user_na, SEXP cols_skip, SEXP n_max, SEXP rows_skip, SEXP name_repair) { BEGIN_CPP11 return cpp11::as_sexp(df_parse_por_raw(cpp11::as_cpp>(spec), cpp11::as_cpp>(encoding), cpp11::as_cpp>(user_na), cpp11::as_cpp>>(cols_skip), cpp11::as_cpp>(n_max), cpp11::as_cpp>(rows_skip), cpp11::as_cpp>(name_repair))); END_CPP11 } // DfWriter.cpp void write_sav_(cpp11::list data, cpp11::strings path, bool compress); extern "C" SEXP _haven_write_sav_(SEXP data, SEXP path, SEXP compress) { BEGIN_CPP11 write_sav_(cpp11::as_cpp>(data), cpp11::as_cpp>(path), cpp11::as_cpp>(compress)); return R_NilValue; END_CPP11 } // DfWriter.cpp void write_dta_(cpp11::list 
data, cpp11::strings path, int version, cpp11::sexp label); extern "C" SEXP _haven_write_dta_(SEXP data, SEXP path, SEXP version, SEXP label) { BEGIN_CPP11 write_dta_(cpp11::as_cpp>(data), cpp11::as_cpp>(path), cpp11::as_cpp>(version), cpp11::as_cpp>(label)); return R_NilValue; END_CPP11 } // DfWriter.cpp void write_sas_(cpp11::list data, cpp11::strings path); extern "C" SEXP _haven_write_sas_(SEXP data, SEXP path) { BEGIN_CPP11 write_sas_(cpp11::as_cpp>(data), cpp11::as_cpp>(path)); return R_NilValue; END_CPP11 } // DfWriter.cpp void write_xpt_(cpp11::list data, cpp11::strings path, int version, std::string name); extern "C" SEXP _haven_write_xpt_(SEXP data, SEXP path, SEXP version, SEXP name) { BEGIN_CPP11 write_xpt_(cpp11::as_cpp>(data), cpp11::as_cpp>(path), cpp11::as_cpp>(version), cpp11::as_cpp>(name)); return R_NilValue; END_CPP11 } extern "C" { /* .Call calls */ extern SEXP _haven_df_parse_dta_file(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_dta_raw(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_por_file(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_por_raw(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_sas_file(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_sas_raw(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_sav_file(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_sav_raw(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_xpt_file(SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_df_parse_xpt_raw(SEXP, SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_write_dta_(SEXP, SEXP, SEXP, SEXP); extern SEXP _haven_write_sas_(SEXP, SEXP); extern SEXP _haven_write_sav_(SEXP, SEXP, SEXP); extern SEXP _haven_write_xpt_(SEXP, SEXP, SEXP, SEXP); extern SEXP is_tagged_na_(SEXP, SEXP); extern SEXP na_tag_(SEXP); extern SEXP tagged_na_(SEXP); static const R_CallMethodDef CallEntries[] = { 
{"_haven_df_parse_dta_file", (DL_FUNC) &_haven_df_parse_dta_file, 6}, {"_haven_df_parse_dta_raw", (DL_FUNC) &_haven_df_parse_dta_raw, 6}, {"_haven_df_parse_por_file", (DL_FUNC) &_haven_df_parse_por_file, 7}, {"_haven_df_parse_por_raw", (DL_FUNC) &_haven_df_parse_por_raw, 7}, {"_haven_df_parse_sas_file", (DL_FUNC) &_haven_df_parse_sas_file, 8}, {"_haven_df_parse_sas_raw", (DL_FUNC) &_haven_df_parse_sas_raw, 8}, {"_haven_df_parse_sav_file", (DL_FUNC) &_haven_df_parse_sav_file, 7}, {"_haven_df_parse_sav_raw", (DL_FUNC) &_haven_df_parse_sav_raw, 7}, {"_haven_df_parse_xpt_file", (DL_FUNC) &_haven_df_parse_xpt_file, 5}, {"_haven_df_parse_xpt_raw", (DL_FUNC) &_haven_df_parse_xpt_raw, 5}, {"_haven_write_dta_", (DL_FUNC) &_haven_write_dta_, 4}, {"_haven_write_sas_", (DL_FUNC) &_haven_write_sas_, 2}, {"_haven_write_sav_", (DL_FUNC) &_haven_write_sav_, 3}, {"_haven_write_xpt_", (DL_FUNC) &_haven_write_xpt_, 4}, {"is_tagged_na_", (DL_FUNC) &is_tagged_na_, 2}, {"na_tag_", (DL_FUNC) &na_tag_, 1}, {"tagged_na_", (DL_FUNC) &tagged_na_, 1}, {NULL, NULL, 0} }; } extern "C" void R_init_haven(DllInfo* dll){ R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); R_forceSymbols(dll, TRUE); } haven/src/tagged_na.h0000644000176200001440000000027114033646021014223 0ustar liggesusers#ifndef __TAGGED_NA__ #define __TAGGED_NA__ #ifdef __cplusplus extern "C" { #endif double make_tagged_na(char x); char tagged_na_value(double x); #ifdef __cplusplus } #endif #endif haven/vignettes/0000755000176200001440000000000014102332323013354 5ustar liggesusershaven/vignettes/semantics.Rmd0000644000176200001440000001445314033646021016023 0ustar liggesusers--- title: "Conversion semantics" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Conversion semantics} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(haven) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` There are some differences between the 
way that R, SAS, SPSS, and Stata represent labelled data and missing values. While SAS, SPSS, and Stata share some obvious similarities, R is a little different. This vignette explores the differences, and shows you how haven bridges the gap.

## Value labels

Base R has one data type that effectively maintains a mapping between integers and character labels: the factor. This, however, is not the primary use of factors: they are instead designed to automatically generate useful contrasts for linear models. Factors differ from the labelled values provided by the other tools in important ways:

* SPSS and SAS can label numeric and character values, not just integer values.

* The labels do not need to be exhaustive. It is common to label the special missing values (e.g. `.D` = did not respond, `.N` = not applicable), while leaving other values as is.

Value labels in SAS are a little different again. In SAS, labels are just a special case of general formats. Formats include currencies and dates, but a user-defined format just assigns labels to individual values (including special missing values). Formats have names and exist independently of the variables they are associated with. You create a named format with `PROC FORMAT` and then associate it with variables in a `DATA` step (the names of character formats always start with `$`).

### `labelled()`

To allow you to import labelled vectors into R, haven provides the S3 labelled class, created with `labelled()`. This class allows you to associate arbitrary labels with numeric or character vectors:

```{r}
x1 <- labelled(
  sample(1:5),
  c(Good = 1, Bad = 5)
)
x1

x2 <- labelled(
  c("M", "F", "F", "F", "M"),
  c(Male = "M", Female = "F")
)
x2
```

The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate data structure that you can convert into a regular R data frame.
You can do this by either converting to a factor or stripping the labels:

```{r}
as_factor(x1)
zap_labels(x1)

as_factor(x2)
zap_labels(x2)
```

See the documentation for `as_factor()` for more options to control exactly what the factor uses for levels.

Both `as_factor()` and `zap_labels()` have data frame methods if you want to apply the same strategy to every column in a data frame:

```{r}
df <- tibble::tibble(x1, x2, z = 1:5)
df

zap_labels(df)
as_factor(df)
```

## Missing values

All three tools provide a global "system missing value" which is displayed as `.`. This is roughly equivalent to R's `NA`, although neither Stata nor SAS propagates missingness in numeric comparisons: SAS treats the missing value as the smallest possible number (i.e. `-inf`), and Stata treats it as the largest possible number (i.e. `inf`).

Each tool also provides a mechanism for recording multiple types of missingness:

* Stata has "extended" missing values, `.A` through `.Z`.

* SAS has "special" missing values, `.A` through `.Z` plus `._`.

* SPSS has per-column "user" missing values. Each column can declare up to three distinct values or a range of values (plus one distinct value) that should be treated as missing.

Stata and SAS only support tagged missing values for numeric columns. SPSS supports up to three distinct values for character columns. Generally, operations involving a user-missing type return a system missing value.

Haven models these missing values in two different ways:

* For SAS and Stata, haven provides "tagged" missing values which extend R's regular `NA` to add a single character label.

* For SPSS, haven provides a subclass of `labelled` that also provides user-defined values and ranges.

### Tagged missing values

To support Stata's extended and SAS's special missing values, haven implements a tagged NA. It does this by taking advantage of the internal structure of a floating point NA.
That allows these values to behave identically to NA in regular R operations, while still preserving the value of the tag.

The R interface for creating tagged NAs is a little clunky because generally they'll be created by haven for you. But you can create your own with `tagged_na()`:

```{r}
x <- c(1:3, tagged_na("a", "z"), 3:1)
x
```

Note these tagged NAs behave identically to regular NAs, even when printing. To see their tags, use `print_tagged_na()`:

```{r}
print_tagged_na(x)
```

To test if a value is a tagged NA, use `is_tagged_na()`, and to extract the value of the tag, use `na_tag()`:

```{r}
is_tagged_na(x)
is_tagged_na(x, "a")

na_tag(x)
```

My expectation is that tagged missings are most often used in conjunction with labels (described above), so labelled vectors print the tags for you, and `as_factor()` knows how to relabel:

```{r}
y <- labelled(x, c("Not home" = tagged_na("a"), "Refused" = tagged_na("z")))
y

as_factor(y)
```

### User defined missing values

SPSS's user-defined missing values work differently from those of SAS and Stata. Each column can have either up to three distinct values that are considered as missing, or a range. Haven provides `labelled_spss()` as a subclass of `labelled()` to model these additional user-defined missings.

```{r}
x1 <- labelled_spss(c(1:10, 99), c(Missing = 99), na_values = 99)
x2 <- labelled_spss(c(1:10, 99), c(Missing = 99), na_range = c(90, Inf))

x1
x2
```

These objects are somewhat dangerous to work with in R because most R functions don't know those values are missing:

```{r}
mean(x1)
```

Because of that danger, the default behaviour of `read_spss()` is to return regular labelled objects where user-defined missing values have been converted to `NA`s. To get `read_spss()` to return `labelled_spss()` objects, you'll need to set `user_na = TRUE`.
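
As a rough illustration of the difference `user_na` makes, the following sketch writes a small data frame to a temporary `.sav` file and reads it back both ways. The column name `score` and the temporary path are made up for the example, and it assumes that `write_sav()` stores the `na_values` of a `labelled_spss()` column as SPSS user-defined missing values.

```{r}
library(haven)

# Hypothetical survey column: 99 is declared as a user-defined missing value.
survey <- tibble::tibble(
  score = labelled_spss(c(1, 2, 99), c(Missing = 99), na_values = 99)
)

# Temporary file used only for this illustration.
path <- tempfile(fileext = ".sav")
write_sav(survey, path)

# Default: user-defined missing values should come back as regular NA.
default_read <- read_sav(path)
default_read$score

# user_na = TRUE: the original values are kept and the column is a
# labelled_spss vector carrying the missing-value declaration.
user_na_read <- read_sav(path, user_na = TRUE)
class(user_na_read$score)
attr(user_na_read$score, "na_values")
```

Reading with the default is usually the safer choice for analysis; `user_na = TRUE` is mainly useful when the distinction between the kinds of missingness matters downstream.
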
I've defined an `is.na()` method so you can find them yourself:

```{r}
is.na(x1)
```

And the presence of that method does mean many functions with an `na.rm` argument will work correctly:

```{r}
mean(x1, na.rm = TRUE)
```

But generally you should either convert to a factor, convert to regular missing values, or strip all the labels:

```{r}
as_factor(x1)
zap_missing(x1)
zap_labels(x1)
```
haven/vignettes/datetimes.Rmd0000644000176200001440000000243014101766302016005 0ustar liggesusers
---
title: "Dates and times"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Dates and times}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Formats

There are three common formats across SAS, SPSS and Stata.

Date (number of days):

* SAS: MMDDYY, DDMMYY, YYMMDD, DATE
* SPSS: n/a
* Stata: %td

Time (number of seconds):

* SAS: TIME, HHMM, TOD
* SPSS: TIME, DTIME
* Stata: n/a

DateTime (number of seconds):

* SAS: DATETIME
* SPSS: DATE, ADATE, SDATE, DATETIME (as milliseconds)
* Stata: %tc, %tC

## Offsets

Dates and date times use a different offset from R:

* SAS: 1960-01-01 (`r -as.integer(as.Date("1960-01-01"))` days)
* SPSS: 1582-10-14 (`r -as.integer(as.Date("1582-10-14"))` days)
* Stata: 1960-01-01 (`r -as.integer(as.Date("1960-01-01"))` days)

## References

* SAS: ,
* SPSS:
* Stata:

haven/R/0000755000176200001440000000000014102302631011544 5ustar liggesusershaven/R/labelled_spss.R0000644000176200001440000002020214101006665014506 0ustar liggesusers#' Labelled vectors for SPSS #' #' This class is only used when `user_na = TRUE` in #' [read_sav()]. It is similar to the [labelled()] class #' but it also models SPSS's user-defined missings, which can be up to #' three distinct values, or for numeric vectors a range. #' #' @param na_values A vector of values that should also be considered as missing. #' @param na_range A numeric vector of length two giving the (inclusive) extents #' of the range.
Use `-Inf` and `Inf` if you want the range to be #' open ended. #' @inheritParams labelled #' @export #' @examples #' x1 <- labelled_spss(1:10, c(Good = 1, Bad = 8), na_values = c(9, 10)) #' is.na(x1) #' #' x2 <- labelled_spss(1:10, c(Good = 1, Bad = 8), na_range = c(9, Inf), #' label = "Quality rating") #' is.na(x2) #' #' # Print data and metadata #' x2 labelled_spss <- function(x = double(), labels = NULL, na_values = NULL, na_range = NULL, label = NULL) { x <- vec_data(x) na_values <- vec_cast_named(na_values, x, x_arg = "na_values", to_arg = "x") labelled <- labelled(x, labels = labels, label = label) new_labelled_spss( vec_data(labelled), labels = attr(labelled, "labels"), na_values = na_values, na_range = na_range, label = attr(labelled, "label", exact = TRUE) ) } new_labelled_spss <- function(x, labels, na_values, na_range, label) { if (!is.null(na_values) && any(is.na(na_values))) { abort("`na_values` can not contain missing values.") } if (!is.null(na_range)) { type_ok <- (is.character(x) && is.character(na_range)) || (is.numeric(x) && is.numeric(na_range)) if (!type_ok || length(na_range) != 2) { abort("`na_range` must be a vector of length two the same type as `x`.") } if (any(is.na(na_range))) { abort("`na_range` can not contain missing values.") } if (na_range[1] >= na_range[2]) { abort("`na_range` must be in ascending order.") } } new_labelled(x, labels = labels, label = label, na_values = na_values, na_range = na_range, class = "haven_labelled_spss" ) } #' @export vec_ptype_full.haven_labelled_spss <- function(x, ...) { paste0("labelled_spss<", vec_ptype_full(vec_data(x)), ">") } #' @export obj_print_footer.haven_labelled_spss <- function(x, ...) 
{ na_values <- attr(x, "na_values") if (!is.null(na_values)) { cat_line("Missing values: ", paste(na_values, collapse = ", ")) } na_range <- attr(x, "na_range") if (!is.null(na_range)) { cat_line("Missing range: [", paste(na_range, collapse = ", "), "]") } NextMethod() } #' @export is.na.haven_labelled_spss <- function(x) { miss <- NextMethod() val <- vec_data(x) na_values <- attr(x, "na_values") if (!is.null(na_values)) { miss <- miss | val %in% na_values } na_range <- attr(x, "na_range") if (!is.null(na_range)) { miss <- miss | (val >= na_range[1] & val <= na_range[2]) } miss } # Type system ------------------------------------------------------------- # # Import to avoid R CMD check NOTE # #' @importFrom methods setOldClass # setOldClass(c("haven_labelled_spss", "haven_labelled", "vctrs_vctr")) #' @export vec_ptype2.double.haven_labelled_spss <- function(x, y, ...) { data_type <- vec_ptype2(x, vec_data(y), ...) new_labelled_spss( data_type, labels = vec_cast_named(attr(y, "labels"), data_type), na_values = vec_cast(attr(y, "na_values"), data_type), na_range = attr(y, "na_range"), label = attr(y, "label", exact = TRUE) ) } #' @export vec_ptype2.integer.haven_labelled_spss <- vec_ptype2.double.haven_labelled_spss #' @export vec_ptype2.character.haven_labelled_spss <- vec_ptype2.double.haven_labelled_spss #' @export vec_ptype2.haven_labelled_spss.double <- function(x, y, ...) vec_ptype2(y, x, ...) 
#' @export vec_ptype2.haven_labelled_spss.integer <- vec_ptype2.haven_labelled_spss.double #' @export vec_ptype2.haven_labelled_spss.character <- vec_ptype2.haven_labelled_spss.double #' @export vec_ptype2.haven_labelled_spss.haven_labelled_spss <- function(x, y, ..., x_arg = "", y_arg = "") { data_type <- vec_ptype2(vec_data(x), vec_data(y), ..., x_arg = x_arg, y_arg = y_arg) # Prefer labels from LHS x_labels <- vec_cast_named(attr(x, "labels"), data_type, x_arg = x_arg) y_labels <- vec_cast_named(attr(y, "labels"), data_type, x_arg = y_arg) labels <- c(x_labels, y_labels[!y_labels %in% x_labels]) # Prefer labels from LHS label <- attr(x, "label", exact = TRUE) %||% attr(y, "label", exact = TRUE) x_na_values <- vec_cast(attr(x, "na_values"), data_type, x_arg = x_arg) y_na_values <- vec_cast(attr(y, "na_values"), data_type, x_arg = y_arg) # Ignore user defined missings and return a standard haven_labelled if # there are mismatches between the missing attributes if (!identical(x_na_values, y_na_values) || !identical(attr(x, "na_range"), attr(y, "na_range"))) { new_labelled(data_type, labels = labels, label = label) } else { new_labelled_spss( data_type, labels = labels, na_values = x_na_values, na_range = attr(x, "na_range"), label = label) } } #' @export vec_ptype2.haven_labelled_spss.haven_labelled <- vec_ptype2.haven_labelled_spss.haven_labelled_spss #' @export vec_ptype2.haven_labelled.haven_labelled_spss <- vec_ptype2.haven_labelled_spss.haven_labelled_spss #' @export vec_cast.double.haven_labelled_spss <- function(x, to, ...) vec_cast(vec_data(x), to) #' @export vec_cast.integer.haven_labelled_spss <- function(x, to, ...) vec_cast(vec_data(x), to) #' @export vec_cast.character.haven_labelled_spss <- function(x, to, ...) { if (is.character(x)) { vec_cast(vec_data(x), to, ...) } else { stop_incompatible_cast(x, to, ...) 
} } #' @export vec_cast.haven_labelled_spss.haven_labelled_spss <- function(x, to, ..., x_arg = "", to_arg = "") { out_data <- vec_cast(vec_data(x), vec_data(to), ..., x_arg = x_arg, to_arg = to_arg) x_labels <- attr(x, "labels") to_labels <- attr(to, "labels") out_labels <- to_labels %||% x_labels x_na_values <- attr(x, "na_values") to_na_values <- attr(to, "na_values") x_na_range <- attr(x, "na_range") to_na_range <- attr(to, "na_range") out <- labelled_spss(out_data, labels = out_labels, na_values = to_na_values, na_range = to_na_range, label = attr(x, "label", exact = TRUE) ) # do we lose tagged na values? if (is.double(x) && !is.double(out)) { lossy <- is_tagged_na(x) maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg, details = "Only doubles can hold tagged na values." ) } # do any values become unlabelled? if (!is.null(to_labels)) { lossy <- x %in% x_labels[!x_labels %in% out_labels] maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg, details = paste0("Values are labelled in `", x_arg, "` but not in `", to_arg, "`.") ) } # do any values switch from missing to non-missing? if (!is.null(to_na_range) | !is.null(to_na_values)) { lossy <- x %in% x_na_values if (!is.null(x_na_range)) lossy <- lossy | (vec_data(x) >= x_na_range[1] & vec_data(x) <= x_na_range[2]) if (!is.null(to_na_range)) lossy <- lossy & !(vec_data(x) >= to_na_range[1] & vec_data(x) <= to_na_range[2]) else if (!is.null(to_na_values)) lossy <- lossy & !x %in% to_na_values maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg, details = paste0("Values are missing in `", x_arg, "` but not in `", to_arg, "`.") ) } out } #' @export vec_cast.haven_labelled_spss.double <- function(x, to, ...) { vec_cast.haven_labelled_spss.haven_labelled_spss(x, to, ...) } #' @export vec_cast.haven_labelled_spss.integer <- function(x, to, ...) { vec_cast.haven_labelled_spss.haven_labelled_spss(x, to, ...) 
} #' @export vec_cast.haven_labelled_spss.character <- function(x, to, ...) { vec_cast.haven_labelled_spss.haven_labelled_spss(x, to, ...) } #' @export vec_cast.haven_labelled.haven_labelled_spss <- function(x, to, ...) { vec_cast.haven_labelled.haven_labelled(x, to, ...) } #' @export vec_cast.haven_labelled_spss.haven_labelled <- function(x, to, ...) { vec_cast.haven_labelled_spss.haven_labelled_spss(x, to, ...) } haven/R/zap_widths.R0000644000176200001440000000107414033646021014054 0ustar liggesusers#' Remove display width attributes #' #' To provide some mild support for round-tripping variables between SPSS #' and R, haven stores display widths in an attribute: `display_width`. If this #' causes problems for your code, you can get rid of them with `zap_widths`. #' #' @param x A vector or data frame. #' @family zappers #' @export zap_widths <- function(x) { UseMethod("zap_widths") } #' @export zap_widths.default <- function(x) { attr(x, "display_width") <- NULL x } #' @export zap_widths.data.frame <- function(x) { x[] <- lapply(x, zap_widths) x } haven/R/zap_missing.R0000644000176200001440000000224014034334276014226 0ustar liggesusers#' Zap special missings to regular R missings #' #' This is useful if you want to convert tagged missing values from SAS or #' Stata, or user-defined missings from SPSS, to regular R `NA`. 
#' #' @param x A vector or data frame #' @export #' @examples #' x1 <- labelled( #' c(1, 5, tagged_na("a", "b")), #' c(Unknown = tagged_na("a"), Refused = tagged_na("b")) #' ) #' x1 #' zap_missing(x1) #' #' x2 <- labelled_spss( #' c(1, 2, 1, 99), #' c(missing = 99), #' na_value = 99 #' ) #' x2 #' zap_missing(x2) #' #' # You can also apply to data frames #' df <- tibble::tibble(x1, x2, y = 4:1) #' df #' zap_missing(df) zap_missing <- function(x) { UseMethod("zap_missing") } #' @export zap_missing.default <- function(x) { x } #' @export zap_missing.haven_labelled <- function(x) { x[is.na(x)] <- NA labels <- attr(x, "labels") labels <- labels[!is.na(labels)] attr(x, "labels") <- labels x } #' @export zap_missing.haven_labelled_spss <- function(x) { x[is.na(x)] <- NA attr(x, "na_values") <- NULL attr(x, "na_range") <- NULL class(x) <- "haven_labelled" x } #' @export zap_missing.data.frame <- function(x) { x[] <- lapply(x, zap_missing) x } haven/R/update.R0000644000176200001440000000113314033646021013156 0ustar liggesusers# nocov start update_readstat <- function(branch = "master") { tmp <- tempfile() utils::download.file( paste0("https://github.com/WizardMac/ReadStat/archive/", branch, ".zip"), tmp, quiet = TRUE ) base <- fs::path_common(utils::unzip(tmp, exdir = tempdir())) in_dir <- fs::path(base, "src") out_dir <- fs::path("src", "readstat") fs::dir_delete(out_dir) fs::dir_copy(in_dir, out_dir) fs::dir_delete(fs::path(out_dir, c("bin", "fuzz", "test"))) fs::file_copy(fs::path(base, "LICENSE"), out_dir) fs::file_copy(fs::path(base, "NEWS"), out_dir) invisible() } # nocov end haven/R/utils.R0000644000176200001440000000244214034330442013036 0ustar liggesuserscat_line <- function(...) { cat(paste0(..., "\n", collapse = "")) } # TODO: Remove once vec_cast() preserves names. # https://github.com/r-lib/vctrs/issues/623 vec_cast_named <- function(x, to, ...) 
{
  stats::setNames(vec_cast(x, to, ...), names(x))
}

force_utc <- function(x) {
  if (identical(attr(x, "tzone"), "UTC")) {
    x
  } else {
    as.POSIXct(format(x, usetz = FALSE), tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
  }
}

skip_cols <- function(reader, col_select = NULL, ...) {
  col_select <- enquo(col_select)
  if (quo_is_null(col_select)) {
    return(character())
  }

  cols <- names(reader(..., n_max = 0L))
  sels <- tidyselect::vars_select(cols, !!col_select)

  if (length(sels) == 0) {
    stop("Can't find any columns matching `col_select` in data.", call. = FALSE)
  }

  setdiff(cols, sels)
}

validate_n_max <- function(n) {
  if (!is.numeric(n) && !is.na(n)) {
    stop("`n_max` must be numeric, not ", class(n)[1], ".", call. = FALSE)
  }

  if (length(n) != 1) {
    stop("`n_max` must have length 1, not ", length(n), ".", call. = FALSE)
  }

  if (is.na(n) || is.infinite(n) || n < 0) {
    return(-1L)
  }

  as.integer(n)
}

adjust_tz <- function(df) {
  datetime <- vapply(df, inherits, "POSIXt", FUN.VALUE = logical(1))
  df[datetime] <- lapply(df[datetime], force_utc)
  df
}
haven/R/tagged_na.R0000644000176200001440000000366214033646021013616 0ustar liggesusers
#' "Tagged" missing values
#'
#' "Tagged" missing values work exactly like regular R missing values except
#' that they store one additional byte of information: a tag, which is usually
#' a letter ("a" to "z"). When loading a SAS or Stata file, the tagged
#' missing values always use lower case values.
#'
#' `format_tagged_na()` and `print_tagged_na()` format tagged
#' NA's as NA(a), NA(b), etc.
#'
#' @param ... Vectors containing single characters. The letter will be used to
#'   "tag" the missing value.
#' @param x A numeric vector
#' @param digits Number of digits to use in string representation
#' @export
#' @examples
#' x <- c(1:5, tagged_na("a"), tagged_na("z"), NA)
#'
#' # Tagged NA's work identically to regular NAs
#' x
#' is.na(x)
#'
#' # To see that they're special, you need to use na_tag(),
#' # is_tagged_na(), or print_tagged_na():
#' is_tagged_na(x)
#' na_tag(x)
#' print_tagged_na(x)
#'
#' # You can test for specific tagged NAs with the second argument
#' is_tagged_na(x, "a")
#'
#' # Because the support for tagged NAs is somewhat tacked on to R,
#' # the left-most NA will tend to be preserved in arithmetic operations.
#' na_tag(tagged_na("a") + tagged_na("z"))
tagged_na <- function(...) {
  .Call(tagged_na_, c(...))
}

#' @rdname tagged_na
#' @export
na_tag <- function(x) {
  .Call(na_tag_, x)
}

#' @param tag If not `NULL`, will only return true if the tag has this value.
#' @rdname tagged_na
#' @export
is_tagged_na <- function(x, tag = NULL) {
  .Call(is_tagged_na_, x, tag)
}

#' @rdname tagged_na
#' @export
format_tagged_na <- function(x, digits = getOption("digits")) {
  out <- format(vec_data(x), digits = digits)
  out[is_tagged_na(x)] <- paste0("NA(", na_tag(x)[is_tagged_na(x)], ")")

  # format again to make sure all elements have same width
  format(out, justify = "right")
}

#' @rdname tagged_na
#' @export
print_tagged_na <- function(x, digits = getOption("digits")) {
  # Pass `digits` through so the argument is not silently ignored
  print(format_tagged_na(x, digits = digits), quote = FALSE)
}
haven/R/zzz.R0000644000176200001440000000171514033646021012537 0ustar liggesusers
# nocov start
.onUnload <- function(libpath) {
  library.dynam.unload("haven", libpath)
}

# Adapted from https://github.com/tidyverse/hms/blob/master/R/zzz.R
.onLoad <- function(...)
{ register_s3_method("pillar", "pillar_shaft", "haven_labelled") invisible() } register_s3_method <- function(pkg, generic, class, fun = NULL) { stopifnot(is.character(pkg), length(pkg) == 1) stopifnot(is.character(generic), length(generic) == 1) stopifnot(is.character(class), length(class) == 1) if (is.null(fun)) { fun <- get(paste0(generic, ".", class), envir = parent.frame()) } else { stopifnot(is.function(fun)) } if (pkg %in% loadedNamespaces()) { registerS3method(generic, class, fun, envir = asNamespace(pkg)) } # Always register hook in case package is later unloaded & reloaded setHook( packageEvent(pkg, "onLoad"), function(...) { registerS3method(generic, class, fun, envir = asNamespace(pkg)) } ) } # nocov end haven/R/zap_empty.R0000644000176200001440000000050114101766327013712 0ustar liggesusers#' Convert empty strings into missing values #' #' @param x A character vector #' @return A character vector with empty strings replaced by missing values. #' @family zappers #' @export #' @examples #' x <- c("a", "", "c") #' zap_empty(x) zap_empty <- function(x) { stopifnot(is.character(x)) x[x == ""] <- NA x } haven/R/labelled.R0000644000176200001440000002334414101006665013450 0ustar liggesusers#' Create a labelled vector. #' #' A labelled vector is a common data structure in other statistical #' environments, allowing you to assign text labels to specific values. #' This class makes it possible to import such labelled vectors in to R #' without loss of fidelity. This class provides few methods, as I #' expect you'll coerce to a standard R class (e.g. a [factor()]) #' soon after importing. #' #' @param x A vector to label. Must be either numeric (integer or double) or #' character. #' @param labels A named vector or `NULL`. The vector should be the same type #' as `x`. Unlike factors, labels don't need to be exhaustive: only a fraction #' of the values might be labelled. #' @param label A short, human-readable description of the vector. 
#' @export #' @examples #' s1 <- labelled(c("M", "M", "F"), c(Male = "M", Female = "F")) #' s2 <- labelled(c(1, 1, 2), c(Male = 1, Female = 2)) #' s3 <- labelled(c(1, 1, 2), c(Male = 1, Female = 2), #' label="Assigned sex at birth") #' #' # Unfortunately it's not possible to make as.factor work for labelled objects #' # so instead use as_factor. This works for all types of labelled vectors. #' as_factor(s1) #' as_factor(s1, levels = "values") #' as_factor(s2) #' #' # Other statistical software supports multiple types of missing values #' s3 <- labelled(c("M", "M", "F", "X", "N/A"), #' c(Male = "M", Female = "F", Refused = "X", "Not applicable" = "N/A") #' ) #' s3 #' as_factor(s3) #' #' # Often when you have a partially labelled numeric vector, labelled values #' # are special types of missing. Use zap_labels to replace labels with missing #' # values #' x <- labelled(c(1, 2, 1, 2, 10, 9), c(Unknown = 9, Refused = 10)) #' zap_labels(x) labelled <- function(x = double(), labels = NULL, label = NULL) { x <- vec_data(x) labels <- vec_cast_named(labels, x, x_arg = "labels", to_arg = "x") validate_labelled(new_labelled(x, labels = labels, label = label)) } new_labelled <- function(x = double(), labels = NULL, label = NULL, ..., class = character()) { if (!is.numeric(x) && !is.character(x)) { abort("`x` must be a numeric or a character vector.") } if (!is.null(labels) && !vec_is(labels, x)) { abort("`labels` must be same type as `x`.") } if (!is.null(label) && (!is.character(label) || length(label) != 1)) { abort("`label` must be a character vector of length one.") } new_vctr(x, labels = labels, label = label, ..., class = c(class, "haven_labelled"), inherit_base_type = TRUE ) } validate_labelled <- function(x) { labels <- attr(x, "labels") if (is.null(labels)) { return(x) } if (is.null(names(labels))) { abort("`labels` must have names.") } if (any(duplicated(stats::na.omit(labels)))) { abort("`labels` must be unique.") } x } #' @export as.character.haven_labelled <- 
function(x, ...) {
  as.character(vec_data(x))
}

#' @export
levels.haven_labelled <- function(x) {
  NULL
}

# TODO: https://github.com/r-lib/vctrs/issues/1108
#' @export
`names<-.haven_labelled` <- function(x, value) {
  attr(x, "names") <- value
  x
}

#' @importFrom stats median
#' @export
median.haven_labelled <- function(x, na.rm = TRUE, ...) {
  if (is.character(x)) {
    abort("Can't compute median of labelled<character>")
  }
  # Honour the caller's `na.rm` instead of always forcing TRUE
  median(vec_data(x), na.rm = na.rm, ...)
}

#' @importFrom stats quantile
#' @export
quantile.haven_labelled <- function(x, ...) {
  if (is.character(x)) {
    abort("Can't compute quantile of labelled<character>")
  }
  quantile(vec_data(x), ...)
}

#' @export
summary.haven_labelled <- function(object, ...) {
  summary(vec_data(object), ...)
}

# Formatting --------------------------------------------------------------

#' @export
vec_ptype_full.haven_labelled <- function(x, ...) {
  paste0("labelled<", vec_ptype_full(vec_data(x)), ">")
}

#' @export
vec_ptype_abbr.haven_labelled <- function(x, ...) {
  paste0(vec_ptype_abbr(vec_data(x)), "+lbl")
}

#' @export
obj_print_header.haven_labelled <- function(x, ...) {
  cat_line("<", vec_ptype_full(x), "[", vec_size(x), "]>", get_labeltext(x))
  invisible(x)
}

# Convenience function for getting the label with
# a prefix (if label is not empty), used for
# printing 'labelled' and 'labelled_spss' vectors
get_labeltext <- function(x, prefix = ": ") {
  label <- attr(x, "label", exact = TRUE)
  if (!is.null(label)) {
    paste0(prefix, label)
  }
}

#' @export
format.haven_labelled <- function(x, ..., digits = getOption("digits")) {
  if (is.double(x)) {
    format_tagged_na(x, digits = digits)
  } else {
    format(vec_data(x), ...)
  }
}

#' @export
obj_print_footer.haven_labelled <- function(x, ...) {
  print_labels(x)
}

#' Print the labels of a labelled vector
#'
#' This is a convenience function, useful to explore the variables of
#' a newly imported dataset.
#' @param x A labelled vector #' @param name The name of the vector (optional) #' @export #' @examples #' s1 <- labelled(c("M", "M", "F"), c(Male = "M", Female = "F")) #' s2 <- labelled(c(1, 1, 2), c(Male = 1, Female = 2)) #' labelled_df <- tibble::tibble(s1, s2) #' #' for (var in names(labelled_df)) { #' print_labels(labelled_df[[var]], var) #' } print_labels <- function(x, name = NULL) { if (!is.labelled(x)) { stop("x must be a labelled vector", call. = FALSE) } labels <- attr(x, "labels") if (length(labels) == 0) { return(invisible(x)) } cat("\nLabels:", name, "\n", sep = "") value <- if (is.double(labels)) format_tagged_na(labels) else unname(labels) lab_df <- data.frame(value = value, label = names(labels), row.names = NULL) print(lab_df, row.names = FALSE) invisible(x) } # Type system ------------------------------------------------------------- # Import to avoid R CMD check NOTE #' @importFrom methods setOldClass setOldClass(c("haven_labelled", "vctrs_vctr")) #' @export #' @rdname labelled is.labelled <- function(x) inherits(x, "haven_labelled") #' @export vec_ptype2.double.haven_labelled <- function(x, y, ...) { data_type <- vec_ptype2(x, vec_data(y), ...) new_labelled(data_type, labels = vec_cast_named(attr(y, "labels"), data_type), label = attr(y, "label", exact = TRUE) ) } #' @export vec_ptype2.integer.haven_labelled <- vec_ptype2.double.haven_labelled #' @export vec_ptype2.character.haven_labelled <- vec_ptype2.double.haven_labelled #' @export vec_ptype2.haven_labelled.double <- function(x, y, ...) vec_ptype2(y, x, ...) 
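# Illustrative sketch, not part of the package source: the `vec_ptype2()`
# methods around this point are what let a labelled vector be combined with
# a bare vector. It assumes haven and vctrs are installed and loaded, and it
# is wrapped in `if (FALSE)` so it never runs when this file is sourced.
if (FALSE) {
  x <- labelled(c(1, 2), c(Good = 1, Bad = 2))
  combined <- vctrs::vec_c(x, 3)  # common type is resolved via vec_ptype2()
  is.labelled(combined)           # the result is expected to stay labelled
  attr(combined, "labels")        # labels are expected to carry over from `x`
}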
#' @export vec_ptype2.haven_labelled.integer <- vec_ptype2.haven_labelled.double #' @export vec_ptype2.haven_labelled.character <- vec_ptype2.haven_labelled.double #' @export vec_ptype2.haven_labelled.haven_labelled <- function(x, y, ..., x_arg = "", y_arg = "") { data_type <- vec_ptype2(vec_data(x), vec_data(y), ..., x_arg = x_arg, y_arg = y_arg) # Prefer labels from LHS x_labels <- vec_cast_named(attr(x, "labels"), data_type, x_arg = x_arg) y_labels <- vec_cast_named(attr(y, "labels"), data_type, x_arg = y_arg) labels <- c(x_labels, y_labels[!y_labels %in% x_labels]) # Prefer labels from LHS label <- attr(x, "label", exact = TRUE) %||% attr(y, "label", exact = TRUE) new_labelled(data_type, labels = labels, label = label) } #' @export vec_cast.double.haven_labelled <- function(x, to, ...) vec_cast(vec_data(x), to) #' @export vec_cast.integer.haven_labelled <- function(x, to, ...) vec_cast(vec_data(x), to) #' @export vec_cast.character.haven_labelled <- function(x, to, ...) { if (is.character(x)) { vec_cast(vec_data(x), to, ...) } else { stop_incompatible_cast(x, to, ...) } } #' @export vec_cast.haven_labelled.haven_labelled <- function(x, to, ..., x_arg = "", to_arg = "") { out_data <- vec_cast(vec_data(x), vec_data(to), ..., x_arg = x_arg, to_arg = to_arg) x_labels <- attr(x, "labels") to_labels <- attr(to, "labels") out_labels <- to_labels %||% x_labels out <- labelled(out_data, labels = out_labels, label = attr(x, "label", exact = TRUE) ) # do we lose tagged na values? if (is.double(x) && !is.double(out)) { lossy <- is_tagged_na(x) maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg, details = "Only doubles can hold tagged na values." ) } # do any values become unlabelled? 
if (!is.null(to_labels)) { lossy <- x %in% x_labels[!x_labels %in% out_labels] maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg, details = paste0("Values are labelled in `", x_arg, "` but not in `", to_arg, "`.") ) } out } #' @export vec_cast.haven_labelled.double <- function(x, to, ...) { vec_cast.haven_labelled.haven_labelled(x, to, ...) } #' @export vec_cast.haven_labelled.integer <- function(x, to, ...) { vec_cast.haven_labelled.haven_labelled(x, to, ...) } #' @export vec_cast.haven_labelled.character <- function(x, to, ...) { vec_cast.haven_labelled.haven_labelled(x, to, ...) } # Arithmetic -------------------------------------------------------------- #' Internal vctrs methods #' #' @keywords internal #' @export vec_arith.haven_labelled #' @method vec_arith haven_labelled #' @export vec_arith.haven_labelled <- function(op, x, y, ...) { UseMethod("vec_arith.haven_labelled", y) } #' @export #' @method vec_arith.haven_labelled default vec_arith.haven_labelled.default <- function(op, x, y, ...) { stop_incompatible_op(op, x, y) } #' @export #' @method vec_arith.haven_labelled haven_labelled vec_arith.haven_labelled.haven_labelled <- function(op, x, y, ...) { vec_arith_base(op, x, y) } #' @export #' @method vec_arith.haven_labelled numeric vec_arith.haven_labelled.numeric <- function(op, x, y, ...) { vec_arith_base(op, x, y) } #' @export #' @method vec_arith.numeric haven_labelled vec_arith.numeric.haven_labelled <- function(op, x, y, ...) { vec_arith_base(op, x, y) } #' @export vec_math.haven_labelled <- function(.fn, .x, ...) { vec_math_base(.fn, .x, ...) } haven/R/zap_label.R0000644000176200001440000000135014101766013013626 0ustar liggesusers#' Zap variable labels #' #' @description #' Removes variable label, leaving unlabelled vectors as is. #' #' @seealso [zap_labels()] to remove value labels. 
#' @param x A vector or data frame
#' @family zappers
#' @export
#' @examples
#' x1 <- labelled(1:5, c(good = 1, bad = 5), label = "rating")
#' x1
#' zap_label(x1)
#'
#' x2 <- labelled_spss(c(1:4, 9), label = "score", na_values = 9)
#' x2
#' zap_label(x2)
#'
#' # zap_label also works with data frames
#' df <- tibble::tibble(x1, x2)
#' str(df)
#' str(zap_label(df))
zap_label <- function(x) {
  UseMethod("zap_label")
}

#' @export
zap_label.default <- function(x) {
  attr(x, "label") <- NULL
  x
}

#' @export
zap_label.data.frame <- function(x) {
  x[] <- lapply(x, zap_label)
  x
}
haven/R/haven-sas.R0000644000176200001440000001240214034330007013555 0ustar liggesusers
#' Read and write SAS files
#'
#' `read_sas()` supports both sas7bdat files and the accompanying sas7bcat files
#' that SAS uses to record value labels. `write_sas()` is currently experimental
#' and only works for limited datasets.
#'
#' @param data_file,catalog_file Path to data and catalog files. The files are
#'   processed with [readr::datasource()].
#' @param data Data frame to write.
#' @param path Path to file where the data will be written.
#' @param encoding,catalog_encoding The character encoding used for the
#'   `data_file` and `catalog_file` respectively. A value of `NULL` uses the
#'   encoding specified in the file; use this argument to override it if it is
#'   incorrect.
#' @inheritParams tibble::as_tibble
#' @param col_select One or more selection expressions, like in
#'   [dplyr::select()]. Use `c()` or `list()` to use more than one expression.
#'   See `?dplyr::select` for details on available selection options. Only the
#'   specified columns will be read from `data_file`.
#' @param skip Number of lines to skip before reading data.
#' @param n_max Maximum number of lines to read.
#' @param cols_only **Deprecated**: Use `col_select` instead.
#' @return A tibble, data frame variant with nice defaults.
#'
#'   Variable labels are stored in the "label" attribute of each variable. It is
#'   not printed on the console, but the RStudio viewer will show it.
#'
#'   `write_sas()` returns the input `data` invisibly.
#' @export
#' @examples
#' path <- system.file("examples", "iris.sas7bdat", package = "haven")
#' read_sas(path)
read_sas <- function(data_file, catalog_file = NULL,
                     encoding = NULL, catalog_encoding = encoding,
                     col_select = NULL, skip = 0L, n_max = Inf, cols_only = "DEPRECATED",
                     .name_repair = "unique"
                     ) {
  if (!missing(cols_only)) {
    warning("`cols_only` is deprecated. Please use `col_select` instead.", call. = FALSE)
    stopifnot(is.character(cols_only)) # used to only work with a char vector

    # guarantee a quosure to keep NULL and tidyselect logic clean downstream
    col_select <- quo(c(!!!cols_only))
  } else {
    col_select <- enquo(col_select)
  }

  if (is.null(encoding)) {
    encoding <- ""
  }

  cols_skip <- skip_cols(read_sas, !!col_select, data_file, encoding = encoding)
  n_max <- validate_n_max(n_max)

  spec_data <- readr::datasource(data_file)
  if (is.null(catalog_file)) {
    spec_cat <- list()
  } else {
    spec_cat <- readr::datasource(catalog_file)
  }

  switch(class(spec_data)[1],
    source_file = df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding, cols_skip = cols_skip, n_max = n_max, rows_skip = skip, name_repair = .name_repair),
    source_raw = df_parse_sas_raw(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding, cols_skip = cols_skip, n_max = n_max, rows_skip = skip, name_repair = .name_repair),
    stop("This kind of input is not handled", call. = FALSE)
  )
}

#' @export
#' @rdname read_sas
write_sas <- function(data, path) {
  data <- validate_sas(data)

  write_sas_(data, normalizePath(path, mustWork = FALSE))
  invisible(data)
}

#' Read and write SAS transport files
#'
#' The SAS transport format is an open format, as is required for submission
#' of the data to the FDA.
#'
#' @inheritParams read_spss
#' @return A tibble, data frame variant with nice defaults.
#' #' Variable labels are stored in the "label" attribute of each variable. #' It is not printed on the console, but the RStudio viewer will show it. #' #' `write_xpt()` returns the input `data` invisibly. #' @export #' @examples #' tmp <- tempfile(fileext = ".xpt") #' write_xpt(mtcars, tmp) #' read_xpt(tmp) read_xpt <- function(file, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique" ) { cols_skip <- skip_cols(read_xpt, {{ col_select }}, file) n_max <- validate_n_max(n_max) spec <- readr::datasource(file) switch(class(spec)[1], source_file = df_parse_xpt_file(spec, cols_skip, n_max, skip, name_repair = .name_repair), source_raw = df_parse_xpt_raw(spec, cols_skip, n_max, skip, name_repair = .name_repair), stop("This kind of input is not handled", call. = FALSE) ) } #' @export #' @rdname read_xpt #' @param version Version of transport file specification to use: either 5 or 8. #' @param name Member name to record in file. Defaults to file name sans #' extension. Must be <= 8 characters for version 5, and <= 32 characters #' for version 8. write_xpt <- function(data, path, version = 8, name = NULL) { stopifnot(version %in% c(5, 8)) if (is.null(name)) { name <- tools::file_path_sans_ext(basename(path)) } name <- validate_xpt_name(name, version) data <- validate_sas(data) write_xpt_( data, normalizePath(path, mustWork = FALSE), version = version, name = name ) invisible(data) } # Validation -------------------------------------------------------------- validate_sas <- function(data) { stopifnot(is.data.frame(data)) adjust_tz(data) } validate_xpt_name <- function(name, version) { if (version == 5) { if (nchar(name) > 8) { stop("`name` must be 8 characters or fewer", call. = FALSE) } } else { if (nchar(name) > 32) { stop("`name` must be 32 characters or fewer", call. 
= FALSE) } } name } haven/R/cpp11.R0000644000176200001440000000474014102302631012620 0ustar liggesusers# Generated by cpp11: do not edit by hand df_parse_sas_file <- function(spec_b7dat, spec_b7cat, encoding, catalog_encoding, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_sas_file`, spec_b7dat, spec_b7cat, encoding, catalog_encoding, cols_skip, n_max, rows_skip, name_repair) } df_parse_sas_raw <- function(spec_b7dat, spec_b7cat, encoding, catalog_encoding, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_sas_raw`, spec_b7dat, spec_b7cat, encoding, catalog_encoding, cols_skip, n_max, rows_skip, name_repair) } df_parse_xpt_file <- function(spec, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_xpt_file`, spec, cols_skip, n_max, rows_skip, name_repair) } df_parse_xpt_raw <- function(spec, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_xpt_raw`, spec, cols_skip, n_max, rows_skip, name_repair) } df_parse_dta_file <- function(spec, encoding, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_dta_file`, spec, encoding, cols_skip, n_max, rows_skip, name_repair) } df_parse_dta_raw <- function(spec, encoding, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_dta_raw`, spec, encoding, cols_skip, n_max, rows_skip, name_repair) } df_parse_sav_file <- function(spec, encoding, user_na, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_sav_file`, spec, encoding, user_na, cols_skip, n_max, rows_skip, name_repair) } df_parse_sav_raw <- function(spec, encoding, user_na, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_sav_raw`, spec, encoding, user_na, cols_skip, n_max, rows_skip, name_repair) } df_parse_por_file <- function(spec, encoding, user_na, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_por_file`, spec, encoding, user_na, cols_skip, n_max, rows_skip, name_repair) } df_parse_por_raw <- function(spec, encoding, 
user_na, cols_skip, n_max, rows_skip, name_repair) { .Call(`_haven_df_parse_por_raw`, spec, encoding, user_na, cols_skip, n_max, rows_skip, name_repair) } write_sav_ <- function(data, path, compress) { invisible(.Call(`_haven_write_sav_`, data, path, compress)) } write_dta_ <- function(data, path, version, label) { invisible(.Call(`_haven_write_dta_`, data, path, version, label)) } write_sas_ <- function(data, path) { invisible(.Call(`_haven_write_sas_`, data, path)) } write_xpt_ <- function(data, path, version, name) { invisible(.Call(`_haven_write_xpt_`, data, path, version, name)) } haven/R/haven-spss.R0000644000176200001440000001044514034333742013775 0ustar liggesusers#' Read and write SPSS files #' #' `read_sav()` reads both `.sav` and `.zsav` files; `write_sav()` creates #' `.zsav` files when `compress = TRUE`. `read_por()` reads `.por` files. #' `read_spss()` uses either `read_por()` or `read_sav()` based on the #' file extension. #' #' Currently haven can read and write logical, integer, numeric, character #' and factors. See [labelled_spss()] for how labelled variables in #' SPSS are handled in R. #' #' @inheritParams read_sas #' @inheritParams readr::datasource #' @param path Path to a file where the data will be written. #' @param data Data frame to write. #' @param encoding The character encoding used for the file. The default, #' `NULL`, use the encoding specified in the file, but sometimes this #' value is incorrect and it is useful to be able to override it. #' @return A tibble, data frame variant with nice defaults. #' #' Variable labels are stored in the "label" attribute of each variable. #' It is not printed on the console, but the RStudio viewer will show it. #' #' `write_sav()` returns the input `data` invisibly. 
#' @name read_spss #' @examples #' path <- system.file("examples", "iris.sav", package = "haven") #' read_sav(path) #' #' tmp <- tempfile(fileext = ".sav") #' write_sav(mtcars, tmp) #' read_sav(tmp) NULL #' @export #' @rdname read_spss read_sav <- function(file, encoding = NULL, user_na = FALSE, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique") { if (is.null(encoding)) { encoding <- "" } cols_skip <- skip_cols(read_sav, {{ col_select }}, file, encoding) n_max <- validate_n_max(n_max) spec <- readr::datasource(file) switch(class(spec)[1], source_file = df_parse_sav_file(spec, encoding, user_na, cols_skip, n_max, skip, name_repair = .name_repair), source_raw = df_parse_sav_raw(spec, encoding, user_na, cols_skip, n_max, skip, name_repair = .name_repair), stop("This kind of input is not handled", call. = FALSE) ) } #' @export #' @rdname read_spss read_por <- function(file, user_na = FALSE, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique") { cols_skip <- skip_cols(read_por, {{ col_select }}, file) n_max <- validate_n_max(n_max) spec <- readr::datasource(file) switch(class(spec)[1], source_file = df_parse_por_file(spec, encoding = "", user_na = user_na, cols_skip, n_max, skip, name_repair = .name_repair), source_raw = df_parse_por_raw(spec, encoding = "", user_na = user_na, cols_skip, n_max, skip, name_repair = .name_repair), stop("This kind of input is not handled", call. = FALSE) ) } #' @export #' @rdname read_spss #' @param compress If `TRUE`, will compress the file, resulting in a `.zsav` #' file. Otherwise the `.sav` file will be bytecode compressed. write_sav <- function(data, path, compress = FALSE) { data <- validate_sav(data) write_sav_(data, normalizePath(path, mustWork = FALSE), compress = compress) invisible(data) } #' @export #' @rdname read_spss #' @param user_na If `TRUE` variables with user defined missing will #' be read into [labelled_spss()] objects. 
If `FALSE`, the #' default, user-defined missings will be converted to `NA`. read_spss <- function(file, user_na = FALSE, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique") { ext <- tolower(tools::file_ext(file)) switch(ext, sav = read_sav(file, user_na = user_na, col_select = {{ col_select }}, n_max = n_max, skip = skip, .name_repair = .name_repair), zsav = read_sav(file, user_na = user_na, col_select = {{ col_select }}, n_max = n_max, skip = skip, .name_repair = .name_repair), por = read_por(file, user_na = user_na, col_select = {{ col_select }}, n_max = n_max, skip = skip, .name_repair = .name_repair), stop("Unknown extension '.", ext, "'", call. = FALSE) ) } validate_sav <- function(data) { stopifnot(is.data.frame(data)) # Check factor lengths level_lengths <- vapply(data, max_level_length, integer(1)) bad_lengths <- level_lengths > 120 if (any(bad_lengths)) { stop( "SPSS only supports levels with <= 120 characters\n", "Problems: ", var_names(data, bad_lengths), call. = FALSE ) } adjust_tz(data) } # Helpers ----------------------------------------------------------------- max_level_length <- function(x) { if (!is.factor(x)) return(0L) max(0L, nchar(levels(x)), na.rm = TRUE) } haven/R/labelled-pillar.R0000644000176200001440000001502214033646021014723 0ustar liggesusers # Dynamically exported, see zzz.R pillar_shaft.haven_labelled <- function( x, show_labels = getOption("haven.show_pillar_labels", TRUE), ... 
) { if (!isTRUE(show_labels) | !pillar_print_pkgs_available()) { return(pillar::pillar_shaft(unclass(x))) } if (is.numeric(x)) { val <- val_num_pillar_info(x) lbl <- lbl_pillar_info(x) pillar::new_pillar_shaft( list(val = val, lbl = lbl), min_width = max(val$disp_short$lhs_ws + val$disp_short$main_wid + lbl$wid_short), width = max(val$disp_full$lhs_ws + val$disp_full$main_wid + lbl$wid_full), class = "pillar_shaft_haven_labelled_num" ) } else { val <- val_chr_pillar_info(x) lbl <- lbl_pillar_info(x) pillar::new_pillar_shaft( list(val = val, lbl = lbl), min_width = max(val$wid_short + lbl$wid_short), width = max(val$wid_full + lbl$wid_full), class = "pillar_shaft_haven_labelled_chr" ) } } val_num_pillar_info <- function(x) { val_pillar <- pillar::pillar_shaft(zap_labels.haven_labelled(x)) disp_short <- num_disp_components(x, val_pillar, attr(val_pillar, "min_width")) disp_full <- num_disp_components(x, val_pillar, attr(val_pillar, "width")) if (is.double(x)) { na_display <- character(length(x)) na_display[is_tagged_na(x)] <- pillar::style_na(paste0("(", na_tag(x[is_tagged_na(x)]), ")")) disp_short <- add_text(disp_short, na_display) disp_full <- add_text(disp_full, na_display) } list( disp_short = disp_short, disp_full = disp_full ) } num_disp_components <- function(x, pillar, width) { display <- format(pillar, width) # Sometimes there's an extra leading space from pillar display <- trim_ws_lhs(display) # exponent notation formatting hinders stripping white space in NAs display[is.na(unclass(x))] <- crayon::strip_style(display[is.na(unclass(x))]) display_untrimmed_wid <- pillar::get_extent(display) display_max_wid <- max(display_untrimmed_wid) display <- trim_ws_rhs(display) main_wid <- pillar::get_extent(display) display_trimmed_rhs <- display_untrimmed_wid - main_wid display[is.na(unclass(x))] <- pillar::style_na(display[is.na(unclass(x))]) list( lhs_ws = max(main_wid + display_trimmed_rhs) - (main_wid + display_trimmed_rhs), main_wid = main_wid, main_txt = 
display, rhs_ws = display_trimmed_rhs ) } add_text <- function(display, new_text) { new_wid <- pillar::get_extent(new_text) wid_avail <- pmin(display$lhs_ws, new_wid) wid_needed <- new_wid - wid_avail display$lhs_ws <- display$lhs_ws + max(wid_needed) - wid_avail - wid_needed display$main_txt <- paste0(display$main_txt, new_text) display$main_wid <- pillar::get_extent(display$main_txt) display } val_chr_pillar_info <- function(x) { MIN_CHR_DISPLAY <- 4 val_pillar <- pillar::pillar_shaft(zap_labels.haven_labelled(x)) disp_full <- trim_ws_rhs(format(val_pillar, attr(val_pillar, "width"))) wid_full <- pillar::get_extent(disp_full) list( val_pillar = val_pillar, wid_short = pmin(MIN_CHR_DISPLAY, wid_full), disp_full = disp_full, wid_full = wid_full ) } lbl_pillar_info <- function(x) { MIN_LBL_DISPLAY <- 6 labels <- attr(x, "labels") if (length(labels) > 0) { names(labels) <- pillar::style_subtle(paste0(" [", names(labels), "]")) attr(x, "labels") <- labels label_display <- as.character(as_factor(x, "labels")) label_display[is.na(label_display)] <- "" } else { label_display <- character(length(x)) } label_widths <- pillar::get_extent(label_display) label_min_widths <- ifelse(label_widths > 0, pmin(MIN_LBL_DISPLAY, label_widths), 0) if (inherits(x, "haven_labelled_spss")) { MIN_NA_DISPLAY <- 4 na_display <- character(length(x)) na_display[is.na(x) & !is.na(unclass(x))] <- pillar::style_na(" (NA)") na_widths <- pillar::get_extent(na_display) label_display <- paste0(na_display, label_display) label_widths <- label_widths + na_widths label_min_widths <- label_min_widths + ifelse(label_widths > 0, pmin(MIN_NA_DISPLAY, label_widths), 0) } ret <- list( wid_short = label_min_widths, disp_full = label_display, wid_full = label_widths ) ret } #' @export format.pillar_shaft_haven_labelled_num <- function(x, width, ...) 
{ vshort <- x$val$disp_short vfull <- x$val$disp_full lbl_wid <- pmax(0, x$lbl$wid_short - vfull$rhs_ws) if (width >= max(vfull$lhs_ws +vfull$main_wid + lbl_wid)) { lbl_width <- width - (vfull$lhs_ws + vfull$main_wid) lbl <- str_trunc(x$lbl$disp_full, lbl_width, subtle = TRUE) out <- paste_with_align(vfull$main_txt, lbl, vfull$lhs_ws, vfull$rhs_ws) } else { lbl_width <- width - (vshort$lhs_ws + vshort$main_wid) lbl <- str_trunc(x$lbl$disp_full, lbl_width, subtle = TRUE) out <- paste_with_align(vshort$main_txt, lbl, vshort$lhs_ws, vshort$rhs_ws) } pillar::new_ornament(out, width = width, align = "right") } #' @export format.pillar_shaft_haven_labelled_chr <- function(x, width, ...) { if (width >= max(x$val$wid_full + x$lbl$wid_short)) { lbl_width <- width - x$val$wid_full lbl <- str_trunc(x$lbl$disp_full, lbl_width, subtle = TRUE) out <- paste0(x$val$disp_full, lbl) } else { val_widths <- pmin(x$val$wid_full, width - x$lbl$wid_short) val_display <- str_trunc(x$val$disp_full, val_widths) lbl <- str_trunc(x$lbl$disp_full, width - val_widths, subtle = TRUE) out <- paste0(val_display, lbl) } pillar::new_ornament(out, width = width, align = "left") } # Helpers ----------------------------------------------------------------- str_trunc <- function(x, widths, subtle = FALSE) { str_width <- pillar::get_extent(x) too_wide <- which(!is.na(x) & str_width > widths) continue_symbol <- cli::symbol$continue if (subtle) continue_symbol <- pillar::style_subtle(continue_symbol) truncated <- Map(x[too_wide], widths[too_wide], f = function(item, wid) { paste0(crayon::col_substr(item, 1, wid - 1), continue_symbol) }) truncated <- as.vector(truncated, "character") x[too_wide] <- truncated x } trim_ws_rhs <- function(x) { sub("[ \t\r\n]+$", "", x) } trim_ws_lhs <- function(x) { sub("^[ \t\r\n]+", "", x) } pad_space <- function(n) { vapply(n, function(x) paste(rep(" ", x), collapse = ""), "") } paste_with_align <- function(x, y, lhs_ws, rhs_ws) { y_wid <- pillar::get_extent(y) added_chars 
<- max(y_wid - rhs_ws) rhs_ws <- added_chars - (y_wid - rhs_ws) paste0(pad_space(lhs_ws), x, y, pad_space(rhs_ws)) } pillar_print_pkgs_available <- function() { requireNamespace("crayon", quietly = TRUE) & requireNamespace("cli", quietly = TRUE) } haven/R/haven-package.R0000644000176200001440000000033514034327653014401 0ustar liggesusers#' @useDynLib haven, .registration = TRUE #' @keywords internal "_PACKAGE" ## usethis namespace: start #' @import rlang #' @import vctrs #' @importFrom tibble tibble #' @importFrom hms hms ## usethis namespace: end NULL haven/R/zap_formats.R0000644000176200001440000000124514033646021014225 0ustar liggesusers#' Remove format attributes #' #' To provide some mild support for round-tripping variables between Stata/SPSS #' and R, haven stores variable formats in an attribute: `format.stata`, #' `format.spss`, or `format.sas`. If this causes problems for your #' code, you can get rid of them with `zap_formats`. #' #' @param x A vector or data frame. #' @family zappers #' @export zap_formats <- function(x) { UseMethod("zap_formats") } #' @export zap_formats.default <- function(x) { attr(x, "format.spss") <- NULL attr(x, "format.sas") <- NULL attr(x, "format.stata") <- NULL x } #' @export zap_formats.data.frame <- function(x) { x[] <- lapply(x, zap_formats) x } haven/R/as_factor.R0000644000176200001440000000747414034334620013653 0ustar liggesusers#' Convert input to a factor. #' #' The base function `as.factor()` is not a generic, but this variant #' is. Methods are provided for factors, character vectors, labelled #' vectors, and data frames. By default, when applied to a data frame, #' it only affects [labelled] columns. #' #' Includes methods for both class `haven_labelled` and `labelled` #' for backward compatibility. #' #' @param x Object to coerce to a factor. #' @param ... Other arguments passed down to method. #' @param only_labelled Only apply to labelled columns? 
#' @export #' @examples #' x <- labelled(sample(5, 10, replace = TRUE), c(Bad = 1, Good = 5)) #' #' # Default method uses labels where available #' as_factor(x) #' # You can also extract just the labels #' as_factor(x, levels = "labels") #' # Or just the values #' as_factor(x, levels = "values") #' # Or combine value and label #' as_factor(x, levels = "both") #' #' # as_factor() will preserve SPSS missing values from values and ranges #' y <- labelled_spss(1:10, na_values = c(2, 4), na_range = c(8, 10)) #' as_factor(y) #' # use zap_missing() first to convert to NAs #' zap_missing(y) #' as_factor(zap_missing(y)) #' @importFrom forcats as_factor #' @export #' @name as_factor NULL #' @rdname as_factor #' @export as_factor.data.frame <- function(x, ..., only_labelled = TRUE) { if (only_labelled) { labelled <- vapply(x, is.labelled, logical(1)) x[labelled] <- lapply(x[labelled], as_factor, ...) } else { x[] <- lapply(x, as_factor, ...) } x } #' @param ordered If `TRUE` create an ordered (ordinal) factor, if #' `FALSE` (the default) create a regular (nominal) factor. #' @param levels How to create the levels of the generated factor: #' #' * "default": uses labels where available, otherwise the values. #' Labels are sorted by value. #' * "both": like "default", but pastes together the value and the label #' * "labels": use only the labels; unlabelled values become `NA` #' * "values": use only the values #' @rdname as_factor #' @export as_factor.haven_labelled <- function(x, levels = c("default", "labels", "values", "both"), ordered = FALSE, ...) 
{ levels <- match.arg(levels) label <- attr(x, "label", exact = TRUE) labels <- attr(x, "labels") if (levels %in% c("default", "both")) { if (levels == "both") { names(labels) <- paste0("[", labels, "] ", names(labels)) } # Replace each value with its label vals <- unique(vec_data(x)) levs <- replace_with(vals, unname(labels), names(labels)) # Ensure all labels are preserved levs <- sort(c(stats::setNames(vals, levs), labels), na.last = TRUE) levs <- unique(names(levs)) x <- replace_with(vec_data(x), unname(labels), names(labels)) x <- factor(x, levels = levs, ordered = ordered) } else if (levels == "labels") { levs <- unname(labels) labs <- names(labels) x <- replace_with(vec_data(x), levs, labs) x <- factor(x, unique(labs), ordered = ordered) } else if (levels == "values") { if (all(x %in% labels)) { levels <- unname(labels) } else { levels <- sort(unique(vec_data(x))) } x <- factor(vec_data(x), levels, ordered = ordered) } structure(x, label = label) } #' @export #' @rdname as_factor as_factor.labelled <- as_factor.haven_labelled replace_with <- function(x, from, to) { stopifnot(length(from) == length(to)) out <- x # First replace regular values matches <- match(x, from, incomparables = NA) if (anyNA(matches)) { out[!is.na(matches)] <- to[matches[!is.na(matches)]] } else { out <- to[matches] } # Then tagged missing values tagged <- is_tagged_na(x) if (!any(tagged)) { return(out) } matches <- match(na_tag(x), na_tag(from), incomparables = NA) # Could possibly be faster to use anyNA(matches) out[!is.na(matches)] <- to[matches[!is.na(matches)]] out } haven/R/haven-stata.R0000644000176200001440000001166714101006665014124 0ustar liggesusers#' Read and write Stata DTA files #' #' Currently haven can read and write logical, integer, numeric, character #' and factors. See [labelled()] for how labelled variables in #' Stata are handled in R. 
#' #' @section Character encoding: #' Prior to Stata 14, files did not declare a text encoding, and the #' default encoding differed across platforms. If `encoding = NULL`, #' haven assumes the encoding is windows-1252, the text encoding used by #' Stata on Windows. Unfortunately Stata on Mac and Linux use a different #' default encoding, "latin1". If you encounter an error such as #' "Unable to convert string to the requested encoding", try #' `encoding = "latin1"` #' #' For Stata 14 and later, you should not need to manually specify the #' `encoding` value unless the value was incorrectly recorded in the source file. #' #' @inheritParams readr::datasource #' @inheritParams read_spss #' @param encoding The character encoding used for the file. Generally, #' only needed for Stata 13 files and earlier. See Encoding section #' for details. #' @return A tibble, data frame variant with nice defaults. #' #' Variable labels are stored in the "label" attribute of each variable. #' It is not printed on the console, but the RStudio viewer will show it. #' #' If a dataset label is defined in Stata, it will be stored in the "label" #' attribute of the tibble. #' #' `write_dta()` returns the input `data` invisibly. #' @export #' @examples #' path <- system.file("examples", "iris.dta", package = "haven") #' read_dta(path) #' #' tmp <- tempfile(fileext = ".dta") #' write_dta(mtcars, tmp) #' read_dta(tmp) #' read_stata(tmp) read_dta <- function(file, encoding = NULL, col_select = NULL, skip = 0, n_max = Inf, .name_repair = "unique") { if (is.null(encoding)) { encoding <- "" } cols_skip <- skip_cols(read_dta, {{ col_select }}, file, encoding) n_max <- validate_n_max(n_max) spec <- readr::datasource(file) switch(class(spec)[1], source_file = df_parse_dta_file(spec, encoding, cols_skip, n_max, skip, name_repair = .name_repair), source_raw = df_parse_dta_raw(spec, encoding, cols_skip, n_max, skip, name_repair = .name_repair), stop("This kind of input is not handled", call. 
= FALSE) ) } #' @export #' @rdname read_dta read_stata <- read_dta #' @export #' @rdname read_dta #' @param version File version to use. Supports versions 8-15. #' @param label Dataset label to use, or `NULL`. Defaults to the value stored in #' the "label" attribute of `data`. Must be <= 80 characters. write_dta <- function(data, path, version = 14, label = attr(data, "label")) { data <- validate_dta(data, version = version) validate_dta_label(label) write_dta_(data, normalizePath(path, mustWork = FALSE), version = stata_file_format(version), label = label ) invisible(data) } stata_file_format <- function(version) { stopifnot(is.numeric(version), length(version) == 1) version <- as.integer(version) if (version == 15L) { 119 } else if (version == 14L) { 118 } else if (version == 13L) { 117 } else if (version == 12L) { 115 } else if (version %in% c(10L, 11L)) { 114 } else if (version %in% c(8L, 9L)) { 113 } else { stop("Version ", version, " not currently supported", call. = FALSE) } } validate_dta <- function(data, version) { stopifnot(is.data.frame(data)) # Check variable names (single-character names are valid) bad_name <- !grepl("^[A-Za-z_]{1}[A-Za-z0-9_]*$", names(data)) bad_length <- nchar(names(data)) > 32 bad_vars <- if (version >= 14) bad_length else bad_length | bad_name if (any(bad_vars)) { stop( "The following variable names are not valid Stata variables: ", var_names(data, bad_vars), call. = FALSE ) } # Check double vectors can only have labelled integers bad_labels <- vapply(data, has_non_integer_labels, logical(1)) if (any(bad_labels)) { stop( "Stata only supports labelling with integers.\nProblems: ", var_names(data, bad_labels), call. = FALSE ) } adjust_tz(data) } validate_dta_label <- function(label) { if (!is.null(label)) { stopifnot(is.character(label), length(label) == 1) if (nchar(label) > 80) { stop("Stata data labels must be 80 characters or fewer", call. 
= FALSE) } } } # helpers ----------------------------------------------------------------- has_non_integer_labels <- function(x) { if (is.null(attr(x, "labels"))) { return(FALSE) } if (!is.labelled(x)) { return(FALSE) } if (!is.double(x)) { return(FALSE) } !is_integerish(attr(x, "labels")) } # Adapted from rlang is_integerish <- function(x) { if (!typeof(x) %in% c("double", "integer")) return(FALSE) missing_elts <- is.na(x) finite_elts <- is.finite(x) | missing_elts if (!all(finite_elts)) { return(FALSE) } x_finite <- x[finite_elts & !missing_elts] all(x_finite == as.integer(x_finite)) } var_names <- function(data, i) { x <- names(data)[i] paste(encodeString(x, quote = "`"), collapse = ", ") } haven/R/zap_labels.R0000644000176200001440000000225514101766065014025 0ustar liggesusers#' Zap value labels #' #' @description #' Removes value labels, leaving unlabelled vectors as is. Use this if you #' want to simply drop all `labels` from a data frame. #' #' Zapping labels from [labelled_spss()] also removes user-defined missing #' values, replacing with standard `NA`s. #' #' @param x A vector or data frame #' @family zappers #' @seealso [zap_label()] to remove variable labels. 
#' @export #' @examples #' x1 <- labelled(1:5, c(good = 1, bad = 5)) #' x1 #' zap_labels(x1) #' #' x2 <- labelled_spss(c(1:4, 9), c(good = 1, bad = 5), na_values = 9) #' x2 #' zap_labels(x2) #' #' # zap_labels also works with data frames #' df <- tibble::tibble(x1, x2) #' df #' zap_labels(df) zap_labels <- function(x) { UseMethod("zap_labels") } #' @export zap_labels.default <- function(x) { x } #' @export zap_labels.haven_labelled <- function(x) { attr(x, "labels") <- NULL class(x) <- NULL x } #' @export zap_labels.haven_labelled_spss <- function(x) { x[is.na(x)] <- NA attr(x, "labels") <- NULL attr(x, "na_values") <- NULL attr(x, "na_range") <- NULL class(x) <- NULL x } #' @export zap_labels.data.frame <- function(x) { x[] <- lapply(x, zap_labels) x } haven/NEWS.md0000644000176200001440000004133114102332271012446 0ustar liggesusers
# haven 2.4.3

* Fix build failure on Solaris.

# haven 2.4.2

* Updated to ReadStat 1.1.7 RC (#620).
* `read_dta()` no longer crashes if it sees StrL variables with missing values (@gorcha, #594, #600, #608).
* `write_dta()` now correctly handles "labelled"-class numeric (double) variables that don't have value labels (@jmobrien, #606, #609).
* `write_dta()` now allows variable names up to 32 characters (@sbae, #605).
* Can now correctly combine `labelled_spss()` with identical labels (@gorcha, #599).

# haven 2.4.1

* Fix buglet when combining `labelled()` with identical labels.

# haven 2.4.0

## New features

* `labelled_spss()` gains full vctrs support thanks to the hard work of @gorcha (#527, #534, #538, #557). This means that they should now work seamlessly in dplyr 1.0.0, tidyr 1.0.0 and other packages that use vctrs.
* `labelled()` vectors are more permissive when concatenating; output labels will be a combination of the left-hand and the right-hand side, preferring values assigned to the left-hand side (#543).
* Date-times are no longer forced to UTC, but instead converted to the equivalent UTC (#555).
This should ensure that you see the same date-time in R and in Stata/SPSS/SAS.

## Minor improvements and bug fixes

* Updated to ReadStat 1.1.5. Most importantly this includes support for SAS binary compression.
* `as_factor(levels = "values")` preserves values of unlabelled elements (#570).
* `labelled_spss()` is a little stricter: it prevents `na_range` and `na_values` from containing missing values, and ensures that `na_range` is in the correct order (#574).
* `read_spss()` now reads NA values and ranges of character variables (#409).
* `write_dta()` now correctly writes tagged NAs (including tagged NAs in labels) (#583) and once again validates the length of variable names (#485).
* `write_*()` now validate file and variable metadata with ReadStat. This should prevent many invalid files from being written (#408). Additionally, validation failures now provide more details about the source of the problem (e.g. the column name of the problem) (#463).
* `write_sav(compress = FALSE)` now uses SPSS bytecode compression instead of the rarely-used uncompressed mode. `compress = TRUE` continues to use the newer (and not universally supported, but more compact) zlib format (@oliverbock, #544).

# haven 2.3.1

* Add missing methods so `median()`, `quantile()` and `summary()` work once more (#520).
* Add missing cast methods (#522).

# haven 2.3.0

* `labelled()` gains the necessary support to work seamlessly in dplyr 1.0.0, tidyr 1.0.0, and other packages that use vctrs (@mikmart, #496).
* `labelled()` vectors now explicitly inherit from the corresponding base types (e.g. integer, double, or character) (#509).
* ReadStat update, including `read_sas()` support for "any" encoding (#482), and fixes for compiler warnings.

# haven 2.2.0

## Partial reading

Thanks to the hard work of @mikmart, all `read_*()` functions gain three new arguments that allow you to read in only part of a large file:

* `col_select`: selects columns to read with a tidyselect interface (#248).
* `skip`: skips rows before reading data (#370).
* `n_max`: limits the number of rows to read.

This also brings with it a deprecation: `cols_only` in `read_sas()` has been deprecated in favour of the new `col_select` argument.

## Minor improvements and bug fixes

* `as_factor()` allows non-unique labels when `levels = "labels"`. This fixes a particularly annoying printing bug (#424, @gergness).
* `read_sas()` now supports (IS|E|B)8601(DT|DA|TM) date/time formats (@mikmart).
* All `write_` functions gain a `.name_repair` argument that controls what happens when the input dataset has repeated column names (#436).
* All `write_` functions can now write labelled vectors with `NULL` labels (#442).
* `write_dta()` can now write dataset labels with the `label` argument, which defaults to the `label` attribute of the input data frame, if present (@gorcha, #449).
* `write_dta()` works better with Stata 15, thanks to updated ReadStat (#461).

# haven 2.1.1

* Fixes for R CMD check.

# haven 2.1.0

## Improved labelling

`labelled` objects get pretty printing that shows the labels and NA values when inside of a `tbl_df`. Turn this behaviour off using `options(haven.show_pillar_labels = FALSE)` (#340, @gergness).

`labelled()` and `labelled_spss()` now allow `NULL` labels. This makes both classes more flexible, allowing you to use them for their other attributes (#219).

`labelled()` tests that value labels are unique (@larmarange, #364).

## Minor improvements and bug fixes

* `as_factor()`:
    * Is faster when input doesn't contain any missing values (@hughparsonage).
    * Added `labelled` method for backward compatibility (#414).
    * `data.frame` method now correctly passes `...` along (#407, @zkamvar).
* `write_dta()` now checks that the labelled values are integers, not the values themselves (#401).
* Updated to latest ReadStat from @evanmiller:
    * `read_por()` can now read files from SPSS 25 (#412)
    * `read_por()` now uses base-30 instead of base-10 for the exponent (#413)
    * `read_sas()` can read zero column file (#420)
    * `read_sav()` reads long strings (#381)
    * `read_sav()` has greater memory limit allowing it to read more labels (#418)
    * `read_spss()` reads long variable labels (#422)
    * `write_sav()` no longer creates incorrect column names when >10k columns (#410)
    * `write_sav()` no longer crashes when writing long label names (#395)

# haven 2.0.0

## BREAKING CHANGES

* `labelled()` and `labelled_spss()` now produce objects with class "haven_labelled" and "haven_labelled_spss". Previously, the "labelled" class name clashed with the labelled class defined by Hmisc (#329).

  Unfortunately I couldn't come up with a way to fix this problem except to change the class name; it seems reasonable that haven should be the one to change names given that Hmisc has been around much longer. This will require some changes to packages that use haven, but shouldn't affect user code.

## Minor improvements

* `labelled()` and `labelled_spss()` now support adding the `label` attribute to the resulting object. The `label` is a short, human-readable description of the object, and is now also used when printing, and can be easily removed using the new `zap_label()` function. (#362, @huftis)

  Previously, the `label` attribute was supported both when reading and writing SPSS files, but it was not possible to actually create objects in R having the `label` attribute using the constructors `labelled()` or `labelled_spss()`.

# haven 1.1.2

* haven can read and write non-ASCII paths in R 3.5 (#371).
* `labelled_spss` objects preserve their attributes when subsetted (#360, @gergness).
* `read_sav()` gains an `encoding` argument to override the encoding stored in the file (#305). `read_sav()` can now read `.zsav` files (#338).
* `write_*()` functions now invisibly return the input data frame (as documented) (#349, @austensen).
* `write_dta()` allows non-ASCII variable labels for version 14 and above (#383). It also uses a less strict check for integers so that a labelled double containing only integer values can be written (#343).
* `write_sav()` produces `.zsav` files when `compress = TRUE` (#338).
* `write_xpt()` can now set the "member" name, which defaults to the file name sans extension (#328).
* Update to latest readstat.
    * Fixes out of memory error (#342)
    * Now supports reading and writing Stata 15 files (#339)
    * Negative integer labelled values were tagged as missing (#367)
* Fix for when `as_factor()` with option `levels = "labels"` is used on tagged NAs (#340, @gergness).

# haven 1.1.1

* Update to latest readstat. Includes:
    * SPSS: empty character columns now read as character (#311)
    * SPSS: now write long strings (#266)
    * Stata: reorder labelled vectors on write (#327)
    * Stata: `encoding` now affects value labels (#325)
    * SAS: can now write wide/long rows (#272, #335).
    * SAS: can now handle Windows Vietnamese character set (#336)
* `read_por()` and `read_xpt()` now correctly preserve attributes if output needs to be reallocated (which is typical behaviour) (#313).
* `read_sas()` recognises date/times format with trailing separator and width specifications (#324).
* `read_sas()` gains a `catalog_encoding` argument so you can independently specify encoding of data and catalog (#312).
* `write_*()` correctly measures lengths of non-ASCII labels (#258): this fixes the cryptic error "A provided string value was longer than the available storage size of the specified column."
* `write_dta()` now checks for bad labels in all columns, not just the first (#326).
* `write_sav()` no longer fails on empty factors or factors with an `NA` level (#301) and writes out more metadata for `labelled_spss` vectors (#334).

# haven 1.1.0

* Update to latest readstat.
Includes:
    * SAS: support Win baltic code page (#231)
    * SAS: better error messages instead of crashes (#234, #270)
    * SAS: fix "unable to read error" (#271)
    * SPSS: support uppercase time stamps (#230)
    * SPSS: fixes for 252-255 byte strings (#226)
    * SPSS: fixes for 0 byte strings (#245)
* Share `as_factor()` with forcats package (#256).
* `read_sav()` once again correctly returns system defined missings as `NA` (rather than `NaN`) (#223). `read_sav()` and `write_sav()` preserve SPSS's display widths (@ecortens).
* `read_sas()` gains experimental `cols_only` argument to only read in specified columns (#248).
* tibbles are created with `tibble::as_tibble()`, rather than by "hand" (#229).
* `write_sav()` checks that factors don't have levels with >120 characters (#262).
* `write_dta()` no longer checks that all value labels are at most 32 characters (since this is not a restriction of dta files) (#239).
* All write methods now check that you're trying to write a data frame (#287).
* Add support for reading (`read_xpt()`) and writing (`write_xpt()`) SAS transport files.
* `write_*` functions turn ordered factors into labelled vectors (#285).

# haven 1.0.0

* The ReadStat library is stored in a subdirectory of `src` (#209, @krlmlr).
* Import tibble so that tibbles are printed consistently (#154, @krlmlr).
* Update to latest ReadStat (#65). Includes:
    * Support for binary (aka Ross) compression for SAS (#31).
    * Support extended ASCII encoding for Stata (#71).
    * Support for Stata 14 files (#75, #212).
    * Support for SPSS value labels with more than 8 characters (#157).
    * More likely to get an error when attempting to create an invalid output file (#171).
* Added support for reading and writing variable formats. Similarly to variable labels, formats are stored as an attribute on the vector. Use `zap_formats()` if you want to remove these attributes. (@gorcha, #119, #123).
* Added support for reading file "label" and "notes".
These are not currently printed, but are stored in the attributes if you need to access them (#186).
* Added support for "tagged" missing values (in Stata these are called "extended" and in SAS these are called "special") which carry an extra byte of information: a character label from "a" to "z". The downside of this change is that all integer columns are now converted to doubles, to support the encoding of the tag in the payload of a NaN.
* New `labelled_spss()` is a subclass of `labelled()` that can model user missing values from SPSS. These can either be a set of distinct values, or for numeric vectors, a range. `zap_labels()` strips labels, and replaces user-defined missing values with `NA`. New `zap_missing()` just replaces user-defined missing values with `NA`.

  `labelled_spss()` is potentially dangerous to work with in R because base functions don't know about `labelled_spss()` vectors, so they will return the wrong result in the presence of user-defined missing values. For this reason, they will only be created by `read_spss()` when `user_na = TRUE` (normally user-defined missings are converted to NA).
* `as_factor()` no longer drops the `label` attribute (variable label) when used (#177, @itsdalmo).
* Using `as_factor()` with `levels = "default"` or `levels = "both"` preserves unused labels (implicit missing) when converting (#172, @itsdalmo). Labels (and the resulting factor levels) are always sorted by values.
* `as_factor()` gains a new `levels = "default"` mechanism. This uses the labels where present, and otherwise uses the values. This is now the default, as it seems to map better to the semantics of labelled values in other statistical packages (#81). You can also use `levels = "both"` to combine the value and the label into a single string (#82). It also gains a method for data frames, so you can easily convert every labelled column to a factor in one function call.
* New `vignette("semantics", package = "haven")` discusses the semantics of missing values and labelling in SAS, SPSS, and Stata, and how they are translated into R.
* Support for `hms()` has been moved into the hms package (#162). Time variables now have class `c("hms", "difftime")` and a `units` attribute with value "secs" (#162).
* `labelled()` is less strict with its checks: you can mix double and integer values and labels (#86, #110, @lionel-), and `is.labelled()` is now exported (#124). Putting a labelled vector in a data frame now generates the correct column name (#193).
* `read_dta()` now recognises "%d" and custom date types (#80, #130). It also gains an encoding parameter which you can use to override the default encoding. This is particularly useful for Stata 13 and below which did not store the encoding used in the file (#163).
* `read_por()` now actually works (#35).
* `read_sav()` now correctly recognises EDATE and JDATE formats as dates (#72). Variables with format DATE, ADATE, EDATE, JDATE or SDATE are imported as `Date` variables instead of `POSIXct`. You can now set `user_na = TRUE` to preserve user defined missing values: they will be given class `labelled_spss`.
* `read_dta()`, `read_sas()`, and `read_sav()` have a better test for missing string values (#79). They can all read from connections and compressed files (@lionel-, #109).
* `read_sas()` gains an encoding parameter to override the encoding stored in the file if it is incorrect (#176). It gets better argument names (#214).
* Added `type_sum()` method for labelled objects so they print nicely in tibbles.
* `write_dta()` now verifies that variable names are valid Stata variables (#132), and throws an error if you attempt to save a labelled vector that is not an integer (#144). You can choose which `version` of Stata's file format to output (#217).
* New `write_sas()` allows you to write data frames out to `sas7bdat` files. This is still somewhat experimental.
* `write_sav()` writes hms variables to SPSS time variables, and the "measure" type is set for each variable (#133).
* `write_dta()` and `write_sav()` support writing date and date/times (#25, #139, #145). Labelled values are always converted to UTF-8 before being written out (#87). Infinite values are now converted to missing values since SPSS and Stata don't support them (#149). Both use a better test for missing values (#70).
* `zap_labels()` has been completely overhauled. It now works (@markriseley, #69), and only drops label attributes; it no longer replaces labelled values with `NA`s. It also gains a data frame method that zaps the labels from every column.
* `print.labelled()` and `print.labelled_spss()` now display the type.

# haven 0.2.0

* Fixed a bug in `as_factor.labelled`, which generated `<NA>`s and wrong labels for integer labels.
* `zap_labels()` now leaves unlabelled vectors unchanged, making it easier to apply to all columns.
* `write_dta()` and `write_sav()` take more care to always write output as UTF-8 (#36).
* `write_dta()` and `write_sav()` won't crash if you give them invalid paths, and you can now use `~` to refer to your home directory (#37).
* Byte variables are now correctly read into integers (not strings, #45), and missing values are captured correctly (#43).
* Added `read_stata()` as alias to `read_dta()` (#52).
* `read_spss()` uses extension to automatically choose between `read_sav()` and `read_por()` (#53).
* Updates from ReadStat. Including fixes for various parsing bugs, more encodings, and better support for large files.
* hms objects deal better with missings when printing.
* Fixed bug causing labels for numeric variables to be read in as integers and associated error: ``Error: `x` and `labels` must be same type``

# haven 0.1.1

* Fixed memory initialisation problems found by valgrind.
haven/MD50000644000176200001440000002733714102416217011675 0ustar liggesusers11702572f0b0b285b80073996f4b3eb6 *DESCRIPTION 59057db2fcc7f8f6bb930846cc7ae761 *LICENSE dfe4066967cfd262ba7e3a29a22d8d64 *NAMESPACE e599f9a526580cc32a7c261344a275b9 *NEWS.md 278df3d095bceb77de9785e8a3fcbfeb *R/as_factor.R 7fce9c2a647bdd4b58b5b9910035c9eb *R/cpp11.R cc3e26ea996c8c730077078713878024 *R/haven-package.R ec0ebd30d6a0d77b17c39c7b9060ee03 *R/haven-sas.R a510d03182e66358e6f70ddc0dca7b60 *R/haven-spss.R d43a586129e7429744224cc5c6122601 *R/haven-stata.R 6c0eaa712c0eed7ee15c2736129db567 *R/labelled-pillar.R 049a752896146a91eaa0948f51c68419 *R/labelled.R 3b176806635d0da0cd95586a349dca83 *R/labelled_spss.R 7f11d00351f491ad5ddac26093462273 *R/tagged_na.R 5bdcef0542cf1adf95968058cdde71a3 *R/update.R f4321b9fca2d0425fd401e5e6dc896be *R/utils.R 9e70ca4f83cd0ee16e8750f4fcfcfb93 *R/zap_empty.R 370b4c48e52079424321e0fddb718d58 *R/zap_formats.R 8b9d78179cd6898a03feacc5f3da5433 *R/zap_label.R d7297968a710b5ed58c21c717c661dbd *R/zap_labels.R 88b0c02d00c12eb30479953c815329dd *R/zap_missing.R 2e572376deee0b066ce9ed926df35586 *R/zap_widths.R 7fc3339c794878bdf42aad17d9aa974c *R/zzz.R ed2824ee1db977a8f588f131856c292b *README.md 3bb1fc806fc7635fd4cc01d2cce44628 *build/vignette.rds 2e217a3f5ce3af05183d41b049e3ac28 *inst/doc/datetimes.Rmd e52664898d3fb76be561abc0f3273a4d *inst/doc/datetimes.html d65caa7be28eceb46483900f58d129db *inst/doc/semantics.R 7597ff3a37dd05812a625a401f75ab60 *inst/doc/semantics.Rmd bbd0ea5fb40eb71d2ccc499a8485fa3b *inst/doc/semantics.html 782776cdad132bd02616ab5e0ddbb1a6 *inst/examples/iris.dta 6d7292019b3784d97ba2e2ced4ce5bac *inst/examples/iris.sas7bdat 9eec419726af6bb92009b25c5224e32b *inst/examples/iris.sav af144a86f0e446ef4c79743eaf1f206e *man/as_factor.Rd 0fa3558f0a322190305e9aa9c560305e *man/figures/logo.png ed064140f8cfec0f8ef2f8263cebf293 *man/haven-package.Rd e491b7492902efc645fcd7247819ae03 *man/labelled.Rd 6081aa82c0dfb19117160128f62c2dd8 *man/labelled_spss.Rd 
24caa8b914409540e21ab4643fa6b3c5 *man/print_labels.Rd 5c39b5842f80a4b465f348b3bbbd28a8 *man/read_dta.Rd 8d59255fccfa0dde7094655b964b72e6 *man/read_sas.Rd a1bf3526fcca9047cb1cafd5ecaa43f7 *man/read_spss.Rd c4b56015785263c34c72d13a03db03af *man/read_xpt.Rd 63677c09f81f127d6d177f1578a51899 *man/tagged_na.Rd eda687561f95b4621502d83010ad03dc *man/vec_arith.haven_labelled.Rd 4444694fd93a620e651ad56e31057d1b *man/zap_empty.Rd cee7653156802db60b699a543005cbd7 *man/zap_formats.Rd 6cdf8da24c21fdcb11b6ff38f5814c0c *man/zap_label.Rd 59bd668d2810d87baf82bbc7961c6aa1 *man/zap_labels.Rd 769d5161d295987b7c3bb18c61c24d5c *man/zap_missing.Rd 1a288019a7882014999286262ecbf19d *man/zap_widths.Rd a871575e9cbd45525e06bf847c34d62c *src/DfReader.cpp dccefb780b00a856517e540c930548f5 *src/DfWriter.cpp 3d025c60416626c4809439175bd65e2f *src/Makevars d07eac277f23601815fa6163b3a47ebb *src/Makevars.win 5e01ba2214a4d7a461c4ce116b48f1a2 *src/cpp11.cpp 8479a5125b7e2ff4fd0ee78207eb1e79 *src/haven_types.cpp e02d7d3096abb42d59ffa98896ea6f30 *src/haven_types.h 103dff612cd33e793e40e7248ad66baa *src/readstat/CKHashTable.c 491bcc72997b8fda9127f32362515309 *src/readstat/CKHashTable.h bf3e2d382bcd8d3db7df44f3fd327563 *src/readstat/LICENSE 548ac1f6fd81c79230fc840aef448537 *src/readstat/NEWS 08628913656834cf3079d1dc746e4128 *src/readstat/readstat.h 91afb7a9b9258fa50c7066b55ea97999 *src/readstat/readstat_bits.c ee0bb989382f7fde0b4ecd721c63437a *src/readstat/readstat_bits.h 2a3cffc8f3b29795c1bf0570591ce7e0 *src/readstat/readstat_convert.c 94288276f3982ee9834e010e151d6645 *src/readstat/readstat_convert.h 61468195b63500e28cb9df6d3e1bc5d5 *src/readstat/readstat_error.c 2935273dde775fb72f077e45a259337f *src/readstat/readstat_iconv.h 24f4b501fef49f6534afe97e13ede650 *src/readstat/readstat_io_unistd.c 6d20f5534b536f5fbd48480cb1df9a7a *src/readstat/readstat_io_unistd.h 5d8a61f1413683fc00f56388c8fc1d90 *src/readstat/readstat_malloc.c 1e19c96593352dfd6acf55fd290101f4 *src/readstat/readstat_malloc.h 
cc8ab759021ae0b60db5406785672490 *src/readstat/readstat_metadata.c 164bf3ab50223b1fa8da29a967875ff7 *src/readstat/readstat_parser.c 6a4cde4107088907ed746c200fe69be7 *src/readstat/readstat_strings.h ad5546731948373135e21a00a9b71294 *src/readstat/readstat_value.c 8c4e3303f75c6d8dcd0032ba4a453bc3 *src/readstat/readstat_variable.c 8469e8f0d7929c74ce49bdfee7aa9bf0 *src/readstat/readstat_writer.c 9613047f750a4812cef0674584c5f12d *src/readstat/readstat_writer.h 70b27ed1d71dc01c682c05ae8ef25440 *src/readstat/sas/ieee.c a9f00e5b895054ef83c9bf82eb8f257f *src/readstat/sas/ieee.h 74983b99dc9d3d7c3d70ece02048ade2 *src/readstat/sas/readstat_sas.c 269da3f8e7515c7746b0dcb87cc2f74a *src/readstat/sas/readstat_sas.h 6533c74947e64c6a94364a85ab76c196 *src/readstat/sas/readstat_sas7bcat_read.c 4190e1b6c322e28cb4a7788d0dfd3ad9 *src/readstat/sas/readstat_sas7bcat_write.c c5c1869728e57bbedd480de9129a07af *src/readstat/sas/readstat_sas7bdat_read.c 4f924014410260e45619628dc3da49a7 *src/readstat/sas/readstat_sas7bdat_write.c 2e588ecf8b2d2079783c632bbebbc330 *src/readstat/sas/readstat_sas_rle.c a54f285ec408075cfd4f16f30fc4dc8e *src/readstat/sas/readstat_sas_rle.h 10f0fb38bd48b60a82a1123c3d9ecfaf *src/readstat/sas/readstat_xport.c 1b1cf645d9e9e2a0704cbb2d46dca15c *src/readstat/sas/readstat_xport.h 04788ddf12683dec83803db83fa9f9f0 *src/readstat/sas/readstat_xport_read.c d87b6b6d3574cd88d2b05820e4b74019 *src/readstat/sas/readstat_xport_write.c 7d73ca9f1c1458eca16f0cfb82824188 *src/readstat/spss/readstat_por.c 366840304a855d4901ae0b30f65c6835 *src/readstat/spss/readstat_por.h 39debfe6648b1a3e4b8986a47fefc60a *src/readstat/spss/readstat_por_parse.c 8fef965008763eac54d94438d33f7408 *src/readstat/spss/readstat_por_parse.h 5883eb91f972c7749322857c4e02d570 *src/readstat/spss/readstat_por_parse.rl ea0e57265b01821821e4c8896fafeb73 *src/readstat/spss/readstat_por_read.c da5a6b1f7d245248e6a35274c4132e5c *src/readstat/spss/readstat_por_write.c ac54ef1a89ec422b9c5eabc2cced87cd 
*src/readstat/spss/readstat_sav.c 5a95cb372899834027dba1c44fa8bb1c *src/readstat/spss/readstat_sav.h fedb859ccd21066ee484794d18db2e5b *src/readstat/spss/readstat_sav_compress.c 36c3cf444f8bd315b2df9241ff4bd86c *src/readstat/spss/readstat_sav_compress.h fc6cbaaa120bd9b6c422409d993d2ddb *src/readstat/spss/readstat_sav_parse.c 0153d271c5d495b5977057d9c19d9f01 *src/readstat/spss/readstat_sav_parse.h 370e8663842a9cc61dc91a93df984cd6 *src/readstat/spss/readstat_sav_parse.rl 05cbfd44c26c15a392607cc82a0d0037 *src/readstat/spss/readstat_sav_parse_timestamp.c 204b84a989b74ca7b4aad4205b67c357 *src/readstat/spss/readstat_sav_parse_timestamp.h ebceeb91b0b09cf5aa6f169e7305d1ca *src/readstat/spss/readstat_sav_parse_timestamp.rl 9e35f19b884566c818627d04e2cca668 *src/readstat/spss/readstat_sav_read.c 98c4a55cbea891ec34dae1e319f2fbaf *src/readstat/spss/readstat_sav_write.c 0af090ab65b13eb000cbf41485ea7a3f *src/readstat/spss/readstat_spss.c 7b9fd9b57ab7ba821f42a40880ebe11a *src/readstat/spss/readstat_spss.h 5ee7735c999de12bb6affe6a95df05e6 *src/readstat/spss/readstat_spss_parse.c 751f5e801cfb6e74dd796bd3c6b032c2 *src/readstat/spss/readstat_spss_parse.h bdb3b2546c0d8509f7b1a39ad6fbf8e4 *src/readstat/spss/readstat_spss_parse.rl 73403c81c103658ee4cc8f03f26a59e2 *src/readstat/spss/readstat_zsav_compress.c 15e03262560b5aee8e36da740b2502bb *src/readstat/spss/readstat_zsav_compress.h 629353f14b8e6fdb7bf59d1570934094 *src/readstat/spss/readstat_zsav_read.c 7c52e4ada261187caed0e1c7375dd4a5 *src/readstat/spss/readstat_zsav_read.h 51073a1bde4e76bbd619680686b7c2e8 *src/readstat/spss/readstat_zsav_write.c 6b58bce5b4eba960ba0ba215b6d15e30 *src/readstat/spss/readstat_zsav_write.h 28f3bcb3658b37986bc0c82d3d735c7a *src/readstat/stata/readstat_dta.c 9dfb84582049244cfbbd9090cd72e446 *src/readstat/stata/readstat_dta.h d5f3933987e4372756c63c0459f545de *src/readstat/stata/readstat_dta_parse_timestamp.c cf98cd82614330671b97dc2ffa92b8f3 *src/readstat/stata/readstat_dta_parse_timestamp.h 
27e936432e52694c55ce40db34b10a90 *src/readstat/stata/readstat_dta_parse_timestamp.rl 18b61f029a3ef8f8b1ca4794ddab8c15 *src/readstat/stata/readstat_dta_read.c 3a6bc66057111d51dcb41665259f2377 *src/readstat/stata/readstat_dta_write.c a1072d79ee57035032acde3a021aca5f *src/readstat/txt/commands_util.c 4877d5b8f9f088c20fb19ce82837a18c *src/readstat/txt/commands_util.h a59afac1ff703b431f428116606aff95 *src/readstat/txt/readstat_copy.c 6826795c794ebf1ec4a6b47cd7dfff8b *src/readstat/txt/readstat_copy.h 5c1d7b30366774d40fe2ca51b4ac4809 *src/readstat/txt/readstat_sas_commands_read.c caed984face895d9f9edc2286cc61001 *src/readstat/txt/readstat_sas_commands_read.rl 199d23918573dc6c3afc5f394f1e154a *src/readstat/txt/readstat_schema.c 1510a9f2909f3531b881eae8a7bc01d7 *src/readstat/txt/readstat_schema.h 1993ff3ebfd1958408541b266367ca8b *src/readstat/txt/readstat_spss_commands_read.c f3e7c4def02989c448390f2d9172aeac *src/readstat/txt/readstat_spss_commands_read.rl 5f0adbe49e320ab8cf720197eb95ea2b *src/readstat/txt/readstat_stata_dictionary_read.c 2308d72a35a2d16f988875a07fac1c35 *src/readstat/txt/readstat_stata_dictionary_read.rl f6c9c245bceeb238d2fdef2952dee0df *src/readstat/txt/readstat_txt_read.c aae161361ab10ceb021df485c8cda71b *src/tagged_na.c f3a180e741a675c27b36d4ce86e1922e *src/tagged_na.h 130ef81e2dee1f48002137a5809e7765 *tests/testthat.R e74d679bdb1e873221c361463545b397 *tests/testthat/_snaps/haven-sas.md 0929758e50e6ac5ec7dfa542bf8928fe *tests/testthat/_snaps/haven-spss.md fb605bbcb254b8c1af40c7ae7385684f *tests/testthat/_snaps/haven-stata.md 3855904ed9135df6ae3c6f79c5c3797a *tests/testthat/_snaps/labelled-pillar.md 63534f56943e97339fcd553f17aeba63 *tests/testthat/_snaps/labelled.md aa0a1ff0b1a948a200528982d6e6b3fe *tests/testthat/_snaps/labelled_spss.md 7e6c8338b53c0d14f6e9932a27ac7d0f *tests/testthat/_snaps/tagged_na.md 20eca1176892ac88ce4453a00692246c *tests/testthat/helper-roundtrip.R fbadd9b4b37370561380b3ed6a03f1c3 *tests/testthat/helpers-types.R 
ae0a03c2f46d9d09e4888ee98f7faccc *tests/testthat/sas/datetime.sas7bdat f74b89b03972dd66a8ee7e253d91842f *tests/testthat/sas/formats.sas7bcat c7d024dc776bd1891da13439799f8e9b *tests/testthat/sas/hadley.sas7bdat afe9293b155fbd51fd20e0c8dd84927a *tests/testthat/sas/hadley.zip fb6f2b8e08c1ff0564e3fdcf4d2a824b *tests/testthat/sas/tagged-na.sas7bcat 3c7429bbf497edf98dc73b193420b56c *tests/testthat/sas/tagged-na.sas7bdat 3da136a3cf1086d4ec7132badfc3f9e0 *tests/testthat/spss/datetime.sav 1df50141e26e230a726ff6d76da3583d *tests/testthat/spss/labelled-num-na.sav 0446f805dc7bb9273b01d5e19ec042f7 *tests/testthat/spss/labelled-num.sav 8866383dd5d2c61f89270fbe4a6802a0 *tests/testthat/spss/labelled-str.sav 2cb2c744d5da2db3f373f4a5dd1ae86e *tests/testthat/spss/umlauts.sav 556e6b180ab9777783d8ea55a37e6178 *tests/testthat/spss/variable-label.sav a906d1b8e57ae2fbdd5cc3bfbe84757c *tests/testthat/stata/datetime-d.dta b4963a6dcfd5ea880600fcf70b997215 *tests/testthat/stata/notes.dta 69eef7e56b7f7137bbd7f0e6d3526eb2 *tests/testthat/stata/tagged-na-double.dta b5faae7735cdbdea22666cab82af6444 *tests/testthat/stata/tagged-na-int.dta 5781ee0b0dd56a2d8cb1ac287731b8f0 *tests/testthat/stata/types.dta 77d2a85d9b0ce7f0ee15af282924fdfc *tests/testthat/test-as_factor.R aff6de6713555005e75fc9808d564485 *tests/testthat/test-haven-sas.R ccd699fd36baf78020a93dd8ba4cb7e3 *tests/testthat/test-haven-spss.R c9623a37b585087b7b1b52e098574ce3 *tests/testthat/test-haven-stata.R d761dc392c979fb4bc191f7549ac86b5 *tests/testthat/test-labelled-pillar.R 08647e84dc6ca1bb41bc01e07db6d7f9 *tests/testthat/test-labelled.R c1dc9ae49359f943843dbeee00d159cd *tests/testthat/test-labelled_spss.R d611a2a8dd215709917e62a1c138b3f4 *tests/testthat/test-tagged_na.R 7fb6cf2475f91e4885eb4c26a30f8575 *tests/testthat/test-zap-empty.R 6c6e770bc46558cf18d0b25c55fd82d1 *tests/testthat/test-zap_label.R 07baede625e7ce796cb517409a6d2634 *tests/testthat/test-zap_labels.R 2b6e366cd86f2ee0ec9518a6b28c936c *tests/testthat/test-zap_missing.R 
ac8051fb05b17063d2beec3c729a777c *tests/testthat/test-zap_widths.R 2e217a3f5ce3af05183d41b049e3ac28 *vignettes/datetimes.Rmd 7597ff3a37dd05812a625a401f75ab60 *vignettes/semantics.Rmd
haven/inst/0000755000176200001440000000000014102332322012320 5ustar liggesusers
haven/inst/examples/0000755000176200001440000000000014033646021014145 5ustar liggesusers
haven/inst/examples/iris.sav0000644000176200001440000001504214033646021015630 0ustar liggesusers
haven/inst/examples/iris.sas7bdat0000644000176200001440000040000014033646021016537 0ustar liggesusers
haven/inst/examples/iris.dta0000644000176200001440000002002514033646021015604 0ustar liggesusers
haven/inst/doc/0000755000176200001440000000000014102332322013065 5ustar liggesusers
haven/inst/doc/semantics.Rmd0000644000176200001440000001445314033646021015535 0ustar liggesusers
---
title: "Conversion semantics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Conversion semantics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
library(haven)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

There are some differences between the way that R, SAS, SPSS, and Stata represent labelled data and missing values. While SAS, SPSS, and Stata share some obvious similarities, R is a little different. This vignette explores the differences, and shows you how haven bridges the gap.

## Value labels

Base R has one data type that effectively maintains a mapping between integers and character labels: the factor. This, however, is not the primary use of factors: they are instead designed to automatically generate useful contrasts for linear models. Factors differ from the labelled values provided by the other tools in important ways:

* SPSS and SAS can label numeric and character values, not just integer values.

* The value labels do not need to be exhaustive. It is common to label the special missing values (e.g. `.D` = did not respond, `.N` = not applicable), while leaving other values as is.

Value labels in SAS are a little different again. In SAS, labels are just a special case of general formats. Formats include currencies and dates, but user-defined formats just assign labels to individual values (including special missing values). Formats have names and exist independently of the variables they are associated with. You create a named format with `PROC FORMAT` and then associate it with variables in a `DATA` step (the names of character formats always start with `$`).

### `labelled()`

To allow you to import labelled vectors into R, haven provides the S3 labelled class, created with `labelled()`.
This class allows you to associate arbitrary labels with numeric or character vectors:

```{r}
x1 <- labelled(
  sample(1:5),
  c(Good = 1, Bad = 5)
)
x1

x2 <- labelled(
  c("M", "F", "F", "F", "M"),
  c(Male = "M", Female = "F")
)
x2
```

The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate data structure that you can convert into a regular R data frame. You can do this by either converting to a factor or stripping the labels:

```{r}
as_factor(x1)
zap_labels(x1)

as_factor(x2)
zap_labels(x2)
```

See the documentation for `as_factor()` for more options to control exactly what the factor uses for levels.

Both `as_factor()` and `zap_labels()` have data frame methods if you want to apply the same strategy to every column in a data frame:

```{r}
df <- tibble::tibble(x1, x2, z = 1:5)
df

zap_labels(df)
as_factor(df)
```

## Missing values

All three tools provide a global "system missing value" which is displayed as `.`. This is roughly equivalent to R's `NA`, although neither Stata nor SAS propagates missingness in numeric comparisons: SAS treats the missing value as the smallest possible number (i.e. `-inf`), and Stata treats it as the largest possible number (i.e. `inf`).

Each tool also provides a mechanism for recording multiple types of missingness:

* Stata has "extended" missing values, `.A` through `.Z`.

* SAS has "special" missing values, `.A` through `.Z` plus `._`.

* SPSS has per-column "user" missing values. Each column can declare up to three distinct values or a range of values (plus one distinct value) that should be treated as missing.

Stata and SAS only support tagged missing values for numeric columns. SPSS supports up to three distinct values for character columns. Generally, operations involving a user-missing type return a system missing value.
Haven models these missing values in two different ways:

* For SAS and Stata, haven provides "tagged" missing values which extend R's regular `NA` to add a single character label.

* For SPSS, haven provides a subclass of `labelled` that also provides user-defined values and ranges.

### Tagged missing values

To support Stata's extended and SAS's special missing values, haven implements a tagged NA. It does this by taking advantage of the internal structure of a floating point NA. That allows these values to behave identically to NA in regular R operations, while still preserving the value of the tag.

The R interface for creating tagged NAs is a little clunky because generally they'll be created by haven for you. But you can create your own with `tagged_na()`:

```{r}
x <- c(1:3, tagged_na("a", "z"), 3:1)
x
```

Note these tagged NAs behave identically to regular NAs, even when printing. To see their tags, use `print_tagged_na()`:

```{r}
print_tagged_na(x)
```

To test if a value is a tagged NA, use `is_tagged_na()`, and to extract the value of the tag, use `na_tag()`:

```{r}
is_tagged_na(x)
is_tagged_na(x, "a")

na_tag(x)
```

My expectation is that tagged missings are most often used in conjunction with labels (described above), so labelled vectors print the tags for you, and `as_factor()` knows how to relabel:

```{r}
y <- labelled(x, c("Not home" = tagged_na("a"), "Refused" = tagged_na("z")))
y

as_factor(y)
```

### User defined missing values

SPSS's user-defined values work differently to SAS and Stata. Each column can have either up to three distinct values that are considered as missing, or a range. Haven provides `labelled_spss()` as a subclass of `labelled()` to model these additional user-defined missings.
```{r}
x1 <- labelled_spss(c(1:10, 99), c(Missing = 99), na_value = 99)
x2 <- labelled_spss(c(1:10, 99), c(Missing = 99), na_range = c(90, Inf))

x1
x2
```

These objects are somewhat dangerous to work with in R because most R functions don't know those values are missing:

```{r}
mean(x1)
```

Because of that danger, the default behaviour of `read_spss()` is to return regular labelled objects where user-defined missing values have been converted to `NA`s. To get `read_spss()` to return `labelled_spss()` objects, you'll need to set `user_na = TRUE`.

I've defined an `is.na()` method so you can find them yourself:

```{r}
is.na(x1)
```

And the presence of that method does mean many functions with an `na.rm` argument will work correctly:

```{r}
mean(x1, na.rm = TRUE)
```

But generally you should either convert to a factor, convert to regular missing values, or strip all the labels:

```{r}
as_factor(x1)
zap_missing(x1)
zap_labels(x1)
```
haven/inst/doc/semantics.R0000644000176200001440000000347314102332322015205 0ustar liggesusers
## ---- include = FALSE---------------------------------------------------------
library(haven)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

## -----------------------------------------------------------------------------
x1 <- labelled(
  sample(1:5),
  c(Good = 1, Bad = 5)
)
x1

x2 <- labelled(
  c("M", "F", "F", "F", "M"),
  c(Male = "M", Female = "F")
)
x2

## -----------------------------------------------------------------------------
as_factor(x1)
zap_labels(x1)

as_factor(x2)
zap_labels(x2)

## -----------------------------------------------------------------------------
df <- tibble::tibble(x1, x2, z = 1:5)
df

zap_labels(df)
as_factor(df)

## -----------------------------------------------------------------------------
x <- c(1:3, tagged_na("a", "z"), 3:1)
x

## -----------------------------------------------------------------------------
print_tagged_na(x)

## -----------------------------------------------------------------------------
is_tagged_na(x)
is_tagged_na(x, "a")

na_tag(x)

## -----------------------------------------------------------------------------
y <- labelled(x, c("Not home" = tagged_na("a"), "Refused" = tagged_na("z")))
y

as_factor(y)

## -----------------------------------------------------------------------------
x1 <- labelled_spss(c(1:10, 99), c(Missing = 99), na_value = 99)
x2 <- labelled_spss(c(1:10, 99), c(Missing = 99), na_range = c(90, Inf))

x1
x2

## -----------------------------------------------------------------------------
mean(x1)

## -----------------------------------------------------------------------------
is.na(x1)

## -----------------------------------------------------------------------------
mean(x1, na.rm = TRUE)

## -----------------------------------------------------------------------------
as_factor(x1)
zap_missing(x1)
zap_labels(x1)

haven/inst/doc/datetimes.Rmd0000644000176200001440000000243014101766302015517 0ustar liggesusers
---
title: "Dates and times"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Dates and times}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Formats

There are three common formats across SAS, SPSS and Stata.

Date (number of days):

* SAS: MMDDYY, DDMMYY, YYMMDD, DATE
* SPSS: n/a
* Stata: %td

Time (number of seconds):

* SAS: TIME, HHMM, TOD
* SPSS: TIME, DTIME
* Stata: n/a

DateTime (number of seconds):

* SAS: DATETIME
* SPSS: DATE, ADATE, SDATE, DATETIME (as milliseconds)
* Stata: %tc, %tC

## Offsets

Dates and date times use a different offset from R:

* SAS: 1960-01-01 (`r -as.integer(as.Date("1960-01-01"))` days)
* SPSS: 1582-10-14 (`r -as.integer(as.Date("1582-10-14"))` days)
* Stata: 1960-01-01 (`r -as.integer(as.Date("1960-01-01"))` days)

haven/inst/doc/semantics.html0000644000176200001440000011004214102332322015737 0ustar liggesusers
Conversion semantics

Conversion semantics

There are some differences between the way that R, SAS, SPSS, and Stata represent labelled data and missing values. While SAS, SPSS, and Stata share some obvious similarities, R is a little different. This vignette explores the differences, and shows you how haven bridges the gap.

Value labels

Base R has one data type that effectively maintains a mapping between integers and character labels: the factor. This, however, is not the primary use of factors: they are instead designed to automatically generate useful contrasts for linear models. Factors differ from the labelled values provided by the other tools in important ways:

  • SPSS and SAS can label numeric and character values, not just integer values.

  • The value labels do not need to be exhaustive. It is common to label the special missing values (e.g. .D = did not respond, .N = not applicable), while leaving other values as is.

Value labels in SAS are a little different again. In SAS, labels are just a special case of general formats. Formats include currencies and dates, but user-defined formats just assign labels to individual values (including special missing values). Formats have names and exist independently of the variables they are associated with. You create a named format with PROC FORMAT and then associate it with variables in a DATA step (the names of character formats always start with $).

labelled()

To allow you to import labelled vectors into R, haven provides the S3 labelled class, created with labelled(). This class allows you to associate arbitrary labels with numeric or character vectors:

x1 <- labelled(
  sample(1:5), 
  c(Good = 1, Bad = 5)
)
x1
#> <labelled<integer>[5]>
#> [1] 4 3 1 2 5
#> 
#> Labels:
#>  value label
#>      1  Good
#>      5   Bad

x2 <- labelled(
  c("M", "F", "F", "F", "M"), 
  c(Male = "M", Female = "F")
)
x2
#> <labelled<character>[5]>
#> [1] M F F F M
#> 
#> Labels:
#>  value  label
#>      M   Male
#>      F Female
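
A labelled vector keeps its value labels, and optionally a variable label, as attributes on an otherwise ordinary vector. The sketch below shows one way to inspect them; the `score` vector and its "Satisfaction score" label are made up for illustration and are not part of the original vignette.

```r
library(haven)

# Hypothetical example data: a 1-5 score where only the endpoints are labelled
score <- labelled(
  c(4, 3, 1, 2, 5),
  labels = c(Good = 1, Bad = 5),
  label = "Satisfaction score"
)

# The value labels are a named vector: names are labels, elements are values
value_labels <- attr(score, "labels")

# The variable label is a single string
variable_label <- attr(score, "label")

# is.labelled() tests for the class; print_labels() shows only the label table
is.labelled(score)
print_labels(score)
```

Because the labels are only attributes, the underlying values (here doubles) are left untouched.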

The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate data structure that you can convert into a regular R data frame. You can do this by either converting to a factor or stripping the labels:

as_factor(x1)
#> [1] 4    3    Good 2    Bad 
#> Levels: Good 2 3 4 Bad
zap_labels(x1)
#> [1] 4 3 1 2 5

as_factor(x2)
#> [1] Male   Female Female Female Male  
#> Levels: Female Male
zap_labels(x2)
#> [1] "M" "F" "F" "F" "M"

See the documentation for as_factor() for more options to control exactly what the factor uses for levels.
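
As a rough sketch of those options, the `levels` and `ordered` arguments of `as_factor()` control how a labelled vector becomes a factor. The vector `x` below is made up for illustration, and the comments describe the behaviour as I understand it from the `as_factor()` documentation.

```r
library(haven)

# Hypothetical example data: only 1 and 5 carry labels
x <- labelled(c(4, 3, 1, 2, 5), c(Good = 1, Bad = 5))

# Use the underlying values as levels and ignore the labels
by_value <- as_factor(x, levels = "values")

# Use only the labels as levels; unlabelled values become NA
by_label <- as_factor(x, levels = "labels")

# Combine value and label in each labelled level
both <- as_factor(x, levels = "both")

# Ask for an ordered factor instead of an unordered one
as_ordered <- as_factor(x, ordered = TRUE)

levels(by_value)
levels(by_label)
levels(both)
```

Using `levels = "labels"` is only safe when every value is labelled, since anything without a label is silently turned into `NA`.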

Both as_factor() and zap_labels() have data frame methods if you want to apply the same strategy to every column in a data frame:

df <- tibble::tibble(x1, x2, z = 1:5)
df
#> # A tibble: 5 x 3
#>          x1 x2             z
#>   <int+lbl> <chr+lbl>  <int>
#> 1  4        M [Male]       1
#> 2  3        F [Female]     2
#> 3  1 [Good] F [Female]     3
#> 4  2        F [Female]     4
#> 5  5 [Bad]  M [Male]       5

zap_labels(df)
#> # A tibble: 5 x 3
#>      x1 x2        z
#>   <int> <chr> <int>
#> 1     4 M         1
#> 2     3 F         2
#> 3     1 F         3
#> 4     2 F         4
#> 5     5 M         5
as_factor(df)
#> # A tibble: 5 x 3
#>   x1    x2         z
#>   <fct> <fct>  <int>
#> 1 4     Male       1
#> 2 3     Female     2
#> 3 Good  Female     3
#> 4 2     Female     4
#> 5 Bad   Male       5

Missing values

All three tools provide a global “system missing value”, which is displayed as a single period (.). This is roughly equivalent to R’s NA, although neither Stata nor SAS propagates missingness in numeric comparisons: SAS treats the missing value as the smallest possible number (i.e. -inf), and Stata treats it as the largest possible number (i.e. inf).

Each tool also provides a mechanism for recording multiple types of missingness:

  • Stata has “extended” missing values, .A through .Z.

  • SAS has “special” missing values, .A through .Z plus ._.

  • SPSS has per-column “user” missing values. Each column can declare up to three distinct values or a range of values (plus one distinct value) that should be treated as missing.

Stata and SAS only support tagged missing values for numeric columns. SPSS supports up to three distinct values for character columns. Generally, operations involving a user-missing type return a system missing value.

Haven models these missing values in two different ways:

  • For SAS and Stata, haven provides “tagged” missing values which extend R’s regular NA to add a single character label.

  • For SPSS, haven provides a subclass of labelled that also provides user defined values and ranges.

Tagged missing values

To support Stata’s extended and SAS’s special missing values, haven implements a tagged NA. It does this by taking advantage of the internal structure of a floating point NA. That allows these values to behave identically to NA in regular R operations, while still preserving the value of the tag.

The R interface for creating tagged NAs is a little clunky because generally they’ll be created by haven for you. But you can create your own with tagged_na():

x <- c(1:3, tagged_na("a", "z"), 3:1)
x
#> [1]  1  2  3 NA NA  3  2  1

Note these tagged NAs behave identically to regular NAs, even when printing. To see their tags, use print_tagged_na():

print_tagged_na(x)
#> [1]     1     2     3 NA(a) NA(z)     3     2     1

To test if a value is a tagged NA, use is_tagged_na(), and to extract the value of the tag, use na_tag():

is_tagged_na(x)
#> [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
is_tagged_na(x, "a")
#> [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

na_tag(x)
#> [1] NA  NA  NA  "a" "z" NA  NA  NA
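
The following sketch illustrates the claim that tagged NAs behave like regular NAs in ordinary R operations. It only uses base R functions plus the haven helpers introduced above:

```r
library(haven)

x <- c(1:3, tagged_na("a", "z"), 3:1)

# Base R treats tagged NAs like any other NA
na_positions <- is.na(x)
na_positions

# Missingness propagates unless it is removed explicitly
mean(x)
mean_without_na <- mean(x, na.rm = TRUE)
mean_without_na

# A regular NA carries no tag
is_tagged_na(NA_real_)
```

Because the tag lives inside a floating point NA, tagged_na() always produces a double vector.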

My expectation is that tagged missings are most often used in conjunction with labels (described above), so labelled vectors print the tags for you, and as_factor() knows how to relabel:

y <- labelled(x, c("Not home" = tagged_na("a"), "Refused" = tagged_na("z")))
y
#> <labelled<double>[8]>
#> [1]     1     2     3 NA(a) NA(z)     3     2     1
#> 
#> Labels:
#>  value    label
#>  NA(a) Not home
#>  NA(z)  Refused

as_factor(y)
#> [1] 1        2        3        Not home Refused  3        2        1       
#> Levels: 1 2 3 Not home Refused

User defined missing values

SPSS’s user-defined missing values work differently from those in SAS and Stata. Each column can have either up to three distinct values that are considered missing, or a range. Haven provides labelled_spss() as a subclass of labelled() to model these additional user-defined missings.

x1 <- labelled_spss(c(1:10, 99), c(Missing = 99), na_values = 99)
x2 <- labelled_spss(c(1:10, 99), c(Missing = 99), na_range = c(90, Inf))

x1
#> <labelled_spss<double>[11]>
#>  [1]  1  2  3  4  5  6  7  8  9 10 99
#> Missing values: 99
#> 
#> Labels:
#>  value   label
#>     99 Missing
x2
#> <labelled_spss<double>[11]>
#>  [1]  1  2  3  4  5  6  7  8  9 10 99
#> Missing range:  [90, Inf]
#> 
#> Labels:
#>  value   label
#>     99 Missing

These objects are somewhat dangerous to work with in R because most R functions don’t know those values are missing:

mean(x1)
#> [1] 14

Because of that danger, the default behaviour of read_spss() is to return regular labelled objects where user-defined missing values have been converted to NAs. To get read_spss() to return labelled_spss() objects, you’ll need to set user_na = TRUE.
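
A minimal sketch of that difference, assuming a round trip through a temporary .sav file. The column name score and the data are made up for illustration:

```r
library(haven)

# Hypothetical data: 99 is declared as a user-defined missing value
df <- tibble::tibble(
  score = labelled_spss(c(1:10, 99), c(Missing = 99), na_values = 99)
)

path <- tempfile(fileext = ".sav")
write_sav(df, path)

# Default: user-defined missing values are converted to NA on import
default_read <- read_sav(path)
is.na(default_read$score)

# user_na = TRUE: the values and their missing-value declarations are kept
user_na_read <- read_sav(path, user_na = TRUE)
class(user_na_read$score)

unlink(path)
```

With user_na = TRUE the resulting column is a labelled_spss() object, so the caveats about such objects apply to it.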

I’ve defined an is.na() method so you can find them yourself:

is.na(x1)
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

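The presence of that method is intended to let functions with an na.rm argument skip these values. Note, however, that in the output below mean() still returns 14, the same result as without na.rm, so check the result rather than relying on it: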

mean(x1, na.rm = TRUE)
#> [1] 14

But generally you should either convert to a factor, convert to regular missing values, or strip all the labels:

as_factor(x1)
#>  [1] 1       2       3       4       5       6       7       8       9      
#> [10] 10      Missing
#> Levels: 1 2 3 4 5 6 7 8 9 10 Missing
zap_missing(x1)
#>  [1]  1  2  3  4  5  6  7  8  9 10 NA
#> attr(,"labels")
#> Missing 
#>      99 
#> attr(,"class")
#> [1] "haven_labelled"
zap_labels(x1)
#>  [1]  1  2  3  4  5  6  7  8  9 10 NA
haven/inst/doc/datetimes.html

Dates and times

Formats

There are three common formats across SAS, SPSS and Stata.

Date (number of days):

  • SAS: MMDDYY, DDMMYY, YYMMDD, DATE
  • SPSS: n/a
  • Stata: %td

Time (number of seconds):

  • SAS: TIME, HHMM, TOD
  • SPSS: TIME, DTIME
  • Stata: n/a

DateTime (number of seconds):

  • SAS: DATETIME
  • SPSS: DATE, ADATE, SDATE, DATETIME
  • Stata: %tc, %tC (as milliseconds)

Offsets

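Dates and date-times use a different origin from R, which counts from 1970-01-01. The number of days between each origin and R’s is shown in parentheses:

  • SAS: 1960-01-01 (3653 days)
  • SPSS: 1582-10-14 (141428 days)
  • Stata: 1960-01-01 (3653 days)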
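
To make the offsets concrete, here is a small base R sketch that converts raw values counted from the SAS origin into R dates and date-times. The raw numbers are hypothetical values chosen for illustration. Haven performs this kind of conversion for you on import, so the sketch shows the arithmetic rather than haven’s own implementation:

```r
# Hypothetical raw values, as they might be stored in a SAS file
sas_date     <- 21915        # days since 1960-01-01
sas_datetime <- 1893456000   # seconds since 1960-01-01

# Option 1: tell R which origin the count starts from
date_from_origin <- as.Date(sas_date, origin = "1960-01-01")
datetime_from_origin <- as.POSIXct(sas_datetime, origin = "1960-01-01", tz = "UTC")

# Option 2: subtract the offset in days, then count from R's own origin
date_from_offset <- as.Date(sas_date - 3653, origin = "1970-01-01")

date_from_origin
datetime_from_origin
date_from_offset

# The offsets listed above, recomputed as day differences
sas_offset  <- as.numeric(as.Date("1970-01-01") - as.Date("1960-01-01"))
spss_offset <- as.numeric(as.Date("1970-01-01") - as.Date("1582-10-14"))
sas_offset
spss_offset
```

Both options give the same date; the second makes the role of the 3653-day offset explicit.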