I have a 700MB Stata .dta file with 28 million observations and 14 variables.
When I attempt to import it into R using foreign's read.dta() function, I run out of RAM on my 8GB machine (page-outs shoot into the gigabytes very quickly).
staph <- read.dta("Staph_1999_2010.dta")
I hunted around, and it sounds like a more efficient alternative would be the Stata.file() function from the memisc package.
When I call:
staph <- Stata.file("Staph_1999_2010.dta")
I get a segfault:
*** caught segfault ***
address 0xd5d2b920, cause 'memory not mapped'
Traceback:
1: .Call("dta_read_labels", bf, lbllen, padding)
2: dta.read.labels(bf, len.lbl, 3)
3: get.dictionary.dta(dta)
4: Stata.file("Staph_1999_2010.dta")
I find the documentation for Stata.file() difficult to follow.
(1) Am I using Stata.file() correctly?
(2) Does Stata.file() return a data frame like read.dta() does?
(3) If I'm using Stata.file() correctly, how can I fix the error I'm getting?
If you have access to Stata, one solution is to export the .dta to .csv from within Stata:
use "file.dta"
export delimited using "file.csv", replace
Then import it into R using read.csv or data.table::fread.
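A minimal sketch of the R side of that import, assuming the exported file is named "Staph_1999_2010.csv" (following the .dta filename above) and that the column names shown in `select` are illustrative placeholders:

```r
library(data.table)

# fread() is considerably faster and more memory-efficient than read.csv();
# it returns a data.table, which is also a data.frame.
staph <- fread("Staph_1999_2010.csv")

# If 28M rows x 14 columns is still too much for 8GB of RAM, read only
# the columns you actually need (hypothetical column names):
# staph <- fread("Staph_1999_2010.csv", select = c("patient_id", "year"))
```

Reading a column subset with `select` can cut memory use substantially, since fread never materializes the dropped columns.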
Other ideas: use sample in Stata to work with a random subset of the observations, and Stata's compress, which attempts a lossless compression by changing variable storage types (no information is lost).
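A hedged sketch of those two Stata-side ideas; the filenames and the 10% sampling fraction are illustrative, not from the original post:

```stata
use "file.dta", clear

* compress attempts a lossless reduction of storage types
* (e.g. double -> float, long -> int, where no precision is lost)
compress

* Alternatively, keep a random 10% subsample so the data fit in memory
set seed 12345
sample 10
save "file_sample.dta", replace
```

Running compress before export can shrink the file (and the eventual R object) considerably when variables were stored as double or long unnecessarily.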