Convert raw data file to RData file

I am trying to make an RData file from a raw, space-delimited numeric text file, i.e.

11 33 55
22 33 45
25 78 00 
44 87 99 ....

I have another R script which needs to load this new RData file and perform linear regression on the data using MapReduce (RHIPE). Thus, when I save this R object, I need to be able to read it back this way:

data <- strsplit(unlist(map.values), " ")

# so that I can run regression like:
y  <- unlist(lapply(data, "[[", 1))
x1 <- unlist(lapply(data, "[[", 2))
x2 <- unlist(lapply(data, "[[", 3))
lm(y ~ x1 + x2)

I have tried many ways to save my data into the RData object, including table, list and as.character, but none of them succeeded in letting me read it back using the method above. How can I save my original file so that I can read it in the way shown above? Thank you.

(P.S. I cannot use the load / read.table functions, since I am reading from an HDFS file inside the mapper.)

If I understand you correctly, you want your stored object to be a bunch of strings of the form "number - space - number". In that case, use sprintf

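# sprintf needs one argument per %d, so pass the three columns separately
foo <- sprintf('%d %d %d', my_data[1, 1], my_data[1, 2], my_data[1, 3])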

as an example of creating the first row. Run a loop or *apply to build the entire array, then save that character string array to an RData file. This should at least be close to what you want.
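The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: `raw_lines`, `row_strings` and `rdata_path` are hypothetical names, the sample values are the ones from the question, and `map.values` is simulated locally as a list of strings (what RHIPE actually hands to the mapper may differ). It uses `paste` over rows instead of `sprintf`, and adds `as.numeric` because `strsplit` returns character vectors.

```r
# Hypothetical input: the space-delimited sample from the question.
raw_lines <- c("11 33 55", "22 33 45", "25 78 00", "44 87 99")
my_data <- read.table(text = raw_lines)  # 4 x 3 data frame of integers

# Build one "number space number space number" string per row.
row_strings <- apply(as.matrix(my_data), 1, paste, collapse = " ")

# Save the character vector; a temporary file is used here for illustration.
rdata_path <- tempfile(fileext = ".RData")
save(row_strings, file = rdata_path)

# Read it back the way the question does. map.values is simulated here
# (assumption); in the real job it would come from the mapper.
map.values <- as.list(row_strings)
data <- strsplit(unlist(map.values), " ")

# strsplit yields strings, so convert to numeric before fitting.
y  <- as.numeric(unlist(lapply(data, "[[", 1)))
x1 <- as.numeric(unlist(lapply(data, "[[", 2)))
x2 <- as.numeric(unlist(lapply(data, "[[", 3)))
fit <- lm(y ~ x1 + x2)
```

Note that parsing the values as numbers turns "00" into "0". If the raw file is already in exactly the shape you want, `readLines()` on it would probably give you the same kind of character vector without any reformatting.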
Note: I suppose it's futile to suggest improving the far-end code which does the data sorting and regressions?
