
How to read a one-line CSV in R?

I have been working on a dummy dataset recently, and I found that the data provided to me was all on a single line. A similar example is shown below:

Name,Age,Gender,Occupation A,10,M,Student B,11,M,Student C,11,F,Student

I want to import the data and obtain the following output:

Name  Age  Gender  Occupation
 A    10     M       Student
 B    11     M       Student
 C    11     F       Student

A value may sometimes be missing, so the import logic needs to handle that as well. Can anyone help me work out the logic for importing such data sets?

I tried a normal import, but it didn't help: I just imported the file with the read.csv() function, and it didn't give me the expected result.

EDIT: What if the data looks like this:

Name,Age,Gender,Occupation ABC XYZ,10,M,Student B,11,M,Student C,11,F,Student

and I want output like this:

  Name     Age  Gender  Occupation
 ABC XYZ    10     M       Student
   B        11     M       Student
   C        11     F       Student

You could read your file in with readLines, turn the spaces into line breaks, and then parse the result with read.csv:

# txt <- readLines("my_data.txt") # with a real data file
txt <- readLines(textConnection("Name,Age,Gender,Occupation A,10,M,Student B,11,M,Student C,11,F,Student"))

read.csv(text=gsub(" ","\n",txt))

Output:

  Name Age Gender Occupation
1    A  10      M    Student
2    B  11      M    Student
3    C  11      F    Student
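The question also mentions missing values. As a minimal sketch (the sample string below is made up for illustration and is not the asker's data), the same space-to-newline trick should cope with empty fields, because read.csv treats an empty numeric field as NA. The na.strings argument is added here, as an assumption about the desired result, so that empty text fields become NA too:

```r
# Hypothetical sample: Age is missing for B, Occupation is missing for C
txt <- "Name,Age,Gender,Occupation A,10,M,Student B,,M,Student C,11,F,"

# Turn record-separating spaces into newlines, then parse as ordinary CSV.
# na.strings = c("", "NA") makes empty character fields NA as well.
df <- read.csv(text = gsub(" ", "\n", txt),
               na.strings = c("", "NA"),
               stringsAsFactors = FALSE)
print(df)
```

This only works while no field contains a space; the EDIT case in the question needs a different split.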

If you have millions of records, you will probably want to speed this up, so I suggest using data.table's fread instead of read.csv. fread can also take a shell command that pre-processes the file before it is read into R, and sed will be a lot faster than doing the string manipulation in R.

For example, if this CSV is stored at /tmp/x.csv, you can try something like:

> data.table::fread("sed 's/ /\\n/g' /tmp/x.csv")
   Name Age Gender Occupation
1:    A  10      M    Student
2:    B  11      M    Student
3:    C  11      F    Student
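Replacing every space with a line break would split "ABC XYZ" in the EDIT example into two rows. One possible workaround, sketched here in base R under two assumptions that are not stated in the question (every record has exactly four fields, and the last field never contains a space), is to break the line only at the space that follows a complete record. The names n_sep, pattern and fixed are illustrative:

```r
txt <- "Name,Age,Gender,Occupation ABC XYZ,10,M,Student B,11,M,Student C,11,F,Student"

# Assumption: 4 columns, so each record contains exactly 3 commas
n_sep <- 3

# Match one full record: three "field," groups (fields may contain spaces
# or be empty), then a last field without spaces, then the separating space.
pattern <- sprintf("((?:[^,]*,){%d}[^, ]*) ", n_sep)

# Keep the record (\\1) and swap the separating space for a newline
fixed <- gsub(pattern, "\\1\n", txt, perl = TRUE)

df <- read.csv(text = fixed, stringsAsFactors = FALSE)
print(df)
```

Because the field count drives the split, empty fields are tolerated as well, but a last column containing spaces would still be ambiguous and would need a different delimiter in the source file.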
