
R read csv with comma in column

Update 2020-5-14

Working with a different but similar dataset from here, I found that read_csv seems to work fine. I haven't tried it with the original data yet, though.

Although the replies didn't solve the problem, because my question itself was not correct, Shan's reply fits the original question I posted best, so I accepted his answer.

Update 2020-5-12

I think my original question is not correct. As mentioned in the comments, the data was quoted. Although changing the separator made row 11582 in R look the same as row 11583 in Excel, that doesn't mean it's "right". Maybe some line breaks are being handled incorrectly because of inappropriate encoding or something similar, causing some of the columns to be displaced. If I open the data with Notepad++, the record at row 11583 in Excel is at row 11596.
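One possible reason for the differing row numbers, offered here as an assumption and not verified on the original file, is that some quoted fields contain line breaks: a text editor counts physical lines, while Excel and read.csv count records. A minimal sketch with made-up data (the file contents and the host_name column are hypothetical):

```r
# Hypothetical miniature of listings.csv: record 2 has a line break inside a quoted field.
path <- tempfile(fileext = ".csv")
writeLines(c(
  'id,name,host_name',
  '1,Plain title,Nerea',
  '2,"Two-line',
  'title",Selin'
), path)

physical_lines <- length(readLines(path))             # lines a text editor would show
listings <- read.csv(path, stringsAsFactors = FALSE)  # default quote = "\"" keeps the field together

physical_lines - 1  # data lines in the file, header excluded
nrow(listings)      # records parsed by read.csv
```

If the two counts differ on the real file, the offset between Notepad++ and Excel may simply reflect multi-line fields and not displaced columns.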


Original question

I am trying to read listings.csv from this dataset on Kaggle into R. I downloaded the file and ran read.csv('listing.csv'). The first column, id, is supposed to be numeric. However, it shows:

listing$id[1:10]
 [1] 2015  2695  3176  3309  7071  9991  14325 16401 16644 17409
13129 Levels: Ole Berl穩n!,16736423,Nerea,Mitte,Parkviertel,52.55554132116211,13.340658248460871,Entire home/apt,36,6,3,2018-01-26,0.16,1,279\n17312576,Great 2 floor apartment near Friederich Str MITTE,116829651,Selin,Mitte,Alexanderplatz,52.52349354926847,13.391003496971203,Entire home/apt,170,3,31,2018-10-13,1.63,1,92\n17316675,80簡 m of charm in 3 rooms with office space,116862833,Jon,Neuk繹lln,Schillerpromenade,52.47499080234379,13.427509313575928...

I think it is because there are values with commas in the second column. For example, opening the file with Microsoft Excel, I can see that one of the values in the second column is Ole,Ole...

How can I read a csv file into R correctly when some values contain commas?
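For reference, a minimal sketch of the quoted case, assuming the fields that contain commas are wrapped in double quotes (as the later update says the data was). The file contents are made up for illustration; only the column names id and name come from the question:

```r
# Toy file: the name in the first record contains a comma but is quoted.
path <- tempfile(fileext = ".csv")
writeLines(c(
  'id,name,host_name',
  '2015,"Ole, Ole",Nerea',
  '2695,Plain title,Selin'
), path)

listings <- read.csv(path, stringsAsFactors = FALSE)

str(listings)      # shows the parsed column types
listings$name[1]   # the comma stays inside the field
```

With quoting intact, read.csv does not need a different separator; a non-numeric id column then points to some other problem in the file.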

Since you have access to the data in Excel, you can use 'Save As' in Excel with a separator other than a comma (,). First go to Control Panel -> Region and Language -> Additional settings, where you can change the "List separator". The most common alternative to the comma is the pipe symbol (|). In R, specify the separator as '|' when you read the file, for example with the sep argument of read.csv or the delim argument of readr's read_delim (read_csv itself always splits on commas).
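A minimal sketch of the reading step in base R, assuming the file was re-saved with '|' as the separator. The file contents below are made up for illustration:

```r
# Toy pipe-separated file; the comma in the name is now ordinary text.
path <- tempfile(fileext = ".txt")
writeLines(c(
  'id|name|host_name',
  '2015|Ole, Ole|Nerea',
  '2695|Plain title|Selin'
), path)

listings <- read.csv(path, sep = "|", stringsAsFactors = FALSE)

listings$name   # commas are preserved inside the values
```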

You could try this:

listings <- read.csv("listings.csv", stringsAsFactors = FALSE)

listings$name <- gsub(",", "", listings$name)  # removes the commas in the name column

If you don't need the information in the second column, then you can always delete it (in Excel) before importing into R. The read.csv function, which calls scan, can also omit unwanted columns using the colClasses argument. However, the fread function from the data.table package does this much more simply with the drop argument:

library(data.table)
listings <- fread("listings.csv", drop=2)
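The base R route via colClasses mentioned above can be sketched as follows. The three-column toy file is an assumption made for illustration; the real file has more columns, so the vector would need an entry for each of them:

```r
# Toy file with three columns; the second one is the column to skip.
path <- tempfile(fileext = ".csv")
writeLines(c(
  'id,name,host_name',
  '2015,"Ole, Ole",Nerea',
  '2695,Plain title,Selin'
), path)

# "NULL" skips a column; NA lets read.csv guess the type as usual.
listings <- read.csv(path, colClasses = c(NA, "NULL", NA),
                     stringsAsFactors = FALSE)

names(listings)   # the name column is no longer present
```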

If you do need the information in that column, then other methods are needed (see other solutions).


 