R: reading a CSV with commas in a column

Question

Update 2020-5-14

Working with a different but similar dataset from here, I found that read_csv seems to work fine. I haven't tried it with the original data yet, though.

Although the replies didn't help solve the problem, because my question was not correct, Shan's reply fits the original question I posted best, so I accepted his answer.

Update 2020-5-12

I think my original question is not correct. As mentioned in the comments, the data was quoted. Although changing the separator made row 11582 in R look the same as row 11583 in Excel, that doesn't mean it's "right". Maybe there are some incorrect line breaks due to inappropriate encoding or something similar, causing some of the columns to be displaced. If I open the data with Notepad++, the instance at row 11583 in Excel is at row 11596.
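One way to look for displaced rows like this might be to count the fields on each line and compare against the header. The snippet below is only a sketch on a made-up three-column sample (the column names and the broken record are hypothetical, not taken from the real file); it uses base R's count.fields with the same quote character read.csv uses by default.

```r
# Hypothetical sample: the record with id 2 is split across two lines,
# while the quoted comma in "Ole,Ole" is legitimate.
lines <- c('id,name,host_name',
           '1,"Ole,Ole",Nerea',
           '2,Great',
           'apartment,Selin',
           '3,Plain,Jon')
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Number of comma-separated fields on each line, honouring double quotes.
n_fields <- count.fields(path, sep = ",", quote = "\"")

# Lines whose field count differs from the header's are candidates for a
# stray line break or an unquoted comma.
suspect <- which(n_fields != n_fields[1])
print(suspect)
```

On a real file the reported line numbers can then be inspected in a text editor such as Notepad++.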

Original question

I am trying to read listings.csv from this dataset on Kaggle into R. I downloaded the file and ran read.csv('listing.csv'). The first column, id, is supposed to be numeric. However, it shows:

listing$id[1:10]
 [1] 2015  2695  3176  3309  7071  9991  14325 16401 16644 17409
13129 Levels: Ole Berl穩n!,16736423,Nerea,Mitte,Parkviertel,52.55554132116211,13.340658248460871,Entire home/apt,36,6,3,2018-01-26,0.16,1,279\n17312576,Great 2 floor apartment near Friederich Str MITTE,116829651,Selin,Mitte,Alexanderplatz,52.52349354926847,13.391003496971203,Entire home/apt,170,3,31,2018-10-13,1.63,1,92\n17316675,80簡 m of charm in 3 rooms with office space,116862833,Jon,Neuk繹lln,Schillerpromenade,52.47499080234379,13.427509313575928...

I think it is because there are values with commas in the second column. For example, opening the file with Microsoft Excel, I can see that one of the values in the second column is Ole,Ole...

How can I read a csv file into R correctly when some values contain commas?

Answer 1

Since you have access to the data in Excel, you can 'Save As' in Excel with a separator other than a comma (,). First go to Control Panel -> Region and Language -> Additional settings, where you can change the "List separator". The most common one other than the comma is the pipe symbol (|). In R, when you read the file, specify the separator as '|' (for example, sep = "|" in read.csv, or delim = "|" in readr's read_delim; read_csv itself always assumes commas).
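As a rough illustration of the reading step, the sketch below writes a small pipe-separated file and reads it back with base R. The sample rows and the column names are assumptions made up for the example; they only stand in for whatever Excel exports after the list separator is changed.

```r
# Hypothetical stand-in for the file saved from Excel with "|" as separator.
# The name field contains a comma but no longer needs quoting.
path <- tempfile(fileext = ".csv")
writeLines(c("id|name|host_name",
             "2015|Ole,Ole...|Nerea",
             "2695|Great 2 floor apartment|Selin"), path)

# read.csv accepts any single-character separator through `sep`.
listings <- read.csv(path, sep = "|", stringsAsFactors = FALSE)

# id should now be numeric and the comma should survive inside name.
str(listings)
```

With the separator changed, commas inside a value are treated as ordinary characters, so the id column can be parsed as numbers.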

Answer 2

You could try this:

listings <- read.csv("listings.csv", stringsAsFactors = FALSE)

listings$name <- gsub(",", "", listings$name)  # removes the commas in the name column
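To see what this does, here is a small self-contained sketch that reads an inline sample instead of the real file. The sample rows and the host_name column are hypothetical. Note that it assumes the value containing the comma is quoted in the file, which is what lets read.csv keep it in one column in the first place.

```r
# Hypothetical inline sample; the first name is quoted because it
# contains a comma.
csv_text <- 'id,name,host_name
2015,"Ole,Ole...",Nerea
2695,Great 2 floor apartment,Selin'

listings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# fixed = TRUE treats "," as a literal character rather than a regex.
listings$name <- gsub(",", "", listings$name, fixed = TRUE)

print(listings$name)
```

The commas are stripped only after the file has been parsed, so this cleans the values but does not by itself fix rows that were split incorrectly during reading.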

Answer 3

If you don't need the information in the second column, then you can always delete it (in Excel) before importing into R. The read.csv function, which calls scan, can also omit unwanted columns using the colClasses argument. However, the fread function from the data.table package does this much more simply with the drop argument:

library(data.table)
listings <- fread("listings.csv", drop=2)

If you do need the information in that column, then other methods are needed (see other solutions).
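For completeness, the colClasses route mentioned above might look roughly like the following in base R. This is a sketch on a hypothetical three-column sample, not the real file: setting an entry of colClasses to "NULL" skips that column, while NA leaves the type to be guessed as usual.

```r
# Hypothetical sample file with a quoted comma in the second column.
path <- tempfile(fileext = ".csv")
writeLines(c('id,name,host_name',
             '2015,"Ole,Ole...",Nerea',
             '2695,Great 2 floor apartment,Selin'), path)

# Find out how many columns there are by reading the header and one row.
n_cols <- ncol(read.csv(path, nrows = 1))

# NA = guess the type as usual, "NULL" = skip the column entirely.
col_classes <- rep(NA_character_, n_cols)
col_classes[2] <- "NULL"

listings <- read.csv(path, colClasses = col_classes, stringsAsFactors = FALSE)
print(names(listings))
```

The fields are still split before the column is discarded, so this only helps when values containing commas are quoted properly in the file.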

3 answers

Answer 1: 2 (accepted), 2020-05-10 02:09:00
Answer 2: 0, 2020-05-10 02:42:45
Answer 3: 0, 2020-05-10 02:43:57
