R read csv with comma in column
Update 2020-5-14
Working with a different but similar dataset from here, I found that read_csv seems to work fine. I haven't tried it with the original data yet, though.
Although the replies didn't help solve the problem because my question was not correct, Shan's reply fits the question I originally posted best, so I accepted his answer.
Update 2020-5-12
I think my original question is not correct. As mentioned in the comments, the data was quoted. Although changing the separator made row 11582 in R look the same as row 11583 in Excel, that doesn't mean it's "right". Maybe some line breaks are handled incorrectly because of inappropriate encoding or something similar, which causes some of the columns to be displaced. If I open the data with Notepad++, the instance at row 11583 in Excel is at row 11596.
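One way to check for displaced columns of this kind is to count the fields on every line. The sketch below is only an illustration on made-up data, not the Kaggle file: the temporary file, its contents, and the variable names are assumptions. It uses base R's count.fields, which follows scan in allowing a quoted field to span several lines.

```r
# Build a tiny made-up CSV in which one quoted name contains a line break.
tmp <- tempfile(fileext = ".csv")
writeLines(c('id,name,price',
             '2015,"Ole,Ole...",60',
             '2695,"Two-line',
             'name",17'), tmp)

# Count the comma-separated fields on each physical line, honouring quotes.
n <- count.fields(tmp, sep = ",", quote = "\"")

# Tabulate the counts; lines that differ from the header's count (or are NA)
# are the ones worth inspecting in a text editor.
print(table(n, useNA = "ifany"))
print(which(is.na(n) | n != n[1]))
```

On the real file, replacing tmp with the path to listings.csv would list the physical lines whose field count differs from the header, which may explain why row numbers differ between Excel and Notepad++.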
Original question
I am trying to read the listings.csv from this dataset in Kaggle into R. I downloaded the file and wrote the code read.csv('listing.csv'). The first column, the column id, is supposed to be numeric. However, it shows:
listing$id[1:10]
[1] 2015 2695 3176 3309 7071 9991 14325 16401 16644 17409
13129 Levels: Ole Berl穩n!,16736423,Nerea,Mitte,Parkviertel,52.55554132116211,13.340658248460871,Entire home/apt,36,6,3,2018-01-26,0.16,1,279\n17312576,Great 2 floor apartment near Friederich Str MITTE,116829651,Selin,Mitte,Alexanderplatz,52.52349354926847,13.391003496971203,Entire home/apt,170,3,31,2018-10-13,1.63,1,92\n17316675,80簡 m of charm in 3 rooms with office space,116862833,Jon,Neuk繹lln,Schillerpromenade,52.47499080234379,13.427509313575928...
I think it is because some values in the second column contain commas. For example, opening the file with Microsoft Excel, I can see that one of the values in the second column is Ole,Ole...
How can I read a csv file into R correctly when some values contain commas?
Since you have access to the data in Excel, you can 'Save As' in Excel with a separator other than the comma (,). First go to Control Panel -> Region and Language -> Additional settings, where you can change the "List separator". The most common one other than the comma is the pipe symbol (|). In R, specify the separator as '|' when you read the file, for example with sep = "|" in read.csv or delim = "|" in readr's read_delim (read_csv itself always splits on commas).
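A minimal sketch of reading a pipe-separated export in base R follows. The temporary file and its three rows are made up for illustration; only the sep = "|" argument is the point.

```r
# Write a small pipe-delimited sample to a temporary file (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id|name|price",
             "2015|Quiet courtyard, very central|60",
             "2695|Ole,Ole...|17"), tmp)

# sep = "|" makes read.csv split on the pipe, so commas stay inside the values.
listings <- read.csv(tmp, sep = "|", stringsAsFactors = FALSE)
str(listings)
```

Because the comma is no longer the delimiter, the name column keeps its commas and id is parsed as a number.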
You could try this?
listings <- read.csv("listings.csv", stringsAsFactors = FALSE)
listings$name <- gsub(",", "", listings$name)  # removes the commas in the name column
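As a self-contained illustration of the two lines above, the sketch below uses an inline made-up sample instead of listings.csv. It assumes, as the comments on the question suggest, that values containing commas are wrapped in double quotes, in which case read.csv keeps each of them in one column.

```r
# Made-up sample: the first name contains a comma and is therefore quoted.
csv_text <- 'id,name,price
2015,"Ole,Ole...",60
2695,Plain name,17'

# read.csv treats the quoted value as a single field.
listings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# fixed = TRUE matches the comma literally rather than as a regular expression.
listings$name <- gsub(",", "", listings$name, fixed = TRUE)
print(listings)
```

Stripping the commas is only needed if later steps write the data back out in a format that cannot quote fields; for reading alone, the quoting already does the work.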
If you don't need the information in the second column, then you can always delete it (in Excel) before importing into R. The read.csv function, which calls scan, can also omit unwanted columns using the colClasses argument. However, the fread function from the data.table package does this much more simply with the drop argument:
library(data.table)
listings <- fread("listings.csv", drop=2)
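For comparison, the base R colClasses route mentioned above can be sketched as follows. The inline three-column sample is made up; with the real file, the vector would need one entry per column.

```r
# Made-up sample with the same layout idea: id, name, price.
csv_text <- 'id,name,price
2015,"Ole,Ole...",60
2695,Plain name,17'

# The string "NULL" skips a column; NA lets read.csv guess the column type.
listings <- read.csv(text = csv_text, colClasses = c(NA, "NULL", NA))
print(names(listings))
```

The need to spell out an entry for every column is what makes fread's drop argument more convenient for wide files.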
If you do need the information in that column, then other methods are needed (see other solutions).