Read comma separated csv file with fields containing commas using fread in r

I have a csv file separated by commas. However, some fields contain commas, such as the company name "Apple, Inc", and these fields get split into two columns, which leads to the following error when using fread.

"Stopped early on line 5. Expected 26 fields but found 27."

Any suggestions on how to load this file properly? Thanks in advance!

Add:

Example rows are as follows. It seems that some fields contain a comma but are not quoted. However, within such a field the comma is always followed by whitespace.

100,Microsoft,azure.com
300,IBM,ibm.com
500,Google,google.com
100,Amazon, Inc,amazon.com
400,"SAP, Inc",sap.com

1) Using the test file created in the Note at the end, and assuming that the file contains no semicolons (use some other character if it does), read in the lines, replace the first and last comma on each line with a semicolon, and then read the result as a semicolon-separated file.

L <- readLines("firms.csv")
read.table(text = sub(",(.*),", ";\\1;", L), sep = ";")
##    V1          V2         V3
## 1 100   Microsoft  azure.com
## 2 300         IBM    ibm.com
## 3 500      Google google.com
## 4 100 Amazon, Inc amazon.com
## 5 400    SAP, Inc    sap.com
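
To see why the pattern in 1) picks out the first and last comma: `.*` is greedy, so the match runs from the first comma on the line to the last one. Below is a minimal sketch on a few of the sample rows held in memory (the `rows` and `fixed` names are only for illustration). The optional `fread` call at the end assumes an installed data.table version that supports the `text` argument.

```r
# Sample rows held in memory instead of being read from firms.csv
rows <- c('100,Microsoft,azure.com',
          '100,Amazon, Inc,amazon.com',
          '400,"SAP, Inc",sap.com')

# Greedy .* stretches from the first comma to the last comma,
# so only those two separators are turned into semicolons.
fixed <- sub(",(.*),", ";\\1;", rows)
print(fixed)

# The repaired lines can be handed to fread if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  print(data.table::fread(text = fixed, sep = ";", header = FALSE))
}
```

This relies on the layout of the sample, where only the middle of three fields can contain commas; a file with more columns would need a pattern adapted to where the problem field sits.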

2) Another approach is to use gsub to replace every comma followed by a space with a semicolon followed by a space, then use chartr to swap every comma with a semicolon and every semicolon with a comma, and then read the result in as a semicolon-separated file.

L <- readLines("firms.csv")
read.table(text = chartr(",;", ";,", gsub(", ", "; ", L)), sep = ";")
##    V1          V2         V3
## 1 100   Microsoft  azure.com
## 2 300         IBM    ibm.com
## 3 500      Google google.com
## 4 100 Amazon, Inc amazon.com
## 5 400    SAP, Inc    sap.com
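
The two substitutions in 2) are easier to follow when the intermediate strings are printed. A small sketch on two of the sample rows (the `rows`, `step1`, `step2` and `parsed` names are illustrative, not part of the answer above):

```r
rows <- c('100,Amazon, Inc,amazon.com',
          '400,"SAP, Inc",sap.com')

# Step 1: protect the commas that are followed by a space
step1 <- gsub(", ", "; ", rows, fixed = TRUE)
print(step1)

# Step 2: swap the two characters, so the true separators become ";"
# and the protected commas turn back into ","
step2 <- chartr(",;", ";,", step1)
print(step2)

# Read the result as a semicolon-separated table
parsed <- read.table(text = step2, sep = ";", stringsAsFactors = FALSE)
print(parsed)
```

As with the first approach, this assumes the file contains no semicolons, and additionally that a real field separator is never followed by a space.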

3) Another possibility, if there are not too many such rows, is to locate them and then put quotes around the offending fields in a text editor. The file can then be read in normally.

which(count.fields("firms.csv", sep = ",") != 3)
## [1] 4
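
If editing by hand is impractical, the quoting step could also be scripted. The following is only a sketch under the same assumption as the sample data, namely three fields per row with the middle one being the only field that can contain commas; the `rows`, `bad` and `repaired` names and the temporary file are illustrative.

```r
rows <- c('100,Microsoft,azure.com',
          '100,Amazon, Inc,amazon.com',
          '400,"SAP, Inc",sap.com')
tmp <- tempfile(fileext = ".csv")
writeLines(rows, tmp)

# Rows whose field count differs from the expected 3
bad <- which(count.fields(tmp, sep = ",") != 3)
print(bad)

# Wrap everything between the first and last comma in double quotes
rows[bad] <- sub("^([^,]*),(.*),([^,]*)$", "\\1,\"\\2\",\\3", rows[bad])
print(rows[bad])

# The repaired lines now parse as ordinary CSV
repaired <- read.csv(text = rows, header = FALSE, stringsAsFactors = FALSE)
print(repaired)
unlink(tmp)
```

Rows that are already quoted correctly are left alone because `count.fields` reports 3 for them. A field that itself contains double quotes would need extra handling that this sketch does not attempt.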

Note

Lines <- '100,Microsoft,azure.com
300,IBM,ibm.com
500,Google,google.com
100,Amazon, Inc,amazon.com
400,"SAP, Inc",sap.com
'
cat(Lines, file = "firms.csv")

Works fine for me. Can you provide a reproducible example?

library(data.table)

# Create example and write out
df_out <- data.frame("X" = c("A", "B", "C"),
                     "Y"= c("a,A", "b,B", "C"))

write.csv(df_out, file = "df.csv", row.names = FALSE)

# Read in CSV with fread
df_in <- fread("./df.csv")
df_in
##    X   Y
## 1: A a,A
## 2: B b,B
## 3: C   C
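
A likely reason this example reads cleanly is that `write.csv` quotes character fields by default, whereas the rows in the question include a comma-containing field with no quotes. A short sketch that prints the raw lines written to disk (the temporary file and the `written` name are illustrative):

```r
df_out <- data.frame(X = c("A", "B", "C"),
                     Y = c("a,A", "b,B", "C"))
tmp <- tempfile(fileext = ".csv")
write.csv(df_out, file = tmp, row.names = FALSE)

# Inspect the raw text: every character field is wrapped in double quotes,
# so the embedded commas are not treated as separators when read back
written <- readLines(tmp)
print(written)
unlink(tmp)
```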
