I have a comma-separated CSV file. However, some fields contain commas themselves, such as the company name "Apple, Inc", so those fields get split into two columns. This leads to the following error when using fread:
"Stopped early on line 5. Expected 26 fields but found 27."
Any suggestions on how to appropriately load this file? Thanks in advance!
Edit:
Example rows are as follows. It seems that some fields contain a comma without being quoted. However, the comma inside such a field is always followed by whitespace.
100,Microsoft,azure.com
300,IBM,ibm.com
500,Google,google.com
100,Amazon, Inc,amazon.com
400,"SAP, Inc",sap.com
1) Using the test file created in the Note at the end, and assuming that the file contains no semicolons (use some other character if it does), read in the lines, replace the first and last comma on each line with a semicolon, and then read the result as a semicolon-separated file.
L <- readLines("firms.csv")
read.table(text = sub(",(.*),", ";\\1;", L), sep = ";")
##    V1          V2         V3
## 1 100   Microsoft  azure.com
## 2 300         IBM    ibm.com
## 3 500      Google google.com
## 4 100 Amazon, Inc amazon.com
## 5 400    SAP, Inc    sap.com
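To make the regular expression in (1) easier to follow, here is a minimal sketch that applies it to a few of the sample lines held in a character vector instead of a file. The semicolon check is an added guard for the stated assumption, not part of the original answer. Note that this trick relies on the offending field sitting between the first and last comma, as it does in the three-column sample; a wider file would likely need a different pattern.

```r
# Sample lines copied from the question (no file needed for this sketch)
L <- c('100,Microsoft,azure.com',
       '100,Amazon, Inc,amazon.com',
       '400,"SAP, Inc",sap.com')

# Added guard: the approach assumes the data contains no semicolons
stopifnot(!any(grepl(";", L, fixed = TRUE)))

# ",(.*)," is greedy: the match starts at the first comma on the line and
# ends at the last one, so only those two commas become semicolons
converted <- sub(",(.*),", ";\\1;", L)
print(converted)

# Commas left inside the middle field are now ordinary characters
firms <- read.table(text = converted, sep = ";")
print(firms)
```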
2) Another approach is to use gsub to replace every comma followed by a space with a semicolon followed by a space, then use chartr to swap commas and semicolons, so that the field separators become semicolons and the commas inside fields are restored. The result can then be read as a semicolon-separated file.
L <- readLines("firms.csv")
read.table(text = chartr(",;", ";,", gsub(", ", "; ", L)), sep = ";")
##    V1          V2         V3
## 1 100   Microsoft  azure.com
## 2 300         IBM    ibm.com
## 3 500      Google google.com
## 4 100 Amazon, Inc amazon.com
## 5 400    SAP, Inc    sap.com
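The two substitutions in (2) can be traced step by step on single lines. This is only an illustrative sketch: the second, wider row is made up for the example and is not from the question. It suggests that this approach should carry over to files with more columns, provided a real separator is never followed by a space and the data contains no semicolons.

```r
# One problem row from the question, plus a hypothetical wider row
x    <- '100,Amazon, Inc,amazon.com'
wide <- '1,Foo, Ltd,x,Bar, LLC,y'   # made-up data for illustration

# Step 1: mark commas that belong to a field (they are followed by a space)
step1 <- gsub(", ", "; ", x)
# Step 2: swap "," and ";" so separators become ";" and field commas return
step2 <- chartr(",;", ";,", step1)
print(step1)   # "100,Amazon; Inc,amazon.com"
print(step2)   # "100;Amazon, Inc;amazon.com"

# The same two steps applied to the wider row, then split into fields
wide_fields <- strsplit(chartr(",;", ";,", gsub(", ", "; ", wide)),
                        ";", fixed = TRUE)[[1]]
print(wide_fields)
```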
3) Another possibility, if there are not too many such rows, is to locate them and then put quotes around the offending fields in a text editor. The file can then be read in normally.
which(count.fields("firms.csv", sep = ",") != 3)
## [1] 4
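When fixing rows by hand as in (3), it may help to see the text of each offending line next to its line number. The sketch below assumes the same five sample rows and writes them to a temporary file so that it is self-contained; the variable names path, expected, n and bad are illustrative.

```r
# Write the sample rows to a temporary file (stand-in for firms.csv)
path <- tempfile(fileext = ".csv")
writeLines(c('100,Microsoft,azure.com',
             '300,IBM,ibm.com',
             '500,Google,google.com',
             '100,Amazon, Inc,amazon.com',
             '400,"SAP, Inc",sap.com'), path)

expected <- 3                        # number of fields a good row should have
n   <- count.fields(path, sep = ",") # quoted commas are not counted as separators
bad <- which(n != expected)

# Show line number, field count and raw text of each offending line
print(data.frame(line = bad, fields = n[bad], text = readLines(path)[bad]))
```

The properly quoted "SAP, Inc" row is not flagged, which is why only the unquoted Amazon row needs editing.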
Note: the test file firms.csv used above can be created as follows.
Lines <- '100,Microsoft,azure.com
300,IBM,ibm.com
500,Google,google.com
100,Amazon, Inc,amazon.com
400,"SAP, Inc",sap.com
'
cat(Lines, file = "firms.csv")
Works fine for me. Can you provide a reproducible example?
library(data.table)
# Create example and write out
df_out <- data.frame(X = c("A", "B", "C"),
                     Y = c("a,A", "b,B", "C"))
write.csv(df_out, file = "df.csv", row.names = FALSE)
# Read in CSV with fread
df_in <- fread("./df.csv")
df_in
X Y
1: A a,A
2: B b,B
3: C C
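A likely reason this example reads back cleanly is that write.csv quotes character fields by default, so the embedded commas are protected. The base R sketch below, which uses a temporary file and does not need data.table, inspects the raw lines to show this. It differs from the file in the question, where at least one field with a comma is not quoted.

```r
df_out <- data.frame(X = c("A", "B", "C"),
                     Y = c("a,A", "b,B", "C"))

# Write to a temporary file so nothing is left in the working directory
path <- tempfile(fileext = ".csv")
write.csv(df_out, file = path, row.names = FALSE)

# Inspect the raw text: every character field is wrapped in double quotes
raw <- readLines(path)
print(raw)

# Because of the quotes, each line still counts as exactly two fields
n_fields <- count.fields(path, sep = ",")
print(n_fields)
```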