How to read a CSV file but split each line only at the first two commas?

I have a CSV file. I want to read the file in R but split each line only at the first two commas, i.e. if the file contains a line like this:

1,1000,I, am done, with you

In R, I want this to become a row of a data frame with three columns, like this:

> df <- data.frame("Id"="1","Count" ="1000", "Comment" = "I, am done, with you")
> df
  Id Count              Comment
1  1  1000 I, am done, with you

A regular expression will work.

For example, suppose str holds the rows you want to parse, and your CSV file looks like this:

1,1000,I, am done, with you
2,500, i don't know

If you want to read from a file, call readLines() to read all lines of the file into a character vector in R, just like str.
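As a minimal sketch of that step, the snippet below writes the two sample rows to a temporary file and reads them back with readLines(). The temporary file and the variable names path and lines are assumptions made for illustration; in practice you would pass the path of your own CSV file.

```r
# Hypothetical setup: write the sample rows to a temporary file
# so that the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("1,1000,I, am done, with you", "2,500, i don't know"), path)

# readLines() returns a character vector with one element per line
lines <- readLines(path)
print(lines)
```

The resulting vector can then be used in place of str in the code that follows.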

The technique is simple. Here I use the {stringr} package to match the text and extract the information I need.

str <- c("1,1000,I, am done, with you", "2,500, i don't know")

library(stringr)

# match the strings by pattern integer,integer,anything
matches <- str_match(str,pattern="(\\d+),(\\d+),\\s*(.+)")

Here I briefly explain the pattern (\\d+),(\\d+),\\s*(.+) . \\d represents a digit character, \\s represents a whitespace character, and . represents any character. + means one or more, * means zero or more. () groups parts of the pattern so that the function knows what to capture as a separate piece of information. Because only the first two commas are part of the pattern, any commas inside the third group stay in the comment.
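To see what each piece of the pattern does on its own, here is a small sketch using base R's grepl() and sub() rather than {stringr}. The variable names are hypothetical, and perl = TRUE is passed only to be explicit about the regex flavour being used.

```r
# \\d+ matches one or more digits: an all-digit field matches, a mixed one does not
digit_ok <- grepl("^\\d+$", c("1000", "10a"), perl = TRUE)
# digit_ok is TRUE FALSE

# \\s* matches zero or more whitespace characters, so leading blanks are removed if present
trimmed <- sub("^\\s*", "", c(" i don't know", "I, am done"), perl = TRUE)
# trimmed is "i don't know" "I, am done"

# (.+) captures everything after the second comma, including any later commas
comment <- sub("^\\d+,\\d+,\\s*(.+)$", "\\1", "1,1000,I, am done, with you", perl = TRUE)
# comment is "I, am done, with you"
```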

If you look at matches, it looks like this:

     [,1]                          [,2] [,3]   [,4]                  
[1,] "1,1000,I, am done, with you" "1"  "1000" "I, am done, with you"
[2,] "2,500, i don't know"         "2"  "500"  "i don't know"        

As you can see, str_match() has split each string by the pattern into a matrix: the first column holds the full match, and the remaining columns hold the captured groups. All that remains is to transform the matrix into a data frame with the correct data types.

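# drop the first column (the full match) and keep the three captured groups
df <- data.frame(matches[, -1], stringsAsFactors = FALSE)
colnames(df) <- c("Id", "Count", "Comment")
# convert the two numeric columns from character to integer
df <- transform(df, Id = as.integer(Id), Count = as.integer(Count))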

df is our target:

  Id Count              Comment
1  1  1000 I, am done, with you
2  2   500         i don't know
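The steps above can also be wrapped into one function. The following is only a sketch under stated assumptions: parse_lines() is a hypothetical helper name, it swaps {stringr} for base R's regexec() and regmatches(), and it assumes every valid line has the form integer,integer,text. Lines that do not match are skipped with a warning.

```r
# Hypothetical helper: turn lines of the form "integer,integer,free text"
# into a data frame, splitting only at the first two commas
parse_lines <- function(lines) {
  pattern <- "^(\\d+),(\\d+),\\s*(.+)$"
  # each list element holds the full match plus the three captured groups,
  # or character(0) when the line does not match
  m <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  ok <- lengths(m) == 4
  if (!all(ok)) {
    warning(sum(!ok), " line(s) did not match the pattern and were skipped")
  }
  # one row per matching line; works even when no line matches
  parts <- matrix(as.character(unlist(m[ok])), ncol = 4, byrow = TRUE)
  data.frame(
    Id = as.integer(parts[, 2]),
    Count = as.integer(parts[, 3]),
    Comment = parts[, 4],
    stringsAsFactors = FALSE
  )
}

df <- parse_lines(c("1,1000,I, am done, with you", "2,500, i don't know"))
print(df)
```

To read from a file instead, pass the result of readLines() on your file path to the helper.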
