I have a CSV file. I want to read it in R but split each line only at the first two commas, i.e. if there is a line like this in the file,
1,1000,I, am done, with you
in R I want it to become a row of a data frame with three columns, like this:
> df <- data.frame("Id"="1","Count" ="1000", "Comment" = "I, am done, with you")
> df
Id Count Comment
1 1 1000 I, am done, with you
A regular expression will work.
For example, suppose str holds the rows you want to parse, and your CSV file looks like this:
1,1000,I, am done, with you
2,500, i don't know
If you want to read from a file, just call readLines() to read all lines of the file into a character vector, just like str.
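A minimal sketch of that step in base R. Here a temporary file created with tempfile() stands in for your real CSV; the path is only an assumption for the demo, so replace it with your own file name.

```r
# Write the two sample rows to a temporary file (a stand-in for your real CSV)
path <- tempfile(fileext = ".csv")
writeLines(c("1,1000,I, am done, with you", "2,500, i don't know"), path)

# readLines() returns one character element per line, just like the str vector
str <- readLines(path)
print(str)
```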
The technique is simple. Here I use the {stringr} package to match the text and extract the information I need.
str <- c("1,1000,I, am done, with you", "2,500, i don't know")
library(stringr)
# match the strings by pattern integer,integer,anything
matches <- str_match(str,pattern="(\\d+),(\\d+),\\s*(.+)")
Briefly, here is how the pattern (\\d+),(\\d+),\\s*(.+) works (the backslashes are doubled because the pattern is written inside an R string). \\d matches a digit character, \\s matches a whitespace character, and . matches any character. + means one or more, and * means zero or more. Parentheses () group parts of the pattern into capture groups, so the function knows which pieces of information we want to extract.
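If you would rather avoid the stringr dependency, base R's regexec() combined with regmatches() extracts the same capture groups. This is a swapped-in alternative, not what the answer above uses; perl = TRUE is chosen here so that \\s* and .+ behave greedily in the Perl style, as they do in stringr.

```r
str <- c("1,1000,I, am done, with you", "2,500, i don't know")

pattern <- "(\\d+),(\\d+),\\s*(.+)"
# regexec() records where the whole match and each () group start and end;
# regmatches() turns those positions into the matched strings
parts <- regmatches(str, regexec(pattern, str, perl = TRUE))

# Each element is c(full match, group 1, group 2, group 3); stack them into a matrix.
# Note: a line that does not match yields character(0) and is dropped by rbind().
matches <- do.call(rbind, parts)
print(matches)
```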
If you print matches, you get:
[,1] [,2] [,3] [,4]
[1,] "1,1000,I, am done, with you" "1" "1000" "I, am done, with you"
[2,] "2,500, i don't know" "2" "500" "i don't know"
As you can see, str_match() has split each string by the pattern into a matrix: column 1 holds the full match and columns 2 to 4 hold the three captured groups. All that remains is to turn the matrix into a data frame with the correct data types.
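# drop column 1 (the full match) and keep the three captured groups as character columns
df <- data.frame(matches[, -1], stringsAsFactors = FALSE)
colnames(df) <- c("Id", "Count", "Comment")
# convert the numeric fields from character to integer
df <- transform(df, Id = as.integer(Id), Count = as.integer(Count))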
Now df is our target:
Id Count Comment
1 1 1000 I, am done, with you
2 2 500 i don't know
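Putting the steps together, here is a sketch of a reusable helper in base R. The function name read_two_comma_csv is hypothetical, and two behaviours are my own choices rather than part of the answer above: the pattern is anchored with ^ and $ so each whole line must have the form integer,integer,text, and lines that do not match are skipped instead of producing NA rows.

```r
# Hypothetical helper: read a file and split each line on its first two commas only
read_two_comma_csv <- function(path) {
  lines <- readLines(path)
  pattern <- "^(\\d+),(\\d+),\\s*(.*)$"
  parts <- regmatches(lines, regexec(pattern, lines, perl = TRUE))

  # Non-matching lines come back as character(0); keep only full matches
  ok <- lengths(parts) == 4
  if (!any(ok)) {
    return(data.frame(Id = integer(0), Count = integer(0),
                      Comment = character(0), stringsAsFactors = FALSE))
  }
  m <- do.call(rbind, parts[ok])

  # Column 1 is the full match; columns 2 to 4 are the captured groups
  data.frame(
    Id = as.integer(m[, 2]),
    Count = as.integer(m[, 3]),
    Comment = m[, 4],
    stringsAsFactors = FALSE
  )
}

# Usage with a temporary file standing in for the real CSV
path <- tempfile(fileext = ".csv")
writeLines(c("1,1000,I, am done, with you", "2,500, i don't know"), path)
df <- read_two_comma_csv(path)
print(df)
```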