
How to split a string conditionally in R?

I would like to split a string into multiple columns based on a number of conditions.

An example of my data:

Col1<- c("01/05/2004 02:59", "01/05/2004 05:04", "01/06/2004 07:19", "01/07/2004 02:55", "01/07/2004 04:32", "01/07/2004 04:38", "01/07/2004 17:13", "01/07/2004 18:40", "01/07/2004 20:58", "01/07/2004 23:39", "01/09/2004 13:28")

Col2<- c("Wabamun #4 off line.", "Keephills #2 on line.", "Wabamun #1 on line.", "North Red Deer T217s bus lock out.  Under investigation.",  "T217s has blown CTs on 778L", "T217s North Red Deer bus back in service (778L out of service)", "Keephills #2 off line.", "Wabamun #4 on line.", "Sundance #1 off line.", "Keephills #2 on line", "Homeland security event lowered to yellow ( elevated)")

df<- data.frame(Col1,Col2)

I would like to be able to split Col2 conditionally.

To get something like this:

Col3<- c("Wabamun #4", "Keephills #2", "Wabamun #1", "General Asset", "General Asset", "General Asset", "Keephills #2", "Wabamun #4", "Sundance #1", "Keephills #2", "General Asset") 

Col4<- c("off line.", "on line.", "on line.", "North Red Deer T217s bus lock out.  Under investigation.",  "T217s has blown CTs on 778L", "T217s North Red Deer bus back in service (778L out of service)", "off line.", "on line.", "off line.", "on line", "Homeland security event lowered to yellow ( elevated)")

Afterwards, I plan to find the time between when an asset goes down and when it comes back online. These are often generating plants, so I would also look up the capacity of each plant. For example, Keephills #2 has a capacity of 300 MW.

Thankfully, regular expressions are here to save the day.

# This line prevents character strings turning into factors
df<- data.frame(Col1,Col2, stringsAsFactors=FALSE)

# This pattern matches the power plant names, which are all
# one or more letters followed by a space, a hash and a single digit.
pwrmatch <- regexpr("^[[:alpha:]]+ #[[:digit:]]", df$Col2)
df$Col3 <- "General Asset"
df$Col3[grepl("^[[:alpha:]]+ #[[:digit:]]", df$Col2)] <- regmatches(df$Col2, pwrmatch)

Col3 now looks like: c("Wabamun #4", "Keephills #2", "Wabamun #1", "General Asset", "General Asset", "General Asset", "Keephills #2", "Wabamun #4", "Sundance #1", "Keephills #2", "General Asset")
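
The `grepl` index on the left-hand side is needed because `regmatches` returns only the matched substrings and drops elements with no match, so its result is shorter than the input. A minimal sketch on a made-up three-element vector (the names `x`, `hits` and `asset` are illustrative, not part of the answer above):

```r
x <- c("Wabamun #4 off line.", "T217s has blown CTs on 778L", "Keephills #2 on line")
pattern <- "^[[:alpha:]]+ #[[:digit:]]"

m <- regexpr(pattern, x)      # match start per element, -1 where nothing matched
hits <- regmatches(x, m)      # matched substrings only; unmatched elements are dropped

matched <- grepl(pattern, x)  # logical index of the elements that matched
asset <- rep("General Asset", length(x))
asset[matched] <- hits        # lengths agree: one hit per TRUE in `matched`

length(hits)  # 2, not 3
asset         # "Wabamun #4" "General Asset" "Keephills #2"
```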

The other column is handled the same way, by matching every occurrence of "on line" or "off line".

linematch <- regexpr("(on|off) line", df$Col2)
df$Col4 <- df$Col2
df$Col4[grepl("(on|off) line", df$Col2)] <- regmatches(df$Col2, linematch)

Col4 now looks like: c("off line", "on line", "on line", "North Red Deer T217s bus lock out. Under investigation.", "T217s has blown CTs on 778L", "T217s North Red Deer bus back in service (778L out of service)", "off line", "on line", "off line", "on line", "Homeland security event lowered to yellow ( elevated)" )
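
Note that this result has no trailing period, whereas the Col4 requested in the question keeps it ("off line."). If the period matters, one possible variation is to make it an optional part of the pattern. This is a sketch on a small made-up vector, not the answer's original code; `status` and `out` are illustrative names:

```r
status <- c("Wabamun #4 off line.", "Keephills #2 on line", "T217s has blown CTs on 778L")
pattern <- "(on|off) line\\.?"  # \\.? matches a trailing period when one is present

m <- regexpr(pattern, status)
out <- status                   # rows without a match keep their full text
out[grepl(pattern, status)] <- regmatches(status, m)

out  # "off line." "on line" "T217s has blown CTs on 778L"
```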

> Col3 <- Col4 <- character(nrow(df))
> index <- grep("#", Col2, invert = TRUE)
> spl1 <- unlist(strsplit(Col2[-index], " o"))[c(TRUE, FALSE)]
> Col3[-index] <- spl1
> Col3[index] <- "General Asset"
> spl2 <- unlist(strsplit(Col2[-index], " o"))[c(FALSE, TRUE)]
> Col4[-index] <- paste("o", spl2, sep="")
> Col4[index] <- Col2[index]
> Col3
## [1] "Wabamun #4"    "Keephills #2"  "Wabamun #1"    "General Asset"
## [5] "General Asset" "General Asset" "Keephills #2"  "Wabamun #4"   
## [9] "Sundance #1"   "Keephills #2"  "General Asset"
> Col4
##  [1] "off line."                                                     
##  [2] "on line."                                                      
##  [3] "on line."                                                      
##  [4] "North Red Deer T217s bus lock out.  Under investigation."      
##  [5] "T217s has blown CTs on 778L"                                   
##  [6] "T217s North Red Deer bus back in service (778L out of service)"
##  [7] "off line."                                                     
##  [8] "on line."                                                      
##  [9] "off line."                                                     
## [10] "on line"                                                       
## [11] "Homeland security event lowered to yellow ( elevated)"      
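
The `strsplit` approach above relies on two things: splitting on " o" consumes the leading "o" of "on"/"off" (which is why it is pasted back), and the logical indices `c(TRUE, FALSE)` and `c(FALSE, TRUE)` are recycled to pick alternating elements. A small sketch of those two steps on made-up input (`x`, `parts`, `name` and `state` are illustrative names):

```r
x <- c("Wabamun #4 off line.", "Keephills #2 on line")

parts <- unlist(strsplit(x, " o"))
parts  # "Wabamun #4" "ff line." "Keephills #2" "n line"

name  <- parts[c(TRUE, FALSE)]               # 1st, 3rd, ... elements
state <- paste0("o", parts[c(FALSE, TRUE)])  # 2nd, 4th, ... with the "o" restored

name   # "Wabamun #4" "Keephills #2"
state  # "off line." "on line"
```

This only works while every string contains " o" exactly once. A string with a second " o" would split into three pieces and shift the alternation for every element after it.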
