
Splitting a column in an R dataframe

I have a column of data in an R data frame that has values such as:

Blue-#105
Green-#8845
Yellow-#5454
Blue-#999

I want to remove the trailing number part (starting at -#) so that Blue-#999 and Blue-#105 are considered the same thing when plotting. How could I accomplish this?

Use regular expressions:

> DF <- data.frame(col=c("Blue-#105", "Green-#8845", "Blue-#999"))
> DF
          col
1   Blue-#105
2 Green-#8845
3   Blue-#999
> DF$col <- gsub("-\\#.*", "", DF$col)
> DF
    col
1  Blue
2 Green
3  Blue
> 

Here we say that any part of a string that starts with -# and is followed by whatever will get replaced by the empty string, or in other words, removed. The "whatever" is .* in regular expression lingo: any char (the dot) repeated as many times as it fits (the star). The # is escaped in the pattern above, although it is not a regex metacharacter in R, so "-#.*" works as well.
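As a minimal sketch of how this helps with grouping, the snippet below applies the pattern to the four values from the question and counts the resulting colors. The variable names (x, escaped, plain, counts) are made up for illustration, and sub is used instead of gsub since each value has only one -# part.

```r
# Sample values taken from the question
x <- c("Blue-#105", "Green-#8845", "Yellow-#5454", "Blue-#999")

# Strip everything from "-#" onward, with and without escaping the '#'
escaped <- sub("-\\#.*", "", x)
plain   <- sub("-#.*", "", x)

# Both patterns give the same result
identical(escaped, plain)

# Count occurrences per color; the two Blue entries now fall together
counts <- table(plain)
counts
```

With the suffix removed, a call such as barplot(counts) would treat both Blue entries as one category.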

Use the sub or gsub function. For your example you could do something like:

newcolors <- sub("^([^-]*)-.*$", "\\1", oldcolors )

This assumes that the colors are in a vector 'oldcolors' and puts the results into 'newcolors'. The pattern starts at the beginning of the string (^), then matches 0 or more characters that are not dashes ([^-]*); the parentheses around that say to save what is matched. It then matches a dash followed by any further characters (.*) until the end of the string ($). The matched portion (the entire string) is then replaced by whatever was matched within the parentheses (the color).
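A short sketch of the capture-group approach on a small vector follows. The values "Dark-Blue-#7" and "Red" are hypothetical additions, not from the question, included only to show how the pattern behaves on a color name containing a dash and on a string with no dash at all.

```r
# "Dark-Blue-#7" and "Red" are hypothetical values added for illustration
oldcolors <- c("Blue-#105", "Green-#8845", "Dark-Blue-#7", "Red")

# Keep only what precedes the first dash; strings without a dash do not
# match the pattern and are returned unchanged
newcolors <- sub("^([^-]*)-.*$", "\\1", oldcolors)
newcolors
# [1] "Blue"  "Green" "Dark"  "Red"

# For comparison: anchoring on "-#" instead keeps dashes inside the name
sub("-#.*", "", oldcolors)
# [1] "Blue"      "Green"     "Dark-Blue" "Red"
```

The two patterns agree on the data in the question. They differ only when the color name itself contains a dash, so which one fits depends on what the real column looks like.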
