
(R) Parse character vector and split into two separate columns

I have a data frame with character columns in the form "mean (sd)", like so:

table <- tribble(
  ~var1, ~var2,
  #------------
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

I would like to split each column into two columns, one for the mean and one for the sd. Something like:

table_split <- tribble(
  ~var1_mean, ~var1_sd, ~var2_mean, ~var2_sd,
  #---------------------
  27.0, 3.1, 171.4, 9.0,
  27.0, 3.2, 176.8, 7.2,
  27.1, 3.0, 165.0, 6.2
)

So far, I have tried tidyr::separate(table, var1, c("var1_mean", "var1_sd"), sep = " \\(") which only partially works: it does not remove the closing parenthesis, so the sd column ends up holding values like "3.1)".
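A minimal sketch of one way around the leftover parenthesis: tidyr::extract() splits with capture groups instead of a separator, so the closing ")" falls outside both groups and is never kept, and convert = TRUE turns the pieces into numbers. The variable name mean_sd_regex is only illustrative, and the pattern assumes non-negative decimal numbers with a single space before the "(".

```r
library(dplyr)
library(tidyr)

table <- tribble(
  ~var1, ~var2,
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

# Group 1: the number before " (". Group 2: the number inside the parentheses.
# The trailing ")" is matched but not captured, so it never reaches the output.
mean_sd_regex <- "^([0-9.]+) \\(([0-9.]+)\\)$"

table_split <- table %>%
  extract(var1, into = c("var1_mean", "var1_sd"), regex = mean_sd_regex, convert = TRUE) %>%
  extract(var2, into = c("var2_mean", "var2_sd"), regex = mean_sd_regex, convert = TRUE)

table_split
```

Note that tidyr 1.3.0 marks extract() as superseded, though it still works.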

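Use separate() as shown below. The regular expression "[ ()]+" treats every run of spaces and parentheses as a separator, so "27.0 (3.1)" splits into "27.0", "3.1" and an empty trailing piece after the final ")"; the NA in into drops that third piece. This requires tidyr 0.8.2 or later, because earlier versions did not support NA in the into argument. The new columns are character; add convert = TRUE to each separate() call if you need numeric columns.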

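library(dplyr)  # provides %>%
library(tidyr)  # provides separate()

table %>% 
  # split on any run of spaces/parentheses; NA discards the empty trailing piece
  separate(var1, into = c("mean1", "sd1", NA), sep = "[ ()]+") %>%
  separate(var2, into = c("mean2", "sd2", NA), sep = "[ ()]+")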

giving:

# A tibble: 3 x 4
  mean1 sd1   mean2 sd2  
  <chr> <chr> <chr> <chr>
1 27.0  3.1   171.4 9.0  
2 27.0  3.2   176.8 7.2  
3 27.1  3.0   165.0 6.2 
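With tidyr 1.3.0 or later, a possible alternative is separate_wider_regex(), which can handle both columns in one call and, through names_sep, produce exactly the var1_mean / var1_sd names from the question. This is a sketch assuming that tidyr version; the patterns assume non-negative decimal numbers. Named pattern pieces become columns, while the unnamed pieces (" (" and ")") are matched and thrown away.

```r
library(dplyr)
library(tidyr)  # separate_wider_regex() needs tidyr >= 1.3.0

table <- tribble(
  ~var1, ~var2,
  "27.0 (3.1)", "171.4 (9.0)",
  "27.0 (3.2)", "176.8 (7.2)",
  "27.1 (3.0)", "165.0 (6.2)"
)

table_split <- table %>%
  separate_wider_regex(
    c(var1, var2),
    # named pieces are kept as columns; unnamed pieces are matched and dropped
    patterns = c(mean = "[0-9.]+", " \\(", sd = "[0-9.]+", "\\)"),
    names_sep = "_"  # var1 + "_" + mean -> var1_mean, etc.
  ) %>%
  # the extracted pieces are character, so convert them to numbers
  mutate(across(everything(), as.numeric))

table_split
```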

In base R you would do:

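# Build the new names: mean1, sd1, mean2, sd2, ...
nms <- paste0(c("mean", "sd"), rep(seq_len(ncol(table)), each = 2))

# Paste each row into one string, drop the parentheses, and let read.table() parse the numbers
read.table(text = gsub("[()]", "", do.call(paste, table)), col.names = nms)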

  mean1 sd1 mean2 sd2
1  27.0 3.1 171.4 9.0
2  27.0 3.2 176.8 7.2
3  27.1 3.0 165.0 6.2
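If you want to keep the original column names (var1_mean, var1_sd, ...) in base R and handle any number of columns, one option is a small helper built on sub(). The function name split_mean_sd is hypothetical, and the regexes assume each cell looks like "number (number)" with non-negative decimals.

```r
# Hypothetical helper: split every "mean (sd)" column of a data frame into
# numeric <name>_mean and <name>_sd columns.
split_mean_sd <- function(df) {
  pieces <- lapply(names(df), function(nm) {
    x <- as.character(df[[nm]])
    out <- data.frame(
      # number at the start of the string
      mean = as.numeric(sub("^\\s*([0-9.]+).*$", "\\1", x)),
      # number inside the parentheses
      sd   = as.numeric(sub("^.*\\(\\s*([0-9.]+)\\s*\\).*$", "\\1", x))
    )
    names(out) <- paste0(nm, c("_mean", "_sd"))
    out
  })
  # bind the per-column pairs side by side
  do.call(cbind, pieces)
}

table <- data.frame(
  var1 = c("27.0 (3.1)", "27.0 (3.2)", "27.1 (3.0)"),
  var2 = c("171.4 (9.0)", "176.8 (7.2)", "165.0 (6.2)"),
  stringsAsFactors = FALSE
)

split_mean_sd(table)
```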
