How can I create a new data frame with several rows for each observation, based on a string column?

I have a data frame in R with data on observations. One column stores several data points for each observation as one long string with separators. I would like to restructure the data so that each observation can span several rows, one per data point, as in the example below.

The data right now looks like this:

df <- data.frame(matrix(c("A", "B",
                          "X", "Y",
                          "{data1},{data2}", "{data1}"),
                 nrow = 2,
                 ncol = 3,
                 byrow = FALSE))
names(df) <- c("key", "info", "more_info")

I would like it to look like this:

df <- data.frame(matrix(c("A", "A", "B",
                          "X", "X", "Y",
                          "{data1}", "{data2}", "{data1}"),
                 nrow = 3,
                 ncol = 3,
                 byrow = FALSE))
names(df) <- c("key", "info", "more_info")

My first idea was to use separate() followed by pivot_longer(), but this ran into problems because the number of values in the last column differs between observations. For some observations it can contain hundreds of records.
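For reference, the separate() plus pivot_longer() idea can be made to work if the varying length is handled explicitly. A minimal sketch, assuming dplyr and tidyr are installed: count the pieces in the longest string, let separate() pad shorter rows with NA via fill = "right", then drop the padding with values_drop_na = TRUE. The part_* column names and the column-wise construction of df are illustrative choices, not part of the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built column-wise with character columns
df <- data.frame(key = c("A", "B"),
                 info = c("X", "Y"),
                 more_info = c("{data1},{data2}", "{data1}"),
                 stringsAsFactors = FALSE)

# Number of pieces in the longest string, so separate() gets enough columns
n_max <- max(lengths(strsplit(df$more_info, ",", fixed = TRUE)))

long <- df %>%
  # part_* names are arbitrary placeholders; shorter strings are padded
  # with NA on the right instead of triggering a warning
  separate(more_info, into = paste0("part_", seq_len(n_max)),
           sep = ",", fill = "right") %>%
  # Stack the part_* columns into one column and drop the NA padding
  pivot_longer(starts_with("part_"), values_to = "more_info",
               values_drop_na = TRUE) %>%
  # Remove the helper column holding the old part_* names
  select(-name)

long
```

This needs one extra pass to find n_max and creates a wide intermediate table, which is why the answers below, which never build the wide form, scale better when some strings hold hundreds of records.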

You can use separate_rows from tidyr:

> library(tidyr)
> separate_rows(df, more_info, sep=",")
# A tibble: 3 x 3
  key   info  more_info
  <fct> <fct> <chr>    
1 A     X     {data1}  
2 A     X     {data2}  
3 B     Y     {data1}  
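To see what separate_rows() is doing, the same reshape can be written in base R without tidyr: split each string, repeat each row once per piece, and attach the flattened pieces. This is a sketch of an equivalent result, not how tidyr implements it internally; df is rebuilt column-wise here so the block runs on its own.

```r
# Same data as in the question, built column-wise with character columns
df <- data.frame(key = c("A", "B"),
                 info = c("X", "Y"),
                 more_info = c("{data1},{data2}", "{data1}"),
                 stringsAsFactors = FALSE)

# Split each string into a character vector of its pieces
pieces <- strsplit(df$more_info, ",", fixed = TRUE)

# Repeat each original row once per piece, keeping the other columns
out <- df[rep(seq_len(nrow(df)), lengths(pieces)), c("key", "info")]

# The flattened pieces are in the same order as the repeated rows
out$more_info <- unlist(pieces)
rownames(out) <- NULL

out
```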

Another option is to split the string with strsplit() and then expand the result with unnest():

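library(dplyr)
library(tidyr)
df %>% 
    # Split each string into a list of pieces; as.character() is needed
    # when more_info is a factor (the default before R 4.0)
    mutate(more_info = strsplit(as.character(more_info), ",")) %>% 
    # Expand the list column to one row per piece
    unnest(cols = c(more_info))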
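In tidyr 1.3.0 and later, separate_rows() is superseded in favour of separate_longer_delim(), which does the same reshape. A minimal sketch, assuming tidyr 1.3.0 or newer is installed; unlike the sep argument of separate_rows(), delim is matched as a fixed string rather than a regular expression.

```r
library(tidyr)  # separate_longer_delim() requires tidyr >= 1.3.0

# Same data as in the question, built column-wise with character columns
df <- data.frame(key = c("A", "B"),
                 info = c("X", "Y"),
                 more_info = c("{data1},{data2}", "{data1}"),
                 stringsAsFactors = FALSE)

# One row per comma-separated piece; other columns are repeated
long <- separate_longer_delim(df, more_info, delim = ",")

long
```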
