Unnesting a complex dataframe

I'm trying to unpack a dataframe with columns that contain sub dataframes in each row.

The problem is that the sub dataframes in each column have different sizes (e.g. 1x3, 2x3 and 2x2). Moreover, one column of a sub dataframe (Conversions.Value) holds different data types from row to row (num and char). During unpacking I get error messages like 'can't recycle input of size 3 to size 2.' or 'Can't combine ..1$Conversions$Value and ..6$Conversions$Value .' The structure is shown below.

df <- structure(list(
Conversions = list(structure(list(Field = "Volume", 
    Unit = "m3", Value = 338L), class = "data.frame", row.names = 1L), 
    structure(list(Field = "Volume", Unit = "m3", Value = 450L), class = "data.frame", row.names = 1L)),     

Categories = list(structure(list(CategorySystem = c("Base", 
    NA), Title = c("Mineral materials and glass (excluding concrete)", "213.7 Kevytbetoni, Aerated concrete"), ClassificationType = c(NA, 
    "Talo2000")), class = "data.frame", row.names = 1:2), structure(list(
        CategorySystem = c("Base", NA), Title = c("Mineral materials and glass (excluding concrete)", 
        "213.7 Kevytbetoni, Aerated concrete"), ClassificationType = c(NA, 
        "Talo2000")), class = "data.frame", row.names = 1:2)), 
   
DataItems.DataValueItems = list(structure(list(DataModuleCode = c("A1-A3 Conservative",  "A1-A3 Typical"), Value = c(0.43, 0.36)), class = "data.frame", row.names = 1:2), 
        structure(list(DataModuleCode = c("A1-A3 Conservative", 
        "A1-A3 Typical"), Value = c(0.41, 0.34)), class = "data.frame", row.names = 1:2)), 
   
ResourceId = c(7000000995, 7000000996)), row.names = 1:2, class = "data.frame")

So far I've tried:

library(tidyr)   # unnest_wider(), unnest_longer()
library(dplyr)   # mutate(), across(), %>%
library(readr)   # type_convert()

unnest_wider(df, col = 1:3, names_repair = "universal")
# Runs, but the multiple observations stay packed as lists in a single row,
# and the lists have different lengths.

unnest_longer(df, col = 1:3, names_repair = "universal") %>%
  mutate(across(.fns = as.character)) %>%
  type_convert()
# Error: Can't combine `..1$Conversions$Value` <integer> and `..6$Conversions$Value` <character>.

df$Conversions <- lapply(df$Conversions, FUN = as.character)
unnest_longer(df, col = 1:3, names_repair = "universal") %>%
  mutate(across(.fns = as.character)) %>%
  type_convert()
# Error: In row 1, can't recycle input of size 3 to size 2.
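A likely reading of the last error, offered as a sketch rather than a confirmed diagnosis: as.character() applied to a whole data frame returns one string per column, so each 1x3 sub dataframe turns into a character vector of length 3, which then cannot be recycled against the 2-row sub dataframes in the other columns. The snippet below uses a reduced copy of the Conversions column from the sample data; the names `conversions`, `flattened` and `value_as_char` are made up for illustration. Converting only the Value column inside each sub dataframe keeps the elements as data frames.

```r
# Reduced copy of the question's Conversions column (two 1x3 sub dataframes).
conversions <- list(
  data.frame(Field = "Volume", Unit = "m3", Value = 338L),
  data.frame(Field = "Volume", Unit = "m3", Value = 450L)
)

# as.character() on a whole data frame yields one string per column,
# so each element is now a character vector of length 3, not a data frame.
flattened <- lapply(conversions, as.character)
lengths(flattened)

# Converting only the Value column keeps each element a 1x3 data frame
# and gives Value the same type (character) in every row of the outer table.
value_as_char <- lapply(conversions, function(x) {
  x$Value <- as.character(x$Value)
  x
})
sapply(value_as_char, function(x) class(x$Value))
```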

Ideally, this is what the outcome would look like.

EDIT: rbindlist worked, but only when applied to each column separately. As a result I lose the primary identifier of each row (ResourceId), and the data can no longer be rejoined.

rbindlist(lapply(df$Conversions, as.data.frame.list), fill=TRUE)
rbindlist(lapply(df$Categories, as.data.frame.list), fill=TRUE)
rbindlist(lapply(df$DataItems.DataValueItems, as.data.frame.list), fill=TRUE)

How do I paste the ResourceId into the dataframe structure of each column, so that when rbindlist is applied afterwards, the result has a column containing the respective ResourceId values?

I know this is hideous, but since nobody else has answered yet, I figured I'd post it, since I think this is what you wanted. Let me know:

library(data.table)

df <- structure(list(
  Conversions = list(
    structure(list(Field = "Volume", Unit = "m3", Value = 338L),
              class = "data.frame", row.names = 1L),
    structure(list(Field = "Volume", Unit = "m3", Value = 450L),
              class = "data.frame", row.names = 1L)),

  Categories = list(
    structure(list(
      CategorySystem = c("Base", NA),
      Title = c("Mineral materials and glass (excluding concrete)",
                "213.7 Kevytbetoni, Aerated concrete"),
      ClassificationType = c(NA, "Talo2000")),
      class = "data.frame", row.names = 1:2),
    structure(list(
      CategorySystem = c("Base", NA),
      Title = c("Mineral materials and glass (excluding concrete)",
                "213.7 Kevytbetoni, Aerated concrete"),
      ClassificationType = c(NA, "Talo2000")),
      class = "data.frame", row.names = 1:2)),

  DataItems.DataValueItems = list(
    structure(list(
      DataModuleCode = c("A1-A3 Conservative", "A1-A3 Typical"),
      Value = c(0.43, 0.36)),
      class = "data.frame", row.names = 1:2),
    structure(list(
      DataModuleCode = c("A1-A3 Conservative", "A1-A3 Typical"),
      Value = c(0.41, 0.34)),
      class = "data.frame", row.names = 1:2)),

  ResourceId = c(7000000995, 7000000996)),
  row.names = 1:2, class = "data.frame")


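# Flatten each column in turn; every element of `unlisted` is one wide table.
unlisted <- list()
for (i in seq_along(df)) {
  unlisted[[i]] <- rbindlist(lapply(df[i], as.data.frame.list), fill = TRUE)
}

# Bind the per-column tables side by side, then drop every column whose
# contents duplicate an earlier column.
cbind_new_list <- as.data.frame(do.call(cbind, unlisted))
removed_duplicates <- cbind_new_list[!duplicated(as.list(cbind_new_list))]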

removed_duplicates
   Field Unit Value Value.1 CategorySystem                                            Title ClassificationType     DataModuleCode Value.2 Value.1.1 X7000000995 X7000000996
1 Volume   m3   338     450           Base Mineral materials and glass (excluding concrete)               <NA> A1-A3 Conservative    0.43      0.41  7000000995  7000000996
2 Volume   m3   338     450           <NA>              213.7 Kevytbetoni, Aerated concrete           Talo2000      A1-A3 Typical    0.36      0.34  7000000995  7000000996
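For the question in the EDIT (keeping ResourceId when rbindlist is applied to one column at a time), here is a minimal sketch of another route, not the approach used above. It relies on the `idcol` argument of data.table's rbindlist(), which copies the names of a named list into an id column. The helper name `stack_with_id` is made up for illustration, and the data is a reduced copy of the sample (two of the three list columns).

```r
library(data.table)

# Reduced copy of the question's data: the key plus two list columns.
df <- data.frame(ResourceId = c(7000000995, 7000000996))
df$Conversions <- list(
  data.frame(Field = "Volume", Unit = "m3", Value = 338L),
  data.frame(Field = "Volume", Unit = "m3", Value = 450L)
)
df$DataItems.DataValueItems <- list(
  data.frame(DataModuleCode = c("A1-A3 Conservative", "A1-A3 Typical"),
             Value = c(0.43, 0.36)),
  data.frame(DataModuleCode = c("A1-A3 Conservative", "A1-A3 Typical"),
             Value = c(0.41, 0.34))
)

# Hypothetical helper: stack one list column and carry the key along.
stack_with_id <- function(col, ids, id_name = "ResourceId") {
  # Name each sub dataframe after its key; format() avoids scientific notation.
  named <- setNames(col, format(ids, scientific = FALSE, trim = TRUE))
  # For a named list, rbindlist() writes the names into the id column.
  out <- rbindlist(named, idcol = id_name, fill = TRUE)
  # The names are text, so convert the key back to numeric.
  set(out, j = id_name, value = as.numeric(out[[id_name]]))
  out
}

conversions <- stack_with_id(df$Conversions, df$ResourceId)
data_items  <- stack_with_id(df$DataItems.DataValueItems, df$ResourceId)

conversions
data_items
```

Each resulting table is in long form with one row per row of the original sub dataframes, so the tables can be joined back to each other or to the outer table on ResourceId.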
