
How to combine values with the same column name into a new dataframe in R?

I have the following dataset.

Original dataset:

ID    Col1    Col1    Col1    Col2    Col2    Col2
A     Dog                     House
B     Dog                             Car     Bike
C             Cat             House
D                     Mouse                   Bike

Is there a way to create a new dataframe that combines all values sharing the same column name, as shown below?

Expected dataset:

ID    Col1    Col2    
A     Dog     House
B     Dog     Car, Bike
C     Cat     House
D     Mouse   Bike

You can do it like this:

df <- structure(list(
  ID = c("A", "B", "C", "D"),
  Col1 = c("Dog", "Dog", NA, NA),
  Col1 = c(NA, NA, "Cat", NA),
  Col1 = c(NA, NA, NA, "Mouse"),
  Col2 = c("House", NA, "House", NA),
  Col2 = c(NA, "Car", NA, NA),
  Col2 = c(NA, "Bike", NA, "Bike")
),
class = c("data.frame"), row.names = c(NA, -4L)
)

library(dplyr)
library(tidyr)
library(purrr)

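# Column names to combine: every distinct name except the ID column
vars_to_unite <- setdiff(unique(names(df)), "ID")
# Make the duplicated names unique (Col1...2, Col1...3, ...) so they can be selected
renamed_df <- as_tibble(df, .name_repair = "unique")

# For each name, unite all columns that start with it, dropping NAs
map_dfc(vars_to_unite, 
        ~unite(
          select(renamed_df, starts_with(.x)), 
          col = !!.x, sep = ", ", na.rm = TRUE
        )) %>% 
  mutate(ID = df$ID)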

#> # A tibble: 4 × 3
#>   Col1  Col2      ID   
#>   <chr> <chr>     <chr>
#> 1 Dog   House     A    
#> 2 Dog   Car, Bike B    
#> 3 Cat   House     C    
#> 4 Mouse Bike      D

Created on 2022-06-01 by the reprex package (v2.0.1)
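
To see why `starts_with(.x)` picks up the right columns, it may help to look at what `.name_repair = "unique"` does to the duplicated names. The sketch below rebuilds the same toy data. The stricter `matches()` pattern at the end is an assumption, not part of the answer above; it is only meant for cases where one name is a prefix of another (say `Col1` and `Col10`).

```r
library(dplyr)   # select(), matches()
library(tibble)  # as_tibble()

# Same toy data as above: a data.frame with repeated column names.
df <- structure(list(
  ID = c("A", "B", "C", "D"),
  Col1 = c("Dog", "Dog", NA, NA),
  Col1 = c(NA, NA, "Cat", NA),
  Col1 = c(NA, NA, NA, "Mouse"),
  Col2 = c("House", NA, "House", NA),
  Col2 = c(NA, "Car", NA, NA),
  Col2 = c(NA, "Bike", NA, "Bike")
), class = "data.frame", row.names = c(NA, -4L))

# Duplicated names get a "...<position>" suffix; the already-unique ID is untouched.
renamed_df <- as_tibble(df, .name_repair = "unique")
repaired_names <- names(renamed_df)
print(repaired_names)
# Names: ID, Col1...2, Col1...3, Col1...4, Col2...5, Col2...6, Col2...7

# Assumption (not in the answer above): anchor the pattern so that "Col1"
# cannot also match a column such as "Col10...8".
col1_only <- select(renamed_df, matches("^Col1\\.\\.\\.[0-9]+$"))
print(names(col1_only))
```

With the data shown here both approaches select the same three columns, so the anchored pattern only matters when column names overlap.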

Base R solution:

# Input data: df => data.frame
df <- structure(list(
  ID = c("A", "B", "C", "D"),
  Col1 = c("Dog", "Dog", NA, NA),
  Col1 = c(NA, NA, "Cat", NA),
  Col1 = c(NA, NA, NA, "Mouse"),
  Col2 = c("House", NA, "House", NA),
  Col2 = c(NA, "Car", NA, NA),
  Col2 = c(NA, "Bike", NA, "Bike")
),
  class = c("data.frame"), row.names = c(NA, -4L)
)


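# Split-Apply-Combine: res => data.frame
res <- data.frame(
  do.call(
    cbind,
    lapply(
      # Split the columns into groups that share a name
      split.default(df, names(df)),
      # Within each group, paste the non-NA values of every row together
      function(x) {
        apply(x, 1, FUN = function(y) toString(na.omit(y)))
      }
    )
  )[, unique(names(df))],  # restore the original column order
  stringsAsFactors = FALSE,
  row.names = row.names(df)
)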

# output Result: data.frame => stdout(console)
res
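
If this has to be done for more than one data frame, the same split-apply-combine idea can be wrapped in a small function. This is only a sketch: the name `combine_duplicate_columns` and its `sep` argument are hypothetical and do not come from the answer above, and it assumes all columns can be treated as character values.

```r
# Hypothetical helper (not from the original answer): collapse columns that
# share a name into one column, joining the non-NA values of each row with `sep`.
combine_duplicate_columns <- function(data, sep = ", ") {
  col_names <- unique(names(data))
  combined <- lapply(col_names, function(nm) {
    # All columns carrying this name, as a character matrix
    block <- as.matrix(data[names(data) == nm])
    # Row by row, drop the NAs and paste what is left
    apply(block, 1, function(row) paste(row[!is.na(row)], collapse = sep))
  })
  names(combined) <- col_names
  data.frame(combined, stringsAsFactors = FALSE, check.names = FALSE)
}

# Same toy data as above
df <- structure(list(
  ID = c("A", "B", "C", "D"),
  Col1 = c("Dog", "Dog", NA, NA),
  Col1 = c(NA, NA, "Cat", NA),
  Col1 = c(NA, NA, NA, "Mouse"),
  Col2 = c("House", NA, "House", NA),
  Col2 = c(NA, "Car", NA, NA),
  Col2 = c(NA, "Bike", NA, "Bike")
), class = "data.frame", row.names = c(NA, -4L))

res <- combine_duplicate_columns(df)
res
```

A row in which every column of a group is `NA` ends up as an empty string rather than `NA`, which matches the behaviour of `toString(na.omit(y))` in the code above.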

