
Merge three dataframes by two columns, keeping only the largest value in R

Suppose I have three data frames:

df1 <- data.frame (first_column  = c("A", "B","C"),
                  second_column = c(5, 5, 5),
                  third_column = c(1, 1, 1)
                   )

df2 <- data.frame (first_column  = c("A", "B","E"),
                  second_column = c(1, 1, 5),
                  third_column = c(1, 1, 1)
                    )

df3 <- data.frame (first_column  = c("E", "F","G"),
                  second_column = c(1, 1, 5),
                  third_column = c(1, 1, 1)
                   )

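I want to combine all of them by the first column, but where a first_column value is duplicated, keep only the row with the largest value in second_column.

So df1 + df2 + df3 =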

first_column  second_column third_column
A             5             1
B             5             1
C             5             1
E             5             1
F             1             1
G             5             1

Any solution is very welcome, even if it takes two or more steps. (Also, if the values are equal in both columns, keep any one of them.)

A data.table approach:

library(data.table)

#convert dfs to data.tables
setDT(df1)
setDT(df2)
setDT(df3)

# rbind them, sort by second_column in decreasing order, and take the first row for each first_column:
rbindlist(list(df1, df2, df3))[order(-second_column)][, .SD[1, ], by = first_column]

   first_column second_column third_column
1:            A             5            1
2:            B             5            1
3:            C             5            1
4:            E             5            1
5:            G             5            1
6:            F             1            1
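
For readers without data.table, the same sort-then-take-first idea can be sketched in base R. This is only an illustration, not part of the answer above; the names combined, sorted and result are made up for the example, and the data is copied from the question.

```r
# Example data copied from the question
df1 <- data.frame(first_column  = c("A", "B", "C"),
                  second_column = c(5, 5, 5),
                  third_column  = c(1, 1, 1))
df2 <- data.frame(first_column  = c("A", "B", "E"),
                  second_column = c(1, 1, 5),
                  third_column  = c(1, 1, 1))
df3 <- data.frame(first_column  = c("E", "F", "G"),
                  second_column = c(1, 1, 5),
                  third_column  = c(1, 1, 1))

# Stack the three data frames into one
combined <- rbind(df1, df2, df3)

# Sort so that the largest second_column values come first
sorted <- combined[order(-combined$second_column), ]

# Keep only the first occurrence of each first_column value,
# which after sorting is the row with the largest second_column
result <- sorted[!duplicated(sorted$first_column), ]
rownames(result) <- NULL
result
```

Because duplicated() keeps the first row it sees per key, ties in second_column resolve to whichever row appeared first in the stacked data, which matches the "keep any one of them" requirement.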

A data.table approach similar to @PavoDive's, but more verbose:

# convert to data.tables
library(data.table)
setDT(df1); setDT(df2); setDT(df3)

# stack the three data.tables
df <- rbindlist(list(df1, df2, df3))

# aggregate by taking max
df[ , .(second_column = max(second_column),
        third_column = max(third_column)), by = .(first_column)]
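
One thing worth checking before relying on this: taking max of each column separately gives the same result as keeping the whole row only because third_column is constant in the question's data. The sketch below uses hypothetical data (dt, col_max and row_max are names invented for this illustration) to show how the two can differ, and one way to keep the whole row instead.

```r
library(data.table)

# Hypothetical data: third_column varies within group "A"
dt <- data.table(first_column  = c("A", "A", "B"),
                 second_column = c(5, 1, 2),
                 third_column  = c(1, 9, 3))

# Column-wise maxima: the "A" result combines values from two different rows
col_max <- dt[, .(second_column = max(second_column),
                  third_column  = max(third_column)), by = first_column]

# Whole-row selection: keep the row where second_column is largest
row_max <- dt[, .SD[which.max(second_column)], by = first_column]

col_max  # for "A": second_column 5, third_column 9
row_max  # for "A": second_column 5, third_column 1
```

If the other columns must come from the same row as the maximum, the .SD[which.max(...)] form is the safer of the two.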

A dplyr answer:

Bind the data frames into one combined data frame, group by first_column, and select the row with the maximum second_column in each group.

library(dplyr)

bind_rows(mget(paste0('df', 1:3))) %>%
  group_by(first_column) %>%
  slice(which.max(second_column))

# first_column second_column third_column
#  <chr>        <dbl>         <dbl>       
#1 A            5             1           
#2 B            5             1           
#3 C            5             1           
#4 E            5             1           
#5 F            1             1           
#6 G            5             1           
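
As a possible variation, and assuming dplyr 1.0.0 or later is installed, slice_max() expresses the same selection more directly. This is a sketch rather than the answerer's code; result is a name chosen for the example and the data is copied from the question.

```r
library(dplyr)

# Example data copied from the question
df1 <- data.frame(first_column  = c("A", "B", "C"),
                  second_column = c(5, 5, 5),
                  third_column  = c(1, 1, 1))
df2 <- data.frame(first_column  = c("A", "B", "E"),
                  second_column = c(1, 1, 5),
                  third_column  = c(1, 1, 1))
df3 <- data.frame(first_column  = c("E", "F", "G"),
                  second_column = c(1, 1, 5),
                  third_column  = c(1, 1, 1))

result <- bind_rows(df1, df2, df3) %>%
  group_by(first_column) %>%
  # keep one row per group with the largest second_column;
  # with_ties = FALSE returns a single row even when values are tied
  slice_max(second_column, n = 1, with_ties = FALSE) %>%
  ungroup()

result
```

Like which.max(), with_ties = FALSE returns exactly one row per group, so tied values do not produce duplicates.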

A simple base R option using rbind + aggregate:

> aggregate(.~first_column, rbind(df1,df2,df3),max)
  first_column second_column third_column
1            A             5            1
2            B             5            1
3            C             5            1
4            E             5            1
5            F             1            1
6            G             5            1

