Merge three dataframes by two columns, keeping only the largest value in R
Suppose I have three data frames:
df1 <- data.frame(first_column  = c("A", "B", "C"),
                  second_column = c(5, 5, 5),
                  third_column  = c(1, 1, 1))
df2 <- data.frame(first_column  = c("A", "B", "E"),
                  second_column = c(1, 1, 5),
                  third_column  = c(1, 1, 1))
df3 <- data.frame(first_column  = c("E", "F", "G"),
                  second_column = c(1, 1, 5),
                  third_column  = c(1, 1, 1))
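I want to combine all of these based on the first column, but where a value of first_column is duplicated, keep only the row with the largest value in second_column.
So df1 + df2 + df3 =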
first_column second_column third_column
A 5 1
B 5 1
C 5 1
E 5 1
F 1 1
G 5 1
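Any solution is very welcome, even one that takes two or more steps. (Also, if duplicated rows have equal values in second_column, keeping any one of them is fine.)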
A data.table approach:
library(data.table)
# convert the data frames to data.tables (setDT modifies them in place, by reference)
setDT(df1)
setDT(df2)
setDT(df3)
# stack them, sort by second_column in decreasing order, and take the first row for each first_column:
rbindlist(list(df1, df2, df3))[order(-second_column)][, .SD[1, ], by = first_column]
first_column second_column third_column
1: A 5 1
2: B 5 1
3: C 5 1
4: E 5 1
5: G 5 1
6: F 1 1
A data.table approach similar to @PavoDive's, but more verbose:
# convert to data.tables
library(data.table)
setDT(df1); setDT(df2); setDT(df3)
# stack the three data.tables
df <- rbindlist(list(df1, df2, df3))
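# aggregate by taking the per-group max of each column
# (each column is maximized independently, so a result row is not
# guaranteed to be one of the original rows)
df[, .(second_column = max(second_column),
       third_column  = max(third_column)), by = .(first_column)]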
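If whole rows need to be kept, rather than the maximum of each column separately, one possible sketch is to look up the row index of the largest second_column in each group with .I and subset by it. The data below is hypothetical (it is not from the question) and was chosen so that the two approaches give different answers; dt, idx and result are illustrative names only.

```r
library(data.table)

# Hypothetical data: within group "A", the row with the largest
# second_column does NOT also have the largest third_column
dt <- data.table(first_column  = c("A", "A", "B"),
                 second_column = c(5, 1, 2),
                 third_column  = c(1, 9, 3))

# Row index (.I) of the maximum second_column within each group;
# the unnamed result column is called V1 by default
idx <- dt[, .I[which.max(second_column)], by = first_column]$V1

# Subset by those indices, so every result row is an actual input row
result <- dt[idx]
result
```

With this data the per-column max would report third_column = 9 for group "A", whereas the row-based selection keeps the 1 that belongs to the row where second_column is 5. For the data in the question the two approaches agree, because third_column is 1 everywhere.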
A dplyr answer:
Bind the data frames into one combined data frame, group by first_column, and select the row with the maximum value of second_column in each group.
library(dplyr)
bind_rows(mget(paste0('df', 1:3))) %>%
group_by(first_column) %>%
slice(which.max(second_column))
# first_column second_column third_column
# <chr> <dbl> <dbl>
#1 A 5 1
#2 B 5 1
#3 C 5 1
#4 E 5 1
#5 F 1 1
#6 G 5 1
A simple base R option using rbind + aggregate:
> aggregate(.~first_column, rbind(df1,df2,df3),max)
first_column second_column third_column
1 A 5 1
2 B 5 1
3 C 5 1
4 E 5 1
5 F 1 1
6 G 5 1
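A dependency-free variant that keeps entire rows could look roughly like the sketch below: stack the data frames, sort so that the largest second_column comes first within each first_column, and drop the later duplicates. The data is copied from the question; combined, sorted and result are names introduced here for illustration.

```r
# Example data copied from the question
df1 <- data.frame(first_column  = c("A", "B", "C"),
                  second_column = c(5, 5, 5),
                  third_column  = c(1, 1, 1))
df2 <- data.frame(first_column  = c("A", "B", "E"),
                  second_column = c(1, 1, 5),
                  third_column  = c(1, 1, 1))
df3 <- data.frame(first_column  = c("E", "F", "G"),
                  second_column = c(1, 1, 5),
                  third_column  = c(1, 1, 1))

# Stack the three data frames
combined <- rbind(df1, df2, df3)

# Sort by first_column, then by second_column in decreasing order
sorted <- combined[order(combined$first_column, -combined$second_column), ]

# duplicated() flags every occurrence after the first, so negating it
# keeps the row with the largest second_column for each first_column
result <- sorted[!duplicated(sorted$first_column), ]
rownames(result) <- NULL
result
```

When several rows tie on second_column, this keeps whichever comes first after sorting, which matches the question's note that any one of them is acceptable.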