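How to add two columns from two different data frames in R when one column holds only a subset of the other's unique values
I have two data frames, and I need to add a column from one to a column from the other and store the result in the original, larger data frame. The larger data frame has more "branch" values than the smaller one. I tried using match, but the sums for the unmatched branches come out as NA.
Sample code: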
> df1 <- data.frame(branch = letters[seq(1,5)],
+ rev = seq(10,50,10),
+ stringsAsFactors = FALSE)
> df1
branch rev
1 a 10
2 b 20
3 c 30
4 d 40
5 e 50
>
> df2 <- data.frame(branch = c('b','d'),
+ Amt = c(10,10),
+ stringsAsFactors = FALSE)
> df2
branch Amt
1 b 10
2 d 10
>
> df1$rev + df2[match(df1$branch,df2$branch),2,drop = TRUE]
[1] NA 30 NA 50 NA
>
Expected output:
> df1
branch rev
1 a 10
2 b 30
3 c 30
4 d 50
5 e 50
>
I tried using a left join as follows:
> left_join(df1, df2, by = 'branch')
branch rev Amt
1 a 10 NA
2 b 20 10
3 c 30 NA
4 d 40 10
5 e 50 NA
> df1 <- left_join(df1, df2, by = 'branch')
> df1[is.na(df1)] <- 0
> df1
branch rev Amt
1 a 10 0
2 b 20 10
3 c 30 0
4 d 40 10
5 e 50 0
> df1$rev <- df1$rev + df1$Amt
> df1
branch rev Amt
1 a 10 0
2 b 30 10
3 c 30 0
4 d 50 10
5 e 50 0
> df1$Amt <- NULL
> df1
branch rev
1 a 10
2 b 30
3 c 30
4 d 50
5 e 50
>
Could someone let me know whether there is a simpler solution?
An option using data.table:
library(data.table)
setDT(df1)[, rev :=
setDT(df2)[.SD, on=.(branch), rev + nafill(Amt, fill=0)]
]
Output:
branch rev
1: a 10
2: b 30
3: c 30
4: d 50
5: e 50
How about this? No libraries needed:
df1 <- df1[order(df1$branch),] #sort based on branch
df2 <- df2[order(df2$branch),] #sort also so next step works
df1$branch[df1$branch %in% df2$branch] #just to check we are on correct path
#do the task
df1$rev[df1$branch %in% df2$branch] <- df1$rev[df1$branch %in% df2$branch] + df2$Amt[df2$branch %in% df1$branch]
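Warning: if df2 contains duplicate "branch" values, for example two "b" rows, you need to accumulate those values before adding them to df1.
Using dplyr, you can stack both data frames with bind_rows (renaming Amt to rev so the column names match), group by "branch", and compute the sum: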
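The duplicate case in the warning above can be handled by collapsing df2 to one row per branch before the lookup. The following is only a sketch in base R: the extra "b" row is made-up data, and the names df2_dup, df2_sum and vals are illustrative, not part of the original answer.

```r
# Example data from the question, plus a hypothetical duplicate "b" row
df1 <- data.frame(branch = letters[1:5],
                  rev = seq(10, 50, 10),
                  stringsAsFactors = FALSE)
df2_dup <- data.frame(branch = c("b", "d", "b"),
                      Amt = c(10, 10, 5),
                      stringsAsFactors = FALSE)

# Accumulate Amt so that each branch appears exactly once
df2_sum <- aggregate(Amt ~ branch, data = df2_dup, FUN = sum)

# Look up each df1 branch; branches missing from df2_sum yield NA
vals <- df2_sum$Amt[match(df1$branch, df2_sum$branch)]

# Treat missing branches as 0 and add
df1$rev <- df1$rev + replace(vals, is.na(vals), 0)
df1
```

Because the lookup uses match on the aggregated table, this version does not depend on either data frame being sorted.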
library(dplyr)
df1 %>% bind_rows(., rename(df2, rev = Amt)) %>%
group_by(branch) %>%
summarise(rev = sum(rev))
# A tibble: 5 x 2
branch rev
<chr> <dbl>
1 a 10
2 b 30
3 c 30
4 d 50
5 e 50
One way is to store the output of match in a variable, replace the NA values with 0, and then add the values:
vals <- df2$Amt[match(df1$branch,df2$branch)]
df1$rev + replace(vals, is.na(vals), 0)
#[1] 10 30 30 50 50
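The snippet above only prints the sums. To store them back in df1, as the question asks, the same idea could be wrapped in a small helper. This is a sketch under the question's example data; add_matched and its argument names are hypothetical, not an existing function.

```r
# Hypothetical helper: add y[[y_col]] to x[[x_col]] for rows whose key matches,
# leaving unmatched rows of x unchanged
add_matched <- function(x, y, key, x_col, y_col) {
  vals <- y[[y_col]][match(x[[key]], y[[key]])]  # NA where key is absent from y
  x[[x_col]] <- x[[x_col]] + replace(vals, is.na(vals), 0)
  x
}

df1 <- data.frame(branch = letters[1:5],
                  rev = seq(10, 50, 10),
                  stringsAsFactors = FALSE)
df2 <- data.frame(branch = c("b", "d"),
                  Amt = c(10, 10),
                  stringsAsFactors = FALSE)

df1 <- add_matched(df1, df2, key = "branch", x_col = "rev", y_col = "Amt")
df1
```

Like the original snippet, this assumes each key appears at most once in the second data frame, since match returns only the first matching position.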
Similarly with dplyr, do a left_join instead of match:
library(dplyr)
df1 %>%
left_join(df2, by = 'branch') %>%
mutate(Amt = replace(Amt, is.na(Amt), 0),
rev = rev + Amt) %>%
select(names(df1))
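For readers who prefer to avoid dplyr, roughly the same left-join logic can be sketched in base R with merge and all.x = TRUE. This is an assumed alternative rather than part of the original answer; merged is an illustrative name. Note that merge sorts the result by the key column by default, so the row order may differ from df1 when df1 is not already sorted by branch.

```r
df1 <- data.frame(branch = letters[1:5],
                  rev = seq(10, 50, 10),
                  stringsAsFactors = FALSE)
df2 <- data.frame(branch = c("b", "d"),
                  Amt = c(10, 10),
                  stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of df1; unmatched branches get Amt = NA
merged <- merge(df1, df2, by = "branch", all.x = TRUE)

# Replace the NAs with 0, add, then drop the helper column
merged$Amt[is.na(merged$Amt)] <- 0
merged$rev <- merged$rev + merged$Amt
merged$Amt <- NULL
merged
```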
Use aggregate to get the sum of rev within each branch group:
library(magrittr)
colnames(df2)[2] <- "rev"
df1 <- rbind(df1, df2) %>% aggregate(rev ~ branch, ., FUN = sum)