Create new "count" columns from existing columns in a transaction dataset

Question

I have a transaction dataset with three columns. Each row represents one transaction.

  Account_from  Account_to  Value 
1       1           2        25.0
2       1           3        30.0
3       2           1        28.0
4       2           3        10.0
5       2           4        12.0
6       3           1        40.0

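I want to create two new columns with the number of transactions each account made and received. Both counts refer to the account in Account_from: Count_out is how many transactions that account sent, and Count_in is how many it received (how often it appears in Account_to). It should look like this: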

  Account_from  Account_to  Value  Count_out  Count_in 
1       1           2        25.0      2          2
2       1           3        30.0      2          2
3       2           1        28.0      3          1
4       2           3        10.0      3          1
5       2           4        12.0      3          1
6       3           1        40.0      1          2

How can I do this for the whole dataset at once?

Answer 1

The tidyverse provides handy functions for this. Assuming your data is stored in a data frame df:

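library(tidyverse)
df <- df %>%
  add_count(Account_from, name = "Count_out") %>%
  # Count_in is keyed by the sending account: how often its ID appears in Account_to
  mutate(Count_in = sapply(Account_from, function(a) sum(Account_to == a)))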
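Count_in has to be looked up by the sending account: for each row, it is the number of times that row's Account_from value appears in Account_to. A minimal base R sketch of that lookup, assuming the three-column df from the question (the helper `count_in_for` is a hypothetical name used only for illustration):

```r
# Sample transactions from the question
df <- data.frame(
  Account_from = c(1L, 1L, 2L, 2L, 2L, 3L),
  Account_to   = c(2L, 3L, 1L, 3L, 4L, 1L),
  Value        = c(25, 30, 28, 10, 12, 40)
)

# Hypothetical helper: number of transactions received by account `a`
count_in_for <- function(a, to) sum(to == a)

# Look up the received count for each row's sending account
df$Count_in <- vapply(df$Account_from, count_in_for, integer(1), to = df$Account_to)
df$Count_in
```

Counting rows per Account_to instead (for example with add_count(Account_to)) gives how many transactions share the same recipient, which is 1, 2, 2, 2, 1, 2 for this data and not what the question asks for.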

Answer 2

We can do this with dplyr using a couple of join operations.

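library(dplyr)

# Count sent and received transactions per account, then join both counts
# back onto the original rows by the sending account
left_join(df %>% count(Account_from, name = 'Count_out'),
          df %>% count(Account_to, name = 'Count_in'),
          by = c('Account_from' = 'Account_to')) %>%
  right_join(df, by = 'Account_from') %>%
  select(names(df), Count_out, Count_in)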

#  Account_from Account_to Value Count_out Count_in
#         <int>      <int> <dbl>     <int>    <int>
#1            1          2    25         2        2
#2            1          3    30         2        2
#3            2          1    28         3        1
#4            2          3    10         3        1
#5            2          4    12         3        1
#6            3          1    40         1        2
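The same count-then-join idea can be sketched in base R with table() and merge(). This is an illustration, not part of the answer above; because merge() sorts its output, a helper column (the hypothetical `.row`) restores the original row order, and accounts that never receive anything get 0 instead of NA.

```r
df <- data.frame(
  Account_from = c(1L, 1L, 2L, 2L, 2L, 3L),
  Account_to   = c(2L, 3L, 1L, 3L, 4L, 1L),
  Value        = c(25, 30, 28, 10, 12, 40)
)

# Count sent transactions per sending account
tab_out <- table(df$Account_from)
out_counts <- data.frame(Account_from = as.integer(names(tab_out)),
                         Count_out = as.integer(tab_out))

# Count received transactions per receiving account, keyed as Account_from
# so it joins onto the sending account of each row
tab_in <- table(df$Account_to)
in_counts <- data.frame(Account_from = as.integer(names(tab_in)),
                        Count_in = as.integer(tab_in))

# merge() sorts its result, so remember the original row order
df$.row <- seq_len(nrow(df))
res <- merge(df, out_counts, by = "Account_from", all.x = TRUE)
res <- merge(res, in_counts, by = "Account_from", all.x = TRUE)

# Accounts that never receive anything get 0 instead of NA
res$Count_in[is.na(res$Count_in)] <- 0L

res <- res[order(res$.row), c("Account_from", "Account_to", "Value", "Count_out", "Count_in")]
rownames(res) <- NULL
res
```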

Answer 3

Here is a base R solution using ave():

df <- within(df, {
  Count_out <- ave(seq_along(Account_from), Account_from, FUN = length)  # sent per account
  Count_in  <- ave(seq_along(Account_to), Account_to, FUN = length)[match(Account_from, Account_to)]  # received per sending account
})

which gives:

> df
  Account_from Account_to Value Count_in Count_out
1            1          2    25        2         2
2            1          3    30        2         2
3            2          1    28        1         3
4            2          3    10        1         3
5            2          4    12        1         3
6            3          1    40        2         1
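To see why the Count_in line works, it helps to look at the intermediate vectors for the sample data: ave(..., FUN = length) gives, for every row, how many rows share that row's Account_to, and match() finds, for each Account_from, the first row where that account appears as a receiver. A small worked sketch:

```r
Account_from <- c(1L, 1L, 2L, 2L, 2L, 3L)
Account_to   <- c(2L, 3L, 1L, 3L, 4L, 1L)

# Per-row count of rows sharing the same Account_to value
to_counts <- ave(seq_along(Account_to), Account_to, FUN = length)

# Index of the first row where each sending account appears as a receiver
first_hit <- match(Account_from, Account_to)

# Pick the receiver count at that row (NA if the account never receives)
Count_in <- to_counts[first_hit]
Count_in
```

Note that an account that sends but never receives would get NA here rather than 0.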

Or use the following code:

df <- cbind(df, with(df, list(Count_out = ave(seq_along(Account_from), Account_from, FUN = length),
                              Count_in  = ave(seq_along(Account_to), Account_to, FUN = length)[match(Account_from, Account_to)])))

which gives:

> df
  Account_from Account_to Value Count_out Count_in
1            1          2    25         2        2
2            1          3    30         2        2
3            2          1    28         3        1
4            2          3    10         3        1
5            2          4    12         3        1
6            3          1    40         1        2
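A fully vectorized alternative (not from the answers above) uses tabulate(), assuming account IDs are small positive integers so they can be used directly as vector indices. Accounts that never receive a transaction automatically get 0.

```r
df <- data.frame(
  Account_from = c(1L, 1L, 2L, 2L, 2L, 3L),
  Account_to   = c(2L, 3L, 1L, 3L, 4L, 1L),
  Value        = c(25, 30, 28, 10, 12, 40)
)

# Assumption: account IDs are positive integers 1..n_accounts
n_accounts <- max(df$Account_from, df$Account_to)

sent     <- tabulate(df$Account_from, nbins = n_accounts)  # sent[i]: transactions sent by account i
received <- tabulate(df$Account_to,   nbins = n_accounts)  # received[i]: transactions received by account i

# Index both count vectors by each row's sending account
df$Count_out <- sent[df$Account_from]
df$Count_in  <- received[df$Account_from]
df
```

For arbitrary IDs (strings or large numbers), the match()-based lookup in the answers above is the safer choice.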

Data

df <- structure(list(Account_from = c(1L, 1L, 2L, 2L, 2L, 3L),
                     Account_to = c(2L, 3L, 1L, 3L, 4L, 1L),
                     Value = c(25, 30, 28, 10, 12, 40)),
                class = "data.frame", row.names = c(NA, -6L))
