
Distributing value in a column to values in two other columns based on certain filtering

I am currently working on a programming puzzle that sounds straightforward, but it is apparently hard to do efficiently in R without a for loop over a column with 100k+ rows in a data frame. I have tried dplyr (particularly group_by and mutate), data.table, and the *apply family, but it is quite tough. Could anyone help?

The problem is as follows: given a data frame df with columns key ("string" data type) and x, y, and z ("numeric" data type). Some elements of column key are repeated. Among rows that share the same key, check, row by row, whether the value in column x is smaller than the sum of the corresponding values in column y. If it is, set that value in column x to 0 and distribute it among the values in column y, following the ordering of the corresponding values in column z. How can we do this efficiently, given that we need to go through all distinct elements of column key?

Input

# 30 rows: one value per key in each of x, y and z
df <- data.frame(
  key = c("aa_bb_1", "aa_bb_0", "ab_ca_0", "abc_bbb_1", "abbbc_aa_1", "aaa_ccc_1", "aa_bb_1", "aa_bb_1", "ab_ca_0", "abc_bbb_1",
          "abbbc_aa_1", "aaa_ccc_1", "aa_bb_0", "aa_bb_1", "ab_ca_0", "abc_bbb_0", "abbbc_aa_0", "aaa_ccc_1", "aa_bb_0", "aa_bb_1",
          "ab_ca_1", "abc_bbb_1", "abbbc_aa_1", "aaa_ccc_1", "aa_bb_1", "aa_bb_0", "ab_ca_0", "abc_bbb_1", "abbbc_aa_1", "aaa_ccc_1"),
  x = c(10, 19, 30, 25, 37, 13, 30, 40, 100, 53, 11, 27, 89, 21, 30, 30, 17, 9, 5, 57, 10, 19, 30, 25, 37, 13, 30, 40, 100, 53),
  y = c(3, 10, 18, 15, 32, 4, 6, 29, 71, 92, 11, 7, 21, 19, 13, 26, 28, 11, 8, 8, 5, 23, 3, 12, 19, 7, 9, 11, 7, 12),
  z = c(8, 13, 15, 16, 10, 10, 25, 21, 32, 15, 45, 8, 10, 50, 12, 10, 0, 0, 10, 12, 2, 40, 9, 8, 13, 15, 16, 10, 10, 25)
)

          key   x  y  z
1     aa_bb_1  10  3  8
2     aa_bb_0  19 10 13
3     ab_ca_0  30 18 15
4   abc_bbb_1  25 15 16
5  abbbc_aa_1  37 32 10
6   aaa_ccc_1  13  4 10
7     aa_bb_1  30  6 25
8     aa_bb_1  40 29 21
9     ab_ca_0 100 71 32
10  abc_bbb_1  53 92 15
11 abbbc_aa_1  11 11 45
12  aaa_ccc_1  27  7  8
13    aa_bb_0  89 21 10
14    aa_bb_1  21 19 50
15    ab_ca_0  30 13 12
16  abc_bbb_0  30 26 10
17 abbbc_aa_0  17 28  0
18  aaa_ccc_1   9 11  0
....
25    aa_bb_1  37 19 13
26    aa_bb_0  13  7 15
27    ab_ca_0  30  9 16
28  abc_bbb_1  40 11 10
29 abbbc_aa_1 100  7 10
30  aaa_ccc_1  53 12 25


          

I'm not sure exactly what your expected outcome looks like.

With dplyr you can do something like this. I'm pretty sure it doesn't exactly solve your issue, because your description is ambiguous, but you can use it as a template.

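library(dplyr)

df |>
  group_by(key) |>
  # zero out x where the key is repeated and x is below the group's sum of y
  mutate(x = ifelse(n() > 1 & (x < sum(y)), 0, x)) |>
  ungroup() |>
  # replace y with the original x values, reordered by z within each key
  mutate(y = df |> group_by(key) |> mutate(y = x[order(z)]) |> pull(y))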
   key            x     y     z
   <chr>      <dbl> <int> <int>
 1 aa_bb_1        0    10     8
 2 aa_bb_0        0    89    13
 3 ab_ca_0        0    30    15
 4 abc_bbb_1      0    53    16
 5 abbbc_aa_1     0    37    10
 6 aaa_ccc_1      0     9    10
 7 aa_bb_1        0    40    25
 8 aa_bb_1        0    30    21
 9 ab_ca_0        0    30    32
10 abc_bbb_1      0    25    15
11 abbbc_aa_1     0    11    45
12 aaa_ccc_1     27    27     8
13 aa_bb_0       89    19    10
14 aa_bb_1        0    21    50
15 ab_ca_0        0   100    12
16 abc_bbb_0     30    30    10
17 abbbc_aa_0    17    17     0
18 aaa_ccc_1      0    13     0
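
As a hedged variant of the template above, the two grouped steps can probably be folded into a single grouped mutate, so that df is grouped only once. This is a sketch, not a definitive solution: it assumes the first 18 rows printed in the question as input, the names df18, flag and res are introduced here for illustration, and it applies the same reading of the problem as the template (it does not attempt any other interpretation of "distributing").

```r
library(dplyr)

# Assumption: the first 18 rows printed in the question
df18 <- data.frame(
  key = c("aa_bb_1", "aa_bb_0", "ab_ca_0", "abc_bbb_1", "abbbc_aa_1", "aaa_ccc_1",
          "aa_bb_1", "aa_bb_1", "ab_ca_0", "abc_bbb_1", "abbbc_aa_1", "aaa_ccc_1",
          "aa_bb_0", "aa_bb_1", "ab_ca_0", "abc_bbb_0", "abbbc_aa_0", "aaa_ccc_1"),
  x = c(10, 19, 30, 25, 37, 13, 30, 40, 100, 53, 11, 27, 89, 21, 30, 30, 17, 9),
  y = c(3, 10, 18, 15, 32, 4, 6, 29, 71, 92, 11, 7, 21, 19, 13, 26, 28, 11),
  z = c(8, 13, 15, 16, 10, 10, 25, 21, 32, 15, 45, 8, 10, 50, 12, 10, 0, 0)
)

res <- df18 |>
  group_by(key) |>
  mutate(
    # hypothetical helper column: computed first, while x and y are still the originals
    flag = n() > 1 & x < sum(y),
    # original x values reordered by z within each key
    y = x[order(z)],
    # zero out x only after y has been derived from it
    x = if_else(flag, 0, x)
  ) |>
  ungroup() |>
  select(-flag)

print(res, n = 18)
```

The order of the expressions inside mutate matters, because later expressions see columns modified by earlier ones. On these 18 rows the x and y columns should match the output printed above.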
