Assign values in a column to values in two other columns based on some filtering

Question

I am working on a programming puzzle that sounds straightforward, but it turns out to be hard to solve efficiently in R without a for loop over a column with 100k+ rows in a data frame. I have tried dplyr (particularly group_by and mutate), data.table, and the *apply family, but it's quite tough. Could anyone help?

The problem is as follows: given a data frame df with a key column (character) and columns x, y, and z (numeric), some values in key are repeated. Among rows that share the same key, check whether the value in column x is smaller than the sum of the corresponding values in column y (row-wise). If it is, set that x value to 0 and distribute it over the y values according to the ordering of the corresponding z values. How can this be done efficiently, given that every distinct value in key has to be processed?
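Both building blocks in this description can be computed per key without an explicit for loop. The sketch below uses base R's `ave()` on five rows of the input below (rows 1, 2, 7, 8 and 13). It only computes the group-wise condition and the position of each row by z within its key; how x should then be distributed over y is deliberately left open, because the question does not pin that rule down.

```r
# Five rows taken from the question's input (rows 1, 2, 7, 8 and 13)
df <- data.frame(key = c("aa_bb_1", "aa_bb_0", "aa_bb_1", "aa_bb_1", "aa_bb_0"),
                 x   = c(10, 19, 30, 40, 89),
                 y   = c(3, 10, 6, 29, 21),
                 z   = c(8, 13, 25, 21, 10))

# Per-key total of y, repeated on every row that shares the key
sum_y_key <- ave(df$y, df$key, FUN = sum)

# TRUE where x is smaller than the total y of its key
below_sum <- df$x < sum_y_key

# Position of each row within its key when sorted by z (1 = smallest z)
z_rank <- ave(df$z, df$key, FUN = function(v) rank(v, ties.method = "first"))

print(data.frame(df, sum_y_key, below_sum, z_rank))
```

On these rows the per-key totals are 38 (aa_bb_1) and 31 (aa_bb_0), so the first three rows satisfy the condition and the last two do not.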

Input

# key, x, y and z each hold 30 values, matching the 30 rows printed below
df <- data.frame(key = c('aa_bb_1', 'aa_bb_0', 'ab_ca_0', 'abc_bbb_1', 'abbbc_aa_1', 'aaa_ccc_1', 'aa_bb_1', 'aa_bb_1', 'ab_ca_0', 'abc_bbb_1',
                         'abbbc_aa_1', 'aaa_ccc_1', 'aa_bb_0', 'aa_bb_1', 'ab_ca_0', 'abc_bbb_0', 'abbbc_aa_0', 'aaa_ccc_1', 'aa_bb_0', 'aa_bb_1',
                         'ab_ca_1', 'abc_bbb_1', 'abbbc_aa_1', 'aaa_ccc_1', 'aa_bb_1', 'aa_bb_0', 'ab_ca_0', 'abc_bbb_1', 'abbbc_aa_1', 'aaa_ccc_1'),
                 x = c(10, 19, 30, 25, 37, 13, 30, 40, 100, 53, 11, 27, 89, 21, 30, 30, 17, 9, 5, 57, 10, 19, 30, 25, 37, 13, 30, 40, 100, 53),
                 y = c(3, 10, 18, 15, 32, 4, 6, 29, 71, 92, 11, 7, 21, 19, 13, 26, 28, 11, 8, 8, 5, 23, 3, 12, 19, 7, 9, 11, 7, 12),
                 z = c(8, 13, 15, 16, 10, 10, 25, 21, 32, 15, 45, 8, 10, 50, 12, 10, 0, 0, 10, 12, 2, 40, 9, 8, 13, 15, 16, 10, 10, 25))

          key   x  y  z
1     aa_bb_1  10  3  8
2     aa_bb_0  19 10 13
3     ab_ca_0  30 18 15
4   abc_bbb_1  25 15 16
5  abbbc_aa_1  37 32 10
6   aaa_ccc_1  13  4 10
7     aa_bb_1  30  6 25
8     aa_bb_1  40 29 21
9     ab_ca_0 100 71 32
10  abc_bbb_1  53 92 15
11 abbbc_aa_1  11 11 45
12  aaa_ccc_1  27  7  8
13    aa_bb_0  89 21 10
14    aa_bb_1  21 19 50
15    ab_ca_0  30 13 12
16  abc_bbb_0  30 26 10
17 abbbc_aa_0  17 28  0
18  aaa_ccc_1   9 11  0
....
25    aa_bb_1  37 19 13
26    aa_bb_0  13  7 15
27    ab_ca_0  30  9 16
28  abc_bbb_1  40 11 10
29 abbbc_aa_1 100  7 10
30  aaa_ccc_1  53 12 25

Answer 1

I'm not sure exactly what your expected outcome looks like.

With dplyr you can do something like this. I'm pretty sure this doesn't solve your issue exactly, because your description is ambiguous, but you can use it as a template.

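library(dplyr)  # the native pipe |> needs R >= 4.1

df |>
  group_by(key) |>
  # within each repeated key, set x to 0 where it is smaller than the key's total y
  mutate(x = ifelse(n() > 1 & (x < sum(y)), 0, x)) |>
  ungroup() |>
  # replace y with the original x values reordered by z within each key
  mutate(y = df |> group_by(key) |> mutate(y = x[order(z)]) |> pull(y))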
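For comparison, the two steps of the dplyr pipeline above can also be sketched in base R with `ave()`, which covers the *apply-family route mentioned in the question. This is a sketch on five rows of the input (rows 1, 2, 7, 8 and 13), not a drop-in replacement; as in the answer, `y` is rebuilt from the original `x` values before `x` is zeroed.

```r
# Five rows taken from the question's input (rows 1, 2, 7, 8 and 13)
df <- data.frame(key = c("aa_bb_1", "aa_bb_0", "aa_bb_1", "aa_bb_1", "aa_bb_0"),
                 x   = c(10, 19, 30, 40, 89),
                 y   = c(3, 10, 6, 29, 21),
                 z   = c(8, 13, 25, 21, 10))

n_in_key  <- ave(df$x, df$key, FUN = length)  # rows per key, like n()
sum_y_key <- ave(df$y, df$key, FUN = sum)     # per-key total of y, like sum(y)

# Passing row indices to ave() lets FUN read both x and z of the group;
# this mirrors y = x[order(z)] in the answer
new_y <- ave(seq_len(nrow(df)), df$key,
             FUN = function(i) df$x[i][order(df$z[i])])

# Mirrors x = ifelse(n() > 1 & (x < sum(y)), 0, x)
new_x <- ifelse(n_in_key > 1 & df$x < sum_y_key, 0, df$x)

out <- transform(df, x = new_x, y = new_y)
print(out)
```

On these five rows, `out$x` comes out as 0, 0, 0, 40, 89 and `out$y` as 10, 89, 40, 30, 19.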

   key            x     y     z
   <chr>      <dbl> <int> <int>
 1 aa_bb_1        0    10     8
 2 aa_bb_0        0    89    13
 3 ab_ca_0        0    30    15
 4 abc_bbb_1      0    53    16
 5 abbbc_aa_1     0    37    10
 6 aaa_ccc_1      0     9    10
 7 aa_bb_1        0    40    25
 8 aa_bb_1        0    30    21
 9 ab_ca_0        0    30    32
10 abc_bbb_1      0    25    15
11 abbbc_aa_1     0    11    45
12 aaa_ccc_1     27    27     8
13 aa_bb_0       89    19    10
14 aa_bb_1        0    21    50
15 ab_ca_0        0   100    12
16 abc_bbb_0     30    30    10
17 abbbc_aa_0    17    17     0
18 aaa_ccc_1      0    13     0

Answer 1: score 0, dated 2022-07-23 02:02:08