How to replace values in a row of a df with summed values based on another column
Okay, so I have a dataframe where one column has characters and another has a value associated with those characters. The problem is that some of the characters are listed twice and so have two different values. For those repeated characters, I want to sum the values together so there's one value. The trouble I'm having is that all the characters repeat for each ID, and I only need to sum the values within each ID, not across the whole column. The df looks something like this:
Color Amount ID
[1] Purple 45 566
[2] Blue 56 566
[3] Blue 53 566
[4] Yellow 68 566
[5] Green 76 566
[6] Purple 93 789
[7] Purple 35 789
[8] Blue 56 789
[9] Yellow 37 789
And I need to get it to this:
Color Amount ID
[1] Purple 45 566
[2] Blue 109 566
[4] Yellow 68 566
[5] Green 76 566
[6] Purple 128 789
[8] Blue 56 789
[9] Yellow 37 789
You might want to look at the dplyr package, which lets you perform this type of cleaning.
Here is how you can achieve this:
library(dplyr)

# Sum Amount within each ID/Color combination
df %>% group_by(ID, Color) %>% summarize(Amount = sum(Amount))
# A tibble: 7 x 3
# Groups: ID [2]
ID Color Amount
<int> <chr> <int>
1 566 Blue 109
2 566 Green 76
3 566 Purple 45
4 566 Yellow 68
5 789 Blue 56
6 789 Purple 128
7 789 Yellow 37
# Data used for the example
df <- read.table(text = "Color Amount ID
1 Purple 45 566
2 Blue 56 566
3 Blue 53 566
4 Yellow 68 566
5 Green 76 566
6 Purple 93 789
7 Purple 35 789
8 Blue 56 789
9 Yellow 37 789", header = TRUE)
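If you would rather not load a package, the same per-ID sums can probably be obtained in base R with aggregate(). This is only a sketch of an alternative, not part of the answer above. The sample data is rebuilt with data.frame() so the snippet runs on its own, and the name result is just an illustrative choice.

```r
# Same sample data as in the question, rebuilt with data.frame()
df <- data.frame(
  Color  = c("Purple", "Blue", "Blue", "Yellow", "Green",
             "Purple", "Purple", "Blue", "Yellow"),
  Amount = c(45, 56, 53, 68, 76, 93, 35, 56, 37),
  ID     = c(566, 566, 566, 566, 566, 789, 789, 789, 789)
)

# Sum Amount within each (ID, Color) pair: the left side of the formula is
# the value to aggregate, the right side lists the grouping columns
result <- aggregate(Amount ~ Color + ID, data = df, FUN = sum)

print(result)
```

As with the dplyr version, the rows come back ordered by the grouping columns rather than in their original order, so a reordering step would be needed if the original row order matters.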