R conditional rowSums to replace with sums based on percentage
I'm looking to conditionally sum rows (with rowSums) when those rows each represent <1% of the data, and then replace the original rows with the summed row.

*Bonus if the table could include the number of rows that were summed in the name column (e.g., "Other(n=2)").

This is a small part of a much larger function. See the example below:

Example data:
name | Year1 | Year2 | Year3 | Total | Percent |
---|---|---|---|---|---|
John | 1 | 2 | 1 | 4 | 0.7029877 |
Paul | 230 | 100 | 150 | 480 | 84.358524 |
George | 41 | 30 | 10 | 81 | 14.235501 |
Ringo | 2 | 1 | 1 | 4 | 0.7029877 |

# Code for example data
library(dplyr) # select() comes from dplyr
name <- c("John", "Paul", "George", "Ringo")
Year1 <- c(1, 230, 41, 2)
Year2 <- c(2, 100, 30, 1)
Year3 <- c(1, 150, 10, 1)
df <- data.frame(name, Year1, Year2, Year3)
df$Total <- rowSums(select(df, Year1:Year3))
df$Percent <- df$Total / sum(df$Total) * 100

In the solution, John and Ringo would be combined into one 'Other' row, since both have Percent < 1.

# Code for example solution
name <- c("Paul", "George", "Other(n=2)")
Year1 <- c(230, 41, 3)
Year2 <- c(100, 30, 3)
Year3 <- c(150, 10, 2)
df2 <- data.frame(name, Year1, Year2, Year3)
df2$Total <- rowSums(select(df2,Year1:Year3))
df2$Percent <- df2$Total/sum(df2$Total)*100

Example solution:
name | Year1 | Year2 | Year3 | Total | Percent |
---|---|---|---|---|---|
Paul | 230 | 100 | 150 | 480 | 84.358524 |
George | 41 | 30 | 10 | 81 | 14.235501 |
Other(n=2) | 3 | 3 | 2 | 8 | 1.405975 |

library(tidyverse) # or use forcats::fct_lump(...
df %>%
  mutate(name_lumped = fct_lump(name, w = Percent, prop = 0.01)) %>%
  group_by(name_lumped) %>%
  summarize(across(Year1:Percent, sum))

# A tibble: 3 x 6
  name_lumped Year1 Year2 Year3 Total Percent
  <fct>       <dbl> <dbl> <dbl> <dbl>   <dbl>
1 George         41    30    10    81   14.2
2 Paul          230   100   150   480   84.4
3 Other           3     3     2     8    1.41
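
The pipeline above does not cover the bonus request, the "Other(n=2)" label. One possible way to add it is sketched below, assuming the same example data and the same `fct_lump()` call. The `n` helper column and the `result` name are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(forcats)

# Example data from the question (rebuilt here so the block runs on its own)
df <- data.frame(
  name  = c("John", "Paul", "George", "Ringo"),
  Year1 = c(1, 230, 41, 2),
  Year2 = c(2, 100, 30, 1),
  Year3 = c(1, 150, 10, 1)
)
df$Total   <- rowSums(df[, c("Year1", "Year2", "Year3")])
df$Percent <- df$Total / sum(df$Total) * 100

result <- df %>%
  # Lump names whose share of the Percent weight is below 1%
  mutate(name_lumped = fct_lump(name, w = Percent, prop = 0.01)) %>%
  group_by(name_lumped) %>%
  # Sum the numeric columns, then record how many rows went into each group
  summarize(across(Year1:Percent, sum), n = n()) %>%
  # Assumption: no original name is literally "Other"
  mutate(name = ifelse(name_lumped == "Other",
                       paste0("Other(n=", n, ")"),
                       as.character(name_lumped))) %>%
  select(name, Year1:Percent)

print(result)
```

With the example data, the `name` column of `result` should read George, Paul, Other(n=2). Calling `n()` after `across()` keeps the helper count out of the summed column range.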