Rowwise Column Count in Dataframe
Let's say I have the following data frame:
library(tibble)

country_df <- tibble(
  population = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)
All I need is a concise way, inside the mutate function, to get the number of columns (population to population3) whose value is greater than the value of the pop column in each row.
So what I need is the following result (more specifically, the GreaterTotal column). Note: I can get the answer by working through each column one at a time, but that becomes tedious with more columns.
  population population2 population3   pop GreaterThan0 GreaterThan1 GreaterThan2 GreaterTotal
       <dbl>       <dbl>       <dbl> <dbl> <lgl>        <lgl>        <lgl>               <int>
1        328         133          89    45 TRUE         TRUE         TRUE                    3
2         38          12          39    45 FALSE        FALSE        FALSE                   0
3         30          99          33    45 FALSE        TRUE         FALSE                   1
4         56          83          56    45 TRUE         TRUE         TRUE                    3
5       1393        1033         193    45 TRUE         TRUE         TRUE                    3
6        126         101         126    45 TRUE         TRUE         TRUE                    3
7         57          33          58    45 TRUE         FALSE        TRUE                    2
I've tried using apply with the row index, but I can't get it to work. Can somebody please point me in the right direction?
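For reference, the row-index approach from the question can be made to work in base R. The sketch below is one possible version, not one of the answers' methods. It assumes a plain data.frame copy of the data (so no packages are needed) and a hypothetical helper vector `cols` that names the population columns. For each row index i, it compares that row's population values with `pop[i]` and counts the TRUEs.

```r
# Assumption: a plain data.frame copy of the question's data (no packages needed)
country_df <- data.frame(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# Hypothetical helper: names of the columns to compare against pop
cols <- c("population", "population2", "population3")

# For each row index i, take that row's population values as a vector,
# compare them with pop[i], and count the TRUEs (sum() of logicals is an integer)
greater_total <- vapply(seq_len(nrow(country_df)), function(i) {
  sum(unlist(country_df[i, cols]) > country_df$pop[i])
}, integer(1))

greater_total
# [1] 3 0 1 3 3 3 2
```

This works, but it calls an R function once per row. The vectorised answers below are usually preferable on larger data.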
You can select the population columns, compare them with pop, and use rowSums to count how many of them are greater in each row.
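# Indices of the columns whose names contain "population"
cols <- grep('population', names(country_df))
# Compare each of those columns with pop, row by row, and count the TRUEs per row
country_df$GreaterTotal <- rowSums(country_df[cols] > country_df$pop)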
# population population2 population3 pop GreaterTotal
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 328 133 89 45 3
#2 38 12 39 45 0
#3 30 99 33 45 1
#4 56 83 56 45 3
#5 1393 1033 193 45 3
#6 126 101 126 45 3
#7 57 33 58 45 2
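Here is why the comparison in this answer works. `country_df[cols]` is a 7 x 3 data frame and `country_df$pop` is a vector of length 7. R recycles `pop` down each column, so row i of every selected column is compared with `pop[i]`. The result is a logical matrix, and `rowSums` counts each TRUE as 1. The minimal sketch below shows these intermediate steps, again assuming a plain data.frame copy of the data. Note that `grep('population', ...)` matches any name that contains "population", so a column such as `population_growth` would also be counted. A pattern anchored with `^` is stricter.

```r
# Assumption: a plain data.frame copy of the question's data
country_df <- data.frame(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# Positions of every column whose name contains "population" (here 1, 2, 3)
cols <- grep("population", names(country_df))

# A 7 x 3 data frame compared with a length-7 vector: pop is recycled down
# each column, so the result is a 7 x 3 logical matrix
is_greater <- country_df[cols] > country_df$pop
dim(is_greater)
# [1] 7 3

# TRUE counts as 1, so the row sums are the per-row counts
rowSums(is_greater)
```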
In dplyr 1.0.0 and later, you can do this with rowwise and c_across:
library(dplyr)
country_df %>%
  rowwise() %>%
  mutate(GreaterTotal = sum(c_across(population:population3) > pop)) %>%
  ungroup()  # drop the row-wise grouping once the count is computed
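`rowwise()` evaluates the `mutate` expression once per row, which can be slow when there are many rows. The alternative below is a swapped-in base R technique, not part of this answer. It loops over the few columns instead of the many rows: each population column is compared with `pop` in a single vectorised step, and the resulting logical vectors are added together. It is a sketch that assumes a plain data.frame copy of the data and treats the columns whose names start with "population" as the ones to count.

```r
# Assumption: a plain data.frame copy of the question's data
country_df <- data.frame(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# Assumption: the columns to compare are those whose names start with "population"
cols <- names(country_df)[startsWith(names(country_df), "population")]

# One vectorised comparison per column (TRUE/FALSE for every row at once) ...
comparisons <- lapply(country_df[cols], function(x) x > country_df$pop)

# ... then add the logical vectors together; starting from 0L keeps the result integer
country_df$GreaterTotal <- Reduce(`+`, comparisons, 0L)

country_df$GreaterTotal
# [1] 3 0 1 3 3 3 2
```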
Using the tidyverse, we can do:
library(dplyr)
country_df %>%
  # `.` is the whole piped data frame: select() picks the population columns, .$pop is the pop column
  mutate(GreaterTotal = rowSums(select(., starts_with('population')) > .$pop))
-output
# A tibble: 7 x 5
# population population2 population3 pop GreaterTotal
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 328 133 89 45 3
#2 38 12 39 45 0
#3 30 99 33 45 1
#4 56 83 56 45 3
#5 1393 1033 193 45 3
#6 126 101 126 45 3
#7 57 33 58 45 2
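The question's expected output shows GreaterTotal as `<int>`, but both `rowSums` answers produce `<dbl>`. If the integer type matters, wrapping the count in `as.integer()` gives it. The sketch below assumes a plain data.frame copy of the data. It uses `grep("^population", ...)` as a rough base R counterpart to `starts_with('population')`.

```r
# Assumption: a plain data.frame copy of the question's data
country_df <- data.frame(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# "^population" anchors the match at the start of the name
cols <- grep("^population", names(country_df))

# rowSums() returns doubles; as.integer() converts the counts to integers
country_df$GreaterTotal <- as.integer(rowSums(country_df[cols] > country_df$pop))

class(country_df$GreaterTotal)
# [1] "integer"
```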