
Rowwise Column Count in Dataframe

Let's say I have the following dataframe:

library(dplyr)  # also re-exports tibble()

country_df <- tibble(
  population = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

All I need is a concise way, inside the mutate function, to get the number of columns (population to population3) whose value is greater than the value of the pop column in the same row.

So what I need is the following result (specifically, the GreaterTotal column). Note: I can get the answer by working through each column one at a time, but that becomes tedious as the number of columns grows.

  population population2 population3   pop GreaterThan0 GreaterThan1 GreaterThan2 GreaterTotal
       <dbl>       <dbl>       <dbl> <dbl> <lgl>        <lgl>        <lgl>               <int>
1        328         133          89    45 TRUE         TRUE         TRUE                    3
2         38          12          39    45 FALSE        FALSE        FALSE                   0
3         30          99          33    45 FALSE        TRUE         FALSE                   1
4         56          83          56    45 TRUE         TRUE         TRUE                    3
5       1393        1033         193    45 TRUE         TRUE         TRUE                    3
6        126         101         126    45 TRUE         TRUE         TRUE                    3
7         57          33          58    45 TRUE         FALSE        TRUE                    2
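A minimal sketch, assuming dplyr 1.0.0 or later, that builds the per-column flags of the expected output in one step with across() instead of writing each comparison by hand. The object name flagged and the "_gt_pop" column suffix are hypothetical choices for illustration; they are not the GreaterThan0..2 names used in the question.

```r
library(dplyr)

# Same data as in the question
country_df <- tibble(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# One logical flag per population column; the "_gt_pop" suffix is a
# hypothetical naming choice (the question uses GreaterThan0..2)
flagged <- country_df %>%
  mutate(across(population:population3, ~ .x > pop, .names = "{.col}_gt_pop"))

# Sum the flag columns row by row (TRUE counts as 1, FALSE as 0)
flag_cols <- grep("_gt_pop$", names(flagged))
flagged$GreaterTotal <- rowSums(flagged[flag_cols])

print(flagged)
```

Adding a fourth population column only requires widening the population:population3 range; no extra comparison needs to be written.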

I've tried using apply with the row index, but I can't get it to work. Can somebody please point me in the right direction?
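For comparison with the apply() attempt mentioned above, a minimal row-by-row sketch. It assumes the column names are known and listed by hand; pop_cols is a hypothetical helper variable. Instead of a row index, apply() passes each row to the function as a named vector, so values can be looked up by column name.

```r
library(tibble)

country_df <- tibble(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# Hypothetical list of the columns to test against pop
pop_cols <- c("population", "population2", "population3")

# apply() coerces the selected columns to a numeric matrix and hands each
# row to the function as a named numeric vector; sum() counts the TRUEs
country_df$GreaterTotal <- apply(country_df[c(pop_cols, "pop")], 1, function(row) {
  sum(row[pop_cols] > row[["pop"]])
})

print(country_df)
```

Subsetting to the numeric columns before calling apply() matters: it goes through as.matrix(), so a character column anywhere in the frame would turn every value into a string and the comparisons would become lexicographic. The vectorized rowSums() approach is generally faster on large data.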

You can select the population columns, compare them with pop, and use rowSums to count how many of them are greater in each row.

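# Indices of the columns whose names contain "population"
cols <- grep('population', names(country_df))
# Logical matrix (TRUE where the value exceeds pop), summed across each row
country_df$GreaterTotal <- rowSums(country_df[cols] > country_df$pop)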

#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2
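To see why the one-liner works, a small sketch that keeps the intermediate comparison as a separate object (flags is a hypothetical name). Comparing a data frame with a vector recycles the vector down each column, so row i of every population column is compared with pop[i], and the result is a logical matrix.

```r
library(tibble)

country_df <- tibble(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# Indices of the columns whose names contain "population"
cols <- grep("population", names(country_df))

# Data frame > vector: the vector is recycled down each column,
# giving a 7 x 3 logical matrix (one column per population column)
flags <- country_df[cols] > country_df$pop
print(flags)

# rowSums() counts TRUE as 1 and FALSE as 0, giving the per-row total
print(rowSums(flags))
```

One design point worth checking: grep('population', ...) matches any column whose name merely contains "population", so an anchored pattern such as '^population[0-9]*$' may be safer if other similarly named columns could be present.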

In dplyr 1.0.0 and later, you can do this with rowwise and c_across:

library(dplyr)
country_df %>%
  rowwise() %>%
  # c_across() collects this row's values from the selected columns
  mutate(GreaterTotal = sum(c_across(population:population3) > pop))
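A minimal variant of the rowwise approach, assuming dplyr 1.0.0 or later. It shows two details that are easy to miss: c_across() accepts tidyselect helpers such as starts_with(), so the column range need not be spelled out, and the result of rowwise() keeps its rowwise grouping until ungroup() is called. The object name out is hypothetical.

```r
library(dplyr)

country_df <- tibble(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

out <- country_df %>%
  rowwise() %>%
  # starts_with("population") picks population..population3 but not pop
  mutate(GreaterTotal = sum(c_across(starts_with("population")) > pop)) %>%
  # drop the rowwise grouping so later verbs operate on whole columns again
  ungroup()

print(out)
```

Because sum() of a logical vector returns an integer, GreaterTotal is an integer column here, matching the <int> in the question's expected output. Row-by-row evaluation tends to be slower than the vectorized rowSums() approach on large data.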

Using the tidyverse, we can do:

library(dplyr)
country_df %>%
  # '.' is the whole data frame piped in by %>%
  mutate(GreaterTotal = rowSums(select(., starts_with('population')) > .$pop))

-output

# A tibble: 7 x 5
#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2
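With dplyr 1.1.0 or later, pick() can stand in for select(.) inside mutate(); a minimal sketch under that version assumption (the object name out is hypothetical). pick() returns the selected columns as a tibble, comparing it with pop yields a logical matrix, and rowSums() collapses that to a count per row.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

country_df <- tibble(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

out <- country_df %>%
  # pick() gives a tibble of the population columns; "> pop" compares
  # each row with that row's pop value, and rowSums() counts the TRUEs
  mutate(GreaterTotal = rowSums(pick(starts_with("population")) > pop))

print(out)
```

The select(., ...) form relies on the magrittr '.' placeholder, so it does not carry over to the native |> pipe, whereas pick() works with either pipe.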

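Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.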

 
© 2020-2024 STACKOOM.COM