How can I find the column index of the first non-zero value in a row with R dplyr?

I'm working in R. I have a dataset of COVID case totals that looks like this:

Facility  Day_1  Day_2  Day_3
A         0      0      1
B         1      2      5
C         0      2      6
D         0      0      0
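
For anyone who wants to reproduce the answers below, the table above could be entered roughly like this. The object names df and dat are assumptions chosen to match the names the answers use, and the counts are assumed to be integers.

```r
# Rebuild the example table from the question as a data frame.
# Integer columns are an assumption; the question does not state the type.
df <- data.frame(
  Facility = c("A", "B", "C", "D"),
  Day_1 = c(0L, 1L, 0L, 0L),
  Day_2 = c(0L, 2L, 2L, 0L),
  Day_3 = c(1L, 5L, 6L, 0L)
)

# One answer calls the data `df`, the other `dat`, so keep both names.
dat <- df
```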

I would like to use mutate() to create a new column, first_case, that has the column index of the first non-zero element in each row -- or "NA" if there is no non-zero element. I thought about using where(), but couldn't quite figure out how to get a column index instead of a row index.

Any help is much appreciated!

We can use max.col to get the position of the first non-zero value in each row.

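library(dplyr)

df %>%
  mutate(first_case = {
    # keep only the Day_* columns, so the index is relative to them
    tmp <- select(., starts_with('Day'))
    # rows with no non-zero value get NA; otherwise take the first TRUE per row
    ifelse(rowSums(tmp != 0) == 0, NA, max.col(tmp != 0, ties.method = 'first'))
  })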

#  Facility Day_1 Day_2 Day_3 first_case
#1        A     0     0     1          3
#2        B     1     2     5          1
#3        C     0     2     6          2
#4        D     0     0     0         NA

first_case holds the column number counted among the 'Day' columns only. If you need the column number in the full data, add 1 to the output above.
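
To see why the all-zero rows need the NA mask, here is a minimal base R sketch of the max.col step on its own. The matrix m simply repeats the example values and the names m, nonzero and idx are made up for illustration.

```r
# Same values as the Day_* columns in the question, as a plain matrix.
m <- matrix(c(0, 0, 1,
              1, 2, 5,
              0, 2, 6,
              0, 0, 0), nrow = 4, byrow = TRUE)

# TRUE marks a non-zero cell.
nonzero <- m != 0

# TRUE counts as 1 and FALSE as 0, so with ties.method = "first"
# the row maximum sits at the first TRUE in each row.
idx <- max.col(nonzero, ties.method = "first")

# An all-FALSE row is a tie across every column and would also give 1,
# so rows without any non-zero value are set to NA explicitly.
idx[rowSums(nonzero) == 0] <- NA
idx
```

Without the last assignment, facility D would be reported as having its first case on day 1, which is the reason the answer wraps max.col in ifelse.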

This is probably unnecessarily complex, because the data is not in the long ('tidy') format that dplyr etc. expect. Reshaping it to long format first gives:

library(dplyr)
library(tidyr)

datlong <- dat %>%
  pivot_longer(cols=starts_with("Day"), names_to = c("day"), names_pattern="_(\\d+)")

## A tibble: 12 x 3
#   Facility day   value
#   <chr>    <chr> <int>
# 1 A        1         0
# 2 A        2         0
# 3 A        3         1
# 4 B        1         1
# 5 B        2         2
# 6 B        3         5
# 7 C        1         0
# 8 C        2         2
# 9 C        3         6
#10 D        1         0
#11 D        2         0
#12 D        3         0

It's then simple to get the first/second/third/[n]th day above whatever value, as well as to calculate minimums, maximums, means, weekly averages, rolling averages, whatever, because you are now dealing with a plain old vector of values rather than a list of values spread across multiple columns.

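datlong %>%
  group_by(Facility) %>%
  # .preserve=TRUE keeps facilities with no positive values as empty groups,
  # so they still appear in the summary (with NA)
  filter(value > 0, .preserve=TRUE) %>%
  summarise(first_day = first(day))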

#`summarise()` ungrouping output (override with `.groups` argument)
## A tibble: 4 x 2
#  Facility first_day
#  <chr>    <chr>    
#1 A        3        
#2 B        1        
#3 C        2        
#4 D        <NA>    

An alternative using plain indexing, which is less dplyr-like:

datlong %>%
  group_by(Facility) %>%
  summarise(first_day = day[value > 0][1])
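
The same long table also covers the other summaries mentioned above. As a rough sketch, assuming a made-up cutoff called threshold and a tidyr version that supports names_transform (1.1.0 or later), the first day at or above a given value plus a couple of per-facility statistics could look like this. The example data is rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question (integer counts are an assumption).
dat <- data.frame(
  Facility = c("A", "B", "C", "D"),
  Day_1 = c(0L, 1L, 0L, 0L),
  Day_2 = c(0L, 2L, 2L, 0L),
  Day_3 = c(1L, 5L, 6L, 0L)
)

# Long format, with the day stored as an integer instead of a string.
datlong <- dat %>%
  pivot_longer(
    cols = starts_with("Day"),
    names_to = "day",
    names_pattern = "_(\\d+)",
    names_transform = list(day = as.integer)
  )

# Hypothetical cutoff, chosen only for illustration.
threshold <- 2

res <- datlong %>%
  group_by(Facility) %>%
  summarise(
    # indexing with [1] returns NA when no day reaches the threshold
    first_day_at_threshold = day[value >= threshold][1],
    max_cases = max(value),
    mean_cases = mean(value)
  )
res
```

Converting day to an integer is a design choice rather than a requirement: it makes sorting and comparisons numeric, whereas the character days in the answer above would sort "10" before "2".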
