How to find the column name of the first non-null value in a row

Question

I'm working on a data frame from an inquiry that looks like this:

User ID	2012-01-01	2012-02-01	2012-02-01
Cell 1	NA	2	NA
Cell 3	1	NA	5

For each user ID, I would like to find the date (the column name) of the first non-null column (excluding the User ID column), the name of the last non-null column, and the duration between these two dates.

Thank you!

I've tried:

df$min_date<-apply(df[-1], 1, function(x) 
   x[which.min(which(is.na(x) == FALSE))])

and

df$min_date<-apply(df[-1], 1, function(x) 
   colnames(x[min(which(is.na(x) == FALSE))]))

but it didn't work.

Answer 1

How about this:

library(dplyr) 
library(tidyr)
d <- tibble::tribble(
  ~"User ID",   ~"2012-01-01",  ~"2012-02-01",  ~"2012-02-01", 
"Cell 1",   NA, 2,  NA,
"Cell 3",   1,  NA, 5)
d %>% 
  pivot_longer(-1, names_to="date", values_to = "vals") %>%
  na.omit() %>% 
  mutate(date = lubridate::ymd(date)) %>% 
  group_by(`User ID`) %>% 
  summarise(first = first(date), 
            last = last(date)) %>% 
  mutate(diff = last - first)
#> # A tibble: 2 × 4
#>   `User ID` first      last       diff   
#>   <chr>     <date>     <date>     <drtn> 
#> 1 Cell 1    2012-02-01 2012-02-01  0 days
#> 2 Cell 3    2012-01-01 2012-02-01 31 days

Created on 2022-12-13 by the reprex package (v2.0.1)

And here's a base R way (though using lubridate) that is more in keeping with your original idea:

d <- tibble::tribble(
  ~"User ID",   ~"2012-01-01",  ~"2012-02-01",  ~"2012-02-01", 
  "Cell 1", NA, 2,  NA,
  "Cell 3", 1,  NA, 5)

mind <- apply(d[,-1], 1, function(x)
  colnames(d[,-1])[min(which(!is.na(x)))])
maxd <- apply(d[,-1], 1, function(x)
  colnames(d[,-1])[max(which(!is.na(x)))])

d$min_date <- lubridate::ymd(mind)
d$max_date <- lubridate::ymd(maxd)
d$diff <- d$max_date - d$min_date

d
#> # A tibble: 2 × 7
#>   `User ID` `2012-01-01` `2012-02-01` `2012-02-01` min_date   max_date   diff   
#>   <chr>            <dbl>        <dbl>        <dbl> <date>     <date>     <drtn> 
#> 1 Cell 1              NA            2           NA 2012-02-01 2012-02-01  0 days
#> 2 Cell 3               1           NA            5 2012-01-01 2012-02-01 31 days

Created on 2022-12-13 by the reprex package (v2.0.1)
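As a side note on why the two attempts in the question did not work: `which()` returns indices in ascending order, so `which.min(which(...))` is always 1 and `x[1]` is just the value in the first column; and inside `apply()` each row is a plain named vector, so `colnames()` returns `NULL` where `names()` is needed. Below is a minimal sketch of the same idea without lubridate, assuming the column names parse as `YYYY-MM-DD`. The helper `edge_col()` is a hypothetical name introduced here, not part of any package, and it returns `NA` for a row with no non-missing values instead of failing on `min()` of an empty vector.

```r
# Same toy data as above, built as a plain data.frame.
# check.names = FALSE keeps the date-like (and duplicated) column names as-is.
d <- data.frame(
  `User ID`    = c("Cell 1", "Cell 3"),
  `2012-01-01` = c(NA, 1),
  `2012-02-01` = c(2, NA),
  `2012-02-01` = c(NA, 5),
  check.names = FALSE
)

# Hypothetical helper: name of the first or last non-NA column in each row.
edge_col <- function(df, pick = c("first", "last")) {
  pick <- match.arg(pick)
  out <- apply(df, 1, function(x) {
    idx <- which(!is.na(x))                        # positions of non-missing values
    if (length(idx) == 0) return(NA_character_)    # row with only NA values
    pos <- if (pick == "first") idx[1] else idx[length(idx)]
    names(x)[pos]                                  # names(), not colnames(): x is a vector
  })
  unname(out)
}

vals <- d[-1]                                      # drop the User ID column
d$min_date <- as.Date(edge_col(vals, "first"))
d$max_date <- as.Date(edge_col(vals, "last"))
d$diff     <- d$max_date - d$min_date              # difftime in days

d[c("User ID", "min_date", "max_date", "diff")]
```

Computing on `vals` rather than on `d[-1]` each time keeps the newly added date columns out of later calls.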

Answer 2

Here is a tidyverse option:

Be careful: you are using non-syntactic names, and moreover the 3rd and 4th columns have the same name. This won't work in R. With the names repaired (as in the output further down), you can do:

library(dplyr)
library(tidyr)

df %>% 
  mutate(across(-c(User, ID), ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>%
  unite(non_null, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(non_null = sub(" .*", "", non_null))

  User ID X2012.01.01 X2012.02.01 X2012.02.01.1    non_null
1 Cell  1          NA           2            NA X2012.02.01
2 Cell  3           1          NA             5 X2012.01.01
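The `non_null` column above holds repaired column names such as `X2012.02.01`, not dates, so a duration cannot be computed from it directly. A small sketch of one way to convert them back, assuming every name has the form `X` followed by `YYYY.MM.DD` and an optional de-duplication suffix like `.1`; `name_to_date()` is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper: turn repaired names like "X2012.02.01.1" into Dates.
# Characters 2 to 11 hold the "YYYY.MM.DD" part; any suffix after it is dropped.
name_to_date <- function(x) {
  as.Date(substr(x, 2, 11), format = "%Y.%m.%d")
}

repaired <- c("X2012.02.01", "X2012.01.01", "X2012.02.01.1")
name_to_date(repaired)
```

Subtracting two such `Date` vectors then gives the duration in days.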
