
How to find the column name of the first non-null value of a row

I'm working on a data frame from a survey that looks like this:

User ID   2012-01-01   2012-02-01   2012-02-01
Cell 1    NA           2            NA
Cell 3    1            NA           5

For each user ID, I would like to find the date (the column name) of the first non-null column (excluding the User ID column), the name of the last non-null column, and the duration between these two dates.

Thank you!

I've tried:

df$min_date<-apply(df[-1], 1, function(x) 
   x[which.min(which(is.na(x) == FALSE))])

and

df$min_date<-apply(df[-1], 1, function(x) 
   colnames(x[min(which(is.na(x) == FALSE))]))

but it didn't work.
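A small sketch of why neither attempt returns a column name, assuming `x` stands in for one row as `apply()` would pass it, i.e. a named vector (the names `a`, `b`, `c` are illustrative only):

```r
# One row, as apply(df[-1], 1, ...) would hand it to the function
x <- c(a = NA, b = 2, c = NA)

# Positions of the non-NA entries within x
pos <- which(!is.na(x))          # only position 2 (named "b")

# Attempt 1: which.min() indexes into `pos`, not into x, so the
# result is the position of the smallest element of `pos`
idx_attempt1 <- which.min(pos)   # 1 here, which points at an NA in x
value_attempt1 <- x[idx_attempt1]

# Attempt 2: a plain vector has names(), not colnames()
names_attempt2 <- colnames(x[min(pos)])   # NULL

# Taking the name at the first non-NA position works on a vector
first_name <- names(x)[min(pos)]
```

So `min(which(!is.na(x)))` was the right index; it only needs to be looked up in `names(x)` (or in the column names of the data frame) rather than via `colnames(x)`.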

How about this:

library(dplyr) 
library(tidyr)
d <- tibble::tribble(
  ~"User ID",   ~"2012-01-01",  ~"2012-02-01",  ~"2012-02-01", 
"Cell 1",   NA, 2,  NA,
"Cell 3",   1,  NA, 5)
d %>% 
  pivot_longer(-1, names_to="date", values_to = "vals") %>%
  na.omit() %>% 
  mutate(date = lubridate::ymd(date)) %>% 
  group_by(`User ID`) %>% 
  summarise(first = first(date), 
            last = last(date)) %>% 
  mutate(diff = last - first)
#> # A tibble: 2 × 4
#>   `User ID` first      last       diff   
#>   <chr>     <date>     <date>     <drtn> 
#> 1 Cell 1    2012-02-01 2012-02-01  0 days
#> 2 Cell 3    2012-01-01 2012-02-01 31 days

Created on 2022-12-13 by the reprex package (v2.0.1)

And here's a base R way (though it still uses lubridate) that is more in keeping with your original idea:

d <- tibble::tribble(
  ~"User ID",   ~"2012-01-01",  ~"2012-02-01",  ~"2012-02-01", 
  "Cell 1", NA, 2,  NA,
  "Cell 3", 1,  NA, 5)

mind <- apply(d[,-1], 1, function(x)
  colnames(d[,-1])[min(which(!is.na(x)))])
maxd <- apply(d[,-1], 1, function(x)
  colnames(d[,-1])[max(which(!is.na(x)))])

d$min_date <- lubridate::ymd(mind)
d$max_date <- lubridate::ymd(maxd)
d$diff <- d$max_date - d$min_date

d
#> # A tibble: 2 × 7
#>   `User ID` `2012-01-01` `2012-02-01` `2012-02-01` min_date   max_date   diff   
#>   <chr>            <dbl>        <dbl>        <dbl> <date>     <date>     <drtn> 
#> 1 Cell 1              NA            2           NA 2012-02-01 2012-02-01  0 days
#> 2 Cell 3               1           NA            5 2012-01-01 2012-02-01 31 days

Created on 2022-12-13 by the reprex package (v2.0.1)
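If you would rather avoid lubridate and `apply()` altogether, the same idea can probably be written with `max.col()` and `as.Date()`. This is only a sketch: it assumes the values sit in a matrix whose column names are ISO dates, it works by position so the duplicated column name is not a problem, and the `has_any` guard for rows that are entirely `NA` is an addition not covered in the question.

```r
# Column names as dates; the duplicate name is kept on purpose
dates <- c("2012-01-01", "2012-02-01", "2012-02-01")
vals <- matrix(c(NA, 2, NA,
                 1, NA, 5),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("Cell 1", "Cell 3"), dates))

ok <- !is.na(vals)               # TRUE where a value is present
has_any <- rowSums(ok) > 0       # FALSE for rows that are all NA

# Column position of the first / last TRUE in each row
first_idx <- max.col(ok * 1L, ties.method = "first")
last_idx  <- max.col(ok * 1L, ties.method = "last")

# max.col() still returns a position for all-NA rows, so blank those out
first_idx[!has_any] <- NA
last_idx[!has_any]  <- NA

res <- data.frame(user  = rownames(vals),
                  first = as.Date(dates[first_idx]),
                  last  = as.Date(dates[last_idx]))
res$diff <- res$last - res$first
res
```

For the sample data this should give the same first and last dates as above, with differences of 0 and 31 days.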

Here is a tidyverse option:

Be careful: you are using non-syntactic names, and the 3rd and 4th columns have the same name. That won't work in R as-is; in the output below the names have been made syntactic and unique:

library(dplyr)
library(tidyr)

df %>% 
  mutate(across(-c(User, ID), ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>%
  unite(non_null, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(non_null = sub(" .*", "", non_null))
  User ID X2012.01.01 X2012.02.01 X2012.02.01.1    non_null
1 Cell  1          NA           2            NA X2012.02.01
2 Cell  3           1          NA             5 X2012.01.01
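To see where names like `X2012.02.01.1` come from, and how to get back to real dates afterwards, here is a rough sketch. It assumes the names were repaired with base R's `make.names(unique = TRUE)` rules (what `read.table()` applies by default); the variable names are made up for illustration.

```r
# The date column names as they appear in the question
raw_names <- c("2012-01-01", "2012-02-01", "2012-02-01")

# Make them syntactic and unique: an "X" is prepended, "-" becomes ".",
# and the duplicate gets a ".1" suffix
fixed <- make.names(raw_names, unique = TRUE)
fixed

# Recover the dates: drop the leading "X" and any uniqueness suffix by
# keeping characters 2 to 11, then parse with a matching format
date_part <- substr(fixed, 2, 11)
recovered <- as.Date(date_part, format = "%Y.%m.%d")
recovered
```

The same `substr()` and `as.Date()` step could be applied to the `non_null` column to turn it into a proper date.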
