R: match NA in selected columns based on column name

Question

I have a dataset containing per-month revenue per client. Below is a minimal working sample. (The real dataset runs over multiple years, all months and multiple clients, but you get the picture.)

client <-c("name1","name2","name3","name4","name5","name6")
Feb2018 <- c(10,11,NA,21,22,NA)
Jan2018 <- c(20,NA,NA,NA,58,NA)
Dec2017 <- c(30,23,33,NA,NA,NA)
Nov2017 <- c(40,22,75,NA,NA,11)
df <- data.frame(client,Feb2018,Jan2018,Dec2017,Nov2017)

My objective is to split our revenue into 'new', 'recurrent' and 'lost' by adding an extra column.

That is:

new: clients having some revenue in 2018 but none in 2017. (name4 & name5)

recurrent: clients having some revenue in both 2017 and 2018. (name1 & name2)

lost: clients having some revenue in 2017 but none in 2018. (name3 & name6)

I know how to use grep to select the column names:

df[,c('client',colnames(df[grep('2018$',colnames(df))]))]

I also know how to use is.na, but I'm really stuck on combining the two: selecting on the column name and on the existence of NA in the selected columns.

I've been thinking in circles for some hours now, so I would appreciate some help. Thanks for reading.

Answer 1

We can gather into 'long' format, apply the conditions, and then do a join:

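library(dplyr)
library(tidyr)
df %>%
  # reshape to long format, dropping the NA (no revenue) entries
  gather(key, val,  -client, na.rm = TRUE) %>% 
  group_by(client) %>% 
  # classify each client by the years that still have entries
  mutate(newcol = case_when(any(grepl('2018', key)) & all(!grepl('2017', key))~ 'new', 
                           any(grepl('2018', key)) & any(grepl('2017', key)) ~ 'recurrent',
                           any(grepl('2017', key)) & all(!grepl('2018', key)) ~ 'lost')) %>%
  # keep one row per client, then join back onto the original columns
  distinct(client, newcol) %>%
  right_join(df)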
# A tibble: 6 x 6
# Groups: client [?]
#   client newcol    Feb2018 Jan2018 Dec2017 Nov2017
#  <fctr> <chr>       <dbl>   <dbl>   <dbl>   <dbl>
#1 name1  recurrent    10.0    20.0    30.0    40.0
#2 name2  recurrent    11.0    NA      23.0    22.0
#3 name3  lost         NA      NA      33.0    75.0
#4 name4  new          21.0    NA      NA      NA  
#5 name5  new          22.0    58.0    NA      NA  
#6 name6  lost         NA      NA      NA      11.0
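
For comparison, here is a minimal base R sketch of the combination the question asks about: grep on the column names, then is.na on the selected columns. It is not part of the answer above. The helper name has_revenue and the column name status are made up for illustration, and the sketch assumes every month column name ends in its four-digit year, as in the sample.

```r
# Sample data from the question
client  <- c("name1", "name2", "name3", "name4", "name5", "name6")
Feb2018 <- c(10, 11, NA, 21, 22, NA)
Jan2018 <- c(20, NA, NA, NA, 58, NA)
Dec2017 <- c(30, 23, 33, NA, NA, NA)
Nov2017 <- c(40, 22, 75, NA, NA, 11)
df <- data.frame(client, Feb2018, Jan2018, Dec2017, Nov2017)

# Hypothetical helper: TRUE for each row that has at least one non-NA
# value among the columns whose names end in the given year
has_revenue <- function(data, year) {
  cols <- grep(paste0(year, "$"), colnames(data))
  rowSums(!is.na(data[, cols, drop = FALSE])) > 0
}

rev2017 <- has_revenue(df, 2017)
rev2018 <- has_revenue(df, 2018)

# Combine the two logical vectors into the three categories;
# clients with no revenue in either year get NA
df$status <- ifelse(rev2018 & rev2017, "recurrent",
             ifelse(rev2018, "new",
             ifelse(rev2017, "lost", NA_character_)))
```

On the sample data this gives the same labels as the output above: recurrent for name1 and name2, lost for name3 and name6, new for name4 and name5. Because the data stays in wide format, no reshape or join is needed.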
