
Looping through a column in a dataframe and concatenating strings based on a specific condition in R

The following df represents treatments that a single patient received during the course of a study. They first received drug-v, followed by drug-w, followed by drug-x, and so on.

original <- tibble::tribble(
  ~treatment_administered,
                 "drug-v",
                 "drug-w",
                 "drug-x",
                 "drug-y",
                 "drug-z",
                 "drug-l"
  )
original

My aim is to keep a cumulative record of prior treatment exposures that belong to a specific class of treatment; let's call this "class A". In this example, drug-v, drug-x and drug-z belong to class A. Here is the final df I wish to create.

final <- tibble::tribble(
              ~prior_classA_details, ~treatment_administered,
                                 "",                "drug-v",
                           "drug-v",                "drug-w",
                           "drug-v",                "drug-x",
                    "drug-v,drug-x",                "drug-y",
                    "drug-v,drug-x",                "drug-z",
             "drug-v,drug-x,drug-z",                "drug-l"
           )
final

As you can see, prior_classA_details tracks treatment_administered on the preceding rows: if the previous row's treatment is a class A treatment, its name is added from the following row onward. This is an iterative process going down the list, with prior_classA_details growing as class A treatments are administered.
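To make the rule above concrete, here is a minimal base R sketch of it written as an explicit loop. It is only an illustration of the intended logic, not the dplyr solution being asked for; the `seen` variable is a hypothetical helper, and the `classA` vector simply encodes the class A membership stated earlier.

```r
# Illustration only: the treatments from `original`, as a plain character vector
treatment_administered <- c("drug-v", "drug-w", "drug-x", "drug-y", "drug-z", "drug-l")

# Assumption: class A membership as described in the question
classA <- c("drug-v", "drug-x", "drug-z")

prior_classA_details <- character(length(treatment_administered))
seen <- character(0)  # hypothetical helper: class A drugs given on earlier rows

for (i in seq_along(treatment_administered)) {
  # First record what had been given BEFORE this row ...
  prior_classA_details[i] <- paste(seen, collapse = ",")
  # ... then update the running list if this row's drug is class A
  if (treatment_administered[i] %in% classA) {
    seen <- c(seen, treatment_administered[i])
  }
}

prior_classA_details
```

Recording the value before updating `seen` is what makes each row reflect prior exposures only.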

There are multiple other data columns in this df that I have not included here (only the relevant columns are shown). Ideally I am looking for a dplyr solution, please.

Here's one way:

library(dplyr)
library(purrr)

# Drugs that belong to class A
classA <- c("drug-v", "drug-x", "drug-z")

original %>%
  mutate(prior_classA_details = lag(map_chr(row_number(), ~ {
    # Class A drugs among rows 1..i, collapsed into one comma-separated string
    toString(keep(treatment_administered[seq_len(.x)], function(y) y %in% classA))
  }), default = ""), .before = 1)

#  prior_classA_details     treatment_administered
#  <chr>                    <chr>                 
#1 ""                       drug-v                
#2 "drug-v"                 drug-w                
#3 "drug-v"                 drug-x                
#4 "drug-v, drug-x"         drug-y                
#5 "drug-v, drug-x"         drug-z                
#6 "drug-v, drug-x, drug-z" drug-l                

We create a vector of class A drugs and, for each row, keep only the treatments up to and including that row that belong to classA, collapsing them into one concatenated string. lag then shifts the result down by one row, so each row lists only the class A treatments given before it.
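As a possible alternative, the same cumulative string can be built in a single pass with `purrr::accumulate()`, which carries the running string forward instead of re-scanning rows 1 to i for every row. This is only a sketch under the same assumptions as above (the `classA` vector and the six-row `original`); the `result` name and the anonymous function's argument names `acc` and `drug` are chosen here for illustration.

```r
library(dplyr)
library(purrr)

original <- tibble::tibble(
  treatment_administered = c("drug-v", "drug-w", "drug-x", "drug-y", "drug-z", "drug-l")
)
classA <- c("drug-v", "drug-x", "drug-z")

result <- original %>%
  mutate(
    prior_classA_details = head(
      accumulate(
        treatment_administered,
        function(acc, drug) {
          if (!drug %in% classA) {
            acc                            # not class A: carry the string forward
          } else if (acc == "") {
            drug                           # first class A drug seen
          } else {
            paste(acc, drug, sep = ", ")   # append to the running string
          }
        },
        .init = ""                         # state before any treatment
      ),
      -1                                   # drop the state after the last row
    ),
    .before = 1
  )

result
```

Because `.init = ""` puts the "nothing given yet" state first, dropping the final element with `head(x, -1)` plays the role that `lag()` plays in the answer above. The separator `", "` matches what `toString()` produces; use `sep = ","` if the exact format from the question is needed.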
