
How to replace values in specific rows and columns with NA using a reference table?

I need to set values to NA in specific rows and columns using a separate reference table, and I'm not sure how.

Using a reference table of event-field mappings, I need to set a value to NA if its column name (field name) is not associated with the value in the column called event on the same row.

Below is a simplified example of my data. The real data has about 900 rows and more than 300 columns in which values may need to be set to NA, and the columns are of different types.

df <- tibble::tribble(
  ~event, ~drug, ~status,
  "referral", "drugA", 0,
  "therapy", "drugA", 1
)

I also have a reference table, shown below, that lists which fields are associated with each event.

event_fields <- tibble::tribble(
  ~unique_event_name, ~field_name,
  "referral", "record_id",
  "referral", "casetype",
  "therapy", "drug",
  "therapy", "status"
)

The output I'm trying to get is below. For example, drug and status are not fields associated with the referral event in the event_fields table above, so they should be set to NA.

desired_result <- tibble::tribble(
  ~event, ~drug, ~status,
  "referral", NA, NA,
  "therapy", "drugA", 1
)

One thing I've tried is below. It is based on "Replace multiple values in a dataframe with NA based on conditions given in another dataframe in R", the closest question I could find, but it doesn't work. I'm not sure how to use each row's event value (e.g. "referral") and the name of the field column (e.g. "drug") inside filter(), or whether there's a better way to do this.

library(tidyverse)
df %>% mutate(across(drug:status,
                     ~ replace(., !cur_column() %in%
                                 event_fields %>% filter(unique_event_name == event) %>% pull(field_name),
                               NA)))

which gives the error:

Error: Problem with `mutate()` input `..1`.
ℹ `..1 = across(...)`.
x no applicable method for 'filter' applied to an object of class "logical"
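One likely explanation for this particular error is R's operator precedence (documented in ?Syntax): %in% and %>% are both %any% operators, so they share one precedence level and group left to right, while the unary ! binds more loosely than either. The pipe is therefore applied to the result of cur_column() %in% event_fields, which is a logical vector, and filter() has no method for that. The base-R sketch below inspects the parsed call with quote() to show the grouping; the object names attempt, inner_pipe and fixed are illustrative only.

```r
# quote() returns the parsed call without evaluating it, so neither dplyr
# nor the data need to exist for this check
attempt <- quote(
  !cur_column() %in% event_fields %>% filter(unique_event_name == event) %>% pull(field_name)
)

attempt[[1]]                # the outermost operator is !
inner_pipe <- attempt[[2]][[2]]
inner_pipe[[2]]             # cur_column() %in% event_fields: the logical piped into filter()
inner_pipe[[3]]             # filter(unique_event_name == event)

# With parentheses, %in% compares cur_column() against the pulled field names
fixed <- quote(
  !cur_column() %in% (event_fields %>% filter(unique_event_name == event) %>% pull(field_name))
)
fixed[[2]][[1]]             # the operator directly under ! is now %in%
```

Parenthesising fixes the parse, but event inside filter() would then most likely refer to the whole event column rather than the current row, so the comparison would still not be row-wise; the answers below avoid filter() for that reason.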

Any help would be highly appreciated!

Because the condition depends on both the 'event' column and the 'unique_event_name' on the same row of event_fields as the 'field_name' that matches the current column name ( cur_column() ), subset 'unique_event_name' with a logical on 'field_name', then compare the result with 'event' to build the second logical that is passed to replace():
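For a single column, the two logical steps can be traced in base R. In this sketch the fixed string col stands in for cur_column(), and the question's toy tibbles are rebuilt with data.frame() (an assumption made only so the snippet runs without packages).

```r
# Toy data from the question, rebuilt with base data.frame()
df <- data.frame(event = c("referral", "therapy"),
                 drug = c("drugA", "drugA"),
                 status = c(0, 1),
                 stringsAsFactors = FALSE)
event_fields <- data.frame(unique_event_name = c("referral", "referral", "therapy", "therapy"),
                           field_name = c("record_id", "casetype", "drug", "status"),
                           stringsAsFactors = FALSE)

col <- "drug"   # stands in for cur_column()

# Step 1: logical on field_name, used to subset unique_event_name
event_for_col <- event_fields$unique_event_name[event_fields$field_name == col]

# Step 2: logical on event; TRUE marks rows whose value in col should become NA
to_na <- df$event != event_for_col

replace(df[[col]], to_na, NA)
```

Here event_for_col is "therapy", so only the referral row is flagged and its drug value becomes NA.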

library(dplyr)
df %>%
  mutate(across(drug:status, ~ replace(
    .,
    # TRUE where the row's event differs from the event this column (cur_column()) belongs to
    event != event_fields$unique_event_name[event_fields$field_name == cur_column()],
    NA
  )))

-output

# A tibble: 2 × 3
  event    drug  status
  <chr>    <chr>  <dbl>
1 referral <NA>      NA
2 therapy  drugA      1
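The comparison with != assumes that every column appears in exactly one row of event_fields. If a field is listed under several events, the subset has several elements and != compares them element by element against event (with recycling), which gives wrong results; if a column is missing from event_fields, the subset is empty and nothing is replaced. A variant that swaps != for a negated %in% is sketched below. It is untested against the real data and assumes that every column other than event should be checked, hence -event.

```r
library(dplyr)

df <- tibble::tribble(
  ~event,     ~drug,   ~status,
  "referral", "drugA", 0,
  "therapy",  "drugA", 1
)

event_fields <- tibble::tribble(
  ~unique_event_name, ~field_name,
  "referral", "record_id",
  "referral", "casetype",
  "therapy",  "drug",
  "therapy",  "status"
)

result <- df %>%
  mutate(across(
    -event,
    # Keep a value only if the row's event is one of the events listed for this column;
    # a column absent from event_fields has no allowed events and becomes all NA
    ~ replace(.x, !event %in% event_fields$unique_event_name[event_fields$field_name == cur_column()], NA)
  ))

result
```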

You may try this solution. Although it works on the toy example, it might still fail depending on your real data.

The idea is to find the non-matching fields and replace them with NA: first find the non-matching rows, then select the corresponding columns.

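desired_result <- df

# Rows: events that differ from the event(s) the columns of df belong to
# Columns: positions in df of the fields listed in event_fields
desired_result[ df$event != unique(
   event_fields$unique_event_name[ event_fields$field_name %in% colnames( df )]
   ), na.omit( match( event_fields$field_name, colnames( df ) ) ) ] <- NA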

desired_result
     event   drug status
1 referral   <NA>     NA
2  therapy  drugA      1
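The shortcut above relies on all of df's fields belonging to a single event, which is probably why it may fail on the real data. A column-by-column loop in base R avoids that assumption. set_unmapped_to_na() is a hypothetical helper name, and the sketch assumes the event column is called event (its name is a parameter).

```r
# Hypothetical helper: for every column except the event column, set a cell to NA
# unless the row's event is listed for that column in the reference table
set_unmapped_to_na <- function(data, ref, event_col = "event") {
  for (col in setdiff(names(data), event_col)) {
    allowed_events <- ref$unique_event_name[ref$field_name == col]
    keep <- data[[event_col]] %in% allowed_events
    # Assigning NA into an existing atomic vector keeps its type (character, numeric, ...)
    data[[col]][!keep] <- NA
  }
  data
}

df <- data.frame(event = c("referral", "therapy"),
                 drug = c("drugA", "drugA"),
                 status = c(0, 1),
                 stringsAsFactors = FALSE)
event_fields <- data.frame(unique_event_name = c("referral", "referral", "therapy", "therapy"),
                           field_name = c("record_id", "casetype", "drug", "status"),
                           stringsAsFactors = FALSE)

result <- set_unmapped_to_na(df, event_fields)
result
```

Because each column is handled separately, a field listed under several events simply has several allowed events, and the same helper should also accept a tibble.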

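Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.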

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM