Check to see if value from one column is present in two other columns in one dataframe R

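I want to find a way to compare columns in the SAME data frame and write the result to a new column called STATUS. I have 3 columns: 1) SNPs, 2) GAINED, and 3) LOST. I want to know whether the value in each cell of column 1 is present anywhere in column 2 or column 3. If the value from column 1 is present in column 2, I want the output to say GAINED; if it is present in column 3, the output should say LOST; if it is present in neither, the output should be NEUTRAL.

This is what I want: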

SNPs         GAINED          LOST           STATUS
1_752566     1_949654        6_30022061     NEUTRAL
1_776546     1_1045331       6_30314321     NEUTRAL
1_832918     1_832918        13_95612033    GAINED
1_914852     1_1247494       1_914852       LOST

I tried this:

data_frame$status <- data.frame(lapply(data_frame[1], `%in%`, data_frame[2:3]))

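But it produces 2 columns that both say NEUTRAL. I believe it compares row by row to see whether there is a match, but my data is not organized so that every match sits on the same row. Instead, I want R to search the entire column for a match to each cell, rather than searching by row.

You don't need lapply or anything fancy.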

data_frame$STATUS = with(data_frame,
  ifelse(SNPs %in% GAINED, "GAINED",
   ifelse(SNPs %in% LOST, "LOST", "NEUTRAL")
  )
)

Note that, as written, the GAINED condition is checked first, so if a value is present in both GAINED and LOST, the result will be "GAINED".
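To make that precedence concrete, here is a minimal sketch on made-up data. The data frame `demo` and its values are hypothetical and were chosen so that one SNP appears in both columns. It only illustrates the order in which the nested `ifelse` conditions are tested.

```r
# Hypothetical data: "a" occurs in both GAINED and LOST,
# "b" occurs only in LOST, "c" occurs in neither.
demo <- data.frame(
  SNPs   = c("a", "b", "c"),
  GAINED = c("x", "a", "y"),
  LOST   = c("a", "b", "z"),
  stringsAsFactors = FALSE
)

# Same nested ifelse as above: GAINED is tested before LOST.
demo$STATUS <- with(demo,
  ifelse(SNPs %in% GAINED, "GAINED",
    ifelse(SNPs %in% LOST, "LOST", "NEUTRAL")
  )
)

demo$STATUS
#> [1] "GAINED"  "LOST"    "NEUTRAL"
```

If LOST should win when a value appears in both columns, swapping the two conditions would be one way to do that.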

A nested ifelse should work, and it is readable if indented properly:

tbl$status <- ifelse(tbl$SNPs %in% tbl$GAINED, "GAINED",
                     ifelse(tbl$SNPs %in% tbl$LOST, "LOST", "NEUTRAL"))

> tbl
      SNPs    GAINED        LOST  STATUS  status
1 1_752566  1_949654  6_30022061 NEUTRAL NEUTRAL
2 1_776546 1_1045331  6_30314321 NEUTRAL NEUTRAL
3 1_832918  1_832918 13_95612033  GAINED  GAINED
4 1_914852 1_1247494    1_914852    LOST    LOST
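In the sample data the matches happen to sit on the same row, which hides the difference between a row-wise comparison and a whole-column search. The sketch below uses a rearranged copy of the data (the row order is an assumption made for illustration, and `tbl2` is a hypothetical name) to show that `==` compares element by element, while `%in%` checks each SNP against every value in the other column.

```r
# Hypothetical rearrangement of the example data: "1_832918" is in GAINED
# and "1_914852" is in LOST, but neither sits on the same row as in SNPs.
tbl2 <- data.frame(
  SNPs   = c("1_752566", "1_776546", "1_832918", "1_914852"),
  GAINED = c("1_832918", "1_949654", "1_1045331", "1_1247494"),
  LOST   = c("6_30022061", "1_914852", "6_30314321", "13_95612033"),
  stringsAsFactors = FALSE
)

# Row-wise: element i of SNPs is compared only with element i of GAINED.
row_wise <- tbl2$SNPs == tbl2$GAINED

# Column-wise: each SNP is looked up in the whole GAINED column.
col_wise <- tbl2$SNPs %in% tbl2$GAINED

row_wise
#> [1] FALSE FALSE FALSE FALSE
col_wise
#> [1] FALSE FALSE  TRUE FALSE
```

This is why the answers here use `%in%` rather than `==`: the lookup does not depend on how the rows are ordered.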

An approach using dplyr's case_when:

library(tidyverse)

df <-
  structure(
    list(
      SNPs = c("1_752566", "1_776546", "1_832918", "1_914852"),
      GAINED = c("1_949654", "1_1045331", "1_832918", "1_1247494"),
      LOST = c("6_30022061", "6_30314321", "13_95612033", "1_914852")
    ),
    row.names = c(NA,-4L),
    spec = structure(list(
      cols = list(
        SNPs = structure(list(), class = c("collector_character",
                                           "collector")),
        GAINED = structure(list(), class = c("collector_character",
                                             "collector")),
        LOST = structure(list(), class = c("collector_character",
                                           "collector"))
      ),
      default = structure(list(), class = c("collector_guess",
                                            "collector")),
      delim = ","
    ), class = "col_spec"),
    class = c("spec_tbl_df",
              "tbl_df", "tbl", "data.frame")
  )

df %>%
  mutate(STATUS = case_when(
    SNPs %in% GAINED ~ 'GAINED',
    SNPs %in% LOST ~ 'LOST',
    TRUE ~ 'NEUTRAL'
  ))
#> # A tibble: 4 × 4
#>   SNPs     GAINED    LOST        STATUS 
#>   <chr>    <chr>     <chr>       <chr>  
#> 1 1_752566 1_949654  6_30022061  NEUTRAL
#> 2 1_776546 1_1045331 6_30314321  NEUTRAL
#> 3 1_832918 1_832918  13_95612033 GAINED 
#> 4 1_914852 1_1247494 1_914852    LOST

Created on 2022-12-01 with reprex v2.0.2
