Nested if else statement in R with multiple strings in each cell

I would like to write an if/else statement with multiple conditions. I have two data frames. The first one is built like this:

prefix <- "sample"
suffix <- seq(1:100)
id <- paste(prefix, suffix, sep="")
indv_df <- data.frame(id, count = matrix(ncol=1, nrow=100))

The first 15 rows of indv_df look like this:

           id count
1     sample1    NA
2     sample2    NA
3     sample3    NA
4     sample4    NA
5     sample5    NA
6     sample6    NA
7     sample7    NA
8     sample8    NA
9     sample9    NA
10   sample10    NA
11   sample11    NA
12   sample12    NA
13   sample13    NA
14   sample14    NA
15   sample15    NA

The second table, called row1, is built like this:

 Hom <- paste("sample2", "sample3", "sample4", sep=",")
 Het <- paste("sample5", "sample6", "sample7", sep=",")
 Missing <- paste("sample10", "sample11", sep=",")
 row1 <- data.frame(Hom, Het, Missing)

It prints as:

                      Hom                     Het           Missing
1 sample2,sample3,sample4 sample5,sample6,sample7 sample10,sample11

I am trying to write an if/else statement so that, if the id in the first row of indv_df does not match anything in the second table, "0" is written to the first row, second column of the first table. This is what I tried. It didn't work, which doesn't surprise me much since this is my first if/else statement. I know it should be straightforward, but none of the methods I tried worked.


> if(grep(indv_df$id[1], row1$Hom)){
+   apply(indv_df[1,2]=="2")
+ } else if(grep(indv_df$id[1], row1$Het)){
+   apply(indv_df[1,2]=="1")
+ } else if(grep(indv_df$id[1], row1$Missing)){
+   apply(indv_df[1,2]=="missing")
+ } else (apply(indv_df[1,2]=="0"))

This is the error message I got:

Error in if (grep(indv_df$id[1], row1$Hom)) { : 
  argument is of length zero

The real dataset has 4 million rows in the second data.frame, so I am only testing the first step for now. Once I get through this, I will try to do it in a loop for all rows. :D Thank you in advance for all the help.

A few issues might be affecting you. Your eventual result will be a character column for count, so it is best to declare the column as character up front. That is also clearer than the way you currently build it.

indv_df <- data.frame(id, count = NA_character_)
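
To see why the column type matters, here is a small check, assuming the same id values as in the question. The names df_logical and df_char are only for illustration, and stringsAsFactors = FALSE is set explicitly because R versions before 4.0 would otherwise turn character columns into factors.

```r
id <- paste0("sample", 1:100)

# Column built as in the question: an all-NA matrix is logical
df_logical <- data.frame(id, count = matrix(ncol = 1, nrow = 100),
                         stringsAsFactors = FALSE)
class(df_logical$count)

# Column declared as character up front (illustrative name df_char)
df_char <- data.frame(id, count = NA_character_, stringsAsFactors = FALSE)
class(df_char$count)
```

With the character version, values such as "missing" can be stored without a later as.character() conversion.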

However, using your data.frame as you constructed it, I would approach this not with a series of if statements but with subsetting. In addition, you have lines like apply(indv_df[1,2]=="missing"). This is wrong for a few reasons. apply() needs a margin and a function, and neither is supplied here. indv_df[1,2] is matrix-style indexing and returns NA. And == tests for equality rather than assigning a value. The error itself comes from grep(), which returns an empty integer vector when nothing matches, leaving if() with a condition of length zero.
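
The behaviour behind the error can be reproduced in a few lines. This is a minimal sketch that rebuilds row1 from the question (with stringsAsFactors = FALSE as an assumption) and uses base R only. It also shows a caveat of pattern matching: "sample1" is a substring of "sample10", so splitting on the comma and comparing whole ids is safer.

```r
row1 <- data.frame(
  Hom = "sample2,sample3,sample4",
  Het = "sample5,sample6,sample7",
  Missing = "sample10,sample11",
  stringsAsFactors = FALSE
)

# grep() returns match positions; with no match that is integer(0),
# which if() rejects with "argument is of length zero".
length(grep("sample1", row1$Hom))    # 0

# grepl() always returns TRUE or FALSE, so it is safe inside if() ...
grepl("sample1", row1$Hom)           # FALSE
# ... but it matches substrings: "sample1" is found inside "sample10".
grepl("sample1", row1$Missing)       # TRUE

# Splitting on the comma and testing with %in% compares whole ids.
missing_ids <- strsplit(row1$Missing, ",", fixed = TRUE)[[1]]
"sample1" %in% missing_ids           # FALSE

# Assignment uses <- ; == only compares.
indv_df <- data.frame(id = paste0("sample", 1:100), count = NA_character_,
                      stringsAsFactors = FALSE)
indv_df$count[1] <- "0"
```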

Here is a solution using data.frame syntax and the stringr library.

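library(stringr)
# Default every id to "0"; this also makes count a character column
indv_df$count <- "0"
# Split each comma-separated cell into ids, then overwrite the matching rows.
# Indexing the column directly keeps an empty match from raising an error.
indv_df$count[indv_df$id %in% unlist(str_split(as.character(row1$Hom), ","))] <- "2"
indv_df$count[indv_df$id %in% unlist(str_split(as.character(row1$Het), ","))] <- "1"
indv_df$count[indv_df$id %in% unlist(str_split(as.character(row1$Missing), ","))] <- "missing"
head(indv_df, 15)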
#          id   count
# 1   sample1       0
# 2   sample2       2
# 3   sample3       2
# 4   sample4       2
# 5   sample5       1
# 6   sample6       1
# 7   sample7       1
# 8   sample8       0
# 9   sample9       0
# 10 sample10 missing
# 11 sample11 missing
# 12 sample12       0
# 13 sample13       0
# 14 sample14       0
# 15 sample15       0

Personally, I prefer data.table syntax for this.

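library(data.table)
library(stringr)
# Convert both data.frames to data.tables by reference
setDT(indv_df)
setDT(row1)
# Make count a character column and default every id to "0"
indv_df[, count := as.character(count)]
indv_df[, count := "0"]
# Update only the rows whose id appears in each comma-separated cell
indv_df[id %in% unlist(str_split(as.character(row1$Hom), ",")), count := "2"]
indv_df[id %in% unlist(str_split(as.character(row1$Het), ",")), count := "1"]
indv_df[id %in% unlist(str_split(as.character(row1$Missing), ",")), count := "missing"]
head(indv_df, 15)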
#           id   count
#  1:  sample1       0
#  2:  sample2       2
#  3:  sample3       2
#  4:  sample4       2
#  5:  sample5       1
#  6:  sample6       1
#  7:  sample7       1
#  8:  sample8       0
#  9:  sample9       0
# 10: sample10 missing
# 11: sample11 missing
# 12: sample12       0
# 13: sample13       0
# 14: sample14       0
# 15: sample15       0
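
Since the real second table has many rows, the same subsetting logic can be wrapped in a function and applied row by row. The sketch below is only an illustration in base R: classify_row and the two-row table rows are hypothetical names and made-up data, not part of the original answer, and nothing here has been tuned for 4 million rows.

```r
# Hypothetical helper: classify every id against one row of
# comma-separated id lists, using the same codes as above.
classify_row <- function(ids, hom, het, missing) {
  split_ids <- function(x) strsplit(as.character(x), ",", fixed = TRUE)[[1]]
  out <- rep("0", length(ids))
  out[ids %in% split_ids(hom)] <- "2"
  out[ids %in% split_ids(het)] <- "1"
  out[ids %in% split_ids(missing)] <- "missing"
  out
}

ids <- paste0("sample", 1:15)

# Two made-up rows standing in for the real table
rows <- data.frame(
  Hom = c("sample2,sample3,sample4", "sample1"),
  Het = c("sample5,sample6,sample7", "sample2,sample8"),
  Missing = c("sample10,sample11", "sample15"),
  stringsAsFactors = FALSE
)

# One column of codes per row of `rows`, one matrix row per id
codes <- mapply(classify_row, hom = rows$Hom, het = rows$Het,
                missing = rows$Missing,
                MoreArgs = list(ids = ids), USE.NAMES = FALSE)
rownames(codes) <- ids
```

Here codes is a character matrix with one row per id and one column per row of the input table, so codes["sample2", 1] holds the code for sample2 under the first row.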
