Subset a data frame based on another vector

Here is my question: the first data frame is produced inside a function, and I want to use it to subset a larger second data frame.

# dataframe 1
loc <- paste0('Loc', 1:9)
qit <- c(13, 27, 16, 14, 15, 21, 12, 11, 8)

mydf <- data.frame(loc, qit)

   loc qit
1 Loc1  13
2 Loc2  27
3 Loc3  16
4 Loc4  14
5 Loc5  15
6 Loc6  21
7 Loc7  12
8 Loc8  11
9 Loc9   8

# dataframe 2
loc <- paste0('Loc', 1:9)
vloc <- rep(loc, each = 2)
allele <- c(
  13, 12, 27, 20, 16, 18,
  14, 17, 15, 22, 21, 26,
  12, 14, 11, 18,  8, 24
)
afreq <- c(
  0.308, 0.4, 0.041, 0.5, 0.125, 0.5,
  0.139, 0.2, 0.219, 0.2, 0.176, 0.33,
  0.358, 0.4, 0.274, 0.5, 0.173, 0.15
)
loctab <- data.frame(vloc, allele, afreq)

   vloc allele afreq
1  Loc1     13 0.308
2  Loc1     12 0.400
3  Loc2     27 0.041
4  Loc2     20 0.500
5  Loc3     16 0.125
6  Loc3     18 0.500
7  Loc4     14 0.139
8  Loc4     17 0.200
9  Loc5     15 0.219
10 Loc5     22 0.200
11 Loc6     21 0.176
12 Loc6     26 0.330
13 Loc7     12 0.358
14 Loc7     14 0.400
15 Loc8     11 0.274
16 Loc8     18 0.500
17 Loc9      8 0.173
18 Loc9     24 0.150

I want to make a new data frame like mydf, with the additional afreq variable from dataframe 2. I tried to subset it:

loctab[loctab$allele %in% mydf$qit, ]

   vloc allele afreq
1  Loc1     13 0.308
2  Loc1     12 0.400
3  Loc2     27 0.041
5  Loc3     16 0.125
7  Loc4     14 0.139
9  Loc5     15 0.219
11 Loc6     21 0.176
13 Loc7     12 0.358
14 Loc7     14 0.400
15 Loc8     11 0.274
17 Loc9      8 0.173 

I did not get what I want. Here the subset does not take the vloc or loc variable into account: a row is kept whenever its allele matches any value in qit, regardless of which locus that value belongs to. Is there any way to subset with reference to loc or vloc?

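Maybe the merge() function is what you're looking for. Merging on the allele value alone would have the same problem as %in% (12 and 14 each occur at two loci), so match on the locus as well:

mydf2 <- merge(mydf, loctab, by.x = c("loc", "qit"), by.y = c("vloc", "allele"))

Because both key columns are matched, you end up with 3 columns (loc, qit, afreq) and one row per locus.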
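If you would rather stay with subsetting, one possible sketch is to compare (locus, allele) pairs instead of the allele alone, so that the match respects loc/vloc. The data below is copied from the question; the helper names key_mydf, key_loctab, matched and result are made up for illustration.

```r
# Example data copied from the question
mydf <- data.frame(
  loc = paste0("Loc", 1:9),
  qit = c(13, 27, 16, 14, 15, 21, 12, 11, 8)
)
loctab <- data.frame(
  vloc   = rep(paste0("Loc", 1:9), each = 2),
  allele = c(13, 12, 27, 20, 16, 18, 14, 17, 15, 22, 21, 26,
             12, 14, 11, 18, 8, 24),
  afreq  = c(0.308, 0.4, 0.041, 0.5, 0.125, 0.5, 0.139, 0.2, 0.219,
             0.2, 0.176, 0.33, 0.358, 0.4, 0.274, 0.5, 0.173, 0.15)
)

# Composite keys: one string per (locus, allele) pair.
# key_mydf and key_loctab are illustrative names, not from the question.
key_mydf   <- paste(mydf$loc, mydf$qit)
key_loctab <- paste(loctab$vloc, loctab$allele)

# Keep only rows of loctab whose locus AND allele appear together in mydf
matched <- loctab[key_loctab %in% key_mydf, ]

# Or add afreq to a copy of mydf; match() gives NA for a pair with no partner
result <- mydf
result$afreq <- loctab$afreq[match(key_mydf, key_loctab)]
```

With the question's data, matched keeps one row per locus (9 rows instead of 11), and result is mydf with an afreq column added. This assumes each (locus, allele) pair occurs at most once in loctab; match() only returns the first hit.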
