
Select dataframes from a list based on column values

This question has been adapted from a previous, different one.

Using R, I am trying to create a new list of dataframes by selecting dataframes from an existing list only when a particular combination of values exists in one of the columns. Let me first explain the steps that already work. This is my original data, in a dataframe called df:

                              Taxon      C     N    func.group  trophic.grp
1  Chrysomelidae.Phylotreta.exclamationis -30.23  5.06     grazer   herbivore
2        Chrysomelidae.Neocrepidodera.sp. -27.29  5.55     grazer   herbivore
3        Chrysomelidae.Neocrepidodera.sp. -27.84  5.54     grazer   herbivore
4        Chrysomelidae.Neocrepidodera.sp. -27.69  4.59     grazer   herbivore
5              Mitidulidae.Meligethes.sp. -26.99  5.30     grazer   herbivore
6           Chrysomelidae.Phylotreta.sp.2 -28.50  2.40     grazer   herbivore
7           Chrysomelidae.Phylotreta.sp.2 -28.36  4.17     grazer   herbivore
8           Chrysomelidae.Phylotreta.sp.2 -29.50  3.15     grazer   herbivore
9           Chrysomelidae.Phylotreta.sp.2 -27.69  3.72     grazer   herbivore
10          Chrysomelidae.Phylotreta.sp.2 -28.22  3.26     grazer   herbivore
11                  Gastropoda.snail.sp.1 -26.21  3.54     grazer   herbivore
12                  Gastropoda.snail.sp.1 -27.59  2.61     grazer   herbivore
13                  Gastropoda.snail.sp.1 -25.10  2.66     grazer   herbivore
14                  Gastropoda.snail.sp.2 -26.49  2.55     grazer   herbivore
15                  Gastropoda.snail.sp.4 -27.46 -0.38     grazer   herbivore
16       Lepidoptera.Arctidae.Ermine.moth -28.51  2.44     grazer   herbivore
17       Curculionidae.Ischapterapion.sp. -29.06  2.19     weevil   herbivore
18       Curculionidae.Ischapterapion.sp. -29.27  1.60     weevil   herbivore
19       Curculionidae.Ischapterapion.sp. -29.94  2.08     weevil   herbivore
20       Curculionidae.Ischapterapion.sp. -29.71  2.16     weevil   herbivore
21            Curculionidae.Protapion.sp. -28.45  1.91     weevil   herbivore
22            Curculionidae.Protapion.sp. -25.99  0.55     weevil   herbivore
23            Curculionidae.Protapion.sp. -28.27  1.52     weevil   herbivore
24            Curculionidae.Protapion.sp. -28.01  1.74     weevil   herbivore
25            Curculionidae.Protapion.sp. -27.06  0.54     weevil   herbivore
26             Curculionidae.Hypera.meles -25.41  3.38     weevil   herbivore
27               Curculionidae.Sitona.sp. -27.05  2.01     weevil   herbivore
28               Curculionidae.Sitona.sp. -26.70  3.07     weevil   herbivore
29               Curculionidae.Sitona.sp. -27.64  2.13     weevil   herbivore
30               Curculionidae.Sitona.sp. -27.50  1.47     weevil   herbivore
31            Curculionidae.Phylobius.sp. -28.27  2.66     weevil   herbivore
32      Curculionidae.Hypera.nigrorostris -25.52  2.43     weevil   herbivore

This dataframe (df) contains 14 different "Taxon", some of which have multiple samples, so that there are 32 samples in all. Each Taxon is also classified by the column "func.group" as either "grazer" or "weevil".

First, I want to build every possible combination of 6 Taxon drawn from my 14. There are 3003 such combinations (sampling without replacement, where order does not matter). For each Taxon selected, I want to include all samples of that Taxon. I use this code, which works well:

combos <- combn(unique(as.character(df$Taxon)), 6)
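As a quick sanity check on the combination count, here is a minimal sketch using a made-up vector of 14 labels in place of the real Taxon names (`taxa` and `toy.combos` are hypothetical names, not part of the original code):

```r
# Hypothetical stand-in for unique(as.character(df$Taxon)): 14 labels
taxa <- paste0("taxon", 1:14)

# combn() returns a character matrix with one column per combination
toy.combos <- combn(taxa, 6)

dim(toy.combos)   # 6 rows, one column per combination
choose(14, 6)     # number of combinations expected: 3003
```

Each column of the matrix is one set of 6 Taxon, which is why the next step iterates over columns.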

Next, I want to include all the other columns of information, so I use this additional line of code, which pulls in every row of df for each Taxon selected. It also works well:

mysamples <- apply(combos, 2, function(vec) df[ df$Taxon %in% vec, ] )
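To see what this step produces, here is a small sketch on made-up data (the dataframe `toy` and its values are hypothetical, chosen only to keep the output short):

```r
# Hypothetical toy data: 4 Taxon, some with repeated samples
toy <- data.frame(
  Taxon      = c("a", "a", "b", "c", "c", "d"),
  func.group = c("grazer", "grazer", "grazer", "weevil", "weevil", "weevil"),
  stringsAsFactors = FALSE
)

# All combinations of 2 Taxon out of 4: a 2 x 6 matrix
toy.combos <- combn(unique(toy$Taxon), 2)

# One dataframe per column of toy.combos, keeping every row of the chosen Taxon
toy.samples <- apply(toy.combos, 2, function(vec) toy[toy$Taxon %in% vec, ])

length(toy.samples)   # one list element per combination
toy.samples[[1]]      # all rows for Taxon "a" and "b"
```

Because the function returns a dataframe, apply() gives back a list rather than trying to simplify the result into a matrix.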

This is where my problem starts. From "mysamples" (which should now be a list of 3003 dataframes), I would like to select all the dataframes that include exactly 3 Taxon that are "grazer" and 3 Taxon that are "weevil", and store these dataframes in a new list.

In other words, I would like this new list to contain only dataframes with a 3:3 balance of weevil:grazer Taxon.

Many thanks, M

I think you're looking for all elements of mysamples that have exactly 3 weevil and exactly 3 grazer. You can do this with:

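# Logical vector: TRUE where a dataframe holds exactly 3 weevil and 3 grazer Taxon.
# Count distinct Taxon rather than rows, because a Taxon can have several samples.
include <- vapply(mysamples, function(x) {
  groups <- x$func.group[!duplicated(x$Taxon)]
  sum(groups == "weevil") == 3 && sum(groups == "grazer") == 3
}, logical(1))

# Store the selected dataframes in a new list
balanced.samples <- mysamples[include]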
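An alternative sketch, not part of the original answer: rather than filtering all 3003 dataframes, the balanced combinations can be built directly by combining Taxon within each functional group and then pairing the two sets. The data and all names below (`toy`, `n.per.group`, `balanced.toy`) are hypothetical, and the example uses 2 Taxon per group to stay small:

```r
# Hypothetical toy data: 3 grazer Taxon and 3 weevil Taxon, some with repeated samples
toy <- data.frame(
  Taxon      = c("g1", "g1", "g2", "g3", "w1", "w2", "w2", "w3"),
  func.group = rep(c("grazer", "weevil"), each = 4),
  stringsAsFactors = FALSE
)
n.per.group <- 2  # the question uses 3; 2 keeps the toy example small

# Unique Taxon names within each functional group
taxa.by.group <- lapply(split(toy$Taxon, toy$func.group), unique)

# All combinations of n.per.group Taxon within each group
grazer.sets <- combn(taxa.by.group$grazer, n.per.group, simplify = FALSE)
weevil.sets <- combn(taxa.by.group$weevil, n.per.group, simplify = FALSE)

# Pair every grazer set with every weevil set, then pull the matching rows
pairs <- expand.grid(g = seq_along(grazer.sets), w = seq_along(weevil.sets))
balanced.toy <- lapply(seq_len(nrow(pairs)), function(i) {
  chosen <- c(grazer.sets[[pairs$g[i]]], weevil.sets[[pairs$w[i]]])
  toy[toy$Taxon %in% chosen, ]
})

length(balanced.toy)  # choose(3, 2) * choose(3, 2) = 9
```

On the data in the question, which has 8 grazer Taxon and 6 weevil Taxon, the same approach with 3 per group would give choose(8, 3) * choose(6, 3) = 1120 dataframes, which is also a useful check on the size of any filtered list.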


 