
Selecting dataframes from a list based on column values

Using R, I am trying to create a new list of dataframes by selecting dataframes from an existing list only when a particular combination of values exists in one of the columns. Let me first explain the initial steps, which work fine. This is my original data, in a dataframe called df:

                                  Taxon      C     N    func.group  trophic.grp
1  Chrysomelidae.Phylotreta.exclamationis -30.23  5.06     grazer   herbivore
2        Chrysomelidae.Neocrepidodera.sp. -27.29  5.55     grazer   herbivore
3        Chrysomelidae.Neocrepidodera.sp. -27.84  5.54     grazer   herbivore
4        Chrysomelidae.Neocrepidodera.sp. -27.69  4.59     grazer   herbivore
5              Mitidulidae.Meligethes.sp. -26.99  5.30     grazer   herbivore
6           Chrysomelidae.Phylotreta.sp.2 -28.50  2.40     grazer   herbivore
7           Chrysomelidae.Phylotreta.sp.2 -28.36  4.17     grazer   herbivore
8           Chrysomelidae.Phylotreta.sp.2 -29.50  3.15     grazer   herbivore
9           Chrysomelidae.Phylotreta.sp.2 -27.69  3.72     grazer   herbivore
10          Chrysomelidae.Phylotreta.sp.2 -28.22  3.26     grazer   herbivore
11                  Gastropoda.snail.sp.1 -26.21  3.54     grazer   herbivore
12                  Gastropoda.snail.sp.1 -27.59  2.61     grazer   herbivore
13                  Gastropoda.snail.sp.1 -25.10  2.66     grazer   herbivore
14                  Gastropoda.snail.sp.2 -26.49  2.55     grazer   herbivore
15                  Gastropoda.snail.sp.4 -27.46 -0.38     grazer   herbivore
16       Lepidoptera.Arctidae.Ermine.moth -28.51  2.44     grazer   herbivore
17       Curculionidae.Ischapterapion.sp. -29.06  2.19     weevil   herbivore
18       Curculionidae.Ischapterapion.sp. -29.27  1.60     weevil   herbivore
19       Curculionidae.Ischapterapion.sp. -29.94  2.08     weevil   herbivore
20       Curculionidae.Ischapterapion.sp. -29.71  2.16     weevil   herbivore
21            Curculionidae.Protapion.sp. -28.45  1.91     weevil   herbivore
22            Curculionidae.Protapion.sp. -25.99  0.55     weevil   herbivore
23            Curculionidae.Protapion.sp. -28.27  1.52     weevil   herbivore
24            Curculionidae.Protapion.sp. -28.01  1.74     weevil   herbivore
25            Curculionidae.Protapion.sp. -27.06  0.54     weevil   herbivore
26             Curculionidae.Hypera.meles -25.41  3.38     weevil   herbivore
27               Curculionidae.Sitona.sp. -27.05  2.01     weevil   herbivore
28               Curculionidae.Sitona.sp. -26.70  3.07     weevil   herbivore
29               Curculionidae.Sitona.sp. -27.64  2.13     weevil   herbivore
30               Curculionidae.Sitona.sp. -27.50  1.47     weevil   herbivore
31            Curculionidae.Phylobius.sp. -28.27  2.66     weevil   herbivore
32      Curculionidae.Hypera.nigrorostris -25.52  2.43     weevil   herbivore

This dataframe (df) contains 14 different "Taxon" values, some of which have multiple samples, giving 32 samples in all. Each Taxon is also classified by the column "func.group" as either "grazer" or "weevil".

First, I want to generate every possible combination of 6 Taxon from my 14. There are 3003 such combinations (sampling without replacement, order not important). For each Taxon selected, I want to include all samples of that Taxon. I use this code, which works well:

combos<-combn(unique(as.character(df$Taxon)), 6) 
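
As a quick sanity check on the number of combinations, here is a minimal sketch that uses a stand-in vector of 14 labels instead of the real Taxon names (taxa and combos_demo are hypothetical names, not part of the question's code):

```r
# Stand-in for unique(as.character(df$Taxon)): 14 hypothetical labels
taxa <- paste0("taxon", 1:14)

# combn() returns a matrix with one combination per column
combos_demo <- combn(taxa, 6)

nrow(combos_demo)   # 6, one row per selected taxon
ncol(combos_demo)   # 3003, the same as choose(14, 6)
```

Each column of the matrix is one set of 6 taxa, which is why the next step iterates over columns.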

Next, I want to include all the other columns of information, so I use this additional line of code, which pulls in the other columns of data for each Taxon selected. It also works well:

mysamples <- apply(combos, 2, function(vec) df[ df$Taxon %in% vec, ] )

This brings me to my problem. From "mysamples" (which should now be a list of 3003 dataframes), I would like to select all the dataframes that include at least one Taxon that is "grazer" and at least one Taxon that is "weevil", and to store these dataframes in a new list.

In other words, the new list should contain only dataframes that include both weevil and grazer Taxon. (It doesn't matter how many of the 6 Taxon in each dataframe are weevils or grazers, as long as at least one is a grazer and at least one is a weevil.)

Many thanks, M

Try this code:

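# unique() rather than levels(): levels() is NULL if func.group is character
groups <- unique(as.character(df$func.group))
mysamples[unlist(lapply(mysamples,
                        function(x) !any(is.na(match(groups, x$func.group)))))]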

If either grazer or weevil is missing from a dataframe, match returns NA for that group, so any returns TRUE. The result is negated (!), so that dataframe is left out of the final list.
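
To see how the match() test behaves, here is a small sketch on two made-up func.group vectors (groups, both_present, grazers_only, keep_a and keep_b are illustrative names, not taken from the question's data):

```r
groups <- c("grazer", "weevil")

# A sample containing both groups: match() finds a position for each
both_present <- c("grazer", "weevil", "weevil")
match(groups, both_present)                            # 1 2
keep_a <- !any(is.na(match(groups, both_present)))     # TRUE, keep

# A sample with grazers only: "weevil" has no match, giving NA
grazers_only <- c("grazer", "grazer")
match(groups, grazers_only)                            # 1 NA
keep_b <- !any(is.na(match(groups, grazers_only)))     # FALSE, drop
```

The resulting logical value per dataframe is what indexes mysamples in the answer above.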

Try this:

df.list <- lapply(mysamples,
                  function(x){if(any(x$func.group=="grazer")&
                                 any(x$func.group=="weevil"))
                                 return(x)})

both <- Filter(Negate(is.null),df.list)

The anonymous function takes a dataframe as its argument and returns it if there are any weevils and any grazers in its func.group column; otherwise it returns NULL. lapply(...) applies this function to every dataframe in your list, returning df.list, which holds all your desired dataframes plus a lot of NULLs. The second statement keeps only those members of the list that are not NULL, i.e. a list of dataframes with at least one weevil and at least one grazer in func.group.
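
The two steps can also be collapsed into one by passing the test itself to Filter(). The following is a minimal sketch on a hypothetical miniature data set (toy, toy_samples, has_both and both_toy are made-up names; combn(..., simplify = FALSE) with lapply() is used here in place of the question's apply() call only to keep the example short):

```r
# Hypothetical miniature data set: 2 grazer taxa and 2 weevil taxa
toy <- data.frame(
  Taxon      = c("g1", "g1", "g2", "w1", "w2", "w2"),
  func.group = c("grazer", "grazer", "grazer", "weevil", "weevil", "weevil"),
  stringsAsFactors = FALSE
)

# One dataframe per combination of 2 taxa, keeping all rows of each taxon
toy_combos  <- combn(unique(toy$Taxon), 2, simplify = FALSE)
toy_samples <- lapply(toy_combos, function(vec) toy[toy$Taxon %in% vec, ])

# Predicate: TRUE only when both groups occur in the dataframe
has_both <- function(x) all(c("grazer", "weevil") %in% x$func.group)

# Filter() keeps the list elements for which the predicate is TRUE
both_toy <- Filter(has_both, toy_samples)

length(toy_samples)   # 6 combinations of 2 taxa out of 4
length(both_toy)      # 4: the g1+g2 and w1+w2 pairs are dropped
```

Applied to the question's data, the equivalent call would be Filter(has_both, mysamples). Since that data has 8 grazer taxa and 6 weevil taxa, the same counting argument suggests 3003 - choose(8, 6) - choose(6, 6) = 2974 dataframes should remain.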


 