Selecting dataframes from a list based on column values

Question

Using R, I am trying to create a new list of dataframes by selecting dataframes from an existing list only when a particular value combination exists in one of the columns. Let me explain the first steps which work fine. This is my original data in a dataframe called df:

                                  Taxon      C     N    func.group  trophic.grp
1  Chrysomelidae.Phylotreta.exclamationis -30.23  5.06     grazer   herbivore
2        Chrysomelidae.Neocrepidodera.sp. -27.29  5.55     grazer   herbivore
3        Chrysomelidae.Neocrepidodera.sp. -27.84  5.54     grazer   herbivore
4        Chrysomelidae.Neocrepidodera.sp. -27.69  4.59     grazer   herbivore
5              Mitidulidae.Meligethes.sp. -26.99  5.30     grazer   herbivore
6           Chrysomelidae.Phylotreta.sp.2 -28.50  2.40     grazer   herbivore
7           Chrysomelidae.Phylotreta.sp.2 -28.36  4.17     grazer   herbivore
8           Chrysomelidae.Phylotreta.sp.2 -29.50  3.15     grazer   herbivore
9           Chrysomelidae.Phylotreta.sp.2 -27.69  3.72     grazer   herbivore
10          Chrysomelidae.Phylotreta.sp.2 -28.22  3.26     grazer   herbivore
11                  Gastropoda.snail.sp.1 -26.21  3.54     grazer   herbivore
12                  Gastropoda.snail.sp.1 -27.59  2.61     grazer   herbivore
13                  Gastropoda.snail.sp.1 -25.10  2.66     grazer   herbivore
14                  Gastropoda.snail.sp.2 -26.49  2.55     grazer   herbivore
15                  Gastropoda.snail.sp.4 -27.46 -0.38     grazer   herbivore
16       Lepidoptera.Arctidae.Ermine.moth -28.51  2.44     grazer   herbivore
17       Curculionidae.Ischapterapion.sp. -29.06  2.19     weevil   herbivore
18       Curculionidae.Ischapterapion.sp. -29.27  1.60     weevil   herbivore
19       Curculionidae.Ischapterapion.sp. -29.94  2.08     weevil   herbivore
20       Curculionidae.Ischapterapion.sp. -29.71  2.16     weevil   herbivore
21            Curculionidae.Protapion.sp. -28.45  1.91     weevil   herbivore
22            Curculionidae.Protapion.sp. -25.99  0.55     weevil   herbivore
23            Curculionidae.Protapion.sp. -28.27  1.52     weevil   herbivore
24            Curculionidae.Protapion.sp. -28.01  1.74     weevil   herbivore
25            Curculionidae.Protapion.sp. -27.06  0.54     weevil   herbivore
26             Curculionidae.Hypera.meles -25.41  3.38     weevil   herbivore
27               Curculionidae.Sitona.sp. -27.05  2.01     weevil   herbivore
28               Curculionidae.Sitona.sp. -26.70  3.07     weevil   herbivore
29               Curculionidae.Sitona.sp. -27.64  2.13     weevil   herbivore
30               Curculionidae.Sitona.sp. -27.50  1.47     weevil   herbivore
31            Curculionidae.Phylobius.sp. -28.27  2.66     weevil   herbivore
32      Curculionidae.Hypera.nigrorostris -25.52  2.43     weevil   herbivore

This dataframe (df) contains 14 different taxa (column "Taxon"), some of which have multiple samples, giving 32 samples in all. Each taxon is also classified in the column "func.group" as either "grazer" or "weevil".

First, I want to generate all possible combinations of 6 taxa from my 14. There are 3003 such combinations (taxa are chosen without replacement and order is not important). For each taxon selected, I want to include all samples of that taxon. I use this code, which works well:

combos<-combn(unique(as.character(df$Taxon)), 6)
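As a quick sanity check on the count, the number of columns returned by combn should equal choose(14, 6). The sketch below uses made-up taxon names (taxon1 to taxon14) as a stand-in for the real data:

```r
# Hypothetical stand-in for the 14 unique taxon names
taxa <- paste0("taxon", 1:14)

# Each column of combos is one combination of 6 taxa
combos <- combn(taxa, 6)

dim(combos)      # 6 rows, one column per combination
choose(14, 6)    # 3003 ways to pick 6 from 14
```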

Next, I want to include all the other columns of information, so I use this additional line of code, which pulls every row of df (with all its columns) for each taxon selected. It also works well:

mysamples <- apply(combos, 2, function(vec) df[ df$Taxon %in% vec, ] )
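For what it's worth, combn can also apply a function to each combination directly through its FUN argument, which would fold the two steps into one. A minimal sketch on a made-up four-taxon data frame (toy and samples are illustrative names, not the real data):

```r
# Hypothetical toy data: four taxa, one of them with two samples
toy <- data.frame(
  Taxon      = c("a", "a", "b", "c", "d"),
  func.group = c("grazer", "grazer", "grazer", "weevil", "weevil"),
  stringsAsFactors = FALSE
)

# For every pair of taxa, keep all rows belonging to that pair;
# simplify = FALSE makes combn return a plain list of data frames
samples <- combn(unique(toy$Taxon), 2,
                 FUN = function(vec) toy[toy$Taxon %in% vec, ],
                 simplify = FALSE)

length(samples)   # one data frame per pair of taxa
samples[[1]]      # all rows for taxa "a" and "b"
```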

This is where I reach my problem. From "mysamples" (which should now be a list of 3003 dataframes), I would like to select all the dataframes that include at least one taxon that is a "grazer" and at least one taxon that is a "weevil", and to store these dataframes in a new list.

In other words, I would like this new list to contain only dataframes that include both weevil and grazer taxa. (It doesn't matter how many of the 6 taxa in each dataframe are weevils or grazers, as long as at least one is a grazer and at least one is a weevil.)

Many thanks, M

Answer 1

Try this code

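# Groups every selected data frame must contain. unique() works whether
# func.group is a factor or a character column (levels() is NULL for the latter).
groups <- unique(as.character(df$func.group))
mysamples[unlist(lapply(mysamples,
                        function(x) !any(is.na(match(groups,
                                                     x$func.group)))))]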

If either grazer or weevil is missing from a dataframe, match returns NA for that group, so any(is.na(...)) returns TRUE. The negation (!) turns this into FALSE, and that dataframe is left out of the final list.
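To see how match behaves here, a small illustration on made-up func.group vectors (only_grazers and mixed are hypothetical, not taken from the data):

```r
groups <- c("grazer", "weevil")

# Hypothetical func.group columns of two data frames
only_grazers <- c("grazer", "grazer", "grazer")
mixed        <- c("weevil", "grazer", "grazer")

match(groups, only_grazers)                 # 1 NA: "weevil" is not found
match(groups, mixed)                        # 2 1: both groups are found

!any(is.na(match(groups, only_grazers)))    # FALSE, so this one is dropped
!any(is.na(match(groups, mixed)))           # TRUE, so this one is kept
```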

Answer 2

Try this.

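df.list <- lapply(mysamples,
                  function(x) {
                    # return x only if it has at least one grazer and one weevil;
                    # otherwise the function returns NULL
                    if (any(x$func.group == "grazer") &&
                        any(x$func.group == "weevil"))
                      return(x)
                  })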

both <- Filter(Negate(is.null),df.list)

The anonymous function takes a data frame x as its argument and returns it if x$func.group contains at least one weevil and at least one grazer; otherwise it returns NULL. lapply(...) applies this function to every data frame in your list, returning df.list, which holds all your desired data frames plus a lot of NULLs. The second statement keeps only those members of the list that are not NULL, i.e. a list of data frames that each have at least one weevil and at least one grazer in func.group.
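The same selection can probably be written in one step by passing the test directly to Filter, so no NULL placeholders are created. A sketch, assuming func.group holds the strings "grazer" and "weevil"; toy_samples and has_both are made-up names, with toy_samples standing in for mysamples:

```r
# Hypothetical stand-in for mysamples
toy_samples <- list(
  data.frame(Taxon = c("a", "b"), func.group = c("grazer", "grazer")),
  data.frame(Taxon = c("a", "c"), func.group = c("grazer", "weevil")),
  data.frame(Taxon = c("c", "d"), func.group = c("weevil", "weevil"))
)

# TRUE only when both groups occur in the data frame
has_both <- function(x) all(c("grazer", "weevil") %in% x$func.group)

both <- Filter(has_both, toy_samples)
length(both)   # only the second data frame has both groups
```

With the real data, the call would be Filter(has_both, mysamples).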
