Using R, I am trying to create a new list of dataframes by selecting dataframes from an existing list only when a particular value combination exists in one of the columns. Let me explain the first steps which work fine. This is my original data in a dataframe called df:
Taxon C N func.group trophic.grp
1 Chrysomelidae.Phylotreta.exclamationis -30.23 5.06 grazer herbivore
2 Chrysomelidae.Neocrepidodera.sp. -27.29 5.55 grazer herbivore
3 Chrysomelidae.Neocrepidodera.sp. -27.84 5.54 grazer herbivore
4 Chrysomelidae.Neocrepidodera.sp. -27.69 4.59 grazer herbivore
5 Mitidulidae.Meligethes.sp. -26.99 5.30 grazer herbivore
6 Chrysomelidae.Phylotreta.sp.2 -28.50 2.40 grazer herbivore
7 Chrysomelidae.Phylotreta.sp.2 -28.36 4.17 grazer herbivore
8 Chrysomelidae.Phylotreta.sp.2 -29.50 3.15 grazer herbivore
9 Chrysomelidae.Phylotreta.sp.2 -27.69 3.72 grazer herbivore
10 Chrysomelidae.Phylotreta.sp.2 -28.22 3.26 grazer herbivore
11 Gastropoda.snail.sp.1 -26.21 3.54 grazer herbivore
12 Gastropoda.snail.sp.1 -27.59 2.61 grazer herbivore
13 Gastropoda.snail.sp.1 -25.10 2.66 grazer herbivore
14 Gastropoda.snail.sp.2 -26.49 2.55 grazer herbivore
15 Gastropoda.snail.sp.4 -27.46 -0.38 grazer herbivore
16 Lepidoptera.Arctidae.Ermine.moth -28.51 2.44 grazer herbivore
17 Curculionidae.Ischapterapion.sp. -29.06 2.19 weevil herbivore
18 Curculionidae.Ischapterapion.sp. -29.27 1.60 weevil herbivore
19 Curculionidae.Ischapterapion.sp. -29.94 2.08 weevil herbivore
20 Curculionidae.Ischapterapion.sp. -29.71 2.16 weevil herbivore
21 Curculionidae.Protapion.sp. -28.45 1.91 weevil herbivore
22 Curculionidae.Protapion.sp. -25.99 0.55 weevil herbivore
23 Curculionidae.Protapion.sp. -28.27 1.52 weevil herbivore
24 Curculionidae.Protapion.sp. -28.01 1.74 weevil herbivore
25 Curculionidae.Protapion.sp. -27.06 0.54 weevil herbivore
26 Curculionidae.Hypera.meles -25.41 3.38 weevil herbivore
27 Curculionidae.Sitona.sp. -27.05 2.01 weevil herbivore
28 Curculionidae.Sitona.sp. -26.70 3.07 weevil herbivore
29 Curculionidae.Sitona.sp. -27.64 2.13 weevil herbivore
30 Curculionidae.Sitona.sp. -27.50 1.47 weevil herbivore
31 Curculionidae.Phylobius.sp. -28.27 2.66 weevil herbivore
32 Curculionidae.Hypera.nigrorostris -25.52 2.43 weevil herbivore
This dataframe (df) contains 14 different "Taxon", some of which have multiple samples, so there are 32 samples in all. Each Taxon is also classified by the column "func.group" as either "grazer" or "weevil".
First, I want to build every possible combination of 6 Taxon from my 14. There are 3003 such combinations (sampling without replacement, order not important). For each Taxon selected, I want to include all samples of that Taxon. I use this code, which works well:
combos<-combn(unique(as.character(df$Taxon)), 6)
Next, I want to include all the other columns of information as well, so I use this additional line, which pulls every row (with all its columns) for each Taxon in a combination. This also works well:
mysamples <- apply(combos, 2, function(vec) df[ df$Taxon %in% vec, ] )
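A quick sanity check of these two steps might look like the sketch below. It rebuilds only the Taxon and func.group columns from the table above (C and N are left out for brevity, and the helper vectors taxa and n.samples are names made up for this illustration), then confirms that there is one dataframe per combination.

```r
# Cut-down copy of df: Taxon and func.group only (C and N omitted here).
taxa <- c("Chrysomelidae.Phylotreta.exclamationis", "Chrysomelidae.Neocrepidodera.sp.",
          "Mitidulidae.Meligethes.sp.", "Chrysomelidae.Phylotreta.sp.2",
          "Gastropoda.snail.sp.1", "Gastropoda.snail.sp.2", "Gastropoda.snail.sp.4",
          "Lepidoptera.Arctidae.Ermine.moth",
          "Curculionidae.Ischapterapion.sp.", "Curculionidae.Protapion.sp.",
          "Curculionidae.Hypera.meles", "Curculionidae.Sitona.sp.",
          "Curculionidae.Phylobius.sp.", "Curculionidae.Hypera.nigrorostris")
n.samples <- c(1, 3, 1, 5, 3, 1, 1, 1, 4, 5, 1, 4, 1, 1)  # samples per Taxon
df <- data.frame(Taxon = rep(taxa, n.samples),
                 func.group = rep(c("grazer", "weevil"), c(16, 16)),
                 stringsAsFactors = FALSE)

# Same two lines as in the question.
combos <- combn(unique(as.character(df$Taxon)), 6)
mysamples <- apply(combos, 2, function(vec) df[df$Taxon %in% vec, ])

ncol(combos) == choose(14, 6)   # one column per combination of 6 from 14
length(mysamples)               # one dataframe per combination
# Every dataframe should hold exactly 6 distinct Taxon.
all(vapply(mysamples, function(x) length(unique(x$Taxon)) == 6, logical(1)))
```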
So then we reach my problem. From "mysamples" (which should now be a list of 3003 dataframes), I would like to select all the dataframes that include at least one Taxon that is "grazer" and one Taxon that is "weevil", and to store these dataframes in a new list.
Therefore, I would like this new list to contain only dataframes that include Taxon of both weevils and grazers. (It doesn't matter how many of the 6 taxa in each dataframe are weevils or grazers, as long as at least one is a grazer and at least one is a weevil.)
Many thanks, M
Try this code
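# unique(as.character(...)) works whether func.group is a factor or a
# character column; levels() returns NULL for a character column
mysamples[unlist(lapply(mysamples,
    function(x) !any(is.na(match(unique(as.character(df$func.group)),
                                 x$func.group)))))]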
If either grazer or weevil is missing from a dataframe, match() returns NA for it, so any() returns TRUE. The ! inverts that to FALSE, and the dataframe is left out of the result.
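To see what match() returns in each case, here is a minimal sketch on a made-up list of three tiny dataframes (toy and its contents are hypothetical stand-ins for mysamples). It also shows an equivalent test written with %in% and vapply(), which does not depend on func.group being a factor.

```r
# Hypothetical toy list standing in for mysamples.
toy <- list(
  mixed   = data.frame(Taxon = c("a", "b"), func.group = c("grazer", "weevil")),
  grazers = data.frame(Taxon = c("a", "c"), func.group = c("grazer", "grazer")),
  weevils = data.frame(Taxon = c("b", "d"), func.group = c("weevil", "weevil"))
)
groups <- c("grazer", "weevil")

match(groups, toy$mixed$func.group)     # 1 2  (both groups found)
match(groups, toy$grazers$func.group)   # 1 NA (no weevil, so NA)

# Equivalent filter: keep a dataframe only if every group occurs in it.
keep <- vapply(toy, function(x) all(groups %in% x$func.group), logical(1))
both <- toy[keep]
names(both)   # "mixed"
```

vapply() with logical(1) guarantees one TRUE/FALSE per dataframe, so the result can be used directly as an index without unlist().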
Try this.
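df.list <- lapply(mysamples, function(x) {
    # return x only if it has at least one grazer and at least one weevil;
    # an if without else yields NULL when the condition is FALSE
    if (any(x$func.group == "grazer") && any(x$func.group == "weevil")) x
})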
both <- Filter(Negate(is.null),df.list)
The anonymous function takes a df as its argument and returns it if there are any weevils and any grazers in df$func.group; otherwise it returns NULL. lapply(...) applies this function to every df in your list, returning df.list, which holds all your desired data frames plus a lot of NULLs. The second statement keeps only those members of the list that are not NULL, i.e. a list of dataframes that each have at least one weevil and at least one grazer in df$func.group.
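As a possible shortcut, Filter() can take the test itself as its predicate, which skips the intermediate list of NULLs. The sketch below uses a made-up list (toy) in place of mysamples, and has.both is a helper name invented for this example.

```r
# Hypothetical toy list standing in for mysamples.
toy <- list(
  mixed   = data.frame(Taxon = c("a", "b"), func.group = c("grazer", "weevil")),
  grazers = data.frame(Taxon = c("a", "c"), func.group = c("grazer", "grazer")),
  weevils = data.frame(Taxon = c("b", "d"), func.group = c("weevil", "weevil"))
)

# TRUE only when the dataframe has at least one grazer and one weevil.
has.both <- function(x) {
  any(x$func.group == "grazer") && any(x$func.group == "weevil")
}

both <- Filter(has.both, toy)
names(both)   # "mixed"
```

With the real data this would be Filter(has.both, mysamples).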