How to generate list/table of all the observations with a given value in R
I have a large dataset (asv_ar2) indicating the number of times a given species has been recorded at a given location. It looks like the following:
Specie | loc1 | loc2 | loc3 | loc4 |
---|---|---|---|---|
sp1 | 0 | 1 | 0 | 4 |
sp2 | 7 | 3 | 0 | 2 |
sp3 | 3 | 1 | 0 | 0 |
I would like to get, for each species, a list/table of the locations where it has been found (where the value of that variable is not 0), or the other way around: the species found at each location.
I can select rows with values > 0 with the filter function of dplyr, but only location by location.
a1<-filter(asv_ar2,asv_ar2[,2]>0)[,c(1,2,8)]
I tried making a loop that joins them all together, but it only shows the first location:
for(i in 2:1156){ locs<-filter(asv_ar2,asv_ar2[,i]>0)[c(1,i)]}
I don't know how to join all the iterations, or whether there is a better way to do all this.
Any suggestions?
Thank you
I hope this is what you have in mind:
library(dplyr)
library(tidyr)
library(purrr)
df %>%
mutate(data = pmap(df %>% select(!Specie), ~ names(c(...)[c(...) != 0]))) %>%
unnest_wider(data)
# A tibble: 3 x 8
  Specie  loc1  loc2  loc3  loc4 ...1  ...2  ...3
  <chr>  <int> <int> <int> <int> <chr> <chr> <chr>
1 sp1        0     1     0     4 loc2  loc4  NA
2 sp2        7     3     0     2 loc1  loc2  loc4
3 sp3        3     1     0     0 loc1  loc2  NA
You can add a new column that lists, for each row, the names of the columns whose value is greater than 0.
asv_ar2$locs <- apply(asv_ar2[-1] > 0, 1, function(x)
toString(names(asv_ar2[-1])[x]))
asv_ar2
#  Specie loc1 loc2 loc3 loc4             locs
#1    sp1    0    1    0    4       loc2, loc4
#2    sp2    7    3    0    2 loc1, loc2, loc4
#3    sp3    3    1    0    0       loc1, loc2
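For the reverse direction mentioned in the question (the species found at each location), the same comparison can be applied column by column instead of row by row. The following is only a minimal base R sketch: the toy data frame stands in for the real asv_ar2, and the name species_by_loc is made up for illustration.

```r
# Toy data mirroring the table in the question (assumption: the real asv_ar2
# has the same layout, with the species names in the first column)
asv_ar2 <- data.frame(
  Specie = c("sp1", "sp2", "sp3"),
  loc1 = c(0L, 7L, 3L),
  loc2 = c(1L, 3L, 1L),
  loc3 = c(0L, 0L, 0L),
  loc4 = c(4L, 2L, 0L),
  stringsAsFactors = FALSE
)

# For each location column, keep the species whose count is above 0.
# The result is a named list with one character vector per location.
species_by_loc <- lapply(asv_ar2[-1], function(counts) asv_ar2$Specie[counts > 0])
species_by_loc
```

A location where nothing was recorded (loc3 here) comes back as an empty character vector rather than being dropped, which may or may not be what you want.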
In dplyr, you can use rowwise:
library(dplyr)
asv_ar2 %>%
rowwise() %>%
mutate(locs = toString(names(.[-1])[c_across(starts_with('loc')) > 0]))
We could do this in tidyverse in a more vectorized way, i.e. without using rowwise. Here, we loop across the 'loc' columns, return the column name (cur_column) if the value is not 0 (the default case_when return is NA), specify .names to create new columns by adding a suffix or prefix (_new), then make use of unite to collapse those '_new' columns into a single one.
library(dplyr)
library(tidyr)
df1 %>%
mutate(across(starts_with('loc'), ~ case_when(. != 0 ~ cur_column()),
.names = '{.col}_new')) %>%
unite(locs, ends_with('new'), sep=", ", na.rm = TRUE)
#  Specie loc1 loc2 loc3 loc4             locs
#1    sp1    0    1    0    4       loc2, loc4
#2    sp2    7    3    0    2 loc1, loc2, loc4
#3    sp3    3    1    0    0       loc1, loc2
df1 <- structure(list(Specie = c("sp1", "sp2", "sp3"), loc1 = c(0L,
7L, 3L), loc2 = c(1L, 3L, 1L), loc3 = c(0L, 0L, 0L), loc4 = c(4L,
2L, 0L)), class = "data.frame", row.names = c(NA, -3L))
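If a table is preferred over a collapsed string, one possible alternative (not part of the answer above) is to reshape to long format and keep only the non-zero counts. This is a sketch that assumes dplyr and tidyr are installed; the names long, location and count are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Same toy data as above
df1 <- data.frame(
  Specie = c("sp1", "sp2", "sp3"),
  loc1 = c(0L, 7L, 3L),
  loc2 = c(1L, 3L, 1L),
  loc3 = c(0L, 0L, 0L),
  loc4 = c(4L, 2L, 0L),
  stringsAsFactors = FALSE
)

# One row per species/location pair, keeping only pairs with a record
long <- df1 %>%
  pivot_longer(starts_with("loc"), names_to = "location", values_to = "count") %>%
  filter(count > 0)

long
```

Because every species/location pair is its own row, the same table can be sorted or grouped by Specie or by location, so it covers both directions asked about in the question.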
You can do:
# apply() coerces each row to character, so convert the counts back to numeric before comparing
apply(df, 1, function(x) paste(x[1], toString(names(x)[-1][as.numeric(x[-1]) > 0])))
[1] "sp1 loc2, loc4" "sp2 loc1, loc2, loc4" "sp3 loc1, loc2"
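As for the loop in the question: each pass overwrites locs, so only one location's result survives. A common pattern is to store each iteration in a list and bind the pieces afterwards. The following is a minimal base R sketch under the assumption that the first column holds the species and all remaining columns are locations; pieces, location and count are illustrative names.

```r
# Toy data mirroring the table in the question
asv_ar2 <- data.frame(
  Specie = c("sp1", "sp2", "sp3"),
  loc1 = c(0L, 7L, 3L),
  loc2 = c(1L, 3L, 1L),
  loc3 = c(0L, 0L, 0L),
  loc4 = c(4L, 2L, 0L),
  stringsAsFactors = FALSE
)

# One list slot per location column, so no iteration is overwritten
pieces <- vector("list", ncol(asv_ar2) - 1)

for (i in 2:ncol(asv_ar2)) {
  keep <- asv_ar2[[i]] > 0            # species recorded at location i
  pieces[[i - 1]] <- data.frame(
    Specie = asv_ar2$Specie[keep],
    location = rep(names(asv_ar2)[i], sum(keep)),
    count = asv_ar2[[i]][keep],
    stringsAsFactors = FALSE
  )
}

# Join all iterations into a single table
locs <- do.call(rbind, pieces)
locs
```

With 1155 location columns this should still be workable, although the vectorized answers above avoid the explicit loop entirely.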