Loop over groupby columns in R and apply a function
Hello everyone, I need help looping over a dataframe by groups defined by several columns.
Here is an example dataframe:
  Group       Species Values
1    G1 Cattus_cattus   Val1
2    G1 Cattus_cattus   Val2
3    G1 Cattus_cattus   Val3
4    G2   Canis_lupus   Val4
5    G2   Canis_lupus   Val5
6    G3  Griseus_lupa   Val6
7    G4  Griseus_lupa   Val7
I would like to:
1 - loop over the groups defined by df$Group and df$Species
2 - take the df$Values of each group and store them as a vector
3 - pass that vector to a function called afunction
4 - open a treefile with anotherfunction, where the file name is the df$Group name
5 - get the output value of afunction and add it to a new_column
So here is an example of what the code should do. The first group is G1, Cattus_cattus:
  Group       Species Values
1    G1 Cattus_cattus   Val1
2    G1 Cattus_cattus   Val2
3    G1 Cattus_cattus   Val3
Then I open the treefile with treefile <- anotherfunction(G1)
Then I generate the output value, such as output_value <- afunction(treefile, c("Val1","Val2","Val3"))
If output_value = 30, I add 30 to the df:
  Group       Species Values new_column
1    G1 Cattus_cattus   Val1         30
2    G1 Cattus_cattus   Val2         30
3    G1 Cattus_cattus   Val3         30
If there is only one row within the group, then I do nothing and add an NA.
Note that these functions do not actually exist, so you cannot reproduce the example.
At the end we should get something like this (the new_column values here are arbitrary):
  Group       Species Values new_column
1    G1 Cattus_cattus   Val1         30
2    G1 Cattus_cattus   Val2         30
3    G1 Cattus_cattus   Val3         30
4    G2   Canis_lupus   Val4         21
5    G2   Canis_lupus   Val5         21
6    G3  Griseus_lupa   Val6         NA
7    G4  Griseus_lupa   Val7         NA
Does someone have an idea, please? So far I know how to loop over a dataframe using a for loop, but here I do not know how to deal with groups composed of 2 columns.
Data:
df <- structure(list(Group = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L
), .Label = c("G1", "G2", "G3", "G4"), class = "factor"), Species = structure(c(2L,
2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Canis_lupus", "Cattus_cattus",
"Griseus_lupa"), class = "factor"), Values = structure(1:7, .Label = c("Val1",
"Val2", "Val3", "Val4", "Val5", "Val6", "Val7"), class = "factor")), class = "data.frame", row.names = c(NA,
-7L))
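You can try something like this:
library(dplyr)
library(purrr)

df %>%
  group_by(Group) %>%
  # one row per group: the tree for that group and its Values, both stored as list elements
  summarise(treefile = list(anotherfunction(first(Group))),
            Values = list(Values)) %>%
  # call afunction(treefile, Values) once per group
  mutate(new_column = map2_dbl(treefile, Values, afunction))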
This would give you a summarised dataframe with one row per group. To get the original number of rows back, you can left_join it with df by Group.
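The join step can be sketched as follows. Since afunction and anotherfunction do not exist, the versions below are hypothetical stand-ins (anotherfunction returns a label, afunction returns the number of values it receives). Grouping by both Group and Species and returning NA for single-row groups follow the question rather than the answer above, so treat this as one possible reading, not the definitive solution.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  Group   = c("G1", "G1", "G1", "G2", "G2", "G3", "G4"),
  Species = c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
              "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa"),
  Values  = paste0("Val", 1:7),
  stringsAsFactors = FALSE
)

# Hypothetical stand-ins for the asker's functions (assumptions, not real code)
anotherfunction <- function(group) paste0(group, ".tree")
afunction <- function(treefile, values) as.numeric(length(values))

per_group <- df %>%
  group_by(Group, Species) %>%
  summarise(treefile = list(anotherfunction(first(Group))),
            Values   = list(Values),
            .groups  = "drop") %>%
  # call afunction only for groups with more than one row, otherwise NA
  mutate(new_column = map2_dbl(treefile, Values,
                               ~ if (length(.y) > 1) afunction(.x, .y) else NA_real_)) %>%
  select(Group, Species, new_column)

# attach the per-group value to every original row
result <- left_join(df, per_group, by = c("Group", "Species"))
```

Joining on both columns avoids duplicated Species columns in the result; if you group by Group alone, join by Group alone.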
Here is how you do it:
library(dplyr)

anotherfunction = function(x){
  # do something with your treefile
  ifelse("Val2" %in% x, 30, ifelse("Val4" %in% x, 21, NA))
}

df %>%
  group_by(Group) %>%
  mutate(new_column = anotherfunction(Values))
You did not give a lot of information about anotherfunction(), so I used an ugly nested ifelse() to mimic the behavior.
The key is that mutate() will use the Values inside each group.
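A minimal sketch of the same idea applied to both grouping columns from the question, with single-row groups set to NA. fake_afunction is a hypothetical stand-in that just counts the values it receives; the real afunction would take its place.

```r
library(dplyr)

df <- data.frame(
  Group   = c("G1", "G1", "G1", "G2", "G2", "G3", "G4"),
  Species = c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
              "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa"),
  Values  = paste0("Val", 1:7),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in: returns one number per group (here, the number of values)
fake_afunction <- function(values) as.numeric(length(values))

result <- df %>%
  group_by(Group, Species) %>%
  # n() is the row count of the current group; the scalar result is recycled to every row
  mutate(new_column = if (n() > 1) fake_afunction(Values) else NA_real_) %>%
  ungroup()
```

Because the expression inside mutate() is evaluated once per group, the plain if/else works here; a vectorised ifelse() is not needed.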
To explore this, you can try running this code:
anotherfunction = function(x){browser()}
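If an interactive browser() session is not convenient, a non-interactive way to see which Values each group hands to the function is to collapse them into one string per group. This is only an illustrative sketch; the column name values_seen is made up for the example.

```r
library(dplyr)

df <- data.frame(
  Group   = c("G1", "G1", "G1", "G2", "G2", "G3", "G4"),
  Species = c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
              "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa"),
  Values  = paste0("Val", 1:7),
  stringsAsFactors = FALSE
)

# One row per group, showing the vector a grouped function call would receive
per_group <- df %>%
  group_by(Group) %>%
  summarise(values_seen = paste(Values, collapse = ","), .groups = "drop")

print(per_group)
```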