Re-arranging data for GLM analysis in R using a for-loop
I think my question is fairly simple to answer, but I'm learning R, so I'd like to know the best way to do it.
I have a dataset that looks like this:
> print(agg_df41367)
# A tibble: 72 x 3
# Groups: hour [24]
hour predicted y
1 0 Feeding 0.121
2 0 Foraging 0.632
3 0 Standing 0.300
4 1 Feeding 0.141
5 1 Foraging 0.727
6 1 Standing 0.183
7 2 Feeding 0.0932
8 2 Foraging 0.817
9 2 Standing 0.133
10 3 Feeding 0.214
I would like to fit a GLM, so I'd like my data to look like this:
head(agg_df41361_GLM)
hour Foraging Standing Feeding
0 0.632 0.300 0.121
1 0.727 0.183 0.141
2 0.817 0.133 0.0932
3 etc. etc. 0.214
Any ideas on the most compact way to do this? Ideally, I would like to use a for-loop to apply this transformation to multiple datasets. All my datasets follow the name format agg_df4136*. Any input is appreciated!
Here's a way to reshape the dataset you posted.
library(tidyr)
# example data
dt = read.table(text = "
hour predicted y
1 0 Feeding 0.121
2 0 Foraging 0.632
3 0 Standing 0.300
4 1 Feeding 0.141
5 1 Foraging 0.727
6 1 Standing 0.183
7 2 Feeding 0.0932
8 2 Foraging 0.817
9 2 Standing 0.133
", header = TRUE)
spread(dt, predicted, y)
# hour Feeding Foraging Standing
# 1 0 0.1210 0.632 0.300
# 2 1 0.1410 0.727 0.183
# 3 2 0.0932 0.817 0.133
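In tidyr 1.0.0 and later, spread() is marked as superseded in favour of pivot_wider(). A minimal sketch of the same reshaping with pivot_wider(), assuming the same three columns (hour, predicted, y); the object name wide is only illustrative:

```r
library(tidyr)

# Same example data as above, built directly as a data frame
dt <- data.frame(
  hour      = rep(0:2, each = 3),
  predicted = rep(c("Feeding", "Foraging", "Standing"), times = 3),
  y         = c(0.121, 0.632, 0.300, 0.141, 0.727, 0.183, 0.0932, 0.817, 0.133)
)

# One row per hour, one column per level of `predicted`, filled with `y`
wide <- pivot_wider(dt, names_from = predicted, values_from = y)
wide
```

If your real data is still grouped by hour (as the "# Groups: hour [24]" line in your printout suggests), calling dplyr::ungroup() first may be the simpler option, although that is a suggestion rather than something I have checked against your data.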
If you have multiple datasets, it's better to create a list of them and apply the reshaping process to each one:
library(tidyverse)
# example of list of dataframes
l = list(dt, dt, dt)
map(l, ~spread(., predicted, y))
# [[1]]
# hour Feeding Foraging Standing
# 1 0 0.1210 0.632 0.300
# 2 1 0.1410 0.727 0.183
# 3 2 0.0932 0.817 0.133
#
# [[2]]
# hour Feeding Foraging Standing
# 1 0 0.1210 0.632 0.300
# 2 1 0.1410 0.727 0.183
# 3 2 0.0932 0.817 0.133
#
# [[3]]
# hour Feeding Foraging Standing
# 1 0 0.1210 0.632 0.300
# 2 1 0.1410 0.727 0.183
# 3 2 0.0932 0.817 0.133
Note that here I'm using the same dataset (dt) for all 3 list elements, but it will work with different datasets, as long as they have the same column names.
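Since you asked for a for-loop specifically, here is a rough equivalent of the map() call written as a loop. The list element names (agg_df41361, agg_df41367) are only placeholders that follow your naming pattern, and both elements hold the same toy data:

```r
library(tidyr)

# Toy data with the same columns as the example above
dt <- data.frame(
  hour      = rep(0:2, each = 3),
  predicted = rep(c("Feeding", "Foraging", "Standing"), times = 3),
  y         = c(0.121, 0.632, 0.300, 0.141, 0.727, 0.183, 0.0932, 0.817, 0.133)
)

# Named list of datasets (hypothetical names, same toy data twice)
datasets <- list(agg_df41361 = dt, agg_df41367 = dt)

# Pre-allocate the output list and keep the same names
reshaped <- vector("list", length(datasets))
names(reshaped) <- names(datasets)

# Reshape each dataset in turn
for (nm in names(datasets)) {
  reshaped[[nm]] <- spread(datasets[[nm]], predicted, y)
}

reshaped$agg_df41361
```

The loop and map() should give the same result; storing the output in a named list means you can still tell which reshaped table came from which dataset.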
If you want to create a list of all your datasets whose names start with the pattern you provided, you can do this:
# get objects that start with this name pattern
input_names = ls()[grepl("^agg_df4136", ls())]
# get the data that match those names
list_datasets = map(input_names, get)
So, list_datasets is a list of all data frames in your environment whose names start with "agg_df4136".