
Re-arranging data for GLM analysis in R using a for-loop

I think my question is fairly simple to answer, but I'm learning R, so I'd like to know the best way to do it.

I have a dataset that looks like this:

> print(agg_df41367)
# A tibble: 72 x 3
# Groups:   hour [24]
    hour predicted      y
 1     0 Feeding   0.121 
 2     0 Foraging  0.632 
 3     0 Standing  0.300 
 4     1 Feeding   0.141 
 5     1 Foraging  0.727 
 6     1 Standing  0.183 
 7     2 Feeding   0.0932
 8     2 Foraging  0.817 
 9     2 Standing  0.133 
10     3 Feeding   0.214 

I'd like to run a GLM, so I want my data to look like this:

head(agg_df41361_GLM)
hour Foraging Standing Feeding 
0     0.632   0.300    0.121
1     0.727   0.183    0.141
2     0.817   0.133    0.0932
3     etc.    etc.      0.214

Any ideas on the most compact way to do this? Ideally, I'd like to use a for loop to apply this transformation to multiple datasets. All my datasets follow the name format agg_df4136*. Any input is appreciated!

Here's a way to reshape the dataset you posted.

library(tidyr)

# example data
dt = read.table(text = "
hour predicted      y
1     0 Feeding   0.121 
2     0 Foraging  0.632 
3     0 Standing  0.300 
4     1 Feeding   0.141 
5     1 Foraging  0.727 
6     1 Standing  0.183 
7     2 Feeding   0.0932
8     2 Foraging  0.817 
9     2 Standing  0.133 
", header = TRUE)

# one column per value of predicted, filled with the matching y
spread(dt, predicted, y)

#   hour Feeding Foraging Standing
# 1    0  0.1210    0.632    0.300
# 2    1  0.1410    0.727    0.183
# 3    2  0.0932    0.817    0.133
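Since tidyr 1.0.0, spread() has been superseded by pivot_wider(), which does the same reshaping with explicitly named arguments. The following is a minimal sketch that assumes tidyr 1.0.0 or later is installed. The example data is rebuilt here without the row-number column so the block runs on its own.

```r
library(tidyr)

# same nine example rows as above, without the row-number column
dt <- read.table(text = "
hour predicted y
0 Feeding  0.121
0 Foraging 0.632
0 Standing 0.300
1 Feeding  0.141
1 Foraging 0.727
1 Standing 0.183
2 Feeding  0.0932
2 Foraging 0.817
2 Standing 0.133
", header = TRUE)

# one row per hour, one column per value of predicted, cells filled from y
wide <- pivot_wider(dt, names_from = predicted, values_from = y)
print(wide)
```

By default, pivot_wider() orders the new columns by first appearance in predicted. To get the Foraging, Standing, Feeding order shown in the question, you can index the result afterwards, for example wide[, c("hour", "Foraging", "Standing", "Feeding")].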

If you have multiple datasets, it's better to put them in a list and apply the reshaping to each one:

library(tidyverse)

# example of list of dataframes
l = list(dt, dt, dt)

# apply the same spread() call to every data frame in the list
map(l, ~spread(., predicted, y))

# [[1]]
# hour Feeding Foraging Standing
# 1    0  0.1210    0.632    0.300
# 2    1  0.1410    0.727    0.183
# 3    2  0.0932    0.817    0.133
# 
# [[2]]
# hour Feeding Foraging Standing
# 1    0  0.1210    0.632    0.300
# 2    1  0.1410    0.727    0.183
# 3    2  0.0932    0.817    0.133
# 
# [[3]]
# hour Feeding Foraging Standing
# 1    0  0.1210    0.632    0.300
# 2    1  0.1410    0.727    0.183
# 3    2  0.0932    0.817    0.133

Note that I'm using the same dataset (dt) for all three list elements here, but this works with different datasets too, as long as they have the same column names.
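The question asks specifically for a for loop, so here is a loop that does the same thing as the map() call. This is a sketch: l stands in for your list of data frames (three copies of the example data here). Each reshaped result goes into a pre-allocated list, out, which is the usual way to collect loop results in R.

```r
library(tidyr)

# example data (same values as dt above)
dt <- data.frame(
  hour      = rep(0:2, each = 3),
  predicted = rep(c("Feeding", "Foraging", "Standing"), times = 3),
  y         = c(0.121, 0.632, 0.300, 0.141, 0.727, 0.183, 0.0932, 0.817, 0.133)
)

# stand-in for a list of your datasets
l <- list(dt, dt, dt)

# pre-allocate the output list, then reshape each element in turn
out <- vector("list", length(l))
for (i in seq_along(l)) {
  out[[i]] <- spread(l[[i]], predicted, y)
}

out[[1]]
```

The result is the same list that map() returns. The map() version is shorter, but the loop can be easier to follow while you're learning R.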

To build a list of all the datasets whose names start with the pattern you gave, you can do this:

# get objects that start with this name pattern
input_names = ls()[grepl("^agg_df4136", ls())]

# get the data that match those names
list_datasets = map(input_names, get)

So list_datasets is a list of all the data frames in your environment whose names start with "agg_df4136".
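Base R can do the name matching a little more compactly. ls(pattern = ...) filters object names by a regular expression, and mget() returns the matching objects as a named list. Because the list is named, each reshaped result keeps its dataset name, whereas map(input_names, get) returns an unnamed list.

The sketch below creates two toy objects, agg_df41361 and agg_df41367, purely for illustration. It also assumes your real names are agg_df4136 followed only by digits; the [0-9]*$ part of the pattern stops it from also picking up the _GLM copies if you rerun the code. The final loop writes each result back to the environment as <name>_GLM, matching the agg_df41361_GLM name used in the question.

```r
library(tidyr)

# hypothetical toy datasets following the agg_df4136* naming scheme
toy <- data.frame(
  hour      = rep(0:1, each = 3),
  predicted = rep(c("Feeding", "Foraging", "Standing"), times = 2),
  y         = c(0.121, 0.632, 0.300, 0.141, 0.727, 0.183)
)
agg_df41361 <- toy
agg_df41367 <- toy

# named list of every object called agg_df4136 followed by digits only
input_names   <- ls(pattern = "^agg_df4136[0-9]*$")
list_datasets <- mget(input_names)

# reshape each data frame; lapply() keeps the names supplied by mget()
list_wide <- lapply(list_datasets, function(d) {
  pivot_wider(d, names_from = predicted, values_from = y)
})

# optionally write each result back as <name>_GLM (e.g. agg_df41361_GLM)
for (nm in names(list_wide)) {
  assign(paste0(nm, "_GLM"), list_wide[[nm]])
}
```

Keeping the results in list_wide is usually easier to work with than many separate objects. For example, you can later fit one model per element with lapply(). The assign() loop is there only if you really want separate agg_df4136*_GLM objects.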

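Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.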

© 2020-2024 STACKOOM.COM