
How to estimate means from same column in large number of dataframes, based upon a grouping variable in R

I have a large number of data frames in R (more than 50), each produced by a different filter. Here is an example of 7 of them:

Steps_Day1 <- filter(PD2, Gait_Day == 1)
Steps_Day2 <- filter(PD2, Gait_Day == 2)
Steps_Day3 <- filter(PD2, Gait_Day == 3)
Steps_Day4 <- filter(PD2, Gait_Day == 4)
Steps_Day5 <- filter(PD2, Gait_Day == 5)
Steps_Day6 <- filter(PD2, Gait_Day == 6)
Steps_Day7 <- filter(PD2, Gait_Day == 7)

Each data frame contains 19 variables, but I'm only interested in Speed (to calculate its mean) and SubjectID, since each subject has multiple speed observations in the same data frame.

An example of the data we're interested in, from the data frame Steps_Day1:

Speed     SubjectID
0.6          1
0.7          1
0.7          2
0.8          2
0.1          2
1.1          3
1.2          3
1.5          4
1.7          4
0.8          4

The data go up to 61 participants, and each participant's number of observations is much larger than shown here.

What I want is code that automatically cycles through each of the 50 data frames (taking the 7 above as an example), calculates the mean speed for each participant, and stores the result in a new data frame alongside the per-participant means from the other data frames.

An example for Steps_Day1 (values not accurate):

Speed     SubjectID
0.6          1
0.7          2
1.2          3
1.7          4

and so on, until I end up with a final data frame whose columns hold the means for each participant from each of the other data frames. It might look something like this:

Steps_Day1   Steps_Day2   Steps_Day3   Steps_Day4   SubjectID
0.6          0.8          0.5          0.4          1
0.7          0.9          0.6          0.6          2
1.2          1.1          0.4          0.7          3
1.7          1.3          0.3          0.8          4

I could do this with some horrible, messy, long code, but I'm looking to see if anyone has a more intuitive idea, please!

:)

You don't include an MCVE of your dataset, so I can't test a solution, but this looks like a fairly simple problem to solve with the tidyverse.

First, why split PD2 into separate data frames? If you skip that step, you can use group_by and summarize to get the mean for each group:

library(dplyr)

# Mean speed for every combination of day and subject
PD2 %>%
    group_by(Gait_Day, SubjectID) %>%
    summarize(Steps = mean(Speed))

This will give you a "long-form" data.frame with 3 variables: Gait_Day, SubjectID, and Steps, where Steps holds the mean speed for that subject and day. If you want the format you show at the end, pivot it into "wide form" using pivot_wider (from the tidyr package). See this question for further explanation: How to reshape data from long to wide format
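As a rough, untested-on-your-data sketch of that pivot step: the PD2 below is a tiny made-up stand-in (only the three column names come from your question; the values are invented), so treat it as an illustration rather than your actual result.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for PD2: two days, two subjects, invented speeds
PD2 <- data.frame(
  Gait_Day  = c(1, 1, 1, 1, 2, 2, 2, 2),
  SubjectID = c(1, 1, 2, 2, 1, 1, 2, 2),
  Speed     = c(0.6, 0.8, 0.7, 0.9, 1.0, 1.2, 0.5, 0.7)
)

wide <- PD2 %>%
  group_by(Gait_Day, SubjectID) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarize(Steps = mean(Speed), .groups = "drop") %>%
  # One column per day, named Steps_Day1, Steps_Day2, ...
  pivot_wider(names_from = Gait_Day,
              names_prefix = "Steps_Day",
              values_from = Steps)

print(wide)
```

Each row of wide is one subject and each Steps_Day column holds that subject's mean speed for the day, which is the layout shown at the end of the question.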

To add to the previous answer, I agree that it is much easier to do this without creating a new data frame for each day. Using some generated data, you can achieve your desired results as follows:

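library(dplyr)
library(tidyr)

# Generate some data: 500 rows (the two 100-element vectors are recycled)
df <- data.frame(
  day = rep(1:5, length.out = 100),
  subject = rep(5:10, length.out = 100),
  speed = runif(500)
)

# Mean speed per day and subject, then one column per day
df %>%
  group_by(day, subject) %>%
  summarise(avg_speed = mean(speed), .groups = "drop") %>%
  pivot_wider(names_from = day,
              names_prefix = "Steps_Day",
              values_from = avg_speed)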

# A tibble: 6 × 6
  subject Steps_Day1 Steps_Day2 Steps_Day3 Steps_Day4 Steps_Day5
    <int>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
1       5      0.605      0.416      0.502      0.516      0.517
2       6      0.592      0.458      0.625      0.531      0.460
3       7      0.475      0.396      0.586      0.517      0.449
4       8      0.430      0.435      0.489      0.512      0.548
5       9      0.512      0.645      0.509      0.484      0.566
6      10      0.530      0.453      0.545      0.497      0.460
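If the separate data frames already exist and you would rather not go back to PD2, one possible approach is to stack them with an identifier column and then apply the same group-and-pivot steps. This is only a sketch: the two small data frames below are invented stand-ins for your Steps_Day objects, and the names day_list, source, and means_wide are my own.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for two of the already-filtered data frames
Steps_Day1 <- data.frame(SubjectID = c(1, 1, 2, 2),
                         Speed     = c(0.6, 0.8, 0.7, 0.9))
Steps_Day2 <- data.frame(SubjectID = c(1, 1, 2, 2),
                         Speed     = c(1.0, 1.2, 0.5, 0.7))

# A named list; the names become the column names of the result
day_list <- list(Steps_Day1 = Steps_Day1, Steps_Day2 = Steps_Day2)

means_wide <- bind_rows(day_list, .id = "source") %>%  # stack, tagging each row
  group_by(source, SubjectID) %>%
  summarise(avg_speed = mean(Speed), .groups = "drop") %>%
  pivot_wider(names_from = source, values_from = avg_speed)

print(means_wide)
```

With 50 data frames you would not want to type the list by hand; mget(paste0("Steps_Day", 1:7)) builds the same kind of named list from objects that exist in the calling environment. Note that the columns come out in alphabetical order of the names, so Steps_Day10 would sort before Steps_Day2.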
