简体   繁体   中英

R data frame organization

I am a beginner data scientist working within an NCAA football program trying to learn R. We record repeated measures data during practice that needs to be organized then analyzed. I have this sample dataset that I am having trouble analyzing due to the variations/typos in the "Activity" column. I need a way to rename/organize the activities in the dataframe.

Raw Dataset looks like this:

Position Activity Max_Velocity (mph)
WR Special Team 1 20
WR special team 1 19
WR Special team 1 18
WR special Team 2 13
WR ST 2 16
WR St 2 19
WR team 1 12
WR team 1 20
WR team 1 23
WR team 2 21
WR team 2 10
WR team 2 3
WR team 3 21
WR team 3 11
WR team 3 16
WR Indy 1 20
WR indy 2 21
WR INDY 3 22

I need it to look like this:

Position Activity Max_Velocity (mph) Activity_Catagory
WR Special Team 1 20 Special Team
WR special team 1 19 Special Team
WR Special team 1 18 Special Team
WR special Team 2 13 Special Team
WR ST 2 16 Special Team
WR St 2 19 Special Team
WR team 1 12 Team
WR team 1 20 Team
WR team 1 23 Team
WR team 2 21 Team
WR team 2 10 Team
WR team 2 3 Team
WR team 3 21 Team
WR team 3 11 Team
WR team 3 16 Team
WR Indy 1 20 Indy
WR indy 2 21 Indy
WR INDY 3 22 Indy

This way I can analyze the average of the dependent variable "Max_velocity" during each type of activity. I would rather not make individual lines of code correcting each typo in the "Activity" column. I think this can be done with loops but I am a beginner and don't know where to start. I would greatly appreciate any help to make this a simple fix.

Here is another way to solve your problem

library(stringr)

df$Activity_Catagory <- str_to_title(df$Activity) |> 
                          str_remove(" \\d") |> 
                          str_replace("St", "Special Team")



#    Position       Activity Max_Velocity (mph) Activity_Catagory
# 1        WR Special Team 1                 20      Special Team
# 2        WR special team 1                 19      Special Team
# 3        WR Special team 1                 18      Special Team
# 4        WR special Team 2                 13      Special Team
# 5        WR           ST 2                 16      Special Team
# 6        WR           St 2                 19      Special Team
# 7        WR         team 1                 12              Team
# 8        WR         team 1                 20              Team
# 9        WR         team 1                 23              Team
# 10       WR         team 2                 21              Team
# 11       WR         team 2                 10              Team
# 12       WR         team 2                  3              Team
# 13       WR         team 3                 21              Team
# 14       WR         team 3                 11              Team
# 15       WR         team 3                 16              Team
# 16       WR         Indy 1                 20              Indy
# 17       WR         indy 2                 21              Indy
# 18       WR         INDY 3                 22              Indy

You may need to first define the rules based on your data. Which values would be called as 'Special Team', which ones would be called as 'Team' etc.

Based on the example shared we can do case-insensitive pattern matching with grepl and list the conditions using case_when -

library(dplyr)

df %>%
  mutate(Activity_Category = case_when(grepl('Special Team', Activity, ignore.case = TRUE) | 
                                       grepl('ST ', Activity, ignore.case = TRUE) ~ 'Special Team', 
                                       grepl('Team', Activity, ignore.case = TRUE) ~ 'Team', 
                                       grepl('Indy', Activity, ignore.case = TRUE) ~ 'Indy'))

#   Position       Activity Max_Velocity..mph. Activity_Category
#1        WR Special Team 1                 20      Special Team
#2        WR special team 1                 19      Special Team
#3        WR Special team 1                 18      Special Team
#4        WR special Team 2                 13      Special Team
#5        WR           ST 2                 16      Special Team
#6        WR           St 2                 19      Special Team
#7        WR         team 1                 12              Team
#8        WR         team 1                 20              Team
#9        WR         team 1                 23              Team
#10       WR         team 2                 21              Team
#11       WR         team 2                 10              Team
#12       WR         team 2                  3              Team
#13       WR         team 3                 21              Team
#14       WR         team 3                 11              Team
#15       WR         team 3                 16              Team
#16       WR         Indy 1                 20              Indy
#17       WR         indy 2                 21              Indy
#18       WR         INDY 3                 22              Indy

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM