How to rename observations based upon frequency in R?

Within my dataframe, I'm trying to rename certain observations in column 'Name' based upon their respective frequency. That is, I want to rename the observations whose Name occurs fewer than 100 times. If any name occurs fewer than 100 times in the dataset, I want to rename all those observations "Base" in the Name column. Here is an example:

Game   Home Runs     Name 

1          2        Hank Aaron
2          3        Babe Ruth
3          1        Ted Williams
3          4        Hank Aaron
4          2        Ted Williams
...

If Ted Williams's and Babe Ruth's names were to appear fewer than 100 times in the data frame, their names would be replaced with "Base" for all values of the Name column.

Game   Home Runs     Name 

1          2        Hank Aaron
2          3        Base
3          1        Base
3          4        Hank Aaron
4          2        Base
...

Additionally, I need the observations to be in the same dataframe, as I plan on running regressions using the new Name vector as an independent (individual effects) variable in a regression.

Apologies if I over-explained. Just a little lost.

library(dplyr)
library(forcats)

# Lump every name that occurs fewer than 100 times into "Base"
df %>%
   mutate(Name = fct_lump_min(Name, min = 100, other_level = "Base"))
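As a rough illustration of lumping by count, here is a sketch on a made-up vector, with a threshold of 2 standing in for 100. It assumes forcats 0.5.0 or later, where fct_lump_min() is available. Note that the n argument of fct_lump() keeps the n most frequent levels rather than thresholding on how often a level occurs, so fct_lump_min() is the closer match to "fewer than 100 times".

```r
library(forcats)

# Toy data, made up for illustration; a threshold of 2 stands in for 100
name <- c("Hank Aaron", "Babe Ruth", "Ted Williams", "Hank Aaron", "Hank Aaron")

# Lump every level that occurs fewer than `min` times into "Base"
lumped <- fct_lump_min(name, min = 2, other_level = "Base")

# Inspect the resulting counts per level
table(lumped)
```

The result is a factor, which can go straight into a regression formula as an individual-effects term.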

You can use table to count the number of times each Name occurs in the dataframe, use Filter to keep only those names which occur fewer than 100 times, match them in the original dataframe using %in%, and replace them.

df$Name[df$Name %in% names(Filter(I, table(df$Name) < 100))] <- 'Base'
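The same count, match and replace steps can be sketched on a small made-up data frame, with a threshold of 3 standing in for 100. The data, the HomeRuns column name and the min_count and rare variables are assumptions for illustration only. The sketch also assumes Name is a character column; if it is a factor, convert it with as.character() first, because assigning a value that is not among the existing levels yields NA.

```r
# Toy data, made up for illustration; a threshold of 3 stands in for 100
df <- data.frame(
  Game = c(1, 2, 3, 3, 4, 5),
  HomeRuns = c(2, 3, 1, 4, 2, 1),
  Name = c("Hank Aaron", "Babe Ruth", "Ted Williams",
           "Hank Aaron", "Ted Williams", "Hank Aaron"),
  stringsAsFactors = FALSE  # keep Name as character so "Base" can be assigned
)

min_count <- 3

# Count how often each name occurs
counts <- table(df$Name)

# Names that occur fewer than min_count times
rare <- names(counts)[counts < min_count]

# Replace those names in place; all rows stay in the same data frame
df$Name[df$Name %in% rare] <- "Base"

print(df)
```

Because the replacement happens in place, the other columns are untouched and the modified Name column can be wrapped in factor() when it is used as a regressor.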
