R remove "st", "nd", "rd", "th" from multiple columns in dataframe

I have a dataframe with hockey team names in column 1. In columns 2-16, stat categories are stored as ordinal ranks (1st, 2nd, 3rd, 4th, and so on). I want to remove all non-numeric characters from every category so that I am left with plain numbers (1, 2, 3, 4, ...).

I know I can use gsub("th", "", dataframe$column_name) for each column, but is there a way to do it quickly across all columns?

One idea is to use mutate_at to apply a replacement function to the columns you want, as follows. Here I provide two replacement functions, str_replace and str_extract, which both work. mutate_at comes from dplyr, and str_replace and str_extract come from stringr; all three are loaded by library(tidyverse).

library(tidyverse)

# Create an example data frame
dat <- tibble(
  A = c("1st", "2nd", "3rd"),
  B = c("8th", "5th", "6th"),
  C = c("7th", "101st", "23rd"),
  Team = c("A", "B", "C")
)

# Solution 1: str_replace
dat %>%
  mutate_at(vars(-Team), list(~as.integer(str_replace(., "st|nd|rd|th", ""))))
# # A tibble: 3 x 4
#       A     B     C Team 
#   <int> <int> <int> <chr>
# 1     1     8     7 A    
# 2     2     5   101 B    
# 3     3     6    23 C 

# Solution 2: str_extract (keep only the run of digits)
dat %>%
  mutate_at(vars(-Team), list(~as.integer(str_extract(., "[0-9]+"))))
# # A tibble: 3 x 4
#       A     B     C Team 
#   <int> <int> <int> <chr>
# 1     1     8     7 A    
# 2     2     5   101 B    
# 3     3     6    23 C
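
Since the question starts from gsub, the same idea can also be sketched in base R, without loading the tidyverse. This is only a sketch under stated assumptions: it reuses the example data from the answer above, assumes every column other than Team holds an ordinal rank, and uses rank_cols as a hypothetical helper name that is not part of the original answer.

```r
# Example data mirroring the answer above (assumed, not the asker's real data)
dat <- data.frame(
  A = c("1st", "2nd", "3rd"),
  B = c("8th", "5th", "6th"),
  C = c("7th", "101st", "23rd"),
  Team = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Assumption: every column except Team holds a rank
rank_cols <- setdiff(names(dat), "Team")

# Strip a trailing ordinal suffix from each rank column, then convert to integer
dat[rank_cols] <- lapply(dat[rank_cols], function(x) {
  as.integer(gsub("(st|nd|rd|th)$", "", x))
})

str(dat)
```

The pattern is anchored with $ so that only a suffix at the end of the string is removed. With dplyr 1.0.0 or later, where mutate_at is superseded, the tidyverse solutions can likely be written in the same spirit as mutate(across(-Team, ~ as.integer(str_extract(.x, "[0-9]+")))).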
