简体   繁体   中英

How to apply same operation to multiple columns spread out in dataframe in R?

beginner here. I have a dataframe with over 600 columns. I want to apply some simple operations to specific columns that are all named similarly (see example below) but spread out throughout the dataframe? How can I build a loop for all these columns based on their names and the ascending numbers at the end of the column name?

Dataframe example:

Name  Col_x_1  Company  Col_x_2  Start_Year  End_Year  Col_x_3
asd   4 Col    Test     2 Col    1902        1933      1 Col
kfj   5 Cols   Test_2   10 Col   1933        1954      0
ale   0        Test_3   11 Cols  1988        1999      5 Col
...

Output example: I want to remove the string following the space behind the value in the columns named "Col_x_1, Col_x_2, ..." (going up until 56) and turn the columns into dbl.

Name  Col_x_1  Company  Col_x_2  Start_Year  End_Year  Col_x_3
asd   4        Test     2        1902        1933      1 
kfj   5        Test_2   10       1933        1954      0
ale   0        Test_3   11       1988        1999      5 
...

How can I build a loop to perform this simple operation? Thank you in advance!

You could use tidyverse :

library(tidyverse)

df %>% 
  mutate(across(starts_with("Col_x"), ~str_extract(., "^\\d+") %>% as.numeric()))

This returns

# A tibble: 3 × 7
  Name  Col_x_1 Company Col_x_2 Start_Year End_Year Col_x_3
  <chr>   <dbl> <chr>     <dbl>      <dbl>    <dbl>   <dbl>
1 asd         4 Test          2       1902     1933       1
2 kfj         5 Test_2       10       1933     1954       0
3 ale         0 Test_3       11       1988     1999       5

Or, depending on your actual data

df %>% 
  mutate(across(starts_with("Col_x"), ~str_remove_all(., "(?<=\\d)\\s+[A-z]+") %>% as.numeric()))

The first one extracts the starting number (assuming it's an integer) and removes the remaining content. The second one looks for a number followed by a space and some letters and removes them.

Finally both convert the remaining part into a numeric.

We use starts_with() to select the columns by a names pattern. There are other possibilities like ends_with() , matches() or contains() . The selection of the appropriate function depends on the actual structure of the names.

If you must use a loop, you could use grep to get the columns names that contains "Col_x_" in the beginning and gsub to remove the words that you want

names_dat <- grep("^Col_x_",names(dat), value = TRUE)
for (i in names_dat) {
  dat[[i]] <- as.numeric(gsub("Cols?","", dat[[i]]))
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM