How to select a column based on part of a contained string and then drop part of the column name in R? (Column position may vary)

I am writing a function to prepare a data frame in R for later use in a regression. I want to rename every column whose name contains the word distance. Specifically, I want to drop the first descriptive word preceding distance, that is, both the word and the period that come before distance.

I have:

country.distance.median country.distance.mean population life.q state.distance.mean
                    210                   189      10000    0.6                 100
                   3100                  2100      20000    0.7                 300
                     37                    36        500    0.3                  10

I would like:

distance.median distance.mean population life.q distance.mean
            210           189      10000    0.6           100
           3100          2100      20000    0.7           300
             37            36        500    0.3            10

Because this will live inside a function, the number and position of columns can vary, so I need a solution that does not rely on column position. It should not change the column name "life.q", so the solution needs to recognize and select columns based on the string distance. The word in front of distance may also change (for example, the column 'state.distance.mean').

(It should also have the ability to be used as an if statement within a function.)

Thank you for your time and thoughts. :)

You may try using sub here:

names(df) <- sub("^country\\.(?=distance\\.)", "", names(df), perl=TRUE)
df

  distance.median distance.mean population life.q
1             210           189      10000    0.6
2            3100          2100      20000    0.7
3              37            36        500    0.3
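
The pattern above is tied to the literal prefix country, so a column such as state.distance.mean would be left untouched. As a sketch of one possible variant (not the pattern given in this answer), the prefix can be matched as any run of non-dot characters, as long as distance follows immediately. The data frame below is reconstructed from the question and is only an assumption about what the real data looks like:

```r
# Sample data reconstructed from the question (an assumption, for illustration)
df <- data.frame(
  country.distance.median = c(210, 3100, 37),
  country.distance.mean   = c(189, 2100, 36),
  population              = c(10000, 20000, 500),
  life.q                  = c(0.6, 0.7, 0.3),
  state.distance.mean     = c(100, 300, 10)
)

# Drop one leading word plus its dot, but only when "distance"
# comes directly after it (followed by a dot or the end of the name).
names(df) <- sub("^[^.]+\\.(?=distance(\\.|$))", "", names(df), perl = TRUE)
names(df)
```

With this data the result contains two columns named distance.mean, which matches the output requested in the question, but duplicate names can be awkward in later modelling steps.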

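More generally, to remove the first word and the dot that follows it, provided that another dot appears later in the name, you may try the following. Note that this applies to any such column name, not only to those containing distance: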

names(df) <- sub("^[^.]+\\.(?=.*\\.)", "", names(df), perl=TRUE)
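
Since the question asks for something usable inside a function and in an if statement, here is a minimal sketch under a few assumptions: the helper name strip_distance_prefix is hypothetical, and it only renames columns where a single word and a dot come directly before distance. It uses grepl() to select the columns first, so names like life.q or a.b.c are never touched:

```r
# Hypothetical helper: strip the word before "distance" from matching columns only.
strip_distance_prefix <- function(df) {
  # TRUE for names of the form <word>.distance or <word>.distance.<rest>
  has_distance <- grepl("^[^.]+\\.distance(\\.|$)", names(df))

  # grepl() returns a logical vector, so it can drive an if statement
  if (any(has_distance)) {
    names(df)[has_distance] <- sub(
      "^[^.]+\\.(?=distance(\\.|$))", "",
      names(df)[has_distance],
      perl = TRUE
    )
  }
  df
}

# Small made-up example to show which names change
example <- data.frame(state.distance.mean = 1:2, life.q = c(0.1, 0.2), a.b.c = 3:4)
names(strip_distance_prefix(example))
# "distance.mean" "life.q" "a.b.c"
```

The a.b.c column shows the difference from the general pattern above, which would rename it to b.c because it only checks for a later dot.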
