
How can I remove part of the names in one column of a data frame?

I have data that looks like this:

v1                                      v2
phenzine.MO.4213121906560.C02.name      2.376140e-05
dnium.bte.MO.02400072107987.E10.name    2.423254e-05
trene.MO.024213121906564.C09.name       2.438986e-05
tilli.MO.550760072207033.F09.name       2.495574e-05
tnolone.MO..614615111406.name           2.511859e-05

I want to strip the trailing part of each value in the first column, so that the result looks like this:

      v1              v2
    phenzine    2.376140e-05
    dnium.bte   2.423254e-05
    trene       2.438986e-05
    tilli       2.495574e-05
    tnolone     2.511859e-05

I know I need to use grep or sub, but I could not get it to work.

If '.MO' appears in every element, you can remove everything from the first '.MO' onward with this regex:

 df1$v1 <- sub('\\.MO.*', '', df1$v1)
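
For anyone who wants to try this without the original file, here is a minimal reproducible sketch. It assumes the data frame is called df1 (as in the answer) and that v1 is a character column; the values are copied from the question.

```r
# Rebuild the question's data as a data frame (values copied from the post)
df1 <- data.frame(
  v1 = c("phenzine.MO.4213121906560.C02.name",
         "dnium.bte.MO.02400072107987.E10.name",
         "trene.MO.024213121906564.C09.name",
         "tilli.MO.550760072207033.F09.name",
         "tnolone.MO..614615111406.name"),
  v2 = c(2.376140e-05, 2.423254e-05, 2.438986e-05,
         2.495574e-05, 2.511859e-05),
  stringsAsFactors = FALSE
)

# "\\." matches a literal dot, "MO" the marker, ".*" the rest of the string;
# sub() replaces that whole match with the empty string
df1$v1 <- sub("\\.MO.*", "", df1$v1)

df1$v1
#[1] "phenzine"  "dnium.bte" "trene"     "tilli"     "tnolone"
```

Note that 'dnium.bte' keeps its internal dot, because the pattern only starts matching at a dot that is immediately followed by 'MO'.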

Alternatively, to remove everything from the first '.' that is followed by a capital letter:

 sub('\\.[A-Z].*', '', df1$v1)
 #[1] "phenzine"  "dnium.bte" "trene"     "tilli"     "tnolone"  

Or, if only specific markers (for example MO, NO, or NR) should trigger the cut:

sub('\\.(MO|NO|NR).*', '', df1$v1)
#[1] "phenzine"  "dnium.bte" "trene"     "tilli"     "tnolone"  
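
One thing to keep in mind with any of these patterns: sub() returns a string unchanged when the pattern does not match. The sketch below shows one way to check for that with grepl() before cleaning. The vector name v1 and the last element ("plain.name") are hypothetical and added only for illustration; they are not part of the question's data.

```r
# Two values from the question plus one hypothetical value without a marker
v1 <- c("phenzine.MO.4213121906560.C02.name",
        "dnium.bte.MO.02400072107987.E10.name",
        "plain.name")  # hypothetical: contains none of MO, NO, NR

# TRUE where one of the markers follows a literal dot
has_marker <- grepl("\\.(MO|NO|NR)", v1)

# Apply the same substitution as in the answer
cleaned <- sub("\\.(MO|NO|NR).*", "", v1)

has_marker
#[1]  TRUE  TRUE FALSE
cleaned
#[1] "phenzine"   "dnium.bte"  "plain.name"
```

Rows where has_marker is FALSE come back untouched, so they may need separate handling.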


 