
Regex in R to replace a string with no special characters

I'm practicing my regex with R on a football schedule and can't figure this out.

I'm essentially trying to change any home game to the string HOME. Here is a snippet of the schedule_teams data frame that I am using:

  Team   w1   w2   w3   w4   w5   w6   w7   w8   w9  w10  w11  w12  w13  w14
1  ARI   SD @NYG   SF  BYE @DEN  WSH @OAK  PHI @DAL  STL  DET @SEA @ATL   KC
2  ATL   NO @CIN   TB @MIN @NYG  CHI @BAL  DET  BYE  @TB @CAR  CLE  ARI  @GB
3  BAL  CIN  PIT @CLE  CAR @IND  @TB  ATL @CIN @PIT  TEN  BYE  @NO   SD @MIA

Away games have an @ symbol at the start of the string; home games do not. Using regex in Python, I believe all home teams can be selected with a pattern like ^([A-Z])\w+, which essentially says "begins with a capital letter". This doesn't work in R because of the \w, among other errors.

Here is what I tried (and failed):

str_replace_all(as.matrix(schedule_teams), "[[^([A-Z])\w+]]", "HOME")

Is there an easier way to change all home teams to HOME?

Thanks in advance.

Your regular expression syntax is incorrect: you have wrapped it inside nested character classes, and you are trying to use a capturing group inside the class, which causes the pattern to fail when it reaches the closing ).

In short, your regular expression currently defines a set of characters (not what you want) and then fails.

[[^([A-Z]  # any character of: '[', '^', '(', '[', 'A' to 'Z' 

To fix this, remove the character classes and the capturing group that you placed inside them, and make sure you double-escape the \w (written as \\w in an R string). Then the pattern should work for you.

I tested this on my console and it worked fine.

> df[,-1] <- str_replace_all(as.matrix(df[,-1]), '^[A-Z]\\w+', 'HOME')
##   Team   w1   w2   w3   w4   w5   w6   w7   w8   w9  w10  w11  w12  w13  w14
## 1  ARI HOME @NYG HOME HOME @DEN HOME @OAK HOME @DAL HOME HOME @SEA @ATL HOME
## 2  ATL HOME @CIN HOME @MIN @NYG HOME @BAL HOME HOME  @TB @CAR HOME HOME  @GB
## 3  BAL HOME HOME @CLE HOME @IND  @TB HOME @CIN @PIT HOME HOME  @NO HOME @MIA

Aside from using the stringr library, you can also do this with sub from base R if you insist on using a regular expression.

> df[,-1] <- sub('^[A-Z]\\w+', 'HOME', as.matrix(df[,-1]))
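
To see why the backslash is doubled and what the pattern actually matches, here is a minimal base R check. The `games` vector is just a handful of values copied from the schedule in the question, not the full data frame, and the variable names are made up for illustration.

```r
# A few sample values taken from the schedule snippet in the question
games <- c("SD", "@NYG", "SF", "BYE", "@DEN")

# In an R string literal, "\\w" is the two characters \ and w,
# which is what the regex engine needs to see
nchar("\\w")                  # 2

# Anchored at the start: a capital letter followed by word characters
pattern <- "^[A-Z]\\w+"

grepl(pattern, games)         # TRUE FALSE TRUE TRUE FALSE
sub(pattern, "HOME", games)   # "HOME" "@NYG" "HOME" "HOME" "@DEN"
```

Because the pattern is anchored with ^ and each cell holds a single team, sub is enough here; gsub would give the same result.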

And here is an approach without using a regular expression:

> m <- as.matrix(df[-1])
> m[substr(m, 1, 1) != '@'] <- 'HOME'
> cbind(df[1], m)
##   Team   w1   w2   w3   w4   w5   w6   w7   w8   w9  w10  w11  w12  w13  w14
## 1  ARI HOME @NYG HOME HOME @DEN HOME @OAK HOME @DAL HOME HOME @SEA @ATL HOME
## 2  ATL HOME @CIN HOME @MIN @NYG HOME @BAL HOME HOME  @TB @CAR HOME HOME  @GB
## 3  BAL HOME HOME @CLE HOME @IND  @TB HOME @CIN @PIT HOME HOME  @NO HOME @MIA
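
Note that in the output above, the BYE in ARI's w4 column also becomes HOME, because it starts with a capital letter and has no @. If bye weeks should be left alone, one possible variation of the non-regex approach is sketched below. It assumes bye weeks are always spelled exactly BYE, and it rebuilds only the first five weeks of the snippet so that it can run on its own.

```r
# Rebuild part of the schedule from the question (first five weeks only)
df <- data.frame(
  Team = c("ARI", "ATL", "BAL"),
  w1   = c("SD", "NO", "CIN"),
  w2   = c("@NYG", "@CIN", "PIT"),
  w3   = c("SF", "TB", "@CLE"),
  w4   = c("BYE", "@MIN", "CAR"),
  w5   = c("@DEN", "@NYG", "@IND"),
  stringsAsFactors = FALSE
)

# Work on the week columns as a character matrix
m <- as.matrix(df[-1])

# Assumption: a home game does not start with "@" and is not a bye week
is_home <- substr(m, 1, 1) != "@" & m != "BYE"
m[is_home] <- "HOME"

# Put the Team column back in front
result <- cbind(df[1], m)
result
```

With this condition, ARI's w4 cell stays BYE while the other cells without an @ become HOME.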
