
Selecting patterns in character sequence using regex

I need to select all the accounts where 3 (or more) consecutive characters are identical and/or where the name also includes digits, for example:

Account
aaa12
43qas
42134dfsdd
did 

Output

Account
aaa12
43qas
42134dfsdd 

I am considering using a regex for this, [a-zA-Z]{3,}, but I am not sure about the approach. Also, it does not cover the and/or condition on the digits. I am interested in selecting accounts that satisfy at least one of these:

  • repeated identical characters,
  • numbers in the name.

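You can try:

import re
x = re.search(r'([a-zA-Z])\1{2}|\d', string)  # \1{2} repeats the captured letter twice more; [a-zA-Z]{3} alone would match any three letters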
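To see why a backreference matters for the "identical characters" condition, here is a minimal sketch that uses the sample accounts from the question. The variable names (`accounts`, `pattern`, `naive`) are just for illustration and are not from the original post.

```python
import re

# Sample account names taken from the question.
accounts = ["aaa12", "43qas", "42134dfsdd", "did"]

# ([a-zA-Z])\1{2} : one letter followed by that same letter twice more
# \d              : any single digit
pattern = re.compile(r"([a-zA-Z])\1{2}|\d")

# Without the backreference, [a-zA-Z]{3} matches ANY three letters in a row.
naive = re.compile(r"[a-zA-Z]{3}|\d")

selected = [a for a in accounts if pattern.search(a)]
naive_selected = [a for a in accounts if naive.search(a)]

print(selected)        # ['aaa12', '43qas', '42134dfsdd']
print(naive_selected)  # ['aaa12', '43qas', '42134dfsdd', 'did']
```

The naive pattern also keeps "did", because it only checks for three letters in a row, not three identical ones.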

Give this a try

n = 3  # number of identical consecutive letters required
pat = f'([a-zA-Z])\\1{{{n-1}}}|(\\d)+'  # `{{` and `}}` produce literal braces in an f-string
df_final = df[df.Account.str.findall(pat).astype(bool)]  # keep rows with at least one match

Out[101]:
      Account
0       aaa12
1       43qas
2  42134dfsdd
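For anyone who wants to reproduce this from scratch, below is a self-contained sketch. It assumes the data sits in a DataFrame with an `Account` column, as in the question. The helper `select_accounts` is a hypothetical name, and it filters with `re.search` through `Series.map` instead of `str.findall`, so it is a variation on the approach above and not the exact same code.

```python
import re
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"Account": ["aaa12", "43qas", "42134dfsdd", "did"]})

def select_accounts(frame, n=3):
    """Keep rows whose Account has n identical consecutive letters or any digit.

    Hypothetical helper for illustration only.
    """
    # In an rf-string, `{{` and `}}` are literal braces, so this yields e.g. \1{2}.
    pat = re.compile(rf"([a-zA-Z])\1{{{n - 1}}}|\d")
    mask = frame["Account"].map(lambda s: bool(pat.search(s)))
    return frame[mask]

df_final = select_accounts(df, n=3)
print(df_final)
```

Using `re.search` per value avoids building the list of matches that `str.findall` returns, since only a yes/no answer is needed for the filter.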
