
Replace entire string based on regex match

I have a large pandas DataFrame of email addresses and want to replace every .edu address with "Edu". I came up with a highly inefficient way of doing it, but there has to be a better one. This is how I do it:

import pandas as pd
import re

inp = [{'c1': 10, 'c2': 'gedua.com'}, {'c1': 11, 'c2': 'wewewe.Edu'}, {'c1': 12, 'c2': 'wewewe.edu.ney'}]
dfn = pd.DataFrame(inp)

# Slow: walks the frame row by row and tests each address with a regex
for index, row in dfn.iterrows():
    # re.search returns None when there is no match, so no try/except is needed
    if re.search(r'\.edu', row['c2']):
        dfn.loc[index, 'c2'] = 'Edu'
        print('Education')

Use str.contains for case-insensitive selection, and assign with loc.

dfn.loc[dfn.c2.str.contains(r'\.Edu', case=False), 'c2'] = 'Edu'    
dfn

   c1         c2
0  10  gedua.com
1  11        Edu
2  12        Edu
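
The boolean mask that str.contains builds can also be inspected on its own before anything is overwritten. The following is a minimal sketch that reuses the question's sample data; the extra row holding None and the na=False argument are assumptions added here to show how missing values could be handled, and are not part of the answer above.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12, 13],
    # The None entry is a hypothetical missing address, added for illustration
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', None],
})

# True where c2 contains ".edu" in any letter case.
# na=False makes missing values count as non-matches instead of yielding NaN.
mask = dfn['c2'].str.contains(r'\.edu', case=False, na=False)

# Overwrite only the rows selected by the mask
dfn.loc[mask, 'c2'] = 'Edu'
```

Without na=False, a missing value would leave NaN in the mask, which loc cannot use as a boolean indexer.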

If it's only the emails ending with .edu you want to replace, then

dfn.loc[dfn.c2.str.contains(r'\.Edu$', case=False), 'c2'] = 'Edu'

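Or, as suggested by piR, with str.endswith. This is a literal, case-sensitive comparison, so it works here only because the sample address ends in exactly '.Edu':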

dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2'] = 'Edu'

dfn

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney  
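
Since str.endswith compares literally, one possible way to make it case insensitive is to lower-case the column first. This is a sketch under that assumption rather than part of the answers above; the 'abc.edu' row is hypothetical and was added to show the difference.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12, 13],
    # 'abc.edu' is a hypothetical lower-case address, added for illustration
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', 'abc.edu'],
})

# Case-sensitive check: only the address ending in exactly ".Edu" matches
strict = dfn['c2'].str.endswith('.Edu')

# Lower-casing first makes the suffix check ignore letter case
relaxed = dfn['c2'].str.lower().str.endswith('.edu')

dfn.loc[relaxed, 'c2'] = 'Edu'
```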

replace

dfn.replace(r'^.*\.Edu$', 'Edu', regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney

The pattern '^.*\.Edu$' matches everything from the beginning of the string up to and including a '.Edu' that sits at the very end of the string; that whole match is then replaced with 'Edu'.
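
To see what the pattern does to individual strings, it can be tried with Python's re module outside of pandas. This is only an illustrative sketch on the question's three sample values.

```python
import re

# Same pattern as above: anything, then a literal ".Edu" at the end of the string
pattern = re.compile(r'^.*\.Edu$')

samples = ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney']

# Strings that match are replaced entirely; the others pass through unchanged
replaced = [pattern.sub('Edu', s) for s in samples]
```

Only 'wewewe.Edu' matches: 'gedua.com' has no '.Edu' at all, and in 'wewewe.edu.ney' the suffix is lower case and not at the end.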


Column specific

You may want to limit the scope to a single column (or a few columns). You can do that by passing a nested dictionary to replace: the outer key names the column, and the inner dictionary maps each pattern to its replacement.

dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
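
The difference between the frame-wide and the column-specific form only shows up when more than one column holds matching strings. The sketch below assumes a hypothetical frame in which c1 is also a string column. It also shows that replace returns a new DataFrame, so the result has to be assigned if it is to be kept.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': ['a.Edu', 'b.com'],          # hypothetical second string column
    'c2': ['wewewe.Edu', 'gedua.com'],
})

# Frame-wide: every column is searched
whole = dfn.replace(r'^.*\.Edu$', 'Edu', regex=True)

# Column-specific: only c2 is searched, c1 is left alone
only_c2 = dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)
```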

Case insensitive [thanks @coldspeed]

pandas.DataFrame.replace does not have a case flag, but you can embed one in the pattern with the inline flag '(?i)'.

dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
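
When only one column is involved, Series.str.replace might be an alternative, since it does accept a case argument. This is a sketch of that option, not something the answer above proposes; it assigns the result back to the column.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney'],
})

# case=False makes the regex ignore letter case; regex=True is required
# for the pattern to be treated as a regular expression
dfn['c2'] = dfn['c2'].str.replace(r'^.*\.edu$', 'Edu', case=False, regex=True)
```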

