Replace entire string based on regex match

I have a large pandas DataFrame of email addresses and want to replace all the .edu emails with "Edu". I came up with a highly inefficient way of doing it, but there has to be a better way. This is how I do it:

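import pandas as pd
import re

inp = [{'c1': 10, 'c2': 'gedua.com'}, {'c1': 11, 'c2': 'wewewe.Edu'}, {'c1': 12, 'c2': 'wewewe.edu.ney'}]
dfn = pd.DataFrame(inp)

# Slow: walks the frame one row at a time.
for index, row in dfn.iterrows():
    # re.search returns None when nothing matches, so test the result
    # directly instead of catching the resulting AttributeError.
    if re.search(r'\.edu', row['c2']):
        # Assign through loc to avoid chained assignment.
        dfn.loc[index, 'c2'] = 'Edu'
        print('Education')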

Use str.contains for case-insensitive selection, and assign with loc.

dfn.loc[dfn.c2.str.contains(r'\.Edu', case=False), 'c2'] = 'Edu'    
dfn

   c1         c2
0  10  gedua.com
1  11        Edu
2  12        Edu
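
One caveat with the mask approach: if the column has missing values, str.contains may return a missing value rather than False for those rows (the default depends on the column dtype and the pandas version), and such a mask can fail when used for boolean indexing. A minimal sketch, assuming a hypothetical frame `df` with one missing email (not part of the question's data), passes na=False so missing rows simply count as non-matches:

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the question): the last row has no email.
df = pd.DataFrame({'c1': [10, 11, 12, 13],
                   'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', np.nan]})

# na=False makes missing values evaluate to False, so the mask is
# purely boolean and safe to use with loc.
mask = df['c2'].str.contains(r'\.edu', case=False, na=False)
df.loc[mask, 'c2'] = 'Edu'
```

The missing value is left untouched; only rows whose text contains ".edu" in any casing are overwritten.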

If you only want to replace the emails that end with .edu, then

dfn.loc[dfn.c2.str.contains(r'\.Edu$', case=False), 'c2'] = 'Edu'

Or, as suggested by piR (note that str.endswith is a literal, case-sensitive comparison, so this matches only the exact suffix '.Edu'):

dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2'] = 'Edu'

dfn

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney  
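
If the suffix test should ignore case while still avoiding a regex, one possible sketch is to lowercase the column before calling str.endswith. The frame `df` and its upper-case sample address are assumptions for illustration, not data from the question:

```python
import pandas as pd

# Hypothetical data: the second address uses an upper-case suffix.
df = pd.DataFrame({'c1': [10, 11, 12],
                   'c2': ['gedua.com', 'wewewe.EDU', 'wewewe.edu.ney']})

# Lowercase a temporary copy of the values, then do the literal suffix test.
mask = df['c2'].str.lower().str.endswith('.edu')
df.loc[mask, 'c2'] = 'Edu'
```

Only the temporary lowercased values are used for the comparison; rows that do not match keep their original casing.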

replace

dfn.replace(r'^.*\.Edu$', 'Edu', regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney

The pattern ^.*\.Edu$ says: grab everything from the beginning of the string up to the point where we find '.Edu' followed by the end of the string, then replace that whole thing with 'Edu'.
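
To see why the leading ^.* matters, here is a small sketch using only the standard re module on the question's three sample strings. The variable names are illustrative:

```python
import re

samples = ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney']

# Anchored at both ends: a match always covers the entire string,
# so the substitution swaps out the whole value.
whole = re.compile(r'^.*\.Edu$')
results = [whole.sub('Edu', s) for s in samples]

# Without the leading ^.* only the suffix itself is replaced,
# leaving the rest of the address in place.
partial = re.sub(r'\.Edu$', 'Edu', 'wewewe.Edu')

print(results)  # ['gedua.com', 'Edu', 'wewewe.edu.ney']
print(partial)  # weweweEdu
```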


Column specific

You may want to limit the scope to just a column (or columns). You can do that by passing a dictionary to replace, where the outer key specifies the column and the inner dictionary maps each pattern to its replacement.

dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
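
When only one column is involved, another option is the Series-level str.replace, which operates on that column alone. This is a sketch of an alternative, not the approach used in the answer above, and `df` is a stand-in for the question's frame:

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12],
                   'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney']})

# Series.str.replace touches a single column; pass regex=True explicitly
# so the pattern is interpreted as a regular expression.
df['c2'] = df['c2'].str.replace(r'^.*\.Edu$', 'Edu', regex=True)
```

Unlike DataFrame.replace, this returns a new Series, so the result has to be assigned back to the column.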

Case insensitive [thx @coldspeed]

pandas.DataFrame.replace does not have a case flag, but you can embed one in the pattern with '(?i)':

dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
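
As a quick check of what the inline flag does, the following sketch compares '(?i)' with the equivalent re.IGNORECASE flag using the standard re module. The sample strings, including the all-caps one, are illustrative. Note that in recent Python versions a global inline flag like this has to appear at the very start of the pattern.

```python
import re

# Two equivalent ways to request case-insensitive matching.
inline = re.compile(r'(?i)^.*\.edu$')
flagged = re.compile(r'^.*\.edu$', flags=re.IGNORECASE)

samples = ['gedua.com', 'wewewe.Edu', 'WEWEWE.EDU', 'wewewe.edu.ney']
inline_hits = [bool(inline.match(s)) for s in samples]
flagged_hits = [bool(flagged.match(s)) for s in samples]

print(inline_hits)   # [False, True, True, False]
print(flagged_hits)  # [False, True, True, False]
```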
