Replace entire string based on regex match
I have a large pandas DataFrame of email addresses and want to replace every .edu address with "Edu". I came up with a highly inefficient way of doing it, but there has to be a better one. This is how I do it:
import pandas as pd
import re

inp = [{'c1': 10, 'c2': 'gedua.com'}, {'c1': 11, 'c2': 'wewewe.Edu'}, {'c1': 12, 'c2': 'wewewe.edu.ney'}]
dfn = pd.DataFrame(inp)

# Slow: visits every row in Python and tests it with re.search
for index, row in dfn.iterrows():
    if re.search(r'\.edu', row['c2']):
        dfn.loc[index, 'c2'] = 'Edu'
        print('Education')
Use str.contains for case-insensitive selection, and assign with loc.
dfn.loc[dfn.c2.str.contains(r'\.Edu', case=False), 'c2'] = 'Edu'
dfn
   c1         c2
0  10  gedua.com
1  11        Edu
2  12        Edu
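One caveat worth checking on real data: if c2 holds missing values, str.contains returns a missing value for those rows, and the mask can no longer be used with loc. Here is a minimal sketch, assuming a made-up frame (df, with one missing address), that passes na=False so missing rows simply count as non-matches:

```python
import pandas as pd

# Hypothetical sample data: the question's rows plus one missing address.
df = pd.DataFrame({
    'c1': [10, 11, 12, 13],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', None],
})

# na=False treats missing values as "no match", so the mask is purely boolean.
mask = df['c2'].str.contains(r'\.edu', case=False, na=False)
df.loc[mask, 'c2'] = 'Edu'

print(df)
```

The missing row is left as it is; only rows whose address contains ".edu" in any capitalisation are overwritten.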
If you only want to replace the emails that end with .edu, anchor the pattern with $:
dfn.loc[dfn.c2.str.contains(r'\.Edu$', case=False), 'c2'] = 'Edu'
Or, as suggested by piR, with str.endswith (note that it takes a literal suffix rather than a regex, and is case-sensitive):
dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2'] = 'Edu'
dfn
   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
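Because str.endswith compares the suffix literally, '.Edu' will not catch '.edu' or '.EDU'. One possible workaround, sketched here on a hypothetical Series s, is to lower-case the values for the test only and keep the original strings in place:

```python
import pandas as pd

# Hypothetical addresses with mixed capitalisation.
s = pd.Series(['gedua.com', 'wewewe.Edu', 'someone@school.EDU', 'wewewe.edu.ney'])

# Lower-case a temporary copy for the comparison; s itself is not lower-cased.
mask = s.str.lower().str.endswith('.edu')
s.loc[mask] = 'Edu'

print(s.tolist())
```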
replace
dfn.replace(r'^.*\.Edu$', 'Edu', regex=True)
   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
The pattern ^.*\.Edu$ says: grab everything from the beginning of the string up to the point where we find '.Edu' followed by the end of the string, then replace that whole thing with 'Edu'.
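To see what the pattern does to individual strings, it can be tried with Python's re module outside pandas. This is only a sketch: the sample list and the last address in it are made up for illustration, and it assumes the pattern behaves the same way when passed to replace.

```python
import re

pattern = re.compile(r'^.*\.Edu$')

# Hypothetical sample addresses; the last one contains '.Edu' twice.
samples = ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', 'dept.Edu.wewewe.Edu']

# sub replaces the whole match; strings that do not match come back unchanged.
results = {address: pattern.sub('Edu', address) for address in samples}

for address, replaced in results.items():
    print(address, '->', replaced)
```

Since ^ and $ pin the match to both ends of the string, a match always covers the entire value, which is why the whole address is replaced rather than just the '.Edu' part.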
You may want to limit the scope to just one column (or a few columns). You can do that by passing a dictionary to replace, where the outer key names the column and the inner dictionary maps each pattern to its replacement.
dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)
   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
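Two details are easy to miss here: replace returns a new DataFrame instead of changing the original, and the column key keeps other string columns out of the substitution. A small sketch, assuming a hypothetical frame df with an extra column named backup:

```python
import pandas as pd

# Hypothetical frame: 'backup' also holds a value ending in '.Edu'.
df = pd.DataFrame({
    'c2': ['gedua.com', 'wewewe.Edu'],
    'backup': ['other.Edu', 'x.com'],
})

# Only column 'c2' is searched; the result is a new DataFrame.
result = df.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)

print(result)
print(df)  # unchanged, because the result was not assigned back to df
```

To keep the change, assign the result back, for example df = df.replace(...).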
pandas.DataFrame.replace does not have a case flag, but you can embed one in the pattern with '(?i)':
dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)
   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
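If only one column is involved, another option (not part of the answer above) is Series.str.replace, which accepts a flags argument, so the inline '(?i)' is not needed. A sketch on a hypothetical Series s:

```python
import re
import pandas as pd

# Hypothetical addresses; the second one uses upper case.
s = pd.Series(['gedua.com', 'wewewe.EDU', 'wewewe.edu.ney'])

# regex=True is passed explicitly so the pattern is treated as a regex;
# flags=re.IGNORECASE makes the match case-insensitive.
out = s.str.replace(r'^.*\.edu$', 'Edu', flags=re.IGNORECASE, regex=True)

print(out.tolist())
```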