Partial string match on both sides of the columns in pandas
Code:
import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)
print(df1)

# Collect every row whose email starts with some row's username
df = pd.DataFrame()
for idx, row in df1.iterrows():
    matches = df1[df1['email'].str.startswith(row['username'])]
    if not matches.empty:
        df = pd.concat([df, matches])
df
Using the above code, I can filter all rows that partially match on the right-hand column (i.e. email => username).
Current Output:
But I want the reverse match as well (i.e. username => email), as below.
Expected Output:
Thanks in advance,
Something like this works. The reverse task requires some minimum condition to match on; in this case, three consecutive matching characters.
Hopefully, this gets you started in the right direction.
import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)

# email => username: does the email start with the full username?
df1['email_match'] = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
# username => email: does the username start with the first three characters of the email?
df1['user_match'] = df1.apply(lambda x: x['username'].startswith(x['email'][0:3]), axis=1)
print(df1)
  ID   username             email  email_match  user_match
0  1    haabi.g     abi@gmail.com        False       False
1  4    pugal.g  pugal.g@yahoo.in         True        True
2  5   janani.g  jan232@gmail.com        False        True
3  9  hajacob.h     jacob@hoi.com        False       False
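Since the question asks for filtered rows rather than flag columns, the two checks can be combined into one boolean mask. This is only a sketch: the names `mask` and `matched` are made up for illustration, and keeping a row when either direction matches is an assumption, because the expected output did not survive in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
})

pairs = list(zip(df1['username'], df1['email']))

# email => username: the email starts with the full username
email_match = pd.Series([e.startswith(u) for u, e in pairs], index=df1.index)
# username => email: the username starts with the first three characters of the email
user_match = pd.Series([u.startswith(e[:3]) for u, e in pairs], index=df1.index)

# Assumption: keep a row when either direction matches
mask = email_match | user_match
matched = df1[mask]
print(matched)
```

With the sample data this keeps the rows with ID 4 and 5. Swapping `|` for `&` would require both directions to match.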
You can add a counting mechanism to find out how many consecutive characters match.
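def user_match(x):
    # Compare the email's local part (before '@') with the username, character by character
    name = x['email'].split('@')[0]
    user = x['username']
    count = 0
    for a, b in zip(name, user):
        if a != b:
            break
        count += 1
    # Report the run length only when it reaches the three-character minimum;
    # shorter runs (including 1 or 2 characters) count as no match
    return count if count >= 3 else 0

df1['count'] = df1.apply(user_match, axis=1)
print(df1)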
  ID   username             email  email_match  user_match  count
0  1    haabi.g     abi@gmail.com        False       False      0
1  4    pugal.g  pugal.g@yahoo.in         True        True      7
2  5   janani.g  jan232@gmail.com        False        True      3
3  9  hajacob.h     jacob@hoi.com        False       False      0
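The same count can probably be obtained without a hand-written loop, since the standard library's `os.path.commonprefix` compares strings character by character. The sketch below is one possible variant, not part of the original answer; the helper name `common_prefix_len` and its `min_len` parameter are hypothetical.

```python
import os
import pandas as pd

def common_prefix_len(username, email, min_len=3):
    """Length of the shared leading run between the username and the
    email's local part, or 0 if it is shorter than min_len (hypothetical helper)."""
    local = email.split('@')[0]
    # commonprefix works character-wise on any list of strings, not just paths
    n = len(os.path.commonprefix([local, username]))
    return n if n >= min_len else 0

df1 = pd.DataFrame({
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
})
df1['count'] = [common_prefix_len(u, e) for u, e in zip(df1['username'], df1['email'])]
print(df1)
```

Like the loop version, this only looks at the start of both strings, so a username such as 'haabi.g' still scores 0 against 'abi@gmail.com' even though 'abi' appears inside it.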