[Code]
import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)
print(df1)

# Collect every row whose email starts with the username of some row
df = pd.DataFrame()
for idx, row in df1.iterrows():
    matches = df1[df1['email'].str.startswith(row['username'])]
    if not matches.empty:
        df = pd.concat([df, matches])
print(df)
Using the above code, I can filter all the rows with a partial match on the right-side column (i.e. email => username).
Current Output:
But I want the reverse match as well (i.e. username => email), as below.
Expected Output:
Thanks in advance,
Something like this works. The reverse task requires some minimum condition to match on; in this case, the first three characters of the email must match the start of the username.
Hopefully, this gets you started in the right direction.
import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)
df1['email_match'] = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
df1['user_match'] = df1.apply(lambda x: x['username'].startswith(x['email'][0:3]), axis=1)
print(df1)
   ID   username             email  email_match  user_match
0   1    haabi.g     abi@gmail.com        False       False
1   4    pugal.g  pugal.g@yahoo.in         True        True
2   5   janani.g  jan232@gmail.com        False        True
3   9  hajacob.h     jacob@hoi.com        False       False
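Since the question asks for the matching rows rather than flag columns, the two checks can be combined into one boolean mask. This is only a minimal sketch, assuming a match in either direction should keep the row; the names `email_flag`, `user_flag`, `mask`, and `matched` are made up for illustration and do not come from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
})

# Original direction: the email starts with the full username
email_flag = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
# Reverse direction: the username starts with the first three characters of the email
user_flag = df1.apply(lambda x: x['username'].startswith(x['email'][:3]), axis=1)

# Assumption: a row is kept if it matches in either direction
mask = email_flag | user_flag
matched = df1[mask]
print(matched)
```

With the sample data this keeps the rows with ID 4 and ID 5. Which rows count as a match depends entirely on the rule you pick, so adjust the three-character threshold to suit your data.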
You can also add a counting mechanism to see how many consecutive leading characters match.
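def user_match(x):
    # Compare the local part of the email (before '@') with the username
    name = x['email'].split('@')[0]
    user = x['username']
    count = 0
    # Count matching characters from the start; stop at the first mismatch
    for a, b in zip(name, user):
        if a != b:
            break
        count += 1
    # Report the length only when at least three leading characters match, otherwise 0
    return count if count >= 3 else 0

df1['count'] = df1.apply(user_match, axis=1)
print(df1)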
   ID   username             email  email_match  user_match  count
0   1    haabi.g     abi@gmail.com        False       False      0
1   4    pugal.g  pugal.g@yahoo.in         True        True      7
2   5   janani.g  jan232@gmail.com        False        True      3
3   9  hajacob.h     jacob@hoi.com        False       False      0
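The same count can probably be obtained without a hand-written loop. The sketch below swaps in `os.path.commonprefix` from the standard library, which compares strings character by character; the helper name `prefix_len` and its `min_len` parameter are hypothetical, chosen here to mirror the three-character threshold used above.

```python
import os
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
})

def prefix_len(row, min_len=3):
    """Length of the shared prefix of the email local part and the username, or 0 if shorter than min_len."""
    local = row['email'].split('@')[0]
    # commonprefix works character by character on any list of strings, not just paths
    common = os.path.commonprefix([local, row['username']])
    return len(common) if len(common) >= min_len else 0

df1['count'] = df1.apply(prefix_len, axis=1)

# Keep only the rows where at least min_len leading characters match
print(df1[df1['count'] > 0])
```

Filtering on `count > 0` then gives the matching rows directly, and raising `min_len` makes the match stricter.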