
Partial string match in both directions between two columns in pandas

[Code]

import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)
print(df1)


df = pd.DataFrame()
for idx, row in df1.iterrows():
    d = df1[df1['email'].str.startswith(row['username'])]
    if not d.empty:
        df = pd.concat([df, d])
df

Using the code above, I can filter all the rows with a partial match on the right-hand column (i.e. email => username).

Current Output:

[image not available]

But I also want the reverse match (i.e. username => email), as below.

Expected Output:

[image not available]

Thanks in advance.

Something like this works. The reverse match needs some minimum condition to match on; in this case, the username must start with the first three characters of the email.

Hopefully, this gets you started in the right direction.


import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)


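# True when the email starts with the full username.
df1['email_match'] = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
# True when the username starts with the first three characters of the email.
df1['user_match'] = df1.apply(lambda x: x['username'].startswith(x['email'][0:3]), axis=1)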

print(df1)


  ID   username             email  email_match  user_match
0  1    haabi.g     abi@gmail.com        False       False
1  4    pugal.g  pugal.g@yahoo.in         True        True
2  5   janani.g  jan232@gmail.com        False        True
3  9  hajacob.h     jacob@hoi.com        False       False
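
To get from these boolean columns back to a filtered DataFrame, as the question asks, the two columns can be combined into one mask. This is only a minimal sketch, assuming a row should be kept when either direction matches; the names `mask` and `matched` are illustrative and do not come from the original post.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
})

# Same row-wise checks as in the answer above.
df1['email_match'] = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
df1['user_match'] = df1.apply(lambda x: x['username'].startswith(x['email'][0:3]), axis=1)

# Assumption: a row qualifies when either direction matches.
mask = df1['email_match'] | df1['user_match']

# Keep only the original columns of the qualifying rows.
matched = df1.loc[mask, ['ID', 'username', 'email']]
print(matched)
```

Replacing `|` with `&` would instead keep only rows that match in both directions.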

You can also add a counting mechanism to find out how many consecutive leading characters match.


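def user_match(x):
    # Compare the part of the email before '@' with the username,
    # character by character from the start.
    name = x['email'].split('@')[0]
    user = x['username']
    count = 0
    for a, b in zip(name, user):
        if a != b:
            break
        count += 1
    # Report the length only when at least three leading characters match;
    # shorter matches (including 1 or 2) count as no match.
    return count if count >= 3 else 0

df1['count'] = df1.apply(user_match, axis=1)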


  ID   username             email  email_match  user_match  count
0  1    haabi.g     abi@gmail.com        False       False      0
1  4    pugal.g  pugal.g@yahoo.in         True        True      7
2  5   janani.g  jan232@gmail.com        False        True      3
3  9  hajacob.h     jacob@hoi.com        False       False      0
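
The same count can also be computed with `os.path.commonprefix` from the standard library, which compares strings character by character. The sketch below assumes the same three-character threshold; `prefix_len` and `min_len` are hypothetical names introduced here for illustration.

```python
import os.path

import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
})

def prefix_len(row, min_len=3):
    """Length of the shared leading characters, or 0 if shorter than min_len."""
    # Part of the email before '@'.
    local = row['email'].split('@')[0]
    # Longest common leading substring of the two strings.
    common = os.path.commonprefix([local, row['username']])
    return len(common) if len(common) >= min_len else 0

df1['count'] = df1.apply(prefix_len, axis=1)
print(df1)
```

Making the threshold a parameter lets you try other minimum lengths without editing the function body.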
