Partial string matching on both sides of columns in pandas

Question

import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)
print(df1)

# For each username, collect every row whose email starts with that username
df = pd.DataFrame()
for idx, row in df1.iterrows():
    matches = df1[df1['email'].str.startswith(row['username'])]
    if not matches.empty:
        df = pd.concat([df, matches])
print(df)

Using the code above, I can filter all the rows that partially match on the right-side column (i.e. email => username).

Current output:

But I also want the reverse matching (i.e. username => email), as shown below.

Expected output:

Thanks in advance.
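A side note on the question's loop: it compares each username against every email in the frame, not only the email on its own row. A minimal sketch of the same filter without `iterrows()` and repeated `pd.concat`, assuming the frame is small enough to check every email against every username (the cost grows with rows × usernames). It relies on Python's built-in `str.startswith`, which accepts a tuple of prefixes; unlike the loop, each matching row is kept once even when several usernames match it.

```python
import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)

# Every username in the frame; str.startswith accepts a tuple of prefixes
usernames = tuple(df1['username'])

# True where the row's email starts with ANY username in the frame
mask = df1['email'].apply(lambda e: e.startswith(usernames))
result = df1[mask]
print(result)
```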

Answer 1

Something like this works. The reverse task requires some minimum condition to match on; in this case, the first three characters of the email must match the start of the username.

Hopefully, this gets you started in the right direction.


import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)


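# email => username: does the email start with the full username?
df1['email_match'] = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
# username => email: does the username start with the first three characters of the email?
df1['user_match'] = df1.apply(lambda x: x['username'].startswith(x['email'][0:3]), axis=1)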

print(df1)

  ID   username             email  email_match  user_match
0  1    haabi.g     abi@gmail.com        False       False
1  4    pugal.g  pugal.g@yahoo.in         True        True
2  5   janani.g  jan232@gmail.com        False        True
3  9  hajacob.h     jacob@hoi.com        False       False
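The prefix checks above leave rows 0 and 3 unmatched, even though `abi` and `jacob` appear inside `haabi.g` and `hajacob.h`. If "partial match" is meant as containment rather than a shared prefix (an assumption; the expected output in the question is not shown), a sketch like the following tests both directions with Python's `in` operator on the part of the email before `@`. `MIN_LEN` is a hypothetical threshold playing the same role as the answer's three-character minimum.

```python
import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)

# Local part of each email: everything before the '@'
local = df1['email'].str.split('@').str[0]

# Hypothetical minimum length, so very short local parts do not match almost anything
MIN_LEN = 3

# username => email: the whole username appears inside the email's local part
df1['user_in_email'] = [u in e for u, e in zip(df1['username'], local)]
# email => username: the email's local part appears anywhere inside the username
df1['email_in_user'] = [len(e) >= MIN_LEN and e in u for u, e in zip(df1['username'], local)]

# A row matches if either direction matches
df1['any_match'] = df1['user_in_email'] | df1['email_in_user']
print(df1[df1['any_match']])
```

With this reading, rows 0, 1 and 3 match, while row 2 (`janani.g` / `jan232`) only matches under the prefix rule, so the two interpretations are not interchangeable.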

You can add a counting mechanism to find out how many leading characters match consecutively.


def user_match(x):
    # Count how many leading characters of the email's local part
    # (before '@') match the username, stopping at the first mismatch
    name = x['email'].split('@')[0]
    user = x['username']
    count = 0
    for a, b in zip(name, user):
        if a != b:
            break
        count += 1
    # Report the count only if at least three characters match; otherwise 0
    return count if count >= 3 else 0

df1['count'] = df1.apply(user_match, axis=1)
print(df1)

  ID   username             email  email_match  user_match  count
0  1    haabi.g     abi@gmail.com        False       False      0
1  4    pugal.g  pugal.g@yahoo.in         True        True      7
2  5   janani.g  jan232@gmail.com        False        True      3
3  9  hajacob.h     jacob@hoi.com        False       False      0
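The same leading-character count can also be sketched with the standard library's `os.path.commonprefix`, which compares strings character by character (despite its name, it does not interpret them as paths). `prefix_count` and its `min_len` parameter are hypothetical names; the threshold of 3 mirrors the answer's rule.

```python
import os

import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)

def prefix_count(row, min_len=3):
    # Length of the shared leading run of characters between the
    # email's local part and the username
    name = row['email'].split('@')[0]
    n = len(os.path.commonprefix([name, row['username']]))
    # Keep the count only when it reaches the hypothetical minimum
    return n if n >= min_len else 0

df1['count'] = df1.apply(prefix_count, axis=1)
print(df1[['ID', 'username', 'email', 'count']])
```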

Answer 1 (score 0), 2021-01-09 13:42:29
