Partial string match on both sides of columns in pandas

Question

[Code]

import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)
print(df1)

# Collect every row whose email starts with the username of some row.
df = pd.DataFrame()
for idx, row in df1.iterrows():
    matched = df1[df1['email'].str.startswith(row['username'])]
    if not matched.empty:
        df = pd.concat([df, matched])
print(df)

With the code above, I can filter all rows that partially match on the right-hand column (i.e. email => username).

Current output:

But I also want the reverse match (i.e. username => email), as shown below.

Expected output:

Thanks in advance,

Answer 1

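Something like this works. The reverse task requires some minimum condition to match on; in this case, three consecutive matching characters at the start of both strings.

Hope this gets you started in the right direction.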


import pandas as pd

d = {
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
}
df1 = pd.DataFrame(d)


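# True where the email starts with the full username.
df1['email_match'] = df1.apply(lambda x: x['email'].startswith(x['username']), axis=1)
# Reverse direction: True where the username starts with the first three characters of the email.
df1['user_match'] = df1.apply(lambda x: x['username'].startswith(x['email'][:3]), axis=1)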

print(df1)

  ID   username             email  email_match  user_match
0  1    haabi.g     abi@gmail.com        False       False
1  4    pugal.g  pugal.g@yahoo.in         True        True
2  5   janani.g  jan232@gmail.com        False        True
3  9  hajacob.h     jacob@hoi.com        False       False
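
The minimum-match condition described above can be made adjustable. The following is only a minimal sketch and not part of the original answer: the helper name `prefix_match` and its `min_len` parameter are hypothetical, and it assumes the comparison should use the part of the email before the '@' rather than the raw first three characters.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
})


def prefix_match(username, email, min_len=3):
    """Hypothetical helper: does username start with the first
    min_len characters of the email's local part (before '@')?"""
    local = email.split('@')[0]
    if len(local) < min_len:
        # Assumption: a local part shorter than min_len never matches.
        return False
    return username.startswith(local[:min_len])


# Row-wise comparison without DataFrame.apply.
df1['user_match'] = [
    prefix_match(u, e) for u, e in zip(df1['username'], df1['email'])
]

# Keep only the rows that satisfy the reverse match.
matches = df1[df1['user_match']]
print(matches)
```

With `min_len=3` this flags the same two rows as the `user_match` column above (IDs 4 and 5). Raising `min_len` makes the match stricter.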

You can add a counting mechanism to see how many consecutive leading characters match.


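def user_match(x):
    # Compare the email's local part (before '@') with the username, character by character.
    name = x['email'].split('@')[0]
    user = x['username']
    count = 0
    for a, b in zip(name, user):
        if a != b:
            break  # stop at the first mismatch
        count += 1
    # Report the run length only when at least three leading characters match;
    # shorter runs (1 or 2) count as no match instead of returning None.
    return count if count >= 3 else 0

df1['count'] = df1.apply(user_match, axis=1)
print(df1)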

  ID   username             email  email_match  user_match  count
0  1    haabi.g     abi@gmail.com        False       False      0
1  4    pugal.g  pugal.g@yahoo.in         True        True      7
2  5   janani.g  jan232@gmail.com        False        True      3
3  9  hajacob.h     jacob@hoi.com        False       False      0
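
The same count can also be obtained without a hand-written loop. As a rough sketch (not from the original answer), the standard library's `os.path.commonprefix` returns the longest character-wise common prefix of a list of strings. The helper name `shared_prefix_len` and the `min_len` parameter are hypothetical.

```python
import os.path

import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '4', '5', '9'],
    'username': ['haabi.g', 'pugal.g', 'janani.g', 'hajacob.h'],
    'email': ['abi@gmail.com', 'pugal.g@yahoo.in', 'jan232@gmail.com', 'jacob@hoi.com'],
})


def shared_prefix_len(username, email, min_len=3):
    """Hypothetical helper: length of the prefix shared by username and
    the email's local part, or 0 if it is shorter than min_len."""
    local = email.split('@')[0]
    # commonprefix compares character by character, not path component by component.
    common = os.path.commonprefix([username, local])
    return len(common) if len(common) >= min_len else 0


df1['count'] = [
    shared_prefix_len(u, e) for u, e in zip(df1['username'], df1['email'])
]
print(df1[['username', 'email', 'count']])
```

For the sample data this gives the same counts as the table above (0, 7, 3, 0). Filtering with `df1[df1['count'] > 0]` would then keep only the rows that pass the reverse match.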

Solution 1
0 2021-01-09 13:42:29
