pandas dataframe - matching and grouping strings in two columns

Question

I have a pandas DataFrame with strings in two columns. For each column, I want to find all strings that are identical except for their numeric digits, and add new columns in which each group of similar text is replaced by an index value.

From this:

Id    Name1    Name2
0     Alpha 1  Bravo 3
1     Alpha 2  Zero 2
2     Bravo 3  Alpha 1

To this:

Id    Name1    Name2    NewCol1    NewCol2
0     Alpha 1  Bravo 3  1          2
1     Alpha 2  Zero 2   1          3
2     Bravo 3  Alpha 1  2          1

Is there a simple solution to this without a big iteration loop?

Answer 1

I think you need to create a Series with a MultiIndex using stack, remove the digits, number the categories with factorize, and finally unstack and join the result back to the original:

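import pandas as pd
df = pd.DataFrame({'Id': [0, 1, 2],
                   'Name1': ['Alpha 1', 'Alpha 2', 'Bravo 3'],
                   'Name2': ['Bravo 3', 'Zero 2', 'Alpha 1']})
# One Series indexed by (Id, column name), with the digits removed from every value
s = df.set_index('Id').stack().str.replace(r'\d+', '', regex=True)

# factorize numbers the distinct texts from 0 in order of appearance; +1 starts the ids at 1
df = df.join(pd.Series(pd.factorize(s)[0] + 1, index=s.index).unstack().add_prefix('New'), on='Id')
print(df)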
   Id    Name1    Name2  NewName1  NewName2
0   0  Alpha 1  Bravo 3         1         2
1   1  Alpha 2   Zero 2         1         3
2   2  Bravo 3  Alpha 1         2         1

Details:

print (s)
Id       
0   Name1    Alpha 
    Name2    Bravo 
1   Name1    Alpha 
    Name2     Zero 
2   Name1    Bravo 
    Name2    Alpha 
dtype: object

print (pd.factorize(s)[0] + 1)
[1 2 1 3 2 1]
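A variant of the same idea, sketched here as an assumption rather than taken from either answer, skips stack/unstack entirely: flatten the two columns row by row with NumPy's ravel, factorize once, and reshape the codes back into two columns. It also adds str.strip() (not in the answer above) so the group labels carry no trailing space, and keeps the factorize uniques as an id-to-text lookup. The names `cols`, `text`, `new` and `lookup` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Id': [0, 1, 2],
                   'Name1': ['Alpha 1', 'Alpha 2', 'Bravo 3'],
                   'Name2': ['Bravo 3', 'Zero 2', 'Alpha 1']})
cols = ['Name1', 'Name2']

# Remove the digits, then trim the leftover whitespace (e.g. 'Alpha 1' -> 'Alpha')
text = df[cols].apply(lambda col: col.str.replace(r'\d+', '', regex=True).str.strip())

# ravel() flattens row by row (row 0 Name1, row 0 Name2, row 1 Name1, ...),
# the same order stack() produces, so the ids match the stack-based answer
codes, uniques = pd.factorize(text.to_numpy().ravel())

# Reshape the 0-based codes to one column per name column and make them 1-based
new = pd.DataFrame((codes + 1).reshape(len(df), len(cols)),
                   index=df.index, columns=['New' + c for c in cols])
df = df.join(new)

lookup = dict(enumerate(uniques.tolist(), start=1))  # id -> text
print(df)
print(lookup)  # {1: 'Alpha', 2: 'Bravo', 3: 'Zero'}
```

Because ravel uses the same row-major order as stack, this should give the same NewName1/NewName2 values as the answer above; the lookup makes it easy to see which text each id stands for.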

Answer 2

You may need to use a loop, but only over the column names. For the values within each column, use the vectorized pandas.Series.str methods (the code below uses str.split):

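import pandas as pd
df = pd.DataFrame({'Name1': ['Alpha 1', 'Alpha 2', 'Bravo 3'], 'Name2': ['Bravo 3', 'Alpha 2', 'Alpha 1']})
# Loop over the column names only; each column is processed in one vectorized call
for name in df.columns.tolist():
    # Split each value on whitespace and keep the second piece (the number)
    df["newCol" + name.replace("Name", "")] = df[name].str.split(expand=True)[1]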
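The loop above fills each new column with the numeric part of the string (for example '1'), not the shared group id from the question's expected output. Below is a minimal sketch that keeps the loop over column names but produces those ids, assuming (as in the question) that the digits are the only part that differs between similar strings: build one text-to-id mapping over both columns first, then map each column. It uses 'Zero 2' in Name2 to match the expected output; the names `name_cols`, `text` and `mapping` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name1': ['Alpha 1', 'Alpha 2', 'Bravo 3'],
                   'Name2': ['Bravo 3', 'Zero 2', 'Alpha 1']})
name_cols = [c for c in df.columns if c.startswith('Name')]

# Text part of every value: digits removed, surrounding whitespace trimmed
text = {c: df[c].str.replace(r'\d+', '', regex=True).str.strip() for c in name_cols}

# One mapping shared by all columns: text -> 1-based id in order of first appearance,
# reading the table row by row (row 0 Name1, row 0 Name2, row 1 Name1, ...)
row_major = pd.concat(text, axis=1).to_numpy().ravel()
mapping = {t: i for i, t in enumerate(pd.unique(row_major).tolist(), start=1)}

# The loop runs once per column; each column is converted with one vectorized map()
for c in name_cols:
    df['newCol' + c.replace('Name', '')] = text[c].map(mapping)
print(df)
```

Building the mapping before the loop is what keeps the ids consistent across columns: 'Alpha' gets the same id whether it appears in Name1 or Name2.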

2 answers

Answer 1: score 3, 2018-06-12 07:18:07

Answer 2: score 0, 2018-06-12 07:29:54
