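How do I merge two data frames in pandas on a common column that has similar (but not identical) values?
I am trying to merge two DataFrames in pandas on a common column that holds geographic area names. The values in that column are similar but not identical. For example, one DataFrame has the value London while the other has London / Greater London. They are treated as different values, but the merge should treat them as the same.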
In[1]:
import pandas as pd
df1 = pd.DataFrame([['London', 2], ['Bristol', 3], ['Liverpool', 6]], columns=['Area', 'B'])
df2 = pd.DataFrame([['London / Greater London', 7], ['Bristol_', 9], ['Liverpool / Liverpool', 1]], columns=['Area', 'B'])
df_merged = pd.merge(df1, df2, on="Area", indicator=True, how='outer')
df_merged
Out[1]:
Area B_x B_y _merge
0 London 2.0 NaN left_only
1 Bristol 3.0 NaN left_only
2 Liverpool 6.0 NaN left_only
3 London / Greater London NaN 7.0 right_only
4 Bristol_ NaN 9.0 right_only
5 Liverpool / Liverpool NaN 1.0 right_only
The desired output looks like this:
Out[1]:
Area B_x B_y _merge
0 London 2.0 7.0 both
1 Bristol 3.0 9.0 both
2 Liverpool 6.0 1.0 both
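Is there a way to merge the two DataFrames based on some degree of similarity between the values, so that London and London / Greater London are treated as the same value? Thanks!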
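You can first use np.where() to create two arrays holding the indices of the overlapping City and Area values. Here City means a value from df1['Area'] and Area means a value from df2['Area']. I use a list comprehension to check whether each City is contained in (in) each entry of the list of Areas, and save the indices of the matches.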
Note: this only works if the Area string contains the City string (i.e. London matches London / Greater London only because that area contains the word London).
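To make that caveat concrete, here is a minimal sketch of the containment test on plain Python lists. The names cities, areas, mask and pairs are illustrative, and 'Manchester' is a made-up entry that does not appear in the question's data; it is there only to show a value that finds no match.

```python
import numpy as np

# 'Manchester' is a hypothetical extra entry, not part of the question's data
cities = ['London', 'Bristol', 'Manchester']
areas = ['London / Greater London', 'Bristol_', 'Liverpool / Liverpool']

# mask[r][c] is True when cities[r] is a substring of areas[c]
mask = [[city in area for area in areas] for city in cities]

# With a single 2-D boolean argument, np.where returns the row indices
# and the column indices of the True cells
i, j = np.where(mask)

# Pair each matched city with the area that contains it
pairs = [(cities[r], areas[c]) for r, c in zip(i, j)]
print(pairs)
```

Manchester drops out because no area string contains it. The test is also case-sensitive, so 'london' would not match 'London / Greater London' either.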
Code:
import numpy as np
import pandas as pd

# Rename column B (present in both DataFrames) to B_x and B_y
df1 = pd.DataFrame([['London', 2], ['Bristol', 3], ['Liverpool', 6]], columns=['Area', 'B_x'])
df2 = pd.DataFrame([['London / Greater London', 7], ['Bristol_', 9], ['Liverpool / Liverpool', 1]], columns=['Area', 'B_y'])
# Find the row indices (i in df1, j in df2) where the df1 name is a substring of the df2 name
i, j = np.where([[city in area for area in df2['Area'].values] for city in df1['Area'].values])
# Build a new DataFrame from the matched rows, placed side by side
pd.DataFrame(np.column_stack([df1.iloc[i], df2.iloc[j]]), columns=df1.columns.append(df2.columns))
Result:
Area B_x Area B_y
0 London 2 London / Greater London 7
1 Bristol 3 Bristol_ 9
2 Liverpool 6 Liverpool / Liverpool 1
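The result above keeps both Area columns. If the goal is a frame closer to the desired output in the question, with one Area column next to B_x and B_y, one possible follow-up is sketched below. It is not part of the original answer; it reuses the same index arrays, and the names left, right and merged are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([['London', 2], ['Bristol', 3], ['Liverpool', 6]], columns=['Area', 'B_x'])
df2 = pd.DataFrame([['London / Greater London', 7], ['Bristol_', 9], ['Liverpool / Liverpool', 1]], columns=['Area', 'B_y'])

# Same substring-based index matching as in the answer
i, j = np.where([[city in area for area in df2['Area']] for city in df1['Area']])

# Keep df1's short names, take only B_y from df2, and reset both indexes
# so that concat aligns the matched rows by position
left = df1.iloc[i].reset_index(drop=True)
right = df2.iloc[j][['B_y']].reset_index(drop=True)
merged = pd.concat([left, right], axis=1)
print(merged)
```

Unlike pd.merge with how='outer', this keeps only the rows that found a match and adds no _merge indicator column.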