[英]Pandas Merging Multiple Columns at the Same Between Two Dataframes
我试图找到一种方法来同时合并多个列与 Pandas。我有我想要的 output 通过进行五个单独的合并,但感觉应该有一个更 pythonic 的方法来做到这一点。
本质上,我有一个 dataframe,在一个名为 df_striking 的 dataframe 中有五个关键字列,我试图将来自另一个 dataframe(称为 df_keyword_vol)的搜索量数据合并到相邻的行中。
最小可重现示例:
import pandas as pd
striking_data = {
"KW1": ["nectarine", "apricot", "plum"],
"KW1 Vol": ["", "", ""],
"KW2": ["apple", "orange", "pear"],
"KW2 Vol": ["", "", ""],
"KW3": ["banana", "grapefruit", "cherry"],
"KW3 Vol": ["", "", ""],
"KW4": ["kiwi", "lemon", "peach"],
"KW4 Vol": ["", "", ""],
"KW5": ["raspberry", "blueberry", "berries"],
"KW5 Vol": ["", "", ""],
}
df_striking = pd.DataFrame(striking_data)
keyword_vol_data = {
"Keyword": [
"nectarine",
"apricot",
"plum",
"apple",
"orange",
"pear",
"banana",
"grapefruit",
"cherry",
"kiwi",
"lemon",
"peach",
"raspberry",
"blueberry",
"berries",
],
"Volume": [
1000,
500,
200,
600,
800,
1000,
450,
10,
900,
1200,
150,
700,
400,
850,
1000,
],
}
df_keyword_vol = pd.DataFrame(keyword_vol_data)
所需 Output
我试过的。 我做了两个函数来一次合并一行关键字数据,但它不是很pythonic!
# two functions to merge in the keyword volume data for KWs 1 - 5
def merger(col1, col2):
dx = df_striking.merge(df_keyword_vol, how='left', left_on=col1, right_on=col2)
return dx
def volume(vol1, vol2):
vol = df_striking[vol1] = df_striking[vol2]
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
return vol
df_striking = merger("KW1", "Keyword")
volume("KW1 Vol", "Volume")
df_striking = merger("KW2", "Keyword")
volume("KW2 Vol", "Volume")
df_striking = merger("KW3", "Keyword")
volume("KW3 Vol", "Volume")
df_striking = merger("KW4", "Keyword")
volume("KW4 Vol", "Volume")
df_striking = merger("KW5", "Keyword")
volume("KW5 Vol", "Volume")
如果您已经有空列,则可以使用:
mapping = df_keyword_vol.set_index('Keyword')['Volume']
df_striking.iloc[:, 1::2] = df_striking.iloc[:, ::2].replace(mapping)
否则,如果您只有KWx
列:
df2 = (pd.concat([df, df.replace(mapping)], axis=1)
.sort_index(axis=1)
)
output:
KW1 KW1 KW2 KW2 KW3 KW3 KW4 KW4 KW5 KW5
0 nectarine 1000 apple 600 banana 450 kiwi 1200 raspberry 400
1 apricot 500 orange 800 grapefruit 10 lemon 150 blueberry 850
2 plum 200 pear 1000 cherry 900 peach 700 berries 1000
如果将其全部转换为长格式会更容易:
>>> striking = df_striking.filter(regex='KW[0-9]*$').stack().rename('Keyword').reset_index()
>>> joined = striking.merge(df_keyword_vol)
>>> joined
level_0 level_1 Keyword Volume
0 0 KW1 nectarine 1000
1 0 KW2 apple 600
2 0 KW3 banana 450
3 0 KW4 kiwi 1200
4 0 KW5 raspberry 400
5 1 KW1 apricot 500
6 1 KW2 orange 800
7 1 KW3 grapefruit 10
8 1 KW4 lemon 150
9 1 KW5 blueberry 850
10 2 KW1 plum 200
11 2 KW2 pear 1000
12 2 KW3 cherry 900
13 2 KW4 peach 700
14 2 KW5 berries 1000
然后您可以使用.pivot
获得原始格式,但使用多索引作为列:
>>> joined.pivot('index', 'level_1', ['Keyword', 'Volume'])
Keyword Volume
level_1 KW1 KW2 KW3 KW4 KW5 KW1 KW2 KW3 KW4 KW5
index
0 nectarine apple banana kiwi raspberry 1000 600 450 1200 400
1 apricot orange grapefruit lemon blueberry 500 800 10 150 850
2 plum pear cherry peach berries 200 1000 900 700 1000
我们可以使用pd.concat
绕过这种奇怪的格式:
>>> pd.concat([
... joined.pivot('index', 'level_1', 'Keyword'),
... joined.pivot('index', 'level_1', 'Volume').add_suffix(' Vol')
... ], axis='columns').sort_index(axis='columns')
level_1 KW1 KW1 Vol KW2 KW2 Vol KW3 KW3 Vol KW4 KW4 Vol KW5 KW5 Vol
index
0 nectarine 1000 apple 600 banana 450 kiwi 1200 raspberry 400
1 apricot 500 orange 800 grapefruit 10 lemon 150 blueberry 850
2 plum 200 pear 1000 cherry 900 peach 700 berries 1000
pd.concat([v.reset_index(drop=True).drop('col1',axis=1)
for k,v in
df_keyword_vol.assign(col1=df_keyword_vol.index//3)
.groupby('col1')]
,axis=1)\
.set_axis(df_striking.columns,axis=1)
KW1 KW1 KW2 KW2 KW3 KW3 KW4 KW4 KW5 KW5
0 nectarine 1000 apple 600 banana 450 kiwi 1200 raspberry 400
1 apricot 500 orange 800 grapefruit 10 lemon 150 blueberry 850
2 plum 200 pear 1000 cherry 900 peach 700 berries 1000
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.