Pandas: merging multiple columns at once between two DataFrames

Question

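I'm trying to find a way to merge multiple columns at once with Pandas. I can get the output I want by doing five separate merges, but it feels like there should be a more Pythonic way to do it.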

Essentially, I have a DataFrame called df_striking with five keyword columns, and I'm trying to merge search volume data from another DataFrame (called df_keyword_vol) into the adjacent "Vol" columns.

Minimal reproducible example:

import pandas as pd

striking_data = {
    "KW1": ["nectarine", "apricot", "plum"],
    "KW1 Vol": ["", "", ""],
    "KW2": ["apple", "orange", "pear"],
    "KW2 Vol": ["", "", ""],
    "KW3": ["banana", "grapefruit", "cherry"],
    "KW3 Vol": ["", "", ""],
    "KW4": ["kiwi", "lemon", "peach"],
    "KW4 Vol": ["", "", ""],
    "KW5": ["raspberry", "blueberry", "berries"],
    "KW5 Vol": ["", "", ""],
}

df_striking = pd.DataFrame(striking_data)

keyword_vol_data = {
    "Keyword": [
        "nectarine",
        "apricot",
        "plum",
        "apple",
        "orange",
        "pear",
        "banana",
        "grapefruit",
        "cherry",
        "kiwi",
        "lemon",
        "peach",
        "raspberry",
        "blueberry",
        "berries",
    ],
    "Volume": [
        1000,
        500,
        200,
        600,
        800,
        1000,
        450,
        10,
        900,
        1200,
        150,
        700,
        400,
        850,
        1000,
    ],
}

df_keyword_vol = pd.DataFrame(keyword_vol_data)

Desired output: each KWn Vol column filled with the search volume of the keyword in the matching KWn column.

What I've tried: I wrote two functions to merge in the keyword data one keyword column at a time, but it's not very Pythonic!

# two functions to merge in the keyword volume data for KWs 1 - 5
def merger(col1, col2):
    dx = df_striking.merge(df_keyword_vol, how='left', left_on=col1, right_on=col2)
    return dx

def volume(vol1, vol2):
    vol = df_striking[vol1] = df_striking[vol2]
    df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
    return vol

df_striking = merger("KW1", "Keyword")
volume("KW1 Vol", "Volume")
df_striking = merger("KW2", "Keyword")
volume("KW2 Vol", "Volume")
df_striking = merger("KW3", "Keyword")
volume("KW3 Vol", "Volume")
df_striking = merger("KW4", "Keyword")
volume("KW4 Vol", "Volume")
df_striking = merger("KW5", "Keyword")
volume("KW5 Vol", "Volume")

Answer 1

If you already have the empty columns, you can use:

mapping = df_keyword_vol.set_index('Keyword')['Volume']

df_striking.iloc[:, 1::2] = df_striking.iloc[:, ::2].replace(mapping)
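
The positional slices above depend on the keyword and volume columns strictly alternating. As a rough sketch of the same lookup addressed by column name instead, here is a version on a cut-down two-keyword copy of the question's data. It swaps in Series.map for replace (map leaves NaN where a keyword is missing from the lookup, whereas replace would leave the keyword text in place); the names result and kw_cols are only illustrative.

```python
import pandas as pd

# Cut-down copy of the question's data (two keyword columns instead of five).
df_striking = pd.DataFrame({
    "KW1": ["nectarine", "apricot", "plum"],
    "KW1 Vol": ["", "", ""],
    "KW2": ["apple", "orange", "pear"],
    "KW2 Vol": ["", "", ""],
})
df_keyword_vol = pd.DataFrame({
    "Keyword": ["nectarine", "apricot", "plum", "apple", "orange", "pear"],
    "Volume": [1000, 500, 200, 600, 800, 1000],
})

# Lookup Series: index is the keyword, value is its search volume.
mapping = df_keyword_vol.set_index("Keyword")["Volume"]

# Fill each "<KW> Vol" column from its "<KW>" column by name, not position.
result = df_striking.copy()
kw_cols = [c for c in result.columns if not c.endswith(" Vol")]
for col in kw_cols:
    result[f"{col} Vol"] = result[col].map(mapping)

print(result)
```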

Otherwise, if you only have the KWx columns (here in a DataFrame called df):

df2 = (pd.concat([df, df.replace(mapping)], axis=1)
         .sort_index(axis=1)
       )

Output:

         KW1   KW1     KW2   KW2         KW3  KW3    KW4   KW4        KW5   KW5
0  nectarine  1000   apple   600      banana  450   kiwi  1200  raspberry   400
1    apricot   500  orange   800  grapefruit   10  lemon   150  blueberry   850
2       plum   200    pear  1000      cherry  900  peach   700    berries  1000
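
Note that the concat approach leaves each volume column with the same name as its keyword column. If distinct names are wanted, one possible variation (a sketch on a reduced two-column frame, not the answerer's code) is to build the volume columns separately and tag them with add_suffix before concatenating. The names df, volumes and df2 are placeholders.

```python
import pandas as pd

# Frame that only has the keyword columns (reduced to two for brevity).
df = pd.DataFrame({
    "KW1": ["nectarine", "apricot", "plum"],
    "KW2": ["apple", "orange", "pear"],
})
# Keyword -> volume lookup, equivalent to df_keyword_vol.set_index('Keyword')['Volume'].
mapping = pd.Series({
    "nectarine": 1000, "apricot": 500, "plum": 200,
    "apple": 600, "orange": 800, "pear": 1000,
})

# Look up every keyword column, then give the results " Vol" names.
volumes = df.apply(lambda col: col.map(mapping)).add_suffix(" Vol")

# Side-by-side concat; sorting the labels interleaves "KWn" with "KWn Vol".
df2 = pd.concat([df, volumes], axis=1).sort_index(axis=1)

print(df2)
```

The label sort is plain string ordering, so it interleaves correctly for KW1 to KW9 but would need a custom order if there were ten or more keyword columns.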

Answer 2

It's easier if you convert it all to long format:

>>> striking = df_striking.filter(regex='KW[0-9]*$').stack().rename('Keyword').reset_index()
>>> joined = striking.merge(df_keyword_vol)
>>> joined
  level_0 level_1     Keyword  Volume
0       0     KW1   nectarine    1000
1       0     KW2       apple     600
2       0     KW3      banana     450
3       0     KW4        kiwi    1200
4       0     KW5   raspberry     400
5       1     KW1     apricot     500
6       1     KW2      orange     800
7       1     KW3  grapefruit      10
8       1     KW4       lemon     150
9       1     KW5   blueberry     850
10      2     KW1        plum     200
11      2     KW2        pear    1000
12      2     KW3      cherry     900
13      2     KW4       peach     700
14      2     KW5     berries    1000

You can then use .pivot to get back to the original format, but with a MultiIndex for the columns:

>>> joined.pivot(index='level_0', columns='level_1', values=['Keyword', 'Volume'])
           Keyword                                       Volume                       
level_1        KW1     KW2         KW3    KW4        KW5    KW1   KW2  KW3   KW4   KW5
level_0                                                                               
0        nectarine   apple      banana   kiwi  raspberry   1000   600  450  1200   400
1          apricot  orange  grapefruit  lemon  blueberry    500   800   10   150   850
2             plum    pear      cherry  peach    berries    200  1000  900   700  1000

We can work around this awkward format with pd.concat:

>>> pd.concat([
...     joined.pivot(index='level_0', columns='level_1', values='Keyword'),
...     joined.pivot(index='level_0', columns='level_1', values='Volume').add_suffix(' Vol')
... ], axis='columns').sort_index(axis='columns')
level_1        KW1  KW1 Vol     KW2  KW2 Vol         KW3  KW3 Vol    KW4  KW4 Vol        KW5  KW5 Vol
level_0                                                                                              
0        nectarine     1000   apple      600      banana      450   kiwi     1200  raspberry      400
1          apricot      500  orange      800  grapefruit       10  lemon      150  blueberry      850
2             plum      200    pear     1000      cherry      900  peach      700    berries     1000
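
For reference, the whole long-format round trip can be put together as one script. This is only a sketch on a two-keyword subset of the data. It uses melt in place of stack to reach the long format, passes the pivot arguments by keyword, and does a left merge so unmatched keywords would be kept. The names row, slot, long_df and wide are made up for the example.

```python
import pandas as pd

df_striking = pd.DataFrame({
    "KW1": ["nectarine", "apricot", "plum"],
    "KW1 Vol": ["", "", ""],
    "KW2": ["apple", "orange", "pear"],
    "KW2 Vol": ["", "", ""],
})
df_keyword_vol = pd.DataFrame({
    "Keyword": ["nectarine", "apricot", "plum", "apple", "orange", "pear"],
    "Volume": [1000, 500, 200, 600, 800, 1000],
})

# Long format: one row per (original row, keyword column) pair.
long_df = (
    df_striking.filter(regex=r"^KW\d+$")   # keep KW1, KW2, ... only
    .rename_axis("row")
    .reset_index()
    .melt(id_vars="row", var_name="slot", value_name="Keyword")
)

# A single merge attaches the volume to every keyword at once.
joined = long_df.merge(df_keyword_vol, on="Keyword", how="left")

# Back to wide format, interleaving keyword and volume columns.
wide = pd.concat(
    [
        joined.pivot(index="row", columns="slot", values="Keyword"),
        joined.pivot(index="row", columns="slot", values="Volume").add_suffix(" Vol"),
    ],
    axis="columns",
).sort_index(axis="columns")

print(wide)
```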

Answer 3

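# No lookup is done here: this assumes df_keyword_vol lists the keywords
# column by column in the same order as df_striking, three rows per column.
pd.concat(
    [
        group.reset_index(drop=True).drop('col1', axis=1)
        for _, group in
        df_keyword_vol.assign(col1=df_keyword_vol.index // 3).groupby('col1')
    ],
    axis=1,
).set_axis(df_striking.columns, axis=1)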


         KW1  KW1 Vol     KW2  KW2 Vol         KW3  KW3 Vol    KW4  KW4 Vol        KW5  KW5 Vol
0  nectarine     1000   apple      600      banana      450   kiwi     1200  raspberry      400
1    apricot      500  orange      800  grapefruit       10  lemon      150  blueberry      850
2       plum      200    pear     1000      cherry      900  peach      700    berries     1000
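
Since this reshapes df_keyword_vol purely by position, it only gives the right answer when the keyword list happens to be in the same column-by-column order as df_striking. A quick way to check that before relying on it might look like the following. The helper positional_reshape_is_safe is a hypothetical name, and the data is a two-keyword subset of the question's.

```python
import pandas as pd


def positional_reshape_is_safe(striking: pd.DataFrame, keyword_vol: pd.DataFrame) -> bool:
    """Return True if keyword_vol lists keywords column by column as in striking."""
    kw_cols = striking.filter(regex=r"^KW\d+$")
    # Flatten the keyword columns in column-major order (all of KW1, then KW2, ...).
    expected = kw_cols.to_numpy().ravel(order="F")
    actual = keyword_vol["Keyword"].to_numpy()
    if len(actual) != len(expected):
        return False
    return bool((actual == expected).all())


df_striking = pd.DataFrame({
    "KW1": ["nectarine", "apricot", "plum"],
    "KW1 Vol": ["", "", ""],
    "KW2": ["apple", "orange", "pear"],
    "KW2 Vol": ["", "", ""],
})
df_keyword_vol = pd.DataFrame({
    "Keyword": ["nectarine", "apricot", "plum", "apple", "orange", "pear"],
    "Volume": [1000, 500, 200, 600, 800, 1000],
})

print(positional_reshape_is_safe(df_striking, df_keyword_vol))
```

If the check fails, a keyword-based lookup such as the ones in the other answers is the safer route.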
