Copying specific columns from one row of a Pandas DataFrame to another using .to_numpy()

Question

I have a DataFrame like this:

     UniqueID  CST  WEIGHT  VOLUME  PRODUCTIVITY
0  413-20012    3     123      12          1113
1  413-45365    1     889      75          6748
2  413-21165    8     554      13          4536
3  413-24354    1     387      35          7649
4  413-34658    2     121      88          2468
5  413-36889    4     105      76          3336
6  413-23457    5     355      42          7894
7  413-30089    5     146      10          9112
8  413-41158    5     453      91          4545
9  413-51015    9     654      66          2232

I have a dictionary of parent:child mappings for UniqueID:

parent_child_dict = {
    '413-51015': '413-41158',
    '413-21165': '413-23457',
    '413-45365': '413-41158',
    '413-20012': '413-23457'
}

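What I need to do is loop through the DataFrame and replace the WEIGHT, VOLUME, and PRODUCTIVITY values of each "child" UniqueID row with the values from its "parent" UniqueID row, so that the resulting DataFrame looks like this: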

     UniqueID  CST  WEIGHT  VOLUME  PRODUCTIVITY
0  413-20012    3     355      42          7894
1  413-45365    1     453      91          4545
2  413-21165    8     355      42          7894
3  413-24354    1     387      35          7649
4  413-34658    2     121      88          2468
5  413-36889    4     105      76          3336
6  413-23457    5     355      42          7894
7  413-30089    5     146      10          9112
8  413-41158    5     453      91          4545
9  413-51015    9     453      91          4545

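I have tried several possible solutions. The problem I keep running into is restricting the copy so that it keeps the "child" row's UniqueID and CST values but copies the other values.

The closest I have gotten is a loop through the dictionary, where each pair is fed into: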

df.loc[df['UniqueID'] == '413-51015'] = df.loc[df['UniqueID'] == '413-41158'].to_numpy()

This seems to replace every value in one row with the values from the other row, including UniqueID and CST.
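
One way to restrict that row-to-row copy might be to pass a column list as the second argument to .loc on both sides. This is only a minimal sketch: the names cols, target_id and source_id are introduced here for illustration, and it assumes every UniqueID appears exactly once and that each dictionary key is the row to overwrite while its value is the row to copy from.

```python
import pandas as pd

df = pd.DataFrame({
    'UniqueID': ['413-20012', '413-45365', '413-21165', '413-24354', '413-34658',
                 '413-36889', '413-23457', '413-30089', '413-41158', '413-51015'],
    'CST': [3, 1, 8, 1, 2, 4, 5, 5, 5, 9],
    'WEIGHT': [123, 889, 554, 387, 121, 105, 355, 146, 453, 654],
    'VOLUME': [12, 75, 13, 35, 88, 76, 42, 10, 91, 66],
    'PRODUCTIVITY': [1113, 6748, 4536, 7649, 2468, 3336, 7894, 9112, 4545, 2232],
})

parent_child_dict = {
    '413-51015': '413-41158',
    '413-21165': '413-23457',
    '413-45365': '413-41158',
    '413-20012': '413-23457',
}

# Only these columns are copied; UniqueID and CST are left alone (illustrative name)
cols = ['WEIGHT', 'VOLUME', 'PRODUCTIVITY']

for target_id, source_id in parent_child_dict.items():
    # .to_numpy() drops the row label so the values are assigned by position
    df.loc[df['UniqueID'] == target_id, cols] = (
        df.loc[df['UniqueID'] == source_id, cols].to_numpy()
    )

print(df)
```

Because none of the source rows in this dictionary is itself a target, the order of the loop does not matter here; with chained mappings it would.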

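Any help with that exception, or a better overall solution, would be much appreciated. Thank you.

Edit

I looped the first solution over the columns I want to change in the dataset, as follows: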

# Collect every column except the identifier columns that must stay unchanged
columns = [col for col in df.columns if col not in ('UniqueID', 'CST')]
print(columns)

Output

['WEIGHT', 'VOLUME', 'PRODUCTIVITY']

Then

for col in columns:
    s = df[['UniqueID', col]].set_index('UniqueID')
    df[col] = s.loc[df['UniqueID'].replace(parent_child_dict)].to_numpy()

This produced the desired dataset.
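
The per-column loop can probably be collapsed into a single assignment by looking up all the wanted columns at once. A minimal sketch under the same assumptions as the question (unique UniqueID values, dictionary keys are the rows to overwrite); the names cols, source_ids and lookup are made up for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'UniqueID': ['413-20012', '413-45365', '413-21165', '413-24354', '413-34658',
                 '413-36889', '413-23457', '413-30089', '413-41158', '413-51015'],
    'CST': [3, 1, 8, 1, 2, 4, 5, 5, 5, 9],
    'WEIGHT': [123, 889, 554, 387, 121, 105, 355, 146, 453, 654],
    'VOLUME': [12, 75, 13, 35, 88, 76, 42, 10, 91, 66],
    'PRODUCTIVITY': [1113, 6748, 4536, 7649, 2468, 3336, 7894, 9112, 4545, 2232],
})

parent_child_dict = {
    '413-51015': '413-41158',
    '413-21165': '413-23457',
    '413-45365': '413-41158',
    '413-20012': '413-23457',
}

cols = [col for col in df.columns if col not in ('UniqueID', 'CST')]

# For every row, the UniqueID whose values it should end up with
# (rows that are not dictionary keys simply point at themselves)
source_ids = df['UniqueID'].replace(parent_child_dict)

# Label-based lookup table keyed by UniqueID
lookup = df.set_index('UniqueID')[cols]

# One positional assignment for all selected columns
df[cols] = lookup.loc[source_ids.to_numpy()].to_numpy()

print(df)
```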

Answer 1

Use replace together with loc access:

s = df[['UniqueID','PRODUCTIVITY']].set_index('UniqueID')

# using to_numpy here :-)
df['PRODUCTIVITY'] = s.loc[df['UniqueID'].replace(parent_child_dict)].to_numpy()

Output:

    UniqueID  CST  WEIGHT  VOLUME  PRODUCTIVITY
0  413-20012    3     123      12          7894
1  413-45365    1     889      75          4545
2  413-21165    8     554      13          7894
3  413-24354    1     387      35          7649
4  413-34658    2     121      88          2468
5  413-36889    4     105      76          3336
6  413-23457    5     355      42          7894
7  413-30089    5     146      10          9112
8  413-41158    5     453      91          4545
9  413-51015    9     654      66          4545
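
As for why .to_numpy() matters here: pandas matches on index labels when a Series or DataFrame is assigned to a column, and the result of the lookup is labelled by UniqueID rather than by the frame's 0..n row index. The sketch below uses a trimmed, illustrative subset of the question's data to show the two indexes side by side:

```python
import pandas as pd

# Trimmed subset of the question's data, for illustration only
df = pd.DataFrame({
    'UniqueID': ['413-20012', '413-23457', '413-41158', '413-51015'],
    'PRODUCTIVITY': [1113, 7894, 4545, 2232],
})
parent_child_dict = {'413-51015': '413-41158', '413-20012': '413-23457'}

s = df.set_index('UniqueID')['PRODUCTIVITY']
looked_up = s.loc[df['UniqueID'].replace(parent_child_dict)]

# The lookup result carries UniqueID labels, not df's integer row labels
print(looked_up.index.tolist())
print(df.index.tolist())

# .to_numpy() discards those labels, so the values go in purely by position
df['PRODUCTIVITY'] = looked_up.to_numpy()
print(df)
```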

Answer 2

First, create a mapping from your UniqueID to PRODUCTIVITY.

Then map your IDs through the parent-child dictionary:

mapping = df.set_index('UniqueID')['PRODUCTIVITY'].to_dict()
df['PRODUCTIVITY'] = (
    df['UniqueID'].map(parent_child_dict).map(mapping).fillna(df['PRODUCTIVITY']).astype(int)
)
print(df)
    UniqueID  CST  WEIGHT  VOLUME  PRODUCTIVITY
0  413-20012    3     123      12          7894
1  413-45365    1     889      75          4545
2  413-21165    8     554      13          7894
3  413-24354    1     387      35          7649
4  413-34658    2     121      88          2468
5  413-36889    4     105      76          3336
6  413-23457    5     355      42          7894
7  413-30089    5     146      10          9112
8  413-41158    5     453      91          4545
9  413-51015    9     654      66          4545
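
The same map-based idea could be extended to WEIGHT and VOLUME as well. This is a sketch rather than part of the answer above: cols, source_ids and lookup are names chosen for this example, and UniqueID is assumed to be unique. Rows without an entry in the dictionary map to NaN, which is why fillna and the cast back to the original dtype are still needed.

```python
import pandas as pd

df = pd.DataFrame({
    'UniqueID': ['413-20012', '413-45365', '413-21165', '413-24354', '413-34658',
                 '413-36889', '413-23457', '413-30089', '413-41158', '413-51015'],
    'CST': [3, 1, 8, 1, 2, 4, 5, 5, 5, 9],
    'WEIGHT': [123, 889, 554, 387, 121, 105, 355, 146, 453, 654],
    'VOLUME': [12, 75, 13, 35, 88, 76, 42, 10, 91, 66],
    'PRODUCTIVITY': [1113, 6748, 4536, 7649, 2468, 3336, 7894, 9112, 4545, 2232],
})

parent_child_dict = {
    '413-51015': '413-41158',
    '413-21165': '413-23457',
    '413-45365': '413-41158',
    '413-20012': '413-23457',
}

cols = ['WEIGHT', 'VOLUME', 'PRODUCTIVITY']

# NaN for every row whose UniqueID is not a key in the dictionary
source_ids = df['UniqueID'].map(parent_child_dict)

# Snapshot of the original values, keyed by UniqueID
lookup = df.set_index('UniqueID')

for col in cols:
    df[col] = (
        source_ids.map(lookup[col])   # value taken from the mapped row
        .fillna(df[col])              # unmapped rows keep their own value
        .astype(df[col].dtype)        # undo the float upcast caused by NaN
    )

print(df)
```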

Solution 1
2 Accepted 2020-03-19 21:42:29

Solution 2
0 2020-03-19 21:39:30
