Multiply a dataframe by a series when the index is duplicated and one column is excluded

Question

A shortened version of my dataframe looks like this:

import pandas as pd

df_crop = pd.DataFrame({
    'Name': ['Crop1', 'Crop1', 'Crop1', 'Crop1', 'Crop2', 'Crop2', 'Crop2', 'Crop2'],
    'Type': ['Area', 'Diesel', 'Fert', 'Pest', 'Area', 'Diesel', 'Fert', 'Pest'],
    'GHG':  [14.9, 0.0007, 0.145, 0.1611, 2.537, 0.011, 0.1825, 0.115],
    'Acid': [0.0125, 0.0005, 0.0029, 0.0044, 0.013, 0.00014, 0.0033, 0.0055],
    'Terra Eutro': [0.053, 0.0002, 0.0077, 0.0001, 0.0547, 0.00019, 0.0058, 0.0002]
})

I now need to normalize all values in the dataframe by yield, which differs per crop but is the same for every Type:

s_yield = pd.Series([0.388, 0.4129], 
                    index=['Crop1', 'Crop2'])

I need to keep the information in 'Type'. If I try to use .mul(), I get an error because of the duplicate index: ValueError: cannot reindex from a duplicate axis.

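My only other idea was to use .loc, but I have many columns (16 columns of values to normalize) and could not come up with anything efficient. Any suggestions?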
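For comparison, the .loc idea from the question can be written as a loop over the crops in s_yield. This is a minimal sketch, not the asker's code: it assumes every Name appears in s_yield and that every column other than Name and Type should be scaled. It gives the right numbers but does one assignment per crop, which is what the vectorized answers avoid.

```python
import pandas as pd

df_crop = pd.DataFrame({
    'Name': ['Crop1', 'Crop1', 'Crop1', 'Crop1', 'Crop2', 'Crop2', 'Crop2', 'Crop2'],
    'Type': ['Area', 'Diesel', 'Fert', 'Pest', 'Area', 'Diesel', 'Fert', 'Pest'],
    'GHG': [14.9, 0.0007, 0.145, 0.1611, 2.537, 0.011, 0.1825, 0.115],
    'Acid': [0.0125, 0.0005, 0.0029, 0.0044, 0.013, 0.00014, 0.0033, 0.0055],
    'Terra Eutro': [0.053, 0.0002, 0.0077, 0.0001, 0.0547, 0.00019, 0.0058, 0.0002],
})
s_yield = pd.Series([0.388, 0.4129], index=['Crop1', 'Crop2'])

# Value columns = everything except the two label columns
value_cols = df_crop.columns.drop(['Name', 'Type'])

result = df_crop.copy()
for crop, crop_yield in s_yield.items():
    # Boolean mask selects every row (every Type) of this crop
    mask = result['Name'] == crop
    result.loc[mask, value_cols] = result.loc[mask, value_cols] * crop_yield

print(result)
```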

Edit: the table below may help show what I am trying to achieve:

Answer 1

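Select the numeric columns and multiply them by the series, mapped onto the Name column so that there is one factor per row: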

numeric_df = df_crop.select_dtypes('number')
df_crop[numeric_df.columns] = numeric_df.mul(df_crop.Name.map(s_yield), axis=0)

Output

    Name    Type       GHG      Acid  Terra Eutro
0  Crop1    Area  5.781200  0.004850     0.020564
1  Crop1  Diesel  0.000272  0.000194     0.000078
2  Crop1    Fert  0.056260  0.001125     0.002988
3  Crop1    Pest  0.062507  0.001707     0.000039
4  Crop2    Area  1.047527  0.005368     0.022586
5  Crop2  Diesel  0.004542  0.000058     0.000078
6  Crop2    Fert  0.075354  0.001363     0.002395
7  Crop2    Pest  0.047483  0.002271     0.000083
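The key step in this answer is df_crop.Name.map(s_yield): it looks up each row's Name in the series index and returns one factor per row, indexed like the dataframe's own rows rather than by the duplicated crop names. A small sketch of just that step, using only the Name column from the question:

```python
import pandas as pd

names = pd.Series(['Crop1', 'Crop1', 'Crop1', 'Crop1',
                   'Crop2', 'Crop2', 'Crop2', 'Crop2'], name='Name')
s_yield = pd.Series([0.388, 0.4129], index=['Crop1', 'Crop2'])

# Look up each row's crop in s_yield; the result keeps the row index 0..7
factors = names.map(s_yield)
print(factors.tolist())
# [0.388, 0.388, 0.388, 0.388, 0.4129, 0.4129, 0.4129, 0.4129]
```

A Name that does not appear in s_yield would map to NaN, so the products for those rows would be NaN as well.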

Answer 2

As of pandas 0.24.0, you can merge a DataFrame with a Series directly, as long as the Series is named:

df_merged = df_crop.merge(s_yield.rename('yield'), left_on='Name', right_index=True)

Then multiply the columns as needed.
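The multiplication step is left open in this answer. One way to finish it, as a sketch that assumes the helper column is named yield (as in the merge call) and that every column other than Name, Type and yield should be scaled:

```python
import pandas as pd

df_crop = pd.DataFrame({
    'Name': ['Crop1', 'Crop1', 'Crop1', 'Crop1', 'Crop2', 'Crop2', 'Crop2', 'Crop2'],
    'Type': ['Area', 'Diesel', 'Fert', 'Pest', 'Area', 'Diesel', 'Fert', 'Pest'],
    'GHG': [14.9, 0.0007, 0.145, 0.1611, 2.537, 0.011, 0.1825, 0.115],
    'Acid': [0.0125, 0.0005, 0.0029, 0.0044, 0.013, 0.00014, 0.0033, 0.0055],
    'Terra Eutro': [0.053, 0.0002, 0.0077, 0.0001, 0.0547, 0.00019, 0.0058, 0.0002],
})
s_yield = pd.Series([0.388, 0.4129], index=['Crop1', 'Crop2'])

# Attach the yield to each row by matching Name against the series index
df_merged = df_crop.merge(s_yield.rename('yield'), left_on='Name', right_index=True)

# Scale every value column by the per-row yield, then drop the helper column
value_cols = df_merged.columns.drop(['Name', 'Type', 'yield'])
df_merged[value_cols] = df_merged[value_cols].mul(df_merged['yield'], axis=0)
df_merged = df_merged.drop(columns='yield')

print(df_merged)
```

Because the yield ends up as an ordinary column of the same frame, the multiplication aligns row by row and the duplicated crop names never have to be used as an index.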

Answer 3

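You can use df_crop['Name'].map(s_yield) to expand the series to the length of the dataframe, use df.select_dtypes to find all columns of the given dtype(s), and multiply those columns: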

cols = df_crop.select_dtypes('number').columns
df_crop[cols] = df_crop[cols].mul(df_crop['Name'].map(s_yield), axis=0)

Output:

>>> df_crop
    Name    Type       GHG      Acid  Terra Eutro
0  Crop1    Area  5.781200  0.004850     0.020564
1  Crop1  Diesel  0.000272  0.000194     0.000078
2  Crop1    Fert  0.056260  0.001125     0.002988
3  Crop1    Pest  0.062507  0.001707     0.000039
4  Crop2    Area  1.047527  0.005368     0.022586
5  Crop2  Diesel  0.004542  0.000058     0.000078
6  Crop2    Fert  0.075354  0.001363     0.002395
7  Crop2    Pest  0.047483  0.002271     0.000083
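select_dtypes('number') is what keeps the string columns Name and Type out of the multiplication, and it means the 16 value columns in the real data never have to be listed by hand. A quick check on the question's data:

```python
import pandas as pd

df_crop = pd.DataFrame({
    'Name': ['Crop1', 'Crop1', 'Crop1', 'Crop1', 'Crop2', 'Crop2', 'Crop2', 'Crop2'],
    'Type': ['Area', 'Diesel', 'Fert', 'Pest', 'Area', 'Diesel', 'Fert', 'Pest'],
    'GHG': [14.9, 0.0007, 0.145, 0.1611, 2.537, 0.011, 0.1825, 0.115],
    'Acid': [0.0125, 0.0005, 0.0029, 0.0044, 0.013, 0.00014, 0.0033, 0.0055],
    'Terra Eutro': [0.053, 0.0002, 0.0077, 0.0001, 0.0547, 0.00019, 0.0058, 0.0002],
})

# Only numeric columns are returned; the string label columns are skipped
cols = df_crop.select_dtypes('number').columns
print(list(cols))  # ['GHG', 'Acid', 'Terra Eutro']
```

If the real data also had a numeric column that should not be scaled (a hypothetical integer ID column, say), it would be selected too and would have to be dropped from cols first.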

Answer 4

Set a (Name, Type) index on df_crop and multiply by the series, aligning on the relevant level:

temp = df_crop.set_index(['Name', 'Type'])

temp.mul(s_yield, level='Name', axis=0).reset_index()

    Name    Type       GHG      Acid  Terra Eutro
0  Crop1    Area  5.781200  0.004850     0.020564
1  Crop1  Diesel  0.000272  0.000194     0.000078
2  Crop1    Fert  0.056260  0.001125     0.002988
3  Crop1    Pest  0.062507  0.001707     0.000039
4  Crop2    Area  1.047527  0.005368     0.022586
5  Crop2  Diesel  0.004542  0.000058     0.000078
6  Crop2    Fert  0.075354  0.001363     0.002395
7  Crop2    Pest  0.047483  0.002271     0.000083
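As a sanity check, the level-aligned multiplication in this answer and the map-based approach of Answer 3 should produce identical frames on the question's data. A sketch that builds both and compares them with pandas.testing.assert_frame_equal:

```python
import pandas as pd

df_crop = pd.DataFrame({
    'Name': ['Crop1', 'Crop1', 'Crop1', 'Crop1', 'Crop2', 'Crop2', 'Crop2', 'Crop2'],
    'Type': ['Area', 'Diesel', 'Fert', 'Pest', 'Area', 'Diesel', 'Fert', 'Pest'],
    'GHG': [14.9, 0.0007, 0.145, 0.1611, 2.537, 0.011, 0.1825, 0.115],
    'Acid': [0.0125, 0.0005, 0.0029, 0.0044, 0.013, 0.00014, 0.0033, 0.0055],
    'Terra Eutro': [0.053, 0.0002, 0.0077, 0.0001, 0.0547, 0.00019, 0.0058, 0.0002],
})
s_yield = pd.Series([0.388, 0.4129], index=['Crop1', 'Crop2'])

# This answer: broadcast the series across the 'Name' level of a MultiIndex
res_level = (
    df_crop.set_index(['Name', 'Type'])
    .mul(s_yield, level='Name', axis=0)
    .reset_index()
)

# Answer 3: expand the series to one factor per row with map
cols = df_crop.select_dtypes('number').columns
res_map = df_crop.copy()
res_map[cols] = res_map[cols].mul(res_map['Name'].map(s_yield), axis=0)

# Raises AssertionError if values, labels or dtypes differ
pd.testing.assert_frame_equal(res_level, res_map)
print("identical")
```

The MultiIndex version has the advantage that Type rides along in the index and is never touched by the arithmetic, so no column selection is needed.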


Problem description

4 solutions

Solution 1
1 2021-12-28 23:56:48

Solution 2
0 2021-12-28 23:39:18

Solution 3
0 2021-12-29 00:00:09

Solution 4
0 2021-12-29 02:23:45
