Add values to a column if another column contains a string (Pandas)

Question

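I have a dataframe with values for different attributes, and a JSON file containing lists of attributes that should be summed only when a column in the dataframe contains a given string.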

| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First   | NY-store1      | 3     | 5     |  2    | 2     |        |
| Second  | NY-store2      | 1     | 3     |  5    | 1     |        |
| Third   | NJ-store1      | 3     | 5     |  2    | 2     |        |
| Fourth  | PA-store1      | 1     | 3     |  5    | 1     |        |

The JSON file has the following structure:

{
"positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
]
}

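Obviously the real file has many more attributes; this one is only a demo. What I want is this: when a product's store_location contains the string "NY", sum the attributes listed in "sumElements" and divide by the length of "gralSum". If the product has another string, such as "NJ" or "PA", sum all the elements of "elementsProm" and divide by its length.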

This is my code:

for p in range(len(js_positions["positionEvaluation"])):
    aux1_string = js_positions["positionEvaluation"][p]["position"]
    df[aux1_string] = 0
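    # Note: .any() is True if at least one row contains 'NY', so this
    # condition is evaluated once for the whole column, not per row.
    if df['store_location'].str.contains('NY').any():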
        for k in range(len(js_positions["positionEvaluation"][p]["sumElements"])):
            tmp = js_positions["positionEvaluation"][p]["sumElements"][k]
            df[aux1_string] = df[aux1_string] + df[tmp]

        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["gralSum"])

    else:
        for k in range(len(js_positions["positionEvaluation"][p]["elementsProm"])):
            tmp = js_positions["positionEvaluation"][p]["elementsProm"][k]
            df[aux1_string] = df[aux1_string] + df[tmp]
        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["elementsProm"])

Explicit lists:

sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]

Expected output:

| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First   | NY-store1      | 3     | 5     |  2    | 2     |  2,66  |
| Second  | NY-store2      | 1     | 3     |  5    | 1     |  1,33  |
| Third   | NJ-store1      | 3     | 5     |  2    | 2     |    3   |
| Fourth  | PA-store1      | 1     | 3     |  5    | 1     |   2,5  |

Answer 1

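IIUC, you want to sum different attributes depending on whether the store name contains the string NY?

For this, you can use a boolean mask (here via np.where) together with sum or mean: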

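import numpy as np

sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]

# Rows containing 'NY': sum of sumElements divided by len(gralSum).
# All other rows: mean of elementsProm (sum divided by its length).
df['global'] = np.where(df['store_location'].str.contains('NY'),
                        df[sumElements].sum(axis=1).div(len(gralSum)),
                        df[elementsProm].mean(axis=1))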

Output:

  Product store_location  attr1  attr2  attr3  attr4    global
0   First      NY-store1      3      5      2      2  2.666667
1  Second      NY-store2      1      3      5      1  1.333333
2   Third      NJ-store1      3      5      2      2  3.000000
3  Fourth      PA-store1      1      3      5      1  2.500000
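
If the lists should come from the JSON structure in the question rather than being typed out, the same idea can be applied per entry of "positionEvaluation". The following is only a sketch under some assumptions: the sample dataframe and JSON are rebuilt inline from the question (a real file would be read with json.load), and the names is_ny, entry, ny_value and other_value are made up for illustration.

```python
import json

import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Product": ["First", "Second", "Third", "Fourth"],
    "store_location": ["NY-store1", "NY-store2", "NJ-store1", "PA-store1"],
    "attr1": [3, 1, 3, 1],
    "attr2": [5, 3, 5, 3],
    "attr3": [2, 5, 2, 5],
    "attr4": [2, 1, 2, 1],
})

# JSON from the question, inlined as a string for this sketch.
js_positions = json.loads("""
{
  "positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
  ]
}
""")

# Row-wise mask: True only for rows whose store_location contains "NY".
is_ny = df["store_location"].str.contains("NY")

for entry in js_positions["positionEvaluation"]:
    # NY rows: sum of sumElements divided by the length of gralSum.
    ny_value = df[entry["sumElements"]].sum(axis=1) / len(entry["gralSum"])
    # Other rows: mean of elementsProm (sum divided by its length).
    other_value = df[entry["elementsProm"]].mean(axis=1)
    # One new column per entry, named after its "position" value.
    df[entry["position"]] = np.where(is_ny, ny_value, other_value)
```

Note that the new column is named after the "position" value, so it is "Global" here rather than "global". The key difference from the loop in the question is that the mask is applied row by row instead of being collapsed with .any().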
