Pandas: add values into a column if another column contains a string

Question

I have a dataframe with values of different attributes, and I have a JSON file that holds lists of attributes. Which attributes get summed depends on whether a column in the dataframe contains a given string.

| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First   | NY-store1      | 3     | 5     |  2    | 2     |        |
| Second  | NY-store2      | 1     | 3     |  5    | 1     |        |
| Third   | NJ-store1      | 3     | 5     |  2    | 2     |        |
| Fourth  | PA-store1      | 1     | 3     |  5    | 1     |        |

The JSON file has this structure:

{
"positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
]
}
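
For reference, a structure like this can be read with the standard json module. This is only a minimal sketch: the JSON text is inlined in a made-up variable js_text for the demo (a real file would be read with json.load), and js_positions is the name used in the code further down.

```python
import json

# Assumption: the JSON text is inlined for the demo; with a real file,
# json.load() on an open file handle gives the same result.
js_text = """
{
"positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
]
}
"""

js_positions = json.loads(js_text)

# Each entry is a plain dict and each attribute list is a plain Python list.
entry = js_positions["positionEvaluation"][0]
print(entry["position"])
print(entry["sumElements"])
print(len(entry["gralSum"]))
```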

Obviously the real file has more attributes; this is only for the demo. When the product's store_location contains the string 'NY', I want to sum the attributes listed in "sumElements" and divide by the length of "gralSum". If the store location contains another string, such as 'NJ' or 'PA', I want to sum all the elements of "elementsProm" and divide by the length of that list.
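
To make the rule concrete, here is a small sketch in plain Python for a single row. The helper name global_score and its parameters are made up for illustration; the attribute values come from the table above.

```python
def global_score(store_location, attrs, sum_elements, gral_sum, elements_prom):
    """Apply the rule to one row (hypothetical helper, not from the original code)."""
    if "NY" in store_location:
        # NY rows: sum the sumElements attributes, divide by len(gralSum)
        return sum(attrs[a] for a in sum_elements) / len(gral_sum)
    # Other rows: sum the elementsProm attributes, divide by their count
    return sum(attrs[a] for a in elements_prom) / len(elements_prom)


sum_elements = ["attr1", "attr2"]
gral_sum = ["attr2", "attr3", "attr4"]
elements_prom = ["attr1", "attr2", "attr3", "attr4"]

first = {"attr1": 3, "attr2": 5, "attr3": 2, "attr4": 2}   # NY-store1
fourth = {"attr1": 1, "attr2": 3, "attr3": 5, "attr4": 1}  # PA-store1

ny_value = global_score("NY-store1", first, sum_elements, gral_sum, elements_prom)
pa_value = global_score("PA-store1", fourth, sum_elements, gral_sum, elements_prom)

print(ny_value)  # (3 + 5) / 3
print(pa_value)  # (1 + 3 + 5 + 1) / 4
```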

Here is my code:

for p in range(len(js_positions["positionEvaluation"])):
    aux1_string = js_positions["positionEvaluation"][p]["position"]
    df[aux1_string] = 0
    if df['store_location'].str.contains('NY').any():
        for k in range(len(js_positions["positionEvaluation"][p]["sumElements"])):
            tmp = js_positions["positionEvaluation"][p]["sumElements"][k]
            df[aux1_string] = df[aux1_string] + df[tmp]

        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["gralSum"])

    else:
        for k in range(len(js_positions["positionEvaluation"][p]["elementsProm"])):
            tmp = js_positions["positionEvaluation"][p]["elementsProm"][k]
            df[aux1_string] = df[aux1_string] + df[tmp]
        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["elementsProm"])

Explicit lists:

sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]

Expected output:

| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First   | NY-store1      | 3     | 5     |  2    | 2     |  2,66  |
| Second  | NY-store2      | 1     | 3     |  5    | 1     |  1,33  |
| Third   | NJ-store1      | 3     | 5     |  2    | 2     |    3   |
| Fourth  | PA-store1      | 1     | 3     |  5    | 1     |   2,5  |

Answer 1

IIUC, you want to sum different attributes depending on whether or not the string NY is in the store location?
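
A side note on the loop in the question: df['store_location'].str.contains('NY').any() collapses the whole column to a single True/False, so the if statement picks one branch for the entire dataframe, which is probably not what was intended. The sketch below, which assumes only the sample store locations from the question, shows the difference between the per-row mask and the collapsed value.

```python
import pandas as pd

# Sample data from the question (only the columns needed for the demo)
df = pd.DataFrame({
    "Product": ["First", "Second", "Third", "Fourth"],
    "store_location": ["NY-store1", "NY-store2", "NJ-store1", "PA-store1"],
})

# One boolean per row: True where the location contains "NY"
mask = df["store_location"].str.contains("NY")

print(mask.tolist())  # [True, True, False, False]
print(mask.any())     # a single boolean for the whole column
```

The per-row mask is what a row-by-row choice between two formulas needs.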

For this you can use a boolean mask with np.where, combined with mean or sum:

import numpy as np

sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]

df['global'] = np.where(df['store_location'].str.contains('NY'),
                        df[sumElements].sum(1).div(len(gralSum)),
                        df[elementsProm].mean(1))

Output:

  Product store_location  attr1  attr2  attr3  attr4    global
0   First      NY-store1      3      5      2      2  2.666667
1  Second      NY-store2      1      3      5      1  1.333333
2   Third      NJ-store1      3      5      2      2  3.000000
3  Fourth      PA-store1      1      3      5      1  2.500000
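
If the lists come from the JSON file rather than being written out by hand, the same np.where expression can be applied once per entry of positionEvaluation. This is a sketch under assumptions: the dataframe is rebuilt from the sample table, the config is inlined as a dict named js_positions (as in the question), and the new column is named after the lower-cased "position" value so that it matches the global column of the expected output.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Product": ["First", "Second", "Third", "Fourth"],
    "store_location": ["NY-store1", "NY-store2", "NJ-store1", "PA-store1"],
    "attr1": [3, 1, 3, 1],
    "attr2": [5, 3, 5, 3],
    "attr3": [2, 5, 2, 5],
    "attr4": [2, 1, 2, 1],
})

# Assumption: config inlined here; normally the result of json.load()
js_positions = {
    "positionEvaluation": [
        {
            "position": "Global",
            "sumElements": ["attr1", "attr2"],
            "gralSum": ["attr2", "attr3", "attr4"],
            "elementsProm": ["attr1", "attr2", "attr3", "attr4"],
        }
    ]
}

# Per-row mask, computed once and reused for every entry
is_ny = df["store_location"].str.contains("NY")

for entry in js_positions["positionEvaluation"]:
    # Assumption: lower-case the position to match the "global" column name
    col = entry["position"].lower()
    df[col] = np.where(
        is_ny,
        # NY rows: sum of sumElements divided by len(gralSum)
        df[entry["sumElements"]].sum(axis=1) / len(entry["gralSum"]),
        # other rows: mean of elementsProm
        df[entry["elementsProm"]].mean(axis=1),
    )

print(df[["Product", "store_location", "global"]])
```

Both branches are computed for every row and np.where then picks one value per row, which is fine for cheap column arithmetic like this.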

Answer 1
1 accepted 2022-05-10 07:41:42
