Sum values into a column if another column contains a string (Pandas)
I have a DataFrame with values for different attributes, and a JSON file that lists which attributes to sum, depending on whether a column in the DataFrame contains a given string.
| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First | NY-store1 | 3 | 5 | 2 | 2 | |
| Second | NY-store2 | 1 | 3 | 5 | 1 | |
| Third | NJ-store1 | 3 | 5 | 2 | 2 | |
| Fourth | PA-store1 | 1 | 3 | 5 | 1 | |
The JSON file has this structure:
{
  "positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
  ]
}
Obviously the real file has more attributes; this one is only for demonstration. When the product's store location contains the string 'NY', I want to sum the attributes listed in "sumElements" and divide by the length of "gralSum". If the store location contains another string, such as 'NJ' or 'PA', I want to sum all the elements of "elementsProm" and divide by its length.
Here is my code:
for p in range(len(js_positions["positionEvaluation"])):
    aux1_string = js_positions["positionEvaluation"][p]["position"]
    df[aux1_string] = 0
    if df['store_location'].str.contains('NY').any():
        for k in range(len(js_positions["positionEvaluation"][p]["sumElements"])):
            tmp = js_positions["positionEvaluation"][p]["sumElements"][k]
            df[aux1_string] = df[aux1_string] + df[tmp]
        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["gralSum"])
    else:
        for k in range(len(js_positions["positionEvaluation"][p]["elementsProm"])):
            tmp = js_positions["positionEvaluation"][p]["elementsProm"][k]
            df[aux1_string] = df[aux1_string] + df[tmp]
        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["elementsProm"])
Explicit lists:
sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]
Expected output:
| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First | NY-store1 | 3 | 5 | 2 | 2 | 2,66 |
| Second | NY-store2 | 1 | 3 | 5 | 1 | 1,33 |
| Third | NJ-store1 | 3 | 5 | 2 | 2 | 3 |
| Fourth | PA-store1 | 1 | 3 | 5 | 1 | 2,5 |
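The expected values can be checked by hand. The following is a minimal sketch in plain Python, assuming the rule as described in the question; `global_score` is a hypothetical helper introduced only for this check, and each row is written out as a dict. Note that the table truncates 8/3 to 2,66, whereas rounding gives 2.67.

```python
# Attribute lists copied from the question.
sum_elements = ["attr1", "attr2"]
gral_sum = ["attr2", "attr3", "attr4"]
elements_prom = ["attr1", "attr2", "attr3", "attr4"]


def global_score(row):
    """Hypothetical helper: apply the rule from the question to one row (a dict)."""
    if "NY" in row["store_location"]:
        # NY stores: sum of sumElements divided by the length of gralSum.
        return sum(row[a] for a in sum_elements) / len(gral_sum)
    # Any other store: plain average over elementsProm.
    return sum(row[a] for a in elements_prom) / len(elements_prom)


first = {"store_location": "NY-store1", "attr1": 3, "attr2": 5, "attr3": 2, "attr4": 2}
third = {"store_location": "NJ-store1", "attr1": 3, "attr2": 5, "attr3": 2, "attr4": 2}

print(global_score(first))  # (3 + 5) / 3
print(global_score(third))  # (3 + 5 + 2 + 2) / 4
```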
IIUC, you want to sum different attributes depending on whether the string NY is in the store location?
For this you can use boolean indexing and `mean` or `sum`:
import numpy as np

sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]
# NY rows: sum of sumElements divided by len(gralSum); other rows: mean of elementsProm
df['global'] = np.where(df['store_location'].str.contains('NY'),
                        df[sumElements].sum(axis=1).div(len(gralSum)),
                        df[elementsProm].mean(axis=1))
Output:
Product store_location attr1 attr2 attr3 attr4 global
0 First NY-store1 3 5 2 2 2.666667
1 Second NY-store2 1 3 5 1 1.333333
2 Third NJ-store1 3 5 2 2 3.000000
3 Fourth PA-store1 1 3 5 1 2.500000
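The same idea can probably be extended to read the lists from the JSON structure in the question rather than hard-coding them. The sketch below is one possible way, not the only one: the JSON is inlined as a string as a stand-in for the real file, and the DataFrame is rebuilt from the sample table. It also shows why the original loop gives the same formula to every row: `if df['store_location'].str.contains('NY').any()` reduces the whole column to a single boolean, whereas `np.where` chooses per row. The result column is named after the "position" value, so here it is `Global` rather than `global`.

```python
import json

import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Product": ["First", "Second", "Third", "Fourth"],
    "store_location": ["NY-store1", "NY-store2", "NJ-store1", "PA-store1"],
    "attr1": [3, 1, 3, 1],
    "attr2": [5, 3, 5, 3],
    "attr3": [2, 5, 2, 5],
    "attr4": [2, 1, 2, 1],
})

# Stand-in for the parsed JSON file (same structure as in the question).
js_positions = json.loads("""
{
  "positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
  ]
}
""")

# Row-wise boolean mask: True where the store location contains "NY".
is_ny = df["store_location"].str.contains("NY")

for entry in js_positions["positionEvaluation"]:
    # Value used for NY rows: sum of sumElements divided by len(gralSum).
    ny_value = df[entry["sumElements"]].sum(axis=1) / len(entry["gralSum"])
    # Value used for all other rows: mean of elementsProm.
    other_value = df[entry["elementsProm"]].mean(axis=1)
    # Pick per row; one new column per JSON entry, named by its "position".
    df[entry["position"]] = np.where(is_ny, ny_value, other_value)

print(df)
```

With more entries in "positionEvaluation", the loop would add one column per entry without further changes.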