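Sum values into a column if another column contains a string in Pandas
I have a dataframe with values for several attributes, and a JSON file listing the attributes that should be summed only when a column in the dataframe contains a given string.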
| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First | NY-store1 | 3 | 5 | 2 | 2 | |
| Second | NY-store2 | 1 | 3 | 5 | 1 | |
| Third | NJ-store1 | 3 | 5 | 2 | 2 | |
| Fourth | PA-store1 | 1 | 3 | 5 | 1 | |
The JSON file has the following structure:
{
  "positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
  ]
}
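Obviously the real file has many more attributes; this one is only for demo purposes. What I want is: when the product's store_location contains the string "NY", take the attributes listed in "sumElements", sum them, and divide by the length of "gralSum"; when the store location contains another string such as "NJ" or "PA", sum all the elements of "elementsProm" and divide by its length.
This is my code: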
for p in range(len(js_positions["positionEvaluation"])):
    aux1_string = js_positions["positionEvaluation"][p]["position"]
    df[aux1_string] = 0
    if df['store_location'].str.contains('NY').any():
        # Accumulate the "sumElements" columns, then divide by len("gralSum")
        for k in range(len(js_positions["positionEvaluation"][p]["sumElements"])):
            tmp = js_positions["positionEvaluation"][p]["sumElements"][k]
            df[aux1_string] = df[aux1_string] + df[tmp]
        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["gralSum"])
    else:
        # Accumulate the "elementsProm" columns, then divide by their count
        for k in range(len(js_positions["positionEvaluation"][p]["elementsProm"])):
            tmp = js_positions["positionEvaluation"][p]["elementsProm"][k]
            df[aux1_string] = df[aux1_string] + df[tmp]
        df[aux1_string] = df[aux1_string] / len(js_positions["positionEvaluation"][p]["elementsProm"])
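One likely reason the loop above does not produce per-row results is the `.any()` call: `str.contains` returns one boolean per row, but `.any()` collapses that into a single value for the whole column, so every row goes through the same branch. A minimal sketch of the difference, using a standalone Series that mirrors the demo `store_location` column (the names `stores` and `mask` are illustrative only):

```python
import pandas as pd

# Illustrative copy of the store_location column from the demo table
stores = pd.Series(
    ["NY-store1", "NY-store2", "NJ-store1", "PA-store1"],
    name="store_location",
)

# One boolean per row: this is what a row-wise condition needs
mask = stores.str.contains("NY")
print(mask.tolist())  # [True, True, False, False]

# A single boolean for the whole column: True as soon as any row matches
print(mask.any())  # True
```

Because the demo data contains at least one "NY" store, the `if` branch is taken once and applied to all four rows.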
Explicit lists:
sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]
Expected output:
| Product | store_location | attr1 | attr2 | attr3 | attr4 | global |
| ------- | -------------- | ----- | ----- | ----- | ----- | ------ |
| First | NY-store1 | 3 | 5 | 2 | 2 | 2,66 |
| Second | NY-store2 | 1 | 3 | 5 | 1 | 1,33 |
| Third | NJ-store1 | 3 | 5 | 2 | 2 | 3 |
| Fourth | PA-store1 | 1 | 3 | 5 | 1 | 2,5 |
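IIUC, you want to sum different attributes depending on whether the store name contains the string NY?
For this, you can use boolean indexing together with mean or sum: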
import numpy as np

sumElements = ["attr1", "attr2"]
gralSum = ["attr2", "attr3", "attr4"]
elementsProm = ["attr1", "attr2", "attr3", "attr4"]

# NY rows: sum of sumElements divided by len(gralSum); other rows: mean of elementsProm
df['global'] = np.where(df['store_location'].str.contains('NY'),
                        df[sumElements].sum(axis=1).div(len(gralSum)),
                        df[elementsProm].mean(axis=1))
Output:
  Product store_location  attr1  attr2  attr3  attr4    global
0   First      NY-store1      3      5      2      2  2.666667
1  Second      NY-store2      1      3      5      1  1.333333
2   Third      NJ-store1      3      5      2      2  3.000000
3  Fourth      PA-store1      1      3      5      1  2.500000
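Since the question reads the lists from a JSON file that may hold several entries, the same idea can be driven directly from the parsed JSON. The following is only a sketch under a few assumptions: the JSON is inlined as a string instead of being read from disk, the dataframe is rebuilt from the demo table, and each result column is named after the entry's "position" value (so "Global", as in the question's loop).

```python
import json

import numpy as np
import pandas as pd

# Demo dataframe rebuilt from the table in the question
df = pd.DataFrame({
    "Product": ["First", "Second", "Third", "Fourth"],
    "store_location": ["NY-store1", "NY-store2", "NJ-store1", "PA-store1"],
    "attr1": [3, 1, 3, 1],
    "attr2": [5, 3, 5, 3],
    "attr3": [2, 5, 2, 5],
    "attr4": [2, 1, 2, 1],
})

# Assumption: the JSON is inlined here; in practice it would come from json.load(file)
js_positions = json.loads("""
{
  "positionEvaluation": [
    {
      "position": "Global",
      "sumElements": ["attr1", "attr2"],
      "gralSum": ["attr2", "attr3", "attr4"],
      "elementsProm": ["attr1", "attr2", "attr3", "attr4"]
    }
  ]
}
""")

# Row-wise mask: True where the store location contains "NY"
is_ny = df["store_location"].str.contains("NY")

for entry in js_positions["positionEvaluation"]:
    # NY rows: sum of "sumElements" divided by the number of "gralSum" attributes
    ny_value = df[entry["sumElements"]].sum(axis=1) / len(entry["gralSum"])
    # Other rows: plain mean over "elementsProm"
    other_value = df[entry["elementsProm"]].mean(axis=1)
    # One new column per entry, named after its "position"
    df[entry["position"]] = np.where(is_ny, ny_value, other_value)

print(df)
```

Computing the mask once outside the loop avoids repeating the string match for every entry, and adding further objects to "positionEvaluation" would add further columns without changing the code.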