How do I divide a column by the number of rows with the same ID in a DataFrame?

Question

I have a DataFrame that looks like this:

ID	Price
1	300
1	300
1	300
2	400
2	400
3	100

My goal is to divide the price of each observation by the number of rows that share its ID. The expected output would be:

ID	Price
1	100
1	100
1	100
2	200
2	200
3	100

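However, I am having trouble finding the most efficient way to do this. I did manage it with the code below, but it takes more than 5 minutes to run (I have about 200k observations):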

# For each row in the dataset, get the number of rows with the same Id and store them in a list
sum_of_each_id=[]
for i in df['Id'].to_numpy():
    sum_of_each_id.append(len(df[df['Id']==i]))

# Create an auxiliary column in the dataframe with the number of rows associated with each Id
df['auxiliar']=sum_of_each_id

# Dividing the price by the number of rows with the same Id
df['Price']=df['Price']/df['auxiliar']

Could you tell me the best way to do this?

Answer 1

import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3], "price": [300, 300, 300, 400, 400, 100]})
df.set_index("id") / df.groupby("id").count()

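Explanation:

df.groupby("id").count() counts the non-null values in each column per ID, which here equals the number of rows with that ID. The resulting DataFrame has id as its index.
df.set_index("id") makes the id column the index.
We then simply divide the two frames, and pandas matches the values by index.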
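The index matching described above can be seen step by step in the following minimal sketch. It works on the price Series rather than the whole frame, and the names `counts`, `per_row` and `result` are only illustrative; `reset_index()` is added here to turn id back into an ordinary column, which the answer itself does not do.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3],
                   "price": [300, 300, 300, 400, 400, 100]})

# One count per id; the result is indexed by the unique id values.
counts = df.groupby("id")["price"].count()

# Same prices, but indexed by id (with repeated index labels).
per_row = df.set_index("id")["price"]

# Division aligns on the index label, so each row is divided by its id's count.
result = (per_row / counts).reset_index()
print(result)
```

Because this is true division, the resulting price column holds floats.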

Answer 2

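Try groupby with transform.

Use groupby('Id') to group the rows by ID.
Use transform('count') to get, for every row, the number of values in its group.
Divide df["Price"] by the resulting Series of counts.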

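import pandas as pd

df = pd.DataFrame({"Id": [1, 1, 1, 2, 2, 3], "Price": [300, 300, 300, 400, 400, 100]})

# transform("count") returns one count per original row, aligned with df
df["new_Price"] = (df["Price"] / df.groupby("Id")["Price"].transform("count")).astype("int")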

print(df)

   Id  Price  new_Price
0   1    300        100
1   1    300        100
2   1    300        100
3   2    400        200
4   2    400        200
5   3    100        100
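If the goal is to overwrite Price itself, as in the question, a variation along the following lines may be enough. It is a sketch rather than the answerer's code: it swaps in transform("size"), which counts every row in the group, whereas "count" skips rows whose Price is NaN. The name `rows_per_id` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Id": [1, 1, 1, 2, 2, 3],
                   "Price": [300, 300, 300, 400, 400, 100]})

# Number of rows per Id, broadcast back to every row of the original frame.
rows_per_id = df.groupby("Id")["Price"].transform("size")

# Overwrite Price in place; true division yields floats.
df["Price"] = df["Price"] / rows_per_id
print(df)
```

This replaces the Python-level loop from the question with a single grouped operation, so no auxiliary column is needed.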
