Pandas second max value per group in dataframe
I have a DataFrame:
bq_selection_id bq_balance bq_market_id bq_back_price
0 45094462 185.04 155 1.87
1 45094462 185.04 155 1.97
2 45094463 185.04 155 3.05
3 45094463 185.04 156 3.05
4 45094464 185.04 156 5.80
5 45094464 185.04 156 5.80
6 45094466 185.04 157 200.00
7 45094466 185.04 157 200.00
8 45094465 185.04 157 NaN
9 45094465 185.04 157 NaN
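I would like to add two extra columns, second_lowest and none_values, computed per group after a groupby on the market ID. For market ID 155 the second_lowest is 1.97 and there are no NaN values, so none_values is False. I would like to get something like: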
bq_selection_id bq_balance bq_market_id bq_back_price second_lowest none_val
0 45094462 185.04 155 1.87 1.97 False
1 45094462 185.04 155 1.97 1.97 False
2 45094463 185.04 155 3.05 1.97 False
3 45094463 185.04 156 3.05 5.80 False
4 45094464 185.04 156 5.80 5.80 False
5 45094464 185.04 156 6.40 5.80 False
6 45094466 185.04 157 1.00 1.70 True
7 45094466 185.04 157 1.70 1.70 True
8 45094465 185.04 157 NaN 1.70 True
9 45094465 185.04 157 NaN 1.70 True
Can you help me?
Combining the ideas from your previous questions (1, 2), you could use groupby/transform to assign a new value to every row of your DataFrame:
import numpy as np
import pandas as pd
pd.options.display.width = 1000
df = pd.DataFrame(
{'bq_back_price': [1.87, 1.97, 3.05, 3.05, 5.8, 5.8, 200.0, 200.0, np.nan, np.nan],
'bq_balance': [1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 1850.4,
1850.4, 1850.4, 1850.4],
'bq_market_id': [155, 155, 155, 156, 156, 156, 157, 157, 157, 157],
'bq_selection_id': [45094462, 45094462, 45094463, 45094463, 45094464,
45094464, 45094466, 45094466, 45094465, 45094465]})
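# select the price column within each market group
grouped = df.groupby('bq_market_id')['bq_back_price']
# nsmallest(2) ignores NaN; the max of the two smallest values is the second lowest
df['second_lowest'] = grouped.transform(lambda x: x.nsmallest(2).max())
# True on every row of a group that contains at least one NaN price
df['has_null'] = grouped.transform(lambda x: pd.isnull(x).any()).astype(bool)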
print(df)
yields
bq_back_price bq_balance bq_market_id bq_selection_id second_lowest has_null
0 1.87 1850.4 155 45094462 1.97 False
1 1.97 1850.4 155 45094462 1.97 False
2 3.05 1850.4 155 45094463 1.97 False
3 3.05 1850.4 156 45094463 5.80 False
4 5.80 1850.4 156 45094464 5.80 False
5 5.80 1850.4 156 45094464 5.80 False
6 200.00 1850.4 157 45094466 200.00 True
7 200.00 1850.4 157 45094466 200.00 True
8 NaN 1850.4 157 45094465 200.00 True
9 NaN 1850.4 157 45094465 200.00 True
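One thing to watch for: `nsmallest(2)` counts duplicates, so a group whose lowest price appears twice reports that same price as its "second lowest". If you want the second-lowest distinct price instead, a minimal sketch could look like this. The helper name `second_lowest_distinct` and the small `demo` frame are made up for illustration and are not part of the question's data:

```python
import numpy as np
import pandas as pd

def second_lowest_distinct(prices):
    # Hypothetical helper: second-smallest distinct non-NaN value,
    # or NaN when the group has fewer than two distinct values.
    distinct = prices.dropna().drop_duplicates().nsmallest(2)
    return distinct.iloc[1] if len(distinct) == 2 else np.nan

# Illustrative data only: market 155 has a tie on its lowest price.
demo = pd.DataFrame({
    'bq_market_id': [155, 155, 155, 156, 156, 156],
    'bq_back_price': [1.0, 1.0, 2.0, 3.05, 5.8, np.nan],
})

grouped = demo.groupby('bq_market_id')['bq_back_price']
# Counts duplicates: market 155 gets 1.0
demo['second_lowest'] = grouped.transform(lambda x: x.nsmallest(2).max())
# Ignores duplicates: market 155 gets 2.0
demo['second_lowest_distinct'] = grouped.transform(second_lowest_distinct)
print(demo)
```

Which of the two you want depends on whether tied prices should count as separate entries.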
How about:
gb = df.groupby('bq_market_id')
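# one value per market: sort ascending (NaN goes last) and take the second entry,
# then map the per-market result back onto every row by its market ID
df['second_lowest'] = df.bq_market_id.map(gb.bq_back_price.apply(lambda x: x.sort_values().iloc[1]))
df['none_val'] = df.bq_market_id.map(gb.bq_back_price.apply(lambda x: x.isnull().any()))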
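For reference, here is a self-contained sketch of the same "compute once per market, then broadcast" idea, written with `agg` and a `join` on the group key. It reuses the prices from the question; the helper name `second_lowest` is introduced here for illustration, and the sketch assumes every market has at least two rows (otherwise `iloc[1]` raises an IndexError):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'bq_market_id': [155, 155, 155, 156, 156, 156, 157, 157, 157, 157],
    'bq_back_price': [1.87, 1.97, 3.05, 3.05, 5.8, 5.8,
                      200.0, 200.0, np.nan, np.nan],
})

def second_lowest(prices):
    # Sort ascending (NaN is placed last by default) and take the second entry.
    # Assumes the group has at least two rows.
    return prices.sort_values().iloc[1]

# One row per market, with both derived values as columns.
per_market = df.groupby('bq_market_id')['bq_back_price'].agg(
    second_lowest=second_lowest,
    none_val=lambda x: x.isnull().any(),
)

# Broadcast the per-market values back onto every original row.
result = df.join(per_market, on='bq_market_id')
print(result)
```

Keeping `per_market` as a separate frame can be handy if you also need the one-row-per-market summary on its own.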