How to combine multiple rows with the same index when each row has only one non-null value in pandas?
I have a pandas DataFrame that looks like this:
                             OPEN_INT  PX_HIGH  PX_LAST  VOL
timestamp   ticker  source
2018-01-01  AAPL    NYSE            1      NaN      NaN  NaN
2018-01-01  AAPL    NYSE          NaN        2      NaN  NaN
2018-01-01  AAPL    NYSE          NaN      NaN        3  NaN
2018-01-01  AAPL    NYSE          NaN      NaN      NaN    4
2018-01-01  MSFT    NYSE            5      NaN      NaN  NaN
2018-01-01  MSFT    NYSE          NaN        6      NaN  NaN
2018-01-01  MSFT    NYSE          NaN      NaN        7  NaN
2018-01-01  MSFT    NYSE          NaN      NaN      NaN    8
Within each (timestamp, ticker, source) group, every column is guaranteed to contain exactly one value; all other entries are NaN. Is there a way to combine these into a single row per group, so that the result looks like this:
                             OPEN_INT  PX_HIGH  PX_LAST  VOL
timestamp   ticker  source
2018-01-01  AAPL    NYSE            1        2        3    4
2018-01-01  MSFT    NYSE            5        6        7    8
I tried df.groupby(['timestamp', 'ticker', 'source']).agg(lambda x: x.dropna()), but I got an error saying "Function does not reduce".
Use GroupBy.first, which returns the first non-null value in each column for every group:
df.groupby(['timestamp', 'ticker', 'source']).first()
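To see the effect end to end, here is a minimal sketch that rebuilds the frame from the question and applies GroupBy.first. The way the frame is constructed (pd.MultiIndex.from_tuples, float columns, timestamps stored as strings) is an assumption made for illustration; the original data may have been built differently.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Rebuild the question's frame: four rows per (timestamp, ticker, source) key,
# each row carrying exactly one non-null value. Timestamps are plain strings here.
index = pd.MultiIndex.from_tuples(
    [("2018-01-01", "AAPL", "NYSE")] * 4 + [("2018-01-01", "MSFT", "NYSE")] * 4,
    names=["timestamp", "ticker", "source"],
)
df = pd.DataFrame(
    {
        "OPEN_INT": [1, nan, nan, nan, 5, nan, nan, nan],
        "PX_HIGH": [nan, 2, nan, nan, nan, 6, nan, nan],
        "PX_LAST": [nan, nan, 3, nan, nan, nan, 7, nan],
        "VOL": [nan, nan, nan, 4, nan, nan, nan, 8],
    },
    index=index,
)

# first() skips NaN, so each column collapses to its single real value per group.
combined = df.groupby(["timestamp", "ticker", "source"]).first()
print(combined)
```

Because the columns contained NaN, they are stored as floats, so the combined values print as 1.0, 2.0, and so on rather than as integers.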
If there is always exactly one value per group in each column, you can also aggregate with max, min, sum, mean, and so on:
df.groupby(['timestamp', 'ticker', 'source']).max()
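As a quick check of that claim, the sketch below compares these aggregations against first() on a small frame. The frame is a reduced, hypothetical version of the question's data (one index level, two columns), chosen only to keep the example short.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Hypothetical reduced frame: one non-null value per ticker in each column.
df = pd.DataFrame(
    {
        "ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
        "PX_LAST": [3, nan, 7, nan],
        "VOL": [nan, 4, nan, 8],
    }
).set_index("ticker")

grouped = df.groupby("ticker")
first = grouped.first()

# All of these skip NaN by default, so with a single value per group
# each one returns that value unchanged.
results = {name: grouped.agg(name) for name in ["max", "min", "sum", "mean"]}
matches = {name: result.equals(first) for name, result in results.items()}
print(matches)
```

One difference to keep in mind: if a group has no value at all in some column, sum returns 0 by default, while first, max, min and mean return NaN.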