How can I get the max (x) number of values from each column in a pandas dataframe while keeping the index for each?
I'm attempting to get the top x largest values from each column in a pandas dataframe. Each column is one date, while each row is a different stock ticker (see photo).
Ideally, I'd like to see the ticker and value of the top 5 for each date (column).
I have tried a few different iterators, but none of them worked while keeping the index.
The output I want is a new csv with the date and the top 5 stock tickers (index), based on their value in the column for that day.
import pandas as pd
df = pd.read_csv (see photo)
I haven't been able to get it to turn out right.
Apply pd.Series.nlargest to each column to mask everything but the top N values. Then unstack and remove NaN. The result is a Series indexed by date and ticker. I'll use the top 2 values here for illustration.
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 10, (4, 3)),
                  columns=['Date1', 'Date2', 'Date3'],
                  index=['Stock1', 'Stock2', 'Stock3', 'Stock4'])
#            Date1     Date2     Date3
#Stock1   4.967142 -1.382643  6.476885
#Stock2  15.230299 -2.341534 -2.341370
#Stock3  15.792128  7.674347 -4.694744
#Stock4   5.425600 -4.634177 -4.657298
df.apply(pd.Series.nlargest, n=2).unstack().dropna()
#Date1 Stock2 15.230299
# Stock3 15.792128
#Date2 Stock1 -1.382643
# Stock3 7.674347
#Date3 Stock1 6.476885
# Stock2 -2.341370
#dtype: float64
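The question also asks for a csv with the date and the top tickers. One possible way to get there is sketched below. This is a variation rather than the approach above: it calls nlargest on each column in a plain loop, so the rows within each date stay ranked from largest to smallest. The column names ('date', 'rank', 'ticker', 'value') and the file name 'top_by_date.csv' are made up for illustration, and top_n is set to 2 to match the example data; the question would use 5.

```python
import numpy as np
import pandas as pd

# Same illustrative data as in the answer above
np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 10, (4, 3)),
                  columns=['Date1', 'Date2', 'Date3'],
                  index=['Stock1', 'Stock2', 'Stock3', 'Stock4'])

top_n = 2  # assumption for this small example; the question asks for 5

# Collect one record per (date, ticker) pair, largest value first
records = []
for date, column in df.items():
    ranked = column.nlargest(top_n)  # keeps the ticker index
    for rank, (ticker, value) in enumerate(ranked.items(), start=1):
        records.append({'date': date, 'rank': rank,
                        'ticker': ticker, 'value': value})

top = pd.DataFrame(records)

# Hypothetical output file name
top.to_csv('top_by_date.csv', index=False)
```

The loop trades a little brevity for an explicit rank column, which may be handy if the csv needs to show the order within each date.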