pandas DataFrame in Python: sorting values by group
The CSV file linked below contains the raw data that I wish to manipulate.
import pandas as pd
census_df = df = pd.read_csv('https://raw.githubusercontent.com/Qian-Han/coursera-Applied-Data-Science-with-Python/master/Introduction-to-Data-Science-in-Python/original_data/census.csv')
sortedit = census_df.sort_values(by=['STNAME', 'CENSUS2010POP'], ascending=False)
I am trying to sort the data in descending order by the column 'CENSUS2010POP'.
I also want to order the data by state alphabetically, which is why I have included the 'STNAME' column in the call above.
However, I only want to select the 3 highest values of 'CENSUS2010POP' from each state ('STNAME').
Thus, if there are 146 states in total, I should have (146 x 3) rows in my new dataframe (and thus in the 'CENSUS2010POP' column).
I would be very grateful if anybody could give me a helping hand.
IIUC, you can use groupby with .nlargest to create an index filter, chained with sort_values:
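# Row labels of the 3 largest CENSUS2010POP values within each state
top3_idx = df.groupby('STNAME')['CENSUS2010POP'].nlargest(3).index.get_level_values(1)
# .loc selects by label, so this also works when the index is not a default RangeIndex
df2 = df.loc[top3_idx].sort_values(['STNAME', 'CENSUS2010POP'], ascending=True)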
print(df['STNAME'].nunique())
51
print(df2.shape)
(152, 100)
print(df2[['STNAME','CENSUS2010POP']])
STNAME CENSUS2010POP
49 Alabama 412992
37 Alabama 658466
0 Alabama 4779736
76 Alaska 97581
71 Alaska 291826
... ... ...
3137 Wisconsin 947735
3096 Wisconsin 5686986
3182 Wyoming 75450
3180 Wyoming 91738
3169 Wyoming 563626
[152 rows x 2 columns]
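To see how the index filter works at a small scale, here is a minimal sketch on made-up data. The frame `toy` and its values are hypothetical and are not taken from the census file. It uses `.loc` rather than `.iloc` because `get_level_values(1)` yields row labels; with a default RangeIndex the two give the same rows. It also suggests why the result above has 152 rows rather than 51 x 3 = 153: a group with fewer than three rows simply contributes fewer rows.

```python
import pandas as pd

# Toy data with hypothetical values, standing in for the census file
toy = pd.DataFrame({
    "STNAME": ["A", "A", "A", "A", "B", "B"],
    "CENSUS2010POP": [10, 40, 30, 20, 5, 7],
})

# nlargest on a grouped Series returns a Series whose MultiIndex is
# (group key, original row label)
top3 = toy.groupby("STNAME")["CENSUS2010POP"].nlargest(3)

# Level 1 of that MultiIndex holds the original row labels
row_labels = top3.index.get_level_values(1)

# Select those rows, then sort: states A-Z, population high to low
result = toy.loc[row_labels].sort_values(
    ["STNAME", "CENSUS2010POP"], ascending=[True, False]
)
print(result)
```

Group "B" has only two rows, so `result` has five rows instead of six.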
Try this:
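# Sort the rows of each state by CENSUS2010POP, largest first
df = census_df.groupby(["STNAME"]).apply(lambda x: x.sort_values(["CENSUS2010POP"], ascending=False)).reset_index(drop=True)
# Keep the first 3 rows of each state, i.e. the 3 largest populations
df.groupby('STNAME').head(3)[['STNAME', 'CENSUS2010POP']]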
The first statement returns the dataframe sorted by CENSUS2010POP within each STNAME.
The second statement returns the top 3 rows for each state.
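The same two steps can probably be written without `groupby().apply()`, by sorting once with a per-column `ascending` list and then taking the head of each group. This is only a sketch on made-up data (the frame `toy` and its values are hypothetical), not a drop-in replacement tested against the census file.

```python
import pandas as pd

# Toy data with hypothetical values, deliberately unsorted
toy = pd.DataFrame({
    "STNAME": ["B", "A", "A", "B", "A", "A"],
    "CENSUS2010POP": [5, 10, 40, 7, 30, 20],
})

# One sort: states alphabetically, population descending within each state
ordered = toy.sort_values(["STNAME", "CENSUS2010POP"], ascending=[True, False])

# head(3) keeps the first three rows of each group in the sorted order
top3 = ordered.groupby("STNAME").head(3)
print(top3)
```

Passing a list to `ascending` gives each sort column its own direction, which matches the original goal of alphabetical states with descending population.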