
Sort DataFrame into groups and plot it

I have the following dummy df:

import pandas as pd

data = {'count': [11, 113, 53, 416, 3835, 143, 1, 1, 1, 2, 3, 4, 3, 4, 4, 6, 7, 7, 8, 8, 8, 9]}
df = pd.DataFrame(data)

and want to create this plot:

[image: the desired plot]

Namely, I want to sort the df into 6 groups based on the frequency.
Group 1-2 represents all entries in the count column that are either 1 or 2, group 3-4 all entries that are either 3 or 4, and so on.

I then tried this:

new_df = pd.DataFrame(columns=['1-2', '3-4', '5-6', '7-8', '9-10', '>10'])
new_df['1-2'] = df[df['count'] > 0 & (df['count'] <= 2)].count()

This results in 22 for group 1-2, so something is off here.

You probably want to use pd.cut for this, since you can specify your bins and labels; then it's just as simple as grouping, counting, and plotting.

import pandas as pd

data = {'count': [11, 113, 53, 416, 3835, 143, 1, 1, 1, 2, 3, 4, 3, 4, 4, 6, 7, 7, 8, 8, 8, 9]}
df = pd.DataFrame(data)

# Create right-closed interval bins: (0, 2], (2, 4], ..., (10, 100000]
bins = pd.IntervalIndex.from_tuples([(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (10, 100000)])
df['bins'] = pd.cut(df['count'], bins=bins)
# Plot the bin counts (observed=False keeps empty bins in the chart)
df.groupby('bins', observed=False).count().plot(kind='bar')

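pd.cut also accepts plain bin edges together with a labels argument, so the group names from the question can be attached directly. The following is only a sketch: the open-ended upper edge (float('inf')) and the names edges, labels, group, and counts are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

data = {'count': [11, 113, 53, 416, 3835, 143, 1, 1, 1, 2, 3, 4, 3, 4, 4, 6, 7, 7, 8, 8, 8, 9]}
df = pd.DataFrame(data)

# Bin edges are right-closed by default: (0, 2], (2, 4], ..., (10, inf]
edges = [0, 2, 4, 6, 8, 10, float('inf')]
# One label per interval, matching the group names used in the question
labels = ['1-2', '3-4', '5-6', '7-8', '9-10', '>10']

df['group'] = pd.cut(df['count'], bins=edges, labels=labels)

# Count rows per group; observed=False keeps groups that have no rows
counts = df.groupby('group', observed=False).size()
print(counts)
```

Calling counts.plot(kind='bar') on the result should then draw one bar per labelled group.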

You need parentheses around the first condition (df['count']>0) too, because & binds more tightly than > in Python:

new_df['1-2'] = df[(df['count']>0) & (df['count']<=2)].count()
new_df['3-4'] = df[(df['count']>2) & (df['count']<=4)].count()
new_df['5-6'] = df[(df['count']>4) & (df['count']<=6)].count()
new_df['7-8'] = df[(df['count']>6) & (df['count']<=8)].count()
new_df['9-10'] = df[(df['count']>8) & (df['count']<=10)].count()
new_df['>10'] = df[(df['count']>10)].count()
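To see why the original attempt counted all 22 rows: without parentheses, df['count'] > 0 & (df['count'] <= 2) is parsed as df['count'] > (0 & (df['count'] <= 2)). A small sketch comparing the two masks on the same data follows; the variable names are illustrative, and Series.between is shown as one possible alternative that avoids the precedence issue, not as something the answer itself used.

```python
import pandas as pd

data = {'count': [11, 113, 53, 416, 3835, 143, 1, 1, 1, 2, 3, 4, 3, 4, 4, 6, 7, 7, 8, 8, 8, 9]}
df = pd.DataFrame(data)

# Parsed as df['count'] > (0 & (df['count'] <= 2)); the right side is all False,
# so every positive count passes the comparison
mask_wrong = df['count'] > 0 & (df['count'] <= 2)

# Each comparison is parenthesized, so & combines two boolean masks
mask_right = (df['count'] > 0) & (df['count'] <= 2)

# between() is inclusive on both ends by default and needs no & at all
mask_between = df['count'].between(1, 2)

print(mask_wrong.sum(), mask_right.sum(), mask_between.sum())
```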

For the plot:

new_df.T.plot.bar(fill=False, edgecolor="red")


