To avoid too many tiny slices in a pie chart, I need to merge/sum all elements in a series below a certain threshold. So far this is what I came up with:
from pandas import Series
import numpy as np
# Series.order() has been removed from pandas; sort_values() replaces it
ser = Series(np.random.randint(100, size=10), index=list('abcdefghij')).sort_values(ascending=False)
thresh = 20
cleaned = ser[ser>=thresh].append(Series([ser[ser<thresh].sum()],
index=["below {}".format(thresh)]))
This delivers the correct result, but the use of append bothers me and does not strike me as particularly pandas-like.
Is there a more appealing way to achieve the same result?
Update:
This is a solution based on the comment by IanS below.
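# Relabel every entry below the threshold (Python 3: no tuple-unpacking lambdas, items() instead of iteritems())
ser.index = list(map(lambda kv: kv[0] if kv[1] >= thresh else "below {}".format(thresh),
                     ser.items()))
or
ser.index = [x if y >= thresh else "below {}".format(thresh) for x, y in ser.items()]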
and then
ser.groupby(ser.index).sum()
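The relabel-then-group idea above can also be written without mutating ser.index. This is only a sketch: the helper name merge_small, its label parameter, and the fixed sample values are made up for illustration, and it relies on Index.where to swap in the bucket label.

```python
import pandas as pd

def merge_small(ser, thresh, label=None):
    """Sum all entries below `thresh` into one bucket (hypothetical helper)."""
    if label is None:
        label = "below {}".format(thresh)
    # Keep the original label where the value is large enough, else use the bucket label
    keys = ser.index.where((ser >= thresh).to_numpy(), label)
    # sort=False keeps the groups in order of first appearance
    return ser.groupby(keys, sort=False).sum()

# Fixed sample data instead of random values, so the result is reproducible
ser = pd.Series([97, 5, 61, 16, 24], index=list("abcde"))
merged = merge_small(ser, 20)
print(merged)
```

With sort=False the bucket appears where the first small value was; chain .sort_values(ascending=False) if the pie chart should be ordered by size.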
You can try this:
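import pandas as pd

# Group by the boolean mask: the True group is kept as is, the False group is summed.
# >= 20 (rather than > 20) matches the threshold used in the question.
df = ser.groupby(ser >= 20).apply(
    lambda x: x if (x >= 20).all()
    else pd.Series(x.sum(), index=["below 20"])
).reset_index().set_index("level_1").iloc[:, 1:][0].copy()
df.name = None
df.index.name = None
# Series.sort() no longer exists; sort_values() returns a sorted copy
df = df.sort_values(ascending=False)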
df
c 97
f 88
e 61
h 60
a 53
g 49
i 37
d 24
below 20 21
dtype: int64
But I'm not sure it's better than your solution.
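For what it's worth, Series.append was removed in pandas 2.0, so the mask-based approach from the question now needs pd.concat. A minimal sketch, assuming fixed sample values and the same "below {thresh}" label; the names small and cleaned are just illustrative:

```python
import pandas as pd

ser = pd.Series([97, 5, 61, 16, 24], index=list("abcde"))
thresh = 20

# Boolean mask of the entries that should be merged
small = ser < thresh

# Keep the large entries and add a single row holding the sum of the small ones
cleaned = pd.concat([
    ser[~small],
    pd.Series({"below {}".format(thresh): ser[small].sum()}),
])
print(cleaned)
```

This keeps the large entries in their original order and puts the merged bucket last, which is usually what you want for a pie chart.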