Deleting row from hierarchical Series in Pandas based on column value and position
I would like to remove the leading and trailing zeros from each event (level 1), but not the zeros surrounded by non-zero numbers.
The following finds and removes all zeros:
df = events[event_no][events[event_no] != 0]
I have the following hierarchical series:
1  2/09/2010     0
   3/09/2010   1.5
   4/09/2010   4.3
   5/09/2010   5.1
   6/09/2010     0
2  1/05/2007  53.2
   2/05/2007     0
   3/05/2007  21.5
   4/05/2007   2.5
   5/05/2007     0
and want:
1  3/09/2010   1.5
   4/09/2010   4.3
   5/09/2010   5.1
2  1/05/2007  53.2
   2/05/2007     0
   3/05/2007  21.5
   4/05/2007   2.5
I have read "Deleting DataFrame row in Pandas based on column value" and "Filter columns of only zeros from a Pandas data frame", but have been unsuccessful in solving this problem.
What does your dataframe look like? Either way, it shouldn't make any difference; simple Boolean indexing should do it:
In [101]: print(df)
Out [101]:
                    c1
first second
1     2/09/2010    0.0
      3/09/2010    1.5
      4/09/2010    4.3
      5/09/2010    5.1
      6/09/2010    0.0
2     1/05/2007   53.2
      2/05/2007    0.0
      3/05/2007   21.5
      4/05/2007    2.5
      5/05/2007    0.0
In [102]:
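import numpy as np
# Positions where the first index level changes, i.e. the first row of each new event
first = df.index.get_level_values(0).to_numpy()
is_edge = np.flatnonzero(np.hstack((0, np.diff(first))) != 0)
# Add the last row of each preceding event, plus the first and last rows overall
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(df) - 1))
# Keep zeros that are not on an event edge, together with every non-zero row
c1 = df['c1'].to_numpy()
g_idx = np.hstack(([i for i in np.flatnonzero(c1 == 0) if i not in is_edge], np.flatnonzero(c1 != 0))).astype(int)
print(df.iloc[sorted(g_idx)])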
Out[102]:
                    c1
first second
1     3/09/2010    1.5
      4/09/2010    4.3
      5/09/2010    5.1
2     1/05/2007   53.2
      2/05/2007    0.0
      3/05/2007   21.5
      4/05/2007    2.5
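Note that the index arithmetic above drops at most one zero at each edge of an event, and np.diff needs a numeric first level. If an event can start or end with a run of several zeros, a groupby-based mask is one possible alternative. The following is only a sketch under that assumption: the helper name edge_zero_mask is made up for illustration, and the data is the example from the question.

```python
import pandas as pd

def edge_zero_mask(values: pd.Series, level=0) -> pd.Series:
    """Return True for rows lying between the first and last non-zero
    value of each group (inclusive). Hypothetical helper, not a pandas API."""
    nonzero = values.ne(0).astype(int)
    # Running count of non-zeros from the start of each group:
    # it stays at 0 only across the leading zeros.
    from_start = nonzero.groupby(level=level).cumsum()
    # The same count taken from the end of each group (reverse the rows,
    # accumulate, reverse back): it stays at 0 only across the trailing zeros.
    from_end = nonzero.iloc[::-1].groupby(level=level).cumsum().iloc[::-1]
    return (from_start > 0) & (from_end > 0)

index = pd.MultiIndex.from_tuples(
    [(1, '2/09/2010'), (1, '3/09/2010'), (1, '4/09/2010'),
     (1, '5/09/2010'), (1, '6/09/2010'),
     (2, '1/05/2007'), (2, '2/05/2007'), (2, '3/05/2007'),
     (2, '4/05/2007'), (2, '5/05/2007')],
    names=['first', 'second'])
df = pd.DataFrame(
    {'c1': [0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.]}, index=index)

# Keep interior zeros, drop leading/trailing zeros of every event.
trimmed = df[edge_zero_mask(df['c1'], level='first')]
print(trimmed)
```

The same mask can be applied to a series directly, as in s[edge_zero_mask(s)]. An event that contains only zeros is dropped entirely by this sketch.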
If you have a series instead of a dataframe, say the series is s, you can either:
Convert it to a dataframe:
df = s.to_frame('c1')
Or:
In [113]:
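import numpy as np
# Positions where the first index level changes, i.e. the first row of each new event
first = s.index.get_level_values(0).to_numpy()
is_edge = np.flatnonzero(np.hstack((0, np.diff(first))) != 0)
# Add the last row of each preceding event, plus the first and last rows overall
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(s) - 1))
# Keep zeros that are not on an event edge, together with every non-zero row
vals = s.to_numpy()
g_idx = np.hstack(([i for i in np.flatnonzero(vals == 0) if i not in is_edge], np.flatnonzero(vals != 0))).astype(int)
s.iloc[sorted(g_idx)]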
Out[113]:
first  second
1      3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
dtype: float64
BTW, I generated the series with:
In [116]:
import pandas as pd
tuples = [(1, '2/09/2010'),
          (1, '3/09/2010'),
          (1, '4/09/2010'),
          (1, '5/09/2010'),
          (1, '6/09/2010'),
          (2, '1/05/2007'),
          (2, '2/05/2007'),
          (2, '3/05/2007'),
          (2, '4/05/2007'),
          (2, '5/05/2007')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.], index=index)
s
Out[116]:
first  second
1      2/09/2010     0.0
       3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
       6/09/2010     0.0
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
       5/05/2007     0.0
dtype: float64
Is this the same structure as yours?
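One way to compare structures is to print a few properties of the object and its index on both sides. This is just a sketch: the shortened three-row series below is a stand-in for your own data, which I can only guess at.

```python
import pandas as pd

# Shortened stand-in for the series built above (assumed layout).
index = pd.MultiIndex.from_tuples(
    [(1, '2/09/2010'), (1, '3/09/2010'), (2, '1/05/2007')],
    names=['first', 'second'])
s = pd.Series([0., 1.5, 53.2], index=index)

print(type(s).__name__)      # Series or DataFrame
print(s.index.nlevels)       # number of index levels
print(list(s.index.names))   # level names, if any were set
print(s.dtype)               # dtype of the values
```

If the number of levels or the level order differs on your side, the level argument or the get_level_values(0) call in the code above would need adjusting.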