
Deleting rows from a hierarchical Series in Pandas based on column value and position

I would like to remove the leading and trailing zeros from each event (level 1), but not the zeros surrounded by non-zero numbers.

The following finds and removes all zeros:

df = events[event_no][events[event_no] != 0]

I have the following hierarchical series:

   1    2/09/2010   0
        3/09/2010   1.5
        4/09/2010   4.3
        5/09/2010   5.1
        6/09/2010   0
   2    1/05/2007   53.2
        2/05/2007   0
        3/05/2007   21.5
        4/05/2007   2.5
        5/05/2007   0

and want:

   1    3/09/2010   1.5
        4/09/2010   4.3
        5/09/2010   5.1
   2    1/05/2007   53.2
        2/05/2007   0
        3/05/2007   21.5
        4/05/2007   2.5

I have read "Deleting DataFrame row in Pandas based on column value" and "Filter columns of only zeros from a Pandas data frame", but have not been able to solve this problem.

What does your dataframe look like? In any case it shouldn't make a difference; simple Boolean indexing should do it:

In [101]: print(df)

Out[101]:
                   c1
first second         
1     2/09/2010   0.0
      3/09/2010   1.5
      4/09/2010   4.3
      5/09/2010   5.1
      6/09/2010   0.0
2     1/05/2007  53.2
      2/05/2007   0.0
      3/05/2007  21.5
      4/05/2007   2.5
      5/05/2007   0.0


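In [102]:

import numpy as np

# Row positions where the first index level changes (first row of each new group)
is_edge = np.argwhere(np.hstack((0, np.diff([item[0] for item in df.index.tolist()]))) != 0).flatten()
# Add the last row of each preceding group, plus the very first and very last rows
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(df) - 1))
# Keep zeros that are not on a group edge, together with every non-zero row
g_idx = np.hstack(([item for item in np.argwhere(df['c1'].values == 0).flatten() if item not in is_edge],
                   np.argwhere(df['c1'].values != 0).flatten())).astype(int)
# .ix has been removed from pandas; .iloc selects by integer position
print(df.iloc[sorted(g_idx)])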



Out[102]:
                   c1
first second         
1     3/09/2010   1.5
      4/09/2010   4.3
      5/09/2010   5.1
2     1/05/2007  53.2
      2/05/2007   0.0
      3/05/2007  21.5
      4/05/2007   2.5
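
To make the `is_edge` step easier to follow, here is a small standalone sketch of what it computes for the first-level labels of the example above. The names `first`, `starts` and `edges` are mine and purely illustrative, and `np.diff(..., prepend=...)` assumes NumPy 1.16 or later.

```python
import numpy as np

# First-level labels of the ten example rows (two events of five rows each)
first = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])

# Positions where the label differs from the previous row: the first row of each new group
starts = np.flatnonzero(np.diff(first, prepend=first[0]) != 0)

# Add the row just before each start (last row of the previous group),
# plus the very first and very last rows of the whole index
edges = np.unique(np.concatenate((starts, starts - 1, [0, len(first) - 1])))

print(starts.tolist())  # [5]
print(edges.tolist())   # [0, 4, 5, 9]
```

Only zeros sitting exactly on one of these edge positions are dropped, so this approach removes at most one zero from each end of a group. A group that starts or ends with several consecutive zeros would keep all but the outermost one.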

If you have a series instead of a dataframe, say the series is s, you can either:

Convert it to a dataframe:

df = s.to_frame('c1')

Or:

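In [113]:
import numpy as np

# Edge rows: first and last row of each first-level group
is_edge = np.argwhere(np.hstack((0, np.diff([item[0] for item in s.index.tolist()]))) != 0).flatten()
is_edge = np.hstack((is_edge, is_edge - 1, 0, len(s) - 1))
# Keep zeros that are not on an edge, together with every non-zero row
g_idx = np.hstack(([item for item in np.argwhere(s.values == 0).flatten() if item not in is_edge],
                   np.argwhere(s.values != 0).flatten())).astype(int)
# Select by integer position; plain s[...] would try to match the integers as labels
s.iloc[sorted(g_idx)]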
Out[113]:
first  second   
1      3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
dtype: float64

BTW, I generated the series with:

In [116]:
import numpy as np
import pandas as pd

tuples = [(1, '2/09/2010'),
          (1, '3/09/2010'),
          (1, '4/09/2010'),
          (1, '5/09/2010'),
          (1, '6/09/2010'),
          (2, '1/05/2007'),
          (2, '2/05/2007'),
          (2, '3/05/2007'),
          (2, '4/05/2007'),
          (2, '5/05/2007')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.array([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.]), index=index)
s
Out[116]:
first  second   
1      2/09/2010     0.0
       3/09/2010     1.5
       4/09/2010     4.3
       5/09/2010     5.1
       6/09/2010     0.0
2      1/05/2007    53.2
       2/05/2007     0.0
       3/05/2007    21.5
       4/05/2007     2.5
       5/05/2007     0.0
dtype: float64

Is this the same structure as yours?
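
If a group can begin or end with more than one zero, a grouped cumulative sum may be a simpler way to express "leading" and "trailing". The following is only a sketch under that assumption, not the method used above: `trim_edge_zeros` is a hypothetical helper name, and it assumes the events live in index level 0 of the series.

```python
import pandas as pd


def trim_edge_zeros(s, level=0):
    """Drop leading and trailing zeros within each group of index level `level`."""
    nonzero = s.ne(0).astype(int)
    # Running count of non-zero values seen so far, restarted for each group
    seen_before = nonzero.groupby(level=level).cumsum()
    # Same count taken from the end of each group, then flipped back to the original order
    seen_after = nonzero.iloc[::-1].groupby(level=level).cumsum().iloc[::-1]
    # A row survives if a non-zero value exists at or before it and at or after it
    keep = (seen_before.to_numpy() > 0) & (seen_after.to_numpy() > 0)
    return s[keep]


index = pd.MultiIndex.from_tuples(
    [(1, '2/09/2010'), (1, '3/09/2010'), (1, '4/09/2010'), (1, '5/09/2010'), (1, '6/09/2010'),
     (2, '1/05/2007'), (2, '2/05/2007'), (2, '3/05/2007'), (2, '4/05/2007'), (2, '5/05/2007')],
    names=['first', 'second'])
s = pd.Series([0., 1.5, 4.3, 5.1, 0., 53.2, 0., 21.5, 2.5, 0.], index=index)

trimmed = trim_edge_zeros(s)
print(trimmed)
```

On the example data this keeps the same seven rows as the edge-based version, including the interior zero on 2/05/2007, and it does not require the first index level to be numeric.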
