
How to truncate a column in a Pandas time series data frame so as to remove leading and trailing zeros?

I have the following time series df in Pandas:

date          value
2015-01-01      0
2015-01-02      0
2015-01-03      0
2015-01-04      3
2015-01-05      0
2015-01-06      4 
2015-01-07      0 

I would like to remove the leading and trailing zeroes, so as to have the following df:

date          value
2015-01-04      3
2015-01-05      0
2015-01-06      4 

Simply dropping rows with 0s in them would also delete the 0s in the middle, which I don't want.

I thought of writing a forward loop that starts from the first row and continues until the first non-zero value, and a second backward loop that starts from the end and stops at the last non-zero value. But that seems like overkill. Is there a more efficient way of doing this?

A general solution, which returns an empty DataFrame if all values are 0: build a mask of the non-zero values, take its cumulative sum and test that it is not equal to 0, do the same on the mask reversed with [::-1], chain both conditions with bitwise AND, and filter by boolean indexing:

s = df['value'].ne(0)
df = df[s.cumsum().ne(0) & s[::-1].cumsum().ne(0)]
print (df)
         date  value
3  2015-01-04      3
4  2015-01-05      0
5  2015-01-06      4
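
To make the all-zero behaviour easier to check, here is a minimal self-contained sketch of the same idea. The helper name `trim_zeros` and its `col` parameter are made up for illustration, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

def trim_zeros(df, col="value"):
    """Drop leading and trailing rows where df[col] == 0 (hypothetical helper)."""
    nonzero = df[col].ne(0)
    # True from the first non-zero row onward
    after_first = nonzero.cumsum().ne(0)
    # True up to the last non-zero row; the reversed Series keeps its index
    # labels, so they still line up with the original rows
    before_last = nonzero[::-1].cumsum().ne(0)
    return df[after_first & before_last]

df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=7).strftime("%Y-%m-%d"),
    "value": [0, 0, 0, 3, 0, 4, 0],
})

trimmed = trim_zeros(df)
all_zero = trim_zeros(pd.DataFrame({"value": [0, 0, 0]}))

print(trimmed)        # rows 3, 4 and 5 remain
print(len(all_zero))  # 0: an all-zero column gives an empty frame
```

The zero in the middle survives because both cumulative sums are already non-zero at that row.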

If there is always at least one non-zero value, you can convert the 0s to missing values and use DataFrame.loc with first_valid_index and last_valid_index (called here on the masked Series):

s = df['value'].mask(df['value'] == 0)
df = df.loc[s.first_valid_index():s.last_valid_index()]
print (df)
         date  value
3  2015-01-04      3
4  2015-01-05      0
5  2015-01-06      4
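
The "at least one non-zero value" condition matters. As a rough illustration on small made-up frames, the sketch below shows that when every value is 0 both index lookups return None, so the `.loc` slice keeps every row instead of returning an empty frame.

```python
import pandas as pd

df = pd.DataFrame({"value": [0, 0, 3, 0, 4, 0]})
s = df["value"].mask(df["value"] == 0)        # zeros become NaN
first, last = s.first_valid_index(), s.last_valid_index()
trimmed = df.loc[first:last]                  # label slice, both ends inclusive

# Edge case: a column that contains only zeros
zeros = pd.DataFrame({"value": [0, 0, 0]})
z = zeros["value"].mask(zeros["value"] == 0)
z_first, z_last = z.first_valid_index(), z.last_valid_index()
untrimmed = zeros.loc[z_first:z_last]         # .loc[None:None] is a full slice

print(trimmed)
print(z_first, z_last)  # None None
print(len(untrimmed))   # 3: nothing was removed
```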

Another idea is to use idxmin or idxmax on a boolean mask, again assuming at least one non-zero value:

s = df['value'].eq(0)
df = df.loc[s.idxmin():s[::-1].idxmin()]
print (df)
         date  value
3  2015-01-04      3
4  2015-01-05      0
5  2015-01-06      4

s = df['value'].ne(0)
df = df.loc[s.idxmax():s[::-1].idxmax()]
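
Since the question is about a time series, here is a sketch of the same `idxmax` approach with the dates as the index. Making `date` a `DatetimeIndex` is an assumption for illustration; the question's frame keeps `date` as a column. Because `.loc` slices by label, the code is otherwise unchanged.

```python
import pandas as pd

# Assumed layout: dates as the index rather than a column
ts = pd.DataFrame(
    {"value": [0, 0, 0, 3, 0, 4, 0]},
    index=pd.date_range("2015-01-01", periods=7, name="date"),
)

nonzero = ts["value"].ne(0)
start = nonzero.idxmax()        # label of the first True
end = nonzero[::-1].idxmax()    # label of the last True
trimmed = ts.loc[start:end]     # label slice, both ends inclusive

print(trimmed)
```

If every value is 0, `idxmax` simply returns the first label of the Series it is called on, so this slice would keep the whole frame rather than return an empty one.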

You can get a list of the index values where value is greater than 0, and then take the min, which gives the index of the first non-zero row.

import pandas as pd

data = [
    ['2015-01-01',      0],
    ['2015-01-02',      0],
    ['2015-01-03',      0],
    ['2015-01-04',      3],
    ['2015-01-05',      0],
    ['2015-01-06',      4]
]
df = pd.DataFrame(data, columns=['date', 'value'])

print(min(df.index[df['value'] > 0].tolist()))
# 3

Then filter the main df like this:

df.iloc[3:]

Or even better:

df.iloc[min(df.index[df['value'] > 0].tolist()):]

And you get:

    date        value
3   2015-01-04  3
4   2015-01-05  0
5   2015-01-06  4
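
Note that the snippet above trims only the leading zeros, and passing index labels to `iloc` works only with the default integer index. One possible way to extend the idea to both ends is sketched below. It swaps in NumPy's `flatnonzero` to get integer positions and tests `!= 0` instead of `> 0`, both of which are my substitutions, and it uses the question's seven-row data, which includes a trailing zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04",
             "2015-01-05", "2015-01-06", "2015-01-07"],
    "value": [0, 0, 0, 3, 0, 4, 0],
})

# Integer positions (not index labels) of the non-zero rows
pos = np.flatnonzero(df["value"].to_numpy() != 0)

# Slice from the first to the last non-zero position; empty frame if there is none
trimmed = df.iloc[pos[0]:pos[-1] + 1] if pos.size else df.iloc[0:0]

print(trimmed)
```

Working with positions keeps this independent of the index type, so it should behave the same if the dates are later set as the index.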
