How to truncate a column in a Pandas time series DataFrame to remove leading and trailing zeros?

Question

I have the following time series df in Pandas:

date          value
2015-01-01      0
2015-01-02      0
2015-01-03      0
2015-01-04      3
2015-01-05      0
2015-01-06      4 
2015-01-07      0

I would like to remove the leading and trailing zeros, so as to get the following df:

date          value
2015-01-04      3
2015-01-05      0
2015-01-06      4

Simply dropping the rows that contain 0 would also delete the 0s in the middle, which I don't want.

I thought of writing a forward loop that starts at the first row and continues until the first non-zero value, and a second, backward loop that starts at the end and stops at the last non-zero value. But that seems like overkill. Is there a more efficient way of doing this?

Answer 1

A general solution, which returns an empty DataFrame if all values are 0: build a mask of the non-zero values, test that its cumulative sum is not equal to 0, do the same on the mask reversed with [::-1], chain the two conditions with bitwise AND, and filter by boolean indexing:

s = df['value'].ne(0)
df = df[s.cumsum().ne(0) & s[::-1].cumsum().ne(0)]
print (df)
         date  value
3  2015-01-04      3
4  2015-01-05      0
5  2015-01-06      4
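
To make the all-zero behaviour easy to check, the cumulative-sum approach above can be wrapped in a small function. This is only a sketch: the name trim_zeros and the col parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def trim_zeros(df: pd.DataFrame, col: str = "value") -> pd.DataFrame:
    """Drop leading and trailing rows where df[col] == 0 (hypothetical helper)."""
    nonzero = df[col].ne(0)
    # True from the first non-zero row onward
    after_first = nonzero.cumsum().ne(0)
    # True up to and including the last non-zero row; the reversed Series keeps
    # its original index labels, so & aligns it back to the original rows
    before_last = nonzero[::-1].cumsum().ne(0)
    return df[after_first & before_last]


df = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04",
             "2015-01-05", "2015-01-06", "2015-01-07"],
    "value": [0, 0, 0, 3, 0, 4, 0],
})
trimmed = trim_zeros(df)

# Every value is 0, so both masks are all False and nothing is kept
all_zero = trim_zeros(pd.DataFrame({"date": ["2015-01-01", "2015-01-02"],
                                    "value": [0, 0]}))
```

Here trimmed holds the rows with index 3, 4 and 5, and all_zero is an empty DataFrame with the same columns.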

If there is always at least one non-zero value, you can convert 0 to missing values and use DataFrame.loc with Series.first_valid_index and Series.last_valid_index:

s = df['value'].mask(df['value'] == 0)
df = df.loc[s.first_valid_index():s.last_valid_index()]
print (df)
         date  value
3  2015-01-04      3
4  2015-01-05      0
5  2015-01-06      4
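
The "at least one non-zero value" condition matters: when every value is 0, first_valid_index and last_valid_index both return None, and df.loc[None:None] selects the whole frame instead of nothing. A guarded version might look like the sketch below. The helper name is hypothetical, and returning an empty frame in that case is an assumption about the desired behaviour.

```python
import pandas as pd


def trim_zeros_valid_index(df: pd.DataFrame, col: str = "value") -> pd.DataFrame:
    """Hypothetical helper: first/last_valid_index approach with an all-zero guard."""
    s = df[col].mask(df[col] == 0)   # zeros become NaN
    first = s.first_valid_index()
    if first is None:                # no non-zero value at all
        return df.iloc[0:0]          # empty frame with the same columns
    # Label-based slice, inclusive on both ends
    return df.loc[first:s.last_valid_index()]


df = pd.DataFrame({"value": [0, 0, 0, 3, 0, 4, 0]})
trimmed = trim_zeros_valid_index(df)
all_zero = trim_zeros_valid_index(pd.DataFrame({"value": [0, 0]}))
```

Note that any NaN already present in the column is treated the same way as a 0 by this approach.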

Another idea is to use Series.idxmin or Series.idxmax on a boolean mask:

s = df['value'].eq(0)
df = df.loc[s.idxmin():s[::-1].idxmin()]
print (df)
         date  value
3  2015-01-04      3
4  2015-01-05      0
5  2015-01-06      4

# Equivalent variant: idxmax on the non-zero mask instead of idxmin on the zero mask
s = df['value'].ne(0)
df = df.loc[s.idxmax():s[::-1].idxmax()]
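
Since the question is about a time series, here is a sketch of the idxmax variant with the dates as a DatetimeIndex (an assumption, since the original df keeps date as an ordinary column). The any() guard is also an addition: on all-zero data idxmax returns the first label and the reversed idxmax returns the last one, so the unguarded slice would keep every row.

```python
import pandas as pd

ts = pd.DataFrame(
    {"value": [0, 0, 0, 3, 0, 4, 0]},
    index=pd.date_range("2015-01-01", periods=7, freq="D", name="date"),
)

nonzero = ts["value"].ne(0)
if nonzero.any():
    start = nonzero.idxmax()         # label of the first non-zero row
    end = nonzero[::-1].idxmax()     # label of the last non-zero row
    trimmed_ts = ts.loc[start:end]   # label slice, both ends inclusive
else:
    trimmed_ts = ts.iloc[0:0]        # nothing to keep
```

The slice runs from 2015-01-04 to 2015-01-06 and keeps the 0 in between.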

Answer 2

You can get a list of the index labels where value is greater than 0, and then take the min, which is the first non-zero row. (As written, this only trims the leading zeros and assumes the values are never negative.)

import pandas as pd

data = [
    ['2015-01-01',      0],
    ['2015-01-02',      0],
    ['2015-01-03',      0],
    ['2015-01-04',      3],
    ['2015-01-05',      0],
    ['2015-01-06',      4]
]
df = pd.DataFrame(data, columns = ['date', 'value'])

print(min(df.index[df['value'] > 0].tolist()))
# 3

Then filter the main df like this:

df.iloc[3:]

Or even better:

df.iloc[min(df.index[df['value'] > 0].tolist()):]

And you get:

    date        value
3   2015-01-04  3
4   2015-01-05  0
5   2015-01-06  4
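
The same idea can be extended to trim the trailing zeros as well. The sketch below is not from the answer itself: it swaps the index labels for integer positions from numpy.flatnonzero, so that iloc stays correct even when the index is not the default 0..n-1, and it tests for != 0 rather than > 0 so negative values are kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04",
             "2015-01-05", "2015-01-06", "2015-01-07"],
    "value": [0, 0, 0, 3, 0, 4, 0],
})

# Integer positions (not index labels) of the non-zero rows
pos = np.flatnonzero(df["value"].to_numpy() != 0)

# Slice from the first to the last non-zero position; empty if there is none
result = df.iloc[pos[0]:pos[-1] + 1] if pos.size else df.iloc[0:0]
```

Because the slice is positional, the upper bound needs the + 1 to include the last non-zero row.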

2 solutions

Solution 1
3 2020-02-24 09:02:45

Solution 2
2 2020-02-24 09:09:04