How to truncate a column in a Pandas time series data frame so as to remove leading and trailing zeros?
I have the following time series df in Pandas:
date value
2015-01-01 0
2015-01-02 0
2015-01-03 0
2015-01-04 3
2015-01-05 0
2015-01-06 4
2015-01-07 0
I would like to remove the leading and trailing zeroes, so as to have the following df:
date value
2015-01-04 3
2015-01-05 0
2015-01-06 4
Simply dropping rows with 0s in them would delete the 0s in the middle as well, which I don't want.
I thought of writing a forward loop that starts at the first row and continues until the first non-zero value, and a second backward loop that starts from the end and stops at the last non-zero value. But that seems like overkill. Is there a more efficient way of doing so?
A general solution, which returns an empty DataFrame if all values are 0: test the values for not equal to 0, take the cumulative sum of that mask and test it for not equal to 0, do the same on the mask reversed with [::-1], chain both conditions with bitwise AND, and filter by boolean indexing:
s = df['value'].ne(0)
df = df[s.cumsum().ne(0) & s[::-1].cumsum().ne(0)]
print (df)
date value
3 2015-01-04 3
4 2015-01-05 0
5 2015-01-06 4
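To make the all-zero behaviour concrete, here is a minimal self-contained sketch of the same cumulative-sum mask. The helper name trim_zeros and the rebuilt sample frame are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question (dates parsed as timestamps here).
df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=7),
    "value": [0, 0, 0, 3, 0, 4, 0],
})

def trim_zeros(frame, col="value"):
    """Hypothetical helper: drop leading and trailing rows where col == 0."""
    nonzero = frame[col].ne(0)
    # True from the first non-zero row onward.
    seen_before = nonzero.cumsum().ne(0)
    # True up to and including the last non-zero row (computed on the reversed mask).
    seen_after = nonzero[::-1].cumsum().ne(0)
    # The two masks are aligned on the index before being combined.
    return frame[seen_before & seen_after]

trimmed = trim_zeros(df)
empty = trim_zeros(df.assign(value=0))   # all zeros: nothing survives

print(trimmed["value"].tolist())
print(len(empty))
```

The inner 0 on 2015-01-05 is kept because both masks are True for it, while the all-zero frame yields no rows at all.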
If there is always at least one non-zero value, you can convert 0 to missing values and use DataFrame.loc with DataFrame.first_valid_index and DataFrame.last_valid_index:
s = df['value'].mask(df['value'] == 0)
df = df.loc[s.first_valid_index():s.last_valid_index()]
print (df)
date value
3 2015-01-04 3
4 2015-01-05 0
5 2015-01-06 4
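The "at least one non-zero value" condition matters. The sketch below, which rebuilds the sample frame as an assumption, suggests why: when every value is 0, both index lookups return None, and slicing with None on both ends selects the whole frame instead of an empty one.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=7),
    "value": [0, 0, 0, 3, 0, 4, 0],
})

# Normal case: zeros become NaN, then slice between the first and last valid labels.
s = df["value"].mask(df["value"] == 0)
trimmed = df.loc[s.first_valid_index():s.last_valid_index()]  # label slice, inclusive

# Edge case: every value is 0, so there is no valid index at all.
all_zero = df.assign(value=0)
z = all_zero["value"].mask(all_zero["value"] == 0)
first, last = z.first_valid_index(), z.last_valid_index()     # both are None
untrimmed = all_zero.loc[first:last]                          # same as .loc[:]

print(trimmed["value"].tolist())
print(first, last, len(untrimmed))
```

If an all-zero column is possible, the cumulative-sum mask is the safer choice, or the None case can be checked explicitly before slicing.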
Another idea is to use DataFrame.idxmax or DataFrame.idxmin:
s = df['value'].eq(0)
df = df.loc[s.idxmin():s[::-1].idxmin()]
print (df)
date value
3 2015-01-04 3
4 2015-01-05 0
5 2015-01-06 4
Or, with idxmax:
s = df['value'].ne(0)
df = df.loc[s.idxmax():s[::-1].idxmax()]
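Since the question is about a time series, the same idxmax idea can be sketched with the dates as the index. This assumes the date column is parsed to timestamps and set as a sorted index, which the original post does not show; the name ts is hypothetical.

```python
import pandas as pd

# Assumed setup: dates as a sorted DatetimeIndex rather than a plain column.
ts = pd.DataFrame(
    {"value": [0, 0, 0, 3, 0, 4, 0]},
    index=pd.date_range("2015-01-01", periods=7, name="date"),
)

s = ts["value"].ne(0)
start = s.idxmax()         # label of the first True, i.e. first non-zero value
end = s[::-1].idxmax()     # label of the first True when scanning from the end
trimmed = ts.loc[start:end]  # label-based slice, both ends inclusive

print(trimmed)
```

Because idxmax returns index labels, the slice bounds here are timestamps rather than integer positions. Note that on an all-zero column idxmax still returns a label (the first one at each end), so this variant also does not produce an empty frame in that case.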
You can get a list of the index labels where value is greater than 0, and then find the min.
import pandas as pd

data = [
['2015-01-01', 0],
['2015-01-02', 0],
['2015-01-03', 0],
['2015-01-04', 3],
['2015-01-05', 0],
['2015-01-06', 4]
]
df = pd.DataFrame(data, columns = ['date', 'value'])
print(min(df.index[df['value'] > 0].tolist()))
# 3
Then filter the main df like this:
df.iloc[3:]
Or even better:
df.iloc[min(df.index[df['value'] > 0].tolist()):]
And you get:
date value
3 2015-01-04 3
4 2015-01-05 0
5 2015-01-06 4
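The snippet above trims only the leading zeros, and passing index labels to iloc relies on the default 0..n-1 index. A possible variation that trims both ends by position is sketched below; it swaps in NumPy's flatnonzero, tests for non-zero rather than greater than 0, and adds a trailing zero row to the sample data, all of which are assumptions beyond the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=7),
    "value": [0, 0, 0, 3, 0, 4, 0],   # note the extra trailing zero
})

# Integer positions (not labels) of the non-zero values.
pos = np.flatnonzero(df["value"].to_numpy() != 0)

if pos.size:
    # Slice from the first to the last non-zero position (end is exclusive, hence +1).
    trimmed = df.iloc[pos[0]:pos[-1] + 1]
else:
    # No non-zero values: return an empty frame with the same columns.
    trimmed = df.iloc[0:0]

print(trimmed)
```

Working with positions keeps the slice valid even when the index is not the default range, for example after filtering or with dates as the index.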