Python: remove repeated values only if at end of list
I have a Python list where the order of responses is important. I would like to filter out nan values only if they occur at the end of the list. I was wondering if there is an efficient way to go from a list like the following:
nan = float("nan")
responses = [1.0, nan, 9.0, nan, nan, nan, nan, nan, nan, nan, nan]
To a list without any trailing nan values:
[1.0, nan, 9.0]
I know how to filter out all nan values using a list comprehension:
import pandas as pd
[r for r in responses if pd.notnull(r)]
>>> [1.0, 9.0]
But I can't think of a straightforward way to filter out only the nan values at the end without converting everything to strings and using regular expressions. I could do that, but I am concerned about performance, because the operation will be performed several hundred thousand times.
import math
while responses and math.isnan(responses[-1]):
    responses.pop()
Update: this isn't as fast as a straight-up slice, although a slice requires already knowing where the trailing nan values start.
>>> timeit.timeit('responses = list(r)\nwhile responses and isnan(responses[-1]): responses.pop()', 'from math import isnan; nan = float("nan"); r = [1.0, nan, 9.0, nan, nan, nan, nan, nan, nan, nan, nan]')
1.3209394318982959
>>> timeit.timeit('responses = list(r)\nresponses = responses[:3]', 'from math import isnan; nan = float("nan"); r = [1.0, nan, 9.0, nan, nan, nan, nan, nan, nan, nan, nan]')
0.29652016144245863
There is no built-in function or method for this, but you can use a loop:
while responses and math.isnan(responses[-1]):
    del responses[-1]
As you can see, this runs in linear time and uses no extra space.
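As a minimal sketch, the loop above could be wrapped in a small helper. The name strip_trailing_nans is made up for illustration, and the sketch assumes every element is a real number, so that math.isnan accepts it:

```python
import math

def strip_trailing_nans(values):
    """Remove trailing NaN entries from `values` in place and return it.

    Hypothetical helper name; assumes all elements are real numbers.
    """
    # The `values and` guard stops the loop once the list is empty,
    # so an all-NaN list does not raise IndexError.
    while values and math.isnan(values[-1]):
        del values[-1]
    return values

nan = float("nan")
responses = [1.0, nan, 9.0, nan, nan, nan]
strip_trailing_nans(responses)
print(responses)  # [1.0, nan, 9.0]
```

Interior nan values are untouched because the loop stops at the first non-nan element it meets from the right.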
You can reverse it and use itertools.dropwhile. This version collapses a trailing run of any repeated value down to a single occurrence, provided the value compares equal to itself (which nan does not):
r = [1.0, nan, 9.0, nan, nan, nan, nan, nan, nan, nan, nan]
list(itertools.dropwhile(lambda x: x == r[-1], reversed(r)))[::-1] + r[-1:]
To filter just nan, you can replace lambda x: x == r[-1] with math.isnan and drop the trailing + r[-1:]:
list(itertools.dropwhile(math.isnan, reversed(r)))[::-1]
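For reference, a self-contained sketch of the same idea with its imports spelled out. The function name without_trailing_nans is hypothetical, and the input is assumed to contain only real numbers:

```python
import itertools
import math

def without_trailing_nans(values):
    """Return a new list with trailing NaNs dropped; `values` is not modified.

    Hypothetical helper name; assumes all elements are real numbers.
    """
    # Walk from the end and skip the leading (originally trailing) NaNs.
    kept_reversed = list(itertools.dropwhile(math.isnan, reversed(values)))
    # Reverse back in place to restore the original order.
    kept_reversed.reverse()
    return kept_reversed

nan = float("nan")
r = [1.0, nan, 9.0, nan, nan, nan, nan]
print(without_trailing_nans(r))  # [1.0, nan, 9.0]
```

Unlike the in-place loop, this builds a new list, so it may suit cases where the original list has to stay intact.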
What I would do is iterate over the list once and find where the trailing sequence of nans begins. Something like
responses = [1.0, 'nan', 9.0, 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']
first_index = -1  # start of the current run of 'nan'; -1 means no run in progress
for i, val in enumerate(responses):
    if val == 'nan':
        if first_index == -1:
            first_index = i
    else:
        first_index = -1
if first_index != -1:  # slice only when the list really ends in 'nan'
    responses = responses[:first_index]  # [1.0, 'nan', 9.0]
Then you can perform a single slice operation. It's a bit more verbose than the other solutions, but should be quicker.
Time Complexity
According to this page, the slice operation is O(n) and iterating over the list is O(n), so the entire algorithm is O(n).
Even better would be to iterate over the list backwards.
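A rough sketch of that backward scan, assuming real float nan values rather than the 'nan' strings used above. The function name trim_trailing_nans is invented for illustration:

```python
import math

def trim_trailing_nans(values):
    """Return a copy of `values` without its trailing run of NaNs.

    Hypothetical helper name; assumes all elements are real numbers.
    """
    end = len(values)
    # Move `end` left while the element just before it is NaN.
    while end > 0 and math.isnan(values[end - 1]):
        end -= 1
    # A single slice copies only the part that is kept.
    return values[:end]

nan = float("nan")
responses = [1.0, nan, 9.0, nan, nan, nan, nan, nan]
print(trim_trailing_nans(responses))  # [1.0, nan, 9.0]
```

Scanning from the end only touches the trailing nan values plus one more element, so the scan itself does work proportional to the length of the trailing run rather than the whole list.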