Python: remove repeated values only if at end of list

I have a Python list where the order of responses is important. I would like to filter out nan values only if they occur at the end of the list. Is there an efficient way to go from a list like the following:

nan = float("nan")
responses = [1.0, nan, 9.0, nan, nan, nan, nan, nan, nan, nan, nan]

to a list without any trailing nan values:

[1.0, nan, 9.0]

I know how to filter out all nan values using a list comprehension:

import pandas as pd
[r for r in responses if pd.notnull(r)]
# [1.0, 9.0]

But I can't think of a straightforward way to filter out nan values only at the end without converting everything to strings and using regular expressions. I could do that, but I am concerned about performance, because the operation will be performed several hundred thousand times.

import math

while responses and math.isnan(responses[-1]):
    responses.pop()

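Update: this isn't as fast as a straight-up slice, although the slice timed below uses a hard-coded index and so skips the search for where the trailing nan values begin.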

>>> timeit.timeit('responses = list(r)\nwhile responses and isnan(responses[-1]): responses.pop()', 'from math import isnan; nan = float("nan"); r = [1.0, nan, 9.0, nan, nan, nan, nan, nan, nan, nan, nan]')
1.3209394318982959
>>> timeit.timeit('responses = list(r)\nresponses = responses[:3]', 'from math import isnan; nan = float("nan"); r = [1.0, nan, 9.0, nan, nan, nan, nan, nan, nan, nan, nan]')
0.29652016144245863

There is no built-in function or method for this, but you can use a loop:

import math

while responses and math.isnan(responses[-1]):
    del responses[-1]

As you can see, this runs in linear time and uses no extra space.
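If mutating the list is acceptable, the same idea could also be sketched with a single deletion: scan backwards to find where the trailing nan run starts, then remove the run with one `del` on a slice. The function name `strip_trailing_nan_inplace` is hypothetical, chosen only for illustration, and the sketch assumes every element is a float.

import math

def strip_trailing_nan_inplace(values):
    """Remove trailing nan values from `values` in place and return it."""
    end = len(values)
    # Walk backwards until the element just before `end` is not nan.
    while end > 0 and math.isnan(values[end - 1]):
        end -= 1
    # One slice deletion truncates the list; a no-op when end == len(values).
    del values[end:]
    return values

nan = float("nan")
responses = [1.0, nan, 9.0, nan, nan, nan]
strip_trailing_nan_inplace(responses)
print(responses)  # [1.0, nan, 9.0]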

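You can reverse it and use itertools.dropwhile. This works for any trailing value that compares equal to itself, and it collapses the trailing run of repeats into a single copy. It does not work when the last value is nan, because nan == nan is False.

import itertools

r = [1.0, nan, 9.0, nan, nan, nan, nan, nan, nan, nan, nan]
# Caveat: with nan as the last element nothing is dropped (nan != nan),
# so this particular r comes back unchanged plus one extra nan.
list(itertools.dropwhile(lambda x: x == r[-1], reversed(r)))[::-1] + r[-1:]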

To filter just nan, replace lambda x: x == r[-1] with math.isnan and drop the trailing + r[-1:]:

list(itertools.dropwhile(math.isnan, reversed(r)))[::-1]
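Wrapped up as a reusable helper, the dropwhile approach might look like the sketch below. The name `strip_trailing` and its `predicate` parameter are hypothetical, introduced here for illustration; the helper returns a new list and leaves the input untouched.

import itertools
import math

def strip_trailing(values, predicate=math.isnan):
    """Return a new list without the trailing items for which predicate is true."""
    # dropwhile skips the leading items of the reversed list, i.e. the tail.
    kept_reversed = itertools.dropwhile(predicate, reversed(values))
    return list(kept_reversed)[::-1]

nan = float("nan")
r = [1.0, nan, 9.0, nan, nan, nan]
print(strip_trailing(r))  # [1.0, nan, 9.0]
print(strip_trailing([0, 3, 0, 0], predicate=lambda x: x == 0))  # [0, 3]

Passing a predicate rather than comparing against r[-1] sidesteps the nan == nan problem, since math.isnan does not rely on equality.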

What I would do is iterate over the list once and find where the trailing run of nans begins. Something like:

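responses = [1.0, 'nan', 9.0, 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']

# Index where the current run of 'nan' starts, or -1 when not inside a run.
# For real float nan values, test with math.isnan(val) instead of == 'nan'.
first_index = -1
for i, val in enumerate(responses):
    if val == 'nan':
        if first_index == -1:
            first_index = i
    else:
        first_index = -1

# Slice only if the list ends in a run; [:-1] would wrongly drop the last item.
if first_index != -1:
    responses = responses[:first_index]  # [1.0, 'nan', 9.0]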

Then you can perform a single slice operation. It's a bit more verbose than the other solutions, but should be quicker.

Time Complexity

According to this page, the slice operation is O(n) and iterating over the list is O(n), so the entire algorithm is O(n).

Even better would be to iterate over the list backwards, so the scan can stop at the first non-nan value instead of visiting every element.
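Iterating backwards might look like the following sketch. It assumes real float nan values (not the 'nan' strings used in the example above), and the name `without_trailing_nan` is hypothetical. The search stops at the first non-nan element from the right, so its cost depends on the length of the trailing run; the final slice still copies the kept elements.

import math

def without_trailing_nan(values):
    """Return a copy of `values` with trailing nan entries removed."""
    # Index of the last non-nan element, or -1 if there is none.
    last = next(
        (i for i in range(len(values) - 1, -1, -1) if not math.isnan(values[i])),
        -1,
    )
    return values[: last + 1]

nan = float("nan")
print(without_trailing_nan([1.0, nan, 9.0, nan, nan]))  # [1.0, nan, 9.0]
print(without_trailing_nan([nan, nan]))                 # []
print(without_trailing_nan([1.0, 2.0]))                 # [1.0, 2.0]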
