
Most efficient way to iterate through list of lists

I'm currently collecting data from Quandl and saving it as a list of lists. Each list looks something like this (price data):

['2', 1L, datetime.date(1998, 1, 2), datetime.datetime(2016, 9, 26, 1, 35, 3, 830563), datetime.datetime(2016, 9, 26, 1, 35, 3, 830563), '82.1900', '83.6200', '81.7500', '83.5000', '28.5183', 1286500.0]

This is typically 1 of about 5000 lists, and every once in a while Quandl will spit back some NaN values that don't like being saved into the database.

['2', 1L, datetime.date(1998, 1, 2), datetime.datetime(2016, 9, 26, 1, 35, 3, 830563), datetime.datetime(2016, 9, 26, 1, 35, 3, 830563), 'nan', 'nan', 'nan', 'nan', 'nan', 0]

What would be the most efficient way of iterating through the list of lists to change 'nan' values into zeros?

I know I could do something like this, but it seems rather inefficient. This operation will need to be performed on 11 different values * 5000 different dates * 500 companies:

import math

def screen_data(data):
    new_data = []
    for d in data:
        new_list = []
        for x in d:
            new_value = x
            # math.isnan only accepts real numbers; a string such as 'nan' raises TypeError
            if math.isnan(x):
                new_value = 0
            new_list.append(new_value)

        new_data.append(new_list)
    return new_data

I would be interested in any solution that could reduce the time. I know DataFrames might work, but I'm not sure how they would solve the NaN issue.

Or, if there is a way to store NaN values alongside floats in an SQLServer5.6 database, changing the database is also a viable option.

Don't create a new list; rather, edit the old list in place:

import math

def screenData(L):
    for subl in L:
        for i,n in enumerate(subl):
            # math.isnan only accepts real numbers, so this assumes every element is numeric
            if math.isnan(n):
                subl[i] = 0

The only way I can think of to make this faster would be multiprocessing.
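The sample rows in the question mix strings, dates and floats, so calling math.isnan on every element would raise a TypeError. Below is a minimal sketch of the same in-place idea with type checks added. The name screen_data_in_place and the fill parameter are illustrative assumptions, not part of the answer above.

```python
import math

def screen_data_in_place(rows, fill=0):
    """Replace NaN-like entries in each row in place and return rows."""
    for row in rows:
        for i, value in enumerate(row):
            # Float NaN: check the type first so math.isnan never sees a str or date.
            if isinstance(value, float) and math.isnan(value):
                row[i] = fill
            # String NaN such as 'nan' or 'NaN', as in the question's sample row.
            elif isinstance(value, str) and value.strip().lower() == 'nan':
                row[i] = fill
    return rows

rows = [['2', 1, '82.1900', 1286500.0], ['2', 1, 'nan', float('nan')]]
screen_data_in_place(rows)
print(rows)  # [['2', 1, '82.1900', 1286500.0], ['2', 1, 0, 0]]
```

No new lists are allocated; only the offending slots are overwritten.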

I haven't timed it, but have you tried using a nested list comprehension with conditional expressions?

For example:

import datetime

data = [
    ['2', 1, datetime.date(1998, 1, 2),
     datetime.datetime(2016, 9, 26, 1, 35, 3, 830563),
     datetime.datetime(2016, 9, 26, 1, 35, 3, 830563),
     '82.1900', '83.6200', '81.7500', '83.5000',
     '28.5183', 1286500.0],
    ['2', 1, datetime.date(1998, 1, 2),
     datetime.datetime(2016, 9, 26, 1, 35, 3, 830563),
     datetime.datetime(2016, 9, 26, 1, 35, 3, 830563),
     'nan', 'nan', 'nan', 'nan', 'nan', 0],
]

new_data = [[y if str(y).lower() != 'nan' else 0 for y in x] for x in data]

print(new_data)

I did not use math.isnan(y) because you have to be sure that y is a float, or you'll get an error. That is much harder to guarantee, whereas almost everything has a string representation. I still made the comparison to 'nan' case-insensitive (with .lower()), since 'NaN' and 'Nan' are also legal ways to write "Not a Number".
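As an alternative to the string comparison, one could try converting each value with float() and treat anything that cannot be converted as "not NaN". This is only a sketch; the helper name is_nan is an assumption introduced here, and it has not been timed against the comprehension above.

```python
import datetime
import math

def is_nan(value):
    """Return True for float NaN and for strings that parse to NaN."""
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError, OverflowError):
        # Dates, None, non-numeric strings and huge ints cannot become floats.
        return False

# math.isnan itself rejects strings, which is the error mentioned above.
try:
    math.isnan('nan')
except TypeError as exc:
    print('TypeError:', exc)

row = ['2', 1, datetime.date(1998, 1, 2), 'nan', 'NaN', float('nan'), '83.5000']
cleaned = [0 if is_nan(v) else v for v in row]
print(cleaned)  # ['2', 1, datetime.date(1998, 1, 2), 0, 0, 0, '83.5000']
```

Unlike str(y).lower(), this also catches real float NaN values and padded strings such as ' nan ', at the cost of a float() call per element.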

How about this:

import math

def clean_nan(data_list,value=0):
    for i,x in enumerate(data_list):
        if math.isnan(x):
            data_list[i] = value
    return data_list 

(The return is optional, since the modification is made in place, but it is needed if the function is used with map or similar. This assumes, of course, that data_list is a list or a similar mutable container.)

How you use it depends on how you get your data and how you work with it. For instance, if you do something like this:

for data in source:  # source: placeholder for your database/Quandl/whatever iterable
    ...  # do stuff with data

You can change it to:

for data in source:  # source: placeholder for your database/Quandl/whatever iterable
    clean_nan(data)
    ...  # do stuff with data

Or use map (itertools.imap if you are on Python 2):

for data in map(clean_nan, source):  # source: placeholder for your database/Quandl/whatever iterable
    ...  # do stuff with data

That way you get to work with your data as soon as it arrives from the database/Quandl/whatever, provided the place where you get the data also works as a generator, that is, it doesn't process the whole thing all at once. If it does, try to change it to a generator if possible. Either way, you get to work with your data as soon as possible.
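The streaming idea can be sketched as follows, on Python 3 where map is lazy. The generator fetch_rows is a hypothetical stand-in for a database cursor or Quandl download, and clean_nan gains an isinstance guard (an addition not in the version above) so that non-float values are skipped rather than raising.

```python
import math

def clean_nan(data_list, value=0):
    """Replace float NaN entries in place and return the same list."""
    for i, x in enumerate(data_list):
        # Guard added for this sketch: only floats are passed to math.isnan.
        if isinstance(x, float) and math.isnan(x):
            data_list[i] = value
    return data_list

def fetch_rows():
    """Hypothetical data source that yields one row at a time."""
    yield [82.19, 83.62, 1286500.0]
    yield [float('nan'), float('nan'), 0.0]

processed = []
# map pulls one row from the generator, cleans it, and hands it to the loop
# before the next row is fetched.
for row in map(clean_nan, fetch_rows()):
    processed.append(row)

print(processed)  # [[82.19, 83.62, 1286500.0], [0, 0, 0.0]]
```

Each row is cleaned just before it is used, so the full data set never has to sit in memory twice.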

