
Why am I getting an IndexError when converting a long list to a Pandas DataFrame?

I have a list of tuples, each pairing an OrderedDict with a label. It looks like this, but the actual list contains ~22,000 elements:

from collections import OrderedDict

o_dict_list = [(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
               (OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
               (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous')]

When I try to convert the entire list to a Pandas DataFrame, using the question and solution noted here, I get the following error:

IndexError: string index out of range

For reference, this is the line of code that causes the error:

pd.DataFrame([o_dict_list[i][0] for i, j in enumerate(o_dict_list)])

When I trim the list down to 1,000 elements, the DataFrame populates with no issue. The error only occurs when I use the entire list of ~22K elements.

I am using:

Python 3.6.5 :: Anaconda, Inc., pandas==0.23.0, and numpy 1.15.2 on a Windows 10 machine.

Does anyone know why I get the IndexError when I use the list of ~22K elements?

Update: As noted below, I was able to resolve this issue by breaking up the list and testing each piece. Doing so let me find the part of the list that was causing the code to fail. Thanks for the help.
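The update does not say what the offending entries looked like. As a rough sketch of testing each element, the check below flags anything that is not an (OrderedDict, label) tuple. The helper name `find_malformed` and the sample data are hypothetical; the empty string is only one assumed example of a bad entry, chosen because `''[0]` raises exactly `IndexError: string index out of range`.

```python
from collections import OrderedDict


def find_malformed(items):
    """Return (index, value) pairs for entries that are not (mapping, label) tuples."""
    bad = []
    for i, item in enumerate(items):
        ok = (
            isinstance(item, tuple)
            and len(item) == 2
            and isinstance(item[0], dict)  # OrderedDict is a dict subclass
        )
        if not ok:
            bad.append((i, item))
    return bad


# Hypothetical sample: one well-formed entry, one empty string, one more good entry.
sample = [
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
    '',  # assumed bad entry, not taken from the original data
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous'),
]

malformed = find_malformed(sample)
print(malformed)
```

A linear scan like this touches every element once, which is cheap for ~22K entries, but it only catches the shapes it explicitly checks for.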

Clearly some of your data is corrupt, invalid, or not in the expected format. You say the first 1000 elements are OK, so try the next 10000, and keep bisecting the data until you find the subset which causes the problem.

log2(22000) is about 14.4, so you will need at most 15 bisections to narrow down where your problem is.
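The bisection could be sketched roughly as follows. This assumes the failure is caused by individual bad elements, so that a slice fails exactly when it contains one. The names `fails`, `find_bad_index`, and `extract_first` are hypothetical, and `extract_first` stands in for the real conversion so the sketch runs without pandas.

```python
from collections import OrderedDict


def fails(chunk, convert):
    """Return True if convert(chunk) raises IndexError."""
    try:
        convert(chunk)
    except IndexError:
        return True
    return False


def find_bad_index(items, convert):
    """Bisect to the index of one element that makes convert() fail.

    Assumes convert(items) fails and that a slice fails exactly when it
    contains a bad element. Returns (index, number_of_bisections).
    """
    lo, hi = 0, len(items)  # invariant: items[lo:hi] contains a bad element
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if fails(items[lo:mid], convert):
            hi = mid  # a bad element is in the left half
        else:
            lo = mid  # otherwise it must be in the right half
    return lo, steps


def extract_first(chunk):
    """Stand-in for the real conversion: take element [0] of every entry."""
    return [item[0] for item in chunk]


# Hypothetical data: three good entries and an assumed bad one at index 2.
good = (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous')
sample = [good, good, '', good]

index, steps = find_bad_index(sample, extract_first)
print(index, steps)  # prints: 2 2
```

To bisect on the actual conversion, pass `lambda chunk: pd.DataFrame([item[0] for item in chunk])` as `convert`. If several elements are bad, this finds one of them; fix or remove it and run the search again.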

