Why am I getting an IndexError when converting a long list to a Pandas DataFrame?
I have a list of tuples, each holding an OrderedDict and a label, that looks like this (the actual list contains ~22,000 elements):
o_dict_list = [(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
(OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous')]
When I try to convert the entire list to a Pandas DataFrame, using the question and solution noted here, I get the following error:
IndexError: string index out of range
For reference, the line of code that is causing the error is here:
pd.DataFrame([o_dict_list[i][0] for i, j in enumerate(o_dict_list)])
When I trim the list down to 1,000 elements, the DataFrame populates with no issue. The problem only occurs when I use the entire list of ~22K elements.
I am using:
Python 3.6.5 :: Anaconda, Inc.
pandas==0.23.0
numpy 1.15.2
on a Windows 10 machine.
Does anyone know why I get the IndexError when I use the list of ~22K elements?
Update: As noted below, I was able to resolve this issue by breaking up the list and testing each piece. Doing so let me find the part of the list that was causing the code to fail. Thanks for the help.
Clearly some of your data is corrupt, invalid, or not in the expected format. You say the first 1000 elements are OK, so try the next 10000, and keep bisecting the data until you find the subset which causes the problem.
log2(22000) is less than 15, so you will need at most 15 bisections to narrow down where your problem is.
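The bisection described above could be sketched roughly as follows. This is only an illustration under some assumptions: the names find_bad_index and conversion_fails are made up for this example, the empty string used as the bad element is a hypothetical stand-in (the question never says what the offending entry was), and the sketch assumes a slice fails exactly when it contains a malformed element. The check here uses plain Python so it runs without pandas; in the real case you would presumably wrap the pd.DataFrame(...) call from the question in the try block instead.

```python
from collections import OrderedDict


def conversion_fails(rows):
    """Return True if pulling the OrderedDict out of any row raises.

    Stand-in for the real conversion; with pandas you might instead wrap
    pd.DataFrame([row[0] for row in rows]) in the try block.
    """
    try:
        [dict(row[0]) for row in rows]
    except (IndexError, TypeError, ValueError):
        return True
    return False


def find_bad_index(items, fails):
    """Bisect items to find the index of the first element that makes fails() True.

    Assumes a slice fails if and only if it contains a bad element.
    Returns None when the whole list converts cleanly.
    """
    if not fails(items):
        return None
    lo, hi = 0, len(items)  # invariant: items[lo:hi] contains a bad element
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(items[lo:mid]):
            hi = mid  # a bad element is in the left half
        else:
            lo = mid  # left half is clean, so it must be in the right half
    return lo


# Small demonstration with a hypothetical malformed entry (an empty string).
good = (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous')
data = [good] * 10
data[6] = ''  # hypothetical bad element; ''[0] raises IndexError
print(find_bad_index(data, conversion_fails))  # 6
```

Once you have the index, inspect o_dict_list[index] directly to see what is wrong with it. If there may be more than one bad element, fix or drop the one you found and run the search again.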