List of LISTS of tuples to Pandas dataframe?
I have a list of lists of tuples, where every tuple is of equal length. I need to convert it to a Pandas dataframe so that the number of columns equals the length of the tuples, and each tuple becomes one row with its items spread across the columns.
I have consulted other questions on this topic (e.g., Convert a list of lists of tuples to pandas dataframe, List of list of tuples to pandas dataframe, split list of tuples in lists of list of tuples) without success.
The closest I get is with a list comprehension from a different question on Stack Overflow:
import pandas as pd
tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
# Trying list comprehension from previous stack question:
pd.DataFrame([[y for y in x] for x in tupList])
But this yields an unintended result:
                                0                             1
0  (commentID, commentText, date)  (123456, blahblahblah, 2019)
1      (45678, hello world, 2018)               (0, text, 2017)
The expected result is as follows:
           0             1     2
0  commentID   commentText  date
1     123456  blahblahblah  2019
2      45678   hello world  2018
3          0          text  2017
In sum: I need the number of columns to equal the length of each tuple (in the example, 3), with each tuple forming one row whose items are spread across the columns.
Thanks!
Just flatten your list into a list of tuples (your initial list contains sublists of tuples):
In [1251]: tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
In [1252]: pd.DataFrame([t for lst in tupList for t in lst])
Out[1252]:
           0             1     2
0  commentID   commentText  date
1     123456  blahblahblah  2019
2      45678   hello world  2018
3          0          text  2017
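If the first tuple is meant to be a header rather than a data row (an assumption; the question asks for it as row 0), the same flattening step can feed the column names. A minimal sketch, where `rows`, `header` and `records` are illustrative names not taken from the question:

```python
import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Flatten the nested lists into one flat list of tuples.
rows = [t for lst in tupList for t in lst]

# Assumption: the first tuple holds column names, the rest are data rows.
header, *records = rows
df = pd.DataFrame(records, columns=list(header))
print(df)
```

This leaves three data rows and uses `commentID`, `commentText` and `date` as column labels instead of 0, 1, 2.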
Shorter code for this:
from itertools import chain
import pandas as pd
tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
new_list = [x for x in chain.from_iterable(tupList)]
df = pd.DataFrame.from_records(new_list)
Edit
You can put the list comprehension directly in the from_records call.
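The code for this edit is not shown, so the following is only a sketch of what building the list inside the `from_records` call might look like, reusing `chain.from_iterable` from the snippet above:

```python
from itertools import chain

import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# chain.from_iterable yields every tuple from every sublist in order;
# list() materializes them so from_records receives a plain list of tuples.
df = pd.DataFrame.from_records(list(chain.from_iterable(tupList)))
print(df)
```

This skips the intermediate `new_list` variable and gives the same four-row, three-column frame.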
import pandas as pd
tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
# sum(tupList, []) concatenates the sublists into one flat list of tuples
print(pd.DataFrame(sum(tupList,[])))
Output
           0             1     2
0  commentID   commentText  date
1     123456  blahblahblah  2019
2      45678   hello world  2018
3          0          text  2017
You can do it like this :D
import pandas as pd
tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
# Build the two-column frame of tuples, as in the question:
df = pd.DataFrame([[y for y in x] for x in tupList])
# Expand each column of tuples into its own frame; first tuples get even index labels, second tuples odd ones
df_1 = df[0].apply(pd.Series).assign(index= range(0, df.shape[0]*2, 2)).set_index("index")
df_2 = df[1].apply(pd.Series).assign(index= range(1, df.shape[0]*2, 2)).set_index("index")
# Stack the two frames and sort by index to interleave the rows (this assumes exactly two tuples per sublist)
pd.concat([df_1, df_2], axis=0).sort_index()