Most pythonic/stylish/efficient way to create a dataframe from a 2-dimensional list of strings with varied length
I guess professional data analysts know the answer to this, but I'm no analyst, and I barely know Pandas, so I'm at a loss.
There are two lists. Their contents are unpredictable (parsed from web counters, web analytics, web statistics, etc.).
list1 = ['WordA', 'WordB', ..., 'WordXYZ']
...and...
list2 = [['WordA1', 'WordA2'], ['WordB1'], ['WordC1', 'WordC2', ..., 'WordC96'], ..., ['WordXYZ1', 'WordXYZ2']]
The lengths of the two lists are always equal (they're the output of a parser I already wrote).
What I need is to create a dataframe that has two rows for each item in list1, each containing the word in the first column, and then to put the corresponding words from list2 into the first row of each pair (starting from the second column, since the first column is already filled from list1).
So I imagine the following steps:
1. Create a dataframe filled with empty strings ('') with the number of columns equal to len(max(list2, key=len)) and the number of rows equal to twice the length of list1 (aaaand I don't know how, this is actually my very second time using Pandas at all!);
2. Somehow fill the first column of the resulting dataframe with the contents of list1, filling two rows for each item in list1;
3. Somehow put the contents of list2 into every even row of the dataframe, starting with the second column.
Now, first thing: I already spent half a day trying to find an answer to "how to create a pandas dataframe filled with empty strings with a given number of rows and columns", and found a lot of different articles that contradict each other.
And second, there's got to be a more pythonic, more efficient and more stylish way to do all this!
Aaaand, maybe there's a way to create an Excel file without using pandas at all, which I just don't know about (hopefully, yet).
Can anyone help, please?
UPD (to answer a question): the results should look like:
WordA WordA1 WordA2
WordA
WordB WordB1
WordB
WordC WordC1 WordC2 (...) WordC96
WordC
(...)x2
WordXYZ WordXYZ1 WordXYZ2
WordXYZ
If you just want to write the lists to an Excel file, you don't need pandas. You can use, for instance, openpyxl:
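from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Star-unpacking makes `word` a one-element list, so it can be concatenated with `words`
for *word, words in zip(list1, list2):
    ws.append(word + words)  # first row: the word followed by its related words
    ws.append(word)          # second row: the word alone
wb.save('output.xlsx')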
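To see what the loop above writes without creating a file, the same star-unpacking can be run against a plain list. This is only a sketch: the two short sample lists are assumed for illustration, and `rows` is a hypothetical name standing in for the worksheet.

```python
# Sample data (assumed for illustration; real lists come from the parser)
list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]

rows = []  # hypothetical stand-in for the worksheet
for *word, words in zip(list1, list2):
    # `word` is a one-element list such as ['WordA'];
    # `words` is the matching sublist from list2
    rows.append(word + words)  # the word followed by its related words
    rows.append(word)          # the word alone

for row in rows:
    print(row)
```

Each item of list1 yields two rows, and only the first of the pair carries the words from list2, which matches the layout shown in the question.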
import pandas as pd

df = pd.DataFrame([[None] + x if isinstance(x, list) else [x]
                   for pair in zip(list2, list1) for x in pair])
df[0] = df[0].bfill()
df.to_excel('output.xlsx', index=False, header=False)
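For comparison, the three steps listed in the question can also be followed literally. The sketch below is one possible reading of them, not a recommended replacement for the shorter versions above. It assumes two short sample lists, and it assumes one extra column beyond the longest sublist so that the first column can hold the word from list1.

```python
import pandas as pd

# Sample data (assumed for illustration)
list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]

# Assumption: one column for the word itself plus the longest sublist
n_cols = 1 + max(len(words) for words in list2)
n_rows = 2 * len(list1)

# Step 1: a frame of empty strings with the required shape
df = pd.DataFrame('', index=range(n_rows), columns=range(n_cols))

# Step 2: each word from list1 fills two consecutive rows of the first column
df[0] = [word for word in list1 for _ in range(2)]

# Step 3: the matching words from list2 go into the first row of each pair
for i, words in enumerate(list2):
    df.iloc[2 * i, 1:1 + len(words)] = words

print(df.to_string(index=False, header=False))
```

Passing a scalar together with `index` and `columns` is what produces the pre-filled frame asked about in the question. Writing the result out would then use the same `to_excel` call as above.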
The following should give you (almost) what you want:
import pandas as pd
from itertools import chain
list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]
# Flatten list 2
list2 = list(chain(*list2))
# Create DataFrames
list1 = pd.DataFrame(data=list1, columns=["word1"])
list2 = pd.DataFrame(data=list2, columns=["word2"])
# Prefix for list2
list2["prefix"] = list2["word2"].str.extract("([^0-9]+)")
list1 = list1.merge(list2, left_on="word1", right_on="prefix", how="inner")
# Concatenated words
list1 = list1.groupby("word1")["word2"].agg(lambda x: " ".join(x)).reset_index()
list1["word2"] = list1["word1"].str.cat(list1["word2"], sep=" ")
list1 = pd.melt(list1).sort_values(by="value")