Most Pythonic/stylish/efficient way to create a dataframe from a 2D list of strings of different lengths

Question

I guess professional data analysts know an answer to this, but I'm no analyst. And I just barely know pandas. So I am at a loss.

There are two lists. Their contents are unpredictable (parsed from web counters, web analytics, web statistics, etc.).

list1 = ['WordA', 'WordB', ..., 'WordXYZ']

...and...

list2 = [['WordA1', 'WordA2'], ['WordB1'], ['WordC1', 'WordC2', ..., 'WordC96'], ..., ['WordXYZ1', 'WordXYZ2']]

The two lists are always the same length (they're the output of a parser I already wrote).

What I need is to create a dataframe that has two rows for each item in list1, each with the word in the first column, and then put the corresponding words from list2 into the first row of each pair (starting from the second column, since the first column is already filled from list1).

So I imagine the following steps:

Create a dataframe filled with empty strings ('') with the number of columns equal to len(max(list2, key=len)) and the number of rows equal to twice the length of list1 (aaaand I don't know how, this is actually only my second time using pandas at all!);
Somehow fill the first column of the resulting dataframe with the contents of list1, filling two rows for each item in list1;
Somehow put the contents of list2 into every even row of the dataframe, starting with the second column;
Save into an .xls file (yes, that's the final goal), enjoy a job done.
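The first three steps above can be sketched roughly as follows. This is only an illustration under some assumptions: build_frame is a hypothetical helper name, the sample lists are stand-ins for the parser output, and the frame gets one extra column (1 + the longest sub-list) so that the word itself fits in the first column.

```python
import pandas as pd

# Small stand-in data; the real lists come from the asker's parser.
list1 = ['WordA', 'WordB', 'WordC']
list2 = [['WordA1', 'WordA2'], ['WordB1'], ['WordC1', 'WordC2', 'WordC3']]


def build_frame(words, related):
    """Hypothetical helper: two rows per word, related words in the first row."""
    n_rows = 2 * len(words)
    # One column for the word itself plus room for the longest sub-list.
    n_cols = 1 + max((len(r) for r in related), default=0)

    # Step 1: a frame of empty strings with a known shape.
    df = pd.DataFrame('', index=range(n_rows), columns=range(n_cols))

    for i, (word, extras) in enumerate(zip(words, related)):
        # Step 2: the word fills column 0 in both rows of its pair.
        df.iloc[2 * i, 0] = word
        df.iloc[2 * i + 1, 0] = word
        # Step 3: related words go into the even (0-based) row, from column 1.
        if extras:
            df.iloc[2 * i, 1:1 + len(extras)] = extras
    return df


df = build_frame(list1, list2)
print(df.to_string(index=False, header=False))
```

For the last step, df.to_excel('output.xlsx', index=False, header=False) should write the frame without the index and header row; pandas needs an Excel engine such as openpyxl installed for that.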

First, I've already spent half a day trying to find an answer to "how to create a pandas dataframe filled with empty strings with a given number of rows and columns", and found a lot of different articles that contradict each other.

And second, there's got to be a more Pythonic, more efficient and more stylish way to do all this!

Aaaand, maybe there's a way to create an Excel file without using pandas at all, which I just don't know about (yet, hopefully).

Can anyone help, please?

UPD: (to answer a question) the results should look like:

WordA WordA1 WordA2 
WordA 
WordB WordB1 
WordB 
WordC WordC1 WordC2 (...) WordC96 
WordC 
(...)x2 
WordXYZ WordXYZ1 WordXYZ2 
WordXYZ

Answer 1

If you just want to write the lists to an Excel file, you don't need pandas. You can use, for instance, openpyxl:

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

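# "*word" captures the list1 item as a one-element list, so it can be
# concatenated with the matching list2 sub-list.
for *word, words in zip(list1, list2):
    ws.append(word + words)  # first row: the word followed by its related words
    ws.append(word)          # second row: the word alone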

wb.save('output.xlsx')
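If installing openpyxl is not an option, a standard-library fallback is to write the same rows as CSV, which Excel can open. Note this is a different format from a real workbook, so it only helps if CSV is acceptable. The sketch below is an assumption-laden illustration: paired_rows and write_csv are hypothetical helper names, and the sample lists stand in for the parser output.

```python
import csv
import io

# Small stand-in data; the real lists come from the asker's parser.
list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]


def paired_rows(words, related):
    """Yield two rows per word: the word plus its related words, then the word alone."""
    for word, extras in zip(words, related):
        yield [word, *extras]
        yield [word]


def write_csv(words, related, fileobj):
    """Write the paired rows to any text file object as CSV."""
    csv.writer(fileobj).writerows(paired_rows(words, related))


# An in-memory buffer keeps the example self-contained; for a real file use
# open('output.csv', 'w', newline='') instead.
buffer = io.StringIO()
write_csv(list1, list2, buffer)
print(buffer.getvalue())
```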

If you really want to use pandas:

import pandas as pd

df = pd.DataFrame([[None] + x if isinstance(x, list) else [x]
                   for pair in zip(list2, list1) for x in pair])
df[0] = df[0].bfill()
df.to_excel('output.xlsx', index=False, header=False)

Answer 2

The following should give you (almost) what you want:

import pandas as pd
from itertools import chain

list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]

# Flatten list 2
list2 = list(chain(*list2))

# Create DataFrames
list1 = pd.DataFrame(data=list1, columns=["word1"])
list2 = pd.DataFrame(data=list2, columns=["word2"])

# Prefix for list2
list2["prefix"] = list2["word2"].str.extract("([^0-9]+)")

list1 = list1.merge(list2, left_on="word1", right_on="prefix", how="inner")

# Concatenated words
list1 = list1.groupby("word1")["word2"].agg(lambda x: " ".join(x)).reset_index()
list1["word2"] = list1["word1"].str.cat(list1["word2"], sep=" ")
list1 = pd.melt(list1).sort_values(by="value")

Answer 1
0 2022-07-19 13:46:51

Answer 2
0 2022-07-19 14:10:00