Most Pythonic/stylish/efficient way to create a dataframe from a 2-dimensional list of strings with varied length

I guess professional data analysts know the answer to this, but I'm no analyst, and I barely know pandas. So I am at a loss.

There are two lists. Their contents are unpredictable (parsed from web counters, web analytics, web statistics, etc.).

list1 = ['WordA', 'WordB', ..., 'WordXYZ']

...and...

list2 = [['WordA1', 'WordA2'], ['WordB1'], ['WordC1', 'WordC2', ..., 'WordC96'], ..., ['WordXYZ1', 'WordXYZ2']]

The lengths of the two lists are always equal (they're the output of a parser I already wrote).

What I need is to create a dataframe that has two rows for each item in list1, each containing the word in the first column, and then put the corresponding words from list2 into the first of those two rows (starting from the second column, since the first column is already filled from list1).

So I imagine the following steps:

  1. Create a dataframe filled with empty strings ('') with the number of columns equal to len(max(list2, key=len)) and the number of rows equal to twice the length of list1 (aaaand I don't know how, this is actually only my second time using pandas at all!);
  2. Somehow fill the first column of the resulting dataframe with the contents of list1, filling two rows for each item in list1;
  3. Somehow put the contents of list2 into every even row of the dataframe, starting with the second column;
  4. Save into an .xls file (yes, that's the final goal), enjoy the job done.
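Steps 1 to 3 could be sketched roughly as follows. This is only an illustration on a two-item sample, not a recommended approach: the names n_rows and n_cols are made up for the example, and one extra column is assumed on top of the longest list2 entry so that the word from list1 has a column of its own.

```python
import pandas as pd

# Small sample data standing in for the parser output
list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]

# Step 1: an all-empty-string frame; scalar data needs an explicit index and columns
n_rows = 2 * len(list1)
n_cols = 1 + max((len(words) for words in list2), default=0)
df = pd.DataFrame('', index=range(n_rows), columns=range(n_cols))

for i, (word, words) in enumerate(zip(list1, list2)):
    # Step 2: the word goes into the first column of both of its rows
    df.iat[2 * i, 0] = word
    df.iat[2 * i + 1, 0] = word
    # Step 3: the related words fill the first of the two rows, from the second column on
    for j, related in enumerate(words, start=1):
        df.iat[2 * i, j] = related
```

Cell-by-cell assignment is slow for large inputs; it is used here only because it mirrors the steps one to one.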

Now, first thing: I already spent half a day trying to find an answer to "how to create a pandas dataframe filled with empty strings with a given number of rows and columns", and found a lot of different articles which contradict each other.

And second, there's got to be a more Pythonic, more efficient and more stylish way to do all this!

Aaaand, maybe there's a way to create an Excel file without using pandas at all, which I just don't know about (hopefully, yet).

Can anyone help, please?

UPD (to answer a question): the result should look like:

WordA WordA1 WordA2 
WordA 
WordB WordB1 
WordB 
WordC WordC1 WordC2 (...) WordC96 
WordC 
(...)x2 
WordXYZ WordXYZ1 WordXYZ2 
WordXYZ 
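The layout above can also be written down as a plain list of rows before any library is chosen. A minimal sketch, where build_rows is a hypothetical helper name and the sample lists are shortened:

```python
def build_rows(list1, list2):
    """Return two rows per word: the word plus its related words, then the word alone."""
    rows = []
    for word, words in zip(list1, list2):
        rows.append([word, *words])
        rows.append([word])
    return rows


list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]

rows = build_rows(list1, list2)
for row in rows:
    print(' '.join(row))
```

The rows have different lengths, which is fine for writers that accept one row at a time; a rectangular table would need the short rows padded.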

If you just want to write the lists to an Excel file, you don't need pandas. You can use, for instance, openpyxl:

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Two rows per word: the word followed by its related words, then the word alone
for word, words in zip(list1, list2):
    ws.append([word, *words])
    ws.append([word])

wb.save('output.xlsx')

If you really want to use pandas:

import pandas as pd

df = pd.DataFrame([[None] + x if isinstance(x, list) else [x]
                   for pair in zip(list2, list1)
                   for x in pair])
df[0] = df[0].bfill()
df.to_excel('output.xlsx', index=False, header=False)
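To see what the bfill step does, the same comprehension can be run on a small sample. This is just an illustration; the variables rows and first_column are introduced here and are not part of the answer.

```python
import pandas as pd

list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]

# Each pair yields a row with an empty first cell plus the related words,
# followed by a row holding only the word itself
rows = [[None] + x if isinstance(x, list) else [x]
        for pair in zip(list2, list1)
        for x in pair]

df = pd.DataFrame(rows)

# Back-fill the first column so each empty cell takes the word from the row below it
df[0] = df[0].bfill()
first_column = df[0].tolist()
```

The remaining cells of the short rows stay missing, and to_excel writes them as blank cells by default.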

The following should give you (almost) what you want:

import pandas as pd
from itertools import chain

list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]

# Flatten list 2
list2 = list(chain(*list2))

# Create DataFrames
list1 = pd.DataFrame(data=list1, columns=["word1"])
list2 = pd.DataFrame(data=list2, columns=["word2"])

# Prefix for list2
list2["prefix"] = list2["word2"].str.extract("([^0-9]+)")

list1 = list1.merge(list2, left_on="word1", right_on="prefix", how="inner")

# Concatenated words
list1 = list1.groupby("word1")["word2"].agg(lambda x: " ".join(x)).reset_index()
list1["word2"] = list1["word1"].str.cat(list1["word2"], sep=" ")
list1 = pd.melt(list1).sort_values(by="value")
