
Split list in pandas dataframe columns and rows

I wrote a little crawler for a website and obtained a list with the following structure:

'DRAFT ACT: OPEN\nSome Information \nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020', 'DRAFT ACT: OPEN\nSome other Information\nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020', ...

Now I would like to separate this text list into a pandas dataframe, dividing columns by \n and rows by , . Unfortunately, I don't know how to approach this. Could someone please help me? Is there an easy way to split this list using pandas or another package?

The result should look like this:

     Column1          Column2                 Column3  Column4  Column5  Column6       Column7  Column8
Row1 DRAFT ACT: OPEN  Some Information        Topic    Justice  Type     Implementing  Period   12.11.2020 - 10.12.2020
Row2 DRAFT ACT: OPEN  Some other Information  Topic    Justice  Type     Implementing  Period   12.11.2020 - 10.12.2020

Thank you very much in advance!

Let's say you have a list of strings like this:

list1=['DRAFT ACT: OPEN\nSome Information \nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020', 'DRAFT ACT: OPEN\nSome other Information\nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020']

You can iterate over the list and split each item on \n, like this:

list1=[x.split('\n') for x in list1]

or like this:

for idx,item in enumerate(list1):
    list1[idx]=item.split('\n')

Now you can create a DataFrame from list1:

import pandas as pd
df=pd.DataFrame(list1,columns=['Column1','Column2','Column3','Column4','Column5','Column6','Column7','Column8'])
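Putting the steps above together, a minimal end-to-end sketch might look like the following. The `.strip()` call and the generated column names are additions that are not part of the answer above; they assume that every item splits into the same number of fields.

```python
import pandas as pd

list1 = [
    'DRAFT ACT: OPEN\nSome Information \nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020',
    'DRAFT ACT: OPEN\nSome other Information\nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020',
]

# Split each item on the newline and strip stray whitespace from every field
# (stripping is an assumption; the crawler output has a trailing space in one field).
rows = [[field.strip() for field in item.split('\n')] for item in list1]

# Build Column1..ColumnN from the width of the first row (assumes equal widths).
columns = [f'Column{i}' for i in range(1, len(rows[0]) + 1)]

df = pd.DataFrame(rows, columns=columns)
print(df)
```

Generating the column names from the data avoids hard-coding eight labels, at the cost of failing if a later item has a different number of fields.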

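Another approach is to rewrite the separators so that the text becomes CSV, write it to a file, and read it back with pandas:

import pandas as pd

x = "'DRAFT ACT: OPEN\nSome Information \nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020', 'DRAFT ACT: OPEN\nSome other Information\nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020'"

# Drop the single quotes so they do not end up inside the first and last fields
x = x.replace("'", "")
# Turn field separators (\n) into commas and item separators (,) into line breaks,
# using "_" as a temporary placeholder
x = x.replace("\n", "_")
x = x.replace(",", "\n")
x = x.replace("_", ",")

with open("output.csv", "w") as file:
    file.write(x)

# header=None keeps the first item as a data row instead of treating it as the header;
# skipinitialspace=True drops the space left after each item separator
with open("output.csv", "r") as file:
    z = pd.read_csv(file, header=None, skipinitialspace=True)

print(z, type(z))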
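If the data is already available as a Python list of strings, the round trip through a CSV file may not be needed. One possible alternative, which is not taken from the answers above, is to let pandas do the splitting with `Series.str.split` and `expand=True`. The variable name `items` and the `Row`/`Column` labels are illustrative assumptions that mirror the layout requested in the question.

```python
import pandas as pd

# Assumed input: the crawler result as a list of strings (name is illustrative).
items = [
    'DRAFT ACT: OPEN\nSome Information \nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020',
    'DRAFT ACT: OPEN\nSome other Information\nTopic\nJustice\nType\nImplementing\nPeriod\n12.11.2020 - 10.12.2020',
]

# expand=True spreads the split fields across columns, one row per list item.
df = pd.Series(items).str.split('\n', expand=True)

# Relabel the default 0-based integer labels to match the requested layout.
df.columns = [f'Column{i + 1}' for i in df.columns]
df.index = [f'Row{i + 1}' for i in df.index]

print(df)
```

With this approach, items that contain fewer fields than the longest one are padded with missing values rather than raising an error.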
