
How to convert a Python list into a pandas DataFrame

I have the list below, which I have simplified:

my_list = ['select', 'fruit1', 'fruit2', 'fruit3', 'from', 'basket1',
           'select', 'fruit4', 'from', 'basket2',
           'select', 'fruit5', 'fruit6', 'from', 'basket3', ..... so on]

Note that my list contains 'select' and 'from' statements.

The output I am trying to achieve is a DataFrame or, let's say, Excel output:

Fruit number      Basket number
fruit1            basket1
fruit2            basket1
fruit3            basket1
fruit4            basket2
fruit5            basket3
fruit6            basket3
.                 .
.                 .
.                 .
.                 .

Is there a way to achieve this result? I have tried many things, but nothing works. :(

Something like the below (using a simple "state machine"):

import pandas as pd

lst = ['select', 'fruit1', 'fruit2', 'fruit3', 'from', 'basket1',
       'select', 'fruit4', 'from', 'basket2',
       'select', 'fruit5', 'fruit6', 'from', 'basket3']

data = []    # one dict per (fruit, basket) row
fruits = []  # fruits collected since the last 'select'
state = 'select'
for word in lst:
    if word == 'select':
        # Start collecting fruit names.
        state = 'select'
        continue
    if word == 'from':
        # The next word is the basket name.
        state = 'basket'
        continue
    if state == 'select':
        fruits.append(word)
    if state == 'basket':
        # Emit one row per collected fruit, then reset the buffer.
        for f in fruits:
            data.append({'fruit': f, 'basket': word})
        fruits = []

df = pd.DataFrame(data)
print(df)

Output:

    fruit   basket
0  fruit1  basket1
1  fruit2  basket1
2  fruit3  basket1
3  fruit4  basket2
4  fruit5  basket3
5  fruit6  basket3
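
The same loop can be wrapped in a function so it is reusable, with column names matching the headers in the question. This is a minimal sketch rather than part of the original answer: the function name `tokens_to_frame` and the column labels are assumptions, and it assumes every 'from' is followed by exactly one basket name.

```python
import pandas as pd

def tokens_to_frame(tokens):
    """Hypothetical helper: pair each fruit with the basket named after 'from'."""
    rows = []
    fruits = []
    state = 'select'
    for word in tokens:
        if word == 'select':
            state = 'select'
        elif word == 'from':
            state = 'basket'
        elif state == 'select':
            fruits.append(word)
        else:  # state == 'basket': word is the basket name
            rows.extend({'Fruit number': f, 'Basket number': word} for f in fruits)
            fruits = []
    # Passing columns keeps the headers even when the token list is empty.
    return pd.DataFrame(rows, columns=['Fruit number', 'Basket number'])

tokens = ['select', 'fruit1', 'fruit2', 'fruit3', 'from', 'basket1',
          'select', 'fruit4', 'from', 'basket2',
          'select', 'fruit5', 'fruit6', 'from', 'basket3']
df = tokens_to_frame(tokens)
print(df.to_string(index=False))
```

If an Excel file is wanted, `df.to_excel(...)` should work on the result, provided an Excel writer engine such as openpyxl is installed.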

There are a lot of ways to do this. This approach gets the index of every 'from' and splits two positions after it using np.split, so that each new array starts with a 'select'. The last split point would produce an empty trailing array, so we drop it.

Then you can build a dict from each array by slicing it, and make a DataFrame out of the resulting list.
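
To make the split points concrete, here is a small sketch that prints the intermediate values for the sample list. The variable names `split_points` and `chunks` are illustrative and do not come from the answer.

```python
import numpy as np

my_list = ['select', 'fruit1', 'fruit2', 'fruit3', 'from', 'basket1',
           'select', 'fruit4', 'from', 'basket2',
           'select', 'fruit5', 'fruit6', 'from', 'basket3']

# Two positions past each 'from' is where the next 'select' begins.
split_points = [i + 2 for i, x in enumerate(my_list) if x == "from"]
print(split_points)  # [6, 10, 15]

# The final point equals len(my_list); splitting there would add an
# empty trailing chunk, so it is dropped before calling np.split.
chunks = np.split(np.array(my_list), split_points[:-1])
for chunk in chunks:
    # chunk[1:-2] are the fruits, chunk[-1] is the basket.
    print(chunk[1:-2].tolist(), '->', chunk[-1])
```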

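import numpy as np
import pandas as pd

my_list = ['select', 'fruit1', 'fruit2', 'fruit3', 'from', 'basket1',
           'select', 'fruit4', 'from', 'basket2',
           'select', 'fruit5', 'fruit6', 'from', 'basket3']

# Split points: two positions after each 'from'; the last one is dropped
# because it would only produce an empty trailing array.
f = [i+2 for i, x in enumerate(my_list) if x == "from"][:-1]
s = np.split(my_list, f)

# In each chunk, q[-1] is the basket and q[1:-2] are the fruits.
df = pd.DataFrame([{'basket': q[-1], 'fruits': q[1:-2]} for q in s])
# One row per fruit.
df = df.explode('fruits')
print(df)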

Output:

    basket  fruits
0  basket1  fruit1
0  basket1  fruit2
0  basket1  fruit3
1  basket2  fruit4
2  basket3  fruit5
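2  basket3  fruit6

import pandas as pd

# Fruits and baskets hard-coded as two separate columns (paired by position).
data = {'Select': {'Fruit_Number': ['fruit1', 'fruit2', 'fruit3']},
        'From': {'Basket_Number': ['basket1', 'basket2', 'basket3']}}

data2 = data['Select']
data3 = data['From']

df2 = pd.DataFrame.from_dict(data2)
df3 = pd.DataFrame.from_dict(data3)

# Place the two frames side by side.
l = [df2, df3]
df_all = pd.concat(l, axis=1)
print(df_all)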


      Fruit_Number Basket_Number
0       fruit1       basket1
1       fruit2       basket2
2       fruit3       basket3

Make a generic and reusable split function, like the ones in the answers to this question. Then it is easier to yield pairs from each split group.

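import pandas as pd

def split(sequence, sep):
    # Yield the groups of items found between occurrences of sep.
    group = []
    for item in sequence:
        if item == sep:
            yield group
            group = []
        else:
            group.append(item)
    yield group

def parse_select(tokens):
    # Each group looks like [fruit, ..., 'from', basket]; the group before
    # the first 'select' is empty and yields nothing.
    for group in split(tokens, "select"):
        for item in group[:-2]:
            yield item, group[-1]

# my_list is the list from the question.
print(pd.DataFrame(parse_select(my_list)))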

Or alternatively:

def parse_select(tokens):
    for group in split(tokens, "select"):
        if group:
            items, (basket,) = split(group, "from")
            for item in items:
                yield item, basket
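
For reference, a self-contained sketch that runs this second variant end to end on the sample list. The column names `fruit` and `basket` are an assumption added for readability; the answer itself leaves the columns unnamed.

```python
import pandas as pd

def split(sequence, sep):
    """Yield the groups of items found between occurrences of sep."""
    group = []
    for item in sequence:
        if item == sep:
            yield group
            group = []
        else:
            group.append(item)
    yield group

def parse_select(tokens):
    for group in split(tokens, "select"):
        if group:  # skip the empty group before the first 'select'
            # Splitting on 'from' gives the fruit list and a one-item basket list.
            items, (basket,) = split(group, "from")
            for item in items:
                yield item, basket

my_list = ['select', 'fruit1', 'fruit2', 'fruit3', 'from', 'basket1',
           'select', 'fruit4', 'from', 'basket2',
           'select', 'fruit5', 'fruit6', 'from', 'basket3']

df = pd.DataFrame(list(parse_select(my_list)), columns=['fruit', 'basket'])
print(df)
```

The tuple unpacking `items, (basket,)` also acts as a check: it raises a ValueError if a group does not contain exactly one 'from' followed by exactly one basket name.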
