How to extract every other element from a huge single-column list in Python
I have a huge list in Python, stored as a single column, and I need to pull out all the fruits, colours, etc. from it and build a DataFrame.
example
details=['banana',
'type:',
'fruit',
'color:',
'yellow',
'orange',
'type:',
'fruit',
'color:',
'orange',
'blueberry',
'type:',
'fruit',
'color:',
'blue']
What I expect is that if I extract all the colors from the list above, the result is a single-column list like the one below.
Out[1]:
['yellow',
'orange',
'blue']
details=['banana',
'type:',
'fruit',
'color:',
'yellow',
'orange',
'type:',
'fruit',
'color:',
'orange',
'blueberry',
'type:',
'fruit',
'color:',
'blue']
# split each fruit as a list and index colors
details = [details[i:i+5] for i in range(0,len(details),5)]
fruits = []
color = []
for i in details:
    fruits.append(i[0])  # first item of each chunk is the fruit name
    color.append(i[4])   # fifth item of each chunk is the color value
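If every record really does occupy exactly five consecutive entries (name, 'type:', type, 'color:', color), the same result can probably be had with stride slicing instead of an explicit loop. This is only a sketch under that fixed-layout assumption; the name RECORD_LEN is made up for illustration, and the slices give wrong results if any record has a missing or extra field.

```python
details = ['banana', 'type:', 'fruit', 'color:', 'yellow',
           'orange', 'type:', 'fruit', 'color:', 'orange',
           'blueberry', 'type:', 'fruit', 'color:', 'blue']

RECORD_LEN = 5  # assumed layout: name, 'type:', type, 'color:', color

# Refuse to slice a list that does not divide evenly into whole records
if len(details) % RECORD_LEN != 0:
    raise ValueError("details does not contain whole 5-item records")

fruits = details[0::RECORD_LEN]  # every 5th item, starting at index 0
types = details[2::RECORD_LEN]   # every 5th item, starting at index 2
colors = details[4::RECORD_LEN]  # every 5th item, starting at index 4

print(fruits)
print(colors)
```

Slicing copies only the selected items, so it should stay cheap even on a very large list.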
One possible approach, as long as the data structure doesn't change, is a list comprehension:
eg: [details[i+1] for i, x in enumerate(details) if x == 'color:']
Full code:
details=['banana',
'type:',
'fruit',
'color:',
'yellow',
'orange',
'type:',
'fruit',
'color:',
'orange',
'blueberry',
'type:',
'fruit',
'color:',
'blue']
colors = [details[i+1] for i, x in enumerate(details) if x == 'color:']
fruits = [details[i-1] for i, x in enumerate(details) if x == 'type:']
types = [details[i+1] for i, x in enumerate(details) if x == 'type:']
print('fruits: ', fruits)
print('types: ', types)
print('colors: ', colors)
Output:
fruits: ['banana', 'orange', 'blueberry']
types: ['fruit', 'fruit', 'fruit']
colors: ['yellow', 'orange', 'blue']
Or as a DataFrame:
# build the DataFrame column by column
import pandas as pd
df = pd.DataFrame()
df['fruits'] = [details[i-1] for i, x in enumerate(details) if x == 'type:']
df['types'] = [details[i+1] for i, x in enumerate(details) if x == 'type:']
df['colors'] = [details[i+1] for i, x in enumerate(details) if x == 'color:']
print(df)
Output:
      fruits  types  colors
0     banana  fruit  yellow
1     orange  fruit  orange
2  blueberry  fruit    blue
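One caveat: the three comprehensions line up only if every fruit has both a 'type:' and a 'color:' entry. If one is missing, the columns end up with different lengths and the column assignment fails. A minimal sketch of a more tolerant variant follows, assuming that any entry not consumed as a label or a label's value starts a new record; the helper parse_records and the key "name" are hypothetical names, not part of the answer above.

```python
def parse_records(items, labels=("type:", "color:")):
    """Group a flat list into one dict per record.

    Assumption: an item that is neither a label nor the value right
    after a label is the name that starts a new record.
    """
    records = []
    i = 0
    while i < len(items):
        item = items[i]
        if item in labels:
            # A label needs an open record before it and a value after it
            if not records or i + 1 >= len(items):
                raise ValueError(f"dangling label {item!r} at index {i}")
            records[-1][item.rstrip(":")] = items[i + 1]
            i += 2  # skip the label and its value
        else:
            records.append({"name": item})
            i += 1
    return records


details = ['banana', 'type:', 'fruit', 'color:', 'yellow',
           'orange', 'type:', 'fruit', 'color:', 'orange',
           'blueberry', 'type:', 'fruit', 'color:', 'blue']

records = parse_records(details)
print(records[0])
```

Passing the result to pd.DataFrame(records) should give an equivalent table with columns name, type and color, with NaN wherever a record lacks a field.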
Explanation:
The value of a color comes right after the string 'color:', so find the indices of that string in the list and take the element that follows each one.
If you want to extract only colors, you can use the package colour
- https://github.com/vaab/colour
In [7]: from colour import Color
In [8]: details
Out[8]:
['banana',
'type:',
'fruit',
'color:',
'yellow',
'orange',
'type:',
'fruit',
'color:',
'orange',
'blueberry',
'type:',
'fruit',
'color:',
'blue']
In [9]: colors_only = set()
In [10]: for i in details:
    ...:     try:
    ...:         Color(i)
    ...:         colors_only.add(i)
    ...:     except: pass
    ...:
In [11]: colors_only
Out[11]: {'yellow', 'orange', 'blue'}
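Two things to keep in mind with this approach: a set has no guaranteed order, and the fruit 'orange' is also a valid color name, so it is accepted as well and then merged with the color 'orange'. If first-seen order matters, a dict can stand in for the set. The sketch below is runnable without the colour package because it replaces the Color(i) check with a hypothetical stand-in set, KNOWN_COLORS; the helper unique_in_order is likewise a made-up name.

```python
def unique_in_order(items, is_valid):
    """Return the items accepted by is_valid, deduplicated, in first-seen order."""
    seen = {}  # dicts preserve insertion order (Python 3.7+)
    for item in items:
        if item not in seen and is_valid(item):
            seen[item] = True
    return list(seen)


# Hypothetical stand-in for the Color(i) check used above
KNOWN_COLORS = {"yellow", "orange", "blue"}

details = ['banana', 'type:', 'fruit', 'color:', 'yellow',
           'orange', 'type:', 'fruit', 'color:', 'orange',
           'blueberry', 'type:', 'fruit', 'color:', 'blue']

colors_only = unique_in_order(details, lambda s: s in KNOWN_COLORS)
print(colors_only)
```

Like the set version, this cannot tell a fruit named 'orange' from the color, so it suits "which colors appear" better than "one color per fruit".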