Convert a Pandas DataFrame to a list of lists
I have a Pandas DataFrame in the following format:
df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
[1, 3, 1, 3, 4, 6, 1],
[1, 4, 1, 2, 6, 5, 0],
[1, 5, 1, 3, 3, 6, 0],
[2, 6, 3, 5, 1, 3, 1],
[2, 7, 3, 2, 6, 8, 1],
[2, 1, 3, 1, 0, 4, 1]],
columns=['person_id', 'object_id', 'col_1','col_2','col_3','col_4','label'])
In a more visual way, this is how the DataFrame looks. It has a person_id and an object_id column, then some columns such as col_x, and finally the label.
person_id object_id col_1 col_2 col_3 col_4 label
0 1 2 4 5 7 8 1
1 1 3 1 3 4 6 1
2 1 4 1 2 6 5 0
3 1 5 1 3 3 6 0
4 2 6 3 5 1 3 1
5 2 7 3 2 6 8 1
6 2 1 3 1 0 4 1
I want to use a function from a library that needs the input in a specific format. Specifically, I want to group by person_id, object_id and label, and then create a list of lists with the col_x values and a regular list with the labels. Based on the example above, the result would be:
bags = [
[[4, 5, 7, 8],[1, 3, 4, 6]],
[[1, 2, 6, 5],[1, 3, 3, 6]],
[[3, 5, 1, 3],[3, 2, 6, 8],[3, 1, 0, 4]]
]
labels = [1,0,1]
What I do now is iterate over the DataFrame and build the two new lists dynamically. However, I know this is not wise, and I am looking for a more Pythonic approach with better performance.
My ugly solution:
bags = []
labels = []
uniquePeople = df['person_id'].unique()
predictors = ['col_1','col_2','col_3','col_4']
for unp in uniquePeople:
    # rows of this person with label 1 (element-wise AND is `&`, not `&&`)
    person = df[(df['person_id'] == unp) & (df['label'] == 1)][predictors].values
    label = 1
    if len(person) > 0:
        bags.append(person)
        labels.append(label)
    # rows of this person with label 0
    person = df[(df['person_id'] == unp) & (df['label'] == 0)][predictors].values
    label = 0
    if len(person) > 0:
        bags.append(person)
        labels.append(label)
PS: I edited the code heavily on the fly to make it suitable for Stack Overflow. If you find something wrong in it, don't bother. The aim is to find a better solution, not to fix the ugly one :P
Use DataFrame.groupby on both columns with a lambda function to build a Series of nested lists:
predictors = ['col_1','col_2','col_3','col_4']
s = (df.groupby(['person_id','label'], sort=False)[predictors]
.apply(lambda x: x.values.tolist()))
print (s)
person_id label
1 1 [[4, 5, 7, 8], [1, 3, 4, 6]]
0 [[1, 2, 6, 5], [1, 3, 3, 6]]
2 1 [[3, 5, 1, 3], [3, 2, 6, 8], [3, 1, 0, 4]]
dtype: object
Then convert the Series to a list:
bags = s.tolist()
print (bags)
[[[4, 5, 7, 8], [1, 3, 4, 6]],
[[1, 2, 6, 5], [1, 3, 3, 6]],
[[3, 5, 1, 3], [3, 2, 6, 8], [3, 1, 0, 4]]]
The labels are the second level of the MultiIndex, which Index.get_level_values extracts:
labels = s.index.get_level_values(1).tolist()
print (labels)
[1, 0, 1]
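The steps above can be collected into one self-contained sketch. The helper name to_bags and its parameters are hypothetical and chosen only for illustration; the pandas calls are the same ones used in the answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2',
                           'col_3', 'col_4', 'label'])


def to_bags(frame, group_cols, predictors, label_col):
    """Hypothetical helper: return (bags, labels) for the given grouping."""
    # One nested list of predictor rows per group; sort=False keeps the
    # groups in order of first appearance in the frame.
    s = (frame.groupby(group_cols, sort=False)[predictors]
              .apply(lambda g: g.values.tolist()))
    bags = s.tolist()
    # The label is one level of the resulting MultiIndex, selected by name.
    labels = s.index.get_level_values(label_col).tolist()
    return bags, labels


bags, labels = to_bags(df,
                       group_cols=['person_id', 'label'],
                       predictors=['col_1', 'col_2', 'col_3', 'col_4'],
                       label_col='label')
print(bags)
print(labels)
```

Note that sort=False matters for the ordering: with the default sort=True, the groups are ordered by key, so the label 0 group of person 1 would come before its label 1 group.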
Not sure if this is what you are looking for:
import pandas as pd
df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
[1, 3, 1, 3, 4, 6, 1],
[1, 4, 1, 2, 6, 5, 0],
[1, 5, 1, 3, 3, 6, 0],
[2, 6, 3, 5, 1, 3, 1],
[2, 7, 3, 2, 6, 8, 1],
[2, 1, 3, 1, 0, 4, 1]],
columns=['person_id', 'object_id', 'col_1','col_2','col_3','col_4','label']) # example dataframe
df['cols'] = df[['col_1', 'col_2', 'col_3', 'col_4']].apply(lambda x: list(x), axis=1) # create a new column with col_x as element of a list
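tmp = df.groupby(['person_id', 'label'], sort=False)[['cols']].agg(list) # group by and create list of lists; sort=False keeps groups in order of first appearance
bags = tmp['cols'].tolist() # unpack
labels = tmp.index.droplevel(0).tolist() # remaining index level is the label; convert it to a plain list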
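If the helper column feels like overhead, another option is to iterate over the groupby object directly. This is only a sketch of an alternative, not something either answer proposed; it still loops in Python, but once per group rather than once per filter, and it avoids adding a column to df.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2',
                           'col_3', 'col_4', 'label'])

predictors = ['col_1', 'col_2', 'col_3', 'col_4']

bags = []
labels = []
# Grouping by two keys yields (key_tuple, sub_frame) pairs;
# sort=False keeps groups in order of first appearance.
for (person_id, label), group in df.groupby(['person_id', 'label'], sort=False):
    # Predictor rows of this group as a plain list of lists
    bags.append(group[predictors].to_numpy().tolist())
    # Convert the NumPy integer key to a built-in int
    labels.append(int(label))

print(bags)
print(labels)
```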