
Convert a Pandas DataFrame into a list of lists

I have a Pandas DataFrame in the following format:

import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1','col_2','col_3','col_4','label'])

Shown as a table, the DataFrame looks like this. It has a person_id column and an object_id column, then several col_x columns, and finally the label.

   person_id  object_id  col_1  col_2  col_3  col_4  label
0          1          2      4      5      7      8      1
1          1          3      1      3      4      6      1
2          1          4      1      2      6      5      0
3          1          5      1      3      3      6      0
4          2          6      3      5      1      3      1
5          2          7      3      2      6      8      1
6          2          1      3      1      0      4      1

I want to use a function from a library that needs its input in a specific format. Specifically, I want to group the rows by person_id and label. Then I want to create a list of lists from the col_x values and a plain list of the labels. For the example above, the result would be:

bags = [
[[4, 5, 7, 8],[1, 3, 4, 6]],
[[1, 2, 6, 5],[1, 3, 3, 6]],
[[3, 5, 1, 3],[3, 2, 6, 8],[3, 1, 0, 4]]
]

labels = [1,0,1]
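The expected output above is what you get by grouping on each (person_id, label) pair, keeping the pairs in the order they first appear. Below is a plain-Python sketch of that mapping, added only as a reference for the target format and not taken from the question. The rows list is a hypothetical re-typing of the DataFrame as tuples.

```python
# Hypothetical input: the example DataFrame re-typed as tuples of
# (person_id, object_id, col_1, col_2, col_3, col_4, label)
rows = [
    (1, 2, 4, 5, 7, 8, 1),
    (1, 3, 1, 3, 4, 6, 1),
    (1, 4, 1, 2, 6, 5, 0),
    (1, 5, 1, 3, 3, 6, 0),
    (2, 6, 3, 5, 1, 3, 1),
    (2, 7, 3, 2, 6, 8, 1),
    (2, 1, 3, 1, 0, 4, 1),
]

# dicts keep insertion order (Python 3.7+), so each (person_id, label)
# group appears in the order it is first seen in the rows
groups = {}
for person_id, _object_id, *cols, label in rows:
    groups.setdefault((person_id, label), []).append(cols)

bags = list(groups.values())
labels = [label for _person_id, label in groups]

print(bags)
print(labels)
```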

What I do now is iterate over the DataFrame and build the two lists dynamically. I know this isn't wise, though, and I am looking for a more Pythonic approach with better performance.

My ugly solution:

bags = []
labels = []

uniquePeople = df['person_id'].unique()

predictors = ['col_1','col_2','col_3','col_4']
for unp in uniquePeople:
    # rows of this person with label 1 (pandas uses & for element-wise "and")
    person = df[(df['person_id'] == unp) & (df['label'] == 1)][predictors].values.tolist()
    label = 1
    if len(person) > 0:
        bags.append(person)
        labels.append(label)

    # rows of this person with label 0
    person = df[(df['person_id'] == unp) & (df['label'] == 0)][predictors].values.tolist()
    label = 0
    if len(person) > 0:
        bags.append(person)
        labels.append(label)

PS: I did some heavy rewriting of the code on the fly to make it suitable for Stack Overflow. If you find something wrong there, don't bother. The aim is to find a better approach, not to fix the ugly one :P

Use DataFrame.groupby on both columns (person_id and label) with a lambda function to get a Series of lists:

predictors = ['col_1','col_2','col_3','col_4']
s = (df.groupby(['person_id','label'], sort=False)[predictors]
       .apply(lambda x: x.values.tolist()))
print (s)
person_id  label
1          1                      [[4, 5, 7, 8], [1, 3, 4, 6]]
           0                      [[1, 2, 6, 5], [1, 3, 3, 6]]
2          1        [[3, 5, 1, 3], [3, 2, 6, 8], [3, 1, 0, 4]]
dtype: object
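To see what the lambda receives, you can use GroupBy.get_group to pull out one group. Below is a small sketch on the same example data; the names grouped and x are only for illustration. For the key (1, 0), get_group returns the rows of person 1 with label 0, restricted to the predictor columns. x.values.tolist() then turns that into one inner list per row.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2',
                           'col_3', 'col_4', 'label'])
predictors = ['col_1', 'col_2', 'col_3', 'col_4']

grouped = df.groupby(['person_id', 'label'], sort=False)[predictors]

# The sub-DataFrame for the key (person_id=1, label=0): only the
# predictor columns, one row per original row in that group
x = grouped.get_group((1, 0))
print(x)
print(x.values.tolist())
```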

Then convert the Series to a list:

bags = s.tolist()
print (bags)
[[[4, 5, 7, 8], [1, 3, 4, 6]], 
 [[1, 2, 6, 5], [1, 3, 3, 6]], 
 [[3, 5, 1, 3], [3, 2, 6, 8], [3, 1, 0, 4]]]

And get the second level of the MultiIndex with Index.get_level_values:

labels = s.index.get_level_values(1).tolist()
print (labels)
[1, 0, 1]
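If you prefer to build both lists in a single pass without apply, you can iterate over the GroupBy directly. It yields ((person_id, label), sub-DataFrame) pairs. This is a sketch of that alternative, not part of the answer above. It still loops in Python, but over groups rather than individual rows. The int() call is only there so labels holds plain Python ints rather than NumPy scalars.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2',
                           'col_3', 'col_4', 'label'])
predictors = ['col_1', 'col_2', 'col_3', 'col_4']

bags, labels = [], []
# With two grouping columns, each key is a (person_id, label) tuple;
# sort=False keeps the groups in order of first appearance
for (person_id, label), group in df.groupby(['person_id', 'label'], sort=False):
    bags.append(group[predictors].values.tolist())
    labels.append(int(label))

print(bags)
print(labels)
```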

Not sure if this is what you are looking for, but:

import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1','col_2','col_3','col_4','label']) # example dataframe


df['cols'] = df[['col_1', 'col_2', 'col_3', 'col_4']].apply(lambda x: list(x), axis=1) # create a new column with col_x as element of a list

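tmp = df.groupby(['person_id', 'label'], sort=False)[['cols']].agg(list) # group by (keeping the order of first appearance) and create a list of lists

bags = tmp['cols'].tolist() # unpack
labels = tmp.index.droplevel(0).tolist() # label level of the index, as a plain list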
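One detail that matters for matching the expected output: groupby sorts the group keys by default. Without sort=False, the groups (and therefore bags and labels) come out ordered by (person_id, label), so the labels would be [0, 1, 1] instead of the [1, 0, 1] the question expects. Here is a quick check on the same example data. It uses GroupBy.size only because it returns one entry per group in group order; the keys dict is just for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2',
                           'col_3', 'col_4', 'label'])

# Group keys in the order groupby produces them, with and without sorting
keys = {}
for sort in (True, False):
    keys[sort] = df.groupby(['person_id', 'label'], sort=sort).size().index.tolist()

print(keys[True])   # sorted by (person_id, label)
print(keys[False])  # order of first appearance in df
```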
