First, suppose you have the following DataFrame.
import pandas as pd

df = pd.DataFrame([
    [0, 'test0', 0, 'sub0', 'one'],
    [0, 'test0', 1, 'sub1', 'two'],
    [1, 'test1', 0, 'sub0', 'one'],
    [1, 'test1', 1, 'sub1', 'two'],
], columns=['id', 'name', 'sub_id', 'sub_name', 'value'])
df = df.set_index(['id', 'sub_id'])
            name sub_name value
id sub_id
0  0       test0     sub0   one
   1       test0     sub1   two
1  0       test1     sub0   one
   1       test1     sub1   two
I want to convert this into a list of objects like the ones below (dataclasses are used here).
from typing import List
from dataclasses import dataclass

@dataclass
class SubObj:
    id: int
    name: str
    value: str

@dataclass
class MainObj:
    id: int
    name: str
    sub_obj: List[SubObj]
The output should look like this:
result = [
MainObj(
id=0,
name='test0',
sub_obj=[
SubObj(
id=0,
name='sub0',
value='one'
),
SubObj(
id=1,
name='sub1',
value='two'
)
]
),
MainObj(
id=1,
name='test1',
sub_obj=[
SubObj(
id=0,
name='sub0',
value='one'
),
SubObj(
id=1,
name='sub1',
value='two'
)
]
),
]
print(result)
[MainObj(id=0, name='test0', sub_obj=[SubObj(id=0, name='sub0', value='one'), SubObj(id=1, name='sub1', value='two')]), MainObj(id=1, name='test1', sub_obj=[SubObj(id=0, name='sub0', value='one'), SubObj(id=1, name='sub1', value='two')])]
I want to produce this list of MainObj with code that is as short and easy to understand as possible.
Does anyone know how to do it?
How about a little list comprehension like this?
result = [MainObj(
row[0][0],
row[1]['name'],
SubObj(
row[0][1],
row[1]['sub_name'],
row[1]['value']
)
) for row in df.iterrows()]
Returns
[MainObj(id=0, name='test0', sub_obj=SubObj(id=0, name='sub0', value='one')),
MainObj(id=0, name='test0', sub_obj=SubObj(id=1, name='sub1', value='two')),
MainObj(id=1, name='test1', sub_obj=SubObj(id=0, name='sub0', value='one')),
MainObj(id=1, name='test1', sub_obj=SubObj(id=1, name='sub1', value='two'))]
Update
I just realized you want sub_obj to be a list. I think this would be a better way:
results = list()
for _, g in df.groupby(level=0):  # Group by the first index level
    results.append(
        MainObj(
            g.index[0][0],  # Get the first index value
            g['name'].iloc[0],
            # List comprehension iterating over the group's rows
            [SubObj(row[0][1], row[1]['sub_name'], row[1]['value']) for row in g.iterrows()]))
[MainObj(id=0, name='test0', sub_obj=[SubObj(id=0, name='sub0', value='one'), SubObj(id=1, name='sub1', value='two')]),
MainObj(id=1, name='test1', sub_obj=[SubObj(id=0, name='sub0', value='one'), SubObj(id=1, name='sub1', value='two')])]
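For reference, the groupby approach can be sketched as a self-contained script. This is only a sketch under a few assumptions: the helper name `to_main_objs` is hypothetical, `itertuples` is swapped in for `iterrows` so that fields are read by name rather than by position, and the index levels are moved back into columns first.

```python
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class SubObj:
    id: int
    name: str
    value: str


@dataclass
class MainObj:
    id: int
    name: str
    sub_obj: List[SubObj]


def to_main_objs(df: pd.DataFrame) -> List[MainObj]:
    """Hypothetical helper: build one MainObj per value of the 'id' level."""
    result = []
    # Turn the MultiIndex levels into ordinary columns so that
    # itertuples() exposes them as named attributes.
    flat = df.reset_index()
    for main_id, group in flat.groupby('id', sort=False):
        subs = [
            SubObj(id=int(row.sub_id), name=row.sub_name, value=row.value)
            for row in group.itertuples(index=False)
        ]
        # 'name' is assumed to be constant within a group, so take the first.
        result.append(
            MainObj(id=int(main_id), name=group['name'].iloc[0], sub_obj=subs)
        )
    return result


df = pd.DataFrame([
    [0, 'test0', 0, 'sub0', 'one'],
    [0, 'test0', 1, 'sub1', 'two'],
    [1, 'test1', 0, 'sub0', 'one'],
    [1, 'test1', 1, 'sub1', 'two'],
], columns=['id', 'name', 'sub_id', 'sub_name', 'value'])
df = df.set_index(['id', 'sub_id'])

result = to_main_objs(df)
print(result)
```

Reading fields by name keeps the comprehension readable if the column order ever changes; the trade-off is that every column name has to be a valid Python identifier for `itertuples` to expose it as an attribute.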
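Here's a way to do it with pandas constructs: build a Series of SubObj from the sub_id, sub_name and value columns, group it by id into lists, join those lists to a DataFrame that contains only the MainObj-level info, and finally build a MainObj from each row.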
>>> sub = df.reset_index('sub_id')[['sub_id', 'sub_name', 'value']].agg(lambda row: SubObj(*row), axis='columns').rename('obj')
>>> sub
id
0 SubObj(id=0, name='sub0', value='one')
0 SubObj(id=1, name='sub1', value='two')
1 SubObj(id=0, name='sub0', value='one')
1 SubObj(id=1, name='sub1', value='two')
>>> sub.groupby('id').agg(list)
id
0 [SubObj(id=0, name='sub0', value='one'), SubOb...
1 [SubObj(id=0, name='sub0', value='one'), SubOb...
Name: obj, dtype: object
>>> maindf = df[['name']].droplevel('sub_id').drop_duplicates().join(sub.groupby('id').agg(list))
>>> maindf
     name                                                obj
id
0   test0  [SubObj(id=0, name='sub0', value='one'), SubOb...
1   test1  [SubObj(id=0, name='sub0', value='one'), SubOb...
>>> maindf.reset_index().agg(lambda row: MainObj(*row), axis='columns').to_list()
[MainObj(id=0, name='test0', sub_obj=[SubObj(id=0, name='sub0', value='one'), SubObj(id=1, name='sub1', value='two')]), MainObj(id=1, name='test1', sub_obj=[SubObj(id=0, name='sub0', value='one'), SubObj(id=1, name='sub1', value='two')])]
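The pandas pipeline can also be written as one script rather than a REPL session. The following is a minimal sketch, assuming the same sample data as in the question; the intermediate names `grouped` and `main` are illustrative, `apply` is used in place of `agg` for the row-wise calls, and the Series is named `obj` so that it can be joined.

```python
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class SubObj:
    id: int
    name: str
    value: str


@dataclass
class MainObj:
    id: int
    name: str
    sub_obj: List[SubObj]


df = pd.DataFrame([
    [0, 'test0', 0, 'sub0', 'one'],
    [0, 'test0', 1, 'sub1', 'two'],
    [1, 'test1', 0, 'sub0', 'one'],
    [1, 'test1', 1, 'sub1', 'two'],
], columns=['id', 'name', 'sub_id', 'sub_name', 'value'])
df = df.set_index(['id', 'sub_id'])

# One SubObj per row, indexed by 'id' only.
sub = (
    df.reset_index('sub_id')[['sub_id', 'sub_name', 'value']]
    .apply(lambda row: SubObj(*row), axis='columns')
    .rename('obj')  # a Series must have a name before it can be joined
)

# Collect the SubObj instances of each 'id' into a list.
grouped = sub.groupby(level='id').agg(list)

# Keep only the MainObj-level column, one row per 'id', and attach the lists.
main = df[['name']].droplevel('sub_id').drop_duplicates().join(grouped)

# Column order after reset_index() is id, name, obj, matching MainObj's fields.
result = main.reset_index().apply(lambda row: MainObj(*row), axis='columns').to_list()
print(result)
```

Note that `drop_duplicates()` compares column values only and ignores the index, so this sketch assumes every `id` has a distinct `name`; if two ids could share a name, a different de-duplication step would be needed.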