Python pandas: construct a list of dataclass objects from each row of a dataframe

Question

A consistent answer seems to be to avoid iterating over rows while working with Pandas. I'd like to understand how I can do so in the following case.

from dataclasses import dataclass
from typing import List

import pandas as pd

@dataclass
class Person:
    id: int
    name: str
    age: int

persons_df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

persons_list: List[Person] = []  # populate this list with Person objects, created from the dataframe above

# my approach is to use itertuples()
for row in persons_df.itertuples():
    person = Person(row.id, row.name, int(row.age))  # type: ignore
    persons_list.append(person)

I'd like to find an option that avoids iterating over the rows and, if possible, has some type safety built in (so that the mypy ignore comment is not needed).

thanks!

Answer 1

I am not sure if that's what you are looking for, but maybe this helps:

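import pandas as pd

df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

class Person:
    def __init__(self, row):
        # row is a pandas Series; .iloc gives positional access
        # (plain row[0] on a string-labelled Series is deprecated in recent pandas)
        self.id = row.iloc[0]
        self.name = row.iloc[1]
        self.age = row.iloc[2]

# axis=1 calls Person once per row; tolist() turns the resulting Series into a list
df.apply(Person, axis=1).tolist()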

out:

[<__main__.Person at 0x176eee70608>,
 <__main__.Person at 0x176eee704c8>,
 <__main__.Person at 0x176eee70388>]

Answer 2

I'm adding a new answer because the title of the question is "map dataframe rows to a list of dataclass objects", and this has not been addressed yet.

To return dataclasses, we can slightly improve @Andreas's answer, without requiring an additional constructor that receives a list. We just have to use Python's unpacking operators (* and **).

I see two ways of mapping:

The dataframe column names match the dataclass field names. In this case, we can pass each row as a set of keyword arguments: df.apply(lambda row: MyDataClass(**row), axis=1)
The dataframe column names do not match the dataclass field names, but the column order matches the dataclass field order. In this case, we can pass the row values as positional arguments: df.apply(lambda row: MyDataClass(*row), axis=1)

Example:

Define the same dataclass and the same dataframe as in the question:

from dataclasses import dataclass

@dataclass
class Person:
    id: int
    name: str
    age: int

import pandas

df = pandas.DataFrame(data={
    'id': [1, 2, 3],
    'name': ['A', 'B', 'C'],
    'age': [32, 44, '86']
})

Conversion based on column order:

 persons = df.apply(lambda row: Person(*row), axis=1)

Conversion based on column names (column order is shuffled for a better test):

 persons = df[['age', 'id', 'name']].apply(lambda row: Person(**row), axis=1)

Now we can verify the result. In both cases above, this snippet:
```
print(type(persons))
print(persons)
```

prints:

<class 'pandas.core.series.Series'>
0      Person(id=1, name='A', age=32)
1      Person(id=2, name='B', age=44)
2    Person(id=3, name='C', age='86')
dtype: object
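The question asks for a list rather than a Series, so the result still needs to be unwrapped. A minimal sketch, reusing the Person dataclass and dataframe from the question (the persons_list name comes from the question; tolist() is the same call the first answer uses):

```python
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class Person:
    id: int
    name: str
    age: int


df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

# apply(..., axis=1) builds a Series of Person objects, one per row;
# tolist() converts that Series into a plain Python list.
persons_list: List[Person] = df.apply(lambda row: Person(**row), axis=1).tolist()
```

The values are passed through unchanged, so the last person's age is still the string '86'.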

WARNINGS:

I have not measured the performance of this solution.
This does not enforce any type checking (look at the last person printed: its age is a string). As Python does not enforce type annotations at runtime, this quick solution does not bring any additional safety.
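
If a runtime check is wanted, one possible approach is to coerce each field to its annotated type in __post_init__. This is only a sketch: it assumes every annotation is a plain callable type such as int or str, which is not something dataclasses check by default, and static checkers like mypy still cannot see the dataframe's column types. The to_dict(orient='records') comprehension is a swapped-in alternative to apply, not the approach shown above.

```python
from dataclasses import dataclass, fields
from typing import List

import pandas as pd


@dataclass
class Person:
    id: int
    name: str
    age: int

    def __post_init__(self) -> None:
        # Assumption: each annotation is a callable type (int, str, ...).
        # Calling it converts the value or raises ValueError/TypeError.
        for f in fields(self):
            setattr(self, f.name, f.type(getattr(self, f.name)))


df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

# One dict per row, keyed by column name, so the keys must match the field names.
persons: List[Person] = [Person(**record) for record in df.to_dict(orient='records')]
```

With this version the string '86' is converted to the integer 86, and a value that cannot be converted (for example 'abc' as an age) raises an error when the object is built instead of passing through silently.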
