Python pandas: construct a list of dataclass objects from each row of a dataframe
The consistent advice seems to be to avoid iterating over rows when working with Pandas. I'd like to understand how I can do so in the following case.
from dataclasses import dataclass
from typing import List

import pandas as pd

@dataclass
class Person:
    id: int
    name: str
    age: int

persons_df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})
persons_list: List[Person] = []  # populate this list with Person objects, created from the dataframe above

# my approach is to iterate over the rows (itertuples() here)
for row in persons_df.itertuples():
    person = Person(row.id, row.name, int(row.age))  # type: ignore
    persons_list.append(person)
I'd like to find an option that avoids the row-by-row iteration (iterrows()/itertuples()) and, if possible, has some type safety built in, so that the mypy ignore comment is no longer needed.
Thanks!
I am not sure if that's what you are looking for, but maybe this helps:
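import pandas as pd

df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

class Person:
    def __init__(self, lst):
        # lst is one row (a Series); .iloc gives positional access,
        # since plain lst[0] is deprecated for label-indexed Series in recent pandas
        self.id = lst.iloc[0]
        self.name = lst.iloc[1]
        self.age = lst.iloc[2]

# apply the constructor to every row, then convert the resulting Series to a list
df.apply(Person, axis=1).tolist()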
out:
[<__main__.Person at 0x176eee70608>,
<__main__.Person at 0x176eee704c8>,
<__main__.Person at 0x176eee70388>]
I am adding a new answer because the title of the question is "map dataframe rows to a list of dataclass objects", and this has not been addressed yet.
To return dataclasses, we can slightly improve @Andreas' answer, without requiring an additional constructor that receives a list. We just have to use Python's unpacking operators (* and **).
I see two ways of mapping:
df.apply(lambda row: MyDataClass(**row), axis=1): each row is unpacked as keyword arguments, so the column names must match the dataclass field names.
df.apply(lambda row: MyDataClass(*row), axis=1): each row is unpacked positionally, so the column order must match the field order.
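To make the difference between the two forms concrete, here is a minimal sketch (my own illustration, not part of the original answer). The Person dataclass mirrors the one from the question; the dataframe deliberately has its columns in a different order than the dataclass fields, and the variable names by_name and by_position are just for this example.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Person:
    id: int
    name: str
    age: int


# Columns deliberately in a different order than the dataclass fields.
df = pd.DataFrame({'age': [32, 44], 'id': [1, 2], 'name': ['A', 'B']})

# Keyword unpacking: column names are matched to field names,
# so the column order does not matter.
by_name = df.apply(lambda row: Person(**row), axis=1).tolist()

# Positional unpacking: values are passed in column order,
# so the columns are reordered to match the field order first.
by_position = df[['id', 'name', 'age']].apply(lambda row: Person(*row), axis=1).tolist()
```

Both lists should end up holding equal Person objects. Without the reordering step, the positional form would silently put the age into the id field, which is why the keyword form is probably the safer default when the column names already match.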
Example:
from dataclasses import dataclass

import pandas

@dataclass
class Person:
    id: int
    name: str
    age: int

df = pandas.DataFrame(data={
    'id': [1, 2, 3],
    'name': ['A', 'B', 'C'],
    'age': [32, 44, '86']
})

# Option 1: positional unpacking (column order must match the field order)
persons = df.apply(lambda row: Person(*row), axis=1)

# Option 2: keyword unpacking (column order does not matter)
persons = df[['age', 'id', 'name']].apply(lambda row: Person(**row), axis=1)

print(type(persons))
print(persons)
<class 'pandas.core.series.Series'>
0      Person(id=1, name='A', age=32)
1      Person(id=2, name='B', age=44)
2    Person(id=3, name='C', age='86')
dtype: object
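Note that the output above is a Series, and that the last row still carries age='86' as a string, because a dataclass does not validate or convert its field types. If a plain List[Person] with real ints is wanted, one possible sketch (my own assumption of what fits the question, not something from the answers) is to walk the columns in parallel with zip and convert each value explicitly. This still loops in Python, since building one object per row cannot really be vectorised, but the explicit int()/str() calls should give a type checker concrete types to work with. I have not verified this against a specific mypy or pandas-stubs version.

```python
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class Person:
    id: int
    name: str
    age: int


persons_df = pd.DataFrame(
    data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']}
)

# Iterate over the three columns in parallel. The explicit conversions
# normalise the mixed-type 'age' column and turn numpy integers into
# plain Python ints.
persons_list: List[Person] = [
    Person(id=int(id_), name=str(name), age=int(age))
    for id_, name, age in zip(persons_df['id'], persons_df['name'], persons_df['age'])
]
```

The result is an ordinary Python list, so no .tolist() call is needed, and int('86') turns the stray string into 86.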
WARNINGS: