
Python pandas: construct a list of dataclass objects from each row of a dataframe

A consistent piece of advice seems to be to avoid iterating over rows when working with pandas. I'd like to understand how to do that in the following case.

from dataclasses import dataclass
from typing import List

import pandas as pd

@dataclass
class Person:
    id: int
    name: str
    age: int

# note: the last 'age' value is the string '86', not an int
persons_df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

persons_list: List[Person] = []  # populate this list with Person objects created from the dataframe above

# my approach: loop over the rows with itertuples()
for row in persons_df.itertuples():
    person = Person(row.id, row.name, int(row.age))  # type: ignore
    persons_list.append(person)

I'd like to find an option that avoids the explicit row loop (iterrows/itertuples) and, if possible, has some type safety built in (i.e. no need for the mypy ignore comment).

Thanks!
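Creating one Python object per row always needs one constructor call per row, so the loop can be moved around but not truly vectorized. What can be vectorized is the type cleanup. A minimal sketch, assuming the column names match the dataclass fields: coerce the columns in bulk with DataFrame.astype, so a bad value fails loudly, then build the objects from plain Python values. The helper name persons_from_df is hypothetical, and the explicit int()/str() calls are intended to give mypy concrete argument types without the ignore comment (not verified against every pandas-stubs version).

```python
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class Person:
    id: int
    name: str
    age: int


def persons_from_df(df: pd.DataFrame) -> List[Person]:
    """Hypothetical helper: coerce column types in bulk, then build Person objects."""
    # Vectorized step: convert whole columns at once.
    # astype(int) turns '86' into 86 and raises ValueError for values like 'abc'.
    clean = df.astype({"id": int, "name": str, "age": int})
    # Per-row step: creating Python objects still needs one constructor call per row.
    # int()/str() hand the type checker concrete types for each argument.
    return [
        Person(int(i), str(n), int(a))
        for i, n, a in zip(clean["id"], clean["name"], clean["age"])
    ]


persons_df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})
persons_list = persons_from_df(persons_df)
print(persons_list)
```

The design choice here is to fail early: a value that cannot be converted stops the whole conversion instead of producing a Person with a wrongly typed field.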

I'm not sure if this is what you are looking for, but maybe this helps:

import pandas as pd
df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

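class Person:
    def __init__(self, row):
        # apply(..., axis=1) passes each row as a pandas Series;
        # .iloc selects by position (plain row[0] is deprecated on a label-indexed Series)
        self.id = row.iloc[0]
        self.name = row.iloc[1]
        self.age = row.iloc[2]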

df.apply(Person, axis=1).tolist()

Out:

[<__main__.Person at 0x176eee70608>,
 <__main__.Person at 0x176eee704c8>,
 <__main__.Person at 0x176eee70388>]
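Since apply(..., axis=1) hands the constructor one pandas Series per row, the fields can also be read by column label instead of by position. A small variant of the answer above; the label-based access and the __repr__ (which replaces the <__main__.Person at 0x...> output) are additions, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})


class Person:
    def __init__(self, row: pd.Series) -> None:
        # Label-based access: keeps working if the column order changes.
        self.id = row["id"]
        self.name = row["name"]
        self.age = row["age"]

    def __repr__(self) -> str:
        # Readable repr instead of the default object address.
        return f"Person(id={self.id!r}, name={self.name!r}, age={self.age!r})"


# Reordering the columns does not change the result, because fields are looked up by name.
persons = df[["age", "name", "id"]].apply(Person, axis=1).tolist()
print(persons)
```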

I'm adding a new answer because the question title is "map dataframe rows to a list of dataclass objects", and that has not been addressed yet.

To return dataclasses, we can slightly improve @Andreas's answer without requiring an extra constructor that receives a list. We just need Python's argument unpacking ("spread") operators, * and **.
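A quick illustration of the two unpacking operators applied to a single pandas Series, using a hypothetical Point dataclass that is not part of the question:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Point:  # hypothetical dataclass, only for this illustration
    x: int
    y: int


row = pd.Series({"x": 1, "y": 2})

# *row iterates over the values in index order, i.e. Point(1, 2)
by_position = Point(*row)

# **row uses the index labels as keyword names, i.e. Point(x=1, y=2).
# A Series supports this because it has keys() and label-based indexing.
by_name = Point(**row)

print(by_position == by_name)
```

One subtlety: for an all-integer Series like this one, iterating (what *row does) yields Python int values, while row["x"] (what **row uses) returns NumPy integer scalars. They compare equal, but a strict isinstance(value, int) check treats them differently.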

I see two ways of mapping:

  1. The dataframe column names match the dataclass field names. In this case, we can pass each row as a set of keyword arguments: df.apply(lambda row: MyDataClass(**row), axis=1)
  2. The dataframe column names do not match the dataclass field names, but the column order matches the dataclass field order. In this case, we can pass the row values as ordered positional arguments: df.apply(lambda row: MyDataClass(*row), axis=1)
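A sketch that combines both cases, assuming the dataclass fields are the source of truth: dataclasses.fields returns the field names in declaration order, so selecting exactly those columns makes the call independent of the dataframe's column order and drops unrelated columns. The to_dataclasses helper and its rename parameter (for when column names differ from field names) are hypothetical:

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional

import pandas as pd


@dataclass
class Person:
    id: int
    name: str
    age: int


def to_dataclasses(df: pd.DataFrame, rename: Optional[Dict[str, str]] = None) -> List[Person]:
    """Hypothetical helper: map each row to a Person by field name."""
    if rename:
        # Map dataframe column names onto dataclass field names.
        df = df.rename(columns=rename)
    # Field names in declaration order; selecting them drops extra columns
    # and raises KeyError if a field has no matching column.
    names = [f.name for f in fields(Person)]
    return df[names].apply(lambda row: Person(**row), axis=1).tolist()


df = pd.DataFrame({'years': [32, 44], 'id': [1, 2], 'full_name': ['A', 'B'], 'extra': [0, 0]})
print(to_dataclasses(df, rename={'full_name': 'name', 'years': 'age'}))
```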

Example:

  1. Define the same dataclass and the same dataframe as in the question:
     from dataclasses import dataclass

     import pandas

     @dataclass
     class Person:
         id: int
         name: str
         age: int

     df = pandas.DataFrame(data={
         'id': [1, 2, 3],
         'name': ['A', 'B', 'C'],
         'age': [32, 44, '86']
     })
  2. Conversion based on column order:
     persons = df.apply(lambda row: Person(*row), axis=1)
  3. Conversion based on column names (the column order is shuffled to make the test more convincing):
     persons = df[['age', 'id', 'name']].apply(lambda row: Person(**row), axis=1)
  4. Now we can verify the result. In both cases above:
    • this snippet:
       print(type(persons))
       print(persons)
    • prints:
       <class 'pandas.core.series.Series'>
       0    Person(id=1, name='A', age=32)
       1    Person(id=2, name='B', age=44)
       2    Person(id=3, name='C', age='86')
       dtype: object
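Both conversions above return a pandas Series; since the question asks for a list, Series.tolist() finishes the job. A sketch, also showing a variant that skips apply by going through DataFrame.to_dict(orient="records"), assuming as before that column names match field names:

```python
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class Person:
    id: int
    name: str
    age: int


df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

# Series of Person objects converted to a plain Python list
persons_from_apply: List[Person] = df.apply(lambda row: Person(**row), axis=1).tolist()

# Alternative without apply: one dict per row, keyed by column name
persons_from_records: List[Person] = [Person(**rec) for rec in df.to_dict(orient="records")]

print(persons_from_apply == persons_from_records)
```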

Warnings:

  • I have not measured the performance of this solution. Note that apply(..., axis=1) still calls the function once per row in Python, so it is not vectorized.
  • This does not enforce any type checking (look at the last person printed: its age is a string). Since Python does not enforce type annotations at runtime, this quick solution adds no extra safety.
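If runtime type safety is needed, one option (not part of the answer above) is to validate inside the dataclass's __post_init__. A minimal sketch, assuming the annotations are plain classes such as int and str; it would need extra work for Optional[...] or postponed (string) annotations, and it rejects NumPy scalars such as numpy.int64, which can show up when all selected columns share an integer dtype:

```python
from dataclasses import dataclass, fields

import pandas as pd


@dataclass
class Person:
    id: int
    name: str
    age: int

    def __post_init__(self) -> None:
        # Compare each value with its annotation; assumes annotations are plain classes.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name}={value!r} is not of type {f.type.__name__}")


df = pd.DataFrame(data={'id': [1, 2, 3], 'name': ['A', 'B', 'C'], 'age': [32, 44, '86']})

try:
    persons = df.apply(lambda row: Person(**row), axis=1).tolist()
except TypeError as exc:
    # The third row has age='86' (a string), so construction fails here.
    print(exc)
```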


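Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.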

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM