Converting Python Class Object To A DataFrame

How do I convert a Python class object whose fields hold instances of other classes to a DataFrame? I tried the code below, but it does not work.

I can get it to work when I take out self.address = Address() and self.agency_contact_info = ContactInfo().

class Address:
    def __init__(self):
        self.address_one = "address 1"
        self.address_two = "P.O. BOX 1"                  

class ContactInfo:
    def __init__(self):
        self.person_name = "Me"
        self.phone_number = "999-999-9999"    

class AgencyRecord:
    def __init__(self):
        self.agency_code = "00"
        self.agency_id = "000"
        self.agency_name = "Some Agency"
        self.address = Address()
        self.agency_contact_info = ContactInfo()            

def create_data():
    data = {}

    for i in range(0, 3):
        alc = AgencyRecord()                    
        data[i] = alc   

    column_list = [
        'agency_code', 'agency_id', 'agency_name', 
        'address_one', 'address_two', 'person_name', 'phone_number'
    ]

    spark.createDataFrame(
        list(data.values()),
        column_list
    ).createOrReplaceTempView("MyTempTable")

Quoting myself again:

I find it's useful to think of the argument to createDataFrame() as a list of [iterables] where each entry in the list corresponds to a row in the DataFrame and each element of the [iterable] corresponds to a column.


So you need to convert each of your objects into an iterable where each element corresponds to one of the columns in column_list.
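
To make that shape concrete, here is a minimal sketch of the rows written out by hand, using the values from the classes in the question. It runs without Spark; the names rows and column_list are only illustrative.

```python
# Column names, in the order the DataFrame should have them.
column_list = [
    'agency_code', 'agency_id', 'agency_name',
    'address_one', 'address_two', 'person_name', 'phone_number'
]

# One inner list per row; one element per column, in column_list order.
rows = [
    ["00", "000", "Some Agency", "address 1", "P.O. BOX 1", "Me", "999-999-9999"],
    ["00", "000", "Some Agency", "address 1", "P.O. BOX 1", "Me", "999-999-9999"],
]

# Every row must supply exactly one value per column.
assert all(len(row) == len(column_list) for row in rows)

# Pairing names with values shows which element feeds which column.
first_row_by_name = dict(zip(column_list, rows[0]))
```

A list of AgencyRecord instances does not have this shape: each instance is a single object, not an iterable of seven column values, so the nested Address and ContactInfo objects have nowhere to go.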

I wouldn't necessarily endorse it (there's almost surely a better way), but here is one hacky approach you can take to modify your code accordingly:

You can take advantage of the fact that instances of ordinary Python classes have a __dict__ attribute (self.__dict__ inside a method) that you can use to retrieve attributes by name. First, update your AgencyRecord class to pull in the fields from the Address and ContactInfo classes:

class AgencyRecord:
    def __init__(self):
        self.agency_code = "00"
        self.agency_id = "000"
        self.agency_name = "Some Agency"
        self.address = Address()
        self.agency_contact_info = ContactInfo()

        # makes the variables of the contained classes members of this class
        self.__dict__.update(self.address.__dict__)
        self.__dict__.update(self.agency_contact_info.__dict__)

Now we can reference each column in column_list by name for any instance of AgencyRecord.
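
As a quick check that needs no Spark session, the lookup can be sketched on a single instance. The classes are repeated so the snippet runs on its own; record and row are illustrative names.

```python
class Address:
    def __init__(self):
        self.address_one = "address 1"
        self.address_two = "P.O. BOX 1"

class ContactInfo:
    def __init__(self):
        self.person_name = "Me"
        self.phone_number = "999-999-9999"

class AgencyRecord:
    def __init__(self):
        self.agency_code = "00"
        self.agency_id = "000"
        self.agency_name = "Some Agency"
        self.address = Address()
        self.agency_contact_info = ContactInfo()
        # copy the nested objects' attributes onto this instance
        self.__dict__.update(self.address.__dict__)
        self.__dict__.update(self.agency_contact_info.__dict__)

column_list = [
    'agency_code', 'agency_id', 'agency_name',
    'address_one', 'address_two', 'person_name', 'phone_number'
]

record = AgencyRecord()

# Build one row by looking up each column name on the instance.
row = [record.__dict__[c] for c in column_list]
print(row)
# ['00', '000', 'Some Agency', 'address 1', 'P.O. BOX 1', 'Me', '999-999-9999']

# Caveat: update() copies the values once, at construction time.
# Reassigning a field on the nested object later is not reflected here.
record.address.address_one = "changed"
print(record.address_one)  # still 'address 1'
```

Two things to keep in mind with this approach: the copied values are a snapshot, as the last lines show, and if two nested classes used the same attribute name, the later update() call would silently overwrite the earlier value.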

Modify create_data as follows (I've also changed it to return a DataFrame rather than registering a temp view):

def create_data():
    # assumes `spark` is an existing SparkSession (e.g. in the pyspark shell)
    data = {}

    for i in range(0, 3):
        alc = AgencyRecord()
        data[i] = alc

    column_list = [
        'agency_code', 'agency_id', 'agency_name',
        'address_one', 'address_two', 'person_name', 'phone_number'
    ]

    # one inner list per record, one element per column, looked up by name
    values = [
        [data[record].__dict__[c] for c in column_list]
        for record in data
    ]

    return spark.createDataFrame(values, column_list)

Now you can do:

temp_df = create_data()
temp_df.show()
#+-----------+---------+-----------+-----------+-----------+-----------+------------+
#|agency_code|agency_id|agency_name|address_one|address_two|person_name|phone_number|
#+-----------+---------+-----------+-----------+-----------+-----------+------------+
#|         00|      000|Some Agency|  address 1| P.O. BOX 1|         Me|999-999-9999|
#|         00|      000|Some Agency|  address 1| P.O. BOX 1|         Me|999-999-9999|
#|         00|      000|Some Agency|  address 1| P.O. BOX 1|         Me|999-999-9999|
#+-----------+---------+-----------+-----------+-----------+-----------+------------+
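
If you would rather not modify AgencyRecord, one possible alternative (not part of the approach above) is to flatten each object from the outside. The sketch below uses a hypothetical helper named flatten and assumes that any attribute value with its own __dict__ is a nested object to descend into; it builds only the rows, which could then be passed to spark.createDataFrame(rows, column_list).

```python
class Address:
    def __init__(self):
        self.address_one = "address 1"
        self.address_two = "P.O. BOX 1"

class ContactInfo:
    def __init__(self):
        self.person_name = "Me"
        self.phone_number = "999-999-9999"

class AgencyRecord:  # the original class from the question, unmodified
    def __init__(self):
        self.agency_code = "00"
        self.agency_id = "000"
        self.agency_name = "Some Agency"
        self.address = Address()
        self.agency_contact_info = ContactInfo()

def flatten(obj):
    """Hypothetical helper: return a flat dict of obj's attributes,
    descending into attribute values that have their own __dict__."""
    flat = {}
    for name, value in vars(obj).items():
        if hasattr(value, "__dict__"):
            # nested object: merge its fields instead of keeping the object
            flat.update(flatten(value))
        else:
            flat[name] = value
    return flat

column_list = [
    'agency_code', 'agency_id', 'agency_name',
    'address_one', 'address_two', 'person_name', 'phone_number'
]

records = [AgencyRecord() for _ in range(3)]

rows = []
for r in records:
    flat = flatten(r)  # flatten once per record
    rows.append([flat[c] for c in column_list])
```

Because the flattening happens when the rows are built, it picks up the nested objects' current values, and the AgencyRecord instances themselves are left untouched. As with the __dict__.update approach, attribute names shared between nested classes would collide.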
