
Using pandas df.apply with a function that returns a dictionary

I have a JSON file that I first read into a pandas DataFrame. It looks like this:

{
  ...
  ...
"Info": [
            {
                "Type": "A",
                "Desc": "4848",
                ...
            },
            {
                "Type": "P",
                "Desc": "3763",
                ...
            },
            {
                "Type": "S",
                "Desc": "AUBERT",
                ...
            }
        ],
...
}

I have a function that loops over the "Info" field and, depending on "Type", stores information in a dictionary and returns that dictionary. I then want to create new columns in my df from the values stored in the dictionary using df.apply. Please see below:

def extract_info(self):
    def extract_data(df):
        dic = {'a': None, 'p': None, 's': None}
        for info in df['Info']:

            if info['Type'] == "A":
                dic['a'] = info['Desc']
            if info['Type'] == "P":
                dic['p'] = info['Desc']
            if info['Type'] == "S":
                dic['s'] = info['Desc']
        return dic

    self.df['A'] = self.df.apply(extract_data, axis=1)['a']
    self.df['P'] = self.df.apply(extract_data, axis=1)['p']
    self.df['S'] = self.df.apply(extract_data, axis=1)['s']

    return self

I have also tried:

self.df['A'] = self.df.apply(lambda x: extract_data(x['a']), axis=1)

But neither of these works for me. I have looked at other SO posts about using df.apply with a function that returns a dictionary, but did not find what I need for my case. Please help.

I could write 3 separate functions, such as extract_A, extract_B and extract_C, each returning a single value, to make df.apply work, but that means running the for loop 3 times, once per function. Suggestions that do not use a dictionary are welcome too. Thanks.
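For reference, here is a minimal reproduction of why the first attempt fails. It uses a hypothetical one-row frame built from the sample above; the name `sample` and the reduced "Info" records are assumptions for illustration only. With axis=1 and no result_type, df.apply returns a Series holding one dictionary per row, so ['a'] is looked up as a row label rather than as a dictionary key:

```python
import pandas as pd

# Hypothetical one-row frame; only the fields shown in the question are kept.
sample = pd.DataFrame({
    "Info": [[
        {"Type": "A", "Desc": "4848"},
        {"Type": "P", "Desc": "3763"},
        {"Type": "S", "Desc": "AUBERT"},
    ]]
})


def extract_data(row):
    """Collect the 'Desc' of the A, P and S entries of one row into a dict."""
    dic = {"a": None, "p": None, "s": None}
    for info in row["Info"]:
        if info["Type"] == "A":
            dic["a"] = info["Desc"]
        elif info["Type"] == "P":
            dic["p"] = info["Desc"]
        elif info["Type"] == "S":
            dic["s"] = info["Desc"]
    return dic


result = sample.apply(extract_data, axis=1)

# The result is a Series with one dict per row, indexed by the row labels.
is_series_of_dicts = isinstance(result, pd.Series) and isinstance(result.iloc[0], dict)

# 'a' is not a row label, so indexing the Series with it raises KeyError.
try:
    result["a"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

The second attempt appears to fail for a related reason: assuming the frame has no column named a, x['a'] raises a KeyError on the row before extract_data is ever called.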

Instead of storing the values in a dictionary, I can store them in variables and return them from my extract_data function. Then I can assign these values to new columns in self.df directly, using the result_type parameter of df.apply.

def extract_info(self):
    def extract_data(df):
        a = None
        p = None
        s = None
        for info in df['Info']:
            if info['Type'] == "A":
                a = info['Desc']
            if info['Type'] == "P":
                p = info['Desc']
            if info['Type'] == "S":
                s = info['Desc']

        return a, p, s

    self.df[['A', 'P', 'S']] = self.df.apply(extract_data, axis=1, result_type="expand")

    return self

Output:

       A     P       S
0    4848  3763    AUBERT
...
...
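If the dictionary-returning version of extract_data is preferred, the same idea can be sketched as follows. This is not part of the answer above. It assumes a hypothetical one-row frame named `sample` containing only the fields shown in the question, and relies on result_type="expand" turning each returned dictionary into one row whose columns are the dictionary keys. The loop still runs only once per row:

```python
import pandas as pd

# Hypothetical one-row frame; only the fields shown in the question are kept.
sample = pd.DataFrame({
    "Info": [[
        {"Type": "A", "Desc": "4848"},
        {"Type": "P", "Desc": "3763"},
        {"Type": "S", "Desc": "AUBERT"},
    ]]
})


def extract_data(row):
    """Collect the 'Desc' of the A, P and S entries of one row into a dict."""
    dic = {"a": None, "p": None, "s": None}
    for info in row["Info"]:
        if info["Type"] == "A":
            dic["a"] = info["Desc"]
        elif info["Type"] == "P":
            dic["p"] = info["Desc"]
        elif info["Type"] == "S":
            dic["s"] = info["Desc"]
    return dic


# With result_type="expand", the dict keys 'a', 'p', 's' become column names.
expanded = sample.apply(extract_data, axis=1, result_type="expand")

# Rename to the upper-case column names used in the question, then attach
# the new columns to the original frame by row index.
expanded = expanded.rename(columns=str.upper)
sample = sample.join(expanded)
```

Returning a dictionary keeps each value tied to a name, so the new columns do not depend on the order in which the values are returned.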
