
Apply a function in pandas to create two columns

I have a pandas DataFrame called ebola, shown below. Its variable column packs two pieces of information into one string: a status (either Cases or Deaths) and a country name. I am trying to create two new columns, status and country, from that variable column using the .apply() function. However, since I am trying to extract two values at once, it does not work.

[image: ebola DataFrame]

# let's create a splitter function
def splitter(column):
    status, country = column.split("_")
    return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].apply(splitter)

The error I get is:

ValueError: Must have equal len keys and value when setting with an iterable

I want my output to look like this:

[image: expected output]

Use Series.str.split:

ebola[['status', 'country']] = ebola['variable'].str.split(pat='_', expand=True)
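A minimal, self-contained sketch of the same idea, using a small sample frame shaped like the one in the question. The n=1 argument is my addition, not part of the answer above: it limits the split to the first underscore, which only matters if a country name itself contained an underscore.

```python
import pandas as pd

# Sample data in the same shape as the question's DataFrame
ebola = pd.DataFrame({
    "Date": ["3/24/2014", "3/22/2014", "1/15/2015", "1/4/2015"],
    "variable": ["Cases_Guinea", "Cases_Guinea", "Cases_Liberia", "Cases_Liberia"],
})

# expand=True returns a DataFrame with one column per piece,
# so it can be assigned to two new columns in one step.
# n=1 (an assumption, see above) splits on the first "_" only.
parts = ebola["variable"].str.split("_", n=1, expand=True)
ebola[["status", "country"]] = parts

print(ebola)
```

Because the split is vectorized over the whole column, no Python-level function has to be applied row by row.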

This is a very late follow-up to my original question. Thanks to @ansev, whose solution worked well. While revisiting my question, I tried to develop a solution based on my first approach. I got it working and wanted to share it for anyone who might want a different perspective on this.

Update to my code:

# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
        return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')

I made two changes to my code to get it working:

  1. Instead of calling .apply() on the Series, I converted it to a DataFrame with the .to_frame() method, which lets me pass axis=1 and result_type='expand'.
  2. In my splitter function, I had to iterate over each row, since the input is now a DataFrame. Therefore, I added the for row in column line. With axis=1, each call receives one row holding a single value, so the loop returns on its first pass.
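As an aside, the original one-string-in, tuple-out splitter can also be kept unchanged. A minimal sketch, assuming the same sample data as the question: Series.apply returns a single Series whose elements are tuples, so the tuples are unpacked into a two-column DataFrame before assignment. The name pairs and the explicit index alignment are my own additions for illustration.

```python
import pandas as pd

def splitter(value):
    # value is a single string such as "Cases_Guinea"
    status, country = value.split("_")
    return status, country

ebola = pd.DataFrame({
    "Date": ["3/24/2014", "3/22/2014", "1/15/2015", "1/4/2015"],
    "variable": ["Cases_Guinea", "Cases_Guinea", "Cases_Liberia", "Cases_Liberia"],
})

# Series.apply yields one Series of (status, country) tuples
pairs = ebola["variable"].apply(splitter)

# Unpack the tuples into two columns; reuse the original index
# so rows stay aligned on assignment
ebola[["status", "country"]] = pd.DataFrame(pairs.tolist(), index=ebola.index)

print(ebola)
```

This avoids both .to_frame() and the loop inside the function, at the cost of building an intermediate list of tuples.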

To replicate all of this:

import pandas as pd

# create the data
ebola_dict = {'Date':['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
              'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia']}
ebola = pd.DataFrame(ebola_dict)
print(ebola)

# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
        return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')

# check if it worked
print(ebola)
