Apply a function in pandas to create two columns

Question

I have a pandas DataFrame called ebola, as seen below. The variable column holds two pieces of information: a status, which is either Cases or Deaths, and a country name. I am trying to create two new columns, status and country, from that variable column using the .apply() function. However, since I am trying to extract two values, it does not work.

# let's create a splitter function
def splitter(column):
    status, country = column.split("_")
    return status, country

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].apply(splitter)

The error I get is:

ValueError: Must have equal len keys and value when setting with an iterable

I want my output to look like this:

Answer 1

Use Series.str.split with expand=True, which returns the split pieces as separate columns:

ebola[['status', 'country']] = ebola['variable'].str.split(pat='_', expand=True)
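
For anyone who wants to run this end to end, here is a minimal sketch using the sample data from the replication snippet in Answer 2. The n=1 argument is an addition that is not in the original answer: it assumes only the first underscore separates the status from the country, which would matter for a hypothetical value such as Cases_Sierra_Leone.

```python
import pandas as pd

# sample data borrowed from the replication snippet in Answer 2
ebola = pd.DataFrame({
    'Date': ['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
    'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia'],
})

# expand=True returns a DataFrame with one column per piece;
# n=1 splits on the first underscore only (an assumption, see the note above)
ebola[['status', 'country']] = ebola['variable'].str.split(pat='_', n=1, expand=True)

print(ebola)
```

Without n=1, a value containing more than one underscore would produce more than two pieces, and the assignment to two columns would fail.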

Answer 2

This is a very late post on my original question. Thanks to @ansev; that solution worked great. While revisiting my question, I tried to develop a solution based on my first approach. I got it working and want to share it for anyone who might like to see a different perspective on this.

Update to my code:

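# let's create a splitter function
def splitter(column):
    # with axis=1, `column` is actually one row of the one-column DataFrame
    for row in column:
        # the row holds a single value, so this returns on the first iteration
        status, country = row.split("_")
        return status, country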

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')

I made two changes to my code to get it working:

Instead of applying the function to a Series, I converted the column to a DataFrame with the .to_frame() method.
In my splitter function, I added the for row in column line because the input is now a DataFrame. With axis=1, apply passes one row at a time as a Series, so the loop visits that row's single value and returns on the first pass.
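
As a sketch of what DataFrame.apply hands to the function when axis=1 is set, the variant below reads the value by column label instead of looping. The two-row frame and the parts name are illustrative only and not part of the original answer.

```python
import pandas as pd

# small illustrative frame (hypothetical, shortened from the question's data)
ebola = pd.DataFrame({'variable': ['Cases_Guinea', 'Cases_Liberia']})

def splitter(row):
    # with axis=1, `row` is a Series holding one row's values,
    # so the string can be read by its column label
    status, country = row['variable'].split("_")
    return status, country

# result_type='expand' turns each returned tuple into separate columns (0 and 1)
parts = ebola[['variable']].apply(splitter, axis=1, result_type='expand')
parts.columns = ['status', 'country']

print(parts)
```

Selecting with ebola[['variable']] gives the same one-column DataFrame as ebola['variable'].to_frame().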

To replicate all of this:

import pandas as pd

# create the data
ebola_dict = {'Date':['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
              'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia']}
ebola = pd.DataFrame(ebola_dict)
print(ebola)

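# let's create a splitter function
def splitter(column):
    # with axis=1, `column` is actually one row of the one-column DataFrame
    for row in column:
        # the row holds a single value, so this returns on the first iteration
        status, country = row.split("_")
        return status, country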

# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')

# check if it worked
print(ebola)
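
For comparison, here is a sketch that keeps the original Series.apply call from the question. Series.apply returns a single Series whose elements are tuples, which is why assigning it directly to two columns raised the ValueError; unpacking those tuples into a DataFrame first is one possible workaround. The names tuples and split_cols are illustrative.

```python
import pandas as pd

ebola = pd.DataFrame({
    'Date': ['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
    'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia'],
})

def splitter(value):
    # Series.apply passes one string at a time
    status, country = value.split("_")
    return status, country

# one Series of (status, country) tuples, not two columns
tuples = ebola['variable'].apply(splitter)

# unpack the tuples into a two-column DataFrame aligned on the original index
split_cols = pd.DataFrame(tuples.tolist(), index=ebola.index,
                          columns=['status', 'country'])
ebola[['status', 'country']] = split_cols

print(ebola)
```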
