apply function in pandas to create two columns
I have a Pandas DataFrame called ebola, as seen below. The variable column holds two pieces of information: a status, which is either Cases or Deaths, and a country name. I am trying to create two new columns, status and country, out of that variable column by using the .apply() function. However, since there are two values I am trying to extract, it does not work.
# let's create a splitter function
def splitter(column):
    status, country = column.split("_")
    return status, country
# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].apply(splitter)
The error I get is:
ValueError: Must have equal len keys and value when setting with an iterable
I want my output to be like this:
Use Series.str.split:
ebola[['status', 'country']] = ebola['variable'].str.split(pat='_', expand=True)
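As a minimal sketch of how this behaves, the snippet below builds a small frame shaped like the one in the question (the sample values are assumptions for illustration) and splits variable into the two columns. The n=1 argument is an addition that is not in the answer above: it limits the split to the first underscore, which would only matter if a country name itself contained an underscore.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame
ebola = pd.DataFrame({
    'Date': ['3/24/2014', '1/4/2015'],
    'variable': ['Cases_Guinea', 'Deaths_Liberia'],
})

# expand=True returns a DataFrame with one column per piece,
# so the result can be assigned to two new columns at once.
# n=1 (an assumption, not in the original answer) splits on the first "_" only.
parts = ebola['variable'].str.split(pat='_', n=1, expand=True)
ebola[['status', 'country']] = parts

print(ebola)
```

Because str.split is vectorized over the whole column, no helper function or .apply() call is needed here.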
This is a very late follow-up to my original question. Thanks to @ansev, whose solution worked out great. While going back through my question, I tried to develop a solution based on my first approach. I was able to work it out, and I wanted to share it for anyone who might want to see a different perspective on this.
Update to my code:
# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
        return status, country
# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')
Two updates to my code made it work. First, I converted the Series to a DataFrame using the .to_frame() method. Second, in the splitter function, I had to iterate through each row since it was a DataFrame (with axis=1, apply passes each row to the function as a Series), so I added the for row in column line.
To replicate all of this:
import numpy as np
import pandas as pd
# create the data
ebola_dict = {'Date': ['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
              'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia']}
ebola = pd.DataFrame(ebola_dict)
print(ebola)
# let's create a splitter function
def splitter(column):
    for row in column:
        status, country = row.split("_")
        return status, country
# apply this function to that column and assign to two new columns
ebola[['status', 'country']] = ebola['variable'].to_frame().apply(splitter, axis=1, result_type='expand')
# check if it worked
print(ebola)
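For comparison, here is a sketch of one possible variant that keeps the original one-value splitter from the question and skips .to_frame(). It relies on Series.apply producing a Series of (status, country) tuples, which is then unpacked into a two-column DataFrame. This is an alternative offered for illustration, not part of the code above; the intermediate names pairs and split_df are hypothetical, and the sample data mirrors the replication example.

```python
import pandas as pd

ebola = pd.DataFrame({
    'Date': ['3/24/2014', '3/22/2014', '1/15/2015', '1/4/2015'],
    'variable': ['Cases_Guinea', 'Cases_Guinea', 'Cases_Liberia', 'Cases_Liberia'],
})

def splitter(value):
    # Receives one string per call, as in the original attempt
    status, country = value.split("_")
    return status, country

# Series.apply yields a Series whose elements are (status, country) tuples
pairs = ebola['variable'].apply(splitter)

# Unpack the tuples into a two-column DataFrame aligned on the original index
split_df = pd.DataFrame(pairs.tolist(), index=ebola.index,
                        columns=['status', 'country'])
ebola[['status', 'country']] = split_df

print(ebola)
```

The explicit unpacking step is what the first attempt was missing: assigning a single Series of tuples to two column names gives pandas one value per row where it expects two.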