How do I apply a custom regex function to a column in a pandas DataFrame (Python)?
I have been searching this site and Google for the past two days and cannot figure this out. I have a dataframe with 4 columns. I feel like it is something easy that I am missing. Here is my function:
import re

def zip_code(zip):
    if re.match('^[0-9]{5}(?:-[0-9]{4})?$', zip):
        # zip = 5
        return zip
    else:
        return ''
My customer information:
customer_info = (['John', 'Summers', '22960', '434-305-6600'],
['Josh', 'Williams', '40143', '270-555-1544'],
['Jim', 'Roberson', '21801','555-555-5555'],
['John', 'Adams', '223211143', '4444444444'])
These are my various attempts to make it work:
dataframe = pd.DataFrame(customer_info,columns = ['First','Last','Zip','Phone'])
#dataframe['Zip'] = dataframe['Zip'].apply(zip_code())
#dataframe['Zip'] = dataframe['Zip'].apply(zip_code(dataframe['Zip']))
#dataframe['Zip'] = dataframe['Zip'].apply(lambda x: re.match('^[0-9]{5}(?:-[0-9]{4})?$',x))
#dataframe.Zip.apply(lambda x: zip_code(x))
#dataframe['Zip'].apply(zip_code)
print(dataframe)
zipcode = zip_code('22960')
print(zipcode)
What I am trying to do is run a check on the zip code column 'Zip'. If a value matches ##### or #####-####, the function should return the zip code. Otherwise it should return an empty string. I have tested the zip_code function and it works as expected. However, I cannot figure out how to pass the entire Zip column through the zip_code function. Every time I type zip_code() it asks for a variable. Pretty much all of the commented-out lines are ones I found while browsing this site, but they did not help me. Thank you for any help you can provide!
This could work:
import pandas as pd
customer_info = (['John', 'Summers', '22960', '434-305-6600'],
['Josh', 'Williams', '40143', '270-555-1544'],
['Jim', 'Roberson', '21801','555-555-5555'],
['John', 'Adams', '223211143', '4444444444'])
dataframe = pd.DataFrame(customer_info,columns = ['First','Last','Zip','Phone'])
dataframe["validZip"] = dataframe.Zip.str.extract(r'^([0-9]{5}(?:-[0-9]{4})?)$').fillna('')
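As a variation on the extract approach above, here is a minimal sketch that builds a boolean mask with `Series.str.fullmatch` and blanks out the rows that fail it. It assumes pandas 1.1 or later (where `str.fullmatch` is available) and a column that contains only strings; the `zips` series and the extra `'12345-6789'` value are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the last value is added to exercise the ZIP+4 form.
zips = pd.Series(['22960', '40143', '21801', '223211143', '12345-6789'])

# fullmatch requires the whole string to match, so no ^ or $ anchors are needed.
is_valid = zips.str.fullmatch(r'[0-9]{5}(?:-[0-9]{4})?')

# Keep the values where the mask is True, replace the rest with an empty string.
cleaned = zips.where(is_valid, '')

print(cleaned.tolist())
# ['22960', '40143', '21801', '', '12345-6789']
```

This avoids calling a Python function once per row, which may matter on larger frames.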
Your version also works:
import re
import pandas as pd
customer_info = (['John', 'Summers', '22960', '434-305-6600'],
['Josh', 'Williams', '40143', '270-555-1544'],
['Jim', 'Roberson', '21801','555-555-5555'],
['John', 'Adams', '223211143', '4444444444'])
dataframe = pd.DataFrame(customer_info,columns = ['First','Last','Zip','Phone'])
def zip_code(zip):
    if re.match('^[0-9]{5}(?:-[0-9]{4})?$', zip):
        return zip
    else:
        return ''
dataframe.Zip = dataframe.Zip.apply(zip_code)
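To spell out why some of the commented-out attempts in the question did not do what was wanted: `apply` expects the function object itself and calls it once per value, so the function is passed without parentheses, and the returned Series has to be assigned back to the column. The sketch below shows this on a small standalone Series; the names `ZIP_PATTERN` and `zips` are illustrative and not from the original post.

```python
import re
import pandas as pd

# Illustrative precompiled pattern: five digits, optionally followed by -dddd.
ZIP_PATTERN = re.compile(r'^[0-9]{5}(?:-[0-9]{4})?$')

def zip_code(value):
    """Return the value if it looks like a US zip code, else an empty string."""
    return value if ZIP_PATTERN.match(value) else ''

zips = pd.Series(['22960', '40143', '21801', '223211143', '12345-6789'])

# Pass zip_code itself (no parentheses); pandas calls it for each element.
# zip_code() would raise a TypeError because the required argument is missing.
cleaned = zips.apply(zip_code)

print(cleaned.tolist())
# ['22960', '40143', '21801', '', '12345-6789']
```

Note that `zips.apply(zip_code)` on its own line only computes the result; without the assignment the original data is left unchanged.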