
How do I apply a custom regex function to a column in a DataFrame in Python

I have been searching this site and Google for the past two days and cannot figure this out. I have a dataframe with 4 columns. I feel like I am missing something easy. Here is my function:

   import re

   def zip_code(zip):
       # Return the value if it matches ##### or #####-####, otherwise an empty string.
       if re.match('^[0-9]{5}(?:-[0-9]{4})?$', zip):
           return zip
       else:
           return ''

My customer information:

   customer_info = (['John', 'Summers', '22960', '434-305-6600'], 
            ['Josh', 'Williams', '40143', '270-555-1544'],
            ['Jim', 'Roberson', '21801','555-555-5555'],
            ['John', 'Adams', '223211143', '4444444444'])

These are my various attempts to make it work:

   dataframe = pd.DataFrame(customer_info,columns = ['First','Last','Zip','Phone'])

   #dataframe['Zip'] = dataframe['Zip'].apply(zip_code())
   #dataframe['Zip'] = dataframe['Zip'].apply(zip_code(dataframe['Zip']))

   #dataframe['Zip'] = dataframe['Zip'].apply(lambda x: re.match('^[0-9]{5}(?:-[0-9]{4})?$',x))

   #dataframe.Zip.apply(lambda x: zip_code(x))
   #dataframe['Zip'].apply(zip_code)

   print(dataframe)
   zipcode = zip_code('22960')
   print(zipcode)

What I am trying to do is run a check on the zip code column 'Zip'. If a value is a zip code that matches ##### or #####-####, the function returns the zip code. Otherwise it returns an empty string. I have tested the zip_code function and it works as expected. However, I cannot figure out how to pass the entire Zip column through the zip_code function. Every time I type zip_code() it asks for an argument. Pretty much all of the commented-out lines are lines that I found browsing this site, but they did not help me. Thank you for any help you can provide!

This could work:

import pandas as pd
customer_info = (['John', 'Summers', '22960', '434-305-6600'], 
            ['Josh', 'Williams', '40143', '270-555-1544'],
            ['Jim', 'Roberson', '21801','555-555-5555'],
            ['John', 'Adams', '223211143', '4444444444'])
dataframe = pd.DataFrame(customer_info,columns = ['First','Last','Zip','Phone'])
# expand=False makes str.extract return a Series for the single capture group.
dataframe["validZip"] = dataframe.Zip.str.extract(r'^([0-9]{5}(?:-[0-9]{4})?)$', expand=False).fillna('')
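
As an alternative to the str.extract call above, the same check could probably be written with a boolean mask. This is only a sketch, not the answer's method: it assumes a pandas version that provides Series.str.fullmatch (1.1 or later), and the sample values '40143-1234' and '2296' are made up for illustration.

```python
import pandas as pd

# Hypothetical sample values: two valid zip codes and two invalid ones.
zips = pd.Series(['22960', '40143-1234', '223211143', '2296'])

# fullmatch requires the whole string to match, so no ^ or $ anchors are needed.
is_valid = zips.str.fullmatch(r'[0-9]{5}(?:-[0-9]{4})?')

# Keep values where the mask is True and replace the rest with an empty string.
cleaned = zips.where(is_valid, '')

print(cleaned.tolist())
```

The mask can also be kept as its own column if you would rather flag invalid rows than blank them out.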

Your version also works:

import re
import pandas as pd
customer_info = (['John', 'Summers', '22960', '434-305-6600'], 
            ['Josh', 'Williams', '40143', '270-555-1544'],
            ['Jim', 'Roberson', '21801','555-555-5555'],
            ['John', 'Adams', '223211143', '4444444444'])
dataframe = pd.DataFrame(customer_info,columns = ['First','Last','Zip','Phone'])


def zip_code(zip):
    if re.match('^[0-9]{5}(?:-[0-9]{4})?$',zip):
        return zip
    else:
        return ''

dataframe.Zip = dataframe.Zip.apply(zip_code)
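
For what it is worth, the commented-out attempts in the question seem to fail for two reasons: `apply(zip_code())` calls the function immediately with no argument instead of handing the function itself to pandas, and `dataframe['Zip'].apply(zip_code)` on its own returns a new Series without changing the dataframe. A minimal sketch of the working pattern, using the question's data (the names ZIP_PATTERN, zip_value and cleaned are introduced here for illustration):

```python
import re

import pandas as pd

# Precompiled pattern for ##### or #####-#### (same regex as in the question).
ZIP_PATTERN = re.compile(r'^[0-9]{5}(?:-[0-9]{4})?$')


def zip_code(zip_value):
    """Return zip_value if it matches the pattern, otherwise an empty string."""
    return zip_value if ZIP_PATTERN.match(zip_value) else ''


customer_info = [
    ['John', 'Summers', '22960', '434-305-6600'],
    ['Josh', 'Williams', '40143', '270-555-1544'],
    ['Jim', 'Roberson', '21801', '555-555-5555'],
    ['John', 'Adams', '223211143', '4444444444'],
]
dataframe = pd.DataFrame(customer_info, columns=['First', 'Last', 'Zip', 'Phone'])

# Pass the function object (no parentheses); pandas calls it once per value.
cleaned = dataframe['Zip'].apply(zip_code)

# apply returns a new Series, so assign it back to keep the result.
dataframe['Zip'] = cleaned

print(dataframe['Zip'].tolist())  # ['22960', '40143', '21801', '']
```

The nine-digit value '223211143' has no hyphen, so it does not match and becomes an empty string.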
