Splitting a string on a special character in a pandas dataframe column based on a conditional

I am trying to establish conformity in an address column in my pandas dataframe. I have a ZipCode column that has two formats: 1) 87301 2) 87301-1234. Not every row has the hyphen, so I need to split on the hyphen only when it is present.

My data looks like this:

State  ZIP
CA     85145-7045
PA     76913   

I have tried a few methods of tackling this problem. I have tried:

data['Zip_1'],data['Zip_2'] = data['Zip'].str.split('-').str

I have also tried:

data['Zip'] = data['Zip'].str.split('-', n=1, expand=True)
data['Zip'] = data['Zip'][0]
data['Zip_drop'] = data['Zip'][1]

I have also tried using a lambda function.

However, it just returns nulls.

I would expect the new column to return NaN for zip codes that do not have the hyphen, and the digits after the hyphen for zip codes that do. However, the new column just populates NaN for every observation.

You can do that by using "replace" combined with regular expressions.

Step 1

import pandas as pd

example_df = pd.DataFrame({'State': ['CA', 'PA'],
                           'ZIP': ['85145-7045', '76913']})

example_df


Step 2

# Keep only the digits before the hyphen (if any).
# Note: DataFrame.replace applies the pattern to every string column.
example_df = example_df.replace(r'-\d*', '', regex=True)
example_df

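If both parts of the zip code are wanted, as the question describes, one possible sketch is to capture them with str.extract instead of deleting the extension. The column name Zip and the output names Zip_1 and Zip_2 are taken from the question; the pattern assumes a five-digit base and an optional four-digit extension, which is an assumption about the data rather than something stated in the thread.

```python
import pandas as pd

data = pd.DataFrame({'State': ['CA', 'PA'],
                     'Zip': ['85145-7045', '76913']})

# Strip stray whitespace, then capture the five-digit base and,
# when present, the four-digit extension after the hyphen.
parts = data['Zip'].str.strip().str.extract(r'^(\d{5})(?:-(\d{4}))?$')

data['Zip_1'] = parts[0]  # base zip code for every matching row
data['Zip_2'] = parts[1]  # extension, or NaN when there is no hyphen

print(data)
```

Rows that do not match the assumed pattern at all would get NaN in both new columns, so it may be worth checking for those before overwriting anything.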

Flag the zip codes that contain a hyphen, and place the flag in a new column:

data['Zip Hyphen'] = data['Zip'].str.contains('-')

Then, from the dataframe with column Zip, drop any rows where the zip code contains a hyphen:

data = data.drop(data[data['Zip'].str.contains('-')].index)

EDIT: This code is not tested, but the general idea is there.
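A minimal runnable sketch of this idea, assuming a dataframe named data with a string column Zip as in the question: str.contains builds a Boolean mask, which can then be used to filter rows. The names has_hyphen and no_hyphen are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'State': ['CA', 'PA'],
                     'Zip': ['85145-7045', '76913']})

# True where the zip code contains a literal hyphen.
has_hyphen = data['Zip'].str.contains('-', regex=False)

# Keep only the rows whose zip code has no hyphen.
no_hyphen = data[~has_hyphen]

print(no_hyphen)
```

Note that filtering this way removes the hyphenated rows entirely rather than trimming the extension, so it only fits if those rows are not needed.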
