Splitting a string on a special character in a pandas dataframe column based on a conditional

Question

I am trying to establish conformity in an address column in my pandas dataframe. I have a ZipCode Column that has two formats: 1) 87301 2) 87301-1234. Not every row has the hyphen so I need to split on the hyphen when it is present.

My data looks like this:

State  ZIP
CA     85145-7045
PA     76913

I have tried a few methods of tackling this problem. I have tried:

data['Zip_1'],data['Zip_2'] = data['Zip'].str.split('-').str

I have tried:

data['Zip'] = data['Zip'].str.split('-', n=1, expand=True)
data['Zip'] = data['Zip'][0]
data['Zip_drop'] = data['Zip'][1]

I have also tried using a lambda function.

However, it just returns nulls.

I would expect the new column to contain the digits after the hyphen for zip codes that have one, and NaN for zip codes that do not. Instead, the new column is NaN for every observation.

Answer 1

You can do that by using "replace" combined with regular expressions.

Step 1

import pandas as pd

example_df = pd.DataFrame({'State': ['CA', 'PA'],
                           'ZIP': ['85145-7045', '76913']})

example_df

Step 2

# Keep only the numbers before the hyphen (if any).
# A raw string avoids the invalid '\-' escape; the hyphen needs no escaping here.
example_df = example_df.replace(r'-\d*', '', regex=True)
example_df
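If the digits after the hyphen should be kept in their own column (NaN where there is no hyphen), as the question asks, one possible approach is str.extract with an optional capture group. This is only a sketch: the column names Zip_1 and Zip_2 are borrowed from the question, and the pattern assumes each value is digits optionally followed by a hyphen and more digits.

```python
import pandas as pd

example_df = pd.DataFrame({'State': ['CA', 'PA'],
                           'ZIP': ['85145-7045', '76913']})

# Group 1: digits before the hyphen. Group 2 (optional): digits after it.
# str.extract always returns one column per capture group, so the second
# column exists even when no row contains a hyphen.
parts = example_df['ZIP'].str.extract(r'^(\d+)(?:-(\d+))?$')

example_df['Zip_1'] = parts[0]
example_df['Zip_2'] = parts[1]  # missing where the ZIP has no hyphen

print(example_df)
```

Unlike str.split with expand=True, this does not depend on at least one row containing a hyphen for the second column to exist.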

Answer 2

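Flag every zip code that contains a hyphen and store the result in a new column. Use str.contains, which returns True/False; str.find returns the position of the hyphen (or -1 if there is none) and cannot be used as a mask:

data['Zip Hyphen'] = data['Zip'].str.contains('-')

Then, from the dataframe with column Zip, drop any rows where the zip code contains a hyphen:

data = data.drop(data[data['Zip'].str.contains('-')].index)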

EDIT: This code is not tested but the general idea is there
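A minimal, runnable sketch of this flag-and-filter idea, assuming a boolean mask built with str.contains (str.find gives the hyphen's position, or -1, rather than True/False). The sample frame and the names has_hyphen and without_hyphen are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({'State': ['CA', 'PA'],
                     'Zip': ['85145-7045', '76913']})

# str.find returns the index of the hyphen, or -1 when it is absent.
positions = data['Zip'].str.find('-')

# str.contains returns a boolean Series that can be used as a mask.
# regex=False treats '-' as a literal character.
has_hyphen = data['Zip'].str.contains('-', regex=False)
data['Zip Hyphen'] = has_hyphen

# Keep only the rows whose zip code has no hyphen.
without_hyphen = data[~has_hyphen]
print(without_hyphen)
```

Filtering with the negated mask has the same effect as dropping the hyphenated rows by index, and it avoids a second lookup.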
