I am trying to standardize an address column in my pandas DataFrame. I have a ZIP code column that comes in two formats: 1) 87301 and 2) 87301-1234. Not every row has the hyphen, so I need to split on the hyphen only when it is present.
My data looks like this:
State ZIP
CA 85145-7045
PA 76913
I have tried a few methods of tackling this problem. I have tried:
data['Zip_1'],data['Zip_2'] = data['Zip'].str.split('-').str
I have tried:
data['Zip'] = data['Zip'].str.split('-', n=1, expand=True)
data['Zip'] = data['Zip'][0]
data['Zip_drop'] = data['Zip'][1]
I have also tried using a lambda function.
However, it just returns nulls.
I would expect the new column to contain the digits after the hyphen when the ZIP code has one, and NaN when it does not. Instead, the new column is NaN for every row.
You can do that by using "replace" combined with regular expressions.
Step 1
import pandas as pd

example_df = pd.DataFrame({'State': ['CA', 'PA'],
                           'ZIP': ['85145-7045', '76913']})
example_df
Step 2
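# Keep only the numbers before the hyphen (if any).
# A raw string avoids Python's invalid-escape warning, and limiting the
# replacement to the ZIP column leaves hyphens in other columns untouched.
example_df['ZIP'] = example_df['ZIP'].replace(r'-\d*', '', regex=True)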
example_df
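The replacement above discards everything after the hyphen. If the goal is to keep that part in its own column, as the question asks, one possible sketch uses `str.split` with `expand=True`. The column names `Zip_1` and `Zip_2` are taken from the question; the helper name `parts` is only illustrative. The `reindex` call is a defensive assumption so that the second column still exists when no row contains a hyphen.

```python
import pandas as pd

data = pd.DataFrame({'State': ['CA', 'PA'],
                     'Zip': ['85145-7045', '76913']})

# expand=True returns a DataFrame with one column per piece.
# Rows without a hyphen get a missing value in the second column.
parts = data['Zip'].str.split('-', n=1, expand=True)

# Guarantee that both columns exist even if no ZIP code has a hyphen.
parts = parts.reindex(columns=[0, 1])

data['Zip_1'] = parts[0]  # digits before the hyphen
data['Zip_2'] = parts[1]  # digits after the hyphen, missing otherwise
```

Assigning the two pieces from a separate variable avoids overwriting `data['Zip']` before the second piece has been read.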
Flag every ZIP code that contains a hyphen and store the result in a new column. Note that str.find returns the integer position of the hyphen (or -1), not a boolean, so str.contains is the method to use here:
data['Zip Hyphen'] = data['Zip'].str.contains('-')
Then drop from the dataframe every row whose Zip contains a hyphen:
data = data.drop(data[data['Zip'].str.contains('-')].index)
EDIT: This code is not tested but the general idea is there
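Since the answer above is untested, here is a small sketch of the same idea on the sample data from the question. It assumes a boolean mask is what was intended; the names `has_hyphen` and `no_hyphen` are illustrative, and `na=False` is an assumption that missing ZIP codes should count as having no hyphen.

```python
import pandas as pd

data = pd.DataFrame({'State': ['CA', 'PA'],
                     'Zip': ['85145-7045', '76913']})

# Boolean mask: True where the ZIP code contains a literal hyphen.
# regex=False treats '-' as plain text; na=False maps missing values to False.
has_hyphen = data['Zip'].str.contains('-', regex=False, na=False)
data['Zip Hyphen'] = has_hyphen

# Keep only the rows whose ZIP code has no hyphen.
no_hyphen = data[~has_hyphen]
```

Filtering with `~has_hyphen` gives the same rows as dropping by index, without the intermediate lookup.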