I'm using Pandas to load an Excel spreadsheet which contains zip code (eg 32771). The zip codes are stored as 5 digit strings in spreadsheet. When they are pulled into a DataFrame using the command...
xls = pd.ExcelFile("5-Digit-Zip-Codes.xlsx")
dfz = xls.parse('Zip Codes')
they are converted into numbers. So '00501' becomes 501.
So my questions are, how do I:
a. Load the DataFrame and keep the string type of the zip codes stored in the Excel file?
b. Convert the numbers in the DataFrame into a five digit string eg "501" becomes "00501"?
As a workaround, you could convert the int
s to 0-padded strings of length 5 using Series.str.zfill
:
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
Demo:
import pandas as pd
df = pd.DataFrame({'zipcode':['00501']})
df.to_excel('/tmp/out.xlsx')
xl = pd.ExcelFile('/tmp/out.xlsx')
df = xl.parse('Sheet1')
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
print(df)
yields
zipcode
0 00501
str(my_zip).zfill(5)
or
print("{0:>05s}".format(str(my_zip)))
are 2 of many many ways to do this
You can avoid panda's type inference with a custom converter, eg if 'zipcode'
was the header of the column with zipcodes:
dfz = xls.parse('Zip Codes', converters={'zipcode': lambda x:x})
This is arguably a bug since the column was originally string encoded, made an issue here
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.