I have a pandas dataframe where I need to conditionally update a value based on its first two characters. The pattern is simple and the code below works, but it doesn't feel Pythonic. I need to extend this to other prefixes (at least 11-19/AJ) and, while I could just add more lines, I'd really like to do this the right way. Existing code below:
df['REFERENCE_ID'] = df['PRECERT_ID'].astype(str)
df.loc[df['REFERENCE_ID'].str.startswith('11'), 'REFERENCE_ID'] = 'A' + df['PRECERT_ID'].str[-7:]
df.loc[df['REFERENCE_ID'].str.startswith('12'), 'REFERENCE_ID'] = 'B' + df['PRECERT_ID'].str[-7:]
df.loc[df['REFERENCE_ID'].str.startswith('13'), 'REFERENCE_ID'] = 'C' + df['PRECERT_ID'].str[-7:]
df.loc[df['REFERENCE_ID'].str.startswith('14'), 'REFERENCE_ID'] = 'D' + df['PRECERT_ID'].str[-7:]
df.loc[df['REFERENCE_ID'].str.startswith('15'), 'REFERENCE_ID'] = 'E' + df['PRECERT_ID'].str[-7:]
I thought I might be able to use a list of letters, like
letters = list(string.ascii_uppercase)
but I'm new to dataframes (and python in general) and can't figure out the syntax to get the dataframe equivalent of
import string
letters = list(string.ascii_uppercase)
text = '1523456789'
first = int(text[:2])
text = letters[first-11] + text[-7:]
I wasn't able to find something addressing this, but would be grateful for any help or a link to a similar question if it exists. Thank you.
import string
df['REFERENCE_ID'] = df['PRECERT_ID'].astype(str)
# Store all uppercase English letters in a list
letters = list(string.ascii_uppercase)
# Pair each letter with a two-digit prefix, starting at 11 as the question requires:
# 11 -> 'A', 12 -> 'B', and so on.
for i, l in enumerate(letters, start=11):
    df.loc[df['REFERENCE_ID'].str.startswith(str(i)), 'REFERENCE_ID'] = l + df['PRECERT_ID'].str[-7:]
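For anyone who wants to try the loop on its own, here is a minimal self-contained sketch. The sample `PRECERT_ID` values are made up for illustration, and the letters are limited to the first nine (A to I) on the assumption that only prefixes 11 to 19 matter.

```python
import string

import pandas as pd

# Hypothetical sample data; the real PRECERT_ID values are not shown in the question.
df = pd.DataFrame({"PRECERT_ID": ["1123456789", "1523456789", "9923456789"]})
df["REFERENCE_ID"] = df["PRECERT_ID"].astype(str)

# Pair prefixes 11..19 with the letters A..I.
for prefix, letter in enumerate(string.ascii_uppercase[:9], start=11):
    mask = df["REFERENCE_ID"].str.startswith(str(prefix))
    # Assignment aligns on the index, so only the masked rows are updated.
    df.loc[mask, "REFERENCE_ID"] = letter + df["PRECERT_ID"].str[-7:]

print(df)
```

Rows whose prefix is outside 11 to 19 (the third row here) keep their original value, which matches the behaviour of the code in the question.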
I would build a lookup dictionary and use map to speed things up.
To make the lookup dictionary you could use:
lu_dict = dict(zip([str(i) for i in range(11,20)],[chr(i) for i in range(65,74)]))
which returns:
{'11': 'A',
'12': 'B',
'13': 'C',
'14': 'D',
'15': 'E',
'16': 'F',
'17': 'G',
'18': 'H',
'19': 'I'}
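If the ASCII code points (65 to 73) feel opaque, the same dictionary can probably be built from `string.ascii_uppercase` instead. This is only a sketch; the names `prefixes` and `lu_dict_alt` are illustrative and not part of the answer above.

```python
import string

# Two-digit prefixes 11..19 as strings (the range named in the question).
prefixes = [str(n) for n in range(11, 20)]

# zip stops at the shorter input, so only the first nine letters (A to I) are used.
lu_dict_alt = dict(zip(prefixes, string.ascii_uppercase))

print(lu_dict_alt)
```

Extending the range to more prefixes then only requires changing the upper bound of `range`.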
Then you could chain .str.slice with .map to avoid the for loop.
import pandas as pd
df = pd.DataFrame(data={'Reference_ID': ['112326345', '12223356354', '6735435634']})
df['Reference_ID'] = df['Reference_ID'].astype(str)
# Map the first two characters to a letter, then append the last seven characters
df['Reference_new'] = df['Reference_ID'].str.slice(0, 2).map(lu_dict) + df['Reference_ID'].str.slice(-7)
Which results in:
  Reference_ID Reference_new
0    112326345      A2326345
1  12223356354      B3356354
2   6735435634           NaN
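Note that the unmatched prefix (67) yields NaN in the output above, whereas the code in the question leaves such rows unchanged. If keeping the original value is what you want, one possible approach is to fill the gaps from the source column. This is a sketch under that assumption; the intermediate names `ids` and `mapped` are illustrative.

```python
import pandas as pd

# Same lookup as above: '11' -> 'A', ..., '19' -> 'I'.
lu_dict = dict(zip([str(i) for i in range(11, 20)], [chr(i) for i in range(65, 74)]))

df = pd.DataFrame({"Reference_ID": ["112326345", "12223356354", "6735435634"]})
ids = df["Reference_ID"].astype(str)

# Letter for the two-character prefix plus the last seven characters;
# rows with an unknown prefix come out as NaN here.
mapped = ids.str[:2].map(lu_dict) + ids.str[-7:]

# Fall back to the original value wherever no prefix matched.
df["Reference_new"] = mapped.fillna(ids)

print(df)
```

With this change the third row keeps `6735435634` instead of becoming NaN.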