Replace part of a pandas dataframe column based on the first two letters

Question

I have a pandas dataframe where I need to conditionally update a value based on its first two characters. The pattern is simple and the code below works, but it doesn't feel Pythonic. I need to extend this to other letters (at least 11-19/AJ) and, while I could just add additional lines of code, I'd really like to do this the right way. Existing code below:

df['REFERENCE_ID'] = df['PRECERT_ID'].astype(str)
df.loc[df['REFERENCE_ID'].str.startswith('11'), 'REFERENCE_ID'] = 'A' + df['PRECERT_ID'].str[-7:]
df.loc[df['REFERENCE_ID'].str.startswith('12'), 'REFERENCE_ID'] = 'B' + df['PRECERT_ID'].str[-7:]
df.loc[df['REFERENCE_ID'].str.startswith('13'), 'REFERENCE_ID'] = 'C' + df['PRECERT_ID'].str[-7:]
df.loc[df['REFERENCE_ID'].str.startswith('14'), 'REFERENCE_ID'] = 'D' + df['PRECERT_ID'].str[-7:]
df.loc[df['REFERENCE_ID'].str.startswith('15'), 'REFERENCE_ID'] = 'E' + df['PRECERT_ID'].str[-7:]

I thought I might be able to use a list of letters, like

letters = list(string.ascii_uppercase)

but I'm new to dataframes (and Python in general) and can't figure out the syntax to get the dataframe equivalent of

letters = list(string.ascii_uppercase)
text = '1523456789'
first = int(text[:2])
text = letters[first-11] + text[-7:]

I wasn't able to find anything addressing this, but I would be grateful for any help or a link to a similar question if one exists. Thank you.

Answer 1

import string

# Work from a string copy of the IDs so the .str accessor is always available
ids = df['PRECERT_ID'].astype(str)
df['REFERENCE_ID'] = ids

# Uppercase English letters: 'A', 'B', ..., 'Z'
letters = list(string.ascii_uppercase)

# Pair each letter with a two-digit prefix, starting at 11 as the OP wants
# (11 -> 'A', 12 -> 'B', ...), and rewrite the rows whose ID starts with that prefix.
for i, l in enumerate(letters, start=11):
    df.loc[ids.str.startswith(str(i)), 'REFERENCE_ID'] = l + ids.str[-7:]
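
To try the loop end to end, here is a minimal self-contained sketch. The sample `PRECERT_ID` values are made up for illustration (the question does not show real data), and the loop is limited to the prefixes 11 to 19 mentioned in the question rather than all 26 letters.

```python
import string

import pandas as pd

# Hypothetical sample data; the real PRECERT_ID values are not shown in the question.
df = pd.DataFrame({'PRECERT_ID': [1123456789, 1523456789, 6723456789]})

# String copy of the IDs, so .str works even though the column is numeric here.
ids = df['PRECERT_ID'].astype(str)
df['REFERENCE_ID'] = ids

# Only the prefixes 11..19 (letters 'A'..'I').
for prefix, letter in enumerate(string.ascii_uppercase[:9], start=11):
    mask = ids.str.startswith(str(prefix))
    # Assignment aligns on the index, so only the masked rows are updated.
    df.loc[mask, 'REFERENCE_ID'] = letter + ids.str[-7:]

print(df)
```

Rows whose prefix is outside 11 to 19 keep their original ID, which matches the behaviour of the code in the question.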

Answer 2

I would build a lookup dictionary and use map to speed things up.

To make the lookup dictionary you could use:

lu_dict = dict(zip([str(i) for i in range(11,20)],[chr(i) for i in range(65,74)]))

which returns:

{'11': 'A',
 '12': 'B',
 '13': 'C',
 '14': 'D',
 '15': 'E',
 '16': 'F',
 '17': 'G',
 '18': 'H',
 '19': 'I'}
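
As a possible alternative, the same mapping can be sketched with `string.ascii_uppercase` (the list the question started from) instead of `chr`. `zip` stops at the shorter input, so only the nine prefixes 11 to 19 are paired with letters.

```python
import string

# Pair '11'..'19' with 'A'..'I'; zip stops once range(11, 20) is exhausted.
lu_dict = {str(n): letter for n, letter in zip(range(11, 20), string.ascii_uppercase)}
print(lu_dict)
```

Widening the range (for example `range(11, 37)` for all 26 letters) extends the mapping without any other change.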

Then you could chain .str.slice(0, 2) with .map(lu_dict) to avoid the for loop.

import pandas as pd

df = pd.DataFrame(data={'Reference_ID': ['112326345', '12223356354', '6735435634']})
df['Reference_ID'] = df['Reference_ID'].astype(str)

# Map the two-character prefix to its letter, then append the last seven characters
df['Reference_new'] = df['Reference_ID'].str.slice(0, 2).map(lu_dict) + df['Reference_ID'].str.slice(-7)

Which results in:

  Reference_ID Reference_new
0    112326345      A2326345
1  12223356354      B3356354
2   6735435634           NaN
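
Note that a prefix with no entry in the dictionary produces NaN, whereas the code in the question leaves such rows unchanged. If that behaviour is wanted, one option is to fall back to the original ID with `fillna`. This is a sketch under that assumption, reusing the sample data from above:

```python
import pandas as pd

# Same lookup as above: '11' -> 'A', ..., '19' -> 'I'
lu_dict = dict(zip([str(i) for i in range(11, 20)], [chr(i) for i in range(65, 74)]))

df = pd.DataFrame({'Reference_ID': ['112326345', '12223356354', '6735435634']})

# Letter for the two-character prefix plus the last seven characters;
# NaN wherever the prefix is not a key of lu_dict.
mapped = df['Reference_ID'].str.slice(0, 2).map(lu_dict) + df['Reference_ID'].str.slice(-7)

# Keep the original ID for rows that did not match any prefix.
df['Reference_new'] = mapped.fillna(df['Reference_ID'])
print(df)
```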

Answer 1
0 Accepted 2020-08-27 17:52:17

Answer 2
0 2020-08-27 18:37:55
