Replace strings with a value calculated from the max of another column in a dataframe

Question

I have a dataframe with an ID column of dtype object (it contains both ints and strings), so I am trying to use np.where to replace each string in turn with the next highest number. However, in the example below it only replaces one of the two strings, and I have no idea why.

import numpy as np
import pandas as pd

df = pd.DataFrame({'IDstr':['480610_ABC_087', '78910_ABC_087','4806105017087','414149'],
                   'IDint':[ 0, 0, 4806105017087, 414149]})
print (df)

unique_str_IDs = df['IDstr'][df['IDstr'].str.contains("ABC", na=False)].unique()
for i in range(len(unique_str_IDs)):
    df['SKUintTEST']=np.where(df['IDstr'] == unique_str_IDs[i].strip(), 
            df['SKUint_y'].max()+i+1, df['SKUint_y'])

Has anyone got any ideas?

Answer 1

You can use map with a dictionary that assigns an incrementing number to each unique ID, then use fillna to restore the original value for the rows that were not mapped:

df = pd.DataFrame({'IDstr':['480610_ABC_087', '78910_ABC_087','4806105017087','414149'],
                    'IDint':[ 0, 0, 4806105017087, 414149], 
                    'SKUint_y': range(10,14)})

unique_str_IDs = df.loc[df['IDstr'].str.contains("ABC", na=False), 'IDstr'].unique()

df['SKUintTEST'] = df['IDstr'].map({idx:i for i, idx in enumerate(unique_str_IDs, df.SKUint_y.max()+1)})\
                              .fillna(df.SKUint_y)

print (df)
            IDstr          IDint  SKUint_y  SKUintTEST
0  480610_ABC_087              0        10        14.0
1   78910_ABC_087              0        11        15.0
2   4806105017087  4806105017087        12        12.0
3          414149         414149        13        13.0
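
As for why the loop in the question replaces only one string: a likely reading is that each pass of np.where rebuilds SKUintTEST from SKUint_y, so the replacement made on the previous pass is overwritten and only the last ID keeps its new number. The sketch below illustrates this under the same sample data as above; the column names loop_original and loop_fixed and the variable base are made up for the illustration and are not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'IDstr': ['480610_ABC_087', '78910_ABC_087', '4806105017087', '414149'],
                   'IDint': [0, 0, 4806105017087, 414149],
                   'SKUint_y': range(10, 14)})

unique_str_IDs = df.loc[df['IDstr'].str.contains("ABC", na=False), 'IDstr'].unique()

# Loop as in the question: the fallback is always SKUint_y, so every pass
# discards the replacement made by the previous pass.
for i, str_id in enumerate(unique_str_IDs):
    df['loop_original'] = np.where(df['IDstr'] == str_id,
                                   df['SKUint_y'].max() + i + 1, df['SKUint_y'])

# Adjusted loop (hypothetical fix): seed the column once, then fall back to
# the column being built so earlier replacements are kept.
base = df['SKUint_y'].max()
df['loop_fixed'] = df['SKUint_y']
for i, str_id in enumerate(unique_str_IDs):
    df['loop_fixed'] = np.where(df['IDstr'] == str_id, base + i + 1, df['loop_fixed'])

# loop_original -> [10, 15, 12, 13]  (only the last string ID is replaced)
# loop_fixed    -> [14, 15, 12, 13]  (both string IDs are replaced)
print(df[['IDstr', 'loop_original', 'loop_fixed']])
```

The adjusted loop keeps the integer dtype, whereas the map and fillna approach returns floats because unmapped rows pass through NaN first; either can be cast with astype(int) if needed.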

Answer 1 (accepted), 2020-03-10 19:45:43
