Replacing string with value calculated from the max of another column in a dataframe
I have a dataframe with an ID column of dtype object (it contains both ints and strings), so I am trying to use np.where to replace each string in turn with the next highest number. However, in the example below it only replaces one of the two strings, and I have no idea why.
import numpy as np
import pandas as pd

df = pd.DataFrame({'IDstr': ['480610_ABC_087', '78910_ABC_087', '4806105017087', '414149'],
                   'IDint': [0, 0, 4806105017087, 414149],
                   # SKUint_y was missing from the posted frame; values taken from the answer's example
                   'SKUint_y': range(10, 14)})
print(df)
unique_str_IDs = df['IDstr'][df['IDstr'].str.contains("ABC", na=False)].unique()
for i in range(len(unique_str_IDs)):
    df['SKUintTEST'] = np.where(df['IDstr'] == unique_str_IDs[i].strip(),
                                df['SKUint_y'].max() + i + 1, df['SKUint_y'])
Has anyone got any ideas?
You can use map with a dictionary that assigns an incrementing number to each unique ID, then fillna with the original value for the rows that were not mapped:
import pandas as pd

df = pd.DataFrame({'IDstr': ['480610_ABC_087', '78910_ABC_087', '4806105017087', '414149'],
                   'IDint': [0, 0, 4806105017087, 414149],
                   'SKUint_y': range(10, 14)})
# string IDs that need a new number
unique_str_IDs = df.loc[df['IDstr'].str.contains("ABC", na=False), 'IDstr'].unique()
# number each unique ID starting at max + 1, map it, keep the original value elsewhere
new_ids = {idx: i for i, idx in enumerate(unique_str_IDs, df['SKUint_y'].max() + 1)}
df['SKUintTEST'] = df['IDstr'].map(new_ids).fillna(df['SKUint_y'])
print(df)
            IDstr          IDint  SKUint_y  SKUintTEST
0  480610_ABC_087              0        10        14.0
1   78910_ABC_087              0        11        15.0
2   4806105017087  4806105017087        12        12.0
3          414149         414149        13        13.0
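
As for why the loop in the question seems to replace only one string: each pass of np.where builds the whole column again from SKUint_y, so the replacement made in the previous pass is thrown away and only the last ID keeps its new number. The sketch below reproduces that and shows one way to keep a loop while accumulating the changes. The column names overwritten and accumulated are made up for illustration, and the SKUint_y values are the same assumed ones used in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'IDstr': ['480610_ABC_087', '78910_ABC_087', '4806105017087', '414149'],
    'SKUint_y': range(10, 14),  # assumed values, as in the answer above
})
unique_str_IDs = df.loc[df['IDstr'].str.contains('ABC', na=False), 'IDstr'].unique()

# Pattern from the question: the "else" branch is always SKUint_y, so every
# pass rebuilds the column from scratch and only the last ID stays replaced.
for i, id_ in enumerate(unique_str_IDs):
    df['overwritten'] = np.where(df['IDstr'] == id_,
                                 df['SKUint_y'].max() + i + 1,
                                 df['SKUint_y'])

# Loop variant that accumulates: copy the column once, then update it in place.
start = df['SKUint_y'].max() + 1
df['accumulated'] = df['SKUint_y']
for i, id_ in enumerate(unique_str_IDs):
    df.loc[df['IDstr'] == id_, 'accumulated'] = start + i

print(df)
```

Note that the map/fillna version returns floats (14.0, 15.0) because unmapped rows are NaN before fillna runs; if integers are wanted, appending .astype('int64') after fillna should restore them.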