
Replacing string with value calculated from the max of another column in a dataframe

I have a dataframe with an ID column of dtype object (it contains both ints and strings), so I am trying to use np.where to replace each string ID in turn with the next highest number. However, in the example below it only replaces one of the two strings, and I have no idea why.

import numpy as np
import pandas as pd

df = pd.DataFrame({'IDstr': ['480610_ABC_087', '78910_ABC_087', '4806105017087', '414149'],
                   'IDint': [0, 0, 4806105017087, 414149]})
print(df)
unique_str_IDs = df['IDstr'][df['IDstr'].str.contains("ABC", na=False)].unique()
for i in range(len(unique_str_IDs)):
    # Note: 'SKUint_y' is missing from this sample frame (the answer below adds it)
    df['SKUintTEST'] = np.where(df['IDstr'] == unique_str_IDs[i].strip(),
                                df['SKUint_y'].max() + i + 1, df['SKUint_y'])

Has anyone got any ideas?

You can use map with a dictionary that assigns an incrementing number to each unique ID, then fillna with the original value for the rows that were not mapped:

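import pandas as pd

df = pd.DataFrame({'IDstr': ['480610_ABC_087', '78910_ABC_087', '4806105017087', '414149'],
                   'IDint': [0, 0, 4806105017087, 414149],
                   'SKUint_y': range(10, 14)})

# Unique string IDs that contain "ABC"
unique_str_IDs = df.loc[df['IDstr'].str.contains("ABC", na=False), 'IDstr'].unique()

# Number the string IDs upward from max(SKUint_y) + 1
new_ids = {idx: i for i, idx in enumerate(unique_str_IDs, df['SKUint_y'].max() + 1)}

# Rows without a mapping become NaN after map, so fillna restores their original value
df['SKUintTEST'] = df['IDstr'].map(new_ids).fillna(df['SKUint_y'])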

print (df)
            IDstr          IDint  SKUint_y  SKUintTEST
0  480610_ABC_087              0        10        14.0
1   78910_ABC_087              0        11        15.0
2   4806105017087  4806105017087        12        12.0
3          414149         414149        13        13.0
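
As for why the loop in the question replaces only one string: each pass rebuilds SKUintTEST from SKUint_y, so the replacement made in the previous pass is overwritten and only the last ID keeps its new number. A minimal sketch of a loop-based fix, assuming the same sample data as above (the names start and str_id are introduced here for illustration), reads from the result column instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'IDstr': ['480610_ABC_087', '78910_ABC_087', '4806105017087', '414149'],
    'IDint': [0, 0, 4806105017087, 414149],
    'SKUint_y': range(10, 14),  # sample values taken from the answer above
})

unique_str_IDs = df.loc[df['IDstr'].str.contains("ABC", na=False), 'IDstr'].unique()

# Compute the first free number once, then start the result as a copy of SKUint_y.
start = df['SKUint_y'].max() + 1
df['SKUintTEST'] = df['SKUint_y']

for i, str_id in enumerate(unique_str_IDs):
    # Read from 'SKUintTEST' rather than 'SKUint_y' so earlier replacements survive.
    df['SKUintTEST'] = np.where(df['IDstr'] == str_id, start + i, df['SKUintTEST'])

print(df['SKUintTEST'].tolist())
```

Because no NaN values are ever introduced, this variant should also keep the column as integers, whereas map followed by fillna yields floats (14.0, 15.0) as in the output above.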
