Modifying a dataframe column based on another column's values
I have a dataframe with two columns and want to modify one column based on the value of the other column.
Example:
unit name
feet abcd_feet
celcius abcd_celcius
yard bcde_yard
yard bcde
If the unit is feet or yard and the name ends with it, then I want to remove it from the name column:
unit name
feet abcd
celcius abcd_celcius
yard bcde
yard bcde
There are two possible ways of solving your problem:
First method, which is faster because pandas is column-based:
UNITS_TO_REMOVE = {'feet', 'yard'}
df[['value_', 'unit_']] = df['name'].str.rsplit('_', n=1, expand=True)
values_to_clean = (df['unit_'].isin(UNITS_TO_REMOVE)) & (df['unit_'] == df['unit'])
df.loc[values_to_clean, 'name'] = df.loc[values_to_clean, 'value_']
df.drop(columns=['unit_', 'value_'], inplace=True)
Here is the result:
unit name
0 feet abcd
1 celcius abcd_celcius
2 yard bcde
3 yard bcde
Performance: 20 ms ± 401 µs per loop (mean ± std. dev. of 7 runs, 100 loops each), on a (4000, 2) dataframe.
Second method, using apply (which is sometimes the only available solution):
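For reference, here is a minimal self-contained sketch of the column-based approach. The function name `strip_unit_suffix` and the sample frame are illustrative assumptions, not part of the original answer. It splits each name once from the right with `str.rsplit(..., expand=True)` and returns a copy instead of modifying the frame in place.

```python
import pandas as pd

UNITS_TO_REMOVE = {'feet', 'yard'}

def strip_unit_suffix(df):
    """Hypothetical helper: return a copy of df with matching unit suffixes removed from 'name'."""
    out = df.copy()
    # Split each name once, from the right, at the last underscore.
    parts = out['name'].str.rsplit('_', n=1, expand=True)
    # If no name contains an underscore, only one column comes back: nothing to strip.
    if parts.shape[1] < 2:
        return out
    value, suffix = parts[0], parts[1]
    # Keep only rows whose suffix is a removable unit and equals the row's own unit.
    mask = suffix.isin(UNITS_TO_REMOVE) & (suffix == out['unit'])
    out.loc[mask, 'name'] = value[mask]
    return out

# Sample data copied from the question.
df = pd.DataFrame({
    'unit': ['feet', 'celcius', 'yard', 'yard'],
    'name': ['abcd_feet', 'abcd_celcius', 'bcde_yard', 'bcde'],
})
result = strip_unit_suffix(df)
```

Working on a copy avoids the temporary `value_` and `unit_` columns, at the cost of duplicating the frame.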
UNITS_TO_REMOVE = {'feet', 'yard'}

def remove_unit(unit, value):
    # Leave the name untouched unless the unit is removable and a suffix exists.
    if unit not in UNITS_TO_REMOVE or '_' not in value:
        return value
    # Split once from the right so names with several underscores do not raise.
    row_value, row_unit = value.rsplit('_', 1)
    if row_unit == unit:
        return row_value
    return value

df['name'] = df.apply(lambda row: remove_unit(row['unit'], row['name']), axis=1)
Output:
unit name
0 feet abcd
1 celcius abcd_celcius
2 yard bcde
3 yard bcde
Performance: 152 ms ± 3.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
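If a row-wise function is needed, one possible variant (a different technique from the `apply` call above, and not benchmarked here) is to iterate over the two columns in parallel with `zip` and a list comprehension, which avoids building a row object per record. The sketch below is an assumption-laden illustration: `remove_unit` is rewritten around `str.endswith`, and the sample frame is copied from the question.

```python
import pandas as pd

UNITS_TO_REMOVE = {'feet', 'yard'}

def remove_unit(unit, value):
    """Strip a trailing '_<unit>' from value when unit is one of UNITS_TO_REMOVE."""
    suffix = '_' + unit
    if unit in UNITS_TO_REMOVE and value.endswith(suffix):
        return value[:-len(suffix)]
    return value

# Sample data copied from the question.
df = pd.DataFrame({
    'unit': ['feet', 'celcius', 'yard', 'yard'],
    'name': ['abcd_feet', 'abcd_celcius', 'bcde_yard', 'bcde'],
})

# Walk both columns in lockstep instead of calling apply(axis=1).
df['name'] = [remove_unit(u, n) for u, n in zip(df['unit'], df['name'])]
```

Whether this is actually faster on a given frame should be checked with `%timeit` on real data.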