Check each value in one column with each value of other column in one dataframe
I have the following DataFrame:
import pandas as pd
dict = {'val1':["3.2", "2.4", "-2.3", "-4.9","0"],
'class': ["1", "0", "0", "0", "1"],
'val2':["3.2", "2.7", "1.7", "-7.1", "0"]}
df = pd.DataFrame(dict)
df
val1 class val2
0 3.2 1 3.2
1 2.4 0 2.7
2 -2.3 0 1.7
3 -4.9 0 -7.1
4 0.0 1 0.0
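I want to check two things. 1) Sign: if the sign of a value in column val1 differs from the sign of the value in column val2 (for example, the values at index 2 have different signs), change the sign of val2 to match the sign of val1. The desired output is: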
val1 class val2
0 3.2 1 3.2
1 2.4 0 2.7
2 -2.3 0 -1.7
3 -4.9 0 -7.1
4 0.0 1 0.0
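2) Range: check whether the value in column val2 lies in the interval between val1 - 2 and val1 + 2. For example, at index 1, val2 = 2.7 lies within [2.4 - 2, 2.4 + 2]. If the condition is true, I want to change class from 0 to 1. The desired output is: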
val1 class val2
0 3.2 1 3.2
1 2.4 1 2.7
2 -2.3 1 -1.7
3 -4.9 0 -7.1
4 0.0 1 0.0
If necessary, first convert the values to floats, then use numpy.sign to set the sign, and then use Series.between for the second check:
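import numpy as np

# Convert the string columns to floats so arithmetic works
df['val1'] = df['val1'].astype(float)
df['val2'] = df['val2'].astype(float)
# Multiply by the product of the signs: flips val2 when the signs differ
# (note: this also sets val2 to 0 wherever val1 is 0)
df['val2'] *= np.sign(df['val1']) * np.sign(df['val2'])
# 1 if val2 lies in [val1 - 2, val1 + 2] (inclusive), else 0
df['class'] = df['val2'].between(df['val1'] - 2, df['val1'] + 2).astype(int)
# Alternative:
# df['class'] = np.where(df['val2'].between(df['val1'] - 2, df['val1'] + 2), 1, 0)
print(df)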
val1 class val2
0 3.2 1 3.2
1 2.4 1 2.7
2 -2.3 1 -1.7
3 -4.9 0 -7.1
4 0.0 1 0.0
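The snippet above reproduces the desired output for the sample data, but it has two side effects: multiplying by the sign product sets val2 to 0 whenever val1 is 0, and the between(...).astype(int) assignment overwrites class for every row rather than only changing 0 to 1. Below is a minimal sketch of a variant that avoids both, assuming that rows where either value is zero should be left alone and that an existing class of 1 should never be reset. The helper name align_and_flag and its tol parameter are made up for illustration.

```python
import numpy as np
import pandas as pd

def align_and_flag(frame, tol=2.0):
    """Return a copy with val2's sign aligned to val1 and class promoted to 1."""
    out = frame.copy()
    out["val1"] = out["val1"].astype(float)
    out["val2"] = out["val2"].astype(float)
    out["class"] = out["class"].astype(int)

    # Flip val2 only where the two signs are strictly opposite;
    # rows where either value is zero are left untouched (assumption).
    opposite = np.sign(out["val1"]) * np.sign(out["val2"]) < 0
    out["val2"] = out["val2"].where(~opposite, -out["val2"])

    # Promote class to 1 when val2 is within [val1 - tol, val1 + tol] (inclusive);
    # rows already at 1 are never reset to 0 (assumption).
    in_range = out["val2"].between(out["val1"] - tol, out["val1"] + tol)
    out.loc[in_range, "class"] = 1
    return out

data = {"val1": ["3.2", "2.4", "-2.3", "-4.9", "0"],
        "class": ["1", "0", "0", "0", "1"],
        "val2": ["3.2", "2.7", "1.7", "-7.1", "0"]}
result = align_and_flag(pd.DataFrame(data))
print(result)
```

On the sample data this should give the same table as above; the difference only shows up for rows such as val1 = 0 with a non-zero val2, or rows whose class is already 1 but whose val2 falls outside the range.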
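Try this:
import numpy as np
# Assumes val1 and val2 have already been converted to float
# Check 1: flip the sign of val2 when it differs from the sign of val1
df['val2'] = df.apply(lambda x: np.sign(x['val1']) * np.sign(x['val2']) * x['val2'], axis=1)
# Check 2: class is 1 when val1 and val2 differ by less than 2 (strict), else 0
df['class'] = df.apply(lambda x: int(abs(x['val1'] - x['val2']) < 2), axis=1)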
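Note that the abs(...) < 2 comparison is strict, while Series.between includes both endpoints by default, so the two approaches disagree when val2 is exactly 2 away from val1. A small sketch with made-up boundary values (not from the question) shows the difference:

```python
import pandas as pd

# Hypothetical rows: the first has val2 exactly 2 away from val1.
edge = pd.DataFrame({"val1": [1.0, 1.0], "val2": [3.0, 2.5]})

# Inclusive interval check, as done with Series.between
inclusive = edge["val2"].between(edge["val1"] - 2, edge["val1"] + 2)
# Strict absolute-difference check, as done with abs(...) < 2
strict = (edge["val1"] - edge["val2"]).abs() < 2

print(inclusive.tolist())  # [True, True]
print(strict.tolist())     # [False, True]
```

Which behaviour is wanted depends on whether the endpoints val1 - 2 and val1 + 2 count as inside the interval; the question does not say.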
I think this will solve your problem without using any additional libraries:
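def signfunc(x, y):
    # Keep y when x and y share a sign (or either is zero); otherwise negate y
    if x * y >= 0:
        return y
    else:
        return -1 * y

df['val1'] = df['val1'].astype(float)
df['val2'] = df['val2'].astype(float)
# class starts out as strings; convert so the column does not mix "0" and 1
df['class'] = df['class'].astype(int)
df['val2'] = df.apply(lambda x: signfunc(x.val1, x.val2), axis=1)
print(df)
# Set class to 1 where val1 and val2 differ by at most 2
df.loc[abs(df["val1"] - df["val2"]) <= 2, 'class'] = 1
print(df)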