Python Count The Different Values From Two Columns
If I have a DataFrame:
A B C
0.0285714285714285 4 0.11428571
0.107142857142857 4 0.42857143
0.007142857142857 6 0.04285714
1.2 4 5.5
1.5 3 3
The desired output is:
A*B C Difference
0.114285714285714 0.11428571 0.000000004285714
0.428571428571428 0.42857143 -0.000000001428572
0.042857142857142 0.04285714 0.000000002857142
4.8 5.5 -0.7
4.5 3 1.5
Count: 2
I want to ignore the first 3 rows because the difference is very small. Only the first digit after the decimal point should be taken into account.
Could you please help me with this?
Use np.where to check whether the difference is significant enough:
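# difference between A*B and C, computed once
diff = df["A"] * df["B"] - df["C"]
# keep the difference only where it is >= 0.1 or <= -0.1, otherwise set it to 0
df["difference"] = np.where((diff >= 0.1) | (diff <= -0.1), diff, 0)
print(df)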
A B C difference
0 0.028571 4 0.114286 0.0
1 0.107143 4 0.428571 0.0
2 0.007143 6 0.042857 0.0
3 1.200000 4 5.500000 -0.7
4 1.500000 3 3.000000 1.5
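For reference, here is a self-contained sketch of the same np.where idea that also produces the requested count. It assumes the sample data from the question and uses the absolute value of the difference, which should be equivalent to the two-sided condition above; the names diff and count are only illustrative.

```python
import numpy as np
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "A": [0.0285714285714285, 0.107142857142857, 0.007142857142857, 1.2, 1.5],
    "B": [4, 4, 6, 4, 3],
    "C": [0.11428571, 0.42857143, 0.04285714, 5.5, 3.0],
})

# difference between A*B and C
diff = df["A"] * df["B"] - df["C"]

# keep the difference where |diff| >= 0.1, otherwise store 0
df["difference"] = np.where(diff.abs() >= 0.1, diff, 0)

# count the rows whose difference was kept
count = int((df["difference"] != 0).sum())

print(df)
print(f"Count: {count}")
```

With the question's data, only the last two rows keep a non-zero difference.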
EDIT:
Because the values in column A are objects (apparently strings), convert them to float first:
df['A'] = df['A'].astype(float)
If that does not work because of bad values (e.g. some non-numeric strings), use pd.to_numeric with errors='coerce'; bad values are replaced by NaN:
df['A'] = pd.to_numeric(df['A'], errors='coerce')
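A small sketch of what the coercion does, assuming a made-up Series (the "n/a" entry and the names raw, converted and n_bad are hypothetical, not from the question):

```python
import pandas as pd

# hypothetical column of strings with one value that cannot be parsed
raw = pd.Series(["0.0285714285714285", "1.2", "n/a", "1.5"])

# unparseable entries become NaN instead of raising an error
converted = pd.to_numeric(raw, errors="coerce")

# number of values that could not be converted
n_bad = int(converted.isna().sum())

print(converted)
print(f"Unparseable values: {n_bad}")
```

Any NaN produced this way propagates into A*B and into the difference, so it may be worth checking or dropping those rows before counting.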
Use Series.mask to set the new column by a condition built with Series.between:
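# multiply the columns
df['A*B'] = df["A"] * df["B"]
# subtract to get a Series of differences
diff = df['A*B'] - df['C']
# True where the difference lies within [-0.1, 0.1], i.e. is too small to count
mask = diff.between(-0.1, 0.1)
# replace the small differences with 0
df["difference"] = diff.mask(mask, 0)
print(df)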
A B C A*B difference
0 0.028571 4 0.114286 0.114286 0.0
1 0.107143 4 0.428571 0.428571 0.0
2 0.007143 6 0.042857 0.042857 0.0
3 1.200000 4 5.500000 4.800000 -0.7
4 1.500000 3 3.000000 4.500000 1.5
print (f'Count: {(~mask).sum()}')
Count: 2
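One detail that may matter: Series.between includes both endpoints by default, so a difference of exactly 0.1 or -0.1 is treated as small here, while the np.where condition (>= 0.1 or <= -0.1) treats it as significant. A sketch of the difference, assuming made-up values and a pandas version where the inclusive parameter accepts strings (1.3 or later):

```python
import pandas as pd

# hypothetical differences, including values exactly on the boundary
diff = pd.Series([0.05, 0.1, -0.1, 0.25])

# default: inclusive="both", so the boundary values count as small
inclusive_mask = diff.between(-0.1, 0.1)

# inclusive="neither": the boundary values count as significant
strict_mask = diff.between(-0.1, 0.1, inclusive="neither")

count_default = int((~inclusive_mask).sum())
count_strict = int((~strict_mask).sum())

print(f"Count with default bounds: {count_default}")
print(f"Count with strict bounds: {count_strict}")
```

For the question's data both variants give the same count, since no difference lies exactly on the boundary.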
If the column order is important, use DataFrame.insert together with DataFrame.pop to extract the columns:
df.insert(0, 'A*B', df.pop("A")*df.pop("B"))
diff = df['A*B'] - df['C']
mask = diff.between(-0.1, 0.1)
df["difference"] = diff.mask(mask, 0)
print (df)
A*B C difference
0 0.114286 0.114286 0.0
1 0.428571 0.428571 0.0
2 0.042857 0.042857 0.0
3 4.800000 5.500000 -0.7
4 4.500000 3.000000 1.5
print (f'Count: {(~mask).sum()}')
Count: 2
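Note that DataFrame.pop removes the columns in place, so A and B are gone from df afterwards. If the original columns are still needed, one option is to work on a copy. A minimal sketch, assuming the question's data (the name out is only illustrative):

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "A": [0.0285714285714285, 0.107142857142857, 0.007142857142857, 1.2, 1.5],
    "B": [4, 4, 6, 4, 3],
    "C": [0.11428571, 0.42857143, 0.04285714, 5.5, 3.0],
})

# work on a copy so the original A and B columns survive
out = df.copy()

# pop A and B from the copy and insert their product as the first column
out.insert(0, "A*B", out.pop("A") * out.pop("B"))

# zero out the small differences and count the rest
diff = out["A*B"] - out["C"]
mask = diff.between(-0.1, 0.1)
out["difference"] = diff.mask(mask, 0)
count = int((~mask).sum())

print(out)
print(f"Count: {count}")
```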