Subtracting numbers from 2 dataframe columns in Python
I am a beginner at Python and have searched the forum for an answer to my question without success.
I have a dataframe and would like to subtract the numbers in one column from the numbers in another column, then create a new column with the result.
I tried:
df['new column'] = df['column 1'] - df['column 2']
My output is:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
So I then tried to convert these columns to integers before subtracting, using the following line:
df['column 2']=df['column 2'].astype(int)
My output is:
ValueError: cannot convert float NaN to integer
(I have some NaN values in my dataframe.) I then tried to replace all of the NaN values with an empty string using the following code:
def remove_nan(s):
    import math
    """ remove np.nan"""
    if math.isnan(s) == True:
        s.replace( np.nan,"")
    else:
        return s
df['column 1'] = df.apply(remove_nan, axis=0)
My output is:
TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index ID Number')
I would greatly appreciate it if someone could point out where I am making errors.
Thank you for the help.
Use pd.to_numeric with the parameter errors='coerce' to convert the columns to numbers; any value that isn't a number becomes NaN.
Consider the df:
import pandas as pd
df = pd.DataFrame(dict(A=list('456 8'), B=list('1 345')))
print(df)
   A  B
0  4  1
1  5
2  6  3
3     4
4  8  5
After pd.to_numeric:
df = df.apply(pd.to_numeric, errors='coerce')
print(df)
     A    B
0  4.0  1.0
1  5.0  NaN
2  6.0  3.0
3  NaN  4.0
4  8.0  5.0
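If only some columns should be converted (for example, to leave an identifier column untouched), the same call can be applied to a subset of columns. The following is a minimal sketch, assuming the column names from the question; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'ID Number': ['a1', 'a2', 'a3'],
    'column 1': ['10', '7', ' '],
    'column 2': ['4', 'x', '2'],
})

cols = ['column 1', 'column 2']
# Convert only the selected columns; entries that are not numbers become NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# Subtraction now works because both columns are numeric.
df['new column'] = df['column 1'] - df['column 2']
print(df)
```

The 'ID Number' column keeps its original string values, which would have been turned into NaN had the conversion been applied to the whole dataframe.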
Now we can do our column math:
df['C'] = df.A - df.B
print(df)
     A    B    C
0  4.0  1.0  3.0
1  5.0  NaN  NaN
2  6.0  3.0  3.0
3  NaN  4.0  NaN
4  8.0  5.0  3.0
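Rows where either operand is missing produce NaN in C. To check which original entries could not be parsed as numbers, one option is to keep the raw frame and compare it against the converted one. This is a sketch only; the names raw, num and mask are introduced here for illustration.

```python
import pandas as pd

# Same sample data as above, kept unconverted for inspection.
raw = pd.DataFrame(dict(A=list('456 8'), B=list('1 345')))
num = raw.apply(pd.to_numeric, errors='coerce')

# True for rows where at least one column failed to convert.
mask = num.isna().any(axis=1)

# Show the original (string) values of the affected rows.
print(raw[mask])
```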
If you want to treat missing values as zero:
df['C'] = df.A.sub(df.B, fill_value=0)
print(df)
     A    B    C
0  4.0  1.0  3.0
1  5.0  NaN  5.0
2  6.0  3.0  3.0
3  NaN  4.0 -4.0
4  8.0  5.0  3.0
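The astype(int) attempt in the question fails because a plain NumPy integer column cannot hold NaN. If integer results are wanted while keeping the missing values, pandas' nullable integer dtype 'Int64' (with a capital I) is one option. A minimal sketch, assuming every non-missing value is a whole number:

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('456 8'), B=list('1 345')))
df = df.apply(pd.to_numeric, errors='coerce')

# 'Int64' is the nullable integer dtype; NaN is stored as <NA>.
# Assumption: all non-missing values are whole numbers, otherwise the cast fails.
df = df.astype('Int64')

# Missing values propagate through the subtraction as <NA>.
df['C'] = df.A - df.B
print(df)
```

This keeps the results as integers instead of floats; df.A.sub(df.B, fill_value=0) can still be used on these columns if missing values should count as zero.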