Subtracting numbers from 2 dataframe columns in Python

Question

I am a beginner at Python and have searched the forum for the answer to my question without success.

I have a dataframe and would like to subtract the numbers in one column from the numbers in another column and create a new column with the result.

I tried:

df['new column'] = df['column 1'] - df['column 2']

My output is: TypeError: unsupported operand type(s) for -: 'str' and 'str'

So then I tried to convert these columns to integers before performing the subtraction, with the following line:

df['column 2']=df['column 2'].astype(int)

My output is: ValueError: cannot convert float NaN to integer

(I have some NaN in my dataframe.) I then tried to replace all of the NaN with an empty string using the following code:

def remove_nan(s):
    import math
    """ remove np.nan"""
    if math.isnan(s) == True:
        s.replace( np.nan,"")
    else:
        return s

df['column 1'] = df.apply(remove_nan, axis=0)

My output is: TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index ID Number')

I would greatly appreciate it if someone could provide insight as to where I am making errors.

Thank you for the help.

Answer 1

Use pd.to_numeric to convert the columns to numbers, with the parameter errors='coerce' so that any value that isn't a number becomes NaN.

Consider the dataframe df:

import pandas as pd
df = pd.DataFrame(dict(A=list('456 8'), B=list('1 345')))

print(df)

   A  B
0  4  1
1  5   
2  6  3
3     4
4  8  5

After pd.to_numeric:

df = df.apply(pd.to_numeric, errors='coerce')

print(df)

     A    B
0  4.0  1.0
1  5.0  NaN
2  6.0  3.0
3  NaN  4.0
4  8.0  5.0

Now we can do our column math:

df['C'] = df.A - df.B

print(df)

     A    B    C
0  4.0  1.0  3.0
1  5.0  NaN  NaN
2  6.0  3.0  3.0
3  NaN  4.0  NaN
4  8.0  5.0  3.0
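The same steps can be sketched against the column names from the question. The data below is made up for illustration (the asker's actual frame isn't shown), and only the two columns involved are converted, so any other columns keep their original types:

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: numbers stored as strings,
# with a missing value and a non-numeric entry.
df = pd.DataFrame({
    "column 1": ["10", "7", None, "3"],
    "column 2": ["4", "abc", "2", "1"],
})

# Convert only the columns used in the subtraction;
# anything that cannot be parsed as a number becomes NaN.
cols = ["column 1", "column 2"]
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# Rows with a NaN on either side produce NaN in the result.
df["new column"] = df["column 1"] - df["column 2"]
print(df)
```

Converting a subset like this is probably safer than df.apply over the whole frame when other columns (an ID column, for instance) hold text that should not be coerced to NaN.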

If you want to assume missing values are zero:

df['C'] = df.A.sub(df.B, fill_value=0)

print(df)

     A    B    C
0  4.0  1.0  3.0
1  5.0  NaN  5.0
2  6.0  3.0  3.0
3  NaN  4.0 -4.0
4  8.0  5.0  3.0
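The question originally asked for integers, and the results above are floats because NaN is a float value. As a rough sketch (not part of the original answer), there are two ways to get an integer column from the example frame: pandas' nullable 'Int64' dtype, which keeps missing entries as <NA>, or a plain int cast once missing values have been filled. The names c_nullable and c_filled are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('456 8'), B=list('1 345')))
df = df.apply(pd.to_numeric, errors='coerce')

# Option 1: nullable integer dtype; missing results stay missing (<NA>).
c_nullable = (df.A - df.B).astype('Int64')

# Option 2: treat a missing value as zero, then cast to a plain integer.
# Note: fill_value only helps when one side is missing; a row where both
# A and B are NaN would still give NaN and make astype(int) fail.
c_filled = df.A.sub(df.B, fill_value=0).astype(int)

print(c_nullable)
print(c_filled)
```

Which option fits depends on whether a missing value really means zero in your data; if it doesn't, the nullable dtype avoids inventing numbers.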

Answer 1
0 2017-03-16 18:40:28
