
Subtracting numbers from 2 dataframe columns in Python

I am a beginner at Python and have searched the forum for the answer to my question without success.

I have a matrix and would like to subtract the numbers in one column from the numbers in another column and create a new column with the result.

I tried:

df['new column'] = df['column 1'] - df['column 2']

My output is: TypeError: unsupported operand type(s) for -: 'str' and 'str'

So I then tried to convert these columns to integers before performing the subtraction, with the following line:

df['column 2']=df['column 2'].astype(int)

My output is: ValueError: cannot convert float NaN to integer

(I have some NaN values in my dataframe.) I then tried to replace all of the NaN values with an empty string using the following code:

def remove_nan(s):
    import math
    """ remove np.nan"""
    if math.isnan(s) == True:
        s.replace( np.nan,"")
    else:
        return s

df['column 1'] = df.apply(remove_nan, axis=0)

My output is: TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index ID Number')

I would greatly appreciate it if someone could provide insight as to where I am making errors.

Thank you for the help.

Use pd.to_numeric with the parameter errors='coerce' to convert the values to numbers; anything that cannot be parsed as a number becomes NaN.

Consider the following df:

import pandas as pd

df = pd.DataFrame(dict(A=list('456 8'), B=list('1 345')))

print(df)

   A  B
0  4  1
1  5   
2  6  3
3     4
4  8  5

After pd.to_numeric:

df = df.apply(pd.to_numeric, errors='coerce')

print(df)

     A    B
0  4.0  1.0
1  5.0  NaN
2  6.0  3.0
3  NaN  4.0
4  8.0  5.0
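Applying pd.to_numeric to the whole frame also turns any text column into NaN. The error message in the question mentions an 'ID Number' index, which suggests the real dataframe has such a column, so converting only the columns used in the subtraction may be safer. A minimal sketch, assuming the column names from the question and made-up values:

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are invented.
df = pd.DataFrame({
    'ID Number': ['a1', 'a2', 'a3'],
    'column 1': ['10', '7', ''],
    'column 2': ['4', 'x', '2'],
})

# Convert only the columns used in the arithmetic; 'ID Number' stays text.
cols = ['column 1', 'column 2']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# Rows where either value could not be parsed give NaN in the result.
df['new column'] = df['column 1'] - df['column 2']
print(df)
```

Here the empty string and 'x' are coerced to NaN, so only the first row has a numeric difference.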

Now we can do our column math:

df['C'] = df.A - df.B

print(df)

     A    B    C
0  4.0  1.0  3.0
1  5.0  NaN  NaN
2  6.0  3.0  3.0
3  NaN  4.0  NaN
4  8.0  5.0  3.0

If you want to assume missing values are zero:

df['C'] = df.A.sub(df.B, fill_value=0)

print(df)

     A    B    C
0  4.0  1.0  3.0
1  5.0  NaN  5.0
2  6.0  3.0  3.0
3  NaN  4.0 -4.0
4  8.0  5.0  3.0
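One detail worth knowing about fill_value: it only substitutes for a NaN when the other operand in that row is present. If both values are missing, the result is still NaN. The sketch below, using made-up values with a row that is missing in both columns, compares it with calling fillna(0) on each column first:

```python
import numpy as np
import pandas as pd

# Invented values; the last row is missing in both columns.
df = pd.DataFrame({'A': [4.0, 5.0, np.nan, np.nan],
                   'B': [1.0, np.nan, 4.0, np.nan]})

# fill_value replaces a NaN only when the other operand is present.
df['sub_fill'] = df.A.sub(df.B, fill_value=0)

# fillna(0) on each column treats every missing value as zero.
df['fillna_first'] = df.A.fillna(0) - df.B.fillna(0)

print(df)
```

In the last row, sub_fill stays NaN while fillna_first is 0.0, so the choice depends on whether a row with no data at all should count as zero.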
