I want to avoid crashes when performing vectorized calculations using pandas dataframes (python-3.6).
For example, I have a dataframe with two columns, A and B, and I want to create a column C such that C = A - B. However, one cell in column A is a string, and this causes a TypeError. Have a look at the picture below.
Column C is the outcome that I want to achieve.
Currently I get a TypeError message:
TypeError: unsupported operand type(s) for -: 'float' and 'str'
which is expected.
This is possible with numpy.select, but you get mixed values in the output:
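import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [7, 8, 9, 10, 5],
    'B': [1, 2, 3, 'str', np.nan],
})
# Non-numeric values in B become NaN
b = pd.to_numeric(df['B'], errors='coerce')
# NaN where B was already missing, 'ERROR' where B could not be parsed
df['C'] = np.select([df['B'].isna(), b.isna()], [np.nan, 'ERROR'], default=df['A'] - b)
print(df)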
    A    B      C
0   7    1    6.0
1   8    2    6.0
2   9    3    6.0
3  10  str  ERROR
4   5  NaN    nan
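If the 'ERROR' marker is wanted but the numeric results should stay real numbers rather than text, one possible variation is to build the column as object dtype and overwrite only the bad rows. This is a minimal sketch and not part of the original answer; the helper name `bad` is an assumption introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [7, 8, 9, 10, 5],
    'B': [1, 2, 3, 'str', np.nan],
})

# Non-numeric values in B become NaN
b = pd.to_numeric(df['B'], errors='coerce')

# `bad` (hypothetical name): True only where B held a value
# that was present but could not be parsed as a number
bad = b.isna() & df['B'].notna()

# Cast to object first so floats and the marker string can share one column,
# then overwrite only the unparseable rows
df['C'] = (df['A'] - b).astype(object).mask(bad, 'ERROR')
print(df)
```

The column is still object dtype, so vectorized arithmetic on C will hit the same TypeError as before; this only helps when C is meant for display or inspection.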
If you need to process the column later, it is best to convert to numeric with to_numeric and only subtract:
# Non-numeric values in B become NaN, so C stays a numeric column
b = pd.to_numeric(df['B'], errors='coerce')
df['C'] = df['A'] - b
print(df)
    A    B    C
0   7    1  6.0
1   8    2  6.0
2   9    3  6.0
3  10  str  NaN
4   5  NaN  NaN
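With this approach, rows where B was missing and rows where B was not a number both end up as NaN in C. If that distinction matters, a separate boolean column can record it while C stays numeric. The sketch below is an illustration under that assumption; the column name `B_invalid` is hypothetical and not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [7, 8, 9, 10, 5],
    'B': [1, 2, 3, 'str', np.nan],
})

b = pd.to_numeric(df['B'], errors='coerce')
df['C'] = df['A'] - b  # float64, NaN wherever B was missing or unparseable

# `B_invalid` (hypothetical column name): True where B had a value
# that could not be converted, False where it was numeric or already missing
df['B_invalid'] = b.isna() & df['B'].notna()

# C remains usable in later vectorized calculations; sum() skips NaN by default
total = df['C'].sum()
print(df)
print(total)
```

Keeping the error flag in its own column avoids mixing strings into C, so later arithmetic on C does not raise.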