I have a fairly large pandas dataframe (11k rows and 20 columns). One column has a mixed data type, mostly numeric (float) with a handful of strings scattered throughout.
I subset this dataframe by querying other columns before performing some statistical analysis on the mixed column (which fails if a string is present). 99% of the time the subsetted column is purely numeric, but occasionally a string value ends up in the subset, and I need to trap that.
What's the most efficient/pythonic way of looping through a Pandas mixed type column to check for strings (or conversely check whether the whole column is full of numeric values or not)?
If there is even a single string present in the column I want to raise an error, otherwise proceed.
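For concreteness, the desired behavior might look like the following sketch (`ensure_no_strings` is a hypothetical helper name, not from any answer below):

```python
import pandas as pd

def ensure_no_strings(s):
    """Raise TypeError if the Series contains any str values; otherwise return it."""
    str_mask = s.map(lambda x: isinstance(x, str))
    if str_mask.any():
        raise TypeError(
            f"Column contains strings at positions: {list(s.index[str_mask])}"
        )
    return s

df = pd.DataFrame({'A': [1.0, 2.5, 'oops', 3.3]})
# ensure_no_strings(df['A'])  # raises TypeError because of 'oops'
```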
This is one way. I'm not sure it can be vectorised.
import pandas as pd
df = pd.DataFrame({'A': [1, None, 'hello', True, 'world', 'mystr', 34.11]})
df['stringy'] = [isinstance(x, str) for x in df.A]
#        A  stringy
# 0      1    False
# 1   None    False
# 2  hello     True
# 3   True    False
# 4  world     True
# 5  mystr     True
# 6  34.11    False
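A variant that avoids the explicit list comprehension (though `map` still iterates in Python under the hood) is to map each value to its type and compare against `str`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 'hello', True, 'world', 'mystr', 34.11]})

# Boolean Series: True where the value is a str
stringy = df['A'].map(type).eq(str)

print(stringy.any())                   # True — at least one string present
print(df.loc[stringy, 'A'].tolist())   # ['hello', 'world', 'mystr']
```

From here, `if stringy.any(): raise TypeError(...)` implements the "raise on any string" requirement directly.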
Here's a different way. It converts the values of column A to numeric, but does not fail on errors: strings are replaced by NaN. The notnull() then drops the rows holding those NaN.
df = df[pd.to_numeric(df.A, errors='coerce').notnull()]
However, if the column already contained NaNs, those rows would be removed as well.
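If you need to keep genuine NaNs while still trapping strings, one option (a sketch, not part of the original answer) is to flag only the values that became NaN during coercion but were not missing to begin with:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 'hello', 34.11]})

coerced = pd.to_numeric(df['A'], errors='coerce')
# True only where coercion *created* a NaN, i.e. the original value
# was present but non-numeric (such as a string)
bad = coerced.isna() & df['A'].notna()

print(bad.tolist())                # [False, False, True, False]
print(df.loc[bad, 'A'].tolist())   # ['hello']
```

Raising `TypeError` when `bad.any()` is true then traps strings without discarding pre-existing missing values.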
See also: Select row from a DataFrame based on the type of the object (i.e. str)