
Check if Pandas Series is of type string

I import some data from a parquet file into a DataFrame and want to check the data types. One of the data types I expect is strings. To do this, I have something like the following:

import pandas as pd
col = pd.Series([None, 'b', 'c', None, 'e'])
assert((col.dtype == object) and (isinstance(col[0], str)))

But, as you can see, this does not work if I accidentally have a None value at the beginning.

Does anybody have an idea how to do that efficiently (preferably without having to check each element of the series)?

As of pandas 1.0.0 there is a StringDtype, which you can use to check whether a pd.Series contains only strings or missing values (None/NaN):

try:
    col.astype('string')
except ValueError as e:
    raise e

If you try this with a column containing an int:

col = pd.Series([None, 2, 'c', None, 'e'])

try:
    col.astype('string')
except ValueError as e:
    raise e

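On pandas 1.0.x you get a ValueError (from pandas 1.1 onward, astype('string') converts non-string values to strings instead of raising, so this check no longer rejects them):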

ValueError: StringArray requires a sequence of strings or pandas.NA
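A check that does not depend on astype raising an error is pandas.api.types.infer_dtype, which reports 'string' when every non-missing value is a str. The sketch below is only an illustration: the helper name is_string_series is made up for this example and is not part of pandas.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def is_string_series(col: pd.Series) -> bool:
    """Return True if every non-missing value in col is a str.

    Hypothetical helper, not a pandas function. skipna=True makes
    infer_dtype ignore None/NaN when classifying the values.
    """
    return infer_dtype(col, skipna=True) == "string"


strings_with_gaps = pd.Series([None, "b", "c", None, "e"])
mixed = pd.Series([None, 2, "c", None, "e"])

print(is_string_series(strings_with_gaps))
print(is_string_series(mixed))
```

infer_dtype still inspects the values, but it does so in compiled code rather than in a Python-level loop. A series with no non-missing values at all is reported as 'empty', so the helper returns False for it.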

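You can use first_valid_index to retrieve and check the first non-NA item. It returns an index label (or None if every value is missing), so look the value up with .loc rather than .iloc:

isinstance(col.loc[col.first_valid_index()], str)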
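Here is a slightly more defensive sketch of the same idea. The function name first_valid_is_str is hypothetical, and the code assumes the index labels are unique (with duplicate labels, .loc can return a Series instead of a single value). Note that it only inspects one element, so it cannot detect a non-string value later in the series.

```python
import pandas as pd


def first_valid_is_str(col: pd.Series) -> bool:
    """Check whether the first non-missing value of col is a str.

    Hypothetical helper; assumes unique index labels.
    """
    label = col.first_valid_index()  # an index label, or None if all values are missing
    if label is None:
        return False
    return isinstance(col.loc[label], str)


# Non-default index: the label 20 is not a valid position, so .iloc would fail here.
col = pd.Series([None, "b", "c"], index=[10, 20, 30])
print(first_valid_is_str(col))

# Limitation: only the first valid item is checked, so the trailing int goes unnoticed.
mixed = pd.Series([None, "b", 3])
print(first_valid_is_str(mixed))
```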

You can also convert every value in the series to str as follows:

col = col.astype(str)

Note that None values then become the string 'None', so missing values can no longer be told apart from real strings.
