Check a pandas DataFrame column for string type

Question

I have a fairly large pandas dataframe (11k rows and 20 columns). One column has a mixed data type, mostly numeric (float) with a handful of strings scattered throughout.

I subset this dataframe by querying other columns before performing some statistical analysis using the data in the mixed column (but can't do this if there's a string present). 99% of the time, once subsetted, this column is purely numeric, but occasionally a string value will end up in the subset, which I need to trap.

What's the most efficient/pythonic way of looping through a pandas mixed-type column to check for strings (or conversely, to check whether the whole column is full of numeric values or not)?

If there is even a single string present in the column I want to raise an error, otherwise proceed.

Answer 1

This is one way. I'm not sure it can be vectorised.

import pandas as pd

df = pd.DataFrame({'A': [1, None, 'hello', True, 'world', 'mystr', 34.11]})

df['stringy'] = [isinstance(x, str) for x in df.A]

#        A stringy
# 0      1   False
# 1   None   False
# 2  hello    True
# 3   True   False
# 4  world    True
# 5  mystr    True
# 6  34.11   False
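To trap the case described in the question, the per-element check above can be collapsed into a single yes/no test and turned into an exception. A minimal sketch, assuming a hypothetical helper name assert_no_strings (it is not part of pandas) and that a TypeError is an acceptable error to raise:

```python
import pandas as pd


def assert_no_strings(series):
    """Raise TypeError if any element of `series` is a Python str.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Element-wise isinstance check, same idea as the list comprehension above
    is_str = series.map(lambda x: isinstance(x, str))
    if is_str.any():
        bad = series[is_str]
        raise TypeError(
            f"{len(bad)} string value(s) at index {bad.index.tolist()}"
        )


df = pd.DataFrame({'A': [1, None, 'hello', True, 'world', 'mystr', 34.11]})

# The full column contains strings, so the check raises
message = None
try:
    assert_no_strings(df['A'])
except TypeError as exc:
    message = str(exc)

# A subset without strings passes silently
clean = df.loc[[0, 1, 6], 'A']
assert_no_strings(clean)
```

This is still a Python-level loop under the hood, so it is not vectorised, but for a column of about 11k rows that is unlikely to matter.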

Answer 2

Here's a different way. It converts the values of column A to numeric, but does not fail on errors: strings that cannot be parsed as numbers are replaced by NA. The notnull() is there to remove these NAs.

df = df[pd.to_numeric(df.A, errors='coerce').notnull()]

However, if there were NAs in the column already, they too will be removed.
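If pre-existing NAs need to be told apart from values that failed to convert, one option is to compare the missing-value masks before and after coercion. A rough sketch, assuming a hypothetical helper name find_unconvertible and a small made-up sample column:

```python
import pandas as pd


def find_unconvertible(series):
    """Return a boolean mask of values that were present but not numeric.

    Hypothetical helper for illustration; not a pandas function.
    """
    converted = pd.to_numeric(series, errors='coerce')
    # Present in the original, missing after coercion: conversion failed
    return series.notna() & converted.isna()


# Made-up sample: one pre-existing NA, one word, one numeric-looking string
df = pd.DataFrame({'A': [1.5, None, 'hello', 2, '3.0']})

mask = find_unconvertible(df['A'])
offenders = df.loc[mask, 'A'].tolist()

if mask.any():
    print(f"Non-numeric values found: {offenders}")
```

Note that this differs from the isinstance check: a numeric-looking string such as '3.0' is converted by to_numeric and therefore not flagged, while the row that was already None is left alone.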

See also: Select row from a DataFrame based on the type of the object (i.e. str)

2 answers

Answer 1
3 2018-03-07 11:55:09

Answer 2
0 2020-09-15 09:15:05
