Conditionally iterating over columns in pandas

Question

I have a pandas DataFrame for which I would like to return the number of unique values in each column, except that some columns should be excluded.

This is how I usually get the unique values in a single column, but I am not sure how to iterate it over several columns:

pd.unique(df.column_name.ravel())

My first thought is something like this, but it is obviously not valid:

col_names = list(df.columns.values)
dont_include = ['foo', 'bar']
cols_to_include = [x for x in col_names if x not in dont_include]
for i in cols_to_include:
    col_unique_count = len(pd.unique(df.i.ravel()))
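A minimal sketch of that loop made to work, assuming a small hypothetical df. The core problem is df.i, which looks up a column literally named "i"; bracket indexing df[i] uses the value stored in the variable instead. The dict name unique_counts is illustrative, used so each pass does not overwrite the previous count.

```python
import pandas as pd

# Hypothetical example frame; 'foo' and 'bar' are the columns to skip
df = pd.DataFrame({
    "foo": [1, 1, 2],
    "bar": ["x", "y", "z"],
    "baz": [3, 3, 3],
    "pie": ["a", "b", "a"],
})

dont_include = ["foo", "bar"]
cols_to_include = [c for c in df.columns if c not in dont_include]

unique_counts = {}  # hypothetical name: column -> number of distinct values
for col in cols_to_include:
    # df[col] selects the column whose name is stored in col;
    # df.col would instead look for a column literally named "col"
    unique_counts[col] = len(pd.unique(df[col]))

print(unique_counts)  # {'baz': 1, 'pie': 2}
```

pd.unique accepts a Series directly, so the .ravel() call is not needed here.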

What is the best solution?

Answer 1

Your code can be simplified to this:

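# keep every column whose name does not contain the substring 'foo'
cols_to_include = df.columns[~df.columns.str.contains('foo')]
for col in cols_to_include:
    # count of distinct values in this column (overwritten on each pass)
    col_unique_count = df[col].nunique()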

You can call nunique to get the count of unique values in a given Series.
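One detail worth knowing when switching from len(pd.unique(...)) to nunique: the two treat missing values differently. A small sketch with a hypothetical Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 1.0, np.nan, 2.0])

# pd.unique keeps NaN as one of the distinct values
print(len(pd.unique(s)))        # 3
# nunique drops NaN by default (dropna=True)
print(s.nunique())              # 2
# dropna=False makes nunique match the pd.unique count
print(s.nunique(dropna=False))  # 3
```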

Or:

cols_to_include = df.columns[~df.columns.str.contains('foo')]
df[cols_to_include].apply(pd.Series.nunique)

Here apply calls nunique on each column and returns the counts as a Series indexed by column name.
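Pandas versions newer than this answer also provide DataFrame.nunique, which should give the same per-column counts without apply. A sketch on a small hypothetical frame, comparing the two:

```python
import pandas as pd

df = pd.DataFrame({
    "foo": [1, 2, 3],
    "baz": [1, 1, 2],
    "pie": ["a", "a", "a"],
})

cols_to_include = df.columns[~df.columns.str.contains("foo")]

# apply calls Series.nunique once per column
via_apply = df[cols_to_include].apply(pd.Series.nunique)
# DataFrame.nunique counts distinct values per column directly
direct = df[cols_to_include].nunique()

print(direct.to_dict())  # {'baz': 2, 'pie': 1}
```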

EDIT

Use isin to test for membership and ~ to negate the boolean mask:

In [47]:
df = pd.DataFrame(columns = ['foo','baz','bar','pie'])
df

Out[47]:
Empty DataFrame
Columns: [foo, baz, bar, pie]
Index: []

In [48]:
dont_include = ['foo', 'bar']
cols = df.columns[~df.columns.isin(dont_include)]
cols

Out[48]:
Index(['baz', 'pie'], dtype='object')

You can then use my code as before to iterate over this subset of your df's columns.
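Putting the EDIT together with the counting step: a sketch, assuming a hypothetical frame that also has a column named 'food'. It shows why isin is the safer filter for an explicit list of names: str.contains('foo') matches substrings and so also drops 'food', whereas isin drops only exact matches.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": [1, 2, 2],
    "food": ["x", "x", "y"],
    "bar": [0, 0, 0],
    "baz": [5, 6, 7],
})

dont_include = ["foo", "bar"]

# Substring match: also removes 'food' because it contains 'foo'
by_substring = df.columns[~df.columns.str.contains("foo")]
# Exact membership test: removes only the names listed in dont_include
by_name = df.columns[~df.columns.isin(dont_include)]

print(list(by_substring))  # ['bar', 'baz']
print(list(by_name))       # ['food', 'baz']

# Unique-value counts for the remaining columns
print(df[by_name].nunique().to_dict())  # {'food': 2, 'baz': 3}
```

An equivalent spelling is df.drop(columns=dont_include).nunique(); note that drop raises a KeyError if a listed name is not present, unless errors='ignore' is passed.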

Answer 1: score 2, accepted, 2015-11-19 15:13:01
