Common way to select columns in numpy array and pandas dataframe

I have to write an object that accepts either a pandas DataFrame or a numpy array as input (similar to sklearn's behavior). In one of its methods, I need to select columns. The columns are not fixed; I get a few column indices from other calculations.

To make my code compatible with both input types, I looked for a common way to select columns. I tried X[:,0] (doesn't work on pandas DataFrames), X[0], and others, but they select differently on each type. Is there a way to select columns in a similar fashion across pandas and numpy?

If not, how does sklearn work across these data structures?

You can use an if condition within your method, with separate selection logic for pandas DataFrames and numpy arrays. Sample code is given below.

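def method_1(self, var, col_indices):
    # Assumes `import pandas as pd` at module level.
    if isinstance(var, pd.DataFrame):
        # Map integer positions to column labels, then select by label.
        selected_columns = var[var.columns[col_indices]]
    else:
        # numpy arrays accept positional column indices directly.
        selected_columns = var[:, col_indices]
    return selected_columns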

Here, var is your input, which can be a numpy array or a pandas DataFrame, and col_indices are the indices of the columns you want to select.
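
For reference, here is a minimal standalone sketch of the same idea that can be run as-is. The function name select_columns and the sample data are made up for illustration. It also swaps the label lookup for .iloc, which selects purely by position and so should behave the same even when column labels are duplicated, and it assumes anything that is not a DataFrame can be converted with np.asarray.

```python
import numpy as np
import pandas as pd


def select_columns(X, col_indices):
    """Select columns by integer position from a DataFrame or array-like."""
    if isinstance(X, pd.DataFrame):
        # .iloc indexes purely by position, so column labels do not matter.
        return X.iloc[:, col_indices]
    # Assumption: anything else is array-like and can become an ndarray.
    return np.asarray(X)[:, col_indices]


# Hypothetical sample data: the same values as an array and as a DataFrame.
arr = np.arange(12).reshape(3, 4)
df = pd.DataFrame(arr, columns=["a", "b", "c", "d"])

cols = [0, 2]
from_array = select_columns(arr, cols)  # ndarray of shape (3, 2)
from_frame = select_columns(df, cols)   # DataFrame with columns "a" and "c"

# Both results hold the same values.
print(np.array_equal(from_array, from_frame.to_numpy()))  # True
```

Note that the return type follows the input type: a DataFrame in gives a DataFrame out, and an array in gives an array out. If the rest of your code only needs plain arrays, converting the input once with np.asarray at the start may be simpler.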
