简体   繁体   中英

pandas dataframe remove constant column

I have a dataframe that may or may not have columns that are the same value. For example

    row    A    B
    1      9    0
    2      7    0
    3      5    0
    4      2    0

I'd like to return just

   row    A  
   1      9    
   2      7    
   3      5    
   4      2

Is there a simple way to identify if any of these columns exist and then remove them?

I believe this option will be faster than the other answers here as it will traverse the data frame only once for the comparison and short-circuit if a non-unique value is found.

>>> df

   0  1  2
0  1  9  0
1  2  7  0
2  3  7  0

>>> df.loc[:, (df != df.iloc[0]).any()] 

   0  1
0  1  9
1  2  7
2  3  7

Ignoring NaN s like usual, a column is constant if nunique() == 1 . So:

>>> df
   A  B  row
0  9  0    1
1  7  0    2
2  5  0    3
3  2  0    4
>>> df = df.loc[:,df.apply(pd.Series.nunique) != 1]
>>> df
   A  row
0  9    1
1  7    2
2  5    3
3  2    4

I compared various methods on data frame of size 120*10000. And found the efficient one is

def drop_constant_column(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    return dataframe.loc[:, (dataframe != dataframe.iloc[0]).any()]

1 loop, best of 3: 237 ms per loop

The other contenders are

def drop_constant_columns(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    result = dataframe.copy()
    for column in dataframe.columns:
        if len(dataframe[column].unique()) == 1:
            result = result.drop(column,axis=1)
    return result

1 loop, best of 3: 19.2 s per loop

def drop_constant_columns_2(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    for column in dataframe.columns:
        if len(dataframe[column].unique()) == 1:
            dataframe.drop(column,inplace=True,axis=1)
    return dataframe

1 loop, best of 3: 317 ms per loop

def drop_constant_columns_3(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    keep_columns = [col for col in dataframe.columns if len(dataframe[col].unique()) > 1]
    return dataframe[keep_columns].copy()

1 loop, best of 3: 358 ms per loop

def drop_constant_columns_4(dataframe):
    """
    Drops constant value columns of pandas dataframe.
    """
    keep_columns = dataframe.columns[dataframe.nunique()>1]
    return dataframe.loc[:,keep_columns].copy()

1 loop, best of 3: 1.8 s per loop

Assuming that the DataFrame is completely of type numeric:

you can try:

>>> df = df.loc[:, df.var() == 0.0]

which will remove constant(ie variance = 0) columns.

If the DataFrame is of type both numeric and object, then you should try:

>>> enum_df = df.select_dtypes(include=['object'])
>>> num_df = df.select_dtypes(exclude=['object'])
>>> num_df = num_df.loc[:, num_df.var() == 0.0]
>>> df = pd.concat([num_df, enum_df], axis=1)

which will drop constant columns of numeric type only.

If you also want to ignore/delete constant enum columns, you should try:

>>> enum_df = df.select_dtypes(include=['object'])
>>> num_df = df.select_dtypes(exclude=['object'])
>>> enum_df = enum_df.loc[:, [True if y !=1 else False for y in [len(np.unique(x, return_counts=True)[-1]) for x in enum_df.T.as_matrix()]]]
>>> num_df = num_df.loc[:, num_df.var() == 0.0]
>>> df = pd.concat([num_df, enum_df], axis=1)

Here is my solution since I needed to do both object and numerical columns. Not claiming its super efficient or anything but it gets the job done.

def drop_constants(df):
    """iterate through columns and remove columns with constant values (all same)"""
    columns = df.columns.values
    for col in columns:
        # drop col if unique values is 1
        if df[col].nunique(dropna=False) == 1:
            del df[col]
    return df

Extra caveat, it won't work on columns of lists or arrays since they are not hashable.

Many examples in this thread does not work properly. Check this my answer with collection of examples that work

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM