Drop columns in Dataframe if more than 90% of the values in the column are 0's

Question

I have a dataframe which looks like this:

As you can see, the third and fourth columns have a lot of 0's. I need to drop a column if more than 90% of its values are 0.

Answer 1

First of all, next time please give an example dataset, not an image or copy of one. It's best to give a minimal example that reproduces your problem (it's also a good way to investigate your problem). This df, for example, will do the trick:

import pandas as pd

df = pd.DataFrame.from_dict({
    'a':[1,0,0,0,0,0,0,0,0,0,0],
    'b':[1,1,1,0,1,0,0,0,0,0,0]})

Now, the previous answers help, but if you can avoid a loop, it's preferable. You can write something simpler and more concise that will do the trick:

df = df.drop(columns=df.columns[df.eq(0).mean() > 0.9])

Let's go through it step by step:
df.eq(0) returns True / False for each cell, depending on whether the cell equals 0.
The .mean() method treats True as 1 and False as 0, so the mean of each column is the fraction of zeros in it, and comparing that mean to 0.9 is what you want.
Indexing df.columns[...] with the result returns only the column names for which >0.9 holds, and drop removes those columns.
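To make the steps concrete, here is a small sketch that splits the one-liner into its intermediate results, using the example frame from this answer. The variable names (is_zero, zero_share, cols_to_drop, result) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "b": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
})

is_zero = df.eq(0)                  # boolean frame: True where the cell is 0
zero_share = is_zero.mean()         # fraction of zeros per column (a: 10/11, b: 7/11)
mask = zero_share > 0.9             # True for columns that are more than 90% zeros
cols_to_drop = df.columns[mask]     # names of the columns to remove
result = df.drop(columns=cols_to_drop)

print(zero_share.round(3).to_dict())
print(list(result.columns))
```

Column a is 10/11 zeros (about 0.909), so it is dropped; column b is 7/11 zeros, so it stays.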

Answer 2

The following should do the trick for you:

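row_count = df.shape[0]
columns_to_drop = []

# .items() replaces .iteritems(), which was removed in pandas 2.0
for column, count in df.apply(lambda column: (column == 0).sum()).items():
    if count / row_count > 0.9:  # strictly more than 90% zeros
        columns_to_drop.append(column)

# drop returns a new DataFrame; combining the assignment with inplace=True would set df to None
df = df.drop(columns_to_drop, axis=1)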

Answer 3

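bad_col = []
for x in df.columns:
    # Fraction of zeros in the column; checking the most frequent value
    # instead would also flag columns dominated by a non-zero value
    if (df[x] == 0).mean() > 0.9:
        bad_col.append(x)

df = df.drop(columns=bad_col)  # actually remove the flagged columns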

Answer 4

The explanation is inline in the code.

# Suppose df is your DataFrame; then run the following code.

# Keep only the numeric columns; copy so the drops below do not act on a view of df
df_numeric = df.select_dtypes(include="number").copy()

for i in df_numeric.columns:
    if (df_numeric[i] == 0).mean() > 0.9:  # more than 90% of the values are zero
        df_numeric.drop(i, axis=1, inplace=True)  # delete the column

# Your results are stored in df_numeric
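Note that the result above contains only the columns that were selected by dtype, so any text or other non-numeric columns are lost. If you want to keep them, one possible sketch is to compute the zero share on the numeric columns only and drop from the full frame. The function name drop_mostly_zero_columns, the threshold parameter, and the extra label column are assumptions made for this illustration.

```python
import pandas as pd

def drop_mostly_zero_columns(df, threshold=0.9):
    """Return a new frame without numeric columns whose share of zeros exceeds threshold."""
    numeric = df.select_dtypes(include="number")   # only these columns are tested
    zero_share = numeric.eq(0).mean()              # fraction of zeros per numeric column
    to_drop = zero_share[zero_share > threshold].index
    return df.drop(columns=to_drop)                # non-numeric columns are kept

example = pd.DataFrame({
    "a": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "b": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    "label": list("abcdefghijk"),                  # hypothetical text column
})

cleaned = drop_mostly_zero_columns(example)
print(list(cleaned.columns))
```

Here a is dropped, while b and the text column label are kept. The original frame is left unchanged because drop returns a new DataFrame.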

4 answers

solution1
3 ACCEPTED 2019-04-07 17:01:34

solution2
1 2019-04-07 16:31:23

solution3
0 2022-12-31 12:16:20

solution4
-1 2019-04-07 16:28:41
