
Python Dataframe: How to check specific columns for elements

I want to check whether all elements in certain columns are equal to 0.

I have a dataset that I read with df = pd.read_table('ad-data').
From this I get a data frame with the following elements:

[0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  ... 1559
[1]   3    2    3    0    0    0    0
[2]   2    3    2    0    0    0    0
[3]   3    2    2    0    0    0    0
[4]   6    7    3    0    0    0    0
[5]   3    2    1    0    0    0    0
...
3220

I would like to check whether the data set from column 4 to 1559 contains only 0 or also other values.


You can check for equality with 0 element-wise and use all for rows:

df['all_zeros'] = (df.iloc[:, 4:1560] == 0).all(axis=1)

Small example to demonstrate it (based on columns 1 to 3 here):

import numpy as np
import pandas as pd

N = 5
df = pd.DataFrame(np.random.binomial(1, 0.4, size=(N, N)))
df['all_zeros'] = (df.iloc[:, 1:4] == 0).all(axis=1)
df

Output:

   0  1  2  3  4  all_zeros
0  0  1  1  0  0      False
1  0  0  1  1  1      False
2  0  1  1  0  0      False
3  0  0  0  0  0       True
4  1  0  0  0  0       True
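
To answer the original question with a single yes or no, the same element-wise comparison can be reduced over both axes instead of only over rows. A minimal sketch on a small hand-made frame; the data and the names `block`, `only_zeros` and `nonzero_columns` are illustrative and not taken from the question:

```python
import pandas as pd

# Illustrative data: columns 0-2 hold arbitrary values, columns 3-5 should be all zero.
df = pd.DataFrame([
    [3, 2, 3, 0, 0, 0],
    [2, 3, 2, 0, 5, 0],
    [3, 2, 2, 0, 0, 0],
])

block = df.iloc[:, 3:]  # positional slice: column 3 to the end

# True only if every cell in the block equals 0
only_zeros = bool((block == 0).to_numpy().all())

# Labels of the columns that contain at least one non-zero value
nonzero_columns = block.columns[(block != 0).any(axis=0).to_numpy()].tolist()

print(only_zeros)        # False
print(nonzero_columns)   # [4]
```

For the data in the question the slice would presumably be `df.iloc[:, 4:1560]`, as in the answer above.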

Update: Filtering the rows that contain non-zero values:

df[~df['all_zeros']]

Output:

   0  1  2  3  4  all_zeros
0  0  1  1  0  0      False
1  0  0  1  1  1      False
2  0  1  1  0  0      False

Update 2: To show only non-zero values:

df_filtered = df[~df['all_zeros']]
pd.melt(
    df_filtered.iloc[:, 1:4].reset_index(),
    id_vars='index', var_name='column'
).query('value != 0').sort_values('index')

Output:

   index column  value
0      0      1      1
3      0      2      1
4      1      2      1
7      1      3      1
2      2      1      1
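5      2      2      1

Another approach is to sum columns 4 onward for each row. Assuming the values are non-negative, a row sum of 0 means that the row contains only zeros:

df['Check'] = df.loc[:, 4:].sum(axis=1)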
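
One caveat with a plain row sum is that positive and negative values can cancel out, so a sum of 0 does not always mean "all zeros". A small sketch of the problem and one possible workaround (summing absolute values); the data and the names `row_sum` and `abs_sum` are made up for illustration:

```python
import pandas as pd

# Illustrative data: the second row sums to 0 although it is not all zeros.
df = pd.DataFrame({4: [0, 2, 0], 5: [0, -2, 1]})

row_sum = df.loc[:, 4:].sum(axis=1)        # plain sum per row: 0, 0, 1
abs_sum = df.loc[:, 4:].abs().sum(axis=1)  # sum of absolute values: 0, 4, 1

print((row_sum == 0).tolist())   # [True, True, False]
print((abs_sum == 0).tolist())   # [True, False, False]
```

If the data is known to contain no negative numbers, the plain sum is enough.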

Here is a way to check whether all of the values are zero. It is simple and doesn't need the more advanced functions used in the answers above, only basic operations such as filtering, loops and variable assignment.

First comes a way to check whether one column contains only zeros, then a way to check whether all of the columns contain only zeros. Each prints a statement with the answer.

The method to check whether one column contains only zeros:

First make a series:

has_zero = df[4] == 0
# has_zero is a series which contains a bool value for each row, e.g. True, False.
# if the value in a row is zero, the entry for that row is True

Next:

rows_which_have_zero = df[has_zero]
# stores the rows which have zero as a data frame

Next:

total_number_rows = len(df)  # 3220 for the data in the question
if len(rows_which_have_zero) == total_number_rows:
    print("contains only zeros")
else:
    print("contains other numbers than zero")

The above method only checks whether the number of rows in rows_which_have_zero equals the number of rows in the column.
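
The three steps above can also be wrapped in one small helper. This is only a sketch: `column_is_all_zero` is a hypothetical name, and it uses the `all()` method of the boolean series instead of comparing lengths, which gives the same answer:

```python
import pandas as pd

# Illustrative data: column 4 is all zeros, column 5 is not.
df = pd.DataFrame({4: [0, 0, 0], 5: [0, 7, 0]})

def column_is_all_zero(frame, label):
    """Return True if every value in the column `label` equals 0."""
    return bool((frame[label] == 0).all())

print(column_is_all_zero(df, 4))   # True
print(column_is_all_zero(df, 5))   # False
```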

The method to see whether all of the columns contain only zeros:

It puts the single-column check above into a while loop, starting at column 4 because it doesn't matter whether the first 3 columns contain zeros.

no_of_columns = 1559
total_number_rows = len(df)  # 3220 for the data in the question
no_of_columns_with_only_zero = 0
value_1 = 4  # first column to check

while value_1 <= no_of_columns:
    has_zero = df[value_1] == 0
    rows_which_have_zero = df[has_zero]
    if len(rows_which_have_zero) == total_number_rows:
        no_of_columns_with_only_zero += 1
    value_1 += 1

To check whether all of those columns contain only zeros:

# columns 4 to 1559 were checked, which is 1559 - 3 columns
if no_of_columns_with_only_zero == no_of_columns - 3:
    print("there are only zero values")
else:
    print("there are numbers which are not zero")

The above checks whether no_of_columns_with_only_zero equals the number of columns checked (1559 minus 3, because only columns 4 to 1559 need to be checked).
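
As a variation on the loop, it may be more useful to collect which columns contain something other than zero instead of only counting. A minimal sketch, assuming integer column labels as in the question; the function name `columns_with_nonzero` and the sample data are hypothetical:

```python
import pandas as pd

def columns_with_nonzero(frame, first, last):
    """Return the labels in first..last (inclusive) whose column holds a non-zero value."""
    total_rows = len(frame)
    found = []
    for label in range(first, last + 1):
        zero_rows = frame[frame[label] == 0]  # rows where this column is 0
        if len(zero_rows) != total_rows:      # at least one row was not 0
            found.append(label)
    return found

# Illustrative data with integer column labels 1..6
df = pd.DataFrame(
    [[3, 2, 3, 0, 0, 0],
     [2, 3, 2, 0, 0, 4],
     [3, 2, 2, 0, 0, 0]],
    columns=range(1, 7),
)

print(columns_with_nonzero(df, 4, 6))   # [6]
```

An empty list means that every checked column contains only zeros.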

Update:

# if the column labels are str instead of int, convert value_1 to str when indexing: df[str(value_1)]
# keep value_1 itself as an int so that it can still be incremented
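
A short sketch of the string-label case, assuming a frame whose column labels are the strings '1', '2', '3' (made up for illustration). Selecting by position with `iloc` is one way to avoid the conversion altogether:

```python
import pandas as pd

# Illustrative frame whose column labels are the strings '1', '2', '3'
df = pd.DataFrame([[5, 0, 0], [6, 0, 0]], columns=['1', '2', '3'])

value_1 = 2
by_label = df[str(value_1)]             # label lookup needs the string '2'
by_position = df.iloc[:, value_1 - 1]   # position 1 is the same column here

print(by_label.equals(by_position))     # True
print(bool((by_label == 0).all()))      # True
```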
