Iterate over pandas columns with row-wise comparisons

I have many columns in a dataframe, and I want to compare the values in each column to one specific column. For example, for every column in this dataframe, I want to count the cases where both the column value and the label are equal to 1:

 col1 | col2 | col3 | ... | label
  1   |  0   |  0   | ... |   1
  0   |  0   |  1   | ... |   0

When I try to do this with something like df.apply(lambda x: x.label==1, axis=1), I can select the label column with x.label, but how do I select the column itself?

I can do this using a for loop that iterates over the column names, but am wondering if there's a more pandas-like way to do it without using a loop.

results = []
for col in df.columns:
    val = len(df[(df[col]==1) & (df.label==1)])
    results.append(val)

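Just filter the rows where label is 1 and sum what is left. Because every value is 0 or 1, each column's sum is the number of 1s in that column: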

df.loc[df['label'] == 1].sum()

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(2, size=(10, 4)),
                  columns=['col1', 'col2', 'col3', 'label'])
print(df)

   col1  col2  col3  label
0     0     0     1      1
1     1     1     0      0
2     1     1     0      0
3     0     0     0      0
4     0     0     1      0
5     0     0     0      1
6     1     0     1      1
7     0     1     1      0
8     0     0     0      0
9     0     0     0      0

results = []
for col in df.columns:
    val = len(df[(df[col]==1) & (df.label==1)])
    results.append(val)
results

[1, 0, 2, 3]

df.loc[df['label'] == 1].sum().tolist()

[1, 0, 2, 3]
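
As for the original question of how to reach the column itself inside apply: a minimal sketch, assuming the default axis=0, under which apply hands each column to the function as a Series. The frame below is the one printed above, typed in by hand so the result is reproducible; the names via_apply and via_filter are only illustrative.

```python
import pandas as pd

# The example frame printed above, entered by hand for reproducibility.
df = pd.DataFrame({
    'col1':  [0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
    'col2':  [0, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    'col3':  [1, 0, 0, 0, 1, 0, 1, 1, 0, 0],
    'label': [1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
})

# With the default axis=0, the lambda argument is the column itself (a Series).
via_apply = df.apply(lambda col: ((col == 1) & (df['label'] == 1)).sum())

# Filter-then-sum, as in the answer above.
via_filter = df.loc[df['label'] == 1].sum()

print(via_apply.tolist())   # [1, 0, 2, 3]
print(via_filter.tolist())  # [1, 0, 2, 3]
```

Both give the same counts as the loop. The apply version still calls a Python function once per column, so filter-then-sum is likely the better choice on wide frames.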

EDIT:

If the values are not all 0 or 1 but you still want to count the cases where both the column value and the label are equal to 1, filter by label, mask everything that is not equal to 1 (those cells become NaN, which sum() skips), and sum what is left. The NaN masking is why the result comes back as floats:

df = pd.DataFrame(np.random.randint(3, size=(10, 4)),
                  columns=['col1', 'col2', 'col3', 'label'])
print(df)

   col1  col2  col3  label
0     0     0     2      1
1     1     0     0      2
2     2     1     0      2
3     1     1     1      0
4     0     0     2      1
5     2     2     1      2
6     0     2     1      1
7     1     1     0      0
8     1     0     0      2
9     0     2     1      2

results = []
for col in df.columns:
    val = len(df[(df[col]==1) & (df.label==1)])
    results.append(val)
results

[0, 0, 1, 3]

df.loc[df['label'] == 1][df == 1].sum().fillna(0).tolist()

[0.0, 0.0, 1.0, 3.0]
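
If integer counts are preferred, one possible variation (a sketch, not part of the original answer) is to compare the filtered rows to 1 and sum the resulting booleans, which avoids the NaN masking altogether. The frame is the one printed above, entered by hand; counts is an illustrative name.

```python
import pandas as pd

# The 0/1/2 example frame printed above, entered by hand for reproducibility.
df = pd.DataFrame({
    'col1':  [0, 1, 2, 1, 0, 2, 0, 1, 1, 0],
    'col2':  [0, 0, 1, 1, 0, 2, 2, 1, 0, 2],
    'col3':  [2, 0, 0, 1, 2, 1, 1, 0, 0, 1],
    'label': [1, 2, 2, 0, 1, 2, 1, 0, 2, 2],
})

# Keep the rows where label == 1, then count the cells equal to 1 per column.
# Summing booleans yields integers, so no fillna() is needed.
counts = df[df['label'].eq(1)].eq(1).sum()

print(counts.tolist())  # [0, 0, 1, 3]
```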

You can use np.equal() to get a boolean result for element-wise equality. This works for any integer as well as other dtypes.

To illustrate:

df = pd.DataFrame(np.random.randint(2, size=(10, 4)), columns=['col1', 'col2', 'col3', 'label'])

   col1  col2  col3  label
0     0     1     1      0
1     1     0     1      0
2     1     0     0      1
3     1     0     0      0
4     0     1     1      1
5     1     1     0      0
6     0     0     0      1
7     1     1     1      0
8     0     1     0      1
9     0     1     1      1

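Compare the label column to each of the other columns:

# to_numpy() makes the (10, 1) label column broadcast by position; recent pandas aligns two DataFrames by column label instead
comparison = np.equal(df[['col1', 'col2', 'col3']], df[['label']].to_numpy())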

    col1   col2   col3
0   True  False  False
1  False   True  False
2   True  False  False
3  False   True   True
4  False   True   True
5  False  False   True
6  False  False  False
7  False  False  False
8  False   True  False
9  False   True   True

You can then sum the result to get, per column, the number of rows where the value equals the label (this counts matching 0s as well as matching 1s):

comparison.sum()

col1    2
col2    5
col3    4
dtype: int64
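
The same comparison can also be sketched with pandas' own DataFrame.eq, where axis=0 matches the label Series against each column row by row. The second count narrows the result to the question's case (column and label both equal to 1). The frame is the one printed above, entered by hand; features, equal_counts and both_one are illustrative names.

```python
import pandas as pd

# The example frame printed above, entered by hand for reproducibility.
df = pd.DataFrame({
    'col1':  [0, 1, 1, 1, 0, 1, 0, 1, 0, 0],
    'col2':  [1, 0, 0, 0, 1, 1, 0, 1, 1, 1],
    'col3':  [1, 1, 0, 0, 1, 0, 0, 1, 0, 1],
    'label': [0, 0, 1, 0, 1, 0, 1, 0, 1, 1],
})
features = df[['col1', 'col2', 'col3']]

# Rows where each column equals the label (0 == 0 counts as a match too).
equal_counts = features.eq(df['label'], axis=0).sum()

# Rows where the column and the label are both 1, as the question asks.
both_one = features[df['label'].eq(1)].eq(1).sum()

print(equal_counts.tolist())  # [2, 5, 4]
print(both_one.tolist())      # [1, 3, 2]
```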
