How to replace a value depending on “identifier columns” and an additional condition in a pandas dataframe?

Question

As part of some data cleaning, I need to 'align' the values in 'ColA' within each 'ID' and 'Year' combination: if any row of a combination has ColA = 1, every row of that combination should have ColA = 1.

I already tried np.where(), but it only raised ValueError: Can only compare identically-labeled Series objects

Here is a short example DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2007, 0], 
                       [2, 2008, 0], 
                       [2, 2009, 1], 
                       [3, 2007, 0], 
                       [4, 2010, 0], 
                       [4, 2011, 1], 
                       [4, 2011, 0]]), #I want to change this 0 to 1
             columns=['ID', 'Year', 'ColA'])

The result should look like this:

result = pd.DataFrame(np.array([[1, 2007, 0], 
                       [2, 2008, 0], 
                       [2, 2009, 1], 
                       [3, 2007, 0], 
                       [4, 2010, 0], 
                       [4, 2011, 1], 
                       [4, 2011, 1]]),
             columns=['ID', 'Year', 'ColA'])

Answer 1

We can use groupby.transform with any, which returns True for every row whose ('ID', 'Year') group contains at least one non-zero value in 'ColA'. Converting that boolean Series to int with astype gives the desired result:

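# True for each row whose (ID, Year) group has at least one non-zero ColA;
# the string alias 'any' avoids the warning newer pandas versions emit for the builtin
m = df.groupby(['ID', 'Year'])['ColA'].transform('any').astype(int)
df['ColA'] = m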

   ID  Year  ColA
0   1  2007     0
1   2  2008     0
2   2  2009     1
3   3  2007     0
4   4  2010     0
5   4  2011     1
6   4  2011     1
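
For comparison, here is a minimal sketch of two variants of the same idea, assuming 'ColA' only ever holds 0 and 1. The first takes the group-wise maximum instead of any; the second shows how np.where can be used without an alignment error, because the mask comes from transform and so shares the index of df. The names group_max, aligned and aligned_where are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2007, 0],
     [2, 2008, 0],
     [2, 2009, 1],
     [3, 2007, 0],
     [4, 2010, 0],
     [4, 2011, 1],
     [4, 2011, 0]],
    columns=['ID', 'Year', 'ColA'],
)

# Group-wise maximum, broadcast back to the original rows.
# Assumption: ColA contains only 0 and 1, so max == 1 means "any row is 1".
group_max = df.groupby(['ID', 'Year'])['ColA'].transform('max')

# Variant 1: use the group maximum directly as the aligned column.
aligned = df.assign(ColA=group_max)

# Variant 2: np.where with a mask that has the same index as df,
# so no label-alignment problem can occur.
aligned_where = df.assign(ColA=np.where(group_max.eq(1), 1, df['ColA']))

print(aligned)
```

Both variants leave the original df untouched and return a new DataFrame; assign the result back to df if in-place behaviour is wanted.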
