As part of some data cleaning, I need to 'align' the values in column 'ColA' for each 'Year' and 'ID' combination: if any value in 'ColA' equals 1 for a given 'Year'/'ID' combination, every row of that combination should get 1.
I already tried np.where(), but only received ValueError: Can only compare identically-labeled Series objects.
Here is a short example DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2007, 0],
[2, 2008, 0],
[2, 2009, 1],
[3, 2007, 0],
[4, 2010, 0],
[4, 2011, 1],
[4, 2011, 0]]), #I want to change this 0 to 1
columns=['ID', 'Year', 'ColA'])
The result should look like this:
result = pd.DataFrame(np.array([[1, 2007, 0],
[2, 2008, 0],
[2, 2009, 1],
[3, 2007, 0],
[4, 2010, 0],
[4, 2011, 1],
[4, 2011, 1]]),
columns=['ID', 'Year', 'ColA'])
We can use groupby.transform with any. This gives back a boolean Series with one value per original row, so converting it to int with astype gives the desired result:
m = df.groupby(['ID', 'Year'])['ColA'].transform('any').astype(int)
df['ColA'] = m
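For reference, here is a self-contained version of the steps above. It passes the string alias 'any' rather than the builtin any, since recent pandas versions may emit a FutureWarning when a builtin is passed to transform; the behaviour on this data is the same.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2007, 0],
                            [2, 2008, 0],
                            [2, 2009, 1],
                            [3, 2007, 0],
                            [4, 2010, 0],
                            [4, 2011, 1],
                            [4, 2011, 0]]),
                  columns=['ID', 'Year', 'ColA'])

# transform('any') returns one boolean per original row (same index as df):
# True when any ColA value in that row's (ID, Year) group is non-zero
has_one = df.groupby(['ID', 'Year'])['ColA'].transform('any')

# Convert True/False back to 1/0 and overwrite the column
df['ColA'] = has_one.astype(int)
print(df)
```

Because the transformed Series shares df's index, it can be assigned straight back without any re-alignment.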
ID Year ColA
0 1 2007 0
1 2 2008 0
2 2 2009 1
3 3 2007 0
4 4 2010 0
5 4 2011 1
6 4 2011 1
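Note that any treats every non-zero value as True. If 'ColA' could hold values other than 0 and 1, a sketch that tests for exactly 1 and leaves other groups untouched could build the mask with eq(1) and then use np.where, as originally attempted. This works because the transformed mask is aligned to df's index. The data below is hypothetical, chosen only to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data where ColA also holds values other than 0 and 1
df = pd.DataFrame({'ID':   [1, 4, 4, 5, 5],
                   'Year': [2007, 2011, 2011, 2012, 2012],
                   'ColA': [2, 1, 0, 3, 0]})

# Flag rows equal to 1, then spread the flag across each (ID, Year) group;
# the result has the same index as df, so it is a valid np.where condition
group_has_one = df['ColA'].eq(1).groupby([df['ID'], df['Year']]).transform('any')

# Set ColA to 1 in groups that contain a 1, keep the original value elsewhere
df['ColA'] = np.where(group_has_one, 1, df['ColA'])
print(df)
```

Here the (4, 2011) group becomes 1, 1 while the other groups keep 2 and 3, 0; transform('any').astype(int) would instead have turned every row into 1.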