I have a pandas DataFrame as below:
import numpy as np
import pandas as pd

data = {'A' : [1,2,3],
'B' : [2,17,17],
'C1' : ["C1", np.nan,np.nan],
'C2' : ["C2", "C2",np.nan]}
# Create DataFrame
df = pd.DataFrame(data)
Dataframe:
A B C1 C2
0 1 2 C1 C2
1 2 17 NaN C2
2 3 17 NaN NaN
I am creating a column "C" using the logic and code below.
If any of the C columns (C1, C2, C3, ...) has a value, "C" takes the values from those columns, joined with commas.
df['C'] = df.filter(regex=r'C\d+').stack().groupby(level=0).agg(','.join)
Result:
A B C1 C2 C
0 1 2 C1 C2 C1,C2
1 2 17 NaN C2 C2
2 3 17 NaN NaN NaN
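For reference, the one-liner above can be unpacked into its steps. This is only a sketch of what each call contributes; the intermediate names (`c_cols`, `stacked`, `joined`) are made up for illustration, and the explicit `.dropna()` is an addition so the result does not depend on whether `stack()` drops missing values in the installed pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [2, 17, 17],
    'C1': ['C1', np.nan, np.nan],
    'C2': ['C2', 'C2', np.nan],
})

# 1. Keep only the columns named C<number>.
c_cols = df.filter(regex=r'^C\d+$')

# 2. Stack them into one long Series indexed by (row label, column name).
#    The explicit dropna() removes missing entries regardless of stack()'s default.
stacked = c_cols.stack().dropna()

# 3. Group by the original row label (index level 0) and join the values.
joined = stacked.groupby(level=0).agg(','.join)

# 4. Assign back; rows with no value at all (row 2 here) are absent from
#    `joined`, so index alignment leaves them as NaN.
df['C'] = joined
print(df)
```

Row 2 ends up as NaN only because of index alignment in the last step, which is also why it needs special handling later when the values are split into separate rows.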
Now I want to apply the following logic:
If "C" has more than one value (say C1, C2) for any row, create a new row for each additional value. So I want my output to look like this:
A B C1 C2 C
0 1 2 C1 C2 C1
0 1 2 C1 C2 C2
1 2 17 NaN C2 C2
2 3 17 NaN NaN NaN
We can do this with explode, then concat to add back the rows that have no C value:
s = df.filter(regex=r'C\d+').stack().groupby(level=0).agg(list).explode().to_frame('C').join(df)
s = pd.concat([s, df[~df.index.isin(s.index)]], axis=0, join='outer', ignore_index=True, sort=False)
s
Out[62]:
C A B C1 C2
0 C1 1 2 C1 C2
1 C2 1 2 C1 C2
2 C2 2 17 NaN C2
3 NaN 3 17 NaN NaN
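The concat step is needed because the join starts from the exploded values, and a row with no C value at all never appears there. As a possible variation (not the approach in this answer), joining in the other direction, from `df` onto the stacked values, keeps such rows without a second step. A minimal sketch, where `c` and `out` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [2, 17, 17],
    'C1': ['C1', np.nan, np.nan],
    'C2': ['C2', 'C2', np.nan],
})

# One entry per non-missing C value, indexed by the original row label only.
c = (
    df.filter(regex=r'^C\d+$')
      .stack()
      .dropna()        # explicit, so the result does not rely on stack()'s default
      .droplevel(1)    # drop the column-name level, keep the row label
      .rename('C')
)

# DataFrame.join is a left join on the index by default:
# row 0 is repeated once per value, row 2 is kept with NaN in C.
out = df.join(c)
print(out)
```

Here the original index is preserved (row 0 appears twice); add `.reset_index(drop=True)` if a fresh 0..n-1 index is preferred.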
You could do:
df.merge(df.melt(['A', 'B'], value_name='C').dropna().drop('variable', axis=1), how='left')
A B C1 C2 C
0 1 2 C1 C2 C1
1 1 2 C1 C2 C2
2 2 17 NaN C2 C2
3 3 17 NaN NaN NaN
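To make the melt/merge approach easier to follow, here is the same idea split into two steps. This is a sketch under one assumption worth stating: the merge matches on `A` and `B`, so it only gives the intended result if each (A, B) pair identifies a single row, as it does in the sample data. `long_c` is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [2, 17, 17],
    'C1': ['C1', np.nan, np.nan],
    'C2': ['C2', 'C2', np.nan],
})

# Long format: one row per (A, B, C-value); missing values are dropped
# and the helper 'variable' column (C1/C2) is no longer needed.
long_c = (
    df.melt(id_vars=['A', 'B'], value_name='C')
      .dropna(subset=['C'])
      .drop(columns='variable')
)

# Left merge on the id columns: every original row is kept, rows with
# several C values are repeated, rows with none get NaN.
out = df.merge(long_c, how='left', on=['A', 'B'])
print(out)
```

If (A, B) is not unique in the real data, rows sharing a key would pick up each other's C values; in that case an approach based on the index is the safer choice.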
You can just use df.explode(...). Try:
# Please note that I aggregate into a list, not a string
df['C'] = df.filter(regex=r'C\d+').stack().groupby(level=0).agg(list)
df = df.explode("C")
Outputs:
A B C1 C2 C
0 1 2 C1 C2 C1
0 1 2 C1 C2 C2
1 2 17 NaN C2 C2
2 3 17 NaN NaN NaN
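Note that explode keeps the original index, which is why label 0 appears twice in the output above. A small sketch of the same steps with an optional index reset at the end; the `.dropna()` after `stack()` and the names `out` and `out_reset` are additions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [2, 17, 17],
    'C1': ['C1', np.nan, np.nan],
    'C2': ['C2', 'C2', np.nan],
})

# Collect the non-missing C values of each row into a list.
df['C'] = (
    df.filter(regex=r'^C\d+$')
      .stack()
      .dropna()
      .groupby(level=0)
      .agg(list)
)

# One output row per list element; rows whose C is NaN pass through unchanged.
out = df.explode('C')

# Optional: replace the repeated labels with a fresh 0..n-1 index.
out_reset = out.reset_index(drop=True)
print(out_reset)
```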