I have two columns, _Id and _ParentId, with the example data below. I want to group each _Id with its _ParentId.
_Id _ParentId
1 NaN
2 NaN
3 1.0
4 2.0
5 NaN
6 2.0
After grouping the result should be shown as below.
_Id _ParentId
1 NaN
3 1.0
2 NaN
4 2.0
6 2.0
5 NaN
The aim is to show which _Id belongs to which _ParentId (e.g. _Id 3 belongs to _Id 1).
I have attempted to use groupby and duplicated but I can't seem to get the results shown above.
Use sort_values on a temporary column, temp, that holds _ParentId where it is set and _Id otherwise:
In [3188]: (df.assign(temp=df._ParentId.combine_first(df._Id))
      ...:    .sort_values(by='temp').drop(columns='temp'))
Out[3188]:
_Id _ParentId
0 1 NaN
2 3 1.0
1 2 NaN
3 4 2.0
5 6 2.0
4 5 NaN
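For anyone who wants to run this outside IPython, here is a minimal self-contained sketch of the same idea, assuming the example data from the question. The `kind='mergesort'` argument is my addition: it requests a stable sort, so rows with the same key keep their original relative order. That only puts the parent first if the parent row already appears before its children in the input, which is true for this data but is an assumption in general.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "_Id": [1, 2, 3, 4, 5, 6],
    "_ParentId": [np.nan, np.nan, 1.0, 2.0, np.nan, 2.0],
})

# Sort key: the parent's id for child rows, the row's own id for top-level rows
temp = df["_ParentId"].combine_first(df["_Id"])

result = (
    df.assign(temp=temp)
      .sort_values(by="temp", kind="mergesort")  # stable: ties keep input order
      .drop(columns="temp")
)
print(result)
```

The original index labels are kept, so the printed index reads 0, 2, 1, 3, 5, 4; add `.reset_index(drop=True)` if you want a fresh 0..n-1 index.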
Details
In [3189]: df._ParentId.combine_first(df._Id)
Out[3189]:
0 1.0
1 2.0
2 1.0
3 2.0
4 5.0
5 2.0
Name: _ParentId, dtype: float64
In [3190]: df.assign(temp=df._ParentId.combine_first(df._Id))
Out[3190]:
_Id _ParentId temp
0 1 NaN 1.0
1 2 NaN 2.0
2 3 1.0 1.0
3 4 2.0 2.0
4 5 NaN 5.0
5 6 2.0 2.0
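If you would rather not add and drop a helper column, a possible variant (not part of the answer above, just a sketch under the same assumptions) is to build the key as a standalone Series and reorder the rows by position. Here `fillna` with a Series is used in place of `combine_first`; for this data both give the same key.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "_Id": [1, 2, 3, 4, 5, 6],
    "_ParentId": [np.nan, np.nan, 1.0, 2.0, np.nan, 2.0],
})

# Same key as the temp column: _ParentId where present, otherwise _Id
key = df["_ParentId"].fillna(df["_Id"])

# Positions that sort the key; "stable" keeps input order among equal keys
order = np.argsort(key.to_numpy(), kind="stable")

result = df.iloc[order]
print(result)
```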
Your expected output is almost the same as the input, except that IDs 4 and 6 sit together and the NaN rows are in different places. groupby on its own will not give you that output.
Here is how group-by would ideally work:
print("Original: ")
print(df)
# groupby drops NaN keys by default, so replace NaN with a placeholder value (0 here)
df = df.fillna(0)
df2 = df.groupby('_Parent')
print("\nAfter grouping: ")
for key, item in df2:
    print(df2.get_group(key))
Output:
Original:
_Id _Parent
0 1 NaN
1 2 NaN
2 3 1.0
3 4 2.0
4 5 NaN
5 6 2.0
After grouping:
_Id _Parent
0 1 0.0
1 2 0.0
4 5 0.0
_Id _Parent
2 3 1.0
_Id _Parent
3 4 2.0
5 6 2.0
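As a side note, pandas 1.1 and later accept `dropna=False` in `groupby`, which keeps the NaN rows as their own group without replacing them first. A minimal sketch, assuming the same data and the `_Parent` column name used in this answer:

```python
import numpy as np
import pandas as pd

# Same data as above, with the column named _Parent as in this answer
df = pd.DataFrame({
    "_Id": [1, 2, 3, 4, 5, 6],
    "_Parent": [np.nan, np.nan, 1.0, 2.0, np.nan, 2.0],
})

# dropna=False keeps rows whose key is NaN as a separate group
grouped = df.groupby("_Parent", dropna=False)

# Number of rows per parent; the NaN group is listed after the real keys
sizes = grouped.size()
print(sizes)

# Walk the groups the same way as the loop above
ids_per_group = []
for key, item in grouped:
    print(item)
    ids_per_group.append(item["_Id"].tolist())
```

This still produces one block per parent value rather than the interleaved parent-then-children order asked for in the question.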