Pandas: get two different rows with same pair of values in two different columns

Question

I have two columns, _Id and _ParentId, with the example data below. I want to group each _Id with its _ParentId.

       _Id  _ParentId
        1        NaN
        2        NaN
        3        1.0
        4        2.0
        5        NaN
        6        2.0

After grouping the result should be shown as below.

       _Id  _ParentId
        1        NaN
        3        1.0
        2        NaN
        4        2.0
        6        2.0
        5        NaN

The main aim is to show which _Id belongs to which _ParentId (e.g. _Id 3 belongs to _Id 1).

I have attempted to use groupby and duplicated but I can't seem to get the results shown above.

Answer 1

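Use sort_values on a temporary column temp, where temp is _ParentId with its missing values filled from _Id (via combine_first):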

In [3188]: (df.assign(temp=df._ParentId.combine_first(df._Id))
              .sort_values(by='temp').drop(columns='temp'))
Out[3188]:
   _Id  _ParentId
0    1        NaN
2    3        1.0
1    2        NaN
3    4        2.0
5    6        2.0
4    5        NaN

Details

In [3189]: df._ParentId.combine_first(df._Id)
Out[3189]:
0    1.0
1    2.0
2    1.0
3    2.0
4    5.0
5    2.0
Name: _ParentId, dtype: float64

In [3190]: df.assign(temp=df._ParentId.combine_first(df._Id))
Out[3190]:
   _Id  _ParentId  temp
0    1        NaN   1.0
1    2        NaN   2.0
2    3        1.0   1.0
3    4        2.0   2.0
4    5        NaN   5.0
5    6        2.0   2.0
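
For reference, here is the same idea as a self-contained sketch. It assumes the question's sample data. Two details are additions that do not appear in the session above: the keyword form drop(columns=...), and kind='mergesort', a stable sort, so that rows sharing a key keep their original relative order and each parent stays ahead of its children.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    '_Id': [1, 2, 3, 4, 5, 6],
    '_ParentId': [np.nan, np.nan, 1.0, 2.0, np.nan, 2.0],
})

# Group key: the parent id where present, otherwise the row's own id
key = df['_ParentId'].combine_first(df['_Id'])

# A stable sort on the key keeps the original order within each group,
# so a parent that appears before its children stays first
result = (df.assign(temp=key)
            .sort_values(by='temp', kind='mergesort')
            .drop(columns='temp'))
print(result)
```

This relies on each parent row appearing before its children in the input; if that might not hold, sorting on both temp and _ParentId with na_position='first' would be one way to force parents to the top of each group.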

Answer 2

Your expected output is almost the same as the input, except that IDs 4 and 6 are together and the NaNs are in different places. It's not possible to have that expected output.

Here is how group-by would ideally work:

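import numpy as np
import pandas as pd

# Sample data from the question (the parent column is named '_Parent' here)
df = pd.DataFrame({'_Id': [1, 2, 3, 4, 5, 6],
                   '_Parent': [np.nan, np.nan, 1, 2, np.nan, 2]})

print("Original: ")
print(df)

# groupby drops NaN keys by default, so replace NaN with a sentinel value first
df = df.fillna(-1)
df2 = df.groupby('_Parent')

print("\nAfter grouping: ")
for key, item in df2:
    print(item)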

Output:

Original: 
   _Id  _Parent
0    1      NaN
1    2      NaN
2    3      1.0
3    4      2.0
4    5      NaN
5    6      2.0

After grouping: 
   _Id  _Parent
0    1     -1.0
1    2     -1.0
4    5     -1.0
   _Id  _Parent
2    3      1.0
   _Id  _Parent
3    4      2.0
5    6      2.0
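
As an alternative to filling NaN with a sentinel, newer pandas versions (1.1 and later, as far as I know) accept dropna=False in groupby, which keeps the NaN rows as a group of their own. A minimal sketch, assuming the same sample data and the '_Parent' column name used in this answer:

```python
import numpy as np
import pandas as pd

# Sample data, using this answer's '_Parent' column name
df = pd.DataFrame({
    '_Id': [1, 2, 3, 4, 5, 6],
    '_Parent': [np.nan, np.nan, 1.0, 2.0, np.nan, 2.0],
})

# dropna=False keeps rows whose group key is NaN instead of discarding them
ids = df.groupby('_Parent', dropna=False)['_Id'].apply(list)
print(ids)
```

Each entry of ids is the list of _Id values sharing one parent, with the parentless rows collected under the NaN key.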

solution1
2 ACCEPTED 2017-09-24 05:35:29

solution2
1 2017-09-24 06:26:21
