Assign a unique ID to each combination of two columns in a Pandas DataFrame, independent of their order

Question

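I have a DataFrame with two columns, col1 and col2.

I want to assign each row an ID that is based on col1 and col2 but independent of their order, so that (1, 2) and (2, 1) receive the same ID: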

col1 col2 id
1    2    1
2    1    1
2    3    2
3    2    2
3    4    3
4    3    3

How can I do this?

Answer 1

One approach:

# A frozenset ignores order, so (1, 2) and (2, 1) produce the same group key
df["id"] = df.groupby(df[["col1", "col2"]].apply(frozenset, axis=1)).ngroup() + 1
print(df)

Output

   col1  col2  id
0     1     2   1
1     2     1   1
2     2     3   2
3     3     2   2
4     3     4   3
5     4     3   3
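The snippet above assumes that df already exists. The following is a minimal self-contained sketch, assuming the input is the col1/col2 part of the table in the question. Passing sort=False is a choice made here, not part of the answer; it numbers the groups in order of first appearance so the result does not depend on how frozensets compare.

```python
import pandas as pd

# Assumed input: the col1/col2 columns of the question's table
df = pd.DataFrame({"col1": [1, 2, 2, 3, 3, 4], "col2": [2, 1, 3, 2, 4, 3]})

# A frozenset compares equal regardless of element order and is hashable,
# so it can serve as a group key
same_key = frozenset((1, 2)) == frozenset((2, 1))

keys = df[["col1", "col2"]].apply(frozenset, axis=1)
# sort=False numbers the groups in order of first appearance
df["id"] = df.groupby(keys, sort=False).ngroup() + 1
print(df)
```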

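An alternative using np.unique + np.sort:

import numpy as np

# Sort each row so order no longer matters; select only col1/col2 so an existing id column is ignored
_, indices = np.unique(np.sort(df[["col1", "col2"]].to_numpy(), axis=1), return_inverse=True, axis=0)
df["id"] = indices.ravel() + 1  # ravel() keeps the labels 1-D
print(df)

Output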

   col1  col2  id
0     1     2   1
1     2     1   1
2     2     3   2
3     3     2   2
4     3     4   3
5     4     3   3
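To make the two steps easier to follow, here is a small NumPy-only sketch that prints the intermediate results. The array values are assumed from the question's table, and the variable names are illustrative.

```python
import numpy as np

# Assumed input: the (col1, col2) pairs from the question's table
values = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [3, 4], [4, 3]])

# Step 1: sort within each row so (2, 1) becomes (1, 2)
sorted_pairs = np.sort(values, axis=1)

# Step 2: find the distinct rows; `inverse` maps each input row to the
# position of its pair in `unique_pairs`
unique_pairs, inverse = np.unique(sorted_pairs, axis=0, return_inverse=True)

# reshape(-1) keeps the labels 1-D regardless of NumPy version
ids = inverse.reshape(-1) + 1
print(unique_pairs)
print(ids)
```

Because np.unique returns the distinct rows in sorted order, the IDs follow the sorted order of the pairs rather than the order in which they appear.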

Answer 2

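You can use apply. Note that this uses the smaller of the two values as the ID, which is unique only when no two distinct pairs share the same minimum, as in the sample data below:

import pandas as pd

df = pd.DataFrame(data={"col1":[1,2,3,1,2,3], "col2":[3,2,1,3,2,1]})
# Use the smaller value of each row as its ID
df['id'] = df.apply(lambda row: min(row.col1, row.col2), axis=1)
print(df)

Output: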

   col1  col2  id
0     1     3   1
1     2     2   2
2     3     1   1
3     1     3   1
4     2     2   2
5     3     1   1
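The minimum alone stops being a valid ID as soon as two different pairs share their smaller value. The sketch below illustrates this with hypothetical data and shows one possible workaround, grouping on both the minimum and the maximum. This is a variation on the answer, not the answer's own method.

```python
import pandas as pd

# Hypothetical data in which two different pairs share the same minimum
df = pd.DataFrame({"col1": [1, 2, 1, 3], "col2": [2, 1, 3, 1]})
pairs = df[["col1", "col2"]]

lo = pairs.min(axis=1).rename("lo")
hi = pairs.max(axis=1).rename("hi")

# The minimum alone cannot tell {1, 2} apart from {1, 3}
min_only = lo.tolist()

# Grouping on (min, max) keeps the pairs distinct; sort=False numbers
# the groups in order of first appearance
df["id"] = df.groupby([lo, hi], sort=False).ngroup() + 1
print(df)
```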

Answer 3

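Try np.sort:

import numpy as np

# Sort the values within each row, then group by the sorted pair
a = np.sort(df[["col1", "col2"]].to_numpy(), axis=1)
df['id'] = df.groupby([a[:,0],a[:,1]]).ngroup() + 1

Output: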

   col1  col2  id
0     1     2   1
1     2     1   1
2     2     3   2
3     3     2   2
4     3     4   3
5     4     3   3
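One detail worth knowing is how ngroup numbers the groups. By default groupby sorts the group keys, so the IDs follow key order; with sort=False they follow the order of first appearance. A short sketch with hypothetical data where the two orders differ:

```python
import numpy as np
import pandas as pd

# Hypothetical data in which the largest pair appears first
df = pd.DataFrame({"col1": [4, 1, 3], "col2": [3, 2, 4]})
a = np.sort(df[["col1", "col2"]].to_numpy(), axis=1)

# Default (sort=True): IDs follow the sorted order of the group keys
by_key_order = (df.groupby([a[:, 0], a[:, 1]]).ngroup() + 1).tolist()

# sort=False: IDs follow the order in which each pair first appears
by_first_seen = (df.groupby([a[:, 0], a[:, 1]], sort=False).ngroup() + 1).tolist()

print(by_key_order)
print(by_first_seen)
```

Either numbering satisfies the question, since rows with the same pair always share an ID; the choice only matters if the ID values themselves need a particular order.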

Answer 4

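You can also use the following. Note that it numbers consecutive runs of equal pairs, so it yields one ID per pair only when matching rows are adjacent, as in the sample data:

# Build an order-independent key such as "1,2" for each row
df['mask'] = df[['col1', 'col2']].apply(lambda x: ','.join(map(str, x.sort_values())), axis=1)
# Start a new ID whenever the key differs from the previous row's key
df['id'] = (df['mask'] != df['mask'].shift()).cumsum()
df.drop(columns=['mask'], inplace=True)

Output: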

   col1  col2  id
0     1     2   1
1     2     1   1
2     2     3   2
3     3     2   2
4     3     4   3
5     4     3   3
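The shift/cumsum step depends on matching rows sitting next to each other. The sketch below uses hypothetical data where they do not, and compares the run-based IDs with pd.factorize applied to the same key. The factorize variant is a swapped-in alternative, not part of the answer.

```python
import pandas as pd

# Hypothetical data in which the matching rows (1, 2) and (2, 1) are not adjacent
df = pd.DataFrame({"col1": [1, 3, 2], "col2": [2, 4, 1]})

# Order-independent string key per row, e.g. "1,2"
key = df[["col1", "col2"]].apply(
    lambda x: ",".join(map(str, x.sort_values())), axis=1
)

# shift/cumsum labels consecutive runs, so the two "1,2" rows get different IDs
run_ids = (key != key.shift()).cumsum().tolist()

# pd.factorize labels distinct values in order of first appearance
df["id"] = pd.factorize(key)[0] + 1
print(df)
```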

4 solutions

Solution 1
1 2021-10-13 14:40:14

Solution 2
0 2021-10-13 14:40:26

Solution 3
0 accepted 2021-10-13 14:41:26

Solution 4
0 2021-10-13 14:45:42
