How do I get lists of the values of rows merged by groupby in a Pandas DataFrame?

Question

Suppose I have the following DataFrame:

#!/usr/bin/env python

import pandas as pd


df = pd.DataFrame([(1, 2, 1),
                   (1, 2, 2),
                   (1, 2, 3),
                   (4, 1, 612),
                   (4, 1, 612),
                   (4, 1, 1),
                   (3, 2, 1),
                   ],
                  columns=['groupid', 'a', 'b'],
                  index=['India', 'France', 'England', 'Germany', 'UK', 'USA',
                         'Indonesia'])
print(df)

This gives:

           groupid  a    b
India            1  2    1
France           1  2    2
England          1  2    3
Germany          4  1  612
UK               4  1  612
USA              4  1    1
Indonesia        3  2    1

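Step 1

This step may not be necessary, and it may not work the way I imagine. I am really only interested in Step 2, but describing this helps me think it through and explain what I want.

I want to group the data by groupid (df.groupby(df['groupid'])) and get the following: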

    groupid  a    b
          1  [2]  [1, 2, 3]
          4  [1]  [612, 1]
          3  [2]  [1]
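
One possible way to build a table like this is sketched below. It assumes that each cell should hold the distinct values of the column within the group (as the [612, 1] entry suggests) and that groups should appear in order of first appearance; the name step1 is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [(1, 2, 1), (1, 2, 2), (1, 2, 3), (4, 1, 612), (4, 1, 612), (4, 1, 1), (3, 2, 1)],
    columns=["groupid", "a", "b"],
    index=["India", "France", "England", "Germany", "UK", "USA", "Indonesia"],
)

# sort=False keeps the groups in order of first appearance (1, 4, 3)
grouped = df.groupby("groupid", sort=False)

# Collect the distinct values of each column per group as plain Python lists
step1 = pd.DataFrame({
    col: grouped[col].apply(lambda s: s.unique().tolist())
    for col in ["a", "b"]
})
print(step1)
```

If duplicates should be kept instead, replacing the lambda with list would collect every value in the group.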

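Step 2

Then I want to find all group IDs that have only one entry in column b, where that entry equals 1.

Likewise, I want to find all group IDs that have either multiple entries or a single entry that is not 1.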

Answer 1

You can compare each group's values against a set, then get the matching index values as lists:

mask = df.groupby('groupid')['b'].apply(set) == set([1])
print (mask)
groupid
1    False
3     True
4    False
Name: b, dtype: bool

i = mask.index[mask].tolist()
print (i)
[3]

j = mask.index[~mask].tolist()
print (j)
[1, 4]

For a new column, use map:

df['new'] = df['groupid'].map(df.groupby('groupid')['b'].apply(set) == set([1]))
print (df)

           groupid  a    b    new
India            1  2    1  False
France           1  2    2  False
England          1  2    3  False
Germany          4  1  612  False
UK               4  1  612  False
USA              4  1    1  False
Indonesia        3  2    1   True

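Old solution:

You can use transform with nunique to get a new Series of the same size as the original df, compare it with 1 to check that the group has a single unique value, and then chain a second condition that compares b with 1: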

mask = (df.groupby('groupid')['b'].transform('nunique') == 1) & (df['b'] == 1)
print (mask)
India        False
France       False
England      False
Germany      False
UK           False
USA          False
Indonesia     True
Name: b, dtype: bool

For lists of the unique group IDs:

i = df.loc[mask, 'groupid'].unique().tolist()
print (i)
[3]

j = df.loc[~mask, 'groupid'].unique().tolist()
print (j)
[1, 4]

Details:

print (df.groupby('groupid')['b'].transform('nunique'))
India        3
France       3
England      3
Germany      2
UK           2
USA          2
Indonesia    1
Name: b, dtype: int64

Answer 2

IIUC, you can apply list and then use .str to check the length:

temp = df.groupby('groupid')['b'].apply(list).to_frame()

temp
                     b
groupid               
1            [1, 2, 3]
3                  [1]
4        [612, 612, 1]

mask = (temp['b'].str.len() == 1) & (temp['b'].str[0] == 1) 

temp[mask].index.tolist()
#[3]
temp[~mask].index.tolist()
#[1, 4]
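
Note that a list-based check and a set-based check can disagree when a group has several rows that are all 1. The sketch below illustrates this with a small hypothetical frame (demo and its group IDs 7 and 8 are made up for illustration and are not part of the question's data):

```python
import pandas as pd

# Hypothetical data: group 7 has two rows with b == 1, group 8 has one
demo = pd.DataFrame({"groupid": [7, 7, 8], "b": [1, 1, 1]})

as_list = demo.groupby("groupid")["b"].apply(list)

# List-based check: exactly one row, and that row is 1
single_row = as_list.apply(lambda v: v == [1])
# Set-based check: every row is 1, however many rows there are
all_ones = as_list.apply(lambda v: set(v) == {1})

print(single_row.tolist())  # [False, True]
print(all_ones.tolist())    # [True, True]
```

Which behaviour is wanted depends on whether "only one entry" means one row or one distinct value.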

Answer 3

I would go with:

# group by groupid, then count how many b entries in each group equal 1
groups = df.groupby("groupid").apply(
    lambda group: len([x for x in group["b"].values.tolist() if x == 1]))
# keep the groups that contain exactly one b equal to 1
groups = groups[groups == 1]
# print the indices of the result (the groupid values)
print(groups.index.values)
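
Counting the entries equal to 1 on its own also matches groups 1 and 4 in the sample data, because each of them contains exactly one 1. A minimal sketch of a variant that also requires the group to have a single row is shown below; the names by_group, n_ones, n_rows and only_one are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [(1, 2, 1), (1, 2, 2), (1, 2, 3), (4, 1, 612), (4, 1, 612), (4, 1, 1), (3, 2, 1)],
    columns=["groupid", "a", "b"],
    index=["India", "France", "England", "Germany", "UK", "USA", "Indonesia"],
)

# Group the boolean "b == 1" flags by groupid
by_group = df["b"].eq(1).groupby(df["groupid"])
n_ones = by_group.sum()   # number of rows with b == 1 in each group
n_rows = by_group.size()  # total number of rows in each group

# A group qualifies only if it has exactly one row and that row is 1
only_one = n_rows.eq(1) & n_ones.eq(1)

print(only_one.index[only_one].tolist())   # [3]
print(only_one.index[~only_one].tolist())  # [1, 4]
```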


3 answers

Answer 1: score 5, accepted, 2017-12-01 13:58:23
Answer 2: score 3, 2017-12-01 14:05:06
Answer 3: score 1, 2017-12-01 14:12:20
