Drop rows in dataframe whose column has more than a certain number of distinct values
I have a sample dataframe, shown below, and I am trying to drop the rows whose cluster_num value occurs only once in the cluster_num column.
import pandas as pd
df = pd.DataFrame([[1,2,3,4,5],[1,3,4,2,5],[1,3,7,9,10],[2,6,2,7,9],[2,2,4,7,0],[3,1,9,2,7],[4,9,5,1,2],[5,8,4,2,1],[5,0,7,1,2],[6,9,2,5,7]])
df.rename(columns={0: "cluster_num", 1: "value_1", 2: "value_2", 3: "value_3", 4: "value_4"}, inplace=True)
# Dropping rows for which cluster_num has only one distinct value
count_dict = df['cluster_num'].value_counts().to_dict()
df['count'] = df['cluster_num'].apply(lambda x : count_dict[x])
df[df['count']>1]
In the example above, the rows where cluster_num equals 3, 4 and 6 are dropped.
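Is there a way to do this without creating a separate column? I need all 5 initial columns (cluster_num, value_1, value_2, value_3, value_4) in the output. The code above gives the rows I want, but the resulting dataframe also contains the extra count column.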
I tried filtering with groupby() and count(), but without success.
groupby / filter
df.groupby('cluster_num').filter(lambda d: len(d) > 1)
duplicated
df[df.duplicated('cluster_num', keep=False)]
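The keep=False argument is what makes the duplicated approach work here. The following is a minimal sketch on the cluster_num values from the question (the variable names are only illustrative): with the default keep='first', the first occurrence of each repeated value is not flagged, so it would be dropped along with the values that occur once.

```python
import pandas as pd

# cluster_num values from the question's dataframe
s = pd.Series([1, 1, 1, 2, 2, 3, 4, 5, 5, 6], name="cluster_num")

# Default keep='first': flags every occurrence except the first one
flag_default = s.duplicated()

# keep=False: flags every occurrence of any value that appears more than once
flag_all = s.duplicated(keep=False)

kept_default = s[flag_default].tolist()
kept_all = s[flag_all].tolist()

print(kept_default)  # [1, 1, 2, 5]
print(kept_all)      # [1, 1, 1, 2, 2, 5, 5]
```

Only the keep=False mask retains all rows for clusters 1, 2 and 5.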
groupby / transform
df[df.groupby('cluster_num')['cluster_num'].transform('size') >= 2]
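As a quick check, the three approaches above can be run side by side on the dataframe from the question. This is only a sketch: the names by_filter, by_duplicated and by_transform are made up for the comparison, and the column names are passed directly to the constructor instead of using rename.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [1, 3, 4, 2, 5], [1, 3, 7, 9, 10], [2, 6, 2, 7, 9],
     [2, 2, 4, 7, 0], [3, 1, 9, 2, 7], [4, 9, 5, 1, 2], [5, 8, 4, 2, 1],
     [5, 0, 7, 1, 2], [6, 9, 2, 5, 7]],
    columns=["cluster_num", "value_1", "value_2", "value_3", "value_4"],
)

# 1. Keep only the groups that contain more than one row
by_filter = df.groupby("cluster_num").filter(lambda d: len(d) > 1)

# 2. Keep rows whose cluster_num appears more than once anywhere in the column
by_duplicated = df[df.duplicated("cluster_num", keep=False)]

# 3. Broadcast each group's size back to its rows and use it as a boolean mask
by_transform = df[df.groupby("cluster_num")["cluster_num"].transform("size") >= 2]

# All three keep the same rows and the original five columns
assert by_filter.equals(by_duplicated)
assert by_filter.equals(by_transform)
print(by_filter)
```

None of the three adds a helper column. The filter version calls the Python lambda once per group, so on a frame with many groups the duplicated or transform versions are likely to be faster, though that is worth measuring on your own data.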