How to use distinct and where clause together in Pandas?
I have a dataframe and a list as follows:
```python
import pandas as pd

op1 = pd.DataFrame({
    'subject_id': [1, 1, 2, 3, 4, 4, 5],
    'iid': [21, 22, 23, 24, 26, 26, 27],
    'los': [121, 122, 123, 124, 111, 111, 131],
    'area': ['a', 'a', 'b', 'c', 'd', 'd', 'f'],
    'date': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/6/2017', '1/6/2017', '1/8/2109'],
    'val': [5, 10, 5, 16, 26, 26, 7]
})
sub_list = [1, 2, 3, 4]
```
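I want to check whether the `subject_id` values in `sub_list` exist in `op1`. If they do, I want to get the distinct values of `los`, `iid` and `area` for each such `subject_id` (compare `subject_id` `1` and `4`: the two rows for `4` are exact duplicates, while the two rows for `1` are not).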
I tried the following, but I can't get distinct values across multiple columns:

```python
op1[op1['subject_id'].isin(sub_list)]  # how to use distinct records here?
```
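To see why the plain `isin` filter is not enough on its own, here is a small sketch that rebuilds the sample frame and uses `DataFrame.duplicated` to flag rows that repeat an earlier row across all columns. The names `filtered` and `dup_mask` are introduced here for illustration only.

```python
import pandas as pd

# Sample data from the question
op1 = pd.DataFrame({
    'subject_id': [1, 1, 2, 3, 4, 4, 5],
    'iid': [21, 22, 23, 24, 26, 26, 27],
    'los': [121, 122, 123, 124, 111, 111, 131],
    'area': ['a', 'a', 'b', 'c', 'd', 'd', 'f'],
    'date': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/6/2017', '1/6/2017', '1/8/2109'],
    'val': [5, 10, 5, 16, 26, 26, 7],
})
sub_list = [1, 2, 3, 4]

# The attempt from the question: keeps every matching row, duplicates included
filtered = op1[op1['subject_id'].isin(sub_list)]
print(len(filtered))  # 6: only subject_id 5 is excluded

# duplicated() marks each row that repeats an earlier row on every column
dup_mask = filtered.duplicated()
print(filtered[dup_mask])  # the second row for subject_id 4
```

The two rows for `subject_id` 1 differ in `iid`, `los`, `date` and `val`, so they are not flagged; only the repeated row for `subject_id` 4 is.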
I have to apply this to a million records, so any elegant and efficient solution would be helpful.
I am looking for something like this:

```sql
select distinct subject_id, iid, los, area from op1
where subject_id in [sub_list]
```
I expect my output to look like the following:
If you only intend to return the selected columns, do the following:

```python
result = op1.loc[op1["subject_id"].isin(sub_list), ["subject_id", "los", "iid", "area"]].drop_duplicates()
```
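A more SQL-like spelling of the same filter is possible with `DataFrame.query`, which supports the `in` operator and refers to local Python variables with an `@` prefix. This is a sketch under the question's sample data, not part of the original answer; `cols` is a name introduced here for illustration.

```python
import pandas as pd

op1 = pd.DataFrame({
    'subject_id': [1, 1, 2, 3, 4, 4, 5],
    'iid': [21, 22, 23, 24, 26, 26, 27],
    'los': [121, 122, 123, 124, 111, 111, 131],
    'area': ['a', 'a', 'b', 'c', 'd', 'd', 'f'],
    'date': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/6/2017', '1/6/2017', '1/8/2109'],
    'val': [5, 10, 5, 16, 26, 26, 7],
})
sub_list = [1, 2, 3, 4]

# Columns from the SELECT DISTINCT clause (illustrative name)
cols = ['subject_id', 'iid', 'los', 'area']

# WHERE subject_id IN (...)  ->  query() with `in` and the @-prefixed local list
# SELECT DISTINCT cols       ->  column selection followed by drop_duplicates()
result = op1.query('subject_id in @sub_list')[cols].drop_duplicates()
print(result)
```

It should return the same five rows as the `.loc[...]` one-liner above; whether it is faster on a million rows is not claimed here and would need measuring.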
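I'm not sure how fast this is, but you can try:

```python
# Deduplicate the four columns, then pick the requested ids through the index.
# Note: .loc raises KeyError if any id in sub_list is missing from op1.
result = (op1[['subject_id', 'iid', 'los', 'area']]
          .drop_duplicates(['subject_id', 'iid', 'los', 'area'])
          .set_index('subject_id')
          .loc[sub_list]
          )
```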
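The `set_index(...).loc[sub_list]` approach behaves differently from an `isin` filter when an id in the list does not occur in the data: in pandas 1.0 and later, `.loc` with a list containing a missing label raises `KeyError`, whereas `isin` just matches nothing. The sketch below assumes a hypothetical id `99` that is absent from `op1` and keeps the index-based style by filtering with `Index.isin` instead.

```python
import pandas as pd

op1 = pd.DataFrame({
    'subject_id': [1, 1, 2, 3, 4, 4, 5],
    'iid': [21, 22, 23, 24, 26, 26, 27],
    'los': [121, 122, 123, 124, 111, 111, 131],
    'area': ['a', 'a', 'b', 'c', 'd', 'd', 'f'],
    'date': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/6/2017', '1/6/2017', '1/8/2109'],
    'val': [5, 10, 5, 16, 26, 26, 7],
})
cols = ['subject_id', 'iid', 'los', 'area']
sub_list_missing = [1, 2, 3, 4, 99]  # hypothetical: 99 does not occur in op1

distinct = op1[cols].drop_duplicates().set_index('subject_id')

try:
    distinct.loc[sub_list_missing]
    raised = False
except KeyError:
    # .loc rejects list-likes that contain labels absent from the index
    raised = True

# Filtering the index with isin silently skips the unknown id
result = distinct[distinct.index.isin(sub_list_missing)]
print(result)
```

A side difference: `.loc[sub_list]` returns rows in the order of `sub_list`, while `isin` keeps the frame's original row order.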
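```python
# list_columns_to_distinct: the columns that define a distinct row, e.g. ['subject_id', 'iid', 'los', 'area']
op1[op1['subject_id'].isin(sub_list)].drop_duplicates(subset=list_columns_to_distinct)
```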
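This is essentially a mix of the previous answers:

```python
distCols = ["subject_id", "iid", "los", "area"]
# Filter first, then drop rows that repeat on distCols (all columns are kept)
op1[op1['subject_id'].isin(sub_list)].drop_duplicates(distCols)
```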
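One difference from SQL's `SELECT DISTINCT` is worth noting: `drop_duplicates(subset=...)` compares rows only on the listed columns but still returns every column, keeping the first occurrence by default (`keep='first'`). The sketch below contrasts that with projecting the columns first; `mask`, `all_cols` and `projected` are illustrative names.

```python
import pandas as pd

op1 = pd.DataFrame({
    'subject_id': [1, 1, 2, 3, 4, 4, 5],
    'iid': [21, 22, 23, 24, 26, 26, 27],
    'los': [121, 122, 123, 124, 111, 111, 131],
    'area': ['a', 'a', 'b', 'c', 'd', 'd', 'f'],
    'date': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/6/2017', '1/6/2017', '1/8/2109'],
    'val': [5, 10, 5, 16, 26, 26, 7],
})
sub_list = [1, 2, 3, 4]
distCols = ['subject_id', 'iid', 'los', 'area']

mask = op1['subject_id'].isin(sub_list)

# Deduplicates on distCols only; 'date' and 'val' are still returned
all_cols = op1[mask].drop_duplicates(subset=distCols)

# Closer to SELECT DISTINCT: project the columns first, then deduplicate
projected = op1.loc[mask, distCols].drop_duplicates()

print(all_cols.columns.tolist())
print(projected.columns.tolist())
```

On this data both give five rows because the duplicated `subject_id` 4 rows are identical in every column; if `date` or `val` differed between them, `all_cols` would silently keep whichever row came first.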