
Is there an efficient way to write this code in Python?

I want to write this SQL query in Python.

proc sql;
select count(distinct ID_1)
from DATA
where ID_1 = ID_2 and ID_type in ("11","23","46");
quit;

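I can do this in three steps:

import numpy as np
# Keep x where the filter matches; otherwise mark the row as NaN
a = [x if x == y and z in ("11", "23", "46") else np.nan for x, y, z in zip(DATA['x'], DATA['y'], DATA['z'])]
# Drop the NaN placeholders, then count the distinct values that remain
a = [i for i in a if str(i) != 'nan']
len(np.unique(a))

Is there a more efficient way to write the same code?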

Most common SQL operations translate easily to Python with pandas:

DATA[(DATA.ID_1 == DATA.ID_2) & (DATA.ID_type.isin(["11", "23", "46"]))].ID_1.nunique()

Read the introduction to pandas for more.
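
To check the boolean-mask version against the SQL, here is a small self-contained sketch. The sample rows are made up for illustration; only the column names (ID_1, ID_2, ID_type) come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's SQL.
DATA = pd.DataFrame({
    "ID_1":    [1, 1, 2, 3, 4, 5],
    "ID_2":    [1, 1, 2, 9, 4, 5],
    "ID_type": ["11", "11", "23", "46", "99", "46"],
})

# WHERE ID_1 = ID_2 AND ID_type IN ("11", "23", "46")
mask = (DATA["ID_1"] == DATA["ID_2"]) & DATA["ID_type"].isin(["11", "23", "46"])

# COUNT(DISTINCT ID_1) over the rows that pass the filter
distinct_count = DATA.loc[mask, "ID_1"].nunique()
print(distinct_count)  # 3 (the distinct IDs 1, 2 and 5)
```

By default nunique() skips NaN values, which lines up with how COUNT(DISTINCT ...) ignores NULLs in SQL.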

A different take, filtering with the DataFrame.query method:

DATA.query('ID_1 == ID_2 and ID_type.isin(["11", "23", "46"])').ID_1.nunique()
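
If the isin call inside the query string gives trouble in your pandas version, the same filter can probably be written with the in operator that query supports, referencing a local list with @. This is a sketch on made-up sample rows; the variable name wanted_types is an assumption, not part of the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's SQL.
DATA = pd.DataFrame({
    "ID_1":    [1, 1, 2, 3, 4, 5],
    "ID_2":    [1, 1, 2, 9, 4, 5],
    "ID_type": ["11", "11", "23", "46", "99", "46"],
})

# Hypothetical helper list; "@" lets query() read a local Python variable.
wanted_types = ["11", "23", "46"]

# Same filter as the SQL WHERE clause, then COUNT(DISTINCT ID_1)
count_in = DATA.query("ID_1 == ID_2 and ID_type in @wanted_types")["ID_1"].nunique()
print(count_in)  # 3
```

Keeping the list in a variable also makes it easy to reuse the same set of types in other filters.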


 