I am trying to select rows from a large dataframe (about 1.5 million rows), called active, based on a combination of two columns in another dataframe, called passive, which has about 30,000 rows. If the combination of the two columns in a row of active matches the same combination in any row of passive, I select that row from active.
Here is the code:
active.loc[(active['userid']+active['orgcity']).isin(passive.userid+passive.city)]
However, this takes a long time. I expected it to already be an improvement over iterating through the rows or using DataFrame.apply. Are there any other ways to speed this up?
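One possible direction, offered as a sketch rather than a measured fix: compare the two columns as a pair instead of building a concatenated string for every row. Concatenation can also produce false matches, since ('ab', 'c') and ('a', 'bc') both become 'abc'. The helper names below (select_by_merge, select_by_multiindex) and the small sample frames are hypothetical, and the sketch assumes the key columns have the same dtype in both frames. Whether either variant is actually faster on the real data would need to be timed.

```python
import pandas as pd


def select_by_merge(active, passive):
    """Keep rows of `active` whose (userid, orgcity) pair occurs in `passive`.

    Note: merge returns a new frame with a fresh integer index.
    """
    # Deduplicate the lookup keys so the inner merge cannot duplicate active rows.
    keys = (
        passive[["userid", "city"]]
        .drop_duplicates()
        .rename(columns={"city": "orgcity"})
    )
    return active.merge(keys, on=["userid", "orgcity"], how="inner")


def select_by_multiindex(active, passive):
    """Same selection via a boolean mask, preserving the index of `active`."""
    active_keys = pd.MultiIndex.from_frame(active[["userid", "orgcity"]])
    passive_keys = pd.MultiIndex.from_frame(passive[["userid", "city"]])
    # isin compares whole (userid, city) tuples, not concatenated strings.
    mask = active_keys.isin(passive_keys)
    return active.loc[mask]


# Tiny made-up frames, for illustration only.
active = pd.DataFrame(
    {
        "userid": ["u1", "u1", "u2", "u3"],
        "orgcity": ["NYC", "LA", "NYC", "SF"],
        "value": [1, 2, 3, 4],
    }
)
passive = pd.DataFrame(
    {"userid": ["u1", "u3", "u3"], "city": ["NYC", "SF", "SF"]}
)

merged = select_by_merge(active, passive)
masked = select_by_multiindex(active, passive)
```

The merge variant is convenient when columns from passive are needed as well; the mask variant keeps the original index of active, which makes it closer to a drop-in replacement for the .loc expression above.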