I have a DataFrame with a column df['article_id']. I'm using the to_sql function with SQLAlchemy to insert it into my database. However, sometimes I have duplicate records that I want to remove before inserting.
This is my list:
usedIDs = []
select_st = select([article_table])
res = conn.execute(select_st)
for _row in res:
    clean = int(_row[1])
    usedIDs.append(clean)
usedIDs
With output:
[1202623831,
1747352473,
1748645480,
1759957596,
1811054956,
1812183879,
1816974229,
2450784233,
2579244390,
2580336884]
What I've tried:
df[~df.isin(usedIDs)]
df.drop(usedIDs, axis=0)
Neither of these works. However, when I hardcode the IDs as below, it does work:
df = df[~df.article_id.isin(['1202623831','1747352473'])]
The error is either "unhashable" or "KeyError: not found in axis".
How can I drop the rows from my DataFrame where df['article_id'] is in the usedIDs list?
Using isin is sufficient, as shown here on some sample data:
df
one date
0 1 2019-05-10 06:00:16
1 2 2019-05-10 06:30:21
2 3 2019-05-10 07:00:03
3 4 2019-05-10 06:32:43
4 5 2019-05-10 07:33:31
5 6 2019-05-10 07:37:39:09
6 7 2019-05-10 07:49:01
7 8 2019-05-10 08:52:05
8 9 2019-05-10 08:29:44:10
df = df[~df.one.isin([1,2])]
df
one date
2 3 2019-05-10 07:00:03
3 4 2019-05-10 06:32:43
4 5 2019-05-10 07:33:31
5 6 2019-05-10 07:37:39:09
6 7 2019-05-10 07:49:01
7 8 2019-05-10 08:52:05
8 9 2019-05-10 08:29:44:10
Your hardcoded version works because the IDs in it are strings rather than ints, so they match the type of the values in the article_id column:
df = df[~df.article_id.isin(['1202623831','1747352473'])]
Try converting usedIDs to strings like this:
usedIDs = [str(used_id) for used_id in usedIDs]
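To make the type mismatch concrete, here is a minimal sketch. It assumes article_id is stored as strings (which the hardcoded example in the question suggests) and uses made-up sample rows; the title column and the ID 9999999999 are hypothetical. It shows that isin with integer IDs matches nothing, and two possible ways to align the types.

```python
import pandas as pd

# Hypothetical sample data: article_id held as strings (an assumption
# based on the hardcoded example that worked in the question).
df = pd.DataFrame({
    "article_id": ["1202623831", "1747352473", "9999999999"],
    "title": ["a", "b", "c"],
})

# IDs read back from the database as integers, as in the question.
usedIDs = [1202623831, 1747352473]

# Type mismatch: the string "1202623831" is not equal to the int
# 1202623831, so isin finds no matches and no rows are dropped.
unchanged = df[~df["article_id"].isin(usedIDs)]

# Option 1: convert the IDs to strings so they match the column.
used_as_str = [str(i) for i in usedIDs]
filtered = df[~df["article_id"].isin(used_as_str)]

# Option 2: compare as integers instead. int64 is used because some
# IDs in the question (e.g. 2450784233) do not fit in a 32-bit int.
filtered_int = df[~df["article_id"].astype("int64").isin(usedIDs)]

print(filtered)
```

Checking df['article_id'].dtype first should tell you which side needs converting. Option 2 only works if every value in the column can be parsed as an integer.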