
Randomly merge two dataframes based on condition in Pandas

I have two dataframes of the same length that share a column called post_id. They look like this:

df1:

post_id text
001 some text 1
002 some text 2
003 some text 3
... ...
999 some text 999

df2:

post_id text
001 different text 1
002 different text 2
003 different text 3
... ...
999 different text 999

What I want is a new dataframe with half of its rows randomly selected from df1 and the other half from df2, with every post_id still present and no duplicates. Is there a way to do this short of manually picking rows with iloc?

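If both dataframes have the same columns and the same index, use DataFrame.update with DataFrame.sample. Note that update modifies df1 in place and matches rows by index label, so the sampled half of df2 overwrites the corresponding rows of df1: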

# Overwrite a random half of df1's rows (in place) with the matching rows from df2
df1.update(df2.sample(frac=0.5, replace=False))
print(df1)
   post_id                text
0      1.0    different text 1
1      2.0         some text 2
2      3.0         some text 3
3    999.0  different text 999
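
In the output above, post_id is printed as floats (1.0, 2.0, ...), and df1 itself has been overwritten. If either of those is a problem, a possible alternative is sketched below: it builds a random boolean mask covering exactly half of the rows and concatenates the complementary slices. The sample data, the fixed seed, and the names ids, take_from_df2 and mixed are illustrative assumptions, not part of the original answer, and the sketch assumes both dataframes list the same post_id values in the same row order.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's df1 and df2
ids = [f"{i:03d}" for i in range(1, 11)]
df1 = pd.DataFrame({"post_id": ids, "text": [f"some text {i}" for i in range(1, 11)]})
df2 = pd.DataFrame({"post_id": ids, "text": [f"different text {i}" for i in range(1, 11)]})

# Mark exactly half of the row positions at random
# (the seed is fixed only to make the example reproducible)
rng = np.random.default_rng(0)
take_from_df2 = np.zeros(len(df1), dtype=bool)
take_from_df2[rng.choice(len(df1), size=len(df1) // 2, replace=False)] = True

# Rows where the mask is False come from df1, the rest from df2;
# sort_index restores the original row order
mixed = pd.concat([df1[~take_from_df2], df2[take_from_df2]]).sort_index()
print(mixed)
```

Because each row position is taken from exactly one of the two dataframes, every post_id appears once, and neither df1 nor df2 is modified. Drop the seed argument to get a different split on each run.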


 