The dataset I am using looks like this. It is a video captioning dataset with the captions stored under the column 'Description'.
Video_ID Description
mv89psg6zh4 A bird is bathing in a sink.
mv89psg6zh4 A faucet is running while a bird stands.
mv89psg6zh4 A bird gets washed.
mv89psg6zh4 A parakeet is taking a shower in a sink.
mv89psg6zh4 The bird is taking a bath under the faucet.
mv89psg6zh4 A bird is standing in a sink drinking water.
R2DvpPTfl-E PLAYING GAME ON LAPTOP.
R2DvpPTfl-E THE MAN IS WATCHING LAPTOP.
l7x8uIdg2XU A woman is pouring ingredients into a bowl.
l7x8uIdg2XU A woman is adding milk to some pasta.
l7x8uIdg2XU A person adds ingredients to pasta.
l7x8uIdg2XU the girls are doing the cooking.
However, the number of captions per video varies and is not uniform.
I want to extract one row per unique Video_ID and collect these rows into a new dataframe, while deleting those same rows from the existing dataframe.
The result I want should look like this:
Dataframe 1-
Video_ID Description
mv89psg6zh4 A faucet is running while a bird stands.
mv89psg6zh4 A bird gets washed.
mv89psg6zh4 A parakeet is taking a shower in a sink.
mv89psg6zh4 The bird is taking a bath under the faucet.
mv89psg6zh4 A bird is standing in a sink drinking water.
R2DvpPTfl-E THE MAN IS WATCHING LAPTOP.
l7x8uIdg2XU A woman is adding milk to some pasta.
l7x8uIdg2XU A person adds ingredients to pasta.
l7x8uIdg2XU the girls are doing the cooking.
Dataframe 2-
Video_ID Description
mv89psg6zh4 A bird is bathing in a sink.
R2DvpPTfl-E PLAYING GAME ON LAPTOP.
l7x8uIdg2XU A woman is pouring ingredients into a bowl.
So the rows are essentially moved out of the existing dataframe into a new one.
You can use groupby() to sample one index label per Video_ID:

import pandas as pd

# pick one random row index for each unique Video_ID
s = df.index.to_series().groupby(df['Video_ID']).apply(lambda x: x.sample(n=1))

# one random caption per video (Dataframe 2)
df2 = df.loc[s.values]

# the remaining captions (Dataframe 1); drop() returns a copy, so assign it
df1 = df.drop(s.values)
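If you are on pandas 1.1 or newer, `GroupBy.sample` does the per-group sampling directly, which avoids the index-series detour. A minimal, self-contained sketch (the small dataframe below is illustrative, not your full data):

```python
import pandas as pd

df = pd.DataFrame({
    'Video_ID': ['mv89psg6zh4', 'mv89psg6zh4', 'R2DvpPTfl-E',
                 'R2DvpPTfl-E', 'l7x8uIdg2XU'],
    'Description': ['A bird is bathing in a sink.',
                    'A bird gets washed.',
                    'PLAYING GAME ON LAPTOP.',
                    'THE MAN IS WATCHING LAPTOP.',
                    'the girls are doing the cooking.'],
})

# one random caption per Video_ID (Dataframe 2)
sampled = df.groupby('Video_ID').sample(n=1)

# everything else (Dataframe 1): drop the sampled rows by index label
rest = df.drop(sampled.index)
```

Because `sampled` keeps the original index labels, `df.drop(sampled.index)` cleanly splits the data: every row ends up in exactly one of the two dataframes.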