Unable to delete the duplicates in CSV

Question

I have a data set in CSV with a field named Episode, which holds data for future sports events. It contains both "INDIA VS PAKISTAN" and "PAKISTAN VS INDIA" for the same date. Is there any way to delete the duplicate?

Thanks in advance


Answer 1

One option is to use the pandas rank method, grouping by the columns that define a duplicate (Column_1 and Column_2 below stand in for your own column names):

df["RANK"] = df.groupby("Column_1")["Column_2"].rank(method="first", ascending=True)

This ranks the rows within each group, so three duplicate rows should be ranked 1, 2 and 3 respectively. From there, take the subset of the dataframe where RANK equals 1, and this will give you a dataframe with no duplicates.
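As a rough sketch of how the rank idea might be applied to the data in the question: the column names Date and Episode, the helper columns match_key and row, and the sample rows are all assumptions for illustration, not taken from the original CSV. Because "INDIA VS PAKISTAN" and "PAKISTAN VS INDIA" are different strings, the sketch first builds an order-independent key, then ranks a numeric row position within each (Date, match_key) group.

```python
import pandas as pd

# Hypothetical sample data; the real CSV columns may differ.
df = pd.DataFrame({
    "Date": ["2019-11-15", "2019-11-15", "2019-11-16"],
    "Episode": ["INDIA VS PAKISTAN", "PAKISTAN VS INDIA", "INDIA VS PAKISTAN"],
})

# Order-independent key: sort the team names on either side of " VS ".
df["match_key"] = df["Episode"].map(lambda s: " VS ".join(sorted(s.split(" VS "))))

# Rank a numeric row position within each group, so the ranking does not
# depend on how pandas handles ranking of string columns.
df["row"] = range(len(df))
df["RANK"] = df.groupby(["Date", "match_key"])["row"].rank(method="first", ascending=True)

# Keep only the first row of each group and drop the helper columns.
deduped = df[df["RANK"] == 1].drop(columns=["match_key", "row", "RANK"])
print(deduped)
```

Here the same fixture on a different date is kept, since the date is part of the grouping key.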

Answer 2

Create a new match column that is the same for both orderings of a fixture, then use drop_duplicates:

import pandas as pd

# sample df
df = pd.DataFrame({'a': [1,1,1,1,1],
                   'b': ['Bulldogs at Aztecs', 'Aztecs at Bulldogs', 'Bearcats at Huskies', 'Huskies at Bearcats', 'something else']})

# sort the words in each string so both orderings produce the same key
df['match'] = [' '.join(sorted(x.split())) for x in df['b'].values]

#    a                    b                match
# 0  1   Bulldogs at Aztecs   Aztecs Bulldogs at
# 1  1   Aztecs at Bulldogs   Aztecs Bulldogs at
# 2  1  Bearcats at Huskies  Bearcats Huskies at
# 3  1  Huskies at Bearcats  Bearcats Huskies at
# 4  1       something else       else something

# drop_duplicates
df.drop_duplicates(['a', 'match'], keep='first').drop(columns='match')

#    a                    b
# 0  1   Bulldogs at Aztecs
# 2  1  Bearcats at Huskies
# 4  1       something else
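A minimal end-to-end sketch of the same approach for the CSV described in the question might look like the following. The Date column, the " VS " separator, the match_key helper and the inline CSV text are assumptions made for illustration; with a real file you would pass its path to pd.read_csv instead of the io.StringIO stand-in. Splitting on the separator rather than on every word keeps multi-word team names together.

```python
import io

import pandas as pd

# Stand-in for the real CSV file; contents are hypothetical.
csv_text = """Date,Episode
2019-11-15,INDIA VS PAKISTAN
2019-11-15,PAKISTAN VS INDIA
2019-11-16,PAKISTAN VS INDIA
2019-11-16,AUSTRALIA VS ENGLAND
"""

df = pd.read_csv(io.StringIO(csv_text))


def match_key(episode: str) -> str:
    """Return a key that is identical for 'A VS B' and 'B VS A'."""
    teams = [team.strip() for team in episode.upper().split(" VS ")]
    return " VS ".join(sorted(teams))


df["match"] = df["Episode"].map(match_key)

# Rows count as duplicates only when both the date and the fixture match.
result = df.drop_duplicates(["Date", "match"], keep="first").drop(columns="match")

# Serialize back to CSV text; pass a file path to to_csv to write a file instead.
csv_out = result.to_csv(index=False)
print(csv_out)
```

Using keep="first" retains whichever ordering appears first in the file for each date.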

2 answers

solution1
1 2019-11-15 20:19:41

solution2
0 ACCEPTED 2019-11-15 21:10:52
