简体   繁体   中英

Drop duplicates in pandas Dataframe

I have a DataFrame

  Type   Numer   master      width
  xyz    465_0     123        305
  xyz    465_0     123        305
  xyz    465_0     123        305
  xyz    465_0     123        315
  xyz    465_1     123        305
  xyz    465_1     123        305
  xyz    465_1     123        305
  xyz    465_1     123        315
  xyz    465_2     123        305
  xyz    465_2     123        305
  xyz    465_2     123        305
  xyz    465_2     123        315
  xyz    465_3     123        305
  xyz    465_3     123        305
  xyz    465_3     123        305
  xyz    465_3     123        315

From this I need the following DataFrame

  Type   Numer   master      width
  xyz    465_0     123        305
  xyz    465_1     123        305
  xyz    465_2     123        305
  xyz    465_3     123        315

My try is:

df[['Numer1', 'dig']] = df['Numer'].str.split("_", expand=True)
df = df.drop('Numer', axis = 1)
df.drop_duplicates()

But it is not giving me the result. I would like to write it in a generic way, because I have this for multiple types.

Data:

{'Type': ['xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 
          'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz'], 
 'Numer': ['465_0', '465_0', '465_0', '465_0', '465_1', '465_1', '465_1', '465_1', 
           '465_2', '465_2', '465_2', '465_2', '465_3', '465_3', '465_3', '465_3'], 
 'master': [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 123], 
 'width': [305, 305, 305, 315, 305, 305, 305, 315, 305, 305, 305, 315, 305, 305, 305, 315]}

We could use groupby + cumcount to create an group-specific ranking for each "Numer"; then filter the rows where the suffix in "Numer" matches the ranks in groups:

out = df[df['Numer'].str.split('_').str[1].astype(int) == df.groupby('Numer').cumcount()].drop(columns='rank')

Output:

   Type  Numer  master  width
0   xyz  465_0     123    305
5   xyz  465_1     123    305
10  xyz  465_2     123    305
15  xyz  465_3     123    315

What I understood from your question is that you are trying to "collapse" the width pattern in each Numer group into a DataFrame which only has the unique group ( Numer ) values and the same width pattern

One way to do this is using the zip_longest function from the itertools library after reversing the groups -

from itertools import zip_longest
rev_zip_longest = list(zip_longest(*[reversed(df[col].unique()) for col in df.columns]))
# [('xyz', '465_3', 123, 315),
# (None, '465_2', None, 305),
# (None, '465_1', None, None),
# (None, '465_0', None, None)]

df2 = pd.DataFrame(rev_zip_longest)
df2.columns = df.columns
df2 = df2.fillna(method='ffill')
#  Type  Numer  master  width
# 0  xyz  465_3   123.0  315.0
# 1  xyz  465_2   123.0  305.0
# 2  xyz  465_1   123.0  305.0
# 3  xyz  465_0   123.0  305.0


this code works as well:

res = pd.concat([g.take([i.split('_')[1]]) for i,g in df.groupby('Numer')])

print(res)
'''
   Type  Numer  master  width
0   xyz  465_0     123    305
5   xyz  465_1     123    305
10  xyz  465_2     123    305
15  xyz  465_3     123    315

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM