Python: get the number of distinct values in a column grouped by another column

I have a dataframe containing data on 3 car dealerships and the sales they've made. The two columns of interest look like this:

     dealer_id   manufacturer
0    34          Audi
1    34          Audi
2    34          BMW
3    55          Audi
4    55          Ford
5    55          BMW
6    55          Ford
7    12          Mercedes
8    12          Porsche
9    12          Mercedes
10   12          Audi

In short, I want to reduce the dataframe so that there is only one row per manufacturer for each dealer. That way I can see how many distinct manufacturers each dealer has sold cars from. I'm not fussed about which row is kept; it can be the first row of each pair. Before I reset the index, I would want the output to look like this:

    dealer_id    manufacturer
0    34           Audi
2    34           BMW
3    55           Audi
4    55           Ford
5    55           BMW
7    12           Mercedes
8    12           Porsche
10   12           Audi

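Try .drop_duplicates(). With no arguments it compares every column and, by default (keep="first"), keeps the first occurrence of each duplicated row: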

df = df.drop_duplicates()
print(df)

Prints:

    dealer_id manufacturer
0          34         Audi
2          34          BMW
3          55         Audi
4          55         Ford
5          55          BMW
7          12     Mercedes
8          12      Porsche
10         12         Audi
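Here is a self-contained sketch that rebuilds the sample data from the question and applies the same call. It assumes the frame holds only the two columns shown. It also shows the reset_index step the question mentions, using drop=True so the old index is discarded rather than kept as a column.

```python
import pandas as pd

# Sample data from the question (only the two columns of interest)
df = pd.DataFrame({
    "dealer_id": [34, 34, 34, 55, 55, 55, 55, 12, 12, 12, 12],
    "manufacturer": ["Audi", "Audi", "BMW", "Audi", "Ford", "BMW", "Ford",
                     "Mercedes", "Porsche", "Mercedes", "Audi"],
})

# keep="first" is the default: the first occurrence of each
# (dealer_id, manufacturer) pair survives and later repeats are dropped
deduped = df.drop_duplicates(keep="first")
print(deduped)

# Renumber the remaining rows 0..n-1, discarding the old index
result = deduped.reset_index(drop=True)
print(result)
```

The surviving original index labels are 0, 2, 3, 4, 5, 7, 8 and 10, which matches the desired output in the question.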

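Or, if the dataframe has other columns besides these two, restrict the comparison to the two columns of interest with the subset parameter:

df = df.drop_duplicates(subset=["dealer_id", "manufacturer"])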
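The sketch below shows why subset matters and how to get the per-dealer count that the question is ultimately after. The "price" column is hypothetical and is added only to give every row a unique value. With that column present, a plain drop_duplicates() removes nothing. Restricting the comparison with subset removes the repeated pairs. The count itself comes from groupby(...).nunique(), which skips the separate de-duplication step entirely.

```python
import pandas as pd

# Sample data from the question plus a hypothetical "price" column,
# included only to show why subset is needed when extra columns exist
df = pd.DataFrame({
    "dealer_id": [34, 34, 34, 55, 55, 55, 55, 12, 12, 12, 12],
    "manufacturer": ["Audi", "Audi", "BMW", "Audi", "Ford", "BMW", "Ford",
                     "Mercedes", "Porsche", "Mercedes", "Audi"],
    "price": [30000, 32000, 41000, 29000, 18000, 45000, 21000,
              52000, 90000, 48000, 31000],
})

# Every row differs in "price", so comparing all columns drops nothing
all_cols = df.drop_duplicates()

# Comparing only the two key columns removes the repeated pairs
pairs = df.drop_duplicates(subset=["dealer_id", "manufacturer"])

# Number of distinct manufacturers sold by each dealer
# (12 -> 3, 34 -> 2, 55 -> 3; groupby sorts the dealer_id keys)
counts = df.groupby("dealer_id")["manufacturer"].nunique()
print(counts)
```

Calling pairs["dealer_id"].value_counts() on the de-duplicated frame would give the same numbers. nunique simply does it in one step.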
