
How can we count dupes in a column of a data frame and assign the results to a new column in the same data frame?

I have a dataframe column of addresses, some of which repeat, and I want to count the duplicates. I tried the following code.

import pandas as pd

df = pd.read_csv('C:\\my_path\\lat_lon.csv')
# both attempts aggregate per 'Street', so the result is indexed by
# street name rather than by row and misaligns on assignment
df['count'] = df.groupby(['Street'])['Street'].count()
df['count'] = df.groupby(['Street'])[['Street']].count()

That gives me all NaN values in the 'count' column. So, I tried this next.

df = df.groupby(['Street']).size().reset_index(name='count')

That gives me 'Street' and 'count', but all the other columns are dropped. I also tried pivoting the data, and the counts are right, but I really want the counts in a new column in the original data frame. In Excel, this would be a COUNTIF.
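For the record, the row-aligned count the question asks for can be had with `groupby(...).transform('count')`, which returns a result indexed like the original rows; a minimal sketch, with made-up rows standing in for `lat_lon.csv`:

```python
import pandas as pd

# made-up sample rows standing in for lat_lon.csv
df = pd.DataFrame({
    "Street": ["221B Baker Street", "10 Downing Street", "221B Baker Street"],
    "Lat": [51.52, 51.50, 51.52],
})

# transform('count') returns a Series aligned to the original row index,
# so it assigns cleanly into a new column (like Excel's COUNTIF)
df["count"] = df.groupby("Street")["Street"].transform("count")
print(df)
```

Because `transform` broadcasts the per-group count back to every row of the group, no columns are lost and the original index is preserved.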

How about this:

import random

import pandas as pd
from collections import Counter

# draw 50 random addresses; a list avoids reusing the exhausted generator
addresses = [random.choice(["221B Baker Street", "10 Downing Street",
                            "Arc de Triomphe - Champs-Élysées"]) for _ in range(50)]
df = pd.DataFrame(data={"addresses": addresses})
Counter(df["addresses"])

On one run it gives (the exact counts vary with the random draw):

Counter({'221B Baker Street': 22, 'Arc de Triomphe - Champs-Élysées': 15, '10 Downing Street': 13})
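To get those per-address counts into a new column of the same frame, which is what the question asks for, the `Counter` (a `dict` subclass) can be mapped back onto the column; a sketch along the same lines:

```python
import random
import pandas as pd
from collections import Counter

addresses = [random.choice(["221B Baker Street", "10 Downing Street",
                            "Arc de Triomphe - Champs-Élysées"]) for _ in range(50)]
df = pd.DataFrame({"addresses": addresses})

# Counter is a dict subclass, so Series.map can look each address up in it,
# giving every row its group's count in a new column of the original frame
df["count"] = df["addresses"].map(Counter(df["addresses"]))
print(df.head())
```

This is equivalent to `df["addresses"].map(df["addresses"].value_counts())`, just using the `Counter` already built in the answer.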
