
How can we count dupes in a column of a data frame and assign the results to a new column in the same data frame?

I have a dataframe column of addresses, some of which repeat, and I want to count the duplicates. I tried the following code.

import pandas as pd

df = pd.read_csv('C:\\my_path\\lat_lon.csv')
# both attempts aggregate per 'Street', so the result is indexed by
# street name rather than by row and misaligns on assignment
df['count'] = df.groupby(['Street'])['Street'].count()
df['count'] = df.groupby(['Street'])[['Street']].count()

That gives me all NaN values in the 'count' column. So, I tried this next.

df = df.groupby(['Street']).size().reset_index(name='count')

That gives me 'Street' and 'count', but all the other columns are dropped. I also tried pivoting the data, and the counts are right, but I really want the counts in a new column in the original data frame. In Excel, this would be a COUNTIF.
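For the record, the row-aligned count the question asks for can be had with `groupby(...).transform('count')`, which returns a result indexed like the original rows; a minimal sketch, with made-up rows standing in for `lat_lon.csv`:

```python
import pandas as pd

# made-up sample rows standing in for lat_lon.csv
df = pd.DataFrame({
    "Street": ["221B Baker Street", "10 Downing Street", "221B Baker Street"],
    "Lat": [51.52, 51.50, 51.52],
})

# transform('count') returns a Series aligned to the original row index,
# so it assigns cleanly into a new column (like Excel's COUNTIF)
df["count"] = df.groupby("Street")["Street"].transform("count")
print(df)
```

Because `transform` broadcasts the per-group count back to every row of the group, no columns are lost and the original index is preserved.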

How about this:

import random

import pandas as pd
from collections import Counter

# draw 50 random addresses; a list avoids reusing the exhausted generator
addresses = [random.choice(["221B Baker Street", "10 Downing Street",
                            "Arc de Triomphe - Champs-Élysées"]) for _ in range(50)]
df = pd.DataFrame(data={"addresses": addresses})
Counter(df["addresses"])

On one run it gives (the exact counts vary with the random draw):

Counter({'221B Baker Street': 22, 'Arc de Triomphe - Champs-Élysées': 15, '10 Downing Street': 13})
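To get those per-address counts into a new column of the same frame, which is what the question asks for, the `Counter` (a `dict` subclass) can be mapped back onto the column; a sketch along the same lines:

```python
import random
import pandas as pd
from collections import Counter

addresses = [random.choice(["221B Baker Street", "10 Downing Street",
                            "Arc de Triomphe - Champs-Élysées"]) for _ in range(50)]
df = pd.DataFrame({"addresses": addresses})

# Counter is a dict subclass, so Series.map can look each address up in it,
# giving every row its group's count in a new column of the original frame
df["count"] = df["addresses"].map(Counter(df["addresses"]))
print(df.head())
```

This is equivalent to `df["addresses"].map(df["addresses"].value_counts())`, just using the `Counter` already built in the answer.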
