
Count values, keep duplicates with Pandas

I have a dataset in which column A holds GUIDs (250,000 values). I need to count how many times each GUID appears in that column and add that count as another column in the dataset. The problem is that .value_counts() in pandas returns one entry per distinct GUID, so the duplicates are collapsed. Because I want to align the counts with the original rows, the two don't line up.

import pandas as pd

path = r"D:\Users\cdoyle\Desktop\Final2_.xlsx"
df = pd.read_excel(path)
df = df[['Data BoundingBoxGUID', 'Data Line', 'Data Remove Item:', 'Data Status:', 'Model']]

# One entry per distinct GUID, indexed by the GUID value itself
df2 = df['Data BoundingBoxGUID'].value_counts()

# Does not line up: df is indexed by row label, df2 by GUID
df_output = pd.concat([df, df2], axis=1)

We usually use groupby with transform for this, since it returns a result aligned to the original rows:

df['new'] = df.groupby('Data BoundingBoxGUID')['Data BoundingBoxGUID'].transform('count')
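
As a rough illustration of why this lines up, here is a minimal sketch on made-up data. The short strings stand in for GUIDs, and the frame contents and the new_via_map column are assumptions for the example, not part of the original spreadsheet. It also shows mapping the value_counts() result back onto the column, which should give the same numbers.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data: short strings instead of GUIDs.
df = pd.DataFrame({
    "Data BoundingBoxGUID": ["a", "b", "a", "c", "a", "b"],
    "Model": ["m1", "m2", "m3", "m4", "m5", "m6"],
})

# value_counts() yields one entry per distinct GUID, indexed by the GUID,
# so it has 3 entries here while df has 6 rows.
counts = df["Data BoundingBoxGUID"].value_counts()

# transform('count') broadcasts each group's count back to every row of that group,
# so the result has the same index as df.
df["new"] = df.groupby("Data BoundingBoxGUID")["Data BoundingBoxGUID"].transform("count")

# Alternative: look up each row's GUID in the value_counts() result.
df["new_via_map"] = df["Data BoundingBoxGUID"].map(counts)

print(df)
```

Every row is kept, and each row carries the count for its own GUID, so nothing needs to be re-aligned afterwards.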

