简体   繁体   中英

Pandas: If two or more rows have same values in particular column then get its count and add to next row

Sort data frame by values of 5th column ["Card"] and add a new row after each card nos with its count. After sorting values how can I add a new row with Total:

Dataframe looks something like this

在此处输入图像描述

This is how I want output data frame

在此处输入图像描述

You can give this a try:

import pandas as pd

# create dummy df
card = ["2222","2222","1111","2222","1111","3333"]
name = ["Ed", "Ed",  "John", "Ed", "John", "Kevin"]
phone = ["1##-###-####", "1##-###-####", "2##-###-####", "1##-###-####", "2##-###-####", "3##-###-####"]
df = pd.DataFrame({"Name":name, "Phone":phone, "Card":card})

# sort by Card value
df = df.sort_values(by=["Card"]).reset_index(drop=True)

# Groupby the Card value, count them, then insert a new row based on that count
index = 0
line = []
for x in df.groupby("Card").size():
    index += x
    line.append(pd.DataFrame({"Name": "", "Phone":"", "Card": str(x)}, index=[index]))
df = df.append(line, ignore_index=False)
df = df.sort_values(by=["Card"]).sort_index().reset_index(drop=True)
df

Output:

    Name    Phone           Card
0   Ed      1##-###-####    1111
1   Ed      1##-###-####    1111
2   Ed      1##-###-####    1111
3                              3
4   John    2##-###-####    2222
5   John    2##-###-####    2222
6                              2
7   Kevin   3##-###-####    3333
8                              1

Edit ~~~~

Due to OP's use of string for card numbers, an edit had to be made to account for naturally sorting string ints

import pandas as pd
from natsort import natsort_keygen ##### Now needed because OP has Card numbers as strings

# create dummy df ##############
card = ["1111", "2222", "3333", "4444", "5555", "6666", "7777", "8888"]
name = ["Ed", "John", "Jake", "Mike", "Liz", "Anne", "Deb", "Steph"]
phone = ["1###", "2###", "3###", "4###", "5###", "6###", "7###", "8###"]
dfList = [a for a in zip(name, phone, card)]
dfList = [dfList[random.randrange(len(dfList))] for i in range(50)]
df = pd.DataFrame(dfList, columns=["Name", "Phone", "Card"])
################################

# sort by Card value
df = df.sort_values(by=["Card"]).reset_index(drop=True)

# Groupby the Card value, count them, then insert a new row based on that count
index = 0
line = []
for x in df.groupby("Card").size():
    index += x
    line.append(pd.DataFrame({"Name": "", "Phone":"", "Card": str(x)}, index=[index-1]))
df = pd.concat([df, pd.concat(line)], ignore_index=False)

# Create an Index column to be used in the by pandas sort_values
df["Index"] = df.index

# Sort the values first by index then by card number, use "natsort_keygen()" to naturally sort ints that are strings
df = df.sort_values(by = ['Index', 'Card'], key=natsort_keygen(), ascending = [True, False]).reset_index(drop=True).drop(["Index"], axis=1)

I'm not sure if this is the best way but it worked for me.


list_of_df = [v for k, v in df.groupby("card")]
for i in enumerate(list_of_df):
    list_of_df[i[0]] = list_of_df[i[0]].append({"card":str(list_of_df[i[0]].shape[0])}, ignore_index=True)

final_df = pd.concat(list_of_df)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM