Counting number of occurrences when grouping by two columns

Suppose I have a pandas DataFrame like the one below:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df["person"] = ["p1", "p2", "p1", "p3", "p3", "p2", "p2", "p1", "p3", "p1",
                "p1", "p2", "p2", "p1", "p3"]
df["type"] = ["a", "a", "a", "a", "b", "a", "a", "b", "b", "b", "a", "a",
              "b", "a", "b"]
df["value"] = np.random.random(15)  # random, so the values differ on every run

# Bin "value" into quarter-width, left-closed intervals labelled e.g. "0.0-0.25"
bins = [0, 0.25, 0.5, 0.75, 1]
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]
df["bin"] = pd.cut(df["value"], bins=bins, labels=labels, right=False)
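For reference, here is a small standalone check (not part of the original question) of what the label comprehension and right=False in the setup produce. The sample values passed to pd.cut are made up for illustration.

```python
import pandas as pd

bins = [0, 0.25, 0.5, 0.75, 1]
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]
print(labels)  # ['0.0-0.25', '0.25-0.5', '0.5-0.75', '0.75-1.0']

# right=False makes each bin closed on the left and open on the right, [a, b),
# so 0.25 lands in "0.25-0.5" and a value of exactly 1.0 falls outside every bin.
sample = pd.cut([0.0, 0.25, 0.999], bins=bins, labels=labels, right=False)
print(sample.tolist())  # ['0.0-0.25', '0.25-0.5', '0.75-1.0']

edge = pd.cut([1.0], bins=bins, labels=labels, right=False)
print(pd.isna(edge[0]))  # True
```

Since np.random.random draws from [0, 1), the question's data never hits the excluded right edge.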

I want to add a new column, "counter", holding the number of rows for each combination of "person" and "type". While browsing Stack Overflow I found the line below, which works, but only if the DataFrame does not include the last column, "bin". How can I add the "counter" column when "bin" is also present? Thank you in advance!

df["counter"] = df.groupby(["person", "type"], as_index = False).transform("count")
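To see where the line above breaks, this sketch rebuilds the question's frame and inspects what the unselected transform returns. The seeded generator (default_rng(0)) is an assumption added only so the run is repeatable; as_index is omitted because it does not change the shape of a transform result.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "person": ["p1", "p2", "p1", "p3", "p3", "p2", "p2", "p1", "p3", "p1",
               "p1", "p2", "p2", "p1", "p3"],
    "type": ["a", "a", "a", "a", "b", "a", "a", "b", "b", "b",
             "a", "a", "b", "a", "b"],
    "value": rng.random(15),
})
bins = [0, 0.25, 0.5, 0.75, 1]
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]
df["bin"] = pd.cut(df["value"], bins=bins, labels=labels, right=False)

# Without a column selection, transform runs on every non-grouping column,
# so the result is a DataFrame with one count column per remaining column.
counts = df.groupby(["person", "type"]).transform("count")
print(list(counts.columns))  # ['value', 'bin']

# Assigning a two-column DataFrame to the single label "counter" raises ValueError.
try:
    df["counter"] = counts
except ValueError as exc:
    print("ValueError:", exc)
```

Without "bin", the transform result has only the "value" column, and pandas accepts a one-column DataFrame for a single-column assignment. That is why the line works until "bin" is added.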

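Select a single column before calling transform. Without a selection, transform returns a DataFrame with a count column for every non-grouping column ("value" and "bin"), and that DataFrame cannot be assigned to the single column "counter". So just change the line to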

df["counter"] = df.groupby(["person", "type"], as_index = False)['value'].transform("count")

and you'll get the following (the values are random, so yours will differ, but the counts will be the same):

   person type     value       bin  counter
0      p1    a  0.134629  0.0-0.25        4
1      p2    a  0.997557  0.75-1.0        4
2      p1    a  0.911967  0.75-1.0        4
3      p3    a  0.278438  0.25-0.5        1
4      p3    b  0.539296  0.5-0.75        3
5      p2    a  0.722150  0.5-0.75        4
6      p2    a  0.724028  0.5-0.75        4
7      p1    b  0.989627  0.75-1.0        2
8      p3    b  0.978790  0.75-1.0        3
9      p1    b  0.197428  0.0-0.25        2
10     p1    a  0.330113  0.25-0.5        4
11     p2    a  0.806856  0.75-1.0        4
12     p2    b  0.430026  0.25-0.5        1
13     p1    a  0.265003  0.25-0.5        4
14     p3    b  0.037202  0.0-0.25        3
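One caveat, offered as a sketch rather than as part of the original answer: "count" skips missing values in the selected column, so if "value" ever contained NaN, the counter would undercount that group. transform("size") counts every row regardless of NaN. The tiny frame below is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the (p1, a) group has one NaN in "value".
toy = pd.DataFrame({
    "person": ["p1", "p1", "p2"],
    "type": ["a", "a", "b"],
    "value": [0.1, np.nan, 0.7],
})

keys = ["person", "type"]
# "count" ignores NaN in the selected column ...
toy["n_count"] = toy.groupby(keys)["value"].transform("count")
# ... while "size" counts every row in the group.
toy["n_size"] = toy.groupby(keys)["value"].transform("size")
print(toy)
```

With no missing values, as in the question's data, both give the same counter.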
