Suppose I have a pandas DataFrame like the one below:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df["person"] = ["p1", "p2", "p1", "p3", "p3", "p2", "p2", "p1", "p3", "p1",
                "p1", "p2", "p2", "p1", "p3"]
df["type"] = ["a", "a", "a", "a", "b", "a", "a", "b", "b", "b", "a", "a",
              "b", "a", "b"]
df["value"] = np.random.random(15)
bins = [0, 0.25, 0.5, 0.75, 1]
# Labels such as "0.0-0.25", one per half-open interval [left, right)
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]
df["bin"] = pd.cut(df["value"], bins=bins, labels=labels, right=False)
I want to add a new column, "counter", that holds the number of rows in each ("person", "type") group. From browsing SO I found the following line of code, which works, but only if the dataframe does not include the last column, "bin". How can I add the "counter" column to a dataframe that also includes "bin"? Thank you in advance!
df["counter"] = df.groupby(["person", "type"], as_index = False).transform("count")
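Select a single column before calling transform so that it returns a Series. Without the selection, transform returns one column for each of "value" and "bin", and that cannot be assigned to the single column "counter". Just change it to
df["counter"] = df.groupby(["person", "type"], as_index=False)["value"].transform("count")
and you'll get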
    person  type     value       bin  counter
 0      p1     a  0.134629  0.0-0.25        4
 1      p2     a  0.997557  0.75-1.0        4
 2      p1     a  0.911967  0.75-1.0        4
 3      p3     a  0.278438  0.25-0.5        1
 4      p3     b  0.539296  0.5-0.75        3
 5      p2     a  0.722150  0.5-0.75        4
 6      p2     a  0.724028  0.5-0.75        4
 7      p1     b  0.989627  0.75-1.0        2
 8      p3     b  0.978790  0.75-1.0        3
 9      p1     b  0.197428  0.0-0.25        2
10      p1     a  0.330113  0.25-0.5        4
11      p2     a  0.806856  0.75-1.0        4
12      p2     b  0.430026  0.25-0.5        1
13      p1     a  0.265003  0.25-0.5        4
14      p3     b  0.037202  0.0-0.25        3
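To see why the column selection matters, here is a minimal sketch that rebuilds the question's dataframe and compares the two calls. The seeded generator (`np.random.default_rng(0)`) and the column name `counter_size` are assumptions added for illustration and are not part of the original post. The random values will differ from the table above, but the counts depend only on "person" and "type".

```python
import numpy as np
import pandas as pd

# Same data as in the question; the seeded generator is an assumption
# made here so that repeated runs produce the same "value" column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "person": ["p1", "p2", "p1", "p3", "p3", "p2", "p2", "p1", "p3", "p1",
               "p1", "p2", "p2", "p1", "p3"],
    "type": ["a", "a", "a", "a", "b", "a", "a", "b", "b", "b", "a", "a",
             "b", "a", "b"],
})
df["value"] = rng.random(15)
bins = [0, 0.25, 0.5, 0.75, 1]
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]
df["bin"] = pd.cut(df["value"], bins=bins, labels=labels, right=False)

grouped = df.groupby(["person", "type"])

# Without a column selection, transform returns one column per non-key
# column, so the result has two columns and cannot be assigned to the
# single column "counter".
wide = grouped.transform("count")
print(list(wide.columns))  # ['value', 'bin']

# Selecting one column first yields a Series aligned with df's index.
df["counter"] = grouped["value"].transform("count")

# "count" skips missing values in the selected column; "size" counts
# every row in the group. "counter_size" is a hypothetical column name.
df["counter_size"] = grouped["value"].transform("size")

print(df)
```

With no missing values in "value", the two columns agree. If "value" could contain NaN, "size" is probably the safer choice for a pure row count.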