
Divide a group into n and add block numbers for each group in Python

I have the following table:

ColumnA ColumnB
A 12
B 32
C 44
D 76
E 99
F 123
G 65
H 87
I 76
J 231
k 80
l 55
m 27
n 67

I would like to divide this table into n groups (n = 4 here) and add another column with the group number. The output should look like the following:

ColumnA ColumnB ColumnC
A 12 1
B 32 1
C 44 1
D 76 1
E 99 2
F 123 2
G 65 2
H 87 2
I 76 3
J 231 3
k 80 3
l 55 4
m 27 4
n 67 4

What I have tried so far:

TGn = 4
idx = set(df.index // TGn)

treatment_groups = [i for i in range(1, n+1)]
df['columnC'] = (df.index // TGn).map(dict(zip(idx, treatment_groups)))

This does not split the groups properly, and I am not sure where I went wrong. How do I correct it?
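To see what the attempt above actually produces, here is a small sketch. It assumes the table is a pandas DataFrame with the default integer index 0..13, which the question does not show. Floor-dividing the index by 4 fixes the block size at 4 rows, not the number of groups at 4, so the last group is left with only two rows.

```python
import pandas as pd

# Sample data copied from the question; the default 0..13 index is an assumption.
df = pd.DataFrame({
    "ColumnA": list("ABCDEFGHIJklmn"),
    "ColumnB": [12, 32, 44, 76, 99, 123, 65, 87, 76, 231, 80, 55, 27, 67],
})

TGn = 4
# index // 4 starts a new block every 4 rows: 0,0,0,0,1,1,1,1,2,2,2,2,3,3
blocks = df.index // TGn + 1

# Count how many rows land in each block.
sizes = pd.Series(blocks).value_counts().sort_index().tolist()
print(sizes)  # [4, 4, 4, 2]
```

The expected output in the question has group sizes 4, 4, 3, 3 instead.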

Assuming that your sample size is exactly divisible by n (i.e. sample_size % n is 0):

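import numpy as np

groups = range(1, n + 1)  # group labels 1..n

# Repeat each label len(df) // n times, so rows fill group 1 first, then group 2, ...
df['columnC'] = np.repeat(groups, len(df) // n)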

If your sample size is not exactly divisible by n (i.e. sample_size % n is not 0):

# Assigning the remaining rows to random groups
# (high is exclusive in np.random.randint, so use n + 1 to include group n)
df['columnC'] = np.concatenate(
                [np.repeat(groups, len(df) // n),
                 np.random.randint(1, high=n + 1, size=len(df) % n, dtype=int)])

# Assigning the remaining rows to group 'm'
df['columnC'] = np.concatenate(
                [np.repeat(groups, len(df) // n),
                 np.repeat([m], len(df) % n)])
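Note that the expected output in the question has group sizes 4, 4, 3, 3: the leftover rows are spread over the first groups, and the rows stay in contiguous blocks. The following is a minimal sketch of one way to get that layout. It is not part of the answer above; `block_labels` is a hypothetical helper name, and the DataFrame is rebuilt from the question's sample data.

```python
import numpy as np
import pandas as pd

def block_labels(n_rows, n_groups):
    """Return contiguous labels 1..n_groups; the first (n_rows % n_groups)
    groups each receive one extra row."""
    base, extra = divmod(n_rows, n_groups)
    sizes = [base + 1 if i < extra else base for i in range(n_groups)]
    # np.repeat accepts one repeat count per label.
    return np.repeat(np.arange(1, n_groups + 1), sizes)

# Sample data copied from the question.
df = pd.DataFrame({
    "ColumnA": list("ABCDEFGHIJklmn"),
    "ColumnB": [12, 32, 44, 76, 99, 123, 65, 87, 76, 231, 80, 55, 27, 67],
})

n = 4
df["ColumnC"] = block_labels(len(df), n)
print(df["ColumnC"].tolist())
# [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```

When len(df) is exactly divisible by n, this reduces to the plain np.repeat case, since every group gets the same size.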
