I have the following table:
ColumnA | ColumnB |
---|---|
A | 12 |
B | 32 |
C | 44 |
D | 76 |
E | 99 |
F | 123 |
G | 65 |
H | 87 |
I | 76 |
J | 231 |
k | 80 |
l | 55 |
m | 27 |
n | 67 |
I would like to divide this table into 'n' groups (n = 4 here) and add another column with the group name. The output should look like the following:
ColumnA | ColumnB | ColumnC |
---|---|---|
A | 12 | 1 |
B | 32 | 1 |
C | 44 | 1 |
D | 76 | 1 |
E | 99 | 2 |
F | 123 | 2 |
G | 65 | 2 |
H | 87 | 2 |
I | 76 | 3 |
J | 231 | 3 |
k | 80 | 3 |
l | 55 | 4 |
m | 27 | 4 |
n | 67 | 4 |
What I have tried so far:
TGn = 4
idx = set(df.index // TGn)
treatment_groups = [i for i in range(1, n+1)]
df['columnC'] = (df.index // TGn).map(dict(zip(idx, treatment_groups)))
This does not split the groups properly, and I am not sure where I went wrong. How do I correct it?
Assuming that your sample size is exactly divisible by n (i.e. sample_size % n is 0):
import numpy as np
n = 4  # number of groups
groups = range(1, n + 1)
# Repeat each group label len(df) // n times, in order
df['columnC'] = np.repeat(groups, len(df) // n)
If your sample size is not exactly divisible by n (i.e. sample_size % n is not 0):
# Assigning the remaining rows to random groups
df['columnC'] = np.concatenate(
    [np.repeat(groups, len(df) // n),
     # high is exclusive, so n + 1 is needed for group n to be eligible
     np.random.randint(1, high=n + 1, size=len(df) % n, dtype=int)])
# Assigning the remaining rows to group 'm' (a group number you choose, 1 <= m <= n)
df['columnC'] = np.concatenate(
    [np.repeat(groups, len(df) // n),
     np.repeat([m], len(df) % n)])
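Both variants above put the leftover rows at the end of the table, which does not reproduce the 4, 4, 3, 3 split shown in the question. One possible alternative is `np.array_split`, which splits into n contiguous chunks and gives the first `len(df) % n` chunks one extra row. The following is only a sketch under that assumption: `assign_groups` is a hypothetical helper name, not part of pandas or NumPy, and the DataFrame is rebuilt from the question's sample data.

```python
import numpy as np
import pandas as pd


def assign_groups(df, n, column="ColumnC"):
    """Hypothetical helper: label rows with contiguous group numbers 1..n."""
    # Split the row positions into n contiguous chunks; the first
    # len(df) % n chunks receive one extra row each.
    chunks = np.array_split(np.arange(len(df)), n)
    # Build one label per row: chunk i (1-based) contributes len(chunk) copies of i.
    labels = np.concatenate(
        [np.full(len(chunk), i) for i, chunk in enumerate(chunks, start=1)]
    )
    out = df.copy()
    out[column] = labels
    return out


# Sample data copied from the question
df = pd.DataFrame({
    "ColumnA": list("ABCDEFGHIJklmn"),
    "ColumnB": [12, 32, 44, 76, 99, 123, 65, 87, 76, 231, 80, 55, 27, 67],
})

result = assign_groups(df, 4)
print(result)
```

With 14 rows and n = 4 this yields group sizes of 4, 4, 3 and 3, matching the desired output. The assignment is by row position, so it does not depend on the index values.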