Divide a group into n and add block numbers for each group in Python
I have the following table:
ColumnA | ColumnB |
---|---|
A | 12 |
B | 32 |
C | 44 |
D | 76 |
E | 99 |
F | 123 |
G | 65 |
H | 87 |
I | 76 |
J | 231 |
k | 80 |
l | 55 |
m | 27 |
n | 67 |
I would like to divide this table into 'n' groups (n = 4 here) and add another column with the group name. The output should look like the following:
ColumnA | ColumnB | ColumnC |
---|---|---|
A | 12 | 1 |
B | 32 | 1 |
C | 44 | 1 |
D | 76 | 1 |
E | 99 | 2 |
F | 123 | 2 |
G | 65 | 2 |
H | 87 | 2 |
I | 76 | 3 |
J | 231 | 3 |
k | 80 | 3 |
l | 55 | 4 |
m | 27 | 4 |
n | 67 | 4 |
What I have tried so far:
TGn = 4
idx = set(df.index // TGn)
treatment_groups = [i for i in range(1, n+1)]
df['columnC'] = (df.index // TGn).map(dict(zip(idx, treatment_groups)))
This does not split the groups properly, and I am not sure where I went wrong. How do I correct it?
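One way to see what goes wrong: `df.index // TGn` cuts the index into blocks of `TGn` rows, so `TGn` acts as a group size rather than a group count. The following is a minimal sketch, assuming the default integer index and the 14-row table above; the names `block_ids` and `sizes` are illustrative only.

```python
import pandas as pd

# The 14-row table from the question, with the default RangeIndex 0..13.
df = pd.DataFrame({
    "ColumnA": list("ABCDEFGHIJklmn"),
    "ColumnB": [12, 32, 44, 76, 99, 123, 65, 87, 76, 231, 80, 55, 27, 67],
})

TGn = 4

# Floor-dividing the index by TGn labels consecutive blocks of TGn rows.
block_ids = (df.index // TGn).tolist()

# Number of rows that end up in each block, in block order.
sizes = pd.Series(block_ids).value_counts().sort_index().tolist()

print(block_ids)
print(sizes)
```

With 14 rows this happens to give four blocks, but of sizes 4, 4, 4, 2 rather than the 4, 4, 3, 3 of the desired output. Note also that the attempt refers to `n`, which is never defined (only `TGn` is).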
Assuming that your sample size is exactly divisible by n (i.e. sample_size % n is 0):
import numpy as np
groups = range(1,n+1)
df['columnC'] = np.repeat(groups,int(len(df)/n))
If your sample size is not exactly divisible by n (i.e. sample_size % n is not 0):
# Assigning the remaining rows to random groups
# (randint's high is exclusive, so n + 1 allows group n to be drawn too)
df['columnC'] = np.concatenate(
    [np.repeat(groups, len(df) // n),
     np.random.randint(1, high=n + 1, size=len(df) % n, dtype=int)])
# Assigning the remaining rows to group 'm'
# (m is the label of a group you choose and must be set beforehand)
df['columnC'] = np.concatenate(
    [np.repeat(groups, len(df) // n),
     np.repeat([m], len(df) % n)])
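The desired output in the question uses group sizes 4, 4, 3, 3, so the leftover rows are spread over the first groups rather than assigned randomly or to a single group. The following is a minimal sketch of that variant, assuming `df` is the 14-row table from the question and `n = 4`; the helper name `balanced_group_labels` is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

def balanced_group_labels(num_rows, n):
    """Return labels 1..n; the first (num_rows % n) groups get one extra row."""
    base, extra = divmod(num_rows, n)
    # Per-group row counts, e.g. 14 rows and n = 4 give [4, 4, 3, 3].
    sizes = [base + 1] * extra + [base] * (n - extra)
    # Repeat each label by its own group size.
    return np.repeat(np.arange(1, n + 1), sizes)

# The 14-row table from the question.
df = pd.DataFrame({
    "ColumnA": list("ABCDEFGHIJklmn"),
    "ColumnB": [12, 32, 44, 76, 99, 123, 65, 87, 76, 231, 80, 55, 27, 67],
})

n = 4
df["ColumnC"] = balanced_group_labels(len(df), n)
print(df["ColumnC"].tolist())
```

Because the labels depend only on row position, this assumes the rows are already in the order in which they should be grouped.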