Trying to create a test DataFrame with combinations of some values
I have 8 columns. col_1 to col_5 can each hold either a valid value or -1, which stands for "any value", and for each column I have a list of the values it can take. The other 3 columns, col_6 to col_8, can take any valid value defined in their own lists.
I want to create all combinations of col_1 to col_5 being -1 and fill the remaining columns with random valid values. col_6 to col_8 can take any random value from their lists.
Example with 2 columns not having -1 and 2 columns having -1:
Example with 2 columns not having -1 and 3 columns having -1:
Valid means any value that is sampled from the list of values.
In my case, I want rows that have -1 in each of the column sets below, with randomly sampled values in the other columns:
[(),
('col_1',),
('col_2',),
('col_3',),
('col_4',),
('col_5',),
('col_1', 'col_2'),
('col_1', 'col_3'),
('col_1', 'col_4'),
('col_1', 'col_5'),
('col_2', 'col_3'),
('col_2', 'col_4'),
('col_2', 'col_5'),
('col_3', 'col_4'),
('col_3', 'col_5'),
('col_4', 'col_5'),
('col_1', 'col_2', 'col_3'),
('col_1', 'col_2', 'col_4'),
('col_1', 'col_2', 'col_5'),
('col_1', 'col_3', 'col_4'),
('col_1', 'col_3', 'col_5'),
('col_1', 'col_4', 'col_5'),
('col_2', 'col_3', 'col_4'),
('col_2', 'col_3', 'col_5'),
('col_2', 'col_4', 'col_5'),
('col_3', 'col_4', 'col_5'),
('col_1', 'col_2', 'col_3', 'col_4'),
('col_1', 'col_2', 'col_3', 'col_5'),
('col_1', 'col_2', 'col_4', 'col_5'),
('col_1', 'col_3', 'col_4', 'col_5'),
('col_2', 'col_3', 'col_4', 'col_5'),
('col_1', 'col_2', 'col_3', 'col_4', 'col_5')]
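The list above is every subset of col_1 to col_5, 2**5 = 32 tuples including the empty one. As a minimal sketch, it could be generated with `itertools.combinations`; the names `minus_one_cols` and `subsets` are illustrative and not from the original post.

```python
from itertools import combinations

# Columns that may hold -1 (names taken from the question).
minus_one_cols = ['col_1', 'col_2', 'col_3', 'col_4', 'col_5']

# Every subset, ordered by size: the empty tuple, then 1-tuples, 2-tuples, ...
subsets = [
    combo
    for size in range(len(minus_one_cols) + 1)
    for combo in combinations(minus_one_cols, size)
]

print(len(subsets))   # 32
print(subsets[:3])    # [(), ('col_1',), ('col_2',)]
```

Each tuple names the columns that should be set to -1 in one row.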
Use:
import numpy as np
import pandas as pd
from itertools import product

col_1_li = 'a,b,c'.split(',')
col_2_li = 'd,e,f'.split(',')
col_3_li = 'g,h,i'.split(',')
col_4_li = 'j,k,l'.split(',')

# columns that can contain -1
L1 = [col_1_li, col_2_li]
# columns that never contain -1
L2 = [col_3_li, col_4_li]

# all True/False patterns, one flag per -1 column (True = keep value, False = set to -1)
c = list(product([True, False], repeat=len(L1)))
print(c)
# [(True, True), (True, False), (False, True), (False, False)]

# random values for every column, one row per pattern
L = [np.random.choice(x, size=len(c)) for x in L1 + L2]

# build the DataFrame; columns are named col_0, col_1, ...
df = pd.DataFrame(dict(enumerate(L))).add_prefix('col_')

# set -1 wherever the pattern is False
df.iloc[:, :len(L1)] = df.iloc[:, :len(L1)].where(pd.DataFrame(c).add_prefix('col_'), -1)
print(df)
# example output (sampled values vary between runs):
#   col_0 col_1 col_2 col_3
# 0     b     e     i     k
# 1     a    -1     g     k
# 2    -1     d     g     l
# 3    -1    -1     h     l
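The -1 values above come from `DataFrame.where`, which keeps a value where the mask is True and substitutes the second argument where it is False. Here is a small sketch with fixed values so the result is reproducible. The names `values`, `mask` and `masked` are illustrative, and `dtype=object` is an assumption made here so that strings and -1 can share a column.

```python
import pandas as pd

# Fixed values instead of random samples, so the result is reproducible.
values = pd.DataFrame({'col_0': ['a', 'b'], 'col_1': ['d', 'e']}, dtype=object)

# True = keep the sampled value, False = replace it with -1.
mask = pd.DataFrame({'col_0': [True, False], 'col_1': [False, False]})

masked = values.where(mask, -1)
print(masked)
#   col_0 col_1
# 0     a    -1
# 1    -1    -1
```

The mask and the data must share the same index and column labels, which is why the answer applies `add_prefix('col_')` to the pattern frame as well.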
import numpy as np
import pandas as pd
from itertools import product

col_1_li = 'a,b,c'.split(',')
col_2_li = 'd,e,f'.split(',')
col_3_li = 'g,h,i'.split(',')
col_4_li = 'j,k,l'.split(',')
col_5_li = 'x,y,z'.split(',')

# columns that can contain -1
L1 = [col_1_li, col_2_li, col_5_li]
# columns that never contain -1
L2 = [col_3_li, col_4_li]

# all True/False patterns, one flag per -1 column (True = keep value, False = set to -1)
c = list(product([True, False], repeat=len(L1)))
print(c)
# [(True, True, True), (True, True, False), (True, False, True),
#  (True, False, False), (False, True, True), (False, True, False),
#  (False, False, True), (False, False, False)]

# random values for every column, one row per pattern
L = [np.random.choice(x, size=len(c)) for x in L1 + L2]

# build the DataFrame; columns are named col_0, col_1, ...
df = pd.DataFrame(dict(enumerate(L))).add_prefix('col_')

# set -1 wherever the pattern is False
df.iloc[:, :len(L1)] = df.iloc[:, :len(L1)].where(pd.DataFrame(c).add_prefix('col_'), -1)
print(df)
# example output (sampled values vary between runs):
#   col_0 col_1 col_2 col_3 col_4
# 0     b     d     y     i     k
# 1     b     f    -1     h     l
# 2     c    -1     z     g     k
# 3     a    -1    -1     h     k
# 4    -1     e     x     h     l
# 5    -1     e    -1     g     k
# 6    -1    -1     y     g     j
# 7    -1    -1    -1     h     k
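For the full 8-column case from the question, one possible sketch builds one row per subset of col_1 to col_5 and samples everything else at random. This is a variation on the approach above, not the answer's own code. The value lists in `values`, the seed, and the names `rng`, `minus_one_cols` and `rows` are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Hypothetical value lists; replace them with the real ones.
values = {
    'col_1': ['a', 'b', 'c'],
    'col_2': ['d', 'e', 'f'],
    'col_3': ['g', 'h', 'i'],
    'col_4': ['j', 'k', 'l'],
    'col_5': ['x', 'y', 'z'],
    'col_6': ['m', 'n'],
    'col_7': ['o', 'p'],
    'col_8': ['q', 'r'],
}
minus_one_cols = ['col_1', 'col_2', 'col_3', 'col_4', 'col_5']

rows = []
for size in range(len(minus_one_cols) + 1):
    for combo in combinations(minus_one_cols, size):
        # Sample one random valid value for every column...
        row = {col: rng.choice(choices) for col, choices in values.items()}
        # ...then overwrite the columns in this subset with -1.
        row.update(dict.fromkeys(combo, -1))
        rows.append(row)

df = pd.DataFrame(rows)
print(df.shape)  # (32, 8)
```

Each of the 32 rows corresponds to one tuple in the list from the question. To get several random rows per -1 pattern, the inner body can be repeated for each subset.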