Trying to create a test DataFrame with combinations of some values

I have 8 columns. col_1 to col_5 can each hold either a valid value or -1, where -1 means "any value", and I have lists of the values each of these columns can take. The other 3 columns, col_6 to col_8, can take any valid value from their own lists. I want to create every combination of col_1 to col_5 holding -1 and fill the remaining columns with random valid values. col_6 to col_8 can take any random value from their lists.

Example with 2 columns not having -1 and 2 columns having -1:

[image]

Example with 2 columns not having -1 and 3 columns having -1:

[image]

"Valid" means any value sampled from that column's list of values.

In my case I want one row for each set of columns below, with -1 in the listed columns and randomly sampled values in the other columns:

[(),
 ('col_1',),
 ('col_2',),
 ('col_3',),
 ('col_4',),
 ('col_5',),
 ('col_1', 'col_2'),
 ('col_1', 'col_3'),
 ('col_1', 'col_4'),
 ('col_1', 'col_5'),
 ('col_2', 'col_3'),
 ('col_2', 'col_4'),
 ('col_2', 'col_5'),
 ('col_3', 'col_4'),
 ('col_3', 'col_5'),
 ('col_4', 'col_5'),
 ('col_1', 'col_2', 'col_3'),
 ('col_1', 'col_2', 'col_4'),
 ('col_1', 'col_2', 'col_5'),
 ('col_1', 'col_3', 'col_4'),
 ('col_1', 'col_3', 'col_5'),
 ('col_1', 'col_4', 'col_5'),
 ('col_2', 'col_3', 'col_4'),
 ('col_2', 'col_3', 'col_5'),
 ('col_2', 'col_4', 'col_5'),
 ('col_3', 'col_4', 'col_5'),
 ('col_1', 'col_2', 'col_3', 'col_4'),
 ('col_1', 'col_2', 'col_3', 'col_5'),
 ('col_1', 'col_2', 'col_4', 'col_5'),
 ('col_1', 'col_3', 'col_4', 'col_5'),
 ('col_2', 'col_3', 'col_4', 'col_5'),
 ('col_1', 'col_2', 'col_3', 'col_4', 'col_5')]
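
The list above is every subset of col_1 to col_5, ordered by size. As a minimal sketch, it could be generated with itertools.combinations; the names `cols` and `minus_one_sets` are illustrative and do not come from the original post.

```python
from itertools import combinations

# The five columns that are allowed to hold -1.
cols = ['col_1', 'col_2', 'col_3', 'col_4', 'col_5']

# Every subset of cols, from the empty tuple up to all five columns,
# grouped by subset size.
minus_one_sets = [combo
                  for r in range(len(cols) + 1)
                  for combo in combinations(cols, r)]

print(len(minus_one_sets))  # 32, i.e. 2 ** 5
```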

Use:

from itertools import product

import numpy as np
import pandas as pd

col_1_li = 'a,b,c'.split(',')
col_2_li = 'd,e,f'.split(',')
col_3_li = 'g,h,i'.split(',')
col_4_li = 'j,k,l'.split(',')

# columns that can be set to -1
L1 = [col_1_li, col_2_li]
# columns that never hold -1
L2 = [col_3_li, col_4_li]

# all True/False patterns over the -1 columns (True keeps the value, False becomes -1)
c = list(product([True, False], repeat=len(L1)))
print(c)
[(True, True), (True, False), (False, True), (False, False)]

# draw random values for every column, one per pattern
L = [np.random.choice(x, size=len(c)) for x in L1 + L2]

# build the DataFrame; columns are named col_0, col_1, ...
df = pd.DataFrame(dict(enumerate(L))).add_prefix('col_')

# replace values with -1 wherever the pattern is False
df.iloc[:, :len(L1)] = df.iloc[:, :len(L1)].where(pd.DataFrame(c).add_prefix('col_'), -1)

# the values are random, so the exact output varies between runs
print(df)
  col_0 col_1 col_2 col_3
0     b     e     i     k
1     a    -1     g     k
2    -1     d     g     l
3    -1    -1     h     l

# same approach with three columns that can be set to -1
from itertools import product

import numpy as np
import pandas as pd

col_1_li = 'a,b,c'.split(',')
col_2_li = 'd,e,f'.split(',')
col_3_li = 'g,h,i'.split(',')
col_4_li = 'j,k,l'.split(',')
col_5_li = 'x,y,z'.split(',')

# columns that can be set to -1
L1 = [col_1_li, col_2_li, col_5_li]
# columns that never hold -1
L2 = [col_3_li, col_4_li]

# all True/False patterns over the -1 columns (True keeps the value, False becomes -1)
c = list(product([True, False], repeat=len(L1)))
print(c)
[(True, True, True), (True, True, False), (True, False, True),
 (True, False, False), (False, True, True), (False, True, False),
 (False, False, True), (False, False, False)]

# draw random values for every column, one per pattern
L = [np.random.choice(x, size=len(c)) for x in L1 + L2]

# build the DataFrame; columns are named col_0, col_1, ...
df = pd.DataFrame(dict(enumerate(L))).add_prefix('col_')

# replace values with -1 wherever the pattern is False
df.iloc[:, :len(L1)] = df.iloc[:, :len(L1)].where(pd.DataFrame(c).add_prefix('col_'), -1)

# the values are random, so the exact output varies between runs
print(df)
  col_0 col_1 col_2 col_3 col_4
0     b     d     y     i     k
1     b     f    -1     h     l
2     c    -1     z     g     k
3     a    -1    -1     h     k
4    -1     e     x     h     l
5    -1     e    -1     g     k
6    -1    -1     y     g     j
7    -1    -1    -1     h     k
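
The same idea can be extended to the full layout in the question: five columns that may hold -1 and three that never do. The following is only a sketch under stated assumptions. The helper `build_rows`, its parameters, and the placeholder value lists are hypothetical, and it uses the standard-library `random` module instead of `np.random.choice`.

```python
import random
from itertools import product


def build_rows(maskable, always_valid, seed=None):
    """Return one row (a dict) per True/False pattern over the maskable columns.

    maskable:     dict of column name -> allowed values; these columns may hold -1
    always_valid: dict of column name -> allowed values; these columns never hold -1
    """
    rng = random.Random(seed)
    names = list(maskable)
    rows = []
    # True keeps a randomly sampled value, False puts -1 in that column.
    for keep in product([True, False], repeat=len(names)):
        row = {}
        for name, keep_value in zip(names, keep):
            row[name] = rng.choice(maskable[name]) if keep_value else -1
        for name, values in always_valid.items():
            row[name] = rng.choice(values)
        rows.append(row)
    return rows


# Hypothetical value lists for the 8 columns described in the question.
maskable = {f'col_{i}': [f'v{i}a', f'v{i}b', f'v{i}c'] for i in range(1, 6)}
always_valid = {f'col_{i}': [f'v{i}a', f'v{i}b', f'v{i}c'] for i in range(6, 9)}

rows = build_rows(maskable, always_valid, seed=0)
print(len(rows))  # 32 rows, one per -1 pattern over col_1 to col_5
```

Passing `rows` to `pd.DataFrame(rows)` should give a frame laid out like the output above, with columns named col_1 to col_8. The `seed` argument only makes the random draw repeatable.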
