Randomly select two values without repetition in a DataFrame

Question

Consider a DataFrame df with N columns and M rows:

>>> df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
>>> df
   a  b  c  d  e
0  4  4  5  5  7
1  9  3  8  8  1
2  2  8  1  8  5
3  9  5  1  2  7
4  3  5  8  2  3
5  2  8  8  2  8
6  3  1  7  2  6
7  4  1  5  6  3
8  5  4  4  9  5
9  3  7  5  6  6

I want to randomly choose two columns and then randomly choose one row, which gives me two values from the same row. I can achieve this using

>>> df.sample(2, axis=1).sample(1,axis=0)
   e  a
1  3  5

I want to perform this K times, as shown below:

>>> for i in range(5):
...     df.sample(2, axis=1).sample(1,axis=0)
...
   e  a
1  3  5
   d  b
2  1  9
   e  b
4  8  9
   c  b
0  6  5
   e  c
1  3  5

I want to ensure that I never choose the same two values (that is, the same two columns and the same row) in more than one trial. How would I achieve this?

I then want to perform a bitwise XOR on the two chosen values in each trial (for example, 3 ^ 5, 1 ^ 9, ...) and count all the bit differences in the chosen values.

Answer 1

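You can create a list of all (index, two-column combination) tuples, and then take random selections from that list without replacement. Because each tuple appears in the list exactly once, sampling without replacement guarantees that no row and column pair is chosen twice.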

Sample Data

import pandas as pd
import numpy as np
from itertools import combinations, product

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
#df = df.reset_index() #if index contains duplicates

Code

K = 5
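# dtype=object is needed on recent NumPy versions, which reject ragged input otherwise
choices = np.array(list(product(df.index, combinations(df.columns, 2))), dtype=object)
# draw K distinct positions in `choices` (replace=False means no repeats)
idx = choices[np.random.choice(len(choices), K, replace=False)]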

#array([[9, ('a', 'e')],
#       [2, ('a', 'e')],
#       [1, ('a', 'c')],
#       [3, ('b', 'e')],
#       [8, ('d', 'e')]], dtype=object)
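
As a variation on the same idea, here is a minimal sketch that avoids building the full object array. It assumes every (row, column pair) selection can be numbered from 0 to M * C(N, 2) - 1, draws K distinct numbers, and decodes each one back into a row and a pair. The names rng, pairs, flat and selections are illustrative only, and the generator seed is arbitrary.

```python
import numpy as np
import pandas as pd
from itertools import combinations

rng = np.random.default_rng(123)  # arbitrary seed, used only for reproducibility
df = pd.DataFrame(rng.integers(1, 10, (10, 5)), columns=list('abcde'))

K = 5
pairs = list(combinations(df.columns, 2))  # all unordered column pairs
n_total = len(df) * len(pairs)             # size of the (row, pair) space

# Distinct flat codes guarantee distinct (row, pair) selections.
flat = rng.choice(n_total, size=K, replace=False)

# Decode each flat code into a row position and a pair position.
row_pos, pair_pos = np.divmod(flat, len(pairs))

selections = [(df.index[r], pairs[p]) for r, p in zip(row_pos, pair_pos)]
```

Each entry of selections is a tuple of an index label and a two-column tuple, so it can be used with df.loc in the same way as the rows of idx above. Note that K must not exceed n_total, otherwise sampling without replacement raises an error.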

You can then decide exactly how you want the output formatted, but something like this is close to what you show:

pd.concat([df.loc[myid[0], list(myid[1])].reset_index().T for myid in idx])
#       0  1
#index  a  e
#9      4  8
#index  a  e
#2      1  1
#index  a  c
#1      7  1
#index  b  e
#3      2  3
#index  d  e
#8      5  7
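
The question also asks for a bitwise XOR of the two chosen values and a count of the differing bits, which the code above does not cover. The following is one possible sketch, assuming the values are non-negative integers as in the sample data. The names picked, diff, n_bits and total_bits are illustrative, and counting set bits with bin(...).count("1") is just one of several ways to do it.

```python
import numpy as np
import pandas as pd
from itertools import combinations, product

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))

K = 5
# Every (row label, column pair) combination appears exactly once.
choices = list(product(df.index, combinations(df.columns, 2)))
# K distinct positions in `choices`, so no selection repeats.
picked = np.random.choice(len(choices), K, replace=False)

total_bits = 0
for i in picked:
    row, (col1, col2) = choices[i]
    x = int(df.at[row, col1])
    y = int(df.at[row, col2])
    diff = x ^ y                      # bits set where x and y differ
    n_bits = bin(diff).count("1")     # number of differing bits
    total_bits += n_bits
    print(f"row {row}: {col1}={x}, {col2}={y}, xor={diff}, differing bits={n_bits}")

print("total differing bits:", total_bits)
```

Since the sample values lie between 1 and 9, each XOR result fits in four bits, so a single trial contributes at most 4 to the total.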
