Randomly choose two values without repetition in a dataframe
Consider a dataframe df with N columns and M rows:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
>>> df
   a  b  c  d  e
0  4  4  5  5  7
1  9  3  8  8  1
2  2  8  1  8  5
3  9  5  1  2  7
4  3  5  8  2  3
5  2  8  8  2  8
6  3  1  7  2  6
7  4  1  5  6  3
8  5  4  4  9  5
9  3  7  5  6  6
I want to randomly choose two columns and then randomly choose one particular row (this gives me two values from the same row). I can achieve this using:
>>> df.sample(2, axis=1).sample(1, axis=0)
   e  a
1  3  5
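That chain returns a one-row, two-column DataFrame. For the XOR step later it helps to pull the two values out as plain integers. Here is a minimal sketch that assumes the same kind of `df` as above. The names `pair`, `row`, `cols`, `x` and `y` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))

# One random row restricted to two random columns
# (sample() draws without replacement by default, so the columns differ)
pair = df.sample(2, axis=1).sample(1, axis=0)

row = pair.index[0]          # label of the chosen row
cols = list(pair.columns)    # the two chosen column names
# .iloc[0] is the single row as a Series; convert NumPy ints to Python ints
x, y = (int(v) for v in pair.iloc[0])
print(row, cols, x, y)
```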
I want to perform this K times, like below:
>>> for i in range(5):
...     df.sample(2, axis=1).sample(1, axis=0)
...
   e  a
1  3  5
   d  b
2  1  9
   e  b
4  8  9
   c  b
0  6  5
   e  c
1  3  5
I want to make sure I never pick the same two values (that is, the same two columns in the same row) in more than one trial. How would I achieve this?
I also want to perform a bitwise XOR on the two chosen values in each trial, for example 3 ^ 5, 1 ^ 9, ..., and count all the bit differences across the chosen values.
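XOR sets a bit exactly where its two operands differ, so the number of differing bits between two values is the population count of `x ^ y`. Here is a minimal sketch in plain Python. The helper name `bit_diff` is hypothetical. `bin(n).count("1")` works on any Python 3, and `int.bit_count()` is an alternative from Python 3.10 onward.

```python
def bit_diff(x: int, y: int) -> int:
    """Number of bit positions in which the non-negative ints x and y differ."""
    return bin(x ^ y).count("1")

# 3 = 0b011, 5 = 0b101  -> 3 ^ 5 = 0b110  -> 2 differing bits
# 1 = 0b0001, 9 = 0b1001 -> 1 ^ 9 = 0b1000 -> 1 differing bit
pairs = [(3, 5), (1, 9)]
total = sum(bit_diff(x, y) for x, y in pairs)
print(total)  # 3
```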
You can create a list of every (index, 2-column combination) tuple and then take random selections from it without replacement.
import pandas as pd
import numpy as np
from itertools import combinations, product

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
#df = df.reset_index(drop=True)  # if the index contains duplicates
K = 5

# Every (row label, (col1, col2)) pair. dtype=object is required on NumPy >= 1.24
# because each entry is ragged (an int next to a 2-tuple)
choices = np.array(list(product(df.index, combinations(df.columns, 2))), dtype=object)
# Draw K distinct positions; replace=False guarantees no (row, column pair) repeats
idx = choices[np.random.choice(len(choices), K, replace=False)]
#array([[9, ('a', 'e')],
#       [2, ('a', 'e')],
#       [1, ('a', 'c')],
#       [3, ('b', 'e')],
#       [8, ('d', 'e')]], dtype=object)
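If you would rather not build a NumPy object array, you can sketch the same idea with the standard library's `random.sample`: enumerate every (row, column-pair) combination, then sample without replacement. This is a swapped-in alternative, not the answer's code. `K` cannot exceed the number of combinations, M × C(N, 2) (here 10 × 10 = 100). If it does, `random.sample` raises `ValueError`.

```python
import random
from itertools import combinations, product

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
K = 5

# Every (row label, (col1, col2)) tuple; each one names a distinct pair of cells
choices = list(product(df.index, combinations(df.columns, 2)))

random.seed(123)
# Sampling without replacement means no tuple can be drawn twice
idx = random.sample(choices, K)
print(idx)
```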
Then you can decide exactly how you want your output, but something like this is close to what you show:
pd.concat([df.loc[myid[0], list(myid[1])].reset_index().T for myid in idx])
#        0  1
#index   a  e
#9       4  8
#index   a  e
#2       1  1
#index   a  c
#1       7  1
#index   b  e
#3       2  3
#index   d  e
#8       5  7
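To finish the XOR part of the question, look up the two values for each sampled (row, column pair), XOR them, and sum the differing bits. This is a minimal sketch that re-creates `df` and `idx` as in the answer (with `dtype=object` for the array of tuples) so it runs on its own. `bit_diff` and the `result` column names are hypothetical choices.

```python
from itertools import combinations, product

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
K = 5

choices = np.array(list(product(df.index, combinations(df.columns, 2))), dtype=object)
idx = choices[np.random.choice(len(choices), K, replace=False)]

def bit_diff(x: int, y: int) -> int:
    """Count of bit positions where x and y differ (popcount of x ^ y)."""
    return bin(x ^ y).count("1")

records = []
for row, (c1, c2) in idx:
    # Two values from the same row, converted to Python ints
    x, y = int(df.loc[row, c1]), int(df.loc[row, c2])
    records.append({"row": row, "cols": (c1, c2), "x": x, "y": y,
                    "xor": x ^ y, "bits": bit_diff(x, y)})

result = pd.DataFrame(records)
print(result)
print("total differing bits:", result["bits"].sum())
```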