Randomly choose two values without repetition in dataframe

Consider a dataframe df with N columns and M rows:

>>> df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
>>> df
   a  b  c  d  e
0  4  4  5  5  7
1  9  3  8  8  1
2  2  8  1  8  5
3  9  5  1  2  7
4  3  5  8  2  3
5  2  8  8  2  8
6  3  1  7  2  6
7  4  1  5  6  3
8  5  4  4  9  5
9  3  7  5  6  6

I want to randomly choose two columns and then randomly choose one row, which gives me two values from the same row. I can achieve this using:

>>> df.sample(2, axis=1).sample(1,axis=0)
   e  a
1  3  5

I want to perform this K times, like below:

>>> for i in range(5):
...     df.sample(2, axis=1).sample(1,axis=0)
...
   e  a
1  3  5
   d  b
2  1  9
   e  b
4  8  9
   c  b
0  6  5
   e  c
1  3  5

I want to ensure that no trial picks the same two values as another trial, that is, the same two columns and the same row. How would I achieve this?

I also want to perform a bitwise XOR on the two chosen values in each trial (for example, 3 ^ 5, 1 ^ 9, ...) and count all the bit differences in the chosen values.

You can build a list of every (index label, 2-column tuple) combination, and then draw random selections from that list without replacement.

Sample Data

import pandas as pd
import numpy as np
from itertools import combinations, product

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
#df = df.reset_index() #if index contains duplicates

Code

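K = 5  # number of trials
# Every (row label, column pair) combination: 10 rows x 10 pairs = 100 candidates.
# dtype=object keeps each column pair as a tuple (recent NumPy raises on ragged input otherwise).
choices = np.array(list(product(df.index, combinations(df.columns, 2))), dtype=object)
# Draw K distinct candidates, so no (row, column pair) can repeat.
idx = choices[np.random.choice(len(choices), K, replace=False)]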

#array([[9, ('a', 'e')],
#       [2, ('a', 'e')],
#       [1, ('a', 'c')],
#       [3, ('b', 'e')],
#       [8, ('d', 'e')]], dtype=object)
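
If the index contains duplicate labels, or you would rather not build the full list of tuples, the same idea can be sketched with positions instead of labels. This is only an illustration under stated assumptions: `sample_pairs` is a hypothetical helper, and it uses NumPy's `default_rng` Generator rather than `np.random.seed`, so its draws will not match the output above.

```python
import numpy as np
import pandas as pd
from itertools import combinations


def sample_pairs(df, k, rng=None):
    """Return k distinct (row position, (col position 1, col position 2)) picks.

    Hypothetical helper: works on positions, so duplicate index labels are fine.
    """
    rng = np.random.default_rng() if rng is None else rng
    # All unordered pairs of column positions.
    pairs = list(combinations(range(df.shape[1]), 2))
    n_total = df.shape[0] * len(pairs)
    # Each integer in [0, n_total) encodes exactly one (row, pair) candidate.
    flat = rng.choice(n_total, size=k, replace=False)
    rows, pair_ids = np.divmod(flat, len(pairs))
    return [(int(r), pairs[p]) for r, p in zip(rows, pair_ids)]


rng = np.random.default_rng(123)
df = pd.DataFrame(rng.integers(1, 10, (10, 5)), columns=list('abcde'))

picks = sample_pairs(df, 5, rng)
for r, (i, j) in picks:
    # Map positions back to labels and values.
    print(df.index[r], df.columns[i], df.columns[j], df.iat[r, i], df.iat[r, j])
```

Because the draw is made without replacement over the encoded integers, no (row, column pair) can appear twice, and asking for more picks than there are candidates raises a ValueError.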

With idx in hand, you can format the output however you like; something like this is close to what you show:

pd.concat([df.loc[myid[0], list(myid[1])].reset_index().T for myid in idx])
#       0  1
#index  a  e
#9      4  8
#index  a  e
#2      1  1
#index  a  c
#1      7  1
#index  b  e
#3      2  3
#index  d  e
#8      5  7
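
The code above covers the selection only. For the XOR step mentioned in the question, one possible sketch follows. It assumes that "bit differences" means the number of set bits in x ^ y; `bit_differences`, `candidates`, `picked` and `results` are hypothetical names used for illustration, and the candidates are kept in a plain list rather than an object array.

```python
import numpy as np
import pandas as pd
from itertools import combinations, product


def bit_differences(x, y):
    """Number of bit positions in which the integers x and y differ."""
    return bin(int(x) ^ int(y)).count("1")


np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))

K = 5
# Every (row label, column pair) candidate; assumes unique index labels.
candidates = list(product(df.index, combinations(df.columns, 2)))
# K distinct positions in the candidate list, so no selection repeats.
picked = np.random.choice(len(candidates), K, replace=False)

results = []
for i in picked:
    row, (c1, c2) = candidates[i]
    x, y = df.at[row, c1], df.at[row, c2]
    results.append((row, c1, c2, int(x), int(y), bit_differences(x, y)))

# Sum of the per-trial bit differences across all K trials.
total = sum(r[-1] for r in results)
print(results)
print(total)
```

For the values named in the question, 3 ^ 5 is 6 (binary 110), so two bits differ, and 1 ^ 9 is 8 (binary 1000), so one bit differs.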
