Get a random item from a group of rows in an xlsx file in Python

I have an xlsx file, for example:

A  B  C  D  E  F  G
1  5  2  7  0  1  8
3  4  0  7  8  5  9
4  2  9  7  0  6  2
1  6  3  2  8  8  0
4  3  5  2  5  7  9
5  2  3  2  6  9  1

These are my values (they are actually in an Excel file). I need to get random rows from it, but separated by the values in column D.

Note that column D contains only the values 7 and 2.

I need to get 1 random row from all the rows that have 7 in column D, and 1 random row from all the rows that have 2 in column D.

Then I need to put the results in another xlsx file.

My expected output is the content of row 0, 1 or 2, plus the content of row 3, 4 or 5.

Can someone help me with that? Thanks!

With openpyxl, you can use Worksheet.iter_rows to iterate over the worksheet rows.
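As a minimal sketch of what iter_rows gives you, the snippet below builds a small workbook in memory (so no file is needed) from two rows of the question's table and iterates over the data rows. The min_row=2 argument (to skip the header) and values_only=True (to get plain values instead of cell objects) are choices made for this illustration, not something the original answer specifies.

```python
from openpyxl import Workbook

# Build a small in-memory workbook mirroring part of the question's table
wb = Workbook()
ws = wb.active
ws.append(["A", "B", "C", "D", "E", "F", "G"])  # header row
ws.append([1, 5, 2, 7, 0, 1, 8])
ws.append([1, 6, 3, 2, 8, 8, 0])

# min_row=2 skips the header; values_only=True yields tuples of plain values
data_rows = list(ws.iter_rows(min_row=2, values_only=True))
for row in data_rows:
    print(row[3], row)  # row[3] is the value in column D
```

Without values_only=True, each row is a tuple of cell objects, which is why the helper in this answer reads row[3].value.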

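You can use itertools.groupby to group the rows according to the "D" column values. Note that groupby only groups consecutive items, so this works as-is when rows with the same D value are adjacent (as in your example); otherwise, sort the rows by that value first. To do the grouping, you can create a small function that picks up this value from a row: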

def get_d(row):
    return row[3].value

Then, you can use random.choice to choose a row at random from each group.

Putting it all together, you can have:

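import itertools
import random
import openpyxl

def get_d(row):
    return row[3].value  # value in column D (fourth cell of the row)

# "test.xlsx" is a placeholder file name; min_row=2 skips the header row
rows = openpyxl.load_workbook("test.xlsx").active.iter_rows(min_row=2)

for key, group in itertools.groupby(rows, key=get_d):
    row = random.choice(list(group))
    print([cell.value for cell in row])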
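The grouping and picking step can also be tried without an Excel file. The sketch below uses plain tuples holding the question's values, deliberately reordered so that equal D values are not adjacent, and sorts them before calling groupby. The function name pick_one_per_group and its key_index parameter are hypothetical names introduced only for this illustration.

```python
import itertools
import random

# Rows from the question as plain tuples, reordered so that
# equal D values (index 3) are not adjacent
rows = [
    (1, 5, 2, 7, 0, 1, 8),
    (1, 6, 3, 2, 8, 8, 0),
    (3, 4, 0, 7, 8, 5, 9),
    (4, 3, 5, 2, 5, 7, 9),
    (4, 2, 9, 7, 0, 6, 2),
    (5, 2, 3, 2, 6, 9, 1),
]


def pick_one_per_group(rows, key_index=3):
    """Return {key: one random row} for each distinct value at key_index."""
    def get_key(row):
        return row[key_index]

    picked = {}
    # groupby only merges consecutive items, so sort by the key first
    for key, group in itertools.groupby(sorted(rows, key=get_key), key=get_key):
        picked[key] = random.choice(list(group))
    return picked


result = pick_one_per_group(rows)
for key, row in result.items():
    print(key, row)
```

To get the result into another xlsx file, each picked row could then be added to a new openpyxl workbook with Worksheet.append and written out with Workbook.save.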

I've written code for that. The code below assumes that the Excel file is named test.xlsx and is in the same folder you run your code from. It samples NrandomLines rows for each unique value in column D and prints them out.

import pandas as pd
import random

df = pd.read_excel('test.xlsx')  # read the Excel file

vals = df.D.unique()  # all unique values in column D; in your case only 7 and 2

idx = []  # row positions for each unique value in column D
N = []    # number of rows for each unique value in column D
for i in vals:  # loop over the unique values in column D
    locs = (df.D == i).values.nonzero()[0]
    idx.append(locs)
    N.append(len(locs))

NrandomLines = 1  # how many random rows you want per value

for i in range(len(vals)):  # loop over the unique values of D
    for k in range(NrandomLines):  # repeat once per requested sample
        # random position within this group (the same row can be drawn
        # more than once if NrandomLines > 1)
        randomRow = random.randint(0, N[i] - 1)
        print(df.iloc[idx[i][randomRow], :])  # print out the random row
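A shorter variant is possible with pandas' own grouping. This is only a sketch, assuming pandas 1.1 or later, where groupby objects have a sample method; it is not part of the answer above. The table is built in memory here so the snippet runs without a file.

```python
import pandas as pd

# In-memory copy of the question's table; with a real file this would be
# df = pd.read_excel("test.xlsx")
df = pd.DataFrame(
    [
        [1, 5, 2, 7, 0, 1, 8],
        [3, 4, 0, 7, 8, 5, 9],
        [4, 2, 9, 7, 0, 6, 2],
        [1, 6, 3, 2, 8, 8, 0],
        [4, 3, 5, 2, 5, 7, 9],
        [5, 2, 3, 2, 6, 9, 1],
    ],
    columns=list("ABCDEFG"),
)

# One random row for each distinct value in column D
picked = df.groupby("D").sample(n=1)
print(picked)
```

To save the result to another workbook, picked.to_excel("output.xlsx", index=False) should do it; the output file name is a placeholder.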
