How to randomly extract n contiguous rows in a pandas DataFrame?

I would like to randomly select n contiguous rows from a DataFrame. This is the only working code I could come up with:

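random_row = df.sample(n=1)
start = random_row.index       # index label of the sampled row, used below as a position
end = start + n                # iloc excludes the stop position, so +n (not +n - 1) yields n rows
n_rows = df.iloc[int(start[0]):int(end[0])]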

But I feel this is bad code: hacky and not very Pythonic. For some reason I could not use the Int64Index objects directly, and that feels very weird. I would expect to be able to index a DataFrame by its... indexes, but it throws errors.

Can anyone advise how to improve my code, or recommend a better way to do this?
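The errors come from mixing labels and positions: sample returns rows carrying their index labels, while iloc expects integer positions. A minimal sketch, assuming a unique index, that converts the sampled label to a position with Index.get_loc (the frame below is made up for illustration):

```python
import pandas as pd

# Illustrative frame whose index labels are strings, so labels and positions differ
df = pd.DataFrame({"value": range(10)}, index=[f"r{i}" for i in range(10)])
n = 3

label = df.sample(n=1).index[0]     # a label such as 'r7', not a position
start = df.index.get_loc(label)     # label -> integer position (unique index assumed)
block = df.iloc[start:start + n]    # iloc excludes the stop, so at most n rows
print(block)
```

This only addresses the label/position mix-up; when the sampled row is near the bottom of the frame, the block can still contain fewer than n rows.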

I can see two problems with your current code:

  • It assumes that the rows are indexed sequentially with integers (0, 1, 2, 3, ...), because the sampled row's index label is passed to iloc as a position
  • There may not be enough rows between start and the end of the DataFrame to give you n rows
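Both problems can be avoided by drawing a random start position instead of an index label and capping it so the block always fits. A minimal sketch, assuming NumPy's Generator API (np.random.default_rng, NumPy 1.17+); random_contiguous_rows is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd


def random_contiguous_rows(df, n, rng=None):
    """Return n adjacent rows of df starting at a uniformly random position.

    Hypothetical helper: it works on positions (iloc), so any index type is fine,
    and the start is drawn from 0..len(df) - n so the block never runs off the end.
    """
    if not 0 < n <= len(df):
        raise ValueError("n must be between 1 and len(df)")
    rng = np.random.default_rng() if rng is None else rng
    start = rng.integers(0, len(df) - n + 1)  # upper bound is exclusive
    return df.iloc[start:start + n]


# Usage: a string index and a seeded generator for reproducible draws
df = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
block = random_contiguous_rows(df, 4, rng=np.random.default_rng(0))
print(block)
```

Slicing with iloc keeps the rows in their original order and never wraps around; passing a seeded generator makes a given draw repeatable.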

A slightly improved version:

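import numpy as np

def random_block(frame, k, wrap=True):
    "Return a random block of k contiguous rows from the DataFrame"
    n = len(frame)
    if k <= 0:
        raise ValueError('k must be >0')
    elif k > n:
        raise ValueError('k must not be longer than the DataFrame')
    elif k == n:
        return frame

    # wrap=True: any row may start the block, which continues from the top if needed.
    # wrap=False: the start is limited so the whole block fits before the last row.
    start = np.random.randint(0, n - (0 if wrap else k - 1))
    end = start + k
    # Select by position (not by index label), so any kind of index works
    return frame.iloc[np.arange(n).take(range(start, end), mode='wrap')]

# Usage
result = random_block(df, 10)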
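With the default wrap=True, a block that starts near the bottom continues from the top, so its rows are contiguous only if the DataFrame is treated as circular. The small example below, using a made-up five-row frame and a fixed start instead of a random one, shows the position arithmetic behind take(..., mode='wrap'):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=list("abcde"))
k, start = 3, 3  # a 3-row block starting at position 3 (the 4th row)

# mode="wrap" maps out-of-range positions back into 0..len(df) - 1
positions = np.arange(len(df)).take(range(start, start + k), mode="wrap")
print(positions)  # [3 4 0]

block = df.iloc[positions]
print(block.index.tolist())  # ['d', 'e', 'a']
```

The same positions can be computed as np.arange(start, start + k) % len(df). If wrapped blocks are not wanted, call random_block(df, 10, wrap=False).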

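Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.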