Selecting random rows (of data) from a DataFrame / CSV file in Python after designating start and end row numbers?
Using the sample() function I can get random rows. The data set has 1000000 rows and I want a subset of 20000 rows. Importing random lines can be done through this solution:
https://stackoverflow.com/a/22259008/8966221
from pandas import read_csv
dataset = read_csv(file_path)
dataset_sub = dataset.sample(20000, random_state=1)
However, I want to select the random rows only from between row numbers 250000 and 750000. Is there a possible solution for that?
What you can do is create a DataFrame containing only the rows numbered 250000 to 750000, and then select 20000 random rows from it.
dataset_sub = dataset.loc[250000:750000].sample(20000, random_state=1)
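A minimal sketch of the same idea, for illustration only. Note that .loc slices by index label and includes both endpoints, so the line above relies on the default integer index. The hypothetical helper sample_between below uses .iloc instead, which slices by position and excludes the end row. The column name "value" and the generated data are assumptions standing in for the real file.

```python
import pandas as pd

def sample_between(df, start, stop, n, seed=1):
    """Return n random rows whose positions lie in [start, stop)."""
    # iloc is position-based, so this works even if the index is not 0..N-1.
    window = df.iloc[start:stop]
    # sample() draws without replacement by default, so no row repeats.
    return window.sample(n=n, random_state=seed)

# Stand-in data set with 1000000 rows (assumed; replace with read_csv output).
dataset = pd.DataFrame({"value": range(1_000_000)})
dataset_sub = sample_between(dataset, 250_000, 750_000, 20_000)
```

If the end row should be included, as it is with .loc, pass stop + 1.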
I think you need this:
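import random
dataset = read_csv(file_path)
# Note: this draws a random number of rows (between 250000 and 750000) from the whole dataset
dataset_sub = dataset.sample(random.randint(250000,750000), random_state=1)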
I think the following code works:
import random
a=random.sample(range(250000,750000), 20000)
data=dataset.loc[a]
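If the CSV file is too large to load in full, one possible variation is to read only the wanted row range and sample from that. This is a sketch under assumptions: read_and_sample is a hypothetical helper, the file is assumed to have a single header line, and a small in-memory CSV stands in for the real file.

```python
import io
import pandas as pd

def read_and_sample(csv_source, start, stop, n, seed=1):
    """Read only data rows [start, stop) from a CSV, then sample n of them."""
    # File line 0 is the header and is kept; lines 1..start are the
    # first `start` data rows, which are skipped.
    window = pd.read_csv(
        csv_source,
        skiprows=range(1, start + 1),
        nrows=stop - start,
    )
    return window.sample(n=n, random_state=seed)

# Tiny stand-in for the real file: a header plus 100 data rows (0..99).
csv_text = "value\n" + "\n".join(str(i) for i in range(100))
subset = read_and_sample(io.StringIO(csv_text), start=25, stop=75, n=10)
```

For the data set in the question the call would use start=250000, stop=750000, n=20000 with the file path in place of the StringIO object. The resulting index is renumbered from 0, so it no longer matches the original row numbers.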