Read a number of random lines from a file in Python
Can someone tell me how to read a random number of lines from a file in Python?
Your requirement is a bit vague, so here's another slightly different method (for inspiration if nothing else):
from random import random
# Keep each line independently with probability 0.5; the file is closed on exit.
with open("/some/file") as f:
    lines = [line for line in f if random() >= .5]
Compared with the other solutions, the number of lines returned varies less (it is binomially distributed around half the total number of lines), each line is chosen independently with 50% probability, and only one pass through the file is required.
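If 50% is not the fraction you want, the same idea can be wrapped in a function with an adjustable probability. This is only a sketch: the name sample_lines and the parameter p are made up for illustration and do not come from the answer above.

```python
import random

def sample_lines(path, p=0.5):
    """Hypothetical helper: keep each line independently with probability p.

    The file is read once, and the kept lines stay in file order.
    """
    with open(path) as f:
        # random.random() returns a float in [0.0, 1.0), so p=1.0 keeps
        # every line and p=0.0 keeps none.
        return [line for line in f if random.random() < p]
```

Calling sample_lines("/some/file", p=0.1) would return roughly a tenth of the lines, but the exact count differs from run to run.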
To get a number of lines at random from your file you could do something like the following:
import random
with open('file.txt') as f:
    lines = random.sample(f.readlines(), 5)
The above example returns 5 lines, but you can easily change that to the number you require. You could also replace the fixed number with a call to randint() to get a random number of lines in addition to a number of random lines, but you'd have to make sure the sample size isn't bigger than the number of lines in the file, because random.sample() raises ValueError in that case. Depending on your input this might be trivial or a little more complex.
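One way to combine randint() with sample() while keeping the sample size within bounds is sketched below. The function name random_number_of_random_lines and the parameter max_lines are assumptions made for this example.

```python
import random

def random_number_of_random_lines(path, max_lines):
    """Hypothetical helper: return between 0 and max_lines random lines."""
    with open(path) as f:
        lines = f.readlines()
    # Clamp the upper bound so the sample size never exceeds the line count.
    upper = min(max_lines, len(lines))
    # randint() includes both endpoints, so k may be 0 or upper.
    k = random.randint(0, upper)
    return random.sample(lines, k)
```

If you always want at least one line, change the lower bound to 1 and handle the empty-file case separately.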
Note that the entries in lines may appear in a different order from the one they have in the file.
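If the original order matters, one option is to sample line indices rather than the lines themselves and sort the indices before looking the lines up. The sketch below assumes the file fits in memory; sample_lines_in_file_order is a hypothetical name.

```python
import random

def sample_lines_in_file_order(path, k):
    """Hypothetical helper: pick k distinct random lines, kept in file order."""
    with open(path) as f:
        lines = f.readlines()
    # Sample distinct indices, then sort them to restore file order.
    # random.sample() raises ValueError if k is larger than len(lines).
    indices = sorted(random.sample(range(len(lines)), k))
    return [lines[i] for i in indices]
```

Sampling indices also keeps duplicate lines distinct, since each position in the file is chosen at most once.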
import linecache
import random
import sys

# Number of lines to get.
NUM_LINES_GET = 5

# Count the lines in the file.
with open('file_name') as f:
    number_of_lines = sum(1 for _ in f)

if NUM_LINES_GET > number_of_lines:
    print("Cannot sample more lines than the file contains.")
    sys.exit(1)

# Choose distinct random line numbers (linecache counts from 1) and print each line.
for i in random.sample(range(1, number_of_lines + 1), NUM_LINES_GET):
    print(linecache.getline('file_name', i), end='')

linecache.clearcache()
import os
import random

def getrandfromMem(filename):
    # Small file: read every line and pick one at random.
    with open(filename, 'rb') as fd:
        l = fd.readlines()
    pos = random.randrange(len(l))  # randint(0, len(l)) could index past the end
    return (pos, l[pos])

def getrandomline2(filename):
    filesize = os.stat(filename).st_size
    if filesize < 4096:  # Seek may not be very useful
        return getrandfromMem(filename)
    line = b''
    with open(filename, 'rb') as fd:
        for _ in range(10):  # Try 10 times
            pos = random.randint(0, filesize)
            fd.seek(pos)
            fd.readline()  # Read and ignore the (probably partial) line
            line = fd.readline()
            if line != b'':
                break
    if line != b'':
        return (pos, line)
    else:
        # Every attempt landed in the last line; fall back to reading the whole file.
        return getrandfromMem(filename)

getrandomline2("shaks12.txt")
Assuming the offset is always at the beginning of the file:
import random
with open('/your/file') as f:
    lines = f.read().splitlines()
n_lines = random.randrange(len(lines))
random_lines = lines[:n_lines]
Note that this will read the entire file into memory.
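If memory is a concern, a possible alternative is to count the lines in one pass and then read only the leading lines in a second pass with itertools.islice. This is a sketch of a different trade-off (two passes instead of one), not the method from the answer above, and random_prefix is a hypothetical name.

```python
import random
from itertools import islice

def random_prefix(path):
    """Hypothetical helper: return a random number of leading lines."""
    # First pass: count lines without storing them.
    with open(path) as f:
        total = sum(1 for _ in f)
    if total == 0:
        return []
    # randrange(total) yields 0 .. total - 1, as in the in-memory version.
    n = random.randrange(total)
    # Second pass: read only the first n lines, dropping the trailing newline.
    with open(path) as f:
        return [line.rstrip("\n") for line in islice(f, n)]
```

Unlike splitlines(), this only strips "\n", so files using other line separators may give slightly different results.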