I have a .txt file that looks like this:
Title | Author
-------------------------
title1 | author1
title2 | author2
... ...
titleN | authorN
I want to extract 100 random titles from this file, like this:
title1
title2
...
title100
I tried this:
import random
with open('file.txt','r') as f:
    title = f.read().split('|')
    for i in range (0,100):
        print(random.choice(title))
But when I run it, the program also prints random author names. How can I avoid this?
When you do this:
with open(path,'r') as f:
    title = f.read().split('|')
f.read() gives you the whole file as a single string. Splitting that on | gives a list containing both authors and titles (along with newlines and spaces).
Instead, you can process the file line by line and split as you go, with something like:
with open(path) as f:
    titles = [l.split('|')[0].strip() for l in f]
This will give you a clean list of titles like:
['title1', 'title2', 'title3', 'title4', 'title5']
With that you can use random.sample() to get however many random items you want.
import random
path = "path/to/file.txt"
n = 100
with open(path) as f:
    titles = [l.split('|')[0].strip() for l in f]
# draw n distinct entries from the list and show them
print(random.sample(titles, n))
This assumes you don't want duplicates.
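One caveat: the sample file in the question begins with a header row and a dashed separator, and the list comprehension above would include both as "titles". Also, random.sample() raises ValueError when asked for more items than the list holds. Below is a minimal sketch that handles both cases. The helper names read_titles and sample_titles are hypothetical, and header_lines=2 is an assumption based on the sample file shown in the question.

```python
import random
from itertools import islice

def read_titles(lines, header_lines=2):
    """Return the title column, skipping header rows and blank lines.

    header_lines=2 is an assumption: a "Title | Author" row
    followed by a dashed separator, as in the question's sample.
    """
    titles = []
    for line in islice(lines, header_lines, None):
        if line.strip():  # ignore blank lines
            titles.append(line.split('|')[0].strip())
    return titles

def sample_titles(titles, n):
    # random.sample raises ValueError if n exceeds the list length,
    # so cap n at the number of available titles.
    return random.sample(titles, min(n, len(titles)))

# In-memory stand-in for the file; an open file object works the same way.
example = [
    "Title | Author\n",
    "-------------------------\n",
    "title1 | author1\n",
    "title2 | author2\n",
    "title3 | author3\n",
]
titles = read_titles(example)
print(titles)
print(sample_titles(titles, 2))
```

With a real file, the call would be read_titles(f) inside the with open(path) as f: block.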
You can use .readlines() instead of .read() to read the file into a list of lines. Then, after you've selected a random row, you can use .split('|')[0].strip() to show only the title part of it:
import random
with open('file.txt', 'r') as f:
    title = f.readlines()
    for i in range(0, 100):
        choice = random.choice(title)
        print(choice.split('|')[0].strip())
Alternatively you can process the file immediately after you've read it:
import random
with open('file.txt', 'r') as f:
    title = [line.split('|')[0].strip() for line in f.readlines()]
    for i in range(0, 100):
        print(random.choice(title))
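If drawing with replacement really is what you want (the same title may come up more than once), the loop over random.choice() can be collapsed into a single call to random.choices(), available since Python 3.6. A minimal sketch, using a small in-memory list as a stand-in for the titles read from the file:

```python
import random

# Stand-in for the list built from the file (illustrative data only).
title = ['title1', 'title2', 'title3']

# Draw 100 titles with replacement; repeats are expected.
picks = random.choices(title, k=100)
for pick in picks:
    print(pick)
```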
Here's a demonstration of how .split('|')[0].strip() works:
>>> choice = "title1 | author1"
>>> choice.split('|')
['title1 ', ' author1']
>>> choice.split('|')[0]
'title1 '
>>> choice.split('|')[0].strip()
'title1'
Have a look at title after you read it in. If my text file is
title1 | author1
title2 | author2
then title will be ['title1 ', ' author1\ntitle2 ', ' author2\n']. Randomly choosing from this list will sometimes give you titles, sometimes authors, and sometimes both.
A better approach would be something like the following:
import random
# read in the file and split lines
with open("file.txt", "r") as f:
    lines = f.read().splitlines()
# lines = ["title1 | author1", "title2 | author2"]
titles = [line.split("|")[0].strip() for line in lines]
# titles = ["title1", "title2"]
Note that we need to call strip to remove any extra whitespace from the ends of each title.
You can now proceed with your sampling, but I suspect that you want 100 unique titles and not just 100 random titles. What you are doing is called sampling with replacement, and getting unique titles would be sampling without replacement. You can accomplish this with random.sample as follows (see the random docs):
print(*(random.sample(titles, 100)), sep = "\n")
or, equivalently, with more familiar syntax:
for samp_title in random.sample(titles, 100):
    print(samp_title)
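One subtlety worth noting: random.sample picks distinct positions in the list, not distinct values. If the same title appears on more than one line of the file (say, with different authors), it can still show up twice in the output. A minimal sketch of deduplicating first, using illustrative in-memory data and a small sample size in place of 100:

```python
import random

# Illustrative data: 'title2' appears on two lines of the file.
titles = ['title1', 'title2', 'title2', 'title3']

# dict.fromkeys drops repeated values while preserving the original order.
unique_titles = list(dict.fromkeys(titles))

# Sample without replacement from the deduplicated list.
picks = random.sample(unique_titles, 2)
for samp_title in picks:
    print(samp_title)
```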