
How to extract random words from .txt file using python?

I have a .txt file that looks like this:

Title       | Author
-------------------------
title1      | author1
title2      | author2
...        ...
titleN      | authorN

I want to extract 100 random titles from this file, like this:

title1
title2
...
title100

I tried this:

import random
with open('file.txt','r') as f:
  title = f.read().split('|')

for i in range (0,100):
 print(random.choice(title))

But when I run it, the program also prints random author names. How can I avoid this?

When you do this:

with open(path,'r') as f:
    title = f.read().split('|')

f.read() gives you the whole file as a single string. Splitting that on | gives a list that mixes authors and titles (along with newlines and spaces).

Instead, you can process the lines and split as you go. With something like:

with open(path) as f:
    titles = [l.split('|')[0].strip() for l in f]

This will give you a clean list of titles like:

['title1', 'title2', 'title3', 'title4', 'title5']

With that you can use random.sample() to get however many random items you want.

import random

path = "path/to/file.txt"
n = 100

with open(path) as f:
    titles = [l.split('|')[0].strip() for l in f]

random.sample(titles, n)

This assumes you don't want duplicates; random.sample() picks each list entry at most once.
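One thing to watch with the sample file in the question: it starts with a header row and a dashed separator, and the list comprehension above would keep those (and any blank lines) as "titles". A minimal sketch of one way to filter them out, assuming the first row that contains | is the header; extract_titles and sample_lines are hypothetical names used only for illustration:

```python
import random

def extract_titles(lines):
    """Return the title column, skipping blank lines, the dashed separator, and the header."""
    # Blank lines and the "-----" separator contain no "|" delimiter, so drop them.
    rows = [line for line in lines if "|" in line]
    # Assumption: the first remaining row is the "Title | Author" header.
    return [row.split("|")[0].strip() for row in rows[1:]]

# Stand-in for the lines of the file; iterating over an open file yields the same kind of strings.
sample_lines = [
    "Title       | Author\n",
    "\n",
    "-------------------------\n",
    "title1      | author1\n",
    "\n",
    "title2      | author2\n",
]

titles = extract_titles(sample_lines)
picked = random.sample(titles, 2)
print(picked)
```

If your file has no header row, drop the [1:] slice.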

You can use .readlines() instead of .read() to read the file into a list of lines. Then, once you've selected a random row, apply .split('|')[0].strip() to show only the title part of it:

import random

with open('file.txt', 'r') as f:
    title = f.readlines()

for i in range(0, 100):
    choice = random.choice(title)
    print(choice.split('|')[0].strip())

Alternatively you can process the file immediately after you've read it:

import random

with open('file.txt', 'r') as f:
    title = [line.split('|')[0].strip() for line in f.readlines()]

for i in range(0, 100):
    print(random.choice(title))
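Because random.choice() is called independently on each pass, the loop can print the same title more than once. If that is acceptable, random.choices() does the same kind of draw in a single call. A small sketch, with a hard-coded list standing in for the titles read from the file:

```python
import random

# Stand-in for the list of titles read from the file.
title = ["title1", "title2", "title3"]

# k is the number of draws; the same title may come back more than once.
picks = random.choices(title, k=100)

for pick in picks:
    print(pick)
```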

Here's a demonstration of how .split('|')[0].strip() works:

>>> choice = "title1      | author1"
>>> choice.split('|')
['title1      ', ' author1']
>>> choice.split('|')[0]
'title1      '
>>> choice.split('|')[0].strip()
'title1'

Have a look at title after you read it in. If my text file is

title1 | author1
title2 | author2

title will be ['title1 ', ' author1\ntitle2 ', ' author2\n']. Randomly choosing from this list will sometimes give you a title, sometimes an author, and sometimes both.

A better approach would be something like the following:

import random

# read in the file and split lines
with open("file.txt", "r") as f:
    lines = f.read().splitlines()
# lines = ["title1 | author1", "title2 | author2"]

titles = [line.split("|")[0].strip() for line in lines]
# titles = ["title1", "title2"]

Note that we need to call strip to strip any extra whitespace off the ends of the title.

You can now proceed with your sampling, but I suspect that you want 100 unique titles and not just 100 random titles. What you are doing is called sampling with replacement, and getting unique titles would be sampling without replacement. You can accomplish this with random.sample as follows (see the random docs):

print(*(random.sample(titles, 100)), sep = "\n")

or, equivalently, with more familiar syntax:

for samp_title in random.sample(titles, 100):
    print(samp_title)
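Note that sampling without replacement cannot return more items than the list holds: random.sample raises ValueError in that case. A short sketch of the behaviour and one possible guard, using a hypothetical three-item list in place of the real titles:

```python
import random

# Hypothetical short list standing in for the titles read from the file.
titles = ["title1", "title2", "title3"]

# Without replacement: every entry appears at most once.
unique_picks = random.sample(titles, 3)
print(unique_picks)

# Asking for more items than the list holds raises ValueError.
try:
    random.sample(titles, 100)
except ValueError as err:
    error_message = str(err)
    print("cannot sample:", error_message)

# One possible guard: cap the sample size at the number of available titles.
safe_picks = random.sample(titles, min(100, len(titles)))
print(safe_picks)
```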
