
Open a file, reformat, and write to a new file in Python 3

I'm very new to Python (a couple of weeks). I'm doing the Python for Everybody course on Coursera and decided to expand some ideas into an app I'd like to write.

I want to take a .txt file of quotes about writing, remove some unnecessary characters and newlines, and then write the newly formatted text to a new file. This file will be used to display random quotes in the terminal (that part isn't what I'm asking about here).

An entry in the txt file looks like:

“The road to hell is paved with works-in-progress.”
—Philip Roth, WD some other stuff here
“Some other quote.”
—Another Author, Blah blah

And I'd like the following to be written to a new file:

"The road to hell is paved with works-in-progress." —Philip Roth
"Some other quote." —Another Author

I'd like to remove the newline between the quote and the author and replace it with a space. I'd also like to remove everything from the comma after the author onwards (so it's just: quote [space] author). The file has 73 of these, so I'd like to run through the file making these changes, and then write the newly formatted quotes to a new file. The final output will simply be: "blah blah blah" -Author

I've tried various approaches. Currently I'm going through the file in a for loop and writing the two segments (quotes and authors) to separate lists, which I was then planning to join. But I'm stuck, and I'm also not sure whether this is overkill. Now that I have the two lists I can't seem to join them, and I'm not sure if doing it this way is even right. Any help or thoughts would be gratefully received.

Code so far:

fh = open('quotes_source.txt', encoding='utf-8')  # the curly quotes and dashes need UTF-8

quote = list()
author = list()

for line in fh:

    # Find the quote segment and append it to the quote list
    if line.startswith('“'):
        phrase_start = line.find('“')
        phrase_end = line.find('”') + 1
        phrase = line[phrase_start:phrase_end]
        quote.append(phrase)

    # Find the author segment (everything before the first comma) and append it to the author list
    if line.startswith('—'):
        name_end = line.find(',')
        name = line[:name_end]
        author.append(name)

fh.close()

print(quote)
print(author)
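For the two-list approach in the question, one way to pair the lists up is the built-in zip(), which walks both lists in step. This is a sketch assuming every quote line is followed by exactly one author line, so that quote[i] belongs to author[i]; join_quotes and the sample lists are hypothetical.

```python
def join_quotes(quote, author):
    """Pair the i-th quote with the i-th author, separated by one space."""
    # zip() stops at the shorter list, so a missing author line
    # would silently drop the quotes that have no partner
    return [q + " " + a for q, a in zip(quote, author)]


# Hypothetical sample lists in the shape produced by the loop above
quote = ["“The road to hell is paved with works-in-progress.”", "“Some other quote.”"]
author = ["—Philip Roth", "—Another Author"]

formatted = join_quotes(quote, author)
# "\n".join() turns the list into a single string with one quote per line
print("\n".join(formatted))
```

Writing the result out is then a matter of calling out.write("\n".join(formatted) + "\n") on a file opened with open("quotes_processed.txt", "w", encoding="utf-8"). On Python 3.10 and later, zip(quote, author, strict=True) raises a ValueError instead of silently dropping unmatched entries.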
quote_line = "“The road to hell is paved with works-in-progress.”\n—Philip Roth, WD some other stuff here\n"
quote_line = quote_line.replace("\n", " ")  # the newline before the author becomes a space
quote_line = quote_line.split(",")

If you are not sure that there is only one comma in the line, for example:

  • “Tit for tat.”\n—Someone Roth, blah blah\n  # only one comma
  • “Tit for tat, tat for tit”\n—Someone Roth, blah blah\n  # more than one comma

then rejoin every part except the last one, putting the commas back (this still assumes the text after the author's name contains no further commas):

formatted_quote = ",".join(quote_line[:-1]) + "\n"

or, if there is exactly one comma, simply keep the first part:

formatted_quote = quote_line[0] + "\n"
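A variant that is not thrown off by commas on either line is to split each entry at the newline that introduces the author line and only cut the author part. This is a sketch assuming each entry is one quote line followed by one line starting with "—"; format_entry is a hypothetical helper name.

```python
def format_entry(entry: str) -> str:
    """Turn 'quote\\n—Author, extra text\\n' into 'quote —Author'.

    Assumes the quote sits on one line and the author line starts with an em dash.
    """
    # Split at the newline that introduces the author line; commas in the quote are untouched
    quote, _sep, author = entry.partition("\n—")
    # Keep only the author's name: everything before the first comma on the author line
    name = author.split(",", 1)[0].strip()
    # Rejoin with a single space instead of the newline
    return quote.strip() + " —" + name


print(format_entry("“Tit for tat, tat for tit”\n—Someone Roth, blah blah\n"))
# → “Tit for tat, tat for tit” —Someone Roth
print(format_entry("“The road to hell is paved with works-in-progress.”\n—Philip Roth, WD some other stuff here\n"))
# → “The road to hell is paved with works-in-progress.” —Philip Roth
```

Because str.partition splits only at the first occurrence of its separator, the quote text keeps all of its commas, and str.split(",", 1) cuts the author line at its first comma only.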

You don't need regex for a simple task like this. You were actually on the right track, but you got yourself tangled up in trying to parse everything instead of just streaming the file and deciding where to cut.

Based on your data, you want to cut on the line starting with "—" (denoting the author), and you want to cut that line from the first comma onwards. Presumably, you also want to remove the empty lines. Thus, a simple stream modifier would look something like:

# open quotes_source.txt for reading and quotes_processed.txt for writing
with open("quotes_source.txt", "r", encoding="utf-8") as f_in,\
        open("quotes_processed.txt", "w", encoding="utf-8") as f_out:
    for line in f_in:  # read the input file line by line
        line = line.strip()  # strip leading/trailing whitespace, including the newline
        if not line:  # ignore blank lines
            continue
        if line[0] == "—":  # we found the dash!
            # write space, everything up to the first comma and a new line in the end
            f_out.write(" " + line.split(",", 1)[0] + "\n")
        else:
            f_out.write(line)  # a quote line, write it immediately

And that's all there is to it. As long as there are no other newlines in the data (each quote fits on a single line), it will produce exactly the result you want. For example, for a quotes_source.txt file containing:

“The road to hell is paved with works-in-progress.”
—Philip Roth, WD some other stuff here

“The only thing necessary for the triumph of evil is for good men to do nothing.”
—Edmund Burke, whatever there is

“You know nothing John Snow.”
—The wildling Ygritte, "A Dance With Dragons" - George R.R. Martin

It will produce a quotes_processed.txt file containing:

“The road to hell is paved with works-in-progress.” —Philip Roth
“The only thing necessary for the triumph of evil is for good men to do nothing.” —Edmund Burke
“You know nothing John Snow.” —The wildling Ygritte
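The same stream logic can also be wrapped in a generator that accepts any iterable of lines (an open file, a list, or an io.StringIO), which makes it easy to try without touching the disk. This is a sketch: reformat_quotes is a hypothetical name, and unlike the loop above it joins a quote that spans several lines with spaces instead of gluing the lines together.

```python
import io


def reformat_quotes(lines):
    """Yield 'quote —Author' for each quote followed by an author line."""
    pending = ""  # quote text seen since the last author line
    for line in lines:
        line = line.strip()  # drop surrounding whitespace, including the newline
        if not line:  # skip blank lines between entries
            continue
        if line.startswith("—"):  # author line: emit the finished entry
            yield pending + " " + line.split(",", 1)[0]
            pending = ""
        else:  # quote line: join multi-line quotes with a space
            pending = pending + " " + line if pending else line


# Iterating a StringIO yields lines just like iterating an open file
sample = io.StringIO(
    "“The road to hell is paved with works-in-progress.”\n"
    "—Philip Roth, WD some other stuff here\n"
    "\n"
    "“You know nothing John Snow.”\n"
    "—The wildling Ygritte, \"A Dance With Dragons\" - George R.R. Martin\n"
)
for entry in reformat_quotes(sample):
    print(entry)
```

With real files, loop over reformat_quotes(f_in) and write each entry followed by "\n" to f_out.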

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM