
Open a file, reformat, and write to a new file in Python 3

I'm very new to Python (a couple of weeks). I'm doing the Python for Everybody course on Coursera and decided to expand some ideas into an app I'd like to write.

I want to take a txt file of writing quotes, remove some unnecessary characters and newlines, and then write the newly formatted string to a new file. This file will be used to display random quotes in the terminal (the latter isn't a requirement here).

An entry in the txt file looks like:

“The road to hell is paved with works-in-progress.”
—Philip Roth, WD some other stuff here
“Some other quote.”
—Another Author, Blah blah

And I'd like the following to be written to a new file:

"The road to hell is paved with works-in-progress." —Philip Roth
"Some other quote." —Another Author

I'd like to remove the newline between the quote and the author and replace it with a space. I'd also like to remove everything from the comma after the author onwards (so it's just: quote [space] author). The file has 73 of these, so I'd like to run through the file making these changes, and then write the newly formatted quotes to a new file. The final output will simply be: "blah blah blah" -Author

I've tried various approaches, and I'm currently going through the file in a for loop, writing the two segments to lists that I was thinking of joining. But I'm stuck and also not sure if this is overkill. Any help would be gratefully received. Now that I have the two lists I can't seem to join them, and I'm not sure if doing it this way is even right. Any thoughts?

Code so far:

fh = open('quotes_source.txt')


quote = list()
author = list()

for line in fh:

    # Find quote segment and assign to a string variable
    if line.startswith('“'):
        phrase_end = line.find('”')+1
        phrase_start = line.find('“')
        phrase = line[phrase_start:phrase_end]
        quote.append(phrase)

    # Find author segment and assign to a string variable
    if line.startswith('—'):
        name_end = line.find(',')
        name = line[:name_end]
        author.append(name)

print(quote)
print(author)
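
One possible way to join the two lists built above is to pair them up with `zip`, producing one "quote [space] author" string per pair. This is only a minimal sketch: the helper names `join_quotes` and `write_quotes` and the output filename are assumptions, and it assumes every quote has exactly one matching author, in the same order.

```python
def join_quotes(quotes, authors):
    """Pair each quote with its author, separated by a single space."""
    return [f"{q} {a}" for q, a in zip(quotes, authors)]


def write_quotes(entries, path):
    """Write one formatted quote per line to the file at `path`."""
    with open(path, "w", encoding="utf-8") as f_out:
        for entry in entries:
            f_out.write(entry + "\n")


# Sample data shaped like the lists built by the loop above.
quote = ["“The road to hell is paved with works-in-progress.”", "“Some other quote.”"]
author = ["—Philip Roth", "—Another Author"]

entries = join_quotes(quote, author)
print(entries)
# write_quotes(entries, "quotes_processed.txt")  # hypothetical output filename
```

Note that `zip` stops at the shorter list, so a quote without an author line would silently shift or drop pairs.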
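
quote_line = "“The road to hell is paved with works-in-progress.”\n—Philip Roth, WD some other stuff here\n"
# replace each newline with a space so the quote and the author are separated
quote_line = quote_line.replace("\n", " ")
# split on commas; the last part is the text after the author
quote_line = quote_line.split(",")

formatted_quote = ""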

If you are not sure that there is only one comma in the line, consider both cases:

  • “Tit for tat.”\n—Someone Roth, blah blah\n #only one comma
  • “Tit for tat, tat for tit”\n—Someone Roth, blah blah\n #more than one comma

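# keep every part except the last one (the text after the author)
# note: the commas removed by split() are not re-inserted
len_quote_list=len(quote_line)-1
for part in range(0,len_quote_list):
    formatted_quote+=quote_line[part]
formatted_quote+="\n"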

or, when the line contains only one comma:

formatted_quote=quote_line[0]+"\n"
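
An alternative that avoids counting commas is to split the entry at the newline first and then cut only the author line at its first comma, so commas inside the quote are left alone. The following is a minimal sketch, assuming each entry is exactly one quote line followed by one author line; `format_entry` is a hypothetical helper name.

```python
def format_entry(entry):
    """Return 'quote author' for a two-line entry (quote line, then author line)."""
    quote, author = entry.strip().split("\n", 1)
    # Keep only the text before the first comma on the author line.
    author = author.split(",", 1)[0]
    return quote + " " + author


print(format_entry("“Tit for tat.”\n—Someone Roth, blah blah\n"))
print(format_entry("“Tit for tat, tat for tit”\n—Someone Roth, blah blah\n"))
```

An entry without a newline would raise a `ValueError` at the unpacking step, which is one way to notice malformed input early.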

You don't need regex for a simple task like this. You were actually on the right track, but you got tangled up trying to parse everything instead of just streaming the file and deciding where to cut.

Based on your data, you want to cut on the line starting with — (denoting the author), and you want to cut that line from the first comma onwards. Presumably, you also want to remove the empty lines. Thus, a simple stream modifier would look something like:

# open quotes_source.txt for reading and quotes_processed.txt for writing
with open("quotes_source.txt", "r", encoding="utf-8") as f_in,\
        open("quotes_processed.txt", "w", encoding="utf-8") as f_out:
    for line in f_in:  # read the input file line by line
        line = line.strip()  # clear out all whitespace, including the new line
        if not line:  # ignore blank lines
            continue
        if line[0] == "—":  # we found the dash!
            # write space, everything up to the first comma and a new line in the end
            f_out.write(" " + line.split(",", 1)[0] + "\n")
        else:
            f_out.write(line)  # a quote line, write it immediately

And that's all there is to it. As long as there are no other newlines in the data, it will produce exactly the result you want, i.e. for a quotes_source.txt file containing:

“The road to hell is paved with works-in-progress.”
—Philip Roth, WD some other stuff here

“The only thing necessary for the triumph of evil is for good men to do nothing.”
—Edmund Burke, whatever there is

“You know nothing John Snow.”
—The wildling Ygritte, "A Dance With Dragons" - George R.R. Martin

It will produce a quotes_processed.txt file containing:

“The road to hell is paved with works-in-progress.” —Philip Roth
“The only thing necessary for the triumph of evil is for good men to do nothing.” —Edmund Burke
“You know nothing John Snow.” —The wildling Ygritte
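
If a quote might span several lines, the caveat about extra newlines applies and the output would run the pieces together. One possible variation, sketched under the assumption that every entry still ends with an author line starting with `—`, collects quote lines until the author line arrives. The name `reformat_quotes` is hypothetical, and it works on any iterable of lines so it can be tried without touching files.

```python
def reformat_quotes(lines):
    """Yield 'quote author' strings from an iterable of raw lines."""
    pending = []  # quote lines seen since the last author line
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        if line.startswith("—"):
            author = line.split(",", 1)[0]  # drop everything from the first comma on
            yield " ".join(pending + [author])
            pending = []
        else:
            pending.append(line)


sample = [
    "“The road to hell is paved\n",
    "with works-in-progress.”\n",
    "—Philip Roth, WD some other stuff here\n",
    "\n",
    "“Some other quote.”\n",
    "—Another Author, Blah blah\n",
]
for entry in reformat_quotes(sample):
    print(entry)
```

Because the function accepts any iterable, an open file object can be passed in place of `sample`.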
