How to save multiple outputs in multiple files, where each file's title comes from a Python object?

Question

I'm scraping an RSS feed from a web site ( http://www.gfrvitale.altervista.org/index.php/autismo-in?format=feed&type=rss ). I have written a script to extract and clean up the text of each feed item. My main problem is saving the text of each item in a separate file; I also need to name each file with the title extracted from its item. My code is:

for item in myFeed["items"]:
    time_structure=item["published_parsed"]
    dt = datetime.fromtimestamp(mktime(time_structure))

    if dt>t:

     link=item["link"]           
     response= requests.get(link)
     doc=Document(response.text)
     doc.summary(html_partial=False)

     # extracting text
     h = html2text.HTML2Text()

     # converting
     h.ignore_links = True  # ignore links
     h.skip_internal_links=True  # skip internal links
     h.inline_links=True
     h.ignore_images=True  # ignore image links
     h.ignore_emphasis=True
     h.ignore_anchors=True
     h.ignore_tables=True

     testo= h.handle(doc.summary())  # extracted text

     s = doc.title()+"."+" "+testo  # content to write to the final file

     tit=item["title"]

     # save each file with its proper title
     with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
         f.write(s)
         f.close()

The error is:

File "<ipython-input-57-cd683dec157f>", line 34
    with codecs.open("testo_%s", %tit "w", encoding="utf-8") as f:
                                 ^
SyntaxError: invalid syntax

Answer 1

You need to put the comma after %tit.

It should be:

# save each file with its proper title
with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)  # the with statement closes the file, so no f.close() is needed

However, if the file name contains invalid characters, opening it will raise an error (e.g. [Errno 22]).

You can try this code:

...
tit = item["title"]
# Not the best way, but it could help for now
# (a list of stop characters would be better)
tit = tit.replace(' ', '').replace("'", "").replace('?', '')

with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)
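
The "list of stop characters" idea mentioned in the comment above could be sketched with the standard library roughly as follows. The helper name `safe_filename`, the `replacement` parameter, and the exact character set are assumptions made for illustration, not part of the original answer; the set covers the characters that Windows rejects in file names.

```python
import re

# Characters that Windows does not allow in file names, plus control characters.
# This set is an assumption for illustration; adjust it for your platform.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, replacement="_"):
    """Return `title` with characters that are invalid in file names replaced."""
    cleaned = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.strip(" .")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"


print(safe_filename("Autismo: cos'è?"))
```

Replacing characters rather than deleting them keeps word boundaries visible in the resulting name, at the cost of slightly longer file names.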

Another way, using nltk:

from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r'\w+')  # keep only runs of word characters
tit = item["title"]
tit = tokenizer.tokenize(tit)
tit = ''.join(tit)
with codecs.open("testo_%s" % tit, "w", encoding="utf-8") as f:
    f.write(s)

Answer 2

First off, you misplaced the comma: it should come after %tit, not before it.

Secondly, you don't need to close the file, because the with statement you use does that automatically. And where did codecs come from? I don't see it anywhere else. Anyway, the correct with statement would be:

with open("testo_%s" %tit, "w", encoding="utf-8") as f:
     f.write(s)
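
Putting both answers together, the save step for several items might look like the sketch below. The `items` list stands in for the parsed feed entries and their extracted text; `save_items`, the `text` key, the `.txt` extension, and the temporary output directory are hypothetical choices made for illustration. It assumes Python 3, where the built-in `open` accepts `encoding`.

```python
import os
import re
import tempfile


def save_items(items, out_dir):
    """Write each item's text to its own file, named after the item's title."""
    paths = []
    for item in items:
        # Keep only word characters so the title is safe to use as a file name.
        tit = "".join(re.findall(r"\w+", item["title"]))
        path = os.path.join(out_dir, "testo_%s.txt" % tit)
        # The with statement closes the file automatically.
        with open(path, "w", encoding="utf-8") as f:
            f.write(item["text"])
        paths.append(path)
    return paths


# Stand-in data; in the question these values come from the RSS feed.
items = [
    {"title": "First title?", "text": "Text of the first item."},
    {"title": "Second 'title'", "text": "Text of the second item."},
]
out_dir = tempfile.mkdtemp()
paths = save_items(items, out_dir)
print([os.path.basename(p) for p in paths])
```

Note that two titles which reduce to the same word characters would overwrite each other's file; appending an index or the publication date to the name is one way around that.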

Answer 1
0 accepted 2016-10-02 15:26:58

Answer 2
0 2016-10-02 19:34:03
