How to read and overwrite all *.txt files in a folder with python and bs4?

I have a folder with thousands of files. I'm attempting to parse the XML tags in them using beautifulsoup4.

I can do it for each file individually, but I can't make my script work in a for loop.

Here's my code so far:

import bs4 as bs
import glob

path = r"~/Desktop/pythontest/*.txt"
files = glob.glob(path)

# ------------------------READ AND PARSE TEXT-----------------------------------------
for f in files:
    # open file in read mode
    source = open(f, "rt")
    # parse xml as soup
    soup = bs.BeautifulSoup(source, "lxml")
    soupText = soup.get_text()
    text = soupText.replace(r"\\n", " ")
    # close file
    source.close()

# --------------------------OVERWRITE FILE---------------------------------------------
for f in files:
    # open file in write mode
    source = open(f, "wt")
    # overwrite the file with the soup
    source.write((text))
    # close file
    source.close()

print(text)

When I run it, the console gives me this:

Traceback (most recent call last):
  File "./camltest.py", line 34, in <module>
    print(text)
NameError: name 'text' is not defined

I suspect this is a scope problem, but I can't fix it. Any suggestions? Thanks.

Note that text is defined inside your first for loop.

If files is an empty list, the loop body never runs, so text is never defined and the final print(text) raises the NameError.
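One likely reason for an empty list here is that glob does not expand "~" on its own. The sketch below reuses the pattern from the question and shows both effects. The names unexpanded and loop_text are made up for illustration, and whether the expanded pattern matches anything depends on your machine.

```python
import glob
import os

# Pattern taken from the question.
raw_pattern = r"~/Desktop/pythontest/*.txt"

# glob performs no tilde expansion, so "~" is treated as a literal
# directory name relative to the current working directory.
unexpanded = glob.glob(raw_pattern)

# os.path.expanduser replaces the leading "~" with the home directory first.
files = glob.glob(os.path.expanduser(raw_pattern))

# A name assigned only inside a loop body does not exist if the body never runs.
for f in []:
    loop_text = f
try:
    print(loop_text)
except NameError as err:
    print("NameError:", err)
```

Checking that files is non-empty before using anything assigned in the loop makes this kind of failure easier to spot.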

You could simply read and then write to the file in the same loop.

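for f in files:
    # Read and parse first: opening with "w+" would truncate the file before it is read.
    with open(f, "rt") as source:
        soup = bs.BeautifulSoup(source, "lxml")
    soupText = soup.get_text()
    # r"\n" matches a literal backslash followed by n; use "\n" to replace real newlines.
    text = soupText.replace(r"\n", " ")
    # Reopen the same file in write mode and overwrite it with the extracted text.
    with open(f, "wt") as source:
        source.write(text)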
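The read-then-overwrite idea can also be wrapped in a small helper. This is only a sketch under a few assumptions: the function names rewrite_files and soup_text are hypothetical, the files are assumed to be UTF-8, and real newline characters (rather than literal backslash-n sequences) are assumed to be what should be replaced. bs4 is imported lazily, so the file-handling part works with any text transform.

```python
import glob
import os


def soup_text(markup):
    """Return the tag-free text of markup (requires bs4 and lxml)."""
    import bs4 as bs  # imported lazily; only needed when this function is called
    return bs.BeautifulSoup(markup, "lxml").get_text()


def rewrite_files(pattern, transform):
    """Overwrite every file matching pattern with transform(old_contents)."""
    changed = []
    # Expand "~" first, because glob itself does not.
    for path in glob.glob(os.path.expanduser(pattern)):
        # Read the whole file before opening it for writing.
        with open(path, "rt", encoding="utf-8") as source:
            original = source.read()
        updated = transform(original)
        # Opening in "wt" mode truncates the file, then the new text is written.
        with open(path, "wt", encoding="utf-8") as target:
            target.write(updated)
        changed.append(path)
    return changed
```

With these helpers, a call such as rewrite_files(r"~/Desktop/pythontest/*.txt", lambda s: soup_text(s).replace("\n", " ")) would process the folder from the question. An empty return value means the pattern matched nothing. Since the files are overwritten in place, trying it on a copy of the folder first is a sensible precaution.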
