How to read and overwrite all *.txt files in a folder with Python and bs4?

I have a folder with thousands of files, and I'm trying to parse the XML tags in them using beautifulsoup4.

I'm able to do it for each file individually but can't make my script work using a for loop.

Here's my code so far:

import bs4 as bs
import glob

path = r"~/Desktop/pythontest/*.txt"
files = glob.glob(path)

# ------------------------READ AND PARSE TEXT-----------------------------------------
for f in files:
    # open file in read mode
    source = open(f, "rt")
    # parse xml as soup
    soup = bs.BeautifulSoup(source, "lxml")
    soupText = soup.get_text()
    text = soupText.replace(r"\\n", " ")
    # close file
    source.close()

# --------------------------OVERWRITE FILE---------------------------------------------
for f in files:
    # open file in write mode
    source = open(f, "wt")
    # overwrite the file with the soup
    source.write((text))
    # # close file
    source.close()

print(text)

When I run it, the console gives me this:

Traceback (most recent call last):
  File "./camltest.py", line 34, in <module>
    print(text)
NameError: name 'text' is not defined

I suspect this is a scope problem but can't fix it. Any suggestions? Thanks

Note that text is defined inside your first for loop.

If files is an empty list, the loop body never runs, so text is never assigned and print(text) raises the NameError.
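One likely reason files is empty here: glob.glob does not expand ~, so the pattern from the question is matched literally and finds nothing. Below is a minimal sketch of one way around that, assuming the folder path from the question. list_txt_files is a hypothetical helper name, not part of glob.

```python
import glob
import os


def list_txt_files(folder):
    """Return the *.txt files in `folder`, sorted by name.

    `list_txt_files` is a hypothetical helper used for illustration only.
    """
    # glob leaves "~" untouched, so expand it to the home directory first.
    pattern = os.path.join(os.path.expanduser(folder), "*.txt")
    return sorted(glob.glob(pattern))


files = list_txt_files("~/Desktop/pythontest")  # folder taken from the question

if not files:
    # Guard the empty case so later code never relies on names
    # that are only assigned inside the loop.
    print("No .txt files found; nothing to process.")
```

Checking for an empty list up front also turns the confusing NameError into a clear message.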

You could simply read and then write to the file in the same loop.

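for f in files:
    # Read and parse first. Opening with "w+" would truncate the file
    # before BeautifulSoup gets a chance to read it.
    with open(f, "rt") as source:
        soup = bs.BeautifulSoup(source, "lxml")
    # r"\n" matches a literal backslash followed by n;
    # use "\n" instead if you want to replace real newlines.
    text = soup.get_text().replace(r"\n", " ")
    # Reopen in write mode to overwrite the file with the extracted text.
    with open(f, "wt") as target:
        target.write(text)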
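One pitfall when doing the read and the write in the same loop: the "w" and "w+" modes empty the file as soon as it is opened, so the content has to be read before the file is opened for writing. The sketch below shows this on a throwaway temp file using only the standard library. read_then_overwrite and its transform parameter are hypothetical names; transform stands in for the BeautifulSoup step.

```python
import os
import tempfile


def read_then_overwrite(path, transform):
    """Read `path`, apply `transform` to its text, and write the result back.

    Hypothetical helper: `transform` is any function from str to str.
    """
    # First pass: read only, so the file is still intact.
    with open(path, "rt") as source:
        original = source.read()
    # Second pass: "wt" truncates, which is safe now that the text is in memory.
    with open(path, "wt") as target:
        target.write(transform(original))


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.txt")

    with open(demo, "wt") as fh:
        fh.write("<a>hello</a>")
    # "w+" empties the file on open, so there is nothing left to read.
    with open(demo, "w+") as fh:
        truncated_read = fh.read()

    # Restore the content, then read first and write second.
    with open(demo, "wt") as fh:
        fh.write("<a>hello</a>")
    read_then_overwrite(demo, str.upper)
    with open(demo, "rt") as fh:
        result = fh.read()

print(repr(truncated_read))
print(result)
```

With thousands of files it may be worth running this kind of in-place overwrite on a copy of the folder first, since the original content is gone once the file is rewritten.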
