
Python - How to check whether a text is already in a .txt file?

I have a function that checks if the text is in file.txt or not.

The function is supposed to work like this: if the text is already in the file, the file is simply closed; if it is not, the text is appended to the file.

But it doesn't work.

import urllib2, re
from bs4 import BeautifulSoup as BS

def SaveToFile(fileToSave, textToSave):
    datafile = file(fileToSave)
    for line in datafile:
        if textToSave in line:
            datafile.close()
        else:
            datafile.write(textToSave + '\n')
            datafile.close()



urls = ['url1', 'url2']  # I don't want to publish the real links.

patGetTitle = re.compile(r'<title>(.*)</title>')

for url in urls:
    u = urllib2.urlopen(url)
    webpage = u.read()
    title = re.findall(patGetTitle, webpage) 
    SaveToFile('articles.txt', title) 
    # so here. If the title of the website is already in articles.txt 
    # the function should close the file. 
    # But if the title is not found in articles.txt the function should add it.

Your title is a list, not a string, so call the function as SaveToFile('articles.txt', title[0]) to pass the first element of the list.

You can then change the SaveToFile function like this:

def SaveToFile(fileToSave, textToSave):
    with open(fileToSave, "r+") as datafile:
        for line in datafile:
            if textToSave in line:
                break
        else:
            datafile.write(textToSave + '\n')

Notes:

  • Since you were looping over an empty file, the loop body never ran, not even once.

For example:

for i in []:
    print(i)  # prints nothing: iterating over an empty list, just like your empty file

  • You passed a list, not a string. re.findall returns a list, so pass its first element to the function.
  • I used for..else here: the else branch runs only when the loop finishes without hitting break.

For example:

for i in []:
    print(i)
else:
    print("Nooooo")

Output:

Nooooo
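
To make the for..else behaviour a bit more concrete, here is a small sketch written for Python 3 (the thread's own code is Python 2). The helper name find_or_default is made up for this illustration. The else branch runs only when the loop ends without hitting break, which includes the case where the iterable is empty.

```python
def find_or_default(lines, text):
    """Return the first line containing text, or None if no line matches."""
    for line in lines:
        if text in line:
            found = line
            break          # leaving via break skips the else clause
    else:
        found = None       # reached only when the loop was never broken
    return found


print(find_or_default(["alpha", "beta"], "et"))   # beta
print(find_or_default(["alpha", "beta"], "zzz"))  # None
print(find_or_default([], "anything"))            # None
```

This is the same shape as the loop in SaveToFile: break when the text is found, and do the write in the else branch.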

You should refactor your SaveToFile function like this:

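def SaveToFile(fileToSave, titleList):
    with open(fileToSave, 'a+') as f:
        f.seek(0)  # 'a+' may start at the end of the file, so rewind before reading
        data = f.read()

        for titleText in titleList:
            if titleText not in data:
                f.write(titleText + '\n')  # append mode always writes at the end
        # no explicit f.close() needed: the with block closes the file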

This function reads the contents of the file (creating the file if it does not exist) and checks each title in titleList against those contents. A title that is already present is skipped; otherwise it is appended to the file.
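
One portability detail may be worth sketching here: where reading starts in 'a+' mode differs between platforms and Python versions (Python 3 starts at the end of the file), so rewinding with seek(0) before reading is the safe choice. The sketch below is written for Python 3. The name append_new_titles and the temporary file are hypothetical, and it compares whole lines rather than substrings, which is a deliberate change from the substring check used in the answer.

```python
import os
import tempfile


def append_new_titles(path, titles):
    """Append each title that is not already a line in the file at path."""
    with open(path, "a+") as f:      # creates the file if it does not exist
        f.seek(0)                    # rewind: 'a+' may start at end of file
        existing = set(f.read().splitlines())
        for title in titles:
            if title not in existing:
                f.write(title + "\n")  # append mode always writes at the end
                existing.add(title)    # also guards against duplicates in titles


path = os.path.join(tempfile.mkdtemp(), "articles.txt")
append_new_titles(path, ["First post", "Second post"])
append_new_titles(path, ["Second post", "Third post", "Third post"])
with open(path) as f:
    print(f.read().splitlines())  # ['First post', 'Second post', 'Third post']
```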

Just use r+ mode like this:

def SaveToFile(fileToSave, textToSave):
    with open(fileToSave, 'r+') as datafile:
        if textToSave not in datafile.read():
            datafile.write(textToSave + '\n')

About that file mode, as quoted in another answer:

``r+''  Open for reading and writing.  The stream is positioned at the  
        beginning of the file.
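
One consequence of 'r+' that the quoted description does not spell out: the file must already exist, otherwise open() raises an error (FileNotFoundError in Python 3, IOError in Python 2). Below is a hedged Python 3 sketch of one way to handle that; save_if_missing is a hypothetical name, and falling back to 'w+' is only one possible choice.

```python
import os
import tempfile


def save_if_missing(path, text):
    """Write text as a new line unless it already occurs in the file."""
    try:
        datafile = open(path, "r+")   # read/write, positioned at the start
    except FileNotFoundError:
        datafile = open(path, "w+")   # no file yet: create an empty one
    with datafile:
        if text not in datafile.read():   # read() leaves the position at EOF
            datafile.write(text + "\n")   # so this write lands at the end


path = os.path.join(tempfile.mkdtemp(), "articles.txt")
save_if_missing(path, "Hello")   # file is created
save_if_missing(path, "Hello")   # already present, nothing written
save_if_missing(path, "World")
with open(path) as f:
    print(f.read().splitlines())  # ['Hello', 'World']
```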

Also, re.findall() always returns a list, so if you try to write a list instead of a string you'll get an error.

So you could use:

def SaveToFile(fileToSave, textToSave):
    if len(textToSave) >= 1:
        textToSave = textToSave[0]
    else:
        return

    with open(fileToSave, 'r+') as datafile:
        if textToSave not in datafile.read():
            datafile.write(textToSave + '\n')

This seems closer to your problem.
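
The list-versus-string point can be checked without fetching any page. The Python 3 sketch below runs the question's pattern against a made-up HTML string. first_title is a hypothetical helper, and re.search is shown as an alternative to indexing the re.findall result.

```python
import re

pat_get_title = re.compile(r'<title>(.*)</title>')
html = '<html><head><title>My article</title></head></html>'  # made-up input

print(pat_get_title.findall(html))               # ['My article']
print(pat_get_title.findall('<p>no title</p>'))  # []


def first_title(page):
    """Return the first captured title as a string, or None if absent."""
    match = pat_get_title.search(page)
    return match.group(1) if match else None


print(first_title(html))  # My article
```

Returning None for a page without a title lets the caller skip the write, much like the early return in the function above.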

This checks whether the text is in the file:

def is_text_in_file(file_name, text):
    with open(file_name) as fobj:
        for line in fobj:
            if text in line:
                return True
    return False

This uses the function above to check, and writes the text to the end of the file if it is not in the file yet.

def save_to_file(file_name, text):
    if not is_text_in_file(file_name, text):
        with open(file_name, 'a') as fobj:
            fobj.write(text + '\n')
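
A quick way to try this check-then-append approach is against a temporary file. The sketch below is Python 3 and restates both functions so that it runs on its own. The handling of a file that does not exist yet is an addition that the answer does not cover, and the file path and titles are made up.

```python
import os
import tempfile


def is_text_in_file(file_name, text):
    """Return True if any line of the file contains text."""
    if not os.path.exists(file_name):   # added assumption: missing file = no match
        return False
    with open(file_name) as fobj:
        return any(text in line for line in fobj)


def save_to_file(file_name, text):
    """Append text as a new line unless the file already contains it."""
    if not is_text_in_file(file_name, text):
        with open(file_name, "a") as fobj:   # 'a' creates the file if needed
            fobj.write(text + "\n")


path = os.path.join(tempfile.mkdtemp(), "articles.txt")
for title in ["Title A", "Title B", "Title A"]:
    save_to_file(path, title)
with open(path) as fobj:
    print(fobj.read().splitlines())  # ['Title A', 'Title B']
```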
