
Python - validate, write and append .txt file

I would like to validate the links my crawler gets from the web against the ones already stored in my .txt file. After my crawler retrieves links from the web, it appends them to my .txt file (mode 'a'). However, if a link already exists in my .txt file, I would like to write it with 'w' instead. Any idea how I can do this?

    def spider(targetname, DOMAIN, g_data):
        # g_data holds the tweet elements parsed with BeautifulSoup;
        # crawledfile (path of the .txt file) is assumed to be defined at module level
        for item in g_data:
            try:
                name = item.find_all("strong", {"class": "fullname show-popup-with-id "})[0].text
                username = item.find_all("span", {"class": "username u-dir"})[0].text
                post = item.find_all("p", {"class": "TweetTextSize TweetTextSize--normal js-tweet-text tweet-text"})[0].text
                replies = item.find_all("span", {"class": "u-hiddenVisually"})[3].text
                retweets = item.find_all("span", {"class": "u-hiddenVisually"})[4].text
                likes = item.find_all("span", {"class": "u-hiddenVisually"})[5].text
                retweetby = item.find_all("a", {"href": "/"+targetname})[0].text
                datas = item.find_all('a', {'class':'tweet-timestamp js-permalink js-nav js-tooltip'})
                for data in datas:
                    link = DOMAIN + data['href']
                    date = data['title']
                append_to_file(crawledfile, name, username, post, link, replies, retweets, likes, retweetby, date)
            except Exception:
                # Skip tweets that are missing any of the expected elements
                pass


    def append_to_file(path, name, username, post, link, replies, retweets, likes, retweetby, date):
        # Each record is appended to the end of the .txt file ('a' mode)
        with open(path, 'a') as file:
            try:
                file.write("Name: "+ name + '\n')
            except Exception:
                print("Name: --Currently unavailable--" + '\n')
            try:
                file.write("Username: "+ username + '\n')
            except Exception:
                print("Username: --Currently unavailable--" + '\n')
            try:
                file.write("Post: "+ post + '\n')
            except Exception:
                print("Post: --Currently unavailable--" + '\n')
            try:
                file.write("post's link: "+ link.strip() + '\n')
            except Exception:
                print("post's link: --Currently unavailable--" + '\n')
            try:
                file.write("Replies: "+ replies.strip() + '\n')
            except Exception:
                print("Replies: --Currently unavailable--" + '\n')
            try:
                file.write("Retweet: "+ retweets.strip() + '\n')
            except Exception:
                print("Retweet: --Currently unavailable--" + '\n')
            try:
                file.write("Likes: "+ likes.strip() + '\n')
            except Exception:
                print("Likes: --Currently unavailable--" + '\n')
            try:
                # targetname is assumed to be a module-level variable
                if username != "@" + targetname:
                    file.write("Retweeted By: " + retweetby.strip() + '\n')
            except Exception:
                file.write("Retweeted By: --Currently unavailable--" + '\n')
            try:
                file.write("Date: " + date + '\n')
            except Exception:
                file.write("Date: --Currently unavailable--" + '\n')
            file.write('\n')  # blank line separates records
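As an aside, the repeated try/except blocks in `append_to_file` all do the same thing: write `Label: value`, or a placeholder when the value is missing. A minimal sketch of condensing them, assuming missing values arrive as `None`; `format_record` is a hypothetical helper, not part of the original code, and unlike the original it strips every value and writes placeholders to the record instead of printing them:

```python
def format_record(fields, placeholder="--Currently unavailable--"):
    """Build one record block from (label, value) pairs.

    Hypothetical helper: a value that is not a string (e.g. None) is
    replaced by the placeholder instead of raising inside a try/except.
    """
    lines = []
    for label, value in fields:
        text = value.strip() if isinstance(value, str) else placeholder
        lines.append(f"{label}: {text}")
    # A blank line separates consecutive records in the .txt file.
    return "\n".join(lines) + "\n\n"


record = format_record([
    ("Name", "Donald J. Trump"),
    ("Username", "@realDonaldTrump"),
    ("post's link", "https://twitter.com/realDonaldTrump/status/869170615881793536"),
    ("Replies", None),
])
print(record)
```

The returned string can then be written with a single `file.write(record)` inside `with open(path, 'a') as file:`.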




My .txt file currently looks like this:

    Name: Donald J. Trump
    Username: @realDonaldTrump
    Post: I look forward to paying my respects to our brave men and women on this Memorial Day at Arlington National Cemetery later this morning.
    post's link: https://twitter.com/realDonaldTrump/status/869170615881793536
    Replies: 14,333 replies
    Retweet: 13,492 retweets
    Likes: 74,645 likes
    Date: 5:36 AM - 29 May 2017

    Name: Donald J. Trump
    Username: @realDonaldTrump
    Post: Today we remember the men and women who made the ultimate sacrifice in serving. Thank you, God bless your families & God bless the USA!
    post's link: https://twitter.com/realDonaldTrump/status/869170351049240576
    Replies: 8,827 replies
    Retweet: 33,541 retweets
    Likes: 123,112 likes
    Date: 5:35 AM - 29 May 2017
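Because `append_to_file` writes each field on its own line, with the link on a line starting `post's link: `, the stored links can be parsed out and compared exactly. A plain substring test (`link in file.read()`) would also match a shorter link that happens to be a prefix of a stored one. A minimal sketch, assuming that line format; `stored_links` and `is_new_link` are hypothetical helper names:

```python
import os

LINK_PREFIX = "post's link: "


def stored_links(path):
    """Return the set of links already recorded in the .txt file.

    Assumes one field per line, with the link on a line that starts
    with "post's link: " (the format written by append_to_file).
    """
    if not os.path.exists(path):
        return set()  # nothing has been crawled yet
    links = set()
    with open(path) as f:
        for line in f:
            if line.startswith(LINK_PREFIX):
                links.add(line[len(LINK_PREFIX):].strip())
    return links


def is_new_link(path, link):
    # Exact set membership, so ".../status/869170615881793" does not
    # falsely match the longer ".../status/869170615881793536".
    return link.strip() not in stored_links(path)
```

In the crawler, `is_new_link(crawledfile, link)` could be checked before calling `append_to_file`; for many links it is cheaper to call `stored_links` once and keep the set in memory.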

If I interpreted your question correctly, you just need to append the character 'a' or 'w' to what you write to your text file, depending on whether the link is already present in the file. For this, you can use the following code:

    def append_to_file(path, name, username, post, link, replies, retweets, likes, retweetby, date):
        # 'a+' opens the file for reading and appending (creating it if needed);
        # with plain 'a' the file is write-only and file.read() raises
        # io.UnsupportedOperation.
        with open(path, 'a+') as file:
            file.seek(0)  # 'a+' starts at the end of the file; rewind before reading
            # As asked: marker 'w' if the link is already stored, otherwise 'a'.
            # read() leaves the position at the end, so the writes below append.
            if link.strip() in file.read():
                to_append = 'w'
            else:
                to_append = 'a'
            try:
                file.write("Name: " + name + to_append + '\n')
            except Exception:
                print("Name: --Currently unavailable--" + '\n')
            try:
                file.write("Username: " + username + to_append + '\n')
            except Exception:
                print("Username: --Currently unavailable--" + '\n')
            try:
                file.write("Post: " + post + to_append + '\n')
            except Exception:
                print("Post: --Currently unavailable--" + '\n')
            try:
                file.write("post's link: " + link.strip() + to_append + '\n')
            except Exception:
                print("post's link: --Currently unavailable--" + '\n')
            try:
                file.write("Replies: " + replies.strip() + to_append + '\n')
            except Exception:
                print("Replies: --Currently unavailable--" + '\n')
            try:
                file.write("Retweet: " + retweets.strip() + to_append + '\n')
            except Exception:
                print("Retweet: --Currently unavailable--" + '\n')
            try:
                file.write("Likes: " + likes.strip() + to_append + '\n')
            except Exception:
                print("Likes: --Currently unavailable--" + '\n')
            try:
                # targetname is assumed to be a module-level variable, as in the question
                if username != "@" + targetname:
                    file.write("Retweeted By: " + retweetby.strip() + to_append + '\n')
            except Exception:
                file.write("Retweeted By: --Currently unavailable--" + '\n')
            try:
                file.write("Date: " + date + to_append + '\n')
            except Exception:
                file.write("Date: --Currently unavailable--" + to_append + '\n')
            file.write('\n')  # blank line between records
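Reading the existing contents and appending in the same `with` block depends on the file mode. A small self-contained demonstration, using a throwaway temporary file and placeholder `example.com` links (both assumptions for the demo only): mode 'a' is write-only, while 'a+' can read after rewinding with `seek(0)` and still appends new writes at the end.

```python
import io
import os
import tempfile

# Throwaway file for the demo
path = os.path.join(tempfile.mkdtemp(), "crawled.txt")
with open(path, "w") as f:
    f.write("post's link: https://example.com/1\n")

# Mode 'a' is write-only: read() raises io.UnsupportedOperation.
try:
    with open(path, "a") as f:
        f.read()
    read_in_a_mode = True
except io.UnsupportedOperation:
    read_in_a_mode = False

# Mode 'a+' can read and append. The position starts at the end of the
# file, so seek(0) is needed before read() returns the existing contents.
with open(path, "a+") as f:
    f.seek(0)
    already_stored = "https://example.com/1" in f.read()
    # read() left the position at the end, so this line is appended
    f.write("post's link: https://example.com/2\n")

with open(path) as f:
    lines = f.read().splitlines()

print(read_in_a_mode, already_stored, lines)
```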

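Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.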

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM