
Delete specific line from txt file in python

I'm scraping data from a list of URLs (input.txt) and saving the data in output.txt.

I want to delete each URL from the input file as soon as it has been scraped in the loop.

This is my code:

def scrape(url):
    # do scraping and return json
    return json

with open("input.txt", 'r+') as urllist, open('output.txt', 'a+') as outfile:
    for url in urllist.read().splitlines():
        data = scrape(url)
        if data:
            if data['products'] is None:
                print("data NOT FOUND: %s" % url)
            else:
                for product in data['products']:
                    print("Saving data: %s" % product['data'])
                    outfile.write(product['data'])
                    outfile.write("\n")

I added the code below inside the loop to delete each URL as it is processed, but it deletes all the URLs at once instead of one by one:

    #start new code
    d = urllist.readlines()
    urllist.seek(0)
    for i in d:
        if i != url:
            urllist.write(i)
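One likely problem with the snippet above: by the time `readlines()` runs, the earlier `urllist.read()` has already moved the file pointer to the end of the file, and the file is never truncated, so a shorter rewrite can leave stale bytes behind. A minimal standalone sketch of the rewrite step that rewinds first and truncates before writing (the helper name `remove_url` and the demo file are mine, not from the original post):

```python
import os

def remove_url(path, url):
    """Rewrite the file at `path`, dropping the single line equal to `url`.

    Hypothetical helper, not part of the original question's code.
    """
    with open(path, "r+") as f:
        lines = f.readlines()
        f.seek(0)
        f.truncate()          # discard the old contents before rewriting
        for line in lines:
            if line.strip() != url:
                f.write(line)

# Demo against a throwaway copy, not the real input.txt
with open("input_demo.txt", "w") as f:
    f.write("url1\nurl2\nurl3\n")
remove_url("input_demo.txt", "url2")
with open("input_demo.txt") as f:
    print(f.read())           # url1 and url3 remain
os.remove("input_demo.txt")
```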

The input.txt file contains the following data:

url1
url2
url3

While the output.txt file contains:

data1
data2
data3

I am referring to this code.

I have shared an example of removing a line from a file after using that line. Note that I added a function named "printFileContents" to show you what happens to the file contents after each scraping iteration. That function is not actually necessary; it just helps visualize what is happening. See the example below:

def scrape(url):
    # Do some stuff
    return True

def executeScrapeIteration(input_file):
    # Get the first line in the file
    url = input_file.readline()

    # Do your scraping and whatever else
    scrape(url)

    # To remove the line you just used, you have to rewrite the file, but don't include that line
    lines = input_file.readlines()
    input_file.seek(0)
    input_file.truncate()
    for line in lines:
        if line != url:
            input_file.write(line)

# This function is just to show you what happens to the file after each scrape iteration
def printFileContents(input_file, i):
    input_file.seek(0)
    print("-----------------")
    print("After iteration " + str(i) + ":\n")
    print(input_file.read())
    print("\n-----------------\n\n")
    input_file.seek(0)
    

# main function
if __name__ == "__main__":
    
    with open("input.txt",'r+') as input_file:
        # Count the lines and then reset the pointer to 0 position
        line_count = len(input_file.readlines())
        input_file.seek(0)
        
        # While the file still contains url, execute an iteration of scraping
        for x in range(0, line_count):
            executeScrapeIteration(input_file)
            printFileContents(input_file, x)

My input.txt file is as follows:

url1
url2
url3

Just copy/paste my Python script and input.txt file, then run the script.
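The same pop-the-first-line idea can also be written without counting lines up front: take the first URL, rewrite the file without it, and stop when the file is empty. A sketch under the same assumptions (the helper names and demo file are mine, not from the answer; `scrape` is a stand-in):

```python
import os

def scrape(url):
    # Stand-in for the real scraper; assume it returns scraped data
    return "data for " + url

def consume_first_url(path):
    """Pop the first line off the file at `path`; return it, or None when empty."""
    with open(path, "r+") as f:
        url = f.readline()
        if not url:
            return None
        rest = f.read()       # everything after the first line
        f.seek(0)
        f.truncate()
        f.write(rest)
    return url.strip()

# Demo with a throwaway file instead of the real input.txt
with open("demo_input.txt", "w") as f:
    f.write("url1\nurl2\nurl3\n")

while (url := consume_first_url("demo_input.txt")) is not None:
    print(scrape(url))

os.remove("demo_input.txt")
```

This keeps each iteration's file rewrite inside one `open` call, so an interrupted run loses at most the URL currently being scraped.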

Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.

 