
Delete specific line from txt file in python

I am scraping data from a list of urls (input.txt) and saving the scraped data in output.txt.

I want to delete each url from the input file as soon as it has been scraped in the loop.

Here is my code:

def scrape(url):
    # do scraping and return json
    return json

with open("input.txt", 'r+') as urllist, open('output.txt', 'a+') as outfile:
    for url in urllist.read().splitlines():
        data = scrape(url)
        if data:
            if data['products'] is None:
                print("data NOT FOUND: %s" % url)
            else:
                for product in data['products']:
                    print("Saving data: %s" % product['data'])
                    outfile.write(product['data'])
                    outfile.write("\n")

I added the code below inside the loop to delete each url as the loop processes it, but it deletes all the urls at once instead of one by one:

    #start new code
    d = urllist.readlines()
    urllist.seek(0)
    for i in d:
        if i != url:
            urllist.write(i)
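Two things likely go wrong in the snippet above: the loop iterates over the live file handle while also rewriting it, and the file is never truncated, so leftover bytes from the old, longer content survive each rewrite. A minimal sketch of a working per-iteration deletion (the `scrape` stub and the demo input.txt below are placeholders for illustration, not the real scraper): read the urls into a list first, then reopen and rewrite the file after each scrape, calling `truncate()` after `seek(0)`.

```python
# Demo setup: create an input.txt like the one in the question.
with open("input.txt", "w") as f:
    f.write("url1\nurl2\nurl3\n")

def scrape(url):
    # Stub standing in for the real scraper above.
    return {"products": None}

# Read the url list once, so the loop does not depend on the live file handle.
with open("input.txt") as urllist:
    urls = urllist.read().splitlines()

for url in urls:
    data = scrape(url)
    # ... handle data as in the original loop ...

    # Rewrite input.txt without the url that was just scraped.
    with open("input.txt", "r+") as urllist:
        remaining = urllist.read().splitlines()
        urllist.seek(0)
        urllist.truncate()  # without this, stale bytes from the old file remain
        for line in remaining:
            if line != url:
                urllist.write(line + "\n")
```

After the loop finishes, input.txt is empty; after the first iteration it would contain only url2 and url3.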

The input.txt file contains the following data:

url1
url2
url3

and the output.txt file:

data1
data2
data3

I am referring to this code.

Here is an example that deletes a line from the file after that line has been used. Note that I added a function called `printFileContents` to show you what happens to the file contents after each scrape iteration; it is not actually required, it just helps visualize what is going on. See the example below:

def scrape(url):
    # Do some stuff
    return True

def executeScrapeIteration(input_file):
    # Get the first line in the file
    url = input_file.readline()

    # Do your scraping and whatever else
    scrape(url)

    # To remove the line you just used, you have to rewrite the file, but don't include that line
    lines = input_file.readlines()
    input_file.seek(0)
    input_file.truncate()
    for line in lines:
        if line != url:
            input_file.write(line)

# This function is just to show you what happens to the file after each scrape iteration
def printFileContents(input_file, i):
    input_file.seek(0)
    print("-----------------")
    print("After iteration " + str(i) + ":\n")
    print(input_file.read())
    print("\n-----------------\n\n")
    input_file.seek(0)
    

# main function
if __name__=="__main__":
    
    with open("input.txt",'r+') as input_file:
        # Count the lines and then reset the pointer to 0 position
        line_count = len(input_file.readlines())
        input_file.seek(0)
        
        # While the file still contains url, execute an iteration of scraping
        for x in range(0, line_count):
            executeScrapeIteration(input_file)
            printFileContents(input_file, x)

My input.txt file is as follows:

url1
url2
url3

只需復制/粘貼我的 python 腳本和 input.txt 文件,然后運行 python 腳本。
