Trying to add code that extracts only lines containing "word" and writes a new .txt file from requests

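This code opens a text file containing websites (list.txt), extracts the URLs archived for those websites on web.archive.org, and writes them to a new text file (url.txt). I only need the links from web.archive.org that contain "word", but I get this error:

if 'word' in url:
IndentationError: unexpected indent

Can someone explain why and provide the correct code?

Code: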

urls = []
with open("list.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        urls.append(line)

archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"

with open("url.txt", "w") as f_out:
    for url in urls:

        r = requests.get(archive_url.format(url))
         if 'word' in url:
        print(r.text, file=f_out)
        print("\n", file=f_out)

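There are two problems:

  1. There is an extra leading space before the if statement, so it does not line up with the line above it
  2. The lines that follow the if statement must be indented one level further, so that they form its body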
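Both problems can be reproduced without touching the network. The following is a minimal sketch that feeds three small snippets to the built-in compile() and reports what Python says about each one. The snippet contents and the names broken, missing_block, fixed and check are made up for illustration and are not part of the original code.

```python
# Extra leading space before "if" (problem 1).
broken = (
    "for url in ['a']:\n"
    "    r = url\n"
    "     if 'word' in url:\n"
    "    print(r)\n"
)

# Leading space removed, but the body of "if" is not indented (problem 2).
missing_block = (
    "for url in ['a']:\n"
    "    r = url\n"
    "    if 'word' in url:\n"
    "    print(r)\n"
)

# Both problems fixed.
fixed = (
    "for url in ['a']:\n"
    "    r = url\n"
    "    if 'word' in url:\n"
    "        print(r)\n"
)


def check(source):
    """Compile source without running it and describe the result."""
    try:
        compile(source, "<snippet>", "exec")
        return "ok"
    except IndentationError as exc:
        return f"IndentationError: {exc.msg}"


for name, source in [("broken", broken), ("missing_block", missing_block), ("fixed", fixed)]:
    print(name, "->", check(source))
```

The first snippet fails with "unexpected indent", the same error as in the question. The second fails with an "expected an indented block" message whose exact wording depends on the Python version, and the third compiles cleanly.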

This should fix your problem:

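import requests

# Read the site names from list.txt, skipping blank lines
urls = []
with open("list.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        urls.append(line)

archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"

with open("url.txt", "w") as f_out:
    for url in urls:
        r = requests.get(archive_url.format(url))
        # Note: this tests the site name read from list.txt, not the archived URLs in r.text
        if 'word' in url:
            # Both statements belong to the if block, so they are indented one level further
            print(r.text, file=f_out)
            print("\n", file=f_out)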
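Note that the condition above checks the site name taken from list.txt, not the links returned by web.archive.org. If the goal is to keep only the archived links that contain "word", as the question describes, one option is to filter the response line by line. The following is a sketch under that assumption. The names filter_lines, save_matching_urls and fetch_from_archive are hypothetical helpers, and it assumes the CDX response contains one URL per line, as the output=text&fl=original parameters suggest.

```python
ARCHIVE_URL = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"


def filter_lines(text, keyword):
    """Return the lines of text that contain keyword."""
    return [line for line in text.splitlines() if keyword in line]


def save_matching_urls(sites, keyword, out_path, fetch):
    """Write every archived URL that contains keyword to out_path.

    fetch is any callable that takes a site name and returns the
    response body as text, which keeps this function testable offline.
    """
    with open(out_path, "w") as f_out:
        for site in sites:
            for url in filter_lines(fetch(site), keyword):
                print(url, file=f_out)


def fetch_from_archive(site):
    """Download the list of archived URLs for site as plain text."""
    # requests is a third-party package; it is imported here so the
    # helpers above can be used without it being installed.
    import requests

    return requests.get(ARCHIVE_URL.format(site)).text
```

With the urls list built from list.txt as in the code above, calling save_matching_urls(urls, "word", "url.txt", fetch_from_archive) would write only the matching links, one per line.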

