Python: only the last line of output is written to the file

Question

I am trying to write a program that extracts URLs from a website. The output is good, but when I try to write the output to a file, only the last record is written. Here is the code:

import re
import urllib.request

# Retrieves URLs from the HTML source code of a website
def extractUrls(url, unique=True, sort=True, restrictToTld=None):
    # Prepend "www." if not present
    if url[0:4] != "www.":
        url = "".join(["www.",url])
    # Open a connection
    with urllib.request.urlopen("http://" + url) as h:
        # Grab the headers
        headers = h.info()
        # Default charset
        charset = "ISO-8859-1"
        # If a charset is in the headers then override the default
        for i in headers:
            match = re.search(r"charset=([\w\-]+)", headers[i], re.I)
            if match != None:
                charset = match.group(1).lower()
                break
        # Grab and decode the source code
        source = h.read().decode(charset)
        # Find all URLs in the source code
        matches = re.findall(r"http\:\/\/(www.)?([a-z0-9\-\.]+\.[a-z]{2,6})\b", source, re.I)
        # Abort if no URLs were found (re.findall returns an empty list, never None)
        if not matches:
            return []
        # Collect URLs
        collection = []
        # Go over URLs one by one
        for url in matches:
            url = url[1].lower()
            # If there are more than one dot then the URL contains
            # subdomain(s), which we remove
            if url.count(".") > 1:
                temp = url.split(".")
                tld = temp.pop()
                url = "".join([temp.pop(),".",tld])
            # Restrict to TLD if one is set
            if restrictToTld:
                tld = url.split(".").pop()
                if tld != restrictToTld:
                    continue
            # If only unique URLs should be returned
            if unique:
                if url not in collection:
                    collection.append(url)
            # Otherwise just add the URL to the collection
            else:
                collection.append(url)
        # Done
        return sorted(collection) if sort else collection

# Test
url = "msn.com"
print("Parent:", url)
for x in extractUrls(url):
    print("-", x)

f = open("f2.txt", "w+", 1)
f.write( x ) 
f.close()

The output is:

Parent: msn.com
- 2o7.net
- atdmt.com
- bing.com
- careerbuilder.com
- delish.com
- discoverbing.com
- discovermsn.com
- facebook.com
- foxsports.com
- foxsportsarizona.com
- foxsportssouthwest.com
- icra.org
- live.com
- microsoft.com
- msads.net
- msn.com
- msnrewards.com
- myhomemsn.com
- nbcnews.com
- northjersey.com
- outlook.com
- revsci.net
- rsac.org
- s-msn.com
- scorecardresearch.com
- skype.com
- twitter.com
- w3.org
- yardbarker.com
[Finished in 0.8s]

Only "yardbarker.com" is written to the file. I appreciate the help, thank you.
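The cause is where the write call sits: f.write(x) comes after the for loop, not inside it. Python does not scope a loop variable to the loop body, so once the loop finishes x is still bound to the last item, and the single write stores only that item. Below is a minimal sketch of that behaviour. The hard-coded list is a hypothetical stand-in for extractUrls(url), so no network access is needed.

```python
# Hypothetical stand-in for extractUrls(url); the real function fetches a web page.
urls = ["2o7.net", "atdmt.com", "yardbarker.com"]

for x in urls:
    print("-", x)  # runs once per URL

# The loop has ended, but x is still bound to the last item it was given,
# so a single write placed here can only ever see that one value.
print("after the loop, x =", x)  # prints: after the loop, x = yardbarker.com
```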

Answer 1

url = "msn.com"
print("Parent:", url)
f = open("f2.txt", "w")
for x in extractUrls(url):
    print("-", x)
    f.write( x )
f.close()
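Where the file is opened also matters, not just where it is written. Mode "w" truncates the file each time open() is called. Moving open() into the loop would therefore bring back the original symptom. The sketch below compares the two placements. The temporary path and the three-item list are placeholders, not taken from the question.

```python
import os
import tempfile

urls = ["bing.com", "live.com", "yardbarker.com"]  # placeholder data
path = os.path.join(tempfile.mkdtemp(), "f2.txt")

# Wrong: reopening in "w" mode inside the loop empties the file on every pass.
for x in urls:
    with open(path, "w") as f:
        f.write(x)
with open(path) as f:
    reopened = f.read()

# Right: open once before the loop, then write every item inside it.
with open(path, "w") as f:
    for x in urls:
        f.write(x)
with open(path) as f:
    opened_once = f.read()

print(repr(reopened))     # 'yardbarker.com'
print(repr(opened_once))  # 'bing.comlive.comyardbarker.com'
```

Note that opened_once has no separators between the names. write() appends nothing after the string, which is the issue Answer 2 addresses.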

Answer 2

As the other answers say, the file write needs to be inside the loop. Also try writing a newline character \n after x:

f = open("f2.txt", "w+")
for x in extractUrls(url):
    print("-", x)
    f.write( x +'\n' ) 
f.close()
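Without the newline, consecutive write() calls run together on one line (for example bing.comlive.com), because write() adds no line break of its own. Below is a sketch of the same loop inside a with block, which closes the file automatically. save_urls is a hypothetical helper name, and the list stands in for extractUrls(url).

```python
import os
import tempfile

def save_urls(path, urls):
    """Write one URL per line (hypothetical helper, not part of the question)."""
    with open(path, "w") as f:  # the with block closes the file when it exits
        for x in urls:
            print("-", x)
            f.write(x + "\n")  # write() adds no line break on its own

urls = ["bing.com", "live.com", "yardbarker.com"]  # stand-in for extractUrls(url)
path = os.path.join(tempfile.mkdtemp(), "f2.txt")
save_urls(path, urls)

with open(path) as f:
    lines = f.read().splitlines()
print(lines)  # ['bing.com', 'live.com', 'yardbarker.com']
```

If you do not need the per-item print, f.write("\n".join(urls) + "\n") produces the same file in a single call.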

Also, the line "return sorted(collection) if sort else collection" has two indents where it should have one.

Also, your subdomain code might not give what you expect for domains like www.something.com.au: it will return only com.au.
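To see the problem, the sketch below pulls the question's subdomain-stripping step out into a standalone function. strip_subdomains is a hypothetical name. The function keeps only the last two labels, so a domain under a two-part suffix such as .com.au collapses to the suffix itself. The second function shows one possible workaround. It relies on an explicit assumption: a tiny, hand-picked set of multi-part suffixes. A real program would need the full Public Suffix List, for example through the third-party tldextract package.

```python
def strip_subdomains(url):
    """Same logic as the loop body in extractUrls: keep only the last two labels."""
    url = url.lower()
    if url.count(".") > 1:
        temp = url.split(".")
        tld = temp.pop()
        url = "".join([temp.pop(), ".", tld])
    return url

# Assumption: a small illustrative sample of two-label public suffixes, NOT a complete list.
MULTI_PART_SUFFIXES = {"com.au", "co.uk", "co.jp"}

def registered_domain(host):
    """Keep one label before the suffix, treating known two-label suffixes as a unit."""
    labels = host.lower().split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

for host in ["something.com.au", "www.something.com.au", "news.msn.com"]:
    print(host, "->", strip_subdomains(host), "vs", registered_domain(host))
```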

Answer 3

You need to open your file, then write each x inside the for loop.

At the end, you can close the file.

f = open("f2.txt", "w+",1)

for x in extractUrls(url):
    print("-", x)
    f.write( x ) 

f.close()
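The explicit open, write, close sequence above has one weak spot. If extractUrls raises partway through the loop, for example on a network error, execution never reaches the f.close() line. A with block closes the file on the way out in either case. The minimal sketch below uses a hypothetical generator that fails after two items, standing in for extractUrls. The ConnectionError is simulated.

```python
import os
import tempfile

def flaky_urls():
    """Hypothetical stand-in for extractUrls that fails partway through."""
    yield "bing.com"
    yield "live.com"
    raise ConnectionError("simulated network failure")

path = os.path.join(tempfile.mkdtemp(), "f2.txt")
try:
    with open(path, "w") as f:
        for x in flaky_urls():
            f.write(x + "\n")
except ConnectionError as exc:
    print("stopped:", exc)

print(f.closed)  # True: the with block closed the file despite the error
with open(path) as g:
    print(g.read().splitlines())  # ['bing.com', 'live.com']
```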

Answer 4

f = open("f2.txt", "w+", 1)

for x in extractUrls(url):
    print("-", x)
    f.write( x )

f.close()

Answer summary (4 answers):
Answer 1: 2 votes, accepted, 2013-10-17 07:35:48
Answer 2: 1 vote, 2013-10-17 08:43:11
Answer 3: 0 votes, 2013-10-17 06:40:26
Answer 4: 0 votes, 2013-10-17 06:40:41
