
Why does my word counter produce a different output the first time I run it compared to the second time?

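I'm doing a basic practice project. I fetch a simple Wikipedia page, then use Beautiful Soup to write all of its content into a text file. Then I count the number of times a word appears in that newly written text file.

For some reason, I get a different number the first time I run the code than I get the second time I run it.

I believe "anime.txt" is different the first time I run the code than it is the second time.

The problem must be in the way I am gathering all of the text data with Beautiful Soup.

Please help.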

from urllib.request import urlopen
from bs4 import BeautifulSoup

f = open("anime.txt", "w", encoding="utf-8")
f.write("")
f.close() 

my_url ="https://en.wikipedia.org/wiki/Anime"

uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()
page_soup = BeautifulSoup(page_html, "html.parser")
p=page_soup.findAll("p")


f = open("anime.txt", "a", encoding="utf-8")

for i in p:

    f.write(i.text)
    f.write("\n\n")

data= open("anime.txt", encoding="utf-8").read()
anime_count = data.count("anime")
Anime_count = data.count("Anime")

print(anime_count,"\n")
print(Anime_count, "\n")

count= anime_count+Anime_count

print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)

First output:

anime_count = 14

Anime_count = 97

count = 111

Second output:

anime_count = 23

Anime_count = 139

count = 162

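Edit:

I edited the code based on the first two comments and, of course, it works now :P. Does this look better with regard to how, and how many times, the files are opened and closed?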

from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url ="https://en.wikipedia.org/wiki/Anime"

uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()
page_soup = BeautifulSoup(page_html, "html.parser")
p=page_soup.findAll("p")


f = open("anime.txt", "w", encoding="utf-8")

for i in p:

    f.write(i.text)
    f.write("\n\n")

f.close()

data= open("anime.txt", encoding="utf-8").read()
anime_count = data.count("anime")
Anime_count = data.count("Anime")

print(anime_count,"\n")
print(Anime_count, "\n")

count= anime_count+Anime_count

print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)
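For context on why the f.close() in the edited code matters: Python buffers file writes, so a file that is read back while the writing handle is still open may not yet contain everything that was written. The following is a minimal sketch of that effect, not the asker's code; the function name read_before_and_after_close and the file name demo.txt are made up for illustration, and how much text is visible before close() depends on the buffer size and the environment.

```python
import os
import tempfile


def read_before_and_after_close(text):
    """Write text to a temporary file and read it back twice:
    once while the writer is still open, once after it is closed."""
    # Hypothetical scratch file, used only for this demonstration.
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")

    f = open(path, "a", encoding="utf-8")
    f.write(text)

    # The writer has not been closed or flushed yet, so some or all of
    # the text may still be sitting in Python's write buffer.
    with open(path, encoding="utf-8") as reader:
        before = reader.read()

    # close() flushes the buffer to disk.
    f.close()

    with open(path, encoding="utf-8") as reader:
        after = reader.read()

    return before, after


before, after = read_before_and_after_close("Anime and anime\n")
print("characters visible before close():", len(before))
print("characters visible after close(): ", len(after))
```

With a short string like this one, the first read will usually see less than the second, which is the same kind of mismatch the original script ran into when it counted words before closing the file.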

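Don't confuse yourself with opening and closing files. Put all of the writing and reading parts inside with statements, which close the file automatically when the block ends.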

from urllib.request import urlopen
from bs4 import BeautifulSoup

with open("anime.txt", "w", encoding="utf-8") as outfile:

    my_url ="https://en.wikipedia.org/wiki/Anime"

    uClient = urlopen(my_url)
    page_html = uClient.read()
    uClient.close()
    page_soup = BeautifulSoup(page_html, "html.parser")
    p=page_soup.findAll("p")


    for i in p:
        outfile.write(i.text)
        outfile.write("\n\n")

with open("anime.txt", "r", encoding="utf-8") as infile:
    data = infile.read()
    anime_count = data.count("anime")
    Anime_count = data.count("Anime")

    print(anime_count,"\n")
    print(Anime_count, "\n")

    count= anime_count+Anime_count

    print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)
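As a side note on the counting step itself: data.count("anime") counts substrings, so it also matches longer words such as "animes". If whole words are what you want, one possible alternative (an assumption about intent, not something the question asks for) is a case-insensitive regular expression with word boundaries. The helper name count_word and the sample text below are made up for illustration.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of word in text."""
    # re.escape guards against special characters in the word;
    # \b on both sides keeps "animes" from matching "anime".
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = "Anime is popular. Many anime fans collect animes."

# Substring counting, as in the code above: also matches inside "animes".
print(sample.count("anime") + sample.count("Anime"))  # 3

# Whole-word counting: "Anime" and "anime" only.
print(count_word(sample, "anime"))  # 2
```

Which of the two is right depends on whether plurals and compound words should be included in the total.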
