Why does my word counter produce a different output the first time I run it compared to the second time?
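I'm working on a basic practice project. I fetch a simple Wikipedia page and use Beautiful Soup to write all of its content to a text file. Then I count how many times a word appears in that newly written text file.
For some reason, the number I get the first time I run the code is different from the number I get the second time.
I believe "anime.txt" is different the first time I run the code than it is the second time.
The problem must be in the way I use Beautiful Soup to collect all the text data.
Please help.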
from urllib.request import urlopen
from bs4 import BeautifulSoup
f = open("anime.txt", "w", encoding="utf-8")
f.write("")
f.close()
my_url ="https://en.wikipedia.org/wiki/Anime"
uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()
page_soup = BeautifulSoup(page_html, "html.parser")
p=page_soup.findAll("p")
f = open("anime.txt", "a", encoding="utf-8")
for i in p:
    f.write(i.text)
    f.write("\n\n")
data= open("anime.txt", encoding="utf-8").read()
anime_count = data.count("anime")
Anime_count = data.count("Anime")
print(anime_count,"\n")
print(Anime_count, "\n")
count= anime_count+Anime_count
print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)
First output:
anime_count = 14
Anime_count = 97
count = 111
Second output:
anime_count = 23
Anime_count = 139
count = 162
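Edit:
I edited the code based on the first two comments and, of course, it works now :P. Does this look better in terms of the correct way to open and close the file, and how many times to do so?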
from urllib.request import urlopen
from bs4 import BeautifulSoup
my_url ="https://en.wikipedia.org/wiki/Anime"
uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()
page_soup = BeautifulSoup(page_html, "html.parser")
p=page_soup.findAll("p")
f = open("anime.txt", "w", encoding="utf-8")
for i in p:
    f.write(i.text)
    f.write("\n\n")
f.close()
data= open("anime.txt", encoding="utf-8").read()
anime_count = data.count("anime")
Anime_count = data.count("Anime")
print(anime_count,"\n")
print(Anime_count, "\n")
count= anime_count+Anime_count
print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)
Don't get confused by opening and closing files. Put all of the write/read parts inside with statements.
from urllib.request import urlopen
from bs4 import BeautifulSoup
with open("anime.txt", "w", encoding="utf-8") as outfile:
    my_url ="https://en.wikipedia.org/wiki/Anime"
    uClient = urlopen(my_url)
    page_html = uClient.read()
    uClient.close()
    page_soup = BeautifulSoup(page_html, "html.parser")
    p=page_soup.findAll("p")
    # Write the text of every <p> element, separated by blank lines
    for i in p:
        outfile.write(i.text)
        outfile.write("\n\n")
# outfile is closed (and flushed) here, so the read below sees the whole file
with open("anime.txt", "r", encoding="utf-8") as infile:
    data = infile.read()
anime_count = data.count("anime")
Anime_count = data.count("Anime")
print(anime_count,"\n")
print(Anime_count, "\n")
count= anime_count+Anime_count
print("The total number of times the word Anime appears within <p> in the wikipedia page is : ", count)
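As background on why the original version could give inconsistent counts: it read anime.txt while the append-mode handle f was still open, so part of the written text may still have been sitting in Python's write buffer rather than on disk. The sketch below reproduces that effect without any network access. The temporary file and the helper name count_before_and_after_close are made up for illustration and are not part of the original code.

```python
import os
import tempfile


def count_before_and_after_close(word="anime", repeats=10):
    """Hypothetical helper: count `word` in a file before and after the writer is closed."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo.txt")

        # Open for writing WITHOUT closing, as the question's first version did
        writer = open(path, "w", encoding="utf-8")
        writer.write((word + " ") * repeats)  # small write, may still be buffered

        # Read while the writer is still open: only flushed data is visible
        with open(path, encoding="utf-8") as reader:
            before_close = reader.read().count(word)

        # Closing flushes the buffer to disk
        writer.close()

        with open(path, encoding="utf-8") as reader:
            after_close = reader.read().count(word)

    return before_close, after_close


before, after = count_before_and_after_close()
print("before close:", before)
print("after close: ", after)
```

On CPython the first number is typically smaller than the second for a write this small, which is the same kind of mismatch described in the question. Closing the file before reading it, or using a with block as in the answer above, means the read sees everything that was written.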