
How can I count the number of times each word occurs in the following?

Using the code below in a Jupyter notebook, I can only produce a count of each character found, but I want a count of the number of times each word occurs. Thank you!

from bs4 import BeautifulSoup as Soup, Tag
import re
import requests
from collections import Counter

url = "http://en.wikipedia.org/wiki/October_27"
DayBorn = []  # list to hold the text of each birth entry
response = requests.get(url)
soup = Soup(response.content, "html.parser")  # name the parser explicitly to avoid bs4's warning

births_span = soup.find("span", {"id": "Births"})  # first span whose id is "Births"
births_ul = births_span.parent.find_next_sibling()  # the parent's next sibling, which is the ul (unordered list)

for item in births_ul.find_all('li'):  # every li inside births_ul
    if isinstance(item, Tag):
        # print(item.text)  # uncomment to print each entry's text
        DayBorn.append(item.text)

This next section gives me a list of each word as it occurs.

text_iterated = str(DayBorn) 
[x for x,y in re.findall(r'((\w+[^,.()]))', text_iterated)]

I have tried both of these methods so far:

Counter(str(text_iterated))

and

occurrences = Counter()
for word in str(DayBorn):
    occurrences[word] += 1
occurrences  

Both produce the same result, a count of each individual character (digit, letter, or punctuation mark), e.g.:

Counter({'[': 4,
         "'": 449,
         '8': 104,
         '9': 277,
         '2': 109,
         ' ': 2237,
         '–': 225,
         'E': 50,

You very specifically told your program to iterate through the characters of the list you created:

for word in str(DayBorn):

You converted the list to its string representation and then iterated over the characters of that string. Instead, iterate over the list itself:

for word in DayBorn:

Better yet, simply use the provided Python facility for counting:

from collections import Counter
...
occurrences = Counter(DayBorn)
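
To make the difference concrete, here is a minimal sketch that runs without any scraping. The two entries in day_born are made-up stand-ins for the real list contents, not data taken from the page:

```python
from collections import Counter

# Hypothetical sample standing in for the scraped DayBorn list.
day_born = ["1858 – Theodore Roosevelt", "1914 – Dylan Thomas"]

# str(day_born) is one long string, so Counter sees individual characters,
# including the brackets and quotes of the list's printed form.
char_counts = Counter(str(day_born))

# Passing the list itself makes Counter count its elements (whole lines here).
item_counts = Counter(day_born)

print(char_counts["["])                           # a bracket from the repr got counted
print(item_counts["1858 – Theodore Roosevelt"])   # each full line is one element
```

Note that Counter(day_born) counts whole list elements, so it only yields word counts if the list actually holds one word per element.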

EDIT (per user comment)

To count words, DayBorn needs to be a list of words rather than a list of lines. Again, we need your minimal verifiable example (MVE). Perhaps this will help as you ingest lines: instead of appending the entire line to your list

    DayBorn.append(item.text)

... add the words individually

    DayBorn.extend(item.text.split())
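
Putting it together, a rough sketch of the word count might look like the following. The lines list is a hypothetical stand-in for the item.text values, and two choices here are assumptions rather than part of the original code: words are lowercased, and re.findall(r"\w+", ...) is used in place of split() so that commas and dashes do not stick to the words.

```python
import re
from collections import Counter

# Hypothetical sample lines standing in for the item.text values from the page.
lines = [
    "1858 – Theodore Roosevelt, American president",
    "1914 – Dylan Thomas, Welsh poet",
    "1932 – Sylvia Plath, American poet",
]

words = []
for line in lines:
    # \w+ matches runs of letters, digits, and underscores, so punctuation
    # such as commas and the en dash is dropped. Lowercasing is an assumption
    # made so that "American" and "american" count as the same word.
    words.extend(re.findall(r"\w+", line.lower()))

occurrences = Counter(words)
print(occurrences.most_common(3))  # the three most frequent words with their counts
```

If punctuation attached to words does not matter, the plain item.text.split() shown above is enough and the regex can be dropped.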


 