Python word count not working

Textfile1 contains words, some of which are duplicates:

Train

21street

Train

and more.

I need to count how many times each word appears and write the result to Textfile2, with the duplicates removed and the words in alphabetical order (which is why I have sorted in there). Example of what the final Textfile2 should look like:

Train 2

21street 1

...and so on.

Here is my attempt:

file1=open(textfile1,"r")
list1=[]

for line in file1:
    list1.append(line)

import collections

counter=collections.Counter(list1) #not sure how I can use this in my program

list2=list(set(list1))

list3=sorted(list2)

file2=open(textfile2,"w")

for i in list3:

    file2.write(i+count((i)in list1))

The word count doesn't seem to work, and I'm not sure how to solve it. Thank you for your help.

Let's make some changes step by step, starting with your error.

file2.write(i+count((i)in list1))
#             ^^^^^^^^^^^^^^^^^^ 
# NameError: name 'count' is not defined

The problem is that you're accessing the count incorrectly. A Counter works like a dict: the key is what's being counted and the value is the count (an int). You named your Counter counter, so to access the count of line i, change the call to this, which raises a different error:

file2.write(i+counter[i])
#             ^^^^^^^^^^ 
# TypeError: must be str, not int

Even though we're successfully getting the count, we cannot add it to the line, i, like this. The line and the count are two different types; one is text (str) and the other a number (int). We need to turn that number into its textual representation. If that confuses you, think of it like this: 2 + 2 == 4 whereas "2" + "2" == "22". Here's how to do it:

file2.write(i+str(counter[i]))
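
Both points can be checked in isolation. The following is a minimal sketch that assumes a hypothetical in-memory list named words in place of the file contents:

```python
import collections

# Hypothetical sample data standing in for the lines of the input file.
words = ["Train", "21street", "Train"]

counter = collections.Counter(words)

# A Counter is indexed like a dict: key in, count (an int) out.
train_count = counter["Train"]

# A key that was never counted yields 0 instead of raising KeyError.
missing_count = counter["Bus"]

# str() turns the int into text so it can be concatenated with a str.
label = "Train" + str(train_count)
print(label)
```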

No more errors, but depending on how you're testing, the file opened as file2 might still be empty. The changes will only be written to disk once you close it when finished. To never forget doing that, you can leave the bookkeeping to Python by using the with statement. At the end of the indented block, the file is closed automatically. Below is the full code with a few more commented changes:

# imports at the top
import collections

list1=[]
with open(textfile1,"r") as file1:
    for line in file1:
        list1.append(line)
# file1 automatically closed here
counter=collections.Counter(list1)
list2=list(set(list1))
list3=sorted(list2)
with open(textfile2,"w") as file2:
    # i implies index which it isn't; let's call it line too
    for line in list3:
        file2.write(line+str(counter[line]))
# file2 automatically closed here

After running it, the file opened as file2 will look like this:

21street
1Train
2

The number ends up on the next line. This happens because the lines you stored in your lists aren't just "21street" and "Train" but "21street\n" and "Train\n". The "\n" at the end is the newline character, which serves as a line separator. If you add any text after that, it will end up on a new line; that's the point. In a list, such a separator isn't needed anymore, so let's remove it:

        list1.append(line.rstrip("\n"))
        #                ^^^^^^^^^^^^^

Now your output will be like this:

21street1Train2
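
The effect of the strip can be checked without a file. This sketch assumes a hypothetical list raw_lines shaped like the lines of a file whose last line has no trailing newline, a case where skipping the strip would also split one word into two separate counts:

```python
import collections

# Lines as they arrive when iterating over a file whose last line has no
# trailing newline: every line but the last keeps its "\n".
raw_lines = ["Train\n", "21street\n", "Train"]

# Without stripping, "Train\n" and "Train" are different keys.
unstripped = collections.Counter(raw_lines)

# rstrip("\n") removes trailing newline characters only.
clean_lines = [line.rstrip("\n") for line in raw_lines]
stripped = collections.Counter(clean_lines)

print(unstripped["Train"], stripped["Train"])
```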

You need to add the separators back, in the right place, when writing to a file again. What's the right place? At the end of a line. Also, a space between the line and the count would be nice:

        file2.write(line+" "+str(counter[line])+"\n")
        #               ^^^^                   ^^^^^

Finally, the output is as desired:

21street 1
Train 2
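
Putting the fixes together, the whole program might look like the sketch below. The function name count_words and its two path parameters are assumptions made for illustration, and the input is assumed to hold one word per line with no blank lines:

```python
import collections


def count_words(src_path, dst_path):
    """Write each distinct line of src_path to dst_path with its count."""
    with open(src_path, "r") as src:
        # Strip the trailing newline so it does not become part of the key.
        words = [line.rstrip("\n") for line in src]

    counter = collections.Counter(words)

    with open(dst_path, "w") as dst:
        # Iterating a Counter yields its distinct keys, so the separate
        # set() step is not needed; sorted() puts them in order.
        for word in sorted(counter):
            dst.write(word + " " + str(counter[word]) + "\n")
```

It would be called as count_words(textfile1, textfile2), with the two variables holding the paths from the question.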

My solution would be:

from collections import Counter

# Read the whole file and split it into lines
with open('a.txt') as f:
    a = f.read()
a = a.split('\n')
# Drop empty lines
a = [i for i in a if i != '']
b = Counter(a)
with open('b.txt', 'w') as f:
    for key in b:
        f.write('{} : {}\n'.format(key, b[key]))

Instead of a list, you could use a dict. You don't need to import any module; a little logic is enough:

track = {}
with open("file.txt", 'r') as f:
    for line in f:
        if line != '\n':
            if line.strip() not in track:
                track[line.strip()] = 1
            else:
                track[line.strip()] += 1

with open("new_text", 'w') as new:
    for key, value in track.items():
        tr = "{} {} \n".format(key, value)
        new.write(tr)

Output in the new_text file:

Train 2 
21street 1 
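
Note that a plain dict keeps insertion order (guaranteed from Python 3.7 on), so the output above is not alphabetical, which the question asked for. A possible variation is sketched below, with a hypothetical helper named count_lines and dict.get in place of the if/else. It sorts the items before formatting them:

```python
def count_lines(lines):
    """Count non-blank lines using a plain dict (no imports)."""
    track = {}
    for line in lines:
        word = line.strip()
        if word:  # skip blank lines
            # get() returns 0 for a word that has not been seen yet.
            track[word] = track.get(word, 0) + 1
    return track


# Hypothetical sample input standing in for the lines of file.txt.
track = count_lines(["Train\n", "21street\n", "\n", "Train\n"])

# sorted() on the items orders them by key, i.e. by word.
output_lines = [
    "{} {}\n".format(key, value) for key, value in sorted(track.items())
]
print("".join(output_lines), end="")
```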
