How can I use the Unix "uniq -c" command in Python code?

I have to check how many times each word occurs in a paragraph, and print each word along with its number of occurrences.

For example, if the paragraph is

how are you now? Are you better now?

then the output should be:

how-1
are-2
you-2
now-2
better-1

I tried using the subprocess module:

from subprocess import call
sen=raw_input("enter:")
call(["uniq", "-c",sen])

but the function wants a file as input. I don't want to input a file. How do I make it work?

Just for completeness, this is how you could solve it in Python:

import re, collections

paragraph = "how are you now? Are you better now?"

splitter = re.compile(r'\W+')  # raw string: split on runs of non-word characters
counts = collections.Counter(word.lower() 
                             for word in splitter.split(paragraph) 
                             if word)
for word, count in counts.most_common():
    print(count, word)
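
If the output should match the word-count format from the question, in order of first appearance, a small variation might look like the sketch below. It assumes Python 3.7+ (where Counter, being a dict subclass, keeps insertion order); the helper name format_counts is made up for illustration.

```python
import re
from collections import Counter


def format_counts(paragraph):
    """Return 'word-count' strings in order of first appearance."""
    # \w+ grabs runs of word characters, so punctuation such as "?" is dropped
    words = re.findall(r"\w+", paragraph.lower())
    # Counter is a dict subclass; on Python 3.7+ it remembers insertion order
    counts = Counter(words)
    return ["{}-{}".format(word, count) for word, count in counts.items()]


for line in format_counts("how are you now? Are you better now?"):
    print(line)
```

Unlike most_common(), which orders by count, iterating over items() keeps the words in the order they were first seen.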

As a comment on Dimitris Jim's answer (I would post this as a comment, but I don't have enough rep): you'll also need to sort the input, because uniq only merges adjacent duplicate lines. You can do this in Python by replacing the regex statement with this:

sen_list = sen.split(" ")
sen_list.sort()
sen = '\n'.join(sen_list)

I'm sure there's a way to do this with the Linux sort command as well. Similarly, outside Python you can use tr ' ' '\n' to replace spaces with newlines.
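
To see why the sort matters: uniq -c only collapses adjacent identical lines. A rough pure-Python emulation of that behaviour, using itertools.groupby, is sketched below. The function name uniq_c is hypothetical, and this is an illustration rather than a full reimplementation of uniq.

```python
from itertools import groupby


def uniq_c(lines):
    """Count runs of adjacent equal items, like `uniq -c` does for lines."""
    return [(len(list(run)), value) for value, run in groupby(lines)]


words = "how are you now are you better now".split()

# Unsorted: no two equal words are adjacent, so each one is counted separately
print(uniq_c(words))

# Sorted: equal words become adjacent and are counted together
print(uniq_c(sorted(words)))
```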

If you really want to know how to do counting using uniq, then:

from subprocess import Popen, PIPE

sen = raw_input("Enter: ")
sen = sen.lower().split() # Remove capitals and split into list of words
# Sort list to provide correct count ("-c" option counts only consecutive repeats)
# If you want to get consecutives, just don't sort.
sen.sort()
sen = "\n".join(sen) # Put each word into its own line (for input)
# uniq accepts input from stdin
p = Popen(["uniq", "-c"], stdin=PIPE, stdout=PIPE)
# Pass the input, get the output, and split it into a list of lines
out = p.communicate(sen)[0].split("\n")
counts = [] # Parse the output into a list
for x in out:
    if not x: continue # Skip empty lines (one usually appears at the end of the output)
    counts.append(tuple(x.split())) # Split the line into a (number, word) tuple and add it to counts

# And if you want output formatted like the example in the question:
for x in counts:
    print x[1]+"-"+x[0]

Note1: This is definitely not the way to do it. You really should write it in Python.

Note2: This was tested on Cygwin and Ubuntu 12.04, with the same results.

Note3: uniq is not a function; it is a command, i.e. a program stored at /bin/uniq or /usr/bin/uniq.
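
The uniq-based snippet in this answer is written for Python 2 (raw_input, the print statement, passing a str to communicate). A minimal Python 3.7+ sketch of the same idea follows, assuming uniq is available on the PATH. The function name count_with_uniq is made up for illustration, and unlike the original it strips punctuation with a regex before counting.

```python
import re
import subprocess


def count_with_uniq(text):
    """Count words by piping sorted, one-per-line words through `uniq -c`."""
    # Lowercase and keep only runs of word characters (drops "?" etc.);
    # sorting makes equal words adjacent, which `uniq -c` requires.
    words = sorted(re.findall(r"\w+", text.lower()))
    if not words:
        return []
    result = subprocess.run(
        ["uniq", "-c"],
        input="\n".join(words) + "\n",  # one word per line on stdin
        capture_output=True,
        text=True,   # exchange str instead of bytes
        check=True,  # raise CalledProcessError if uniq fails
    )
    counts = []
    for line in result.stdout.splitlines():
        if not line.strip():
            continue  # skip blank lines
        number, word = line.split()
        counts.append((word, int(number)))
    return counts


for word, count in count_with_uniq("how are you now? Are you better now?"):
    print("{}-{}".format(word, count))
```

Because the words are sorted before being handed to uniq, the results come back in alphabetical order rather than in order of first appearance.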
