I am a complete beginner in Python. I have tried many methods from Stack Overflow answers to this question, but none of them works in my script.
I have this little script, but I cannot get its huge result into a .txt file so that I can analyze the data. How do I redirect the print output to a .txt file on my computer?
from nltk.util import ngrams
import collections

with open("text.txt", "rU") as f:
    sixgrams = ngrams(f.read().decode('utf8').split(), 2)

result = collections.Counter(sixgrams)
print result
for item, count in sorted(result.iteritems()):
    if count >= 2:
        print " ".join(item).encode('utf8'), count
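Just run it on the command line: python script.py > output.txt
(Use a name other than the input file text.txt, because the shell empties the target file before the script starts reading.)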
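If you would rather do the redirection from inside the script, here is a minimal sketch, assuming Python 3.4 or newer, where contextlib.redirect_stdout is available. The report function is a hypothetical stand-in for whatever code calls print; it is not part of the original script.

```python
import contextlib
import io


def report():
    # Hypothetical stand-in for the code that prints the results.
    print("on Thursday", 3)


# Everything printed inside the with block goes to `buffer`
# instead of the terminal.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    report()

captured = buffer.getvalue()
```

The in-memory buffer is only there to keep the sketch side-effect free. Passing a file object opened with open("output.txt", "w") to redirect_stdout should send the same text to a file.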
The print statement in Python 2.x supports redirection (>> fileobj):
...
with open('output.txt', 'w') as f:
    print >>f, result
    for item, count in sorted(result.iteritems()):
        if count >= 2:
            print >>f, " ".join(item).encode('utf8'), count
In Python 3.x, the print function accepts an optional keyword parameter, file:
print("....", file=f)
If you do from __future__ import print_function in Python 2.6+, the above approach is possible even in Python 2.x.
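For reference, a Python 3 sketch of the whole loop using print(..., file=...) could look like the following. To keep it self-contained it builds the bigrams with zip instead of nltk.util.ngrams, and the sample words, the write_counts helper and the temp-directory path are all made up for illustration.

```python
import collections
import io
import os
import tempfile


def write_counts(counts, out, min_count=2):
    """Print each item seen at least min_count times to the file-like object out."""
    for item, count in sorted(counts.items()):
        if count >= min_count:
            # file= sends the text to `out` instead of sys.stdout.
            print(" ".join(item), count, file=out)


words = "a b a b c".split()
bigrams = zip(words, words[1:])  # stand-in for nltk.util.ngrams(words, 2)
result = collections.Counter(bigrams)

# Works with any object that has a write() method, e.g. an in-memory buffer...
buffer = io.StringIO()
write_counts(result, buffer)

# ...or a real text file (written to the temp directory here).
out_path = os.path.join(tempfile.gettempdir(), "output.txt")
with open(out_path, "w", encoding="utf-8") as f:
    write_counts(result, f)
```

Since the file is opened in text mode with an explicit encoding, there is no need to call .encode('utf8') on each line as in the Python 2 version.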
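Using a BufferedWriter you can do it like this (Python 2):
import io

# pathOut is the path of the output file
outs = io.BufferedWriter(io.FileIO(pathOut, "wb"))
# result is a Counter, so convert it to a string before writing
outs.write(str(result) + "\n")
for item, count in sorted(result.iteritems()):
    if count >= 2:
        outs.write(" ".join(item).encode('utf8') + " " + str(count) + "\n")
outs.flush()
outs.close()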
As Antti mentioned, you should prefer Python 3 and leave all this annoying Python 2 junk behind you. That said, the following script works with both Python 2 and Python 3.
To read and write files, use the open function from the io module, which is compatible with both Python 2 and Python 3. Always use the with statement to open a resource such as a file. The with statement wraps the execution of a block in a Python context manager. File objects implement the context manager protocol and are closed automatically on leaving the with block.
Independent of Python, if you want to read a text file you should know its encoding in order to read it properly (if you are unsure, try utf-8 first). Besides, the standard name of the UTF-8 encoding is utf-8, and the mode U is deprecated.
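To illustrate the two points above, here is a small sketch of an io.open round trip. The file name encoding_demo.txt and the sample sentence are made up for the demo; the file is written to the temp directory so nothing in your working directory is touched.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "encoding_demo.txt")

# Write some non-ASCII text with an explicit encoding.
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"100 \u20ac on Thursday\n")

# Read it back with the same encoding: the Euro sign survives.
with io.open(path, encoding="utf-8") as inp:
    text = inp.read()

# Read it with the wrong encoding: the three UTF-8 bytes of the
# Euro sign are decoded as three separate latin-1 characters.
with io.open(path, encoding="latin-1") as wrong:
    garbled = wrong.read()

# All three file objects were closed when their with blocks ended.
all_closed = out.closed and inp.closed and wrong.closed
```

Note that the wrong encoding does not raise an error here, it just silently gives you different text, which is why it pays to know the encoding up front.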
#!/usr/bin/env python
# -*- coding: utf-8; mode: python -*-

from nltk.util import ngrams
import collections
import io, sys

def main(inFile, outFile):

    with io.open(inFile, encoding="utf-8") as i:
        sixgrams = ngrams(i.read().split(), 2)

    result = collections.Counter(sixgrams)
    templ = "%-10s %s\n"

    with io.open(outFile, "w", encoding="utf-8") as o:

        o.write(templ % (u"count", u"words"))
        o.write(templ % (u"-" * 10, u"-" * 30))

        # Sorting might be expensive. Before sort, filter items you don't want
        # to handle, btw. place *count* in front of the tuple.
        filtered = [ (c, w) for w, c in result.items() if c > 1]
        filtered.sort(reverse=True)

        for count, item in filtered:
            o.write(templ % (count, " ".join(item)))

if __name__ == '__main__':
    sys.exit(main("text.txt", "out_text.txt"))
With the input text.txt file:
At eight o'clock on Thursday morning and Arthur didn't feel very good
he missed 100 € on Thursday morning. The Euro symbol of 100 € is here
to test the encoding of non ASCII characters, because encoding errors
do occur only on Thursday morning.
I get the following out_text.txt:
count      words
---------- ------------------------------
3          on Thursday
2          Thursday morning.
2          100 €