The following script correctly identifies ASCII and non-ASCII lines, but I want one report per file, not one per line. Because the print calls are inside the loop and I have many files, I get far too much output. How can I modify this code to produce a single result per file that tells me whether the file contains any non-ASCII text?
import os
for file in os.listdir('.'):
    if file.endswith('.txt'):
        with open(file) as f:
            content = f.readlines()
            for entry in content:
                try:
                    entry.encode('ascii')
                except UnicodeEncodeError:
                    print("it was not a ascii-encoded unicode string")
                    print(file)
                else:
                    print("It may have been an ascii-encoded unicode string")
                    print(file)
For instance, if you want to show whether there was any non-ASCII text in the file, maintain a flag that records whether you've found a bad line, and report only once, after the loop over the file's lines.
import os
for file in os.listdir('.'):
    if file.endswith('.txt'):
        with open(file) as f:
            content = f.readlines()
            # Assume the file is clean until a non-ASCII line turns up.
            good_file = True
            for entry in content:
                try:
                    entry.encode('ascii')
                except UnicodeEncodeError:
                    good_file = False
                    break  # one bad line is enough; skip the rest
            # Report once per file, after the loop over its lines.
            if good_file:
                print("It may have been an ASCII-encoded unicode string")
            else:
                print("it was not an ASCII-encoded unicode string")
            print(file)
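As a variation on the flag-based loop above, here is a minimal sketch that moves the check into a helper and prints a single line per file. The names `has_non_ascii` and `report` are hypothetical, not from the original post. It also assumes Python 3.7+ for `bytes.isascii()`, and it reads the files in binary mode, which is a change from the original text-mode read, so that a file in an unexpected encoding does not raise a decode error before the check runs.

```python
import os


def has_non_ascii(path):
    """Return True if the file at `path` contains any byte outside the ASCII range."""
    # Binary mode: inspect raw bytes instead of decoding with a guessed encoding.
    with open(path, "rb") as f:
        for line in f:
            if not line.isascii():  # bytes.isascii() needs Python 3.7+
                return True  # stop at the first offending line
    return False


def report(directory="."):
    """Print exactly one line per .txt file in `directory`."""
    for name in sorted(os.listdir(directory)):
        if name.endswith(".txt"):
            path = os.path.join(directory, name)
            if has_non_ascii(path):
                print(f"{name}: contains non-ASCII text")
            else:
                print(f"{name}: ASCII only")


if __name__ == "__main__":
    report(".")
```

Returning from the helper as soon as a non-ASCII line is found plays the same role as the flag, and it avoids scanning the rest of a large file.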