Running into issues opening/encoding a text file in Python
Here is the raw text:
Issue / Problem Encountered Solution / Lessons
• Sample result on the print out was reported with a
“sample not seen” message indication
• Symbol character (*, ?) next to the sample value
result
• Impact :
– Wrong result / NC generation
– Downtime and delay in Lot disposition
Check for print out errors like:
• If an error is displayed for example:
“sample not seen” refer to SOP- 013499 and repeat sample.
• Sample result should not have an
interrogation mark before the sample value.
• Impact to the area:
– Minimize possible OOS results – Minimize NC
– Reduce cost for OOS
Investigation
Always ensure to verify that the print out report:
Does not has the message “sample not seen” and the symbol
“sample not
seen” message
Sample result should not have an interrogation mark before the sample value
characters on the sample value result
Now, I've used the following code to process the data:
import io
from os import listdir
from os.path import isfile, join

for ix, f in enumerate(listdir(directory_learning_group)):
    if isfile(join(directory_learning_group, f)):
        if "OPL" in f:
            try:
                dataset_outer_folder_OPL.loc[ix, "ID"] = f.split('_')[0]
                dataset_outer_folder_OPL.loc[ix, "Filename"] = f
                # Open the file
                fd = io.open(directory_learning_group + '{}'.format(f), encoding='utf8', errors='ignore')
                # Read the text
                ret = fd.read()
                dataset_outer_folder_OPL.loc[ix, "Text"] = ret
            except:
                # Print the name of any file that could not be processed
                print(f)
dataset_learning_group_OPL = dataset_learning_group_OPL.reset_index(drop=True)
And end up with the following result:
'A\x00M\x00L\x00 \x006\x00 \x00P\x00U\x00R\x00 \x00O\x00n\x00e\x00-\x00P\x00o\x00i\x00n\x00t\x00 \x00L\x00e\x00s\x00s\x00o\x00n\x00:\x00 \x00I\x00n\x00c\x00o\x00r\x00r\x00e\x00c\x00t\x00 \x00E\x00n\x00d\x00o\x00t\x00o\x00x\x00i\x00n\x00 \x00r\x00e\x00s\x00u\x00l\x00t\x00 \x00o\x00n\x00 \x00t\x00h\x00e\x00 \x00p\x00r\x00i\x00n\x00t\x00 \x00o\x00u\x00t\x00 \x00r\x00e\x00p\x00o\x00r\x00t\
I'm having trouble understanding what exactly is happening here. This .txt file doesn't look any different from the other files that I am able to read in without issues.
Even when I try to decode/encode it, it doesn't help at all.
Any help/guidance would be much appreciated.
You should probably post your whole code in the question. Anyhow, I tested a raw text file with what you posted, and the following code works on Python 3.x:
with open('10020_OPL Endotoxin testing.txt', 'rb') as f:
    file = f.readlines()
print(file)
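A note on the output in the question: a \x00 after every character is what UTF-16 text looks like when it is decoded as UTF-8, so that particular file is most likely saved as UTF-16 rather than UTF-8. With errors='ignore', a leading byte order mark (BOM) would be dropped silently, which would explain why no exception was raised. Below is a minimal sketch of one way to handle this, assuming the file is either UTF-8 or UTF-16. The helper name read_text_guessing_encoding and the "more than half the bytes in one position are NUL" heuristic are my own illustration, not part of any library, and UTF-32 is not handled.

```python
import codecs


def read_text_guessing_encoding(path, fallback="utf-8"):
    """Read a text file, picking the codec from its BOM or NUL-byte pattern.

    Hypothetical helper: assumes the file is UTF-8 or UTF-16 only.
    """
    with open(path, "rb") as fh:
        raw = fh.read()

    # A byte order mark at the start of the file identifies the encoding.
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM and strips it.
        return raw.decode("utf-16")

    # No BOM: mostly-ASCII UTF-16 text has a NUL in every second byte.
    # This threshold is a rough heuristic, not a guarantee.
    half = len(raw) // 2
    if half and raw[1::2].count(b"\x00") > half / 2:
        return raw.decode("utf-16-le")
    if half and raw[0::2].count(b"\x00") > half / 2:
        return raw.decode("utf-16-be")

    return raw.decode(fallback, errors="replace")


# Reproducing the symptom from the question: UTF-16 bytes decoded as UTF-8.
symptom = "AML".encode("utf-16-le").decode("utf-8")
```

In the loop from the question, the io.open(...) and fd.read() pair could be replaced by a single call to this helper with the full file path. If you already know the file is UTF-16, passing encoding='utf-16' to open() is simpler and avoids the guesswork.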