Reading a txt file to Pandas with no identified delimiter
I'm trying to read a text file into Python using pandas. I have struggled to work out how the file is delimited and how to read it. The txt file (Summary.txt) is organized like this:
Date: 9/10/2021 1:53:38 AM. Run ID: 115756
Fixture: 1 (COM 51)
-------------------
Position    Serial        Fail reason
1:          0811010750
2:          0811010240
3:          0811009324
4:          0811009130
5:          0811010032
6:          0811010082
7:          0811009366
8:          0811009247
9:          0811010170
10:                       FAILED SCAN
11:         0811009938
12:         0811009532
13:         0811009299
14:         0811009995    CO2 Stability
I tried to read it with the following code:
import pandas as pd

summary = pd.read_csv(
    r"C:\Users\eduardo.romero\Documents\VSC_TXT\Summary.txt",
    skiprows=0,
    header=None,
    delim_whitespace=True,
)
print(summary)
This gives the following result:
                       0           1         2       3    4    5         6
0                  Date:   4/11/2021  12:46:08     AM.  Run  ID:  105952.0
1               Fixture:           1      (COM     51)  NaN  NaN       NaN
2    -------------------         NaN       NaN     NaN  NaN  NaN       NaN
3               Position      Serial      Fail  reason  NaN  NaN       NaN
4                     1:  0811007101       NaN     NaN  NaN  NaN       NaN
..                   ...         ...       ...     ...  ...  ...       ...
303                  72:      FAILED      SCAN     NaN  NaN  NaN       NaN
304                  73:      FAILED      SCAN     NaN  NaN  NaN       NaN
305                  74:      FAILED      SCAN     NaN  NaN  NaN       NaN
306                  75:      FAILED      SCAN     NaN  NaN  NaN       NaN
307                  76:      FAILED      SCAN     NaN  NaN  NaN       NaN

[308 rows x 7 columns]
What I'm trying to do is read the file the way Excel would lay it out when opening it.
Thank you for your help!
EDIT: Running it with pd.read_fwf instead.
Result:
     Position\tSerial\t\tFail reason
0                  1: \t\t0811007101
1                  2: \t\t0811007303
2                  3: \t\t0811007300
3                  4: \t\t0811007312
4                  5: \t\t0811007139
..                               ...
299          72: \t\t\t\tFAILED SCAN
300          73: \t\t\t\tFAILED SCAN
301          74: \t\t\t\tFAILED SCAN
302          75: \t\t\t\tFAILED SCAN
303          76: \t\t\t\tFAILED SCAN

[304 rows x 1 columns]
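The literal \t sequences in this result suggest the file is padded with tab characters rather than spaces. A quick way to check what actually separates the fields is to print the raw lines with repr() and count candidate delimiters. The sketch below uses a few hypothetical in-memory lines modelled on the result above; the helper name delimiter_counts and the list of candidate characters are assumptions made for illustration, not part of pandas.

```python
from collections import Counter

# Hypothetical sample lines, modelled on the read_fwf result shown above.
sample_lines = [
    "Position\tSerial\t\tFail reason",
    "1: \t\t0811007101",
    "72: \t\t\t\tFAILED SCAN",
]


def delimiter_counts(lines, candidates=("\t", ",", ";", "|")):
    """Count how often each candidate delimiter occurs across the lines."""
    counts = Counter()
    for line in lines:
        for ch in candidates:
            counts[ch] += line.count(ch)
    return counts


for line in sample_lines:
    print(repr(line))  # repr() makes tabs visible as \t

counts = delimiter_counts(sample_lines)
print(counts.most_common(1))
```

For the real file, the same check could be run on the first few lines read with open(). Note that the number of tabs varies from line to line here, so the tabs appear to be used for visual alignment rather than as a strict one-tab-per-field separator.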
Use read_fwf and skip some initial lines:
import pandas as pd
df = pd.read_fwf('input.txt', skiprows=5, keep_default_na=False)
print(df)
Output:
    Position      Serial    Fail reason
0         1:  0811010750
1         2:  0811010240
2         3:  0811009324
3         4:  0811009130
4         5:  0811010032
5         6:  0811010082
6         7:  0811009366
7         8:  0811009247
8         9:  0811010170
9        10:                FAILED SCAN
10       11:  0811009938
11       12:  0811009532
12       13:  0811009299
13       14:  0811009995  CO2 Stability
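read_fwf infers column boundaries from runs of spaces, so it may return a single column when the file is padded with tabs, as the edit in the question shows. In that case one possible alternative is to parse each data line with a regular expression. This is only a sketch under stated assumptions: the sample text is hypothetical, the tab layout of the line carrying both a serial and a fail reason is a guess, the serial is assumed to be all digits, and parse_rows and ROW are illustrative names rather than pandas API.

```python
import re

import pandas as pd

# Hypothetical sample; tab counts follow the read_fwf result in the question.
# The layout of the "14:" line (serial plus fail reason) is an assumption.
sample = (
    "Date: 9/10/2021 1:53:38 AM. Run ID: 115756\n"
    "Fixture: 1 (COM 51)\n"
    "-------------------\n"
    "Position\tSerial\t\tFail reason\n"
    "1: \t\t0811010750\n"
    "10: \t\t\t\tFAILED SCAN\n"
    "14: \t\t0811009995\tCO2 Stability\n"
)

# Position number, colon, optional all-digit serial, optional free-text reason.
ROW = re.compile(r"^(\d+):\s*(\d+)?\s*(.*)$")


def parse_rows(text):
    """Build a DataFrame from the lines that look like 'N: serial reason'."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header lines, separators and blank lines are skipped
        position, serial, reason = match.groups()
        rows.append(
            {
                "Position": int(position),
                "Serial": serial or "",  # kept as text to preserve leading zeros
                "Fail reason": reason.strip(),
            }
        )
    return pd.DataFrame(rows, columns=["Position", "Serial", "Fail reason"])


df = parse_rows(sample)
print(df)
```

To apply this to the real file, the text could be read with open(path).read() and passed to parse_rows. A fail reason that starts with a digit would be misread as a serial by this pattern, so the regular expression would need adjusting if such values occur.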