Reading a txt file to Pandas with no identified delimiter

I'm trying to read a text file into Python using Pandas. I have struggled to work out how the file is delimited and how to read it. The txt file is organized like this (Summary.txt file):

Date: 9/10/2021 1:53:38 AM. Run ID: 115756
Fixture: 1 (COM 51)
-------------------
Position  Serial      Fail reason
1:        0811010750
2:        0811010240
3:        0811009324
4:        0811009130
5:        0811010032
6:        0811010082
7:        0811009366
8:        0811009247
9:        0811010170
10:                   FAILED SCAN
11:       0811009938
12:       0811009532
13:       0811009299
14:       0811009995  CO2 Stability

I tried to read it with the following code:

import pandas as pd

summary = pd.read_csv(
    r"C:\Users\eduardo.romero\Documents\VSC_TXT\Summary.txt",
    skiprows=0,
    header=None,
    delim_whitespace=True,
)
print(summary)

This gives the following result:

                       0           1         2       3    4    5         6
0                  Date:   4/11/2021  12:46:08     AM.  Run  ID:  105952.0
1               Fixture:           1      (COM     51)  NaN  NaN       NaN
2    -------------------         NaN       NaN     NaN  NaN  NaN       NaN
3               Position      Serial      Fail  reason  NaN  NaN       NaN
4                     1:  0811007101       NaN     NaN  NaN  NaN       NaN
..                   ...         ...       ...     ...  ...  ...       ...
303                  72:      FAILED      SCAN     NaN  NaN  NaN       NaN
304                  73:      FAILED      SCAN     NaN  NaN  NaN       NaN
305                  74:      FAILED      SCAN     NaN  NaN  NaN       NaN
306                  75:      FAILED      SCAN     NaN  NaN  NaN       NaN
307                  76:      FAILED      SCAN     NaN  NaN  NaN       NaN

[308 rows x 7 columns]

What I want is to read the file the way Excel lays it out (Excel data).

Thank you for your help!

EDIT: Running with FWF (pd.read_fwf)

Result:

     Position\tSerial\t\tFail reason
0                  1: \t\t0811007101
1                  2: \t\t0811007303
2                  3: \t\t0811007300
3                  4: \t\t0811007312
4                  5: \t\t0811007139
..                               ...
299          72: \t\t\t\tFAILED SCAN
300          73: \t\t\t\tFAILED SCAN
301          74: \t\t\t\tFAILED SCAN
302          75: \t\t\t\tFAILED SCAN
303          76: \t\t\t\tFAILED SCAN

[304 rows x 1 columns]
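The literal `\t` sequences in this result suggest the columns are aligned with tab characters. `read_fwf` counts each tab as a single character, so the inferred column boundaries overlap and everything lands in one column. A minimal sketch of one possible workaround, assuming tab stops every 8 characters: expand the tabs to spaces first, then let `read_fwf` infer the columns. The sample text, its exact tab counts, and `skiprows=3` are assumptions made for illustration, not the real file.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the layout in the question.
# The exact number of tabs per row is an assumption.
raw = (
    "Date: 9/10/2021 1:53:38 AM. Run ID: 115756\n"
    "Fixture: 1 (COM 51)\n"
    "-------------------\n"
    "Position\tSerial\t\tFail reason\n"
    "1: \t\t0811010750\n"
    "2: \t\t0811010240\n"
    "10: \t\t\t\tFAILED SCAN\n"
    "14: \t\t0811009995\tCO2 Stability\n"
)

# Replace each tab with spaces up to the next multiple of 8 columns,
# so that the text becomes genuinely fixed-width.
expanded = raw.expandtabs(8)

df = pd.read_fwf(
    io.StringIO(expanded),
    skiprows=3,             # the three preamble lines of this sample
    dtype=str,              # keep leading zeros in the serial numbers
    keep_default_na=False,  # leave empty cells as "" instead of NaN
)
print(df)
```

For a real file, the text would be read with `open(path).read()` before calling `expandtabs`, and `skiprows` adjusted to the number of lines above the `Position` header.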

Use read_fwf and skip the initial lines above the table:

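import pandas as pd

# skiprows drops the lines above the table header;
# keep_default_na=False leaves empty cells as "" instead of NaN.
df = pd.read_fwf('input.txt', skiprows=5, keep_default_na=False)
print(df)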

Output:

   Position      Serial    Fail reason
0        1:  0811010750               
1        2:  0811010240               
2        3:  0811009324               
3        4:  0811009130               
4        5:  0811010032               
5        6:  0811010082               
6        7:  0811009366               
7        8:  0811009247               
8        9:  0811010170               
9       10:                FAILED SCAN
10      11:  0811009938               
11      12:  0811009532               
12      13:  0811009299               
13      14:  0811009995  CO2 Stability
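If the column positions turn out to be unreliable (for example, a mix of tabs and spaces), a different technique from `read_fwf` is to parse each row with a regular expression. The sketch below is only an illustration: `parse_summary` and `ROW` are hypothetical names, the sample text is made up to resemble the question's file, and it assumes that serials are purely numeric and that a fail reason never starts with a digit.

```python
import re

import pandas as pd

# "<position>:" then an optional numeric serial, then an optional fail reason.
ROW = re.compile(r"^\s*(\d+):\s*(\d+)?\s*(.*?)\s*$")


def parse_summary(text):
    """Build a DataFrame from the rows that look like '<n>: ...'."""
    records = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # preamble, separator, and header lines do not match
        position, serial, reason = match.groups()
        records.append(
            {
                "Position": int(position),
                "Serial": serial or "",  # kept as text to preserve leading zeros
                "Fail reason": reason,
            }
        )
    return pd.DataFrame(records, columns=["Position", "Serial", "Fail reason"])


# Hypothetical sample; the whitespace between fields is an assumption.
sample = (
    "Date: 9/10/2021 1:53:38 AM. Run ID: 115756\n"
    "Fixture: 1 (COM 51)\n"
    "-------------------\n"
    "Position\tSerial\t\tFail reason\n"
    "1: \t\t0811010750\n"
    "10: \t\t\t\tFAILED SCAN\n"
    "14:  0811009995  CO2 Stability\n"
)
parsed = parse_summary(sample)
print(parsed)
```

Because the pattern only matches lines that start with a number followed by a colon, this version does not depend on knowing how many lines precede the table.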
