使用Python Pandas读取.txt文件-字符串和浮点数

Question

我想使用Pandas在Python（3.6.0）中读取.txt文件。 .txt文件的第一行如下所示：

要读取的文本文件

Location:           XXX
Campaign Name:      XXX
Date of log start:  2016_10_09
Time of log start:  04:27:28
Sampling Frequency: 1Hz
Config file:        XXX
Logger Serial:      XXX

CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000

我正在使用下面的简单代码行：

Python代码

import pandas
df = pandas.read_csv("TextFile.txt", sep=";", header=[10])
print(df)

然后在终端中获得以下输出：

终端输出

    Time msec Channel1 Channel2 Channel3 Channel4
0    NaN  NaN      NaN      NaN      NaN      NaN
1    NaN  NaN      NaN      NaN      NaN      NaN
2    NaN  NaN      NaN      NaN      NaN      NaN
..   ...  ...      ...      ...      ...      ...
599  NaN  NaN      NaN      NaN      NaN      NaN

我立即想到的是，Pandas不喜欢前两列。 您是否有任何建议可以使Pandas读取.txt文件而不更改文件本身中的任何内容？

先感谢您。

Answer 1

您想传递skiprows=11 ，并将skipinitial_space=True传递给read_csv以及sep=';' 当您与分隔符一起有空格时：

In [83]:
import io
import pandas as pd
t="""Location:           XXX
Campaign Name:      XXX
Date of log start:  2016_10_09
Time of log start:  04:27:28
Sampling Frequency: 1Hz
Config file:        XXX
Logger Serial:      XXX

CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""

df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True)
df

Out[83]:
       Time  msec  Channel1  Channel2  Channel3  Channel4
0  04:30:00     0   0.01526  10.67903  10.58366       0.0
1  04:30:01     0   0.17090  10.68666  10.58518       0.0
2  04:30:02     0   0.25177  10.68284  10.58442       0.0

您可以看到dtypes现在是正确的：

In [84]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time        3 non-null object
msec        3 non-null int64
Channel1    3 non-null float64
Channel2    3 non-null float64
Channel3    3 non-null float64
Channel4    3 non-null float64
dtypes: float64(4), int64(1), object(1)
memory usage: 224.0+ bytes

您可能还希望将时间解析为日期时间：

In [86]:    
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True, parse_dates=['Time'])
df

Out[86]:
                 Time  msec  Channel1  Channel2  Channel3  Channel4
0 2017-03-16 04:30:00     0   0.01526  10.67903  10.58366       0.0
1 2017-03-16 04:30:01     0   0.17090  10.68666  10.58518       0.0
2 2017-03-16 04:30:02     0   0.25177  10.68284  10.58442       0.0

In [87]:    
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time        3 non-null datetime64[ns]
msec        3 non-null int64
Channel1    3 non-null float64
Channel2    3 non-null float64
Channel3    3 non-null float64
Channel4    3 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(1)
memory usage: 224.0 bytes

使用Python Pandas读取.txt文件-字符串和浮点数

问题描述

要读取的文本文件

Python代码

终端输出

1 个解决方案

解决方案1
3 已采纳 2017-03-16 09:23:59

使用Python Pandas读取.txt文件-字符串和浮点数

问题描述

要读取的文本文件

Python代码

终端输出

1 个解决方案

解决方案1 3 已采纳 2017-03-16 09:23:59

解决方案1
3 已采纳 2017-03-16 09:23:59