Read .txt file with Python Pandas - strings and floats
I want to read a .txt file in Python (3.6.0) using Pandas. The first lines of the .txt file look like this:
Location: XXX
Campaign Name: XXX
Date of log start: 2016_10_09
Time of log start: 04:27:28
Sampling Frequency: 1Hz
Config file: XXX
Logger Serial: XXX
CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000
I am using the following simple lines of code:
import pandas
df = pandas.read_csv("TextFile.txt", sep=";", header=[10])
print(df)
Then I get the following output in the terminal:
Time msec Channel1 Channel2 Channel3 Channel4
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ...
599 NaN NaN NaN NaN NaN NaN
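My first thought is that Pandas does not like the first two columns. Do you have any suggestions for getting Pandas to read the .txt file without changing anything in the file itself?
Thanks in advance.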
You want to pass skiprows=11 and skipinitialspace=True to read_csv, along with sep=';', because your separators are followed by spaces:
In [83]:
import io
import pandas as pd
t="""Location: XXX
Campaign Name: XXX
Date of log start: 2016_10_09
Time of log start: 04:27:28
Sampling Frequency: 1Hz
Config file: XXX
Logger Serial: XXX
CH Mapping;;XXXC1;XXXC2;XXXC3;XXXC4
CH Offsets in ms;;X;X,X;X;X,X
CH Units;;mA;mA;mA;mA
Time;msec;Channel1;Channel2;Channel3;Channel4
04:30:00;000; 0.01526;10.67903;10.58366; 0.00000
04:30:01;000; 0.17090;10.68666;10.58518; 0.00000
04:30:02;000; 0.25177;10.68284;10.58442; 0.00000"""
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True)
df
Out[83]:
Time msec Channel1 Channel2 Channel3 Channel4
0 04:30:00 0 0.01526 10.67903 10.58366 0.0
1 04:30:01 0 0.17090 10.68666 10.58518 0.0
2 04:30:02 0 0.25177 10.68284 10.58442 0.0
You can see that the dtypes are now correct:
In [84]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time 3 non-null object
msec 3 non-null int64
Channel1 3 non-null float64
Channel2 3 non-null float64
Channel3 3 non-null float64
Channel4 3 non-null float64
dtypes: float64(4), int64(1), object(1)
memory usage: 224.0+ bytes
You may also want to parse the time as a datetime:
In [86]:
df = pd.read_csv(io.StringIO(t), skiprows=11, sep=';', skipinitialspace=True, parse_dates=['Time'])
df
Out[86]:
Time msec Channel1 Channel2 Channel3 Channel4
0 2017-03-16 04:30:00 0 0.01526 10.67903 10.58366 0.0
1 2017-03-16 04:30:01 0 0.17090 10.68666 10.58518 0.0
2 2017-03-16 04:30:02 0 0.25177 10.68284 10.58442 0.0
In [87]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 6 columns):
Time 3 non-null datetime64[ns]
msec 3 non-null int64
Channel1 3 non-null float64
Channel2 3 non-null float64
Channel3 3 non-null float64
Channel4 3 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(1)
memory usage: 224.0 bytes