

Converting a multiple-space-delimited file to CSV using Python 2.7 with Pandas?

I have several problems I'm trying to solve with this single file, but my immediate concern is converting it to a standard CSV file without writing 1000 lines of genius-level code. The fields in the file are separated by a variable number of spaces.

I know one way to do it, because I did it in a previous project a couple of years ago: I set up functions similar to the old left$, mid$ and right$ functions in VB to select particular characters from each row. That works because the data is very well defined and neatly laid out, i.e. every column is the same width all the way down. So I can grab the header row and use those functions to pick out the column names, then go row by row, using mid$() to pull the numeric data out as strings, write them to another file with a "," between each string, convert the strings back to floats, and end up with a CSV file with headers. But wow, is that cumbersome and ugly. I want to use Pandas to make it more elegant, concise and sharp.
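For comparison, the slicing approach described above can be sketched in plain Python: string slices take the place of left$/mid$/right$, and the standard csv module inserts the commas. The helper name `fixed_width_to_csv` and the 26-character column boundaries are hypothetical; the real positions would have to be measured from the actual files.

```python
import csv
import io

# Hypothetical (start, end) character positions for each field; every field is
# assumed to be 26 characters wide. Measure the real positions from your files.
COLSPECS = [(0, 26), (26, 52), (52, 78), (78, None)]


def fixed_width_to_csv(src, dst, colspecs=COLSPECS):
    """Cut each line at fixed positions and write the pieces as CSV rows."""
    writer = csv.writer(dst)
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        # line[start:end] does the job of mid$(); strip() removes the padding
        # spaces (and the trailing newline on the last field).
        writer.writerow([line[start:end].strip() for start, end in colspecs])


# Build a header line and one data line with aligned 26-character columns.
fmt = "{:<26}{:<26}{:<26}{}\n"
text = (fmt.format("DATE", "TIME", "CH4", "H2O")
        + fmt.format("2021-04-01", "01:47:45.407",
                     "2.0063472018E+00", "1.2005321188E+00"))

out = io.StringIO()
fixed_width_to_csv(io.StringIO(text), out)
print(out.getvalue())
```

This works, but every new layout means re-measuring the column positions by hand, which is exactly the bookkeeping pandas can take over.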

Here is a snippet of the first few lines of a data file; I have hundreds of them to process. The actual files have dozens more columns; this is just a sample that shows the variable number of spaces between fields acting as delimiters.

DATE......................TIME.....................CH4.......................H2O
2021-04-01................01:47:45.407..............2.0063472018E+00..........1.2005321188E+00...
2021-04-01................01:47:46.336..............2.0063472018E+00..........1.2005321188E+00...
2021-04-01................01:47:47.244..............2.0063472018E+00..........1.2025918742E+00...
2021-04-01................01:47:49.049..............2.0059096902E+00..........1.2025918742E+00...
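Because every column has the same width all the way down, pandas also offers `pd.read_fwf` for fixed-width files. With its default `colspecs='infer'`, it works out the column boundaries from the whitespace. This is an alternative to the delimiter-based approach, not something from the original post. The sketch below rebuilds the sample with real spaces, assuming 26-character columns, and reads it from memory instead of from a file.

```python
import io

import pandas as pd

# Rebuild the sample above with real spaces; the 26-character column width is
# an assumption chosen so every field starts at the same offset on each line.
fmt = "{:<26}{:<26}{:<26}{}\n"
rows = [
    ("DATE", "TIME", "CH4", "H2O"),
    ("2021-04-01", "01:47:45.407", "2.0063472018E+00", "1.2005321188E+00"),
    ("2021-04-01", "01:47:46.336", "2.0063472018E+00", "1.2005321188E+00"),
    ("2021-04-01", "01:47:47.244", "2.0063472018E+00", "1.2025918742E+00"),
    ("2021-04-01", "01:47:49.049", "2.0059096902E+00", "1.2025918742E+00"),
]
text = "".join(fmt.format(*row) for row in rows)

# read_fwf infers where each column starts and ends (colspecs='infer' is the
# default), so neither a delimiter nor hand-measured positions are needed.
df = pd.read_fwf(io.StringIO(text))
print(df)
print(df.dtypes)
```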

I also need to parse the DATE and TIME columns into a timestamp object, which I've been trying to do with pandas read_csv(parse_dates=[[0,1]]), and that almost works. I need the dates for the x-axis labels when plotting each series... but that's a problem for another post, haha.
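One version-independent way to get the timestamp is to read DATE and TIME as plain strings and combine them yourself with `pd.to_datetime`. This also sidesteps the nested-list form of `parse_dates`, which pandas deprecates from version 2.2 onward. The sketch below assumes the column names from the sample above and assumes every timestamp has fractional seconds, which is what the explicit `format` string expects.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for one input file (hypothetical sample).
text = (
    "DATE       TIME          CH4\n"
    "2021-04-01 01:47:45.407  2.0063472018E+00\n"
    "2021-04-01 01:47:46.336  2.0063472018E+00\n"
)

# Read DATE and TIME as plain strings first; any run of whitespace separates fields.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Join the two strings and parse them once; the explicit format avoids guessing.
df["DATE_TIME"] = pd.to_datetime(df["DATE"] + " " + df["TIME"],
                                 format="%Y-%m-%d %H:%M:%S.%f")
df = df.drop(columns=["DATE", "TIME"]).set_index("DATE_TIME")
print(df)
```

With the timestamps as the index, `df.plot()` uses them for the x-axis by default.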

Thanks in advance for any assistance!!

john rainh2o

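Using Pandas, specify the delimiter as a single space (assuming your example has replaced the spaces with dots), and also specify skipinitialspace=True. With delimiter=' ' alone, every extra space would start a new, empty field; skipinitialspace=True makes the parser skip spaces that follow a delimiter, so a run of spaces of any length acts as one separator. The DATE and TIME columns can then be combined into a single datetime64 column with parse_dates: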
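A common alternative, not part of the original answer, is a regular-expression separator: `sep=r"\s+"` treats any run of spaces or tabs as a single delimiter. The sketch below reads a small hypothetical sample both ways to check that the two settings produce the same DataFrame.

```python
import io

import pandas as pd

# Hypothetical sample in the question's layout, with runs of real spaces.
text = (
    "DATE          TIME             CH4                H2O\n"
    "2021-04-01    01:47:45.407     2.0063472018E+00   1.2005321188E+00\n"
    "2021-04-01    01:47:49.049     2.0059096902E+00   1.2025918742E+00\n"
)

# Option 1 (as in the answer): one-space delimiter, extra spaces skipped.
a = pd.read_csv(io.StringIO(text), delimiter=" ", skipinitialspace=True)

# Option 2: a regular-expression separator matching any run of whitespace.
b = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(a.equals(b))
```

The regex form also copes with tabs mixed in among the spaces, which the single-space delimiter would not split on.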

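import pandas as pd

# delimiter=' ' plus skipinitialspace=True: a run of spaces acts as one separator.
# parse_dates=[['DATE', 'TIME']] merges both columns into one DATE_TIME column
# (this nested-list form is deprecated from pandas 2.2 onward).
df = pd.read_csv('input.txt', delimiter=' ', skipinitialspace=True, parse_dates=[['DATE', 'TIME']])

print(df)
print(df.dtypes)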

This would give you:

                DATE_TIME       CH4       H2O
0 2021-04-01 01:47:45.407  2.006347  1.200532
1 2021-04-01 01:47:46.336  2.006347  1.200532
2 2021-04-01 01:47:47.244  2.006347  1.202592
3 2021-04-01 01:47:49.049  2.005910  1.202592

DATE_TIME    datetime64[ns]
CH4                 float64
H2O                 float64
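The answer stops at the DataFrame, but the question asks for CSV output across hundreds of files. A sketch of both steps follows, assuming every file shares the layout shown, sits in one directory, and has a `.txt` extension. The function names `convert_file` and `convert_all`, the directory arguments and the glob pattern are all hypothetical. It is written for Python 3; for example, `os.makedirs(..., exist_ok=True)` does not exist on Python 2.7.

```python
import glob
import os

import pandas as pd


def convert_file(src_path, dst_path):
    """Read one space-delimited file and write it back out as CSV."""
    df = pd.read_csv(src_path, sep=r"\s+")
    # Replace the DATE and TIME string columns with one timestamp column.
    df.insert(0, "DATE_TIME",
              pd.to_datetime(df.pop("DATE") + " " + df.pop("TIME")))
    # index=False keeps pandas' row numbers out of the CSV.
    df.to_csv(dst_path, index=False)


def convert_all(src_dir, dst_dir, pattern="*.txt"):
    """Convert every file in src_dir matching pattern into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for src_path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        # Reuse the input file name, swapping the extension for .csv.
        name = os.path.splitext(os.path.basename(src_path))[0] + ".csv"
        convert_file(src_path, os.path.join(dst_dir, name))
```

Usage would be along the lines of `convert_all("raw_data", "csv_out")`, with both directory names standing in for your own.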
