How to read a data frame from a txt file that has no separators or fixed width

Question

I'm working with raw data in a text file. However, it has no separators and no single fixed width: each column has a different length. For example, column 1 is 12 characters long, column 2 is 5 characters long, and so forth.

Is there a function in some package that can handle this kind of file, given the length of each column? One approach I think may work is to use a regular expression and iterate over each row and column.

Answer 1

This is still a fixed-width file: that only means the size of each field is fixed, not that all fields are the same size. So you can use pandas.read_fwf, with the widths argument set to [21,5,5,12...], to read it. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_fwf.html
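As a minimal sketch of that call, the snippet below reads a small in-memory sample instead of a real file. The sample rows, the widths 12, 5 and 6, and the column names code, id and value are assumptions made up for illustration; substitute the real widths of your file.

```python
import io

import pandas as pd

# Hypothetical sample: three fields of widths 12, 5 and 6, no separators.
raw = (
    "ABCDEFGHIJKL12345  1.50\n"
    "MNOPQRSTUVWX67890 22.75\n"
)

# widths lists the character count of each field, in order.
# header=None because the sample has no header row; names are made up.
df = pd.read_fwf(
    io.StringIO(raw),
    widths=[12, 5, 6],
    header=None,
    names=["code", "id", "value"],
)
print(df)
```

For a file on disk, pass its path in place of the io.StringIO object.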

Answer 2

The easiest way, assuming there are no separators, would be to hard-code the string slices:

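bounds = [0, 12, 17, 23]  # column edges; extend with the remaining columns
# "text.csv" is an assumed output name; the original file is left untouched
with open("text.txt") as src, open("text.csv", "w") as dst:
  for row in src:
    row = row.rstrip("\n")
    dst.write(",".join(row[a:b] for a, b in zip(bounds, bounds[1:])) + "\n")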

Then you can just specify the separator when you create the data frame.
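A rough end-to-end sketch of this approach, assuming pandas is used to build the data frame: it slices an in-memory sample at hard-coded edges, joins the pieces with commas, and hands the result to pandas.read_csv. The sample rows, the edges and the column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical column edges for fields of widths 12, 5 and 6.
bounds = [0, 12, 17, 23]
raw = "ABCDEFGHIJKL12345  1.50\nMNOPQRSTUVWX67890 22.75\n"

# Cut each line at the column edges and join the pieces with commas.
csv_text = "\n".join(
    ",".join(line[a:b] for a, b in zip(bounds, bounds[1:]))
    for line in raw.splitlines()
)

# skipinitialspace drops the padding left in front of right-aligned values.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    header=None,
    names=["code", "id", "value"],
    skipinitialspace=True,
)
print(df)
```

This only works if the data itself contains no commas; otherwise choose a separator that cannot appear in any field.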

Answer 1
3 2017-09-16 15:00:42

Answer 2
1 2017-09-16 14:57:02