Python loading txt file and split lines by position in line

I am new here and a Python beginner. I received a text file containing 100k lines, each containing 120 characters. Every line represents data for 14 columns, but because some values are shorter than others, they are padded with blanks. That means I don't have a separator like ",". If I chose blank as the separator, the values would not go to the correct column.

Lines look like this:

  1. Character 1: O or L
  2. Characters 2-5: Year
  3. Characters 6-13: Name of month
  4. Characters 14-21: Brand of car
  5. Character 22: .
O2020august  Opel    .
L2015may     BMW     .
L2016april   Mercedes.
O2021january Opel    .
L2023februaryAudi    .

I am stuck with:

df = pd.read_csv('text.txt', index_col=0, header = None)
print (data)

I am happy for any approach suggested. It doesn't need to be pandas.

Cheers, Jeanny

Or you can use a simple helper function that does the job for you.

def split_by_pos(string_to_split, *args):
    """
    Splits a string at the given positions
    :param string_to_split: the string to be split
    :param args: the positions where the function will split the string.
    :return: the split string as a tuple
    """
    return_value = []
    args = sorted(args)
    previous = 0
    for position in args:
        return_value.append(string_to_split[previous:position])
        previous = position
    return_value.append(string_to_split[previous:])
    return tuple(return_value)


with open("a_random_file.txt", "r", encoding="utf-8") as fp:
    lines = fp.readlines()

for line in lines:
    # Cut after characters 1, 5, 13 and 21 to match the layout in the question;
    # drop the trailing newline first so it does not end up in the last field.
    print(split_by_pos(line.rstrip("\n"), 1, 5, 13, 21))
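
As a rough sketch of how the same idea could be applied to the layout from the question, the positions can be written down once as named column specs and the padding stripped from each value. The field names (`kind`, `year`, `month`, `brand`) and the `parse_line` helper are made up for illustration, and only the first four of the 14 columns are covered because those are the only ones the question describes.

```python
# Hypothetical column layout taken from the question's example:
# (name, start, end) with 0-based, end-exclusive offsets.
FIELDS = [
    ("kind", 0, 1),     # character 1: O or L
    ("year", 1, 5),     # characters 2-5
    ("month", 5, 13),   # characters 6-13
    ("brand", 13, 21),  # characters 14-21
]


def parse_line(line, fields=FIELDS):
    """Slice one fixed-width line into a dict and strip the padding blanks."""
    return {name: line[start:end].strip() for name, start, end in fields}


# Two of the sample lines from the question.
sample = [
    "O2020august  Opel    .",
    "L2023februaryAudi    .",
]
rows = [parse_line(line) for line in sample]
print(rows[0])
```

Keeping the offsets in one table means the remaining columns can be added without touching the parsing logic.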

I believe something like this can solve your problem.

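with open("text.txt", "r", encoding="utf-8") as txt:
    for line in txt:
        # each line should look something like "O2020august  Opel    ."
        print(line)
        s1 = line[:1]     # character 1: O or L
        s2 = line[1:5]    # characters 2-5: year
        s3 = line[5:13]   # characters 6-13: name of month
        # ... continue slicing the remaining columns the same way
        print(s1, s2, s3)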
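
Since the question started from pandas, it may be worth noting that the manual slicing above can probably also be expressed with `pandas.read_fwf`, which reads fixed-width files from a list of column positions. The following is only a sketch: the column names are assumptions, the `io.StringIO` buffer stands in for the real file, and only the four columns described in the question are included.

```python
import io

import pandas as pd

# Two sample lines from the question stand in for the real file.
data = io.StringIO(
    "O2020august  Opel    .\n"
    "L2016april   Mercedes.\n"
)

# Half-open (start, end) offsets, 0-based; the column names are assumptions.
colspecs = [(0, 1), (1, 5), (5, 13), (13, 21)]
names = ["kind", "year", "month", "brand"]

df = pd.read_fwf(data, colspecs=colspecs, names=names, header=None)
print(df)
```

For the real file, the buffer would be replaced by the file path and the list of offsets extended to all 14 columns.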

You can use the readline and readlines methods of Python's file API to read the file.
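
A small sketch of how these methods differ, assuming the layout from the question: `readline` returns one line per call, `readlines` returns all remaining lines as a list, and iterating over the file object yields lines one at a time, which may be preferable for a 100k-line file. The temporary file is only there to make the example runnable on its own.

```python
import os
import tempfile

# Write two sample lines to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("O2020august  Opel    .\nL2015may     BMW     .\n")
    path = tmp.name

with open(path, "r", encoding="utf-8") as fp:
    first = fp.readline()   # reads a single line, newline included
    rest = fp.readlines()   # reads all remaining lines into a list

with open(path, "r", encoding="utf-8") as fp:
    # Iterating the file object yields one line at a time,
    # so the whole file never has to sit in memory at once.
    brands = [line[13:21].strip() for line in fp]

os.remove(path)
print(first.rstrip("\n"), len(rest), brands)
```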
