
Python: loading a txt file and splitting lines by position in the line

I am new here and a Python beginner. I received a text file containing 100k lines, each 120 characters long. Every line holds data for 14 columns, but because some values are shorter than others, they are padded with blanks. That means I don't have a separator like ",". If I chose the blank as the separator, the values would not end up in the correct columns.

The lines are laid out like this:

  1. Character 1: O or L
  2. Characters 2-5: Year
  3. Characters 6-13: Name of month
  4. Characters 14-21: Brand of car
  5. Character 22: .

For example:

O2020august  Opel    .
L2015may     BMW     .
L2016april   Mercedes.
O2021january Opel    .
L2023februaryAudi    .

I am stuck with

import pandas as pd

df = pd.read_csv('text.txt', index_col=0, header=None)
print(df)

I am happy with any approach you suggest. It doesn't need to be pandas.

Cheers Jeanny

You can use a simple helper function that does the job for you.

def split_by_pos(string_to_split, *args):
    """
    Splits a string at the given positions
    :param string_to_split: the string to be split
    :param args: the positions where the function will split the string.
    :return: the split string as a tuple
    """
    return_value = list()
    args = sorted(args)
    previous = 0
    for position in args:
        return_value.append(string_to_split[previous:position])
        previous = position
    return_value.append(string_to_split[previous:])
    return tuple(return_value)


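with open("a_random_file.txt", "r", encoding="utf-8") as fp:
    lines = fp.readlines()

for line in lines:
    # Strip the trailing newline, then cut after characters 1, 5, 13 and 21
    # so the pieces match the flag, year, month and brand fields.
    print(split_by_pos(line.rstrip("\n"), 1, 5, 13, 21))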

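I believe something like this can solve your problem.

with open("text.txt", "r", encoding="utf-8") as txt:
    for line in txt:
        # Each line should look something like "O2020august  Opel    ."
        s1 = line[:1]      # character 1: O or L
        s2 = line[1:5]     # characters 2-5: year
        s3 = line[5:13]    # characters 6-13: name of month
        s4 = line[13:21]   # characters 14-21: brand of car
        # ... continue the same way for the remaining columns
        print(s1, s2, s3, s4)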
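
The slicing above can also be driven by a small table of field positions, so that covering the remaining columns only means adding rows to the table. The following is a minimal sketch, not a definitive implementation: the field names (`flag`, `year`, `month`, `brand`), the `FIELDS` table and the `parse_line` helper are hypothetical names chosen for illustration, and the positions cover only the four fields described in the question.

```python
# Hypothetical field table: (name, start, end) as 0-based, end-exclusive
# slice bounds, derived from the 1-based positions in the question.
FIELDS = [
    ("flag", 0, 1),     # character 1
    ("year", 1, 5),     # characters 2-5
    ("month", 5, 13),   # characters 6-13
    ("brand", 13, 21),  # characters 14-21
]


def parse_line(line):
    """Cut one fixed-width line into a dict, stripping the blank padding."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}


# Two sample lines copied from the question.
sample = [
    "O2020august  Opel    .",
    "L2023februaryAudi    .",
]
records = [parse_line(line) for line in sample]
for record in records:
    print(record)
```

With a real file, the same helper could be called on each line while iterating over the open file object.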

You can read the lines with the readline and readlines methods of Python file objects, or simply iterate over the file object.
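
Since the question starts from pandas, it may be worth noting that pandas ships a reader for fixed-width files, `pandas.read_fwf`. Below is a minimal sketch, assuming only the four fields listed in the question. The column names are made up for illustration, and the sample lines are read from an in-memory buffer so the snippet runs without the original file.

```python
import io

import pandas as pd

# Sample lines copied from the question; a real run would pass the file path
# (for example 'text.txt') instead of the StringIO buffer.
sample = io.StringIO(
    "O2020august  Opel    .\n"
    "L2015may     BMW     .\n"
    "L2016april   Mercedes.\n"
    "O2021january Opel    .\n"
    "L2023februaryAudi    .\n"
)

# colspecs holds half-open (start, end) intervals, 0-based like Python slices.
colspecs = [(0, 1), (1, 5), (5, 13), (13, 21)]
# Column names are assumptions for illustration; the source file has no header.
names = ["flag", "year", "month", "brand"]

df = pd.read_fwf(sample, colspecs=colspecs, header=None, names=names)
print(df)
```

The padding blanks are stripped from the values, and the remaining columns can be covered by extending `colspecs` and `names`.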
