Parsing array from txt file to Pandas dataframe in Python

Question

Hi, I have such array in my.txt file:

And I'd like to parse it into such dataFrame:

  speed weight  color fuel
0  180      3  black   95
1  160      4  green   92
2  200      5  white   95

That's, how i solved it:

import re
import pandas as pd

df_output_list  = {}
df_output_dict  = []
match_counter = 1

with open('sample_car.txt',encoding='utf-8') as file:
    line = file.readline()
    while line:

        result = re.split(r'\|',line.rstrip())
        result2 = re.findall(r'.(?<=\[)(\d+)(?=\])',result[1])

        regex = re.compile('vechicle.car.characteristics.')
        match = re.search(regex, result[1])
        if match:

            if match_counter == 1:
                ArrInd = 0
            match_counter+=1
            #print(df_output_list)
            if ArrInd == int(result2[0]):
                df_output_list[result[1].split('.')[3]] = result[2]
                ArrInd = int(result2[0])

            else:
                df_output_dict.append(df_output_list)
                df_output_list  = {}
                df_output_list[result[1].split('.')[3]] = result[2]
                ArrInd = int(result2[0])

        line = file.readline()
    df_output_dict.append(df_output_list)
#print(df_output_dict)
df_output = pd.DataFrame(df_output_dict)
print(df_output)

And i found it so complicated. Is it possible to simplify it?
Column names should be parsed automatically.

Answer 1

Read as csv file with sep='|'then get last column which contain values and then reshape in appropriate shape.

>>> columns=['speed','weight','color','fuel']
>>> s = pd.read_csv('filename.txt', sep='|', header=None).iloc[:,-1]
>>> df = pd.DataFrame(s.to_numpy().reshape(-1,4), columns=columns)
>>> df
  speed weight  color fuel
0   180      3  black   95
1   160      4  green   92
2   200      5  white   95

If you have fix row formate like n|vechicle.car.characteristics[0].speed|180 then we can do this

>>> df = pd.read_csv('d.csv', sep='|', header=None)
>>> columns = df.iloc[:,1].str.split('.').str[-1].unique()
>>> df_out = pd.DataFrame(df.iloc[:,-1].to_numpy().reshape(-1,len(columns)), columns=columns)
>>> df_out
  speed weight  color fuel
0   180      3  black   95
1   160      4  green   92
2   200      5  white   95

Parsing array from txt file to Pandas dataframe in Python

Question

1 answers

solution1
1 ACCPTED 2020-05-22 09:25:22

Parsing array from txt file to Pandas dataframe in Python

Question

1 answers

solution1 1 ACCPTED 2020-05-22 09:25:22

solution1
1 ACCPTED 2020-05-22 09:25:22