
Modifying input data to a usable list format using Python

I have input data stored in a text file in the following format:

[Image: input data]

I am trying to turn every column into its own list, ordered from the bottom of the column upward, so that the last element of the column becomes the first element of the list:

list1 = [D, T, W, F, J, S, H, N], ..., list3 = [L, Q, V], and so on.

I tried reading the contents into a data frame and reversing the rows. However, the items in column 0 end up clustered together instead of being split into separate columns:

df = pd.read_fwf("input.txt", header=None)

Output: [Image: output]

df.iloc[::-1]

[Image: output]

How can I separate the items into individual columns so that they are aligned properly?

First read the file with read_fwf, then build a list of lists with a list comprehension:

df = pd.read_fwf('input_1.txt', header=None)
print (df)
     0    1    2    3    4    5    6    7    8
0  [N]  [G]  NaN  NaN  NaN  NaN  NaN  [Q]  NaN
1  [H]  [B]  NaN  NaN  [B]  [R]  NaN  [H]  NaN
2  [S]  [N]  NaN  [Q]  [M]  [T]  NaN  [Z]  NaN
3  [J]  [T]  NaN  [R]  [V]  [H]  NaN  [R]  [S]
4  [F]  [Q]  NaN  [W]  [T]  [V]  [J]  [V]  [M]
5  [W]  [P]  [V]  [S]  [F]  [B]  [Q]  [J]  [H]
6  [T]  [R]  [Q]  [B]  [D]  [D]  [B]  [N]  [N]
7  [D]  [H]  [L]  [N]  [N]  [M]  [D]  [D]  [B]
8    1    2    3    4    5    6    7    8    9

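# Reverse the rows, skip the row of column numbers (now first),
# strip the brackets and drop the NaN placeholders of empty cells.
L = [v[1:].str.strip('[]').dropna().tolist()
     for k, v in df.iloc[::-1].to_dict('series').items()]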

print (L)

[['D', 'T', 'W', 'F', 'J', 'S', 'H', 'N'],
 ['H', 'R', 'P', 'Q', 'T', 'N', 'B', 'G'],
 ['L', 'Q', 'V'],
 ['N', 'B', 'S', 'W', 'R', 'Q'],
 ['N', 'D', 'F', 'T', 'V', 'M', 'B'], 
 ['M', 'D', 'B', 'V', 'H', 'T', 'R'], 
 ['D', 'B', 'Q', 'J'], 
 ['D', 'N', 'J', 'V', 'R', 'Z', 'H', 'Q'],
 ['B', 'N', 'H', 'M', 'S']]
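
If read_fwf does not infer the column boundaries correctly (as with the clustered column 0 in the question), the same bottom-up lists can be built without pandas. The following is a minimal sketch, not the answer's method. It assumes that every cell is exactly four characters wide ("[X]" plus one space) and that the last line holds the column numbers; the sample text and the function name columns_bottom_up are made up for illustration.

```python
# Hypothetical sample in the assumed layout: each cell is "[X]" followed by
# one space, and the last line holds the column numbers.
sample = (
    "[N]     [Q]\n"
    "[H] [B] [H]\n"
    " 1   2   3 \n"
)

def columns_bottom_up(text, width=4):
    """Return one list per column, ordered from the bottom row upward."""
    rows = text.splitlines()[:-1]            # drop the row of column numbers
    # The letter of column i sits at character offset 1 + i * width.
    n_cols = (max(len(r) for r in rows) + 1) // width
    stacks = [[] for _ in range(n_cols)]
    for row in reversed(rows):               # bottom row first
        for i in range(n_cols):
            pos = 1 + i * width
            # Skip cells that are blank or lie beyond the end of a short row.
            if pos < len(row) and row[pos] != " ":
                stacks[i].append(row[pos])
    return stacks

print(columns_bottom_up(sample))
```

Reading by character offset avoids any guessing about where one column ends and the next begins, at the cost of hard-coding the cell width.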

Here is one way to create the lists dynamically from your .txt file/DataFrame:

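import pandas as pd

# Label the columns with the numbers from the last row, reverse the rows,
# drop the number row, then create list1, list2, ... as global variables.
_ = (
       pd.read_fwf("/tmp/file.txt", header=None)
            .pipe(lambda df: df.set_axis(list(df.iloc[-1]), axis=1))
            .iloc[::-1][1:]
            .pipe(lambda df: [exec(f"globals()['list{k}'] = ([e.strip('[]') \
                                               for e in v if str(e) != 'nan'])")
                              for k,v in df.to_dict("list").items()])
    )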

Output:

print([(var, val) for var, val in globals().items() if var.startswith("list")])

[('list1', ['D', 'T', 'W', 'F', 'J', 'S', 'H', 'N']),
 ('list2', ['H', 'R', 'P', 'Q', 'T', 'N', 'B', 'G']),
 ('list3', ['L', 'Q', 'V']),
 ('list4', ['N', 'B', 'S', 'W', 'R', 'Q']),
 ('list5', ['N', 'D', 'F', 'T', 'V', 'M', 'B']),
 ('list6', ['M', 'D', 'B', 'V', 'H', 'T', 'R']),
 ('list7', ['D', 'B', 'Q', 'J']),
 ('list8', ['D', 'N', 'J', 'V', 'R', 'Z', 'H', 'Q']),
 ('list9', ['B', 'N', 'H', 'M', 'S'])]
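
Creating variables through exec and globals() works, but the names then exist only at run time, which makes them hard to follow in the rest of the program. A dictionary keyed by the same names is a common alternative. The sketch below is only an illustration: it starts from a shortened, hypothetical list of lists shaped like the L built in the first answer, and the name stacks is an assumption.

```python
# Shortened, hypothetical stand-in for the list of lists built from the file.
L = [['D', 'T', 'W'], ['H', 'R'], ['L', 'Q', 'V']]

# Map "list1", "list2", ... to the columns instead of creating globals.
stacks = {f"list{i}": col for i, col in enumerate(L, start=1)}

print(stacks["list3"])
```

Each column is then reachable as stacks["list1"], stacks["list2"], and so on, and the whole set can be looped over with stacks.items().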
