
How to convert a multi-line string to a DataFrame

My sample string looks like this:

>>> x3 = '\n      DST: 10.1.1.1\n      DST2: 10.1.2.1\n      DST3: 10.1.3.1\n    \n    \n      DST: 11.1.1.1\n      DST2: 11.1.2.1\n      DST3: 11.1.3.1\n    \n    \n'
>>> print(x3)

  DST: 10.1.1.1
  DST2: 10.1.2.1
  DST3: 10.1.3.1


  DST: 11.1.1.1
  DST2: 11.1.2.1
  DST3: 11.1.3.1

I want to convert it to a DataFrame with DST, DST2 and DST3 as columns.

You could do:

import pandas as pd

# get key, value pairs from string
items = (line.strip().split(': ') for line in x3.splitlines() if line.strip())

# build data
d = {}
for key, value in items:
    d.setdefault(key, []).append(value)

# convert it to a DataFrame
result = pd.DataFrame(d)

print(result)

Output

        DST      DST2      DST3
0  10.1.1.1  10.1.2.1  10.1.3.1
1  11.1.1.1  11.1.2.1  11.1.3.1
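
The dictionary-of-lists approach above works when every record contains the same keys. If a record were missing a key, the lists would have different lengths and pd.DataFrame(d) would raise a ValueError. As a rough sketch of one way around that, assuming that records are separated by blank lines (as in x3), the string can be parsed into one dictionary per record. The helper name parse_records is hypothetical and not part of the original answer:

```python
def parse_records(text):
    """Parse 'key: value' lines into one dict per blank-line-separated record."""
    records = []
    current = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # a blank line closes the record being built, if there is one
            if current:
                records.append(current)
                current = {}
            continue
        # partition splits on the first ':' only
        key, sep, value = stripped.partition(":")
        if not sep:
            continue  # assumption: lines without a colon are ignored
        current[key.strip()] = value.strip()
    # keep the last record if the text does not end with a blank line
    if current:
        records.append(current)
    return records


x3 = ('\n      DST: 10.1.1.1\n      DST2: 10.1.2.1\n      DST3: 10.1.3.1\n    \n    \n'
      '      DST: 11.1.1.1\n      DST2: 11.1.2.1\n      DST3: 11.1.3.1\n    \n    \n')
records = parse_records(x3)
```

Passing the resulting list to pd.DataFrame(records) should give the same table as above, with a missing key in any record showing up as NaN instead of causing an error.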

The line:

items = (line.strip().split(': ') for line in x3.splitlines() if line.strip())

is a generator expression. For the purposes of this question, you can consider it equivalent (but not identical) to the following for loop:

items = []
for line in x3.splitlines():
    if line.strip():
        items.append(line.strip().split(': '))

In addition, splitlines, strip and split are methods of str, Python's string type.
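
To make the "equivalent but not the same" point concrete, here is a small sketch of what each string method returns for a single line, and of the main practical difference between the generator expression and the list built by the loop: a generator is lazy and can only be consumed once. The variable names and the short sample text are made up for illustration:

```python
line = "      DST: 10.1.1.1"

# strip() removes leading and trailing whitespace
stripped = line.strip()

# split(': ') breaks the line into [key, value]
pair = stripped.split(": ")

# splitlines() breaks a multi-line string into a list of lines
lines = "a: 1\n\nb: 2".splitlines()

# the generator expression does no work until it is iterated
gen = (l.strip().split(": ") for l in lines if l.strip())

first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # the generator is now exhausted
```

Here first_pass holds the two key/value pairs, while second_pass is an empty list. A list built with the for loop can be iterated as many times as needed.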

Alternatively, a step-by-step version without a generator expression:

import pandas as pd

if __name__ == '__main__':

    x3 = "\n      DST: 10.1.1.1\n      DST2: 10.1.2.1\n      DST3: 10.1.3.1\n    \n    \n      DST: 11.1.1.1\n      DST2: 11.1.2.1\n      DST3: 11.1.3.1\n    \n    \n"
    # remove all spaces
    x3_no_space = x3.replace(" ", "")
    # replace newlines with & so the string can be split on it
    x3_no_new_line = x3_no_space.replace("\n", "&")
    # split on &
    x3_split = x3_no_new_line.split("&")

    # list that stores the non-empty "key:value" entries
    DST_data = []
    # dictionary used to build the DataFrame
    DST_TABLE = dict()

    # loop over the split data
    for DST in x3_split:
        # skip empty entries and add the rest to DST_data
        if DST != '':
            DST_data.append(DST)
            # split the entry on :
            DST_split = DST.split(":")
            # use the column name as a key and start it with an empty list
            DST_TABLE[DST_split[0]] = []

    # read the stored entries
    for COL_DATA in DST_data:
        # split the entry on :
        DATA = COL_DATA.split(":")
        # loop over the column names in the dictionary
        for COLS in DST_TABLE:
            # if the entry's key matches the column name, append its value to that column
            if DATA[0] == COLS:
                DST_TABLE[COLS].append(DATA[1])

    # this is the dictionary
    print("Python dictionary")
    print(DST_TABLE)

    # convert the dictionary to a DataFrame using pandas
    dataframe = pd.DataFrame.from_dict(DST_TABLE)
    print("DATA FRAME")
    print(dataframe)

