How to convert a multi-line string to a DataFrame
My sample string looks like this:
>>> x3 = '\n DST: 10.1.1.1\n DST2: 10.1.2.1\n DST3: 10.1.3.1\n \n \n DST: 11.1.1.1\n DST2: 11.1.2.1\n DST3: 11.1.3.1\n \n \n'
>>> print(x3)
DST: 10.1.1.1
DST2: 10.1.2.1
DST3: 10.1.3.1
DST: 11.1.1.1
DST2: 11.1.2.1
DST3: 11.1.3.1
I want to convert it into a DataFrame with DST, DST2, and DST3 as columns.
You can do it like this:
import pandas as pd

# get (key, value) pairs from the string, skipping blank lines
items = (line.strip().split(': ') for line in x3.splitlines() if line.strip())

# build a dict that maps each column name to its list of values
d = {}
for key, value in items:
    d.setdefault(key, []).append(value)

# convert it to a DataFrame
result = pd.DataFrame(d)
print(result)
Output:
        DST      DST2      DST3
0  10.1.1.1  10.1.2.1  10.1.3.1
1  11.1.1.1  11.1.2.1  11.1.3.1
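The approach above collects values column by column, so every key has to appear the same number of times. As a minimal sketch of a row-oriented variant (not part of the original answer), the hypothetical helper `parse_records` below builds one dictionary per row and assumes that a repeated key marks the start of a new row. It uses only the standard library.

```python
def parse_records(text):
    """Turn 'KEY: value' lines into a list of row dictionaries."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank or whitespace-only lines
        # partition splits on the first colon only, so a value may contain colons
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line, so ignore it
        key, value = key.strip(), value.strip()
        if key in current:
            # Assumption: a repeated key marks the start of a new row.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


x3 = '\n DST: 10.1.1.1\n DST2: 10.1.2.1\n DST3: 10.1.3.1\n \n \n DST: 11.1.1.1\n DST2: 11.1.2.1\n DST3: 11.1.3.1\n \n \n'
records = parse_records(x3)
print(records)
```

If pandas is installed, `pd.DataFrame(records)` should give the same two-row table, and a row that lacks one of the keys would show up as a missing value instead of raising an error.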
This line:
items = (line.strip().split(': ') for line in x3.splitlines() if line.strip())
is a generator expression. For the purposes of this question, you can think of it as equivalent (but not identical) to the following for loop:
result = []
for line in x3.splitlines():
    if line.strip():
        result.append(line.strip().split(': '))
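One practical difference between the generator expression and the loop is that the loop builds a list up front, while the generator produces its items lazily and can only be consumed once. The short sketch below illustrates this with the same sample string; the names `first_pass` and `second_pass` are only for illustration.

```python
x3 = '\n DST: 10.1.1.1\n DST2: 10.1.2.1\n DST3: 10.1.3.1\n \n \n DST: 11.1.1.1\n DST2: 11.1.2.1\n DST3: 11.1.3.1\n \n \n'

# Nothing is split yet; the work happens only when the generator is iterated.
items = (line.strip().split(': ') for line in x3.splitlines() if line.strip())

first_pass = list(items)   # consumes the generator
second_pass = list(items)  # the generator is now exhausted

print(len(first_pass))   # 6
print(len(second_pass))  # 0
```

For a single pass, as in the answer, this difference does not matter. If the pairs are needed more than once, a list is the safer choice.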
Also, splitlines, strip, and split are string methods.
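To see what each of these methods contributes, here is a small step-by-step sketch on a shortened sample string. The variable names are only for illustration.

```python
sample = '\n DST: 10.1.1.1\n \n'

# splitlines breaks the string at line boundaries and drops the newline characters
lines = sample.splitlines()
print(lines)     # ['', ' DST: 10.1.1.1', ' ']

# strip removes leading and trailing whitespace from each line
stripped = [line.strip() for line in lines]
print(stripped)  # ['', 'DST: 10.1.1.1', '']

# split(': ') separates the key from the value; empty lines are filtered out first
pairs = [line.split(': ') for line in stripped if line]
print(pairs)     # [['DST', '10.1.1.1']]
```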
import pandas as pd

if __name__ == '__main__':
    x3 = "\n DST: 10.1.1.1\n DST2: 10.1.2.1\n DST3: 10.1.3.1\n \n \n DST: 11.1.1.1\n DST2: 11.1.2.1\n DST3: 11.1.3.1\n \n \n"

    # remove spaces
    x3_no_space = x3.replace(" ", "")
    # replace newlines with &
    x3_no_new_line = x3_no_space.replace("\n", "&")
    # split on &
    x3_split = x3_no_new_line.split("&")

    # list that stores the non-empty "name:value" entries
    DST_data = []
    # dictionary used to build the DataFrame
    DST_TABLE = dict()

    # loop over the split data
    for DST in x3_split:
        # keep only non-empty entries
        if DST != '':
            DST_data.append(DST)
            # split the entry on :
            DST_split = DST.split(":")
            # use the column name as a key, starting with an empty list
            DST_TABLE[DST_split[0]] = []

    # read the stored entries
    for COL_DATA in DST_data:
        # split on :
        DATA = COL_DATA.split(":")
        # loop over the dictionary keys
        for COLS in DST_TABLE:
            # if the entry's name matches the column name, append its value to that column
            if DATA[0] == COLS:
                DST_TABLE[COLS].append(DATA[1])

    # this is the dictionary
    print("Python dictionary")
    print(DST_TABLE)

    # convert the dictionary to a DataFrame using pandas
    dataframe = pd.DataFrame.from_dict(DST_TABLE)
    print("DATA FRAME")
    print(dataframe)
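The two loops above can be condensed. The following is a minimal sketch of the same column-building idea using `collections.defaultdict`; it is not the answer author's code, and the helper name `columns_from_text` is hypothetical. Unlike the version above, it splits on the first colon only and does not rely on `&` as a temporary separator.

```python
from collections import defaultdict


def columns_from_text(text):
    """Map each column name to the list of values found for it."""
    columns = defaultdict(list)
    for line in text.splitlines():
        # partition splits on the first colon; sep is '' when no colon is present
        name, sep, value = line.partition(":")
        if sep:
            columns[name.strip()].append(value.strip())
    return dict(columns)


x3 = "\n DST: 10.1.1.1\n DST2: 10.1.2.1\n DST3: 10.1.3.1\n \n \n DST: 11.1.1.1\n DST2: 11.1.2.1\n DST3: 11.1.3.1\n \n \n"
table = columns_from_text(x3)
print(table)
```

Assuming pandas is available, `pd.DataFrame(table)` would then produce the DataFrame, provided every column ends up with the same number of values.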