How to read multiple lines from csv into a single dataframe row with pandas
Populate pandas dataframe from .txt reading single dataframe row information from multiple .txt lines
I want to read the data for a pandas DataFrame from a large .txt file that is laid out as follows:
elm1 x1 x2 x3
cont x4 x5 x6
cont x7 x8
elm2 x9 x10 x11
cont x12 x13 x14
cont x15 x16
....
The DataFrame should be arranged as follows:
elm_ID col1 col2 col3 col4 col5 col6 col7 col8
elm_1 x1 x2 x3 x4 x5 x6 x7 x8
elm_2 x9 x10 x11 x12 x13 x14 x15 x16
.......
Does anyone have an idea? Thanks a lot.
JA
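Yes, you can easily convert the data into a DataFrame. First, read the text file line by line and build the list of rows that will be converted into a DataFrame, appending each "cont" line to the row of the element it continues: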
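import re

df_list = []  # one list per output row: [elm_ID, x1, ..., x8]
with open(infile) as f:  # infile: path to the .txt file
    for line in f:
        # remove whitespace at the start and the newline at the end
        line = line.strip()
        if not line:
            continue  # skip empty lines
        # split each column on whitespace
        columns = re.split(r'\s+', line)
        if columns[0] == 'cont':
            # continuation line: add its values to the current row
            df_list[-1].extend(columns[1:])
        else:
            # any other first token starts a new row
            df_list.append(columns)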
Then we can simply convert that list into a DataFrame with the following command:
import pandas as pd
df = pd.DataFrame(df_list, columns=['elm_ID', 'col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8'])
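To tie the two steps together, here is a minimal, self-contained sketch. It assumes that every continuation line starts with the literal token cont and that all values are separated by whitespace. The sample text and the names sample_text, parse_rows and rows are made up for illustration, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Illustrative stand-in for the real .txt file (hypothetical sample data)
sample_text = """\
elm1 x1 x2 x3
cont x4 x5 x6
cont x7 x8
elm2 x9 x10 x11
cont x12 x13 x14
cont x15 x16
"""


def parse_rows(lines):
    """Merge each element line with the 'cont' lines that follow it."""
    rows = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # skip blank lines
        if fields[0] == "cont":
            if not rows:
                raise ValueError("continuation line before any element line")
            rows[-1].extend(fields[1:])  # append to the current element
        else:
            rows.append(fields)  # start a new element
    return rows


rows = parse_rows(io.StringIO(sample_text))
columns = ["elm_ID"] + [f"col{i}" for i in range(1, 9)]
df = pd.DataFrame(rows, columns=columns)
print(df)
```

Because parse_rows only iterates over lines, the same function should also work on an open file object, so a large file does not have to be loaded into memory as one string.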
First, read the .txt file with pd.read_csv(path_to_file, sep='\t').
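If the file is separated by spaces rather than tabs, as the sample in the question appears to be, a regular-expression separator is one option. The sketch below is only an illustration under that assumption: the column names a to d are arbitrary, sample_text is made-up data, and io.StringIO stands in for path_to_file. Passing names explicitly means the shorter continuation lines are padded with NaN.

```python
import io

import pandas as pd

# Hypothetical sample; the last line has one value fewer than the others
sample_text = """\
elm1 x1 x2 x3
cont x4 x5 x6
cont x7 x8
"""

raw = pd.read_csv(
    io.StringIO(sample_text),
    sep=r"\s+",          # any run of whitespace separates two fields
    header=None,         # the file has no header line
    names=list("abcd"),  # arbitrary column names, one per possible field
)
print(raw)
```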
Then, suppose we have this DataFrame:
a b c
0 elm1 x1 x2
1 cont x4 x5
2 cont x7 x8
3 elm2 x9 x10
4 cont x12 x13
5 cont x15 x16
and we want the following output:
0 1 2 3 4 5
elm1 x1 x4 x7 x2 x5 x8
elm2 x9 x12 x15 x10 x13 x16
I tried to solve it entirely with pandas functions:
import pandas as pd

df = pd.DataFrame([("elm1", "x1", "x2"),
                   ("cont", "x4", "x5"),
                   ("cont", "x7", "x8"),
                   ("elm2", "x9", "x10"),
                   ("cont", "x12", "x13"),
                   ("cont", "x15", "x16")], columns=list('abc'))
# True on rows that start a new element, False on continuation rows
df['d'] = df['a'] != 'cont'
# keep the element name on starting rows only, then forward-fill it
df['e'] = df['a'].where(df['d']).ffill()
# per element: all values of 'b' followed by all values of 'c'
df2 = df.groupby('e').apply(lambda x: pd.concat([x['b'], x['c']])).to_frame().reset_index()
# position of each value within its element
df2['ct'] = df2.groupby('e').cumcount()
df3 = df2.pivot(index='e', values=[0], columns='ct')
df3.columns = range(len(df3.columns))
df3.index.name = ''
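Note that the result above lists all values of column b before those of column c (x1 x4 x7 x2 x5 x8), whereas the question asks for the values in file order. As a hedged alternative, not the method used in this answer, the sketch below numbers the elements with a cumulative sum and flattens each group row by row. The names starts, group_id, labels and wide are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([("elm1", "x1", "x2"),
                   ("cont", "x4", "x5"),
                   ("cont", "x7", "x8"),
                   ("elm2", "x9", "x10"),
                   ("cont", "x12", "x13"),
                   ("cont", "x15", "x16")], columns=list("abc"))

# True on rows that start a new element
starts = df["a"] != "cont"
# running count of element starts: 1 for elm1 and its rows, 2 for elm2, ...
group_id = starts.cumsum()
# element names in order of appearance
labels = df.loc[starts, "a"].to_numpy()

# flatten the value columns of each group row by row (file order)
rows = [g[["b", "c"]].to_numpy().ravel().tolist()
        for _, g in df.groupby(group_id)]
wide = pd.DataFrame(rows, index=labels)
print(wide)
```

Grouping on the running count rather than on the element name keeps two elements apart even if the same name happens to appear twice in the file.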