Python project - Writing contents of .txt file to Pandas dataframe
I am currently working on a Python project, and I would like to:
Memory dump
Serialnr: 1412b23990
Date/time: 24-11-2016 08:10
mode: version
Hardware release: ic2kkit01*P131113*
Software release: V3.82
Rom test 1 checksum: e0251fda
Rom test 2 checksum: cae0351f
mode: statistics
Line power connected (hours): 360
Line power disconnected (number of times): 2
Ch function(hours): 54
Dhw function(hours): 4
Burnerstarts (number of times): 604
Ignition failed (number of times): 0
Flame lost (number of times): 0
Reset (number of times): 0
mode: status
T1: 17.42
T2: 17.4
T3: 16.38
T4: -35.0
T5: -35.0
T6: 17.4
Temp_set: 0.0
Fanspeed_set: 0.0
Fanspeed: 0.0
Fan_pwm: 0.0
Opentherm: 0
Roomtherm: 0
Tap_switch: 0
Current .py code:
import os
import pandas as pd
# Set rootdir for os.walk
rootdir = 'K:/Retouren'
## Create empty Pandas dataframe with just column names
column_names = ["Memory dump", "Serialnr", "Date/time", "mode", "Hardware release", "Software release", "Rom test 1 checksum", "Rom test 2 checksum",
"mode", "Line power connected (hours)", "Line power disconnected (number of times)", "Ch function(hours)", "Dhw function(hours)", "Burnerstarts (number of times)",
"Ignition failed (number of times)", "Flame lost (number of times)", "Reset (number of times)", "Gasmeter_ch", "Gasmeter_dhw", "Watermeter", "Burnerstarts_dhw",
"mode", "T1", "T2", "T3", "T4", "T5", "T6", "Temp_set", "Fanspeed_set", "Fanspeed", "Fan_pwm", "Opentherm", "Roomtherm", "Tap_switch", "Gp_switch", "Pump", "Dwk",
"Gasvalve", "Io_signal", "Spark", "Io_curr", "Displ_code", "Ch_pressure", "Rf_rth_bound", "Rf_rth_communication", "Rf_rth_battery_info", "Rf_rth_battery_ok",
"Bc_tapflow", "Pump_pwm", "Room_override_zone1", "Room_set_zone1", "Room_temp_zone1", "Room_override_zone2", "Room_set_zone2", "Room_temp_zone2", "Outside_temp",
"Ot_master_member_id", "Ot_therm_prod_version", "Ot_therm_prod_type", "mode", "Node nr", "Cloud id0", "Cloud id1", "Cloud id2", "Rf cloud nr", "Rssi_lower",
"Rssi_upper", "Rssi_wait", "Attention_period", "Attention_number", "Info10", "Info11", "Info12", "Info13", "Info14", "Info15", "Info16", "Info17", "Info18",
"Ramses_thermostat_idh", "Ramses_thermostat_idm", "Ramses_thermostat_idl", "Ramses_boiler_idh", "Ramses_boiler_idm", "Ramses_boiler_idl", "Prod. token",
"Year", "Month", "Line number", "Serial1", "Serial2", "Serial3", "mode", "Id_dongle0", "Id_dongle1", "Id_dongle2", "Id_dongle3", "Id_lan0", "Id_lan1",
"Id_lan2", "Id_lan3", "Info2_7", "Info2_8", "Info2_9", "Info2_10", "Info2_11", "Info2_12", "Info2_13", "Info2_14", "mode", "Interrupt_time",
"Interrupt_load (%)", "Main_load (%)", "Net fequency (hz)", "Voltage ref. (v)", "Checksum1", "Checksum2", "nmode", "Fault 0", "Fault 1", "Fault 2",
"Fault 3", "Fault 4", "Fault 5", "Fault 6", "Fault 7", "Fault 8", "Fault 9", "Fault 10", "Fault 11", "Fault 12", "Fault 13", "Fault 14", "Fault 15",
"Fault 16", "Fault 17", "Fault 18", "Fault 19", "Fault 20", "Fault 21", "Fault 22", "Fault 23", "Fault 24", "Fault 25", "Fault 26", "Fault 27", "Fault 28",
"Fault 29", "Fault 30", "Fault 31", "mode", "Heater_on", "Comfort_mode", "Ch_set_max", "Dhw_set", "Eco_days", "Comfort_set", "Dhw_at_night", "Ch_at_night",
"Parameter 1", "Parameter 2", "Parameter 3", "Parameter 4", "Parameter 5", "Parameter 6", "Parameter 7", "Parameter 8", "Parameter 9", "Parameter a",
"Parameter b", "Parameter c", "Parameter c", "Parameter d", "Parameter e", "Parameter e.", "Parameter f", "Parameter h", "Parameter n", "Parameter o",
"Parameter p", "Parameter r", "Parameter f.", "mode", "Param31", "Param32", "Param33", "Param34", "Param35", "Param36", "Param37", "Param38", "Param39",
"Param40", "Param41", "Param42", "Param43", "Param44", "Param45", "Param46", "Param47", "Param48", "Param49", "Param50", "Param51", "Param52", "Param53",
"Param54", "Param55", "Param56", "Param57", "Param58", "Param59", "Param60", "Param61", "Param62", "Param63"]
data = pd.DataFrame(columns = column_names)
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.startswith('memory_') and os.path.splitext(file)[1] == '.txt':
            filepath = os.path.join(subdir, file)
            with open(filepath, "r") as curfile:
                ## Here is where I would like to append the .txt data as a row in the data frame
                data.append()
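I have the first two steps, but the third step is beyond my programming knowledge. Any tips would be greatly appreciated.
Example of the desired dataframe: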
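I suggest reading the file with readlines(), which returns a list of lines. Then loop over those lines and process only the ones that contain a colon. Splitting each line on the colon (and the space that follows it) and wrapping everything in dict() creates a dictionary with the string before the colon as the key and the string after the colon as the value: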
dict(i.rstrip('\n').split(': ', 1) for i in curfile.readlines() if ': ' in i)
For your example data, this gives:
{'Serialnr': '1412b23990', 'Date/time': '24-11-2016 08:10', 'mode': 'status', 'Hardware release': 'ic2kkit01*P131113*', 'Software release': 'V3.82', 'Rom test 1 checksum': 'e0251fda', 'Rom test 2 checksum': 'cae0351f', 'Line power connected (hours)': '360', 'Line power disconnected (number of times)': '2', 'Ch function(hours)': '54', 'Dhw function(hours)': '4', 'Burnerstarts (number of times)': '604', 'Ignition failed (number of times)': '0', 'Flame lost (number of times)': '0', 'Reset (number of times)': '0', 'T1': '17.42', 'T2': '17.4', 'T3': '16.38', 'T4': '-35.0', 'T5': '-35.0', 'T6': '17.4', 'Temp_set': '0.0', 'Fanspeed_set': '0.0', 'Fanspeed': '0.0', 'Fan_pwm': '0.0', 'Opentherm': '0', 'Roomtherm': '0', 'Tap_switch': '0'}
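To try the idea without touching any files, here is a minimal sketch that applies the same dict-building expression to a few in-memory lines. The names `parse_lines` and `sample_lines` are made up for this illustration, and the sample is a shortened version of the dump from the question.

```python
def parse_lines(lines):
    """Build a dict from 'key: value' lines; lines without ': ' are skipped."""
    stripped = (line.strip() for line in lines)  # drop trailing newlines
    return dict(s.split(': ', 1) for s in stripped if ': ' in s)


# Shortened, in-memory stand-in for the lines returned by readlines()
sample_lines = [
    "Memory dump\n",
    "Serialnr: 1412b23990\n",
    "Date/time: 24-11-2016 08:10\n",
    "mode: version\n",
    "Software release: V3.82\n",
    "mode: status\n",
    "T1: 17.42\n",
]

record = parse_lines(sample_lines)
print(record)
```

Because the split uses a maximum of one split, the colon inside the time in `Date/time` stays part of the value.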
If you create an empty list before the loop and append each dictionary to that list inside the loop, you end up with a list of dictionaries that can be loaded into a pandas DataFrame:
import os
import pandas as pd

# Set rootdir for os.walk
rootdir = 'K:/Retouren'

## Create empty list
data = []

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file.startswith('memory_') and os.path.splitext(file)[1] == '.txt':
            filepath = os.path.join(subdir, file)
            with open(filepath, "r") as curfile:
                # Drop the trailing newline so it does not end up in the values
                data.append(dict(i.rstrip('\n').split(': ', 1) for i in curfile.readlines() if ': ' in i))

df = pd.DataFrame(data)
Another advantage is that you don't need to set the column names manually, because pandas will use the dict keys for that. The resulting DataFrame:
  | Serialnr | Date/time | mode | Hardware release | Software release | Rom test 1 checksum | Rom test 2 checksum | Line power connected (hours) | Line power disconnected (number of times) | Ch function(hours) | Dhw function(hours) | Burnerstarts (number of times) | Ignition failed (number of times) | Flame lost (number of times) | Reset (number of times) | T1 | T2 | T3 | T4 | T5 | T6 | Temp_set | Fanspeed_set | Fanspeed | Fan_pwm | Opentherm | Roomtherm | Tap_switch |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1412b23990 | 24-11-2016 08:10 | status | ic2kkit01*P131113* | V3.82 | e0251fda | cae0351f | 360 | 2 | 54 | 4 | 604 | 0 | 0 | 0 | 17.42 | 17.4 | 16.38 | -35 | -35 | 17.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
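The column list in the question is much longer than the sample file, so individual files may not all contain the same keys. The following is a minimal end-to-end sketch, assuming two tiny made-up dump files written to a temporary directory; the helper name `load_dumps`, the file names, and the second serial number are invented for illustration. When a key is missing from one file, pandas fills that cell with NaN.

```python
import os
import tempfile

import pandas as pd


def load_dumps(rootdir):
    """Walk rootdir and turn every memory_*.txt file into one DataFrame row."""
    data = []
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            if file.startswith('memory_') and file.endswith('.txt'):
                with open(os.path.join(subdir, file), "r") as curfile:
                    stripped = (line.strip() for line in curfile)
                    data.append(dict(s.split(': ', 1) for s in stripped if ': ' in s))
    return pd.DataFrame(data)


with tempfile.TemporaryDirectory() as tmp:
    # Two made-up files; the second one has an extra key (T2)
    with open(os.path.join(tmp, 'memory_a.txt'), 'w') as f:
        f.write("Memory dump\nSerialnr: 1412b23990\nT1: 17.42\n")
    with open(os.path.join(tmp, 'memory_b.txt'), 'w') as f:
        f.write("Memory dump\nSerialnr: 0000000000\nT1: 16.0\nT2: 17.4\n")
    df = load_dumps(tmp)

print(df)
```

All values are still strings at this point; converting selected columns to numbers would be a separate step.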
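There is one drawback: since a dict can only contain unique keys, you will lose two of the mode values. I would leave it that way, since they seem to be headers rather than information containers. Otherwise some extra renaming would be needed.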
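If the repeated mode values do matter, one possible form of that renaming is to number repeated keys as they are encountered. This is only a sketch of one option, not part of the original answer; the function name `parse_keep_duplicates` and the `_2`, `_3` suffix scheme are assumptions, and the scheme would clash if a file already contained a key such as `mode_2`.

```python
def parse_keep_duplicates(lines):
    """Like the dict approach, but repeated keys get a numeric suffix."""
    record = {}
    seen = {}  # how many times each original key has appeared so far
    for raw in lines:
        line = raw.strip()
        if ': ' not in line:
            continue
        key, value = line.split(': ', 1)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > 1:
            key = f"{key}_{seen[key]}"  # e.g. mode_2, mode_3 (hypothetical naming)
        record[key] = value
    return record


sample_lines = [
    "Memory dump\n",
    "Serialnr: 1412b23990\n",
    "mode: version\n",
    "mode: statistics\n",
    "mode: status\n",
    "T1: 17.42\n",
]

record = parse_keep_duplicates(sample_lines)
print(record)
```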
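I think you are saying that you can already get the text file into a list of lines. So, keeping it super simple and starting from that list of lines (let's call it lineList), just do the following: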
splitList = [l.split(":") for l in lineList if len(l)>0]
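Then:
someDF = pd.DataFrame(splitList, index=[i[0] for i in splitList])[[1]].T
to get a one-row dataframe whose columns are the keys.
Note: splitting on ":" also splits values that themselves contain a colon, such as the time in Date/time. Joining everything after the first element back together restores the full value: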
":".join("Date/time: 24-11-2016 08:10".split(":")[1:])
' 24-11-2016 08:10'
In the context of the first line above, you can adapt the approach like this:
splitList = [[l.split(":")[0],":".join(l.split(":")[1:])] for l in lineList if len(l)>0]
This gives you:
[['Memory dump', ''],
['Serialnr', ' 1412b23990'],
['Date/time', ' 24-11-2016 08:10'],
['mode', ' version'],
['Hardware release', ' ic2kkit01*P131113*'],
['Software release', ' V3.82'],
['Rom test 1 checksum', ' e0251fda'],
['Rom test 2 checksum', ' cae0351f'],
['mode', ' statistics'],
['Line power connected (hours)', ' 360'],
['Line power disconnected (number of times)', ' 2'],
['Ch function(hours)', ' 54'],
['Dhw function(hours)', ' 4'],
['Burnerstarts (number of times)', ' 604'],
['Ignition failed (number of times)', ' 0'],
['Flame lost (number of times)', ' 0'],
['Reset (number of times)', ' 0'],
['mode', ' status'],
['T1', ' 17.42'],
['T2', ' 17.4'],
['T3', ' 16.38'],
['T4', ' -35.0'],
['T5', ' -35.0'],
['T6', ' 17.4'],
['Temp_set', ' 0.0'],
['Fanspeed_set', ' 0.0'],
['Fanspeed', ' 0.0'],
['Fan_pwm', ' 0.0'],
['Opentherm', ' 0'],
['Roomtherm', ' 0'],
['Tap_switch', ' 0']]
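The same key/value pairs can also be produced by splitting only at the first colon. Here is a minimal sketch using `str.partition`, assuming a shortened, made-up `lineList` with the trailing newlines already removed; it is an alternative to the split-and-join expression, not the method from the answer.

```python
# Shortened stand-in for the list of lines read from one file
lineList = [
    "Memory dump",
    "Serialnr: 1412b23990",
    "Date/time: 24-11-2016 08:10",
    "",
]

splitList = []
for l in lineList:
    if len(l) > 0:
        # partition splits at the first ":" only; value is '' when there is no colon
        key, _, value = l.partition(":")
        splitList.append([key, value])

print(splitList)
```

As in the output above, the values keep their leading space; calling `value.strip()` would remove it.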
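To build a multi-row DataFrame, append each single-row dataframe to a list (e.g. dfList.append(someDF)), and then concatenate them at the end (e.g. comboDF = pd.concat(dfList)).
So, putting it all together, given a list of lines from one text file: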
dataFrameList = []
splitList = [[l.split(":")[0],":".join(l.split(":")[1:])] for l in lineList if len(l)>0]
oneRowDF = pd.DataFrame(data = [i[1] for i in splitList], index=[i[0] for i in splitList]).T
dataFrameList.append(oneRowDF)
Then, once you have loaded them all into the list of single-row dataframes:
comboDF = pd.concat(dataFrameList)
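For reference, here is a minimal runnable sketch of the whole approach on two made-up line lists. The helper name `lines_to_row` and the sample values of the second file are assumptions for illustration; the sketch also strips the leading space from the values and passes `ignore_index=True` so the combined rows are numbered 0, 1, and so on. The samples deliberately avoid repeated keys such as mode, since duplicate column names can complicate concatenation.

```python
import pandas as pd


def lines_to_row(lineList):
    """Turn the lines of one file into a single-row DataFrame."""
    splitList = [
        [l.split(":")[0], ":".join(l.split(":")[1:]).strip()]
        for l in lineList
        if len(l) > 0
    ]
    return pd.DataFrame(
        data=[i[1] for i in splitList],
        index=[i[0] for i in splitList],
    ).T


# Two made-up files, already read into lists of lines
files_as_lines = [
    ["Memory dump", "Serialnr: 1412b23990", "Date/time: 24-11-2016 08:10"],
    ["Memory dump", "Serialnr: 0000000000", "Date/time: 25-11-2016 09:30"],
]

dataFrameList = [lines_to_row(lines) for lines in files_as_lines]
comboDF = pd.concat(dataFrameList, ignore_index=True)
print(comboDF)
```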