Reformat text file into dataframe
I want to reformat a text file into a dataframe. The input file looks like this. Each "insert_machine:" value should represent a new record in the dataframe.
/* ----------------- REAL-001 ------------------- */
insert_machine: REAL-001
type: a
factor: 1.00
description: Cloud Added
port: 1234
node_name: REAL-001.some.domain
agent_name: REAL-001
/* key_to_agent: *** masked value ***/
encryption_type: AES
opsys: linux
character_code: ASCII
/* ----------------- REAL-002 ------------------- */
insert_machine: REAL-002
type: a
factor: 1.00
description: Cloud Added
port: 1234
node_name: REAL-002.some.domain
agent_name: REAL-002
/* key_to_agent: *** masked value ***/
encryption_type: AES
opsys: linux
character_code: ASCII
/* ----------------- VIRTUAL-001 ----------------- */
insert_machine: VIRTUAL-001
type: v
machine: REAL-001
factor: ----
machine: REAL-002
factor: ----
My current code looks like this:
import pandas as pd

jilFile_path = "inputfile.txt"

# Create empty list
jilinArray = []
# Create empty dictionary
oneJob = {}

with open(jilFile_path, "rt") as jil:
    jilLines = jil.readlines()
    for linesInJill in jilLines:
        if "insert_machine:" in linesInJill:
            # A new record starts: store the previous one and begin a fresh dict
            jilinArray.append(oneJob)
            linesInJill = linesInJill.strip()
            machine = linesInJill.split("insert_machine:")[1]
            oneJob = {}
            oneJob["insert_machine"] = str(machine).strip()
        else:
            if linesInJill != "\n" and "/* ----" not in linesInJill:
                if ": " in linesInJill:
                    # Split "key: value" on the first colon only
                    spli = linesInJill.split(":", 1)
                    oneJob[str(spli[0]).strip()] = str(spli[1]).strip().replace("\"", "")
    # Store the last record
    jilinArray.append(oneJob)

# Note: 'encrption_type' is misspelled here, so it does not match the
# 'encryption_type' key from the file and that column comes out as NaN.
df = pd.DataFrame(jilinArray, columns=['insert_machine', 'type', 'description', 'port', 'node_name', 'agent_name',
                                       'encrption_type', 'opsys', 'character_code', 'machine'])
print(df)
This gives me this output:
  insert_machine type  description  ...  opsys character_code   machine
0            NaN  NaN          NaN  ...    NaN            NaN       NaN
1       REAL-001    a  Cloud Added  ...  linux          ASCII       NaN
2       REAL-002    a  Cloud Added  ...  linux          ASCII       NaN
3    VIRTUAL-001    v          NaN  ...    NaN            NaN  REAL-002
My problem is with the "insert_machine:" entries that have "type: v". They can have zero to many "machine:" values. I'm not sure how to get each of them reflected in my dataframe.
I would like to see something like this:
  insert_machine type  description  ...  opsys character_code   machine
0            NaN  NaN          NaN  ...    NaN            NaN       NaN
1       REAL-001    a  Cloud Added  ...  linux          ASCII       NaN
2       REAL-002    a  Cloud Added  ...  linux          ASCII       NaN
3    VIRTUAL-001    v          NaN  ...    NaN            NaN  REAL-001
4    VIRTUAL-001    v          NaN  ...    NaN            NaN  REAL-002
Ultimately I would like to see this, but if I can at least get all of the "machine:" entries into the df, I hope I can go from there.
  insert_machine type  description  ...  opsys character_code      machine
0            NaN  NaN          NaN  ...    NaN            NaN          NaN
1       REAL-001    a  Cloud Added  ...  linux          ASCII  VIRTUAL-001
2       REAL-002    a  Cloud Added  ...  linux          ASCII  VIRTUAL-001
Any ideas on how to get each "machine:" value reflected in my dataframe?
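One possible way to keep every "machine:" value is to collect them in a list per "insert_machine:" block and then emit one row per list element. The sketch below is only an illustration under a few assumptions: the function names `parse_blocks` and `flatten_blocks` and the shortened `SAMPLE` text are made up for this example, every line starting with `/*` is treated as a comment, and when a key repeats inside a block (such as `factor`) only the first value is kept.

```python
SAMPLE = """\
/* ----------------- REAL-001 ------------------- */
insert_machine: REAL-001
type: a
opsys: linux
/* ----------------- VIRTUAL-001 ----------------- */
insert_machine: VIRTUAL-001
type: v
machine: REAL-001
factor: ----
machine: REAL-002
factor: ----
"""


def parse_blocks(text):
    """Return one dict per insert_machine block; 'machine' holds a list."""
    blocks = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, /* ... */ comment lines and lines without a colon.
        if not line or line.startswith("/*") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "insert_machine":
            current = {"insert_machine": value, "machine": []}
            blocks.append(current)
        elif current is None:
            continue  # ignore anything before the first block
        elif key == "machine":
            current["machine"].append(value)  # accumulate instead of overwrite
        else:
            current.setdefault(key, value)  # keep the first value of a repeated key
    return blocks


def flatten_blocks(blocks):
    """Emit one flat row per machine value (or one row with None if there are none)."""
    rows = []
    for block in blocks:
        fields = {k: v for k, v in block.items() if k != "machine"}
        for machine in block["machine"] or [None]:
            rows.append({**fields, "machine": machine})
    return rows


rows = flatten_blocks(parse_blocks(SAMPLE))
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` should then give one dataframe line per "machine:" value, with a missing value in the `machine` column for blocks that have none.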
I'm sure there are more elegant ways to handle this, but this is what I came up with.
My initial code now looks like this:
import pandas as pd

jilFile_path = "inputfile.txt"

# Create empty list
jilinArray = []
# Create empty dictionary
oneJob = {}
# Read our input files
with open(jilFile_path, "rt") as jil:
    jilLines = jil.readlines()
    for linesInJill in jilLines:
        if "insert_machine:" in linesInJill:
            linesInJill = linesInJill.strip()
            ins_mach = linesInJill.split("insert_machine:")[1]
            ins_mach_temp = ins_mach
            oneJob = {}
            oneJob["insert_machine"] = str(ins_mach).strip()
            jilinArray.append(oneJob)
        else:
            if linesInJill != "\n" and "/* ----" not in linesInJill:
                if ": " in linesInJill:
                    spli = linesInJill.split(":", 1)
                    oneJob[str(spli[0]).strip()] = str(spli[1]).strip().replace("\"", "")
                    # To allow for virtual agents that have multiple 'machine:' entries
                    if spli[0] == 'type':
                        type_temp = spli[1]
                    if spli[0] == 'machine':
                        jilinArray.append(oneJob)
                        # Start a new record that repeats the virtual agent's name and type
                        oneJob = {}
                        oneJob["insert_machine"] = ins_mach_temp.strip()
                        oneJob["type"] = type_temp.strip()
# Load the list into a dataframe
# Note: 'encrption_type' is misspelled, so that column comes out as NaN.
df = pd.DataFrame(jilinArray, columns=['insert_machine', 'type', 'description', 'port', 'node_name', 'agent_name',
                                       'encrption_type', 'opsys', 'character_code', 'machine'])
# Remove all duplicate entries.
df.drop_duplicates(inplace=True)
print(df)
This gives me this output:
  insert_machine type  description  ...  opsys character_code   machine
0       REAL-001    a  Cloud Added  ...  linux          ASCII       NaN
1       REAL-002    a  Cloud Added  ...  linux          ASCII       NaN
2    VIRTUAL-001    v          NaN  ...    NaN            NaN  REAL-001
4    VIRTUAL-001    v          NaN  ...    NaN            NaN  REAL-002
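The missing index 3 in this output appears to come from the same dictionary being appended twice: once when the "insert_machine:" line is read and again when the first "machine:" line is read, which is why `drop_duplicates` is needed. The following minimal sketch (the variable names `jobs`, `job`, `rows` and `row` are made up for illustration) shows that effect in isolation and one way to avoid it by appending a copy only once the row is complete.

```python
jobs = []
job = {"insert_machine": "VIRTUAL-001", "type": "v"}
jobs.append(job)             # appended when the block starts
job["machine"] = "REAL-001"  # later changes are visible through the list
jobs.append(job)             # appended again on 'machine:': same object twice
same_object = jobs[0] is jobs[1]

# Appending a copy once the row is complete yields a single entry instead.
rows = []
row = {"insert_machine": "VIRTUAL-001", "type": "v"}
row["machine"] = "REAL-001"
rows.append(dict(row))

print(same_object, len(jobs), len(rows))
```

With the second approach the later `drop_duplicates` call would no longer be needed for this case, although keeping it does no harm.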
Then I added this to merge the entries:
# Copy our dataframe and filter on the 'type' column to only return virtual agents
df2 = df.copy()
df2 = df2[df2['type'].eq('v')]
# Select our desired columns
df2 = df2[['insert_machine', 'machine']]
# Rename some columns
df2.rename(columns={'insert_machine': 'Virtual_Machine'}, inplace=True)
# Merge the original dataframe (df) with the copied dataframe (df2) to combine the real and virtual agent names into
# one record. The keys are lower-cased so the match is case-insensitive.
df_mg = pd.merge(df, df2,
                 left_on=df["insert_machine"].str.lower(),
                 right_on=df2["machine"].str.lower(),
                 how='left')
# Rename some columns
df_mg.rename(columns={'insert_machine': 'Real_Machine'}, inplace=True)
# Select our desired columns ('encrption_type' keeps the misspelled column name used when df was built)
df_mg = df_mg[['Virtual_Machine', 'Real_Machine', 'type', 'description', 'port', 'node_name', 'agent_name',
               'encrption_type', 'opsys', 'character_code']]
# Filter on the 'type' column
df_mg = df_mg[df_mg['type'].eq('a')]
print(df_mg)
This gives me this output:
  Virtual_Machine Real_Machine type  ... encrption_type  opsys character_code
0     VIRTUAL-001     REAL-001    a  ...            NaN  linux          ASCII
1     VIRTUAL-001     REAL-002    a  ...            NaN  linux          ASCII
It seems to work for me.
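For comparison, the same real-to-virtual lookup can be sketched as a merge on a named column. This is only an illustration on a small hand-built frame, not a replacement for the code above: the frame contents and the names `real`, `virtual` and `merged` are assumptions for the example, and unlike the lower-cased keys used above it assumes the machine names match in exact case.

```python
import pandas as pd

# Small stand-in for the parsed dataframe (one row per 'machine:' value).
df = pd.DataFrame(
    {
        "insert_machine": ["REAL-001", "REAL-002", "VIRTUAL-001", "VIRTUAL-001"],
        "type": ["a", "a", "v", "v"],
        "opsys": ["linux", "linux", None, None],
        "machine": [None, None, "REAL-001", "REAL-002"],
    }
)

# Real agents: drop the unused 'machine' column and rename the key column.
real = (
    df[df["type"].eq("a")]
    .drop(columns="machine")
    .rename(columns={"insert_machine": "Real_Machine"})
)

# Virtual agents: keep only the virtual name and the real machine it points to.
virtual = df.loc[df["type"].eq("v"), ["insert_machine", "machine"]].rename(
    columns={"insert_machine": "Virtual_Machine", "machine": "Real_Machine"}
)

# A left merge keeps every real agent, even one no virtual agent refers to.
merged = real.merge(virtual, on="Real_Machine", how="left")
print(merged)
```

Renaming both frames to share a `Real_Machine` key before merging avoids the extra key column and the `machine_x`/`machine_y` suffixes that a merge on unnamed Series keys can leave behind.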