
Pandas: parse text data and align the columns based on conditions

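I have the text data below, which I need to parse and split into columns according to the following conditions:

  1. Anything on a line starting with = should go under ENC_NAME

  2. For any line containing BladeSystem, the number at the end of the line should go under the OA_VERSION column

  3. For any line containing 1 HP, the number at the end of the line should go under the VC_ACTIVE column

  4. For any line containing 2 HP, the number at the end of the line should go under the VC_STDN column

Text data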

========= enc1001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.40
  2 HP VC Flex-10/10D Module   4.40
========= enc1009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.45
  2 HP VC Flex-10/10D Module   4.45
========= enc3027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3029 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3030 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3031 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4024 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4029 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4030 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4031 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4032 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc6002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc7002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1010 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc1026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2010 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2019 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3036 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4035 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc7005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc7006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC FlexFabric 10Gb/24-Port Module  4.50
  2 HP VC FlexFabric 10Gb/24-Port Module  4.50
========= enc7007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc7008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8019 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8024 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.40
  2 HP VC Flex-10/10D Module   4.40

Expected output (sample):

ENC_NAME    OA_VERSION      VC_ACTIVE   VC_STDN
enc4031     4.85            4.50        4.50
enc4032     4.85            4.50        4.50
enc4033     4.85            4.50        4.50
enc4034     4.85            4.50        4.50
enc6002     4.60            NaN         NaN
enc6011     4.60            NaN         NaN
enc6012     4.60            NaN         NaN
enc6013     4.60            NaN         NaN

Edit (what I tried)

import pandas as pd

df = pd.read_csv("enc_list_sorted", names=["col1"])
df = df.col1.str.split(' ', expand = True)
df = df.drop(df.columns[[0, 2, 3, 4, 5, 6, 7, 8, 11]], axis=1)


df = df.rename(columns={ 1: 'ENC_NAME', 9: 'VC_VERSION', 10: 'OA_VERSION'})

print(df)

        ENC_NAME VC_VERSION OA_VERSION
    0    enc1001       None       None
    1                   KVM       4.85
    2                  4.50       None
    3                  4.50       None
    4    enc1002       None       None
    5                   KVM       4.85
    6                  4.50       None
    7                  4.50       None
    8    enc1003       None       None
    9                   KVM       4.85
    10                 4.50       None
    11                 4.50       None
    12   enc1004       None       None
    13                  KVM       4.85
    14                 4.50       None
    15                 4.50       None

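Any help or ideas would be much appreciated.

In my opinion, you are better off writing a small parser of your own. What you have can be seen as a form of DSL, a domain-specific language. The grammar used here is fairly lenient: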

import re, pandas as pd
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

class ENCVisitor(NodeVisitor):
    grammar = Grammar(r"""
            content     = (ws / block)*

            block       = header oa_line vc_active? vc_stdn?
            header      = delim ws word ws delim nl

            oa_line     = ~"^(?=.*BladeSystem).+"m nl?
            vc_active   = ~"^(?=.*1 HP).+"m nl?
            vc_stdn     = ~"^(?=.*2 HP).+"m nl?

            word        = ~"\w+"
            delim       = ~"=+"
            ws          = ~"\s+"
            nl          = ~"[\n\r]+"
    """)

    version_pattern = re.compile(r"\d+\.\d+$")

    def get_version(self, key, line):
        match = self.version_pattern.search(line)
        value = match.group(0) if match else None
        return {key: value}

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_header(self, node, visited_children):
        header = visited_children[2]
        return {"ENC_NAME": header.text}

    def visit_oa_line(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("OA_VERSION", line.text)

    def visit_vc_active(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("VC_ACTIVE", line.text)

    def visit_vc_stdn(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("VC_STDN", line.text)

    def visit_block(self, node, visited_children):
        dct = {}
        for child in visited_children:
            if isinstance(child, dict):
                dct.update(child)
            elif isinstance(child, list):
                dct.update(child[0])
        return dct

    def visit_content(self, node, visited_children):
        return [child[0] for child in visited_children if isinstance(child[0], dict)]

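# `data` must hold the raw text shown in the question; here it is read
# from the file name used in the question's own attempt.
with open("enc_list_sorted") as fh:
    data = fh.read()

enc = ENCVisitor()
result = enc.parse(data)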

df = pd.DataFrame(result)
print(df)

For your data, this yields

   ENC_NAME OA_VERSION VC_ACTIVE VC_STDN
0   enc1001       4.85      4.50    4.50
1   enc1002       4.85      4.50    4.50
2   enc1003       4.85      4.50    4.50
3   enc1004       4.85      4.50    4.50
4   enc1005       4.85      4.50    4.50
..      ...        ...       ...     ...
94  enc8025       4.85      4.62    4.62
95  enc8026       4.85      4.62    4.62
96  enc8027       4.85      4.62    4.62
97  enc8028       4.85      4.62    4.62
98  enc8033       4.85      4.40    4.40

[99 rows x 4 columns]

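Explanation: your input can be seen as a mini-language of its own, a so-called domain-specific language. Each block of information in the file consists of a header line, an OA_VERSION line, and two further lines (VC_ACTIVE and VC_STDN) that may or may not be present. Your header line always starts and ends with ===.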
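To make that block structure concrete, here is a minimal sketch that encodes the same shape (a header, an OA line, then two optional VC lines) as a single standard-library regular expression rather than a parsimonious grammar. The names `BLOCK` and `parse_blocks` are made up for this illustration, and the pattern assumes that each version number is the last thing on its line, as in the sample data.

```python
import re

# One block = header line, OA line, then up to two optional VC lines.
BLOCK = re.compile(
    r"""
    ^=+\s*(?P<ENC_NAME>\w+)\s*=+[ \t]*\n                              # header line
    [^\n]*BladeSystem[^\n]*?(?P<OA_VERSION>\d+\.\d+)[ \t]*(?:\n|$)    # OA line
    (?:[^\n]*\b1[ ]HP[^\n]*?(?P<VC_ACTIVE>\d+\.\d+)[ \t]*(?:\n|$))?   # optional 1 HP line
    (?:[^\n]*\b2[ ]HP[^\n]*?(?P<VC_STDN>\d+\.\d+)[ \t]*(?:\n|$))?     # optional 2 HP line
    """,
    re.MULTILINE | re.VERBOSE,
)


def parse_blocks(text):
    """Return one dict per block; missing optional lines yield None."""
    return [match.groupdict() for match in BLOCK.finditer(text)]


sample = (
    "========= enc4034 =========\n"
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85\n"
    "  1 HP VC Flex-10/10D Module   4.50\n"
    "  2 HP VC Flex-10/10D Module   4.50\n"
    "========= enc6002 =========\n"
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60\n"
)
rows = parse_blocks(sample)
for row in rows:
    print(row)
```

The resulting list of dictionaries can be handed to `pd.DataFrame(rows)` in the same way as the visitor's result. A grammar stays easier to extend once the format grows, which is the argument for the parser approach.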

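All these building blocks form a grammar: the file/string consists of whitespace or any number of blocks. Internally, an abstract syntax tree (AST) is built, and to retrieve the information we need to "visit" every node. In the parser library I chose (the excellent parsimonious), this is done with the NodeVisitor class, and each node of the AST is visited through a method named after its rule. This means that if we call a part "header", the method has to be named "visit_header".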
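The naming convention can be illustrated without the library. The toy class below is not parsimonious's implementation, only a hypothetical stand-in (`TinyVisitor` and its simplified `visit(name, text)` signature are invented here) showing how a rule name can be turned into a method lookup, with a generic fallback when no specific method exists.

```python
class TinyVisitor:
    """Toy stand-in for a visitor: dispatches on the rule name."""

    def visit(self, name, text):
        # Look up visit_<name>; fall back to generic_visit if it is missing.
        method = getattr(self, "visit_" + name, self.generic_visit)
        return method(text)

    def generic_visit(self, text):
        # No dedicated method: hand the text back unchanged.
        return text

    def visit_header(self, text):
        # Strip the '=' delimiters and surrounding blanks from a header line.
        return {"ENC_NAME": text.strip("= \t")}


visitor = TinyVisitor()
header = visitor.visit("header", "========= enc1001 =========")
other = visitor.visit("unknown_rule", "some other line")
print(header)
print(other)
```

Here `visit("header", ...)` reaches `visit_header` purely through its name, while the unknown rule falls through to `generic_visit`.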

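The result for each block is assembled in "visit_block" and is a dictionary of all the information retrieved for that block. Finally, everything is fed into pandas.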

Of course, this is only a short introduction; if you want to know more about parsimonious, have a look at its GitHub repository.

As suggested in the comments, opening the file with pandas and then parsing it is not ideal.

Assuming your data is saved in a text file file.txt:

import pandas as pd

with open("file.txt") as file:
    lines = [l.rstrip("\n") for l in file]


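row_temp = [None] * 4
row = None
out = []
for line in lines:
    if line.startswith("="):
        # A new header closes the previous block, if there is one.
        if row is not None:
            out.append(row)
        row = row_temp.copy()
        row[0] = line.replace("=", "").strip()
    elif row is None:
        # Skip anything that appears before the first header.
        continue
    elif 'BladeSystem' in line:
        row[1] = line.split()[-1]
    elif '1 HP' in line:
        row[2] = line.split()[-1]
    elif '2 HP' in line:
        row[3] = line.split()[-1]

# The last block has no following header, so append it explicitly.
if row is not None:
    out.append(row)

col_names = ["ENC_NAME", "OA_VERSION", "VC_ACTIVE", "VC_STDN"]
df = pd.DataFrame(out,
                  columns=col_names)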

This returns the output you are looking for.

You could try this:
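One caveat with plain substring tests: `'1 HP' in line` would also be true for a hypothetical line that begins with `11 HP`. If that matters for your files, a slightly stricter check could look like the sketch below. It assumes that the bay number and `HP` open the line and are separated by whitespace; `VC_LINE`, `COLUMN_FOR_BAY` and `classify_vc_line` are names chosen for this example only.

```python
import re

# Leading bay number followed by "HP"; the last token on the line is the version.
VC_LINE = re.compile(r"^\s*(?P<bay>\d+)\s+HP\b.*?(?P<version>\S+)\s*$")
COLUMN_FOR_BAY = {"1": "VC_ACTIVE", "2": "VC_STDN"}


def classify_vc_line(line):
    """Return (column, version) for a VC line, or None for anything else."""
    match = VC_LINE.match(line)
    if match is None:
        return None
    column = COLUMN_FOR_BAY.get(match.group("bay"))
    if column is None:
        return None
    return column, match.group("version")


print(classify_vc_line("  1 HP VC Flex-10/10D Module   4.50"))
print(classify_vc_line("  2 HP VC FlexFabric 10Gb/24-Port Module  4.50"))
print(classify_vc_line("  11 HP VC Flex-10/10D Module   4.50"))
```

Inside the loop, the two `HP` branches could then be replaced by one call to `classify_vc_line`, writing the version into the matching column when the result is not None.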

import pandas as pd
import re
import numpy as np

with open(r'test1.txt','r') as file:
    txto=file.read()

data=[]
pattern1 = re.compile(r'(^=.+)\s.+$\n?', re.MULTILINE)
lstlines=txto.split('\n')

for ele1, ele2 in zip(re.findall(pattern1,txto),re.findall(pattern1,txto)[1:]):
    row=lstlines[lstlines.index(ele1):lstlines.index(ele2)]

    OA_VERSION=[i for i in row if 'BladeSystem' in i]
    OA_VERSION=OA_VERSION[0].split()[-1] if len(OA_VERSION)>0 else np.nan
    
    VC_ACTIVE=[i for i in row if '1 HP' in i]
    VC_ACTIVE=VC_ACTIVE[0].split()[-1] if len(VC_ACTIVE)>0 else np.nan
    
    VC_STDN=[i for i in row if '2 HP' in i]
    VC_STDN=VC_STDN[0].split()[-1] if len(VC_STDN)>0 else np.nan
    
    data.append([ele1.replace('=','').strip(),OA_VERSION, VC_ACTIVE,VC_STDN])
    
#last row 
row=lstlines[lstlines.index(re.findall(pattern1,txto)[-1]):]
OA_VERSION=[i for i in row if 'BladeSystem' in i]
OA_VERSION=OA_VERSION[0].split()[-1] if len(OA_VERSION)>0 else np.nan
VC_ACTIVE=[i for i in row if '1 HP' in i]
VC_ACTIVE=VC_ACTIVE[0].split()[-1] if len(VC_ACTIVE)>0 else np.nan
VC_STDN=[i for i in row if '2 HP' in i]
VC_STDN=VC_STDN[0].split()[-1] if len(VC_STDN)>0 else np.nan
data.append([re.findall(pattern1,txto)[-1].replace('=','').strip(),OA_VERSION, VC_ACTIVE,VC_STDN]) 

#Create dataframe
df=pd.DataFrame(data, columns=['ENC_NAME','OA_VERSION','VC_ACTIVE','VC_STDN'])
print(df)

Output:

   ENC_NAME OA_VERSION VC_ACTIVE VC_STDN
0   enc1001       4.85      4.50    4.50
1   enc1002       4.85      4.50    4.50
2   enc1003       4.85      4.50    4.50
3   enc1004       4.85      4.50    4.50
4   enc1005       4.85      4.50    4.50
..      ...        ...       ...     ...
94  enc8025       4.85      4.62    4.62
95  enc8026       4.85      4.62    4.62
96  enc8027       4.85      4.62    4.62
97  enc8028       4.85      4.62    4.62
98  enc8033       4.85      4.40    4.40

[99 rows x 4 columns]
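The separate handling of the last block can probably be avoided by splitting the text on the header lines themselves. The following is a sketch of that idea using only `re.split`; it is a variation, not the code from the answer. `HEADER`, `last_token` and `parse_by_split` are illustrative names, and missing values come back as None rather than np.nan.

```python
import re

# A header line such as "========= enc1001 ========="; the name is captured.
HEADER = re.compile(r"^=+\s*(\w+)\s*=+[ \t]*$", re.MULTILINE)


def last_token(lines, marker):
    """Last whitespace-separated token of the first line containing marker."""
    for line in lines:
        if marker in line:
            return line.split()[-1]
    return None


def parse_by_split(text):
    # With one capturing group, split() returns
    # [text_before_first_header, name1, body1, name2, body2, ...].
    parts = HEADER.split(text)
    rows = []
    for name, body in zip(parts[1::2], parts[2::2]):
        lines = body.splitlines()
        rows.append([
            name,
            last_token(lines, "BladeSystem"),
            last_token(lines, "1 HP"),
            last_token(lines, "2 HP"),
        ])
    return rows


sample = (
    "========= enc4034 =========\n"
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85\n"
    "  1 HP VC Flex-10/10D Module   4.50\n"
    "  2 HP VC Flex-10/10D Module   4.50\n"
    "========= enc6002 =========\n"
    "1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60\n"
)
rows = parse_by_split(sample)
print(rows)
```

The rows can be passed to `pd.DataFrame(rows, columns=['ENC_NAME','OA_VERSION','VC_ACTIVE','VC_STDN'])`. Because every name is paired with the body that follows it, the final block needs no special case.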
