
Regular expression to extract a group of words

I want to extract the string in the Description column for each line in the following table. Since the search string contains spaces and the columns are also delimited by spaces, I am not sure how to parse the right field in each line.

Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic1   0000:3d:00.1  i40en   Up            Down             0  Half    00:00:00:00:03:15  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic10  0000:d9:00.1  ixgben  Up            Down             0  Half    a0:36:9f:d9:b9:11  1500  Intel(R) Ethernet Controller 10G X550
vmnic11  0000:01:00.0  i40en   Up            Down             0  Half    3c:fd:fe:a9:4e:b8  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic2   0000:00:1f.6  ne1000  Up            Down             0  Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
vmnic3   0000:3d:00.2  i40en   Up            Down             0  Half    00:00:00:00:03:16  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic4   0000:3d:00.3  i40en   Up            Down             0  Half    00:00:00:00:03:17  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic5   0000:18:00.0  ixgben  Up            Down             0  Half    90:e2:ba:37:50:a8  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic6   0000:18:00.1  ixgben  Up            Down             0  Half    90:e2:ba:37:50:a9  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic7   0000:81:00.0  ixgben  Up            Up           10000  Full    90:e2:ba:1e:b6:24  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic8   0000:81:00.1  ixgben  Up            Down             0  Half    90:e2:ba:1e:b6:25  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic9   0000:d9:00.0  ixgben  Up            Up            1000  Full    a0:36:9f:d9:b9:10  1500  Intel(R) Ethernet Controller 10G X550

It seems your delimiter is "two or more whitespace characters". The regular expression for that is \s{2,}. Since Description is the last column, for each line here: description = re.split(r'\s{2,}', line)[-1]
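To apply that split to the whole table rather than a single line, something like the following should work. It is only a sketch: the function name extract_descriptions and the constant COLUMN_GAP are made up for illustration, and it assumes the first non-blank line is the header and that the separator row consists only of dashes and spaces.

```python
import re

# Two or more whitespace characters separate the columns.
COLUMN_GAP = re.compile(r"\s{2,}")


def extract_descriptions(table_text):
    """Return the last column (Description) of every data row."""
    # Drop blank lines and surrounding whitespace first.
    rows = [line.strip() for line in table_text.splitlines() if line.strip()]
    descriptions = []
    for row in rows[1:]:  # rows[0] is assumed to be the header
        if set(row) <= {"-", " "}:  # skip the dashed separator row
            continue
        descriptions.append(COLUMN_GAP.split(row)[-1])
    return descriptions


sample = """
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic2   0000:00:1f.6  ne1000  Up            Down             0  Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
"""

for description in extract_descriptions(sample):
    print(description)
```

Single spaces inside a description are left alone because the pattern only matches runs of at least two whitespace characters.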

Using pandas:

from io import StringIO
import pandas as pd

TESTDATA = StringIO("""
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic1   0000:3d:00.1  i40en   Up            Down             0  Half    00:00:00:00:03:15  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic10  0000:d9:00.1  ixgben  Up            Down             0  Half    a0:36:9f:d9:b9:11  1500  Intel(R) Ethernet Controller 10G X550
vmnic11  0000:01:00.0  i40en   Up            Down             0  Half    3c:fd:fe:a9:4e:b8  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic2   0000:00:1f.6  ne1000  Up            Down             0  Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
vmnic3   0000:3d:00.2  i40en   Up            Down             0  Half    00:00:00:00:03:16  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic4   0000:3d:00.3  i40en   Up            Down             0  Half    00:00:00:00:03:17  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic5   0000:18:00.0  ixgben  Up            Down             0  Half    90:e2:ba:37:50:a8  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic6   0000:18:00.1  ixgben  Up            Down             0  Half    90:e2:ba:37:50:a9  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic7   0000:81:00.0  ixgben  Up            Up           10000  Full    90:e2:ba:1e:b6:24  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic8   0000:81:00.1  ixgben  Up            Down             0  Half    90:e2:ba:1e:b6:25  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic9   0000:d9:00.0  ixgben  Up            Up            1000  Full    a0:36:9f:d9:b9:10  1500  Intel(R) Ethernet Controller 10G X550
    """)

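# A regex separator needs the Python parsing engine; .iloc[1:] drops the dashed separator row.
df = pd.read_csv(TESTDATA, sep=r"\s{2,}", engine="python").iloc[1:]
descriptions = df['Description'].tolist()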

And the output:

['Intel(R) Ethernet Connection X722 for 10GbE SFP+',
 'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
 'Intel(R) Ethernet Controller 10G X550',
 'Intel(R) Ethernet Controller XXV710 for 25GbE SFP28',
 'Intel(R) Ethernet Controller XXV710 for 25GbE SFP28',
 'Intel Corporation Ethernet Connection (3) I219-LM',
 'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
 'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
 'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
 'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
 'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
 'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
 'Intel(R) Ethernet Controller 10G X550']

Assuming you can read each line into a string:

>>> import re
>>> s = "vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+"
>>> row = re.split(r"\s{2,}", s)
>>> description = row[-1]
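If you also want the other fields, or worry that a description might itself contain a run of two or more spaces, one possible variation is to cap the number of splits so that everything after the ninth gap stays in the last field. This is a sketch, not a tested parser for every esxcli output: parse_row and COLUMNS are names invented here, and it assumes every row has exactly these ten columns with no empty cells.

```python
import re

# Column names taken from the header row of the table in the question.
COLUMNS = ["Name", "PCI Device", "Driver", "Admin Status", "Link Status",
           "Speed", "Duplex", "MAC Address", "MTU", "Description"]


def parse_row(line):
    """Split one data row into a dict keyed by column name."""
    # Split at most len(COLUMNS) - 1 times, so whatever follows the
    # ninth gap is kept whole as the description.
    fields = re.split(r"\s{2,}", line.strip(), maxsplit=len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


s = "vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+"
record = parse_row(s)
print(record["Description"])
print(record["MAC Address"])
```

Note that maxsplit is passed by keyword, which keeps the call unambiguous across Python versions.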
