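Regular expression to extract a group of words
I want to extract the string in the Description column of each row in the table below. Because the string I am after contains spaces and the columns are also separated by spaces, I am not sure how to parse the rightmost field in each row.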
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down         0      Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic1   0000:3d:00.1  i40en   Up            Down         0      Half    00:00:00:00:03:15  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic10  0000:d9:00.1  ixgben  Up            Down         0      Half    a0:36:9f:d9:b9:11  1500  Intel(R) Ethernet Controller 10G X550
vmnic11  0000:01:00.0  i40en   Up            Down         0      Half    3c:fd:fe:a9:4e:b8  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic2   0000:00:1f.6  ne1000  Up            Down         0      Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
vmnic3   0000:3d:00.2  i40en   Up            Down         0      Half    00:00:00:00:03:16  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic4   0000:3d:00.3  i40en   Up            Down         0      Half    00:00:00:00:03:17  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic5   0000:18:00.0  ixgben  Up            Down         0      Half    90:e2:ba:37:50:a8  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic6   0000:18:00.1  ixgben  Up            Down         0      Half    90:e2:ba:37:50:a9  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic7   0000:81:00.0  ixgben  Up            Up           10000  Full    90:e2:ba:1e:b6:24  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic8   0000:81:00.1  ixgben  Up            Down         0      Half    90:e2:ba:1e:b6:25  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic9   0000:d9:00.0  ixgben  Up            Up           1000   Full    a0:36:9f:d9:b9:10  1500  Intel(R) Ethernet Controller 10G X550
It looks like your delimiter is "more than one space". The regular expression for that is \s{2,}. So for each line here, description = re.split(r'\s{2,}', line)[-1]
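A minimal sketch of that idea applied to a whole listing, assuming the output is available as one multi-line string and that columns are always separated by at least two spaces. The function name extract_descriptions and the shortened sample are illustrative and not part of the original question.

```python
import re

def extract_descriptions(table_text):
    """Return the last column of every data row (hypothetical helper)."""
    # Drop blank lines, then skip the header line and the dashed separator line.
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    descriptions = []
    for line in lines[2:]:
        # Columns are separated by two or more whitespace characters;
        # single spaces inside a field are left alone.
        fields = re.split(r"\s{2,}", line.strip())
        descriptions.append(fields[-1])
    return descriptions

# Shortened sample, assumed to keep the original two-space column gaps.
sample = """\
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0   0000:3d:00.0  i40en   Up            Down         0      Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic2   0000:00:1f.6  ne1000  Up            Down         0      Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
"""

print(extract_descriptions(sample))
```

This only works while the text still has its original spacing; if the whitespace has been collapsed to single spaces, the split returns the whole line as one field.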
Using pandas:
from io import StringIO
import pandas as pd

# The columns must be separated by at least two spaces for the regex separator to work.
TESTDATA = StringIO("""
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down         0      Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic1   0000:3d:00.1  i40en   Up            Down         0      Half    00:00:00:00:03:15  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic10  0000:d9:00.1  ixgben  Up            Down         0      Half    a0:36:9f:d9:b9:11  1500  Intel(R) Ethernet Controller 10G X550
vmnic11  0000:01:00.0  i40en   Up            Down         0      Half    3c:fd:fe:a9:4e:b8  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic2   0000:00:1f.6  ne1000  Up            Down         0      Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
vmnic3   0000:3d:00.2  i40en   Up            Down         0      Half    00:00:00:00:03:16  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic4   0000:3d:00.3  i40en   Up            Down         0      Half    00:00:00:00:03:17  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic5   0000:18:00.0  ixgben  Up            Down         0      Half    90:e2:ba:37:50:a8  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic6   0000:18:00.1  ixgben  Up            Down         0      Half    90:e2:ba:37:50:a9  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic7   0000:81:00.0  ixgben  Up            Up           10000  Full    90:e2:ba:1e:b6:24  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic8   0000:81:00.1  ixgben  Up            Down         0      Half    90:e2:ba:1e:b6:25  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic9   0000:d9:00.0  ixgben  Up            Up           1000   Full    a0:36:9f:d9:b9:10  1500  Intel(R) Ethernet Controller 10G X550
""")
# A regex separator needs the Python parsing engine; .iloc[1:] drops the row of dashes.
df = pd.read_csv(TESTDATA, sep=r"\s{2,}", engine="python").iloc[1:]
descriptions = df['Description'].tolist()
The descriptions list then contains:
['Intel(R) Ethernet Connection X722 for 10GbE SFP+',
'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
'Intel(R) Ethernet Controller 10G X550',
'Intel(R) Ethernet Controller XXV710 for 25GbE SFP28',
'Intel(R) Ethernet Controller XXV710 for 25GbE SFP28',
'Intel Corporation Ethernet Connection (3) I219-LM',
'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
'Intel(R) Ethernet Controller 10G X550']
I assume you can get each row as a single string.
>>> import re
>>> s = "vmnic0   0000:3d:00.0  i40en   Up            Down         0      Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+"
>>> row = re.split(r"\s{2,}", s)
>>> description = row[-1]
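If more than the last field is needed, the same split can be applied to the header line so that each value can be looked up by column name. This is a rough sketch rather than part of the answer above; the helper split_columns and the variable names are made up for illustration, and it assumes no cell in the row is empty (an empty cell would shift the remaining fields to the left).

```python
import re

def split_columns(text):
    """Split one table line on runs of two or more whitespace characters (hypothetical helper)."""
    return re.split(r"\s{2,}", text.strip())

# Header and one data row, assumed to keep the original two-space column gaps.
header = "Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description"
line = "vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28"

# Pair each header name with the value in the same position.
record = dict(zip(split_columns(header), split_columns(line)))

print(record["Description"])
print(record["Link Status"], record["Speed"])
```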