Regular expression to extract a group of words
I want to extract the string in the Description column for each line in the following table. Since the search string contains spaces and the columns are also delimited by spaces, I am not sure how to parse the right field in each line.
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic1   0000:3d:00.1  i40en   Up            Down             0  Half    00:00:00:00:03:15  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic10  0000:d9:00.1  ixgben  Up            Down             0  Half    a0:36:9f:d9:b9:11  1500  Intel(R) Ethernet Controller 10G X550
vmnic11  0000:01:00.0  i40en   Up            Down             0  Half    3c:fd:fe:a9:4e:b8  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic2   0000:00:1f.6  ne1000  Up            Down             0  Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
vmnic3   0000:3d:00.2  i40en   Up            Down             0  Half    00:00:00:00:03:16  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic4   0000:3d:00.3  i40en   Up            Down             0  Half    00:00:00:00:03:17  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic5   0000:18:00.0  ixgben  Up            Down             0  Half    90:e2:ba:37:50:a8  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic6   0000:18:00.1  ixgben  Up            Down             0  Half    90:e2:ba:37:50:a9  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic7   0000:81:00.0  ixgben  Up            Up           10000  Full    90:e2:ba:1e:b6:24  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic8   0000:81:00.1  ixgben  Up            Down             0  Half    90:e2:ba:1e:b6:25  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic9   0000:d9:00.0  ixgben  Up            Up            1000  Full    a0:36:9f:d9:b9:10  1500  Intel(R) Ethernet Controller 10G X550
It seems your delimiter is "two or more whitespace characters". The regular expression for that is \s{2,} (written '\\s{2,}' in an ordinary Python string, or r'\s{2,}' as a raw string). So, after import re, for each data line here:
description = re.split(r'\s{2,}', line)[-1]
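To run that split over the whole table rather than a single line, the header and the dashed separator have to be skipped, and trailing whitespace trimmed (otherwise the last element of the split can be an empty string). A minimal sketch, assuming the command output is held in one string with its multi-space column gaps intact; `extract_descriptions` and the three-row `sample` are illustrative names, not part of the original post:

```python
import re

# Illustrative sample: header, dashed separator, and three data rows.
# Columns are separated by two or more spaces.
sample = """\
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic2   0000:00:1f.6  ne1000  Up            Down             0  Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
"""


def extract_descriptions(text):
    """Return the last column of every data row in the table."""
    # Drop blank lines and trailing whitespace so the split never ends in ''.
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # lines[0] is the header and lines[1] the dashed separator row.
    return [re.split(r"\s{2,}", ln)[-1] for ln in lines[2:]]


descriptions = extract_descriptions(sample)
for d in descriptions:
    print(d)
```

The single spaces inside a description never match `\s{2,}`, so each description comes back as one piece.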
Using pandas:
from io import StringIO
import pandas as pd
TESTDATA = StringIO("""
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic1   0000:3d:00.1  i40en   Up            Down             0  Half    00:00:00:00:03:15  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic10  0000:d9:00.1  ixgben  Up            Down             0  Half    a0:36:9f:d9:b9:11  1500  Intel(R) Ethernet Controller 10G X550
vmnic11  0000:01:00.0  i40en   Up            Down             0  Half    3c:fd:fe:a9:4e:b8  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic2   0000:00:1f.6  ne1000  Up            Down             0  Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
vmnic3   0000:3d:00.2  i40en   Up            Down             0  Half    00:00:00:00:03:16  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic4   0000:3d:00.3  i40en   Up            Down             0  Half    00:00:00:00:03:17  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic5   0000:18:00.0  ixgben  Up            Down             0  Half    90:e2:ba:37:50:a8  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic6   0000:18:00.1  ixgben  Up            Down             0  Half    90:e2:ba:37:50:a9  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic7   0000:81:00.0  ixgben  Up            Up           10000  Full    90:e2:ba:1e:b6:24  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic8   0000:81:00.1  ixgben  Up            Down             0  Half    90:e2:ba:1e:b6:25  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic9   0000:d9:00.0  ixgben  Up            Up            1000  Full    a0:36:9f:d9:b9:10  1500  Intel(R) Ethernet Controller 10G X550
""")
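# A regex separator needs a raw string and the Python parsing engine.
# .iloc[1:] drops the dashed separator row that follows the header.
df = pd.read_csv(TESTDATA, sep=r"\s{2,}", engine="python").iloc[1:]
descriptions = df['Description'].tolist()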
And the output:
['Intel(R) Ethernet Connection X722 for 10GbE SFP+',
'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
'Intel(R) Ethernet Controller 10G X550',
'Intel(R) Ethernet Controller XXV710 for 25GbE SFP28',
'Intel(R) Ethernet Controller XXV710 for 25GbE SFP28',
'Intel Corporation Ethernet Connection (3) I219-LM',
'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
'Intel(R) Ethernet Connection X722 for 10GbE SFP+',
'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
'Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection',
'Intel(R) Ethernet Controller 10G X550']
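If pandas is not available, the same delimiter can turn each row into a dictionary keyed by the header names, which keeps the other columns usable for filtering. This is only a sketch under the same assumption (two or more spaces between columns, single spaces inside a field); `parse_table` and `sample` are hypothetical names introduced here:

```python
import re

# Illustrative three-row sample with the original multi-space column gaps.
sample = """\
Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ----------------------------------------------------------------
vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+
vmnic12  0000:01:00.1  i40en   Up            Up           10000  Full    3c:fd:fe:a9:4e:b9  1500  Intel(R) Ethernet Controller XXV710 for 25GbE SFP28
vmnic2   0000:00:1f.6  ne1000  Up            Down             0  Half    88:88:88:88:87:88  1500  Intel Corporation Ethernet Connection (3) I219-LM
"""


def parse_table(text):
    """Parse the table into a list of dicts, one per data row."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0])
    rows = []
    for ln in lines[2:]:  # lines[1] is the dashed separator row
        fields = re.split(r"\s{2,}", ln)
        if len(fields) == len(header):  # skip rows that do not line up with the header
            rows.append(dict(zip(header, fields)))
    return rows


rows = parse_table(sample)
# Descriptions of the NICs whose link is currently up.
up = [r["Description"] for r in rows if r["Link Status"] == "Up"]
print(up)
```

All values stay strings here; convert `Speed` or `MTU` with `int()` if numeric comparisons are needed.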
I assume you can get each line as a string.
>>> import re
>>> s = "vmnic0   0000:3d:00.0  i40en   Up            Down             0  Half    00:00:00:00:03:14  1500  Intel(R) Ethernet Connection X722 for 10GbE SFP+"
>>> row = re.split(r"\s{2,}", s)
>>> description = row[-1]
>>> description
'Intel(R) Ethernet Connection X722 for 10GbE SFP+'
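If the column gaps have been collapsed to single spaces (for example after copying the table through a web page), splitting on two or more spaces no longer separates anything. One possible fallback, sketched here under the assumption that the first nine columns of a data row never contain a space, is to split off exactly nine tokens and keep the remainder; `description_by_field_count` is a hypothetical helper name:

```python
def description_by_field_count(line, leading_fields=9):
    """Return everything after the first `leading_fields` whitespace-separated tokens."""
    # With sep=None, runs of whitespace count as one delimiter, so this
    # works for both single-space and aligned multi-space rows.
    parts = line.split(None, leading_fields)
    if len(parts) <= leading_fields:
        return ""  # not enough fields to be a data row
    return parts[-1].rstrip()


line = "vmnic2 0000:00:1f.6 ne1000 Up Down 0 Half 88:88:88:88:87:88 1500 Intel Corporation Ethernet Connection (3) I219-LM"
print(description_by_field_count(line))
# Intel Corporation Ethernet Connection (3) I219-LM
```

This only applies to data rows: the header contains multi-word names such as "PCI Device", so it must be skipped before calling the helper.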