Getting sequences from a string with a regular expression #Python #Regex

Question

I'd like some help with a #Python problem.

I have this dataset:

E   1   1999-02-28  b,g,f    jjj:12,bbb:3,ddd:9,ggg:8,hhh:2
A   2   1999-10-28  a,f,c,d  ccc:2,ddd:0,aaa:3,hhh:9

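I need to get the sequences b,g,f and a,f,c,d as lists. I have tried many combinations of the pattern [a-z],[a-z], but the last item is always skipped, and I don't know how to generalize the pattern so it captures the whole sequence.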
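To see why the last item gets dropped, here is a small reproduction, assuming the two sample lines above and the pattern [a-z],[a-z]. Each match consumes exactly two letters, and re.findall returns only non-overlapping matches, so a letter already used by one match cannot start the next one.

```python
import re

# Sample lines copied from the dataset in the question
lines = [
    "E   1   1999-02-28  b,g,f    jjj:12,bbb:3,ddd:9,ggg:8,hhh:2",
    "A   2   1999-10-28  a,f,c,d  ccc:2,ddd:0,aaa:3,hhh:9",
]

# [a-z],[a-z] matches exactly "letter,letter". Because findall does not
# reuse characters, "b,g,f" loses its last letter and "a,f,c,d" is cut
# into two separate pairs.
for line in lines:
    print(re.findall(r"[a-z],[a-z]", line))
# ['b,g']
# ['a,f', 'c,d']
```

A fixed-width pattern cannot describe a sequence of arbitrary length; it needs a repetition such as (?:,[a-z])*.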

The output should look like this:

[b,g,f]
[a,f,c,d]

The dataset comes from a CSV file, which I read like this:

with open("data.csv", "r") as file:
    lines = file.readlines()

Then I loop over the lines with a for loop:

import re

list_sequence = []
for i in lines:
    a = re.findall(pattern='???', string=i)  # '???' is the pattern I'm looking for
    list_sequence.append(a)

The question marks are where the pattern I need should go.

Answer 1

You can use

(?<!\S)[a-z](?:,[a-z])*(?!\S)

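Details:

(?<!\S) : a left-hand whitespace boundary (the match must start at the beginning of the string or right after a whitespace character)
[a-z] : one lowercase ASCII letter
(?:,[a-z])* : zero or more repetitions of a comma followed by one lowercase ASCII letter
(?!\S) : a right-hand whitespace boundary (the match must end at the end of the string or right before a whitespace character)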
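Plugged into the loop from the question, the pattern (?<!\S)[a-z](?:,[a-z])*(?!\S) might be used as in the minimal sketch below. The helper name read_sequences and the in-memory sample lines are illustrative assumptions, not part of the answer. Note that re.findall returns the matched text (for example "b,g,f"), so each match still has to be split on commas to get a list.

```python
import re

# Whitespace-delimited run of single lowercase letters joined by commas
PATTERN = r"(?<!\S)[a-z](?:,[a-z])*(?!\S)"


def read_sequences(lines):
    """Hypothetical helper: return one list of letters per matched sequence."""
    list_sequence = []
    for line in lines:
        # findall yields strings such as "b,g,f"; split them into letters
        for match in re.findall(PATTERN, line):
            list_sequence.append(match.split(","))
    return list_sequence


# Sample lines standing in for the contents of data.csv
sample = [
    "E   1   1999-02-28  b,g,f    jjj:12,bbb:3,ddd:9,ggg:8,hhh:2\n",
    "A   2   1999-10-28  a,f,c,d  ccc:2,ddd:0,aaa:3,hhh:9\n",
]
print(read_sequences(sample))  # [['b', 'g', 'f'], ['a', 'f', 'c', 'd']]
```

With the real file, a file object can be passed directly, since iterating over it yields lines: with open("data.csv") as file: print(read_sequences(file)). Avoid adding re.IGNORECASE here: the single uppercase letters in the first column (E, A) would then match as one-item sequences.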

Answer 2

You can try the following (split each line into fields, then split the fourth field on commas):

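with open('in.txt') as f:
    data = []
    for line in f:
        parts = line.split()  # split on any run of whitespace
        if not parts:         # skip blank lines, which have no fourth field
            continue
        data.append(parts[3].split(','))  # fourth field, e.g. 'b,g,f'
print(data)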

Output

[['b', 'g', 'f'], ['a', 'f', 'c', 'd']]
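Indexing parts[3] assumes the sequence is always the fourth column. If that might not hold, one option is to combine both answers: split on whitespace, then keep only fields whose whole text has the sequence shape, checked with re.fullmatch. This is a sketch under the assumption that no other field consists solely of single lowercase letters joined by commas; the name sequences_by_shape is hypothetical.

```python
import re

# Whole-field shape: one lowercase letter, then zero or more ",letter"
SEQUENCE_FIELD = re.compile(r"[a-z](?:,[a-z])*")


def sequences_by_shape(lines):
    """Hypothetical helper: collect sequence fields regardless of column position."""
    result = []
    for line in lines:
        for field in line.split():
            # fullmatch rejects fields like "jjj:12,bbb:3" or "1999-02-28"
            if SEQUENCE_FIELD.fullmatch(field):
                result.append(field.split(","))
    return result


sample = [
    "E   1   1999-02-28  b,g,f    jjj:12,bbb:3,ddd:9,ggg:8,hhh:2",
    "A   2   1999-10-28  a,f,c,d  ccc:2,ddd:0,aaa:3,hhh:9",
]
print(sequences_by_shape(sample))  # [['b', 'g', 'f'], ['a', 'f', 'c', 'd']]
```

The trade-off: this no longer depends on column order, but it relies on the sequence field being the only one with that shape, while parts[3] relies only on the column position.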