
Regular expression to pull out variable string


I have the following list of strings in Python 2.7:

list_a = ['temp_52_head sensor, uploaded by TS',
          'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
          'FSL_pressure, uploaded by RS, no reported vacuum',
          'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
          'uploaded by KKP, Space 55',
          'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
          'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north']

Each list item contains the string

uploaded by SOMEONE

I need to extract SOMEONE.

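However, as you can see, SOMEONE

  1. changes from one list item to the next.
  2. can be 2 or 3 characters long (letters only, no digits).
  3. occurs at different positions in the string.
  4. "uploaded" also appears as "Uploaded".
  5. "uploaded" sometimes occurs before any comma.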

This is what I need to extract:

someone_names = ['TS','AB','RS','DY','KKP','HB','TFG']

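I was thinking of using a regular expression, but the problems I'm running into come from points 2 and 3 above.

Is there a way to pull these characters out of the list?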

You can apply the regular expression inside a list comprehension.

>>> import re
>>> list_a = [
...     'temp_52_head sensor, uploaded by TS',
...     'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
...     'FSL_pressure, uploaded by RS, no reported vacuum',
...     'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
...     'uploaded by KKP, Space 55',
...     'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
...     'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north'
... ]
>>> regex = re.compile(r'(?i)\buploaded\s*(?:by|to)\s*([a-z]{2,3})')
>>> names = [m.group(1) for x in list_a for m in [regex.search(x)] if m]
>>> names
['TS', 'AB', 'RS', 'DY', 'KKP', 'HB', 'TFG']
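One caveat about this pattern: nothing follows the capture group, so a longer tag would be silently cut down to its first three letters instead of being rejected. A minimal sketch that appends a word boundary (\b) so that only standalone tags of 2 or 3 letters match. The third sample string with the four-letter tag ABCD is made up purely for illustration.

```python
import re

# Same pattern as above, plus a trailing \b: a longer word is now rejected
# instead of being truncated to its first three letters.
regex = re.compile(r'(?i)\buploaded\s*(?:by|to)\s*([a-z]{2,3})\b')

samples = [
    'temp_52_head sensor, uploaded by TS',
    'no comment yesterday, Uploaded to TFG, level 1 escape',
    'hypothetical entry, uploaded by ABCD',  # made-up 4-letter tag
]

# None marks items where no 2- or 3-letter tag was found.
names = [m.group(1) if m else None for m in (regex.search(s) for s in samples)]
print(names)  # ['TS', 'TFG', None]
```

Without the \b, the last sample would yield 'ABC'.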

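Not a hand-written regular expression, but a more explicit approach (escaping the literal parts and concatenating them) could look like this: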

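import re
# Escape the literal part, then capture lazily up to the next comma or the end of the string
pattern = re.escape("uploaded ") + "(?:by|to) (.*?)(?:,|$)"
names = [m.group(1) for m in (re.search(pattern, s, re.IGNORECASE) for s in list_a) if m]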

A regular expression along these lines should do what you want, unless I'm missing something:

/[Uu]ploaded (?:by|to) ([A-Z]{2,3})\b/
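The /.../ delimiters are Perl/JavaScript notation; in Python the pattern goes into a raw string without them. Below is a minimal Python sketch of this approach. It is loosened so that it also accepts "Uploaded to" and a name at the very end of the string, since both occur in the sample data. Like the answer's pattern, it assumes the initials are always uppercase, because the character class is [A-Z] and no IGNORECASE flag is set.

```python
import re

list_a = [
    'temp_52_head sensor, uploaded by TS',
    'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
    'FSL_pressure, uploaded by RS, no reported vacuum',
    'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
    'uploaded by KKP, Space 55',
    'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
    'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north',
]

# [Uu] accepts either case of the first letter (no "|" inside the class);
# (?:by|to) covers "Uploaded to TFG"; \b instead of a literal comma lets
# the name sit at the end of the string, as in the first item.
pattern = re.compile(r'[Uu]ploaded (?:by|to) ([A-Z]{2,3})\b')

someone_names = [m.group(1) for m in map(pattern.search, list_a) if m]
print(someone_names)  # ['TS', 'AB', 'RS', 'DY', 'KKP', 'HB', 'TFG']
```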

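Alternatively, judging from your example, you could also split each string on commas, pick out the element that contains the substring "ploaded" (which sidesteps the uppercase/lowercase "u" issue), split that element on spaces, and take the last element of the resulting array.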
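The split-based idea can be sketched without any regular expression. This assumes, as in the sample data, that the name is the last word before the next comma (or the end of the string). The helper name uploader is hypothetical.

```python
list_a = [
    'temp_52_head sensor, uploaded by TS',
    'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
    'FSL_pressure, uploaded by RS, no reported vacuum',
    'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
    'uploaded by KKP, Space 55',
    'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
    'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north',
]


def uploader(item):
    """Hypothetical helper: return the last word of the comma-separated
    segment that contains "ploaded", or None if there is no such segment."""
    for part in item.split(','):
        # "ploaded" matches both "uploaded" and "Uploaded".
        if 'ploaded' in part:
            return part.split()[-1]
    return None


someone_names = [uploader(item) for item in list_a]
print(someone_names)  # ['TS', 'AB', 'RS', 'DY', 'KKP', 'HB', 'TFG']
```

This trades the flexibility of a pattern for readability. It breaks if anything other than the name follows "uploaded by" inside the same comma-separated segment.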

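This regular expression handles all of these issues, and it keeps working if the number of letters in the uploader's initials changes. It matches whether or not the two or three letters are followed by a comma or a single quote. It also captures all the data you're looking for: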

import re

# Group 4 captures the name; IGNORECASE covers both "uploaded" and "Uploaded"
m = re.compile(r'uploaded ((by)|(to)) ([a-z]+)', flags=re.IGNORECASE)

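You can then call the search() method of the compiled pattern object m on each string in the list to extract every name. In each match, group 4 holds the data you're looking for.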
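A minimal sketch of that loop is below. It spells the flag re.IGNORECASE. The groups are numbered by their opening parentheses: group 1 is ((by)|(to)), group 2 is (by), group 3 is (to), and group 4 is ([a-z]+), the name.

```python
import re

list_a = [
    'temp_52_head sensor, uploaded by TS',
    'crack in the left quadrant, uploaded by AB, Left in 2hr sunlight',
    'FSL_pressure, uploaded by RS, no reported vacuum',
    'art9943_mercury, Uploaded by DY, accelerated, hurst potential too low',
    'uploaded by KKP, Space 55',
    'avogadro reading level, uploaded by HB, started mini counter, pulled lever',
    'no comment yesterday, Uploaded to TFG, level 1 escape but temperature stable, pressure lever north',
]

m = re.compile(r'uploaded ((by)|(to)) ([a-z]+)', flags=re.IGNORECASE)

someone_names = []
for item in list_a:
    match = m.search(item)  # first match in this string, or None
    if match:
        someone_names.append(match.group(4))  # group 4 is the name

print(someone_names)  # ['TS', 'AB', 'RS', 'DY', 'KKP', 'HB', 'TFG']
```

search() stops at the first match in each string. If a single string could contain several "uploaded by" entries, m.finditer(item) would yield all of them.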


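Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.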

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM