How to find specific text in a string using regular expressions and Python

Question

Each cell in one column contains a text from which I want to extract some information. Each cell holds details about a car, and I need to pull specific text out of it. In my case, that is the fuel consumption and CO2 information.

The strings I have look like this:

cell 1 = 17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)

cell 2 = EZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.) ... and so on

So from cell 1 I need: 5,0 l/100km and 116 g CO2/km

From cell 2: 5,9 l/100km and 134 g CO2/km

I tried the following patterns, but none of them worked:

    pattern_z = re.compile("[a-z]+.?\s?[0-9]+\s?[a-z]?\s[A-Z]+")
    pattern_z = re.compile("^[ac]+\s?[CO]$")
    pattern_z = re.compile(r'[0-9]+.[g]?')

After each "pattern_z" variable I tried, I ran:

    co = pattern_z.search(i)
    cox = co.group()

But nothing worked.

I would appreciate any help.

Answer 1

Use

(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)

See the regex proof.

Explanation

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      ,                        ','
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    l/                       'l/'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    km                       'km'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    \s?                      whitespace (\n, \r, \t, \f, and " ")
                             (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    g                        'g'
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    CO                       'CO'
--------------------------------------------------------------------------------
    [₂2]                     any character of: '₂', '2'
--------------------------------------------------------------------------------
    /km                      '/km'
--------------------------------------------------------------------------------
  )                        end of \2
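
The breakdown above can also be carried inside the pattern itself. The following is a minimal sketch, assuming Python's `re.VERBOSE` flag; the name `CONSUMPTION_RE` and the shortened `sample` string are illustrative and not part of the original answer.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part carries its own comment.
# In verbose mode, unescaped whitespace and '#' comments inside the pattern are ignored.
CONSUMPTION_RE = re.compile(
    r"""
    (                # group 1: fuel consumption
        \d+          #   one or more digits
        (?:,\d+)?    #   optional decimal part with a comma
        \s*          #   optional whitespace
        l/\d+km      #   'l/', digits, 'km'
    )
    .*?              # as few characters as possible (no newlines)
    (                # group 2: CO2 emissions
        \d+          #   one or more digits
        \s?          #   optional single whitespace
        g\s*         #   'g', optional whitespace
        CO[₂2]       #   'CO' followed by '₂' or '2'
        /km          #   '/km'
    )
    """,
    re.VERBOSE,
)

# Shortened sample based on cell 2 from the question.
sample = "HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)"
result = CONSUMPTION_RE.findall(sample)
print(result)
```

The verbose form matches the same text as the one-line pattern; it only changes how the pattern is written down.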

Python code:

import re

regex = r"(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)"

test_str = "17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)\n\nEZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.) ... and so on"

print (re.findall(regex, test_str))

Results: [('5,0 l/100km', '116 g CO₂/km'), ('5,9 l/100km', '134 g CO₂/km')]
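
When the pattern is applied cell by cell, as in the question, `search()` returns `None` for a cell without a match, and calling `.group()` on that raises an `AttributeError`. Here is a minimal sketch of a guarded helper; the function name `extract_consumption` and the second sample cell are hypothetical, chosen only for illustration.

```python
import re

CONSUMPTION_RE = re.compile(r"(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)")

def extract_consumption(cell):
    """Return (fuel, co2) strings from one cell, or (None, None) if absent."""
    match = CONSUMPTION_RE.search(cell)
    if match is None:  # search() returns None when nothing matches
        return None, None
    return match.group(1), match.group(2)

cells = [
    "HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)",
    "EZ 10/2018, 12.900 km, Unfallfrei",  # hypothetical cell without the data
]
results = [extract_consumption(cell) for cell in cells]
for row in results:
    print(row)
```

Returning a pair of `None` values keeps the result shape the same for every cell, which makes it easier to write the values back into two columns.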

Answer 2

You might use

\b\d+(?:,\d+)?(?:\s*l/\d+|\s*g\s+CO₂/)km\b

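\b A word boundary
\d+(?:,\d+)? Match 1+ digits and an optional decimal part
(?: Non-capturing group
- \s*l/\d+ Match optional whitespace, l/ and 1+ digits
- | Or
- \s*g\s+CO₂/ Match optional whitespace, g, 1+ whitespace characters and CO₂/
) Close the non-capturing group
km\b Match km and a word boundary to prevent a partial match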
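
The effect of the trailing word boundary can be checked with a short sketch. Both input strings are hypothetical and not taken from the question's data.

```python
import re

pattern = re.compile(r"\b\d+(?:,\d+)?(?:\s*l/\d+|\s*g\s+CO₂/)km\b")

# Hypothetical inputs used only to show the boundary behaviour.
with_boundary = pattern.findall("ca. 5,0 l/100km (komb.)")      # 'km' is followed by a space
without_boundary = pattern.findall("ca. 5,0 l/100kmh (komb.)")  # 'km' runs into 'h'

print(with_boundary)
print(without_boundary)
```

The first call finds the value, while the second returns an empty list because `km\b` cannot match in the middle of `kmh`.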

Regex demo

import re

strings = [
    '17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)',
    'EZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)'
    ]
pattern = r"\b\d+(?:,\d+)?(?:\s*l/\d+|\s*g\s+CO₂/)km\b"
for s in strings:
    print(re.findall(pattern, s))

Output

['5,0 l/100km', '116 g CO₂/km']
['5,9 l/100km', '134 g CO₂/km']
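
If numeric values are needed rather than strings, the two quantities can be captured separately and converted. This is a sketch under stated assumptions: consumption is always given per 100 km, decimals use a comma, and `CO[₂2]` is used so that a plain `2` is accepted as well. The names `FUEL_RE`, `CO2_RE`, `to_float` and `parse_cell` are illustrative.

```python
import re

# Named groups capture only the numeric parts.
# Assumption: fuel consumption is always expressed per 100 km.
FUEL_RE = re.compile(r"\b(?P<value>\d+(?:,\d+)?)\s*l/100km\b")
CO2_RE = re.compile(r"\b(?P<value>\d+)\s*g\s+CO[₂2]/km\b")

def to_float(text):
    """Convert a German-style decimal such as '5,0' to a float."""
    return float(text.replace(",", "."))

def parse_cell(cell):
    """Return (litres per 100 km, grams of CO2 per km); None where a value is missing."""
    fuel = FUEL_RE.search(cell)
    co2 = CO2_RE.search(cell)
    return (
        to_float(fuel.group("value")) if fuel else None,
        int(co2.group("value")) if co2 else None,
    )

cell = "Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)"
parsed = parse_cell(cell)
print(parsed)
```

Searching for each quantity on its own means a cell that lists only one of the two values still yields that value.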

2 solutions

Solution 1
0 2021-11-10 23:30:11

Solution 2
0 2021-11-10 23:30:21