How to find a specific piece of text in a string with regex and python

Every cell of a column contains a text from which I want to extract some information. Each cell holds detailed information about a car, and I need to pull specific values out of it: in my case, the fuel consumption and the CO2 emission figures.

The strings I get look like this:

cell 1 = 17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)

cell 2 = EZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.) ... and so on

So from cell 1 I need: 5,0 l/100km and 116 g CO2/km

and from cell 2: 5,9 l/100km and 134 g CO2/km

I tried the following patterns, but none of them worked:

    import re

    # Patterns that were tried (none gave the expected result):
    pattern_z = re.compile(r"[a-z]+.?\s?[0-9]+\s?[a-z]?\s[A-Z]+")
    pattern_z = re.compile(r"^[ac]+\s?[CO]$")
    pattern_z = re.compile(r'[0-9]+.[g]?')
    

After each "pattern_z" assignment I tried:

    co = pattern_z.search(i)  # i: the text of one cell (loop variable)
    cox = co.group()          # raises AttributeError if search() returned None
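re.search() returns None when the pattern does not occur anywhere in the string, so calling .group() on the result fails with AttributeError for every cell that does not match. A minimal sketch of a None-safe loop, assuming the cells are available as a plain Python list; the sample strings, the name cells and the placeholder pattern are hypothetical:

```python
import re

# Hypothetical sample data: one string per cell of the column.
cells = [
    "HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)",
    "Elektro, Automatik, HU Neu",  # hypothetical cell without a fuel figure
]

# Placeholder pattern for the fuel figure (an assumption for this sketch).
pattern_z = re.compile(r"\d+(?:,\d+)?\s*l/100km")

results = []
for i in cells:
    co = pattern_z.search(i)
    # search() returns None when nothing matches, so check before .group()
    results.append(co.group() if co else None)

print(results)  # ['5,0 l/100km', None]
```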

but nothing worked.

I would appreciate any help.

Use

(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      ,                        ','
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    l/                       'l/'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    km                       'km'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    \s?                      whitespace (\n, \r, \t, \f, and " ")
                             (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    g                        'g'
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    CO                       'CO'
--------------------------------------------------------------------------------
    [₂2]                     any character of: '₂', '2'
--------------------------------------------------------------------------------
    /km                      '/km'
--------------------------------------------------------------------------------
  )                        end of \2
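Because the pattern has two capture groups, a single re.search() on one cell returns both values at once through Match.groups(). A minimal sketch that applies the pattern cell by cell, assuming the column is a plain Python list; the helper name extract_fuel_co2 is hypothetical:

```python
import re

# The two-group pattern from this answer: group 1 = fuel, group 2 = CO2.
FUEL_CO2 = re.compile(r"(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)")

def extract_fuel_co2(cell):
    """Hypothetical helper: (fuel, co2) for one cell, or (None, None)."""
    m = FUEL_CO2.search(cell)
    return m.groups() if m else (None, None)

column = [
    "17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)",
    "EZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)",
]

for cell in column:
    print(extract_fuel_co2(cell))
```

If the column lives in a pandas DataFrame, Series.str.extract with the same pattern should give one column per capture group instead of the explicit loop.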

Python code:

import re

regex = r"(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)"

test_str = "17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)\n\nEZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.) ... and so on"

print(re.findall(regex, test_str))

Results: [('5,0 l/100km', '116 g CO₂/km'), ('5,9 l/100km', '134 g CO₂/km')]
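The results above are still strings, and the fuel figure uses a German decimal comma. A minimal sketch that captures only the numeric parts and converts them, assuming float litres per 100 km and integer grams per km are wanted; NUMBERS and to_numbers are hypothetical names, and the pattern is a variant of the one in this answer with the capture groups moved onto the numbers:

```python
import re

# Hypothetical variant of the answer's pattern: capture only the numbers.
NUMBERS = re.compile(r"(\d+(?:,\d+)?)\s*l/\d+km.*?(\d+)\s?g\s*CO[₂2]/km")

def to_numbers(cell):
    """Hypothetical helper: (litres_per_100km, grams_co2_per_km) or None."""
    m = NUMBERS.search(cell)
    if m is None:
        return None
    fuel, co2 = m.groups()
    # German notation: ',' is the decimal separator, so swap it for '.'
    return float(fuel.replace(",", ".")), int(co2)

print(to_numbers("HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)"))  # (5.9, 134)
```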

You might use

\b\d+(?:,\d+)?(?:\s*l/\d+|\s*g\s+CO₂/)km\b
  • \b A word boundary
  • \d+(?:,\d+)? Match 1+ digits and an optional decimal part
  • (?: Non-capture group
    • \s*l/\d+ Match optional whitespace, l/ and 1+ digits
    • | Or
    • \s*g\s+CO₂/ Match optional whitespace, g, 1+ whitespace chars and CO₂/
  • ) Close non-capture group
  • km\b Match km and a word boundary to prevent a partial match
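The breakdown above maps one to one onto a re.VERBOSE version of the same pattern, where each piece can carry its own comment. A minimal sketch (the name PATTERN is hypothetical); in VERBOSE mode literal spaces inside the pattern are ignored, which is harmless here because every whitespace match already uses \s:

```python
import re

# Same pattern as this answer, written with re.VERBOSE and inline comments.
PATTERN = re.compile(
    r"""
    \b\d+(?:,\d+)?      # word boundary, 1+ digits, optional decimal part
    (?:                 # non-capture group
        \s*l/\d+        #   optional whitespace, l/ and 1+ digits
      |                 #   or
        \s*g\s+CO₂/     #   optional whitespace, g, 1+ whitespace chars, CO₂/
    )                   # close non-capture group
    km\b                # km and a word boundary to prevent a partial match
    """,
    re.VERBOSE,
)

cell = "HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)"
print(PATTERN.findall(cell))  # ['5,0 l/100km', '116 g CO₂/km']
```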

Regex demo

import re

strings = [
    '17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)',
    'EZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)'
    ]
pattern = r"\b\d+(?:,\d+)?(?:\s*l/\d+|\s*g\s+CO₂/)km\b"
for s in strings:
    print(re.findall(pattern, s))

Output

['5,0 l/100km', '116 g CO₂/km']
['5,9 l/100km', '134 g CO₂/km']
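Both answers look for the two figures together. If some cells may lack one of them, searching for each value separately keeps the other one available. A minimal sketch under that assumption, using named groups; FUEL, CO2, extract and the second sample cell are hypothetical:

```python
import re

# Hypothetical: one pattern per value, searched independently, so a cell
# lacking one figure still yields the other.
FUEL = re.compile(r"\b(?P<fuel>\d+(?:,\d+)?)\s*l/\d+km\b")
CO2 = re.compile(r"\b(?P<co2>\d+)\s*g\s+CO₂/km\b")

def extract(cell):
    """Hypothetical helper: dict with 'fuel' and 'co2'; missing values are None."""
    fuel = FUEL.search(cell)
    co2 = CO2.search(cell)
    return {
        "fuel": fuel.group("fuel") if fuel else None,
        "co2": co2.group("co2") if co2 else None,
    }

print(extract("ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)"))  # {'fuel': '5,9', 'co2': '134'}
print(extract("Elektro, ca. 0 g CO₂/km (komb.)"))  # {'fuel': None, 'co2': '0'}
```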


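Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.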

Related questions:
  • How to find a “piece” of word in a string with regex and using it in python?
  • How to find matching strings upto a specific string with regex in Python
  • How to find a line after or before a specific string in Python 3 using regex?
  • How to find specific regex in Python
  • How to find a string from text file by using regex python3?
  • regex + Python: How to find string with '?' in it?
  • How to find specific text with it's correspondence value from a text input using Regex and Python?
  • How to use pandas to open and read text file, and find a specific piece of data?
  • How can I get a piece of a string between two specific characters from a string in Python
  • How to match specific string in Python Regex?
 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM