How to find a specific piece of text in a string with regex and Python
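Each cell of one column contains a text from which I want to extract some information. Every cell holds the details of a car, and I need to pull specific pieces of text out of it. In my case these are the fuel consumption and the CO₂ emission.
The strings I get look like this:
cell 1 = 17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)
cell 2 = EZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.) ... and so on
So from cell 1 I need: 5,0 l/100 km and 116 g CO2/km
From cell 2: 5,9 l/100km and 134 g CO2/km
I tried the following code examples, but none of them worked: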
pattern_z = re.compile("[a-z]+.?\s?[0-9]+\s?[a-z]?\s[A-Z]+")
pattern_z = re.compile("^[ac]+\s?[CO]$")
pattern_z = re.compile(r'[0-9]+.[g]?')
After each of the "pattern_z" variants I tried, I ran:
co = pattern_z.search(i)
cox = co.group()
But nothing worked.
I would appreciate any help.
Use
(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)
See the regex proof.
Explanation
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
, ','
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
l/ 'l/'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
km 'km'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
--------------------------------------------------------------------------------
g 'g'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
CO 'CO'
--------------------------------------------------------------------------------
[₂2] any character of: '₂', '2'
--------------------------------------------------------------------------------
/km '/km'
--------------------------------------------------------------------------------
) end of \2
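The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming the standard `re.VERBOSE` flag, which lets the same pattern be written across several lines with inline comments. The `sample` string is a fragment of cell 2 from the question.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode, unescaped whitespace in the pattern is ignored and
# '#' starts a comment, so the pattern can document itself.
regex = re.compile(
    r"""
    (                   # group 1: fuel consumption
        \d+             #   one or more digits
        (?:,\d+)?       #   optional decimal part after a comma
        \s*l/\d+km      #   optional whitespace, then e.g. 'l/100km'
    )
    .*?                 # as few characters as possible in between
    (                   # group 2: CO2 emission
        \d+\s?g         #   digits, optional whitespace, 'g'
        \s*CO[₂2]/km    #   'CO₂/km' or 'CO2/km'
    )
    """,
    re.VERBOSE,
)

sample = "ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)"
print(regex.search(sample).groups())
# ('5,9 l/100km', '134 g CO₂/km')
```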
Python code:
import re
regex = r"(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)"
test_str = "17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)\n\nEZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.) ... and so on"
print(re.findall(regex, test_str))
Results: [('5,0 l/100km', '116 g CO₂/km'), ('5,9 l/100km', '134 g CO₂/km')]
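Since the question processes one cell at a time with `search()` followed by `group()`, note that `search()` returns `None` when a cell has no match, and calling `.group()` on `None` raises an `AttributeError`. Below is a rough sketch of a per-cell helper that guards against this. The function name `extract_fuel_and_co2` and the second sample cell are made up for illustration.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(r"(\d+(?:,\d+)?\s*l/\d+km).*?(\d+\s?g\s*CO[₂2]/km)")


def extract_fuel_and_co2(cell):
    """Return (fuel, co2) for one cell, or (None, None) if nothing matches."""
    match = PATTERN.search(cell)
    if match is None:
        # search() found nothing; avoid calling .groups() on None.
        return (None, None)
    return match.groups()


cells = [
    "17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, "
    "2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)",
    # Hypothetical cell without consumption data, for illustration only.
    "80 kW (109 PS)Limousine, Automatik",
]

for cell in cells:
    print(extract_fuel_and_co2(cell))
# ('5,0 l/100km', '116 g CO₂/km')
# (None, None)
```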
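You could use
\b\d+(?:,\d+)?(?:\s*l/\d+|\s*g\s+CO₂/)km\b
\b                a word boundary
\d+(?:,\d+)?      1+ digits and an optional decimal part
(?:               start of a non-capturing group
\s*l/\d+          optional whitespace, l/ and 1+ digits
|                 or
\s*g\s+CO₂/       optional whitespace, g, 1+ whitespace chars and CO₂/
)                 end of the non-capturing group
km\b              km and a word boundary to prevent partial matches
import re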
strings = [
    '17.160 km, 80 kW (109 PS)Limousine, Autogas (LPG), Automatik, HU Neu, 2/3 Türenca. 5,0 l/100km (komb.), ca. 116 g CO₂/km (komb.)',
    'EZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)'
]
pattern = r"\b\d+(?:,\d+)?(?:\s*l/\d+|\s*g\s+CO₂/)km\b"
for s in strings:
    # findall returns every non-overlapping match in the string
    print(re.findall(pattern, s))
Output
['5,0 l/100km', '116 g CO₂/km']
['5,9 l/100km', '134 g CO₂/km']
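If the values are needed as numbers rather than strings, the matches can be post-processed. This is only a sketch, and it assumes the leading number uses a German-style decimal comma as in the sample data. The helper name `leading_number` is hypothetical.

```python
import re

pattern = re.compile(r"\b\d+(?:,\d+)?(?:\s*l/\d+|\s*g\s+CO₂/)km\b")


def leading_number(match_text):
    """Convert the leading number of a match (e.g. '5,9') to a float."""
    number = re.match(r"\d+(?:,\d+)?", match_text).group()
    # Assumption: the comma is a decimal separator, as in the sample cells.
    return float(number.replace(",", "."))


cell = (
    "EZ 10/2018, 12.900 km, 80 kW (109 PS)Limousine, Unfallfrei, "
    "Hybrid (Benzin/Elektro), Halbautomatik, HU Neu, "
    "ca. 5,9 l/100km (komb.), ca. 134 g CO₂/km (komb.)"
)

values = [leading_number(m) for m in pattern.findall(cell)]
print(values)
# [5.9, 134.0]
```

The first value is the fuel consumption in l/100km and the second the CO₂ emission in g/km, in the order they appear in the cell.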