How can I extract substrings between two characters in Python?

I have a nasty string, converted from HTML, that looks like this:

<p><topic url="car-colours">Toyota Camry</topic> has <a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, <a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)

I want to extract the colour names from this string and put them in a list. I was thinking I could extract every substring between the ">" and "<" characters, since each colour is wrapped in them, but I don't know how.

My goal is a list that stores all the colours for the Toyota Camry, like: toyota_camry_colours = ["Dark Red", "Pearl White"]

Any ideas how I can do this? In bash I would use something like grep or awk, but I don't know the Python equivalent.
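Taken literally, the "text between > and <" idea is a one-line regex with the standard re module. The sketch below (variable names are illustrative only) shows why it needs narrowing: besides the colours, it also returns the model name, the years and the punctuation between tags.

```python
import re

html = ('<p><topic url="car-colours">Toyota Camry</topic> has '
        '<a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, '
        '<a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)')

# [^<]+ grabs one or more characters up to the next "<", i.e. the text
# between a ">" and the following "<". The final " (2016 - 2017)" is
# missed because no "<" follows it.
raw = re.findall(r">([^<]+)<", html)
# Drop surrounding whitespace and any segments that were only whitespace
segments = [s.strip() for s in raw if s.strip()]
print(segments)
# ['Toyota Camry', 'has', 'Dark Red', '(2020)', ',', 'Pearl White']
```

To keep only the colours you have to tie the match to the tags that wrap them, which is what the answers do.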

The BeautifulSoup module (package bs4) was designed to parse HTML, so let it find the <a> tags for you:

from bs4 import BeautifulSoup

# Named `html` rather than `str` to avoid shadowing the built-in str type
html = """\
<p><topic url="car-colours">Toyota Camry</topic> has <a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, <a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)"""

soup = BeautifulSoup(html, 'html.parser')
# Each colour name is the text inside an <a> tag
for link in soup.find_all('a'):
    print(link.text)

Output:

Dark Red
Pearl White
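If adding the third-party bs4 dependency is not an option, the standard library's html.parser.HTMLParser can do a similar job. This is a minimal sketch, assuming (as in the sample string) that every colour link's href starts with /colours/; the class name ColourLinkParser is made up for this example.

```python
from html.parser import HTMLParser


class ColourLinkParser(HTMLParser):
    """Collects the text of <a> tags whose href starts with /colours/."""

    def __init__(self):
        super().__init__()
        self.colours = []
        self._in_colour_link = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith("/colours/"):
            self._in_colour_link = True
            self.colours.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_colour_link = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_colour_link:
            self.colours[-1] += data


html = ('<p><topic url="car-colours">Toyota Camry</topic> has '
        '<a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, '
        '<a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)')

parser = ColourLinkParser()
parser.feed(html)
parser.close()
toyota_camry_colours = parser.colours
print(toyota_camry_colours)  # ['Dark Red', 'Pearl White']
```

Filtering on the href prefix keeps any unrelated links on the page out of the list, which the plain find_all('a') loop above would include.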

A simple regex also works: /colours/([\w-]+) captures the slug that follows /colours/ in each link's href.

import re

txt = '<p><topic url="car-colours">Toyota Camry</topic> has <a href="/colours/dark-red">Dark Red</a><span>' \
      ' (2020)</span>, <a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)'
colors = re.findall(r"/colours/([\w-]+)", txt)
print(colors)  # ['dark-red', 'pearl-white']

# Turn slugs like 'dark-red' into display names like 'Dark Red'
colors = [" ".join(word.capitalize() for word in color.split("-")) for color in colors]
print(colors)  # ['Dark Red', 'Pearl White']
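A variant of the same regex idea captures the visible link text directly, which is closer to the "between > and <" approach in the question and keeps the capitalisation used on the page instead of rebuilding it from the slug. This sketch assumes the links look exactly like the sample (href as the only attribute, double quotes); any other markup would need a looser pattern or a real parser.

```python
import re

txt = ('<p><topic url="car-colours">Toyota Camry</topic> has '
       '<a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, '
       '<a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)')

# Capture the text of every <a> whose href starts with /colours/.
# [^"]* stops at the closing quote of the href; [^<]* stops at </a>.
colour_link = re.compile(r'<a href="/colours/[^"]*">([^<]*)</a>')
toyota_camry_colours = colour_link.findall(txt)
print(toyota_camry_colours)  # ['Dark Red', 'Pearl White']
```

Rebuilding names from slugs only works while the slug and the displayed name agree word for word; reading the link text avoids that assumption.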


