How to delete a specific part of an HTML file in Python

Question

I am working on an HTML file that contains Item 1, Item 2, and Item 3. I want to delete all the text that comes after Item 2. I can find Item 2 in the file like this:

import re
# file is a string holding the HTML document
Item2 = re.compile(r'(Item&nbsp;2)', re.I | re.S)
Item2match = Item2.findall(file)

But I don't know how to delete the text that comes after it.

Answer 1

Simply use string methods to split the HTML text and take the first part; str.partition() makes this simple:

file.partition('Item&nbsp;2')[0]

If you want to keep the Item 2 text too, use:

''.join(file.partition('Item&nbsp;2')[:2])
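The snippets above assume that `file` is a string already holding the HTML. As a minimal sketch of the full round trip, assuming UTF-8 encoding and a hypothetical file name (`report.html`, created in a temporary directory only so the example runs on its own), the same expression can be applied to a file on disk:

```python
import tempfile
from pathlib import Path

# Hypothetical file, created only so the sketch is runnable end to end.
path = Path(tempfile.mkdtemp()) / 'report.html'
path.write_text('<p>Item&nbsp;1</p><p>Item&nbsp;2</p><p>Item&nbsp;3</p>',
                encoding='utf-8')

# Read the whole document into a string (the `file` variable used above).
file = path.read_text(encoding='utf-8')

# Keep everything up to and including the marker, then write it back.
truncated = ''.join(file.partition('Item&nbsp;2')[:2])
path.write_text(truncated, encoding='utf-8')

print(path.read_text(encoding='utf-8'))
```

Cutting at a text marker can leave HTML tags unclosed, as it does in this sample, so the result may need further cleanup depending on how it is used.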

There is no need to use a regular expression here; you are matching literal text. Regular expressions are a wonderfully expressive and powerful tool, but don't use them when there are simpler alternatives.

Demo:

>>> 'Some text with Item&nbsp;2 in it'.partition('Item&nbsp;2')[0]
'Some text with '
>>> ''.join('Some text with Item&nbsp;2 in it'.partition('Item&nbsp;2')[:2])
'Some text with Item&nbsp;2'
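The two variants in the demo can be wrapped in one small helper. This is only a sketch: the name `truncate_after` and the `keep_marker` parameter are hypothetical and not part of the answer. It also shows the behaviour when the marker is absent: `str.partition()` then returns the whole string followed by two empty strings, so nothing is removed.

```python
def truncate_after(text, marker, keep_marker=True):
    """Return text up to the first occurrence of marker.

    Hypothetical helper for illustration. If marker does not occur,
    text is returned unchanged.
    """
    head, sep, _tail = text.partition(marker)
    # sep is the marker itself when found, or '' when it is missing.
    return head + sep if keep_marker else head


html = '<p>Item&nbsp;1 intro</p><p>Item&nbsp;2 details</p><p>Item&nbsp;3 more</p>'
print(truncate_after(html, 'Item&nbsp;2'))
print(truncate_after(html, 'Item&nbsp;2', keep_marker=False))
print(truncate_after(html, 'Item&nbsp;9'))  # marker missing: text unchanged
```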

Answer 2

>>> re.sub(r'(?s)(?<=Item&nbsp;2)(.*)', '', file)

Example:

>>> s
'Item&nbsp;2...feiugeogherger\nfjweifjwef\nsfjioweiefjwe'
>>> re.sub(r'(?s)(?<=Item&nbsp;2)(.*)', '', s)
'Item&nbsp;2'
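The pattern in the question was compiled with re.I, which a plain string split does not reproduce. A minimal sketch, assuming case-insensitive matching is actually wanted: search for the escaped marker and slice the string at the end of the match. The function name `cut_after_match` is hypothetical.

```python
import re


def cut_after_match(text, marker):
    """Drop everything after the first case-insensitive match of marker.

    Hypothetical helper; marker is treated as literal text via re.escape.
    """
    match = re.search(re.escape(marker), text, flags=re.IGNORECASE)
    if match is None:
        return text  # marker not found: leave the text untouched
    return text[:match.end()]


s = 'intro ITEM&nbsp;2...feiugeogherger\nfjweifjwef\nsfjioweiefjwe'
print(cut_after_match(s, 'Item&nbsp;2'))
```

Slicing at `match.end()` avoids both the look-behind and the `(?s)` flag, because no part of the pattern has to match the text that is being discarded.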

Answer 1 · 0 · accepted · 2013-07-24 21:15:28

Answer 2 · 0 · 2013-07-24 21:15:55
