Remove list items that consist only of special characters

Question

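I extracted all the text from a PDF file as a string and converted it to a list by splitting it at every run of newlines, every run of two or more whitespace characters, and every dot (.). Now I want to exclude any value in the list that contains only special characters.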

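import re
import PyPDF2

# PyPDF2 1.x API as used in the question; PyPDF2 3.x renamed these to
# PdfReader, reader.pages[0] and page.extract_text().
pdfFileObj = open('Python String.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(0)
text = pageObj.extractText()
# Split on runs of newlines, on every '.', and on runs of 2+ whitespace characters.
z = re.split(r"\n+|[.]|\s{2,}", text)
# Remove the empty strings left behind by adjacent separators.
while "" in z:
    z.remove("")
print(z)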

My output is:

 ['split()', 'method in Python split a string into a list of strings after breaking the', 'given string by the specified separator', 'Syntax', ':', 'str', 'split(separator, maxsplit)', 'Parameters', ':', 'separator', ':', 'This is a delimiter', ' The string splits at this specified separator', ' If is', 'no', 't provided then any white space is a separator', 'maxsplit', ':', 'It is a number, which tells us to split the string into maximum of provi', 'ded number of times', ' If it is not provided then the default is', '-', '1 that means there', 'is no limit', 'Returns', ':', 'Returns a list of s', 'trings after breaking the given string by the specifie', 'd separator']

Some of these values, such as ':' and '-', contain only special characters, and I want to remove them. Thanks.
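The splitting step can be reproduced without a PDF. This minimal sketch runs the same re.split pattern on a short hard-coded string; the sample text is an assumption, made up to resemble what extractText() returned, not the real file content. It shows why entries like ':' survive: they are non-empty, so removing "" does not touch them.

```python
import re

# Hypothetical sample resembling the extracted PDF text (not the real file).
text = "Syntax\n:\nstr.split(separator, maxsplit)\n\nParameters\n:\n"

# Split on runs of newlines, on every '.', and on runs of 2+ whitespace chars.
z = re.split(r"\n+|[.]|\s{2,}", text)

# Drop the empty strings produced by a trailing separator.
z = [x for x in z if x != ""]

print(z)
# ['Syntax', ':', 'str', 'split(separator, maxsplit)', 'Parameters', ':']
```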

Answer 1

Use a regular expression to test whether each string contains any letter or digit, and keep only the strings that do.

import re

z = [x for x in z if re.search(r'[a-z\d]', x, flags=re.I)]
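Applied to part of the output from the question, the filter keeps every entry with at least one letter or digit and drops the bare ':' and '-' entries. The second comprehension is a regex-free alternative using str.isalnum(); it is not part of the original answer, and unlike [a-z] it also treats letters such as 'é' as alphanumeric.

```python
import re

# A few entries copied from the output shown in the question.
items = ['Syntax', ':', 'str', 'split(separator, maxsplit)', 'Parameters', ':',
         'separator', ':', ' If it is not provided then the default is', '-',
         '1 that means there']

# Answer 1: keep an entry if it contains an ASCII letter or a digit;
# re.I makes [a-z] match uppercase letters too.
kept = [x for x in items if re.search(r'[a-z\d]', x, flags=re.I)]

# Alternative (not from the answer): keep entries with any alphanumeric char.
kept_alnum = [x for x in items if any(c.isalnum() for c in x)]

print(kept)
# ['Syntax', 'str', 'split(separator, maxsplit)', 'Parameters', 'separator',
#  ' If it is not provided then the default is', '1 that means there']
```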

Answer 2

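Remove these special characters before converting the text to a list. Delete while("" in z): z.remove("") and add the following line right after the text variable is read: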

text = re.sub('(a|b|c)', '', text)

In this example, the special characters are a, b and c.
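The alternation (a|b|c) can also be written as a character class. This sketch assumes the unwanted characters are ':' and '-' (taken from the posted output) and uses hypothetical sample text; it strips them before splitting, then drops empty strings with a list comprehension. Note the trade-off: re.sub removes the characters everywhere, including inside words ('well-known' becomes 'wellknown'), whereas Answer 1 only drops entries that contain nothing else.

```python
import re

# Hypothetical sample text; the set of "special" characters is an assumption.
text = "Syntax\n:\nstr.split(separator, maxsplit)\n\nParameters\n:\nwell-known"

# Delete every ':' and '-' before splitting ('-' is escaped inside the class).
text = re.sub(r'[:\-]', '', text)

z = re.split(r"\n+|[.]|\s{2,}", text)
z = [x for x in z if x]  # drop any empty strings

print(z)
# ['Syntax', 'str', 'split(separator, maxsplit)', 'Parameters', 'wellknown']
```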

Answer 1: score 0, accepted, 2021-12-26 07:30:49
Answer 2: score 0, 2021-12-26 07:34:14