
Remove list values that contain only special characters

From a PDF file I extract all the text as a string and convert it into a list by splitting on newlines, on runs of two or more whitespace characters, and on every dot (.). Now, if a value in the list consists only of special characters, I want that value to be excluded.

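import re
import PyPDF2

pdfFileObj = open('Python String.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(0)
text = pageObj.extractText()
# Split on newlines, dots, and runs of two or more whitespace characters
z = re.split(r"\n+|[.]|\s{2,}", text)
# Drop the empty strings left behind by the split
while("" in z):
    z.remove("")
print(z)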

My output is:

 ['split()', 'method in Python split a string into a list of strings after breaking the', 'given string by the specified separator', 'Syntax', ':', 'str', 'split(separator, maxsplit)', 'Parameters', ':', 'separator', ':', 'This is a delimiter', ' The string splits at this specified separator', ' If is', 'no', 't provided then any white space is a separator', 'maxsplit', ':', 'It is a number, which tells us to split the string into maximum of provi', 'ded number of times', ' If it is not provided then the default is', '-', '1 that means there', 'is no limit', 'Returns', ':', 'Returns a list of s', 'trings after breaking the given string by the specifie', 'd separator']

Some of these values contain only special characters, and I want to remove those. Thanks.

Use a regular expression that tests whether a string contains any letters or digits, and keep only the values that do.

import re

z = [x for x in z if re.search(r'[a-z\d]', x, flags=re.I)]
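To make the effect of that filter concrete, here is a minimal sketch that applies it to a few values copied from the output in the question. The helper name `has_alnum` and the shortened sample list are illustrative assumptions, not part of the original answer.

```python
import re

# A few values copied from the output shown in the question.
z = ['split()', 'Syntax', ':', 'str', '-', '1 that means there', ' If is']

def has_alnum(value):
    """Return True if value contains a letter a-z (any case) or a digit."""
    return re.search(r'[a-z\d]', value, flags=re.I) is not None

# Keep only the values that contain at least one letter or digit.
filtered = [x for x in z if has_alnum(x)]
print(filtered)
# ['split()', 'Syntax', 'str', '1 that means there', ' If is']
```

Values such as ':' and '-' are dropped, while 'split()' survives because it still contains letters. If the text may contain non-English letters, a check like `any(ch.isalnum() for ch in x)` might be a better fit than the `[a-z\d]` class.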

Alternatively, remove those special characters before converting the text to a list. Remove the while("" in z): z.remove("") loop and add the following line after the text variable is read:

text = re.sub('(a|b|c)', '', text)

In this example, the special characters are a, b, and c.
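As a rough sketch of how this second approach might look for the question's data, the following assumes that ':' and '-' are the characters to strip and uses a short made-up string in place of the text extracted from the PDF. It also filters out blank values after the split, since splitting can still produce empty strings.

```python
import re

# Hypothetical stand-in for the text extracted from the PDF.
text = "Syntax\n:\nstr.split(separator, maxsplit)\nmaxsplit : the default is\n-\n1"

# Assumption: ':' and '-' are the special characters to strip.
cleaned = re.sub(r'[:\-]', '', text)

# Same split pattern as in the question.
parts = re.split(r"\n+|[.]|\s{2,}", cleaned)

# Drop values that are empty or whitespace-only after the split.
result = [p for p in parts if p.strip()]
print(result)
# ['Syntax', 'str', 'split(separator, maxsplit)', 'maxsplit', 'the default is', '1']
```

One trade-off to keep in mind: stripping characters up front also removes them from inside otherwise meaningful values, so a hyphenated word or a negative number such as -1 would lose its '-'. Filtering the list afterwards leaves such values untouched.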

