I am processing 500 GB of data and need to keep only the lines that contain letters (alphabetic characters). Can you suggest an efficient/faster way?
Data is like:
%^^%^^%^^%
This is a valid
*%^%^ Valid
This is not a valid one
The output should be:
Data is like:
This is a valid
*%^%^ Valid
This is not a valid one
I tried isalpha(), but the issue is that it would also remove the line *%^%^ Valid.
Actually, somehow this code is not working either:
```python
if line.isalpha() == 'True':  # compares a bool with the string 'True', so this is never true
    print(line)
```
This is not working.
Can I use regular expressions? I read somewhere that they are slow; is that true?
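Whether a regex is slower than isalpha() for this job is easiest to settle by measuring on a sample of the real data. Below is a minimal timing sketch, assuming only ASCII letters count as "characters"; SAMPLE, with_regex, with_isalpha and compare are hypothetical names, and the sample lines are made up from the example above.

```python
import re
import timeit

# Hypothetical sample; replace with a slice of the real data.
SAMPLE = ["%^^%^^%^^%", "This is a valid", "*%^%^ Valid", "This is not a valid one"] * 10_000

# Assumption: only ASCII letters count as "characters".
HAS_LETTER = re.compile(r"[A-Za-z]")


def with_regex(lines):
    """Keep lines containing at least one ASCII letter (regex search)."""
    return [line for line in lines if HAS_LETTER.search(line)]


def with_isalpha(lines):
    """Keep lines containing at least one alphabetic character (str.isalpha)."""
    return [line for line in lines if any(map(str.isalpha, line))]


def compare(lines, repeat=5):
    """Return the best-of-`repeat` wall time in seconds for each approach."""
    # Both filters agree on ASCII-only input such as this sample.
    assert with_regex(lines) == with_isalpha(lines)
    return {
        "regex": min(timeit.repeat(lambda: with_regex(lines), number=1, repeat=repeat)),
        "isalpha": min(timeit.repeat(lambda: with_isalpha(lines), number=1, repeat=repeat)),
    }


if __name__ == "__main__":
    print(compare(SAMPLE))
```

The numbers depend on the machine, the Python version and the shape of the lines, so no result is claimed here; for a 500 GB job, disk I/O may dominate either way.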
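Use a regex, for example (this pattern matches the lines made up only of %, ^ and | characters, i.e. the lines to discard; inside the brackets | is a literal pipe, not alternation):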
```python
>>> import re
>>>
>>> pattern = re.compile(r'\A[%|\^]*$')
>>>
>>> pattern.match('%^ Text') # no match
>>> pattern.match('%^^%^') # match
<re.Match object; span=(0, 5), match='%^^%^'>
```
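The pattern above identifies lines to drop. For the 500 GB case it can be simpler to search for a letter in the lines to keep and stream the file instead of loading it. A minimal sketch, assuming ASCII letters are what counts and reading in binary mode so no decoding is needed; HAS_LETTER and filter_stream are hypothetical names.

```python
import re

# Assumption: "characters" means ASCII letters; non-ASCII letters are ignored.
# A bytes pattern lets us skip decoding every line.
HAS_LETTER = re.compile(rb"[A-Za-z]")


def filter_stream(src, dst):
    """Copy lines from binary stream `src` to `dst` if they contain a letter.

    Returns the number of lines kept. Lines are read one at a time, so memory
    use does not grow with the size of the input.
    """
    kept = 0
    for line in src:
        if HAS_LETTER.search(line):
            dst.write(line)
            kept += 1
    return kept
```

Usage with real files (the file names are placeholders): open the input with open("input.txt", "rb") and the output with open("output.txt", "wb"), then call filter_stream(src, dst).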
You are not using isalpha correctly: it returns True only when all characters in the string are alphabetic (and the string is not empty).
You could try using any and map to make sure at least one character in the line is alphabetic.
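A small illustration of the difference between isalpha() on the whole line and any(map(str.isalpha, line)), plus why the comparison with 'True' in the question never succeeds; has_letter is a hypothetical helper name.

```python
def has_letter(line: str) -> bool:
    """True if at least one character in `line` is alphabetic."""
    return any(map(str.isalpha, line))


# isalpha() is True only when *every* character is alphabetic,
# so spaces and symbols make it False.
print("Valid".isalpha())          # True
print("*%^%^ Valid".isalpha())    # False: '*', '%', '^' and ' ' are not letters
print(has_letter("*%^%^ Valid"))  # True: 'V', 'a', 'l', ... are letters
print(has_letter("%^^%^^%^^%"))   # False

# isalpha() returns a bool, so comparing it with the string 'True' is always False.
print("abc".isalpha() == 'True')  # False
```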
txt = """
Data is like:
%^^%^^%^^%
This is a valid
*%^%^ Valid
This is not a valid one
"""
for line in txt.split("\n"):
if any(map(str.isalpha, line)):
print(line)
This prints:
Data is like:
This is a valid
*%^%^ Valid
This is not a valid one
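For the real 500 GB input, the same any/map test can be applied while streaming the file line by line rather than holding it in a string and calling split(). A minimal sketch; filter_letter_lines is a hypothetical name, and UTF-8 is an assumed encoding. errors="surrogateescape" passes undecodable bytes through unchanged, and newline="" keeps the original line endings.

```python
def filter_letter_lines(src_path, dst_path, encoding="utf-8"):
    """Write to dst_path only the lines of src_path containing a letter.

    The input is iterated lazily, so memory stays flat regardless of file size.
    The encoding is an assumption; surrogateescape round-trips bytes that do
    not decode, so they are neither lost nor altered in the output.
    """
    with open(src_path, encoding=encoding, errors="surrogateescape", newline="") as src, \
         open(dst_path, "w", encoding=encoding, errors="surrogateescape", newline="") as dst:
        dst.writelines(line for line in src if any(map(str.isalpha, line)))
```

Unlike the ASCII regex, str.isalpha also accepts non-ASCII letters such as "é", which may or may not be what you want.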