I am very new to Python. I have a text file containing strings that I am logging to a CSV file using pandas.
I want to remove the strings that start with specific characters from the file. Please let me know how I can do this.
The text file contains strings like this:
"<S t='a' s='3'/>SetRTEConfig,Done,<S t='s' c='IgnoreCase' s='5'/>{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,"v1.22.000",Passed....
and so on. I need to delete everything that starts with <S t='a' and keep only the data that starts with <S t='s'.
You can do something along these lines:
goodlines = []
with open('textfile.txt', 'r') as fp:
    for line in fp:
        # keep only the lines that begin with the <S t='s' tag
        if line.startswith("<S t='s'"):
            goodlines.append(line)
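The sample line in the question begins with a double quote, so a strict prefix check may skip lines you want. As a minimal sketch, assuming leading whitespace and quote characters should be ignored, the filter could be wrapped in a function. The names `PREFIX` and `keep_marked_lines` and the sample lines are made up for illustration:

```python
import io

PREFIX = "<S t='s'"  # marker taken from the question


def keep_marked_lines(lines, prefix=PREFIX):
    """Return the lines that start with prefix, ignoring leading spaces, tabs and quotes."""
    kept = []
    for line in lines:
        # Assumption: a leading '"' or whitespace is not part of the marker.
        stripped = line.lstrip(' \t"')
        if stripped.startswith(prefix):
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical sample data standing in for the real text file.
sample = io.StringIO(
    "<S t='a' s='3'/>SetRTEConfig,Done,\n"
    "<S t='s' c='IgnoreCase' s='5'/>{GetMcbVersion},Passed\n"
    "\"<S t='s' s='5'/>quoted,line\n"
)
print(keep_marked_lines(sample))
```

Any iterable of lines works here, so an open file object can be passed in place of the `StringIO` sample.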
If you need more advanced pattern matching, you can use regular expressions.
Solution using regex:
import re
contents = "<S t='a' s='3'/>SetRTEConfig,Done,<S t='s' c='IgnoreCase'" \
" s='5'/>{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,\"v1.22.000\",Passed...."
pattern = re.compile(r'<S t=\'s\'(.*)')
result = pattern.findall(contents)
print(result)
OUTPUT:
[' c=\'IgnoreCase\' s=\'5\'/>{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,"v1.22.000",Passed....']
I think the plain Python solution here is much better than using a regex, but since you asked for it, a regex solution could look like this:
>>> import re
>>> s = ''' "<S t='a' s='3'/>SetRTEConfig,Done,<S t='s' c='IgnoreCase' s='5'/>{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,"v1.22.000",Passed....'''
>>> # assuming you don't want the tag but only the text after it
>>> re.findall(r"<S t='s'.+?>([^<]+)", s)
['{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,"v1.22.000",Passed....']
Or, directly into pandas:
>>> from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
>>> import pandas as pd; import re
>>> # the result looks odd because the sample has a single row, which read_csv consumes as the header
>>> pd.read_csv(StringIO('\n'.join(re.findall(r"<S t='s'.+?>([^<]+)", s))), sep=",")
Empty DataFrame
Columns: [{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion, v1.22.000, Passed....]
Index: []
However, the above is quite crude: it won't work, for example, if there is a "<" in the CSV data. To reiterate, I would use the plain Python example. It is much simpler to use and more flexible when you run into more conditions.
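If several tags share one line, as in the sample from the question, one option that tolerates a stray "<" in the data is to split on whole tags instead of stopping at the next "<". This is only a sketch under assumptions: every tag has the form <S t='X' .../> with a single-character type, and no attribute value contains ">". The names `TAG` and `segments_after` are hypothetical.

```python
import re

# Matches a whole <S .../> tag and captures the value of its t attribute.
# Assumption: the t value is a single word character and no attribute contains '>'.
TAG = re.compile(r"<S t='(\w)'[^>]*/>")


def segments_after(text, wanted="s"):
    """Return the text segments that follow each tag whose t attribute equals wanted."""
    # With one capturing group, split() yields:
    # [text_before_first_tag, type1, body1, type2, body2, ...]
    parts = TAG.split(text)
    return [body for kind, body in zip(parts[1::2], parts[2::2]) if kind == wanted]


sample = (
    "<S t='a' s='3'/>SetRTEConfig,Done,"
    "<S t='s' c='IgnoreCase' s='5'/>{LogUutCurrentVersions}{GetMcbVersion}"
    "LogAndReportLastVersion,\"v1.22.000\",Passed...."
)
print(segments_after(sample))
```

Because the split happens only on complete tags, a bare "<" inside a field stays part of that field.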