
I want to delete certain strings from a text file in Python

I am very new to Python. I have a text file containing strings that I am logging to a CSV file using pandas.

I want to remove the strings that start with specific characters from the file. Please let me know how I can do it.

The text file contains strings something like this:

 "<S t='a' s='3'/>SetRTEConfig,Done,<S t='s' c='IgnoreCase' s='5'/>{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,"v1.22.000",Passed....

and so on. I need to delete everything that starts with <S t='a' and keep only the data that starts with <S t='s'.

You can do something along these lines:

goodlines = []
with open('textfile.txt', 'r') as fp:
    # iterate over the file line by line
    for line in fp:
        # keep only the lines that begin with the <S t='s' tag
        if line.startswith("<S t='s'"):
            goodlines.append(line)
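
To actually drop the unwanted lines from the file, the same check can write the kept lines to a new file. This is only a minimal sketch: the function name keep_tagged_lines and the file paths are hypothetical, and it assumes every record starts on its own line.

```python
def keep_tagged_lines(src_path, dst_path, prefix="<S t='s'"):
    """Copy only the lines of src_path that start with prefix into dst_path.

    Returns the number of lines kept. Assumes one record per line.
    """
    kept = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            # lstrip() tolerates leading whitespace before the tag
            if line.lstrip().startswith(prefix):
                dst.write(line)
                kept += 1
    return kept
```

Calling keep_tagged_lines('textfile.txt', 'filtered.txt') would leave the original file untouched and put the <S t='s' lines into filtered.txt (both file names are placeholders).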

If you need more advanced pattern matching, you can use regular expressions.

Solution using regex:

import re
contents = "<S t='a' s='3'/>SetRTEConfig,Done,<S t='s' c='IgnoreCase'" \
           " s='5'/>{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,\"v1.22.000\",Passed...."

pattern = re.compile(r'<S t=\'s\'(.*)')
result = pattern.findall(contents)
print(result)

OUTPUT:

[' c=\'IgnoreCase\' s=\'5\'/>{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,"v1.22.000",Passed....']

I think the plain Python solution here is much better than using a regex, but since you asked for it, you could do something like this for a regex solution:

>>> import re
>>> s = ''' "<S t='a' s='3'/>SetRTEConfig,Done,<S t='s' c='IgnoreCase' s='5'/>{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,"v1.22.000",Passed....'''
>>> # assuming you don't want the tag itself, only the text after it
>>> re.findall(r"<S t='s'.+?>([^<]+)", s)
['{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion,"v1.22.000",Passed....']

Or, directly into pandas:

>>> from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
>>> import pandas as pd
>>> import re
>>> # this looks off because the sample has no header row, so pandas uses its only row as the column names
>>> pd.read_csv(StringIO('\n'.join(re.findall(r"<S t='s'.+?>([^<]+)", s))), sep=",")
Empty DataFrame
Columns: [{LogUutCurrentVersions}{GetMcbVersion}LogAndReportLastVersion, v1.22.000, Passed....]
Index: []

However, the above is quite crude: it won't work, for example, if there is a "<" in the CSV data. To reiterate, I would use the plain Python example. It is much simpler and will be more flexible as you run into more conditions.
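
One way to avoid tripping over a stray "<" is to split on the marker tag itself rather than on any "<". The following is a rough sketch only: the tag pattern is guessed from the single sample line in the question, and the names TAG and segments_of_type are made up for illustration.

```python
import re

# Matches one marker tag such as <S t='a' s='3'/> and captures its t value.
# The tag grammar here is an assumption based on the sample line.
TAG = re.compile(r"<S t='(\w)'[^>]*/>")

def segments_of_type(text, wanted="s"):
    """Return the text that follows each <S t='wanted' .../> tag."""
    parts = TAG.split(text)
    # With one capturing group, split() yields:
    # [text_before_first_tag, t1, body1, t2, body2, ...]
    kinds = parts[1::2]
    bodies = parts[2::2]
    return [body for kind, body in zip(kinds, bodies) if kind == wanted]

# Shortened, made-up variant of the sample with a "<" inside the data
sample = ("<S t='a' s='3'/>SetRTEConfig,Done,"
          "<S t='s' c='IgnoreCase' s='5'/>a<b,\"v1.22.000\",Passed")
print(segments_of_type(sample))
```

Because only a full <S t='...' .../> tag ends a segment, the "<" inside a<b stays part of the data.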
