I have an XML string (split into a list of lines) and I am looking for a specific string. I want to do something only if the next line in the list also contains that same string.
XML (called diff):
<result type="MLST" value="96">
<result_data type="profile" value="43,47,49,49,41,15,3"/>
<result_data type="QC_minimum_consensus_depth" value="7"/>
<result_data type="QC_max_percentage_non_consensus_base" value="10.0"/>
<result_data type="QC_percentage_coverage" value="100"/>
<result_data type="QC_minimum_consensus_depth_for_all_loci" value="7,17,27,10,25,18,22" diff:update-attr="value:7,17,27,10,24,18,22"/>
<result_data type="QC_complete_pileup" value="TRUE"/>
<result_data type="QC_mean_consensus_depth" value="17.67"/>
<result_data type="QC_max_percentage_non_consensus_base_for_all_loci" value="10.0, 6.25, 3.45, 9.09, 5.88, 5.26, 5.41"/>
<result_data type="QC_mean_consensus_depth_for_all_loci" value="17.67, 32.49, 34.09, 23.44, 35.57, 29.02, 39.08" diff:update-attr="value:17.67, 32.49, 34.09, 23.44, 34.24, 29.02, 39.08"/>
<result_data type="QC_traffic_light" value="GREEN"/>
<result_data diff:insert="" type="predicted_serotype" diff:add-attr="type;value" value="('Schwarzengrund (Achtman)', 168), ('Schwarzengrund (PHE)', 83), ('Blockley (Achtman)', 1), ('Uppsala (Achtman)', 1), ('Oslo (Achtman)', 1), ('Schwarzengru (Achtman)', 1), ('Iv Rough:Z4,Z32:- (Achtman)', 1)"/>
<result_data type="predicted_serotype" value="('Schwarzengrund (PHE)', 13)" diff:delete=""/>
</result>
<gastro_prelim_st reason="not novel" success="false">
<type st="96"/>
</gastro_prelim_st>
Code:
diff_list = diff.split("\n")
for n, line in enumerate(diff_list):
    if "predicted_serotype" in line:
        print(line)
What I want is: if "predicted_serotype" is found in a line and the next line also contains "predicted_serotype", then print the line.
Appreciate any help.
I copied your XML content into a text file and then read it back as a string:
path = "path/tmp.txt"
# content will hold the whole file as a single string
with open(path, "r") as f:
    content = f.read()
# diff_list is a list of lines
diff_list = content.split("\n")
for n, line in enumerate(diff_list):
    # check that n + 1 is in range before looking at the next line
    if n + 1 < len(diff_list) and "predicted_serotype" in line and "predicted_serotype" in diff_list[n + 1]:
        print(line)
Basically, diff_list is a list, so you can use all sorts of indexing operations on it.
Also, as others mentioned in the comments, make sure n+1 is not out of range: on the last line of the list, diff_list[n+1] would raise an IndexError.
UPDATE: @bruno desthuilliers suggested:
for line, next_line in zip(diff_list, diff_list[1:]):
    if "predicted_serotype" in line and "predicted_serotype" in next_line:
        print(line)
This way you avoid the index error, because zip stops at the end of the shorter sequence.
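The pairwise comparison can also be wrapped in a small reusable helper. The following is only a sketch: the function name lines_followed_by_match and the shortened sample list are hypothetical and not part of the original answer.

```python
def lines_followed_by_match(lines, needle):
    """Return each line that contains `needle` and whose next line does too."""
    return [
        line
        for line, next_line in zip(lines, lines[1:])
        if needle in line and needle in next_line
    ]


# Hypothetical sample, shortened from the XML in the question.
sample = [
    '<result_data type="QC_traffic_light" value="GREEN"/>',
    '<result_data diff:insert="" type="predicted_serotype" value="a"/>',
    '<result_data type="predicted_serotype" value="b" diff:delete=""/>',
    '</result>',
]

for line in lines_followed_by_match(sample, "predicted_serotype"):
    print(line)
```

Because zip pairs each line with its successor and stops at the shorter sequence, an empty list or a single-line list simply yields no results.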
Though my answer does not literally address the question, given its context I would suggest using a regular expression, as below.
import re
diff = "Your xml text"
# raw string, so \s reaches the regex engine unchanged; "/" needs no escaping
regx = re.compile(r"(<.*predicted_serotype.*/>)\s.*predicted_serotype.*")
matches = regx.findall(diff)
for match in matches:
    print(match)
Here, the regex matches two consecutive lines containing the string "predicted_serotype", but regx.findall returns only the capture group inside the parentheses, that is, the first line of each pair.
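One difference from the line-by-line approaches is worth keeping in mind: findall returns non-overlapping matches, so when three or more consecutive lines contain the string, the regex reports fewer lines than a pairwise comparison would. The sketch below illustrates this on a made-up three-line input; the variable names and the sample data are assumptions for illustration only.

```python
import re

# Raw string so the regex escapes reach `re` unchanged.
PAIR = re.compile(r"(<.*predicted_serotype.*/>)\s.*predicted_serotype.*")

# Hypothetical input: three consecutive matching lines.
three_in_a_row = "\n".join([
    '<result_data type="predicted_serotype" value="a"/>',
    '<result_data type="predicted_serotype" value="b"/>',
    '<result_data type="predicted_serotype" value="c"/>',
])

# The regex consumes lines 1 and 2 in one match, so the pair (2, 3) is not seen.
regex_hits = PAIR.findall(three_in_a_row)

# The pairwise comparison looks at every adjacent pair, including (2, 3).
lines = three_in_a_row.split("\n")
zip_hits = [
    line
    for line, next_line in zip(lines, lines[1:])
    if "predicted_serotype" in line and "predicted_serotype" in next_line
]

print(len(regex_hits), len(zip_hits))
```

For the XML in the question, where only two adjacent lines contain the string, both approaches report the same single line.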