
Python. Replace every second “**” with </b> in a string

I am trying to convert a Markdown string to HTML and am struggling to find a way to replace every second occurrence of ** with </b>.

Basically, I want to write a function that takes a Markdown string as input and returns an HTML string.

input: ** Hello!** everyone! **This should be HTML string** ** Hello!** everyone! **This should be HTML string**

output: ** Hello!</b> everyone! **This should be HTML string</b> ** Hello!</b> everyone! **This should be HTML string</b>

As a second step, I plan to use the str.replace() function to substitute the remaining ** with <b>.

I would be grateful for any suggestions!

I would implement a counter for the "**" substring: walk through the string, look for a * and check whether another * follows it, and count each such pair. Then, whenever counter % 2 == 0, replace that occurrence.
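The counter idea above could be sketched roughly as follows. The function name close_every_second and its parameters are made up for illustration, and the sketch assumes the markers are balanced and never nested.

```python
def close_every_second(text, marker="**", closing="</b>"):
    """Replace every second occurrence of `marker` with `closing`."""
    parts = []
    counter = 0
    pos = 0
    while pos < len(text):
        if text.startswith(marker, pos):
            counter += 1
            # Even-numbered occurrences (2nd, 4th, ...) become the closing tag
            parts.append(closing if counter % 2 == 0 else marker)
            pos += len(marker)
        else:
            parts.append(text[pos])
            pos += 1
    return "".join(parts)


step1 = close_every_second("** Hello!** everyone! **This should be HTML string**")
print(step1)
# ** Hello!</b> everyone! **This should be HTML string</b>

# Second step from the question: the remaining ** become <b>
print(step1.replace("**", "<b>"))
# <b> Hello!</b> everyone! <b>This should be HTML string</b>
```

Collecting the pieces in a list and joining them once at the end avoids rebuilding the string on every replacement.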

Hope this helps; I am new here.

Using a Markdown library is the way to go, but if you want to do this yourself without a third-party library, regular expressions will make your job easier. They let you find and replace text that matches a pattern. In your case, you will want to start by searching for the regex pattern

\*\*(.*?)\*\*

Asterisks have to be escaped, so \*\* looks for two literal asterisks.

This is followed by a parenthesized group. The parentheses tell the regex engine to capture the contents inside them so we can reference them later on.

Inside the group, .* matches an unlimited number of characters: . is any character and * means any number of repetitions. The ? after it makes the match non-greedy, so it stops as soon as possible, at the next **.

And replacing it with

<b>\1</b>

The \1 references what was captured by the parentheses above. If there were more groups, you would reference the next ones with \2, then \3, and so on.

import re

# re.sub does the find-and-replace; raw strings keep the backslashes intact
replaced_str = re.sub(r'\*\*(.*?)\*\*', r'<b>\1</b>', your_string)
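To see why the non-greedy ? matters, here is a small comparison on a made-up sample string (the variable names are only for illustration). It uses the same pattern with and without the ?.

```python
import re

text = "**one** and **two**"

# Non-greedy: each pair of ** is matched separately
lazy = re.sub(r"\*\*(.*?)\*\*", r"<b>\1</b>", text)
print(lazy)
# <b>one</b> and <b>two</b>

# Greedy: .* runs to the last **, so the whole string becomes one match
greedy = re.sub(r"\*\*(.*)\*\*", r"<b>\1</b>", text)
print(greedy)
# <b>one** and **two</b>
```

Note that . does not match newlines by default, so bold text spanning several lines would need the re.DOTALL flag.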

Alternatively, you could search for the position of the first occurrence of **, then look for the next occurrence of ** after it, and use those two positions to do your replacement.

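s = '** Hello!** everyone! **This should be an HTML string**'
while True:
    # Index of the opening '**', or -1 if none is left
    pos1 = s.find('**')
    # Index of the matching closing '**', searched after the opening one
    pos2 = s.find('**', pos1 + 2) if pos1 >= 0 else -1

    if pos2 < 0:
        # No complete pair left
        break
    s = s[:pos1] + '<b>' + s[pos1+2:pos2] + '</b>' + s[pos2+2:]

print(s)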

Here's a solution with regex

import re
text = "** Hello!** everyone! **This should be HTML string**"

p = re.compile(r"\*\*(.*?)\*\*")

result = p.sub(r"<b>\1</b>", text)

"""
result: '<b> Hello!</b> everyone! <b>This should be HTML string</b>'
"""

"planning to use str.replace()"

Then you can make use of the optional third argument this function accepts, the maximum number of replacements, in the following way:

txt = '** Hello!** everyone! **This should be HTML string**'
closing = False
while '**' in txt:
    txt = txt.replace('**','</b>' if closing else '<b>',1)
    closing = not closing
print(txt)

Output:

<b> Hello!</b> everyone! <b>This should be HTML string</b>

Nonetheless, I suggest using existing tools for handling Markdown if possible.

Since you are new to Stack Overflow, I would suggest always researching online first and trying to come up with a solution yourself. If you still can't solve it, you can always ask here.

It can easily be done like this

    test_str = '** Hello!** everyone! **This should be HTML string**'
    pattern = '**'
    # Start index of every '**' in the original string
    res = [i for i in range(len(test_str)) if test_str.startswith(pattern, i)]
    # Replace from right to left so the earlier indices stay valid
    for i, pos in reversed(list(enumerate(res))):
        tag = '<b>' if i % 2 == 0 else '</b>'
        test_str = test_str[:pos] + tag + test_str[pos + len(pattern):]
    print(test_str)

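Following what Faruk Imamovic proposed earlier, I think a single pass over the string is the most efficient solution to the problem.

def bold_to_html(text):
    opening = True  # the next '**' opens a tag when True, closes it when False
    pos = 0
    res = []
    while pos < len(text):
        if text[pos] == "*" and pos < len(text)-1 and text[pos+1] == "*":
            res.append('<b>' if opening else '</b>')
            opening = not opening
            pos += 2  # skip both asterisks
        else:
            res.append(text[pos])
            pos += 1
    return ''.join(res)

print(bold_to_html('** Hello!** everyone! **This should be HTML string**'))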
