
Python: Format string to appear as plaintext in Markdown or HTML?

I am using a Telegram bot to send messages from my Python program. Telegram requires all bot messages you send to be formatted either in Markdown or HTML.

All I want is the strings in my Python program to appear exactly the same way on the receiving end of a Telegram message.

Problem is, the text I'm trying to send is drawn from the public, so it could be anything, including special characters that have meanings in those formats, which screws up the message completely.

Is there a way I can format this message string into one of those formats so it will appear just as plain text on the other end?

Edit: I've tried a bunch of things. As mmiron suggested, I tried escaping my string as HTML, but I have not been able to get that to work. Special characters (<, >, #) seem to completely break the message even if I replace them with character references like &amp;.

I also tried escaping my string as Markdown, which had a very strange result. Unlike HTML, Markdown seems more likely to actually send the text, but special characters (especially #) still break the result.

Here is the starting text, before any Markdown escaping:

>>Bravo: Priyanka Chopra, Navya Naveli Nanda praise Jharkhand girl who got #Harvard University scholarship https://url
"Educate a girl you can change the whole community"
- - - - - - - - - - - - - - - - - - - - - -
Sunchartist
(@sunchartist)
                                                     j1.1
- - - - - - - - - - - - - - - - - - - - - -
9:09PM +43seconds    23-4-2021
[Chopra]
(Balance: $3.43)
-----------------------------------
<https://twitter.com/sunchartist/status/1385777945248030723>
-----------------------------------
<https://www.url.com>

After it is escaped as Markdown and sent, here is the Telegram message I receive:

'\>Bravo: Priyanka Chopra, Navya Naveli Nanda praise Jharkhand girl who got Harvard University scholarship https://url 
"Educate a girl you can change the whole community"
\- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \-
Sunchartist
\(@sunchartist\)
                                                     j1\.1
\- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \-
9:09PM \ 43seconds    23\-4\-2021
[Chopra\]
\(Balance: $3\.43\)
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
<https://twitter\.com/sunchartist/status/1385777945248030723\>
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
<https://www\.url\.com\>

Obviously the escape function is adding a backslash before every special character. What's strange is that Telegram doesn't seem to parse those escapes correctly as Markdown, except for the first bracket in

[Chopra\]

and the first < that appeared in the original text before the links

<https://www\.url\.com\>

This is odd, because the escaping only seems to work for the opening character in [] or <>, not the closing one.

I've also tried wrapping the text in <pre>...</pre> with the HTML flag, but that doesn't seem to achieve anything except change the text color on Telegram. Even then, the message only sends if I remove the offending special characters.

If there is a hashtag (#) anywhere in the text, only the text leading up to the hashtag is sent, and nothing after it. This is true for both Markdown and HTML.

import html
escaped = html.escape(EXTERNAL_INPUT_STRING)

Then send escaped as HTML. Also see https://core.telegram.org/bots/api#sendmessage
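
A minimal sketch of that approach, assuming the message is sent with parse_mode=HTML. The function name to_telegram_html and the sample string are made up for illustration; html.escape comes from the standard library. Passing quote=False limits the replacements to &, < and >, which are the characters the Bot API documentation says must be replaced.

```python
import html


def to_telegram_html(text: str) -> str:
    """Escape untrusted text so Telegram's HTML parse mode shows it literally."""
    # quote=False leaves " and ' alone; only &, < and > are replaced.
    return html.escape(text, quote=False)


# Hypothetical sample containing characters that matter in HTML.
sample = "got #Harvard <https://www.url.com> & more"
print(to_telegram_html(sample))
```

Note that # is left untouched here, since it has no special meaning in HTML. If text is still cut off at a #, the cause is probably somewhere other than the HTML escaping.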

Ok I finally solved it. Posting here for future reference.

I could never get HTML to work so I'm still unsure about that.

First, I had parse_mode=Markdown. It needs to be parse_mode=MarkdownV2 instead.

Next, there are a few specific characters for which the \ escape does NOT work to display them as a literal. For those, you need to use percent-encoding to retain the symbols.

Here is the code I used to fix that portion.

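# Replace each of these characters with a backslash followed by its percent-encoded form.
# '%' goes first so the '%' signs added by the later lines are not encoded again.
message_body = message_body.replace('%', '\\%25')
message_body = message_body.replace('#', '\\%23')
message_body = message_body.replace('+', '\\%2B')
message_body = message_body.replace('*', '\\%2A')
message_body = message_body.replace('&', '\\%26')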

This fixes %, #, +, * and &. I could probably make this more elegant or faster, but it works for now.
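
Most of the characters in that list have a meaning inside a URL: # starts the fragment, & separates parameters, + decodes to a space, and % introduces an escape. That suggests the text was being placed directly in the request URL. If that is the case, a sketch like the one below, which lets the standard library percent-encode the whole query string, may be simpler than replacing characters by hand. The token, chat id, and helper name are placeholders, not values from the original post.

```python
from urllib.parse import parse_qs, urlencode


def build_send_url(token: str, chat_id: str, text: str, parse_mode: str) -> str:
    """Build a sendMessage URL with every query parameter percent-encoded."""
    query = urlencode({"chat_id": chat_id, "text": text, "parse_mode": parse_mode})
    return f"https://api.telegram.org/bot{token}/sendMessage?{query}"


# Placeholder credentials; the text is assumed to be MarkdownV2-escaped already.
url = build_send_url("<TOKEN>", "<CHAT_ID>", r"50% off \#Harvard \+ more", "MarkdownV2")
print(url)

# Decoding the query string gives back the original text unchanged.
query = url.split("?", 1)[1]
print(parse_qs(query)["text"][0])
```

With the encoding handled in one place, the Markdown escaping only has to deal with Markdown.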

Finally, there is a group of characters for which the \ escape DOES produce a literal. Here's the code I used for those:

import re
message_body = re.sub(r"([_*\[\]()~`>\#\+\-=|\.!{}])", r"\\\1", message_body)
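
Wrapped in a function, the escaping step might look like the sketch below. It assumes parse_mode=MarkdownV2 and that URL encoding is handled separately (or the text is sent in a request body), so no percent-encoding happens here. The function name is hypothetical. The character class also includes the backslash itself, which the regex above does not cover, so a backslash already present in the input stays literal.

```python
import re

# Characters escaped for MarkdownV2, plus the backslash itself.
_MDV2_SPECIAL = re.compile(r"([\\_*\[\]()~`>#+\-=|{}.!])")


def escape_markdown_v2(text: str) -> str:
    """Prefix every MarkdownV2 special character with a backslash."""
    return _MDV2_SPECIAL.sub(r"\\\1", text)


print(escape_markdown_v2("(Balance: $3.43) [Chopra] #Harvard"))
```

Compiling the pattern once keeps the call cheap when many messages are escaped in a loop.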

For Telegram-specific escaping, you can use PlainText from the telegram-text package:

from telegram_text import PlainText

element = PlainText("Your non-escaped text!")
escaped_text = element.to_markdown()
escaped_text
'Your non\\-escaped text\\!'

