
Structure of base 64 encoded strings in html

Original post 2020-06-11 06:48:15 3 1 python/ selenium/ base64

I downloaded the page source (HTML) of several websites with Selenium (Python), and I want to find all base64-encoded strings in the HTML files.

Is there a known structure to all base64-encoded strings in HTML? From my observations, it seems that one starts with ;base64 followed by a string of characters and finally a closing bracket ) . Is that accurate?

According to Wikipedia, the string must also be composed only of the following characters: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ . Can someone confirm that as well?

Thanks a lot in advance!

* Edit 1 *

Thanks a lot, Tris. The link you provided is very helpful. However, from it, it seems that there is no specific format for the end of a base64 string. If I want to detect where one ends, what would you advise looking for other than ) ?

I mainly want to track changes across a number of websites, and the base64 encodings contain a lot of data that is not relevant to my use. To save storage, I therefore intend to remove them. An example is www.amd.com , which contains the following after being rendered by the browser: data:image/png;base64,...

Since there are many different websites, I don't know all of their formats. Here are some other examples of base64 strings that I found and that are not useful to me:

data:font/truetype;base64,AAEAAA...

data:image/png;base64,iVBORw0KG...

Several of the examples I saw ended with a closing bracket ) . In which scenarios would they end with ) , and in which would they not?

Thanks again!

1 answer

Not all base64-encoded strings are preceded by ;base64 -- that marker is specific to data URLs. If you are specifically looking for base64-encoded images or other inline resources that would otherwise be referenced with an HTTP URL, searching for it might be fine. The closing bracket is not typically relevant; I haven't seen it required on data URLs or other base64-encoded strings.
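For the storage-saving goal in the question, a minimal sketch of stripping base64 data URLs from page source might look like the following. The names DATA_URL_RE, strip_data_urls and the placeholder value are hypothetical, and the pattern assumes the payload uses the standard alphabet with no embedded whitespace or percent-encoding. Note that the match ends at the first character outside the alphabet, so a trailing ) (for example from a CSS url(...) wrapper) or a closing quote is left in place rather than being treated as part of the encoding.

```python
import re

# Assumed shape: "data:", an optional media type, optional ";name=value"
# parameters, the literal ";base64,", then the payload with up to two "=".
DATA_URL_RE = re.compile(
    r"data:[^;,\s\"'()]*(?:;[^;,\s\"'()]+)*;base64,[A-Za-z0-9+/]*={0,2}"
)


def strip_data_urls(html: str, placeholder: str = "data:,") -> str:
    """Replace every base64 data URL in html with a short placeholder."""
    return DATA_URL_RE.sub(placeholder, html)


if __name__ == "__main__":
    sample = (
        '<div style="background:url(data:image/png;base64,iVBORw0KGgo=)"></div>'
        '<img src="data:font/truetype;base64,AAEAAA==">'
    )
    print(strip_data_urls(sample))
```

This only covers data URLs. Base64 that appears without the ;base64 marker (for example inside script variables) would need a different heuristic.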

Typically, base64-encoded strings use the alphabet you've mentioned -- ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ . If the length of the input data is not a multiple of 3 bytes, the encoded output is padded with one or two = characters at the end so that its length is a multiple of 4.
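The padding rule can be checked with Python's standard base64 module. This is only an illustrative sketch; the helper names encode and is_standard_base64 are made up for this example.

```python
import base64
import binascii


def encode(raw: bytes) -> str:
    """Encode bytes with the standard base64 alphabet."""
    return base64.b64encode(raw).decode("ascii")


def is_standard_base64(text: str) -> bool:
    """Return True if text decodes as strictly valid, padded standard base64."""
    try:
        # validate=True rejects characters outside the standard alphabet.
        base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


if __name__ == "__main__":
    # 3 input bytes need no padding; 4 bytes need "=="; 5 bytes need "=".
    for raw in (b"abc", b"abcd", b"abcde"):
        print(len(raw), encode(raw))
```

With these inputs the encoded forms are YWJj, YWJjZA== and YWJjZGU= respectively, so the number of = characters depends only on the input length modulo 3.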

There is another commonly used base64 format on the web -- the URL-safe base64 format. In this encoding, + and / are typically replaced with - and _ so they can be included in URLs safely, hence the name.
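As a small illustration of the difference between the two alphabets, the sketch below encodes the same bytes both ways using Python's standard library. The helper name to_standard_alphabet is hypothetical.

```python
import base64


def to_standard_alphabet(text: str) -> str:
    """Map URL-safe base64 characters back to the standard alphabet."""
    return text.translate(str.maketrans("-_", "+/"))


if __name__ == "__main__":
    raw = b"\xfb\xff"  # chosen so the encoding contains both "+" and "/"
    standard = base64.b64encode(raw).decode("ascii")
    url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
    print(standard)  # +/8=
    print(url_safe)  # -_8=
    print(to_standard_alphabet(url_safe) == standard)
```

A pattern that should catch both variants would therefore need to allow - and _ in addition to + and / .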

This information may be irrelevant if you know more about the structure of the websites you are trying to parse, aside from just "they contain base64-encoded string data."

Python: Decoding base64 encoded strings within an HTML file and replacing these strings with their decoded counterpart

Python: Decoding base64 encoded strings

Remove the new line “\n” from base64 encoded strings in Python3?

Kafka getting data as base64 encoded strings even though Producer does not explictly encode

How to upload a base64 encoded string to s3 and access the url in html file in python

Base64 encoded image in email

Convert Base64 encoded bytes to Int

Not able to decode base64 encoded image

Convert a base64 encoded string to binary

Requesting URLs with base64 data encoded


The technical posts on this site follow the CC BY-SA 4.0 license. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.



solution1 1 ACCEPTED 2020-06-11 06:56:39
