In want to, on the fly, change an HTML similar to this:
<html><head><style>
body {
background: transparent url(http://example.com/image.gif) no-repeat right bottom;
}
</style><head>
<body>
<img src="http://example.com/image2.gif"/>
</body>
</html>
To (urls are cut):
<html><head><style>
body {
background: transparent url(data:image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/...) no-repeat right bottom;
}
</style>
<head>
<body>
<img src="data:image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/...."/>
</body>
</html>
Now I use this code:
private string EmbebedImages(string strHtml)
{
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(strHtml);
foreach (var imgNode in doc.DocumentNode.SelectNodes("//img[@src]"))
{
string url = imgNode.Attributes["src"].Value;
if (url.StartsWith("http"))
{
using (var webClient = new WebClient())
{
var imageAsByteArray = webClient.DownloadData(url);
string mimeType = MimeMapping.GetMimeMapping(url);
imgNode.Attributes["src"].Value = "data:" + mimeType + ";base64," +
Convert.ToBase64String(imageAsByteArray);
}
}
}
return doc.DocumentNode.OuterHtml;
}
But my code ignores urls in CSS.
Is it possible to make this change simple? I tried with some css-libraries, but I can't find a simple form...
you can't do that with HtmlAgilityPack but Try Regex
using System.Text.RegularExpressions;
private string EmbebedImages(string strHtml) {
var htmlString = .......load html string....;
string currentURL;
var images_url = Regex.Matches(htmlString, @"(?:https?:\/\/.*?\.(gif|png|jpg|jpeg))");
foreach(var url in images_url) {
currentURL = url.ToString();
using(var webClient = new WebClient()) {
var imageAsByteArray = webClient.DownloadData(currentURL);
string mimeType = MimeMapping.GetMimeMapping(currentURL);
string dataURL = "data:" + mimeType + ";base64," + Convert.ToBase64String(imageAsByteArray);
htmlString = htmlString.Replace(currentURL, dataURL);
}
}
return htmlString;
}
Though this isn't applicable for OP's case, if in Python one wanted to convert URLs to Data URIs in CSS, here is how I did it:
import requests
import re
import base64
def url_to_base64(url, memo=None):
if memo is None:
memo = {}
if url in memo:
return memo[url]
res = requests.get(url)
mime_type = res.headers['content-type']
base64_data = base64.b64encode(res.content).decode('utf-8')
data_url = "data:{};base64,{}".format(mime_type, base64_data)
memo[url] = data_url
return data_url
def embed_urls(html, memo=None):
if memo is None:
memo = {}
pattern = re.compile(r"url\('(https?://.*?)'\)")
urls = set(re.findall(pattern, html))
for url in urls:
base64_data = url_to_base64(url, memo)
html = html.replace(url, base64_data)
return html
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.