How to get images from a saved html page

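I have a large number of HTML pages saved on my PC. I have already parsed the pages and extracted the image src values. I need to store the images from each HTML page in a separate directory with a specific structure. I tried Net::HTTP.get, but I got a "File name too long" error. Is there any way to do this?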

Here is what I tried.

Method 1:

require 'open-uri'

def save_image(imgsrc)
    File.open("images/img1","w") do |f|
        asdf = open(imgsrc).read
        f.write(asdf)
    end
end

Method 2:

require 'net/http'

def save_image(imgsrc)
    File.open("images/img1","w") do |f|
        asdf = Net::HTTP.get_response(URI.parse(imgsrc))
        f.write(asdf)
    end
end


imgsrc => data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCABxAHEDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iiigAooooAKKKKACiiigAoNFYnibWG0a2sJF6XF/BbsfRWbB/SgDbooooAKKKKACiiigApDRmqGs6vbaJpkl/d7/KjIGEXLEk4AA7nJoA0KKgtLqG9tUuIHDxOMqw71NQAtFGaTPFIBaSjNGaYC1ynjyxOoWOkwjPOqW5JHb5s5/TH411WaY4R1G8AgEEZ7HtSEySmeYvmiPcN5G7HfFKTyB3rnDcyt8RhCpHkR6Yd4z0YyD+mPzpjOkozTd6lymRuAyRnmmieNpmhEiGRRkoGGQPcUrgS0UUUwOa8XeM7DwfFaPepJI1zJtVY+oUfeb6DI/OpfEDxz/wBh8h4ptQjPqGGx2H6gVi+KLGDVvEt1azxrIE0SV03DO1vMBBH5VVW8x4F8FXMr/vFubJTk8njY36EmpZLZ1ukmHTdAYudkVs028+gV2yf0JrK8EeNE8XxXpNq1rLbyDEbHJMbZ2t+ODVvxSRZeDdV2twY5Cc/7bc/+hGsrRbZbH4gPFGgjE2kI7ADGdjIo/maYXLvjrxPP4Y0eOeytftN5NJsji7YA3MT7AD9ap694svD4IsdT0SASX2oBRBG3Ow7S759cBWq/rca3Xi7RbZxuUwXJZcfwlQp/mPzrl/DM6pb+EopTgfbdQiAPTIMmB+WRSYX1L3iPxHro0HRf7ISNdRuYRdXJPKpGoXf/AOPOPwzTvH2ta4ltZWvh0hbtomvZ3P8ADCmMj8S36VJFa+TrtlpUzYlk0SeFQe53pn9Oa0LCAXHii9imGWg0yC3Yf7xYn/PtQGpX1bxFdXvw9h1DSUP9o6jGkNsn92V+D/3z8x/4DXHL42v18I+G9Put51G61EWtxKeyxypn8SGX9a1vCrEweHbBzkQapeEZ9kkZf0ko8SaKtpottNJAFk/4SNJo+OVUyhf1Cg0hNuxV8UarrcHxBt9Vtpyujabcw2U8QP3zIMuSPYH9Kzb6TxBefHRre1WSO0jlhebb0MI2Ek+xK1ueKLK4t/Cnim4niMYfUDLFn+JQFG7+f5Vv24A+J943/UMXP/fa0w1uc/ps2oR/EWPXHumbT9TubjT1g7IIuEYfVkb86n8MWE3/AAm8viGS6kkGqm6jWIn5USKTav6Kv61Sj1C2Tw3
4SAmT7Q+srhARu+aZ8/zra+Htzb6npCpuxc6XeXMTr7M7EfoR+VCBHdZoooqizE1xLDT7bUdYnKxytaeQ0jH+EbiB+bfyrz+6iefwDpflj97Y6Y1xGP8Ab8xAv6Bvzrf8bxf21r+maHISbSOCbUbhM/f8sYQH23HNUNAH2iysLLaGaXQJGVfUrKP6kVD3Ie50NtHdalr9wLq5S50W7skmitmQYUnb379CfxqrY6lbah8U51tpRJ9l06SCUjoHEkZI/DcKzvDWsND4UuLrful0zTfIyf70byqv57F/Oo/BOlppXiTTIxzLPobXE7nq8jyozEnuecfhTQHcaxf2Wj2M2q3pVI7eM5kI5AOOB9SB+leP6vJcw/D/AEnV7IlZrHVLm6T/AL/Nx+RJPsDXoXjKEalrHhvRpADb3F400ynoyxIXAPsTisXTbJNR8AyWwUNPHdXk6Rf31WV1dfxVyv8AwKh6g9WZHj3WpoJ/C/jCxB2xxrKVHeN8bx/48o/4FXa6NMs3j3X3Vso1nZsv0Ikrkxp0ep/C20gVt8tjauHH96Eu6H8vLDfVBWz4LkZvEczN/wAttFsH+pCtn/0IUkC3Mrw0SniXT4Sf9Xqd2B/4DJ/ga6b4gf8AIFsf+wpaf+jRXK6U5g8V6VMfuTa3eJ/5AKj9RXVfEA40Wx/7Cdp/6MFPoC2MD4qa8yWkmhW0XmM1ubi7ftFGOF/Etj8q2i2z4h6iw6jSc/8AjwrlfEiCe28f3zcyBre2X2VBnH65rqtu/wCIuoJ/e0nH/jy0kPqcx4K+GukXMGn+Ip5bprtLlp1TzPkVllOOP+AiovBsc+ifFnUbME/ZdQjl+X0kib+ePm+jiuw+HVysvgq3yQDHcTxn6+c+P5is5bdZvGvhrW1UK2oQu8iDoHEByfxGwf8AABTQktmd/miiiqLOH1iVIviBcO5AC+H5zknpiRCaxtCmFlf+C7hzhLm0lsyfc5YD80FT+OvD1/rfjfRktfOjt5YWiuZUyB5OcupPuMDHfNZlz4P1u1+Hdt5Ku+r2V891AinJQb+MfgAfxNTrcze5IYE0638daYM5KRzRjP8AAcr/ADGfxrc0W+gn8d6aiOv/ACABgZ9WjYD8qxZvBWtPdWt9PI0lxNZTyX+xuJJTysf0yVA/3TUngv4fX2lX3h/VLqTEkMcz3Cs3zAsoVF+gX9RSSYK9zo/ENxFbeO/DcszqiYuBuY4A/dsf6VleBb+CW304q6s0k99GQDkjdIJBn8Fq58Q/Cdz4qk0eGAYiS4PnvuwUTHJ/LI/EVj3Xw+1iwtnm0K8hiuxqUlwgbgCJk2AfUKT+dOzuPW5Z0SdLbwu4bAE2jTz4/wBlZJD/AO1KtaGh03xbpcL8CfQ0Uk/3k2AD8gxpNc8C312ujQadfiG2trUWN0p/5aQnG7HucVf8a+ErnxDHYtp179iuLdypkGR+7IwQMd8E/maLMDiNXu/7L8I+HPEGG2R6nJeE/wB5XlPH4pmp/EPi+DVPDD6uZGFlLrsCWgYYzHGFLHH1DGu38SeDoNb8GL4fgcQLEiLA5GQu0Y5/DNZ9z8MtIvfDWk6LcPL5OnqxBQ43u3Vj+OT+NDQWZwfivxDDay+LtIUM819Ms8ZUZBRVO45/4APzrpW8W2tv4+0q98uVo9a0+KO3Xbzl5I8Z+gyfwrY/4Vvpsmoajczyu63VkLOMY5jXGGOfUkZq9P4JsJvEei6ruIXSrfyYocccDCn8P8KLAkzzjw1r91a+I5fBMcTlpNZW4Mo6IikSSL+afqa67w/fw6pq3h2CGRXeyS9Mig52BWEa5+oat2w8FaZYeML3xJGGN1crjaeiE/eI+uP1NTaH4P0nw9ql/f2MRWW8OXBPCjOSB6AnmhIaTOgoooqiiF5oY3w8qKf9pgKjN9ZjrdQD/toK43xH8O08R6/Lfy6pewLIqgJE2FGAB6+1ZA+DVpkiTVL1xng+aRS1Ju77Hop1XTlzm/tRj/psv+NMbW9
KQZbUrMAcn98v+Nefj4MaXuBa7uz/ANt+v6VMvwb0T+N7hue8uc/pRZhdnZt4o0JQSdWs8Dr++FVZPG/hqP72tWn4NmucHwd8Oj+CQj3arMfwl8MIBm03Y9SaVmGppN8RPCaZ/wCJ1b/gG/wqrN8U/CMR/wCQoG5x8sbf4UifC/wwj7vsK5rQj8CeHY02/wBnREe60WYe8Y0nxf8ACUef9Kmb/diNQN8ZvCwXIN2fYQ//AF66f/hDtAyD/Zlvx0+WnjwnoQwf7Mtzj/Yos+4e8cc3xs8NgfJbag59oR/jVeT43aSEzFpWouew8sD+td8vhrRkOV022/74FTLoemL0sLcf9sxRyhqeYt8bWfP2fw3eN6Fv/rZrV8J/EjU/EfiO306bQJbSCUMWmYnC4UkdvbFd+mm2UYwlpCB7IKmSCKNgUjRfooFCQWZJRRRVFCd6KWigBKKWjvQAlFLRQAmKKWigBKKWigBKWiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooA//2Q==

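You already have the image: the value you posted in the imgsrc variable is the image itself, Base64-encoded.

You just need to decode it with the Base64 module and save the result to a file.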

(To decode your image, I used an online Base64 decoding service.)


To decode Base64 data, you should use the Base64.strict_decode64 method:

$ cat testb64.rb

imgsrc='/9j/4AAQS... ...oooA//2Q=='  # (your long value, snipped here;
                                     #  the "data:image/jpeg;base64,"
                                     #  prefix has been removed)
require 'base64'
print Base64.strict_decode64(imgsrc)

$ ruby testb64.rb >img.jpg

$ xxd -p img.jpg 
ffd8ffe000104a464946....

(a valid JFIF header; the resulting JPEG opens fine in Gwenview and Dolphin)
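Why strict_decode64 rather than decode64: Base64.decode64 is lenient and silently skips any character outside the Base64 alphabet. That tolerates line-wrapped input, but it also means a forgotten "data:image/jpeg;base64," prefix is decoded into garbage bytes instead of raising an error. The short comparison below uses made-up sample strings:

```ruby
require "base64"

wrapped = "aGVs\nbG8=" # "hello", Base64-encoded and wrapped across two lines

# decode64 skips characters outside the Base64 alphabet, such as "\n".
lenient = Base64.decode64(wrapped)

# strict_decode64 follows RFC 4648 and raises ArgumentError instead.
strict_error =
  begin
    Base64.strict_decode64(wrapped)
    nil
  rescue ArgumentError => e
    e
  end

# A forgotten "data:...;base64," prefix is not rejected by decode64:
# its letters (and the "/") are decoded as if they were part of the payload.
with_prefix = Base64.decode64("data:text/plain;base64,aGVsbG8=")

puts lenient                # prints hello
puts strict_error.class     # prints ArgumentError
puts with_prefix == "hello" # prints false
```

So with strict_decode64, strip the prefix (and any line breaks) first; a mistake then fails loudly rather than producing a corrupt image.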

This should work:

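require 'base64'
require 'open-uri'

# Expects imgsrc to be an http(s) URL whose response body is Base64 text.
def save_image(imgsrc)
  File.open("images/img1", "wb") do |fo|
    # URI.open (not plain Kernel#open) is needed to fetch URLs on Ruby 3.0+
    fo.write(Base64.decode64(URI.open(imgsrc).read))
  end
end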

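It saves to the file path "images/img1", so you will want to build a separate path for each image; otherwise each one overwrites the previous one.

"wb" opens the output file in binary mode, which avoids the line-ending translation your OS may otherwise apply. Without the b, Ruby treats the file as text and, on platforms such as Windows, converts "\n" to "\r\n" when writing, which corrupts binary data; b skips that step. The IO.new documentation describes the open modes in detail.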

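A sketch of the two points above, assuming a layout of images/<page name>/img<N>.<ext>. The question does not say which directory structure it needs, so the layout, the helper name save_page_image, and the extension table are all hypothetical. Each page gets its own directory, each image its own numbered file, and the bytes are written with "wb":

```ruby
require "fileutils"

# Hypothetical mapping from media type to file extension.
EXTENSIONS = { "image/jpeg" => "jpg", "image/png" => "png", "image/gif" => "gif" }.freeze

# Hypothetical helper: writes the index-th image of a saved HTML page to
# root/<page name>/img<index>.<ext>, so images from different pages (and
# different images on the same page) never overwrite each other.
def save_page_image(bytes, html_file, index, media_type, root: "images")
  dir = File.join(root, File.basename(html_file, ".*"))
  FileUtils.mkdir_p(dir)                    # create the page's directory if needed
  ext = EXTENSIONS.fetch(media_type, "bin") # unknown types fall back to .bin
  path = File.join(dir, "img#{index}.#{ext}")
  File.open(path, "wb") { |f| f.write(bytes) } # "wb": binary mode, no newline conversion
  path
end
```

For example, save_page_image(jpeg_bytes, "saved/about.html", 1, "image/jpeg") writes images/about/img1.jpg and returns that path. Pages with the same file name in different folders would still share a directory; including more of the source path in dir avoids that.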
You can't pass

imgsrc => data:image/jpeg;base64,/9j/4AAQSkZJRg... (the same Base64 string as in the question)

as the URL of an image, because it isn't one: it is a data: URI that carries the image bytes inline. OpenURI and Net::HTTP both expect a URL for the image; they request that URL, read the response, and return the data to your code. Instead, Base64-decode the data you already have, which gives you a binary string in memory, and then write that string to a file opened in binary mode.
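Putting the pieces together, here is a minimal sketch of a helper that accepts either kind of src value: data: URIs are decoded locally and only real http(s) URLs are fetched with Net::HTTP. The helper name image_bytes is hypothetical; the sketch does not follow redirects and does not handle relative paths or file: URLs, which are common in saved pages. As a side note, passing the data: string to plain open (as in Method 1) makes Ruby treat the entire string as a local file name, which is most likely where the "File name too long" error came from.

```ruby
require "base64"
require "net/http"
require "uri"

# Hypothetical helper: returns the raw image bytes for one <img src> value.
def image_bytes(src)
  if src.start_with?("data:")
    # data: URI: the bytes are already in the string, so decode locally.
    header, payload = src.split(",", 2)
    unless header.end_with?(";base64") && payload
      raise ArgumentError, "only base64-encoded data: URIs are handled"
    end
    Base64.strict_decode64(payload.delete(" \t\r\n")) # drop line wrapping first
  elsif src.match?(%r{\Ahttps?://}i)
    # A real URL: fetch it. Redirects are not followed in this sketch.
    response = Net::HTTP.get_response(URI(src))
    response.value # raises a Net::HTTP error unless the status is 2xx
    response.body
  else
    raise ArgumentError, "unsupported src (relative path or other scheme): #{src[0, 40]}"
  end
end
```

For example, File.binwrite("images/img1.jpg", image_bytes(imgsrc)) saves the decoded JPEG from the question in binary mode.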


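Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.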

 
© 2020-2024 STACKOOM.COM