
How to get raw bytes from GLib.GString in Python?

I have an application written in Python that uses GTK3 through GObject introspection (Python 2.7 and PyGObject 3.14). I am trying to load a web page using WebKit and access the contents of all the resources it loads. I'm able to accomplish this by connecting to the resource-load-finished signal of the WebKitWebView object I am using to load the page.

Within my signal handler I use the WebKitWebResource object in the web_resource parameter to access the loaded data. Everything works fine with the GLib.GString returned from get_data() when it does not contain a NULL byte: I can access what I need using data.str. However, when the data does contain a NULL byte, which is often the case when the MIME type of the loaded resource is an image, data.len is correct but data.str only contains the data up to the first NULL byte. I can access the raw bytes by calling data.free_to_bytes(), which returns a GLib.GBytes instance, but the application then segfaults when the signal handler returns. I'm trying to access all the data within the loaded resource.

I hope the following code helps demonstrate the issue.

from gi.repository import Gtk
from gi.repository import WebKit

def signal_resource_load_finished(webview, frame, resource):
    gstring = resource.get_data()
    print(resource.get_mime_type())
    desired_len = gstring.len
    # gstring.str is missing data because it returns the data up to the first NULL byte
    assert len(gstring.str) == desired_len  # this assertion fails when the data contains a NULL byte

    # calling this causes a segfault after the handler returns, but the data is accessible from gbytes.get_data()
    #gbytes = gstring.free_to_bytes()
    #assert(len(gbytes.get_data()) == desired_len) # this assertion succeeds before the segfault
    return

webview = WebKit.WebView()
webview.connect('resource-load-finished', signal_resource_load_finished)
webview.connect('load-finished', Gtk.main_quit)
# lol cat for demo purposes of a resource containing NULL bytes (mime type: image/png)
webview.load_uri('http://images5.fanpop.com/image/photos/30600000/-Magical-Kitty-lol-cats-30656645-1280-800.png')
Gtk.main()

You don't want to use free_to_bytes: it not only gives you the bytes you want, but also releases the string from memory without Python knowing about it, which, as you discovered, crashes your program. Unfortunately there isn't a corresponding get_bytes method, as GLib.String wasn't really designed to hold binary data.
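
The truncation itself can be reproduced without WebKit. The sketch below is only an analogy built on the standard ctypes module, not the actual PyGObject code path: it assumes that data.str is read as a NUL-terminated C string, which would explain why it stops early while data.len stays correct. The payload is a made-up fragment shaped like the start of a PNG file.

```python
import ctypes

# Hypothetical payload: the 8-byte PNG signature followed by the start of
# an IHDR chunk, whose length field begins with NUL bytes.
payload = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR'

# A length-counted buffer, comparable to a GString that carries its own len.
buf = ctypes.create_string_buffer(payload, len(payload))

# .value treats the buffer as a C string and stops at the first NUL byte.
as_c_string = buf.value

# .raw returns every byte in the buffer, NUL bytes included.
as_raw = buf.raw

print(len(payload), len(as_c_string), len(as_raw))
```

Only the length-aware view returns the whole payload, which is the kind of accessor the WebKit 1 API does not expose for its GLib.String.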

In fact, I'd consider it a mistake in the WebKit API that the resource payload is only available as a GLib.String. They seem to have corrected this mistake in WebKit2: http://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebResource.html

Consider switching to WebKit2 if you can (from gi.repository import WebKit2).
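
A rough sketch of what that could look like, untested against the asker's setup. It assumes the WebKit2 4.0 introspection data is installed, and that the resource-load-started signal on the web view plus the finished signal on each resource are the right hooks. The helper names fetch_resource_bytes, report and main are made up for illustration. WebKit2 hands the payload back asynchronously through get_data() and get_data_finish(), so the bytes arrive in a callback instead of being returned directly.

```python
def fetch_resource_bytes(resource, on_data):
    """Request a WebKit2 WebResource's payload and pass it to on_data(uri, data)."""
    def _on_ready(source, result, *_user_data):
        # get_data_finish() returns the complete payload, NUL bytes included.
        # It can raise GLib.Error if the resource failed to load.
        data = source.get_data_finish(result)
        on_data(source.get_uri(), data)

    # Asynchronous request with no cancellable.
    resource.get_data(None, _on_ready)


def report(uri, data):
    # Hypothetical consumer: just print how many bytes were received.
    print('%s: %d bytes' % (uri, len(data)))


def main(uri):
    # Imported here so the helper above does not depend on GTK being installed.
    import gi
    gi.require_version('Gtk', '3.0')
    gi.require_version('WebKit2', '4.0')  # assumed API version
    from gi.repository import Gtk, WebKit2

    def on_resource_load_started(webview, resource, request):
        # Fetch the data once this particular resource has finished loading.
        resource.connect('finished',
                         lambda res: fetch_resource_bytes(res, report))

    webview = WebKit2.WebView()
    webview.connect('resource-load-started', on_resource_load_started)

    window = Gtk.Window()
    window.connect('destroy', Gtk.main_quit)
    window.add(webview)
    window.show_all()

    webview.load_uri(uri)
    Gtk.main()
```

Calling main() with the URL of the page to inspect should print one line per loaded resource. Closing the window ends the main loop.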
