
Python C extension link with a custom shared library

I am writing a Python C extension on a very old Red Hat system. The system has zlib 1.2.3, which does not correctly support large files. Unfortunately, I can't just upgrade the system zlib to a newer version, since some of the packages poke into internal zlib structures, and that breaks with newer zlib versions.

I would like to build my extension so that all the zlib calls (gzopen(), gzseek(), etc.) are resolved to a custom zlib that I install in my user directory, without affecting the rest of the Python executable and other extensions.

I have tried statically linking in libz.a by adding libz.a to the gcc command line during linking, but it did not work (for example, I still cannot create large files using gzopen()). I also tried passing -z origin -Wl,-rpath=/path/to/zlib -lz to gcc, but that also did not work.

Since newer versions of zlib are still versioned zlib 1.x, the soname is the same, so I think symbol versioning would not work. Is there a way to do what I want to do?

I am on a 32-bit Linux system. The Python version is 2.6, which is custom-built.
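The soname the dynamic loader matches on is the DT_SONAME entry recorded in each library, which `readelf -d` prints. A minimal sketch for checking it, assuming GNU binutils' `readelf` is on PATH; `parse_soname` and `soname_of` are hypothetical helper names, not part of any library.

```python
import re
import subprocess
from typing import Optional

# GNU readelf prints the soname in the dynamic section like:
#  0x000000000000000e (SONAME)             Library soname: [libz.so.1]
_SONAME_RE = re.compile(r"\(SONAME\)\s+Library soname: \[([^\]]+)\]")


def parse_soname(readelf_output: str) -> Optional[str]:
    """Return the DT_SONAME found in `readelf -d` output, or None if absent."""
    match = _SONAME_RE.search(readelf_output)
    return match.group(1) if match else None


def soname_of(path: str) -> Optional[str]:
    """Run `readelf -d` on a shared library and extract its soname."""
    result = subprocess.run(["readelf", "-d", path],
                            capture_output=True, text=True, check=True)
    return parse_soname(result.stdout)
```

For example, soname_of('/lib/libz.so.1') and soname_of('/home/alok/zlib_lfs/lib/libz.so.1.2.8') would both be expected to report libz.so.1, which is why the loader treats the two files as the same library.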

Edit:

I created a minimal example. I am using Cython (version 0.19.1).

File gztest.pyx:

from libc.stdio cimport printf, fprintf, stderr
from libc.string cimport strerror
from libc.errno cimport errno
from libc.stdint cimport int64_t

cdef extern from "zlib.h":
    ctypedef void *gzFile
    ctypedef int64_t z_off_t

    int gzclose(gzFile fp)
    gzFile gzopen(char *path, char *mode)
    int gzread(gzFile fp, void *buf, unsigned int n)
    char *gzerror(gzFile fp, int *errnum)
    z_off_t gztell(gzFile fp)

cdef void print_error(void *gzfp):
    cdef int errnum = 0
    cdef const char *s = gzerror(gzfp, &errnum)
    fprintf(stderr, "error (%d): %s (%d: %s)\n", errno, strerror(errno), errnum, s)

cdef class GzFile:
    cdef gzFile fp
    cdef char *path
    def __init__(self, path, mode='rb'):
        self.path = path
        self.fp = gzopen(path, mode)
        if self.fp == NULL:
            raise IOError('%s: %s' % (path, strerror(errno)))

    cdef int read(self, void *buf, unsigned int n):
        cdef int r = gzread(self.fp, buf, n)
        if r <= 0:
            print_error(self.fp)
        return r

    # Current uncompressed offset; read_test() below prints it.
    cdef z_off_t tell(self):
        return gztell(self.fp)

    cdef int close(self):
        # Propagate gzclose()'s status instead of always returning 0.
        return gzclose(self.fp)

def read_test():
    cdef GzFile ifp = GzFile('foo.gz')
    cdef char buf[8192]
    cdef int i, j
    cdef int n
    errno = 0
    for 0 <= i < 0x200:
        for 0 <= j < 0x210:
            n = ifp.read(buf, sizeof(buf))
            if n <= 0:
                break

        if n <= 0:
            break

        printf('%lld\n', <long long>ifp.tell())

    printf('%lld\n', <long long>ifp.tell())
    ifp.close()

File setup.py:

import sys
import os

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

if __name__ == '__main__':
    if 'CUSTOM_GZ' in os.environ:
        d = {
            'include_dirs': ['/home/alok/zlib_lfs/include'],
            'extra_objects': ['/home/alok/zlib_lfs/lib/libz.a'],
            # One flag per list element: distutils passes each element to gcc as a separate argument.
            'extra_compile_args': ['-D_LARGEFILE_SOURCE', '-D_FILE_OFFSET_BITS=64', '-g3', '-ggdb']
        }
    else:
        d = {'libraries': ['z']}
    ext = Extension('gztest', sources=['gztest.pyx'], **d)
    setup(name='gztest', cmdclass={'build_ext': build_ext}, ext_modules=[ext])
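distutils hands each element of `extra_compile_args` to gcc as its own argv entry, so compiler flags need to be separate list elements rather than one space-joined string. A small sketch that builds the CUSTOM_GZ options from a flag string with `shlex.split`; `custom_gz_options` is a hypothetical helper, and the default prefix layout (include/ and lib/libz.a under it) follows the paths used in this setup.py.

```python
import os
import shlex

DEFAULT_FLAGS = "-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g3 -ggdb"


def custom_gz_options(prefix: str, flags: str = DEFAULT_FLAGS) -> dict:
    """Extension keyword arguments for linking a private static zlib.

    `prefix` is the install prefix of the custom zlib, e.g. /home/alok/zlib_lfs.
    shlex.split turns the flag string into one list element per flag.
    """
    return {
        "include_dirs": [os.path.join(prefix, "include")],
        "extra_objects": [os.path.join(prefix, "lib", "libz.a")],
        "extra_compile_args": shlex.split(flags),
    }
```

It would be used as `Extension('gztest', sources=['gztest.pyx'], **custom_gz_options('/home/alok/zlib_lfs'))`.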

My custom zlib is in /home/alok/zlib_lfs (zlib version 1.2.8):

$ ls ~/zlib_lfs/lib/
libz.a  libz.so  libz.so.1  libz.so.1.2.8  pkgconfig

To compile the module using this libz.a:

$ CUSTOM_GZ=1 python setup.py build_ext --inplace
running build_ext
cythoning gztest.pyx to gztest.c
building 'gztest' extension
gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/alok/zlib_lfs/include -I/opt/include/python2.6 -c gztest.c -o build/temp.linux-x86_64-2.6/gztest.o -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g3 -ggdb
gcc -shared build/temp.linux-x86_64-2.6/gztest.o /home/alok/zlib_lfs/lib/libz.a -L/opt/lib -lpython2.6 -o /home/alok/gztest.so

gcc is being passed all the flags I want (the full path to libz.a, the large-file flags, etc.).

To build the extension without my custom zlib, I can compile without CUSTOM_GZ defined:

$ python setup.py build_ext --inplace
running build_ext
cythoning gztest.pyx to gztest.c
building 'gztest' extension
gcc -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/include/python2.6 -c gztest.c -o build/temp.linux-x86_64-2.6/gztest.o
gcc -shared build/temp.linux-x86_64-2.6/gztest.o -L/opt/lib -lz -lpython2.6 -o /home/alok/gztest.so

We can check the sizes of the gztest.so files:

$ stat --format='%s %n' original/gztest.so custom/gztest.so 
62398 original/gztest.so
627744 custom/gztest.so

So, as expected, the statically linked file is much larger.

I can now do:

>>> import gztest
>>> gztest.read_test()

and it will try to read foo.gz in the current directory.

When I do that using the non-statically linked gztest.so, it works as expected until it tries to read more than 2 GB.
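The loop bounds in read_test() appear chosen to cross exactly that boundary: 0x200 × 0x210 reads of an 8192-byte buffer ask for a little over 2 GiB, past the largest offset a signed 32-bit off_t can hold. A quick check of the arithmetic, assuming every gzread() fills the whole buffer:

```python
# Bytes read_test() requests if every gzread() fills the 8192-byte buffer.
OUTER, INNER, BUF = 0x200, 0x210, 8192
total = OUTER * INNER * BUF

# Largest offset representable in a signed 32-bit off_t (2 GiB - 1).
OFF32_MAX = 2**31 - 1

print(f"requested: {total:,} bytes ({total / 2**30:.4f} GiB)")
print(f"32-bit off_t max: {OFF32_MAX:,} bytes")
print("crosses the 2 GiB limit:", total > OFF32_MAX)
```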

When I do that using the statically linked gztest.so, it dumps core:

$ python -c 'import gztest; gztest.read_test()'
error (2): No such file or directory (0: )
0
Segmentation fault (core dumped)

The "No such file or directory" error is misleading: the file exists, and gzopen() actually returns successfully. gzread() fails, though.

Here is the gdb backtrace:

(gdb) bt
#0  0xf730eae4 in free () from /lib/libc.so.6
#1  0xf70725e2 in ?? () from /lib/libz.so.1
#2  0xf6ce9c70 in __pyx_f_6gztest_6GzFile_close (__pyx_v_self=0xf6f75278) at gztest.c:1140
#3  0xf6cea289 in __pyx_pf_6gztest_2read_test (__pyx_self=<optimized out>) at gztest.c:1526
#4  __pyx_pw_6gztest_3read_test (__pyx_self=0x0, unused=0x0) at gztest.c:1379
#5  0xf769910d in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:3690
#6  PyEval_EvalFrameEx (f=0x8115c64, throwflag=0) at Python/ceval.c:2389
#7  0xf769a3b4 in PyEval_EvalCodeEx (co=0xf6faada0, globals=0xf6ff81c4, locals=0xf6ff81c4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#8  0xf769a433 in PyEval_EvalCode (co=0xf6faada0, globals=0xf6ff81c4, locals=0xf6ff81c4) at Python/ceval.c:522
#9  0xf76bbe1a in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>, filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1335
#10 PyRun_StringFlags (str=0x80a24c0 "import gztest; gztest.read_test()\n", start=257, globals=0xf6ff81c4, locals=0xf6ff81c4, flags=0xffbf2888) at Python/pythonrun.c:1298
#11 0xf76bd003 in PyRun_SimpleStringFlags (command=0x80a24c0 "import gztest; gztest.read_test()\n", flags=0xffbf2888) at Python/pythonrun.c:957
#12 0xf76ca1b9 in Py_Main (argc=1, argv=0xffbf2954) at Modules/main.c:548
#13 0x080485b2 in main ()

One of the problems seems to be that the second line in the backtrace refers to libz.so.1! If I run ldd gztest.so, I get, among other lines:

    libz.so.1 => /lib/libz.so.1 (0xf6f87000)

I am not sure why that is happening, though.
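`ldd` prints the whole dependency closure, not only the libraries named on gztest.so's own link line, so libz.so.1 can appear through another dependency (libpython2.6 linked against zlib would be one possibility; that is an assumption, not something the output above shows). `readelf -d gztest.so` lists just the module's own NEEDED entries. A small sketch for pulling the zlib entries out of `ldd` output; `parse_ldd` and `libz_deps` are hypothetical helpers.

```python
import subprocess
from typing import Dict, Optional


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each library in `ldd` output to the path it resolved to.

    None means ldd printed no path (e.g. the vDSO); "not found" is kept
    verbatim so missing libraries stand out.
    """
    deps: Dict[str, Optional[str]] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, target = line.partition("=>")
            target = target.strip()
            # Drop the trailing load address, e.g. "(0xf6f87000)".
            if target.endswith(")") and "(" in target:
                target = target[:target.rfind("(")].strip()
            deps[name.strip()] = target or None
        else:
            # Entries such as the dynamic loader are printed as a bare path.
            path = line.split()[0]
            deps[path] = path
    return deps


def libz_deps(module_path: str) -> Dict[str, Optional[str]]:
    """Run ldd on a module and keep only the zlib entries."""
    out = subprocess.run(["ldd", module_path],
                         capture_output=True, text=True, check=True).stdout
    return {k: v for k, v in parse_ldd(out).items() if k.startswith("libz.")}
```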

Edit 2:

I ended up doing the following:

  • compiled my custom zlib with all the symbols exported with a z_ prefix. zlib's configure script makes this very easy: just run ./configure --zprefix ...
  • called gzopen64() instead of gzopen() in my Cython code. This is because I wanted to make sure I am using the correct "underlying" symbol.
  • used z_off64_t explicitly.
  • statically linked my custom libz.a into the shared library generated by Cython. I used '-Wl,--whole-archive /home/alok/zlib_lfs_z/lib/libz.a -Wl,--no-whole-archive' while linking with gcc to achieve that. There might be other ways, or this might not be needed, but it seemed the simplest way to make sure the correct library gets used.
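The linking step from the list above could be expressed in setup.py roughly as follows. This is a sketch, not the poster's actual file: the include directory under /home/alok/zlib_lfs_z is an assumption (only lib/libz.a appears above), the `-D_LARGEFILE64_SOURCE` flag is an assumption based on zlib.h only declaring gzopen64() and z_off64_t when large-file support is requested, and `zprefix_options` is a hypothetical helper.

```python
import os


def zprefix_options(prefix: str) -> dict:
    """Extension kwargs that embed a z_-prefixed static zlib in the module.

    --whole-archive pulls every object from libz.a into the shared library;
    --no-whole-archive restores normal archive handling for later inputs.
    """
    libz_a = os.path.join(prefix, "lib", "libz.a")
    return {
        # Assumed layout: headers installed next to lib/ by `make install`.
        "include_dirs": [os.path.join(prefix, "include")],
        # Assumption: needed so zlib.h declares gzopen64() and z_off64_t.
        "extra_compile_args": ["-D_LARGEFILE64_SOURCE", "-D_FILE_OFFSET_BITS=64"],
        # Each linker flag is its own argv entry.
        "extra_link_args": ["-Wl,--whole-archive", libz_a, "-Wl,--no-whole-archive"],
    }
```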

With the above changes, large files work, while the rest of the Python extension modules/processes work as before.
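One way to confirm that the prefixed build keeps the module's zlib apart from the system libz.so.1 is to list the dynamic symbols gztest.so defines and look for any plain zlib names. A sketch assuming GNU nm's `value type name` output format; the name set is a partial, illustrative selection, and `unprefixed_zlib_symbols` and `check_module` are hypothetical helpers.

```python
import subprocess
from typing import Iterable, List

# Partial, illustrative set of zlib entry points. After ./configure --zprefix
# they should be exported as z_<name>, so none of these plain names should remain.
ZLIB_NAMES = frozenset({
    "gzopen", "gzopen64", "gzread", "gzwrite", "gzclose",
    "gzseek", "gzseek64", "gztell", "gztell64",
    "inflate", "deflate", "crc32", "adler32",
})


def unprefixed_zlib_symbols(nm_output: str,
                            names: Iterable[str] = ZLIB_NAMES) -> List[str]:
    """Return plain zlib names defined in `nm -D --defined-only` output."""
    wanted = set(names)
    found = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print as "value type name", e.g. "0001a2b0 T z_gzopen64".
        if len(fields) != 3:
            continue
        # Some binutils versions append a symbol version: name@@VERSION.
        name = fields[2].split("@", 1)[0]
        if name in wanted:
            found.add(name)
    return sorted(found)


def check_module(path: str) -> List[str]:
    """Run nm on a built module; an empty list means no plain zlib names are exported."""
    out = subprocess.run(["nm", "-D", "--defined-only", path],
                         capture_output=True, text=True, check=True).stdout
    return unprefixed_zlib_symbols(out)
```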

I would recommend using ctypes. Write your C library as a normal shared library and then use ctypes to access it. You would need to write a bit more Python code to transfer the data from Python data structures into C ones. The big advantage is that you can isolate everything from the rest of the system: you can explicitly specify the *.so file you would like to load, and the Python C API is not needed. I have had quite good experiences with ctypes. This should not be too difficult for you, since you seem proficient with C.
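A minimal sketch of that approach: load a library by explicit path and declare argument and return types before calling into it. It calls zlibVersion() only because every zlib exports that zero-argument function; `zlib_version` is a hypothetical helper. Declaring restype matters because ctypes assumes an int return value by default, which would mangle a returned char *.

```python
import ctypes
import ctypes.util


def zlib_version(lib_path: str) -> str:
    """Load a specific libz by path and report the version string it returns."""
    libz = ctypes.CDLL(lib_path)
    libz.zlibVersion.argtypes = []
    libz.zlibVersion.restype = ctypes.c_char_p   # const char *zlibVersion(void)
    return libz.zlibVersion().decode("ascii")
```

With the private build from the question, the call would be something like `zlib_version('/home/alok/zlib_lfs/lib/libz.so.1.2.8')`; that path is taken from the question and assumed to exist.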

This looks similar to the problem in another question, except that I get the opposite behavior.

I downloaded a tarball of zlib-1.2.8, ran ./configure, then changed the following Makefile variables...

CFLAGS=-O3  -fPIC -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64

SFLAGS=-O3  -fPIC -D_LARGEFILE64_SOURCE=1 -D_FILE_OFFSET_BITS=64

...mostly to add -fPIC to libz.a so I could link it into a shared library.

I then added some printf() statements in the gzlib.c functions gzopen(), gzopen64(), and gz_open() so I could easily tell whether these were being called.

After building libz.a and libz.so, I created a really simple foo.c...

#include "zlib-1.2.8/zlib.h"

int main(void)
{
    gzFile foo = gzopen("foo.gz", "rb");
    if (foo != NULL)
        gzclose(foo);   /* release the handle if the open succeeded */
    return 0;
}

...and compiled both a standalone foo binary and a foo.so shared library with...

gcc -fPIC -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -o foo.o -c foo.c
gcc -o foo foo.o zlib-1.2.8/libz.a
gcc -shared -o foo.so foo.o zlib-1.2.8/libz.a

Running foo worked as expected, and printed...

gzopen64
gz_open

...but using foo.so in Python with...

import ctypes

foo = ctypes.CDLL('./foo.so')
foo.main()

...didn't print anything, so I guess it's using Python's libz.so...

$ ldd `which python`
        ...
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f5af2c68000)
        ...

...even though foo.so doesn't depend on it...

$ ldd foo.so
        linux-vdso.so.1 =>  (0x00007fff93600000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc8bfa98000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc8c0078000)
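One way to test that guess from inside the running interpreter is to look at which libz files are mapped into the process; on Linux they are listed in /proc/self/maps. A sketch; `mapped_libraries` is a hypothetical helper and the file format (pathname as the sixth field) is Linux-specific. The copy linked in from libz.a lives inside foo.so's own mapping, so if a separate libz.so.1 shows up, the system zlib is loaded in the process as well.

```python
from typing import List


def mapped_libraries(fragment: str, maps_path: str = "/proc/self/maps") -> List[str]:
    """Return the distinct file paths in a /proc/<pid>/maps file containing `fragment`."""
    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address perms offset dev inode [pathname]
            fields = line.rstrip("\n").split(None, 5)
            if len(fields) == 6 and fragment in fields[5]:
                paths.add(fields[5])
    return sorted(paths)
```

For example, after `ctypes.CDLL('./foo.so')`, calling `mapped_libraries('libz')` shows whether the system libz.so.1 is present in the process.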

The only way I could get it to work was to open the custom libz.so directly with...

import ctypes

libz = ctypes.CDLL('zlib-1.2.8/libz.so.1.2.8')
libz.gzopen64('foo.gz', 'rb')

...which printed out...

gzopen64
gz_open

Note that the translation from gzopen to gzopen64 is done by the preprocessor, so I had to call gzopen64() directly.
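Because the gzopen-to-gzopen64 renaming only happens at compile time, a ctypes caller has to pick the entry point by name. A sketch that prefers gzopen64 when the loaded library exports it and falls back to gzopen otherwise, with explicit argtypes/restype so the gzFile handle is kept as a pointer (the default int return type would truncate it on 64-bit systems). It uses Python 3 bytes for the path and mode; `read_gz` is a hypothetical helper, and the custom library path would be the zlib-1.2.8/libz.so.1.2.8 file from this answer.

```python
import ctypes
import os


def _gzopen(libz: ctypes.CDLL):
    """Prefer the 64-bit entry point; fall back to plain gzopen."""
    try:
        fn = libz.gzopen64
    except AttributeError:        # symbol not exported by this build
        fn = libz.gzopen
    fn.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    fn.restype = ctypes.c_void_p  # gzFile is an opaque pointer
    return fn


def read_gz(lib_path: str, path: str, bufsize: int = 8192) -> bytes:
    """Decompress a .gz file through a specific libz loaded by ctypes."""
    libz = ctypes.CDLL(lib_path)
    gzopen = _gzopen(libz)
    libz.gzread.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_uint]
    libz.gzread.restype = ctypes.c_int
    libz.gzclose.argtypes = [ctypes.c_void_p]
    libz.gzclose.restype = ctypes.c_int

    fp = gzopen(os.fsencode(path), b"rb")
    if not fp:
        raise OSError(f"gzopen failed for {path}")
    buf = ctypes.create_string_buffer(bufsize)
    chunks = []
    try:
        while True:
            n = libz.gzread(fp, buf, bufsize)
            if n < 0:
                raise OSError(f"gzread failed for {path}")
            if n == 0:
                break
            chunks.append(buf.raw[:n])
    finally:
        libz.gzclose(fp)
    return b"".join(chunks)
```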

So that's one way to fix it, but a better way would probably be to recompile your custom Python 2.6 to either link against the static zlib-1.2.8/libz.a or disable zlibmodule.c completely; then you'll have more flexibility in your linking options.


Update

Regarding _LARGEFILE_SOURCE vs. _LARGEFILE64_SOURCE: I only pointed that out because of this comment in zlib.h...

/* provide 64-bit offset functions if _LARGEFILE64_SOURCE defined, and/or
 * change the regular functions to 64 bits if _FILE_OFFSET_BITS is 64 (if
 * both are true, the application gets the *64 functions, and the regular
 * functions are changed to 64 bits) -- in case these are set on systems
 * without large file support, _LFS64_LARGEFILE must also be true
 */

...the implication being that the gzopen64() function won't be exposed if you don't define _LARGEFILE64_SOURCE. I'm not sure whether _LFS64_LARGEFILE applies to your system.

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.