How do I use a C library in a Rust library compiled to WebAssembly?

I'm experimenting with Rust, WebAssembly and C interoperability, to eventually use the Rust library (with a static C dependency) in the browser or Node.js. I'm using wasm-bindgen for the JavaScript glue code.

#![feature(libc, use_extern_macros)]
extern crate wasm_bindgen;

use wasm_bindgen::prelude::*;
use std::os::raw::c_char;
use std::ffi::CStr;

extern "C" {
    fn hello() -> *const c_char; // returns "hello from C" 
}

#[wasm_bindgen]
pub fn greet() -> String {
    let c_msg = unsafe { CStr::from_ptr(hello()) };
    format!("{} and Rust!", c_msg.to_str().unwrap())
}
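For reference, the C-string conversion in greet can be tried out natively. This is a minimal sketch, assuming a Rust stand-in for the C hello() (hypothetical, only there so the snippet runs without a C toolchain). It also swaps to_str().unwrap() for to_string_lossy(), so a non-UTF-8 message from C does not panic:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical stand-in for the C function `hello()`; the real one lives in hello.c.
// It returns a pointer to a static, NUL-terminated byte string, like a C string literal.
fn hello() -> *const c_char {
    b"hello from C\0".as_ptr() as *const c_char
}

pub fn greet() -> String {
    let raw = hello();
    if raw.is_null() {
        return String::from("(no message) and Rust!");
    }
    // SAFETY: `raw` is non-null and points to a NUL-terminated string that outlives this call.
    let c_msg = unsafe { CStr::from_ptr(raw) };
    // to_string_lossy replaces invalid UTF-8 instead of panicking like to_str().unwrap()
    format!("{} and Rust!", c_msg.to_string_lossy())
}
```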

My first naive approach was to have a build.rs script that uses the gcc crate to generate a static library from the C code. Before introducing the WASM bits, I could compile the Rust program and see the "hello from C" output in the console; now the compiler fails with:

rust-lld: error: unknown file type: hello.o

build.rs

extern crate gcc;

fn main() {
    gcc::Build::new()
        .file("src/hello.c")
        .compile("libhello.a");
}

This makes sense, now that I think about it, since the hello.o file was compiled for my laptop's architecture, not for WebAssembly.

Ideally I'd like this to work out of the box, with some magic in my build.rs that would, for example, compile the C library into a static WebAssembly library that Rust can use.

What I think could work, but would like to avoid since it sounds more problematic, is using Emscripten to create a WASM library for the C code, then compiling the Rust library separately and gluing the two together in JavaScript.

TL;DR: Jump to "New week, new adventures" in order to get "Hello from C and Rust!"

The nice way would be creating a WASM library and passing it to the linker. rustc has an option for that (and there seem to be source-code directives too):

rustc <yourcode.rs> --target wasm32-unknown-unknown --crate-type=cdylib -C link-arg=<library.wasm>

The trick is that the library has to be a library, so it needs to contain reloc (and in practice linking) sections. Emscripten seems to have a symbol for that, RELOCATABLE:

emcc <something.c> -s WASM=1 -s SIDE_MODULE=1 -s RELOCATABLE=1 -s EMULATED_FUNCTION_POINTERS=1 -s ONLY_MY_CODE=1 -o <something.wasm>

(EMULATED_FUNCTION_POINTERS is included with RELOCATABLE, so it is not really necessary; ONLY_MY_CODE strips some extras, but it does not matter here either.)

The thing is, emcc never generated a relocatable wasm file for me, at least not the version I downloaded this week, for Windows (I played this on hard difficulty, which in retrospect might not have been the best idea). So the sections are missing, and rustc keeps complaining that <something.wasm> is not a relocatable wasm file.

Then comes clang, which can generate a relocatable wasm module with a very simple one-liner:

clang -c <something.c> -o <something.wasm> --target=wasm32-unknown-unknown

Then rustc says "Linking sub-section ended prematurely". Aw, yes (by the way, my Rust setup was brand new too). Then I read that there are two clang wasm targets, wasm32-unknown-unknown-wasm and wasm32-unknown-unknown-elf, and maybe the latter should be used here. As my equally brand-new llvm+clang build runs into an internal error with this target, asking me to send an error report to the developers, it might be something to test on easy or medium, like on some *nix or Mac box.

Minimal success story: sum of three numbers

At this point I just added lld to llvm and succeeded in linking a test code manually from bitcode files:

clang cadd.c --target=wasm32-unknown-unknown -emit-llvm -c
rustc rsum.rs --target wasm32-unknown-unknown --crate-type=cdylib --emit llvm-bc
lld -flavor wasm rsum.bc cadd.bc -o msum.wasm --no-entry

Aw yes, it sums numbers, 2 in C and 1+2 in Rust:

cadd.c

int cadd(int x,int y){
  return x+y;
}

rsum.rs

extern "C" {
    fn cadd(x: i32, y: i32) -> i32;
}

#[no_mangle]
pub fn rsum(x: i32, y: i32, z: i32) -> i32 {
    x + unsafe { cadd(y, z) }
}

test.html

<script>
  fetch('msum.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env:{
          _ZN4core9panicking5panic17hfbb77505dc622acdE:alert
        }
      });
    })
    .then(instance => {
      alert(instance.exports.rsum(13,14,15));
    });
</script>

That _ZN4core9panicking5panic17hfbb77505dc622acdE feels very natural (the module is compiled and instantiated in two steps in order to log its exports and imports; that is one way to find such missing pieces), and it forecasts the demise of this attempt: the whole thing works only because there is no other reference to the runtime library, and this particular function could be mocked/provided manually.
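A likely source of that panic import (an assumption, not verified against this exact build): rsum.rs was compiled without -O, so x + cadd(...) carries Rust's debug-mode overflow check, whose failure path calls core::panicking::panic. If wrapping arithmetic is what you want (it matches wasm's i32.add), spelling it out removes the check; building with -O should also drop the overflow check for a plain +. The sketch uses a Rust stand-in for cadd (hypothetical) so it runs natively; in the real module cadd stays in the extern "C" block.

```rust
// Hypothetical stand-in for the C function `cadd` from cadd.c; in the real build this
// is declared in an `extern "C" { fn cadd(x: i32, y: i32) -> i32; }` block instead.
extern "C" fn cadd(x: i32, y: i32) -> i32 {
    x.wrapping_add(y)
}

// `extern "C"` makes the exported ABI explicit; `#[no_mangle]` keeps the name `rsum`.
#[no_mangle]
pub extern "C" fn rsum(x: i32, y: i32, z: i32) -> i32 {
    // wrapping_add never panics, so no overflow check (and no panic call) is emitted for it
    x.wrapping_add(cadd(y, z))
}
```

With this version, alert(instance.exports.rsum(13,14,15)) in test.html should need no env import for the addition itself.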

Side story: string

As alloc and its Layout thing scared me a little, I went with the vector-based approach described/used from time to time, for example here or on Hello, Rust!.
Here is an example that gets the "Hello from ..." string from the outside...

rhello.rs

use std::ffi::CStr;
use std::mem;
use std::os::raw::{c_char, c_void};
use std::ptr;

extern "C" {
    fn chello() -> *mut c_char;
}

// Hand out `size` bytes to the caller (JavaScript or C) and forget the Vec,
// so Rust does not free the buffer when this function returns.
#[no_mangle]
pub extern "C" fn alloc(size: usize) -> *mut c_void {
    let mut buf: Vec<u8> = Vec::with_capacity(size);
    let p = buf.as_mut_ptr();
    mem::forget(buf);
    p as *mut c_void
}

// Rebuild the Vec<u8> with the same capacity and drop it, which frees the buffer.
#[no_mangle]
pub extern "C" fn dealloc(p: *mut c_void, size: usize) {
    unsafe {
        let _ = Vec::from_raw_parts(p as *mut u8, 0, size);
    }
}

#[no_mangle]
pub extern "C" fn hello() -> *mut c_char {
    let phello = unsafe { chello() };
    let c_msg = unsafe { CStr::from_ptr(phello) };
    let message = format!("{} and Rust!", c_msg.to_str().unwrap());
    // chello() got its buffer from alloc(), so release it through dealloc()
    dealloc(phello as *mut c_void, c_msg.to_bytes().len() + 1);
    let bytes = message.as_bytes();
    let len = bytes.len();
    let p = alloc(len + 1) as *mut u8;
    unsafe {
        // copy the message, then append the terminating NUL
        ptr::copy_nonoverlapping(bytes.as_ptr(), p, len);
        ptr::write(p.add(len), 0);
    }
    p as *mut c_char
}

Built as rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib

... and actually working with JavaScript:

jhello.html

<script>
  var e;
  fetch('rhello.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env:{
          chello:function(){
            var s="Hello from JavaScript";
            var p=e.alloc(s.length+1);
            var m=new Uint8Array(e.memory.buffer);
            for(var i=0;i<s.length;i++)
              m[p+i]=s.charCodeAt(i);
            m[p+s.length]=0; // terminate the copy at p, not at address 0
            return p;
          }
        }
      });
    })
    .then(instance => {
      /*var*/ e=instance.exports;
      var ptr=e.hello();
      var optr=ptr;
      var m=new Uint8Array(e.memory.buffer);
      var s="";
      while(m[ptr]!=0)
        s+=String.fromCharCode(m[ptr++]);
      e.dealloc(optr,s.length+1);
      console.log(s);
    });
</script>

It is not particularly beautiful (actually I have no clue about Rust), but it does what I expect from it, and even dealloc might work (at least invoking it twice throws a panic).
There was an important lesson on the way: when the module manages its memory, its size may change, which invalidates the backing ArrayBuffer object and its views. That is why memory.buffer is fetched multiple times, and always fetched again after calling into wasm code.

And this is where I am stuck, because this code refers to runtime libraries and .rlib-s. The closest I could get to a manual build is the following:

rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib --emit obj
lld -flavor wasm rhello.o -o rhello.wasm --no-entry --allow-undefined
     liballoc-5235bf36189564a3.rlib liballoc_system-f0b9538845741d3e.rlib
     libcompiler_builtins-874d313336916306.rlib libcore-5725e7f9b84bd931.rlib
     libdlmalloc-fffd4efad67b62a4.rlib liblibc-453d825a151d7dec.rlib
     libpanic_abort-43290913ef2070ae.rlib libstd-dcc98be97614a8b6.rlib
     libunwind-8cd3b0417a81fb26.rlib

Here I had to use the lld sitting in the depths of the Rust toolchain, as .rlib-s are said to be interpreted, so they are bound to the Rust toolchain:

--crate-type=rlib, #[crate_type = "rlib"] - A "Rust library" file will be produced. This is used as an intermediate artifact and can be thought of as a "static Rust library". These rlib files, unlike staticlib files, are interpreted by the Rust compiler in future linkage. This essentially means that rustc will look for metadata in rlib files like it looks for metadata in dynamic libraries. This form of output is used to produce statically linked executables as well as staticlib outputs.

Of course this lld does not eat the .wasm/.o files generated with clang or llc ("Linking sub-section ended prematurely"); perhaps the Rust part should also be rebuilt with the custom llvm.
Also, this build seems to be missing the actual allocators: besides chello, there are 4 more entries in the import table: __rust_alloc, __rust_alloc_zeroed, __rust_dealloc and __rust_realloc. These could in fact be provided from JavaScript after all, but that defeats the idea of letting Rust handle its own memory, plus an allocator was present in the single-pass rustc build... Oh, yes, this is where I gave up for this week (Aug 11, 2018, at 21:56).
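The four __rust_* imports are, as far as I understand (treat this as an assumption), entry points of the allocator shim that rustc normally generates during its own final link step; with --emit obj and a manual lld run that step is skipped, so the symbols stay undefined. If Rust should own its memory regardless, one option is choosing an allocator explicitly with #[global_allocator]. The sketch below is a deliberately tiny bump allocator over a fixed static arena (BumpAllocator and ARENA_SIZE are hypothetical names). It never reuses freed memory, which can be acceptable for short-lived wasm instances but is a real trade-off. Whether it changes anything for the manual lld route is untested; its natural home is a normal rustc-driven cdylib build.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Size of the fixed arena (hypothetical value; tune to the module's needs).
const ARENA_SIZE: usize = 64 * 1024;

// A minimal bump allocator: hands out consecutive, aligned slices of a static arena
// and never frees anything.
pub struct BumpAllocator {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    next: AtomicUsize, // offset of the first unused byte in `arena`
}

// SAFETY: `next` is only advanced with compare_exchange, so concurrent callers get disjoint slices.
unsafe impl Sync for BumpAllocator {}

impl BumpAllocator {
    pub const fn new() -> Self {
        BumpAllocator {
            arena: UnsafeCell::new([0; ARENA_SIZE]),
            next: AtomicUsize::new(0),
        }
    }
}

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.get() as *mut u8;
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            // Round the absolute address up to the requested alignment (always a power of two).
            let addr = base as usize + current;
            let aligned = (addr + layout.align() - 1) & !(layout.align() - 1);
            let start = aligned - base as usize;
            let end = match start.checked_add(layout.size()) {
                Some(end) if end <= ARENA_SIZE => end,
                _ => return ptr::null_mut(), // arena exhausted
            };
            match self.next.compare_exchange_weak(current, end, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return base.add(start),
                Err(actual) => current = actual, // another caller moved `next`; retry
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator does not reclaim individual blocks.
    }
}
```

Installing it would be a matter of #[global_allocator] static GLOBAL: BumpAllocator = BumpAllocator::new(); in the crate root.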

New week, new adventures, with Binaryen, wasm-dis/merge

The idea was to modify the ready-made Rust code (having allocators and everything in place). And this one works. As long as your C code has no data.

Proof of concept code:

chello.c

void *alloc(int len); // allocator comes from Rust

char *chello(){
  char *hell=alloc(13);
  hell[0]='H';
  hell[1]='e';
  hell[2]='l';
  hell[3]='l';
  hell[4]='o';
  hell[5]=' ';
  hell[6]='f';
  hell[7]='r';
  hell[8]='o';
  hell[9]='m';
  hell[10]=' ';
  hell[11]='C';
  hell[12]=0;
  return hell;
}

Not exactly usual, but it is C code.

rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib
wasm-dis rhello.wasm -o rhello.wast
clang chello.c --target=wasm32-unknown-unknown -nostdlib -Wl,--no-entry,--export=chello,--allow-undefined
wasm-dis a.out -o chello.wast
wasm-merge rhello.wast chello.wast -o mhello.wasm -O

(rhello.rs is the same one presented in "Side story: string".)
And the result works as

mhello.html

<script>
  fetch('mhello.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env:{
          memoryBase: 0,
          tableBase: 0
        }
      });
    })
    .then(instance => {
      var e=instance.exports;
      var ptr=e.hello();
      console.log(ptr);
      var optr=ptr;
      var m=new Uint8Array(e.memory.buffer);
      var s="";
      while(m[ptr]!=0)
        s+=String.fromCharCode(m[ptr++]);
      e.dealloc(optr,s.length+1);
      console.log(s);
    });
</script>

Even the allocators seem to do something (ptr readings from repeated blocks with/without dealloc show that memory does not leak/leaks accordingly).

Of course this is super-fragile and has mysterious parts too:

  • if the final merge is run with the -S switch (generates source code instead of .wasm), and the resulting assembly file is compiled separately (using wasm-as), the result will be a couple of bytes shorter (and those bytes are somewhere in the very middle of the running code, not in the export/import/data sections)
  • the order of merging matters: the file of "Rust origin" has to come first. wasm-merge chello.wast rhello.wast [...] dies with an entertaining message

    [wasm-validator error in module] unexpected false: segment offset should be reasonable, on
    [i32] (i32.const 1)
    Fatal: error in validating output

  • probably my fault, but I had to build a complete chello.wasm module (so, with linking). Compiling only (clang -c [...]) produced the relocatable module that was missed so much at the very beginning of this story, but decompiling that one (to .wast) lost the named export (chello()):
    (export "chello" (func $chello)) disappears completely
    (func $chello ... becomes (func $0 ..., an internal function (wasm-dis loses the reloc and linking sections, putting only a remark about them and their size into the assembly source)
  • related to the previous one: this way (building a complete module), data from the secondary module cannot be relocated by wasm-merge. While there is a chance of catching references to the string itself (const char *HELLO="Hello from C"; becomes a constant at offset 1024 in particular, and is later referred to as (i32.const 1024) if it is a local constant, inside a function), it does not happen. And if it is a global constant, its address becomes a global constant too: the number 1024 is stored at offset 1040, and the string is referred to as (i32.load offset=1040 [...], which starts being difficult to catch.
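To make that last point concrete, here is a toy model in plain Rust (not Binaryen; all names are hypothetical) of why relocation info matters. The pointer stored at 1040 is just four little-endian bytes holding 1024, indistinguishable from any other integer unless a relocation entry says "this is an address". With such a list, moving the data segment is a mechanical patch; without it, a merger has nothing to go on.

```rust
// Toy linear memory laid out like the C module described above.
const STR_ADDR: usize = 1024; // where "Hello from C\0" lives
const PTR_ADDR: usize = 1040; // where the global `HELLO` pointer (value 1024) lives

fn build_image() -> Vec<u8> {
    let mut mem = vec![0u8; 2048];
    let s = b"Hello from C\0";
    mem[STR_ADDR..STR_ADDR + s.len()].copy_from_slice(s);
    // wasm linear memory is little-endian
    mem[PTR_ADDR..PTR_ADDR + 4].copy_from_slice(&(STR_ADDR as u32).to_le_bytes());
    mem
}

fn read_u32(mem: &[u8], at: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&mem[at..at + 4]);
    u32::from_le_bytes(b)
}

// Move everything from `seg_start` onward up by `delta` bytes, then add `delta` to every
// 32-bit address listed in `relocs` (offsets in the old layout, all >= `seg_start`).
fn relocate(mem: &[u8], seg_start: usize, delta: usize, relocs: &[usize]) -> Vec<u8> {
    let mut out = vec![0u8; mem.len() + delta];
    out[..seg_start].copy_from_slice(&mem[..seg_start]);
    out[seg_start + delta..].copy_from_slice(&mem[seg_start..]);
    for &old_off in relocs {
        let new_off = old_off + delta;
        let value = read_u32(&out, new_off) + delta as u32;
        out[new_off..new_off + 4].copy_from_slice(&value.to_le_bytes());
    }
    out
}
```

Relocating with the entry for 1040 yields a pointer to the moved string; relocating with an empty list leaves the stored value at 1024, which is the "Hello from C written into the middle of someone else's data" situation shown next.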

For laughs, this code compiles and works too...

void *alloc(int len);

int my_strlen(const char *ptr){
  int ret=0;
  while(*ptr++)ret++;
  return ret;
}

char *my_strcpy(char *dst,const char *src){
  char *ret=dst;
  while(*src)*dst++=*src++;
  *dst=0;
  return ret;
}

char *chello(){
  const char *HELLO="Hello from C";
  char *hell=alloc(my_strlen(HELLO)+1);
  return my_strcpy(hell,HELLO);
}

... except that it writes "Hello from C" into the middle of Rust's message pool, resulting in the printout

Hello from Clt::unwrap()` on an `Err`an value and Rust!

(Explanation: 0-initializers are not present in the recompiled code because of the optimization flag, -O.)
And it also raises the question of locating a libc: when I define these functions without the my_ prefix, clang mentions strlen and strcpy as built-ins and even tells their correct signatures, but it does not emit code for them, so they become imports of the resulting module.
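One way around the missing libc for the merged module (an assumption, not tried with wasm-merge) is letting the Rust side export the few functions the C code needs, under their C names. The sketch uses cfg_attr so the C names are only claimed on wasm32; a native build keeps the Rust names (rs_strlen and rs_strcpy are hypothetical) and does not clash with the host libc. The functions follow the usual C contracts and nothing more.

```rust
use std::os::raw::c_char;

// Exported as `strlen` only when targeting wasm32, so a native build does not
// shadow the platform libc.
#[cfg_attr(target_arch = "wasm32", export_name = "strlen")]
pub unsafe extern "C" fn rs_strlen(s: *const c_char) -> usize {
    let mut n = 0;
    // Count bytes until the terminating NUL, like C's strlen.
    while *s.add(n) != 0 {
        n += 1;
    }
    n
}

#[cfg_attr(target_arch = "wasm32", export_name = "strcpy")]
pub unsafe extern "C" fn rs_strcpy(dst: *mut c_char, src: *const c_char) -> *mut c_char {
    let len = rs_strlen(src);
    // Copy the string including its NUL; as with C's strcpy, the caller guarantees
    // that `dst` is large enough and that the buffers do not overlap.
    std::ptr::copy_nonoverlapping(src, dst, len + 1);
    dst
}
```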

Statement: technical posts on this site follow the CC BY-SA 4.0 license; if you reprint them, please cite this site's URL or the original address.