How do I use a C library in a Rust library compiled to WebAssembly?
I'm experimenting with Rust, WebAssembly and C interoperability, with the goal of eventually using a Rust library (with a static C dependency) in the browser or Node.js. I'm using wasm-bindgen for the JavaScript glue code.
#![feature(libc, use_extern_macros)]
extern crate wasm_bindgen;
use wasm_bindgen::prelude::*;
use std::os::raw::c_char;
use std::ffi::CStr;
extern "C" {
    fn hello() -> *const c_char; // returns "hello from C"
}
#[wasm_bindgen]
pub fn greet() -> String {
    let c_msg = unsafe { CStr::from_ptr(hello()) };
    format!("{} and Rust!", c_msg.to_str().unwrap())
}
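The pointer-to-String conversion in greet() can be tried natively, without any C code. This is only a minimal sketch: hello_stub is a hypothetical stand-in for the C function hello(), and greet_from is a made-up name for a variant that returns None instead of panicking on a null pointer or invalid UTF-8.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical stand-in for the C function `hello()`: it returns a pointer to
// a static, NUL-terminated byte string, as a C string literal would.
fn hello_stub() -> *const c_char {
    static MSG: &[u8] = b"hello from C\0";
    MSG.as_ptr() as *const c_char
}

// Same conversion as in `greet()`, but a null pointer or non-UTF-8 data
// yields None instead of a panic.
fn greet_from(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // Safety: the caller guarantees that `ptr` points to a NUL-terminated
    // string that stays valid for the duration of this call.
    let c_msg = unsafe { CStr::from_ptr(ptr) };
    let text = c_msg.to_str().ok()?;
    Some(format!("{} and Rust!", text))
}

fn main() {
    println!("{:?}", greet_from(hello_stub()));
}
```

Nothing here depends on WebAssembly; the difficulty in the question is purely about how the C side gets compiled and linked.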
My first naive approach was to have a build.rs script that uses the gcc crate to generate a static library from the C code. Before introducing the WASM bits, I could compile the Rust program and see the hello from C output in the console. Now I get an error from the compiler saying:
rust-lld: error: unknown file type: hello.o
build.rs
extern crate gcc;
fn main() {
    gcc::Build::new()
        .file("src/hello.c")
        .compile("libhello.a");
}
This makes sense, now that I think about it, since the hello.o file was compiled for my laptop's architecture, not for WebAssembly.
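One way to confirm what kind of object file the build produced is to look at its first bytes. The sketch below is an illustration under assumptions: object_kind is a hypothetical helper, it only knows three well-known magic numbers (WebAssembly, ELF and 64-bit Mach-O), and it reports everything else, including Windows COFF objects, as unknown.

```rust
// Classify an object file by its leading magic bytes.
// Only three formats are recognized; anything else is reported as "unknown".
fn object_kind(bytes: &[u8]) -> &'static str {
    match bytes {
        [0x00, b'a', b's', b'm', ..] => "WebAssembly",
        [0x7f, b'E', b'L', b'F', ..] => "ELF (Linux and most Unix systems)",
        [0xcf, 0xfa, 0xed, 0xfe, ..] => "Mach-O 64-bit (macOS)",
        _ => "unknown",
    }
}

fn main() {
    // With a path argument, classify that file; otherwise classify a bare
    // WebAssembly header as a demonstration.
    match std::env::args().nth(1) {
        Some(path) => match std::fs::read(&path) {
            Ok(bytes) => println!("{}: {}", path, object_kind(&bytes)),
            Err(err) => eprintln!("cannot read {}: {}", path, err),
        },
        None => println!("{}", object_kind(b"\0asm\x01\0\0\0")),
    }
}
```

Running it against hello.o from a native build should report a native format rather than WebAssembly, which is consistent with the linker refusing the file.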
Ideally I'd like this to work out of the box by adding some magic to my build.rs that would, for example, compile the C library into a static WebAssembly library that Rust can use.
What I think could work, but would like to avoid since it sounds more problematic, is using Emscripten to create a WASM library from the C code, then compiling the Rust library separately and gluing the two together in JavaScript.
TL;DR: Jump to "New week, new adventures" in order to get "Hello from C and Rust!"
The nice way would be creating a WASM library and passing it to the linker. rustc has an option for that (and there seem to be source-code directives too):
rustc <yourcode.rs> --target wasm32-unknown-unknown --crate-type=cdylib -C link-arg=<library.wasm>
The trick is that the library has to be a library, so it needs to contain reloc (and in practice linking) sections. Emscripten seems to have a symbol for that, RELOCATABLE:
emcc <something.c> -s WASM=1 -s SIDE_MODULE=1 -s RELOCATABLE=1 -s EMULATED_FUNCTION_POINTERS=1 -s ONLY_MY_CODE=1 -o <something.wasm>
(EMULATED_FUNCTION_POINTERS is included with RELOCATABLE, so it is not really necessary; ONLY_MY_CODE strips some extras, but it does not matter here either.)
The thing is, emcc never generated a relocatable wasm file for me, at least not the version I downloaded this week, for Windows (I played this on hard difficulty, which in retrospect might not have been the best idea). So the sections are missing, and rustc keeps complaining that <something.wasm> is not a relocatable wasm file.
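Whether a module carries those sections can be checked by walking the binary and collecting the names of its custom sections. The following is a sketch, not a validator: custom_section_names and looks_relocatable are hypothetical helpers, the section name "linking" is taken from the WebAssembly tool conventions as I understand them, and nothing here verifies what is inside the section.

```rust
// Decode an unsigned LEB128 u32 starting at `*pos`, advancing `*pos`.
fn read_u32_leb(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 32 {
            return None; // too many bytes for a u32
        }
        result |= ((byte & 0x7f) as u32) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

// Return the names of all custom sections (section id 0), or None if the
// input is not a structurally plausible wasm binary.
fn custom_section_names(wasm: &[u8]) -> Option<Vec<String>> {
    if wasm.len() < 8 || !wasm.starts_with(b"\0asm") {
        return None;
    }
    let mut names = Vec::new();
    let mut pos = 8; // skip the magic number and the version field
    while pos < wasm.len() {
        let id = wasm[pos];
        pos += 1;
        let size = read_u32_leb(wasm, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        if end > wasm.len() {
            return None;
        }
        if id == 0 {
            // A custom section starts with its name as a length-prefixed string.
            let mut p = pos;
            let name_len = read_u32_leb(wasm, &mut p)? as usize;
            let name_end = p.checked_add(name_len)?;
            if name_end > end {
                return None;
            }
            names.push(String::from_utf8_lossy(&wasm[p..name_end]).into_owned());
        }
        pos = end;
    }
    Some(names)
}

// Assumption: a relocatable object is recognized by a "linking" custom section.
fn looks_relocatable(wasm: &[u8]) -> bool {
    custom_section_names(wasm)
        .map(|names| names.iter().any(|n| n == "linking"))
        .unwrap_or(false)
}

fn main() {
    // Hand-built module: header plus one custom section named "linking".
    let mut module = b"\0asm\x01\0\0\0".to_vec();
    module.extend_from_slice(&[0x00, 0x09, 0x07]);
    module.extend_from_slice(b"linking");
    module.push(0x02);
    println!("{:?}", custom_section_names(&module));
    println!("relocatable: {}", looks_relocatable(&module));
}
```

Feeding it the bytes of the emcc output (for example via std::fs::read) would show directly whether the linking section is absent.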
Then comes clang, which can generate a relocatable wasm module with a very simple one-liner:
clang -c <something.c> -o <something.wasm> --target=wasm32-unknown-unknown
Then rustc says "Linking sub-section ended prematurely". Aw, yes (by the way, my Rust setup was brand new too). Then I read that there are two clang wasm targets: wasm32-unknown-unknown-wasm and wasm32-unknown-unknown-elf, and maybe the latter one should be used here. As my also brand new llvm+clang build runs into an internal error with this target, asking me to send an error report to the developers, it might be something to test on easy or medium, like on some *nix or Mac box.
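For completeness, here is how the clang one-liner might be driven from a build.rs. This is a sketch under assumptions and not something I got working end to end: clang_args is a hypothetical helper, src/hello.c is the path from the question, and the produced object would still have to be handed to the linker and accepted by it, which is exactly the step that failed above.

```rust
use std::env;
use std::path::Path;
use std::process::Command;

// Build the clang argument list for compiling one C file to an object file.
// The wasm target flag is the one from the one-liner above.
fn clang_args(target: &str, src: &str, out: &str) -> Vec<String> {
    let mut args = vec![
        "-c".to_string(),
        src.to_string(),
        "-o".to_string(),
        out.to_string(),
    ];
    if target.starts_with("wasm32") {
        args.push("--target=wasm32-unknown-unknown".to_string());
    }
    args
}

fn main() {
    // Cargo sets TARGET and OUT_DIR for build scripts; the fallbacks only
    // make this sketch runnable outside of Cargo.
    let target = env::var("TARGET").unwrap_or_else(|_| "wasm32-unknown-unknown".to_string());
    let out_dir = env::var("OUT_DIR").unwrap_or_else(|_| ".".to_string());
    let src = "src/hello.c";
    let out = format!("{}/hello.o", out_dir);
    let args = clang_args(&target, src, &out);
    println!("clang {}", args.join(" "));
    // Only invoke the compiler when the source file is actually there.
    if Path::new(src).exists() {
        match Command::new("clang").args(&args).status() {
            Ok(status) => println!("clang exited with {}", status),
            Err(err) => eprintln!("could not run clang: {}", err),
        }
    }
}
```

Keeping the argument construction in its own function makes it easy to check that native targets are left alone and only wasm32 targets get the extra flag.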
At this point I just added lld to llvm and succeeded in linking test code manually from bitcode files:
clang cadd.c --target=wasm32-unknown-unknown -emit-llvm -c
rustc rsum.rs --target wasm32-unknown-unknown --crate-type=cdylib --emit llvm-bc
lld -flavor wasm rsum.bc cadd.bc -o msum.wasm --no-entry
Aw yes, it sums numbers, 2 in C and 1+2 in Rust:
cadd.c
int cadd(int x, int y) {
    return x + y;
}
msum.rs
extern "C" {
    fn cadd(x: i32, y: i32) -> i32;
}
#[no_mangle]
pub fn rsum(x: i32, y: i32, z: i32) -> i32 {
    x + unsafe { cadd(y, z) }
}
test.html
<script>
  fetch('msum.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env: {
          _ZN4core9panicking5panic17hfbb77505dc622acdE: alert
        }
      });
    })
    .then(instance => {
      alert(instance.exports.rsum(13, 14, 15));
    });
</script>
That _ZN4core9panicking5panic17hfbb77505dc622acdE feels very natural (the module is compiled and instantiated in two steps in order to log the exports and imports, which is one way such missing pieces can be found), and forecasts the demise of this attempt: the entire thing works only because there is no other reference to the runtime library, and this particular function could be mocked/provided manually.
Side story: string
As alloc and its Layout thing scared me a little, I went with the vector-based approach described/used from time to time, for example here or on Hello, Rust!.
Here is an example, getting the "Hello from ..." string from the outside...
rhello.rs
use std::ffi::CStr;
use std::mem;
use std::os::raw::{c_char, c_void};
use std::ptr;
extern "C" {
    fn chello() -> *mut c_char;
}
#[no_mangle]
pub fn alloc(size: usize) -> *mut c_void {
    let mut buf = Vec::with_capacity(size);
    let p = buf.as_mut_ptr();
    mem::forget(buf);
    p as *mut c_void
}
#[no_mangle]
pub fn dealloc(p: *mut c_void, size: usize) {
    unsafe {
        let _ = Vec::from_raw_parts(p, 0, size);
    }
}
#[no_mangle]
pub fn hello() -> *mut c_char {
    let phello = unsafe { chello() };
    let c_msg = unsafe { CStr::from_ptr(phello) };
    let message = format!("{} and Rust!", c_msg.to_str().unwrap());
    dealloc(phello as *mut c_void, c_msg.to_bytes().len() + 1);
    let bytes = message.as_bytes();
    let len = message.len();
    let p = alloc(len + 1) as *mut u8;
    unsafe {
        for i in 0..len as isize {
            ptr::write(p.offset(i), bytes[i as usize]);
        }
        ptr::write(p.offset(len as isize), 0);
    }
    p as *mut c_char
}
Built as rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib
... and actually working with JavaScript:
jhello.html
<script>
  var e;
  fetch('rhello.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env: {
          chello: function() {
            var s = "Hello from JavaScript";
            var p = e.alloc(s.length + 1);
            var m = new Uint8Array(e.memory.buffer);
            for (var i = 0; i < s.length; i++)
              m[p + i] = s.charCodeAt(i);
            m[p + s.length] = 0; // terminating zero goes after the string, relative to p
            return p;
          }
        }
      });
    })
    .then(instance => {
      /*var*/ e = instance.exports;
      var ptr = e.hello();
      var optr = ptr;
      var m = new Uint8Array(e.memory.buffer);
      var s = "";
      while (m[ptr] != 0)
        s += String.fromCharCode(m[ptr++]);
      e.dealloc(optr, s.length + 1);
      console.log(s);
    });
</script>
It is not particularly beautiful (actually I have no clue about Rust), but it does what I expect from it, and even that dealloc might work (at least invoking it twice throws a panic).
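The alloc/dealloc pair can also be exercised natively, which makes the size bookkeeping easier to see. This is a sketch and not the code used above: it swaps in a boxed slice instead of Vec::with_capacity so that the freed size is exactly the allocated size, and to_c_buffer is a hypothetical helper that replaces the byte-by-byte copy loop.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::ptr;

// Allocate exactly `size` zeroed bytes and hand ownership to the caller.
fn alloc(size: usize) -> *mut u8 {
    let boxed: Box<[u8]> = vec![0u8; size].into_boxed_slice();
    Box::into_raw(boxed) as *mut u8
}

// Safety: `p` must come from `alloc(size)` with the same `size`, and must
// not be freed more than once.
unsafe fn dealloc(p: *mut u8, size: usize) {
    unsafe {
        drop(Box::from_raw(ptr::slice_from_raw_parts_mut(p, size)));
    }
}

// Copy `s` into a fresh buffer and terminate it with a NUL byte.
// Returns the pointer together with the total size to pass to `dealloc`.
// Note: an interior NUL in `s` would cut the C string short.
fn to_c_buffer(s: &str) -> (*mut c_char, usize) {
    let size = s.len() + 1;
    let p = alloc(size);
    unsafe {
        ptr::copy_nonoverlapping(s.as_ptr(), p, s.len());
        p.add(s.len()).write(0);
    }
    (p as *mut c_char, size)
}

fn main() {
    let (p, size) = to_c_buffer("Hello from C and Rust!");
    // Read the string back the way the JavaScript side does: up to the NUL.
    let text = unsafe { CStr::from_ptr(p) }.to_str().unwrap().to_owned();
    unsafe { dealloc(p as *mut u8, size) };
    println!("{} ({} bytes including NUL)", text, size);
}
```

The caller has to remember the size, which is why the JavaScript code passes s.length+1 back to dealloc.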
There was an important lesson on the way: when the module manages its own memory, the memory's size may change, which invalidates the backing ArrayBuffer object and its views. That is why memory.buffer is read multiple times, and read again after calling into wasm code.
And this is where I am stuck, because this code would refer to runtime libraries, and .rlib-s. The closest I could get to a manual build is the following:
rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib --emit obj
lld -flavor wasm rhello.o -o rhello.wasm --no-entry --allow-undefined
liballoc-5235bf36189564a3.rlib liballoc_system-f0b9538845741d3e.rlib
libcompiler_builtins-874d313336916306.rlib libcore-5725e7f9b84bd931.rlib
libdlmalloc-fffd4efad67b62a4.rlib liblibc-453d825a151d7dec.rlib
libpanic_abort-43290913ef2070ae.rlib libstd-dcc98be97614a8b6.rlib
libunwind-8cd3b0417a81fb26.rlib
Here I had to use the lld sitting in the depths of the Rust toolchain, as .rlib-s are said to be interpreted, so they are bound to the Rust toolchain:
--crate-type=rlib, #[crate_type = "rlib"] - A "Rust library" file will be produced. This is used as an intermediate artifact and can be thought of as a "static Rust library". These rlib files, unlike staticlib files, are interpreted by the Rust compiler in future linkage. This essentially means that rustc will look for metadata in rlib files like it looks for metadata in dynamic libraries. This form of output is used to produce statically linked executables as well as staticlib outputs.
Of course this lld does not eat the .wasm/.o files generated with clang or llc ("Linking sub-section ended prematurely"); perhaps the Rust part should also be rebuilt with the custom llvm.
Also, this build seems to be missing the actual allocators: besides chello, there will be 4 more entries in the import table: __rust_alloc, __rust_alloc_zeroed, __rust_dealloc and __rust_realloc. These could in fact be provided from JavaScript after all, but that defeats the idea of letting Rust handle its own memory, plus an allocator was present in the single-pass rustc build... Oh, yes, this is where I gave up for this week (Aug 11, 2018, at 21:56)
New week, new adventures: wasm-dis/merge
The idea was to modify the ready-made Rust code (having allocators and everything in place). And this one works. As long as your C code has no data.
Proof of concept code:
chello.c
void *alloc(int len); // allocator comes from Rust
char *chello() {
    char *hell = alloc(13);
    hell[0] = 'H';
    hell[1] = 'e';
    hell[2] = 'l';
    hell[3] = 'l';
    hell[4] = 'o';
    hell[5] = ' ';
    hell[6] = 'f';
    hell[7] = 'r';
    hell[8] = 'o';
    hell[9] = 'm';
    hell[10] = ' ';
    hell[11] = 'C';
    hell[12] = 0;
    return hell;
}
Not extremely usual, but it is C code.
rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib
wasm-dis rhello.wasm -o rhello.wast
clang chello.c --target=wasm32-unknown-unknown -nostdlib -Wl,--no-entry,--export=chello,--allow-undefined
wasm-dis a.out -o chello.wast
wasm-merge rhello.wast chello.wast -o mhello.wasm -O
(rhello.rs is the same one presented in "Side story: string")
And the result works as
mhello.html
<script>
  fetch('mhello.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env: {
          memoryBase: 0,
          tableBase: 0
        }
      });
    })
    .then(instance => {
      var e = instance.exports;
      var ptr = e.hello();
      console.log(ptr);
      var optr = ptr;
      var m = new Uint8Array(e.memory.buffer);
      var s = "";
      while (m[ptr] != 0)
        s += String.fromCharCode(m[ptr++]);
      e.dealloc(optr, s.length + 1);
      console.log(s);
    });
</script>
Even the allocators seem to do something (ptr readings from repeated blocks with/without dealloc show how memory does not leak/leaks accordingly).
Of course this is super-fragile and has mysterious parts too:
- If the final merge is run with the -S switch (which generates source code instead of .wasm), and the resulting assembly file is compiled separately (using wasm-as), the result will be a couple of bytes shorter (and those bytes are somewhere in the very middle of the running code, not in the export/import/data sections).
- The order of the inputs matters: with the two files swapped, wasm-merge chello.wast rhello.wast [...] dies with an entertaining message:
  [wasm-validator error in module] unexpected false: segment offset should be reasonable, on
  [i32] (i32.const 1)
  Fatal: error in validating output
- I had to build a complete chello.wasm module (so, with linking). Compiling only (clang -c [...]) resulted in the relocatable module which was missed so much at the very beginning of this story, but decompiling that one (to .wast) lost the named export (chello()): (export "chello" (func $chello)) disappears completely, and (func $chello ... becomes (func $0 ..., an internal function (wasm-dis loses the reloc and linking sections, putting only a remark about them and their size into the assembly source).
- Data from the C module is not relocated by wasm-merge: while there is a chance of catching references to the string itself (const char *HELLO="Hello from C"; becomes a constant at offset 1024 in particular, and is later referred to as (i32.const 1024) if it is a local constant, inside a function), it does not happen. And if it is a global constant, its address becomes a global constant too, the number 1024 stored at offset 1040, and the string is going to be referred to as (i32.load offset=1040 [...], which starts being difficult to catch.
For laughs, this code compiles and works too...
void *alloc(int len);
int my_strlen(const char *ptr) {
    int ret = 0;
    while (*ptr++) ret++;
    return ret;
}
char *my_strcpy(char *dst, const char *src) {
    char *ret = dst;
    while (*src) *dst++ = *src++;
    *dst = 0;
    return ret;
}
char *chello() {
    const char *HELLO = "Hello from C";
    char *hell = alloc(my_strlen(HELLO) + 1);
    return my_strcpy(hell, HELLO);
}
... just it writes "Hello from C" in the middle of Rust's message pool, resulting in the printout
Hello from Clt::unwrap()` on an `Err`an value and Rust!
(Explanation: 0-initializers are not present in the recompiled code because of the optimization flag, -O)
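The effect can be reproduced on a plain byte array standing in for linear memory. This is a toy model under assumptions: place_segment and read_c_string are hypothetical helpers, offset 1024 is the one mentioned above, and the "message pool" text is only an illustrative stand-in for whatever strings Rust really stores there.

```rust
// Copy a data segment into simulated linear memory at a fixed offset,
// the way active data segments are written at instantiation.
fn place_segment(memory: &mut [u8], offset: usize, data: &[u8]) {
    memory[offset..offset + data.len()].copy_from_slice(data);
}

// Read a NUL-terminated string starting at `offset`, like the JavaScript
// loop that walks memory until it finds a zero byte.
fn read_c_string(memory: &[u8], offset: usize) -> String {
    let tail = &memory[offset..];
    let len = tail.iter().position(|&b| b == 0).unwrap_or(tail.len());
    String::from_utf8_lossy(&tail[..len]).into_owned()
}

fn main() {
    let mut memory = vec![0u8; 2048];
    // Illustrative stand-in for Rust's message pool, placed at offset 1024.
    place_segment(&mut memory, 1024, b"called `Option::unwrap()` on a `None` value");
    // The C literal lands at the same offset because nothing relocated it,
    // and its trailing zero byte was dropped as a 0-initializer.
    place_segment(&mut memory, 1024, b"Hello from C");
    // Prints: Hello from Con::unwrap()` on a `None` value
    println!("{}", read_c_string(&memory, 1024));
}
```

The C string reads fine up to its last character and then continues into whatever Rust text happens to follow, which matches the shape of the garbled printout.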
And it also brings up the question of locating a libc (when these functions are defined without the my_ prefix, clang mentions strlen and strcpy as built-ins and even reports their correct signatures, but it does not emit code for them and they become imports of the resulting module).