
Converting output of libc::getcwd into a string

I'd like to print out the result of libc::getcwd. My issue is that getcwd takes an i8 (c_char) buffer, whereas String::from_utf8 needs a u8 buffer. I started with:

static BUF_BYTES: usize = 4096;

fn main() {
    unsafe {
        let mut buf: Vec<i8> = Vec::with_capacity(BUF_BYTES as usize);
        libc::getcwd(buf.as_mut_ptr(), buf.len());
        let s = String::from_utf8(buf).expect("Found invalid UTF-8");
        println!("result: {}", s);
    }
}

Which produces the error:

14:32 error: mismatched types:
 expected `std::vec::Vec<u8>`,
    found `std::vec::Vec<i8>` [E0308]

Thanks to the comments, I changed buf into a Vec<u8> and cast it to a c_char buffer in the getcwd call:

    let mut buf: Vec<u8> = Vec::with_capacity(BUF_BYTES as usize);
    libc::getcwd(buf.as_mut_ptr() as *mut c_char, buf.len());

This compiles, but now the string is empty (length: 0) when printed.

I found that getcwd returns NULL (libc::getcwd(...).is_null() is true). Reading the last error via the external crate errno (why is this a separate crate from libc?) reveals that getcwd fails with "Invalid argument". The source of the problem seems to be that buf.len() returns 0.

In most cases, you should just use env::current_dir. This correctly handles all the platform specifics for you, such as the "other" encodings mentioned in the comments.
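As a minimal sketch of that recommendation, the standard library call needs no unsafe code and returns a PathBuf rather than a String. The helper name cwd is only for illustration.

```rust
use std::env;
use std::io;
use std::path::PathBuf;

/// Returns the current working directory (hypothetical helper name).
fn cwd() -> io::Result<PathBuf> {
    env::current_dir()
}

fn main() -> io::Result<()> {
    let dir = cwd()?;
    // display() is lossy for non-UTF-8 paths, which is acceptable for printing.
    println!("result: {}", dir.display());
    Ok(())
}
```

If a String is really required, dir.to_str() returns None for paths that are not valid UTF-8, which makes the lossy case explicit.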


C strings are kind of terrible. getcwd fills a buffer of some length, but doesn't tell you where it ends; you have to manually find the terminating NUL byte.

extern crate libc;

static BUF_BYTES: usize = 4096;

fn main() {
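    let buf = unsafe {
        // Reserve space; the length stays 0 until set_len is called
        let mut buf: Vec<u8> = Vec::with_capacity(BUF_BYTES);
        // Pass the capacity, not the length, as the buffer size
        let res = libc::getcwd(buf.as_mut_ptr() as *mut libc::c_char, buf.capacity());
        if res.is_null() {
            panic!("Not long enough");
        }
        // Find the terminating NUL and record how many bytes were written
        let mut len = 0;
        while *buf.as_mut_ptr().offset(len as isize) != 0 { len += 1 }
        buf.set_len(len);
        buf
    };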

    let s = String::from_utf8(buf).expect("Found invalid UTF-8");
    println!("result: {}", s);
}
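For comparison, here is a hedged sketch of the same NUL search written against a safe slice. It assumes the buffer was zero-initialized (vec![0u8; N]) instead of merely reserved, so a slice over it is valid; until_nul is a hypothetical helper, not part of libc or std.

```rust
/// Returns the bytes before the first NUL, or None if the buffer has no NUL.
fn until_nul(buf: &[u8]) -> Option<&[u8]> {
    buf.iter().position(|&b| b == 0).map(|end| &buf[..end])
}

fn main() {
    // A zero-initialized buffer, filled as if a C function had written "/home".
    let mut buf = vec![0u8; 16];
    buf[..5].copy_from_slice(b"/home");

    let path = until_nul(&buf).expect("no NUL terminator");
    println!("{}", String::from_utf8_lossy(path));
}
```

The trade-off is that the buffer is zeroed up front, in exchange for doing the search without raw pointer arithmetic.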

seems that buf.len() returns 0

Yes, the length is zero because no one told the vector that data was added. A vector is composed of three parts: a pointer to data, a length, and a capacity.

The capacity is how much memory is available; the length is how much is used. When treating the vector as a blob to store data into, you want to use the capacity. You then need to inform the vector how many of those bytes were used, so that String::from_utf8 knows where the end is.
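A small sketch of the length/capacity distinction, with no C function involved: bytes are copied in through the raw pointer, the way getcwd would write them, and the vector only learns about them through set_len. The name fill_via_pointer is made up for this example.

```rust
use std::ptr;

/// Copies `src` into the spare capacity of a fresh Vec through its raw
/// pointer, then tells the Vec how many bytes are valid.
fn fill_via_pointer(src: &[u8], capacity: usize) -> Vec<u8> {
    assert!(src.len() <= capacity);
    let mut buf: Vec<u8> = Vec::with_capacity(capacity);

    // Memory is reserved, but nothing counts as "used" yet.
    assert_eq!(buf.len(), 0);
    assert!(buf.capacity() >= capacity);

    unsafe {
        // SAFETY: buf has room for at least src.len() bytes and the two
        // regions cannot overlap because buf was just allocated.
        ptr::copy_nonoverlapping(src.as_ptr(), buf.as_mut_ptr(), src.len());
        // SAFETY: the first src.len() bytes were initialized just above.
        buf.set_len(src.len());
    }
    buf
}

fn main() {
    let buf = fill_via_pointer(b"/tmp", 4096);
    println!("len = {}, text = {}", buf.len(), String::from_utf8_lossy(&buf));
}
```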

You'll note that I changed the scope of unsafe to include only the truly unsafe aspects and the code that makes them actually safe.


In fact, you could just copy the implementation of env::current_dir for Unix-like systems. It handles the failure cases much more nicely and uses the proper types (paths aren't strings). Of course, it's even easier to just call env::current_dir. ^_^
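The following is a rough sketch in that spirit, not the standard library's actual code: it retries with a larger buffer when getcwd reports ERANGE and returns a PathBuf instead of a String. To stay self-contained it declares getcwd by hand rather than using the libc crate, and the ERANGE value of 34 is an assumption that holds on Linux and macOS; cwd_via_getcwd is a hypothetical name.

```rust
use std::ffi::{CStr, OsString};
use std::io;
use std::os::raw::c_char;
use std::os::unix::ffi::OsStringExt;
use std::path::PathBuf;

// POSIX getcwd, declared by hand so the sketch needs no external crate.
// The `unsafe extern` form needs Rust 1.82 or newer.
unsafe extern "C" {
    fn getcwd(buf: *mut c_char, size: usize) -> *mut c_char;
}

// Assumed errno value for "result too large" (Linux and macOS).
const ERANGE: i32 = 34;

fn cwd_via_getcwd() -> io::Result<PathBuf> {
    let mut buf: Vec<u8> = Vec::with_capacity(512);
    loop {
        let ptr = buf.as_mut_ptr() as *mut c_char;
        // SAFETY: ptr points to buf.capacity() writable bytes.
        let res = unsafe { getcwd(ptr, buf.capacity()) };
        if !res.is_null() {
            // SAFETY: on success getcwd wrote a NUL-terminated string.
            let len = unsafe { CStr::from_ptr(ptr) }.to_bytes().len();
            // SAFETY: the first len bytes were initialized by getcwd.
            unsafe { buf.set_len(len) };
            // Paths are arbitrary bytes on Unix, so no UTF-8 check is needed.
            return Ok(PathBuf::from(OsString::from_vec(buf)));
        }
        let err = io::Error::last_os_error();
        if err.raw_os_error() != Some(ERANGE) {
            return Err(err);
        }
        // Buffer too small: the length is still 0, so this at least
        // doubles the capacity before the next attempt.
        let cap = buf.capacity();
        buf.reserve(cap * 2);
    }
}

fn main() -> io::Result<()> {
    println!("result: {}", cwd_via_getcwd()?.display());
    Ok(())
}
```

Returning io::Result instead of panicking lets the caller decide how to report a failure such as a deleted working directory.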


fyi: I ended up with this

extern crate libc;

use std::ffi::CStr;
use std::io;
use std::str;

static BUF_BYTES: usize = 4096;

fn main() {
    let buf = unsafe {
        let mut buf = Vec::with_capacity(BUF_BYTES);
        let ptr = buf.as_mut_ptr() as *mut libc::c_char;
        if libc::getcwd(ptr, buf.capacity()).is_null() {
            panic!(io::Error::last_os_error());
        }
        CStr::from_ptr(ptr).to_bytes()
    };
    println!("result: {}", str::from_utf8(buf).unwrap());
}

This is unsafe and will lead to crashes (in the best case) or silent memory corruption or worse.

When a block ends, any variables within it will be dropped. In this case, the unsafe block creates buf, takes a pointer to it, makes a CStr with the pointer, then frees the Vec, invalidating the pointer. The block then returns the bytes borrowed from that CStr, which now refer to freed memory.

Something like this is better:

extern crate libc;

use std::ffi::{CStr, CString};
use std::io;

static BUF_BYTES: usize = 4096;

fn main() {
    let buf = unsafe {
        // Allocate some space to store the result
        let mut buf = Vec::with_capacity(BUF_BYTES);

        // Call the function, panicking if it fails
        let ptr = buf.as_mut_ptr() as *mut libc::c_char;
        if libc::getcwd(ptr, buf.capacity()).is_null() {
            panic!("{}", io::Error::last_os_error());
        }

        // Find the first NUL and inform the vector of that
        let s = CStr::from_ptr(ptr);
        buf.set_len(s.to_bytes().len());

        // Transfer ownership of the Vec to a CString, ensuring there are no interior NULs
        CString::new(buf)
    };

    let s = buf.expect("Not a C string").into_string().expect("Not UTF-8");
    println!("result: {}", s);
}

I wonder why this has actually worked

Likely because nothing changed the memory before you attempted to access it. In a heavily multithreaded environment, I could see more issues arising.

Why is it possible to have two mutable references to the vector? First as mut buf and then as ptr = buf.as_mut_ptr(). The ownership has not moved, has it? Otherwise, why is it possible to call buf.capacity()?

You don't actually have two references. buf owns the value, then you get a mutable pointer. There is no compiler protection for pointers, which is part of the reason that an unsafe block is needed.
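A short sketch of that point: as_mut_ptr borrows the vector only for the duration of the call, so the vector can still be used afterwards, while dereferencing the returned raw pointer is what requires unsafe. The function name overwrite_first is invented for the example.

```rust
/// Overwrites the first element through a raw pointer and returns the
/// vector's capacity, read while that raw pointer is still alive.
fn overwrite_first(v: &mut Vec<u8>, value: u8) -> usize {
    // The mutable borrow ends when as_mut_ptr returns; `ptr` is a plain
    // raw pointer that the borrow checker does not track.
    let ptr = v.as_mut_ptr();

    // Still allowed: no reference to `v` is outstanding.
    let cap = v.capacity();

    if !v.is_empty() {
        // SAFETY: the vector is non-empty and was not reallocated since
        // `ptr` was taken, so `ptr` points at its first element.
        unsafe { *ptr = value };
    }
    cap
}

fn main() {
    let mut v = vec![1u8, 2, 3];
    let cap = overwrite_first(&mut v, 9);
    println!("v = {:?}, capacity = {}", v, cap);
}
```

Keeping the pointer valid (no reallocation, no drop) is entirely the programmer's responsibility, which is the gap the unsafe block acknowledges.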

