OS and non-UTF8 Strings

Recipe                Crates    Categories
bstr                  bstr      cat-text-processing
CString and CStr      std       cat-text-processing
OsString and OsStr    std       cat-text-processing

OsString and OsStr

std::ffi::OsString is a type that can represent owned, mutable platform-native strings, but is cheaply inter-convertible with Rust strings.

The need for this type arises from the fact that:

  • On Unix systems, strings are often arbitrary sequences of non-zero bytes, in many cases interpreted as UTF-8.
  • On Windows, strings are often arbitrary sequences of non-zero 16-bit values, interpreted as UTF-16 when it is valid to do so.
  • In Rust, strings are always valid UTF-8, which may contain zeros.

OsString and OsStr bridge this gap by simultaneously representing Rust and platform-native string values, and in particular by allowing a Rust string to be converted into an “OS” string at no cost where possible. A consequence of this is that OsString instances are not NUL-terminated; to pass one to, e.g., a Unix system call, you should first convert it to a nul-terminated C string (an owned CString, borrowed as a &CStr).
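
The conversions described above can be sketched as follows. This is a minimal illustration, not a complete recipe: the helper names (`display_lossy`, `non_utf8_os_string`, `to_c_string`) are made up for this example, and the parts that build or inspect raw bytes rely on the Unix-only extension traits in `std::os::unix::ffi`, so they are guarded with `#[cfg(unix)]`.

```rust
use std::ffi::{CString, NulError, OsStr, OsString};

/// Returns a printable copy of an OS string, replacing anything that is
/// not valid Unicode with U+FFFD (hypothetical helper for this sketch).
fn display_lossy(s: &OsStr) -> String {
    s.to_string_lossy().into_owned()
}

/// Builds an OS string that is not valid UTF-8 (Unix only).
#[cfg(unix)]
fn non_utf8_os_string() -> OsString {
    use std::os::unix::ffi::OsStringExt;
    // 0xFF can never appear in well-formed UTF-8.
    OsString::from_vec(vec![b'f', b'o', 0xFF, b'o'])
}

/// Copies an OS string into a nul-terminated C string (Unix only).
/// Fails if the OS string contains an interior nul byte.
#[cfg(unix)]
fn to_c_string(s: &OsStr) -> Result<CString, NulError> {
    use std::os::unix::ffi::OsStrExt;
    CString::new(s.as_bytes())
}

fn main() {
    // Borrowing a &str as an &OsStr does not copy or allocate.
    let borrowed: &OsStr = OsStr::new("report.txt");
    assert_eq!(borrowed.to_str(), Some("report.txt"));
    println!("{}", display_lossy(borrowed));

    #[cfg(unix)]
    {
        let raw = non_utf8_os_string();
        // The strict conversion fails; the lossy one substitutes U+FFFD.
        assert!(raw.to_str().is_none());
        println!("{}", display_lossy(&raw));

        // A system call needs a nul-terminated copy.
        let c_string = to_c_string(&raw).expect("no interior nul");
        println!("{:?}", c_string);
    }
}
```

On Windows the equivalent extension traits live in `std::os::windows::ffi` and work with 16-bit units rather than bytes.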

std::ffi::OsStr is a borrowed reference to an OS string. &OsStr is to OsString as &str is to String: the former in each pair are borrowed references; the latter are owned strings.

use std::env;
use std::ffi::OsStr;
use std::ffi::OsString;
use std::path::Path;
use std::path::PathBuf;
use std::process::Command;

fn main() {
    // Create an OsString (owned platform-native string)
    let mut os_string = OsString::new();
    os_string.push("Hello");
    os_string.push(" ");
    os_string.push("world");

    // Convert to OsStr (borrowed platform-native string)
    let os_str: &OsStr = os_string.as_os_str();

    // Conversion to String if valid UTF-8
    match os_str.to_str() {
        Some(s) => println!("Valid UTF-8: {}", s),
        None => println!("Invalid UTF-8 sequence"),
    }

    // Create OsString from regular String
    let regular_string = String::from("example.txt");
    let _os_string_from_regular = OsString::from(regular_string);

    // Working with environment variables
    let path_var: OsString = env::var_os("PATH").unwrap_or_default();
    println!("PATH: {:?}", path_var);

    // Iterate through PATH entries
    let paths = env::split_paths(&path_var);
    for path in paths.take(3) {
        // Show first 3 paths only
        // path is a PathBuf
        println!("Path entry: {:?}", path);
    }

    // Working with file paths
    let file_path = Path::new("config").join("settings.json");
    println!("File path: {:?}", file_path);

    // Extract file name as OsStr
    if let Some(file_name) = file_path.file_name() {
        println!("File name as OsStr: {:?}", file_name);
    }

    // Working with command arguments
    let mut cmd = Command::new("echo");
    cmd.arg(&os_string); // Can use OsString directly as argument

    // Convert Path to OsString
    let path_buf = PathBuf::from("/usr/local/bin");
    let path_os_string: OsString = path_buf.into_os_string();
    println!("Path as OsString: {:?}", path_os_string);
}

CString and CStr

C strings are different from Rust strings:

  • Rust strings are UTF-8, but C strings may use other encodings.
  • Their character sizes may be different.
  • C strings are nul-terminated, i.e., they have a \0 character at the end.
  • C strings cannot have nul characters in the middle.

Use CString and CStr when you need to convert Rust UTF-8 strings to and from C-style strings. Their primary use case is FFI (Foreign Function Interface), the mechanism by which Rust interacts with code that exposes a C ABI, such as C libraries or Python through its C API.

std::ffi::CString represents an owned, C-compatible, nul-terminated string with no nul bytes in the middle. A CString can be created from either a byte slice or a byte vector, or anything that implements Into<Vec<u8>> (for example, you can build a CString straight out of a String or a &str, since both implement that trait).
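
As a small sketch of the "no nul bytes in the middle" rule, the example below shows `CString::new` appending the terminator automatically and rejecting input with an interior nul. The helper `checked_c_string` is a hypothetical name introduced only for this illustration.

```rust
use std::ffi::CString;

/// Converts a Rust string to a CString. On failure, returns the byte
/// position of the interior nul that made the conversion impossible.
fn checked_c_string(s: &str) -> Result<CString, usize> {
    CString::new(s).map_err(|e| e.nul_position())
}

fn main() {
    let ok = checked_c_string("hello").expect("no interior nul");
    // The stored bytes do not include the terminator...
    assert_eq!(ok.as_bytes(), b"hello");
    // ...but it is there, ready to hand to C code.
    assert_eq!(ok.as_bytes_with_nul(), b"hello\0");

    // An interior nul byte is rejected, and the error reports its position.
    assert_eq!(checked_c_string("he\0llo"), Err(2));
}
```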

std::ffi::CStr represents a borrowed reference to a nul-terminated array of bytes. It can be constructed safely from a &[u8] slice, or unsafely from a raw *const c_char. It can be expressed as a literal in the form c"Hello world". Note that this structure does not have a guaranteed layout (the repr(transparent) notwithstanding) and should not be directly placed in the signatures of FFI functions. Instead, safe wrappers of FFI functions may leverage CStr::as_ptr and the unsafe CStr::from_ptr constructor to provide a safe interface to other consumers.
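
The two construction routes can be sketched as follows: the safe constructors validate a byte slice, while `CStr::from_ptr` trusts the caller. In this sketch, `c_side_greeting` is a hypothetical stand-in, written in Rust, for a foreign function that returns a pointer to a static string; `greeting` is the safe wrapper a library might expose around it.

```rust
use std::ffi::{c_char, CStr};

// Hypothetical stand-in for a C function that returns a pointer to a
// static, nul-terminated string.
extern "C" fn c_side_greeting() -> *const c_char {
    b"hi from C\0".as_ptr().cast()
}

/// Safe wrapper: callers receive a &CStr and never touch the raw pointer.
fn greeting() -> &'static CStr {
    // SAFETY: the pointer is non-null and refers to static data that ends
    // with a nul byte, as `CStr::from_ptr` requires.
    unsafe { CStr::from_ptr(c_side_greeting()) }
}

fn main() {
    // Safe: the slice must contain exactly one nul, at the very end.
    let exact = CStr::from_bytes_with_nul(b"abc\0").expect("one trailing nul");
    assert_eq!(exact.to_bytes(), b"abc");
    assert!(CStr::from_bytes_with_nul(b"abc").is_err()); // no terminator
    assert!(CStr::from_bytes_with_nul(b"a\0bc\0").is_err()); // interior nul

    // Safe: stop at the first nul inside a larger buffer.
    let prefix = CStr::from_bytes_until_nul(b"ab\0ignored").expect("contains a nul");
    assert_eq!(prefix.to_bytes(), b"ab");

    // Unsafe route, hidden behind the safe wrapper.
    assert_eq!(greeting().to_str(), Ok("hi from C"));
}
```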

&CStr is to CString as &str is to String: the former in each pair are borrowed references; the latter are owned strings.

//! This example demonstrates how to work with C-style strings (`CString` and
//! `CStr`) in Rust, which are essential for FFI (Foreign Function Interface)
//! interactions.

use std::ffi::CStr;
use std::ffi::CString;
use std::os::raw::c_char;

// Example external function that takes a C string
// (from the standard C library).
unsafe extern "C" {
    // It accepts a raw pointer to a C-style string,
    // which must be terminated by \0 (`nul`).
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    // 1. Call a FFI function that requires a C string.

    // Create a CString (owned nul-terminated, C-compatible string) from a Rust
    // `&str`:
    let rust_string = "Hello, C world!";
    let c_string = CString::new(rust_string).expect("CString creation failed");

    // Get the raw pointer.
    let raw_ptr: *const c_char = c_string.as_ptr();

    // Call a C function with the raw pointer. Note the unsafe block.
    let len = unsafe { strlen(raw_ptr) };
    println!("String length according to C: {}", len);

    // 2. Work with nul-terminated strings from C.

    // Simulate a string received from C code. Note the `nul` terminator.
    // In real code, this would come from an external C function call.
    let c_hello = b"Hello\0" as *const u8 as *const c_char;
    // Wrap the pointer in a CStr (borrowed C string slice).
    // SAFETY: `c_hello` points to a static, nul-terminated byte string.
    let borrowed: &CStr = unsafe { CStr::from_ptr(c_hello) };

    // Convert to Rust `&str`.
    let rust_str: &str = borrowed.to_str().expect("Invalid UTF-8");
    println!("From C: {}", rust_str);

    // Create owned version of the C-style String.
    let owned: CString = CString::from(borrowed);
    println!("Owned: {:?}", owned);
}

bstr

bstr offers a string type that is not required to be valid UTF-8.

This crate provides extension traits for &[u8] and Vec<u8> that enable their use as byte strings, where byte strings are conventionally UTF-8. Unlike the standard library's String and str types, byte strings are not required to be valid UTF-8; they may be fully or partially valid UTF-8.
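
To make "partially valid UTF-8" concrete, here is a minimal sketch that uses only the standard library, not bstr itself. It shows what the strict str conversion does with such data, which is the gap that byte-string types are meant to fill. The helper `split_valid_prefix` is a hypothetical name introduced for this illustration.

```rust
/// Splits a byte string into its longest valid UTF-8 prefix and the
/// remaining bytes, which start at the first invalid sequence.
fn split_valid_prefix(bytes: &[u8]) -> (&str, &[u8]) {
    match std::str::from_utf8(bytes) {
        // Fully valid: the remainder is empty.
        Ok(s) => (s, &bytes[bytes.len()..]),
        Err(e) => {
            let (valid, rest) = bytes.split_at(e.valid_up_to());
            // `valid_up_to` guarantees this prefix is well-formed UTF-8.
            (std::str::from_utf8(valid).expect("prefix is valid"), rest)
        }
    }
}

fn main() {
    // "Hello", two bytes that cannot occur in UTF-8, then "!".
    let data: &[u8] = b"Hello\xFF\xFE!";

    // The strict conversion rejects the whole slice...
    assert!(std::str::from_utf8(data).is_err());

    // ...even though most of it is readable text.
    let (valid, rest) = split_valid_prefix(data);
    println!("valid prefix: {valid:?}, remaining bytes: {rest:?}");

    // A lossy conversion keeps the text and marks each bad byte.
    println!("{}", String::from_utf8_lossy(data));
}
```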

//! The `bstr` crate provides types and traits for working with byte strings,
//! which are sequences of bytes that may or may not be valid UTF-8.
//!
//! Add to your `Cargo.toml` file:
//! ```toml
//! [dependencies]
//! bstr = "1.11.3" # Or latest
//! ```

// BStr is a byte string slice, analogous to str. It represents a borrowed
// sequence of bytes that may or may not be valid UTF-8.
use bstr::BStr;
// BString is an owned growable byte string buffer, analogous to String. It
// represents an owned sequence of bytes that may or may not be valid
// UTF-8.
use bstr::BString;
// ByteSlice extends the `[u8]` type with additional string-oriented
// methods. This trait is implemented for `[u8]` and `&[u8]`, providing
// methods for searching, splitting, and other operations on byte slices.
use bstr::ByteSlice;
// ByteVec extends the `Vec<u8>` type with additional string-oriented
// methods. This trait is implemented for `Vec<u8>`, providing methods for
// pushing, extending, and other operations on byte vectors.
use bstr::ByteVec;

fn main() {
    // Basic usage:
    let _bstring = BString::from("Hello, world!");
    let bytes: Vec<u8> = vec![72, 101, 108, 108, 111]; // "Hello"
    let _bstring_from_bytes = BString::from(bytes);

    // Working with non-UTF8 data:
    let invalid_utf8 = vec![72, 101, 108, 108, 111, 0xFF, 0xFE, 33];
    let bstring_invalid = BString::from(invalid_utf8);
    println!("BString with invalid UTF-8: {}", bstring_invalid);

    // `ByteSlice` methods:
    let text: &BStr = b"apple,banana,cherry".as_bstr();
    for item in text.split_str(",") {
        println!("Item: {:?}", item);
    }

    // Find substrings:
    let haystack = b"Finding a needle in a haystack".as_bstr();
    if let Some(pos) = haystack.find("needle") {
        println!("Found 'needle' at position: {}", pos);
    }

    // Modifying a `BString`:
    let mut growable = BString::from("Growing ");
    growable.push_str("string");

    // Line handling:
    let multiline = b"First line\nSecond line\r\n".as_bstr();
    for line in multiline.lines() {
        println!("Line: {:?}", line);
    }
}

Related Topics

  • Development Tools: FFI.
  • Strings.