OS and non-UTF8 Strings

Recipe                Crates    Categories
bstr                  bstr      cat-text-processing
CString and CStr      std       cat-text-processing
OsString and OsStr    std       cat-text-processing

OsString and OsStr

std::ffi::OsString is a type that can represent owned, mutable platform-native strings, but is cheaply inter-convertible with Rust strings.

The need for this type arises from the fact that:

  • On Unix systems, strings are often arbitrary sequences of non-zero bytes, in many cases interpreted as UTF-8.
  • On Windows, strings are often arbitrary sequences of non-zero 16-bit values, interpreted as UTF-16 when it is valid to do so.
  • In Rust, strings are always valid UTF-8, which may contain zeros.

OsString and OsStr bridge this gap by simultaneously representing Rust and platform-native string values, and in particular by allowing a Rust string to be converted into an “OS” string at no cost where possible. A consequence is that OsString instances are not NUL-terminated; to pass one to, e.g., a Unix system call, you should first create a NUL-terminated C string (a CString, borrowed as a CStr).
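The NUL-termination point can be illustrated with a minimal sketch. The helper os_str_to_c_string is hypothetical (it is not part of std), and the byte-level access relies on the Unix-only std::os::unix::ffi::OsStrExt trait, so that part is gated behind cfg(unix).

```rust
use std::ffi::{CString, OsStr, OsString};

/// Hypothetical helper: builds a NUL-terminated C string from an OS string.
/// Returns `None` if the input contains an interior NUL byte.
#[cfg(unix)]
fn os_str_to_c_string(s: &OsStr) -> Option<CString> {
    use std::os::unix::ffi::OsStrExt; // Unix-only access to the raw bytes
    CString::new(s.as_bytes()).ok()
}

fn main() {
    // A Rust string converts into an OS string without re-encoding.
    let os: OsString = OsString::from("héllo");
    assert_eq!(os.to_str(), Some("héllo"));

    #[cfg(unix)]
    {
        use std::os::unix::ffi::OsStrExt;
        // On Unix, an OS string may hold bytes that are not valid UTF-8.
        let raw = OsStr::from_bytes(&[b'f', b'o', 0xFF]);
        assert_eq!(raw.to_str(), None);
        // Lossy conversion substitutes U+FFFD for the invalid byte.
        assert_eq!(raw.to_string_lossy(), "fo\u{FFFD}");

        // The C string keeps the raw bytes and gains a trailing NUL.
        let c = os_str_to_c_string(raw).expect("no interior NUL");
        assert_eq!(c.as_bytes_with_nul(), &[b'f', b'o', 0xFF, 0]);
    }
}
```

On Windows the native representation is 16-bit, so a different conversion (not shown here) would be needed before calling a C API.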

std::ffi::OsStr is a borrowed reference to an OS string. &OsStr is to OsString as &str is to String: the former in each pair are borrowed references; the latter are owned strings.
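As a small sketch of the owned/borrowed relationship, the function below accepts anything that can be viewed as an &OsStr. The name is_utf8 is a hypothetical helper chosen for illustration.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: reports whether an OS string is valid UTF-8.
/// Accepting `AsRef<OsStr>` lets callers pass `&str`, `String`,
/// `&OsStr`, `OsString`, `&Path`, or `PathBuf`.
fn is_utf8<S: AsRef<OsStr>>(s: S) -> bool {
    s.as_ref().to_str().is_some()
}

fn main() {
    let owned: OsString = OsString::from("notes.txt"); // owned, growable
    let borrowed: &OsStr = owned.as_os_str(); // borrowed view of `owned`

    assert!(is_utf8(borrowed));
    assert!(is_utf8("plain &str"));

    // `to_os_string` turns the borrow into a new owned value.
    let mut copy: OsString = borrowed.to_os_string();
    copy.push(".bak");
    assert_eq!(copy, OsString::from("notes.txt.bak"));
    // The original is unchanged.
    assert_eq!(owned, OsString::from("notes.txt"));
}
```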

use std::env;
use std::ffi::OsStr;
use std::ffi::OsString;
use std::path::Path;
use std::path::PathBuf;
use std::process::Command;

fn main() {
    // Create an OsString (owned platform-native string)
    let mut os_string = OsString::new();
    os_string.push("Hello");
    os_string.push(" ");
    os_string.push("world");

    // Convert to OsStr (borrowed platform-native string)
    let os_str: &OsStr = os_string.as_os_str();

    // Conversion to String if valid UTF-8
    match os_str.to_str() {
        Some(s) => println!("Valid UTF-8: {}", s),
        None => println!("Invalid UTF-8 sequence"),
    }

    // Create OsString from regular String
    let regular_string = String::from("example.txt");
    let _os_string_from_regular = OsString::from(regular_string);

    // Working with environment variables
    let path_var: OsString = env::var_os("PATH").unwrap_or_default();
    println!("PATH: {:?}", path_var);

    // Iterate through PATH entries
    let paths = env::split_paths(&path_var);
    for path in paths.take(3) {
        // Show first 3 paths only
        // path is a PathBuf
        println!("Path entry: {:?}", path);
    }

    // Working with file paths
    let file_path = Path::new("config").join("settings.json");
    println!("File path: {:?}", file_path);

    // Extract file name as OsStr
    if let Some(file_name) = file_path.file_name() {
        println!("File name as OsStr: {:?}", file_name);
    }

    // Working with command arguments
    let mut cmd = Command::new("echo");
    cmd.arg(&os_string); // Can use OsString directly as argument

    // Convert Path to OsString
    let path_buf = PathBuf::from("/usr/local/bin");
    let path_os_string: OsString = path_buf.into_os_string();
    println!("Path as OsString: {:?}", path_os_string);
}

CString and CStr

std::ffi::CString represents an owned, C-compatible, nul-terminated string with no nul bytes in the middle.

A CString can be created from either a byte slice or a byte vector, or anything that implements Into<Vec<u8>> (for example, you can build a CString straight out of a String or a &str, since both implement that trait).
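A minimal sketch of these constructors, using only std::ffi::CString; the hypothetical helper to_c_string simply wraps CString::new so that the failure case is easy to see.

```rust
use std::ffi::CString;

/// Hypothetical helper: returns `None` when the input has an interior nul.
fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    // From a `&str`, a `String`, and a `Vec<u8>`: all implement `Into<Vec<u8>>`.
    let from_str = CString::new("hello").expect("no interior nul");
    let from_string = CString::new(String::from("hello")).expect("no interior nul");
    let from_vec = CString::new(b"hello".to_vec()).expect("no interior nul");
    assert_eq!(from_str, from_string);
    assert_eq!(from_str, from_vec);

    // The terminating nul is appended automatically.
    assert_eq!(from_str.as_bytes(), b"hello");
    assert_eq!(from_str.as_bytes_with_nul(), b"hello\0");

    // An interior nul byte is rejected; the error reports its position.
    let err = CString::new("he\0llo").unwrap_err();
    assert_eq!(err.nul_position(), 2);
    assert!(to_c_string("he\0llo").is_none());
}
```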

std::ffi::CStr represents a borrowed reference to a nul-terminated array of bytes. It can be constructed safely from a &[u8] slice, or unsafely from a raw *const c_char. It can be expressed as a literal in the form c"Hello world". Note that this structure does not have a guaranteed layout (the repr(transparent) notwithstanding) and should not be placed in the signatures of FFI functions. Instead, safe wrappers of FFI functions may leverage CStr::as_ptr and the unsafe CStr::from_ptr constructor to provide a safe interface to other consumers.
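The sketch below shows the safe constructor and the as_ptr / from_ptr pair. The function c_string_len is hypothetical: it stands in for a safe wrapper around a C function and does not call any real C code, it only round-trips the pointer the way such a wrapper would.

```rust
use std::ffi::{c_char, CStr, CString};

/// Hypothetical safe wrapper: stands in for a wrapper around a C function
/// that receives a `*const c_char` and reads it up to the nul terminator.
fn c_string_len(s: &CStr) -> usize {
    let ptr: *const c_char = s.as_ptr(); // raw pointer for the FFI boundary
    // SAFETY: `ptr` comes from a live `&CStr`, so it is non-null,
    // nul-terminated, and valid for the duration of this call.
    let round_trip = unsafe { CStr::from_ptr(ptr) };
    round_trip.to_bytes().len()
}

fn main() {
    // Safe construction: the only nul allowed is the final byte.
    let greeting = CStr::from_bytes_with_nul(b"Hello world\0").expect("valid C string");
    assert_eq!(greeting.to_str(), Ok("Hello world"));
    assert_eq!(c_string_len(greeting), 11);

    // A missing terminator or an interior nul is rejected.
    assert!(CStr::from_bytes_with_nul(b"no terminator").is_err());
    assert!(CStr::from_bytes_with_nul(b"in\0side\0").is_err());

    // An owned `CString` derefs to `&CStr`.
    let owned = CString::new("abc").expect("no interior nul");
    assert_eq!(c_string_len(&owned), 3);
}
```

Keeping the raw pointer inside the wrapper means callers only ever see &CStr, which matches the advice above not to expose CStr itself in FFI signatures.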

&CStr is to CString as &str is to String: the former in each pair are borrowed references; the latter are owned strings.

The primary use case for these kinds of strings is interoperating with C-like code.


bstr


bstr offers a string type that is not required to be valid UTF-8.

This crate provides extension traits for &[u8] and Vec<u8> that enable their use as byte strings, where byte strings are conventionally UTF-8. Unlike the standard library's String and str types, byte strings are not required to be valid UTF-8; they may be fully valid, partially valid, or not valid UTF-8 at all.

// BStr is a byte string slice, analogous to str.
use bstr::BStr;
// BString is an owned growable byte string buffer, analogous to String.
use bstr::BString;
// ByteSlice extends the `[u8]` type with additional string oriented
// methods.
use bstr::ByteSlice;
// ByteVec extends the `Vec<u8>` type with additional string oriented
// methods.
use bstr::ByteVec;

// Add to your `Cargo.toml` file:
// [dependencies]
// bstr = "1.11.3" # Or latest

fn main() {
    // Basic usage
    let _bstring = BString::from("Hello, world!");
    let bytes = vec![72, 101, 108, 108, 111]; // "Hello"
    let _bstring_from_bytes = BString::from(bytes);

    // Working with non-UTF8 data
    let invalid_utf8 = vec![72, 101, 108, 108, 111, 0xFF, 0xFE, 33];
    let bstring_invalid = BString::from(invalid_utf8);
    println!("BString with invalid UTF-8: {}", bstring_invalid);

    // `ByteSlice` methods
    let text: &BStr = b"apple,banana,cherry".as_bstr();
    for item in text.split_str(",") {
        println!("Item: {:?}", item);
    }

    // Find substrings
    let haystack = b"Finding a needle in a haystack".as_bstr();
    if let Some(pos) = haystack.find("needle") {
        println!("Found 'needle' at position: {}", pos);
    }

    // Modifying BString
    let mut growable = BString::from("Growing ");
    growable.push_str("string");

    // Line handling
    let multiline = b"First line\nSecond line\r\n".as_bstr();
    for line in multiline.lines() {
        println!("Line: {:?}", line);
    }
}

Related Topics

  • Development Tools: FFI.
  • Strings.