OS and non-UTF8 Strings
| Recipe | Crates | Categories |
|---|---|---|
| bstr | | |
| CString and CStr | | |
| OsString and OsStr | | |
OsString and OsStr
std::ffi::OsString is a type that can represent owned, mutable platform-native strings, but is cheaply inter-convertible with Rust strings.
The need for this type arises from the fact that:
- On Unix systems, strings are often arbitrary sequences of non-zero bytes, in many cases interpreted as UTF-8.
- On Windows, strings are often arbitrary sequences of non-zero 16-bit values, interpreted as UTF-16 when it is valid to do so.
- In Rust, strings are always valid UTF-8, which may contain zeros.
OsString and OsStr bridge this gap by simultaneously representing Rust and platform-native string values, and in particular allowing a Rust string to be converted into an “OS” string with no cost if possible. A consequence of this is that OsString instances are not NUL-terminated; to pass one to, e.g., a Unix system call, you should create a nul-terminated CString or CStr from it.
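The NUL-termination point above can be sketched as follows. This is a minimal illustration, not a definitive recipe: `os_str_to_c_string` is a hypothetical helper name, and the non-Unix fallback assumes the OS string is valid UTF-8.

```rust
use std::ffi::{CString, OsStr};

/// Hypothetical helper: builds a nul-terminated copy of an OS string.
#[cfg(unix)]
fn os_str_to_c_string(s: &OsStr) -> Option<CString> {
    use std::os::unix::ffi::OsStrExt;
    // On Unix an `OsStr` is a plain byte sequence. `CString::new` appends the
    // nul terminator and fails if the input contains an interior nul byte.
    CString::new(s.as_bytes()).ok()
}

/// Fallback (assumption): only OS strings that are valid UTF-8 are handled.
#[cfg(not(unix))]
fn os_str_to_c_string(s: &OsStr) -> Option<CString> {
    CString::new(s.to_str()?).ok()
}

fn main() {
    let name = OsStr::new("data.txt");
    let c_name = os_str_to_c_string(name).expect("no interior nul byte");
    // The terminator only appears in the `_with_nul` view.
    assert_eq!(c_name.as_bytes_with_nul(), b"data.txt\0");
    println!("{:?}", c_name);
}
```

The resulting `CString` owns its buffer, so the pointer returned by `as_ptr` stays valid only while the `CString` is alive.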
std::ffi::OsStr is a borrowed reference to an OS string. &OsStr is to OsString as &str is to String: the former in each pair are borrowed references; the latter are owned strings.
    use std::env;
    use std::ffi::OsStr;
    use std::ffi::OsString;
    use std::path::Path;
    use std::path::PathBuf;
    use std::process::Command;

    fn main() {
        // Create an OsString (owned platform-native string)
        let mut os_string = OsString::new();
        os_string.push("Hello");
        os_string.push(" ");
        os_string.push("world");

        // Convert to OsStr (borrowed platform-native string)
        let os_str: &OsStr = os_string.as_os_str();

        // Conversion to String if valid UTF-8
        match os_str.to_str() {
            Some(s) => println!("Valid UTF-8: {}", s),
            None => println!("Invalid UTF-8 sequence"),
        }

        // Create OsString from regular String
        let regular_string = String::from("example.txt");
        let _os_string_from_regular = OsString::from(regular_string);

        // Working with environment variables
        let path_var: OsString = env::var_os("PATH").unwrap_or_default();
        println!("PATH: {:?}", path_var);

        // Iterate through PATH entries
        let paths = env::split_paths(&path_var);
        for path in paths.take(3) {
            // Show first 3 paths only
            // path is a PathBuf
            println!("Path entry: {:?}", path);
        }

        // Working with file paths
        let file_path = Path::new("config").join("settings.json");
        println!("File path: {:?}", file_path);

        // Extract file name as OsStr
        if let Some(file_name) = file_path.file_name() {
            println!("File name as OsStr: {:?}", file_name);
        }

        // Working with command arguments
        let mut cmd = Command::new("echo");
        cmd.arg(&os_string); // Can use OsString directly as argument

        // Convert Path to OsString
        let path_buf = PathBuf::from("/usr/local/bin");
        let path_os_string: OsString = path_buf.into_os_string();
        println!("Path as OsString: {:?}", path_os_string);
    }
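Because &OsStr is the borrowed form, a function that only reads an OS string can accept anything that implements `AsRef<OsStr>`. The sketch below assumes a hypothetical helper named `describe`; it also shows the lossy and fallible conversions back to Rust strings.

```rust
use std::ffi::{OsStr, OsString};
use std::path::Path;

/// Hypothetical helper: accepts anything viewable as an `&OsStr`.
fn describe<S: AsRef<OsStr>>(s: S) -> String {
    let os_str: &OsStr = s.as_ref();
    // `to_string_lossy` substitutes U+FFFD for any invalid sequences.
    // `len` reports the length of the underlying storage, not a character count.
    format!("{} ({} bytes)", os_str.to_string_lossy(), os_str.len())
}

fn main() {
    let owned: OsString = OsString::from("notes.md");

    // `&OsString`, `&str` and `&Path` can all be borrowed as `&OsStr`.
    println!("{}", describe(&owned));
    println!("{}", describe("plain &str"));
    println!("{}", describe(Path::new("/tmp/log")));

    // Owned conversion back to `String`: on failure the original `OsString`
    // is returned in the `Err` variant, so nothing is lost.
    let back: Result<String, OsString> = owned.into_string();
    assert_eq!(back, Ok(String::from("notes.md")));
}
```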
CString and CStr
C strings are different from Rust strings:
- Rust strings are UTF-8, but C strings may use other encodings.
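- Their character sizes may differ: C strings are made of char or wchar_t units, while Rust strings encode each Unicode character in one to four UTF-8 bytes.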
- C strings are nul-terminated, i.e., they have a \0 character at the end.
- C strings cannot have nul characters in the middle.
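The last two points in the list above can be checked directly with `CString::new`. In this short sketch, `interior_nul_position` is a hypothetical helper name introduced only for illustration.

```rust
use std::ffi::CString;

/// Hypothetical helper: returns the index of the first interior nul byte,
/// or `None` if the input can be turned into a `CString`.
fn interior_nul_position(s: &str) -> Option<usize> {
    match CString::new(s) {
        Ok(_) => None,
        Err(e) => Some(e.nul_position()),
    }
}

fn main() {
    let ok = CString::new("hello").expect("no interior nul byte");
    // The terminator is stored, but excluded from `as_bytes`.
    assert_eq!(ok.as_bytes().len(), 5);
    assert_eq!(ok.as_bytes_with_nul().len(), 6);

    // A Rust string may contain `\0`, but a C string may not.
    assert_eq!(interior_nul_position("he\0llo"), Some(2));
    assert_eq!(interior_nul_position("hello"), None);
}
```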
Use CString and CStr when you need to convert Rust UTF-8 strings to and from C-style strings. Their primary use case is FFI (Foreign Function Interface), the mechanism by which Rust interacts with code written in other languages through a C ABI, such as C itself or Python via its C API.
std::ffi::CString represents an owned, C-compatible, nul-terminated string with no nul bytes in the middle. A CString can be created from either a byte slice or a byte vector, or anything that implements Into<Vec<u8>> (for example, you can build a CString straight out of a String or a &str, since both implement that trait).
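As a brief illustration of the `Into<Vec<u8>>` bound, the sketch below builds the same `CString` from four input types. `to_c_string` is a hypothetical wrapper, added only to make the generic bound visible.

```rust
use std::ffi::CString;

/// Hypothetical wrapper exposing the same generic bound as `CString::new`.
fn to_c_string<T: Into<Vec<u8>>>(input: T) -> Option<CString> {
    CString::new(input).ok()
}

fn main() {
    let from_str = to_c_string("abc").unwrap();
    let from_string = to_c_string(String::from("abc")).unwrap();
    let from_slice = to_c_string(&b"abc"[..]).unwrap();
    let from_vec = to_c_string(vec![b'a', b'b', b'c']).unwrap();

    // All four inputs produce the same nul-terminated bytes.
    assert_eq!(from_str, from_string);
    assert_eq!(from_slice, from_vec);
    assert_eq!(from_str.as_bytes_with_nul(), b"abc\0");

    // Converting back to `String` succeeds only if the bytes are valid UTF-8.
    let text: String = from_str.into_string().expect("valid UTF-8");
    assert_eq!(text, "abc");
}
```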
std::ffi::CStr represents a borrowed reference to a nul-terminated array of bytes. It can be constructed safely from a &[u8] slice, or unsafely from a raw *const c_char. It can be expressed as a literal in the form c"Hello world". Note that this structure does not have a guaranteed layout (the repr(transparent) notwithstanding) and should not be directly placed in the signatures of FFI functions. Instead, safe wrappers of FFI functions may leverage CStr::as_ptr and the unsafe CStr::from_ptr constructor to provide a safe interface to other consumers.
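The safe and unsafe constructors can be compared in a small sketch. `c_bytes_to_str` is a hypothetical helper, and the example assumes a toolchain recent enough to provide `CStr::from_bytes_until_nul` and `std::ffi::c_char`.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical safe helper: reads a buffer up to its first nul byte and
/// returns the contents as `&str` if they are valid UTF-8.
fn c_bytes_to_str(bytes: &[u8]) -> Option<&str> {
    let c_str = CStr::from_bytes_until_nul(bytes).ok()?;
    c_str.to_str().ok()
}

fn main() {
    // Strict constructor: exactly one nul byte, at the very end.
    let strict = CStr::from_bytes_with_nul(b"Hello world\0").expect("well-formed");
    assert_eq!(strict.to_bytes(), b"Hello world");

    // An interior nul or a missing terminator is rejected.
    assert!(CStr::from_bytes_with_nul(b"Hel\0lo\0").is_err());
    assert!(CStr::from_bytes_with_nul(b"Hello").is_err());

    // Round trip through a raw pointer, as an FFI wrapper would do.
    let ptr: *const c_char = strict.as_ptr();
    // SAFETY: `ptr` comes from a valid `CStr` that is still alive here.
    let again = unsafe { CStr::from_ptr(ptr) };
    assert_eq!(again, strict);

    // Lenient constructor: anything after the first nul is ignored.
    assert_eq!(c_bytes_to_str(b"abc\0garbage"), Some("abc"));
}
```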
&CStr is to CString as &str is to String: the former in each pair are borrowed references; the latter are owned strings.
    //! This example demonstrates how to work with C-style strings (`CString` and
    //! `CStr`) in Rust, which are essential for FFI (Foreign Function Interface)
    //! interactions.

    use std::ffi::CStr;
    use std::ffi::CString;
    use std::os::raw::c_char;

    // Example external function that takes a C string
    // (from the standard C library).
    unsafe extern "C" {
        // It accepts a raw pointer to a C-style string,
        // which must be terminated by \0 (`nul`).
        fn strlen(s: *const c_char) -> usize;
    }

    fn main() {
        // 1. Call a FFI function that requires a C string.

        // Create a CString (owned nul-terminated, C-compatible string) from a Rust
        // `&str`:
        let rust_string = "Hello, C world!";
        let c_string = CString::new(rust_string).expect("CString creation failed");

        // Get the raw pointer.
        let raw_ptr: *const c_char = c_string.as_ptr();

        // Call a C function with the raw pointer. Note the unsafe block.
        let len = unsafe { strlen(raw_ptr) };
        println!("String length according to C: {}", len);

        // 2. Work with null-terminated strings from C.

        // Simulate a string received from C code. Note the `nul` terminator.
        // In real code, this would come from an external C function call.
        let c_hello = b"Hello\0" as *const u8 as *const c_char;
        let borrowed: &CStr;
        unsafe {
            // Wrap the pointer in a CStr (borrowed C string slice).
            borrowed = CStr::from_ptr(c_hello);
        }

        // Convert to Rust `&str`.
        let rust_str: &str = borrowed.to_str().expect("Invalid UTF-8");
        println!("From C: {}", rust_str);

        // Create owned version of the C-style String.
        let owned: CString = CString::from(borrowed);
        println!("Owned: {:?}", owned);
    }
bstr
bstr offers a string type that is not required to be valid UTF-8.
This crate provides extension traits for &[u8] and Vec<u8> that enable their use as byte strings, where byte strings are conventionally UTF-8. Unlike the standard library's String and str types, byte strings are not required to be valid UTF-8, but may be fully or partially valid UTF-8.
    //! The `bstr` crate provides types and traits for working with byte strings,
    //! which are sequences of bytes that may or may not be valid UTF-8.
    //!
    //! Add to your `Cargo.toml` file:
    //! ```toml
    //! [dependencies]
    //! bstr = "1.11.3" # Or latest
    //! ```

    // BStr is a byte string slice, analogous to str. It represents a borrowed
    // sequence of bytes that may or may not be valid UTF-8.
    use bstr::BStr;
    // BString is an owned growable byte string buffer, analogous to String. It
    // represents an owned sequence of bytes that may or may not be valid
    // UTF-8.
    use bstr::BString;
    // ByteSlice extends the `[u8]` type with additional string-oriented
    // methods. This trait is implemented for `[u8]` and `&[u8]`, providing
    // methods for searching, splitting, and other operations on byte slices.
    use bstr::ByteSlice;
    // ByteVec extends the `Vec<u8>` type with additional string-oriented
    // methods. This trait is implemented for `Vec<u8>`, providing methods for
    // pushing, extending, and other operations on byte vectors.
    use bstr::ByteVec;

    fn main() {
        // Basic usage:
        let _bstring = BString::from("Hello, world!");
        let bytes: Vec<u8> = vec![72, 101, 108, 108, 111]; // "Hello"
        let _bstring_from_bytes = BString::from(bytes);

        // Working with non-UTF8 data:
        let invalid_utf8 = vec![72, 101, 108, 108, 111, 0xFF, 0xFE, 33];
        let bstring_invalid = BString::from(invalid_utf8);
        println!("BString with invalid UTF-8: {}", bstring_invalid);

        // `ByteSlice` methods:
        let text: &BStr = b"apple,banana,cherry".as_bstr();
        for item in text.split_str(",") {
            println!("Item: {:?}", item);
        }

        // Find substrings:
        let haystack = b"Finding a needle in a haystack".as_bstr();
        if let Some(pos) = haystack.find("needle") {
            println!("Found 'needle' at position: {}", pos);
        }

        // Modifying a `BString`:
        let mut growable = BString::from("Growing ");
        growable.push_str("string");

        // Line handling:
        let multiline = b"First line\nSecond line\r\n".as_bstr();
        for line in multiline.lines() {
            println!("Line: {:?}", line);
        }
    }
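To make "partially valid UTF-8" concrete, the sketch below uses only the standard library rather than the bstr API, so it shows the underlying idea and not how bstr is implemented. `valid_utf8_prefix_len` is a hypothetical helper name.

```rust
/// Hypothetical helper: length in bytes of the longest valid UTF-8 prefix.
fn valid_utf8_prefix_len(bytes: &[u8]) -> usize {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.len(),
        // `valid_up_to` is the index of the first byte that is not valid UTF-8.
        Err(e) => e.valid_up_to(),
    }
}

fn main() {
    // "Hello", two bytes that can never appear in UTF-8, then "!".
    let bytes: &[u8] = b"Hello\xFF\xFE!";

    // A `&str` cannot hold this data at all...
    assert!(std::str::from_utf8(bytes).is_err());
    // ...even though the first five bytes are perfectly good UTF-8.
    assert_eq!(valid_utf8_prefix_len(bytes), 5);

    // A lossy conversion keeps the valid parts and substitutes U+FFFD
    // for each invalid byte.
    let lossy = String::from_utf8_lossy(bytes);
    assert_eq!(lossy, "Hello\u{FFFD}\u{FFFD}!");
}
```

A byte string type keeps all eight original bytes intact, which the lossy conversion above does not.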
Related Topics
- Development Tools: FFI.
- Strings.