OS and non-UTF8 Strings
Recipe | Crates | Categories |
---|---|---|
bstr | | |
CString and CStr | | |
OsString and OsStr | | |
OsString and OsStr
std::ffi::OsString is a type that can represent owned, mutable platform-native strings, but is cheaply inter-convertible with Rust strings.
The need for this type arises from the fact that:
- On Unix systems, strings are often arbitrary sequences of non-zero bytes, in many cases interpreted as UTF-8.
- On Windows, strings are often arbitrary sequences of non-zero 16-bit values, interpreted as UTF-16 when it is valid to do so.
- In Rust, strings are always valid UTF-8, which may contain zeros.
OsString and OsStr bridge this gap by simultaneously representing Rust and platform-native string values, and in particular by allowing a Rust string to be converted into an "OS" string at no cost where possible. A consequence of this is that OsString instances are not NUL-terminated; to pass one to, e.g., a Unix system call, you should first create a CStr (typically by building an owned CString from its bytes).
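The last point can be sketched as follows. This is a minimal illustration, not a definitive recipe: `to_c_string` is a hypothetical helper name, and the byte-level path assumes a Unix-like target (other platforms fall back to a UTF-8 conversion here purely so the sketch compiles everywhere).

```rust
use std::ffi::{CString, OsStr};

/// Hypothetical helper: copies an OS string into a NUL-terminated CString.
/// Returns None if the input contains an interior NUL byte (or, on
/// non-Unix platforms, if it is not valid UTF-8).
fn to_c_string(s: &OsStr) -> Option<CString> {
    // On Unix, an OsStr is an arbitrary byte sequence, so copy it as-is.
    #[cfg(unix)]
    let bytes: Vec<u8> = {
        use std::os::unix::ffi::OsStrExt;
        s.as_bytes().to_vec()
    };
    // Assumption for this sketch only: elsewhere, accept valid UTF-8 only.
    #[cfg(not(unix))]
    let bytes: Vec<u8> = s.to_str()?.as_bytes().to_vec();

    // CString::new appends the trailing NUL and rejects interior NULs.
    CString::new(bytes).ok()
}

fn main() {
    let name = OsStr::new("settings.json");
    let c_name = to_c_string(name).expect("no interior NUL");
    // The CString owns the original bytes plus one trailing NUL.
    assert_eq!(c_name.as_bytes_with_nul(), &b"settings.json\0"[..]);

    // An interior NUL cannot be represented in a C string.
    assert!(to_c_string(OsStr::new("a\0b")).is_none());
    println!("{:?}", c_name);
}
```

The resulting CString can then be handed to foreign code through its `as_ptr` method, as long as the CString outlives the call.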
std::ffi::OsStr is a borrowed reference to an OS string. &OsStr is to OsString as &str is to String: the former in each pair are borrowed references; the latter are owned strings.
use std::env;
use std::ffi::OsStr;
use std::ffi::OsString;
use std::path::Path;
use std::path::PathBuf;
use std::process::Command;

fn main() {
    // Create an OsString (owned platform-native string)
    let mut os_string = OsString::new();
    os_string.push("Hello");
    os_string.push(" ");
    os_string.push("world");

    // Convert to OsStr (borrowed platform-native string)
    let os_str: &OsStr = os_string.as_os_str();

    // Conversion to String if valid UTF-8
    match os_str.to_str() {
        Some(s) => println!("Valid UTF-8: {}", s),
        None => println!("Invalid UTF-8 sequence"),
    }

    // Create OsString from regular String
    let regular_string = String::from("example.txt");
    let _os_string_from_regular = OsString::from(regular_string);

    // Working with environment variables
    let path_var: OsString = env::var_os("PATH").unwrap_or_default();
    println!("PATH: {:?}", path_var);

    // Iterate through PATH entries
    let paths = env::split_paths(&path_var);
    for path in paths.take(3) { // Show first 3 paths only
        // path is a PathBuf
        println!("Path entry: {:?}", path);
    }

    // Working with file paths
    let file_path = Path::new("config").join("settings.json");
    println!("File path: {:?}", file_path);

    // Extract file name as OsStr
    if let Some(file_name) = file_path.file_name() {
        println!("File name as OsStr: {:?}", file_name);
    }

    // Working with command arguments
    let mut cmd = Command::new("echo");
    cmd.arg(&os_string); // Can use OsString directly as argument

    // Convert Path to OsString
    let path_buf = PathBuf::from("/usr/local/bin");
    let path_os_string: OsString = path_buf.into_os_string();
    println!("Path as OsString: {:?}", path_os_string);
}
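The example above only exercises strings that happen to be valid UTF-8. The sketch below shows one possible way to handle the other case; `describe` is a hypothetical helper name, and the invalid input is built with the Unix-only OsStringExt::from_vec, so that part is gated on Unix targets.

```rust
use std::ffi::OsString;

/// Hypothetical helper: converts losslessly when the OS string is valid
/// UTF-8, and otherwise falls back to a lossy rendering.
fn describe(s: OsString) -> String {
    match s.into_string() {
        // Valid UTF-8: the buffer is reused, nothing is copied or replaced.
        Ok(valid) => format!("valid: {}", valid),
        // Not valid UTF-8: into_string hands the original OsString back.
        Err(original) => format!("lossy: {}", original.to_string_lossy()),
    }
}

fn main() {
    println!("{}", describe(OsString::from("example.txt")));

    #[cfg(unix)]
    {
        use std::os::unix::ffi::OsStringExt;
        // 0x80 is a lone continuation byte, so this is not valid UTF-8.
        let raw = OsString::from_vec(vec![b'f', b'o', 0x80, b'o']);
        // to_string_lossy replaces the bad byte with U+FFFD.
        assert_eq!(describe(raw), "lossy: fo\u{FFFD}o");
    }
}
```

Prefer to_string_lossy only for display purposes; for anything passed back to the operating system, keep the original OsString so that no bytes are altered.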
CString and CStr
std::ffi::CString represents an owned, C-compatible, nul-terminated string with no nul bytes in the middle.
A CString can be created from either a byte slice or a byte vector, or anything that implements Into<Vec<u8>> (for example, you can build a CString straight out of a String or a &str, since both implement that trait).
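As a small sketch of those construction paths, the following builds the same CString from a &str, a String, and a byte vector, and shows what happens when the input contains a nul byte in the middle. The variable names are illustrative only.

```rust
use std::ffi::CString;

fn main() {
    // From a &str: the bytes are copied and a trailing nul is appended.
    let from_str = CString::new("hello").expect("no interior nul");
    // From an owned String.
    let from_string = CString::new(String::from("hello")).expect("no interior nul");
    // From a byte vector.
    let from_vec =
        CString::new(vec![b'h', b'e', b'l', b'l', b'o']).expect("no interior nul");

    // All three hold the same contents.
    assert_eq!(from_str, from_string);
    assert_eq!(from_string, from_vec);
    assert_eq!(from_str.as_bytes_with_nul(), &b"hello\0"[..]);

    // An interior nul byte is rejected; the error reports its position.
    let err = CString::new("he\0llo").unwrap_err();
    assert_eq!(err.nul_position(), 2);
    println!("rejected input: {}", err);
}
```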
std::ffi::CStr represents a borrowed reference to a nul-terminated array of bytes. It can be constructed safely from a &[u8] slice, or unsafely from a raw *const c_char. It can be expressed as a literal in the form c"Hello world". Note that this structure does not have a guaranteed layout (the repr(transparent) notwithstanding) and should not be placed in the signatures of FFI functions. Instead, safe wrappers of FFI functions may leverage CStr::as_ptr and the unsafe CStr::from_ptr constructor to provide a safe interface to other consumers.
&CStr is to CString as &str is to String: the former in each pair are borrowed references; the latter are owned strings.
The primary use case for these kinds of strings is interoperating with C-like code.
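The safe-wrapper pattern described above can be sketched as follows. To keep the example self-contained, `c_style_len` is a hypothetical stand-in written in Rust with the C ABI; real code would instead declare the foreign function in an extern block and link against the C library. `safe_len` is likewise an illustrative name.

```rust
use std::ffi::{c_char, CStr, CString};

/// Hypothetical stand-in for a foreign function. It receives a raw pointer
/// to a nul-terminated string and returns the number of bytes before the
/// terminator.
///
/// Safety: `s` must point to a valid nul-terminated string.
unsafe extern "C" fn c_style_len(s: *const c_char) -> usize {
    // SAFETY: the caller guarantees `s` is valid and nul-terminated.
    let c_str = unsafe { CStr::from_ptr(s) };
    c_str.to_bytes().len()
}

/// Safe wrapper: callers pass a &CStr, never a raw pointer.
fn safe_len(s: &CStr) -> usize {
    // SAFETY: a CStr is always nul-terminated, and the borrow keeps the
    // underlying buffer alive for the duration of the call.
    unsafe { c_style_len(s.as_ptr()) }
}

fn main() {
    // Owned string: &CString dereferences to &CStr.
    let owned = CString::new("Hello world").expect("no interior nul");
    assert_eq!(safe_len(&owned), 11);

    // Borrowed string: built safely from a byte slice ending in one nul.
    let borrowed =
        CStr::from_bytes_with_nul(b"abc\0").expect("exactly one trailing nul");
    assert_eq!(safe_len(borrowed), 3);
    println!("{:?} has {} bytes", borrowed, safe_len(borrowed));
}
```

Keeping the raw pointer inside the wrapper means the unsafe surface is limited to one function whose preconditions are upheld by the CStr type itself.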
bstr
bstr offers a string type that is not required to be valid UTF-8.
This crate provides extension traits for &[u8] and Vec<u8> that enable their use as byte strings, where byte strings are conventionally UTF-8. Byte strings differ from the standard library's String and str types in that they are not required to be valid UTF-8, but may be fully or partially valid UTF-8.
// BStr is a byte string slice, analogous to str.
use bstr::BStr;
// BString is an owned growable byte string buffer, analogous to String.
use bstr::BString;
// ByteSlice extends the `[u8]` type with additional string oriented
// methods.
use bstr::ByteSlice;
// ByteVec extends the `Vec<u8>` type with additional string oriented
// methods.
use bstr::ByteVec;

// Add to your `Cargo.toml` file:
// [dependencies]
// bstr = "1.11.3" # Or latest

fn main() {
    // Basic usage
    let _bstring = BString::from("Hello, world!");
    let bytes = vec![72, 101, 108, 108, 111]; // "Hello"
    let _bstring_from_bytes = BString::from(bytes);

    // Working with non-UTF8 data
    let invalid_utf8 = vec![72, 101, 108, 108, 111, 0xFF, 0xFE, 33];
    let bstring_invalid = BString::from(invalid_utf8);
    println!("BString with invalid UTF-8: {}", bstring_invalid);

    // `ByteSlice` methods
    let text: &BStr = b"apple,banana,cherry".as_bstr();
    for item in text.split_str(",") {
        println!("Item: {:?}", item);
    }

    // Find substrings
    let haystack = b"Finding a needle in a haystack".as_bstr();
    if let Some(pos) = haystack.find("needle") {
        println!("Found 'needle' at position: {}", pos);
    }

    // Modifying BString
    let mut growable = BString::from("Growing ");
    growable.push_str("string");

    // Line handling
    let multiline = b"First line\nSecond line\r\n".as_bstr();
    for line in multiline.lines() {
        println!("Line: {:?}", line);
    }
}
Related Topics
- Development Tools: FFI.
- Strings.