Unicode

Recipe | Crates | Categories
Collect Unicode Graphemes | unicode_segmentation | cat-text-processing

Unicode segmentation is the process of dividing a string of Unicode text into meaningful units, such as grapheme clusters (user-perceived characters), words, and sentences, following the rules defined by the Unicode Standard.

Collect Unicode Graphemes


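The unicode-segmentation crate splits UTF-8 strings into individual Unicode grapheme clusters. See in particular the unicode_segmentation::UnicodeSegmentation::graphemes method, which returns an iterator over the grapheme clusters of a string slice. Its boolean argument selects extended grapheme clusters when true and legacy grapheme clusters when false.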

use unicode_segmentation::UnicodeSegmentation;

fn main() {
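    // The trailing "\r\n" is kept together as a single grapheme cluster.
    let name = "José Guimarães\r\n";
    // `true` requests extended grapheme clusters rather than legacy ones.
    let graphemes = name.graphemes(true).collect::<Vec<&str>>();
    println!("{:?}", graphemes);
    // The fourth grapheme (index 3) is the accented letter.
    assert_eq!(graphemes[3], "é");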
}
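
The recipe above depends on grapheme segmentation because a user-perceived character is not always a single Unicode scalar value. The following sketch uses only the standard library to show this. The helper name byte_and_char_counts is hypothetical and not part of any crate. It compares the precomposed and decomposed spellings of "é": str::len counts UTF-8 bytes and chars() counts scalar values, so neither is guaranteed to match the number of graphemes a reader sees.

```rust
/// Hypothetical helper for illustration: returns how the standard library
/// alone measures a string, as (UTF-8 byte length, Unicode scalar value count).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
    let precomposed = "\u{e9}";
    // Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "e\u{301}";

    // Both strings render as one user-perceived character,
    // yet their byte and scalar value counts differ.
    println!("{:?}", byte_and_char_counts(precomposed));
    println!("{:?}", byte_and_char_counts(decomposed));
}
```

Because the decomposed form contains two scalar values, indexing into chars() could separate the base letter from its accent, whereas graphemes(true) keeps them together as one item.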

Related Topics

  • Strings.
  • String Encoding.