Uniform Resource Locator

Parse a URL from a String to a Url Type

url cat-network-programming

The url::Url::parse⮳ method from the url⮳ crate validates and parses a &str into a url::Url⮳ struct. Because the input string may be malformed, this method returns Result<Url, ParseError>.

Once the URL has been parsed, it can be used with all of the methods in the url::Url⮳ type.

use url::ParseError;
use url::Url;

/// Demonstrates URL parsing.
///
/// This function parses a URL string and prints the path part of the URL.
fn main() -> Result<(), ParseError> {
    // Define a URL string.
    let s = "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open";

    let parsed = Url::parse(s)?;
    println!("The path part of the URL is: {}", parsed.path());

    Ok(())
}

Create a Base URL by Removing Path Segments


A base URL consists of a protocol and a domain; it has no folders, files, or query strings. Each of those items is stripped from the given URL: url::Url::set_path with an empty string (or, alternatively, url::PathSegmentsMut::clear⮳) removes the path segments, url::Url::set_query⮳ removes the query string, and url::Url::set_fragment removes the fragment.

//! This module demonstrates how to extract the base URL from a given URL.
//!
//! It uses the `url` crate to parse and manipulate URLs.

use anyhow::Result;
use url::Url;

/// Extracts the base URL from a given URL.
///
/// This function takes a URL and removes the path segments and query
/// parameters, effectively returning the base URL.
///
/// # Arguments
///
/// * `url` - The URL to extract the base from.
///
/// # Returns
///
/// Returns a `Result` containing the base URL or an error if the URL cannot be
/// processed.
fn base_url(mut url: Url) -> Result<Url> {
    url.set_fragment(None);
    url.set_query(None);
    url.set_path("");

    // You could also use `path_segments_mut` to return an object with methods
    // to manipulate the URL's path segments:
    //
    // match url.path_segments_mut() {
    //     Ok(mut path) => {
    //         path.clear();
    //     }
    //     Err(_) => {
    //         // Some (uncommon) URLs are said to be cannot-be-a-base:
    //         // they don’t have a username, password, host, or port,
    //         // and their "path" is an arbitrary string rather than
    //         // slash-separated segments.
    //         return Err(anyhow::anyhow!("This URL is cannot-be-a-base."));
    //     }
    // }

    Ok(url)
}

fn main() -> Result<()> {
    let full = "https://github.com/rust-lang/cargo?asdf";

    let url = Url::parse(full)?;
    let base = base_url(url)?;

    assert_eq!(base.as_str(), "https://github.com/");
    println!("The base of the URL is: {}", base);

    Ok(())
}

Create new URLs from a Base URL


The url::Url::join⮳ method creates a new URL by resolving a relative path against a base URL.

//! Demonstrates how to build a URL by joining a base URL with a path.

use url::ParseError;
use url::Url;

/// Builds a GitHub URL by joining a base URL with a given path.
///
/// # Arguments
///
/// * `path` - The path to append to the base GitHub URL.
///
/// Returns a `Result` containing the joined `Url` or a `ParseError`.
fn build_github_url(path: &str) -> Result<Url, ParseError> {
    const GITHUB: &str = "https://github.com";

    let base =
        Url::parse(GITHUB).expect("This hardcoded URL is known to be valid");
    let joined = base.join(path)?;

    Ok(joined)
}

fn main() -> Result<(), ParseError> {
    let path = "/rust-lang/cargo";

    let gh = build_github_url(path)?;
    println!("The joined URL is: {}", gh);
    assert_eq!(gh.as_str(), "https://github.com/rust-lang/cargo");

    Ok(())
}

Extract the URL Origin (Scheme / Host / Port)


The url::Url⮳ struct exposes various methods to extract information about the URL it represents.

//! Demonstrates parsing a URL and extracting its origin components.

use url::Host;
use url::ParseError;
use url::Url;

fn main() -> Result<(), ParseError> {
    let s = "ftp://rust-lang.org/examples";

    let url = Url::parse(s)?;

    assert_eq!(url.scheme(), "ftp");
    assert_eq!(url.host(), Some(Host::Domain("rust-lang.org")));
    assert_eq!(url.port_or_known_default(), Some(21));
    println!("The origin is as expected!");

    Ok(())
}

The url::Url::origin⮳ method returns the same scheme, host, and port as a single url::Origin value.

//! This example demonstrates how to parse a URL and extract its origin.
//!
//! The `url::Url` struct is used to parse the URL string.
//! The `url::Host` enum is used to represent the host part of the URL.
//! The `url::Origin` enum is used to represent the origin of the URL.
use anyhow::Result;
use url::Host;
use url::Origin;
use url::Url;

fn main() -> Result<()> {
    let s = "ftp://rust-lang.org/examples";

    let url = Url::parse(s)?;

    let expected_scheme = "ftp".to_owned();
    let expected_host = Host::Domain("rust-lang.org".to_owned());
    let expected_port = 21;
    let expected = Origin::Tuple(expected_scheme, expected_host, expected_port);

    let origin = url.origin();
    assert_eq!(origin, expected);
    println!("The origin is as expected!");

    Ok(())
}

#[test]
fn test() -> anyhow::Result<()> {
    main()?;
    Ok(())
}

Remove Fragment Identifiers and Query Pairs from a URL


This example parses a string into a url::Url⮳ and slices it with url::Position⮳ to strip the unneeded parts of the URL, here the query string and any fragment.

//! This example demonstrates how to parse a URL and extract a portion of it.
//!
//! The `Url::parse` function is used to parse a URL string into a `Url` object.
//! The `Position` enum is used to specify a position within the URL.
//! In this case, `Position::AfterPath` is used to specify the position after
//! the path. The `cleaned` variable is then assigned a slice of the URL string
//! from the beginning to the specified position. Finally, the `cleaned` string
//! is printed to the console.

use url::ParseError;
use url::Position;
use url::Url;

fn main() -> Result<(), ParseError> {
    let parsed = Url::parse(
        "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open",
    )?;
    let cleaned: &str = &parsed[..Position::AfterPath];
    println!("`cleaned`: {}", cleaned);
    Ok(())
}