Uniform Resource Locator

Parse a URL from a string to a Url type

url cat-network-programming

The url::Url::parse method from the url crate validates and parses a &str into a url::Url struct. Because the input string may be malformed, this method returns Result<Url, ParseError>.

Once parsed, the URL can be inspected and modified with any of the methods on the url::Url type.

use url::ParseError;
use url::Url;

fn main() -> Result<(), ParseError> {
    let s = "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open";

    let parsed = Url::parse(s)?;
    println!("The path part of the URL is: {}", parsed.path());

    Ok(())
}

Create a base URL by removing path segments

A base URL consists of a scheme (protocol) and a host (domain), with no path segments (folders or files) and no query string. The example strips each of those parts from the given URL: url::PathSegmentsMut::clear removes the path segments, and url::Url::set_query with None removes the query string.

use anyhow::anyhow;
use anyhow::Result;
use url::Url;

fn main() -> Result<()> {
    let full = "https://github.com/rust-lang/cargo?asdf";

    let url = Url::parse(full)?;
    let base = base_url(url)?;

    assert_eq!(base.as_str(), "https://github.com/");
    println!("The base of the URL is: {}", base);

    Ok(())
}

fn base_url(mut url: Url) -> Result<Url> {
    match url.path_segments_mut() {
        Ok(mut path) => {
            path.clear();
        }
        Err(_) => {
            return Err(anyhow!("This is a cannot-be-a-base URL."));
        }
    }

    url.set_query(None);

    Ok(url)
}

Create new URLs from a base URL

The url::Url::join method creates a new URL by resolving a relative path against a base URL.

use url::ParseError;
use url::Url;

fn main() -> Result<(), ParseError> {
    let path = "/rust-lang/cargo";

    let gh = build_github_url(path)?;

    assert_eq!(gh.as_str(), "https://github.com/rust-lang/cargo");
    println!("The joined URL is: {}", gh);

    Ok(())
}

fn build_github_url(path: &str) -> Result<Url, ParseError> {
    const GITHUB: &str = "https://github.com";

    let base = Url::parse(GITHUB).expect("hardcoded URL is known to be valid");
    let joined = base.join(path)?;

    Ok(joined)
}

Extract the URL origin (scheme / host / port)

The url::Url struct exposes methods that return the individual parts of a URL's origin: scheme, host, and port_or_known_default.

use url::Host;
use url::ParseError;
use url::Url;

fn main() -> Result<(), ParseError> {
    let s = "ftp://rust-lang.org/examples";

    let url = Url::parse(s)?;

    assert_eq!(url.scheme(), "ftp");
    assert_eq!(url.host(), Some(Host::Domain("rust-lang.org")));
    assert_eq!(url.port_or_known_default(), Some(21));
    println!("The origin is as expected!");

    Ok(())
}

url::Url::origin returns the same three components as a single url::Origin value.

use anyhow::Result;
use url::Host;
use url::Origin;
use url::Url;

fn main() -> Result<()> {
    let s = "ftp://rust-lang.org/examples";

    let url = Url::parse(s)?;

    let expected_scheme = "ftp".to_owned();
    let expected_host = Host::Domain("rust-lang.org".to_owned());
    let expected_port = 21;
    let expected = Origin::Tuple(expected_scheme, expected_host, expected_port);

    let origin = url.origin();
    assert_eq!(origin, expected);
    println!("The origin is as expected!");

    Ok(())
}

Remove fragment identifiers and query pairs from a URL

The example parses a string into a url::Url and slices it with url::Position to drop the query string and fragment, keeping everything up to the end of the path.

use url::ParseError;
use url::Position;
use url::Url;

fn main() -> Result<(), ParseError> {
    let parsed = Url::parse(
        "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open",
    )?;
    let cleaned: &str = &parsed[..Position::AfterPath];
    println!("cleaned: {}", cleaned);
    Ok(())
}