Statistics
Recipe | Crates | Categories |
---|---|---|
Computing standard deviation | ||
Measures of central tendency | ||
Standard deviation |
Measures of central tendency
These examples calculate measures of central tendency for a data set contained within a Rust array. There may be no mean, median or mode to calculate for an empty set of data, so each function returns an std::option::Option
⮳ to be handled by the caller.
The first example calculates the mean (the sum of all measurements divided by the number of measurements in the set) by producing an iterator of references over the data, and using std::iter::Iterator::sum
⮳ and len
⮳ to determine the total value and count of values respectively.
fn main() { let data = [3, 1, 6, 1, 5, 8, 1, 8, 10, 11]; let sum = data.iter().sum::<i32>() as f32; let count = data.len(); let mean = match count { positive if positive > 0 => Some(sum / count as f32), _ => None, }; println!("Mean of the data is {:?}", mean); }
The second example calculates the median using the quickselect algorithm, which avoids a full sort
⮳ by sorting only partitions of the data set known to possibly contain the median. This uses std::cmp::Ord::cmp
⮳ and std::cmp::Ordering
⮳ to succinctly decide the next partition to examine, and split_at
⮳ to choose an arbitrary pivot for the next partition at each step.
use std::cmp::Ordering; fn partition(data: &[i32]) -> Option<(Vec<i32>, i32, Vec<i32>)> { match data.len() { 0 => None, _ => { let (pivot_slice, tail) = data.split_at(1); let pivot = pivot_slice[0]; let (left, right) = tail.iter().fold((vec![], vec![]), |mut splits, next| { { let (ref mut left, ref mut right) = &mut splits; if next < &pivot { left.push(*next); } else { right.push(*next); } } splits }); Some((left, pivot, right)) } } } fn select(data: &[i32], k: usize) -> Option<i32> { let part = partition(data); match part { None => None, Some((left, pivot, right)) => { let pivot_idx = left.len(); match pivot_idx.cmp(&k) { Ordering::Equal => Some(pivot), Ordering::Greater => select(&left, k), Ordering::Less => select(&right, k - (pivot_idx + 1)), } } } } fn median(data: &[i32]) -> Option<f32> { let size = data.len(); match size { even if even % 2 == 0 => { let fst_med = select(data, (even / 2) - 1); let snd_med = select(data, even / 2); match (fst_med, snd_med) { (Some(fst), Some(snd)) => Some((fst + snd) as f32 / 2.0), _ => None, } } odd => select(data, odd / 2).map(|x| x as f32), } } fn main() { let data = [3, 1, 6, 1, 5, 8, 1, 8, 10, 11]; let part = partition(&data); println!("Partition is {:?}", part); let sel = select(&data, 5); println!("Selection at ordered index {} is {:?}", 5, sel); let med = median(&data); println!("Median is {:?}", med); }
The final example calculates the mode using a mutable std::collections::HashMap
⮳ to collect counts of each distinct integer from the set, using a std::iter::Iterator::fold
⮳ and the std::collections::hash_map::Entry
⮳ API. The most frequent value in the std::collections::HashMap
⮳ surfaces with std::iter::Iterator::max_by_key
⮳.
use std::collections::HashMap; fn main() { let data = [3, 1, 6, 1, 5, 8, 1, 8, 10, 11]; let frequencies = data.iter().fold(HashMap::new(), |mut freqs, value| { *freqs.entry(value).or_insert(0) += 1; freqs }); let mode = frequencies .into_iter() .max_by_key(|&(_, count)| count) .map(|(value, _)| *value); println!("Mode of the data is {:?}", mode); }
Standard deviation
This example calculates the standard deviation and z-score of a set of measurements.
The standard deviation is defined as the square root of the variance (here calculated with f32's sqrt
⮳ where the variance is the std::iter::Iterator::sum
⮳ of the squared difference between each measurement and the mean
divided by the number of measurements).
The z-score is the number of standard deviations a single measurement spans away from the mean
of the data set.
fn mean(data: &[i32]) -> Option<f32> { let sum = data.iter().sum::<i32>() as f32; let count = data.len(); match count { positive if positive > 0 => Some(sum / count as f32), _ => None, } } fn std_deviation(data: &[i32]) -> Option<f32> { match (mean(data), data.len()) { (Some(data_mean), count) if count > 0 => { let variance = data .iter() .map(|value| { let diff = data_mean - (*value as f32); diff * diff }) .sum::<f32>() / count as f32; Some(variance.sqrt()) } _ => None, } } fn main() { let data = [3, 1, 6, 1, 5, 8, 1, 8, 10, 11]; let data_mean = mean(&data); println!("Mean is {:?}", data_mean); let data_std_deviation = std_deviation(&data); println!("Standard deviation is {:?}", data_std_deviation); let zscore = match (data_mean, data_std_deviation) { (Some(mean), Some(std_deviation)) => { let diff = data[4] as f32 - mean; Some(diff / std_deviation) } _ => None, }; println!( "Z-score of data at index 4 (with value {}) is {:?}", data[4], zscore ); }