FPP

Statistics functions adapted to the conventions of Freedman, Pisani, and Purves (2007).

statwrap.fpp.average(*args)

Computes the arithmetic mean.

\[\frac{1}{n} \sum_{i=1}^{n} x_i\]
Parameters:

args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both average([1,2]) and average(1,2) are valid.

Returns:

The average value, or arithmetic mean, for a collection of numbers.

Return type:

float

Example

>>> average(0, 5, -8, 7, -3)
0.2
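
The two calling conventions described above can be supported by normalizing the arguments before computing anything. The following is a minimal sketch of that idea, not statwrap's actual implementation; `as_numbers` and `average_sketch` are hypothetical names used only for illustration.

```python
def as_numbers(*args):
    """Return the inputs as a flat list of floats.

    Hypothetical helper: accepts either one array-like, as in f([1, 2]),
    or individual scalars, as in f(1, 2).
    """
    if len(args) == 1 and hasattr(args[0], "__iter__"):
        args = args[0]  # a single array-like was passed
    return [float(x) for x in args]


def average_sketch(*args):
    """Arithmetic mean: the sum of the values divided by their count."""
    values = as_numbers(*args)
    return sum(values) / len(values)


print(average_sketch(0, 5, -8, 7, -3))    # scalar form, same data as the example above
print(average_sketch([0, 5, -8, 7, -3]))  # array-like form, same result
```
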
statwrap.fpp.box_model(*args, with_replacement=True, draws=1, random_seed=None)

Returns random draws from a box model in which each number in the box is equally likely to be drawn.

Parameters:
  • *args (tuple or array_like) – The elements forming the box from which numbers will be drawn. If a single array_like is provided, it will be used as the box model. If multiple values are provided, they should be passed as a flat tuple.

  • with_replacement (bool, optional) – Specifies whether drawing is done with replacement. Default is True, meaning each number is returned to the box after it is drawn.

  • draws (int, optional) – The number of draws to be made from the box. Must be a positive integer. Default is 1.

  • random_seed (int, optional) – The seed for the random number generator, which makes the random draws reproducible. Default is None.

Returns:

If draws is 1, returns a single value from the box. If draws is greater than 1, returns a list of length draws, containing the randomly drawn numbers from the box.

Return type:

single value or list

Raises:

ValueError – If draws is not a positive integer.

Examples

>>> box_model([1, 2, 3, 4, 5, 6], with_replacement=True, draws=3)
[2, 5, 5]
>>> box_model((1, 2, 3, 4, 5, 6), with_replacement=False, draws=3)
[4, 2, 6]
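
The difference between drawing with and without replacement can be sketched with the standard library's `random` module. This is an illustration of the documented behavior under stated assumptions, not statwrap's implementation; `draw_from_box` is a hypothetical name.

```python
import random


def draw_from_box(box, with_replacement=True, draws=1, random_seed=None):
    """Draw tickets from a box, each ticket equally likely (illustrative only)."""
    if not isinstance(draws, int) or draws < 1:
        raise ValueError("draws must be a positive integer")
    rng = random.Random(random_seed)  # private generator, seeded for reproducibility
    box = list(box)
    if with_replacement:
        drawn = rng.choices(box, k=draws)  # a ticket can come up more than once
    else:
        drawn = rng.sample(box, draws)     # each ticket can be drawn at most once
    return drawn[0] if draws == 1 else drawn


first = draw_from_box([1, 2, 3, 4, 5, 6], draws=3, random_seed=0)
second = draw_from_box([1, 2, 3, 4, 5, 6], draws=3, random_seed=0)
print(first == second)  # True: the same seed reproduces the same draws
```

In this sketch, drawing without replacement fails once `draws` exceeds the number of tickets, because `random.Random.sample` raises a ValueError in that case.
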
statwrap.fpp.contingency_table(data, column_1, column_2)

Generates a contingency table from two specified columns of a pandas DataFrame.

Parameters:
  • data (pd.DataFrame) – The DataFrame containing the data.

  • column_1 (str) – Title of the first column.

  • column_2 (str) – Title of the second column.

Returns:

A contingency table for the two columns.

Return type:

pd.DataFrame

Examples

>>> df = pd.read_csv("cps_categoricals_00.csv")
>>> contingency_table(df, 'Industry', 'Geo_division')

(contingency table will appear in notebook output)
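
A contingency table counts how often each combination of two categorical values occurs. The idea can be sketched with the standard library alone; the rows, column names, and values below are made up for illustration, and this is not how statwrap builds its table.

```python
from collections import Counter

# Hypothetical records; the column names and values are made up for illustration.
rows = [
    {"Industry": "Retail", "Region": "West"},
    {"Industry": "Retail", "Region": "East"},
    {"Industry": "Mining", "Region": "West"},
    {"Industry": "Retail", "Region": "West"},
]

# Count each (Industry, Region) pair: these counts are the cells of the table.
cells = Counter((row["Industry"], row["Region"]) for row in rows)

industries = sorted({row["Industry"] for row in rows})
regions = sorted({row["Region"] for row in rows})

# Print one table row per industry and one column per region.
print("Industry".ljust(10) + "".join(region.rjust(6) for region in regions))
for industry in industries:
    counts = "".join(str(cells[(industry, region)]).rjust(6) for region in regions)
    print(industry.ljust(10) + counts)
```
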

statwrap.fpp.histogram(*data_args, class_intervals=None, bins=None, density=True, xlim=None, ylim=None, ax=None, show=True, save_as=None, xlabel=None, ylabel=None, title=None, precision=0, **kwargs)

Creates a histogram using matplotlib.

Parameters:
  • data_args (array-like or sequence of array-likes or numeric scalars) – Input data to be plotted as a histogram.

  • class_intervals (int or sequence, optional) – The number of class intervals if an int is provided, or the interval edges if a sequence is provided. If not provided, defaults are used.

  • bins (int or sequence, optional) – Alternative name for class_intervals. class_intervals takes precedence if arguments are provided for both.

  • density (bool, default True) – If True, normalizes the histogram so that the total area is equal to 1.

  • xlim (tuple, optional) – The x-axis limits as (min, max). If not provided, defaults are used.

  • ylim (tuple, optional) – The y-axis limits as (min, max). If not provided, defaults are used.

  • ax (matplotlib axes object, optional) – An existing axes to draw the histogram on. If None, a new figure and axes are created.

  • show (bool, default True) – If True, displays the plot. Otherwise, it returns the figure and axis.

  • save_as (str, optional) – If a string is provided, the figure is saved with the given filename. This must include an extension like ‘.png’ or ‘.pdf’.

  • xlabel (str, optional) – Label for the x-axis.

  • ylabel (str, optional) – Label for the y-axis.

  • title (str, optional) – Title for the histogram plot.

  • kwargs (dict) – Additional keyword arguments to pass to ax.hist.

Returns:

fig, ax – A tuple containing the figure and axis objects. Only returned if show is False.

Return type:

tuple

Examples

>>> histogram([1,2,3,3,3], save_as = 'example.png')
(histogram will appear in notebook output)
>>> histogram(1,2,3,3,3, save_as = 'example.png')
(alternate syntax producing the same histogram as above)
>>> histogram([(1,2), (1,1,1,1)], title = 'Example')
(overlapping histograms with two data sets)
>>> histogram((1,2), (1,1,1,1), title = 'Example')
(alternate syntax for overlapping histograms with two data sets)
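
With density=True, the height of each block is chosen so that the block areas add up to 1. The arithmetic can be sketched without matplotlib. `density_heights` is a hypothetical helper, and the endpoint convention in its docstring is an assumption made for this sketch, not a statement about statwrap.

```python
def density_heights(data, edges):
    """Block heights for a density histogram (illustrative sketch).

    Assumes every value lies within the edges, and that each class interval
    includes its left endpoint and excludes its right endpoint, except the
    last interval, which includes both.
    """
    n = len(data)
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        count = sum(1 for x in data if left <= x < right or (is_last and x == right))
        heights.append(count / (n * (right - left)))  # fraction of data per unit of width
    return heights


edges = [1, 2, 4]
heights = density_heights([1, 2, 3, 3, 3], edges)
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
print(heights)     # [0.2, 0.4]
print(sum(areas))  # the block areas add up to 1, up to floating-point rounding
```
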
statwrap.fpp.r(x, y)

Calculates the Pearson correlation coefficient.

\[\frac{1}{n} \sum_{i=1}^{n} \dfrac{ (x_i - \mu_x) (y_i - \mu_y) }{ \text{SD}_x \times \text{SD}_y }\]

This is the average of the product of the z-scores.

Parameters:
  • x (array_like) – The first input array.

  • y (array_like) – The second input array.

Returns:

The correlation coefficient.

Return type:

float

Example

>>> r([0,1,1], [2,-9,2])
-0.5
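
The "average of the product of the z-scores" description can be checked by hand on the example data. The sketch below uses only the standard library's `statistics` module; `r_sketch` is a hypothetical name, and this is not statwrap's implementation.

```python
from statistics import fmean, pstdev


def r_sketch(x, y):
    """Pearson r as the average of the products of the z-scores."""
    mean_x, mean_y = fmean(x), fmean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)  # population SDs (divide by n)
    products = [
        ((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
        for xi, yi in zip(x, y)
    ]
    return fmean(products)


print(round(r_sketch([0, 1, 1], [2, -9, 2]), 6))  # -0.5
```
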
statwrap.fpp.rms_size(*args)

Computes the r.m.s. (root-mean-square) size of a list of numbers.

\[\sqrt{ \frac{1}{n}\sum_{i=1}^{n}x_i^2 }\]
Parameters:

args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both rms_size([1,2]) and rms_size(1,2) are valid.

Returns:

The r.m.s. value of the provided numbers.

Return type:

float

Example

>>> rms_size(0, 5, -8, 7, -3)
5.422176684690384
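
Reading "root mean square" from right to left gives the order of operations: square the values, take their mean, then take the square root. For the example above the squares sum to 147, their mean is 29.4, and the square root of 29.4 is about 5.4222. A minimal sketch, with `rms_size_sketch` as a hypothetical name:

```python
import math


def rms_size_sketch(values):
    """Root-mean-square size: square, then average, then take the root."""
    squares = [x * x for x in values]          # square each entry
    mean_square = sum(squares) / len(squares)  # mean of the squares
    return math.sqrt(mean_square)              # root of the mean square


print(rms_size_sketch([0, 5, -8, 7, -3]))  # about 5.4222
```
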
statwrap.fpp.scatter_plot(x, y, xlim=None, ylim=None, ax=None, show=True, save_as=None, xlabel=None, ylabel=None, title=None, regression_line=False, regression_equation=False, **kwargs)

Create a scatter plot of x versus y, with specified axis labels, limits, title, and other properties. Optionally, a regression line can be added to the plot.

Parameters:
  • x (array-like) – The data values for the x-axis.

  • y (array-like) – The data values for the y-axis.

  • xlim (tuple, optional) – The limits for the x-axis in the form of (xmin, xmax). Default is None.

  • ylim (tuple, optional) – The limits for the y-axis in the form of (ymin, ymax). Default is None.

  • ax (matplotlib.axes._axes.Axes, optional) – The axes upon which to plot. If None, new axes will be created. Default is None.

  • show (bool, optional) – If True, display the plot. If False, return the plot object without displaying it. Default is True.

  • save_as (str, optional) – The filename (with path) to save the figure. If None, the figure is not saved. Default is None.

  • xlabel (str, optional) – The label for the x-axis. Default is None.

  • ylabel (str, optional) – The label for the y-axis. Default is None.

  • title (str, optional) – The title of the plot. Default is None.

  • regression_line (bool, optional) – If True, a regression line will be added to the plot. Default is False.

  • regression_equation (bool, optional) – If True, the equation of the regression line will be added to the top of the plot. Default is False.

  • **kwargs (dict) – Additional keyword arguments passed to matplotlib.pyplot.scatter.

Returns:

fig, ax – The figure and axes objects, returned only if show is False.

Return type:

matplotlib.figure.Figure, matplotlib.axes._axes.Axes

Examples

>>> import numpy as np
>>> x = np.random.rand(50)
>>> y = np.random.rand(50)
>>> scatter_plot(x, y, xlabel='X-axis', ylabel='Y-axis', title='Scatter Plot', regression_line=True)

Notes

If ax is None and show is True (the defaults), a new figure and axes will be created and displayed.
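
This page does not say how the regression line is computed. In the conventions of Freedman, Pisani, and Purves, the regression line of y on x passes through the point of averages with slope r × SD_y / SD_x. The sketch below works under that assumption and makes no claim about what scatter_plot does internally; `regression_line_sketch` is a hypothetical name.

```python
from statistics import fmean, pstdev


def regression_line_sketch(x, y):
    """Return (slope, intercept) of the regression line of y on x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)  # population SDs (divide by n)
    # r is the average of the products of the z-scores.
    r = fmean([
        ((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
        for xi, yi in zip(x, y)
    ])
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x  # the line passes through the point of averages
    return slope, intercept


slope, intercept = regression_line_sketch([1, 2, 3, 4], [3, 5, 7, 9])
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```
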

statwrap.fpp.sd(*args)

Computes the population standard deviation, or SD.

\[\sqrt{ \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 }\]
Parameters:

args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both sd([1,2]) and sd(1,2) are valid.

Returns:

The population standard deviation of the input array.

Return type:

float

Examples

>>> sd([-1, 0, 1])
0.816496580927726
>>> sd(-1,0,1)
0.816496580927726
statwrap.fpp.sd_plus(*args)

Computes the sample standard deviation, or SD+.

\[\sqrt{ \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 }\]
Parameters:

args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both sd_plus([1,2]) and sd_plus(1,2) are valid.

Returns:

The sample standard deviation of the input array.

Return type:

float

Raises:

ValueError – If the input data has fewer than two elements, because the n − 1 divisor would then be zero.

Examples

>>> sd_plus([-1, 0, 1])
1.0
>>> sd_plus(-1, 0, 1)
1.0
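
SD and SD+ differ only in the divisor (n versus n − 1), so SD+ equals SD multiplied by the square root of n / (n − 1). This can be checked with the standard library's `statistics` module, which is used here purely for illustration and is not necessarily what statwrap calls.

```python
import math
from statistics import pstdev, stdev

data = [-1, 0, 1]
n = len(data)

sd = pstdev(data)      # population SD: divide by n
sd_plus = stdev(data)  # sample SD (SD+): divide by n - 1

print(sd, sd_plus)
# The two differ only by the factor sqrt(n / (n - 1)).
print(math.isclose(sd_plus, sd * math.sqrt(n / (n - 1))))  # True
```
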
statwrap.fpp.standard_units(*args, sd_plus=False)

Converts input values to standard units, where standard units indicate the number of standard deviations an element is from the average.

\[\frac{x-\mu}{\text{SD}}\]
Parameters:
  • args (array_like or numeric scalars) – Input data. This can be a single array-like object containing all data points, or individual numeric scalar values. Examples include standard_units([1, 2, 3]) and standard_units(1, 2, 3), both of which are valid.

  • sd_plus (bool, optional) – Selects which SD is used for the conversion: False (the default) for the population SD, True for the sample SD (SD+). This sets the delta degrees of freedom used for numpy.std.

Returns:

A list of the input data converted to standard units. Each value represents how many standard deviations it is from the dataset’s mean.

Return type:

list

Raises:

ValueError – If the standard deviation of the input data is zero, indicating that all input values are identical and conversion to standard units is undefined.

Examples

>>> standard_units([-1, 0, 1])
[-1.224744871391589, 0.0, 1.224744871391589]
>>> standard_units([1, 1, 1])
ValueError: Standard deviation is zero. Standard units are undefined.
>>> standard_units([1, 6, 100])
[-0.761297225001359, -0.651494740626163, 1.4127919656275223]
>>> standard_units(-100, 0, 1000, 2, 17)
[-0.6918327146385096, -0.4480579737510855, 1.9896894351231555, -0.44318247893333707, -0.4066162678002234]
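
The conversion subtracts the average from each value and divides by the SD. A minimal sketch using the standard library, with `standard_units_sketch` as a hypothetical name; it mirrors the documented behavior but is not statwrap's implementation.

```python
from statistics import fmean, pstdev, stdev


def standard_units_sketch(values, sd_plus=False):
    """Convert values to standard units: (x - average) / SD."""
    spread = stdev(values) if sd_plus else pstdev(values)  # SD+ or SD
    if spread == 0:
        raise ValueError("Standard deviation is zero. Standard units are undefined.")
    mean = fmean(values)
    return [(x - mean) / spread for x in values]


print(standard_units_sketch([-1, 0, 1]))                # population SD in the divisor
print(standard_units_sketch([-1, 0, 1], sd_plus=True))  # [-1.0, 0.0, 1.0]
```
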
statwrap.fpp.var(*args)

Computes the population variance.

\[\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2\]
Parameters:

args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both var([1,2]) and var(1,2) are valid.

Returns:

The population variance of the input array.

Return type:

float

Examples

>>> var([-1, 0, 1])
0.6666666666666666
>>> var(-1, 0, 1)
0.6666666666666666
statwrap.fpp.var_plus(*args)

Computes the sample variance.

\[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\]
Parameters:

args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both var_plus([1,2]) and var_plus(1,2) are valid.

Returns:

The sample variance of the input array.

Return type:

float

Raises:

ValueError – If the input data has fewer than two elements, because the n − 1 divisor would then be zero.

Examples

>>> var_plus([-1, 0, 1])
1.0
>>> var_plus(-1, 0, 1)
1.0
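
Each variance is the square of the corresponding standard deviation, and the two variances differ only in the divisor (n versus n − 1). This can be checked with the standard library's `statistics` module, which is used here for illustration and is not necessarily what statwrap calls.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [-1, 0, 1]

pop_var = pvariance(data)    # population variance: divide by n
sample_var = variance(data)  # sample variance: divide by n - 1

print(pop_var, sample_var)
# Each variance is the square of the matching standard deviation.
print(math.isclose(pop_var, pstdev(data) ** 2))    # True
print(math.isclose(sample_var, stdev(data) ** 2))  # True
```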