FPP
Stats functions adapted to the conventions of Freedman, Pisani, and Purves 2007.
- statwrap.fpp.average(*args)
Computes the arithmetic mean.
\[\frac{1}{n} \sum_{i=1}^{n} x_i\]
- Parameters:
args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both average([1,2]) and average(1,2) are valid.
- Returns:
The average value, or arithmetic mean, for a collection of numbers.
- Return type:
float
Example
>>> average(0, 5, -8, 7, -3)
0.2
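The two calling conventions (one array-like, or several scalars) can be supported with a small normalizing step. The sketch below is a hypothetical illustration, not statwrap's actual code; the names `_as_list` and `average_sketch` are assumptions made for this example.

```python
def _as_list(args):
    """Hypothetical helper: accept either one array-like or several scalars."""
    # A single iterable argument is treated as the whole data set.
    if len(args) == 1 and hasattr(args[0], "__iter__"):
        return list(args[0])
    return list(args)


def average_sketch(*args):
    """Arithmetic mean: the sum of the entries divided by how many there are."""
    data = _as_list(args)
    return sum(data) / len(data)


print(average_sketch(0, 5, -8, 7, -3))    # scalars passed one by one
print(average_sketch([0, 5, -8, 7, -3]))  # the same data passed as one list
```

Both calls print the same value, which is the point of accepting `*args`.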
- statwrap.fpp.box_model(*args, with_replacement=True, draws=1, random_seed=None)
Returns random draws from a box model where each number in the box model is equally likely to be drawn.
- Parameters:
*args (tuple or array_like) – The elements forming the box from which numbers will be drawn. If a single array_like is provided, it will be used as the box model. If multiple values are provided, they should be passed as a flat tuple.
with_replacement (bool, optional) – Specifies whether drawing is done with replacement. Default is True, where numbers are replaced back into the box after each draw.
draws (int, optional) – The number of draws to be made from the box. Must be a positive integer. Default is 1.
random_seed (int, optional) – Seed for the random number generator, which makes the random draws reproducible. Default is None (no seed is set).
- Returns:
If draws is 1, returns a single value from the box. If draws is greater than 1, returns a list of length draws, containing the randomly drawn numbers from the box.
- Return type:
single value or list
- Raises:
ValueError – If draws is not a positive integer.
Examples
>>> box_model([1, 2, 3, 4, 5, 6], with_replacement=True, draws=3)
[2, 5, 5]
>>> box_model((1, 2, 3, 4, 5, 6), with_replacement=False, draws=3)
[4, 2, 6]
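The difference between drawing with and without replacement, and the effect of a seed, can be sketched with the standard library alone. This is a hypothetical re-creation of the documented behavior, not statwrap's implementation; `box_model_sketch` is a name invented for this example.

```python
import random


def box_model_sketch(box, with_replacement=True, draws=1, random_seed=None):
    """Hypothetical stand-in that mimics the documented behavior."""
    if not isinstance(draws, int) or draws < 1:
        raise ValueError("draws must be a positive integer")
    rng = random.Random(random_seed)  # private generator, so the seed stays local
    box = list(box)
    if with_replacement:
        result = rng.choices(box, k=draws)  # tickets go back into the box
    else:
        result = rng.sample(box, k=draws)   # tickets stay out once drawn
    return result[0] if draws == 1 else result


first = box_model_sketch([1, 2, 3, 4, 5, 6], draws=3, random_seed=0)
second = box_model_sketch([1, 2, 3, 4, 5, 6], draws=3, random_seed=0)
print(first == second)  # the same seed reproduces the same draws

no_repl = box_model_sketch([1, 2, 3, 4, 5, 6], with_replacement=False, draws=6)
print(sorted(no_repl))  # six draws without replacement use every ticket once
```

Without replacement, asking for more draws than there are tickets cannot succeed; in this sketch `random.sample` raises a `ValueError` in that case.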
- statwrap.fpp.contingency_table(data, column_1, column_2)
Generates a contingency table from two specified columns of a pandas DataFrame.
- Parameters:
data (pd.DataFrame) – The DataFrame containing the data, such as the df loaded in the example below.
column_1 (str) – Title of the first column.
column_2 (str) – Title of the second column.
- Returns:
The contingency table for the two columns.
- Return type:
pd.DataFrame
Examples
>>> import pandas as pd
>>> df = pd.read_csv("cps_categoricals_00.csv")
>>> contingency_table(df, 'Industry', 'Geo_division')
(contingency table will appear in notebook output)
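To make concrete what a contingency table holds, here is a pure-Python sketch that counts value pairs. The records are made up for illustration and `contingency_counts` is a hypothetical helper; statwrap itself operates on a DataFrame and its internals are not shown here.

```python
from collections import Counter


def contingency_counts(rows, column_1, column_2):
    """Count how often each (column_1, column_2) pair of values occurs."""
    return Counter((row[column_1], row[column_2]) for row in rows)


# Made-up records, for illustration only.
rows = [
    {"Industry": "Retail", "Geo_division": "Pacific"},
    {"Industry": "Retail", "Geo_division": "Pacific"},
    {"Industry": "Retail", "Geo_division": "Mountain"},
    {"Industry": "Mining", "Geo_division": "Mountain"},
]
counts = contingency_counts(rows, "Industry", "Geo_division")

industries = sorted({row["Industry"] for row in rows})
divisions = sorted({row["Geo_division"] for row in rows})
print("columns:", divisions)
for industry in industries:
    # One table row per industry, one cell per geographic division.
    print(industry, [counts[(industry, division)] for division in divisions])
```

Each cell is the number of records that share one value of the first column and one value of the second; pairs that never occur count as zero.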
- statwrap.fpp.histogram(*data_args, class_intervals=None, bins=None, density=True, xlim=None, ylim=None, ax=None, show=True, save_as=None, xlabel=None, ylabel=None, title=None, precision=0, **kwargs)
Creates a histogram using matplotlib.
- Parameters:
data_args (array-like, sequence of array-likes, or numeric scalars) – Input data to be plotted as a histogram.
class_intervals (int or sequence, optional) – The number of blocks or the interval edges if a sequence is provided. If not provided, defaults are used.
bins (int or sequence, optional) – Alternative name for class_intervals. class_intervals takes precedence if arguments are provided for both.
density (bool, default True) – If True, normalizes the histogram so that the total area is equal to 1.
xlim (tuple, optional) – The x-axis limits as (min, max). If not provided, defaults are used.
ylim (tuple, optional) – The y-axis limits as (min, max). If not provided, defaults are used.
ax (matplotlib axes object, optional) – An existing axes to draw the histogram on. If None, a new figure and axes are created.
show (bool, default True) – If True, displays the plot. If False, returns the figure and axes instead.
save_as (str, optional) – If a string is provided, the figure is saved with the given filename. This must include an extension like ‘.png’ or ‘.pdf’.
xlabel (str, optional) – Label for the x-axis.
ylabel (str, optional) – Label for the y-axis.
title (str, optional) – Title for the histogram plot.
kwargs (dict) – Additional keyword arguments to pass to ax.hist.
- Returns:
fig, ax – A tuple containing the figure and axis objects. Only returned if show is False.
- Return type:
tuple
Examples
>>> histogram([1,2,3,3,3], save_as='example.png')
(histogram will appear in notebook output)
>>> histogram(1,2,3,3,3, save_as='example.png')
(alternate syntax producing the same histogram as above)
>>> histogram([(1,2), (1,1,1,1)], title='Example')
(overlapping histograms with two data sets)
>>> histogram((1,2), (1,1,1,1), title='Example')
(alternate syntax for overlapping histograms with two data sets)
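The `density=True` normalization can be checked by hand: each block's height is its share of the data divided by the width of its class interval, so the areas sum to 1. The following is a minimal sketch under stated assumptions, not statwrap's code: `density_heights` is a hypothetical helper, the edges are chosen for illustration, and every data point is assumed to lie within the edges.

```python
def density_heights(data, edges):
    """Heights of histogram blocks scaled so the total area equals 1."""
    n = len(data)  # assumes every data point lies within the given edges
    heights = []
    for i, (left, right) in enumerate(zip(edges[:-1], edges[1:])):
        last = i == len(edges) - 2
        # Each interval includes its left endpoint; only the last interval
        # also includes its right endpoint (the numpy.histogram convention).
        count = sum(1 for x in data if left <= x < right or (last and x == right))
        heights.append(count / (n * (right - left)))
    return heights


edges = [1, 2, 4]  # two class intervals of unequal width
heights = density_heights([1, 2, 3, 3, 3], edges)
areas = [h * (b - a) for h, a, b in zip(heights, edges[:-1], edges[1:])]
print(heights)
print(sum(areas))  # total area of the blocks
```

With unequal class intervals, it is the area of a block, not its height, that represents the share of the data falling in that interval.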
- statwrap.fpp.r(x, y)
Calculates the Pearson correlation coefficient.
\[\frac{1}{n} \sum_{i=1}^{n} \dfrac{ (x_i - \mu_x) (y_i - \mu_y) }{ \text{SD}_x \times \text{SD}_y }\]
This is the average of the product of the z-scores.
- Parameters:
x (array_like) – The first input array.
y (array_like) – The second input array.
- Returns:
The correlation coefficient.
- Return type:
float
Example
>>> r([0,1,1], [2,-9,2])
-0.5
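The formula can be followed step by step: convert each list to standard units using the population SD, multiply the pairs, and average the products. The sketch below does this with the standard library; `r_sketch` is a hypothetical name and this is not necessarily how statwrap computes the value.

```python
import math


def r_sketch(x, y):
    """Correlation as the average of the products of the standard units."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)  # population SD
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    products = [
        ((a - mean_x) / sd_x) * ((b - mean_y) / sd_y) for a, b in zip(x, y)
    ]
    return sum(products) / n


# Same data as the example above; rounding hides floating-point noise.
print(round(r_sketch([0, 1, 1], [2, -9, 2]), 10))
```

Note that the divisor is n throughout, matching the population SD convention used elsewhere in this module.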
- statwrap.fpp.rms_size(*args)
Computes the r.m.s. (root-mean-square) size of a list of numbers.
\[\sqrt{ \frac{1}{n}\sum_{i=1}^{n}x_i^2 }\]
- Parameters:
args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both rms_size([1,2]) and rms_size(1,2) are valid.
- Returns:
The r.m.s. value of the provided numbers.
- Return type:
float
Example
>>> rms_size(0, 5, -8, 7, -3)
5.422176684690384
- statwrap.fpp.scatter_plot(x, y, xlim=None, ylim=None, ax=None, show=True, save_as=None, xlabel=None, ylabel=None, title=None, regression_line=False, regression_equation=False, **kwargs)
Create a scatter plot of x versus y, with specified axis labels, limits, title, and other properties. Optionally, a regression line can be added to the plot.
- Parameters:
x (array-like) – The data values for the x-axis.
y (array-like) – The data values for the y-axis.
xlim (tuple, optional) – The limits for the x-axis in the form of (xmin, xmax). Default is None.
ylim (tuple, optional) – The limits for the y-axis in the form of (ymin, ymax). Default is None.
ax (matplotlib.axes._axes.Axes, optional) – The axes upon which to plot. If None, new axes will be created. Default is None.
show (bool, optional) – If True, display the plot. If False, return the plot object without displaying it. Default is True.
save_as (str, optional) – The filename (with path) to save the figure. If None, the figure is not saved. Default is None.
xlabel (str, optional) – The label for the x-axis. Default is None.
ylabel (str, optional) – The label for the y-axis. Default is None.
title (str, optional) – The title of the plot. Default is None.
regression_line (bool, optional) – If True, a regression line will be added to the plot. Default is False.
regression_equation (bool, optional) – If True, the equation of the regression line will be added to the top of the plot. Default is False.
**kwargs (dict) – Additional keyword arguments passed to matplotlib.pyplot.scatter.
- Returns:
fig, ax – The figure and axes objects, returned only if show is False.
- Return type:
matplotlib.figure.Figure, matplotlib.axes._axes.Axes
Examples
>>> import numpy as np
>>> x = np.random.rand(50)
>>> y = np.random.rand(50)
>>> scatter_plot(x, y, xlabel='X-axis', ylabel='Y-axis', title='Scatter Plot', regression_line=True)
Notes
If ax is None and show is True (the defaults), a new figure and axes will be created and displayed.
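The documentation above does not say how the regression line is computed. In the Freedman, Pisani, and Purves convention, the slope is r × SD_y / SD_x and the line passes through the point of averages; the sketch below works this out with the standard library. It is an illustration under that assumption, and `regression_line_sketch` is a hypothetical name, not part of statwrap.

```python
import math


def regression_line_sketch(x, y):
    """Slope and intercept of the regression line for y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    # Correlation: average product of deviations, divided by both SDs.
    r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n * sd_x * sd_y)
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x  # line passes through the point of averages
    return slope, intercept


# Points lying exactly on y = 2x + 1, so the line should recover that equation.
slope, intercept = regression_line_sketch([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

This slope agrees with the ordinary least-squares slope, because the factor n (or n - 1) cancels between r and the two SDs.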
- statwrap.fpp.sd(*args)
Computes the population standard deviation, or SD.
\[\sqrt{ \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 }\]
- Parameters:
args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both sd([1,2]) and sd(1,2) are valid.
- Returns:
The population standard deviation of the input array.
- Return type:
float
Examples
>>> sd([-1, 0, 1])
0.816496580927726
>>> sd(-1,0,1)
0.816496580927726
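The SD formula is the r.m.s. size formula applied to the deviations from the average. A minimal sketch of that relationship, using hypothetical function names rather than statwrap's own:

```python
import math


def rms_size_sketch(values):
    """Square the entries, average the squares, then take the square root."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


def sd_sketch(values):
    """Population SD: the r.m.s. size of the deviations from the average."""
    avg = sum(values) / len(values)
    return rms_size_sketch([v - avg for v in values])


print(rms_size_sketch([0, 5, -8, 7, -3]))
print(sd_sketch([-1, 0, 1]))
```

When the average of a list is zero, as for [-1, 0, 1], its SD and its r.m.s. size coincide.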
- statwrap.fpp.sd_plus(*args)
Computes the sample standard deviation, or SD+.
\[\sqrt{ \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 }\]
- Parameters:
args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both sd_plus([1,2]) and sd_plus(1,2) are valid.
- Returns:
The sample standard deviation of the input array.
- Return type:
float
- Raises:
ValueError – If the input data has one or fewer elements. Raising this error prevents division by zero.
Examples
>>> sd_plus([-1, 0, 1])
1.0
>>> sd_plus(-1, 0, 1)
1.0
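SD and SD+ differ only in the divisor (n versus n - 1), so one can be obtained from the other. The check below uses Python's `statistics` module as a stand-in for the two statwrap functions; it is an independent illustration, not statwrap's implementation.

```python
import math
import statistics

data = [-1, 0, 1]
n = len(data)

sd = statistics.pstdev(data)      # divides by n (population SD)
sd_plus = statistics.stdev(data)  # divides by n - 1 (sample SD, "SD+")

# The two differ only by the factor sqrt(n / (n - 1)).
print(math.isclose(sd_plus, sd * math.sqrt(n / (n - 1))))
```

The correction factor is always greater than 1, so SD+ is larger than SD, and the gap shrinks as n grows.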
- statwrap.fpp.standard_units(*args, sd_plus=False)
Converts input values to standard units, where standard units indicate the number of standard deviations an element is from the average.
\[\frac{x-\mu}{\text{SD}}\]
- Parameters:
args (array_like or numeric scalars) – Input data. This can be a single array-like object containing all data points, or individual numeric scalar values. Examples include standard_units([1, 2, 3]) and standard_units(1, 2, 3), both of which are valid.
sd_plus (bool, optional) – Sets the delta degrees of freedom (ddof) used for numpy.std. Use False (the default) for the population SD, which corresponds to ddof=0. Use True for the sample SD, which corresponds to ddof=1.
- Returns:
A list of the input data converted to standard units. Each value represents how many standard deviations it is from the dataset’s mean.
- Return type:
list
- Raises:
ValueError – If the standard deviation of the input data is zero, indicating that all input values are identical and conversion to standard units is undefined.
Examples
>>> standard_units([-1, 0, 1])
[-1.224744871391589, 0.0, 1.224744871391589]
>>> standard_units([1, 1, 1])
ValueError: Standard deviation is zero. Standard units are undefined.
>>> standard_units([1, 6, 100])
[-0.761297225001359, -0.651494740626163, 1.4127919656275223]
>>> standard_units(-100, 0, 1000, 2, 17)
[-0.6918327146385096, -0.4480579737510855, 1.9896894351231555, -0.44318247893333707, -0.4066162678002234]
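The conversion and its zero-SD guard can be sketched in a few lines of standard-library Python. This is a hypothetical re-creation of the documented behavior (`standard_units_sketch` is an invented name), and it assumes at least two values when `sd_plus=True`.

```python
import math


def standard_units_sketch(values, sd_plus=False):
    """Subtract the average, then divide by the SD (or SD+ if requested)."""
    n = len(values)
    avg = sum(values) / n
    divisor = n - 1 if sd_plus else n  # assumes n >= 2 when sd_plus is True
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / divisor)
    if sd == 0:
        # All values are identical, so "SDs from the average" has no meaning.
        raise ValueError("Standard deviation is zero. Standard units are undefined.")
    return [(v - avg) / sd for v in values]


z = standard_units_sketch([-1, 0, 1])
print(z)
print(standard_units_sketch([-1, 0, 1], sd_plus=True))
```

With the population SD, a list in standard units always has average 0 and SD 1, which is a quick way to sanity-check the output.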
- statwrap.fpp.var(*args)
Computes the population variance.
\[\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2\]
- Parameters:
args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both var([1,2]) and var(1,2) are valid.
- Returns:
The population variance of the input array.
- Return type:
float
Examples
>>> var([-1, 0, 1])
0.6666666666666666
>>> var(-1, 0, 1)
0.6666666666666666
- statwrap.fpp.var_plus(*args)
Computes the sample variance.
\[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\]
- Parameters:
args (array_like or numeric scalars) – Input data. This can be a single array-like object or individual numbers. Both var_plus([1,2]) and var_plus(1,2) are valid.
- Returns:
The sample variance of the input array.
- Return type:
float
- Raises:
ValueError – If the input data has one or fewer elements. Raising this error prevents division by zero.
Examples
>>> var_plus([-1, 0, 1])
1.0
>>> var_plus(-1, 0, 1)
1.0
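Each variance is the square of the matching standard deviation, and the sample version needs at least two data points. The check below uses Python's `statistics` module as a stand-in for the statwrap functions, so the error type shown belongs to that module rather than to statwrap.

```python
import math
import statistics

data = [-1, 0, 1]

var = statistics.pvariance(data)      # divides by n
var_plus = statistics.variance(data)  # divides by n - 1

print(math.isclose(var, statistics.pstdev(data) ** 2))
print(math.isclose(var_plus, statistics.stdev(data) ** 2))

try:
    statistics.variance([5])  # a single value leaves n - 1 equal to zero
except statistics.StatisticsError as err:
    print("too few data points:", err)
```

`statistics.StatisticsError` is a subclass of `ValueError`, so the stand-in fails in the same broad way the documentation describes for `var_plus`.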