3. Parameter estimation#
Most if not all probability distributions are parametrized, but often when we’re collecting data we do not know these parameters a priori. That’s where parameter estimation comes in, as we’ll see in these exercises. Parameter estimation allows you to learn the features of the probability distribution governing your problem, enabling you to make informed decisions about future cases (samples).
Summary of commands#
In this exercise, we will demonstrate the following:
SciPy, specifically the statistics package.
scipy.stats.expon
- Exponential continuous random variable.scipy.stats.chi2
- Chi-squared continuous random variable.
In the previous exercises we leveraged the standardized functions such as rvs()
, pdf()
, cdf()
, etc.
This time, we will be using:
fit()
to fit (estimate) the parameters based on data.ppf()
to calculate the percent point function to help us get quantiles.
Demo#
The time to failure of field-effect transistors (FETs) is known to be exponentially distributed. Data (in hours) has been collected for \(10\) samples. Using the data in the table below, compute the maximum likelihood estimate and the \(95\%\) confidence interval for the mean time to failure.
2540 |
2800 |
2650 |
2550 |
2420 |
2300 |
2580 |
2460 |
2470 |
2490 |
Note: In MATLAB, you can directly call expfit()
and be done in one line of code.
In Python, you’ll have to work for your meal by noting that the confidence interval for an exponential distribution is given by the \(\chi^2\) distribution with \(2N\) degrees of freedom, as written in Wikipedia.
If you’re wondering why our code below appears to divide by the “wrong” value in lines 17-18, see this lecture (slide 9) where it shows the more precise connection between \(\chi^2\) and the exponential mean \(\sim 1/\lambda\).
# import packages from SciPy library
from scipy.stats import expon, chi2
import numpy as np
# data
X = np.array([2540, 2800, 2650, 2550, 2420, 2300, 2580, 2460, 2470, 2490])
N = len(X)
alpha = 0.05
# exponential distribution
loc, scale = expon.fit(X, floc=0)
print(f"The MLE for the mean time to failure is {scale:.0f} hours.")
# chi-squared quantiles
lower = chi2.ppf(alpha/2, df=2*N)
upper = chi2.ppf(1 - alpha/2, df=2*N)
lb = 2 * N * scale / upper
ub = 2 * N * scale / lower
print(f"The 95% confidence interval is [{lb:.0f}, {ub:.0f}].")
The MLE for the mean time to failure is 2526 hours.
The 95% confidence interval is [1479, 5268].