Parameter estimation

3. Parameter estimation#

Most if not all probability distributions are parametrized, but often when we’re collecting data we do not know these parameters a priori. That’s where parameter estimation comes in, as we’ll see in these exercises. Parameter estimation allows you to learn the features of the probability distribution governing your problem, enabling you to make informed decisions about future cases (samples).

Summary of commands#

In this exercise, we will demonstrate the following:

In the previous exercises we leveraged the standardized functions such as rvs(), pdf(), cdf(), etc. This time, we will be using:

  • fit() to fit (estimate) the parameters based on data.

  • ppf() to calculate the percent point function to help us get quantiles.

Demo#

The time to failure of field-effect transistors (FETs) is known to be exponentially distributed. Data (in hours) has been collected for \(10\) samples. Using the data in the table below, compute the maximum likelihood estimate and the \(95\%\) confidence interval for the mean time to failure.

2540

2800

2650

2550

2420

2300

2580

2460

2470

2490

Note: In MATLAB, you can directly call expfit() and be done in one line of code. In Python, you’ll have to work for your meal by noting that the confidence interval for an exponential distribution is given by the \(\chi^2\) distribution with \(2N\) degrees of freedom, as written in Wikipedia. If you’re wondering why our code below appears to divide by the “wrong” value in lines 17-18, see this lecture (slide 9) where it shows the more precise connection between \(\chi^2\) and the exponential mean \(\sim 1/\lambda\).

# import packages from SciPy library
from scipy.stats import expon, chi2
import numpy as np

# data
X = np.array([2540, 2800, 2650, 2550, 2420, 2300, 2580, 2460, 2470, 2490])
N = len(X)
alpha = 0.05

# exponential distribution
loc, scale = expon.fit(X, floc=0)
print(f"The MLE for the mean time to failure is {scale:.0f} hours.")

# chi-squared quantiles
lower = chi2.ppf(alpha/2, df=2*N)
upper = chi2.ppf(1 - alpha/2, df=2*N)
lb = 2 * N * scale / upper
ub = 2 * N * scale / lower
print(f"The 95% confidence interval is [{lb:.0f}, {ub:.0f}].")
The MLE for the mean time to failure is 2526 hours.
The 95% confidence interval is [1479, 5268].