PLoS ONE
Inference of stress-strength reliability for the two-parameter exponentiated Gumbel distribution based on lower record values

Competing Interests: The authors have declared that no competing interests exist.

Article Type: Research Article
Abstract

The two-parameter exponentiated Gumbel distribution is an important lifetime distribution in survival analysis. This paper investigates the estimation of the parameters of this distribution based on lower record values. The maximum likelihood estimators (MLEs) of the parameters are derived, and the Fisher information matrix of the unknown parameters is used to construct asymptotic confidence intervals. Bayes estimators of the parameters and the corresponding credible intervals are obtained by using the Gibbs sampling technique. Two real datasets are used to illustrate the proposed methods.

Fayyazishishavan, Kılıç Depren, and Chen: Inference of stress-strength reliability for the two-parameter exponentiated Gumbel distribution based on lower record values

1 Introduction

In engineering applications, a system may be subjected to several stresses such as extreme temperature and pressure. The survival and performance of such systems strongly depend on their resistance strength. Models that try to measure this resistance are called stress-strength models; in the simplest terms, they can be described as a comparison between the random "stress" (Y) experienced by the system and the available "strength" (X) that must overcome it. This simple description leads to the definition of the reliability of a system as the probability that the system is strong enough to overcome the stress to which it is subjected. Therefore, the reliability parameter is defined as R = P(X>Y).

The estimation of the reliability parameter has an extensive literature and has been studied under different assumptions on the distributions of X and Y. [1] studied the ML estimation of R under the assumption that the stress and strength variables follow a bivariate exponential distribution. By considering a multivariate normal distribution, the MLE was studied by [2]. The estimation of R when the distribution is Weibull was considered by [3]. See [4] and the references therein for more details on the estimation of R and its applications. Among more recent works, [5] estimated R under the assumption that the stress and strength variables are independent and follow a generalized exponential distribution. [6] considered the estimation of R when X and Y are independent and both follow three-parameter Weibull distributions. Some other applications of stress-strength models in the framework of transportation problems, estimated by ML methods, include [7-9]. Other engineering applications of these methods in the Bayesian framework can be found in [10-12].

The above-mentioned works differ mainly in the distributions chosen for the random quantities. In some situations, complete data cannot be obtained, and one has to consider certain sampling schemes that yield incomplete data for X and Y. [13-15] studied the problem of making inference on R based on progressively Type-II censored data. Based on record data, [16] studied the ML and Bayesian estimation of R under the one-parameter generalized exponential distribution.

Another type of incomplete data is record values, which appear in many real-life applications. Record values arise in climatology, sports, business, medicine, industry, and life-testing surveys, among others. Such records are noted and remembered over the period of time under study, and the history of records may reflect the advancement of science and technology. By considering record values in various areas of human activity, we can evaluate the performance of societies. In 1952, [17] introduced the distribution of record values into the statistical world; in the six decades since his original work, hundreds of papers have been devoted to various aspects of record theory. He provided a foundation for a mathematical theory of records and defined the record values as consecutive extremes appearing in a sequence of independent identically distributed (i.i.d.) random variables. These smallest or largest occurring values are called "lower" or "upper" record values, respectively. Let X_1, …, X_n be a sequence of i.i.d. continuous random variables with cumulative distribution function (CDF) F(x) and corresponding probability density function (PDF) f(x). For every positive integer k ≥ 1, the sequence of lower record times, {L(k), k ≥ 1}, is defined as follows

L(1) = 1,   L(k+1) = min{j > L(k) : X_j < X_{L(k)}},   k ≥ 1.
Then the kth lower record value is denoted by X_{L(k)}, and the sequence {X_{L(k)}, k ≥ 1} is called the sequence of lower record values. For the sake of simplicity, from now on we shall refer to X_{L(k)} as X_k. As mentioned before, record values have many applications in industry and engineering. Consider, for example, an electronic system that is subject to shocks such as low or high voltage, both of which are dangerous for its intended performance. These shocks can be considered as realizations of i.i.d. random variables, and record-value models can then be used to study them. We refer the reader to [18, 19] for more details on record values and their applications.
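As a minimal illustration of this definition, the following Python sketch extracts the lower record values from an observed sequence; the function name and the example sequence (the Dataset 1 lifetimes of Section 4.3) are choices made here for illustration.

```python
# Minimal sketch: extract the lower record values X_{L(k)} from a sequence.
def lower_records(seq):
    """Return the lower record values of a sequence (the first observation is always a record)."""
    records = []
    for x in seq:
        if not records or x < records[-1]:
            records.append(x)
    return records

# Example: the Dataset 1 lifetimes analyzed in Section 4.3.
x_seq = [230, 169, 178, 271, 129, 568, 115, 280, 305, 326,
         1101, 285, 734, 177, 493, 218, 342, 431, 143, 381]
print(lower_records(x_seq))  # [230, 169, 129, 115]
```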

The Gumbel distribution is a well-known and popular model due to its wide application in climatology, global-warming problems, wind-speed and rainfall modeling, among others. The book of [20] contains an extensive list of applications of the Gumbel distribution in various fields of science. [21] generalized this distribution by exponentiating it, in the form F(x;α) = 1−[1−G(x)]^α, where G(x) is the Gumbel CDF and α>0. Note that exponentiating standard probability distributions can solve the lack-of-fit problems that arise when these distributions are used for modeling complex data [22]. They showed the power of this generalized distribution in modeling climatological data by applying it to rainfall data from Orlando, Florida.

In this work, we use a slightly different way of defining exponentiated distributions, namely F(x;α) = [G(x)]^α, which is known as the proportional reversed hazard rate model [23]. The random variable X follows the two-parameter exponentiated Gumbel distribution if it has the following CDF

where α>0 and λ>0. The PDF corresponding to the CDF (2) is

Here α and λ are the shape and scale parameters, respectively. We will denote this distribution by EG(α,λ).
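Because the displayed CDF and PDF did not survive extraction, the short sketch below encodes one form that is consistent with the proportional reversed hazard construction, namely F(x; α, λ) = [exp(−e^{−λx})]^α = exp(−α e^{−λx}); this baseline choice is an assumption of the sketch, although it reproduces the numerical results reported in Section 4.3.

```python
import numpy as np

# Assumed parameterization (a reconstruction): baseline Gumbel CDF G(x; lam) = exp(-e^{-lam*x}),
# so that F(x; alpha, lam) = G(x; lam)**alpha = exp(-alpha * e^{-lam*x}), alpha > 0, lam > 0.
def eg_cdf(x, alpha, lam):
    return np.exp(-alpha * np.exp(-lam * x))

def eg_pdf(x, alpha, lam):
    # Derivative of the assumed CDF: alpha * lam * e^{-lam*x} * exp(-alpha * e^{-lam*x}).
    return alpha * lam * np.exp(-lam * x) * eg_cdf(x, alpha, lam)

# Quick check using the Dataset 1 estimates of Table 2 (treated here as illustrative values).
print(eg_cdf(np.array([100.0, 300.0, 600.0]), 5.9717, 0.0071))
```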

2 Maximum likelihood estimation

In this section, we consider the maximum likelihood estimation of R = P(X>Y) when X~EG(α,λ) and Y~EG(β,λ), and X and Y are independently distributed. Formal integration shows that
R = P(X > Y) = α/(α + β).
Let X_1, X_2, …, X_n and Y_1, Y_2, …, Y_m be two independent sets of lower records from EG(α,λ) and EG(β,λ), respectively. Therefore, the likelihood function of the parameters becomes (see [19])

From (2) and (3) the likelihood function is obtained as

The log likelihood function is given by

Then, the likelihood equations will be

From the above equations, we get

Note that λ̂ is the (weighted) harmonic mean of λ̂_1 = 1/(x̄ − x_n) and λ̂_2 = 1/(ȳ − y_m), which are the MLEs of λ based on the two independent record samples of sizes n and m, respectively.

Therefore, by applying the invariance property of ML estimators, the MLE of R will be
R̂ = α̂/(α̂ + β̂).
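A minimal sketch of these record-based estimators, assuming the EG form discussed in Section 1; the closed-form expressions in the comments are a reconstruction consistent with the harmonic-mean remark above and with the estimate of R reported in Section 4.3.

```python
import numpy as np

# Hedged sketch of the record-based MLEs under the assumed form F(x; alpha, lam) = exp(-alpha*e^{-lam*x}):
#   lam_hat   = (n + m) / (sum(x) + sum(y) - n*x_n - m*y_m)   (the weighted harmonic mean noted above)
#   alpha_hat = n * exp(lam_hat * x_n),   beta_hat = m * exp(lam_hat * y_m)
#   R_hat     = alpha_hat / (alpha_hat + beta_hat)
def record_mles(x_rec, y_rec):
    x_rec, y_rec = np.asarray(x_rec, float), np.asarray(y_rec, float)
    n, m = len(x_rec), len(y_rec)
    lam = (n + m) / (x_rec.sum() + y_rec.sum() - n * x_rec[-1] - m * y_rec[-1])
    alpha, beta = n * np.exp(lam * x_rec[-1]), m * np.exp(lam * y_rec[-1])
    return alpha, beta, lam, alpha / (alpha + beta)

# With the lower records of Section 4.3 this reproduces the reported estimate of R.
alpha, beta, lam, R = record_mles([230, 169, 129, 115], [156, 125, 112])
print(round(R, 4))  # 0.5927
```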
We now obtain the Fisher information matrix of the unknown parameters of the EG distribution, which can be used to construct asymptotic confidence intervals.

[19] showed that the PDF of the sth lower record, X_s, is given by

Therefore, the Fisher information matrix of θ = (α,β,λ) will be

where

The asymptotic covariance matrix of the ML estimators can be obtained by inverting the Fisher information matrix as follows

where

Now, by using the delta method, the asymptotic variance of R^ could be obtained as follows.

where C^T = (∂R/∂α, ∂R/∂β, ∂R/∂λ) = (1/(α+β)^2)(β, −α, 0).

Note that V(R̂) is a function of the unknown parameters and therefore needs to be estimated; this can be done by plugging in the ML estimates of the parameters. Therefore, the 100(1−γ)% asymptotic confidence interval of R takes the form R̂ ± z_{1−γ/2} √(V(R̂)/n).
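A hedged sketch of this interval: the record log-likelihood below follows the EG form assumed earlier, a numerically computed observed information stands in for the closed-form Fisher matrix, and the normalization by n follows the expression above; none of the helper names come from the original paper.

```python
import numpy as np
from scipy.stats import norm

# Record log-likelihood under the assumed EG form (a reconstruction):
# l(a, b, lam) = n*ln a + m*ln b + (n+m)*ln lam - lam*(sum x + sum y) - a*e^{-lam*x_n} - b*e^{-lam*y_m}
def neg_loglik(theta, x_rec, y_rec):
    a, b, lam = theta
    n, m = len(x_rec), len(y_rec)
    return -(n * np.log(a) + m * np.log(b) + (n + m) * np.log(lam)
             - lam * (np.sum(x_rec) + np.sum(y_rec))
             - a * np.exp(-lam * x_rec[-1]) - b * np.exp(-lam * y_rec[-1]))

def num_hessian(f, theta, rel_step=1e-5):
    # Central-difference Hessian; it replaces the (not displayed) analytic information matrix.
    theta = np.asarray(theta, float)
    k = len(theta)
    h = rel_step * np.maximum(1.0, np.abs(theta))
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h[i]
            ej = np.zeros(k); ej[j] = h[j]
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h[i] * h[j])
    return H

x_rec = np.array([230.0, 169.0, 129.0, 115.0])   # lower records of Section 4.3
y_rec = np.array([156.0, 125.0, 112.0])
n, m = len(x_rec), len(y_rec)
lam = (n + m) / (x_rec.sum() + y_rec.sum() - n * x_rec[-1] - m * y_rec[-1])
alpha, beta = n * np.exp(lam * x_rec[-1]), m * np.exp(lam * y_rec[-1])
R_hat = alpha / (alpha + beta)

cov = np.linalg.inv(num_hessian(lambda t: neg_loglik(t, x_rec, y_rec), [alpha, beta, lam]))
C = np.array([beta, -alpha, 0.0]) / (alpha + beta) ** 2   # gradient of R = alpha/(alpha+beta)
se = np.sqrt(C @ cov @ C / n)                             # normalization by n, as in the text
z = norm.ppf(0.975)
print(round(R_hat, 4), (round(R_hat - z * se, 4), round(R_hat + z * se, 4)))
# roughly 0.5927 with an interval close to (0.4117, 0.7737), the values reported in Section 4.3
```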

3 Bayesian estimation

In this section, we find the Bayes estimators of the parameters. To do so, we assume that the parameters are a priori independent and follow gamma distributions, i.e., α~Gamma(a1,b1), β~Gamma(a2,b2), and λ~Gamma(a3,b3). Therefore, the full posterior distribution of the parameters will be

The above posterior does not admit a closed form and cannot be used directly in the estimation procedure. Hence, to simulate a random sample from this distribution and perform approximate inference, the Gibbs sampler can be used. The full conditional distributions of the parameters are as follows:

As π(λ|α,β,data) does not have a closed, standard form, one cannot sample from this density directly. The Metropolis-Hastings algorithm can be used to produce samples from such a distribution. As shown in Fig 1, the normal distribution is a suitable candidate for the proposal distribution of the Metropolis-Hastings step. Therefore, the Gibbs sampling algorithm proceeds as follows (a code sketch of the sampler is given after the step list).

Fig 1. Proposal and posterior density functions of the scale parameter.

    Step 1: Start with an initial value λ(0).

    Step 2: Set t = 1.

    Step 3: Generate α^(t) from Gamma(n+a1, b1 + exp(−λ^(t−1) x_n)).

    Step 4: Generate β^(t) from Gamma(m+a2, b2 + exp(−λ^(t−1) y_m)).

    Step 5: Use the Metropolis-Hastings algorithm to generate λ^(t) from π(λ|α^(t), β^(t), data), using N(λ^(t−1), σ0²) as the proposal distribution.

      Step 5.1: Generate a candidate point λ* from N(λ^(t−1), σ0²) and u from U(0,1).

      Step 5.2: Set λ^(t) = λ* if u ≤ ρ(λ^(t−1), λ*) and λ^(t) = λ^(t−1) otherwise, where the acceptance probability is ρ(λ^(t−1), λ*) = min{1, A} and A is given by

    Step 6: Set t = t+1.

    Step 7: Repeat steps 3–6, T times.
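A compact sketch of the sampler described in Steps 1-7, assuming the EG form used earlier; the hyper-parameters, the proposal scale sigma0, the starting value lam0, and the chain length are illustrative choices rather than the settings used by the authors.

```python
import numpy as np

rng = np.random.default_rng(1)

# log pi(lambda | alpha, beta, data), up to an additive constant, under the assumed EG form.
def log_cond_lambda(lam, alpha, beta, x_rec, y_rec, a3, b3):
    if lam <= 0.0:
        return -np.inf
    n, m = len(x_rec), len(y_rec)
    return ((n + m + a3 - 1.0) * np.log(lam)
            - lam * (b3 + np.sum(x_rec) + np.sum(y_rec))
            - alpha * np.exp(-lam * x_rec[-1]) - beta * np.exp(-lam * y_rec[-1]))

def gibbs_sampler(x_rec, y_rec, T=20000, sigma0=0.01, lam0=0.03,
                  a1=0.001, b1=0.001, a2=0.001, b2=0.001, a3=0.001, b3=0.001):
    x_rec, y_rec = np.asarray(x_rec, float), np.asarray(y_rec, float)
    n, m = len(x_rec), len(y_rec)
    lam = lam0
    draws = np.empty((T, 3))
    for t in range(T):
        # Steps 3-4: conjugate gamma updates (numpy parameterizes the gamma by shape and scale).
        alpha = rng.gamma(n + a1, 1.0 / (b1 + np.exp(-lam * x_rec[-1])))
        beta = rng.gamma(m + a2, 1.0 / (b2 + np.exp(-lam * y_rec[-1])))
        # Step 5: random-walk Metropolis-Hastings update for lambda.
        lam_star = rng.normal(lam, sigma0)
        log_A = (log_cond_lambda(lam_star, alpha, beta, x_rec, y_rec, a3, b3)
                 - log_cond_lambda(lam, alpha, beta, x_rec, y_rec, a3, b3))
        if np.log(rng.uniform()) <= log_A:
            lam = lam_star
        draws[t] = alpha, beta, lam
    return draws

# Lower records of Section 4.3; the posterior mean of R is taken after a burn-in of 5000 draws.
chain = gibbs_sampler([230.0, 169.0, 129.0, 115.0], [156.0, 125.0, 112.0])
R_chain = chain[:, 0] / (chain[:, 0] + chain[:, 1])
print(round(R_chain[5000:].mean(), 3))
```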

Once a sample from the posterior is obtained, the approximate posterior mean of R and its variance can be computed as follows

and
where K is the burn-in period of the chain, which removes the effect of the starting values of the generated Markov chain.

The approximate highest posterior density (HPD) credible interval of R could be constructed using the method proposed in [24].

Let R_{(K+1)} < R_{(K+2)} < ⋯ < R_{(T)} be the ordered values of the chain R^(t) after burn-in. To construct a 100(1−γ)% approximate HPD credible interval for R, we consider the following intervals,

By choosing the interval with the shortest length, we obtain the HPD credible interval.
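A minimal sketch of this post-processing step: the posterior mean and variance of R after burn-in, together with the shortest-interval (Chen and Shao [24] style) HPD approximation; the chain used in the example is a hypothetical stand-in for the R^(t) values produced by the Gibbs sampler.

```python
import numpy as np

def posterior_summary(R_chain, K, gamma=0.05):
    # Discard the first K draws, then scan all intervals containing 100(1-gamma)% of the
    # ordered draws and keep the shortest one as the approximate HPD credible interval.
    R = np.sort(np.asarray(R_chain, float)[K:])
    T = len(R)
    width = int(np.floor((1.0 - gamma) * T))
    lengths = R[width:] - R[:T - width]
    j = int(np.argmin(lengths))
    return R.mean(), R.var(ddof=1), (R[j], R[j + width])

# Example with a hypothetical chain of R values.
rng = np.random.default_rng(0)
demo_chain = rng.beta(6.0, 4.0, size=20000)
print(posterior_summary(demo_chain, K=5000))
```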

4 Inference on R when λ is known

As shown in section 2, the ML estimate of λ does not depend on the values of the other parameters; therefore, by plugging in the MLE of λ, one can treat the model as containing only two parameters. This assumption makes the inference procedure easier and more straightforward. In other words, we can assume that λ is known and, without loss of generality, set λ = 1.

4.1 MLE of R

As mentioned in section 2, the ML estimator of R is

By a straightforward computation, one can see that

By considering (13) and the fact that X_n and Y_m are independent, one can show that

Therefore, the 100(1−γ)% confidence interval for R could be obtained as

where F_{γ,d1,d2} is the 100γth percentile of the F distribution with d1 and d2 degrees of freedom.
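A hedged sketch of one such pivot-based interval, assuming the EG form used above: under that form, α exp(−λ X_n) and β exp(−λ Y_m) are independent Gamma(n,1) and Gamma(m,1) variables, so the ratio (β̂/α̂)/(β/α) follows an F distribution with (2n, 2m) degrees of freedom; the construction below is a reconstruction of the argument sketched in the text, not a verbatim transcription.

```python
import numpy as np
from scipy.stats import f as f_dist

def exact_ci_known_lambda(x_rec, y_rec, gamma=0.05, lam=1.0):
    # Pivot-based interval for R when lambda is treated as known (default lambda = 1).
    x_rec, y_rec = np.asarray(x_rec, float), np.asarray(y_rec, float)
    n, m = len(x_rec), len(y_rec)
    alpha_hat = n * np.exp(lam * x_rec[-1])
    beta_hat = m * np.exp(lam * y_rec[-1])
    R_hat = alpha_hat / (alpha_hat + beta_hat)
    ratio = beta_hat / alpha_hat                       # equals (1 - R_hat) / R_hat
    lower = 1.0 / (1.0 + ratio / f_dist.ppf(gamma / 2.0, 2 * n, 2 * m))
    upper = 1.0 / (1.0 + ratio / f_dist.ppf(1.0 - gamma / 2.0, 2 * n, 2 * m))
    return R_hat, (lower, upper)

# Example with hypothetical record values on a scale where lambda = 1 is plausible.
print(exact_ci_known_lambda([2.3, 1.7, 1.2, 1.1], [1.6, 1.3, 1.1]))
```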

4.2 Bayesian estimation

Since we assumed that the parameters are a priori independent with gamma densities, the posterior distributions of α and β are independent Gamma(a1+n, b1 + exp(−x_n)) and Gamma(a2+m, b2 + exp(−y_m)), respectively. Therefore, the posterior distribution of R will be

where

The Bayesian estimation is based on the posterior distribution obtained above. Depending on the assumed loss function, various summaries of the posterior distribution, such as the mean, the median, etc., can be used to estimate the parameters; see [25, 26] for more details. Under the quadratic loss function, the Bayes estimator is the posterior mean, which can be computed using the following well-known identity

in which B(b, c−b) and 2F1(a, b; c; z) are the beta and Gauss hypergeometric functions, respectively. Therefore, the Bayes estimate of R is

The variance of the Bayes estimator can be obtained by using

To construct the HPD intervals, since the posterior is not analytically tractable, we can generate a sample from it by using an indirect sampling algorithm such as the accept-reject method.
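A minimal sketch of this Bayes estimate obtained by direct simulation from the two independent gamma posteriors stated above, instead of evaluating the hypergeometric expression; the exp(−λx_n) and exp(−λy_m) rate terms follow the assumed EG form, the hyper-parameters are illustrative, and λ can be replaced by a plug-in estimate as suggested at the start of this section.

```python
import numpy as np

rng = np.random.default_rng(2)

def bayes_R_known_lambda(x_rec, y_rec, lam=1.0, a1=0.001, b1=0.001, a2=0.001, b2=0.001,
                         draws=200000, gamma=0.05):
    # Simulate from the gamma posteriors of alpha and beta, form R = alpha/(alpha+beta),
    # and report the posterior mean, the posterior variance, and a shortest-interval HPD.
    x_rec, y_rec = np.asarray(x_rec, float), np.asarray(y_rec, float)
    n, m = len(x_rec), len(y_rec)
    alpha = rng.gamma(a1 + n, 1.0 / (b1 + np.exp(-lam * x_rec[-1])), size=draws)
    beta = rng.gamma(a2 + m, 1.0 / (b2 + np.exp(-lam * y_rec[-1])), size=draws)
    R = np.sort(alpha / (alpha + beta))
    width = int((1.0 - gamma) * draws)
    j = int(np.argmin(R[width:] - R[:draws - width]))
    return R.mean(), R.var(ddof=1), (R[j], R[j + width])

# Example: the records of Section 4.3 with a plug-in value of lambda (about 0.0292 under the assumed form).
print(bayes_R_known_lambda([230.0, 169.0, 129.0, 115.0], [156.0, 125.0, 112.0], lam=7.0 / 240.0))
```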

4.3 Real data analysis

In this section, we analyze a set of real strength data taken from ([27], p. 574). These data, originally from [28], represent the lifetimes of steel specimens tested at 14 different stress levels. Here, we use the data corresponding to the 35.0 and 35.5 stress levels as Dataset 1 (x) and Dataset 2 (y) in Table 1, respectively.

Table 1
The real datasets reported in Lawless (2011) [27].
Dataset 1 (x): 230, 169, 178, 271, 129, 568, 115, 280, 305, 326, 1101, 285, 734, 177, 493, 218, 342, 431, 143, 381
Dataset 2 (y): 156, 173, 125, 852, 559, 442, 168, 286, 261, 227, 285, 253, 166, 133, 309, 247, 112, 202, 365, 702

Dataset 1 (x) and Dataset 2 (y) correspond to the 35.0 and 35.5 stress levels, respectively.

We fitted the EG distribution to the two datasets separately and estimated the scale and shape parameters. The Kolmogorov-Smirnov (K-S) goodness-of-fit test was applied to each dataset. The results reported in Table 2 confirm that the EG distribution fits these data well (a code sketch of this check is given after Table 2). Moreover, Figs 2 and 3 confirm the adequacy of the fit by comparing the empirical and fitted distributions for both datasets.

Fig 2. Empirical and fitted CDFs for Dataset 1.

Fig 3. Empirical and fitted CDFs for Dataset 2.

Table 2
The Kolmogorov-Smirnov test output for the real datasets.
Dataset   Scale     Shape     K-S       p-value
1         0.0071    5.9717    0.1137    0.9326
2         0.0085    6.5564    0.1408    0.7726
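A hedged sketch of this goodness-of-fit check, assuming the EG form used throughout: each full dataset is fitted by maximum likelihood and the K-S statistic is computed against the fitted CDF, treating the estimated parameters as fixed; if the assumed form matches the authors' parameterization, the output should be close to the corresponding row of Table 2.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kstest

def eg_cdf(x, alpha, lam):
    # Assumed EG CDF: exp(-alpha * e^{-lam*x}).
    return np.exp(-alpha * np.exp(-lam * x))

def fit_eg(data):
    data = np.asarray(data, float)
    def nll(theta):                                   # negative log-likelihood in log-parameters
        a, lam = np.exp(theta)
        return -np.sum(np.log(a) + np.log(lam) - lam * data - a * np.exp(-lam * data))
    lam0 = np.pi / (data.std() * np.sqrt(6.0))        # moment-style start from the Gumbel baseline
    a0 = len(data) / np.sum(np.exp(-lam0 * data))     # profile estimate of alpha at lam0
    res = minimize(nll, x0=np.log([a0, lam0]), method="Nelder-Mead")
    return np.exp(res.x)                              # (alpha_hat, lambda_hat)

data1 = [230, 169, 178, 271, 129, 568, 115, 280, 305, 326,
         1101, 285, 734, 177, 493, 218, 342, 431, 143, 381]   # Dataset 1 of Table 1
alpha_hat, lam_hat = fit_eg(data1)
ks = kstest(data1, lambda x: eg_cdf(np.asarray(x, float), alpha_hat, lam_hat))
print(round(float(alpha_hat), 4), round(float(lam_hat), 4),
      round(float(ks.statistic), 4), round(float(ks.pvalue), 4))
```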

Now we can use the lower records of these datasets to derive the ML and Bayesian estimates of the parameters. The lower records from Dataset 1 are 230, 169, 129, 115, and those from Dataset 2 are 156, 125, 112. The corresponding ML estimates are reported in Table 2. According to this table, it is clear that the scale parameters of the two datasets are almost the same. By assuming equality of the scale parameters, the MLE and the 95% confidence interval of R based on the lower record values become 0.5927 and (0.4117, 0.7737), respectively. Also, by using Gibbs sampling, the Bayes estimate and the corresponding credible interval of R are 0.5893 and (0.4236, 0.7641), respectively.

5 Conclusion

In this paper, we investigated the estimation of the parameters of the two-parameter exponentiated Gumbel distribution based on lower record values. Maximum likelihood was used to estimate the model parameters, and the Fisher information matrix of the unknown parameters was used to construct asymptotic confidence intervals. Furthermore, the Bayes estimators of the parameters and the corresponding credible intervals were obtained by using the Gibbs sampling technique. The two estimation approaches (ML and Bayes) were compared on two real datasets, and the Bayesian estimates were found to be only slightly different from the ML ones.

References

1. A. Awad, M. Azzam, and M. Hamdan (1981). 'Some inference results on P(X<Y) in the bivariate exponential model'. Communications in Statistics-Theory and Methods, 10(24), 2515–2525.
2. R. Gupta and R. C. Gupta (1990). 'Estimation of P(aX<bY) in the multivariate normal case'. Statistics, 21(1), 91–97.
3. J. I. McCool (1991). 'Inference on P{Y<X} in the Weibull case'. Communications in Statistics-Simulation and Computation, 20(1), 129–148.
4. S. Kotz and M. Pensky (2003). 'The stress-strength model and its generalizations: theory and applications'. World Scientific.
5. D. Kundu and R. D. Gupta (2005). 'Estimation of P[Y<X] for generalized exponential distribution'. Metrika, 61(3), 291–308.
6. D. Kundu and M. Z. Raqab (2009). 'Estimation of R = P(Y<X) for three-parameter Weibull distribution'. Statistics & Probability Letters, 79(17), 1839–1846.
7. F. Chen and S. Chen (2011). 'Injury severities of truck drivers in single- and multi-vehicle accidents on rural highways'. Accident Analysis & Prevention, 43(5), 1677–1688. 10.1016/j.aap.2011.03.026
8. B. Dong, X. Ma, F. Chen, and S. Chen (2018). 'Investigating the differences of single-vehicle and multivehicle accident probability using mixed logit model'. Journal of Advanced Transportation, 2018. 10.1155/2018/7905140
9. F. Chen, S. Chen, and X. Ma (2018). 'Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data'. Journal of Safety Research, 65, 153–159. 10.1016/j.jsr.2018.02.010
10. Q. Zeng, H. Wen, H. Huang, and M. Abdel-Aty (2017). 'A Bayesian spatial random parameters Tobit model for analyzing crash rates on roadway segments'. Accident Analysis & Prevention, 100, 37–43. 10.1016/j.aap.2016.12.023
11. H. Wen, X. Zhang, Q. Zeng, and N. Sze (2019). 'Bayesian spatial-temporal model for the main and interaction effects of roadway and weather characteristics on freeway crash incidence'. Accident Analysis & Prevention, 132, 105249. 10.1016/j.aap.2019.07.025
12. Q. Zeng, H. Wen, S. Wong, H. Huang, Q. Guo, and X. Pei (2020). 'Spatial joint analysis for zonal daytime and nighttime crash frequencies using a Bayesian bivariate conditional autoregressive model'. Journal of Transportation Safety & Security, 12(4), 566–585. 10.4271/2016-01-1439
13. A. Asgharzadeh, R. Valiollahi, and M. Z. Raqab (2011). 'Stress-strength reliability of Weibull distribution based on progressively censored samples'. SORT, 35(2), 103–124.
14. B. Saraçoglu, I. Kinaci, and D. Kundu (2012). 'On estimation of R = P(Y<X) for exponential distribution under progressive Type-II censoring'. Journal of Statistical Computation and Simulation, 82(5), 729–744.
15. Y. Lio and T.-R. Tsai (2012). 'Estimation of δ = P(Y<X) for Burr XII distribution based on the progressively first failure-censored samples'. Journal of Applied Statistics, 39(2), 309–322.
16. A. Baklizi (2008). 'Estimation of P(X<Y) using record values in the one and two parameter exponential distributions'. Communications in Statistics-Theory and Methods, 37(5), 692–698.
17. K. Chandler (1952). 'The distribution and frequency of record values'. Journal of the Royal Statistical Society: Series B (Methodological), 14(2), 220–228.
18. V. B. Nevzorov (2001). 'Records: mathematical theory'. American Mathematical Society.
19. B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja (2011). 'Records', 768. John Wiley & Sons.
20. S. Kotz and S. Nadarajah (2000). 'Extreme value distributions: theory and applications'. World Scientific.
21. S. Nadarajah (2006). 'The exponentiated Gumbel distribution with climate application'. Environmetrics, 17(1), 13–23.
22. I. Okorie, A. Akpanta, and J. Ohakwe (2016). 'The exponentiated Gumbel type-2 distribution: properties and application'. International Journal of Mathematics and Mathematical Sciences, 2016.
23. R. D. Gupta and A. K. Nanda (2001). 'Some results on reversed hazard rate ordering'. Communications in Statistics-Theory and Methods, 30(11), 2447–2457.
24. M.-H. Chen and Q.-M. Shao (1999). 'Monte Carlo estimation of Bayesian credible and HPD intervals'. Journal of Computational and Graphical Statistics, 8(1), 69–92.
25. C. Robert (2007). 'The Bayesian choice: from decision-theoretic foundations to computational implementation'. Springer Science & Business Media.
26. P. Congdon (2007). 'Bayesian statistical modelling', 704. John Wiley & Sons.
27. J. F. Lawless (2011). 'Statistical models and methods for lifetime data', 362. John Wiley & Sons.
28. M. Crowder (2000). 'Tests for a family of survival models based on extremes'. Recent Advances in Reliability Theory, 307–321. Springer.