Thursday, April 4, 2019

Application of Regression Analysis

Chapter 3: Methodology

In applications of regression analysis, the data set often contains unusual observations. These are either outliers (noise) or influential observations. They may have large residuals and distort both the estimated regression coefficients and the regression analysis as a whole, so they can become a source of misleading results and interpretations. It is therefore very important to treat such suspect observations carefully and to decide whether they should be kept in the analysis or deleted from it.

In regression analysis, a basic step is to determine whether one or more observations unduly influence the results and interpretation of the analysis. When the model has a single independent variable, unusual observations in the dependent and independent variables are easy to detect with a scatter plot, box plot, residual plot and similar displays. However, identifying outliers and influential observations graphically is a subjective approach.

It is also well known that multiple outliers can produce masking or swamping. Masking (a false negative) occurs when an outlying subset remains undetected because of the presence of another, usually adjacent, subset. Swamping (a false positive) occurs when a usual observation is incorrectly flagged as an outlier because of the presence of another, usually remote, subset of observations.

In the present study, some well-known diagnostics are compared for identifying multiple influential observations.
For this purpose, robust regression methods are first used to identify influential observations in Poisson regression. To confirm that the observations identified by the robust regression method are genuine influential observations, several diagnostic measures based on the single case deletion approach are then considered:

- Pearson chi-square
- deviance residual
- hat matrix
- likelihood residual test
- Cook's distance
- difference of fits
- squared difference in beta

In the presence of masking and swamping, however, diagnostics based on single case deletion fail to identify outliers and influential observations. To remove or minimize masking and swamping, some group deletion diagnostics are therefore also used: the generalized standardized Pearson residual, the generalized difference of fits and the generalized squared difference in beta.

3.2 Diagnostic measures based on single case deletion

This section presents the single case deletion measures used to identify multiple influential observations in the Poisson regression model. These measures are:

- change in Pearson chi-square
- change in deviance
- hat matrix
- likelihood residual test
- Cook's distance
- difference of fits (DFFITS)
- squared difference in beta (SDFBETA)

Pearson chi-square

The Pearson χ² statistic is used to detect outliers by showing how much the Poisson regression fit would change if the kth observation were deleted. Such diagnostics examine the effect of deleting a single case on an overall summary measure of fit. Let χ² denote the Pearson statistic and χ²(−k) the statistic after case k is deleted. The one-step linear approximations given by Pregibon (1981) are used.
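The decrease in the value of the statistic due to deletion of the kth case is

$$\Delta\chi^2_k = \chi^2 - \chi^2_{(-k)}, \qquad k = 1, 2, \ldots, n \tag{3.1}$$

where $\chi^2$ is defined as

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i} \tag{3.2}$$

and for the kth deleted case

$$\chi^2_{(-k)} = \sum_{i \neq k} \frac{(y_i - \hat\mu_{i(-k)})^2}{\hat\mu_{i(-k)}} \tag{3.3}$$

Here $\hat\mu_{i(-k)}$ is the fitted mean of case $i$ when case $k$ is left out. Under Pregibon's one-step approximation, $\Delta\chi^2_k \approx r_{Pk}^2/(1 - h_{kk})$. In this expression $r_{Pk} = (y_k - \hat\mu_k)/\sqrt{\hat\mu_k}$ is the Pearson residual and $h_{kk}$ is the leverage of case $k$, defined under the hat matrix below.

Deviance residual

The change in deviance when the kth case is deleted, again evaluated with the one-step linear approximation, is

$$\Delta D_k = D - D_{(-k)} \tag{3.4}$$

Because the deviance measures the goodness of fit of a model, a substantial decrease in the deviance after deleting the kth observation indicates that this observation is a misfit. The deviance of the Poisson regression with the kth observation included is

$$D = 2\sum_{i=1}^{n}\left[y_i \ln\!\left(\frac{y_i}{\hat\mu_i}\right) - (y_i - \hat\mu_i)\right] \tag{3.5}$$

where $\hat\mu_i = \exp(x_i^T\hat\beta)$. The term $y_i \ln(y_i/\hat\mu_i)$ is taken as 0 when $y_i = 0$. With the kth observation deleted,

$$D_{(-k)} = 2\sum_{i \neq k}\left[y_i \ln\!\left(\frac{y_i}{\hat\mu_{i(-k)}}\right) - (y_i - \hat\mu_{i(-k)})\right] \tag{3.6}$$

A large value of $\Delta D_k$ indicates that the kth observation is an outlier.

Hat matrix

The hat matrix is used in residual diagnostics to measure the influence of each observation. The hat values $h_{ii}$ are the diagonal entries of the hat matrix, which is computed as

$$H = V^{1/2} X (X^T V X)^{-1} X^T V^{1/2} \tag{3.7}$$

where $V = \operatorname{diag}\{\operatorname{var}(y_i)\,[g'(\mu_i)]^2\}^{-1}$. In the Poisson regression model $\operatorname{var}(y_i) = E(y_i) = \mu_i$ and $g(\mu_i) = \eta_i = x_i^T\beta$, where the function $g$ is called the link function. With the log link, $\mu_i = \exp(x_i^T\beta)$ and $g'(\mu_i) = 1/\mu_i$, so

$$V = \operatorname{diag}(\hat\mu_1, \ldots, \hat\mu_n) \tag{3.8}$$

$(X^T V X)^{-1}$ is the estimated covariance matrix of $\hat\beta$, and $h_{ii}$ is the ith diagonal element of $H$. The diagonal elements of the hat matrix, i.e. the leverage values, satisfy $0 \le h_{ii} \le 1$ and $\sum_i h_{ii} = k$. Here $k$ is the number of parameters of the regression model, including the intercept.

An observation is said to be influential if $h_{ii} \ge ck/n$, where $c$ is a suitably chosen constant such as 2 or 3. Using the twice-the-mean rule of thumb suggested by Hoaglin and Welsch (1978), an observation with $h_{ii} \ge 2k/n$ is considered influential.

Likelihood residual test

For the detection of outliers, Williams (1987) introduced the likelihood residual.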
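The likelihood residual is built from the leverage values together with the Pearson and deviance residuals, so it helps to see those ingredients computed first. The following is a minimal Python/NumPy sketch, not the author's implementation.

- It fits a log-link Poisson model by iteratively reweighted least squares (IRLS). This is an assumption, since the text does not say how the model is fitted.
- It forms the hat values from (3.7) with V = diag(μ̂) as in (3.8).
- It evaluates the one-step approximations Δχ²_k ≈ r²_Pk/(1 − h_kk) and ΔD_k ≈ d²_k + h_kk r²_Pk/(1 − h_kk).

The function names `fit_poisson`, `unit_deviance` and `single_case_diagnostics` are hypothetical, and the counts are made up for illustration.

```python
import numpy as np


def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Fit a log-link Poisson regression by IRLS (assumed fitting method)."""
    mu = y + 0.5                          # common IRLS starting values
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # X^T V, with V = diag(mu) under the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def unit_deviance(y, mu):
    """Squared deviance residuals 2[y ln(y/mu) - (y - mu)], with 0 ln 0 = 0."""
    safe_y = np.where(y > 0, y, 1.0)
    ylog = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * (ylog - (y - mu))


def single_case_diagnostics(X, y):
    """Leverages and one-step deletion changes in Pearson chi-square and deviance."""
    n, k = X.shape
    mu = np.exp(X @ fit_poisson(X, y))                 # fitted means
    XtVX = X.T @ (X * mu[:, None])                     # X^T V X
    Xs = X * np.sqrt(mu)[:, None]                      # V^{1/2} X
    h = np.einsum("ij,ji->i", Xs, np.linalg.solve(XtVX, Xs.T))  # diagonal of (3.7)
    r_p = (y - mu) / np.sqrt(mu)                       # Pearson residuals
    delta_chi2 = r_p**2 / (1 - h)                      # one-step change in chi-square
    delta_dev = unit_deviance(y, mu) + h * r_p**2 / (1 - h)  # one-step change in deviance
    high_leverage = h > 2 * k / n                      # twice-the-mean rule
    return h, delta_chi2, delta_dev, high_leverage


# Made-up counts with one inflated value at the end, for illustration only.
x = np.arange(1.0, 11.0)
y = np.array([2, 3, 6, 7, 8, 9, 10, 12, 15, 40], dtype=float)
X = np.column_stack([np.ones_like(x), x])              # intercept + one covariate
h, delta_chi2, delta_dev, high_leverage = single_case_diagnostics(X, y)
print(np.round(h, 3), np.round(delta_chi2, 2), np.round(delta_dev, 2), high_leverage)
```

Because the hat matrix under the log link is a projection, its diagonal should sum to the number of parameters. This gives a quick check on the computation.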
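The squared likelihood residual is a weighted average of the squared standardized deviance and Pearson residuals:

$$r_{Li}^2 = h_{ii}\, r_{SPi}^2 + (1 - h_{ii})\, r_{SDi}^2 \tag{3.9}$$

It is approximately equal to the likelihood ratio statistic for testing whether an observation is an outlier, and it is also called the approximate studentized residual. Here $r_{SPi}$ is the standardized Pearson residual,

$$r_{SPi} = \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i\,(1 - h_{ii})}} \tag{3.10}$$

and $r_{SDi}$ is the standardized deviance residual,

$$r_{SDi} = \frac{d_i}{\sqrt{1 - h_{ii}}} \tag{3.11}$$

with

$$d_i = \operatorname{sign}(y_i - \hat\mu_i)\sqrt{2\left[y_i \ln\!\left(\frac{y_i}{\hat\mu_i}\right) - (y_i - \hat\mu_i)\right]}$$

where $d_i$ is called the deviance residual. It is another popular residual because the sum of squares of these residuals is the deviance statistic. The average value of $h_{ii}$, $k/n$, is small. As a result, $r_{Li}$ is much closer to $r_{SDi}$ than to $r_{SPi}$, and is therefore also approximately normally distributed. An observation is considered influential if $|r_{Li}| > t_{(1-\alpha/2,\; n-k-1)}$.

Difference of fits test (DFFITS)

The difference of fits test for Poisson regression is defined as

$$(\mathrm{DFFITS})_i = \frac{\hat\mu_i - \hat\mu_{i(-i)}}{\hat\sigma_{(-i)}\sqrt{h_{ii}}}, \qquad i = 1, 2, \ldots, n \tag{3.12}$$

where $\hat\mu_{i(-i)}$ and $\hat\sigma_{(-i)}$ are respectively the ith fitted response and the estimated standard error with the ith observation deleted. DFFITS can be expressed in terms of standardized Pearson residuals and leverage values as

$$(\mathrm{DFFITS})_i = r_{SPi}\sqrt{\frac{h_{ii}}{1 - h_{ii}}} \tag{3.13}$$

An observation is said to be influential if $|(\mathrm{DFFITS})_i| \ge 2\sqrt{k/n}$.

Cook's distance

Cook (1977) suggested a statistic that measures the change in the parameter estimates caused by deleting each observation. It is defined as

$$CD_i = \frac{(\hat\beta - \hat\beta_{(-i)})^T (X^T V X)(\hat\beta - \hat\beta_{(-i)})}{k} \tag{3.14}$$

where $\hat\beta_{(-i)}$ is the estimate of $\beta$ without the ith observation.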
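A minimal Python/NumPy sketch of the likelihood residual, DFFITS and Cook's distance follows, under stated assumptions:

- The model is fitted by IRLS (an assumption, not stated in the text).
- The dispersion is fixed at 1, as for a Poisson model.
- Cook's distance is computed exactly as in (3.14) by refitting without each case, so it can be compared with the one-step form.
- The 2√(k/n) DFFITS cut-off is an assumed common choice.

The function names and the counts are hypothetical.

```python
import numpy as np


def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Fit a log-link Poisson regression by IRLS (assumed fitting method)."""
    mu = y + 0.5
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def single_case_measures(X, y):
    """Likelihood residuals, DFFITS and exact deletion Cook's distance."""
    n, k = X.shape
    beta = fit_poisson(X, y)
    mu = np.exp(X @ beta)
    XtVX = X.T @ (X * mu[:, None])
    Xs = X * np.sqrt(mu)[:, None]
    h = np.einsum("ij,ji->i", Xs, np.linalg.solve(XtVX, Xs.T))   # leverages
    sgn = np.sign(y - mu)
    safe_y = np.where(y > 0, y, 1.0)
    unit_dev = 2.0 * (np.where(y > 0, y * np.log(safe_y / mu), 0.0) - (y - mu))
    d = sgn * np.sqrt(np.maximum(unit_dev, 0.0))           # deviance residuals
    r_sp = (y - mu) / np.sqrt(mu * (1 - h))                # standardized Pearson
    r_sd = d / np.sqrt(1 - h)                              # standardized deviance
    r_l = sgn * np.sqrt(h * r_sp**2 + (1 - h) * r_sd**2)   # likelihood residual
    dffits = r_sp * np.sqrt(h / (1 - h))                   # DFFITS via leverage
    cd_exact = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        diff = beta - fit_poisson(X[keep], y[keep])        # beta-hat minus beta-hat(-i)
        cd_exact[i] = diff @ XtVX @ diff / k               # dispersion fixed at 1
    return r_sp, r_sd, r_l, dffits, cd_exact, h


# Made-up counts, for illustration only.
x = np.arange(1.0, 11.0)
y = np.array([2, 3, 6, 7, 8, 9, 10, 12, 15, 40], dtype=float)
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape
r_sp, r_sd, r_l, dffits, cd_exact, h = single_case_measures(X, y)
cd_one_step = r_sp**2 * h / (k * (1 - h))                  # one-step approximation
dffits_flag = np.abs(dffits) >= 2 * np.sqrt(k / n)         # assumed cut-off
cd_flag = cd_exact > 1
print(np.round(r_l, 2), np.round(dffits, 2))
print(np.round(cd_exact, 3), np.round(cd_one_step, 3))
```

Refitting n times is cheap here but grows with the sample size. The one-step form avoids that cost, and printing both shows how closely it tracks the exact deletion value on a given data set.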
There is also a relationship between the difference of fits test and Cook's distance, which can be expressed as

$$CD_i = \frac{(\mathrm{DFFITS})_i^2}{k} \tag{3.15}$$

Using the approximation suggested by Pregibon (1981), $CD_i$ can be expressed as

$$CD_i \approx \frac{r_{SPi}^2\, h_{ii}}{k\,(1 - h_{ii})} \tag{3.16}$$

An observation with a CD value greater than 1 is considered influential.

Squared difference in beta (SDFBETA)

This measure originates from the idea of Cook's distance (Cook, 1977). It is based on single case deletion and modifies DFBETA (Belsley et al., 1980). It is defined as

$$(\mathrm{SDFBETA})_i = \ldots \tag{3.17}$$

After some algebra, SDFBETA can be related to DFFITS as

$$(\mathrm{SDFBETA})_i = \ldots \tag{3.18}$$

The ith observation is influential if $(\mathrm{SDFBETA})_i \ge \ldots$

3.3 Diagnostic measures based on the group deletion approach

This section describes the group deletion measures used to identify multiple influential observations in the Poisson regression model. Multiple influential observations can distort the fit and create masking or swamping effects. Diagnostics based on group deletion are effective for identifying multiple influential observations and are free from masking and swamping effects. These measures are the generalized standardized Pearson residual (GSPR), the generalized difference of fits (GDFFITS) and the generalized squared difference in beta (GSDFBETA).

3.3.1 Generalized standardized Pearson residual (GSPR)

Imon and Hadi (2008) introduced GSPR to identify multiple outliers. Let $D$ be the set of $d$ suspect cases that are deleted and $R$ the remaining group. GSPR is then defined as

$$r_{si}^{(-D)} = \frac{y_i - \hat\mu_i^{(-D)}}{\sqrt{v_i^{(-D)}\,(1 - h_i^{(-D)})}}, \qquad i \in R \tag{3.19}$$

$$r_{si}^{(-D)} = \frac{y_i - \hat\mu_i^{(-D)}}{\sqrt{v_i^{(-D)}\,(1 + h_i^{(-D)})}}, \qquad i \in D \tag{3.20}$$

where $v_i^{(-D)}$ and $h_i^{(-D)}$ are respectively the diagonal elements of $V$ and $H$ (the hat matrix) computed from the remaining group.
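Observations with $|r_{si}^{(-D)}| > 3$ are considered outliers.

3.3.2 Generalized difference of fits (GDFFITS)

The GDFFITS statistic can be expressed in terms of GSPR (generalized standardized Pearson residual) and the generalized weights (GWs). The GWs are denoted by $h_i^{*}$ and defined as

$$h_i^{*} = \frac{h_i^{(-D)}}{1 - h_i^{(-D)}}, \qquad i \in R \tag{3.21}$$

$$h_i^{*} = \frac{h_i^{(-D)}}{1 + h_i^{(-D)}}, \qquad i \in D \tag{3.22}$$

An observation whose generalized weight satisfies $h_i^{*} > \operatorname{Median}(h^{*}) + 3\,\operatorname{MAD}(h^{*})$ is considered influential. Finally, GDFFITS is defined as

$$(\mathrm{GDFFITS})_i = \sqrt{h_i^{*}}\; r_{si}^{(-D)} \tag{3.23}$$

We consider an observation influential if $|(\mathrm{GDFFITS})_i| \ge 3\sqrt{k/(n-d)}$.

3.3.3 Generalized squared difference in beta (GSDFBETA)

GSDFBETA is defined as follows, in order to identify multiple outliers in a data set and to overcome masking and swamping:

$$\mathrm{GSDFBETA}_i = \ldots, \qquad i \in R \tag{3.24}$$

$$\mathrm{GSDFBETA}_i = \ldots, \qquad i \in D \tag{3.25}$$

GSDFBETA can be re-expressed in terms of GSPR and GWs as

$$\mathrm{GSDFBETA}_i = \ldots, \qquad i \in R \tag{3.26}$$

$$\mathrm{GSDFBETA}_i = \ldots, \qquad i \in D \tag{3.27}$$

A suggested cut-off value for detecting influential observations is $\mathrm{GSDFBETA}_i \ge \ldots$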
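A minimal Python/NumPy sketch of the group deletion measures GSPR, GWs and GDFFITS follows. It assumes the suspect set D is already known. In the study D comes from robust regression, which is not reproduced here, and the set `D = [9]` below is purely hypothetical.

Further assumptions:

- The model is refitted by IRLS on the remaining group R only.
- v_i^(−D) is taken as μ̂_i^(−D) because of the Poisson log link.
- MAD is the normalized median absolute deviation (divided by 0.6745).
- The GDFFITS cut-off is 3√(k/(n − d)).

The function names and counts are hypothetical.

```python
import numpy as np


def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Fit a log-link Poisson regression by IRLS (assumed fitting method)."""
    mu = y + 0.5
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def group_deletion_measures(X, y, deleted):
    """GSPR, generalized weights and GDFFITS for a given deletion set D."""
    n, k = X.shape
    in_d = np.zeros(n, dtype=bool)
    in_d[list(deleted)] = True
    keep = ~in_d                                          # remaining group R
    mu = np.exp(X @ fit_poisson(X[keep], y[keep]))        # mu_i^(-D) for every case
    XtVX_r = X[keep].T @ (X[keep] * mu[keep][:, None])    # X_R^T V_R X_R
    Xs = X * np.sqrt(mu)[:, None]
    h = np.einsum("ij,ji->i", Xs, np.linalg.solve(XtVX_r, Xs.T))  # h_i^(-D)
    adj = np.where(in_d, 1.0 + h, 1.0 - h)                # (1 + h) for D, (1 - h) for R
    gspr = (y - mu) / np.sqrt(mu * adj)                   # v_i^(-D) = mu_i^(-D) for Poisson
    gw = h / adj                                          # generalized weights
    gdffits = np.sqrt(gw) * gspr
    return gspr, gw, gdffits, h, keep


# Made-up counts and a hypothetical suspect set, for illustration only.
x = np.arange(1.0, 11.0)
y = np.array([2, 3, 6, 7, 8, 9, 10, 12, 15, 40], dtype=float)
X = np.column_stack([np.ones_like(x), x])
D = [9]
n, k = X.shape
d = len(D)
gspr, gw, gdffits, h, keep = group_deletion_measures(X, y, D)
mad = np.median(np.abs(gw - np.median(gw))) / 0.6745     # normalized MAD (assumption)
outlier = np.abs(gspr) > 3
high_gw = gw > np.median(gw) + 3 * mad
influential = np.abs(gdffits) >= 3 * np.sqrt(k / (n - d))
print(np.round(gspr, 2), outlier, high_gw, influential)
```

Cases in D are scored against a fit that never saw them, so they cannot mask one another. This is the point of the group deletion approach.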

