SAS ETS. Lecture 3 (1185367)
SAS/ETS
LECTURE 3
Valentina Vlasova
Valentina.Vlasova@sas.com
Copyright © 2012, SAS Institute Inc. All rights reserved.

SAS/ETS: CONTENTS
• Estimating distribution parameters
• Unobserved components models

ESTIMATING DISTRIBUTION PARAMETERS
PROC SEVERITY

SEVERITY DISTRIBUTIONS
• Severity: the magnitude of the loss amount, which is assumed to be nonnegative.
• Severity distributions: any continuous parametric probability distribution with nonnegative support
  • Examples: Burr, exponential, gamma, generalized Pareto, inverse Gaussian (Wald), lognormal, Pareto, Weibull
  • They typically have heavy right tails (the probability of extreme losses is small yet nonzero).
• Distribution model
  • Let Y be the response (loss) variable and F be a probability distribution family with parameters Θ
  • Model: Y ~ F(Θ), to be read as “Y is generated from a stochastic process governed by distribution F”

DEFINITIONS AND NOTATIONS
• f(y; Θ): probability density function (PDF) of y
• F(y; Θ) = Pr[Y ≤ y]: cumulative distribution function (CDF) of y
• Fn(y): empirical distribution function (EDF) of y (it does not depend on any distribution or parameters and hence is referred to as “nonparametric”)
• S(y; Θ) = Pr[Y > y] = 1 − F(y; Θ): survival distribution function (SDF)
• Scale family: a distribution belongs to a scale family if, for any constant c > 0, two random variables X and Y related to each other as Y = cX belong to the same probability distribution family
• Scale parameter: a parameter θ such that F(y; θ, Ω) = F(y/θ; 1, Ω), where Ω represents the other parameters of the distribution
• A distribution can be in a scale family without having a scale parameter (for example, the lognormal distribution)
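
The definitions above can be checked numerically. The following is a minimal Python sketch (not SAS code), assuming an exponential distribution whose single parameter is a scale parameter; the loss amounts and the function names are made up for illustration.

```python
import math

def exp_cdf(y, theta):
    """CDF of an exponential distribution with scale parameter theta."""
    return 1.0 - math.exp(-y / theta) if y >= 0 else 0.0

def exp_sdf(y, theta):
    """Survival distribution function: S(y) = 1 - F(y)."""
    return 1.0 - exp_cdf(y, theta)

def edf(sample, y):
    """Empirical distribution function: share of observations <= y."""
    return sum(1 for v in sample if v <= y) / len(sample)

losses = [120.0, 340.0, 90.0, 560.0, 210.0]  # hypothetical loss amounts

# The EDF needs only the data, no distribution and no parameters.
edf_at_210 = edf(losses, 210.0)  # 3 of the 5 observations are <= 210

# Scale-parameter property: F(y; theta) equals F(y / theta; 1).
lhs = exp_cdf(300.0, 150.0)
rhs = exp_cdf(300.0 / 150.0, 1.0)

print(edf_at_210, lhs, rhs, exp_sdf(300.0, 150.0))
```

The EDF is a step function of the data alone, while the CDF and SDF change with the parameter value, which is what the fitting methods later exploit.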

MODELING SEVERITY DISTRIBUTIONS
• Goal: identify the distribution model that best describes the stochastic loss generation process by using the historical loss data
• Process:
  • Collect and prepare the loss data sample
  • Identify candidate distributions
  • Estimate parameters for each candidate distribution using the sample
    • Initialize parameters
    • Estimate parameters using a suitable optimization method
  • Compute fit statistics (how well a distribution fits the sample)
  • Identify the best model according to a suitable selection criterion

PARAMETER ESTIMATION METHODS
• Matching-based methods
  • Prepare equations that have analytical expressions for population entities such as moments and percentiles on the left-hand side and their corresponding sample estimates, which are numbers, on the right-hand side
  • Simplify the equations and solve them for the model parameters
• Optimization-based methods
  • Define a criterion of closeness of the fitted model to the data
    • Typically referred to as the “objective” or “loss” function
  • Maximize or minimize the objective function using a suitable optimization technique
    • Typically nonlinear optimization is necessary
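
As an illustration of the matching-based approach, here is a small Python sketch (not SAS code). It assumes a gamma distribution for moment matching (mean = shape × scale, variance = shape × scale²) and an exponential distribution for percentile matching (median = θ · ln 2). The data and function names are hypothetical.

```python
import math
import statistics

losses = [120.0, 340.0, 90.0, 560.0, 210.0, 75.0, 430.0, 150.0]  # made-up data

def gamma_method_of_moments(sample):
    """Solve mean = shape*scale and variance = shape*scale**2 for the parameters."""
    m = statistics.fmean(sample)
    v = statistics.pvariance(sample)  # sample estimate of the second central moment
    shape = m * m / v
    scale = v / m
    return shape, scale

def exponential_percentile_matching(sample):
    """Match the population median, theta*ln(2), to the sample median."""
    return statistics.median(sample) / math.log(2.0)

shape, scale = gamma_method_of_moments(losses)
theta = exponential_percentile_matching(losses)
print(shape, scale, theta)
```

Both estimators have closed forms here, which is the favorable case; for many distributions the matching equations are nonlinear and must be solved numerically.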

MATCHING-BASED METHODS
• Types
  • Method of moments: solve moment equations
  • Method of percentile matching: solve percentile equations
• Pros
  • Useful when population entities are defined and have easy-to-solve closed-form expressions
• Cons
  • Solution to the moment or percentile equations might not be easy to compute
    • Might require solution of nonlinear simultaneous equations
  • Equations are different for each distribution
  • Matching can be done only over certain ranges of the data
  • Few theoretical results are available about the statistical properties of the estimators

OPTIMIZATION-BASED METHODS
• Minimum-distance estimators: minimize a “distance” between the parametric and nonparametric estimates of certain entities
  • Parametric estimates are based on the CDF, and nonparametric estimates are based on the EDF

    $$Q = \sum_{i \in S} w_i \, [\, g(y_i; \Theta) - g_n(y_i) \,]^2$$

    where g is a function uniquely related to the CDF and g_n is a function related in the same manner to the EDF
• Maximum-likelihood estimators: maximize the likelihood of observing the data as if it were indeed generated from the hypothesized distribution
  • Likelihood is determined by the PDF of an observation

    $$L = \prod_{i \in S} f_{\Theta}(y_i)$$
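
The two optimization-based estimators can be sketched in plain Python (not SAS code). The sketch assumes an exponential distribution, unit weights, g taken to be the CDF itself, and a simple golden-section search standing in for a general nonlinear optimizer. The data, bracket limits, and function names are illustrative assumptions.

```python
import math

losses = [120.0, 340.0, 90.0, 560.0, 210.0, 75.0, 430.0, 150.0]  # made-up data

def golden_section_minimize(func, lo, hi, tol=1e-6):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def exp_cdf(y, theta):
    return 1.0 - math.exp(-y / theta)

def edf(sample, y):
    return sum(1 for v in sample if v <= y) / len(sample)

def neg_log_likelihood(theta):
    """Negative log of L = prod f(y_i), with f(y) = exp(-y/theta)/theta."""
    return sum(math.log(theta) + y / theta for y in losses)

def md_distance(theta):
    """Sum of squared differences between CDF and EDF, all weights equal to 1."""
    return sum((exp_cdf(y, theta) - edf(losses, y)) ** 2 for y in losses)

theta_ml = golden_section_minimize(neg_log_likelihood, 1.0, 5000.0)
theta_md = golden_section_minimize(md_distance, 1.0, 5000.0)
print(theta_ml, theta_md)
```

For the exponential distribution the maximum likelihood estimate has a closed form (the sample mean), so the numerical result can be checked against it; the minimum-distance estimate generally differs because it optimizes a different criterion.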

OPTIMIZATION-BASED METHODS (CONTINUED)
• Parameters need to have good starting values unless the objective function is “well-behaved” (has a global optimum)
  • Method of moments or percentile matching can be used to initialize parameters
• Maximum likelihood estimates have some nice properties
  • Asymptotically unbiased
  • Estimates have asymptotic normal distribution
  • Smallest asymptotic variance among all such estimators
    • Variance (and hence standard errors) of the parameters can be computed using the Hessian of the objective function
  • Invariant under parameter transformation
    • Very useful for getting estimates and variances of functions of parameters (such as the mean)

EVALUATING AND COMPARING MODELS
• Compare variances of the estimates of some common summary statistics such as the mean
  • Smaller variances imply a more efficient model
  • Does not give any explicit indication about how well the model fits the data
• Likelihood ratio test
  • If models M1 and M2 have optimum likelihoods of L1 and L2, respectively, then the test statistic R = −2 (log(L1) − log(L2)) asymptotically follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters of the two models
  • Limited to comparing pairs of models
  • Formally applicable only when one model is a special case of another
• Rank using fit statistics
  • Two categories of fit statistics: likelihood-based and EDF-based
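
A likelihood ratio test can be sketched in Python (not SAS code) for one nested pair: the exponential distribution (M1) is the special case of the Weibull distribution (M2) with shape 1, so the difference in parameter count is 1. The sketch assumes made-up data, maximizes the Weibull likelihood through its profile in the shape parameter, and uses the identity that a chi-square tail probability with 1 degree of freedom equals erfc(sqrt(R/2)). All names are illustrative.

```python
import math

losses = [120.0, 340.0, 90.0, 560.0, 210.0, 75.0, 430.0, 150.0]  # made-up data

def golden_section_minimize(func, lo, hi, tol=1e-6):
    """Minimize a unimodal function of one variable on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def exponential_loglik_at_mle(sample):
    """Log-likelihood at theta = sample mean, where sum(y)/theta = n."""
    n = len(sample)
    return -n * math.log(sum(sample) / n) - n

def weibull_profile_loglik(k, sample):
    """Weibull log-likelihood with the scale set to its optimum for shape k."""
    n = len(sample)
    mean_yk = sum(y ** k for y in sample) / n  # optimal scale**k
    sum_log = sum(math.log(y) for y in sample)
    return n * math.log(k) - n * math.log(mean_yk) + (k - 1.0) * sum_log - n

k_hat = golden_section_minimize(
    lambda k: -weibull_profile_loglik(k, losses), 0.05, 20.0)

loglik_m1 = exponential_loglik_at_mle(losses)      # 1 parameter
loglik_m2 = weibull_profile_loglik(k_hat, losses)  # 2 parameters

lr_stat = -2.0 * (loglik_m1 - loglik_m2)
# Chi-square survival function with 1 degree of freedom.
p_value = math.erfc(math.sqrt(max(lr_stat, 0.0) / 2.0))
print(k_hat, lr_stat, p_value)
```

A small p-value would suggest that the extra Weibull shape parameter improves the fit beyond what chance explains; with only eight observations the asymptotic chi-square approximation is rough at best.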

LIKELIHOOD-BASED FIT STATISTICS
Computed using the likelihood (L), the sample size (N), and the number of estimated parameters in the model (p)
• −2 log likelihood

    $$\mathrm{LOGLIK} = -2 \log(L)$$

• Akaike’s information criterion (AIC)

    $$\mathrm{AIC} = -2 \log(L) + 2p$$

• Corrected AIC (AICC)

    $$\mathrm{AICC} = \mathrm{AIC} + \frac{2p(p+1)}{N - p - 1}$$

• Schwarz’s Bayesian information criterion (BIC)

    $$\mathrm{BIC} = -2 \log(L) + p \log(N)$$
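
These four statistics are simple functions of the maximized log-likelihood. A minimal Python sketch (not SAS code) follows; the function name and the example numbers are hypothetical.

```python
import math

def likelihood_fit_statistics(log_likelihood, n_obs, n_params):
    """Return -2 log L, AIC, AICC, and BIC for a fitted model."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICC requires N > p + 1")
    neg2_loglik = -2.0 * log_likelihood
    aic = neg2_loglik + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = neg2_loglik + n_params * math.log(n_obs)
    return {"LOGLIK": neg2_loglik, "AIC": aic, "AICC": aicc, "BIC": bic}

# Hypothetical fit: log L = -52.0 with N = 8 observations and p = 1 parameter.
stats = likelihood_fit_statistics(-52.0, 8, 1)
print(stats)
```

For all four statistics smaller values indicate a better model; AIC, AICC, and BIC add a penalty that grows with the number of parameters.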

EDF-BASED FIT STATISTICS
Computed by comparing parametric estimates of the distribution function (CDF) with empirical estimates (EDF)
• Fn(y) = EDF estimate at y; F(y) = CDF estimate at y
• Kolmogorov-Smirnov statistic (KS)

    $$\mathrm{KS} = \sup_{y} \, | F_n(y) - F(y) |$$

• Anderson-Darling statistic (AD)

    $$\mathrm{AD} = N \int \frac{(F_n(y) - F(y))^2}{F(y)\,(1 - F(y))} \, dF(y)$$

• Cramer-von Mises statistic (CvM)

    $$\mathrm{CvM} = N \int (F_n(y) - F(y))^2 \, dF(y)$$
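
For a complete sample with no censoring or truncation, the supremum and the two integrals reduce to well-known sums over the ordered observations. The Python sketch below (not SAS code) uses those standard computing formulas; it assumes a fully specified CDF passed in as a function, and the exponential example with θ = 250 is made up.

```python
import math

def edf_fit_statistics(sample, cdf):
    """KS, AD, and CvM statistics for a complete sample and a given CDF."""
    n = len(sample)
    u = [cdf(y) for y in sorted(sample)]  # CDF at the ordered observations

    # KS: largest gap between the EDF steps and the CDF.
    ks = max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))

    # CvM: N * integral (Fn - F)^2 dF, in closed form.
    cvm = 1.0 / (12.0 * n) + sum(
        (u[i] - (2 * i + 1) / (2.0 * n)) ** 2 for i in range(n))

    # AD: N * integral (Fn - F)^2 / (F (1 - F)) dF, in closed form.
    ad = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)) / n
    return {"KS": ks, "AD": ad, "CvM": cvm}

losses = [120.0, 340.0, 90.0, 560.0, 210.0, 75.0, 430.0, 150.0]  # made-up data
theta = 250.0  # hypothetical fitted scale of an exponential distribution
fit_stats = edf_fit_statistics(losses, lambda y: 1.0 - math.exp(-y / theta))
print(fit_stats)
```

Smaller values mean the fitted CDF tracks the EDF more closely; AD weights the tails more heavily than CvM because of the F(1 − F) term in the denominator.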

PROC SEVERITY: INTRODUCTION
• Fits severity distribution models
  • Fits an error model Y ~ F(Θ)
  • Allows F to be any continuous parametric probability distribution with nonnegative support
• Ships with ten distributions: Burr, exponential, gamma, generalized Pareto, inverse Gaussian (Wald), lognormal, Pareto, scale-Tweedie, Tweedie, Weibull

    PROC SEVERITY options ;
        BY variable-list ;
        LOSS <response-variable> </ censoring-truncation-options> ;
        WEIGHT weight-variable ;
        SCALEMODEL regressor-variable-list </ scalemodel-options> ;
        DIST distribution-name-or-keyword <(distribution-option) <distribution-name-or-keyword <(distribution-option)>> ...> </ preprocess-options> ;
        OUTSCORELIB <OUTLIB=> fcmp-library-name options ;
        NLOPTIONS options ;
        Programming statements ;

PROC SEVERITY: PROCESS
• Can estimate multiple models in one PROC SEVERITY step
• For each model based on a candidate distribution
  • Initializes parameters
  • Estimates parameters that optimize an objective function
    • Default objective: maximization of the log-likelihood (ML)
    • Involves nonlinear optimization
  • Computes fit statistics
  • Produces diagnostic output and ODS graphics
• Identifies the best model according to the specified selection criterion

PROC SEVERITY: SYNTAX SUMMARY
• PROC SEVERITY statement: To specify various input, output, and fitting options
• LOSS statement: To specify the severity variable and censoring or truncation options
• SCALEMODEL statement: To specify regression variables that affect the scale parameter of each fitted distribution
• BY statement: To specify grouping of your data
• DIST statement: To specify distributions to fit
• NLOPTIONS statement: To control behavior of the nonlinear optimizer
• Programming statements: To define a custom objective function to minimize
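
The overall workflow (fit each candidate by maximum likelihood, compute a fit statistic, pick the best model) can be imitated in a few lines of Python. This is only an analogy, not SAS code and not a description of how PROC SEVERITY is implemented. It assumes two candidates whose maximum likelihood estimates have closed forms (exponential and lognormal), made-up data, and AICC as the selection criterion; all names are hypothetical.

```python
import math
import statistics

losses = [120.0, 340.0, 90.0, 560.0, 210.0, 75.0, 430.0, 150.0]  # made-up data

def fit_exponential(sample):
    """ML fit: theta = sample mean, so sum(y)/theta = n in the log-likelihood."""
    n = len(sample)
    theta = statistics.fmean(sample)
    loglik = -n * math.log(theta) - n
    return {"params": {"theta": theta}, "loglik": loglik, "p": 1}

def fit_lognormal(sample):
    """ML fit: mu and sigma^2 are the mean and variance of log(y)."""
    n = len(sample)
    logs = [math.log(y) for y in sample]
    mu = statistics.fmean(logs)
    sigma2 = statistics.pvariance(logs, mu)
    loglik = -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * n
    return {"params": {"mu": mu, "sigma": math.sqrt(sigma2)},
            "loglik": loglik, "p": 2}

def aicc(loglik, n, p):
    """Corrected AIC: AIC + 2p(p+1)/(N-p-1)."""
    return -2.0 * loglik + 2.0 * p + 2.0 * p * (p + 1) / (n - p - 1)

candidates = {"exponential": fit_exponential, "lognormal": fit_lognormal}
fits = {name: fit(losses) for name, fit in candidates.items()}
scores = {name: aicc(f["loglik"], len(losses), f["p"])
          for name, f in fits.items()}
best = min(scores, key=scores.get)  # smaller AICC is better
print(scores, best)
```

Distributions without closed-form estimates would need the initialization and nonlinear optimization steps that the slides describe.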