The logit is useful as it converts probabilities, which lie in [0, 1], into the whole real line, a more natural space for the linear part of the model to live in. Notice further, we're not transforming the outcome (Y). Instead, we'll model our Y as if it were a collection of coin flips and apply the transformation to the probability of a head. To get the estimates we maximize the likelihood. We can write out the likelihood as:

$$\prod_{i=1}^n \mu_i^{y_i} (1 - \mu_i)^{1 - y_i} = \exp\left(\sum_{i=1}^n y_i \eta_i\right) \prod_{i=1}^n (1 + e^{\eta_i})^{-1}$$

Example, Poisson regression

Let's consider a problem with count data. Assume that:

• $Y_i \sim \mathrm{Poisson}(\mu_i)$ so that $E[Y_i] = \mu_i$ where $0 \le \mu_i$.
• Linear predictor $\eta_i = \sum_{k=1}^p X_{ik} \beta_k$.
• Link function $g(\mu) = \eta = \log(\mu)$.

Recall that $e^x$ is the inverse of $\log(x)$, so that we have:

$$\mu_i = e^{\eta_i}$$

Thus, the likelihood is:

$$\prod_{i=1}^n (y_i!)^{-1} \mu_i^{y_i} e^{-\mu_i} \propto \exp\left(\sum_{i=1}^n y_i \eta_i - \sum_{i=1}^n \mu_i\right)$$
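To make the Poisson example concrete, here is a minimal sketch using made-up simulated data (the coefficient values 0.5 and 0.3 are invented for illustration): counts are generated from a log-linear mean and the coefficients are recovered with glm and the poisson family.

# A minimal sketch with simulated data: Poisson counts with a log link
set.seed(123)
n   <- 500
x   <- rnorm(n)
eta <- 0.5 + 0.3 * x        # linear predictor
mu  <- exp(eta)             # log link, so mu = exp(eta)
y   <- rpois(n, lambda = mu)

fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)                   # estimates should be close to 0.5 and 0.3

Since the link is the log, exponentiating the coefficients gives multiplicative effects on the mean, which is how Poisson regression coefficients are usually reported.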
How estimates are obtained

For GLMs, estimates have to be obtained numerically through an iterative algorithm. The algorithms are very well behaved, so convergence is usually not a problem unless you have a lot of data on a boundary, such as a lot of 0 counts in binomial or Poisson data. The standard errors are also obtained numerically, and are usually based on large sample theory. The exact equations that get solved are the so-called normal equations:

$$0 = \sum_{i=1}^n \frac{(Y_i - \mu_i)}{\mathrm{Var}(Y_i)} W_i$$

The variance differs by the model. The $W_i$ are derivative terms that we won't deal with.

• For the linear model, $\mathrm{Var}(Y_i) = \sigma^2$ (it is constant).
• For the Bernoulli case, $\mathrm{Var}(Y_i) = \mu_i (1 - \mu_i)$.
• For the Poisson case, $\mathrm{Var}(Y_i) = \mu_i$.
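To give a feel for the iteration, here is a rough sketch for the Bernoulli case with simulated data. It is not the exact code inside glm, but glm's default fitting method is an iteratively reweighted least squares scheme of this flavor, and the two answers should agree closely.

# A rough sketch of iteratively reweighted least squares for logistic
# regression, using made-up simulated data
set.seed(42)
n <- 500
x <- rnorm(n)
p <- 1 / (1 + exp(-(-1 + 2 * x)))   # true success probabilities
y <- rbinom(n, size = 1, prob = p)

X    <- cbind(1, x)                 # design matrix with an intercept
beta <- rep(0, ncol(X))             # starting values
for (iter in 1:25) {
  eta  <- as.vector(X %*% beta)
  mu   <- 1 / (1 + exp(-eta))       # inverse logit
  w    <- mu * (1 - mu)             # Var(Y_i) for the Bernoulli case
  z    <- eta + (y - mu) / w        # working response
  beta <- solve(t(X) %*% (w * X), t(X) %*% (w * z))
}
cbind(irls = as.vector(beta), glm = coef(glm(y ~ x, family = binomial)))

The two columns should match to several decimal places.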
In the latter two (Bernoulli and Poisson) cases, it is often relevant to have a more flexible variance model, even if it doesn't correspond to an actual likelihood. We might make the following changes:

$$0 = \sum_{i=1}^n \frac{(Y_i - \mu_i)}{\phi \, \mu_i (1 - \mu_i)} W_i \quad \text{and} \quad 0 = \sum_{i=1}^n \frac{(Y_i - \mu_i)}{\phi \, \mu_i} W_i$$

These are called 'quasi-likelihood' normal equations. R offers these as an option in the glm function as the quasipoisson and quasibinomial options. These offer more flexible variance options than straight Poisson and binomial models.
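As a quick illustration of the quasipoisson option just mentioned, here is a sketch with made-up overdispersed count data (negative binomial draws have variance larger than the mean). The point estimates match the Poisson fit, but the standard errors are scaled by the estimated dispersion.

# Sketch: overdispersed counts fit with poisson and quasipoisson
set.seed(1)
n  <- 500
x  <- rnorm(n)
mu <- exp(1 + 0.5 * x)
y  <- rnbinom(n, size = 2, mu = mu)      # overdispersed relative to Poisson

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

summary(fit_pois)$coef[, "Std. Error"]   # too small for this data
summary(fit_quasi)$coef[, "Std. Error"]  # inflated by the dispersion estimate
summary(fit_quasi)$dispersion            # estimate of phi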
Odds and ends

At this point, let's do some bookkeeping before we work through examples.

• The normal equations have to be solved iteratively, resulting in $\hat{\beta}_k$ and, if included, $\hat{\phi}$.
• Predicted linear predictor responses can be obtained as $\hat{\eta} = \sum_{k=1}^p X_k \hat{\beta}_k$.
• Predicted mean responses are obtained as $\hat{\mu} = g^{-1}(\hat{\eta})$ (see the short sketch after this list).
• Coefficients are interpreted as
$$g(E[Y \mid X_k = x_k + 1, X_{\sim k} = x_{\sim k}]) - g(E[Y \mid X_k = x_k, X_{\sim k} = x_{\sim k}]) = \beta_k,$$
or the change in the link function of the expected response per unit change in $X_k$, holding the other regressors constant.
• Variations on Newton/Raphson's algorithm are used to do it.
• Asymptotics are usually (but not always) used for inference.
• Many of the ideas from linear models can be brought over to GLMs.
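Here is a minimal sketch of the prediction bullets above, using a simulated logistic regression; predict's type argument returns either the linear predictor or the mean.

# Predicted linear predictors (eta-hat) versus predicted means (mu-hat)
set.seed(10)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = 1 / (1 + exp(-x)))
fit <- glm(y ~ x, family = binomial)

eta_hat <- predict(fit, type = "link")      # eta-hat = X beta-hat
mu_hat  <- predict(fit, type = "response")  # mu-hat = g^{-1}(eta-hat)
all.equal(as.vector(mu_hat), as.vector(1 / (1 + exp(-eta_hat))))  # TRUE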
Exercises

1. True or false, generalized linear models transform the observed outcome. (Discuss.) Watch a video solution.¹⁰⁶
2. True or false, the interpretation of the coefficients in a GLM is on the scale of the link function. (Discuss.) Watch a video solution.¹⁰⁷
3. True or false, the generalized linear model assumes an exponential family for the outcome. (Discuss.) Watch a video solution.¹⁰⁸
4. True or false, GLM estimates are obtained by maximizing the likelihood. (Discuss.) Watch a video solution.¹⁰⁹
5. True or false, some GLM distributions impose restrictions on the relationship between the mean and the variance. (Discuss.) Watch a video solution.¹¹⁰

¹⁰⁶ https://www.youtube.com/watch?v=gsfMdAmHxgA&list=PLpl-gQkQivXji7JK1OP1qS7zalwUBPrX0&index=47
¹⁰⁷ https://www.youtube.com/watch?v=ewAUYoJYG_0&list=PLpl-gQkQivXji7JK1OP1qS7zalwUBPrX0&index=48
¹⁰⁸ https://www.youtube.com/watch?v=CkZ9wOm0Uvs&list=PLpl-gQkQivXji7JK1OP1qS7zalwUBPrX0&index=49
¹⁰⁹ https://www.youtube.com/watch?v=LckCGsK8oqY&list=PLpl-gQkQivXji7JK1OP1qS7zalwUBPrX0&index=50
¹¹⁰ https://www.youtube.com/watch?v=oRUJv6ur_cY&list=PLpl-gQkQivXji7JK1OP1qS7zalwUBPrX0&index=51

Binary GLMs

Watch this video before beginning.¹¹¹

¹¹¹ https://youtu.be/CteWtkdXQ-Y?list=PLpl-gQkQivXjqHAJd2t-J_One_fYE55tC

Binary GLMs come from trying to model outcomes that can take only two values.
Some examples include: survival or not at the end of a study, a team winning versus losing, and success versus failure of a treatment or product. Often these outcomes are called Bernoulli outcomes, from the Bernoulli distribution named after the famous probabilist and mathematician Jacob Bernoulli.

If we happen to have several exchangeable binary outcomes for the same level of covariate values, then that is binomial data and we can aggregate the 0's and 1's into the count of 1's. As an example, imagine if we sprayed insect pests with 4 different pesticides and counted whether they died or not. Then for each spray, we could summarize the data with the count of dead and the total number that were sprayed, and treat the data as binomial rather than Bernoulli.
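Here is a small sketch of that idea with made-up pesticide data (the spray labels and kill probabilities are invented for illustration). The Bernoulli form and the aggregated binomial form give the same coefficient estimates.

# Made-up pesticide data: 25 insects per spray, recorded as 0/1 deaths
set.seed(7)
spray <- rep(c("A", "B", "C", "D"), each = 25)
died  <- rbinom(100, size = 1, prob = rep(c(0.2, 0.4, 0.6, 0.8), each = 25))

# Bernoulli form: one row per insect
fit_bern <- glm(died ~ spray, family = binomial)

# Binomial form: one row per spray, with counts of dead and surviving insects
dead  <- tapply(died, spray, sum)
total <- tapply(died, spray, length)
agg   <- data.frame(spray = names(dead), dead = as.vector(dead),
                    alive = as.vector(total - dead))
fit_binom <- glm(cbind(dead, alive) ~ spray, family = binomial, data = agg)

rbind(bernoulli = coef(fit_bern), binomial = coef(fit_binom))  # the estimates agree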
Example Baltimore Ravens win/loss

The Baltimore Ravens are an American Football team in the US's National Football League.¹¹² The data contains the wins and losses of the Ravens by the number of points that they scored. (In American football, the goal is to score more points than your opponent.) It should be clear that there would be a positive relationship between the number of points scored and the probability of winning that particular game. I got this data set from Jeff Leek, the instructor of three of the Data Science Specialization courses.

¹¹² Baltimore is the home of Johns Hopkins University where your author works.

Let's load the data and use head to look at the first few rows.

> download.file("https://dl.dropboxusercontent.com/u/7710864/data/ravensData.rda", destfile="./data/ravensData.rda", method="curl")
> load("./data/ravensData.rda")
> head(ravensData)
  ravenWinNum ravenWin ravenScore opponentScore
1           1        W         24             9
2           1        W         38            35
3           1        W         28            13
4           1        W         34            31
5           1        W         44            13
6           0        L         23            24
A linear regression model would look something like this:

$$Y_i = \beta_0 + \beta_1 X_i + e_i$$

where $Y_i$ is a binary indicator of whether or not the Ravens won game $i$ (1 for a win, 0 for a loss), $X_i$ is the number of points that they scored for that game, and $e_i$ is the residual error term.

Under this model, $\beta_0$ is the expected value of $Y_i$ given a 0 point game.
For a 0/1 variable, the expected value is the probability, so the intercept is the probability that the Ravens win with 0 points scored. Then $\beta_1$ is the increase in the probability of a win for each additional point.

At this point in the book, I hope that fitting and interpreting this model would be second nature.

> lmRavens <- lm(ravensData$ravenWinNum ~ ravensData$ravenScore)
> summary(lmRavens)$coef
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)             0.2850   0.256643   1.111  0.28135
ravensData$ravenScore   0.0159   0.009059   1.755  0.09625
There are numerous problems with this model. First, if the Ravens score more than 63 points in a game, we estimate an increase in the probability of them winning of 0.0159 * 63, which is greater than 1. This is an impossibility, since a probability can't be greater than 1. 63 is an unusual, but not impossible, score in American football, but the principle applies broadly: modeling binary data with linear models results in models that fail the basic assumptions of the data.

Perhaps less galling, but still an annoying aspect of the model, is that if the error is assumed to be Gaussian, then our model allows $Y_i$ to be anything from minus infinity to plus infinity, when we know our data can only be 0 or 1. If we assume that our errors are discrete to force this, we assume a very strange distribution on the errors.
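A quick check of the 63-point calculation above, assuming ravensData has been loaded as earlier:

# Fitted "probability" of a win for a hypothetical 63 point game
lmRavens <- lm(ravensData$ravenWinNum ~ ravensData$ravenScore)
coef(lmRavens)[1] + coef(lmRavens)[2] * 63   # comfortably greater than 1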
There also aren't any transformations to make things better. Any one to one transformation of our outcome is still just going to have two values, and thus the same set of problems.

The key insight was to transform the probability of a 1 (a win, in our example) rather than the data itself. Which transformation is most useful? It turns out that it involves the log of the odds, called the logit.

Odds

You've heard of odds before, most likely from discussions of gambling. First note, odds are a fraction greater than 0 but unbounded. The odds are not a percentage or proportion. So, when someone says "The odds are fifty percent", they are mistaking probability and odds. They likely mean "The probability is fifty percent", or equivalently "The odds are one", or "The odds are one to one", or "The odds are fifty [divided by] fifty". The latter three odds statements are all the same, since 1, 1/1 and 50/50 are all the same number.

If $p$ is a probability, the odds are defined as $o = p/(1 - p)$.
Note that we can go backwards as $p = o/(1 + o)$. Thus, if someone says the odds are 1 to 1, they're saying that the odds are 1 and thus $p = 1/(1 + 1) = 0.5$. Conversely, if someone says that the probability that something occurs is 50%, then they mean that $p = 0.5$, so that the odds are $o = p/(1 - p) = 0.5/(1 - 0.5) = 1$.
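A tiny sketch of the conversion in both directions (R's built-in qlogis and plogis compute the logit and inverse logit):

# Converting between probabilities and odds
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(o) o / (1 + o)

prob_to_odds(0.5)          # 1: "even" odds
odds_to_prob(1)            # 0.5
log(prob_to_odds(0.75))    # the logit; same as qlogis(0.75)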
The odds are famously derived using a standard fair game setting. Imagine that you are playing a game where you flip a coin with success probability $p$ with the following rules:

• If it comes up heads, you win $X$ dollars.
• If it comes up tails, you lose $Y$ dollars.

What should we set $X$ and $Y$ to for the game to be fair? Fair means the expected earnings for either player is 0. That is:

$$E[\mathrm{earnings}] = Xp - Y(1 - p) = 0$$

This implies $\frac{Y}{X} = \frac{p}{1 - p} = o$. Consider setting $X = 1$; then $Y = o$. Thus, the odds can be interpreted as "How much should you be willing to pay for a $p$ probability of winning a dollar?" If $p > 0.5$ you have to pay more if you lose than you get if you win. If $p < 0.5$ you have to pay less if you lose than you get if you win.
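A one-line numeric check of the fairness condition, using $p = 0.75$ as an arbitrary example:

# With odds o = p/(1-p), staking X = 1 against Y = o makes expected earnings zero
p <- 0.75
o <- p / (1 - p)      # 3
1 * p - o * (1 - p)   # expected earnings: 0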
So, imagine that I go to a horse race and the odds that a horse loses are 50 to 1. They usually specify odds in terms of losing at horse tracks, so this would be said to be 50 to 1 "against", where the "against" is implied and not stated on the boards. The odds of the horse winning are then 1/50. Thus, for a fair bet, if I were to bet on the horse winning, they should pay me 50 dollars if he wins and I should pay them 1 dollar if he loses. (Or any agreed upon multiple, such as 100 dollars if he wins and 2 dollars if he loses.) The implied probability that the horse loses is 50/(1 + 50).

It's an interesting side note that the house sets the odds (hence the implied probability) based only on the bets coming in. They take a small fee for every bet, win or lose (the rake). So, by setting the odds dynamically as the bets roll in, they can guarantee that they make money (risk free) via the rake. Thus the phrase "the house always wins" applies literally.
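A short sketch working through the horse-race arithmetic above:

# 50 to 1 against: implied probabilities and the fair payout on a 1 dollar stake
odds_against <- 50
p_lose <- odds_against / (1 + odds_against)   # 50/51
p_win  <- 1 / (1 + odds_against)              # 1/51
c(p_win = p_win, p_lose = p_lose, fair_payout = odds_against * 1)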