# Regression Models for Data Science
  Interpreting Logistic Regression
  Visualizing fitting logistic regression curves
  Ravens logistic regression
  Some summarizing comments
  Exercises

Count data
  Poisson distribution
  Poisson distribution
  Linear regression
  Poisson regression
  Mean-variance relationship
  Rates
  Exercises

Bonus material
  How to fit functions using linear models
  Notes
  Harmonics using linear models
  Thanks!

# Preface

## About this book

This book is written as a companion book to the Regression Models¹ Coursera class as part of the Data Science Specialization². However, if you do not take the class, the book mostly stands on its own.
A useful component of the book is a series of YouTube videos³ that comprise the Coursera class.

The book is intended to be a low-cost introduction to the important field of regression models. The intended audience is students who are numerically and computationally literate and who would like to put those skills to use in Data Science or Statistics. The book is offered for free as a series of markdown documents on GitHub and in more convenient forms (epub, mobi) on LeanPub.

This book is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License⁴, which requires author attribution for derivative works, non-commercial use of derivative works, and that changes are shared in the same way as the original work.

## About the cover

The picture on the cover is a public domain image taken from Francis Galton’s paper on hereditary stature.
It represents an important leap in the development of regression and correlation, as well as regression to the mean.

¹https://www.coursera.org/course/regmods
²https://www.coursera.org/specialization/jhudatascience/1?utm_medium=courseDescripTop
³https://www.youtube.com/playlist?list=PLpl-gQkQivXjqHAJd2t-J_One_fYE55tC
⁴http://creativecommons.org/licenses/by-nc-sa/4.0/

# Introduction

## Before beginning

This book is designed as a companion to the Regression Models⁵ Coursera class as part of the Data Science Specialization⁶, a ten-course program offered by three faculty, Jeff Leek, Roger Peng and Brian Caffo, at the Johns Hopkins University Department of Biostatistics.

The videos associated with this book can be watched in full here⁷, though the relevant links to specific videos are placed at the appropriate locations throughout.

Before beginning, we assume that you have a working knowledge of the R programming language. If not, there is a wonderful Coursera class by Roger Peng that can be found here⁸.
In addition, students should know the basics of frequentist statistical inference. There is a Coursera class here⁹ and a LeanPub book here¹⁰.

The entirety of the book is on GitHub here¹¹. Please submit pull requests if you find errata! In addition, the course notes can also be found on GitHub here¹². While most code is in the book, all of the code for every figure and analysis in the book is in the R markdown files (.Rmd) for the respective lectures.

Finally, we should mention swirl (statistics with interactive R programming). swirl is an intelligent tutoring system developed by Nick Carchedi, with contributions by Sean Kross and Bill and Gina Croft.
It offers a way to learn R in R. Download swirl here¹³. There’s a swirl module for this course¹⁴! Try it out; it’s probably the most effective way to learn.

⁵https://www.coursera.org/course/regmods
⁶https://www.coursera.org/specialization/jhudatascience/1?utm_medium=courseDescripTop
⁷https://www.youtube.com/playlist?list=PLpl-gQkQivXjqHAJd2t-J_One_fYE55tC
⁸https://www.coursera.org/course/rprog
⁹https://www.coursera.org/course/statinference
¹⁰https://leanpub.com/LittleInferenceBook
¹¹https://github.com/bcaffo/regmodsbook
¹²https://github.com/bcaffo/courses/tree/master/07_RegressionModels
¹³http://swirlstats.com
¹⁴https://github.com/swirldev/swirl_courses#swirl-courses

## Regression models

Watch this video before beginning¹⁵

¹⁵https://www.youtube.com/watch?v=58ZPhK32sU8&index=1&list=PLpl-gQkQivXjqHAJd2t-J_One_fYE55tC

Regression models are the workhorse of data science. They are the most well described, practical, and theoretically understood models in statistics.
A data scientist well versed in regression models will be able to solve an incredible array of problems.

Perhaps the key insight for regression models is that they produce highly interpretable model fits. This is unlike machine learning algorithms, which often sacrifice interpretability for improved prediction performance or automation. These are, of course, valuable attributes in their own right. However, the benefit of simplicity, parsimony and interpretability offered by regression models (and their close generalizations) should make them a first tool of choice for any practical problem.

## Motivating examples

### Francis Galton’s height data

Francis Galton, the 19th century polymath, can be credited with discovering regression. In his landmark paper Regression Toward Mediocrity in Hereditary Stature¹⁶ he compared the heights of parents and their children. He was particularly interested in the idea that the children of tall parents tended to be tall also, but a little shorter than their parents.
Children of short parents tended to be short, but not quite as short as their parents. He referred to this as “regression to mediocrity” (or regression to the mean). In quantifying regression to the mean, he invented what we would call regression.

It is perhaps surprising that Galton’s specific work on height is still relevant today. In fact, this European Journal of Human Genetics manuscript¹⁷ compares Galton’s prediction models with those using modern high-throughput genomic technology (spoiler alert: Galton wins).

Some questions from Galton’s data come to mind. How would one fit a model that relates parent and child heights? How would one predict a child’s height based on their parents’? How would we quantify regression to the mean? In this class, we’ll answer all of these questions, plus many more.

### Simply Statistics versus Kobe Bryant

Simply Statistics¹⁸ is a blog by Jeff Leek, Roger Peng and Rafael Irizarry.
It is one of the most widely read statistics blogs, written by three of the top statisticians in academics. Rafa wrote a (somewhat tongue-in-cheek) post regarding ball hogging¹⁹ among NBA basketball players. (By the way, your author has played basketball with Rafael, who is quite good, but certainly doesn’t pass up shots; glass houses and whatnot.)

¹⁶http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf
¹⁷http://www.nature.com/ejhg/journal/v17/n8/full/ejhg20095a.html
¹⁸http://simplystatistics.org/
¹⁹http://simplystatistics.org/2013/01/28/data-supports-claim-that-if-kobe-stops-ball-hogging-the-lakers-will-win-more/

Here are some key sentences:

• “Data supports the claim that if Kobe stops ball hogging the Lakers will win more”
• “Linear regression suggests that an increase of 1% in % of shots taken by Kobe results in a drop of 1.16 points (+/- 0.22) in score differential.”

In this book we will cover how to create summary statements like this using regression model building.
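Here is a minimal sketch of the kind of model behind a statement like that. The numbers below are simulated purely for illustration; they are not the data from Rafa’s post, so the fitted slope only loosely echoes the quoted 1.16:

```r
# Simulated illustration of regressing score differential on shot percentage.
# These values are invented for this sketch; they are NOT the real game data.
set.seed(1)
shots_pct  <- c(25, 28, 30, 32, 35, 38, 40, 42, 45, 48)   # % of team shots taken
score_diff <- 30 - 1.16 * shots_pct + rnorm(10, sd = 3)   # assumed relationship + noise
fit <- lm(score_diff ~ shots_pct)
summary(fit)$coefficients
# The "shots_pct" row estimates the change in score differential per
# one percentage point increase in shots taken, with its standard error.
```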
Note the nice interpretability of the linear regression model. With this model, Rafa numerically relates the impact of more shots taken on score differential.

## Summary notes: questions for this book

Regression models are incredibly handy statistical tools. One can use them to answer all sorts of questions. Consider three of the most common tasks for regression models (a short R preview follows the list):

1. Prediction. Eg: to use the parents’ heights to predict children’s heights.
2. Modeling. Eg: to try to find a parsimonious, easily described mean relationship between parental and child heights.
3. Covariation. Eg: to investigate the variation in child heights that appears unrelated to parental heights (residual variation), and to quantify what impact genotype information has beyond parental height in explaining child height.
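As a concrete preview, each of the three tasks maps onto a single line of a fitted linear model. This sketch uses the galton dataset from the UsingR package, which the next section introduces properly; the tools themselves are developed over the coming chapters:

```r
# A minimal sketch of the three tasks with Galton's data.
library(UsingR); data(galton)
fit <- lm(child ~ parent, data = galton)

predict(fit, newdata = data.frame(parent = 70))  # 1. Prediction: expected child height
coef(fit)                                        # 2. Modeling: intercept and slope
var(resid(fit))                                  # 3. Covariation: residual variation
```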
An important aspect, especially in questions 2 and 3, is assessing modeling assumptions. For example, it is important to figure out how, whether, and what assumptions are needed to generalize findings beyond the data in question. Presumably, if we find a relationship between parental and child heights, we’d like to extend that knowledge beyond the data used to build the model. This requires assumptions. In this book, we’ll cover the main assumptions necessary.

## Exploratory analysis of Galton’s Data

Watch this video before beginning²⁰

Let’s look at the data first. This data was created by Francis Galton in 1885.
Galton was a statistician who invented the term and concepts of regression and correlation, founded the journal Biometrika, and was the cousin of Charles Darwin.

You may need to run install.packages("UsingR") if the UsingR library is not installed. Let’s look at the marginal (parents disregarding children and children disregarding parents) distributions first. The parental distribution is all heterosexual couples. The parental average was corrected for gender by multiplying female heights by 1.08. Remember, Galton didn’t have regression to help figure out a better way to do this correction!

²⁰https://www.youtube.com/watch?v=1akVPR0LDsg&index=2&list=PLpl-gQkQivXjqHAJd2t-J_One_fYE55tC

Loading and plotting Galton’s data:

```r
library(UsingR); data(galton); library(reshape)
library(ggplot2)                     # loaded explicitly; needed for ggplot
long <- melt(galton)                 # stack child and parent heights into long format
g <- ggplot(long, aes(x = value, fill = variable))
g <- g + geom_histogram(colour = "black", binwidth = 1)
g <- g + facet_grid(. ~ variable)    # one panel per variable (child, parent)
g
```
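The faceted histogram shows the two marginal distributions side by side. If you also want quick numeric summaries to accompany the plot, they can be computed directly; this small addition is not part of the book’s original listing:

```r
# Numeric summaries of the same two marginal distributions
summary(galton$child)
summary(galton$parent)
```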