Data Analysis: Exploration, Patterns, Prediction and Causality is a textbook aimed primarily at business, applied economics and public policy students. It may be taught in MBA, MA Economics (non-PhD track), MSc in Business Economics/Management, MA in Public Policy, PhD in Management and comparable programs. It is also a natural fit for Business Analytics graduate programs. This textbook provides integrated knowledge of methods traditionally scattered around various fields such as econometrics, machine learning and practical business statistics. It covers data organization, data description, regression analysis, predictive analytics using regression and machine learning tools, causal analysis of the effects of interventions by doing experiments or using observational data, and practical skills for working with real-life data and collecting data. The textbook covers relatively few methods but helps students gain a lot of practice and a deep intuitive understanding of those methods. We put a lot of emphasis on the interpretation and visualization of results.
Gábor Békés (CEU)
Gábor Kézdi (U. Michigan)
Data Analysis: Patterns, Prediction and Causality by Gábor Békés (Central European University and CEPR) and Gábor Kézdi (University of Michigan and ISR) is forthcoming with Cambridge University Press (in 2020). The textbook material may be fully covered in a year-long course (for example, in the first year of a two-year Master's or PhD program). It covers material for a series of courses or modules, and chapters may be used to assemble programs of various lengths.
Our textbook covers integrated knowledge of methods and tools traditionally scattered around various fields such as econometrics, machine learning and practical business statistics. The book is organized into four sections: exploration, patterns, prediction and causality.
State-of-the-art knowledge in data analysis includes traditional regression analysis, causal analysis of the effects of interventions, predictive analytics using regression and machine learning tools, and practical skills for working with real-life data and collecting data. We cover relatively few methods but help students gain a deep intuitive understanding of them. The upside is that visualization and interpretation of results can become the focus of analysis.
Applied knowledge can be acquired only by working through many applications. Students will use real-life data and learn how to manage analytical projects from scratch, as we provide data and code on an online ancillary platform. The textbook supports both R and Stata. It is complemented by extensive online material including data, code, additional case studies, practice questions, sample exams and data exercises.
The most important features of this textbook that we think make it attractive - and different from other textbooks - are as follows.
We will provide additional case studies that allow for studying the entire process of data analysis from the substantive business or policy question through collecting or accessing data, managing and cleaning data, carrying out the analysis, presenting and interpreting its results, and addressing the original substantive questions. Case studies aim at answering a question rather than simply illustrating a method. We selected case studies with a potential appeal to a wide range of students. The topics cover management, consumer choice, labor markets, health, energy, macroeconomic and social policy.
Understanding patterns in data is greatly helped by data visualization. We present a comprehensive take on how to build graphs using a few layers. For many types of graphs, we offer shorter sections on how best to show a relationship, illustrated by graphs used in the case studies.
Big Data presents opportunities to better answer old questions and ask new questions. It offers great advantages when applying many traditional statistical methods and allows for developing new methods. At the same time, analyzing Big Data presents new challenges. We include explicit discussion of these opportunities and challenges in relation to uncovering and generalizing patterns, learning the effects of interventions and carrying out predictions, within each of the sections of the book.
Real data needs cleaning and restructuring before it can be analyzed. The decisions made during that process may have far-reaching consequences for the results of the analysis. Yet they are rarely discussed in standard statistics, econometrics and machine learning texts. Even after extensive cleaning, the data used in the analysis is typically different from the ideal dataset that would serve the analysis best. Analysts need to have a thorough understanding of those differences to interpret their results in appropriate ways. The examples used in our course help students acquire the tools of data management and data cleaning and track the consequences of data cleaning on the results of their analysis. Similar issues arise when analysts collect their own data or influence data collection in some way, and we address those as well.
Uncovering patterns in the data can be an important goal in itself, and it is a prerequisite for establishing cause and effect and carrying out predictions. The course starts with simple regression analysis, the method that compares expected y for different values of x to learn the patterns of association between the two variables. It discusses nonparametric regressions and focuses on linear regression. It builds on simple linear regression and goes on to enrich it with nonlinear functional forms, generalizing from a particular dataset to other data it represents, adding more explanatory variables, etc. The course also covers regression analysis for time series data, panel data and binary dependent variables, as well as nonlinear models such as logit and probit. Understanding the intuition behind the methods, their applicability in various situations, and the correct interpretation of their results are the constant focus of the course.
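To make the idea of comparing expected y across values of x concrete, here is a minimal sketch in plain Python. It is only an illustration, not code from the textbook (which supports R and Stata): the function names and the tiny dataset are made up. It computes the average y for each distinct x, which is a simple nonparametric regression for discrete x, and the intercept and slope of a simple linear regression.

```python
from statistics import mean


def conditional_means(x, y):
    """Average y separately for each distinct value of x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return {xi: mean(ys) for xi, ys in sorted(groups.items())}


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative data, not taken from any case study.
x = [1, 1, 2, 2, 3, 3]
y = [2.0, 4.0, 4.0, 6.0, 6.0, 8.0]

means_by_x = conditional_means(x, y)
intercept, slope = ols_simple(x, y)
print(means_by_x)
print(intercept, slope)
```

In this toy example the conditional means happen to lie exactly on a line, so the slope has a direct reading: observations with x higher by one unit have y higher by `slope` units on average.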
Data analysis in business and policy applications is often aimed at prediction. The course introduces tools to evaluate predictions, such as loss functions or the Brier score. It emphasizes the importance of out-of-sample prediction, the role of stationarity, the dangers of overfitting and the use of training and testing samples and cross-validation. It presents and compares the most important predictive models that may be useful in various situations such as time series regressions, classification tools and tree-based machine learning methods.
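As a rough illustration of two of the tools named above, the sketch below computes the Brier score (the mean squared difference between predicted probabilities and 0/1 outcomes) and builds the train and test index sets for k-fold cross-validation. It is plain Python with hypothetical function names and made-up numbers, not the textbook's own code. The folds are contiguous and unshuffled, which is a simplifying assumption; time series data would generally call for a different splitting scheme.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n observations."""
    # Spread the remainder over the first n % k folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Made-up predicted probabilities and realized binary outcomes.
score = brier_score([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0])
folds = list(k_fold_splits(5, 2))
print(score)
print(folds)
```

A lower Brier score indicates better probabilistic predictions. In cross-validation, a model would be estimated on each set of training indices and evaluated on the matching test indices, so every observation is used for out-of-sample evaluation exactly once.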
Decisions in business and policy are often centered on specific interventions, such as changing monetary policy, modifying health care financing, changing the price or other attributes of products, or changing the media mix in marketing. Learning the effects of such interventions is an important purpose of data analysis. The course incorporates the basic concepts and methods used by program evaluation (the framework of potential outcomes, the benefits of randomized assignment, etc.). It also covers related methods used in business, such as A/B testing.
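The benefit of randomized assignment can be sketched with a simple A/B test calculation. When assignment to treatment is random, the difference in average outcomes between the treated and control groups estimates the average effect of the intervention. The plain Python example below is an illustration under that assumption, with a hypothetical function name and made-up outcome values. It reports the difference in means and a standard error that allows unequal variances in the two groups.

```python
from math import sqrt
from statistics import mean, variance


def ab_test_effect(control, treatment):
    """Return (difference in means, standard error) for treatment minus control."""
    diff = mean(treatment) - mean(control)
    # statistics.variance is the sample variance (divides by n - 1).
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    return diff, se


# Made-up outcomes for randomly assigned control and treatment groups.
control = [10, 12, 11, 13]
treatment = [12, 14, 13, 15]

diff, se = ab_test_effect(control, treatment)
print(diff, se)
```

With observational data the same difference in means generally mixes the effect of the intervention with pre-existing differences between the groups, which is why assignment matters for a causal interpretation.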
If you experience any discrepancies, please feel free to contact us via data.analyis.textbook@gmail.com. Comments are warmly welcome.