(for figures, tables, and graphs: this means that if you have a figure, it should have appropriate axis labels and titles).

Using the KGSS data set

Questions

10% What is the average number of family members living with respondents in the Korean General Social Survey (hompop)?

10% What is the average for those 45 years old and over? For those under 45 years old?

20% Generate a histogram of the number of family members living with respondents, and label the average.

5% What is the standard deviation of the number of family members living with respondents? What does this number mean?

10% Examine the KGSS data set and find a continuous variable that you believe might predict the ideal number of children. Provide relevant descriptive statistics (n, mean, sd, min, max) for the variable(s) you chose.

40% Then run a regression with that variable. Make a table showing the results. Interpret the coefficient for your variable.

5% Include your code (paste it into the paper at the end).

Section 2

Use the extract of the KGSS that we have already used. Use the do-files from previous labs to create the total number of children and the edcat variable.

40% Regress the total number of children (NOT the ideal number of children) that an individual has on income (the variable income), education (edcat), age, and satisfaction with the economy. Restrict the sample to those age 50 or below. Next, investigate whether the link between age and number of children differs by education (add an interaction between education and age). Present the 2 regression models above in a table, with appropriate test and fit statistics.

20% Generate margins plots showing the link between age and number of children for three categories of education.

40% Comment on the regression models. Is the effect of age the same regardless of education, or does it differ? How would you explain this relationship to someone who has never studied statistics? Why do these results make sense?
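As a reminder of how an education-by-age interaction is read in Section 2, here is a minimal sketch in Python (not Stata). The coefficients are invented for illustration and are not KGSS estimates; the names `COEFS`, `age_slope`, and `predict` are hypothetical, and the sketch assumes the lowest education category is the reference group. It shows that, with an interaction in the model, the slope of age for a given education category is the main age coefficient plus that category's interaction coefficient, which is the quantity a margins plot displays as separate lines.

```python
# Hypothetical coefficients, invented for illustration only (NOT KGSS results).
# Assumed model:
#   children = b0 + b_age*age + b_edu[g] + b_age_x_edu[g]*age
# where g is the education category and "low" is the reference group.
COEFS = {
    "intercept": -1.0,
    "age": 0.08,
    "edu": {"low": 0.0, "middle": -0.2, "high": -0.5},
    "age_x_edu": {"low": 0.0, "middle": -0.01, "high": -0.03},
}


def age_slope(coefs, edu):
    """Slope of age for one education category: main effect plus interaction."""
    return coefs["age"] + coefs["age_x_edu"][edu]


def predict(coefs, age, edu):
    """Predicted number of children at a given age and education category."""
    return coefs["intercept"] + coefs["edu"][edu] + age_slope(coefs, edu) * age


# Print the age slope and predictions at three ages for each category,
# which is roughly the information a margins plot would show as three lines.
for edu in ("low", "middle", "high"):
    preds = [round(predict(COEFS, age, edu), 2) for age in (30, 40, 50)]
    print(edu, round(age_slope(COEFS, edu), 3), preds)
```

If the interaction coefficients were all zero, the three lines would be parallel; differing slopes are what "the effect of age differs by education" means.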
Section 3

Homework 3: Regression Diagnostics and Dealing with Problems

For this assignment, you will use these data: http://www.stata-press.com/data/r8/census.dta. They are 1980 census data on marriages and divorces, median age, and population by state for the United States. We are going to try to explain variation in marriage rates using this data set. We will use median age, the percent of the population that is urban, and the total population of the state.

10% Outline 1-2 expectations. Write these up as formal hypotheses.

10% State how you calculated the marriage rate. Explain in 2 sentences another way you could have calculated it and why you did not.

10% State how you calculated the percent of the population that is urban. Explain in 2 sentences another way you could have calculated it and why you did not.

Report results and elaborate on the following models, using graphical representation where appropriate:

30% Marriage rate as a function of percent of the population that is urban, median age, and population. Check VIF and report. Do you have a problem with collinearity? Show the regression table. Do you find significant results?

10% Check for outliers. Generate a leverage by influence plot. Which states look like they might be problematic?

10% Generate dfbetas. What do these tell you?

10% Is it necessary to run a new model? If so, does running a new model change your understanding of which variables significantly predict the marriage rate?
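To make the VIF check above concrete, here is a minimal sketch in Python (not Stata) of what a variance inflation factor measures: each predictor is regressed on the other predictors, and VIF = 1 / (1 - R^2) from that auxiliary regression. The data values are invented for illustration and are not taken from census.dta; the helper names `solve`, `r_squared`, and `vif` are hypothetical, and population is expressed in millions only to keep the toy numbers small.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, xs):
    """R^2 from an OLS regression of y on the columns in xs plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in xs] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(predictors):
    """VIF for each predictor: 1 / (1 - R^2) from regressing it on the others."""
    out = {}
    for name, col in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        out[name] = 1.0 / (1.0 - r_squared(col, others))
    return out


# Invented toy values for six hypothetical states (NOT from census.dta).
toy = {
    "pct_urban": [60.0, 85.0, 48.0, 73.0, 91.0, 55.0],
    "median_age": [29.1, 30.3, 27.4, 31.9, 29.8, 28.6],
    "pop_millions": [3.9, 23.7, 0.9, 5.7, 17.6, 2.3],
}

for name, value in vif(toy).items():
    print(name, round(value, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; a common rule of thumb treats values above roughly 10 as a sign of problematic collinearity, though that threshold is a convention rather than a formal test.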