• Click on book icon with peacock to the left for assignment lists.

PROJECT OPTION sent via Google Classroom May 4th, due May 29th.

video class Monday at 11-12PM.  Details ...

Otherwise, to join by phone, dial +1 219-281-4857 and enter this PIN: 133 290 263#

Group Name:  TamiscalStatIS

Classes Mondays at 10:30-11:30AM.

Optional session Fridays at 1-2PM.

(makeup 5/22 for Memorial day 5/25)

# Statistics Project 2020

Chapter 10--Inference for Distributions of Relationships

(The previous lesson, I had bet the class that two of them would have the same birthday.  I've done this many years and have won 65% of the time.  This year I lost the bet to my 17 students.  I bet them a bag of m&m's which I mailed to them by the next class.  However, the m&m's were needed for the next lesson.)

When you get your bet payoff in the mail, save it for Monday's class.
(Please don't eat until Monday's class so we can do activity together.)

Click here M&M's colors and enter the colors of the candies you got.

How to calculate the Chi-Squared Test Statistic:

| CANDIES | BROWN | RED | YELLOW | GREEN | ORANGE | BLUE | Summation |
|---|---|---|---|---|---|---|---|
| Curt's counts | 6 | 6 | 9 | 13 | 15 | 6 | 55.0 = n |
| Expected counts ( = np) | 7.1 | 7.1 | 7.7 | 8.8 | 11.0 | 13.2 | 54.8 |
| observed - expected | -1.13 | -1.13 | 1.32 | 4.23 | 4.03 | -7.16 | evidence that null hypothesis not true |
| squared diff | 1.27 | 1.27 | 1.75 | 17.86 | 16.27 | 51.27 | Is it convincing? |
| (squared diff) / expected | 0.18 | 0.18 | 0.23 | 2.04 | 1.48 | 3.90 | 8.00 Chi-Squared Test Statistic |
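The rows of the table can be reproduced with a few lines of Python. This is only a sketch: it takes the observed and expected counts exactly as printed in the table, and names such as `chi_squared` are made up for illustration. Because the printed expected counts are rounded, the total comes out close to, but not exactly, the table's 8.00.

```python
# Observed and expected counts copied from the table above
# (expected counts are the rounded values as printed).
colors = ["brown", "red", "yellow", "green", "orange", "blue"]
observed = [6, 6, 9, 13, 15, 6]
expected = [7.1, 7.1, 7.7, 8.8, 11.0, 13.2]


def chi_squared(observed, expected):
    """Add up (observed - expected)^2 / expected over every category."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Show how much each color contributes to the total.
for color, o, e in zip(colors, observed, expected):
    print(f"{color:7s} contributes {(o - e) ** 2 / e:.2f}")

chi2 = chi_squared(observed, expected)
print(f"chi-squared test statistic = {chi2:.2f}")
```

The largest contributions come from blue, green, and orange, which matches the bottom row of the table.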

Here were my counts:

The notes on the whiteboard below assume n = 60 candies.

Here are the class counts:

Here is the distribution among the categories of candy colors:

The Relative Age Problem and Sports Rosters

| Canadian All Stars | Bday counts |
|---|---|
| Jan-Mar | 14 |
| Apr-Jun | 5 |
| Jul-Sep | 4 |
| Oct-Dec | 2 |
| Total | n = 25 |

The following histograms are of the birth months of rosters of hockey and soccer teams.
The first bar is January, the second February and so on until the last one of December.

Canadian Hockey                                  Czechoslovakian Hockey                        Czechoslovakian Soccer

Notice anything peculiar or out of the ordinary?  Whether in Canada or Czechoslovakia,
whether it's hockey or soccer, their players tend to be born in  the first few months of the year.
They were graphed on this window.

Here are all teams on the same graph:

Now do you see it?  If not, here is the graph adjusted with a ZoomStat window:

Most of the players are born in January and February.  Why is that?  It's not astrology.  It's that the cutoff dates of eligibility in soccer and hockey in both countries are January 1st.  So a player who turns 10 on January 2nd plays the sport alongside someone who won't turn ten until December.  And at the ages of 10 and 9, who is going to have an advantage in a physical contact sport?  The older players are going to appear bigger and stronger than the pre-adolescents 11 months younger.  So the coaches select those players based on those observations when really they are just picking the oldest kids.  And those children get better coaching, more practice and experience, a tougher playing field, all of which makes them truly better players by the time they are 13 or 14.  What was once a tiny edge or advantage becomes larger and larger because they are consistently given more opportunity and experiences to develop their talents.  If you are born in the latter half of the year and want to play soccer or hockey, the deck is stacked against you.  The system discourages you from pursuing the sport further because the very month in which you are born becomes the obstacle.

What trend do you notice about NHL player birthmonths?
Why is that?

Are Major League Baseball birthmonths following the same trend?
Or are they following a different trend?

Does this apply to all sports?
Here is the NFL's birthmonth distribution.

Does the trend in Olympic birthdays generalize to the UEFA European leagues?
Why is this important?

Are these observations statistically significant?
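One way to explore that question is a chi-squared goodness-of-fit test on the Canadian All Stars birthday counts given earlier. The sketch below assumes, as its null hypothesis, that birthdays are spread evenly over the four quarters of the year (an assumption made here for illustration, not something stated on this page). It compares the statistic to 7.815, the standard table value for 3 degrees of freedom at the 5% level.

```python
# Birthday counts by quarter for the Canadian All Stars (n = 25).
quarters = ["Jan-Mar", "Apr-Jun", "Jul-Sep", "Oct-Dec"]
observed = [14, 5, 4, 2]
n = sum(observed)

# ASSUMPTION (null hypothesis): every quarter is equally likely,
# so each expected count is n / 4 = 6.25.
expected = [n / len(observed)] * len(observed)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Standard chi-squared table value: 3 degrees of freedom, 5% level.
CRITICAL_05_DF3 = 7.815

print(f"chi-squared = {chi2:.2f}")
if chi2 > CRITICAL_05_DF3:
    print("Larger than 7.815: significant at the 5% level")
else:
    print("Not larger than 7.815: not significant at the 5% level")
```

With only 25 players this is a small sample, so the same check is worth repeating on the larger rosters.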

# The Birthday Problem

What are the chances that two people in class have the same birthday?

Among 17 of us, no two had the same birthday.  The chances of that were 68.5% =

365*364*363*362*361*360*359*358*357*356*355*354*353*352*351*350*349/365^17

= 1(.997)(.995)(.992)(.989)(.986)(.984)(.981)(.978)(.975)(.973)(.970)(.967)(.964)(.962)(.959)(.956)

= 68.5%.  So the chance of at least one shared birthday is P(same) = 1 - P(diff) = 1 - 0.685 = 0.315.

There was a 31.5% chance of two people in class having the same birthday.

If there are 23 people in a group, chances are 50-50 that two of them have the same birthday!

(Tamiscal staff has 23 people and indeed, two of us have the same birthday.  One of them is me.  Can you find the other?)
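The birthday calculation is easy to repeat for any group size. Here is a minimal sketch, assuming 365 equally likely birthdays and ignoring leap days. The function names are hypothetical.

```python
def prob_all_different(n, days=365):
    """Chance that n people all have different birthdays.

    ASSUMPTION: every one of the `days` birthdays is equally likely.
    """
    p = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p *= (days - k) / days
    return p


def prob_shared(n, days=365):
    """Chance that at least two of n people share a birthday."""
    return 1 - prob_all_different(n, days)


for group_size in (17, 23):
    print(group_size, round(prob_shared(group_size), 3))
# For 23 people the result is about 0.507, the "50-50" group size.
```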

formulas

# Notes for 8.1 - Idea of Significance Test

Notes for 8.3 and 8.4 - Testing a Claim and Significance for a Proportion

Notes for 8.5 and 8.6 - Testing Claims and Significance for a Mean

Assignment 8.1 key

# Chapter 7

Test this week.  Class Monday from 11-12 for both sections (M&F)

# Activity Applet

highschool.bfwpub.com/spa3e

Notes for 7.5 and 7.6

Notes for 7.3 and 7.4

Notes 6.4

problems 6.4

Notes 6.4

notes 6.5

# A#29 Stat 4.1 Key, A#30 Stat 4.2 Solutions, A#31 Stat 4.3 solution key

Stat 4.4 solution key A#32, Stat 4.5 solution key A#33

Ch 4 Review solutions A#34

# but you get to decide when you're ready to take it.)

(It will NOT be in the R2 grade, but the Chapter 3 assignments will be.)

Lesson 3 Review Solutions

# Monday's Test will be open book/open notes.

Bring in your book and notebook.

Assignment #19 Review KEY

# Regression

Enter the following into a graphing calculator or spreadsheet (Google Sheets on Chromebook)

| SANDWICH | fat (g) | CALORIES |
|---|---|---|
| Quarter Pounder | 19 | 410 |
| Big Mac | 31 | 580 |
| cheeseburger | 16 | 343 |
| Wendy's | 35 | 570 |
| Whopper | 39 | 640 |
| Carl's Jr West | 43 | 660 |

Make a scatterplot.

To find regression equation on calculator:

Press STAT, highlight "CALC", press 4: LinReg(ax+b).

Enter L1, L2 to specify the data.

Press Y=.  Then press VARS, choose 5: Statistics,

arrow cursor over to "EQ", choose 1: RegEQ.

This should enter the best-fit equation into Y1.

Press ZOOM and choose 9: ZoomStat, or press GRAPH.

The equation should be y = 10.4x + 225. (y = mx + b)

Statisticians write it as y = 225 + 10.4x.  (y = a + bx)

To give it more context, the units are substituted for the variables to make a predictive model:

CALORIES = 225 + 10.4(fat g)

As more sandwich data is added, the slope and intercept change slightly.

The slope is the correlation coefficient r times Sy/Sx, where Sy is the standard deviation of the response variable and Sx is the standard deviation of the explanatory variable.

b = r * Sy / Sx .

b = 0.97 (128.4 / 10.9) = 11.4

The y-intercept is calculated by substituting the means for x and y, respectively.

a = Mean y - b * (mean x)

a = 534 - 11.4 (30.5) = 186.3

So this model would be y = 186.3 + 11.4 x.
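The two formulas, b = r * Sy / Sx and a = mean y - b * (mean x), can be checked directly from the data. The sketch below uses only the six sandwiches in the first table and keeps every decimal place, so its slope and intercept come out slightly different from the rounded figures quoted above. The helper names are made up for illustration.

```python
from math import sqrt

# (fat in grams, calories) for the six sandwiches in the first table.
fat = [19, 31, 16, 35, 39, 43]
calories = [410, 580, 343, 570, 640, 660]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Standard deviation using n - 1 in the denominator."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(fat, calories)
b = r * sample_sd(calories) / sample_sd(fat)  # slope: b = r * Sy / Sx
a = mean(calories) - b * mean(fat)            # intercept: a = mean y - b * mean x

print(f"r = {r:.3f}, Sx = {sample_sd(fat):.1f}, Sy = {sample_sd(calories):.1f}")
print(f"CALORIES = {a:.1f} + {b:.2f}(fat g)")
```

Sx and Sy agree with the 10.9 and 128.4 used above, which is a useful sanity check before trusting the slope and intercept.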

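| SANDWICH | fat (g) | CALORIES |
|---|---|---|
| hamburger | 10 | 266 |
| double cheeseburger | 11 | 440 |
| Bacon Double | 22 | 400 |
| Impossible | 14 | 240 |
| Beyond Meat | 20 | 270 |
| Baconator | 62 | 970 |
| Pretzel King | 60 | 920 |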

The BK Pretzel Bacon King debuted September 19, 2019.

https://thetakeout.com/review-burger-king-pretzel-bacon-king-bun-1838914498

This line is slightly different because there are extra data points in it.

They are the Impossible Burger, Beyond Meat, Baconator and the BK Pretzel Burger.

Can you spot them?

We can use the predictive model EQ (Calories = 107 + 13.8*(fat g)) to predict calorie values for the McDonald's McRib sandwich.

(Coincidentally rereleased this week.)

McRib CAL = 107 + 13.8(22 fat g) = 107 + 303.6 = 410.6

The actual calories are 480.  This difference is called a residual.  It is 69.4.

| SANDWICH | fat (g) | CALORIES |
|---|---|---|
| Double Down | 32 | 540 |
| McRib | 22 | 480 |

Adding these sandwiches to the data alters the regression equation slightly.

The newly predicted value for the McRib would be 423.6, so its residual drops to 480 - 423.6 = 56.4.
How did these change the correlation coefficient r?

What is R^2?  It is the square of the correlation coefficient.

For the KFC Double Down, the predicted value is

CAL = 142 +12.8(32 fat g) = 551.6.  Its residual is 540 - 551.6 = -11.6 calories.
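The predictions and residuals can be recomputed with a short script. This sketch plugs in the two model equations quoted in the text (107 + 13.8x and 142 + 12.8x). The function names are hypothetical.

```python
def predict(fat_g, intercept, slope):
    """Predicted calories from a model CALORIES = intercept + slope * (fat g)."""
    return intercept + slope * fat_g


def residual(actual, predicted):
    """Residual = actual value - predicted value."""
    return actual - predicted


# (sandwich, fat g, actual calories, intercept, slope) for each case in the text.
cases = [
    ("McRib, first model", 22, 480, 107, 13.8),
    ("McRib, updated model", 22, 480, 142, 12.8),
    ("Double Down, updated model", 32, 540, 142, 12.8),
]

for name, fat_g, actual, intercept, slope in cases:
    predicted = predict(fat_g, intercept, slope)
    print(f"{name}: predicted {predicted:.1f}, residual {residual(actual, predicted):.1f}")
```

A positive residual means the sandwich has more calories than the model predicts, and a negative one means it has fewer.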

If we plot the residuals (datum - predicted value) against the explanatory variable, there should be no correlation.  (Note that R^2 is about 0.)

That is because most of the variance was accounted for by the regression equation.

Regression using the calculator

Regression using calc example problem

# Correlation worksheet

Correlations Solution KEY

# Football correlations wins and points

Baseball correlations Win-Loss and Runs

SAT 2018 Correlations

Rich students get better SAT scores article published week of this lesson!

cnbc.com/2019/10/03/rich-students-get-better-sat-scores-heres-why.html

# https://www.latimes.com/california/story/2019-10-02/uc-sat-test-optional

posted two days after Monday's lesson, two before Friday's class.

Sections 2.1-2.3 KEY

Standard Deviation

7 day forecast

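| Date | temperatures | average | deviation | squared deviations | Sx | standard deviation | SD Above/Below |
|---|---|---|---|---|---|---|---|
| Thu, 9/12/2019 | 93 | 79.3 | 13.7 | 188.08 | 6.6 | 7.1 | 1.92 |
| Fri, 9/13/2019 | 84 | 79.3 | 4.7 | 22.22 | 6.6 | 7.1 | 0.66 |
| Sat, 9/14/2019 | 75 | 79.3 | -4.3 | 18.37 | 6.6 | 7.1 | -0.60 |
| Sun, 9/15/2019 | 72 | 79.3 | -7.3 | 53.08 | 6.6 | 7.1 | -1.02 |
| Mon, 9/16/2019 | 75 | 79.3 | -4.3 | 18.37 | 6.6 | 7.1 | -0.60 |
| Tue, 9/17/2019 | 77 | 79.3 | -2.3 | 5.22 | 6.6 | 7.1 | -0.32 |
| Wed, 9/18/2019 | 79 | 79.3 | -0.3 | 0.08 | 6.6 | 7.1 | -0.04 |
| Sum | | | 0.0 | 305.43 | Sum / n | sum / (n-1) | |

Dividing the sum of squared deviations, 305.43, by n or by n-1 gives the variance; its square root is the value in the Sx or standard deviation column.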

How to find Standard Deviation (STDEV)

#1.  Find the average (arithmetic mean.)

#2  Subtract the mean from each data value to get a deviation.

#3  Square each deviation.

#4  Sum up all squared deviations.

#5  Divide by n-1.*  This result is called the variance.

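*(If you have all data, a population, you can divide by n; the table above labels that result Sx.

If you have a sample of a data set, divide by n-1.  Note that a TI calculator's 1-Var Stats screen uses the labels differently: its Sx divides by n-1 and its σx divides by n.)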

#6  Take the square root of the variance to restore original units (instead of units squared.)

This is called the standard deviation.  Usually more than 2/3 of the data will lie within one standard deviation of the mean.  See below.

Five of the seven data points, 5/7 = 71%, are within one standard deviation above or below the mean.
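The six steps can be followed line by line in Python. This is a sketch using the seven forecast temperatures from the table and dividing by n-1, treating the week as a sample. The variable names are chosen here for illustration.

```python
from math import sqrt

# Seven-day forecast temperatures from the table.
temps = [93, 84, 75, 72, 75, 77, 79]
n = len(temps)

mean = sum(temps) / n                      # step 1: the average
deviations = [t - mean for t in temps]     # step 2: value minus mean
squared = [d ** 2 for d in deviations]     # step 3: square each deviation
total = sum(squared)                       # step 4: sum of squared deviations
variance = total / (n - 1)                 # step 5: divide by n - 1
sd = sqrt(variance)                        # step 6: square root restores the units

# Count the days that fall within one standard deviation of the mean.
within_one_sd = [t for t in temps if abs(t - mean) <= sd]

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {total:.2f}")
print(f"variance = {variance:.2f}, standard deviation = {sd:.1f}")
print(f"{len(within_one_sd)} of {n} days are within one standard deviation")
```

Replacing `n - 1` with `n` in step 5 gives the population version, which is the smaller 6.6 shown in the table.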

# STATISTICS REVIEW SESSION (Optional)

09/15/2019 - 09/20/2019

Since Friday's class is canceled due to a minimum day, I will conduct an optional review session for both Friday and Monday stat classes at 12 for half an hour.

Location is the HUB, where Monday's class meets.  If you can't make Friday at 12, I could also conduct one Thursday at 3.

I will have a review guide of concepts, and can do any practice problems.

Tests for each class on Chapter 1 Monday, Sept. 23rd, and Friday Sept. 27th.

NOTES 1.7-1.9

Review ANSWERS

Stemplot and Histogram Examples (President Ages and Shoes)

Practice Histogram warmups

Sections 1.1 to 1.3 Power Point

Histograms

Describe the distribution (symmetric, uniform, skewed, clustered) of the soccer goals histogram:

What would be the best measures of center (mode, median, mean)?

Discuss the distribution of the runs scored in MLB's 2017 season.

Would the mean or the median be the better measure of center?  What is the mode?

What is the actual mode?  What is the perceived mode?

Why does the above histogram have a different mode than the perceived actual histogram?

Describe the distribution of NBA scores.

Are the mean, median, and mode the same measures of center?

Where is most of the data?  Which direction are baseball runs skewed?

Will the mean or median be greater?  Which of the three measures of center is the least?

Below is a histogram of National Hockey League goals.

Describe its distribution and measures of center.