• Statistics Notes

    Click the book icon with the peacock (to the left) for assignment lists.

    PROJECT OPTION sent via Google Classroom May 4th, due May 29th.  


    Video class Monday, 11 AM-12 PM.  Details ...

    Check email and this site for the Google Meet link.

    To join the video meeting, click this link: https://meet.google.com/cyp-ckuj-uic
    Otherwise, to join by phone, dial +1 219-281-4857 and enter this PIN: 133 290 263#

     Group Name:  TamiscalStatIS

    Classes Mondays at 10:30-11:30AM.

    Optional session Fridays at 1-2PM.

    (Makeup session 5/22 for Memorial Day, 5/25.)

     


    Project or Ch. 10

    Statistics Project 2020

    Possible Project Ideas, Pandemic Project Ideas


    Chapter 10 -- Inference for Distributions of Relationships

     (In the previous lesson, I bet the class that two of them would share a birthday.  I've made this bet for many years and have won 65% of the time.  This year I lost the bet to my 17 students.  The stakes were a bag of M&M's, which I mailed to them before the next class.  However, the M&M's were needed for the next lesson.)

    When you get your bet payoff in the mail, save it for Monday's class.
    (Please don't eat them until Monday's class so we can do the activity together.)
     
    Photos: labeling and addressing the envelopes, stuffing them with M&M's, and mailing them at the post office.
     
    Click here (M&M's colors) and enter the colors of the candies you got.
     
    How to calculate the Chi-Squared Test Statistic:
     
    CANDIES                      BROWN    RED  YELLOW  GREEN  ORANGE   BLUE   Summation
    Curt's counts                    6      6       9     13      15      6   55.0 = n
    Expected counts (= np)         7.1    7.1     7.7    8.8    11.0   13.2   54.8
    observed - expected          -1.13  -1.13    1.32   4.23    4.03  -7.16
    squared diff                  1.27   1.27    1.75  17.86   16.27  51.27
    (squared diff) / expected     0.18   0.18    0.23   2.04    1.48   3.90   8.00 = Chi-Squared Test Statistic

    The observed - expected differences are evidence that the null hypothesis is not true.  Is it convincing?
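The table's arithmetic can be reproduced with a short Python sketch. Because the color proportions behind the expected counts are not listed here, the sketch back-solves the expected counts from the table's observed - expected row (observed minus that difference). It also adds an upper-tail p-value with 6 - 1 = 5 degrees of freedom, using a small helper, `chi_squared_upper_tail`, written for this example from the standard chi-squared recurrence rather than taken from a statistics library.

```python
import math

# Curt's observed counts, in the table's order: brown, red, yellow, green, orange, blue
observed = [6, 6, 9, 13, 15, 6]

# Assumption: expected counts back-solved from the "observed - expected" row;
# the table displays them rounded to one decimal place.
expected = [7.13, 7.13, 7.68, 8.77, 10.97, 13.16]


def chi_squared_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def chi_squared_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom >= x), for a positive integer df.

    Starts from the closed forms for df = 1 and df = 2, then applies
    Q(x, k + 2) = Q(x, k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(x / 2))
    else:
        k, q = 2, math.exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


chi2 = chi_squared_statistic(observed, expected)
df = len(observed) - 1  # 6 colors -> 5 degrees of freedom
p_value = chi_squared_upper_tail(chi2, df)
print(f"chi-squared = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
```

With a p-value of roughly 0.16, this one bag of 55 candies would not be convincing evidence against the expected color distribution at the usual 0.05 significance level.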
     
    Here were my counts:
     
    Observed vs. Expected Counts Candy Counts by Category  
     
     The notes on the whiteboard below assume n = 60 candies.

    Chi squared candies

    Here are the class counts:

    observed minus expected counts average differences

    Here is the distribution of the candies among the color categories:

    Candy Color Distribution

    Notes 10.2, HW 10.2 answers

     chi squared formula p value tail probability

    chi squared and p value

     

    The Relative Age Problem and Sports Rosters
     
    Canadian All Star Birthdays
     
    Notice anything unusual about this roster of Canadian Hockey all stars?
     
    Canadian All-Stars: birthday counts by quarter
    Jan-Mar   14
    Apr-Jun    5
    Jul-Sep    4
    Oct-Dec    2
    n = 25

     

    Canadian Hockey roster  
    The following histograms are of the birth months of rosters of hockey and soccer teams.
    The first bar is January, the second February and so on until the last one of December.
     
    Canadian Hockey          Czechoslovakian Hockey          Czechoslovakian Soccer
     
    Notice anything peculiar or out of the ordinary?  Whether in Canada or Czechoslovakia,
    whether it's hockey or soccer, the players tend to be born in the first few months of the year.
    The histograms were graphed with this window:
     
    Birth month window

    Here are all three teams on the same graph:

    All 3 teams
     
    Now do you see it?  If not, here is the graph adjusted with a ZoomStat window:
    ZoomStat all 3
     
    A disproportionate number of the players are born in January and February.  Why is that?  It's not astrology.  In both countries, the eligibility cutoff date for youth soccer and hockey is January 1st.  So a player who turns 10 on January 2nd plays alongside someone who won't turn 10 until December.  At ages 9 and 10, who is going to have the advantage in a physical contact sport?  The older players will look bigger and stronger than the pre-adolescents 11 months younger, so coaches select them based on those observations, when really they are just picking the oldest kids.

    Those children then get better coaching, more practice and experience, and tougher competition, all of which make them truly better players by the time they are 13 or 14.  What was once a tiny edge grows larger and larger because they are consistently given more opportunities and experiences to develop their talents.  If you are born in the latter half of the year and want to play soccer or hockey, the deck is stacked against you.  The system discourages you from pursuing the sport further because the very month in which you were born becomes the obstacle.
     
    What trend do you notice about NHL player birth months?
    NHL birthmonths
    Why is that? 
     
    Do Major League Baseball birth months follow the same trend?
    Or do they follow a different trend?
     
    mlb birthmonths
    Does this apply to all sports?
    Here is the NFL's birth-month distribution.
    NFL birthmonths
     
    What about soccer?
    soccer birth quartiles
     
    Does the trend in Olympic birthdays generalize to the UEFA European leagues?
    UEFA soccer birthmonths
    Why is this important?
    graduate birthmonths
     
    Are these observations statistically significant?
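One way to check the Canadian All-Star counts is a chi-squared goodness-of-fit test, the same statistic used for the M&M's. This Python sketch assumes a null hypothesis of birthdays spread evenly across the four quarters (6.25 players expected in each), which ignores the day or two by which quarters differ in length. The upper-tail helper is written for this example, not taken from a library.

```python
import math

# Canadian All-Star birthdays by quarter: Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec
observed = [14, 5, 4, 2]
n = sum(observed)  # 25 players

# Assumption: with no relative-age effect, each quarter has probability 1/4.
expected = [n / 4] * 4  # 6.25 each, so every expected count is at least 5

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # 4 quarters -> 3 degrees of freedom


def chi_squared_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom >= x), for a positive integer df."""
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(x / 2))
    else:
        k, q = 2, math.exp(-x / 2)
    # Recurrence: Q(x, k + 2) = Q(x, k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


p_value = chi_squared_upper_tail(chi2, df)
print(f"chi-squared = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

A p-value of about 0.004 would mean that a split this lopsided is very unlikely if birth quarter had nothing to do with making the team.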
     
     
     
     
     

    The Birthday Problem

    What are the chances that two people in class have the same birthday?

    P(different)  P(different)

    Among 17 of us, no two had the same birthday.  The chance of that was 68.5% =

    365*364*363*362*361*360*359*358*357*356*355*354*353*352*351*350*349/365^17

    = 1(.997)(.995)(.992)(.989)(.986)(.984)(.981)(.978)(.975)(.973)(.970)(.967)(.964)(.962)(.959)(.956)

    = 68.5%.  So the chance of a shared birthday is P(same) = 1 - P(diff) = 1 - 0.685 = 0.315.

    There was a 31.5% chance of two people in class having the same birthday.

     

    If there are 23 people in a group, chances are 50-50 that two of them have the same birthday!
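The chain of factors can be checked with a quick Python sketch, assuming 365 equally likely birthdays and ignoring February 29. The helper name `prob_all_different` is just for illustration.

```python
def prob_all_different(n, days=365):
    """Probability that n people all have different birthdays.

    Person i (counting from 0) must avoid the i birthdays already taken,
    so multiply (days - i) / days for each person.
    """
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p


for n in (16, 17, 23):
    diff = prob_all_different(n)
    print(f"n = {n}: P(all different) = {diff:.3f}, P(shared) = {1 - diff:.3f}")
```

For 23 people the chance of a shared birthday comes out just over 50%, which is where the 50-50 rule of thumb comes from.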

    (Tamiscal staff has 23 people and indeed, two of us have the same birthday.  One of them is me.  Can you find the other?)


    Chapter 8 -- Testing a Claim

    formulas

    Notes for 8.1 - Idea of Significance Test

    Notes for 8.3 and 8.4 - Testing a Claim and Significance for a Proportion

    Notes for 8.5 and 8.6 - Testing Claims and Significance for a Mean

    Assignment 8.1 key

    Chapter 7

    Test this week.  Class Monday from 11-12 for both sections (M&F)

    Ch7 Review Key Assign #20, degrees of freedom Table Index B

    Activity Applet

    highschool.bfwpub.com/spa3e

    Notes for 7.5 and 7.6

    warmup 7.5 key, Notes 7.5 key

     Notes 7.6 answers, Notes 7.6 Ex2

    Confidence Intervals App (7.2)

    Confidence Intervals and Proportions (7.3)

     Notes for 7.1 and 7.2

    Notes 7.2 KEY a , Notes 7.2 KEY b 

    Notes 7.1 workforce, Notes 7.1 terms

    Notes for 7.3 and 7.4

     Assign #18 answers, Assign #19 answers


    Chapter 6 -- Sampling Distributions

    assignment 11 KEY, assignment 12 KEY, assignment 13 KEY

    Notes 6.1 and 6.2 

    Notes 6.4

    problems 6.4

    Notes 6.4

    notes 6.5

     


    Chapter 5 -- Random Variables

    5-6 answers

    5.7 answers

     


    Chapter 4 -- Probability and Randomness

     A#29 Stat 4.1 Key, A#30 Stat 4.2 Solutions, A#31 Stat 4.3 solution key

    Stat 4.4 solution key A#32, Stat 4.5 solution key A#33

    Ch 4 Review solutions A#34

     Chapter 4 Test Wed 12/11 in HUB 10-12


     

    Chapter 3 Test given in SIS due to school closure last week.

    Copies will be in test folder Tuesday, Nov 5. 

     Please finish assignments #19-28 and submit them to Kate in SIS or to Sue in the office by this Friday.

    You have until November 15 to complete the test in SIS.

    (It's NOT open book or notes this time,

    but you get to decide when you're ready to take it.)

    (It will NOT be in the R2 grade, but the Chapter 3 assignments will be.)

     Lessons 3.1-3.3 Answers

    Lessons 3.4-3.6 Answers

    Lesson 3.7 Key, Lesson 3.8 Key, Lesson 3.9 Key

    Lesson 3 Review Solutions

     


    Monday's Test will be open book/open notes.

    Bring in your book and notebook.

    Assignment #16 answers 

    Assignment #17 answers

    Assignment #18 answers

    Assignment #19 Review KEY

     


     

    Regression

     Sandwich fat g - Calories Plot

    Enter the following data into a graphing calculator or spreadsheet (Google Sheets on a Chromebook):


    SANDWICH          FAT (g)   CALORIES
    Quarter Pounder      19        410
    Big Mac              31        580
    cheeseburger         16        343
    Wendy's              35        570
    Whopper              39        640
    Carl's Jr West       43        660

    Make a scatterplot.

     Sandwich Regression Plot

     

    To find regression equation on calculator:

    Press STAT, highlight "Calc", press 4: LinReg(ax+b).

    Enter L1, L2 to specify the data.

    Press Y=.  Then press VARS, choose 5: Statistics,

    arrow cursor over to "EQ", choose 1: RegEQ.

    This should enter the best-fit equation into Y1.

    Press ZOOM, then 9: ZoomStat (or press GRAPH).

     

    The equation should be y = 11.6x + 180. (y = mx + b)

    Statisticians write it as y = 180 + 11.6x.  (y = a + bx)

    To give it more context, the quantities replace the variable names to make a predictive model:

    CALORIES = 180 + 11.6(fat g)

     

    As more sandwich data is added, the slope and intercept change slightly.

    The slope is the correlation coefficient r times Sy/Sx, where Sy is the standard deviation of the response variable (calories) and Sx is the standard deviation of the explanatory variable (fat g).

    b = r * Sy / Sx .

    b = 0.982 (128.4 / 10.9) = 11.6

    The y-intercept is found by substituting the means of x and y, since the regression line always passes through the point (mean x, mean y).

    a = (mean y) - b * (mean x)

    a = 534 - 11.6 (30.5) = 180.2

     So this model would be y = 180.2 + 11.6x, which agrees with the calculator's equation up to rounding.
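The hand calculation of b = r * Sy / Sx and a = (mean y) - b * (mean x) can be checked with Python's standard `statistics` module. This sketch covers only the six sandwiches in the first table; `statistics.stdev` divides by n - 1, matching Sx and Sy on the calculator, and r is computed as the sum of products of z-scores divided by n - 1.

```python
import statistics

# Fat (g) and calories for the first six sandwiches
fat = [19, 31, 16, 35, 39, 43]
calories = [410, 580, 343, 570, 640, 660]

x_bar = statistics.mean(fat)
y_bar = statistics.mean(calories)
s_x = statistics.stdev(fat)       # sample standard deviation (divides by n - 1)
s_y = statistics.stdev(calories)

# Correlation r: sum of products of z-scores, divided by n - 1
n = len(fat)
r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(fat, calories)) / (n - 1)

b = r * s_y / s_x       # slope
a = y_bar - b * x_bar   # intercept: the line passes through (x_bar, y_bar)

print(f"r = {r:.3f}, Sx = {s_x:.1f}, Sy = {s_y:.1f}")
print(f"CALORIES = {a:.1f} + {b:.2f}(fat g)")
```

On Python 3.10 or newer, `statistics.linear_regression(fat, calories)` should return the same slope and intercept, which makes a handy cross-check.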

     

     Add the following sandwich data:

     

    hamburger 10 266
    double cheeseburger 11 440
    Bacon Double 22 400
    Impossible 14 240
    Beyond Meat 20 270
    Baconator 62 970
    Pretzel King 60 920
         

    The BK Pretzel Bacon King debuted September 19, 2019.

    https://thetakeout.com/review-burger-king-pretzel-bacon-king-bun-1838914498

     

    fat calories equation

    This line is slightly different because there are extra data points in it.

    They are the Impossible Burger, Beyond Meat, Baconator and the BK Pretzel Burger.

    Can you spot them?

     We can use the predictive model (Calories = 107 + 13.8(fat g)) to predict the calories in the McDonald's McRib sandwich.

    (Coincidentally rereleased this week.)

    McRib CAL = 107 + 13.8(22 fat g) = 107 + 303.6 = 410.6

    The actual value is 480 calories.  The difference between the actual and predicted values, 480 - 410.6 = 69.4 calories, is called a residual.

    Double Down 32 540
    McRib 22 480

    Adding these sandwiches to the data alters the regression equation slightly.

    The newly predicted value for the McRib would be 423.6, so its residual drops to 480 - 423.6 = 56.4.
    How did these change the correlation coefficient r?

    Sandwich Regression

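    What is R^2?  It is the square of the correlation coefficient r, and it gives the fraction of the variation in calories that is explained by the linear relationship with fat grams.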

     For the KFC Double Down, the predicted value is 

    CAL = 142 + 12.8(32 fat g) = 551.6.  Its residual is 540 - 551.6 = -11.6 calories.

    If we plot the residuals (actual - predicted value) against the explanatory variable,

    Residuals Plot

     there should be no correlation (note that R^2 is about 0).

    That is because the least-squares line has already accounted for the linear relationship between fat and calories, so what remains should look like random scatter.
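To see the residuals claim numerically, this Python sketch fits the least-squares line to all fifteen sandwiches listed in these notes (the original six, the seven added ones, the Double Down, and the McRib), computes each residual as actual minus predicted, and then computes the correlation between fat grams and the residuals.

```python
import math
import statistics

# All fifteen sandwiches from these notes: name -> (fat g, calories)
sandwiches = {
    "Quarter Pounder": (19, 410), "Big Mac": (31, 580), "cheeseburger": (16, 343),
    "Wendy's": (35, 570), "Whopper": (39, 640), "Carl's Jr West": (43, 660),
    "hamburger": (10, 266), "double cheeseburger": (11, 440), "Bacon Double": (22, 400),
    "Impossible": (14, 240), "Beyond Meat": (20, 270), "Baconator": (62, 970),
    "Pretzel King": (60, 920), "Double Down": (32, 540), "McRib": (22, 480),
}

fat = [f for f, _ in sandwiches.values()]
cal = [c for _, c in sandwiches.values()]
x_bar, y_bar = statistics.mean(fat), statistics.mean(cal)

# Least-squares slope and intercept
sxx = sum((x - x_bar) ** 2 for x in fat)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(fat, cal)) / sxx
a = y_bar - b * x_bar

# Residual = actual - predicted
residuals = {name: c - (a + b * f) for name, (f, c) in sandwiches.items()}
res = list(residuals.values())

# Correlation between fat and the residuals
e_bar = statistics.mean(res)
num = sum((x - x_bar) * (e - e_bar) for x, e in zip(fat, res))
den = math.sqrt(sxx * sum((e - e_bar) ** 2 for e in res))
r_resid = num / den

print(f"CALORIES = {a:.1f} + {b:.2f}(fat g)")
print(f"McRib residual = {residuals['McRib']:.1f}, Double Down residual = {residuals['Double Down']:.1f}")
print(f"correlation(fat, residuals) = {r_resid:.2e}")
```

Because the sketch keeps unrounded coefficients (about 142.5 and 12.83), its predictions and residuals can differ slightly from hand calculations that use 142 and 12.8. The correlation between x and the residuals is zero, up to floating-point error, for any least-squares line, which is why a residual plot is used to look for curved or uneven patterns rather than a linear trend.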

     



     

    Regression using the calculator

    Regression using calc example problem

    CO2 Levels regression worksheet

     


    Sections 2.4-2.5 Solution KEY

    Correlation worksheet

    Correlations Solution KEY

    Football correlations wins and points

    Baseball correlations Win-Loss and Runs

     

    SAT 2018 Correlations

    An article on why rich students get better SAT scores was published the week of this lesson!

     

    cnbc.com/2019/10/03/rich-students-get-better-sat-scores-heres-why.html

     

    Will UC schools drop their SAT score requirement?

    https://www.latimes.com/california/story/2019-10-02/uc-sat-test-optional

    (Posted two days after Monday's lesson and two days before Friday's class.)


     

    Sections 2.1-2.3 KEY

     


     

    Standard Deviation

    7 day forecast

     

    Date             Temp (°F)  Average  Deviation  Squared deviation  sqrt(sum/n)  Sx = sqrt(sum/(n-1))  SDs above/below mean
    Thu, 9/12/2019      93        79.3      13.7         188.08            6.6              7.1                  1.92
    Fri, 9/13/2019      84        79.3       4.7          22.22            6.6              7.1                  0.66
    Sat, 9/14/2019      75        79.3      -4.3          18.37            6.6              7.1                 -0.60
    Sun, 9/15/2019      72        79.3      -7.3          53.08            6.6              7.1                 -1.02
    Mon, 9/16/2019      75        79.3      -4.3          18.37            6.6              7.1                 -0.60
    Tue, 9/17/2019      77        79.3      -2.3           5.22            6.6              7.1                 -0.32
    Wed, 9/18/2019      79        79.3      -0.3           0.08            6.6              7.1                 -0.04
    Sum                                      0.0         305.43
    Variance = sum of squared deviations / (n-1) = 305.43 / 6 = 50.9
       

    How to find Standard Deviation (STDEV)

    #1.  Find the average (the arithmetic mean).

    #2  Subtract the mean from each data value to get a deviation.

    #3  Square each deviation.

    #4  Sum up all squared deviations.  

    #5  Divide by n-1.*  This result is called the variance.

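          *(If you have all of the data, a population, divide by n instead; the square root of that result is the population standard deviation, σx.

          If you have a sample of a data set, divide by n-1; the square root of that result is the sample standard deviation, Sx.)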

    #6  Take the square root of the variance to restore the original units (instead of squared units).

    This is called the standard deviation.  For roughly bell-shaped (normal) data, about 68% of the values, a bit more than 2/3, lie within one standard deviation of the mean.  See below.

     

    7 day forecast

     

    Five of the seven data points, 5/7 = 71%, are within one standard deviation above or below the mean.
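The six steps can be traced in a short Python sketch using the forecast temperatures from the table. It computes both versions of the spread (dividing by n - 1 for the sample Sx, and by n for the population value) and then counts how many days fall within one Sx of the mean.

```python
import math

# Forecast temperatures, Thu 9/12/2019 through Wed 9/18/2019
temps = [93, 84, 75, 72, 75, 77, 79]
n = len(temps)

mean = sum(temps) / n                     # step 1: average
deviations = [t - mean for t in temps]    # step 2: subtract the mean
squared = [d ** 2 for d in deviations]    # step 3: square each deviation
total = sum(squared)                      # step 4: sum the squared deviations
variance = total / (n - 1)                # step 5: divide by n - 1 (sample variance)
s_x = math.sqrt(variance)                 # step 6: square root -> Sx
sigma = math.sqrt(total / n)              # population version: divide by n instead

# How many SDs each day is above (+) or below (-) the mean
z_scores = [d / s_x for d in deviations]
within_one_sd = sum(1 for z in z_scores if abs(z) <= 1)

print(f"mean = {mean:.1f}, variance = {variance:.1f}, Sx = {s_x:.1f}, sigma = {sigma:.1f}")
print(f"{within_one_sd} of {n} days within one Sx of the mean")
```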

     


    STATISTICS REVIEW SESSION (Optional)

    09/15/2019 - 09/20/2019

    Since Friday's class is canceled due to a minimum day, I will conduct an optional review session for both Friday and Monday stat classes at 12 for half an hour.

     

    Location is the HUB, where Monday's class meets.  If you can't make Friday at 12, I could also conduct one Thursday at 3.

     

    I will have a review guide of concepts, and can do any practice problems.

     

    Tests for each class on Chapter 1 Monday, Sept. 23rd, and Friday Sept. 27th.

     


     

    1.7 ANSWER KEY, 1.8 ANSWER KEY, 1.9 ANSWER KEY

    NOTES 1.7-1.9

    Review ANSWERS

    Stemplot and Histogram Examples (President Ages and Shoes)

    Practice Histogram warmups

    Sections 1.1 to 1.3 Power Point

     


    Histograms

    Describe the distribution (symmetric, uniform, skewed, clustered) of the soccer goals histogram:

    Premier League goals

    What would be the best measures of center (mode, median, mean)?

     

    Discuss the distribution of the runs scored in MLB's 2017 season.

     

    MLB 2017 Runs

    Would the mean or the median be the better measure of center?  What is the mode?

    NFL actual vs perceived frequency

    What is the actual mode?  What is the perceived mode?

     

    NFL scores

     

    Why does the above histogram have a different mode than the perceived histogram?

     

    NBA scores

     

    Describe the distribution of NBA scores.

    Do the mean, median, and mode give the same measure of center?

     

    MLB runs

     

    Where is most of the data?  In which direction are baseball runs skewed?

    Will the mean or median be greater?  Which of the three measures of center is the least?

     

     

    Below is a histogram of National Hockey League goals.

    NHL Goals  

    Describe its distribution and measures of center.

     


      

     

Last Modified on Monday at 12:47 PM