Statistical Literacy
Descriptive Statistics:
Pictures are worth a thousand numbers as well as a thousand words.
Why a histogram is better than a mean, a median, or even a five-number summary of a set of data.
Why a scatter plot is better than an R-squared or a regression equation in summarizing the relationship between two sets of data.
Judgment is key to adjusting the axes of histograms and scatter plots to maximize the quality of information.
The average American has one testicle and one ovary.
Gathered data is not always good data.
Correlation is not causation.
The most important factors may be ignored by the analyst.
Has the data been massaged? Are the outliers there?
The most important facts may not be quantifiable.
Problem sets should be prioritized by civic or personal relevance. Failure to do so is a recipe for amnesia, boredom, and poor performance.
Inferential Statistics:
All about randomness, probability, and sample size

Randomness is key to getting a good sample.
The bigger the sample size, the more closely and confidently you can generalize.
Roughly: a random sample of 100: 95% confident, plus or minus 10%.
A sample of 1,200: 95% confident, plus or minus 3%.
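The rough figures above can be checked with the usual normal-approximation formula for a proportion. This is a minimal sketch, assuming a simple random sample and the worst case of a 50/50 split; the function name margin_of_error is a label chosen for illustration, not something from the notes.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Assumes the normal approximation; p=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1200):
    # Printed as a percentage, rounded to one decimal place.
    print(n, round(100 * margin_of_error(n), 1))
```

The results come out slightly under the rounded 10% and 3% in the notes, which is why they are labeled "roughly".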
Beware the file drawer problem!
Beware Type 1 and Type 2 Errors 
Probability is the key to statistical experiments.
Has the experiment been reproduced? How many times? 
A perfect analogy is the jury system. As the jury should assume innocence, so the statistician assumes no effect (the null hypothesis).
The statistician then calculates the odds of getting the actual result from chance alone. If that result would be extremely rare, the null hypothesis is rejected.
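The jury logic can be sketched with a coin. Assume the null hypothesis is "the coin is fair", and ask how often a fair coin would give a result at least as lopsided as the one observed. The function name and the 60-heads-in-100-flips scenario below are hypothetical, chosen only to illustrate the calculation.

```python
from math import comb

def two_sided_binomial_p(heads, flips):
    """Chance, under a fair coin, of a result at least as far from flips/2 as the one observed."""
    distance = abs(heads - flips / 2)
    # Count every outcome that is at least as extreme as the observed one, in either direction.
    extreme = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips

# Hypothetical experiment: 60 heads in 100 flips.
p = two_sided_binomial_p(60, 100)
print(round(p, 3))
```

With a 5% threshold chosen in advance, this hypothetical result (about 0.057) would not quite reject the null hypothesis, which shows how much weight the arbitrary cutoff carries.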
Data omission and factor omission are likely when the issue has a partisan dimension.

P-value thresholds are arbitrary. They should be stated a priori. They should be thought about.
Chi square calculations can be completely misleading.

Simpson’s Paradox is a warning to make sure all the data has been disclosed.
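A small numeric sketch shows why undisclosed subgroups matter. The counts below are illustrative, picked so that option A does better within each subgroup while option B looks better once the subgroups are pooled; the labels "A", "B", "small cases", and "large cases" are placeholders.

```python
# (successes, trials) for each option within each subgroup; illustrative counts.
groups = {
    "small cases": {"A": (81, 87), "B": (234, 270)},
    "large cases": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials

def pooled_rate(option):
    """Success rate for one option after lumping all subgroups together."""
    successes = sum(g[option][0] for g in groups.values())
    trials = sum(g[option][1] for g in groups.values())
    return successes / trials

for name, g in groups.items():
    print(name, round(rate(*g["A"]), 3), round(rate(*g["B"]), 3))
print("pooled", round(pooled_rate("A"), 3), round(pooled_rate("B"), 3))
```

The reversal happens because A was tried mostly on the harder cases. A report that shows only the pooled rates hides exactly the data needed to see this.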

Finding the right metric is key.
Best hitter: is batting average the right number?
Is a z-score better than an absolute number?
Finance: absolute or relative performance? Risk-adjusted or not? But how? Sharpe?
Justice: do women make $0.77 on the dollar? What does this mean? Are you sure?
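The z-score question in the list above can be made concrete. A z-score measures how far a value sits from its group's mean in units of that group's standard deviation. The two leagues and their batting averages below are hypothetical, invented only to show that the hitter with the lower absolute average can still stand out more from his peers.

```python
from statistics import mean, pstdev

def z_score(value, population):
    """Distance of value from the population mean, in population standard deviations."""
    return (value - mean(population)) / pstdev(population)

# Hypothetical batting averages for two leagues (or eras).
league_a = [0.250, 0.260, 0.270, 0.280, 0.290, 0.350]  # tightly bunched league
league_b = [0.150, 0.200, 0.250, 0.300, 0.350, 0.360]  # widely spread league

z_a = z_score(0.350, league_a)  # best hitter in league A
z_b = z_score(0.360, league_b)  # best hitter in league B

print(round(z_a, 2), round(z_b, 2))
```

Here the .350 hitter has the higher z-score even though .360 is the higher absolute number. Which of the two is the "right" metric remains a judgment call.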
Statistical Literacy 2
Level One 
The uncertain can often be predicted with amazing certainty.
The laws of chance often lead to extremely counterintuitive results.
Data can be misleading and decisions based on them false.
Quantification can lead to the double illusion of importance and objectivity.
The most important factors may not be quantifiable.
Most complex problems require nonquantitative judgment.
Statistical wizardry is no substitute for substantive knowledge.
Experiments should be reproduced multiple times.
The bigger the sample, the lower the standard error (the standard deviation of the sample mean).
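The sample-size point in Level One concerns the spread of the sample mean, usually called the standard error; the spread of the underlying data does not shrink as the sample grows. The simulation below is a sketch under assumed values: a made-up population with mean 100 and standard deviation 15, and a helper named spread_of_sample_means introduced for illustration.

```python
import random
from statistics import fmean, stdev

def spread_of_sample_means(sample_size, trials=500, seed=0):
    """Draw many samples of the given size and return the standard deviation of their means."""
    rng = random.Random(seed)
    means = [
        fmean(rng.gauss(100, 15) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return stdev(means)

for n in (25, 400):
    # Theory predicts roughly 15 / sqrt(n) for the spread of the sample mean.
    print(n, round(spread_of_sample_means(n), 2))
```

Making the sample 16 times larger cuts the spread of the mean to about a quarter, since the standard error falls with the square root of the sample size.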
Level Two
1,111 is a good sample size, and it is not a function of the size of the population (the soup-tasting analogy: a spoonful tells you about the pot, however big the pot).
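The soup analogy can be checked numerically. The sketch below applies the standard finite population correction to the margin of error for a proportion, assuming a simple random sample and a worst-case 50/50 split; the function name and the two population sizes are chosen for illustration.

```python
import math

def margin_with_population(n, population, p=0.5, z=1.96):
    """95% margin of error for a proportion, with the finite population correction."""
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

for population in (100_000, 300_000_000):
    # Same sample size, very different population sizes.
    print(population, round(100 * margin_with_population(1111, population), 2))
```

Both cases give a margin of about 3%: once the population is much larger than the sample, its size barely matters.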
P-value thresholds are arbitrary but should be decided on before experiments are conducted.
For what is a p-value threshold of 5% a good decision rule? Guilt or innocence?
Type 1 and Type 2 errors are inevitable.
Studies should be based on random samples.

Experiments should be double-blind and controlled.
Regression to the mean, the placebo effect, and the Hawthorne effect can be big.
Adjusting data is often necessary but can be extremely misleading.
CPI adjustment is critical but fails to account for quality improvement.
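Regression to the mean, mentioned above, is easy to simulate. The sketch assumes each score is a stable skill plus independent luck, both drawn from made-up normal distributions. It picks the top 10% on a first test and looks at how the same people do on a second test, with no treatment of any kind in between.

```python
import random
from statistics import fmean

rng = random.Random(1)

# Assumed model: score = stable skill + fresh luck on each test.
skill = [rng.gauss(0, 1) for _ in range(10_000)]
first = [s + rng.gauss(0, 1) for s in skill]
second = [s + rng.gauss(0, 1) for s in skill]

# Indices of the top 10% of scorers on the first test.
top = sorted(range(len(skill)), key=lambda i: first[i], reverse=True)[:1000]

top_first = fmean(first[i] for i in top)
top_second = fmean(second[i] for i in top)

print(round(top_first, 2), round(top_second, 2))
```

The top group still scores above average the second time but well below its first result, purely because its luck does not repeat. Without a control group, that drop (or the matching rise for a bottom group) is easily mistaken for a treatment effect.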
Extrapolation is almost irresistible: budgets, stocks, climate.
Partisan bias can distort data collection and experimental design.
Only 40% of social science experiments are ever repeated. Is this science?