
Hypothesis Test

⚔ A general guideline



Yuan Du


11-07-2019

Updated: 2020-10-01

1 / 30

Welcome!

2 / 30

Recap from Statistical Basics course

Dataset (Column, Row) ✔️

Data type ✔️

Scatter plot, bar chart, etc. ✔️

Interpret significance test ✔️


For more statistical classes please contact:

Office of Research Integrity (ah.ori@AdventHealth.com) or Meghan Brodie (Meghan.Brodie@AdventHealth.com).

3 / 30

Data Type

[Diagram] Qualitative (Categorical) Variables → Nominal → Binary; Quantitative (Numerical) Variables → Interval → Ratio → Ordinal
4 / 30

Chart suggestions

5 / 30

P-value

- P-value or significance is a probability, thus bounded by 0 and 1.

- A test p-value provides the probability indicating statistical significance.

- If the p-value is less than 0.05 (typically used value for statistical testing), then the study results are statistically significant.


Note: There is an ongoing debate about the misuse of p-values, and the New England Journal of Medicine published new Statistical Reporting Guidelines in July 2019.

6 / 30

Interpret significance test

--

There is a statistically significant difference/reduction/increase in the outcome between groups/pre&post (p-value = ...).

Example 1: The mean time score is 5.18 at pre compared to 14.45 at post. There is a statistically significant increase in time scores from pre to post (p-value < 0.001) by paired-sample t-test.

--

Example 2: Gender and ASA are independent based on the Chi-square test (p-value = 0.8826). The percentage of female and male patients does not differ between the ASA types.

7 / 30

Class Objectives

 Let's get started!

- Construct hypothesis

- Outcome is numerical (Parametric test & Non-parametric test)

- Outcome is categorical

8 / 30
9 / 30

Hypothesis


- Null hypothesis (H0): the assumption being tested; it is not rejected unless the data provide sufficient evidence against it at some level of significance.


- Alternative hypothesis (H1): the claim that the assumption does not hold; it is accepted when H0 is rejected at some level of significance.

Example: suppose someone claims that 20 (80%) of 25 patients who received drug A were cured, compared to 12 (48%) of 25 patients who received drug B.

--

- H0: the two treatments are equally effective, and the observed difference arose by chance.


- H1: one treatment is better than the other.

10 / 30
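The drug A vs. drug B claim above can be checked with a two-proportion test; a minimal sketch in R (the counts come straight from the example, but the exact procedure the original statistician used is not stated in the slides):

```r
# 20 of 25 patients cured on drug A vs. 12 of 25 cured on drug B
res <- prop.test(x = c(20, 12), n = c(25, 25))
res$p.value  # comes out near the .04 discussed in the side note
```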

Side note: However, it is essential to note that the p-value does not provide a direct answer. Assume the statistician runs a significance test and gets a p-value = .04, meaning the difference is statistically significant (P < .05). But as explained earlier, this does not mean there is a 4% probability that the null hypothesis is true and a 96% chance that the alternative hypothesis is true. The p-value is a frequentist probability: it tells us there is a 4% probability of obtaining such a difference between the cure rates if the null hypothesis is true.


In probability notation, this is written as:

P(θ | H0)

11 / 30

Assumptions for parametric tests:


- The sample is derived from a population with a normal distribution (a "bell-shaped curve"),

(or the sample size is large enough for the central limit theorem to make the averages approximately normal).

- Variance is homogeneous.

- Data are measured at the interval level.

Nonparametric tests are not assumption-free; they are appropriate when:


- The data are distinctly non-normal and cannot be transformed,

- The sample size is too small for the central limit theorem to make the averages approximately normal,

- The data are nominal or ordinal.

12 / 30
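These assumptions can be checked in R before choosing a test; a sketch with simulated toy data (`shapiro.test` for normality, `var.test` for homogeneity of variance — the data here are made up for illustration):

```r
set.seed(42)
x <- rnorm(50, mean = 5, sd = 1)  # toy sample for group 1
y <- rnorm(50, mean = 5, sd = 1)  # toy sample for group 2

sw <- shapiro.test(x)  # H0: the sample comes from a normal distribution
vt <- var.test(x, y)   # H0: the two groups have equal variances
sw
vt
```

If either test rejects its null hypothesis, the nonparametric alternatives on the next slide are the safer choice.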

(Repeated-measures designs, i.e., more than two matched groups, are not included)

Most widely used parametric tests are:


- paired t-test (dependent/matched two groups)

- (unpaired) t-test (independent two groups)

- ANOVA (more than two groups)

- Pearson correlation

(Repeated-measures designs, i.e., more than two matched groups, are not included)

Most widely used non parametric tests are:


- Wilcoxon signed-rank test (dependent/matched two groups)

- Wilcoxon-Mann-Whitney test (independent two groups)

- Kruskal-Wallis test (more than two groups)

- Spearman correlation

13 / 30

Hypothesis test summary table (Simple version)

14 / 30

Example 1 (Independent T test):


Patients cared for by the teaching physician group have lower costs. There is a significantly lower cost in the teaching physician group than in the non-teaching group ($10,060.36 vs. $18,631.37, p-value < 0.001) by t-test.

t.test(Cost_Observed ~ Physician_Group, data = Data)

This tells R to run the t-test.

15 / 30

The output shows "Welch Two Sample t-test", which means the variances of the two groups are not assumed equal. Some software, such as R, automatically reports the appropriate result; other packages report both, and you must choose one based on a test of equal variances.

We can roughly check the variances:

Welch Two Sample t-test
data: Cost_Observed by Physician_Group
t = -10.305, df = 980.71, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10203.21 -6938.82
sample estimates:
mean in group 1 mean in group 2
10060.36 18631.37
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 4
## Physician_Group GroupVariance Mean total.count
## <dbl> <dbl> <dbl> <int>
## 1 1 141627419. 10060. 357
## 2 2 560053334. 18631. 1898

The test output reports the p-value, the alternative hypothesis, and the mean in each group; the tibble below it shows the variance, mean, and count by group.

16 / 30
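If the variances had been roughly equal, the pooled two-sample t-test could be requested explicitly. A sketch with simulated data standing in for the real Data frame (which is not included in the slides):

```r
set.seed(7)
Data <- data.frame(
  Physician_Group = factor(rep(c(1, 2), each = 30)),
  Cost_Observed   = c(rnorm(30, 10000, 3000), rnorm(30, 18600, 3000))
)
# R's default is the Welch test (var.equal = FALSE);
# var.equal = TRUE gives the classic pooled-variance t-test
tt <- t.test(Cost_Observed ~ Physician_Group, data = Data, var.equal = TRUE)
tt
```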

Example 2 (Wilcoxon Mann Whitney test):

Patients cared for by the teaching physician group have a lower LOS. There is a significantly lower LOS in the teaching physician group than in the non-teaching group (3 days vs. 5 days, p-value < 0.001) by Wilcoxon-Mann-Whitney test.

##
## Wilcoxon rank sum test with continuity correction
##
## data: LOS_Observed by Physician_Group
## W = 245515, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
## Physician_Group Median total.count
## <dbl> <dbl> <int>
## 1 1 3 357
## 2 2 5 1898
19 / 30
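The rank-sum output above comes from `wilcox.test`; a self-contained sketch with toy LOS-like counts (the real Data frame is not available here):

```r
set.seed(3)
Data <- data.frame(
  Physician_Group = rep(c(1, 2), each = 40),
  LOS_Observed    = c(rpois(40, 3), rpois(40, 5))  # toy lengths of stay
)
wt <- wilcox.test(LOS_Observed ~ Physician_Group, data = Data)
wt
aggregate(LOS_Observed ~ Physician_Group, data = Data, median)  # medians by group
```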

Example 3 (ANOVA test):

Patients with different insurance types have different costs (p-value = 0.001) by ANOVA.

## Df Sum Sq Mean Sq F value Pr(>F)
## Insurance_Type 6 1.099e+10 1.832e+09 3.664 0.00127 **
## Residuals 2248 1.124e+12 5.000e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 3
## Insurance_Type Mean total.count
## <chr> <dbl> <int>
## 1 COMMERCIAL - INDEMNITY 21453. 72
## 2 MANAGED CARE 20340. 254
## 3 MEDICAID 18070. 269
## 4 MEDICARE 17117. 1343
## 5 OTHER GOVERNMENT PAYORS 16963. 139
## 6 SELF PAY 10873. 169
## 7 WORKERS COMPENSATION 22032. 9
20 / 30
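The ANOVA table above is produced by `aov`; a sketch with three toy insurance groups (means loosely modeled on the slide's summary, data simulated):

```r
set.seed(5)
Data <- data.frame(
  Insurance_Type = rep(c("MEDICARE", "MEDICAID", "SELF PAY"), each = 25),
  Cost_Observed  = c(rnorm(25, 17000, 5000),
                     rnorm(25, 18000, 5000),
                     rnorm(25, 11000, 5000))
)
fit <- aov(Cost_Observed ~ Insurance_Type, data = Data)
summary(fit)  # F test for any difference in mean cost across groups
```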

Example 4 (Kruskal Wallis test):

Patients with different insurance types have different LOS (p-value = 0.005) by Kruskal-Wallis test.

##
## Kruskal-Wallis rank sum test
##
## data: LOS_Observed by Insurance_Type
## Kruskal-Wallis chi-squared = 18.39, df = 6, p-value = 0.005329
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 3
## Insurance_Type Median total.count
## <chr> <dbl> <int>
## 1 COMMERCIAL - INDEMNITY 5 72
## 2 MANAGED CARE 5 254
## 3 MEDICAID 5 269
## 4 MEDICARE 5 1343
## 5 OTHER GOVERNMENT PAYORS 5 139
## 6 SELF PAY 4 169
## 7 WORKERS COMPENSATION 4 9
21 / 30
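The output above comes from `kruskal.test`; a self-contained sketch with toy counts (data simulated, not the slide's Data frame):

```r
set.seed(6)
Data <- data.frame(
  Insurance_Type = rep(c("MEDICARE", "MEDICAID", "SELF PAY"), each = 25),
  LOS_Observed   = c(rpois(25, 5), rpois(25, 5), rpois(25, 4))
)
kt <- kruskal.test(LOS_Observed ~ Insurance_Type, data = Data)
kt  # chi-squared statistic with (number of groups - 1) degrees of freedom
```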

Example 5 (Paired sample t test):

Assume the two physician groups implemented an intervention to reduce medical cost by cutting the number of consults, giving us a post-intervention cost to compare with the previous cost. Here the paired t-test p-value (0.9994) shows no significant change in cost.

##
## Paired t-test
##
## data: Cost by group
## t = 0.00073398, df = 2254, p-value = 0.9994
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1235.300 1236.225
## sample estimates:
## mean of the differences
## 0.4625277
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 5
## group count mean sd max
## <fct> <int> <dbl> <dbl> <dbl>
## 1 After 2255 16973. 22447. 243710.
## 2 Before 2255 16973. 22432. 243299.
22 / 30
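The paired output above comes from `t.test` with `paired = TRUE`; a sketch with simulated before/after costs (toy data, mimicking the slide's "no real change" result):

```r
set.seed(9)
before <- rnorm(100, mean = 17000, sd = 5000)       # toy pre-intervention cost
after  <- before + rnorm(100, mean = 0, sd = 2000)  # toy post cost, no real shift
tt <- t.test(after, before, paired = TRUE)
tt  # tests whether the mean of the paired differences is zero
```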

Example 6 (Wilcoxon signed ranks test):

Assume the two physician groups implemented an intervention to reduce LOS by cutting the number of consults, giving us a post-intervention LOS to compare with the previous LOS.

##
## Wilcoxon signed rank test with continuity correction
##
## data: LOS by group
## V = 0, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 5
## group count mean sd median
## <fct> <int> <dbl> <dbl> <dbl>
## 1 After 2255 3.57 4.96 2
## 2 Before 2255 7.16 8.01 5
23 / 30
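The signed-rank output above comes from `wilcox.test` with `paired = TRUE`; a sketch with simulated before/after lengths of stay (toy data, with a genuine reduction built in):

```r
set.seed(11)
before <- rpois(100, 7)                    # toy pre-intervention LOS
after  <- pmax(before - rpois(100, 3), 0)  # toy post LOS, reduced on average
wt <- wilcox.test(after, before, paired = TRUE)
wt  # tests for a shift in the paired differences
```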

We want to see if there is a correlation between Age and Cost: a linear relationship between two continuous variables.

Pearson correlation test:


cor.test(Data$Age, Data$Cost_Observed)
##
## Pearson's product-moment correlation
##
## data: Data$Age and Data$Cost_Observed
## t = -0.67119, df = 2253, p-value = 0.5022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05538456 0.02715466
## sample estimates:
## cor
## -0.01413904

We want to see if there is a correlation between Number of consults and LOS: a monotonic relationship between two continuous or ordinal variables.

Spearman correlation test:


cor.test(Data$Number_of_consults, Data$LOS_Observed, method = "spearman")
24 / 30

Class Activity (Outcome is categorical or numerical):

In groups of 2-4: Please match the statement (green font) with the appropriate statistical test (black font)


Variables include Age, Gender, Cost, LOS, Readmission, Mortality, Physician Group, Insurance type


Note: Assume normal distribution of Age, Cost; non-normal distribution of LOS, number of consults.

25 / 30

Outcome is categorical

Chi-square test

Example: Test the hypothesis that Insurance is independent of Gender at the .05 significance level.

## Warning in chisq.test(table(Data$Insurance_Type, Data$Gender)): Chi-squared
## approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: table(Data$Insurance_Type, Data$Gender)
## X-squared = 107.83, df = 6, p-value < 2.2e-16
##
## Female Male
## COMMERCIAL - INDEMNITY 36 36
## MANAGED CARE 113 141
## MEDICAID 118 151
## MEDICARE 682 661
## OTHER GOVERNMENT PAYORS 10 129
## SELF PAY 57 112
## WORKERS COMPENSATION 4 5
26 / 30

Enhanced solution (easy way): combine insurance categories

## Female Male
## [1,] 113 141
## [2,] 118 151
## [3,] 682 661
## [4,] 57 112
## [5,] 50 170
##
## Pearson's Chi-squared test
##
## data: rtbl
## X-squared = 70.963, df = 4, p-value = 1.421e-14
27 / 30
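The combined 5x2 table above can be reproduced from the original counts; a sketch that merges the three sparse payor rows (COMMERCIAL - INDEMNITY, OTHER GOVERNMENT PAYORS, WORKERS COMPENSATION) into one before re-running the Chi-square test:

```r
tbl <- matrix(c(36, 36,    # COMMERCIAL - INDEMNITY
                113, 141,  # MANAGED CARE
                118, 151,  # MEDICAID
                682, 661,  # MEDICARE
                10, 129,   # OTHER GOVERNMENT PAYORS
                57, 112,   # SELF PAY
                4, 5),     # WORKERS COMPENSATION
              ncol = 2, byrow = TRUE,
              dimnames = list(c("COMMERCIAL - INDEMNITY", "MANAGED CARE",
                                "MEDICAID", "MEDICARE",
                                "OTHER GOVERNMENT PAYORS", "SELF PAY",
                                "WORKERS COMPENSATION"),
                              c("Female", "Male")))
small <- c("COMMERCIAL - INDEMNITY", "OTHER GOVERNMENT PAYORS",
           "WORKERS COMPENSATION")
# Keep the well-populated rows, collapse the sparse ones into a single row
rtbl <- rbind(tbl[!rownames(tbl) %in% small, ], colSums(tbl[small, ]))
chisq.test(rtbl)  # matches the X-squared = 70.963, df = 4 result on this slide
```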

Fisher exact test (used when the sample size is small: more than 25% of cells have counts less than 5)

Example: Test the hypothesis that Insurance is independent of Race (only American Indian and Asian).

##
## AMERICAN INDIAN ASIAN BLACK OTHER WHITE
## COMMERCIAL - INDEMNITY 0 1 15 6 50
## MANAGED CARE 1 1 21 10 221
## MEDICAID 1 0 96 6 166
## MEDICARE 11 6 184 60 1082
## OTHER GOVERNMENT PAYORS 0 0 21 5 113
## SELF PAY 0 0 21 8 140
## WORKERS COMPENSATION 0 0 0 0 9
##
## AMERICAN INDIAN ASIAN
## COMMERCIAL - INDEMNITY 0 1
## MANAGED CARE 1 1
## MEDICAID 1 0
## MEDICARE 11 6
##
## Fisher's Exact Test for Count Data
##
## data: table(Prace$Insurance_Type, Prace$Race)
## p-value = 0.8089
## alternative hypothesis: two.sided
28 / 30

Summary:

- Construct hypothesis ✔️

- Outcome is numerical (Parametric test & Non-parametric test) ✔️

- Outcome is categorical ✔️

29 / 30

For additional hypothesis tests, please refer to https://stats.idre.ucla.edu/other/mult-pkg/whatstat/


Thanks!


If you have statistical questions, please email:
yuan.du@adventhealth.com


Slides created via the R package xaringan.

30 / 30
