## Thursday, February 24, 2011

### OCW-18.443: problem-set#1

My solutions to the first problem set of the MIT Statistics for Applications course.

Some of the solutions use R (version 2.12.1).

They also use the following function, which prints the confidence interval for the mean given the sample mean, the estimated standard deviation, the confidence level as a percentage, and the sample size.
n_conf_interval_mean = function(mean, sd, percentage, size) {
  error = qnorm((percentage/100 + 1)/2)*sd/sqrt(size)
  left = mean - error
  right = mean + error
  sprintf("Required interval is (%f,%f).",left,right)
}
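For reference, the interval this function computes can be written as a formula. The symbols are notation introduced here for illustration: $z_q$ is the $q$ quantile of the standard normal distribution (the value returned by `qnorm(q)`), $p$ is the percentage, $s$ the estimated standard deviation, $n$ the sample size and $\bar{X}$ the sample mean.

$$\bar{X} \pm z_{(1 + p/100)/2}\,\frac{s}{\sqrt{n}}$$

For $p = 95$ the quantile is $z_{0.975} \approx 1.96$.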

1(a) A sum of independent normal random variables is again a normal random variable, so the sample mean (Xbar) is also a normal random variable with
E[Xbar] = 0 and
Var(Xbar) = 1/n = 1/49

So (Xbar - E[Xbar])/$\sqrt{Var(Xbar)}$ is exactly N(0,1). That is, 7Xbar is N(0,1).

Now, P(|Xbar| < c) = 0.8
=> P(-c < Xbar < c) = 0.8
=> P(-7c < 7Xbar < 7c) = 0.8
=> P(-7c < Z < 7c) = 0.8 , where Z is standard normal random variable
=> 2F(7c) - 1 = 0.8 , where F is CDF of standard normal random variable
=> F(7c) = 0.9

From the table of CDF of standard normal distribution we can see that F(1.28) = .8997
So approximately 7c = 1.28
=> c = 0.183
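
The table lookup can be cross-checked in R with `qnorm`, the standard normal quantile function. A minimal sketch, assuming n = 49 as above (the variable names are chosen here for illustration):

```r
# Solve F(7c) = 0.9 for c, where F is the standard normal CDF
n <- 49
z <- qnorm(0.9)          # 0.9 quantile of N(0,1), about 1.2816
c_value <- z / sqrt(n)   # sqrt(49) = 7
round(c_value, 3)        # 0.183
```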

1(b) T follows a t distribution with 4 degrees of freedom.

(i) P(|T| < t0) = 0.9
=> P(-t0 < T < t0) = 0.9
=> 2P(T < t0) - 1 = 0.9
=> P(T < t0) = 0.95
From a table of quantiles of the t distribution, t0 = 2.132

(ii) P(T > t0) = 0.05
=> 1 - P(T < t0) = 0.05
=> P(T < t0) = 0.95
so again, t0 = 2.132
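
Both parts reduce to the 0.95 quantile of the t distribution with 4 degrees of freedom, which can also be obtained with `qt` in R instead of a printed table. A small sketch (the variable names are illustrative):

```r
# 1(b)(i): P(|T| < t0) = 0.9 is equivalent to P(T < t0) = 0.95
t0_two_sided <- qt((1 + 0.9) / 2, df = 4)

# 1(b)(ii): P(T > t0) = 0.05 is equivalent to P(T < t0) = 0.95
t0_upper <- qt(1 - 0.05, df = 4)

round(c(t0_two_sided, t0_upper), 3)   # both 2.132
```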

2. We will solve this with R. Here is the R terminal interaction:
>
> mean(data) #2(a)
[1] 6.6766
>
> sd(data) #2(b)
[1] 0.006202688
>
> (6.6766 - 6.1)/0.006202688 #2(c)
[1] 92.9597
>
> n_conf_interval_mean(6.6766,0.006202688,95,7) #2(d)
[1] "Required interval is (6.672005,6.681195)."
>
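
The transcript above relies on a `data` vector defined earlier in the session. A self-contained sketch of the same steps, assuming `data` holds the seven measurements printed in the 3(a) transcript and using the same normal-quantile interval as `n_conf_interval_mean`:

```r
# Measurements as printed in the 3(a) transcript
data <- c(6.6729, 6.6735, 6.6873, 6.6699, 6.6742, 6.6830, 6.6754)

xbar <- mean(data)             # 2(a): sample mean
s <- sd(data)                  # 2(b): sample standard deviation
z_score <- (xbar - 6.1) / s    # 2(c): same ratio as in the transcript

# 2(d): 95% interval based on the normal quantile
error <- qnorm(0.975) * s / sqrt(length(data))
c(xbar - error, xbar + error)
```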

3(a) $\sigma^2$ can be estimated by
$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$

Also, $\frac{(n-1)S^2}{\sigma^2}$ has a chi-square distribution with n-1 degrees of freedom.

Let $F_m(\alpha)$ denote the point beyond which the chi-square distribution with m degrees of freedom has probability $\alpha$. That is, $F_m(\alpha)$ is the (1-$\alpha$) quantile of the chi-square distribution with m degrees of freedom.
Then it can be derived (a similar derivation is done in Example 8.5.6 of the book "Mathematical Statistics and Data Analysis") that

the (1-$\alpha$)100% confidence interval for $\sigma^2$ is
($\frac{(n-1)S^2}{F_{n-1}(\alpha/2)}$ , $\frac{(n-1)S^2}{F_{n-1}(1 - \alpha/2)}$)
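
The derivation can be sketched in two steps, using only the chi-square fact and the definition of $F_m(\alpha)$ given here. Since $F_{n-1}(1-\alpha/2)$ and $F_{n-1}(\alpha/2)$ cut off probability $\alpha/2$ in the lower and upper tails respectively,

$$P\left(F_{n-1}(1-\alpha/2) \le \frac{(n-1)S^2}{\sigma^2} \le F_{n-1}(\alpha/2)\right) = 1 - \alpha$$

and solving both inequalities for $\sigma^2$ gives

$$P\left(\frac{(n-1)S^2}{F_{n-1}(\alpha/2)} \le \sigma^2 \le \frac{(n-1)S^2}{F_{n-1}(1-\alpha/2)}\right) = 1 - \alpha$$

In R, $F_m(\alpha)$ corresponds to `qchisq(1 - alpha, m)`, so for a 95% interval with n = 7 the left endpoint uses `qchisq(0.975, 6)` and the right endpoint uses `qchisq(0.025, 6)`.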

Required calculations are done with R. Here is the R terminal interaction:

> data
[1] 6.6729 6.6735 6.6873 6.6699 6.6742 6.6830 6.6754
>
> S_square = var(data)
> S_square
[1] 3.847333e-05
>
> n = 7
>
> alpha = 1 - 95/100
>
> left = (n-1)*S_square/qchisq(0.975,6)
> right = (n-1)*S_square/qchisq(0.025,6)
>
> sprintf("95 percent confidence interval for sigma-square is (%f,%f)",left,ri$
[1] "95 percent confidence interval for sigma-square is (0.000016,0.000187)"
>
> sprintf("95 percent confidence interval for sigma is (%f,%f)",sqrt(left),sqrt(right))
[1] "95 percent confidence interval for sigma is (0.003997,0.013659)"
>

3(b) The values 0.0005, 0.0007 and 0.0006 are all below 0.003997, the lower limit of the 95% confidence interval for $\sigma$ computed above, so they are not within that interval.
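
The steps in 3(a) can be collected into a small helper. This is only a sketch: the function name `var_conf_interval` and its arguments are hypothetical and not part of the original solutions, and the interval assumes the data are a sample from a normal distribution.

```r
# Chi-square confidence interval for the variance of a normal sample.
# var_conf_interval is a hypothetical helper name introduced for illustration.
var_conf_interval <- function(x, percentage = 95) {
  n <- length(x)
  alpha <- 1 - percentage / 100
  s_square <- var(x)
  left <- (n - 1) * s_square / qchisq(1 - alpha / 2, n - 1)
  right <- (n - 1) * s_square / qchisq(alpha / 2, n - 1)
  c(left = left, right = right)
}

data <- c(6.6729, 6.6735, 6.6873, 6.6699, 6.6742, 6.6830, 6.6754)
ci_var <- var_conf_interval(data)
ci_var          # interval for sigma^2
sqrt(ci_var)    # interval for sigma
```

Returning the endpoints as a numeric vector rather than a formatted string makes it easy to take the square root or compare the limits with other values.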

4.
Again, we will use R to solve this. Here is the R-terminal interaction
> n_conf_interval_mean(98.41,0.77,95,26) #4(a)
[1] "Required interval is (98.114027,98.705973)."
>
> n_conf_interval_mean(98.10,0.72,95,122) #4(b)
[1] "Required interval is (97.972238,98.227762)."
>

(c) Yes, the intervals overlap (between 98.114 and 98.228). Yes, the 95% confidence interval for the women's temperature data contains 98.6.
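
Both answers in (c) can be checked numerically. A minimal sketch, recomputing the intervals from the summary statistics used in the transcript; `conf_interval` is a hypothetical helper that mirrors `n_conf_interval_mean` but returns the endpoints as numbers rather than a formatted string:

```r
# Normal-quantile interval with numeric endpoints
# (conf_interval is a hypothetical helper introduced for illustration)
conf_interval <- function(mean, sd, percentage, size) {
  error <- qnorm((percentage / 100 + 1) / 2) * sd / sqrt(size)
  c(left = mean - error, right = mean + error)
}

ci_a <- conf_interval(98.41, 0.77, 95, 26)    # 4(a)
ci_b <- conf_interval(98.10, 0.72, 95, 122)   # 4(b)

# Two intervals overlap when each one starts before the other ends
overlap <- ci_a[["left"]] <= ci_b[["right"]] && ci_b[["left"]] <= ci_a[["right"]]
overlap   # TRUE

# Which intervals contain 98.6?
contains <- c(a = ci_a[["left"]] <= 98.6 && 98.6 <= ci_a[["right"]],
              b = ci_b[["left"]] <= 98.6 && 98.6 <= ci_b[["right"]])
contains  # a: TRUE, b: FALSE
```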

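5. Here is the R terminal interaction. At the 5% level, the Shapiro-Wilk test rejects normality for the exponential and uniform samples (p-values 8.875e-05 and 2.999e-06) but not for the normal sample (p-value 0.8914).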
> x <- rexp(25)
> shapiro.test(x)

Shapiro-Wilk normality test

data: x
W = 0.7748, p-value = 8.875e-05

> y <- runif(200)
> shapiro.test(y)

Shapiro-Wilk normality test

data: y
W = 0.9521, p-value = 2.999e-06

> z <- rnorm(500)
> shapiro.test(z)

Shapiro-Wilk normality test

data: z
W = 0.9982, p-value = 0.8914

>
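
Because `rexp`, `runif` and `rnorm` draw random samples, rerunning the session gives different W statistics and p-values. A sketch of a repeatable version, assuming an arbitrary seed (the value 18443 is chosen here only for illustration and was not used in the original session):

```r
set.seed(18443)   # arbitrary seed, not from the original session

samples <- list(
  exponential = rexp(25),
  uniform     = runif(200),
  normal      = rnorm(500)
)

# Run the Shapiro-Wilk test on each sample and collect the p-values
p_values <- sapply(samples, function(s) shapiro.test(s)$p.value)
p_values
```

Fixing the seed makes the three p-values reproducible from run to run, although their exact values depend on the seed and on the R version's random number generator.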