lm.ajustado <- lm(Dheight~Mheight,data = heights)
plot(lm.ajustado)
library(robustbase)
ajuste.robusto <- lmrob(Dheight~Mheight,data = heights)
summary(ajuste.robusto)
Call:
lmrob(formula = Dheight ~ Mheight, data = heights)
\--> method = "MM"
Residuals:
Min 1Q Median 3Q Max
-7.38930 -1.52310 0.04171 1.47830 9.04590
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.15192 1.64993 17.67 <2e-16 ***
Mheight 0.55400 0.02641 20.98 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robust residual standard error: 2.222
Multiple R-squared: 0.2527, Adjusted R-squared: 0.2522
Convergence in 9 IRWLS iterations
Robustness weights:
118 weights are ~= 1. The remaining 1257 ones are summarized as
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.06007 0.86580 0.95200 0.90420 0.98580 0.99900
Algorithmic parameters:
tuning.chi bb tuning.psi refine.tol rel.tol scale.tol
1.548e+00 5.000e-01 4.685e+00 1.000e-07 1.000e-07 1.000e-10
solve.tol eps.outlier eps.x warn.limit.reject warn.limit.meanrw
1.000e-07 7.273e-05 1.288e-10 5.000e-01 5.000e-01
nResample max.it best.r.s k.fast.s k.max maxit.scale trace.lev
500 50 2 1 200 200 0
mts compute.rd fast.s.large.n
1000 0 2000
psi subsampling cov compute.outlier.stats
"bisquare" "nonsingular" ".vcov.avar1" "SM"
seed : int(0)
summary(lm.ajustado)
Call:
lm(formula = Dheight ~ Mheight, data = heights)
Residuals:
Min 1Q Median 3Q Max
-7.397 -1.529 0.036 1.492 9.053
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.91744 1.62247 18.44 <2e-16 ***
Mheight 0.54175 0.02596 20.87 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.266 on 1373 degrees of freedom
Multiple R-squared: 0.2408, Adjusted R-squared: 0.2402
F-statistic: 435.5 on 1 and 1373 DF, p-value: < 2.2e-16
plot(ajuste.robusto)
recomputing robust Mahalanobis distances
saving the robust distances 'MD' as part of ‘ajuste.robusto’
The coefficients are very similar.
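As a quick side-by-side check (a minimal sketch reusing the two objects fitted above; the name coef.tab is only for illustration):
coef.tab <- rbind(OLS = coef(lm.ajustado), MM = coef(ajuste.robusto))  # classical vs robust estimates
round(coef.tab, 4)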
lm.ajustado.2.1 <- lm(wgt~hip.gi,data = bdims)
lm.ajustado.2.2 <- lm(wgt~hgt,data = bdims)
plot(lm.ajustado.2.1)
plot(lm.ajustado.2.2)
2.1: the QQ plot departs markedly from normality, and there is one high-leverage observation (474; see the check below).
2.2: the QQ plot departs markedly from normality.
Are you satisfied with these fitted models?
Not with 2.1; with 2.2, yes.
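To confirm the high-leverage point in model 2.1, a sketch (the 2p/n cutoff is a common rule of thumb, not part of the original analysis):
h <- hatvalues(lm.ajustado.2.1)
p <- length(coef(lm.ajustado.2.1))                 # number of estimated coefficients
n <- nobs(lm.ajustado.2.1)
head(sort(h[h > 2 * p / n], decreasing = TRUE))    # leverages above the 2p/n threshold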
lm.ajustado.2.1.rob <- lmrob(wgt~hip.gi,data = bdims)
lm.ajustado.2.2.rob <- lmrob(wgt~hgt,data = bdims)
plot(lm.ajustado.2.1.rob)
recomputing robust Mahalanobis distances
saving the robust distances 'MD' as part of ‘lm.ajustado.2.1.rob’
plot(lm.ajustado.2.2.rob)
recomputing robust Mahalanobis distances
saving the robust distances 'MD' as part of ‘lm.ajustado.2.2.rob’
Do the fitted models change much?
No. What does this indicate?
That the robust fit does not solve the problem: the departure from normality is not driven by a few atypical observations.
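A numerical way to see this, as a sketch reusing the four objects above (the classical and robust estimates barely move):
rbind(ols.2.1 = coef(lm.ajustado.2.1), rob.2.1 = coef(lm.ajustado.2.1.rob))
rbind(ols.2.2 = coef(lm.ajustado.2.2), rob.2.2 = coef(lm.ajustado.2.2.rob))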
ajuste <- lm(BrainWt~BodyWt, data = mammals)
plot(ajuste)
Do they differ much from each other? Yes.
b) Use the outlier test based on studentized residuals. Indicate which observations are candidate outliers.
library(car)  # outlierTest() comes from the car package
outlierTest(ajuste)
rstudent unadjusted p-value Bonferonni p
5 12.229518 8.0699e-18 5.0033e-16
1 -11.805397 3.5737e-17 2.2157e-15
34 3.921156 2.3221e-04 1.4397e-02
The candidate outlier observations are 1, 5 and 34.
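The same candidates can be reproduced by hand; a sketch of the idea behind the test (externally studentized residuals with a Bonferroni correction):
rs <- rstudent(ajuste)                             # externally studentized residuals
p.unadj <- 2 * pt(-abs(rs), df = df.residual(ajuste) - 1)
p.bonf <- pmin(p.unadj * length(rs), 1)            # Bonferroni-adjusted p-values
head(sort(p.bonf), 3)                              # smallest adjusted p-values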
library(VGAM)
library(knitr)
infl <- hatvalues(ajuste)
infl <- infl[order(-infl)]
kable(head(data.frame(leverage = infl)))
|    |  leverage |
|---:|----------:|
|  1 | 0.8610554 |
|  5 | 0.1279368 |
| 21 | 0.0183400 |
| 29 | 0.0182341 |
| 12 | 0.0175660 |
| 32 | 0.0169303 |
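As a rough screen, a sketch (the 2p/n cutoff is a common rule of thumb, not from the original text):
p <- length(coef(ajuste))
n <- nobs(ajuste)
infl[infl > 2 * p / n]    # leverages above the usual 2p/n cutoff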
Compute the Cook's distances and see which observations are influential.
cook.d <- cooks.distance(ajuste)
cook.d <- cook.d[order(-cook.d)]
head(data.frame(distancia.cook = cook.d))
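A sketch of the usual screening (the 4/n and 1 cutoffs are common rules of thumb, not part of the original analysis):
n <- nobs(ajuste)
cook.d[cook.d > 4 / n]    # worth a closer look by the 4/n rule of thumb
cook.d[cook.d > 1]        # clearly influential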
ajuste <- lmrob(BrainWt~BodyWt, data = mammals)
plot(ajuste)
recomputing robust Mahalanobis distances
saving the robust distances 'MD' as part of ‘ajuste’
Observations 1 and 5 are flagged as outliers.
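This can be checked through the robustness weights (a sketch; weights(..., type = "robustness") is the robustbase accessor, and flagged outliers should receive weights near zero):
w <- weights(ajuste, type = "robustness")
head(sort(w), 5)          # most heavily downweighted observations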
lm_fitted <- lm(logBrainWt ~ logBodyWt, data = mammals)
plot(lm_fitted)
Does this model seem to fit the data better?
Yes, it fits much better.
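One hedged comparison, as a sketch: ajuste.cl refits the classical raw-scale model (the object ajuste was overwritten by the robust fit above), keeping in mind that \(R^2\) values on different response scales are only roughly comparable.
ajuste.cl <- lm(BrainWt ~ BodyWt, data = mammals)          # classical fit on the raw scale
c(raw = summary(ajuste.cl)$adj.r.squared,
  log.log = summary(lm_fitted)$adj.r.squared)              # adjusted R^2 of each fit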
Fit a robust regression to the head-circumference and gestational-age data. Compare it with the classical fit. Identify any outliers.
birth <- read.table('low_birth_weight_infants.txt', header = T)
lm_fitted <- lm(headcirc ~ gestage, data = birth)
lm_fitted.rob <- lmrob(headcirc ~ gestage, data = birth)
summary(lm_fitted)
Call:
lm(formula = headcirc ~ gestage, data = birth)
Residuals:
Min 1Q Median 3Q Max
-3.5358 -0.8760 -0.1458 0.9041 6.9041
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.91426 1.82915 2.14 0.0348 *
gestage 0.78005 0.06307 12.37 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.59 on 98 degrees of freedom
Multiple R-squared: 0.6095, Adjusted R-squared: 0.6055
F-statistic: 152.9 on 1 and 98 DF, p-value: < 2.2e-16
summary(lm_fitted.rob)
Call:
lmrob(formula = headcirc ~ gestage, data = birth)
\--> method = "MM"
Residuals:
Min 1Q Median 3Q Max
-3.44318 -0.70590 -0.07454 1.03138 7.08226
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.06207 1.30210 3.888 0.000184 ***
gestage 0.73728 0.04561 16.166 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robust residual standard error: 1.385
Multiple R-squared: 0.6573, Adjusted R-squared: 0.6538
Convergence in 8 IRWLS iterations
Robustness weights:
observation 31 is an outlier with |weight| = 0 ( < 0.001);
7 weights are ~= 1. The remaining 92 ones are summarized as
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1494 0.8864 0.9582 0.9065 0.9853 0.9985
Algorithmic parameters:
tuning.chi bb tuning.psi refine.tol rel.tol scale.tol
1.548e+00 5.000e-01 4.685e+00 1.000e-07 1.000e-07 1.000e-10
solve.tol eps.outlier eps.x warn.limit.reject warn.limit.meanrw
1.000e-07 1.000e-03 6.366e-11 5.000e-01 5.000e-01
nResample max.it best.r.s k.fast.s k.max maxit.scale trace.lev
500 50 2 1 200 200 0
mts compute.rd fast.s.large.n
1000 0 2000
psi subsampling cov compute.outlier.stats
"bisquare" "nonsingular" ".vcov.avar1" "SM"
seed : int(0)
plot(lm_fitted)
plot(lm_fitted.rob)
recomputing robust Mahalanobis distances
saving the robust distances 'MD' as part of ‘lm_fitted.rob’
Records 31 and 33 appear to be outliers (the robust summary drives the weight of observation 31 to zero). However, \(R^2\) only rises from about 0.61 in the classical fit to 0.66 in the robust fit.
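A quick visual comparison of the two fits (a sketch; abline draws the line given by each object's coefficients, which works for both lm and lmrob here):
plot(headcirc ~ gestage, data = birth)
abline(lm_fitted, lty = 2)         # classical fit
abline(lm_fitted.rob, lty = 1)     # robust fit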