Run a Hosmer-Lemeshow test

BS835 Class 6 HW exercises: Run a Hosmer-Lemeshow test

Questions 1 and 2 are from the in-class exercises.


(Hypothetical data based broadly on Maslow, Reproductive Outcomes Following Maternal Exposure to the Events of September 11, 2001, at the World Trade Center, in New York City, AJPH.)  To determine the effect of exposure to the events of 9/11 (including both environmental exposures and stress-related exposures), exposure data were collected on n = 3,360 women who gave birth to a singleton child in NYC between October 2001 and December 2010.  We will look at any exposure (categorized as exposed vs. not exposed; the study also looked at different types of exposure), and focus on low birthweight as the adverse outcome potentially related to exposure.  We expect the effect of exposure to be stronger for babies born in the two years following 9/11, so we are interested in potential effect modification by time period.


Hypothetical data are saved in the file ‘WTC Births.xlsx’.  Variables in the data set are:

1) idnum, a study ID number

2) momage, mother’s age at 9/11, categorized and coded as 1 for those under 30 years, 2 for those aged 30 to 35 years, and 3 for those older than 35

3) college, maternal education coded 1 for those with a 4 year college degree, 0 for those with less than a college degree

4) earlyperiod, coded 1 for births that occurred within 2 years of 9/11, and 0 for those that occurred more than 2 years after 9/11

5) exposure, maternal exposure to the events of 9/11

6) LBW, low birthweight, coded 1 for infants weighing less than 2,000 grams, 0 for those weighing 2,000 grams or more.



Question 1.  Our broad research question is whether a woman’s exposure to the events of 9/11 had an adverse effect on the outcome of her pregnancy.


As a preliminary check on the data, note that 5.2% of the mothers in the study had a low birthweight infant.


As a first analysis, fit a multiple logistic regression model with low birthweight as the outcome variable and exposure, time period (the earlyperiod variable), maternal age, and maternal education as predictors.  I’ve summarized results in the following table (you can check to see that your results match the results in the table):


Variable                         aOR           95% CI
Exposed to events of 9/11                      0.99, 1.84
Birth within 2 yrs of 9/11                     0.82, 1.61
Maternal age
   <30 yrs                       1.00 (ref.)
   30 – 35 yrs                                 0.62, 1.29
   >35 yrs                                     0.53, 1.27
College degree                                 0.31, 0.58


Run a Hosmer-Lemeshow test to check the fit of the model, and interpret the results of this Hosmer-Lemeshow test.
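In SAS the Hosmer-Lemeshow test is requested with the LACKFIT option on the MODEL statement; in R one common choice is hoslem.test() from the ResourceSelection package. To show what the test computes, here is a minimal sketch in Python (standard library only), assuming you already have each mother's observed outcome (0/1) and the model's predicted probability; the function names are hypothetical. Subjects are sorted by predicted risk and split into g roughly equal-sized groups, observed and expected numbers of low birthweight births are compared within each group, and the statistic is referred to a chi-square distribution with g - 2 df. Packages may form the groups a little differently (for example, in how tied predicted probabilities are handled), so this sketch may not reproduce SAS or R output exactly.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic from 0/1 outcomes y and predicted probabilities p."""
    pairs = sorted(zip(p, y))  # order subjects by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(outcome for _, outcome in chunk)  # observed LBW births
        expected = sum(prob for prob, _ in chunk)        # expected LBW births
        p_bar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)
```

With the usual 10 groups there are 8 df. A large p-value means the test found no evidence of lack of fit; it does not prove the model fits well, and with several thousand mothers even small departures can reach significance.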




Discuss these findings for the exposure variable – was there an effect of exposure on low birthweight?  Explain.
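When only an adjusted odds ratio's 95% confidence interval is at hand, a quick check on significance is to work on the log scale: the log OR is the midpoint of the log limits, and its standard error is the width of the log interval divided by 2 × 1.96. The Python sketch below (hypothetical function name) applies this to the exposure row of the table above. Because the table's limits are rounded to two decimals the result is only approximate; the Wald p-value printed by SAS or R is the one to report.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal


def wald_from_ci(lower, upper):
    """Back-calculate the OR, SE of the log OR, z and two-sided p from a 95% CI."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    log_or = (log_lo + log_hi) / 2.0          # midpoint on the log scale
    se = (log_hi - log_lo) / (2.0 * Z_975)    # half-width divided by 1.96
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p-value
    return math.exp(log_or), se, z, p


# Exposure row of the Question 1 table: 95% CI 0.99, 1.84
or_exp, se_exp, z_exp, p_exp = wald_from_ci(0.99, 1.84)
print(f"aOR ~ {or_exp:.2f}, z ~ {z_exp:.2f}, p ~ {p_exp:.3f}")
```

Since the lower limit sits just below 1, the approximate p-value lands just above 0.05; the size of the estimate matters for the discussion as well as its significance.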






Question 2.  The study sample included births from the 10 years following 9/11.  The investigators believe that the effect of exposure would be strong for women pregnant at the time of 9/11 or shortly after, but that the effect of exposure may weaken over time.  To investigate, run a model including the covariates from Question 1, but adding an interaction term between exposure and the ‘earlyperiod’ (birth within 2 years of 9/11) variable.
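Writing the interaction model out shows where the two period-specific odds ratios come from. The coefficient labels below are hypothetical (they are not SAS or R output names), and the sketch assumes exposure is coded 1 for exposed and 0 for not exposed, which is not stated above; momage_2 and momage_3 are indicators for ages 30 – 35 and >35, with <30 as the reference.

```latex
\begin{aligned}
\operatorname{logit} \Pr(\text{LBW} = 1)
  &= \beta_0 + \beta_1\,\text{exposure} + \beta_2\,\text{earlyperiod}
   + \beta_3\,(\text{exposure} \times \text{earlyperiod}) \\
  &\quad + \beta_4\,\text{momage}_2 + \beta_5\,\text{momage}_3 + \beta_6\,\text{college} \\[4pt]
\text{OR}_{\text{late}}  &= e^{\beta_1} \qquad (\text{earlyperiod} = 0) \\
\text{OR}_{\text{early}} &= e^{\beta_1 + \beta_3} \qquad (\text{earlyperiod} = 1) \\
\frac{\text{OR}_{\text{early}}}{\text{OR}_{\text{late}}} &= e^{\beta_3} \\
\operatorname{Var}(\hat\beta_1 + \hat\beta_3)
  &= \operatorname{Var}(\hat\beta_1) + \operatorname{Var}(\hat\beta_3)
   + 2\,\operatorname{Cov}(\hat\beta_1, \hat\beta_3)
\end{aligned}
```

Under this model, a test of β3 = 0 is a test of whether the exposure odds ratio differs between the two periods.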


Results from the interaction model are given in the table below (note that this table asks for slopes, not odds ratios).  Check these results using the SAS or R code given below:


Results of a logistic regression interaction model predicting low birthweight

Variable                         Slope    … for slope
Exposed to events of 9/11
Birth within 2 yrs of 9/11
Exposure x Birth within 2 yrs
Maternal age
   <30 yrs                       (ref.)
   30 – 35 yrs
   >35 yrs
College degree

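The table below shows the effect of exposure during the early period and during the late period of the study (that is, effect modification of the exposure effect by time), based on the interaction model above.  See the SAS and R code below.  While these results are given by R and SAS, you can calculate the two odds ratios by hand (with a calculator) from the slopes above: the late-period OR is exp(slope for exposure), and the early-period OR is exp(slope for exposure + slope for the interaction).  Check these calculations.  The confidence intervals are best obtained from the computer: the late-period interval needs only the standard error of the exposure slope, but the early-period interval also needs the covariance between the exposure and interaction slopes, which is not part of the usual table of results.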
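A minimal Python sketch of the "by hand" calculation, mirroring what the ESTIMATE statement in SAS and glht() in R do: form a weighted sum of slopes, get its variance from the covariance matrix of the estimates, and exponentiate the estimate and its confidence limits. The function name is hypothetical, and the slope, variance and covariance values are placeholders, not results from this data set; in practice they would come from the fitted model (for example, vcov() in R or the COVB option on the MODEL statement in SAS).

```python
import math

Z_975 = 1.959964  # standard normal quantile for a 95% interval


def contrast_or(coefs, vcov, weights, z=Z_975):
    """OR and 95% CI for exp(sum_k w_k * b_k), given slopes and their covariance matrix."""
    k = len(coefs)
    estimate = sum(weights[i] * coefs[i] for i in range(k))
    # Var(w'b) = sum_i sum_j w_i * w_j * Cov(b_i, b_j)
    variance = sum(weights[i] * weights[j] * vcov[i][j]
                   for i in range(k) for j in range(k))
    se = math.sqrt(variance)
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))


# Placeholder values (NOT from this data set): slopes for exposure and the
# exposure x earlyperiod interaction, plus their 2 x 2 covariance matrix.
b = [0.10, 0.70]
V = [[0.036, -0.036],
     [-0.036, 0.125]]

or_late = contrast_or(b, V, [1, 0])   # exp(b_exposure)
or_early = contrast_or(b, V, [1, 1])  # exp(b_exposure + b_interaction)
print("late:  OR %.2f (%.2f, %.2f)" % or_late)
print("early: OR %.2f (%.2f, %.2f)" % or_early)
```

Weights [1, 0] give the late-period odds ratio and [1, 1] the early-period one: the same 0/1 patterns placed on the exposure and interaction terms in the ESTIMATE statements and the glht() contrast matrices.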




Effect of exposure to the events of 9/11 on low birthweight, by time period of delivery

Time period                            aOR     95% CI for aOR
Birth within 2 years of 9/11           2.29    1.28, 4.12
Birth more than 2 years after 9/11     1.09    0.75, 1.58


Is there effect modification here?  Explain (address significance as well as size of the effect).



Some SAS and R help follows:


SAS code for the interaction model in Question 2


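proc logistic;
  class momage (ref='1') / param=ref;   /* <30 yrs is the reference group */
  model lowBW (event='1') = exposure earlyperiod exposure*earlyperiod
        momage college;
  /* exposure OR when earlyperiod = 1: exp(exposure slope + interaction slope) */
  estimate 'Exposure OR early period' exposure 1 exposure*earlyperiod 1
        / exp cl;
  /* exposure OR when earlyperiod = 0: exp(exposure slope) */
  estimate 'Exposure OR late period' exposure 1 exposure*earlyperiod 0
        / exp cl;
run;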





R code for the interaction model in Question 2


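# Interaction model; assumes the data set's variables are available in the
# workspace (e.g., after attach()), since no data= argument is given
log.out <- glm(lowBW ~ exposure + earlyperiod +
                 exposure:earlyperiod +
                 relevel(factor(momage), ref='1') +
                 college, family=binomial(link='logit'))

# For contrasts of exposure and interaction terms, load the 'multcomp' package
library(multcomp)

# R gives slopes and standard errors for the early and late effects;
# we need to convert them to ORs and CIs 'by hand' (exponentiate)
# Coefficient order: (Intercept), exposure, earlyperiod, momage 2, momage 3,
# college, exposure:earlyperiod (R lists the interaction term last)

# For early period (earlyperiod = 1): exposure slope + interaction slope
exp.early <- matrix(c(0,1,0,0,0,0,1),1)
summary(glht(log.out, linfct=exp.early))

# For later period (earlyperiod = 0): exposure slope only
exp.late <- matrix(c(0,1,0,0,0,0,0),1)
summary(glht(log.out, linfct=exp.late))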


Question 3.  We can also test for effect modification with classical epidemiologic stratification methods, although these methods do not allow us to control for other variables.  I used Mantel-Haenszel stratification methods (see the Class 4 homework, using ‘proc freq’ in SAS and ‘epi.2by2’ in R) to examine the association between exposure to the events of 9/11 and low birthweight, controlling for (stratifying by) the earlyperiod variable.  Results are given below:


Results:  From SAS, the Breslow-Day test gives chi-square (1 df) of 4.83, p=0.028.

The stratified results for the effect of exposure on low birthweight were

    Early Period: OR = 2.32 (1.30, 4.16)

    Late Period: OR = 1.08 (0.74, 1.56). 

From R, the Woolf test (labeled ‘M-H test of homogeneity of ORs’) gives chi-square (1 df) of 4.67, p=0.030.

The stratified results were:

    Early Period: OR = 2.31 (1.30, 4.22)

    Late Period: OR = 1.08 (0.74, 1.56)
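The classical calculations behind these results can be sketched in a few lines. The Python below (standard library only; function names hypothetical) computes the stratum-specific odds ratios, the Mantel-Haenszel summary odds ratio, and Woolf's test of homogeneity, which weights each stratum's log odds ratio by the inverse of its variance. The Breslow-Day test in SAS uses a different statistic, which is why its chi-square differs slightly from Woolf's. The cell counts in the usage lines are made up for illustration and are not the WTC data; the sketch also assumes no zero cells, which would need a continuity correction.

```python
import math


def stratum_or(a, b, c, d):
    """OR for one 2x2 table: a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary OR across strata of (a, b, c, d) counts."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def woolf_homogeneity(tables):
    """Woolf chi-square test that the stratum ORs are equal (df = strata - 1).
    The p-value is computed only for two strata (1 df)."""
    logs = [math.log(stratum_or(*t)) for t in tables]
    weights = [1.0 / sum(1.0 / x for x in t) for t in tables]  # 1 / Var(log OR)
    pooled = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
    stat = sum(w * (l - pooled) ** 2 for w, l in zip(weights, logs))
    df = len(tables) - 1
    p = math.erfc(math.sqrt(stat / 2.0)) if df == 1 else None
    return stat, df, p


# Made-up counts (NOT the WTC data): (a, b, c, d) for early and late periods
early = (30, 270, 40, 860)
late = (25, 475, 75, 1585)
strata = [early, late]
print([round(stratum_or(*t), 2) for t in strata])
print(round(mantel_haenszel_or(strata), 2))
print(woolf_homogeneity(strata))
```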


Is there significant evidence of effect modification?  Explain.



SAS code:

proc freq; table earlyperiod*exposure*lowBW / all; run;


R code:


# M-H analysis


# stratified results








Question 4.  (I’ve done the computer work and summarized results in the table below.  The SAS and R code I used are given at the end of the question.)  Significant effect modification indicates that the association of interest differs across subgroups in the study.  So, when there is significant interaction, another way to account for it is to run separate analyses for the different subgroups.  One nice consequence of this approach is that it avoids presenting results from interaction models (which can be more complicated to present).



Run two logistic regressions (I’ve done this; see the table): one for mothers who gave birth within 2 years of 9/11 (earlyperiod=1), the other for mothers who gave birth more than 2 years after 9/11 (earlyperiod=0).  Both regressions should predict low birthweight from exposure, mom’s age, and mom’s college education.  (Note that ‘earlyperiod’ and the interaction term are not included in these models, since the analysis is stratified by earlyperiod.)  Some SAS and R help is given below.


Results are summarized in the following table:


Results of separate logistic regressions predicting low birthweight, for women giving birth within 2 years of 9/11 and for women giving birth more than 2 years after 9/11

                               Birth within 2 years        Birth more than 2 years
                               of 9/11                     after 9/11
Variable                       aOR          95% CI         aOR          95% CI
Exposed to events of 9/11                   1.29, 4.19                  0.74, 1.57
Maternal age
   <30 yrs                     1.00 (ref.)                 1.00 (ref.)
   30 – 35 yrs                              0.37, 1.42                  0.64, 1.54
   >35 yrs                                  0.36, 1.77                  0.49, 1.40
College degree                              0.25, 0.79                  0.29, 0.61


Based on these analyses, describe the effect of exposure, for mothers giving birth either within 2 years or after 2 years from 9/11.
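Because the two stratified regressions use non-overlapping groups of mothers, their exposure estimates are independent, and one rough way to compare them is a z-test on the difference of the log odds ratios, with standard errors back-calculated from the 95% confidence intervals in the table. This is a Python sketch with hypothetical function names; it uses the rounded limits from the table, so the result is approximate. It is not the same test as the interaction term in Question 2, because the stratified models also let the maternal age and college slopes differ between periods.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal


def log_or_and_se(lower, upper):
    """Log OR and its SE, back-calculated from a 95% CI for the OR."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2.0, (log_hi - log_lo) / (2.0 * Z_975)


def compare_independent_ors(ci_1, ci_2):
    """z-test for equal ORs in two independent subgroups; returns ratio of ORs, z, p."""
    b1, se1 = log_or_and_se(*ci_1)
    b2, se2 = log_or_and_se(*ci_2)
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return math.exp(b1 - b2), z, p


# Exposure rows of the Question 4 table (95% CIs only)
ratio, z, p = compare_independent_ors((1.29, 4.19), (0.74, 1.57))
print(f"ratio of ORs ~ {ratio:.2f}, z ~ {z:.2f}, p ~ {p:.3f}")
```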



SAS code and R code are below.


Some SAS help:

There are several ways to run a sub-group analysis in SAS.  I’ll use the ‘where’ statement, which can be used with most procs, to restrict the analysis to a subgroup.


title 'Analysis for births within 2 years of 9/11';
proc logistic; where earlyperiod=1;
  class momage (ref='1') / param=ref;
  model lowBW (event='1') = exposure momage college;
run;

title 'Analysis for births more than 2 years after 9/11';
proc logistic; where earlyperiod=0;
  class momage (ref='1') / param=ref;
  model lowBW (event='1') = exposure momage college;
run;



The ‘where’ statement restricts an analysis to the subset of subjects who satisfy the stated condition.


Some R help:


There are several ways to run a sub-group analysis in R.  I’ll use what I think of as a ‘select if’ statement, parallel to the ‘where’ statement in SAS.  R uses square brackets following a variable name to indicate that only subjects who satisfy the condition in the square brackets should be included in the analysis.  One somewhat awkward consequence of this is that the square brackets need to be included with every variable referenced in the procedure.


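# Births within 2 years of 9/11 (earlyperiod = 1)
log.out1 <- glm(lowBW[earlyperiod==1] ~
                  exposure[earlyperiod==1] +
                  relevel(factor(momage[earlyperiod==1]), ref='1') +
                  college[earlyperiod==1],
                family=binomial(link='logit'))

# Births more than 2 years after 9/11 (earlyperiod = 0)
log.out0 <- glm(lowBW[earlyperiod==0] ~
                  exposure[earlyperiod==0] +
                  relevel(factor(momage[earlyperiod==0]), ref='1') +
                  college[earlyperiod==0],
                family=binomial(link='logit'))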




Note that two equal signs (==) are needed to specify an equality in the ‘select if’ condition inside the square brackets.