Semiparametric Additive Hazards Models with Missing Covariates
The case-cohort study design was originally proposed by Prentice (1986). Under this design, a random sub-cohort of individuals is selected from the study cohort. Full covariate data are collected only for the sub-cohort and for all cases in the cohort, rather than for the entire cohort, which saves time and money when expensive measurements such as biomarkers or genotypes are required. As a result, certain covariates are missing for a large number of individuals in the cohort. This design has been widely used in clinical and epidemiological studies to assess the effect of covariates on failure times. The Cox proportional hazards model (Cox 1972) is a popular, classical choice for such data because its regression coefficients are easy to interpret and efficient inference procedures are implemented in standard statistical software packages. The Cox model rests on the so-called proportional hazards assumption, that is, the hazard ratio remains constant over time and covariates have log-linear effects on the risk of the event of interest. However, in many real datasets, covariates exhibit effects that are far more complicated than log-linear; the proportional hazards assumption may then be violated, and the Cox model may not be an appropriate choice. Moreover, few existing methods allow for time-varying regression coefficients, and most do not use the data from non-cases outside the sub-cohort, which results in inefficient inference.

To address these issues, we propose an estimation procedure for the semiparametric additive hazards model with case-cohort data that allows the covariates of interest to be missing for both cases and non-cases. We consider an additive model in which the effects of some covariates are time-varying while the effects of the other covariates are constant. We further assume that the missing covariates have constant effects on the failure time.
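As a sketch, an additive model of the kind described above can be written as follows. The notation is assumed here for illustration and is not taken from the text: $X_i(t)$ collects the covariates with time-varying effects $\beta(t)$, and $Z_i(t)$ collects the covariates with constant effects $\gamma$, including the covariates subject to missingness.

\[
\lambda_i(t) = \beta(t)^{\top} X_i(t) + \gamma^{\top} Z_i(t)
\]

Taking the first component of $X_i(t)$ equal to 1 is one common convention for including a baseline hazard. By contrast, the Cox model has the multiplicative form $\lambda_i(t) = \lambda_0(t) \exp\{\theta^{\top} W_i\}$ for a generic covariate vector $W_i$, so covariates act log-linearly on the hazard and the hazard ratio between two individuals with fixed covariates does not change over time.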
We propose an augmented inverse probability weighted (AIPW) estimation procedure that uses auxiliary information correlated with the missing covariates. We establish the asymptotic properties of the proposed AIPW estimator. Our simulation study shows that AIPW estimation is more efficient than the widely used inverse probability weighted (IPW) and complete-case estimation methods. This efficiency gain is particularly apparent when the sub-cohort is very small. The method is applied to data from an HIV vaccine efficacy trial.
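To illustrate why augmentation with auxiliary information can improve on plain inverse probability weighting, the following is a minimal toy sketch in Python. It is not the proposed hazard-model estimator: it only estimates the mean of a covariate `x` that is observed for a random subsample, with a known constant selection probability `pi` and an auxiliary variable `a` correlated with `x`. All function names, the linear working model, and the simulation settings are assumptions made for this illustration.

```python
import numpy as np


def ipw_mean(x, r, pi):
    """IPW estimate of E[X]; x is only used where r == 1."""
    x_obs = np.where(r == 1, x, 0.0)  # missing entries never contribute
    return np.mean(r * x_obs / pi)


def aipw_mean(x, a, r, pi):
    """AIPW estimate of E[X] using a linear working model for E[X | A]."""
    x_obs = np.where(r == 1, x, 0.0)
    # Fit the working model on the complete cases only (assumed linear).
    slope, intercept = np.polyfit(a[r == 1], x[r == 1], 1)
    m = intercept + slope * a  # predicted X for everyone, observed or not
    # IPW term minus the augmentation term built from the auxiliary variable.
    return np.mean(r * x_obs / pi - (r - pi) / pi * m)


def simulate(n=500, pi=0.2, reps=200, seed=0):
    """Return a (reps, 2) array of IPW and AIPW estimates; true mean is 1."""
    rng = np.random.default_rng(seed)
    out = np.empty((reps, 2))
    for k in range(reps):
        a = rng.normal(1.0, 1.0, n)        # auxiliary variable, always observed
        x = a + rng.normal(0.0, 0.3, n)    # covariate of interest
        r = rng.binomial(1, pi, n)         # 1 if x is observed, else 0
        out[k] = ipw_mean(x, r, pi), aipw_mean(x, a, r, pi)
    return out


if __name__ == "__main__":
    est = simulate()
    print("IPW  mean, variance:", est[:, 0].mean(), est[:, 0].var())
    print("AIPW mean, variance:", est[:, 1].mean(), est[:, 1].var())
```

In this toy setting both estimators target the same mean, but the augmentation term borrows information from `a` for individuals whose `x` is missing, so the AIPW estimates vary less across replications. The gap widens as `pi` shrinks, which mirrors the small sub-cohort situation described above.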