The MEC is a population-based, prospective cohort study designed to investigate lifestyle and genetic factors related to cancer and other chronic diseases. Participants were identified from sources including driver’s license records, voter registration lists, and Health Care Financing Administration data files. They were recruited by mailing an invitation letter and questionnaire. More than 215,000 adults aged 45–75 years living in Hawaii or the Los Angeles area were enrolled between 1993 and 1996. Participants were primarily African American, Japanese American, Native Hawaiian, Latino, or White. The response rates from highest to lowest were Japanese American men 46%, women 51%; White men 39%, women 47%; Native Hawaiian men 36%, women 42%; African American men 20%, women 26%; and Latino men 19%, women 21%. The respondents completed a self-administered, comprehensive questionnaire including a detailed dietary assessment. The Institutional Review Boards of the University of Hawaii and the University of Southern California approved the study.
For the current analysis, we excluded participants who did not belong to one of the five main racial and ethnic groups (n = 13,987), had colorectal cancer prior to baseline based on self-report (n = 2251) or tumor registry information (n = 300), or reported implausible diets (n = 8137). Specifically, we computed a robust standard deviation (RSD) with an assumption of a truncated normal distribution from the middle 80% of the log energy distribution. Then, we excluded all individuals with energy values outside the range of the mean ± 3 RSD. A similar approach was used to exclude individuals with extreme fat, protein, or carbohydrate intakes to identify individuals who skipped important dietary pages. As a result, the range of total energy intake after exclusions was 490 to 8700 kcal/day for men and 425 to 7800 kcal/day for women. We further excluded participants with missing covariates (n = 19,234) including body mass index (BMI), smoking, physical activity, multivitamin use, nonsteroidal anti-inflammatory drug (NSAID) use, and menopausal hormone therapy use for women. As a result, a total of 79,952 men and 93,475 women were included in the analysis.
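The energy-based exclusion rule can be illustrated with a minimal sketch. It assumes that the RSD is obtained by dividing the standard deviation of the middle 80% of log energy values by the standard deviation of a standard normal distribution truncated to its central 80%, and that the center is the mean of that middle portion; the function names and these computational details are illustrative assumptions, not the study's actual code.

```python
import math
from statistics import NormalDist, stdev


def robust_sd(log_values, central_fraction=0.80):
    """Return (center, RSD) estimated from the central part of the data.

    Assumption: the data are roughly normal on the log scale, so the SD of the
    central portion can be rescaled by the SD of a truncated standard normal.
    """
    xs = sorted(log_values)
    n = len(xs)
    tail = (1.0 - central_fraction) / 2.0
    middle = xs[math.floor(n * tail): math.ceil(n * (1.0 - tail))]

    # SD of a standard normal truncated symmetrically at +/- a, where the
    # interval [-a, a] holds `central_fraction` of the probability mass.
    std_normal = NormalDist()
    a = std_normal.inv_cdf(1.0 - tail)
    truncated_sd = math.sqrt(1.0 - 2.0 * a * std_normal.pdf(a) / central_fraction)

    center = sum(middle) / len(middle)
    return center, stdev(middle) / truncated_sd


def plausible_energy_bounds(kcal_values, n_rsd=3.0):
    """Return (lower, upper) kcal/day bounds as center +/- n_rsd * RSD on the log scale."""
    center, rsd = robust_sd([math.log(v) for v in kcal_values])
    return math.exp(center - n_rsd * rsd), math.exp(center + n_rsd * rsd)
```

In this sketch, a participant would be flagged as having an implausible diet when reported energy intake falls outside the returned bounds; in practice the bounds would be computed separately for men and women.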
Dietary assessment and plant-based diet indices
At baseline, participants’ usual intake of foods and beverages was assessed with a quantitative food frequency questionnaire (QFFQ) with > 180 food items. Participants reported the frequency and the usual portion size of food consumption in the previous year. The QFFQ had 8 response categories (“never or hardly ever” to “2 or more times a day”) and, for some beverage items, 9 response categories (“never or hardly ever” to “4 or more times a day”). Participants were asked to choose one of three (in a few instances four) portion size options specific to each food item to assess the amount of food eaten. The QFFQ was validated in all sex-ethnic groups in a calibration study with the use of data from three repeated 24-h dietary recalls. Daily energy and nutrient intakes were calculated using the food composition tables developed by the University of Hawaii Cancer Center for use in the MEC.
We calculated three plant-based diet indices (PDI, hPDI, and uPDI) using data from the QFFQ, based on the food groups defined and the scoring methods developed in previous studies [12, 13, 18]. For the current study, 16 food groups were used for the PDI, hPDI, and uPDI. The food groups were classified as healthy plant foods (whole grains, fruits, vegetables, vegetable oils, nuts, legumes, tea and coffee), less healthy plant foods (refined grains, fruit juices, potatoes, added sugars), and animal foods (animal fat, dairy, eggs, fish or seafood, meat) based on the associations between food items and health outcomes reported in the literature [12, 13]. We modified the original 18 food groups used for the PDI, hPDI, and uPDI [12, 13] by combining sugar-sweetened beverages and sweets and desserts into added sugars and by excluding miscellaneous animal-based foods, because we primarily used the MyPyramid Equivalent Database (MPED) values (cup equivalents or ounce equivalents) calculated for the MEC participants [19, 20]. The MPED is a standardized food-grouping system developed by the USDA that disaggregates mixed dishes into their food items and allocates each food item into one of 32 food groups. We used the MPED values for 13 of the 16 component food groups of the plant-based diet indices, as we did when constructing commonly used diet quality indices from the MEC QFFQ, which included many mixed dish items. For vegetable oils, tea and coffee, and animal fat, which were not in the MPED groups, gram amounts of individual QFFQ items were used.
For each food group in all indices, daily consumption per 1000 kcal was divided into quintiles based on sex-specific distributions. For the PDI, all plant food groups were positively scored (the lowest quintile receiving 1 point and the highest quintile receiving 5 points). For the hPDI, only healthy plant food groups were positively scored, while less healthy plant food groups were reverse scored (the lowest quintile receiving 5 points and the highest quintile receiving 1 point). Conversely, for the uPDI, less healthy plant food groups were positively scored, and healthy plant food groups were reverse scored. In all indices, animal food groups were reverse scored. Higher PDI scores represent greater consumption of all types of plant foods regardless of healthiness. Higher hPDI scores represent greater consumption of healthy plant foods and lower consumption of less healthy plant foods. Higher uPDI scores represent lower consumption of healthy plant foods and greater consumption of less healthy plant foods. Total scores for each index were calculated as the sum of the scores (1 to 5) across the component food groups. Thus, the theoretical range of the PDI, hPDI, and uPDI was 16 to 80.
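The scoring scheme described above can be sketched as follows. The food group labels mirror the 16 groups listed earlier, but the identifiers, the quintile cut-point method, and the tie handling are illustrative assumptions rather than the procedure actually used in the MEC.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical identifiers for the 16 component food groups.
HEALTHY = ["whole_grains", "fruits", "vegetables", "vegetable_oils",
           "nuts", "legumes", "tea_coffee"]
LESS_HEALTHY = ["refined_grains", "fruit_juices", "potatoes", "added_sugars"]
ANIMAL = ["animal_fat", "dairy", "eggs", "fish_seafood", "meat"]


def quintile_scores(intakes):
    """Map energy-adjusted intakes (per 1000 kcal) to quintiles 1..5.

    Assumption: cut points come from statistics.quantiles; in the study the
    quintiles would be derived separately for men and women.
    """
    cuts = quantiles(intakes, n=5)  # four cut points
    return [1 + bisect_right(cuts, value) for value in intakes]


def score_indices(quintile_by_group):
    """Compute PDI, hPDI, and uPDI from one participant's quintiles (1..5)."""
    def positive(groups):
        return sum(quintile_by_group[g] for g in groups)

    def reverse(groups):
        return sum(6 - quintile_by_group[g] for g in groups)

    return {
        "PDI": positive(HEALTHY + LESS_HEALTHY) + reverse(ANIMAL),
        "hPDI": positive(HEALTHY) + reverse(LESS_HEALTHY + ANIMAL),
        "uPDI": reverse(HEALTHY) + positive(LESS_HEALTHY) + reverse(ANIMAL),
    }
```

For a hypothetical participant in the highest quintile of every plant food group and the lowest quintile of every animal food group, this sketch returns the maximum PDI of 80, whereas the hPDI and uPDI fall below 80 because one set of plant food groups is reverse scored in each.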
Incident colorectal cancer cases were identified by linkage to the statewide Surveillance, Epidemiology, and End Results Program tumor registries in Hawaii and California. Deaths were identified by linkage to death certificate files in both states and the National Death Index. Case and death ascertainment was complete through December 31, 2017. Cases in the current study were limited to invasive adenocarcinoma of the large bowel and were categorized according to anatomic subsite using International Classification of Diseases for Oncology, third edition (ICD-O-3) codes: C18.0–C18.5 for right colon, C18.6–C18.7 for left colon, and C19.9 and C20.9 for rectum, excluding multi-site cases. During an average follow-up period of 19.2 years, a total of 4976 incident colorectal cancer cases were identified among the eligible participants.
Cox proportional hazards models of colorectal cancer with age as the time metric were used to calculate hazard ratios (HRs) and 95% confidence intervals (CIs) for men and women separately. Follow-up began at the date of cohort entry and ended at the earliest date of diagnosis, death, or study closure (December 31, 2017). A separate model was fit for each of the three diet indices. The total scores for each index were divided into quintiles based on their distribution across the entire cohort. All models were adjusted for race and ethnicity as a strata variable and for age at cohort entry (years), family history of colorectal cancer (yes/no), history of colorectal polyp (yes/no), BMI (< 25, 25 to < 30, and ≥ 30 kg/m²), pack-years of cigarette smoking (continuous), multivitamin use (yes/no), NSAID use (yes/no), physical activity (hours spent in moderate and vigorous work or sports per day), menopausal hormone therapy use (never, past, current) for women only, alcohol consumption (g/day), and total energy intake (log-transformed kcal/day) as covariates. The potential confounders were selected because they were associated with colorectal cancer risk in our cohort or because they were established risk factors for colorectal cancer in the literature. We also considered other factors such as height and education level as covariates but did not include them in the final models because adjustment for these variables did not change the associations between plant-based dietary patterns and colorectal cancer risk. Linear trends were tested by modeling sex- and race- and ethnicity-specific median scores within each quintile as a continuous variable. The proportional hazards assumption was tested by the Schoenfeld residual method and found to be met.
Sensitivity analyses were conducted to test the robustness of our findings. We conducted a 4-year time-lagged analysis to minimize reverse causation due to existing diseases. To assess the possible impact of residual confounding by the known risk factors for colorectal cancer, we conducted subgroup analyses by BMI (≥ 25 kg/m² vs. < 25 kg/m²), smoking status (ever vs. never smokers), and alcohol consumption (≥ 30 g/day vs. < 30 g/day). In addition, we evaluated the associations of the individual plant food groups with colorectal cancer risk and estimated the associations of substituting whole grains, fruits, vegetables, or legumes for added sugars, which were the major food groups associated with colorectal cancer risk. The substitution analyses were conducted by including both food groups as continuous variables (divided by the SD of each variable) in the multivariable model, which also contained total energy intake and other covariates. The difference in their β coefficients, together with their variances and covariance, was used to estimate the substitution associations. In supplemental analyses, the plant-based diet indices were updated as time-dependent variables using data from a 10-year follow-up survey (2003–2008) that was available for 79,350 (46%) of the 173,427 participants.
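The substitution estimate can be written out in a short sketch. It assumes that both food groups enter the same model, so that the substitution HR is the exponentiated difference of the two β coefficients and its standard error follows from Var(β₁ − β₂) = Var(β₁) + Var(β₂) − 2 Cov(β₁, β₂). The function name and the numbers in the usage note are hypothetical and are not study results.

```python
import math


def substitution_hr(beta_in, beta_out, var_in, var_out, cov, z=1.959964):
    """Return (HR, lower, upper) for replacing the 'out' food with the 'in' food.

    beta_in / beta_out: log-HR coefficients of the two food groups from the
    same multivariable model; var_in, var_out, cov: their variances and
    covariance taken from the model's covariance matrix.
    """
    diff = beta_in - beta_out
    se = math.sqrt(var_in + var_out - 2.0 * cov)
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se)
```

For example, with hypothetical coefficients of -0.05 for whole grains and 0.03 for added sugars, `substitution_hr(-0.05, 0.03, 0.0004, 0.0004, 0.0001)` gives an HR of exp(-0.08), or about 0.92, with a 95% CI that lies below 1.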
Tests for heterogeneity between subgroups were based on the Wald statistics for cross-product terms of trend variables and subgroup indicator variables (sex or race and ethnicity). Tests for heterogeneity by anatomic subsite were based on the Wald statistics comparing competing risk models using an augmented data approach [23, 24]. Spearman’s correlations were examined between the plant-based diet indices. A possible nonlinear relationship between the indices and colorectal cancer risk was examined nonparametrically using restricted cubic splines with 4 knots at the 5th, 35th, 65th, and 95th percentiles. All statistical tests were two-sided. All analyses were performed using SAS statistical software, version 9.4 (SAS Institute, Inc., Cary, NC).
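As an illustration of the spline terms, the following sketch builds a restricted cubic spline basis with knots at the 5th, 35th, 65th, and 95th percentiles of an index score. It uses the common truncated-power parameterization scaled by the squared distance between the outer knots, which is an assumption; the parameterization used in the SAS analysis is not stated in the text, and the function names are hypothetical.

```python
from statistics import quantiles


def knots_from_scores(scores, percentiles=(5, 35, 65, 95)):
    """Return knot locations at the requested percentiles of the scores."""
    cuts = quantiles(scores, n=100)  # 99 cut points; cuts[p - 1] is the p-th percentile
    return [cuts[p - 1] for p in percentiles]


def rcs_basis(x, knots):
    """Return the linear term plus (k - 2) nonlinear terms for k knots.

    The nonlinear terms are zero below the first knot and linear above the
    last knot, which is what restricts the spline in the tails.
    """
    t = sorted(knots)

    def cube(u):
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2
    denom = t[-1] - t[-2]
    terms = [x]
    for j in range(len(t) - 2):
        term = (cube(x - t[j])
                - cube(x - t[-2]) * (t[-1] - t[j]) / denom
                + cube(x - t[-1]) * (t[-2] - t[j]) / denom)
        terms.append(term / scale)
    return terms
```

With 4 knots, each score is expanded into three columns (one linear and two nonlinear) that would be entered in the Cox model in place of the single index term; a joint test of the two nonlinear coefficients is one way to assess departure from linearity.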