IHCC Members’ Cohorts Survey Summary
From November 2021 through March 2022, a total of 42 unique cohorts completed the IHCC Members’ Cohorts Survey. A large proportion of respondents indicated that they already hold deep and diverse data types: demographic data are the most widely available, and many cohorts can recontact participants and have biospecimens available. Respondents represented >21 million unique research participants with data/samples already available. Cumulatively, biosamples are available for >14 million participants and genomic data are available for >8 million unique individuals. Cohorts have collected a wide range of environmental data, including socioeconomic status (74%), education (83%), diet (76%), lifestyle (71%), and medication (71%).
A total of 25 cohorts self-identified as being based in low- and middle-income countries (LMIC) and/or low-resource settings (LRS). Among the cohorts in this group that provided additional feedback, 63% identified funding as a primary challenge, while 36% identified access to infrastructure and training as a challenge.
A large proportion of cohorts remain interested in training opportunities, with IHCC webinars the most favored option.
Results
Preliminary Data to Support the Generation of an IHCC Global Resource: The IHCC’s Scientific Strategy & Cohorts Enhancement Workgroup has focused on identifying cross-cutting themes that affect multiple cohorts, in order to better position the IHCC to develop its strategy and serve the cohorts community more effectively. To this end, the IHCC recently developed the Members’ Cohorts Survey to assess data availability and cohort interests, with the aim of creating thematic workstreams for its future directions. Ultimately, the survey allows us to differentiate between what can be done with the data that exist today and what requires new data types. The former is a near-term opportunity, while the latter requires strategic and sustainability planning for longer-term outcomes. From November 2021 through March 2022, a total of 42 unique cohorts completed the survey, including 37 non-profit and 4 for-profit IHCC members (and 1 “other”). Respondents reflected the diversity of the IHCC as a whole and were geographically dispersed across Africa (N=5), Asia (N=12), Australasia (N=2), Central America (N=2), Europe (N=11), and North America (N=10). Of these, 25 cohorts self-identified as LMIC/LRS. In total, these cohorts represented >21,000,000 unique research participants with available data/samples. This figure does not include an additional >7 million participants with data only (no biospecimens available). The approximate breakdown of cohorts’ participant populations is shown in Table 1.
Research Interests: To guide immediate, medium-term, and long-term scientific objectives, survey respondents were asked to rate their interest in, and the availability of data for, a range of potential scientific projects that members had previously identified as high priority (panel reviews at the 2019-2021 Annual Summit Conferences). The pattern of responses is broadly similar for LMIC and non-LMIC members (Figure 1). The totals shown are the percentages of respondents aligned with each category. Of note, the majority of cohorts are interested in several cross-cutting (that is, disease-agnostic) areas, including conducting and validating genome-wide association studies (GWAS), developing polygenic risk scores (PRS), and capturing and using environmental data.
Resource Availability: A large proportion of respondents indicated that they already hold deep and diverse data types: demographic data are the most widely available, and many cohorts can recontact participants and have biospecimens available (see Figure 2). Although genomic data are available for only ~40% of participants, this represents >8 million genomic data sets. These data underline the capacity and readiness of all cohorts, including LMIC/LRS-based cohorts, to collaborate in large-scale programs addressing a wide range of potential studies. Cumulatively, data are available for >20 million participants.
Sample Availability: Biospecimens with linked phenotypic data are among the most valuable assets of the cohorts for future research. As indicated in Figure 3, LMIC/LRS-based cohorts are comparatively well-placed to collaborate immediately on projects requiring biosamples and the IHCC as a collective is in a healthy position to launch large-scale projects. Cumulatively, biosamples are available for >14 million participants.
LMIC-Specific Considerations: While the survey data highlight the depth of resources held by LMIC/LRS-based cohorts, the leaders of this subgroup of cohorts were also contacted for qualitative feedback on the challenges specific to lower-resourced organizations. Of the 25 cohorts in this subgroup, 19 provided additional feedback: 12 (63%) identified funding as a primary challenge, while 7 (36%) identified infrastructure and training. Four (21%) face challenges gaining access to populations, and two (11%) identified a lack of opportunities to collaborate. As reflected in this proposal, the Scientific Strategy & Cohorts Enhancement (SSCE) Workgroup is actively working with IHCC Leadership to overcome these challenges and to adopt mechanisms that address them through investment, training, and outreach.
Training: IHCC members as a whole are enthusiastic about pursuing training opportunities across a variety of modalities, as shown in Figure 4. The Training workgroup is using these data to follow up with the sites most interested in cohort exchange programs and mentorship opportunities.
Environmental Data: More than 70% of cohorts have access to the residential addresses of their participants, providing an opportunity to study the health impact of a wide range of environmental factors through the generation of geocode data. Further, cohorts have collected a wide range of environmental data that are similarly primed for a wide range of prospective programs, including socioeconomic status (74%), education (83%), diet (76%), lifestyle (71%), and medication (71%), while 40% and 36% have collected data on residential and occupational exposure, respectively (Figure 5). Linking environmental data to other cohort information and data will require the development of a policy for privacy preservation.
In summary, the diversity, depth and breadth of these data reflect the vast potential of the IHCC as a research platform and resource for the global community of longitudinal population studies. These data highlight the massive quantity of data already available across the consortium, largely untapped, with an active effort underway to address gaps where they exist.