What is the GenomeIndia Project?

GenomeIndia (GI) is a pan-India initiative funded by the Department of Biotechnology (DBT), Ministry of Science & Technology, Government of India, aiming to create a comprehensive catalogue of genetic variations across India’s diverse population.

When was the GenomeIndia Project conceptualised and when was it launched?

The project was conceptualised in late 2017, followed by two years of intense brainstorming and preparation by the GenomeIndia consortium under the leadership of Prof. Vijayalakshmi Ravindranath. The project was officially sanctioned in late 2019 and launched in January 2020.

What is the constitution of the GenomeIndia Consortium?

The GI Consortium comprises institutions with the following responsibilities:

  • Sample Collection, Sequencing, and Analysis: BRIC-NIBMG (Kalyani), CBR (IISc, Bengaluru), CSIR-CCMB (Hyderabad), CSIR-IGIB (Delhi).
  • Sample Collection: AIIMS Jodhpur, BRIC-IBSD (Imphal), BRIC-ILS (Bhubaneswar), BRIC-RGCB (Thiruvananthapuram), GBRC (Gandhinagar), IISER (Pune), Mizoram University (Aizawl), NIMHANS (Bengaluru), SKIMS (Srinagar).
  • Method Development: BRIC-CDFD (Hyderabad), IIIT (Allahabad), IISc (Bengaluru), IIT (Delhi), IIT (Jodhpur), IIT (Madras), NCBS (Bengaluru).
  • Biobank: CBR (Bengaluru).
  • Data Archival: IBDC (Faridabad).
  • For details on the members of the consortium, please look up: https://genomeindia.in/institute.php

Who coordinates the GenomeIndia project?

The institution coordinating the GI project is the Centre for Brain Research (CBR), Bengaluru (https://cbr-iisc.ac.in/). Prof. Vijayalakshmi Ravindranath from CBR served as the founding national coordinator of the GI project until 2022. Since then, the GenomeIndia project has been jointly coordinated by Prof. Y. Narahari, IISc, Bengaluru and Prof. K. Thangaraj, CSIR-CCMB, Hyderabad. Dr. Suchita Ninawe, Senior Adviser and Scientist-H, DBT, is the Scientific Coordinator. Dr. Richi Mahajan, Scientist-D, DBT, is the Administrative Scientist. Details of the scientists associated with the GI project may be found at: https://genomeindia.in/people.php

What are the goals of the GenomeIndia Project?

  • Create an Exhaustive Catalogue of Genetic Variations: Sequence the whole genomes of 10,000 individuals from 83 diverse Indian populations, representing the rich genetic diversity of the nation.
  • Establish a Biobank for Future Research: Collect and archive 20,000 blood samples to enable future research in genomics.
  • Enable Open Access to Genomic Data: Provide publicly accessible genomic data (sequencing data and other relevant data) to foster research collaborations and innovation.
  • Develop Affordable Genetic Tools for Diagnostics: Conceptualize genome-wide and disease-specific arrays to support low-cost diagnostic solutions specific to India.
  • Inspire the Next Generation of Genomic Innovators: Encourage young Indian minds and researchers to take up research and innovation in genomics, driving advancements in health and medicine for the nation.

How many samples have been collected and sequenced?

A total of 20,195 samples have been collected and archived in the Biobank at CBR, Bengaluru. The samples were collected from healthy individuals (as self-declared by the individuals). Of these, 10,074 samples have been sequenced and 13,242 samples have undergone GWAS (Genome-Wide Association Studies).

How many populations/ethnic groups are covered by the GenomeIndia Project?

Samples span 83 distinct Indian populations (also known as ethnic groups), ensuring a balanced representation of anthropological, sociocultural, and ethnolinguistic diversity. In particular, the study covers Tibeto-Burman, Indo-European, Dravidian, and Austro-Asiatic speakers, encompassing both tribal and non-tribal groups.

Where is the genomic data stored?

Sequencing data is archived at the Indian Biological Data Centre (IBDC), Faridabad. The following data is archived: (a) FASTQ data for 9,772 samples, (b) gVCF files for the above samples, (c) joint call files, and phenotype data for about 9,000 samples.

What guidelines govern data sharing?

The project follows the Biotech-PRIDE Guidelines (2021) and the FeED Protocol (Framework for Exchange of Data) (2025) for responsible, equitable data access. These documents can be accessed respectively from Biotech-PRIDE Guidelines (2021) and FeED Protocol (2025).

Why is India important for global genomics?

India contains one of the largest and most diverse human populations in the world, shaped by thousands of years of migration, cultural diversity, and endogamy (marriage within communities).

However, Indian populations have historically been under-represented in global genomic databases, which are dominated by European samples. This gap limits the accuracy of genetic studies and medical predictions for people from South Asia.

GenomeIndia helps address this imbalance by providing a large, population-aware genomic dataset from India.

What did the study discover?

The project identified around 130 million genetic variants, including over 44 million previously unknown variants that were not present in global databases.

The study also found:

  • Strong population structure linked to geography and language
  • Evidence of founder effects and genetic drift in several communities
  • Some populations with long runs of homozygosity, reflecting strong founder effects (i.e., these populations were founded by a close-knit group of individuals) and long-term endogamy
  • Population-specific disease-related and pharmacogenomic variants

These findings provide a more detailed picture of how demographic history has shaped genetic diversity in India.
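
To illustrate what a "run of homozygosity" means, here is a minimal Python sketch. It assumes a hypothetical encoding in which each variant site along a chromosome is recorded as the number of alternate alleles (0, 1, or 2), so a value of 1 marks a heterozygous site. The function name and the sample data are illustrative only and are not taken from the GenomeIndia analysis; real analyses typically measure runs in base pairs and tolerate occasional genotyping errors.

```python
def longest_homozygous_run(genotypes):
    """Return the length (in sites) of the longest stretch of homozygous calls.

    genotypes: sequence of alternate-allele counts per site (0, 1 or 2).
    A value of 1 is heterozygous and ends the current run.
    """
    longest = 0
    current = 0
    for g in genotypes:
        if g in (0, 2):      # homozygous reference or homozygous alternate
            current += 1
            longest = max(longest, current)
        else:                # heterozygous site breaks the run
            current = 0
    return longest


# Hypothetical genotype calls for two individuals at the same 12 sites.
outbred = [0, 1, 2, 1, 0, 0, 1, 2, 1, 0, 1, 2]
endogamous = [0, 0, 2, 2, 0, 2, 0, 0, 1, 2, 2, 0]

print(longest_homozygous_run(outbred))
print(longest_homozygous_run(endogamous))
```

In this toy data the second individual has a much longer uninterrupted homozygous stretch, which is the kind of pattern expected when both parents share recent common ancestors.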

Why do many populations show strong genetic signatures?

India has a long history of social stratification, linguistic diversity, and community-based marriage practices. Over time, these practices have created genetically distinct populations.

Some groups have experienced founder effects, where a small ancestral population gave rise to many descendants. This can sometimes increase the frequency of certain inherited diseases.

Understanding these patterns helps researchers identify population-specific disease risks.
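
As a rough illustration of why small founding groups leave a strong genetic signature, the sketch below uses the textbook expectation for an idealised, randomly mating population of constant size N, in which heterozygosity decays as H_t = H_0 (1 - 1/(2N))^t over t generations. This is a simplified model chosen for illustration, not the method used by GenomeIndia, and the population sizes and generation count are hypothetical.

```python
def expected_heterozygosity(h0, effective_size, generations):
    """Expected heterozygosity after drift in an idealised population.

    h0: starting heterozygosity (between 0 and 1)
    effective_size: number of breeding individuals N (assumed constant)
    generations: number of generations t
    """
    if effective_size <= 0:
        raise ValueError("effective_size must be positive")
    return h0 * (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Hypothetical comparison: a small founder group versus a large population,
# both starting from the same diversity and followed for 100 generations.
for n in (50, 5000):
    h = expected_heterozygosity(0.30, n, 100)
    print(f"N={n}: expected heterozygosity {h:.3f}")
```

Under these assumptions the smaller population loses diversity far faster, which is why alleles that happened to be present in the founders, including disease-related ones, can become common in their descendants.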

How could this research improve healthcare in India?

The GenomeIndia dataset provides several tools that could improve health research and clinical care:

Better disease gene discovery

  • Identifying genetic variants linked to rare and common diseases.

Improved drug response prediction

  • Some genetic variants affect how individuals metabolize medicines.

Population-specific diagnostics

  • Genetic tests can be adapted for variants common in Indian populations.

Better risk prediction models

  • Current models often perform poorly in non-European populations.

This dataset will help develop more accurate medical tools tailored for Indian populations.

Why do European genetic risk scores not work well for Indians?

Many genetic risk prediction tools were developed using data from European populations. Because genetic variation differs across populations, these models often perform poorly in people from South Asia.

GenomeIndia demonstrates this limitation and highlights the need for population-specific genomic resources.
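
A minimal sketch of how such a score is typically computed may help here: a genetic risk score is commonly a weighted sum of the risk alleles a person carries, with weights estimated in the population where the tool was developed. The variant names, weights, and genotypes below are entirely hypothetical and are not GenomeIndia results.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele counts.

    dosages: dict mapping variant ID -> number of risk alleles carried (0, 1 or 2)
    weights: dict mapping variant ID -> effect size from the reference study
    Variants absent from `weights` cannot contribute to the score.
    """
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


# Hypothetical weights learned in a study that never observed variant_C.
reference_weights = {"variant_A": 0.2, "variant_B": 0.1}

# Hypothetical individual who carries two copies of variant_C,
# a risk variant that is common only in their own population.
individual = {"variant_A": 1, "variant_B": 0, "variant_C": 2}

print(polygenic_score(individual, reference_weights))
```

Because variant_C has no weight in the reference study, it is silently ignored, so the score understates this person's risk. This is one simplified reason such tools transfer poorly; differences in allele frequencies and effect sizes between populations also play a part.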

Will this lead to new treatments?

The project itself does not directly develop treatments. However, it provides a foundational dataset that researchers can use to:

  • Identify disease-causing genes
  • Design better clinical studies
  • Develop precision medicine strategies

In the long term, such data can support drug discovery and targeted therapies.

Does this research change our understanding of Indian populations?

The study confirms and refines earlier insights from anthropology and population genetics. It shows how migration, geography, language, and social structure have shaped India's genetic landscape.

Importantly, the research does not support simplistic notions of biological divisions between communities. Instead, it shows that most populations share substantial ancestry and are connected through historical admixture.

Could this research inadvertently stigmatize communities?

No. The project was designed carefully to avoid any possibility of stigmatisation. Key principles include:

  • Population-level analysis rather than individual identification
  • Ethical approvals and informed consent
  • Engagement with anthropologists and social scientists

The goal is to improve scientific understanding and healthcare—not to label or rank populations.

What scientific assets has the GenomeIndia project created?

The project has built several national research assets:

  1. A large genomic dataset
    • Nearly 10,000 whole genome sequences from diverse Indian populations
  2. An Indian genomic reference
    • A catalogue of ~130 million genetic variants
  3. A South Asian imputation panel
    • Improves the accuracy of genetic studies in Indian populations
  4. Biobanking infrastructure
    • Long-term storage of 20,000+ biological samples for future research
  5. National collaboration network
    • More than 20 research institutions working together.

These resources will support many future studies beyond this project.

How does GenomeIndia compare with other national genomics initiatives?

Many countries have launched large-scale genomic projects, such as:

  • UK Biobank (United Kingdom)
  • All of Us (United States)
  • China Kadoorie Biobank
  • GenomeAsia project

GenomeIndia differs in several ways:

Focus on population diversity

  • Sampling from 83 anthropologically defined populations

Representation of small and isolated groups

  • Many global projects focus mainly on large urban populations.

Integration with social and linguistic context

  • Sampling strategy guided by anthropology and population genetics.

This makes GenomeIndia one of the most detailed population genomics studies of a single country.

What comes next for GenomeIndia?

The current dataset represents an initial phase. Future work may include:

  • Sequencing more individuals from additional populations
  • Linking genomic data with clinical and health outcomes
  • Studying rare diseases and complex traits
  • Improving genomic tools and diagnostics for India
  • Conducting longitudinal studies

Will the data be available to researchers?

Yes. The project aims to make data available to researchers through controlled-access frameworks that protect participant privacy.

Such datasets can accelerate research in:

  • disease genetics
  • population history
  • pharmacogenomics
  • public health genomics

How will ordinary people benefit from this project?

The benefits will emerge gradually through research and healthcare applications:

  • More accurate genetic tests for inherited diseases
  • Better drug prescriptions based on genetic profiles
  • Improved risk prediction for complex diseases
  • Increased inclusion of Indian populations in global biomedical research

Ultimately, the project helps ensure that advances in precision medicine benefit people in India as well.

Why is representation in genomic research important?

If certain populations are missing from genomic datasets:

  • disease risk predictions may be inaccurate
  • genetic tests may miss important variants
  • treatments may be less effective

By improving representation, GenomeIndia helps ensure that global genomic medicine becomes more equitable.

Where are the key findings of the GenomeIndia Project published?

A marker paper on the GenomeIndia project was published in Nature Genetics: Mapping genetic diversity with the GenomeIndia project. Nature Genetics. Volume 57, April 2025. Pages 767-773. Springer.

A detailed manuscript on the findings of the GenomeIndia project is available on medRxiv: An Atlas of Indian Genetic Diversity. 20 March 2026.

Where do I get more information about the GenomeIndia Project?

  • The GenomeIndia Digest (published in February 2024) is a coffee table book that provides a rich source of information on all aspects of the GI Project. It is available at https://dbtindia.gov.in/ebook/feed-protocols-genomeindia-digest
  • A 7-minute documentary video captures the essential details of the GenomeIndia project: Watch GenomeIndia Video.
  • On February 27, 2024, the Department of Biotechnology, Ministry of Science & Technology, organized an event in New Delhi to mark the completion of sequencing of 10,000 genomes. The meeting was addressed by Dr. Jitendra Singh-Ji, the Hon’ble Union Minister of State of Science & Technology (IC). For a video recording of the event, please look up: Watch the event on YouTube.
  • On January 9, 2025, Hon’ble Prime Minister Shri Narendra Modi-Ji addressed the delegates of the “Genomics Data Conclave and Release of GenomeIndia Data”, organized at Vigyan Bhawan, New Delhi, by the Department of Biotechnology, Ministry of Science & Technology. Dr. Jitendra Singh-Ji, the Hon’ble Union Minister of State of Science & Technology (IC), also addressed the delegates. During the event, the IBDC portal was launched to enable access to the GenomeIndia data. For a video recording of the event, please look up: Watch the event on YouTube.
  • The GenomeIndia website provides up-to-date information on the project and can be accessed at https://genomeindia.in/index.php