High-throughput genomics has fundamentally changed how life scientists investigate their research questions. Instead of studying a single candidate gene, we can now measure the DNA sequence and RNA expression of every gene across a large number of genomes. This technological advance requires us to become familiar with data wrangling, programming, and statistical analysis. To meet these challenges, bioinformaticians have created a large number of software packages for handling genomic data.
For example, a molecular biologist can now sequence a full human genome, which yields billions of very short DNA sequences from random locations. These short sequences (e.g., a few hundred base pairs each) must be aligned so that we can infer a full genome of more than 3 billion base pairs. With so many computational methods available, it is often difficult to identify the most appropriate tool for a specific need. I propose to track the usage of bioinformatics tools for sequence alignment, variant calling, and other genomic studies.
This work will lead to a new kind of review that is interactive and dynamic. Eventually, molecular biologists will have a convenient portal that quantitatively summarizes trends in computational and statistical methods in genomics. Furthermore, the source code will be published on GitHub, so that the analysis can be reproduced and extended to questions about other research trends.