Computational biology is the main application area of our research activities. We have worked mainly on the following biological problems.
Alternative Splicing
Alternative splicing (AS) is currently considered one of the main mechanisms that can explain the large gap between the number of predicted human genes and the high complexity of the human proteome. Moreover, the production of alternative transcripts from the same gene through AS is involved in the onset of several diseases. The main goal of this project is the development of fast and reliable computational tools for analyzing and predicting AS from expressed sequence tags (ESTs) and genomic data. Our most recent research focuses on computational models for detecting tissue-specific and polymorphism-specific (SNP) alternative splicing events from next-generation sequencing (NGS) data. This project has produced a web tool for AS prediction, ASPIc-WEB, and a database of AS data, ASPIc-DB. A new computational method, called Pintron, has recently been implemented to address time and space issues that had remained unsolved.
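As a rough illustration of what AS prediction extracts once transcripts have been aligned to the genome, the sketch below classifies simple AS events among the transcripts of one gene. It is a simplified sketch, not the ASPIc or Pintron algorithm: it assumes each transcript has already been reduced to its introns, given as hypothetical (donor, acceptor) coordinate pairs on the forward strand, and the function name classify_events and the event labels are illustrative only.

```python
from itertools import combinations

# Assumed representation: an intron is (donor, acceptor), the first and
# last intronic positions on the forward strand.
Intron = tuple[int, int]


def classify_events(transcripts: dict[str, list[Intron]]) -> set[tuple]:
    """Classify simple alternative splicing events among the transcripts of one gene."""
    introns = {intron for ins in transcripts.values() for intron in ins}
    events = set()
    # Alternative splice sites: two introns share one end but not the other.
    # Naive rule: it also fires on the introns flanking a skipped exon.
    for (d1, a1), (d2, a2) in combinations(sorted(introns), 2):
        if a1 == a2 and d1 != d2:
            events.add(("alt_donor", (d1, d2, a1)))
        elif d1 == d2 and a1 != a2:
            events.add(("alt_acceptor", (d1, a1, a2)))
    # Exon skipping: some transcript has an intron spanning two consecutive
    # introns of this transcript, so the exon between them is skipped.
    for ins in transcripts.values():
        ordered = sorted(ins)
        for (d, x), (y, a) in zip(ordered, ordered[1:]):
            if (d, a) in introns:
                events.add(("exon_skipping", (x + 1, y - 1)))
    return events


t = {"t1": [(100, 200), (300, 400)], "t2": [(100, 400)]}
for event in sorted(classify_events(t)):
    print(event)
```

Here t2 skips the exon at positions 201-299. The pairwise rule also reports the two introns flanking that exon as alternative splice sites; a real tool would filter such overlapping calls.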
Phylogenetic Reconstruction and Comparison
Our research on this basic topic of computational biology mainly concerns the computational complexity and algorithmic solution of optimization problems derived from specific instances of the more general problem of comparing phylogenies (or evolutionary networks) and combining them into a single representation (i.e. an evolutionary tree or network). We have contributed to the solution of computational problems related to two main consensus tree methods: the maximum agreement subtree (MAST) problem and the maximum isomorphic subtree (MIT) problem. A basic problem we investigate in comparative phylogenetics is the reconciliation (or inference) of species trees from gene trees, together with the reconstruction of species trees under the duplication model. Our most recent contributions in this field concern the algorithmic reconstruction of trees under variants of the perfect phylogeny model that allow homoplasy events.
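Reconciliation under the duplication model can be illustrated with the classical LCA mapping: each gene tree node is mapped to the lowest common ancestor, in the species tree, of the species below it, and a node is counted as a duplication whenever it maps to the same species node as one of its children. The sketch below is a minimal version of that textbook procedure, not our own algorithms; it assumes rooted binary trees written as nested Python tuples, gene tree leaves labeled directly by species name, and species trees with distinct leaf names. All function names are hypothetical.

```python
from typing import Union

# A tree is a leaf name (str) or a pair (left, right) of subtrees.
Tree = Union[str, tuple]


def index_tree(root: Tree) -> tuple[dict, dict]:
    """Record the parent and depth of every node of the species tree."""
    parent, depth = {root: None}, {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child], depth[child] = node, depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(x: Tree, y: Tree, parent: dict, depth: dict) -> Tree:
    """Lowest common ancestor: lift the deeper node, then climb in lockstep."""
    while depth[x] > depth[y]:
        x = parent[x]
    while depth[y] > depth[x]:
        y = parent[y]
    while x != y:
        x, y = parent[x], parent[y]
    return x


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """LCA reconciliation: number of gene tree nodes labeled as duplications."""
    parent, depth = index_tree(species_tree)

    def walk(node: Tree) -> tuple[Tree, int]:
        # Returns (species node that `node` maps to, duplications in its subtree).
        if not isinstance(node, tuple):
            return node, 0
        (m_left, d_left), (m_right, d_right) = walk(node[0]), walk(node[1])
        m = lca(m_left, m_right, parent, depth)
        duplication = m == m_left or m == m_right
        return m, d_left + d_right + int(duplication)

    return walk(gene_tree)[1]


species = (("a", "b"), "c")
print(count_duplications((("a", "b"), ("a", "c")), species))  # 1
```

In the example the gene tree root maps to the species root, as does its right child ("a", "c"), so the root is labeled a duplication.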
Algorithms for Haplotype Inference (HI) and Genetic Variation Analysis
The investigation of genetic differences among humans has provided evidence that mutations in DNA sequences are responsible for some genetic diseases. The most common mutation involves only a single nucleotide of the DNA sequence and is called a single nucleotide polymorphism (SNP). As a consequence, computing a complete map of all SNPs occurring in human populations is one of the primary goals of recent studies in human genomics.
Our research in this field mainly focuses on the design and experimental evaluation of algorithms for combinatorial problems related to haplotype inference and genetic variation analysis.
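To make haplotype inference concrete: a genotype records, at each SNP site, either a homozygous or a heterozygous state, and inferring haplotypes means choosing a pair of haplotypes that explains it. The sketch below enumerates every explaining pair for a single genotype, assuming the common 0/1/2 encoding (0 and 1 homozygous, 2 heterozygous); the function name resolutions is hypothetical. A genotype with k ≥ 1 heterozygous sites has 2^(k-1) candidate pairs, which is why choosing consistent resolutions across a whole population is a hard combinatorial problem.

```python
from itertools import product

Haplotype = tuple[int, ...]


def resolutions(genotype: list[int]) -> list[tuple[Haplotype, Haplotype]]:
    """All unordered haplotype pairs explaining a genotype.

    Assumed encoding: 0 and 1 are homozygous sites, 2 is a heterozygous site.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        h = tuple(genotype)
        return [(h, h)]
    pairs = []
    # Fix the first heterozygous site to 0 in h1 so each unordered pair appears once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for site, bit in zip(het, (0,) + bits):
            h1[site], h2[site] = bit, 1 - bit
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


for h1, h2 in resolutions([0, 2, 1, 2]):
    print(h1, h2)
```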
Specific computational problems of interest are: (1) genotype imputation and haplotype reconstruction in pedigrees on real data (human and farm animals); (2) haplotype phasing and genotype analysis assuming the coalescent model of the perfect phylogeny that describes the evolutionary history of SNP data, in the presence of recurrent mutations.
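Binary SNP data has a simple test for the perfect phylogeny model: a set of haplotypes admits an (unrooted) perfect phylogeny if and only if no pair of sites shows all four gametes 00, 01, 10 and 11. A pair that fails this four-gamete test signals that the data cannot be explained without recurrent mutation (or recombination). The sketch below, with hypothetical function names and assuming strictly binary (0/1) haplotypes, only reports the conflicting pairs; it is the classical test, not a phasing algorithm.

```python
from itertools import combinations


def four_gamete_violations(haplotypes: list[tuple[int, ...]]) -> list[tuple[int, int]]:
    """Pairs of SNP sites (i, j) at which all four gametes 00, 01, 10, 11 occur."""
    n_sites = len(haplotypes[0]) if haplotypes else 0
    violations = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations


def admits_perfect_phylogeny(haplotypes: list[tuple[int, ...]]) -> bool:
    """True if the binary haplotype matrix has an unrooted perfect phylogeny."""
    return not four_gamete_violations(haplotypes)


print(four_gamete_violations([(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]))
```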
Sequence Analysis and Comparison
The main goal of this project is the development of algorithms for sequence analysis based on novel alignment methodologies, and for sequence comparison based on consensus sequence methods, with applications in several areas of genome sequence comparison (genome rearrangements, multiple sequence comparison). Our investigation in this area has concerned the design of approximation and heuristic algorithms for the Longest Common Subsequence (LCS) and Shortest Common Supersequence (SCS) problems, and for the Exemplar Longest Common Subsequence problem.
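For two strings, LCS has a classical quadratic dynamic program, and the SCS length follows from the identity |SCS(a, b)| = |a| + |b| - |LCS(a, b)|. The sketch below shows only that exact two-string case; the variants with an arbitrary number of input strings, or with exemplar constraints, are NP-hard, which is where approximation and heuristic algorithms come in. The function names are hypothetical.

```python
def lcs(a: str, b: str) -> str:
    """One longest common subsequence of a and b (quadratic dynamic program)."""
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back one optimal subsequence from the start of both strings.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def scs_length(a: str, b: str) -> int:
    """Length of a shortest common supersequence of two strings."""
    return len(a) + len(b) - len(lcs(a, b))


print(lcs("AGGTAB", "GXTXAYB"), scs_length("AGGTAB", "GXTXAYB"))  # GTAB 9
```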
The study of microbial communities requires analyzing populations of ribosomal RNA gene (rDNA) clones by hybridization experiments on DNA microarrays. Unlike the classical SBH (sequencing by hybridization) procedure, where many probes are placed on a single DNA chip, this application performs a series of experiments, each consisting of applying a single probe to a DNA microarray containing a large sample of rDNA sequences from the studied population. The overall cost of the analysis is thus roughly proportional to the number of experiments, which makes minimizing the number of probes essential. We have developed efficient algorithms for this problem, known as probe selection or string barcoding, and our preliminary tests show that they find satisfactory probe sets for real rDNA data.
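One generic way to attack string barcoding is greedy set cover over pairs of sequences: repeatedly choose the probe that distinguishes the most pairs not yet distinguished, until every sequence has a distinct presence/absence barcode. The sketch below is that generic greedy heuristic, not necessarily the algorithms we developed. It assumes that a probe hybridizes to a sequence exactly when it occurs in it as a substring (ignoring reverse complements and mismatches), and that the candidate probes are all substrings of a hypothetical fixed length k; all function names are illustrative.

```python
from itertools import combinations
from typing import Optional


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (the candidate probes it hybridizes to)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def separates(probe: str, i: int, j: int, content: list[set[str]]) -> bool:
    """A probe separates two sequences if it occurs in exactly one of them."""
    return (probe in content[i]) != (probe in content[j])


def gain(probe: str, pairs: set[tuple[int, int]], content: list[set[str]]) -> int:
    """Number of still-unseparated pairs that the probe would separate."""
    return sum(separates(probe, i, j, content) for i, j in pairs)


def greedy_barcode(seqs: list[str], k: int) -> Optional[list[str]]:
    """Greedily pick probes until all sequences have distinct barcodes.

    Returns None if the candidate probes cannot separate every pair.
    """
    content = [kmers(s, k) for s in seqs]
    candidates = set().union(*content)
    unseparated = set(combinations(range(len(seqs)), 2))
    chosen = []
    while unseparated:
        if not candidates:
            return None
        # Sorting makes ties break lexicographically, so the result is deterministic.
        best = max(sorted(candidates), key=lambda p: gain(p, unseparated, content))
        if gain(best, unseparated, content) == 0:
            return None
        chosen.append(best)
        candidates.discard(best)
        unseparated = {(i, j) for i, j in unseparated
                       if not separates(best, i, j, content)}
    return chosen


print(greedy_barcode(["ACGT", "ACGA", "TCGT"], 2))  # ['AC', 'GA']
```

In the example the two chosen probes give the barcodes 10, 11 and 00, so the three sequences are told apart with two experiments instead of one per candidate probe.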
