Experimental Algorithms Lab

Algorithmic research can have a number of applications. The main fields of applications of the research done in the lab are listed.

Computational Biology

Some of the problems we are currently investigating are alternative splicing prediction, haplotyping, and sequence analysis.

ASPIc

Alternative splicing (AS) is currently considered as one of the main mechanism able to explain the huge gap between the number of predicted genes and the high complexity of proteome in human. The main goal of this project is the development of fast and reliable computational tools for analyzing and predicting AS from ESTs and genomic data. Most recent research is focused on the development of computational models to detect tissue or polymorphism (SNP) specific alternative events. Our research on this project have produced a WEBtool for AS prediction ASPIc-WEB and a database collecting data on AS ASPIc-DB.

Phylogenetic Reconstruction and Comparison

Our research on this basic topic of Computational Biology mainly concerns the computational complexity and algorithmic solution of optimization problems derived by specific instances of the more general problem of comparing phylogenies (or evolutionary networks) to combine them into a single representation (i.e. an evolutionary tree or network). We address computational problems derived from consensus tree methods such as the maximum agreement subtree (MAST) problem and the maximum isomorphic subtree (MIT) problem. A basic problem we investigate in comparative phylogenetics is the reconciliation (or inference) of species tree from gene trees.

Algorithms for Haplotype Inference (HI) and Genetic Variation Analysis

The investigation of genetic differences among humans has given evidence that mutations in DNA sequences are responsible for some genetic diseases. The most common mutation is the one that involves only a single nucleotide of the DNA sequence, which is called a single nucleotide polymorphism (SNP). As a consequence, computing a complete map of all SNPs occurring in the human populations is one of the primary goals of recent studies in human genomics.

Our research in this field is mainly focused on the design and experimentation of algorithm for solving combinatorial problems related to haplotype inference and genetic variations analysis.

Specific computational problems of interest are: (1) inferring the complete information on haplotypes from (incomplete or partial) haplotypes or genotypes assuming the Coalescent model, (2) efficient reconstruction of the perfect phylogeny describing the evolutionary history of SNPs (single nucleotide polymorphism) data in presence of recurrent mutations.

Sequence Analysis and Comparison

The main goal of this project concerns the development of algorithms for sequence analysis by novel alignment methodology and sequence comparison by consensus sequence methods with application in several field of genome sequence comparison (genome sequence rearrangement, multiple sequence comparison). Our investigation in this area has concerned the design of approximation and heuristic algorithms for the LCS and SCS, the Exemplar Longest Common Subsequence.

The study of microbial communities requires to analyze populations of ribosomal RNA gene (rDNA) clones by hybridization experiments on DNA microarrays. Unlike in the classical SBH (sequencing by hybridization) procedure, where multiple probes are on a DNA chip, in this application we perform a series of experiments, each one consisting of applying a single probe to a DNA microarray containing a large sample of rDNA sequences from the studied population. The overall cost of the analysis is thus roughly proportional to the number of experiments, underscoring the need for minimizing the number of probes. We have developed some efficient algorithms for solving such problem, known as probe selection or string barcoding, and our preliminary tests demonstrate that those algorithms are able to find satisfactory probe sets for real rDNA data.

Graph Decomposition

For some problems, decomposing a graph allows for the application of a divide-and-conquer technique to design efficient algorithms. A kind of decomposition that we have studied is called modular decomposition.

Clustering

Correlation Clustering is a model for clustering data when a binary relationship between data points is known. More precisely, for each pair of points we have two scores measuring the similarity and dissimilarity respectively, of the two points, and we would like to compute an optimal partition where the value of a partition is obtained by summing up scores of pairs involving points from a same cluster and scores of pairs involving points from different clusters. The main difference between Correlation Clustering and, say, k-means is that for the former the number of sets of the solution is not known a priori.

A closely related problem is Consensus Clustering, where we are given a set of partitions and we would like to obtain a partition that best summarizes the input partitions. We have investigate the computational complexity of both problems and we have designed some efficient approximation algorithms.

home.txt · Last modified: 2009/10/09 12:02 by Yuri Pirola
www.chimeric.de Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0