General motivation
Evolution considered at the gene and genomic levels is both an object of study in itself and a means of obtaining new knowledge. Comparative approaches in particular rely on the observed differences in sequence evolution between closely or distantly related species to infer important information such as genes, promoter and regulatory binding sites, repeated sequences (transposable elements, satellites) and so on. Both aspects, evolution per se and evolution as a means of obtaining new information, are addressed in BAMBOO. The organisms considered cover all three domains of life: archaea, bacteria and eukaryotes.
The introduction of high-throughput techniques for measuring gene expression by microarrays, tiling arrays or related techniques and, more recently, of the new sequencing techniques (Solexa, SOLiD, 454) is considerably changing our vision of genome evolution and our understanding of processes such as regulation and dynamics. In eukaryotes in particular, besides the fact that a higher proportion of the genome appears to be transcribed than initially believed, the complexity of the genomic structures, notably via alternative splicing, seems to challenge our idea of a genome partitioned into regions called genes. Furthermore, the importance and extent of regulatory processes at the epigenetic level are only now starting to be acknowledged. This concerns chromatin structure, nucleosome positioning and base modifications in the sequence, but also, and this is a more recent discovery, the spatial arrangement of the chromosomes inside the nucleus and the contacts that are made within and between chromosomes. It is indeed increasingly recognised that such arrangement and contacts, although dynamic, are at least in part conserved among cells of the same type, and are related to the regulation of genes, to certain types of rearrangements (as appears to be established for translocations) and even to possible cases of trans-splicing. The networks of intra- and inter-chromosomal contacts, observed for now at a still small scale, could thus represent epigenetic information that would help explain why cells with exactly the same genetic material can exhibit very different behaviours depending on the tissue where they are and the developmental stage of the organism.
In prokaryotes, the emphasis is on understanding the link between genome evolution, at a local (point) or large scale, and the life history of the organisms, more particularly the history of the relationship of the organisms with their environment. This environment is most often another organism with which the bacterial species lives in intimate relationship. The nature of this relationship, how it is influenced by the genomic context, and how it affects our understanding of the genotype-phenotype and species-environment relations are other major concerns of BAMBOO.
Proposed approaches
The various topics addressed by BAMBOO are briefly enumerated below. In methodological terms, they draw on algorithmics on texts (words) and on graphs (as a way of representing genomes that have been rearranged through evolution or are possibly rearranged through such hypothetical processes as trans-splicing). The mathematical techniques used include combinatorics, probability and statistics at all levels: modelling, algorithmic development, algorithmic complexity analysis, and evaluation of the results obtained on real data. The latter aspect requires good random models, a largely open issue where biology is concerned. The need to be efficient when treating massive or complex data may also require the elaboration of special data structures, such as smart indexes for strings but also for trees or even graphs, and/or of powerful filtering techniques.
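As a toy illustration of what an index for strings combined with a filtering step can look like, here is a minimal sketch in Python. It assumes a plain hash index of k-mers (substrings of length k) and a seed-then-verify search; the function names `build_kmer_index` and `find_occurrences` are hypothetical and are not taken from any BAMBOO software, whose actual data structures are likely to be far more elaborate.

```python
from collections import defaultdict


def build_kmer_index(text, k):
    """Map every substring of length k to the increasing list of its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def find_occurrences(index, text, pattern, k):
    """Return all start positions of pattern in text.

    The index acts as a filter: only positions where the first k letters
    of the pattern occur are checked against the full pattern.
    """
    if len(pattern) < k:
        raise ValueError("pattern must have at least k letters")
    candidates = index.get(pattern[:k], [])
    return [i for i in candidates if text.startswith(pattern, i)]


# Hypothetical toy sequence, for illustration only.
text = "ACGTACGTTACG"
index = build_kmer_index(text, 3)
hits = find_occurrences(index, text, "ACGT", 3)
```

The index is built once and reused for many patterns, which is where the gain over a naive scan would come from on large sequences.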
The main specific topics studied by BAMBOO are:
- organisation and dynamics:
  - precise detection of various genomic features such as breakpoint regions, isochores, replication sites, different types of repeats, regions of gene copy number aberration, etc.;
  - analysis of the evolution of species, chromosomes (in particular the sex chromosomes) and gene/protein families (this also draws on phylogenetic tree/network reconstruction under various models);
  - understanding of the mechanisms underlying breaks in association with other genomic characteristics (e.g. isochores, replication sites, repeats, spatial chromosomal organisation), genetic processes (e.g. regulation, splicing) and evolutionary events (e.g. recombination, mutations);
  - analysis in particular of the impact of recombination on genomic evolution in general;
  - analysis of the rearrangement scenarios observed;
  - analysis of the evolutionary and regulatory impact of repeats, notably of transposable elements;
  - inference of ancestral prokaryotic and eukaryotic gene orders;
  - more generally, exploration of the link between genotype and phenotype, and between genomic organisation and dynamics and the life traits of the organisms (relation with other organisms and with the environment).
- genetic and epigenetic:
  - inference of the network of intra- and inter-chromosomal interactions;
  - exploration of the link between such a network and, notably, regulation, splicing, replication and genomic rearrangements.
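
Among the topics listed above, the notion of a breakpoint between two gene orders lends itself to a small illustration. The following is a minimal sketch in Python, assuming each genome is a single linear chromosome given as a list of signed integers (the sign encodes the strand), that both genomes contain the same genes without duplicates, and that chromosome ends are ignored. The names `adjacencies` and `count_breakpoints` are hypothetical; this is one common textbook definition, not necessarily the one used in BAMBOO.

```python
def adjacencies(order):
    """Return the set of adjacencies of a signed gene order, in canonical form."""
    adj = set()
    for a, b in zip(order, order[1:]):
        # The pair (a, b) read on one strand is the same adjacency as
        # (-b, -a) read on the other strand; keep the smaller tuple.
        adj.add(min((a, b), (-b, -a)))
    return adj


def count_breakpoints(order_a, order_b):
    """Count the adjacencies of order_a that are not found in order_b."""
    return len(adjacencies(order_a) - adjacencies(order_b))


# Hypothetical example: genome_b is genome_a with genes 2 and 3 inverted.
genome_a = [1, 2, 3, 4, 5]
genome_b = [1, -3, -2, 4, 5]
breakpoints = count_breakpoints(genome_a, genome_b)
```

In this example the inversion breaks the adjacencies between genes 1 and 2 and between genes 3 and 4, while the adjacency between genes 2 and 3 is preserved on the opposite strand.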
