Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks, or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the resulting clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located its clusters in the human genome, quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although there is considerable variation among categories. Genes form the fewest clusters, but these are the longest, both in bp and in the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, compared with DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found that the clusters generated by GenomeCluster can in turn be clustered into high-level super-clusters.
The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.
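
As a rough illustration of the quantities described above (clusters of coordinate-defined elements, and the clustering level as the proportion of clustered elements), the following is a minimal sketch only. It assumes a simple fixed distance threshold, which is not necessarily the criterion GenomeCluster uses; the function names and the parameters `max_gap` and `min_size` are hypothetical and do not come from the original work.

```python
from itertools import groupby


def find_clusters(elements, max_gap, min_size=2):
    """Group (chrom, start, end) elements separated by at most max_gap bp.

    Assumption: a fixed distance threshold stands in for whatever
    criterion the real algorithm applies. Groups with fewer than
    min_size elements are not reported as clusters.
    """
    clusters = []
    # Sorting tuples orders by chromosome name, then start, then end.
    for _, group in groupby(sorted(elements), key=lambda e: e[0]):
        current, current_end = [], None
        for elem in group:
            _, start, end = elem
            # Close the running group when the gap is too large.
            if current and start - current_end > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current.append(elem)
            # Track the rightmost coordinate seen in the running group.
            current_end = end if len(current) == 1 else max(current_end, end)
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


def clustering_level(elements, clusters):
    """Proportion of elements that belong to some cluster."""
    if not elements:
        return 0.0
    return sum(len(c) for c in clusters) / len(elements)


def cluster_span(cluster):
    """Collapse a cluster to one (chrom, start, end) interval."""
    return (cluster[0][0],
            min(e[1] for e in cluster),
            max(e[2] for e in cluster))
```

Passing the spans returned by `cluster_span` back into `find_clusters` with a larger `max_gap` is one way to sketch the 'clusters-within-clusters' idea, again under the same fixed-threshold assumption.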


