Motivation: Determining orthology relations among genes across multiple genomes is an important problem in the post-genomic era. Identifying orthologous genes can not only help predict functional annotations for newly sequenced or poorly characterized genomes, but can also help predict new protein-protein interactions. Unfortunately, determining orthology relations through computational methods is not straightforward due to the presence of paralogs. Traditional approaches have relied on pairwise sequence comparisons to construct graphs, which were then partitioned into putative clusters of orthologous groups. These methods do not attempt to preserve the non-transitivity and hierarchical nature of the orthology relation.

Results: We propose a new method, COCO-CL, for hierarchical clustering of orthology/homology relations and identification of orthologous groups of genes. Unlike previous approaches, which are based on pairwise sequence comparisons, our method explores the correlation of evolutionary histories of individual genes in a more global context. COCO-CL can be used as a semi-independent method to delineate the orthology/paralogy relation for a refined set of homologous proteins obtained using a less-conservative clustering approach, or as a refiner that removes putative out-paralogs from clusters computed using a more inclusive approach. We analyze our clustering results manually, with support from literature and functional annotations. Since our orthology determination procedure does not employ a species tree to infer duplication events, it can be used in situations when the species tree is unknown or uncertain.