Research in Bioinformatics

 



Please send details of research projects in bioinformatics to BRUHK support


Assessing Clusters and Motifs from Gene Expression Data
Lars M. Jakt, Liang Cao, Kathryn S.E. Cheah, and David K. Smith
Genome Research, 2001, 11:112-123
Data associated with the paper
Correspondence to David K. Smith
Large-scale gene expression studies and genomic sequencing projects are providing vast amounts of information that can be used to identify or predict cellular regulatory processes. Genes can be clustered on the basis of the similarity of their expression profiles or function and these clusters are likely to contain genes that are regulated by the same transcription factors. Searches for cis-regulatory elements can then be undertaken in the noncoding regions of the clustered genes. However, it is necessary to assess the efficiency of both the gene clustering and the postulated regulatory motifs, as there are many difficulties associated with clustering and determining the functional relevance of matches to sequence motifs. We have developed a method to assess the potential functional significance of clusters and motifs based on the probability of finding a certain number of matches to a motif in all of the gene clusters. To avoid problems with threshold scores for a match, the top matches to a motif are taken in several sample sizes. Genes from a sample are then counted by the cluster in which they appear. The probability of observing these counts by chance is calculated using the hypergeometric distribution. Because of the multiple sample sizes, strong and weak matching motifs can be detected and refined and significant matches to motifs across cluster boundaries are observed as all clusters are considered. By applying this method to many motifs and to a cluster set of yeast genes, we detected a similarity between Swi Five Factor and forkhead proteins and suggest that the currently unidentified Swi Five Factor is one of the yeast forkhead proteins.
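
The counting step described above can be illustrated with a minimal Python sketch. It is not the code used for the paper. The function names `hypergeom_tail` and `cluster_pvalues`, the use of an upper-tail probability (observing at least the counted number of genes), and the input layout are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def hypergeom_tail(population, cluster_size, sample_size, observed):
    """P(X >= observed) when sample_size genes are drawn without replacement
    from population genes, cluster_size of which belong to one cluster.
    Using the upper tail is an assumption of this sketch."""
    upper = min(cluster_size, sample_size)
    favourable = sum(
        comb(cluster_size, k) * comb(population - cluster_size, sample_size - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, sample_size)


def cluster_pvalues(ranked_genes, cluster_of, sample_sizes):
    """ranked_genes: genes ordered from best to worst motif match (hypothetical input).
    cluster_of: mapping gene -> cluster label covering every gene.
    Returns {(sample_size, cluster): probability} for each sample size."""
    population = len(cluster_of)
    cluster_sizes = Counter(cluster_of.values())
    results = {}
    for n in sample_sizes:
        top = ranked_genes[:n]  # top matches, no score threshold needed
        counts = Counter(cluster_of[gene] for gene in top)
        for cluster, size in cluster_sizes.items():
            results[(n, cluster)] = hypergeom_tail(
                population, size, len(top), counts[cluster]
            )
    return results
```

A small probability for one cluster at some sample size would suggest that the motif's best matches are concentrated in that cluster more than expected by chance. Because every cluster is scored at every sample size, any real analysis would also need to account for multiple testing.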

Gene Regulation
L. M. Jakt, L. Cao, K. S. E. Cheah, M. M. Cheung, D. K. Smith
    Transcription factors are one of the means by which the expression of genes in a cell is regulated. These proteins bind to DNA segments close to a gene and either allow or prevent its transcription. If the genes regulated by a transcription factor are known, it may be possible to identify their regulatory region by searching for common sequence patterns in the genomic DNA near these genes. Once a site has been identified, it may be possible to predict other genes that are regulated by the same factor.

    Now that whole genome sequences and gene expression studies are available, other approaches to identifying regulatory regions are possible. If a group of genes has a similar expression profile, it is reasonable to suggest that they share a common regulatory mechanism. Searching the DNA near these genes for common patterns may then reveal potential regulatory elements. We are using this general approach to identify regulatory regions. Because transcription factors can bind to rather variable DNA sequences, the patterns sought are fairly "subtle". In addition, most factors do not act alone, so it may be necessary to find a second pattern nearby to estimate the likelihood that a functional site has been identified.
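
One simple way to allow for variable binding sites is to write a motif with IUPAC ambiguity codes and test which upstream regions contain a match. The Python sketch below illustrates only that idea and is not the project's search method. The function names and the `upstream` mapping of gene name to sequence are hypothetical, and only the forward strand is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile a degenerate motif such as 'ACGYGT' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def genes_with_motif(motif, upstream):
    """Return the sorted names of genes whose upstream sequence contains the motif.
    upstream: mapping gene name -> DNA sequence (hypothetical input layout)."""
    pattern = motif_to_regex(motif)
    return sorted(
        gene for gene, seq in upstream.items() if pattern.search(seq.upper())
    )
```

The genes returned for a motif could then be compared with the expression clusters to judge whether the pattern is associated with a shared expression profile.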


Multi-agent Searching
L. M. H. Ko, C. S. K. Yeung, L. M. Jakt, L. Cao, K. S. E. Cheah, D. K. Smith
    When biological and biomedical researchers need to analyse their genetic data, they frequently have to consult a wide range of resources. Most of the specialised tools and databases that are required are located at separate web sites. From an outside user's point of view, this lack of coordination of web resources is a considerable problem for contemporary research.

    To overcome this problem and to provide a single point of entry to a range of web services, a Java-based multi-agent search system is being developed. Using web technology, it is possible to design agents that search a variety of sites in parallel and return information to the user. This web-based project has the advantages of platform independence and portability. A working prototype system has been established.
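
The general pattern of dispatching one query to several agents in parallel can be sketched as follows. The system described here is Java-based. This Python sketch is only an illustration of the pattern, and `search_all` and the agent callables are hypothetical stand-ins for agents that would each query a remote site.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, agents):
    """Run every agent on the same query in parallel and collect the answers.
    agents: mapping name -> callable(query) returning that site's results.
    A failing agent is reported under its name instead of aborting the search."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        # Submit all agents first so that they run concurrently.
        futures = {name: pool.submit(agent, query) for name, agent in agents.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = f"error: {exc}"
    return results
```

Collecting results per agent keeps one unavailable site from hiding the answers returned by the others, which matters when the sites are run independently of each other.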


Consensus Sequences
D. K. Smith
    Consensus sequences are widely used to represent families of aligned sequences. However, many methods of deriving them hide a large amount of the information in the sequence alignment. A new way to display consensus sequences has been developed that reveals more of the information about the sequences while maintaining a clear presentation. This technique is based on identifying the major sequence components at each position of a set of aligned sequences.
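
As an illustration of what identifying the major components at each position might involve, the Python sketch below lists, for every alignment column, the residues that reach a chosen frequency. It is not the display technique itself. The function name `major_components` and the `min_fraction` cut-off are assumptions made for this example.

```python
from collections import Counter


def major_components(alignment, min_fraction=0.2):
    """For each column of an alignment, return (residue, fraction) pairs for the
    residues whose frequency is at least min_fraction, most frequent first.
    alignment: list of equal-length aligned sequences (gaps count as residues)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    columns = []
    for column in zip(*alignment):
        total = len(column)
        counts = Counter(column)
        columns.append(
            [(res, n / total) for res, n in counts.most_common()
             if n / total >= min_fraction]
        )
    return columns
```

A single-letter consensus would keep only the first entry of each list. Retaining the other entries preserves information about positions where two or more residues are common.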

     Methods are being developed to allow the matching of new sequences to previously aligned families. This will show where a new sequence contains common or rare elements when compared with the sequence family and, indeed, whether or not the new sequence is a member of the family. Use of this technique will allow the maintenance of collections of aligned sequences and the validation of existing compilations.
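
A minimal Python sketch of such a comparison is given below, assuming the new sequence has already been aligned to the family. The labels "common", "rare" and "absent", the `rare_below` threshold, and the function name `classify_positions` are illustrative assumptions, not the method under development.

```python
from collections import Counter


def classify_positions(new_seq, alignment, rare_below=0.2):
    """Label each residue of new_seq by how often it occurs at the same
    position in the aligned family: 'common', 'rare' or 'absent'."""
    lengths = {len(seq) for seq in alignment}
    if lengths != {len(new_seq)}:
        raise ValueError("new_seq must have the same length as the aligned family")
    labels = []
    for residue, column in zip(new_seq, zip(*alignment)):
        fraction = Counter(column)[residue] / len(column)
        if fraction == 0:
            labels.append("absent")
        elif fraction < rare_below:
            labels.append("rare")
        else:
            labels.append("common")
    return labels
```

A sequence with many "absent" positions would be a doubtful family member, whereas isolated "rare" positions point to the places where it differs from most of the family.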

The programs and documentation are available.




Last Modified 16/01/01 by BRUHK support; Copyright © 1999-2001 The University of Hong Kong