Research in Bioinformatics |
| Assessing Clusters and Motifs from Gene Expression Data | |
| Lars M. Jakt, Liang Cao, Kathryn S.E. Cheah, and David K. Smith | |
| Genome Research, 2001, 11:112-123 | |
| Data associated with the paper | |
| Correspondence to David K. Smith | |
| Large-scale gene expression studies and genomic sequencing projects are providing vast amounts of information that can be used to identify or predict cellular regulatory processes. Genes can be clustered on the basis of the similarity of their expression profiles or function and these clusters are likely to contain genes that are regulated by the same transcription factors. Searches for cis-regulatory elements can then be undertaken in the noncoding regions of the clustered genes. However, it is necessary to assess the efficiency of both the gene clustering and the postulated regulatory motifs, as there are many difficulties associated with clustering and determining the functional relevance of matches to sequence motifs. We have developed a method to assess the potential functional significance of clusters and motifs based on the probability of finding a certain number of matches to a motif in all of the gene clusters. To avoid problems with threshold scores for a match, the top matches to a motif are taken in several sample sizes. Genes from a sample are then counted by the cluster in which they appear. The probability of observing these counts by chance is calculated using the hypergeometric distribution. Because of the multiple sample sizes, strong and weak matching motifs can be detected and refined and significant matches to motifs across cluster boundaries are observed as all clusters are considered. By applying this method to many motifs and to a cluster set of yeast genes, we detected a similarity between Swi Five Factor and forkhead proteins and suggest that the currently unidentified Swi Five Factor is one of the yeast forkhead proteins. | |
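The counting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names `hypergeom_tail` and `cluster_pvalues` are hypothetical, and scoring each cluster by the upper-tail probability (at least the observed count) is an assumed reading of "the probability of observing these counts by chance".

```python
from collections import Counter
from math import comb


def hypergeom_tail(population, cluster_size, sample_size, observed):
    """Return P(X >= observed) for a hypergeometric X (assumed upper tail).

    population:   total number of genes in all clusters
    cluster_size: number of genes in the cluster being scored
    sample_size:  number of top-matching genes taken as the sample
    observed:     how many sampled genes fall in this cluster
    """
    upper = min(cluster_size, sample_size)
    total = comb(population, sample_size)
    favourable = sum(
        comb(cluster_size, k) * comb(population - cluster_size, sample_size - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


def cluster_pvalues(cluster_of, top_genes):
    """Count sampled genes per cluster and score each count.

    cluster_of: dict mapping every gene to its cluster label (hypothetical input)
    top_genes:  distinct genes holding the top matches to a motif
    """
    sizes = Counter(cluster_of.values())
    hits = Counter(cluster_of[gene] for gene in top_genes)
    population = len(cluster_of)
    return {
        label: hypergeom_tail(population, size, len(top_genes), hits[label])
        for label, size in sizes.items()
    }
```

Calling `cluster_pvalues` once per sample size (for example the top 20, 50 and 100 matches; these sizes are illustrative only) gives one probability per cluster per sample size, so no single threshold score for a match is needed.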
| Gene Regulation | |
| M. L. Jakt, L. Cao, K. S. E. Cheah, M. M. Cheung, D. K. Smith | |
| The action of transcription factors is one of the
ways in which the expression of genes in a cell is regulated. These proteins
bind to DNA segments close to a gene to either allow or prevent its transcription.
If it is known which genes a transcription factor regulates, then it may be possible
to identify the regulatory region for these genes by searching for
common sequence patterns in the genomic DNA near these genes. Once a site is
identified, it may be possible to predict other genes that are regulated by this
factor.
Now that whole genome sequences and gene expression studies are available, other approaches to identifying regulatory regions are possible. If a group of genes has a similar expression profile, then it is reasonable to suggest that they may share a common regulatory mechanism. By searching the DNA near these genes for common patterns, potential regulatory elements may be found. We are using this general approach to identify regulatory regions. Because transcription factors can bind to rather variable DNA sequences, we are looking for fairly "subtle" patterns. In addition, most factors do not act alone, so it may be necessary to find another nearby pattern to estimate the likelihood that a functional site has been identified. |
|
| Multi-agent Searching | |
| L. M. H. Ko, C. S. K. Yeung, M. L. Jakt, L. Cao, K. S. E. Cheah, D. K. Smith | |
| When biological and biomedical researchers
need to analyse their genetic data, they frequently have to consult a wide range of
resources. Most of the specialised tools and databases that are required
are spread across many separate web sites. This lack of coordination of web
resources, from an outside user's point of view, is a considerable problem for
contemporary research.
To overcome this and to provide a single point of entry to a range of web services, a Java-based multi-agent search system is being developed. Using standard web technology, it is possible to design agents that search a variety of sites in parallel and return the information to the user. This web-based project has the advantages of platform independence and portability, and a working prototype system has been established. |
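The idea of agents searching several sites in parallel can be illustrated with a short sketch. The system described above is Java-based; Python is used here only for brevity, and everything in the block is an assumption: `run_agents` is a hypothetical name, and each agent is modelled as a plain callable that takes a query and returns that site's results, with no real site or protocol implied.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_agents(agents, query, timeout=None):
    """Send one query to every agent in parallel and collect the answers.

    agents: dict mapping a site name to a callable taking the query
            (hypothetical stand-ins for per-site search agents)
    Returns a dict mapping site name to the agent's result, or to the
    exception it raised, so one failing site does not lose the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {pool.submit(agent, query): name for name, agent in agents.items()}
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # keep the failure alongside the successes
                results[name] = exc
    return results
```

Returning failures alongside results is one possible design choice; it lets a single point of entry report which sites answered and which did not.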
|
| Consensus Sequences | |
| D. K. Smith | |
|
Consensus sequences are widely used to represent
families of aligned sequences. However, many methods hide a large amount of
the information in the sequence alignment. A new way to display consensus
sequences, which reveals more of the information about the sequences, while
maintaining a clear presentation, has been developed. This technique is
based on identifying the major sequence components at each position of a
set of aligned sequences.
  Methods are being developed to allow the matching of new sequences to previously aligned families. This will show where a new sequence contains common or rare elements when compared with the sequence family and, indeed, whether or not the new sequence is a member of the family. Use of this technique will allow the maintenance of collections of aligned sequences and the validation of existing compilations. The programs and documentation are available. |
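Identifying the major sequence components at each position can be sketched as below. This is an illustrative assumption rather than the method behind the available programs: the names `major_components` and `unusual_positions`, and the frequency cut-off `min_fraction`, are hypothetical.

```python
from collections import Counter


def major_components(alignment, min_fraction=0.2):
    """List, for each column, the residues at or above min_fraction.

    alignment: equal-length aligned sequences (strings)
    Returns one list per column of (residue, fraction) pairs, most
    frequent first, with ties broken alphabetically.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    columns = []
    for column in zip(*alignment):
        counts = Counter(column)
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        columns.append(
            [(res, n / len(column)) for res, n in ranked if n / len(column) >= min_fraction]
        )
    return columns


def unusual_positions(components, sequence):
    """Return the positions where a new sequence has no major component."""
    if len(sequence) != len(components):
        raise ValueError("sequence length must match the alignment length")
    return [
        i
        for i, (res, major) in enumerate(zip(sequence, components))
        if res not in {r for r, _ in major}
    ]
```

A new sequence with few unusual positions resembles the family; many unusual positions suggest it may not be a member. Where to draw that line is left open here.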