EMBOSS: preg
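A regular expression is a way of specifying a flexible (ambiguous) pattern to search for: a single pattern can match many different sequences, for example by allowing several alternative residues at one position. preg searches protein sequences for matches to a regular expression. Regular expressions are widely used in programming languages and text-processing tools, so some users will already be familiar with them.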
The following is a short guide to regular expressions in EMBOSS:
The following quantifier characters specify the number of times that the character before them (in this case 'x') matches:
Quantifiers can follow any of the following types of character specification:
Combining some of these features gives these examples from the PROSITE patterns database:
'[STAGCN][RKH][LIVMAFY]$'
which is the 'Microbodies C-terminal targeting signal'.
'LP.TG[STGAVDE]'
which is the 'Gram-positive cocci surface proteins anchoring hexapeptide'.
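To experiment with these two patterns, Python's re module can stand in for preg's matcher. For simple patterns like these the syntax should be the same, but that is an assumption: preg's own engine may differ in edge cases. The sequences below are invented for illustration, and first_hit is a hypothetical helper that reports a 1-based start position, as is usual in EMBOSS.

```python
import re

# The two PROSITE-derived patterns quoted above, compiled as regular expressions.
MICROBODY_SIGNAL = re.compile(r"[STAGCN][RKH][LIVMAFY]$")
GRAM_POS_ANCHOR = re.compile(r"LP.TG[STGAVDE]")

# Hypothetical toy sequences, invented for illustration only.
ends_with_skl = "MKLAAGSKL"   # ends in S-K-L, so the '$' anchor is satisfied
internal_skl = "MSKLAAGAA"    # contains S-K-L, but not at the C-terminus
anchor_motif = "MAALPETGEKK"  # contains the hexapeptide L-P-E-T-G-E


def first_hit(pattern, seq):
    """Return (1-based start, matched residues) for the first match, or None."""
    m = pattern.search(seq)
    if m is None:
        return None
    return m.start() + 1, m.group()


print(first_hit(MICROBODY_SIGNAL, ends_with_skl))  # (7, 'SKL')
print(first_hit(MICROBODY_SIGNAL, internal_skl))   # None
print(first_hit(GRAM_POS_ANCHOR, anchor_motif))    # (4, 'LPETGE')
```

The `$` anchor is what rejects the second sequence: it contains SKL, but not at the end.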
Regular expressions are case-sensitive. The pattern 'AAAA' will not match the sequence 'aaaa'.
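The same behaviour can be seen in Python's re module, used here only as an illustration of a case-sensitive engine. re.IGNORECASE is a Python option, not a preg qualifier; preg's qualifier table lists nothing for case handling.

```python
import re

PATTERN = "AAAA"
SEQUENCE = "aaaa"

# Default: matching is case-sensitive, so nothing is found.
plain = re.search(PATTERN, SEQUENCE)
# Python-specific option: ignore case while matching.
ignore_case = re.search(PATTERN, SEQUENCE, re.IGNORECASE)
# Alternative: normalise the sequence to upper case before searching.
upper_cased = re.search(PATTERN, SEQUENCE.upper())

print(plain is None, ignore_case is not None, upper_cased is not None)  # True True True
```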
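A sample session, searching every sequence in the database sw (sw:*) for the pattern gc[^g] (g, then c, then any character other than g):
% preg
Input sequence: sw:*
Output file [5h1d_fugru.preg]:
Regular expression pattern: gc[^g]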
Mandatory qualifiers:
[-sequence] seqall Sequence database USA
[-outfile] outfile Output file name
[-pattern] regexp Regular expression pattern
Optional qualifiers: (none)
Advanced qualifiers: (none)
| Mandatory qualifiers | Allowed values | Default |
|---|---|---|
| [-sequence] (Parameter 1): Sequence database USA | Readable sequence(s) | Required |
| [-outfile] (Parameter 2): Output file name | Output file | <sequence>.preg |
| [-pattern] (Parameter 3): Regular expression pattern | Any regular expression pattern is accepted | Required |
| Optional qualifiers | Allowed values | Default |
| (none) | | |
| Advanced qualifiers | Allowed values | Default |
| (none) | | |
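Because preg has only these three mandatory qualifiers and no optional or advanced ones, supplying all three on the command line should leave nothing to prompt for. The Python sketch below assumes an EMBOSS installation with preg on the PATH and a database reachable as sw; build_preg_command and run_preg are hypothetical helper names. Passing the arguments as a list, rather than through a shell, keeps the shell from expanding the `*` in sw:* or the brackets in gc[^g].

```python
import shlex
import shutil
import subprocess


def build_preg_command(sequence, pattern, outfile):
    """Return an argv list that supplies all three mandatory preg qualifiers."""
    return ["preg", "-sequence", sequence, "-outfile", outfile, "-pattern", pattern]


def run_preg(cmd):
    """Run preg if an executable is on the PATH; return None otherwise."""
    if shutil.which(cmd[0]) is None:
        return None
    # No shell is involved, so '*' and '[^g]' reach preg unchanged.
    return subprocess.run(cmd, check=True)


cmd = build_preg_command("sw:*", "gc[^g]", "5h1d_fugru.preg")
# Equivalent shell command line, with the glob characters quoted.
print(shlex.join(cmd))
# prints: preg -sequence 'sw:*' -outfile 5h1d_fugru.preg -pattern 'gc[^g]'
run_preg(cmd)
```

When typing the command in a terminal instead, quote sw:* and gc[^g] exactly as the printed line does.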
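The output file, 5h1d_fugru.preg, lists each sequence that contains a match, followed by one line per match giving the sequence name, the position of the match and the matched residues:
Matches in CO9_FUGRU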
CO9_FUGRU 522 GCQ
Matches in D1DR_FUGRU
D1DR_FUGRU 27 GCF
D1DR_FUGRU 345 GCH
Matches in D5DR_FUGRU
D5DR_FUGRU 43 GCV
D5DR_FUGRU 349 GCS
Matches in HD_FUGRU
HD_FUGRU 982 GCC
Matches in SYH_FUGRU
SYH_FUGRU 15 GCR
Matches in SYV_FUGRU
SYV_FUGRU 329 GCD
SYV_FUGRU 1128 GCA
Matches in TCPD_FUGRU
TCPD_FUGRU 291 GCN
TCPD_FUGRU 375 GCA
Matches in ACH2_DROME
ACH2_DROME 4 GCC
ACH2_DROME 433 GCN
Matches in LACY_ECOLI
LACY_ECOLI 147 GCV
LACY_ECOLI 175 GCA
LACY_ECOLI 332 GCF
Matches in BGAL_ECOLI
BGAL_ECOLI 121 GCY
Matches in 12S1_ARATH
12S1_ARATH 111 GCA
Matches in OPSD_HUMAN
OPSD_HUMAN 109 GCN
Matches in AMIC_PSEAE
AMIC_PSEAE 80 GCY
Matches in AMIR_PSEAE
AMIR_PSEAE 36 GCS
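To make the report layout concrete, here is a rough Python re-creation of the search. It is not preg's actual code, and it rests on assumptions: positions are 1-based (the usual EMBOSS convention), matches are found left to right without overlap (as Python's re.finditer does), and the pattern is written in upper case because Python's re is case-sensitive and the toy sequences are upper case. The function preg_like and the sequences are hypothetical.

```python
import re


def preg_like(sequences, pattern):
    """Return report lines laid out like the sample output above.

    sequences: dict mapping sequence name -> residue string.
    Positions are reported 1-based (assumed EMBOSS convention).
    """
    regex = re.compile(pattern)
    lines = []
    for name, seq in sequences.items():
        hits = list(regex.finditer(seq))
        if not hits:
            continue  # sequences without a match are not listed
        lines.append(f"Matches in {name}")
        for m in hits:
            lines.append(f"{name} {m.start() + 1} {m.group()}")
    return lines


# Hypothetical toy sequences, not real database entries.
toy = {
    "TOY1_TEST": "MGCQAAGCG",  # GCQ matches; the final GCG does not
    "TOY2_TEST": "MKLLV",      # no GC at all, so it is omitted
}
for line in preg_like(toy, "GC[^G]"):
    print(line)
```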
Other EMBOSS programs also search sequences for patterns, and some accept simpler patterns that may be easier for users who have never used regular expressions before:
| Program name | Description |
|---|---|
| dreg | Regular expression search of a nucleotide sequence |
| fuzznuc | Nucleic acid pattern search |
| fuzzpro | Protein pattern search |
| fuzztran | Protein pattern search after translation |
| patmatdb | Search a protein sequence with a motif |
| patmatmotifs | Search a PROSITE motif database with a protein sequence |
| pscan | Scans proteins using PRINTS |
| tfscan | Scans DNA sequences for transcription factors |