
WHALES - Help
WHALES - (Web Homology Alert Service)
is a sequence alert service. It allows NIH scientists to
define a profile (text terms or sequences) which will be searched each week
against the new sequences in the major DNA/Protein databases. The results are
emailed once a week.
WHALES is available only to NIH personnel, and is developed and maintained by
the Helix Systems staff, CIT, NIH.
There are two distinct search modes:
- TEXT SEARCHING - Users define an individual profile
containing text words or phrases. These terms are searched against new sequences in the
databases each week, and the results emailed to the user.
e.g. searching Genbank for 'mhc'.
- HOMOLOGY SEARCHING - Users enter a DNA or protein
sequence in the profile. A homology search (Blast, Fasta, or WU-Blast) will be run each week
against the new sequences in the chosen database.
e.g. a protein sequence is entered in the profile. Each week, a Blast search is run against that week's new entries in the GenPept database, and the top hits and alignments are emailed to the user.
General Help -- how to create, edit, delete or list profiles.
Text Searches - Available options and result formats for text searches.
Text Search Examples
Sequence Homology Searches - available options for sequence homology alerts.
Sequence Homology Alert Examples
Databases:
The following databases are available. Profiles are run against a database which contains all new entries from the
preceding week.
| Database | Sequence Type | Details |
|---|---|---|
| Genbank | DNA | from NCBI. |
| Genpept | Protein | from NCI, Frederick. |
| UniProt | Protein | SwissProt + Trembl downloaded from UniProt. |
| PDB | Protein | from RCSB. The homology searches are run against the derived sequence data for each PDB protein chain. |
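For readers who keep their own notes or scripts about which database suits a query, the table above can be written down as a small lookup. This is only an illustrative Python sketch: the names `DATABASES` and `databases_for` are hypothetical and are not part of WHALES.

```python
# Hypothetical lookup copied from the database table; not part of WHALES.
DATABASES = {
    "Genbank": "DNA",
    "Genpept": "Protein",
    "UniProt": "Protein",
    "PDB": "Protein",
}

def databases_for(sequence_type):
    """Return the names of the databases that hold the given sequence type."""
    wanted = sequence_type.lower()
    return [name for name, kind in DATABASES.items() if kind.lower() == wanted]

print(databases_for("protein"))  # ['Genpept', 'UniProt', 'PDB']
print(databases_for("DNA"))      # ['Genbank']
```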
General Help
- Creating a Text or Homology profile
- Go to the main WHALES webpage. Decide whether you want to search for a text word or phrase (Text Searches) or enter a sequence for a weekly homology search (Homology Searches), and click on 'Text Search Alerts' or 'Sequence Homology Search Alerts'. On the resulting page, select 'Create a new search profile'. Enter your NIH email address and a simple profile identifier. Enter your text terms or your sequence, select the options, and click on 'Establish profile'.
- Editing a Profile
- You will get one email message from WHALES each week for each profile you have set up. This message will include links to 'Edit this profile' and 'Delete this profile'. Clicking the edit link in your mail client opens a webpage where you can edit the profile.
- Alternatively, go to the main WHALES page and select 'Text Search Alerts' or 'Sequence Homology Search Alerts'. On the resulting page, enter your NIH email address and the profile name in the boxes under the 'Edit' button, and then click on 'Edit an Existing Search Profile'.
The resulting webpage will have your profile loaded into the boxes, and you can modify or select
any parameters.
Click on the 'Save Edited Profile' button to save your changes.
- Deleting a Profile
- You will get one email message from WHALES each week for each profile you have set up. This message will include links to 'Edit this profile' and 'Delete this profile'. Clicking the delete link in your mail client opens a webpage where you can delete the profile.
- Alternatively, go to the main WHALES page. Enter your NIH email address and the profile name into the boxes under 'Delete an existing search profile', and then click the 'Delete an existing search profile' button.
The resulting web page will display a summary of the profile and ask you to confirm this deletion. Click on 'Delete this profile now', and the profile will be deleted.
- Getting a list of profiles
- If you want to see a list of all your profiles, enter your email address in the box under
'Mail a list of my profiles to me', and click the button.
A summary of each profile will be emailed to you. The list will not appear on the
web page for privacy reasons. In the email message you receive, each profile will have
an 'Edit' and 'Delete' button for your convenience.
WHALES Text Searches
A WHALES Text Search profile consists of a single word or a group of words. These words are searched for in the headers of the week's sequences. Words can be combined with AND, OR, or NOT, and parentheses can be used for grouping. You can choose to search 'Alltext', 'Description', 'Authors', or 'Organism'.
All searching is case-insensitive.
Logical Operators: The operators & (AND), | (OR), and ! (NOT) can be used to combine words in multiword searches. However, simple text profiles are more likely to be successful than searches which contain complex patterns.
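To show how a profile written with these operators can be read, here is a minimal Python sketch that evaluates a query against one header line. It is not the WHALES implementation. The function names are hypothetical, and the sketch assumes that NOT binds tighter than AND, that AND binds tighter than OR, and that a term matches if it appears anywhere in the text. Wildcards such as `*` are not handled here.

```python
import re

def tokenize(query):
    """Split a query into operators, parentheses, and word or phrase tokens."""
    parts = re.findall(r"[&|!()]|[^&|!()]+", query)
    return [p.strip() for p in parts if p.strip()]

def matches(query, text):
    """Return True if text satisfies the boolean query, ignoring case."""
    tokens = tokenize(query)
    haystack = text.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ended unexpectedly")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # Lowest precedence (assumed): term | term | ...
        result = parse_and()
        while peek() == "|":
            take()
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        # Middle precedence (assumed): factor & factor & ...
        result = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        # Highest precedence (assumed): !factor, (expression), or a plain term.
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected operator {tok!r}")
        return tok.lower() in haystack  # case-insensitive substring match

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

header = "Homo sapiens MHC class I antigen (HLA-A) gene"
print(matches("MHC & Class & antigen", header))  # True
print(matches("mhc & !antigen", header))         # False
```

Words that are not separated by an operator, such as `calcium channel`, are treated as a single phrase in this sketch.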
Results: The results options are:
- Name Only List: A list of sequence names which match the query. Each sequence name is a link to the original entry at NCBI, UniProt or the PDB. This option will produce the shortest output.
- Simple Entry List: This option returns the link to the entry, and also the ID, Accession Number and Description of each entry. This is the default.
- Selected fields: By selecting this option, you can define exactly which fields should be returned.
- Selected Fields + Sequence: This is exactly like the option above, but the sequence is also returned in its 'native' (i.e. Genbank-format or Uniprot-format) format.
- Selected fields + sequence (fasta): Identical to the above option, except that the sequence is returned in Fasta format.
- Native format: The entire sequence entry is returned, with headers and sequence. This option should be used with extreme care, as it can result in very large email messages.
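For readers unfamiliar with Fasta format, the sketch below shows what a Fasta record looks like: a header line starting with `>` followed by the sequence on wrapped lines. It is only an illustration in Python. The function name `to_fasta`, the 60-character line width, and the example identifier are assumptions, not the exact layout WHALES emails.

```python
def to_fasta(entry_id, description, sequence, width=60):
    """Format one sequence as a Fasta record with fixed-width sequence lines."""
    header = f">{entry_id} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines)

# A made-up entry, used only to show the layout.
print(to_fasta("EXAMPLE1", "hypothetical protein", "MKT" * 25))
```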
Text Search Examples:
- A simple text search for fruit fly calcium channels in UniProt
  - All Text: calcium channel
  - Organism: Drosophila*
  - Database: UniProt
  - Leave the other fields empty or with the preset defaults.
- Text search for human nucleotide sequences relating to MHC Class 1 antigens
  - Description: MHC & Class & antigen
  - Organism: *sapiens
  - Database: Genbank
  - Leave the other fields empty or with the preset defaults.
  - (Some Genbank entries have an Arabic numeral (1) and some have a Roman numeral (I) for the antigen class type, so it's safest to leave that character out entirely when setting up your profile.)
- Text search for mouse proteins of about 350 amino acids.
  - Organism: mus musculus
  - SeqLength Greater Than: 340 but less than 360
  - Databases: UniProt, Genpept, PDB
  - Leave the other fields empty or with the preset defaults.
- Keep track of any new structural data from Ian Wilson's lab.
  - Authors: Wilson, I.A.
  - Databases: PDB
  - (Note that all authors' names are converted to 'Lastname, Initial.Initial' format, with no spaces between the initials. If you don't know the author's initials, 'Wilson, *' will find all Wilsons in the author field.)
- Text search for 'tyrosine kinase' with actual sequences returned.
  - All Text: tyrosine kinase
  - Format Output as: Selected fields + sequence
  - Include only these fields as output: click on the fields you want, e.g. ID, Description, Date and Organism.
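The wildcard and sequence-length settings used in the examples above can be pictured as a simple filter applied to each new entry. The Python sketch below is an illustration only, not the WHALES implementation. The class `TextProfile`, its field names, and the function `entry_matches` are hypothetical. It assumes that a `*` pattern is compared against the whole Organism field and that both length limits are exclusive.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional

@dataclass
class TextProfile:
    """Hypothetical stand-in for a few of the text profile fields."""
    organism: Optional[str] = None    # may contain '*' wildcards
    min_length: Optional[int] = None  # "SeqLength Greater Than"
    max_length: Optional[int] = None  # "but less than"

def entry_matches(profile, organism, length):
    """Return True if an entry passes the organism and length settings."""
    if profile.organism is not None:
        # Lower-case both sides so the comparison is case-insensitive.
        if not fnmatchcase(organism.lower(), profile.organism.lower()):
            return False
    if profile.min_length is not None and not length > profile.min_length:
        return False
    if profile.max_length is not None and not length < profile.max_length:
        return False
    return True

mouse = TextProfile(organism="mus musculus", min_length=340, max_length=360)
print(entry_matches(mouse, "Mus musculus", 350))  # True
print(entry_matches(mouse, "Mus musculus", 360))  # False

fly = TextProfile(organism="Drosophila*")
print(entry_matches(fly, "Drosophila melanogaster", 1200))  # True
```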
WHALES Sequence Homology Searches
For a WHALES homology search alert, the user enters one or more sequences and selects Blast, Fasta or WU-Blast searches, the database to be queried, and an output format. The homology search is run each week against the week's new sequences, and the results emailed to the user.
- Input Sequence:
- The input sequence can be cut-and-pasted directly into the box, or uploaded as a file from your computer. The sequence should be in one of the following formats: Genbank, EMBL, SwissProt, Fasta, ASN.1, GCG, PIR, IG (Stanford) or Phylip.
- Input Sequence Type:
- Select the sequence type as 'Nucleic Acid', 'Protein', or 'Peptide'.
- Database:
- Select the database: Genbank, Genpept, UniProt, Protein Data Bank (PDB), or 'Protein' (UniProt + Genpept + PDB).
- Comparison Program:
- Select Blast, Fasta or WU-Blast as the homology search program. For a 'Peptide Mix' sequence, Fasta is the only acceptable choice.
- Type of Output:
- Select 'Top Scores Only', or 'Top Scores and Alignments'.
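The choices listed above can be checked for consistency before a profile is submitted. The Python sketch below is an illustration only: the function `check_homology_profile` and the constant names are hypothetical, the option labels are copied from the list above, and the only cross-field rule applied is the one stated there, that peptide sequences require Fasta.

```python
# Option labels copied from the list above; the names are hypothetical.
SEQUENCE_TYPES = {"Nucleic Acid", "Protein", "Peptide"}
DATABASES = {"Genbank", "Genpept", "UniProt", "PDB", "Protein"}
PROGRAMS = {"Blast", "Fasta", "WU-Blast"}
OUTPUT_TYPES = {"Top Scores Only", "Top Scores and Alignments"}

def check_homology_profile(sequence_type, database, program, output_type):
    """Return a list of problems; an empty list means the choices are consistent."""
    problems = []
    if sequence_type not in SEQUENCE_TYPES:
        problems.append(f"unknown sequence type: {sequence_type}")
    if database not in DATABASES:
        problems.append(f"unknown database: {database}")
    if program not in PROGRAMS:
        problems.append(f"unknown comparison program: {program}")
    if output_type not in OUTPUT_TYPES:
        problems.append(f"unknown output type: {output_type}")
    # The one cross-field rule given in the help text.
    if sequence_type == "Peptide" and program in PROGRAMS and program != "Fasta":
        problems.append("peptide sequences can only be searched with Fasta")
    return problems

print(check_homology_profile("Protein", "Protein", "Blast", "Top Scores and Alignments"))
print(check_homology_profile("Peptide", "UniProt", "Blast", "Top Scores Only"))
```

The first call prints an empty list. The second reports the peptide restriction.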
Example WHALES Sequence Homology Searches
- Keep track of new proteins that are similar to your favourite protein sequence. Use the Homology search.
  - Input sequence: paste in your sequence.
  - Input sequence type: Protein
  - Database to search: SwissProt + GenPept + PDB
  - Comparison program: Blast
  - Type of output: Top Scores + Alignments
- Search for homology to a nucleic acid sequence. Use the Homology search.
  - Input sequence: paste in your sequence.
  - Input sequence type: Nucleic acid
  - Database to search: Genbank
  - Comparison program: Fasta3
  - Type of output: Top scores only
More Information
Text Database Indexing -- a list of indexed fields for each database, and how they are mapped to the WHALES search fields.
WHALES Home Page
WHALES Text Search Alerts
WHALES Sequence Homology Search Alerts
Questions? Email staff@helix.nih.gov