WHALES - Help


WHALES (Web Homology Alert Service) is a sequence alert service. It allows NIH scientists to define a profile (text terms or sequences) that is searched each week against the new sequences in the major DNA and protein databases. The results are emailed once a week.

WHALES is available only to NIH personnel, and is developed and maintained by the Helix Systems staff, CIT, NIH.

There are two distinct search modes:

  1. TEXT SEARCHING - Users define an individual profile containing text words or phrases. These terms are searched against new sequences in the databases each week, and the results emailed to the user.
    e.g. searching Genbank for 'mhc'.
  2. HOMOLOGY SEARCHING - Users enter a DNA or protein sequence in the profile. A homology search (Blast, Fasta, or WU-Blast) will be run each week against the new sequences in the chosen database.
    e.g. a protein sequence is entered in the profile. A Blast search of the week's new entries in the GenPept database is run each week, and the top hits and alignments are emailed to the user.


General Help - How to create, edit, delete or list profiles.
Text Searches - Available options and result formats for text searches.
Text Search Examples
Sequence Homology Searches - available options for sequence homology alerts.
Sequence Homology Alert Examples

Databases:

The following databases are available. Profiles are run against a database which contains all new entries from the preceding week.

Database    Sequence Type    Details
Genbank     DNA              from NCBI.
Genpept     Protein          from NCI, Frederick.
UniProt     Protein          SwissProt + Trembl downloaded from UniProt.
PDB         Protein          from RCSB. The homology searches are run against the derived sequence data for each PDB protein chain.


General Help

Creating a Text or Homology profile
Go to the main WHALES webpage. Decide if you want to search for a text word or phrase (Text Searches) or enter a sequence for a weekly homology search (Homology Searches), and click on 'Text Search Alerts' or 'Sequence Homology search Alerts'. On the resulting page, select 'Create a new search profile'. Enter your NIH email address and a simple profile identifier. Enter your text terms or your sequence, select the options, and click on 'Establish profile'.

Editing a Profile
You will receive one email message from WHALES each week for each profile you have set up. Each message includes links to 'Edit this profile' and 'Delete this profile'. Clicking the edit link in your mail client opens a web page where you can edit the profile.

Alternatively, go to the main WHALES page and select 'Text Search Alerts' or 'Sequence Homology Search Alerts'. On the resulting page, enter your NIH email address and the profile name in the boxes under the 'Edit' button, and then click on 'Edit an Existing Search Profile'.

The resulting webpage will have your profile loaded into the boxes, and you can modify or select any parameters.

Click on the 'Save Edited Profile' button to save your changes.

Deleting a Profile
You will receive one email message from WHALES each week for each profile you have set up. Each message includes links to 'Edit this profile' and 'Delete this profile'. Clicking the delete link in your mail client opens a web page where you can delete the profile.

Alternatively, go to the main WHALES page. Enter your NIH email address and the profile name into the boxes under 'Delete an existing search profile', and then click the 'Delete an existing search profile' button.
The resulting web page will display a summary of the profile and ask you to confirm this deletion. Click on 'Delete this profile now', and the profile will be deleted.

Getting a list of profiles
If you want to see a list of all your profiles, enter your email address in the box under 'Mail a list of my profiles to me', and click the button.
A summary of each profile will be emailed to you. The list will not appear on the web page for privacy reasons. In the email message you receive, each profile will have an 'Edit' and 'Delete' button for your convenience.


WHALES Text Searches

A WHALES Text Search profile consists of a single word or a group of words. These words are searched for in the headers of the week's new sequences. Words can be combined with AND, OR, or NOT, and parentheses can be used for grouping. You can choose to search 'Alltext', 'Description', 'Authors', or 'Organism'.

All searching is case-insensitive.

Logical Operators: The operators & (AND), | (OR), and ! (NOT) can be used to combine words in multiword searches. However, simple text profiles are more likely to be successful than searches that contain complex patterns.
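
To make the operator semantics concrete, here is a minimal Python sketch of how a profile such as 'MHC & Class & antigen' could be evaluated against one sequence header. It is not the WHALES implementation. The function name `matches`, the precedence (! binds tighter than &, which binds tighter than |), the treatment of adjacent words as an implicit AND, and the use of case-insensitive substring matching are all assumptions made for illustration.

```python
import re

# Operators and parentheses are single-character tokens; anything else is a word.
TOKEN = re.compile(r"[&|!()]|[^&|!()\s]+")


def matches(query, header):
    """Return True if `header` satisfies the boolean `query` (case-insensitive)."""
    tokens = TOKEN.findall(query)
    text = header.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ended unexpectedly")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() not in (None, "|", ")"):
            if peek() == "&":
                take()  # explicit AND; adjacent words are an implicit AND (assumption)
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r} in query")
        return tok.lower() in text  # substring match (assumption)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses in query")
    return result


header = "Homo sapiens MHC class I antigen (HLA-A) mRNA"
print(matches("MHC & Class & antigen", header))
print(matches("(calcium | sodium) & channel", "Sodium channel protein"))
print(matches("kinase & !tyrosine", "serine/threonine kinase"))
```

Testing a profile against a few known headers in this way shows why short profiles are easier to get right than deeply nested ones.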

Results: The results options are:

Text Search Examples:

A simple text search for fruit fly calcium channels in UniProt
All Text: calcium channel
Organism: Drosophila*
Database: UniProt
Leave the other fields empty or with the preset defaults

Text search for human nucleotide sequences relating to MHC Class 1 antigens
Description: MHC & Class & antigen
Organism: *sapiens
Database: Genbank
Leave the other fields empty or with the preset defaults. (Some Genbank entries have an Arabic numeral (1) and some have a Roman numeral (I) for the antigen class type, so it's safest to leave that character out entirely when setting up your profile.)

Text search for mouse proteins of about 350 amino acids.
Organism: mus musculus
SeqLength Greater Than: 340 but less than 360
Databases: UniProt, Genpept, PDB
Leave the other fields empty or with the preset defaults

Keep track of any new structural data from Ian Wilson's lab.
Authors: Wilson, I.A.
Databases: PDB
(Note that all author names are converted to 'Lastname, Initial.Initial' format, with no spaces between the initials. If you don't know the author's initials, 'Wilson, *' will find all Wilsons in the author field.)

Text search for 'tyrosine kinase' with actual sequences returned.
All text: tyrosine kinase
Format Output as: Selected fields + sequence
Include only these fields as output: click on the fields you want, e.g. ID, Description, Date and Organism.
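
The wildcard patterns used in the examples above ('Drosophila*', '*sapiens', 'Wilson, *') and the 'Lastname, Initial.Initial' author format can be illustrated with a short Python sketch. It uses the standard library's `fnmatch` module as a stand-in for the WHALES matcher, whose exact wildcard rules are not documented here. The helper names `field_matches` and `format_author` are hypothetical.

```python
from fnmatch import fnmatchcase


def field_matches(pattern, value):
    """Case-insensitive wildcard match, where '*' stands for any run of characters."""
    return fnmatchcase(value.lower(), pattern.lower())


def format_author(last_name, given_names):
    """Build 'Lastname, Initial.Initial.' with no spaces between the initials."""
    initials = ".".join(name[0].upper() for name in given_names if name)
    return f"{last_name}, {initials}." if initials else last_name


author = format_author("Wilson", ["Ian", "A."])
print(author)
print(field_matches("Wilson, *", author))
print(field_matches("Drosophila*", "Drosophila melanogaster"))
print(field_matches("*sapiens", "Homo sapiens"))
```

Formatting a name with `format_author` before entering it in the Authors field is a quick way to check that it follows the expected layout.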


WHALES Sequence Homology Searches

For a WHALES homology search alert, the user enters one or more sequences and selects Blast, Fasta or WU-Blast searches, the database to be queried, and an output format. The homology search is run each week against the week's new sequences, and the results emailed to the user.

Input Sequence:
The input sequence can be cut-and-pasted directly into the box, or uploaded as a file from your computer. The sequence should be in one of the following formats: Genbank, EMBL, SwissProt, Fasta, ASN.1, GCG, PIR, IG (Stanford) or Phylip.

Input Sequence Type:
Select the sequence type as 'Nucleic Acid', 'Protein', or 'Peptide'.
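
Before pasting a sequence, it can help to check that it parses as Fasta and to confirm which sequence type to select. The following Python sketch does both. It is an illustration only and not part of WHALES: the function names are hypothetical, and the 90% nucleotide threshold used to guess the type is an arbitrary assumption. Short peptides are not distinguished from proteins.

```python
NUCLEOTIDES = set("ACGTUN")


def read_fasta(text):
    """Parse Fasta-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_type(sequence):
    """Guess 'Nucleic Acid' or 'Protein' from the letters used (heuristic)."""
    if not sequence:
        raise ValueError("empty sequence")
    nucleic = sum(ch in NUCLEOTIDES for ch in sequence)
    return "Nucleic Acid" if nucleic / len(sequence) >= 0.9 else "Protein"


sample = ">seq1 test\nACGT\nACGT\n>seq2\nMKVLAAGIVGL\n"
for name, seq in read_fasta(sample):
    print(name, len(seq), guess_type(seq))
```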

Database: Select the database: Genbank, Genpept, UniProt, Protein Data Bank (PDB), or 'Protein' (Uniprot + Genpept + PDB).

Comparison Program:
Select Blast, Fasta or WU-Blast as the homology search program. For a 'Peptide Mix' sequence, Fasta is the only acceptable choice.

Type of Output:
Select 'Top Scores Only', or 'Top Scores and Alignments'.
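
The options above can be summarized as a small validation sketch in Python. The option values are taken from this help page, but the `HomologyProfile` class, its field names, and the `validate` function are hypothetical and do not describe how WHALES stores profiles. The only cross-field rule encoded is the one stated above: a peptide sequence (listed as 'Peptide' under Input Sequence Type and as 'Peptide Mix' under Comparison Program) must use Fasta.

```python
from dataclasses import dataclass

SEQUENCE_TYPES = {"Nucleic Acid", "Protein", "Peptide"}
DATABASES = {"Genbank", "Genpept", "UniProt", "PDB", "Protein"}
PROGRAMS = {"Blast", "Fasta", "WU-Blast"}
OUTPUTS = {"Top Scores Only", "Top Scores and Alignments"}


@dataclass
class HomologyProfile:
    email: str
    name: str
    sequence: str
    sequence_type: str
    database: str
    program: str
    output: str


def validate(profile):
    """Return a list of problems found in the profile; an empty list means it looks valid."""
    problems = []
    if not profile.sequence.strip():
        problems.append("no input sequence given")
    if profile.sequence_type not in SEQUENCE_TYPES:
        problems.append(f"unknown sequence type: {profile.sequence_type}")
    if profile.database not in DATABASES:
        problems.append(f"unknown database: {profile.database}")
    if profile.program not in PROGRAMS:
        problems.append(f"unknown comparison program: {profile.program}")
    if profile.output not in OUTPUTS:
        problems.append(f"unknown output type: {profile.output}")
    # Rule from the help text: peptide sequences accept Fasta only.
    if profile.sequence_type == "Peptide" and profile.program != "Fasta":
        problems.append("peptide sequences can only be searched with Fasta")
    return problems


profile = HomologyProfile(
    email="user@nih.gov",  # placeholder address
    name="my_peptide",
    sequence="MKVLAAGIVGL",
    sequence_type="Peptide",
    database="UniProt",
    program="Blast",
    output="Top Scores Only",
)
print(validate(profile))
```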

Example WHALES Sequence Homology Searches

Keep track of new proteins that are similar to your favourite protein sequence. Use the Homology search.
Input sequence : paste in your sequence.
Input sequence type: Protein
Database to search: SwissProt + GenPept + PDB
Comparison program: Blast
Type of output: Top Scores + Alignments

Search for homology to a nucleic acid sequence. Use the Homology search.
Input sequence: paste in your sequence.
Input sequence type: Nucleic acid
Database to search: Genbank
Comparison program: Fasta3
Type of output: Top scores only

More Information

 Text Database Indexing -- a list of indexed fields for each database, and how they are mapped to the Whales search fields.


Questions? Email staff@helix.nih.gov