https://wiki.preterhuman.net/index.php?title=Helix-turn-helix_motif_prediction_ANSI_C_source&feed=atom&action=history
Helix-turn-helix motif prediction ANSI C source - Revision history
2024-03-29T06:12:31Z
https://wiki.preterhuman.net/index.php?title=Helix-turn-helix_motif_prediction_ANSI_C_source&diff=14584&oldid=prev
Netfreak: Created page with "<pre> From news@wakinyan.uchicago.edu Sat Apr 24 16:13:56 1993 Received: from net.bio.net by sunflower.bio.indiana.edu (4.1/9.7jsm) id AA05660; Sat, 24 Apr 93 16:13:50 EST Re..."
2020-07-29T23:28:38Z
<div><pre><br />
From news@wakinyan.uchicago.edu Sat Apr 24 16:13:56 1993<br />
Received: from net.bio.net by sunflower.bio.indiana.edu<br />
(4.1/9.7jsm) id AA05660; Sat, 24 Apr 93 16:13:50 EST<br />
Received: from ncar.ucar.edu by net.bio.net (5.65/IG-2.0) with SMTP <br />
id AA05950; Sat, 24 Apr 93 14:12:05 -0700<br />
Received: from midway.uchicago.edu by ncar.ucar.EDU (5.65/ NCAR Central Post Office 03/11/93)<br />
id AA02685; Sat, 24 Apr 93 15:12:34 MDT<br />
Received: from wakinyan.uchicago.edu by midway.uchicago.edu Sat, 24 Apr 93 16:12:32 CDT<br />
Return-Path: <news@wakinyan.uchicago.edu><br />
Received: by wakinyan.uchicago.edu (4.1/UofC3.2)<br />
id AA09798; Sat, 24 Apr 93 16:13:06 CDT<br />
Newsgroups: bionet.software.sources<br />
Path: kimbark!chh9<br />
From: chh9@midway.uchicago.edu (Conrad Halling)<br />
Subject: Helix-turn-helix motif prediction ANSI C source<br />
Message-Id: <1993Apr24.211302.9753@midway.uchicago.edu><br />
Sender: news@wakinyan.uchicago.edu (News System)<br />
Reply-To: chh9@midway.uchicago.edu<br />
Organization: University of Chicago Computing Organizations<br />
Date: Sat, 24 Apr 1993 21:13:02 GMT<br />
Apparently-To: bionet-software-sources@ncar.ucar.edu<br />
Status: R<br />
<br />
/*<br />
hth.c<br />
v 1.0.1<br />
24 April 1993<br />
<br />
This simple program predicts whether a protein contains a helix-turn-<br />
helix motif, using the method of:<br />
<br />
Dodd, I. B., and J. B. Egan. 1990. Improved detection of helix-turn-<br />
helix DNA-binding motifs in protein sequences. Nucleic Acids Res. <br />
18:5019-5026.<br />
<br />
This code written and donated to the public domain by:<br />
Conrad Halling<br />
Department of Molecular Genetics and Cell Biology<br />
University of Chicago<br />
920 E 58th St<br />
Chicago IL 60637<br />
<br />
e-mail: c-halling@uchicago.edu<br />
<br />
How to compile this program:<br />
<br />
This program is written in ANSI C using THINK C 5.0.4.<br />
On our local UNIX system (running SunOS Release 4.1.1), it will<br />
not compile using cc but will compile using either gcc or acc.<br />
For example, to compile this program, you would type<br />
acc -o hth hth.c <return><br />
This means: start the acc (ANSI C compiler) program, write<br />
the output to a file called "hth", and take the input<br />
from the file "hth.c".<br />
When the program is compiled, you run it by typing "hth"<br />
at the prompt.<br />
<br />
tabs = 4<br />
<br />
When using vi under UNIX, you can set the tabs to 4 by opening the<br />
file, typing escape (to go into command mode), colon (":") to go<br />
to the command line, and "set tabstop=4" (without the quotes).<br />
<br />
Format of input protein sequence:<br />
<br />
One protein sequence per file<br />
Single-letter code in upper case and/or lower case<br />
White space characters (space, tab, return, etc.) are ignored<br />
Reading stops with an error message if an invalid character is found<br />
*/<br />
<br />
#include <ctype.h><br />
#include <limits.h><br />
#include <stdlib.h><br />
#include <stdio.h><br />
#include <string.h><br />
<br />
#ifndef __HTH__<br />
#define __HTH__<br />
<br />
#ifndef TRUE<br />
#define TRUE 1<br />
#endif<br />
<br />
#ifndef FALSE<br />
#define FALSE 0<br />
#endif<br />
<br />
#define AMINO_ACIDS_COUNT 20<br />
#define MAX_SEQUENCE_LENGTH 20000<br />
<br />
#define WINDOW_SIZE 22 /* These values from Table 3 */<br />
#define NON_HTH_MEAN_SCORE 238.71 /* of Dodd and Egan (1990) */<br />
#define NON_HTH_STD_DEV 293.61<br />
<br />
/*<br />
Errors<br />
*/<br />
<br />
#define NO_ERROR 0<br />
#define SEQUENCE_TOO_SHORT 1<br />
#define OUT_OF_MEMORY 2<br />
#define QUIT 3<br />
#define INVALID_CHAR 4<br />
<br />
/*<br />
Function prototypes<br />
*/<br />
<br />
void DisplayError(<br />
short error );<br />
void DisplayResults(<br />
double convertedScore,<br />
size_t maxScorePosition,<br />
const char *sequence );<br />
short GetAminoAcid(<br />
char residue );<br />
<br />
#endif<br />
<br />
<br />
const short weightMatrix[ AMINO_ACIDS_COUNT ][ WINDOW_SIZE ] =<br />
{<br />
/* A (alanine) */<br />
-125, -194, -84, 70, 36, 54, 238, -15, 77, 26, -194,<br />
-194, -56, -84, 14, 77, -56, -56, -56, 46, -195, 36,<br />
/* C (cysteine) */<br />
-64, -64, -63, -63, -64, -64, -64, -64, -64, 47, 47,<br />
-63, -63, -64, -64, -64, -64, -64, -64, -63, -64, 47,<br />
/* D (aspartate) */<br />
-156, -154, -156, -154, 109, -156, -156, 109, -154, -156, 6,<br />
-156, -154, -85, -156, -156, -156, -156, -154, -154, -156, -85,<br />
/* E (glutamate) */<br />
-31, -9, -171, 70, 156, -171, -171, 107, 50, -60, -60,<br />
-171, -60, 78, 86, -171, -171, -101, -171, -170, 86, 9,<br />
/* F (phenylalanine) */<br />
10, -130, 10, -130, -130, 10, -130, -129, -130, 102, -130,<br />
-130, -130, -130, -129, -130, -130, -129, -129, -129, 180, -130,<br />
/* G (glycine) */<br />
30, 5, -190, -51, -191, -191, 18, -191, -191, -191, 202,<br />
-191, -10, -191, 5, -190, -191, -80, -190, -190, -191, -51,<br />
/* H (histidine ) */<br />
62, 33, -76, -76, -7, -78, -78, 33, -7, -78, 84,<br />
-78, 33, 33, -78, -7, -78, -7, 62, 84, -78, -7,<br />
/* I (isoleucine ) */<br />
75, -156, 101, -45, -86, 116, -156, -16, 65, -16, -156,<br />
128, -156, -86, -156, -155, 188, -155, -16, 53, 122, -155,<br />
/* K (lysine ) */<br />
-31, -31, 10, 70, 79, -170, -170, 94, 70, -171, -9,<br />
-100, -100, 25, -100, -170, -171, -9, 38, -31, -9, 101,<br />
/* L (leucine) */<br />
66, -212, 72, -213, -212, 144, -213, -102, 37, 132, -213,<br />
97, -213, -142, -212, -212, 97, -212, -212, 37, 88, -213,<br />
/* M (methionine) */<br />
122, -74, -3, -73, -73, -73, -74, -74, 88, 122, -73,<br />
158, -74, -74, -3, -74, -3, -74, -74, -3, -74, -73,<br />
/* N (asparagine) */<br />
-137, 72, -137, -136, -137, -137, -137, -67, -136, -136, 128,<br />
-137, 72, -136, 2, -67, -137, 2, 84, -137, -137, 104,<br />
/* P (proline) */<br />
-156, 23, -157, -156, -157, -157, -157, -157, -157, -157, -157,<br />
-157, -46, 101, 39, -157, -157, -157, -157, -157, -157, -46,<br />
/* Q (glutamine) */<br />
-60, -130, 175, 90, 110, -131, -131, 90, 78, -131, -131,<br />
-60, -130, 154, 65, 119, -131, -20, 119, 31, -20, 90,<br />
/* R (arginine) */<br />
65, 76, 110, 65, 7, -155, -154, 123, 76, -155, -154,<br />
-155, -154, 129, 54, 40, -155, 129, 179, -45, -155, 123,<br />
/* S (serine) */<br />
-118, 96, -188, 21, -48, -187, -8, -187, -118, -118, -77,<br />
-188, 174, -187, 135, -26, -188, 150, -77, -188, -187, -26,<br />
/* T (threonine) */<br />
11, 149, -59, 80, -169, -8, -170, -99, -99, -30, -170,<br />
-8, 131, -30, -59, 198, -170, -30, -169, -170, -30, -59,<br />
/* V (valine) */<br />
17, -67, -177, -177, -108, 100, -178, -178, -108, 71, -178,<br />
160, -178, -16, -67, -67, 169, -178, 17, 31, 17, -178,<br />
/* W (tryptophan) */<br />
44, -26, -26, -26, -26, -26, -26, -26, -26, -26, -26,<br />
-26, -26, -25, -26, -25, -26, -26, 44, 279, -26, -26,<br />
/* Y (tyrosine) */<br />
-40, -110, 30, 1, -110, -109, -110, -110, -40, 30, -110,<br />
-109, -109, -40, -110, -40, -110, 162, 52, 86, -110, -110<br />
};<br />
<br />
const char aminoAcidsString[] = "ACDEFGHIKLMNPQRSTVWYacdefghiklmnpqrstvwy";<br />
<br />
<br />
int main( void )<br />
{<br />
char fileName[ FILENAME_MAX ],<br />
format[ 12 ],<br />
residue,<br />
*sequence;<br />
int resultsDisplayed,<br />
theChar;<br />
short aminoAcid,<br />
fileNameEntered,<br />
maxScore,<br />
quit,<br />
status,<br />
tempScore;<br />
size_t i,<br />
j,<br />
length,<br />
maxWindowPosition,<br />
maxScorePosition,<br />
sequenceLength;<br />
double convertedScore;<br />
FILE *sequenceFile;<br />
<br />
status = NO_ERROR;<br />
quit = FALSE;<br />
<br />
printf( "Welcome to hth, a program that predicts whether a protein contains\n" );<br />
printf( "a helix-turn-helix motif.\n\n" );<br />
printf( "For more information, please read\n\n" );<br />
printf( " Dodd, I. B., and J. B. Egan. 1990. Improved detection of\n" );<br />
printf( " helix-turn-helix DNA-binding motifs in protein sequences.\n" );<br />
printf( " Nucleic Acids Res. 18:5019-5026.\n\n" );<br />
printf( "Please be sure that your protein sequence is in \"plain\" format\n" );<br />
printf( "using the single-letter code.\n\n" );<br />
<br />
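while ( !quit )<br />
{<br />
<br />
/*<br />
Start each pass with a clean status, so that an error in one<br />
file does not carry over to the next one.<br />
*/<br />
<br />
status = NO_ERROR;<br />
<br />
/*<br />
Allocate memory for the protein sequence.<br />
*/<br />
<br />
sequence = ( char * ) malloc( MAX_SEQUENCE_LENGTH * sizeof( char ) );<br />
if ( NULL == sequence )<br />
{<br />
DisplayError( OUT_OF_MEMORY );<br />
break;<br />
}<br />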
<br />
/*<br />
Get the name of the sequence file.<br />
*/<br />
<br />
fileNameEntered = FALSE;<br />
while ( !fileNameEntered )<br />
{<br />
printf( "Name of protein file (q/Q = Quit): " );<br />
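/*<br />
Limit the field width to FILENAME_MAX - 1 characters so that<br />
scanf leaves room for the terminating null, and stop cleanly<br />
at end of input.<br />
*/<br />
<br />
sprintf( format, "%%%us", ( unsigned ) ( FILENAME_MAX - 1 ) );<br />
if ( 1 != scanf( format, fileName ) )<br />
{<br />
status = QUIT;<br />
quit = TRUE;<br />
break;<br />
}<br />
length = strlen( fileName );<br />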
if ( 0 != length )<br />
{<br />
/*<br />
Check for quit command.<br />
*/<br />
<br />
if ( ( 1 == length ) &&<br />
( ( fileName[ 0 ] == 'q' ) || ( fileName[ 0 ] == 'Q' ) ) )<br />
{<br />
status = QUIT;<br />
quit = TRUE;<br />
break;<br />
}<br />
else<br />
{<br />
/*<br />
Open the sequence file.<br />
*/<br />
<br />
sequenceFile = fopen( fileName, "r" );<br />
if ( NULL == sequenceFile )<br />
printf( "File not found!\n\n" );<br />
else<br />
fileNameEntered = TRUE;<br />
}<br />
}<br />
}<br />
<br />
if ( NO_ERROR == status )<br />
{<br />
/*<br />
Read the sequence from sequenceFile into sequence[].<br />
*/<br />
<br />
i = 0;<br />
while ( i < MAX_SEQUENCE_LENGTH - 1 ) /* leave room for the null */<br />
{<br />
/*<br />
Stop reading at end of file or on a read error.<br />
*/<br />
<br />
theChar = getc( sequenceFile );<br />
if ( EOF == theChar )<br />
break;<br />
<br />
/*<br />
Skip white space characters.<br />
*/<br />
<br />
if ( isspace( theChar ) )<br />
continue;<br />
<br />
/*<br />
GetAminoAcid will return AMINO_ACIDS_COUNT if the character<br />
is not found in aminoAcidsString[].<br />
*/<br />
<br />
if ( AMINO_ACIDS_COUNT == GetAminoAcid( ( char ) theChar ) )<br />
{<br />
status = INVALID_CHAR;<br />
break;<br />
}<br />
<br />
/*<br />
The character is valid; add it to the string.<br />
*/<br />
<br />
sequence[ i++ ] = ( char ) theChar;<br />
}<br />
<br />
if ( NO_ERROR == status )<br />
sequence[ i++ ] = 0x00;<br />
<br />
fclose( sequenceFile );<br />
}<br />
<br />
if ( NO_ERROR == status )<br />
{<br />
sequenceLength = strlen( sequence );<br />
if ( sequenceLength < WINDOW_SIZE )<br />
status = SEQUENCE_TOO_SHORT;<br />
}<br />
<br />
if ( NO_ERROR == status )<br />
{<br />
/*<br />
Calculate the highest score for the sequence.<br />
*/<br />
<br />
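maxScore = SHRT_MIN; /* defined in <limits.h> */<br />
maxScorePosition = 0;<br />
resultsDisplayed = FALSE;<br />
maxWindowPosition = sequenceLength - WINDOW_SIZE;<br />
<br />
/*<br />
The last window starts at sequenceLength - WINDOW_SIZE,<br />
so the comparison must be <= to include it.<br />
*/<br />
<br />
for ( i = 0; i <= maxWindowPosition; i++ )<br />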
{<br />
tempScore = 0;<br />
for ( j = 0; j < WINDOW_SIZE; j++ )<br />
{<br />
residue = sequence[ i + j ];<br />
aminoAcid = GetAminoAcid( residue );<br />
tempScore += weightMatrix[ aminoAcid ] [ j ];<br />
}<br />
if ( tempScore > maxScore )<br />
{<br />
maxScore = tempScore;<br />
maxScorePosition = i;<br />
convertedScore =<br />
( ( double ) maxScore - NON_HTH_MEAN_SCORE ) /<br />
( NON_HTH_STD_DEV );<br />
if ( convertedScore >= 2.5 )<br />
{<br />
DisplayResults(<br />
convertedScore,<br />
maxScorePosition,<br />
sequence );<br />
resultsDisplayed = TRUE;<br />
maxScore = SHRT_MIN;<br />
}<br />
}<br />
}<br />
if ( !resultsDisplayed )<br />
DisplayResults( convertedScore, maxScorePosition, sequence );<br />
}<br />
<br />
DisplayError( status );<br />
free( sequence );<br />
}<br />
<br />
return EXIT_SUCCESS;<br />
}<br />
<br />
<br />
void DisplayResults(<br />
double convertedScore,<br />
size_t maxScorePosition,<br />
const char *sequence )<br />
{<br />
char maxScoreString[ WINDOW_SIZE + 1 ]; /* + 1 for the null */<br />
short i,<br />
percentage = 0;<br />
<br />
printf(<br />
"The score is %0.2f at position %lu.\n",<br />
convertedScore,<br />
( unsigned long ) ( maxScorePosition + 1 ) );<br />
<br />
for ( i = 0; i < WINDOW_SIZE; i++ )<br />
maxScoreString[ i ] = sequence[ maxScorePosition + i ];<br />
maxScoreString[ i ] = 0x00;<br />
<br />
printf(<br />
"The sequence at this position is %s.\n",<br />
maxScoreString );<br />
<br />
if ( convertedScore >= 4.5 )<br />
percentage = 100;<br />
else if ( convertedScore >= 4.0 )<br />
percentage = 90;<br />
else if ( convertedScore >= 3.5 )<br />
percentage = 71;<br />
else if ( convertedScore >= 3.0 )<br />
percentage = 50;<br />
else if ( convertedScore >= 2.5 )<br />
percentage = 25;<br />
<br />
if ( convertedScore < 2.5 )<br />
printf( "This score is not significant.\n" );<br />
else<br />
{<br />
printf(<br />
"This score suggests an approximately %d%% probability that ",<br />
percentage );<br />
printf( "this protein\ncontains a helix-turn-helix motif.\n\n" );<br />
}<br />
}<br />
<br />
<br />
short GetAminoAcid(<br />
char residue )<br />
{<br />
short i,<br />
limit;<br />
<br />
limit = strlen( aminoAcidsString );<br />
for ( i = 0; i < limit; i++ )<br />
{<br />
if ( residue == aminoAcidsString[ i ] )<br />
break;<br />
}<br />
if ( i >= AMINO_ACIDS_COUNT )<br />
i -= AMINO_ACIDS_COUNT;<br />
<br />
return ( i );<br />
}<br />
<br />
<br />
void DisplayError(<br />
short error )<br />
{<br />
<br />
char errorString[5][80] =<br />
{<br />
"\n\n",<br />
"The protein sequence is too short to analyze.\n\n",<br />
"There is insufficient memory to continue.\n\n",<br />
"Good-bye!\n\n",<br />
"There is an invalid character in the protein sequence.\n\n"<br />
};<br />
<br />
printf( "%s", errorString[ error ] );<br />
}<br />
-- <br />
Conrad Halling<br />
c-halling@uchicago.edu<br />
</pre><br />
<br />
[[Category:Source Code]]</div>
Netfreak