Amino acids

DNA sequencing.

In DNA, three-letter sequences (codons) encode amino acids. For example, TTT is phenylalanine and TTA is leucine. This program reads a DNA sequence stored in a file and outputs how many times an amino acid requested by the user appears in that sequence. For example, in the sequence ACGTTTGTATTT, the sequence TTT appears twice.
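
As a minimal sketch of this idea, assuming Python is the language in use, the example sequence can be cut into consecutive groups of three letters and the matching groups counted. The helper name `split_into_codons` is hypothetical and is not part of the boilerplate code.

```python
def split_into_codons(sequence):
    """Cut a DNA string into consecutive three-letter groups from its start."""
    # Assumption: a trailing group of fewer than three letters is ignored.
    usable_length = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable_length, 3)]


codons = split_into_codons("ACGTTTGTATTT")
print(codons)               # ['ACG', 'TTT', 'GTA', 'TTT']
print(codons.count("TTT"))  # 2
```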

Make


Write a program that asks the user to enter three characters and outputs how many times that sequence of characters appears in a file.

Use this boilerplate code as a starting point:

Success Criteria

Remember to add a comment before a subprogram, selection or iteration statement to explain its purpose.

Complete the subprogram called `get_amino_acid` that:

  1. Asks the user to input an amino acid.
  2. Validates the input to ensure that only sequences of exactly three letters, each one being A, C, G or T, are accepted.
  3. Returns the valid choice.
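
One possible shape for these steps is sketched below in Python. It is an illustration, not the expected solution. The helper `is_valid_amino_acid` and the constant `VALID_LETTERS` are hypothetical names, the prompt text is taken from the sample output, and the sketch assumes that lowercase letters are rejected and that the user is simply asked again after invalid input.

```python
VALID_LETTERS = "ACGT"  # hypothetical constant holding the accepted letters


def is_valid_amino_acid(text):
    """Return True if text is exactly three letters, each A, C, G or T."""
    return len(text) == 3 and all(letter in VALID_LETTERS for letter in text)


def get_amino_acid():
    """Ask the user for an amino acid until a valid one is entered."""
    amino_acid = input("Enter the amino acid to find: ")
    # Keep asking while the input is not three letters of A, C, G or T.
    while not is_valid_amino_acid(amino_acid):
        amino_acid = input("Enter the amino acid to find: ")
    return amino_acid
```

Separating the check from the input loop keeps the validation rule in one place, which makes it easier to test without typing at the keyboard.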

Complete the subprogram called `check_sequence` that:

  1. Takes the amino acid as a parameter.
  2. Opens the file called `dna.txt` for reading. Note that this file is included in the Trinket above for you to use as source data.
  3. Returns -1 if the file cannot be found.
  4. Reads the file a line at a time.
  5. Counts the number of times the given amino acid is found in the file.
  6. Closes the file.
  7. Returns the number of times the given amino acid was found.
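
The steps above could be sketched in Python as follows. This is one reading of the task, not a definitive answer. The optional `filename` parameter is an assumption added so the sketch can be tried on other files; the task itself only passes the amino acid. The sketch also assumes that each line is read in consecutive groups of three letters from its start, which appears consistent with the sample outputs in this exercise (for example, 4 for CCC).

```python
def check_sequence(amino_acid, filename="dna.txt"):
    """Count how often amino_acid occurs in the file, or return -1 if it is missing."""
    # Return -1 if the file cannot be found.
    try:
        dna_file = open(filename, "r")
    except FileNotFoundError:
        return -1
    count = 0
    # Read the file one line at a time.
    for line in dna_file:
        line = line.strip()
        # Step through the line three letters at a time.
        for start in range(0, len(line) - 2, 3):
            # Count the group if it matches the requested amino acid.
            if line[start:start + 3] == amino_acid:
                count += 1
    dna_file.close()
    return count
```

Calling `check_sequence("CCC")` uses the default file name, so the call matches the single-parameter form described in the steps.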

Complete the `main program` so that it:

  1. Calls `get_amino_acid` to input a valid amino acid.
  2. Calls `check_sequence` to return the number of times that amino acid appears in the file.
  3. Outputs the message "DNA file not found." if the number is -1.
  4. Outputs the number of amino acids, using the format shown below, if the number is greater than -1.
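
The choice between the two messages can be sketched in Python as shown here. The helper `format_result` is a hypothetical name used only for illustration. In the main program the count would come from `check_sequence`, and the amino acid from `get_amino_acid`.

```python
def format_result(count, amino_acid):
    """Build the message to output from the count returned by check_sequence."""
    # A count of -1 signals that the DNA file could not be opened.
    if count == -1:
        return "DNA file not found."
    return f"There are {count} {amino_acid} amino acids in the DNA sequence."


print(format_result(4, "CCC"))   # There are 4 CCC amino acids in the DNA sequence.
print(format_result(-1, "CCC"))  # DNA file not found.
```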

Typical inputs and outputs from the program would be:

`dna.txt` file:

ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC

CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCGG

CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAG

Enter the amino acid to find: CCC

There are 4 CCC amino acids in the DNA sequence.


Enter the amino acid to find: GGT

There are 0 GGT amino acids in the DNA sequence.


Enter the amino acid to find: GGG

There are 3 GGG amino acids in the DNA sequence.

Knowledge Organiser

Use these resources as a reference to help you meet the success criteria.

Programming guide:

Evaluate


Run the unit tests below to check that your program has met the success criteria.

Enter the amino acid to find: AAC

There are 0 AAC amino acids in the DNA sequence.

Enter the amino acid to find: AGF

Enter the amino acid to find: TAA

There are 0 TAA amino acids in the DNA sequence.

Enter the amino acid to find: CAC

There are 1 CAC amino acids in the DNA sequence.

Enter the amino acid to find: CCC

There are 4 CCC amino acids in the DNA sequence.

Enter the amino acid to find: GAG

There are 2 GAG amino acids in the DNA sequence.

Enter the amino acid to find: GCC

There are 3 GCC amino acids in the DNA sequence.

Check that you have:

  • Used comments within the code to describe the purpose of subprograms, conditions and iterations.
  • Used meaningful identifier names. That means the names of subprograms and variables indicate what they are for.