Amino acids

DNA sequencing.

Three-letter sequences of DNA bases encode amino acids. For example, TTT encodes phenylalanine and TTA encodes leucine. This program reads a DNA sequence stored in a file and outputs how many times an amino acid requested by the user occurs in it. For example, in the sequence ACGTTTGTATTT, the sequence TTT appears twice.
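As a rough illustration of the example above, the sequence can be cut into groups of three letters and the matching groups counted. The function name `split_into_triplets` is made up for this sketch, and reading the sequence in groups of three from the first letter is an assumption.

```python
# Subprogram: cuts a DNA sequence into consecutive groups of three letters
# (hypothetical helper, not part of the task's boilerplate)
def split_into_triplets(sequence):
    triplets = []
    # Iteration: step through the sequence three letters at a time
    for start in range(0, len(sequence), 3):
        triplets.append(sequence[start:start + 3])
    return triplets


triplets = split_into_triplets("ACGTTTGTATTT")
print(triplets)               # ['ACG', 'TTT', 'GTA', 'TTT']
print(triplets.count("TTT"))  # 2
```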

Make


Write a program that asks the user to enter three characters and outputs how many times that sequence of characters appears in a file.

Use this boilerplate code as a starting point:

Success Criteria

Remember to add a comment before a subprogram, selection or iteration statement to explain its purpose.

Complete the subprogram called `get_amino_acid` that:

  1. Asks the user to input an amino acid.
  2. Validates the input so that only sequences of exactly three letters, each one A, C, G or T, are accepted.
  3. Returns the valid choice.
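One possible shape for `get_amino_acid`, offered as a sketch rather than a model answer. The constant `VALID_BASES` and the helper `is_valid_amino_acid` are names invented here. The prompt text is taken from the sample output, and rejecting lowercase letters is an assumption, since the task does not say how they should be treated.

```python
VALID_BASES = "ACGT"  # assumed name for the set of accepted letters


# Subprogram: checks that the text is exactly three letters from A, C, G, T
# (hypothetical helper, split out so the rule can be checked on its own)
def is_valid_amino_acid(text):
    # Selection: reject anything that is not three characters long
    if len(text) != 3:
        return False
    # Iteration: check each letter in turn
    for letter in text:
        # Selection: reject any letter that is not a DNA base
        if letter not in VALID_BASES:
            return False
    return True


# Subprogram: asks for an amino acid until a valid one is entered
def get_amino_acid():
    amino_acid = input("Enter the amino acid to find: ")
    # Iteration: keep asking while the input is invalid
    while not is_valid_amino_acid(amino_acid):
        amino_acid = input("Enter the amino acid to find: ")
    return amino_acid
```

With this approach an entry such as AGF is rejected and the prompt is simply shown again.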

Complete the subprogram called `check_sequence` that:

  1. Takes the amino acid as a parameter.
  2. Opens the file called `dna.txt` for reading.
  3. Returns -1 if the file cannot be found.
  4. Reads the file a line at a time.
  5. Counts how many times the given amino acid is found in the file.
  6. Closes the file.
  7. Returns the number of amino acids of the given sequence found.
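The steps above could be sketched as follows. The sample outputs are consistent with reading each line in groups of three letters from the start of the line (GGT is reported as 0 even though the letters G, G, T occur next to each other in the sample file), so this sketch assumes that reading. The optional `filename` parameter is an addition made here so the subprogram can be tried on other files; the task only asks for the amino acid as a parameter.

```python
# Subprogram: counts how many times an amino acid appears in the DNA file
def check_sequence(amino_acid, filename="dna.txt"):
    # Selection: signal a missing file by returning -1
    try:
        dna_file = open(filename, "r")
    except FileNotFoundError:
        return -1

    count = 0
    # Iteration: read the file a line at a time
    for line in dna_file:
        line = line.strip()  # remove the newline and any stray spaces
        # Iteration: step through the line three letters at a time
        for start in range(0, len(line), 3):
            # Selection: count the group if it matches the amino acid
            if line[start:start + 3] == amino_acid:
                count += 1

    dna_file.close()
    return count
```

Blank lines in the file add nothing to the count, because a stripped blank line has no groups to check.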

Complete the main program so that it:

  1. Calls `get_amino_acid` to input a valid amino acid.
  2. Calls `check_sequence` to return the number of times that amino acid appears in the file.
  3. Outputs the message “DNA file not found.” if the number is -1.
  4. Otherwise, outputs the number of amino acids using the format shown below.
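A minimal sketch of the main program, under two assumptions made so that it runs on its own: the message is built by a helper named `result_message` (a name invented here), and the two subprograms are passed in as parameters rather than called directly as they would be in the finished exercise.

```python
# Subprogram: builds the line of output for a given count
# (hypothetical helper; -1 means the file was not found)
def result_message(amino_acid, count):
    # Selection: choose between the error message and the result
    if count == -1:
        return "DNA file not found."
    return f"There are {count} {amino_acid} amino acids in the DNA sequence."


# Subprogram: the main program, given the two subprograms to call
def main(get_amino_acid, check_sequence):
    amino_acid = get_amino_acid()
    count = check_sequence(amino_acid)
    print(result_message(amino_acid, count))
```

Once both subprograms are written, the program would be started with `main(get_amino_acid, check_sequence)`.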

Typical inputs and outputs from the program would be:

`dna.txt` file:

ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGC

CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCGG

CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAG

Enter the amino acid to find: CCC

There are 4 CCC amino acids in the DNA sequence.


Enter the amino acid to find: GGT

There are 0 GGT amino acids in the DNA sequence.


Enter the amino acid to find: GGG

There are 3 GGG amino acids in the DNA sequence.

Knowledge Organiser

Use these resources as a reference to help you meet the success criteria.

Programming guide:

Evaluate


Run the unit tests below to check that your program has met the success criteria.

Enter the amino acid to find: AAC

There are 0 AAC amino acids in the DNA sequence.

Enter the amino acid to find: AGF

Enter the amino acid to find: TAA

There are 0 TAA amino acids in the DNA sequence.

Enter the amino acid to find: CAC

There are 1 CAC amino acids in the DNA sequence.

Enter the amino acid to find: CCC

There are 4 CCC amino acids in the DNA sequence.

Enter the amino acid to find: GAG

There are 2 GAG amino acids in the DNA sequence.

Enter the amino acid to find: GCC

There are 3 GCC amino acids in the DNA sequence.

Check that you have:

  • Used comments within the code to describe the purpose of subprograms, conditions and iterations.
  • Used meaningful identifier names. That means the names of subprograms and variables indicate what they are for.

Craig ‘n’ Dave

In partnership with

Mission Encodeable
Bett Awards 2024 Finalist