Python Forum
Computing GC Content


Computing GC Content - uwl - Jun-25-2023

I have been trying to solve a programming problem that asks me to compute GC content. The example given online is:

Sample Dataset
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

Sample Output
Rosalind_0808
60.919540

My code runs without errors, and I have already tried submitting the results it produces with the correct number of decimal places.

from collections import Counter

myDna = ''
with open('rosalind_gc.txt', 'r') as data:
    for line in data:
        if '>' in line:
            continue
        myDna += line.strip()
myNucleotideCounts = Counter(myDna)
myGC = (myNucleotideCounts['G'] + myNucleotideCounts['C']) / float(len(myDna))
print('Dna GC Content = {0}'.format(myGC))

Would someone be able to provide suggestions as to what might be the problem and how to fix this in the above code?


RE: Computing GC Content - Pedroski55 - Jun-26-2023

I'm not familiar with collections, so I didn't use that module here.

Apparently, as I've been told here, starting with myDNA = '' and then repeatedly appending to it can create a new string on every append, because strings are immutable, so a lot of intermediate strings get built along the way. So it is better to collect the pieces in a list and then join the list!
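
A minimal sketch of that list-then-join pattern, applied to reading sequence lines. The function name read_sequence and the sample lines are made up for illustration; they are not from the original post.

```python
def read_sequence(lines):
    """Join all non-header lines into a single sequence string."""
    parts = []
    for line in lines:
        # Header lines start with '>' and carry no bases.
        if line.startswith('>'):
            continue
        parts.append(line.strip())
    # One join at the end instead of many string concatenations.
    return ''.join(parts)


print(read_sequence(['>example_id\n', 'ACGT\n', 'GGCC\n']))
```

The same function should work on an open file object, since iterating over a file yields its lines.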

Anyway, this may give you some ideas:

#! /usr/bin/python3
import random

# play God, make DNA, make a Python!
def makeDNA():
    myDNA = []
    nucleotides = ['A', 'C', 'G', 'T']
    length = input('How long do you want your DNA? Just enter a number. ')
    num = int(length)
    for i in range(0, num):
        choice = random.choice(nucleotides)
        myDNA.append(choice)
    mystring = ''.join(myDNA)
    return mystring

def countNucs(DNA_sample):
    countC = 0
    countG = 0
    for nuc in DNA_sample:
        if nuc == 'C':
            countC += 1
        elif nuc == 'G':
            countG += 1
    print('Cytosine was found', countC, 'times in this sample of DNA.')
    print('Guanine was found', countG, 'times in this sample of DNA.')

if __name__ == "__main__":
    DNA_sample = makeDNA()
    countNucs(DNA_sample)



RE: Computing GC Content - bowlofred - Jun-26-2023

(Jun-25-2023, 10:04 PM)uwl Wrote: which asks me to find the GC content

Is that the entire specification of the problem? It looks to me like it wants you to examine the GC content of different files and only print the filename and GC content in some situations (perhaps with a GC content greater than some threshold?)

Your program is calculating a single value for all lines in the file. It doesn't display the filename like the sample output does.
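
If the task is to report the record with the highest GC content, as a percentage (which is what the sample output seems to suggest, though the full problem statement isn't quoted here), then one possible approach is to compute the value per record rather than once for the whole file. The following is only a sketch under that assumption; the helper names parse_fasta, gc_percent and highest_gc are hypothetical, and the sample lines are copied from the first post.

```python
def parse_fasta(lines):
    """Map each '>' header (without the '>') to its joined sequence."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            current = line[1:]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: ''.join(parts) for name, parts in records.items()}


def gc_percent(seq):
    """Percentage of bases in seq that are G or C."""
    if not seq:
        return 0.0
    return 100.0 * (seq.count('G') + seq.count('C')) / len(seq)


def highest_gc(lines):
    """Return (record name, GC percentage) for the record with the highest GC."""
    records = parse_fasta(lines)
    best = max(records, key=lambda name: gc_percent(records[name]))
    return best, gc_percent(records[best])


sample = [
    '>Rosalind_6404',
    'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC',
    'TCCCACTAATAATTCTGAGG',
    '>Rosalind_5959',
    'CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT',
    'ATATCCATTTGTCAGCAGACACGC',
    '>Rosalind_0808',
    'CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC',
    'TGGGAACCTGCGGGCAGTAGGTGGAAT',
]

name, value = highest_gc(sample)
print(name)
print('{0:.6f}'.format(value))
```

To run it on the real input, the list could be replaced by an open file, for example by passing the file object from open('rosalind_gc.txt') to highest_gc.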


RE: Computing GC Content - Pedroski55 - Jun-26-2023

It occurred to me this morning that you may want to count the number of times the sequence GC appears in the DNA sample.

def countGC(DNA):
    count = 0
    # Stop one short of the end so DNA[i+1] never runs past the last base.
    for i in range(len(DNA) - 1):
        if DNA[i] == 'G' and DNA[i+1] == 'C':
            count += 1
    print('The sequence GC was found', count, 'times in this sample of DNA.')

For a DNA sample 501 long that gave:

Quote:countGC(DNA_sample)
The sequence GC was found 31 times in this sample of DNA.
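
As a possible cross-check on the loop, Python's built-in str.count counts non-overlapping occurrences of a substring. Since 'GC' cannot overlap with itself, it should give the same number. The function name count_gc_pairs is hypothetical and only used for this sketch.

```python
def count_gc_pairs(dna):
    """Count how many times the two-base sequence 'GC' occurs in dna."""
    return dna.count('GC')


print(count_gc_pairs('AGCGCTTG'))
```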