DNA to RNA conversion in Python


Write a Python script that translates the two genes in an RNA sequence into their protein sequences and prints them. Each gene begins with an AUG (reading from the left), ends in UAG, and has a length that is a multiple of three. However, the length of the RNA sequence itself may not be a multiple of three, and the sequence may contain more than one "UAG" or "AUG".

For example if the input is

    with open ("p:/dna.txt", "r") as myfile:
        data=myfile.readlines()
    map = DNA=data[1]
    flag = 1
    while flag:
        start = DNA.find('AUG')
        if start == -1:
            flag = 0
        else:
            done = 0
            while done!= 0:
                i = start
                codon = DNA[1:i+3]
                if codon == "UAG":
                    stop = i
                    protein = translate(DNA(start))
                    DNA = DNA[stop:]
                    done = 1
    print(protein)

then the output should be

I have this so far: http://dpaste.org/v2e9/ Can anyone help out?

Updated 9.9 years ago by viv_bio • written 12.8 years ago by Studentguy


@Simon: while I sometimes feel irked when I see a question that seems to be taken right out of a homework assignment, I think in the end it is not our job to police this. Plus we may be wrong in our assumptions. So I would leave it up to everyone’s individual opinion whether they want to answer it or not. A great answer lives on and will continue to provide value beyond the original poster’s needs.

I do agree; however, for that question there is a partial solution if you follow the link to the OP’s ‘dpaste’ page. Here is a thought. With enough googling, StudentGuy will come up with a ready-made solution anyway, most probably using Biopython, which he will likely not understand and which will be too high-level (using a ready-made package) to have much teaching value. At least here the OP did part of the work and is ready to interact with people who will likely teach him something. A better-developed question that included the code right here might have been better. Cheers

I’d be interested in other moderators’ opinions on homework questions. I think proof of a reasonable stab at a solution would be a good thing, rather than ‘do my homework for me’ style questions.

just need to find an otherwise permissive license that prohibits copy-paste use into a homework solution

@brentp: The people who are copy-pasting homework probably aren’t reading enough to look at the licenses anyways.

some consolation: I appreciate the frankness of labeling it as HW!!

I am not really looking for handouts, but I am new to programming and am occasionally very confused by Python. Most unis now require all bio majors to be thrown into the pit with a BNFO class, since even limited knowledge of the subject is a valuable skill to have. That being said, all of our teachers here are pretty seasoned vets in several languages, and sometimes it's hard for them to connect to "my first programming class".

i posted more below in the answers section 😀

homework? well, i’ll answer anyway using biopython.

    from Bio.Seq import Seq

    # add your own logic here to parse the rna sequence from the file.
    # split on start codon. drop the part preceding the 1st start codon,
    # then for each chunk, translate to the stop codon. then join and print.
    rna = "ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA"
    print(" ".join(str(Seq("AUG" + rest).translate(to_stop=True))
                   for rest in rna.split("AUG")[1:]))

I’m not a python guy but the following script does the job with dna.txt=

    >Human
    ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA

src: you just need to check that there are at least 3 bases available after the current position:

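    with open("dna.txt", "r") as myfile:
        data = myfile.readlines()

    # map: dict from RNA codon to amino-acid letter (definition not shown)
    DNA = data[1].strip()  # second line of the file; the first is the ">Human" header
    start = DNA.find('AUG')
    if start != -1:
        # keep reading while a full codon is still available
        while start + 2 < len(DNA):
            codon = DNA[start:start+3]
            if codon == "UAG":
                break
            print(map[codon])
            start += 3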
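The loop above looks codons up in a `map` dictionary that is not shown. The following is a minimal, self-contained sketch of the same scan, assuming the standard genetic code and treating only UAG as the stop codon, as the question specifies. The names `CODON_TABLE` and `translate_first_gene` are hypothetical and not part of the original answer.

```python
from itertools import product

# Standard genetic code, built compactly: codons are enumerated with the
# bases in U, C, A, G order, and the string lists the matching amino acids.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_gene(rna):
    """Translate from the first AUG up to (not including) the first in-frame UAG."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    # Only read a codon when three bases are still available.
    while start + 3 <= len(rna):
        codon = rna[start:start + 3]
        if codon == "UAG":
            break
        protein.append(CODON_TABLE[codon])
        start += 3
    return "".join(protein)


print(translate_first_gene("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA"))  # MLE
```

Reading in steps of three from the start codon matters here: the sample sequence contains an out-of-frame UAG right after the first AUG (in AUGCUAG), which a plain `find('UAG')` would wrongly treat as the stop.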

Updated 3.9 years ago by Ram • written 12.8 years ago by Pierre Lindenbaum

hi, i ran your code but there seems to be a problem with it. When i read a data file, \n characters are also read. And the start variable always seems to get the value -1, even when the sub-string is present in DNA.

when I run your code I get this error:

    start, end = next_transcript(mRNA, cur_pos)
    TypeError: 'NoneType' object is not iterable

I also noticed that cur_pos isn't defined until later, on line 43. Could this be the problem? I am also not sure why DNA=data[1].strip() asks for the second item (index 1) of the list. How is your dna.txt formatted? This clarification would be much appreciated. Thanks

Thank you all for the fast responses, and to those who help i greatly appreciate it, i ended up with an alternate solution.


here's the code i ended up using. it did away with that loop structure i couldn't get and kinda cheats by using find/rfind to narrow down the two sequences. it won't work for more than two sequences but it does the job nonetheless

Nice work with this. Your logic is sound and the major remaining issue to tackle is repetitiveness in the code. Whenever you repeat code with a few changes, you should focus on making that code into a function. Here's a next version of your code generalized with functions and a while loop. It's a combination of your first and second attempts. Notice how the functions allow you to avoid re-writing the same code multiple times: http://gist.github.com/626765
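As an illustration of that advice (this is not the code from the linked gist, which is not reproduced here), a minimal sketch of a function-plus-while-loop version might look like the following. The names `next_gene` and `find_genes` are hypothetical. It assumes, as in the question, that a gene runs from an AUG to the first in-frame UAG.

```python
def next_gene(rna, pos):
    """Return (start, end) of the next AUG...UAG gene at or after pos, or None."""
    start = rna.find("AUG", pos)
    while start != -1:
        # Walk codon by codon from the start codon, looking for an in-frame stop.
        for i in range(start + 3, len(rna) - 2, 3):
            if rna[i:i + 3] == "UAG":
                return start, i + 3
        # This AUG has no in-frame stop; try the next AUG.
        start = rna.find("AUG", start + 1)
    return None


def find_genes(rna):
    """Collect every gene in the sequence, scanning left to right."""
    genes = []
    pos = 0
    while True:
        span = next_gene(rna, pos)
        if span is None:  # check before unpacking
            break
        start, end = span
        genes.append(rna[start:end])
        pos = end
    return genes


print(find_genes("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA"))
# ['AUGCUAGAAUAG', 'AUGUACUAG']
```

Checking for `None` before unpacking avoids the `'NoneType' object is not iterable` error that arises when a search function returns nothing and the caller unpacks the result directly.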

    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord

    def make_trans_record(record):
        """Return a new SeqRecord holding the translated sequence."""
        # Translate the slice from position 350 up to 103 bases before the end.
        return SeqRecord(seq=record.seq[350:-103].translate(), description="")

    input_path = input("Enter file location of the nucleotide sequence: ")
    output_path = input("Output file location and name: ")
    records = map(make_trans_record, SeqIO.parse(input_path, "fasta"))
    SeqIO.write(records, output_path, "fasta")


Convert any DNA string into an RNA translation or a theoretically valid Protein sequence using Python that takes an operation and filename from the command line.

CyrusK/convert_dna_to_rna_or_protein.py


Convert DNA to RNA or Protein

Here is a Python script I put together (the DNA translation file above) that takes any DNA string (gDNA, cDNA, etc.) and optionally prints its RNA transcript or all of the possible valid protein sequences, with start and stop codons included. The script takes an operation and a filename of your choice from the command line and prints the output.

    $dna.txt
    GCGCTTGGTTTAATGACGGCTTGTTTCTTTTCTGTGGCTGCGTGAAAGCCTTGAGGGGCTCCGGGAGGGCCCTTT
    GTGCGGGGGGAGCGGCTCGGGGGGTGCGTGCGTGTGTGTGTGCGTGGGGAGCG

    $python conversion.py --rna dna.txt
    GCGCUUGGUUUAAUGACGGCUUGUUUCUUUUCUGUGGCUGCGUGAAAGCCUUGAGGGGCUCCGGGAGGGCCCUUU
    GUGCGGGGGGAGCGGCUCGGGGGGUGCGUGCGUGUGUGUGUGCGUGGGGAGCG
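The repository's script itself is not reproduced here. As a rough sketch of how the `--rna` operation could work, assuming the input file holds coding-strand DNA and that the conversion is a plain T-to-U substitution, something like the following would give the output shown. The function names and the argument layout are assumptions, not the repository's actual code.

```python
import argparse


def transcribe(dna):
    """Return the RNA form of a coding-strand DNA string (T becomes U)."""
    return dna.upper().replace("T", "U")


def main(argv=None):
    # Hypothetical command line: an operation flag followed by a filename.
    parser = argparse.ArgumentParser(description="Print the RNA form of a DNA file")
    parser.add_argument("--rna", action="store_true", help="print the RNA sequence")
    parser.add_argument("filename", help="text file containing a DNA sequence")
    args = parser.parse_args(argv)
    if args.rna:
        with open(args.filename) as handle:
            for line in handle:
                print(transcribe(line.strip()))
```

Calling `main(["--rna", "dna.txt"])` mirrors the command line above. Passing the argument list explicitly keeps the function easy to test.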

$python conversion.py --genes dna.txt

    1 valid protein(s) can be produced from this sequence:
    DNA: ATGACGGCTTGTTTCTTTTCTGTGGCTGCGTGA
    RNA: AUGACGGCUUGUUUCUUUUCUGUGGCUGCGUGA
    AA sequence: MTACFFSVAA

The code in this repository is open source. It may be useful for anyone in biotech R&D working on big data genomic analysis.


DNA to RNA conversion in Python

The idea is to replace every T in the DNA sequence with U: import re and use re.sub.

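    import re

    rna_nucleotide = ''
    with open(r'D:\Rosalind\haha.txt', 'w+') as f1:
        with open(r'D:\Rosalind\rosalind_rna.txt', 'r') as f:
            # Replace T with U line by line and collect the result.
            for nucleotide in f:
                rna_nucleotide += re.sub('T', 'U', nucleotide)
        # Write the whole converted sequence to the output file.
        print(rna_nucleotide, file=f1)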

Alternatively, in the interactive interpreter, import re and call re.sub('T','U',seq) directly. This is more convenient, but it only suits short sequences: a long sequence held entirely in one string may not fit in memory.
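For long sequences, one way around the memory concern is to write each converted line straight to the output file instead of accumulating one large string. The following is a minimal sketch of that idea. It swaps re.sub for str.replace, which is enough for a single-character substitution, and the function name `transcribe_file` is hypothetical.

```python
def transcribe_file(src_path, dst_path):
    """Copy src_path to dst_path, replacing T with U one line at a time."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # Only the current line is held in memory.
            dst.write(line.replace("T", "U"))
```

A call such as `transcribe_file("rosalind_rna.txt", "haha.txt")` would then convert the file without ever building the full sequence in memory.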


Reverse complement of DNA strand using Python

In this article, we will cover how to compute the reverse complement of DNA or RNA sequences in Python.

DNA strand: ATGCCGAGCA
Complementary strand: TACGGCTCGT
Reverse-complementary strand: TGCTCGGCAT

An overview of DNA and RNA as used in Molecular Biology

The genetic material of living organisms is made up of deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The primary structure of DNA and RNA is a sequence of nucleotide bases. A nucleic acid can be double-stranded or single-stranded. In double-stranded nucleic acids, the bases pair according to fixed rules that differ slightly between DNA and RNA. DNA has four types of bases: Adenine (A), Thymine (T), Guanine (G), and Cytosine (C). Adenine pairs with Thymine through two hydrogen bonds, while Guanine pairs with Cytosine through three hydrogen bonds, written A=T and G≡C, as shown below.

Figure: DNA base pairing. The upper strand is complementary to the lower strand and vice versa.

For RNA, all instances of Thymine are replaced by Uracil. This means that for double-stranded RNA, Adenine pairs with Uracil while Guanine pairs with Cytosine A=U and G≡C as shown below:

Figure: RNA base pairing. Each strand is complementary to the other.

Reverse Complement of a DNA or RNA

The reverse complement of a DNA or RNA sequence is obtained by complementing each base and then reversing the result. Finding it is a common task in computational molecular biology, typically needed when a sequence contains an open reading frame (a region that can be translated into a protein) on the reverse strand. It is often useful to verify that the sequence is valid DNA or RNA before computing its reverse complement.
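A minimal sketch of this operation, assuming uppercase input with no ambiguity codes, uses a translation table for the complement and a slice for the reversal. The names `reverse_complement`, `DNA_COMPLEMENT` and `RNA_COMPLEMENT` are illustrative choices.

```python
# Translation tables mapping each base to its pairing partner.
DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")
RNA_COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(seq, rna=False):
    """Complement every base, then reverse the resulting string."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.translate(table)[::-1]


print(reverse_complement("ATGCCGAGCA"))      # TGCTCGGCAT
print(reverse_complement("AUGC", rna=True))  # GCAU
```

Characters outside the table pass through unchanged, which is one reason to validate the sequence first.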

How to identify whether a sequence is DNA or RNA

A common task in bioinformatics and computational molecular biology is to verify whether a sequence is DNA or RNA. To do this, we can use the set method.

Method 1: Verify whether a sequence is DNA or RNA

In the set method, we convert the input sequence into a set. We combine the set obtained with a reference DNA set(ATGC) or RNA set(AUGC) using the union function of the set. This is done so that the input sequence is rendered valid even if it does not contain all four types of nucleotide bases. For instance, TTTTTTTAAA is a valid DNA even though it contains only two types of bases. Also, UUUUUUUUGGG is a valid RNA.
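The set method described above can be sketched as follows. The function name `sequence_type` and the "Invalid" label are assumptions for illustration. Note that a sequence containing only A, G and C is valid under both alphabets, and this sketch reports it as DNA because that check runs first.

```python
DNA_BASES = set("ATGC")
RNA_BASES = set("AUGC")


def sequence_type(seq):
    """Classify a sequence as 'DNA', 'RNA' or 'Invalid' using set union."""
    bases = set(seq.upper())
    # If the union adds nothing to the reference set, every base is allowed.
    if bases | DNA_BASES == DNA_BASES:
        return "DNA"
    if bases | RNA_BASES == RNA_BASES:
        return "RNA"
    return "Invalid"


print(sequence_type("TTTTTTTAAA"))   # DNA
print(sequence_type("UUUUUUUUGGG"))  # RNA
print(sequence_type("ATGU"))         # Invalid
```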
