I'm trying to write a Python script that takes a sequence of letters (A, C, G, T) from the user and prints the percentage of A's, C's, G's, and T's.
For example, if the user types AGGTGACCCT, the output should be A: 20 C: 30 G: 30 T: 20
I'm fairly experienced with Java, but new to Python. I don't know how to read user input the way I would with a Scanner in Java. I tried searching through a reference library but couldn't really figure anything out.
collections.Counter is a very handy tool and worth learning about when you start using Python.
from collections import Counter

# strip() removes leading/trailing whitespace; on Python 2, use raw_input() instead of input()
inp = input("Enter letters: ").strip()
length = len(inp)  # number of characters that were entered
c = Counter(inp)   # maps each character to how often it occurs
for char in c:
    c[char] = c[char] * 100 / length  # no float cast needed on Python 3
print(c)
Counter({'C': 30.0, 'G': 30.0, 'A': 20.0, 'T': 20.0})
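Counter prints its entries in its own order, so to get the A, C, G, T layout from the question, something like the following sketch might work. The names base_percentages and format_percentages are made up for illustration, and upper-casing the input and ignoring any character outside A, C, G, T are assumptions rather than part of the answer above.

```python
from collections import Counter

def base_percentages(seq, alphabet="ACGT"):
    """Return {letter: percentage} for each letter in alphabet, in that order."""
    # Assumption: normalise case and count only letters found in the alphabet.
    seq = seq.strip().upper()
    counts = Counter(ch for ch in seq if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero on empty or all-invalid input.
        return {ch: 0.0 for ch in alphabet}
    return {ch: 100.0 * counts[ch] / total for ch in alphabet}

def format_percentages(pcts):
    """Render the percentages as 'A: 20 C: 30 ...' with no decimals."""
    return " ".join("{}: {:.0f}".format(ch, p) for ch, p in pcts.items())

print(format_percentages(base_percentages("AGGTGACCCT")))
# A: 20 C: 30 G: 30 T: 20
```

To use it interactively, pass input("Enter letters: ") instead of the fixed string. The ordering relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on.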
There is a csv module in the standard library whose DictWriter can write the data to a file.
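As a rough sketch of how that could look, the following writes one header row and one row of percentages. The function name write_percentages and the file name percentages.csv are hypothetical, and the layout (one column per letter) is just one possible choice.

```python
import csv

def write_percentages(path, pcts):
    """Write a header row of letters and a single row of their percentages."""
    # newline="" is what the csv documentation recommends for files handed to a writer.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(pcts))
        writer.writeheader()
        writer.writerow(pcts)

if __name__ == "__main__":
    # Hypothetical output file name, with the values from the example in the question.
    write_percentages("percentages.csv", {"A": 20.0, "C": 30.0, "G": 30.0, "T": 20.0})
```

Reading the file back with csv.DictReader returns the values as strings, so convert them with float() if you need numbers again.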
You can read directly from the standard input stream, sys.stdin, like so:
$ cat read.py
import sys
userin = sys.stdin.read()
print([c for c in userin])
$ python read.py
HELLO
['H', 'E', 'L', 'L', 'O', '\n']
And then you can pipe a text file to stdin, like:
$ cat input.txt
HELLO
$ python read.py < input.txt
['H', 'E', 'L', 'L', 'O', '\n']
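Note that the trailing '\n' is part of what gets read, so it should be dropped before counting. A minimal sketch of computing the percentages from piped input might look like this. The function name percentages_from_stream is invented for illustration, and it assumes every non-whitespace character counts towards the total.

```python
import io
import sys

def percentages_from_stream(stream=None, alphabet="ACGT"):
    """Read a whole stream and return {letter: percentage} for the alphabet."""
    stream = sys.stdin if stream is None else stream
    # split() + join() removes all whitespace, including the trailing newline.
    seq = "".join(stream.read().split())
    total = len(seq)
    return {ch: (100.0 * seq.count(ch) / total if total else 0.0) for ch in alphabet}

# StringIO stands in for a piped file here; in a script, call the function
# with no arguments so that it reads from sys.stdin.
print(percentages_from_stream(io.StringIO("AGGTGACCCT\n")))
# {'A': 20.0, 'C': 30.0, 'G': 30.0, 'T': 20.0}
```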
Or, if you want to read a file directly:
>>> import io
>>> with io.open('input.txt', mode='r') as f:
...     print([c for c in f.read()])
...
['H', 'E', 'L', 'L', 'O', '\n']
If you can save the sequence in a comma-separated values (CSV) file, you could do something along the lines of:
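import pandas as pd

# file_name is the path to your CSV file.
# Assumes no header row and one letter per field, e.g. A,G,G,T,G,A,C,C,C,T
sequence = pd.read_csv(file_name, header=None).values.ravel()
As = 0
Cs = 0
Gs = 0
Ts = 0
total = len(sequence)
for letter in sequence:
    if letter == 'A':
        As += 1.0
    elif letter == 'C':
        Cs += 1.0
    elif letter == 'G':
        Gs += 1.0
    elif letter == 'T':
        Ts += 1.0
# Each percentage uses its own counter, scaled to 0-100
percent_A = 100 * As / total
percent_C = 100 * Cs / total
percent_T = 100 * Ts / total
percent_G = 100 * Gs / total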
Or:
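import pandas as pd

# file_name is the path to your CSV file.
# Assumes no header row and one letter per field, e.g. A,G,G,T,G,A,C,C,C,T
sequence = pd.read_csv(file_name, header=None).values.ravel()
sequence_list = list(sequence)
As = sequence_list.count('A')
Cs = sequence_list.count('C')
Gs = sequence_list.count('G')
Ts = sequence_list.count('T')
total = len(sequence_list)
# 100.0 forces float division on Python 2 as well
percent_A = 100.0 * As / total
percent_C = 100.0 * Cs / total
percent_T = 100.0 * Ts / total
percent_G = 100.0 * Gs / total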
The same general structure works for tab-separated (TSV) files as well; pass sep='\t' to read_csv.