Finding the count of letters in each column

Question

I need to find the count of letters in each column as follows:

String: ATCG
        TGCA
        AAGC
        GCAT

Here, string is a pandas Series.

I need to write a program to get the following:

I wrote the following code, but the result contains an extra row at index 0 and an extra column at the end (column index 450, i.e. the 451st column), both filled with NaN values. Neither should be there: I need exactly 450 columns.

f = zip(*string)
counts = [{letter: column.count(letter) for letter in column} for column in f]
counts = pd.DataFrame(counts).transpose()
print(counts)
counts = counts.drop(counts.columns[[450]], axis=1)

Can anyone please help me understand the issue?

Answer 1

Here is one way to implement your logic. If required, you can first turn your series into a list via lst = s.tolist().

import pandas as pd

lst = ['ATCG', 'TGCA', 'AAGC', 'GCAT']

# one row per letter, one count per column position
arr = [[i.count(x) for i in zip(*lst)] for x in 'ATCG']

res = pd.DataFrame(arr, index=list('ATCG'))

Result

   0  1  2  3
A  2  1  1  1
T  1  1  0  1
C  0  1  2  1
G  1  1  1  1

Explanation

In the list comprehension, the inner loop handles the columns: zip(*lst) groups the first, second, third and fourth characters of every string into one tuple per position.
The outer loop handles the rows by iterating through 'ATCG' in order, counting that letter in each column tuple.
This produces a list of lists which can be fed directly into pd.DataFrame.
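
As a rough illustration of the explanation above, the same comprehension can be unrolled into explicit loops. This is only a sketch of equivalent logic; the names letters, columns and row are introduced here for readability and are not part of the original answer.

```python
import pandas as pd

lst = ['ATCG', 'TGCA', 'AAGC', 'GCAT']
letters = 'ATCG'

# zip(*lst) yields one tuple per character position (i.e. per column)
columns = list(zip(*lst))

arr = []
for letter in letters:          # outer loop: one output row per letter
    row = []
    for column in columns:      # inner loop: one count per column position
        row.append(column.count(letter))
    arr.append(row)

res = pd.DataFrame(arr, index=list(letters))
print(res)
```

Note that zip stops at the shortest string, so with strings of unequal length any trailing positions are silently dropped.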

Answer 2

With Series.value_counts():

>>> s = pd.Series(['ATCG', 'TGCA', 'AAGC', 'GCAT'])

>>> s.str.join('|').str.split('|', expand=True)\
...     .apply(lambda col: col.value_counts(), axis=0)\
...     .fillna(0.)\
...     .astype(int)
   0  1  2  3
A  2  1  1  1
C  0  1  2  1
G  1  1  1  1
T  1  1  0  1

I'm not sure how you want the index ordered, but you can call .reindex() or .sort_index() on this result.
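
For example, assuming the desired row order is A, T, C, G (the order used in the other answer), a minimal sketch of reordering the result could look like this. The variable names counts and ordered are only illustrative.

```python
import pandas as pd

s = pd.Series(['ATCG', 'TGCA', 'AAGC', 'GCAT'])

counts = (
    s.str.join('|')
     .str.split('|', expand=True)
     # axis=0 passes each column (character position) to the lambda
     .apply(lambda col: col.value_counts(), axis=0)
     .fillna(0)
     .astype(int)
)

# reindex() imposes an explicit row order
ordered = counts.reindex(list('ATCG'))
print(ordered)
```

Calling counts.sort_index() instead would give plain alphabetical order.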

The first step, s.str.join('|').str.split('|', expand=True), gives you an "expanded" version with one character per cell:

   0  1  2  3
0  A  T  C  G
1  T  G  C  A
2  A  A  G  C
3  G  C  A  T

which should be faster than calling pd.Series(list(x)) ... on each row.
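
For comparison, here is a small sketch that builds the expanded frame both ways and checks that the contents agree. It makes no timing measurement; the names expanded, per_row and same are introduced here for illustration only.

```python
import pandas as pd

s = pd.Series(['ATCG', 'TGCA', 'AAGC', 'GCAT'])

# String-method approach: insert a separator between characters, then split on it
expanded = s.str.join('|').str.split('|', expand=True)

# Per-row approach: build one Series per string
per_row = s.apply(lambda x: pd.Series(list(x)))

# Both frames hold the same characters in the same cells
same = expanded.to_numpy().tolist() == per_row.to_numpy().tolist()
print(same)
```

The separator trick assumes that '|' never occurs in the strings themselves; a different separator would be needed otherwise.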

2 answers

solution1
3 2018-03-24 20:16:28

solution2
2 2018-03-24 21:03:33
