I have a .txt file of the following shape. Impractically, unknown values are simply blank:
----Header---
Description,
a few lines of description
Still description
  #  RESIDUE AA STRUCTURE BP1 BP2
  1   79 A G              0    0    97
  2   80 A A       -      0    0    28
  3   81 A V E     -A   134    0A   53
  4   82 A F E     -A   133    0A    6
  5   83 A K E     -A   132    0A   52
 11        !              0    0     0
 12  101 A D H            0    0   137
I want to extract the 2nd, 4th and 5th columns, where nonexistent values should be taken into account. So what I want would be:
function(textfile,1,3,4)
>[79,80,81,82,83,"",101]
>["G","A","V","F","K","!","D"]
>["","","E","E","E","","H"]
The exact shape of the output does not matter; it could, e.g., be an n x 3 array or something similar. Because unknown values are left blank, I cannot use np.loadtxt: it would jump to the next column immediately.
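To make the problem concrete, here is a minimal sketch of what goes wrong with whitespace splitting. The spacing of the two sample rows is an assumption (the real file may use different offsets); the point is only that blank fields vanish and the remaining values shift left.

```python
# Two sample rows; the exact column spacing is assumed for illustration.
full_row = "  3   81 A V E     -A   134    0A   53"
gap_row = " 11        !              0    0     0"

# Splitting on whitespace drops the blank fields entirely.
full_fields = full_row.split()
gap_fields = gap_row.split()

print(len(full_fields), full_fields)
print(len(gap_fields), gap_fields)
```

The second row yields only five fields, and its second field is "!" rather than a residue number, so a whitespace-based reader cannot tell which columns were empty.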
Have you tried using pandas.read_csv with the delimiter set to whitespace?
E.g.
pandas.read_csv('filename.txt', sep=r'\s+')
It also looks like you are missing a column name.
You could investigate using Pandas as follows:
import pandas as pd
print(pd.read_fwf('input.txt', widths=(4, 5, 2, 2, 3, 7, 5, 6, 5), usecols=[1, 3, 4], skiprows=6, header=None))
This would display:
       1  3    4
0   79.0  G  NaN
1   80.0  A  NaN
2   81.0  V    E
3   82.0  F    E
4   83.0  K    E
5    NaN  !  NaN
6  101.0  D    H
Alternatively you could just extract the necessary columns manually as follows:
import itertools

# (start, end) character offsets of the columns to extract
col_locations = [(3, 8), (11, 12), (13, 15)]

with open('input.txt') as f_input:
    # Skip over initial lines until the header row
    next(itertools.dropwhile(lambda x: "RESIDUE" not in x, f_input))
    lines = [row.rstrip() for row in f_input]

data = []
for row in lines:
    data.append([row[start:end].strip() for start, end in col_locations])

data = list(zip(*data))  # Transpose the data
print(data)
This would give you a list as follows:
[('79', '80', '81', '82', '83', '', '101'), ('G', 'A', 'V', 'F', 'K', '!', 'D'), ('', '', 'E', 'E', 'E', '', 'H')]
If you really want the first column converted to numbers, you could apply a per column conversion function as follows:
import itertools

def num_convert(x):
    # Blank fields cannot be parsed as integers, so return '' for them
    try:
        return int(x)
    except ValueError:
        return ''

# (start, end, conversion function) for each column to extract
col_locations = [(3, 8, num_convert), (11, 12, str.strip), (13, 15, str.strip)]

with open('input.txt') as f_input:
    # Skip over initial lines until the header row
    next(itertools.dropwhile(lambda x: "RESIDUE" not in x, f_input))
    lines = [row.rstrip() for row in f_input]

data = []
for row in lines:
    data.append([conversion(row[start:end]) for start, end, conversion in col_locations])

data = list(zip(*data))  # Transpose the data
print(data)
Giving you:
[(79, 80, 81, 82, 83, '', 101), ('G', 'A', 'V', 'F', 'K', '!', 'D'), ('', '', 'E', 'E', 'E', '', 'H')]
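The same slicing idea can be wrapped in a small function that works on any iterable of lines, which makes it easy to try without a file. This is only a sketch: `extract_columns` and `make_row` are hypothetical names, and `make_row` builds sample lines with an assumed layout that happens to match the offsets used above.

```python
def make_row(num, res, chain, aa, ss):
    # Hypothetical helper: builds a sample line with an assumed fixed-width layout
    # (3-char index, 5-char residue number, then single-char fields separated by blanks).
    return f"{num:>3}{res:>5} {chain:1} {aa:1} {ss:1}"


def extract_columns(lines, col_locations):
    """Slice every line at the given (start, end) offsets; return one tuple per column."""
    rows = [[line[start:end].strip() for start, end in col_locations] for line in lines]
    return list(zip(*rows))


sample = [
    make_row("1", "79", "A", "G", ""),
    make_row("3", "81", "A", "V", "E"),
    make_row("11", "", "", "!", ""),
    make_row("12", "101", "A", "D", "H"),
]
numbers, residues, structures = extract_columns(sample, [(3, 8), (11, 12), (13, 15)])
print(numbers, residues, structures)
```

Blank fields come back as empty strings in the position where they belong, so the three columns stay aligned row by row.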
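You can use the struct module:
import struct
# The line must keep the file's original fixed-width spacing
line = ' 5 83 A K E -A 132 0A 52 '
# struct works on bytes in Python 3; pad to 42 characters in case trailing blanks were stripped
fields = struct.unpack("6s3s2s3s6s4s7s5s6s", line.ljust(42)[:42].encode("ascii"))
extracted_line = [field.decode("ascii").strip() for field in fields]
print(extracted_line)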
It will probably need some adjustments, because I don't know whether the values shift left or right as they grow. But this is one way.
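If only a few columns are needed, the struct format can skip the others with pad bytes (`x`). The following is a rough sketch, not a drop-in solution: the widths in `ROW_FORMAT` are assumed and would have to be adjusted to the real file, and `parse_row` is a hypothetical helper name.

```python
import struct

# Assumed layout: skip 3 bytes, 5-byte residue number, skip 3 bytes,
# 1-byte amino acid, skip 1 byte, 2-byte structure code.
ROW_FORMAT = struct.Struct("3x5s3x1s1x2s")


def parse_row(line):
    # Pad short lines so rows with stripped trailing blanks still fill the record.
    raw = line.ljust(ROW_FORMAT.size).encode("ascii")
    # unpack_from ignores any bytes beyond the record size.
    return [field.decode("ascii").strip() for field in ROW_FORMAT.unpack_from(raw)]


print(parse_row("  3" + "   81" + " A V E"))
print(parse_row(" 11" + " " * 8 + "!"))
```

Pad bytes produce no output values, so each row comes back as exactly three strings, with empty strings where the file has blanks.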