Split and parse (to new file) string every nth character iterating over starting character - python
I asked about a more general approach to this problem in a previous post, but I am stuck trying to write my results out to individual files. I want to iterate over a long string, starting at position 1 (Python index 0), and print out 100 characters. Then I want to move over one character, start at position 2 (Python index 1), and repeat the process until I reach the last 100 characters. I want to write each 100-character chunk to a new file. Here is what I am currently working with:
seq = 7524 # I get this number from a raw_input
read_num=100
for raw_reads in range(100):
    def nlength_parts(seq,read_num):
        return map(''.join,zip(*[seq[i:] for i in range(read_num)]))
    f = open('read' + str(raw_reads), 'w')
    f.write("read" '\n')
    f.write(nlength_parts(seq,read_num))
    f.close
The error I keep getting now is:
f.write(nlength_parts(seq,read_num))
TypeError: expected a character buffer object
I am having some issues; any help would be greatly appreciated!
After some help, I have made some changes, but it is still not working properly:
seq = 7524 # I get this number from a raw_input
read_num=100
def nlength_parts(seq,read_num):
    return map(''.join,zip(*[seq[i:] for i in range(read_num)]))
for raw_reads in range(100): # Should be gene length - 100
    f = open('read' + str(raw_reads), 'w')
    f.write("read" + str(raw_reads))
    f.write(nlength_parts)
    f.close
I may have left out some important variables and definitions to keep my post short, but that has caused confusion. I have pasted my entire code below.
#! /usr/bin/env python
import sys,os
import random
import string
raw = raw_input("Text file: " )
with open(raw) as f:
    joined = "".join(line.strip() for line in f)
f = open(raw + '.txt', 'w')
f.write(joined)
f.closed
seq = str(joined)
read_num = 100
def nlength_parts(seq,read_num):
    return map(''.join,zip(*[seq[i:] for i in range(read_num)]))
for raw_reads in range(100): # ideally I want range to be len(seq)-100
    f = open('read' + str(raw_reads), 'w')
    f.write("read" + str(raw_reads))
    f.write('\n')
    f.write(str(nlength_parts))
    f.close
A few things:
You define seq and read_num in the global scope, and then also use the same names as parameters in your function. What you should be doing is giving the parameters in the function definition different names, and then passing those two variables to the function when you call it.
You slice seq in your function, but seq is an integer in your code. Is seq the processed output of the file you were talking about in your comment?
That being said, I believe this code will do what you want it to do:
def nlength_parts(myStr, length, paddingChar=" "):
    # Pad short input so that at least one full-length window exists
    if len(myStr) < length:
        myStr += paddingChar * (length - len(myStr))
    sequences = []
    # Slide the window forward one character at a time
    for i in range(0, len(myStr) - length + 1):
        sequences.append(myStr[i:i+length])
    return sequences

foo = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
nlengthfoo = nlength_parts(foo, 10)
for x in range(0, len(nlengthfoo)):
    # Files are named read1, read2, ...; str() is needed to concatenate the number
    with open("read" + str(x + 1), "w") as f:
        f.write(nlengthfoo[x])
EDIT: Apologies, I changed my code in response to your comment.
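For context on the TypeError in the question: write() accepts a single string, while nlength_parts returns a collection of strings, so each window has to be written on its own (or joined first). Below is a minimal Python 3 sketch of that idea. The names windows and write_windows, the out_dir argument, and the "read" header line are illustrative assumptions, not part of the original code.

```python
import os
import tempfile


def windows(text, size):
    """Return every overlapping substring of `text` that is `size` long."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def write_windows(text, size, out_dir, prefix="read"):
    """Write each window to its own file and return the file paths."""
    paths = []
    for number, chunk in enumerate(windows(text, size), start=1):
        path = os.path.join(out_dir, prefix + str(number))
        with open(path, "w") as handle:
            # write() takes one string, so pass a single window, not the list
            handle.write(prefix + str(number) + "\n")
            handle.write(chunk + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = write_windows("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 10, tmp)
        print(len(created))  # 26 - 10 + 1 = 17 files
```

Using a with block also closes each file automatically, which the bare f.close in the question (without parentheses) never does.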
Essentially, you want a rolling window over your string. Say
long_string = "012345678901234567890123456789..."
for a total length of 100.
In [18]: long_string
Out[18]: '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'
In [19]: window = 10
In [20]: for i in range(len(long_string) - window +1):
.....:     chunk = long_string[i:i+window]
.....:     print(chunk)
.....:     with open('chunk_' + str(i+1) + '.txt','w') as f:
.....:         f.write(chunk)
.....:
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
1234567890
2345678901
3456789012
4567890123
5678901234
6789012345
7890123456
8901234567
9012345678
0123456789
Finally,
In [21]: ls
chunk_10.txt chunk_20.txt chunk_30.txt chunk_40.txt chunk_50.txt chunk_60.txt chunk_70.txt chunk_80.txt chunk_90.txt
chunk_11.txt chunk_21.txt chunk_31.txt chunk_41.txt chunk_51.txt chunk_61.txt chunk_71.txt chunk_81.txt chunk_91.txt
chunk_12.txt chunk_22.txt chunk_32.txt chunk_42.txt chunk_52.txt chunk_62.txt chunk_72.txt chunk_82.txt chunk_9.txt
chunk_13.txt chunk_23.txt chunk_33.txt chunk_43.txt chunk_53.txt chunk_63.txt chunk_73.txt chunk_83.txt
chunk_14.txt chunk_24.txt chunk_34.txt chunk_44.txt chunk_54.txt chunk_64.txt chunk_74.txt chunk_84.txt
chunk_15.txt chunk_25.txt chunk_35.txt chunk_45.txt chunk_55.txt chunk_65.txt chunk_75.txt chunk_85.txt
chunk_16.txt chunk_26.txt chunk_36.txt chunk_46.txt chunk_56.txt chunk_66.txt chunk_76.txt chunk_86.txt
chunk_17.txt chunk_27.txt chunk_37.txt chunk_47.txt chunk_57.txt chunk_67.txt chunk_77.txt chunk_87.txt
chunk_18.txt chunk_28.txt chunk_38.txt chunk_48.txt chunk_58.txt chunk_68.txt chunk_78.txt chunk_88.txt
chunk_19.txt chunk_29.txt chunk_39.txt chunk_49.txt chunk_59.txt chunk_69.txt chunk_79.txt chunk_89.txt
chunk_1.txt chunk_2.txt chunk_3.txt chunk_4.txt chunk_5.txt chunk_6.txt chunk_7.txt chunk_8.txt
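The + 1 in range(len(long_string) - window + 1) is what includes the final window; the len(seq) - 100 bound mentioned in the question would stop one window short. The following Python 3 sketch wraps the same loop in a generator so the windows are produced lazily. The name rolling_windows and the "ACGT" stand-in data are assumptions made for illustration; only the length 7524 and the window size 100 come from the question.

```python
def rolling_windows(text, size):
    """Yield overlapping windows of `size` characters, moving one character per step."""
    if size <= 0:
        raise ValueError("size must be positive")
    # The last valid start index is len(text) - size, hence the + 1 in range()
    for start in range(len(text) - size + 1):
        yield text[start:start + size]


sequence = "ACGT" * 1881  # stand-in data of length 7524
reads = list(rolling_windows(sequence, 100))
print(len(reads))      # 7524 - 100 + 1 = 7425
print(reads[1][:8])    # second window starts one character later: CGTACGTA
```

If the input is shorter than the window, the range is empty and nothing is yielded, so no padding is applied in this version.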
I would just treat the string like a file. This lets you avoid any slicing headaches and is pretty straightforward, because the file API lets you read in chunks easily.
In [1]: import io
In [2]: long_string = 'a'*100 + 'b'*100 + 'c'*100 + 'e'*88
In [3]: print(long_string)
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
In [4]: string_io = io.StringIO(long_string)
In [5]: chunk = string_io.read(100)
In [6]: chunk_no = 1
In [7]: while chunk:
....:     print(chunk)
....:     with open('chunk_' + str(chunk_no) + '.txt','w') as f:
....:         f.write(chunk)
....:     chunk = string_io.read(100)
....:     chunk_no += 1
....:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
Note: I'm using the IPython terminal, so you can use terminal commands inside the interpreter session!
In [8]: ls chunk*
chunk_1.txt chunk_2.txt chunk_3.txt chunk_4.txt
In [9]: cat chunk_1.txt
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
In [10]: cat chunk_2.txt
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
In [11]: cat chunk_3.txt
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
In [12]: cat chunk_4.txt
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
In [13]:
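The session above can be condensed into a small helper. Note that reading from a StringIO this way produces consecutive, non-overlapping pieces (four files for 388 characters), which differs from a rolling window that advances one character at a time. This is a Python 3 sketch; the name fixed_chunks is an assumption made for illustration.

```python
import io
from functools import partial


def fixed_chunks(text, size):
    """Yield consecutive, non-overlapping pieces of `text`, `size` characters each."""
    stream = io.StringIO(text)
    # iter(callable, sentinel) keeps calling stream.read(size) until it returns ''
    yield from iter(partial(stream.read, size), "")


long_string = "a" * 100 + "b" * 100 + "c" * 100 + "e" * 88
pieces = list(fixed_chunks(long_string, 100))
print([len(piece) for piece in pieces])  # [100, 100, 100, 88]
```

The final piece is simply shorter when the length is not a multiple of the chunk size, matching chunk_4.txt in the session.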