How to extract characters and numbers from every line of a file?
I am trying to extract the first character, the second number and the third character from each line of a file and store them in three variables called FirstChar, SecondNum and ThirdChar.
Input file (MultiPointMutation.txt):
P1T,C11F,E13T
L7A
E2W
Expected output:
FirstChar="PCELE"
SecondNum="1 11 13 7 2"
ThirdChar="TFTAW"
My code:
import re
import itertools
ns = map(lambda x: x.strip(), open('MultiPointMutation.txt', 'r').readlines())  # reading file
for line in ns:
    second = "".join(re.findall(r'\d+', line))  # extract second position numbers
    print(second)  # print second nums
    char = "".join(re.findall(r'[a-zA-Z]', line))  # extract all characters
    c = str(char.rstrip())
    First = 0
    Third = 1
    for index in range(len(c)):
        if index == First:
            FC = c[index]  # here I get all first characters
            print(FC)
            First = First + 2
        if index == Third:
            TC = c[index]
            print(TC)
            Third = Third + 2  # here I get all third characters
Output: the first and third characters come out exactly right:
FirstChar:
P
C
E
L
E
ThirdChar:
T
F
T
A
W
But the problem is with SecondNum:
SecondNum:
11113
7
2
I want to extract the numbers like this:
1
11
13
7
2
Note: I don't just want to print them one by one. I want to store the SecondNum values so that I can read them one by one later.
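A minimal sketch of keeping the numbers for later use, assuming the file's lines are already available as a list (the hard-coded `lines` list below stands in for the contents of MultiPointMutation.txt, and `second_nums` is a hypothetical name). Instead of joining the matches into one string, every match is collected into a single list with `list.extend`:

```python
import re

# Stand-in for the contents of MultiPointMutation.txt (assumption: same three lines).
lines = ["P1T,C11F,E13T", "L7A", "E2W"]

second_nums = []  # hypothetical name; holds every number as a separate string
for line in lines:
    # re.findall returns a list such as ['1', '11', '13'];
    # extend() keeps the items separate instead of gluing them together.
    second_nums.extend(re.findall(r"\d+", line))

print(second_nums)            # ['1', '11', '13', '7', '2']
print(" ".join(second_nums))  # 1 11 13 7 2

# Read the values one by one later, converted to integers.
for n in map(int, second_nums):
    print(n)
```

Because `second_nums` is a list, the values can be indexed or iterated at any later point, and `" ".join` is only needed when a single display string is wanted.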
For SecondNum you can simply change the line:
second="".join(re.findall(r'\d+',line))#extract second position numbers
to
second="\n".join(re.findall(r'\d+',line))#extract second position numbers
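The effect of that change can be seen on the first input line. In this small illustration (the names `found`, `glued` and `separated` are just for this sketch), `re.findall` already returns the numbers separately; only the separator passed to `join` decides whether they run together:

```python
import re

line = "P1T,C11F,E13T"
found = re.findall(r"\d+", line)  # a list: ['1', '11', '13']

glued = "".join(found)        # '11113': the separate numbers run together
separated = "\n".join(found)  # '1\n11\n13': one number per line when printed

print(glued)
print(separated)
```

Both `glued` and `separated` are single strings; `found` itself is a list, which is usually more convenient when the values are needed later.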
But I don't think your first and third characters are extracted correctly either. To get the expected output you listed first, you need something like this:
import re

x = """P1T,C11F,E13T
L7A
E2W"""
secondNum = []
firstChar = []
thirdChar = []
for line in x.split('\n'):
    secondNum.extend(re.findall(r'\d+', line))
    firstChar.extend(re.findall(r'(?:^|,)([a-zA-Z])', line))
    # re.findall returns every match of the captured group, and extend()
    # appends each of them to the firstChar list.
    # The regex looks for the start of the string (^) or a comma (,) in a
    # non-capturing group (starting with "(?:"), so that part is not included
    # in the returned result, and then captures the single character
    # [a-zA-Z] after the comma or the start, which is the first character.
    thirdChar.extend(re.findall(r'(?:^\w\d+|,\w\d+)([a-zA-Z])', line))
    # The third character works in a similar way, but the non-capturing group
    # looks for the start of the string or a comma, followed by one character
    # and at least one digit (\d+). The character after the digits is the
    # third character, which is again in the captured group.
print('firstChar="' + "".join(firstChar) + '"')
print('secondNum="' + " ".join(secondNum) + '"')
print('thirdChar="' + "".join(thirdChar) + '"')
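As an alternative sketch (not the method used above), a single pattern with three capture groups, `([A-Za-z])(\d+)([A-Za-z])`, makes `re.findall` return one `(letter, number, letter)` tuple per token, so all three lists come out of one pass. It assumes every token is exactly one letter, some digits and one letter, as in the sample input; the helper name `split_mutations` is hypothetical.

```python
import re

# One letter, one or more digits, one letter; each part is a capture group,
# so findall returns a list of 3-tuples such as [('P', '1', 'T'), ...].
TOKEN = re.compile(r"([A-Za-z])(\d+)([A-Za-z])")


def split_mutations(lines):
    """Hypothetical helper: return (first_chars, second_nums, third_chars)."""
    first_chars, second_nums, third_chars = [], [], []
    for line in lines:
        for first, num, third in TOKEN.findall(line):
            first_chars.append(first)
            second_nums.append(num)
            third_chars.append(third)
    return first_chars, second_nums, third_chars


sample = ["P1T,C11F,E13T", "L7A", "E2W"]
first, second, third = split_mutations(sample)
print('FirstChar="%s"' % "".join(first))
print('SecondNum="%s"' % " ".join(second))
print('ThirdChar="%s"' % "".join(third))
```

With the real file, `with open('MultiPointMutation.txt') as f: first, second, third = split_mutations(f)` should work the same way, since the trailing newline on each line is simply not part of any match.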
Note, however, that your "third character" is the third character (counting the digits) in L7A, where you want A, but the fourth in P1TQ, where you want Q.
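If the wanted "third character" is really the last letter of each comma-separated token (so Q for a token like P1TQ, as the note above suggests), the tokens can be split on commas and the last character taken directly. This is a minimal sketch under that assumption; `parse_token` is a hypothetical helper name.

```python
import re


def parse_token(token):
    """Hypothetical helper: split a token like 'C11F' or 'P1TQ' into
    (first letter, number, last character)."""
    token = token.strip()
    m = re.match(r"([A-Za-z])(\d+)", token)
    if m is None:
        raise ValueError("unexpected token: %r" % token)
    # token[-1] is the last character, however many letters follow the number.
    return m.group(1), m.group(2), token[-1]


for token in "P1TQ,L7A".split(","):
    print(parse_token(token))
```

Unlike the regexes above, which take the letter directly after the digits (T for P1TQ), this takes the final character of the token.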
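Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.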