How to extract characters and numbers from each line of a file?

Question

I am trying to extract the first character, the second number, and the third character of each comma-separated item on every line of a file, and to store them in three variables called FirstChar, SecondNum, and ThirdChar.

Input file (MultiPointMutation.txt):

P1T,C11F,E13T
L7A
E2W

Expected output:

FirstChar="PCELE"
SecondNum="1 11 13 7 2"
ThirdChar="TFTAW"
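One way to produce exactly this output is to match each item with a single regex that has three capture groups, so the letter, number, and letter of an item are always kept together. A minimal sketch, assuming every comma-separated item is one letter, one or more digits, and one letter; `split_mutations` is a hypothetical helper name:

```python
import re

# Assumption: every comma-separated item looks like <letter><digits><letter>.
ITEM = re.compile(r'([A-Za-z])(\d+)([A-Za-z])')

def split_mutations(lines):
    """Return (first_chars, numbers, third_chars) collected over all lines."""
    first, nums, third = [], [], []
    for line in lines:
        # findall returns one (letter, digits, letter) tuple per item
        for f, n, t in ITEM.findall(line):
            first.append(f)
            nums.append(n)
            third.append(t)
    return first, nums, third

lines = ["P1T,C11F,E13T", "L7A", "E2W"]
first, nums, third = split_mutations(lines)
FirstChar = "".join(first)
SecondNum = " ".join(nums)
ThirdChar = "".join(third)
print('FirstChar="%s"' % FirstChar)  # FirstChar="PCELE"
print('SecondNum="%s"' % SecondNum)  # SecondNum="1 11 13 7 2"
print('ThirdChar="%s"' % ThirdChar)  # ThirdChar="TFTAW"
```

For the real file, call `split_mutations(f)` inside a `with open('MultiPointMutation.txt') as f:` block; iterating a file yields its lines, and the trailing newline does not affect the regex.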

My code:

 import re

 # read the file and strip the trailing newline from each line
 with open('MultiPointMutation.txt', 'r') as f:
     ns = [x.strip() for x in f]
 for line in ns:
     second="".join(re.findall(r'\d+',line))#extract second position numbers
     print(second)  # print the second-position numbers
     char="".join(re.findall(r'[a-zA-Z]',line))  # extract all letters
     c=str(char.rstrip())
     First=0
     Third=1
     for index in range(len(c)):
         if index==First:
             FC=c[index]  # here I get all first characters
             print(FC)
             First=First+2
         if index==Third:
             TC=c[index]  # here I get all third characters
             print(TC)
             Third=Third+2
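The index-stepping loop above (`First` and `Third` each advanced by 2) just picks the even and odd positions of the letters-only string, so it only works while every item has exactly one letter on each side of the number. A minimal sketch of the same idea with slice steps, for one sample line:

```python
import re

line = "P1T,C11F,E13T"
# keep only the letters, as the loop above does: "PTCFET"
c = "".join(re.findall(r'[a-zA-Z]', line))
first_chars = c[0::2]  # indices 0, 2, 4, ... -> "PCE"
third_chars = c[1::2]  # indices 1, 3, 5, ... -> "TFT"
print(first_chars, third_chars)
```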

Output: with this code I get the first and third characters exactly right:

FirstChar:
          P
          C
          E
          L
          E
ThirdChar:
          T
          F
          T
          A
          W

The problem is getting SecondNum.

I want to extract the numbers as follows:

Note: I don't want to just print the values one by one. I want to store them in the SecondNum variable so that I can read them one by one later.
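In the code above, `second` is reassigned on every pass through the loop, so only the last line's numbers survive it. To keep every value for later, one option is to extend a single list across all lines. A minimal sketch, where `read_second_numbers` is a hypothetical helper, `io.StringIO` stands in for the real file, and converting to `int` is an assumption (drop `int()` to keep strings):

```python
import io
import re

def read_second_numbers(f):
    """Collect every number from every line of f, in order, as ints."""
    second_num = []
    for line in f:
        second_num.extend(int(n) for n in re.findall(r'\d+', line))
    return second_num

# io.StringIO stands in for open('MultiPointMutation.txt') here
SecondNum = read_second_numbers(io.StringIO("P1T,C11F,E13T\nL7A\nE2W\n"))
for n in SecondNum:  # later: read the stored values one at a time
    print(n)
```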

Answer 1

For SecondNum, you can simply change the line:

second="".join(re.findall(r'\d+',line))#extract second position numbers

to

second="\n".join(re.findall(r'\d+',line))#extract second position numbers
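The original line misbehaves because `"".join` glues the numbers together with no separator, so 1, 11 and 13 become `11113`. Joining with a newline keeps them apart for printing, and keeping the list returned by `re.findall` keeps them apart for later use. A small illustration on the first input line:

```python
import re

line = "P1T,C11F,E13T"
nums = re.findall(r'\d+', line)  # ['1', '11', '13']
print("".join(nums))    # prints 11113: the boundaries are lost
print("\n".join(nums))  # prints 1, 11 and 13 on separate lines
print(nums[1])          # the list keeps each value addressable: 11
```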

However, I don't think your first- and third-character code works correctly in general. To get the output you say you want, you need something like this:

 import re

 x = """P1T,C11F,E13T
 L7A
 E2W"""

 secondNum = []
 firstChar = []
 thirdChar = []
 for line in x.split('\n'):

     secondNum.extend(re.findall(r'\d+', line))

     firstChar.extend(re.findall(r'(?:^|,)([a-zA-Z])', line))
     # extend() appends every element returned by re.findall to firstChar.
     # The regex searches for the start of the string (^) or a comma (,) in a
     # non-capturing group (starting with (?: ), meaning that this group is
     # not part of the returned result, and finally captures the 1 letter
     # [a-zA-Z] after the comma or the start, which is the first character.

     thirdChar.extend(re.findall(r'(?:^\w\d+|,\w\d+)([a-zA-Z])', line))
     # The third character works much the same way, but the non-capturing
     # group searches for the start of the string or a comma again, followed
     # by 1 character and at least one digit (\d+); after this number comes
     # the third character, which is in the captured group again.

 print('firstChar="' + "".join(firstChar) + '"')
 print('secondNum="' + " ".join(secondNum) + '"')
 print('thirdChar="' + "".join(thirdChar) + '"')
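The comments above rely on how `re.findall` treats groups: when the pattern contains a capturing group, it returns only that group's text, so the `(?:...)` part can match the start of the line or the comma without appearing in the result. A small check of that behaviour on the first input line:

```python
import re

line = "P1T,C11F,E13T"
# With a capturing group, findall returns only the captured letter.
with_group = re.findall(r'(?:^|,)([a-zA-Z])', line)
# Without one, findall returns the whole match, comma included.
without_group = re.findall(r'(?:^|,)[a-zA-Z]', line)
# The third-character pattern from the answer, applied to the same line.
third = re.findall(r'(?:^\w\d+|,\w\d+)([a-zA-Z])', line)
print(with_group)     # ['P', 'C', 'E']
print(without_group)  # ['P', ',C', ',E']
print(third)          # ['T', 'F', 'T']
```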

Note, though, that your "third character" does not sit at a fixed position: counting the digits, it is the third character of L7A (where you want A) but the fourth of P1TQ (where you want Q).
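If an item can carry more than one letter after the number and the one you want is the last of them, as the P1TQ example suggests, the third group can match a run of letters instead of exactly one. A minimal sketch under that assumption; `last_letters` is a hypothetical helper name:

```python
import re

# Assumption: an item is one letter, a number, then one or more letters,
# and the letter you want is the last one (Q in P1TQ, A in L7A).
ITEM = re.compile(r'([A-Za-z])(\d+)([A-Za-z]+)')

def last_letters(line):
    """Return the last letter after the number in each item of line."""
    return [tail[-1] for _, _, tail in ITEM.findall(line)]

print(last_letters("P1TQ,L7A"))  # ['Q', 'A']
```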

Answer 1: 0 votes, accepted, 2014-07-23 13:58:08
