How to extract characters and numbers from every line of a file?

I am trying to extract the first character, the second number, and the third character from each entry on every line of a file, and to store them in three variables called FirstChar, SecondNum, and ThirdChar.

Input file (MultiPointMutation.txt):

P1T,C11F,E13T
L7A
E2W

Expected output:

FirstChar="PCELE"
SecondNum="1 11 13 7 2"
ThirdChar="TFTAW"

My code:

 import re 
 import itertools
 ns=map(lambda x:x.strip(),open('MultiplePointMutation.txt','r').readlines())#reading  file
 for line in ns:
         second="".join(re.findall(r'\d+',line))#extract second position numbers
         print second # print second nums
         char="".join(re.findall(r'[a-zA-Z]',line))#Extract all characters
         c=str(char.rstrip())
         First=0
         Third=1
         for index in range(len(c)):
                 if index==First:
                         FC=c[index]#here i got all first characters
                         print FC
                         First=First+2
                 if index==Third:
                         TC=c[index]
                         print TC
                         Third=Third+2#here i got all third characters

Output: the first and third characters come out exactly right:

FirstChar:
          P
          C
          E
          L
          E
ThirdChar:
          T
          F
          T
          A
          W

But the problem is in getting SecondNum:

           SecondNum:
           11113
           7
           2

I want to extract the numbers as follows:

          1
          11
          13
          7
          2

Note: I don't just want to print these one by one. I want to read the SecondNum values one by one for later use.

For SecondNum, you can simply change the line:

second="".join(re.findall(r'\d+',line))#extract second position numbers

to

second="\n".join(re.findall(r'\d+',line))#extract second position numbers
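To see why the original line prints 11113: re.findall returns a list of digit strings, and "".join glues them together with no separator. Here is a minimal sketch, assuming the first input line is hard-coded as a string (the variable names are only illustrative). Keeping the list itself, rather than a joined string, is probably the easier way to read the numbers one by one later.

```python
import re

line = "P1T,C11F,E13T"  # first line of the sample input, hard-coded for illustration

numbers = re.findall(r'\d+', line)   # list of digit runs: ['1', '11', '13']

joined_empty = "".join(numbers)      # '11113' (digits run together)
joined_lines = "\n".join(numbers)    # '1\n11\n13' (one number per line when printed)

# Keeping the list allows later use of each value, for example as integers
as_ints = [int(n) for n in numbers]  # [1, 11, 13]
```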

However, I don't think your first and third character extraction works correctly. Judging from the expected output you listed, you should have something like this:

 import re

 x = """P1T,C11F,E13T
 L7A
 E2W"""

 secondNum = []
 firstChar = []
 thirdChar = []
 for line in x.split('\n'):
     # every run of digits on the line
     secondNum.extend(re.findall(r'\d+', line))

     # The regex looks for the start of the string (^) or a comma (,). This is a
     # non-capturing group (it starts with (?:), so its match is not part of the
     # returned result. The captured [a-zA-Z] right after it is the first
     # character of each entry.
     firstChar.extend(re.findall(r'(?:^|,)([a-zA-Z])', line))

     # The third character works much the same way: the non-capturing group
     # matches the start of the string or a comma, followed by one character
     # and at least one digit (\d+). The letter after that number is the third
     # character, which is the captured group.
     thirdChar.extend(re.findall(r'(?:^\w\d+|,\w\d+)([a-zA-Z])', line))

 print("firstChar=\"" + str(firstChar) + "\"")
 print("secondNum=\"" + str(secondNum) + "\"")
 print("thirdChar=\"" + str(thirdChar) + "\"")

Note that your "third character" is the third one, counting the number, for L7A (where you want A), but it would be the fourth for P1TQ (where you want Q).
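As an alternative to three separate patterns, one pattern with three capture groups can pull out all three parts of each entry at once. This is only a sketch under stated assumptions: the sample input is inlined rather than read from the file, every entry is assumed to have the shape letter, digits, letter, and the names `pattern`, `first_chars`, `second_nums` and `third_chars` are hypothetical.

```python
import re

# Sample input from the question, inlined instead of read from a file
text = """P1T,C11F,E13T
L7A
E2W"""

# One letter, one or more digits, one letter: e.g. "C11F" -> ('C', '11', 'F')
pattern = re.compile(r'([A-Za-z])(\d+)([A-Za-z])')

first_chars, second_nums, third_chars = [], [], []
for line in text.splitlines():
    # findall returns one (letter, digits, letter) tuple per entry on the line
    for first, num, third in pattern.findall(line):
        first_chars.append(first)
        second_nums.append(int(num))  # kept as integers for later use
        third_chars.append(third)

first_char = "".join(first_chars)                    # 'PCELE'
second_num = " ".join(str(n) for n in second_nums)   # '1 11 13 7 2'
third_char = "".join(third_chars)                    # 'TFTAW'
```

The list `second_nums` can be iterated one value at a time, while the joined strings match the expected output in the question. For an entry such as P1TQ this pattern would capture T rather than Q, so it would need adjusting if such entries can occur.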

