
Search and split certain strings in a text file and save the output

How can I split certain strings in rows that contain a mix of digits and letters?

The data I have looks like this (tembin-data.dat):

['3317121918', '69N1345E', '15']

['3317122000', '72N1337E', '20']

['3317122006', '75N1330E', '20']

['3317122012', '78N1321E', '20']

['3317122018', '83N1310E', '25']

.......etc

I need to rearrange the data by removing "N" and "E" and splitting the field at those letters, like this:

['3317121918', '69','1345','15']

['3317122000', '72','1337','20']

['3317122006', '75','1330','20']

['3317122012', '78','1321','20']

['3317122018', '83','1310','25']

.......etc

The Python script I am using at the moment is:

newfile = open('tembin-data.dat', 'w')
with open('tembin4.dat', 'r') as inF:
     for line in inF:
         myString = '331712'
         if myString in line:
             data=line.split()
             print data
             newfile.write("%s\n" % data)
newfile.close() 

The input file tembin4.dat looks like this:

REMARKS:

230900Z POSITION NEAR 7.8N 118.6E.

TROPICAL STORM 33W (TEMBIN), LOCATED APPROXIMATELY 769 NM EAST-

SOUTHEAST OF HO CHI MINH CITY, VIETNAM, HAS TRACKED WESTWARD AT

11 KNOTS OVER THE PAST SIX HOURS. MAXIMUM SIGNIFICANT WAVE HEIGHT

AT 230600Z IS 14 FEET. NEXT WARNINGS AT 231500Z, 232100Z, 240300Z

AND 240900Z.//

3317121918  69N1345E  15

3317122000  72N1337E  20

3317122006  75N1330E  20

3317122012  78N1321E  20

3317122018  83N1310E  25

3317122100  86N1295E  35

3317122106  85N1284E  35

3317122112  84N1276E  40

3317122118  79N1267E  50

3317122118  79N1267E  50

3317122200  78N1256E  45

3317122206  78N1236E  45

3317122212  80N1225E  45

3317122218  79N1214E  50

3317122218  79N1214E  50

3317122300  77N1204E  55

3317122300  77N1204E  55

3317122306  77N1193E  55

3317122306  77N1193E  55

NNNN

Try this:

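import re

# Matches tokens such as 69N1345E and captures the two numeric parts.
pat = re.compile(r'(\d+)N(\d+)E')

with open("tembin4.dat", "r") as f:
    for line in f:
        lst = line.split()  # split on any whitespace, dropping the newline
        for i, x in enumerate(lst):
            m = pat.fullmatch(x)
            if m:
                # Replace the combined token with its two numbers.
                lst[i:i + 1] = m.groups()
        print(" ".join(lst))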

This just extends your approach with a regex and split.

import re

pat = re.compile(r"[NE]")  # split on either N or E

with open('tembin4.dat', 'r') as inF, open('tembin-data.dat', 'w') as newfile:
    for line in inF:
        myString = '331712'
        if myString in line:
            data = line.split()
            # '69N1345E' splits into ['69', '1345', '']; drop the trailing
            # empty string and put the two numbers in place of the token.
            data[1:2] = pat.split(data[1])[:-1]
            print(data)
            newfile.write("%s\n" % data)
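
To see why the trailing element has to be dropped, here is a minimal sketch of what re.split returns for one token from the question's data. The helper name split_lat_lon is made up for illustration and is not part of the answer above.

```python
import re

def split_lat_lon(token):
    """Split a token such as '69N1345E' into its numeric parts."""
    # Splitting on N or E leaves an empty string after the final E,
    # so empty pieces are filtered out.
    return [part for part in re.split(r"[NE]", token) if part]

print(re.split(r"[NE]", "69N1345E"))  # ['69', '1345', '']
print(split_lat_lon("69N1345E"))      # ['69', '1345']
```

Filtering out empty strings is an alternative to slicing with [:-1]; it also copes with a token that does not end in E.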

You can use a positive lookahead (?=N) and a positive lookbehind (?<=N) to capture the digits on either side of the N:

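import re

# '\d+' picks up the quoted all-digit fields; the lookarounds pick up
# the digits before and after N in fields such as '69N1345E'.
pattern = r"'\d+'|(\d+)(?=N)|(?<=N)(\d+)"

# file.txt is assumed to hold the list-formatted lines from the question
with open('file.txt', 'r') as f:
    for line in f:
        sub_list = []
        for match in re.finditer(pattern, line):
            sub_list.append(int(match.group().strip("'")))

        if sub_list:
            print(sub_list)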

Output:

[3317121918, 69, 1345, 15]
[3317122000, 72, 1337, 20]
[3317122006, 75, 1330, 20]
[3317122012, 78, 1321, 20]
[3317122018, 83, 1310, 25]

Regex information:

Pattern: '\d+'|(\d+)(?=N)|(?<=N)(\d+)

\d matches a digit (equivalent to [0-9])
+ Quantifier: matches one or more times, as many times as possible, giving back as needed (greedy)

Positive Lookahead (?=N)
Asserts that the digits are immediately followed by N, without consuming it
N matches the character N literally (case sensitive)

Positive Lookbehind (?<=N)
Asserts that the digits are immediately preceded by N, without consuming it
N matches the character N literally (case sensitive)
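
As a small illustration of the two assertions described above, this sketch applies each one on its own to a single token from the question's data. The variable names are chosen here for the example only.

```python
import re

token = "69N1345E"

# Lookahead: digits that are followed by N (the N itself is not consumed).
lat = re.search(r"\d+(?=N)", token).group()

# Lookbehind: digits that are preceded by N.
lon = re.search(r"(?<=N)\d+", token).group()

print(lat, lon)  # 69 1345
```

Because neither assertion consumes the N, the matched text contains only digits and needs no further stripping.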

With pandas you can do this easily.

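import pandas as pd
import os # optional

os.chdir('C:\\Users') # optional: folder that contains the data file

# skipinitialspace drops the blank after each comma in the list text
df = pd.read_csv('tembin-data.dat', header=None, skipinitialspace=True)

# Pull out the digits before N and the digits between N and E.
# A regex avoids fixed slice positions, which break if a number changes width.
parts = df[1].str.extract(r'(\d+)N(\d+)E')
df[3] = parts[0]
df[4] = parts[1]

df = df.drop(1, axis=1)

# Note: columns 0 and 2 still carry the brackets and quotes from the list text.
df.to_csv('tembin-out.dat', header=False)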

You can try this short solution in Python 3:

import re
s = [['3317121918', '69N1345E', '15'], ['3317122000', '72N1337E', '20'], ['3317122006', '75N1330E', '20'], ['3317122012', '78N1321E', '20'],
['3317122018', '83N1310E', '25']]
# re.findall returns every run of digits in the middle field; * unpacks them in place
new_s = [[a, *re.findall(r'\d+', b), c] for a, b, c in s]
print(new_s)

Output:

[['3317121918', '69', '1345', '15'], ['3317122000', '72', '1337', '20'], ['3317122006', '75', '1330', '20'], ['3317122012', '78', '1321', '20'], ['3317122018', '83', '1310', '25']]
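
The filtering and splitting steps can also be combined so that the raw lines of tembin4.dat are turned into the desired lists in one pass. The following is only a sketch under stated assumptions: the 331712 prefix is taken from the asker's script, and the names RECORD and parse_track_lines are hypothetical, introduced for this example.

```python
import re

# Assumed record layout: an ID starting with 331712, a token like 69N1345E,
# and a trailing number, separated by whitespace.
RECORD = re.compile(r"^(331712\d+)\s+(\d+)N(\d+)E\s+(\d+)\s*$")

def parse_track_lines(lines):
    """Return [id, lat, lon, value] lists for lines that look like records."""
    rows = []
    for line in lines:
        m = RECORD.match(line)
        if m:
            rows.append(list(m.groups()))
    return rows

# A few lines copied from the question, including non-record lines.
sample = [
    "AND 240900Z.//\n",
    "3317121918  69N1345E  15\n",
    "3317122000  72N1337E  20\n",
    "NNNN\n",
]
for row in parse_track_lines(sample):
    print(row)
```

Since the function accepts any iterable of lines, an open file object can be passed directly, for example parse_track_lines(open('tembin4.dat')) inside a with block. Anchoring the pattern at both ends means the remark lines are skipped without a separate substring check.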
