Search and split certain strings in a text file and save the output
How can I split certain strings in rows that contain both digits and letters?
The data I have looks like this (tembin-data.dat):
['3317121918', '69N1345E', '15']
['3317122000', '72N1337E', '20']
['3317122006', '75N1330E', '20']
['3317122012', '78N1321E', '20']
['3317122018', '83N1310E', '25']
.......etc
I need to rearrange the data by removing "N" and "E", like this:
['3317121918', '69','1345','15']
['3317122000', '72','1337','20']
['3317122006', '75','1330','20']
['3317122012', '78','1321','20']
['3317122018', '83','1310','25']
.......etc
The Python script I am using at the moment is:
newfile = open('tembin-data.dat', 'w')
with open('tembin4.dat', 'r') as inF:
    for line in inF:
        myString = '331712'
        if myString in line:
            data = line.split()
            print(data)
            newfile.write("%s\n" % data)
newfile.close()
tembin4.dat is as below:
REMARKS:
230900Z POSITION NEAR 7.8N 118.6E.
TROPICAL STORM 33W (TEMBIN), LOCATED APPROXIMATELY 769 NM EAST-
SOUTHEAST OF HO CHI MINH CITY, VIETNAM, HAS TRACKED WESTWARD AT
11 KNOTS OVER THE PAST SIX HOURS. MAXIMUM SIGNIFICANT WAVE HEIGHT
AT 230600Z IS 14 FEET. NEXT WARNINGS AT 231500Z, 232100Z, 240300Z
AND 240900Z.//
3317121918 69N1345E 15
3317122000 72N1337E 20
3317122006 75N1330E 20
3317122012 78N1321E 20
3317122018 83N1310E 25
3317122100 86N1295E 35
3317122106 85N1284E 35
3317122112 84N1276E 40
3317122118 79N1267E 50
3317122118 79N1267E 50
3317122200 78N1256E 45
3317122206 78N1236E 45
3317122212 80N1225E 45
3317122218 79N1214E 50
3317122218 79N1214E 50
3317122300 77N1204E 55
3317122300 77N1204E 55
3317122306 77N1193E 55
3317122306 77N1193E 55
NNNN
Try this:
import re
with open("tembin4.dat", "r") as f:
    for line in f:
        lst = line.split()  # split on any whitespace, dropping the newline
        for i, x in enumerate(lst):
            grp = re.findall(r'(\d+)N(\d+)E', x)
            if grp:
                # replace the combined field with its two numeric parts
                lst[i:i+1] = grp[0]
        print(" ".join(lst))
Just extending your approach with regex and split.
import re
pat = re.compile("[NE]")  # inside a character class "|" is a literal, so it is not needed
with open('tembin4.dat', 'r') as inF, open('tembin-data.dat', 'w') as newfile:
    for line in inF:
        myString = '331712'
        if myString in line:
            data = line.split()
            data[2:2] = pat.split(data[1])[:-1]  # insert the split parts at index 2
            del data[1]  # remove the original string containing N and E
            print(data)
            newfile.write("%s\n" % data)
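A possible variation, offered as a sketch rather than part of the answer above: instead of testing for the substring '331712' and splitting afterwards, one regex can match the whole track line and capture the four fields in one step. The pattern and the name `parse_track_line` are assumptions made for illustration; the pattern assumes a 10-digit leading field, which is what the sample data shows.

```python
import re

# Hypothetical pattern: 10-digit id, latitude digits + N,
# longitude digits + E, then a final number.
TRACK_LINE = re.compile(r"(\d{10})\s+(\d+)N(\d+)E\s+(\d+)")

def parse_track_line(line):
    """Return the four fields as a list, or None if the line is not a track line."""
    m = TRACK_LINE.fullmatch(line.strip())
    return list(m.groups()) if m else None

sample_lines = [
    "AT 230600Z IS 14 FEET. NEXT WARNINGS AT 231500Z, 232100Z, 240300Z",
    "3317121918 69N1345E 15",
    "3317122000 72N1337E 20",
    "NNNN",
]
rows = [r for r in map(parse_track_line, sample_lines) if r is not None]
for row in rows:
    print(row)
```

Because `fullmatch` must cover the entire line, the remark lines and the trailing NNNN are skipped without a separate filter.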
You can use a positive lookbehind (?<=N) and a positive lookahead (?=N) and just capture the groups:
import re
pattern = r"'\d+'|(\d+)(?=N)|(?<=N)(\d+)"
with open('file.txt', 'r') as f:
    for line in f:
        sub_list = []
        search = re.finditer(pattern, line)
        for lin in search:
            sub_list.append(int(lin.group().strip("'")))
        if sub_list:
            print(sub_list)
Output:
[3317121918, 69, 1345, 15]
[3317122000, 72, 1337, 20]
[3317122006, 75, 1330, 20]
[3317122012, 78, 1321, 20]
[3317122018, 83, 1310, 25]
Regex information:
'\d+'|(\d+)(?=N)|(?<=N)(\d+)
'\d+' matches one or more digits enclosed in single quotes
\d matches a digit (equal to [0-9])
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed
Positive Lookahead (?=N)
Asserts that the text immediately after the current position matches N, without consuming it
N matches the character N literally (case sensitive)
Positive Lookbehind (?<=N)
Asserts that the text immediately before the current position matches N, without consuming it
N matches the character N literally (case sensitive)
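To see the two lookarounds in isolation, here is a small sketch that applies them to a single coordinate string. The name `coord` is made up for this example, and the pattern is a reduced form of the one above (the quoted-number alternative is left out).

```python
import re

# Reduced form of the pattern above: digits followed by N, or digits preceded by N.
coord = re.compile(r"\d+(?=N)|(?<=N)\d+")

parts = coord.findall("69N1345E")
print(parts)  # ['69', '1345']

# The N is only asserted, not consumed, so it never appears in a match.
first = coord.search("69N1345E")
print(first.group(), first.span())  # 69 (0, 2)

# Caveat: on free text the same pattern also picks up stray digits before an N.
noise = coord.findall("230900Z POSITION NEAR 7.8N 118.6E.")
print(noise)  # ['8']
```

The last call suggests why the pattern is better applied to the already filtered rows than to the raw tembin4.dat, whose remark lines also contain digits next to an N.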
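Using pandas you can do this easily.
import pandas as pd
import os # optional
os.chdir('C:\\Users') # optional
# skipinitialspace drops the blank after each comma so the slices below line up
df = pd.read_csv('tembin-data.dat', header=None, skipinitialspace=True)
df[3] = df[1].str.slice(1, 3)  # characters 1-2: latitude digits, e.g. 69
df[4] = df[1].str.slice(4, 8)  # characters 4-7: longitude digits, e.g. 1345
df = df.drop(1, axis=1)
df.to_csv('tembin-out.dat', header=False)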
You can try this short solution in Python 3:
import re
s = [['3317121918', '69N1345E', '15'], ['3317122000', '72N1337E', '20'],
     ['3317122006', '75N1330E', '20'], ['3317122012', '78N1321E', '20'],
     ['3317122018', '83N1310E', '25']]
new_s = [[a, *re.findall(r'\d+', b), c] for a, b, c in s]
Output:
[['3317121918', '69', '1345', '15'], ['3317122000', '72', '1337', '20'], ['3317122006', '75', '1330', '20'], ['3317122012', '78', '1321', '20'], ['3317122018', '83', '1310', '25']]
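The list `s` above is typed in by hand. If the rows instead sit in a file such as tembin-data.dat, where each line is the printed form of a Python list, one way to load them is `ast.literal_eval`. This is only a sketch: the helper name `rows_from_repr_lines` is made up here, and it assumes every data line is a valid list literal.

```python
import ast
import re

def rows_from_repr_lines(lines):
    """Parse lines such as "['3317121918', '69N1345E', '15']" into lists."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line.startswith("["):
            continue  # skip blanks and anything that is not a list literal
        rows.append(ast.literal_eval(line))
    return rows

# Stand-in for the lines of the file; with a real file, pass the open file object.
sample = [
    "['3317121918', '69N1345E', '15']\n",
    "['3317122000', '72N1337E', '20']\n",
]
s = rows_from_repr_lines(sample)
new_s = [[a, *re.findall(r"\d+", b), c] for a, b, c in s]
print(new_s)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for reading the rows back.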