How to extract the line for a given word only once from a file containing many lines with the same word

I have a data file that contains one month of data. The file format looks like this:

VAAU Observations at 00Z 02 Aug 2017

-------------------------------------------------------------------------------------------
   PRES   HGHT   TEMP   DWPT   FRPT   RELH   RELI   MIXR   DRCT   SKNT   THTA   THTE   THTV
    hPa     m      C      C      C      %      %    g/kg    deg   knot     K      K      K
-------------------------------------------------------------------------------------------
 1000.0     66
  942.0    579   22.6   20.3   20.3     87     87  16.20    270      4  300.8  348.6  303.8
  925.0    747   21.6   19.9   19.9     90     90  16.09    265     10  301.4  348.9  304.3
  850.0   1481   18.8   17.1   17.1     90     90  14.65    275     19  305.8  350.0  308.5
  812.0   1873   17.3   14.1   14.1     82     82  12.60    275     22  308.2  346.6  310.6
...................
Station information and sounding indices
                         Station identifier: VAAU
                             Station number: 43014
                           Observation time: 170801/0000
                           Station latitude: 19.85
                          Station longitude: 75.40
                          Station elevation: 579.0
                            Showalter index: 0.92
                               Lifted index: 0.99
    LIFT computed using virtual temperature: 0.46
                                SWEAT index: 255.81
                                    K index: 34.70
                         Cross totals index: 19.70
                      Vertical totals index: 20.10
                        Totals totals index: 39.80
      Convective Available Potential Energy: 5.98
             CAPE using virtual temperature: 9.37
                      Convective Inhibition: -81.35
             CINS using virtual temperature: -69.07
                           Equilibrum Level: 617.53
 Equilibrum Level using virtual temperature: 523.66
                   Level of Free Convection: 662.87
             LFCT using virtual temperature: 669.25
                     Bulk Richardson Number: 4.12
          Bulk Richardson Number using CAPV: 6.44
  Temp [K] of the Lifted Condensation Level: 292.45
Pres [hPa] of the Lifted Condensation Level: 894.64
     Mean mixed layer potential temperature: 301.92
              Mean mixed layer mixing ratio: 16.03
              1000 hPa to 500 hPa thickness: 5818.00
Precipitable water [mm] for entire sounding: 51.19

The same block repeats for every day of the month. I want to extract Station identifier, Station number, Station latitude and Station longitude only once from that file.

I tried a Python script but didn't get the desired output. I also tried grep:

grep -E "Station number|Station latitude|Station longitude|Station identifier" wrkk_2017.out


for line in open('vaau_2017.out'):
    rec = line.strip()
    words = ["Station identifier:", "Station number:", "Station latitude:", "Station longitude"]
    for rec in words:
        if rec in line:
            print (line)
            break

I expect Station identifier: ..., Station number: ..., Station latitude: ..., Station longitude: ... to be printed only once, but I get them as many times as they appear in the file.

You could add a boolean list that records whether each word has already been found:

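words = ["Station identifier:", "Station number:", "Station latitude:", "Station longitude:"]
still_left = [True] * len(words)   # still_left[i] stays True until words[i] is found

with open('vaau_2017.out') as f:
    for line in f:
        for i, w in enumerate(words):
            if w in line and still_left[i]:
                print(line.rstrip())   # drop the trailing newline to avoid blank lines
                still_left[i] = False
        if not any(still_left):        # every word has been found once
            break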

Example:

s = '''id: 1
num: 2
lat: 3
lon: 4
id: 1
num: 2
lat: 3
lon: 4'''

words = ['id', 'num', 'lat', 'lon']
still_left = [True] * len(words)

for line in s.splitlines():              # for line in open('vaau_2017.out'):
    for i, w in enumerate(words):
        if w in line and still_left[i]:
            print(line)
            still_left[i] = False
# id: 1
# num: 2
# lat: 3
# lon: 4

If you want to stop reading the file as soon as all words are found, you can add

    if sum(still_left)==0:
        break

at the level of the for line... loop, after the inner for i, w... loop.
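A variant of the same idea, as a minimal sketch: instead of a boolean list, keep a dictionary of the labels found so far and stop once it holds all of them. The function name first_values and the two-day sample text are made up for illustration; the labels are the ones from the question.

```python
def first_values(lines, labels):
    """Return {label: value} for the first line containing each label."""
    found = {}
    for line in lines:
        for label in labels:
            if label in line and label not in found:
                # Keep the text after the label, e.g. ": VAAU" -> "VAAU"
                found[label] = line.split(label, 1)[1].lstrip(":").strip()
        if len(found) == len(labels):  # every label seen once; stop reading
            break
    return found


labels = ["Station identifier", "Station number",
          "Station latitude", "Station longitude"]

# Hypothetical two-day sample in the layout shown in the question.
sample = """\
      Station identifier: VAAU
          Station number: 43014
        Observation time: 170801/0000
        Station latitude: 19.85
       Station longitude: 75.40
      Station identifier: VAAU
          Station number: 43014
        Observation time: 170802/0000
        Station latitude: 19.85
       Station longitude: 75.40
"""

result = first_values(sample.splitlines(), labels)
for label, value in result.items():
    print(f"{label}: {value}")
```

To run this against the real file, pass the open file object in place of sample.splitlines(); iteration stops at the break, so the rest of the month is never read.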

You can do this with a regex:

import re

a = 'Station information and sounding indices Station identifier: VAAU Station number: 43014 Observation time: 170801/0000 Station latitude: 19.85 Station longitude: 75.40 Station elevation: 579.0 Showalter index: 0.92 Lifted index: 0.99 LIFT computed using virtual temperature: 0.46 SWEAT index: 255.81 K index: 34.70 Cross totals index: 19.70 Vertical totals index: 20.10'
# Raw strings keep the backslashes in the patterns intact; group(1) is the captured value
station_identifier = re.search(r'Station identifier: ([A-Z]+)', a).group(1)
print(station_identifier)  # VAAU
station_number = re.search(r'Station number: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_number)  # 43014
station_latitude = re.search(r'Station latitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_latitude)  # 19.85
station_longitude = re.search(r'Station longitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_longitude)  # 75.40
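Because re.search returns only the first match, the same kind of pattern can be applied to the whole file content at once, which yields each value a single time even when the block repeats every day. A minimal sketch, assuming the file is small enough to read into memory; the helper name first_field, the simplified number pattern and the sample text are illustrative and not part of the answer itself.

```python
import re


def first_field(text, label, pattern):
    """Return the first value following 'label:' in text, or None if absent."""
    match = re.search(re.escape(label) + r":\s*(" + pattern + r")", text)
    return match.group(1) if match else None


# Hypothetical sample with two daily blocks; for the real file one could use
# text = open('vaau_2017.out').read()
text = """\
   Station identifier: VAAU
       Station number: 43014
     Station latitude: 19.85
    Station longitude: 75.40
   Station identifier: VAAU
       Station number: 43014
     Station latitude: 19.85
    Station longitude: 75.40
"""

# Simplified number pattern: optional sign, digits, optional fraction
NUMBER = r"[+-]?\d+(?:\.\d*)?"

station = {
    "identifier": first_field(text, "Station identifier", r"[A-Z]+"),
    "number": first_field(text, "Station number", NUMBER),
    "latitude": first_field(text, "Station latitude", NUMBER),
    "longitude": first_field(text, "Station longitude", NUMBER),
}
print(station)
```

Returning None for a missing label avoids the AttributeError that calling .group(1) directly on a failed search would raise.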

Learning path:

https://www.programiz.com/python-programming/regex

EDIT:

Solution to your question:

import re

filename = "vaau_2017.out"
found = set()  # fields already printed, so each one is reported only once

with open(filename) as f:
    for line in f:

        if 'Station identifier' in line and 'identifier' not in found:
            station_identifier = re.search(r'Station identifier:\s*([A-Z]+)', line).group(1)
            print(station_identifier)  # VAAU
            found.add('identifier')

        if 'Station number' in line and 'number' not in found:
            station_number = re.search(r'Station number: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_number)  # 43014
            found.add('number')

        if 'Station latitude' in line and 'latitude' not in found:
            station_latitude = re.search(r'Station latitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_latitude)  # 19.85
            found.add('latitude')

        if 'Station longitude' in line and 'longitude' not in found:
            station_longitude = re.search(r'Station longitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_longitude)  # 75.40
            found.add('longitude')

        if len(found) == 4:  # all four fields seen once; stop reading
            break
