How to extract two specific numbers from multiple lines of a text file in Python

Question

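I have a very large text file that contains latitude measurements from 2 GPS antennas. The file contains a lot of junk data, and I need to extract the latitude measurements from it. They show up only occasionally, on lines scattered among other lines of text. The lines they appear on look like this: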

12:34:56.789    78:90:12.123123123  BLAH_BLAH   blahblah    :      LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  blah_BlHaBKBjFkjsa.c

The numbers I need are the ones in "LAT #1 MEAS=-80[deg]" and "LAT #2 MEAS=-110[deg]". So basically, -80 and -110.

The rest of the text doesn't matter to me.

Here is some sample text from the input file:

08:59:07.603    08:59:05.798816 PAL_PARR_INTF   TraceModule GET int@HISR :82    drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src)    525 
08:59:07.603    08:59:05.798816 PAL_PARR_INTF   TraceModule xdma is not running drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src)    316 
08:59:07.603    08:59:05.798847 PAL_PARR_INTF   TraceModule DMA is activated    drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src)    461 
08:59:10.847    08:59:09.588001 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src)   1596    
08:59:11.440    08:59:10.876819 UHAL_COMMON TraceWarning    cellRtgSlot=0 cellRtgChip=1500 CELLK_ACTIVE=1 boundary RSN 232482 current RSN 232482 boundarySFN 508 currentSFN 508 uhal_Hmcp.c (../../../HEDGE/UL1/UHAL_3XX/platform/Code/Src) 2224    
08:59:11.440    08:59:10.877277 UHAL_SRCH   TraceWarning    uhal_HmcpSearcherS1LISR: status_reg(0xf0100000) uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src)   1497    
08:59:11.440    08:59:10.877307 UHAL_COMMON TraceWarning    uhal_HmcpSearcherSCDLISR is called. uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src)   1512    
08:59:11.440    08:59:10.877338 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src)   1596

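Right now I'm using the code below to open the file and get these values, but it doesn't work. I'm new to programming, so I don't know where I'm going wrong.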

import re    # Importing 're' for using regular expressions

file_dir=raw_input('Enter the complete Directory of the file (eg c:\\abc.txt):')    # Providing the user with a choice to open their file in .txt format
with open(file_dir, 'r') as f:
    lat_lines= f.read()                                                            # storing the data in a variable

# Declaring the two lists to hold the numbers
raw_lat1 = []
raw_lat2 = []

start_1 = 'LAT #1 MEAS='
end_1 = '[de'

start_2 = 'LAT #2 MEAS='
end_2 = '[de'

x = re.findall(r'start_1(.*?)end_1',lat_lines,re.DOTALL)
raw_lat1.append(x)

y = re.findall(r'start_2(.*?)end_2',lat_lines,re.DOTALL)
raw_lat2.append(y)

Answer 1

This should do it (it doesn't use regular expressions, but it still works):

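answer = []  # one [lat1, lat2] pair per matching line
with open('file.txt') as infile:
    for line in infile:
        # skip every line that does not contain both measurements
        if "LAT #1 MEAS=" not in line: continue
        if "LAT #2 MEAS=" not in line: continue
        splits = line.split('=')
        temp = [0, 0]
        for i, part in enumerate(splits):
            # the text after "LAT #n MEAS=" starts with a token such as "-80[deg],":
            # take that first token and keep only what comes before '['
            if part.endswith("LAT #1 MEAS"): temp[0] = int(splits[i+1].split(None, 1)[0].split('[', 1)[0])
            elif part.endswith("LAT #2 MEAS"): temp[1] = int(splits[i+1].split(None, 1)[0].split('[', 1)[0])
        answer.append(temp)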
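To try this split-based logic without a file on disk, here is a sketch that wraps the same loop in a function taking any iterable of lines. The function name extract_lat_pairs and the shortened sample lines (taken from the question's input, with the trailing paths trimmed) are illustrative assumptions, not part of the original answer.

```python
def extract_lat_pairs(lines):
    """Answer 1's split-based parsing, applied to any iterable of lines."""
    answer = []
    for line in lines:
        # Only lines that carry both measurements are of interest.
        if "LAT #1 MEAS=" not in line or "LAT #2 MEAS=" not in line:
            continue
        splits = line.split('=')
        temp = [0, 0]
        for i, part in enumerate(splits):
            # The text after "LAT #n MEAS=" begins with a token like "-80[deg],";
            # keep the part before '[' and convert it to an int.
            if part.endswith("LAT #1 MEAS"):
                temp[0] = int(splits[i + 1].split(None, 1)[0].split('[', 1)[0])
            elif part.endswith("LAT #2 MEAS"):
                temp[1] = int(splits[i + 1].split(None, 1)[0].split('[', 1)[0])
        answer.append(temp)
    return answer


sample = [
    "08:59:10.847    08:59:09.588001 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c",
    "08:59:11.440    08:59:10.876819 UHAL_COMMON TraceWarning    cellRtgSlot=0 cellRtgChip=1500 CELLK_ACTIVE=1",
    "08:59:11.440    08:59:10.877338 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c",
]
print(extract_lat_pairs(sample))  # [[-80, -110], [-78, -110]]
```

An open file object is itself an iterable of lines, so calling extract_lat_pairs(infile) inside with open('file.txt') as infile: should behave like the original loop.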

Answer 2

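From here I can see two problems with your regex. In the re.findall calls you are using start_1 and end_1 as variables, but the regex actually treats them as the literal characters "start_1", "end_1", and so on. To put variables into a regex string, you have to use a format string. Example: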

r'%s(.*?)%s' % (start_1, end_1)
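A minimal sketch (Python 3, using the question's own start_1/end_1 values and sample line) of why formatting alone is not quite enough: the '[' inside end_1 is still a regex metacharacter, so the formatted pattern does not even compile. The standard-library function re.escape() backslash-escapes such characters before they are inserted.

```python
import re

line = ("12:34:56.789    78:90:12.123123123  BLAH_BLAH   blahblah    :      "
        "LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  blah_BlHaBKBjFkjsa.c")

start_1 = 'LAT #1 MEAS='
end_1 = '[de'

# Plain formatting inserts '[de' unescaped: '[' opens a character set that is
# never closed, so compiling the pattern raises re.error.
try:
    re.compile(r'%s(.*?)%s' % (start_1, end_1))
except re.error as exc:
    print("unescaped pattern fails:", exc)

# re.escape() escapes the special characters in each delimiter first.
pattern = r'%s(.*?)%s' % (re.escape(start_1), re.escape(end_1))
x = re.findall(pattern, line)
print(x)  # ['-80']
```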

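Also, .* matches any character, and a greedy .* matches as many characters as it can, so .*end_1 runs all the way to the last occurrence of end_1. LAT #1 and LAT #2 end the same way, so if everything else in the pattern were correct, the match would actually run to "-80[deg], LAT #2 MEAS=-110[de". (The non-greedy .*? in your code stops at the first occurrence instead.)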
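A small sketch of the greedy vs. non-greedy difference on the question's sample line, with the closing bracket already escaped as \[ so the pattern compiles:

```python
import re

line = ("12:34:56.789    78:90:12.123123123  BLAH_BLAH   blahblah    :      "
        "LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  blah_BlHaBKBjFkjsa.c")

# Greedy: (.*) extends to the LAST "[de" on the line, swallowing LAT #2 too.
greedy = re.findall(r'LAT #1 MEAS=(.*)\[de', line)
# Non-greedy: (.*?) stops at the FIRST "[de" after the start marker.
lazy = re.findall(r'LAT #1 MEAS=(.*?)\[de', line)

print(greedy)  # ['-80[deg], LAT #2 MEAS=-110']
print(lazy)    # ['-80']
```

With re.DOTALL applied to the whole file, as in the question, a greedy .* could also cross line boundaries, which is another reason to prefer .*? or a tighter pattern such as -?\d+.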

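In addition, square brackets must be escaped when you want to match them literally, because square brackets are used to specify character sets in regular expressions.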
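To make the character-set point concrete, a short sketch: unescaped, [deg] matches a single character that is 'd', 'e' or 'g', not the literal text "[deg]".

```python
import re

# Unescaped: "[deg]" is a set matching ONE of 'd', 'e', 'g'.
unescaped = re.findall(r'MEAS=(-?\d+)[deg]', 'MEAS=-80[deg]')
one_char = re.findall(r'MEAS=(-?\d+)[deg]', 'MEAS=-80d')
# Escaped: "\[deg\]" matches the literal text "[deg]".
escaped = re.findall(r'MEAS=(-?\d+)\[deg\]', 'MEAS=-80[deg]')

print(unescaped)  # []
print(one_char)   # ['-80']
print(escaped)    # ['-80']
```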

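Here is an example in which I simply assume that the variable line contains your sample string "12:34:56.789 78:90:12.123123123 BLAH_BLAH blahblah : LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] blah_BlHaBKBjFkjsa.c". You may need to adapt this snippet to process the whole file.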

import re

prefix = r'LAT %s MEAS=(-?\d+)\[deg\]'  # %s is filled in with the antenna number; [deg] is escaped
p1 = r'#1'
p2 = r'#2'
x = re.findall(prefix % p1, line)  # ['-80'] for the sample line
y = re.findall(prefix % p2, line)  # ['-110'] for the sample line
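One way to adapt this to the whole file, sketched under the assumption that both readings always appear on the same line, in the order and format shown in the question ("LAT #1 MEAS=...[deg], LAT #2 MEAS=...[deg]"): a single compiled pattern with two capture groups, applied line by line. The names LAT_PATTERN and read_latitudes are hypothetical; raw_lat1 and raw_lat2 reuse the question's list names.

```python
import re

# Both readings captured at once; the brackets are escaped so they match literally.
LAT_PATTERN = re.compile(r'LAT #1 MEAS=(-?\d+)\[deg\], LAT #2 MEAS=(-?\d+)\[deg\]')


def read_latitudes(lines):
    """Collect the LAT #1 and LAT #2 values (as ints) from every matching line."""
    raw_lat1 = []
    raw_lat2 = []
    for line in lines:
        match = LAT_PATTERN.search(line)
        if match:
            raw_lat1.append(int(match.group(1)))
            raw_lat2.append(int(match.group(2)))
    return raw_lat1, raw_lat2


sample = [
    "08:59:07.603    08:59:05.798847 PAL_PARR_INTF   TraceModule DMA is activated    drv_Shm.c",
    "08:59:10.847    08:59:09.588001 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c",
    "08:59:11.440    08:59:10.877338 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c",
]
lat1, lat2 = read_latitudes(sample)
print(lat1)  # [-80, -78]
print(lat2)  # [-110, -110]
```

For the real file, passing the open file object (with open(file_dir) as f: lat1, lat2 = read_latitudes(f)) iterates line by line, which also avoids loading a very large file into memory with f.read().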


2 answers

Answer 1
0 (accepted) 2016-12-20 00:34:18

Answer 2
0 2016-12-20 00:56:39
