Extracting blocks of numbers from a file with Python?

Question

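I know there have been questions about how to extract numbers from a text file, and they helped partially. Here is my problem. I have a text file that looks like: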

Some crap here: 3434
A couple more lines
of crap.
34 56 56
34 55 55
A bunch more crap here
More crap here: 23
And more: 33
54 545 54
4555 55 55

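I am trying to write a script that extracts the lines with three numbers and puts each block of them into a separate text file. For example, I would have one file containing: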

34 56 56
34 55 55

and another file containing:

54 545 54
4555 55 55

Right now I have:

for line in file_in:
    try:
        float(line[1])
        file_out.write(line)
    except ValueError:
        print "Just using this as placeholder"

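This successfully puts both blocks of numbers into a single file. But I need one block in one file and the other block in another file, and I have no idea how to accomplish this.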

Answer 1

To check whether a string is a number, you can use str.isdigit:

split = 0            # index of the current output file
should_split = True  # start a new file at the next numeric line
for line in file_in:
    # split the line into whitespace-separated parts
    parts = line.split()
    # a numeric line is non-empty and every part is made of digits only
    if parts and all(part.isdigit() for part in parts):
        if should_split:
            # first numeric line after a gap: move on to a new file
            split += 1
            # don't split again until we skip a line
            should_split = False
        with open('split%d' % split, 'a') as f:
            f.write(line)
    else:
        # a skipped line means the next numeric line starts a new block
        should_split = True
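
A variation on the same idea, sketched for Python 3 with itertools.groupby, which groups consecutive lines by whether they look like a numeric row. The helper names is_number_row and number_blocks are made up for this illustration, and the sketch assumes a numeric row means exactly three whitespace-separated digit strings, as in the question's sample data.

```python
from itertools import groupby


def is_number_row(line, width=3):
    """Return True if the line holds exactly `width` digit-only fields."""
    parts = line.split()
    return len(parts) == width and all(part.isdigit() for part in parts)


def number_blocks(lines):
    """Yield each run of consecutive numeric rows as a list of lines."""
    for is_numeric, group in groupby(lines, key=is_number_row):
        if is_numeric:
            yield list(group)


# The question's sample data, one list item per line.
sample = [
    'Some crap here: 3434\n',
    'A couple more lines\n',
    'of crap.\n',
    '34 56 56\n',
    '34 55 55\n',
    'A bunch more crap here\n',
    'More crap here: 23\n',
    'And more: 33\n',
    '54 545 54\n',
    '4555 55 55\n',
]
blocks = list(number_blocks(sample))
```

Each item of blocks can then be written to its own file, for example with f.writelines(block) inside a loop over enumerate(blocks, start=1). Because number_blocks accepts any iterable of lines, an open file object can be passed in place of sample.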

Answer 2

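You did not say which version of Python you are using, but this is one way to do it in Python 2.7.

string.translate takes a translation table (which can be None) and a set of characters to delete (when the table is None, the call only deletes).

You can set the characters to delete (remove_chars below) to everything except 0-9 and space by slicing string.printable appropriately: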

>>> import string
>>> remove_chars = string.printable[10:-6] + string.printable[-4:]
>>> string.translate('Some crap 3434', None, remove_chars)
'  3434'
>>> string.translate('34 45 56', None, remove_chars)
'34 45 56'

Add a strip to trim whitespace on the left and right, and iterate over a test file containing the data from your question:

>>> with open('testfile.txt') as testfile:
...   for line in testfile:
...     trans = line.translate(None, remove_chars).strip()
...     if trans:
...       print trans
... 
3434
34 56 56
34 55 55
23
33
54 545 54
4555 55 55
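
In Python 3 the string.translate function no longer exists and str.translate takes only a table, so the calls above do not carry over unchanged. A rough equivalent, assuming Python 3, builds a deletion table with str.maketrans. The names delete_table and digits_only are illustrative, and remove_chars is computed with a comprehension here instead of the slices above.

```python
import string

# Every printable character except the digits 0-9 and the space character.
remove_chars = ''.join(
    c for c in string.printable if not (c.isdigit() or c == ' ')
)

# With three arguments, str.maketrans maps each character of the third
# argument to None, which makes str.translate delete it.
delete_table = str.maketrans('', '', remove_chars)


def digits_only(line):
    """Delete everything but digits and spaces, then trim both ends."""
    return line.translate(delete_table).strip()
```

As in the session above, digits_only returns an empty string for a line with no digits, so the same `if trans:` check can be used to skip such lines.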

Answer 3

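You can use a regex here, but this requires reading the file into a variable, via file.read() or some other way (fine as long as the file is not huge).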

((?:(?:\d+ ){2}\d+(?:\n|$))+)

See the demo.

https://regex101.com/r/tX2bH4/20

import re
p = re.compile(r'((?:(?:\d+ ){2}\d+(?:\n|$))+)', re.IGNORECASE)
test_str = "Some crap here: 3434\nA couple more lines\nof crap.\n34 56 56\n34 55 55\nA bunch more crap here\nMore crap here: 23\nAnd more: 33\n54 545 54\n4555 55 55"

re.findall(p, test_str)

re.findall returns a list. You can easily write each item of the list to a new file.
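
A minimal sketch of that last step, assuming Python 3. The function names find_blocks and write_blocks and the block%d.txt naming scheme are invented for illustration. The pattern is the one from this answer with the outer capturing group and the re.IGNORECASE flag dropped, which does not change what findall returns. Since the pattern is not anchored to line starts, it can also match three numbers at the end of a longer line.

```python
import os
import re

# Same pattern as in the answer, without the outer capturing group.
BLOCK_RE = re.compile(r'(?:(?:\d+ ){2}\d+(?:\n|$))+')


def find_blocks(text):
    """Return every run of consecutive three-number lines as one string."""
    return BLOCK_RE.findall(text)


def write_blocks(text, directory='.', name_format='block%d.txt'):
    """Write each block to its own file and return the paths written."""
    paths = []
    for index, block in enumerate(find_blocks(text), start=1):
        path = os.path.join(directory, name_format % index)
        with open(path, 'w') as f:
            f.write(block)
        paths.append(path)
    return paths
```

With the file contents read into text, write_blocks(text) would write one file per block into the current directory.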

3 answers

Answer 1
0 2015-01-15 16:51:23

Answer 2
0 2015-01-15 17:01:39

Answer 3
0 2015-01-15 17:01:42
