Extract chunks of numbers from a file using Python?

Question

I know there are questions on how to extract numbers from a text file, and they have helped partially. Here is my problem. I have a text file that looks like this:

Some crap here: 3434
A couple more lines
of crap.
34 56 56
34 55 55
A bunch more crap here
More crap here: 23
And more: 33
54 545 54
4555 55 55

I am trying to write a script that extracts the lines containing three numbers and puts each block of them into a separate text file. For example, I'd have one file:

34 56 56
34 55 55

And another file:

54 545 54
4555 55 55

Right now I have:

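for line in file_in:
    try:
        # Only the second character is tested: it parses as a number on the
        # digit lines ("34 56 56") and fails on the text lines of the sample.
        float(line[1])
        file_out.write(line)
    except (ValueError, IndexError):  # IndexError: line shorter than 2 chars
        print("Just using this as placeholder")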

This successfully puts both chunks of numbers into a single file. But I need each chunk in its own file, and I'm lost on how to accomplish this.

Answer 1

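To check whether a string consists only of digits (no sign or decimal point), you can use str.isdigit. The code below starts a new output file each time a run of numeric lines begins after a non-numeric line: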

split = 0             # number of the current output file
should_split = True   # the next numeric line starts a new file
for line in file_in:
    # split line to parts
    parts = line.strip().split()
    # check the line is non-empty and all parts are numbers
    if parts and all(part.isdigit() for part in parts):
        if should_split:
            split += 1
            # don't split again until we skip a non-numeric line
            should_split = False
        with open('split%d' % split, 'a') as f:
            f.write(line)
    elif not should_split:
        # a skipped line means the next numeric line starts a new file
        should_split = True
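A minimal alternative sketch of the same idea, assuming Python 3 and swapping the flag variable for itertools.groupby (my substitution, not part of the answer above): groupby splits the lines into alternating runs of numeric and non-numeric lines, and every numeric run is one chunk. The helper names and the split1, split2, ... file naming are illustrative assumptions.

```python
from itertools import groupby

def is_number_line(line):
    """True if the line is non-empty and every whitespace-separated part is digits.

    Add `and len(parts) == 3` if only three-number lines should count.
    """
    parts = line.split()
    return bool(parts) and all(part.isdigit() for part in parts)

def numeric_chunks(lines):
    """Yield each run of consecutive numeric lines as a list of lines."""
    for is_numeric, group in groupby(lines, key=is_number_line):
        if is_numeric:
            yield list(group)

def write_chunks(lines, prefix='split'):
    """Write chunk n to the file '<prefix><n>' (hypothetical naming); return the names."""
    names = []
    for n, chunk in enumerate(numeric_chunks(lines), start=1):
        name = '%s%d' % (prefix, n)
        with open(name, 'w') as f:
            f.writelines(chunk)
        names.append(name)
    return names

SAMPLE = ("Some crap here: 3434\nA couple more lines\nof crap.\n34 56 56\n"
          "34 55 55\nA bunch more crap here\nMore crap here: 23\n"
          "And more: 33\n54 545 54\n4555 55 55\n").splitlines(keepends=True)

# Two chunks: ['34 56 56\n', '34 55 55\n'] and ['54 545 54\n', '4555 55 55\n']
chunks = list(numeric_chunks(SAMPLE))
```

With a real file you can pass the open file object directly, e.g. write_chunks(file_in), because iterating over a file yields its lines.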

Answer 2

You didn't specify which version of Python you are using, but in Python 2.7 you might approach it this way.

string.translate(s, table, deletechars) deletes every character in deletechars from s and then maps the remaining characters through table; passing None as the table means only the deletion step is performed.

You can set remove_chars to every printable character except the digits 0-9, the space and the tab by slicing string.printable (digits, then letters, punctuation and whitespace) correctly:

>>> import string
>>> remove_chars = string.printable[10:-6] + string.printable[-4:]
>>> string.translate('Some crap 3434', None, remove_chars)
'  3434'
>>> string.translate('34 45 56', None, remove_chars)
'34 45 56'

Adding strip() to trim whitespace on both ends, and iterating over a test file containing the data from your question:

>>> with open('testfile.txt') as testfile:
...   for line in testfile:
...     trans = line.translate(None, remove_chars).strip()
...     if trans:
...       print trans
... 
3434
34 56 56
34 55 55
23
33
54 545 54
4555 55 55
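The answer above is Python 2 only: in Python 3 there is no string.translate function and str.translate takes a single mapping. A minimal Python 3 sketch of the same filter, assuming the three-argument form of str.maketrans to build a deletion table (the helper name digits_only is hypothetical):

```python
import string

# string.printable is digits + ascii_letters + punctuation + whitespace, and
# whitespace is ' \t\n\r\x0b\x0c'. Same slices as the answer: skip the 10
# digits at the front and leave ' ' and '\t' out of the deletion set.
remove_chars = string.printable[10:-6] + string.printable[-4:]

# The third argument of str.maketrans lists characters to map to None (delete).
DELETE_TABLE = str.maketrans('', '', remove_chars)

def digits_only(line):
    """Delete everything but digits, spaces and tabs, then trim both ends."""
    return line.translate(DELETE_TABLE).strip()

text = ("Some crap here: 3434\nA couple more lines\nof crap.\n34 56 56\n"
        "34 55 55\nA bunch more crap here\nMore crap here: 23\n"
        "And more: 33\n54 545 54\n4555 55 55\n")

# Keep only lines that still contain something after the deletion.
kept = [t for t in (digits_only(line) for line in text.splitlines()) if t]
```

Characters outside string.printable (for example non-ASCII letters) are not in the deletion set, so they would survive the filter.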

Answer 3

You can use a regex here, but this requires reading the whole file into a variable with file.read() or similar (fine as long as the file is not huge).

((?:(?:\d+ ){2}\d+(?:\n|$))+)

See the demo:

https://regex101.com/r/tX2bH4/20

import re
# Each match is a run of consecutive lines ending in three space-separated integers.
p = re.compile(r'((?:(?:\d+ ){2}\d+(?:\n|$))+)')
test_str = "Some crap here: 3434\nA couple more lines\nof crap.\n34 56 56\n34 55 55\nA bunch more crap here\nMore crap here: 23\nAnd more: 33\n54 545 54\n4555 55 55"

p.findall(test_str)
# ['34 56 56\n34 55 55\n', '54 545 54\n4555 55 55']

re.findall returns a list with one string per block of number lines, so you can easily write each element to a new file.
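A minimal Python 3 sketch of that last step. It reuses the pattern above (without re.IGNORECASE, which has no effect on digits and spaces); the function name write_blocks and the chunk1.txt, chunk2.txt, ... file names are hypothetical. The pattern assumes exactly one space between numbers.

```python
import re

# Same pattern as the answer: one or more consecutive lines of the form
# "<int> <int> <int>", each followed by a newline or the end of the text.
BLOCK_RE = re.compile(r'((?:(?:\d+ ){2}\d+(?:\n|$))+)')

def write_blocks(text, prefix='chunk'):
    """Write each matched block to '<prefix><n>.txt' and return the file names."""
    names = []
    for n, block in enumerate(BLOCK_RE.findall(text), start=1):
        name = '%s%d.txt' % (prefix, n)
        with open(name, 'w') as f:
            f.write(block)
        names.append(name)
    return names

sample = ("Some crap here: 3434\nA couple more lines\nof crap.\n34 56 56\n"
          "34 55 55\nA bunch more crap here\nMore crap here: 23\n"
          "And more: 33\n54 545 54\n4555 55 55")
blocks = BLOCK_RE.findall(sample)

# Usage with a real file ('input.txt' is a placeholder name):
# with open('input.txt') as f:
#     write_blocks(f.read())
```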

3 answers

Answer 1: 0 votes, 2015-01-15 16:51:23

Answer 2: 0 votes, 2015-01-15 17:01:39

Answer 3: 0 votes, 2015-01-15 17:01:42
