Parsing different data types with Python

Question

I want to parse the following line:

#3 = IFCPERSONANDORGANIZATION(#4, #5, $);

and I want to extract the numbers 3, 4, and 5 as integers and 'IFCPERSONANDORGANIZATION' as a string, in order to save these attributes in a graph with networkx.

This is my code:

data = []
with open('test.ifc') as f:
    for line in f:
        if line.startswith('#'):
            words = line.rstrip().split('#')
            print(words)
            node = int(words[0])
            data.append(node)

Error: ValueError: invalid literal for int() with base 10: ''

How can I use regex if the line structure is different every time? Like this:

#3 = IFCPERSONANDORGANIZATION(#4, #5, $);
#2 = IFCOWNERHISTORY(#3, #6, $, .NOTDEFINED., $, $, $, 1348486883);
#4 = IFCPERSON($, 'Bonsma', 'Peter', $, $, $, $, $);
#5 = IFCORGANIZATION($, 'RDF', 'RDF Ltd.', $, $);
#6 = IFCAPPLICATION(#5, '0.10', 'Test Application', 'TA 1001');

Answer 1

You could use regex:

import re
line = '#3 = IFCPERSONANDORGANIZATION(#4, #5, $);'
node, name, a, b = re.search(r'(\d+) = (\w+)\(#(\d+), #(\d+), \$\)', line).groups()
node, a, b = map(int, [node, a, b])
print(node, name, a, b)

prints

3 IFCPERSONANDORGANIZATION 4 5
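
The pattern above is tied to one specific line shape. For lines whose arguments vary, one option is to split the work in two: match the `#id = NAME(...);` envelope first, then pull the `#n` references out of the argument text separately. This is a minimal sketch, assuming every entity sits on a single line in that form; `parse_entity` is a hypothetical helper name, not part of any library.

```python
import re

# Matches "#<id> = <NAME>(<arguments>);" and captures the three parts.
ENTITY_RE = re.compile(r'#(\d+)\s*=\s*(\w+)\s*\((.*)\)\s*;')
# Matches "#<digits>" references inside the argument text.
REF_RE = re.compile(r'#(\d+)')

def parse_entity(line):
    """Return (node_id, name, refs), or None if the line is not an entity."""
    m = ENTITY_RE.match(line.strip())
    if m is None:
        return None
    node_id, name, args = m.groups()
    # Assumption: any "#<digits>" in the arguments is a reference to another entity.
    refs = [int(r) for r in REF_RE.findall(args)]
    return int(node_id), name, refs

print(parse_entity("#3 = IFCPERSONANDORGANIZATION(#4, #5, $);"))
print(parse_entity("#2 = IFCOWNERHISTORY(#3, #6, $, .NOTDEFINED., $, $, $, 1348486883);"))
```

One caveat: a `#` followed by digits inside a quoted string argument would also be picked up as a reference, so this sketch is not a full IFC parser.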

Answer 2

This may be a late comment, but I came across your question and the answers while doing a similar search. @user3926906, the structure of an IFC file generally changes from file to file. When you used re.search(), did you run into any trouble splitting out the # references of the entities? I am asking because some entities do not use # to refer to other entities. Thanks
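
On the point about entities that carry no # references: one way to cope, sketched here under the assumption that each entity is on one line, is to collect references with `re.findall`, which returns an empty list when there are none. The variable names `nodes` and `edges` are illustrative only.

```python
import re

lines = [
    "#3 = IFCPERSONANDORGANIZATION(#4, #5, $);",
    "#4 = IFCPERSON($, 'Bonsma', 'Peter', $, $, $, $, $);",
    "#6 = IFCAPPLICATION(#5, '0.10', 'Test Application', 'TA 1001');",
]

nodes = {}   # node id -> entity name
edges = []   # (source id, referenced id) pairs

for line in lines:
    m = re.match(r'#(\d+)\s*=\s*(\w+)\s*\((.*)\)\s*;', line)
    if m is None:
        continue  # skip anything that is not an entity line
    node_id = int(m.group(1))
    nodes[node_id] = m.group(2)
    # findall yields [] when the arguments contain no "#n", so no edges are added
    for ref in re.findall(r'#(\d+)', m.group(3)):
        edges.append((node_id, int(ref)))

print(nodes)
print(edges)
```

Here `#4` becomes a node with no outgoing edges. The resulting pairs could then be handed to networkx, for example with `Graph.add_edges_from(edges)`.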

2 answers

Answer 1
0 2014-08-10 14:42:54

Answer 2
0 2015-07-09 13:18:31