UnicodeDecodeError while reading networkx GML

I am trying to read a GML file using:

nx.read_gml('test.gml')

I checked the networkx read_gml() documentation, which says that the GML specification requires files to be ASCII encoded. So when I write the GML, I use the following:

import sys
import networkx as nx

reload(sys)  # Python 2 only: re-exposes sys.setdefaultencoding
sys.setdefaultencoding('ascii')
nx.write_gml(g, fname + '.gml')

The content of test.gml is given below.

graph [
  name "Country-based Relationships Graph"
  node [
    id 0
    label "1-14014874"
    ned 0
    ntype 1
    name "3A/63 KIRRIBILLI AVENUE; KIRRIBILLI; AUSTRALIA"
  ]
  node [
    id 1
    label "2-12097019"
    name "ANTHONY WADDELL LATIMER"
    ned 0
    ntype 2
  ]
  node [
    id 2
    label "2-12201665"
    name "QUEENSLAND M M PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 3
    label "1-14007784"
    ned 0
    ntype 1
    name "2/15 MOSMAN STREET MOSMAN 2088"
  ]
  node [
    id 4
    label "1-14007787"
    ned 0
    ntype 1
    name "2/19 SEABREEZE PLACE THIRROUL NSW 2515"
  ]
  node [
    id 5
    label "4-10124385"
    name "SOUTH AMERICAN FERRO METALS LIMITED"
    ned 0
    ntype 4
  ]
  node [
    id 6
    label "2-12100977"
    name "MARTIN AYLMER GREEN"
    ned 0
    ntype 2
  ]
  node [
    id 7
    label "1-14023939"
    ned 0
    ntype 1
    name "9/4 BILLYARD AVENUE; ELIZABETH BAY; AUSTRALIA"
  ]
  node [
    id 8
    label "1-14017022"
    ned 0
    ntype 1
    name ""47/228 MOORE PARKE ROAD, PADDINGTON""
  ]
  node [
    id 9
    label "2-12095303"
    name "GOLDFIND HOLDINGS PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 10
    label "1-14019821"
    ned 0
    ntype 1
    name ""5 RISORTA AVENUE, ST IVES 2075 AUSTRALIA""
  ]
  node [
    id 11
    label "1-14001076"
    ned 0
    ntype 1
    name ""10B CONWAY AVENUE, ROSE BAY NSW 2028""
  ]
  node [
    id 12
    label "2-12195748"
    name "DAVID GRAHAM GRAY"
    ned 0
    ntype 2
  ]
  node [
    id 13
    label "2-12220072"
    name "GEORGE KOTEFSKI"
    ned 0
    ntype 2
  ]
  node [
    id 14
    label "2-12121150"
    name "SINO EUROPE INVESTMENTS LIMITED"
    ned 0
    ntype 2
  ]
  node [
    id 15
    label "2-12129794"
    name "STEPHEN JOHN TURNER"
    ned 1
    ntype 2
  ]
  node [
    id 16
    label "1-14003998"
    ned 0
    ntype 1
    name "149 Hudson Parade; Clareville; NSW 2107; Australia"
  ]
  node [
    id 17
    label "4-10110003"
    ned 0
    ntype 4
    name "BUNSWICK INVESTMENTS LIMITED"
  ]
  node [
    id 18
    label "1-14014446"
    ned 0
    ntype 1
    name ""373 EDINBURGH RD, CASTLECRAG NSW""
  ]
  node [
    id 19
    label "1-14024552"
    ned 0
    ntype 1
    name ""9 BARRACK STREET, SYDNEY NSW 2000""
  ]
  node [
    id 20
    label "1-14082647"
    name "UNIT 4/281 O'SULLIVAN ROAD BELLEVUE HILL NSW 2023"
    ned 0
    ntype 1
  ]
  node [
    id 21
    label "1-14002851"
    ned 0
    ntype 1
    name "12 CHERUB CLOSE; BALLUJARA; WA 6066 AUSTRALIA"
  ]
  node [
    id 22
    label "2-12220071"
    name "THE KOGOS FAMILY TRUST"
    ned 0
    ntype 2
  ]
  node [
    id 23
    label "2-12171442"
    name "MICHAEL JOHN DOYLE"
    ned 0
    ntype 2
  ]
  node [
    id 24
    label "2-12171441"
    name "GEORGINA TSOUTSOURAS"
    ned 0
    ntype 2
  ]
  node [
    id 25
    label "2-12171440"
    name "TERESA ODETTE VIEIRA GARCES"
    ned 0
    ntype 2
  ]
  node [
    id 26
    label "2-12171465"
    name "South American Ferro Metals Limited (Formerly “Riviera Resources Limited”"
    ned 0
    ntype 2
  ]
  node [
    id 27
    label "2-12171464"
    name "MP CAPITAL PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 28
    label "2-12098018"
    name "Burton Securities Limited"
    ned 0
    ntype 2
  ]
  node [
    id 29
    label "2-12171461"
    name "PAUL BOURS"
    ned 0
    ntype 2
  ]
  node [
    id 30
    label "2-12129946"
    name "STEPHEN JOHN TURNER"
    ned 1
    ntype 2
  ]
  node [
    id 31
    label "2-12171463"
    name "BARRY ROBERT MCINNES"
    ned 0
    ntype 2
  ]
  node [
    id 32
    label "2-12171443"
    name "LORI MARGARET RAYNER"
    ned 0
    ntype 2
  ]
  node [
    id 33
    label "2-12171502"
    name "Afro Pacific Capital Pty Limited"
    ned 0
    ntype 2
  ]
  node [
    id 34
    label "2-12129947"
    name "STEPHEN JOHN TURNER"
    ned 1
    ntype 2
  ]
  node [
    id 35
    label "4-10116684"
    name "SUPER ALLOYS AFRICA LIMITED"
    ned 0
    ntype 4
  ]
  node [
    id 36
    label "4-10204113"
    ned 0
    ntype 4
    name "African Chrome Limited"
  ]
  node [
    id 37
    label "1-14002991"
    ned 0
    ntype 1
    name ""12 LAVONI STREET, BALMORAL BEACH 2088""
  ]
  node [
    id 38
    label "1-14063550"
    name "PO BOX 6215; SOUTH YARRA; VICTORIA; 3141 AUSTRALIA"
    ned 0
    ntype 1
  ]
  node [
    id 39
    label "1-14063476"
    ned 0
    ntype 1
    name "PO BOX 6009; QUEANBEYAN; N.S.W. 2620 AUSTRALIA"
  ]
  node [
    id 40
    label "2-12070744"
    name "GARRY JACK COHEN"
    ned 0
    ntype 2
  ]
  node [
    id 41
    label "2-13010180"
    name "ALAN DAVID DOYLE"
    ned 1
    ntype 2
  ]
  node [
    id 42
    label "1-14021460"
    ned 0
    ntype 1
    name "6 Wray Street; Batemans Bay; NSW 2536; Australia"
  ]
  node [
    id 43
    label "2-12117437"
    name "IAN DONALD PRATT"
    ned 0
    ntype 2
  ]
  node [
    id 44
    label "2-12110721"
    name "ALAN DAVID DOYLE"
    ned 0
    ntype 2
  ]
  node [
    id 45
    label "2-12108471"
    name "MEGAN BLACK"
    ned 0
    ntype 2
  ]
  node [
    id 46
    label "1-14023938"
    ned 0
    ntype 1
    name "9/4 BILLYARD AVENUE; ELIZABETH BAY; AUASTRALIA"
  ]
  node [
    id 47
    label "4-10204047"
    ned 0
    ntype 4
    name "ASIA ROCK HOLDINGS LTD."
  ]
  node [
    id 48
    label "2-12117663"
    name "JOHN LYSTER ABEL"
    ned 0
    ntype 2
  ]
  node [
    id 49
    label "2-12107354"
    name "W J K INVESTMENTS PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 50
    label "2-12121288"
    name "Barry Robert Mcinnes Superannuation Fund"
    ned 0
    ntype 2
  ]
  node [
    id 51
    label "2-12115776"
    name "KELLY LEANNE BINDON"
    ned 0
    ntype 2
  ]
  node [
    id 52
    label "4-10128856"
    ned 0
    ntype 4
    name "ASD SERVICES LIMITED"
  ]
  node [
    id 53
    label "2-12196030"
    name "Patermat Pty Limited"
    ned 0
    ntype 2
  ]
  node [
    id 54
    label "1-14017203"
    ned 0
    ntype 1
    name "486 THE RIDGE ROAD; SURF BEACH; NSW 2536"
  ]
  node [
    id 55
    label "1-14013008"
    ned 0
    ntype 1
    name "30 JAPONICA AVE WEST EPPING 2121"
  ]
  node [
    id 56
    label "2-12219691"
    name "ALAN DAVID DOYLE"
    ned 1
    ntype 2
  ]
  node [
    id 57
    label "2-12171462"
    name "SUNTRONIC PTY LTD"
    ned 0
    ntype 2
  ]
  node [
    id 58
    label "2-12128104"
    name "BANYAN PROPERTIES INC."
    ned 0
    ntype 2
  ]
  node [
    id 59
    label "1-14022758"
    ned 0
    ntype 1
    name "8/4 BILLYARD AVENUE; ELIZABETH BAY 2011; AUSTRALIA"
  ]
  node [
    id 60
    label "1-14048421"
    name "LEVEL 11; 151 MACQUARIE STREET; SYDNEY"
    ned 0
    ntype 1
  ]
  node [
    id 61
    label "1-14048420"
    ned 0
    ntype 1
    name "Level 11; 1511 Macquarie Street; Sydney NSW 2000; Australia"
  ]
  node [
    id 62
    label "1-14048422"
    ned 0
    ntype 1
    name "Level 11; 151 Macquarie Street; Sydney; NSW; 2000; Australia"
  ]
  node [
    id 63
    label "2-12198327"
    name "MALACHITE ENTERPRISES LIMITED"
    ned 0
    ntype 2
  ]
  node [
    id 64
    label "2-12198326"
    name "VICTORIAN SECURITIES NO 2 PTY LTD."
    ned 0
    ntype 2
  ]
  node [
    id 65
    label "2-12095211"
    name "TREVOR JONES"
    ned 0
    ntype 2
  ]
  node [
    id 66
    label "1-14001856"
    ned 0
    ntype 1
    name "11A BURTON STREET MOSMAN 2070"
  ]
  node [
    id 67
    label "1-14008805"
    ned 0
    ntype 1
    name "21 BRETT STREET KINGS LANGLEY NSW 2147"
  ]
  node [
    id 68
    label "4-10131080"
    ned 0
    ntype 4
    name "BRAZILIAN IRON LIMITED"
  ]
  node [
    id 69
    label "1-14064238"
    ned 0
    ntype 1
    name "PO Box N284 Grosvenor Place; Sydney NSW 1220; Australia"
  ]
  node [
    id 70
    label "2-12113688"
    name "GABRIELLE MARY JARVIS"
    ned 0
    ntype 2
  ]
  node [
    id 71
    label "2-12103291"
    name "DAVID ANTHONY BURROUGHS"
    ned 0
    ntype 2
  ]
  node [
    id 72
    label "1-14006789"
    ned 0
    ntype 1
    name ""19 COLLINS AVENUE, ROSE BAY NSW 2028""
  ]
  node [
    id 73
    label "1-14082636"
    name "UNIT 3A; 63-65 KIRRIBILLI AVENUE; KIRRIBILLI; NSW 2061; AUSTRALIA"
    ned 0
    ntype 1
  ]
  node [
    id 74
    label "1-14079412"
    ned 0
    ntype 1
    name "SUITE 2; 1233 HIGH STREET; ARMELALE VIC 3143 AUSTRALIA"
  ]
  node [
    id 75
    label "4-10109120"
    name "PACK-TECH INTERNATIONAL LICENSING PTY LIMITED"
    ned 0
    ntype 4
  ]
  edge [
    source 0
    target 44
    weight 1
  ]
  edge [
    source 1
    target 37
    weight 1
  ]
  edge [
    source 1
    target 5
    weight 2
  ]
  edge [
    source 2
    target 5
    weight 1
  ]
  edge [
    source 2
    target 74
    weight 1
  ]
  edge [
    source 3
    target 49
    weight 1
  ]
  edge [
    source 4
    target 71
    weight 1
  ]
  edge [
    source 5
    target 32
    weight 2
  ]
  edge [
    source 5
    target 6
    weight 2
  ]
  edge [
    source 5
    target 24
    weight 2
  ]
  edge [
    source 5
    target 12
    weight 2
  ]
  edge [
    source 5
    target 25
    weight 2
  ]
  edge [
    source 5
    target 14
    weight 1
  ]
  edge [
    source 5
    target 22
    weight 2
  ]
  edge [
    source 5
    target 23
    weight 2
  ]
  edge [
    source 5
    target 9
    weight 2
  ]
  edge [
    source 5
    target 13
    weight 1
  ]
  edge [
    source 5
    target 26
    weight 2
  ]
  edge [
    source 5
    target 27
    weight 1
  ]
  edge [
    source 5
    target 28
    weight 1
  ]
  edge [
    source 5
    target 29
    weight 1
  ]
  edge [
    source 5
    target 30
    weight 1
  ]
  edge [
    source 5
    target 31
    weight 2
  ]
  edge [
    source 5
    target 45
    weight 1
  ]
  edge [
    source 5
    target 33
    weight 1
  ]
  edge [
    source 5
    target 40
    weight 1
  ]
  edge [
    source 5
    target 58
    weight 1
  ]
  edge [
    source 5
    target 43
    weight 1
  ]
  edge [
    source 5
    target 44
    weight 2
  ]
  edge [
    source 5
    target 57
    weight 1
  ]
  edge [
    source 5
    target 48
    weight 2
  ]
  edge [
    source 5
    target 49
    weight 2
  ]
  edge [
    source 5
    target 50
    weight 1
  ]
  edge [
    source 5
    target 51
    weight 2
  ]
  edge [
    source 5
    target 53
    weight 1
  ]
  edge [
    source 5
    target 63
    weight 1
  ]
  edge [
    source 5
    target 64
    weight 1
  ]
  edge [
    source 5
    target 65
    weight 1
  ]
  edge [
    source 5
    target 70
    weight 2
  ]
  edge [
    source 5
    target 71
    weight 1
  ]
  edge [
    source 6
    target 72
    weight 1
  ]
  edge [
    source 7
    target 34
    weight 1
  ]
  edge [
    source 25
    target 60
    weight 1
  ]
  edge [
    source 8
    target 43
    weight 1
  ]
  edge [
    source 24
    target 60
    weight 1
  ]
  edge [
    source 10
    target 14
    weight 1
  ]
  edge [
    source 12
    target 39
    weight 1
  ]
  edge [
    source 13
    target 20
    weight 1
  ]
  edge [
    source 15
    target 17
    weight 1
  ]
  edge [
    source 15
    target 35
    weight 1
  ]
  edge [
    source 15
    target 59
    weight 1
  ]
  edge [
    source 15
    target 34
    weight 1
  ]
  edge [
    source 15
    target 30
    weight 1
  ]
  edge [
    source 16
    target 28
    weight 1
  ]
  edge [
    source 18
    target 51
    weight 1
  ]
  edge [
    source 19
    target 58
    weight 1
  ]
  edge [
    source 20
    target 22
    weight 1
  ]
  edge [
    source 21
    target 9
    weight 1
  ]
  edge [
    source 23
    target 60
    weight 1
  ]
  edge [
    source 11
    target 40
    weight 1
  ]
  edge [
    source 26
    target 62
    weight 1
  ]
  edge [
    source 27
    target 60
    weight 1
  ]
  edge [
    source 29
    target 60
    weight 1
  ]
  edge [
    source 30
    target 46
    weight 1
  ]
  edge [
    source 30
    target 34
    weight 1
  ]
  edge [
    source 31
    target 60
    weight 1
  ]
  edge [
    source 32
    target 60
    weight 1
  ]
  edge [
    source 33
    target 61
    weight 1
  ]
  edge [
    source 34
    target 75
    weight 1
  ]
  edge [
    source 35
    target 56
    weight 1
  ]
  edge [
    source 36
    target 41
    weight 1
  ]
  edge [
    source 38
    target 63
    weight 1
  ]
  edge [
    source 38
    target 64
    weight 1
  ]
  edge [
    source 41
    target 73
    weight 1
  ]
  edge [
    source 41
    target 56
    weight 1
  ]
  edge [
    source 41
    target 68
    weight 1
  ]
  edge [
    source 42
    target 50
    weight 1
  ]
  edge [
    source 45
    target 67
    weight 1
  ]
  edge [
    source 47
    target 56
    weight 1
  ]
  edge [
    source 48
    target 54
    weight 1
  ]
  edge [
    source 52
    target 56
    weight 1
  ]
  edge [
    source 53
    target 69
    weight 1
  ]
  edge [
    source 55
    target 70
    weight 1
  ]
  edge [
    source 56
    target 73
    weight 1
  ]
  edge [
    source 56
    target 75
    weight 1
  ]
  edge [
    source 57
    target 60
    weight 1
  ]
  edge [
    source 65
    target 66
    weight 1
  ]
]

However, when I try to read the GML, networkx throws the following exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

How can I:

  1. read this GML properly? Is it possible to know which line of the GML is causing this error?
  2. write the GML without causing the above error?

With networkx version 2.0b1 running on Windows 7, I got the following error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-9-12160890b63f> in <module>()
----> 1 G = nx.read_gml('test.gml')

<decorator-gen-501> in read_gml(path, label, destringizer)

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\utils\decorators.pyc in _open_file(func, *args, **kwargs)
    219         # Finally, we call the original function, making sure to close the fobj.
    220         try:
--> 221             result = func(*new_args, **kwargs)
    222         finally:
    223             if close_fobj:

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in read_gml(path, label, destringizer)
    216             yield line
    217 
--> 218     G = parse_gml_lines(filter_lines(path), label, destringizer)
    219     return G
    220 

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_gml_lines(lines, label, destringizer)
    396 
    397     tokens = tokenize()
--> 398     graph = parse_graph()
    399 
    400     directed = graph.pop('directed', False)

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_graph()
    385 
    386     def parse_graph():
--> 387         curr_token, dct = parse_kv(next(tokens))
    388         if curr_token[0] is not None:  # EOF
    389             unexpected(curr_token, 'EOF')

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_kv(curr_token)
    370                 curr_token = next(tokens)
    371             elif category == 4:  # dict start
--> 372                 curr_token, value = parse_dict(curr_token)
    373             else:
    374                 unexpected(curr_token, "an int, float, string or '['")

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_dict(curr_token)
    380     def parse_dict(curr_token):
    381         curr_token = consume(curr_token, 4, "'['")    # dict start
--> 382         curr_token, dct = parse_kv(curr_token)
    383         curr_token = consume(curr_token, 5, "']'")  # dict end
    384         return curr_token, dct

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_kv(curr_token)
    370                 curr_token = next(tokens)
    371             elif category == 4:  # dict start
--> 372                 curr_token, value = parse_dict(curr_token)
    373             else:
    374                 unexpected(curr_token, "an int, float, string or '['")

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_dict(curr_token)
    380     def parse_dict(curr_token):
    381         curr_token = consume(curr_token, 4, "'['")    # dict start
--> 382         curr_token, dct = parse_kv(curr_token)
    383         curr_token = consume(curr_token, 5, "']'")  # dict end
    384         return curr_token, dct

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in parse_kv(curr_token)
    362                 curr_token = next(tokens)
    363             elif category == 3:  # strings
--> 364                 value = unescape(curr_token[1][1:-1])
    365                 if destringizer:
    366                     try:

c:\python2_7_13\lib\site-packages\networkx-2.0b1-py2.7.egg\networkx\readwrite\gml.pyc in unescape(text)
    119             return text  # leave unchanged
    120 
--> 121     return re.sub("&(?:[0-9A-Za-z]+|#(?:[0-9]+|x[0-9A-Fa-f]+));", fixup, text)
    122 
    123 

c:\python2_7_13\lib\re.pyc in sub(pattern, repl, string, count, flags)
    153     a callable, it's passed the match object and must return
    154     a replacement string to be used."""
--> 155     return _compile(pattern, flags).sub(repl, string, count)
    156 
    157 def subn(pattern, repl, string, count=0, flags=0):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
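
The traceback fails while unescaping a string token, and the message names byte 0xe2, so a quick way to find the offending line is to scan the file for any byte outside the ASCII range. The following is a minimal sketch, not part of networkx; the helper name `find_non_ascii_lines` is hypothetical. It reads the file in binary mode so that no codec is involved.

```python
def find_non_ascii_lines(path):
    """Return (line_number, raw_line) pairs for lines holding any byte >= 0x80."""
    hits = []
    # Read bytes, not text, so no codec can fail before we inspect the data.
    with open(path, "rb") as fobj:
        for lineno, raw in enumerate(fobj, 1):
            # bytearray yields integers on both Python 2 and Python 3.
            if any(byte >= 0x80 for byte in bytearray(raw)):
                hits.append((lineno, raw.rstrip()))
    return hits
```

Calling `find_non_ascii_lines('test.gml')` and printing each pair with `repr` should point at the problem. In the file as posted, a likely candidate is the name of node 26, which contains curly quotes (“ ”); their UTF-8 encoding begins with byte 0xe2. This assumes the file on disk matches what was pasted here.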

I tested this with the current networkx-2.0b1 beta version and it worked correctly.

In [1]: import networkx as nx

In [2]: G = nx.read_gml('test.gml')

In [3]: nx.write_gml(G,'test2.gml')

In [4]: H = nx.read_gml('test2.gml')

In [5]: nx.is_isomorphic(G,H)
# True

If you are not using that version, perhaps you could update and test?
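
If updating is not an option, one possible workaround is to make the string attributes pure ASCII before writing. The sketch below replaces each non-ASCII character with a numeric character reference such as `&#8220;`, which has the form the `unescape` regular expression in the traceback above is written to match. The helper name `to_ascii_with_charrefs` is hypothetical, and the code assumes the input is a text string (`str` on Python 3, `unicode` on Python 2).

```python
def to_ascii_with_charrefs(text):
    """Replace each non-ASCII character with a numeric reference like &#8220;."""
    # 'xmlcharrefreplace' is a standard Python codec error handler.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Example with the curly quotes from node 26 (written as escapes to stay ASCII).
sample = u"Formerly \u201cRiviera Resources Limited\u201d"
print(to_ascii_with_charrefs(sample))
```

Whether a given networkx version already performs this escaping in `write_gml` is worth checking before applying it by hand, since escaping twice would change the stored values.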
