
Titan, loading data with BatchGraph

I want to load 2.5 million vertices into Titan from a client application. I have a formatted txt file. The first line of this file looks like this:

id:12345,companyname:Abcd,country:Abcd,... format(propertyname:propertyvalue,...)
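A minimal sketch of parsing one line in this `propertyname:propertyvalue,...` format, independent of Titan. The class name `PropertyLineParser` is hypothetical, and the sketch assumes that values never contain a comma. It splits each pair on the first `:` only and skips malformed entries, rather than indexing blindly into the split result.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyLineParser {

    /** Parses "name:value,name:value,..." into a map that keeps the file's key order. */
    public static Map<String, String> parse(String line) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String pair : line.split(",")) {
            // Split on the first ':' only, so a value may itself contain ':'.
            String[] kv = pair.split(":", 2);
            if (kv.length < 2 || kv[0].trim().isEmpty()) {
                continue; // skip malformed or empty entries instead of throwing
            }
            props.put(kv[0].trim(), kv[1].trim());
        }
        return props;
    }

    public static void main(String[] args) {
        // prints {id=12345, companyname=Abcd, country=Abcd}
        System.out.println(parse("id:12345,companyname:Abcd,country:Abcd"));
    }
}
```

The resulting map can then be iterated to set the properties on each vertex.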

I tried loading a sample of 100 lines into Titan using Rexster from my client app, and it succeeded.

For 2.5 million lines, I think BatchGraph is the best approach. For testing, I took just the first line and saved it as test.txt.

This code compiles and runs:

            BaseConfiguration config = new BaseConfiguration();
            config.setProperty("storage.backend", "inmemory");
            config.setProperty("storage.hostname", "192.168.200.141");
            config.setProperty("storage.port", "8182");
            config.setProperty("storage.batch-loading", "true");
            TitanGraph graph = null;
            graph = TitanFactory.open(config);
            BatchGraph bg = new BatchGraph(graph, VertexIDType.NUMBER, 1000);
            Vertex currentNode = null;

            String path = "c:\\test.txt";
            Charset encoding = Charset.forName("ISO-8859-1");
            List<String> lines = null;
            try {
                lines = Files.readAllLines(Paths.get(path), encoding);
            } catch (IOException e) {
                e.printStackTrace();
            }

            for (String line : lines) {
                currentNode = bg.addVertex(1);
                String[] values = line.split(",");
                for (String value : values) {
                    String[] property = value.split(":");
                    currentNode.setProperty(property[0].toString(), property[1].toString());
                }
                bg.commit();
            }

When adding a property, I get this error:

java.lang.IllegalArgumentException: Property Key with given name does not exist: id
at com.thinkaurelius.titan.graphdb.types.typemaker.DisableDefaultSchemaMaker.makePropertyKey(DisableDefaultSchemaMaker.java:27)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.getOrCreatePropertyKey(StandardTitanTx.java:902)
at com.thinkaurelius.titan.graphdb.vertices.AbstractVertex.setProperty(AbstractVertex.java:239)
at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph$BatchVertex.setProperty(BatchGraph.java:492)
at tr.com.titanbulk.TitanBulk$5.widgetSelected(TitanBulk.java:213)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)

I have already created the property keys and composite indexes via Gremlin:

mgmt = g.getManagementSystem()
id = mgmt.makePropertyKey('id').dataType(Integer.class).make()
companyname = mgmt.makePropertyKey('companyname').dataType(String.class).make()
country = mgmt.makePropertyKey('country').dataType(String.class).make()
mgmt.buildIndex('ni_id',Vertex.class).addKey(id).buildCompositeIndex()         
mgmt.buildIndex('ni_companynamecountry',Vertex.class).addKey(companyname).addKey(country).buildCompositeIndex()
mgmt.buildIndex('ni_companyname',Vertex.class).addKey(companyname).buildCompositeIndex()
mgmt.buildIndex('ni_country',Vertex.class).addKey(country).buildCompositeIndex()
mgmt.commit()

g.getIndexedKeys(Vertex.class)
==>id
==>companyname
==>country

I successfully loaded the txt file via Gremlin using the Cassandra backend (How to import a CSV file into Titan graph database?), but I still need to do it from my app. I changed config.setProperty("storage.backend", "inmemory"); to config.setProperty("storage.backend", "cassandra");

but when opening the connection (graph = TitanFactory.open(config);) I get this error:

18:26:15.503 [main] DEBUG c.t.t.d.c.a.AstyanaxStoreManager - About to instantiate class public com.netflix.astyanax.connectionpool.impl.FixedRetryBackoffStrategy(int,int) with 2 arguments
18:26:15.509 [main] DEBUG c.t.t.d.c.a.AstyanaxStoreManager - Instantiated RetryBackoffStrategy object com.netflix.astyanax.connectionpool.impl.FixedRetryBackoffStrategy@52e6fdee from config string "com.netflix.astyanax.connectionpool.impl.FixedRetryBackoffStrategy,1000,5000"
18:26:15.511 [main] DEBUG c.t.t.d.c.a.AstyanaxStoreManager - About to instantiate class public com.netflix.astyanax.retry.BoundedExponentialBackoff(long,long,int) with 3 arguments
18:26:15.512 [main] DEBUG c.t.t.d.c.a.AstyanaxStoreManager - Instantiated RetryPolicy object com.netflix.astyanax.retry.BoundedExponentialBackoff@7ec7ffd3[maxSleepTimeMs=25000,MAX_SHIFT=30,random=java.util.Random@dd8ba08,baseSleepTimeMs=100,maxAttempts=8,attempts=0] from config string "com.netflix.astyanax.retry.BoundedExponentialBackoff,100,25000,8"
18:26:15.530 [main] DEBUG c.t.t.d.c.a.AstyanaxStoreManager - Custom RetryBackoffStrategy com.netflix.astyanax.connectionpool.impl.FixedRetryBackoffStrategy@52e6fdee
18:26:15.810 [main] INFO  c.n.a.c.i.ConnectionPoolMBeanManager - Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=ClusterTitanConnectionPool,ServiceType=connectionpool
18:26:15.823 [main] INFO  c.n.a.c.i.CountingConnectionPoolMonitor - AddHost: 192.168.200.141
18:26:16.851 [pool-4-thread-1] DEBUG c.n.astyanax.thrift.ThriftConverter - java.net.ConnectException: Connection refused: connect
18:26:25.832 [main] DEBUG c.t.t.d.c.a.AstyanaxStoreManager - Failed to describe keyspace titan
18:26:25.832 [main] DEBUG c.t.t.d.c.a.AstyanaxStoreManager - Creating keyspace titan...
18:26:26.853 [pool-4-thread-1] DEBUG c.n.astyanax.thrift.ThriftConverter - java.net.ConnectException: Connection refused: connect
18:26:35.848 [main] DEBUG c.t.t.d.c.a.AstyanaxStoreManager - Failed to create keyspace titan
java.lang.IllegalArgumentException: Could not instantiate implementation: com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager
at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:55)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:421)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:361)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1275)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:93)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:73)
at tr.com.kale.titanbulk.TitanBulk$5.widgetSelected(TitanBulk.java:196)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at tr.com.kale.titanbulk.TitanBulk.open(TitanBulk.java:68)
at tr.com.kale.titanbulk.TitanBulk.main(TitanBulk.java:52)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:44)
... 13 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:563)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.<init>(AstyanaxStoreManager.java:283)
... 18 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=192.168.200.141(192.168.200.141):9160, latency=10002(10002), attempts=1]Timed out waiting for connection
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:146)
at com.netflix.astyanax.thrift.ThriftClusterImpl.internalCreateKeyspace(ThriftClusterImpl.java:321)
at com.netflix.astyanax.thrift.ThriftClusterImpl.addKeyspace(ThriftClusterImpl.java:294)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:558)
... 19 more
java.lang.IllegalArgumentException: Graph may not be null
at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.<init>(BatchGraph.java:81)
at tr.com.kale.titanbulk.TitanBulk$5.widgetSelected(TitanBulk.java:206)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at tr.com.kale.titanbulk.TitanBulk.open(TitanBulk.java:68)
at tr.com.kale.titanbulk.TitanBulk.main(TitanBulk.java:52)

I also tried cassandrathrift:

18:35:18.296 [main] DEBUG c.t.t.d.c.t.t.CTConnectionFactory - Creating TSocket(192.168.200.141, 9160, null, null, 10000)
java.lang.IllegalArgumentException: Could not instantiate implementation: com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager
at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:55)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:421)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:361)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1275)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:93)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:73)
at tr.com.kale.titanbulk.TitanBulk$5.widgetSelected(TitanBulk.java:196)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at tr.com.kale.titanbulk.TitanBulk.open(TitanBulk.java:68)
at tr.com.kale.titanbulk.TitanBulk.main(TitanBulk.java:52)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:44)
... 13 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.getCassandraPartitioner(CassandraThriftStoreManager.java:218)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.<init>(CassandraThriftStoreManager.java:196)
... 18 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.thriftpool.CTConnectionFactory.makeRawConnection(CTConnectionFactory.java:88)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.thriftpool.CTConnectionFactory.makeObject(CTConnectionFactory.java:52)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.thriftpool.CTConnectionFactory.makeObject(CTConnectionFactory.java:21)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:1220)
at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.getCassandraPartitioner(CassandraThriftStoreManager.java:215)
... 19 more
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 25 more

Thanks.

I can see a few errors here.

First of all, you cannot use "id" as a property key name, because it is a reserved word.

Secondly, you should create a property key for each property you want to add, using the management API, for example mng.makePropertyKey("prop").dataType(String.class).make();
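One way to apply both points to the parsed file data is to rename the reserved key before the values are set on a vertex. The sketch below is only an illustration: the class `PropertyMapper` and the replacement key name `companyid` are hypothetical, and a key with that name would still have to be created through the management API. Because the question declares its id key with Integer.class, the sketch also converts that value to an Integer, on the assumption that the replacement key keeps the same data type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyMapper {

    // Hypothetical replacement name for the reserved "id" key.
    static final String ID_KEY = "companyid";

    /** Renames the reserved "id" key and converts its value to an Integer. */
    public static Map<String, Object> toVertexProperties(Map<String, String> raw) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            if ("id".equals(e.getKey())) {
                // Assumes the replacement key is declared with Integer.class.
                out.put(ID_KEY, Integer.valueOf(e.getValue()));
            } else {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("id", "12345");
        raw.put("country", "Abcd");
        // prints {companyid=12345, country=Abcd}
        System.out.println(toVertexProperties(raw));
    }
}
```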

Here's a test with batch loading that works:

    @Test
    public void bulkLoad() {
        BaseConfiguration config = new BaseConfiguration();
        config.setProperty("storage.backend", "inmemory");
        config.setProperty("storage.batch-loading", "true");
        TitanGraph graph = TitanFactory.open(config);
        TitanManagement mng = graph.getManagementSystem();
        if (mng.getPropertyKey("prop") == null) {
            PropertyKey pk = mng.makePropertyKey("prop").dataType(String.class).make();
            mng.buildIndex("prop_index", Vertex.class).addKey(pk).buildCompositeIndex();
        }
        mng.commit();

        BatchGraph bg = new BatchGraph(graph, VertexIDType.STRING, 1000);
        System.out.println("Start bulk loading");
        IntStream.range(1, 1000).forEach(i -> {
            Vertex v = bg.addVertex("id" + i);
            v.setProperty("prop", "prop" + i);
        });
        bg.commit();

        assertNotNull(bg.getVertex("id10"));
        assertEquals("prop10", bg.getVertex("id10").getProperty("prop"));
    }
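
The "Connection refused" traces in the question look like a separate problem. The client is trying 192.168.200.141 on port 9160 and nothing accepts the connection, so Titan never gets as far as the schema. A quick way to confirm this from the client machine is a plain TCP check, sketched below. The class name `PortCheck` and the 3-second timeout are arbitrary choices for illustration, and the sketch only tests reachability, not whether Cassandra is configured correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are taken from the stack trace in the question.
        System.out.println("9160 reachable: " + canConnect("192.168.200.141", 9160, 3000));
    }
}
```

If this reports false, it may be worth checking that Cassandra is running on that host, that its Thrift RPC service is enabled and bound to an address reachable from the client, and that no firewall blocks the port.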
