Store metadata into Jackrabbit repository

Can anybody explain to me how to proceed in the following scenario?

  1. receiving documents (MS docs, ODS, PDF)

  2. Dublin Core metadata extraction via Apache Tika + content extraction via jackrabbit-content-extractors

  3. using Jackrabbit to store the documents (content) in the repository together with their metadata

  4. retrieving documents + metadata

I'm interested in points 3 and 4.

DETAILS: The application processes documents interactively (some analysis such as language detection and word count, plus gathering as many details as possible: Dublin Core, parsing the content, event handling). It returns the results of the processing to the user and then stores the extracted content and the metadata (both extracted and custom user metadata) in the JCR repository.

I'd appreciate any help, thank you.

Uploading files is basically the same for JCR 2.0 as it is for JCR 1.0. However, JCR 2.0 adds a few additional built-in property definitions that are useful.

The "nt:file" node type is intended to represent a file and has two built-in property definitions in JCR 2.0 (both of which are auto-created by the repository when nodes are created):

  • jcr:created (DATE)
  • jcr:createdBy (STRING)

and defines a single child named "jcr:content". This "jcr:content" node can be of any node type, but generally speaking all information pertaining to the content itself is stored on this child node. The de facto standard is to use the "nt:resource" node type, which has these properties defined:

  • jcr:data (BINARY) mandatory
  • jcr:lastModified (DATE) autocreated
  • jcr:lastModifiedBy (STRING) autocreated
  • jcr:mimeType (STRING) protected?
  • jcr:encoding (STRING) protected?

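Note that "jcr:mimeType" and "jcr:encoding" are not new in JCR 2.0: JCR 1.0 already defined both on "nt:resource" (with "jcr:mimeType" mandatory). JCR 2.0 moved them into the "mix:mimeType" mixin, which "nt:resource" extends, and made "jcr:mimeType" optional.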

In particular, the purpose of the "jcr:mimeType" property is to do exactly what you're asking for: capture the "type" of the content. However, the "jcr:mimeType" and "jcr:encoding" property definitions can be defined (by the JCR implementation) as protected, meaning the JCR implementation sets them automatically. If that is the case, you are not allowed to set these properties manually. I believe that Jackrabbit and ModeShape do not treat these as protected.

Here is some code that shows how to upload a file into a JCR 2.0 repository using these built-in node types:

// Needs java.io.* plus javax.jcr.Binary and javax.jcr.Node
// Get an input stream for the file ...
File file = ...
InputStream stream = new BufferedInputStream(new FileInputStream(file));

// Create the nt:file node and its mandatory jcr:content child
Node folder = session.getNode("/absolute/path/to/folder/node");
Node fileNode = folder.addNode("Article.pdf", "nt:file");
Node content = fileNode.addNode("jcr:content", "nt:resource");

// Wrap the stream in a Binary value and store it as the file's data
Binary binary = session.getValueFactory().createBinary(stream);
content.setProperty("jcr:data", binary);

And if the JCR implementation does not treat the "jcr:mimeType" property as protected (as with Jackrabbit and ModeShape), you have to set this property manually:

content.setProperty("jcr:mimeType","application/pdf");
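
If no detector such as Apache Tika is at hand, the value for "jcr:mimeType" can be guessed from the file name using only the JDK. This is a rough sketch, not part of the JCR API: the class name MimeTypeGuess and the fallback type are assumptions made here, and guessing by name is less reliable than detection based on the content.

```java
import java.net.URLConnection;

class MimeTypeGuess {
    /**
     * Guesses a MIME type from the file name alone.
     * Falls back to a generic binary type when the extension is unknown.
     */
    static String guess(String fileName) {
        String type = URLConnection.guessContentTypeFromName(fileName);
        return type != null ? type : "application/octet-stream";
    }

    public static void main(String[] args) {
        // The result would be passed to content.setProperty("jcr:mimeType", ...)
        System.out.println(guess("Article.pdf"));
    }
}
```

The returned string could then replace the hard-coded "application/pdf" in the call above.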

Metadata can very easily be stored on the "nt:file" and "jcr:content" nodes, but out of the box the "nt:file" and "nt:resource" node types don't allow extra properties. So before you can add other properties, you first need to add a mixin (or multiple mixins) with property definitions for the kinds of properties you want to store. You can even define a mixin that allows any property. Here is a CND file defining such a mixin:

<custom = 'http://example.com/mydomain'>
[custom:extensible] mixin
- * (undefined) multiple 
- * (undefined) 

After registering this node type definition, you can then use it on your nodes:

content.addMixin("custom:extensible");
content.setProperty("anyProp","some value");
content.setProperty("custom:otherProp","some other value");

You could also define and use a mixin that allows any Dublin Core element:

<dc = 'http://purl.org/dc/elements/1.1/'>
[dc:metadata] mixin
- dc:contributor (STRING)
- dc:coverage (STRING)
- dc:creator (STRING)
- dc:date (DATE)
- dc:description (STRING)
- dc:format (STRING)
- dc:identifier (STRING)
- dc:language (STRING)
- dc:publisher (STRING)
- dc:relation (STRING)
- dc:rights (STRING)
- dc:source (STRING)
- dc:subject (STRING)
- dc:title (STRING)
- dc:type (STRING)

All of these properties are optional, and this mixin doesn't allow properties of any other name or type. With this 'dc:metadata' mixin I've also not really addressed the fact that some of these elements are already represented by built-in properties (e.g., "jcr:createdBy", "jcr:lastModifiedBy", "jcr:created", "jcr:lastModified", "jcr:mimeType"), and that some of them may be more related to the content while others are more related to the file.
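
To feed extracted metadata into such a mixin, the extracted key-value pairs first have to be mapped onto the "dc:" property names. The following is only a sketch under assumptions: it supposes the extractor delivers plain element names such as "title" as keys (real Tika keys may differ), and DublinCoreMapper is a hypothetical helper, not part of Jackrabbit or Tika.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class DublinCoreMapper {
    // The 15 elements of the Dublin Core Metadata Element Set 1.1
    static final Set<String> ELEMENTS = Set.of(
            "contributor", "coverage", "creator", "date", "description",
            "format", "identifier", "language", "publisher", "relation",
            "rights", "source", "subject", "title", "type");

    /** Keeps only non-empty Dublin Core entries and prefixes their names with "dc:". */
    static Map<String, String> toJcrProperties(Map<String, String> extracted) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String value = entry.getValue();
            if (ELEMENTS.contains(entry.getKey()) && value != null && !value.isEmpty()) {
                props.put("dc:" + entry.getKey(), value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("title", "Annual Report");
        extracted.put("wordCount", "1200"); // not a Dublin Core element, so it is dropped
        System.out.println(toJcrProperties(extracted));
    }
}
```

Each resulting entry could then be written with content.setProperty(name, value) once the mixin has been added to the node. Since "dc:date" is declared as DATE, its string value would need to be in a form the repository can convert to a date.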

You could of course define other mixins that better suit your metadata needs, using inheritance where needed. But be careful using inheritance with mixins: since JCR allows a node to have multiple mixins, it's often best to design your mixins to be tightly scoped and facet-oriented (e.g., "ex:taggable", "ex:describable", etc.) and then simply apply the appropriate mixins to a node as needed.
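
As an illustration of such tightly scoped mixins, a CND fragment might look like the following. The namespace URI and the property names are made up for this sketch:

```
<ex = 'http://example.com/ex'>
[ex:taggable] mixin
- ex:tags (STRING) multiple

[ex:describable] mixin
- ex:description (STRING)
```

A node that needs both facets would simply get both, for example content.addMixin("ex:taggable") followed by content.addMixin("ex:describable").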

(It's even possible, though much more complicated, to define a mixin that allows more children under the "nt:file" nodes, and to store some metadata there.)

Mixins are fantastic and give a tremendous amount of flexibility and power to your JCR content.

Oh, and when you've created all of the nodes you want, be sure to save the session:

session.save();

I am new to Jackrabbit, working on 2.4.2. As for your solution, you can check the file type with plain Java logic and branch on it wherever the handling needs to differ.

You don't need to worry about saving the contents of different file types such as .txt or .pdf, because the content is converted to binary before it is saved. Here is a small sample in which I uploaded a PDF file to, and downloaded it from, a Jackrabbit repository.

        // Import the PDF file unless it has already been imported.
        // This program is a sample only, so everything is hard-coded.
        if (!root.hasNode("Alfresco_E0_Training.pdf")) {
            System.out.print("Importing PDF... ");

            // Create the nt:file node with its jcr:content child
            Node file = root.addNode("Alfresco_E0_Training.pdf", "nt:file");
            Node content = file.addNode("jcr:content", "nt:resource");

            // Store the file's bytes in jcr:data; try-with-resources
            // closes the stream even if the import fails
            try (InputStream stream = new FileInputStream("<path of file>\\Alfresco_E0_Training.pdf")) {
                Binary binary = session.getValueFactory().createBinary(stream);
                content.setProperty("jcr:data", binary);
            }
            session.save();
            System.out.println("done.");

            System.out.println("::::::::::::::::::::Checking content of the node:::::::::::::::::::::::::");
            System.out.println("File Node Name : " + file.getName());
            System.out.println("File Node Identifier : " + file.getIdentifier());
            System.out.println("File Node child : " + Node.JCR_CHILD_NODE_DEFINITION);
            System.out.println("Content Node Name : " + content.getName());
            System.out.println("Content Node Identifier : " + content.getIdentifier());
            System.out.println("Content Node Content : " + content.getProperty("jcr:data"));
            System.out.println(":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::");
        } else {
            // Already imported: read the binary back and write it to disk
            Node file = root.getNode("Alfresco_E0_Training.pdf");
            Node content = file.getNode("jcr:content");
            Binary bin = content.getProperty("jcr:data").getBinary();

            File f = new File("C:<path of the output file>\\Alfresco_E0_Training.pdf");
            try (InputStream stream = bin.getStream();
                 OutputStream out = new FileOutputStream(f)) {
                byte[] buf = new byte[1024];
                int len;
                // read() returns -1 at the end of the stream
                while ((len = stream.read(buf)) != -1) {
                    out.write(buf, 0, len);
                }
            }
            System.out.println("\nFile is created...................................");

            System.out.println("done.");
            System.out.println("::::::::::::::::::::Checking content of the node:::::::::::::::::::::::::");
            System.out.println("File Node Name : " + file.getName());
            System.out.println("File Node Identifier : " + file.getIdentifier());
            System.out.println("Content Node Name : " + content.getName());
            System.out.println("Content Node Identifier : " + content.getIdentifier());
            System.out.println("Content Node Content : " + content.getProperty("jcr:data"));
            System.out.println(":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::");
        }
    } catch (IOException e) {
        System.out.println("Exception: " + e);
    } finally {
        // Always release the session
        session.logout();
    }
    }
}
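
The manual read/write loop in the download branch can also be delegated to the standard library. This is a minimal sketch, assuming Java 7 or later: the class and method names are hypothetical, and the in-memory stream in main only stands in for the stream returned by bin.getStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StreamExport {
    /**
     * Copies the whole stream into the target file, replacing an existing file,
     * and returns the number of bytes written. The source stream is always closed.
     */
    static long export(InputStream source, Path target) {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; a real caller would pass the JCR binary's stream instead
        InputStream fake = new ByteArrayInputStream(
                "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));
        Path target = Files.createTempFile("export", ".pdf");
        System.out.println(export(fake, target) + " bytes written to " + target);
    }
}
```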

Hope this helps.

I am a bit rusty with JCR and I have never used 2.0, but this should get you started.

See this link. You'll want to open up the second comment.

You just store the file in a node and add additional metadata to the node. Here is how to store the file:

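Node folder = session.getRootNode().getNode("path/to/file/uploads");
Node file = folder.addNode(fileName, "nt:file");
// nt:file defines no default type for jcr:content, so name one explicitly
Node fileContent = file.addNode("jcr:content", "nt:resource");
// Setting an InputStream directly still works in JCR 2.0 but is deprecated there
fileContent.setProperty("jcr:data", fileStream);
// Add other metadata
session.save();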

How you store metadata is up to you. A simple way is to just store key-value pairs (the node's type, or a mixin added to it, must permit the property):

fileContent.setProperty(key, value, PropertyType.STRING);

To read the data you just call getProperty() and convert the result to the type you need:

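// getProperty() returns a Property, so unwrap it to the value you need
fileStream = fileContent.getProperty("jcr:data").getBinary().getStream();
value = fileContent.getProperty(key).getString();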
