Azure Data Lake Analytics IOutputter: get output file name

I'm using a custom IOutputter to write the results of my U-SQL script to a local database:

OUTPUT @dataset
TO "/path/somefilename_{*}.file"
USING new CustomOutputter();

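public class CustomOutputter : IOutputter
{
    public CustomOutputter()
    {
        // The database file name is hard-coded here; ideally it would
        // come from the path given in the OUTPUT statement.
        myCustomDatabase.Open("databasefile.database");
    }

    public override void Output(IRow input, IUnstructuredWriter output)
    {
        // Rows are written to the local database here.
    }
}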

Is there any way to replace "databasefile.database" with the output file path specified in the script, "/path/somefilename_{*}.file"?

Since I'm not able to pass output.BaseStream to the database, I can't find a way to write to the correct file name.

UPDATE: This is how I copy the local DB file to the output stream provided by ADLA:

        public override void Close()
        {
            // Copy the local database file into the ADLA output stream in 64 KB chunks.
            using (var fs = File.Open("databasefile.database", FileMode.Open))
            {
                byte[] buffer = new byte[65536];
                int read;
                while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                {
                    this.output.BaseStream.Write(buffer, 0, read);
                }
                // Flush once after the whole file has been copied.
                this.output.BaseStream.Flush();
            }
        }

I am not sure what you are trying to achieve.

  1. Outputters (and UDOs in general) cannot leave their containers (VMs) when executed in ADLA (local execution has no such limit at this point). Connecting to a database outside the container will therefore be blocked, and I am not sure how it helps to write data into a database in a transient VM/container.

  2. The UDO model has a well-defined way to write to files that live in either ADLS or WASB: you write the data from the input row(set) into the output's stream. You can write into local files, but again, these files will cease to exist after the vertex finishes execution.

Given this information, could you please rephrase?

Update based on clarifying comment

You have two options to generate a database from a rowset:

  1. Use ADF to do the data movement. This is the most commonly used approach and probably the easiest.
  2. If you use a custom outputter, you could try the following:
    1. Write the output rowset into a database that is local to your vertex, using the database interface. You have to deploy the database as a resource, so you probably need a small-footprint version that fits within the resource size limit.
    2. Then read the database file from the vertex-local directory into the output stream, so that the file is copied into ADLS.
    3. Note that you need atomic file processing on the outputter to avoid writing many database files that then get stitched together.
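
The custom-outputter steps above could look roughly like the following. This is only a minimal sketch, not a tested implementation: it assumes the database library writes to a single local file named "databasefile.database" (the name used in the question), and the actual database calls are left as comments because myCustomDatabase is the asker's own, unspecified API. The AtomicFileProcessing setting on the SqlUserDefinedOutputter attribute is what requests that the output file be handled by a single outputter instance.

```csharp
using System.IO;
using Microsoft.Analytics.Interfaces;

// Ask U-SQL to process the output file atomically, so that one outputter
// instance produces the whole file instead of several stitched parts.
[SqlUserDefinedOutputter(AtomicFileProcessing = true)]
public class CustomOutputter : IOutputter
{
    // Assumption: the database library writes to this vertex-local file.
    private const string LocalDbPath = "databasefile.database";

    // Remembered so that Close() can reach the output stream.
    private IUnstructuredWriter writer;

    public override void Output(IRow input, IUnstructuredWriter output)
    {
        this.writer = output;

        // Hypothetical: insert the current row into the local database here,
        // using whatever API your database library provides.
    }

    public override void Close()
    {
        // Nothing to copy if no row was seen or no database file was created.
        if (this.writer == null || !File.Exists(LocalDbPath))
        {
            return;
        }

        // Hypothetical: close the local database first so that the file
        // on disk is complete before it is copied.

        using (var fs = File.OpenRead(LocalDbPath))
        {
            // Stream the finished database file into the ADLS output file.
            fs.CopyTo(this.writer.BaseStream);
        }
        this.writer.BaseStream.Flush();
    }
}
```

With this approach the outputter never needs to know the target file name: the path in the OUTPUT statement determines where the bytes written to the output stream end up.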
