
How to ingest a large amount of logs in ASP.Net Web API

I am new to API development and I want to create a Web API endpoint that will receive a large amount of log data, which I then want to send to an Amazon S3 bucket via an Amazon Kinesis Firehose delivery stream. Below is a sample application that works fine, but I have no clue how to ingest a large inbound volume of data, what format my API should receive it in, or what my API endpoint should look like.

 [HttpPost]
 public async Task Post() // HOW to allow it to receive a large chunk of data? (async Task instead of async void, so the framework can await completion and surface exceptions)
 {
        await WriteToStream();
 }

    private async Task WriteToStream()
    {
        const string myStreamName = "test";
        Console.Error.WriteLine("Putting records in stream : " + myStreamName);
        // Write 10,000 UTF-8 encoded records to the delivery stream.
        for (int j = 0; j < 10000; ++j)
        {
            // I AM HARDCODING DATA HERE FROM THE LOOP COUNTER!!!
            byte[] dataAsBytes = Encoding.UTF8.GetBytes("testdata-" + j);
            using (MemoryStream memoryStream = new MemoryStream(dataAsBytes))
            {
                PutRecordRequest putRecord = new PutRecordRequest();
                putRecord.DeliveryStreamName = myStreamName;
                Record record = new Record();
                record.Data = memoryStream;
                putRecord.Record = record;
                await kinesisClient.PutRecordAsync(putRecord);
            }
        }
    }
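Note: the Firehose SDK also exposes a batch call, PutRecordBatchAsync (up to 500 records per request), which would cut round trips compared with the per-record loop above; a minimal sketch, assuming the same kinesisClient field:

    private async Task WriteBatchToStream(IEnumerable<string> lines)
    {
        var request = new PutRecordBatchRequest
        {
            DeliveryStreamName = "test",
            Records = new List<Record>()
        };
        foreach (string line in lines)
        {
            request.Records.Add(new Record
            {
                Data = new MemoryStream(Encoding.UTF8.GetBytes(line + "\n"))
            });
            if (request.Records.Count == 500) // PutRecordBatch accepts at most 500 records per call
            {
                await kinesisClient.PutRecordBatchAsync(request);
                request.Records.Clear();
            }
        }
        if (request.Records.Count > 0)
            await kinesisClient.PutRecordBatchAsync(request);
    }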

PS: In a real-world app I will not have that for loop. I want my API to ingest large data; what should the definition of my API be? Do I need to use something called multipart/form-data, or a file upload? Please guide me.

Here is my thought process. As you are exposing an API for logging, your input should contain the attributes below (a sketch of a matching model follows the list):

  • Log Level (info, debug, warn, fatal)
  • Log Message (string)
  • Application ID
  • Application Instance ID
  • Application IP
  • Host (machine on which the error was logged)
  • User ID (for whom the error occurred)
  • Timestamp in UTC (time at which the error occurred)
  • Additional Data (customisable as XML / JSON)
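A minimal C# model capturing these attributes could look like the sketch below (class and property names are illustrative, not prescribed by the answer):

    using System;

    public class LogEntry
    {
        public string LogLevel { get; set; }        // info, debug, warn, fatal
        public string Message { get; set; }
        public string ApplicationId { get; set; }
        public string ApplicationInstanceId { get; set; }
        public string ApplicationIp { get; set; }
        public string Host { get; set; }            // machine on which the error was logged
        public string UserId { get; set; }          // user for whom the error occurred
        public DateTime TimestampUtc { get; set; }  // time at which the error occurred
        public string AdditionalData { get; set; }  // additional payload, XML or JSON
    }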

I suggest exposing the API as an AWS Lambda via API Gateway, as that will help with scaling out as load increases.

For a sample of how to build the API and use model binding, you may refer to https://docs.microsoft.com/en-us/aspnet/web-api/overview/formats-and-model-binding/model-validation-in-aspnet-web-api
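The linked article covers validation; a hedged sketch of a Web API 2 action binding a batch of such entries (the route is an assumption and requires attribute routing to be enabled):

    using System.Collections.Generic;
    using System.Web.Http;

    public class LogsController : ApiController
    {
        [HttpPost]
        [Route("api/logs")]
        public IHttpActionResult Post([FromBody] List<LogEntry> entries)
        {
            if (entries == null || !ModelState.IsValid)
                return BadRequest("Invalid log payload.");

            // Hand the batch off to a queue or to the Firehose writer from the question.
            return Ok(new { received = entries.Count });
        }
    }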

I don't have much context, so I will try to answer based on how I see it.

First, instead of sending data to the Web API, I would send the data directly to S3. In Azure there are shared access tokens: you send a request to your API and it gives you a URL where to upload the file (there are many options; for example, you can limit by time, or limit which IPs can upload). So to upload a file: 1. make a call to get the upload URL; 2. PUT the file to that URL. In Amazon this appears to be called a signed policy (a pre-signed URL).
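With the AWS SDK for .NET, that first call could return a time-limited pre-signed PUT URL; a minimal sketch, with the bucket and key naming as assumptions:

    using System;
    using Amazon.S3;
    using Amazon.S3.Model;

    public static class UploadUrlProvider
    {
        public static string GetUploadUrl(IAmazonS3 s3, string bucket)
        {
            var request = new GetPreSignedUrlRequest
            {
                BucketName = bucket,
                Key = "logs/" + Guid.NewGuid() + ".json",
                Verb = HttpVerb.PUT,
                Expires = DateTime.UtcNow.AddMinutes(15) // limit by time, as described above
            };
            return s3.GetPreSignedURL(request); // the client then does an HTTP PUT of the file to this URL
        }
    }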

After that, write a Lambda function that is triggered by the S3 upload. This function will send an event (again, I don't know exactly how it works in AWS, but in Azure I would send a blob queue message) containing the URL of the file and a start position.
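A sketch of that first Lambda in C#, assuming the event is forwarded to an SQS queue (the queue-URL environment variable is an assumption):

    using System;
    using System.Threading.Tasks;
    using Amazon.Lambda.Core;
    using Amazon.Lambda.S3Events;
    using Amazon.SQS;
    using Amazon.SQS.Model;
    using Newtonsoft.Json;

    public class UploadTriggerFunction
    {
        private readonly IAmazonSQS _sqs = new AmazonSQSClient();

        // Fired by S3 ObjectCreated events; emits one "start processing at position 0" message per file.
        public async Task FunctionHandler(S3Event s3Event, ILambdaContext context)
        {
            foreach (var record in s3Event.Records)
            {
                string body = JsonConvert.SerializeObject(new
                {
                    bucket = record.S3.Bucket.Name,
                    key = record.S3.Object.Key,
                    startPosition = 0L
                });
                await _sqs.SendMessageAsync(new SendMessageRequest
                {
                    QueueUrl = Environment.GetEnvironmentVariable("QUEUE_URL"), // assumed configuration
                    MessageBody = body
                });
            }
        }
    }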

Write a second Lambda that listens for those events and does the actual processing. In my apps I sometimes know that processing N items takes about 10 seconds, so I usually choose N so that a batch takes no longer than 10-20 seconds, due to the nature of deployments. After you have processed N rows and are not yet finished, send the same event again, but now with start position = the starting position plus N. A sketch of how to read a range follows below.
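Reading a byte range from S3 in that second Lambda could look like this sketch (the helper name and chunk handling are assumptions):

    using System.IO;
    using System.Threading.Tasks;
    using Amazon.S3;
    using Amazon.S3.Model;

    public static class S3RangeReader
    {
        public static async Task<string> ReadChunkAsync(IAmazonS3 s3, string bucket, string key, long start, long length)
        {
            var request = new GetObjectRequest
            {
                BucketName = bucket,
                Key = key,
                ByteRange = new ByteRange(start, start + length - 1) // inclusive range, like an HTTP Range header
            };
            using (GetObjectResponse response = await s3.GetObjectAsync(request))
            using (var reader = new StreamReader(response.ResponseStream))
            {
                // Caller re-sends the event with start + length as the new start position if the file is not done.
                return await reader.ReadToEndAsync();
            }
        }
    }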

Designed this way, you can process large files; you can even be smarter, because you can send multiple events specifying a start line and an end line, so you will be able to process your file across multiple instances.

PS. The reason I would not recommend uploading files to the Web API is that those files will be held in memory. Say you have 1 GB files being sent from multiple sources; in that case you will kill your servers in minutes.

PS2. The file format depends; it could be JSON, since that is the easiest way to read these files, but keep in mind that with large files it is expensive to read the whole file into memory; an example of reading such files incrementally follows below. Another option could be just a flat file, which is easy to read, since you can read a byte range and process it.
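With Json.NET, for example, a file of concatenated or newline-delimited JSON records can be deserialized one object at a time instead of being loaded whole (LogEntry is the illustrative model sketched earlier):

    using System.IO;
    using Newtonsoft.Json;

    public static class LogFileReader
    {
        public static void Process(Stream stream)
        {
            using (var reader = new JsonTextReader(new StreamReader(stream)))
            {
                reader.SupportMultipleContent = true; // accept one JSON object after another
                var serializer = new JsonSerializer();
                while (reader.Read())
                {
                    if (reader.TokenType == JsonToken.StartObject)
                    {
                        LogEntry entry = serializer.Deserialize<LogEntry>(reader);
                        // Process one record at a time; memory use stays flat regardless of file size.
                    }
                }
            }
        }
    }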

PS3. In Azure I would use Azure Batch Jobs for this.
