Delete a huge number of files in Rackspace using fog

I have millions of files in my Rackspace Cloud Files account. I would like to delete some of them by passing lists of file names instead of deleting them one by one, which is very slow. Is there any way to do this with fog? Right now I have a script that deletes each file individually, but it would be nice to have something with better performance.

connection = Fog::Storage.new({
  :provider           => 'Rackspace',
  :rackspace_username => "xxxx",
  :rackspace_api_key  => "xxxx",
  :rackspace_region   => :iad  
})

dir = connection.directories.select {|d| d.key == "my_directory"}.first

CloudFileModel.where(duplicated: 1).each do |record| 
    f = record.file.gsub("/","")
    dir.files.destroy(f) rescue nil
    puts "deleted #{record.id}"
end

Yes, you can, with delete_multiple_objects:

Deletes multiple objects or containers with a single request.

To delete objects from a single container, container may be provided and object_names should be an Array of object names within the container.

To delete objects from multiple containers or to delete containers, container should be nil and all object_names should be prefixed with a container name.

Containers must be empty when deleted. object_names are processed in the order given, so the objects within a container should be listed before the container itself.
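To apply that ordering rule when deleting several containers in one request, a small helper can build the object_names array so that every container's objects come before the container's own name. This is a minimal sketch: the method name bulk_delete_names is hypothetical, and the resulting array is meant to be passed with container set to nil.

```ruby
# Hypothetical helper: builds an object_names array for a multi-container
# bulk delete (container = nil). Each object name is prefixed with its
# container, and the container name itself is appended only after all of
# its objects, because a container must be empty when it is deleted.
def bulk_delete_names(objects_by_container, delete_containers: false)
  objects_by_container.flat_map do |container, objects|
    names = objects.map { |object| "#{container}/#{object}" }
    names << container if delete_containers
    names
  end
end

names = bulk_delete_names(
  { "container_a" => ["object", "another/object"], "container_b" => ["object"] },
  delete_containers: true
)
# names is now:
# ["container_a/object", "container_a/another/object", "container_a",
#  "container_b/object", "container_b"]
```

The array can then be handed to conn.delete_multiple_objects(nil, names), as in the examples.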

Up to 10,000 objects may be deleted in a single request. The server will respond with 200 OK for all requests, so response.body must be inspected for the actual results.
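Because one request is capped at 10,000 names and always returns 200 OK, a script for millions of files has to split the list into batches and check each response body. The sketch below assumes conn is a fog Rackspace storage connection (as in the examples) and that response.body is the parsed JSON from the OpenStack Swift bulk-delete middleware, with keys such as "Number Deleted", "Number Not Found" and "Errors"; those key names are an assumption to verify against your own responses. The method name bulk_delete is hypothetical.

```ruby
# Maximum number of names accepted in one bulk-delete request.
BULK_DELETE_LIMIT = 10_000

# Deletes object_names from container in batches of at most
# BULK_DELETE_LIMIT and totals the per-batch results.
# Assumption: response.body is a Hash using Swift's bulk-delete keys.
def bulk_delete(conn, container, object_names)
  totals = { deleted: 0, not_found: 0, errors: [] }
  object_names.each_slice(BULK_DELETE_LIMIT) do |batch|
    response = conn.delete_multiple_objects(container, batch)
    body = response.body
    # The HTTP status is always 200 OK, so per-object failures
    # only show up in the body.
    totals[:deleted] += body.fetch("Number Deleted", 0)
    totals[:not_found] += body.fetch("Number Not Found", 0)
    totals[:errors].concat(body.fetch("Errors", []))
  end
  totals
end
```

In the question's script, the names could be collected from CloudFileModel.where(duplicated: 1) (applying the same gsub("/", "") as before) and passed to bulk_delete(connection, "my_directory", names) instead of destroying each file separately.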

Examples:

Delete objects from a container

object_names = ['object', 'another/object']
conn.delete_multiple_objects('my_container', object_names)

Delete objects from multiple containers

object_names = ['container_a/object', 'container_b/object']
conn.delete_multiple_objects(nil, object_names)

Delete a container and all its objects

object_names = ['my_container/object_a', 'my_container/object_b', 'my_container']
conn.delete_multiple_objects(nil, object_names)

To my knowledge, the algorithm included here is the most reliable and highest-performance way to delete a Cloud Files container along with any objects it contains. It could be adapted to your purposes by passing in a parameter with the names of the items to delete instead of calling ListObjects. At the time of this writing, there is no server-side functionality (i.e. bulk operation) capable of meeting your needs in a timely manner: bulk operations are rate limited to 2-3 delete operations per second, so expect at least 55 minutes per 10,000 items you delete.
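The 55-minute figure corresponds to the upper end of that range (3 deletions per second); at 2 per second the same 10,000 items take correspondingly longer:

```latex
\frac{10\,000}{3\ \text{s}^{-1}} \approx 3333\ \text{s} \approx 55.6\ \text{min},
\qquad
\frac{10\,000}{2\ \text{s}^{-1}} = 5000\ \text{s} \approx 83.3\ \text{min}
```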

The following code shows the basic algorithm (slightly simplified from the syntax actually required by the .NET SDK). It assumes that no other client adds objects to the container at any point after this method begins executing.

Note that you will be rate limited to a maximum of 100 delete operations per second per container that contains files. If multiple containers are involved, distribute your concurrent requests round-robin across the containers, and adjust your concurrency level until the request rate approaches the hard rate limit. Using this algorithm, I have reached long-term sustained deletion rates of over 450 objects/second when multiple containers were involved.
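The round-robin advice can be sketched in Ruby with a small thread pool: the (container, object) pairs are interleaved so that consecutive requests hit different containers, and a fixed number of worker threads drain a shared queue. This is an illustration under assumptions: round_robin and concurrent_delete are hypothetical names, the actual delete call is supplied as a block (for example a fog delete_object call), and the concurrency value should be tuned against the per-container rate limit rather than taken from this sketch.

```ruby
# Interleaves objects from several containers: the first object of each
# container, then the second object of each, and so on.
def round_robin(objects_by_container)
  queues = objects_by_container.map do |container, objects|
    objects.map { |object| [container, object] }
  end
  return [] if queues.empty?

  longest = queues.map(&:size).max
  (0...longest).flat_map { |i| queues.map { |q| q[i] }.compact }
end

# Calls the block once per [container, object] pair using `concurrency`
# worker threads that share one queue. An exception raised in a worker
# is re-raised in the caller by Thread#join.
def concurrent_delete(pairs, concurrency: 25, &delete_one)
  queue = Queue.new
  pairs.each { |pair| queue << pair }
  queue.close # pop returns nil once the queue is closed and empty

  workers = Array.new(concurrency) do
    Thread.new do
      while (pair = queue.pop)
        delete_one.call(*pair)
      end
    end
  end
  workers.each(&:join)
end
```

Interleaving before queueing means that, at any moment, the in-flight requests are spread across containers instead of all targeting the first one.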

using System.Linq;
using net.openstack.Core.Domain;
using net.openstack.Core.Exceptions.Response;
using net.openstack.Core.Providers;

public static class ContainerCleanup
{
  public static void DeleteContainer(
    IObjectStorageProvider provider,
    string containerName)
  {
    while (true)
    {
      // The only reliable way to determine if a container is empty is
      // to list its objects. Each call returns one listing page, so keep
      // looping until the listing comes back empty.
      ContainerObject[] objects = provider.ListObjects(containerName).ToArray();
      if (!objects.Any())
        break;

      // the iterations of this loop should be executed concurrently.
      // depending on connection speed, expect to use 25 to upwards of 300
      // concurrent connections for best performance.
      foreach (ContainerObject obj in objects)
      {
        try
        {
          provider.DeleteObject(containerName, obj.Name);
        }
        catch (ItemNotFoundException)
        {
          // a 404 can happen if the object was deleted on a previous iteration,
          // but the internal database did not fully synchronize prior to calling
          // List Objects again.
        }
      }
    }

    // The container is now empty, so it can be deleted.
    provider.DeleteContainer(containerName);
  }
}
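For readers following the question in Ruby, the same list-then-delete loop can be sketched against a small adapter. This is not a port of the .NET SDK: store, list_objects, delete_object, delete_container and NotFoundError are hypothetical names for whatever your client provides (with fog they would roughly correspond to the get_container, delete_object and delete_container requests and the provider's not-found error; check these against your fog version), and the deletes run sequentially here for clarity.

```ruby
# Hypothetical error raised by the adapter when an object is already gone.
class NotFoundError < StandardError; end

# Deletes every object in the container, then the container itself.
# `store` is a hypothetical adapter responding to list_objects(container),
# delete_object(container, name) and delete_container(container).
def delete_container(store, container)
  loop do
    # Listing is the only reliable emptiness check, so re-list after each pass.
    names = store.list_objects(container)
    break if names.empty?

    names.each do |name|
      begin
        store.delete_object(container, name)
      rescue NotFoundError
        # Deleted in an earlier pass, but the listing was not yet up to date.
      end
    end
  end
  store.delete_container(container)
end
```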

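Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.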