Doing a remote grep/count on a file stored on Amazon S3

We have a cloud-based application which has been storing user projects on the normal disk of our EC2 server. I am in the process of moving our project storage to S3, but I have recently run into a tough challenge. When a project is modified, we sometimes need to run some analysis on the XML files stored in the project. Previously we did this with a grep and a count that looked for certain XML tags, something like this:

grep -o "<tag" "' + path + '" | wc -l

Now that the files are stored on S3, I am at a loss for how to do similar analysis without downloading the whole project, which would mostly defeat the purpose of switching to S3. Is there any way to do this?

Unfortunately, S3 doesn't provide that functionality. You have to download the file(s) before grep can be applied. Even third-party tools like s3cmd download the files behind the scenes.
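If the download itself is unavoidable, you can at least avoid writing the file to disk by counting matches as the bytes arrive. Below is a minimal sketch of that idea in Python. It assumes boto3 is installed and configured; `count_in_s3_object`, the bucket name, and the key are hypothetical names used only for illustration. Every byte of the object is still transferred, so this saves local storage, not bandwidth.

```python
def count_occurrences(chunks, pattern):
    """Count occurrences of `pattern` in a stream of byte chunks.

    Roughly equivalent to `grep -o PATTERN | wc -l` for a fixed string.
    Exact for patterns that cannot overlap themselves, such as b"<tag".
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    keep = len(pattern) - 1  # bytes to carry over so boundary matches are seen
    count = 0
    tail = b""
    for chunk in chunks:
        data = tail + chunk
        count += data.count(pattern)
        # The tail is shorter than the pattern, so it can never contain a
        # complete match that would be counted a second time.
        tail = data[-keep:] if keep else b""
    return count


def count_in_s3_object(bucket, key, pattern=b"<tag"):
    """Hypothetical helper: stream an S3 object and count matches.

    Assumes boto3 is available; the whole object is still downloaded.
    """
    import boto3  # imported lazily so count_occurrences works without it

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return count_occurrences(body.iter_chunks(chunk_size=64 * 1024), pattern)
```

A call such as `count_in_s3_object("my-bucket", "projects/1/project.xml")` would then play the role of the original grep pipeline, with both arguments standing in for your own bucket and key.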

If there aren't too many patterns, you can grep the files before you upload them and keep the results on the local machine. That way you don't have to hit S3 every time. Yes, you may end up with stale data, but the alternative is expensive.
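One way to sketch the "grep before upload" approach is to compute the count while the file is still local and store it in a small JSON file keyed by the S3 key. The function names, the cache file layout, and the default `b"<tag"` pattern below are assumptions for illustration, not part of any library.

```python
import json
from pathlib import Path


def record_tag_count(xml_path, s3_key, cache_path, tag=b"<tag"):
    """Count `tag` in a local file and remember the result under `s3_key`.

    Call this just before uploading the file to S3.
    """
    count = Path(xml_path).read_bytes().count(tag)
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    cache[s3_key] = count
    cache_file.write_text(json.dumps(cache, indent=2))
    return count


def cached_tag_count(s3_key, cache_path):
    """Return the stored count for `s3_key`, or None if it was never recorded."""
    cache_file = Path(cache_path)
    if not cache_file.exists():
        return None
    return json.loads(cache_file.read_text()).get(s3_key)
```

The cached value is only as fresh as the last upload that went through `record_tag_count`, which is the stale-data trade-off mentioned above.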

