How to search values that change often with CloudSearch?

Original post: 2018-07-12 12:39:33, tags: php, mysql, amazon-web-services, amazon-cloudsearch

I'm new to CloudSearch and my question might not be clear, so I'll try to explain my problem.

We have a back office where a lot of people run searches, and from time to time our database goes down because some requests take more than 30 s to execute. So we decided to use CloudSearch, because we already use some other Amazon Web Services.

So I created a search domain, configured the index fields to match the values we search on in our current database, and indexed all our events (the results of what people search for) from our test database (~42,000 rows).

My problem is that each event has multiple media files (.jpg, .gif and .mp4) in our database. We are also migrating from v3 to v4, so there are two media databases, and we need to know the event's version to know which one to search: the old or the new database. So my question is: can I return some media information with CloudSearch, or will I still need to use a MySQL request?

Right now we return the last media item added to the database (so it can change many times while the event is running) and the total number of media for this event (which can also change very often).

What I think might work:

1. I can add the two fields to my event index (number of media + URL of the last media) and create a batch file to add/update the event data EACH time we add a new media item to the database. Problem: we can send 1 batch every 10 s and at most 10,000 batches per day, so if we have 50 events running at the same time, it could be a big problem...
2. Same idea as before, but we use a cron job to create a batch file with all the latest data, for example every hour. Problem: search results won't be accurate until the next batch runs... and the maximum batch size is 5 MB, so it could be okay, but if we have a lot of new data to add it could be a bit of a problem.
3. The current idea is to run a MySQL request using each event ID we get from the CloudSearch search and return that information, but I find it kind of silly to still use MySQL if we are switching to CloudSearch...
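One way to soften the rate limit in the first idea, and the size limit in the second, is to buffer updates and send one batch per flush. The sketch below is a minimal illustration under stated assumptions, not an official client: it keeps only the latest state per event (so many uploads to one running event collapse into a single "add" operation) and splits the operations into JSON document batches no larger than the 5 MB quoted above. The class `PendingUpdates`, the function `split_into_batches` and the field names `media_count` and `last_media_url` are hypothetical; the limits are taken from the question as given and should be checked against the current CloudSearch limits.

```python
import json

# Taken from the question; verify against the current CloudSearch limits.
MAX_BATCH_BYTES = 5 * 1024 * 1024


class PendingUpdates:
    """Buffers media changes, keeping only the latest state per event."""

    def __init__(self):
        self._latest = {}  # event id -> fields dict

    def record_media(self, event_id, media_count, last_media_url):
        # A later call for the same event replaces the earlier one, so many
        # uploads to one running event become a single "add" operation.
        self._latest[event_id] = {
            "media_count": media_count,        # hypothetical int field
            "last_media_url": last_media_url,  # hypothetical text field
        }

    def drain(self):
        """Return the buffered operations and empty the buffer."""
        ops = [
            {"type": "add", "id": str(event_id), "fields": fields}
            for event_id, fields in self._latest.items()
        ]
        self._latest.clear()
        return ops


def split_into_batches(ops, max_bytes=MAX_BATCH_BYTES):
    """Serialize operations into JSON arrays of at most max_bytes each."""
    batches, current, size = [], [], 2  # 2 bytes for "[" and "]"
    for op in ops:
        encoded = json.dumps(op)
        op_bytes = len(encoded.encode("utf-8"))
        extra = op_bytes + (1 if current else 0)  # +1 for the comma
        if current and size + extra > max_bytes:
            # Current batch is full: close it and start a new one.
            batches.append("[" + ",".join(current) + "]")
            current, size = [], 2
            extra = op_bytes
        current.append(encoded)
        size += extra
    if current:
        batches.append("[" + ",".join(current) + "]")
    return batches
```

A timer (for example every 10 s, or an hourly cron job for the second idea) would call `drain()`, pass the result to `split_into_batches()`, and upload each string to the domain's document endpoint, e.g. with boto3's `cloudsearchdomain` client `upload_documents(documents=..., contentType="application/json")`. A single operation larger than `max_bytes` still ends up alone in its own batch, so it would be rejected by the service rather than silently split.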
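For the third idea, the MySQL lookup does not have to be one request per event: the IDs from a page of CloudSearch hits can go into a single `IN (...)` query. A minimal sketch, using Python's built-in `sqlite3` as a stand-in for MySQL and a hypothetical `media(id, event_id, url, created_at)` table; the function name `media_summary` is also hypothetical. PHP's PDO accepts the same `?` placeholders, and with the v3/v4 split the query would be run against whichever media database matches the event's version.

```python
import sqlite3


def media_summary(conn, event_ids):
    """Return {event_id: (media_count, last_media_url)} in one query.

    `conn` is a DB-API connection whose driver uses "?" placeholders
    (sqlite3 here); the `media` table layout is a hypothetical example.
    """
    if not event_ids:
        return {}  # MySQL rejects an empty "IN ()" list, so skip the query
    placeholders = ",".join("?" for _ in event_ids)
    sql = f"""
        SELECT m.event_id,
               COUNT(*) AS media_count,
               (SELECT l.url FROM media AS l
                 WHERE l.event_id = m.event_id
                 ORDER BY l.created_at DESC, l.id DESC
                 LIMIT 1) AS last_media_url
        FROM media AS m
        WHERE m.event_id IN ({placeholders})
        GROUP BY m.event_id
    """
    cur = conn.cursor()
    cur.execute(sql, list(event_ids))
    return {row[0]: (row[1], row[2]) for row in cur.fetchall()}
```

CloudSearch returns document IDs as strings, so they may need converting (e.g. `[int(i) for i in ids]`) before being passed in. This costs one round trip per page of search results, keeps the expensive filtering in CloudSearch, and leaves MySQL with a lookup that an index on `media.event_id` can serve.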

I saw the documentation for "Using Dynamic Fields in Amazon CloudSearch", but I don't think it does what I want to achieve... Maybe I misunderstand something, but if someone can help me understand the best way to do this, I would be thankful.

1 answer

Can I return some media information with CloudSearch, or will I still need to use a MySQL request?

If you are asking whether you can store .mp4, .jpg, etc. media files in CloudSearch, the answer is no. You can store text and literal strings, numbers (int and double), dates, and latlon coordinates (or arrays of any of those, except latlon).

I think the conventional way to handle media is to index a URL or path to the media as a text field.
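Following that approach, the files stay where they are and the search document only carries their URLs. A minimal sketch of building one such document in the JSON batch format, assuming hypothetical fields `media_count` (int), `media_urls` (text-array) and `last_media_url` (text), configured as return-enabled so they come back in search results; the helper `event_document` is hypothetical too.

```python
import json


def event_document(event_id, media_urls):
    """Build one CloudSearch "add" operation that references media by URL.

    `media_urls` is ordered oldest first; the .jpg/.gif/.mp4 files are not
    sent to CloudSearch, only their URLs.
    """
    fields = {"media_count": len(media_urls)}  # hypothetical int field
    if media_urls:
        fields["media_urls"] = list(media_urls)    # hypothetical text-array field
        fields["last_media_url"] = media_urls[-1]  # hypothetical text field
    return {"type": "add", "id": str(event_id), "fields": fields}


# A batch is a JSON array of such operations.
batch = json.dumps([event_document(42, ["https://example.com/a.jpg",
                                         "https://example.com/b.mp4"])])
```

The sketch leaves the URL fields out when an event has no media yet. Because the count and last URL still have to be re-sent whenever media is added, this does not remove the update-frequency trade-off described in the question; it only shows what the indexed document would contain.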

Reference: Amazon CloudSearch Developer Guide, "Configuring Index Fields"