简体   繁体   中英

ElasticSearch : Concurrent updates to index while _reindex for the same index in progress

We have been using this link as a reference to accommodate any change in the mappings for a field in our index with zero downtime.

Question: Considering the same example taken in the above link, when we reindex the data from my_index_v1 to my_index_v2 using _reindex API. Does ElasticSearch guarantee that any concurrent updates happening in my_index_v1 would make it to my_index_v2 for sure?

For example, a document might get updated in my_index_v1 before or after it is reindexed by api to my_index_v2.

Ultimately, we just need to ensure that while we did not want any downtime for doing any mapping changes (hence did _reindex using alias and other cool stuff by ES), we also want to ensure that none of the add/update were missed while this huge reindex was in progress, as we are talking about reindexing >50GB data.

Thanks,
Sandeep

One way to solve this is by using two aliases instead of one. One for queries (let's call it read_alias ), and one for indexing ( write_alias ). We can write our code so that all indexing happens through the write_alias and all queries go through the read_alias . Let's consider three periods of time:

Before rebuild

read_alias : points to current_index

write_alias : points to current_index 在此处输入图片说明

All queries return current data.

All modifications go into current_index .

During rebuild

read_alias : points to current_index

write_alias : points to new_index 在此处输入图片说明

All queries keep getting data as it existed before the rebuild, since searching code uses read_alias .

All rows, including modified ones, get indexed into the new_index , since both the rebuilding loop and the DB trigger use the write_alias .

After rebuild

read_alias : points to new_index

write_alias : points to new_index 在此处输入图片说明

All queries return new data, including the modifications made during rebuild.

All modifications go into new_index .

It should even be possible to get the modified data from queries while rebuilding, if we make the DB trigger code index modified rows into both the indices while the rebuild is going on (ie, while the aliases point to different indices).

在此处输入图片说明

It is often better to rebuild the index from source data using custom code instead of relying on the _reindex API, since that way we can add new fields that may not have been stored in the old index.

This article has some more details.

The reindex api will not consider the changes made after the process has started.. One thing you can do is once you are done reindexing process.You can again start process with version_type:external . This will cause only documents from source index to destination index that have different version and are not present

Here is the example

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "external"
  }
}

Setting version_type to external will cause Elasticsearch to preserve the version from the source, create any documents that are missing, and update any documents that have an older version in the destination index than they do in the source index:

It looks like it does it based off of snapshots of the source index.

Which would suggest to me that they couldn't reasonably honor changes to the source happening in the middle of the process. You avoid downtime on the search side, but I think you would need to pause updates on the indexing side during this process.

Something you could do is keep track on your index of when the document was last modified. Then once you finish indexing and switch the alias, you query the old index for what changed in the middle. Propagate those changes over to the new index and you get eventual consistency.

import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

/**
 *
 * @author user
 */
public class JWebToken {

    private static final String SECRET_KEY = "FREE_MASON"; //@TODO Add Signature here
    private static final char[] HEX_ARRAY = "0123456789ABCDEF".toCharArray();
    private static final String ISSUER = "mason.metamug.net";
    private static final String JWT_HEADER = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";
    private JSONObject payload = new JSONObject();
    private String signature;
    private String encodedHeader;

    private JWebToken() {
        encodedHeader = encode(new JSONObject(JWT_HEADER));
    }

    public JWebToken(JSONObject payload) {
        this(payload.getString("sub"), payload.getJSONArray("aud"), payload.getLong("exp"));
    }

    public JWebToken(String sub, JSONArray aud, long expires) {
        this();
        payload.put("sub", sub);
        payload.put("aud", aud);
        payload.put("exp", expires);
        payload.put("iat", LocalDateTime.now().toEpochSecond(ZoneOffset.UTC));
        payload.put("iss", ISSUER);
        payload.put("jti", UUID.randomUUID().toString()); //how do we use this?
        signature = hmacSha256(encodedHeader + "." + encode(payload), SECRET_KEY);
    }

    /**
     * For verification
     *
     * @param token
     * @throws java.security.NoSuchAlgorithmException
     */
    public JWebToken(String token) throws NoSuchAlgorithmException {
        this();
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Invalid Token format");
        }
        if (encodedHeader.equals(parts[0])) {
            encodedHeader = parts[0];
        } else {
            throw new NoSuchAlgorithmException("JWT Header is Incorrect: " + parts[0]);
        }

        payload = new JSONObject(decode(parts[1]));
        if (payload.isEmpty()) {
            throw new JSONException("Payload is Empty: ");
        }
        if (!payload.has("exp")) {
            throw new JSONException("Payload doesn't contain expiry " + payload);
        }
        signature = parts[2];
    }

    @Override
    public String toString() {
        return encodedHeader + "." + encode(payload) + "." + signature;
    }

    public boolean isValid() {
        return payload.getLong("exp") > (LocalDateTime.now().toEpochSecond(ZoneOffset.UTC)) //token not expired
                && signature.equals(hmacSha256(encodedHeader + "." + encode(payload), SECRET_KEY)); //signature matched
    }

    public String getSubject() {
        return payload.getString("sub");
    }

    public List<String> getAudience() {
        JSONArray arr = payload.getJSONArray("aud");
        List<String> list = new ArrayList<>();
        for (int i = 0; i < arr.length(); i++) {
            list.add(arr.getString(i));
        }
        return list;
    }

    private static String encode(JSONObject obj) {
        return encode(obj.toString().getBytes(StandardCharsets.UTF_8));
    }

    private static String encode(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    private static String decode(String encodedString) {
        return new String(Base64.getUrlDecoder().decode(encodedString));
    }

    /**
     * Sign with HMAC SHA256 (HS256)
     *
     * @param data
     * @return
     * @throws Exception
     */
    private String hmacSha256(String data, String secret) {
        try {

            //MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = secret.getBytes(StandardCharsets.UTF_8);//digest.digest(secret.getBytes(StandardCharsets.UTF_8));

            Mac sha256Hmac = Mac.getInstance("HmacSHA256");
            SecretKeySpec secretKey = new SecretKeySpec(hash, "HmacSHA256");
            sha256Hmac.init(secretKey);

            byte[] signedBytes = sha256Hmac.doFinal(data.getBytes(StandardCharsets.UTF_8));
            return encode(signedBytes);
        } catch (NoSuchAlgorithmException | InvalidKeyException ex) {
            Logger.getLogger(JWebToken.class.getName()).log(Level.SEVERE, ex.getMessage(), ex);
            return null;
        }
    }

}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM