
Unknown thread spawns that bypasses the filter chain and fails in the async task decorator

I am facing a strange issue that I cannot reproduce locally but that happens regularly on AWS ECS, causing the application to crash or run slowly.

We have a Spring Boot application which extracts the tenant from the incoming GraphQL request and stores it in a ThreadLocal instance.

To support DataLoader from GraphQL Java Kickstart, we propagate the tenant to each child thread used by the GraphQL data loader. The tenant is mandatory because it determines the database schema.

The executor

@Bean
@Override
public Executor getAsyncExecutor() {
    log.info("Configuring async executor for multi tenancy...");
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(15);
    executor.setThreadNamePrefix("tenant-child-executor-");
    // Important part: set the MultiTenancyAsyncTaskDecorator to propagate the current tenant to the child thread
    executor.setTaskDecorator(new MultiTenancyAsyncTaskDecorator());
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    executor.setWaitForTasksToCompleteOnShutdown(true);
    executor.initialize();
    // Log success only after the executor has actually been initialized
    log.info("Executor configured successfully!");
    return executor;
}

Task Decorator

@NonNull
@Override
public Runnable decorate(@NonNull Runnable runnable) {
    // decorate() runs on the submitting thread, so its context is captured here
    final TenantIdentifier currentTenant = CurrentTenantContext.getTenant();
    if (Objects.isNull(currentTenant)) {
        log.warn("Current tenant is null while decorating a new thread!");
    }

    // Fall back to the system tenant if the submitting thread has no tenant set
    final TenantIdentifier parentThreadTenantIdentifier =
            Objects.isNull(currentTenant) ? TenantIdentifier.asSystem() : currentTenant;
    // Also need to get the MDC context map as it is bound to the current thread
    final Map<String, String> parentContextMap = MDC.getCopyOfContextMap();
    final var requestAttributes = RequestContextHolder.getRequestAttributes();

    return () -> {
        try {
            CurrentTenantContext.setTenant(TenantIdentifier.of(parentThreadTenantIdentifier.getTenantName()));
            if (Objects.isNull(requestAttributes)) {
                log.warn("RequestAttributes are not available!");
                log.warn("Running on tenant: {}", parentThreadTenantIdentifier.getTenantName());
            } else {
                RequestContextHolder.setRequestAttributes(requestAttributes, true);
            }

            if (Objects.isNull(parentContextMap)) {
                log.warn("Parent context map not available!");
                log.warn("Running on tenant: {}", parentThreadTenantIdentifier.getTenantName());
            } else {
                MDC.setContextMap(parentContextMap);
            }

            runnable.run();
        } finally {
            // Runs after the task completes, normally or with an exception
            RequestContextHolder.resetRequestAttributes();
            CurrentTenantContext.clear();
            MDC.clear();
        }
    };
}

Tenant Context

public class CurrentTenantContext {
    private static final ThreadLocal<TenantIdentifier> currentTenant = new ThreadLocal<>();

    private CurrentTenantContext() {
        // Hide constructor to only provide static functionality
    }

    public static TenantIdentifier getTenant() {
        return currentTenant.get();
    }

    public static String getTenantName() {
        return getTenant().getTenantName();
    }

    public static void setTenant(TenantIdentifier tenant) {
        currentTenant.set(tenant);
    }


    public static void clear() {
        currentTenant.remove();
    }

    public static boolean isTenantSet() {
        return Objects.nonNull(currentTenant.get());
    }
}

Locally, this works like a charm, even in a Docker Compose environment with the same limited resources (CPU and memory) as in AWS. Even under 100,000 requests (JMeter), everything works as expected.

On AWS, we can easily crash the application. After one or two requests containing child objects for GraphQL to resolve, we see a thread spawn that seems to bypass the filter chain:

Thread-110 | [sys ] | WARN | MultiTenancyAsyncTaskDecorator | Current tenant is null while decorating a new thread!

An interesting detail in this line is the thread name. Each incoming request thread follows the pattern http-nio-9100-exec-[N] and each child thread follows the pattern tenant-child-executor-[I], but this one follows the pattern Thread-[Y].

Now I am wondering where this thread is coming from and why it is not reproducible locally.
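One detail that may help narrow this down: the warning is logged inside decorate(), and a task decorator is applied on the thread that submits the task, not on the pool thread that later runs it. So Thread-[Y] is most likely a thread that some component created without a name and that then handed work to the async executor. The sketch below illustrates this with plain java.util.concurrent; the decorate helper and the class name are hypothetical stand-ins, not Spring's API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class DecorateThreadDemo {

    // Hypothetical stand-in for a task decorator: records which thread performs the decoration.
    static Runnable decorate(Runnable task, AtomicReference<String> decoratedOn) {
        decoratedOn.set(Thread.currentThread().getName());
        return task;
    }

    // Submits a task from an unnamed thread and returns {decorating thread, executing thread}.
    static String[] run() {
        AtomicReference<String> decoratedOn = new AtomicReference<>();
        AtomicReference<String> executedOn = new AtomicReference<>();
        ExecutorService pool = Executors.newSingleThreadExecutor(
                r -> new Thread(r, "tenant-child-executor-1"));
        // A thread created without an explicit name gets the JVM default "Thread-N".
        Thread submitter = new Thread(() -> pool.execute(
                decorate(() -> executedOn.set(Thread.currentThread().getName()), decoratedOn)));
        try {
            submitter.start();
            submitter.join();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return new String[] {decoratedOn.get(), executedOn.get()};
    }

    public static void main(String[] args) {
        String[] names = run();
        System.out.println("decorated on: " + names[0]);
        System.out.println("executed on:  " + names[1]);
    }
}
```

In the real decorator, passing a `new Throwable("decorate called without tenant")` as the last argument of the log.warn call would print the stack trace of the submitting thread, which should reveal which library or component is creating it.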

I was able to find the solution to the problem.

I needed to change

private static final ThreadLocal<TenantIdentifier> currentTenant = new ThreadLocal<>();

to

private static final InheritableThreadLocal<TenantIdentifier> currentTenant = new InheritableThreadLocal<>();

However, I don't know why it works with InheritableThreadLocal but not with ThreadLocal in the AWS environment.

Furthermore, I wonder why this change was not necessary for local testing, which works both ways.

Maybe somebody can provide some ideas.
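On the first question, the difference between the two classes can be shown in isolation. A plain ThreadLocal value is visible only to the thread that set it, while an InheritableThreadLocal value is copied into any thread created by that thread at construction time. That would explain why an unnamed Thread-[Y] started from a request thread suddenly sees the tenant. This is only a minimal sketch with a hypothetical class name and a String in place of TenantIdentifier; it does not explain which component spawns the thread on AWS.

```java
import java.util.concurrent.atomic.AtomicReference;

public class InheritableTenantDemo {

    // Reads the holder's value from a freshly created thread and returns what that thread saw.
    static String readFromNewThread(ThreadLocal<String> holder) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread child = new Thread(() -> seen.set(holder.get()));
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        ThreadLocal<String> plain = new ThreadLocal<>();
        ThreadLocal<String> inheritable = new InheritableThreadLocal<>();
        plain.set("tenant-a");
        inheritable.set("tenant-a");

        System.out.println("plain:       " + readFromNewThread(plain));       // prints "plain:       null"
        System.out.println("inheritable: " + readFromNewThread(inheritable)); // prints "inheritable: tenant-a"
    }
}
```

Note that inheritance happens only when a thread is created. Pooled threads such as tenant-child-executor-[I] are reused across requests, so they would keep whatever value they inherited at creation; the task decorator is still needed to set and clear the tenant for each task.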
