Vespa: Failed to fetch json: Connection error: socket write error

We deployed Vespa with Kubernetes on a GKE cluster with 3 nodes. When creating the Dockerfile, we took Vespa 7.351.32 as the base image and added a few more things to it:

  1. GCloud SDK
  2. Some script files that copy our logs to GCS
  3. A workspace folder

The workspace folder contains all the necessary .xml and other files required for the Vespa deployment.

Below are the steps we execute inside the three pods to deploy and restart the config server:

/opt/vespa/bin/vespa-deploy prepare /workspace && /opt/vespa/bin/vespa-deploy activate

wait (5 min)

vespa-stop-services
vespa-stop-configserver

wait(15min)

vespa-start-configserver
vespa-start-services

vespa-get-cluster-state
vespa-config-status

Then we receive the following error.

[screenshot]

Below is a screenshot of the connectivity check to port 2181 on all three pods.

[screenshot]

Upon further inspection of the logs (using vespa-logfmt -l error), we found that the com.yahoo.container.handler.threadpool.threadpool.DefaultContainerTHreadpool bundle fails to load. Manually restarting the config server and Vespa services seems to solve the issue.

The related log is attached below.

[screenshot]

Please help us understand the following points:

Does some service need to be running before this bundle is loaded?
Is there a path issue? If so, where can we find this bundle?
Is this caused by a memory issue (we have the recommended 4G)?
How does Vespa load these bundles?

Below are additional details for the setup.

Dockerfile
FROM vespaengine/vespa:7.351.32

# Copy necessary files
RUN mkdir -p workspace
COPY workspace /workspace
# -y answers the confirmation prompt so the build does not stop here
RUN yum install -y python3
COPY backup-pod.sh /

# Downloading gcloud package
RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz

# Installing the package
RUN mkdir -p /usr/local/gcloud \
  && tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz \
  && /usr/local/gcloud/google-cloud-sdk/install.sh

# Adding the package path to local
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin

Manifest
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vespa
  namespace: vespa
  labels:
    app: vespa
spec:
  replicas: 3
  #serviceName: vespa
  selector:
    matchLabels:
      app: vespa
      name: vespa-internal
  serviceName: vespa-internal
  template:
    metadata:
      labels:
        app: vespa
        name: vespa-internal
    spec:
      serviceAccount: vespa-sa
#     nodeSelector:
#       iam.gke.io/gke-metadata-server-enabled: "true"
      containers:
      - name: vespa
        image: asia-south1-docker.pkg.dev/aurum-projec/vespa/vespa:latest
        imagePullPolicy: Always
        securityContext:
          privileged: true
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /ApplicationStatus
            port: 19071
            scheme: HTTP
        volumeMounts:
        - name: vespa-var
          mountPath: /opt/vespa/var
        - name: vespa-logs
          mountPath: /opt/vespa/logs
        resources:
          requests:
            memory: "2G"
          limits:
            memory: "2G"
  volumeClaimTemplates:
  - metadata:
      name: vespa-var
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
  - metadata:
      name: vespa-logs
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi

That message comes on startup, not on reconfig, and relates to one of our bundles, which is always present and which does consume significant resources on construction. So yes, you are probably running out of memory.

To be clear, 4 GB isn't the recommended amount; it is the minimum you can get away with for trying it out.
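
Note that the manifest in the question sets the memory request and limit to 2G, which is below that minimum. As a rough sketch of how one might check for an out-of-memory kill and raise the limit with kubectl: the namespace, StatefulSet name, and container name (all `vespa`) are taken from the manifest above, while the helper names and the `KUBECTL`/`NAMESPACE` variables are made up for illustration.

```shell
#!/bin/sh
# Assumptions: kubectl is on PATH and pointed at the GKE cluster.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-vespa}"

# Hypothetical helper: print why the first container of a pod last
# terminated (for example OOMKilled). Prints nothing if it never terminated.
last_termination_reason() {
  "$KUBECTL" get pod "$1" -n "$NAMESPACE" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}

# Hypothetical helper: set the memory request and limit of the vespa
# container to the same value. This changes the pod template, so the
# StatefulSet will restart its pods.
set_vespa_memory() {
  "$KUBECTL" set resources statefulset vespa -n "$NAMESPACE" \
    -c vespa --requests="memory=$1" --limits="memory=$1"
}
```

For example, `last_termination_reason vespa-0` inspects the first pod of the StatefulSet, and `set_vespa_memory 4G` raises the request and limit to the minimum mentioned above. The nodes also need enough free memory to schedule the larger pods.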

Also note that you don't need this complex, time-consuming process for deploying changes. Running deploy prepare + activate is sufficient, and it works without disrupting queries and writes, so you can do it in production.
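
A minimal sketch of such a deployment without any stop/start steps is shown here. It assumes the `vespa-deploy` path and `/workspace` directory from the question and the config server health URL from the manifest's readiness probe; the function names, the `CURL`, `CONFIG_URL`, and `POLL_SECONDS` variables, and the retry count are illustrative choices, not part of Vespa.

```shell
#!/bin/sh
# Assumptions: run inside a pod where the config server listens on 19071.
VESPA_DEPLOY="${VESPA_DEPLOY:-/opt/vespa/bin/vespa-deploy}"
CURL="${CURL:-curl}"
CONFIG_URL="${CONFIG_URL:-http://localhost:19071/ApplicationStatus}"

# Poll the config server until it answers with a success status,
# trying at most $1 times with a pause between attempts.
wait_for_configserver() {
  attempts="$1"
  while [ "$attempts" -gt 0 ]; do
    "$CURL" -sf -o /dev/null "$CONFIG_URL" && return 0
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-5}"
  done
  return 1
}

# Prepare the application package in directory $1 and activate it
# only if prepare succeeded. No service restart is involved.
deploy_app() {
  wait_for_configserver 60 || return 1
  "$VESPA_DEPLOY" prepare "$1" && "$VESPA_DEPLOY" activate
}
```

Calling `deploy_app /workspace` replaces the fixed waits with a health check before deploying.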
