Vespa: Failed to fetch json: Connection error: socket write error
We deployed Vespa with Kubernetes on a three-node GKE cluster. Our Dockerfile uses Vespa 7.351.32 as the base image and adds a few things on top of it.
The workspace folder contains all the necessary .xml and other files required for the Vespa deployment.
Below are the steps we execute inside the three pods to deploy and restart the config server:
/opt/vespa/bin/vespa-deploy prepare /workspace && /opt/vespa/bin/vespa-deploy activate
wait (5 min)
vespa-stop-services
vespa-stop-configserver
wait (15 min)
vespa-start-configserver
vespa-start-services
vespa-get-cluster-state
vespa-config-status
Then we receive the following error.
Below is a screenshot of the connectivity to port 2181 on all three pods.
Upon further inspection of the logs (using vespa-logfmt -l error), we found that the com.yahoo.container.handler.threadpool.threadpool.DefaultContainerTHreadpool bundle fails to load. Manually restarting the config server and Vespa services seems to solve the issue.
The related log is attached below.
Please help us understand the following points:
Does some service need to be running before this bundle is loaded?
Is there a path issue? If so where can we find this bundle?
Is this because of a memory issue (we have the recommended 4G)?
How does Vespa load these bundles?
Below are additional details about the setup.
FROM vespaengine/vespa:7.351.32
# Copy necessary files
RUN mkdir -p workspace
COPY workspace /workspace
RUN yum install -y python3
COPY backup-pod.sh /
# Downloading gcloud package
RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz
# Installing the package
RUN mkdir -p /usr/local/gcloud \
&& tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz \
&& /usr/local/gcloud/google-cloud-sdk/install.sh
# Adding the package path to local
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vespa
  namespace: vespa
  labels:
    app: vespa
spec:
  replicas: 3
  #serviceName: vespa
  selector:
    matchLabels:
      app: vespa
      name: vespa-internal
  serviceName: vespa-internal
  template:
    metadata:
      labels:
        app: vespa
        name: vespa-internal
    spec:
      serviceAccount: vespa-sa
      # nodeSelector:
      #   iam.gke.io/gke-metadata-server-enabled: "true"
      containers:
      - name: vespa
        image: asia-south1-docker.pkg.dev/aurum-projec/vespa/vespa:latest
        imagePullPolicy: Always
        securityContext:
          privileged: true
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /ApplicationStatus
            port: 19071
            scheme: HTTP
        volumeMounts:
        - name: vespa-var
          mountPath: /opt/vespa/var
        - name: vespa-logs
          mountPath: /opt/vespa/logs
        resources:
          requests:
            memory: "2G"
          limits:
            memory: "2G"
  volumeClaimTemplates:
  - metadata:
      name: vespa-var
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
  - metadata:
      name: vespa-logs
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
That message comes on startup, not on reconfiguration, and relates to one of our bundles, which is always present and consumes significant resources on construction. So yes, you are probably running out of memory.
To be clear, 4 GB isn't recommended; it is the minimum you can get away with for trying it out.
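One way to check how much memory a pod is actually allowed to use is to read the cgroup limit from inside the container. Note that the StatefulSet in the question sets the limit to "2G" (2,000,000,000 bytes in Kubernetes notation), which is below the 4 GB minimum. The following is only a rough sketch: the function names are made up for illustration, the 4 GiB threshold is an assumption derived from the minimum mentioned above, and the cgroup file locations depend on whether the node uses cgroup v1 or v2.

```shell
#!/bin/sh
# Assumed threshold: 4 GiB, based on the minimum mentioned in the answer.
MIN_BYTES=$((4 * 1024 * 1024 * 1024))

# Print the container memory limit in bytes (or "max" if unlimited).
# Tries the cgroup v2 file first, then the cgroup v1 file.
read_mem_limit() {
    for f in /sys/fs/cgroup/memory.max \
             /sys/fs/cgroup/memory/memory.limit_in_bytes; do
        if [ -r "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    return 1
}

# Succeed if the given limit ($1, bytes or "max") is at least MIN_BYTES.
mem_limit_ok() {
    if [ "$1" = "max" ]; then
        return 0
    fi
    [ "$1" -ge "$MIN_BYTES" ]
}

# Example use inside a pod (hypothetical):
#   limit=$(read_mem_limit) && mem_limit_ok "$limit" || echo "memory limit too low: $limit"
```

With the "2G" limit from the manifest, mem_limit_ok 2000000000 fails, which matches the diagnosis that the container is short on memory.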
Also note that you don't need this complex, time-consuming process to deploy changes: deploy prepare + activate is sufficient, and it works without disrupting queries and writes, so you can do it in production.
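As an illustration, the stop/wait/start sequence from the question could shrink to something like the sketch below. It reuses the vespa-deploy path and /workspace directory from the question and the /ApplicationStatus endpoint on port 19071 from the readiness probe. The function names, the polling helper, and the retry counts are assumptions made for this example, not part of Vespa.

```shell
#!/bin/sh
# Paths and URL taken from the question; override via environment if they differ.
VESPA_DEPLOY="${VESPA_DEPLOY:-/opt/vespa/bin/vespa-deploy}"
APP_DIR="${APP_DIR:-/workspace}"
CONFIG_URL="${CONFIG_URL:-http://localhost:19071/ApplicationStatus}"

# Succeeds once the config server answers the status URL with a 2xx response.
config_server_up() {
    curl -fsS -o /dev/null "$CONFIG_URL"
}

# Poll a command until it succeeds.
# $1: command to run, $2: maximum attempts, $3: seconds between attempts.
wait_for() {
    attempts=0
    until "$1"; do
        attempts=$((attempts + 1))
        if [ "$attempts" -ge "$2" ]; then
            return 1
        fi
        sleep "$3"
    done
}

# Deploy the application package: prepare, then activate. No restarts involved.
deploy_app() {
    "$VESPA_DEPLOY" prepare "$APP_DIR" || return 1
    "$VESPA_DEPLOY" activate
}

# Example use (hypothetical retry settings: 60 attempts, 5 seconds apart):
#   wait_for config_server_up 60 5 && deploy_app
```

Polling the status endpoint replaces the fixed 5 and 15 minute waits, and the services are never stopped, so queries and writes keep flowing during the deployment.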