Bosh task fails with "Error: Failed to upload blob, code 1, output: 'Error running app - Putting dav blob xxx-xxx-xxx-xxx : Wrong response code: 500; body: <html> <head><title>500 Internal Server Error</title></head>"

Article ID: 313106


Products

VMware

Issue/Introduction

Symptoms:
  • When updating a Kubernetes cluster’s configuration, most commonly when increasing the number of worker nodes, the BOSH task fails with the following error:
$ tkgi update-cluster <cluster-name>  --num-nodes 4 

Update summary for cluster <cluster-name>:
Worker Number: 4
Cluster Tags: [{cluster_name <cluster-name>}]
Are you sure you want to continue? (y/n): y
Use '<cluster-name>' to monitor the state of your cluster

$ bosh task

Using environment '10.32.36.10' as client 'ops_manager'
Task 1417873
Task 1417873 | 00:36:22 | Deprecation: Global 'properties' are deprecated. Please define 'properties' at the job level.
Task 1417873 | 00:36:24 | Preparing deployment: Preparing deployment
Task 1417873 | 00:36:25 | Warning: DNS address not available for the link provider instance: pivotal-container-service/fb7126ce-fd3e-4f9a-a79c-24bbf1342a8d
Task 1417873 | 00:36:25 | Warning: DNS address not available for the link provider instance: pivotal-container-service/fb7126ce-fd3e-4f9a-a79c-24bbf1342a8d
Task 1417873 | 00:36:25 | Warning: DNS address not available for the link provider instance: pivotal-container-service/fb7126ce-fd3e-4f9a-a79c-24bbf1342a8d
Task 1417873 | 00:36:43 | Preparing deployment: Preparing deployment (00:00:19)
Task 1417873 | 00:36:43 | Preparing deployment: Rendering templates (00:00:11)
Task 1417873 | 00:36:54 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 1417873 | 00:36:55 | Creating missing vms: worker/b233cdf7-fc20-41f5-b495-37d37060158a (1)
Task 1417873 | 00:36:55 | Creating missing vms: worker/2b4cf530-1cf9-4214-a012-790e4e85c9f0 (3)
Task 1417873 | 00:36:55 | Creating missing vms: worker/5cd0b491-06d7-4df4-8da3-c158a7c9f6b6 (2)
Task 1417873 | 00:39:16 | Creating missing vms: worker/b233cdf7-fc20-41f5-b495-37d37060158a (1) (00:02:21)
Task 1417873 | 00:39:29 | Creating missing vms: worker/2b4cf530-1cf9-4214-a012-790e4e85c9f0 (3) (00:02:34)
Task 1417873 | 00:39:30 | Creating missing vms: worker/5cd0b491-06d7-4df4-8da3-c158a7c9f6b6 (2) (00:02:35)
Task 1417873 | 00:39:30 | Error: Failed to upload blob, code 1, output: 'Error running app - Putting dav blob b27061fc-5cad-4639-b469-ecd180b90036: Wrong response code: 500; body: <html>
<head><title>500 Internal Server Error</title></head>
<body>
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx</center>
</body>
</html>
', error: ''
Task 1417873 Started Tue Jun 20 00:36:22 UTC 2023
Task 1417873 Finished Tue Jun 20 00:39:30 UTC 2023
Task 1417873 Duration 00:03:08
Task 1417873 error
Capturing task '1417873' output:
Expected task '1417873' to succeed but state is 'error'
Exit code 1
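If you collect BOSH task output in scripts (for example, the output of bosh task <id> saved to a file), a small check can flag this specific failure. The following is a minimal sketch, not a supported tool: the helper name is_dav_blob_500 is hypothetical, and the pattern is built only from the error text shown above (a DAV blob upload rejected with HTTP 500).

```python
import re

# Pattern derived from the error text above: a "Putting dav blob" upload
# that the Director's blobstore rejects with HTTP 500.
DAV_500_PATTERN = re.compile(
    r"Failed to upload blob.*Putting dav blob \S+: Wrong response code: 500",
    re.DOTALL,
)


def is_dav_blob_500(task_output: str) -> bool:
    """Hypothetical helper: True if task output contains the DAV blob 500 error."""
    return DAV_500_PATTERN.search(task_output) is not None


if __name__ == "__main__":
    sample = (
        "Task 1417873 | 00:39:30 | Error: Failed to upload blob, code 1, output: "
        "'Error running app - Putting dav blob b27061fc-5cad-4639-b469-ecd180b90036: "
        "Wrong response code: 500; body: <html>"
    )
    print(is_dav_blob_500(sample))  # True
```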


Cause

  • This error can occur when multiple tasks are timing out or failing and too many BOSH tasks are queued for the Director.
$ bosh tasks
Using environment 'x.x.x.x' as client 'ops_manager'

ID      State   Started At  Finished At  User                            Deployment             Description                                                                                                                         Result
262506  queued  -           -            pivotal-container-service-axxx  service-instance_xxxx  retrieve vm-stats                                                                                                                   -
262505  queued  -           -            pivotal-container-service-axxx  service-instance_xxxx  retrieve vm-stats                                                                                                                   -
262503  queued  -           -            ops_manager                     service-instance_xxxx  ssh: setup:{"ids"=>["ff4d7cce-2f2d-468e-ba90-246a33a1b8bb"], "indexes"=>["ff4d7cce-2f2d-468e-ba90-246a33a1b8bb"], "job"=>"worker"}  -
262502  queued  -           -            ops_manager                     service-instance_xxxx  ssh: setup:{"ids"=>["fe2c2f36-8cee-40af-9b6c-84c650776405"], "indexes"=>["fe2c2f36-8cee-40af-9b6c-84c650776405"], "job"=>"worker"}  -
262501  queued  -           -            ops_manager                     service-instance_xxxx  ssh: setup:{"ids"=>["fa9b3b8d-9f02-41cb-a945-c7536d4d2e3d"], "indexes"=>["fa9b3b8d-9f02-41cb-a945-c7536d4d2e3d"], "job"=>"worker"}  -
262500  queued  -           -            ops_manager                     service-instance_xxxx  ssh: setup:{"ids"=>["f9ca89c2-0396-41ea-8986-a303ea41e2e3"], "indexes"=>["f9ca89c2-0396-41ea-8986-a303ea41e2e3"], "job"=>"worker"}  -
...

472 tasks

Succeeded
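To see at a glance how many tasks are queued versus running, the task list can be summarized by state. This is a minimal sketch under an assumption: it expects the JSON that the bosh CLI prints with its global --json flag to have the layout {"Tables": [{"Rows": [{"state": ...}, ...]}]}; verify that layout against your CLI version before relying on it. The helper name count_task_states is hypothetical.

```python
import json
from collections import Counter


def count_task_states(bosh_tasks_json: str) -> Counter:
    """Hypothetical helper: count tasks per state from `bosh tasks --json` output.

    ASSUMPTION: layout is {"Tables": [{"Rows": [{"state": ...}, ...]}]}.
    """
    doc = json.loads(bosh_tasks_json)
    states = Counter()
    for table in doc.get("Tables") or []:
        for row in table.get("Rows") or []:
            states[row.get("state", "unknown")] += 1
    return states


if __name__ == "__main__":
    # Illustrative input shaped like the assumed JSON layout.
    sample = json.dumps({"Tables": [{"Rows": [
        {"id": "262506", "state": "queued"},
        {"id": "262505", "state": "queued"},
        {"id": "262400", "state": "processing"},
    ]}]})
    print(dict(count_task_states(sample)))  # {'queued': 2, 'processing': 1}
```

Against a live Director, the input would be the output of bosh tasks --json; with hundreds of queued tasks, as in the 472-task listing above, the "queued" count dominates.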


  • The task counts by type below show that the scheduled_events_cleanup, snapshot_deployment, ssh, and vms counts each exceed 2,000:
["type: cck_scan_and_fix count: 140",
 "type: delete_artifacts count: 16",
 "type: delete_deployment count: 8",
 "type: fetch_logs count: 12",
 "type: run_errand count: 700",
 "type: scheduled_dns_blobs_cleanup count: 612",
 "type: scheduled_events_cleanup count: 2151",
 "type: scheduled_orphaned_disk_cleanup count: 277",
 "type: scheduled_task_cleanup count: 113",
 "type: snapshot_deployment count: 2021",
 "type: snapshot_deployments count: 707",
 "type: snapshot_self count: 707",
 "type: ssh count: 2772",
 "type: update_deployment count: 289",
 "type: update_release count: 634",
 "type: update_stemcell count: 1",
 "type: vms count: 3711"]
irb(main):003:0>
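Picking out the oversized task types from a long list like the one above is easy to script. The sketch below parses entries of the form "type: <name> count: <n>" and returns those above a threshold. The default of 2000 simply mirrors the figure observed in this article; it is not a documented BOSH limit, and the helper name task_types_over is hypothetical.

```python
import re

# Matches entries such as "type: ssh count: 2772" in the output above.
ENTRY = re.compile(r"type:\s*(\S+)\s+count:\s*(\d+)")


def task_types_over(text: str, threshold: int = 2000) -> dict:
    """Hypothetical helper: {task_type: count} for types whose count exceeds threshold."""
    return {
        task_type: int(count)
        for task_type, count in ENTRY.findall(text)
        if int(count) > threshold
    }


if __name__ == "__main__":
    output = '''
    "type: run_errand count: 700",
    "type: scheduled_events_cleanup count: 2151",
    "type: snapshot_deployment count: 2021",
    "type: snapshot_deployments count: 707",
    "type: ssh count: 2772",
    "type: vms count: 3711"
    '''
    print(task_types_over(output))
    # {'scheduled_events_cleanup': 2151, 'snapshot_deployment': 2021, 'ssh': 2772, 'vms': 3711}
```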

Resolution

WARNING: Make sure you are absolutely certain that the queued tasks are not affecting any ongoing deployments. If you have a deployment currently running, DO NOT CONTINUE; contact Tanzu Support for assistance. The steps outlined in this article cancel all queued tasks, and if a deployment is running, cancelling them may leave it in an inconsistent and potentially broken state. Again, do not continue if you have a deployment in progress.


If you are using Ops Manager 2.7 or later, run the following command to cancel all queued BOSH tasks:

bosh cancel-tasks -s=queued 
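The warning above amounts to a pre-flight rule: do not cancel queued tasks while anything is still running on the Director. If you script this step, a conservative guard might look like the sketch below. It takes task counts per state (for example, tallied from bosh tasks output) and refuses when any task is "processing" or "cancelling". Treating those two states as blocking is an assumption of this sketch, and it does not replace confirming manually that no deployment is in progress. The helper name safe_to_cancel_queued is hypothetical.

```python
# ASSUMPTION: any task in these states may be an in-flight deployment,
# so the sketch conservatively blocks the cancel step when any are present.
BUSY_STATES = ("processing", "cancelling")


def safe_to_cancel_queued(task_states: dict) -> bool:
    """Hypothetical guard: True only when no task is in a busy state."""
    return all(task_states.get(state, 0) == 0 for state in BUSY_STATES)


if __name__ == "__main__":
    print(safe_to_cancel_queued({"queued": 472}))                   # True
    print(safe_to_cancel_queued({"queued": 470, "processing": 2}))  # False
```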
Once the queued BOSH tasks are cancelled, retry the cluster scaling operation with the following command:
tkgi update-cluster <cluster-name> --num-nodes 4

NOTE: If the previous command (bosh cancel-tasks -s=queued) is not available, follow the steps in the KB article below to cancel the queued tasks:
https://community.pivotal.io/s/article/How-to-Cancel-All-Queued-BOSH-Tasks-Using-director-ctl?language=en_US


Additional Information

How to clean up stale BOSH tasks history from BOSH Director console
https://community.pivotal.io/s/article/How-to-clean-up-stale-BOSH-tasks-history-from-console?language=en_US

How to cancel all queued BOSH tasks using director_ctl in Operations Manager
https://community.pivotal.io/s/article/How-to-Cancel-All-Queued-BOSH-Tasks-Using-director-ctl?language=en_US