
Vector Log Collection

Why Vector

The Pipekit Agent's built-in log transport is best-effort. It streams logs from each workflow pod via the Kubernetes API and forwards them to Pipekit. There is no write-ahead log and no retry. If shipment fails, those logs are gone.

Two failure modes follow.

Log loss under API server pressure. The agent reads logs through the Kubernetes API. Any sustained load on the API server can stall its log streams. Without a replay path, the agent does not recover those logs.

Etcd cost at scale. Streaming every container of every workflow pod through the Kubernetes API loads etcd. For workflows with many parallel pods, that pressure sets a practical ceiling on how big a Pipekit deployment can grow before the API server becomes the bottleneck.

Vector addresses both. It runs as a DaemonSet on every node and reads container logs directly from the node filesystem, never touching the Kubernetes API for log content. Vector buffers undelivered logs to disk (default 500 MB per node) and retries with backoff. Transient ingest failures no longer mean lost data.

The built-in agent path is fine at small scale. Switch to Vector once log loss or API-server load becomes a real concern.

How it works

The Pipekit Agent Helm chart includes Vector as an optional subchart, gated on vector.enabled. When enabled, the chart deploys Vector as a DaemonSet with a ConfigMap that defines the full log pipeline.

Vector's kubernetes_logs source reads container log files directly from each node, filtered to pods carrying the workflows.argoproj.io/workflow label so the source picks up only Argo Workflow pods. A VRL transform extracts the Pipekit identifiers (orgUUID, runUUID, pipeUUID) from pod labels, the Argo workflow node ID from the workflows.argoproj.io/node-id pod annotation, and the pod name, container name, and log timestamp from Kubernetes metadata. A filter drops any record missing a required field. A final transform reshapes the remaining records to the field set the Pipekit messenger expects.

The HTTP sink batches up to 1024 events per request with a 1-second flush. It POSTs each batch to {messengerBaseUri}/api/messenger/v1/logs/ingest/batch, authenticated with the X-PIPEKIT-API-KEY header. A 500 MB on-disk buffer sits in front of the sink and holds events until they are delivered. Vector retries failed deliveries with backoff, and when the buffer fills, upstream components block rather than drop events.
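To make the source and sink settings described above concrete, the following sketch writes an equivalent Vector config fragment to a scratch file for reading. It is not the chart's actual ConfigMap. The component names (argo_workflow_pods, pipekit_http), the PIPEKIT_API_KEY variable name, the messenger host, and the exact byte count used for 500 MB are assumptions, and the transforms and filter are left out. The option names are those of Vector's kubernetes_logs source and http sink.

```shell
# Illustrative only: the chart renders the real pipeline into a ConfigMap.
cat > vector-pipeline-sketch.yaml <<'EOF'
sources:
  argo_workflow_pods:              # hypothetical component name
    type: kubernetes_logs
    # A bare label key selects pods on which that label exists.
    extra_label_selector: "workflows.argoproj.io/workflow"

sinks:
  pipekit_http:                    # hypothetical component name
    type: http
    # The real pipeline places the transforms and the filter between source and sink.
    inputs: ["argo_workflow_pods"]
    # Placeholder host standing in for {messengerBaseUri}.
    uri: "https://messenger.example.com/api/messenger/v1/logs/ingest/batch"
    method: post
    encoding:
      codec: json
    request:
      headers:
        X-PIPEKIT-API-KEY: "${PIPEKIT_API_KEY}"   # hypothetical variable name
    batch:
      max_events: 1024             # up to 1024 events per request
      timeout_secs: 1              # 1-second flush
    buffer:
      type: disk
      max_size: 524288000          # roughly 500 MB; exact value is an assumption
      when_full: block             # apply backpressure instead of dropping events
EOF
```

The when_full: block setting is what turns a full buffer into backpressure on the source instead of data loss.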

Enabling Vector

Set vector.enabled=true and disable the agent's built-in log path with configMap.sendLogsToPipekit=false:

helm upgrade -i -n argo \
  pipekit-agent pipekit/pipekit-agent \
  --set secrets.pipekitSecretAccessKey="[provided Secret Access Key]" \
  --set secrets.pipekitClusterId="[provided Cluster ID]" \
  --set vector.enabled=true \
  --set configMap.sendLogsToPipekit=false

When vector.enabled=true, the chart ignores configMap.sendLogsToPipekit and disables the agent's built-in log path. Set the flag to false anyway for hygiene, so your values match your intent.
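If you keep settings in a values file rather than --set flags, the same two settings might look like the sketch below. The file name is hypothetical, and the key paths simply mirror the flags in the command above.

```shell
# Values-file equivalent of the two --set flags above.
cat > vector-enable-values.yaml <<'EOF'
vector:
  enabled: true
configMap:
  sendLogsToPipekit: false
EOF
```

Pass the file with -f vector-enable-values.yaml in place of the last two --set flags.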

Self-hosted Pipekit

If you run a self-hosted Pipekit instance and have set configMap.messengerBaseUri to your own messenger URL, the chart automatically threads that value into Vector's HTTP sink. You do not need to configure the URL on the Vector side.
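As a sketch of what that means for your values, a self-hosted setup only sets the URL on the agent side. The file name and the messenger URL below are hypothetical.

```shell
cat > self-hosted-values.yaml <<'EOF'
configMap:
  # Hypothetical URL; use your own messenger endpoint.
  messengerBaseUri: "https://messenger.pipekit.internal.example.com"
  sendLogsToPipekit: false
vector:
  enabled: true
  # No messenger URL is set under vector: the chart derives it
  # from configMap.messengerBaseUri.
EOF
```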

Image pull secrets

If you mirror container images into a private registry, set vector.image.pullSecrets to a list of secret references, the same way you would for the agent itself.
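A minimal sketch of such an override, assuming the subchart accepts the standard Kubernetes imagePullSecrets shape (a list of name entries), as the upstream Vector Helm chart does. The file name and the Secret name are hypothetical.

```shell
cat > vector-pull-secrets-values.yaml <<'EOF'
vector:
  image:
    pullSecrets:
      - name: private-registry-credentials   # hypothetical Secret name
EOF
```

The referenced Secret must already exist in the namespace where Vector runs.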

Authenticating Vector via an existing Secret

When you configure the agent with secrets.existingSecret, Vector does not automatically inherit those credentials. Vector is deployed by a subchart, so you must tell it explicitly to read from the secret using vector.envFrom.
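A minimal sketch, assuming vector.envFrom takes the standard Kubernetes envFrom list and that the keys in your Secret match the environment variable names the chart's Vector config reads. The file name and the Secret name are hypothetical.

```shell
cat > vector-envfrom-values.yaml <<'EOF'
secrets:
  existingSecret: pipekit-agent-credentials   # hypothetical Secret name
vector:
  enabled: true
  envFrom:
    - secretRef:
        name: pipekit-agent-credentials       # same Secret as above
EOF
```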

If you pass secrets.pipekitSecretAccessKey inline rather than using an existing secret, you do not need envFrom.

Resources and tuning

Default Vector resources are conservative and are enough for moderate log volume. For clusters running many large parallel workflows, increase CPU and memory through vector.resources. See Vector's sizing guidance for tuning under heavy load.
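An override might look like the sketch below. The numbers are illustrative placeholders, not the chart's defaults and not a sizing recommendation, and the file name is hypothetical. The sketch assumes vector.resources takes a standard Kubernetes resources block.

```shell
cat > vector-resources-values.yaml <<'EOF'
vector:
  resources:
    requests:
      cpu: 500m        # illustrative value
      memory: 512Mi    # illustrative value
    limits:
      cpu: "1"         # illustrative value
      memory: 1Gi      # illustrative value
EOF
```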

Verifying it's working

  1. Confirm the DaemonSet is Ready on every node.

  2. Port-forward to one Vector pod and check its local API on 127.0.0.1:8686.

  3. Run a workflow and confirm its logs appear in the Pipekit UI.
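The first two steps can be sketched with kubectl as follows. The argo namespace matches the install command earlier on this page. The app.kubernetes.io/name=vector label and the /health path are assumptions based on the upstream Vector Helm chart and Vector's API, so adjust them if your deployment differs.

```shell
NAMESPACE=argo
SELECTOR=app.kubernetes.io/name=vector   # assumed pod and DaemonSet label

# Step 1: the READY count should equal the DESIRED count.
kubectl -n "$NAMESPACE" get daemonset -l "$SELECTOR"

# Step 2: pick one Vector pod and forward its API port to this machine.
POD="$(kubectl -n "$NAMESPACE" get pods -l "$SELECTOR" \
  -o jsonpath='{.items[0].metadata.name}')"
kubectl -n "$NAMESPACE" port-forward "pod/${POD}" 8686:8686 &
PF_PID=$!
sleep 2   # give the port-forward a moment to come up

# Query the API's health endpoint (assumed to be enabled), then stop the port-forward.
curl -s http://127.0.0.1:8686/health
kill "$PF_PID"
```

If the selector returns no resources, list the DaemonSets in the namespace with kubectl -n argo get daemonset and use the labels you find there.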

Reference

For the full set of Vector-related Helm values, see the values table in Pipekit Agent Helm Chart. The relevant rows are vector.enabled, vector.dataDir, vector.envFrom, vector.existingConfigMaps, vector.image.pullSecrets, vector.resources, vector.role, and vector.service.enabled.
