Using EtcdOpsTask ​
EtcdOpsTask is a custom resource provided by the etcd-druid to facilitate various out-of-band tasks on Etcd clusters managed by the etcd-druid operator. It was initially proposed in this design document.
Overview ​
EtcdOpsTask allows operators to execute one-time operational tasks on an Etcd cluster. This includes operations like triggering on-demand snapshots (full or delta). The controller manages the task lifecycle, executing the operation and updating the task status to reflect success or failure.
How Operators Can Use EtcdOpsTask ​
NOTE
As of v0.34.0, EtcdOpsTask primarily supports on-demand snapshot operations. Future versions may introduce additional task types.
Creating an EtcdOpsTask ​
To trigger an operational task, create an EtcdOpsTask resource in the same namespace as the target Etcd cluster. The spec.config field specifies the type of operation to perform. This field is of union type, allowing one and only one operation to be defined per task:
apiVersion: druid.gardener.cloud/v1alpha1
kind: EtcdOpsTask
metadata:
name: on-demand-snapshot
namespace: default
spec:
etcdName: etcd-main
config:
onDemandSnapshot:
type: full # or delta
#isFinal: false
timeoutSecondsFull: 900
#timeoutSecondsDelta: 60
ttlSecondsAfterFinished: 360- The above example creates an
EtcdOpsTasknamedon-demand-snapshotthat triggers an on-demand snapshot for the Etcd cluster referenced byspec.etcdNamein themetadata.namespacenamespace where theEtcdOpstaskis created. Some tasks might not require an etcd reference and hence the fieldspec.etcdNameis optional. - The task will be automatically deleted after a Time-To-Live duration (TTL) completion as defined by
spec.ttlSecondsAfterFinished. If not specified, the default TTL is 3600 seconds (1 hour). - The spec field is immutable and is enforced via
CELexpressions.
Task Lifecycle ​
Once created, the EtcdOpsTask progresses through the following states:
- Pending: Task is accepted but not yet acted upon
- InProgress: Task is currently being executed
- Succeeded: Task completed successfully
- Failed: Task execution failed.
- Rejected: Task was rejected as it failed to fulfill required pre-conditions. This set of pre-conditions varies from task to task.
Once a task reaches a terminal state (Succeeded, Failed, or Rejected), it will be eligible for automatic cleanup based on the specified TTL(spec.ttlSecondsAfterFinished).
Task Phases ​
The Etcdopstask controller follows a handler-driven flow with three phases:
- Admit: Validates preconditions (RBAC, duplicates, config). Task moves
Pending → InProgresson success, orRejectedon failure. - Execute: Performs the requested operation (e.g., on-demand snapshot). Task transitions to
SucceededorFailed. - Cleanup: Handles cleanup of the task and any deployed resources.
EtcdopsTaskresource is deleted after TTL expiry.
Monitoring Task Status ​
Check the task status to monitor progress:
kubectl get etcdopstask <etcdopstask name> -o yamlThe status section provides:
- State: Current task state
- LastOperation: Last reconciliation operation performed
- Type: Type of operation i.e whether it is in the Admit phase or Execute phase or Cleanup phase
- State: State of the last operation type. A combination of Type and State can help identify the status of the operation. Eg:
- Type: Execute, State: Failed indicates that the task execution has failed
- Type: Admit, State: InProgress indicates that the task is running the pre-conditions.
- RunID: Unique identifier for the operation run
- LastUpdateTime: Timestamp of the last update to the LastOperation field
- LastError: Error details if any while running the LastOperation.
- Code: Error code
- Description: Detailed error message
- ObservedAt: Timestamp when the error was observed
- LastTransitionTime: Timestamp of the last state transition
- StartedAt: Timestamp when the task execution started
Example status:
status:
state: InProgress
lastTransitionTime: "2025-12-03T23:31:38Z"
startedAt: "2025-12-03T23:31:32Z"
lastOperation:
description: Task Execution phase is in Progress
lastUpdateTime: "2025-12-03T23:31:50Z"
runID: e094cd10-e756-4132-b52a-50c164d1df2a
state: InProgress
type: Execute
lastErrors:
- code: ERR_EXECUTE_HTTP_REQUEST
description: '[Operation: Execution, Code: ERR_EXECUTE_HTTP_REQUEST] message:
failed to execute HTTP request, cause: Post "http://etcd-test-client.default:8080/snapshot/full":
dial tcp 10.96.99.172:8080: connect: connection refused'
observedAt: "2025-12-03T23:31:53Z"Task Cleanup ​
EtcdOpsTask resources along with any dependent resources (Eg: Jobs, Pods etc.) are automatically cleaned up after a configurable TTL once they reach a terminal state (Succeeded, Failed, or Rejected).
Supported Task Types ​
Currently supported task types:
OnDemandSnapshot ​
Triggers an on-demand snapshot outside the regular snapshot schedule.
Snapshot Types:
- Full: Creates a complete snapshot of the entire Etcd data store.
- Delta: Creates an incremental snapshot containing only the changes since the last snapshot.
Prerequisites:
- Backup must be enabled for the target Etcd cluster (
spec.backup.storemust be configured in the Etcd resource) - The Etcd cluster must be in a ready state (all members healthy and running)
- No other
EtcdOpsTaskshould be in progress for the same Etcd cluster.
Configuration Options:
type: Specifies the snapshot type (fullordelta)isFinal: Optional boolean flag to mark the snapshot as final (default:false). This field is applicable only for full snapshots.timeoutSecondsFull: Timeout in seconds for full snapshot operations (default: 900)timeoutSecondsDelta: Timeout in seconds for delta snapshot operations (default: 60)
Best Practices ​
- Unique Names: Use descriptive, unique names for tasks to avoid conflicts
- Monitor Status: Check task status before creating duplicate operations. This will help avoid duplicate tasks.