Introduction

When working with event-driven systems like Argo Events, troubleshooting can become quite painful. I’m very familiar with Argo Events and use it to achieve a lot of things; in several of my posts, Argo Events is mentioned as part of the stack.

But when things do not work as you expect, you need to know what is happening, and whether all components are getting the information they need.

Debugging the event flow

Checking the resources

  1. Make sure you have an EventBus deployed: kubectl get eventbus
  2. Make sure you have an EventSource deployed: kubectl get eventsource
  3. Make sure you have a Sensor deployed: kubectl get sensor
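Since kubectl accepts comma-separated resource types, you can also run all three checks in one go:

```shell
# List all three Argo Events custom resources in a single call.
kubectl get eventbus,eventsource,sensor
```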

If all resources are deployed, that does not mean they’re running; the deployments / pods might not have been created by the controller.

You can combine the above 3 steps into one single command: kubectl get pods

NAME                                    READY   STATUS    RESTARTS        AGE
my-eventsource-kgnvq-5b54758d96-v68nf   1/1     Running   0               4h42m
my-sensor-fnhql-597b8cc5ff-kk6xb        1/1     Running   2 (4h42m ago)   4h42m
eventbus-default-stan-0                 2/2     Running   0               4h42m
eventbus-default-stan-1                 2/2     Running   0               4h42m
eventbus-default-stan-2                 2/2     Running   0               4h42m
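Also pay attention to the RESTARTS column: the Sensor pod above has restarted twice, meaning it crashed at some point. A small filter surfaces such pods; the awk field index assumes the default `kubectl get pods` column layout:

```shell
# Print the names of pods that have restarted at least once
# ($4 is the RESTARTS column in the default output layout).
kubectl get pods --no-headers | awk '$4 > 0 { print $1 }'

# For a restarted pod, the logs of the previous (crashed) container
# instance often reveal the cause:
kubectl logs my-sensor-fnhql-597b8cc5ff-kk6xb --previous
```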

This should show you an EventBus, an EventSource and a Sensor. If it doesn’t, something is wrong with the deployment. As this could be many things, here are a few pointers:

  • Check the affected resource, e.g.:

    • kubectl describe eventbus default
    • kubectl describe eventsource my-eventsource
    • kubectl describe sensor my-sensor
  • Check the logs of the controller responsible for the affected resource, e.g.:

    • kubectl logs deployment/eventbus-controller -n argo-events
    • kubectl logs deployment/eventsource-controller -n argo-events
    • kubectl logs deployment/sensor-controller -n argo-events
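The controllers emit JSON log lines, so a quick grep on the level field surfaces problems; a sketch, assuming the default controller deployment names shown above:

```shell
# Surface only error-level messages from the controller logs
# (each line is a JSON object with a "level" field).
kubectl logs deployment/eventsource-controller -n argo-events | grep '"level":"error"'
```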

Read the logs carefully; there is potentially a message hidden in there.

Checking the flow

If all resources are there, then something could be wrong with the flow of information.

  1. Check the logs of the EventSource: kubectl logs my-eventsource-kgnvq-5b54758d96-v68nf
    Upon posting an event, this should show you something similar to:

    2022-12-10T04:07:37.942338269Z {"level":"info","ts":1670645257.9422746,"logger":"argo-events.eventsource","caller":"eventsources/eventing.go:512","msg":"succeeded to publish an event","eventSourceName":"my-eventsource","eventName":"webhook","eventSourceType":"webhook","eventID":"36633463313738302d653536622d346263362d383838312d613631346231313133643333"}
    

    If that is not the case, the EventSource is probably configured incorrectly, cannot authenticate (SQS, Slack etc.), or the event is possibly being sent to the wrong EventSource type.
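For a webhook EventSource you can trigger the flow yourself and watch the logs. This is a sketch: the service name, port 12000 and the /example endpoint are assumptions that depend on your EventSource spec (look them up with kubectl get svc).

```shell
# Forward the EventSource's service to localhost. The service name and
# port are assumptions; check `kubectl get svc` for the real ones.
kubectl port-forward svc/my-eventsource-eventsource-svc 12000:12000 &

# Post a test event. On success, the EventSource pod should log
# "succeeded to publish an event" shortly afterwards.
curl -X POST -H "Content-Type: application/json" \
  -d '{"message":"debug"}' http://localhost:12000/example
```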

  2. Check the logs of the Sensor: kubectl logs my-sensor-fnhql-597b8cc5ff-kk6xb
    This should show you something similar to:

    2022-12-10T04:07:37.3631563Z {"level":"info","ts":1670645257.3631563,"logger":"argo-events.sensor","caller":"sensors/listener.go:416","msg":"successfully processed the trigger","sensorName":"my-sensor","triggerName":"workflow-trigger","triggerType":"Kubernetes","triggeredBy":["webhook"],"triggeredByEvents":["36633463313738302d653536622d346263362d383838312d613631346231313133643333"]}
    

    If that is not the case, the Sensor is probably configured incorrectly. Make sure your dependency is pointing to the correct EventSource:

    apiVersion: argoproj.io/v1alpha1
    kind: Sensor
    metadata:
      name: my-sensor
      namespace: default
    spec:
      dependencies:
      - name: my-dependency
        eventSourceName: the-event-source-name
        eventName: the-name-of-the-event-source-type
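For reference, the two dependency fields map onto the EventSource like this. The snippet below is a hypothetical webhook EventSource using the placeholder names from the Sensor above; the webhook type, port and endpoint are assumptions for illustration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: the-event-source-name              # <- matched by eventSourceName
  namespace: default
spec:
  webhook:
    the-name-of-the-event-source-type:     # <- matched by eventName
      port: "12000"                        # hypothetical values
      endpoint: /example
      method: POST
```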
    

    Note: when using the resource trigger to deploy a Workflow, make sure all namespaces are set correctly, especially if you’re using it in conjunction with Kustomize. Kustomize will overwrite the namespaces of all base Kubernetes resources, but not those that are part of any custom resource (check with kustomize build . | grep "namespace:").
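As a sanity check across both logs: the eventID published by the EventSource shows up again in the Sensor’s triggeredByEvents, so you can correlate the two log streams. The ID appears to be hex-encoded ASCII, so it can be decoded into a readable UUID:

```shell
# The eventID from the log lines above, hex-decoded (requires xxd):
echo '36633463313738302d653536622d346263362d383838312d613631346231313133643333' \
  | xxd -r -p
# → 6c4c1780-e56b-4bc6-8881-a614b1113d33
```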

For me, the above steps were enough 9 out of 10 times to figure out what went wrong when triggering the Sensor.