Setting Up a Pull Configuration for Your Customizable Data Mapping

The extraction definition includes several features that give users finer control over how their data enters and persists within OpsLevel. This example clarifies the behaviour of the following extraction definition keys:

http_polling
iterator
exclude
expires_after_days

In this example, we will be configuring an extraction definition to poll the Jira API for Jira issues. This definition will also assist us with managing which issues persist within OpsLevel.

Prerequisites

This guide takes a more detailed look at the extraction definition, so it assumes the reader has already read its parent document, Mapping Integration Data to Custom Properties. The following items should already be configured if you wish to follow along:

A simple transform mapping should be in place for this example to run on its own. The following is what will be used in this example. This transform describes a Jira issue, identified within OpsLevel by its key. This component will contain a property "status" that contains the status of any imported issue data.

---
transforms:
- opslevel_identifier: ".id"
  external_kind: jira_issue
  opslevel_kind: jira_issue
  on_component_not_found: create
  properties:
    status: '.fields.status.name'

A pre-existing component type that matches the opslevel_kind key. For this example, a Jira Issue component type has been configured to match the transform above.

An existing endpoint for OpsLevel to poll. This will take the form of the fictional company Saurons Kitchen's Jira API.

The Extraction Definition

The following extraction definition will be used to extract Jira issues into OpsLevel.

extractors:
  # Extract Jira issues for tracking technical debt and incidents
  - external_kind: jira_issue
    http_polling:
      url: "https://sauronskitchen.atlassian.net/rest/api/3/search?jql=project=TECH AND labels=technical-debt&startAt={{ cursor | default: '0' }}&maxResults=50"
      method: GET
      headers:
        - name: Authorization
          value: "Bearer {{ 'jira_api_token' | secret }}"
        - name: Accept
          value: "application/json"
        - name: Content-Type
          value: "application/json"
      errors:
        # Jira returns 400 for invalid JQL queries
        - status_code: 400
          matches: '.errorMessages[] | contains("Invalid JQL")'
          handler: no_data
        # Handle rate limiting from Atlassian
        - status_code: 429
          handler: rate_limit
      next_cursor:
        from: payload
        value: "(.startAt + .maxResults) as $next | if $next >= .total then null else ($next | tostring) end"
    iterator: ".issues"
    external_id: ".key"
    exclude: '.fields.status.name == "Done" or .fields.status.name == "Closed"'
    expires_after_days: 30

This extraction definition is far more complex than the base example, as it implements the following parameters. They increase both the complexity of the integration and the control a user has over their data:

The http_polling key tells OpsLevel to poll for integration data instead of waiting for it to be pushed.
The iterator key tells OpsLevel that the objects of the external_kind jira_issue arrive as an array, found under the issues key of every received payload.
The exclude key lets a user define what data will persist within OpsLevel. It acts as a form of deletion: items that are explicitly excluded are deleted from OpsLevel.
The expires_after_days key specifies how long data may persist within OpsLevel without an update.

Let's take a closer look at these parameters & explore the impact they will have on the integration.

`http_polling`

Enabling http_polling allows OpsLevel to go get your data instead of you sending it to us. This is ideal for customers who don't want to set up some sort of cronjob on their side to push data or for an integration with a third party that does not send a webhook when data changes. HTTP polling comes with a number of features, almost all of which have been used in the example extraction definition.

`url`

This is the endpoint that OpsLevel will hit. This required parameter is a string that supports Liquid templating. Alongside OpsLevel Secrets, the url also has access to cursor, a special variable made available through the next_cursor parameter, which is covered below. This allows OpsLevel to make use of your API's pagination solution, if one exists.

`method`

The HTTP method to be used with the url. This required parameter is an enum, and can be either GET or POST.

http_polling:
  url: "https://sauronskitchen.atlassian.net/rest/api/3/search?jql=project=TECH AND labels=technical-debt&startAt={{ cursor | default: '0' }}&maxResults=50"
  method: GET

In the example, OpsLevel will be making a paginated GET request to the Jira issues endpoint.
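
To make the role of cursor in the url concrete, here is a minimal Python sketch of how the page URL changes between requests. It is not OpsLevel's implementation: build_page_url is a hypothetical helper that only imitates the Liquid expression {{ cursor | default: '0' }}. It also percent-encodes the JQL query explicitly, since this guide does not state whether the query string is encoded for you.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical helper, not part of OpsLevel. It imitates what the Liquid
# expression {{ cursor | default: '0' }} does to the example URL.
BASE_URL = "https://sauronskitchen.atlassian.net/rest/api/3/search"


def build_page_url(cursor: Optional[str]) -> str:
    """Return the URL for one page of results."""
    params = {
        "jql": "project=TECH AND labels=technical-debt",
        # Liquid's `default` filter falls back when the value is nil, false, or empty.
        "startAt": cursor if cursor else "0",
        "maxResults": "50",
    }
    # urlencode percent-encodes the JQL so the query string is unambiguous.
    return f"{BASE_URL}?{urlencode(params)}"


first_page = build_page_url(None)   # no cursor yet, so startAt falls back to 0
second_page = build_page_url("50")  # cursor supplied by next_cursor
```

The first request has no cursor, so startAt falls back to 0. Later requests reuse the same template with whatever value next_cursor produced.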

`headers`

The headers to include in each request. This optional parameter is a list of objects whose values support Liquid templating. The example sends the Accept and Content-Type headers alongside an Authorization header. The Authorization header retrieves an API key from OpsLevel Secrets and sends it as a Bearer token to authenticate the requests OpsLevel makes.

http_polling:
  headers:
    - name: Authorization
      value: "Bearer {{ 'jira_api_token' | secret }}"
    - name: Accept
      value: "application/json"
    - name: Content-Type
      value: "application/json"

`errors`

This optional parameter is used to define explicit error handling behaviour that OpsLevel will take when an error is encountered while polling your endpoint. The errors parameter accepts a list of objects containing the following parameters:

The required status_code parameter acts as a filter and catches all error responses with a matching code.
The optional matches parameter also acts as a filter and is only checked if the status_code matches. This parameter accepts a JQ string and offers a more granular approach to error handling.
The required handler parameter is an enum that defines the action OpsLevel will take. Options are either no_data or rate_limit. no_data results in a quiet end to the request, and rate_limit results in a rate limit error being surfaced within OpsLevel.

Requests that result in errors that are not explicitly handled will be surfaced on the integration. In the example, the configuration terminates requests that result in 400 errors with "Invalid JQL" in the error message. 429 errors use the rate_limit handler, which retries the requests with backoff. All other errors are surfaced on the integration.

http_polling:
  errors:
    # Jira returns 400 for invalid JQL queries
    - status_code: 400
      matches: '.errorMessages[] | contains("Invalid JQL")'
      handler: no_data
    # Handle rate limiting from Atlassian
    - status_code: 429
      handler: rate_limit
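
The matching order described above (status_code first, then the optional matches filter) can be sketched in Python. This is an illustration under stated assumptions, not OpsLevel's implementation: pick_handler and invalid_jql are hypothetical names, the JQ matches string is replaced by a plain Python predicate, and taking the first matching rule is an assumption.

```python
from typing import Any, Optional

# Hypothetical model of the `errors` list: each rule has a status code, an
# optional predicate standing in for the JQ `matches` string, and a handler.
Rule = dict[str, Any]


def invalid_jql(body: dict) -> bool:
    # Python stand-in for: .errorMessages[] | contains("Invalid JQL")
    return any("Invalid JQL" in msg for msg in body.get("errorMessages", []))


RULES: list[Rule] = [
    {"status_code": 400, "matches": invalid_jql, "handler": "no_data"},
    {"status_code": 429, "handler": "rate_limit"},
]


def pick_handler(status: int, body: dict, rules: list[Rule] = RULES) -> Optional[str]:
    """Return the handler of the first matching rule, or None if unhandled."""
    for rule in rules:
        if rule["status_code"] != status:
            continue  # the status code is checked before anything else
        matches = rule.get("matches")
        if matches is None or matches(body):
            return rule["handler"]
    return None  # unhandled errors are the ones surfaced on the integration
```

A 400 response whose message does not mention "Invalid JQL" returns None here, which corresponds to an error that is surfaced on the integration rather than quietly ended.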

`next_cursor`

The optional parameter next_cursor is used to indicate that OpsLevel will be polling a paginated endpoint. Use of this parameter enables the url to have access to the cursor variable. This parameter contains the following:

The required from parameter is an enum that accepts the values payload or header. It indicates where the next cursor information can be found.
The required value parameter accepts a JQ string that is evaluated against the object selected by the from parameter to identify the next cursor.

In our example, the next cursor value is calculated from the payload by summing the startAt and maxResults fields. If that sum is less than the total, it is converted to a string and provided as the value of the cursor variable. Otherwise the expression yields null.

http_polling:
  next_cursor:
    from: payload
    value: "(.startAt + .maxResults) as $next | if $next >= .total then null else ($next | tostring) end"
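
The cursor arithmetic described above can be traced with a short Python sketch. The function names are hypothetical, and treating a null cursor as the signal to stop paginating is an assumption rather than documented behaviour.

```python
from typing import Optional


def next_cursor(payload: dict) -> Optional[str]:
    """Offset of the next page as a string, or None when no page remains."""
    next_start = payload["startAt"] + payload["maxResults"]
    if next_start >= payload["total"]:
        return None
    return str(next_start)


def walk_pages(total: int, page_size: int = 50) -> list[str]:
    """Simulate the cursors produced for a result set of `total` issues."""
    cursors = []
    cursor: Optional[str] = "0"  # mirrors the `default: '0'` fallback in the url
    while cursor is not None:
        cursors.append(cursor)
        payload = {"startAt": int(cursor), "maxResults": page_size, "total": total}
        cursor = next_cursor(payload)
    return cursors
```

For a project with 120 matching issues, walk_pages(120) visits the offsets "0", "50" and "100": three requests of up to 50 issues each.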

As a result of only this configuration, OpsLevel will now poll the provided endpoint daily with behaviours as defined above.

`iterator`

Enabling the optional iterator parameter allows OpsLevel to extract multiple data objects from a single chunk of data. The iterator parameter accepts a JQ string and tells OpsLevel that the received data has to be decomposed into the objects we wish to represent within OpsLevel. When evaluated, this JQ string must result in an array. In our example, we define the iterator for the external_kind jira_issue as follows:

- external_kind: jira_issue
  iterator: ".issues"

This indicates that our data can be found by iterating over the issues key in our response data. A sample of response data that we would expect to parse with the above iterator would be this:

{
  "isLast": true,
  "issues": [
    {
      "key": "c53bd80c-a56d-46c8-8835-26dc9db6be5d",
      "id": "APB-1001",
      "fields": {
        "status": {
          "name": "Open"
        }
      }
    },
    {
      "key": "d60a2bcf-7f9b-464b-a90e-89fc9f582e3a",
      "id": "APB-1002",
      "fields": {
        "status": {
          "name": "In Progress"
        }
      }
    }
  ]
}

The result would be two Jira issues with unique IDs, both originating from the issues array in the payload.
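
As a rough illustration of that decomposition, the following Python sketch applies the equivalent of iterator: ".issues" and external_id: ".key" to the sample payload. The extract function and the shape of the returned records are assumptions made for illustration only.

```python
sample_payload = {
    "isLast": True,
    "issues": [
        {
            "key": "c53bd80c-a56d-46c8-8835-26dc9db6be5d",
            "id": "APB-1001",
            "fields": {"status": {"name": "Open"}},
        },
        {
            "key": "d60a2bcf-7f9b-464b-a90e-89fc9f582e3a",
            "id": "APB-1002",
            "fields": {"status": {"name": "In Progress"}},
        },
    ],
}


def extract(payload: dict) -> list[dict]:
    """Split one payload into one record per element of the `issues` array."""
    records = []
    for issue in payload["issues"]:  # iterator: ".issues"
        records.append(
            {
                "external_kind": "jira_issue",
                "external_id": issue["key"],  # external_id: ".key"
                "data": issue,
            }
        )
    return records


records = extract(sample_payload)
```

Each element of the array becomes its own record, so one response yields as many objects as there are issues on the page.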

`exclude`

The exclude optional parameter accepts a JQ string and acts as a control for data persistence. Entities that are caught in the filter defined by the expression are deleted from OpsLevel automatically. A simple use case can be seen in our sample exclusion parameter.

- external_kind: jira_issue
  exclude: '.fields.status.name == "Done" or .fields.status.name == "Closed"'

This exclusion rule catches any issue with a "Closed" or "Done" status. Any incoming Jira issue that matches the filter defined by exclude is omitted. In addition, any Jira issue already in OpsLevel whose external_id matches that of an excluded incoming issue (indicating that they are the same issue) is deleted.
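
The two effects of exclude (skipping new matches and deleting existing ones) can be sketched in Python. This is a simplified model, not OpsLevel's implementation: the store is a plain dictionary keyed by external_id, and is_excluded and sync are hypothetical names.

```python
def is_excluded(issue: dict) -> bool:
    # Python stand-in for the JQ expression used in `exclude`
    return issue["fields"]["status"]["name"] in ("Done", "Closed")


def sync(store: dict, incoming: list[dict]) -> dict:
    """Apply one batch of extracted issues to a copy of `store`."""
    result = dict(store)
    for issue in incoming:
        external_id = issue["key"]
        if is_excluded(issue):
            # Never stored, and any previously stored copy is removed.
            result.pop(external_id, None)
        else:
            result[external_id] = issue
    return result


existing = {"TECH-1": {"key": "TECH-1", "fields": {"status": {"name": "Open"}}}}
batch = [
    {"key": "TECH-1", "fields": {"status": {"name": "Done"}}},
    {"key": "TECH-2", "fields": {"status": {"name": "Open"}}},
]
after_sync = sync(existing, batch)
```

After the sync, TECH-1 is gone because its latest status is "Done", while TECH-2 has been added.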

`expires_after_days`

The optional expires_after_days parameter accepts an integer and acts as a catch-all for stale data. Users may use this parameter to clean up their catalog data: every object of the given external_kind that has gone more than expires_after_days days without being synced will be destroyed.

- external_kind: jira_issue
  expires_after_days: 30

In this example, all jira_issue objects that have not been synced within the last 30 days will be deleted.
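
The expiry check amounts to comparing each object's last sync time against a cutoff. The Python sketch below assumes a hypothetical mapping from external_id to last sync timestamp. How OpsLevel treats an object that is exactly at the cutoff is not stated here, so the strict comparison is an assumption.

```python
from datetime import datetime, timedelta, timezone


def expired_ids(
    last_synced: dict[str, datetime], expires_after_days: int, now: datetime
) -> list[str]:
    """Return the ids whose last sync is older than the expiry window."""
    cutoff = now - timedelta(days=expires_after_days)
    return sorted(eid for eid, synced_at in last_synced.items() if synced_at < cutoff)


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
last_synced = {
    "APB-1001": datetime(2024, 2, 15, tzinfo=timezone.utc),  # 45 days ago
    "APB-1002": datetime(2024, 3, 20, tzinfo=timezone.utc),  # 11 days ago
}
stale = expired_ids(last_synced, expires_after_days=30, now=now)
```

Only APB-1001 falls outside the 30-day window, so it is the only id reported as stale.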

Chained Data Collection

HTTP polling has a for_each parameter that allows you to fetch related data that depends on records you've already collected. Common scenarios include getting issues for each project, extra details for each issue, or any situation where you need data from previous records as parameters for new API calls.

How It Works

When you specify for_each, the extractor will:

Wait for the referenced extractor to complete
Iterate through each record collected by that extractor
Execute a separate HTTP request for each record, with access to that record's data via a Liquid variable
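
The steps above can be sketched in Python as a simple fan-out. The fan_out helper is hypothetical and only substitutes a single {{ kind.id }} placeholder, whereas real Liquid templating is far more capable. The project records are made up, and the URL template is borrowed from the Sentry example in this guide.

```python
def fan_out(url_template: str, kind: str, records: list[dict]) -> list[str]:
    """Build one request URL per record collected by the referenced extractor."""
    placeholder = "{{ " + kind + ".id }}"
    urls = []
    for record in records:
        # Minimal stand-in for Liquid: only `{{ <kind>.id }}` is substituted.
        urls.append(url_template.replace(placeholder, str(record["id"])))
    return urls


# Records assumed to have been collected already by the first extractor.
projects = [{"id": "101", "name": "api"}, {"id": "102", "name": "web"}]
template = "https://sentry.io/api/0/projects/your_org/{{ sentry_project.id }}/issues/"
issue_urls = fan_out(template, "sentry_project", projects)
```

Two collected projects produce two issue requests, one per project id.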

Example

---
extractors:
# First extractor: Get all production Sentry projects
- external_kind: sentry_project
  iterator: "."
  external_id: ".id"
  exclude: .name | test("staging")
  http_polling:
    method: GET
    url: "{% if cursor %}{{cursor}}{% else %}https://sentry.io/api/0/projects/{% endif %}"
    headers:
    - name: Accept
      value: application/json
    - name: Authorization
      value: Bearer {{ 'sentry_api_token' | secret }}
    next_cursor:
      from: header
      value: if .link.next.attributes.results == "true" then .link.next.target_url else null end

# Second extractor: Get issues for each project
- external_kind: sentry_issue
  iterator: "."
  external_id: ".id"
  http_polling:
    method: GET
    for_each: sentry_project  # ← This creates the fanout
    url: "{% if cursor %}{{cursor}}{% else %}https://sentry.io/api/0/projects/your_org/{{ sentry_project.id }}/issues/{% endif %}"
    headers:
    - name: Accept
      value: application/json
    - name: Authorization
      value: Bearer {{ 'sentry_api_token' | secret }}
    next_cursor:
      from: header
      value: if .link.next.attributes.results == "true" then .link.next.target_url else null end

In this example:

The first extractor fetches all production Sentry projects by paginating through the projects endpoint and filtering out any project whose name contains "staging".
The second extractor uses for_each: sentry_project to iterate through each of those projects.
For each project, it makes a request to get that project's issues. The project data is available as sentry_project and we use sentry_project.id in the URL to fetch issues for that project.
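
Both extractors read the next page from a response header rather than the payload. The Python sketch below parses a Sentry-style Link header and applies the same condition as the next_cursor expression. The nested shape (next, attributes, target_url) is inferred from that JQ expression and is an assumption about how the header is exposed, and parse_link_header and next_page_url are hypothetical names.

```python
import re
from typing import Optional


def parse_link_header(header: str) -> dict:
    """Parse a Link header into {rel: {"target_url": ..., "attributes": {...}}}."""
    links = {}
    # Each entry looks like: <url>; rel="next"; results="true"; cursor="0:100:0"
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)', header):
        target_url, raw_attrs = match.groups()
        attributes = dict(re.findall(r'([\w-]+)="([^"]*)"', raw_attrs))
        rel = attributes.get("rel")
        if rel:
            links[rel] = {"target_url": target_url, "attributes": attributes}
    return links


def next_page_url(links: dict) -> Optional[str]:
    # Python stand-in for:
    # if .link.next.attributes.results == "true" then .link.next.target_url else null end
    nxt = links.get("next")
    if nxt and nxt["attributes"].get("results") == "true":
        return nxt["target_url"]
    return None
```

Because the cursor is itself a full URL, the url template in the example uses it verbatim when it is present and falls back to the base endpoint on the first request.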

Multi-Level Chaining

You can build multi-level chains by pointing one for_each extractor at another. For example, you could add a third extractor that uses for_each: sentry_issue to fetch events for each issue, creating a three-level hierarchy: projects → issues → events.

---
extractors:
# ... 2 extractors defined above
- external_kind: sentry_events
  external_id: ".[0].eventID"
  http_polling:
    method: GET
    for_each: sentry_issue
    url: "https://sentry.io/api/0/organizations/your_org/issues/{{ sentry_issue.id }}/events/"
    headers:
    - name: Accept
      value: application/json
    - name: Authorization
      value: Bearer {{ 'sentry_api_token' | secret }}