Datadog integration

This integration ingests Datadog metrics and events into Moogsoft. Moogsoft then applies its algorithms to detect anomalies, eliminate duplicates, and correlate your data into a single list of actionable incidents.

This integration was validated in Datadog on November 18, 2021. See the Datadog documentation for more details about Datadog components. See also Moogsoft Integration Status below.

About Datadog Rate Limiting

All Datadog API endpoints are rate limited. During metric polling, Datadog returns an HTTP 429 response status if the number of requests exceeds the allowed limit within a given time period.

Datadog rate limiting is generally defined as follows:

X-RateLimit-Limit / X-RateLimit-Period

where:

  • X-RateLimit-Limit is the number of requests allowed in a time period.

  • X-RateLimit-Period is the length of time, in seconds, between rate limit resets.

Moogsoft bases its handling of polling on the rate sent by Datadog.
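
As a rough illustration of the ratio above, the following sketch derives a request rate from the two headers. The helper name allowed_rate_per_second and the example header values are assumptions made for illustration; they are not part of Moogsoft or of the Datadog API, and your own limits may differ.

```python
def allowed_rate_per_second(headers):
    """Return the request rate implied by X-RateLimit-Limit / X-RateLimit-Period."""
    limit = int(headers["X-RateLimit-Limit"])    # requests allowed per period
    period = int(headers["X-RateLimit-Period"])  # period length in seconds
    if period <= 0:
        raise ValueError("X-RateLimit-Period must be positive")
    return limit / period


# Example values only (assumed): 1600 requests per 3600-second period.
example_headers = {"X-RateLimit-Limit": "1600", "X-RateLimit-Period": "3600"}
rate = allowed_rate_per_second(example_headers)
print(f"{rate * 60:.1f} requests per minute")  # prints "26.7 requests per minute"
```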

Datadog Polling rates

The following Datadog polling rates are defaults:

  • Metric polling. The Datadog metric polling rate is 1 minute (rate limited).

  • Event polling. The Datadog event polling rate is 30 seconds (not rate limited).

Moogsoft reacts dynamically to the X-RateLimit-Limit value. If you raise or lower your Datadog rate limit, Moogsoft detects the change automatically and increases or decreases its number of queries accordingly.

Best practices

  • It is good practice to ingest CloudWatch data directly using the AWS CloudWatch Integration rather than via Datadog. Ingesting CloudWatch data collected via Datadog can introduce latencies in the CloudWatch data.

  • To avoid rate limiting, and to ensure that Datadog pushes data of interest only, it is a good practice to include event and metric filters as described in the following sections.

Obtain your API key and Application key in Datadog

Log in to Datadog and do the following:

  1. Go to Your Name (lower left) > Organization Settings. You can also search in Go to for Organization Settings.

  2. Find and click on API Keys and click on New Key (upper right).

  3. In the New API Key field, enter a name and click Create API Key.

  4. Copy the API key value to a text file.

  5. Next, find and click Application Keys and then click New Key (upper right).

  6. Enter a name for the key and click Create Key.

  7. Copy the application key to a text file.

Set up the Integration in Moogsoft

  1. Go to Data Config > Ingestion Services > Datadog and create a new integration.

  2. Enter the Name and the Credentials for the integration. If you do not have a profile defined, click Add New Credentials and enter your Application Key and API Key from Datadog.

  3. Under Collect Datadog Events, set up the filters to specify the events that you want Datadog to push. Datadog pushes only events that match all of the specified filters.

    1. Datadog Priority — An event must have a Low or Normal priority. If no filter is defined, Datadog does not filter based on priority.

    2. Datadog Hosts — An event must come from a source in this list. The list is preconfigured based on your Datadog instance. Delete any Datadog monitoring sources that you do not want to ingest.

    3. Datadog Tags — An event must contain all specified tags. To add a tag, enter the tag string and press Enter. You can also select host tags from the pull-down menu. It is good practice to review your event tags in Datadog and verify that this list includes all relevant tags and excludes all irrelevant tags.  If no filter is defined, Datadog considers all tags.

  4. Under Collect Datadog Metrics, set up the filters to specify the metrics that you want Datadog to push. Datadog pushes only metrics that match all of the specified filters.

    • Datadog Metric Name —A metric must have a name in this list. If no filter is defined, Datadog considers all metric names.

    • Datadog Host —A metric must come from a source in this list. The list is preconfigured based on your Datadog instance. Delete any Datadog monitoring sources that you do not want to ingest.

    • Datadog Tag Filter — A metric must contain all specified tags. To add a tag, enter the tag string and press Enter. You can also select host tags from the pull-down menu. It is good practice to review your metric tags in Datadog and verify that this list includes all relevant tags and excludes all irrelevant tags. If no filter is defined, Datadog considers all tags.
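
The "match all specified filters" behavior described in steps 3 and 4 can be pictured with a small sketch. This only models the semantics stated above (an undefined filter is not applied; every defined filter must pass). The function name and the event field names priority, host, and tags are hypothetical and do not reflect the actual Moogsoft or Datadog implementation.

```python
def event_matches(event, priorities=None, hosts=None, tags=None):
    """Return True if the event passes every filter that is defined.

    A filter left as None (or empty) is treated as "not defined" and skipped.
    """
    if priorities and event.get("priority") not in priorities:
        return False
    if hosts and event.get("host") not in hosts:
        return False
    # The event must carry ALL of the specified tags.
    if tags and not set(tags).issubset(event.get("tags", [])):
        return False
    return True


# Hypothetical event used only to exercise the sketch.
event = {"priority": "normal", "host": "web-01", "tags": ["env:prod", "team:ops"]}
print(event_matches(event, priorities={"normal"}, tags=["env:prod"]))  # True
print(event_matches(event, tags=["env:prod", "region:eu"]))            # False
```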

Moogsoft Integration Status

The UI shows the integration status after setup. The status can be one of the following:

  • Starting -- The integration is scheduled to begin polling for data from Datadog.

  • Running -- The integration is polling Datadog successfully. Ingestion is underway and Moogsoft is pulling down all available metrics.

  • Limited -- Moogsoft is running but unable to pull down all metrics due to the rate limit (only the first 720 metrics are ingested). Datadog has also stopped pushing data due to hourly polling limits. If you see this, consider refining your event or metric filters to reduce the amount of data getting pushed.

  • Error -- Moogsoft has been rate limited by the Datadog API and is prevented from working. This status can also indicate other error types.

Note

See the next section for more information about rate limiting and STATUS displays.

You do not need to perform any integration-specific steps on your Datadog systems. After you set up the integration, it polls each Datadog endpoint at regular intervals to collect metrics and events.

Moogsoft Handling of Datadog Rate Limiting

In the current release, Moogsoft provides an enhancement of its rate limiting logic to help you understand and manage queries and API request allocations.

API Allocations. The new Moogsoft logic for handling rate limiting of API requests and queries includes an auto-adjusting utility that moves the initial allocation up or down, based on what is actually used.

An integration deployment will start at 45% of the allocated metric rate limit. Just before the hourly rate limit reset, Moogsoft checks the remaining allocation budget; if the left-over budget is substantial, then Moogsoft increases the allocation percentage. If Datadog rate limiting occurs, Moogsoft will lower the limit. In this way, Moogsoft may raise the rate limit to a maximum of 80%, and may lower the rate to a minimum of 20%.

Important

Datadog APIs use different rate limits for different endpoints. Typically, it is the query "timeseries" rate limit that Moogsoft encounters when attempting to pull back all the data points during ingestion. Moogsoft's Datadog integration includes logic to automatically adjust the query frequency based on both the configured rate limit and the leftover balance when the limit is reset (typically hourly).

In practice, when the integration is initiated, Moogsoft uses 45% of the 1600 limit. If the rate limit is increased on the Datadog side during that time, Moogsoft uses 45% of whatever the new limit is (for example, 45% of 2500). After each hour of execution, Moogsoft adjusts the allocation percentage up or down as needed. If Moogsoft has used up the entire allocation at the end of an hour, the rate is adjusted down by 5%. However, if (for example) 20% of the rate is left over, Moogsoft increases the allocation by 5%. The allocation never goes lower than 20% or higher than 80%.
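
The hourly adjustment described above can be sketched as follows. This is a minimal illustration and not Moogsoft's actual code. It assumes that "5%" means five percentage points and that 20% left over is the threshold for an increase (the text gives 20% only as an example); the function and constant names are hypothetical.

```python
MIN_PCT, MAX_PCT = 20, 80      # bounds stated in the text
START_PCT, STEP_PCT = 45, 5    # starting allocation and adjustment step
LEFTOVER_THRESHOLD = 0.20      # assumption: the text cites 20% only as an example


def next_allocation_pct(current_pct, budget, used):
    """Return the allocation percentage to use for the next hour."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    leftover = (budget - used) / budget
    if used >= budget:                     # entire allocation consumed
        new_pct = current_pct - STEP_PCT
    elif leftover >= LEFTOVER_THRESHOLD:   # substantial budget left over
        new_pct = current_pct + STEP_PCT
    else:
        new_pct = current_pct
    return max(MIN_PCT, min(MAX_PCT, new_pct))  # clamp to 20..80


budget = 1600 * START_PCT // 100                   # 45% of 1600 = 720 requests
print(next_allocation_pct(START_PCT, budget, 720))  # 40 (allocation used up)
print(next_allocation_pct(START_PCT, budget, 500))  # 50 (about 31% left over)
```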

Note

The Datadog API is often used by many other Datadog clients, so no one client can ever access and allocate the full maximum. Moogsoft uses a percentage that it has determined can last the whole hour to ensure that requests cannot run out, and to be sure that Moogsoft is not rate limited by the Datadog side. In this way, Moogsoft is confident it will obtain a consistent series of values for all metrics.

Moogsoft auto-adjusts only for the query metric timeseries rate limit. If it encounters rate limits on any other API (for example, querying active metrics), the integration enters the error state.

Metrics accounting. In addition to the max number of queries, Moogsoft also accounts for the max number of metrics a single query can handle. For this purpose, a hard limit of 40 metrics is used to prevent problems on the Datadog side specific to large responses.

Note that the number of metrics per query and the number of queries together determine how many metrics Moogsoft can handle. With the out-of-the-box limits, this results in approximately 17 queries per minute over the course of an hour. Since each query can handle a maximum of 40 metrics, 17 * 40 (680) is the maximum number of metrics the integration can handle without raising the rate limit on the Datadog side.
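
The arithmetic above can be reproduced with a short sketch. It assumes that every metric is polled once per minute (the default metric polling rate) and takes the 1040 hourly query figure quoted in this section as input; metric_capacity is a hypothetical helper name.

```python
METRICS_PER_QUERY = 40  # hard limit per query stated in the text


def metric_capacity(queries_per_hour):
    """Return how many metrics can be polled once per minute."""
    queries_per_minute = queries_per_hour // 60  # whole queries each minute
    return queries_per_minute * METRICS_PER_QUERY


print(metric_capacity(1040))  # 17 queries per minute * 40 metrics = 680
```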

If the total number of active metrics requires more than 1040 queries, Moogsoft asks for only a subset of the active metric list (as many as that number of queries can cover). When this happens, the STATUS is shown as LIMITED. The LIMITED status means that Moogsoft is picking a subset of the active metrics (for example, 700 of 4000). The chosen metrics are the first 700 of the active metric names returned by the active metric query.
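
As an illustration of the subset selection described above, the following sketch keeps the first metric names up to a given capacity and reports LIMITED when the list had to be cut. The function names, the capacity value, and the generated metric names are assumptions for illustration only; the batching into groups of 40 follows the per-query limit stated earlier.

```python
def select_metrics(active_metrics, capacity):
    """Return (metrics_to_poll, status), keeping the first `capacity` names."""
    if len(active_metrics) <= capacity:
        return list(active_metrics), "RUNNING"
    return list(active_metrics[:capacity]), "LIMITED"


def batch(metrics, size=40):
    """Split metric names into query-sized groups of at most `size`."""
    return [metrics[i:i + size] for i in range(0, len(metrics), size)]


# Hypothetical list of 4000 active metric names, in the order returned.
active = [f"metric.{i}" for i in range(4000)]
chosen, status = select_metrics(active, capacity=700)
print(status, len(chosen), len(batch(chosen)))  # LIMITED 700 18
```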

If Moogsoft is limited by Datadog (and a 429 error status is returned), or if Moogsoft encounters other issues, then the status of the integration will be shown as ERROR.

Lastly, if the metric query request limit is changed on the Datadog side, the integration raises the hourly request amount by 65% of whatever the new number is.

2022-01-26T16:17:11-05:00