By Jimmy Guerrero
Every Site Reliability Engineer (SRE) knows that the more DevOps tools you use to ensure adequate monitoring coverage, the more likely you’ll end up overloaded with alerts and data analysis tasks. Paradoxically, this avalanche of alerts and data might actually cause you to miss precisely the issues you are trying to identify. To help in this situation, SignifAI delivers powerful real-time and predictive insights to DevOps teams by correlating their event, log and metrics data using a combination of artificial intelligence, machine learning and the team’s own expertise.
SignifAI offers 60+ integrations right out of the box with technologies like AWS, New Relic, AppDynamics, Nagios and PagerDuty. The available integrations cover the most popular DevOps tools used for infrastructure, application, notification, collaboration and deployment tasks. In this week’s post we’ll take a look at how easy it is to integrate Splunk’s log monitoring capabilities with SignifAI’s machine intelligence platform.
What is Splunk?
With Splunk, DevOps teams can search, monitor, analyze and visualize log data.
The benefits of Splunk and SignifAI integration
Our mission at SignifAI is to augment a DevOps team’s existing tools with the power of AI and machine learning so that SREs can get to accurate remediations faster. We also designed our platform so that it would easily work with a DevOps team’s existing workflows.
SignifAI currently supports the following Splunk products:
- Splunk Light
- Splunk Cloud
- Splunk Enterprise version 6.3 and up
Integrating Splunk with SignifAI delivers the following benefits above and beyond what Splunk alone offers:
Splunk is a data silo of infrastructure log data. SignifAI uses AI and machine learning to correlate Splunk data with other sources of monitoring data including metrics and events, from tools like New Relic and Datadog. In turn, these powerful correlations will give you much richer context around an alert than just the associated log data. Let’s take a look at a real use case example…
Suppose you are running AppDynamics to monitor your Java application, and you have configured a baseline alert to trigger on application latency. A typical setup would also configure Splunk with a search condition across all the hosts in a cluster to alert on specific errors. Other search terms and logic may be configured on the same hosts that are not necessarily related to your AppDynamics configuration. For example, a Splunk search can be configured to match every time a new deployment or version change occurs in the application.
With SignifAI, you can have different Splunk searches sent to the SignifAI collector in parallel with the alerts triggered from AppDynamics. SignifAI correlates all the events based on a shared timeline, a semantic analysis of the search term, any $result.fieldname$ token from Splunk and the $app$ token (application name), and then correlates that information with the AppDynamics alerts triggered by the latency baseline condition. The correlation is based on time, application name, related hosts and other exact or similar matches (scored with a confidence level).
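To make the idea of "exact or similar matches with a confidence level" concrete, here is a minimal sketch in Python. It is not SignifAI's algorithm, which this post does not describe. It simply treats an exact (case-insensitive) match as full confidence and falls back to a string similarity ratio from the standard library. The function names and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def match_confidence(a: str, b: str) -> float:
    """Return 1.0 for an exact case-insensitive match, else a similarity ratio in [0, 1]."""
    a_norm, b_norm = a.strip().lower(), b.strip().lower()
    if a_norm == b_norm:
        return 1.0
    return SequenceMatcher(None, a_norm, b_norm).ratio()


def is_related(splunk_app: str, appd_app: str, threshold: float = 0.8) -> bool:
    """Treat two application names as related when confidence meets the (assumed) threshold."""
    return match_confidence(splunk_app, appd_app) >= threshold


# Hypothetical application names as they might appear in Splunk's $app$ token
# and in an AppDynamics alert.
print(is_related("checkout-service", "checkout_service"))
print(is_related("checkout-service", "billing"))
```

A real system would likely weigh several signals (time, host, application) together; this only shows how a single field comparison can yield a confidence score instead of a yes/no answer.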
Because SignifAI correlates internal Splunk events as well, any events that match the same host are grouped together. This makes it easy to identify an application deployment event on a specific host that caused an application error, which in turn caused an increase in application latency. All of these events are related to each other, are automatically correlated and grouped together in SignifAI and, most importantly, trigger only one alert from the SignifAI platform.
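The deployment, error and latency scenario can be sketched as a simple grouping by host and time. The code below is an illustrative assumption, not SignifAI's implementation: the Event type, the group_events function and the ten-minute window are all hypothetical. It groups events that share a host and occur close together in time, so three related events become one group.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Event:
    source: str          # e.g. "splunk" or "appdynamics"
    host: str
    timestamp: datetime
    summary: str


def group_events(events: List[Event],
                 window: timedelta = timedelta(minutes=10)) -> List[List[Event]]:
    """Group events on the same host when each follows the previous one within `window`."""
    by_host: Dict[str, List[Event]] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        by_host.setdefault(ev.host, []).append(ev)

    groups: List[List[Event]] = []
    for host_events in by_host.values():
        current = [host_events[0]]
        for ev in host_events[1:]:
            if ev.timestamp - current[-1].timestamp <= window:
                current.append(ev)
            else:
                groups.append(current)
                current = [ev]
        groups.append(current)
    return groups


# Made-up sample data: three related events on web-01, one unrelated event on web-02.
events = [
    Event("splunk", "web-01", datetime(2018, 1, 10, 12, 0), "new deployment"),
    Event("splunk", "web-02", datetime(2018, 1, 10, 12, 1), "cron job finished"),
    Event("splunk", "web-01", datetime(2018, 1, 10, 12, 3), "ERROR in application log"),
    Event("appdynamics", "web-01", datetime(2018, 1, 10, 12, 5), "latency above baseline"),
]
groups = group_events(events)
for group in groups:
    print(group[0].host, [ev.summary for ev in group])
```

Each resulting group would correspond to a single alert rather than one alert per event.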
SignifAI doesn’t just correlate the monitoring data from different tools; it correlates the associated alerts as well. So, instead of getting multiple alerts from multiple tools concerning what ultimately is a single issue, SignifAI intelligently groups all the relevant alerts together into an incident card. This makes it easy to see which alerts should be prioritized, which tools are reporting issues and what underlying data can help you get to a remediation faster.
Predictive Insights and Anomaly Detection
SignifAI looks at your Splunk data both in real time and historically, then correlates it with your other log, metric and event data to surface predictive insights and outliers, either in real time or in daily, weekly and monthly alerts.
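As a rough illustration of what flagging an outlier against historical data can mean, the sketch below compares the latest value with a baseline built from earlier values. This is a plain z-score check chosen for simplicity, not the method SignifAI uses (which is not described here). The function name, the threshold of 3.0 and the sample counts are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_outlier(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` when it lies more than `threshold` sample standard deviations
    from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]  # flat baseline: any change is unusual
    return abs(latest - mean(history)) / sigma > threshold


# Made-up daily counts of ERROR log lines for the previous six days.
daily_errors = [12, 15, 11, 14, 13, 12]
print(is_outlier(daily_errors, 90))
print(is_outlier(daily_errors, 14))
```

The same check could be run over daily, weekly or monthly aggregates by changing what goes into the history.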
SignifAI’s machine intelligence is trained using algorithms based on industry and vendor best practices, SignifAI’s own operational expertise and, most importantly, your expertise. As your team resolves issues, SignifAI uses your documented solutions, hints, annotations, retros and runbooks to deliver a fully customized monitoring solution specifically tailored to your systems.
Integrating Splunk with SignifAI’s Web Collector
SignifAI’s Web Collector sensor for Splunk integrates over a webhook. It is notified of specific searches, key terms and reports, passively analyzes them, and correlates alerts and searches with your other metrics and incidents. Here’s how to set up the integration:
- In your Splunk console, start a search for relevant events.
- Save your search as an alert.
- Configure your alert conditions and choose the Webhook as the delivery method.
- Log in to the main SignifAI console and navigate to the Sensors tab.
- Select Splunk.
- Copy the SignifAI collector URL and paste it into the Webhook Endpoint.
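Once the alert is saved, Splunk sends an HTTP POST with a JSON body to the configured URL each time the alert fires. The sketch below shows roughly what a receiver could do with that body. The top-level field names (result, sid, results_link, search_name, owner, app) follow Splunk's documented webhook alert payload as we understand it, so check them against your Splunk version. The normalize_splunk_alert function, the output schema and the sample values are hypothetical and are not SignifAI's collector.

```python
import json
from typing import Any, Dict


def normalize_splunk_alert(body: str) -> Dict[str, Any]:
    """Parse a Splunk webhook JSON body into a flat event (hypothetical schema)."""
    payload = json.loads(body)
    result = payload.get("result") or {}  # first result row of the triggering search
    return {
        "source": "splunk",
        "search_name": payload.get("search_name"),
        "app": payload.get("app"),
        "owner": payload.get("owner"),
        "sid": payload.get("sid"),
        "results_link": payload.get("results_link"),
        "host": result.get("host"),  # present only if the search returns a host field
        "fields": result,
    }


# Made-up example body; hostnames and IDs are placeholders.
sample_body = json.dumps({
    "result": {"host": "web-01", "sourcetype": "app_logs", "count": "8"},
    "sid": "scheduler_admin_search_example",
    "results_link": "http://splunk.example.local:8000/app/search/@go?sid=scheduler_admin_search_example",
    "search_name": "Application ERROR spike",
    "owner": "admin",
    "app": "search",
})
event = normalize_splunk_alert(sample_body)
print(event["search_name"], event["host"])
```

Having the search name, application and host in one flat record is what allows an alert from Splunk to be lined up against alerts from other tools.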