02. November 2016 · Cybersecurity Architecture in Transition to Address Incident Detection and Response

I would like to discuss the most significant cybersecurity architectural change in the last 15 years. It’s being driven by new products built to respond to the industry’s abysmal incident detection record.


There are two types of products driving this architectural change. The first is new “domain-level” detection countermeasures, primarily in the endpoint and network domains. They dramatically improve threat detection by (1) expanding the types of detection methods used and (2) leveraging threat intelligence to detect zero-day threats.

The second is a new type of incident detection and response productivity tool, which Gartner would categorize as SOAR (Security Operations, Analytics and Reporting). It provides SOC analysts with (1) playbooks and orchestration to improve analyst productivity, efficiency, and consistency, (2) automated correlation of alerts from SIEMs and the above-mentioned “next-generation” domain-level countermeasures, (3) the ability to query the domain-level countermeasures and other sources of information, such as CMDBs and vulnerability management solutions, via their APIs to give SOC analysts better context, and (4) a graphical user interface backed by a robust database that enables rapid SOC analyst investigations.

I am calling the traditional approach we’ve been using for the last 15 years “Monolithic” because a SIEM or a single log repository has been at the center of the detection process. I’m calling the new approach “Composite” because there are multiple event repositories: the SIEM/log repository and those of the domain-level countermeasures.

First I will review why this change is needed, and then I’ll go into more detail about how the Composite architecture addresses the incident detection problems we have been experiencing.

Problems with SIEM

Back around 2000, the first Security Information and Event Management (SIEM) solutions appeared. The rationale for the SIEM was the need to consolidate and analyze the events, as represented by logs, generated by disparate domain technologies such as anti-virus, firewalls, IDSs, and servers. While SIEMs did OK with compliance reporting and forensic investigations, they were poor at incident detection. The question is, why?

First, for the most part, SIEMs are limited to log analysis. While most of the criticism of SIEMs relates to the limitations of rule-based alerting (more on that below), limiting analysis to logs is a problem in itself for two reasons. One, capturing the details of actual activities from logs is difficult. Two, a log often represents just a summary of an event. So SIEMs were handicapped by their data sources.

Additionally, the SIEM’s rule-based analysis approach only alerts on predefined, known-bad scenarios. New, creative, unknown scenarios are missed. Tuning rules for a particular organization is also very time-consuming, especially when hundreds of rules need tuning.
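
To make the “predefined, known-bad scenario” limitation concrete, here is a minimal Python sketch of a hypothetical SIEM-style correlation rule: alert when a user logs in successfully after five or more failures within five minutes. The rule, its threshold and window, and the LogEvent shape are illustrative assumptions, not any SIEM product’s rule language. The rule fires on the brute-force pattern it was written for, but a password spray (one guess against each of many accounts) never trips it.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class LogEvent:
    timestamp: int   # seconds since epoch
    user: str
    action: str      # "login_failure" or "login_success"

def brute_force_rule(events, threshold=5, window=300):
    """Hypothetical predefined rule: alert when a user logs in successfully
    after at least `threshold` failures within the preceding `window` seconds."""
    failures = defaultdict(list)
    alerts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.action == "login_failure":
            failures[ev.user].append(ev.timestamp)
        elif ev.action == "login_success":
            recent = [t for t in failures[ev.user] if ev.timestamp - t <= window]
            if len(recent) >= threshold:
                alerts.append((ev.user, ev.timestamp))
            failures[ev.user].clear()  # a success resets the failure history
    return alerts

# The scenario the rule was written for: five failures, then a success.
brute_force = [LogEvent(t, "alice", "login_failure") for t in range(5)]
brute_force.append(LogEvent(10, "alice", "login_success"))

# A variation nobody wrote a rule for: one failed guess against each of ten
# accounts (a password spray), then one account falls.
spray = [LogEvent(100 + i, f"user{i}", "login_failure") for i in range(10)]
spray.append(LogEvent(200, "user3", "login_success"))

print(brute_force_rule(brute_force))  # alerts on alice
print(brute_force_rule(spray))        # no alert: the spray never trips the rule
```

Catching the spray would require yet another hand-written rule, which is exactly the tuning and coverage burden described above.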

Another issue that is often overlooked is that SIEM vendors are generalists. They know a little about a lot of domains, but don’t have the same in-depth knowledge about a specific domain as a vendor who specializes in that domain. SIEM vendors don’t know as much about endpoint security as endpoint security vendors, and they don’t know as much about network security as network security vendors.

Finally, SIEMs have not addressed two other key issues that have plagued security operations teams for years: (1) lack of consistent, repeatable processes among different SOC analysts, and (2) mind-numbing, repetitive manual tasks that beg to be automated. These issues, plus SIEMs’ high rate of false positives, sap SOC team morale, which results in high turnover. Considering cybersecurity’s “zero unemployment” environment, this is a costly problem indeed.

Problems with Traditional Domain-level Countermeasures

But the industry’s poor incident detection track record is not just due to SIEMs. Traditional domain-level detection products also bear responsibility. Let me explain.

Traditional domain-level detection products, whether endpoint or network, must make their “benign (allow) vs. suspicious (alert but allow) vs. malicious (alert and block)” decisions in microseconds or milliseconds. A file appears. Is it malicious or benign? The anti-virus software on the endpoint must decide in a fraction of a second. Then it’s on to the next file. In-line IDS countermeasures face a similar problem. Analyze a packet. Good, suspicious, or malicious? Move on to the next packet. In some out-of-band cases, the countermeasure has the luxury of assembling a complete file before making the decision. File detonation/sandboxing products can take longer, but are still limited to minutes at most. Then it’s on to the next file.

So it’s really the combination of the limitations of traditional domain-level countermeasures and SIEMs that has resulted in the poor incident detection record, high false positive rates, and low morale and high turnover among SOC analysts. But there is hope. I am seeing a new generation of security products built around a new, Composite architecture that addresses these issues.

Next-Generation Domain-level Countermeasures

First, there are new “next-generation” domain-level security companies that have expanded their analysis timeframe from microseconds to months. A next-gen endpoint product not only analyzes events on the endpoint itself, but also collects and stores hundreds of event types for further analysis over time. This also lets the next-gen endpoint product leverage threat intelligence, i.e., apply new threat intelligence retrospectively over the event repository to detect previously unknown zero-day threats.
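
Applying threat intelligence retrospectively only works because the events were kept. The following minimal Python sketch illustrates the idea under stated assumptions: EndpointEvent, EventRepository, and retro_hunt are hypothetical names rather than any vendor’s API, the repository is an in-memory list, and the short strings stand in for real SHA-256 file hashes.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds

@dataclass(frozen=True)
class EndpointEvent:
    timestamp: int      # seconds since epoch
    host: str
    event_type: str     # e.g. "process_start", "file_write"
    sha256: str         # placeholder hash of the file involved

class EventRepository:
    """Hypothetical in-memory stand-in for a next-gen endpoint event store."""

    def __init__(self):
        self._events = []

    def record(self, event):
        # Keep the event after the inline allow/alert/block verdict instead of
        # discarding it, so it can be re-examined later.
        self._events.append(event)

    def retro_hunt(self, ioc_hashes, since=0):
        """Apply newly received indicators to events that were already stored."""
        iocs = set(ioc_hashes)
        return [e for e in self._events
                if e.timestamp >= since and e.sha256 in iocs]

repo = EventRepository()
repo.record(EndpointEvent(1 * DAY, "wks-17", "process_start", "aa11"))
repo.record(EndpointEvent(2 * DAY, "srv-03", "file_write", "bb22"))

# Day 30: a threat intelligence feed reports "bb22" as malicious. The inline
# verdict on day 2 allowed the file, but the stored event can still be found.
hits = repo.retro_hunt({"bb22"})
for hit in hits:
    print(hit.host, hit.event_type)
```

A production store would index by hash and time rather than scan a list, but the principle is the same: the zero-day is detected by re-asking an old question with new knowledge.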

A “next-gen” network security vendor collects, analyzes, and stores full packets. With full packets captured, the nature of threat intelligence actually expands beyond IP addresses, URLs, and file hashes to include new signatures. In addition, combinations of “weak” signals can be correlated over time to generate high fidelity alerts.
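
One simple way to turn several “weak” signals into a single higher-fidelity alert is a weighted sum per host over a sliding time window. The Python sketch below assumes hypothetical signal names, weights, window, and threshold chosen purely for illustration; it is not how any particular network security product scores activity. It shows why time matters: the same four signals raise an alert when they cluster within two weeks, but not when they are spread over four months.

```python
from collections import defaultdict

DAY = 86_400  # seconds

# Illustrative weights for "weak" network signals; a real product would tune
# or learn these rather than hard-code them.
SIGNAL_WEIGHTS = {
    "rare_user_agent": 0.2,
    "dns_to_new_domain": 0.3,
    "beacon_like_interval": 0.4,
    "large_outbound_transfer": 0.5,
}

def correlate_weak_signals(signals, window, threshold=1.0):
    """signals: iterable of (timestamp, host, signal_name) tuples.
    Returns {host: score} for hosts whose signals, summed over some span of
    at most `window` seconds, reach `threshold`."""
    by_host = defaultdict(list)
    for ts, host, name in signals:
        by_host[host].append((ts, SIGNAL_WEIGHTS.get(name, 0.0)))

    alerts = {}
    for host, items in by_host.items():
        items.sort()
        start, total, best = 0, 0.0, 0.0
        for end in range(len(items)):
            total += items[end][1]
            # Drop signals from the left until the span is <= `window` seconds.
            while items[end][0] - items[start][0] > window:
                total -= items[start][1]
                start += 1
            best = max(best, total)
        if best >= threshold:
            alerts[host] = round(best, 2)
    return alerts

names = list(SIGNAL_WEIGHTS)
# Host A: all four weak signals within two weeks.
clustered = [(d * DAY, "10.0.0.5", n) for d, n in zip([0, 3, 7, 12], names)]
# Host B: the same four signals spread over four months.
scattered = [(d * DAY, "10.0.0.9", n) for d, n in zip([0, 40, 80, 120], names)]

print(correlate_weak_signals(clustered + scattered, window=14 * DAY))
```

No single signal here would justify an alert on its own; only retaining the data long enough to see the combination does.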

These domain specific security vendors are also using machine learning and other statistical algorithms to detect malicious scenarios across combinations of multiple events that traditional rule-based analysis would miss.

Finally, these next-gen domain-level countermeasures provide APIs that (1) enable a SOAR product to pull more detailed event information to add context for the SOC analyst, and (2) enable threat hunting with third-party threat intelligence.

Architectural Issue Created by Next-gen Domain-level Countermeasures

But replacing traditional domain-specific security countermeasures with these next-gen ones actually creates an architectural problem. Instead of having a single Monolithic event repository, i.e., the SIEM or log repository, you have multiple event repositories, because it no longer makes sense to add the next-gen domain-specific event data to what has been your single event repository. Why? First, the domain product has already analyzed the raw domain data. Second, if you want to access the data, you can do so via APIs. Third, as already stated, the type of analysis a SIEM does has not been effective at detecting incidents anyway. Fourth, you are already paying the domain-level vendor to store the data. Why pay the SIEM or log repository vendor to store that data again?

Having said all this, your primary log repository is not going away anytime soon, because you still need it for traditional log sources such as firewalls, Active Directory, and Data Loss Prevention. But over time, there will be fewer traditional countermeasures as these vendors expand their analysis timeframes. Some are already doing this.

So by embracing these next-gen domain-specific countermeasures, we are creating multiple silos of security information that don’t talk to each other. How, then, do we correlate these different domains if the events they generate are not in a single repository?

Security Operations, Analytics, and Reporting (SOAR)

This issue is addressed by a new type of correlation analysis product, the second architectural component, which Gartner calls Security Operations, Analytics, and Reporting (SOAR). I believe Gartner first published research on this in the fall of 2015. Here are my SOAR solution requirements:

  1. Receive and correlate alerts from the next-gen domain-level security products.
  2. Query the next-gen domain-level products for more detailed information related to each alert to provide context.
  3. Access CMDBs and vulnerability repositories for additional context.
  4. Receive and correlate alerts from SIEMs. Rule-based alerts from SIEMs need to be correlated by entity, i.e., user.
  5. Query Splunk and other types of log repositories.
  6. Correlate alerts and events from all these sources.
  7. Take threat intelligence feeds and generate queries to the various data repositories. While the next-gen domain-specific countermeasures have their own sources of threat intelligence, I fully expect organizations to still subscribe to product-independent threat intelligence.
  8. Use a robust database of its own, preferably a graph database, to store all this collected information, and provide fast query responses to SOC analysts pivoting on known data during investigations.
  9. Provide playbooks and the tools to build and customize playbooks to ensure consistent incident response processes.
  10. Provide orchestration/automation functions to reduce repetitive manual tasks.
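
Requirements 1, 3, 4, and 6 can be illustrated with a short Python sketch under stated assumptions: alerts from every source have already been normalized into a common shape carrying entity identifiers such as user:alice or host:wks-17, and the CMDB is a plain dictionary standing in for a real CMDB API. Alert, correlate_by_entity, and summarize are hypothetical names. Linking alerts through shared entities with a union-find is one simple way to get the “pivot on an entity” behavior described above; a graph database, as requirement 8 suggests, would store the same alert-to-entity edges persistently and serve them to analysts.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    alert_id: str
    source: str            # e.g. "siem", "endpoint", "network", "ueba"
    entities: frozenset    # non-empty, e.g. {"user:alice", "host:wks-17"}
    severity: int          # 1 (low) .. 5 (critical)

def correlate_by_entity(alerts):
    """Group alerts into candidate incidents. Two alerts end up together if they
    share an entity directly or through a chain of other alerts (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for alert in alerts:
        first, *rest = sorted(alert.entities)
        for other in rest:
            parent[find(other)] = find(first)  # merge the two entity groups

    incidents = defaultdict(list)
    for alert in alerts:
        incidents[find(next(iter(alert.entities)))].append(alert)
    return list(incidents.values())

def summarize(incident, cmdb):
    """Build an analyst-facing summary, pulling context for each entity from a
    CMDB lookup (here just a dict standing in for a CMDB API)."""
    entities = set().union(*(a.entities for a in incident))
    return {
        "alert_ids": sorted(a.alert_id for a in incident),
        "sources": sorted({a.source for a in incident}),
        "max_severity": max(a.severity for a in incident),
        "context": {e: cmdb[e] for e in sorted(entities) if e in cmdb},
    }

alerts = [
    Alert("a1", "siem", frozenset({"user:alice"}), 2),
    Alert("a2", "endpoint", frozenset({"user:alice", "host:wks-17"}), 4),
    Alert("a3", "network", frozenset({"host:wks-17", "ip:10.0.0.5"}), 3),
    Alert("a4", "siem", frozenset({"user:bob"}), 1),
]
cmdb = {"host:wks-17": {"owner": "finance", "criticality": "high"}}  # hypothetical

for incident in correlate_by_entity(alerts):
    print(summarize(incident, cmdb))
```

Here a SIEM alert about alice, an endpoint alert on her workstation, and a network alert from that workstation collapse into one incident with CMDB context attached, while the unrelated alert about bob stays separate.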

User Entity Behavior Analysis (UEBA)

At this point, you may be thinking, where does User Entity Behavior Analysis (UEBA) fit in? If you are concerned about a true insider threat, i.e., malicious user activity with no malware involved, then UEBA is a must. UEBA solutions definitely fit into the Composite architecture, querying multiple event repositories and sending alerts to the SOAR solution. They should also have APIs so they can be queried by the SOAR solution.

Future Evolution

Looking to the future, I expect the Composite architecture to evolve. Here are some possibilities:

  • A UEBA solution could add SOAR functionality
  • A SIEM solution could add UEBA and/or SOAR functionality
  • A SOAR solution could add log repository and/or SIEM functionality
  • A next-gen domain solution could add SOAR functionality


To summarize, the traditional Monolithic architecture, in which domain countermeasures limited to microsecond/millisecond analysis feed a SIEM for incident detection, has failed. It’s being replaced by the Composite model featuring (1) next-gen domain-level countermeasures that play an expanded analysis role, (2) traditional SIEMs and/or primary log repositories that, for the near term, continue to play their roles for traditional security countermeasures, and (3) a SOAR solution at the “top of the stack” that is the SOC analysts’ primary incident detection and response tool.

Originally posted on LinkedIn on November 2, 2016


As I look over my experience in Information Security since 1999, I see three distinct eras with respect to the motivation driving technical control purchases:

  • Basic (mid-1990s to early 2000s) – Organizations implemented basic host-based and network-based technical security controls, i.e., anti-virus and firewalls, respectively.
  • Compliance (early 2000s to mid-2000s) – Compliance regulations such as Sarbanes-Oxley and PCI drove major improvements in security.
  • Breach Prevention and Incident Detection & Response (BPIDR) (late 2000s to present) – Organizations realize that regulatory compliance represents a minimum level of security and is not sufficient to cope with the fast-changing methods used by cyber predators. Meeting compliance requirements will not effectively reduce the likelihood of a breach by more skilled and aggressive adversaries, nor detect their malicious activity.

I have three examples to support the shift from the Compliance era to the Breach Prevention and Incident Detection & Response (BPIDR) era. The first is the increasing popularity of Palo Alto Networks. No compliance regulation I am aware of makes the distinction between a traditional stateful inspection firewall and a Next Generation Firewall (NGFW) as defined by Gartner in its 2009 research report. Yet in the last four years, 6,000 companies have selected Palo Alto Networks because its NGFWs enable organizations to regain control of traffic at points in their networks where trust levels change or ought to change.

The second example is the evolution of Log Management/SIEM. One can safely say that the driving force for most Log/SIEM purchases in the early to mid-2000s was compliance. The fastest-growing vendors of that period had the best compliance reporting capabilities. However, by the late 2000s, many organizations began to realize they needed better detection controls. We began to see a shift in the SIEM market toward solutions that not only provided the necessary compliance reports, but could also function satisfactorily as the primary detection control within limited budgets. Hence the ascendancy of Q1 Labs, which actually passed ArcSight in the number of installations before being acquired by IBM.

The third example is email security. From a compliance perspective, Requirement 5 of PCI DSS, for example, is very comprehensive regarding anti-virus software. However, it is silent regarding phishing. The popularity of products from Proofpoint and FireEye shows that organizations have determined that blocking email-borne viruses is simply not adequate. Phishing, and particularly spear-phishing, must be addressed.

Rather than simply call the third era “Breach Prevention,” I chose to add “Incident Detection & Response” because preventing all system compromises that could lead to a breach is not possible. You must assume that Prevention controls will have failures. Therefore you must invest in Detection controls as well. Too often, I have seen budget imbalances in favor of Prevention controls.

The goal of a defense-in-depth architecture is to (1) prevent breaches by minimizing attack surfaces, controlling access to assets, and preventing threats and malicious behavior on allowed traffic, and (2) detect malicious activity missed by prevention controls and detect compromised systems more quickly to minimize the risk of disclosure of confidential data.