How to Secure Industrial Automation
The growing number of cyberattacks on industrial systems shows that industrial automation, like IT infrastructure, needs protection. This protection, as well as countering the effects of failures and human error, is also demanded by market regulators. With the upcoming implementation of the NIS2 directive and the new law on the National Cyber Security System, ensuring OT security is becoming an obligation no longer only for companies classified as critical infrastructure, but also for many branches of the manufacturing sector. We present proven practices, methods, and examples of effective security management and business continuity assurance in automation environments.
OT (Operational Technology) systems, meaning the infrastructure and software that support the processes of industrial environments (e.g., production machinery, pumps, railroad equipment), were originally designed to operate 24/7 in “isolated” networks. Today, these systems are being integrated with IT systems, remote maintenance, cloud services, and third-party vendor solutions, dramatically changing the risk profile.
Developing appropriate procedures and a practical approach to the security and business continuity of OT systems in an organization should become a standard practice, just as it has previously happened in IT environments.
Risks in OT: Hackers, Failures, and Human Error
Most concerns about industrial automation vulnerabilities are related to the risk of external interference. However, our experience with OT network cybersecurity deployment projects, including those based on the National Cyber Security System (KSC) Act, indicates that a significant portion of incidents originate internally: failures, unintentional changes, configuration errors, and operator inattention. It should be assumed that the expected amendment to the KSC Act will reflect these observations.
The first NIS directive, implemented in Poland through the KSC Act, already placed great emphasis on risk analysis and incident response in this area. It is therefore important that the preparations for the new regulations (inventories, risk analysis, response plans, and the OT monitoring tools being deployed) support both the management of security incidents and the detection of operational anomalies resulting from failures or human error (business continuity).
When talking to decision-makers (including at the board level) about security planning, it is important to highlight both groups of threat sources and the need to monitor and prevent them.
While there is a growing conviction among decision-makers that OT networks need security systems, these projects often raise legitimate concerns. In many companies, the OT infrastructure is outdated and integrating it with modern IT systems can involve significant risks. However, this is an increasingly desirable step – not only for security reasons, but also due to the drive toward greater automation and remote access to OT systems needed to reduce infrastructure maintenance costs.
Another concern relates to whether IT professionals who are tasked with supporting their automation colleagues in this new area of responsibility truly understand the world of automation.
Change is required on both sides. IT teams have a lot to learn about what they will face as they enter the OT world. OT teams need to be aware of the need to adapt to IT requirements. Additionally, in companies facing a growing number of increasingly restrictive regulations at both EU and national levels, it is necessary to clearly communicate support for change from the board level.
It is worth noting that OT environments require different security measures, often less complex than modern IT security technologies. In OT, mainly passive incident detection measures are applied, while in IT the foundation is active intervention in network traffic. Because OT architectures are highly static and the content of network traffic is highly repetitive, the profile of incidents is simpler, and the incidents to be detected follow directly from the risks identified.
IT ≠ OT
We outline the key differences between IT and industrial automation solutions that determine the security architecture for these systems.
- Priority: IT protects data and service availability, while in OT networks, the risks relate to the health and lives of plant personnel, the quality of the process affecting the quality of the product, and, as a consequence, often the health and lives of customers;
- Network traffic: in IT, it is sometimes chaotic; in OT, it is stable and predictable – consequently enabling anomaly detection (statistical analysis/ML);
- Maintenance and interruptions: in IT, maintenance windows are scheduled; in OT, they are sometimes infrequent or difficult to negotiate;
- Patching: IT needs to be updated; OT often runs old systems without the possibility of patching (for example, because of process dependencies);
- Protection measures: in IT, we respond actively (blocking, automated actions), while in OT, we mainly use passive monitoring and procedural measures;
- Performance: in IT, high throughput is required, but some latency is accepted; in OT, throughput may be lower, but latency is not tolerated (real-time operation);
- Maintenance: in IT, it is easy to change infrastructure maintenance providers; in OT, it is difficult, often the provider holds a monopoly, limiting access to infrastructure;
- Ownership: IT maintenance and development are usually the responsibility of the CFO (as business owner of the ERP system) and the CIO, while industrial automation is typically the domain of the COO;
- Lifespan: in IT, the lifespan of infrastructure is typically 3-5 years; in OT, 10 or even 15 years. Interestingly, many plants are still building new OT segments based on very old but proven network standards and protocols.
Don’t Wait for the Law
EU, national or industry regulations (NIS2, KSC, ISO 27001/22301 standards, TISAX) will increase pressure to implement risk management and incident response projects in an OT environment. From the perspective of practical experience, we can say that the biggest, often difficult-to-estimate potential delays may arise from formal and contractual barriers (e.g., maintenance service monopoly, lack of vendor approval for additional monitoring) and technical barriers (original OT network architecture preventing monitoring).
Waiting for more regulations to come into force while failing to plan adjustment activities is also unreasonable as some of them require a long time to implement changes. It is worthwhile to start removing obstacles and making adjustments that will be necessary anyway, and often turn out not to generate significant costs. It is worth noting that, regardless of the details of the new Polish KSC Act, it can be assumed that the foundations of the obligations imposed on the new group of companies were already present in the regulations previously in force for other industries: the need to perform asset inventory, vulnerability and risk analysis, and the preparation of an incident response plan.
Practical experience from similar projects allows us to formulate conclusions and recommendations.
Organizational Takeaways
- Secure sponsorship at the level of plant/production directors, the company’s board of directors (with clearly defined ownership of each risk group);
- Renegotiate maintenance contracts: require rights to access infrastructure, mirror network traffic, and use network probes;
- Prepare for incident handling. Separate responsibilities: business continuity on the automation side, cyber threats handled by the IT/security team or external support (for example, outsourced forensic analysis).
The Most Common Obstacles at the Start
During the preparation stage of an industrial automation security project, we usually encounter similar obstacles, regardless of industry or company size. The most important of these are:
- Unmanageable switches with no free ports – no possibility to mirror network traffic, read the contents of the MIB, or export netflow;
- Lack of documentation or outdated documentation of topologies and protocol modifications;
- Contractual limitations imposed by automation suppliers and maintenance service providers (e.g., restrictions on engaging third parties to implement the project);
- Reluctance to intervene – fear among automation teams that monitoring will interfere with production (hence the requirement for full passivity and galvanic separation of monitoring measures).
An example from a project: the documentation from OT does not reflect the real architecture of the network or the devices present within it – more devices are visible in a network traffic mirror (pcap) than “on paper”. Conclusion: always verify documentation by performing network traffic analysis.
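The check described in this example can be sketched in a few lines of Python. This is a minimal illustration, not a tool used in the projects described: the function name `compare_inventory` and all addresses are hypothetical, and it assumes the addresses seen in the mirrored traffic have already been extracted from the capture by some other means.

```python
def compare_inventory(documented, observed):
    """Compare documented device addresses with those seen in mirrored traffic.

    Returns a tuple (undocumented, unseen):
      undocumented - present on the wire but missing from the documentation
      unseen       - documented but silent during the capture window
    """
    documented, observed = set(documented), set(observed)
    return observed - documented, documented - observed


# Hypothetical example data: addresses "on paper" vs. addresses in the capture.
documented_devices = {"10.0.1.10", "10.0.1.11", "10.0.1.20"}
observed_devices = {"10.0.1.10", "10.0.1.11", "10.0.1.20", "10.0.1.57"}

undocumented, unseen = compare_inventory(documented_devices, observed_devices)
print("Undocumented devices:", sorted(undocumented))
print("Documented but not seen:", sorted(unseen))
```

A device listed as "not seen" is not necessarily missing; it may simply have been idle during the capture, so a longer capture window is usually needed before drawing conclusions.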
OT Security/Continuity Project Methodology
Based on our experience, we have prepared a methodology for conducting such projects, in which the following steps can be separated at the stage of building the foundation for future compliance with new regulations.
Step 1. Asset inventory (IT platforms and network elements: PLCs, inverters, operator panels, sensors, server applications, diagnostic and monitoring devices), performed on the basis of network traffic capture (PCAP/flow).
Step 2. Identification of vulnerabilities and limitations (without scanners). Use vendor documentation and domain knowledge. Actively scan for vulnerabilities only rarely, and preferably in a lab environment.
Step 3. Mapping infrastructure to processes and assessing their level of criticality – essential for risk assessment.
Step 4. Risk analysis and an incident response plan (along with selection of the incidents you want to detect). This determines the choice of tools (sometimes open-source tools are sufficient; detecting anomalies in OT protocols requires probes that understand these protocols).
Step 5. Design and implementation of the monitoring architecture (selection of monitoring methods as a consequence of incident selection).
Step 6. Incident handling. Implementation of basic infrastructure monitoring together with an incident response plan. Defining incident owners to handle: business continuity alerts (OT), cybersecurity incidents (IT/SOC/external partner).
Step 7. Education. Visibility into infrastructure and incidents allows teams to build new requirements, expand monitoring, prepare the company for new regulations, but also look for benefits in other areas, such as predictive maintenance.
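The split of incident ownership described in Step 6 can be expressed as a simple routing table. The sketch below is only an illustration: the event type names, the `Owner` enum, and the choice to send unclassified events to the security team are assumptions, and each organization would define its own mapping in its incident response plan.

```python
from enum import Enum


class Owner(Enum):
    OT = "automation team (business continuity)"
    SECURITY = "IT/SOC or external partner (cybersecurity)"


# Hypothetical mapping of event types to incident owners.
ROUTING = {
    "traffic_loss": Owner.OT,
    "device_failure": Owner.OT,
    "new_device": Owner.SECURITY,
    "unknown_session": Owner.SECURITY,
}


def route_incident(event_type):
    """Return the owner for an event type.

    Assumption: anything not explicitly classified goes to the security team.
    """
    return ROUTING.get(event_type, Owner.SECURITY)


for event in ("traffic_loss", "new_device", "something_unclassified"):
    print(event, "->", route_incident(event).value)
```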
From Simple Signals to Register Anomalies
We previously noted that implementing OT infrastructure monitoring often does not require large budgets. A good example of this is the relationship between the incident being targeted and the method of event detection.
To detect simple events such as:
- a new device appearing in the OT segment
- a previously unseen session initiated between endpoints
- loss of traffic between devices
netflow collectors based on open-source solutions are often sufficient.
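The three simple events listed above reduce to set comparisons between a baseline and the current observation window. The following Python sketch assumes flow records have already been exported by a collector and reduced to source, destination, and destination port; the `Flow` type, the function name, and the sample addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Flow:
    src: str
    dst: str
    dst_port: int


def detect_simple_events(baseline, current):
    """Compare the current flow window with a baseline of known flows."""
    base_sessions, cur_sessions = set(baseline), set(current)
    base_hosts = {h for f in base_sessions for h in (f.src, f.dst)}
    cur_hosts = {h for f in cur_sessions for h in (f.src, f.dst)}
    return {
        # Hosts never seen in the baseline.
        "new_devices": sorted(cur_hosts - base_hosts),
        # Sessions (src, dst, port) never seen in the baseline.
        "new_sessions": sorted(cur_sessions - base_sessions),
        # Baseline sessions with no traffic in the current window.
        "lost_sessions": sorted(base_sessions - cur_sessions),
    }


baseline = [Flow("10.0.1.10", "10.0.1.20", 502), Flow("10.0.1.11", "10.0.1.20", 502)]
current = [Flow("10.0.1.10", "10.0.1.20", 502), Flow("10.0.1.57", "10.0.1.20", 502)]

events = detect_simple_events(baseline, current)
for name, items in events.items():
    print(name, items)
```

This works in OT precisely because the traffic is stable: in a typical IT network the list of new sessions would be too long to review.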
Complex events such as:
- anomalies in industrial protocols (e.g., non-standard commands, unusual frequency)
- changes in PLC registers (detected through passive monitoring – in mirrored traffic)
- traffic profiles deviating from a “stable” baseline
require the use of more advanced tools (such as commercial OT probes with machine learning capabilities).
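To make the idea of deviation from a baseline concrete, here is a deliberately simplified Python sketch. It is a plain statistical stand-in, not the machine learning used by commercial probes: it flags a packet rate that differs from the baseline mean by more than a chosen number of standard deviations, and lists protocol function codes that never appeared in the baseline. The threshold, function names, and sample values are assumptions; the function codes in the example are standard Modbus codes (3 and 4 read registers, 16 writes multiple registers).

```python
from statistics import mean, stdev


def is_rate_anomalous(baseline_rates, observed_rate, threshold=3.0):
    """Flag a rate more than `threshold` standard deviations from the baseline mean.

    `baseline_rates` must contain at least two samples.
    """
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    if sigma == 0:
        # Perfectly constant baseline: any difference counts as a deviation.
        return observed_rate != mu
    return abs(observed_rate - mu) / sigma > threshold


def unseen_function_codes(baseline_codes, observed_codes):
    """Return function codes observed now but never present in the baseline."""
    return sorted(set(observed_codes) - set(baseline_codes))


# Hypothetical packets-per-second samples from a stable period.
baseline_rates = [100, 102, 98, 101, 99]
print(is_rate_anomalous(baseline_rates, 101))
print(is_rate_anomalous(baseline_rates, 140))

# Baseline contains only reads; a write command appears in the new window.
print(unseen_function_codes({3, 4}, [3, 4, 16, 3]))
```

Extracting function codes or register values from mirrored traffic still requires a probe or parser that understands the industrial protocol in question, which is why this class of events is harder to cover with flow data alone.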
As early as the first step described above, i.e. asset inventory and analysis of mirrored traffic, architectural flaws can be discovered that may lead to recommendations for immediate changes.
Example from a project: in the OT segment, a CCTV recorder was detected setting up an encrypted session to China. Recommendation: correct the rules on the firewall (block unwanted outbound connections).
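A finding like this can be surfaced by checking flow records for connections that leave the internal address space. The sketch below uses Python's standard `ipaddress` module; the function name, the internal network range, the allowlist, and all addresses (taken from documentation ranges) are hypothetical.

```python
import ipaddress


def unexpected_outbound(flows, internal_nets, allowed_destinations):
    """Return (src, dst) pairs going from an internal host to an
    external address that is not on the allowlist."""
    nets = [ipaddress.ip_network(n) for n in internal_nets]
    allowed = {ipaddress.ip_address(a) for a in allowed_destinations}

    def is_internal(addr):
        return any(addr in net for net in nets)

    findings = []
    for src, dst in flows:
        s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        if is_internal(s) and not is_internal(d) and d not in allowed:
            findings.append((src, dst))
    return findings


flows = [
    ("10.0.1.10", "10.0.1.20"),     # internal PLC traffic
    ("10.0.1.99", "203.0.113.5"),   # e.g. a recorder calling an unknown host
    ("10.0.1.30", "198.51.100.7"),  # approved vendor endpoint (assumed)
]
findings = unexpected_outbound(flows, ["10.0.0.0/8"], ["198.51.100.7"])
print(findings)
```

Each finding is a candidate for a firewall rule review rather than proof of compromise; the destination still has to be assessed by a person.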
Technology: What Works in OT and Why
- Netflow collectors: The simplest and cheapest to implement method for continuous network inventory and incident detection (e.g., unknown devices, unknown sessions, network traffic outage, etc.)
- Infrastructure monitoring system: Often also open source-based; monitors the contents of MIB registers and collects SNMP traps; makes it possible to detect infrastructure failures and to build a more detailed inventory (LLDP or SNMP enables, for example, recording device software versions)
- Passive probes/IDS for OT: Monitoring while maintaining galvanic isolation from the monitored network; support for legacy physical interfaces, modified protocols; traceability of register states
- Network traffic recorder: Open-source tools can be used here as well. They enable very detailed network and traffic inventory, as well as post-event analysis using mirrored traffic. Recording of RDP and SSH sessions makes it possible to track administrator activities
- Firewalls, separation, DMZ: Analysis of mirrored network traffic often results in a recommendation to change the configuration of firewalls
- Switches: Manageable switches enable traffic mirroring, collect data in MIB registers, send netflow reports, etc. Replacing old switches with manageable ones is one of the most cost-effective investments in the implementation of OT network monitoring projects.
Project practice suggests that the decision on which measures from the list above to implement should be preceded by network traffic analysis, infrastructure inventory and a selection of the incidents to be detected. This helps significantly reduce investment costs by avoiding spending money on technical measures that will not be used properly.
Monitor and Respond
It is worth noting that the same technical solutions we use to detect cybersecurity incidents in OT networks are also applied in the business continuity management process: we identify events preceded by human error or failure. Therefore, industrial automation monitoring projects should be justified not only by the need to ensure regulatory compliance. Moreover, removing technical and formal obstacles, although sometimes time-consuming, does not involve significant investment costs.
A well-prepared inventory will reduce costs. The choice of the type of incidents you want to manage determines the choice of tools. Passive monitoring does not pose risks to the industrial automation infrastructure. It is also worth remembering that detected incidents must be managed, so it is necessary to develop an incident response strategy.
Good Cybersecurity Practices in OT
- Do not actively scan OT networks with IT tools; use passive measures
- Start with the data: PCAP, netflow, MIB content
- Revise maintenance contracts (authorization to access mirrored network traffic)
- Start building a response plan before investing – you will save on investment and prepare your team for additional responsibilities
- Build competence on both sides: IT learns about OT’s limitations; OT learns about detection tools and their value in maintenance practice